Challenge 3 – Modelling Smart-City Energy Consumption with ML

The city now wants a predictive model to forecast daily electricity demand across several buildings. Such a model is a building block for intelligent energy management (e.g. demand response, storage control, tariff design).

In this challenge, you will build regression models on the daily dataset created in Challenge 1, then analyse whether prediction errors are balanced across building groups. Optionally, you can connect back to the fairness logic from Challenge 2 by defining simple fairness-of-error metrics.

Material

Dataset: `energy_daily_features.csv` – created in Challenge 1.
– One row per building and day
– Contains at least:
    – `date`
    – `building_id`
    – `building_type`
    – `location`
    – `daily_grid_import_kwh`
    – (optionally) `cluster` from k-means.
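
As a quick sanity check on the column list above, loading and validating the file might look like the sketch below. The helper names `load_daily_features` and `summarise` and the three-row sample are made up for illustration; in the notebook you would pass the path to `energy_daily_features.csv` instead of the in-memory sample.

```python
import io

import pandas as pd

# Columns the brief says the dataset contains at least (`cluster` is optional).
REQUIRED_COLUMNS = [
    "date", "building_id", "building_type", "location", "daily_grid_import_kwh",
]


def load_daily_features(source):
    """Read the daily dataset, check the required columns, parse dates."""
    df = pd.read_csv(source)
    missing = [c for c in REQUIRED_COLUMNS if c not in df.columns]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    df["date"] = pd.to_datetime(df["date"])
    # Sort so that each building's rows are in chronological order.
    return df.sort_values(["building_id", "date"]).reset_index(drop=True)


def summarise(df):
    """Count rows, buildings and distinct days; report whether `cluster` exists."""
    return {
        "n_rows": len(df),
        "n_buildings": df["building_id"].nunique(),
        "n_days": df["date"].nunique(),
        "has_cluster": "cluster" in df.columns,
    }


# Made-up sample standing in for energy_daily_features.csv (assumption).
sample_csv = io.StringIO(
    "date,building_id,building_type,location,daily_grid_import_kwh\n"
    "2024-01-02,B1,office,north,120.5\n"
    "2024-01-01,B1,office,north,118.0\n"
    "2024-01-01,B2,school,south,75.2\n"
)
df = load_daily_features(sample_csv)
print(summarise(df))
```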

Colab starter notebook: https://drive.google.com/file/d/1ij7ZWsZ2SIrs5kuRBqy1hUYUV_nTuLwh/view?usp=sharing

Learning objectives

– Engineer time-based and lag features for energy-demand modelling.
– Train and evaluate baseline and non-linear regression models.
– Interpret feature importance in an intelligent energy/smart-city context.
– Evaluate whether prediction errors are balanced across building groups (fairness of errors).

Main tasks

  1. Load the `energy_daily_features.csv` dataset into a DataFrame, inspect its contents, and check how many buildings and days are available.
  2. Engineer time-based features (`day_of_week`, `month`, `is_weekend`) and per-building lag features for `daily_grid_import_kwh`, such as `lag1` and `lag7` (the value 1 and 7 days earlier); drop rows with missing lags.
  3. Build a feature matrix `X` (including calendar features, lag features, and optionally one-hot encoded `building_type` and `location`) and a target vector `y` (`daily_grid_import_kwh`). Split the data into train and test sets (e.g. 80/20).
  4. Train and evaluate a baseline Linear Regression model using RMSE and R², and visualise predicted vs actual daily energy consumption on the test set.
  5. Train and evaluate a `RandomForestRegressor` on the same features, compare its RMSE and R² to the linear model, and analyse feature importances (e.g. are lag features more important than calendar features?).
  6. Construct a `test_results` DataFrame that includes `building_id`, `building_type`, `location`, `y_true`, `y_pred_lin`, `y_pred_rf`, and absolute errors (`abs_error_lin`, `abs_error_rf`). Compute group-wise mean absolute error (MAE) by `building_type` and by `location` to assess fairness of errors.
  7. Optionally, design simple fairness metrics for regression inspired by Challenge 2 (e.g. fairness gap = max MAE − min MAE across groups, fairness ratio = max MAE / min MAE) and compare them between the two models.
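
Tasks 2 to 5 could be sketched roughly as follows. This is a minimal illustration on synthetic data, not a reference solution: the three made-up buildings, the helper names `add_features` and `time_split`, the chronological split and the Random Forest settings are all assumptions. A random `train_test_split` would satisfy the 80/20 requirement as well; splitting by date is used here only because it mimics forecasting days the model has not seen.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

TARGET = "daily_grid_import_kwh"


def add_features(df):
    """Add calendar features and per-building lags; drop rows without lags."""
    out = df.sort_values(["building_id", "date"]).copy()
    out["day_of_week"] = out["date"].dt.dayofweek  # Monday = 0
    out["month"] = out["date"].dt.month
    out["is_weekend"] = (out["day_of_week"] >= 5).astype(int)
    grouped = out.groupby("building_id")[TARGET]
    out["lag1"] = grouped.shift(1)  # previous day's value, same building
    out["lag7"] = grouped.shift(7)  # value one week earlier, same building
    return out.dropna(subset=["lag1", "lag7"]).reset_index(drop=True)


def time_split(df, test_fraction=0.2):
    """Chronological split: roughly the last 20% of dates form the test set."""
    dates = df["date"].drop_duplicates().sort_values().reset_index(drop=True)
    cutoff = dates.iloc[int(len(dates) * (1 - test_fraction))]
    return df[df["date"] < cutoff], df[df["date"] >= cutoff]


# Synthetic stand-in for the Challenge 1 dataset (assumption, for illustration).
rng = np.random.default_rng(0)
dates = pd.date_range("2024-01-01", periods=60, freq="D")
weekday_effect = np.where(dates.dayofweek >= 5, 0.6, 1.0)  # lower weekend demand
frames = []
for bid, btype, base in [("B1", "office", 100.0), ("B2", "school", 60.0),
                         ("B3", "residential", 30.0)]:
    frames.append(pd.DataFrame({
        "date": dates,
        "building_id": bid,
        "building_type": btype,
        TARGET: base * weekday_effect + rng.normal(0, 3, len(dates)),
    }))
data = add_features(pd.concat(frames, ignore_index=True))

# Feature matrix with one-hot encoded building_type, and the target vector.
numeric_cols = ["day_of_week", "month", "is_weekend", "lag1", "lag7"]
X = pd.get_dummies(data[numeric_cols + ["building_type"]],
                   columns=["building_type"], dtype=float)
y = data[TARGET]

train_df, test_df = time_split(data)
X_train, X_test = X.loc[train_df.index], X.loc[test_df.index]
y_train, y_test = y.loc[train_df.index], y.loc[test_df.index]

models = {
    "linear": LinearRegression(),
    "random_forest": RandomForestRegressor(n_estimators=100, random_state=0),
}
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    rmse = float(np.sqrt(mean_squared_error(y_test, pred)))
    scores[name] = {"rmse": rmse, "r2": float(r2_score(y_test, pred))}
    print(f"{name}: RMSE={rmse:.2f} kWh, R2={scores[name]['r2']:.3f}")

# Impurity-based importances from the fitted forest, largest first.
importances = pd.Series(models["random_forest"].feature_importances_,
                        index=X.columns).sort_values(ascending=False)
print(importances)
```

The lags are computed with a per-building `groupby(...).shift(...)` so that one building's history never leaks into another's features. The scores printed for this toy data say nothing about the real dataset.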

Deliverables

  • Completed Jupyter/Colab notebook implementing feature engineering, Linear Regression, Random Forest, and group-wise error analysis.
  • Tables and/or plots showing predicted vs actual values, feature importances, and group-wise mean absolute errors.
  • Short written interpretation of which model performs better, which features matter most, and whether some building groups have systematically higher prediction errors.
  • If the optional part is done: definition and computation of regression fairness metrics (e.g. fairness gap, fairness ratio) and comparison between models.
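
The group-wise MAE table and the optional fairness gap and ratio could be computed along these lines. The six-row `test_results` frame is invented for illustration, and the helper names `groupwise_mae` and `fairness_summary` are assumptions; only the column names and the two metric definitions come from the tasks above.

```python
import pandas as pd


def groupwise_mae(results, group_col, error_cols):
    """Mean absolute error per group, one column per model."""
    return results.groupby(group_col)[error_cols].mean()


def fairness_summary(mae_by_group):
    """Gap = max MAE - min MAE, ratio = max MAE / min MAE, per model."""
    worst, best = mae_by_group.max(), mae_by_group.min()
    return pd.DataFrame({"fairness_gap": worst - best,
                         "fairness_ratio": worst / best})


# Invented predictions for three building types (assumption, not real results).
test_results = pd.DataFrame({
    "building_type": ["office", "office", "school", "school",
                      "residential", "residential"],
    "y_true":     [100.0, 110.0, 60.0, 64.0, 30.0, 34.0],
    "y_pred_lin": [96.0, 112.0, 66.0, 60.0, 31.0, 33.0],
    "y_pred_rf":  [99.0, 108.0, 62.0, 63.0, 32.0, 31.0],
})
for model in ("lin", "rf"):
    test_results[f"abs_error_{model}"] = (
        test_results["y_true"] - test_results[f"y_pred_{model}"]
    ).abs()

mae_by_type = groupwise_mae(test_results, "building_type",
                            ["abs_error_lin", "abs_error_rf"])
summary = fairness_summary(mae_by_type)
print(mae_by_type)
print(summary)
```

A gap near 0 and a ratio near 1 indicate that errors are spread evenly across groups. The ratio is undefined when the best group has an MAE of exactly 0, and both metrics are in absolute kWh terms, so groups with very different consumption levels may warrant a relative error measure instead.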