{"id":593,"date":"2025-12-07T17:07:25","date_gmt":"2025-12-07T17:07:25","guid":{"rendered":"http:\/\/vargas-solar.com\/data-centric-smart-everything\/?page_id=593"},"modified":"2025-12-07T17:39:37","modified_gmt":"2025-12-07T17:39:37","slug":"challenge-1-data-cleaning-outlier-detection-for-smart-city-energy","status":"publish","type":"page","link":"http:\/\/vargas-solar.com\/data-centric-smart-everything\/challenge-1-data-cleaning-outlier-detection-for-smart-city-energy\/","title":{"rendered":"Challenge 1 \u2013 Data Cleaning &amp; Outlier Detection for Smart-City Energy"},"content":{"rendered":"\n<p>You are part of a smart-city analytics team. The city has installed smart meters in several residential buildings and a public school. The goal is to understand typical and atypical daily energy consumption patterns as a first step towards intelligent demand management.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Learning Objectives<\/h2>\n\n\n\n<p>&#8211; Perform basic data-quality checks on a real energy dataset.<br>&#8211; Handle missing values from a statistical perspective.<br>&#8211; Transform wide smart-meter data into a tidy daily dataset.<br>&#8211; Use k-means clustering to identify atypical (outlier) days.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Material<\/h2>\n\n\n\n<p><strong>Dataset<\/strong>: We use the <em>Household Data<\/em> package from Open Power System Data (OPSD) at 60\u2011minute resolution.<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Package page: https:\/\/data.open-power-system-data.org\/household_data\/2020-04-15\/<\/li>\n\n\n\n<li>Direct CSV (60\u2011minute, single index): <a href=\"https:\/\/data.open-power-system-data.org\/household_data\/2020-04-15\/household_data_60min_singleindex.csv\">https:\/\/data.open-power-system-data.org\/household_data\/2020-04-15\/household_data_60min_singleindex.csv<\/a><\/li>\n<\/ul>\n\n\n\n<p>Initial notebook to open in Colab: <a 
href=\"https:\/\/drive.google.com\/file\/d\/1cZdDM2zz9dnwd-oUzaczpPgExjSIdLxn\/view?usp=sharing\">https:\/\/drive.google.com\/file\/d\/1cZdDM2zz9dnwd-oUzaczpPgExjSIdLxn\/view?usp=sharing<\/a><\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Main Tasks<\/h3>\n\n\n\n<ol class=\"wp-block-list\">\n<li>Load the OPSD 60-minute household data CSV into a pandas DataFrame, parse the timestamp, and inspect the structure.<\/li>\n\n\n\n<li>Perform data-quality checks: compute missing-value ratios, identify problematic columns, and choose a cleaning strategy (e.g. dropping high-NA columns, imputing with forward-fill\/backward-fill or interpolation), with justification.<\/li>\n\n\n\n<li>Select several building-level grid-import variables (e.g. DE_KN_residential1\/3\/4_grid_import, DE_KN_public1_grid_import) and aggregate them from hourly to daily energy consumption using resampling.<\/li>\n\n\n\n<li>Transform the data from a wide format to a tidy long format with at least: date, building_id, daily_grid_import_kwh, building_type (e.g. residential or school), and location (e.g. urban or suburban).<\/li>\n\n\n\n<li>Use k-means clustering on daily energy plus simple calendar features (e.g. 
day_of_week) to identify clusters of days and flag potential outlier days (for example, the days assigned to the smallest cluster).<\/li>\n\n\n\n<li>Store the resulting daily dataset as energy_daily_features.csv, in your Google Drive or locally, for use in Challenge 3.<\/li>\n<\/ol>\n\n\n\n<h3 class=\"wp-block-heading\">Deliverables (Challenge 1)<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Completed Jupyter\/Colab notebook implementing the data cleaning, aggregation, and k-means clustering steps.<\/li>\n\n\n\n<li>Tidy CSV file <strong>energy_daily_features.csv<\/strong> containing the daily energy dataset with metadata.<\/li>\n\n\n\n<li>Short written justification (within the notebook) of the data-cleaning decisions and a brief interpretation of the detected outlier days.<\/li>\n<\/ul>\n","protected":false},"excerpt":{"rendered":"<p>You are part of a smart-city analytics team. The city has installed smart meters in several residential buildings and a public school. The goal is to understand typical and atypical daily energy consumption patterns as a first step towards intelligent demand management. 
Learning Objectives &ndash; Perform basic data-quality checks on a real energy dataset.&ndash; Handle [&hellip;]<\/p>\n","protected":false},"author":11,"featured_media":0,"parent":0,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"page-templates\/full-width.php","meta":{"footnotes":""},"class_list":["post-593","page","type-page","status-publish","hentry"],"_links":{"self":[{"href":"http:\/\/vargas-solar.com\/data-centric-smart-everything\/wp-json\/wp\/v2\/pages\/593","targetHints":{"allow":["GET"]}}],"collection":[{"href":"http:\/\/vargas-solar.com\/data-centric-smart-everything\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"http:\/\/vargas-solar.com\/data-centric-smart-everything\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"http:\/\/vargas-solar.com\/data-centric-smart-everything\/wp-json\/wp\/v2\/users\/11"}],"replies":[{"embeddable":true,"href":"http:\/\/vargas-solar.com\/data-centric-smart-everything\/wp-json\/wp\/v2\/comments?post=593"}],"version-history":[{"count":2,"href":"http:\/\/vargas-solar.com\/data-centric-smart-everything\/wp-json\/wp\/v2\/pages\/593\/revisions"}],"predecessor-version":[{"id":597,"href":"http:\/\/vargas-solar.com\/data-centric-smart-everything\/wp-json\/wp\/v2\/pages\/593\/revisions\/597"}],"wp:attachment":[{"href":"http:\/\/vargas-solar.com\/data-centric-smart-everything\/wp-json\/wp\/v2\/media?parent=593"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}