{"id":595,"date":"2025-12-07T17:09:30","date_gmt":"2025-12-07T17:09:30","guid":{"rendered":"http:\/\/vargas-solar.com\/data-centric-smart-everything\/?page_id=595"},"modified":"2025-12-07T17:39:46","modified_gmt":"2025-12-07T17:39:46","slug":"challenge-2-data-quality-fairness-sql-sampling-with-aif360","status":"publish","type":"page","link":"http:\/\/vargas-solar.com\/data-centric-smart-everything\/challenge-2-data-quality-fairness-sql-sampling-with-aif360\/","title":{"rendered":"Challenge 2 \u2013 Data Quality, Fairness &amp; SQL Sampling with AIF360"},"content":{"rendered":"\n<p>You are designing an AI system that will help a city decide which households should be offered <em>energy-efficiency support measures<\/em> (e.g. subsidies, home retrofits). Before deploying any model, you must evaluate and <em>mitigate potential bias<\/em> in the training data.<\/p>\n\n\n\n<p>You now consider a socio-economic dataset that could be used to decide which households are eligible for these measures. 
<\/p>\n\n\n\n<p>This challenge introduces fairness concepts using the Adult Census Income dataset and the IBM AI Fairness 360 (AIF360) library, and connects them to data management via SQL-based fair sampling.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Material<\/h2>\n\n\n\n<p>To focus on fairness methods, we use the <em>Adult Census Income<\/em> dataset from the UCI Machine Learning Repository, which IBM\u2019s <em>AI Fairness 360 (AIF360)<\/em> toolkit exposes through its AdultDataset loader.<\/p>\n\n\n\n<p>&#8211; UCI Adult dataset page: <a href=\"https:\/\/archive.ics.uci.edu\/dataset\/2\/adult\">https:\/\/archive.ics.uci.edu\/dataset\/2\/adult<\/a><br>&#8211; Colab initial notebook: <a href=\"https:\/\/drive.google.com\/file\/d\/1nrMxSxZ7pD9oX7uSH5TrZk8pZ_xs16Pl\/view?usp=sharing\">https:\/\/drive.google.com\/file\/d\/1nrMxSxZ7pD9oX7uSH5TrZk8pZ_xs16Pl\/view?usp=sharing<\/a><\/p>\n\n\n\n<p>Think of the label \u201chigh income\u201d as a proxy for the ability to invest in energy-saving technologies, and demographic attributes (sex, race, etc.) 
as potential sources of unfair bias in a smart-city programme.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Learning objectives<\/h2>\n\n\n\n<p>&#8211; Perform basic quality checks on a socio-economic dataset.<br>&#8211; Use AIF360 to compute fairness metrics for a binary outcome.<br>&#8211; Apply the <em>Reweighing<\/em> algorithm to reduce bias in the dataset.<br>&#8211; Implement <em>fair sampling<\/em> strategies in SQL (balanced sampling by group).<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Main Tasks<\/h3>\n\n\n\n<ol class=\"wp-block-list\">\n<li>Load the Adult dataset via the AIF360 AdultDataset class and convert it to a pandas DataFrame.<\/li>\n\n\n\n<li>Perform basic data-quality checks: inspect shape, columns, missing values, and descriptive statistics, and comment on potential quality issues.<\/li>\n\n\n\n<li>Define a protected attribute (sex) with privileged (Male) and unprivileged (Female) groups, and use BinaryLabelDatasetMetric to compute fairness metrics such as statistical parity difference and disparate impact.<\/li>\n\n\n\n<li>Apply the Reweighing preprocessing algorithm to obtain a reweighted dataset that aims to reduce bias; recompute the fairness metrics and compare them with those of the original dataset.<\/li>\n\n\n\n<li>Create an in-memory SQLite database from the DataFrame and implement a balanced sampling query using a window function (ROW_NUMBER() OVER (PARTITION BY sex ORDER BY RANDOM())) to build a training sample with equal representation of each sex.<\/li>\n\n\n\n<li>Convert the SQL-balanced sample back into an AIF360 BinaryLabelDataset and recompute the fairness metrics to compare with the original and reweighted datasets.<\/li>\n<\/ol>\n\n\n\n<h3 class=\"wp-block-heading\">Deliverables<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Completed Jupyter\/Colab notebook implementing fairness metrics with AIF360 and SQL-based fair sampling.<\/li>\n\n\n\n<li>Short written comparison of fairness metrics for three cases: original dataset, reweighted 
dataset, and SQL-balanced sample.<\/li>\n\n\n\n<li>Brief reflection (within the notebook) on how these fairness techniques could be applied in energy-related decision-making (e.g. targeting energy-efficiency support).<\/li>\n<\/ul>\n","protected":false},"excerpt":{"rendered":"<p>You are designing an AI system that will help a city decide which households should be offered energy-efficiency support measures (e.g. subsidies, home retrofits). Before deploying any model, you must evaluate and mitigate potential bias in the training data. You now consider a socio-economic dataset that could be used to decide which households are eligible [&hellip;]<\/p>\n","protected":false},"author":11,"featured_media":0,"parent":0,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"page-templates\/full-width.php","meta":{"footnotes":""},"class_list":["post-595","page","type-page","status-publish","hentry"],"_links":{"self":[{"href":"http:\/\/vargas-solar.com\/data-centric-smart-everything\/wp-json\/wp\/v2\/pages\/595","targetHints":{"allow":["GET"]}}],"collection":[{"href":"http:\/\/vargas-solar.com\/data-centric-smart-everything\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"http:\/\/vargas-solar.com\/data-centric-smart-everything\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"http:\/\/vargas-solar.com\/data-centric-smart-everything\/wp-json\/wp\/v2\/users\/11"}],"replies":[{"embeddable":true,"href":"http:\/\/vargas-solar.com\/data-centric-smart-everything\/wp-json\/wp\/v2\/comments?post=595"}],"version-history":[{"count":3,"href":"http:\/\/vargas-solar.com\/data-centric-smart-everything\/wp-json\/wp\/v2\/pages\/595\/revisions"}],"predecessor-version":[{"id":599,"href":"http:\/\/vargas-solar.com\/data-centric-smart-everything\/wp-json\/wp\/v2\/pages\/595\/revisions\/599"}],"wp:attachment":[{"href":"http:\/\/vargas-solar.com\/data-centric-smart-everything\/wp-json\/wp\/v2\/media?parent=595"}],"curies":[{"name":"wp","href":"https:\/\/api.w.or
g\/{rel}","templated":true}]}}