DEI activity for the Doctoral Consortium

D&I perspectives in data-driven experiments

Contact

Barbara Catania, University of Genoa, Italy
Martina Brocchi, University of Genoa, Italy

Material
Notebook (will be distributed during the session)

Context and objective

Following what was done at ADBIS 2023, we propose a practical activity in the context of the Doctoral Consortium (DC) program at ADBIS 2024 to raise awareness among PhD students of how to adopt D&I perspectives when performing data-driven experiments.

Considering that human-related data are the backbone of many computer and data science experiments, the activity’s objective is to make students aware of the implications of using biased human-related datasets during the pre-processing stage of data analytics pipelines. 

Background

A dataset can be considered biased if it does not represent the actual population for the values of specific attributes of interest, called protected or sensitive attributes. Which attributes are protected depends on the domain; typical examples are race, gender, and age. Bias can be detected by computing specific fairness-related metrics, which compare in various ways the distributions of the groups of interest, or by checking for under-represented parts of the target population in the dataset; the check may or may not take the analytical task to be performed into account (algorithmic vs. representation bias). In both cases, bias can originate from how and where the data was initially collected, or it can be introduced, and sometimes amplified, during the data preparation steps preceding any analytical task. Working with data that are not representative of a given population can make the outcome of a decision system unreliable for that population. In other situations, even if the result is trustworthy, it might be illegal or unwanted to base decisions on such attributes for domain-dependent reasons.
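As a minimal illustration of a fairness-related metric (with hypothetical toy data, not the dataset used in the session notebook), one can compare positive-outcome rates across the groups of a protected attribute, e.g. via the statistical parity difference:

```python
# Toy sketch (hypothetical data): detecting bias on a protected attribute
# by comparing outcome distributions across groups.
from collections import Counter

# Hypothetical (gender, positive_outcome) records, e.g. from a hiring dataset.
records = [
    ("F", 1), ("F", 0), ("F", 0), ("F", 0),
    ("M", 1), ("M", 1), ("M", 1), ("M", 0),
]

def positive_rate(records, group):
    """Fraction of positive outcomes within one protected group."""
    outcomes = [y for g, y in records if g == group]
    return sum(outcomes) / len(outcomes)

# Statistical parity difference: gap in positive-outcome rates between groups.
# A value close to 0 suggests parity; here the gap is large.
spd = positive_rate(records, "M") - positive_rate(records, "F")
print(round(spd, 2))  # 0.5

# Representation check: how each group is represented in the dataset,
# to be compared against its share in the target population.
group_counts = Counter(g for g, _ in records)
print(group_counts)
```

In the session, such metrics would typically be computed with dedicated fairness libraries rather than by hand; the snippet above only fixes the intuition of comparing group-wise distributions.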

Learning outcomes

During the proposed D&I activity, students will:

  • understand the metrics for bias detection during a machine learning task;
  • understand how bias can be mitigated before applying a machine learning task, with a particular reference to classification (algorithmic bias);
  • understand how bias can be introduced or amplified by data transformation operations, typical of the data preparation stage, independently from the following analytical steps (representation bias).
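As a small sketch of the third point (with hypothetical data, not the notebook's), a routine preparation step such as dropping incomplete rows can change group proportions, introducing representation bias independently of any downstream analytical task:

```python
# Hypothetical rows where missing values are concentrated in one group.
rows = [
    {"gender": "F", "income": None},
    {"gender": "F", "income": None},
    {"gender": "F", "income": 30000},
    {"gender": "M", "income": 40000},
    {"gender": "M", "income": 35000},
    {"gender": "M", "income": 50000},
]

def group_share(rows, group):
    """Share of one protected group in the dataset."""
    return sum(r["gender"] == group for r in rows) / len(rows)

print(group_share(rows, "F"))  # 0.5 before cleaning

# Typical data preparation step: drop rows with a missing income.
cleaned = [r for r in rows if r["income"] is not None]

print(group_share(cleaned, "F"))  # 0.25 after cleaning: "F" is now under-represented
```

Because missingness correlated with the protected attribute, the cleaning step halved one group's share, even though no analytical task was involved yet.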

Tasks to do

The activity is organised into three stages:

  1. The reference topic will be briefly presented at the beginning, focusing on the main concepts forming the basis of the following practical activity.
  2. A practical but completely guided activity, based on a Google Colab notebook, will then be proposed to students.
    N.B.: Depending on their initial skills, students can customise and extend some parts, investigating specific issues in more detail.
  3. Students will finally present to the audience the potential impact of what they learned during the activity on their research.