RESEARCH

Verifying intersectional, diverse, and inclusive analytics in federated learning

The intersectional perspective for addressing diversity and inclusion can be framed as a multi-objective optimisation problem in which constraints associated with different data attributes must be “protected”, so that their statistical distribution remains “balanced” to some extent along all the steps of a data science pipeline implemented in a federated learning setting. For example, if we privilege one gender, do we harm fairness with respect to race, age or location? Given the independence of the nodes participating in a federated learning setting, each node performs a pipeline and must declare to what extent it guarantees that its data and analytics process meet a multi-objective bias level, according to a specification derived from a global research question. This leads to the research questions addressed in the project:
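To make the multi-objective framing concrete, the following Python sketch expresses the intersectional specification as a set of per-attribute balance constraints that must hold simultaneously. The attribute names, the parity-gap measure, and the thresholds are illustrative assumptions, not the project's method:

    from collections import defaultdict

    def parity_gap(records, attribute, outcome_key="outcome"):
        """Largest difference in positive-outcome rate between any two
        groups of `attribute`; 0 means the attribute is perfectly balanced."""
        totals, positives = defaultdict(int), defaultdict(int)
        for r in records:
            group = r[attribute]
            totals[group] += 1
            positives[group] += r[outcome_key]
        rates = [positives[g] / totals[g] for g in totals]
        return max(rates) - min(rates)

    def satisfies_spec(records, spec):
        """spec maps each protected attribute to the maximum tolerated gap."""
        return all(parity_gap(records, attr) <= tol for attr, tol in spec.items())

    # Tightening one objective (gender) may push another (race, age,
    # location) out of bounds: exactly the intersectional trade-off above.
    spec = {"gender": 0.05, "race": 0.05, "age": 0.10, "location": 0.10}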

(RQ1) How to estimate bias metrics in the data managed and the models produced in each node? Different nodes can manage heterogeneous data that do not necessarily contain all the attributes that should be protected from bias. Therefore, the global level must ensure that, once the produced models are aggregated, the bias constraints can still be verified. The U. Genova and DB groups at LIRIS will contribute their expertise to address this question.
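A minimal sketch of what a node-level estimate could look like, assuming (hypothetically) that each node publishes per-group sufficient statistics only for the protected attributes it actually manages, so that the global level can re-verify the constraints after aggregation:

    def local_bias_report(records, protected_attributes, outcome_key="outcome"):
        """Per-group counts and positive outcomes, restricted to the
        protected attributes this node actually holds."""
        report = {}
        for attr in protected_attributes:
            if any(attr not in r for r in records):
                continue  # heterogeneous data: this node lacks the attribute
            counts, positives = {}, {}
            for r in records:
                g = r[attr]
                counts[g] = counts.get(g, 0) + 1
                positives[g] = positives.get(g, 0) + r[outcome_key]
            report[attr] = {"counts": counts, "positives": positives}
        return report

    def global_parity_gap(reports, attr):
        """Recompute the federation-wide gap for `attr` from the sufficient
        statistics of every node that reported it."""
        counts, positives = {}, {}
        for rep in reports:
            for g, n in rep.get(attr, {"counts": {}})["counts"].items():
                counts[g] = counts.get(g, 0) + n
                positives[g] = positives.get(g, 0) + rep[attr]["positives"][g]
        rates = [positives[g] / counts[g] for g in counts]
        return max(rates) - min(rates)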

(RQ2) How to assess compliance with an expected fairness index (the level of bias in data and models) across the federation? We will draw inspiration from techniques such as mean difference and disparate impact, adapting them to deal with multiple variables at both the local and global levels of the federation. The expertise of U. Genova will guide the rest of the partners in addressing this question.
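For reference, the two baseline measures named above have standard single-attribute definitions, sketched here for a binary privileged/unprivileged split; the project's adaptation to multiple, intersecting variables is the open part of the question (the function names are ours):

    def mean_difference(outcomes_unpriv, outcomes_priv):
        """E[Y | unprivileged] - E[Y | privileged]; 0 indicates parity."""
        return (sum(outcomes_unpriv) / len(outcomes_unpriv)
                - sum(outcomes_priv) / len(outcomes_priv))

    def disparate_impact(outcomes_unpriv, outcomes_priv):
        """P(Y=1 | unprivileged) / P(Y=1 | privileged); 1 indicates parity,
        and values below 0.8 fail the common four-fifths rule."""
        return ((sum(outcomes_unpriv) / len(outcomes_unpriv))
                / (sum(outcomes_priv) / len(outcomes_priv)))

    # Example: unprivileged rate 0.5 against privileged rate 0.75
    print(disparate_impact([1, 0, 0, 1], [1, 1, 0, 1]))  # ~0.667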

(RQ3) How to negotiate a node’s participation in the federation such that an expected global fairness index can be fulfilled even though the node might only partially fulfil that expectation? The teams BD, SOC and DRIM, together with U. Milano, are currently working on a general proposal for certification-based trust negotiation; this work will serve as a basis for addressing this question.
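As a purely hypothetical illustration of the negotiation outcome (not the certification-based protocol under development), a candidate node could declare the fairness index it can certify, and the federation would admit it only if the data-weighted global index still meets the expectation:

    def admit(current_nodes, candidate, expected_global_index):
        """current_nodes and candidate are (certified_index, n_samples)
        pairs; the global index is modelled here as a weighted average."""
        nodes = current_nodes + [candidate]
        total = sum(n for _, n in nodes)
        global_index = sum(idx * n for idx, n in nodes) / total
        return global_index >= expected_global_index

    # A node that only partially fulfils the expectation (0.72 against a
    # 0.80 target) can still join when the other nodes compensate:
    print(admit([(0.90, 5000), (0.85, 3000)], (0.72, 1000), 0.80))  # True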

(RQ4) How to verify that the result and the process performed under federated learning fulfil an intersectional specification, while gathering only the minimum data about the conditions in which the partial models were produced? This challenge calls for the expertise of all partners.
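One way to picture the “minimum data” in question, under our own assumed field names rather than a fixed schema: each partial model travels with a small provenance record, and the global verifier checks the declared values against the specification without ever seeing raw data:

    from dataclasses import dataclass

    @dataclass
    class PartialModelProvenance:
        node_id: str
        spec_version: str          # which intersectional specification was checked
        attributes_covered: tuple  # protected attributes this node holds
        declared_gaps: dict        # per-attribute bias gaps, as declared locally

    def verify(provenances, spec):
        """Accept the federated result only if every node's declared gap
        stays within the specification, for every attribute it covers."""
        return all(p.declared_gaps[attr] <= spec[attr]
                   for p in provenances
                   for attr in p.attributes_covered)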

(RQ5) What are the implications of deploying the execution of subtasks on specific target architectures whose different computing and storage resources can introduce different levels of precision in the assessed metrics? The expertise of the teams SOC, DRIM and UDLAP-LNS will contribute to addressing this question.
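The precision concern is concrete: standard floating-point behaviour alone shows that a node restricted to 32-bit arithmetic can silently stop updating a large accumulator, so metrics aggregated from such counters diverge from their 64-bit counterparts. The example below uses NumPy and is illustrative, not a project result:

    import numpy as np

    counter32 = np.float32(2**24)   # 16,777,216 samples accumulated so far
    counter64 = np.float64(2**24)

    # In 32-bit precision the next sample is silently lost ...
    print(counter32 + np.float32(1.0) == counter32)  # True
    # ... while 64-bit precision still registers it.
    print(counter64 + 1.0 == counter64)              # False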