CONTRIBUTION

The originality of the project lies in building a data lake that will include (1) raw collected data representing Life and Earth Sciences phenomena (streams, batch, multimedia, proprietary); (2) data produced during data-driven experiments that adopt data science techniques, including artificial intelligence algorithms (ML-driven data lakes); and (3) contextual data describing the conditions under which data are collected and experiments are designed and enacted. The data lake will provide data curation modules for extracting metadata according to a well-adapted model, as well as modules for exploring the data and using them to design new experiments, thereby adopting an open science perspective.

The contribution will be a data lake with a well-adapted metadata model for Life and Earth Sciences experiments that consume and produce quantitative and qualitative data. We will define associated exploration operators and pipelines to exploit this content for maintaining existing experiments and developing new ones. The data lake will be tested in real scenarios in collaboration with domain experts in seismology and biodiversity studies in Brazil.