CHALLENGES

  1. Metadata modelling: how to structure and organize the metadata related to Life and Earth Sciences, namely seismic and biodiversity data. The metadata model must make the content of a data lake findable, accessible, interoperable, and reusable (FAIR Principles [6]). Since most proposed models are either logic-based or graph-structured, the project will rely on the goldMEDAL metamodel proposed by the SID group of the ERIC Lab [3, 5], which will be specialised for seismic geophysics and biodiversity data. Metadata will be associated with data through a curation process that considers both quantitative and qualitative perspectives. Metadata will represent the structural, semantic and contextual aspects of the data (provenance, and the conditions and hypotheses under which analytics results are obtained, i.e., ML-driven metadata). A side “product” will be the identification of invariants shared with other data lakes, so that research data lakes can later be “industrialized”, i.e., deployed semi-automatically.
  2. Since experiments require several data collections, the project will also address the challenge of integrating data within the data lake. Integration must be carried out within a pipeline that includes data discovery, exploration, selection, and integration. This process will be designed according to the requirements of the Life and Earth Sciences experiments addressed by the Database group at LIRIS [2]. Three aspects call for original contributions in the design, maintenance and exploration of data lakes: the heterogeneity of the data (text, signals, multimedia, proprietary formats stemming from seismographs), its velocity (seismic sensors often produce data as streams), and its volume.
  3. The entry point will be two pilot experiments: (1) the classification of seismic signals collected by stations across different observations, in order to detect “natural” and human-induced seismic events in the Northern region of Brazil; and (2) the classification of in-situ observations of the “carabela portuguesa” (Portuguese man o' war) jellyfish and the modelling of its behaviour along the Brazilian coast.
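
As an illustration of challenges 1 and 2, the following minimal Python sketch shows how a metadata record could separate structural, semantic and contextual aspects, and how a catalog of such records could support the discovery, exploration, selection and integration steps. It does not reproduce the goldMEDAL metamodel. All class names, field names and sample values (`MetadataRecord`, `Catalog`, `sig-001`, `station-A`, and so on) are hypothetical and chosen only for the example.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class MetadataRecord:
    """Hypothetical metadata entry for one dataset stored in the lake."""
    dataset_id: str
    structural: dict                                 # e.g. file format, schema
    semantic: dict                                   # e.g. scientific domain
    contextual: dict = field(default_factory=dict)   # e.g. provenance


class Catalog:
    """Toy in-memory metadata catalog (illustrative only)."""

    def __init__(self):
        self._records = {}

    def register(self, record):
        self._records[record.dataset_id] = record

    def discover(self, domain):
        """Discovery: find datasets whose semantic metadata match a domain."""
        return [r for r in self._records.values()
                if r.semantic.get("domain") == domain]

    @staticmethod
    def explore(records):
        """Exploration: count the discovered datasets per structural format."""
        return Counter(r.structural.get("format") for r in records)

    @staticmethod
    def select(records, fmt):
        """Selection: keep only the datasets stored in a given format."""
        return [r for r in records if r.structural.get("format") == fmt]

    @staticmethod
    def integrate(records):
        """Integration: describe the combined view and keep its provenance."""
        return {
            "members": sorted(r.dataset_id for r in records),
            "provenance": sorted({r.contextual.get("source", "unknown")
                                  for r in records}),
        }


# Sample values below are invented for the example.
catalog = Catalog()
catalog.register(MetadataRecord("sig-001", {"format": "miniseed"},
                                {"domain": "seismology"}, {"source": "station-A"}))
catalog.register(MetadataRecord("sig-002", {"format": "miniseed"},
                                {"domain": "seismology"}, {"source": "station-B"}))
catalog.register(MetadataRecord("sig-003", {"format": "csv"},
                                {"domain": "seismology"}, {"source": "station-A"}))
catalog.register(MetadataRecord("obs-001", {"format": "jpeg"},
                                {"domain": "biodiversity"}, {"source": "field-survey"}))

found = catalog.discover("seismology")
formats = Catalog.explore(found)
view = Catalog.integrate(Catalog.select(found, "miniseed"))
```

In this sketch the integrated view only lists the member datasets and their sources. An actual pipeline would also have to reconcile schemas and handle streamed data, which the sketch leaves out.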