TUTORIAL – AICCSA’23

This tutorial introduces provenance-aware data curation techniques for hybrid (classic and data-driven) methods applied to scientific datasets. We illustrate the approach by curating earth and biodiversity data collected using different strategies. We show how provenance gives insight into the trustworthiness of the content produced throughout the tasks that address earth and biodiversity problems, and into the conditions under which earth and biodiversity events and phenomena are identified.

  • Introduction: from scientific workflows to data science pipelines for addressing challenges in the experimental sciences
  • Curation as an enabling action for maintaining content produced in scientific practice [PDF]
    • Quantitative and qualitative methodologies: curating content and processes
    • Provenance as a transversal category for data-driven methods
    • Biodiversity and Earth science provenance-based curation
      • Data wrangling techniques
      • Scientific repositories
  • Data provenance [PDF]
    • Provenance in databases: lineage, why-provenance, how-provenance, and where-provenance [CCT09]
    • Capturing provenance in scientific workflows [ZAI19]
    • Provenance within data science pipelines: debugging pipelines [LFS+23] and ensuring fairness in developing machine learning models [GGSS21]
  • Use case: curating earth and biodiversity phenomena detection applications [PDF]
    • Provenance in Earth Sciences
    • Seismic data wrangling
  • Final takeaways