HANDS-ON: LAB

Complete syllabus here: LIS-4112

  • Definition and reference architecture [slides][YouTube]
  • Use of storage and distributed computing resources
    • Data Labs and first experience with (big) data processing pipelines
      • [Environment][YouTube]
      • Getting started with the data science ecosystem [HO-1]
      • Exploring data collections using descriptive statistics [HO-2]
      • Analysing graphs [HO-3]
    • Data distribution: distributed file systems (e.g., HDFS), NoSQL/NewSQL databases
    • Parallel programming models (e.g., MapReduce) and execution environments (e.g., Hadoop, Spark) [YouTube]
  • Virtualization techniques: hypervisors vs. containers, distributed resource brokers
  • Cloud virtual machines for data science (big data analytics) [slides][YouTube]
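As a rough preview of the MapReduce programming model listed above, the following single-process Python sketch walks through its three phases (map, shuffle, reduce) on the classic word-count task. It is an illustration only, not Hadoop or Spark code. The function names `map_phase`, `shuffle`, `reduce_phase` and `word_count` are hypothetical. In a real execution environment the shuffle step is performed by the framework across machines.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(lines: Iterable[str]) -> Iterator[tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in every input line."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle: group intermediate values by key (done by the framework in Hadoop/Spark)."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def reduce_phase(groups: dict[str, list[int]]) -> dict[str, int]:
    """Reduce: combine all values for each key into a single count."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(lines: Iterable[str]) -> dict[str, int]:
    """Hypothetical driver chaining the three phases in one process."""
    return reduce_phase(shuffle(map_phase(lines)))


if __name__ == "__main__":
    docs = ["big data big ideas", "data pipelines"]
    print(word_count(docs))  # {'big': 2, 'data': 2, 'ideas': 1, 'pipelines': 1}
```

Keeping map and reduce as pure functions over key/value pairs is what lets a framework run many copies of them in parallel on different data partitions.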