HANDS-ON: LAB | Cloud Computing and Big Data

Complete syllabus here: LIS-4112

Definition and reference architecture [slides] [YouTube]
- Principle and reference architecture [slides]
- Service as a unit of construction [URL] [HO-1][YouTube][YouTube][YouTube]
- Pay-as-you-go models
Use of storage and distributed computing resources
- Data Labs and first experience with (big) data processing pipelines
  - [Environment][YouTube]
  - Getting started with the data science ecosystem [HO-1]
  - Exploring data collections using descriptive statistics [HO-2]
  - Analysing graphs [HO-3]
- Data distribution: distributed file systems (e.g., HDFS), NoSQL/NewSQL
- Parallel programming models (MapReduce) and execution environments (e.g., Hadoop, Spark) [YouTube]
  - Hadoop: Exercise-1
  - Spark: Exercise-2
Virtualization techniques: hypervisors vs. containers, distributed resource brokers
- Introduction to DevOps: virtual machines for running Spark programs on MS Azure [slides][Exercise][YouTube]
- Componentization: playing with Docker [slides] [Exercise-1][Exercise-2][YouTube][YouTube]
Cloud virtual machines for data science (big data analytics) [slides][YouTube]