CONTENT | Cloud Computing and Big Data

Complete syllabus here: LIS-4102

Introduction: dealing with data at scale [slides] [YouTube][YouTube-2]
- Datification and Data properties
- Data-centric applications at scale
- Computing centres: hardware and resources delivery

Distributed data management and storage
- Cluster based data stores [slides][YouTube] [YouTube-2]
  - [MongoExamples] [slides][slides-2][slides-3]
    - Querying: [YouTube-1] [YouTube-2][YouTube-3][YouTube-4]
    - Sharding: [YouTube]
  - Graph databases [slides] [YouTube]
    - Cypher [YouTube]
    - [Neo4JExample]
  - [Polyglot UseCase]
  - Non-functional properties: concurrency, eventual consistency, …
- Distributed archival systems [slides]
  - Distributed File Systems
  - Data Labs
  - Data Lakes

Big data processing and analysis: parallel programming models [YouTube] [YouTube][YouTube][YouTube][YouTube] [YouTube][YouTube]
- Map Reduce: families of algorithms and patterns [slides – part A] [slides – part B][Glossary]
- Data flow-based models: operators, data representation, management [slides-part A] [slides-part B]
  - Spark programming Use Case [exercise]

Ecosystems for massive data management and processing
- Virtualisation [slides][YouTube][YouTube]
- Containers [slides] [YouTube] [YouTube]
- High-performance architectures: cluster, HPC, cloud, fog, edge, just-in-time architectures [slides]

Perspectives: open problems and trends [slides][YouTube-1][YouTube-2]
- Data processing and data management divide
- Data processing workflows: design, test, deployment, and maintenance