Notes: Further instructions are available on Google Docs
Using big data analytics stacks for managing data collections
- Ex 1 Studying Data Sharding using MongoDB
- Sharding strategies
- Sharding on-cloud
- Sharding-MongoDB-debate
Exploring data collections: control-flow & data-flow oriented approaches
- Ex 2 Analyzing Large Data Collections with Apache Pig
- Pig Fundamentals
- Corollary: Data exploration
- Ex 3 Processing data streams [2DO2HANDIN]
- Ex 4 Playing with graphs using GraphX (slides) (data.zip) (exercise and solution)
- Ex 5 K-means with Spark & Hadoop [KDD Cup description + data] (slides)
Understanding data collections: machine learning & visualisation
- Ex 6 Processing column-oriented data [Spark SQL] (slides)