Part I (5 hours)
Introduction
- Principle and objective
- Dealing with huge data collections
- Practical exercise 1: thinking in MapReduce, counting words with Pig
Part II (10 hours)
Summarization patterns
- Numerical summarization
- Inverted index summarization
- Counting with counters
Filtering patterns
- Filtering
- Bloom filtering
- Top ten
- Distinct
Part III (12 hours)
Data organization patterns
- Structured to hierarchical
- Partitioning
- Binning
- Total order sorting
- Shuffling
Part IV (12 hours)
Join patterns
- Do you remember how to program joins?
- Reduce side join
- Replicated join
- Composite join
- Cartesian product
Part V (4 hours)
Wrapping up: analysis, limitations, perspectives
- Discussion material