Objective
The general objective of this exercise is to take the first steps in using a Hadoop environment to execute map-reduce programs written in Python. This first exercise will show how to install a single-node Hadoop setup on Google Colab, and how to implement and run a map-reduce program.
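As an illustration of what the single-node installation involves, the following is a rough sketch of the kind of Colab cells used to set up Hadoop. The Hadoop version (3.3.6), download mirror, Java package name and paths are assumptions made for illustration; the exact commands are those in the notebook listed under Material.

    # Install a Java runtime, which Hadoop requires (package name assumed for the Colab VM).
    !apt-get install -y openjdk-11-jdk-headless

    # Download and unpack a Hadoop release (version and mirror are assumptions).
    !wget -q https://archive.apache.org/dist/hadoop/common/hadoop-3.3.6/hadoop-3.3.6.tar.gz
    !tar -xzf hadoop-3.3.6.tar.gz

    # Point Hadoop at the Java installation and put its binaries on the PATH.
    import os
    os.environ["JAVA_HOME"] = "/usr/lib/jvm/java-11-openjdk-amd64"
    os.environ["PATH"] += os.pathsep + "/content/hadoop-3.3.6/bin"

    # Sanity check: print the installed Hadoop version.
    !hadoop version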
Material
- Google Colab account
- https://github.com/gevargas/bigdata-management/blob/master/Intro_Hadoop.ipynb
Description
The main steps of the exercise are simple. For now, the exercise does not run on a cluster but on the single CPU allocated by default by Google Colab. This makes it easier to concentrate on how the map and reduce functions are specified and how a program is designed following the map-reduce model.
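To make the programming model concrete, below is a minimal word-count mapper and reducer of the kind used with Hadoop Streaming, where each function is a small Python script reading from standard input and writing to standard output. The file names mapper.py and reducer.py are illustrative; the scripts in the notebook may differ.

    # mapper.py -- emit one "<word> TAB 1" pair per word read from standard input.
    import sys

    for line in sys.stdin:
        for word in line.strip().split():
            print(f"{word}\t1")

    # reducer.py -- input arrives sorted by key, so counts for the same word are adjacent.
    import sys

    current_word, current_count = None, 0
    for line in sys.stdin:
        line = line.strip()
        if not line:
            continue
        word, count = line.rsplit("\t", 1)
        if word == current_word:
            current_count += int(count)
        else:
            if current_word is not None:
                print(f"{current_word}\t{current_count}")
            current_word, current_count = word, int(count)
    if current_word is not None:
        print(f"{current_word}\t{current_count}")

The pair can be tested locally, without Hadoop, with a shell pipeline such as: cat input.txt | python mapper.py | sort | python reducer.py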
To Do and To Hand In
- Propose a UML component diagram of the Hadoop environment installed on Colab.
- Propose a UML component diagram of the two map-reduce word-count programs tested in the lab.
- Explain how the first example, implementing a grep operation with a regular expression, is executed (an illustrative command sketch follows this list).
- Explain how the "count words" program is executed in the example.
- What is the role of Google Drive in these examples?
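For reference when answering the two questions about job execution, the commands below sketch how such jobs are typically launched from a Colab cell. The jar locations, version number, input/output paths and the regular expression are assumptions, not taken from the notebook.

    # Grep example: a MapReduce job shipped with Hadoop that counts the matches of a
    # regular expression in the input files and writes the counts to the output directory.
    !hadoop jar hadoop-3.3.6/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.3.6.jar \
        grep input output 'dfs[a-z.]+'

    # Word-count example via Hadoop Streaming: input splits are piped through the mapper,
    # the framework sorts the intermediate pairs by key, and the reducer sums the counts.
    !hadoop jar hadoop-3.3.6/share/hadoop/tools/lib/hadoop-streaming-3.3.6.jar \
        -files mapper.py,reducer.py \
        -mapper "python mapper.py" \
        -reducer "python reducer.py" \
        -input input \
        -output wordcount_output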