Executing Map-Reduce Programs on Hadoop Environments

Objective

The general objective of this exercise is to take the first steps in using a Hadoop environment to execute map-reduce programs written in Python. This first exercise will show how to install a single-node Hadoop setup on Google Colab, and how to implement and run a map-reduce program on it.
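As an illustration, installing a single-node Hadoop on Colab essentially amounts to downloading a Hadoop distribution into the Colab virtual machine and pointing the environment variables at it. The following Python cell is a minimal sketch of that step; the Hadoop version, the Apache mirror URL and the Java path are assumptions and must be adapted to the lab material.

    # Minimal sketch of a single-node Hadoop installation from a Colab cell.
    # Assumed: Hadoop 3.3.6, the Apache mirror below, and the default OpenJDK
    # location of the Colab virtual machine; adapt these to the lab material.
    import os
    import subprocess

    HADOOP_VERSION = "3.3.6"  # assumed version
    HADOOP_URL = (
        "https://downloads.apache.org/hadoop/common/"
        f"hadoop-{HADOOP_VERSION}/hadoop-{HADOOP_VERSION}.tar.gz"
    )

    # Download and unpack the distribution into the Colab file system.
    subprocess.run(["wget", "-q", HADOOP_URL], check=True)
    subprocess.run(["tar", "-xzf", f"hadoop-{HADOOP_VERSION}.tar.gz"], check=True)

    # Make the hadoop command visible to later cells.
    os.environ["JAVA_HOME"] = "/usr/lib/jvm/java-11-openjdk-amd64"  # assumed Java path
    os.environ["HADOOP_HOME"] = os.path.abspath(f"hadoop-{HADOOP_VERSION}")
    os.environ["PATH"] = os.environ["HADOOP_HOME"] + "/bin:" + os.environ["PATH"]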

Material

 

Description

The main steps of the exercise are simple. For now, the exercise does not run on a cluster but on the single CPU allocated by default by Google Colab. This helps to concentrate on the way the map and reduce functions are specified and on how a program is designed following the map-reduce model. A minimal sketch of such a program is given below.
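For concreteness, the classic word-count program in the map-reduce model consists of a map function that emits a (word, 1) pair for every word of the input and a reduce function that sums the counts of each word once the pairs have been grouped by key. The sketch below shows this pattern as a single local Python script; in the lab the two functions are usually split into two Hadoop Streaming scripts (for example mapper.py and reducer.py, names assumed), each reading from standard input.

    # Minimal local sketch of the word-count program in the map-reduce model.
    import sys
    from itertools import groupby
    from operator import itemgetter

    def map_phase(lines):
        # Emit a (word, 1) pair for every word of every input line.
        for line in lines:
            for word in line.strip().split():
                yield word, 1

    def reduce_phase(pairs):
        # Group the pairs by word, as the shuffle/sort step would, and sum the counts.
        for word, group in groupby(sorted(pairs), key=itemgetter(0)):
            yield word, sum(count for _, count in group)

    if __name__ == "__main__":
        # Read text from standard input, exactly as a Hadoop Streaming script would.
        for word, count in reduce_phase(map_phase(sys.stdin)):
            print(f"{word}\t{count}")

Run locally with, for example, echo "to be or not to be" | python3 wordcount.py (the file name wordcount.py is illustrative); Hadoop Streaming applies the same two functions in parallel over a distributed input.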

To Do and To Hand In

  • Propose a UML component diagram of the Hadoop environment installed on Colab.
  • Propose a UML component diagram of the two map-reduce word-count programs tested in the lab.
  • Explain how the first example, which implements a grep operation with a regular expression, is executed (a launch sketch is given after this list).
  • Explain how the “count words” program is executed in the example (see the same sketch).
  • What is the role of Google Drive in these examples?
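As a starting point for the questions on how the two examples are executed: the grep example shipped with Hadoop is usually launched from the Hadoop examples jar with the regular expression passed as an argument, and the Python word-count is usually launched through the Hadoop Streaming jar. The cell below is a sketch of both launches; the jar versions, input/output directory names and script names are assumptions to be adapted to the lab notebook (the input files can, for instance, be copied from Google Drive after mounting it in Colab).

    # Sketch of launching the two examples from a Colab cell (paths are assumptions).
    import os
    import subprocess

    hadoop_home = os.environ["HADOOP_HOME"]  # set during the installation step

    # 1) The grep example of the Hadoop examples jar: keep every token of the
    #    input that matches the regular expression and count its occurrences.
    examples_jar = f"{hadoop_home}/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.3.6.jar"
    subprocess.run(
        ["hadoop", "jar", examples_jar, "grep", "input", "grep_output", "dfs[a-z.]+"],
        check=True,
    )

    # 2) The Python word-count run through Hadoop Streaming; mapper.py and
    #    reducer.py are the two scripts written for the lab (names assumed).
    streaming_jar = f"{hadoop_home}/share/hadoop/tools/lib/hadoop-streaming-3.3.6.jar"
    subprocess.run(
        ["hadoop", "jar", streaming_jar,
         "-files", "mapper.py,reducer.py",
         "-mapper", "python3 mapper.py",
         "-reducer", "python3 reducer.py",
         "-input", "input", "-output", "wordcount_output"],
        check=True,
    )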