Comparative study of parallel programming models

Objective

The following activities are intended to reinforce your knowledge of control flow-based and dataflow-based programming solutions for parallel data processing with the Hadoop and Spark engines.

To Do

Part I

Try deploying Hadoop on Colab and running a simple word-count MapReduce program [here].
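
Before running the Hadoop version, it can help to see the three phases of a word count in isolation. The sketch below is plain Python, not the Hadoop API: the function names (map_phase, shuffle_phase, reduce_phase) and the sample input are assumptions made for illustration, and everything runs in a single process rather than on a cluster.

```python
from collections import defaultdict


def map_phase(line):
    """Emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Group emitted values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Sum the counts collected for one word."""
    return key, sum(values)


def word_count(lines):
    """Chain the three phases over an in-memory list of lines."""
    mapped = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


# Hypothetical input, used only to exercise the sketch.
sample_lines = ["to be or not to be", "to see or not to see"]
counts = word_count(sample_lines)
print(counts)
```

In Hadoop, the programmer writes only the map and reduce functions; the framework performs the grouping step and writes intermediate results to disk between phases.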

Part II

Try setting up a PySpark environment on Colab and running an example Spark word-count program [here].

Part III (Optional)

  • Compare the principles of the control flow-based and dataflow-based programming models using the Hadoop MapReduce and Spark engines (5 points)
    • State the principle of each programming model: control flow-based and dataflow-based.
    • Choose an example and compare a solution written with each model.
    • State the principle of program execution for each engine, Hadoop and Spark.
  • Define, illustrate, and discuss the implications of the lazy evaluation strategy for the performance of Spark programs (5 points)
  • Compare this lazy evaluation strategy with the execution strategy of MapReduce programs executed by Hadoop (5 points)
  • Compare the implicit and explicit strategies adopted by Hadoop and Spark for dealing with data persistence, caching, sharing, and shuffling, and the implications of these strategies for the design and execution of programs (10 points).
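
As a starting point for the lazy evaluation question, the sketch below contrasts eager and lazy execution of the same transformation. It is an analogy in plain Python, not Spark code: a generator stands in for a chain of lazy transformations, and consuming it stands in for an action. The names parse, records, and calls are hypothetical and chosen for illustration.

```python
# Counter used to observe how many times the transformation actually runs.
calls = {"parse": 0}


def parse(record):
    """A stand-in transformation that records each invocation."""
    calls["parse"] += 1
    return record.upper()


records = ["a", "b", "c", "d"]  # hypothetical input data

# Eager strategy: every record is processed as soon as the step is declared.
eager = [parse(r) for r in records]
eager_calls = calls["parse"]

# Lazy strategy: declaring the step only describes the work to be done.
calls["parse"] = 0
lazy = (parse(r) for r in records)
calls_before_action = calls["parse"]

# Requesting two results triggers only the work needed to produce them.
first_two = [next(lazy), next(lazy)]
calls_after_action = calls["parse"]

print(eager_calls, calls_before_action, calls_after_action)
```

In Spark, the deferred work is a graph of transformations that the engine can optimise before an action runs it. A Python generator does not do this, so treat the sketch as an illustration of deferral only.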

To Hand In

Prepare a PDF document with your responses and upload it to e-Campus. Do not hesitate to use examples and (running) code to support your answers. The idea is to show as much as possible that you feel comfortable with both programming models and execution platforms. Make sure that you distinguish between the programming model and the execution engine. Show that you understand that data structures are the backbone of the principles behind the models and the engines.

Expected quality

The report must be well and logically organised. Grammar and spelling must be as correct as possible. Do not hesitate to use Grammarly or a similar tool to check your English. Pay attention to sentence structure (subject, verb, object), limit adjectives and adverbs, and do not overuse connectors.

Diversity and Inclusion

Consider using inclusive language in your report, even if it sometimes takes more characters to write.

More information: Inclusion and diversity in writing, https://dbdni.github.io/pages/inclusivewriting.html

Use suitable fonts and font sizes. If you use images or icons that include human characters, make sure that you avoid gender, race, and socioeconomic stereotypes. Also, consider the size, clarity, and quality of the images.