Use Case 2: Dataflow-based programming

Spark programming

Objective

The objective of this use case is to deepen your understanding of the principles of parallel programming with Spark, in particular by analysing how the exchange of data is coordinated among the different workers and stages.
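As an illustration of what "data exchanged among workers and stages" can mean, here is a minimal sketch in plain Python. It is not Spark code and does not use the Spark API. It simulates a word count in which a map stage runs on each worker, a hash-based shuffle routes each key to one reducer, and a reduce stage aggregates. The input data, the number of reducers, and all function names are assumptions made for this example.

```python
from collections import defaultdict
from zlib import crc32

NUM_REDUCERS = 2  # hypothetical number of reduce-side partitions


def partition_for(key: str, num_partitions: int) -> int:
    """Deterministic hash partitioner (a simplified stand-in, not Spark's own)."""
    return crc32(key.encode("utf-8")) % num_partitions


def map_stage(lines):
    """Stage 1, run independently on each worker: emit (word, 1) pairs."""
    return [(word, 1) for line in lines for word in line.split()]


def shuffle(mapped_partitions, num_partitions):
    """Stage boundary: route every pair to the reducer that owns its key."""
    buckets = [defaultdict(list) for _ in range(num_partitions)]
    for pairs in mapped_partitions:
        for key, value in pairs:
            buckets[partition_for(key, num_partitions)][key].append(value)
    return buckets


def reduce_stage(bucket):
    """Stage 2: each reducer sums the values of the keys it received."""
    return {key: sum(values) for key, values in bucket.items()}


# Two input partitions, as if held by two different workers (made-up data).
worker_inputs = [["a b a"], ["b c a"]]

mapped = [map_stage(lines) for lines in worker_inputs]
shuffled = shuffle(mapped, NUM_REDUCERS)
reduced = [reduce_stage(bucket) for bucket in shuffled]

# Merge the per-reducer results only to display them in one place.
word_counts = {k: v for part in reduced for k, v in part.items()}
print(word_counts)
```

The shuffle is the only step in which records leave the worker that produced them, which is why it is a useful place to start when analysing the cost of data exchange between stages.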

To Do

Continue working in a group.

  1. Complete the exercise started in class as a first approach to programming in Spark:

https://docs.google.com/document/d/1FMnUtJI-sBXIQzrR9wXYSm6wpmagAxYdBOYI7o2gRT0/edit?usp=sharing

  2. Complete the following exercise and answer the questions of the analysis it proposes:

https://drive.google.com/file/d/1I1w1wBKaBSBjXe3Lx-EHkYU8kMPRlPf6/view?usp=sharing

To Hand In

  • For question 1, continue using the shared Google Doc to complete your answers.
  • For question 2, create a new document with your answers. Do not forget to write the names of the members of your group.

Share the links to the documents on the Blackboard platform. Each member of the group must share the links individually, even if you worked collaboratively.

Expected quality

The report must be well and logically organised. Grammar and spelling must be as correct as possible. Do not hesitate to use Grammarly or a similar tool to check your English. Pay attention to sentence structure (subject verb object), avoid adjectives and adverbs, and do not overuse connectors.

Diversity and Inclusion

Consider using inclusive language in your report, even if it sometimes takes more characters to write.

++ Inclusion and diversity in writing https://dbdni.github.io/pages/inclusivewriting.html 

Use suitable fonts and font sizes. If you use images or icons, including human characters, make sure that you avoid gender, race, and socioeconomic stereotypes. Also, consider the size, clarity, and quality of the images.