Context and objective

This final exercise is intended to stimulate your creativity by asking you to propose a solution to a multidisciplinary problem. The problem can be addressed by analyzing data collections that support digital experiments aimed at understanding complex systems.


  1. Exhibiting violence through community analysis in social networks
  2. Analyzing people's behaviour in the city
  3. Who is who in the net: Computing and modelling people's digital footprint
  4. Analyzing the technological gap: exploiting the Neubot data set
  5. Solving the Bach puzzle: profiling and authenticating authorship in art

To Do

  1. Model and design a general strategy for addressing the problem defined in the project you were assigned. Your strategy must include:
  • The computational problem statement that you will address. Since the project would require a multidisciplinary team, state only the aspect that can be addressed with data science strategies.

(15 points)

  • Specification of the principle of the solution: hypothesis, functional architecture, etc. Use UML or another formal tool to specify your solution.

(15 points)

  2. Show concrete elements of your solution that demonstrate the use of some of the tools studied in class.

  • Considering the sources or data collections that we provided for your project, propose a workflow that coordinates the execution of four queries, each expressed in natural language and in Pig, for exploring their content and responding to the requirements stated in the project.

(11 points, 5 extra points for an implemented example)

  • Since you are modelling solutions related to complex systems, you can identify a portion of the data that can be modelled as a graph. Give an example of how you would build the graph using GraphX. Give examples of queries that can address some analytics related to your project. If you can, implement an example using GraphX.

(12 points, 5 extra points for an implemented example)
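As a rough illustration of what "modelling a portion of the data as a graph" can mean, the sketch below uses plain Python rather than GraphX so that it runs without a Spark installation. It only mirrors the shape GraphX works with: a collection of (vertex id, attribute) pairs and a collection of (source id, destination id, attribute) edges. The sample users, the relation labels and the helper names are hypothetical and are not taken from any of the project data sets.

```python
from collections import Counter

# Hypothetical sample: (vertex_id, attribute) pairs, the shape of a vertex collection.
vertices = [(1, "alice"), (2, "bob"), (3, "carol")]

# Hypothetical sample: (src_id, dst_id, attribute) triples, the shape of an edge collection.
edges = [(1, 2, "follows"), (1, 3, "follows"), (2, 3, "mentions")]


def out_degrees(edge_list):
    """Count outgoing edges per source vertex."""
    return Counter(src for src, _dst, _attr in edge_list)


def in_degrees(edge_list):
    """Count incoming edges per destination vertex."""
    return Counter(dst for _src, dst, _attr in edge_list)


# Example query: which vertex receives the most incoming edges?
names = dict(vertices)
most_followed_id, most_followed_count = in_degrees(edges).most_common(1)[0]
print(names[most_followed_id], most_followed_count)
```

In GraphX itself the same two collections would be RDDs passed to the `Graph` constructor, and the degree counts are available through `inDegrees` and `outDegrees`.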

  • Based on the data analytics algorithms studied in class, propose an experimental setting using Spark that can help to provide a (partial) solution.

(12 points, 5 extra points for an implemented example)

  3. The queries proposed in the first part of question 2 may have given you some intuition about which attribute is best suited for sharding your collection with one of the three strategies studied in the MongoDB case. Propose one and justify your choice.

(15 points)
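To reason about how the choice of attribute affects data distribution, a small simulation can help. The sketch below is plain Python and does not reproduce MongoDB's actual hashing or chunk splitting. It compares range-based and hash-based partitioning on a hypothetical, monotonically increasing key. Whether these two approaches match the strategies studied in class is an assumption, and the shard count, boundaries and key values are invented for illustration.

```python
import hashlib
from collections import Counter

NUM_SHARDS = 4  # hypothetical cluster size


def range_shard(key, boundaries):
    """Return the index of the first range whose upper bound exceeds key."""
    for i, upper in enumerate(boundaries):
        if key < upper:
            return i
    return len(boundaries)


def hash_shard(key, num_shards=NUM_SHARDS):
    """Map a key to a shard by hashing it (MD5 is used here only for illustration)."""
    digest = hashlib.md5(str(key).encode()).hexdigest()
    return int(digest, 16) % num_shards


# Hypothetical keys that only grow, such as timestamps or sequence numbers.
keys = list(range(1000, 1100))
# Hypothetical range boundaries chosen before these keys arrived.
boundaries = [250, 500, 750]

range_counts = Counter(range_shard(k, boundaries) for k in keys)
hash_counts = Counter(hash_shard(k) for k in keys)

print("range:", dict(range_counts))
print("hash: ", dict(hash_counts))
```

With these inputs every key falls into the last range, while hashing spreads the same keys over several shards at the cost of losing efficient range scans on that attribute.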

  4. The last part of the course studies the importance of choosing pertinent visualization metaphors for exploring data, whether the data are consolidated or raw. Choose two metaphors that can be adapted to visualize some of the results of your exploration or analytics tasks. Justify your choices.

(15 points)