Debate on data exploration


  • Hands-on Pig
  • M. L. Kersten, S. Idreos, S. Manegold, and E. Liarou, “The Researcher’s Guide to the Data Deluge: Querying a Scientific Database in Just a Few Seconds,” Proceedings of the VLDB Endowment (PVLDB), vol. 4, no. 12, 2011.


Suppose you have access to new releases of the Neubot collection covering countries in different regions of the world (e.g., Asia, Latin America, the Middle East). A user would like to explore these releases to determine which regions are available and to ask queries that help her understand the technological gap between regions of the world at different granularities (city, region, country). In her analysis, the technological gap is defined by:

(1) the connection speeds that people have access to on different infrastructures;

(2) the availability of different Internet providers;

(3) how evenly bandwidth is available at different times of the day.
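
To make the three criteria concrete, here is a minimal Python sketch of one way they could be turned into numbers. Everything in it is an assumption made for illustration: the record layout (country, city, provider, hour of day, download speed in Mbit/s), the sample values, and the choice of indicators (median speed, number of distinct providers, and the ratio between the slowest and fastest hourly average). It is not the actual Neubot schema and not a prescribed solution.

```python
from collections import defaultdict
from statistics import mean, median

# Hypothetical records: (country, city, provider, hour_of_day, download_mbps).
# The real Neubot releases use a different, richer schema.
SAMPLE = [
    ("FR", "Lyon", "ProviderA", 9, 40.0),
    ("FR", "Lyon", "ProviderB", 21, 20.0),
    ("FR", "Paris", "ProviderA", 9, 80.0),
    ("MX", "Puebla", "ProviderC", 9, 10.0),
    ("MX", "Puebla", "ProviderC", 21, 5.0),
]


def gap_indicators(records, key_index):
    """Group records by one field (0 = country, 1 = city) and compute
    one illustrative indicator per criterion of the technological gap."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec[key_index]].append(rec)

    result = {}
    for key, recs in groups.items():
        speeds = [r[4] for r in recs]
        providers = {r[2] for r in recs}

        # Average speed per hour of day, used to measure balance over the day.
        by_hour = defaultdict(list)
        for r in recs:
            by_hour[r[3]].append(r[4])
        hourly = [mean(v) for v in by_hour.values()]
        fastest = max(hourly)

        result[key] = {
            "median_speed": median(speeds),   # criterion (1)
            "providers": len(providers),      # criterion (2)
            # criterion (3): 1.0 means the same speed at every observed hour
            "balance": min(hourly) / fastest if fastest > 0 else 0.0,
        }
    return result


if __name__ == "__main__":
    for level, index in (("country", 0), ("city", 1)):
        print(level, gap_indicators(SAMPLE, index))
```

Calling the same function with a different key index changes the granularity of the analysis, which mirrors the city / region / country levels mentioned above.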

To Do

Recall the five data exploration strategies proposed in the paper “The Researcher’s Guide to the Data Deluge: Querying a Scientific Database in Just a Few Seconds”:

  1. One-minute database kernels for real-time performance.
  2. Multi-scale query processing for gradual exploration.
  3. Result-set post processing for conveying meaningful data.
  4. Query morphing to adjust for proximity results.
  5. Query alternatives to cope with lack of providence.
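
As a starting point for the discussion, the following Python sketch illustrates one possible reading of strategy 2 only: the same aggregate is answered first on a small random sample and then on progressively larger ones, so the user sees a rough answer quickly and can stop early. The function name, the sampling fractions, and the use of a plain in-memory list are assumptions made for illustration; this is not the mechanism described in the paper, nor how Pig would execute it.

```python
import random
from statistics import mean


def multiscale_mean(values, fractions=(0.01, 0.1, 1.0), seed=0):
    """Return (fraction, sample_size, estimate) for growing random samples.

    Hypothetical helper: each step reuses a prefix of one shuffled copy,
    so larger samples always contain the smaller ones.
    """
    if not values:
        raise ValueError("values must not be empty")
    rng = random.Random(seed)
    shuffled = list(values)
    rng.shuffle(shuffled)

    estimates = []
    for fraction in fractions:
        size = max(1, int(len(shuffled) * fraction))
        estimates.append((fraction, size, mean(shuffled[:size])))
    return estimates


if __name__ == "__main__":
    # Made-up download speeds; real values would come from the Neubot data.
    speeds = [float(v) for v in range(1, 1001)]
    for fraction, size, estimate in multiscale_mean(speeds):
        print(f"{fraction:>5.0%} of data ({size} rows): mean = {estimate:.1f}")
```

The last step uses the full data set, so its estimate is the exact mean; the earlier steps trade accuracy for response time.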

In groups of at most 3 or 4 people, hold a 15-minute debate on the following.

For each strategy, give an example scenario, in the context of the Pig hands-on, showing how it could help the user analyze the technological gap as defined in the previous section. If you consider that a strategy cannot be applied to this use case, explain why.

  • Prepare a document / slides summarizing your examples and discussion results.
  • Choose a group spokesperson who will present your conclusions to the class.
  • Hand in your document / slides to