Monthly Archives: October 2015

CouchDB: Expressing queries

Requirement

To Hand In

This exercise will be handed in electronically:

  • You will form groups of 2 or three
  • A report with answers and explanations of the principle you adopted in PDF
  • The report will state the JavaScript answers to the questions.

To Do

For this exercise you will use a set of file containing information about the films presented in Grenoble in 2011 (information retrieved from AlloCine API). Each of these files contains the films presented in a cinema of Grenoble at that time (i.e. there is a file per cinema and a total number of 9).

Start by creating and populating a new database in CouchDB using the data contained in the files named allocineGrenobleN.txt with from to 9. For this purpose execute the following commands:

curl -X PUT http://localhost:5984/allocine
curl -T "allocineGrenoble1.txt" http://localhost:5984/allocine/allocineGrenoble1
curl -T "allocineGrenoble2.txt" http://localhost:5984/allocine/allocineGrenoble2 
curl -T "allocineGrenoble3.txt" http://localhost:5984/allocine/allocineGrenoble3 
curl -T "allocineGrenoble4.txt" http://localhost:5984/allocine/allocineGrenoble4 
curl -T "allocineGrenoble5.txt" http://localhost:5984/allocine/allocineGrenoble5 
curl -T "allocineGrenoble6.txt" http://localhost:5984/allocine/allocineGrenoble6 
curl -T "allocineGrenoble7.txt" http://localhost:5984/allocine/allocineGrenoble7 
curl -T "allocineGrenoble8.txt" http://localhost:5984/allocine/allocineGrenoble8 
curl -T "allocineGrenoble9.txt" http://localhost:5984/allocine/allocineGrenoble9 

In your report give the code and a screen shot with the content on the database.

Using this database answer the following questions and do not forget to give:

  • the code of the answer
  • the screenshots with the content of the database

Feel free to use the CouchDB book for getting some inspiration.

Question 1

Note that the data set is quite complete. Define a simplified view in MapReduce that contains for each theatre the films presented in it. Hint: You do not need a “reduce” here.

Question 2

Modify your answer to question 1 for filtering the theatres outside Grenoble downtown (e.g., do not include the theatres in Saint Martin d’Hères).

Question 3

Give the number of films that each theatre is presenting. Hint: You need a “reduce” here.

Question 4

Give the list of films with a press rating higher than 4 stars. Attention: filter duplicates.

Question 5

Give the list of films presented 2 years ago (10.12.2011), and for each film, the theatre where it was presented and its schedule.

Question 7 (BONUS)

Give the list of films and for every film the list of theatres that present it. This question is a challenge but we encourage you to try to solve it.

 Acknowledgment

Dr. Alexandre Termier, University Joseph Fourier originally designed this exercise. It has been slightly modified for this course.

CouchDB: A first touch

This practical exercise is intended to make you reason about the differences of principles between “classic” RDBMS and NoSQL servers for building and managing data. DO NOT expect to acquire an extensive practice of the use of the DBMS used here. Rather expect to be able to:

  • Point out some of their principles concerning: data model, design process, internal and external data management, architecture;
  • Compare these principles with those of classic RDBMS.

Therefore we propose an exercise organized into two parts where you are supposed to run some tests and answer control questions for making sure that you are achieving the expected objective of the exercise.

Requirements

Building and Querying a NoSQL Oriented Database

In this exercise you will populate and query a NoSQL database using data coming from the Deezer RESTful service. In particular, you will use data related with the rock band Muse. For instance, the following links give access to Muse’s albums and information about similar artists (note that Muse has the ID 705 in Deezer):

TO DO

Creating and populating a database

  • Using a terminal, create the database Deezer:
curl -X PUT http://localhost:5984/deezer
  • Download (and save into files) the data about Muse’ album and similar artists:
curl -X GET http://api.deezer.com/artist/705/albums > MuseAlbums.json
curl -X GET http://api.deezer.com/artist/705/related > MuseRelatedAlbums.json
  • Populate the Deezer database with the retrieved information by issuing the following commands:
curl -X PUT http://localhost:5984/deezer/muse_albums --upload-file "MuseAlbums.json"
curl -X PUT http://localhost:5984/deezer/muse_related_artists --upload-file "MuseRelatedAlbums.json"
  • Verify the content of the database:
curl -v http://localhost:5984/deezer/muse_albums
curl -v http://localhost:5984/deezer/muse_related_artists

Note that the output includes the HTTP request (and reply) headers sent to (by) CouchDB.

  • Open Fouton (CouchDB Web Interface) on your browser:
http://127.0.0.1:5984/_utils/index.html
  • Access and observe the database Deezer on Fouton.

Querying a database

Execute the following queries:

  • Query 1. Retrieve the name and the web page of the groups that are similar to the rock band Muse.
Map 
function(doc) {
   
var artists = doc.data;

   if(doc._id == "muse_related_artists") {
      for(var i in artists) 
         emit(artists[i].name, artists[i].link);
   } 
}
  • Query 2. Compute the total number of the albums produced by the rock band Muse (requires to check the reduce check button in Fouton).
Map 
function(doc) {
   
var artists = doc.data;

   if(doc._id == "muse_related_artists") {
      for(var i in artists) emit('muse_albums', 1);
   }
}
Reduce 
function(keys, values, rereduce) {
   return sum(values);
}

Theoretic questions (TO ANSWER)

  • Is it possible to represent data under the classic relational model (see the 1 Normal Form)?
  • Compare the notion of key in a relational schema with respect to the notion of key in key-value NoSQL approaches.
  • Why is it necessary to define views for querying a database in CouchDB? Is this a way of integrating data? Justify.

REFERENCES