CONTEXT

 

Data and information management has become a central concern in modern society, and it is the source of numerous scientific and computing challenges. Data are distributed and pervasive, and their volume and heterogeneity grow continuously: people create about 2.5 exabytes of data per day. In the era of Big Data, the challenge is to master the management of large data collections so that they yield real value for society.

OBJECTIVE

The objective of the course is to study the main aspects of distributed data management and data analysis (data mining), using heterogeneous data management systems of both SQL and NoSQL styles.

Data heterogeneity will be addressed from several perspectives, including the integration of sources and persistence through polyglot approaches. We will study the evaluation of multi-source declarative queries as well as the programming of queries and algorithms under the MapReduce paradigm. This paradigm is widely used on cloud and cluster architectures to manage large data collections.
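
To give a flavour of programming under the MapReduce paradigm, here is a minimal single-machine sketch of the classic word-count example. It is an illustration only, not course material: the function names (map_phase, shuffle, reduce_phase, word_count) are hypothetical, and a real framework would run the map and reduce steps in parallel across a cluster rather than in one Python process.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(document: str) -> Iterator[Tuple[str, int]]:
    """Map step (hypothetical name): emit a (word, 1) pair per word."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle step: group all emitted values by their key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce step: collapse the values of one key into a single count."""
    return key, sum(values)


def word_count(documents: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle and reduce sequentially over the documents."""
    pairs = (pair for doc in documents for pair in map_phase(doc))
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data", "big cluster"]))
```

The point of the split is that map_phase and reduce_phase each look at one document or one key at a time, which is what lets a distributed runtime spread them over many machines.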