HOME

Huge collections of heterogeneous data have become the backbone of scientific, analytic, and forecasting processes. Enacting these processes requires balancing the delivery of different types of services, such as (i) hardware (computing, storage, and memory), (ii) communication (bandwidth and reliability) and scheduling, and (iii) greedy analytics and mining with high memory and computing-cycle requirements. It has therefore become essential to revisit infrastructures and system architectures so that they cope with Big Data characteristics.

The key challenge for these infrastructures and systems is to hide the complexity of accessing and managing Big Data while providing interfaces for tuning them to application requirements. The cloud is an example of such an architecture. It has enabled the emergence of data science environments (e.g., Microsoft ML environment) that focus on efficiently providing the computing resources required to process data with greedy analytics algorithms. Beyond executing such tasks with parallel models and their associated technology, data management remains a key open issue. How should data be distributed and replicated across CPU/GPU farms to ensure that it is available to parallel processes? How should data be organized (loaded and indexed) in main memory to perform efficient data processing and analytics at scale?

This course (theory) will focus on data management and processing on cloud architectures. It will introduce fundamental cloud computing concepts and DevOps aspects in order to build experience in designing cloud solutions adapted to managing and processing data collections at scale. The theory introduced throughout the lessons will be illustrated with practical examples based on existing cloud environments and execution models.

The associated lab will be a playground for testing and developing solutions for managing and processing data on cloud architectures. To that end, the laboratory will introduce techniques, strategies, and best practices for configuring cloud environments adapted to specific technical and economic requirements. The objective will be to develop advanced computing systems engineering skills using major cloud and data management solutions.