Big Data Fest | Data, tools & practice

Big Data is the buzzword everyone is talking about: in the press, on TV, in the research world and in industry. It concerns every human activity that generates (large quantities of) digital data, and it has been declared a priority topic by many governments that see data science as a means of stimulating the economy. However, the Big Data phenomenon is still difficult to characterize, since different points of view and diverse disciplines attempt to address it.

Big Data forces us to view data mathematically first and establish a context for it later. This calls for well-adapted infrastructures that can efficiently organize data, evaluate and optimize queries, and execute algorithms that require substantial computing and memory resources. With the evolution towards the cloud, data management requirements have to be revisited. In such a setting it is possible to exploit parallelism for processing data, and to increase availability and storage reliability through replication.

Organizing Big Data across storage layers (cache, main memory or disk), dispatching processes, and producing and delivering results all require efficient, well-adapted data management infrastructures, which existing systems do not yet fully provide. It is therefore important to revisit infrastructures and system architectures so that they cope with Big Data characteristics. The key challenge for these infrastructures and systems is to hide the complexity of accessing and managing Big Data while still providing interfaces for tuning them to application requirements.