Big Data Analytics | Infrastructures for exploiting data

Big Data is a buzzword everyone talks about, since it concerns every human activity that generates large quantities of digital data (e.g., science, government, the economy). For most people, the term evokes a data deluge: huge volumes of bytes to process and manage (peta 10¹⁵, exa 10¹⁸, zetta 10²¹, yotta 10²⁴, etc.). Beyond this superficial view, however, there is a consensus on the three V’s that characterize Big Data: Volume; Variety (different types of representation: structured, unstructured, graphs, etc.); and Velocity (streams of data produced continuously).

Big Data forces us to view data mathematically first (e.g., measures, value distributions) and to establish a context for it later. This raises several questions. How can researchers use statistical tools and computer technologies to identify meaningful patterns of information? How should significant data correlations be interpreted? What is the role of traditional forms of scientific theorizing and analytic models in assessing data? What you really want to be doing is looking at the whole data set in ways that tell you things and answer questions that you are not asking. All these questions call for well-adapted infrastructures that can efficiently organize data, evaluate and optimize queries, and execute algorithms that require substantial computing and memory resources.