KNOWLEDGE CONTROL

  • Define the notion of “Datification”. In what way is it a revolution with respect to smart environments?
  • Define the characteristics of data-centric sciences. What is the role of data in them? What are the two components that make them a new generation of experimental sciences?
  • Define the notion of Big Data. In your opinion, how does this notion open new challenges for data management?
  • Give five properties that characterise Big Data. Explain in what way they make managing data challenging.
  • In your domain of expertise, how does Big Data open novel possibilities or raise new problems and challenges?
  • When multi-databases are used for storing data collections, what are the challenges related to query rewriting in such a setting? Design an example of data collections stemming from a smart building or a smart city quarter that can be stored in different databases and then queried in the spirit of distributed queries.
  • Data science issues

    • Describe the general methodology of data science. What is its objective?
    • What is a Web IDE? What does IDE stand for? What is a notebook?
    • Give a general description of a Data Science virtual machine.
    • Give the general functional architecture showing how Azure Notebooks communicates with GitHub and with the Python interpreter in the setting used for experimenting in the lab sessions.

    Defining a tabular view of a data collection

    • What is a DataFrame? Define a DataFrame that holds the energy consumption readings of home appliances while they are in use, according to the following schema:
    <applianceName, initialdate, initialhour, finaldate, finalhour, consumedWatts>
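As a starting point for this question, here is a minimal sketch of such a DataFrame. Only the column names come from the schema above; the appliance names, dates and readings are invented for illustration, and storing dates and hours as plain strings is an assumption, not a requirement.

```python
import pandas as pd

# Hypothetical readings: every value below is made up for illustration.
readings = pd.DataFrame(
    {
        "applianceName": ["fridge", "oven", "washingMachine"],
        "initialdate": ["2020-01-06", "2020-01-06", "2020-01-07"],
        "initialhour": ["08:00", "12:30", "18:15"],
        "finaldate": ["2020-01-06", "2020-01-06", "2020-01-07"],
        "finalhour": ["09:00", "13:15", "19:45"],
        # None marks a missing reading; Pandas stores it as NaN.
        "consumedWatts": [150.0, 2000.0, None],
    }
)

print(readings.shape)   # (number of rows, number of columns)
print(readings.dtypes)  # type inferred for each column
```

The date and hour columns could also be parsed into timestamps with `pd.to_datetime` if durations need to be computed.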

    Manipulating data

    • Consider the operations that can be applied to tabular data structures, such as projection (retrieving a subset of columns/attributes), selection (retrieving a subset of records) and filter (retrieving a subset of records given a condition). Which operators does Pandas provide to implement these operations on a DataFrame? What is the type of the result? Give examples, in particular of how null values can be filtered out.
    • Which aggregation functions can be applied to DataFrames, and what is the role of the parameters axis and inplace that are often used together with these functions?
    • What form do the expressions for adding columns to a DataFrame take? And for adding rows? How can rows or columns be deleted?
    • How can default values be assigned to attributes containing missing or null values?
    • Give an example of the use of the groupby() method applied to a DataFrame.
    • How are the manipulation operators associated with DataFrames relevant and useful for implementing data science processes?
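The manipulation questions above can be explored with a sketch like the following. It is one possible illustration and not a complete answer: the tiny data set and the derived column name `consumedKW` are hypothetical, and Pandas offers other equivalent ways to write each operation.

```python
import pandas as pd

# Hypothetical data: values are invented for illustration.
df = pd.DataFrame(
    {
        "applianceName": ["fridge", "oven", "fridge", "oven"],
        "consumedWatts": [150.0, 2000.0, None, 1800.0],
    }
)

# Projection: a list of column names returns a DataFrame.
projection = df[["applianceName"]]

# Selection by position: the first two records.
selection = df.iloc[0:2]

# Filter: a boolean mask keeps the rows satisfying a condition.
filtered = df[df["consumedWatts"] > 1000]

# Filtering out null values, or replacing them with a default.
non_null = df.dropna(subset=["consumedWatts"])
filled = df.fillna({"consumedWatts": 0.0})

# Adding a derived column, then deleting it again.
with_kw = df.assign(consumedKW=df["consumedWatts"] / 1000)
without_kw = with_kw.drop(columns=["consumedKW"])

# Grouping and aggregating: total consumption per appliance.
totals = df.groupby("applianceName")["consumedWatts"].sum()
print(totals)
```

Each of these calls returns a new object and leaves `df` unchanged, which makes it easy to chain steps when building a data preparation pipeline.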

Descriptive Statistics

  • What is the role of descriptive statistics with regard to the analysis of data collections?
  • What type of questions can be answered using descriptive statistics? Which are the mathematical tools used for that?
  • Which methods are provided by Pandas for getting acquainted with data collections content in a quantitative manner?
  • How is the shape attribute used for analysing data in a DataFrame?
  • What issues have to be considered in order to be able to apply statistics to raw data collections?
  • What is the role of the generation of graphics in the application of descriptive statistics for analysing data?
  • Which are the strategies used for dealing with dirty data when applying descriptive statistics functions?
  • Why can it be important to know the distribution of the values of a given attribute in a data analytics process?

Unsupervised learning

  • What is unsupervised learning? Explain its general principle.
  • What type of questions can unsupervised learning methods answer? Give examples or use cases.
  • Describe the general principle of the K-Means clustering algorithm.
  • Explain which measures can be used to assess the result of applying this algorithm to data.
  • What is the role of visualisation of results of the K-Means algorithm applied to a data collection?

Inferential statistics

  • Explain the principle of linear regression and give an example.
  • What can linear regression be used for?
  • What criteria must the data satisfy in order to decide whether linear regression can be applied?
  • Define a pipeline that gives the general steps to be implemented to solve a prediction problem using linear regression.
  • What are the scores used for assessing linear regression results?
  • What does it mean to bootstrap the standard error of the mean?
  • What are confidence intervals and p-values?
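To make the bootstrap question more concrete, here is a minimal sketch using only the Python standard library. The sample values, the helper name `bootstrap_sem` and the number of resamples are assumptions chosen for illustration. The idea is to resample the data with replacement many times, compute the mean of each resample, and take the standard deviation of those means as an estimate of the standard error.

```python
import random
import statistics


def bootstrap_sem(sample, n_resamples=2000, seed=0):
    """Estimate the standard error of the mean by resampling with replacement."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(sample)
    means = [
        statistics.fmean(rng.choices(sample, k=n))  # mean of one resample
        for _ in range(n_resamples)
    ]
    # The spread of the resampled means approximates the standard error.
    return statistics.stdev(means)


# Hypothetical consumption readings, invented for illustration.
sample = [150.0, 162.0, 148.0, 171.0, 155.0, 160.0, 149.0, 158.0]

boot = bootstrap_sem(sample)
analytic = statistics.stdev(sample) / len(sample) ** 0.5  # s / sqrt(n)

print(f"bootstrap SEM: {boot:.3f}")
print(f"analytic SEM:  {analytic:.3f}")
```

The two estimates should be of similar size. The bootstrap version becomes useful when no simple closed formula exists for the statistic of interest.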