CONTENT

  1. Introduction [PDF]
    1. Data-centric sciences: principles & common aspects
    2. Digital data collections: characteristics & properties
    3. Data science: big data, data analytics algorithms & tools
  2. From centralized to high-scale WIDES, data science laboratories to artificial intelligence studios [PDF]
    1. In-house data analytics environments: Jupyter
    2. Targeting large scale: Zeppelin, Spark [PDF]
    3. Data science virtual machines: cloud solutions
    4. Data science labs: Colab, Kaggle, Azure Notebooks
    5. Artificial intelligence execution environments & studios: TensorFlow, Caffe, Azure ML Studio
  3. Designing experiment environments
    1. Data labs: data collections, quality, & profiling
    2. Architectural settings: from in-house to large-scale experiments
      • Parallel execution platforms & environments
      • Multi-core programming
      • Externalising computation
  4. Data engineering [PDF] [Data]
    a. Data formats, transformations, distribution
    b. Studying data quality
    c. Statistical properties
    d. Techniques for adjusting data and building data samples
    e. An overview of applied mathematics to machine learning
  5. Designing data science pipelines [PDF]
    1. Linear regression
      • Simple linear regression
      • Multiple linear regression & polynomial regression
      • Sparse model
    2. Logistic regression
    3. Supervised & unsupervised learning projects
      • Learning curves
      • Training, validation & test
      • Two learning models
        • Support vector machines
        • Random forest
    4. Clustering
      • Assessing clustering: metrics
      • Techniques taxonomy
    5. Graph processing: network science
      • Background on networks and graphs
      • Graph operations
      • Similarity & distances