HANDS-ON

Some of the following hands-on exercises are modified versions of those proposed in L. Igual and S. Seguí, Introduction to Data Science: A Python Approach to Concepts, Techniques and Applications, Undergraduate Topics in Computer Science, Springer, 2017.

Understanding data collections content: a quantitative vision

Keep in mind that performing data analytics means making sense of data, which requires acquiring data literacy. Have a look at the following reference for background.

Michael Bowen, Anthony Bartley, The Basics of Data Literacy: Helping Your Students (and You!) Make Sense of Data, NSTA Press, Arlington, Virginia

Digital humanities program

  1. Getting acquainted with a Data Lab [data]
  2. First steps into data analytics [Let us be Holmes and find the murderer] 

Engineering master's and undergraduate programs stepping into data science and (big) data

1. Characterising data collections according to their V-properties (Desk)

  • What is the smell of the city? [DE-1]  (ENSE3)
  • Data collection campaign: building a smell cartography [DE-1] (EGI BD Industry 4.0)

Working environment settings: from in-house to large-scale experiments

Data and experimental lab: Kaggle & Colab

  • Access your Kaggle account (https://www.kaggle.com/)
  • Prepare your Kaggle environment following the instructions here
  • Create a Gmail account to use Colab (https://colab.research.google.com/) and follow the instructions given in class.
  • Working locally on your computer (why not?): if you want to use your own computer outside the course with a self-contained data science environment, follow these two steps (requires medium technical skills):
    1. Download Anaconda according to the characteristics of your machine and OS.
    2. Install Anaconda following the instructions for your OS (Windows, macOS).
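
Once a local environment is installed, a quick check that the usual libraries can be imported may save time before starting the exercises. The following is only a minimal sketch: the package list is an assumption (typical data science libraries, not a list taken from the course material), and the helper name check_environment is hypothetical.

```python
import importlib.util
import sys

# Assumed package list: common data science libraries, not taken from the course material.
PACKAGES = ["numpy", "pandas", "matplotlib", "sklearn"]


def check_environment(packages):
    """Return a dict mapping each top-level package name to True if it can be found."""
    return {name: importlib.util.find_spec(name) is not None for name in packages}


if __name__ == "__main__":
    # Print the interpreter version, then one status line per package.
    print("Python", sys.version.split()[0])
    for name, found in check_environment(PACKAGES).items():
        print(f"{name:12s} {'found' if found else 'MISSING'}")
```

Run it with the Python interpreter of the environment you intend to use; any package reported as MISSING can then be installed before the hands-on sessions.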

Useful information: cheat sheets

  1. Some of the following hands-on exercises will be done in Python, so here is a memento of the language [PDF]
  2. Imbalanced Data in Classification Cheat Sheet

2. Data exploration and preparation

  1. Getting started with the data science ecosystem [HO-1Bis] [HO-1] [K-Notebook] (ENSE3)
    [K-Notebook in R] (EGI BD Industry 4.0)
    Tabular operators Pandas-SQL [CheatSheet]
  2. Exploring data collections using descriptive statistics [HO-2] [PDF] (for Data Science study programs)
  3. Classification for data exploration: unsupervised learning, light version on Google Colab [HO-3] [GIST] (ENSE3)
    1. Comparing clustering algorithms, long version [HO-4] (for Data Science & Computing Science programs)
  4. Dealing with bias: example on personal data using Google Colab [HO-4b] (ENSE3)
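
As a warm-up for the descriptive statistics exercise, the basic summary measures can be computed with the Python standard library alone. This is a minimal sketch, not the content of the hands-on itself, which may rely on other tools; the helper name describe and the sample values are made up for illustration.

```python
import statistics


def describe(values):
    """Return basic descriptive statistics for a non-empty list of numbers."""
    ordered = sorted(values)
    return {
        "count": len(ordered),
        "min": ordered[0],
        "max": ordered[-1],
        "mean": statistics.mean(ordered),
        "median": statistics.median(ordered),
        # Sample standard deviation (divides by n - 1); needs at least 2 values.
        "stdev": statistics.stdev(ordered),
    }


# Made-up sample, for illustration only.
sample = [2, 4, 4, 4, 5, 5, 7, 9]
summary = describe(sample)
for key, value in summary.items():
    print(f"{key:7s} {value}")
```

Comparing the mean and the median is a quick first check for skew: when the two differ noticeably, a few extreme values are probably pulling the mean.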

3. Network analysis: modelling and discovering knowledge using graphs

  1. Network analysis: 5 graph operations on social networks [HO-8] [K-Notebook] (ENSE3)

4. Prediction using inferential statistics

  1. Linear regression [HO-5] (for Data Science & Computing Science programs)
  2. Logistic regression [HO-6] [GIST] (ENSE3)
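
To fix ideas before the linear regression exercise, simple linear regression with one predictor can be written in a few lines using the closed-form ordinary least squares estimates. This is a sketch under stated assumptions: the function name fit_line and the data points are hypothetical, and the hands-on itself may use a dedicated library instead.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and sum of cross-deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical, the slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up points lying exactly on y = 2x + 1, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]
slope, intercept = fit_line(xs, ys)
prediction = slope * 6 + intercept  # predicted y for a new point x = 6
print(slope, intercept, prediction)
```

The slope is the ratio of the covariance of x and y to the variance of x, and the intercept makes the fitted line pass through the point of means.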

6. Towards Data Analytics at Scale (for Data Engineering Program)

  1. https://github.com/jbsneto-ppgsc-ufrn/spark-tutorial 
  2. Azure Machine Learning Gallery https://gallery.azure.ai