HANDS-ON | Data Centric Smart Experiments

Some of the following hands-on exercises are modified versions of those proposed in L. Igual and S. Seguí, Introduction to Data Science: A Python Approach to Concepts, Techniques and Applications, Undergraduate Topics in Computer Science, Springer, 2017.

Understanding data collections content: a quantitative vision

Keep in mind that performing data analytics means making sense of data, which requires acquiring data literacy. The following reference provides background.

Michael Bowen, Anthony Bartley, The Basics of Data Literacy: Helping Your Students (and You!) Make Sense of Data, NSTA Press, Arlington, Virginia

Digital humanities program

Getting acquainted with a Data Lab [data]
First steps into data analytics [Let us be Holmes and find the murderer]

Engineering master's and undergraduate programs stepping into data science and (big) data

1. Characterising data collections according to their V-properties (Desk)

What is the smell of the city? [DE-1] (ENSE3)
Data collection campaign: building a smell cartography [DE-1] (EGI BD Industry 4.0)

Working environment settings: from in-house to large-scale experiments

Data and experimental lab: Kaggle & Colab

Access your Kaggle account (https://www.kaggle.com/)
Prepare your Kaggle environment following the instructions here
Create a Gmail account to use Colab (https://colab.research.google.com/) and follow the instructions given in class.
Working locally on your computer (why not?): if you want to use your own computer outside the course with a self-contained data science environment, follow these two steps (requires medium technical skills):
1. Download the Anaconda installer that matches your machine and OS.
2. Install Anaconda following the instructions for your OS (Windows, macOS).

Useful information: cheat sheets

Some of the following hands-on exercises will be done in Python. Here is a memento of the language [PDF]
Imbalanced Data in Classification Cheat Sheet

2. Data exploration and preparation

Getting started with the data science ecosystem [HO-1Bis] [HO-1] [K-Notebook] (ENSE3)
[K-Notebook in R] (EGI BD Industry 4.0)
Tabular operators Pandas-SQL [CheatSheet]
Exploring data collections using descriptive statistics [HO-2] [PDF] (for Data Science studying programs)
Classification for data exploration: unsupervised learning, light version on Google Colab [HO-3] [GIST] (ENSE3)
1. Comparing clustering algorithms, long version [HO-4] (for Data Science & Computing Science programs)
Dealing with bias: an example on personal data using Google Colab [HO-4b] (ENSE3)
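
As a warm-up for the descriptive statistics exercise listed above, here is a minimal sketch of the kind of summary it is likely to involve. It uses only Python's standard `statistics` module. The `ages` sample and the `describe` helper are made up for illustration and are not taken from the hands-on material.

```python
import statistics

# Hypothetical sample: ages of respondents in a small survey (made-up values).
ages = [23, 35, 31, 44, 52, 29, 35, 61, 38, 35]


def describe(values):
    """Return basic descriptive statistics for a list of numbers."""
    ordered = sorted(values)
    # quantiles(n=4) returns the three cut points Q1, Q2 (median), Q3.
    q1, _, q3 = statistics.quantiles(ordered, n=4)
    return {
        "count": len(ordered),
        "mean": statistics.mean(ordered),
        "median": statistics.median(ordered),
        "mode": statistics.mode(ordered),
        "stdev": statistics.stdev(ordered),  # sample standard deviation
        "min": ordered[0],
        "max": ordered[-1],
        "iqr": q3 - q1,  # interquartile range
    }


summary = describe(ages)
for name, value in summary.items():
    print(f"{name}: {value}")
```

In the exercises themselves the same figures would more typically come from a Pandas DataFrame; the standard-library version is used here only to keep the sketch free of dependencies.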

3. Network analysis: modelling and discovering knowledge using graphs

Network analysis: 5 graph operations on social networks [HO-8] [K-Notebook] (ENSE3)
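
To give a feel for graph operations on a social network, the sketch below builds a small undirected graph and computes node degrees, a shortest path, and connected components. These are common introductory operations and are not necessarily the five covered in the exercise. The edge list and all function names are hypothetical, and only the standard library is used.

```python
from collections import deque

# Hypothetical friendship network: undirected edges between made-up people.
edges = [("ana", "ben"), ("ben", "cho"), ("cho", "dev"),
         ("ana", "cho"), ("eli", "fay")]


def build_graph(edge_list):
    """Build an adjacency-set representation of an undirected graph."""
    graph = {}
    for a, b in edge_list:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def degrees(graph):
    """Number of neighbours of each node."""
    return {node: len(neighbours) for node, neighbours in graph.items()}


def shortest_path(graph, source, target):
    """Breadth-first search; returns a list of nodes, or None if unreachable."""
    previous = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbour in sorted(graph[node]):
            if neighbour not in previous:
                previous[neighbour] = node
                queue.append(neighbour)
    return None


def connected_components(graph):
    """Group nodes into sets that are mutually reachable."""
    seen, components = set(), []
    for start in graph:
        if start in seen:
            continue
        component, queue = set(), deque([start])
        seen.add(start)
        while queue:
            node = queue.popleft()
            component.add(node)
            for neighbour in graph[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        components.append(component)
    return components


graph = build_graph(edges)
print(degrees(graph))
print(shortest_path(graph, "ana", "dev"))
print(connected_components(graph))
```

A dictionary of neighbour sets is enough for a graph of this size; dedicated graph libraries offer the same operations for larger networks.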

4. Prediction using inferential statistics

Linear regression [HO-5] (for Data Science & Computing Science programs)
Logistic regression [HO-6] [GIST] (ENSE3)
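
As background for the linear regression exercise, here is a minimal sketch of a simple least-squares fit of a line y = slope * x + intercept, written in plain Python. The data points and the helper names `fit_line` and `predict` are invented for illustration and do not come from the hands-on material.

```python
# Hypothetical observations (made-up values): x could be hours studied,
# y an exam score on an arbitrary scale.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]


def fit_line(x_values, y_values):
    """Ordinary least squares for one predictor: returns (slope, intercept)."""
    n = len(x_values)
    mean_x = sum(x_values) / n
    mean_y = sum(y_values) / n
    # Covariance-like and variance-like sums (the 1/n factors cancel out).
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(x_values, y_values))
    s_xx = sum((x - mean_x) ** 2 for x in x_values)
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(x, slope, intercept):
    """Predicted y for a new x under the fitted line."""
    return slope * x + intercept


slope, intercept = fit_line(xs, ys)
print(f"slope={slope:.2f}, intercept={intercept:.2f}")
print(f"prediction for x=6: {predict(6.0, slope, intercept):.2f}")
```

The slope is the ratio of the co-variation of x and y to the variation of x, and the intercept forces the line through the point of means.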

6. Towards Data Analytics at Scale (for Data Engineering Program)

https://github.com/jbsneto-ppgsc-ufrn/spark-tutorial
Azure Machine Learning Gallery https://gallery.azure.ai