HANDS-ON

Some of the following hands-on exercises are modified versions of those proposed in L. Igual and S. Seguí, Introduction to Data Science: A Python Approach to Concepts, Techniques and Applications, Undergraduate Topics in Computer Science, Springer, 2017.

1. To use your own computer with a self-contained data science environment, follow these two steps (requires medium technical skills):

  1. Download the Anaconda installer that matches your machine and OS.
  2. Install Anaconda following the instructions for your OS (Windows, macOS).

2. Online, with nothing to install (basic technical skills, recommended):

  1. Create a Kaggle account https://www.kaggle.com
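Whichever option you choose, a quick way to confirm that the environment is usable is to check that the core libraries can be found. The sketch below assumes the handouts rely on numpy, pandas, matplotlib and scikit-learn; that package list and the helper name check_environment are assumptions made for illustration, not taken from the course material.

```python
import importlib.util
import sys

# Assumed package list (import names); adjust to what the handouts actually need.
PACKAGES = ["numpy", "pandas", "matplotlib", "sklearn"]


def check_environment(packages):
    """Return a dict mapping each import name to True if it can be found."""
    return {name: importlib.util.find_spec(name) is not None for name in packages}


if __name__ == "__main__":
    print("Python", sys.version.split()[0])
    for name, found in check_environment(PACKAGES).items():
        print(f"{name:12s} {'OK' if found else 'MISSING'}")
```

Run it in a notebook cell or from a terminal; any package reported as MISSING needs to be installed before starting the exercises that use it.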

Understanding data collections content: a quantitative vision

Keep in mind that the goal of data analytics is to make sense of data, which requires data literacy. See the following reference for background:

Michael Bowen, Anthony Bartley, The Basics of Data Literacy: Helping Your Students (and You!) Make Sense of Data, NSTA Press, Arlington, Virginia

 

  1. Getting acquainted with a Data Lab [data]
  2. First steps into data analytics [Let us be Holmes and find the murderer]

Some of the following hands-on exercises will be done in Python, so here is a quick reference for the language [PDF]

We are going to use Kaggle to perform this exercise. Access your Kaggle account (https://www.kaggle.com/) and follow the instructions given in class.

Prepare your Kaggle environment following the instructions here

  1. Getting started with the data science ecosystem [HO-1]; tabular operators in Pandas and SQL [CheatSheet]
  2. Exploring data collections using descriptive statistics [HO-2] [PDF]
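As a warm-up for these two exercises, here is a minimal sketch of one tabular operation written twice, once with pandas and once in SQL, followed by descriptive statistics on the same column. The toy table, its column names and the table name readings are invented for illustration and are not part of the handouts; SQLite is used only because it ships with Python.

```python
import sqlite3

import pandas as pd

# Toy data invented for illustration; it is not one of the course datasets.
df = pd.DataFrame({
    "city": ["Lyon", "Lyon", "Natal", "Natal", "Natal"],
    "temp": [12.0, 15.0, 27.0, 29.0, 28.0],
})

# Grouping + aggregation in pandas: mean temperature per city.
mean_pandas = df.groupby("city")["temp"].mean()

# The same query in SQL, run on an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
df.to_sql("readings", conn, index=False)
mean_sql = pd.read_sql_query(
    "SELECT city, AVG(temp) AS temp FROM readings GROUP BY city ORDER BY city",
    conn,
).set_index("city")["temp"]
conn.close()

# Descriptive statistics (count, mean, std, quartiles, min, max) for one column.
summary = df["temp"].describe()

print(mean_pandas)
print(mean_sql)
print(summary[["mean", "std", "min", "max"]])
```

Both approaches return the same per-city means; the pandas version stays inside Python, while the SQL version is closer to what you would write against a database.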

Data analysis using artificial intelligence techniques

Imbalanced Data in Classification Cheat Sheet

  1. Classification: unsupervised learning, light version [HO-3] [PDF]
  2. Comparing clustering algorithms, long version [HO-4] (for Data Science & Computing Science lectures)
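To give an idea of what a clustering algorithm does before opening the handouts, here is a minimal sketch of k-means (Lloyd's algorithm) in plain Python. It is not the code used in HO-3 or HO-4; the function name kmeans, the toy points and the fixed starting centroids are assumptions chosen to keep the run small and deterministic. In practice you would more likely use a library implementation such as scikit-learn's KMeans.

```python
import math


def kmeans(points, centroids, iterations=10):
    """Lloyd's algorithm: alternate an assignment step and an update step."""
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its points.
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # keep the old centroid if a cluster is empty
                centroids[k] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, centroids


# Toy 2-D points invented for illustration: two well-separated groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.0), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]

# Fixed starting centroids keep the run deterministic.
labels, centroids = kmeans(points, centroids=[points[0], points[3]])
print(labels)
print(centroids)
```

No labels are given to the algorithm: the groups emerge only from the distances between points, which is what makes the method unsupervised.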

For lectures aimed at a Data Science or Computing Science audience

  1. Supervised learning [HO-7]
  2. Network analysis: 5 graph operations on social networks [HO-8]
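For the network analysis exercise, the following sketch shows three common graph operations (node degree, shortest path length and connected components) on a tiny social network in plain Python. The choice of operations, the function names and the toy friendship edges are assumptions made for illustration; they are not necessarily the five operations covered in HO-8, and a library such as NetworkX would normally be used for real datasets.

```python
from collections import deque

# Toy friendship network invented for illustration (undirected edges).
edges = [("ana", "bob"), ("bob", "cy"), ("cy", "ana"), ("cy", "dee"), ("eve", "fay")]

# Build an adjacency list: node -> set of neighbours.
graph = {}
for u, v in edges:
    graph.setdefault(u, set()).add(v)
    graph.setdefault(v, set()).add(u)


def degree(node):
    """Number of direct connections of a node."""
    return len(graph[node])


def shortest_path_length(source, target):
    """Breadth-first search; returns None if target is unreachable."""
    seen, queue = {source}, deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == target:
            return dist
        for nxt in graph[node] - seen:
            seen.add(nxt)
            queue.append((nxt, dist + 1))
    return None


def connected_components():
    """Group nodes that can reach each other."""
    remaining, components = set(graph), []
    while remaining:
        start = remaining.pop()
        component, queue = {start}, deque([start])
        while queue:
            for nxt in graph[queue.popleft()] - component:
                component.add(nxt)
                queue.append(nxt)
        remaining -= component
        components.append(component)
    return components


print(degree("cy"))
print(shortest_path_length("ana", "dee"))
print(len(connected_components()))
```

In a social network these operations answer simple questions: who has the most contacts, how many introductions separate two people, and which groups have no link to each other.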

Prediction using inferential statistics

  1. Linear regression [HO-5] [PDF]
  2. Extra work: Logistic regression [HO-6] [PDF]
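As a preview of the linear regression exercise, here is a minimal sketch of ordinary least squares for a single predictor, written in plain Python so that each term of the formula is visible. The function name fit_line and the toy data are invented for illustration and do not come from HO-5; the handout may rely on a library routine instead.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    # Sum of cross-deviations between x and y.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sum of squared deviations of x.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Toy data invented for illustration: hours studied vs. exam score.
hours = [1, 2, 3, 4, 5]
scores = [52, 55, 61, 64, 68]

slope, intercept = fit_line(hours, scores)

# Use the fitted line to predict the score for 6 hours of study.
predicted = intercept + slope * 6
print(slope, intercept, predicted)
```

The slope is the estimated change in the outcome for one extra unit of the predictor, and the prediction for a new value is read directly off the fitted line.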

Towards Data Analytics at Scale

  1. https://github.com/jbsneto-ppgsc-ufrn/spark-tutorial 
  2. Azure Machine Learning Gallery https://gallery.azure.ai