HANDS-ON

Some of the following hands-on exercises are modified versions of those proposed in L. Igual and S. Seguí, Introduction to Data Science: A Python Approach to Concepts, Techniques and Applications, Undergraduate Topics in Computer Science, Springer, 2017.

1. To use your own computer with a self-contained data science environment, follow these two steps (requires intermediate technical skills):

  1. Download the Anaconda installer that matches your machine and operating system.
  2. Install Anaconda following the instructions for your OS (Windows, macOS).
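
Once Anaconda is installed, a quick check such as the following sketch can confirm that the Python interpreter and a few common data science packages are visible. The package list (numpy, pandas, matplotlib, sklearn) is an assumption about what the exercises need, not a requirement stated on this page; adjust it to the instructions given in class.

```python
import importlib.util
import sys

# Assumed package list; adjust to whatever the exercises actually require.
PACKAGES = ["numpy", "pandas", "matplotlib", "sklearn"]


def check_environment(packages):
    """Return a dict mapping each package name to True if it can be located."""
    return {name: importlib.util.find_spec(name) is not None for name in packages}


if __name__ == "__main__":
    print("Python", sys.version.split()[0])
    for name, found in check_environment(PACKAGES).items():
        print(f"{name}: {'found' if found else 'MISSING'}")
```

The check only looks packages up without importing them, so it runs quickly and does not fail when one of them is absent.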

2. To work online with nothing to install (basic technical skills, recommended):

  1. Create a Kaggle account at https://www.kaggle.com

Understanding data collections content: a quantitative vision

  1. Getting acquainted with a Data Lab [data]
  2. First steps into data analytics [Let us be Holmes and find the murderer]

Some of the following hands-on exercises will be done in Python. Here is a quick reference for the language [PDF]

We are going to use Kaggle to perform this exercise. Access your Kaggle account (https://www.kaggle.com/) and follow the instructions given in class.

Prepare your Kaggle environment following the instructions here

  1. Getting started with the data science ecosystem [HO-1]
  2. Exploring data collections using descriptive statistics [HO-2] [PDF]
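
As a rough idea of what descriptive statistics look like in Python, here is a minimal sketch that uses only the standard library's statistics module. The sample values are made up for illustration and do not come from the course data sets; the hands-on itself may rely on other libraries.

```python
import statistics

# Hypothetical sample (e.g. ages); not taken from the course data.
ages = [23, 25, 31, 35, 35, 41, 52]

summary = {
    "count": len(ages),
    "mean": statistics.mean(ages),
    "median": statistics.median(ages),
    "mode": statistics.mode(ages),
    "stdev": statistics.stdev(ages),  # sample standard deviation
    "min": min(ages),
    "max": max(ages),
}

for name, value in summary.items():
    print(f"{name}: {value}")
```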

Data analysis using artificial intelligence techniques

  1. Classification: unsupervised learning, light version [HO-3] [PDF]
  2. Comparing clustering algorithms, long version [HO-4] (for Data Science & Computing Science lectures)

For lectures aimed at a Data Science or Computing Science audience

  1. Supervised learning [HO-7]
  2. Network analysis: 5 graph operations on social networks [HO-8]

Prediction using inferential statistics

  1. Linear regression [HO-5] [PDF]
  2. Extra work: Logistic regression [HO-6] [PDF]

Towards Data Analytics at Scale

  1. https://github.com/jbsneto-ppgsc-ufrn/spark-tutorial 
  2. Azure Machine Learning Gallery https://gallery.azure.ai