Some of the following hands-on exercises are modified versions of those proposed in L. Igual and S. Seguí, Introduction to Data Science: A Python Approach to Concepts, Techniques and Applications, Undergraduate Topics in Computer Science series, Springer, 2017.
Understanding data collections content: a quantitative vision
Keep in mind that performing data analytics means making sense of data, and this requires acquiring data literacy. Have a look at the following reference for background.
Michael Bowen, Anthony Bartley, The Basics of Data Literacy: Helping Your Students (and You!) Make Sense of Data, NSTA Press, Arlington, Virginia
Digital humanities program
- Getting acquainted with a Data Lab [data]
- First steps into data analytics [Let us be Holmes and find the murderer]
Engineering master's and undergraduate programs stepping into data science and (big) data
1. Characterising data collections according to their V-properties (Desk)
- What is the smell of the city? [DE-1] (ENSE3)
- Data collection campaign: building a smell cartography [DE-1] (EGI BD Industry 4.0)
Working environment settings: from in-house to large-scale experiments
Data and experimental lab: Kaggle & Colab
- Access your Kaggle account (https://www.kaggle.com/)
- Prepare your Kaggle environment following the instructions here
- Create a Gmail account to use Colab (https://colab.research.google.com/) and follow the instructions given in class.
- Working locally on your computer (why not?): if you want to use your own computer outside the course with a self-contained data science environment, follow two steps (requires medium technical skills):
Useful information: cheat sheets
- Some of the following hands-on exercises will be done in Python, so here is a memento of the language [PDF]
- Imbalanced Data in Classification Cheat Sheet
2. Data exploration and preparation
- Getting started with the data science ecosystem [HO-1Bis] [HO-1] [K-Notebook] (ENSE3)
[K-Notebook in R] (EGI BD Industry 4.0)
- Tabular operators Pandas-SQL [CheatSheet]
- Exploring data collections using descriptive statistics [HO-2] [PDF] (for Data Science study programs)
- Classification for data exploration: unsupervised learning, light version on Google Colab [HO-3] [GIST] (ENSE3)
- Comparing clustering algorithms, long version [HO-4] (for Data Science & Computing Science programs)
- Dealing with bias: example on personal data using Google Colab [HO-4b] (ENSE3)
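As a warm-up for the descriptive statistics hands-on listed above, here is a minimal sketch that uses only Python's standard library. The sample values and the helper name `describe` are hypothetical and are not taken from the course notebooks, which may rely on pandas instead.

```python
import statistics


def describe(values):
    """Return basic descriptive statistics for a list of numbers."""
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
        "min": min(values),
        "max": max(values),
    }


# Hypothetical sample, for illustration only
ages = [23, 35, 31, 42, 29]
summary = describe(ages)
for name, value in summary.items():
    print(f"{name}: {value}")
```

Comparing the mean with the median is a quick first check for skew or outliers before moving on to clustering or regression.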
3. Network analysis: modelling and discovering knowledge using graphs
- Network analysis: 5 graph operations on social networks [HO-8] [K-Notebook] (ENSE3)
4. Prediction using inferential statistics
- Linear regression [HO-5] (for Data Science & Computing Science programs)
- Logistic regression [HO-6] [GIST] (ENSE3)
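To give an idea of what the linear regression hands-on is about, here is a minimal sketch of simple least-squares fitting with one explanatory variable, written in plain Python. The function name `fit_line` and the data points are hypothetical; the course material may use a library such as scikit-learn instead.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations of x and y, and sum of squared deviations of x
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical points lying exactly on y = 2x + 1
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]
slope, intercept = fit_line(xs, ys)
print(f"slope={slope}, intercept={intercept}")

# Use the fitted line to predict a new value
prediction = slope * 6 + intercept
print(f"prediction for x=6: {prediction}")
```

With real data the points will not lie exactly on a line, and the fitted slope and intercept are the values that minimise the sum of squared residuals.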
6. Towards Data Analytics at Scale (for Data Engineering Program)
- https://github.com/jbsneto-ppgsc-ufrn/spark-tutorial
- Azure Machine Learning Gallery https://gallery.azure.ai