{"id":505,"date":"2024-11-21T12:20:42","date_gmt":"2024-11-21T12:20:42","guid":{"rendered":"http:\/\/vargas-solar.com\/data-centric-smart-everything\/?page_id=505"},"modified":"2025-11-17T11:58:42","modified_gmt":"2025-11-17T11:58:42","slug":"ho-1-bis-exploring-datasets-getting-acquainted-with-tables-manipulation","status":"publish","type":"page","link":"http:\/\/vargas-solar.com\/data-centric-smart-everything\/ho-1-bis-exploring-datasets-getting-acquainted-with-tables-manipulation\/","title":{"rendered":"HO-1 Bis Exploring datasets: Getting acquainted with tables&#8217; manipulation"},"content":{"rendered":"\n<h3 class=\"wp-block-heading\">Context<\/h3>\n\n\n\n<p>Tabular datasets are very common. They are formatted as tables defined as a series of (implicitly) typed attributes, which represent the structure\/schema of the table. They contain a series of records aligned to this structure. Different exploration libraries propose a data type representing the notion of a table and sets of operators for manipulating tables. Even if the general principle is similar among tabular data models, the properties of the tabular data structures and operators vary.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Objective<\/h3>\n\n\n\n<p>This exercise aims to help you get acquainted with the tabular data structure and its associated operators, and to try a concrete implementation provided by the Python library Pandas. By applying a series of operators to tabular data, it is possible to explore their content, profile their mathematical properties, and answer research questions of the type &#8220;What happened?&#8221;. Through this exploration, we can determine the quality of a data set in terms of missing and null values and the statistical distribution of the values in its columns. 
Then, we can decide to &#8220;clean&#8221; the data set and answer &#8220;What happened?&#8221;-type research questions.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Material<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><a href=\"https:\/\/www.kaggle.com\/code\/gevargas\/ho-1-data-exploration-quantitative-2024\" target=\"_blank\" rel=\"noreferrer noopener\">Exercise HO-1<\/a>; a version in R can be found here [<a href=\"https:\/\/www.kaggle.com\/code\/gevargas\/ho1-egi-table-operations-r\">K-Notebook in R<\/a>]<\/li>\n\n\n\n<li>Explanation on the whiteboard about the table data structure<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">To Do<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Organise into groups of 3 or 4 people. You can also decide to work alone, even if it is less fun.<\/li>\n\n\n\n<li>Have a look at the experiment proposed in <a href=\"https:\/\/www.kaggle.com\/code\/gevargas\/ho-1-data-exploration-quantitative-2024\" target=\"_blank\" rel=\"noreferrer noopener\">HO-1<\/a><\/li>\n\n\n\n<li>Create a notebook in Kaggle according to the in-class instructions, then test and learn the tasks implemented in <a href=\"https:\/\/www.kaggle.com\/code\/gevargas\/ho-1-data-exploration-quantitative-2024\" target=\"_blank\" rel=\"noreferrer noopener\">HO-1<\/a><\/li>\n\n\n\n<li>Propose a mind map with the table manipulation operators introduced in class and the corresponding operators proposed by the Pandas library that you have discovered by testing <a href=\"https:\/\/www.kaggle.com\/code\/gevargas\/ho-1-data-exploration-quantitative-2024\" target=\"_blank\" rel=\"noreferrer noopener\">HO-1<\/a><\/li>\n\n\n\n<li>Draw a diagram that describes the series of tasks of the pipeline implemented in HO-1. 
The pipeline aims to answer the question, &#8220;<strong>How similar were EU countries when they invested in education throughout the first decade of the XXI century<\/strong>?&#8221;<\/li>\n\n\n\n<li>Propose an interpretation of the final result of the exercise (interpret the plots) and provide a critical view of the plots&#8217; pertinence to the research question.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">To Hand In<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Add the names of your group members and the program you are enrolled in to your notebook.<\/li>\n\n\n\n<li>Add the mind map drawing, the pipeline drawing, and your interpretation to your notebook.<\/li>\n\n\n\n<li>Please share it with the professor USING KAGGLE.<\/li>\n<\/ul>\n","protected":false},"excerpt":{"rendered":"<p>Context Tabular datasets are very common. They are formatted as tables defined as a series of (implicitly) typed attributes, which represent the structure\/schema of the table. They contain a series of records aligned to this structure. 
Different exploration libraries propose a data type representing the notion of a table and sets of operators for manipulating [&hellip;]<\/p>\n","protected":false},"author":11,"featured_media":0,"parent":0,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"page-templates\/full-width.php","meta":{"footnotes":""},"class_list":["post-505","page","type-page","status-publish","hentry"],"_links":{"self":[{"href":"http:\/\/vargas-solar.com\/data-centric-smart-everything\/wp-json\/wp\/v2\/pages\/505","targetHints":{"allow":["GET"]}}],"collection":[{"href":"http:\/\/vargas-solar.com\/data-centric-smart-everything\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"http:\/\/vargas-solar.com\/data-centric-smart-everything\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"http:\/\/vargas-solar.com\/data-centric-smart-everything\/wp-json\/wp\/v2\/users\/11"}],"replies":[{"embeddable":true,"href":"http:\/\/vargas-solar.com\/data-centric-smart-everything\/wp-json\/wp\/v2\/comments?post=505"}],"version-history":[{"count":6,"href":"http:\/\/vargas-solar.com\/data-centric-smart-everything\/wp-json\/wp\/v2\/pages\/505\/revisions"}],"predecessor-version":[{"id":566,"href":"http:\/\/vargas-solar.com\/data-centric-smart-everything\/wp-json\/wp\/v2\/pages\/505\/revisions\/566"}],"wp:attachment":[{"href":"http:\/\/vargas-solar.com\/data-centric-smart-everything\/wp-json\/wp\/v2\/media?parent=505"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}