{"id":521,"date":"2024-11-30T19:45:41","date_gmt":"2024-11-30T19:45:41","guid":{"rendered":"http:\/\/vargas-solar.com\/data-centric-smart-everything\/?page_id=521"},"modified":"2024-12-09T17:53:02","modified_gmt":"2024-12-09T17:53:02","slug":"instructions-for-hands-on-exercises-ense3-programs","status":"publish","type":"page","link":"http:\/\/vargas-solar.com\/data-centric-smart-everything\/instructions-for-hands-on-exercises-ense3-programs\/","title":{"rendered":"ENSE3 ICT BD Lab"},"content":{"rendered":"\n<h1 class=\"wp-block-heading\">Instructions for Hands-On Exercises<\/h1>\n\n\n\n<p>In the 2024 edition of the ICT-Big Data course, you must complete five hands-on exercises (two of which were partially completed during the Lab sessions).<\/p>\n\n\n\n<p>You can work alone or in <mark style=\"background-color:rgba(0, 0, 0, 0)\" class=\"has-inline-color has-blue-color\">groups of 3-4 people. The larger the group, the higher the expected quality of the answers (complete, sound, and <\/mark>showing critical thinking in the discussion).<\/p>\n\n\n\n<p>The following sections specify:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>What you\/your team must produce.<\/li>\n\n\n\n<li>How to prepare it.<\/li>\n\n\n\n<li>What to hand in, and when.<\/li>\n<\/ul>\n\n\n\n<p><strong>This work accounts for 40% of the grade for the ICT: Big Data part of the course taught by Genoveva Vargas-Solar.<\/strong><\/p>\n\n\n\n<h2 class=\"wp-block-heading\">1. When?<\/h2>\n\n\n\n<pre class=\"wp-block-preformatted\">Deadline: 16<sup>th<\/sup>&nbsp;December 2024&nbsp;<strong>12:00 CET (firm deadline)<\/strong><\/pre>\n\n\n\n<h2 class=\"wp-block-heading\">2. Work to do<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>2.1 Exercise 1: Quantitative dataset profiling and cleaning (the tabular data structure and its operators)<\/strong><\/h3>\n\n\n\n<p>Description of 
the tasks of the exercise [<a href=\"http:\/\/vargas-solar.com\/data-centric-smart-everything\/ho-1-bis-exploring-datasets-getting-acquainted-with-tables-manipulation\/\">HO-1Bis<\/a>]<\/p>\n\n\n\n<p><strong>To hand in<\/strong>: Share a Kaggle notebook (PDFs and other document types will not be accepted) that includes:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>The complete set of instructions, with all cells <strong>run<\/strong>.<\/li>\n\n\n\n<li>At the bottom of the exercise notebook, <strong>three markdown cells<\/strong> containing:\n<ul class=\"wp-block-list\">\n<li><strong>The interpretation of the results,<\/strong> explaining to what extent they answer the research question.<\/li>\n\n\n\n<li>A mind map (JPEG\/PNG figure) covering each family of operations that can be applied to a table (discussed in class), together with the Python instructions that implement them. Use a markdown cell to insert your figure.<\/li>\n\n\n\n<li>A pipeline (JPEG\/PNG figure) describing the logical flow implemented in the notebook to answer the research question (which steps in the notebook lead to the answer?). Use a markdown cell to insert your figure.<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n\n\n\n<p>N.B. To draw the figures, use a tool such as Google Drawings, or draw them by hand and take a photo. 
Make sure the image quality is good enough for the figures to be easily read and understood.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>2.2 Exercise 2: Tracking Outliers using Unsupervised Learning (clustering)<\/strong><\/h3>\n\n\n\n<p>Description of the tasks of the exercise [<a href=\"https:\/\/gist.github.com\/gevargas\/32adebd2bf48c77c2b7d0daa48876b7c\">GIST<\/a>]<br>Check the FAQ for questions about the metrics at the beginning of the exercise.<br><strong>Additional details that may help can be found <a href=\"http:\/\/vargas-solar.com\/data-centric-smart-everything\/hands-on\/a-step-forward-for-discovering-knowledge-using-unsupervised-learning\/\">here<\/a>.<\/strong><\/p>\n\n\n\n<p><strong>To hand in<\/strong>: Share a <strong>notebook<\/strong> from Colab with genoveva.vargas@gmail.com (PDFs and other document types will not be accepted) that includes:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>The complete set of instructions, with all cells run, beginning with a <strong>markdown cell stating the research question that guides the experiment<\/strong>.<\/li>\n\n\n\n<li>At the bottom of the exercise notebook, <strong>three markdown cells<\/strong> containing:\n<ul class=\"wp-block-list\">\n<li>The interpretation of the results, explaining to what extent they answer the research question.<\/li>\n\n\n\n<li>A <a href=\"https:\/\/en.wikipedia.org\/wiki\/Mind_map\">mind map<\/a> (JPEG\/PNG figure) explaining the principle of clustering and how clusters are defined, addressing the following questions:\n<ul class=\"wp-block-list\">\n<li>What do the records\/items in the data collection represent?<\/li>\n\n\n\n<li>How do they relate to the notion of a vector in an n-dimensional space?<\/li>\n\n\n\n<li>How are clusters recognized?<\/li>\n\n\n\n<li>How is a clustering result assessed? 
<\/li>\n\n\n\n<li>What role do the scores introduced in the exercise play in assessing a clustering result?<\/li>\n\n\n\n<li>Why must several iterations be performed before reaching a \u201cfinal result\u201d?<\/li>\n<\/ul>\n<\/li>\n\n\n\n<li>A pipeline (JPEG\/PNG figure) describing the logical flow implemented in the notebook to answer the research question (which steps in the notebook lead to the answer?). Use a markdown cell to insert your figure.<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n\n\n\n<p>N.B. To draw the figures, use a tool such as Google Drawings, or draw them by hand and take a photo. Make sure the image quality is good enough for the figures to be easily read and understood.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>2.3 Exercise 3: The Quality of Data (Observing Bias)<\/strong><\/h3>\n\n\n\n<p>Description of the tasks of the exercise [HO3: <a href=\"https:\/\/gist.github.com\/gevargas\/2876c671b46f511a81b78905d4406e07\">GIST<\/a>]<\/p>\n\n\n\n<p><strong>To hand in<\/strong>: Share a <strong>notebook<\/strong> from Colab with genoveva.vargas@gmail.com (PDFs and other document types will not be accepted) that includes:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Your answers to the suggested actions, implemented in the notebook, each followed by a markdown cell explaining what you did.<\/li>\n\n\n\n<li>A mind map (JPEG\/PNG figure) showing how to address bias: first how to measure it in data collections, then how to \u201cfix\u201d it during the preparation and sampling phases.<\/li>\n\n\n\n<li>The same pipeline reproduced for another protected variable, i.e., considering a different \u201cprotected group\u201d.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>2.4 Exercise 4: Predicting Events<\/strong><\/h3>\n\n\n\n<ol class=\"wp-block-list\">\n<li>Create a copy of the example exercise and run it on Kaggle\/Colab (note that the data are private) 
[HO5:&nbsp;<a href=\"https:\/\/gist.github.com\/gevargas\/bf760a656075ee56b53b83659efcc1ed\">GIST<\/a>]<\/li>\n\n\n\n<li>Draw a figure representing the experiment pipeline presented in the notebook.\n<ul class=\"wp-block-list\">\n<li>Exhibit the preparation phases and the aspects to look for when preparing the dataset for prediction with logistic regression.<\/li>\n\n\n\n<li><span style=\"font-size: 1rem;\">Exhibit the phases required to apply logistic regression (you can refer to the code snippets in the notebook devoted to this purpose).<\/span><\/li>\n\n\n\n<li><span style=\"font-size: 1rem;\">Exhibit the assessment and interpretation phases (construction of the confusion matrix).<\/span><\/li>\n<\/ul>\n<\/li>\n\n\n\n<li>Include the figure at the end of the notebook in a Markdown cell; feel free to describe it in natural language in a separate Markdown cell.<\/li>\n<\/ol>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>2.5 Exercise 5: Modelling knowledge with graphs (extra work)<\/strong><\/h3>\n\n\n\n<ol class=\"wp-block-list\">\n<li>Run the example exercise on Kaggle (note that the data are private) [HO5:\u00a0<a href=\"https:\/\/www.kaggle.com\/code\/gevargas\/top-graph-algorithms?scriptVersionId=212131331\">K-Notebook<\/a>]<br>see explanation here<\/li>\n\n\n\n<li>Propose another mind map covering:\n<ul class=\"wp-block-list\">\n<li>The type of graph built in HO5<\/li>\n\n\n\n<li>The families of operations applied to graphs<\/li>\n\n\n\n<li>The techniques used to visualize the graphs<\/li>\n<\/ul>\n<\/li>\n\n\n\n<li>Design (DO NOT PROGRAM) an example, including:\n<ul class=\"wp-block-list\">\n<li>A statement of the research question in natural language (assume you have a data collection).<\/li>\n\n\n\n<li>A description of the data collection you will use as input.<\/li>\n\n\n\n<li>A figure of a pipeline in which a graph is used to model a smart-city or smart-energy problem.\n<ul class=\"wp-block-list\">\n<li><span style=\"font-size: 1rem;\">Exhibit the phases 
that show how to build a graph that models the phenomenon under study.<\/span><\/li>\n\n\n\n<li>Exhibit the phases that show how <span style=\"font-size: 1rem;\">graph operations can be used to answer the research question.<\/span><\/li>\n\n\n\n<li>Include the phases that implement the assessment strategy.<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<\/li>\n<\/ol>\n\n\n\n<p>Hand in a PDF document with the results of this assignment <a href=\"https:\/\/drive.google.com\/drive\/folders\/1chm_mk3R6-fkYHDNoeiV7eNnZEYKrdMM?usp=sharing\">here<\/a>.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Instructions for Hands-On Exercises In the 2024 edition of the ICT-Big Data course, you must complete five hands-on exercises (two of which were partially completed during the Lab sessions). You can work alone or in groups of 3-4 people. The larger the group, the higher the expected quality of the answers [&hellip;]<\/p>\n","protected":false},"author":11,"featured_media":0,"parent":0,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"page-templates\/full-width.php","meta":{"footnotes":""},"class_list":["post-521","page","type-page","status-publish","hentry"],"_links":{"self":[{"href":"http:\/\/vargas-solar.com\/data-centric-smart-everything\/wp-json\/wp\/v2\/pages\/521","targetHints":{"allow":["GET"]}}],"collection":[{"href":"http:\/\/vargas-solar.com\/data-centric-smart-everything\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"http:\/\/vargas-solar.com\/data-centric-smart-everything\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"http:\/\/vargas-solar.com\/data-centric-smart-everything\/wp-json\/wp\/v2\/users\/11"}],"replies":[{"embeddable":true,"href":"http:\/\/vargas-solar.com\/data-centric-smart-everything\/wp-json\/wp\/v2\/comments?post=521"}],"version-history":[{"count":35,"href":"http:\/\/vargas-solar.com\/data-centric-smart-everything\/wp-json\/wp\/v2\/pages\/521\/revisions"}],"predecessor-version":[{"id":57
3,"href":"http:\/\/vargas-solar.com\/data-centric-smart-everything\/wp-json\/wp\/v2\/pages\/521\/revisions\/573"}],"wp:attachment":[{"href":"http:\/\/vargas-solar.com\/data-centric-smart-everything\/wp-json\/wp\/v2\/media?parent=521"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}