{"id":220,"date":"2024-04-16T16:13:08","date_gmt":"2024-04-16T16:13:08","guid":{"rendered":"http:\/\/vargas-solar.com\/bigdata-engineering\/?page_id=220"},"modified":"2024-04-17T07:21:31","modified_gmt":"2024-04-17T07:21:31","slug":"executing-map-reduce-programs-on-hadoop-environments","status":"publish","type":"page","link":"http:\/\/vargas-solar.com\/bigdata-engineering\/executing-map-reduce-programs-on-hadoop-environments\/","title":{"rendered":"Executing Map Reduce Programs on Hadoop Environments"},"content":{"rendered":"\n<h2 class=\"wp-block-heading\"><em>Objective<\/em><\/h2>\n\n\n\n<p>The general objective of this exercise is to take the first steps in using a Hadoop environment to execute map-reduce programs (written in Python). This first exercise shows how to install a single-node Hadoop setup on Colab and how to implement and run a map-reduce program.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Material<\/h2>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Google Colab account<\/li>\n\n\n\n<li><a href=\"https:\/\/github.com\/gevargas\/bigdata-management\/blob\/master\/Intro_Hadoop.ipynb\">https:\/\/github.com\/gevargas\/bigdata-management\/blob\/master\/Intro_Hadoop.ipynb<\/a><\/li>\n\n\n\n<li><a href=\"https:\/\/docs.google.com\/document\/d\/1J6W9nOcrw7mXahiU9wt5MD7psvYAk9w2KToMJ6J-H34\/edit?usp=sharing\">Lab: MapReduce in Python using mrjob<\/a><\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\">Description<\/h2>\n\n\n\n<p>The main steps of the exercise are simple. At first, the exercise does not run on a cluster but on a single CPU allocated by default by Google Cloud. 
It helps to concentrate on how the map and reduce functions are specified and how a program is designed in the map-reduce model.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">To Do and To Hand In<\/h2>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Propose a UML component diagram of the Hadoop environment installed on Colab.<\/li>\n\n\n\n<li>Propose a UML component diagram of the two map-reduce word-count programs tested in the lab.<\/li>\n\n\n\n<li>Explain how the first example, which implements a grep operation with a regular expression, is executed.<\/li>\n\n\n\n<li>Explain how the \u201ccount words\u201d program is executed in the example.<\/li>\n\n\n\n<li>What is the role of Google Drive in these examples?<\/li>\n<\/ul>\n\n\n\n<p>Follow the steps in the <a href=\"https:\/\/docs.google.com\/document\/d\/1J6W9nOcrw7mXahiU9wt5MD7psvYAk9w2KToMJ6J-H34\/edit?usp=sharing\">Lab: MapReduce in Python using mrjob<\/a> to execute a map-reduce job on a Hadoop cluster deployed on Google Cloud.<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Propose a UML component diagram of the Hadoop environment configured and tested on Google Cloud.<\/li>\n\n\n\n<li>Propose a UML component diagram of the two map-reduce programs tested in this version of the lab.<\/li>\n\n\n\n<li>What is the role of the cloud in this version of the lab?<\/li>\n<\/ul>\n","protected":false},"excerpt":{"rendered":"<p>Objective The general objective of this exercise is to take the first steps in using a Hadoop environment to execute map-reduce programs (written in Python). This first exercise shows how to install a single-node Hadoop setup on Colab and how to implement and run a map-reduce program. 
Material Description The [&hellip;]<\/p>\n","protected":false},"author":11,"featured_media":0,"parent":0,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"footnotes":""},"class_list":["post-220","page","type-page","status-publish","hentry"],"_links":{"self":[{"href":"http:\/\/vargas-solar.com\/bigdata-engineering\/wp-json\/wp\/v2\/pages\/220","targetHints":{"allow":["GET"]}}],"collection":[{"href":"http:\/\/vargas-solar.com\/bigdata-engineering\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"http:\/\/vargas-solar.com\/bigdata-engineering\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"http:\/\/vargas-solar.com\/bigdata-engineering\/wp-json\/wp\/v2\/users\/11"}],"replies":[{"embeddable":true,"href":"http:\/\/vargas-solar.com\/bigdata-engineering\/wp-json\/wp\/v2\/comments?post=220"}],"version-history":[{"count":6,"href":"http:\/\/vargas-solar.com\/bigdata-engineering\/wp-json\/wp\/v2\/pages\/220\/revisions"}],"predecessor-version":[{"id":249,"href":"http:\/\/vargas-solar.com\/bigdata-engineering\/wp-json\/wp\/v2\/pages\/220\/revisions\/249"}],"wp:attachment":[{"href":"http:\/\/vargas-solar.com\/bigdata-engineering\/wp-json\/wp\/v2\/media?parent=220"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}