{"id":12,"date":"2024-04-06T11:42:41","date_gmt":"2024-04-06T11:42:41","guid":{"rendered":"http:\/\/vargas-solar.com\/bigdata-engineering\/?page_id=12"},"modified":"2024-04-16T20:36:17","modified_gmt":"2024-04-16T20:36:17","slug":"content","status":"publish","type":"page","link":"http:\/\/vargas-solar.com\/bigdata-engineering\/content\/","title":{"rendered":"CONTENT"},"content":{"rendered":"\n<figure class=\"wp-block-table is-style-stripes has-x-large-font-size\"><table><thead><tr><th scope=\"col\"><strong>Date<\/strong><\/th><th scope=\"col\"><strong>Content<\/strong><\/th><th scope=\"col\"><strong>Resources<\/strong><\/th><\/tr><\/thead><tbody><tr><td>D-1<br><\/td><td>By mail: Welcome and instructions for practicals<\/td><td><strong>Important:<\/strong> <a href=\"http:\/\/vargas-solar.com\/bigdata-engineering\/dei-disclaimer\/\">Well-being, D&amp;I and Evaluation<\/a><\/td><\/tr><tr><td>8 Apr<\/td><td><strong>Introduction to Big Data<\/strong> [<a href=\"https:\/\/drive.google.com\/file\/d\/1gDFS-41YlTD4ntrReMEBjnVw-7GO9T8j\/view?usp=sharing\">slides<\/a>]<br><em>Topics:<\/em><br>1. Datafication, the 5Vs model of Big Data, history of platforms&nbsp;&nbsp;<br>2. Big Data enabling architectures <br>&#8212; Evolutionary overview<br>&#8212; Cloud Computing<br> &#8212; As-a-Service model (IaaS, PaaS, SaaS)<br> &#8212; Pay-as-you-go economic model<br> &#8212; Global regions &amp; zones<br><br><em>Labs: <\/em><br>a. <a href=\"https:\/\/docs.google.com\/document\/d\/1cSD4816uFa00e9Df53psqbpL0VdKzlKiywkWKU8QSYU\/edit?usp=sharing\">How to: Create &amp; configure EC2 virtual machine<\/a><a href=\"https:\/\/calculator.aws\/\"> AWS <\/a><br>&#8212; <a href=\"https:\/\/calculator.aws\/\">Pricing Calculator<\/a> (to check your VM's monthly cost)<br>b. 
Case study: Urban Computing (<a href=\"http:\/\/vargas-solar.com\/bigdata-engineering\/what-does-your-city-smell-like\">desk exercise<\/a>)<\/td><td><strong>Videos:<\/strong><br> <a href=\"https:\/\/www.youtube.com\/watch?v=bAyrObl7TYE\">What is Big Data?<\/a><a href=\"https:\/\/www.youtube.com\/watch?v=M988_fsOSWo\"> What is Cloud Computing?<\/a><\/td><\/tr><tr><td>9 Apr<\/td><td><strong>Distributed Storage<\/strong> (<a href=\"https:\/\/drive.google.com\/file\/d\/1gfjQq9tWH2YGjXA9nEavt3q60xCjgf8Y\/view?usp=sharing\">slides1<\/a>, <a href=\"https:\/\/drive.google.com\/file\/d\/1a9-7u4mCybTSi2XFvSg9bnQq8YK2ULUu\/view?usp=share_link\">slides2<\/a>, <a href=\"https:\/\/drive.google.com\/file\/d\/1gjD2F-3m8WmM5EiuwtZle5G_jQdttuLq\/view?usp=sharing\">slides3<\/a>)<br><em>Topics: <\/em><br>&#8212; Preamble: storage and management requirements: hot vs cold data<br>1. From distributed file systems to cluster-based stores: NoSQL systems<br>2. Data management guarantees: CAP model<br>3. Polystores: polyglot persistence solutions<br><br><em>Labs:<\/em><br>a. Case study: Amazon S3 (object stores)<br><a href=\"https:\/\/aws.amazon.com\/getting-started\/hands-on\/backup-files-to-amazon-s3\/\">   &#8212; How to: Store and Retrieve a File with Amazon S3<\/a><br><a href=\"https:\/\/docs.google.com\/document\/d\/1dtD3knCp5HKy2pVnmkSzxroSXYxrUcYf96a1zYKrkGM\/edit?usp=sharing\">   &#8212; Lab: Playing with Amazon S3<\/a>&nbsp;<br>b. 
<strong>HOMEWORK: <\/strong>Case study NoSQL (desk exercise)<br>&#8212; <a href=\"http:\/\/vargas-solar.com\/bigdata-engineering\/polyglot-data-management-on-the-cloud\/\">Mynet Polystore  <\/a><\/td><td><strong>Videos:<\/strong><br><a href=\"https:\/\/youtu.be\/gY090GEDdu8\">&#8211; Understanding Object Storage, Buckets, and S3<\/a><br><a href=\"https:\/\/www.youtube.com\/watch?v=tc2940Zwvyk\">&#8211; Platform Overview &#8211; Data &amp; Storage<\/a> <br><\/td><\/tr><tr><td><strong>10 Apr<\/strong><\/td><td><strong>Visit <a href=\"https:\/\/cc.in2p3.fr\/\">CC-IN2P3<\/a> Computing Centre<\/strong><br>See location on <a href=\"https:\/\/www.google.com\/maps\/place\/IN2P3+Computing+Center\/data=!4m2!3m1!19sChIJ83h2vpfq9EcRVdujC9p9wCY\">map<\/a><br><br>Note: <br><mark style=\"background-color:#fff\" class=\"has-inline-color has-dark-gray-color\">&#8211;<\/mark><mark style=\"background-color:#fff\" class=\"has-inline-color has-white-color\"> <\/mark><mark style=\"background-color:#e6e6e6\" class=\"has-inline-color has-dark-gray-color\"><strong>ID required<\/strong> <strong>!!<\/strong><\/mark><br>&#8211; <mark style=\"background-color:#e6e6e6\" class=\"has-inline-color has-dark-gray-color\"><strong>Backpacks are forbidden<\/strong><\/mark> (you can securely store them at CPE)<br><br><strong>Read before the visit:<\/strong><br>&#8211; <mark style=\"background-color:rgba(0, 0, 0, 0)\" class=\"has-inline-color has-dark-gray-color\">Ch 4. 
Data Center Basics: Building, Power, and Cooling<\/mark> (<a href=\"https:\/\/drive.google.com\/file\/d\/1NcR2oH6wizy2NZ6rUmJVXGHFAWgOH2bU\/view?usp=share_link\">annotated pdf<\/a>) <br>&#8212;&#8211; Book chapter: <strong>The Datacenter as a Computer<\/strong> (see <a href=\"http:\/\/vargas-solar.com\/bigdata-engineering\/ressources\/\">resources<\/a>)<br>&#8211; <a href=\"https:\/\/cacm.acm.org\/magazines\/2022\/6\/261169-our-house-is-on-fire\/fulltext\">Our House Is On Fire: The Climate Emergency and Computing&#8217;s Responsibility<\/a><\/td><td>CC-IN2P3 <strong>virtual tours<\/strong>: <br>&#8211; <a href=\"https:\/\/www.gabrielacoca.fr\/in2p3\/index.html\">Computing Center<\/a>  <br>&#8211; <a href=\"https:\/\/musee.cc.in2p3.fr\/\">Museum<\/a><br><\/td><\/tr><tr><td><strong>15 Apr<\/strong><\/td><td><strong>Zooming in on Distributed File Systems<\/strong> <strong>&amp; Big Data Processing Platform<\/strong> (<a href=\"https:\/\/drive.google.com\/file\/d\/1gf9O2VO_ErW0V6ijKUCAnO8H0GvHbV7h\/view?usp=share_link\" data-type=\"link\" data-id=\"https:\/\/drive.google.com\/file\/d\/1gf9O2VO_ErW0V6ijKUCAnO8H0GvHbV7h\/view?usp=share_link\">slides<\/a>)<br><em>Topics:<\/em> <br>1. Architecture and general principle<br>2. Fault tolerance<br><br><em>Labs:<\/em><br>a. 
<a href=\"https:\/\/drive.google.com\/file\/d\/1gt0ehLqK0M4iRqwOBvYlMGXE2HUGca2m\/view?usp=share_link\">Lab: Creating a Hadoop Cluster using Google Cloud<\/a><br>&#8212;- Git repository: <a href=\"https:\/\/gitlab.com\/oaidel\/cpe\">gitlab.com\/oaidel\/cpe<\/a><\/td><td><strong>Readings:<\/strong><br><a href=\"https:\/\/drive.google.com\/file\/d\/1atQ0jvE-NHolS_WQbprV6ezwI2e_-V25\/view?usp=share_link\">&#8211; HDFS Architecture<\/a> (annotated)<\/td><\/tr><tr><td>16 Apr<\/td><td><strong>Processing Big Data: control flow vs data flow solutions (1\/2)<\/strong> [<a href=\"https:\/\/drive.google.com\/file\/d\/1hGegQ5ikt712uVlx-HTp4VN5TK6F0UqE\/view?usp=sharing\">slides<\/a>]<br><em>Topics:<\/em><strong><br><\/strong>1. Programming model: map-reduce<br>2. Integrating map-reduce into control and data flow solutions<br>3. Control flow data processing<br> &#8212; Program definition &amp; execution<br> &#8212; Control flow execution environments<br><br><em>Labs:<\/em><br>a. Case Study: Hadoop Ecosystem<br><a href=\"https:\/\/docs.google.com\/document\/d\/1px2f2yA3K_09pMUjIlRMgZSPdBtLEzquPKFCt6IXQw0\/edit?usp=sharing\">Create &amp; configure Cloud9 XL environment<\/a><br><a href=\"https:\/\/github.com\/javieraespinosa\/dxlab-sharding\">Lab: Sharding Data Collections with MongoDB<\/a><br><em><strong>Question to think about: What sharding strategy would you propose for graphs?<\/strong><\/em><\/td><td><strong>Readings<\/strong>:<br><a href=\"https:\/\/research.google\/pubs\/pub62\/\">&#8211; MapReduce: Simplified Data Processing on Large Clusters<\/a><br>Videos: <a href=\"https:\/\/www.youtube.com\/watch?v=aReuLtY0YMI\">What is Hadoop?<\/a><br><\/td><\/tr><tr><td>17<br>Apr<\/td><td><strong>Processing Big Data: control flow vs data flow solutions (2\/2)<\/strong> (<a href=\"https:\/\/drive.google.com\/file\/d\/1hMXmBnZsjWgfiDKbD_IpXFFP8KpOHb87\/view?usp=sharing\">slides<\/a>)<br><em>Topics<\/em>:<br>4. 
Data flow data processing<br> &#8212; Program definition &amp; execution<br> &#8212; Processing &amp; data management operators<br> &#8212; Program execution general principle: <br> * Execution DAG<br> * Lazy evaluation<br> &#8212; Data flow execution environments<br><strong>Data Engineering Wrap Up<\/strong> (<a href=\"https:\/\/drive.google.com\/file\/d\/1hP1Rlo3LrsPsK4Pzft0FxtrSq409L0oO\/view?usp=sharing\">slides<\/a>)<br><br><em>Labs:<\/em><br>a. Case Study: Parallel processing with Hadoop and Spark (<a href=\"http:\/\/vargas-solar.com\/bigdata-engineering\/first-steps-into-parallel-programming-models\/\" data-type=\"page\" data-id=\"218\">here<\/a>) &#8211; <strong><em>very long desk and practical exercise!<\/em><\/strong><\/td><td><strong>Readings<\/strong>:<br><a href=\"https:\/\/drive.google.com\/file\/d\/1b-He-K0OfngOIpx1Svu46YmWM4BZ21yY\/view?usp=sharing\">&#8211; Apache Spark: A Unified Engine for Big Data Processing<\/a> (annotated)<\/td><\/tr><tr><td><strong>18 Apr<\/strong><\/td><td>Study time<\/td><td><\/td><\/tr><\/tbody><\/table><\/figure>\n","protected":false},"excerpt":{"rendered":"<p>Date Content Resources D-1 By mail: Welcome and instructions for practicals Important: Well-being, D&amp;I and Evaluation 8 Apr Introduction to Big Data [slides]Topics:1. Datafication, the 5Vs model of Big Data, history of platforms&nbsp;&nbsp;2. Big Data enabling architectures &mdash; Evolutionary overview&mdash; Cloud Computing &mdash; As-a-Service model (IaaS, PaaS, SaaS) &mdash; Pay-as-you-go economic model &mdash; Global regions &amp; zones Labs: a. 
[&hellip;]<\/p>\n","protected":false},"author":11,"featured_media":0,"parent":0,"menu_order":40,"comment_status":"closed","ping_status":"closed","template":"page-templates\/full-width.php","meta":{"footnotes":""},"class_list":["post-12","page","type-page","status-publish","hentry"],"_links":{"self":[{"href":"http:\/\/vargas-solar.com\/bigdata-engineering\/wp-json\/wp\/v2\/pages\/12","targetHints":{"allow":["GET"]}}],"collection":[{"href":"http:\/\/vargas-solar.com\/bigdata-engineering\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"http:\/\/vargas-solar.com\/bigdata-engineering\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"http:\/\/vargas-solar.com\/bigdata-engineering\/wp-json\/wp\/v2\/users\/11"}],"replies":[{"embeddable":true,"href":"http:\/\/vargas-solar.com\/bigdata-engineering\/wp-json\/wp\/v2\/comments?post=12"}],"version-history":[{"count":90,"href":"http:\/\/vargas-solar.com\/bigdata-engineering\/wp-json\/wp\/v2\/pages\/12\/revisions"}],"predecessor-version":[{"id":243,"href":"http:\/\/vargas-solar.com\/bigdata-engineering\/wp-json\/wp\/v2\/pages\/12\/revisions\/243"}],"wp:attachment":[{"href":"http:\/\/vargas-solar.com\/bigdata-engineering\/wp-json\/wp\/v2\/media?parent=12"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}