{"id":21,"date":"2025-04-19T10:53:10","date_gmt":"2025-04-19T10:53:10","guid":{"rendered":"http:\/\/vargas-solar.com\/datsyens\/?page_id=21"},"modified":"2025-04-19T11:04:46","modified_gmt":"2025-04-19T11:04:46","slug":"material","status":"publish","type":"page","link":"http:\/\/vargas-solar.com\/datsyens\/material\/","title":{"rendered":"MATERIAL"},"content":{"rendered":"\n<p><strong>1. Lecture Materials (Slides + Notes)<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Conceptual modules on data architectures, distributed computing, and ML integration<\/li>\n\n\n\n<li>Case studies in smart water management, industrial process control, and environmental monitoring<\/li>\n<\/ul>\n\n\n\n<p><strong>2. Hands-On Labs and Notebooks<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Jupyter\/Colab notebooks for each week (Python-based)<\/li>\n\n\n\n<li>Scenarios including ingestion pipelines, feature engineering, model building, and deployment<\/li>\n<\/ul>\n\n\n\n<p><strong>3. Tools &amp; Platforms<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Programming<\/strong>: Python (Pandas, Dask, Scikit-learn, TensorFlow, Streamlit, MLflow)<\/li>\n\n\n\n<li><strong>Data<\/strong>: SQL, Parquet, Delta Lake, Kafka<\/li>\n\n\n\n<li><strong>Deployment<\/strong>: Docker, FastAPI, Streamlit, Colab, GitHub, AWS\/GCP<\/li>\n\n\n\n<li><strong>Visualization<\/strong>: Plotly, GeoPandas, QGIS<\/li>\n\n\n\n<li><strong>Data Sources<\/strong>: EPA, USGS, Copernicus, OpenStreetMap, Kaggle datasets<\/li>\n\n\n\n<li><strong>Cloud<\/strong>: Google Colab, AWS S3\/EC2 (intro), GitHub Actions<\/li>\n<\/ul>\n\n\n\n<p><strong>4. 
Resources<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Project charter and design templates<\/li>\n\n\n\n<li>GitHub project starter kit<\/li>\n\n\n\n<li>Peer-review and self-assessment rubrics<\/li>\n<\/ul>\n\n\n\n<figure class=\"wp-block-table\"><table class=\"has-fixed-layout\"><thead><tr><td><strong>Week<\/strong><\/td><td><strong>Focus Area<\/strong><\/td><td><strong>Example Datasets<\/strong><\/td><td><strong>Tools \/ Libraries<\/strong><\/td><\/tr><\/thead><tbody><tr><td><strong>1<\/strong><\/td><td>Systems Thinking<\/td><td>\u2013 Conceptual<\/td><td>Miro, Lucidchart, Draw.io<\/td><\/tr><tr><td><strong>1<\/strong><\/td><td>Data Engineering<\/td><td>&#8211;&nbsp;<a href=\"https:\/\/openaq.org\/\">OpenAQ Sensor Data<\/a><br>&#8211;&nbsp;<a href=\"https:\/\/www.epa.gov\/outdoor-air-quality-data\">EPA Real-Time Sensors<\/a><\/td><td>Apache Kafka, MQTT, Pandas<\/td><\/tr><tr><td><strong>1<\/strong><\/td><td>Large Data Wrangling<\/td><td>&#8211; USGS Water Quality&nbsp;<br>&#8211;&nbsp;<a href=\"https:\/\/echo.epa.gov\/tools\/data-downloads\">EPA ECHO Dataset<\/a><\/td><td>Dask, PySpark, Pandas, Jupyter<\/td><\/tr><tr><td><strong>2<\/strong><\/td><td>Spatiotemporal Analytics<\/td><td>&#8211;&nbsp;<a href=\"https:\/\/github.com\/BYU-PRISM\/SCADA-data\">SCADA Logs (simulated)<\/a><br>&#8211;&nbsp;<a href=\"https:\/\/archive.ics.uci.edu\/ml\/datasets\/air+quality\">UCI Air Quality Dataset<\/a><\/td><td>Matplotlib, Seaborn, SciPy<\/td><\/tr><tr><td><strong>2<\/strong><\/td><td>MLOps &amp; Lifecycle<\/td><td>&#8211; Any time-series set&nbsp;<\/td><td>MLflow, Weights &amp; Biases, GitHub<\/td><\/tr><tr><td><strong>3<\/strong><\/td><td>Real-Time Inference<\/td><td>&#8211; Simulated prediction model<\/td><td>FastAPI, Streamlit, Docker<\/td><\/tr><tr><td><strong>3<\/strong><\/td><td>Forecasting at Scale<\/td><td>&#8211; NYC Water Consumption&nbsp;<br>&#8211;&nbsp;<a 
href=\"https:\/\/archive.ics.uci.edu\/ml\/datasets\/Individual+household+electric+power+consumption\">Electricity Load (UCI)<\/a><\/td><td>Prophet, LSTM (TensorFlow\/PyTorch), Dask<\/td><\/tr><tr><td><strong>3<\/strong><\/td><td>Remote Sensing \/ GIS<\/td><td>&#8211; Copernicus Sentinel Data&nbsp;<br>&#8211; Landsat Imagery&nbsp;<br>&#8211; Global Surface Water<\/td><td>GeoPandas, Rasterio, QGIS<\/td><\/tr><tr><td><strong>4<\/strong><\/td><td>Cloud Architectures<\/td><td>&#8211; Any prior datasets in cloud storage<\/td><td>Google Colab, AWS S3\/EC2, GitHub Actions<\/td><\/tr><tr><td><strong>4<\/strong><\/td><td>Capstone<\/td><td>&#8211; Student-selected or recombined from above<\/td><td>All tools above, including Git\/GitHub, MLOps stack, notebooks, Docker<\/td><\/tr><\/tbody><\/table><\/figure>\n","protected":false},"excerpt":{"rendered":"<p>1. Lecture Materials (Slides + Notes) 2. Hands-On Labs and Notebooks 3. Tools &amp; Platforms 4. Resources Week Focus Area Example Datasets Tools \/ Libraries 1 Systems Thinking &ndash; Conceptual Miro, Lucidchart, Draw.io 1 Data Engineering &ndash;&nbsp;OpenAQ Sensor Data&ndash;&nbsp;EPA Real-Time Sensors Apache Kafka, MQTT, Pandas 1 Large Data Wrangling &ndash; USGS Water Quality&nbsp;&ndash;&nbsp;EPA ECHO Dataset 
[&hellip;]<\/p>\n","protected":false},"author":11,"featured_media":0,"parent":0,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"page-templates\/full-width.php","meta":{"footnotes":""},"class_list":["post-21","page","type-page","status-publish","hentry"],"_links":{"self":[{"href":"http:\/\/vargas-solar.com\/datsyens\/wp-json\/wp\/v2\/pages\/21","targetHints":{"allow":["GET"]}}],"collection":[{"href":"http:\/\/vargas-solar.com\/datsyens\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"http:\/\/vargas-solar.com\/datsyens\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"http:\/\/vargas-solar.com\/datsyens\/wp-json\/wp\/v2\/users\/11"}],"replies":[{"embeddable":true,"href":"http:\/\/vargas-solar.com\/datsyens\/wp-json\/wp\/v2\/comments?post=21"}],"version-history":[{"count":1,"href":"http:\/\/vargas-solar.com\/datsyens\/wp-json\/wp\/v2\/pages\/21\/revisions"}],"predecessor-version":[{"id":22,"href":"http:\/\/vargas-solar.com\/datsyens\/wp-json\/wp\/v2\/pages\/21\/revisions\/22"}],"wp:attachment":[{"href":"http:\/\/vargas-solar.com\/datsyens\/wp-json\/wp\/v2\/media?parent=21"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}