MATERIAL

1. Lecture Materials (Slides + Notes)

  • Conceptual modules on data architectures, distributed computing, and ML integration
  • Case studies in smart water management, industrial process control, and environmental monitoring

2. Hands-On Labs and Notebooks

  • Jupyter/Colab notebooks for each week (Python-based)
  • Scenarios including ingestion pipelines, feature engineering, model building, and deployment

3. Tools & Platforms

  • Programming: Python (Pandas, Dask, Scikit-learn, TensorFlow, Streamlit, MLflow)
  • Data: SQL, Parquet, Delta Lake, Kafka
  • Deployment: Docker, FastAPI, Streamlit, Colab, GitHub, AWS/GCP
  • Visualization: Plotly, GeoPandas, QGIS
  • Data Sources: EPA, USGS, Copernicus, OpenStreetMap, Kaggle datasets
  • Cloud: Google Colab, AWS S3/EC2 (intro), GitHub Actions

4. Resources

  • Project charter and design templates
  • GitHub project starter kit
  • Peer-review and self-assessment rubrics

| Week | Focus Area | Example Datasets | Tools / Libraries |
|------|------------|------------------|-------------------|
| 1 | Systems Thinking | Conceptual | Miro, Lucidchart, Draw.io |
| 1 | Data Engineering | OpenAQ Sensor Data; EPA Real-Time Sensors | Apache Kafka, MQTT, Pandas |
| 1 | Large Data Wrangling | USGS Water Quality; EPA ECHO Dataset | Dask, PySpark, Pandas, Jupyter |
| 2 | Spatiotemporal Analytics | SCADA Logs (simulated); UCI Air Quality Dataset | Matplotlib, Seaborn, SciPy |
| 2 | MLOps & Lifecycle | Any time-series set | MLflow, Weights & Biases, GitHub |
| 3 | Real-Time Inference | Simulated prediction model | FastAPI, Streamlit, Docker |
| 3 | Forecasting at Scale | NYC Water Consumption; Electricity Load (UCI) | Prophet, LSTM (TensorFlow/PyTorch), Dask |
| 3 | Remote Sensing / GIS | Copernicus Sentinel Data; Landsat Imagery; Global Surface Water | GeoPandas, Rasterio, QGIS |
| 4 | Cloud Architectures | Any prior datasets in cloud storage | Google Colab, AWS S3/EC2, GitHub Actions |
| 4 | Capstone | Student-selected or recombined from above | All tools above, including Git/GitHub, MLOps stack, notebooks, Docker |