PCDE Course Outline

Introduction

This is an index to the course content and related notes, organized by module. For an overview of the course, see the PCDE Course Overview.

Module 0: Course Orientation

Notes Links

Key Activities

Module 1: Introduction to Python

Notes Links

Learning Outcomes

Key Activities

Module 2: Introduction to NumPy

Notes on Topic

Learning Outcomes

Module 3: Introduction to Pandas

Learning Outcomes

Note Links

Module 4: Databases & Intro to SQL

Module 5: Databases with SQL Statements

Notes on Topic

Key Activities

Outcomes

Module 6: Database Analysis and the Client Server Interface

Notes on Topic

Key Activities

Time Log

Outcomes

Module 7: A Model to Predict Housing Prices

Due Date: 4:29 PM UTC, February 8, 2023. Available for late submission until February 22, 2023.

Notes on Topic

Key Activities

Outcomes

Module 8: ETL, Analysis, Visualization

Due Date: 4:29 PM UTC, February 15, 2023. Available for late submission until February 22, 2023.

Notes on Topic

Key Activities

Outcomes

Module 9: GitHub & Advanced Python

Notes on Topic

Key Activities

Outcomes

Module 10: Networks

Outcomes

Notes on Topic

Module 11: Client Server Architecture

Note Links

Outcomes

In this module these topics will be covered:

The most difficult part of this section is correctly generating secure tokens for authentication. Getting it wrong can mean losing access to data or, worse, leaking data to an attacker.
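As a rough illustration of the token concern above, here is a minimal sketch in Python. It assumes an opaque random bearer token is enough; the module may use a different scheme, such as signed tokens. The function names `generate_token` and `tokens_match` are hypothetical and chosen only for this example.

```python
import hmac
import secrets


def generate_token(num_bytes: int = 32) -> str:
    """Return a URL-safe token built from num_bytes of OS-level randomness."""
    # secrets uses the operating system's cryptographically secure RNG,
    # unlike the random module, which is predictable and unsuitable here.
    return secrets.token_urlsafe(num_bytes)


def tokens_match(presented: str, stored: str) -> bool:
    """Compare two tokens in constant time to avoid timing side channels."""
    return hmac.compare_digest(presented.encode(), stored.encode())


# Hypothetical usage: issue a token, then verify what a client sends back.
issued = generate_token()
print(tokens_match(issued, issued))
print(tokens_match(issued, generate_token()))
```

The two choices that matter are drawing randomness from `secrets` rather than `random`, and comparing with `hmac.compare_digest` rather than `==`, so that response timing does not reveal how much of a guessed token was correct.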

Module 12: Types of Databases & Database Containerization

Due Date

Note Links

Outcomes

Module 13: Change Data Capture (CDC)

Due Date

Note Links

Outcomes

Module 14: Java & Debezium

Due Date

Note Links

Outcomes

Module 15: Advanced Python and Web Applications

Due Date

Activities

Key Activities

Self-Study Activities

Module 16: Transit Data & APIs

Module 16: Due Date

Module 16: Goals

  1. Describe use cases of location-based applications.
  2. Define web development tools for building an application.
  3. Identify key components of Mapbox.
  4. Build a transit data application.

Module 16: Activities

Module 16: Key Activities

Module 16: Self-Study Activities

Module 16: Related Notes

Module 17: Performing ETL Using NiFi

Module 17: Due Date

Module 17: Goals

  1. Identify use cases of ETL in data engineering.
  2. Identify basic elements of NiFi.
  3. Identify other Apache ETL tools and discuss their pros & cons.
  4. Use NiFi to create an ETL pipeline.

Module 17: Activities

Module 17: Key Activities

Module 17: Self-Study Activities

Module 17: Related Notes

Module 18: Platforms for Handling Big Data

Module 18: Due Date

Module 18: Goals

  1. Discuss the importance of big data.
  2. Identify key components of big data.
  3. Identify key components of Hadoop architecture.
  4. Set up Hadoop in a Docker container.
  5. Utilize Hadoop to handle big data.
  6. Identify key components of the Hadoop ecosystem.
  7. Describe applications of Hadoop.
  8. Write a Java program to access the Hadoop database.

Module 18: Activities

Module 18: Key Activities

Module 18: Self-Study Activities

Module 18: Related Notes

Module 19: Processing Big Data with Spark and Airflow

Module 19: Due Date

Module 19: Goals

  1. Describe how scalable solutions address challenges of big data.
  2. Use Docker to create and manipulate Spark images and containers.
  3. Use PySpark to query data.
  4. Identify key components of Spark and Airflow.
  5. Identify use cases for Spark and Airflow.
  6. Create a workflow in Airflow.

Module 19: Activities

Module 19: Key Activities

Module 19: Self-Study Activities

Module 19: Related Notes

Module 20: Introduction to Machine Learning

Module 20: Due Date

Module 20: Goals

  1. Solve advanced mathematical problems.
  2. Describe use cases of linear regression.
  3. Apply gradient descent to reduce error.
  4. Explain the importance of optimization in gradient descent.
  5. Describe applications of Bayes' theorem.
  6. Implement spam detection using Python.
  7. Identify use cases for the Naive Bayes and Gaussian Naive Bayes classifiers.
  8. Implement a Naive Bayes classifier using scikit-learn.
  9. Implement a Gaussian Naive Bayes classifier using scikit-learn.

Module 20: Activities

Module 20: Related Notes

Module 21: Introduction to Reinforcement Learning and Deep Neural Networks

Module 21: Due Date

Module 21: Goals

  1. Discuss applications of machine learning algorithms.
  2. Implement k-means using scikit-learn.
  3. Identify key components of the k-means algorithm.
  4. Discuss use cases for reinforcement learning.
  5. Implement the Quality matrix and the Bellman equation.
  6. Implement the fundamental steps of reinforcement learning.
  7. Identify key components of reinforcement learning and deep neural networks.

Module 21: Activities

Module 21: Related Notes

Module 22: Processing and Streaming Big Data

Module 22: Due Date

Module 22: Goals

  1. Compare applications of the Parquet and Feather formats for reading and writing big data.
  2. Run parallel operations in Dask.
  3. Discuss use cases for parallel computing.
  4. Identify key concepts of Dask and parallel computing.
  5. Discuss use cases of WebSockets.
  6. Stream data through WebSockets.

Module 22: Activities

Module 22: Related Notes

Module 23: Creating a Data Pipeline

Module 23: Due Date

Module 23: Goals

  1. Discuss use cases for JavaScript.
  2. Identify key concepts related to visualization, unstructured data, and JavaScript.
  3. Implement Python tools to visualize word frequency data.
  4. Implement JavaScript tools to visualize word frequency data.
  5. Create a sense-making data pipeline.

Module 23: Activities

Module 23: Related Notes

Module 24: Handling Big Data with Mosquitto, ThingsBoard and Kafka

Module 24: Due Date

Module 24: Goals

  1. Identify key concepts related to Mosquitto.
  2. Discuss use cases for Mosquitto.
  3. Stream live data to ThingsBoard.
  4. Identify key concepts related to ThingsBoard.
  5. Analyze live streaming data using ThingsBoard.
  6. Discuss use cases for ThingsBoard.
  7. Identify key concepts related to Kafka.
  8. Discuss use cases for Kafka.
  9. Construct a web server using Kafka.

Module 24: Activities

Module 24: Related Notes

References

Notes References