Aleksandar (Alex) Vakanski

CS 488/588: Applied Data Science with Python

Course Website and GitHub Repository

Course website: https://fall-2024-applied-data-science-with-python.readthedocs.io/en/latest/

GitHub repository: https://www.github.com/avakanski/Fall-2024-Applied-Data-Science-with-Python/blob/main/README.md

Links to the course materials from previous years: Fall 2023, Fall 2022

Course Syllabus


Course Description

The course introduces students to Python tools and libraries that are commonly used by organizations for managing the various phases in the life cycle of data science projects. The content is divided into four main themes. The first theme reviews the fundamentals of Python programming. The second theme focuses on data engineering and explores Python tools for data collection, exploration, and visualization. The next theme covers model engineering and includes topics related to model design, selection, and evaluation for image processing, natural language processing, and time series analysis. The last theme introduces Data Science Operations (DSOps) and encompasses techniques for model serving, performance monitoring, diagnosis, and reproducibility of data science projects deployed in production. Throughout the course, students will gain hands-on experience with various Python libraries for data science workflow management. Additional work is required for graduate credit.

Learning Outcomes

Upon completion of the course, students should be able to:

  1. Work proficiently with Python frameworks commonly used for managing the life cycle of data science projects.
  2. Develop pipelines for integrating data from multiple sources, designing predictive models, and deploying the models.
  3. Apply Python tools for data collection, analysis, and visualization, such as NumPy, Pandas, Matplotlib, and Seaborn, to real-world datasets.
  4. Implement machine learning algorithms for image processing, natural language processing, and time series analysis using Python-based frameworks, such as Scikit-Learn, Keras, TensorFlow, and PyTorch.
  5. Understand the principles of model selection and evaluation, including hyperparameter tuning, cross-validation, and regularization.
  6. Understand the primary characteristics of current Python libraries for deployment, continuous integration, and monitoring of data science projects.
  7. Deploy data science projects as web applications using Flask, FastAPI, and Django, and to cloud servers using Microsoft's Azure platform.

Course Materials

There are no required textbooks for this course. All course materials will be provided by the instructor.

Topics

  • A Short History and Current State of Artificial Intelligence
  • Python Basics Review: Data Types, Statements, Files, Expressions, Functions, Iterators, Generators
  • Object-Oriented Programming, Exceptions, Modules, Packages
  • NumPy for Array Operations, Data Manipulation with Pandas, Data Visualization with Matplotlib and Seaborn
  • Data Exploration and Preprocessing
  • Databases and SQL
  • Scikit-Learn Library for Data Science, Ensemble Models
  • Artificial Neural Networks for Classification and Regression
  • Convolutional Neural Networks with Keras-TensorFlow and PyTorch
  • Model Selection, Hyperparameter Tuning, Callbacks
  • Natural Language Processing with Keras-TensorFlow and Hugging Face
  • Transformer Networks
  • Diffusion Models for Text-to-Image Generation
  • Large Language Models, Fine-tuning a Pretrained Model
  • Introduction to Data Science Operations (DSOps), Model Serving in a Production Environment
  • Deploying Projects as Web Applications, Deploying Projects to the Cloud
  • Reproducible Data Science Projects, Docker Containers, Kubernetes

Prerequisites

The course requires basic programming skills in Python. Prior knowledge of data science methods is advantageous but not mandatory.

Evaluation Procedure

Component              Weight
Quizzes (3)            30%
Assignments (6)        60%
Class participation    10%