CS 404/504 Special Topics: Python Programming for Data Science
Course Syllabus
Course GitHub page
https://github.com/avakanski/Fall-2022-Python-Programming-for-Data-Science
Course Description
As organizations increasingly rely on data science projects to improve their functions and operations, the tools for managing such projects have matured as well. This course introduces students to Python tools and libraries that organizations commonly use to manage the different phases in the life cycle of data science projects. The content is divided into four main themes. The first theme reviews the basics of Python programming and extends them with advanced concepts. The second theme focuses on data engineering and covers Python tools for data exploration and preprocessing. The third theme gives an overview of model engineering, including model training, testing, fine-tuning, and selection. The last theme introduces Data Science Operations (DSOps) and covers techniques for model deployment, performance monitoring, and reproducibility of data science projects in a production environment. The course provides hands-on Python programming experience for data science workflow management. Additional work is required for graduate credit.
Learning Outcomes
Upon completion of the course, students should be able to:
- Understand and describe commonly used Python frameworks for life cycle management of data science projects.
- Apply advanced Python tools for data collection, analysis, and visualization.
- Design, validate, and justify the selection of data science models using statistical approaches, data mining, and machine learning methods.
- Implement algorithms for processing tabular, image, and natural language data using Python-based frameworks.
- Understand the main characteristics of existing Python libraries for deployment, continuous delivery, and monitoring of data science projects.
- Deploy data science projects on cloud servers and as web applications.
Course Materials
Textbooks:
- Joel Grus, "Data Science from Scratch: First Principles with Python," 2nd Edition, O'Reilly Media, 2019, ISBN: 9781492041139.
- Chip Huyen, "Designing Machine Learning Systems," O'Reilly Media, 2022, ISBN: 9781098107963.
Topics
- A Short History and Current State of Artificial Intelligence
- Python Basics Review: Data Types, Statements, Expressions, Functions and Scope, Exception Coding
- Object-Oriented Programming, Modules and Packages
- Python Decorators, Iterators, Generators, Functional Programming, Callbacks, Closures
- Data Collection, Scraping the Web, NumPy for Array Operations
- Data Manipulation with Pandas, Data Visualization with Matplotlib
- Feature Selection, Feature Engineering
- Databases and SQL: Query Tables, Groups, Ordering, Subqueries
- Scikit-Learn Library for Data Science, Classification, Regression
- Convolutional Neural Networks for Image Classification with Keras and TensorFlow, PyTorch
- Natural Language Processing, Time Series Analysis and Forecasting
- Transformer Networks, Language Models with Hugging Face
- Model Selection, Fine-tuning, AutoML
- Diffusion Models for Text-to-Image Generation
- Configuring Data Science Projects, Git and Version Control
- DSOps Tools, Model Serving in a Production Environment
- Tools for Monitoring Data Science Projects, Data Distribution Shifts
- Reproducible Data Science Projects, Docker Containers, Kubernetes, Virtual Environments
- Deploying Data Science Projects as Web Applications, Deploying Data Science Projects to the Cloud
Prerequisites
CS 212 Practical Python - OR - CS 477/577 Python for Machine Learning - OR - Instructor Permission
The course requires basic programming skills in Python. Knowledge of data science approaches is recommended but not required.
Evaluation Procedure
Quizzes (3) | 30%
Assignments (4) | 60%
Class participation | 10%