Data Science & Engineering
University of Wisconsin–Madison
Instructor: Laurent Lessard
A hands-on introduction to Data Science using the Python programming language. The course is intended for Freshmen and Sophomores of any major who have limited prior experience in computer programming or data science. The course teaches how to think about data-centric problems in a computational way. Given data from real-world phenomena, students will learn to describe, analyze, and make predictions. To this end, the course will also introduce programming in Python, one of the most widely used programming languages in the data science industry. Topics covered include: how to import, manipulate, summarize, and visualize data of various types; how to perform descriptive analyses such as clustering and principal component analysis; how to perform predictive analyses such as classification and regression; and notions of bias, fairness, and ethics in data science.
Prerequisites: There are no prerequisites for this course. We will provide you with the tools you need and teach you how to use them. Most importantly, we will equip you with the knowledge and ability to continue using what you’ve learned long after you complete the class and for the rest of your career as a student and beyond.
IMPORTANT: The materials below are from Fall 2019-20, which was the last time Prof. Lessard taught this course. More recent offerings of the course might use different notes/materials.
The class is organized into modules.
- Python modules show how to perform specific computations and tasks using Python and/or Jupyter notebooks. These modules consist of lecture slides containing explanations and code snippets.
- Concept modules explain a new concept, typically from a mathematical, geometric, or intuitive perspective, with illustrative examples. These modules consist of lecture slides.
- Case studies apply the concepts covered in previous modules to a realistic use case. This typically includes manipulating and analyzing data sets, and interpreting and visualizing the results. Case Study modules are IPython notebooks (ipynb) and also contain a short quiz at the end.
- Introduction/Survey modules give a high-level overview of what’s to come or a summary of what has been covered thus far. These modules consist of lecture slides.
Part I: Python and Jupyter basics
|1.||Introduction||What is data science? How to learn it?|
|2.||Python 1||Introduction to Python|
|3.||Python 2||Intro to Jupyter|
|4.||Python 3||Data types and functions|
|5.||Python 4||Lists and tuples|
|6.||Python 5||Dictionaries and series|
|8.||Concept 1||Visualizing data using charts|
|9.||Python 7||Data visualization using Python|
|11.||Concept 2||Numerical summaries|
|12.||Python 9||Flow control|
Part II: Unsupervised learning
|13.||Concept 3||Points, distances, embeddings|
|14.||Concept 4||Clustering and K-Means|
|15.||Case study 1||Clustering with the Iris dataset (ipynb)||iris.csv|
|16.||Concept 5||Random numbers|
|17.||Python 10||Pivot tables|
|18.||Concept 6||Data cleaning|
|19.||Python 11||Data cleaning in Python|
|20.||Concept 7||Principal component analysis|
|21.||Case study 2||PCA with Company dataset (ipynb)||companies.zip|
|22.||Case study 3||PCA with MNIST dataset (ipynb)||mnist.npz|
|23.||Survey 1||Unsupervised Learning|
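To give a flavor of the clustering material in Part II, here is a minimal sketch of K-Means (Lloyd's algorithm) in plain Python. It is not taken from the course notebooks: the function name `kmeans`, the random initialization, and the toy points are all assumptions made for this illustration.

```python
import math
import random


def kmeans(points, k, iterations=100, seed=0):
    """Cluster points with Lloyd's algorithm; returns (centroids, labels)."""
    rng = random.Random(seed)
    # Initialization (an assumption of this sketch): pick k distinct data points.
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its assigned points.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == j]
            if members:
                new_centroids.append(
                    tuple(sum(c) / len(members) for c in zip(*members))
                )
            else:
                # Keep the old centroid if no point was assigned to it.
                new_centroids.append(centroids[j])
        # Stop once neither the labels nor the centroids change.
        if new_labels == labels and new_centroids == centroids:
            break
        labels, centroids = new_labels, new_centroids
    return centroids, labels


# Hypothetical toy data: two well-separated groups of 2-D points.
points = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
centroids, labels = kmeans(points, k=2)
print(labels)
```

On data this well separated, the first three points should share one label and the last three the other. Which group gets label 0 depends on the random initialization.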
Part III: Supervised learning
|24.||Concept 8||Classification and K-nearest-neighbors|
|25.||Case study 4||KNN with the Iris dataset (ipynb)||iris.csv|
|26.||Concept 9||Decision trees|
|27.||Case study 5||Decision Trees (ipynb)||DT.zip|
|28.||Concept 10||Overfitting and model selection|
|29.||Case study 6||Overfitting and model selection (ipynb)||(none)|
|30.||Python 12||Array manipulation in NumPy|
|31.||Concept 11||Linear regression|
|32.||Concept 12||Multiple regression|
|33.||Case study 7||Multiple regression with a housing dataset (ipynb)||housing_data.csv|
|34.||Case study 8||Polynomial regression (ipynb)||(none)|
|35.||Concept 13||Biased data|
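The classification idea behind K-nearest-neighbors (Concept 8) can be sketched in a few lines of plain Python. This is an illustrative sketch rather than the code used in the case studies: the function name `knn_predict`, the labels `"A"` and `"B"`, and the toy measurements are made up for this example.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest neighbors."""
    # Sort the training examples by Euclidean distance to the query point.
    by_distance = sorted(
        zip(train_points, train_labels),
        key=lambda pair: math.dist(pair[0], query),
    )
    nearest_labels = [label for _, label in by_distance[:k]]
    # most_common(1) returns the label with the highest vote count.
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical toy data: two features per example, two classes.
train_points = [(1.4, 0.2), (1.5, 0.3), (1.3, 0.2), (4.7, 1.4), (4.5, 1.5), (4.9, 1.5)]
train_labels = ["A", "A", "A", "B", "B", "B"]

prediction = knn_predict(train_points, train_labels, query=(1.6, 0.25), k=3)
print(prediction)
```

Choosing an odd `k` avoids tied votes when there are only two classes. Larger values of `k` smooth the decision boundary at the cost of local detail.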
Part IV: Time series
|36.||Concept 14||Time series|
|37.||Python 13||Manipulating time series|
|38.||Concept 15||Autoregressive models|
|39.||Case study 9||Autoregression (ipynb)||daily-min-temp-melb.csv|
|40.||Survey 2||Supervised learning|
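As a rough illustration of the autoregressive models in Part IV, the sketch below fits a first-order model, x[t] = intercept + slope * x[t-1], by ordinary least squares in plain Python. The function name `fit_ar1` and the synthetic series are assumptions for this example. The course case study uses a real temperature dataset and may use a different model order or library.

```python
def fit_ar1(series):
    """Least-squares fit of x[t] = intercept + slope * x[t-1]."""
    prev, curr = series[:-1], series[1:]
    n = len(prev)
    mean_prev = sum(prev) / n
    mean_curr = sum(curr) / n
    # Slope = covariance(prev, curr) / variance(prev), as in simple linear regression.
    cov = sum((p - mean_prev) * (c - mean_curr) for p, c in zip(prev, curr))
    var = sum((p - mean_prev) ** 2 for p in prev)
    slope = cov / var
    intercept = mean_curr - slope * mean_prev
    return intercept, slope


# Synthetic, noise-free series generated by x[t] = 2 + 0.5 * x[t-1] (made up for illustration).
series = [10.0]
for _ in range(9):
    series.append(2.0 + 0.5 * series[-1])

intercept, slope = fit_ar1(series)

# One-step-ahead forecast from the last observed value.
forecast = intercept + slope * series[-1]
print(intercept, slope, forecast)
```

Because the synthetic series follows the model exactly, the fit should recover an intercept close to 2 and a slope close to 0.5. Real data contains noise, so the recovered coefficients would only approximate the underlying dynamics.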
ECE 204 is intended to be a first course in programming and learning to reason with data. It was the first course of its kind at UW-Madison, created and first taught by Laurent Lessard with help from Teaching Assistants Scott Sievert, Pankaj Kabra, and Shashank Varma. The course is still under active development and continues to evolve.
What skills will you acquire upon completing this class?
- Write working code in Python to import, manipulate, analyze, visualize, and otherwise interact with datasets of various types. If you don’t know what “writing code” even means, you’ll learn that too!
- Perform descriptive analyses to extract, summarize, and interpret salient features from datasets.
- Perform predictive analyses to model trends and make predictions from datasets.
- Apply techniques to identify and clean data that contains missing entries, outliers, or other forms of noise or uncertainty.
- Recognize and evaluate potential issues pertaining to bias, fairness, privacy, and ethics in applying data science techniques. Also understand the limits of what data can do.
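To make the first few outcomes concrete, here is a small sketch that imports a dataset, removes missing entries, flags outliers, and computes a numerical summary, using only the Python standard library. The CSV contents, column names, and the 1.5 × IQR outlier rule are assumptions chosen for this illustration. The course itself may use other tools and criteria.

```python
import csv
import io
import statistics

# Hypothetical CSV contents, made up for illustration.
# A real file would be read with open("some_file.csv") instead of io.StringIO.
raw = """station,temp_c
A,21.5
B,
C,19.0
D,20.5
E,95.0
F,22.0
G,20.0
H,21.0
"""

rows = list(csv.DictReader(io.StringIO(raw)))

# Cleaning step 1: drop rows whose temperature entry is missing.
temps = [float(r["temp_c"]) for r in rows if r["temp_c"].strip()]

# Cleaning step 2: flag outliers with the 1.5 * IQR rule (one common convention).
q1, _, q3 = statistics.quantiles(temps, n=4)
iqr = q3 - q1
low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
cleaned = [t for t in temps if low <= t <= high]
outliers = [t for t in temps if not (low <= t <= high)]

# Numerical summary of the cleaned data.
mean_temp = statistics.mean(cleaned)
median_temp = statistics.median(cleaned)
print(len(rows), len(temps), outliers, mean_temp, median_temp)
```

Whether a flagged value is an error or a genuine extreme is a judgment call that depends on the data source. The rule only points at candidates for closer inspection.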
You will be assessed through a combination of in-class activities, homework assignments, midterm exams, and a final exam. These will largely be hands-on activities where you will complete tasks on your computer and submit your answers electronically.
The only thing you will need is a laptop. All course-related materials and software will be provided.