Matrix Methods in Machine Learning
University of Wisconsin–Madison
Instructor: Laurent Lessard
This course is an introduction to machine learning that focuses on matrix methods and features real-world applications ranging from classification and clustering to denoising and data analysis. Mathematical topics covered include: linear equations, regression, regularization, the singular value decomposition, and iterative algorithms. Machine learning topics include: the lasso, support vector machines, kernel methods, clustering, dictionary learning, neural networks, and deep learning. Students are expected to have taken a course in vector calculus, such as MATH 234, and have exposure to numerical computing (Matlab, Python, Julia, or equivalent). Appropriate for graduate students or advanced undergraduates.
IMPORTANT: The notes and videos below are from Fall 2017, which was the last time Prof. Lessard taught this course. More recent offerings of the course might use different notes/materials. The notes are provided for your reference only!
- These slides summarize key linear algebra results. (last updated: 10/17/2016)
Part I: Linear algebra and least squares
|Lecture notes||Video||Date||Supplementary reading + materials|
|1.||Introduction and overview||textbook: §1.1–1.2 and 1.6|
|2.||Block matrices and norms||video||textbook: §2.1–2.3|
|3.||Linear independence and rank||video||textbook: §2.5–2.6|
|4.||Subspaces and linear equations||video|
|5.||Least squares||video||textbook: §3.6|
|6.||Vector derivatives and PSD matrices||video|
|7.||Orthogonality and Gram–Schmidt||video||textbook: §4.1|
|8.||LS classification and cross-validation||video|
|9.||Motivating the SVD||video||poly_fitting.m: regression + CV demo|
|10.||Review for first exam||video|
Part II: The singular value decomposition and iterative algorithms
|Lecture notes||Video||Date||Supplementary reading + materials|
|11.||Matrix norms and the SVD||video||textbook: §2.4 and 6.1–6.2|
|12.||SVD geometry and PCA||video||textbook: §6.4|
|13.||Low-rank approx. and pseudoinverse||video||textbook: §6.3 and 6.7|
|14.||Geometry and sensitivity||video||textbook: §6.5–6.6, linear_sensitivity.m|
|15.||Trade-offs and regularization||video|
|19.||Convexity and SVM||video|
|21.||Stochastic gradient method||video|
|22.||Review for second exam||video|
Part III: More machine learning
|Lecture notes||Video||Date||Supplementary reading + materials|
|23.||Max-margin SVM and kernels||video|
|25.||Neural networks and perceptron||video|
|26.||Convolutional neural networks||video||ImageNET papers: 2012, 2014.|
|27.||Unsupervised learning and k-means||video||Dictionary learning paper.|
|28.||Matrix problems and vectorization||video|
Discussion forum:We will use Piazza for class-related discussions. Do not email the course staff directly with questions. Use Piazza or attend office hours!
- On Piazza, anybody can ask questions, answer questions, or edit/improve existing responses. If you’re struggling with a concept, chances are you’re not alone—by posting your question on Piazza, it helps everybody!
- You also have the option to post/respond anonymously if you prefer.
- Do not use Piazza to ask for solutions, do not post solutions, and be nice to each other.
Cheat sheet:I put together some slides summarizing the main linear algebra results covered in class. These slides will be updated as the course progresses. You can download the latest version here (last updated: 10/17/2016)
Textbook:We will make use of the following textbook throughout the class:
- Lars Eldén. Matrix Methods in Data Mining and Pattern Recognition. SIAM, 2007.
Coding:You will be required to write code to solve some of the homework problems. You may write code in Matlab, Python, or Julia. If you’d like to use another language, please ask first.
- Matlab is widely used in industry and academia for scientific data analysis and engineering development. As a UW student, you have access to this software for free via the Campus Software Library. When doing demos in class, I will use Matlab.
- Python is a free, open-source, general-purpose, high-level programming language that is also widely used in industry and in the data science community. For scientific computing purposes (e.g. matrix manipulation, plotting, etc.) you’ll want to install the SciPy stack (NumPy, matplotlib, etc.). The easiest way to get up and running is to install a bundle that contains all the required packages. I recommend Anaconda.
- Julia is a free and open-source language for technical computing. It’s still a rather new language and is used mainly by academics and researchers. The syntax is similar to both Matlab and Python. Pros: it tends to run faster than Matlab, and has a syntax that is much more intuitive for scientific computing compared to Python. Cons: it’s not as widely used so it’s sometimes difficult to find good support.
- Tue Sep 6: first lecture
- Mon Oct 10, 4:50pm–7:00pm: Midterm #1 (Engineering Hall, Room 1800)
- Mon Nov 21, 4:50pm–7:00pm: Midterm #2 (Engineering Hall, Room 1800)
- Thu Nov 24: Thanksgiving (no class)
- Thu Dec 15: last lecture
- Other important dates (e.g. drop deadline): https://registrar.wisc.edu/fall_deadlines_at_a_glance.htm
Tentative syllabus:Here is a list of tentative topics to be covered. We may cover more or less depending on how things go!
- Part I (LS, weeks 1-5): matrix and block-matrix products, norms, linear independence, least squares regression, vector derivatives and PSD matrices, subspaces and orthogonality, Gram-Schmidt, classification using least-squares, cross-validation, overfitting. Midterm #1.
- Part II (SVD, weeks 6-11): singular value decomposition, PCA, low-rank approximation, pseudoinverse, multi-objective trade-offs, pareto curves, alternate norm choices, regularization, convexity, lasso, support vector machines, simple iterative methods, soft thresholding. Midterm #2.
- Part III (ML, weeks 12-15): stochastic gradient descent, max-margin and kernalized SVM, neural networks, deep learning, perceptron, convolutional networks, image identification, unsupervised learning, clustering, dictionary learning, K-means, spectral clustering. Final Project.
Grading:There will be three main components to your grade:
- Homework: 20%. There are weekly homework assignments (roughly 10 total). No assignments will be due the week before exams, or during the final weeks of the semester, to give students time to study and/or work on their projects. Each assignment is graded on a rough scale from 0 to 4. Your lowest homework grade will be dropped.
- Exams: 40%. There will be two midterm exams (20% each). The exam will be taken outside of regular class hours. Dates have already been set, so be sure that you have no scheduling conflicts and plan accordingly.
- Project: 40%. There will be a group project due at the end of the semester. More information here.
- Homework assignments turned in late will not be accepted.
- It is your responsibility to ensure that you are available and present for the exam. The date and time are already set, so plan accordingly!
- Exceptions will be made to the rules above in order to accommodate special circumstances. This includes family or medical emergencies, religious observances, and documented disabilities. If you have a special circumstance and foresee a conflict, please email the instructor as soon as possible to make alternative arrangements.
More information about the homework
- All homework assignments are due by 11:00pm on the date indicated.
- Assignments must be turned in electronically (as PDF or images) via gradescope. For a tutorial on how to do this, please watch this video. If you’ll be scanning documents using a camera phone, read this guide.
- You may use a typesetting program (e.g. Word, LaTeX, IPython notebook,…) or write neatly by hand. Either way, it must end up as a PDF or image files. If coding is required, you may use Matlab, Python, or Julia. If you want to use something else, ask first. Either way, your code must be included as part of your solution. Printing/saving as PDF is usually the easiest way to go.
- It is your responsibility to ensure that what you turn in is legible, there are no pages missing, etc.
- Start each problem on a new page. If a problem has many parts, it’s OK to answer them on a single page.
- Explain your work. This means write in words how you solved the problem, and if the problem asks you to “show” something, we are expecting you to prove it. If code is required, use intuitive variable names, and comment any code you turn in.
- You are encouraged to discuss homework problems with classmates and even work in groups. However, the work you turn in must be your own. If you use any external sources (e.g. the internet) be sure to cite your sources!
OverviewThe goal of the project is to create a tutorial on a topic related to machine learning. You will work in groups of up to 3 students. Your target audience is students that just finished taking the class. Your project should either (1) introduce/explain/demonstrate a machine learning technique that we didn’t cover in class but that builds on the mathematical foundations we laid, or (2) compare and contrast several techniques covered in class but applied to a type of problem we didn’t consider. The project will be in the style of a lab. i.e. teach a new concept and then test the knowledge by posing problems related to the new concepts.
Final reportThe final report should be in PDF format, and it is the only thing you turn in. Although you will undoubtedly write code as part of your project, you must include your code in the report; it will not be turned in separately. We will use Gradescope for the submissions, just as with the Homework assignments. The only difference is that we will use the “group submission” feature, which allows several students to be associated with one submission. Only one submission per group. The report should contain the following parts:
- Title page: This first page should contain the title for the project, and the the names, email addresses, and student numbers of the group members.
- Executive summary (15%): This short section (about 1/2 page) should clearly explain what the project is about, which machine learning techniques are explained and utilized, and what a reader can expect to learn and accomplish if they follow along and work through the examples and problems. This is a very important part of your project, because it’s the first thing the reader sees. Be clear, be concise, and use this opportunity to convey what makes this project interesting/unique/cool.
- Background (30%): This section should explain the context for the main idea behind the project (you may divide this section into multiple subsections if it helps with readability). Start with a description of the problem that will be solved, a brief history (with citations) of how the problem came about, why it’s important/interesting, and any other interesting facts you’d like to talk about. Feel free to use images/diagrams/etc. from research papers, the internet, or other sources to help with your explanation, so long as you cite your references. Finally, the background section should also give a mathematical description of the problem, with equations as needed.
- Warm-up (10%): This section should pose a few (3 or 4) short problems. Think of this as a way of testing whether the reader was paying attention while reading the previous section. These can be math problems, or computational in nature. Either way, they should be short-answer problems, not too difficult, and should serve to prepare the reader for the more difficult tasks that lie ahead. For example, if the reader is expected to use a particular Matlab package later on, this would be a good place to introduce that package and give a simple problem to solve. Solutions to the warm-up should be included in the appendix.
- Lab (35%): This is the main section of the report. It should contain 3 to 5 longer tasks that call on the reader to solve, analyze, visualize, and discuss problems related to the main theme of the project. Each problem may be accompanied by additional explanations if necessary (you can also use multiple subsections if it helps). The problems don’t need to be in a list; you may elect to intersperse the problems within your text but be sure to clearly label them so the reader knows what they’re being asked to do. With regards to format, use your best judgement for what makes your report most readable. If you are using external data files/sources, be sure to address and explain where the problem data is coming from (research? the internet? synthetically generated?). Complete solutions to the lab problems should be included in the appendix.
- References: This section contains the references cited in the Background or Lab sections. Points for this section are counted as part of the Background section.
- Appendix: This section contains complete solutions (including code, figures, etc.) for the warm-up problems and for the lab. If virtually the same code is used to solve different problems, there is no need to include the same code twice. Just make a note of what is different. Points for this section are counted as part of the Warm-up/Lab.
- How long should the final report be? Aim for 12-16 pages including appendices. This is not a strict limit, just a guideline. e.g. if you have figures that take up a lot of space, I don’t mind if you exceed the limit.
- If I really want to work by myself, may I? Yes.
- Can I use Python? Can I use Julia? What about another language? Yes.
Topic ideasYou are encouraged to come up with your own ideas for project topics. However, if you’re struggling to come up with something, here are some ideas to get you started:
- Recommender Systems and Collaborative Filtering http://www.slideshare.net/erikbern/collaborative-filtering-at-spotify-16182818
- Matrix Completion http://statweb.stanford.edu/~candes/papers/SVT.pdf
- Nonlinear Dimensionality Reduction http://www.beermapperapp.com
- Support Vector Machines https://www.cs.cmu.edu/~cga/ai-course/svm.pdf
- Deep Learning http://arxiv.org/abs/1406.3332
- Sparse Coding and Dictionary Learning http://www.di.ens.fr/willow/pdfs/icml09.pdf
- Active Learning http://nowak.ece.wisc.edu/next_template.html
- Neuronal Spike Sorting http://www.scholarpedia.org/article/Spike_sorting
- Sparse Methods for Machine Learning http://www.di.ens.fr/~fbach/nips2009tutorial/nips_tutorial_2009_sparse_methods.pdf
- Topic Modeling http://www.cl.uni-heidelberg.de/courses/ss12/topicmodels/intro.pdf
- Independent Component Analysis http://www.stat.ucla.edu/~yuille/courses/Stat161-261-Spring14/HyvO00-icatut.pdf
- Spectral Clustering http://cs.nyu.edu/~dsontag/courses/ml14/notes/Luxburg07_tutorial_spectral_clustering.pdf
- Climate Data Analysis http://www.princeton.edu/~rvdb/tex/LocalWarming/LocalWarming.pdf
- Image Segmentation http://www.cis.upenn.edu/~jshi/GraphTutorial/
- Anomaly Detection http://www.cs.bu.edu/faculty/crovella/paper-archive/sigc04-network-wide-anomalies.pdf
- Deconvolution and Deblurring http://www.mathcs.emory.edu/~nagy/courses/fall06/ID_lecture1.pdf
- Genomic Data Analysis and Classification http://public.lanl.gov/mewall/kluwer2002.html
- Spectral Learning Algorithms for Natural Language Processing http://www.cs.columbia.edu/~scohen/naacl13tutorial/