My new article titled: “The Analysis of Optimization Algorithms: A Dissipativity Approach” appeared in the June 2022 issue of the IEEE Control Systems Magazine. In the article, I explain how the behavior of iterative optimization algorithms can be viewed through the lens of energy dissipation. In mechanical systems, total energy is always conserved. So if some of that energy is dissipated as heat, what remains (kinetic plus potential) must decrease over time, and the system will eventually come to rest. Analogously, optimization algorithms can be shown to have an abstract notion of energy that dissipates over time, driving the system to come to rest (convergent behavior). In this blog post, I explain dissipativity theory and how it pertains to algorithm analysis!Continue reading
It was an honor and a privilege to be one of eight invited speakers at the bi-annual NCCR Automation Symposium. The symposium was hosted in-person at ETH Zürich in May 2022.
The theme of this symposium was “Systems Theory of Algorithms”. As explained on the symposium website, many iterative algorithms in optimization, games, and learning can be viewed as dynamical systems evolving over time with inputs (real-time signals, historic data samples, or random seeds), outputs (decision variables or algorithm iterates), and uncertainties (noise or unknown problem parameters). The dynamical systems perspective on such algorithms is key in future automation, AI, and socio-technical systems, where challenges arise from the analysis and design of algorithmsContinue reading
Wordle is a popular daily word game that has taken the world by storm. The rules are simple: Guess the hidden 5-letter “solution word” using at most 6 guesses. Every time you make a guess, the letter tiles of your guess word are colored to provide a hint. Wordle was recently acquired by the New York Times, and remains free, though now at a new website.
What is the best first guess to use in Wordle? Is it always possible to win in 5 moves or fewer? what about 4 moves? How much harder is Hard Mode? In this article, I will answer all these questions and more!Continue reading
I had the honor and privilege to be a thesis opponent for the Ph.D. defense of Martin Heyden at Lund University’s Department of Automatic Control in Sweden. The picture on the right is with Anders Rantzer, Martin Heyden, and Richard Pates (Anders and Richard were Martin’s co-advisors). During my visit to Lund, I also gave a seminar on the topic of robust control and optimization algorithms. Lund University is one of the few places in the world to have a department dedicated solely to controls! Most other universities have controls experts spread across various departments (electrical, mechanical, industrial, aerospace, etc.).Continue reading
Excited to share a new manuscript, by Bryan Van Scoy and me titled “The Speed-Robustness Trade-Off for First-Order Methods with Additive Gradient Noise”. It is now available on arXiv. I also have some slides here.
When solving an optimization problem, a ubiquitous approach is to use an iterative algorithm that finds progressively better candidate solutions until a satisfactory solution is obtained. In cases where the problem instances are very large, such as in machine learning applications, gradient-based iterative methods are the most popular. Think of the task of finding the point of highest elevation in a vast landscape. At each step, you measure the slope of the terrain, and move in the direction of steepest increase. Since we don’t know what the terrain shape will be ahead of time, we want algorithms that work well for a wide variety of terrains. For this reason, a popular way of evaluating algorithms is to look at their worst-case convergence rate on a given class of functions. For example, we might be solving a large linear regression problem, so we know ahead of time that the function we are optimizing is a convex quadratic. Much work has gone into finding algorithms that are “optimal” for a given class of functions. For example, we know that the Heavy Ball method is optimal for convex quadratics and the Triple Momentum Method is optimal for smooth strongly convex functions.Continue reading
I’m excited to share our new preprint on robust stabilization, by Saman Cyrus and me. The title is “Generalized necessary and sufficient robust boundedness results for feedback systems”, and it is now available on arXiv. This work is the culmination of Saman’s PhD and formed the core of his thesis, which he recently successfully defended (!).
Robust stabilization is an old problem, dating back at least 70 years to the seminal work of Lur’e, Zames, and Willems. The basic idea is that a known system G is in feedback with an unknown nonlinearity H constrained to lie in some known set, and we seek sufficient conditions on G that ensure stability of the interconnection. In this work, we looked at the case where the nonlinearity is constrained to lie in a cone (also known as “sector-bounded”). Results of this type include: small-gain, circle, passivity, and extended conicity. It may be possible to extend our ideas to dynamic versions of robust stability (multiplier theory, dissipativity, or integral quadratic constraints), but this is an area for future work.Continue reading
I attended the 2020 American Control Conference, which was supposed to take place in Denver, Colorado, but moved to a 100% online format as a precaution against COVID-19. I normally include a photo of the conference venue with these blog posts, but I don’t think anybody would be interested in seeing my home office setup so I’ll go with no photo this time!
At ACC, I presented a paper with Peter Seiler on the topic of using IQCs for algorithm synthesis. Iterative algorithms like gradient descent can be viewed as robust controllers. In robust control, we have an uncertain dynamical system we would like to control, and we assume the uncertainty is bounded. For example, we could have a mechanical or electrical system consisting of components that are subject to manufacturing tolerances and so we can’t know their exact parameters. Or perhaps their parameters change slowly over time as wear and tear sets in. The goal is to design a feedback controller that works for all systems in this bounded set. This robustness is desirable because if we build 100 identical robots, they won’t actually be identical. Using a robust controller means that we can design a single controller to use in all robots, and it will work even though the robots are not exactly identical. Optimization algorithms are very similar: we want to design an iterative optimization algorithm that works on a variety of different objective functions. The optimization algorithm plays the role of the controller and the objective function plays the role of the uncertain plant. In this paper, we showed how IQC synthesis tools from robust control theory can be used to design robust iterative algorithms.
This year, ACC experimented with a new type of presentation. Rather than the standard 20-minute talk, presenters were given the option to do 3-minute “rapid interactive” talks. I opted for this new format, and since all talks had to be pre-recorded, I’m happy to share with you my 3-minute talk explaining our paper! Here is a link to the video. My short talk was also awarded one of the inaugural “Rapid Interactive People’s Choice Awards”!
Overall, I the conference was very well run and I commend the organizers and staff for getting everything together under such difficult circumstances. I definitely miss the ability to meet with people face-to-face and the spontaneous encounters and discussions that happen at large conferences, but there are also advantages to an online format. There are no logistical/scheduling issues that arise from two talks happening at the same time. And the ability to pause a talk or watch it again later is very nice!
I attended the 2019 Conference on Decision and Control in Nice, France. The photo above is a panoramic view of Nice taken from the top of Castle Hill. My research group had a solid showing at this year’s CDC with 3 papers. In no particular order, here are short summaries of each paper and a links to the slides for each talk. Continue reading
I attended the 2019 INFORMS Annual Meeting in Seattle, WA on Oct 20-23, 2019. INFORMS is the leading international association for Operations Research & Analytics professionals. It’s outside my wheelhouse, but I thought it would be nice to attend since Operations Research does touch on optimization, modeling, and data-driven decision-making, which are topics that interest me.
This was my first time attending an INFORMS conference and I was blown away by the sheer size and scope. There were well over 6,000 attendees and the conference had 100 simultaneous tracks over 4 full days. Compare this to CDC (the flagship annual controls conference), which is about 1/4 of the size.Continue reading
I attended the 57th annual Allerton Conference on Communication, Control, and Computing in Monticello, IL. At the conference, I presented an invited talk in a session organized by Bin Hu, my former postdoc who is now at UIUC — so it was nice to see Bin again!
My talk was largely about the following recent paper (arxiv link) by my student Akhil Sundararajan, my postdoc Bryan Van Scoy, and me. The manuscript should get published at some point in 2020 once we make our final revisions. The paper describes how gradient-based distributed optimization algorithms can be parameterized and analyzed (in terms of their worst-case convergence rate) using simple semidefinite programs. In the paper, we study the case where the underlying graph is possibly time-varying (with uniformly bounded spectral gap), and we characterize the worst-case performance under any adversarially-varying graph. The benefit of this approach is that it enables rapid analysis, which means it also enables efficient design! Our paper also presents an “optimal” algorithm design (optimal in the sense that it has the fastest worst-case convergence rate) and a comparison between this algorithm and existing algorithms from the literature. Finally, we show that our approach likely produces tight bounds because we can construct worst-case sequences of graphs and functions that achieve the bound.