NecSys’19 in Chicago


I attended the 8th IFAC Workshop on Distributed Estimation and Control in Networked Systems (NecSys) in Chicago on September 16-17, 2019. NecSys is a small international single-track conference that tends to attract optimization/controls/distributed systems researchers. The last time I attended NecSys was in 2012 (in Santa Barbara) while I was still a postdoc, and I have fond memories of it. It was nice for NecSys to be local again this year, as it meant that I could bring most of my research group along!

At the conference, Bryan presented a poster on his recent distributed optimization work (arxiv link). In the paper, Bryan shows that near-optimal gradient-based distributed optimization algorithms can be designed by alternating between simple gradient steps and gossip steps, and he shows how to tune the ratio of gradient to gossip steps based on the spectral gap of the adjacency matrix of the underlying communication graph (a measure of how “connected” the graph is) to achieve the fastest possible convergence rate.
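
To make the alternating pattern concrete, here is a minimal sketch (my own illustration, not the paper's exact method; in particular, the paper tunes the ratio of gradient steps to gossip steps using the spectral gap, whereas here the ratio is simply an input):

```python
import numpy as np

def gradient_gossip(grad_fns, W, x0, step, num_rounds, grad_steps_per_round):
    """Alternate local gradient steps with gossip (averaging) steps.

    grad_fns : list of local gradient functions, one per agent
    W        : doubly stochastic gossip matrix matching the communication graph
    x0       : array of shape (num_agents, dim), initial local iterates
    """
    x = x0.copy()
    for _ in range(num_rounds):
        for _ in range(grad_steps_per_round):
            for i, g in enumerate(grad_fns):
                x[i] = x[i] - step * g(x[i])  # local gradient step
        x = W @ x                             # gossip step: mix with neighbors
    return x
```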

On a personal note, I had the opportunity at NecSys to reconnect with my former PhD advisor Sanjay Lall and my former postdoc advisor Ben Recht (first photo). The fact that NecSys was local also led to a little road trip back to Madison with my students Mruganka, Saman, and Akhil! We stopped for dinner in “Little India”, a neighborhood in a suburb north of Chicago filled with Indian and Pakistani restaurants and shops. Akhil guided us to some traditional chaat (second photo) and it was delicious!

ACC’19 in Philadelphia



I attended the 2019 American Control Conference in Philadelphia, Pennsylvania. The conference was held in downtown Philadelphia, which is filled with beautiful historical buildings, such as the Union League building, pictured on the right.

This was a conference of firsts. Together with my former postdoc Bin Hu, who is now faculty at UIUC, I co-organized my first ACC workshop! Here is a link to the workshop website. It was a full-day workshop on the interplay between optimization, control, and machine learning. Our morning session focused on how tools from controls, differential equations, and dynamical systems could be used to better analyze and design first-order optimization algorithms, which are a key tool in machine learning. Our afternoon session focused on how ideas from machine learning, reinforcement learning, and complexity theory could be used to address large-scale and computationally challenging control problems. We were lucky to have many great speakers, and our workshop was the most popular one at ACC this year — check out the full room on the right! If you’re interested in these topics, the workshop website linked above contains info for all the talks that took place (speakers, titles, and slides!).

In addition to the workshop, my student Akhil Sundararajan and postdoc Bryan Van Scoy joined me at ACC, and Akhil presented a paper by the three of us titled: “A canonical form for first-order distributed optimization algorithms”. In distributed optimization, the goal is for a number of agents (computer servers, robots, mobile phones, etc.) to jointly solve an optimization problem. The problem is global, but the data collection and processing happens locally. The challenge is to design an algorithm that mediates how each agent processes its local data and what is communicated with the other agents so that each agent eventually learns the solution to the global problem. There have been many such algorithms proposed in recent years, but little insight into what makes one algorithm better than another, or how to go about actually designing such an algorithm. To make matters more confusing, there are many equivalent sets of equations that can be used to describe the same algorithm. So it might happen (and it has!) that two researchers invent the same algorithm without realizing it. Our paper makes an effort to classify distributed algorithms by providing a canonical form that uses as few parameters as possible yet parameterizes a broad class of algorithms (including many existing published algorithms). Our unifying parameterization is a first step toward a more systematic and principled analysis and design of distributed optimization algorithms.
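
To see concretely why equivalent descriptions arise, here is a toy demonstration (my own illustration, not from the paper): if we write a linear algorithm in state-space form, any invertible change of internal variables yields different-looking update equations with identical input-output behavior.

```python
import numpy as np

rng = np.random.default_rng(0)

# A toy "algorithm" written as a linear state-space system:
# internal state s, input u (e.g., a gradient), output y (the iterate).
A = np.array([[0.5, 0.2], [0.0, 0.3]])
B = np.array([[1.0], [0.5]])
C = np.array([[1.0, -1.0]])

# An invertible change of internal variables gives different-looking equations...
T = rng.normal(size=(2, 2))
A2, B2, C2 = T @ A @ np.linalg.inv(T), T @ B, C @ np.linalg.inv(T)

def run(A, B, C, u):
    s = np.zeros((A.shape[0], 1))
    outputs = []
    for uk in u:
        outputs.append((C @ s).item())
        s = A @ s + B * uk
    return np.array(outputs)

# ...but the two descriptions produce identical behavior on any input.
u = rng.normal(size=20)
print(np.allclose(run(A, B, C, u), run(A2, B2, C2, u)))  # True
```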

As a side note, this was the first conference presentation for Akhil (who is also first author on the paper), and he did a fantastic job! (see photo on the right). If you’re interested in seeing Akhil’s slides, you can download them here.

L4DC’19 at MIT

I had the pleasure to attend the inaugural Conference on Learning for Dynamics and Control at MIT. The goal of this new conference is to bring together researchers at the interface between machine learning, control, optimization, and related areas, with a focus on addressing scientific and application challenges in real-time physical processes modeled by dynamical or control systems. The conference format is still in flux, but this year’s version was a single-track format with diverse high-profile speakers presenting on a range of topics including model-predictive control, deep learning, simulating dynamical systems, robotics, traffic systems, and more. There were also two excellent poster sessions. Overall, the conference was a huge success. Over 400 people attended, and all on very short notice! Only about 3 months elapsed between when the idea of this conference was conceived and when the conference actually occurred.

My two highlights of the conference were: (1) Getting to see all the Berkeley folks I knew from my postdoc but haven’t had much of a chance to interact with since then. Many of them are now moving on to exciting careers in academia and industry. (2) Having the opportunity to meet Alexandre Megretski for the first time and chat about robust control and IQCs. Megretski wrote some of the seminal work in these areas and it was an honor to meet him in person.

Kudos to the organizers (Ali Jadbabaie, George Pappas, Pablo Parrilo, Ben Recht, and Melanie Zeilinger) for putting together such an amazing conference on such short notice! I’m looking forward to attending again next year!

8th Midwest Workshop on Control and Game Theory

I recently attended the 8th Midwest Workshop on Control and Game Theory hosted by Washington University in St. Louis. The two-day workshop brought together faculty, postdocs, and grad students in the fields of controls, optimization, game theory, and economics. This was my first time visiting Wash U, and it’s a beautiful campus! The buildings are all built of sandstone, and it reminds me of Queen’s University in Kingston, Canada.

I quite like smaller workshop formats such as this one; they’re so much less hectic than the large conferences I attend, like CDC or ICML. It’s also nice to meet and network with other researchers in the Midwest!

At the workshop, I gave a 30-minute talk on using a robust controls perspective to analyze and design optimization algorithms. The video of my talk is available on YouTube.

CDC’18 in Miami Beach

I attended the 2018 Conference on Decision and Control in Miami Beach, Florida. A welcome change of weather from the looming winter in Madison! The photo above was taken from the terrace of the conference venue.

At the conference, I took part in the Workshop on the Intersections of Machine Learning and Parameter Estimation in Control organized by Travis Gibson, Joseph Gaudio, and Anuradha Annaswamy of MIT. This general field at the intersection of machine learning, optimization, and control seems to be gaining momentum. Indeed, there was another workshop on very similar topics that was held concurrently and it was a shame I couldn’t attend both!

I’ll point out two highlights of the conference. Besides attending sessions on robust control, distributed optimization, decentralized control, and machine learning, I also strayed outside my comfort zone and attended a tutorial session on “Control-Theoretic Methods for Biological Networks”. A growing group of controls researchers have taken interest in biological systems, and I have to say I was both impressed by the progress in this field and humbled by how much there is left to do. Biological systems are incredibly complex and it seems the only way to advance the ball (and to be taken seriously) in this area is to work at the problem from both sides: performing biological experiments in a wet-lab environment and doing some serious mathematical modeling and analysis. I thought Hana El-Samad (UCSF) and Eduardo Sontag (Northeastern) both gave very insightful presentations.


The second highlight was a special session commemorating the life and contributions of Bruce Francis to the field of robust control. Bruce passed away in March of this year and it was nice to see so many people (standing room only) come to celebrate his life and accomplishments at CDC. Bruce’s (arguably) biggest contribution was in developing a state-space solution for the H-infinity control problem. The famous “DGKF” paper is among the most highly-cited papers in the field. The other three co-authors (Doyle, Glover, and Khargonekar) were all in attendance and shared some heartfelt words. The first photo on the right was taken during the CDC special session; John Doyle was recounting some of the initial (mixed) reactions to the DGKF paper, including some choice quotes (see the slide!). The first (and last) time I met Bruce Francis was at the Caltech CDS20 workshop five years ago. Incidentally, all four authors of DGKF were present at that event (see second photo on the right — authors are in the DGKF order!). Although Bruce and I had just met, we bonded over the fact that we are both Canadian, and we both enjoy playing Frisbee! Bruce was also a Professor at the University of Toronto, which is where I did my undergrad. Bruce was soft-spoken, kind, and humble — sentiments echoed by many of his colleagues at CDC this year. He will be missed.

This year’s Bode lecture was delivered by Mark Spong of UT Dallas. The topic was robotics, an area that is not really in my wheelhouse, but I thought Mark did a wonderful job of explaining some basic tools and challenges from the field. What amazed me from the presentation was how simple passivity-based control can be used to create very natural-looking movement patterns that are robust to disturbances. Mark showed examples of a self-righting inverted pendulum and various walking robots, all of which were designed using energy conservation principles. If you’re looking for a well-delivered introductory lecture on robotics, I highly recommend this one! The lecture was recorded and should be posted online here soon.

New preprint: Machine teaching!

Hot off the presses, a new paper by me, Xuezhou Zhang, and Xiaojin (Jerry) Zhu. The paper is titled “An Optimal Control Approach to Sequential Machine Teaching” and is now available on arXiv.

The paper is about “machine teaching”, which can be thought of as the dual problem to “machine learning”. The general paradigm is that we are trying to learn a model (described by a set of parameters) that characterizes some process. We observe data samples drawn from this process and the learning algorithm uses the data to refine its model parameters in an iterative fashion. Machine learning is the forward problem: using the given data to update the parameters. Different choices of algorithms can lead to different learning rates, for example. Machine teaching is the scenario where you know the model you’re trying to learn and you know the learner’s algorithm. Your task is to select the sequence of data samples that leads to the fastest possible learning.

Machine teaching comes up when investigating issues of security and robustness of machine learning algorithms. For example: how susceptible is our algorithm to a “data poisoning attack”, where false data is injected to steer our algorithm off course? Machine teaching also has potential use in designing better curricula for classroom teaching tasks.

In this paper, we apply optimal control theory to the simple case of an SGD learner with a squared loss and training examples that are constrained to satisfy a norm bound. We prove that there always exists an optimal trajectory through parameter space that lies in a 2D subspace, regardless of the ambient dimension of the parameter space. In other words, optimal teaching trajectories can be simple even when there are millions of parameters to learn. Also, we show through numerical experiments that the optimal teaching strategy can be quite counter-intuitive in many cases — optimal trajectories often take a circuitous route through parameter space and yet can perform arbitrarily better than obvious heuristics such as a “one-step greedy approach” or a “straight line” approach.
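
To give a flavor of the setup, here is a minimal sketch of the “one-step greedy” baseline for an SGD learner with squared loss. The norm bounds, the random search over candidate examples, and all constants here are my own illustrative choices; the paper's precise constraint set and the optimal-control solution are in the preprint:

```python
import numpy as np

rng = np.random.default_rng(1)

def greedy_teacher_step(w, w_star, eta, num_candidates=1000):
    """One-step greedy teacher: sample norm-bounded candidate examples and
    pick the one whose SGD update lands closest to the target w_star."""
    best, best_dist = None, np.inf
    for _ in range(num_candidates):
        x = rng.normal(size=w.shape)
        x /= max(1.0, np.linalg.norm(x))    # enforce ||x|| <= 1
        y = rng.uniform(-1.0, 1.0)          # illustrative label bound
        w_next = w - eta * (w @ x - y) * x  # SGD step on the squared loss
        dist = np.linalg.norm(w_next - w_star)
        if dist < best_dist:
            best, best_dist = w_next, dist
    return best

# Teach a 5-dimensional learner toward a known target model.
w, w_star, eta = np.zeros(5), np.ones(5), 0.5
for t in range(30):
    w = greedy_teacher_step(w, w_star, eta)
print(np.linalg.norm(w - w_star))  # distance shrinks, but only greedily
```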

Princeton Day of Optimization 2018

I attended the inaugural Princeton Day of Optimization hosted by Princeton’s department of Operations Research and Financial Engineering (ORFE). The theme of the day was “The intersection of optimization, control, and machine learning,” and the event did not disappoint! There were six keynote talks on a variety of topics related to the theme and a poster session over lunch showcasing the work of many of the graduate student attendees. The event was organized by my colleague Amir Ali Ahmadi and I must say he did a fantastic job. Putting together an event like this is a tremendous amount of work and everything ran smoothly. The Princeton Day of Optimization will be held every two years and I’m looking forward to the next installment! This was my first time seeing the Princeton campus and I took the opportunity to walk around and admire some of its beautiful historic buildings. Pictured right is the Princeton University Chapel.

New preprint: A canonical form for distributed optimization algorithms

We have just posted a preprint titled: “A Canonical Form for First-Order Distributed Optimization Algorithms”. It is available here and on arXiv. This is joint work with my student Akhil Sundararajan and my postdoc Bryan Van Scoy. Our work provides a canonical form for distributed optimization algorithms that will enable more streamlined analysis and design.

There has been much recent interest in the analysis and design of distributed optimization algorithms. In this context, “distributed” means that the computation happens on different computers across a network. Each computing node may have access to different local information. The goal is for each node to arrive at the solution of a global optimization problem that involves everybody’s data, while performing only local computation and communicating only with nearby nodes. All of this should happen without ever needing to store all the data in one place. This paradigm makes sense in cases where the individual nodes are remote and have limited computing capability, or we are trying to conserve power by transmitting and computing as little as possible. Also, we may not want a single node to gather and store all the available information for privacy or security reasons.
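
To make the setup concrete, here is a minimal sketch of decentralized gradient descent, one of the oldest and simplest algorithms of this type (a standard method, shown purely for illustration; the weights and cost functions below are made up):

```python
import numpy as np

def decentralized_gradient_descent(grad_fns, W, x0, alpha, num_iters):
    """Each node averages with its neighbors (one row of W per node),
    then takes a step along its own local gradient."""
    x = x0.copy()
    for _ in range(num_iters):
        grads = np.stack([g(xi) for g, xi in zip(grad_fns, x)])
        x = W @ x - alpha * grads
    return x

# Three nodes on a path graph with Metropolis weights (doubly stochastic).
W = np.array([[2/3, 1/3, 0.0],
              [1/3, 1/3, 1/3],
              [0.0, 1/3, 2/3]])
targets = [0.0, 1.0, 5.0]
grad_fns = [lambda x, t=t: x - t for t in targets]  # f_i(x) = (x - t_i)^2 / 2
x = decentralized_gradient_descent(grad_fns, W, np.zeros((3, 1)), 0.05, 2000)
print(x.ravel())  # all nodes near the global minimizer mean(targets) = 2.0
                  # (with a fixed step, DGD converges to a neighborhood of it)
```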

In the past 5 years, researchers have proposed a variety of solutions to this problem. These methods involve local computation, local memory updates, and communication with nearby nodes, in some prescribed order. Because there are a number of different ways to perform these actions, there is a combinatorial explosion of possible designs, analyses, and outcomes.

Our paper provides a way of streamlining research efforts in this area: we showed that in fact, there aren’t that many different possible distributed optimization algorithms. We were able to parameterize a broad sub-family of algorithms using only 5 scalar parameters. In doing so, we were able to neatly categorize existing methods and in some cases, show that different established algorithms were in fact the same algorithm!

Our hope is that this work will help streamline research efforts in distributed optimization by moving toward a more principled understanding of analysis and design.

New preprint: A general absolute stability theorem

We have just posted a preprint titled: “Unified Necessary and Sufficient Conditions for the Robust Stability of Interconnected Sector-Bounded Systems”. It is available here and on arXiv. This is joint work with my student Saman Cyrus. Our work provides a new absolute stability theorem that generalizes several existing results and enables new analyses, specifically tight certification of exponential stability.

The problem of “absolute stability” is a classical robust control problem that has been studied for over 75 years. The basic setup is that a known system is interconnected with another system that is nonlinear, uncertain, difficult to model, or otherwise troublesome. The goal is to certify “robust” stability. That is: can we guarantee that the interconnected system will be stable for all possible instances of the troublesome system?

The most classical version of this problem is when the troublesome system is “sector-bounded”. This happens for example if our system contains a saturation nonlinearity. Most results for this setting take the form: “if the known system satisfies a certain property, then the interconnected system will be robustly stable”. In other words, they are “sufficient” conditions for stability. Classical examples in the literature include passivity theory, the small-gain theorem, and the circle criterion. There have also been many efforts to generalize or unify these results. For example, researchers have observed that if we use a particular loop transformation, the three aforementioned results are one and the same!
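
For reference, the sector-bounded condition can be written in one line (the standard textbook definition, in my notation):

```latex
% A nonlinearity \varphi lies in the sector [\alpha, \beta] if its graph
% lies between the lines of slope \alpha and \beta through the origin:
\bigl( \varphi(y) - \alpha y \bigr) \bigl( \beta y - \varphi(y) \bigr) \ge 0
\quad \text{for all } y.
```

The unit saturation, for instance, lies in the sector [0, 1].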

Our contribution in this paper was to formulate a general version of the classical robust stability theorem that has the following properties:

  • The theorem is both necessary and sufficient. So if the condition fails, there must exist a nonlinearity that yields an unstable interconnection. This parallels existing necessary conditions from the literature.
  • It is formulated in a semi-inner product space. This distills the result to its essential features. In this general space, there need not be any notion of “time”. So we can do away with the typical requirements of stability, causality, and boundedness.
  • Existing results can be recovered by simply choosing the correct vector space and associated semi-inner product. This means we can recover passivity, small-gain, circle, discrete vs continuous time, and more.
  • We also provide a computationally verifiable linear matrix inequality (LMI) that allows one to efficiently check the conditions of the theorem in the special cases of ordinary stability (a known result) and exponential stability (a new result). A classical example of such an LMI is sketched just after this list.
Finally, and perhaps most importantly, we can apply our general result to new semi-inner product spaces to obtain new and previously unknown results! In the paper, we showcase one such example, where we formulate a new result that gives necessary and sufficient conditions for exponential stability with a prescribed rate of convergence.
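
The LMIs in the paper are stated in the general semi-inner product setting. As a classical point of comparison, here is the circle-criterion-style LMI obtained from a quadratic Lyapunov function and the S-procedure for a linear state-space system in feedback with a sector-bounded nonlinearity (a standard textbook result, written in my own notation):

```latex
% Lur'e system: \dot{x} = Ax - B\varphi(y), y = Cx,
% with \varphi in the sector [0, k].
% Find P \succ 0 and a scalar \lambda \ge 0 such that
\begin{bmatrix}
A^\top P + P A & -P B + \lambda k\, C^\top \\
-B^\top P + \lambda k\, C & -2 \lambda I
\end{bmatrix} \prec 0.
% This combines a quadratic Lyapunov function V(x) = x^\top P x with an
% S-procedure multiplier \lambda on the sector constraint
% \varphi(y)\,\bigl(k y - \varphi(y)\bigr) \ge 0.
```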

ISMP’18 in Bordeaux

I attended the 23rd International Symposium on Mathematical Programming in Bordeaux, France. ISMP is an optimization conference that is held once every three years and it was my first time ever attending. It was also my first time visiting the south of France since my childhood. A nice opportunity to enjoy French cooking and do a bit of sightseeing… On the right is the Église Sainte-Croix de Bordeaux, a Roman Catholic church that dates back to the 11th century.

At the conference, I gave a 30-min talk on our recent work in distributed optimization. Distributed optimization has been a popular recent topic in the field of controls. The basic setup is that a distributed network of agents are trying to jointly optimize a cost function while only having access to a subset of the globally available data. For example, you can imagine a network of robots equipped with sensors, each of which can gather local information, perform computations, and share data with its neighbors. The goal is to have all agents eventually converge on a global model that accounts for all available data. Performing the computation in a decentralized fashion results in a system that is robust to changes such as an agent malfunctioning or communication links randomly failing.

Specifically, we used a robust control framework to analyze and compare the performance of various distributed optimization algorithms. While pure iterative optimization algorithms (with no distributed component) and pure consensus algorithms (information sharing but no optimization) have been extensively analyzed, the theory behind distributed optimization (a combination of optimization and consensus) is not as mature. Our hope is that the sorts of automated tools we are developing will help further our understanding of distributed algorithms, both for analysis and for design. This was joint work with my student Akhil Sundararajan and my postdocs Bryan Van Scoy and Bin Hu. The slides for my talk are available here.

My postdoc Bryan Van Scoy also came to ISMP and gave a talk of his own on an accelerated optimization algorithm he developed called the Triple Momentum Method. It is the fastest known accelerated method in the sense that it has the best worst-case convergence guarantee over the class of smooth, strongly convex functions. For more information and for a link to slides, check out Bryan’s website!
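
For the curious, here is what the method's iterations look like in code. The parameter formulas below are reproduced from memory, so treat them as provisional and consult Bryan's paper for the authoritative statement:

```python
import numpy as np

def triple_momentum(grad_f, L, m, x0, num_iters):
    """Sketch of the Triple Momentum Method for an L-smooth, m-strongly
    convex function (parameter formulas reproduced from memory; verify
    against the paper before relying on them)."""
    kappa = L / m
    rho = 1.0 - 1.0 / np.sqrt(kappa)
    alpha = (1 + rho) / L
    beta = rho**2 / (2 - rho)
    gamma = rho**2 / ((1 + rho) * (2 - rho))
    delta = rho**2 / (1 - rho**2)
    xi_prev = xi = np.asarray(x0, dtype=float)
    for _ in range(num_iters):
        y = (1 + gamma) * xi - gamma * xi_prev  # gradient query point
        xi_next = (1 + beta) * xi - beta * xi_prev - alpha * grad_f(y)
        xi_prev, xi = xi, xi_next
    return (1 + delta) * xi - delta * xi_prev   # output iterate

# Example: a quadratic with m = 1 and L = 100 (so kappa = 100).
d = np.linspace(1.0, 100.0, 10)
x = triple_momentum(lambda z: d * z, L=100.0, m=1.0, x0=np.ones(10), num_iters=200)
print(np.linalg.norm(x))  # should be very small (minimizer is the origin)
```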