New preprint: Machine teaching!

Hot off the presses, a new paper by me, Xuezhou Zhang, and Xiaojin (Jerry) Zhu. The paper is titled “An Optimal Control Approach to Sequential Machine Teaching” and is now available on arXiv.

The paper is about “machine teaching”, which can be thought of as the dual problem to “machine learning”. The general paradigm is that we are trying to learn a model (described by a set of parameters) that characterizes some process. We observe data samples drawn from this process and the learning algorithm uses the data to refine its model parameters in an iterative fashion. Machine learning is the forward problem: using the given data to update the parameters. Different choices of algorithms can lead to different learning rates, for example. Machine teaching is the scenario where you know the model you’re trying to learn and you know the learner’s algorithm. Your task is to select the sequence of data samples that leads to the fastest possible learning.

Machine teaching comes up when investigating issues of security and robustness of machine learning algorithms. For example: how susceptible is our algorithm to a “data poisoning attack”, where false data is injected to steer our algorithm off course? Machine teaching also has potential use in designing better curricula for classroom teaching tasks.

In this paper, we apply optimal control theory to the simple case of an SGD learner with a squared loss and training examples that are constrained to satisfy a norm bound. We prove that there always exists an optimal trajectory through parameter space that lies in a 2D subspace, regardless of the ambient dimension of the parameter space. In other words, optimal teaching trajectories can be simple even when there are millions of parameters to learn. Also, we show through numerical experiments that the optimal teaching strategy can be quite counter-intuitive in many cases — optimal trajectories often take a circuitous route through parameter space and yet can perform arbitrarily better than obvious heuristics such as a “one-step greedy approach” or a “straight line” approach.