General Prerequisites:
Basic linear algebra (such as eigenvalues and eigenvectors of real matrices), multivariate real analysis (such as norms, inner products, continuity, differentiability), multivariable calculus (such as Taylor expansions, multivariate differentiation, gradients), basic probability and statistics. All these notions are covered in Prelims Linear Algebra, Analysis, Calculus (Intro and Multivariate), Probability and Statistics. Part A Numerical Analysis is recommended but not essential.
Course Term: Hilary
Course Lecture Information: 16 lectures
Course Weight: 1
Course Level: H
Assessment Type: Written Examination
Course Overview:
Optimisation problems occur naturally in statistics, data analysis, machine learning, and in signal and information processing. Their efficient solution is, for example, the cornerstone of successfully training machine learning models, principal component analysis, matrix completion, and many other tasks. This course covers a few of the typical problem formulations that appear in such large-scale modern models and then focuses specifically on gradient-based algorithms (namely, deterministic and stochastic first-order methods) and their convergence and complexity analyses for (mainly) convex problems.
Learning Outcomes:
Course participants learn how to model applied problems as optimisation problems and gain an understanding of the underpinnings of gradient-based algorithms for the solution of such problems. While many variants of such algorithms have been published in the modern literature, this course will cover the key formulations and analyses on which state-of-the-art methods are based. The algorithms we study introduce the basic tools that will enable participants to understand the strengths and limitations of these and other algorithms more generally.
Course Synopsis:
Mathematical preliminaries: Global and local optimisers, convexity, gradients and subgradients, optimality conditions, convergence rates.
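As a brief illustration of the preliminaries (a standard statement, not taken from the course notes): a differentiable function f on R^n is convex if and only if it lies above all of its tangent planes,
\[
f(y) \;\ge\; f(x) + \nabla f(x)^{\top}(y - x) \qquad \text{for all } x, y \in \mathbb{R}^n,
\]
and a typical convergence-rate result bounds the optimality gap, e.g. \( f(x_k) - f(x^*) = O(1/k) \) for gradient descent on smooth convex problems.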

Steepest descent method and its convergence analysis in the general case, the convex case and the strongly convex case.
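A minimal sketch of the basic method, assuming a fixed step size and a hand-coded quadratic example (the function names and constants below are illustrative, not part of the course material):

import numpy as np

def steepest_descent(grad, x0, step, iters):
    # Fixed-step steepest (gradient) descent: x_{k+1} = x_k - step * grad(x_k).
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(iters):
        x = x - step * grad(x)
    return x

# Example: minimise f(x) = 0.5 * x^T A x - b^T x, whose gradient is A x - b;
# any step size below 2 / lambda_max(A) guarantees convergence here.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])
x_min = steepest_descent(lambda x: A @ x - b, np.zeros(2), step=0.2, iters=200)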

Modelling: least squares, matrix completion, sparse inverse covariance estimation, sparse principal components, sparse plus low rank matrix decomposition, support vector machines, logistic regression, deep learning.
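To indicate the flavour of the modelling part, here is a minimal sketch of one such formulation, regularised logistic regression written as a smooth convex objective (variable names are illustrative):

import numpy as np

def logistic_loss(w, X, y, lam):
    # f(w) = (1/m) * sum_i log(1 + exp(-y_i * x_i^T w)) + (lam/2) * ||w||^2,
    # with labels y_i in {-1, +1}: a smooth convex model for binary classification.
    z = y * (X @ w)
    return np.mean(np.log1p(np.exp(-z))) + 0.5 * lam * np.dot(w, w)

def logistic_grad(w, X, y, lam):
    # Gradient of the objective above.
    z = y * (X @ w)
    s = -y / (1.0 + np.exp(z))
    return X.T @ s / X.shape[0] + lam * w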

Proximal operators and prox-gradient methods.
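A minimal sketch of a prox-gradient iteration, using soft-thresholding (the proximal operator of the l1 norm) on a lasso-type problem; names and step sizes are illustrative:

import numpy as np

def soft_threshold(v, tau):
    # Proximal operator of tau * ||.||_1 (componentwise soft-thresholding).
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def prox_gradient(A, b, lam, step, iters):
    # Proximal gradient method for min_x 0.5 * ||A x - b||^2 + lam * ||x||_1:
    # a gradient step on the smooth part, then the prox of the nonsmooth part.
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        g = A.T @ (A @ x - b)
        x = soft_threshold(x - step * g, step * lam)
    return x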

Accelerating gradient methods: heavy ball method and Nesterov acceleration.
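A minimal sketch of the momentum idea behind both methods, here in the Nesterov form where the gradient is evaluated at an extrapolated point (constant momentum, as used for strongly convex problems; parameters are illustrative):

import numpy as np

def accelerated_gradient(grad, x0, step, momentum, iters):
    # Nesterov-style acceleration: extrapolate, then take a gradient step there.
    # The heavy ball method instead evaluates the gradient at x itself and adds
    # momentum * (x - x_prev) to the update.
    x = np.asarray(x0, dtype=float).copy()
    x_prev = x.copy()
    for _ in range(iters):
        y = x + momentum * (x - x_prev)   # extrapolation (momentum) step
        x_prev = x
        x = y - step * grad(y)
    return x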

Oracle complexity and the stochastic gradient descent algorithm.
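A minimal sketch of stochastic gradient descent for a finite-sum objective (1/n) * sum_i f_i(x), sampling one component gradient per iteration and using a decaying step size; the interface grad_i(x, i) is an assumption made for illustration:

import numpy as np

def sgd(grad_i, n, x0, step0, iters, seed=0):
    # Each iteration uses a single sampled gradient grad_i(x, i) as an unbiased
    # estimate of the full gradient, with step size step0 / k at iteration k.
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float).copy()
    for k in range(1, iters + 1):
        i = rng.integers(n)
        x = x - (step0 / k) * grad_i(x, i)
    return x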

The variance reduced stochastic gradient descent algorithm.
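A minimal sketch of the variance-reduced variant (SVRG-style): each epoch recomputes a full gradient at a reference point and uses it to correct the sampled gradients; the interfaces grad_i and full_grad are assumptions made for illustration:

import numpy as np

def svrg(grad_i, full_grad, n, x0, step, epochs, inner_iters, seed=0):
    rng = np.random.default_rng(seed)
    x_ref = np.asarray(x0, dtype=float).copy()
    for _ in range(epochs):
        mu = full_grad(x_ref)                 # full gradient at the reference point
        x = x_ref.copy()
        for _ in range(inner_iters):
            i = rng.integers(n)
            # Variance-reduced estimate: unbiased, with variance shrinking as x approaches x_ref.
            g = grad_i(x, i) - grad_i(x_ref, i) + mu
            x = x - step * g
        x_ref = x                             # last inner iterate becomes the new reference
    return x_ref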

Dimensionality reduction techniques for large-scale optimisation (the Johnson-Lindenstrauss lemma).
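A minimal sketch of the random projection behind the Johnson-Lindenstrauss lemma: a scaled Gaussian matrix maps n points in d dimensions down to k = O(eps^-2 log n) dimensions while approximately preserving pairwise distances (the function name is illustrative):

import numpy as np

def jl_embed(X, k, seed=0):
    # Rows of X are the data points; S is a d-by-k Gaussian sketch, scaled so
    # that squared distances are preserved in expectation.
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    S = rng.standard_normal((d, k)) / np.sqrt(k)
    return X @ S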

Data sketching: linear least squares and sums of functions (batch stochastic gradient).
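A minimal sketch of data sketching for linear least squares: replace the tall system by a randomly sketched one and solve the smaller problem (a Gaussian sketch is used purely for illustration):

import numpy as np

def sketched_least_squares(A, b, k, seed=0):
    # Approximate argmin_x ||A x - b||^2 by solving the sketched problem
    # argmin_x ||S A x - S b||^2 with a k-by-m Gaussian sketch S, k << m.
    rng = np.random.default_rng(seed)
    m = A.shape[0]
    S = rng.standard_normal((k, m)) / np.sqrt(k)
    x, *_ = np.linalg.lstsq(S @ A, S @ b, rcond=None)
    return x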

Parameter sketching: randomised coordinate descent first-order methods and random subspace methods.
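A minimal sketch of randomised coordinate descent, updating one uniformly sampled coordinate per iteration with a coordinate-wise step size; the interface grad_coord(x, j), returning the j-th partial derivative, is an assumption made for illustration:

import numpy as np

def random_coordinate_descent(grad_coord, x0, steps, iters, seed=0):
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float).copy()
    n = x.size
    for _ in range(iters):
        j = rng.integers(n)                   # pick a coordinate uniformly at random
        x[j] -= steps[j] * grad_coord(x, j)   # update only that coordinate
    return x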