Summary of Applied Machine Learning in Python (2017-18)

General Prerequisites:

Undergraduate courses in probability, statistics and linear algebra (equivalent to the Oxford Part A courses in these subjects).

Course Term: Hilary

Course Overview:

Machine learning lies at the intersection of mathematics, statistics, optimisation and computer science, and aims to develop algorithms that learn from data without being explicitly programmed. This course focuses on two main types of machine learning: supervised and unsupervised learning. In supervised learning, algorithms search for patterns in labeled training examples, which allows us to make predictions about unseen data drawn from a similar distribution. In unsupervised learning, the data are not labeled; the algorithms explore patterns and structure to extract meaningful information about the distribution of the data without explicit guidance. In addition, we will explore some novel mathematical methods for learning from sequential data (the signature method) and for data processing.

Course Syllabus:

Lecturer(s):

Dr Andrey Kormilitzin

Learning Outcomes:

Students will have developed a practical knowledge of a range of supervised and unsupervised algorithms for classification, regression and clustering. Additionally, they will learn several data preprocessing, feature extraction and dimensionality reduction methods. The course will be taught in the Python programming language, with practice in the NumPy, Pandas, ESig and scikit-learn packages. By the end of the course, students will be able to develop machine learning pipelines for simple supervised and unsupervised tasks, and to estimate the performance of such pipelines.
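
As a rough illustration of what such a supervised pipeline and its performance estimate might look like, here is a minimal sketch using scikit-learn. The dataset (the built-in Iris data), the choice of logistic regression, and the 5-fold setting are assumptions made for this example only and are not taken from the course materials.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Example data only: the Iris dataset bundled with scikit-learn (an assumption,
# not a dataset named in the course description).
X, y = load_iris(return_X_y=True)

# Pipeline: standardise the features, then fit a logistic regression classifier.
pipeline = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# Estimate performance with 5-fold cross-validation; the scaler is refitted
# on the training part of each fold, so no information leaks from the held-out part.
scores = cross_val_score(pipeline, X, y, cv=5)
mean_accuracy = scores.mean()
print(f"Mean cross-validated accuracy: {mean_accuracy:.3f}")
```

Wrapping the preprocessing step and the classifier in a single pipeline means the whole procedure is evaluated together, which is usually what "estimating the performance of a pipeline" requires.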

Course Synopsis:

• Python machine learning framework: installing and managing the relevant packages.
• Tutorials on NumPy, Pandas and ESig packages for data manipulation.
• Supervised learning: k-nearest neighbors, linear/logistic regression, support vector machines, decision trees (optional: random forest and naïve Bayes)
• Unsupervised learning: k-means clustering, Gaussian mixture models
• Data processing: principal component analysis, signature method
• Regularisation, cross-validation methods and evaluation of learning algorithms
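
To give a flavour of the unsupervised and data processing topics listed above, the following minimal sketch combines principal component analysis with k-means clustering in scikit-learn. The synthetic data, the number of components and the number of clusters are all assumptions chosen for illustration; they do not come from the course.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA

# Synthetic, unlabeled data for illustration: 300 points in 5 dimensions
# drawn around 3 centres (the generated labels are discarded).
X, _ = make_blobs(n_samples=300, n_features=5, centers=3, random_state=0)

# Dimensionality reduction: project the data onto its first two principal components.
X_reduced = PCA(n_components=2).fit_transform(X)

# Clustering: group the projected points into 3 clusters with k-means.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = kmeans.fit_predict(X_reduced)

# Number of points assigned to each cluster.
cluster_sizes = np.bincount(labels, minlength=3)
print(cluster_sizes)
```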