C5.10 Mathematics and Data Science for Development (2018-2019)

Dr Neave O'Clery
General Prerequisites: 

There are no formal prerequisites, although the Networks course (C5.4) would provide an ideal foundation. Some basic knowledge of linear algebra and probability theory will be assumed. Students should also be comfortable with basic data manipulation and matrix operations in Matlab (other software is permissible for project work but will not be covered in classes).

Course Term: 
Course Weight: 
1.00 unit(s)
Course Level: 

Assessment type:

Course Overview: 

This is a challenging contemporary course that requires students to:

  • Read and critically analyse recent academic papers across disciplines
  • Connect theories in the social sciences to novel datasets and mathematical tools
  • Acquire and handle imperfect data, and take an innovative approach to project work
  • Effectively communicate results to a non-expert policy/government audience

Course material, including lecture slides, is available at https://www.maths.ox.ac.uk/node/31297. Please note that you must be logged in to view the content.

Course Synopsis: 

Module 1: Introduction to data and policy-making (2 lectures)
Policy-makers and governments often have to operate under high uncertainty, with limited information and tools. Conventional data collection techniques are expensive, and unavailable in many countries. Traditional tools have also been limited, centred around simple pattern recognition and linear associations. This course will explore a new wave of data, including so-called ‘big data’ derived from mobile sensing technologies, and tools to extract information about complex spatial, social and economic processes.
• Course overview, types of data (e.g., administrative data, mobile phone data, GPS data), data-wrangling, data processing/cleaning and data classifications/ontologies;
• What is critical to impact policy-making? Key principles for research design, communication and visualisation.

Module 2: Information spreading in social networks (4 lectures)
Social networks, where nodes represent individuals and edges represent friendship (or other) ties, have been the focus of social and network scientists for many decades. Such networks are particularly important in an economic and development setting, as individual behaviour is known to be affected by peers in a large number of contexts. This is particularly true in environments where low access to external information means that personal trust is at a premium and social networks play a key role in information spread. A number of network characteristics play a critical role in social network analysis, particularly in the context of information spread. For example, homophily, whereby similar individuals tend to cluster in the network, has implications for spreading processes and can lead to poverty traps. Recent applications include a network model for the diffusion of information about a microfinance scheme in rural villages. This example can be seen as a general model for the adoption of a new product or idea, a question of critical importance to policy-makers. More generally, however, scaling up the study of social networks to larger populations requires alternatives to survey-based data collection. Some of the most promising avenues for large-scale construction of social networks are mass social media and mobile communications data.
• Network basics: adjacency matrices, node degree, degree distributions, paths, node centralities (betweenness centrality, Katz centrality, eigenvector centrality, PageRank);
• Network models: Erdős-Rényi, scale-free, small-world, generative models (Watts-Strogatz, configuration model, Barabási-Albert (BA) model etc.);
• Social networks: homophily, clustering coefficient, assortativity, entropy, inferring social networks from social media and mobile phone data, relationship of network structure to social and economic indicators, segregation and migration patterns, political views etc.;
• Network dynamics: random walks and diffusion, mathematical connection to communities and centralities, robustness/sensitivity of dynamics to network structure and node/edge removal, applications including microfinance in rural communities.
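
As an illustration of how the adjacency matrix, node degree and a random-walk centrality fit together, the following is a minimal sketch of PageRank computed by power iteration. It is written in Python purely for illustration (classes use Matlab), and the function names, the damping value 0.85 and the four-node toy network are assumptions rather than course material.

```python
def degrees(adj):
    """Row sums of an adjacency matrix given as a list of lists."""
    return [sum(row) for row in adj]


def pagerank(adj, damping=0.85, tol=1e-12, max_iter=1000):
    """PageRank by power iteration on the random-walk transition matrix.

    With probability `damping` the walker follows a random edge;
    otherwise it jumps to a node chosen uniformly at random.
    """
    n = len(adj)
    out = degrees(adj)
    rank = [1.0 / n] * n
    for _ in range(max_iter):
        new = [(1.0 - damping) / n] * n
        for i in range(n):
            if out[i] == 0:
                # Dangling node: spread its rank uniformly over all nodes.
                for j in range(n):
                    new[j] += damping * rank[i] / n
            else:
                for j in range(n):
                    if adj[i][j]:
                        new[j] += damping * rank[i] * adj[i][j] / out[i]
        if sum(abs(a - b) for a, b in zip(new, rank)) < tol:
            return new
        rank = new
    return rank


# Hypothetical toy network (undirected): node 0 is connected to everyone.
A = [
    [0, 1, 1, 1],
    [1, 0, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
]
deg = degrees(A)
pr = pagerank(A)
```

In this toy network the best-connected node also receives the largest PageRank score, and the two structurally equivalent nodes (1 and 2) receive equal scores.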

Module 3: Urban maps and flows (4 lectures)
Cities are the drivers of economic growth, fuelled by rapid rural-to-urban migration around the world. In some sense, urban centres thrive against the odds, as over-crowding and congestion costs are offset by the high diversity and productivity of both firms and people. We will focus on the dual policy/development challenges of minimising these congestion costs in terms of transport and mobility, while also understanding and maximising the attributes which drive the success of cities. Specifically, today we have unparalleled real-time information on how and when people move, including open data from Uber, Google Maps and mobile phone providers. Combined with street networks, this information enables us to accurately model and predict the movement patterns of urban dwellers, and to extract information on the physical extent of a city beyond administrative boundaries. Beyond characterising physical networks, it has been observed that the hyper-connectivity of cities (people, places, firms etc.), which grows super-linearly with size, manifests as a power law scaling relationship between city population and a large number of factors such as income, crime, innovation and road density. The exponent of this relationship characterises the propensity of cities, irrespective of scale, to adjust each factor according to population, and is remarkably stable across cities.
• Street networks: characterising and comparing the shape of a city (e.g., angles and street lengths, centralities, entropy measures), models for optimal ride-sharing (e.g., Uber-pool) and road congestion;
• Public transport networks: analysing flows using edge and path metrics, core-periphery structure, multi-layer networks;
• Mobility models: analysis of patterns of urban activity using Oyster card and mobile phone data, intra- and inter-city mobility models including the gravity law and radiation model;
• City scaling laws: scale-free distributions, power laws, exponents.
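
A power law scaling relationship Y = Y0 * N^beta between a city indicator Y and population N becomes a straight line on log-log axes, so the exponent beta can be estimated by ordinary least squares on the logged data. The sketch below illustrates this in Python under stated assumptions: the function name is hypothetical, and the populations and the values Y0 = 2.0 and beta = 1.15 are synthetic numbers invented for the example, not empirical results.

```python
import math


def fit_power_law(populations, values):
    """Least-squares fit of log(values) = log(y0) + beta * log(populations)."""
    xs = [math.log(p) for p in populations]
    ys = [math.log(v) for v in values]
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    beta = sxy / sxx                       # slope on log-log axes
    y0 = math.exp(y_mean - beta * x_mean)  # prefactor from the intercept
    return y0, beta


# Synthetic, noise-free data generated from an assumed power law.
pops = [1e4, 5e4, 2e5, 1e6, 8e6]
true_y0, true_beta = 2.0, 1.15
vals = [true_y0 * p ** true_beta for p in pops]

y0_hat, beta_hat = fit_power_law(pops, vals)
```

Because the synthetic data contain no noise, the fit recovers the assumed exponent up to floating-point error; with real data the estimate would come with uncertainty that should be reported alongside it.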

Module 4: Networks as landscapes for economic diversification processes (3 lectures)
How do nations, cities and regions develop their economies? At the intersection of economics, development and geography, a leading theory of economic growth focuses on the diversification processes of places into new economic activities that are ‘close’ to their current capability base. In essence, a place is constrained by what it can already do or make, and combines existing capabilities to move into complex economic activities via a combinatoric evolutionary process. It is possible to model these processes using networks derived from administrative data (e.g., trade, employment, patents, firm transactions). The resulting metrics are used both to predict growth patterns at a detailed level (not possible via many traditional economic models) and to help regional policy-makers identify potential new industries. These network models form a basis for industrial policy within a large number of international development organisations including the World Bank, IMF and OECD.
• Brief overview of traditional ways to think about growth and industrial progress, economic complexity metrics using bipartite networks constructed from export data, method of reflections, eigenvalue methods for iterative systems;
• Industry network construction using a variety of methods and sources representing different economic mechanisms, network-based models for industrial diversification, model testing and validation including statistical tests for comparing distributions;
• Introduction to econometrics including ordinary least squares linear regression analysis, multivariate regression analysis, network and spatial regression models, discussion of approaches to assess directionality/causality.
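
The method of reflections listed above can be sketched on a binary country-product matrix M, where M[c][p] = 1 if country c exports product p. Starting from diversity (row sums) and ubiquity (column sums), each iteration replaces a country's score with the average score of its products, and vice versa. The Python code below is a minimal sketch following the iteration described in Hidalgo & Hausmann (2009); the function name and the 3x3 toy matrix are hypothetical, and the code assumes every country exports at least one product and every product has at least one exporter.

```python
def method_of_reflections(M, n_iter=1):
    """Iterate the method of reflections on a binary country-product matrix.

    Returns (diversity, ubiquity, kc, kp), where kc and kp are the
    country and product scores after `n_iter` reflections.
    Assumes no all-zero rows or columns in M.
    """
    n_c, n_p = len(M), len(M[0])
    diversity = [sum(row) for row in M]                                # k_{c,0}
    ubiquity = [sum(M[c][p] for c in range(n_c)) for p in range(n_p)]  # k_{p,0}
    kc = [float(d) for d in diversity]
    kp = [float(u) for u in ubiquity]
    for _ in range(n_iter):
        # Country score: average previous score of the products it exports.
        kc_new = [
            sum(M[c][p] * kp[p] for p in range(n_p)) / diversity[c]
            for c in range(n_c)
        ]
        # Product score: average previous score of the countries exporting it.
        kp_new = [
            sum(M[c][p] * kc[c] for c in range(n_c)) / ubiquity[p]
            for p in range(n_p)
        ]
        kc, kp = kc_new, kp_new
    return diversity, ubiquity, kc, kp


# Hypothetical nested toy matrix: country 0 exports everything,
# country 2 exports only the most ubiquitous product.
M = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
]
diversity, ubiquity, kc1, kp1 = method_of_reflections(M, n_iter=1)
```

After one reflection, a country's score is the average ubiquity of its exports, so the most diversified country in this toy example also has the lowest average ubiquity, meaning that it makes the rarest products.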

Module 5: Machine learning for development and policy (3 lectures)
Machine learning has emerged as a powerful generalisation of traditional linear regression techniques, capable of handling large datasets with many variables and non-linear relationships. It has only recently been applied to policy questions; well-known examples include an Oxford-led study on the susceptibility of jobs to automation, and the prediction of criminal re-offending rates. Here we introduce two types of machine learning algorithm: generalised linear models as an extension of the linear models covered earlier in the course, and decision trees and the ‘random forest’ algorithm. Finally, we will discuss current challenges for big data and machine learning in policy-making.
• Generalised linear models, regularisation and the Lasso method, application to housing prices;
• Decision trees, bootstrapping/bagging, random forest algorithm;
• Application of random forest and variable importance methods to rural crop productivity, challenges and open questions related to using big data and machine learning for policy.
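
To make the idea of Lasso regularisation concrete, the following Python sketch minimises (1/(2n)) * ||y - Xb||^2 + lam * ||b||_1 by cyclic coordinate descent with soft-thresholding. This is one standard way of solving the Lasso problem, not necessarily the approach taken in lectures; the function names and the four-observation dataset are hypothetical, and the data are synthetic (no intercept, orthogonal columns) so that the result can be checked by hand.

```python
def soft_threshold(rho, lam):
    """Shrink rho towards zero by lam; values within [-lam, lam] become 0."""
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_sweeps=100):
    """Minimise (1/(2n)) * ||y - X b||^2 + lam * ||b||_1 (no intercept)."""
    n, d = len(X), len(X[0])
    b = [0.0] * d
    for _ in range(n_sweeps):
        for j in range(d):
            z = sum(X[i][j] ** 2 for i in range(n)) / n
            rho = 0.0
            for i in range(n):
                pred = sum(X[i][k] * b[k] for k in range(d))
                # Partial residual: add back feature j's own contribution.
                rho += X[i][j] * (y[i] - pred + X[i][j] * b[j])
            rho /= n
            b[j] = soft_threshold(rho, lam) / z
    return b


# Synthetic data: feature 0 has a strong effect, feature 1 a very weak one.
X = [[1, 0], [0, 1], [-1, 0], [0, -1]]
y = [3.0, 0.1, -3.0, -0.1]

b_ols = lasso_coordinate_descent(X, y, lam=0.0)    # no penalty
b_lasso = lasso_coordinate_descent(X, y, lam=0.1)  # with L1 penalty
```

With no penalty the fit reproduces the least-squares coefficients, while the L1 penalty shrinks the strong coefficient slightly and sets the weak one exactly to zero. This is the variable-selection behaviour that distinguishes Lasso from unregularised regression.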

Reading List: 

Module 2

  • Onnela, J. P., Saramäki, J., Hyvönen, J., Szabó, G., Lazer, D., Kaski, K., ... & Barabási, A. L. (2007). Structure and tie strengths in mobile communication networks. Proceedings of the National Academy of Sciences, 104(18), 7332-7336.
  • Eagle, N., Macy, M., & Claxton, R. (2010). Network diversity and economic development. Science, 328(5981), 1029-1031.
  • Banerjee, A., Chandrasekhar, A. G., Duflo, E., & Jackson, M. O. (2013). The diffusion of microfinance. Science, 341(6144), 1236498.

Module 3

  • Santi, P., Resta, G., Szell, M., Sobolevsky, S., Strogatz, S. H., & Ratti, C. (2014). Quantifying the benefits of vehicle pooling with shareability networks. Proceedings of the National Academy of Sciences, 111(37), 13290-13294.
  • Simini, F., González, M. C., Maritan, A., & Barabási, A. L. (2012). A universal model for mobility and migration patterns. Nature, 484(7392), 96-100.
  • Bettencourt, L. M., Lobo, J., Helbing, D., Kühnert, C., & West, G. B. (2007). Growth, innovation, scaling, and the pace of life in cities. Proceedings of the National Academy of Sciences, 104(17), 7301-7306.

Module 4

  • Hidalgo, C. A., Klinger, B., Barabási, A. L., & Hausmann, R. (2007). The product space conditions the development of nations. Science, 317(5837), 482-487.
  • Hidalgo, C. A., & Hausmann, R. (2009). The building blocks of economic complexity. Proceedings of the National Academy of Sciences, 106(26), 10570-10575.

Module 5

  • Frey, C. B., & Osborne, M. A. (2017). The future of employment: how susceptible are jobs to computerisation? Technological Forecasting and Social Change, 114, 254-280.
  • Athey, S. (2017). Beyond prediction: Using big data for policy problems. Science, 355(6324), 483-485.