Neural networks, and machine learning in general, have proven to be extremely powerful tools for allowing machines to execute complex tasks. To make this possible, large amounts of data must be available for training. While in past decades such amounts of data were available only to a few specific institutions, such as CERN in Geneva, this is no longer the case. Many modern companies, such as Amazon and Facebook, are able to exploit the full power of machine learning thanks to their huge data banks.
The size of modern neural networks, together with the size of the data sets used to train them, makes training infeasible in reasonable time without some form of parallel computing [2,3]. It is therefore crucial to distribute the optimization effort over multiple workers, allowing many more operations to be executed in parallel and speeding up training. Many different algorithms have been proposed for this task [5,6,7], but none of them has been demonstrated to be robust for arbitrarily large numbers of parallel processes.
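One widely studied family of such algorithms is local SGD with periodic parameter averaging (see the Local SGD and Elastic Averaging SGD papers in the reading list below). As a minimal sketch of the idea, the snippet below simulates the workers sequentially with NumPy on a toy least-squares problem; the function name `local_sgd` and all hyperparameters are illustrative assumptions, not any particular published implementation.

```python
import numpy as np

def local_sgd(shards, w0, lr=0.1, local_steps=5, rounds=20):
    """Simulate local SGD: each worker runs `local_steps` of gradient descent
    on its own data shard, then all worker models are averaged (one
    communication round). Workers are simulated sequentially here."""
    workers = [w0.copy() for _ in shards]
    for _ in range(rounds):
        for w, (X, y) in zip(workers, shards):
            for _ in range(local_steps):
                grad = 2 * X.T @ (X @ w - y) / len(y)  # least-squares gradient
                w -= lr * grad                          # in-place local update
        mean_w = np.mean(workers, axis=0)               # synchronize by averaging
        workers = [mean_w.copy() for _ in workers]
    return workers[0]

# Toy problem: recover w_true = [2, -1] from noisy linear data, split over 4 workers.
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 2))
y = X @ np.array([2.0, -1.0]) + 0.01 * rng.normal(size=400)
shards = [(X[i::4], y[i::4]) for i in range(4)]
w = local_sgd(shards, np.zeros(2))
```

The key trade-off the sketch exposes is `local_steps`: more local steps mean fewer communication rounds, but the worker models drift further apart between averaging steps.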
In this master thesis the student will be introduced to the modern state-of-the-art literature on parallel training of neural networks and to the technological challenges the algorithmic machinery faces when scaled up to huge scales. The aim of the project is to explore the effects on neural network training when the dataset is partitioned in different ways among the workers. Building on previous work on stochastic optimization (evolutionary algorithms, stochastic local search, …) and hyperparameter optimization, new algorithms for orchestrating parallelism will also be pursued. Open datasets will be used in this study. The work will be done in close collaboration with Graphcore, and preferably the student will do a substantial part of their work at the Graphcore Oslo office, but other arrangements can also be considered. Summer internships preceding the master thesis may be offered to interested students.
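The "different ways" of partitioning the dataset among workers can be illustrated with two extreme schemes: a uniform random (roughly IID) split, and a sort-by-label split in which each worker sees only a few classes. The helper names `shard_iid` and `shard_by_label` below are hypothetical; the snippet only shows how the label mix per worker changes with the split.

```python
import numpy as np

def shard_iid(labels, n_workers, seed=0):
    """Shuffle indices uniformly, so each worker sees roughly the full label mix."""
    idx = np.random.default_rng(seed).permutation(len(labels))
    return np.array_split(idx, n_workers)

def shard_by_label(labels, n_workers):
    """Sort by label before splitting: each worker sees only a few classes,
    an extreme non-IID partition that typically makes averaging-based
    parallel training harder."""
    return np.array_split(np.argsort(labels, kind="stable"), n_workers)

labels = np.repeat(np.arange(4), 25)          # 4 classes, 25 examples each
iid_shards = shard_iid(labels, n_workers=4)
skewed_shards = shard_by_label(labels, n_workers=4)
# Distinct classes seen per worker:
iid_mix = [len(np.unique(labels[s])) for s in iid_shards]       # typically 4 each
skew_mix = [len(np.unique(labels[s])) for s in skewed_shards]   # 1 each
```

How strongly such non-IID splits degrade averaging-based parallel training, and how partitioning interacts with the orchestration algorithms, is exactly the kind of question this project would investigate empirically.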
In this project, the joint interests of sponsor, advisor and student(s) come together. Typically, the project will be based on the problem described by the sponsor AND previous research by Prof. Ole Jakob Mengshoel AND interests of students. If only one or two of these are present, there is no basis for a project.
Please send email(s) to sponsor and/or advisor if you're interested in this project.
Ole Jakob Mengshoel is (more or less) following Keith Downing's selection process for master students. Suggested reading:
 Exascale Deep Learning for Climate Analytics, https://arxiv.org/pdf/1810.01993.pdf
 GPipe: Efficient Training of Giant Neural Networks using Pipeline Parallelism, https://arxiv.org/abs/1811.06965
 PipeDream: Fast and Efficient Pipeline Parallel DNN Training, https://arxiv.org/pdf/1806.03377.pdf
 Image Classification at Supercomputer Scale, https://arxiv.org/pdf/1811.06992.pdf
 Don't Use Large Mini-Batches, Use Local SGD, https://arxiv.org/abs/1808.07217
 Deep learning with Elastic Averaging SGD, https://arxiv.org/abs/1412.6651
 Stochastic Gradient Push for Distributed Deep Learning, https://arxiv.org/pdf/1811.10792.pdf
 Thompson Sampling for Optimizing Stochastic Local Search, Tong Yu, Branislav Kveton, Ole J. Mengshoel, ECML PKDD 2017. https://works.bepress.com/ole_mengshoel/67/
 The Crowding Approach to Niching in Genetic Algorithms, Ole J. Mengshoel and David E. Goldberg, Evolutionary Computation 16(3):315-354, 2008. doi: 10.1162/evco.2008.16.3.315