Department of Computer and Information Science


Deep Reinforcement Learning: your own project!

[Project in collaboration with NTNU-Telenor AILab]

We know that scaling reinforcement learning (RL) by combining it with deep neural networks works. This is exciting for everyone keen on understanding and building autonomous agents that solve hard real-world problems. So far, however, we have mostly seen early results brought about by excellent engineering innovations, with only a few fundamental revisits to RL itself, e.g. the distributional perspective on RL [1].

The distributional perspective has shown even greater promise, bringing orders-of-magnitude improvements in data efficiency over competing approaches. We need more revisits like this. We need to take a step back and examine the building blocks of RL, deep nets, training regimens, and the nature of dynamic systems, to see how and when they complement each other. This may lead to novel learning schemes and substrates. In doing so, we aim to find ways to make RL work orders of magnitude better than it does today.

Fundamental revisits can be made with various real-world challenges in mind. These include:

Data efficiency: We usually do not have the luxury of using simulated environments, or of simulating real-world environments accurately enough to generate data to learn from. We want to extract as much knowledge as possible from as little data as possible.
Exploration: We want to design algorithms that let agents explore in order to gain knowledge quickly. This applies both when learning in simulation and when learning in live operation. In live operation, one must also consider that exploration can be unsafe. We want to go beyond the undirected exploration strategies commonly employed.
Temporal abstractions: Real-world decision-making problems can be compositional: a task can be divided into sub-tasks whose solutions can be composed to solve a harder task. We want to exploit this compositionality to build and reuse knowledge for decision making at different time scales.
Generalisation: We want algorithms that enable agents not simply to master one task, but to work across a large number of tasks. Furthermore, we want agents to adapt gracefully if the task they are deployed to solve changes, or if they are redeployed to solve another task with similar characteristics that they may not have directly experienced before.
We are open to students proposing their own projects in light of the aforementioned ideas. We are equally open to students bringing in their own points of view.


[1] Bellemare M.G., Dabney W., Munos R. (2017) A Distributional Perspective on Reinforcement Learning. In: Proceedings of the 34th International Conference on Machine Learning (ICML), Sydney, Australia, PMLR 70.


Massimiliano Ruocco
Adjunct Associate Professor
261 IT-bygget
991 04 568