Department of Computer and Information Science


Deep Reinforcement Learning

[Project in collaboration with Telenor-NTNU AILab]

Reinforcement learning enables an artificial agent to learn how to behave in an environment by interacting with it iteratively. Recently, the combination of reinforcement learning algorithms with deep neural networks has brought unprecedented success in previously challenging domains, opening the door to the dream of building truly intelligent artificial agents able to excel at a wide variety of tasks. Despite this recent success, many open challenges remain in deep reinforcement learning, such as building trustworthy models/simulators of partially observable, high-dimensional state environments, increasing the stability of learning algorithms, and evaluating learnt policies offline. In this project the student will work on one of these challenges.

In more detail:

1) RL for Dialogue systems:
- Deep Reinforcement Learning for Dialogue Generation [2017] (
2) Generalisation in Deep Reinforcement Learning
Unlike supervised learning, where large and diverse test sets help evaluate how well a method generalises, reinforcement learning methods are typically evaluated on their mastery of a task. This project will look at different ways in which generalisation can be engineered within the RL framework so that an agent performs well on unseen tasks. Some of these include:
- Diversity in the experience provided to an agent
- Training an agent on multiple tasks, where the tasks may either be provided beforehand or generated by the agent
- Emphasising finding different ways of solving a task
- Engineering task-agnostic/abstract models of environmental dynamics for the agent to use
- Discovering and learning options at different levels of temporal abstraction, with a focus on the reusability of skills
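The idea of evaluating on unseen tasks, and of giving the agent a task-agnostic representation, can be illustrated with a toy protocol. The following is only a minimal sketch and not part of the project description: the corridor task family, the goal split, and the helper names (`fit_policy`, `success_rate`, `raw`, `relative`) are all assumptions made for illustration, and a lookup table fitted to expert actions stands in for a trained RL agent. It reports success separately on training goals and on held-out goals, much as supervised learning separates training and test sets.

```python
from collections import Counter, defaultdict

SIZE = 8                      # corridor cells 0..SIZE-1 (hypothetical toy task family)
LEFT, RIGHT = -1, +1
TRAIN_GOALS = [1, 3, 5]       # tasks seen while fitting the policy
HELDOUT_GOALS = [2, 6]        # unseen tasks, used only for evaluation


def expert_action(pos, goal):
    """Expert for the toy corridor: step towards the goal."""
    return RIGHT if goal > pos else LEFT


def fit_policy(train_goals, featurise):
    """Fit a lookup table: feature -> most common expert action (stand-in for training)."""
    votes = defaultdict(Counter)
    for goal in train_goals:
        for pos in range(SIZE):
            if pos != goal:
                votes[featurise(pos, goal)][expert_action(pos, goal)] += 1
    table = {feat: count.most_common(1)[0][0] for feat, count in votes.items()}
    # Features never seen during fitting fall back to LEFT, an arbitrary default.
    return lambda pos, goal: table.get(featurise(pos, goal), LEFT)


def success_rate(policy, goals, max_steps=2 * SIZE):
    """Fraction of (start, goal) pairs from which the policy reaches the goal."""
    wins = total = 0
    for goal in goals:
        for start in range(SIZE):
            pos = start
            for _ in range(max_steps):
                if pos == goal:
                    break
                pos = min(max(pos + policy(pos, goal), 0), SIZE - 1)
            wins += pos == goal
            total += 1
    return wins / total


def raw(pos, goal):
    """Task-specific feature: the exact (position, goal) pair."""
    return (pos, goal)


def relative(pos, goal):
    """Task-agnostic feature: only the direction from position to goal."""
    return 1 if goal > pos else -1


results = {}
for name, featurise in (("raw", raw), ("relative", relative)):
    policy = fit_policy(TRAIN_GOALS, featurise)
    results[name] = (success_rate(policy, TRAIN_GOALS),
                     success_rate(policy, HELDOUT_GOALS))
    print(name, "train/held-out success:", results[name])
```

In this toy setting the table keyed on raw (position, goal) pairs solves every training goal but only half of the held-out cases, whereas the table keyed on the direction to the goal solves both sets. A training-only score would not reveal that difference.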
3) Decoupling Time in Model Based Deep Reinforcement Learning
Modelling or simulating the environment in which an agent acts can help the agent learn and plan from simulated experience, as opposed to having to act in the wild. Acting in the wild may prove expensive in terms of data efficiency or, indeed, the safety of an interaction. Models help an agent imagine: by predicting how the dynamics of an environment unfold under its policy, the agent can foresee and remedy the consequences of that policy. This project will explore different ways in which models can be built and used for learning and planning, specifically to understand how decoupling a model from time affects agent planning. Decoupling the model entails predicting a sequence of salient future events, as opposed to predicting environmental dynamics at the default time scale at which the agent acts (a time-dependent model). A time-dependent model may aid short-term planning, while a decoupled model may aid long-term planning. Combining the two may aid both.
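The contrast between a time-dependent model and a decoupled one can be sketched on a toy problem. This is a hedged illustration under stated assumptions, not the method the project prescribes: the corridor, the set of "salient" cells, and the names `step_model`, `jump_model` and `plan` are all hypothetical, and both models are hand-written rather than learned. The planner is a plain breadth-first search, which is adequate here only because the corridor offers a single route.

```python
from collections import deque

SIZE = 21                        # corridor cells 0..SIZE-1 (hypothetical)
SALIENT = {0, 5, 10, 15, 20}     # assumed salient events, e.g. doorways


def step_model(pos, action):
    """Time-dependent model: the state one primitive step ahead."""
    return min(max(pos + action, 0), SIZE - 1), 1


def jump_model(pos, action):
    """Decoupled model: the next salient state in that direction and the steps elapsed."""
    candidates = [s for s in SALIENT if (s - pos) * action > 0]
    if not candidates:
        return pos, 0            # nothing salient ahead: predict no change
    nxt = min(candidates, key=lambda s: abs(s - pos))
    return nxt, abs(nxt - pos)


def plan(start, goal, model):
    """Breadth-first search; returns (primitive steps to goal or None, model queries)."""
    queue, seen, queries = deque([(start, 0)]), {start}, 0
    while queue:
        pos, steps = queue.popleft()
        if pos == goal:
            return steps, queries
        for action in (-1, +1):
            nxt, elapsed = model(pos, action)
            queries += 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, steps + elapsed))
    return None, queries


fine = plan(0, 20, step_model)       # long-horizon goal, one-step model
coarse = plan(0, 20, jump_model)     # same goal, decoupled model
off_grid = plan(0, 18, jump_model)   # goal that is not a salient state
print("step model :", fine)
print("jump model :", coarse)
print("jump model, non-salient goal:", off_grid)
```

Both models predict the same 20-step distance to the far end of the corridor, but the decoupled model needs 8 queries where the one-step model needs 40. The decoupled model alone cannot reach cell 18, which is not salient, and the planner returns `None` for it; this is one way to read the suggestion that combining the two kinds of model may help both short-term and long-term planning.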

(Project in collaboration with Telenor Research)

A minimal background in Machine Learning is required (at least one of two courses).


Massimiliano Ruocco
Adjunct Associate Professor
261 IT-bygget
991 04 568