
Details of Grant 

EPSRC Reference: EP/Y003187/1
Title: Exploring Causality in Reinforcement Learning for Robust Decision Making
Principal Investigator: Du, Dr Y
Other Investigators:
Researcher Co-Investigators:
Project Partners:
University of San Diego
Department: Informatics
Organisation: King's College London
Scheme: Standard Research - NR1
Starts: 01 December 2023
Ends: 31 May 2025
Value (£): 164,560
EPSRC Research Topic Classifications:
Artificial Intelligence
EPSRC Industrial Sector Classifications:
Information Technologies
Related Grants:
Panel History:
Panel Date: 17 May 2023
Panel Name: ECR International Collaboration Grants Panel 1
Outcome: Announced
Summary on Grant Application Form

Reinforcement learning (RL) has seen significant development in recent years and has demonstrated impressive capabilities in decision-making tasks, such as games (AlphaStar, OpenAI Five), chatbots (ChatGPT), and recommendation systems (Microsoft). RL techniques can also be applied to many fields, such as transportation, network communications, autonomous driving, sequential treatment in healthcare, robotics, and control. Unlike traditional supervised learning, RL focuses on making a sequence of decisions to achieve a long-term goal, which makes it particularly well suited to complex problems. However, while RL has the potential to be highly effective, challenges remain before it can be made practical for real-world applications, where changing factors such as traffic regulations, weather, and clouds cannot be fully anticipated when training the agent. To enable RL algorithms to be deployed in a range of real applications, we need to evaluate and improve the robustness of RL in the face of complex changes in the real world and task shifts.

In this project, we aim to develop robust and generalisable reinforcement learning techniques from a causal modelling perspective. The first thrust focuses on utilising causal model learning to create compact and robust representations of tasks. Such a representation can greatly benefit the overall performance of the RL agent by reducing the complexity of the problem and making the agent's decision-making process more efficient. As a result, the agent can learn faster and generalise better to unseen tasks, which is especially important in real-world scenarios where data is scarce and the complexity of tasks can vary greatly.

The second research thrust focuses on the development of efficient and generalisable algorithms for task transfer. These can enable the RL agent to adapt to new tasks more quickly and effectively, and to generalise the learned knowledge to different but related tasks. This is crucial for real-world scenarios where the agent needs to operate in different environments or where the task requirements change over time.

One example of an application that would benefit from these contributions is autonomous driving in an industrial setting. While RL agents are usually trained in simulators, they may not perform well in real-world road scenarios and can be easily distracted by task-irrelevant information. For example, visual images that autonomous cars observe contain predominantly task-irrelevant information, like cloud shapes and architectural details, which should not influence the decision on driving.

In this project, we aim to enable the agent to learn a compact and robust representation of the task, so that it retains only state information that is relevant to the task, adapts to changing driving scenarios safely, and generalises its knowledge to related tasks, such as adapting to the different driving rules in the United States (driving on the right-hand side of the road).

A causal understanding can help identify the minimal sufficient representations that are essential for policy learning and transfer, and achieve safe and controllable exploration by leveraging causal structures and counterfactual reasoning.

It can mitigate issues common to most existing RL approaches, such as data inefficiency and a lack of interpretability and generalisability.

The outcome of this project can greatly improve the scalability and adaptability of RL agents, making them more suitable for real-world applications.
Key Findings
This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk
Project URL:  
Further Information:  
Organisation Website: