Periodic Reporting for period 1 - INFLUENCE (Influence-based Decision-making in Uncertain Environments)

Teaser

Decision-theoretic sequential decision making (SDM) is concerned with endowing an intelligent agent with the capability to choose actions that optimize task performance. SDM techniques have the potential to revolutionize many aspects of society, and recent successes, e.g...

Summary

Decision-theoretic sequential decision making (SDM) is concerned with endowing an intelligent agent with the capability to choose actions that optimize task performance. SDM techniques have the potential to revolutionize many aspects of society, and recent successes, e.g., agents that play Atari games and beat a world champion in the game of Go, have sparked renewed interest in this field.

However, despite these successes, fundamental problems of scalability prevent these methods from addressing problems with hundreds or thousands of state variables. For instance, there is no principled way of computing an optimal or near-optimal traffic light control plan for an intersection that takes into account the current state of traffic in an entire city. The problems are that 1) most recent progress is based on neural networks (i.e., deep learning), whose inputs are not easy to scale up while still training them efficiently, and 2) such methods offer no guarantees or understanding of how well they really perform (i.e., of the quality of the delivered solutions).

INFLUENCE will develop a new class of influence-based SDM methods that overcome scalability issues for such problems by using novel ways of abstraction. Considered from a decentralized system perspective, the intersection’s local problem is manageable, but the influence that the rest of the network exerts on it is complex. The key idea is that by using (deep) machine learning methods, we can learn sufficiently accurate representations of such influence to facilitate near-optimal decisions. We call these representations 'approximate influence points' (AIPs).
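To make the key idea concrete, here is a minimal sketch of how an AIP could plug into a local simulator. Everything here, in particular the local_model and influence_predictor interfaces, is a hypothetical illustration, not a component of an existing code base:

```python
import numpy as np

class InfluenceAugmentedSimulator:
    """Minimal sketch of the AIP idea: a simulator of a local region whose
    boundary inputs are sampled from a learned approximate influence point
    instead of from a simulation of the entire global system.
    All component names here are hypothetical illustrations."""

    def __init__(self, local_model, influence_predictor):
        self.local_model = local_model                    # dynamics of the local region only
        self.influence_predictor = influence_predictor    # learned AIP
        self.history = []                                 # local action-observation history

    def step(self, state, action):
        # The AIP gives a distribution over the 'influence source' variables
        # (e.g., inbound traffic at the region's boundary), conditioned on
        # the local history rather than on the full global state.
        probs = self.influence_predictor.predict(self.history)
        influence = np.random.choice(len(probs), p=probs)
        # The local transition needs only local state, action, and influence.
        next_state, obs, reward = self.local_model.transition(state, action, influence)
        self.history.append((action, obs))
        return next_state, obs, reward
```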

The objectives are to:

1. generate formal understanding of the use of approximate influence point (AIP) representations and develop a formal framework of influence-based SDM,

2. develop novel machine learning methods that can effectively induce representations for AIPs from trajectory data,

3. develop novel simulation-based planning methods that use AIPs to efficiently plan for very large problems,

4. develop novel influence-based reinforcement learning methods that learn an abstracted model of their environment and use this for effective exploration,

5. investigate two high-impact approaches to exploit AIPs in multiagent coordination.

If successful, INFLUENCE will produce a suite of influence-based SDM algorithms that can, in a principled manner, deal with a broad range of very large, complex problems consisting of hundreds or thousands of variables, thus taking an important step towards realizing the promise of autonomous agent technology, particularly for domains such as intelligent traffic light control or the coordination of multi-robot teams.

Work performed

The project started in February 2018. Due to the PI's move to Delft University of Technology, recruitment was delayed by a few months, which meant that in the first half year of the project the PI was the only person working on it. During this time, the PI divided time between recruiting, preparatory work for the simulation and evaluation framework (WP7), and the theoretical foundations of AIPs (WP1).

The PI has also furthered ongoing collaborations that closely relate to the INFLUENCE project:

-With Castellini, Savani and Whiteson, we explored the representation capacity of action-value networks. This work shows that there do indeed seem to be inherent limits on what we can expect to learn with a neural network in combination with methods such as Q-learning. It also shows that 'factorization' into multiple abstractions seems a promising way to overcome these limitations in many problems; a sketch of this idea follows below. As such, it provides further motivation for the core objectives within INFLUENCE, and also relates to WP5.
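To illustrate what such a factorization can look like, below is a sketch of a factored action-value network in PyTorch: the joint Q-value is represented as a sum of small per-agent components rather than one monolithic network over the exponentially large joint action space. The architecture is illustrative and not the exact one from the cited work:

```python
import torch
import torch.nn as nn

class FactoredQNetwork(nn.Module):
    """Sketch of a factored action-value network: the joint Q-value is a sum
    of small per-agent components instead of one monolithic network over the
    joint action space. Architecture details are illustrative."""

    def __init__(self, state_dim, n_agents, n_actions, hidden=64):
        super().__init__()
        # One small component network per agent (factor).
        self.components = nn.ModuleList([
            nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU(),
                          nn.Linear(hidden, n_actions))
            for _ in range(n_agents)
        ])

    def forward(self, state, joint_action):
        # state: (batch, state_dim); joint_action: long tensor (batch, n_agents)
        q = torch.zeros(state.size(0), 1)
        for i, net in enumerate(self.components):
            per_agent_q = net(state)                                 # (batch, n_actions)
            q = q + per_agent_q.gather(1, joint_action[:, i:i+1])    # pick agent i's action
        return q                                                     # (batch, 1) joint Q-value
```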

-With Savani et al., we investigated novel methods for generative adversarial networks, which could be one of the ways in which we can learn (generative models of) AIPs, and thus is closely related to WP2.
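For reference, the sketch below shows the standard alternating GAN update on which such an approach would build; treating influence samples as the 'real' data is our own framing here, and all network sizes are placeholders:

```python
import torch
import torch.nn as nn

# Minimal GAN training step (illustrative; dimensions are placeholders and
# this is not the project's actual method for learning AIPs).
G = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 8))   # noise -> sample
D = nn.Sequential(nn.Linear(8, 64), nn.ReLU(), nn.Linear(64, 1))    # sample -> logit
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCEWithLogitsLoss()

def train_step(real):
    """One alternating update; `real` is a (batch, 8) tensor of samples,
    here imagined to be observed influence-source values."""
    batch = real.size(0)
    fake = G(torch.randn(batch, 16))
    # Discriminator: separate real samples from generated ones.
    loss_d = bce(D(real), torch.ones(batch, 1)) + \
             bce(D(fake.detach()), torch.zeros(batch, 1))
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()
    # Generator: produce samples the discriminator labels as real.
    loss_g = bce(D(fake), torch.ones(batch, 1))
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()
```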

-With Katt and Amato, we investigated Bayesian RL for POMDPs in factored settings. This lays the foundation for BRL that makes use of AIPs in WP4.
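To give a flavour of the machinery involved, here is a sketch of Bayesian RL via a Dirichlet posterior over transition probabilities with posterior (Thompson-style) sampling; the flat, non-factored state space is a deliberate simplification of the factored setting studied in that work:

```python
import numpy as np

class DirichletTransitionModel:
    """Sketch of the Bayesian RL idea: maintain a Dirichlet posterior over
    each transition distribution, sample a concrete model from the posterior,
    and plan in the sampled model. The cited work operates on factored
    representations; this flat version is a simplification for illustration."""

    def __init__(self, n_states, n_actions, prior=1.0):
        # counts[s, a, :] are the Dirichlet pseudo-counts for P(. | s, a).
        self.counts = np.full((n_states, n_actions, n_states), prior)

    def update(self, s, a, s_next):
        # Posterior update from a single observed transition.
        self.counts[s, a, s_next] += 1.0

    def sample_model(self):
        # One posterior draw: a complete transition tensor T[s, a, s'].
        n_states, n_actions, _ = self.counts.shape
        return np.array([[np.random.dirichlet(self.counts[s, a])
                          for a in range(n_actions)]
                         for s in range(n_states)])
```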

One postdoc and three PhD students were recruited and started in fall 2018; a further PhD student is to join the project during fall 2019. The new team members have taken some time to get up to speed with the relevant background. We developed an initial shared code base to run reinforcement learning problems using the SUMO traffic simulator, as well as the infrastructure to deploy our experiments and process results.
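As an indication of what this infrastructure supports, the following is a minimal control loop over SUMO via its TraCI Python API. The configuration file, lane ID, and traffic-light ID are placeholders, and the simple threshold rule merely stands in for a learned policy:

```python
import traci  # Python API shipped with SUMO

# Placeholder config and IDs; a learned policy would replace the rule below.
traci.start(["sumo", "-c", "intersection.sumocfg"])
for step in range(3600):
    traci.simulationStep()                                   # advance SUMO one step
    # Observe local traffic, e.g., the queue on an incoming lane.
    queue = traci.lane.getLastStepHaltingNumber("north_in_0")
    # Stand-in control rule: switch phase when the queue grows too long.
    if queue > 5:
        traci.trafficlight.setPhase("tls_0", 0)
traci.close()
```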

Making use of this infrastructure, we made a first step beyond the state of the art by integrating the ideas underlying influence-based abstraction into deep reinforcement learning (this relates to WP3, simulation-based planning). The resulting workshop paper (also submitted to a major conference) shows that, by keeping track of a small set of variables in the history of previous actions and observations, we can learn policies that effectively control a local region in the global system. This gives further justification to the goals of INFLUENCE.
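A sketch of the mechanism as we would summarize it: the agent's input is the current local observation augmented with the recent history of a small set of variables, namely those through which the rest of the system influences the local region. The wrapper, variable indices, and window length are all illustrative, not taken from the paper:

```python
from collections import deque
import numpy as np

class InfluenceAwarePolicyWrapper:
    """Sketch: feed the policy the current local observation plus the recent
    history of a small, selected set of variables (the 'influence sources')."""

    def __init__(self, policy, influence_idx=(0, 3), window=8):
        self.policy = policy                      # any callable mapping inputs to an action
        self.influence_idx = list(influence_idx)  # which observation variables to track
        self.buffer = deque(maxlen=window)        # rolling history of tracked variables

    def act(self, obs):
        self.buffer.append(obs[self.influence_idx])          # track only the selected variables
        history = np.concatenate(list(self.buffer))
        padded = np.zeros(self.buffer.maxlen * len(self.influence_idx))
        padded[-len(history):] = history                     # left-pad until the buffer fills
        return self.policy(np.concatenate([obs, padded]))
```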

In October 2018, we had a project meeting during which we visited Magazino GmbH to learn more about the robotic warehouse solutions they deliver and the challenges they face. In particular, we clarified at what levels of abstraction they simulate various processes. This is critical for developing a warehousing simulator for use in INFLUENCE that may also be useful in practice.

Final results

The main results achieved so far are covered above.

Expected results by the end of the project:

-Theoretical results about influence-based abstraction and AIPs. We will analyze the relation between AIP representations and the optimal value function, and investigate which properties of AIPs can be used to bound the resulting loss in value (an illustrative bound of the kind we are after is sketched after this list).

-Practical machine learning methods to learn AIPs. We will investigate the applicability of a number of machine learning methods, including deep learning methods, which have shown excellent performance on sequence data.

-Simulation-based planning methods that can use AIPs to plan faster. In particular, we will investigate whether this can lead to self-improving simulators that use AIPs to improve their own efficiency.

-We will investigate how to combine AIP-learning with existing RL techniques to define novel learning methods for abstracted models and how to perform effective exploration.

-We will investigate approaches to exploit AIPs in multiagent coordination.
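To illustrate the shape of the theoretical guarantee sought in the first bullet above, a simulation-lemma-style statement of roughly the following form is what we aim for. This is an illustrative hypothesis under assumed conditions, not a proven result of the project:

```latex
% Illustrative only: if the approximate influence \hat{I} stays within
% total-variation distance \epsilon of the exact influence I at every
% stage and history, the induced value loss should admit a bound such as
\[
  \max_{t,\,h_t} \, d_{\mathrm{TV}}\!\left( I_t(\cdot \mid h_t),\, \hat{I}_t(\cdot \mid h_t) \right) \le \epsilon
  \;\Longrightarrow\;
  \left| V^{*}(b_0) - V^{\hat{\pi}}(b_0) \right| \le 2\,\epsilon\, H^{2} R_{\max},
\]
% where H is the horizon, R_max bounds the per-stage reward, and \hat{\pi}
% is a policy that is optimal for the local model that uses \hat{I}.
```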

Website & more info

More info: https://ii.tudelft.nl/influence/.