PLURELEARN

Plural Reinforcement Learning

Coordinator: TECHNION - ISRAEL INSTITUTE OF TECHNOLOGY

Organization address: TECHNION CITY - SENATE BUILDING
city: HAIFA
postcode: 32000

contact info
Title: Mr.
First name: Mark
Last name: Davison
Phone: +972 4 829 4854
Fax: +972 4 823 2958

Coordinator nationality: Israel [IL]
Total cost: 100,000 €
EC contribution: 100,000 €
Programme: FP7-PEOPLE
Specific programme "People" implementing the Seventh Framework Programme of the European Community for research, technological development and demonstration activities (2007 to 2013)
Call code: FP7-PEOPLE-2009-RG
Funding scheme: MC-IRG
Start year: 2009
Period (year-month-day): 2009-11-01 - 2013-10-31

Participants

#  participant  country  role  EC contrib. [€]
1  TECHNION - ISRAEL INSTITUTE OF TECHNOLOGY  IL (HAIFA)  coordinator  100,000.00

Project objective

'We propose a new paradigm for learning in complex, high-dimensional dynamic environments. Our goal is to develop algorithms, theory, and applications that use a plurality of learning approaches and models in a synergetic way. Our paradigm considers the task of learning a control policy by combining trial and error in the style of reinforcement learning with learning from a competent teacher whose interaction with the environment can be observed. Instead of using the teacher for imitation, our paradigm is focused on learning good representations of the world-model. We consider four specific issues in the new paradigm: (i) The use of iteration and reiteration between learning from a teacher and reinforcement learning. (ii) Learning representation and structure from the teacher. (iii) Optimizing policies based on learned representations and reasoning about model uncertainty. (iv) Learning sub-strategies from a teacher, and when and how to use them. We will develop algorithms and theory pertaining to the new paradigm and will apply it in two challenging domains: a fighter jet simulator and a network operating center simulator.'
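
The idea of using the teacher to learn a world model rather than to imitate can be illustrated with a minimal sketch. This is not the project's algorithm: it assumes a tiny tabular problem, and the function names (`estimate_model`, `q_value`, `plan`) and the three-state chain are hypothetical, chosen only for illustration. The sketch estimates transitions and rewards from observed teacher trajectories and then plans on the estimated model, ignoring state-action pairs the teacher never tried.

```python
from collections import defaultdict


def estimate_model(trajectories):
    """Estimate transition probabilities and mean rewards from observed
    (state, action, reward, next_state) tuples, e.g. a teacher's behaviour."""
    counts = defaultdict(lambda: defaultdict(int))
    reward_sum = defaultdict(float)
    for trajectory in trajectories:
        for state, action, reward, next_state in trajectory:
            counts[(state, action)][next_state] += 1
            reward_sum[(state, action)] += reward
    model = {}
    for key, next_counts in counts.items():
        total = sum(next_counts.values())
        probs = {s2: c / total for s2, c in next_counts.items()}
        model[key] = (probs, reward_sum[key] / total)
    return model


def q_value(model, values, state, action, gamma):
    """One-step lookahead on the learned model."""
    probs, mean_reward = model[(state, action)]
    return mean_reward + gamma * sum(p * values[s2] for s2, p in probs.items())


def plan(model, states, actions, gamma=0.9, sweeps=100):
    """Value iteration restricted to state-action pairs seen in the data.
    Skipping unseen pairs is a crude stand-in (an assumption of this sketch)
    for reasoning about model uncertainty."""
    values = {s: 0.0 for s in states}
    for _ in range(sweeps):
        for s in states:
            known = [a for a in actions if (s, a) in model]
            if known:
                values[s] = max(q_value(model, values, s, a, gamma) for a in known)
    policy = {}
    for s in states:
        known = [a for a in actions if (s, a) in model]
        if known:
            policy[s] = max(known, key=lambda a: q_value(model, values, s, a, gamma))
    return values, policy


# Hypothetical teacher data on a chain 0 -> 1 -> 2, with reward 1 on reaching state 2.
teacher_trajectories = [
    [(0, "right", 0.0, 1), (1, "right", 1.0, 2)],
    [(0, "right", 0.0, 1), (1, "left", 0.0, 0)],
]
model = estimate_model(teacher_trajectories)
values, policy = plan(model, states=[0, 1, 2], actions=["left", "right"])
print(policy)
```

In a fuller treatment, the learner's own trial-and-error transitions could be appended to the same trajectory list, so that the model is refined by both sources of experience.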

Introduction (Teaser)

An EU-funded project established a new paradigm for learning in large-scale, dynamic environments subject to uncertainty.

Project description (Article)

The overall goal of the project 'Plural reinforcement learning' (PLURELEARN) was to develop algorithms, theory and applications that use a large number of learning approaches and models in a synergetic way.

To realise this goal, the project team identified three objectives: developing a learning approach combining learning from a teacher and learning by trial and error; devising a structure discovery methodology for reasoning about uncertainty in high-dimensional Markov processes; and developing approaches for algorithm selection and mini-strategies.

The team made progress in meeting these objectives. Research on the first objective resulted in papers on how to use a tutor or expert advice in reinforcement learning paradigms. The work introduced new algorithms for learning from multiple sources and showed how they perform in medium-scale applications.

The problem of structure discovery (objective 2) proved to be quite complex. After developing theoretical and applied aspects of model selection and structure discovery, which showed how difficult it is to detect dynamic structure, the team developed two approaches for mitigating risk. The first is based on policy gradients and is geared toward problems where a simulator is available. The second is based on a robust optimisation approach that focuses on the coupling of uncertainties between states.

For the third objective, researchers designed two strategies that may lead to improved performance. The first was a way to modify options and then generate new, improved options. The second was a way to make use of 'randomly generated' options to expedite planning and learning.

The project was successful in developing a new framework for planning and learning in data-driven, variable environments. The research could open up opportunities for large-scale optimisation of dynamic systems, significantly increasing the scale of problems that can be solved.
