
PLURELEARN

Plural Reinforcement Learning

Coordinator	TECHNION - ISRAEL INSTITUTE OF TECHNOLOGY; address: TECHNION CITY - SENATE BUILDING, HAIFA 32000; contact: Mr. Mark Davison; telephone: +972 4 829 4854; fax: +972 4 823 2958
Coordinator nationality	Israel [IL]
Total cost	€100,000
EC contribution	€100,000
Programme	FP7-PEOPLE Specific programme "People" implementing the Seventh Framework Programme of the European Community for research, technological development and demonstration activities (2007 to 2013)
Call code	FP7-PEOPLE-2009-RG
Funding scheme	MC-IRG
Start year	2009
Period (year-month-day)	2009-11-01 - 2013-10-31

Participants

#	Participant	Country	Role	EC contribution (€)
1	TECHNION - ISRAEL INSTITUTE OF TECHNOLOGY	IL (HAIFA)	Coordinator	100,000.00

Project objective (Objective)

'We propose a new paradigm for learning in complex, high-dimensional dynamic environments. Our goal is to develop algorithms, theory, and applications that use a plurality of learning approaches and models in a synergetic way. Our paradigm considers the task of learning a control policy by combining trial and error in the style of reinforcement learning with learning from a competent teacher whose interaction with the environment can be observed. Instead of using the teacher for imitation, our paradigm is focused on learning good representations of the world-model. We consider four specific issues in the new paradigm: (i) The usage of iteration and reiteration between learning from a teacher and reinforcement learning. (ii) Learning representation and structure from the teacher. (iii) Optimizing policies based on learned representations and reasoning about model uncertainty. (iv) Learning sub-strategies from a teacher and when and how to use them. We will develop algorithms and theory pertaining to the new paradigm and will apply it in two challenging domains: a fighter jet simulator and a network operating center simulator.'

Introduction (Teaser)

An EU-funded project established a new paradigm for learning in large-scale, dynamic environments associated with elements of uncertainty.

Project description (Article)

The overall goal of the project 'Plural reinforcement learning' (PLURELEARN) was to develop algorithms, theory and applications that use a large number of learning approaches and models in a synergetic way.

To realise this goal, the project team identified three objectives: developing a learning approach combining learning from a teacher and learning by trial and error; devising a structure discovery methodology for reasoning about uncertainty in high-dimensional Markov processes; and developing approaches for algorithm selection and mini-strategies.

The team made progress in meeting these objectives. Research on the first objective resulted in papers on how to use a tutor or expert advice in reinforcement learning paradigms. The work showed new algorithms for the problem of learning from multiple sources, as well as how the algorithms work in medium-scale applications.
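One way to picture the idea of learning a world-model from an observed teacher, rather than imitating the teacher's actions, is to estimate transition probabilities from the teacher's logged interactions. The following is only a minimal sketch under that assumption and is not the project's algorithm; the function name `estimate_model`, the log format and the toy data are all hypothetical.

```python
from collections import Counter, defaultdict


def estimate_model(transitions):
    """Maximum-likelihood transition model from (state, action, next_state) tuples.

    Returns a dict mapping (state, action) to {next_state: probability}.
    """
    counts = defaultdict(Counter)
    for state, action, next_state in transitions:
        counts[(state, action)][next_state] += 1

    model = {}
    for key, next_counts in counts.items():
        total = sum(next_counts.values())
        model[key] = {s2: n / total for s2, n in next_counts.items()}
    return model


# Hypothetical log of a teacher acting in a small environment.
teacher_log = [
    (0, "right", 1),
    (0, "right", 1),
    (0, "right", 0),  # the move occasionally fails
    (1, "right", 2),
]

model = estimate_model(teacher_log)
print(model[(0, "right")])
```

A model estimated this way could then be handed to a planner or refined by the learner's own trial and error; how the project actually alternated between the two sources is not described in this summary.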

The problem of structure discovery (objective 2) proved to be quite complex. The team first developed theoretical and applied results on model selection and structure discovery, which showed how difficult it is to detect dynamic structure. It then developed two approaches for mitigating risks. The first is based on policy gradients and geared toward problems where a simulator is available. The second is based on robust optimisation, with a focus on uncertainties that are coupled between states.
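To make the robust optimisation idea concrete, the sketch below runs value iteration against the worst case over a small, finite set of candidate transition models. It is a textbook-style illustration, not the project's method: it treats the uncertainty of each state-action pair independently, whereas the summary says the project looked at uncertainty between states, which is not reproduced here. The toy problem and all names are hypothetical.

```python
def robust_value_iteration(rewards, models, gamma=0.9, tol=1e-8, max_iter=10_000):
    """Worst-case value iteration.

    rewards[s][a] is the immediate reward for action a in state s.
    models[s][a] is a list of candidate next-state distributions,
    each a list of probabilities indexed by next state.
    """
    n_states = len(rewards)
    values = [0.0] * n_states
    for _ in range(max_iter):
        new_values = []
        for s in range(n_states):
            best = float("-inf")
            for a in range(len(rewards[s])):
                # An adversary picks the candidate model with the lowest expected value.
                worst = min(
                    sum(p * values[t] for t, p in enumerate(dist))
                    for dist in models[s][a]
                )
                best = max(best, rewards[s][a] + gamma * worst)
            new_values.append(best)
        if max(abs(x - y) for x, y in zip(new_values, values)) < tol:
            return new_values
        values = new_values
    return values


# Hypothetical two-state problem. State 1 is absorbing with zero reward.
# In state 0, action 0 is "safe" (reward 1, always stays in state 0);
# action 1 is "risky" (reward 2, but one candidate model moves to state 1).
rewards = [[1.0, 2.0], [0.0]]
models = [
    [[[1.0, 0.0]], [[1.0, 0.0], [0.0, 1.0]]],
    [[[0.0, 1.0]]],
]

values = robust_value_iteration(rewards, models)
print(values)
```

In this toy case the robust planner prefers the safe action: the risky action pays more immediately, but its worst-case model ends in the zero-reward state.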

For the third objective, researchers designed two strategies that may lead to improved performance. The first was a way to modify options and then generate new, improved options. The second was a way to make use of 'randomly generated' options to expedite planning and learning.
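For readers unfamiliar with the term, an option in reinforcement learning is a temporally extended action: a set of states where it may start, an internal policy, and a termination condition. The sketch below shows only that basic structure on a hypothetical one-dimensional corridor; it does not implement the project's methods for improving or randomly generating options, and every name in it is illustrative.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Option:
    can_start: Callable[[int], bool]    # initiation set
    policy: Callable[[int], int]        # primitive action to take in each state
    should_stop: Callable[[int], bool]  # termination condition


def step(state, action, goal):
    """Hypothetical deterministic corridor with states 0..goal."""
    return max(0, min(goal, state + action))


def run_option(option, state, goal, max_steps=1000):
    """Follow the option's policy until it terminates; return (state, steps taken)."""
    assert option.can_start(state)
    steps = 0
    while steps < max_steps:
        state = step(state, option.policy(state), goal)
        steps += 1
        if option.should_stop(state):
            break
    return state, steps


GOAL = 10
# Walk right and stop at every fifth state.
go_right = Option(
    can_start=lambda s: s < GOAL,
    policy=lambda s: +1,
    should_stop=lambda s: s % 5 == 0,
)

state, decisions, primitive_steps = 0, 0, 0
while state != GOAL:
    state, taken = run_option(go_right, state, GOAL)
    decisions += 1
    primitive_steps += taken

print(decisions, primitive_steps)
```

The planner here makes two high-level decisions instead of ten primitive ones, which is the kind of saving that motivates using options to expedite planning and learning.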

The project was successful in developing a new framework for planning and learning in data-driven, variable environments. The research has the potential to open up opportunities for large-scale optimisation of dynamic systems that could have a significant impact on the scale of problems that can be solved.