Anti Imitation-Based Policy Learning

Machine Learning and Knowledge Discovery in Databases, 2016, Vol.9852, p.559-575

2016

Details

Author(s) / Contributors

Title

Anti Imitation-Based Policy Learning

Is part of

Machine Learning and Knowledge Discovery in Databases, 2016, Vol.9852, p.559-575

Place / Publisher

Switzerland: Springer International Publishing AG

Year of publication

2016

Source

Alma/SFX Local Collection

Descriptions/Notes

The Anti Imitation-based Policy Learning (AIPoL) approach, inspired by the energy-based learning framework (LeCun et al. 2006), aims to learn a pseudo-value function that induces the same order on the state space as a (nearly optimal) value function. By construction, greedification of such a pseudo-value yields the same policy as the value function itself. The approach assumes that, thanks to prior knowledge, not-to-be-imitated demonstrations can easily be generated. For instance, applying a random policy from a good initial state (e.g., a bicycle in equilibrium) will on average visit states with decreasing values (the bicycle ultimately falls down). Such a demonstration, that is, a sequence of states with decreasing values, is used with a standard learning-to-rank approach to define a pseudo-value function. If the model of the environment is known, this pseudo-value directly induces a policy by greedification. Otherwise, the bad demonstrations are exploited together with off-policy learning to learn a pseudo-Q-value function, from which a policy is likewise derived by greedification. To the best of our knowledge, the use of bad demonstrations to achieve policy learning is original. The theoretical analysis shows that the loss of optimality of the pseudo-value-based policy is bounded under mild assumptions, and the empirical validation of AIPoL on the mountain car, bicycle, and swing-up pendulum problems demonstrates the simplicity and the merits of the approach.
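
The ranking and greedification steps described above can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a linear pseudo-value U(s) = w · φ(s), a pairwise hinge loss as the learning-to-rank criterion, and a toy one-dimensional state with a known deterministic model. All names (`ranking_pairs`, `fit_pseudo_value`, `greedy_action`, `features`, `model`) and all numeric values are hypothetical.

```python
import numpy as np


def ranking_pairs(demo):
    """Ordered pairs (earlier, later) from one bad demonstration.

    States visited earlier are assumed to have higher value than later ones.
    """
    return [(demo[i], demo[j])
            for i in range(len(demo))
            for j in range(i + 1, len(demo))]


def fit_pseudo_value(demos, features, n_epochs=200, lr=0.1, margin=1.0, l2=1e-3):
    """Fit w so that w . phi(earlier) exceeds w . phi(later) by a margin.

    Assumption: plain subgradient descent on a regularized pairwise hinge loss.
    """
    pairs = [p for demo in demos for p in ranking_pairs(demo)]
    diffs = np.array([features(a) - features(b) for a, b in pairs])
    w = np.zeros(diffs.shape[1])
    for _ in range(n_epochs):
        violated = diffs @ w < margin  # pairs not yet ranked with the margin
        grad = l2 * w - diffs[violated].sum(axis=0) / len(diffs)
        w -= lr * grad
    return w


def greedy_action(state, actions, model, features, w):
    """Greedification: pick the action whose successor has the highest pseudo-value."""
    return max(actions, key=lambda a: float(features(model(state, a)) @ w))


# Toy setting (hypothetical): the state is a scalar "height" that decays
# under a random policy, so the demonstration has decreasing values.
features = lambda s: np.array([s])
model = lambda s, a: s + a          # known deterministic transition model
bad_demo = [1.0, 0.8, 0.5, 0.3, 0.0]

w = fit_pseudo_value([bad_demo], features)
action = greedy_action(0.5, [-0.1, 0.1], model, features, w)
```

In this toy setting the learned weight is positive, so the greedy policy moves the state back toward the high-value start of the bad demonstration. When no model is available, the abstract describes learning a pseudo-Q-value with off-policy learning instead, which this sketch does not cover.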

Language: English
Identifiers: ISBN: 9783319462264, 3319462261
ISSN: 0302-9743
eISSN: 1611-3349
DOI: 10.1007/978-3-319-46227-1_35
Title ID: cdi_springer_books_10_1007_978_3_319_46227_1_35

Format: –
Keywords: Approximate Policy Iteration, Inverse Reinforcement Learning, Random Policy, Reward Function, Transition Model
