
The International journal of robotics research, 2023-08, Vol.42 (9), p.633-654

2023

Author(s) / Contributors

Title

Stabilizing deep Q-learning with Q-graph-based bounds

Is part of

Place / Publisher

London, England: SAGE Publications

Year of publication

2023

Link to full text

Source

Alma/SFX Local Collection

Descriptions/Notes

State-of-the-art deep reinforcement learning has enabled autonomous agents to learn complex strategies from scratch on many problems, including continuous control tasks. Deep Q-networks (DQN) and deep deterministic policy gradients (DDPG) are two such algorithms, both based on Q-learning. They therefore both share function approximation, off-policy behavior, and bootstrapping, the constituents of the so-called deadly triad that is known for its convergence issues. We suggest taking a graph perspective on the data an agent has collected and show that the structure of this data graph is linked to the degree of divergence that can be expected. We further demonstrate that a subset of states and actions from the data graph can be selected such that the resulting finite graph can be interpreted as a simplified Markov decision process (MDP) for which the Q-values can be computed analytically. These Q-values are lower bounds for the Q-values in the original problem, and enforcing these bounds in temporal difference learning can help to prevent soft divergence. We show further effects on a simulated continuous control task, including improved sample efficiency, increased robustness toward hyperparameters, and a better ability to cope with limited replay memory. Finally, we demonstrate the benefits of our method on a large robotic benchmark with an industrial assembly task and approximately 60 h of real-world interaction.
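
The idea of bounding temporal difference targets from below can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a small deterministic graph of stored transitions, uses plain value iteration in place of the analytic solution the abstract mentions, and the names `graph_q_values`, `bounded_td_target`, and `unknown_value` are hypothetical. The resulting values are valid lower bounds only if `unknown_value` is itself a lower bound on the true value of states that have no outgoing edge in the graph (for example, 0 when all rewards are non-negative).

```python
from collections import defaultdict


def graph_q_values(transitions, gamma=0.99, unknown_value=0.0,
                   sweeps=1000, tol=1e-9):
    """Q-values of a finite, deterministic graph built from stored transitions.

    transitions: iterable of (state, action, reward, next_state, done) tuples
    with hashable states and actions. Assumption: each (state, action) pair
    has a single outcome; if it appears twice, the last entry wins.
    """
    # Collect the outgoing edges (available actions) of every graph state.
    out_edges = defaultdict(list)
    for s, a, r, s_next, done in transitions:
        out_edges[s].append((a, r, s_next, done))

    q = {(s, a): 0.0 for s, edges in out_edges.items() for a, _, _, _ in edges}

    # Value iteration: repeat Bellman backups until the values stop changing.
    for _ in range(sweeps):
        delta = 0.0
        for s, edges in out_edges.items():
            for a, r, s_next, done in edges:
                if done:
                    best_next = 0.0  # terminal transition: no future return
                elif s_next in out_edges:
                    best_next = max(q[(s_next, a2)]
                                    for a2, _, _, _ in out_edges[s_next])
                else:
                    # Dead end in the graph: fall back to a pessimistic guess.
                    best_next = unknown_value
                new_value = r + gamma * best_next
                delta = max(delta, abs(new_value - q[(s, a)]))
                q[(s, a)] = new_value
        if delta < tol:
            break
    return q


def bounded_td_target(reward, gamma, next_q_max, lower_bound, done=False):
    """One-step TD target, clipped from below by a graph-based bound."""
    target = reward if done else reward + gamma * next_q_max
    return max(target, lower_bound)


# Tiny example graph: A -> B -> terminal, plus a self-loop at A.
example_transitions = [
    ("A", "right", 0.0, "B", False),
    ("A", "stay", 0.0, "A", False),
    ("B", "right", 1.0, "T", True),
]
example_bounds = graph_q_values(example_transitions, gamma=0.5)
```

In a training loop, the bound for a sampled pair, such as `example_bounds[("A", "right")]`, would be passed as `lower_bound` so that an underestimating network prediction cannot pull the target below what the graph already guarantees.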

Language: English
Identifiers: ISSN: 0278-3649
eISSN: 1741-3176
DOI: 10.1177/02783649231185165
Title ID: cdi_crossref_primary_10_1177_02783649231185165

Format: –
Keywords: Algorithms, Control tasks, Decision analysis, Deep learning, Divergence, Lower bounds, Markov processes, Task complexity
