The International journal of robotics research, 2023-08, Vol.42 (9), p.633-654
2023

Details

Author(s) / Contributors
Title
Stabilizing deep Q-learning with Q-graph-based bounds
Is part of
  • The International journal of robotics research, 2023-08, Vol.42 (9), p.633-654
Place / Publisher
London, England: SAGE Publications
Year of publication
2023
Link to full text
Source
Alma/SFX Local Collection
Descriptions/Notes
  • State-of-the-art deep reinforcement learning has enabled autonomous agents to learn complex strategies from scratch on many problems, including continuous control tasks. Deep Q-networks (DQN) and deep deterministic policy gradients (DDPG) are two such algorithms, both based on Q-learning. They therefore share function approximation, off-policy behavior, and bootstrapping, the constituents of the so-called deadly triad, which is known for its convergence issues. We suggest taking a graph perspective on the data an agent has collected and show that the structure of this data graph is linked to the degree of divergence that can be expected. We further demonstrate that a subset of states and actions from the data graph can be selected such that the resulting finite graph can be interpreted as a simplified Markov decision process (MDP) for which the Q-values can be computed analytically. These Q-values are lower bounds for the Q-values in the original problem, and enforcing these bounds in temporal difference learning can help to prevent soft divergence. We show further effects on a simulated continuous control task, including improved sample efficiency, increased robustness toward hyperparameters, and a better ability to cope with limited replay memory. Finally, we demonstrate the benefits of our method on a large robotic benchmark with an industrial assembly task and approximately 60 h of real-world interaction.
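    The idea of bounding temporal difference targets from below can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a tiny, deterministic graph of observed transitions with hashable states and non-negative rewards, so that treating an unknown continuation as worth 0 still yields a valid lower bound. The names graph_q_values and bounded_td_target are hypothetical and chosen only for this illustration.

```python
def graph_q_values(transitions, gamma=0.9, sweeps=100):
    """Value iteration on a finite graph built from observed transitions.

    transitions: list of (state, action, reward, next_state, done).
    Assumes deterministic transitions and non-negative rewards (illustrative
    assumptions, not taken from the catalog record).
    """
    # Start every known (state, action) pair at 0, a valid lower bound
    # when all rewards are non-negative.
    q = {(s, a): 0.0 for s, a, _, _, _ in transitions}

    # Collect the actions that were actually observed in each state.
    actions = {}
    for s, a, _, _, _ in transitions:
        actions.setdefault(s, []).append(a)

    for _ in range(sweeps):
        for s, a, r, s2, done in transitions:
            successors = actions.get(s2, [])
            if done or not successors:
                # Terminal state, or no known continuation in the graph.
                q[(s, a)] = r
            else:
                q[(s, a)] = r + gamma * max(q[(s2, b)] for b in successors)
    return q


def bounded_td_target(reward, done, next_q_max, lower_bound, gamma=0.9):
    """Standard one-step TD target, clipped from below by the graph bound."""
    target = reward if done else reward + gamma * next_q_max
    return max(target, lower_bound)


# Hypothetical two-step chain: A -> B -> goal, with reward 1 at the goal.
transitions = [
    ("A", "right", 0.0, "B", False),
    ("B", "right", 1.0, "G", True),
]
q = graph_q_values(transitions, gamma=0.9)

# A function approximator that underestimates the next state's value
# produces a low target; the graph bound lifts it.
low_target = bounded_td_target(0.0, False, next_q_max=0.2,
                               lower_bound=q[("A", "right")])
# A target already above the bound is left unchanged.
high_target = bounded_td_target(0.0, False, next_q_max=2.0,
                                lower_bound=q[("A", "right")])
print(q, low_target, high_target)
```

    In this sketch the bound only ever raises a target, so it cannot correct overestimation; it merely prevents the learned values from falling below what the observed data already guarantees.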
Language
English
Identifiers
ISSN: 0278-3649
eISSN: 1741-3176
DOI: 10.1177/02783649231185165
Title ID: cdi_crossref_primary_10_1177_02783649231185165
