
Details

Author(s) / Contributors
Title
Approximation Benefits of Policy Gradient Methods with Aggregated States
Is part of
  • Management science, 2023-11, Vol.69 (11), p.6898-6911
Place / Publisher
Linthicum: INFORMS
Year of publication
2023
Source
Journal of the U.S. Institute for Operations Research and the Management Sciences (INFORMS; purchased via NSTL)
Descriptions / Notes
  • Folklore suggests that policy gradient can be more robust to misspecification than its relative, approximate policy iteration. This paper studies the case of state-aggregated representations, in which the state space is partitioned and either the policy or value function approximation is held constant over partitions. This paper shows a policy gradient method converges to a policy whose regret per period is bounded by ϵ, the largest difference between two elements of the state-action value function belonging to a common partition. With the same representation, both approximate policy iteration and approximate value iteration can produce policies whose per-period regret scales as ϵ/(1 − γ), where γ is a discount factor. Faced with inherent approximation error, methods that locally optimize the true decision objective can be far more robust. This paper was accepted by Hamid Nazerzadeh, data science. Supplemental Material: Data are available at https://doi.org/10.1287/mnsc.2023.4788.
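  To make the gap concrete: with γ = 0.9 the factor 1/(1 − γ) is 10, so the approximate-policy-iteration bound is ten times the policy-gradient bound. The following minimal Python sketch (not the paper's code; the toy MDP, all names, and the choice to optimize by finite differences are illustrative assumptions) builds a small random MDP, runs gradient ascent on the true discounted objective with a softmax policy held constant over state partitions, and compares the resulting per-period regret against a within-partition spread ϵ of the learned policy's Q-values; the paper's formal ϵ may be defined with respect to a different value function.

    import numpy as np

    rng = np.random.default_rng(0)
    n_states, n_actions, gamma = 6, 2, 0.9
    n_parts = 3

    # Random MDP: P[s, a] is a distribution over next states, R[s, a] a reward.
    P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))
    R = rng.uniform(0.0, 1.0, size=(n_states, n_actions))

    # State aggregation: the policy is forced to be constant on each partition.
    partition = np.array([0, 0, 1, 1, 2, 2])

    def policy(theta):
        # One row of action logits per partition, broadcast to member states.
        logits = theta[partition]
        e = np.exp(logits - logits.max(axis=1, keepdims=True))
        return e / e.sum(axis=1, keepdims=True)

    def evaluate(pi):
        # Exact policy evaluation: V = (I - gamma * P_pi)^(-1) r_pi.
        P_pi = np.einsum('sa,sat->st', pi, P)
        r_pi = (pi * R).sum(axis=1)
        V = np.linalg.solve(np.eye(n_states) - gamma * P_pi, r_pi)
        return V, R + gamma * P @ V  # V and Q

    def objective(theta):
        # True decision objective: expected discounted return, uniform start.
        return evaluate(policy(theta))[0].mean()

    # Gradient ascent on the true objective (finite differences, for brevity).
    theta, h, lr = np.zeros((n_parts, n_actions)), 1e-5, 5.0
    for _ in range(300):
        base, grad = objective(theta), np.zeros_like(theta)
        for i in range(n_parts):
            for a in range(n_actions):
                t = theta.copy(); t[i, a] += h
                grad[i, a] = (objective(t) - base) / h
        theta += lr * grad

    # Optimal values via value iteration, for the regret comparison.
    V_star = np.zeros(n_states)
    for _ in range(2000):
        V_star = (R + gamma * P @ V_star).max(axis=1)

    V_pi, Q_pi = evaluate(policy(theta))
    regret = (1 - gamma) * (V_star - V_pi).max()  # per-period regret
    # eps: largest within-partition gap of Q-values for a common action.
    eps = max(Q_pi[partition == k, a].max() - Q_pi[partition == k, a].min()
              for k in range(n_parts) for a in range(n_actions))
    print(f"per-period regret = {regret:.4f}, epsilon = {eps:.4f}")

  If the ascent has converged, the paper's result predicts the printed regret should not exceed ϵ, whereas approximate policy iteration or value iteration with the same aggregated representation may lock in errors as large as ϵ/(1 − γ).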
Language
English
Identifiers
ISSN: 0025-1909
eISSN: 1526-5501
DOI: 10.1287/mnsc.2023.4788
Title ID: cdi_proquest_journals_2891167984
