Expert systems with applications, 2023-11, Vol.231, p.120693, Article 120693

Details

Author(s) / Contributors
Title
Guided deterministic policy optimization with gradient-free policy parameters information
Is part of
  • Expert systems with applications, 2023-11, Vol.231, p.120693, Article 120693
Place / Publisher
Elsevier Ltd
Year of publication
2023
Source
Alma/SFX Local Collection
Descriptions/Notes
  • Deep Deterministic Policy Gradient (DDPG) and Twin Delayed Deep Deterministic Policy Gradient (TD3) are two classical deterministic policy gradient algorithms. Notably, the policies of both DDPG and TD3 depend entirely on the gradients of their critics, which makes the policy unstable during learning and prone to converging to local optima. Although maximum entropy learning can provide more effective exploration, it applies only to algorithms with stochastic policies, not to DDPG or TD3. In this paper, we propose a deterministic policy optimization method that incorporates gradient-free policy parameters information (GFPPI). Specifically, we obtain a new set of policies by injecting Gaussian noise into the policy parameters, and then weight these policy parameters based on the critics to obtain GFPPI. Finally, GFPPI is used as a regularization term in the policy optimization objective to guide the policy update. GFPPI can mitigate premature policy convergence and facilitate exploration under the principle of optimism. We provide a theoretical guarantee of monotonic improvement in expected cumulative return when using the loss function augmented with GFPPI, experimentally analyze the role of GFPPI in policy optimization, and combine it with deterministic policy gradient information for policy optimization. Experiments on OpenAI Gym demonstrate that GFPPI improves sample efficiency and enables the algorithm to achieve higher performance.
  • Highlights:
      • We present the computational details of GFPPI and analyze two operators.
      • We theoretically guarantee the effectiveness of GFPPI.
      • We propose GFPPI-TD3, which mitigates policy update instability.
      • GFPPI-TD3 outperforms SOTA algorithms on six MuJoCo environments.
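  • The perturb-then-weight construction described in the abstract can be sketched as follows. This is a minimal NumPy illustration, not the paper's implementation: the sample count, noise scale, softmax temperature, toy critic, and the interpolation-style regularized update are all illustrative assumptions; softmax weighting stands in for whatever critic-based weighting scheme the paper specifies.

```python
import numpy as np

rng = np.random.default_rng(0)

def gfppi(theta, critic_score, n_samples=64, sigma=0.1, beta=50.0):
    """Gradient-free policy parameters information (sketch).

    Inject Gaussian noise into the policy parameters to obtain a set of
    perturbed policies, score each one with the critic, and return a
    softmax-weighted average of the perturbed parameter vectors.
    """
    perturbed = [theta + sigma * rng.standard_normal(theta.shape)
                 for _ in range(n_samples)]
    scores = np.array([critic_score(p) for p in perturbed])
    # Softmax over critic scores (shifted for numerical stability);
    # higher-scoring perturbations get larger weights.
    weights = np.exp(beta * (scores - scores.max()))
    weights /= weights.sum()
    return sum(w * p for w, p in zip(weights, perturbed))

# Toy stand-in for a learned critic: scores parameters by closeness
# to a hypothetical optimum (purely for demonstration).
optimum = np.array([1.0, -2.0])
critic = lambda p: -np.sum((p - optimum) ** 2)

theta = np.zeros(2)
theta_gf = gfppi(theta, critic)

# Use GFPPI as a regularizer: pull the current parameters toward the
# critic-weighted parameters (lam is an assumed regularization weight).
lam = 0.5
theta_new = theta + lam * (theta_gf - theta)
```

  • Because the weighting favors perturbations the critic scores highly, the update direction is informed by the critic's values rather than its gradients, which is what lets GFPPI complement the purely gradient-based updates of DDPG/TD3.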
Language
English
Identifiers
ISSN: 0957-4174
eISSN: 1873-6793
DOI: 10.1016/j.eswa.2023.120693
Titel-ID: cdi_crossref_primary_10_1016_j_eswa_2023_120693
