Off-Policy Reinforcement Learning with Loss Function Weighted by Temporal Difference Error

DC Field: Value (Language)
dc.contributor.author: Park, Bumgeun (ko)
dc.contributor.author: Kim, Taeyoung (ko)
dc.contributor.author: Moon, Woohyeon (ko)
dc.contributor.author: Nengroo, Sarvar (ko)
dc.contributor.author: Har, Dongsoo (ko)
dc.date.accessioned: 2023-12-20T07:00:16Z
dc.date.available: 2023-12-20T07:00:16Z
dc.date.created: 2023-11-30
dc.date.issued: 2023-08-10
dc.identifier.citation: 19th International Conference on Intelligent Computing, ICIC 2023
dc.identifier.uri: http://hdl.handle.net/10203/316720
dc.description.abstract: Training agents via off-policy deep reinforcement learning algorithms requires a replay memory that stores past experiences, which are sampled uniformly or non-uniformly to create training batches. When calculating the loss function, off-policy algorithms commonly assume that all samples are of equal importance. We introduce a novel algorithm that assigns unequal importance to each experience, in the form of a weighting factor derived from the distribution of temporal difference (TD) errors, for the training objective. Results obtained with uniform sampling in eight environments of the OpenAI Gym suite show that the proposed algorithm achieves a 10% increase in convergence speed in one environment, with a similar success rate, and in the other seven environments achieves 3-46% increases in success rate or 3-14% increases in cumulative reward, with similar convergence speed. The algorithm can also be combined with an existing prioritization method employing non-uniform sampling; the combined technique achieves a 20% increase in convergence speed compared to the prioritization method alone.
dc.language: English
dc.publisher: Springer Science and Business Media Deutschland GmbH
dc.title: Off-Policy Reinforcement Learning with Loss Function Weighted by Temporal Difference Error
dc.type: Conference
dc.identifier.scopusid: 2-s2.0-85174703538
dc.type.rims: CONF
dc.citation.publicationname: 19th International Conference on Intelligent Computing, ICIC 2023
dc.identifier.conferencecountry: CC
dc.identifier.conferencelocation: Zhengzhou
dc.identifier.doi: 10.1007/978-981-99-4761-4_51
dc.contributor.localauthor: Har, Dongsoo
dc.contributor.nonIdAuthor: Park, Bumgeun
dc.contributor.nonIdAuthor: Kim, Taeyoung
dc.contributor.nonIdAuthor: Moon, Woohyeon
dc.contributor.nonIdAuthor: Nengroo, Sarvar
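The abstract describes weighting each experience's contribution to the loss by a factor based on its TD error rather than treating all batch samples equally. A minimal sketch of that idea follows; the specific weighting scheme here (each sample's absolute TD error normalized over the batch) and the function name are illustrative assumptions, not the authors' actual formulation:

```python
import numpy as np

def td_weighted_loss(td_errors, eps=1e-6):
    """Hypothetical TD-error-weighted objective: each sample's squared
    TD error is scaled by its share of the batch's total absolute TD
    error, so large-error transitions dominate the loss. The paper
    derives its weights from the TD-error distribution; this batch-wise
    normalization is only a stand-in for illustration."""
    td_errors = np.asarray(td_errors, dtype=float)
    abs_err = np.abs(td_errors) + eps      # eps avoids division by zero
    weights = abs_err / abs_err.sum()      # normalized importance, sums to 1
    return float(np.sum(weights * td_errors ** 2))
```

Under uniform weighting every sample contributes equally; under this weighting, a batch dominated by one large TD error yields a loss close to that error's squared value, which is what lets training emphasize poorly predicted transitions without changing the sampling scheme itself.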
Appears in Collection:
GT-Conference Papers (학술회의논문, conference papers)
Files in This Item:
There are no files associated with this item.
