Off-Policy Reinforcement Learning with Loss Function Weighted by Temporal Difference Error

DC Field: Value (Language)
dc.contributor.author: Park, Bumgeun (ko)
dc.contributor.author: Kim, Taeyoung (ko)
dc.contributor.author: Moon, Woohyeon (ko)
dc.contributor.author: Nengroo, Sarvar (ko)
dc.contributor.author: Har, Dongsoo (ko)
dc.date.accessioned: 2023-12-20T07:00:16Z
dc.date.available: 2023-12-20T07:00:16Z
dc.date.created: 2023-11-30
dc.date.issued: 2023-08-10
dc.identifier.citation: 19th International Conference on Intelligent Computing, ICIC 2023
dc.identifier.uri: http://hdl.handle.net/10203/316720
dc.description.abstract: Training agents via off-policy deep reinforcement learning algorithms requires a replay memory that stores past experiences, which are sampled uniformly or non-uniformly to create training batches. When calculating the loss function, off-policy algorithms commonly assume that all samples are of equal importance. We introduce a novel algorithm that assigns unequal importance to each experience, in the form of a weighting factor derived from the distribution of temporal difference (TD) errors, for the training objective. Results obtained with uniform sampling in eight environments of the OpenAI Gym suite show that the proposed algorithm achieves a 10% increase in convergence speed in one environment, with a similar success rate, and in the other seven environments achieves 3-46% increases in success rate or 3-14% increases in cumulative reward, with similar convergence speed. The algorithm can also be combined with an existing prioritization method employing non-uniform sampling; the combined technique achieves a 20% increase in convergence speed compared to the prioritization method alone.
dc.language: English
dc.publisher: Springer Science and Business Media Deutschland GmbH
dc.title: Off-Policy Reinforcement Learning with Loss Function Weighted by Temporal Difference Error
dc.type: Conference
dc.identifier.scopusid: 2-s2.0-85174703538
dc.type.rims: CONF
dc.citation.publicationname: 19th International Conference on Intelligent Computing, ICIC 2023
dc.identifier.conferencecountry: CC
dc.identifier.conferencelocation: Zhengzhou
dc.identifier.doi: 10.1007/978-981-99-4761-4_51
dc.contributor.localauthor: Har, Dongsoo
dc.contributor.nonIdAuthor: Park, Bumgeun
dc.contributor.nonIdAuthor: Kim, Taeyoung
dc.contributor.nonIdAuthor: Moon, Woohyeon
dc.contributor.nonIdAuthor: Nengroo, Sarvar
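The abstract describes weighting each experience's contribution to the loss by a factor based on its TD error rather than treating all batch samples equally. A minimal sketch of that idea follows; the specific weighting scheme here (each sample's absolute TD error normalized over the batch) and the function name are illustrative assumptions, not the authors' actual formulation:

```python
import numpy as np

def td_weighted_loss(td_errors, eps=1e-6):
    """Hypothetical TD-error-weighted objective: each sample's squared
    TD error is scaled by its share of the batch's total absolute TD
    error, so large-error transitions dominate the loss. The paper
    derives its weights from the TD-error distribution; this batch-wise
    normalization is only a stand-in for illustration."""
    td_errors = np.asarray(td_errors, dtype=float)
    abs_err = np.abs(td_errors) + eps      # eps avoids division by zero
    weights = abs_err / abs_err.sum()      # normalized importance, sums to 1
    return float(np.sum(weights * td_errors ** 2))
```

Under uniform weighting every sample contributes equally; under this weighting, a batch dominated by one large TD error yields a loss close to that error's squared value, which is what lets training emphasize poorly predicted transitions without changing the sampling scheme itself.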
Appears in Collection:
GT-Conference Papers (학술회의논문, conference papers)
Files in This Item:
There are no files associated with this item.
