Off-Policy Reinforcement Learning with Loss Function Weighted by Temporal Difference Error

Training agents with off-policy deep reinforcement learning algorithms requires a replay memory that stores past experiences, which are sampled uniformly or non-uniformly to create training batches. When calculating the loss function, off-policy algorithms commonly assume that all samples are equally important. We introduce a novel algorithm that assigns unequal importance to each experience in the training objective, in the form of a weighting factor derived from the distribution of its temporal difference (TD) error. With uniform sampling, experiments in eight environments of the OpenAI Gym suite show that the proposed algorithm achieves, in one environment, a 10% increase in convergence speed with a similar success rate and, in the other seven environments, 3%–46% increases in success rate or 3%–14% increases in cumulative reward with similar convergence speed. The algorithm can also be combined with an existing prioritization method that employs non-uniform sampling; the combined technique achieves a 20% increase in convergence speed compared to the prioritization method alone.
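The abstract describes weighting each sample's loss term by a factor derived from the batch's TD-error distribution. The sketch below illustrates that general idea for a DQN-style update in PyTorch; the softmax-over-|TD error| weighting, the function name td_weighted_loss, and the temperature parameter are assumptions for illustration, not the paper's exact formulation.

```python
# Illustrative sketch (not the paper's exact method): per-sample TD-error-based
# weighting of a DQN-style loss over a uniformly sampled batch.
import torch
import torch.nn as nn


def td_weighted_loss(q_net, target_net, batch, gamma=0.99, temperature=1.0):
    """Weighted TD loss; `batch` is assumed to hold tensors keyed by
    'state', 'action', 'reward', 'next_state', 'done'."""
    states, actions = batch["state"], batch["action"]
    rewards, next_states, dones = batch["reward"], batch["next_state"], batch["done"]

    # Q(s, a) for the actions actually taken.
    q_values = q_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        next_q = target_net(next_states).max(dim=1).values
        targets = rewards + gamma * (1.0 - dones) * next_q

    td_errors = targets - q_values

    # Weighting factor from the batch distribution of |TD error|: samples with
    # larger TD error contribute more to the objective. Detached so the weights
    # themselves carry no gradient; rescaled so the mean weight stays near 1.
    weights = torch.softmax(td_errors.abs().detach() / temperature, dim=0)
    weights = weights * td_errors.numel()

    per_sample_loss = nn.functional.smooth_l1_loss(q_values, targets, reduction="none")
    return (weights * per_sample_loss).mean()


if __name__ == "__main__":
    # Minimal usage with random data: a 4-dimensional state, 2 discrete actions.
    net = nn.Sequential(nn.Linear(4, 32), nn.ReLU(), nn.Linear(32, 2))
    tgt = nn.Sequential(nn.Linear(4, 32), nn.ReLU(), nn.Linear(32, 2))
    tgt.load_state_dict(net.state_dict())
    batch = {
        "state": torch.randn(8, 4),
        "action": torch.randint(0, 2, (8,)),
        "reward": torch.randn(8),
        "next_state": torch.randn(8, 4),
        "done": torch.zeros(8),
    }
    loss = td_weighted_loss(net, tgt, batch)
    loss.backward()
    print(loss.item())
```

Detaching the TD errors before computing the weights keeps the weighting factor from contributing gradients of its own, so only the weighted per-sample loss drives the update.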
Publisher
Springer Science and Business Media Deutschland GmbH
Issue Date
2023-08-10
Language
English
Citation
19th International Conference on Intelligent Computing, ICIC 2023
DOI
10.1007/978-981-99-4761-4_51
URI
http://hdl.handle.net/10203/316720
Appears in Collection
GT-Conference Papers (Conference Papers)
