Batch Prioritization in Multigoal Reinforcement Learning

DC Field | Value | Language
dc.contributor.author | Vecchietti, Luiz Felipe | ko
dc.contributor.author | Kim, Taeyoung | ko
dc.contributor.author | Choi, Kyujin | ko
dc.contributor.author | Hong, Junhee | ko
dc.contributor.author | Har, Dongsoo | ko
dc.date.accessioned | 2020-08-25T01:55:16Z | -
dc.date.available | 2020-08-25T01:55:16Z | -
dc.date.created | 2020-08-10 | -
dc.date.issued | 2020-07 | -
dc.identifier.citation | IEEE ACCESS, v.8, pp.137449 - 137461 | -
dc.identifier.issn | 2169-3536 | -
dc.identifier.uri | http://hdl.handle.net/10203/275936 | -
dc.description.abstract | In multigoal reinforcement learning, an agent interacts with an environment and learns to achieve multiple goals. The goal-conditioned policy is trained to effectively generalize its behavior for multiple goals. During training, the experiences collected by the agent are randomly sampled from a replay buffer. Because biased sampling of achieved goals affects the success rate of a given task, it should be avoided by considering the valid goal space, introduced here as the set of goals to achieve, and the current competence of the policy. To this end, a novel prioritization method for creation of batches, i.e., collections of samples, is proposed. Candidate batches are sampled and associated with costs; in each iteration the batch with the minimum cost is chosen to train the policy. The cost function is modeled by an intended goal, which is proposed as a hypothetical goal that the policy is trying to learn in each cycle, and the information of the valid goal space. The minimum cost of the batch selected for each iteration decreases throughout training as the policy learns to achieve goals near the center of the valid goal space. The proposed batch prioritization method is combined with hindsight experience replay (HER) for experiments in robotic control tasks presented in the OpenAI Gym suite to demonstrate learning performance comparable to that of other state-of-the-art prioritization methods. As a result, the proposed batch prioritization method can achieve improved learning performance in 4 out of 5 tasks, particularly for harder tasks. The experimental results suggest that the proposed method for the creation of training batches, using the valid goal space information and current competence of the policy, can enhance learning performance in multigoal tasks with high-dimensional goal space. | -
dc.language | English | -
dc.publisher | IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC | -
dc.title | Batch Prioritization in Multigoal Reinforcement Learning | -
dc.type | Article | -
dc.identifier.wosid | 000557774300001 | -
dc.identifier.scopusid | 2-s2.0-85090113302 | -
dc.type.rims | ART | -
dc.citation.volume | 8 | -
dc.citation.beginningpage | 137449 | -
dc.citation.endingpage | 137461 | -
dc.citation.publicationname | IEEE ACCESS | -
dc.identifier.doi | 10.1109/ACCESS.2020.3012204 | -
dc.contributor.localauthor | Har, Dongsoo | -
dc.contributor.nonIdAuthor | Vecchietti, Luiz Felipe | -
dc.contributor.nonIdAuthor | Kim, Taeyoung | -
dc.contributor.nonIdAuthor | Choi, Kyujin | -
dc.contributor.nonIdAuthor | Hong, Junhee | -
dc.description.isOpenAccess | Y | -
dc.type.journalArticle | Article | -
dc.subject.keywordAuthor | Training | -
dc.subject.keywordAuthor | Task analysis | -
dc.subject.keywordAuthor | Learning (artificial intelligence) | -
dc.subject.keywordAuthor | Robots | -
dc.subject.keywordAuthor | Erbium | -
dc.subject.keywordAuthor | Cost function | -
dc.subject.keywordAuthor | Aerospace electronics | -
dc.subject.keywordAuthor | Experience replay | -
dc.subject.keywordAuthor | batch prioritization | -
dc.subject.keywordAuthor | goal distribution | -
dc.subject.keywordAuthor | reinforcement learning | -
dc.subject.keywordAuthor | intended goal | -
dc.subject.keywordPlus | LEVEL | -
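
The selection step described in the abstract (sample several candidate batches, assign each a cost, train on the cheapest one) can be illustrated with a minimal sketch. The abstract does not give the cost function, so the one used here is an assumption made purely for illustration: the Euclidean distance between the mean achieved goal of a batch and a hypothetical intended goal. The names `batch_cost`, `select_min_cost_batch`, `num_candidates`, and the toy 2-D buffer are likewise hypothetical and are not taken from the paper.

```python
import math
import random


def batch_cost(batch, intended_goal):
    """Assumed cost: distance from the batch's mean achieved goal to the intended goal."""
    dim = len(intended_goal)
    mean_goal = [sum(goal[d] for goal in batch) / len(batch) for d in range(dim)]
    return math.dist(mean_goal, intended_goal)


def select_min_cost_batch(achieved_goals, intended_goal, batch_size, num_candidates, rng):
    """Sample candidate batches and return the one with the lowest cost.

    Returns (best_batch, best_cost, all_costs).
    """
    # Each candidate is a uniform random sample (without replacement) from the buffer.
    candidates = [rng.sample(achieved_goals, batch_size) for _ in range(num_candidates)]
    costs = [batch_cost(batch, intended_goal) for batch in candidates]
    best = min(range(num_candidates), key=costs.__getitem__)
    return candidates[best], costs[best], costs


# Toy replay buffer of 2-D achieved goals (illustrative data only).
rng = random.Random(0)
buffer = [(rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)) for _ in range(200)]
intended_goal = (0.0, 0.0)  # hypothetical intended goal

batch, cost, costs = select_min_cost_batch(
    buffer, intended_goal, batch_size=16, num_candidates=8, rng=rng
)
```

In a full training loop the selected batch would be passed to the policy update, and the intended goal would be revised each cycle to reflect the current competence of the policy; both steps are outside the scope of this sketch.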
Appears in Collection
GT-Journal Papers(저널논문)
Files in This Item
There are no files associated with this item.
