Periodic Q-learning

DC Field: Value (Language)
dc.contributor.author: Lee, Donghwan (ko)
dc.contributor.author: He, Niao (ko)
dc.date.accessioned: 2020-12-18T07:30:23Z
dc.date.available: 2020-12-18T07:30:23Z
dc.date.created: 2020-11-24
dc.date.issued: 2020-06-11
dc.identifier.citation: 2nd Annual Conference on Learning for Dynamics and Control (L4DC)
dc.identifier.uri: http://hdl.handle.net/10203/278704
dc.description.abstract: The use of target networks is a common practice in deep reinforcement learning for stabilizing training; however, theoretical understanding of this technique is still limited. In this paper, we study the so-called periodic Q-learning algorithm (PQ-learning for short), which resembles the technique used in deep Q-learning, for solving infinite-horizon discounted Markov decision processes (DMDPs) in the tabular setting. PQ-learning maintains two separate Q-value estimates: the online estimate and the target estimate. The online estimate follows the standard Q-learning update, while the target estimate is updated only periodically. In contrast to standard Q-learning, PQ-learning enjoys a simple finite-time analysis and achieves better sample complexity for finding an epsilon-optimal policy. Our result provides a preliminary justification of the effectiveness of utilizing target estimates or networks in Q-learning algorithms. (A minimal tabular sketch of this two-estimate scheme is given after the metadata listing below.)
dc.language: English
dc.publisher: UC Berkeley
dc.title: Periodic Q-learning
dc.type: Conference
dc.type.rims: CONF
dc.citation.publicationname: 2nd Annual Conference on Learning for Dynamics and Control (L4DC)
dc.identifier.conferencecountry: US
dc.identifier.conferencelocation: Online
dc.contributor.localauthor: Lee, Donghwan
dc.contributor.nonIdAuthor: He, Niao
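
The abstract above describes the core mechanism of PQ-learning: an online Q-table updated at every step, and a target Q-table that is refreshed only periodically, with the bootstrap term taken from the target table as in deep Q-learning's target networks. The sketch below illustrates that two-estimate scheme in the tabular setting; it is not the paper's exact algorithm. The Gymnasium-style environment interface, the epsilon-greedy behavior policy, the constant step size, and the update period are all illustrative assumptions.

    import numpy as np

    def periodic_q_learning(env, num_steps=100_000, period=100,
                            gamma=0.99, alpha=0.1, epsilon=0.1, seed=0):
        """Tabular Q-learning with a periodically synchronized target table.

        The online table Q follows a standard Q-learning update at every step,
        except that the bootstrap value comes from the frozen target table
        Q_target. Every `period` steps, Q_target is overwritten with Q.
        """
        rng = np.random.default_rng(seed)
        n_states, n_actions = env.observation_space.n, env.action_space.n
        Q = np.zeros((n_states, n_actions))         # online estimate
        Q_target = np.zeros((n_states, n_actions))  # target estimate (frozen between syncs)

        state, _ = env.reset(seed=seed)
        for t in range(num_steps):
            # epsilon-greedy behavior policy based on the online estimate
            if rng.random() < epsilon:
                action = int(rng.integers(n_actions))
            else:
                action = int(np.argmax(Q[state]))

            next_state, reward, terminated, truncated, _ = env.step(action)

            # Bootstrap from the target table, not the online table.
            bootstrap = 0.0 if terminated else gamma * np.max(Q_target[next_state])
            Q[state, action] += alpha * (reward + bootstrap - Q[state, action])

            # Periodically copy the online estimate into the target estimate.
            if (t + 1) % period == 0:
                Q_target = Q.copy()

            state = next_state
            if terminated or truncated:
                state, _ = env.reset()

        return Q

For example, this sketch can be run on a discrete Gymnasium environment such as gymnasium.make("FrozenLake-v1"), whose integer observation and action spaces match the tabular interface assumed here.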
Appears in Collection
EE-Conference Papers (학술회의논문, conference papers)
Files in This Item
There are no files associated with this item.
