Blockwise sequential model learning for partially observable reinforcement learning

Cited 1 time in Web of Science; cited 0 times in Scopus
This paper proposes a new sequential model learning architecture for solving partially observable Markov decision problems. Rather than compressing sequential information at every timestep, as conventional recurrent neural network (RNN)-based methods do, the proposed architecture generates one latent variable per data block spanning multiple timesteps and passes the most relevant information to the next block for policy optimization. The blockwise sequential model is built on self-attention, which enables detailed sequential learning in partially observable settings. The model also includes an additional learning network that performs gradient estimation efficiently via self-normalized importance sampling, removing the need for complex blockwise reconstruction of the input data during model learning. Numerical results show that the proposed method significantly outperforms previous methods in various partially observable environments.
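The abstract names self-normalized importance sampling (SNIS) as the basis for the additional gradient-estimation network but does not give its formulation. As a generic illustration only, not the paper's estimator or network, the sketch below shows the standard SNIS estimate of an expectation under a target density known only up to a constant: draw samples z_i from a proposal q, form weights w_i = p̃(z_i) / q(z_i), normalize them to sum to one, and average f(z_i) with those weights. The function names (`log_normal_pdf`, `self_normalized_weights`, `snis_estimate`) and the Gaussian toy target and proposal are hypothetical choices for illustration. In a latent-variable model, f would typically be a per-sample gradient term; how the paper wires this into its network is not shown here.

```python
import math
import random
from typing import Callable, List, Sequence


def log_normal_pdf(x: float, mean: float, std: float) -> float:
    """Log density of a univariate Gaussian N(mean, std^2)."""
    return -0.5 * ((x - mean) / std) ** 2 - math.log(std * math.sqrt(2.0 * math.pi))


def self_normalized_weights(log_target: Sequence[float],
                            log_proposal: Sequence[float]) -> List[float]:
    """Return w_i / sum_j w_j with w_i = p~(z_i) / q(z_i), computed in log space.

    log_target may be an *unnormalized* log density: its unknown constant
    cancels when the weights are divided by their sum.
    """
    log_w = [lt - lq for lt, lq in zip(log_target, log_proposal)]
    shift = max(log_w)  # log-sum-exp shift so exp() cannot overflow
    unnorm = [math.exp(lw - shift) for lw in log_w]
    total = sum(unnorm)
    return [u / total for u in unnorm]


def snis_estimate(samples: Sequence[float],
                  log_target_fn: Callable[[float], float],
                  log_proposal_fn: Callable[[float], float],
                  f: Callable[[float], float]) -> float:
    """Self-normalized importance sampling estimate of E_p[f(z)] from samples drawn from q."""
    log_t = [log_target_fn(z) for z in samples]
    log_q = [log_proposal_fn(z) for z in samples]
    weights = self_normalized_weights(log_t, log_q)
    return sum(w * f(z) for w, z in zip(weights, samples))


if __name__ == "__main__":
    rng = random.Random(0)
    # Proposal q = N(0, 2^2); target p is proportional to exp(-(z - 1)^2 / 2),
    # i.e. the N(1, 1) density with its normalizing constant dropped.
    samples = [rng.gauss(0.0, 2.0) for _ in range(20000)]
    est = snis_estimate(
        samples,
        log_target_fn=lambda z: -0.5 * (z - 1.0) ** 2,
        log_proposal_fn=lambda z: log_normal_pdf(z, 0.0, 2.0),
        f=lambda z: z,
    )
    print(f"SNIS estimate of E_p[z]: {est:.3f} (true value 1.0)")
```

Because the weights are divided by their sum, any unknown normalizing constant of the target cancels, which is what makes the estimator usable with unnormalized densities; the cost is a small bias that vanishes as the number of samples grows.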
Publisher
Association for the Advancement of Artificial Intelligence
Issue Date
2022-02-26
Language
English
Citation

36th AAAI Conference on Artificial Intelligence (AAAI 2022), pp. 7941-7948

ISSN
2159-5399
URI
http://hdl.handle.net/10203/299397
Appears in Collection
EE-Conference Papers (academic conference papers)
Files in This Item
There are no files associated with this item.