Self-imitation learning algorithms for goal-oriented dialogues

Reinforcement learning (RL) aims to learn a policy that maximizes reward through interaction with an environment. Task-oriented dialogues can be naturally formulated as RL problems. However, applying standard RL algorithms to real-world task-oriented dialogues raises three main challenges: (1) task-oriented dialogues call for offline learning, in which the agent optimizes its policy solely from a previously collected dataset, without online interaction with the environment; (2) standard policy-gradient-based RL methods easily fail and generate responses that diverge from human language; and (3) optimizing a task-oriented dialogue agent is very challenging because of the enormous action space of natural-language actions. To address these challenges, this thesis presents three RL algorithms based on self-imitation learning, in which the agent learns a policy by imitating its own past good decisions. First, we present a model-based offline RL algorithm that combines RNN-based dialogue generation with MCTS-based Bayesian planning. Second, we present a Monte-Carlo planning algorithm that combines Monte-Carlo tree search with language-driven exploration, and then introduce an RL algorithm built on this planning algorithm. Lastly, we present a model-free offline RL algorithm built upon GPT-2 that fine-tunes the language model through behavior cloning of critic-guided, self-generated dialogues.
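The self-imitation idea in the abstract (generate candidates, let a critic pick the best one, then behavior-clone that choice) can be illustrated with a toy sketch. This is not the thesis's implementation: the policy here is a small categorical distribution rather than GPT-2, the critic is a fixed table of hypothetical scores, and the names `self_imitation_step`, `train`, and `critic_scores` are invented for illustration only.

```python
import math
import random


def softmax(logits):
    """Convert logits into a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def self_imitation_step(logits, critic, num_samples, lr, rng):
    """One toy self-imitation update (hypothetical helper).

    1. Sample candidate actions from the current policy.
    2. Keep the candidate the critic scores highest.
    3. Take a behavior-cloning step toward that candidate, i.e. a
       gradient-ascent step on log pi(best) with respect to the logits.
    """
    probs = softmax(logits)
    actions = list(range(len(logits)))
    candidates = rng.choices(actions, weights=probs, k=num_samples)
    best = max(candidates, key=critic)
    # Gradient of log softmax(logits)[best] is onehot(best) - probs.
    return [
        logit + lr * ((1.0 if i == best else 0.0) - p)
        for i, (logit, p) in enumerate(zip(logits, probs))
    ]


def train(critic_scores, steps=200, num_samples=8, lr=0.5, seed=0):
    """Run repeated self-imitation steps from a uniform policy.

    `critic_scores` is an assumed, fixed score per action that stands in
    for a learned critic.
    """
    rng = random.Random(seed)
    logits = [0.0] * len(critic_scores)

    def critic(action):
        return critic_scores[action]

    for _ in range(steps):
        logits = self_imitation_step(logits, critic, num_samples, lr, rng)
    return softmax(logits)


if __name__ == "__main__":
    # Three hypothetical candidate responses; the second is scored highest.
    print(train([0.1, 0.9, 0.3]))
```

In this toy setting the policy only ever imitates actions it sampled itself, so it stays within its own support while shifting probability mass toward responses the critic prefers.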
Advisors
Kim, Kee-Eung
Description
Korea Advanced Institute of Science and Technology (KAIST): School of Computing
Publisher
Korea Advanced Institute of Science and Technology (KAIST)
Issue Date
2022
Identifier
325007
Language
eng
Description

Doctoral thesis (Ph.D.) - Korea Advanced Institute of Science and Technology (KAIST): School of Computing, 2022.8, [v, 70 p.]

Keywords

Reinforcement learning; Goal-oriented dialogues; Self-imitation learning; Offline reinforcement learning

URI
http://hdl.handle.net/10203/309226
Link
http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=1007885&flag=dissertation
Appears in Collection
CS-Theses_Ph.D. (Doctoral Theses)
Files in This Item
There are no files associated with this item.
