GPT-CRITIC: OFFLINE REINFORCEMENT LEARNING FOR END-TO-END TASK-ORIENTED DIALOGUE SYSTEMS

Cited 0 time in webofscience Cited 0 time in scopus
  • Hit : 220
  • Download : 0
Training a task-oriented dialogue agent can be naturally formulated as offline reinforcement learning (RL) problem, where the agent aims to learn a conversational strategy to achieve user goals, only from a dialogue corpus. It is very challenging in terms of RL since the natural language action space is astronomical, while feasible (syntactically and semantically correct) actions are very sparse. Thus, standard RL methods easily fail and generate responses diverging from human language, even when fine-tuning a powerful pre-trained language model. In this paper, we introduce GPT-Critic, an offline RL method for task-oriented dialogue. GPT-Critic is built upon GPT-2, fine-tuning the language model through behavior cloning of the critic-guided self-generated sentences. GPT-Critic is essentially free from the issue of diverging from human language since it learns from the sentences sampled from the pre-trained language model. In the experiments, we demonstrate that our algorithm outperforms the state-of-the-art in the task-oriented dialogue benchmarks including MultiWOZ 2.0 and ConvLab.
Publisher
International Conference on Learning Representations, ICLR
Issue Date
2022-04
Language
English
Citation

10th International Conference on Learning Representations, ICLR 2022

URI
http://hdl.handle.net/10203/312648
Appears in Collection
AI-Conference Papers(학술대회논문)
Files in This Item
There are no files associated with this item.

qr_code

  • mendeley

    citeulike


rss_1.0 rss_2.0 atom_1.0