GPT-CRITIC: OFFLINE REINFORCEMENT LEARNING FOR END-TO-END TASK-ORIENTED DIALOGUE SYSTEMS

DC Field | Value | Language
dc.contributor.author | Jang, Youngsoo | ko
dc.contributor.author | Lee, Jongmin | ko
dc.contributor.author | Kim, Kee-Eung | ko
dc.date.accessioned | 2023-09-14T11:01:17Z | -
dc.date.available | 2023-09-14T11:01:17Z | -
dc.date.created | 2023-09-14 | -
dc.date.issued | 2022-04 | -
dc.identifier.citation | 10th International Conference on Learning Representations, ICLR 2022 | -
dc.identifier.uri | http://hdl.handle.net/10203/312648 | -
dc.description.abstract | Training a task-oriented dialogue agent can be naturally formulated as an offline reinforcement learning (RL) problem, where the agent learns a conversational strategy to achieve user goals using only a dialogue corpus. This is very challenging for RL, since the natural-language action space is astronomically large while feasible (syntactically and semantically correct) actions are very sparse. Standard RL methods therefore fail easily and generate responses that diverge from human language, even when fine-tuning a powerful pre-trained language model. In this paper, we introduce GPT-Critic, an offline RL method for task-oriented dialogue. GPT-Critic is built upon GPT-2 and fine-tunes the language model through behavior cloning of critic-guided self-generated sentences. GPT-Critic is essentially free from the issue of diverging from human language, since it learns from sentences sampled from the pre-trained language model. In our experiments, we demonstrate that the algorithm outperforms the state of the art on task-oriented dialogue benchmarks including MultiWOZ 2.0 and ConvLab. | -
dc.language | English | -
dc.publisher | International Conference on Learning Representations, ICLR | -
dc.title | GPT-CRITIC: OFFLINE REINFORCEMENT LEARNING FOR END-TO-END TASK-ORIENTED DIALOGUE SYSTEMS | -
dc.type | Conference | -
dc.identifier.scopusid | 2-s2.0-85147669992 | -
dc.type.rims | CONF | -
dc.citation.publicationname | 10th International Conference on Learning Representations, ICLR 2022 | -
dc.identifier.conferencecountry | US | -
dc.identifier.conferencelocation | Virtual | -
dc.contributor.localauthor | Kim, Kee-Eung | -
dc.contributor.nonIdAuthor | Jang, Youngsoo | -
dc.contributor.nonIdAuthor | Lee, Jongmin | -
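The abstract describes a loop in which candidate responses are sampled from a pre-trained language model, scored by a learned critic, and the highest-scoring ones are kept as targets for behavior cloning. The following is a minimal toy sketch of that data-collection step only; the candidate pool, the `critic_score` heuristic, and all function names are illustrative stand-ins (the paper itself uses GPT-2 as the generator and a learned value function as the critic, neither of which is reproduced here).

```python
import random

def generate_candidates(lm_pool, context, k=3, rng=None):
    """Stand-in for sampling k candidate responses from a language model.

    Here the 'model' is just a fixed dict mapping a dialogue context to a
    pool of possible responses; a real system would decode from GPT-2.
    """
    rng = rng or random
    return rng.sample(lm_pool[context], k)

def critic_score(context, response):
    """Toy critic: an arbitrary heuristic standing in for a learned
    Q-function. It prefers longer responses that mention booking."""
    return len(response) + (10 if "book" in response else 0)

def critic_guided_dataset(lm_pool, contexts, k=3, rng=None):
    """For each context, keep the candidate the critic ranks highest.

    These (context, best-response) pairs correspond to the
    'critic-guided self-generated sentences' in the abstract; the
    language model would then be fine-tuned on them by supervised
    learning (behavior cloning), which is omitted here.
    """
    data = []
    for ctx in contexts:
        candidates = generate_candidates(lm_pool, ctx, k, rng)
        best = max(candidates, key=lambda r: critic_score(ctx, r))
        data.append((ctx, best))
    return data
```

Because every training target is itself a sample from the pre-trained model, the fine-tuned policy stays on the manifold of fluent language, which is the property the abstract highlights.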
Appears in Collection
AI-Conference Papers (Conference Papers)
Files in This Item
There are no files associated with this item.
