GPT-CRITIC: OFFLINE REINFORCEMENT LEARNING FOR END-TO-END TASK-ORIENTED DIALOGUE SYSTEMS

DC Field | Value | Language
dc.contributor.author | Jang, Youngsoo | ko
dc.contributor.author | Lee, Jongmin | ko
dc.contributor.author | Kim, Kee-Eung | ko
dc.date.accessioned | 2023-09-14T11:01:17Z | -
dc.date.available | 2023-09-14T11:01:17Z | -
dc.date.created | 2023-09-14 | -
dc.date.issued | 2022-04 | -
dc.identifier.citation | 10th International Conference on Learning Representations, ICLR 2022 | -
dc.identifier.uri | http://hdl.handle.net/10203/312648 | -
dc.description.abstract | Training a task-oriented dialogue agent can be naturally formulated as an offline reinforcement learning (RL) problem, where the agent learns a conversational strategy to achieve user goals using only a dialogue corpus. This is very challenging for RL, since the natural-language action space is astronomically large while feasible (syntactically and semantically correct) actions are very sparse. Standard RL methods therefore fail easily and generate responses that diverge from human language, even when fine-tuning a powerful pre-trained language model. In this paper, we introduce GPT-Critic, an offline RL method for task-oriented dialogue. GPT-Critic is built upon GPT-2 and fine-tunes the language model through behavior cloning of critic-guided self-generated sentences. GPT-Critic is essentially free from the issue of diverging from human language, since it learns from sentences sampled from the pre-trained language model. In our experiments, we demonstrate that the algorithm outperforms the state of the art on task-oriented dialogue benchmarks including MultiWOZ 2.0 and ConvLab. | -
dc.language | English | -
dc.publisher | International Conference on Learning Representations, ICLR | -
dc.title | GPT-CRITIC: OFFLINE REINFORCEMENT LEARNING FOR END-TO-END TASK-ORIENTED DIALOGUE SYSTEMS | -
dc.type | Conference | -
dc.identifier.scopusid | 2-s2.0-85147669992 | -
dc.type.rims | CONF | -
dc.citation.publicationname | 10th International Conference on Learning Representations, ICLR 2022 | -
dc.identifier.conferencecountry | US | -
dc.identifier.conferencelocation | Virtual | -
dc.contributor.localauthor | Kim, Kee-Eung | -
dc.contributor.nonIdAuthor | Jang, Youngsoo | -
dc.contributor.nonIdAuthor | Lee, Jongmin | -
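The abstract describes a loop in which candidate responses are sampled from a pre-trained language model, scored by a learned critic, and the highest-scoring ones are kept as targets for behavior cloning. The following is a minimal toy sketch of that data-collection step only; the candidate pool, the `critic_score` heuristic, and all function names are illustrative stand-ins (the paper itself uses GPT-2 as the generator and a learned value function as the critic, neither of which is reproduced here).

```python
import random

def generate_candidates(lm_pool, context, k=3, rng=None):
    """Stand-in for sampling k candidate responses from a language model.

    Here the 'model' is just a fixed dict mapping a dialogue context to a
    pool of possible responses; a real system would decode from GPT-2.
    """
    rng = rng or random
    return rng.sample(lm_pool[context], k)

def critic_score(context, response):
    """Toy critic: an arbitrary heuristic standing in for a learned
    Q-function. It prefers longer responses that mention booking."""
    return len(response) + (10 if "book" in response else 0)

def critic_guided_dataset(lm_pool, contexts, k=3, rng=None):
    """For each context, keep the candidate the critic ranks highest.

    These (context, best-response) pairs correspond to the
    'critic-guided self-generated sentences' in the abstract; the
    language model would then be fine-tuned on them by supervised
    learning (behavior cloning), which is omitted here.
    """
    data = []
    for ctx in contexts:
        candidates = generate_candidates(lm_pool, ctx, k, rng)
        best = max(candidates, key=lambda r: critic_score(ctx, r))
        data.append((ctx, best))
    return data
```

Because every training target is itself a sample from the pre-trained model, the fine-tuned policy stays on the manifold of fluent language, which is the property the abstract highlights.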
Appears in Collection
AI-Conference Papers (Conference Papers)
Files in This Item
There are no files associated with this item.
