Towards a human-like conversation agent (인간과 유사한 대화가 가능한 대화 시스템 이론: a theory of dialogue systems capable of human-like conversation)

Advances in deep learning allow machine-learning-based algorithms to solve increasingly diverse and complex problems. Recently introduced generative neural networks with outstanding performance show that deep learning is not limited to classification tasks. In natural language processing, machine-translation algorithms have been extended to text-to-text dialogue systems through encoder-decoder architectures such as seq2seq, combined with high-performance word-embedding models. Traditional conversation models, which rely on rule-based language understanding and retrieve outputs from a fixed database of sentences, target goal-oriented dialogue and simple question answering; as a result, they are limited to stereotyped exchanges in which users select a response from a given set of options. In contrast, deep-learning-based dialogue systems use encoder-decoder structures and high-performance word embeddings to understand language and generate a response word by word. This generative property frees the system from fixed dialogue formats. On the other hand, the natural-language-understanding performance of neural-network-based systems depends strongly on the word embeddings used, and even when long short-term memory (LSTM) units are applied, such systems fail to maintain the context of previous utterances once the conversation grows longer than four turns. Chapter 2 introduces a method for building dialogue systems that perform well while maintaining the context of the overall dialogue sequence. The results suggest that neural-network-based dialogue systems can carry out more human-like conversation and overcome the limitations of traditional methods. Chapter 3 presents a hybrid approach that combines rule-based language understanding with a neural encoder-decoder structure to exploit the advantages of both goal-oriented and non-goal-oriented dialogue systems.
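The context-maintenance problem described above can be illustrated with a minimal sketch (the class and token names are illustrative assumptions, not the thesis implementation): a dialogue agent that prepends a sliding window of previous utterances to the encoder input, so the generator conditions on more than just the latest turn.

```python
from collections import deque


class ContextWindow:
    """Keeps the last `max_turns` utterances as encoder context.

    A plain seq2seq model that sees only the latest utterance loses
    track of earlier turns; concatenating a sliding window of history
    is one simple way to keep the dialogue context in view.
    """

    def __init__(self, max_turns=4):
        self.history = deque(maxlen=max_turns)

    def add(self, speaker, utterance):
        self.history.append(f"{speaker}: {utterance}")

    def encoder_input(self, new_utterance):
        # Concatenate the retained history with the new utterance,
        # separated by a special token the encoder can learn to use.
        return " <sep> ".join([*self.history, f"user: {new_utterance}"])


ctx = ContextWindow(max_turns=2)
ctx.add("user", "I love sci-fi movies.")
ctx.add("agent", "Have you seen Arrival?")
print(ctx.encoder_input("No, is it good?"))
```

Because `deque(maxlen=...)` silently discards the oldest turn, the encoder input stays bounded in length no matter how long the conversation runs.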
Non-goal-oriented dialogue is difficult because natural language is context-dependent: a single word can carry different meanings depending on the context of the overall dialogue, even when identical words are used. Humans draw on symbolic context features for precise comprehension, including who they are talking to, what was said before and after the target utterance, and visual information. Because deep-learning algorithms understand language strictly through the word embeddings they use, there are cases in which the system misunderstands the context of an input utterance and generates unrelated responses, or fails to generate appropriate responses at all. To overcome this limitation, a hybrid system is presented that combines rule-based context retrieval with an artificial-neural-network-based encoder augmented by a Bayesian skip-gram model. Example dialogue responses for given context categories show that the hybrid approach improves the performance of a non-goal-oriented dialogue agent. Chapter 4 introduces an approach that expands the capabilities of a text-to-text dialogue agent with multi-modal algorithms and persona embedding. Real human conversation does not rely solely on textual information to interpret messages: visual, auditory, and knowledge-based information are interpreted along with text, sometimes yielding an entirely different inference. Multi-modal learning is already a broad field of research, but the majority of existing models use an ensemble method in which the features of different modalities are concatenated as input to a single network. Such end-to-end ensemble-based feature fusion degrades performance, especially when the two modalities are not in a complementary relationship. In this research, we target context classification of image-text multi-modal SNS posts, with the corresponding hashtags as the output.
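The hybrid idea of Chapter 3, rule-based understanding placed in front of a generative fallback, can be sketched as a simple dispatcher. The rules and the stubbed generator below are illustrative assumptions, not the thesis implementation:

```python
import re

# Illustrative goal-oriented rules: compiled pattern -> canned handler.
RULES = [
    (re.compile(r"\bweather\b", re.I),
     lambda u: "Checking the weather for you."),
    (re.compile(r"\bbook (a )?table\b", re.I),
     lambda u: "How many people and what time?"),
]


def generative_model(utterance, context):
    # Stand-in for a neural encoder-decoder; a real system would
    # decode a response word by word conditioned on the context.
    return f"(generated reply to: {utterance!r})"


def respond(utterance, context=()):
    # Goal-oriented intents are handled by deterministic rules first;
    # everything else falls through to the generative model.
    for pattern, handler in RULES:
        if pattern.search(utterance):
            return handler(utterance)
    return generative_model(utterance, context)


print(respond("What's the weather like?"))   # rule-based path
print(respond("Tell me something funny."))   # generative fallback
```

The routing step keeps goal-oriented exchanges predictable while letting open-ended chit-chat reach the generative component.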
A novel multi-phase training method is introduced that effectively fuses image and text features using a single deep perceptron. We also decompose the multi-phase training scheme mathematically into an end-to-end equivalent, which allows stochastic gradient descent to be used during training. The final experiment of this thesis introduces a method for embedding personality into machine-learning-based language models. Statistical approaches classify a person's personality into categories using fixed-format questionnaires. We introduce a training method that applies these statistical methods to neural-network-based models, and provide a benchmark model for measuring the performance of personality classification. Through this series of studies, we suggest a path toward building a human-like AI dialogue agent.
Advisors
Kim, Dae-Shik (김대식)
Description
Korea Advanced Institute of Science and Technology: School of Electrical Engineering
Publisher
Korea Advanced Institute of Science and Technology (KAIST)
Issue Date
2022
Identifier
325007
Language
eng
Description

Doctoral thesis (Ph.D.) - Korea Advanced Institute of Science and Technology: School of Electrical Engineering, 2022.2, [iv, 51 p.]

URI
http://hdl.handle.net/10203/309068
Link
http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=996264&flag=dissertation
Appears in Collection
EE-Theses_Ph.D. (doctoral theses)
