Understanding VQA for Negative Answers Through Visual and Linguistic Inference

DC Field | Value | Language
dc.contributor.author | Jung, Seungjun | ko
dc.contributor.author | Byun, Junyoung | ko
dc.contributor.author | Shim, Kyujin | ko
dc.contributor.author | Hwang, Sanghyun | ko
dc.contributor.author | Kim, Changick | ko
dc.date.accessioned | 2021-11-25T06:43:26Z | -
dc.date.available | 2021-11-25T06:43:26Z | -
dc.date.created | 2021-11-24 | -
dc.date.issued | 2021-09-19 | -
dc.identifier.citation | IEEE International Conference on Image Processing (ICIP), pp. 2873-2877 | -
dc.identifier.issn | 1522-4880 | -
dc.identifier.uri | http://hdl.handle.net/10203/289475 | -
dc.description.abstract | In order to make Visual Question Answering (VQA) explainable, previous studies not only visualize the attended region of a VQA model but also generate textual explanations for its answers. However, when the model's answer is "no," existing methods have difficulty revealing the detailed arguments that lead to that answer. In addition, previous methods are insufficient to provide logical bases when the question requires common sense to answer. In this paper, we propose a novel textual explanation method to overcome the aforementioned limitations. First, we extract keywords that are essential to infer an answer from a question. Second, we utilize a novel Variable-Constrained Beam Search (VCBS) algorithm to generate explanations that best describe the circumstances in images. Furthermore, if the answer to the question is "yes" or "no," we apply Natural Language Inference (NLI) to determine whether the contents of the question can be inferred from the explanation using common sense. Our user study, conducted on Amazon Mechanical Turk (MTurk), shows that our proposed method generates more reliable explanations than previous methods. Moreover, by modifying the VQA model's answer through the output of the NLI model, we show that VQA performance increases by 1.1% over the original model. | -
dc.language | English | -
dc.publisher | IEEE | -
dc.title | Understanding VQA for Negative Answers Through Visual and Linguistic Inference | -
dc.type | Conference | -
dc.identifier.wosid | 000819455102198 | -
dc.identifier.scopusid | 2-s2.0-85125578051 | -
dc.type.rims | CONF | -
dc.citation.beginningpage | 2873 | -
dc.citation.endingpage | 2877 | -
dc.citation.publicationname | IEEE International Conference on Image Processing (ICIP) | -
dc.identifier.conferencecountry | US | -
dc.identifier.conferencelocation | Anchorage, AK | -
dc.identifier.doi | 10.1109/icip42928.2021.9506242 | -
dc.contributor.localauthor | Kim, Changick | -
dc.contributor.nonIdAuthor | Jung, Seungjun | -
dc.contributor.nonIdAuthor | Byun, Junyoung | -
dc.contributor.nonIdAuthor | Shim, Kyujin | -
dc.contributor.nonIdAuthor | Hwang, Sanghyun | -
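
Note on the answer-correction step described in the abstract: the generated explanation serves as an NLI premise, the question (rephrased as a declarative statement) serves as the hypothesis, and the entailment verdict can overrule a yes/no answer. The sketch below is not the authors' implementation; it assumes an off-the-shelf NLI checkpoint (roberta-large-mnli via Hugging Face Transformers) and hand-written example sentences.

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Off-the-shelf NLI model; the paper does not specify this checkpoint.
MODEL = "roberta-large-mnli"
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForSequenceClassification.from_pretrained(MODEL).eval()

# Hypothetical inputs: a generated explanation (premise) and the
# question rewritten as a declarative statement (hypothesis).
premise = "A man is walking his dog in the park."
hypothesis = "There is a dog in the park."

inputs = tokenizer(premise, hypothesis, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
label = model.config.id2label[logits.argmax(dim=-1).item()]

# Map the NLI verdict onto a yes/no answer: entailment supports "yes",
# contradiction supports "no", and neutral leaves the VQA answer as-is.
override = {"ENTAILMENT": "yes", "CONTRADICTION": "no"}.get(label)
print(label, "->", override or "keep VQA model's answer")

In the paper this check is applied only to yes/no questions, which is what makes the premise-hypothesis framing well-defined; the 1.1% accuracy gain reported in the abstract comes from overriding the VQA model's answer with the NLI verdict.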
Appears in Collection
EE-Conference Papers (Conference Papers)
Files in This Item
There are no files associated with this item.
