Understanding VQA for Negative Answers Through Visual and Linguistic Inference

DC Field | Value | Language
dc.contributor.author | Jung, Seungjun | ko
dc.contributor.author | Byun, Junyoung | ko
dc.contributor.author | Shim, Kyujin | ko
dc.contributor.author | Hwang, Sanghyun | ko
dc.contributor.author | Kim, Changick | ko
dc.date.accessioned | 2021-11-25T06:43:26Z | -
dc.date.available | 2021-11-25T06:43:26Z | -
dc.date.created | 2021-11-24 | -
dc.date.issued | 2021-09-19 | -
dc.identifier.citation | IEEE International Conference on Image Processing (ICIP), pp. 2873-2877 | -
dc.identifier.issn | 1522-4880 | -
dc.identifier.uri | http://hdl.handle.net/10203/289475 | -
dc.description.abstract | In order to make Visual Question Answering (VQA) explainable, previous studies not only visualize the attended region of a VQA model but also generate textual explanations for its answers. However, when the model's answer is "no," existing methods have difficulty revealing the detailed arguments that lead to that answer. In addition, previous methods are insufficient to provide logical bases when the question requires common sense to answer. In this paper, we propose a novel textual explanation method to overcome the aforementioned limitations. First, we extract keywords that are essential to infer an answer from a question. Second, we utilize a novel Variable-Constrained Beam Search (VCBS) algorithm to generate explanations that best describe the circumstances in images. Furthermore, if the answer to the question is "yes" or "no," we apply Natural Language Inference (NLI) to determine whether the contents of the question can be inferred from the explanation using common sense. Our user study, conducted on Amazon Mechanical Turk (MTurk), shows that our proposed method generates more reliable explanations than previous methods. Moreover, by modifying the VQA model's answer through the output of the NLI model, we show that VQA performance increases by 1.1% over the original model. | -
dc.language | English | -
dc.publisher | IEEE | -
dc.title | Understanding VQA for Negative Answers Through Visual and Linguistic Inference | -
dc.type | Conference | -
dc.identifier.wosid | 000819455102198 | -
dc.identifier.scopusid | 2-s2.0-85125578051 | -
dc.type.rims | CONF | -
dc.citation.beginningpage | 2873 | -
dc.citation.endingpage | 2877 | -
dc.citation.publicationname | IEEE International Conference on Image Processing (ICIP) | -
dc.identifier.conferencecountry | US | -
dc.identifier.conferencelocation | Anchorage, AK | -
dc.identifier.doi | 10.1109/icip42928.2021.9506242 | -
dc.contributor.localauthor | Kim, Changick | -
dc.contributor.nonIdAuthor | Jung, Seungjun | -
dc.contributor.nonIdAuthor | Byun, Junyoung | -
dc.contributor.nonIdAuthor | Shim, Kyujin | -
dc.contributor.nonIdAuthor | Hwang, Sanghyun | -
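
Note on the answer-correction step described in the abstract: the generated explanation serves as an NLI premise, the question (rephrased as a declarative statement) serves as the hypothesis, and the entailment verdict can overrule a yes/no answer. The sketch below is not the authors' implementation; it assumes an off-the-shelf NLI checkpoint (roberta-large-mnli via Hugging Face Transformers) and hand-written example sentences.

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Off-the-shelf NLI model; the paper does not specify this checkpoint.
MODEL = "roberta-large-mnli"
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForSequenceClassification.from_pretrained(MODEL).eval()

# Hypothetical inputs: a generated explanation (premise) and the
# question rewritten as a declarative statement (hypothesis).
premise = "A man is walking his dog in the park."
hypothesis = "There is a dog in the park."

inputs = tokenizer(premise, hypothesis, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
label = model.config.id2label[logits.argmax(dim=-1).item()]

# Map the NLI verdict onto a yes/no answer: entailment supports "yes",
# contradiction supports "no", and neutral leaves the VQA answer as-is.
override = {"ENTAILMENT": "yes", "CONTRADICTION": "no"}.get(label)
print(label, "->", override or "keep VQA model's answer")

In the paper this check is applied only to yes/no questions, which is what makes the premise-hypothesis framing well-defined; the 1.1% accuracy gain reported in the abstract comes from overriding the VQA model's answer with the NLI verdict.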
Appears in Collection
EE-Conference Papers (Conference Papers)
Files in This Item
There are no files associated with this item.
