Adaptive attention fusion network for visual question answering

Abstract
Visual Question Answering (VQA) requires automatic understanding of both the content of a reference image and a natural-language question. Generating a visual attention map that focuses on the image regions related to the context of the question can improve VQA performance. In this paper, we propose an adaptive attention-based VQA network. The proposed method utilizes complementary information from attention maps computed at three levels of question embedding (word level, phrase level, and question level) and adaptively fuses this information to represent the image-question pair appropriately. Comparative experiments have been conducted on the public COCO-QA database to validate the proposed method. Experimental results show that the proposed method outperforms previous methods in terms of accuracy.
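The paper itself is not attached to this record, so implementation details are not given here. Purely as an illustrative sketch of the idea described in the abstract (attention maps at word, phrase, and question level, adaptively fused), the following PyTorch-style code shows one plausible arrangement. All module names, dimensions, and the gating scheme (e.g., AdaptiveAttentionFusion, img_dim, fuse_gate) are assumptions and are not taken from the paper.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class AdaptiveAttentionFusion(nn.Module):
        """Sketch: attend over image regions at three question-embedding levels,
        then adaptively fuse the attended features (hypothetical design)."""

        def __init__(self, img_dim=2048, q_dim=512, hidden=512, n_levels=3):
            super().__init__()
            # One attention scorer per embedding level (word, phrase, question).
            self.img_proj = nn.Linear(img_dim, hidden)
            self.q_proj = nn.ModuleList(nn.Linear(q_dim, hidden) for _ in range(n_levels))
            self.att_score = nn.ModuleList(nn.Linear(hidden, 1) for _ in range(n_levels))
            # Adaptive fusion weights, here computed from the question-level embedding.
            self.fuse_gate = nn.Linear(q_dim, n_levels)

        def forward(self, img_feats, q_levels):
            # img_feats: (B, R, img_dim) region features
            # q_levels: list of (B, q_dim) embeddings [word, phrase, question]
            img_h = self.img_proj(img_feats)                          # (B, R, hidden)
            attended = []
            for i, q in enumerate(q_levels):
                joint = torch.tanh(img_h + self.q_proj[i](q).unsqueeze(1))
                alpha = F.softmax(self.att_score[i](joint), dim=1)    # (B, R, 1) attention map
                attended.append((alpha * img_feats).sum(dim=1))       # (B, img_dim)
            # Adaptive fusion: weight each level's attended feature.
            w = F.softmax(self.fuse_gate(q_levels[-1]), dim=-1)       # (B, n_levels)
            return sum(w[:, i:i + 1] * attended[i] for i in range(len(attended)))

    # Example usage with random tensors (batch of 4, 36 regions).
    model = AdaptiveAttentionFusion()
    img = torch.randn(4, 36, 2048)
    q_levels = [torch.randn(4, 512) for _ in range(3)]
    fused = model(img, q_levels)  # (4, 2048) fused image-question representation

The fused representation would then feed an answer classifier; how the actual paper computes the per-level attention and the fusion weights may differ from this sketch.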
Publisher
IEEE
Issue Date
2017-07-10
Language
English
Citation
IEEE International Conference on Multimedia and Expo (ICME), pp. 997-1002
ISSN
1945-7871
DOI
10.1109/ICME.2017.8019540
URI
http://hdl.handle.net/10203/224196
Appears in Collection
EE-Conference Papers (Conference Papers)