Cross-Active Connection for Image-Text Multimodal Feature Fusion

Abstract
Recent research increasingly tackles high-level machine learning tasks that involve multimodal datasets. Image-text multimodal learning is among the more challenging domains in Natural Language Processing. In this paper, we propose a novel method for fusing and training image-text multimodal features. The proposed architecture follows a multi-step training scheme for image-text multimodal classification: during training, different groups of weights in the network are updated hierarchically to reflect the importance of each individual modality as well as their mutual relationship. The effectiveness of the Cross-Active Connection in image-text multimodal NLP tasks was verified through extensive experiments on multimodal hashtag prediction and image-text feature fusion. © 2021, Springer Nature Switzerland AG.
Publisher
International Conference on Applications of Natural Language to Information Systems
Issue Date
2021-06-24
Language
English
Citation
26th International Conference on Applications of Natural Language to Information Systems, pp. 343-354
ISSN
0302-9743
DOI
10.1007/978-3-030-80599-9_30
URI
http://hdl.handle.net/10203/286243
Appears in Collection
EE-Conference Papers (Conference Papers)