DSpace at KOASAS: Towards Practical and Efficient Image-to-Speech Captioning with Vision-Language Pre-training and Multi-modal Tokens


College of Engineering › School of Electrical Engineering › EE-Conference Papers

Towards Practical and Efficient Image-to-Speech Captioning with Vision-Language Pre-training and Multi-modal Tokens

Cited 0 times in Web of Science

Cited 0 times in Scopus


DC Field	Value	Language
dc.contributor.author	Kim, Minsu	ko
dc.contributor.author	Choi, Jeongsoo	ko
dc.contributor.author	Maiti, Soumi	ko
dc.contributor.author	Yeo, Jeong Hun	ko
dc.contributor.author	Watanabe, Shinji	ko
dc.contributor.author	Ro, Yong Man	ko
dc.date.accessioned	2024-07-29T12:00:17Z	-
dc.date.available	2024-07-29T12:00:17Z	-
dc.date.created	2023-12-29	-
dc.date.issued	2024-04-16	-
dc.identifier.citation	IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2024)	-
dc.identifier.uri	http://hdl.handle.net/10203/321174	-
dc.publisher	IEEE Signal Processing Society	-
dc.title	Towards Practical and Efficient Image-to-Speech Captioning with Vision-Language Pre-training and Multi-modal Tokens	-
dc.type	Conference	-
dc.type.rims	CONF	-
dc.citation.publicationname	IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2024)	-
dc.identifier.conferencecountry	KO	-
dc.identifier.conferencelocation	Seoul	-
dc.contributor.localauthor	Ro, Yong Man	-
dc.contributor.nonIdAuthor	Kim, Minsu	-
dc.contributor.nonIdAuthor	Choi, Jeongsoo	-
dc.contributor.nonIdAuthor	Maiti, Soumi	-
dc.contributor.nonIdAuthor	Yeo, Jeong Hun	-
dc.contributor.nonIdAuthor	Watanabe, Shinji	-

Appears in Collection: EE-Conference Papers

Files in This Item: There are no files associated with this item.


Knowledge Service Development Team, KAIST 291 Daehak-ro, Yuseong-gu, Daejeon 34141, Republic of Korea. T. 82-42-350-4493 Email. koasas@kaist.ac.kr
Copyright © 2016. Korea Advanced Institute of Science and Technology. All Rights Reserved.
