DC Field | Value | Language |
---|---|---|
dc.contributor.author | Kim, Minsu | ko |
dc.contributor.author | Choi, Jeongsoo | ko |
dc.contributor.author | Maiti, Soumi | ko |
dc.contributor.author | Yeo, Jeong Hun | ko |
dc.contributor.author | Watanabe, Shinji | ko |
dc.contributor.author | Ro, Yong Man | ko |
dc.date.accessioned | 2024-07-29T12:00:17Z | - |
dc.date.available | 2024-07-29T12:00:17Z | - |
dc.date.created | 2023-12-29 | - |
dc.date.issued | 2024-04-16 | - |
dc.identifier.citation | IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2024) | - |
dc.identifier.uri | http://hdl.handle.net/10203/321174 | - |
dc.publisher | IEEE Signal Processing Society | - |
dc.title | Towards Practical and Efficient Image-to-Speech Captioning with Vision-Language Pre-training and Multi-modal Tokens | - |
dc.type | Conference | - |
dc.type.rims | CONF | - |
dc.citation.publicationname | IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2024) | - |
dc.identifier.conferencecountry | KO | - |
dc.identifier.conferencelocation | Seoul | - |
dc.contributor.localauthor | Ro, Yong Man | - |
dc.contributor.nonIdAuthor | Kim, Minsu | - |
dc.contributor.nonIdAuthor | Choi, Jeongsoo | - |
dc.contributor.nonIdAuthor | Maiti, Soumi | - |
dc.contributor.nonIdAuthor | Yeo, Jeong Hun | - |
dc.contributor.nonIdAuthor | Watanabe, Shinji | - |