DC Field | Value | Language |
---|---|---|
dc.contributor.author | Moon, Jong Hak | ko |
dc.contributor.author | Lee, Hyungyung | ko |
dc.contributor.author | Shin, Woncheol | ko |
dc.contributor.author | Kim, Young-Hak | ko |
dc.contributor.author | Choi, Yoonjae | ko |
dc.date.accessioned | 2022-12-15T09:00:11Z | - |
dc.date.available | 2022-12-15T09:00:11Z | - |
dc.date.created | 2022-12-03 | - |
dc.date.issued | 2022-12 | - |
dc.identifier.citation | IEEE JOURNAL OF BIOMEDICAL AND HEALTH INFORMATICS, v.26, no.12, pp.6070 - 6080 | - |
dc.identifier.issn | 2168-2194 | - |
dc.identifier.uri | http://hdl.handle.net/10203/303063 | - |
dc.description.abstract | Recently, a number of studies have demonstrated impressive performance on diverse vision-language multi-modal tasks, such as image captioning and visual question answering, by extending the BERT architecture with multi-modal pre-training objectives. In this work, we explore a broad set of multi-modal representation learning tasks in the medical domain, specifically using radiology images and their unstructured reports. We propose Medical Vision Language Learner (MedViLL), which adopts a BERT-based architecture combined with a novel multi-modal attention masking scheme to maximize generalization performance on both vision-language understanding tasks (diagnosis classification, medical image-report retrieval, and medical visual question answering) and a vision-language generation task (radiology report generation). By statistically and rigorously evaluating the proposed model on four downstream tasks with three radiographic image-report datasets (MIMIC-CXR, Open-I, and VQA-RAD), we empirically demonstrate the superior downstream-task performance of MedViLL against various baselines, including task-specific architectures. | - |
dc.language | English | - |
dc.publisher | IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC | - |
dc.title | Multi-modal Understanding and Generation for Medical Images and Text via Vision-Language Pre-Training | - |
dc.type | Article | - |
dc.identifier.wosid | 000894943300028 | - |
dc.identifier.scopusid | 2-s2.0-85139447655 | - |
dc.type.rims | ART | - |
dc.citation.volume | 26 | - |
dc.citation.issue | 12 | - |
dc.citation.beginningpage | 6070 | - |
dc.citation.endingpage | 6080 | - |
dc.citation.publicationname | IEEE JOURNAL OF BIOMEDICAL AND HEALTH INFORMATICS | - |
dc.identifier.doi | 10.1109/JBHI.2022.3207502 | - |
dc.contributor.localauthor | Choi, Yoonjae | - |
dc.contributor.nonIdAuthor | Lee, Hyungyung | - |
dc.contributor.nonIdAuthor | Shin, Woncheol | - |
dc.contributor.nonIdAuthor | Kim, Young-Hak | - |
dc.description.isOpenAccess | N | - |
dc.type.journalArticle | Article | - |
dc.subject.keywordAuthor | Healthcare | - |
dc.subject.keywordAuthor | medical | - |
dc.subject.keywordAuthor | multimodal learning | - |
dc.subject.keywordAuthor | representation learning | - |
dc.subject.keywordAuthor | self-supervised learning | - |
dc.subject.keywordAuthor | vision-and-language | - |
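
The abstract above describes a single BERT-style model whose behavior is switched between understanding and generation via a multi-modal self-attention mask. The sketch below is illustrative only, not the authors' implementation (MedViLL's exact masking scheme is defined in the paper): assuming a concatenated sequence of image embeddings followed by report tokens, it builds the two standard masks such a scheme would toggle between — fully bidirectional attention for understanding tasks, and a seq2seq-style mask for report generation in which text tokens attend to all image tokens but only causally to earlier text tokens.

```python
import torch


def build_multimodal_attention_mask(num_img_tokens: int,
                                    num_txt_tokens: int,
                                    causal_text: bool) -> torch.Tensor:
    """Return a (seq_len, seq_len) 0/1 attention mask over a concatenated
    [image tokens; text tokens] sequence, where 1 = attention allowed.

    Illustrative sketch, not MedViLL's actual masking scheme.
    """
    n = num_img_tokens + num_txt_tokens
    mask = torch.ones(n, n)
    if causal_text:
        # Seq2seq-style masking for generation: image (source) tokens
        # attend only among themselves, while text (target) tokens attend
        # to every image token and causally to earlier text tokens.
        mask[:num_img_tokens, num_img_tokens:] = 0.0
        mask[num_img_tokens:, num_img_tokens:] = torch.tril(
            torch.ones(num_txt_tokens, num_txt_tokens))
    return mask


# Understanding tasks: fully bidirectional attention across both modalities.
print(build_multimodal_attention_mask(3, 2, causal_text=False))
# Generation task (report generation): causal attention on the text side.
print(build_multimodal_attention_mask(3, 2, causal_text=True))
```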