DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | 주재걸 | - |
dc.contributor.author | Yun, Huiwon | - |
dc.contributor.author | 윤희원 | - |
dc.date.accessioned | 2024-07-25T19:30:43Z | - |
dc.date.available | 2024-07-25T19:30:43Z | - |
dc.date.issued | 2023 | - |
dc.identifier.uri | http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=1045713&flag=dissertation | en_US |
dc.identifier.uri | http://hdl.handle.net/10203/320525 | - |
dc.description | Thesis (Master's) - Korea Advanced Institute of Science and Technology (KAIST) : Kim Jaechul Graduate School of AI, 2023.8, [iii, 24 p.] | - |
dc.description.abstract | In scene text recognition (STR), it is important to identify where each character is located in the visual scene when a language decoder generates the text sequence. Previous STR models with autoregressive architectures (e.g., RNN, Transformer) were proposed to learn the separation of each character's region implicitly. Because these models use no supervision for localization, the activated regions of the visual features remain misaligned with the ground-truth text. To resolve this issue, we present a novel STR method, visual LocAlization LeverAged to LANguage Decoding (vLaLa-Land), which explicitly learns localization through a character detection task in the Transformer decoder. To train localization and recognition harmoniously, we develop two novel mechanisms in the decoder. First, to capture the overall semantic relationship between linguistic and visual information, we apply bidirectional reference-guided Transformer decoder layers on top of the unidirectional autoregressive Transformer decoder layers. Second, to properly recognize irregularly shaped text, we consider the height, width, and rotation of each character when computing the cross-attention score. We train our model on synthetic datasets and evaluate it on real datasets. The experiments show that our method enhances text recognition accuracy while simultaneously improving the model's localization ability. Moreover, our model works especially well on irregular datasets and achieves competitive performance on multiple STR benchmarks. | - |
dc.language | eng | - |
dc.publisher | Korea Advanced Institute of Science and Technology (KAIST) | - |
dc.subject | 이미지 문자 인식; 객체 탐지; 인공신경망; 컴퓨터 비전 | - |
dc.subject | Scene text recognition; Object detection; Artificial neural network; Computer vision | - |
dc.title | Study on design of neural network for location-aware scene text recognizer | - |
dc.title.alternative | 위치 인지 기반 이미지 문자 인식기를 위한 신경망 디자인에 관한 연구 | - |
dc.type | Thesis(Master) | - |
dc.identifier.CNRN | 325007 | - |
dc.description.department | Korea Advanced Institute of Science and Technology (KAIST) : Kim Jaechul Graduate School of AI | - |
dc.contributor.alternativeauthor | Choo, Jaegul | - |