Study on design of neural network for location-aware scene text recognizer (위치 인지 기반 이미지 문자 인식기를 위한 신경망 디자인에 관한 연구)

DC Field | Value | Language
dc.contributor.advisor | Choo, Jaegul (주재걸) | -
dc.contributor.author | Yun, Huiwon | -
dc.contributor.author | 윤희원 | -
dc.date.accessioned | 2024-07-25T19:30:43Z | -
dc.date.available | 2024-07-25T19:30:43Z | -
dc.date.issued | 2023 | -
dc.identifier.uri | http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=1045713&flag=dissertation | en_US
dc.identifier.uri | http://hdl.handle.net/10203/320525 | -
dc.description | Thesis (Master's) - Korea Advanced Institute of Science and Technology (KAIST): Kim Jaechul Graduate School of AI, 2023.8, [iii, 24 p.] | -
dc.description.abstract | In scene text recognition (STR), it is important to identify where each character is located in the visual scene when generating the text sequence with a language decoder. Previously proposed STR models with autoregressive architectures (e.g., RNN, Transformer) learn the separation of each character's region only implicitly. Because these models use no supervision on localization, the activated regions of the visual features remain misaligned with the ground-truth text. To resolve this issue, we present a novel STR method, visual LocAlization LeverAged to LANguage Decoding (vLaLa-Land), which explicitly learns localization through a character detection task in the Transformer decoder. To train localization and recognition in harmony, we developed two novel mechanisms in the decoder. First, to capture the overall semantic relationship between linguistic and visual information, we apply bidirectional reference-guided Transformer decoder layers on top of the unidirectional autoregressive Transformer decoder layers. Second, to properly recognize irregularly shaped text, we consider the height, width, and rotation of each character when computing the cross-attention score. We train our model on synthetic datasets and evaluate it on real datasets. The experiments show that our method enhances text recognition accuracy while simultaneously improving the localization ability of the model. Moreover, our model works especially well on irregular-text datasets and achieves competitive performance on multiple STR benchmarks. | -
dc.language | eng | -
dc.publisher | Korea Advanced Institute of Science and Technology (KAIST) | -
dc.subject | 이미지 문자 인식; 객체 탐지; 인공신경망; 컴퓨터 비전 | -
dc.subject | Scene text recognition; Object detection; Artificial neural network; Computer vision | -
dc.title | Study on design of neural network for location-aware scene text recognizer | -
dc.title.alternative | 위치 인지 기반 이미지 문자 인식기를 위한 신경망 디자인에 관한 연구 | -
dc.type | Thesis(Master) | -
dc.identifier.CNRN | 325007 | -
dc.description.department | Korea Advanced Institute of Science and Technology (KAIST): Kim Jaechul Graduate School of AI | -
dc.contributor.alternativeauthor | Choo, Jaegul | -
Appears in Collection
AI-Theses_Master(석사논문)
Files in This Item
There are no files associated with this item.
