Study on design of neural network for location-aware scene text recognizer

In scene text recognition (STR), it is important to identify where each character is located in the visual scene when a language decoder generates the text sequence. Previous STR models with autoregressive architectures (e.g., RNN, Transformer) were proposed to learn the separation of character regions implicitly. Because these models receive no supervision on localization, the activated regions of the visual features remain misaligned with the ground-truth text. To resolve this issue, we present a novel STR method, visual LocAlization LeverAged to LANguage Decoding (vLaLa-Land), which explicitly learns localization through a character detection task in the Transformer decoder. To train localization and recognition in harmony, we develop two novel mechanisms in the decoder. First, to capture the overall semantic relationship between linguistic and visual information, we apply bidirectional reference-guided Transformer decoder layers on top of the unidirectional autoregressive Transformer decoder layers. Second, to properly recognize irregularly shaped text, we consider the height, width, and rotation of each character when computing the cross-attention score. We train our model on synthetic datasets and evaluate it on real datasets. The experiments show that our method enhances text recognition accuracy while simultaneously improving the localization ability of the model. Moreover, our model works especially well on irregular datasets and achieves competitive performance on multiple STR benchmarks.
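The abstract does not say how height, width, and rotation enter the cross-attention score, so the following is only an illustrative sketch of one possible reading and not the thesis's formulation. It assumes each character has a predicted box (centre, width, height, angle) and adds a rotated Gaussian log-prior to ordinary scaled dot-product logits. The function names `rotated_box_prior` and `box_aware_attention`, and the choice of half the width and height as standard deviations, are hypothetical.

```python
import numpy as np

def rotated_box_prior(grid_h, grid_w, cx, cy, w, h, theta):
    """Log of an unnormalised rotated Gaussian centred on one character box.

    Assumption for illustration: half the box width/height serve as the
    standard deviations along the box's own (rotated) axes.
    """
    ys, xs = np.meshgrid(np.arange(grid_h), np.arange(grid_w), indexing="ij")
    dx, dy = xs - cx, ys - cy
    # Rotate the offsets into the box's local coordinate frame.
    u = np.cos(theta) * dx + np.sin(theta) * dy
    v = -np.sin(theta) * dx + np.cos(theta) * dy
    return -0.5 * ((u / (w / 2.0)) ** 2 + (v / (h / 2.0)) ** 2)

def box_aware_attention(query, keys, prior):
    """Softmax over content scores plus the spatial log-prior.

    query: (d,) decoder state for one character
    keys:  (H*W, d) flattened visual features
    prior: (H, W) output of rotated_box_prior
    """
    d = query.shape[0]
    logits = keys @ query / np.sqrt(d) + prior.reshape(-1)
    logits = logits - logits.max()  # subtract the max for numerical stability
    weights = np.exp(logits)
    weights = weights / weights.sum()
    return weights.reshape(prior.shape)

if __name__ == "__main__":
    prior = rotated_box_prior(8, 16, cx=5.0, cy=3.0, w=4.0, h=2.0, theta=0.3)
    # With all-zero keys the content scores are uniform, so the map shows the prior alone.
    attn = box_aware_attention(np.ones(4), np.zeros((8 * 16, 4)), prior)
    print(np.unravel_index(attn.argmax(), attn.shape))
```

In this sketch the prior only biases the attention toward the box; the content term can still move the attention mass elsewhere when the visual features disagree with the predicted box.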
Advisors
주재걸 (Jaegul Choo)
Description
Korea Advanced Institute of Science and Technology (KAIST): Kim Jaechul Graduate School of AI
Publisher
Korea Advanced Institute of Science and Technology (KAIST)
Issue Date
2023
Identifier
325007
Language
eng
Description

Thesis (Master's) - Korea Advanced Institute of Science and Technology (KAIST): Kim Jaechul Graduate School of AI, 2023.8, [iii, 24 p.]

Keywords

Scene text recognition; Object detection; Artificial neural network; Computer vision

URI
http://hdl.handle.net/10203/320525
Link
http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=1045713&flag=dissertation
Appears in Collection
AI-Theses_Master (Master's theses)
Files in This Item
There are no files associated with this item.
