DSpace at KOASAS: Study on design of neural network for location-aware scene text recognizer


Study on design of neural network for location-aware scene text recognizer


DC Field	Value	Language
dc.contributor.advisor	Choo, Jaegul (주재걸)	-
dc.contributor.author	Yun, Huiwon	-
dc.contributor.author	윤희원	-
dc.date.accessioned	2024-07-25T19:30:43Z	-
dc.date.available	2024-07-25T19:30:43Z	-
dc.date.issued	2023	-
dc.identifier.uri	http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=1045713&flag=dissertation	en_US
dc.identifier.uri	http://hdl.handle.net/10203/320525	-
dc.description	Thesis (Master's) - Korea Advanced Institute of Science and Technology: Kim Jaechul Graduate School of AI, 2023.8, [iii, 24 p.]	-
dc.description.abstract	In scene text recognition (STR), it is important to identify where each character is located in the image when a language decoder generates the text sequence. Previous STR models with autoregressive architectures (e.g., RNN, Transformer) were proposed to learn the separation of character regions implicitly. Because these models use no supervision on localization, the activated regions of the visual features remain misaligned with the ground-truth text. To resolve this issue, we present a novel STR method, visual LocAlization LeverAged to LANguage Decoding (vLaLa-Land), which explicitly learns localization through a character detection task in the Transformer decoder. To train localization and recognition in harmony, we develop two novel mechanisms in the decoder. First, to capture the overall semantic relationship between linguistic and visual information, we apply bidirectional reference-guided Transformer decoder layers on top of the unidirectional autoregressive Transformer decoder layers. Second, to properly recognize irregularly shaped text, we consider the height, width, and rotation of each character when computing the cross-attention score. We train our model on synthetic datasets and evaluate it on real datasets. The experiments show that our method enhances text recognition accuracy while simultaneously improving the localization ability of the model. Moreover, our model works especially well on irregular datasets and achieves competitive performance on multiple STR benchmarks.	-
dc.language	eng	-
dc.publisher	Korea Advanced Institute of Science and Technology	-
dc.subject	Scene text recognition; Object detection; Artificial neural network; Computer vision	-
dc.title	Study on design of neural network for location-aware scene text recognizer	-
dc.title.alternative	위치 인지 기반 이미지 문자 인식기를 위한 신경망 디자인에 관한 연구 (Korean title)	-
dc.type	Thesis(Master)	-
dc.identifier.CNRN	325007	-
dc.description.department	Korea Advanced Institute of Science and Technology: Kim Jaechul Graduate School of AI	-
dc.contributor.alternativeauthor	Choo, Jaegul	-

Appears in Collection: AI-Theses_Master (Master's theses)

Files in This Item: There are no files associated with this item.


Knowledge Service Development Team, KAIST 291 Daehak-ro, Yuseong-gu, Daejeon 34141, Republic of Korea. T. 82-42-350-4493 Email. koasas@kaist.ac.kr
Copyright © 2016. Korea Advanced Institute of Science and Technology. All Rights Reserved.
