Deep learning-based solutions for empowering visual localization and other vision tasks

Visual localization is essential for many applications, including AR/VR, robots, and self-driving cars. Traditional methods demand large memory and processing resources to estimate the camera pose in absolute and relative terms, which gave rise to a new paradigm of learning-based pose estimation, i.e., pose regressors. Existing relative camera pose estimation techniques rely on balancing hyperparameters in the loss function, tuned either manually or automatically. Meanwhile, current absolute pose regressors generally fail to adapt to different domains of the same scene. In this work, we address these two issues. First, we formulate the estimation of the relative camera pose between a pair of images with a two-stage training strategy that eliminates the need for compensating hyperparameters in the loss function. The proposed training strategy drastically improves translation vector estimation, by 16.11%, 28.88%, and 52.27% on the KingsCollege, OldHospital, and StMarysChurch scenes, respectively. To demonstrate texture invariance, we explore the generalization of the proposed method by extending the datasets to different scene styles, generated with Generative Adversarial Networks (GANs), for ablation and qualitative studies. Second, we offer a novel lightweight domain-adaptive training framework that retrains any existing absolute pose regressor (APR) to improve its generalization capability. Our lightweight network outperforms the transformer in translation vector estimation on the visual localization benchmark dataset. Despite using about 24 times fewer FLOPs, 12 times fewer activations, and five times fewer parameters than the state-of-the-art MS-Transformer, our approach outperforms all CNN-based architectures and achieves performance comparable to transformer-based architectures, ranking 2nd on the Cambridge Landmarks dataset and 4th on the 7Scenes dataset. Moreover, our approach outperforms the MS-Transformer, ranking 1st on unseen domains. Finally, this work explores inverting an APR to synthesize views, similar to NeRF.
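For context, the "compensating hyperparameter" the abstract refers to is the weight that conventional pose regressors use to combine translation and rotation errors into a single loss. The PyTorch sketch below contrasts that single-stage loss with one plausible reading of a two-stage schedule; it is an illustration of the idea only, not the thesis code, and all names (single_stage_loss, two_stage_loss, beta, stage) are hypothetical.

import torch

# Conventional single-stage pose-regression loss: translation and rotation
# errors live on different scales, so a balancing hyperparameter beta must
# be tuned per scene to combine them.
def single_stage_loss(t_pred, t_gt, q_pred, q_gt, beta=500.0):
    trans_err = torch.norm(t_pred - t_gt, dim=-1).mean()
    rot_err = torch.norm(q_pred - q_gt, dim=-1).mean()
    return trans_err + beta * rot_err  # beta compensates for the scale mismatch

# One plausible two-stage schedule (an assumption, not the thesis method):
# each stage optimizes a single error term, so the two terms are never
# mixed and no compensating weight is needed.
def two_stage_loss(t_pred, t_gt, q_pred, q_gt, stage):
    if stage == 1:  # stage 1: fit the translation output only
        return torch.norm(t_pred - t_gt, dim=-1).mean()
    return torch.norm(q_pred - q_gt, dim=-1).mean()  # stage 2: rotation only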
Advisors
Har, Dongsoo (하동수)
Publisher
Korea Advanced Institute of Science and Technology (KAIST)
Issue Date
2023
Identifier
325007
Language
eng
Description

Thesis (Master's) - Korea Advanced Institute of Science and Technology: Interdisciplinary Program in Future Vehicle, 2023.2, [vii, 71 p.]

Keywords

Visual localization; Camera pose; Relative pose estimation; Absolute pose estimation; Domain adaptation

URI
http://hdl.handle.net/10203/308345
Link
http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=1032372&flag=dissertation
Appears in Collection
PD-Theses_Master (Master's Theses)
Files in This Item
There are no files associated with this item.
