One-shot 3D estimation with object guidance

3D estimation is a crucial task in computer vision, with applications in robotics, autonomous driving, and virtual reality. In human perception, the brain predominantly relies on visual information from multiple viewpoints to infer three-dimensional spatial relationships; interestingly, even from a single image, it retains a remarkable capacity to perceive the underlying three-dimensional structure. Translating this innate ability to artificial intelligence (AI) models, however, poses significant challenges. In traditional AI approaches, 3D estimation is typically accomplished by exploiting geometric constraints across multi-view images with known camera poses and a photometric loss on RGB values, so 3D estimation from a single image remains a challenging task. Moreover, acquiring multiple images with corresponding camera poses requires specialized calibrated equipment (a camera with an IMU) or data collection in a restricted environment without moving objects in the scene. Consequently, learning 3D information from multi-view data under these constraints is difficult and cannot be applied to arbitrary real-world scenarios. In contrast, a one-shot 3D learning model that uses a single image can estimate a 3D model without these data constraints, such as the absence of moving objects, and can exploit data from arbitrary environments and situations; for example, the model can even be trained on photos crawled from the internet. However, 3D estimation from a single image cannot exploit the geometric constraints and photometric losses available from multi-view images. This work aims to address these limitations of one-shot 3D estimation by proposing novel methods and techniques.

First, we aim to overcome the insufficient information available in one-shot observations for depth estimation by leveraging objectness to estimate fine-grained depth maps. Existing one-shot depth estimation models are mainly trained on the scene structure and vanishing points of outdoor road scenes, and therefore have limited capability in estimating fine-grained object depth details. To address this issue, our proposed approach focuses on object regions, resulting in improved fine-grained depth estimation; a minimal sketch of this object-guidance idea is given below. We evaluate the proposed approach against existing depth estimation models and analyze how the model learns detailed regions of the scene.

On the other hand, the acquisition of diverse data is critical for one-shot 3D estimation to overcome the limitation of a single observation, since diverse data provides sufficient feature information to learn from a single image. Researchers have therefore constructed 3D datasets using depth sensors such as Time-of-Flight (ToF) and LiDAR sensors in specific environments. In this work, we contribute the SideGuide dataset, which expands the coverage of existing 3D datasets. Outdoor datasets typically focus on road scenes; we instead prioritize the sidewalk environment, as it is an area with a high volume of pedestrian traffic, including impaired individuals. We release the SideGuide dataset, which includes object bounding boxes, masks, and depth maps obtained from stereo sensors, and we hope it facilitates research on one-shot 3D estimation in sidewalk environments.
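The following is a minimal sketch, in PyTorch, of how object guidance could be realized as an object-weighted depth loss, for instance using masks such as those provided by SideGuide. The function name, tensor shapes, and weighting scheme are illustrative assumptions, not the thesis's actual implementation.

```python
import torch

def object_weighted_depth_loss(pred_depth, gt_depth, object_mask, object_weight=2.0):
    """L1 depth loss that up-weights pixels inside object regions.

    pred_depth, gt_depth: (B, 1, H, W) float tensors.
    object_mask: (B, 1, H, W) tensor with 1 inside objects, 0 elsewhere.
    Hypothetical sketch of the object-guidance idea; not the thesis's exact loss.
    """
    per_pixel = torch.abs(pred_depth - gt_depth)
    # Background pixels keep weight 1.0; object pixels are weighted by `object_weight`.
    weights = 1.0 + (object_weight - 1.0) * object_mask
    return (weights * per_pixel).sum() / weights.sum()
```

Under this assumed formulation, increasing the object weight emphasizes errors inside object regions, which is one simple way to push a model toward finer object-level depth detail.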
Lastly, we delve into one-shot 3D neural rendering with object guidance. In contrast to vanilla NeRF (Neural Radiance Fields) models, which train a separate model for each 3D structure from hundreds of images, a one-shot neural rendering model faces challenges in improving the model architecture and in learning 3D structure from diverse one-shot observations. To tackle these difficulties, we propose a network architecture for one-shot neural rendering that addresses these inherent complexities and trains on a single image. We compare the performance of the proposed model on real-world data by using 3D virtual data. To learn effectively from the artificially generated virtual data, we employ knowledge distillation in a teacher-student framework: the student model is incrementally trained with object information extracted from images, ultimately yielding a model that learns and incorporates object information within the rendering process; a sketch of this distillation objective is given below. Additionally, we propose one-shot generative 3D estimation, improving the model's generalization and enabling the generation of 3D models via diffusion processes.
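Below is a minimal sketch of a teacher-student distillation objective of the kind described above; the function name, tensor shapes, and the fixed blending weight `alpha` are illustrative assumptions rather than the thesis's exact formulation.

```python
import torch.nn.functional as F

def distillation_loss(student_rgb, teacher_rgb, gt_rgb, alpha=0.5):
    """Blend teacher supervision with ground-truth pixel supervision.

    student_rgb, teacher_rgb, gt_rgb: (num_rays, 3) predicted/target colors.
    Hypothetical sketch of a teacher-student objective; not the thesis's exact loss.
    """
    distill = F.mse_loss(student_rgb, teacher_rgb.detach())  # imitate the teacher's renderings
    recon = F.mse_loss(student_rgb, gt_rgb)                  # match the observed pixels
    return alpha * distill + (1.0 - alpha) * recon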
Advisors
권인소 (In So Kweon)
Publisher
Korea Advanced Institute of Science and Technology (KAIST)
Issue Date
2023
Identifier
325007
Language
eng
Description

Thesis (Ph.D.) - Korea Advanced Institute of Science and Technology: Interdisciplinary Program in Future Vehicle, 2023.8, [vi, 57 p.]

Keywords

3D reconstruction; stereo; sidewalk walking; neural rendering; depth estimation; 3D estimation; single-view depth estimation; one-shot; diffusion model

URI
http://hdl.handle.net/10203/320848
Link
http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=1046619&flag=dissertation
Appears in Collection
PD-Theses_Ph.D. (Doctoral Theses)
Files in This Item
There are no files associated with this item.
