Towards universal visual scene understanding in the wild (Korean title: Robust Deep Learning-Based Universal Scene Understanding)

Despite significant advances in deep learning, sophisticated deep models often fail to perform well in real-world scenarios. The fundamental cause of these failures is insufficient or biased training data. Although scaling up data is a fundamental and ideal solution, constructing large datasets for each of many target tasks is impractical. In this thesis, we explore practical learning frameworks for training robust deep learning models under data constraints. Specifically, we aim to build generalized models that are robust to new objects and environments not included in the training data. To achieve this level of generality, we leverage large-scale pre-existing or easily obtainable data as additional training data. We expect such data to generalize and regularize the task knowledge learned from the limited target-task dataset.

Under this problem definition, we mainly investigate universal scene understanding. The goal of this task is to provide high-quality scene understanding for an input image or video, that is, a comprehensive description of what the constituent objects are and where they are. Unlike traditional recognition models, which understand a scene in a predefined way, a universal scene understanding model flexibly handles different task definitions. Task formats can be defined through predefined dataset-level semantic concepts, arbitrary user-provided semantic concepts, or user interactions.

To address this challenging task, we design a universal scene understanding model from a modular perspective, in which each module serves a distinct function. The model consists of three key modules: the segmenter, the refiner, and the classifier. The segmenter first generates a set of coarse region masks that identify potential object regions in the scene.
The refiner then refines the segmented regions to preserve the fine details of the scene structure; this module is central to a more comprehensive understanding of the scene. Finally, the classifier assigns a class label to each refined region: it analyzes the content of each region and gives it an appropriate class name, enabling detailed categorization of the objects in the scene. In Chapter 2, we begin with the generalization issue for the image segmenter, studying it in the context of unsupervised domain adaptation. In Chapter 3, we turn to the generalization issue for the video segmenter. To build a robust video segmenter, we jointly use image and video data and explore how to bridge these two distinct data sources, mainly in the semi-supervised video object segmentation task. In Chapter 4, we formulate the refiner module as a mask-guided matting problem. In Chapter 5, we first present our effort to build a general classifier module. Inspired by the recent success of vision-language foundation models (e.g., CLIP), we investigate how to use these foundation models as a generic knowledge base for vision tasks. We then connect all the modules to build a universal scene understanding model. Instantiating the model in different configurations yields interesting and novel recognition tasks: panoptic soft segmentation, mask-guided video soft segmentation, and open-vocabulary instance soft segmentation.
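The modular design described in the abstract (segmenter, then refiner, then classifier) can be illustrated as a composition of three interchangeable functions. The following is a minimal, hypothetical Python sketch, not the thesis's implementation: `Region`, `toy_segmenter`, `toy_refiner`, `classify`, `scene_understanding`, and the hand-written concept embeddings are all illustrative assumptions. The segmenter thresholds pixel intensity instead of running a learned network; the refiner solves the standard matting compositing model I = αF + (1 − α)B with a single foreground color F and background color B, as a crude stand-in for mask-guided matting; and the classifier mimics CLIP-style open-vocabulary matching by choosing the concept whose (here hand-made, not CLIP-produced) embedding has the highest cosine similarity with the region feature.

```python
import math
from dataclasses import dataclass
from typing import Callable, Dict, List

Image = List[List[float]]  # H x W grayscale intensities in [0, 1]
Mask = List[List[float]]   # H x W mask: hard (0/1) or soft (alpha in [0, 1])


@dataclass
class Region:
    mask: Mask
    embedding: List[float]  # hypothetical region feature vector
    label: str = ""


def region_embedding(image: Image, mask: Mask) -> List[float]:
    """Hypothetical 2-D feature: mask-weighted mean brightness and its complement."""
    pairs = [(m, p) for mr, pr in zip(mask, image) for m, p in zip(mr, pr)]
    total = sum(m for m, _ in pairs)
    if total == 0:
        return [0.0, 0.0]
    mean = sum(m * p for m, p in pairs) / total
    return [mean, 1.0 - mean]


def toy_segmenter(image: Image, threshold: float = 0.5) -> List[Region]:
    """Segmenter stand-in: coarse hard masks from a global intensity threshold."""
    fg = [[1.0 if p > threshold else 0.0 for p in row] for row in image]
    bg = [[1.0 - m for m in row] for row in fg]
    return [Region(m, region_embedding(image, m))
            for m in (fg, bg) if any(v > 0 for row in m for v in row)]


def toy_refiner(image: Image, region: Region) -> Region:
    """Refiner stand-in based on the compositing model I = a*F + (1 - a)*B.

    One foreground color F and one background color B are estimated from the
    coarse mask; the per-pixel alpha a = (I - B) / (F - B) is clamped to [0, 1].
    """
    pairs = [(m, p) for mr, pr in zip(region.mask, image) for m, p in zip(mr, pr)]
    w_in = sum(m for m, _ in pairs)
    w_out = sum(1.0 - m for m, _ in pairs)
    if w_in == 0 or w_out == 0:
        return region  # nothing inside or outside the mask to contrast with
    f = sum(m * p for m, p in pairs) / w_in
    b = sum((1.0 - m) * p for m, p in pairs) / w_out
    if abs(f - b) < 1e-8:
        return region  # foreground and background colors are indistinguishable
    soft = [[min(1.0, max(0.0, (p - b) / (f - b))) for p in row] for row in image]
    return Region(soft, region_embedding(image, soft))


def cosine(a: List[float], b: List[float]) -> float:
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    if na == 0 or nb == 0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def classify(regions: List[Region], concepts: Dict[str, List[float]]) -> List[Region]:
    """Classifier stand-in: CLIP-style matching of region features to concept embeddings."""
    for r in regions:
        r.label = max(concepts, key=lambda name: cosine(r.embedding, concepts[name]))
    return regions


def scene_understanding(
    image: Image,
    concepts: Dict[str, List[float]],
    segmenter: Callable[[Image], List[Region]] = toy_segmenter,
    refiner: Callable[[Image, Region], Region] = toy_refiner,
) -> List[Region]:
    coarse = segmenter(image)                      # 1. coarse region proposals
    refined = [refiner(image, r) for r in coarse]  # 2. soft, detail-preserving masks
    return classify(refined, concepts)             # 3. names from user-given concepts


if __name__ == "__main__":
    image = [
        [0.1, 0.1, 0.1, 0.1],
        [0.1, 0.9, 0.9, 0.1],
        [0.1, 0.9, 0.5, 0.1],  # 0.5 is a "mixed" pixel the threshold misses
        [0.1, 0.1, 0.1, 0.1],
    ]
    # Hand-written stand-ins for text embeddings of user-provided concepts.
    concepts = {"bright object": [1.0, 0.0], "dark background": [0.0, 1.0]}
    for region in scene_understanding(image, concepts):
        print(region.label, round(region.mask[2][2], 2))
    # bright object 0.48
    # dark background 0.52
```

Because each module is passed in as a function, any one of them can be swapped (for example, a learned video segmenter in place of the threshold) without touching the others. In the toy run, the refiner assigns the mixed pixel a partial alpha in both regions, which the hard threshold alone cannot express.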
Advisors
In So Kweon (권인소)
Description
Korea Advanced Institute of Science and Technology (KAIST): School of Electrical Engineering
Publisher
Korea Advanced Institute of Science and Technology (KAIST)
Issue Date
2023
Identifier
325007
Language
eng
Description

Dissertation (Ph.D.) - Korea Advanced Institute of Science and Technology (KAIST): School of Electrical Engineering, 2023.8, [vi, 76 p.]

Keywords

Universal scene understanding; Generalization; Data hungry

URI
http://hdl.handle.net/10203/320930
Link
http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=1047048&flag=dissertation
Appears in Collection
EE-Theses_Ph.D. (Doctoral dissertations)
Files in This Item
There are no files associated with this item.
