DSpace at KOASAS: Towards universal visual scene understanding in the wild


Towards universal visual scene understanding in the wild (Korean title: 강인한 딥러닝 기반 범용적 장면 이해, "Robust deep learning-based universal scene understanding")


Park, Kwanyong / 박관용

Despite significant advances in deep learning, sophisticated deep models often struggle to perform well in real-world scenarios. The fundamental cause of these failures is insufficient or biased training data. While scaling up data is the ideal solution, constructing large datasets for numerous target tasks poses practical challenges. In this thesis, we explore practical learning frameworks for training robust deep learning models under data constraints. Specifically, we aim to build generalized models that are robust to new objects and environments not included in the training data. To achieve this level of generality, we leverage large-scale data that already exists or is easy to obtain as additional training data. We anticipate that such data will effectively generalize and regularize the task knowledge learned from the limited target-task dataset.
Under this problem definition, we mainly investigate universal scene understanding models. The goal is to provide high-quality scene understanding for an input image or video, offering a comprehensive description of what the constituent objects are and where they are. Unlike a traditional recognition model, which understands a scene in a predefined way, a universal scene understanding model flexibly handles different task definitions. Various task formats can be defined through predefined dataset-level or arbitrary user-provided semantic concepts, and through user interactions. To address this challenging task, we design a universal scene understanding model from a modular perspective, in which each module serves a unique function. The model consists of three key modules: the segmenter, the refiner, and the classifier. The segmenter first generates a set of regions that serve as coarse masks, identifying potential object regions within the scene. The refiner then refines the segmented regions to preserve the fine details of the scene structure; this module is central to a more comprehensive understanding of the scene. Finally, the classifier analyzes the content of each refined region and assigns it an appropriate class name, enabling detailed categorization of the objects within the scene.
In Chapter 2, we start with the generalization issue for the image segmenter, which we study in the context of unsupervised domain adaptation. In Chapter 3, we move to the generalization issue for the video segmenter. To build a robust video segmenter, we jointly use image and video data and explore how to bridge these distinct data sources, mainly in the semi-supervised video object segmentation task. In Chapter 4, we formulate the refiner module as a mask-guided matting problem. In Chapter 5, we first present our effort to build a general classifier module. Inspired by the recent success of vision-language foundation models (e.g., CLIP), we investigate how to use these models as a generic knowledge base for vision tasks. Finally, we connect all the modules to build a universal scene understanding model. Instantiating the model in different configurations yields interesting and novel recognition tasks: panoptic soft segmentation, mask-guided video soft segmentation, and open-vocabulary instance soft segmentation.

Advisors: Kweon, In So / 권인소

Description: Korea Advanced Institute of Science and Technology (KAIST): School of Electrical Engineering

Publisher: Korea Advanced Institute of Science and Technology (KAIST)

Issue Date: 2023

Identifier: 325007

Language: eng

Description: Doctoral thesis (Ph.D.) - Korea Advanced Institute of Science and Technology (KAIST): School of Electrical Engineering, 2023.8, [vi, 76 p.]

Keywords: Universal scene understanding; Generalization; Data hungry

URI: http://hdl.handle.net/10203/320930

Link: http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=1047048&flag=dissertation

Appears in Collection: EE-Theses_Ph.D. (Doctoral Theses)

Files in This Item: There are no files associated with this item.


Knowledge Service Development Team, KAIST 291 Daehak-ro, Yuseong-gu, Daejeon 34141, Republic of Korea. T. 82-42-350-4493 Email. koasas@kaist.ac.kr
Copyright © 2016. Korea Advanced Institute of Science and Technology. All Rights Reserved.
