Towards universal visual scene understanding in the wild
(강인한 딥러닝 기반 범용적 장면 이해: Robust deep-learning-based universal scene understanding)

DC Field: Value
dc.contributor.advisor: 권인소 (Kweon, In So)
dc.contributor.author: Park, Kwanyong
dc.contributor.author: 박관용
dc.date.accessioned: 2024-07-26T19:30:49Z
dc.date.available: 2024-07-26T19:30:49Z
dc.date.issued: 2023
dc.identifier.uri: http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=1047048&flag=dissertation
dc.identifier.uri: http://hdl.handle.net/10203/320930
dc.description: Thesis (Ph.D.) - Korea Advanced Institute of Science and Technology (KAIST), School of Electrical Engineering, 2023.8, [vi, 76 p.]
dc.description.abstract: Despite the significant advancements in deep learning technology, sophisticated deep models often struggle to perform effectively in real-world scenarios. The fundamental cause of these failures is a lack of, or bias in, training data. While data scaling is a fundamental and ideal solution, constructing large datasets for numerous target tasks presents practical challenges. In this thesis, we explore practical learning frameworks for training robust deep learning models under data constraints. Specifically, we aim to construct generalized models that are robust to new objects and environments not included in the training data. To achieve this high level of generality, we leverage large-scale pre-existing or easily obtainable data as additional training data. We anticipate that this type of data will effectively generalize and regularize the task knowledge learned from the limited-scale target-task dataset.

Under the above problem definition, we mainly investigate the task of universal scene understanding. The goal of this task is to provide high-quality scene understanding for an input image or video, offering comprehensive descriptions of what the constituent objects are and where they are located. Unlike a traditional recognition model, which understands a scene in a predefined way, a universal scene understanding model flexibly handles different task definitions. Task formats can be defined through predefined dataset-level semantic concepts, arbitrary user-provided semantic concepts, or user interactions. To address this challenging task, we design a universal scene understanding model from a modular perspective, where each module serves a unique function. The model consists of three key modules: the segmenter, the refiner, and the classifier. The segmenter module first generates a set of regions that serve as coarse masks, identifying potential object regions within the scene. The refiner module then refines the segmented regions to preserve the fine details of the scene structure; this module is at the core of a more comprehensive understanding of the scene. Finally, the classifier module assigns class labels to each refined region: it analyzes the content of the regions and assigns appropriate class names, enabling a detailed categorization of objects within the scene.

In Chapter 2, we start with the generalization issue for the image segmenter; specifically, we study the problem in the context of unsupervised domain adaptation. In Chapter 3, we move to the generalization issue for the video segmenter; to build a robust video segmenter, we jointly utilize image and video data and explore how to bridge these distinct data sources, mainly in the semi-supervised video object segmentation task. In Chapter 4, we formulate the refiner module as a problem of mask-guided matting. In Chapter 5, we present our effort to build a general classifier module: inspired by the recent success of vision-language foundation models (e.g., CLIP), we investigate how to utilize these foundation models as a generic knowledge basis for vision tasks. Finally, we connect all the modules to build a universal scene understanding model. The model is instantiated in different configurations, resulting in novel recognition tasks: panoptic soft segmentation, mask-guided video soft segmentation, and open-vocabulary instance soft segmentation.
dc.language: eng
dc.publisher: 한국과학기술원 (KAIST)
dc.subject: 범용적 장면 이해; 일반화; 데이터 부족 문제 (universal scene understanding; generalization; data scarcity)
dc.subject: Universal scene understanding; Generalization; Data hungry
dc.title: Towards universal visual scene understanding in the wild
dc.title.alternative: 강인한 딥러닝 기반 범용적 장면 이해 (Robust deep-learning-based universal scene understanding)
dc.type: Thesis (Ph.D.)
dc.identifier.CNRN: 325007
dc.description.department: 한국과학기술원 (KAIST), 전기및전자공학부 (School of Electrical Engineering)
dc.contributor.alternativeauthor: Kweon, In So
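
The abstract above lays out the model as a three-stage pipeline: the segmenter proposes coarse regions, the refiner sharpens them, and the classifier names them. A minimal sketch of that flow in Python follows; every class name, method signature, and placeholder body is an illustrative assumption, not the interface the thesis actually implements.

# Hypothetical sketch of the segmenter -> refiner -> classifier pipeline
# described in the abstract. Names, signatures, and placeholder logic are
# assumptions for illustration only.
from dataclasses import dataclass
from typing import Any, List

@dataclass
class Region:
    mask: Any        # coarse mask over the image, refined downstream
    label: str = ""  # class name assigned by the classifier

class Segmenter:
    """Proposes coarse candidate object regions (Chapters 2-3 study its
    generalization for images and videos, respectively)."""
    def propose(self, image: Any) -> List[Region]:
        return [Region(mask=image)]  # placeholder: one region per scene

class Refiner:
    """Refines coarse masks to preserve fine scene structure
    (formulated as mask-guided matting in Chapter 4)."""
    def refine(self, image: Any, regions: List[Region]) -> List[Region]:
        return regions  # placeholder: a real refiner sharpens each mask

class Classifier:
    """Assigns a class name to each refined region (Chapter 5 backs this
    with a vision-language foundation model such as CLIP)."""
    def classify(self, image: Any, regions: List[Region]) -> List[Region]:
        for region in regions:
            region.label = "object"  # placeholder: real classifier names regions
        return regions

def understand_scene(image: Any) -> List[Region]:
    """Runs the full pipeline, mirroring the modular design in the abstract."""
    regions = Segmenter().propose(image)
    regions = Refiner().refine(image, regions)
    return Classifier().classify(image, regions)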
Appears in Collection: EE-Theses_Ph.D. (doctoral theses)