Reducing human supervision in supervised learning = Learning object recognition through weak supervision

The ability of deep networks to scale up model complexity allows deep learning to successfully tackle challenging visual tasks in computer vision. As complexity increases, however, training such models requires large amounts of labeled data, which entails costly human annotation. This reliance on expensive and error-prone annotation limits our ability to build models for domains where labels are particularly expensive to obtain. We consider object localization and representation learning (supervised pretraining) as two sub-areas of computer vision that rely heavily on human annotation. The first, object localization, requires bounding-box or pixel-level annotations, which are time-consuming and costly. The second, representation learning, is typically performed on collections of millions of annotated images. A significant number of methods have been proposed to tackle this data problem. Among them, we consider weakly-supervised learning and self-supervised learning, which are promising research streams in object localization and representation learning, respectively. The former trains on image-level labels alone, removing the burden of bounding-box or pixel-level annotation. The latter manufactures a supervised task from raw images, avoiding the need for large-scale labeled data and instead exploiting an effectively unlimited amount of unlabeled data. Although such methods mitigate the burden of human annotation, they show limited performance compared to their fully-supervised counterparts. In this thesis, we identify and solve the main problems in existing methods for weakly-supervised object localization and self-supervised representation learning.

First, weakly-supervised object localization predicts the location and extent of objects using only image-level labels rather than bounding-box or pixel-level annotations. The technique has an inherent weakness, however: it often fails to capture the full extent of objects, because image-level supervision encourages a network to focus only on the most discriminative parts of an image. We tackle this issue by proposing two-phase learning, based on the insight that if we retrain the network while covering the most discriminative parts, it will highlight other important parts. We then merge the heat maps of the first and second networks. We demonstrate that the two networks learn complementary representations and therefore predict the extent of objects more accurately. In addition, we apply our learning scheme to existing state-of-the-art one-phase baselines in semantic segmentation and object saliency detection, and achieve significant improvements on the challenging PASCAL VOC dataset. A minimal sketch of this erase-and-merge procedure is given below.
Second, self-supervised representation learning refers to unsupervised pretraining that learns useful priors for downstream training, manufacturing the supervisory signal automatically from raw data rather than from human labels. However, the representations learned by existing self-supervised methods are often task-specific and show limited task-generality. To learn more robust and general-purpose representations, we propose a strategy in which we apply diverse kinds of damage to the input data and train the network to recover from them. We are motivated by the idea that learning to recover from more varied damage encourages the network to build a richer and higher-level understanding of the data than learning a single, uniform damage-and-recovery task.

To implement this idea, we begin by complicating existing single-task baselines: jigsaw puzzles, inpainting, and colorization. We show that complicating the self-supervised tasks leads to significant progress in closing the gap between supervised and unsupervised pretraining. To close this gap further, we unify these complicated versions into our final task, "completing damaged jigsaw puzzles"; a sketch of the combined corruption appears below. We demonstrate that the learned representations transfer well to high-level target tasks. Among self-supervised learning methods, we achieve state-of-the-art scores on PASCAL VOC classification and semantic segmentation. In addition, we qualitatively show that our learned representations are more robust and task-general than those learned by the single-task baselines. The long-term goal of our research is to leverage the abundance of cheaply or freely labeled data. If these techniques continue to improve, they may one day supplant supervised learning methods. This thesis provides a significant step toward that goal.
Advisors
Kweon, In So (권인소)
Description
Korea Advanced Institute of Science and Technology (KAIST): School of Electrical Engineering
Publisher
Korea Advanced Institute of Science and Technology (KAIST)
Issue Date
2018
Identifier
325007
Language
eng
Description

Thesis (Master's) - Korea Advanced Institute of Science and Technology (KAIST): School of Electrical Engineering, 2018.2, [v, 51 p.]

Keywords

Weakly-supervised learning; object localization; semantic segmentation; self-supervised learning; representation learning

URI
http://hdl.handle.net/10203/266704
Link
http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=733986&flag=dissertation
Appears in Collection
EE-Theses_Master (Master's Theses)
Files in This Item
There are no files associated with this item.
