Disentangling and diversifying text-driven image manipulation via StyleGAN generator

With the advent of multi-modal representation models such as CLIP, text-driven style transfer has recently been in the spotlight because it is more intuitive and directly interpretable than earlier approaches. Nevertheless, little attention has been paid to the unexpected issues that newly emerge with this type of manipulation model. In this paper, we propose a novel approach that bypasses the entanglement of text guidance with integrated cross-modal information. In addition, we raise the issue that state-of-the-art language-guided image manipulation in fact cannot generate sufficiently diverse samples due to the rigidity of text guidance. To address this issue, we propose a simple yet effective method, Cross-Modal embedding Surgery (CosMoS), which automatically searches for the desired set of semantics from image and text, then combines the selected subset to produce source-preserving, multi-styled results. We demonstrate the validity of our method by generating diversified manipulation results in the human and animal face domains.
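The abstract describes CosMoS only at a high level, so the following is a minimal, hypothetical sketch of the "select a subset of cross-modal semantics, then combine" idea it alludes to, assuming a CLIP encoder and a StyleGAN-style latent space. The function names, the candidate_dirs / clip_effects inputs, and the k / alpha parameters are illustrative assumptions, not the thesis's actual implementation.

```python
# Hypothetical sketch of a "select-then-combine" cross-modal edit.
# Assumes: OpenAI CLIP (pip install git+https://github.com/openai/CLIP.git),
# a precomputed bank of StyleGAN latent directions, and their measured
# CLIP-space effects. None of these names come from the thesis itself.
import torch
import clip

device = "cuda" if torch.cuda.is_available() else "cpu"
model, _ = clip.load("ViT-B/32", device=device)

@torch.no_grad()
def text_direction(source_prompt: str, target_prompt: str) -> torch.Tensor:
    """Unit CLIP-space direction from a source description to a target style."""
    tokens = clip.tokenize([source_prompt, target_prompt]).to(device)
    src, tgt = model.encode_text(tokens).float()
    d = tgt / tgt.norm() - src / src.norm()
    return d / d.norm()

@torch.no_grad()
def select_and_combine(w_src, candidate_dirs, clip_effects, t_dir, k=5, alpha=2.0):
    """Keep only the k latent directions whose CLIP-space effect aligns with
    the text direction, then add their weighted sum to the source latent.

    w_src:          source StyleGAN latent code, shape (latent_dim,)
    candidate_dirs: (N, latent_dim) bank of semantic latent directions
                    (e.g., from an unsupervised method such as SeFa)
    clip_effects:   (N, clip_dim) unit CLIP-space change each direction induces,
                    precomputed by editing sample images and re-encoding them
    """
    scores = clip_effects @ t_dir              # alignment with the text edit
    top = scores.topk(k).indices               # the "surgery": select a subset
    step = (scores[top, None] * candidate_dirs[top]).sum(dim=0)
    return w_src + alpha * step                # unrelated attributes stay fixed
```

Varying which subset is selected (k) and how strongly it is applied (alpha) is one plausible way such a method could trade source preservation against multiple distinct styles, matching the diversification goal stated in the abstract.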
Advisors
Yang, Eunho (양은호)
Publisher
Korea Advanced Institute of Science and Technology (KAIST)
Issue Date
2022
Identifier
325007
Language
eng
Description

Thesis (Master's) - Korea Advanced Institute of Science and Technology, Kim Jaechul Graduate School of AI, 2022.8, [iii, 22 p.]

Keywords

Text-guided image manipulation; Computer vision; Multi-modal learning

URI
http://hdl.handle.net/10203/308188
Link
http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=1021028&flag=dissertation
Appears in Collection
AI-Theses_Master (Master's theses)
Files in This Item
There are no files associated with this item.
