Learning to generate semantic layouts for higher text-image correspondence in text-to-image synthesis
(Korean title, translated: A technique for jointly generating semantic segmentation maps for higher text fidelity in text-based image generation)

Existing text-to-image generation approaches have set high standards for photorealism and text-image correspondence, largely benefiting from web-scale text-image datasets, which can include up to 5 billion pairs. However, text-to-image generation models trained on domain-specific datasets, such as urban scenes, medical images, and faces, still suffer from low text-image correspondence due to the lack of text-image pairs. Moreover, collecting billions of text-image pairs for a specific domain is time-consuming and costly. Thus, ensuring high text-image correspondence without relying on web-scale text-image datasets remains a challenging task. In this paper, we present a novel approach for enhancing text-image correspondence by leveraging available semantic layouts. Specifically, we propose a Gaussian-categorical diffusion process that jointly generates images and their corresponding semantic layouts. Our experiments reveal that training the model to generate a semantic label for each pixel guides text-to-image generation models to be aware of the semantics of different image regions. We demonstrate that our approach achieves higher text-image correspondence than existing text-to-image generation approaches on the Multi-Modal CelebA-HQ and Cityscapes datasets, where text-image pairs are scarce.
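As a rough illustration of what a joint Gaussian-categorical forward (noising) step could look like, the sketch below noises an image with a standard DDPM-style Gaussian step and a per-pixel label map with a multinomial-diffusion-style categorical step. This is not taken from the thesis: the function names, the shared cumulative noise level `alpha_bar_t`, and the uniform resampling of labels are all assumptions made for illustration, and the actual formulation in the thesis may differ.

```python
import numpy as np


def gaussian_forward(x0, alpha_bar_t, rng):
    """DDPM-style noising: x_t = sqrt(a) * x0 + sqrt(1 - a) * eps."""
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bar_t) * x0 + np.sqrt(1.0 - alpha_bar_t) * eps


def categorical_forward(labels, num_classes, alpha_bar_t, rng):
    """Categorical noising: each pixel keeps its label with probability
    alpha_bar_t, otherwise it is resampled uniformly over all classes."""
    keep = rng.random(labels.shape) < alpha_bar_t
    uniform = rng.integers(0, num_classes, size=labels.shape)
    return np.where(keep, labels, uniform)


def joint_forward(image, layout, num_classes, alpha_bar_t, seed=0):
    """Hypothetical joint step: noise the image and its layout at the
    same noise level (sharing one schedule is an assumption)."""
    rng = np.random.default_rng(seed)
    noisy_image = gaussian_forward(image, alpha_bar_t, rng)
    noisy_layout = categorical_forward(layout, num_classes, alpha_bar_t, rng)
    return noisy_image, noisy_layout


# Toy example: a 4x4 RGB image and a 4x4 label map with 3 classes.
image = np.zeros((4, 4, 3))
layout = np.arange(16).reshape(4, 4) % 3
noisy_image, noisy_layout = joint_forward(image, layout, num_classes=3, alpha_bar_t=0.5)
```

In a setup like this, a single denoising network would be trained to recover both the clean image and the clean per-pixel labels from the noisy pair, which is one way the layout can act as an additional supervisory signal.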
Advisors
주재걸 (Jaegul Choo)
Description
Korea Advanced Institute of Science and Technology (KAIST): Kim Jaechul Graduate School of AI
Publisher
Korea Advanced Institute of Science and Technology (KAIST)
Issue Date
2024
Identifier
325007
Language
eng
Description

Thesis (Master's) - Korea Advanced Institute of Science and Technology (KAIST): Kim Jaechul Graduate School of AI, 2024.2, [v, 35 p.]

Keywords

Text-to-image generation; Generative model; Diffusion process

URI
http://hdl.handle.net/10203/321380
Link
http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=1096085&flag=dissertation
Appears in Collection
AI-Theses_Master (Master's theses)
Files in This Item
There are no files associated with this item.
