Text-conditioned sampling framework for text-to-image generation with masked generative models

Token-based masked generative models are gaining popularity for their fast inference with parallel decoding. While recent token-based approaches achieve performance competitive with diffusion-based models, their generation quality is still suboptimal because they sample multiple tokens simultaneously without considering the dependencies among them. We empirically investigate this problem and propose a learnable sampling model, Text-Conditioned Token Selection (TCTS), which selects optimal tokens via localized supervision with text information. TCTS improves not only image quality but also the semantic alignment of the generated images with the given text. To further improve image quality, we introduce a cohesive sampling strategy, Frequency Adaptive Sampling (FAS), applied to each group of tokens divided according to the self-attention maps. We validate the efficacy of TCTS combined with FAS on various generative tasks, demonstrating that it significantly outperforms the baselines in both image-text alignment and image quality. Our text-conditioned sampling framework further reduces the original inference time by more than 50% without modifying the original generative model.
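For context, the baseline that TCTS improves on is the confidence-based parallel decoding used by MaskGIT-style token generators: at each step the model predicts all masked positions, and a heuristic per-token confidence score decides which predictions to keep. The sketch below illustrates that baseline loop only (the function names, the cosine schedule, and the toy "model" are illustrative assumptions, not the thesis's code); TCTS would replace the `conf` heuristic with a learned, text-conditioned selection score.

```python
import numpy as np

MASK = -1  # hypothetical id for the mask token


def parallel_decode(score_fn, seq_len, steps):
    """MaskGIT-style baseline: iteratively unmask tokens, keeping the
    highest-confidence predictions at each step.  `score_fn` maps the
    current token sequence to per-position probabilities."""
    tokens = np.full(seq_len, MASK)
    for step in range(steps):
        probs = score_fn(tokens)            # shape (seq_len, vocab_size)
        preds = probs.argmax(axis=-1)       # greedy prediction per slot
        conf = probs.max(axis=-1)           # heuristic confidence score
        conf[tokens != MASK] = np.inf       # fixed tokens are never re-masked
        # cosine schedule: fraction of tokens still masked after this step
        ratio = np.cos(np.pi / 2 * (step + 1) / steps)
        n_masked = int(np.floor(seq_len * ratio))
        # re-mask only the n_masked least-confident positions
        keep_masked = np.argsort(conf)[:n_masked]
        new_tokens = preds.copy()
        new_tokens[keep_masked] = MASK
        new_tokens[tokens != MASK] = tokens[tokens != MASK]
        tokens = new_tokens
    return tokens


# toy stand-in for the generative model: fixed per-position distributions
rng = np.random.default_rng(0)
table = rng.random((16, 8))
table /= table.sum(axis=-1, keepdims=True)
out = parallel_decode(lambda t: table, seq_len=16, steps=4)
```

Because each step commits to several tokens at once based only on independent per-token confidences, jointly unlikely combinations can be accepted; this is the dependence problem the abstract refers to, and the motivation for a learned, text-conditioned selector.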
Advisors
Sung Ju Hwang (황성주)
Description
Korea Advanced Institute of Science and Technology (KAIST): Kim Jaechul Graduate School of AI
Publisher
Korea Advanced Institute of Science and Technology (KAIST)
Issue Date
2023
Identifier
325007
Language
eng
Description

Master's thesis - Korea Advanced Institute of Science and Technology: Kim Jaechul Graduate School of AI, 2023.8, [v, 28 p.]

Keywords

Multimodal; Text-to-image generation; Token-based diffusion model

URI
http://hdl.handle.net/10203/320548
Link
http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=1045736&flag=dissertation
Appears in Collection
AI-Theses_Master(석사논문)
Files in This Item
There are no files associated with this item.
