Linear attention is (maybe) all you need (to understand Transformer optimization)

DC Field | Value | Language
dc.contributor.author | Ahn, Kwangjun | ko
dc.contributor.author | Cheng, Xiang | ko
dc.contributor.author | Song, Minhak | ko
dc.contributor.author | Yun, Chulhee | ko
dc.contributor.author | Jadbabaie, Ali | ko
dc.contributor.author | Sra, Suvrit | ko
dc.date.accessioned | 2024-03-13T13:00:15Z | -
dc.date.available | 2024-03-13T13:00:15Z | -
dc.date.created | 2024-03-13 | -
dc.date.issued | 2024-05-07 | -
dc.identifier.citation | 12th International Conference on Learning Representations, ICLR 2024 | -
dc.identifier.uri | http://hdl.handle.net/10203/318546 | -
dc.description.abstract | Transformer training is notoriously difficult, requiring careful design of optimizers and the use of various heuristics. We make progress towards understanding the subtleties of training Transformers by carefully studying a simple yet canonical linearized shallow Transformer model. Specifically, we train linear Transformers to solve regression tasks, inspired by J. von Oswald et al. (ICML 2023) and K. Ahn et al. (NeurIPS 2023). Most importantly, we observe that our proposed linearized models can reproduce several prominent aspects of Transformer training dynamics. Consequently, the results obtained in this paper suggest that a simple linearized Transformer model could actually be a valuable, realistic abstraction for understanding Transformer optimization. | -
dc.language | English | -
dc.publisher | International Conference on Learning Representations (ICLR) | -
dc.title | Linear attention is (maybe) all you need (to understand Transformer optimization) | -
dc.type | Conference | -
dc.type.rims | CONF | -
dc.citation.publicationname | 12th International Conference on Learning Representations, ICLR 2024 | -
dc.identifier.conferencecountry | AU | -
dc.identifier.conferencelocation | Vienna | -
dc.contributor.localauthor | Yun, Chulhee | -
dc.contributor.nonIdAuthor | Ahn, Kwangjun | -
dc.contributor.nonIdAuthor | Cheng, Xiang | -
dc.contributor.nonIdAuthor | Song, Minhak | -
dc.contributor.nonIdAuthor | Jadbabaie, Ali | -
dc.contributor.nonIdAuthor | Sra, Suvrit | -
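
As a rough illustration of the setup the abstract describes, a shallow linear-attention Transformer trained on random linear-regression prompts, here is a minimal sketch in PyTorch. This is not the authors' code: the single-layer architecture, token layout, problem dimensions, and Adam hyperparameters are all illustrative assumptions, written loosely in the spirit of von Oswald et al. (ICML 2023) and Ahn et al. (NeurIPS 2023).

```python
# Minimal sketch (assumed setup, not the paper's implementation): one linear
# self-attention layer trained on randomly sampled linear-regression prompts.
import torch

d, n_ctx, batch, steps = 5, 20, 64, 2000  # assumed problem sizes


class LinearAttention(torch.nn.Module):
    """One linear self-attention layer: no softmax, no MLP, no layer norm."""

    def __init__(self, dim):
        super().__init__()
        self.Wq = torch.nn.Linear(dim, dim, bias=False)
        self.Wk = torch.nn.Linear(dim, dim, bias=False)
        self.Wv = torch.nn.Linear(dim, dim, bias=False)

    def forward(self, Z):                          # Z: (batch, tokens, dim)
        Q, K, V = self.Wq(Z), self.Wk(Z), self.Wv(Z)
        A = Q @ K.transpose(-2, -1) / Z.shape[1]   # linear (softmax-free) attention
        return Z + A @ V                           # residual connection


def sample_prompts(batch, n_ctx, d):
    """Random regression tasks: y_i = <w, x_i> with a fresh w per prompt."""
    w = torch.randn(batch, d, 1)
    x = torch.randn(batch, n_ctx + 1, d)           # last token is the query
    y = (x @ w).squeeze(-1)
    Z = torch.cat([x, y.unsqueeze(-1)], dim=-1)    # tokens are [x_i; y_i]
    Z[:, -1, -1] = 0.0                             # hide the query's label
    return Z, y[:, -1]                             # prompt and target label


model = LinearAttention(d + 1)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

for step in range(steps):
    Z, target = sample_prompts(batch, n_ctx, d)
    pred = model(Z)[:, -1, -1]                     # read prediction off the query token
    loss = ((pred - target) ** 2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
    if step % 500 == 0:
        print(f"step {step:4d}  loss {loss.item():.4f}")
```

The paper studies the optimization dynamics of models of roughly this kind; a deeper stack of such layers and different optimizer settings would be needed to reproduce the specific phenomena it reports.
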
Appears in Collection
AI-Conference Papers (Conference Papers)
Files in This Item
There are no files associated with this item.
