DSpace at KOASAS: On the Training Instability of Shuffling SGD with Batch Normalization

On the Training Instability of Shuffling SGD with Batch Normalization

Wu, David X / Yun, Chulhee / Sra, Suvrit

We uncover how SGD interacts with batch normalization and how it can exhibit undesirable training dynamics such as divergence. More precisely, we study how Single Shuffle (SS) and Random Reshuffle (RR), two widely used variants of SGD, interact surprisingly differently with batch normalization: RR leads to a much more stable evolution of the training loss than SS. As a concrete example, for regression using a linear network with batch-normalized inputs, we prove that SS and RR converge to distinct global optima that are "distorted" away from the gradient descent solution. Thereafter, for classification, we characterize the conditions under which training divergence for SS and RR can and cannot occur. We present explicit constructions showing how SS leads to distorted optima in regression and to divergence in classification, whereas RR avoids both distortion and divergence. We validate our results empirically in realistic settings and conclude that the separation between SS and RR when used with batch normalization is relevant in practice.
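
To make the SS/RR distinction concrete, here is a minimal sketch in pure Python of the setting the abstract describes: a linear model trained with minibatch SGD on batch-normalized inputs. It is an illustration under our own assumptions, not the authors' construction, and it does not reproduce their results. The function names (`batch_norm`, `epoch_batches`, `train`), the squared loss, the bias term, and all hyperparameters are hypothetical choices for this sketch. The key mechanical point is that BN statistics come from whichever points share a minibatch. Under SS, every point is normalized with the same batch-mates in every epoch. Under RR, the batch-mates change from epoch to epoch.

```python
import math
import random
from statistics import fmean, pvariance


def batch_norm(batch, eps=1e-5):
    """Normalize each feature of a minibatch using that minibatch's own
    mean and variance (no learnable scale/shift in this sketch)."""
    dims = len(batch[0])
    cols = [[x[j] for x in batch] for j in range(dims)]
    means = [fmean(c) for c in cols]
    stds = [math.sqrt(pvariance(c) + eps) for c in cols]
    return [[(x[j] - means[j]) / stds[j] for j in range(dims)] for x in batch]


def epoch_batches(n, batch_size, mode, rng, fixed_perm):
    """Index batches for one epoch.

    SS: reuse the single permutation drawn before training.
    RR: draw a fresh permutation at the start of every epoch.
    GD: one batch holding the whole dataset (full-batch reference).
    """
    if mode == "SS":
        perm = fixed_perm
    elif mode == "RR":
        perm = list(range(n))
        rng.shuffle(perm)
    elif mode == "GD":
        return [list(range(n))]
    else:
        raise ValueError(f"unknown mode: {mode}")
    return [perm[i:i + batch_size] for i in range(0, n, batch_size)]


def train(X, y, mode, epochs=50, batch_size=4, lr=0.05, seed=0):
    """Minibatch SGD on f(x) = w . BN(x) + b with squared loss.

    BN is applied to the inputs only, so it has no trainable parameters;
    its output depends on which points share a minibatch.
    Returns (w, b, schedule); schedule records each epoch's batches.
    """
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    fixed_perm = list(range(n))
    rng.shuffle(fixed_perm)  # drawn once; only SS uses it
    w, b = [0.0] * d, 0.0
    schedule = []
    for _ in range(epochs):
        batches = epoch_batches(n, batch_size, mode, rng, fixed_perm)
        schedule.append(tuple(tuple(idx) for idx in batches))
        for idx in batches:
            z = batch_norm([X[i] for i in idx])
            resid = [sum(wj * zj for wj, zj in zip(w, zi)) + b - y[i]
                     for zi, i in zip(z, idx)]
            m = len(idx)
            grad_w = [2.0 / m * sum(r * zi[j] for r, zi in zip(resid, z))
                      for j in range(d)]
            grad_b = 2.0 / m * sum(resid)
            w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
            b -= lr * grad_b
    return w, b, schedule


if __name__ == "__main__":
    data_rng = random.Random(42)
    X = [[data_rng.gauss(0, 1), data_rng.gauss(0, 1)] for _ in range(12)]
    y = [1.5 * x[0] - 0.5 * x[1] + 2.0 for x in X]
    for mode in ("GD", "SS", "RR"):
        w, b, _ = train(X, y, mode)
        print(mode, [round(v, 3) for v in w], round(b, 3))
```

In this sketch, `schedule` shows the difference directly. For SS it contains one repeated partition. For RR it varies across epochs. Comparing the weights printed for GD, SS, and RR on a toy dataset is one informal way to explore the "distortion" the abstract refers to. Any such toy run is only suggestive and is no substitute for the paper's analysis.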

Publisher: International Conference on Machine Learning

Issue Date: 2023-07-26

Language: English

Citation: 40th International Conference on Machine Learning, ICML 2023, pp. 37787-37845

ISSN: 2640-3498

URI: http://hdl.handle.net/10203/316021

Appears in Collection: AI-Conference Papers
