On the hyperparameter tuning of a scale-invariant network and its applications

Modern deep neural networks are equipped with normalization layers, such as batch normalization or layer normalization, to enhance and stabilize training dynamics. When a network contains such normalization layers, the optimization objective is invariant to the scale of the network parameters: the network's output depends only on the direction of the weights, not on their scale. We address hyperparameter tuning and its applications in such scale-invariant neural networks. The first application is hyperparameter tuning in active learning, where the amount of labeled training data grows as learning progresses. Based on an analysis of the relationship between the number of training data and weight decay, we propose a weight decay scheduling method suited to active learning. We also propose a method that distills knowledge from the lower-performing network trained in the previous round and apply it to active learning. We validate our methods on the MNIST, CIFAR-10, and CIFAR-100 datasets using convolutional neural networks of various sizes. Second, we identify a common feature of good hyperparameter combinations for such scale-invariant networks, covering the learning rate, weight decay, number of data samples, and batch size. Our key observation is that hyperparameter setups that lead to good performance exhibit similar degrees of angular update during one epoch. Using a stochastic differential equation, we analyze the angular update and show how each hyperparameter affects it. From this relationship, we derive a simple hyperparameter tuning method and apply it to efficient hyperparameter search.
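
The scale-invariance property and the angular-update quantity described in the abstract can be illustrated with a minimal PyTorch sketch (not taken from the thesis; the toy loss, layer sizes, and optimizer settings below are purely illustrative). Rescaling the weights of a convolution that feeds a batch-normalization layer leaves the output unchanged, and the angular update of those weights after an optimizer step is simply the angle between the weight vectors before and after the step.

    import torch
    import torch.nn as nn

    torch.manual_seed(0)

    # Convolution followed by batch normalization; bias is disabled so the
    # conv weights are the only (scale-invariant) parameters in this block.
    conv = nn.Conv2d(3, 8, kernel_size=3, padding=1, bias=False)
    bn = nn.BatchNorm2d(8)
    bn.train()  # batch statistics absorb any rescaling of the pre-BN activations

    x = torch.randn(16, 3, 32, 32)
    out_ref = bn(conv(x)).detach()

    # Rescale the conv weights by an arbitrary positive constant.
    with torch.no_grad():
        conv.weight.mul_(7.3)
    out_scaled = bn(conv(x)).detach()

    # Only the direction of the weights matters: the outputs coincide
    # up to floating-point error.
    print(torch.allclose(out_ref, out_scaled, atol=1e-4))  # True

    def angular_update(w_old, w_new):
        """Angle (radians) between flattened weight vectors before/after an update."""
        cos = torch.dot(w_old.flatten(), w_new.flatten()) / (w_old.norm() * w_new.norm())
        return torch.arccos(cos.clamp(-1.0, 1.0))

    w_before = conv.weight.detach().clone()

    # Illustrative optimizer settings; the thesis analyzes how the learning rate,
    # weight decay, batch size, and dataset size jointly determine the per-epoch
    # angular update, which this sketch only measures for a single step.
    opt = torch.optim.SGD(
        list(conv.parameters()) + list(bn.parameters()),
        lr=0.1, momentum=0.9, weight_decay=5e-4,
    )
    loss = bn(conv(x)).pow(2).mean()  # stand-in loss for illustration only
    loss.backward()
    opt.step()

    print(angular_update(w_before, conv.weight.detach()))

The sketch only shows how the angular update is measured; the abstract's claim is that good hyperparameter combinations produce similar per-epoch values of this angle, which motivates the proposed tuning method.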
Advisors
Kim, Junmo (김준모)
Publisher
Korea Advanced Institute of Science and Technology (KAIST)
Issue Date
2023
Identifier
325007
Language
eng
Description

Doctoral dissertation (Ph.D.) - Korea Advanced Institute of Science and Technology: School of Electrical Engineering, 2023.2, [iv, 41 p.]

Keywords

Active learning; Weight decay; Knowledge distillation; Scale-invariant network; Normalization; Angular update; Hyperparameter tuning

URI
http://hdl.handle.net/10203/309092
Link
http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=1030543&flag=dissertation
Appears in Collection
EE-Theses_Ph.D. (Doctoral Theses)
Files in This Item
There are no files associated with this item.
