Generalization of deep neural networks via discovering flatter loss surfaces = 편평도가 더 높은 손실 평면을 발견함을 통한 딥뉴럴네트워크의 일반화

Achieving generalization is one of the core problems in deep neural networks (DNNs). DNNs have an extremely large number of parameters, which results in high model complexity. As a result, DNNs can fit any well-conditioned training problem, but the high model complexity leaves the solution underdetermined: a DNN admits too many solutions for the target training problem. To reduce the solution space of this underdetermined system, numerous regularization concepts have been proposed. In this work, flat minima theory is adopted as a constraint on the optimization problem. The concept of flat minima was first described in [19, 18]. In this work, we give more concrete theoretical explanations of why flat minima work better. A classic view of generalization describes it as robustness of the output with respect to input perturbations. We analyze the flatness of loss surfaces through the lens of robustness to input perturbations and argue that gradient descent should be guided toward flatter regions of the loss surface to achieve generalization. In doing so, we show the relation between learning rate and generalization. Furthermore, we develop a method that discovers flatter minima to improve the optimization of DNNs. Although optimizing deep neural networks with stochastic gradient descent has shown great performance in practice, rules for setting the step size (i.e., the learning rate) of gradient descent are not well studied. Learning rate rules such as Adam [26] have since been developed, but they concentrate on improving convergence rather than generalization capability. Recently, the improved generalization property of flat minima has been revisited, and this line of research points toward promising solutions to many current optimization problems. We suggest a learning rate rule for escaping sharp regions of loss surfaces and propose a learning rate scheduling concept called the peak learning stage.
Based on the peak learning stage, we propose an adaptive per-parameter version of learning rate scheduling called Adapeak. Finally, we demonstrate the capacity of our approach through numerous experiments. To verify our theories experimentally, we perform perturbation analyses on both the input space and the weight space. DNNs are extremely high-dimensional models, so it is hard to observe the flatness of their weight space directly. Therefore, we evaluate subspaces of the high-dimensional loss surface and propose effective methods for selecting such subspaces to estimate the generalization capability of a DNN model.
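The link between learning rate and sharpness mentioned in the abstract can be illustrated with a toy example. The sketch below is not the thesis's Adapeak method; it assumes a one-dimensional quadratic loss L(w) = 0.5 * c * w^2, where the curvature c stands in for sharpness, and all function names are hypothetical. For this loss, plain gradient descent converges only when the learning rate is below 2 / c, so a fixed learning rate is stable in a flat minimum but is pushed out of a sufficiently sharp one.

```python
def gd_final_distance(curvature, lr, w0=1.0, steps=50):
    """Run gradient descent on L(w) = 0.5 * curvature * w**2.

    Returns the distance |w| from the minimum (at w = 0) after `steps` updates.
    """
    w = w0
    for _ in range(steps):
        grad = curvature * w      # dL/dw for the quadratic loss
        w = w - lr * grad         # plain gradient descent step
    return abs(w)


def perturbation_loss_increase(curvature, eps):
    """Loss increase when the weight at the minimum is perturbed by eps."""
    return 0.5 * curvature * eps ** 2


lr = 0.1  # for this toy loss, the stability threshold is curvature < 2 / lr = 20

# Flat minimum (curvature 1): the iterate shrinks toward the minimum.
flat_distance = gd_final_distance(curvature=1.0, lr=lr)

# Sharp minimum (curvature 30): the same learning rate makes the iterate diverge,
# i.e. gradient descent escapes the sharp region.
sharp_distance = gd_final_distance(curvature=30.0, lr=lr)

# The same weight perturbation costs more loss in the sharp minimum.
flat_increase = perturbation_loss_increase(curvature=1.0, eps=0.1)
sharp_increase = perturbation_loss_increase(curvature=30.0, eps=0.1)

print("flat minimum, distance after GD:", flat_distance)
print("sharp minimum, distance after GD:", sharp_distance)
print("loss increase under perturbation (flat, sharp):", flat_increase, sharp_increase)
```

In this simplified setting, raising the learning rate lowers the curvature threshold above which a minimum becomes unstable, which is one way to read the idea of using the learning rate to steer optimization toward flatter regions.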
Kim, Junmo (김준모)
Korea Advanced Institute of Science and Technology (KAIST): School of Electrical Engineering
Issue Date

Doctoral thesis (Ph.D.) - Korea Advanced Institute of Science and Technology (KAIST): School of Electrical Engineering, 2019.2, [v, 60 p.]


Deep learning; learning rate; generalization; loss surfaces
