An empirical study on DNN inference: Adaptive degree assignation schema based on pruning sensitivity model

The need to reduce the computational resources consumed by deep learning applications has motivated researchers to introduce several techniques that modify pre-trained DNNs. Pruning is one such technique, and it is particularly suitable for modifying models that will be executed on GPUs and common DL frameworks. Since the introduction of pruning, a great number of pruning approaches have been developed, each better than the previous at reducing inference computation cost while losing a smaller percentage of the original model's accuracy. However, not much has been said about the degree to which the layers of a DNN should be pruned, a particularly tricky task since it has been shown that different layers exhibit distinct sensitivities to pruning. The aim of this work is to study, experiment with, and improve the means by which a proper pruning degree can be determined. To this end, we introduce a mathematical model of pruning sensitivity, as well as a schema for generating pruning degree assignments specific to each of the model's layers based on their pruning sensitivity characteristics. We then explore the usefulness of our proposed schema on a variety of models and datasets, thus providing a holistic view of the potential benefits of pruning. We performed a side-by-side comparison between models pruned according to our schema and models pruned according to literature-based pruning degree assignments. In terms of performance and accuracy, our approach allowed us to prune models with less than 1% accuracy drop compared with those pruned according to the literature, while achieving 17% to 22% more compute cost reduction. Additionally, we pruned a variety of models and achieved more than 30% compute cost reduction while losing no more than 2.3% accuracy in the majority of cases. These benefits were obtained thanks to the mathematical basis used for modeling the pruning sensitivity of DNN layers. This model allowed us to remove considerably more weights from robust layers while preserving more weights in the sensitive ones, thus achieving a good compromise between inference performance and accuracy.
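The abstract describes assigning a per-layer pruning degree from each layer's measured sensitivity. As a rough illustration only, not the thesis's actual mathematical model, the sketch below estimates each layer's solo sensitivity by L1-pruning it at several candidate degrees and recording the accuracy drop, then assigns each layer the largest degree whose drop stays within a budget. The names `model`, `evaluate`, the degree grid, and the `max_drop` budget are all hypothetical placeholders.

```python
# Minimal sketch of per-layer pruning sensitivity analysis (PyTorch).
# Assumptions: `model` is an nn.Module, `evaluate(model)` returns accuracy
# on a validation set; both are supplied by the caller.
import copy

import torch.nn as nn
import torch.nn.utils.prune as prune


def layer_sensitivity(model, evaluate, degrees=(0.1, 0.3, 0.5, 0.7, 0.9)):
    """Measure accuracy drop per layer when it alone is pruned at each degree."""
    baseline = evaluate(model)
    curves = {}
    for name, module in model.named_modules():
        if not isinstance(module, (nn.Conv2d, nn.Linear)):
            continue
        curves[name] = {}
        for d in degrees:
            # Prune a copy so each trial starts from the original weights.
            trial = copy.deepcopy(model)
            target = dict(trial.named_modules())[name]
            prune.l1_unstructured(target, name="weight", amount=d)
            curves[name][d] = baseline - evaluate(trial)
    return curves


def assign_degrees(curves, max_drop=0.01):
    """Pick, per layer, the largest degree whose solo accuracy drop fits
    the budget: robust layers get high degrees, sensitive layers keep
    most of their weights."""
    return {
        name: max((d for d, drop in curve.items() if drop <= max_drop),
                  default=0.0)
        for name, curve in curves.items()
    }
```

The budget rule here mirrors the trade-off the abstract describes, pruning robust layers aggressively while preserving sensitive ones, but the actual schema in the thesis is derived from its pruning sensitivity model rather than this simple thresholding.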
Advisors
Youn, Chan-Hyun (윤찬현)
Description
KAIST, School of Electrical Engineering
Publisher
Korea Advanced Institute of Science and Technology (KAIST)
Issue Date
2018
Identifier
325007
Language
eng
Description

Master's thesis - KAIST: School of Electrical Engineering, 2018.2, [iii, 48 p.]

Keywords

Inference; Inference Serving; Neural Network Pruning; Pruning Criteria; Pruning Sensitivity; Pruning Degree

URI
http://hdl.handle.net/10203/266873
Link
http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=733975&flag=dissertation
Appears in Collection
EE-Theses_Master (Master's Theses)
Files in This Item
There are no files associated with this item.
