Development and application of efficient data pruning techniques in deep learning (딥러닝을 위한 효과적인 데이터 프루닝 기법의 개발과 적용)

Recent advances in deep learning have driven innovation across many fields. However, as the field has moved toward more data and larger models to improve performance, the required computational cost has grown exponentially. Efficient learning techniques, and data pruning in particular, are therefore becoming increasingly important. Existing data pruning methods nevertheless have two key limitations: they need to train on the entire dataset in order to select data, and each method's performance varies with the data selection ratio. This research proposes methods that address both limitations. For the first, we propose the 'CG-score' (Complexity Gap score), which characterizes data without any training, and demonstrate that data selection based on this score performs comparably to existing methods. Using the Neural Tangent Kernel, which mathematically approximates the learning process without directly training a deep learning model, we quantify the characteristics of the data from the training data alone. For the second, we propose 'BWS' (Best Window Selection), which sorts data by difficulty score and adjusts the selection range according to the selection ratio. We theoretically verify that changing the selection region with the ratio enables optimal data selection, and empirically confirm that this approach outperforms existing methods across all selection ratios.
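The window idea described in the abstract (sort samples by difficulty score, then keep a contiguous range whose size follows the selection ratio) can be illustrated with a minimal sketch. This is not the thesis's implementation: the function name `select_window`, the `start_frac` parameter, and the convention that a higher score means a harder sample are all assumptions made for illustration, and the rule for choosing the window position at each ratio is deliberately left to the caller because the abstract does not specify it.

```python
from typing import List, Sequence


def select_window(scores: Sequence[float], ratio: float, start_frac: float) -> List[int]:
    """Return indices of a contiguous window of samples, ordered hardest first.

    scores     -- one difficulty score per sample (assumed: higher = harder)
    ratio      -- fraction of the dataset to keep, in (0, 1]
    start_frac -- hypothetical knob: window start position in the sorted
                  order, as a fraction of the dataset (0.0 = hardest sample)
    """
    if not 0.0 < ratio <= 1.0:
        raise ValueError("ratio must be in (0, 1]")
    n = len(scores)
    if n == 0:
        return []
    # Window size follows the selection ratio; keep at least one sample.
    size = max(1, round(ratio * n))
    # Sort sample indices from hardest to easiest.
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    # Clamp the start so the window always fits inside the dataset.
    start = min(max(0, round(start_frac * n)), n - size)
    return order[start:start + size]


# Toy scores for ten samples (made-up values, for illustration only).
toy_scores = [0.9, 0.1, 0.5, 0.7, 0.3, 0.2, 0.8, 0.4, 0.6, 0.0]
hardest_half = select_window(toy_scores, ratio=0.5, start_frac=0.0)
easiest_half = select_window(toy_scores, ratio=0.5, start_frac=0.5)
```

In this sketch, choosing `start_frac` for each ratio (for example, by comparing a few candidate windows on held-out data) is one possible way to "adjust the selection range"; the thesis itself should be consulted for the actual selection rule.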
Advisors
정혜원
Description
Korea Advanced Institute of Science and Technology (KAIST): School of Electrical Engineering
Publisher
Korea Advanced Institute of Science and Technology (KAIST)
Issue Date
2024
Identifier
325007
Language
eng
Description

Doctoral thesis (Ph.D.) - Korea Advanced Institute of Science and Technology (KAIST): School of Electrical Engineering, 2024.2, [iv, 68 p.]

Keywords

Deep learning; Efficient learning; Data pruning; Data subset selection; Neural tangent kernel

URI
http://hdl.handle.net/10203/322146
Link
http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=1100046&flag=dissertation
Appears in Collection
EE-Theses_Ph.D. (Doctoral theses)
Files in This Item
There are no files associated with this item.
