A Hybrid Instance Selection Using Nearest-Neighbor Framework for Cross-Project Defect Prediction with Consideration of Class Imbalance

Software defect prediction can help allocate testing resources to fault-prone modules. Typically, a company's own local data are used to build classifiers. Unlike such Within-Project Defect Prediction (WPDP), some projects, e.g., pilot projects, lack past data. In these cases, Cross-Project Defect Prediction (CPDP), which uses data from other projects, can be useful. The major challenge of CPDP is that the training (source) and test (target) data follow different distributions. To tackle this, source instances similar to the target data are selected to build classifiers. Software defect datasets also suffer from class imbalance: the ratio of defective to clean instances is very low, which usually lowers classifier performance. Irrelevant or redundant features can degrade prediction performance as well. To address all of these issues, we propose the Hybrid Instance Selection using Nearest-Neighbor (HISNN) framework. HISNN performs a hybrid classification that selectively learns local knowledge (via k-Nearest Neighbor) and global knowledge (via naive Bayes). Instances with strong local knowledge are identified as those whose nearest neighbors share the same class label. To identify the most suitable feature selection technique, we compare nine feature selection techniques in cross-project settings. After features are chosen, classifiers are built, tested, and evaluated using a statistical significance test and an effect size test. The results show that the predictive performance of HISNN is comparable to that of WPDP. Using HISNN, companies without local data can predict defects with high performance until sufficient local data are collected, so software quality can be managed effectively.
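The hybrid idea in the abstract can be illustrated with a small, simplified sketch. It assumes Euclidean distance on numeric metrics, a source filter that keeps only source instances among the k nearest neighbors of some target instance, and a Gaussian naive Bayes fallback. All names (`select_source`, `hybrid_predict`, `SimpleGaussianNB`) are hypothetical, and the thesis's actual HISNN steps (outlier removal, feature selection, parameter choices) are not reproduced here.

```python
import math
from collections import Counter

def euclidean(a, b):
    # Straight-line distance between two numeric feature vectors.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def k_nearest(point, data, k):
    # data is a list of (features, label) pairs; return the k closest pairs.
    return sorted(data, key=lambda d: euclidean(point, d[0]))[:k]

class SimpleGaussianNB:
    """Hypothetical stdlib-only Gaussian naive Bayes (the 'global knowledge' learner)."""

    def fit(self, data):
        by_class = {}
        for x, y in data:
            by_class.setdefault(y, []).append(x)
        n = len(data)
        self.stats = {}
        for y, rows in by_class.items():
            cols = list(zip(*rows))
            means = [sum(c) / len(c) for c in cols]
            # Small epsilon keeps the variance positive for constant features.
            variances = [sum((v - m) ** 2 for v in c) / len(c) + 1e-9
                         for c, m in zip(cols, means)]
            self.stats[y] = (math.log(len(rows) / n), means, variances)
        return self

    def predict(self, x):
        def log_posterior(y):
            prior, means, variances = self.stats[y]
            return prior + sum(-0.5 * math.log(2 * math.pi * v) - (xi - m) ** 2 / (2 * v)
                               for xi, m, v in zip(x, means, variances))
        return max(self.stats, key=log_posterior)

def select_source(source, target, k):
    # Keep source instances that are among the k nearest neighbors of any target instance.
    selected = []
    for t in target:
        for inst in k_nearest(t, source, k):
            if inst not in selected:
                selected.append(inst)
    return selected

def hybrid_predict(train, x, k, nb):
    # If all k neighbors share one label (strong local knowledge), trust kNN;
    # otherwise fall back to the naive Bayes model (global knowledge).
    neighbours = k_nearest(x, train, k)
    labels = Counter(y for _, y in neighbours)
    if len(labels) == 1:
        return neighbours[0][1]
    return nb.predict(x)
```

In this sketch, a target instance falls back to naive Bayes only when its neighborhood is mixed, which is where a purely local kNN vote is least reliable under class imbalance.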
Advisors
Baik, Jongmoon
Description
Korea Advanced Institute of Science and Technology (KAIST): School of Computing
Publisher
Korea Advanced Institute of Science and Technology (KAIST)
Issue Date
2016
Identifier
325007
Language
eng
Description

Thesis (Ph.D.) - Korea Advanced Institute of Science and Technology (KAIST): School of Computing, 2016.8, [vii, 101 p.]

Keywords

Cross-Project Defect Prediction; Class Imbalance; Feature Selection; Instance-based Learning; Selective Learning

URI
http://hdl.handle.net/10203/222403
Link
http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=663204&flag=dissertation
Appears in Collection
CS-Theses_Ph.D. (Doctoral Theses)
Files in This Item
There are no files associated with this item.
