DSpace at KOASAS: Hybrid imputation of cluster-based k-NN and maximum likelihood estimation in software project data


Hybrid imputation of cluster-based k-NN and maximum likelihood estimation in software project data

DC Field	Value	Language
dc.contributor.advisor	Bae, Doo-Hwan	-
dc.contributor.advisor	배두환	-
dc.contributor.author	Lee, Dong-Ho	-
dc.contributor.author	이동호	-
dc.date.accessioned	2011-12-13T06:07:45Z	-
dc.date.available	2011-12-13T06:07:45Z	-
dc.date.issued	2009	-
dc.identifier.uri	http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=303642&flag=dissertation	-
dc.identifier.uri	http://hdl.handle.net/10203/34840	-
dc.description	Thesis (Master's) - Korea Advanced Institute of Science and Technology: Computer Science major, 2009.2, [ vi, 47 p. ]	-
dc.description.abstract	Missing data is a common problem that software practitioners face when they analyze software project data. In the empirical software engineering community, k-NN and maximum likelihood estimation are known to be effective for software project data. However, each has a limitation when applied alone to software project data: (1) the imputation accuracy of k-NN depends on the homogeneity of the data, and (2) maximum likelihood estimation is ineffective on data sets containing fewer than 100 project instances. To cope with these limitations, hybrid imputation techniques that combine several methods have been developed. However, they can be applied only to software project data with fewer than 100 project instances. In this paper, we propose a hybrid imputation method that uses cluster-based k-NN and maximum likelihood estimation for software project data. Maximum likelihood estimation is applied first, and hierarchical clustering then partitions the software project data into clusters. The initial imputation by maximum likelihood estimation lets k-NN use the non-missing data of project instances that contain missing data when it searches for neighbors; partitioning the software project data into clusters increases the homogeneity of the data set. After the $k$ most similar project instances in the cluster are found, the k-NN result and the maximum likelihood estimation result are averaged. In the empirical study, we evaluated our approach and five other methods in experiments on 2,160 data sets, generated by injecting missing data into two industrial data sets: software project data measured at a bank in Korea and the ISBSG data set. The results of the Wilcoxon rank sum test confirm that our approach outperforms the other five methods with respect to data set size, number of missing attributes, missing data percentage, and missingness mechanism.	eng
dc.language	eng	-
dc.publisher	Korea Advanced Institute of Science and Technology	-
dc.subject	imputation	-
dc.subject	k-NN	-
dc.subject	maximum likelihood estimation	-
dc.subject	software project data	-
dc.subject	cluster	-
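dc.subject	imputation (대치법)	-
dc.subject	k-nearest neighbor imputation (k 최근접이웃대치법)	-
dc.subject	maximum likelihood estimation (최대우도추정법)	-
dc.subject	software project data (소프트웨어 프로젝트 데이터)	-
dc.subject	cluster (클러스터)	-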
dc.title	Hybrid imputation of cluster-based k-NN and maximum likelihood estimation in software project data	-
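dc.title.alternative	Hybrid imputation method for software project data combining cluster-based k-NN and maximum likelihood estimation (군집기반 k-NN과 최대우도추정법을 결합한 소프트웨어 프로젝트 데이터용 하이브리드 대치법)	-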
dc.type	Thesis(Master)	-
dc.identifier.CNRN	303642/325007	-
dc.description.department	Korea Advanced Institute of Science and Technology: Computer Science major	-
dc.identifier.uid	020073371	-
dc.contributor.localauthor	Bae, Doo-Hwan	-
dc.contributor.localauthor	배두환	-
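
The pipeline outlined in the abstract above (initial maximum likelihood imputation, hierarchical clustering, k-NN search inside the cluster, then averaging the two estimates) can be sketched roughly as follows. This is only an illustrative sketch, not the thesis implementation: the abstract does not state the likelihood model, the linkage criterion, the distance measure, or the number of clusters, so all of those are assumptions here. In particular, the ML step is replaced by a stand-in that assumes independent Gaussian attributes (so the ML estimate of each mean is the observed column mean), clustering uses average linkage with Euclidean distance, and the names `ml_impute`, `hierarchical_clusters`, `hybrid_impute`, and `X_demo` are hypothetical.

```python
import numpy as np


def ml_impute(X):
    """Stand-in for the ML step (assumption: independent Gaussian attributes,
    so each missing cell is filled with the observed mean of its column)."""
    filled = X.copy()
    col_means = np.nanmean(X, axis=0)
    rows, cols = np.where(np.isnan(X))
    filled[rows, cols] = col_means[cols]
    return filled


def hierarchical_clusters(X, n_clusters):
    """Average-linkage agglomerative clustering (assumed linkage and metric).
    Returns one integer cluster label per row of X."""
    clusters = [[i] for i in range(len(X))]
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = dist[np.ix_(clusters[a], clusters[b])].mean()
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]  # merge the closest pair
        del clusters[b]
    labels = np.empty(len(X), dtype=int)
    for label, members in enumerate(clusters):
        labels[members] = label
    return labels


def hybrid_impute(X, k=2, n_clusters=2):
    """Average of the ML stand-in and a k-NN estimate taken inside the cluster."""
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    ml_filled = ml_impute(X)                      # step 1: initial imputation
    labels = hierarchical_clusters(ml_filled, n_clusters)  # step 2: clustering
    result = ml_filled.copy()
    for i in np.where(missing.any(axis=1))[0]:
        peers = np.where(labels == labels[i])[0]
        peers = peers[peers != i]
        if len(peers) == 0:
            continue  # singleton cluster: keep the ML value
        observed = ~missing[i]
        # Distances use only the attributes observed in row i; peers that had
        # missing cells still take part because they were filled in step 1.
        d = np.linalg.norm(
            ml_filled[peers][:, observed] - ml_filled[i, observed], axis=1
        )
        nearest = peers[np.argsort(d)[:k]]
        knn_est = ml_filled[nearest].mean(axis=0)  # step 3: k-NN estimate
        # Step 4: average the k-NN estimate and the ML estimate.
        result[i, missing[i]] = (knn_est[missing[i]] + ml_filled[i, missing[i]]) / 2
    return result


# Made-up toy data (not from the thesis): two groups of projects, one missing cell.
X_demo = np.array([
    [1.0, 1.0, 10.0],
    [2.0, 1.0, 12.0],
    [1.0, 2.0, np.nan],
    [20.0, 21.0, 30.0],
    [21.0, 20.0, 32.0],
    [20.0, 20.0, 31.0],
])
imputed = hybrid_impute(X_demo, k=2, n_clusters=2)
print(imputed[2])
```

On this toy input the column-mean stand-in alone would fill the missing cell with 23, the two in-cluster neighbors suggest 11, and the hybrid takes their average. A fuller reproduction would swap `ml_impute` for a proper multivariate ML estimator (for example, one fitted by EM), which is beyond this sketch.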

Appears in Collection: CS-Theses_Master (Master's Theses)

Files in This Item: There are no files associated with this item.


Knowledge Service Development Team, KAIST 291 Daehak-ro, Yuseong-gu, Daejeon 34141, Republic of Korea. T. 82-42-350-4493 Email. koasas@kaist.ac.kr
Copyright © 2016. Korea Advanced Institute of Science and Technology. All Rights Reserved.
