DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | Bae, Doo-Hwan | - |
dc.contributor.advisor | 배두환 | - |
dc.contributor.author | Lee, Dong-Ho | - |
dc.contributor.author | 이동호 | - |
dc.date.accessioned | 2011-12-13T06:07:45Z | - |
dc.date.available | 2011-12-13T06:07:45Z | - |
dc.date.issued | 2009 | - |
dc.identifier.uri | http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=303642&flag=dissertation | - |
dc.identifier.uri | http://hdl.handle.net/10203/34840 | - |
dc.description | Thesis (Master's) - Korea Advanced Institute of Science and Technology (KAIST) : Computer Science, 2009.2, [ vi, 47 p. ] | - |
dc.description.abstract | Missing data is a common problem that software practitioners face when they analyze software project data. In the empirical software engineering community, k-NN and maximum likelihood estimation are known to be effective for software project data. However, each has a limitation when applied alone: (1) the imputation accuracy of k-NN is affected by the homogeneity of the data, and (2) maximum likelihood estimation is ineffective on data sets containing fewer than 100 project instances. To cope with these limitations, hybrid imputation techniques that combine several methods have been developed. However, they can be applied only to software project data with fewer than 100 project instances. In this paper, we propose a hybrid imputation method for software project data that uses cluster-based k-NN and maximum likelihood estimation. Maximum likelihood estimation is applied first, and then hierarchical clustering partitions the software project data into clusters. The initial imputation by maximum likelihood estimation lets k-NN use, in its search, the non-missing data of project instances that have missing data; partitioning the data into clusters increases the homogeneity of the data set. After the $k$ most similar project instances in the cluster are found, the average of the k-NN result and the maximum likelihood estimation result is taken. In the empirical study, we evaluated our approach and five other methods in experiments on 2,160 data sets, generated by injecting missing data into two industrial data sets: software project data measured at a bank in Korea and the ISBSG data set. The results of the Wilcoxon rank sum test confirm that our approach outperforms the other five methods with respect to data set size, number of missing attributes, missing data percentage, and missingness mechanism. | eng |
dc.language | eng | - |
dc.publisher | Korea Advanced Institute of Science and Technology (KAIST) | - |
dc.subject | imputation | - |
dc.subject | k-NN | - |
dc.subject | maximum likelihood estimation | - |
dc.subject | software project data | - |
dc.subject | cluster | - |
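dc.subject | imputation method (대치법) | - |
dc.subject | k-nearest neighbor imputation (k 최근접이웃대치법) | - |
dc.subject | maximum likelihood estimation method (최대우도추정법) | - |
dc.subject | software project data (소프트웨어 프로젝트 데이터) | - |
dc.subject | cluster (클러스터) | - |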
dc.title | Hybrid imputation of cluster-based k-NN and maximum likelihood estimation in software project data | - |
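dc.title.alternative | A hybrid imputation method for software project data combining cluster-based k-NN and maximum likelihood estimation (군집기반 k-NN과 최대우도추정법을 결합한 소프트웨어 프로젝트 데이터용 하이브리드 대치법) | - |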
dc.type | Thesis(Master) | - |
dc.identifier.CNRN | 303642/325007 | - |
dc.description.department | Korea Advanced Institute of Science and Technology (KAIST) : Computer Science | - |
dc.identifier.uid | 020073371 | - |
dc.contributor.localauthor | Bae, Doo-Hwan | - |
dc.contributor.localauthor | 배두환 | - |
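
The pipeline summarized in the abstract (initial imputation, hierarchical clustering, k-NN within each cluster, then averaging) can be illustrated with a minimal sketch. This is not the thesis implementation, and it rests on several assumptions: a plain column-mean fill stands in for the maximum likelihood step, the clustering is a naive average-linkage agglomerative procedure, distances are unscaled Euclidean, and every column has at least one observed value. The function names `initial_fill`, `hierarchical_clusters`, and `hybrid_impute`, and the parameters `k` and `n_clusters`, are hypothetical.

```python
from math import sqrt


def initial_fill(rows):
    """Fill each missing cell (None) with the mean of its column.

    Assumption: this simple fill is only a stand-in for the maximum
    likelihood estimation step described in the abstract.
    """
    means = []
    for col in zip(*rows):
        observed = [v for v in col if v is not None]
        means.append(sum(observed) / len(observed))
    return [[means[j] if v is None else v for j, v in enumerate(row)]
            for row in rows]


def dist(a, b):
    """Euclidean distance between two fully filled rows."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def hierarchical_clusters(rows, n_clusters):
    """Naive average-linkage agglomerative clustering.

    Returns a list of clusters, each a list of row indices.
    """
    clusters = [[i] for i in range(len(rows))]

    def linkage(c1, c2):
        total = sum(dist(rows[i], rows[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > n_clusters:
        pairs = ((p, q) for p in range(len(clusters))
                 for q in range(p + 1, len(clusters)))
        a, b = min(pairs, key=lambda pq: linkage(clusters[pq[0]],
                                                 clusters[pq[1]]))
        merged = clusters.pop(b)  # b > a, so index a stays valid
        clusters[a].extend(merged)
    return clusters


def hybrid_impute(rows, k=2, n_clusters=2):
    """Average the in-cluster k-NN estimate with the initial estimate."""
    filled = initial_fill(rows)
    result = [list(r) for r in filled]
    for members in hierarchical_clusters(filled, n_clusters):
        for i in members:
            missing = [j for j, v in enumerate(rows[i]) if v is None]
            if not missing:
                continue
            # Neighbours come from the same cluster only; rows that had
            # missing values themselves can take part because they were
            # filled in the initial step.
            neighbours = sorted((m for m in members if m != i),
                                key=lambda m: dist(filled[i], filled[m]))[:k]
            if not neighbours:
                continue  # singleton cluster: keep the initial estimate
            for j in missing:
                knn = sum(filled[m][j] for m in neighbours) / len(neighbours)
                result[i][j] = (knn + filled[i][j]) / 2
    return result
```

A call such as `hybrid_impute(rows, k=2, n_clusters=2)` takes a list of equal-length numeric rows with `None` marking missing cells and returns a completed copy; observed cells are left unchanged.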