DSpace at KOASAS: A taxonomy of dirty data

A taxonomy of dirty data

Cited 145 times in WoS

Cited 0 times in

Hit : 825
Download : 1215

DC Field	Value	Language
dc.contributor.author	Kim, W	ko
dc.contributor.author	Choi, BJ	ko
dc.contributor.author	Hong, EK	ko
dc.contributor.author	Kim, SK	ko
dc.contributor.author	Lee, Doheon	ko
dc.date.accessioned	2010-05-19T01:07:07Z	-
dc.date.available	2010-05-19T01:07:07Z	-
dc.date.created	2012-02-06	-
dc.date.issued	2003-01	-
dc.identifier.citation	DATA MINING AND KNOWLEDGE DISCOVERY, v.7, pp.81 - 99	-
dc.identifier.issn	1384-5810	-
dc.identifier.uri	http://hdl.handle.net/10203/18464	-
dc.description.abstract	Today large corporations are constructing enterprise data warehouses from disparate data sources in order to run enterprise-wide data analysis applications, including decision support systems, multidimensional online analytical applications, data mining, and customer relationship management systems. A major problem that is only beginning to be recognized is that the data in data sources are often "dirty". Broadly, dirty data include missing data, wrong data, and non-standard representations of the same data. The results of analyzing a database/data warehouse of dirty data can be damaging and at best be unreliable. In this paper, a comprehensive classification of dirty data is developed for use as a framework for understanding how dirty data arise, manifest themselves, and may be cleansed to ensure proper construction of data warehouses and accurate data analysis. The impact of dirty data on data mining is also explored.	-
dc.description.sponsorship	This research was partially supported by Korea’s Brain Korea-21 grant. This research was partially supported by Korea’s KISTEP grant.	en
dc.language	English	-
dc.language.iso	en_US	en
dc.publisher	SPRINGER	-
dc.subject	MULTIDATABASE SYSTEMS	-
dc.subject	RELATIONAL DATABASES	-
dc.subject	DATA QUALITY	-
dc.subject	FUZZY	-
dc.subject	HETEROGENEITY	-
dc.title	A taxonomy of dirty data	-
dc.type	Article	-
dc.identifier.wosid	000179705200004	-
dc.identifier.scopusid	2-s2.0-0037240183	-
dc.type.rims	ART	-
dc.citation.volume	7	-
dc.citation.beginningpage	81	-
dc.citation.endingpage	99	-
dc.citation.publicationname	DATA MINING AND KNOWLEDGE DISCOVERY	-
dc.embargo.liftdate	9999-12-31	-
dc.embargo.terms	9999-12-31	-
dc.contributor.localauthor	Lee, Doheon	-
dc.contributor.nonIdAuthor	Kim, W	-
dc.contributor.nonIdAuthor	Choi, BJ	-
dc.contributor.nonIdAuthor	Hong, EK	-
dc.contributor.nonIdAuthor	Kim, SK	-
dc.type.journalArticle	Article	-
dc.subject.keywordAuthor	dirty data	-
dc.subject.keywordAuthor	data quality	-
dc.subject.keywordAuthor	data mining	-
dc.subject.keywordAuthor	data cleansing	-
dc.subject.keywordAuthor	data warehousing	-
dc.subject.keywordPlus	MULTIDATABASE SYSTEMS	-
dc.subject.keywordPlus	RELATIONAL DATABASES	-
dc.subject.keywordPlus	DATA QUALITY	-
dc.subject.keywordPlus	FUZZY	-
dc.subject.keywordPlus	HETEROGENEITY	-

Appears in Collection: BiS-Journal Papers(저널논문)

This item is cited by 145 other documents in WoS.

KOASAS

Knowledge Service Development Team, KAIST 291 Daehak-ro, Yuseong-gu, Daejeon 34141, Republic of Korea. T. 82-42-350-4493 Email. koasas@kaist.ac.kr
Copyright © 2016. Korea Advanced Institute of Science and Technology. All Rights Reserved.
