A taxonomy of dirty data

Cited 145 times in Web of Science; cited 0 times in Scopus
DC Field | Value | Language
dc.contributor.author | Kim, W | ko
dc.contributor.author | Choi, BJ | ko
dc.contributor.author | Hong, EK | ko
dc.contributor.author | Kim, SK | ko
dc.contributor.author | Lee, Doheon | ko
dc.date.accessioned | 2010-05-19T01:07:07Z | -
dc.date.available | 2010-05-19T01:07:07Z | -
dc.date.created | 2012-02-06 | -
dc.date.issued | 2003-01 | -
dc.identifier.citation | DATA MINING AND KNOWLEDGE DISCOVERY, v.7, pp. 81-99 | -
dc.identifier.issn | 1384-5810 | -
dc.identifier.uri | http://hdl.handle.net/10203/18464 | -
dc.description.abstract | Today large corporations are constructing enterprise data warehouses from disparate data sources in order to run enterprise-wide data analysis applications, including decision support systems, multidimensional online analytical applications, data mining, and customer relationship management systems. A major problem that is only beginning to be recognized is that the data in data sources are often "dirty". Broadly, dirty data include missing data, wrong data, and non-standard representations of the same data. The results of analyzing a database/data warehouse of dirty data can be damaging and at best be unreliable. In this paper, a comprehensive classification of dirty data is developed for use as a framework for understanding how dirty data arise, manifest themselves, and may be cleansed to ensure proper construction of data warehouses and accurate data analysis. The impact of dirty data on data mining is also explored. | -
dc.description.sponsorship | This research was partially supported by Korea’s Brain Korea-21 grant and by Korea’s KISTEP grant. | en
dc.language | English | -
dc.language.iso | en_US | en
dc.publisher | SPRINGER | -
dc.subject | MULTIDATABASE SYSTEMS | -
dc.subject | RELATIONAL DATABASES | -
dc.subject | DATA QUALITY | -
dc.subject | FUZZY | -
dc.subject | HETEROGENEITY | -
dc.title | A taxonomy of dirty data | -
dc.type | Article | -
dc.identifier.wosid | 000179705200004 | -
dc.identifier.scopusid | 2-s2.0-0037240183 | -
dc.type.rims | ART | -
dc.citation.volume | 7 | -
dc.citation.beginningpage | 81 | -
dc.citation.endingpage | 99 | -
dc.citation.publicationname | DATA MINING AND KNOWLEDGE DISCOVERY | -
dc.embargo.liftdate | 9999-12-31 | -
dc.embargo.terms | 9999-12-31 | -
dc.contributor.localauthor | Lee, Doheon | -
dc.contributor.nonIdAuthor | Kim, W | -
dc.contributor.nonIdAuthor | Choi, BJ | -
dc.contributor.nonIdAuthor | Hong, EK | -
dc.contributor.nonIdAuthor | Kim, SK | -
dc.type.journalArticle | Article | -
dc.subject.keywordAuthor | dirty data | -
dc.subject.keywordAuthor | data quality | -
dc.subject.keywordAuthor | data mining | -
dc.subject.keywordAuthor | data cleansing | -
dc.subject.keywordAuthor | data warehousing | -
dc.subject.keywordPlus | MULTIDATABASE SYSTEMS | -
dc.subject.keywordPlus | RELATIONAL DATABASES | -
dc.subject.keywordPlus | DATA QUALITY | -
dc.subject.keywordPlus | FUZZY | -
dc.subject.keywordPlus | HETEROGENEITY | -
Appears in Collection: BiS-Journal Papers