A Survey on Data Collection for Machine Learning: a Big Data - AI Integration Perspective

Cited 349 times in Web of Science; cited 157 times in Scopus
DC Field | Value | Language
dc.contributor.author | Roh, Yuji | ko
dc.contributor.author | Heo, Geon | ko
dc.contributor.author | Whang, Steven Euijong | ko
dc.date.accessioned | 2021-03-17T06:10:06Z | -
dc.date.available | 2021-03-17T06:10:06Z | -
dc.date.created | 2019-11-22 | -
dc.date.issued | 2021-04 | -
dc.identifier.citation | IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, v.33, no.4, pp.1328 - 1347 | -
dc.identifier.issn | 1041-4347 | -
dc.identifier.uri | http://hdl.handle.net/10203/281608 | -
dc.description.abstract | Data collection is a major bottleneck in machine learning and an active research topic in multiple communities. There are largely two reasons data collection has recently become a critical issue. First, as machine learning is becoming more widely-used, we are seeing new applications that do not necessarily have enough labeled data. Second, unlike traditional machine learning, deep learning techniques automatically generate features, which saves feature engineering costs, but in return may require larger amounts of labeled data. Interestingly, recent research in data collection comes not only from the machine learning, natural language, and computer vision communities, but also from the data management community due to the importance of handling large amounts of data. In this survey, we perform a comprehensive study of data collection from a data management point of view. Data collection largely consists of data acquisition, data labeling, and improvement of existing data or models. We provide a research landscape of these operations, provide guidelines on which technique to use when, and identify interesting research challenges. The integration of machine learning and data management for data collection is part of a larger trend of Big data and Artificial Intelligence (AI) integration and opens many opportunities for new research. | -
dc.language | English | -
dc.publisher | IEEE COMPUTER SOC | -
dc.title | A Survey on Data Collection for Machine Learning: a Big Data - AI Integration Perspective | -
dc.type | Article | -
dc.identifier.wosid | 000626617900002 | -
dc.identifier.scopusid | 2-s2.0-85102237692 | -
dc.type.rims | ART | -
dc.citation.volume | 33 | -
dc.citation.issue | 4 | -
dc.citation.beginningpage | 1328 | -
dc.citation.endingpage | 1347 | -
dc.citation.publicationname | IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING | -
dc.identifier.doi | 10.1109/TKDE.2019.2946162 | -
dc.contributor.localauthor | Whang, Steven Euijong | -
dc.description.isOpenAccess | N | -
dc.type.journalArticle | Article | -
dc.subject.keywordAuthor | Machine learning | -
dc.subject.keywordAuthor | Data collection | -
dc.subject.keywordAuthor | Labeling | -
dc.subject.keywordAuthor | Data models | -
dc.subject.keywordAuthor | Data acquisition | -
dc.subject.keywordAuthor | Training data | -
dc.subject.keywordAuthor | Smart manufacturing | -
dc.subject.keywordAuthor | Data collection | -
dc.subject.keywordAuthor | data acquisition | -
dc.subject.keywordAuthor | data labeling | -
dc.subject.keywordAuthor | machine learning | -
Appears in Collection
EE-Journal Papers(저널논문)
Files in This Item
There are no files associated with this item.