Data Lifecycle Challenges in Production Machine Learning: A Survey

Cited 92 times in Web of Science; cited 0 times in Scopus
DC Field | Value | Language
dc.contributor.author | Polyzotis, Neoklis | ko
dc.contributor.author | Roy, Sudip | ko
dc.contributor.author | Whang, Steven Euijong | ko
dc.contributor.author | Zinkevich, Martin | ko
dc.date.accessioned | 2018-12-20T08:06:47Z | -
dc.date.available | 2018-12-20T08:06:47Z | -
dc.date.created | 2018-11-27 | -
dc.date.issued | 2018-06 | -
dc.identifier.citation | SIGMOD RECORD, v.47, no.2, pp.17 - 28 | -
dc.identifier.issn | 0163-5808 | -
dc.identifier.uri | http://hdl.handle.net/10203/248779 | -
dc.description.abstract | Machine learning has become an essential tool for gleaning knowledge from data and tackling a diverse set of computationally hard tasks. However, the accuracy of a machine-learned model is deeply tied to the data that it is trained on. Designing and building robust processes and tools that make it easier to analyze, validate, and transform data that is fed into large-scale machine learning systems poses data management challenges. Drawn from our experience in developing data-centric infrastructure for a production machine learning platform at Google, we summarize some of the interesting research challenges that we encountered, and survey some of the relevant literature from the data management and machine learning communities. Specifically, we explore challenges in three main areas of focus: data understanding, data validation and cleaning, and data preparation. In each of these areas, we try to explore how different constraints are imposed on the solutions depending on where in the lifecycle of a model the problems are encountered and who encounters them. | -
dc.language | English | -
dc.publisher | ASSOC COMPUTING MACHINERY | -
dc.title | Data Lifecycle Challenges in Production Machine Learning: A Survey | -
dc.type | Article | -
dc.identifier.wosid | 000453590200002 | -
dc.identifier.scopusid | 2-s2.0-85058810729 | -
dc.type.rims | ART | -
dc.citation.volume | 47 | -
dc.citation.issue | 2 | -
dc.citation.beginningpage | 17 | -
dc.citation.endingpage | 28 | -
dc.citation.publicationname | SIGMOD RECORD | -
dc.identifier.doi | 10.1145/3299887.3299891 | -
dc.contributor.localauthor | Whang, Steven Euijong | -
dc.contributor.nonIdAuthor | Polyzotis, Neoklis | -
dc.contributor.nonIdAuthor | Roy, Sudip | -
dc.contributor.nonIdAuthor | Zinkevich, Martin | -
dc.description.isOpenAccess | N | -
dc.type.journalArticle | Article | -
Appears in Collection
EE-Journal Papers (Journal Papers)
Files in This Item
There are no files associated with this item.