Data Lifecycle Challenges in Production Machine Learning: A Survey

Cited 85 times in Web of Science; cited 0 times in Scopus
Machine learning has become an essential tool for gleaning knowledge from data and tackling a diverse set of computationally hard tasks. However, the accuracy of a machine-learned model is deeply tied to the data it is trained on. Designing and building robust processes and tools that make it easier to analyze, validate, and transform the data fed into large-scale machine learning systems poses data management challenges. Drawing on our experience developing data-centric infrastructure for a production machine learning platform at Google, we summarize some of the interesting research challenges we encountered and survey relevant literature from the data management and machine learning communities. Specifically, we explore challenges in three main areas: data understanding, data validation and cleaning, and data preparation. In each area, we examine how different constraints are imposed on the solutions depending on where in a model's lifecycle the problems are encountered and who encounters them.
Publisher
Association for Computing Machinery (ACM)
Issue Date
2018-06
Language
English
Article Type
Article
Citation

SIGMOD Record, v.47, no.2, pp.17-28

ISSN
0163-5808
DOI
10.1145/3299887.3299891
URI
http://hdl.handle.net/10203/248779
Appears in Collection
EE-Journal Papers (Journal Papers)
Files in This Item
There are no files associated with this item.