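Big data acquisition by crowdsourcing: fundamental limits and efficient algorithms
(Korean title, translated: A study on big data collection through crowdsourcing: theoretical performance limits and performance analysis of efficient algorithms)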

Training artificial intelligence and machine learning models requires large, accurately labeled datasets. Crowdsourcing systems have emerged as an effective platform for acquiring labeled data at relatively low cost from non-expert workers, since such systems can be accessed anytime and anywhere. Although crowdsourced data collection has become ubiquitous, workers may give inaccurate answers for various reasons, so it is common to infer the correct label from the answers of many workers. Inferring correct labels from multiple noisy answers remains challenging, however, because answer quality varies widely across tasks and workers. Many previous works assume a simple model in which the ordering of workers by reliability is fixed across tasks, and focus on estimating worker reliabilities in order to aggregate answers with different weights. We propose a highly general $d$-type worker-task specialization model in which the reliability of each worker can change depending on the type of a given task, and where the number of types $d$ can scale with the number of tasks. In this model, we characterize the optimal sample complexity for correctly inferring labels to any given accuracy, and propose an algorithm that achieves the optimal result under some assumptions. Experiments on both synthetic and real datasets show that our algorithm outperforms existing algorithms developed under strict model assumptions. Finally, we conclude this dissertation by outlining directions for future work.
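The abstract does not state the model's parameters, so the following is only an illustrative sketch of the kind of setting it describes. It assumes binary labels in {-1, +1}, that every task and every worker has one of `d` types, and that a worker answers correctly with probability `p` when the types match and `q` otherwise. The names `p`, `q`, and all function names are hypothetical. The type-matched vote is an oracle baseline that is told the true types; it is not the algorithm proposed in the dissertation, which would have to work without that knowledge.

```python
import random


def simulate_answers(num_tasks, num_workers, d, p, q, rng):
    """Draw labels, types, and noisy answers under the ASSUMED d-type model.

    Assumption (not from the abstract): a worker is correct with
    probability p on a task of its own type and q on any other type.
    """
    labels = [rng.choice([-1, 1]) for _ in range(num_tasks)]
    task_types = [rng.randrange(d) for _ in range(num_tasks)]
    worker_types = [rng.randrange(d) for _ in range(num_workers)]
    answers = []
    for i in range(num_tasks):
        row = []
        for j in range(num_workers):
            prob_correct = p if task_types[i] == worker_types[j] else q
            is_correct = rng.random() < prob_correct
            row.append(labels[i] if is_correct else -labels[i])
        answers.append(row)
    return labels, task_types, worker_types, answers


def majority_vote(row):
    """Unweighted majority vote; ties are broken toward +1."""
    return 1 if sum(row) >= 0 else -1


def matched_vote(row, task_type, worker_types):
    """Oracle baseline: vote only among workers whose type matches the task.

    Falls back to the plain majority vote when no worker matches.
    """
    matched = [a for a, t in zip(row, worker_types) if t == task_type]
    return majority_vote(matched if matched else row)


def accuracy(estimates, labels):
    """Fraction of tasks whose estimated label equals the true label."""
    return sum(e == l for e, l in zip(estimates, labels)) / len(labels)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so repeated runs agree
    labels, task_types, worker_types, answers = simulate_answers(
        num_tasks=500, num_workers=15, d=3, p=0.9, q=0.5, rng=rng
    )
    plain = [majority_vote(row) for row in answers]
    oracle = [
        matched_vote(row, t, worker_types)
        for row, t in zip(answers, task_types)
    ]
    print("majority vote accuracy:", accuracy(plain, labels))
    print("type-matched vote accuracy:", accuracy(oracle, labels))
```

Under these assumptions, a vote that ignores types mixes reliable and unreliable answers, while a vote restricted to matching workers uses only the reliable ones. This is one way to see why a fixed ordering of workers by reliability can be too strict a model.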
Advisors
Chung, Hye Won (정혜원)
Description
Korea Advanced Institute of Science and Technology (KAIST): School of Electrical Engineering
Publisher
Korea Advanced Institute of Science and Technology (KAIST)
Issue Date
2023
Identifier
325007
Language
eng
Description

Doctoral dissertation (Ph.D.) - Korea Advanced Institute of Science and Technology (KAIST): School of Electrical Engineering, 2023.2, [iv, 90 p.]

Keywords

Crowdsourcing; Data labeling; Clustering; Task label inference

URI
http://hdl.handle.net/10203/309179
Link
http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=1030533&flag=dissertation
Appears in Collection
EE-Theses_Ph.D. (Doctoral theses)
Files in This Item
There are no files associated with this item.
