Today, it is crucial to collect large amounts of accurately labeled data to train artificial intelligence and machine learning models. Crowdsourcing systems have emerged as an effective platform for acquiring labeled data at relatively low cost by employing non-expert workers, since such systems can be accessed anytime and anywhere. Although crowdsourced data collection has become ubiquitous, it suffers from an inherent problem: workers may not provide accurate answers, for various reasons. It is therefore common to collect multiple answers per task and to infer the correct label from them. However, inferring correct labels from multiple noisy answers remains a challenging problem, since the quality of answers varies widely across both tasks and workers. Many previous works have assumed a simple model in which the ordering of workers by reliability is fixed across tasks, and have focused on estimating the worker reliabilities in order to aggregate answers with different weights. We propose a highly general $d$-type worker-task specialization model in which the reliability of each worker can change depending on the type of the given task, where the number $d$ of types can scale with the number of tasks. In this model, we characterize the optimal sample complexity for correctly inferring labels at any given accuracy, and we propose an algorithm that achieves this optimum under certain assumptions. We also conduct experiments on both synthetic and real datasets, showing that our algorithm outperforms existing algorithms developed under more restrictive model assumptions. Finally, we conclude the dissertation by presenting directions for future work.
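As an illustrative formalization of the model (the notation here is assumed for exposition and may differ from the dissertation's own), suppose each task $j$ carries a type $t(j) \in \{1, \dots, d\}$ and each worker $i$ is described by type-dependent reliabilities, so that
\[
\Pr\bigl[\text{worker } i \text{ answers task } j \text{ correctly}\bigr] \;=\; p_{i,\,t(j)}, \qquad t(j) \in \{1, \dots, d\}.
\]
Under this view, the ranking of workers by reliability may differ from one task type to another, which is precisely what fixed-ordering models rule out.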