A comparison of synthetic data approaches using utility and disclosure risk measures (유용성과 노출 위험성 지표를 이용한 재현자료 기법 비교 연구)

This paper investigates synthetic data generation methods and the measures used to evaluate them. Demand for releasing various types of data to the public for different purposes is increasing, but so are concerns about leaking critical or sensitive information. Many synthetic data generation methods have been proposed over the years to address these concerns, and some have been implemented in some countries, including Korea. This study introduces and compares three representative synthetic data generation approaches: sequential regression, nonparametric Bayesian multiple imputation, and deep generative models. Several evaluation metrics that measure the utility and disclosure risk of synthetic data are also reviewed. We provide empirical comparisons of the three approaches with respect to these evaluation measures. The findings will help practitioners better understand the advantages and disadvantages of these synthetic data methods.
Publisher
KOREAN STATISTICAL SOC
Issue Date
2023-04
Language
English
Article Type
Article
Citation

KOREAN JOURNAL OF APPLIED STATISTICS, v.36, no.2, pp.141 - 166

ISSN
1225-066X
DOI
10.5351/KJAS.2023.36.2.141
URI
http://hdl.handle.net/10203/306836
Appears in Collection
MA-Journal Papers(저널논문), IE-Journal Papers(저널논문)
