DSpace at KOASAS: Learning Bill Similarity with Annotated and Augmented Corpora of Bills

DC Field	Value	Language
dc.contributor.author	Kim, Jiseon	ko
dc.contributor.author	Oh, Alice Haeyun	ko
dc.contributor.author	Kim, In Song	ko
dc.contributor.author	Griggs, Elden	ko
dc.date.accessioned	2021-11-09T06:46:00Z	-
dc.date.available	2021-11-09T06:46:00Z	-
dc.date.created	2021-11-02	-
dc.date.issued	2021-11	-
dc.identifier.citation	The 2021 Conference on Empirical Methods in Natural Language Processing	-
dc.identifier.uri	http://hdl.handle.net/10203/289003	-
dc.description.abstract	Bill writing is a critical element of representative democracy. However, it is often overlooked that most legislative bills are derived, or even directly copied, from other bills. Despite the significance of bill-to-bill linkages for understanding the legislative process, existing approaches fail to address semantic similarities across bills, let alone reordering or paraphrasing, which are prevalent in legal document writing. In this paper, we overcome these limitations by proposing a 5-class classification task that closely reflects the nature of the bill generation process. In doing so, we construct a human-labeled dataset of 4,721 bill-to-bill relationships at the subsection level and release this annotated dataset to the research community. To augment the dataset, we generate synthetic data with varying degrees of similarity, mimicking the complex bill writing process. We use BERT variants and apply multi-stage training, sequentially fine-tuning our models with synthetic and human-labeled datasets. We find that predictive performance improves significantly when training with both human-labeled and synthetic data. Finally, we apply our trained model to infer section- and bill-level similarities. Our analysis shows that the proposed methodology successfully captures the similarities across legal documents at various levels of aggregation.	-
dc.language	English	-
dc.publisher	Empirical Methods in Natural Language Processing (EMNLP 2021)	-
dc.title	Learning Bill Similarity with Annotated and Augmented Corpora of Bills	-
dc.type	Conference	-
dc.identifier.wosid	000860727004011	-
dc.type.rims	CONF	-
dc.citation.publicationname	The 2021 Conference on Empirical Methods in Natural Language Processing	-
dc.identifier.conferencecountry	DR	-
dc.identifier.conferencelocation	Online & Barcelo Bavaro Convention Centre, Punta Cana	-
dc.contributor.localauthor	Oh, Alice Haeyun	-
dc.contributor.nonIdAuthor	Kim, In Song	-
dc.contributor.nonIdAuthor	Griggs, Elden	-

Appears in Collection: CS-Conference Papers(학술회의논문)

Files in This Item: There are no files associated with this item.

This item is cited by 1 other document in WoS.

KOASAS

Knowledge Service Development Team, KAIST 291 Daehak-ro, Yuseong-gu, Daejeon 34141, Republic of Korea. T. 82-42-350-4493 Email. koasas@kaist.ac.kr
Copyright © 2016. Korea Advanced Institute of Science and Technology. All Rights Reserved.
