Learning Bill Similarity with Annotated and Augmented Corpora of Bills

DC Field | Value | Language
dc.contributor.author | Kim, Jiseon | ko
dc.contributor.author | Oh, Alice Haeyun | ko
dc.contributor.author | Kim, In Song | ko
dc.contributor.author | Griggs, Elden | ko
dc.date.accessioned | 2021-11-09T06:46:00Z | -
dc.date.available | 2021-11-09T06:46:00Z | -
dc.date.created | 2021-11-02 | -
dc.date.issued | 2021-11 | -
dc.identifier.citation | The 2021 Conference on Empirical Methods in Natural Language Processing | -
dc.identifier.uri | http://hdl.handle.net/10203/289003 | -
dc.description.abstract | Bill writing is a critical element of representative democracy. However, it is often overlooked that most legislative bills are derived, or even directly copied, from other bills. Despite the significance of bill-to-bill linkages for understanding the legislative process, existing approaches fail to address semantic similarities across bills, let alone reordering or paraphrasing which are prevalent in legal document writing. In this paper, we overcome these limitations by proposing a 5-class classification task that closely reflects the nature of the bill generation process. In doing so, we construct a human-labeled dataset of 4,721 bill-to-bill relationships at the subsection-level and release this annotated dataset to the research community. To augment the dataset, we generate synthetic data with varying degrees of similarity, mimicking the complex bill writing process. We use BERT variants and apply multi-stage training, sequentially fine-tuning our models with synthetic and human-labeled datasets. We find that the predictive performance significantly improves when training with both human-labeled and synthetic data. Finally, we apply our trained model to infer section- and bill-level similarities. Our analysis shows that the proposed methodology successfully captures the similarities across legal documents at various levels of aggregation. | -
dc.language | English | -
dc.publisher | Empirical Methods in Natural Language Processing (EMNLP 2021) | -
dc.title | Learning Bill Similarity with Annotated and Augmented Corpora of Bills | -
dc.type | Conference | -
dc.identifier.wosid | 000860727004011 | -
dc.type.rims | CONF | -
dc.citation.publicationname | The 2021 Conference on Empirical Methods in Natural Language Processing | -
dc.identifier.conferencecountry | DR | -
dc.identifier.conferencelocation | Online & Barcelo Bavaro Convention Centre, Punta Cana | -
dc.contributor.localauthor | Oh, Alice Haeyun | -
dc.contributor.nonIdAuthor | Kim, In Song | -
dc.contributor.nonIdAuthor | Griggs, Elden | -
Appears in Collection
CS-Conference Papers(학술회의논문)