Multi-Label Classification of Historical Documents by Using Hierarchical Attention Networks

DC Field | Value | Language
dc.contributor.author | Kim, Dong-Kyum | ko
dc.contributor.author | Lee, Byunghwee | ko
dc.contributor.author | Kim, Daniel | ko
dc.contributor.author | Jeong, Hawoong | ko
dc.date.accessioned | 2020-04-02T07:20:11Z | -
dc.date.available | 2020-04-02T07:20:11Z | -
dc.date.created | 2020-03-30 | -
dc.date.issued | 2020-03 | -
dc.identifier.citation | JOURNAL OF THE KOREAN PHYSICAL SOCIETY, v.76, no.5, pp.368 - 377 | -
dc.identifier.issn | 0374-4884 | -
dc.identifier.uri | http://hdl.handle.net/10203/273790 | -
dc.description.abstract | The quantitative analysis of digitized historical documents has begun in earnest in recent years. Text classification is of particular importance for quantitative historical analysis because it helps to search literature efficiently and to determine the important subjects of a particular age. While numerous historians have joined together to classify large-scale historical documents, consistent classification among individual researchers has not been achieved. In this study, we present a classification method for large-scale historical data that uses a recently developed supervised learning algorithm called the Hierarchical Attention Network (HAN). By applying various classification methods to the Annals of the Joseon Dynasty (AJD), we show that HAN is more accurate than conventional techniques with word-frequency-based features. HAN provides the extent that a particular sentence or word contributes to the classification process through a quantitative value called 'attention'. We extract the representative keywords from various categories by using the attention mechanism and show the evolution of the keywords over the 472-year span of the AJD. Our results reveal that largely two groups of event categories are found in the AJD. In one group, the representative keywords of the categories were stable over long periods while the keywords in the other group varied rapidly, exhibiting repeatedly changing characteristics of the categories. Observing such macroscopic changes of representative words may provide insight into how a particular topic changes over a historical period. | -
dc.language | English | -
dc.publisher | KOREAN PHYSICAL SOC | -
dc.title | Multi-Label Classification of Historical Documents by Using Hierarchical Attention Networks | -
dc.type | Article | -
dc.identifier.wosid | 000519447800002 | -
dc.identifier.scopusid | 2-s2.0-85081735869 | -
dc.type.rims | ART | -
dc.citation.volume | 76 | -
dc.citation.issue | 5 | -
dc.citation.beginningpage | 368 | -
dc.citation.endingpage | 377 | -
dc.citation.publicationname | JOURNAL OF THE KOREAN PHYSICAL SOCIETY | -
dc.identifier.doi | 10.3938/jkps.76.368 | -
dc.identifier.kciid | ART002566678 | -
dc.contributor.localauthor | Jeong, Hawoong | -
dc.contributor.nonIdAuthor | Kim, Daniel | -
dc.description.isOpenAccess | N | -
dc.type.journalArticle | Article | -
dc.subject.keywordAuthor | Deep learning | -
dc.subject.keywordAuthor | Recurrent neural network | -
dc.subject.keywordAuthor | Text analysis | -
dc.subject.keywordAuthor | Big data | -
dc.subject.keywordAuthor | History | -
Appears in Collection
PH-Journal Papers(저널논문)