Sequential targeting: A continual learning approach for data imbalance in text classification

Cited 8 times in Web of Science; cited 0 times in Scopus
DC Field | Value | Language
dc.contributor.author | Jang, Joel | ko
dc.contributor.author | Kim, Yoonjeon | ko
dc.contributor.author | Choi, Kyoungho | ko
dc.contributor.author | Suh, Sungho | ko
dc.date.accessioned | 2021-07-06T01:30:06Z | -
dc.date.available | 2021-07-06T01:30:06Z | -
dc.date.created | 2021-07-05 | -
dc.date.issued | 2021-10 | -
dc.identifier.citation | EXPERT SYSTEMS WITH APPLICATIONS, v.179 | -
dc.identifier.issn | 0957-4174 | -
dc.identifier.uri | http://hdl.handle.net/10203/286401 | -
dc.description.abstract | Text classification has numerous use cases, including sentiment analysis, spam detection, document classification, and hate speech detection. In realistic settings, text classification confronts imbalanced data, where the classes of interest usually make up a minor fraction. Deep neural networks used for text classification, such as recurrent neural networks and transformer networks, lack efficient methods for addressing imbalanced data. Traditional data-level methods that attempt to mitigate distributional skew include oversampling and undersampling. Oversampling methods degrade the quality of the original language representation of the sparse data from minority classes, whereas undersampling methods fail to fully utilize the rich context of majority classes. We address these issues in data-driven approaches by partitioning the training data distribution into mutually exclusive subsets and performing continual learning, treating the individual subsets as distinct tasks. We demonstrate the effectiveness of our method through experiments on the IMDB dataset and on datasets constructed from real-world data. The experimental results show that, compared to the baseline method, the proposed method improves the F1-score by 56.38 percentage points on the IMDB dataset and by 16.89 and 34.76 percentage points on the constructed datasets. | -
dc.language | English | -
dc.publisher | PERGAMON-ELSEVIER SCIENCE LTD | -
dc.title | Sequential targeting: A continual learning approach for data imbalance in text classification | -
dc.type | Article | -
dc.identifier.wosid | 000663549200004 | -
dc.identifier.scopusid | 2-s2.0-85105247196 | -
dc.type.rims | ART | -
dc.citation.volume | 179 | -
dc.citation.publicationname | EXPERT SYSTEMS WITH APPLICATIONS | -
dc.identifier.doi | 10.1016/j.eswa.2021.115067 | -
dc.contributor.localauthor | Kim, Yoonjeon | -
dc.contributor.nonIdAuthor | Jang, Joel | -
dc.contributor.nonIdAuthor | Choi, Kyoungho | -
dc.contributor.nonIdAuthor | Suh, Sungho | -
dc.description.isOpenAccess | N | -
dc.type.journalArticle | Article | -
dc.subject.keywordAuthor | Continual learning | -
dc.subject.keywordAuthor | Data imbalance | -
dc.subject.keywordAuthor | Deep learning | -
dc.subject.keywordAuthor | Sentiment analysis | -
dc.subject.keywordAuthor | Text classification | -
dc.subject.keywordPlus | NETWORKS | -
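
The abstract describes partitioning imbalanced training data into mutually exclusive subsets and training on them one after another as distinct tasks. The record does not say how the subsets are built or ordered, so the following Python sketch is only an illustration under stated assumptions: the function names `split_into_tasks` and `train_sequentially`, the `train_step` callback, and the particular scheme (a class-balanced subset trained last, with the leftover majority examples trained first) are all hypothetical, and the continual-learning regularization that would limit forgetting between tasks is not shown.

```python
import random
from collections import Counter, defaultdict


def split_into_tasks(examples, seed=0):
    """Partition (text, label) pairs into two mutually exclusive subsets.

    Assumed scheme (not taken from the paper): the second subset is
    class-balanced, capping every class at the size of the rarest class;
    the first subset holds every example that was left over.
    """
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for idx, (_, label) in enumerate(examples):
        by_label[label].append(idx)

    cap = min(len(ids) for ids in by_label.values())
    balanced_ids = set()
    for ids in by_label.values():
        balanced_ids.update(rng.sample(ids, cap))

    leftover = [ex for i, ex in enumerate(examples) if i not in balanced_ids]
    balanced = [ex for i, ex in enumerate(examples) if i in balanced_ids]
    return [leftover, balanced]


def train_sequentially(tasks, train_step):
    """Pass each subset to `train_step` in order, treating it as its own task."""
    for task_id, subset in enumerate(tasks):
        train_step(task_id, subset)


# Toy imbalanced data: 8 negative examples, 2 positive examples.
data = [(f"neg {i}", 0) for i in range(8)] + [(f"pos {i}", 1) for i in range(2)]
tasks = split_into_tasks(data)

# Placeholder training step that only records the label counts it was given.
log = []
train_sequentially(
    tasks, lambda task_id, subset: log.append((task_id, Counter(l for _, l in subset)))
)
```

In a real setup, `train_step` would update a text classifier on the given subset; here it only records label counts so that the partition can be inspected.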
Appears in Collection
RIMS Journal Papers
Files in This Item
There are no files associated with this item.