An evaluation of passage-based text categorization

DC Field | Value | Language
dc.contributor.author | Kim, Jin Suk | ko
dc.contributor.author | Kim, Myoung Ho | ko
dc.date.accessioned | 2007-11-19T01:37:11Z | -
dc.date.available | 2007-11-19T01:37:11Z | -
dc.date.created | 2012-02-06 | -
dc.date.issued | 2004-07 | -
dc.identifier.citation | JOURNAL OF INTELLIGENT INFORMATION SYSTEMS, v.23, no.1, pp.47 - 65 | -
dc.identifier.issn | 0925-9902 | -
dc.identifier.uri | http://hdl.handle.net/10203/1983 | -
dc.description.abstract | Research in text categorization has been confined to whole-document-level classification, probably due to the lack of full-text test collections. However, the full-length documents now available in large quantities have renewed interest in text classification. A document is usually written in an organized structure to present its main topic(s). This structure can be expressed as a sequence of subtopic text blocks, or passages. To reflect the subtopic structure of a document, we propose a new passage-level, or passage-based, text categorization model, which segments a test document into several passages, assigns categories to each passage, and merges the passage categories into the document categories. Compared with traditional document-level categorization, this model requires two additional steps: passage splitting and category merging. Using four subsets of the Reuters text categorization test collection and a full-text test collection whose documents range from tens to hundreds of kilobytes, we evaluate the proposed model, especially the effectiveness of various passage types and the importance of passage location in category merging. Our results show that simple windows are best for all test collections used in these experiments. We also found that passages contribute to the main topic(s) to different degrees, depending on their location in the test document. | -
dc.description.sponsorship | We would like to thank Wonkyun Joo for some helpful comments and fruitful discussions, Hwa-muk Yoon for providing raw data for the KISTI-Theses test collection, and Changmin Kim and Jieun Chong for supporting this work. | en
dc.language | English | -
dc.language.iso | en_US | en
dc.publisher | SPRINGER | -
dc.subject | RANKING | -
dc.title | An evaluation of passage-based text categorization | -
dc.type | Article | -
dc.identifier.wosid | 000221745000003 | -
dc.identifier.scopusid | 2-s2.0-3042796461 | -
dc.type.rims | ART | -
dc.citation.volume | 23 | -
dc.citation.issue | 1 | -
dc.citation.beginningpage | 47 | -
dc.citation.endingpage | 65 | -
dc.citation.publicationname | JOURNAL OF INTELLIGENT INFORMATION SYSTEMS | -
dc.identifier.doi | 10.1023/B:JIIS.0000029670.53363.d0 | -
dc.embargo.liftdate | 9999-12-31 | -
dc.embargo.terms | 9999-12-31 | -
dc.contributor.localauthor | Kim, Myoung Ho | -
dc.contributor.nonIdAuthor | Kim, Jin Suk | -
dc.type.journalArticle | Article | -
dc.subject.keywordAuthor | text categorization | -
dc.subject.keywordAuthor | passage | -
dc.subject.keywordAuthor | non-overlapping window | -
dc.subject.keywordAuthor | overlapping window | -
dc.subject.keywordAuthor | paragraph | -
dc.subject.keywordAuthor | bounded-paragraph | -
dc.subject.keywordAuthor | page | -
dc.subject.keywordAuthor | TextTile | -
dc.subject.keywordAuthor | passage weight function | -
dc.subject.keywordPlus | RANKING | -
Appears in Collection
CS-Journal Papers(저널논문)
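
The three steps named in the abstract (passage splitting, per-passage category assignment, and category merging) can be illustrated with a minimal Python sketch. This is not the authors' implementation: every name below (`split_windows`, `keyword_classifier`, `head_weight`, `categorize`) is hypothetical, the keyword-matching classifier is a toy stand-in for whatever classifier the paper uses, and the `1 / (index + 1)` location weight and the threshold are assumptions chosen only to show how a passage weight function can change the merged result.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Optional, Set

Scores = Dict[str, float]


def split_windows(tokens: List[str], size: int, step: Optional[int] = None) -> List[List[str]]:
    """Split tokens into fixed-size windows.

    step == size gives non-overlapping windows; step < size gives overlapping ones.
    """
    step = size if step is None else step
    if size <= 0 or step <= 0:
        raise ValueError("size and step must be positive")
    passages = []
    for start in range(0, len(tokens), step):
        passages.append(tokens[start:start + size])
        if start + size >= len(tokens):  # this window already reaches the end
            break
    return passages


def keyword_classifier(keywords: Dict[str, Set[str]]) -> Callable[[List[str]], Scores]:
    """Toy stand-in classifier (assumption): score = share of tokens matching a category."""
    def classify(passage: List[str]) -> Scores:
        scores: Scores = {}
        for category, words in keywords.items():
            hits = sum(1 for token in passage if token in words)
            if hits:
                scores[category] = hits / len(passage)
        return scores
    return classify


def uniform_weight(index: int, total: int) -> float:
    """Every passage counts equally."""
    return 1.0


def head_weight(index: int, total: int) -> float:
    """Hypothetical location weight: earlier passages count more."""
    return 1.0 / (index + 1)


def categorize(tokens, classify, size, step=None, weight=uniform_weight, threshold=0.1):
    """Split, classify each passage, then merge by a weighted average of scores."""
    passages = split_windows(tokens, size, step)
    merged = defaultdict(float)
    total_weight = 0.0
    for index, passage in enumerate(passages):
        w = weight(index, len(passages))
        total_weight += w
        for category, score in classify(passage).items():
            merged[category] += w * score
    if total_weight == 0:
        return []
    return sorted(c for c, s in merged.items() if s / total_weight >= threshold)


tokens = "oil prices rose as crude output fell while wheat exports stayed flat".split()
classify = keyword_classifier({"energy": {"oil", "crude"}, "grain": {"wheat"}})
print(categorize(tokens, classify, size=4, threshold=0.05))
print(categorize(tokens, classify, size=4, weight=head_weight, threshold=0.05))
```

In this toy input the only "grain" evidence sits in the last window, so down-weighting later passages with `head_weight` can push that category below the threshold while uniform weighting keeps it.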