Utilizing global and path information with language modelling for hierarchical text classification

Cited 4 times in Web of Science; cited 8 times in Scopus
DC Field | Value | Language
dc.contributor.author | Oh, Heung-Seon | ko
dc.contributor.author | Myaeng, Sung Hyon | ko
dc.date.accessioned | 2014-08-29T01:32:11Z | -
dc.date.available | 2014-08-29T01:32:11Z | -
dc.date.created | 2014-04-07 | -
dc.date.issued | 2014-04 | -
dc.identifier.citation | JOURNAL OF INFORMATION SCIENCE, v.40, no.2, pp.127 - 145 | -
dc.identifier.issn | 0165-5515 | -
dc.identifier.uri | http://hdl.handle.net/10203/188782 | -
dc.description.abstract | Hierarchical text classification of a Web taxonomy is challenging because it is a very large-scale problem with hundreds of thousands of categories and associated documents. Furthermore, the conceptual levels and training data availabilities of categories vary widely. The narrow-down approach is the state of the art; it utilizes a search engine for generating candidates from the taxonomy and builds a classifier for the final category selection. In this paper, we take the same approach but address the issue of using global information in a language modelling framework to improve effectiveness. We propose three methods of using non-local information for the task: a passive way of utilizing global information for smoothing; an aggressive way where a top-level classifier is built and integrated with a local model; and a method of using label terms associated with the path from a category to the root, which is based on our systematic observation that they are underrepresented in the documents. For evaluation, we constructed a document collection from Web pages in the Open Directory Project. A series of experiments and their results show the superiority of our methods and reveal the role of global information in hierarchical text classification. | -
dc.language | English | -
dc.publisher | SAGE PUBLICATIONS LTD | -
dc.subject | NAIVE BAYES | -
dc.title | Utilizing global and path information with language modelling for hierarchical text classification | -
dc.type | Article | -
dc.identifier.wosid | 000332214700001 | -
dc.identifier.scopusid | 2-s2.0-84896752924 | -
dc.type.rims | ART | -
dc.citation.volume | 40 | -
dc.citation.issue | 2 | -
dc.citation.beginningpage | 127 | -
dc.citation.endingpage | 145 | -
dc.citation.publicationname | JOURNAL OF INFORMATION SCIENCE | -
dc.identifier.doi | 10.1177/0165551513507415 | -
dc.embargo.liftdate | 9999-12-31 | -
dc.embargo.terms | 9999-12-31 | -
dc.contributor.localauthor | Myaeng, Sung Hyon | -
dc.type.journalArticle | Article | -
dc.subject.keywordAuthor | language models | -
dc.subject.keywordAuthor | Hierarchical text classification | -
dc.subject.keywordAuthor | web taxonomy | -
dc.subject.keywordPlus | NAIVE BAYES | -
Appears in Collection
CS-Journal Papers(저널논문)