DC Field | Value | Language |
---|---|---|
dc.contributor.author | Oh, Heung-Seon | ko |
dc.contributor.author | Myaeng, Sung Hyon | ko |
dc.date.accessioned | 2014-08-29T01:32:11Z | - |
dc.date.available | 2014-08-29T01:32:11Z | - |
dc.date.created | 2014-04-07 | - |
dc.date.issued | 2014-04 | - |
dc.identifier.citation | JOURNAL OF INFORMATION SCIENCE, v.40, no.2, pp.127 - 145 | - |
dc.identifier.issn | 0165-5515 | - |
dc.identifier.uri | http://hdl.handle.net/10203/188782 | - |
dc.description.abstract | Hierarchical text classification of a Web taxonomy is challenging because it is a very large-scale problem with hundreds of thousands of categories and associated documents. Furthermore, the conceptual levels and training data availabilities of categories vary widely. The narrow-down approach is the state of the art; it utilizes a search engine for generating candidates from the taxonomy and builds a classifier for the final category selection. In this paper, we take the same approach but address the issue of using global information in a language modelling framework to improve effectiveness. We propose three methods of using non-local information for the task: a passive way of utilizing global information for smoothing; an aggressive way where a top-level classifier is built and integrated with a local model; and a method of using label terms associated with the path from a category to the root, which is based on our systematic observation that they are underrepresented in the documents. For evaluation, we constructed a document collection from Web pages in the Open Directory Project. A series of experiments and their results show the superiority of our methods and reveal the role of global information in hierarchical text classification. | - |
dc.language | English | - |
dc.publisher | SAGE PUBLICATIONS LTD | - |
dc.subject | NAIVE BAYES | - |
dc.title | Utilizing global and path information with language modelling for hierarchical text classification | - |
dc.type | Article | - |
dc.identifier.wosid | 000332214700001 | - |
dc.identifier.scopusid | 2-s2.0-84896752924 | - |
dc.type.rims | ART | - |
dc.citation.volume | 40 | - |
dc.citation.issue | 2 | - |
dc.citation.beginningpage | 127 | - |
dc.citation.endingpage | 145 | - |
dc.citation.publicationname | JOURNAL OF INFORMATION SCIENCE | - |
dc.identifier.doi | 10.1177/0165551513507415 | - |
dc.embargo.liftdate | 9999-12-31 | - |
dc.embargo.terms | 9999-12-31 | - |
dc.contributor.localauthor | Myaeng, Sung Hyon | - |
dc.type.journalArticle | Article | - |
dc.subject.keywordAuthor | language models | - |
dc.subject.keywordAuthor | Hierarchical text classification | - |
dc.subject.keywordAuthor | web taxonomy | - |
dc.subject.keywordPlus | NAIVE BAYES | - |
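The abstract's first method uses global information "passively" for smoothing within a language-modelling classifier that ranks candidate categories. The following is only a rough illustration of that general idea, not the paper's formulation. It assumes Jelinek-Mercer interpolation between a category's unigram model and a collection-wide (global) model; the record does not say which smoothing scheme the authors use. The function names `term_dist`, `score` and `classify`, the weight `lam`, the probability `floor`, and the toy data are all hypothetical. The `candidates` argument stands in for the search-engine step of the narrow-down approach.

```python
import math
from collections import Counter


def term_dist(docs):
    """Maximum-likelihood unigram distribution over all tokens in `docs`."""
    counts = Counter(tok for doc in docs for tok in doc)
    total = sum(counts.values())
    return {t: c / total for t, c in counts.items()}


def score(doc, local, global_, lam=0.5, floor=1e-9):
    """Log-likelihood of `doc` under a Jelinek-Mercer mixture (assumed smoothing):
    P(t | c) = lam * P_local(t | c) + (1 - lam) * P_global(t).
    `floor` is a hypothetical tiny constant that avoids log(0) for tokens
    absent from all training data."""
    return sum(
        math.log(lam * local.get(t, 0.0) + (1 - lam) * global_.get(t, 0.0) + floor)
        for t in doc
    )


def classify(doc, candidates, training, lam=0.5):
    """Return the candidate category whose smoothed model best explains `doc`.
    `candidates` plays the role of the categories a search step would propose."""
    global_ = term_dist([d for c in training for d in training[c]])
    best, best_score = None, -math.inf
    for c in candidates:
        s = score(doc, term_dist(training[c]), global_, lam)
        if s > best_score:
            best, best_score = c, s
    return best


# Toy, made-up training data: category path -> tokenized documents.
training = {
    "Arts/Music": [["guitar", "song", "band"], ["song", "album"]],
    "Science/Physics": [["quantum", "particle"], ["particle", "energy", "song"]],
}
print(classify(["song", "band"], list(training), training))  # Arts/Music
```

Because the global model assigns non-zero probability to any term seen anywhere in the collection, a category is not ruled out just because one query term is missing from its own training documents; here "band" never occurs under Science/Physics, yet that category still receives a finite score.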
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.