As the volume of information available on the Internet and corporate intranets continues to increase, there is a growing need for text categorization in knowledge portals to help people find, filter, and manage these resources. However, conventional text categorization has several limitations. First, it is hard to keep pace with the interests of experts, which change dynamically over time. Second, keyword-based methods cause problems such as very high dimensionality of the document representation with a sparsely populated feature space, a gap between the unrestricted vocabulary of documents and the restricted vocabulary of indexing categories, and high computational complexity.
In this thesis, a new text categorization method using reference information is presented to overcome these limitations. In science and research domains, most documents include reference information: a list of related documents and their publication information. The references that authors cite are mostly concentrated in one or two major categories. Text categorization using reference information can therefore reflect the related documents selected by the authors, together with their publication information, and can reduce the dimensionality problem caused by a sparse feature set. The performance of the suggested method is verified by an experiment.
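The idea that references cluster in one or two major categories can be illustrated with a minimal sketch. It is not the method of this thesis: the venue-to-category table, the function name categorize_by_references, and the simple vote counting are all assumptions made for illustration. It only shows how a small set of reference features could stand in for a large keyword vocabulary.

```python
from collections import Counter

# Hypothetical mapping from publication venue to category (illustrative only).
VENUE_CATEGORY = {
    "Journal of Machine Learning Research": "machine learning",
    "Machine Learning": "machine learning",
    "Information Retrieval Journal": "information retrieval",
    "Journal of Database Management": "databases",
}


def categorize_by_references(reference_venues, top_k=2):
    """Rank categories by the share of a document's references they cover.

    Assumption: each reference is represented only by its venue name, and
    venues missing from VENUE_CATEGORY are ignored.
    """
    votes = Counter(
        VENUE_CATEGORY[venue]
        for venue in reference_venues
        if venue in VENUE_CATEGORY
    )
    total = sum(votes.values())
    if total == 0:
        return []  # no usable reference information
    return [(category, count / total)
            for category, count in votes.most_common(top_k)]


# Reference list of one hypothetical document.
refs = [
    "Journal of Machine Learning Research",
    "Machine Learning",
    "Journal of Machine Learning Research",
    "Information Retrieval Journal",
    "Unknown Workshop",
]
result = categorize_by_references(refs)
print(result)
```

In this sketch the feature space is bounded by the number of known venues rather than by the size of the document vocabulary, which is the kind of dimensionality reduction the paragraph above refers to.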