Text mining: effective feature extraction and classification using NMF algorithm
(Korean title, translated: Text mining: efficient feature selection and classification using the NMF algorithm)

In this dissertation, we propose a novel algorithm, nonnegative matrix factorization based on supervised feature selection and adaptation (NSFA), which extends unsupervised nonnegative matrix factorization (NMF) to document classification. In text mining systems, the most common document representation is the term-frequency-based vector model, in which terms are regarded as features. However, natural-language terms have inherent problems, such as synonymy, that prevent them from being optimal features. The unsupervised NMF algorithm extracts meaningful basis factors and the corresponding coefficient factors of the documents; the basis vectors capture the concepts of the documents by analyzing the co-occurrence distribution of terms. These basis vectors are then used as features instead of individual terms. This unsupervised feature extraction reduces the feature dimension and also addresses the problems of natural-language text.

However, not all features extracted by the unsupervised NMF algorithm are relevant or optimal for classification. Using the given category information, the relevant features are therefore selected and adapted to improve classification performance. The selection criterion is the rank of a mutual information (MI) based relevance measure. For the adaptation process, two networks are used: a standard NMF structure with a single-layer perceptron (NMF-SLP), and a feed-forward multilayer perceptron (MLP). For the NMF-SLP network, a hybrid feature adaptation algorithm (NMFH) is proposed, in which the document feature vectors and the classifier layer are trained by a gradient-descent error-minimization rule, while the basis (concept) vectors of the NMF layer are trained by a KL-divergence minimization rule. For the feed-forward MLP network, we propose two learning algorithms, MLP-NMFI and MLP-NMFI-NC. MLP-NMFI is defined as MLP training by the error back-propagation (EBP) rule w...
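The two unsupervised and selection stages described in the abstract (NMF feature extraction from a term-document matrix, then MI-based ranking of the extracted features against category labels) can be sketched in plain Python as below. This is a minimal illustration under stated assumptions, not the dissertation's implementation: it assumes the KL-divergence objective for the unsupervised NMF and uses the standard Lee and Seung multiplicative updates; the median binarization used to estimate MI is a hypothetical choice, because the abstract does not say how MI is computed. The function names `nmf_kl`, `kl_divergence`, `mutual_information` and `rank_features` are hypothetical, and the supervised adaptation stage (NMFH, MLP-NMFI) is not shown.

```python
import math
import random


def nmf_kl(V, r, iters=200, seed=0, eps=1e-9):
    """Factor a nonnegative m x n term-document matrix V as V ~= W H.

    W (m x r) holds the basis ("concept") vectors and H (r x n) the
    per-document coefficients used as features. Uses the Lee and Seung
    multiplicative updates for the generalized KL divergence D(V || WH).
    """
    m, n = len(V), len(V[0])
    rng = random.Random(seed)
    W = [[rng.random() + eps for _ in range(r)] for _ in range(m)]
    H = [[rng.random() + eps for _ in range(n)] for _ in range(r)]

    def approx():
        # Current reconstruction WH, offset by eps to avoid division by zero.
        return [[sum(W[i][a] * H[a][j] for a in range(r)) + eps
                 for j in range(n)] for i in range(m)]

    for _ in range(iters):
        WH = approx()
        # H_aj <- H_aj * (sum_i W_ia V_ij / WH_ij) / (sum_i W_ia)
        for a in range(r):
            w_sum = sum(W[i][a] for i in range(m)) + eps
            for j in range(n):
                ratio = sum(W[i][a] * V[i][j] / WH[i][j] for i in range(m))
                H[a][j] *= ratio / w_sum
        WH = approx()
        # W_ia <- W_ia * (sum_j H_aj V_ij / WH_ij) / (sum_j H_aj)
        for a in range(r):
            h_sum = sum(H[a][j] for j in range(n)) + eps
            for i in range(m):
                ratio = sum(H[a][j] * V[i][j] / WH[i][j] for j in range(n))
                W[i][a] *= ratio / h_sum
    return W, H


def kl_divergence(V, W, H, eps=1e-9):
    """Generalized KL divergence sum_ij (V log(V/WH) - V + WH)."""
    r = len(H)
    total = 0.0
    for i, row in enumerate(V):
        for j, v in enumerate(row):
            wh = sum(W[i][a] * H[a][j] for a in range(r)) + eps
            total += (v * math.log(v / wh) - v + wh) if v > 0 else wh
    return total


def mutual_information(feature, labels):
    """Hypothetical relevance score: MI (nats) between the feature,
    binarized at its upper median, and the class labels."""
    threshold = sorted(feature)[len(feature) // 2]
    x = [1 if f >= threshold else 0 for f in feature]
    n = len(labels)
    mi = 0.0
    for xv in set(x):
        for c in set(labels):
            joint = sum(1 for a, b in zip(x, labels) if a == xv and b == c) / n
            if joint > 0:
                mi += joint * math.log(joint / ((x.count(xv) / n) * (labels.count(c) / n)))
    return mi


def rank_features(H, labels):
    """Return NMF feature (row of H) indices sorted by decreasing MI."""
    scores = [(mutual_information(H[a], labels), a) for a in range(len(H))]
    return [a for _, a in sorted(scores, reverse=True)]


if __name__ == "__main__":
    # Toy counts: rows are terms, columns are documents (two per topic).
    V = [[3, 2, 0, 0],   # "goal"
         [2, 3, 0, 0],   # "match"
         [0, 0, 3, 2],   # "vote"
         [0, 0, 2, 3]]   # "election"
    labels = [0, 0, 1, 1]
    W, H = nmf_kl(V, r=2)
    print("KL divergence:", round(kl_divergence(V, W, H), 4))
    print("features ranked by MI:", rank_features(H, labels))
```

Multiplicative updates are used here because every factor in each update is nonnegative, so W and H stay nonnegative without any projection step. In a selection step like the one the abstract describes, only the top-ranked rows of H would be kept as classifier inputs.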
Advisors
Lee, Soo-Young (이수영)
Description
Korea Advanced Institute of Science and Technology (KAIST): Department of Bio and Brain Engineering
Publisher
Korea Advanced Institute of Science and Technology (KAIST)
Issue Date
2008
Identifier
303563/325007  / 020044523
Language
eng
Description

Doctoral dissertation, Korea Advanced Institute of Science and Technology (KAIST): Department of Bio and Brain Engineering, August 2008, [vii, 103 p.]

Keywords

Non-negative Matrix Factorization; Text Mining; Document Classification; Feature Adaptation; Feature Selection

URI
http://hdl.handle.net/10203/27063
Link
http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=303563&flag=dissertation
Appears in Collection
BiS-Theses_Ph.D. (doctoral dissertations)
Files in This Item
There are no files associated with this item.
