Non-negative matrix factorization based text mining: Feature extraction and classification

The unlabeled document or text collections are becoming larger and larger which is common and obvious; mining such data sets are a challenging task. Using the simple word-document frequency matrix as feature space the mining process is becoming more complex. The text documents are often represented as high dimensional about few thousand sparse vectors with sparsity about 95 to 99% which significantly affects the efficiency and the results of the mining process. In this paper, we propose the two-stage Non-negative Matrix Factorization (NMF): in the first stage we tried to extract the uncorrelated basis probabilistic document feature vectors by significantly reducing the dimension of the feature vectors of the word-document frequency from few thousand to few hundred, and in the second stage for clustering or classification. In our propose approach it has been observed that the clustering or classification performance with more than 98.5% accuracy. The dimension reduction and classification performance has observed for the Classic3 dataset.
Publisher
SPRINGER-VERLAG BERLIN
Issue Date
2006
Language
ENG
Citation

NEURAL INFORMATION PROCESSING, PT 2, PROCEEDINGS BOOK SERIES: LECTURE NOTES IN COMPUTER SCIENCE, v.4233, pp.703 - 712

ISSN
0302-9743
URI
http://hdl.handle.net/10203/10203
Appears in Collection
EE-Journal Papers(저널논문)
  • Hit : 316
  • Download : 1
  • Cited 0 times in thomson ci
This item is cited by other documents in WoS
⊙ Detail Information in WoSⓡClick to seewebofscience_button
⊙ Cited 4 items in WoSClick to see citing articles inrecords_button

qr_code

  • mendeley

    citeulike


rss_1.0 rss_2.0 atom_1.0