Deep neural network for document clustering and speech recognition

We address semantic language modeling, deep neural networks, and spontaneous speech recognition. Conventional language models use very limited history information, whereas natural language carries a speaker's intention that extends over the full sentence, and semantic analysis is widely used to recover that intention. Recently, there have been studies investigating the use of semantic analysis in language modeling. We hypothesize that combining semantic information with the original lexical information improves the language model, and we aim to model the long-distance dependencies of semantic information. We employ the information provided by a semantic analyzer to enhance the language model used in an automatic speech recognition system; the semantic information biases the recognizer towards sentences that are more meaningful within our domain. We introduce several ways to use semantic information in language modeling: a shallow semantic parser extracts a concept sequence from each sentence, and phrasal context-free grammars (CFGs) define the semantic concepts. To model the joint probability of lexical and semantic information, we employ maximum-entropy modeling, which tightly integrates lexical and semantic features into a unified semantic language model (a log-linear sketch appears below).

Second, two new methods are proposed for unsupervised adaptation of a language model (LM) with a single sentence for automatic transcription tasks. In the training phase, documents are clustered by latent Dirichlet allocation (LDA), and a domain-specific LM is trained for each cluster. In the test phase, the adapted LM is a linear mixture of the trained domain-specific LMs (see the mixture sketch below). Unlike previous adaptation methods, the proposed methods fully utilize the trained LDA model to estimate the weights assigned to the domain-specific LMs, so the clustering and the weight estimation rest on the same reliably trained model. In continuous speech recognition benchmark tests, the proposed methods outperform other unsupervised LM adaptation methods based on latent semantic analysis (LSA), non-negative matrix factorization (NMF), and LDA with n-gram counting.

Third, LDA is adopted for efficient layer-by-layer pre-training of deep neural networks and applied to document classification tasks. Starting from the word-document matrix, the first layer learns a topic representation of the documents with a generative LDA model, which is then converted into an approximate feed-forward network through the pseudo-inverse of the learned word-topic matrix; a rectified linear unit (ReLU) keeps the output activations non-negative (see the pseudo-inverse sketch below). Additional LDA-based layers are stacked on these activations, a one- or two-layer feed-forward network is added at the end and trained by a supervised learning algorithm, and the whole network is then fine-tuned. This LDA-based initialization was applied to a document classification task with 10 different random initializations; compared with alternatives such as random initialization, stacked auto-encoders, and stacked single hidden layers with supervised learning, it achieved both smaller mean false recognition rates and smaller standard deviations.
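To make the first contribution concrete, here is a minimal log-linear (maximum-entropy) sketch of a semantic language model. The feature function, weight table, and vocabulary are hypothetical stand-ins for illustration, not the thesis's actual feature set.

    import math

    # Minimal log-linear (maximum-entropy) LM sketch: lexical and semantic
    # features share one exponential model.  All names are illustrative.
    def maxent_prob(word, history, concept, features, weights, vocab):
        def score(w):
            return math.exp(sum(weights.get(f, 0.0)
                                for f in features(w, history, concept)))
        z = sum(score(w) for w in vocab)   # partition function Z(h, c)
        return score(word) / z             # P(w | h, c)

    # Toy usage: one lexical bigram feature plus one semantic concept feature.
    feats = lambda w, h, c: [("bigram", h, w), ("concept", c, w)]
    weights = {("bigram", "the", "market"): 1.2,
               ("concept", "FINANCE", "market"): 0.8}
    p = maxent_prob("market", "the", "FINANCE", feats, weights,
                    ["market", "game"])    # ~0.88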
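For the second contribution, once the topic posterior theta has been inferred by the trained LDA model, the adaptation step reduces to a weighted sum over the cluster LMs. A minimal sketch, assuming theta is given and each domain-specific LM is a toy bigram table (the floor value stands in for real smoothing):

    # Adapted LM as a linear mixture of K domain-specific LMs, weighted by
    # the LDA topic posterior theta for the current sentence (assumed given).
    def adapted_prob(word, history, domain_lms, theta):
        # P_adapted(w | h) = sum_k theta_k * P_k(w | h)
        return sum(t * lm.get((history, word), 1e-10)
                   for t, lm in zip(theta, domain_lms))

    # Toy usage: two bigram tables standing in for trained cluster LMs.
    lm_a = {("stock", "market"): 0.40}
    lm_b = {("stock", "market"): 0.01}
    theta = [0.8, 0.2]   # inferred by the trained LDA model
    p = adapted_prob("market", "stock", [lm_a, lm_b], theta)  # 0.322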
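For the third contribution, the conversion of a learned topic-word matrix into a feed-forward layer is a pseudo-inverse followed by a ReLU. A numpy sketch under the assumption that B stores one word distribution per topic row:

    import numpy as np

    # Under LDA a document's word-count vector x (V,) satisfies x ~ theta @ B
    # for the topic-word matrix B (K x V), so theta can be approximated via
    # the pseudo-inverse of B; the ReLU clips the least-squares solution to
    # keep the activations non-negative, as described in the abstract.
    def lda_layer(X, B):
        """X: (n_docs, V) word counts; B: (K, V) topic-word matrix."""
        W = np.linalg.pinv(B)            # (V, K) feed-forward weights
        return np.maximum(X @ W, 0.0)    # ReLU -> approximate topic activations

    # Toy usage; stacking repeats this with a new LDA model fit on H.
    rng = np.random.default_rng(0)
    B = rng.dirichlet(np.ones(50), size=8)   # 8 topics over a 50-word vocab
    X = rng.poisson(1.0, size=(4, 50))       # 4 toy documents
    H = lda_layer(X, B)                      # (4, 8) first-layer activations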
Finally, we study speech recognition that is robust to variations in speaking rate. We propose a new deep neural network (DNN) structure designed for this purpose: a max-pooling layer inside the DNN, with a weight-coupling constraint imposed at that layer to provide robustness across speaking rates. The proposed DNN model is applied to a conversational telephony database (the Switchboard corpus), and we study its speaking-rate invariance.
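The abstract does not spell out the exact pooling architecture; the sketch below is one possible reading, in which several time-rescaled views of the input share a single (coupled) weight matrix and a per-unit max selects the best-matching rate. The nearest-neighbour resampling and all names are assumptions made for illustration.

    import numpy as np

    def coupled_maxpool(frames, W, b, scales=(0.8, 1.0, 1.2)):
        """frames: (T, D) acoustic features; W: (D, H), b: (H,) shared weights.

        Each speaking-rate hypothesis resamples the frame sequence in time,
        applies the SAME (weight-coupled) affine map, and the max-pool keeps,
        per hidden unit, the best-matching rate.
        """
        T = frames.shape[0]
        acts = []
        for s in scales:
            idx = np.clip((np.arange(T) * s).astype(int), 0, T - 1)
            acts.append(np.maximum(frames[idx] @ W + b, 0.0))  # shared W, b
        return np.max(np.stack(acts), axis=0)  # max over rate hypotheses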
Advisors
Lee, Soo-Young
Publisher
Korea Advanced Institute of Science and Technology (KAIST)
Issue Date
2016
Identifier
325007
Language
eng
Description

Thesis (Ph.D.) - Korea Advanced Institute of Science and Technology (KAIST), Department of Bio and Brain Engineering, 2016.8, [v, 52 p.]

Keywords

Document Clustering; Language Model; Semantic Language Model; Language Model Adaptation; Deep Neural Network; Spontaneous Speech Recognition

URI
http://hdl.handle.net/10203/221153
Link
http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=663092&flag=dissertation
Appears in Collection
BiS-Theses_Ph.D. (Ph.D. Theses)
Files in This Item
There are no files associated with this item.
