This thesis proposes an improved method for verifying utterances produced by a word spotting system. A baseline word spotting system is implemented. The word spotting task in this thesis is to detect keywords in a database of phone conversations and to categorize the speech data according to the detected keywords. To meet this specific goal, and based on an analysis of the target conversational speech, we build a multi-speaker-dependent word spotting system. The system is based on HMMs, and garbage models are used to model non-keyword intervals. The performance of such systems relies strongly on how well the garbage models represent non-keyword intervals. Even with accurate modeling of keyword and non-keyword intervals, these systems yield low performance. To improve performance, we use a two-pass structure that consists of a word spotting system followed by an utterance verification system.
When utterance verification is applied to word spotting, the conventional LRT-based method, which uses the simple mean of the PLLRs to obtain a confidence measure for each word, suffers from inaccurate keyword boundary information in the recognition results and from unclear pronunciation of words in continuous speech. In this thesis, we therefore propose a method that uses the pattern of PLLRs within each keyword. This pattern information is used to assign a different weight to each phone when generating the confidence measure for a keyword. Because the proposed method uses word-specific information, it discriminates better between in-vocabulary and out-of-vocabulary words. For comparison, we also introduce a similar conventional method that uses PLLR distribution information.
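The difference between the conventional confidence measure and a per-phone weighted one can be illustrated with a minimal sketch. The function names, the example PLLR values, and the weights below are hypothetical and chosen only for illustration; in particular, the way the weights are derived from the PLLR pattern of each keyword is not specified here, so the sketch simply takes them as given and normalizes by their sum.

```python
def mean_confidence(pllrs):
    """Conventional measure: simple mean of the per-phone PLLRs of one keyword."""
    if not pllrs:
        raise ValueError("keyword must contain at least one phone")
    return sum(pllrs) / len(pllrs)


def weighted_confidence(pllrs, weights):
    """Illustrative weighted measure: each phone's PLLR is scaled by a
    keyword-specific weight (assumed given), then normalized by the weight sum."""
    if len(pllrs) != len(weights):
        raise ValueError("need exactly one weight per phone")
    total = sum(weights)
    if total == 0:
        raise ValueError("weights must not sum to zero")
    return sum(w * s for w, s in zip(weights, pllrs)) / total


# Hypothetical keyword with three phones; the middle phone is unreliable
# (for example, unclearly pronounced) and is given zero weight.
pllrs = [2.0, -1.0, 3.0]
weights = [0.5, 0.0, 0.5]

print(mean_confidence(pllrs))               # simple mean over all phones
print(weighted_confidence(pllrs, weights))  # unreliable phone is ignored
```

In this toy case the weighted score is higher than the simple mean because the unreliable phone no longer pulls the keyword's confidence down, which is the kind of effect a word-specific weighting is meant to capture.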
Experiments are performed on speech data consisting of 500 phone conversations between customers and call center operators.
Experimental results for utterance verification show that the proposed method achieves a performance improvement of 11.8% compared to a baseline LRT-based meth...