This dissertation investigates invariant feature vectors for a consistent speech recognition system. By consistent speech recognition we mean that the recognition accuracy of the system remains high whether or not noise is present. A major barrier to consistent speech recognition is the mismatch between training and testing conditions. The mismatch occurs when the type or level of noise differs from that in the training phase, and it produces discrepancies among feature vectors for the same phonetic unit. This mismatch in feature vectors in turn degrades speech recognition performance.
We address the mismatch problem by constructing consistent speech feature extraction methods. Three novel feature extraction methods are proposed in order to achieve consistent speech recognition. They are based on the mechanisms of human hearing and speech production.
First, we propose a consistent feature extraction algorithm that employs a sub-pitch-based speech analysis method. Sub-pitch-based speech analysis is derived from the speech production mechanism, in particular the glottal waveform of voiced sounds. The algorithm is also motivated by the human hearing mechanism, in which pitch information is used to segregate concurrent vowels. The proposed algorithm has two advantages: it extracts spectral information efficiently for women's voices, and it yields consistent feature vectors even when environmental mismatch occurs. These advantages arise because the sub-pitch-based analysis method operates on a short segment that is shorter than the pitch period and has relatively high energy within that period. Using this short segment counteracts the tendency to include pitch harmonics in the spectrum of female speech and limits the inclusion of spectral information from environmental noise.
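To make the idea of a short, high-energy segment inside one pitch period concrete, the following is a minimal sketch only, not the algorithm developed in the dissertation. It assumes that the boundaries of one pitch period are already known and that the analysis segment is simply the contiguous window with the largest energy. The function name `highest_energy_segment`, the segment length, and the synthetic decaying-cosine pitch period are all hypothetical and chosen purely for illustration.

```python
import math


def highest_energy_segment(period, seg_len):
    """Return (start, segment) for the contiguous window of seg_len samples
    that has the largest energy inside one pitch period.

    Assumption: `period` already holds the samples of exactly one pitch period.
    """
    if not 0 < seg_len <= len(period):
        raise ValueError("seg_len must be between 1 and len(period)")
    # Energy of the first window, then update it as the window slides.
    energy = sum(x * x for x in period[:seg_len])
    best_energy, best_start = energy, 0
    for start in range(1, len(period) - seg_len + 1):
        # Add the sample entering the window, drop the one leaving it.
        energy += period[start + seg_len - 1] ** 2 - period[start - 1] ** 2
        if energy > best_energy:
            best_energy, best_start = energy, start
    return best_start, period[best_start:best_start + seg_len]


# Hypothetical pitch period of 80 samples (100 Hz pitch at 8 kHz sampling):
# silence, then a decaying oscillation excited at sample 20.
period = [
    0.0 if n < 20 else math.exp(-(n - 20) / 8.0) * math.cos(2 * math.pi * (n - 20) / 10.0)
    for n in range(80)
]
start, segment = highest_energy_segment(period, 30)
print(start, len(segment))
```

In this toy signal the selected window begins at the excitation instant, so the spectrum would be computed from a segment shorter than the pitch period. How the dissertation actually locates and sizes the segment is not stated here.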
Next, we propose a pitch-based speaker normalization method that uses classical perceptual knowledge to normalize an individual speaker's spect...