The two major factors affecting speaker identification performance are the degradation introduced by noisy communication channels and the mismatch between the properties of the training and testing data. Over the last several years, Gaussian Mixture Models (GMMs) have become very popular in speaker identification systems and have proven to perform very well on clean wideband speech. However, in noisy environments or on noisy band-limited telephone speech, their performance degrades considerably. It is also well known that a speaker's voice changes over time because of varying factors such as verbal usage, vocal tract, mood, and health.
In this paper, to cope with these mismatches, we propose the use of prosodic features such as the mean pitch value in voiced intervals, while weighted filter bank analysis (WFBA) is adopted to increase the discriminating capability of mel-frequency cepstral coefficients (MFCCs) for speaker identification.
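As an illustration of the prosodic feature named above, the following minimal sketch computes the mean pitch value of each voiced interval from a frame-level pitch track. It rests on assumptions not stated in the text: the pitch track is a sequence of per-frame F0 values in Hz, unvoiced frames are marked with 0, and the function name `mean_pitch_per_voiced_interval` is hypothetical. It is not the feature extraction procedure actually used in this work.

```python
def mean_pitch_per_voiced_interval(f0):
    """Return the mean F0 of each contiguous run of voiced frames.

    Assumption: `f0` holds one pitch value (Hz) per frame, and a value
    of 0 marks an unvoiced frame.
    """
    means = []
    run = []  # pitch values of the voiced interval currently being read
    for value in f0:
        if value > 0:
            run.append(value)
        elif run:
            # An unvoiced frame closes the current voiced interval.
            means.append(sum(run) / len(run))
            run = []
    if run:
        # The track ended while still inside a voiced interval.
        means.append(sum(run) / len(run))
    return means


# Hypothetical pitch track: two voiced intervals separated by unvoiced frames.
track = [0, 100.0, 110.0, 0, 0, 200.0, 0]
interval_means = mean_pitch_per_voiced_interval(track)
```

How such values are combined with the MFCC features is not specified here, so the sketch stops at the per-interval means.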
In addition, this work includes an exhaustive study of several environments and their combinations in order to produce the most robust speaker identification results. The DWFBA method yields an error reduction rate of 2.77% to 4.65%, the method that adds pitch information yields 21.62% to 45.39%, and the combination of DWFBA and pitch information yields 31.35% to 45.39%, compared with the baseline Gaussian Mixture Model.
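The error reduction rates above are not defined in this section. A common convention, assumed here, is the relative reduction in identification error with respect to the baseline. The sketch below shows that arithmetic; the function name and the example error rates are hypothetical and are not results from this work.

```python
def error_reduction_rate(baseline_error, new_error):
    """Relative error reduction in percent.

    Assumed definition: 100 * (baseline_error - new_error) / baseline_error,
    where both arguments are error rates in the same unit.
    """
    if baseline_error <= 0:
        raise ValueError("baseline_error must be positive")
    return 100.0 * (baseline_error - new_error) / baseline_error


# Hypothetical example: a baseline error of 20% reduced to 15%.
example_rate = error_reduction_rate(20.0, 15.0)
```

Under this definition, a drop from 20% to 15% identification error is a 25% error reduction rate, even though the absolute improvement is only 5 percentage points.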