Speech recognition is considered one of the most natural means of man-machine interaction. Many studies have sought to achieve low error rates for speakers with various characteristics, and speech recognition systems have recently achieved increasingly good performance. Even so, a speaker-dependent (SD) system generally outperforms a speaker-independent (SI) system when both are tested on the same speaker. SI systems are nevertheless more common in real applications, because SD systems require a large amount of training data. Speaker adaptation is a technique for producing a system suited to a specific speaker from an SI system, using a small amount of adaptation data from that speaker. Researchers are now particularly concerned with rapid speaker adaptation, that is, speaker adaptation using a very small amount of data (around 30 seconds or less), since the range of applications that cannot request a long speech sample as adaptation data has been growing. Speaker adaptation in eigenvoice space is a popular method for rapid speaker adaptation. This technique constrains the adapted model to a linear combination of a small number of basis vectors, called eigenvoices, obtained from a set of reference speakers, thereby reducing the number of free parameters to be estimated. The eigenvoice adaptation method performs well given a very small amount of adaptation data, but it has some problems.
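The linear-combination constraint described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the four-dimensional "model", the two orthonormal eigenvoices, the numbers, and the function names are hypothetical, and the weights are obtained by a simple projection, whereas practical eigenvoice systems typically estimate them by maximum likelihood from the adaptation data. The point is only that the adapted model is described by K weights rather than by all D model parameters.

```python
# Illustrative sketch only: toy values, not the implementation used in this thesis.

def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def estimate_weights(observed, mean, eigenvoices):
    """Project the mean-centered observation onto each eigenvoice.

    Assumes the eigenvoices are orthonormal, so each weight is a dot product.
    (Simplification: real systems usually use a maximum-likelihood estimate.)
    """
    centered = [o - m for o, m in zip(observed, mean)]
    return [dot(centered, e) for e in eigenvoices]

def adapt(mean, eigenvoices, weights):
    """Adapted model = mean + sum_k weights[k] * eigenvoices[k]."""
    adapted = list(mean)
    for w, e in zip(weights, eigenvoices):
        adapted = [a + w * x for a, x in zip(adapted, e)]
    return adapted

# Hypothetical model with D = 4 parameters and K = 2 eigenvoices.
mean = [1.0, 2.0, 3.0, 4.0]
eigenvoices = [
    [0.5, 0.5, 0.5, 0.5],
    [0.5, -0.5, 0.5, -0.5],
]
# Rough estimate of the new speaker's parameters from adaptation data.
observed = [2.0, 2.0, 4.0, 5.0]

weights = estimate_weights(observed, mean, eigenvoices)  # K free parameters
adapted = adapt(mean, eigenvoices, weights)              # stays in the eigenvoice span
```

In this toy case only two numbers are estimated instead of four, and the adapted vector is the closest point to `observed` within the span of the eigenvoices, not `observed` itself. With so few weights the estimate is robust, but the span also limits how closely any speaker can be matched.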
One drawback of the technique is that the recognition rate of the adapted model reaches a plateau quite quickly as the amount of adaptation data increases. This is because the number of free parameters is too small to represent a sophisticated model. Conversely, overfitting may occur when a model has too many free parameters relative to the amount of adaptation data. To solve this problem, a method is needed that controls the number of free parameters according to the amount of adaptation data. In this thesis, we propose speaker adaptation using structural eigenvoices. In this method, we can decide the number ...