Online speaker segmentation and clustering of spoken documents = 음성 문서의 온라인 화자분할 및 군집화

As a variety of multimedia data, such as broadcast news, entertainment, and educational materials, is produced every day and spread over the internet, content retrieval technologies have become essential for searching and managing such large amounts of data. Accordingly, interest in spoken document retrieval is growing, as research on speech and speaker recognition has led to major technical breakthroughs on smart devices. Spoken documents contain speech from various speakers, so speaker diarization, also called speaker indexing, is important for retrieval. Speaker diarization determines how many speakers are included in a given spoken document and partitions the document into homogeneous segments according to each speaker's identity. This task answers the question "Who spoke when?", whereas speaker recognition addresses the question "Who spoke?". Speaker diarization consists of three processes: speech detection, speaker segmentation, and segment clustering. This dissertation proposes an online speaker segmentation and clustering technique for spoken documents, intended for speaker diarization systems. Speaker segmentation finds the speaker change points so that each segment contains only one speaker's speech. It has various applications, such as preprocessing for audio indexing, speaker tracking, and information extraction. The most popular criterion used in unsupervised speaker segmentation is the Bayesian Information Criterion (BIC). Conventional BIC-based speaker segmentation takes an analysis window, a fixed-size span of speech data shifted over the audio stream, divides it into two speech streams, and models each stream with a single Gaussian. The dissimilarity between the two independent models is then estimated according to the BIC principle. This approach has been successfully applied to speaker segmentation. However, it tends to fail to detect speaker changes for short speech segments since it is hard to represent...
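The conventional BIC test described in the abstract can be illustrated with a minimal sketch. It is not the dissertation's proposed method; it only shows the standard full-covariance Gaussian ΔBIC baseline, where a positive score at a candidate split suggests a speaker change. The function names, the `margin` and `lam` parameters, and the small ridge term added to the covariance are all assumptions made for this illustration.

```python
import numpy as np


def logdet_cov(frames):
    """Log-determinant of the maximum-likelihood covariance of the frames."""
    cov = np.atleast_2d(np.cov(frames, rowvar=False, bias=True))
    # Small ridge (an assumption of this sketch) keeps the matrix well conditioned.
    cov = cov + 1e-6 * np.eye(cov.shape[0])
    _, logdet = np.linalg.slogdet(cov)
    return logdet


def delta_bic(window, split, lam=1.0):
    """Delta-BIC for splitting `window` (n_frames x n_dims) at frame `split`.

    One Gaussian for the whole window is compared against two Gaussians,
    one per side of the split. Positive values favour a speaker change.
    """
    n, d = window.shape
    left, right = window[:split], window[split:]
    gain = 0.5 * (
        n * logdet_cov(window)
        - len(left) * logdet_cov(left)
        - len(right) * logdet_cov(right)
    )
    # Penalty for the extra mean vector and covariance matrix.
    penalty = 0.5 * lam * (d + 0.5 * d * (d + 1)) * np.log(n)
    return gain - penalty


def best_change_point(window, margin=20, lam=1.0):
    """Return (frame index or None, score) for the best split in the window."""
    candidates = list(range(margin, len(window) - margin))
    scores = [delta_bic(window, i, lam) for i in candidates]
    k = int(np.argmax(scores))
    if scores[k] > 0:
        return candidates[k], scores[k]
    return None, scores[k]
```

The `margin` keeps enough frames on each side to estimate a covariance matrix. This also hints at the limitation the abstract raises: with short segments, too few frames are available to fit either Gaussian reliably.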
Oh, Yung-Hwan (오영환)
Korea Advanced Institute of Science and Technology (KAIST): Department of Computer Science
Issue Date
466471/325007  / 020037223

Thesis (Ph.D.) - Korea Advanced Institute of Science and Technology (KAIST): Department of Computer Science, 2011.2, [ix, 67 p.]


local UBM; relative-GLR; intra-GLR; online speaker indexing; speaker segmentation; speaker clustering; online speaker segmentation; online speaker clustering
