A query-by-speech scheme for photo albuming (Korean title, translated: A digital photo retrieval technique based on speech queries)

With the wide use of digital cameras, it has become common to store and manage many personal photos on a personal computer (PC). The more photos the PC holds, the harder it is to find a specific one among them. We suggest searching photos with a speech query. If a speech segment corresponding to the input query occurs in the voice documents attached to the photos, the retrieval system returns the list of relevant photos stored on the PC. For this speech-based content retrieval system, we propose two approaches that rely on speech-to-speech matching rather than on speech-to-text conversion. The first uses phoneme recognition techniques for the matching, and the second uses traditional techniques such as vector quantization and dynamic time warping. For the phoneme recognition approach, we consider two methods: one uses phoneme-occurrence information, and the other additionally uses phoneme-sequential information. Both methods use a phoneme recognizer as the baseline process to produce a phoneme sequence from the speech input. The pattern of the phoneme sequence in the query is compared with those in the recorded files, and similarity scores are calculated that represent how similar the query is to each recorded file. In the method based on vector quantization (VQ) and dynamic time warping (DTW), the speech feature vectors are clustered by VQ, and DTW is used to calculate the similarity between the clustered patterns of the query and those of the recorded files. Because DTW is computationally expensive, an alternative scheme is used to reduce the computation. First, the frame sequence is separated into two sequences: one consists of the even-numbered frames of the original frame sequence, and the other consists of the odd-numbered frames. Each sequence is compared with the odd or even...
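The VQ and DTW matching described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation: the function names (`quantize`, `dtw_distance`, `split_even_odd`), the Euclidean local cost, the toy codebook, and the 0-based reading of "even-numbered" frames are all assumptions made for illustration. Because the abstract is truncated before it explains how the even and odd sequences are compared, the sketch only shows the split itself.

```python
from math import dist, inf

def quantize(frames, codebook):
    """Map each feature vector to the index of its nearest codeword (VQ)."""
    return [min(range(len(codebook)), key=lambda k: dist(f, codebook[k]))
            for f in frames]

def dtw_distance(a, b, codebook):
    """Classic DTW between two sequences of codeword indices.

    Assumption: the local cost is the Euclidean distance between codewords.
    """
    n, m = len(a), len(b)
    # d[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = dist(codebook[a[i - 1]], codebook[b[j - 1]])
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]

def split_even_odd(seq):
    """Split a frame sequence into even- and odd-indexed frames (0-based, assumed)."""
    return seq[0::2], seq[1::2]

# Hypothetical two-codeword codebook and toy feature frames.
codebook = [(0.0, 0.0), (1.0, 1.0)]
query = quantize([(0.1, 0.0), (0.9, 1.2), (1.0, 1.0)], codebook)
recorded = quantize([(0.0, 0.1), (1.1, 0.9)], codebook)
score = dtw_distance(query, recorded, codebook)  # lower means more similar
even_frames, odd_frames = split_even_odd(query)
```

A smaller DTW distance means the query and the recorded file are more alike. Each of the two subsequences is about half as long as the original, and the cost of DTW grows with the product of the two sequence lengths, so shortening the sequences reduces the work per comparison.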
Advisors
Kim, Hoi-Rin
Description
Information and Communications University : School of Engineering
Publisher
Information and Communications University
Issue Date
2006
Identifier
392616/225023 / 020034599
Language
eng
Description

Thesis (Master's) - Information and Communications University : School of Engineering, 2006, [ x, 44 p. ]

Keywords

Spoken Document Retrieval; Query by Speech; Speech Query; Speech Information Retrieval

URI
http://hdl.handle.net/10203/55436
Link
http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=392616&flag=dissertation
Appears in Collection
School of Engineering-Theses_Master(공학부 석사논문)
Files in This Item
There are no files associated with this item.
