Very low bit-rate speech coding using perceptual properties of human ear = 인간의 청각 특성을 이용한 극저전송률 음성 부호화

A major application of speech processing is the digital coding of speech signals for efficient, secure storage and transmission. Because minimizing the bit-rate is the ultimate aim in these applications, it is very important to determine the speech model parameters accurately and to quantize them with as few bits as possible without introducing additional perceptual distortion. Considerable research has been devoted to encoding the speech signal efficiently at the lowest possible bit-rate. Among the resulting coders, multi-band linear predictive coding (MB-LPC) vocoders can produce natural-quality speech at a bit-rate as low as 1.2 kbit/s. Although the 1.2 kbit/s MB-LPC vocoder performs well in most cases, further bit-rate reductions can be achieved by addressing several issues. One is that the voiced/unvoiced decisions of the MB-LPC vocoder are binary values (voiced or unvoiced), so they cannot easily be interpolated between neighboring frames, which is necessary to reduce the total bit-rate. Another issue is the distortion measure that decides how the quantization and interpolation of model parameters should be performed. In the MB-LPC vocoder, all model parameters are determined and quantized to minimize the spectral distortion (SD) between the original and synthesized spectra. Since the SD is not exactly proportional to the perceptual distortion actually perceived by the human ear, the coding performance can be further improved by exploiting the perceptual properties of the human auditory system. In this thesis, we propose a new mixed critical band linear predictive coding (MCB-LPC) speech model to overcome the major drawbacks of the MB-LPC speech model. In the MCB-LPC speech model, the excitation signal is represented by a real-valued function of the voiced/unvoiced components on the frequency axis instead of the binary voiced/unvoiced decisions of the MB-LPC speech model. This allows the voiced and unvoiced components to be mixed together within the same frequency region and ma...
Advisors
Oh, Yung-Hwan (오영환)
Publisher
Korea Advanced Institute of Science and Technology (KAIST, 한국과학기술원)
Issue Date
2002
Identifier
174640/325007 / 000975416
Language
eng
Description

Thesis (Ph.D.) - Korea Advanced Institute of Science and Technology (KAIST) : Department of Computer Science, 2002.2, [ xii, 95 p. ]

Keywords

line spectral frequency; perceptual properties; very low bit-rate speech coding; vector quantization

URI
http://hdl.handle.net/10203/33194
Link
http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=174640&flag=t
Appears in Collection
CS-Theses_Ph.D.(박사논문)
Files in This Item
There are no files associated with this item.