DSpace at KOASAS: Very low bit-rate speech coding using perceptual properties of human ear


Very low bit-rate speech coding using perceptual properties of human ear (Korean title: 인간의 청각 특성을 이용한 극저전송률 음성 부호화)


Han, Woo-Jin / 한우진

A major application of speech processing is digitally coding the speech signal for efficient, secure storage and transmission. Because minimizing the bit-rate is the ultimate aim in these applications, it is very important to determine speech model parameters accurately and to quantize them with as few bits as possible without introducing additional perceptual distortion. There has been considerable research on encoding the speech signal efficiently at the lowest possible bit-rates. Among these efforts, multi-band linear predictive coding (MB-LPC) vocoders can produce natural-quality speech at a bit-rate as low as 1.2 kbit/s. Although the 1.2 kbit/s MB-LPC vocoder performs well in most cases, further bit-rate reductions can be achieved by addressing several issues. One is that the voiced/unvoiced decisions of the MB-LPC vocoder are binary values (voiced or unvoiced), so interpolating them between neighboring frames, which is necessary to reduce the total bit-rate, cannot be done easily. Another issue is the distortion measure that decides how the quantization and interpolation of model parameters should be performed. In the MB-LPC vocoder, all model parameters are determined and quantized to minimize the spectral distortion (SD) between the original and synthesized spectra. Since the SD is not exactly proportional to the perceptual distortion actually perceived by the human ear, the coding performance can be further improved by using the perceptual properties of the human auditory system. In this thesis, we propose a new mixed critical band linear predictive coding (MCB-LPC) speech model to overcome the major drawbacks of the MB-LPC speech model. In the MCB-LPC speech model, the excitation signal is represented by a real-valued function of the voiced/unvoiced components on the frequency axis instead of the binary voiced/unvoiced decisions of the MB-LPC speech model. This allows the voiced and unvoiced components to be mixed together within the same frequency region and ma...
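The two ideas in the abstract, a spectral distortion measure and real-valued voicing that can be interpolated between frames, can be illustrated with a minimal Python sketch. It is not the thesis's implementation. The function names, the per-band list representation, the RMS log-spectral form of SD, and the linear blending of band amplitudes are all assumptions made for illustration only.

```python
import math


def spectral_distortion_db(power_a, power_b):
    """RMS difference between two power spectra, measured in dB.

    Assumption: SD is taken as the common root-mean-square
    log-spectral distance; the thesis may define it differently.
    """
    if len(power_a) != len(power_b) or not power_a:
        raise ValueError("spectra must be non-empty and equally long")
    diffs = [10.0 * math.log10(a) - 10.0 * math.log10(b)
             for a, b in zip(power_a, power_b)]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))


def binary_band_excitation(voiced, unvoiced, decisions):
    """Binary decisions: each band is entirely voiced or unvoiced."""
    return [v if d else u for v, u, d in zip(voiced, unvoiced, decisions)]


def mixed_band_excitation(voiced, unvoiced, voicing):
    """Real-valued voicing: each band blends both components.

    voicing[i] is a hypothetical weight in [0, 1]; 1.0 means fully
    voiced and 0.0 fully unvoiced (linear blending is an assumption).
    """
    return [w * v + (1.0 - w) * u
            for v, u, w in zip(voiced, unvoiced, voicing)]


def interpolate(prev_values, next_values, t=0.5):
    """Linearly interpolate per-band values between two frames."""
    return [(1.0 - t) * p + t * n
            for p, n in zip(prev_values, next_values)]


if __name__ == "__main__":
    # Voicing weights of two transmitted frames (illustrative numbers).
    frame_prev = [1.0, 0.8, 0.0]
    frame_next = [1.0, 0.2, 0.0]
    # A skipped frame in between can be estimated by interpolation,
    # which has no natural counterpart for True/False decisions.
    skipped = interpolate(frame_prev, frame_next)
    print(skipped)
    print(mixed_band_excitation([2.0, 2.0, 2.0], [4.0, 4.0, 4.0], skipped))
    print(spectral_distortion_db([10.0, 10.0], [1.0, 1.0]))
```

With a weight of 1.0 or 0.0 in every band, `mixed_band_excitation` reduces to the binary case, which is why a real-valued voicing function can be seen as a generalization of per-band voiced/unvoiced decisions.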

Advisors: Oh, Yung-Hwan (오영환)

Description: Korea Advanced Institute of Science and Technology (KAIST) : Computer Science major

Publisher: Korea Advanced Institute of Science and Technology (KAIST)

Issue Date: 2002

Identifier: 174640/325007 / 000975416

Language: eng

Description: Thesis (Ph.D.) - Korea Advanced Institute of Science and Technology (KAIST) : Computer Science major, 2002.2, [ xii, 95 p. ]

Keywords: line spectral frequency; perceptual properties; very low bit-rate speech coding; vector quantization

URI: http://hdl.handle.net/10203/33194

Link: http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=174640&flag=dissertation

Appears in Collection: CS-Theses_Ph.D. (Doctoral Theses)

Files in This Item: There are no files associated with this item.


Knowledge Service Development Team, KAIST 291 Daehak-ro, Yuseong-gu, Daejeon 34141, Republic of Korea. T. 82-42-350-4493 Email. koasas@kaist.ac.kr
Copyright © 2016. Korea Advanced Institute of Science and Technology. All Rights Reserved.
