DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | Kim, Hoi-Rin | - |
dc.contributor.advisor | 김회린 | - |
dc.contributor.author | Kim, Myung-Jong | - |
dc.contributor.author | 김명종 | - |
dc.date.accessioned | 2011-12-14T02:30:05Z | - |
dc.date.available | 2011-12-14T02:30:05Z | - |
dc.date.issued | 2010 | - |
dc.identifier.uri | http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=455270&flag=dissertation | - |
dc.identifier.uri | http://hdl.handle.net/10203/40139 | - |
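dc.description | Thesis (Master's) - Korea Advanced Institute of Science and Technology (KAIST) : Department of Information and Communications Engineering, 2010.08, [ viii, 48 p. ] | - |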
dc.description.abstract | With the rapid spread of user-created content (UCC), a wide variety of multimedia content, ranging from general content such as music and movies to malicious content such as adult videos, is now easily produced and shared over the Internet. This trend can negatively affect children and teenagers and can lead to sexual crimes. Consequently, analyzing multimedia content to decide whether it is malicious has recently received considerable attention from researchers and social groups. This thesis addresses the problem of analyzing multimedia content based on its audio signal in order to detect and block objectionable content. Malicious sounds such as sexual screams or moans are characterized by large temporal variations and fast spectral transitions, so extracting features that properly represent these characteristics is important for achieving good performance. In this thesis, we employ segment-based two-dimensional Mel-frequency cepstral coefficients and histograms of gradient directions as a feature set to characterize both the temporal variations and the spectral transitions within a long-range segment of the target signal. Gaussian mixture models (GMMs) are adopted to statistically represent the malicious and non-malicious sounds, and test sounds are classified by the maximum a posteriori (MAP) rule. Evaluated on a database of several hundred malicious and non-malicious sound clips, the proposed feature extraction method achieved a classification accuracy of 96.06 %, which suggests that it could serve as an alternative to image-based methods. | eng |
dc.language | eng | - |
dc.publisher | Korea Advanced Institute of Science and Technology (KAIST) | - |
dc.subject | Histograms of Oriented Gradients | - |
dc.subject | Segmental Two-Dimensional Mel-Frequency Cepstral Coefficients | - |
dc.subject | Multimedia Content Analysis | - |
dc.subject | Audio Feature Extraction | - |
dc.subject | Gaussian Mixture Model | - |
dc.subject | 가우시안 혼합 모델 | - |
dc.subject | 기울기방향성 히스토그램 | - |
dc.subject | 세그먼트기반 2차 멜켑스트럼계수 | - |
dc.subject | 멀티미디어 내용 분석 | - |
dc.subject | 오디오 특징 추출 | - |
dc.title | Audio feature extraction methods for multimedia content analysis | - |
dc.title.alternative | 멀티미디어 내용 분석을 위한 오디오 특징 추출 방법 | - |
dc.type | Thesis(Master) | - |
dc.identifier.CNRN | 455270/325007 | - |
dc.description.department | Korea Advanced Institute of Science and Technology (KAIST) : Department of Information and Communications Engineering | - |
dc.identifier.uid | 020084297 | - |
dc.contributor.localauthor | Kim, Hoi-Rin | - |
dc.contributor.localauthor | 김회린 | - |