DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | Lee, Soo-Young | - |
dc.contributor.advisor | 이수영 | - |
dc.contributor.author | Lee, Cheong-An | - |
dc.contributor.author | 이청안 | - |
dc.date.accessioned | 2013-09-12T02:01:38Z | - |
dc.date.available | 2013-09-12T02:01:38Z | - |
dc.date.issued | 2013 | - |
dc.identifier.uri | http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=513315&flag=dissertation | - |
dc.identifier.uri | http://hdl.handle.net/10203/180995 | - |
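dc.description | Thesis (Master's) - Korea Advanced Institute of Science and Technology (KAIST) : Department of Electrical and Electronic Engineering, 2013.2, [ v, 57 p. ] | - |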
dc.description.abstract | Sentiment classification is the task of determining the overall contextual polarity of a review document. Companies can use it to identify problems with their products or services from large volumes of data, and customers can use it to decide which products or services to buy. Sentiment classification poses two main difficulties. First, documents are usually represented with a bag-of-words model, and the dimensionality of such data is very large, so methods are needed to extract features or reduce the number of dimensions. Second, if the training data and the test data come from different domains, performance decreases severely; however, it is hard to obtain labeled data for every domain of interest. To extract features or reduce the dimensionality, we tried three methods: principal component analysis (PCA), conditional entropy (CE), and independent component analysis (ICA). Using PCA, we can reduce the dimensionality without any loss of information. By slightly changing how the probabilities are estimated, we obtain a more balanced estimate of CE, which gives robust recognition across different numbers of selected features. ICA makes the features independent, so it was expected to give better results when used together with CE; however, experiments suggest that ICA is not useful for CE. To resolve the problem of domain difference, we propose a domain-adapting Boltzmann machine algorithm. The main difference between domains comes from the vocabulary used in each domain, so our approach is to generate target-domain words that do not appear in the source domain, and vice versa. In this thesis, we first apply this idea to a simple toy problem and then to a real-world problem. Our algorithm improved the classification accuracy. | eng |
dc.language | eng | - |
dc.publisher | Korea Advanced Institute of Science and Technology (KAIST) | - |
dc.subject | sentiment classification | - |
dc.subject | domain adaptation | - |
dc.subject | Boltzmann machine | - |
dc.subject | conditional entropy | - |
dc.subject | 의견 분류 | - |
dc.subject | 도메인 적응 | - |
dc.subject | 볼츠만 머신 | - |
dc.subject | 조건부 엔트로피 | - |
dc.subject | 독립 요소 분석 | - |
dc.subject | independent component analysis | - |
dc.title | Domain adaptation in sentiment classification based on probabilistic models | - |
dc.title.alternative | 확률 모델에 기반한 의견 분류에서의 도메인 적응 | - |
dc.type | Thesis(Master) | - |
dc.identifier.CNRN | 513315/325007 | - |
dc.description.department | Korea Advanced Institute of Science and Technology (KAIST) : Department of Electrical and Electronic Engineering | - |
dc.identifier.uid | 020113491 | - |
dc.contributor.localauthor | Lee, Soo-Young | - |
dc.contributor.localauthor | 이수영 | - |