Study on the N-gram measure based flame detection in Korean online messages = N-gram을 이용한 인터넷 게시판에서의 상호 비방 척도 알고리즘에 대한 연구

People often use the internet to express their opinions on specific issues or to find information. Flames among online messages disrupt those uses. In this paper, I propose a heuristic method that automatically detects flames in online messages using an n-gram language model. The work focuses on flaming on Korean web sites, but the system can be applied to other languages. The method extracts features based on n-grams and scores each feature heuristically. The proposed algorithm outperforms a word-based algorithm in both accuracy and recall because it handles two problems that word-based matching does not: variant spellings of words and omitted spaces between words. In the evaluation, I compare the proposed method with the word-based algorithm and with an n-gram-based algorithm that uses a support vector machine (SVM) learner. Although the proposed algorithm requires no stemming or tagging, its accuracy is 10% higher than that of the word-based algorithm.
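The abstract does not give the feature definition or the scoring heuristic, so the following is only a minimal sketch of how character n-gram scoring of this kind might look. The function names (`char_ngrams`, `train_scores`, `flame_score`), the choice of bigrams, and the count-difference score are all assumptions made for illustration, not the thesis's actual method.

```python
from collections import Counter


def char_ngrams(text, n=2):
    """Return character n-grams of text after removing all whitespace.

    Dropping whitespace first makes the features insensitive to
    omitted or inserted spaces (an assumed way to handle that problem).
    """
    compact = "".join(text.split())
    return [compact[i:i + n] for i in range(len(compact) - n + 1)]


def train_scores(flames, normals, n=2):
    """Assign each n-gram a score in [-1, 1] (hypothetical heuristic).

    +1 means the n-gram was seen only in flame messages,
    -1 means it was seen only in normal messages.
    """
    flame_counts = Counter(g for t in flames for g in char_ngrams(t, n))
    normal_counts = Counter(g for t in normals for g in char_ngrams(t, n))
    scores = {}
    for gram in set(flame_counts) | set(normal_counts):
        f, c = flame_counts[gram], normal_counts[gram]
        scores[gram] = (f - c) / (f + c)
    return scores


def flame_score(text, scores, n=2):
    """Average the scores of a message's n-grams; unseen n-grams count as 0."""
    grams = char_ngrams(text, n)
    if not grams:
        return 0.0
    return sum(scores.get(g, 0.0) for g in grams) / len(grams)
```

Under these assumptions, a message would be flagged as a flame when `flame_score` exceeds some threshold chosen on held-out data. Because the n-grams are taken over characters rather than words, no stemming or tagging step is needed, and a spelling variant still shares most of its n-grams with the original word.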
Advisors
Hahn, Min-Soo (한민수)
Publisher
Information and Communications University (한국정보통신대학교)
Issue Date
2008
Identifier
392954/225023 / 020054673
Language
eng
Description

Thesis (Master's) - Information and Communications University (한국정보통신대학교) : School of Engineering (공학부), 2008.2, [ iv, 45 p. ]

Keywords

Sentiment Analysis; Text Mining; Flame; 악플 (malicious comments); 비방 (slander); 텍스트 마이닝 (text mining)

URI
http://hdl.handle.net/10203/54983
Link
http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=392954&flag=t
Appears in Collection
School of Engineering-Theses_Master(공학부 석사논문)