An Approach to Spam Comment Detection through Domain-independent Features

Previous research in spam detection, especially in e-mail spam filtering, has mainly focused on learning a set of discriminative features that are often present in spam content. Such commercially oriented spam is now detected well; the real challenge lies in filtering vaguer spam that does not exhibit distinctive spam keywords. We investigate two ways of detecting such spam: 1) comparing the similarity between publisher posts and user comments, and 2) learning a single representative meta-feature such as the user name or ID. The first measure relieves us from repeatedly learning a set of domain-dependent spam features, and the second enables us to detect potential spam users even before they take aggressive actions. Before the language model comparison in the first method, we supplement background information, normalize the text, perform co-reference resolution, and apply a word-to-word similarity measure, in the hope of enriching the language models and improving classification accuracy. To evaluate the first measure, we conduct experiments on detecting blog spam comments. For the second measure, we apply an SVM to the ID space of e-mail data collected by "Apache Spam Assassin".
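The abstract does not specify how the language models of a post and a comment are compared. As a rough illustration only, the sketch below assumes unigram models with Jelinek-Mercer smoothing and uses KL divergence as the comparison score; the function names, the smoothing weight `lam`, and the pooled background model are all hypothetical choices, and the enrichment steps (background information, co-reference resolution, word-to-word similarity) are not implemented.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split into alphanumeric tokens (a crude stand-in for text normalization)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def unigram_model(tokens, vocab, background, lam):
    """Unigram distribution over vocab, interpolated with a background model (assumed smoothing)."""
    if not tokens:
        # With no evidence, fall back entirely on the background distribution.
        return dict(background)
    counts = Counter(tokens)
    total = len(tokens)
    return {w: (1 - lam) * counts[w] / total + lam * background[w] for w in vocab}


def kl_divergence(p, q):
    """KL(p || q) for two distributions defined over the same vocabulary."""
    return sum(p[w] * math.log(p[w] / q[w]) for w in p if p[w] > 0)


def post_comment_divergence(post, comment, lam=0.5):
    """Score how far a comment's language model is from the post's (higher = less similar)."""
    post_tokens = tokenize(post)
    comment_tokens = tokenize(comment)
    pooled = post_tokens + comment_tokens
    if not pooled:
        return 0.0
    vocab = set(pooled)
    pooled_counts = Counter(pooled)
    # Hypothetical background model: relative frequencies in the pooled text.
    background = {w: pooled_counts[w] / len(pooled) for w in vocab}
    p = unigram_model(post_tokens, vocab, background, lam)
    q = unigram_model(comment_tokens, vocab, background, lam)
    return kl_divergence(p, q)


if __name__ == "__main__":
    post = "The new camera has a fast lens and great low light performance"
    on_topic = "I love the lens on this camera, low light shots look great"
    off_topic = "Visit my site for cheap watches and free pills"
    print(post_comment_divergence(post, on_topic))
    print(post_comment_divergence(post, off_topic))
```

Under these assumptions, a comment whose score exceeds some threshold would be flagged as a spam candidate; how that threshold is chosen, and whether the paper uses a divergence of this kind at all, is not stated in the abstract.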
Publisher
Korean Institute of Information Scientists and Engineers (KIISE)
Issue Date
2016-01-19
Language
English
Citation

2016 International Conference on Big Data and Smart Computing (BigComp2016), pp. 273-276

DOI
10.1109/BIGCOMP.2016.7425926
URI
http://hdl.handle.net/10203/214365
Appears in Collection
CS-Conference Papers(학술회의논문)