Rumors are a powerful, pervasive, and persistent force that affects society. Interest in the psychology of rumors and their control has grown since World War II; early studies relied on extensive but manual data collection from books, newspapers, and interviews. Rumors have been defined in numerous ways; the best-known definitions are ‘public communications that are infused with private hypotheses about how the world works’ and ‘ways of making sense to help us cope with our anxieties and uncertainties’. As these definitions suggest, rumors help members of a society learn about its important issues by offering a collective problem-solving framework to those who participate.
With the emergence of microblogging and social networking platforms such as Twitter, Facebook, and Instagram, anyone can easily create and spread any type of information. Nowadays, many people use these media as their main channels for consuming and sharing information as well as for communication. However, the remarkable growth of such media has made rumors, spam, and misinformation far more prevalent, alongside informative and creative content. Given that this unexpected side effect is caused by the absence of censorship, researchers have studied rumor propagation through online social media in order to build rumor classifiers. However, existing studies suffer from two issues. First, their results can be biased by the training data (its field and observation period).
To address the bias problem, the objective of the initial study is to combine rumor theories with data-driven practice. Over a 60-day observation period, temporal, structural, and linguistic features derived from rumor theories deliver more intuitive insights into rumor spreading. For the temporal features, we propose a new model, the Periodic External Shocks (PES) model, which provides a better fit to the periodic bursts unique to rumors that arise from the external shock cycle. For the structural features, we extract properties of the propagation process such as the fraction of isolated spreaders and the information flows from low-degree to high-degree users. For the linguistic features, we examine scores of word-level categories particular to rumors, such as negation and skepticism. In addition to being intuitively interpretable, the proposed features differentiate rumors from other information with classification performance competitive with existing state-of-the-art methods. Furthermore, the underlying theories suggest that the features should be less sensitive to the topic of the training data. This work is one of the first to analyze the underlying process of rumor propagation based on annotated data drawn from a near-complete social media stream at the time of investigation.
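As a rough illustration of the two structural properties named above, the following minimal sketch computes them from a toy propagation record. The function name, the input layout, and the reading of "isolated spreader" as a spreader with no observed flow to or from another spreader are assumptions made for illustration; they are not taken from the dissertation, and "degree" is assumed here to be a precomputed per-user count such as the number of followers.

```python
def structural_features(spreaders, flows, degree):
    """Return (fraction of isolated spreaders, fraction of low-to-high flows).

    spreaders: set of user ids that posted about the topic (hypothetical input)
    flows:     list of (source, target) pairs, meaning information passed
               from source to target (hypothetical input)
    degree:    dict mapping user id to a degree value, assumed precomputed
    """
    # A spreader counts as connected if it appears in at least one flow.
    connected = set()
    for source, target in flows:
        connected.add(source)
        connected.add(target)

    isolated = [user for user in spreaders if user not in connected]
    frac_isolated = len(isolated) / len(spreaders) if spreaders else 0.0

    # A flow is "low to high" when the source has a lower degree than the target.
    low_to_high = sum(1 for s, t in flows if degree[s] < degree[t])
    frac_low_to_high = low_to_high / len(flows) if flows else 0.0

    return frac_isolated, frac_low_to_high


example = structural_features(
    spreaders={"a", "b", "c", "d"},
    flows=[("a", "b"), ("b", "c")],
    degree={"a": 10, "b": 100, "c": 50, "d": 5},
)
print(example)
```

In this toy input, user "d" takes part in no flow, and only the flow from "a" to "b" goes from a lower-degree to a higher-degree user.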
Next, we characterize how rumor propagation patterns change over time, from the first three days to nearly two months, based on near-complete Twitter data. A comprehensive set of user, structural, linguistic, and temporal features was examined, and their relative strength as key rumor traits was compared over varying time windows. Structural and temporal features could effectively distinguish rumors from non-rumors over a long-term window, yet they were not effective during the initial phase of rumors. In contrast, user and linguistic features remained strong indicators throughout the rumor propagation phases. These findings provide new insights for understanding rumor propagation processes and for developing an algorithm for early detection of rumors. Furthermore, the linguistic features in this study are less sensitive to the field of the collected data (e.g., IT, health, or music) than those in other studies because of the vocabularies selected to extract them: words describing thinking styles (e.g., maybe, perhaps, but, not, and never) and cognitive mechanisms (e.g., cause, know, and ought). We now have insight into how the predictive power of different feature sets changes with the observation period.
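To make the idea of a word-level category score concrete, here is a minimal sketch that scores a text by the share of its tokens falling into each category. The two word lists contain only the examples quoted above, not the full vocabularies used in the study, and the category names, the tokenizer, and the normalization by token count are assumptions for illustration.

```python
import re

# Example words quoted in the text; the real vocabularies are assumed to be larger.
CATEGORIES = {
    "thinking_style": {"maybe", "perhaps", "but", "not", "never"},
    "cognitive_mechanism": {"cause", "know", "ought"},
}


def category_scores(text):
    """Return, per category, the fraction of tokens that belong to it."""
    # Simple lowercase word tokenizer (an assumption, not the study's tokenizer).
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return {name: 0.0 for name in CATEGORIES}
    return {
        name: sum(token in words for token in tokens) / len(tokens)
        for name, words in CATEGORIES.items()
    }


print(category_scores("Maybe this is not true, but I know it"))
```

Because such words are topic-neutral, a score of this kind does not depend on whether the text is about IT, health, or music, which is the intuition behind the reduced sensitivity to the field of the data.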
Having mitigated the bias issue, we turn to the second issue: previous studies did not provide a clear criterion for when to stop monitoring and classification. Based on the findings above, we apply deep learning algorithms to user and linguistic features. Unlike most previous studies on rumor detection, we recast the user and linguistic observations as sequences that serve as inputs to the algorithm. Handling sequential data remains challenging because most machine learning algorithms take static data as input. With advances in recurrent neural networks (RNNs), it is now possible to handle sequential data for regression or classification. Among existing RNN variants, we introduce and apply Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU) networks to build a rumor classifier. The proposed rumor classification algorithm shows competitive classification performance with only a small number of initial tweets. To the best of our knowledge, this is the first super-fast rumor classification attempt that suggests a clear criterion for stopping data monitoring and classification.
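One way to picture the recasting of observations as sequences is sketched below: the earliest tweets on a topic are ordered by time, each is mapped to a feature vector, and the result is padded to a fixed length so that it could be fed to an LSTM or GRU. The function names, the toy features, and the use of a fixed number of initial tweets as the stopping point are assumptions for illustration; the dissertation's actual features and stopping criterion are not specified here.

```python
def early_sequence(tweets, featurize, max_len, dim):
    """Build a fixed-length sequence from the earliest tweets on a topic.

    tweets:    list of (timestamp, text) pairs (hypothetical input)
    featurize: function mapping a text to a list of `dim` floats
    max_len:   number of initial tweets to keep (assumed stopping point)
    dim:       length of each feature vector
    """
    # Keep only the first max_len tweets in chronological order.
    ordered = sorted(tweets, key=lambda item: item[0])[:max_len]
    sequence = [list(featurize(text)) for _, text in ordered]
    # Pad with zero vectors so every topic yields the same sequence length.
    padding = [[0.0] * dim for _ in range(max_len - len(sequence))]
    return sequence + padding


def toy_features(text):
    # Hypothetical features: word count and whether the text asks a question.
    return [float(len(text.split())), float("?" in text)]


tweets = [(3, "is it true?"), (1, "breaking news"), (2, "not sure")]
print(early_sequence(tweets, toy_features, max_len=4, dim=2))
```

A sequence of this shape (time steps by features) is the kind of input a recurrent classifier consumes one step at a time.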
The strength of this dissertation lies in bridging theory and practice and in applying sequential data and deep learning algorithms to rumor identification and detection. We hope that this work will provide a cornerstone for understanding rumor propagation in online social media and for building early-stage rumor detection.