1. RUMOR GAUGE: PREDICTING THE
VERACITY OF RUMORS ON
TWITTER
LAB
(M10517019)
PAPER FROM ACM TRANSACTIONS ON KNOWLEDGE DISCOVERY FROM DATA, VOLUME 11, NO. 4, ARTICLE 50, PUBLICATION DATE: JULY 2017
3. COMMON SENSE
HOW DO WE COMPUTE THE DIFFERENCE BETWEEN TWO VECTORS?
‣ It's easy: Euclidean distance.
‣ But what if the two vectors are not the same length?
4. COMMON SENSE
DYNAMIC TIME WARPING, DTW
‣ See the formula first.
‣ But I prefer to explain it step by step with illustrations.
5. 1. The left and bottom sides of the table each hold one of the input sequences; the two input vectors may be of unequal length.
DYNAMIC TIME WARPING, DTW
COMMON SENSE
6. 2. The upper-right cell is the end point; the bottom-left cell is the starting point.
DYNAMIC TIME WARPING, DTW
COMMON SENSE
7. DYNAMIC TIME WARPING, DTW
COMMON SENSE
3. For each cell of the table, take the distance (absolute difference) between the two corresponding elements, then add the minimum cumulative distance from the cells of the immediately preceding stage.
8. DYNAMIC TIME WARPING, DTW
COMMON SENSE
4. In this example, the DTW distance between the two vectors (which may be of different lengths) is 15.
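To make the table-filling concrete, here is a minimal Python sketch of DTW under the slides' setup (absolute difference as the local cost; the function name and inputs are my own illustration, not from the paper):

```python
import numpy as np

def dtw_distance(a, b):
    """Dynamic time warping between two 1-D sequences of possibly
    different lengths, using |x - y| as the local distance."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)   # cumulative-distance table
    D[0, 0] = 0.0                         # bottom-left starting point
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # local distance plus the minimum over the "only previous stage"
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]                        # upper-right end point
```

Calling dtw_distance on two unequal-length vectors fills the same table as the slides and returns the cumulative distance at the upper-right end point.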
9. MAXIMUM LIKELIHOOD ESTIMATE, MLE
COMMON SENSE
‣ What is likelihood?
(Figure: probability runs from the model/population toward the data/sample; likelihood runs from the data/sample back toward the model.)
10. MAXIMUM LIKELIHOOD ESTIMATE, MLE
COMMON SENSE
It assumes that the sample comes from approximately the same normal distribution as the model.
11. MAXIMUM LIKELIHOOD ESTIMATE, MLE
COMMON SENSE
(Figure: a data sample and candidate models; the maximum likelihood estimate picks the model most likely to have produced the sample.)
12. MAXIMUM LIKELIHOOD ESTIMATE, MLE
COMMON SENSE
‣ 𝞱 represents the parameters of the normal distribution (𝛍, 𝛔²); 𝞱 is just a set of concrete numbers which, plugged into the density, produce the normal distribution's probabilities.
‣ 𝓁(𝞱) measures how likely it is that this set of observations was produced under 𝞱; the estimate is the 𝞱 that maximizes 𝓁.
13. MAXIMUM LIKELIHOOD ESTIMATE, MLE
COMMON SENSE
Steps:
1. Create a function (the likelihood).
2. Add the natural log.
3. Take the extreme value.
4. Make sure it is the maximum value (optional).
(Formula annotations: the likelihood, the model parameter, a set of samples, and the interval of the parameters.)
14. MAXIMUM LIKELIHOOD ESTIMATE, MLE
COMMON SENSE
Step 2: Add the natural log.
‣ The natural log turns the product ∏ into a sum ∑.
15. MAXIMUM LIKELIHOOD ESTIMATE, MLE
COMMON SENSE
Step 3: Take the extreme value.
‣ Set the first-order derivative with respect to 𝞱 to zero to locate the extreme value.
16. MAXIMUM LIKELIHOOD ESTIMATE, MLE
COMMON SENSE
Step 4: Make sure it is the maximum value (optional).
‣ The first-order derivative locates the extreme value.
‣ The second-order derivative with respect to 𝞱 determines whether it is a maximum or a minimum:
‣ less than zero means a maximum, which is what we want;
‣ greater than zero means a minimum.
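As a hedged worked example of the four steps (a coin with head probability p instead of a Gaussian, to keep the algebra short; H heads and T tails observed):

```latex
\begin{align*}
L(p) &= p^{H}\,(1-p)^{T}
  && \text{1. create the (likelihood) function}\\
\ell(p) &= H\ln p + T\ln(1-p)
  && \text{2. add the natural log}\\
\ell'(p) &= \frac{H}{p} - \frac{T}{1-p} = 0
  \;\Rightarrow\; \hat{p} = \frac{H}{H+T}
  && \text{3. take the extreme value}\\
\ell''(p) &= -\frac{H}{p^{2}} - \frac{T}{(1-p)^{2}} < 0
  && \text{4. confirm it is the maximum}
\end{align*}
```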
17. EXPECTATION-MAXIMIZATION ALGORITHM, EM
COMMON SENSE
‣ Quoting from Wikipedia:
‣ It is also a way of finding maximum likelihood estimates.
‣ Latent variables?
The EM algorithm is used to find (locally) maximum likelihood parameters of a statistical model in cases where the equations cannot be solved directly. Typically these models involve latent variables in addition to unknown parameters and known data observations. That is, either missing values exist among the data, or the model can be formulated more simply by assuming the existence of further unobserved data points.
21. EXPECTATION-MAXIMIZATION ALGORITHM, EM
COMMON SENSE
OBSERVED DATA
Decide first how many Gaussian distributions there are (that is, the grouping).
Then randomly initialize the parameters and adjust them until almost all points have reached the MLE.
22. EXPECTATION-MAXIMIZATION ALGORITHM, EM
COMMON SENSE
Flow (looped): 𝜃 set = random; with the observed data, calculate the likelihood of each (sample, 𝜃) pair → normalize that set of likelihoods → calculate each coin's expectation → overwrite the 𝜃 set → loop.
‣ Assume there are N values of 𝜃 representing N Gaussian distributions.
‣ Each 𝜃 represents one distribution's parameters.
(Figure: five coin-flip samples labeled B, A, A, B, A.)
23. EXPECTATION-MAXIMIZATION ALGORITHM, EM
COMMON SENSE
‣ We don't know which samples came from coin A and which from coin B.
‣ We only know they should be divided into N groups (N Gaussian distributions).
(Figure: the same five samples, now unlabeled.)
24. EXPECTATION-MAXIMIZATION ALGORITHM, EM
COMMON SENSE
‣ For example, suppose we want to split the samples into two distributions, 1 and 2.
25. EXPECTATION-MAXIMIZATION ALGORITHM, EM
COMMON SENSE
‣ Normalize: take each 𝜃's share of the sum. This share represents the probability that 𝞱ᵢ is the model parameter of this sample.
26. EXPECTATION-MAXIMIZATION ALGORITHM, EM
COMMON SENSE
‣ Multiply each 𝜃's share by the H_count and T_count of the corresponding sample.
‣ Add up the expected values belonging to the same 𝜃.
27. EXPECTATION-MAXIMIZATION ALGORITHM, EM
COMMON SENSE
‣ Calculate the probability of H under each 𝜃.
‣ Overwrite each 𝜃 value.
‣ Feed the new 𝜃 set into the next iteration.
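The loop above maps onto a short Python sketch of the two-coin example (my own illustration with samples given as (heads, tails) counts; two_coin_em is a hypothetical helper, not the paper's implementation):

```python
import numpy as np

def two_coin_em(samples, n_iters=50, seed=0):
    """EM for the two-coin example: each sample is a (heads, tails)
    count pair generated by one of two coins with unknown P(H)."""
    rng = np.random.default_rng(seed)
    theta = rng.uniform(0.1, 0.9, size=2)        # random initial theta set
    for _ in range(n_iters):
        exp_h, exp_t = np.zeros(2), np.zeros(2)
        for h, t in samples:
            lik = theta**h * (1.0 - theta)**t    # likelihood of each (sample, theta) pair
            resp = lik / lik.sum()               # normalize the set of likelihoods
            exp_h += resp * h                    # expected heads credited to each coin
            exp_t += resp * t                    # expected tails credited to each coin
        theta = exp_h / (exp_h + exp_t)          # overwrite theta: P(H) per coin
    return theta

# Five samples of ten flips each, as in the classic illustration.
print(two_coin_em([(5, 5), (9, 1), (8, 2), (4, 6), (7, 3)]))
```

For these five samples the two 𝜃 values settle at roughly 0.52 and 0.80; which coin gets which value depends on the random start.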
29. MARKOV MODEL, MM
COMMON SENSE
‣ S represents the set of states.
‣ N represents the number of states.
‣ A represents the transition probabilities.
‣ aᵢⱼ represents the transition probability from the i-th state to the j-th state.
31. MARKOV MODEL, MM
COMMON SENSE
‣ We can choose any state as a starting point.
‣ When drawing the diagram, we can add an imaginary start state s₀ whose outgoing probabilities sum to 1.
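A tiny Python sketch of such a chain (a hypothetical three-state weather example of my own; start plays the role of the imaginary s₀, with probabilities summing to 1):

```python
import numpy as np

rng = np.random.default_rng(0)

states = ["sunny", "cloudy", "rainy"]   # hypothetical state set S, N = 3
start = np.array([0.5, 0.3, 0.2])       # imaginary s0: outgoing probabilities sum to 1
A = np.array([                          # transition matrix, a_ij = P(state j | state i)
    [0.7, 0.2, 0.1],
    [0.3, 0.4, 0.3],
    [0.2, 0.3, 0.5],
])

# Sample a state sequence from the chain.
s = rng.choice(3, p=start)
seq = [states[s]]
for _ in range(9):
    s = rng.choice(3, p=A[s])
    seq.append(states[s])
print(seq)
```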
32. HIDDEN MARKOV MODEL, HMM
‣ O represents the observed sequence.
‣ Apart from the observed data, none of the rest (the states and transitions) can be seen directly.
COMMON SENSE
33. ‣ It answers the question: what is the probability of a particular model producing a particular set of observations?
‣ To evaluate, we use one of two algorithms: the forward algorithm or the backward algorithm (DO NOT confuse them with the forward-backward algorithm).
EVALUATION PROBLEM
COMMON SENSE
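For the evaluation problem, a minimal forward-algorithm sketch in Python (a generic textbook version with made-up model parameters, not the paper's code):

```python
import numpy as np

def forward(pi, A, B, obs):
    """P(obs | model): forward algorithm for a discrete HMM.
    pi: (N,) initial state probs; A: (N, N) transition probs;
    B: (N, M) emission probs; obs: sequence of symbol indices."""
    alpha = pi * B[:, obs[0]]            # initialize with the first observation
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]    # propagate one step, then emit
    return alpha.sum()                   # total probability over end states

# Hypothetical 2-state, 2-symbol model.
pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.9, 0.1], [0.2, 0.8]])
print(forward(pi, A, B, [0, 1, 0]))
```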
34. ‣ It answers the question: given a series of observations and model parameters, what is the most likely sequence of states that produced this observation sequence?
‣ For decoding we use the Viterbi algorithm.
DECODING PROBLEM
COMMON SENSE
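For the decoding problem, a matching Viterbi sketch (log domain; the same made-up model as in the forward example):

```python
import numpy as np

def viterbi(pi, A, B, obs):
    """Most likely state sequence for a discrete HMM, in the log domain."""
    N, T = len(pi), len(obs)
    logd = np.log(pi) + np.log(B[:, obs[0]])   # best log-prob ending in each state
    back = np.zeros((T, N), dtype=int)         # backpointers
    for t in range(1, T):
        scores = logd[:, None] + np.log(A)     # scores[i, j]: come from i, go to j
        back[t] = scores.argmax(axis=0)
        logd = scores.max(axis=0) + np.log(B[:, obs[t]])
    path = [int(logd.argmax())]                # best final state
    for t in range(T - 1, 0, -1):
        path.append(int(back[t][path[-1]]))    # follow backpointers
    return path[::-1]

pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.9, 0.1], [0.2, 0.8]])
print(viterbi(pi, A, B, [0, 1, 0]))
```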
36. ‣ It answers the question: given a model structure and a series of observations, find the model parameters that best fit the data.
‣ For this problem we can use the following 3 algorithms:
‣ MLE (maximum likelihood estimation)
‣ Viterbi training (DO NOT confuse it with Viterbi decoding)
‣ Baum-Welch, i.e., the forward-backward algorithm
LEARNING PROBLEM
COMMON SENSE
37. IN RUMOR GAUGE
OVERVIEW
Two models:
1. Expectation-Maximization Algorithm
2. Forward-Backward Algorithm
3. Unidentified Rumor
38. IN RUMOR GAUGE
DATASET
41. IN RUMOR GAUGE
DATASET
It was made certain that all of the rumors had at least 1,000 tweets; any rumor identified with fewer than 1,000 tweets was discarded.
43. IN RUMOR GAUGE
DATASET
‣ The red dot on the green block represents a false tweet that is predicted to be true.
44. NEGATION
FEATURE
‣ Stanford NLP parser.
‣ With WordNet and ConceptNet as aids, it might be better.
Danqi Chen and Christopher D. Manning. 2014. A fast and accurate dependency parser using neural networks. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP). ACL, 740–750.
45. AVERAGE SOPHISTICATION AND FORMALITY OF TWEETS
FEATURE
‣ Vulgarity: the presence of vulgar words in the tweet.
‣ Abbreviations: the presence of abbreviations (such as b4 for before, jk for just kidding, and irl for in real life) in the tweet.
‣ Emoticons: the presence of emoticons in the tweet.
‣ Average word complexity: the average length of words in the tweet.
‣ Sentence complexity: the grammatical complexity of the tweet.
(Detected with online dictionaries and a dependency parse tree.)
46. AVERAGE SOPHISTICATION AND FORMALITY OF TWEETS
FEATURE
(Figure: an example dependency parse tree with depth = 5.)
47. RATIO OF TWEETS CONTAINING OPINION & INSIGHT
FEATURE
‣ Linguistic Inquiry and Word Count (LIWC).
‣ The authors presumably rely on the LIWC "insight" category.
48. RATIO OF INFERRING & TENTATIVE TWEETS
FEATURE
‣ Linguistic Inquiry and Word Count (LIWC).
‣ The authors presumably rely on the LIWC "tentat" (tentative) category.
49. CONTROVERSIALITY
FEATURE
‣ Replies are run through a state-of-the-art Twitter sentiment classifier [Vosoughi et al. 2015], which classifies them as positive, negative, or neutral.
‣ All conversations about a tweet are thus classified as positive, negative, or neutral.
Soroush Vosoughi, Helen Zhou, and Deb Roy. 2015. Enhanced Twitter sentiment classification using contextual information. In Proceedings of the 6th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis. Association for Computational Linguistics, 16–24.
51. CREDIBILITY
FEATURE
‣ The credibility of a user is a binary feature measured by whether the user's account has been officially verified by Twitter or not.
52. INFLUENCE
FEATURE
‣ Influence is measured simply by the number of followers of a user. Presumably, the more followers a user has, the more influential he or she is.
53. ROLE
FEATURE
‣ Users with a high number of followers act as broadcasters.
‣ Users with a low number of followers act as recipients.
This feature is not fully utilized in the article.
55. TIME-INFERRED DIFFUSION
FEATURE
‣ The tweet graphs do not show whom a user actually retweeted.
‣ I think this tool is their proprietary technology, built with reference to Goel et al.'s article [2012].
(Figure: two example tweet graphs.)
56. FRACTION OF LOW-TO-HIGH DIFFUSION (SIGNIFICANT)
FEATURE
59. FRACTION OF NODES IN LARGEST CONNECTED COMPONENT
FEATURE
%Nodes in LCC = 32 / 54 ≈ 0.59    %Nodes in LCC = 9 / 54 ≈ 0.17
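One way this fraction could be computed on an undirected diffusion graph, as a pure-Python sketch (lcc_fraction is a hypothetical helper of my own, not from the paper):

```python
from collections import defaultdict, deque

def lcc_fraction(n_nodes, edges):
    """Fraction of nodes in the largest connected component of an
    undirected graph with nodes 0..n_nodes-1."""
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    seen, best = set(), 0
    for start in range(n_nodes):
        if start in seen:
            continue
        q, comp = deque([start]), 0   # BFS over one component
        seen.add(start)
        while q:
            u = q.popleft()
            comp += 1
            for w in adj[u]:
                if w not in seen:
                    seen.add(w)
                    q.append(w)
        best = max(best, comp)
    return best / n_nodes

print(lcc_fraction(6, [(0, 1), (1, 2), (3, 4)]))  # -> 3/6 = 0.5
```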
60. AVERAGE DEPTH-TO-BREADTH RATIO
FEATURE
depth-to-breadth = 7 / 42 ≈ 0.17
depth-to-breadth = 12 / 42 ≈ 0.29
Average depth-to-breadth = (0.17 + 0.29) / 2 ≈ 0.23
66. FOUR MAIN GOALS
EVALUATION
1. Measure the accuracy at which our model can predict the veracity of a
rumor before the first trusted verification.
2. Measure the contribution of each of the linguistic, user, and propagation
categories as a whole.
3. Measure the contributions of each of the 17 features individually.
4. Measure the accuracy of our model as a function of latency (i.e., time
elapsed since the beginning of a rumor).
67. FOUR MAIN GOALS
EVALUATION
Fact-checking is done via Wikipedia, Snopes.com, and FactCheck.org.
71. ROC CURVES
EVALUATION
x-axis: False Positive Rate = FP / (FP + TN)
y-axis: True Positive Rate = TP / (TP + FN)
‣ If the user is a financial-market professional, he needs most of the true rumors to assist in stock trading.
‣ He can choose the point at x = 0.6, y = 0.97.
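The two rates come straight from the confusion counts; a minimal Python helper (the counts below are made up so that the output reproduces the slide's example point):

```python
def roc_point(tp, fp, tn, fn):
    """One (FPR, TPR) point on the ROC curve from confusion counts."""
    fpr = fp / (fp + tn)   # x-axis: False Positive Rate
    tpr = tp / (tp + fn)   # y-axis: True Positive Rate
    return fpr, tpr

# Hypothetical counts chosen to land on the slide's x = 0.6, y = 0.97.
print(roc_point(tp=97, fp=60, tn=40, fn=3))  # (0.6, 0.97)
```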
72. ACCURACY VS. LATENCY
EVALUATION
1. The model reaches 75% accuracy right before trusted verification.
2. The model barely performs better than chance (54%) before 50% latency.
3. The propagation features do not contribute much until around 65% latency.
4. The early performance of the model seems to be fueled mostly by the linguistic and user features, which then plateau at around 55% latency, as the amount of information they can contribute to the model saturates.
73. TEMPORAL VS. NON-TEMPORAL MODELS
EVALUATION
‣ The performance of the non-temporal model
converges much faster than the temporal
model, which again can be attributed mostly to
the delayed effect(?) of the propagation
features in the temporal model.
‣ The non-temporal linguistic and user models
track their corresponding temporal models
fairly closely, with the linguistic models being
the most similar.
75. NEAR REAL-TIME VERACITY PREDICTION
EVALUATION
‣ The computational power required for computing the features is minimal; the real bottleneck is the Twitter API limit.
76. CONCLUSION
MY CONCLUSION
‣ I focus on fake news, so the dataset and the features will be different.
‣ The authors' model is a GMM-HMM (Gaussian Mixture Model - Hidden Markov Model).
‣ Change the validation method.
‣ Picture truth?
‣ The first step is to collect true and fake news.