1. RUMOR GAUGE: PREDICTING THE
VERACITY OF RUMORS ON
TWITTER
LAB
(M10517019)
PAPER FROM ACM TRANSACTIONS ON KNOWLEDGE DISCOVERY FROM DATA, VOLUME 11, NO. 4, ARTICLE 50, PUBLICATION DATE: JULY 2017
3. COMMON SENSE
HOW DO WE COMPUTE THE DIFFERENCE BETWEEN TWO VECTORS?
‣ It's easy: Euclidean distance.
‣ But what if the two vectors are not the same length?
4. COMMON SENSE
DYNAMIC TIME WARPING, DTW
‣ See the formula first.
‣ But I prefer to explain it step by step with illustrations.
5. 1. The left and bottom sides of the table each hold one of the input sequences; the two input vectors may be of unequal length.
DYNAMIC TIME WARPING, DTW
COMMON SENSE
6. 2. The upper-right cell is the end point; the bottom-left cell is the starting point.
DYNAMIC TIME WARPING, DTW
COMMON SENSE
7. DYNAMIC TIME WARPING, DTW
COMMON SENSE
3. For each cell of the table, take the distance (absolute difference) between the two corresponding elements, then add the minimum cumulative distance from the cells of the immediately preceding stage.
8. DYNAMIC TIME WARPING, DTW
COMMON SENSE
4. In this example, the DTW distance between the two vectors (which may be of different lengths) is 15.
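To make the table-filling concrete, here is a minimal Python sketch of DTW under the slides' setup (absolute difference as the local cost; the function name and inputs are my own illustration, not from the paper):

```python
import numpy as np

def dtw_distance(a, b):
    """Dynamic time warping between two 1-D sequences of possibly
    different lengths, using |x - y| as the local distance."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)   # cumulative-distance table
    D[0, 0] = 0.0                         # bottom-left starting point
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # local distance plus the minimum over the "only previous stage"
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]                        # upper-right end point
```

Calling dtw_distance on two unequal-length vectors fills the same table as the slides and returns the cumulative distance at the upper-right end point.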
9. MAXIMUM LIKELIHOOD ESTIMATE, MLE
COMMON SENSE
‣ What is likelihood?
(Figure: probability runs from the model/population toward the data/sample; likelihood runs from the data/sample back toward the model.)
10. MAXIMUM LIKELIHOOD ESTIMATE, MLE
COMMON SENSE
It assumes that the sample comes from approximately the same normal distribution as the model.
11. MAXIMUM LIKELIHOOD ESTIMATE, MLE
COMMON SENSE
(Figure: a data sample and candidate models; the maximum likelihood estimate picks the model most likely to have produced the sample.)
12. MAXIMUM LIKELIHOOD ESTIMATE, MLE
COMMON SENSE
‣ 𝞱 represents the parameters of the normal distribution (𝛍, 𝛔²); 𝞱 is just a set of concrete numbers which, plugged into the density, produce the normal distribution's probabilities.
‣ 𝓁(𝞱) measures how likely it is that this set of observations was produced under 𝞱; the estimate is the 𝞱 that maximizes 𝓁.
13. MAXIMUM LIKELIHOOD ESTIMATE, MLE
COMMON SENSE
Steps:
1. Create a function (the likelihood).
2. Add the natural log.
3. Take the extreme value.
4. Make sure it is the maximum value (optional).
(Formula annotations: the likelihood, the model parameter, a set of samples, and the interval of the parameters.)
14. MAXIMUM LIKELIHOOD ESTIMATE, MLE
COMMON SENSE
Step 2: Add the natural log.
‣ The natural log turns the product ∏ into a sum ∑.
15. MAXIMUM LIKELIHOOD ESTIMATE, MLE
COMMON SENSE
Step 3: Take the extreme value.
‣ Set the first-order derivative with respect to 𝞱 to zero to locate the extreme value.
16. MAXIMUM LIKELIHOOD ESTIMATE, MLE
COMMON SENSE
Step 4: Make sure it is the maximum value (optional).
‣ The first-order derivative locates the extreme value.
‣ The second-order derivative with respect to 𝞱 determines whether it is a maximum or a minimum:
‣ less than zero means a maximum, which is what we want;
‣ greater than zero means a minimum.
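As a hedged worked example of the four steps (a coin with head probability p instead of a Gaussian, to keep the algebra short; H heads and T tails observed):

```latex
\begin{align*}
L(p) &= p^{H}\,(1-p)^{T}
  && \text{1. create the (likelihood) function}\\
\ell(p) &= H\ln p + T\ln(1-p)
  && \text{2. add the natural log}\\
\ell'(p) &= \frac{H}{p} - \frac{T}{1-p} = 0
  \;\Rightarrow\; \hat{p} = \frac{H}{H+T}
  && \text{3. take the extreme value}\\
\ell''(p) &= -\frac{H}{p^{2}} - \frac{T}{(1-p)^{2}} < 0
  && \text{4. confirm it is the maximum}
\end{align*}
```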
17. EXPECTATION-MAXIMIZATION ALGORITHM, EM
COMMON SENSE
‣ Quoting from Wikipedia:
‣ It is also a way of finding maximum likelihood estimates.
‣ Latent variables?
The EM algorithm is used to find (locally) maximum likelihood parameters of a statistical model in cases where the equations cannot be solved directly. Typically these models involve latent variables in addition to unknown parameters and known data observations. That is, either missing values exist among the data, or the model can be formulated more simply by assuming the existence of further unobserved data points.
21. EXPECTATION-MAXIMIZATION ALGORITHM, EM
COMMON SENSE
OBSERVED DATA
Decide first how many Gaussian distributions there are (that is, the grouping).
Then randomly initialize the parameters and adjust them until almost all points have reached the MLE.
22. EXPECTATION-MAXIMIZATION ALGORITHM, EM
COMMON SENSE
Flow (looped): 𝜃 set = random; with the observed data, calculate the likelihood of each (sample, 𝜃) pair → normalize that set of likelihoods → calculate each coin's expectation → overwrite the 𝜃 set → loop.
‣ Assume there are N values of 𝜃 representing N Gaussian distributions.
‣ Each 𝜃 represents one distribution's parameters.
(Figure: five coin-flip samples labeled B, A, A, B, A.)
23. EXPECTATION-MAXIMIZATION ALGORITHM, EM
COMMON SENSE
‣ We don't know which samples came from coin A and which from coin B.
‣ We only know they should be divided into N groups (N Gaussian distributions).
(Figure: the same five samples, now unlabeled.)
24. EXPECTATION-MAXIMIZATION ALGORITHM, EM
COMMON SENSE
‣ For example, suppose we want to split the samples into two distributions, 1 and 2.
25. EXPECTATION-MAXIMIZATION ALGORITHM, EM
COMMON SENSE
‣ Normalize: take each 𝜃's share of the sum. This share represents the probability that 𝞱ᵢ is the model parameter of this sample.
26. EXPECTATION-MAXIMIZATION ALGORITHM, EM
COMMON SENSE
‣ Multiply each 𝜃's share by the H_count and T_count of the corresponding sample.
‣ Add up the expected values belonging to the same 𝜃.
27. EXPECTATION-MAXIMIZATION ALGORITHM, EM
COMMON SENSE
‣ Calculate the probability of H under each 𝜃.
‣ Overwrite each 𝜃 value.
‣ Feed the new 𝜃 set into the next iteration.
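The loop above maps onto a short Python sketch of the two-coin example (my own illustration with samples given as (heads, tails) counts; two_coin_em is a hypothetical helper, not the paper's implementation):

```python
import numpy as np

def two_coin_em(samples, n_iters=50, seed=0):
    """EM for the two-coin example: each sample is a (heads, tails)
    count pair generated by one of two coins with unknown P(H)."""
    rng = np.random.default_rng(seed)
    theta = rng.uniform(0.1, 0.9, size=2)        # random initial theta set
    for _ in range(n_iters):
        exp_h, exp_t = np.zeros(2), np.zeros(2)
        for h, t in samples:
            lik = theta**h * (1.0 - theta)**t    # likelihood of each (sample, theta) pair
            resp = lik / lik.sum()               # normalize the set of likelihoods
            exp_h += resp * h                    # expected heads credited to each coin
            exp_t += resp * t                    # expected tails credited to each coin
        theta = exp_h / (exp_h + exp_t)          # overwrite theta: P(H) per coin
    return theta

# Five samples of ten flips each, as in the classic illustration.
print(two_coin_em([(5, 5), (9, 1), (8, 2), (4, 6), (7, 3)]))
```

For these five samples the two 𝜃 values settle at roughly 0.52 and 0.80; which coin gets which value depends on the random start.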
29. MARKOV MODEL, MM
COMMON SENSE
‣ S represents the set of states.
‣ N represents the number of states.
‣ A represents the transition probabilities.
‣ aᵢⱼ represents the transition probability from the i-th state to the j-th state.
31. MARKOV MODEL, MM
COMMON SENSE
‣ We can choose any state as a starting point.
‣ When drawing the diagram, we can add an imaginary start state s₀ whose outgoing probabilities sum to 1.
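A tiny Python sketch of such a chain (a hypothetical three-state weather example of my own; start plays the role of the imaginary s₀, with probabilities summing to 1):

```python
import numpy as np

rng = np.random.default_rng(0)

states = ["sunny", "cloudy", "rainy"]   # hypothetical state set S, N = 3
start = np.array([0.5, 0.3, 0.2])       # imaginary s0: outgoing probabilities sum to 1
A = np.array([                          # transition matrix, a_ij = P(state j | state i)
    [0.7, 0.2, 0.1],
    [0.3, 0.4, 0.3],
    [0.2, 0.3, 0.5],
])

# Sample a state sequence from the chain.
s = rng.choice(3, p=start)
seq = [states[s]]
for _ in range(9):
    s = rng.choice(3, p=A[s])
    seq.append(states[s])
print(seq)
```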
32. HIDDEN MARKOV MODEL, HMM
‣ O represents the observed sequence.
‣ Apart from the observed data, none of the rest (the states and transitions) can be seen directly.
COMMON SENSE
33. ‣ It answers the question: what is the probability of a particular model producing a particular set of observations?
‣ To evaluate, we use one of two algorithms: the forward algorithm or the backward algorithm (DO NOT confuse them with the forward-backward algorithm).
EVALUATION PROBLEM
COMMON SENSE
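For the evaluation problem, a minimal forward-algorithm sketch in Python (a generic textbook version with made-up model parameters, not the paper's code):

```python
import numpy as np

def forward(pi, A, B, obs):
    """P(obs | model): forward algorithm for a discrete HMM.
    pi: (N,) initial state probs; A: (N, N) transition probs;
    B: (N, M) emission probs; obs: sequence of symbol indices."""
    alpha = pi * B[:, obs[0]]            # initialize with the first observation
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]    # propagate one step, then emit
    return alpha.sum()                   # total probability over end states

# Hypothetical 2-state, 2-symbol model.
pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.9, 0.1], [0.2, 0.8]])
print(forward(pi, A, B, [0, 1, 0]))
```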
34. ‣ It answers the question: given a series of observations and model parameters, what is the most likely sequence of states that produced this observation sequence?
‣ For decoding we use the Viterbi algorithm.
DECODING PROBLEM
COMMON SENSE
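For the decoding problem, a matching Viterbi sketch (log domain; the same made-up model as in the forward example):

```python
import numpy as np

def viterbi(pi, A, B, obs):
    """Most likely state sequence for a discrete HMM, in the log domain."""
    N, T = len(pi), len(obs)
    logd = np.log(pi) + np.log(B[:, obs[0]])   # best log-prob ending in each state
    back = np.zeros((T, N), dtype=int)         # backpointers
    for t in range(1, T):
        scores = logd[:, None] + np.log(A)     # scores[i, j]: come from i, go to j
        back[t] = scores.argmax(axis=0)
        logd = scores.max(axis=0) + np.log(B[:, obs[t]])
    path = [int(logd.argmax())]                # best final state
    for t in range(T - 1, 0, -1):
        path.append(int(back[t][path[-1]]))    # follow backpointers
    return path[::-1]

pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.9, 0.1], [0.2, 0.8]])
print(viterbi(pi, A, B, [0, 1, 0]))
```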
36. ‣ It answers the question: given a model structure and a series of observations, find the model parameters that best fit the data.
‣ For this problem we can use the following 3 algorithms:
‣ MLE (maximum likelihood estimation)
‣ Viterbi training (DO NOT confuse it with Viterbi decoding)
‣ Baum-Welch, i.e., the forward-backward algorithm
LEARNING PROBLEM
COMMON SENSE
37. IN RUMOR GAUGE
OVERVIEW
Two models:
1. Expectation-Maximization Algorithm
2. Forward-Backward Algorithm
3. Unidentified Rumor
38. IN RUMOR GAUGE
DATASET
41. IN RUMOR GAUGE
DATASET
It was made certain that all of the rumors had at least 1,000 tweets; any rumor identified with fewer than 1,000 tweets was discarded.
43. IN RUMOR GAUGE
DATASET
‣ The red dot on the green block represents a false tweet that is predicted to be true.
44. NEGATION
FEATURE
‣ Stanford NLP parser.
‣ With WordNet and ConceptNet as aids, it might be better.
Danqi Chen and Christopher D. Manning. 2014. A fast and accurate dependency parser using neural networks. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP). ACL, 740–750.
45. AVERAGE SOPHISTICATION AND FORMALITY OF TWEETS
FEATURE
‣ Vulgarity: the presence of vulgar words in the tweet.
‣ Abbreviations: the presence of abbreviations (such as b4 for before, jk for just kidding, and irl for in real life) in the tweet.
‣ Emoticons: the presence of emoticons in the tweet.
‣ Average word complexity: the average length of words in the tweet.
‣ Sentence complexity: the grammatical complexity of the tweet.
(Detected with online dictionaries and a dependency parse tree.)
46. AVERAGE SOPHISTICATION AND FORMALITY OF TWEETS
FEATURE
(Figure: an example dependency parse tree with depth = 5.)
47. RATIO OF TWEETS CONTAINING OPINION & INSIGHT
FEATURE
‣ Linguistic Inquiry and Word Count (LIWC).
‣ The authors presumably rely on the LIWC "insight" category.
48. RATIO OF INFERRING & TENTATIVE TWEETS
FEATURE
‣ Linguistic Inquiry and Word Count (LIWC).
‣ The authors presumably rely on the LIWC "tentat" (tentative) category.
49. CONTROVERSIALITY
FEATURE
‣ Replies are run through a state-of-the-art Twitter sentiment classifier [Vosoughi et al. 2015], which classifies them as positive, negative, or neutral.
‣ All conversations about a tweet are thus classified as positive, negative, or neutral.
Soroush Vosoughi, Helen Zhou, and Deb Roy. 2015. Enhanced Twitter sentiment classification using contextual information. In Proceedings of the 6th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis. Association for Computational Linguistics, 16–24.
51. CREDIBILITY
FEATURE
‣ The credibility of a user is a binary feature measured by whether the user's account has been officially verified by Twitter or not.
52. INFLUENCE
FEATURE
‣ Influence is measured simply by the number of followers of a user. Presumably, the more followers a user has, the more influential he or she is.
53. ROLE
FEATURE
‣ Users with a high number of followers act as broadcasters.
‣ Users with a low number of followers act as recipients.
This feature is not fully utilized in the article.
55. TIME-INFERRED DIFFUSION
FEATURE
‣ The tweet graphs do not show whom a user actually retweeted.
‣ I think this tool is their proprietary technology, built with reference to Goel et al.'s article [2012].
(Figure: two example tweet graphs.)
56. FRACTION OF LOW-TO-HIGH DIFFUSION (SIGNIFICANT)
FEATURE
59. FRACTION OF NODES IN LARGEST CONNECTED COMPONENT
FEATURE
%Nodes in LCC = 32 / 54 ≈ 0.59    %Nodes in LCC = 9 / 54 ≈ 0.17
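One way this fraction could be computed on an undirected diffusion graph, as a pure-Python sketch (lcc_fraction is a hypothetical helper of my own, not from the paper):

```python
from collections import defaultdict, deque

def lcc_fraction(n_nodes, edges):
    """Fraction of nodes in the largest connected component of an
    undirected graph with nodes 0..n_nodes-1."""
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    seen, best = set(), 0
    for start in range(n_nodes):
        if start in seen:
            continue
        q, comp = deque([start]), 0   # BFS over one component
        seen.add(start)
        while q:
            u = q.popleft()
            comp += 1
            for w in adj[u]:
                if w not in seen:
                    seen.add(w)
                    q.append(w)
        best = max(best, comp)
    return best / n_nodes

print(lcc_fraction(6, [(0, 1), (1, 2), (3, 4)]))  # -> 3/6 = 0.5
```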
60. AVERAGE DEPTH-TO-BREADTH RATIO
FEATURE
depth-to-breadth = 7 / 42 ≈ 0.17
depth-to-breadth = 12 / 42 ≈ 0.29
Average depth-to-breadth = (0.17 + 0.29) / 2 ≈ 0.23
66. FOUR MAIN GOALS
EVALUATION
1. Measure the accuracy at which our model can predict the veracity of a
rumor before the first trusted verification.
2. Measure the contribution of each of the linguistic, user, and propagation
categories as a whole.
3. Measure the contributions of each of the 17 features individually.
4. Measure the accuracy of our model as a function of latency (i.e., time
elapsed since the beginning of a rumor).
67. FOUR MAIN GOALS
EVALUATION
Fact-checking is done via Wikipedia, Snopes.com, and FactCheck.org.
71. ROC CURVES
EVALUATION
x-axis: False Positive Rate = FP / (FP + TN)
y-axis: True Positive Rate = TP / (TP + FN)
‣ If the user is a financial-market professional, he needs most of the true rumors to assist in stock trading.
‣ He can choose the point at x = 0.6, y = 0.97.
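The two rates come straight from the confusion counts; a minimal Python helper (the counts below are made up so that the output reproduces the slide's example point):

```python
def roc_point(tp, fp, tn, fn):
    """One (FPR, TPR) point on the ROC curve from confusion counts."""
    fpr = fp / (fp + tn)   # x-axis: False Positive Rate
    tpr = tp / (tp + fn)   # y-axis: True Positive Rate
    return fpr, tpr

# Hypothetical counts chosen to land on the slide's x = 0.6, y = 0.97.
print(roc_point(tp=97, fp=60, tn=40, fn=3))  # (0.6, 0.97)
```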
72. ACCURACY VS. LATENCY
EVALUATION
1. The model reaches 75% accuracy right before trusted verification.
2. The model barely performs better than chance (54%) before 50% latency.
3. The propagation features do not contribute much until around 65% latency.
4. The early performance of the model seems to be fueled mostly by the linguistic and user features, which then plateau at around 55% latency, as the amount of information they can contribute to the model saturates.
73. TEMPORAL VS. NON-TEMPORAL MODELS
EVALUATION
‣ The performance of the non-temporal model
converges much faster than the temporal
model, which again can be attributed mostly to
the delayed effect(?) of the propagation
features in the temporal model.
‣ The non-temporal linguistic and user models
track their corresponding temporal models
fairly closely, with the linguistic models being
the most similar.
75. NEAR REAL-TIME VERACITY PREDICTION
EVALUATION
‣ The computational power required for computing the features is minimal; the real bottleneck is the Twitter API limit.
76. CONCLUSION
MY CONCLUSION
‣ I focus on fake news, so the dataset and the features will be different.
‣ The authors' model is a GMM-HMM (Gaussian Mixture Model - Hidden Markov Model).
‣ Change the validation method.
‣ Picture truth?
‣ The first step is to collect true and fake news.