RUMOR GAUGE: PREDICTING THE
VERACITY OF RUMORS ON
TWITTER
LAB
(M10517019)
PAPER FROM ACM TRANSACTIONS ON KNOWLEDGE DISCOVERY FROM DATA, VOL. 11, NO. 4, ARTICLE 50, JULY 2017
OUTLINE
‣ Common Sense
‣ Overview
‣ Dataset
‣ Three Types of Features
‣ Model
‣ Evaluation
COMMON SENSE
HOW DO WE COMPUTE THE DIFFERENCE BETWEEN TWO VECTORS?
‣ It’s easy: Euclidean distance.
‣ But what if the two vectors are not the same length?
COMMON SENSE
DYNAMIC TIME WARPING, DTW
‣ See the formula first.
‣ But I prefer to explain it step by step, with illustrations.
1. The left and bottom sides of the table each hold one set of numbers: the input vectors, which may be of unequal length.
2. The upper-right cell is the end point; the bottom-left cell is the starting point.
3. For each cell of the table, compute the distance (the absolute difference) between the two corresponding elements, then add the minimum cumulative distance among the cells of the immediately preceding stage.
4. In this example, the distance between the two vectors (which can be of different lengths) is 15.
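The steps above can be sketched as a short Python function. This is a minimal illustration of DTW with the absolute difference as the local cost, not the implementation used in the paper:

```python
def dtw(a, b):
    """Dynamic Time Warping distance between two sequences
    of possibly different lengths, using |x - y| as the local cost."""
    n, m = len(a), len(b)
    INF = float("inf")
    # D[i][j] = cost of the best warping path aligning a[:i] with b[:j]
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # add the minimum cumulative cost of the three predecessor cells
            D[i][j] = cost + min(D[i - 1][j],       # step down
                                 D[i][j - 1],       # step right
                                 D[i - 1][j - 1])   # diagonal
    return D[n][m]
```

Identical sequences get distance 0, and a sequence aligned with a stretched copy of itself (one element repeated) also gets 0, which is exactly what the warping is for.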
MAXIMUM LIKELIHOOD ESTIMATE, MLE
COMMON SENSE
‣ What is Likelihood?
(Diagram pairing reality with model, population/sample with data, and probability with likelihood.)
It assumes that the sample is drawn from approximately the same normal distribution as the model.
(Diagram: maximum likelihood estimation, in which a sample of data is used to estimate the parameters of the model.)
‣ 𝞱 represents the parameters of the normal distribution (𝛍, 𝛔²); in practice 𝞱 is just a set of numbers that, plugged into the density, produces a normal distribution.
‣ 𝓁 is the likelihood of producing this set of observations given 𝞱; the estimate is the 𝞱 most likely to have produced them.
THE MLE RECIPE
1. Create the likelihood function 𝓁(𝞱) from the set of samples (𝞱: the model parameters, ranging over some interval).
2. Take the natural log: the product ∏ becomes a sum ∑.
3. Take the extreme value: set the first-order derivative with respect to 𝞱 to zero.
4. (Optional) Make sure it is the maximum: take the second derivative with respect to 𝞱.
‣ The first derivative locates an extreme value.
‣ The second derivative determines whether it is a maximum or a minimum:
‣ less than zero means a maximum (that’s what we want);
‣ greater than zero means a minimum.
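The recipe can be illustrated on the simplest possible case, a biased coin (a Bernoulli rather than a normal model), where every step has a closed form:

```python
import math

def log_likelihood(theta, heads, tails):
    """Steps 1 + 2: likelihood of the sample under P(heads) = theta,
    already in natural-log form (the product has become a sum)."""
    return heads * math.log(theta) + tails * math.log(1 - theta)

# Step 3: d/dθ [H·ln θ + T·ln(1-θ)] = H/θ - T/(1-θ) = 0  →  θ̂ = H/(H+T)
heads, tails = 7, 3
theta_hat = heads / (heads + tails)          # 0.7

# Step 4 (optional): the second derivative, -H/θ² - T/(1-θ)², is always
# negative, so the extremum is a maximum; a grid search over θ agrees.
grid = [i / 100 for i in range(1, 100)]
best = max(grid, key=lambda t: log_likelihood(t, heads, tails))
```

The grid maximum lands on the same θ̂ = 0.7 that the derivative gives, confirming the closed form.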
EXPECTATION-MAXIMIZATION ALGORITHM, EM
COMMON SENSE
‣ Quoting the Wikipedia article:
‣ It is also a way of computing maximum likelihood estimates.
‣ What are latent variables?
The EM algorithm is used to find (locally) maximum likelihood parameters of a statistical model in cases
where the equations cannot be solved directly. Typically these models involve latent variables in
addition to unknown parameters and known data observations. That is, either missing values exist
among the data, or the model can be formulated more simply by assuming the existence of further
unobserved data points.
(Diagram: observed data points drawn from two groups, A and B.)
‣ What if we don’t know which source each observed data point came from?
‣ First decide how many Gaussian distributions there are (that is, the grouping).
‣ Then initialize the parameters randomly.
‣ Adjust the parameters until almost all points have reached the MLE.
(Flowchart, shown on this and the following slides: 𝜃 set = random; with the observed data, calculate the MLE of each (sample, 𝜃) pair; normalize the MLEs; calculate each coin’s expectation; overwrite the 𝜃 set; loop.)
‣ Assume there are N values of 𝜃, representing N Gaussian distributions.
‣ Each 𝜃 is the parameter set of one distribution.
‣ The A/B labels mark which coin each sample actually came from.
‣ We don’t know which samples came from coin A and which from coin B.
‣ We only know they are to be divided into N groups (N Gaussian distributions).
‣ For example, suppose we want to split the data into two distributions, 1 and 2.
‣ Normalize: take each 𝜃’s share of the total. This share is the probability that 𝞱ᵢ is the model parameter of this sample.
‣ Multiply each 𝜃’s share by the head count and tail count of the corresponding sample.
‣ Sum the expected values that belong to the same 𝜃.
‣ Calculate the probability of heads under each 𝜃.
‣ Overwrite each 𝜃 value and feed the new 𝜃 set into the next iteration.
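The coin walkthrough above is the classic two-coin EM problem, and the whole loop fits in a few lines. The trial data and starting values below are illustrative, not from the paper:

```python
def em_two_coins(trials, theta_a, theta_b, iters=20):
    """EM for two coins of unknown bias. Each trial is (heads, tails) from
    ONE of the coins, but which coin it was is the latent variable."""
    for _ in range(iters):
        # E-step: expected head/tail counts attributed to each coin
        a_h = a_t = b_h = b_t = 0.0
        for h, t in trials:
            la = theta_a ** h * (1 - theta_a) ** t   # likelihood under coin A
            lb = theta_b ** h * (1 - theta_b) ** t   # likelihood under coin B
            wa = la / (la + lb)                      # normalized share for A
            wb = 1.0 - wa
            a_h += wa * h; a_t += wa * t
            b_h += wb * h; b_t += wb * t
        # M-step: overwrite the θ set with the new MLEs, then loop
        theta_a = a_h / (a_h + a_t)
        theta_b = b_h / (b_h + b_t)
    return theta_a, theta_b

# Five trials of 10 flips each; starting guesses θA = 0.6, θB = 0.5.
trials = [(5, 5), (9, 1), (8, 2), (4, 6), (7, 3)]
ta, tb = em_two_coins(trials, 0.6, 0.5)
```

With this data the loop converges to roughly θA ≈ 0.80 and θB ≈ 0.52: the algorithm has separated the mostly-heads trials from the roughly-fair ones without ever being told which coin produced which trial.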
MARKOV MODEL, MM
COMMON SENSE
‣ S represents the states.
‣ N represents the number of states.
‣ A represents the transition probabilities.
‣ aᵢⱼ represents the probability of the transition from the i-th state to the j-th state.
‣ We can choose any state as a starting point.
‣ When drawing the diagram, we can add an imaginary start state s₀ whose outgoing probabilities total 1.
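A minimal sketch of the idea, assuming a made-up two-state transition matrix. Repeatedly applying the transitions to the start distribution (the imaginary s₀ above) drives it toward the chain’s stationary distribution:

```python
# Two states; A[i][j] is the transition probability from state i to state j.
A = [[0.7, 0.3],
     [0.4, 0.6]]

def step(dist, A):
    """Push a probability distribution over states one transition forward."""
    n = len(A)
    return [sum(dist[i] * A[i][j] for i in range(n)) for j in range(n)]

# Start in state 0 with probability 1 (the imaginary s0 pointing there).
dist = [1.0, 0.0]
for _ in range(50):
    dist = step(dist, A)
# dist is now very close to the stationary distribution (4/7, 3/7)
```

Each row of A sums to 1, so the distribution stays normalized at every step; for this matrix the fixed point solves π = πA, giving π = (4/7, 3/7).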
HIDDEN MARKOV MODEL, HMM
COMMON SENSE
‣ O represents the observations.
‣ Apart from the observed data, none of the other quantities are given.
‣ It answers the question: what is the probability of a particular model producing a particular sequence of observations?
‣ To evaluate, we use one of two algorithms: the forward algorithm or the backward algorithm (DO NOT confuse them with the forward-backward algorithm).
EVALUATION PROBLEM
COMMON SENSE
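A minimal sketch of the forward algorithm (not the paper’s implementation), using the conventional (π, A, B) parameterization. As a sanity check, the probabilities of all possible observation sequences of a fixed length must sum to 1:

```python
def forward(obs, pi, A, B):
    """Forward algorithm: probability that the HMM (pi, A, B) produced
    the observation sequence `obs`.
    pi[i]: start probability; A[i][j]: transition; B[i][o]: emission."""
    n = len(pi)
    # alpha[i]: probability of the prefix so far, ending in state i
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    for o in obs[1:]:
        alpha = [B[j][o] * sum(alpha[i] * A[i][j] for i in range(n))
                 for j in range(n)]
    return sum(alpha)

# A toy HMM with two states and two observation symbols (0 and 1).
pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.5, 0.5], [0.1, 0.9]]
p = forward([0, 1, 0], pi, A, B)   # probability of observing 0, 1, 0
```

Summing `forward` over all four length-2 observation sequences of this toy model returns 1, confirming the recursion.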
‣ It answers the question: given a sequence of observations and the model parameters, what is the most likely sequence of states that produced this observation sequence?
‣ For decoding we use the Viterbi algorithm.
DECODING PROBLEM
COMMON SENSE
VITERBI ALGORITHM
COMMON SENSE
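A minimal sketch of Viterbi decoding, assuming the same (π, A, B) notation as the evaluation problem above:

```python
def viterbi(obs, pi, A, B):
    """Viterbi decoding: the most likely hidden-state sequence for `obs`."""
    n = len(pi)
    # delta[i]: probability of the best path so far that ends in state i
    delta = [pi[i] * B[i][obs[0]] for i in range(n)]
    backptr = []                       # backpointers, one list per time step
    for o in obs[1:]:
        back, new = [], []
        for j in range(n):
            i_best = max(range(n), key=lambda i: delta[i] * A[i][j])
            new.append(delta[i_best] * A[i_best][j] * B[j][o])
            back.append(i_best)
        delta = new
        backptr.append(back)
    # backtrack from the best final state
    state = max(range(n), key=lambda i: delta[i])
    path = [state]
    for back in reversed(backptr):
        state = back[state]
        path.append(state)
    return path[::-1]

# A deterministic toy HMM: state 0 emits symbol 0, state 1 emits symbol 1,
# and the chain alternates states, so the decoded path mirrors the symbols.
path = viterbi([0, 1, 0],
               pi=[1.0, 0.0],
               A=[[0.0, 1.0], [1.0, 0.0]],
               B=[[1.0, 0.0], [0.0, 1.0]])   # [0, 1, 0]
```

The structure is the forward recursion with the sum replaced by a max, plus backpointers so the winning path can be recovered.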
‣ It answers the question: given a model structure and a sequence of observations, find the model parameters that best fit the data.
‣ For this problem we can use one of the following three approaches:
‣ MLE (maximum likelihood estimation)
‣ Viterbi training (DO NOT confuse it with Viterbi decoding)
‣ Baum-Welch, i.e., the forward-backward algorithm
LEARNING PROBLEM
COMMON SENSE
IN RUMOR GAUGE
Two models:
1. Expectation-Maximization Algorithm
2. Forward-Backward Algorithm
3. Unidentified Rumor
OVERVIEW
IN RUMOR GAUGE
DATASET
IN RUMOR GAUGE
DATASET
It was made certain that all of the rumors have at least 1,000 tweets; any rumor identified with fewer than 1,000 tweets was discarded.
IN RUMOR GAUGE
DATASET
‣ The red dots on the green block represent false tweets that were predicted to be true.
NEGATION
FEATURE
‣ Stanford NLP parser.
‣ Adding WordNet or ConceptNet as aids might work better.
Danqi Chen and Christopher D. Manning. 2014. A fast and accurate dependency parser using neural networks. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP). ACL, 740–750.
AVERAGE SOPHISTICATION AND FORMALITY OF TWEETS
FEATURE
‣ Vulgarity: the presence of vulgar words in the tweet (checked against online dictionaries).
‣ Abbreviations: the presence of abbreviations (such as b4 for before, jk for just kidding, and irl for in real life) in the tweet (checked against online dictionaries).
‣ Emoticons: the presence of emoticons in the tweet.
‣ Average word complexity: the average length of the words in the tweet.
‣ Sentence complexity: the grammatical complexity of the tweet, measured on its dependency parse tree.
AVERAGE SOPHISTICATION AND FORMALITY OF TWEETS
FEATURE
(Example dependency parse tree with depth = 5.)
RATIO OF TWEETS CONTAINING OPINION & INSIGHT
FEATURE
‣ Linguistic Inquiry and Word Count (LIWC).
‣ The authors presumably use the LIWC “insight” property.
RATIO OF INFERRING & TENTATIVE TWEETS
FEATURE
‣ Linguistic Inquiry and Word Count (LIWC).
‣ The authors presumably use the LIWC “tentat” property.
CONTROVERSIALITY
FEATURE
‣ The replies are run through a state-of-the-art Twitter sentiment classifier [Vosoughi et al. 2015], which classifies each as positive, negative, or neutral.
‣ Every conversation around a tweet is thus classified as positive, negative, or neutral.
Soroush Vosoughi, Helen Zhou, and Deb Roy. 2015. Enhanced Twitter sentiment classification using contextual information. In Proceedings of the 6th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis. ACL, 16–24.
ORIGINALITY
FEATURE
CREDIBILITY
FEATURE
‣ The credibility of a user is a binary feature: whether or not the user’s account has been officially verified by Twitter.
INFLUENCE
FEATURE
‣ Influence is measured simply by the user’s number of followers; presumably, the more followers a user has, the more influential he or she is.
ROLE
FEATURE
‣ Users with many followers act as broadcasters.
‣ Users with few followers act as recipients.
‣ This feature is not fully exploited in the article.
ENGAGEMENT
FEATURE
TIME-INFERRED DIFFUSION
FEATURE
‣ The tweet graphs do not show whom a user retweeted.
‣ This appears to be the authors’ proprietary technique, built with reference to Goel et al. [2012].
(Tweet graphs, before and after time-inferred diffusion.)
FRACTION OF LOW-TO-HIGH DIFFUSION (SIGNIFICANT)
FEATURE
FRACTION OF NODES IN LARGEST CONNECTED COMPONENT
FEATURE
%Nodes in LCC = 32/54 ≈ 0.59    %Nodes in LCC = 9/54 ≈ 0.17
AVERAGE DEPTH-TO-BREADTH RATIO
FEATURE
depth-to-breadth = 7/42 ≈ 0.16
depth-to-breadth = 12/42 ≈ 0.28
Average depth-to-breadth = (0.16 + 0.28) / 2 = 0.22
RATIO OF NEW USERS
FEATURE
RATIO OF ORIGINAL TWEETS
FEATURE
FRACTION OF TWEETS CONTAINING OUTSIDE LINKS
FEATURE
FRACTION OF ISOLATED NODES
FEATURE
%Isolated Nodes = 19/54 ≈ 0.35    %Isolated Nodes = 6/54 ≈ 0.11
ESTIMATED VERACITY
ESTIMATED CONFIDENCE
HIDDEN MARKOV MODEL, HMM
FOUR MAIN GOALS
EVALUATION
1. Measure the accuracy at which our model can predict the veracity of a
rumor before the first trusted verification.
2. Measure the contribution of each of the linguistic, user, and propagation
categories as a whole.
3. Measure the contributions of each of the 17 features individually.
4. Measure the accuracy of our model as a function of latency (i.e., time
elapsed since the beginning of a rumor).
Fact-checking via Wikipedia, Snopes.com, and FactCheck.org.
EVALUATION METHOD?
EVALUATION
‣ Leave-one-out cross-validation (LOOCV).
‣ Number of samples = 209 rumors.
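The evaluation loop can be sketched as follows; `train_and_predict` is a hypothetical stand-in for training Rumor Gauge on N - 1 rumors and predicting the held-out one, and the toy nearest-neighbour classifier in the usage example is mine, not the paper's:

```python
def loocv_accuracy(samples, labels, train_and_predict):
    """Leave-one-out cross-validation: for each of the N samples
    (N = 209 rumors in the paper), train on the other N - 1 and
    test on the single held-out sample."""
    correct = 0
    for i in range(len(samples)):
        train_x = samples[:i] + samples[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        pred = train_and_predict(train_x, train_y, samples[i])
        correct += int(pred == labels[i])
    return correct / len(samples)

# Toy usage: a 1-nearest-neighbour stand-in classifier on 1-D points.
def nearest_label(train_x, train_y, x):
    i = min(range(len(train_x)), key=lambda k: abs(train_x[k] - x))
    return train_y[i]

acc = loocv_accuracy([0, 1, 10, 11],
                     ["true", "true", "false", "false"],
                     nearest_label)
```

On the well-separated toy data every held-out point is classified correctly, so `acc` is 1.0; with real rumors the per-fold predictions would of course differ.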
ACCURACY
EVALUATION
ROC CURVE
EVALUATION
x-axis: false positive rate = FP/(FP + TN)
y-axis: true positive rate = TP/(TP + FN)
‣ If the user is, say, a financial-market professional, he needs to catch most of the true rumors to support stock trading.
‣ He can then choose the operating point x = 0.6, y = 0.97.
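Each ROC point comes straight from the confusion counts; the counts below are hypothetical numbers that happen to land on the (0.6, 0.97) operating point mentioned above:

```python
def roc_point(tp, fp, tn, fn):
    """One point on the ROC curve from confusion-matrix counts."""
    fpr = fp / (fp + tn)   # x-axis: false positive rate = FP / (FP + TN)
    tpr = tp / (tp + fn)   # y-axis: true positive rate = TP / (TP + FN)
    return fpr, tpr

# Hypothetical counts landing at the slide's example operating point.
fpr, tpr = roc_point(tp=97, fp=60, tn=40, fn=3)   # (0.6, 0.97)
```

Sweeping the decision threshold re-partitions the counts and traces out the rest of the curve.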
ACCURACY VS. LATENCY
EVALUATION
1. The model reaches 75% accuracy right before trusted verification.
2. The model barely performs better than chance (54%) before 50% latency.
3. The propagation features did not contribute much until around 65% latency.
4. The early performance of the model seems to be fueled mostly by the linguistic and user features, which then plateau at around 55% latency, as the amount of information they can contribute to the model saturates.
TEMPORAL VS. NON-TEMPORAL MODELS
EVALUATION
‣ The performance of the non-temporal model converges much faster than that of the temporal model, which again can be attributed mostly to the delayed effect of the propagation features in the temporal model.
‣ The non-temporal linguistic and user models track their corresponding temporal models fairly closely, with the linguistic models being the most similar.
DELAYED EFFECT
EVALUATION
NEAR REAL-TIME VERACITY PREDICTION
EVALUATION
‣ The computational power required to compute the features is minimal; the real bottleneck is the Twitter API rate limit.
CONCLUSION
MY CONCLUSION
‣ I focus on fake news, so the dataset and the features will be different.
‣ The authors’ model is a GMM-HMM (Gaussian Mixture Model - Hidden Markov Model).
‣ Change the validation method.
‣ What about the truthfulness of pictures?
‣ The first step is to collect true and fake news.
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
 
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping Elbows
 
Vulnerability_Management_GRC_by Sohang Sengupta.pptx
Vulnerability_Management_GRC_by Sohang Sengupta.pptxVulnerability_Management_GRC_by Sohang Sengupta.pptx
Vulnerability_Management_GRC_by Sohang Sengupta.pptx
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 

HMM & R & FK

  • 1. RUMOR GAUGE: PREDICTING THE VERACITY OF RUMORS ON TWITTER LAB (M10517019) PAPER FROM ACM TRANSACTIONS ON KNOWLEDGE DISCOVERY FROM DATA, VOLUME 11, NO. 4, ARTICLE 50, PUBLICATION DATE: JULY 2017 1
  • 2. OUTLINE ‣ Common Sense ‣ Overview ‣ Dataset ‣ Three Type Of Features ‣ Model ‣ Evaluation 2
  • 3. COMMON SENSE HOW TO COMPUTE THE DIFFERENCE BETWEEN TWO VECTORS ? ‣ It’s easy: Euclidean distance. ‣ But what if the lengths of the two vectors are not equal? 3
  • 4. COMMON SENSE DYNAMIC TIME WARPING , DTW ( ) ‣ See the formula first. ‣ But I prefer to explain it step by step with illustrations. 4
  • 5. 1. The left and bottom sides of the table each hold one of the input sequences; the two input vectors may be of unequal length. DYNAMIC TIME WARPING , DTW ( ) COMMON SENSE 5
  • 6. 2. The bottom left is the starting point; the upper right is the end point. DYNAMIC TIME WARPING , DTW ( ) COMMON SENSE 5
  • 7. DYNAMIC TIME WARPING , DTW ( ) COMMON SENSE 3. For each cell of the table, take the distance (absolute difference) between the two corresponding values along the two dimensions, then add the minimum accumulated distance among the cells of the “only previous stage”. 5
  • 8. DYNAMIC TIME WARPING , DTW ( ) COMMON SENSE 4. In this example, the distance between the two vectors (which may be of different lengths) is 15. 5
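The four DTW steps above can be sketched in a few lines of code. This is a minimal illustration under my own assumptions, not the paper's implementation; the input vectors here are hypothetical, not the ones from the slide's table.

```python
def dtw(a, b):
    """Dynamic Time Warping distance between two sequences of
    possibly different lengths, using |x - y| as the local cost."""
    n, m = len(a), len(b)
    INF = float("inf")
    # D[i][j] = minimal accumulated cost of aligning a[:i] with b[:j];
    # the (0, 0) corner is the starting point, (n, m) the end point.
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # add the minimum distance of the "only previous stage"
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]

print(dtw([1, 2, 3], [1, 2, 2, 3]))  # 0.0: the repeated 2 aligns at no extra cost
```

Note that, unlike Euclidean distance, `dtw` accepts vectors of different lengths, which is exactly why the paper can compare time series of unequal duration.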
  • 9. MAXIMUM LIKELIHOOD ESTIMATE , MLE ( , ) COMMON SENSE ‣ What is Likelihood? REALITY MODEL POPULATION DATA SAMPLE PROBABILITY LIKELIHOOD 6
  • 10. MAXIMUM LIKELIHOOD ESTIMATE , MLE ( , ) COMMON SENSE It assumes that the sample follows approximately the same normal distribution as the model. 7
  • 11. MAXIMUM LIKELIHOOD ESTIMATE , MLE ( , ) COMMON SENSE Likelihood Estimate MAXIMUM LIKELIHOOD SAMPLE 1 DATA
 MODEL 8
  • 12. MAXIMUM LIKELIHOOD ESTIMATE , MLE ( , ) COMMON SENSE ‣ 𝞱 represents the normal distribution parameters (𝛍, 𝛔²); 𝞱 is actually a concrete value that, multiplied through a series of probabilities, produces a normal distribution. ‣ 𝓁 is the likelihood that this set of observations is produced given 𝞱. estimate 9
  • 13. MAXIMUM LIKELIHOOD ESTIMATE , MLE ( , ) COMMON SENSE CREATE A FUNCTION. ADD THE NATURAL LOG. TAKE THE EXTREME VALUE. MAKE SURE IT IS THE MAXIMUM VALUE. ( OPTIONAL ) Likelihood Model parameter A set of samples. The interval of several parameters. 10
  • 14. MAXIMUM LIKELIHOOD ESTIMATE , MLE ( , ) COMMON SENSE CREATE A FUNCTION. ADD THE NATURAL LOG. TAKE THE EXTREME VALUE. MAKE SURE IT IS THE MAXIMUM VALUE. ( OPTIONAL ) Natural Log ∑∏ 11
  • 15. COMMON SENSE CREATE A FUNCTION. ADD THE NATURAL LOG. TAKE THE EXTREME VALUE. MAKE SURE IT IS THE MAXIMUM VALUE. ( OPTIONAL ) First order derivative of "THETA" MAXIMUM LIKELIHOOD ESTIMATE , MLE ( , ) ‣ Take the first-order derivative to find the extreme value. 12
  • 16. MAXIMUM LIKELIHOOD ESTIMATE , MLE ( , ) COMMON SENSE CREATE A FUNCTION. ADD THE NATURAL LOG. TAKE THE EXTREME VALUE. MAKE SURE IT IS THE MAXIMUM VALUE. ( OPTIONAL ) The second derivative of “THETA". ‣ Take the first-order derivative to find the extreme value. ‣ The second derivative determines whether it is a maximum or a minimum. ‣ Less than zero indicates a maximum, which is what we want. ‣ Greater than zero indicates a minimum. 13
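As a concrete illustration of the four steps, take coin flips with unknown heads probability 𝞱 (a worked sketch of my own, not from the paper): the likelihood is 𝞱^H(1−𝞱)^T, the natural log turns the product into a sum, the first derivative locates the extreme value, and the second derivative confirms it is a maximum.

```python
import math

def log_likelihood(theta, heads, tails):
    # Step 1 + 2: likelihood theta^H * (1 - theta)^T, then the natural log
    return heads * math.log(theta) + tails * math.log(1 - theta)

heads, tails = 7, 3  # hypothetical observations

# Step 3: d/d(theta) [H ln(theta) + T ln(1 - theta)] = H/theta - T/(1 - theta) = 0
# gives the extreme value theta* = H / (H + T)
theta_hat = heads / (heads + tails)
print(theta_hat)  # 0.7

# Step 4 (optional): second derivative -H/theta^2 - T/(1 - theta)^2 < 0,
# so the extreme value is indeed a maximum
second = -heads / theta_hat**2 - tails / (1 - theta_hat)**2
print(second < 0)  # True

# theta_hat beats any other candidate in log-likelihood
print(log_likelihood(theta_hat, heads, tails) > log_likelihood(0.5, heads, tails))  # True
```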
  • 17. EXPECTATION MAXIMUM ALGORITHM , EM ( ) COMMON SENSE ‣ Quoting from Wikipedia ‣ It is also used to compute maximum likelihood estimates. ‣ Latent variables? The EM algorithm is used to find (locally) maximum likelihood parameters of a statistical model in cases where the equations cannot be solved directly. Typically these models involve latent variables in addition to unknown parameters and known data observations. That is, either missing values exist among the data, or the model can be formulated more simply by assuming the existence of further unobserved data points. 14
  • 19. EXPECTATION MAXIMUM ALGORITHM , EM ( ) COMMON SENSE GROUP A GROUP B OBSERVED DATA 15
  • 20. EXPECTATION MAXIMUM ALGORITHM , EM ( ) COMMON SENSE OBSERVED DATA ‣ If we don’t know the source? 16
  • 21. EXPECTATION MAXIMUM ALGORITHM , EM ( ) COMMON SENSE OBSERVED DATA First decide on a number of Gaussian distributions (that is, the grouping), then randomly adjust the parameters until almost all points have reached the MLE. 17
  • 22. COMMON SENSE 𝜃 set = Randomly , Observed Data Calculate the MLE set of each sample and 𝜃 pair. EXPECTATION MAXIMUM ALGORITHM , EM ( ) Normalize the set of MLE. Overwrite the 𝜃 set Loop Loop ‣ Assume there are N 𝜃s representing N Gaussian distributions. ‣ Each 𝜃 is one distribution’s parameter. Calculate each coin’s Expectation. B A A B A 18
  • 23. COMMON SENSE 𝜃 set = Randomly , Observed Data Calculate the MLE set of each sample and 𝜃 pair. EXPECTATION MAXIMUM ALGORITHM , EM ( ) Normalize the set of MLE. Overwrite the 𝜃 set Loop Loop ‣ We don’t know which samples come from coin A or coin B. ‣ We only know they should be divided into N groups ( N Gaussian distributions ). Calculate each coin’s Expectation. ? ? ? ? ? 19
  • 24. COMMON SENSE 𝜃 set = Randomly , Observed Data Calculate the MLE set of each sample and 𝜃 pair. EXPECTATION MAXIMUM ALGORITHM , EM ( ) Normalize the set of MLE. Overwrite the 𝜃 set Loop Loop ‣ For example, suppose we want to split the data into two distributions, 1 and 2. Calculate each coin’s Expectation. 20
  • 25. COMMON SENSE 𝜃 set = Randomly , Observed Data Calculate the MLE set of each sample and 𝜃 pair. Calculate each coin’s Expectation. EXPECTATION MAXIMUM ALGORITHM , EM ( ) Normalize the set of MLE. Overwrite the 𝜃 set. Loop Loop ‣ Compute each 𝜃’s percentage of the sum. This represents the probability that 𝞱i is the model parameter of this sample. 21
  • 26. COMMON SENSE 𝜃 set = Randomly , Observed Data Calculate the MLE set of each sample and 𝜃 pair. Calculate each coin’s Expectation. EXPECTATION MAXIMUM ALGORITHM , EM ( ) Normalize the set of MLE. Overwrite the 𝜃 set. Loop Loop ‣ Multiply each 𝜃’s percentage by the H_count and T_count of the corresponding sample. ‣ Sum the expected values belonging to the same 𝜃. 22
  • 27. COMMON SENSE 𝜃 set = Randomly , Observed Data Calculate the MLE set of each sample and 𝜃 pair. Calculate each coin’s Expectation. EXPECTATION MAXIMUM ALGORITHM , EM ( ) Normalize the set of MLE. Overwrite the 𝜃 set. Loop Loop ‣ Calculate the probability of H under each 𝜃. ‣ Overwrite each 𝜃’s value. ‣ Feed the new 𝜃 into the next iteration. 23
  • 28. COMMON SENSE 𝜃 set = Randomly , Observed Data Calculate the MLE set of each sample and 𝜃 pair. Calculate each coin’s Expectation. EXPECTATION MAXIMUM ALGORITHM , EM ( ) Normalize the set of MLE. Overwrite the 𝜃 set. Loop Loop 24
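The loop on these slides (randomly initialize the 𝜃 set, compute per-sample likelihoods, normalize them into percentages, take expectations, overwrite the 𝜃 set) can be sketched for the two-coin case. This is a minimal illustration of the idea; the samples and starting values below are hypothetical, not the slide's data.

```python
def em_two_coins(samples, theta_a, theta_b, iters=20):
    """EM for two coins: each sample is (heads, tails) drawn from an
    unknown coin A or B with heads probabilities theta_a, theta_b."""
    for _ in range(iters):
        a_h = a_t = b_h = b_t = 0.0
        for h, t in samples:
            # E-step: likelihood of the sample under each theta ...
            la = theta_a ** h * (1 - theta_a) ** t
            lb = theta_b ** h * (1 - theta_b) ** t
            # ... normalized into the percentage of each theta in the sum
            wa = la / (la + lb)
            wb = 1.0 - wa
            # multiply each percentage by the sample's H_count and T_count,
            # and add up the expected counts belonging to the same theta
            a_h += wa * h; a_t += wa * t
            b_h += wb * h; b_t += wb * t
        # M-step: probability of H under each theta; overwrite the theta set
        theta_a = a_h / (a_h + a_t)
        theta_b = b_h / (b_h + b_t)
    return theta_a, theta_b

samples = [(5, 5), (9, 1), (8, 2), (4, 6), (7, 3)]  # hypothetical coin tosses
theta_a, theta_b = em_two_coins(samples, 0.6, 0.5)
print(theta_a, theta_b)  # theta_a converges near 0.80, theta_b near 0.52
```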
  • 29. MARKOV MODEL , MM ( ) COMMON SENSE ‣ S represents the states. ‣ N represents the number of states. ‣ A represents the transition probabilities. ‣ aᵢⱼ represents the transition probability from the i-th state to the j-th state. 25
  • 30. MARKOV MODEL , MM ( ) COMMON SENSE 25
  • 31. MARKOV MODEL , MM ( ) COMMON SENSE ‣ We can choose any state as the starting point. ‣ When drawing the diagram, we can add an imaginary start state s₀ whose outgoing probabilities sum to 1. 25
  • 32. HIDDEN MARKOV MODEL , HMM ( ) ‣ O represents the observations. ‣ Apart from the observed data, nothing else is given (the states are hidden). COMMON SENSE 26
  • 33. ‣ It answers the question: What is the probability of a particular model producing a particular set of observations? ‣ To evaluate, we use one of two algorithms: the forward algorithm or the backward algorithm (DO NOT confuse them with the forward-backward algorithm). EVALUATION PROBLEM COMMON SENSE 27
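A minimal sketch of the forward algorithm on a toy HMM (the classic healthy/fever example, assumed here for illustration; it is not the paper's rumor model):

```python
def forward(obs, states, start_p, trans_p, emit_p):
    """P(observation sequence | model), summed over all state paths."""
    # alpha[s] = P(observations so far, current state = s)
    alpha = {s: start_p[s] * emit_p[s][obs[0]] for s in states}
    for o in obs[1:]:
        alpha = {s: emit_p[s][o] * sum(alpha[p] * trans_p[p][s] for p in states)
                 for s in states}
    return sum(alpha.values())

states = ("Healthy", "Fever")
start_p = {"Healthy": 0.6, "Fever": 0.4}
trans_p = {"Healthy": {"Healthy": 0.7, "Fever": 0.3},
           "Fever":   {"Healthy": 0.4, "Fever": 0.6}}
emit_p = {"Healthy": {"normal": 0.5, "cold": 0.4, "dizzy": 0.1},
          "Fever":   {"normal": 0.1, "cold": 0.3, "dizzy": 0.6}}

p = forward(("normal", "cold", "dizzy"), states, start_p, trans_p, emit_p)
print(p)  # 0.03628: probability of the observation sequence under this model
```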
  • 34. ‣ It answers the question: Given a sequence of observations and the model parameters, what is the most likely sequence of states that produced this observation sequence? ‣ For decoding we use the Viterbi algorithm. DECODING PROBLEM COMMON SENSE 28
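Viterbi decoding differs from the forward algorithm only in taking a max (and remembering the best predecessor) instead of a sum. A sketch on the same assumed healthy/fever toy model:

```python
def viterbi(obs, states, start_p, trans_p, emit_p):
    """Most likely state sequence for an observation sequence."""
    # V[s] = (probability of the best path ending in state s, that path)
    V = {s: (start_p[s] * emit_p[s][obs[0]], [s]) for s in states}
    for o in obs[1:]:
        new_V = {}
        for s in states:
            # best previous state: max over p of P(best path to p) * a_ps
            prob, prev = max((V[p][0] * trans_p[p][s], p) for p in states)
            new_V[s] = (prob * emit_p[s][o], V[prev][1] + [s])
        V = new_V
    return max(V.values())  # (probability, state path)

states = ("Healthy", "Fever")
start_p = {"Healthy": 0.6, "Fever": 0.4}
trans_p = {"Healthy": {"Healthy": 0.7, "Fever": 0.3},
           "Fever":   {"Healthy": 0.4, "Fever": 0.6}}
emit_p = {"Healthy": {"normal": 0.5, "cold": 0.4, "dizzy": 0.1},
          "Fever":   {"normal": 0.1, "cold": 0.3, "dizzy": 0.6}}

prob, path = viterbi(("normal", "cold", "dizzy"), states, start_p, trans_p, emit_p)
print(path)  # ['Healthy', 'Healthy', 'Fever']
```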
  • 36. ‣ It answers the question: Given a model structure and a series of observations, find the model parameters that best fit the data. ‣ For this problem we can use the following 3 algorithms: ‣ MLE ( maximum likelihood estimation ) ‣ Viterbi training ( DO NOT confuse with Viterbi decoding ) ‣ Baum-Welch = forward-backward algorithm LEARNING PROBLEM COMMON SENSE 30
  • 37. IN RUMOR GAUGE Two models 1. Expectation-Maximum Algorithm 2. Forward or Backward Algorithm 3. Unidentified Rumor OVERVIEW 31
  • 38. IN RUMOR GAUGE DATASET Two models 1. Expectation-Maximum Algorithm 2. Forward-Backward Algorithm 3. Unidentified Rumor 32
  • 39. IN RUMOR GAUGE DATASET Two models 1. Expectation-Maximum Algorithm 2. Forward-Backward Algorithm 3. Unidentified Rumor 33
  • 40. IN RUMOR GAUGE DATASET Two models 1. Expectation-Maximum Algorithm 2. Forward-Backward Algorithm 3. Unidentified Rumor 34
  • 41. IN RUMOR GAUGE DATASET Two models 1. Expectation-Maximum Algorithm 2. Forward-Backward Algorithm 3. Unidentified Rumor It was made certain that all of the rumors have at least 1,000 tweets. Any rumor that was identified with less than 1,000 tweets was discarded. 35
  • 42. IN RUMOR GAUGE DATASET Two models 1. Expectation-Maximum Algorithm 2. Forward-Backward Algorithm 3. Unidentified Rumor 36
  • 43. IN RUMOR GAUGE DATASET Two models 1. Expectation-Maximum Algorithm 2. Forward-Backward Algorithm 3. Unidentified Rumor ‣ The red dot on the green block represents the false tweet that is predicted to be true. 37
  • 44. Two models 1. Expectation-Maximum Algorithm 2. Forward-Backward Algorithm 3. Unidentified RumorNEGATION FEATURE ‣ Stanford NLP parser ‣ With WordNet, ConceptNet aids may be better. Danqi Chen and Christopher D. Manning. 2014. A fast and accurate dependency parser using neural net-works. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP). ACL, 740–750. 38
  • 45. Two models 1. Expectation-Maximum Algorithm 2. Forward-Backward Algorithm 3. Unidentified RumorAVERAGE SOPHISTICATION AND FORMALITY OF TWEETS ‣ Vulgarity : The presence of vulgar words in the tweet. ‣ Abbreviations : The presence of abbreviations (such as b4 for before, jk for just kidding and irl for in real life) in the tweet. ‣ Emoticons : The presence of emoticons in the tweet. ‣ Average word complexity : Average length of words in the tweet. ‣ Sentence complexity : The grammatical complexity of the tweet. Online Dictionaries Dependency parse tree FEATURE 39
  • 46. Two models 1. Expectation-Maximum Algorithm 2. Forward-Backward Algorithm 3. Unidentified RumorAVERAGE SOPHISTICATION AND FORMALITY OF TWEETS Depth = 5 FEATURE 40
  • 47. Two models 1. Expectation-Maximum Algorithm 2. Forward-Backward Algorithm 3. Unidentified RumorRATIO OF TWEETS CONTAINING OPINION & INSIGHT ‣ Linguistic Inquiry and Word Count, LIWC ‣ The author should want this property (click here search “insight”) FEATURE 41
  • 48. Two models 1. Expectation-Maximum Algorithm 2. Forward-Backward Algorithm 3. Unidentified RumorRATIO OF INFERRING & TENTATIVE TWEETS ‣ Linguistic Inquiry and Word Count, LIWC ‣ The author should want this property (click here search “tentat”) FEATURE 42
  • 49. Two models 1. Expectation-Maximum Algorithm 2. Forward-Backward Algorithm 3. Unidentified RumorCONTROVERSIALITY ‣ These replies are then run through a state-of-the-art Twitter sentiment classifier [Vosoughi et al. 2015], which classifies them as either positive, negative, or neutral. ‣ All conversations for a tweet are classified as positive, negative, neutral. Soroush Vosoughi, Helen Zhou, and Deb Roy. 2015. Enhanced Twitter sentiment classification using contextual information. In Proceedings of the 6th Workshop on Computational Approaches to Sub-jectivity, Sentiment and Social Media Analysis. Association for Computational Linguistics, 16–24. FEATURE 43
  • 50. Two models 1. Expectation-Maximum Algorithm 2. Forward-Backward Algorithm 3. Unidentified RumorORIGINALITY FEATURE 44
  • 51. Two models 1. Expectation-Maximum Algorithm 2. Forward-Backward Algorithm 3. Unidentified RumorCREDIBILITY ‣ The credibility of a user is a binary feature measured by whether the user’s account has been officially verified by Twitter or not. FEATURE 45
  • 52. Two models 1. Expectation-Maximum Algorithm 2. Forward-Backward Algorithm 3. Unidentified RumorINFLUENCE ‣ Influence is measured simply by the number of followers of a user. Presumably, the more followers a user has, the more influential he or she is. FEATURE 46
  • 53. Two models 1. Expectation-Maximum Algorithm 2. Forward-Backward Algorithm 3. Unidentified RumorROLE ‣ Users with a high number of followers act as broadcasters. ‣ Users with a low number of followers act as recipients. This feature is not fully utilized in the article. FEATURE 47
  • 54. Two models 1. Expectation-Maximum Algorithm 2. Forward-Backward Algorithm 3. Unidentified RumorENGAGEMENT FEATURE 48
  • 55. Two models 1. Expectation-Maximum Algorithm 2. Forward-Backward Algorithm 3. Unidentified RumorTIME-INFERRED DIFFUSION ‣ The tweet graphs do not show whom a user retweeted. ‣ I think this tool is their proprietary technology, built with reference to Goel et al.'s article [2012]. Tweets' Graphs Tweets' Graphs FEATURE 49
  • 56. Two models 1. Expectation-Maximum Algorithm 2. Forward-Backward Algorithm 3. Unidentified RumorFRACTION OF LOW-TO-HIGH DIFFUSION(SIGNIFICANT) FEATURE 50
  • 59. Two models 1. Expectation-Maximum Algorithm 2. Forward-Backward Algorithm 3. Unidentified RumorFRACTION OF NODES IN LARGEST CONNECTED COMPONENT %Nodes in LCC = 32 / 54 = 0.6 %Nodes in LCC = 9 / 54 = 0.16 FEATURE 51
  • 60. Two models 1. Expectation-Maximum Algorithm 2. Forward-Backward Algorithm 3. Unidentified RumorAVERAGE DEPTH TO BREADTH RATIO FEATURE %depth-to-breadth = 7 /42 = 0.16 %depth-to-breadth = 12 /42 = 0.28 Average %depth-to-breadth = (0.16 + 0.28) / 2 = 0.22 52
  • 61. Two models 1. Expectation-Maximum Algorithm 2. Forward-Backward Algorithm 3. Unidentified RumorRATIO OF NEW USERS FEATURE 53
  • 62. Two models 1. Expectation-Maximum Algorithm 2. Forward-Backward Algorithm 3. Unidentified RumorRATIO OF ORIGINAL TWEETS FEATURE 54
  • 63. Two models 1. Expectation-Maximum Algorithm 2. Forward-Backward Algorithm 3. Unidentified RumorFRACTION OF TWEETS CONTAINING OUTSIDE LINKS FEATURE 55
  • 64. Two models 1. Expectation-Maximum Algorithm 2. Forward-Backward Algorithm 3. Unidentified RumorFRACTION OF ISOLATED NODES FEATURE %Isolated Nodes = 19/54 = 0.35 %Isolated Nodes = 6/54 = 0.11 56
  • 66. FOUR MAIN GOALS EVALUATION 1. Measure the accuracy at which our model can predict the veracity of a rumor before the first trusted verification. 2. Measure the contribution of each of the linguistic, user, and propagation categories as a whole. 3. Measure the contributions of each of the 17 features individually. 4. Measure the accuracy of our model as a function of latency (i.e., time elapsed since the beginning of a rumor). 58
  • 67. FOUR MAIN GOALS EVALUATION 1. Measure the accuracy at which our model can predict the veracity of a rumor before the first trusted verification. 2. Measure the contribution of each of the linguistic, user, and propagation categories as a whole. 3. Measure the contributions of each of the 17 features individually. 4. Measure the accuracy of our model as a function of latency (i.e., time elapsed since the beginning of a rumor). Fact-checking by Wikipedia, Snopes.com, FactCheck.org. 58
  • 69. EVALUATION METHOD ? EVALUATION ‣ Leave-one-out cross-validation, LOOCV ( ) ‣ Number of samples = 209 rumors. 59
  • 71. ROC CURVE EVALUATION False positive rate = FP/(FP + TN); true positive rate = TP/(TP + FN) ‣ If the user is a financial market official, he needs most of the true rumors to assist in stock trading. ‣ He can choose the operating point x = 0.6, y = 0.97. 61
  • 72. ACCURACY VS. LATENCY EVALUATION 1. The model reaches 75% accuracy right before trusted verification. 2. The model barely performs better than chance (54%) before 50% latency. 3. The propagation features did not contribute much until around 65% latency. 4. The early performance of the model seems to be fueled mostly by the linguistic and user features, which then plateau at around 55% latency, as the amount of information they can contribute to the model saturates. 62
  • 73. TEMPORAL VS. NON-TEMPORAL MODELS EVALUATION ‣ The performance of the non-temporal model converges much faster than the temporal model, which again can be attributed mostly to the delayed effect(?) of the propagation features in the temporal model. ‣ The non-temporal linguistic and user models track their corresponding temporal models fairly closely, with the linguistic models being the most similar. 63
  • 75. NEAR REAL-TIME VERACITY PREDICTION EVALUATION ‣ The computational power required for computing the features is minimal; the real bottleneck is the Twitter API rate limit. 65
  • 76. CONCLUSION MY CONCLUSION ‣ I focus on fake news, so the dataset and the features will be different. ‣ The authors' model is a GMM-HMM ( Gaussian Mixture Model - Hidden Markov Model ). ‣ Change the validation method. ‣ Picture truth? ‣ The first step is to collect true and fake news. 66