Credibility Ranking of Tweets during High Impact Events


Aditi Gupta & Ponnurangam Kumaraguru

PSOSM@WWW, April 17, 2012

precog.iiitd.edu.in
IIIT-Delhi

Problem Motivation

Tweets posted during a news event mix three kinds of content:
• Information
• Opinion
• Spam

Outline

• Research statement
• Architecture
• Data collection
• Analysis
• Results
• Implementation
• Future direction

Research Statement

• Identify parameters that affect the credibility of content on Twitter
• Develop a semi-automated algorithm to assess the credibility of tweets

Terminology

[Figure: annotated screenshot of a tweet and a Twitter user profile]
• Tweet: a status update of up to 140 characters
• Hashtag
• Retweet
• URL
• @-mentions
• User profile: user name, @screen_name, followers, tweets

Credibility

• “The quality of being trusted and believed in.”
• In this research:
  – We assess the credibility of the information contained in a tweet (message) posted by a user on Twitter.
  – A tweet is said to contain credible information about a news event if a reader trusts or believes the information in the tweet to be correct / true.

News on Twitter

Topics on Twitter:
• News events (e.g. #Irene, #Libyacrisis), whose tweets carry either
  – Credible information, or
  – Non-credible information: fake news / rumors / spam / personal opinions
• Chit-chat (e.g. #nothingwrongwith, #goodmorningtwitter)

Our Contributions

• 30% of tweets provide information (17% credible information), and 14% are spam
• Features identified through linear logistic regression:
  – Content based: number of unique characters, swear words, pronouns and emoticons
  – User based: number of followers and length of username
• We present an automated algorithm (supervised machine learning and relevance feedback) to assess the credibility of tweets
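To illustrate what fitting a linear logistic regression on such features involves, here is a minimal sketch in plain Python: it models P(credible | x) = sigmoid(w · x + b) and fits w and b by batch gradient descent on the mean log-loss. This is not the authors' code; the two features (swear-word count and log10 of follower count), the toy labels, and the function names are invented for illustration. A real analysis would use a statistics package to obtain coefficient significance.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Fit P(credible | x) = sigmoid(w . x + b) by batch gradient descent on mean log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            # Prediction error for one example: sigmoid(w . x + b) - label.
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(w, b, x):
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Toy data (invented): [number of swear words, log10(followers + 1)],
# label 1 = annotated credible, 0 = not credible.
X = [[0, 4.0], [0, 3.0], [0, 3.5], [2, 1.0], [1, 0.5], [3, 1.5]]
y = [1, 1, 1, 0, 0, 0]
w, b = fit_logistic(X, y)
print("weights:", w, "bias:", b)
```

The sign of each learned weight indicates whether a feature pushes a tweet toward or away from being credible; on this toy data the swear-word weight ends up negative and the follower weight positive.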

Data Statistics

Statistic | Value
Total tweets | 35,748,136
Total unique users | 6,877,320
Tweets with URLs | 4,973,457
Number of singleton tweets | 22,481,898
Number of retweets / replies | 13,266,238
Start date | 12th July, 2011
End date | 30th August, 2011

• High impact events:
  – More than 25K tweets
  – More than 48 hours in trending topics
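The two selection criteria above amount to a simple predicate over each candidate event. A minimal sketch, assuming each event is summarised by its tweet count and the number of hours its topic spent in Twitter's trending topics; the `Event` type, its fields, and the example events are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Event:
    name: str
    num_tweets: int
    hours_trending: float  # hypothetical field: time the topic spent in trending topics


def is_high_impact(event, min_tweets=25_000, min_hours=48):
    """True if the event has more than 25K tweets and trended for more than 48 hours."""
    return event.num_tweets > min_tweets and event.hours_trending > min_hours


# Invented example events; the slides do not report trending durations.
candidates = [
    Event("event A", 30_000, 72.0),
    Event("event B", 30_000, 12.0),
    Event("event C", 20_000, 96.0),
]
print([e.name for e in candidates if is_high_impact(e)])
```

Both thresholds are strict ("more than"), so an event with exactly 25,000 tweets would not qualify.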


Data Statistics

Event | Tweets | Trending topics
UK Riots | 542,685 | #ukriots, #londonriots, #prayforlondon
Libya Crisis | 389,506 | libya, tripoli
Earthquake in Virginia | 277,604 | #earthquake, Earthquake in SF
JanLokPal Bill Agitation | 182,692 | Anna Hazare, #janlokpal, #anna
Apple CEO Steve Jobs resigns | 158,816 | Steve Jobs, Tim Cook, Apple CEO
US Downgrading | 148,047 | S&P, AAA to AA
Hurricane Irene | 90,237 | Hurricane Irene, Tropical Storm Irene
Google acquires Motorola Mobility | 68,527 | Google, Motorola Mobility
News of the World Scandal | 67,602 | Rupert Murdoch, #murdoch
Abercrombie & Fitch stocks drop | 54,763 | Abercrombie & Fitch, A&F
Muppets Bert and Ernie were gay | 52,401 | Bert and Ernie
Indiana State Fair Tragedy | 49,924 | Indiana State Fair
Mumbai Blast, 2011 | 32,156 | #mumbaiblast, Dadar, #needhelp
New Facebook Messenger | 28,206 | Facebook Messenger

Architecture


Human Annotation

• For each tweet, annotators chose one of:
  – The tweet contains information about the event. Rate the credibility of the information present:
    • Definitely Credible
    • Seems Credible
    • Definitely Incredible
    • I can’t Decide
  – The tweet is related to the news event, but contains no information
  – The tweet is not related to the news event
  – Skip tweet
• Each tweet was annotated by 3 people
• Inter-annotator agreement (Cronbach’s alpha) = 0.748
• 30% of tweets provide information (17% credible information), and 14% are spam
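Cronbach's alpha, the agreement statistic reported above, treats each annotator as an "item": alpha = k/(k − 1) · (1 − Σ var_j / var_total), where k is the number of annotators, var_j is the variance of annotator j's scores across tweets, and var_total is the variance of each tweet's summed score. A minimal sketch, assuming the categorical labels are first mapped to numbers; the numeric coding in the example is hypothetical and not taken from the slides.

```python
from statistics import pvariance


def cronbach_alpha(ratings):
    """Cronbach's alpha for a tweets x annotators matrix of numeric scores.

    ratings[i][j] is annotator j's score for tweet i; annotators are
    treated as the items of the scale.
    """
    k = len(ratings[0])  # number of annotators
    if k < 2:
        raise ValueError("need at least two annotators")
    # Variance of each annotator's scores across all tweets.
    item_vars = [pvariance([row[j] for row in ratings]) for j in range(k)]
    # Variance of the per-tweet total score.
    total_var = pvariance([sum(row) for row in ratings])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical coding: 4 = Definitely Credible, 3 = Seems Credible,
# 1 = Definitely Incredible; three annotators per tweet.
scores = [
    [4, 4, 3],
    [1, 1, 1],
    [3, 4, 3],
    [1, 3, 1],
]
print(round(cronbach_alpha(scores), 3))
```

Population and sample variance give the same alpha here, since the (n − 1)/n factor cancels in the ratio.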

ANALYSIS


Feature Sets

Message based features
• Length of the tweet
• Number of words
• Number of unique characters
• Number of hashtags
• Number of retweets
• Number of swear language words
• Number of positive sentiment words
• Number of negative sentiment words
• Tweet is a retweet
• Number of special symbols [$, !]
• Number of emoticons [:-), :-(]
• Tweet is a reply
• Number of @-mentions
• Time lapse since the query
• Has URL
• Number of URLs
• Use of URL shortener service

Source based features
• Registration age of the user
• Number of statuses
• Number of followers
• Number of friends
• Is a verified account
• Length of description
• Length of screen name
• Has URL
• Ratio of followers to followees
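To make the feature list concrete, here is a minimal Python sketch that computes a subset of the message based features from raw tweet text and a few source based features from a user profile. The swear-word list, the URL-shortener list, the retweet/reply text heuristics, and the profile dict keys (modelled on Twitter's v1.1 user object) are assumptions for illustration; sentiment word counts are omitted because they need a lexicon the slides do not specify.

```python
import re

# Hypothetical lexicons: the slides name these features but do not publish word lists.
SWEAR_WORDS = {"damn", "hell", "crap"}
SHORTENER_DOMAINS = {"bit.ly", "t.co", "goo.gl", "tinyurl.com"}
EMOTICONS = (":-)", ":-(", ":)", ":(")

URL_RE = re.compile(r"https?://(\S+)")
HASHTAG_RE = re.compile(r"#\w+")
MENTION_RE = re.compile(r"@\w+")


def message_features(text):
    """A subset of the message based features, computed from tweet text alone."""
    urls = URL_RE.findall(text)
    domains = {u.split("/", 1)[0].lower() for u in urls}
    words = text.split()
    return {
        "length": len(text),
        "num_words": len(words),
        "num_unique_chars": len(set(text)),
        "num_hashtags": len(HASHTAG_RE.findall(text)),
        "num_mentions": len(MENTION_RE.findall(text)),
        "has_url": bool(urls),
        "num_urls": len(urls),
        "uses_shortener": bool(domains & SHORTENER_DOMAINS),
        "num_special_symbols": text.count("$") + text.count("!"),
        "num_emoticons": sum(text.count(e) for e in EMOTICONS),
        "num_swear_words": sum(w.lower().strip(".,!?") in SWEAR_WORDS for w in words),
        # Text heuristics; tweet metadata would be more reliable for these two.
        "is_retweet": text.startswith("RT @"),
        "is_reply": text.startswith("@"),
    }


def source_features(user):
    """A subset of the source based features from a user-profile dict.

    Keys are assumed to follow Twitter's v1.1 user object
    (followers_count, friends_count, screen_name, verified, description).
    """
    followers, friends = user["followers_count"], user["friends_count"]
    return {
        "num_followers": followers,
        "num_friends": friends,
        # Guard against division by zero for accounts that follow nobody.
        "followers_to_followees": followers / friends if friends else float(followers),
        "screen_name_length": len(user["screen_name"]),
        "description_length": len(user.get("description") or ""),
        "is_verified": bool(user["verified"]),
    }


sample = "RT @cnn: Earthquake felt in Washington DC! http://bit.ly/abc #earthquake :-("
print(message_features(sample))
```

Each tweet then becomes one numeric vector (booleans as 0/1) that can be fed to a regression or ranking model.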

PRF

• PRF (Pseudo Relevance Feedback)
  – Extract the top k ranked documents, then re-rank those documents according to a defined score
  – Re-ranking is based on the ‘context’ of the event
  – The context is captured by the top n unigrams, selected using the BM25 metric
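The re-ranking step can be sketched in Python. This is one possible reading of the slide, not the authors' implementation: we assume the event context is the n unigrams that occur in the most top-k tweets, and each of the top-k tweets is then re-scored against those terms with Okapi BM25 (k1 = 1.2 and b = 0.75 are common defaults, not values from the slides). For simplicity the IDF statistics are computed over the top-k tweets only, and the names `BM25` and `prf_rerank` are hypothetical.

```python
import math
import re
from collections import Counter

K1, B = 1.2, 0.75  # common BM25 defaults; the slides do not give parameter values


def tokenize(text):
    """Lower-cased unigrams; hashtags and @-mentions are kept as single tokens."""
    return re.findall(r"[#@]?\w+", text.lower())


class BM25:
    """Okapi BM25 over a small collection (here: the top-k feedback tweets)."""

    def __init__(self, docs):
        self.tfs = [Counter(tokenize(d)) for d in docs]
        self.lengths = [sum(tf.values()) for tf in self.tfs]
        self.avgdl = (sum(self.lengths) / len(docs)) or 1.0
        # Document frequency: number of tweets containing each term.
        self.df = Counter(t for tf in self.tfs for t in tf)
        n = len(docs)
        # Non-negative IDF variant: ln(1 + (N - df + 0.5) / (df + 0.5)).
        self.idf = {t: math.log(1 + (n - f + 0.5) / (f + 0.5)) for t, f in self.df.items()}

    def score(self, terms, i):
        tf, dl = self.tfs[i], self.lengths[i]
        norm = K1 * (1 - B + B * dl / self.avgdl)
        return sum(self.idf.get(t, 0.0) * tf[t] * (K1 + 1) / (tf[t] + norm) for t in terms)


def prf_rerank(ranked_tweets, k=50, n=10):
    """Re-rank the top k tweets by their BM25 score against n context unigrams."""
    top_k, rest = ranked_tweets[:k], ranked_tweets[k:]
    if not top_k:
        return list(ranked_tweets)
    bm25 = BM25(top_k)
    context = [t for t, _ in bm25.df.most_common(n)]
    order = sorted(range(len(top_k)), key=lambda i: bm25.score(context, i), reverse=True)
    return [top_k[i] for i in order] + rest


tweets = [
    "I love pizza",
    "Earthquake hits Virginia, felt in DC",
    "Earthquake in Virginia 5.8 magnitude",
    "good morning twitter",
    "Virginia earthquake felt across east coast",
    "Stay safe everyone",
]
for t in prf_rerank(tweets, k=5, n=2):
    print(t)
```

In this example the context terms become "earthquake" and "virginia", so the three earthquake tweets move ahead of the chit-chat while the tweet below rank k keeps its place. A practical version would also drop stopwords before choosing context terms, otherwise words such as "in" can crowd out event-specific terms.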

Algorithm


Evaluation Metric

Evaluation metric: NDCG (Normalized Discounted Cumulative Gain)

NDCG is a standard metric for evaluating rankings against “graded” (multi-level) relevance judgments.
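For reference, a minimal sketch of NDCG at cutoff p in Python. It uses the linear-gain form DCG_p = Σ_{i=1..p} rel_i / log2(i + 1) and NDCG_p = DCG_p / IDCG_p, where IDCG_p is the DCG of the same judgments sorted in ideal order; the exponential gain (2^rel − 1) is another common variant, and the slides do not say which one was used. The mapping from annotation labels to numeric grades in the example is hypothetical.

```python
import math


def dcg(relevances, p=None):
    """Discounted cumulative gain: sum of rel_i / log2(i + 1), with ranks i starting at 1."""
    rels = relevances[:p] if p is not None else relevances
    return sum(rel / math.log2(i + 1) for i, rel in enumerate(rels, start=1))


def ndcg(relevances, p=None):
    """DCG of the given ranking divided by the DCG of the ideal (sorted) ordering."""
    ideal = dcg(sorted(relevances, reverse=True), p)
    return dcg(relevances, p) / ideal if ideal > 0 else 0.0


# Hypothetical grades for a ranked list of tweets: 3 = Definitely Credible,
# 2 = Seems Credible, 1 = related but no information, 0 = incredible or unrelated.
ranking = [2, 3, 0, 1]
print(round(ndcg(ranking), 3))
```

A score of 1.0 means the system placed every tweet in ideal order; lower scores penalise credible tweets that appear further down the list.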

Ranking Results

• Both tweet based and user based features help determine credibility: it matters “what you post and who you are”
• Context based (PRF) ranking substantially improves performance, reaching an NDCG of up to 0.74

Web-portal Implementation

Limitations & Future Work

• Human input is required
  – Need to develop self-learning (completely automated) solutions
• Analyze events with greater temporal variation
• Understand users’ perspective on the credibility of content on Twitter

Challenges

• Large volume of data being generated
• Real-time solutions needed
• Only 140 characters
• Informal language

Acknowledgements

• All members of our research group
• Dept. of Information Technology, Government of India

References

• C. Castillo, M. Mendoza, and B. Poblete. Information credibility on Twitter. In WWW ’11, pages 675–684, 2011.
• J. Chen, R. Nairn, L. Nelson, M. Bernstein, and E. Chi. Short and tweet: experiments on recommending content from information streams. In CHI ’10, pages 1185–1194, 2010.
• J. Ratkiewicz, M. Conover, M. Meiss, B. Gonçalves, S. Patil, A. Flammini, and F. Menczer. Truthy: mapping the spread of astroturf in microblog streams. In WWW ’11, 2011.
• S. E. Robertson, S. Walker, and M. Beaulieu. Okapi at TREC-7: automatic ad hoc, filtering, VLC and interactive track. In TREC-7, 1999.
• T. Sakaki, M. Okazaki, and Y. Matsuo. Earthquake shakes Twitter users: real-time event detection by social sensors. In WWW ’10, 2010.
• S. Verma, S. Vieweg, W. J. Corvey, L. Palen, J. H. Martin, M. Palmer, A. Schram, and K. M. Anderson. NLP to the rescue? Extracting “situational awareness” tweets during mass emergency. In ICWSM, 2011.

Questions?

Thank You!

aditig@iiit.ac.in
pk@iiitd.ac.in

For any further information, please write to pk@iiitd.ac.in
