Toward Formal Reasoning with Epistemic Policies about Information Quality in the Twittersphere

Presentation by VIStology Inc. at Fusion 2011 Conference, Chicago, IL. July 2011.

Automatically evaluating the reliability and credibility of messages on Twitter.

  Presentation Transcript

  • Toward Formal Reasoning with Epistemic Policies About Information Quality in the Twittersphere
    Brian Ulicny
    VIStology, Inc.
    bulicny@vistology.com
    Mieczyslaw Kokar
    Northeastern University and VIStology, Inc.
    kokar@coe.neu.edu
    VIStology, Inc - Fusion 2011
  • Arab Spring Uprisings 2011
  • Situation Awareness (?): Al Jazeera’s Twitter Monitor
  • Situation Awareness: Attention Spikes from Twitter
  • Situation Awareness: Flu Trends from Social Media
    Detecting influenza outbreaks by analyzing Twitter messages
    Aron Culotta
    arXiv:1007.4748v1 [cs.IR] 27 Jul 2010
  • Twitter as Open Source Intel
  • Confidence = <Reliability, Credibility>
  • Problem Statement
    How can we assess not only the volume of tweets per time period
    And the frequency of terms they contain
    But the reliability, credibility & confidence of the information they convey
    In a potentially adversarial situation?
  • Naïve STANAG 2022 for Twitter
    Reliability = F: Cannot Be Judged
    All “sources not used in the past”
    Credibility = 1: Confirmed by Other Sources
    More than two string-identical tweets?
    Or Credibility = 3, Possibly True
    Because Sources not Independent
    Because Path between all sources in Twitter graph
  • Need
    Tractable Way to Calculate:
    Twitter Source Reliability
    Twitter Content Credibility
    Twitter Source Independence
    Where
    Entire Twitter graph contains 105 Million Users
    As of April, 2010
    55 Million Tweets per Day
    3 Billion Requests per day to Twitter API
  • The Argument from Google
    There are too many Twitter sources to evaluate their reliability directly.
    However, Google has shown that there is great value in using eigenvector centrality (PageRank) as a proxy for reliability.
    Therefore, we assume that a PageRank-like metric correlates with Reliability because
    (1) We assume that people do not pass along information they believe to be unreliable
    (2) Eigenvector centrality/retweet influence, unlike simple indegree centrality, is difficult to fake.
  • Not Every Twitter User is Real
    CENTCOM
    Operation Earnest Voice
  • TunkRank as Reliability
    Influence(X) = Expected number of people who will read a tweet that X tweets, including all retweets of that tweet. For simplicity, we assume that, if a person reads the same message twice (because of retweets), both readings count.
    If X is a member of Followers(Y), then there is a 1/||Following(X)|| probability that X will read a tweet posted by Y, where Following(X) is the set of people that X follows.
    If X reads a tweet from Y, there’s a constant probability p that X will retweet it.
    D. Tunkelang. 2009. A Twitter Analog to PageRank.
    http://thenoisychannel.com/2009/01/13/a-twitter-analog-to-pagerank/
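The definition above can be computed by fixed-point iteration. The sketch below assumes the recurrence from the cited post, Influence(X) = sum over Y in Followers(X) of (1 + p * Influence(Y)) / ||Following(Y)||. The function name `tunkrank`, the edge-list input, the default `p=0.05`, and the fixed iteration count are illustrative assumptions, not details taken from the slides.

```python
from collections import defaultdict


def tunkrank(follow_edges, p=0.05, iterations=50):
    """Approximate TunkRank influence scores by fixed-point iteration.

    follow_edges: iterable of (follower, followee) pairs.
    p: assumed constant retweet probability (hypothetical default).
    """
    followers = defaultdict(set)   # followers[x] = users who follow x
    following = defaultdict(set)   # following[y] = users that y follows
    users = set()
    for follower, followee in follow_edges:
        followers[followee].add(follower)
        following[follower].add(followee)
        users.update((follower, followee))

    influence = dict.fromkeys(users, 0.0)
    for _ in range(iterations):
        # Each follower y reads x's tweet with probability 1/|Following(y)|
        # and passes it on to its own audience with probability p.
        influence = {
            x: sum(
                (1.0 + p * influence[y]) / len(following[y])
                for y in followers[x]
            )
            for x in users
        }
    return influence


# b and c follow a; c also follows b.
scores = tunkrank([("b", "a"), ("c", "a"), ("c", "b")])
```

On a graph with cycles the scores converge only approximately, so a real implementation would probably test for convergence rather than run a fixed number of rounds.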
  • TunkRank as Reliability
    TunkRank vs. Indegree Centrality (log scale)
    Mapping TunkRank to STANAG 2022 Reliability
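The figure for this mapping did not survive in the transcript, so the sketch below is only one way such a mapping might look. The percentile cut-offs in `RELIABILITY_BANDS` are hypothetical. The only points attested elsewhere in the deck are that a 99th-percentile TunkRank was rated A and a 0th-percentile TunkRank was rated E. The letter grades follow the STANAG 2022 reliability scale.

```python
# Hypothetical percentile floors; the deck does not state the real cut-offs.
RELIABILITY_BANDS = [
    (90, "A: Completely Reliable"),
    (70, "B: Usually Reliable"),
    (40, "C: Fairly Reliable"),
    (10, "D: Not Usually Reliable"),
    (0, "E: Unreliable"),
]


def reliability_grade(percentile):
    """Map a TunkRank percentile (0-100) to a STANAG 2022 reliability grade.

    None means no TunkRank is available for the source.
    """
    if percentile is None:
        return "F: Cannot Be Judged"
    if not 0 <= percentile <= 100:
        raise ValueError("percentile must be between 0 and 100")
    for floor, grade in RELIABILITY_BANDS:
        if percentile >= floor:
            return grade
```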
  • Unreliability Indicators
    If X retweets a message, e.g.:
    RT @Whitehouse Zombie uprising in Scranton
    And there is no corresponding original tweet
    Then X is E: Unreliable.
    If X tweets a message with the same URL (shortened or dereferenced)
    But different content
    More than twice
    Then X is D: Not Usually Reliable.
    (On the other hand: Verification: Reliability )
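A minimal sketch of the two indicators above, assuming the collected tweets already include every original that could have been retweeted. In practice the purported author's timeline would have to be checked. The function name `unreliability_flags`, the regular expressions, and the (user, text) input shape are illustrative assumptions. Shortened URLs are compared literally here rather than dereferenced.

```python
import re
from collections import defaultdict

RETWEET = re.compile(r"^RT @(\w+):?\s+(.*)$")
URL = re.compile(r"https?://\S+")


def unreliability_flags(tweets):
    """tweets: list of (user, text). Returns {user: "E" or "D"}."""
    # Every non-retweet counts as a possible original.
    originals = {
        (user.lower(), text.strip())
        for user, text in tweets
        if not RETWEET.match(text)
    }
    flags = {}
    texts_by_user_url = defaultdict(set)
    for user, text in tweets:
        match = RETWEET.match(text)
        # Retweet with no corresponding original tweet: E (Unreliable).
        if match and (match.group(1).lower(), match.group(2).strip()) not in originals:
            flags[user] = "E"
        for url in URL.findall(text):
            texts_by_user_url[(user, url)].add(URL.sub("", text).strip())
    # Same URL with more than two different contents: D (Not Usually Reliable).
    for (user, _url), texts in texts_by_user_url.items():
        if len(texts) > 2 and flags.get(user) != "E":
            flags[user] = "D"
    return flags


sample = [
    ("Whitehouse", "Budget briefing at noon"),
    ("alice", "RT @Whitehouse Budget briefing at noon"),
    ("bob", "RT @Whitehouse Zombie uprising in Scranton"),
    ("carol", "Win a phone http://x.example/a"),
    ("carol", "Breaking news http://x.example/a"),
    ("carol", "Free tickets http://x.example/a"),
]
flags = unreliability_flags(sample)
```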
  • Source Independence
    There is a path connecting (nearly) every user in the Twitter graph.
    This does not mean that there is no source independence in Twitter.
    We count two sources as independent if each originates the message and
    the shortest path between them is ≥ 4.
    In the T.H. (Tim Hetherington) dataset, 4 of 20 tweets cite the same NY Times URL via 3 shortened URLs,
    so they are not independent.
    Other news sources: 2 cite Guardian, 1 BBC, 1 Der Spiegel, 1 WaPo, 1 Times of London
    No explicit Retweets
    No Implicit Retweets
    => 16 originating sources
    Compute distance between remaining sources
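The distance test can be sketched as a bounded breadth-first search. The sketch assumes the follow graph is held as an undirected adjacency dictionary. The names `shortest_path_length` and `independent_pairs` are illustrative. The search stops after three hops, since any pair not reached by then is at distance ≥ 4 or disconnected.

```python
from collections import deque
from itertools import combinations


def shortest_path_length(graph, start, goal, cutoff):
    """Return the hop count from start to goal, or None if it exceeds cutoff."""
    if start == goal:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        if dist == cutoff:
            continue  # do not expand beyond the cutoff
        for neighbor in graph.get(node, ()):
            if neighbor == goal:
                return dist + 1
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, dist + 1))
    return None


def independent_pairs(graph, sources, min_distance=4):
    """Pairs of originating sources whose shortest path is >= min_distance."""
    pairs = []
    for a, b in combinations(sorted(sources), 2):
        if shortest_path_length(graph, a, b, cutoff=min_distance - 1) is None:
            pairs.append((a, b))
    return pairs


# Chain a - b - c - d - e: a and e are four hops apart.
chain = {
    "a": ["b"],
    "b": ["a", "c"],
    "c": ["b", "d"],
    "d": ["c", "e"],
    "e": ["d"],
}
result = independent_pairs(chain, ["a", "d", "e"])
```

Bounding the search keeps the computation tractable on a graph of this size, because only the neighborhood a few hops around each source is ever visited.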
  • Sameness of Content
    String-identical tweets are not independent: they are implicit retweets.
    @BWJones: Tim Hetherington, photographer and 'Restrepo' co-director, killed in Misrata, Libya http://nyti.ms/dIm29T 4/20/2011 6:16:25 PM
    @Frieze_magazine: Tim Hetherington, photographer and 'Restrepo' co-director, killed in Misrata, Libya http://nyti.ms/dIm29T 4/20/2011 7:01:30 PM
    Custom Regexes to handle dead/alive
    Tweet =~ (<subject> .* (dead|died|killed|not alive|RIP)) &&
    Tweet !~ (<subject> .* (not (dead|died|killed))) => Dead
    Tim Hetherington, Restrepo director has been killed in Misurata
    OR: Tweet =~ (<subject> .* (alive|(not (killed|dead|died)))) &&
    Tweet !~ (<subject> .* (not alive|RIP)) => Alive
    E.g. C H still alive. (true positive) Wish T H were still alive (false positive)
    Misses: C H in serious condition ( |= alive)
    Credibility from the ratio of tweets asserting P to tweets asserting not-P:
    > 2x: P Confirmed; not-P Improbable
    > 1.5x: P Probably True; not-P Doubtful
    ~ same: P Possibly True; not-P Possibly True
    435 Tweets report C H dead; vs 7 C H alive: Confirmed: C H Dead; Improbable: C H not Dead.
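The rules on this slide translate fairly directly into Python's `re` module. This is a sketch of the slide's patterns and thresholds, not the authors' classifier. The function names `classify` and `credibility` are illustrative, and the "Cannot Be Judged" result for zero tweets on both sides is an added assumption.

```python
import re


def classify(tweet, subject):
    """Return "dead", "alive", or None, following the slide's regex rules."""
    s = re.escape(subject)
    flags = re.IGNORECASE
    dead = re.search(rf"{s}.*\b(dead|died|killed|not alive|RIP)\b", tweet, flags)
    not_dead = re.search(rf"{s}.*\bnot (dead|died|killed)\b", tweet, flags)
    alive = re.search(rf"{s}.*\b(alive|not (killed|dead|died))\b", tweet, flags)
    not_alive = re.search(rf"{s}.*\b(not alive|RIP)\b", tweet, flags)
    if dead and not not_dead:
        return "dead"
    if alive and not not_alive:
        return "alive"
    return None


def credibility(p_count, not_p_count):
    """Return (rating for P, rating for not-P) from the two tweet counts."""
    if p_count == 0 and not_p_count == 0:
        # Assumption: with no evidence either way, nothing can be judged.
        return ("Cannot Be Judged", "Cannot Be Judged")
    if p_count < not_p_count:
        rating_not_p, rating_p = credibility(not_p_count, p_count)
        return (rating_p, rating_not_p)
    if p_count > 2 * not_p_count:
        return ("Confirmed", "Improbable")
    if p_count > 1.5 * not_p_count:
        return ("Probably True", "Doubtful")
    return ("Possibly True", "Possibly True")
```

As the slide notes, the patterns misfire on wishes and hypotheticals: `classify("Wish T H were still alive", "T H")` returns `"alive"`.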
  • Recap: Algorithm
    Identify set of Tweets by Search API on name
    Classify into Dead/Alive content
    Calculate TunkRank on Users
    Discount false retweeters
    Calculate Source Independence
    Group same media URLs; retweets, implicit retweets
    Calculate distance between sources for joint network two hops out for each source.
    @NYTImesPhoto: An attack in Misurata, Libya today killed the photographer Tim Hetherington. 4/20/2011 7:11:15 PM
    TunkRank: 99th percentile; > 5 independent sources assert T H died; 0 alive
    <A:Completely Reliable, 1:Confirmed by Other Sources>
    @Cmovila: Sad news Tim Hetherington died in Misrata now when covering the front line. 4/20/2011 4:39:57 PM
    TunkRank: 0th Percentile; > 5 Independent sources assert T H died; 0 alive
    <E: Unreliable; 1:Confirmed by Other Sources>
    <T H Alive: 5: Improbable>
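The grouping step of the recap (same media URLs, retweets, implicit retweets) might be sketched as follows, using the tweets quoted in this deck as input. The function name `originating_sources` and the grouping key are illustrative assumptions. Shortened URLs are compared literally, so different short links to the same article would still need to be dereferenced first.

```python
import re
from datetime import datetime

URL = re.compile(r"https?://\S+")
EXPLICIT_RT = re.compile(r"^RT @\w+:?\s+", re.IGNORECASE)


def originating_sources(tweets):
    """tweets: list of (user, text, timestamp).

    Groups tweets by cited URL, or by normalized text when no URL is cited,
    and returns the earliest user of each group in time order.
    """
    earliest = {}
    for user, text, _ts in sorted(tweets, key=lambda t: t[2]):
        body = EXPLICIT_RT.sub("", text)  # strip explicit retweet prefix
        urls = URL.findall(body)
        if urls:
            key = ("url", urls[0])
        else:
            key = ("text", " ".join(body.lower().split()))
        earliest.setdefault(key, user)  # first (earliest) user wins
    return list(earliest.values())


headline = ("Tim Hetherington, photographer and 'Restrepo' co-director, "
            "killed in Misrata, Libya http://nyti.ms/dIm29T")
tweets = [
    ("BWJones", headline, datetime(2011, 4, 20, 18, 16, 25)),
    ("Frieze_magazine", headline, datetime(2011, 4, 20, 19, 1, 30)),
    ("NYTImesPhoto",
     "An attack in Misurata, Libya today killed the photographer Tim Hetherington.",
     datetime(2011, 4, 20, 19, 11, 15)),
    ("Cmovila",
     "Sad news Tim Hetherington died in Misrata now when covering the front line.",
     datetime(2011, 4, 20, 16, 39, 57)),
]
sources = originating_sources(tweets)
```

The remaining sources would then go to the distance calculation, and each tweet would receive its <Reliability, Credibility> pair.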
  • Notional Architecture
    [Architecture diagram] Components: Twitter Search API; Tweet to RDF Conversion; Message Classifier; Twitter API; BaseVISor Inference Engine; TunkRank API; Distance Calculator
    Output: Tweets Augmented with STANAG 2022 Assessments
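To illustrate the "Tweet to RDF Conversion" component, the sketch below serializes one assessed tweet as N-Triples using only the standard library. The namespace `http://example.org/stanag/` and the property names `author`, `text`, `reliability`, and `credibility` are hypothetical placeholders. The deck does not describe the vocabulary the authors used, nor how BaseVISor consumes it.

```python
def tweet_to_ntriples(tweet_id, user, text, reliability, credibility,
                      base="http://example.org/stanag/"):
    """Serialize one assessed tweet as N-Triples lines.

    All IRIs are built from a hypothetical namespace; tweet_id and user
    are assumed to be plain alphanumeric tokens.
    """
    def literal(value):
        # Escape the characters that N-Triples string literals require.
        escaped = (value.replace("\\", "\\\\")
                        .replace('"', '\\"')
                        .replace("\n", "\\n")
                        .replace("\r", "\\r"))
        return f'"{escaped}"'

    subject = f"<{base}tweet/{tweet_id}>"
    triples = [
        (subject, f"<{base}author>", f"<{base}user/{user}>"),
        (subject, f"<{base}text>", literal(text)),
        (subject, f"<{base}reliability>", literal(reliability)),
        (subject, f"<{base}credibility>", literal(credibility)),
    ]
    return [f"{s} {p} {o} ." for s, p, o in triples]


lines = tweet_to_ntriples(
    "1", "Cmovila",
    "Sad news Tim Hetherington died in Misrata now when covering the front line.",
    "E: Unreliable", "1: Confirmed by Other Sources",
)
```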
  • Conclusions
    Treating all tweets as equally legitimate is acceptable in non-adversarial, high-volume situations.
    As OSINT, tweets need to be evaluated according to the STANAG 2022 rubric.
    We have outlined tractable ways to calculate reliability (TunkRank), credibility (sameness of content) and source (in)dependence.
    By converting tweets to RDF, we can reason about them formally with a formal reasoner (BaseVISor).
    Future work: a large-scale demonstration of efficacy in distinguishing low-confidence death rumors from high-confidence death notices on Twitter.
  • Questions?