
Unraveling Ebola One Tweet at a Time: Dynamic Network Analysis of an Ebola-Related Twitter Data Set

A sample of 2.5M tweets mentioning "Ebola" was collected during November 5-12, 2014. The titles of the 6227 web pages referenced by the tweets were used to cluster the web pages into roughly 100 topics. Paragon Science's patented dynamic anomaly detection software (http://www.paragonscience.com/intellectual_property.htm) then identified the top five most-anomalous topics. This research demonstrates how these techniques allow us to focus attention quickly on viral, emerging topics. A video animating those anomalous topics and their key related web pages for every hour of that week in November 2014 is available at https://www.youtube.com/watch?v=AEQ02hv4Xjw.

Published in: Science

  1. Unraveling Ebola One Tweet at a Time: Dynamic Network Analysis of an Ebola-Related Twitter Data Set
     Steve Kramer, Ph.D.
     President & Chief Scientist, Paragon Science, Inc.
     January 2015
     Copyright © 2006-2015 Paragon Science, Inc. All rights reserved.
  2. What Are We Doing?
     • Provide valuable intelligence results to clients using our dynamic anomaly detection software and data mining tools
     • Many possible application areas:
       – Social media alerting and sentiment change detection
       – Analysis of web trends and user activities
       – Pricing and market trend analysis and alerting
       – Network defense against cyberattacks
       – Insider threat detection
       – Fraud prevention (banking, insurance, online auctions, …)
       – Healthcare data mining
  3. How Is It Done Today?
     • Existing approaches
       – Standard SNA metrics
       – Rule-based systems (transaction profiling, etc.)
       – Bayesian and other statistical/probabilistic models
       – Machine learning tools (neural nets, HMMs, etc.)
     • Some limitations of existing methods
       – Training requirements can be large for neural nets.
       – For rule-based systems, it is difficult to effectively predict or define new “bad” anomalies or patterns in advance.
       – Many current methods are not scalable to real-world operational requirements.
  4. What Is New in Our Patented Approach?
     • A powerful anomaly detection approach that incorporates nonlinear time series analysis methods
       – US Patent #8738652 (1.usa.gov/1kkyVD9), “Systems and Methods for Dynamic Anomaly Detection”
     • Key questions answered:
       – Which entities behave or evolve differently than others in the data set?
       – Which entities have shifted their behavior unexpectedly?
  5. What Is New in Our Approach? (Cont’d.)
     • Our framework inherently captures the dynamics of the entities under study, without having to specify in advance normal vs. abnormal behavior.
     • We can simultaneously analyze the time evolution of
       – Network structures
       – Any associated attributes (text terms, geospatial position, etc.)
     • Our technique is robust with respect to missing or erroneous data.
     • As a result, we can
       – Find key players in rapidly changing networks
       – Provide early warning of viral videos and online documents
       – Focus attention on the most-anomalous events or transactions
  6. Dynamic Anomaly Detection Overview
     • A general approach that incorporates nonlinear time series analysis methods
       – Complexity measures
       – Finite-time Lyapunov exponents (FTLEs)
     • Input data
       – Communications or transactional data streams
       – General time-dependent data sets
     • Key questions
       – Which entities behave or evolve differently than others in the data set?
       – Which entities have shifted their behavior unexpectedly?
  7. Finite-Time Lyapunov Exponents (FTLEs)
     • General dynamical system
     • Flow map
       – Advects points in the state space
       – Describes the time evolution of the system
  8. Finite-Time Lyapunov Exponents (FTLEs)
     • FTLEs characterize the amount of stretching or contraction about a point x0 during a time interval T
       – Stability
       – Predictability
     • Definition
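The equations shown on slides 7 and 8 did not survive in this transcript. For reference, the FTLE is commonly written in the literature as below. This is the standard textbook form, given here as an assumption about what the slides displayed; the symbols v, φ, Δ, and σ are conventional choices and may differ from the author's notation.

```latex
% General dynamical system on a state space
\dot{\mathbf{x}}(t) = \mathbf{v}(\mathbf{x}(t), t), \qquad \mathbf{x}(t_0) = \mathbf{x}_0

% Flow map: advects the initial point x_0 forward over the interval T
\phi_{t_0}^{t_0+T}(\mathbf{x}_0) = \mathbf{x}(t_0 + T;\, t_0, \mathbf{x}_0)

% Cauchy-Green deformation tensor built from the flow map's Jacobian at x_0
\Delta = \left( \frac{d\phi_{t_0}^{t_0+T}}{d\mathbf{x}}(\mathbf{x}_0) \right)^{\!\top}
         \left( \frac{d\phi_{t_0}^{t_0+T}}{d\mathbf{x}}(\mathbf{x}_0) \right)

% Finite-time Lyapunov exponent: growth rate of the largest stretching about x_0
\sigma_{t_0}^{T}(\mathbf{x}_0) = \frac{1}{|T|} \ln \sqrt{\lambda_{\max}(\Delta)}
```

A positive σ indicates that nearby points separate over the interval (locally unstable, harder to predict); a negative σ indicates contraction.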
  9. Derived Jacobian Vectors
     • Similarly, characteristic vectors derived from the flow map’s Jacobian can describe the generalized directions of the local stretching or contraction.
     • Possible derivation approaches:
       – Weight-based column sampling
       – Singular value decomposition (SVD)
       – Principal component analysis (PCA)
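To make the FTLE and the "derived Jacobian vector" ideas concrete, here is a minimal Python sketch for a two-dimensional toy system. It is not Paragon Science's implementation: the flow map (a linear saddle), the finite-difference step `h`, and the function names are all assumptions chosen for illustration. The stretching direction is taken as the dominant eigenvector of JᵀJ, which equals the leading right singular vector of the Jacobian J (the SVD option in the list above).

```python
import math

def flow_map(x, y, T):
    """Hypothetical flow map for the linear saddle dx/dt = x, dy/dt = -y."""
    return x * math.exp(T), y * math.exp(-T)

def ftle_2d(flow, x0, y0, T, h=1e-4):
    """Return (FTLE, dominant stretching direction) about (x0, y0)."""
    # Central finite differences approximate the Jacobian J = [[a, b], [c, d]].
    xp, yp = flow(x0 + h, y0, T)
    xm, ym = flow(x0 - h, y0, T)
    a, c = (xp - xm) / (2 * h), (yp - ym) / (2 * h)
    xp, yp = flow(x0, y0 + h, T)
    xm, ym = flow(x0, y0 - h, T)
    b, d = (xp - xm) / (2 * h), (yp - ym) / (2 * h)

    # Symmetric tensor J^T J = [[p, q], [q, r]].
    p = a * a + c * c
    q = a * b + c * d
    r = b * b + d * d

    # Largest eigenvalue of a symmetric 2x2 matrix in closed form.
    lam_max = 0.5 * (p + r) + math.sqrt(0.25 * (p - r) ** 2 + q * q)

    # Matching eigenvector = direction of strongest local stretching.
    if abs(q) > 1e-12:
        vx, vy = q, lam_max - p
    else:
        vx, vy = (1.0, 0.0) if p >= r else (0.0, 1.0)
    norm = math.hypot(vx, vy)

    sigma = math.log(math.sqrt(lam_max)) / abs(T)
    return sigma, (vx / norm, vy / norm)

sigma, direction = ftle_2d(flow_map, 0.3, -0.2, T=2.0)
```

For this saddle the stretching rate along x is exactly 1, so `sigma` should come out very close to 1.0 with the direction along the x axis. In a real application the state vector would be a high-dimensional feature encoding and a numerical SVD routine would replace the closed-form 2x2 algebra.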
  10. Paragon Dynamic Anomaly Detection
      Flowchart box labels:
      • Representation of Data at t=ti
      • Cluster Resolution
      • Feature Vector Encoding
      • Outlier Detection at t=ti
      • More Time Intervals? (Yes / No)
      • Clustering / Segmentation
      • Dynamic Anomaly Detection
      • Nonlinear Time Series Analysis (FTLEs, Dynamic Thresholds, etc.)
      • Pattern Classification
      • Outlier Detection
      • Domain-Specific Filtering (Threat Signatures, Risk Profiles, etc.)
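The flowchart mentions dynamic thresholds applied per time interval. As a rough illustration of that general idea only, the sketch below flags intervals whose value exceeds a threshold recomputed from a trailing window. It is a simple stand-in, not the patented method; the window length, the multiplier `k`, and the hourly counts are invented for the example.

```python
from statistics import mean, stdev

def flag_anomalies(series, window=5, k=3.0):
    """Return indices whose value exceeds mean + k * std of the trailing window."""
    flagged = []
    for i in range(window, len(series)):
        history = series[i - window:i]
        # The threshold is "dynamic": it is recomputed for every interval.
        threshold = mean(history) + k * stdev(history)
        if series[i] > threshold:
            flagged.append(i)
    return flagged

# Hypothetical hourly activity counts for one entity (assumed data).
hourly_counts = [10, 12, 11, 9, 10, 11, 48, 12, 10]
anomalies = flag_anomalies(hourly_counts)
```

Only the spike at index 6 is flagged here. Note that a spike inflates the threshold for the following intervals, and a perfectly constant history gives a standard deviation of zero, so a production detector would need more careful handling than this sketch provides.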
  11. Example: Ebola Twitter Analysis
      • Sample data set from Twitter API collected using twittertap
        – Date range: 11/8/2014 – 11/16/2014
        – 2,541,812 tweets
        – 4,708,678 generated links with hashtags, URLs, and user replies
      • Research plan
        – Perform k-core decomposition
        – Run anomaly detection software on sub-networks of nodes in the central core to find the most influential users and most viral URLs
        – Carry out community detection and topic detection
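The first step of the research plan, k-core decomposition, can be sketched in a few lines of plain Python. The version below uses the usual peeling procedure (repeatedly remove a minimum-degree node) on a tiny made-up graph; the node names and edges are hypothetical and do not come from the Ebola data set. NetworkX, which the deck acknowledges, offers the same result through `networkx.core_number`.

```python
from collections import defaultdict

def core_numbers(edges):
    """Return {node: k-shell index} for an undirected graph given as edge pairs."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-loops
            adj[u].add(v)
            adj[v].add(u)

    degree = {node: len(nbrs) for node, nbrs in adj.items()}
    remaining = set(adj)
    core = {}
    k = 0
    while remaining:
        # Peel off a node of minimum remaining degree.
        node = min(remaining, key=degree.__getitem__)
        k = max(k, degree[node])
        core[node] = k
        remaining.remove(node)
        for nbr in adj[node]:
            if nbr in remaining:
                degree[nbr] -= 1
    return core

# Hypothetical user/URL graph: three users who all link to each other and to
# one URL, plus one peripheral user attached to a single account.
edges = [
    ("alice", "bob"), ("bob", "carol"), ("alice", "carol"),
    ("alice", "url:1"), ("bob", "url:1"), ("carol", "url:1"),
    ("dave", "alice"),
]
core = core_numbers(edges)
central_core = sorted(n for n, k in core.items() if k == max(core.values()))
```

Here the four mutually connected nodes form the central core (shell 3) and the peripheral user sits in shell 1. The `min` scan makes this sketch quadratic in the number of nodes, which is fine for illustration but too slow for millions of links; bucket-based implementations run in linear time.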
  12. K-Core Decomposition of the Ebola Network
      http://sourceforge.net/projects/lanet-vi/
  13. Central Core of the Ebola Network
  14. Top URLs in the Central Core
      URL | K Shell | Degree
      http://goo.gl/pFg3Z2 | 49 | 279
      http://goo.gl/BFEUgy | 49 | 233
      http://goo.gl/S37kHT | 49 | 212
      http://goo.gl/silISF | 47 | 364
      http://invst.rs/7MKWHB | 22 | 779
      http://cnn.it/1wlIlUe | 22 | 741
      http://trib.al/YKSMCSN | 22 | 734
      http://nyp.st/136BPG3 | 22 | 698
      http://nypost.com/2014/10/29/cdc-admits-droplets-from-a-sneeze-could-spread-ebola/ | 22 | 415
      http://fxn.ws/1oVgLwc | 22 | 406
  15. Top-Ranked Website (URLs 1, 2, and 4)
      UMA MENTIRA CHAMADA ,,EBOLA,, VEJAM !!! | NOTICIÁRIO DA WEB
      A statement made by a man in Ghana called Nana Kwame rocked the internet in recent days. The following information has to reach people. We need to see the Ebola for what it really is. It's time to wake up the world agenda behind this whole story. Follow what this man has to say about what is happening in their country of origin: People in the world need to know what is happening here in West Africa. They are lying! The "Ebola" as a virus does not exist and is not contagious. The Red Cross brought a disease to four specific countries, for four specific reasons and is only contracted by those who receive treatments and injections of the Red Cross. That's why Liberians and Nigerians began to expel the Red Cross in their countries!
  16. 5th Ranked Website
  17. 6th Ranked Website
  18. Topic Detection in the Ebola Twitter Network
      Diagram node labels: User A, User B, User C; URL 1, URL 2; Term 1, Term 2, Term 3, …, Term N; Topic 1, Topic 2, …, Topic M
      Diagram edge labels: replies to, mentions, references
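According to the deck's description, the titles of the referenced web pages were used to cluster the pages into topics. The deck does not say which clustering algorithm was used, so the sketch below substitutes a deliberately simple one: pages whose title terms overlap enough (Jaccard similarity at or above a threshold) are merged into the same group. The titles, stopword list, and threshold are all invented for illustration.

```python
import re
from itertools import combinations

# Hypothetical page titles keyed by URL label (assumed data, not from the deck).
TITLES = {
    "url1": "Obama asks Congress for emergency Ebola funding",
    "url2": "Emergency funding request for Ebola response sent to Congress",
    "url3": "Mobile phone data could help track Ebola",
    "url4": "Using mobile data to track the Ebola outbreak",
}
# "ebola" appears in every title, so it carries no topic information here.
STOPWORDS = {"a", "and", "for", "of", "the", "to", "ebola"}

def tokens(title):
    """Lowercased title terms with stopwords removed."""
    return {w for w in re.findall(r"[a-z]+", title.lower()) if w not in STOPWORDS}

def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def cluster_titles(titles, threshold=0.3):
    """Group URLs whose title terms overlap, using union-find to merge pairs."""
    parent = {url: url for url in titles}

    def find(u):
        while parent[u] != u:
            parent[u] = parent[parent[u]]  # path halving
            u = parent[u]
        return u

    bags = {url: tokens(title) for url, title in titles.items()}
    for u, v in combinations(titles, 2):
        if jaccard(bags[u], bags[v]) >= threshold:
            parent[find(u)] = find(v)

    groups = {}
    for url in titles:
        groups.setdefault(find(url), []).append(url)
    return sorted(sorted(group) for group in groups.values())

topics = cluster_titles(TITLES)
```

With these inputs the two funding titles fall into one group and the two mobile-data titles into another. Comparing every pair of titles does not scale to thousands of pages; it is used here only to keep the example short.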
  19. Summary of Top 200 Topic Anomalies
      Topic | Peak Start Time | Peak End Time | Max Change Metric | # Anomalies
      Topic 99 | 2014-11-06 06:18 | 2014-11-12 10:18 | 2.97 | 40
      Topic 8 | 2014-11-05 20:18 | 2014-11-07 07:18 | 2.891 | 34
      Topic 59 | 2014-11-06 20:18 | 2014-11-11 19:18 | 2.43 | 28
      Topic 1 | 2014-11-05 17:18 | 2014-11-05 19:18 | 2.32 | 3
      Topic 52 | 2014-11-05 17:18 | 2014-11-05 18:18 | 2.30 | 2
      Topic 50 | 2014-11-05 19:18 | 2014-11-06 15:18 | 2.22 | 11
      Topic 32 | 2014-11-05 18:18 | 2014-11-05 19:18 | 2.18 | 2
      Topic 20 | 2014-11-05 20:18 | 2014-11-06 02:18 | 2.11 | 7
      Topic 2 | 2014-11-07 07:18 | 2014-11-12 16:18 | 2.10 | 33
      Topic 28 | 2014-11-05 20:18 | 2014-11-05 22:18 | 2.00 | 3
      Topic 29 | 2014-11-08 02:18 | 2014-11-12 18:18 | 1.96 | 21
      Topic 97 | 2014-11-06 09:18 | 2014-11-07 03:18 | 1.91 | 4
      Topic 30 | 2014-11-05 20:18 | 2014-11-05 20:18 | 1.84 | 1
      Topic 22 | 2014-11-05 23:18 | 2014-11-06 02:18 | 1.79 | 4
      Topic 18 | 2014-11-05 17:18 | 2014-11-05 17:18 | 1.65 | 1
      Topic 15 | 2014-11-05 19:18 | 2014-11-05 19:18 | 1.63 | 1
      Topic 4 | 2014-11-08 14:18 | 2014-11-12 15:18 | 1.61 | 5
  20. Key Sites Related to Top 5 Ebola Topic Anomalies
      • Topic 99: max change metric 2.973; peak datetime 2014-11-06 17:18:27; top related URL title: FACT SHEET: Emergency Funding Request to Enhance the U.S. Government’s Response to Ebola at Home and Abroad | The White House
      • Topic 8: max change metric 2.888; peak datetime 2014-11-05 20:18:27; top related URL title: BBC News - Ebola outbreak: Barack Obama 'to ask Congress for $6bn'
      • Topic 59: max change metric 2.426; peak datetime 2014-11-07 02:18:27; top related URL title: » Obama Caught Ordering Press to Cover Up Ebola Alex Jones' Infowars: There's a war on for your mind!
      • Topic 1: max change metric 2.321; peak datetime 2014-11-05 17:18:27; top related URL title: UMA MENTIRA CHAMADA ,,EBOLA,, VEJAM !!! | NOTICIÁRIO DA WEB
      • Topic 52: max change metric 2.296; peak datetime 2014-11-05 17:18:27; top related URL title: Nigeria Property: Ebola Virus Originated From US Bio-warfare Labs In West Africa – American Prof
  21. Example: Topic 99 URL-to-User Links
  22. Topic 99a: Economic Consequences
  23. Topic 99b: Mobile Data to Prevent Ebola
  24. Topic 99c: ISIS and Ebola
  25. Topic 99d: @ebolafiles (Twitter user)
  26. Topic 99e: Emergency Funding Request
  27. Topic 99f: Follow Ebola
      Follow Ebola | Updated every second & see what the #CDC & #WHO is not telling you about #Ebola
  28. Animation of Evolving Topic Network
      http://youtu.be/AEQ02hv4Xjw
  29. What Are the Payoffs?
      • Quickly identify key influencers and trends in online networks
      • Provide early warning of viral videos, anomalous web events, or unusual network traffic
      • Enable enhanced business intelligence without having to specify normal vs. abnormal behavior in advance
  30. Third-Party Software Acknowledgements
      • Paragon Science gratefully acknowledges the following researchers and software providers:
        – Cytoscape (http://www.cytoscape.org/)
        – dynnetwork Cytoscape plugin (https://code.google.com/p/dynnetwork/)
        – Lanet-vi (http://sourceforge.net/projects/lanet-vi/)
          J. Alvarez-Hamelin, et al., “Understanding Edge Connectivity in the Internet through Core Decomposition,” Internet Mathematics 7 (1): 45–66, 2011.
        – Louvain community detection software (http://perso.crans.org/aynaud/communities/)
          V. Blondel, et al., “Fast Unfolding of Communities in Large Networks,” Journal of Statistical Mechanics: Theory and Experiment, 10, P10008, 2008.
        – NetworkX (https://networkx.github.io/)
          A. Hagberg, D. Conway, “Hacking social networks using the Python programming language (Module II - Why do SNA in NetworkX),” Sunbelt 2010: International Network for Social Network Analysis.