Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.

How a Tweet Went Viral - BIWA Summit 2017


Published on

A series of tweets I posted about my 11hr struggle to make a cup of tea with my WiFi kettle ended-up going viral, got picked-up by the national and then international press, and led to thousands of retweets, comments and references in the media. In this session we’ll take the data I recorded on this Twitter activity over the period and use Oracle Big Data Graph and Spatial to understand what caused the breakout and the tweet going viral, who were the key influencers and connectors, and how the tweet spread over time and over geography from my original series of posts in Hove, England.

Published in: Technology
  • Be the first to comment

How a Tweet Went Viral - BIWA Summit 2017

  1. 1. T : @markrittman HOW A TWEET WENT VIRAL Mark Rittman, Oracle ACE Director & Independent Analyst
 MJR Analytics ltd ( BIWA SUMMIT 2017, SAN FRANCISCO
  2. 2. •Oracle ACE Director, now Independent Analyst •Past ODTUG Executive Board Member •Author of two books on Oracle BI •Co-founder & CTO of Rittman Mead •15+ Years in Oracle BI, DW, ETL + now Big Data •Now working in analytics product management + strategy •Host of the Drill to Detail Podcast ( •Based in Brighton & work in London, UK About The Presenter 2
  4. 4. •One of my personal interests is Home Automation •Started with Nest thermostat and Philips Hue lights •Extended the Nest system to include 
 Nest Protect and Nest Cam •Used Apple HomeKit, Apple TV for Siri voice control •Added Samsung Smart Things hub for Z-wave, 
 Zigbee compatibility •Linked Smart Things to Homekit using open-source HomeBridge project to enable for Siri control •Added Logitech Harmony for TV, Console, Roku Home Automation and Smart ‘IoT’ Devices Philips Hue 
 Lighting Nest Protect (X2), 
 Thermostat, Cam Withings
 Smart Scales Airplay
 Speakers Homebridge
 Homekit / Smarthings 
 Connector Samsung
 Smart Things Hub (Z-Wave, Zigbee) Door, Motion, Moisture,
 Presence Sensors Apple Homekit,
 Apple TV, Siri •Then Amazon Echo (x2) and Echo Dots (x4) to
 extend voice control + add Alexa skills •… and then Google Home + Chromecasts for
 hangouts, Google Assistant + Google Search
  5. 5. Voice Control - When Home Automation Gets Real Philips Hue 
 Lighting Nest Protect (X2), 
 Thermostat, Cam Withings
 Smart Scales Airplay
 Speakers Samsung
 Smart Things Hub (Z-Wave, 
 Zigbee) Door, Motion, Moisture,
 Presence Sensors •Position multiple units around the house for 
 ubiquitous voice control and music playback •Integration with smart home devices •Use ML algorithms in the cloud, constantly improving and leveraging cloud-scale processing “Alexa, turn on the kitchen lights” “Hey Google, turn up the heating” Amazon Echo Google Home
  6. 6. ONE DAY BACK IN SEPTEMBER 2016 … 6
  7. 7. 7
  9. 9. 9
  10. 10. 10
  11. 11. 11
  12. 12. 12
  13. 13. 13
  14. 14. 14
  15. 15. 15
  16. 16. BUT WAIT… 16
  18. 18. 18 All Device Data at Home Logged to Hadoop Cluster
  19. 19. •Data extracted or transported to target platform using LogStash, CSV file batch loads •Landed into HDFS as JSON documents, then exposed as Hive tables using Storage Handler •Cataloged, visualised and analysed using Oracle Big Data Discovery + Python ML Other Personal Project : Home + Wearables Analytics 19 Data Transfer Data Access “Personal” Data Lake Jupyter
 Web Notebook 6 Node Hadoop Cluster (CDH5.5) Discovery & Development Labs
 Oracle Big Data Discovery 1.2 Data sets and samples Models and programs Oracle DV
 Desktop Models BDD Shell,
 Spark ML Data Factory LogStash
 via HTTP Manual
 CSV U/L Data streams CSV, IFTTT
 or API call Raw JSON log files in HDFS Each document an event, daily record or comms message Hive Tables
 w/ Elastic
 Storage Handler Index data turned into tabular format Health Data Unstructured Comms Data Smart Home
 Sensor Data
  20. 20. 20
  21. 21. 21
  22. 22. 22
  23. 23. THIS TIME LAST YEAR… 23
  24. 24. 24
  25. 25. •Graph, spatial and raster data processing for big data •Primarily documented + tested against Oracle BDA •Installable on commodity cluster using CDH •Data stored in Apache HBase or Oracle NoSQL DB •Complements Spatial & Graph in Oracle Database •Designed for trillions of nodes, edges etc •Out-of-the-box spatial enrichment services •Over 35 of most popular graph analysis functions •Graph traversal, recommendations •Finding communities and influencers, •Pattern matching Oracle Big Data Spatial & Graph 25
  28. 28. 28
  30. 30. 30 3454Tweets, retweets and mentions 
 over 48 hours 3017Twitters users commenting 30+Number of countries ‘WiFi Kettle” 
 became meme or news item
  31. 31. •Tweets in HDFS files processed and transformed into OBDGS file format for HBase load Loading Tweets (Edges) And Users (Vertices) • Unique ID for the vertex • Integer added via sequence in ODI • Property name (“name”, “followers”) • Vertex Property datatype and value Vertex File (.opv) • Unique ID for the edge • Leading edge vertex ID • Trailing edge vertex ID • Edge Type (“tweet”) • Edge Property (“timestamp” or “location”) • Edge Property datatype and value Edge File (.ope)
  32. 32. •Data loaded from files or through Java API into HBase •In-Memory Analytics layer runs common graph and spatial algorithms on data •Visualised using Cytoscape, R or in this example, Tom Sawyer Perspectives Oracle Big Data Graph And Spatial Architecture 32 Massively Scalable Graph Store • Oracle NoSQL • HBase Lightning-Fast In-Memory Analytics • YARN Container • Standalone Server • Embedded
  33. 33. cfg = GraphConfigBuilder.forPropertyGraphHbase() .setName("connectionsHBase") .setZkQuorum("bigdatalite").setZkClientPort(2181) .setZkSessionTimeout(120000).setInitialEdgeNumRegions(3) .setInitialVertexNumRegions(3).setSplitsPerRegion(1) .build(); opg = OraclePropertyGraph.getInstance(cfg); opg.clearRepository(); vfile=“../../data/kettle_nodes.opv" efile=“../../data/kettle_edges.ope" opgdl=OraclePropertyGraphDataLoader.getInstance(); opgdl.loadData(opg, vfile, efile, 2); // read through the vertices opg.getVertices(); // read through the edges opg.getEdges(); Loading Edges And Vertices Into Hbase 33 Uses “Gremlin” Shell for HBase • Creates connection to HBase • Sets initial configuration for database • Builds the database ready for load • Defines location of Vertex and Edge files • Creates instance of 
 OraclePropertyGraphDataLoader • Loads data from files • Prepares the property graph for use • Loads in Edges and Vertices • Now ready for in-memory processing
  34. 34. •Plugin created by Oracle to add to open-source Cytoscape analysis tool •Connects to HBase or NoSQL property graph •Connect to PGX analytics engine •Run Page Rank and other analyses •Visualize property graph on-screen •Search for nodes and edges using
 Apache Solr search engine Visualize And Analyze Using Cytoscape Plugin 34
  35. 35. Top 5 Influencers Based On Mentions, Retweets 35
  36. 36. 36 @markrittman @erinscafe @internetofshit
  37. 37. •The story was picked-up by several influential Twitter users and online news sites •ErinsCafe, BoingBoing, Internet of Sh*t •Featured as a “Twitter Moment” on Day 1 PM •Guardian Newspaper website Day 2 AM •Influencers identified in two ways •By number of followers in Twitter profile •By number of connecting edges in tweets
 Property Graph using Page Rank algorithm Role Of Network Influencers In Meme Propagation 37 •But … how did they hear about the story?
  38. 38. Understanding How A User Joined Conversation 38
  39. 39. Visualising Potential Story Paths To Influencer Nodes 39 @philjoneswired But did this tweet cause, or just comment on, the virality? We need to see the timeline…
  40. 40. •Filters PGX analysis on timestamp edge or vertex property when present in property graph •Select start date, optional end date for filter •Supports two-sided timeline in directed graphs •View property graph as it develops over time New Cytoscape Plugin Feature - Timeline Analysis 40
  41. 41. •The Timeline Analysis plugin for Cytoscape is useful and helps us filter by date range •Another option for visualising property graphs is Tom Sawyer Perspectives •Timeline analysis down to the hour - 3hr periods are perfect for this analysis •Map visualization, network visualization •Prototype using subsets of tweets in CSV files, or connect to full HBase/NoSQL dataset Tom Sawyer Perspectives For Social Network Analysis 42
  42. 42. 43 Day 1 : 10am GMT
  43. 43. 44 Day 1 : 3pm GMT
  44. 44. 45 Day 1 : 6pm GMT
  45. 45. 46 Day 1 : 8pm GMT
  46. 46. 47 Day 2 : 6am GMT
  47. 47. IT WAS @ERINSCAFE 48
  48. 48. IT WAS @ERINSCAFE 49
  49. 49. 50
  50. 50. 51
  52. 52. •Tweet went viral because it was picked-up on by a very well-connected Twitter user •And why did that happen? Probably because the story “had legs”… •Some mentions in the Twitter-verse before this but main viral explosion due to @erinscafe •All subsequent activity including mentions by @guardian, @internetofsh*t followed that •You can analyse Twitter and other meme breakouts using Oracle Big Data Spatial & Graph •New Timeline Analysis feature in Cytoscape Plugin useful for time-slice analysis of data •Tom Sawyer Perspectives provides even more visualisation incl. mapping analysis capabilities •Thank you to Alan Wu, Juan Francisco & Hans Viehmann from Oracle, 
 and Kevin Madden & Austris Krastiņš from Tom Sawyer for their help with the demos Conclusions 54
  53. 53. T : @markrittman HOW A TWEET WENT VIRAL Mark Rittman, Oracle ACE Director & Independent Analyst
 MJR Analytics ltd ( BIWA SUMMIT 2017, SAN FRANCISCO