Open analytics social media framework


Published on

IKANOW's OA D

Published in: Technology

  • Introduction and Topic
  • Agenda: Social Media: An Intelligence Perspective; Common Analytic Pitfalls; An Analytic Framework; Case Study: Brand Management; Ways Forward, Future Analysis; Questions?
  • Intelligence is information (data) that has been transformed to meet an operational need. There are many ways to move from raw data to usable intelligence, and every organization will do this differently because the answers it is seeking are organizationally specific, i.e. seen through the “operational lens”.
  • No matter what methodology you use… intelligence analysis is an iterative process. You collect the data, store it, analyze it, and distribute the end results to your organization in some usable format.
  • HUMINT, Human Intelligence: intelligence gathering by means of interpersonal contact. Pros: can reveal intentions. Cons: can be unreliable. OSINT, Open Source Intelligence: intelligence collected from publicly available sources. Pros: fast and accessible. Cons: noise. SIGINT, Signals Intelligence: intelligence gathering by interception of signals. Pros: high volume. Cons: noise.
  • Provide value to the organization: turn data into intelligence using an “operational lens” (in other words, answer the questions your organization is asking). Ensure cyclical feedback occurs during collection, processing, analysis, and consumption (learn from the process and adjust based on what you learn; intel gathering and analysis is not a static process). Validate that a particular network is the right source of data for the questions you need answered (i.e. is Twitter the right place to look for data related to weather?).
  • My mom does not tweet or have a Facebook profile. It only seems like your friends post or tweet every 30 seconds. People use different networks for different reasons, so tracking individuals consistently can be difficult (what people say on Twitter may not match what they say on Facebook, Yelp, etc.).
  • Why is someone tweeting or posting? If someone checks in from a store, is it really because the store is so incredible that they need to share that information, or is it because they are trying to shape an impression of their lifestyle (i.e. image shaping)? Why is much harder than what. What you learn from data can be affected as much by the tools you use to analyze it as by what the data contains. Picking the right tools for the job is critical.
  • http://apps.washingtonpost.com/politics/transcripts/2012/presidential/live/737/ A Washington Post and Votertide collaboration analyzed how viewers reacted to Clinton’s speech at the DNC Convention a few months ago. They captured 496,222 tweets and generated what amounts to a very basic word cloud that provides limited value from an analysis perspective. What does the content of this page tell us about the speech Clinton gave and what users thought about it?
  • What can you learn from this type of experiment with the wrong tools? A lot of people were tweeting while Clinton was speaking, but not many were really tweeting about what Clinton was saying. People like funny tweets. Was anyone actually listening to Clinton?
  • Word clouds can tell you something about the language used but not the meaning behind the language. What you see in the cloud is the what but not the why.
  • So how do you avoid some of these pitfalls and get useful intelligence from social media? The answer to that question, or at least a partial answer, is the focus of the remainder of the presentation. The key ingredients for a framework include Data Capture, Data Reporting, and Data Analysis components. All of these are important, but the Data Analysis components are the most interesting. :>
  • There are a lot of platforms that you could use to do social analysis, but there are a few key issues to consider before making a commitment:
  • This is where I repeat that I shamelessly stole most of this presentation from a coworker, Andrew, who is a huge gaming fan as well as a super bright analyst. The IKANOW team ran an experimental use case studying how organizations like Zenimax Online Studios could benefit from a well-thought-out social media analysis initiative.
  • We started with the question: How can brand managers use social media to track and understand public attitudes toward a product? When performing this kind of analysis there are two primary challenges you will face: Query too large = false positives. Query too small = miss potential information.
  • As a starting point for the use case we began harvesting tweets. We picked Twitter because it has excellent analytical potential (emphasis on potential). Of course Twitter has its limitations as well: 140 characters; limited historical (lookback) capacity without using a 3rd party provider like DataSift or GNIP (NOTE: Twitter is going to be offering historical tweets from what I hear); almost all NLP/text extraction/unstructured data analysis tools perform poorly on small blobs of text.
  • As a starting point we collected tweets matching the search term “Elder Scrolls Online” for a period of 10 days in July.
  • The tweets were processed (using the Infinit.e document analysis platform) to extract entities (who, what, when and where) and create associations with those entities (somebody tweeted to someone, somebody retweeted someone, someone used the hashtag #elderscrollsonline, etc.) You can see a visualization of what this type of processing looks like here in the slide with the various extracted components highlighted.
  • Entities extracted can also include URLs allowing your analysis tools to follow new paths of information and expand your data sets out from the original harvested sources.
  • Once you have a capture and reporting system in place you can start developing your analysis. Never forget that analysis needs to be rooted in an operational need to be useful, i.e. you need one or more organizationally specific questions that you are trying to answer in order for analysis to produce value for the organization. Your emphasis should be on hypothesis generation, testing, and experimentation, especially at the start.
  • Based on our question, “How can I use social media to track and understand public attitudes toward my product?”, we started by analyzing the top hashtags returned in the tweets we harvested. Interestingly, we found that these hashtags were mostly generic and not directly related to the specific game we were interested in monitoring. This means that tracking and understanding user opinions of the game will be more difficult, since we are unable to focus on a specific, known set of hashtags.
  • One of the benefits of expanding our search to web pages is that it makes it possible to do a higher level of sentiment analysis on the data than is normally possible with tweets alone.
  • Negative: Users weren’t impressed by the game’s teaser and graphics, suggesting that the trailer hadn’t been well received. Positive: Other hashtags showed that fans still had positive sentiment towards the Elder Scrolls franchise in general.
  • One of the biggest lessons learned from this exercise, though, is that Zenimax (or any brand) needs to try to shape the conversation around its products in such a way as to make it easier to track, analyze, and respond to what customers are saying. At the end of our analysis Andrew wrote a white paper detailing our findings and forwarded it to Zenimax. Not long after, we noticed the following tweet from the @TESOnline Twitter handle.
  • Data segmentation allows you to take huge volumes of data and distill them down into more digestible and meaningful groupings for analysis. Common segmentation patterns include user names or handles, hashtags, keywords, and geographic regions.
  • Here is an illustration of an experiment we ran where we segmented (or clustered) tweets geographically and used a net sentiment score for the clusters to show where people were relatively happy or sad around the world.
  • Use Graph Analysis to explore the links between entities extracted from your data, for example: identify key influencers; view links between tweets, websites, and blogs.
  • Don’t: try drinking from a fire hose (sometimes less really is more); use metrics you can’t tie to actions; use visualizations or reports that strip the data from its context.
  • Do: segment data rather than attempting to work in the aggregate; look for the why behind the message; always return to the source material; explore alternative explanations; always consider the ultimate goal.
  • Open analytics social media framework
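
The entity extraction and top-hashtag analysis described in the notes above can be sketched in a few lines of Python. This is only a rough illustration under stated assumptions, not how the Infinit.e platform works: plain regular expressions stand in for the NLP step, and the function names (extract_entities, extract_associations, top_hashtags) and the sample tweets are hypothetical.

```python
import re
from collections import Counter

# Simple patterns standing in for a real NLP entity extractor (assumption).
HANDLE_RE = re.compile(r"@(\w+)")
HASHTAG_RE = re.compile(r"#(\w+)")
URL_RE = re.compile(r"https?://\S+")


def extract_entities(text):
    """Return the handles, hashtags, and URLs found in one tweet's text."""
    urls = URL_RE.findall(text)
    # Strip URLs first so a fragment such as "#section" inside a link
    # is not counted as a hashtag.
    stripped = URL_RE.sub(" ", text)
    return {
        "handles": [h.lower() for h in HANDLE_RE.findall(stripped)],
        "hashtags": [h.lower() for h in HASHTAG_RE.findall(stripped)],
        "urls": urls,
    }


def extract_associations(author, text):
    """Build (subject, verb, object) triples such as who retweeted whom."""
    entities = extract_entities(text)
    is_retweet = text.startswith("RT @")
    associations = []
    for position, handle in enumerate(entities["handles"]):
        # In an "RT @user: ..." tweet the first handle is the retweeted user.
        verb = "retweeted" if is_retweet and position == 0 else "mentioned"
        associations.append((author.lower(), verb, handle))
    for tag in entities["hashtags"]:
        associations.append((author.lower(), "used_hashtag", tag))
    return associations


def top_hashtags(tweets, n=10):
    """Count hashtags across (author, text) pairs and return the n most common."""
    counts = Counter()
    for _author, text in tweets:
        counts.update(extract_entities(text)["hashtags"])
    return counts.most_common(n)


# Hypothetical sample data, invented for illustration only.
SAMPLE_TWEETS = [
    ("alice", "RT @bob: Power is out in #Hoboken #Sandy http://example.com/a"),
    ("carol", "@alice gas available on 5th Ave #Sandy #Gas"),
    ("bob", "Stay safe everyone #sandy"),
]

if __name__ == "__main__":
    print(top_hashtags(SAMPLE_TWEETS, 3))
    for author, text in SAMPLE_TWEETS:
        print(extract_associations(author, text))
```

Lower-casing handles and hashtags before counting is a design choice made here so that #Sandy and #sandy land in the same bucket; a real pipeline may want to keep the original casing as well.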

    1. 1. Open Analytics Summit DC 2013: Building Effective Frameworks for Social Media Analysis. Presented by: Josh Liss
    2. 2. Segway - 230+ million monthly active users - globalewebindex - 175 million tweets/day in 2012 – infographics labs - 1+ billion monthly active users - facebook - 17 billion geo-tagged pictures & check-ins - gizmodo - 200+ million users in 200 countries – techcrunch - Incredible amount of personal information - 10 million mo. unique visitors faster than any independent site in history – Sirona Consulting - 28.1% annual household income of $100K - ultralinx - Google + button used 5 billion times/day - alltwitter - 625,000 new users on Google+ every day - alltwitter
    3. 3. Agenda • Social Media: An Intelligence perspective • Common Analytic Pitfalls • An Analytic Framework • Case Study: Superstorm #Sandy – Problem Definition – Source Selection – Data Capture – Data Reporting – Data Analysis • The Way Forward – do’s & don’ts • Discussion
    4. 4. Intelligence • Intelligence is information that has been transformed to meet an operational need (Diagram: Data → Operational Lens → Intelligence)
    5. 5. Intelligence Cycle • No matter what methodology you use… intelligence analysis is an iterative process. (Diagram: Collect → Store → Analyze → Distribute)
    6. 6. Social Media: Intelligence Perspective • Intelligence derived from social media brings with it the best and worst aspects of: – HUMINT – SIGINT – OSINT HUMINT OSINT SIGINT
    7. 7. Social Media Analysis Goals • Provide value to the organization – turn data into intelligence using an “operational lens” • Ensure cyclical feedback occurs during collection, processing, analysis, and consumption • Validate that a particular network is the right source of data for the questions you need answered • $$$
    8. 8. Common Misconceptions • Social media is not a panacea – Not everyone uses social media – Users of social media use it unevenly – User behavior changes based on situations • Just because people can talk about anything does not mean they talk about everything all the time.
    9. 9. Common Pitfalls • Analyzing What Instead of Why: The important thing is often not what people are saying… but why they are saying it. • Using the Wrong Analysis Tools: Reporting tools rarely help dig into the why. Many common tools, reports, and metrics are misleading: – Word clouds atomize message context – Sentiment metrics are often highly inaccurate – Information in aggregate hides more than it reveals
    10. 10. Pitfalls: An Example of the Challenge
    11. 11. Pitfalls: An Example of the Challenge
    12. 12. Dangers of Disintegration The problems are analytical rather than aesthetic or technical. The context is virtually indecipherable: - Source: Matthew Auer, Policy Studies Journal, Volume 39, Issue 4, pages 709–736, Nov 2011
    13. 13. Analytic Framework • Data Capture (DC) • Data Reporting (DR) • Data Analysis (DA) – What to measure – What the data is saying – What should be done based on the data (Diagram: Capture, Report, Analyze) Source: Avinash Kaushik, Occam’s Razor Blog http://www.kaushik.net/avinash/web-analytics-consulting-framework-smarter-decisions/
    14. 14. Choosing a Platform • Social media, and the ways that it is used, is relatively new and evolving rapidly: – Static approaches to social media are flawed from the outset – No one metric or set of metrics will always let you know what is happening – No turn-key solution to all problems • Platforms need to be open and highly adaptable to facilitate data capture, reporting, and analysis
    15. 15. Case Study: Superstorm Sandy • Industry: Disaster Response/Crisis Informatics – 14 Billion-dollar disasters in 2011 – 11 Billion-dollar disasters in 2012 • Over $100 Billion in total damages • Oct 29 2012 - Hurricane Sandy – $50+ Billion Damages – 72 deaths directly attributed to storm • Additional 87 deaths indirectly attributed • Can social media SAVE money/lives/resources?
    16. 16. Problem Definition • Question: How can social media assist civil authorities responding to natural disasters: – Prevent/limit loss of life and limb – Prevent/limit damage and loss of property – Protect critical infrastructure • Challenges: Capture relevant information from social media sources. – Query too large/broad = false positives – Query too small/narrow = miss potential information – Signal vs. Noise
    17. 17. The Source: Twitter • Twitter has excellent analytical potential: – Enormous volume, 400+ million tweets per day – Large user base, 200+ million active users – Open API • But it’s not without its limitations: – 140 characters – Limited historical (look-back) capacity without using a 3rd party provider like DataSift or GNIP = $$$ – Anonymity, credibility – Fact vs. satire
    18. 18. Data Capture • 975,000+ Tweets – Filters: temporal, geo, keywords, hashtags – Timeline: 28 Oct to 06 Nov • Pre-land fall, Land-fall, Aftermath, Recovery – Geo focus on Tri-state area • Entity Extraction / Sentiment – NLP extracts the entities, events and associations from unstructured text • Isolates Twitter Handles, Keywords, URLs, etc.
    19. 19. Data Capture: Entities & Associations Twitter Handles Unstructured Keywords Hashtags URL Time / Date Stamp Who What When Where TwitterHandles, Hashtags, Keywords, Time, Date Geo (if Available) retweeters URLs
    20. 20. Data Reporting
    21. 21. Data Reporting Keywords Twitter handle
    22. 22. Data Analysis • Analysis must be rooted in the operational need: – How can social media help civil authorities & first responders during natural disaster response and relief efforts. • Emphasis on hypothesis generation, testing, and experimentation
    23. 23. Data Analysis: Hashtags • Top hashtags were almost all generic or abstract – Undermines tracking and understanding – Generates leads for further analysis Hashtags #Sandy #Recovery #NYC #Power #Hoboken #SandyABC7 #NJ #Gas #Brooklyn #JERSEYSTRONG
    24. 24. Data Analysis: Sentiment • Sentiment analysis on small chunks of text like Tweets is generally poor • Follow and convert linked URLs into derivative sources Larger text sources offer potential value with sentiment analysis that tweets alone cannot offer
    25. 25. Data Analysis: Sentiment • Top negative and positive sentiment scores can provide a glimpse into aggregate attitudes • Provide starting points for additional analysis
    26. 26. Data Analysis: Narrow the scope
    27. 27. Next Steps: Agile Intelligence • New Problem Identified: – NYC 911 received approx. 20,000 calls/hour – Life/limb emergencies could not get through – Callers prompted to text or call 311 – NYC spent $2 Billion since 2009 “overhauling” the system • $680 Million call center – “Unified Call Taker” system • New Question: Can social media serve as a supplement/alternative to traditional emergency response systems during times of natural disasters, state of emergencies? – Promote/monitor hashtags – Dedicated analysts/dispatchers – Facilitate proactive use of local/city/state resources
    28. 28. Next Steps: Segment the Data • Segment, or cluster, your data by: – User name or twitterhandle – Hashtags – Keywords – Geographic region – Timeline to explore patterns and trends at the micro level versus the entire dataset
    29. 29. Next Steps: Try on different lenses Highest traffic occurred during the height of the storm, despite spreading power outages
    30. 30. Next Steps: Segment the Data < 5% of Tweets are geo-tagged
    31. 31. Next Steps: Graph Analysis Visualize associations between top influencers
    32. 32. Next Steps: Findings • Targeted queries based on tailored information requirements • Findings: – Few legitimate “calls for help” – No dedicated #’s • #help used for encouraging donations/volunteering • #distress used for – Significant & accurate i-reporting on flooding, downed trees/power lines, fires, etc. – Crowd-sourced info on where to find gas, food/water, donate goods, volunteer, etc. – Despite widespread power outages, cell service was a life-line
    33. 33. Lessons Learned • Don’t: – Try drinking from a fire hose • sometimes less really is more – Use metrics you can’t tie to actions – Use visualizations or reports that strip the data from its context
    34. 34. Lessons Learned • Do: – Segment data rather than attempting to work in the aggregate – Look for the why behind the message – Always return to the source material – Explore alternative explanations – Always consider the ultimate goal
    35. 35. Discussion Success stories or lessons learned from social media analysis/monitoring in 2012? Arguments for or against the use of social media? Where will social media monitoring/analysis be in 2014?
    36. 36. Thank You! Joshua Liss jliss@ikanow.com www.ikanow.com github.com/ikanow/infinit.e
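
The "segment the data" advice in the Next Steps and Lessons Learned slides can be sketched as follows. This is a minimal illustration, not the presenter's actual method: the net sentiment formula used here, (positive - negative) / total, is an assumption, since the slides do not say how their score was computed, and the record fields, function names, and sentiment values are all invented.

```python
from collections import defaultdict


def segment(records, key):
    """Group records by the value of `key`, skipping records that lack it.

    Skipping matters for geographic segments: per the slides, fewer than
    5% of tweets carry a geo-tag, so most records have no region.
    """
    groups = defaultdict(list)
    for record in records:
        value = record.get(key)
        if value is not None:
            groups[value].append(record)
    return dict(groups)


def net_sentiment(records):
    """Assumed definition: (positive count - negative count) / total count."""
    if not records:
        return 0.0
    positive = sum(1 for r in records if r["sentiment"] > 0)
    negative = sum(1 for r in records if r["sentiment"] < 0)
    return (positive - negative) / len(records)


def net_sentiment_by_segment(records, key):
    """Compute the net sentiment score separately for each segment."""
    return {
        name: net_sentiment(group)
        for name, group in segment(records, key).items()
    }


# Hypothetical records; the sentiment values are made up for illustration.
SAMPLE_RECORDS = [
    {"user": "alice", "region": "NJ", "sentiment": 0.5},
    {"user": "bob", "region": "NJ", "sentiment": -0.2},
    {"user": "carol", "region": "NJ", "sentiment": -0.7},
    {"user": "dave", "region": "NYC", "sentiment": 0.4},
    {"user": "erin", "region": "NYC", "sentiment": 0.1},
    {"user": "frank", "region": None, "sentiment": -0.9},  # not geo-tagged
]

if __name__ == "__main__":
    print(net_sentiment_by_segment(SAMPLE_RECORDS, "region"))
    print(net_sentiment(SAMPLE_RECORDS))  # the aggregate hides the regional split
```

The same segment function works for any of the keys the slides list (user name, hashtag, keyword, region, timeline); comparing the per-segment scores with the aggregate score shows why working only in the aggregate can hide more than it reveals.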