Approaching Big Data: Lesson Plan

Slides for a class session I taught at USC Annenberg on approaching big data for a non-technical audience, so that they can learn the project planning skills needed to work with technical teams. The goal is to teach students the mindset they should have when taking mixed methods and applying them to large datasets, before selecting software packages and methodology. The slides walk through a previous use case and offer guidance moving forward from a process and cross-functional team perspective.

Published in: Data & Analytics

  1. 1. Leveraging Engagement: Big Data Lesson
  2. 2. Agenda What is Big Data? •  Some Definitions •  Mixed Methods Approach Champions League & World Cup Case Study •  Process •  Results and Usage •  Pitfalls and Learnings Moving Forward •  Data Approach Flow •  Caveats •  Organization and Communication
  3. 3. What is Big Data? So many different definitions… nobody quite agrees…. … except that it’s definitely a buzzword
  4. 4. What is Big Data? It is just generally agreed upon that it’s messy and complex. This is an opportunity and challenge for us to innovate. “an all-encompassing term for any collection of data sets so large and complex that it becomes difficult to process using on-hand data management tools or traditional data processing applications.” “Big data is a buzzword, or catch-phrase, used to describe a massive volume of both structured and unstructured data that is so large that it's difficult to process using traditional database and software techniques. In most enterprise scenarios the data is too big or it moves too fast or it exceeds current processing capacity. Big data has the potential to help companies improve operations and make faster, more intelligent decisions.” “Volume, Variety, Velocity, Variability, Complexity” Quotes from: http://www.forbes.com/sites/gilpress/2014/09/03/12-big-data-definitions-whats-yours/2/ http://www.webopedia.com/TERM/B/big_data.html http://en.wikipedia.org/wiki/Big_data
  5. 5. What Do We Need to Solve Big Data? … for leveraging engagement at least.
  6. 6. … for leveraging engagement at least. Determine Right Questions and Goals for Data Interdisciplinary Approach Iterative Refinement “Combining the what (quantitative) with the why (qualitative) can be exponentially powerful. It is also critical to our ability to take all our clickstream data and truly analyze it, to find insights that drive meaningful website changes that will improve our customers’ experiences.” – Avinash Kaushik Answer: Mixed Methods and Innovation Quote from: Web Analytics: An Hour a Day by Avinash Kaushik
  7. 7. CHAMPIONS LEAGUE AND WORLD CUP BIG DATA DISCOVERY PROCESS Annenberg Lab Framework
  8. 8. GOALS
  9. 9. Sports Fan and Engagement Study Overall Goals for HAVAS •  to identify and define communities of sports fans based around passion points (A) •  to analyze fan interactions with those passions (B) •  position HAVAS Sports & Entertainment to more effectively advise brands on how to meaningfully engage with sports fans by leveraging passion-based communities. (C)
  10. 10. Big Data Research Objectives •  Discover a mixed methodology framework for sports and entertainment fan engagement External for Havas •  Justify our fan logic topology in relation to Twitter conversations through natural language processing Internal for Lab
  11. 11. Initial Data Collection Steps 1) Modify data collection process to fit live soccer events using the Champions League as a test run 2) Establish methodology for seeding the initial pool of users, keywords, and hashtags 3) Analyze tweets and how they fit into logics of engagement 4) Establish methodology for gaining insight from Twitter conversations
  12. 12. “Analyzing Big Data is a BIG JOB with Many People” – Jake Inputs & Equipment Keywords, hashtags, user clusters file in a txt document Dedicated server system collecting information Engineering Run and modify Python script Register Public Screening API Parse for results Live Viewing Team Team to watch game and look for patterns
  13. 13. Data Collection Process Engineering & Team: Tech and Data Set-Up Engineer: Run Script with Seed File Team: Watch Event for Patterns and Additional Seeds Team: Decide Data to Analyze Engineer: Parse Data into User-Friendly Format Team: Look at Data and prepare for next event
  14. 14. DATA SEED METHODOLOGY
  15. 15. Initial Keyword Seed Scoping Keep it simple Discover through observations
  16. 16. Soccer Hashtags and Keywords Official Hashtags Sponsors Team Names Key Terms Key Players
  17. 17. Headliners Official Organization Handles Official Team Handles Official Hashtags Sponsors Team Names Key Terms Key Players
  18. 18. Sponsors Sponsors will often have official hashtags promoted during sporting events to cross-promote their brand and the sporting event. Official Hashtags Sponsors Team Names Key Terms Key Players
  19. 19. Supporting Characters Superfans -Fans with unusual followings on Twitter Sports Commentators -ESPN commentators and the like Prominent Bloggers -Blogs or bloggers with large following on certain teams
  20. 20. Initial Data Seed Scoping Caveats • Twitter caps the Public API at a couple of thousand tweets per second • Tweets received through the Public API do not appear to be affected by location-based factors the way individual user feeds are • Twitter chunks these tweets using a mysterious algorithm based on what it deems important • The number of tweets scraped renders these factors nominal in terms of large-scale user behavior
  21. 21. ENGAGEMENT HYPOTHESIS & ANALYSIS
  22. 22. What kind of Tweets or tone in tweets fit into logics of engagement? *Informed by survey and ethnography Entertainment Immersion Social Connection Identification Mastery Pride Play Advocacy
  23. 23. Operational Process Plan for World Cup & Modeling with Beacon Capabilities See how conversations analyzed from a big data perspective fit and build on the logics of engagement model Determine what data frameworks worked in capturing useful information Initial qualitative look at data
  24. 24. Exercise: Seed Scoping
  25. 25. Questions on Approach Before We Get Into Analysis?
  26. 26. Big Data Analysis Process to Dashboard
  27. 27. Big Data Basic Methods of Analysis • Text processing of tweets and plotting using algorithms into agglomerative clusters (aka cool visuals) • Frequency of terms, associations, and word clouds fall under here • Goal: Find texts of what spurred the most conversation Textual • A way to visually see social connection data • Understand forms of bonds and the connections between individual data points worth exploring • Goal: Detecting communities (our clusters, brands) Networks • Toolkits (such as Hootsuite) that measure “sentiment” using positive and negative language • Can be used to see if an initiative performed well • Goal: Measure success of a campaign at different times Sentiment
  28. 28. Big Data Low-Hanging Fruit - Topline. Retweet author screen name (count): FIFAWorldCup (76172), 9GAG (37459), DFB_Team_EN (21247), BBCSport (19564), FCBayern (14782), FTBpro (13409), _Snape_ (11371), benparr (10616), TheTweetOfGod (9435), espn (7465), Queen_UK (7174), thereaIbanksy (7113), sulsultm3 (6646), damnitstrue (6603), asshaaban (6513), SportsCenter (6470), fifaworldcup_es (6365), LicDice_ (6361), FIFAworldcup_e (6241), DFB_Team (6114), Argentina (5964)
  29. 29. Big Data Low-Hanging Fruit – Sentiment Analysis
  30. 30. Initial Idealized Approach 1. Fan Handles 2. Game Data 3. Brand Data Integrate insights with Ethnographic and Survey Data for final deliverables
  31. 31. •  Survey Twitter Handles –  See if their online behavior matches survey logics –  What does the content they’re sharing look like –  Trends by cluster, gender, other data points •  Match Data –  Look for clusters of behavior to events in games –  See popularity of brand campaigns and behavioral response to brand stories –  Gain insight from bursts of activity and real-time marketing –  See what are characteristics of influencers •  Brand Data –  Identify how these strategies were executed in online conversations and responses –  Identify types of interactions/content/other markers around brands on Twitter –  Do influential brands mean consistent users interacting across brands? Why are people interacting in this way? How can we categorize these interactions according to our logic clusters? –  Was the content agile? –  See how users responded by the logics to different types of content –  Look for differences in fan response and fan-initiated behavior to the brands Questions and Hypothesis
  32. 32. What We Planned To Do •  Steps •  Define interesting WC fan moments and brand moments •  Examine moments in time and certain brand campaigns •  Investigate possible Natural Language Processing tools •  Formulated Questions •  Timeline •  Created a timeline assigning roles to each person •  Deliverables •  TBD, likely looking at clusters of behavior around brand campaigns. •  Sentiment analysis may tie in here
  33. 33. Ethnographic Report -What did people say about the brand or the logics they used? Survey Data -Under this brand logic utilized, what is the intensity and who are the clusters? Big Data -How did audiences respond online to actions by the brand? Approaching with Mixed Methods
  34. 34. Exercise: Group Datasets Figure out what insight you might be able to get from each piece of data and how would you apply mixed methods.
  35. 35. Dashboard Process
  36. 36. The Future of Social Media Analytics “We will be moving beyond key-word based queries into machine-learning algorithms. Influencers whom I have spoken with echo similar ideas about the increasing use and refinement of latent semantic indexing (or some variant of it) and other machine-learning algorithms in order to improve social listening, automatic categorization of content, and the ability to take action on data” - Marshall Sponder
  37. 37. Key Learnings for Mood Board Ethnography Survey Twitter Data Brazil Brought Together All Data
  38. 38. Concept Creation
  39. 39. The Dashboard Build Process Pulled 250 Retweeted Tweets with Verification from BigSheets Coded Tweets According to Logic for Testing Data Built Dictionary According to Sample Tweets, Ethnography, Survey Created Natural Language Processing and Machine Learning Algorithms Fan Engagement Dashboard Prototype
  40. 40. Model Technology Collaboration Innovation Fan Engagement Dashboard Prototype jStart Beacon Custom-Built Twitter Collection Web App jStart BigSheets Leveraging Engagement Framework
  41. 41. Annenberg Innovation Lab Fan Engagement Dashboard built through collaboration and mixed methods learning. 67% Accuracy in classifying tweets by Logic of Engagement leading to actionable insight and business intelligence for Leveraging Fan Engagement.
  42. 42. The Process End-to-End Collecting and Managing Data Data Back Up Data Clean Up Run Models Gain Insights Refine Models Learn Actionable Insights Communicate Insights (Reports, Infographic Blueprints) Create Initial Dictionary for Natural Language Processing Annotate/Code Tweets for Training Data for Machine Learning Created Dashboard Improve on Design
  43. 43. Now What?
  44. 44. Moving Forward Your Challenge •  Your data will be different client-to-client •  Twitter is just the beginning •  You will get to be creative and work on collaborative cross-functional teams to dive into the data •  *This will be both rewarding and potentially difficult Tasks Ahead •  Begin thinking about what you can learn from data to help our sponsors reach their goals •  Start thinking about how your fans behave as you figure out what questions to ask the data
  45. 45. Most Basic Steps Determine Goals Capture Data Curate Data Merge Datasets and Bring Together Methodologies if Necessary Additional Data Processing to Usable Form Deliver Insight to the Client
  46. 46. Thinking About Process
  47. 47. Bumps in the Road Ahead •  Privacy Issues and Respecting the Fans •  Company layers and politics – releasing data from companies is fraught with back and forth •  Getting data into a usable form •  Assumptions were wrong or have to be redefined – it’s ok to fail fast – but be ready to keep moving •  Working in cross-functional groups
  48. 48. Image from: CapGemini http://www.capgemini.com/sites/default/files/technology-blog/files/2012/09/big-data-vendors.jpg
  49. 49. Cross-Functional Communication Goal Timing Point People Resources Needed
  50. 50. Bring it Together Draw connections between the data sets and how could they relate to the eight logics and situational triggers. “While social media data are always interesting in themselves (at least, for an analyst), when business owners are able to combine data and layer them efficiently, the information will become more useful and actionable.” – Marshall Sponder
  51. 51. Thank You Questions?
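
A topline count like the retweet author table on slide 28 can be approximated in a few lines of Python. This is a minimal sketch, not the lab's actual script: it assumes tweets arrive as plain text strings and that retweets use the classic "RT @screenname:" prefix. The function name and the sample tweets are invented for illustration.

```python
import re
from collections import Counter

# Assumption: classic retweets begin with "RT @screenname:".
RETWEET_PATTERN = re.compile(r"^RT @(\w+):")


def top_retweeted_authors(tweets, n=3):
    """Return the n most retweeted screen names with their retweet counts."""
    counts = Counter()
    for text in tweets:
        match = RETWEET_PATTERN.match(text)
        if match:
            counts[match.group(1)] += 1
    return counts.most_common(n)


# Hypothetical sample tweets, invented for illustration only.
SAMPLE_TWEETS = [
    "RT @FIFAWorldCup: Kick-off in ten minutes!",
    "RT @BBCSport: Full-time whistle.",
    "RT @FIFAWorldCup: What a goal!",
    "Watching the match with friends tonight",
]

if __name__ == "__main__":
    for name, count in top_retweeted_authors(SAMPLE_TWEETS):
        print(name, count)
```

Counting by original author is one of the cheapest ways to see who drove conversation during an event, which is why it sits under "low-hanging fruit" before any language modeling.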

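The dashboard build on slides 39 and 41 (code a sample of tweets by logic, build a dictionary, then measure classification accuracy) can be illustrated with a simple keyword-overlap classifier. This is a sketch under stated assumptions and not the natural language processing or machine learning models the lab built: the logic names come from slide 22, while the cue words, function names, and hand-coded tweets are hypothetical.

```python
# Hypothetical dictionary: each logic of engagement maps to a set of cue words.
LOGIC_DICTIONARY = {
    "Pride": {"proud", "pride", "champions"},
    "Social Connection": {"friends", "together", "family"},
    "Entertainment": {"fun", "hilarious", "lol"},
}


def classify_tweet(text, dictionary=LOGIC_DICTIONARY):
    """Return the logic whose cue words overlap most with the tweet, or None."""
    words = {word.strip(".,!?#@").lower() for word in text.split()}
    best_logic, best_hits = None, 0
    for logic, cues in dictionary.items():
        hits = len(words & cues)
        if hits > best_hits:
            best_logic, best_hits = logic, hits
    return best_logic


def accuracy(coded_tweets, dictionary=LOGIC_DICTIONARY):
    """Share of hand-coded tweets whose predicted logic matches the human label."""
    correct = sum(
        1 for text, label in coded_tweets
        if classify_tweet(text, dictionary) == label
    )
    return correct / len(coded_tweets)


# Hypothetical hand-coded tweets (text, human-assigned logic), for illustration.
CODED_SAMPLE = [
    ("So proud of this team!", "Pride"),
    ("Watching with friends and family", "Social Connection"),
    ("That halftime show was hilarious lol", "Entertainment"),
    ("We are the champions", "Social Connection"),
]

if __name__ == "__main__":
    print(accuracy(CODED_SAMPLE))
```

Comparing predictions against human-coded tweets is what makes a figure such as the 67% accuracy on slide 41 meaningful: the hand-coded sample acts as the reference, and disagreements point to dictionary entries worth refining.
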