Boston Hadoop User Group Presentation

Published in: Technology, Business

    1. Boston Hadoop User Group
        Jeremy Rishel, SVP Engineering, Products, & Data
        April 2012
    2. Which is Better?
        A. More Data
        B. Better Data
        C. Better Algorithms
        Bluefin Labs Proprietary and Confidential
    3. Which is Better?
        A. More Data
        B. Better Data
        C. Better Algorithms
        D. All of the Above
    4. Social TV
        • Television
        • Social Web
    5. Social TV
        • Television
        • Social Web
    6. Social TV
        • Television
        • Social Web
    7. Impressions
    8. Impressions, Expressions
    9. Impressions, Expressions
    10. Kinds of Data and Algorithms
        • Public social media (Twitter, Facebook), 250M+ documents per day
        • Programming info for 200+ U.S. networks
        • Video signal for 65+ U.S. networks
        • Brand conversation & ad tracking for thousands of brands
        • Realtime semantic analysis of comments
        • Demographic & behavioral analysis of authors
        • Advertising context & effect of advertising on brand dynamics
        • Overlap between audiences and comparative analysis
    11. Realtime & Historical Data
        • 2M show telecasts
        • 1.5M ad airings / month
        • 50M links between social media users and TV shows / month
        • 10B links between social media users and TV ads / month
        • End-to-end latency in minutes - visible & searchable in realtime
        • Historical data visible & searchable through various UIs/tools
        • Searchable text index of all social media comments in our archive & methods for large-scale analysis jobs (including MR)
    12. Kinds of Questions
        • We often deal at the intersection of multiple data streams or data & algorithms
        • How much chatter about a show (realtime)? (Social media + programming info + semantic analysis)
        • What ads are airing (near realtime)? (Video signals + programming info + computer vision/audio fingerprinting)
        • Which brands does the audience of a show talk most about? Which shows do brand engaged authors talk most about? (Social media + programming info + brand data + semantic analysis + audience overlap analysis)
    13. More Data
        • “More data” can mean new streams, broader streams, or more granular data
        • “More data” powers better algorithms & aids in creating better data
    14. More Data
        • “More data” can mean new streams, broader streams, or more granular data
        • “More data” powers better algorithms & aids in creating better data
        • Capturing color, texture, & audio features from the TV video stream improved our ad detection
    15. More Data
        • “More data” can mean new streams, broader streams, or more granular data
        • “More data” powers better algorithms & aids in creating better data
        • Capturing color, texture, & audio features from the TV video stream improved our ad detection
        • Tapping into full author history permitted better age classification
    16. More Data
        • “More data” can mean new streams, broader streams, or more granular data
        • “More data” powers better algorithms & aids in creating better data
        • Capturing color, texture, & audio features from the TV video stream improved our ad detection
        • Tapping into full author history permitted better age classification
        • Analyzing closed captions gave us another dimension of semantic analysis and avenues to explore social/mass media engagement
    17. Better Data
        • “Better data” is achieved through human-machine collaboration, with a view to continual improvement
        • “Better data” makes for better algorithms & makes big data more useful
    18. Better Data
        • “Better data” is achieved through human-machine collaboration, with a view to continual improvement
        • “Better data” makes for better algorithms & makes big data more useful
        • Both realtime and large-scale review & curation
    19. Better Data
        • “Better data” is achieved through human-machine collaboration, with a view to continual improvement
        • “Better data” makes for better algorithms & makes big data more useful
        • Both realtime and large-scale review & curation
        • Systematic monitoring, statistical QA, & estimation models
    20. Better Data
        • “Better data” is achieved through human-machine collaboration, with a view to continual improvement
        • “Better data” makes for better algorithms & makes big data more useful
        • Both realtime and large-scale review & curation
        • Systematic monitoring, statistical QA, & estimation models
        • High quality data supports in-domain benchmarking (How is a show or network doing vs. competitors? How is a brand doing within its sector?)
    21. Better Data
        • “Better data” is achieved through human-machine collaboration, with a view to continual improvement
        • “Better data” makes for better algorithms & makes big data more useful
        • Both realtime and large-scale review & curation
        • Systematic monitoring, statistical QA, & estimation models
        • High quality data supports in-domain benchmarking (How is a show or network doing vs. competitors? How is a brand doing within its sector?)
        • High quality and consistent data permits richer trend analysis (e.g. season-over-season or ad campaign-to-ad campaign comparison)
    22. Better Algorithms
        • “Better algorithms” include both new analytics & improvements to existing ones
        • “Better algorithm” approaches can be taken with more & better data
    23. Better Algorithms
        • “Better algorithms” include both new analytics & improvements to existing ones
        • “Better algorithm” approaches can be taken with more & better data
        • Focus areas of NLP/machine learning, computer vision, & statistical analysis; key to “better” is having a way to measure “goodness”
    24. Better Algorithms
        • “Better algorithms” include both new analytics & improvements to existing ones
        • “Better algorithm” approaches can be taken with more & better data
        • Focus areas of NLP/machine learning, computer vision, & statistical analysis; key to “better” is having a way to measure “goodness”
        • The ad discovery methods possible changed once we shifted to a broader approach
    25. Better Algorithms
        • “Better algorithms” include both new analytics & improvements to existing ones
        • “Better algorithm” approaches can be taken with more & better data
        • Focus areas of NLP/machine learning, computer vision, & statistical analysis; key to “better” is having a way to measure “goodness”
        • The ad discovery methods possible changed once we shifted to a broader approach
        • Higher quality show telecast engagement data permits more precise audience analysis across domains - e.g. shows & networks to brands
    26. All of the Above
        • More data helps build better data & algorithms
        • Better data improves algorithms & makes large data more useful
        • Better algorithms get leverage out of more & better data
        • You should care about all three
    27. Jeremy Rishel
        jrishel@bluefinlabs.com
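
Slide 12 asks which brands the audience of a show talks most about, and names "audience overlap analysis" as one ingredient. The deck does not say how that analysis is done, so the following is only a minimal sketch of one possible approach: rank brands by the share of a show's authors who also mention each brand. The function name, the author IDs, and the brand names are all hypothetical and made up for illustration.

```python
from typing import Dict, List, Set, Tuple


def rank_brands_for_show(
    show_authors: Set[str],
    brand_authors: Dict[str, Set[str]],
) -> List[Tuple[str, float]]:
    """Rank brands by the share of a show's authors who also mention the brand.

    Hypothetical helper: this is an assumed overlap measure, not the
    method described in the slides.
    """
    if not show_authors:
        return []
    scores = [
        (brand, len(show_authors & authors) / len(show_authors))
        for brand, authors in brand_authors.items()
    ]
    # Highest overlap first; ties broken alphabetically for stable output.
    return sorted(scores, key=lambda pair: (-pair[1], pair[0]))


# Toy, made-up data: author IDs who commented on one show and on two brands.
show = {"a1", "a2", "a3", "a4"}
brands = {
    "brand_x": {"a1", "a2", "a9"},
    "brand_y": {"a3"},
}
ranking = rank_brands_for_show(show, brands)
print(ranking)
```

The reverse question on the same slide (which shows brand-engaged authors talk most about) could be sketched the same way by swapping the roles of the show and brand author sets.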
