Boston Hadoop User Group Presentation
 

Presentation Transcript

  • Boston Hadoop User Group
    Jeremy Rishel, SVP Engineering, Products, & Data
    April 2012
  • Which is Better?
    A. More Data
    B. Better Data
    C. Better Algorithms
    D. All of the Above
    Bluefin Labs Proprietary and Confidential
  • Social TV: Television, Social Web
  • Impressions, Expressions
  • Kinds of Data and Algorithms
    • Public social media (Twitter, Facebook): 250M+ documents per day
    • Programming info for 200+ U.S. networks
    • Video signal for 65+ U.S. networks
    • Brand conversation & ad tracking for thousands of brands
    • Realtime semantic analysis of comments
    • Demographic & behavioral analysis of authors
    • Advertising context & effect of advertising on brand dynamics
    • Overlap between audiences and comparative analysis
  • Realtime & Historical Data
    • 2M show telecasts
    • 1.5M ad airings / month
    • 50M links between social media users and TV shows / month
    • 10B links between social media users and TV ads / month
    • End-to-end latency in minutes; data is visible & searchable in realtime
    • Historical data visible & searchable through various UIs/tools
    • Searchable text index of all social media comments in our archive & methods for large-scale analysis jobs (including MapReduce)
  • Kinds of Questions
    • We often work at the intersection of multiple data streams, or of data & algorithms
    • How much chatter is there about a show (realtime)? (Social media + programming info + semantic analysis)
    • What ads are airing (near realtime)? (Video signals + programming info + computer vision/audio fingerprinting)
    • Which brands does the audience of a show talk most about? Which shows do brand-engaged authors talk most about? (Social media + programming info + brand data + semantic analysis + audience overlap analysis)
  • More Data
    • “More data” can mean new streams, broader streams, or more granular data
    • “More data” powers better algorithms & aids in creating better data
    • Capturing color, texture, & audio features from the TV video stream improved our ad detection
    • Tapping into full author history permitted better age classification
    • Analyzing closed captions gave us another dimension of semantic analysis and avenues to explore social/mass media engagement
  • Better Data
    • “Better data” is achieved through human-machine collaboration, with a view to continual improvement
    • “Better data” makes for better algorithms & makes big data more useful
    • Both realtime and large-scale review & curation
    • Systematic monitoring, statistical QA, & estimation models
    • High-quality data supports in-domain benchmarking (How is a show or network doing vs. competitors? How is a brand doing within its sector?)
    • High-quality, consistent data permits richer trend analysis (e.g. season-over-season or ad campaign-to-ad campaign comparison)
  • Better Algorithms
    • “Better algorithms” include both new analytics & improvements to existing ones
    • “Better algorithm” approaches can be taken with more & better data
    • Focus areas are NLP/machine learning, computer vision, & statistical analysis; the key to “better” is having a way to measure “goodness”
    • The ad discovery methods available to us changed once we shifted to a broader approach
    • Higher-quality show telecast engagement data permits more precise audience analysis across domains (e.g. shows & networks to brands)
  • All of the Above
    • More data helps build better data & algorithms
    • Better data improves algorithms & makes large data more useful
    • Better algorithms get leverage out of more & better data
    • You should care about all three
  • Jeremy Rishel, jrishel@bluefinlabs.com
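
The "Kinds of Questions" slide asks which brands the audience of a show talks most about. The slides do not say how this audience overlap analysis is computed, so the following is only a minimal sketch under stated assumptions: comments have already been linked to shows and brands as (author, show) and (author, brand) pairs, and overlap is measured by a shared-author count plus Jaccard similarity. The function name `brand_affinity`, the input shapes, and the sample data are hypothetical and are not taken from the presentation.

```python
from collections import defaultdict


def brand_affinity(show_comments, brand_comments, show):
    """Rank brands by how many authors they share with one show's audience.

    show_comments:  iterable of (author, show) pairs   (assumed input shape)
    brand_comments: iterable of (author, brand) pairs  (assumed input shape)
    Returns a list of (brand, shared_authors, jaccard) tuples, largest overlap first.
    """
    # Distinct authors who commented on the given show.
    audience = {author for author, s in show_comments if s == show}

    # Distinct authors who mentioned each brand.
    brand_authors = defaultdict(set)
    for author, brand in brand_comments:
        brand_authors[brand].add(author)

    ranked = []
    for brand, authors in brand_authors.items():
        shared = len(audience & authors)
        union = len(audience | authors)
        jaccard = shared / union if union else 0.0
        ranked.append((brand, shared, jaccard))

    # Sort by shared-author count (descending), then brand name for stable ties.
    ranked.sort(key=lambda row: (-row[1], row[0]))
    return ranked


# Hypothetical sample data, for illustration only.
show_comments = [("ann", "ShowA"), ("bob", "ShowA"), ("cat", "ShowA"), ("dan", "ShowB")]
brand_comments = [("ann", "BrandX"), ("bob", "BrandX"), ("dan", "BrandX"),
                  ("cat", "BrandY"), ("dan", "BrandY")]

for brand, shared, jaccard in brand_affinity(show_comments, brand_comments, "ShowA"):
    print(brand, shared, round(jaccard, 2))
```

Swapping the roles of shows and brands in the same set logic would address the reverse question from the slide (which shows brand-engaged authors talk most about). At the volumes quoted in the slides, sets held in memory would not be practical, and the same grouping would more plausibly run as a distributed job; that is also an assumption rather than something the slides describe.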