One Billion Rows per Second:  Analytics for the Digital Media Markets
Published in: Economy & Finance, Technology
Notes
  • Across traditional desktop, mobile, and now gaming platforms, there are billions of advertising events occurring every day. Many of these are priced and bought in real time. Willie Sutton was once asked, “Why do you rob banks?” His answer: “That’s where the money is.” For me, the reason I was enticed by this vertical is similar: that’s where the data is.
  • Strategic implications:
  • Practically speaking, we define this as: data freshness on the order of minutes, but queries over the data, made through our dashboard, return in seconds. Hadoop isn’t enough.
  • Hadoop summarizes and precomputes a ton.
  • We all know that things taste better when they’re fresh. Data is no different. As Jeff Jonas says, there’s no value in knowing where the traffic was five minutes ago.
  • Dialogue with the data. Eliminate the chain of data bureaucrats and put the data in the hands of the decision maker. Get in the car and drive yourself.
  • Transcript

    • 1. One Billion Rows Per Second:
      Analytics for the Digital Media Markets
      STRATA SUMMIT NYC
      September 21, 2011
      MICHAEL DRISCOLL
      CO-FOUNDER & CTO
      @medriscoll
    • 2. Taming the Inferno of the Online Ad Markets
      • billions of microtransactions per day
      • dozens of publisher, advertiser, & audience attributes
    • 3. Goal: Fast Dashboards
      Over Big Data
    • 4. Goal: Fast Dashboards
      Over Big Data
      dashboard: queries in seconds
      database
      ingestion: data crunched in minutes
    • 5. Solution 1: Relational Database
      dashboard: queries in minutes
      database: MPP relational DB
      ingestion (Hadoop): data crunched in minutes
    • 6. Solution 2: HBase
      dashboard: queries in seconds
      database: HBase
      ingestion (Hadoop): data crunched in hours
    • 7. Solution 3: Do It Ourselves: Druid
      dashboard: queries in seconds
      database: Druid
      ingestion (Hadoop): data crunched in minutes
    • 8. Four Principles of Druid’s Performance at Scale
      SUMMARIZE: 100x smaller vs. raw data
      DISTRIBUTE and PARALLELIZE: 100x throughput vs. a single node
      STORE IN-MEMORY: 100x faster vs. reading disk
      = 10^6 factor speed-up
      Druid can filter and aggregate over 1 billion rows per second on a 50-core cluster,
      or 20m rows per core per second
    • 9. Consequences of Speed: Data Freshness
      photo credit: Lars P. http://www.flickr.com/photos/lars_p/4911238308/sizes/o/in/photostream/
    • 10. Consequences of Speed: Blue Sky Exploration
      photo credit: MonkeyAt Large http://www.flickr.com/photos/monkeyatlarge/16645379/sizes/l/in/photostream/
    • 11. Consequences of Speed: Interactivity
      photo credit: tonylanciabeta http://www.flickr.com/photos/tonysphotos/3305157904/sizes/o/in/photostream/
    • 12. One Billion Rows Per Second:
      Analytics for the Digital Media Markets
      QUESTIONS? CONTACT ME AT MIKE@METAMARKETSGROUP.COM
      MICHAEL DRISCOLL
      CO-FOUNDER & CTO
      @medriscoll
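
The four principles on slide 8 can be illustrated with a toy, single-process sketch. This is not Druid’s implementation or API: the event schema, the helper names `rollup`, `partition`, `scan_shard`, and `query`, the 60-second granularity, and round-robin sharding are all hypothetical choices made for illustration. Summarizing collapses raw events into one row per (minute, publisher, advertiser). Distributing splits those rows into shards. Parallelizing scans every shard concurrently and merges the partial sums, and everything stays in memory. The slide’s figures are also internally consistent: three 100x factors give 100^3 = 10^6, and 10^9 rows per second across 50 cores is 2 × 10^7, i.e. 20m rows per core per second.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

# Hypothetical raw ad events: (unix_seconds, publisher, advertiser, revenue_usd).
RAW_EVENTS = [
    (0, "pubA", "adv1", 0.50),
    (12, "pubA", "adv1", 0.25),
    (45, "pubB", "adv1", 1.00),
    (61, "pubA", "adv2", 0.75),
    (70, "pubA", "adv2", 0.25),
    (130, "pubB", "adv2", 2.00),
]


def rollup(events, granularity_s=60):
    """SUMMARIZE: collapse raw events into one row per (time bucket, dimensions)."""
    summary = defaultdict(lambda: [0, 0.0])  # key -> [event_count, revenue_sum]
    for ts, publisher, advertiser, revenue in events:
        key = (ts - ts % granularity_s, publisher, advertiser)
        summary[key][0] += 1
        summary[key][1] += revenue
    # Each summarized row: (bucket_start, publisher, advertiser, count, revenue).
    return [(*key, count, revenue) for key, (count, revenue) in sorted(summary.items())]


def partition(rows, num_shards):
    """DISTRIBUTE: assign summarized rows to shards (round-robin in this sketch)."""
    shards = [[] for _ in range(num_shards)]
    for i, row in enumerate(rows):
        shards[i % num_shards].append(row)
    return shards


def scan_shard(shard, publisher):
    """Filter one in-memory shard and return its partial (count, revenue)."""
    count, revenue = 0, 0.0
    for _bucket, pub, _adv, row_count, row_revenue in shard:
        if pub == publisher:
            count += row_count
            revenue += row_revenue
    return count, revenue


def query(shards, publisher):
    """PARALLELIZE: scan all shards concurrently, then merge the partial results.

    Threads show the scatter/gather shape only; in CPython they do not give
    true CPU parallelism for this pure-Python loop. Requires at least one shard.
    """
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        partials = list(pool.map(lambda s: scan_shard(s, publisher), shards))
    return sum(c for c, _ in partials), sum(r for _, r in partials)


if __name__ == "__main__":
    summarized = rollup(RAW_EVENTS)
    shards = partition(summarized, num_shards=2)  # STORE IN-MEMORY: plain Python lists
    print(len(RAW_EVENTS), "raw rows ->", len(summarized), "summarized rows")
    print("pubA:", query(shards, "pubA"))
```

Counts and sums merge cleanly because they are associative, so each shard can compute its partial result independently. Aggregates that do not decompose this way, such as exact distinct counts, would need a different merge step.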
