© 2014 Aerospike. All rights reserved. Confidential 1
Advanced Visual Analytics and Real-time Analytics at Platform scale
Kunal Umrigar, Senior Architect at Pubmatic
In conversation with Brian Bulkowski, CTO and co-founder, Aerospike
Who am I?
■ Starting: TRS-80, PC, Apple II, Vax 11/70, Wang
■ First product: lightpen university teaching kiosk
■ Networks: computers without people are boring
■ Silicon Valley internet boom
■ $10B market cap in 1999, employee 32
■ 2003-2007 “time off” (startups)
■ Citrusleaf / Aerospike history
■ 42 year old first-time CEO (me)
■ 2008 Prototype
■ 2010 First sale, get the band back together
■ 2011+ 3 rounds of funding (Draper, ALP, NEA, CNTP)
■ 2014 Open Source
■ 70 employees, 2 offices
brian@bulkowski.org
brian@aerospike.com
@bbulkow
Advertising Technology Stack
MILLIONS OF CONSUMERS
BILLIONS OF DEVICES
APP SERVERS
DATA WAREHOUSE
INSIGHTS
WRITE CONTEXT
In-memory NoSQL
WRITE REAL-TIME CONTEXT
READ RECENT CONTENT
PROFILE STORE
Cookies, email, deviceID, IP address, location, segments, clicks, likes, tweets, search terms...
REAL-TIME ANALYTICS
Best sellers, top scores, trending tweets
BATCH ANALYTICS
Discover patterns, segment data: location patterns, audience affinity
Introduction to Advertising: Real-time Bidding
North American RTB speeds & feeds
■ 1 to 6 billion cookies tracked
■ Some companies track 200M, some track 20B
■ Each bidder has their own data pool
■ Data is your weapon
■ Recent searches, behavior, IP addresses
■ Audience clusters (K-cluster, K-means) from offline Hadoop
■ “Remnant” from Google, Yahoo is about 0.6 million / sec
■ Facebook exchange: about 0.6 million / sec
■ “other” is 0.5 million / sec
Currently about 3.0M / sec in North America
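As a rough check on the figures above, the short Python sketch below adds up the three itemized sources and converts the quoted total into a daily volume. The rates are taken directly from the slide; the variable names are illustrative, and the slide does not say what makes up the part of the total that is not itemized.

```python
# Rates quoted on the slide, in requests per second.
# Integers are used so the sums are exact.
LISTED_SOURCES = {
    "remnant (Google, Yahoo)": 600_000,
    "Facebook exchange": 600_000,
    "other": 500_000,
}
TOTAL_PER_SEC = 3_000_000        # "Currently about 3.0M / sec"
SECONDS_PER_DAY = 24 * 60 * 60   # 86,400

listed_total = sum(LISTED_SOURCES.values())
not_itemized = TOTAL_PER_SEC - listed_total   # remainder the slide does not break down
requests_per_day = TOTAL_PER_SEC * SECONDS_PER_DAY

print(f"itemized sources: {listed_total:,} / sec")
print(f"not itemized:     {not_itemized:,} / sec")
print(f"at the total rate: {requests_per_day:,} requests / day")
```

At the quoted total rate, this works out to roughly a quarter of a trillion bid requests per day.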
Old Architecture (scale out in 2000)
Request routing and sharding
APP SERVERS
CACHE
DATABASE
STORAGE
CONTENT DELIVERY NETWORK
LOAD BALANCER
Modern Scale Out Architecture
Load balancer
Simple stateless
APP SERVERS
IN-MEMORY NoSQL
RESEARCH WAREHOUSE
CONTENT DELIVERY NETWORK
LOAD BALANCER
Long term cold storage
Fast stateless
HDFS BASED

Advanced Visual Analytics and Real-time Analytics at Platform scale by Brian Bulkowski, co-founder & CTO at Aerospike


Editor's Notes

  • #2 Hello, my name is Brian Bulkowski. My credentials are that I have spent over 25 years in Silicon Valley as an engineer, architect, technical lead, and CTO, working for companies like Novell, a streaming video company in the 90s, and a Netscape spinoff called Navio that went public under the name Liberate and reached a $10B valuation. As the initial founder of Aerospike, I’ve had the privilege to work with companies reaching the highest levels of data scale, and to help them achieve their business as well as technical goals: companies such as BlueKai (now Oracle), Neustar, AppNexus, Yahoo, AOL, eBay, and a wide variety of others. Today my goal is to discuss the next-generation data and scale architectures used inside the most leading-edge consumers of data, the advertising industry, which are now being used to power more responsive and personalized online experiences across many industries.
  • #4 This is the technology stack that major advertising technology companies built to sustain the crushing load of aggregating the clicks and views from so many websites. Individual retailers are now using this same tech stack, for the same reason: they wish to present an experience and include analytics-based results.
  • #5 Let’s start with what happened in internet advertising that kicked off a scale revolution. In 2000, Google launched AdWords. This was the ability to buy advertising on a search keyword, instead of a web page. Display advertising was still static.
    Prior to 2005, internet advertising was traded statically. A person bought a certain number of impressions on a website, like buying 1M impressions on the Yahoo home page. These were “rotated” using a variety of technologies, but the model fit the existing model of media buying. Advertising companies would say “you want an article in Car and Driver? And on Car and Driver’s website?”
    In April 2007, Yahoo bought RightMedia. In March 2008, Google acquired DoubleClick. Both of these systems matched display advertisements with consumers based on “cost per click”, and revolutionized the industry.
    Google’s position in the center of advertising, as a black box, was challenged by open bidding exchanges. Founders of both RightMedia and DoubleClick created several companies, and an open “auction system” to democratize the flow of impressions (from publishers) and ads (from advertisers). These companies realized that real-time pricing, through individual auctions, was the only fair system for determining price.
    The RTB system has been used to monetize “long tail” (remnant) advertising, which catches users wherever they might go. “Premium” advertising is still in high demand, and may or may not enter the real-time bidding system.
    At the time of Facebook’s public launch, they used the same closed system that Google Search uses. They eventually found they didn’t have enough advertising content and income was down, so they opened Facebook Exchange.
    At the time, many technologists said about advertising: the algorithms are simple, it’s only scaling that’s hard. Exchanges that slowed down publisher websites were quickly avoided. The 150 millisecond rule was established, with advertising platforms needing to deliver ads to an end user within 150 ms. Platform companies realized the critical nature of keeping that contract: if they failed, there would be fewer ads per page, and less revenue to be had.
    Although some might argue that there is too much display advertising today, this exchange capacity has become necessary for satisfying mobile.
  • #7 This is the old scale-out architecture. In the old system, you used cache and storage tiers, and traditional databases. This architecture does work: systems like Facebook reached massive scale. Few internet companies used storage vendor applications; Amazon and Google didn’t. Tell Srini’s story about scaling Yahoo Mobile with NetApp and Oracle.
  • #8 Here are the technologies and technology providers to watch in each area. Go through the App Layer in particular. Research warehouse: includes new systems like Spark; easy to have multiple analytics systems, common in large deployments; HDFS-based systems.
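
The 150 millisecond rule mentioned in note #5 can be illustrated with a small Python sketch. It is not taken from the talk: `run_auction`, `fast_bidder`, and `slow_bidder` are hypothetical names, and a thread pool merely stands in for the network calls a real exchange would make. The idea shown is only that bids arriving after the deadline are ignored, so one slow bidder cannot delay the page.

```python
import time
from concurrent.futures import ThreadPoolExecutor, wait

DEADLINE_SECONDS = 0.150  # the 150 ms budget from the notes


def run_auction(bidders, request, deadline=DEADLINE_SECONDS):
    """Ask every bidder in parallel and keep only bids that arrive in time.

    `bidders` maps a bidder name to a callable that returns a bid price.
    Returns (winner_name, price), or None if no bid met the deadline.
    """
    executor = ThreadPoolExecutor(max_workers=max(1, len(bidders)))
    futures = {executor.submit(fn, request): name for name, fn in bidders.items()}
    # Block until all bidders answer or the deadline passes, whichever is first.
    done, _late = wait(futures, timeout=deadline)
    executor.shutdown(wait=False)  # do not block on bidders that are still running

    bids = {}
    for future in done:
        if future.exception() is None:  # a failed bidder simply does not bid
            bids[futures[future]] = future.result()
    if not bids:
        return None
    winner = max(bids, key=bids.get)  # highest price wins
    return winner, bids[winner]


# Hypothetical bidders: one answers in 10 ms, one takes 500 ms.
def fast_bidder(request):
    time.sleep(0.01)
    return 1.20


def slow_bidder(request):
    time.sleep(0.5)
    return 9.99


if __name__ == "__main__":
    result = run_auction({"fast": fast_bidder, "slow": slow_bidder},
                         {"cookie_id": "abc"})
    print(result)
```

In this sketch the slow bidder offers the higher price but still loses, because its answer arrives after the deadline. That is the trade-off the notes describe: keeping the latency contract matters more than any single bid.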