SlideShare a Scribd company logo
1 of 32
Download to read offline
© 2014 Aerospike. All rights reserved. Confidential 1
What Enterprises Can Learn
from Real-time Bidding
How, and why, to achieve
Operational Big Data
Brian Bulkowski
CTO and co-founder
Aerospike
© 2014 Aerospike. All rights reserved. Confidential 2
REQUIREMENTS FOR INTERNET ENTERPRISES
© 2014 Aerospike. All rights reserved. Confidential 3
Introduction to Advertising: Real-time Bidding
© 2014 Aerospike. All rights reserved. Confidential 4
North American RTB speeds & feeds
■  1 to 6 billion cookies tracked
■  Some companies track 200M, some track 20B
■  Each bidder has their own data pool
■  Data is your weapon
■  Recent searches, behavior, IP addresses
■  Audience clusters (K-cluster, K-means) from offline Hadoop
■  “Remnant” from Google, Yahoo is about 0.6 million / sec
■  Facebook exchange: about 0.6 million / sec
■  “other” is 0.5 million / sec
Currently about 3.0M / sec in North American
© 2014 Aerospike. All rights reserved. Confidential 5
Advertising requirements
■  100 millisecond or 150 millisecond ad delivery
■  De-facto standard set in 2004 by Washington Post and others
■  North America is 70 to 90 milliseconds wide
■  Two or three data centers
■  Auction is limited to 30 milliseconds
■  Typically closes in 5 milliseconds
■  Winners have more data, better models – in 5 milliseconds
© 2014 Aerospike. All rights reserved. Confidential 6
MILLIONS OF CONSUMERS
BILLIONS OF DEVICES
APP SERVERS
DATA
WAREHOUSEINSIGHTS
Advertising Technology Stack
WRITE CONTEXT
OPERATIONAL DB
WRITE REAL-TIME CONTEXT
READ RECENT CONTENT
PROFILE STORE
Cookies, email, deviceID, IP address, location,
segments, clicks, likes, tweets, search terms...
REAL-TIME ANALYTICS
Best sellers, top scores, trending tweets
BATCH ANALYTICS
Discover patterns,
segment data:
location patterns,
audience affinity
© 2014 Aerospike. All rights reserved. Confidential 7
Financial Services – Intraday Positions
LEGACY DATABASE
(MAINFRAME)
Read/Write
Start of Day
Data Loading
End of Day
Reconciliation
Query
REAL-TIME
DATA FEED
ACCOUNT
POSITIONS
XDR
10M+ user records
Primary key access
1M+ TPS planned
Finance App
Records App
RT Reporting App
© 2014 Aerospike. All rights reserved. Confidential 8
Social Media
MYSQL or POSTGRES
(ROTATIONAL DISK)
Recent user
generated content
Java application tier
Data abstraction
and sharding
MODIFIED REDIS
(SSD ENABLED)
Content and
Historical data
© 2014 Aerospike. All rights reserved. Confidential 9
Travel Portal
PRICING DATABASE
(RATE LIMITED)
Poll for
Pricing
Changes
PRICING
DATA
Store
Latest
Price
SESSION
MANAGEMENT
Session
Data
Read
Price
XDR
Airlines forced interstate
banking
Legacy mainframe
technology
Multi-company
reservation and pricing
Requirement: 1M TPS
allowing overhead
Travel App
© 2014 Aerospike. All rights reserved. Confidential 10
SOURCE
DEVICE/ USER
QOS & Real-Time Billing for Telcos
■  In-switch Per HTTP request Billing
■ US Telcos: 200M subscribers, 50 metros
■  In-memory use case
Hot Standby
Execute Request
Real-time
Checks
DESTINATION
Update
Device
User
Settings
Request
XDR
Real-time Auth. QoS Billing
Config Module App
© 2014 Aerospike. All rights reserved. Confidential 11
MILLIONS OF CONSUMERS
BILLIONS OF DEVICES
APP SERVERS
BATCH
ANALYTICSINSIGHTS
BATCH ANALYTICS
The New Architecture
WRITE CONTEXT
TRANSACTIONS &
HOT ANALYTICS
© 2014 Aerospike. All rights reserved. Confidential 12
Old Architecture ( scale out in 2000 )
Request routing and sharding
APP SERVERS
CACHE
DATABASE
STORAGE
CONTENT
DELIVERY NETWORK
LOAD BALANCER
© 2014 Aerospike. All rights reserved. Confidential 13
Modern Scale Out Architecture
Load balancer
Simple stateless
APP SERVERS
IN-MEMORY NoSQL
RESEARCH
WAREHOUSE
CONTENT
DELIVERY NETWORK
LOAD BALANCER
Long term cold storageFast stateless
© 2014 Aerospike. All rights reserved. Confidential 14
Modern Scale Out Architecture
Load balancer
Simple stateless
APP SERVERS
IN-MEMORY NoSQL
RESEARCH
WAREHOUSE
CONTENT
DELIVERY NETWORK
LOAD BALANCER
Long term cold storageFast stateless
HDFS BASED
© 2014 Aerospike. All rights reserved. Confidential 15
Build a data layer
Use open source
Focus on Key Value
Use In-memory NoSQL
Use Flash
© 2014 Aerospike. All rights reserved. Confidential 16
How Fast You Can Go
And
How To Do It
© 2014 Aerospike. All rights reserved. Confidential 17
SHARED-NOTHING SYSTEM:100% DATA
AVAILABILITY
■  Every node in a cluster is identical,
handles both transactions and long
running tasks
■  Data is replicated synchronously with
immediate consistency within the cluster
■  Data is replicated asynchronously
across data centers
OHIO Data Center
© 2014 Aerospike. All rights reserved. Confidential 18
LESSONS
1.  Optimize key-value code paths
■  No hot spots (e.g., robust DHT)
■  Scales up easily (e.g., easy to size)
■  Avoids points of failure (e.g., single node type)
■  Binary protocol
2.  Code in C
■  Read() / Write() / Linux AIO ( don’t trust a library )
■  Multithread !
■  Direct device
■  C++ could work, leads to structural complexity
3.  Memory allocation matters
■  Stack based allocators
■  Own stack allocator
■  JEMalloc for pools
© 2014 Aerospike. All rights reserved. Confidential 19
LESSONS (cont’d)
4.  Innovation: masters in a shared nothing system
■  Fast cluster organization
■  Fast transaction capabilities
■  Can be CP or AP - and resolve data accurately
5.  Clients are hard
■  Fast stable connection pools are hard
■  API design matters
■  Slow languages need Aerospike more
6.  Aggregations / queries required
■  Row oriented
■  Secondary indexes as filters, and MR style
■  Data transformation hurts
© 2014 Aerospike. All rights reserved. Confidential 20
LESSONS (cont’d)
7.  Network interrupts are painful
■  TCP is still better than UDP
■  Some great hardware out there (solarflare)
■  Network queues, RRD
■  New PCI-e, other interfaces are not there yet
■  Larger clusters solve interface issues
8.  “Large data types” (better documents)
■  Ordered List, Map, Set, Stack
■  Time series and documents
■  Beyond per-row storage layout, more optimal than document
9.  UDFs for flexibility
© 2014 Aerospike. All rights reserved. Confidential 21
WRITING RELIABILY WITH HIGH PERFORMANCE
1.  Write sent to row master
2.  Latch against simultaneous writes
3.  Apply write to master memory and replica
memory synchronously
4.  Queue operations to disk
5.  Signal completed transaction (optional
storage commit wait)
6.  Master applies conflict resolution policy
(rollback/ rollforward)
master replica
1.  Cluster discovers new node via gossip
protocol
2.  Paxos vote determines new data
organization
3.  Partition migrations scheduled
4.  When a partition migration starts, write
journal starts on destination
5.  Partition moves atomically
6.  Journal is applied and source data deleted
transactions
continue
Writing with Immediate Consistency Adding a Node
© 2014 Aerospike. All rights reserved. Confidential 22
Key Value Store + Lists, Maps
■  Namespaces (policy containers)
■  Determine storage - DRAM or Flash
■  Determine replication factor
■  Contain records and sets
■  Sets (tables) of records
■  Arbitrary grouping
■  Records (rows) of key/bins
■  Block size (128k – 2MB)
■  Bin with same name can contain values
of different types
■  String, integer, bytes (raw, blob, etc)
■  list ( an ordered collection of values )
■  map ( a collection of keys and values )
■  Bins can be added anytime
■  Meta data
■  Generation counter so apps can ensure that a
record was not modified since last read
■  Time-to-live value for auto expiration, keeping most
recent context or "hot" data, aging out historical
context
© 2014 Aerospike. All rights reserved. Confidential 23
KVS + Lists, Maps + Queries + UDFs
STREAM
AGGREGATIONS
(INDEXED MAP-REDUCE)
Pipe Query results
through UDFs
■  Filter, Transform,
Aggregate.. Map,
Reduce
■  Enforce security
■  UDFs in Lua to
■ CRUD a record
■ Calculation based
on data within a
record
■ Iterate through a
set / namespace of
records
■  UDFs for real-time
analytics and
aggregations
© 2014 Aerospike. All rights reserved. Confidential 24
SQL & NoSQL
➤ Secondary index
§  Equality, Range, IN (,,,), Compound
§  e.g. WHERE group_id = 1234,
WHERE last_activity > 1349293398, WHERE
branch_id IN (5,6,7,8)
➤ Filters
§  SQL: Where clause with non-indexed “AND”s
(e.g. “AND gender=‘M’ ”)
§  NOSQL: Map step
➤ Aggregation
§  SQL: GROUP BY, ORDER BY, LIMIT, OFFSET
§  NOSQL: Reduce step
Secondary Key
Primary Key
Record
Filter Map
Aggregate
DRAM
SSD
Aggregate
Client
Client
Server
Reduce
Aggregate
Query
© 2014 Aerospike. All rights reserved. Confidential 25
Flash storage
The Power of Flash Storage
© 2014 Aerospike. All rights reserved. Confidential 26
DATABASE
OS FILE SYSTEM
PAGE CACHE
BLOCK INTERFACE
SSD HDD
BLOCK INTERFACE
SSD SSD
OPEN NVM
SSD
Ask me and I’ll tell you the answer.Ask me. I’ll look up the answer and then tell it to
you.
DATABASE
HYBRID MEMORY SYSTEM™
•  Direct device access
•  Large Block Writes
•  Indexes in DRAM
•  Highly Parallelized
•  Log-structured FS “copy-on-write”
•  Fast restart with shared memory
FLASH OPTIMIZED HIGH
PERFORMANCE
© 2014 Aerospike. All rights reserved. Confidential 27© 2012 Aerospike. All rights reserved. Pg. 27
Measure your drives!
Aerospike Certification Tool (ACT)
http://github.com/aerospike/act
Transactional database workload
Reads: 1.5KB
(can’t batch / cache reads, random)
Writes: 128K blocks
(log based layout)
(plus defragmentation)
Turn up the load until
latency is over required SLA
© 2014 Aerospike. All rights reserved. Confidential 28
Aerospike’s Flash Experience
■  Know your Flash
■ ACT benchmark http://github.com/aerospike/act
■ Read-write benchmark results back to 2011
■  All clouds support flash now
■ New EC2 instances
■ Google Compute
■ Internap, Softlayer, GoGrid…
■  Write durability usually not a problem with modern flash
■ Durability is high (5 “drive writes per day” for 5 years, etc)
■ Read performance suffers under write load anyway
© 2014 Aerospike. All rights reserved. Confidential 29
Aerospike’s Flash Experience
■  Densities increasing
■ 100G 2 years ago à 800G today
■ SATA vs PCI-E
■ Appliances: 50T per 1U this year
■  Prices still dropping: perhaps $1/G next year
■  Intel P3700 results
■ 250K per device @ $2.5 / G
■ Old standard: Micron P320h 500K @ $8 / G
■  “Wide SATA”
■ 20 SATA drives
■ LSI “pass through mode”
■ 250K+ per server
© 2014 Aerospike. All rights reserved. Confidential 30
Use Open Source
© 2014 Aerospike. All rights reserved. Confidential 31
Aerospike: the trusted In-Memory NoSQL
Performance
• Over ten trillion transactions per month
• 99% of transactions < 2 ms
• 150K TPS per server
Scalability
• Billions of Internet users
• Clustered Software
• Maintenance without downtime
• Scale up & scale out
Reliability
• 50 customers; zero down-time
• Immediate Consistency
• Rapid Failover; Data Center Replication
Price/Performance
• Makes impossible projects affordable
• Flash-optimized
• 1/10 the servers required
© 2014 Aerospike. All rights reserved. Confidential 32

More Related Content

What's hot

Democratizing Memory Storage
Democratizing Memory StorageDemocratizing Memory Storage
Democratizing Memory StorageDataWorks Summit
 
Debunking the Myths of HDFS Erasure Coding Performance
Debunking the Myths of HDFS Erasure Coding Performance Debunking the Myths of HDFS Erasure Coding Performance
Debunking the Myths of HDFS Erasure Coding Performance DataWorks Summit/Hadoop Summit
 
Less is More: 2X Storage Efficiency with HDFS Erasure Coding
Less is More: 2X Storage Efficiency with HDFS Erasure CodingLess is More: 2X Storage Efficiency with HDFS Erasure Coding
Less is More: 2X Storage Efficiency with HDFS Erasure CodingZhe Zhang
 
Hadoop Meetup Jan 2019 - Hadoop Encryption
Hadoop Meetup Jan 2019 - Hadoop EncryptionHadoop Meetup Jan 2019 - Hadoop Encryption
Hadoop Meetup Jan 2019 - Hadoop EncryptionErik Krogen
 
Tachyon meetup slides.
Tachyon meetup slides.Tachyon meetup slides.
Tachyon meetup slides.David Groozman
 
Alluxio+Presto: An Architecture for Fast SQL in the Cloud
Alluxio+Presto: An Architecture for Fast SQL in the CloudAlluxio+Presto: An Architecture for Fast SQL in the Cloud
Alluxio+Presto: An Architecture for Fast SQL in the CloudAlluxio, Inc.
 
Reduce Storage Costs by 5x Using The New HDFS Tiered Storage Feature
Reduce Storage Costs by 5x Using The New HDFS Tiered Storage Feature Reduce Storage Costs by 5x Using The New HDFS Tiered Storage Feature
Reduce Storage Costs by 5x Using The New HDFS Tiered Storage Feature DataWorks Summit
 
Hadoop Storage in the Cloud Native Era
Hadoop Storage in the Cloud Native EraHadoop Storage in the Cloud Native Era
Hadoop Storage in the Cloud Native EraDataWorks Summit
 
The Hive Think Tank: Rocking the Database World with RocksDB
The Hive Think Tank: Rocking the Database World with RocksDBThe Hive Think Tank: Rocking the Database World with RocksDB
The Hive Think Tank: Rocking the Database World with RocksDBThe Hive
 
Ceph and RocksDB
Ceph and RocksDBCeph and RocksDB
Ceph and RocksDBSage Weil
 
In-memory Caching in HDFS: Lower Latency, Same Great Taste
In-memory Caching in HDFS: Lower Latency, Same Great TasteIn-memory Caching in HDFS: Lower Latency, Same Great Taste
In-memory Caching in HDFS: Lower Latency, Same Great TasteDataWorks Summit
 
Hadoop Meetup Jan 2019 - Mounting Remote Stores in HDFS
Hadoop Meetup Jan 2019 - Mounting Remote Stores in HDFSHadoop Meetup Jan 2019 - Mounting Remote Stores in HDFS
Hadoop Meetup Jan 2019 - Mounting Remote Stores in HDFSErik Krogen
 
Hadoop Meetup Jan 2019 - Router-Based Federation and Storage Tiering
Hadoop Meetup Jan 2019 - Router-Based Federation and Storage TieringHadoop Meetup Jan 2019 - Router-Based Federation and Storage Tiering
Hadoop Meetup Jan 2019 - Router-Based Federation and Storage TieringErik Krogen
 
Interactive Hadoop via Flash and Memory
Interactive Hadoop via Flash and MemoryInteractive Hadoop via Flash and Memory
Interactive Hadoop via Flash and MemoryChris Nauroth
 

What's hot (20)

Democratizing Memory Storage
Democratizing Memory StorageDemocratizing Memory Storage
Democratizing Memory Storage
 
Debunking the Myths of HDFS Erasure Coding Performance
Debunking the Myths of HDFS Erasure Coding Performance Debunking the Myths of HDFS Erasure Coding Performance
Debunking the Myths of HDFS Erasure Coding Performance
 
HDFS Tiered Storage
HDFS Tiered StorageHDFS Tiered Storage
HDFS Tiered Storage
 
Less is More: 2X Storage Efficiency with HDFS Erasure Coding
Less is More: 2X Storage Efficiency with HDFS Erasure CodingLess is More: 2X Storage Efficiency with HDFS Erasure Coding
Less is More: 2X Storage Efficiency with HDFS Erasure Coding
 
Hadoop Meetup Jan 2019 - Hadoop Encryption
Hadoop Meetup Jan 2019 - Hadoop EncryptionHadoop Meetup Jan 2019 - Hadoop Encryption
Hadoop Meetup Jan 2019 - Hadoop Encryption
 
Tachyon meetup slides.
Tachyon meetup slides.Tachyon meetup slides.
Tachyon meetup slides.
 
Alluxio+Presto: An Architecture for Fast SQL in the Cloud
Alluxio+Presto: An Architecture for Fast SQL in the CloudAlluxio+Presto: An Architecture for Fast SQL in the Cloud
Alluxio+Presto: An Architecture for Fast SQL in the Cloud
 
Reduce Storage Costs by 5x Using The New HDFS Tiered Storage Feature
Reduce Storage Costs by 5x Using The New HDFS Tiered Storage Feature Reduce Storage Costs by 5x Using The New HDFS Tiered Storage Feature
Reduce Storage Costs by 5x Using The New HDFS Tiered Storage Feature
 
Hadoop Storage in the Cloud Native Era
Hadoop Storage in the Cloud Native EraHadoop Storage in the Cloud Native Era
Hadoop Storage in the Cloud Native Era
 
The Hive Think Tank: Rocking the Database World with RocksDB
The Hive Think Tank: Rocking the Database World with RocksDBThe Hive Think Tank: Rocking the Database World with RocksDB
The Hive Think Tank: Rocking the Database World with RocksDB
 
HDF - Current status and Future Directions
HDF - Current status and Future Directions HDF - Current status and Future Directions
HDF - Current status and Future Directions
 
Ceph and RocksDB
Ceph and RocksDBCeph and RocksDB
Ceph and RocksDB
 
In-memory Caching in HDFS: Lower Latency, Same Great Taste
In-memory Caching in HDFS: Lower Latency, Same Great TasteIn-memory Caching in HDFS: Lower Latency, Same Great Taste
In-memory Caching in HDFS: Lower Latency, Same Great Taste
 
Hadoop Meetup Jan 2019 - Mounting Remote Stores in HDFS
Hadoop Meetup Jan 2019 - Mounting Remote Stores in HDFSHadoop Meetup Jan 2019 - Mounting Remote Stores in HDFS
Hadoop Meetup Jan 2019 - Mounting Remote Stores in HDFS
 
Cross-DC Fault-Tolerant ViewFileSystem @ Twitter
Cross-DC Fault-Tolerant ViewFileSystem @ TwitterCross-DC Fault-Tolerant ViewFileSystem @ Twitter
Cross-DC Fault-Tolerant ViewFileSystem @ Twitter
 
HDF5 <-> Zarr
HDF5 <-> ZarrHDF5 <-> Zarr
HDF5 <-> Zarr
 
HDF5 and Ecosystem: What Is New?
HDF5 and Ecosystem: What Is New?HDF5 and Ecosystem: What Is New?
HDF5 and Ecosystem: What Is New?
 
Hadoop Meetup Jan 2019 - Router-Based Federation and Storage Tiering
Hadoop Meetup Jan 2019 - Router-Based Federation and Storage TieringHadoop Meetup Jan 2019 - Router-Based Federation and Storage Tiering
Hadoop Meetup Jan 2019 - Router-Based Federation and Storage Tiering
 
Introduction to Redis
Introduction to RedisIntroduction to Redis
Introduction to Redis
 
Interactive Hadoop via Flash and Memory
Interactive Hadoop via Flash and MemoryInteractive Hadoop via Flash and Memory
Interactive Hadoop via Flash and Memory
 

Similar to What enterprises can learn from Real Time Bidding (RTB)

Brian Bulkowski : what startups can learn from real-time bidding
Brian Bulkowski : what startups can learn from real-time biddingBrian Bulkowski : what startups can learn from real-time bidding
Brian Bulkowski : what startups can learn from real-time biddingAerospike
 
Aerospike AdTech Gets Hacked in Lower Manhattan
Aerospike AdTech Gets Hacked in Lower ManhattanAerospike AdTech Gets Hacked in Lower Manhattan
Aerospike AdTech Gets Hacked in Lower ManhattanAerospike
 
You Snooze You Lose or How to Win in Ad Tech?
You Snooze You Lose or How to Win in Ad Tech?You Snooze You Lose or How to Win in Ad Tech?
You Snooze You Lose or How to Win in Ad Tech?Aerospike, Inc.
 
Flash Economics and Lessons learned from operating low latency platforms at h...
Flash Economics and Lessons learned from operating low latency platforms at h...Flash Economics and Lessons learned from operating low latency platforms at h...
Flash Economics and Lessons learned from operating low latency platforms at h...Aerospike, Inc.
 
ACID & CAP: Clearing CAP Confusion and Why C In CAP ≠ C in ACID
ACID & CAP:  Clearing CAP Confusion and Why C In CAP ≠ C in ACIDACID & CAP:  Clearing CAP Confusion and Why C In CAP ≠ C in ACID
ACID & CAP: Clearing CAP Confusion and Why C In CAP ≠ C in ACIDAerospike, Inc.
 
Big Data Learnings from a Vendor's Perspective
Big Data Learnings from a Vendor's PerspectiveBig Data Learnings from a Vendor's Perspective
Big Data Learnings from a Vendor's PerspectiveAerospike, Inc.
 
Real-Time Analytics in Transactional Applications by Brian Bulkowski
Real-Time Analytics in Transactional Applications by Brian BulkowskiReal-Time Analytics in Transactional Applications by Brian Bulkowski
Real-Time Analytics in Transactional Applications by Brian BulkowskiData Con LA
 
Aerospike Architecture
Aerospike ArchitectureAerospike Architecture
Aerospike ArchitecturePeter Milne
 
Rapid Application Design in Financial Services
Rapid Application Design in Financial ServicesRapid Application Design in Financial Services
Rapid Application Design in Financial ServicesAerospike
 
NoSQL in Real-time Architectures
NoSQL in Real-time ArchitecturesNoSQL in Real-time Architectures
NoSQL in Real-time ArchitecturesRonen Botzer
 
WEBINAR: Architectures for Digital Transformation and Next-Generation Systems...
WEBINAR: Architectures for Digital Transformation and Next-Generation Systems...WEBINAR: Architectures for Digital Transformation and Next-Generation Systems...
WEBINAR: Architectures for Digital Transformation and Next-Generation Systems...Aerospike, Inc.
 
IMC Summit 2016 Breakout - Brian Bulkowski - NVMe, Storage Class Memory and O...
IMC Summit 2016 Breakout - Brian Bulkowski - NVMe, Storage Class Memory and O...IMC Summit 2016 Breakout - Brian Bulkowski - NVMe, Storage Class Memory and O...
IMC Summit 2016 Breakout - Brian Bulkowski - NVMe, Storage Class Memory and O...In-Memory Computing Summit
 
Pilot Hadoop Towards 2500 Nodes and Cluster Redundancy
Pilot Hadoop Towards 2500 Nodes and Cluster RedundancyPilot Hadoop Towards 2500 Nodes and Cluster Redundancy
Pilot Hadoop Towards 2500 Nodes and Cluster RedundancyStuart Pook
 
Design Choices for Cloud Data Platforms
Design Choices for Cloud Data PlatformsDesign Choices for Cloud Data Platforms
Design Choices for Cloud Data PlatformsAshish Mrig
 
How the Development Bank of Singapore solves on-prem compute capacity challen...
How the Development Bank of Singapore solves on-prem compute capacity challen...How the Development Bank of Singapore solves on-prem compute capacity challen...
How the Development Bank of Singapore solves on-prem compute capacity challen...Alluxio, Inc.
 
Data core overview - haluk-final
Data core overview - haluk-finalData core overview - haluk-final
Data core overview - haluk-finalHaluk Ulubay
 
HPE Solutions for Challenges in AI and Big Data
HPE Solutions for Challenges in AI and Big DataHPE Solutions for Challenges in AI and Big Data
HPE Solutions for Challenges in AI and Big DataLviv Startup Club
 
Saviak lviv ai-2019-e-mail (1)
Saviak lviv ai-2019-e-mail (1)Saviak lviv ai-2019-e-mail (1)
Saviak lviv ai-2019-e-mail (1)Lviv Startup Club
 
Kudu: Fast Analytics on Fast Data
Kudu: Fast Analytics on Fast DataKudu: Fast Analytics on Fast Data
Kudu: Fast Analytics on Fast Datamichaelguia
 

Similar to What enterprises can learn from Real Time Bidding (RTB) (20)

Brian Bulkowski : what startups can learn from real-time bidding
Brian Bulkowski : what startups can learn from real-time biddingBrian Bulkowski : what startups can learn from real-time bidding
Brian Bulkowski : what startups can learn from real-time bidding
 
Aerospike AdTech Gets Hacked in Lower Manhattan
Aerospike AdTech Gets Hacked in Lower ManhattanAerospike AdTech Gets Hacked in Lower Manhattan
Aerospike AdTech Gets Hacked in Lower Manhattan
 
You Snooze You Lose or How to Win in Ad Tech?
You Snooze You Lose or How to Win in Ad Tech?You Snooze You Lose or How to Win in Ad Tech?
You Snooze You Lose or How to Win in Ad Tech?
 
Flash Economics and Lessons learned from operating low latency platforms at h...
Flash Economics and Lessons learned from operating low latency platforms at h...Flash Economics and Lessons learned from operating low latency platforms at h...
Flash Economics and Lessons learned from operating low latency platforms at h...
 
ACID & CAP: Clearing CAP Confusion and Why C In CAP ≠ C in ACID
ACID & CAP:  Clearing CAP Confusion and Why C In CAP ≠ C in ACIDACID & CAP:  Clearing CAP Confusion and Why C In CAP ≠ C in ACID
ACID & CAP: Clearing CAP Confusion and Why C In CAP ≠ C in ACID
 
Big Data Learnings from a Vendor's Perspective
Big Data Learnings from a Vendor's PerspectiveBig Data Learnings from a Vendor's Perspective
Big Data Learnings from a Vendor's Perspective
 
Real-Time Analytics in Transactional Applications by Brian Bulkowski
Real-Time Analytics in Transactional Applications by Brian BulkowskiReal-Time Analytics in Transactional Applications by Brian Bulkowski
Real-Time Analytics in Transactional Applications by Brian Bulkowski
 
Aerospike Architecture
Aerospike ArchitectureAerospike Architecture
Aerospike Architecture
 
Rapid Application Design in Financial Services
Rapid Application Design in Financial ServicesRapid Application Design in Financial Services
Rapid Application Design in Financial Services
 
NoSQL in Real-time Architectures
NoSQL in Real-time ArchitecturesNoSQL in Real-time Architectures
NoSQL in Real-time Architectures
 
WEBINAR: Architectures for Digital Transformation and Next-Generation Systems...
WEBINAR: Architectures for Digital Transformation and Next-Generation Systems...WEBINAR: Architectures for Digital Transformation and Next-Generation Systems...
WEBINAR: Architectures for Digital Transformation and Next-Generation Systems...
 
Introduction to Aerospike
Introduction to AerospikeIntroduction to Aerospike
Introduction to Aerospike
 
IMC Summit 2016 Breakout - Brian Bulkowski - NVMe, Storage Class Memory and O...
IMC Summit 2016 Breakout - Brian Bulkowski - NVMe, Storage Class Memory and O...IMC Summit 2016 Breakout - Brian Bulkowski - NVMe, Storage Class Memory and O...
IMC Summit 2016 Breakout - Brian Bulkowski - NVMe, Storage Class Memory and O...
 
Pilot Hadoop Towards 2500 Nodes and Cluster Redundancy
Pilot Hadoop Towards 2500 Nodes and Cluster RedundancyPilot Hadoop Towards 2500 Nodes and Cluster Redundancy
Pilot Hadoop Towards 2500 Nodes and Cluster Redundancy
 
Design Choices for Cloud Data Platforms
Design Choices for Cloud Data PlatformsDesign Choices for Cloud Data Platforms
Design Choices for Cloud Data Platforms
 
How the Development Bank of Singapore solves on-prem compute capacity challen...
How the Development Bank of Singapore solves on-prem compute capacity challen...How the Development Bank of Singapore solves on-prem compute capacity challen...
How the Development Bank of Singapore solves on-prem compute capacity challen...
 
Data core overview - haluk-final
Data core overview - haluk-finalData core overview - haluk-final
Data core overview - haluk-final
 
HPE Solutions for Challenges in AI and Big Data
HPE Solutions for Challenges in AI and Big DataHPE Solutions for Challenges in AI and Big Data
HPE Solutions for Challenges in AI and Big Data
 
Saviak lviv ai-2019-e-mail (1)
Saviak lviv ai-2019-e-mail (1)Saviak lviv ai-2019-e-mail (1)
Saviak lviv ai-2019-e-mail (1)
 
Kudu: Fast Analytics on Fast Data
Kudu: Fast Analytics on Fast DataKudu: Fast Analytics on Fast Data
Kudu: Fast Analytics on Fast Data
 

More from bigdatagurus_meetup

More from bigdatagurus_meetup (11)

Apache Sentry for Hadoop security
Apache Sentry for Hadoop securityApache Sentry for Hadoop security
Apache Sentry for Hadoop security
 
Hypertable - massively scalable nosql database
Hypertable - massively scalable nosql databaseHypertable - massively scalable nosql database
Hypertable - massively scalable nosql database
 
Big data beyond the hype may 2014
Big data beyond the hype may 2014Big data beyond the hype may 2014
Big data beyond the hype may 2014
 
Quantcast File System (QFS) - Alternative to HDFS
Quantcast File System (QFS) - Alternative to HDFSQuantcast File System (QFS) - Alternative to HDFS
Quantcast File System (QFS) - Alternative to HDFS
 
Scaling HBase at Pinterest
Scaling HBase at PinterestScaling HBase at Pinterest
Scaling HBase at Pinterest
 
Continuuity Weave
Continuuity WeaveContinuuity Weave
Continuuity Weave
 
Cassandra 2.0 (Introduction)
Cassandra 2.0 (Introduction)Cassandra 2.0 (Introduction)
Cassandra 2.0 (Introduction)
 
Search On Hadoop
Search On HadoopSearch On Hadoop
Search On Hadoop
 
Apache Tez -- A modern processing engine
Apache Tez -- A modern processing engineApache Tez -- A modern processing engine
Apache Tez -- A modern processing engine
 
Cloudera Developer Kit (CDK)
Cloudera Developer Kit (CDK)Cloudera Developer Kit (CDK)
Cloudera Developer Kit (CDK)
 
Lipstick On Pig
Lipstick On Pig Lipstick On Pig
Lipstick On Pig
 

Recently uploaded

PKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPramod Kumar Srivastava
 
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝soniya singh
 
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...Florian Roscheck
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfMarinCaroMartnezBerg
 
Industrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfIndustrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfLars Albertsson
 
Unveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data AnalystUnveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data AnalystSamantha Rae Coolbeth
 
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service BhilaiLow Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service BhilaiSuhani Kapoor
 
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Sapana Sha
 
Dubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls DubaiDubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls Dubaihf8803863
 
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Callshivangimorya083
 
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptxEMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptxthyngster
 
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一ffjhghh
 
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Serviceranjana rawat
 
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service AmravatiVIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service AmravatiSuhani Kapoor
 
Digi Khata Problem along complete plan.pptx
Digi Khata Problem along complete plan.pptxDigi Khata Problem along complete plan.pptx
Digi Khata Problem along complete plan.pptxTanveerAhmed817946
 
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /WhatsappsBeautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsappssapnasaifi408
 
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptdokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptSonatrach
 
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfRachmat Ramadhan H
 

Recently uploaded (20)

PKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
 
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
 
Decoding Loan Approval: Predictive Modeling in Action
Decoding Loan Approval: Predictive Modeling in ActionDecoding Loan Approval: Predictive Modeling in Action
Decoding Loan Approval: Predictive Modeling in Action
 
E-Commerce Order PredictionShraddha Kamble.pptx
E-Commerce Order PredictionShraddha Kamble.pptxE-Commerce Order PredictionShraddha Kamble.pptx
E-Commerce Order PredictionShraddha Kamble.pptx
 
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdf
 
Industrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfIndustrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdf
 
Unveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data AnalystUnveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data Analyst
 
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service BhilaiLow Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
 
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
 
Dubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls DubaiDubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls Dubai
 
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptxEMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
 
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一
 
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
 
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service AmravatiVIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
 
Digi Khata Problem along complete plan.pptx
Digi Khata Problem along complete plan.pptxDigi Khata Problem along complete plan.pptx
Digi Khata Problem along complete plan.pptx
 
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /WhatsappsBeautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
 
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptdokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
 
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
 

What enterprises can learn from Real Time Bidding (RTB)

  • 1. © 2014 Aerospike. All rights reserved. Confidential 1 What Enterprises Can Learn from Real-time Bidding How, and why, to achieve Operational Big Data Brian Bulkowski CTO and co-founder Aerospike
  • 2. © 2014 Aerospike. All rights reserved. Confidential 2 REQUIREMENTS FOR INTERNET ENTERPRISES
  • 3. © 2014 Aerospike. All rights reserved. Confidential 3 Introduction to Advertising: Real-time Bidding
  • 4. © 2014 Aerospike. All rights reserved. Confidential 4 North American RTB speeds & feeds ■  1 to 6 billion cookies tracked ■  Some companies track 200M, some track 20B ■  Each bidder has their own data pool ■  Data is your weapon ■  Recent searches, behavior, IP addresses ■  Audience clusters (K-cluster, K-means) from offline Hadoop ■  “Remnant” from Google, Yahoo is about 0.6 million / sec ■  Facebook exchange: about 0.6 million / sec ■  “other” is 0.5 million / sec Currently about 3.0M / sec in North American
  • 5. © 2014 Aerospike. All rights reserved. Confidential 5 Advertising requirements ■  100 millisecond or 150 millisecond ad delivery ■  De-facto standard set in 2004 by Washington Post and others ■  North America is 70 to 90 milliseconds wide ■  Two or three data centers ■  Auction is limited to 30 milliseconds ■  Typically closes in 5 milliseconds ■  Winners have more data, better models – in 5 milliseconds
  • 6. © 2014 Aerospike. All rights reserved. Confidential 6 MILLIONS OF CONSUMERS BILLIONS OF DEVICES APP SERVERS DATA WAREHOUSEINSIGHTS Advertising Technology Stack WRITE CONTEXT OPERATIONAL DB WRITE REAL-TIME CONTEXT READ RECENT CONTENT PROFILE STORE Cookies, email, deviceID, IP address, location, segments, clicks, likes, tweets, search terms... REAL-TIME ANALYTICS Best sellers, top scores, trending tweets BATCH ANALYTICS Discover patterns, segment data: location patterns, audience affinity
  • 7. © 2014 Aerospike. All rights reserved. Confidential 7 Financial Services – Intraday Positions LEGACY DATABASE (MAINFRAME) Read/Write Start of Day Data Loading End of Day Reconciliation Query REAL-TIME DATA FEED ACCOUNT POSITIONS XDR 10M+ user records Primary key access 1M+ TPS planned Finance App Records App RT Reporting App
  • 8. © 2014 Aerospike. All rights reserved. Confidential 8 Social Media MYSQL or POSTGRES (ROTATIONAL DISK) Recent user generated content Java application tier Data abstraction and sharding MODIFIED REDIS (SSD ENABLED) Content and Historical data
  • 9. © 2014 Aerospike. All rights reserved. Confidential 9 Travel Portal PRICING DATABASE (RATE LIMITED) Poll for Pricing Changes PRICING DATA Store Latest Price SESSION MANAGEMENT Session Data Read Price XDR Airlines forced interstate banking Legacy mainframe technology Multi-company reservation and pricing Requirement: 1M TPS allowing overhead Travel App
  • 10. © 2014 Aerospike. All rights reserved. Confidential 10 SOURCE DEVICE/ USER QOS & Real-Time Billing for Telcos ■  In-switch Per HTTP request Billing ■ US Telcos: 200M subscribers, 50 metros ■  In-memory use case Hot Standby Execute Request Real-time Checks DESTINATION Update Device User Settings Request XDR Real-time Auth. QoS Billing Config Module App
  • 11. © 2014 Aerospike. All rights reserved. Confidential 11 MILLIONS OF CONSUMERS BILLIONS OF DEVICES APP SERVERS BATCH ANALYTICSINSIGHTS BATCH ANALYTICS The New Architecture WRITE CONTEXT TRANSACTIONS & HOT ANALYTICS
  • 12. © 2014 Aerospike. All rights reserved. Confidential 12 Old Architecture ( scale out in 2000 ) Request routing and sharding APP SERVERS CACHE DATABASE STORAGE CONTENT DELIVERY NETWORK LOAD BALANCER
  • 13. © 2014 Aerospike. All rights reserved. Confidential 13 Modern Scale Out Architecture Load balancer Simple stateless APP SERVERS IN-MEMORY NoSQL RESEARCH WAREHOUSE CONTENT DELIVERY NETWORK LOAD BALANCER Long term cold storageFast stateless
  • 14. © 2014 Aerospike. All rights reserved. Confidential 14 Modern Scale Out Architecture Load balancer Simple stateless APP SERVERS IN-MEMORY NoSQL RESEARCH WAREHOUSE CONTENT DELIVERY NETWORK LOAD BALANCER Long term cold storageFast stateless HDFS BASED
  • 15. © 2014 Aerospike. All rights reserved. Confidential 15 Build a data layer Use open source Focus on Key Value Use In-memory NoSQL Use Flash
  • 16. © 2014 Aerospike. All rights reserved. Confidential 16 How Fast You Can Go And How To Do It
  • 17. © 2014 Aerospike. All rights reserved. Confidential 17 SHARED-NOTHING SYSTEM:100% DATA AVAILABILITY ■  Every node in a cluster is identical, handles both transactions and long running tasks ■  Data is replicated synchronously with immediate consistency within the cluster ■  Data is replicated asynchronously across data centers OHIO Data Center
  • 18. © 2014 Aerospike. All rights reserved. Confidential 18 LESSONS 1.  Optimize key-value code paths ■  No hot spots (e.g., robust DHT) ■  Scales up easily (e.g., easy to size) ■  Avoids points of failure (e.g., single node type) ■  Binary protocol 2.  Code in C ■  Read() / Write() / Linux AIO ( don’t trust a library ) ■  Multithread ! ■  Direct device ■  C++ could work, leads to structural complexity 3.  Memory allocation matters ■  Stack based allocators ■  Own stack allocator ■  JEMalloc for pools
  • 19. © 2014 Aerospike. All rights reserved. Confidential 19 LESSONS (cont’d) 4.  Innovation: masters in a shared nothing system ■  Fast cluster organization ■  Fast transaction capabilities ■  Can be CP or AP - and resolve data accurately 5.  Clients are hard ■  Fast stable connection pools are hard ■  API design matters ■  Slow languages need Aerospike more 6.  Aggregations / queries required ■  Row oriented ■  Secondary indexes as filters, and MR style ■  Data transformation hurts
  • 20. © 2014 Aerospike. All rights reserved. Confidential 20 LESSONS (cont’d) 7.  Network interrupts are painful ■  TCP is still better than UDP ■  Some great hardware out there (solarflare) ■  Network queues, RRD ■  New PCI-e, other interfaces are not there yet ■  Larger clusters solve interface issues 8.  “Large data types” (better documents) ■  Ordered List, Map, Set, Stack ■  Time series and documents ■  Beyond per-row storage layout, more optimal than document 9.  UDFs for flexibility
  • 21. © 2014 Aerospike. All rights reserved. Confidential 21 WRITING RELIABILY WITH HIGH PERFORMANCE 1.  Write sent to row master 2.  Latch against simultaneous writes 3.  Apply write to master memory and replica memory synchronously 4.  Queue operations to disk 5.  Signal completed transaction (optional storage commit wait) 6.  Master applies conflict resolution policy (rollback/ rollforward) master replica 1.  Cluster discovers new node via gossip protocol 2.  Paxos vote determines new data organization 3.  Partition migrations scheduled 4.  When a partition migration starts, write journal starts on destination 5.  Partition moves atomically 6.  Journal is applied and source data deleted transactions continue Writing with Immediate Consistency Adding a Node
  • 22. © 2014 Aerospike. All rights reserved. Confidential 22 Key Value Store + Lists, Maps ■  Namespaces (policy containers) ■  Determine storage - DRAM or Flash ■  Determine replication factor ■  Contain records and sets ■  Sets (tables) of records ■  Arbitrary grouping ■  Records (rows) of key/bins ■  Block size (128k – 2MB) ■  Bin with same name can contain values of different types ■  String, integer, bytes (raw, blob, etc) ■  list ( an ordered collection of values ) ■  map ( a collection of keys and values ) ■  Bins can be added anytime ■  Meta data ■  Generation counter so apps can ensure that a record was not modified since last read ■  Time-to-live value for auto expiration, keeping most recent context or "hot" data, aging out historical context
  • 23. © 2014 Aerospike. All rights reserved. Confidential 23 KVS + Lists, Maps + Queries + UDFs STREAM AGGREGATIONS (INDEXED MAP-REDUCE) Pipe Query results through UDFs ■  Filter, Transform, Aggregate.. Map, Reduce ■  Enforce security ■  UDFs in Lua to ■ CRUD a record ■ Calculation based on data within a record ■ Iterate through a set / namespace of records ■  UDFs for real-time analytics and aggregations
  • 24. © 2014 Aerospike. All rights reserved. Confidential 24 SQL & NoSQL ➤ Secondary index §  Equality, Range, IN (,,,), Compound §  e.g. WHERE group_id = 1234, WHERE last_activity > 1349293398, WHERE branch_id IN (5,6,7,8) ➤ Filters §  SQL: Where clause with non-indexed “AND”s (e.g. “AND gender=‘M’ ”) §  NOSQL: Map step ➤ Aggregation §  SQL: GROUP BY, ORDER BY, LIMIT, OFFSET §  NOSQL: Reduce step Secondary Key Primary Key Record Filter Map Aggregate DRAM SSD Aggregate Client Client Server Reduce Aggregate Query
  • 25. © 2014 Aerospike. All rights reserved. Confidential 25 Flash storage The Power of Flash Storage
  • 26. © 2014 Aerospike. All rights reserved. Confidential 26 DATABASE OS FILE SYSTEM PAGE CACHE BLOCK INTERFACE SSD HDD BLOCK INTERFACE SSD SSD OPEN NVM SSD Ask me and I’ll tell you the answer.Ask me. I’ll look up the answer and then tell it to you. DATABASE HYBRID MEMORY SYSTEM™ •  Direct device access •  Large Block Writes •  Indexes in DRAM •  Highly Parallelized •  Log-structured FS “copy-on-write” •  Fast restart with shared memory FLASH OPTIMIZED HIGH PERFORMANCE
  • 27. © 2014 Aerospike. All rights reserved. Confidential 27© 2012 Aerospike. All rights reserved. Pg. 27 Measure your drives! Aerospike Certification Tool (ACT) http://github.com/aerospike/act Transactional database workload Reads: 1.5KB (can’t batch / cache reads, random) Writes: 128K blocks (log based layout) (plus defragmentation) Turn up the load until latency is over required SLA
  • 28. © 2014 Aerospike. All rights reserved. Confidential 28 Aerospike’s Flash Experience ■  Know your Flash ■ ACT benchmark http://github.com/aerospike/act ■ Read-write benchmark results back to 2011 ■  All clouds support flash now ■ New EC2 instances ■ Google Compute ■ Internap, Softlayer, GoGrid… ■  Write durability usually not a problem with modern flash ■ Durability is high (5 “drive writes per day” for 5 years, etc) ■ Read performance suffers under write load anyway
  • 29. © 2014 Aerospike. All rights reserved. Confidential 29 Aerospike’s Flash Experience ■  Densities increasing ■ 100G 2 years ago à 800G today ■ SATA vs PCI-E ■ Appliances: 50T per 1U this year ■  Prices still dropping: perhaps $1/G next year ■  Intel P3700 results ■ 250K per device @ $2.5 / G ■ Old standard: Micron P320h 500K @ $8 / G ■  “Wide SATA” ■ 20 SATA drives ■ LSI “pass through mode” ■ 250K+ per server
  • 30. © 2014 Aerospike. All rights reserved. Confidential 30 Use Open Source
  • 31. © 2014 Aerospike. All rights reserved. Confidential 31 Aerospike: the trusted In-Memory NoSQL Performance • Over ten trillion transactions per month • 99% of transactions < 2 ms • 150K TPS per server Scalability • Billions of Internet users • Clustered Software • Maintenance without downtime • Scale up & scale out Reliability • 50 customers; zero down-time • Immediate Consistency • Rapid Failover; Data Center Replication Price/Performance • Makes impossible projects affordable • Flash-optimized • 1/10 the servers required
  • 32. © 2014 Aerospike. All rights reserved. Confidential 32