Cassandra @ eBay
Jay Patel
Architect, Platform Systems
@pateljay3001
eBay Marketplaces
Thousands of servers
Petabytes of data
Billions of SQLs/day
24x7x365
99.98+% Availability
turning over a TB every second
Multiple Datacenters
Near-Real-time
Always online
400+ million items for sale
$75 billion+ per year in goods are sold on eBay
Big Data
112 million active users
Billions of page views/day
eBay Site Data Infrastructure
Don’t force!
One size does not fit all.
It’s a mixture of multiple SQL & NoSQL databases.
We use the right database for the right problem.
eBay Site Data Infrastructure
A heterogeneous mixture
Thousands of nodes
> 2K sharded logical hosts
> 16K tables
> 27K indexes
> 140 billion SQLs/day
> 5 PB provisioned
Hundreds of nodes
Persistent & in-memory
> 40 billion SQLs/day
10+ clusters, 100+ nodes
> 250 TB provisioned
(local HDD + shared SSD)
> 9 billion writes/day
> 5 billion reads/day
Hundreds of nodes
> 50 TB
> 2 billion ops/day
Thousands of nodes
The world's largest cluster with 2K+ nodes
Dozens of nodes
How do we scale RDBMS?
• Shard
– Patterns: Modulus, lookup-based, range, etc.
– Application sees only logical shard/database
• Replicate
– Disaster recovery, read availability & read scalability
• Big NOs
– No transactions
– No joins
– No referential integrity constraints
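The three sharding patterns named above can be sketched roughly as follows. This is a minimal illustration, not eBay's routing layer: the shard count, range boundaries, directory entries, and function names are all hypothetical.

```python
import bisect
import zlib

NUM_SHARDS = 4  # hypothetical number of logical shards


def modulus_shard(user_id: int, num_shards: int = NUM_SHARDS) -> int:
    """Modulus pattern: logical shard = numeric key mod shard count."""
    return user_id % num_shards


def hashed_modulus_shard(key: str, num_shards: int = NUM_SHARDS) -> int:
    """Modulus over a stable hash (CRC32), for non-numeric keys."""
    return zlib.crc32(key.encode("utf-8")) % num_shards


# Range pattern: shard i holds keys below RANGE_UPPER_BOUNDS[i];
# keys at or above the last bound go to one final shard (hypothetical bounds).
RANGE_UPPER_BOUNDS = [1_000_000, 2_000_000, 3_000_000]


def range_shard(user_id: int) -> int:
    """Find the first range whose upper bound is greater than the key."""
    return bisect.bisect_right(RANGE_UPPER_BOUNDS, user_id)


# Lookup pattern: an explicit directory maps each key to its shard
# (hypothetical entries).
LOOKUP_DIRECTORY = {"seller:alice": 2, "seller:bob": 0}


def lookup_shard(key: str) -> int:
    """Resolve the shard from the directory; unknown keys raise KeyError."""
    return LOOKUP_DIRECTORY[key]
```

In each case the application only ever sees the returned logical shard number; mapping that number to a physical host is a separate concern.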
Why Cassandra?
• Multi-datacenter (active-active)
• Always Available - No SPOF
• Easy to scale up & down
• Write performance
• Distributed counters
• Hadoop support
Not replacing RDBMS, but complementing!
• Some use cases don’t fit well in RDBMS - sparse data, big data, flexible schema, real-time analytics, …
• Many use cases don’t need top-tier set-ups.
Cassandra Growth
[Chart: writes, async. reads, and sync. site reads in billions per day, and storage capacity in terabytes, at Aug 2011, Aug 2012, and May 2013. Annotation: "Doesn’t predict business".]
eBay Use Cases on Cassandra
• Time-series data, real-time insights & immediate actions
– Fraud detection & prevention
– Quality Click Pricing for affiliates
– Order & shipment tracking and insights
– Mobile notification logging & tracking
– Cloud CMS change history storage
– RedLaser server logs and analytics
• Server metrics collection for monitoring & alerting
• Taste graph based next-gen recommendation system
• Personalization Data Service
• Social Signals on eBay Product & Item pages
• Milo’s store-item availability inventory (evaluation phase)
Real-time insights & actions for
Fraud Prevention
Reporting
Quality Click Pricing
More…
System Overview
Business Event Stream
Checkout, Shipping, Refund & Recoup, …
Order placed (bin/bid), Paid, Shipped, Refunded
Raw data
Simple in-memory aggregations +/
Complex Event Processing +/
Cassandra’s distributed counters
Labels printed per day per user
User segmentation for affiliate pricing
Orders per hour, …
Multiple Cassandra clusters
Payment
Act in real-time
Fraud Prevention
Affiliate Pricing Engine
(eBay Partner Network)
Order tracking
Real-time reporting
…
(Kept from several months to years)
A glimpse on Data Model
Historic & real-time insights per user per carrier.
Sudden & drastic change might be suspicious.
User bucketing based on historic & real-time buying activity.
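As a rough illustration of the "per user per carrier" insight and the "sudden & drastic change" check, the sketch below keeps per-day counters in memory and compares today's count with the historic average. It is only a stand-in for the counters the slides describe: the class name, the in-memory dictionaries, and the `factor` threshold are assumptions, not details from the talk.

```python
from collections import defaultdict


class LabelCounters:
    """In-memory stand-in for per-user, per-carrier, per-day counters."""

    def __init__(self):
        # (user, carrier) -> {day -> labels printed that day}
        self._counts = defaultdict(lambda: defaultdict(int))

    def increment(self, user, carrier, day, amount=1):
        """Record `amount` labels printed by `user` with `carrier` on `day`."""
        self._counts[(user, carrier)][day] += amount

    def is_suspicious(self, user, carrier, today, factor=5.0):
        """Flag today if it exceeds `factor` times the mean of the other days.

        `factor` is a hypothetical threshold, not a value from the talk.
        """
        days = self._counts[(user, carrier)]
        history = [count for day, count in days.items() if day != today]
        if not history:
            return False  # no baseline yet, so nothing to compare against
        baseline = sum(history) / len(history)
        return days.get(today, 0) > factor * baseline
```

A user who normally prints two or three labels a day and suddenly prints forty would be flagged; a user with no history would not.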
A glimpse on Data Model
Fraud Detection & Prevention
Shop with Confidence
System Overview
Cassandra
Fraud Detection & Prevention System
Sign-in info
Business events
(checkout, sell,…)
StaaS
Oracle
Checkout, Shipping, …, Payment, Selling
Real-time Beacons data
Real-time Insights
Other data
Machine Learned Models
A glimpse on Data Model
Collected at sign-in & stored as key-value.
Pulled periodically to StaaS for training machine learned models.
Metrics collection for monitoring & alerting
System Overview
Transport (HTTP, …)
Scalable NIO servers based on Netty
Thousands of production machines
Cassandra
Stats for CPU, Memory, Disk, …
agent agent agent agent …
Server Server Server Server Server
In-memory grid (Hazelcast) for rollups
A glimpse on Data Model
Granular data points
Rolled up metrics for various time intervals
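Rolling granular data points up into coarser time intervals could look something like the sketch below. It is a plain in-process illustration, not the Hazelcast-based rollup from the slides; the function name, the `(timestamp, value)` input shape, and the choice of aggregates are assumptions.

```python
from collections import defaultdict


def roll_up(points, interval_seconds):
    """Aggregate (timestamp, value) points into fixed-width time buckets.

    Returns {bucket_start: {"count", "min", "max", "avg"}}, where
    bucket_start is the timestamp rounded down to the interval.
    """
    buckets = defaultdict(list)
    for timestamp, value in points:
        bucket_start = timestamp - (timestamp % interval_seconds)
        buckets[bucket_start].append(value)
    return {
        start: {
            "count": len(values),
            "min": min(values),
            "max": max(values),
            "avg": sum(values) / len(values),
        }
        for start, values in sorted(buckets.items())
    }
```

Calling `roll_up` on the same points with several interval widths (for example 60, 3600, and 86400 seconds) yields one rolled-up series per interval, each of which can be stored separately from the granular points.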
Taste graph based recommendation system
Data Model
TasteGraph
TasteVector
50 billion+ edges, 600 million+ writes, 3 billion+ reads, 30TB+ of data on SSD
System Overview
Business Event Stream
Recommendation system
Taste Graph
Taste Vector
1. Item purchased.
2a. Write purchase edge.
2b. Read other edges for this user & item.
4. Req. recommendations.
5. Finds other items close to user’s coordinates.
6. Reco. shown to user
More, http://www.slideshare.net/planetcassandra/e-bay-nyc
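Step 5 ("finds other items close to user’s coordinates") is essentially a nearest-neighbour lookup. The sketch below shows one simple way to do that over in-memory vectors. The slides do not say which distance metric or search method the real system uses, so Euclidean distance, the brute-force scan, and all names here are assumptions.

```python
import heapq
import math


def nearest_items(user_vector, item_vectors, k=3, exclude=()):
    """Return ids of the k items whose vectors are closest to the user.

    Distance is Euclidean (an assumption); `exclude` holds item ids to
    skip, such as items the user has already purchased.
    """
    candidates = (
        (math.dist(user_vector, vector), item_id)
        for item_id, vector in item_vectors.items()
        if item_id not in exclude
    )
    return [item_id for _, item_id in heapq.nsmallest(k, candidates)]
```

A brute-force scan like this only works for small candidate sets; at the scale quoted on the Data Model slide an index or precomputed neighbourhoods would be needed.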
Real-time Personalization Data Service
User performs search using keyword
User gets personalized pages based on implicit/explicit profile
System Overview
Personalization Data Service
CacheMesh (write-back cache)
Heavy writes
eBay site pages (personalized)
Every few mins
in-memory
MySQL & XMP DB
Cassandra
Oracle
(scaled out)
Heavy reads
Cache miss
user profiles
Application SOA services (multiple)
Data Warehouse
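The write-back behaviour in this overview (heavy writes absorbed in memory, flushed every few minutes, with cache misses falling through to the store) can be sketched as below. This is a toy model, not CacheMesh: the class, its methods, and the plain dict standing in for the backing database are all hypothetical.

```python
class WriteBackCache:
    """Toy write-back cache: writes land in memory and are flushed in batches."""

    def __init__(self, store):
        self._store = store   # dict standing in for the backing database
        self._cache = {}      # in-memory copies of user profiles
        self._dirty = set()   # keys changed since the last flush

    def write(self, key, value):
        """Update the in-memory copy only; the store is updated on flush."""
        self._cache[key] = value
        self._dirty.add(key)

    def read(self, key):
        """Serve from memory; on a cache miss, load from the store."""
        if key not in self._cache:
            self._cache[key] = self._store[key]
        return self._cache[key]

    def flush(self):
        """Persist dirty entries and return how many were written.

        A scheduler would call this periodically (every few minutes).
        """
        flushed = len(self._dirty)
        for key in self._dirty:
            self._store[key] = self._cache[key]
        self._dirty.clear()
        return flushed
```

The trade-off is the usual one for write-back caching: the store sees far fewer writes, but anything written since the last flush is lost if the cache node fails before flushing.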
Data Model
• Keep column names short.
• Don’t overload one CF with all the data:
- Split hot & cold data into separate CFs.
- Splitting & sharding can help compaction.
Static column families
Social Signals
Served by Cassandra
Manage signals via “Your Favorites”
Whole page is served by Cassandra
More, http://www.slideshare.net/jaykumarpatel/cassandra-at-ebay-13920376
Multi-Datacenter Deployment
Topology - NTS (NetworkTopologyStrategy)
RF - 1:1 or 2:2 or 3:3
Read CL - ONE/QUORUM
Write CL - ONE
Data is backed up periodically to protect against human or software error
User request has no datacenter affinity
Non-sticky load balancing
Multi-Datacenter Deployment
Topology - NTS
RF – 1:1:1 or 2:2:2
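To see what the RF and CL settings on these two slides imply, the arithmetic can be worked through in a few lines. The sketch below uses the standard rule that QUORUM is a majority of all replicas (summed across datacenters) and that a read is guaranteed to see the latest write only when read replicas plus write replicas exceed the total. The function names are illustrative and only the ONE, QUORUM, and ALL levels are modelled.

```python
def quorum(total_replicas: int) -> int:
    """QUORUM: a majority of all replicas across datacenters."""
    return total_replicas // 2 + 1


def replicas_for(level: str, total_replicas: int) -> int:
    """Number of replicas that must respond at a given consistency level."""
    return {
        "ONE": 1,
        "QUORUM": quorum(total_replicas),
        "ALL": total_replicas,
    }[level]


def overlaps(read_cl: str, write_cl: str, rf_per_dc) -> bool:
    """True when every read set must intersect every write set (R + W > N)."""
    total = sum(rf_per_dc)
    return replicas_for(read_cl, total) + replicas_for(write_cl, total) > total
```

For example, with RF 3:3 (six replicas) QUORUM is four, so writing at ONE and reading at QUORUM gives 1 + 4 = 5, which does not exceed 6. Under this rule the write-at-ONE settings on the slide favour latency and availability over guaranteed read-your-writes, which fits the later remark that strong consistency is not free.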
Lessons & Best Practices
• One size does not fit all
– Use Cassandra for the right use cases.
• Choose proper Replication Factor and Consistency Level
– They alter latency, availability, durability, consistency and cost.
– Cassandra supports tunable consistency, but remember strong consistency is not free.
• Many ways to model data in Cassandra
– The best way depends on your use case and query patterns.
• De-normalize and duplicate for read performance
– But don’t de-normalize if you don’t need to.
http://www.slideshare.net/jaykumarpatel/cassandra-data-modeling-best-practices
Are you excited? Come Join Us!
Thank You
@pateljay3001
#cassandra13

Cassandra at eBay - Cassandra Summit 2013
