Feng Qu, Sr MTS
Bass Chorng, Principal Capacity Engineer
DB Capacity Planning at eBay
#CassandraSummit2015	
Who Am I?
Bass Chorng – Principal Capacity Engineer @ eBay
Specializes in database performance, availability & scalability for a large website.
Established DB capacity team at eBay in 2003.
Loves mountain biking.
eBay Site DB Traffic At A Glance
NoSQL Total – 52 B/Day
   Cassandra – 15 B
   Mongo – 15 B
   CouchBase – 12 B
   PushVM – 10 B
RDBMS Total – 350 B/Day
   MySQL – 10 B
   Oracle – 340 B
Peak Traffic – 8M/sec
Site Total DB Calls – 400B/Day across 2,000 NoSQL Nodes + 450 Oracle Nodes
Hosting 800M Active items & 120M Active Users
Y-o-Y Growth – 30% ~ 35%
[Chart: Billion SQL Calls per Day. Cassandra 15, Mongo 15, CouchBase 12, PushVM 10, MySQL 10, Oracle 340]
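The per-database figures on this slide add up to the stated totals (52 B NoSQL + 350 B RDBMS = 402 B, rounded to 400 B/day). A minimal Python sketch using only the slide's numbers makes the arithmetic explicit and derives the ratio of peak to average call rate, which any headroom has to cover. The dictionary and function names are illustrative, not from the talk.

```python
# Daily DB call volumes from the slide, in billions of calls per day.
DAILY_CALLS_B = {
    "Cassandra": 15, "Mongo": 15, "CouchBase": 12, "PushVM": 10,  # NoSQL
    "MySQL": 10, "Oracle": 340,                                     # RDBMS
}
NOSQL = ("Cassandra", "Mongo", "CouchBase", "PushVM")
RDBMS = ("MySQL", "Oracle")
PEAK_CALLS_PER_SEC = 8_000_000      # "Peak Traffic – 8M/sec"
SECONDS_PER_DAY = 24 * 60 * 60


def traffic_summary():
    """Recompute the slide's totals and the peak-to-average call-rate ratio."""
    nosql_b = sum(DAILY_CALLS_B[db] for db in NOSQL)
    rdbms_b = sum(DAILY_CALLS_B[db] for db in RDBMS)
    total_b = nosql_b + rdbms_b
    avg_per_sec = total_b * 1e9 / SECONDS_PER_DAY   # billions/day -> calls/sec
    return {
        "nosql_b": nosql_b,
        "rdbms_b": rdbms_b,
        "total_b": total_b,
        "avg_calls_per_sec": avg_per_sec,
        "peak_to_avg": PEAK_CALLS_PER_SEC / avg_per_sec,
    }


if __name__ == "__main__":
    for key, value in traffic_summary().items():
        print(f"{key}: {value:,.2f}")
```

Run as a script, it reports an average of about 4.65M calls/sec, so the 8M/sec peak is roughly 1.7 times the daily average.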
Capacity Planning - Simply Put
Ø  Analyze Traffic
o  Data
Ø  Analyze Utilization
o  Data
Ø  Analyze The Relationship Of The Above Two
o  Same Data
Ø  Forecast Growth
o  Simple Models, Then Impress Your Boss.
Ø  Convert Resource Need into $
o  A Calculator, Then Impress Your CIO
BTW, You Also Need To Know …
•  Platform Domain Knowledge – Server, DB Engine, IO Subsystem, Networks …
•  Relationship Between System Overhead & Utilization
•  Seasonality & Workload Characteristics
•  Bottlenecks – Components, Systems, Platforms, Architecture, Site & Apps
•  New Technologies
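The "Convert Resource Need into $" step is simple arithmetic once a resource need is known. The sketch below is a hypothetical calculator: the function names are illustrative, and the 60% utilization ceiling and the $10,000 server price are made-up inputs, not eBay figures.

```python
import math


def servers_needed(required_cores: float, cores_per_server: int,
                   target_utilization: float) -> int:
    """Servers needed so the required cores fit under a target CPU utilization."""
    if not 0 < target_utilization <= 1:
        raise ValueError("target_utilization must be in (0, 1]")
    usable_cores = cores_per_server * target_utilization
    return math.ceil(required_cores / usable_cores)   # partial servers round up


def hardware_cost(required_cores: float, cores_per_server: int,
                  target_utilization: float, cost_per_server: float):
    """Return (server count, total cost) for a resource need."""
    count = servers_needed(required_cores, cores_per_server, target_utilization)
    return count, count * cost_per_server


if __name__ == "__main__":
    # Hypothetical inputs: 1,000 busy cores at peak, 20-core servers,
    # run at most 60% busy, $10,000 per server.
    count, cost = hardware_cost(1_000, 20, 0.6, 10_000)
    print(f"{count} servers, ${cost:,.0f}")
```

This prints `84 servers, $840,000`; rounding up matters because 83.3 servers means buying 84.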
Domain Knowledge Stack (aka Whom To Blame Stack)
[Figure: layered stack APPS / DB / UNIX / STORAGE, with STORAGE at the bottom of the food chain; CAPACITY spans the full stack]
Data
Ø What To Collect?
Apps, Database, Sessions, CPU, Memory, Connections, IOPS,
IO Time, NIC, HBA, Array
Ø How To Collect?
Time Resolution, Aggregation Level, Retention
Ø How To Use It?
Average, Max, 95th percentile, Dashboard, Reporting, Trending
[Charts: two daily time series, one from 5/1/2015 to 5/27/2015 (y-axis 0.0 to 4.0) and one from 1/26/2015 to 3/1/2015 (y-axis 0 to 40,000,000)]
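Average, max and 95th percentile answer different questions about the same samples. A minimal sketch, assuming a plain list of utilization samples and the nearest-rank definition of percentile (monitoring tools may interpolate differently); the function name and sample values are hypothetical:

```python
import statistics


def summarize(samples):
    """Average, max and 95th percentile (nearest-rank method) of samples."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    n = len(ordered)
    rank = (95 * n + 99) // 100      # integer ceil(0.95 * n), 1-based rank
    return {
        "avg": statistics.fmean(ordered),
        "max": ordered[-1],
        "p95": ordered[rank - 1],
    }


if __name__ == "__main__":
    # Hypothetical CPU % samples: 19 readings at 30% and one spike at 95%.
    cpu = [30.0] * 19 + [95.0]
    print(summarize(cpu))
```

This prints `{'avg': 33.25, 'max': 95.0, 'p95': 30.0}`: one spike sets the max, nudges the average, and does not appear in P95 at all.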
Forecast
Ø Model Traffic, Not Resources
Ø Need One Year Trend
Ø Forecast At Daily Level
Ø Eliminate Outliers
Ø No Data Is Better Than Wrong Data
Ø Convert Traffic To Resource Usage
Ø Linear Extrapolation Only (CPU Utilization, not IO Time)
Ø Simple Excel Formula Works Well
Ø For Long Term Resource Planning Only
Ø Use Average, Not Max
Ø Not All Workloads Are Predictable
[Chart: CATY Traffic Forecast. Billion calls (0 to 70), 01/01/2012 to 01/01/2015; series: Forecast, Actual, Capacity]
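Several forecast rules on this slide (model traffic rather than resources, forecast at daily level, eliminate outliers, linear extrapolation, convert traffic to resource usage) fit in a few lines. This is a sketch under stated assumptions, not eBay's model: it fits an ordinary least-squares line (the same fit Excel's TREND function uses), treats any day more than 25% off the first-pass trend as an outlier (a hypothetical threshold), refits, and scales CPU linearly with traffic. Function names and data are illustrative; in practice the input would be about a year of daily averages.

```python
import statistics


def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept (the same fit as Excel's TREND)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def forecast_daily(traffic, horizon_days, tolerance=0.25):
    """Forecast daily traffic `horizon_days` after the last observed day.

    Pass 1 fits a trend to every day; days more than `tolerance` (relative)
    away from that trend are treated as outliers and dropped.
    Pass 2 refits on the remaining days and extrapolates linearly.
    """
    days = list(range(len(traffic)))
    slope, intercept = fit_line(days, traffic)
    kept_x, kept_y = [], []
    for day, calls in zip(days, traffic):
        trend = intercept + slope * day
        if abs(calls - trend) <= tolerance * abs(trend):
            kept_x.append(day)
            kept_y.append(calls)
    if len(kept_x) < 2:
        raise ValueError("too few days left after removing outliers")
    slope, intercept = fit_line(kept_x, kept_y)
    return intercept + slope * (len(traffic) - 1 + horizon_days)


def traffic_to_cpu(forecast_calls, observed_calls, observed_cpu_pct):
    """Scale CPU linearly with traffic (per the slide: CPU, not IO time)."""
    return observed_cpu_pct * forecast_calls / observed_calls


if __name__ == "__main__":
    # Hypothetical 60 days of traffic growing 10 calls/day, with one bad day.
    traffic = [1000 + 10 * d for d in range(60)]
    traffic[20] += 3000            # e.g. a load test or a data-collection error
    calls = forecast_daily(traffic, horizon_days=30)
    print(round(calls))                                        # day 89
    print(round(traffic_to_cpu(calls, traffic[-1], 40.0), 1))  # CPU % if 40% today
```

With this hypothetical series the bad day is dropped and the refit recovers the underlying trend, so it prints 1890 and 47.5. Scaling linearly is only reasonable for CPU; IO time grows non-linearly as devices approach saturation, which is why the slide excludes it.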
Things To Watch For
Myths
Ø More CPU Makes Apps Run Faster
Ø More Data Makes Apps Run Slower
Ø Apps Run Twice As Fast On CPU Twice The Speed
Ø High Session = High Load
Pitfalls
Ø Cause VS. Symptom
Ø Time Resolution Masks Issues
Ø Look At The Whole Picture
Ø Slow Down In Order To Go Faster < Throttle >
Challenges
Ø Data Quality – Data Missing, Data Source Changes, F/O Data Residency, Data Errors …
Ø Varieties of Data Formats & Resolutions
Ø Data Collection In Secured Zones
Me: Everything NoSQL
CassandraSummit2015 | #CassandraSummit
Ø Prior to 2011: Worked on Oracle at DoubleClick/Yahoo/Intuit
Ø Worked on NoSQL at eBay Database Infrastructure team:
Ø Cassandra since 2011
Ø MongoDB since 2012
Ø Couchbase since 2014
Ø Cassandra Summit speaker for 2013, 2014, 2015
Ø DataStax Cassandra MVP for 2014, 2015
For Cassandra
Ø Capacity Measurements
Ø Throughput
Ø Latency
Ø E.g. 30,000 reads/sec with SLA of P99 at 5ms
Ø Hardware SKU Example
Ø CPU: 20 cores
Ø Memory: 128GB RAM
Ø Storage: 1.5TB local SSD
Ø Network: 10g NIC
Benchmarking
Ø Benchmarking for different hardware
Ø High I/O SKU
Ø High memory SKU
Ø High storage SKU
Ø Bare metal or cloud
Ø Benchmarking for different software releases
Ø Benchmarking for different workloads
Ø  100% Writes
Ø  50% Writes, 50% Reads
Ø  5% Writes, 95% Reads
Ø  100% Reads
Ø Benchmarking Tools
Ø YCSB
Ø Cassandra-stress
Ø Proactive and repeated process using near real-time traffic in prod like environment
Capacity Planning
Ø Key to avoid surprise in production
Ø The concept behind capacity planning is simple, but the mechanics are harder.
Ø Business requirements may increase, need to forecast how much resource must be
added to the system to ensure that user experience continues uninterrupted
Ø  Input: clearly defined capacity goal coming from business requirement and performance baseline
from benchmark test
Ø  Output: Identify resources to be added, such as memory, CPU, storage, I/O, network
Ø Always prepare for peak + headroom
Capacity Planning Process
Ø Initial Sizing
Ø Storage size vs. data size
Ø Compaction overhead, compression ratio, RF, indexes
Ø Cost-effective configuration to meet capacpity/latency SLA
Ø Routine Review
Ø System utilization on I/O, storage, network, CPU, memory etc
Ø Cassandra metrics on GC, compaction, latency, throughput etc
Ø Compactionstats, cfhistoralgrams, tpstats etc
Ø Forecasting
Ø Historical comparison
Ø Traffic projection
Ø Flex up or Flex down
Scale Up vs. Scale Out
Ø Scale Up(vertical)
Ø  Pros
Ø Smaller data center footprint, such as space, power, cooling
Ø Less license cost
Ø  Cons
Ø Likely cost more using proprietary hardware
Ø Less fault tolerant
Ø Limited upgradability in future
Ø Scale Out(horizontal)
Ø  Pros
Ø Cheaper using commodity hardware
Ø More fault tolerant
Ø (unlimited) upgradability
Ø  Cons
Ø Bigger data center footprint
Ø More license cost
Ø Likely need more network equipment
Questions?
eBay is hiring experienced NoSQL professionals; please send your resume to fengqu@ebay.com