
Cassandra Summit 2014: Apache Cassandra Best Practices at eBay


Presenter: Feng Qu, Principal DBA at eBay

Cassandra has been adopted widely at eBay in recent years and is used by many end-user-facing applications. I will introduce the best practices we have built over time around system design, capacity planning, deployment automation, monitoring integration, performance analysis and troubleshooting. I will also share our experience working with DataStax support to provide a highly available, highly scalable data store that fits into eBay infrastructure.

Published in: Technology

  1. 1. Cassandra Best Practices at eBay Inc. Feng Qu, Principal Database Engineer, eBay Inc. September 11, 2014 CassandraSummit2014 | #CassandraSummit
  2. 2. Agenda • eBay Inc. Cassandra footprint • NoSQL life cycle • Cassandra best practices • Q&A
  3. 3. eBay Inc.
  4. 4. eBay Inc. Database Platforms • We manage thousands of databases powering eBay and PayPal
  5. 5. Why NoSQL? • Challenges of traditional RDBMS • Performance penalty to maintain ACID features • Lack of native sharding and replication features • Lack of linear scalability • Cost of software/hardware • Higher cost of commit • NoSQL used at eBay Inc. • Cassandra, Couchbase, MongoDB managed by DBAs • HBase, Redis, OpenTSDB managed by developers
  6. 6. Cassandra @ eBay Inc. • Started in 2011 at eBay and later expanded to PayPal • Started with Apache Cassandra 0.8, now using Apache Cassandra 2.0 and DataStax Enterprise 4.0 • Over a dozen production clusters on hundreds of servers across 3 data centers • Choice between a dedicated cluster for large/critical use cases and a multi-tenant cluster for small use cases • Over 20 billion daily reads/writes to Cassandra • Cluster size varies from 4 nodes to 80 nodes • 100TB+ of user data on HDD, local SSD and SSD array • One cluster is estimated to grow beyond a few PB
  7. 7. NoSQL Life Cycle: Use Case Analysis, Data Modeling, Capacity Planning, Deployment, Operation
  8. 8. Data Modeling Phase • The development team requests a review meeting with a data architect for a new use case • Once the data architect understands the requirements, they recommend a suitable data store, which could be either an RDBMS or one of the NoSQL products we support • Both parties work on data modeling together • The output of the engagement is a set of tickets, for tracking purposes, that capture project information and the data configuration for the chosen data store.
  9. 9. Data Modeling Best Practices • Data modeling for Cassandra is quite different from data modeling for a traditional RDBMS. • Model around query patterns, not entities • Denormalize to improve read performance • Separate read-heavy data from write-heavy data • Store values in column names, as names are already physically sorted • Former eBay architect Jay Patel published a few technical blog posts on Cassandra data modeling.
  10. 10. Data Modeling Best Practices: Indexing • Secondary index + Less overhead, as it is built in + Data and index are changed atomically - Does not scale well with high-cardinality data • Column family as index + No hot spot - Index is maintained manually by the application - Index changes are not atomic • Avoid secondary indexes and use a column family as an index if possible
  11. 11. Benchmark Testing • Benchmark testing is key to capacity planning • Performance baseline with near-real traffic in a production-size environment • for different types of hardware • for different software releases • for different use cases or workloads • A proactive and repetitive process
  12. 12. Capacity Planning Phase • Capacity planning is key to avoiding surprises in production • The concept behind capacity planning is simple, but the mechanics are harder. • Business requirements may grow, so we need to forecast how much resource must be added to the system to ensure that the user experience continues uninterrupted • Input: a clearly defined capacity goal derived from business requirements, and a performance baseline from benchmark tests • Output: the resources to be added, such as memory, CPU, storage, I/O, network • Always prepare for peak + headroom
  13. 13. Deployment Best Practices • Software packages with customized optimization • kernel, JVM heap, compaction • Deployment automation for efficiency • Multi-data-center deployment for load balancing and disaster recovery • Vnodes are a must for manageability • SSD as default storage requires additional OS-level tuning
  14. 14. Operation Best Practices • Collect system and database metrics • Monitoring and alerting • event-driven and metrics-driven alerts • Operation runbook • Reduce human error • Performance tuning runbook • nodetool tpstats for dropped requests • nodetool cfhistograms for latency distribution • Troubleshooting runbook • Document previous incidents for future reference
  15. 15. Operation Best Practices • Routine repair is not really needed if there are no deletes. You still need to run repair after bringing a down node back up if it was down for a while • Use a CNAME in client configuration to avoid client configuration changes when hardware is replaced with a new IP/name • Reduce gc_grace to reduce overall data size • Disable row cache, unless you have <100K rows • Collect statistics, real-time or historical, to monitor overall system performance • Disable swap to avoid a slow node
  16. 16. Capacity Review • Routine capacity review and adjustment • When to scale up and when to scale out • In general, with NoSQL, scale out by adding nodes to increase capacity • Sometimes it is cost-efficient to scale up at the component level by identifying the scaling bottleneck and resolving it accordingly • Network bandwidth: upgrade to a 10 Gbps network • I/O latency: upgrade to (better) SSD • Storage: add/expand data volumes
  17. 17. Typical Use Cases • Write-intensive: metrics collection, logging • Collecting metrics from tens of thousands of devices periodically • Read-intensive: home page feeds • Recommendation backend to generate a dynamic taste graph • Mixed workload: personalization, classification • Data is loaded from the data warehouse periodically in bulk and from user events consistently • Data is retrieved in real time when a user visits the eBay site
  18. 18. Metrics Collection Application
  19. 19. The End • We are hiring NoSQL talent. • Contact: • Q&A
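
Slide 9 recommends storing values in column names because names are already physically sorted. The following is a minimal Python sketch of that idea, not Cassandra code and not eBay's implementation: the `WideRow` class, the `(timestamp, item_id)` column-name layout and the sample values are all hypothetical. It only illustrates why a range query over sorted column names needs no separate sort step.

```python
import bisect


class WideRow:
    """Toy stand-in for one wide row: column names are kept sorted.

    Hypothetical illustration only; a real cluster keeps this order on disk.
    """

    def __init__(self):
        self._names = []   # column names, always in sorted order
        self._values = {}  # column name -> column value (may be empty)

    def insert(self, name, value=b""):
        # The information lives in the column name; the value can stay empty.
        if name not in self._values:
            bisect.insort(self._names, name)
        self._values[name] = value

    def slice(self, start, end):
        """Return the column names in [start, end), already in sorted order."""
        lo = bisect.bisect_left(self._names, start)
        hi = bisect.bisect_left(self._names, end)
        return self._names[lo:hi]


# Assumed layout: one row per user, column name = (event timestamp, item id).
row = WideRow()
for ts, item in [(1410400000, "item-7"), (1410300000, "item-3"), (1410500000, "item-9")]:
    row.insert((ts, item))

# "Events since a given time" becomes a contiguous slice of the sorted names.
recent = row.slice((1410350000, ""), (1410600000, ""))
```

The design choice here is the one the slide names: the query ("events in a time range") dictates the column-name layout, so the read is a single contiguous slice.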
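
Slide 10 contrasts built-in secondary indexes with using a column family as an index that the application maintains itself. The sketch below models that trade-off in plain Python, with dictionaries standing in for two column families. The names `users`, `users_by_city`, `save_user` and `find_by_city` are assumptions made for illustration and do not come from the talk.

```python
from collections import defaultdict

# Two dictionaries stand in for two column families (hypothetical names).
users = {}                        # data column family: user_id -> record
users_by_city = defaultdict(set)  # index column family: city -> user_ids


def save_user(user_id, name, city):
    """Write the record, then maintain the index by hand.

    The two writes are separate steps, which mirrors the slide's caveat
    that the index change is not atomic with the data change.
    """
    old = users.get(user_id)
    users[user_id] = {"name": name, "city": city}   # write 1: data
    if old is not None and old["city"] != city:
        users_by_city[old["city"]].discard(user_id)  # remove stale index entry
    users_by_city[city].add(user_id)                 # write 2: index


def find_by_city(city):
    """Look up through the index column family instead of scanning all users."""
    return [users[uid] for uid in sorted(users_by_city.get(city, ()))]


save_user(1, "ann", "San Jose")
save_user(2, "bob", "Austin")
save_user(1, "ann", "Austin")  # moving a user requires updating the index too
```

If the process failed between write 1 and write 2, the index would be stale, which is the cost the slide lists against this approach.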
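
Slide 12 describes capacity planning as combining a capacity goal with a benchmark baseline and always preparing for peak plus headroom. One way to sketch that arithmetic is shown below. The function name `nodes_needed`, the 30% default headroom, the minimum node count and all sample numbers are assumptions for illustration, not figures from the talk.

```python
import math


def nodes_needed(peak_ops_per_sec, per_node_ops_per_sec, headroom=0.3, min_nodes=3):
    """Estimate the node count for a throughput goal.

    peak_ops_per_sec:     capacity goal from business requirements (assumed input)
    per_node_ops_per_sec: per-node baseline from benchmark testing (assumed input)
    headroom:             extra fraction on top of peak (hypothetical default)
    min_nodes:            floor on cluster size (hypothetical default)
    """
    target = peak_ops_per_sec * (1 + headroom)
    return max(min_nodes, math.ceil(target / per_node_ops_per_sec))


# Hypothetical example: 200,000 ops/s at peak, 15,000 ops/s per benchmarked node.
estimate = nodes_needed(200_000, 15_000, headroom=0.3)
```

The same shape of calculation applies to storage or network bandwidth: swap in the relevant per-node baseline and take the largest node count across the resources.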