eBay: From here to there: our journey to 1000s of nodes – Couchbase Connect 2016

eBay has one of the largest Couchbase deployments in the world. Over the last few years, we have deployed thousands of nodes holding multiple petabytes of data across multiple data centers. The platform powers many critical applications that are accessed by hundreds of millions of eBay users every day. In this talk, I will discuss our Couchbase journey, explain how we manage thousands of nodes with limited resources by automating admin tasks, and share what we have learned about building a resilient NoSQL platform for mission-critical applications.

Published in: Software

  1. Feng Qu, Sr MTS, eBay Database Infrastructure. From here to there: Our journey to 1000s of nodes. Couchbase Connect 2016
  2. Feng Qu - Sr MTS in eBay DBA Team
     • Have worked on Oracle since the early 1990s
     • Have worked on Cassandra, MongoDB and Couchbase since 2011
     • Led company-wide NoSQL projects
     • 2014 and 2015 DataStax Cassandra MVP
     • Speaker at 2013, 2014 and 2015 Cassandra Summit
     • Speaker at EDW 2016
     • Speaker at NoCOUG 2016
  3. eBay At A Glance
     • Active Listings: 1.1B
     • Active Users: 164M
     • Total DB Calls: 610B/day
     • Y-o-Y Growth: 30%-35%
     • Total DB Servers: 4000+
     • Peak DB Calls: 15M/sec
     • RDBMS Calls: 500B/day
     • NoSQL Calls: 110B/day
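The figures on slide 3 can be cross-checked with a little arithmetic. The sketch below uses only the numbers quoted on the slide; the variable and function names are illustrative, and the averages assume traffic spread evenly over a 24-hour day, which real traffic is not.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def avg_per_second(calls_per_day: float) -> float:
    """Convert a daily call volume into an average per-second rate."""
    return calls_per_day / SECONDS_PER_DAY


# Figures quoted on slide 3.
rdbms_calls_per_day = 500e9
nosql_calls_per_day = 110e9
peak_calls_per_second = 15e6

# RDBMS + NoSQL should add up to the quoted total of 610B/day.
total_calls_per_day = rdbms_calls_per_day + nosql_calls_per_day

average_calls_per_second = avg_per_second(total_calls_per_day)
peak_to_average = peak_calls_per_second / average_calls_per_second
nosql_share = nosql_calls_per_day / total_calls_per_day

print(f"average DB calls/sec: {average_calls_per_second:,.0f}")
print(f"peak-to-average ratio: {peak_to_average:.2f}")
print(f"NoSQL share of all DB calls: {nosql_share:.1%}")
```

Under these assumptions the quoted peak of 15M/sec is roughly twice the daily average, and NoSQL accounts for a little under a fifth of all database calls.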
  4. Challenges of Traditional RDBMS
     • Performance penalty to maintain ACID features
     • Lack of native sharding and replication features
     • Cost of software/hardware
     • Higher cost of commit
  5. Different Databases Serve Different Purposes
  6. NoSQL Databases Pros and Cons
     Pros:
     • Geo-distributed replication & sharding
     • Location-aware, low-latency query performance
     • Workload & access pattern optimized
     • Linear scalability with reduced disruption to business
     • Supports semi-structured or unstructured data
     • Flexible schema provides significant increase in Dev agility
     Cons:
     • Lack strict ACID-compliant transactions
     • Lack strong data model control & governance
     • Not suitable for ad-hoc workloads & random access patterns
     • Requires change of mindset, ecosystem and infrastructure
     • Rapidly changing technology & competitive landscape
     • Requires Dev expertise in nuances of distributed systems
  7. MongoDB Pros and Cons
     Pros:
     • Dev-friendly rich JSON document model
     • Secondary index enables mixed access patterns
     • High business value (semi-) structured data
     • Balanced scale-out reads & writes (with optional sharding)
     • Straightforward admin effort
     Cons:
     • Short write interruption during primary re-election
     • Not suitable for nanosecond latency writing
     • Potentially high TCO for large scale sharded cluster
     • Lack resource isolation
  8. Cassandra Pros and Cons
     Pros:
     • Peer-to-peer without SPOF (Single Point of Failure)
     • Active-active cross Datacenter
     • High read & very high write performance
     • Absolute linear scalability
     Cons:
     • Inefficient secondary index (pre-V3)
     • Not suitable for mixed user query & access patterns
     • High compaction overhead for frequent random deletes
     • Requires JVM tuning to mitigate GC pauses
     • Lack resource isolation
     • Slow cluster rebalancing
  9. Couchbase Pros and Cons
     Pros:
     • Memcached-compatible persistent document store
     • Peer-to-peer architecture
     • High read & write performance
     • Active-active cluster replication
     • Strong local cluster RW consistency
     • Resource isolation
     Cons:
     • Short write interruption during node failover
     • Counterintuitive cross DC write conflict resolution (pre V4.6)
     • Slow cluster rebalancing
     • Slow warm-up
  10. NoSQL Footprints at eBay
     Besides Oracle/MySQL, we also have
     • Cassandra
     • Couchbase
     • MongoDB
     • HBase
     • Memcached
     • Neo4j
     • OpenTSDB
     • Redis
     • …
  11. Why Couchbase?
     • Memcached-compatible persistent caching
     • Elastic scalability
     • High RW performance & throughput
     • Active-active bi-directional XDCR
     • High local cluster RW consistency
     • Flexible document model
     • Development agility
     • SQL integration
     • And more…
  12. Environment
     • Support both dedicated & multi-tenant clusters
     • Couchbase Enterprise 3.1/4.5 running on BM & VM
        • High I/O flavor
        • High memory flavor
        • High storage flavor
     • Customized RPM
        • Customized to suit the eBay environment
        • Easy to install, upgrade and maintain; ensures deployment consistency across the board and makes deployment differences easy to identify
        • Built-in pre-defined tuning parameters when needed
     • Homegrown client wrapper for central application logging and reporting
     • QA/LnP/PreProd/Prod environments
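Slide 12 mentions a homegrown client wrapper for central application logging and reporting but does not show it. As a rough illustration only, such a wrapper might time each operation and emit one structured log line per call. Everything in this sketch is hypothetical (the `LoggingClient` and `DictClient` names, the `get`/`upsert` interface, the logger name and the log format); it is neither eBay's wrapper nor the Couchbase SDK API.

```python
import logging
import time
from typing import Any, Callable


class DictClient:
    """Hypothetical in-memory stand-in for a real key-value client."""

    def __init__(self) -> None:
        self._data: dict = {}

    def get(self, key: str) -> Any:
        return self._data[key]  # raises KeyError when the key is missing

    def upsert(self, key: str, value: Any) -> None:
        self._data[key] = value


class LoggingClient:
    """Wraps a client and logs app name, operation, status and latency."""

    def __init__(self, client: Any, app_name: str) -> None:
        self._client = client
        self._app_name = app_name
        self._logger = logging.getLogger("kv.client")  # assumed logger name

    def _timed(self, op: str, key: str, func: Callable[..., Any], *args: Any) -> Any:
        start = time.perf_counter()
        status = "ok"
        try:
            return func(key, *args)
        except Exception:
            status = "error"
            raise  # the caller still sees the original exception
        finally:
            latency_ms = (time.perf_counter() - start) * 1000
            self._logger.info(
                "app=%s op=%s key=%s status=%s latency_ms=%.3f",
                self._app_name, op, key, status, latency_ms,
            )

    def get(self, key: str) -> Any:
        return self._timed("get", key, self._client.get)

    def upsert(self, key: str, value: Any) -> None:
        self._timed("upsert", key, self._client.upsert, value)


logging.basicConfig(level=logging.INFO)
store = LoggingClient(DictClient(), app_name="demo-app")
store.upsert("user::1", {"lang": "en"})
print(store.get("user::1"))
```

Routing every call through one wrapper gives a single place to collect per-application metrics, which is presumably the appeal on a multi-tenant platform.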
  13. Couchbase Onboarding Process
     • Understand product limitations
        - Avoid known anti-patterns and look beyond the generic use case
     • NoSQL product evaluation & selection
        - Business & Technology perspective
        - Product selection flowchart & detailed scoring card
     • Data modeling, POC with LnP, failover & DR testing
        - Review test results, re-evaluate initial assumptions
     • Capacity planning and provisioning
  14. eBay Couchbase At A Glance
     • Total Clusters: 120
     • Total Servers: 1400
     • Couchbase Calls: 80B/day
     • Y-o-Y Growth: >100%
     • Total Data Size: 90TB
     • Total Documents: 60B
     • Peak Sets/Cluster: 800,000/sec
     • Peak Gets/Cluster: 1,200,000/sec
  15. Typical Use Cases
     • Write Intensive
        • User session tracking
        • 13 billion writes per day
     • Read Intensive
        • Email notification
        • 4 billion reads per day
     • Mixed workload
        • Central monitoring platform where metrics are collected in real time for hundreds of thousands of devices
        • 2 billion writes per day
        • 10 billion reads per day
  16. Global User Preference
     • Global repository with a streamlined service for managing worldwide user preferences, which come from the Data Warehouse
        • Seller advertising
        • Member communication
        • User account setting
        • Notification preferences, etc.
  17. Central Monitoring and Alerting
     The entire eBay site monitoring system is built on Couchbase!
     • We have 2 sets of clusters (active/passive), A (3 DC) and B (3 DC), for upgrade/patch
  18. Elastic Scalability
     • Benchmarking
        • Performance baseline for new hardware, new software release
        • Enforce full scale testing in dedicated LnP env before going to production
     • In general, scale out by adding more nodes to increase throughput or reduce latency
     • Sometimes it's cost-efficient to scale up at the component level by identifying the scaling bottleneck and resolving it accordingly
     • Scale up (vertical)
        • Smaller data center footprint, such as space, power, cooling
        • Less license cost
     • Scale out (horizontal)
        • Cheaper using commodity hardware
        • More fault tolerant
        • (Unlimited) upgradability
  19. Couchbase Learning Experience
     • Lack of always available writes
        • Application option to write to remote DC when local write fails
     • Cross DC update conflict resolution
        • Unpredictable behavior, but new features in v4.6 solve this
     • Metadata memory overhead
        • 56 bytes of metadata is too much if you have a small key
     • Memory fragmentation
        • CB 4.x replaces TCMalloc with jemalloc libraries
     • Slow rebalance
        • Use swap rebalance when applicable
     • Slow warm-up
        • Remove access log to speed up warm-up
     • 10-bucket limit not working well for shared QA env
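The first learning on slide 19, an application option to write to the remote DC when the local write fails, can be sketched as a simple fallback. This is a minimal illustration under stated assumptions: `KeyValueStore`, `WriteFailed`, `InMemoryStore` and `write_with_remote_fallback` are hypothetical names, not Couchbase SDK calls, and the slides do not describe how eBay actually implemented the option.

```python
from typing import Any, Dict, Protocol


class WriteFailed(Exception):
    """Raised by a store when a write cannot be completed."""


class KeyValueStore(Protocol):
    """Hypothetical minimal interface for a per-DC cluster client."""

    def upsert(self, key: str, value: Any) -> None: ...


class InMemoryStore:
    """Stand-in cluster used only to demonstrate the fallback."""

    def __init__(self, healthy: bool = True) -> None:
        self.healthy = healthy
        self.data: Dict[str, Any] = {}

    def upsert(self, key: str, value: Any) -> None:
        if not self.healthy:
            raise WriteFailed("store unavailable")
        self.data[key] = value


def write_with_remote_fallback(
    local: KeyValueStore, remote: KeyValueStore, key: str, value: Any
) -> str:
    """Try the local DC first; on failure, write to the remote DC.

    Returns which DC accepted the write. If the remote write also
    fails, WriteFailed propagates to the caller.
    """
    try:
        local.upsert(key, value)
        return "local"
    except WriteFailed:
        remote.upsert(key, value)
        return "remote"


local_dc = InMemoryStore(healthy=False)  # simulate a local failover window
remote_dc = InMemoryStore()
accepted_by = write_with_remote_fallback(
    local_dc, remote_dc, "session::42", {"state": "active"}
)
print(accepted_by)
```

A write that lands in the remote DC still has to replicate back, so it is subject to the cross-DC conflict resolution behavior noted on the same slide.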
  20. Couchbase Wish List
     We would like to see
     • Point-in-time recovery so we can store SOR data
     • Global admin console to manage multiple clusters
     • Smaller metadata to reduce memory requirement
     • Robust rebalance
     • Lazy warm-up so a failed node can rejoin more quickly
     • Simplified XDCR/Compaction tuning
     • One log, just one log
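The wish for smaller metadata can be put in rough numbers using figures from earlier slides: 56 bytes of metadata per document (slide 19), 60B documents and 90TB of data (slide 14). The sketch below is back-of-the-envelope arithmetic only. It assumes decimal terabytes, one copy of metadata per document, and no replicas, none of which the slides confirm; the 16-byte key and 32-byte value in the second part are made-up sizes chosen to illustrate the "small key" case.

```python
METADATA_BYTES_PER_DOC = 56            # slide 19
TOTAL_DOCUMENTS = 60_000_000_000       # slide 14: 60B documents
TOTAL_DATA_BYTES = 90 * 10**12         # slide 14: 90TB, decimal TB assumed

# Fleet-wide metadata footprint for a single copy of every document.
metadata_bytes = METADATA_BYTES_PER_DOC * TOTAL_DOCUMENTS
average_doc_bytes = TOTAL_DATA_BYTES / TOTAL_DOCUMENTS
metadata_vs_average_doc = METADATA_BYTES_PER_DOC / average_doc_bytes


def metadata_share(key_bytes: int, value_bytes: int) -> float:
    """Fraction of an item's footprint taken by fixed metadata."""
    total = key_bytes + value_bytes + METADATA_BYTES_PER_DOC
    return METADATA_BYTES_PER_DOC / total


print(f"metadata across all documents: {metadata_bytes / 10**12:.2f} TB")
print(f"average document size: {average_doc_bytes:.0f} bytes")
print(f"metadata relative to average document: {metadata_vs_average_doc:.1%}")

# Hypothetical small item: 16-byte key, 32-byte value.
print(f"metadata share for a small item: {metadata_share(16, 32):.1%}")
```

On these assumptions the fixed overhead is modest against the fleet-wide average document, but it takes up more than half of the footprint of a very small item, which is the case the slide calls out.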
  21. Questions? eBay is hiring experienced NoSQL professionals; please send resume to fengqu@ebay.com
