
Couchbase Architecture and IT Cost Management at UPS – Connect Silicon Valley 2018


Speaker: Mike Ryder, Data Architect, UPS

Demanding requirements and the right technology to meet them – everything is coming together, but… what if the solution costs so much that it starts eroding the business case? How can a more economical solution be found – one with fewer Couchbase nodes, yet matching or exceeding the capabilities of a larger cluster? The authors share practical experience in creative Couchbase economics. Oh, and don't take our word for it – you can experience the result on your mobile device before you leave the room!

Key points
1) UPS use case + demo
2) Presenting a problem of node sprawl
3) Is all document content created equal? Indexing vs. payload
4) Compression and sharding techniques
5) Wins
6) Couchbase is evolving … feature wishlist to help contain node sprawl
7) Lessons learned



  1. Friend or Foe? Couchbase Architecture and IT Cost Management: A Case Study in Controlling Node Sprawl
     Konstantin Tadenev, UPS; Mike Ryder, UPS
  2. What are we going to cover today?
     • The use case at UPS
     • Problems associated with node sprawl
     • The architectural choices made and why
     • Wins
     • Features we'd like to see added
     • Lessons learned
  3. Package tracking use case – demo
  4. Quality of Service Requirements
     • Changing state and serving inquiries for billions of packages
     • Top performance
     • Near-linear scalability
     • Flexibility in supporting new requirements
     • Billions of documents
     • Tens of terabytes of data
     • Tens of thousands of operations per second
     • 30+ million document inserts and updates per day
  5. Why Couchbase (three generations of tracking evolution)?
     • Generation 1 – traditional RDBMS storage and retrieval
     • Legacy, but better solutions available
     [Diagram: Client – Network – Application – Database]
  6. Why Couchbase (three generations of tracking evolution)?
     • Generation 2 – addition of caching
     • Legacy relational with a non-persistent elastic caching array
     [Diagram: Client – Network – Application – Write-aside Cache – Database]
  7. Why Couchbase (three generations of tracking evolution)?
     • Generation 3 – persistent caching with Couchbase
     • The data sleeps in relational databases; the data plays in Couchbase
     • The relational databases represent the system of record; Couchbase, the system of engagement
     [Diagram: Client – Network – Couchbase – Publish/Subscribe – Source]
  8. What Drove the Number of Nodes?
     • Optimal Couchbase node size depends on the percentage of data cached. The following holds up to Couchbase 5.0:
       – In maintenance operations such as rebalance, if cached data >= 90% of all data, the data is streamed directly from memory to the target node
       – If cached data < 90%, the data is paged from storage into memory before being sent to the target. This background paging process is referred to as backfills in Couchbase
     • Backfills are most stable when nodes hold < 2 TB of data and at least 15% of that data fits in memory
       – 15% of 2 TB is ~307 GB, which conservatively maps to the typical 256 GB RAM configuration
     • In our case cached data was below 90%, which led to limiting each node to 256 GB of RAM
     • The smaller the RAM per node, the more nodes are needed, even if compute is over-allocated
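The sizing rules on this slide imply a simple back-of-the-envelope calculation. A minimal sketch follows; the figures are illustrative, not UPS's actual configuration:

```python
import math

def estimate_nodes(total_data_tb, usable_ram_per_node_gb,
                   min_residency=0.15, max_data_per_node_tb=2.0):
    """Estimate the node count implied by the backfill guidance:
    keep <= 2 TB of data per node and >= 15% of it resident in RAM."""
    # Nodes needed so no node exceeds the per-node data cap.
    nodes_for_data = total_data_tb / max_data_per_node_tb
    # Nodes needed so the resident fraction fits in RAM cluster-wide.
    ram_needed_gb = total_data_tb * 1024 * min_residency
    nodes_for_ram = ram_needed_gb / usable_ram_per_node_gb
    return math.ceil(max(nodes_for_data, nodes_for_ram))

# Example: 40 TB of data on 256 GB nodes; here RAM, not the 2 TB
# disk cap, drives the node count (24 vs 20 nodes).
print(estimate_nodes(40, 256))
```

In practice usable RAM per node is lower than installed RAM (metadata, OS, headroom), so real counts run higher than this model suggests.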
  9. Ripple Effects of a Growing Number of Nodes in a Cluster
     • More nodes → higher probability of concurrent failures (servers, network, etc.) → more replicas needed to safeguard against them → even more nodes
     • Typically, clusters with fewer than 10 nodes can be safely operated with one replica. It is good practice to configure larger clusters with a greater number of replicas
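The replica reasoning can be illustrated with a toy probability model. It assumes independent node failures and pessimistically treats any two concurrent failures as a loss event (in reality only failures covering the same vBucket's active and replica copies lose data), but it shows why risk grows faster than cluster size:

```python
import math

def p_concurrent_failures(n_nodes, p_node_down, failures):
    """Probability that at least `failures` nodes are down at once,
    assuming independent failures (a simplification)."""
    return sum(
        math.comb(n_nodes, k) * p_node_down**k * (1 - p_node_down)**(n_nodes - k)
        for k in range(failures, n_nodes + 1)
    )

# With one replica, losing data requires >= 2 concurrent failures.
small = p_concurrent_failures(10, 0.01, 2)
large = p_concurrent_failures(100, 0.01, 2)
print(small, large)  # risk at 100 nodes far exceeds risk at 10 nodes
```

This is why a ~100-node cluster pushes toward two or more replicas, while keeping clusters small preserves the one-replica configuration.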
  10. Not all document content is equal
     • Few elements are used as search predicates
     • The remaining elements are not interrogated; they are written and read as payload only

     Example (abbreviated tracking document):

       {
         "recordReference": "testRecordResponse",
         "movement": {
           "senderNumber": "75AXXX",
           "shipmentUnitNumber": "1Z75A1E303662XXXXXX",
           "collectionDate": "20180123",
           "serviceName": "UPS Super",
           "serviceCode": "509",
           "billType": "P/P",
           "declaredBillTypeText": "Prepaid (aka PRE)",
           "declaredValueFlag": "",
           "inquiryID": { "code": "81", "value": "1Z75A1E30366XXXXXX" },
           "invoice": {},
           "itemCounts": {
             "originalItemCount": "1",
             "expectedItemCount": "1",
             "actualItemCount": "1",
             "voidItemCount": "0"
           },
           "referenceValue": {},
           "service": {},
           "movementType": { "code": "01", ... }
         }
       }
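The predicate/payload distinction above can be acted on with a client-side split. A minimal sketch; which fields UPS actually indexes is an assumption here, chosen for illustration:

```python
# Hypothetical set of search-predicate fields; everything else is
# treated as opaque payload that is never interrogated by queries.
INDEXED_FIELDS = {"shipmentUnitNumber", "senderNumber", "collectionDate"}

def split_document(doc: dict):
    """Partition a flat document into an indexable predicate doc
    and a payload dict that can be compressed and stored as a value."""
    index_doc = {k: v for k, v in doc.items() if k in INDEXED_FIELDS}
    payload = {k: v for k, v in doc.items() if k not in INDEXED_FIELDS}
    return index_doc, payload

movement = {
    "shipmentUnitNumber": "1Z75A1E303662XXXXXX",
    "senderNumber": "75AXXX",
    "collectionDate": "20180123",
    "serviceName": "UPS Super",
    "billType": "P/P",
}
index_doc, payload = split_document(movement)
print(sorted(index_doc))  # only predicate fields remain indexable
```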
  11. UPS Solution
     • Data size + Couchbase guidelines resulted in ~100 nodes
     • The compression idea:
       – Compress the "payload" prior to ingestion into Couchbase and treat it as a key-value pair. Note that storage-side Snappy compression (Couchbase 5.0) is not useful for in-memory operations
       – Maintain all search predicates in indexable, uncompressed documents
     • The sharding idea:
       – Logically shard the data across multiple physical clusters so as to maintain a one-replica configuration (#shards >= #clusters)
       – Configure the clusters to be flexible for data growth
       – Limit the number of nodes in each cluster even with future data growth
       – Retain multiple smaller clusters to afford better maintainability
       – Balance workloads by moving shards among clusters
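A minimal sketch of the two ideas: client-side payload compression before ingestion, and hashing each document key to a logical shard that maps onto a physical cluster. zlib stands in for whatever compression library is actually used, the shard and cluster counts are assumptions, and connection details are omitted:

```python
import hashlib
import json
import zlib

NUM_SHARDS = 8  # logical shards (assumption); keep > number of clusters
SHARD_TO_CLUSTER = {s: s % 2 for s in range(NUM_SHARDS)}  # 2 physical clusters

def compress_payload(payload: dict) -> bytes:
    """Compress the non-indexed portion client-side; it is stored as
    an opaque value so it never hits the indexer uncompressed."""
    return zlib.compress(json.dumps(payload).encode("utf-8"))

def cluster_for_key(key: str) -> int:
    """Hash the document key to a logical shard, then map the shard to a
    cluster. Rebalancing a shard to another cluster only changes the map."""
    shard = int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16) % NUM_SHARDS
    return SHARD_TO_CLUSTER[shard]

payload = {"serviceName": "UPS Super", "notes": "x" * 1000}
blob = compress_payload(payload)
print(len(blob) < len(json.dumps(payload)))  # True: repetitive JSON compresses well
```

Keeping the shard count higher than the cluster count is what lets workloads be balanced later by moving whole shards between clusters without rehashing keys.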
  12. Wins
     • Reduction in the number of nodes from ~100 to ~20
     • Other benefits:
       – Data compression ~80%
       – Parallel maintenance activities can be performed
       – Replica count kept to one
       – Compressed documents reduce the infrastructure footprint
       – Sharding provides for growth without adding replicas or compromising maintainability
  13. Features we'd like to see (1 of 3)
     • Native in-memory compression
       – UPS requested this capability in May 2017
       – Couchbase 5.5 introduced this feature in July 2018, using the Snappy library for compression; XDCR traffic is also compressed
     • UPS would like to see further improvement: a better compression ratio while maintaining high performance
  14. Features we'd like to see (2 of 3)
     • SDK to manage sharding across multiple clusters
       – UPS submitted requirements for client-side distribution of KV data across clusters, similar to how data is distributed over vBuckets within a cluster, as well as an SDK-level scatter/gather implementation for View and N1QL queries (Couchbase request JDBC-1084)
       – Couchbase has not yet targeted this feature for a release
  15. Features we'd like to see (3 of 3)
     • Increased node density
       – Customer requirements exist to enable Couchbase nodes with 1% or less RAM residency. A critical factor in this scenario is rebalance speed and stability, which must improve substantially to achieve higher node density (Couchbase request MB-23243)
       – UPS would also like to see improvements in rebalance speed at RAM residency levels of 15–25%
       – Couchbase has not yet targeted this feature for a release
  16. Lessons learned
     • To optimize a Couchbase cluster, one needs to consider:
       – Quality of Service requirements (e.g., data size, performance, responsiveness)
       – Couchbase best practices (e.g., RAM per node, number of replicas in relation to number of nodes)
     • Deployment considerations may drive Couchbase node sprawl:
       – Limiting RAM per node leads to more nodes
       – As the number of nodes grows, so does the optimal number of replicas, driving the number of nodes even higher
       – The number of nodes affects the cost of the solution, which in turn may erode the business case
     • Possible answers to node sprawl:
       – Compression
       – Sharding across multiple clusters
  17. Questions & Comments