CouchConf Tokyo Couchbase in Production

Published in: Technology

  1. Couchbase Server in Production. Perry Krug, Sr. Solutions Architect
  2. Typical Couchbase production environment. Diagram: application users, load balancer, application servers, Couchbase servers.
  3. We’ll focus on App-Couchbase interaction … (same diagram: application users, load balancer, application servers, Couchbase servers)
  4. … at each step of the application lifecycle: Dev/Test, Size, Deploy, Monitor, Manage
  5. KEY CONCEPTS
  6. Reading, Writing and Arithmetic
     Reading data: the application server asks “Give me document A”; the server answers “Here is document A”.
     Writing data: the application server asks “Please store document A”; the server answers “OK, I stored document A”.
     (We’ll save the arithmetic for the sizing section :)
  7. Reading data
     Application server: “Give me document A”. Server: “Here is document A”.
     If document A is in memory (RAM):
       return document A to the application
     Else:
       add the document to the read queue
       the reader eventually loads the document from disk into memory
       return document A to the application
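The read path on slide 7 can be sketched as a small in-memory model. This is a toy illustration, not Couchbase's implementation: the class name, the two maps, and the `diskFetches` counter are all hypothetical, and the "reader" is drained synchronously here, whereas the slide describes a read that happens "eventually".

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

// Toy model of the read path on the slide; names and structure are assumptions.
class ReadPathSketch {
    final Map<String, String> ram = new HashMap<>();     // documents currently in memory
    final Map<String, String> disk = new HashMap<>();    // documents persisted on disk
    final Queue<String> readQueue = new ArrayDeque<>();  // keys waiting for a disk read
    int diskFetches = 0;                                 // how often the "else" branch ran

    String get(String key) {
        String doc = ram.get(key);
        if (doc != null) {
            return doc;            // in memory: return it to the application
        }
        readQueue.add(key);        // else: add the document to the read queue
        drainReadQueue();          // the reader loads it from disk into memory
        return ram.get(key);       // return it (null if it was never stored)
    }

    // Simplification: drained inline instead of by a background reader.
    void drainReadQueue() {
        String key;
        while ((key = readQueue.poll()) != null) {
            String doc = disk.get(key);
            if (doc != null) {
                ram.put(key, doc);
            }
            diskFetches++;
        }
    }
}
```

Calling `get` twice for the same key shows the point of the slide: only the first call pays for a disk fetch.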
  8. Keeping the working data set in RAM is key to read performance
     Your application’s working set should fit in RAM … or else!
     (You don’t want the “else” part happening very often: it is MUCH slower than a memory read, and you could have to wait in line an indeterminate amount of time for the read to happen.)
  9. Working set ratio depends on your application
     • Late-stage social game (working/total set = .01): many users no longer active; few logged in at any given time.
     • Business application (working/total set = .33): users logged in during the day; day moves around the globe.
     • Ad network (working/total set = 1): any cookie can show up at any time.
  10. Couchbase in operation: Writing data
     Application server: “Store document A”. Server: “OK, it is stored”.
     If there is room for the document in RAM:
       store the document in RAM
     Else:
       eject other document(s) from RAM
       store the document in RAM
     Add the document to the replication queue; the replicator eventually transmits the document.
     Add the document to the write queue; the writer eventually writes the document to disk.
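A minimal sketch of the write path on slide 10, under loud assumptions: RAM capacity is counted in documents rather than bytes, ejection simply drops the oldest inserted document, and the class and field names are hypothetical. It is meant to show the order of steps, not how the server is built.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Queue;

// Toy model of the write path on the slide; not the server's real data structures.
class WritePathSketch {
    final int ramCapacity;  // assumption: capacity counted in documents, not bytes
    final LinkedHashMap<String, String> ram = new LinkedHashMap<>();
    final Queue<Map.Entry<String, String>> replicationQueue = new ArrayDeque<>();
    final Queue<Map.Entry<String, String>> diskWriteQueue = new ArrayDeque<>();
    final Map<String, String> disk = new HashMap<>();

    WritePathSketch(int ramCapacity) {
        if (ramCapacity < 1) {
            throw new IllegalArgumentException("ramCapacity must be at least 1");
        }
        this.ramCapacity = ramCapacity;
    }

    // Returns as soon as the document is in RAM and queued; disk and replicas follow later.
    void set(String key, String doc) {
        if (!ram.containsKey(key) && ram.size() >= ramCapacity) {
            // No room: eject another document (here, the oldest inserted one).
            Iterator<String> oldest = ram.keySet().iterator();
            oldest.next();
            oldest.remove();
        }
        ram.put(key, doc);                            // store the document in RAM
        replicationQueue.add(Map.entry(key, doc));    // replicator transmits it eventually
        diskWriteQueue.add(Map.entry(key, doc));      // writer persists it eventually
    }

    // Stand-in for the background writer draining the disk write queue.
    void drainDiskWriteQueue() {
        Map.Entry<String, String> item;
        while ((item = diskWriteQueue.poll()) != null) {
            disk.put(item.getKey(), item.getValue());
        }
    }
}
```

The queues carry the document itself, so in this toy an ejected document still reaches disk when the writer runs.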
  11. Flow of data when writing: applications writing to Couchbase Server, Couchbase transmitting replicas, Couchbase writing to disk. (Diagram: three application servers connected to the server over the network.)
  12. Queues build if aggregate arrival rate exceeds drain rates. (Diagram: three application servers writing over the network to a server with a replication queue and a disk write queue.)
  13. Scaling out permits matching of aggregate flow rates so queues do not grow. (Diagram: three application servers writing over the network to three servers.)
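The arithmetic behind slides 12 and 13 can be sketched as follows. The assumptions are mine, not the deck's: writes spread evenly across nodes, each node drains at a constant rate, and all figures are hypothetical.

```java
// Back-of-the-envelope queue arithmetic; rates are items per second.
class QueueGrowthSketch {
    // Items left in a queue after `seconds` when arrivals outpace the drain rate.
    static double backlogAfter(double arrivalPerSec, double drainPerSec, double seconds) {
        return Math.max(0.0, (arrivalPerSec - drainPerSec) * seconds);
    }

    // Smallest node count whose combined drain rate matches the aggregate arrival rate.
    static int nodesToKeepUp(double aggregateArrivalPerSec, double drainPerNodePerSec) {
        if (drainPerNodePerSec <= 0) {
            throw new IllegalArgumentException("drain rate must be positive");
        }
        return Math.max(1, (int) Math.ceil(aggregateArrivalPerSec / drainPerNodePerSec));
    }
}
```

For example, a node receiving 12,000 writes per second while draining 10,000 per second accumulates `backlogAfter(12000, 10000, 60)` = 120,000 queued items in a minute, and `nodesToKeepUp(25000, 10000)` gives 3 nodes for an aggregate of 25,000 writes per second.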
  14. DEVELOPMENT (lifecycle: Dev/Test, Size, Deploy, Monitor, Manage)
  15. Couchbase SDKs: Java SDK, .Net SDK, PHP SDK, Ruby SDK … and many more. http://www.couchbase.com/develop
     Java stack: user code, Java client API, Couchbase Java library (spymemcached), Couchbase Server.
     Java client API:
       CouchbaseClient cb = new CouchbaseClient(listURIs, "aBucket", "letmein");
       // this is all the same as before
       cb.set("hello", 0, "world");
       cb.get("hello");
  16. Couchbase Data
     • Couchbase uses (and is completely compatible with) the memcached protocol.
     • While you can use any standard memcached library, Couchbase also provides its own libraries for a variety of languages.
     • Couchbase is document-oriented.
     • See http://www.couchbase.com/develop
  17. Couchbase Client Deployment (example application: Farm Town Wars)
     • “Smart” library: app code uses the Couchbase Java client library on the application server.
     • OR client-side Moxi: app code uses a memcached client that talks to Moxi (Couchbase proxy) on the application server.
     Both options connect to the Couchbase servers on ports 11210 and 8091.
  18. SERVER AND CLUSTER SIZING (TIME FOR THE ARITHMETIC) (lifecycle: Dev/Test, Size, Deploy, Monitor, Manage)
  19. Couchbase Server sizing == performance
     • Serve reads out of RAM
     • Enough I/O for writes
     • Mitigate inevitable failures
     (Reading data / writing data diagram repeated from slide 6.)
  20. How many nodes? 4 key factors determine the number of nodes needed:
     1) RAM
     2) Disk
     3) Network
     4) Data Distribution/Safety
     (Diagram: application user, web application server, Couchbase servers.)
  21. RAM sizing: keep the working set in RAM for best read performance
     1) RAM
     • Working set
     • Metadata
     • Buffer/overhead
     • Active+Replica(s)
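The four RAM items on slide 21 can be combined into a rough estimate. This is a sketch under stated assumptions, not an official sizing formula: it assumes metadata is held in RAM for every item, that replicas cost as much RAM as active data, and that buffer/overhead is a simple fractional headroom. Every parameter name and figure is hypothetical.

```java
// Rough RAM estimate from the slide's four factors; all inputs are hypothetical.
class RamSizingSketch {
    static double requiredRamBytes(long items,
                                   double avgValueBytes,
                                   double metadataBytesPerItem,
                                   double workingSetRatio,   // working/total set, 0..1
                                   int replicas,
                                   double headroomFraction)  // buffer/overhead, e.g. 0.25
    {
        int copies = 1 + replicas;                                        // active + replica(s)
        double metadata = (double) items * metadataBytesPerItem * copies; // assumed resident for all items
        double workingSet = (double) items * avgValueBytes * workingSetRatio * copies;
        return (metadata + workingSet) * (1.0 + headroomFraction);
    }

    // Nodes needed so that the cluster's usable RAM covers the estimate.
    static int nodesForRam(double requiredBytes, double usableRamPerNodeBytes) {
        if (usableRamPerNodeBytes <= 0) {
            throw new IllegalArgumentException("usable RAM per node must be positive");
        }
        return Math.max(1, (int) Math.ceil(requiredBytes / usableRamPerNodeBytes));
    }
}
```

With 1,000,000 items of 1,000 bytes, 100 bytes of metadata per item, a working set ratio of 0.5, one replica and no headroom, the estimate is 1.2e9 bytes; at 5e8 usable bytes per node that rounds up to 3 nodes.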
  22. Disk sizing: Space and I/O
     2) Disk
     I/O:
     • Sustained write rate
     • Rebalance capacity
     • Backups
     Space:
     • Total dataset
     • Active+Replicas
  23. Network sizing
     3) Network
     • Client traffic
     • Replication (writes)
     • Rebalancing
     (Diagram labels: Reads+Writes; Replication (multiply writes) and Rebalancing.)
  24. Data Distribution: servers fail, be prepared. The more nodes, the less impact a failure will have.
     4) Data Distribution / Safety (assuming one replica):
     • 1 node = BAD
     • 2 nodes = …better…
     • 3+ nodes = BEST!
     Note: Many applications will need more than 3 nodes.
  25. Data Distribution
     App Server 1 and App Server 2 each run the Couchbase client library with a cluster map and send Read/Write/Update requests to the Couchbase Server cluster:
     Server   | Active docs         | Replica docs
     Server 1 | Doc 5, Doc 2, Doc 9 | Doc 4, Doc 1, Doc 8
     Server 2 | Doc 4, Doc 7, Doc 8 | Doc 6, Doc 3, Doc 2
     Server 3 | Doc 1, Doc 3, Doc 6 | Doc 7, Doc 9, Doc 5
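One way to picture the cluster map on slide 25 is a table from partition to owning server, consulted by the client for every key. The sketch below is a simplification and an assumption throughout: it hashes keys with CRC32 modulo a partition count and places each replica on the next server, which is not claimed to be Couchbase's actual hashing or placement. All names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

// Simplified client-side cluster map: key -> partition -> active and replica server.
class ClusterMapSketch {
    final String[] servers;
    final int[] activeOwner;   // partition -> index of the server holding the active copy
    final int[] replicaOwner;  // partition -> index of the server holding the replica

    ClusterMapSketch(String[] servers, int partitions) {
        if (servers.length < 2 || partitions < 1) {
            throw new IllegalArgumentException("need at least 2 servers and 1 partition");
        }
        this.servers = servers.clone();
        this.activeOwner = new int[partitions];
        this.replicaOwner = new int[partitions];
        for (int p = 0; p < partitions; p++) {
            activeOwner[p] = p % servers.length;
            replicaOwner[p] = (p + 1) % servers.length;  // always a different server
        }
    }

    // Assumption: CRC32 of the key modulo the partition count.
    int partitionOf(String key) {
        CRC32 crc = new CRC32();
        crc.update(key.getBytes(StandardCharsets.UTF_8));
        return (int) (crc.getValue() % activeOwner.length);
    }

    String activeServer(String key) {
        return servers[activeOwner[partitionOf(key)]];
    }

    String replicaServer(String key) {
        return servers[replicaOwner[partitionOf(key)]];
    }
}
```

Because every client computes the same mapping from the same map, reads and writes for a given document always go to the server holding its active copy, and the replica always sits on a different server.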
  26. How many nodes? (recap) 4 key factors determine the number of nodes needed:
     1) RAM
     2) Disk
     3) Network
     4) Data Distribution
     (Diagram: application user, web application server, Couchbase servers.)
  27. MONITORING (lifecycle: Dev/Test, Size, Deploy, Monitor, Manage)
  28. Key resources: RAM, Disk, Network. (Diagram: three application servers connected over the network to three servers, each with RAM and disk.)
  29. Monitoring: once in production, the heart of operations is monitoring
     - RAM usage
     - Disk write queues / read activity
     - Network bandwidth, replication queues
     - Data distribution (balance, replicas)
  30. How do you know when your working set is not in RAM? Metric: Cache Miss Ratio
     (Read path repeated from slide 7: if document A is in memory, return it to the application; else add it to the read queue, the reader eventually loads it from disk into memory, then return it to the application.)
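The cache miss ratio on slide 30 can be read as the share of reads that had to go to disk. A minimal sketch, assuming the two counters are taken over the same time window; the method names and the alert threshold are hypothetical and not taken from the deck.

```java
// Cache miss ratio as disk fetches divided by total reads in the same window.
class CacheMissSketch {
    static double cacheMissRatio(long diskFetches, long totalGets) {
        if (totalGets <= 0) {
            return 0.0;  // no reads in the window, so nothing missed
        }
        return (double) diskFetches / totalGets;
    }

    // Flags a window whose miss ratio exceeds a threshold chosen by the operator.
    static boolean workingSetLikelyExceedsRam(long diskFetches, long totalGets, double threshold) {
        return cacheMissRatio(diskFetches, totalGets) > threshold;
    }
}
```

A ratio that keeps climbing over time suggests that reads are increasingly served from disk rather than RAM.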
  31. How do you know when you don’t have enough disk I/O? Metric: Disk Write Queue
  32. How do you know when you don’t have enough network I/O? Metric: TAP Replication Queue
  34. MANAGEMENT AND MAINTENANCE (lifecycle: Dev/Test, Size, Deploy, Monitor, Manage)
  35. Growth: going from 5 million to 100 million users…
     – RAM usage is growing:
       • Cache misses increasing
       • Resident item ratios decreasing
       • Disk fetches increasing
     – Disk write queue growing higher than usual
     Need to add a few more nodes… More RAM, disk and network without any downtime.
  36. Add Nodes. (Diagram: App Server 1 and App Server 2, each with the Couchbase client library and cluster map, send Read/Write/Update requests to a Couchbase Server cluster that grows from Servers 1-3 to include Server 4 and Server 5, each holding active docs and replica docs.)
  37. Backup
     As simple as running a packaged script (cbbackup).
     Done on a live system with minimal to no performance impact.
  38. Restore
     1) Replace backup files; the server will automatically “warmup” from disk files upon restart
     – Traditional RDBMS performance is acceptable while slowly populating cache
     – Our applications demand a different level of performance
     – Couchbase Server pre-loads as much as possible into RAM
  39. Restore
     2) “cbrestore” used to restore data into a live/different cluster. (Diagram: data files, cbrestore.)
  40. Upgrade
     1. Add nodes of new version, rebalance…
     2. Remove nodes of old version, rebalance…
     3. Done! No disruption
     General use for software upgrade, hardware refresh, planned maintenance.
     Upgrade existing Membase 1.7 to Couchbase Server 1.8.
  41. Disk fragmentation
     Current use of sqlite causes performance degradation as DB files get fragmented
     - “vacuum” available (but not as an online operation)
     - Best practice: repeat rebalance to “clean” disk files
     Under development: “Maintenance mode” to allow safely offlining a node to perform vacuuming in place.
     Couchbase Server 2.0 has much improved behavior.
  42. Failures Happen! Hardware, Network, Bugs
  43. Easy to manage failures with Couchbase
     • Failover (automatic or manual):
       – Replica data promoted for immediate access
       – Replicas not recreated
       – Do NOT fail over a healthy node
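The failover behavior on slide 43 (replica promoted, replicas not recreated) can be sketched on a toy partition table. The arrays, the `NONE` marker, and the method name are hypothetical; this illustrates the bookkeeping only, not the server's failover logic.

```java
// Toy failover on a partition table: promote replicas, do not recreate them.
class FailoverSketch {
    static final int NONE = -1;  // marks a partition that currently has no replica

    // activeOwner[p] and replicaOwner[p] hold server indexes for partition p.
    static void failOver(int[] activeOwner, int[] replicaOwner, int failedServer) {
        for (int p = 0; p < activeOwner.length; p++) {
            if (activeOwner[p] == failedServer) {
                activeOwner[p] = replicaOwner[p];  // replica data promoted for immediate access
                replicaOwner[p] = NONE;            // no new replica is created by failover
            } else if (replicaOwner[p] == failedServer) {
                replicaOwner[p] = NONE;            // replica lost with the failed server
            }
        }
    }
}
```

After `failOver`, every partition is still readable, but the affected partitions have no replica left.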
  44. Fail Over. (Diagram: App Server 1 and App Server 2, each with the Couchbase client library and cluster map, and a Couchbase Server cluster of Servers 1-5 holding active docs and replica docs.)
  45. Easy to maintain Couchbase
     • Use remove+rebalance on a “malfunctioning” node:
       – Protects data distribution and “safety”
       – Replicas recreated
       – Best to “swap” with a new node to maintain capacity
  46. QUESTIONS?
