CouchConf Israel 2013: Couchbase Server in Production

Slide notes

  • In this session, we’re shifting gears from development to production. I’m going to talk about how to operate Couchbase in production – how to “care and feed” the system to maintain application uptime and performance. I will try to demo as much as time permits, as a lot of this is about practice. This presentation discusses the new features and production impact of 2.0; while most of this remains the same for 1.8, I will call out the specific differences as we come to them.
  • The typical Couchbase production environment: many users of a web application, served by a load-balanced tier of web/application servers, backed by a cluster of Couchbase Servers. Couchbase provides the real-time/transactional data store for the application data.
  • Ultimately, what matters is the interaction between the application and the database. The database must allow data to be randomly read by the application with low latency and high throughput. It must also accept writes from the application, replicate the data for safety and durably store the data as quickly as possible.
  • And it must continue those things across the application lifecycle. Not only when the application is in “steady state” but when adding and removing capacity. When a node fails. When nodes are in the process of maintenance. Sizing the cluster properly is not just about ensuring things work when everything is “steady” but also about ensuring that things work when things aren’t “steady.”
  • Before getting into the detailed recommendations and considerations for operating Couchbase across the application lifecycle, we’ll cover a few key concepts and describe the “high level” considerations for successfully operating Couchbase in production.
  • As I mentioned, each Couchbase node is exactly the same. All nodes are broken down into two components: a data manager (on the left) and a cluster manager (on the right). It’s important to realize that these are separate processes, specifically designed so that a node can continue serving its data even in the face of cluster problems like network disruption. The data manager is written in C and C++ and is responsible for the object caching layer, the persistence layer and the query engine. It is based on memcached and so provides a number of benefits: the very low lock contention of memcached allows extremely high throughput and low latency, whether against a small set of documents (or just one) or across millions of documents; being compatible with the memcached protocol means we are not only a drop-in replacement but also inherit support for automatic item expiration (TTL) and atomic increment operations; we’ve increased the maximum object size to 20 MB, but still recommend keeping objects much smaller; both binary objects and native JSON documents are supported; and all of the metadata for the documents and their keys is kept in RAM at all times – while this adds a bit of overhead per item, it also allows for extremely fast “miss” responses, which are critical to some applications, because we never have to scan disk to know we don’t have a piece of data. The cluster manager is based on Erlang/OTP, which was developed by Ericsson to manage hundreds or even thousands of distributed telco switches. This component is responsible for configuration, administration, process monitoring, statistics gathering, and the UI and REST interface. Note that no data manipulation is done through this interface.
  • 1. A set request comes in from the application. 2. Couchbase Server responds back that the key is written. 3. Couchbase Server then replicates the data out to memory on the other nodes. 4. At the same time, the data is put into a write queue to be persisted to disk.
  • When an application server or process starts up, it instantiates a Couchbase client object. This object takes a bit of configuration (language dependent), which includes one or more URLs to the Couchbase Server cluster. The client object then makes a connection on port 8091 to one of the URLs in its list and receives the topology of the cluster (called a vBucket map). Technically a client connects to one bucket within the cluster. Using this map, the client library sends data requests to the individual Couchbase Server nodes. In this way, every application server does the load balancing for us without the need for any routing or proxy process (a brief client sketch follows below). Let’s first look at the operations within each single node. Keep in mind again that each node is completely independent from the others when it comes to taking in and serving data. Every operation (with the exception of queries) is only between a single application server and a single Couchbase node. All operations are atomic, and there is no blocking or locking done by the database itself. Application requests are responded to as quickly as possible, which should mean sub-millisecond responses depending on your network, unless a read is coming from disk; and any failure (except timeouts) is designed to be returned as quickly as possible – “fail fast”.
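
    A minimal sketch of the client bootstrap and write path described in the two notes above, using a 2.x-style Couchbase Python SDK; the node address, bucket name, document key and the persist_to/replicate_to values are placeholders rather than anything from the presentation.

        # Sketch: bootstrap a client against the cluster and perform KV operations.
        # Assumes a 2.x-style Couchbase Python SDK; names and addresses are placeholders.
        from couchbase.bucket import Bucket

        # The client contacts a listed node on port 8091, receives the vBucket map for
        # the named bucket, and from then on talks directly to the node that owns each
        # key (no proxy or router in between).
        bucket = Bucket('couchbase://node1.example.com/default')

        # Write: acknowledged as soon as the active node has the item in memory;
        # replication and persistence then happen asynchronously.
        bucket.upsert('user::1001', {'name': 'Perry', 'role': 'SA'})

        # Optionally block until the write has been replicated and/or persisted
        # (observe-based durability), trading latency for extra safety.
        bucket.upsert('user::1001', {'name': 'Perry', 'role': 'SA'},
                      persist_to=1, replicate_to=1)

        # Read: always served by the node holding the active copy of the key.
        print(bucket.get('user::1001').value)
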
  • Bulletize the text. Make sure the builds work.
  • Understanding those same operations, let’s look at how this functions across a cluster. With a Couchbase Server cluster of three nodes, you can see that the documents are evenly distributed throughout the cluster. (click) Additionally, the replica documents are also evenly distributed so that no replica document is on the same node as its active copy. This shows one replica copy, but the same logic applies when there are two or three. After the application server comes online and receives the vBucket map, all requests (read/write/update/delete) for a given document are sent to the node that is active for it. In this way, Couchbase ensures immediate and strong consistency: an application will always read its own writes. At no point is the replica data read, which would introduce inconsistency. We will see later what happens when a node fails and the replica data needs to be activated. The data is distributed (or “sharded”) based upon a CRC32 hash of the key name, which creates very even and random distribution of the data across all the nodes (a simplified sketch follows below). Other systems shard based upon some user-generated value, which can lead to hot spots and imbalances within a cluster; we don’t have that problem. By distributing the data evenly across the cluster and letting the clients load balance themselves, the load is also evenly distributed across all the nodes, making them “active-active”. Other systems using “master-slave” configurations basically end up wasting processing power and hardware in the background. Although the diagram only shows a few “shards” of data, we actually use 1024 slices/shards/vBuckets. Technically this limits us to 1024 active nodes in a cluster, but it also has lots of benefits for smaller clusters: the data is sharded very granularly and can be moved and compacted as such. This allows the cluster to scale very evenly and linearly for more RAM, disk, network and CPU.
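
    To illustrate the key hashing mentioned above, the sketch below maps keys to one of 1024 vBuckets with a CRC32 hash. This is a simplified stand-in for the exact calculation the client libraries perform, and the key names are made up.

        # Simplified illustration of CRC32-based sharding into 1024 vBuckets.
        # The real client libraries use a specific variant of this calculation.
        import zlib

        NUM_VBUCKETS = 1024

        def vbucket_for(key):
            # Hash the key bytes and fold the result into the vBucket range.
            return zlib.crc32(key.encode('utf-8')) % NUM_VBUCKETS

        for key in ('user::1001', 'session::abc', 'order::42'):
            print(key, '-> vBucket', vbucket_for(key))
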
  • Calculate for both active data and the number of replicas. Replicas will be the first to be dropped out of RAM if there is not enough memory.
  • The solution to scale writes is to add more servers to the Couchbase cluster, ensuring AGGREGATE back-end I/O performance matches the AGGREGATE front-end data rate (or at least allows absorption of the maximum write spike you expect). If the queues build up and Couchbase can’t drain them fast enough, Couchbase will eventually tell your application to “slow down” because it needs time to ingest the spike. As we’ll discuss in the sizing section, ensuring aggregate back-end disk I/O is available and sizing RAM to match the working set size are the two primary requirements for getting your cluster correctly configured. Likewise, monitoring will primarily focus on ensuring you’ve done that job correctly and don’t need to make adjustments.
  • Each one of these can determine the number of nodes: data sets, workload, etc.
  • Calculate for both active data and the number of replicas. Replicas will be the first to be dropped out of RAM if there is not enough memory.
  • Different applications, and even where the application is in its lifecycle, will lead to different required ratios between data in RAM and data only on disk (i.e. the working set to total set ratio will vary by application). We have three examples of very different working set to total dataset size ratios.
  • Each one of these can determine the number of nodes: data sets, workload, etc.
  • This is not unique to Couchbase; MySQL, for example, suffers as well.
  • Replication is needed only for writes/updates. Gets are not replicated.
  • The chart shows average latency (response times) across varying document sizes (1 KB – 16 KB). It demonstrates that Couchbase Server is extremely fast, with microsecond response times (latency is < 100 μsec on a 10 Gb Ethernet network for documents of all sizes). Network latency has an impact on a 1 Gb Ethernet network; however, latency is flat and consistent on a 10 Gb Ethernet network. Couchbase Server gives you consistent, predictable latency at any document size.
  • The more nodes you have, the less impact the failure of one node has on the remaining nodes in the cluster. 1 node is a single point of failure – obviously bad. 2 nodes give you replication, which is better, but if one node goes down the whole load goes to just one node and you’re back at a single point of failure. 3 nodes is the minimum recommendation, because a failure of one distributes the load over two. The more nodes the better, as recovering from a single node failure is easier with more nodes in the cluster (see the rough arithmetic below).
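
    A rough back-of-the-envelope sketch of that point: with evenly distributed load, the failed node's share spreads over the survivors, so the per-node increase shrinks as the cluster grows. The cluster sizes below are arbitrary examples.

        # Extra load each surviving node absorbs after a single node failure,
        # assuming load was evenly distributed beforehand.
        for nodes in (2, 3, 5, 10):
            extra = 1.0 / (nodes - 1)   # failed node's share, spread over the survivors
            print('%2d nodes: each survivor takes ~%.0f%% more load' % (nodes, extra * 100))
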
  • Each one of these can determine the number of nodes: data sets, workload, etc.
  • Calculate for both active data and the number of replicas. Replicas will be the first to be dropped out of RAM if there is not enough memory.
  • Add CPU
  • Each one of these can determine the number of nodes: data sets, workload, etc.
  • Stats / stats timing. Demo 3 – stats; check how we are on time. Load: -h localhost -i1000000 -M1000 -m9900 -t8 -K sharon -c500000 -l
  • An example of a weekly view of an application in production; you can clearly see the oscillation of the disk write queue load. It was about a 13-node cluster at the time (it has grown since then), with ops/sec varying from 1K at the low point to 65K at peak, running on EC2. We can easily see the traffic patterns in the disk write queue, and regardless of the load, the application sees the same deterministic latency.
  • Calculate for both active data and the number of replicas. Replicas will be the first to be dropped out of RAM if there is not enough memory.
  • So the monitoring goal is to help assess the cluster’s capacity usage, which drives the decision of when to grow.
  • Talk about the Amazon “disaster” in December. Amazon told almost all our customers that almost all of their nodes would be restarted. We advised them to proactively rebalance in a whole cluster of new nodes and rebalance out the old ones, preventing any disruption when the restarts actually happened.
  • Do not failover a healthy node!
  • Worthwhile to say that during warmup, data is not available from the node – unlike a traditional RDBMS. This can be handled at the application level with “move on”, “retry”, “log” or “blow up”; some data is unavailable, not all. (A small handling sketch follows below.)
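
    A hedged sketch of the "retry / move on / log / blow up" options mentioned above, using the exception classes of a 2.x-style Couchbase Python SDK; the exception names, retry counts and backoff times are assumptions for illustration only.

        # Application-level handling when a node is warming up, failing over, or
        # pushing back ("slow down"). Timings and retry counts are arbitrary examples.
        import time
        from couchbase.bucket import Bucket
        from couchbase.exceptions import NotFoundError, TemporaryFailError, TimeoutError

        bucket = Bucket('couchbase://node1.example.com/default')

        def get_with_retry(key, attempts=3, backoff=0.1):
            for attempt in range(attempts):
                try:
                    return bucket.get(key).value
                except NotFoundError:
                    return None                              # "move on": key does not exist
                except (TemporaryFailError, TimeoutError):
                    time.sleep(backoff * (attempt + 1))      # "retry": back off and try again
            # "log" / "blow up": give up after the last attempt
            raise RuntimeError('key %r unavailable after %d attempts' % (key, attempts))
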
  • Do not failover a healthy node!
  • Do not failover a healthy node!
  • And it must continue those things across the application lifecycle. Not only when the application is in “steady state” but when adding and removing capacity. When a node fails. When nodes are in the process of maintenance. Sizing the cluster properly is not just about ensuring things work when everything is “steady” but also about ensuring that things work when things aren’t “steady.”
  • Do not failover a healthy node!

Transcript

  • 1. Couchbase Server 2.0 in Production Perry Krug Sr. Solutions Architect 1
  • 2. Typical Couchbase production environment (diagram: application users → load balancer → application servers → Couchbase Servers) 2
  • 3. We’ll focus on App-Couchbase interaction … (same deployment diagram) 3
  • 4. … at each step of the application lifecycle Dev/Test Size Deploy Monitor Manage 4
  • 5. KEY CONCEPTS 5
  • 6. Couchbase Single Node Architecture (diagram) – Data Manager: data access ports 11210/11211, query API on 8092, query engine, object-managed cache, storage engine. Cluster Manager: port 8091 REST management API / Web UI (admin console over http), Erlang/OTP, replication, rebalance, shard state manager. 6
  • 7. Couchbase Single Node (write path diagram): Doc 1 arrives from the App Server at the Couchbase Server node’s managed cache, goes onto the replication queue (to other nodes) and the disk queue (to disk), and feeds the view engine and the XDCR queue (to other clusters). 11
  • 8. Couchbase deployment (diagram): web application tiers with the Couchbase client library; data flow from the clients to the Couchbase Server nodes; replication flow and cluster management between the nodes. 12
  • 9. Couchbase in a Cluster (diagram): app servers 1 and 2 each hold the Couchbase client library and a cluster map; read/write/update and query requests go directly to Servers 1–3, where active documents (Doc 1–9) and their replicas are distributed so no replica sits on the same node as its active copy. User-configured replica count = 1. 13
  • 10. NODE AND CLUSTER SIZING – Dev-Test Size Deploy Monitor Manage 15
  • 11. Size – Couchbase Server sizing == performance • Serve reads out of RAM • Enough I/O for writes and disk operations • Mitigate inevitable failures (diagrams: reading data – “Give me document A” / “Here is document A”; writing data – “Please store document A” / “OK, I stored document A”) 16
  • 12. Scaling out permits matching of aggregate flow rates so queues do not grow (diagram: application servers and Couchbase servers connected over the network) 17
  • 13. How many nodes? 5 key factors determine the number of nodes needed: 1) RAM 2) Disk 3) CPU 4) Network 5) Data Distribution/Safety (diagram: application user → web application server → Couchbase Servers) 18
  • 14. RAM sizing – keep the working set in RAM for best read performance. 1) Total RAM: • Managed document cache: working set, metadata, active + replicas • Index caching (I/O buffer) (diagram: reading data – “Give me document A” / “Here is document A”; a rough worked example follows below) 19
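
    A rough, hedged worked example of the RAM math on this slide; the item counts, sizes, per-item metadata overhead, working-set ratio and headroom below are assumed numbers for illustration, not figures from the presentation.

        # Back-of-the-envelope RAM sizing sketch. Every number is an assumption;
        # substitute your own measurements.
        items           = 10000000    # total documents (active copies)
        avg_value_bytes = 2048        # average document size
        avg_key_bytes   = 32          # average key length
        meta_per_item   = 56          # assumed per-item metadata overhead in bytes
        replicas        = 1           # user-configured replica count
        working_set     = 0.33        # fraction of values that should stay cached
        headroom        = 0.30        # spare RAM for indexes, OS cache and growth

        copies = 1 + replicas
        # Metadata and keys for active + replica items stay resident in RAM.
        metadata_ram = items * copies * (meta_per_item + avg_key_bytes)
        # Only the working set of the values needs to be cached for fast reads
        # (replicas are the first to be ejected if RAM runs short).
        value_ram = items * copies * avg_value_bytes * working_set

        total_gb = (metadata_ram + value_ram) * (1 + headroom) / (1024.0 ** 3)
        print('Approximate cluster RAM needed: %.1f GB' % total_gb)
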
  • 15. Working set ratio depends on your application: • working/total set = .01 – late-stage social game: many users no longer active; few logged in at any given time. • working/total set = .33 – business application: users logged in during the day; the day moves around the globe. • working/total set = 1 – ad network: any cookie can show up at any time. 20
  • 16. RAM sizing – working set managed cache. As memory usage grows, some cached data will be removed from RAM to make space: • Active and replica data share RAM • Threshold based (NRU, favoring active data) • Only cleanly persisted data can be “ejected” • Only data values can be “ejected”, which means RAM can fill up with metadata 21
  • 17. RAM sizing – view/index cache (disk I/O) • File system cache availability for the index has a big impact on performance • Test runs based on 10 million items with a 16 GB bucket quota and 4 GB vs. 8 GB of system RAM available for indexes • Performance results show that by doubling system cache availability, query latency drops by half and throughput increases by 50% • Leave RAM free with quotas 22
  • 18. Disk sizing: space and I/O. 2) Disk – I/O: • Sustained write rate • Rebalance capacity • Backups • XDCR • Compaction; Space: • Total dataset (active + replicas + indexes) • Append-only format (diagram: writing data – “Please store document A” / “OK, I stored document A”) 23
  • 19. Disk sizing: I/O. Impacting disk I/O needed: • Peak write load • Sustained write load • Compaction • XDCR • Views/indexing. Configurable paths/partitions for data and indexes allow for separation of space and I/O 24
  • 20. Disk sizing: space. Impacting the amount of disk space needed: • Total data set • Indexes • Overhead for compaction (~3x): both data and indexes are “append-only”. Configurable paths/partitions for data and indexes allow for separation of space and I/O (a rough space estimate follows below) 25
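
    In the same hedged style, a tiny sketch of the disk-space math for the append-only format; the ~3x multiplier follows the slide, while the dataset and index sizes are made-up examples.

        # Disk space sketch: data and index files need room to grow between compactions.
        data_set_gb   = 200.0   # active + replica document data on disk (example)
        index_gb      = 40.0    # total index/view size on disk (example)
        append_factor = 3.0     # ~3x overhead suggested for the append-only format

        required_gb = (data_set_gb + index_gb) * append_factor
        print('Provision roughly %.0f GB of disk space' % required_gb)
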
  • 21. Disk sizing: Impact of Views on IO and Space• Number of Design Documents • Extra space for each DD • Extra IO to process for each DD • Segregate views by DD• Complexity of Views (IO)• Amount of view output (space) • Emit as little as possible • Doc ID automatically included• Use Development views and extrapolate 26
  • 22. Disk sizing: append only • The append-only file format puts all new/updated/deleted items at the end of the on-disk file – better performance and reliability, no more fragmentation! • This can lead to invalidated data in the “back” of the file • Need to compact data 27
  • 23. Disk compaction. Initial file layout: Doc A, Doc B, Doc C. Update some data: Doc A, Doc B, Doc C, Doc A’, Doc B’, Doc D, Doc A’’. After compaction: Doc C, Doc B’, Doc D, Doc A’’. 28
  • 24. Disk compaction• Compaction happens automatically: – Settings for “threshold” of stale data – Settings for time of day – Split by data and index files – Per-bucket or global• Reduces size of on-disk files – data files AND index files• Temporarily increased disk I/O and CPU, but no downtime! 29
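
    To make the append-only and compaction slides above concrete, here is a toy model rather than Couchbase's actual file format: updates are appended to the end of a log, and compaction rewrites it keeping only the latest version of each key.

        # Toy model of an append-only file and its compaction, mirroring the
        # Doc A / Doc B / Doc C example above. Not the real storage format.
        log = []

        def write(key, value):
            log.append((key, value))        # new/updated items always go to the end

        def compact(entries):
            latest = {}
            for key, value in entries:      # later entries win
                latest[key] = value
            return list(latest.items())

        for key, value in [('A', 1), ('B', 1), ('C', 1),    # initial layout
                           ('A', 2), ('B', 2), ('D', 1),    # updates invalidate old copies
                           ('A', 3)]:
            write(key, value)

        print('before compaction:', log)
        print('after compaction: ', compact(log))
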
  • 25. CPU sizing. 3) CPU: • Disk writing • Views/compaction/XDCR • RAM r/w performance not impacted. 1.8 used VERY little CPU; under the same workloads, 2.0 should not be much different. New 2.0 features will require more CPU 30
  • 26. Network sizing. 4) Network: • Client traffic (reads + writes) • Replication (writes) • Rebalancing • XDCR – replication (multiplies writes) and rebalancing 31
  • 27. Consistent low latency with varying doc sizes Consistently low latencies in microseconds for varying documents sizes with a mixed workload 32
  • 28. Data Distribution. Servers fail – be prepared. The more nodes, the less impact a failure will have. 5) Data Distribution / Safety (assuming one replica): • 1 node = BAD • 2 nodes = …better… • 3+ nodes = BEST! Note: many applications will need more than 3 nodes 33
  • 29. How many nodes? (recap) New 2.0 features will affect sizing requirements: • Views/Indexing/Querying • XDCR • Append-only file format. 5 key factors still determine the number of nodes needed: 1) RAM 2) Disk 3) CPU 4) Network 5) Data Distribution (diagram: application user → web application server → Couchbase Servers) 34
  • 30. MONITORING – Dev-Test Size Deploy Monitor Manage 35
  • 31. Key resources: RAM, Disk, Network, CPU (diagram: application servers connected over the network to Couchbase servers, each with RAM and disk) 36
  • 32. Monitoring Once in production, heart of operations is monitoring • RAM Usage • Disk space and I/O: • write queues / read activity / indexing • Network bandwidth, replication queues • CPU Usage • Data distribution (balance, replicas) 37
  • 33. Monitoring – IMMENSE amount of information available • Real-time traffic graphs • REST API accessible • Per-bucket, per-node and aggregate statistics • Application and inter-node traffic • RAM <-> Disk • Inter-system timing 38
  • 34. Key Stats to Monitor• Working set doesn’t fit in RAM – Cache miss rate / disk fetches• Disk I/O not keeping up – Disk Write queue size• Internal replication lag – TAP queues• Indexing not keeping up• XDCR lag 39
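
    A hedged sketch of pulling these numbers over the REST interface mentioned on slide 33: it reads a bucket's statistics from the cluster manager on port 8091 using Python's standard library. The hostname, credentials, bucket name and the exact stat keys printed are assumptions; check the payload your cluster actually returns.

        # Sketch: fetch per-bucket statistics from the REST API on port 8091.
        # Host, credentials, bucket name and stat keys are placeholders.
        import base64
        import json
        import urllib.request

        HOST, USER, PASSWORD, BUCKET = 'node1.example.com', 'Administrator', 'password', 'default'

        url = 'http://%s:8091/pools/default/buckets/%s/stats' % (HOST, BUCKET)
        request = urllib.request.Request(url)
        token = base64.b64encode(('%s:%s' % (USER, PASSWORD)).encode()).decode()
        request.add_header('Authorization', 'Basic ' + token)

        with urllib.request.urlopen(request) as response:
            samples = json.load(response)['op']['samples']

        # Print the latest sample of a few stats worth watching; names may vary by
        # server version, so fall back gracefully if a key is absent.
        for name in ('disk_write_queue', 'ep_cache_miss_rate', 'ops'):
            values = samples.get(name)
            print(name, '=', values[-1] if values else 'n/a')
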
  • 35. (no slide text) 40
  • 36. MANAGEMENT AND MAINTENANCE – Dev-Test Size Deploy Monitor Manage 41
  • 37. Management/Maintenance• Scaling• Upgrading/Scheduled maintenance• Backup/Restore• Dealing with Failures 42
  • 38. Scaling. Couchbase scales out linearly: Need more RAM? Add nodes… Need more disk I/O or space? Add nodes… Couchbase also makes it easy to scale up by swapping larger nodes in for smaller ones without any disruption 43
  • 39. Couchbase + Cisco + Solarflare: high throughput with 1.4 GB/sec data transfer rate using 4 servers (chart: operations per second vs. number of servers in cluster – linear throughput scalability) 44
  • 40. Additional benchmark details • Cluster of 8 nodes running Couchbase Server 1.8.0 • One server used as the client to run the workload • Workload used for the test was Couchbase’s streaming load generator • GET and SET operations were performed in a 70:30 ratio. Test system and parameters: • Couchbase Server 1.8.0 • Cisco Nexus 5548UP Switch • Solarflare SFN5122F 10 Gigabit Ethernet Enhanced Small Form-Factor Pluggable (SFP+) server adapters • Solarflare OpenOnload • Servers: nine Cisco UCS C200 M2 High-Density Rack Servers with Intel Xeon processor X5670 six-core 2.93-GHz CPU, running Red Hat Enterprise Linux (RHEL) 5.5 x86 64-bit, with 100-GB RAM and four 2-TB hard drives 45
  • 41. Upgrade 1. Add nodes of new version, rebalance… 2. Remove nodes of old version, rebalance… 3. Done! No disruption General use for software upgrade, hardware refresh, planned maintenance Upgrade existing Couchbase Server 1.8 to Couchbase Server 2.0! 46
  • 42. Easy to Maintain Couchbase• Use remove+rebalance on “malfunctioning” node: – Protects data distribution and “safety” – Replicas recreated – Best to “swap” with new node to maintain capacity and move minimal amount of data 47
  • 43. Backup – cbbackup (diagram: data pulled from each server over the network into data files) 48
  • 44. Restore – 2) “cbrestore” used to restore data into a live/different cluster (diagram: cbrestore reads the data files) 49
  • 45. Failures Happen! Hardware Network Bugs 50
  • 46. Easy to Manage failures with Couchbase• Failover (automatic or manual): – Replica data and indexes promoted for immediate access – Replicas not recreated – Do NOT failover healthy node – Perform rebalance after returning cluster to full or greater capacity 51
  • 47. Fail Over (diagram): app servers 1 and 2 with the Couchbase client library and cluster maps; Servers 1–5 each hold active and replica docs. When a server fails, its replica documents on the remaining servers (e.g. Doc 3, Doc 7, Doc 9) are promoted to active and the cluster maps are updated. 52
  • 48. Conclusion Dev/Test Size Deploy Monitor Manage 53
  • 49. Want more? Lots of details and best practices in our documentation: http://www.couchbase.com/docs/ 54
  • 50. QUESTIONS? PERRY@COUCHBASE.COM @PERRYKRUG 55