Couchbase UK 2013: Couchbase in Production, Part I

  • In this session, we're shifting gears from development to production. I'm going to talk about how to operate Couchbase in production: how to care for and feed the system to maintain application uptime and performance. I will try to demo as much as time permits, as this is very much about practice. This presentation discusses the new features and production impact of 2.0; most of it applies equally to 1.8, and I will call out the specific differences as we come to them.
  • The typical Couchbase production environment: many users of a web application, served by a load-balanced tier of web/application servers, backed by a cluster of Couchbase Servers. Couchbase provides the real-time/transactional data store for the application data.
  • Ultimately, what matters is the interaction between the application and the database. The database must allow data to be randomly read by the application with low latency and high throughput. It must also accept writes from the application, replicate the data for safety and durably store the data as quickly as possible.
  • Before getting into the detailed recommendations and considerations for operating Couchbase across the application lifecycle, we’ll cover a few key concepts and describe the “high level” considerations for successfully operating Couchbase in production.
  • As I mentioned, each Couchbase node is exactly the same. Every node is broken down into two components: a data manager (on the left) and a cluster manager (on the right). It's important to realize that these are separate processes, specifically designed so that a node can continue serving its data even in the face of cluster problems like network disruption. The data manager is written in C and C++ and is responsible for the object caching layer, the persistence layer and the query engine. It is based on memcached and so provides a number of benefits: the very low lock contention of memcached allows extremely high throughput and low latency, whether against a small set of documents (or just one) or across millions of documents; compatibility with the memcached protocol means we are not only a drop-in replacement but also inherit support for automatic item expiration (TTL) and atomic increments (the write sketch after these notes shows both in use); we've increased the maximum object size to 20MB, but still recommend keeping objects much smaller; and we support both binary objects and native JSON documents. All of the metadata for the documents and their keys is kept in RAM at all times. While this adds a bit of overhead per item, it also allows for extremely fast "miss" responses, which are critical to some applications: we don't have to scan disk to know that we don't have some piece of data. The cluster manager is based on Erlang/OTP, which was developed by Ericsson to manage hundreds or even thousands of distributed telco switches. This component is responsible for configuration, administration, process monitoring, statistics gathering, and the UI and REST interface. Note that no data manipulation is done through this interface.
  • This slide has a click-by-click animation. 1. (click) A set request comes in from the application. 2. Couchbase Server responds that the key is written. 3. (click) Couchbase Server then replicates the data to memory on the other nodes; at the same time the data is put into a write queue to be persisted to disk. 4. (click) Once it is on disk, the item is processed by the view engine and sent out over any configured XDCR link to one or more other clusters. (The write sketch after these notes shows how an application can optionally wait for replication and persistence.)
  • When an application server or process starts up, it instantiates a Couchbase client object. This object takes a bit of configuration (language dependent), which includes one or more URLs into the Couchbase Server cluster. The client object makes a connection on port 8091 to one of the URLs in its list and receives the topology of the cluster (called a vbucket map); technically, a client connects to one bucket within the cluster (see the connection sketch after these notes). Using this map, the client library then sends data requests directly to the individual Couchbase Server nodes. In this way, every application server does the load balancing for us without the need for any routing or proxy process. Let's first look at the operations within a single node. Keep in mind that each node is completely independent of the others when it comes to taking in and serving data: every operation (with the exception of queries) is between a single application server and a single Couchbase node. All operations are atomic, and there is no blocking or locking done by the database itself. Application requests are responded to as quickly as possible, which should mean sub-millisecond latency depending on your network, unless a read is coming from disk; and any failure (except timeouts) is designed to be reported as quickly as possible: "fail fast".
  • Now let's look at what happens when it comes time to add servers to the cluster. Starting with the same set of three nodes, we bring two more online (click). Note that you can add or remove multiple nodes at once before actually migrating any data; this helps greatly when you need to add or swap many nodes, since you don't have to move the data around multiple times. Once the administrator is ready, pressing the rebalance button (click) moves some of the active data and some of the replica data to the new nodes. Despite what the animation shows, this is actually done incrementally, one shard (or vbucket) at a time, which not only means that load is immediately and incrementally transferred to the new nodes, but also that the process can be stopped at any point and leave the cluster in a stable, albeit unbalanced, state. This whole process is done online while the application is accessing data. There is an atomic switchover for each shard as it is moved, and the application continues reading and writing data at the original location until that happens. Any writes are synchronized to the new location before switching over, and the data is also replicated (and optionally persisted) to ensure safety. This same process can be used for software upgrades, hardware refreshes, and removing or swapping out misbehaving nodes.
  • Understanding those same operations, let's look at how this functions across a cluster. With a Couchbase Server cluster of three nodes, you can see that the documents are evenly distributed throughout the cluster. (click) Additionally, the replica documents are also evenly distributed, so that no replica document is on the same node as its active copy. This shows one replica copy, but the same logic applies when there are two or three. After the application server comes online and receives the vbucket map, all requests (read/write/update/delete) for a given document are sent to the node that is active for it. In this way, Couchbase ensures immediate and strong consistency: an application always reads its own writes. At no point is the replica data read, which would introduce inconsistency. We will see later what happens when a node fails and the replica data needs to be activated. The data is distributed (or "sharded") based on a CRC32 hash of the key name, which creates a very even and random distribution of the data across all the nodes (see the hashing sketch after these notes). Other systems shard based on some user-generated value, which can lead to hot spots and imbalances within a cluster; we don't have those. By distributing the data evenly across the cluster and letting the clients load balance themselves, the load is also evenly distributed across all the nodes, making them "active-active". Other systems using "master-slave" configurations basically end up wasting processing power and hardware in the background. Although the diagram only shows a few "shards" of data, we actually use 1024 slices/shards/vbuckets. Technically this limits us to 1024 active nodes in a cluster, but it also has lots of benefits for smaller clusters: the data is sharded very granularly and can be moved and compacted as such. This allows the cluster to scale very evenly and linearly for more RAM, disk, network and CPU.
  • Calculate for both the active data and the number of replicas (see the RAM sizing sketch after these notes). Replicas will be the first to be dropped out of RAM if there is not enough memory.
  • The solution to scaling writes is to add more servers to the Couchbase cluster, ensuring that AGGREGATE back-end I/O performance matches the AGGREGATE front-end data rate (or at least allows the absorption of the maximum write spike you expect). If the queues build up too much and Couchbase can't drain them fast enough, Couchbase will eventually tell your application to "slow down" because it needs time to ingest the spike (see the back-off sketch after these notes). As we'll discuss in the sizing section, ensuring that aggregate back-end disk I/O is available and sizing RAM to match the working set size are the two primary requirements for getting your cluster correctly configured. Likewise, monitoring will primarily focus on ensuring you've done that job correctly and don't need to make adjustments.
  • Each one of these factors can determine the number of nodes: data sets, workload, etc.
  • Replication is needed only for writes/updates. Gets are not replicated.
  • The more nodes you have, the less impact the failure of one node has on the remaining nodes in the cluster. One node is a single point of failure, which is obviously bad. Two nodes give you replication, which is better, but if one node goes down the whole load goes to just one node and you're back at a single point of failure. Three nodes is the minimum recommendation, because the failure of one distributes the load over two. The more nodes the better, as recovering from a single node failure is easier with more nodes in the cluster.
  • Add CPU
  • An example of a weekly view of an application in production; you can clearly see the oscillation in the disk write queue load. This was roughly a 13-node cluster at the time (it has grown since then), with ops/sec varying from 1K at the low point to 65K at peak, running on EC2. We can easily see the traffic patterns in the disk write queue, and regardless of the load, the application sees the same deterministic latency.
  • Stats / stats timing. Demo 3: stats (check how we are doing on time). Load: -h localhost -i1000000 -M1000 -m9900 -t8 -K sharon -c500000 -l
  • So the goal of monitoring is to help assess the cluster's capacity usage, which drives the decision of when to grow (see the stats sketch after these notes).
  • Talk about the Amazon “disaster” in December. Amazon told almost all our customers that almost all of their nodes would be restarted. We advised them to proactively rebalance in a whole cluster of new nodes and rebalance out the old ones, preventing any disruption when the restarts actually happened.
  • Do not failover a healthy node!
  • Worth mentioning that during warmup, data is not available from the node, unlike a traditional RDBMS. This can be handled at the application level with "move on", "retry", "log" or "blow up" strategies; only some of the data is unavailable, not all of it.
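
Connection sketch (referenced from the client-bootstrap note above). A minimal example, assuming the 2.x-era Couchbase Python SDK; the host names and bucket name are placeholders. Passing more than one seed node lets the client fetch the cluster topology even if the first host in the list is down.

    from couchbase.bucket import Bucket

    # Bootstrap against any node on port 8091 and receive the vbucket map.
    # Host names and bucket name are placeholders for this sketch.
    cb = Bucket('couchbase://cb-node1,cb-node2,cb-node3/default')

    # From here on, the client library routes each key-based operation
    # directly to the node that is active for that key; no proxy or router
    # process is involved.
    print(cb.get('user::1001').value)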
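
Write sketch (referenced from the write-path and single-node notes above). Again a minimal sketch assuming the 2.x-era Python SDK: a plain upsert is acknowledged once the item is in the managed cache on the active node, while the durability arguments make the same call wait until the item has also been replicated and persisted. The TTL and counter calls illustrate the memcached-protocol features mentioned above.

    from couchbase.bucket import Bucket

    cb = Bucket('couchbase://localhost/default')   # placeholder connection

    # Acknowledged as soon as the item is in the managed cache.
    cb.upsert('user::1001', {'name': 'Perry', 'role': 'SA'})

    # Same write, but block until it has been replicated to one other node
    # and persisted to disk on one node (observe-based durability).
    cb.upsert('user::1001', {'name': 'Perry', 'role': 'SA'},
              replicate_to=1, persist_to=1)

    # Inherited from the memcached protocol: per-item expiration and
    # atomic counters.
    cb.upsert('session::abc', {'user': 1001}, ttl=3600)  # expires in 1 hour
    cb.counter('pageviews::home', delta=1, initial=0)    # atomic increment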
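
Hashing sketch (referenced from the sharding note above). A simplified illustration of how a key maps to one of the 1024 vbuckets; the real client libraries use a CRC32-based hash plus the cluster's vbucket map, and the exact bit manipulation may differ from the modulo shown here.

    import zlib

    NUM_VBUCKETS = 1024  # Couchbase uses 1024 vbuckets (shards) per bucket

    def vbucket_for_key(key: str) -> int:
        """Map a document key to a vbucket id (simplified)."""
        return zlib.crc32(key.encode('utf-8')) % NUM_VBUCKETS

    # Each vbucket is owned by exactly one active node; the client looks the
    # owner up in the vbucket map it received at bootstrap.
    for key in ('user::1001', 'user::1002', 'session::abc'):
        print(key, '-> vbucket', vbucket_for_key(key))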
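
RAM sizing sketch (referenced from the sizing notes above). The per-item metadata overhead, working-set percentage and headroom figures below are illustrative assumptions, not official sizing numbers; the point is simply that both active data and replicas must be counted.

    # Rough RAM sizing arithmetic; every constant here is an assumption.
    num_documents   = 50_000_000
    avg_key_bytes   = 30
    avg_value_bytes = 2_048
    metadata_bytes  = 60        # assumed per-item metadata kept in RAM
    num_replicas    = 1
    working_set_pct = 0.20      # fraction of values we want resident in RAM
    headroom        = 0.30      # spare RAM for rebalance, views, spikes

    copies = 1 + num_replicas   # active copy plus replicas

    metadata_ram    = num_documents * (avg_key_bytes + metadata_bytes) * copies
    working_set_ram = num_documents * avg_value_bytes * working_set_pct * copies
    total_ram       = (metadata_ram + working_set_ram) * (1 + headroom)

    print('Cluster RAM needed: %.1f GB' % (total_ram / 1024 ** 3))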
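
Back-off sketch (referenced from the write-scaling note above). When the disk write queue cannot drain fast enough, the server signals the application to slow down with a temporary failure; a simple exponential back-off handles it. The exception class name is from the 2.x-era Python SDK and may differ in other SDK versions.

    import time

    from couchbase.bucket import Bucket
    from couchbase.exceptions import TemporaryFailError  # assumed 2.x-era name

    def upsert_with_backoff(bucket, key, value, attempts=5):
        """Retry an upsert with exponential back-off while the node is overloaded."""
        delay = 0.05
        for _ in range(attempts):
            try:
                return bucket.upsert(key, value)
            except TemporaryFailError:
                time.sleep(delay)   # give the node time to drain its write queue
                delay *= 2
        raise RuntimeError('server still overloaded after %d attempts' % attempts)

    cb = Bucket('couchbase://localhost/default')   # placeholder connection
    upsert_with_backoff(cb, 'user::1001', {'name': 'Perry'})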
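
Stats sketch (referenced from the monitoring notes above). The same per-bucket statistics shown in the web UI are available over the REST interface on port 8091; credentials, bucket name and the specific stat keys below are placeholders and may differ between server versions.

    import requests

    # Per-bucket statistics from the cluster manager's REST API (port 8091).
    resp = requests.get(
        'http://localhost:8091/pools/default/buckets/default/stats',
        auth=('Administrator', 'password'))        # placeholder credentials
    resp.raise_for_status()

    samples = resp.json()['op']['samples']         # recent time-series samples

    # Stat names are illustrative; inspect the response for the exact keys
    # your server version exposes.
    for stat in ('ep_queue_size', 'ep_cache_miss_rate', 'curr_items'):
        if stat in samples:
            print(stat, '=', samples[stat][-1])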

    1. Couchbase Server 2.0 in Production, Part I. Perry Krug, Sr. Solutions Architect
    2. Agenda: Introduction • Couchbase Server Internal Operations • Basic Sizing • Intro to Monitoring • Management/Maintenance • Hardware Recommendations
    3. Typical Couchbase production environment (diagram: application users → load balancer → application servers → Couchbase Servers)
    4. We'll focus on the app-Couchbase interaction (same diagram: application users → load balancer → application servers → Couchbase Servers)
    5. Key concepts
    6. Couchbase Single Node Architecture (diagram). Data Manager: object-managed cache, storage engine and query engine; data access ports 11210/11211, query API on port 8092 (http). Cluster Manager (Erlang/OTP): replication, rebalance and shard state manager; REST management API / Web UI and admin console on port 8091.
    7. Single node: Couchbase Write Operation (diagram: the app server writes Doc 1 into the managed cache, which feeds the replication queue to other nodes and the disk write queue)
    8. Update Operation (diagram: Doc 1' replaces Doc 1 in the managed cache and flows through the replication and disk queues)
    9. Cache Eviction (diagram: once documents are persisted to disk, older values can be evicted from the managed cache to free memory)
    10. Read Operation (diagram: a GET for Doc 1 is served straight from the managed cache)
    11. Cache Miss (diagram: a GET for a document not in the managed cache is fetched from disk and placed back into the cache)
    12. View processing and XDCR (diagram: after a document is persisted to disk, it is processed by the view engine and queued on the XDCR link to another cluster)
    13. Couchbase deployment (diagram: web applications with the Couchbase client library talk to the Couchbase Server nodes over the data ports; cluster management and replication flow between the server nodes)
    14. Couchbase in a Cluster (diagram: user-configured replica count = 1; active and replica documents are evenly distributed across servers 1-3, with no replica on the same node as its active copy; app servers use the cluster map in the client library for reads/writes/updates and queries)
    15. Cluster wide: Add Nodes to Cluster. Two servers added (one-click operation) • docs automatically rebalanced across the cluster (even distribution of docs, minimum doc movement) • cluster map updated • app database calls now distributed over a larger number of servers. (Diagram: active and replica docs redistributed across servers 1-5, replica count = 1.)
    16. Node and cluster sizing
    17. Size Couchbase Server. Sizing == performance: serve reads out of RAM • enough I/O for writes and disk operations • mitigate inevitable failures. (Diagram: reading data, "Give me document A" / "Here is document A"; writing data, "Please store document A" / "OK, I stored document A".)
    18. Scaling out permits matching of aggregate flow rates so queues do not grow (diagram: application servers connected over the network to Couchbase Server nodes)
    19. How many nodes? 5 key factors determine the number of nodes needed: 1) RAM, 2) Disk, 3) CPU, 4) Network, 5) Data Distribution/Safety
    20. RAM sizing. 1) Total RAM: managed document cache (working set, metadata, active + replicas) and index caching (I/O buffer). Keep the working set in RAM for best read performance.
    21. Disk sizing: space and I/O. 2) Disk. I/O: sustained write rate, rebalance capacity, backups, XDCR, view processing, compaction. Space: total dataset (active + replicas + indexes), append-only format.
    22. CPU sizing. 3) CPU: disk writing, views/compaction/XDCR; RAM read/write performance is not impacted.
    23. Network sizing. 4) Network: client traffic (reads + writes), replication (multiplies writes), rebalancing, XDCR.
    24. Data Distribution. 5) Data distribution / safety (assuming one replica): 1 node = BAD, 2 nodes = better, 3+ nodes = BEST! Note: many applications will need more than 3 nodes. Servers fail, be prepared; the more nodes, the less impact a failure will have.
    25. How many nodes? (recap) New 2.0 features will affect sizing requirements: views/indexing/querying, XDCR, append-only file format. The 5 key factors still determine the number of nodes needed: 1) RAM, 2) Disk, 3) CPU, 4) Network, 5) Data Distribution.
    26. Monitoring
    27. Key resources: RAM, Disk, Network, CPU (diagram: each server's RAM, disk and network serving the application servers)
    28. Monitoring. Once in production, the heart of operations is monitoring: RAM usage • disk space and I/O (write queues / read activity / indexing) • network bandwidth, replication queues • CPU usage • data distribution (balance, replicas)
    29. Monitoring. An IMMENSE amount of information is available: real-time traffic graphs • accessible via REST API • per-bucket, per-node and aggregate statistics • application and inter-node traffic • RAM <-> disk • inter-system timing
    30. Key Stats to Monitor: the "Summary" section of the UI graphs • working set doesn't fit in RAM (cache miss rate / disk fetches) • disk I/O not keeping up (disk write queue size) • internal replication lag (TAP queues) • indexing not keeping up • XDCR lag
    31. Management and maintenance
    32. Management/Maintenance: scaling • upgrading/scheduled maintenance • backup/restore • dealing with failures • Amazon
    33. Scaling. Couchbase scales out linearly: need more RAM? Add nodes. Need more disk I/O or space? Add nodes. Couchbase also makes it easy to scale up by swapping in larger nodes for smaller ones without any disruption.
    34. Upgrade. 1. Add nodes of the new version, rebalance. 2. Remove nodes of the old version, rebalance. 3. Done! No disruption. The same procedure is used generally for software upgrades, hardware refreshes and planned maintenance. Upgrade existing Couchbase Server 1.8 to Couchbase Server 2.0!
    35. Easy to Maintain Couchbase. Use remove + rebalance on a "malfunctioning" node: protects data distribution and "safety"; replicas are recreated; best to "swap" with a new node to maintain capacity and move a minimal amount of data.
    36. Backup (diagram: "cbbackup" pulls data files from the servers over the network)
    37. Restore: "cbrestore" is used to restore data files into a live or different cluster
    38. Failures Happen! Hardware, network, bugs.
    39. Easy to Manage Failures with Couchbase. Failover (automatic or manual): replica data and indexes are promoted for immediate access; replicas are not recreated; do NOT fail over a healthy node; perform a rebalance after returning the cluster to full or greater capacity.
    40. Fail Over (diagram: when a server fails, its replica docs on the remaining servers are promoted to active and the cluster map is updated)
    41. Hardware. Designed for commodity hardware • scale out, not up • tested and deployed in EC2 • VMs are not best practice (unless in a private "cloud"): RAM use is inefficient and they are disadvantaged for clustering • "rule-of-thumb" minimums: 3 or more nodes, 4GB+ RAM, 4+ CPU cores, the "best" local storage available
    42. Amazon Considerations. Use a hostname instead of an IP: easier connectivity (when using the public hostname) and easier restoration • RAID-10 EBS for better I/O • XDCR: must use hostnames when crossing regions, utilize an Amazon-provided VPN for security • you will need more nodes in general
    43. Want more? Lots of details and best practices in our documentation: http://www.couchbase.com/docs/
    44. Thank you. Couchbase, NoSQL Document Database. perry@couchbase.com / @couchbase
