How-to NoSQL webinars - Couchbase 101


Published on

Review the basics of the Couchbase software. We will take you through topics including installation of Couchbase on various platforms, setup and configuration parameters, monitoring, scaling, and the Admin console.

Published in: Technology
  • First, let's see how a write operation on a single document is handled.

    (click) 1. A set request comes in from the application.
    (click) 2. Couchbase Server responds back that the key is written.
    (click) 3. Couchbase Server then replicates the data in memory to one or more other nodes.
    (click) 4. At the same time, it puts the data into a write queue to be persisted to disk.

    Note that our primary form of high availability is getting the data off the node as quickly as possible. This is done from RAM to RAM and happens extremely quickly. The disk write process is always going to be a bit slower. We do everything asynchronously for the best performance, but also have a separate operation that the client can perform to wait for an item to be replicated and/or persisted to disk. It’s a separate operation on a key-by-key basis so the application developer can make the trade-off between performance and resiliency.
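The asynchronous write path described above can be sketched as a toy model. All names here are hypothetical and this is not the real server's code; note too that the real SDKs expose a separate durability operation to wait for replication and/or persistence, which this sketch omits:

```python
from collections import deque

class ToyNode:
    """Toy model of the write path: ack from RAM, then replicate
    and persist asynchronously. Illustrative only."""
    def __init__(self):
        self.cache = {}                   # managed RAM cache
        self.replication_queue = deque()  # RAM-to-RAM replication to peers
        self.disk_queue = deque()         # async persistence queue
        self.disk = {}

    def set(self, key, value):
        # 1. the write lands in RAM; 2. the client is acknowledged immediately
        self.cache[key] = value
        # 3. replication and 4. persistence are queued to happen asynchronously
        self.replication_queue.append((key, value))
        self.disk_queue.append((key, value))
        return "OK"                       # acked before replication/persistence

    def drain_disk_queue(self):
        # a background writer persists queued mutations
        while self.disk_queue:
            key, value = self.disk_queue.popleft()
            self.disk[key] = value

node = ToyNode()
node.set("u::tesla", {"type": "car"})     # acknowledged from RAM
node.drain_disk_queue()                   # later persisted to disk
```

The design choice the slide emphasizes falls out of the model: the acknowledgment depends only on the RAM write, so latency stays low, while the two queues carry the slower replication and disk work in the background.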
  • Now let’s look at what a read operation looks like.
    (click) 1.  A get request comes in from the application.
    (click) 2. Assuming the data is present in the system and available in cache, it is returned right away without any interaction with other nodes or the disk. If the document does not exist on the node at all, a "not found" message is returned to the application immediately.
  • Now, as you fill up memory (click), some data that has already been written to disk will be ejected from RAM to make room for new data. (click)

    Couchbase supports holding much more data than you have RAM available. It’s important to size the RAM capacity appropriately for your working set: the portion of data your application is working with at any given point in time and needs very low latency, high throughput access to. In some applications this is the entire data set, in others it is much smaller. As RAM fills up, we use a “not recently used” algorithm to determine the best data to be ejected from cache.
  • Should a read now come in for one of those documents that has been ejected (click), it is copied back from disk into RAM and sent back to the application. The document then remains in RAM as long as there is space and it is being accessed.
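The ejection and cache-miss behavior above can be illustrated with a small cache sketch. This is a simplification of the server's "not recently used" algorithm (here, a plain least-recently-used eviction), and all names are hypothetical:

```python
from collections import OrderedDict

def read_with_cache(key, cache, disk, capacity=3):
    """Toy cache-miss path: ejected documents are re-read from disk
    into RAM. Illustrative only, not the real server logic."""
    if key in cache:
        cache.move_to_end(key)        # recently used documents stay resident
        return cache[key]
    if key not in disk:
        return None                   # immediate "not found"
    value = disk[key]
    cache[key] = value                # copy back from disk into RAM
    if len(cache) > capacity:
        cache.popitem(last=False)     # eject the least recently used doc
    return value

disk = {f"doc{i}": i for i in range(6)}   # everything is persisted
cache = OrderedDict()
for k in disk:                            # reads fill RAM; old docs are ejected
    read_with_cache(k, cache, disk)
print(list(cache))                        # ['doc3', 'doc4', 'doc5']
```

Note that ejection never loses data: every document remains on disk, and a later read simply pays one disk fetch to bring it back into RAM.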

    Finally, let’s look at what happens when a node fails.

    Imagine the application is reading and writing to server #3. (click) In reality, it is sending requests to all the servers, but let’s just focus on number 3.

    If that node goes down, some requests are bound to fail. Some will already have been sent on the wire, and others may be sent before the failure is detected. It's important for your application to be prepared for some requests to fail, whether the problem lies with Couchbase or not.

    Once the failure is detected, the node can be failed over, either automatically by the cluster or manually by the administrator pressing a button or a script triggering our REST API. Once this happens (click), the replica data elsewhere in the cluster is made active, (click) the client libraries are updated and (click) subsequent accesses are immediately directed at the other nodes. Notice that server 3 doesn't fail over all of its data to just one other server, which would disproportionately increase the load on that node; instead, all of the other nodes in the cluster take on some of that data and traffic.

    Note also that the data on that node is not re-replicated. Doing so would put undue load on an already degraded cluster and could lead to further failures.

    The failed node can now be rebooted or replaced and rebalanced back into the cluster. Our best practice is to return the cluster to full capacity before rebalancing, which will automatically recreate any missing replicas. There is no concern about the node bringing potentially stale data back online: once failed over, a node is not allowed to return to the cluster without a rebalance.
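The failover step described above amounts to promoting replica partitions in the cluster map, spread across the surviving nodes. A minimal sketch, with a hypothetical map layout and node names (not the real cluster-manager code):

```python
def fail_over(cluster_map, failed):
    """Promote each replica of the failed node's partitions to active
    on whichever surviving node already holds it. Illustrative only."""
    new_map = {}
    for vb, placement in cluster_map.items():
        active, replica = placement["active"], placement["replica"]
        if active == failed:
            # the replica elsewhere in the cluster is made active;
            # load spreads over every node that held a replica
            new_map[vb] = {"active": replica, "replica": None}
        else:
            # replicas hosted on the failed node are simply gone until rebalance
            new_map[vb] = {"active": active,
                           "replica": None if replica == failed else replica}
    return new_map

cluster_map = {
    0: {"active": "s3", "replica": "s1"},
    1: {"active": "s3", "replica": "s2"},
    2: {"active": "s1", "replica": "s3"},
}
after = fail_over(cluster_map, "s3")
# s3's partitions are now served by s1 and s2, not piled onto a single node
```

The `replica: None` entries also show why the slide says missing replicas are only recreated by a later rebalance: failover itself just flips ownership in the map.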
  • How-to NoSQL webinars - Couchbase 101

    1. Justin Michaels, Solution Architect — Couchbase 101: Server Fundamentals
    2. Evolution of Couchbase: the founders were key contributors to memcached, which evolved into Membase, a distributed key-value cache. Original contributors to CouchDB rewrote the storage engine to create Couchbase Server. Couchbase Server is a cache and persistence engine in a single, easy-to-scale distributed database.
    3. Couchbase Server Core Principles. Easy Scalability: grow the cluster without application changes, without downtime, with a single click. Consistent High Performance: consistent sub-millisecond read and write response times with consistently high throughput. Always On 24x365: no downtime for software upgrades, hardware maintenance, etc. Flexible Data Model: JSON Anywhere document model with no fixed schema.
    5. Install on Amazon via AMI
    6. Download Couchbase
    7. 1. Run what you downloaded. 2. It installs to /opt/couchbase. 3. Access <ip>:8091 via http. (CLI and scripted deployments are also available.) Let's see a deployment.
    9. Couchbase Node (architecture diagram). Cluster Manager — server/cluster management and communication (Erlang/OTP): heartbeat, process monitor, global singleton supervisor, configuration manager, REST management API/Web UI (HTTP 8091), Erlang port mapper (4369), distributed Erlang (21100-21199). Data Manager — RAM cache, indexing and persistence management (C): Couchbase EP Engine (11210, memcapable 2.0), Moxi (11211, memcapable 1.0), memcached, persistence layer with storage interface, query engine/query API (8092).
    10. Couchbase Node (architecture diagram, continued). Data Manager: object-level cache (memcached), disk persistence (couchstore), persistence layer and storage interface, Couchbase EP Engine (11210, memcapable 2.0), Moxi (11211, memcapable 1.0), query engine/query API (8092). Cluster Manager (Erlang/OTP): heartbeat, process monitor, configuration manager, REST management API/Web UI (HTTP 8091), Erlang port mapper (4369), distributed Erlang (21100-21199).
    11. Couchbase Data. • Key: any string up to 250 bytes; similar to a primary key; must be unique within a bucket; ID-based document lookup is extremely fast. • Value: up to 20 MB. Simple datatypes: strings, numbers, datetime, boolean, binary. Complex datatypes: dictionaries/hashes and arrays/lists, which can be stored in JSON format. JSON is automatically checked by the server; anything else is stored as a BLOB (array of bytes). • Cluster-aware clients (smart client SDKs, e.g. Python, Ruby).
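The key and value limits on this slide can be mirrored as a client-side sanity check. This is a hypothetical helper for illustration only; the server enforces its own limits:

```python
import json

MAX_KEY_BYTES = 250                   # keys: strings up to 250 bytes
MAX_VALUE_BYTES = 20 * 1024 * 1024    # values: up to 20 MB

def validate(key, value):
    """Reject keys over 250 bytes and values over 20 MB before sending.
    JSON-serializable values are measured as their encoded bytes; raw
    bytes are treated as a BLOB."""
    if len(key.encode("utf-8")) > MAX_KEY_BYTES:
        raise ValueError("key exceeds 250 bytes")
    body = value if isinstance(value, bytes) else json.dumps(value).encode("utf-8")
    if len(body) > MAX_VALUE_BYTES:
        raise ValueError("value exceeds 20 MB")
    return True

validate("u::tesla", {"type": "car", "model": "s"})   # passes
```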
    12. Couchbase Data: the server stores metadata with each key/value pair (document). meta { "id": "u::tesla", "rev": "1-0002bce0000000000", "flags": 0, "expiration": 0, "type": "json" } — the id is unique, and metadata is kept in RAM. document { "sellerid": 123456, "type": "car", "style": "sedan", "year": 2013, "trim": "performance", "model": "s" } — the document value; the most recent version is in RAM and persisted to disk.
    14. Write ('set') operation (diagram): app server → Couchbase Server node; the document lands in the managed cache, then flows into the replication queue (to other nodes) and the disk queue.
    15. Read ('get') operation (diagram): GET Doc 1 from the app server is answered directly from the node's managed cache.
    16. Cache ejection (diagram): as new documents (Doc 2-6) fill the managed cache, already-persisted documents (Doc 1) are ejected from RAM; all documents remain on disk.
    17. Cache miss (diagram): GET Doc 1 finds the document missing from the managed cache; it is read from disk, placed back in the cache, and returned.
    18. Cluster mechanics — failover (user-configured replica count = 1; diagram shows active and replica documents spread across Servers 1-4):
        • App servers access documents via the client library's cluster map; only a small subset of the documents in 1024 partitions is shown.
        • Requests to Server 3 begin to fail.
        • The cluster detects the failure and initiates a failover if auto-failover is enabled: replicas of Server 3's objects are promoted to active in memory, the replica vBuckets on disk are also promoted to active, the cluster map is updated, and the smart clients are immediately aware.
        • Requests for those documents now go to the appropriate servers.
        • Typically a rebalance would follow, but it is not required; rebalance is an online operation.
    19. Client interaction with Couchbase (vBucket diagram)
    20. Client interaction with Couchbase: CRC32("tesla") => partition [0..1023] {962}; ClusterMap[P(962)] => [x.x.x.x], the IP of the server responsible for partition (vBucket) 962.
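The routing on this slide can be sketched as follows. This is a simplification with a hypothetical cluster map: real smart clients derive the partition from a specific slice of the CRC32 and use the map published by the server, so the partition computed here will not necessarily match the {962} shown in the slide.

```python
import zlib

NUM_VBUCKETS = 1024
servers = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]     # hypothetical node IPs
# hypothetical cluster map: every partition is owned by exactly one server
cluster_map = {vb: servers[vb % len(servers)] for vb in range(NUM_VBUCKETS)}

def route(key):
    """Hash the key to a partition, then look up the owning server.
    (Taking the full CRC32 modulo 1024 is a simplification.)"""
    partition = zlib.crc32(key.encode("utf-8")) % NUM_VBUCKETS
    return partition, cluster_map[partition]

partition, server = route("tesla")
```

The point of the scheme is that routing is purely client-side arithmetic plus one map lookup: no coordinator node sits in the request path, and the same key always hashes to the same partition.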
    21. Add capacity: an online operation (vBucket diagram). CRC32("tesla") => partition [0..1023] {962}; ClusterMap[P(962)] => [x.x.x.x], the IP of the server responsible for partition (vBucket) 962.
    22. Add capacity: two more nodes (vBucket diagram)
    23. RAM, CPU and disk IO guidelines. RAM: all metadata for all documents (64 bytes + key length each) plus document values (NRU-ejected if RAM quota used > 90%); also leave RAM for the OS filesystem cache, views, and cross-data-center replication (XDCR). CPU: document indexing, monitoring, XDCR; recommended minimum 4 cores, plus 1 core per design document and 1 core per bucket replicated across data centers (XDCR). Disk IO: persisted documents and all indexes for design documents/views; append-only disk format with compaction; for performance on Amazon, multiple EBS volumes or high-IOPS RAID 0.
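The metadata figure above (64 bytes + key length per document, all resident in RAM) lends itself to a quick back-of-envelope sizing calculation; the document count and key length below are made-up inputs:

```python
def metadata_ram_bytes(num_docs, avg_key_len):
    """RAM needed just for document metadata, per the sizing slide:
    roughly 64 bytes of overhead plus the key length, for every document
    (metadata is always resident in RAM)."""
    return num_docs * (64 + avg_key_len)

# e.g. 100 million documents with 20-byte keys
needed = metadata_ram_bytes(100_000_000, 20)
print(f"{needed / 1024**3:.1f} GiB just for metadata")  # roughly 7.8 GiB
```

Because this metadata cannot be ejected, it sets a hard floor on per-node RAM before any document values, OS filesystem cache, or view/XDCR headroom are counted.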
    24. MONITORING
    25. Couchbase Monitoring
    26. DEMO: let's grow an online cluster and monitor progress