Couchbase 101 provides an overview of Couchbase including:
- Key concepts of Couchbase such as its use as a key-value store and document store using JSON documents.
- Single node and cluster-wide operations for reading, writing and updating documents.
- Cross data center replication (XDCR) to replicate data between geographically distributed clusters.
- Indexing and querying features including secondary indexes, views, and the new N1QL query language.
#4 KEY POINTS: BIG DATA IS NOT ONE THING – IT’S A COMBINATION OF OPERATIONAL (NOSQL) AND ANALYTICAL DATABASES. YOU NEED BOTH. COUCHBASE PROVIDES THE OPERATIONAL SOLUTION.
Big data has two major pieces: Operational and Analytical
Operational is about:
Real time
Online, interactive
Customer/consumer facing
Processing data at high velocity
Analytical is about:
Offline analytics
Often batch oriented
Takes time processing
Directly touches relatively few users (business analysts)
These two pieces together form “Big Data”
There’s some overlap
NoSQL can deliver some analytics
Hadoop can deliver some operational
But in general, each technology is designed for a separate purpose
Couchbase fits on the operational side, Hadoop on the analytics side
#5 KEY POINTS: COUCHBASE DELIVERS ALL THE CAPABILITIES NEEDED TO MEET TODAY’S REQUIREMENTS FOR PERFORMANCE, SCALABILITY, AVAILABILITY, AND DATA MODEL FLEXIBILITY. THESE TRANSLATE INTO MAJOR BENEFITS FOR YOUR BUSINESS.
Couchbase was purpose-built to solve today’s requirements for enterprise-class, mission-critical, web and mobile applications.
Specifically, Couchbase delivers the following capabilities:
Fast performance at scale -- submillisecond latency to enable highly responsive applications, for millions or even hundreds of millions of users.
Easy, affordable scalability – Couchbase is a distributed database that scales out on commodity hardware with push button simplicity. We make it very easy to add or remove capacity on demand with no system downtime. On premises, in the cloud, wherever you want.
High availability – Couchbase automatically replicates your data across your servers, clusters, and data centers, so it’s always available, 24x7. And Couchbase doesn’t require any downtime to maintain.
Flexible data model – Couchbase gives you complete flexibility to handle any kind of data, and to change your data model on the fly to accommodate new data attributes or new data types.
It’s the kind of flexibility that developers love, because it gets rid of the rigid schemas that slow them down. So developers can build applications faster and easier.
All this adds up to powerful benefits for your enterprise:
Faster development & time to market
Better business agility
Improved customer experience
Increased loyalty and revenue
Lower IT costs and increased efficiency
#6 KEY POINT: ENTERPRISES ARE USING COUCHBASE ACROSS A RANGE OF MISSION CRITICAL USE CASES.
As the slide shows, Couchbase supports a wide range of use cases, from Profile Management to Fraud Detection.
Each use case has its own set of requirements – some need very high performance, some need very high availability, some need flexibility of the data model.
The ability to meet all of these requirements is what has driven adoption of Couchbase.
#8 All information that you store in Couchbase Server is stored as documents with keys. Keys are unique identifiers for a document. Values are either JSON documents or, if you choose, binary data: a byte stream, simple data types such as integers and strings, or other forms of serialized objects.
Keys are also known as document IDs and serve the same function as a SQL primary key. A key in Couchbase Server can be any string, including strings with separators and identifiers, such as ‘person_93679.’ A key is unique.
When Couchbase Server is used as a store for JSON documents, the records can be indexed and queried. Couchbase Server provides a JavaScript-based query engine to find records based on field values.
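To make the key/value model concrete, here is a minimal Python sketch. It is not the Couchbase SDK: a plain dict stands in for a bucket, and the helper names (`upsert_json`, `upsert_binary`, `get_json`) are hypothetical. It shows a unique key mapping to either a JSON document or an opaque binary value, with a write to an existing key replacing its value.

```python
import json

# Hypothetical stand-in for a bucket: key (document ID) -> value bytes.
# A value may be an encoded JSON document or an opaque binary blob.
bucket: dict[str, bytes] = {}


def upsert_json(key: str, doc: dict) -> None:
    """Store a JSON document under a unique key, replacing any existing value."""
    bucket[key] = json.dumps(doc).encode("utf-8")


def upsert_binary(key: str, blob: bytes) -> None:
    """Store an opaque binary value; the store does not interpret it."""
    bucket[key] = blob


def get_json(key: str) -> dict:
    """Fetch a document by key and decode it; raises KeyError if the key is absent."""
    return json.loads(bucket[key].decode("utf-8"))


upsert_json("person_93679", {"name": "Ann", "city": "NYC"})
upsert_binary("thumbnail_93679", b"\x89PNG...")
```

Because keys are unique, calling `upsert_json("person_93679", ...)` again overwrites the document instead of creating a second one.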
#9 Key selection is very important, because keys are hard to change later. IDs are similar to the primary key defined when a table is created. Lookups are extremely fast because the client hashes the key to a vBucket and knows exactly which server the document belongs to.
An ID can appear only once per bucket. In Couchbase, documents are stored in buckets; a bucket is equivalent to a table or a collection.
Selecting your ID also depends on your document model.
Questions.
Options.
UUID….
Hand-crafted.
In some NoSQL database systems, data is sorted by ID. If you use prefixes for related objects, you can look up related objects faster. Selecting a clever ID can make your life a lot easier.
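A sketch of the two key-selection options (generated UUIDs vs. hand-crafted keys), plus the prefix trick for stores that keep keys sorted. The `::` separator and the `user`/`order` naming are illustrative assumptions, as are all function names. Note that Couchbase itself hashes keys to vBuckets rather than storing them in sorted order, so in Couchbase the main payoff of a hand-crafted key is that the application can compute it and fetch the document directly; `prefix_scan` only models the sorted-key systems mentioned above.

```python
import bisect
import uuid


def uuid_key(doc_type: str) -> str:
    """Generated key: globally unique, but meaningless and not computable later."""
    return f"{doc_type}::{uuid.uuid4()}"


def crafted_key(doc_type: str, natural_id: str) -> str:
    """Hand-crafted key from a natural identifier (separator is an assumption)."""
    return f"{doc_type}::{natural_id}"


def child_key(parent_key: str, child_type: str, seq: int) -> str:
    """Prefix related objects with the parent's key; zero-padding keeps numeric order."""
    return f"{parent_key}::{child_type}::{seq:06d}"


def prefix_scan(sorted_keys: list[str], prefix: str) -> list[str]:
    """In a store that keeps keys sorted, keys sharing a prefix are contiguous:
    binary-search to the first one, then scan until the prefix stops matching."""
    start = bisect.bisect_left(sorted_keys, prefix)
    found = []
    for key in sorted_keys[start:]:
        if not key.startswith(prefix):
            break
        found.append(key)
    return found


user = crafted_key("user", "ann@example.com")
keys = sorted([
    user,
    child_key(user, "order", 2),
    child_key(user, "order", 1),
    crafted_key("user", "bob@example.com"),
])
```

Here `prefix_scan(keys, user + "::order::")` returns Ann's two order keys in sequence order, without touching Bob's key.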
#13 KEY POINT: COUCHBASE PROVIDES A SET OF MULTI-PURPOSE, CORE CAPABILITIES THAT SUPPORT A BROAD RANGE OF APPLICATIONS AND USE CASES, ALL IN A SINGLE DATA MANAGEMENT PLATFORM.
Couchbase provides a set of technology capabilities to support a broad range of applications and use cases:
High Availability Cache: Couchbase provides an integrated managed object cache, so you can start out using Couchbase as a high availability cache on top of your existing relational database. For example, you can use Couchbase as a session store in front of your relational database, if your relational DB is struggling to keep up with the load required for online interactive applications.
Key-Value Store: Many customers start with Couchbase as a cache and then broaden their usage to other capabilities, like using Couchbase as a Key-Value Store for things like Profile Management.
Document Database: From there, you can grow into using Couchbase as a Document Database, where you can do more with capabilities like indexing and Cross Data Center Replication.
Embedded Database: Couchbase also provides an embedded database called Couchbase Lite. It’s a purpose-built database for the device, so you can build applications that are always available and always work, whether offline or online.
Sync Management: Finally, as part of our solution for mobile applications, we provide Couchbase Sync Gateway, which automatically synchronizes data on the device with Couchbase Server in the cloud so your developer doesn’t have to write code to manage the complex sync process.
Starting with cache and then expanding to other capabilities is often a good way to learn the technology and get comfortable with Couchbase for a wider set of use cases.
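The "cache in front of a relational database" starting point usually follows a cache-aside pattern. The sketch below is a hypothetical illustration, not Couchbase code: a dict stands in for the Couchbase bucket, `load_from_db` is an assumed callback for the relational query, and the TTL value is arbitrary. Reads hit the cache first and only go to the database on a miss or after the entry expires.

```python
import time
from typing import Callable, Optional


class CacheAside:
    """Cache-aside sketch: check the cache, fall back to the database on a miss,
    then populate the cache with a TTL. All names here are hypothetical."""

    def __init__(self, load_from_db: Callable[[str], Optional[dict]], ttl_s: float,
                 clock: Callable[[], float] = time.monotonic):
        self._cache: dict[str, tuple[float, dict]] = {}  # key -> (expires_at, value)
        self._load = load_from_db
        self._ttl = ttl_s
        self._clock = clock
        self.db_reads = 0  # how often the (slow) database was queried

    def get(self, key: str) -> Optional[dict]:
        now = self._clock()
        entry = self._cache.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # cache hit: the database is not touched
        self.db_reads += 1
        value = self._load(key)  # cache miss or expired entry
        if value is not None:
            self._cache[key] = (now + self._ttl, value)
        return value
```

With a session store, for example, repeated reads of the same session within the TTL are served entirely from the cache, which is what takes load off a struggling relational database.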
#14 Couchbase has emerged as a leading NoSQL provider for a number of reasons:
Best in performance and scalability
We’ve engineered Couchbase from the ground up for high performance and scalability
Couchbase is designed to deliver sub-millisecond responsiveness with very high throughput for both reads and writes
We consistently outperform competitors like MongoDB and DataStax in multiple independent benchmarks
Our performance advantage is driven in large part by our memory-centric architecture, which includes an integrated managed object cache and stream-based replication
Broad use case support
We’re the only NoSQL provider that has consolidated distributed cache, key-value store, and a JSON-based document database in a single platform
This means customers can use Couchbase for a much broader range of applications
Integrated mobile solution
We’re the only vendor that provides an end-to-end NoSQL mobile solution -- allows customers to easily build mobile apps that run great on or offline
Includes a JSON database embedded on the device, along with a prebuilt syncing tier
So apps run great on the device, even with intermittent or no network connectivity
Data on the device auto-syncs with the backend server when a connection is available
Simplified administration
We’ve designed Couchbase to be exceptionally easy to deploy and manage
Features such as an integrated Admin Console and single-click cluster expansion & rebalance dramatically increase admin efficiency
#16 Each Couchbase node is exactly the same.
All nodes are broken down into two components: A data manager (on the left) and a cluster manager (on the right). It’s important to realize that these are separate processes within the system specifically designed so that a node can continue serving its data even in the face of cluster problems like network disruption.
The data manager is written in C and C++ and is responsible for the object caching layer, the persistence layer, and the query engine. It is based on memcached and so provides a number of benefits:
-The very low lock contention of memcached allows for extremely high throughput and low latencies both to a small set of documents (or just one) as well as across millions of documents
-Being compatible with the memcached protocol means we are not only a drop-in replacement, but also inherit support for automatic item expiration (TTL) and atomic increment operations.
-We’ve increased the maximum object size to 20 MB, but still recommend keeping objects much smaller
-Support for both binary objects as well as natively supporting JSON documents
-All of the metadata for the documents and their keys is kept in RAM at all times. While this adds a bit of overhead per item, it also allows for extremely fast “miss” responses, which are critical to the operation of some applications: we don’t have to scan a disk to know that we don’t have some data.
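The benefits in the list above can be modelled in a few lines. This is an illustrative Python model, not how the C/C++ data manager is implemented: every key's metadata lives in a RAM dict, values can be evicted from RAM while staying on "disk", a lookup for a key that doesn't exist is answered from metadata alone, items can carry a TTL, and `incr` performs an atomic read-modify-write under a lock. The class and method names, and the choice that `incr` on a missing key creates it with `initial`, are assumptions made for the example.

```python
import threading
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Meta:
    expiry: float   # absolute expiry time; 0 means the item never expires
    resident: bool  # True if the value is currently cached in RAM


class NodeSketch:
    """Illustrative model (not Couchbase source code): metadata for every key
    stays in RAM, while a value may live only on disk."""

    def __init__(self, clock: Callable[[], float] = time.time):
        self.meta: dict[str, Meta] = {}   # every key, always in RAM
        self.ram: dict[str, bytes] = {}   # values currently cached
        self.disk: dict[str, bytes] = {}  # stand-in for persisted values
        self.disk_reads = 0
        self._lock = threading.Lock()
        self._clock = clock

    def _alive(self, key: str) -> Optional[Meta]:
        """Metadata for a live key; expired items (TTL) are purged lazily."""
        m = self.meta.get(key)
        if m is not None and m.expiry and m.expiry <= self._clock():
            del self.meta[key]
            self.ram.pop(key, None)
            self.disk.pop(key, None)
            return None
        return m

    def _store(self, key: str, value: bytes, expiry: float) -> None:
        self.meta[key] = Meta(expiry, True)
        self.ram[key] = value
        self.disk[key] = value  # written through here; real servers persist asynchronously

    def set(self, key: str, value: bytes, ttl_s: float = 0) -> None:
        with self._lock:
            self._store(key, value, self._clock() + ttl_s if ttl_s else 0)

    def evict(self, key: str) -> None:
        """Drop a value from RAM to free memory, but keep its metadata."""
        with self._lock:
            if self._alive(key) is not None:
                self.ram.pop(key, None)
                self.meta[key].resident = False

    def get(self, key: str) -> Optional[bytes]:
        with self._lock:
            m = self._alive(key)
            if m is None:
                return None  # miss answered from RAM metadata: no disk access
            if not m.resident:
                self.disk_reads += 1  # only existing, non-cached values touch disk
                self.ram[key] = self.disk[key]
                m.resident = True
            return self.ram[key]

    def incr(self, key: str, delta: int = 1, initial: int = 0) -> int:
        """Atomic counter: the whole read-modify-write runs under one lock.
        A missing key is created with `initial` (assumed semantics)."""
        with self._lock:
            m = self._alive(key)
            if m is None:
                value, expiry = initial, 0.0
            else:
                value, expiry = int(self.disk[key].decode()) + delta, m.expiry
            self._store(key, str(value).encode(), expiry)
            return value
```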
The cluster manager is based on Erlang/OTP which was developed by Ericsson to deal with managing hundreds or even thousands of distributed telco switches. This component is responsible for configuration, administration, process monitoring, statistics gathering and the UI and REST interface. Note that there is no data manipulation done through this interface.
#24 The application makes a call for a key called NYC MQ1
We run the key through the CRC32 hash function, and the result of that hash function points to vBucket 3
Which in turn points to Couchbase server number 1
#25 We now run a different key through the hash function and come up with a different vBucket, vBucket 4, which points to server 3
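The hashing walkthrough for these slides can be sketched as follows. The key is hashed with CRC32 and reduced to one of a fixed number of vBuckets; the cluster map then says which server owns that vBucket. The `(crc >> 16) & 0x7fff` reduction follows what Couchbase client libraries are commonly described as doing, and 1,024 vBuckets is the usual default; treat both as assumptions. The round-robin cluster map and server names are hypothetical, and the slides' "vBucket 3" and "vBucket 4" are illustrative, so don't expect those exact numbers for these keys.

```python
import zlib

NUM_VBUCKETS = 1024  # assumed default vBucket count


def vbucket_for_key(key: str, num_vbuckets: int = NUM_VBUCKETS) -> int:
    """Hash the key with CRC32 and reduce it to a vBucket ID.
    The (crc >> 16) & 0x7fff step is an assumption modelled on client libraries."""
    crc = zlib.crc32(key.encode("utf-8")) & 0xFFFFFFFF
    return ((crc >> 16) & 0x7FFF) % num_vbuckets


def server_for_key(key: str, vbucket_map: list[int], servers: list[str]) -> str:
    """Look up the key's vBucket in the cluster map (vBucket -> server index)."""
    vb = vbucket_for_key(key, len(vbucket_map))
    return servers[vbucket_map[vb]]


# Hypothetical 3-node cluster map with vBuckets assigned round-robin.
servers = ["server1", "server2", "server3"]
vbucket_map = [vb % len(servers) for vb in range(NUM_VBUCKETS)]

print(vbucket_for_key("NYC MQ1"), server_for_key("NYC MQ1", vbucket_map, servers))
```

Because the key-to-vBucket step never changes, adding or removing a node only changes `vbucket_map`; clients pick up the new map and keep hashing keys exactly as before.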
#32 KEY POINT: COUCHBASE’S MEMORY-TO-MEMORY DATA REPLICATION IS MARKET DEFINING AND UNIQUE TO COUCHBASE – IT’S ONE OF THE KEY REASONS ENTERPRISES CHOOSE COUCHBASE OVER RELATIONAL DATABASES AND OTHER NOSQL PRODUCTS.
The built-in replication in Couchbase is extremely fast and highly scalable.
It’s memory to memory – which means it’s not limited by the slower speed of reading data from a disk, so it’s very, very fast.
And it’s extremely scalable: you can achieve very high throughput with large numbers of writes going from one cluster to another. The topology can differ on each side. Obviously, each cluster needs to be sized with enough capacity to handle the load.
This memory-to-memory replication is market defining and unique to Couchbase. No other solution like this, built into the database, exists in the market today.
This is one of the key reasons enterprises choose Couchbase over other NoSQL products like MongoDB and Cassandra, and over relational databases.
#33 Every node must be able to talk to every other node in each cluster…this has certain implications for cloud deployments: http://www.couchbase.com/docs/couchbase-manual-2.0/couchbase-admin-tasks-xdcr-cloud.html
#37 This slide has a click-by-click animation
1. (click) A set request comes in from the application.
2. Couchbase Server responds that the key is written
3. (click) Couchbase Server then replicates the data out to memory in the other nodes
At the same time, it puts the data into a write queue to be persisted to disk
(click) Once it is on disk, the item is processed by the view engine and sent out over any configured XDCR link to one or more clusters
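The sequence on this slide, written out as a single-process Python model. Class and method names are hypothetical, the replica nodes and XDCR targets are plain dicts, and `flush` stands in for the background disk writer. It captures the ordering described here: the client is acknowledged without waiting for disk, replication goes memory to memory, and only persisted items reach the view index and XDCR.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in for one server's in-memory cache."""
    name: str
    memory: dict = field(default_factory=dict)


class WritePathSketch:
    """Single-process model of the set path on this slide (names are made up)."""

    def __init__(self, active: Node, replicas: list[Node], xdcr_targets: list[dict]):
        self.active = active
        self.replicas = replicas
        self.xdcr_targets = xdcr_targets  # each dict stands in for a remote cluster
        self.disk_queue: deque = deque()  # keys waiting to be persisted
        self.disk: dict = {}
        self.view_index: dict = {}

    def set(self, key: str, doc: dict) -> str:
        # The set request lands in the active node's memory.
        self.active.memory[key] = doc
        # Memory-to-memory replication. Done inline here for simplicity;
        # on the real server it is asynchronous to the client's acknowledgement.
        for replica in self.replicas:
            replica.memory[key] = doc
        # At the same time, the key is queued for persistence.
        self.disk_queue.append(key)
        # The client is acknowledged without waiting for the disk write.
        return "ok"

    def flush(self) -> None:
        """Background disk writer: persist queued items, then feed views and XDCR."""
        while self.disk_queue:
            key = self.disk_queue.popleft()
            doc = self.active.memory[key]  # always persist the latest value
            self.disk[key] = doc
            self.view_index[key] = doc     # the view engine sees persisted items
            for remote in self.xdcr_targets:
                remote[key] = doc          # XDCR ships them to other clusters
```

Right after `set` returns, the replica already holds the document while the disk, the view index, and the remote cluster do not; they catch up when `flush` runs.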
#38 Indexing and querying
distributed
create indexes on the fields in JSON documents
Called Views in Couchbase Server
Views are queried to find the objects you are interested in, e.g. range queries to find all players that have black sheep on their farm
(if asked: Views have no ad-hoc query language. Indexes are described via simple JavaScript.)
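Views are defined by a JavaScript map function that emits a key and a value for each document, and the emitted rows are kept sorted by key, which is what makes range queries cheap. The Python sketch below imitates that idea for the "black sheep" example. It is a model under assumptions: the document fields (`type`, `player`, `animals`) and function names are made up, and it sorts with Python's ordering rather than the collation real views use.

```python
import bisect
from typing import Any, Callable, Iterable, Iterator

# One index row: (emitted key, document ID, emitted value)
Row = tuple[Any, str, Any]


def farm_animals(doc: dict) -> Iterator[tuple[str, str]]:
    """Python stand-in for a JavaScript map function: emits (animal, player)
    for every animal on a farm document. Field names are hypothetical."""
    if doc.get("type") == "farm":
        for animal in doc.get("animals", []):
            yield animal, doc["player"]


def build_view(docs: dict[str, dict],
               map_fn: Callable[[dict], Iterable[tuple[Any, Any]]]) -> list[Row]:
    """Run the map function over every document and keep the rows sorted by key."""
    rows = [(k, doc_id, v) for doc_id, doc in docs.items() for k, v in map_fn(doc)]
    rows.sort(key=lambda row: (row[0], row[1]))
    return rows


def range_query(rows: list[Row], startkey: Any, endkey: Any) -> list[Row]:
    """Return rows whose key lies in [startkey, endkey], via binary search."""
    keys = [row[0] for row in rows]
    lo = bisect.bisect_left(keys, startkey)
    hi = bisect.bisect_right(keys, endkey)
    return rows[lo:hi]
```

Querying with `startkey = endkey = "black sheep"` returns one row per farm that has a black sheep, with the player as the row value; documents that are not farms never appear in the index.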
Incremental Map Reduce
“Normal” map reduce is batch based: it has to run across all of the data every time, so you don’t get updated results often, especially over large data sets. Incremental map reduce considers only the data that has changed and then calculates the updated result.
This happens fast, in near real-time.
Distributed across all nodes, so able to cope with large data amounts
Does a single map and reduce step, so it is great for simple analytics like leaderboards, counts, and sums across data having specific attributes/characteristics.
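The difference between batch and incremental map reduce, sketched for the leaderboard-style count/sum case. When a document changes, only that document's previous rows are retracted and its new rows added, so the work is proportional to the change rather than to the whole data set. This retract-and-add approach is a simplification chosen for clarity and is not necessarily how Couchbase's view engine maintains its reductions; the class name and document fields are hypothetical.

```python
from collections import defaultdict
from typing import Any, Callable, Iterable, Optional


class IncrementalSum:
    """Keeps grouped counts and sums up to date by applying only the delta of
    each changed document, instead of re-scanning every document (batch)."""

    def __init__(self, map_fn: Callable[[dict], Iterable[tuple[Any, int]]]):
        self.map_fn = map_fn
        self.emitted: dict[str, list[tuple[Any, int]]] = {}  # doc ID -> its last rows
        self.count: defaultdict = defaultdict(int)
        self.total: defaultdict = defaultdict(int)

    def _apply(self, rows: list[tuple[Any, int]], sign: int) -> None:
        for key, value in rows:
            self.count[key] += sign
            self.total[key] += sign * value

    def on_change(self, doc_id: str, doc: Optional[dict]) -> None:
        """Call for each created or updated document; pass doc=None for a delete."""
        self._apply(self.emitted.pop(doc_id, []), -1)  # retract the old rows
        rows = list(self.map_fn(doc)) if doc is not None else []
        self._apply(rows, +1)                          # add the new rows
        if rows:
            self.emitted[doc_id] = rows
```

For example, with a map function that emits `(team, score)`, moving one player to another team adjusts just two groups, no matter how many players exist.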
Full Text Search
Integration with a separate Elasticsearch cluster, using XDCR technology
Robust, so it efficiently copes with node failures, rebalances, or interrupted connections to keep the full text index in sync
Elasticsearch is a very fast, JSON document based, open source full text indexing solution built on Apache Lucene (the same library used by Solr, which more people will know)
Elasticsearch is also clustered, scales easily, and provides very flexible and powerful full text search capabilities
#41 http://blog.couchbase.com/couchbase-and-full-text-search-couchbase-transport-elastic-search
The Elasticsearch cluster is fed the documents from the Couchbase Server cluster
Elasticsearch indexes the fields (which ones is configurable) and by default stores only references back to the document ID
The application does document access via the Couchbase Server cluster and uses the views and incremental map reduce on the Couchbase cluster.
For full text queries, it queries the Elasticsearch cluster directly (simple HTTP and JSON interface)
The full text queries typically return the IDs of the matching documents.
Documents are then retrieved from the Couchbase Server cluster.
This way, high-throughput document access always comes from the high-performance Couchbase cluster.
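The query flow above, as a Python sketch. A toy inverted index stands in for the Elasticsearch cluster and, like the default setup described here, keeps only document IDs; a dict stands in for the Couchbase bucket. No Elasticsearch or Couchbase client API is used, and all function names are hypothetical. The ID-then-fetch step also skips IDs whose documents no longer exist, which can happen while the search index is catching up.

```python
from typing import Callable, Iterable, Optional


def make_toy_search(docs: dict[str, dict], field: str) -> Callable[[str], list[str]]:
    """Toy inverted index standing in for the search cluster: it stores only
    references (document IDs), not the documents themselves."""
    index: dict[str, set[str]] = {}
    for doc_id, doc in docs.items():
        for term in str(doc.get(field, "")).lower().split():
            index.setdefault(term, set()).add(doc_id)

    def search(text: str) -> list[str]:
        """Return IDs of documents containing every term in the query."""
        terms = text.lower().split()
        if not terms:
            return []
        return sorted(set.intersection(*(index.get(t, set()) for t in terms)))

    return search


def full_text_lookup(text: str,
                     search_ids: Callable[[str], Iterable[str]],
                     kv_get: Callable[[str], Optional[dict]]) -> list[dict]:
    """1. Ask the search cluster for matching document IDs only.
    2. Fetch the full documents by key from the Couchbase stand-in."""
    docs = []
    for doc_id in search_ids(text):
        doc = kv_get(doc_id)
        if doc is not None:  # skip IDs whose documents were deleted
            docs.append(doc)
    return docs
```

The design keeps the search cluster small (IDs only) and leaves all document reads on the key-value path.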