2. Couchbase NoSQL Leadership
Leading NoSQL database company
The company behind Couchbase open source project
Open Source development & business model
Document-oriented NoSQL database
Focused on interactive internet and mobile applications
Provides a more flexible, higher-performance, more scalable database than relational alternatives
Most mature, reliable and widely deployed solution
>5,000 paid production deployments worldwide, over 350 customers
Headquarters in Silicon Valley (Mountain View, CA)
~100 employees including >60 in engineering/product
>80% of commits to Couchbase, memcached, Apache CouchDB
3. Market Adoption
Internet Companies
• Social Gaming
• Ad Networks
• Social Networks
• Online Business Services
• E-Commerce
• Online Media
• Content Management
• Cloud Services
Enterprises
• Communications
• Retail
• Financial Services
• Health Care
• Automotive/Airline
• Agriculture
• Consumer Electronics
• Business Systems
4. Market Adoption – Customers
Internet Companies Enterprises
More than 350 customers and 5,000 production deployments worldwide
5. Relational Technology Scales Up
Application Scales Out
Just add more commodity web servers
[Chart: Web/App Server Tier; system cost and application performance vs. users]
RDBMS Scales Up
Get a bigger, more complex server
[Chart: Relational Database; system cost and application performance vs. users, annotated "Won't scale beyond this point"]
Expensive and disruptive sharding, doesn't perform at web scale
6. Couchbase Server Scales Out Like App Tier
Application Scales Out
Just add more commodity web servers
[Chart: Web/App Server Tier; system cost and application performance vs. users]
NoSQL Database Scales Out
Cost and performance mirror the app tier
[Chart: Couchbase Distributed Data Store; system cost and application performance vs. users]
Scaling out flattens the cost and performance curves
7. Couchbase Server Is The Complete Solution
✔ Easy Scalability
One-click scalability and no app changes.
✔ Consistent High Performance
Sub-millisecond latency with high throughput for reads and writes.
✔ Always On 24x365
Maintenance, upgrades and cluster resizing all online, without application downtime.
✔ Flexible Data Model
JSON document model with no fixed schema.
9. Data driven use cases
• Support for unlimited data growth
• Data with non-homogeneous structure
• Need to change data structure quickly and often
• 3rd party or user defined structure
• Variable length documents
• Sparse data records
• Hierarchical data
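The data-driven cases above come down to storing documents that do not share a fixed schema. Below is a minimal Python sketch of what such documents might look like. It uses only the standard `json` module rather than any Couchbase SDK, and the field names and the `user::<name>` key convention are hypothetical.

```python
import json

# Hypothetical documents for one bucket. There is no fixed schema: each
# document carries only the fields it needs (sparse), can nest objects and
# arrays (hierarchical), and can gain new fields without a migration.
alice = {"type": "user", "name": "Alice", "email": "alice@example.com"}
bob = {
    "type": "user",
    "name": "Bob",
    "address": {"city": "Chicago", "country": "US"},  # nested object
    "devices": [{"os": "iOS"}, {"os": "Android"}],     # nested array
    "preferences": {"theme": "dark"},                  # user-defined field
}

# Documents are stored as JSON values under string keys
# ("user::<name>" is a hypothetical key convention).
stored = {
    "user::alice": json.dumps(alice, sort_keys=True),
    "user::bob": json.dumps(bob, sort_keys=True),
}

for key, value in stored.items():
    doc = json.loads(value)
    # Readers tolerate missing fields instead of relying on a fixed schema.
    city = doc.get("address", {}).get("city", "unknown")
    print(key, doc["name"], city)
```

The two documents differ in both shape and length, yet they live side by side and are read with the same code.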
11. Performance driven use cases
• Low latency matters
• High throughput matters
• Large number of users
• Unknown demand with sudden growth of users/data
• Predominantly direct document access
• Workloads with very high mutation rate per document
12. Use Case Examples
Web App or Use Case | Couchbase Solution | Example Customer
Content and Metadata Management System | Couchbase document store + Elastic Search | McGraw-Hill…
Social Game or Mobile App | Couchbase stores game and player data | Zynga, OMGPOP…
Ad Targeting | Couchbase stores user information for fast access | AOL…
User Profile Store | Couchbase Server as a key-value store | TuneWiki…
Session Store | Couchbase Server as a key-value store | Concur…
High Availability Caching Tier | Couchbase Server as a memcached tier replacement | Orbitz…
Chat/Messaging Platform | Couchbase Server | DOCOMO…
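The session-store row describes Couchbase Server used as a key-value store. The sketch below is a hypothetical in-memory stand-in, not the Couchbase SDK, that shows the access pattern: values are stored under a session key with a time-to-live, so idle sessions expire on their own. The 30-minute TTL echoes the figure in the speaker notes. The class, its methods and the key names are assumptions.

```python
import time
from typing import Callable, Optional


class SessionStore:
    """Hypothetical in-memory stand-in for a key-value session bucket with per-key TTL."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._data: dict = {}  # key -> (value, expires_at)

    def set(self, key: str, value: str) -> None:
        # Every write (re)sets the expiry, like touching an active session.
        self._data[key] = (value, self.clock() + self.ttl)

    def get(self, key: str) -> Optional[str]:
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self.clock() >= expires_at:
            # Lazily drop expired sessions when they are read.
            del self._data[key]
            return None
        return value


now = [0.0]  # fake clock so the example is deterministic
store = SessionStore(ttl_seconds=30 * 60, clock=lambda: now[0])
store.set("session::abc123", '{"user": "alice"}')
now[0] = 29 * 60
print(store.get("session::abc123"))  # {"user": "alice"}
now[0] = 31 * 60
print(store.get("session::abc123"))  # None
```

Injecting the clock keeps the sketch testable. A real store would also evict expired keys in the background, not only on read.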
13. Orbitz use cases
(Repeats the use-case table from slide 12; Orbitz is the example customer for the High Availability Caching Tier row, where Couchbase Server replaces the memcached tier.)
14. Orbitz Use case – Caching Tier
SCALABILITY
RELIABILITY
PERFORMANCE
MANAGEABILITY
17. Deployment speed
• Previous Cache
– Weeks of planning
– Hours of testing
– Hours of downtime to implement
– Issues regularly require rework
• Couchbase
– Days of planning
– Hours of testing
– Minutes to deploy without downtime
21. Node Distribution
• Previous Cache
– Multicast address bound
– Same rack / VLAN
– Local cache
• Couchbase
– Cross data center
– Distributed chassis
– Different subnets
– Different VLAN
24. Detour: how Couchbase works
Partitioning The Data – vbucket (internal shards) map
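The vbucket map can be sketched as a two-step lookup: hash the document key to a vbucket, then look up which server owns that vbucket. Couchbase clients are generally described as using a CRC32-based hash for the first step. The exact bit selection, the count of 1024 vbuckets and the round-robin assignment below are assumptions made for illustration, not the server's actual placement logic.

```python
import zlib

NUM_VBUCKETS = 1024  # assumption: a fixed vbucket count per bucket


def vbucket_for_key(key: str, num_vbuckets: int = NUM_VBUCKETS) -> int:
    """Hash a document key to a vbucket id (CRC32-based; bit selection assumed)."""
    crc = zlib.crc32(key.encode("utf-8"))
    return ((crc >> 16) & 0x7FFF) % num_vbuckets


def build_vbucket_map(servers: list, num_vbuckets: int = NUM_VBUCKETS) -> list:
    """Assign vbuckets to servers round-robin so each server owns an even share."""
    return [servers[vb % len(servers)] for vb in range(num_vbuckets)]


def server_for_key(key: str, vbucket_map: list) -> str:
    """What a client library does per request: key -> vbucket -> server."""
    return vbucket_map[vbucket_for_key(key, len(vbucket_map))]


servers = ["server1", "server2", "server3"]
vbmap = build_vbucket_map(servers)
for doc_id in ["doc1", "doc2", "doc3"]:
    print(doc_id, "-> vbucket", vbucket_for_key(doc_id), "->", server_for_key(doc_id, vbmap))
```

Because the number of vbuckets is fixed, adding or removing servers only changes the vbucket-to-server map. The key-to-vbucket hash never changes.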
25. Basic Operation – scale out
[Diagram: APP SERVER 1 and APP SERVER 2, each with the COUCHBASE CLIENT LIBRARY and a CLUSTER MAP, send Read/Write/Update calls to a three-server COUCHBASE SERVER CLUSTER; user-configured replica count = 1]
SERVER 1: Active Docs: Doc 5, Doc 2, Doc 9 | Replica Docs: Doc 4, Doc 1, Doc 8
SERVER 2: Active Docs: Doc 4, Doc 7, Doc 8 | Replica Docs: Doc 6, Doc 3, Doc 2
SERVER 3: Active Docs: Doc 1, Doc 3, Doc 6 | Replica Docs: Doc 7, Doc 9, Doc 5
• Docs distributed evenly across servers in the cluster
• Each server stores both active & replica docs
– Only one server is active for a given doc at a time
• Client library provides app with simple interface to database
• Cluster map provides map to which server doc is on
– App never needs to know
• App reads, writes, updates docs
• Multiple app servers can access same document at same time
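Here is a minimal sketch of the cluster map this slide describes, with a user-configured replica count of 1. Each vbucket has one active server that handles reads, writes and updates, and one replica on a different server. The placement rule (replica on the next server in the list) and all names are hypothetical simplifications.

```python
from dataclasses import dataclass


@dataclass
class VBucketEntry:
    active: str   # server that serves reads/writes/updates for this vbucket
    replica: str  # server holding the copy (replica count = 1)


def build_cluster_map(servers: list, num_vbuckets: int) -> list:
    """Spread active copies evenly and put each replica on a different server."""
    n = len(servers)
    if n < 2:
        raise ValueError("replica count 1 needs at least two servers")
    return [
        VBucketEntry(active=servers[vb % n], replica=servers[(vb + 1) % n])
        for vb in range(num_vbuckets)
    ]


def route(vbucket: int, cluster_map: list) -> str:
    """Client-side routing: every read/write/update goes to the active copy."""
    return cluster_map[vbucket].active


cmap = build_cluster_map(["server1", "server2", "server3"], num_vbuckets=9)
for vb, entry in enumerate(cmap):
    print(f"vbucket {vb}: active={entry.active}, replica={entry.replica}")
```

As on the slide, every server ends up holding both active and replica data, and no vbucket keeps its replica on the same server as its active copy.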
26. Add Nodes
[Diagram: APP SERVER 1 and APP SERVER 2 (client library + cluster map) send Read/Write/Update calls to a COUCHBASE SERVER CLUSTER that now spans SERVER 1 to SERVER 5, each with Active Docs and Replica Docs; user-configured replica count = 1]
• Two servers added to cluster
– One-click operation
• Docs automatically rebalanced across cluster
– Even distribution of docs
– Minimum doc movement
• Cluster map updated
• App database calls now distributed over larger # of servers
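The rebalance described above (even distribution, minimum doc movement) can be approximated with a greedy sketch over the vbucket map. It keeps every vbucket where it is unless its server is above its new fair share or is leaving, and hands only those vbuckets to under-loaded servers. This is a hypothetical simplification that tracks active copies only. Couchbase's real rebalance also rebuilds replicas and streams the data itself.

```python
from collections import Counter


def rebalance(vbucket_map: list, servers: list) -> list:
    """Return an even vbucket -> server map that moves as few vbuckets as possible."""
    base, extra = divmod(len(vbucket_map), len(servers))
    # Even target share per server; the remainder goes to the first servers.
    target = {s: base + (1 if i < extra else 0) for i, s in enumerate(servers)}

    counts = Counter(owner for owner in vbucket_map if owner in target)
    to_move = []
    for vb, owner in enumerate(vbucket_map):
        if owner not in target:
            to_move.append(vb)  # owner is leaving the cluster
        elif counts[owner] > target[owner]:
            counts[owner] -= 1  # owner holds more than its share
            to_move.append(vb)

    # Each server below its target receives exactly its shortfall.
    receivers = [s for s in servers for _ in range(target[s] - counts[s])]
    new_map = list(vbucket_map)
    for vb, new_owner in zip(to_move, receivers):
        new_map[vb] = new_owner
    return new_map


old_servers = ["server1", "server2", "server3"]
old_map = [old_servers[vb % 3] for vb in range(1024)]
new_servers = old_servers + ["server4", "server5"]
new_map = rebalance(old_map, new_servers)
moved = sum(1 for before, after in zip(old_map, new_map) if before != after)
print("vbuckets moved:", moved)  # vbuckets moved: 409
print({s: new_map.count(s) for s in new_servers})
```

Growing from three to five servers moves 409 of 1,024 vbuckets. That is the minimum possible here: the two new servers must end up holding 205 + 204 vbuckets between them, and nothing else is moved.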
27. Fail Over Node
[Diagram: APP SERVER 1 and APP SERVER 2 (client library + cluster map) accessing a five-server COUCHBASE SERVER CLUSTER, each server holding Active Docs and Replica Docs; user-configured replica count = 1]
• App servers happily accessing docs on Server 3
• Server fails
• App server requests to Server 3 fail
• Cluster detects server has failed
– Promotes replicas of docs to active
– Updates cluster map
• App server requests for docs now go to appropriate server
• Typically rebalance would follow
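The failover steps on this slide (detect the failed server, promote replicas to active, update the cluster map) can be sketched on a simplified cluster map. The `VBucketEntry` structure and `fail_over` function are hypothetical. In the sketch, vbuckets that lost a copy are left without a replica, which is why a rebalance typically follows.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class VBucketEntry:
    active: str
    replica: Optional[str]  # None once this vbucket has no replica


def fail_over(cluster_map: list, failed: str) -> list:
    """Promote replicas for vbuckets whose active copy lived on the failed server."""
    new_map = []
    for entry in cluster_map:
        if entry.active == failed:
            if entry.replica is None:
                raise RuntimeError("no replica available for this vbucket")
            # Replica becomes active; no replica exists until a rebalance recreates one.
            new_map.append(VBucketEntry(active=entry.replica, replica=None))
        elif entry.replica == failed:
            # Active copy is fine, but it has lost its replica.
            new_map.append(VBucketEntry(active=entry.active, replica=None))
        else:
            new_map.append(entry)
    return new_map


servers = ["server1", "server2", "server3", "server4", "server5"]
cmap = [VBucketEntry(servers[vb % 5], servers[(vb + 1) % 5]) for vb in range(10)]
after = fail_over(cmap, "server3")
print([e.active for e in after])
```

The returned map plays the role of the updated cluster map: once clients pick it up, their requests for the affected docs go to the promoted replicas.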
28. Back to Orbitz: PERFORMANCE
• Latency Improvements
• Resident / In-Memory
• Uptime / Availability
• Couchbase vs. RDBMS
29. Latency
Pull data from Couchbase into Graphite using the REST API
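Slide 29 mentions pulling Couchbase statistics through the REST API into Graphite. The fetch itself is left out here. The sketch assumes the latest stat values have already been extracted into a dictionary (the stat names, metric prefix and timestamp are placeholders). It formats them in Graphite's plaintext protocol, one `<metric path> <value> <timestamp>` line per metric, which Carbon typically accepts on TCP port 2003.

```python
import socket
import time
from typing import Mapping, Optional


def to_graphite_lines(prefix: str, stats: Mapping, timestamp: Optional[int] = None) -> str:
    """Format stats as Graphite plaintext: one '<path> <value> <timestamp>' line each."""
    ts = int(time.time()) if timestamp is None else timestamp
    return "".join(f"{prefix}.{name} {value} {ts}\n" for name, value in sorted(stats.items()))


def send_to_carbon(payload: str, host: str, port: int = 2003) -> None:
    """Push the payload to a Carbon plaintext listener (2003 is the usual default)."""
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(payload.encode("ascii"))


# Hypothetical latest samples, e.g. taken from a bucket stats response
# fetched over Couchbase's REST API; names and values are illustrative.
latest = {"cmd_get": 1520.0, "ops": 2310.0}
print(to_graphite_lines("couchbase.sessions", latest, timestamp=1350000000), end="")
```

Keeping formatting separate from sending makes the formatter easy to test. In practice this would run on a schedule so Graphite can plot latency and throughput over time.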
31. Detour: Key results of Cisco and Solarflare Benchmark
Couchbase Server demonstrates
• Consistent sub-millisecond latency for a mixed workload (40% writes)
• High throughput
• Linear scalability
http://www.cisco.com/en/US/prod/collateral/switches/ps9441/ps9670/white_paper_c11-708169.pdf
32. Your secret weapon: Sub-millisecond AND consistent latency
[Chart: latency (microseconds) vs. object size (bytes)]
Consistently low latencies, in microseconds, for varying document sizes with a mixed workload
33. Your secret weapon: Linear scalability
High throughput with 1.4 GB/sec data transfer rate using 4 servers
[Chart: operations per second vs. number of servers in cluster]
Linear throughput scalability
43. Summary
Scalable
Reliable
Performance
Manageable
Moving to Couchbase was the biggest change Orbitz made in the past year to deliver lower latency and a faster product to their customers. It performs “awesome”.
Orbitz has not encountered any kind of negative tradeoff; everything has been positive so far.
44. Looking Ahead
• More Conversions
• Consolidations
• DB Augmentation
– Integration with search index
• Couchbase 2.0
– Data Center Replication
– Indexing and querying
• Reduce Mapping for analytics
– Automatic data compaction
• Metrics Integration
– Graphite
– AppDynamics
45. Resources
• Gigaom article: Balancing Oracle and open source at Orbitz
http://gigaom.com/cloud/balancing-oracle-and-open-source-at-orbitz/
• Use cases
http://www.couchbase.com/couchbase-server/use-cases
• Getting started with Couchbase
http://www.couchbase.com/couchbase-server/getting-started
• Couchbase 2.0 developer guide
http://www.couchbase.com/docs/couchbase-devguide-2.0/index.html
46. Couchbase Server 2.0
• Next major release of Couchbase Server
• Approaching GA
What’s new:
• New storage engine technology (append-only B-tree)
• Indexing and Querying
• Incremental Map Reduce
• Cross Data Center Replication
• Better memory management, support for large data sets, and other technological improvements
• Fully backwards compatible with existing Couchbase Server
47. THANK YOU
COUCHBASE
SIMPLE, FAST, ELASTIC NOSQL
sharon@couchbase.com
Editor's Notes
I am going to talk about some use cases of Couchbase, specifically how Orbitz replaced its entire caching tier infrastructure. It is based on a presentation Orbitz gave at our last CouchConf San Francisco, and I'll paint the picture around it a little with how things work under the hood. Happy to answer any questions at the end, especially about Couchbase.
These are the market segments
Partial listing of companies with paid production deployments. Thousands more are using the open source edition.
In a typical architecture, stateless application servers sit behind a load balancer. As usage grows, you add app servers, update the load balancer, and scale the application out linearly on both cost and performance. The data tier, however, has a shared-everything architecture; at a minimum, these are shared-cache or shared-disk systems. To scale up you need expensive hardware, and even then you hit a performance limit, so with this approach both cost and performance are non-linear.
Contrast this with a NoSQL architecture: with a document model and auto-sharding, the database scales horizontally along with your app server tier, giving you the linear cost and performance you want.
Cost savings, complexity, managing the cluster
Shared-nothing architecture: all nodes are equal, and it doesn't matter which network/VLAN a node sits on. All you need is to plan your capacity, define the bucket parameters and deploy.
Scale the cluster horizontally with standard machines. You optimize for machine spec and cluster size, but don't need to worry about a scale-up limit.
Note that Couchbase can be used as a highly available cache: if one node in the cluster goes down, failing it over activates the replica instantly and makes the data available, instead of having to reload all the data of the failed node from the original DB.
This is due to Couchbase's architecture, which has very low lock contention and predictable latency and throughput.
To add more color on latency and high throughput:
600 GB with a 30-minute TTL dropped to 50 GB of RAM used for the same number of sessions.
Combination of: downtime for maintenance, and issues after bringing the cluster back online. Right after Couchbase was installed: 100% availability except for hardware issues, and with the automatic failover feature even those are mitigated.
Compared to a database, they saw fewer connections and fewer buckets (the equivalent of databases in an RDBMS).
Moving applications between clusters used to require a lot of energy and downtime planning. Now it is much simpler, which enables consolidation: fewer DBs in total, with many systems consolidated into one cluster. Application agility: changing document schema, size, etc.
Easy to coordinate geo-located teams. Easy to replicate data with the tools.
Conversions to latest releases. Consolidations of systems to Couchbase, and into existing clusters. DB augmentation: reduce calls to the database and reduce DB cost.