Membase is a distributed database that provides simple, fast, and elastic key-value storage. It allows applications to scale out across commodity servers through consistent hashing and automatic rebalancing. A Membase cluster can be set up in five minutes or less with just a single node, and new nodes can be added elastically with no downtime. Membase uses the Memcached protocol and is compatible with thousands of existing applications.
3. Membase is a distributed database
Application user
Web application server Web application server Web application server
Membase Servers
In the data center On the administrator console
3
4. Membase is Simple, Fast, Elastic
Five minutes or less to a working
cluster
• Downloads for Linux and Windows
• Start with a single node
• One button press joins nodes to a cluster
Easy to develop against
• Just SET and GET – no schema required
• Drop it in. 10,000+ existing applications
already “speak membase” (via memcached)
• Practically every language and application
framework is supported, out of the box
Easy to manage
• One-click failover and cluster rebalancing
• Graphical and programmatic interfaces
• Configurable alerting
4
5. Membase is Simple, Fast, Elastic
Predictable
• “Never keep an application waiting”
• Quasi-deterministic latency and throughput
Low latency
• Built-in Memcached technology
High throughput
• Multi-threaded
• Low lock contention
• Asynchronous wherever possible
• Automatic write de-duplication
5
6. Membase is Simple, Fast, Elastic
Zero-downtime elasticity
• Spread I/O and data across commodity
servers (or VMs)
• Consistent performance with linear cost
• Dynamic rebalancing of a live cluster
All nodes are created equal
• No special case nodes
• Any node can replace any other node, online
• Clone to grow
Extensible
• Filtered TAP interface provides hook points
for external systems (e.g. full-text search,
backup, warehouse)
• Data bucket – engine API for specialized
container types
6
7. Built-in Memcached Caching Layer
Memcached Membase Cache
Membase Database Membase Database
Memcached Mode Membase Mode
Fact: Membase development team has also contributed over
half of the code to the Memcached project. 7
12. Clustering
• Underlying cluster
functionality based on
erlang OTP
• Have a custom, vector
clock based way of storing
and propagating...
– Cluster topology
– vBucket mapping
• Collect statistics from many
nodes of the cluster
– Identify hot keys, resource
utilization
12
13. TAP
• A generic, scalable method of streaming mutations
from a given server
– As data operations arrive, they can be sent to arbitrary TAP
receivers
• Leverages the existing memcached engine interface,
and the non-blocking IO interfaces to send data
• Three modes of operation
Data
Mutations Data set
Data
Mutations Data set Data set
13
14. Membase data flow – under the hood
SET request arrives at SET acknowledgement
KEY’s master server 1 2 returned to application
3 Listener-Sender
3
RAM
membase storage engine
4
Disk Disk Disk
Disk Disk Disk
Replica Server 1 for KEY Master server for KEY Replica Server 2 for KEY
14
16. A Membase Node: Component View
moxi ns_server
membase ns_server
(memcached + membase engine)
TAP
vbucketmigrator
16
17. Clients, nodes and other nodes
Client moxi + Client
port 11210
port 11211 memcached operations REST/comet
memcached operations cluster topology
and vbucket map
memcached operations
moxi ns_server
membase ns_server
(memcached + membase engine)
TAP
memcached operations
with tap commands vbucketmigrator
17
18. Data buckets are secure membase “slices”
Application user
Web application server
Bucket 1
Bucket 2
Aggregate Cluster Memory and Disk Capacity
Membase data servers
In the data center On the administrator console
18
20. Disk > Memory
Dataset may have many memory quota
Bucket Configuration
items infrequently accessed. mem_high_wat
However, memcached has mem_low_wat
different behavior (LRU) than
wanted with membase.
Still, traditional (most)
RDBMS implementations are
not 100% correct for us
either. The speed of a miss
is very, very important.
20
31. Membase Datatypes
• byte[]
– Does your data have
1s and 0s?
“Any customer can have
a car painted any colour
that he wants so long as
it is black.”
26
32. Membase Datatypes
• byte[]
– Does your data have
1s and 0s?
• Items do have flags
– Many clients use flags
– Data type options
• Google protobuf “Any customer can have
a car painted any colour
• Thrift that he wants so long as
• Avro it is black.”
26
33. Transactions
• Lock == slow me down
• CAS operations User 1 User 2
– Optimistic locking
Su
• Very useful with complex
cc
il!
es
Fa
datatypes
s
– Imagine two clients trying to
update a complex item
• You’re likely using CAS
already... if you use a CPU
27
34. Common Use: Sessions
• Web user sessions
– Highly read, less writes in many case
– Protocol advantage of memcached
• Options already for PHP, Ruby and Java
• Application state
– Not necessarily “entity” style things
– May be appropriate for a “cache” pool
28
35. Common Use (cache): Rate Limiting
• Want to provide API calls
into the system Your Users
– Twitter search h !
uc
¡O
– Google search services
• Use the atomic increment Your App
– Set an item with a unique ID
– Upon API request,
increment and check
• HTTP 420: go away and come
back later
29
39. NodeCode - What is it?
Method for extending & customizing Membase
Separate code modules
Defined interface to datapath and cluster manager
Notification on events
• Synchronous
• Asynchronous
33
40. NodeCode – Drivers
Simple
• Packaged modules for easy install and enable
• Library of “off the shelf” modules
• Module monitoring
• Straight forward development and debugging
Fast
• Low latency/high-throughput
• Per-bucket process isolation
• Don’t break data manager performance/correctness
Elastic
• Automatically migrate and instantiate on rebalance
• Provide support for migration of internal data
• Leverage native Membase engine for internal data storage
34