Published in: Technology
  1. Tuning Couchbase
     Tim Smith, Engineer
  2. Introduction
     I am:
     ● Support Engineer
     ● Sales Engineer
  3. Introduction
     You are:
     ● Using Membase
     ● Using CouchDB
     ● Using it in production
     ● 100ms response
     ● 2ms response
     ● Smarter than I am
  4. Simple, Fast, Elastic*
     Membase
     ● 5-minute cluster setup
     ● Memcached API
     ● Memcached-fast responses (working set)
     ● Saturate network with minimal CPU
     ● Pleasant admin UI
     ● Rebalance on the fly
  5. Simple, Fast*, Elastic
     CouchDB
     ● Webophilic: JSON, RESTy HTTP, JavaScript
     ● Append-only, crash-only, MVCC (hide the fancy bits)
     ● Bidirectional replication, /_changes
     ● Replication and app-level sharding scale out
     ● Your data. Everywhere.
  6. Simple*, Fast, Elastic
     Couchbase Server 2.0
     ● Auto-sharding, clustered elasticity
     ● Caching, predictable low-latency
     ● Scatter-gather, incremental map/reduce
     ● Rich hands-on-your-data features
  7. API Combo
     Couchbase Server 2.0
     ● Both memcached binary protocol
     ● And CouchDB HTTP protocol
     ● SDK provides consistent interface
     ● Optionally synchronous persistence
  8. Lots to Learn…
     We welcome your discoveries, ingenious solutions and feedback.
     But we’re not starting from scratch!
     Best practices from Membase and CouchDB still apply.
     ● Hardware and system resources
     ● Client and API usage
     ● Data modeling
  9. Hardware
     Membase:
     ● RAM, RAM, RAM!
     ● Fast disk (throughput) helpful for write-intensive applications and disk-heavy ops (rebalance)
     ● Network bandwidth may become an issue
     ● Adding more nodes can help with all three
  10. Hardware: RAM
      Proper cluster sizing is #1
      ● wiki: Sizing Guidelines
      ● Main variables include total number of items, size of working set, replicas and per-item overhead
      ● Under-provisioning reduces elasticity
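
A rough sketch of how the sizing variables named on this slide might combine. The formula, the 64-byte per-item overhead default, and the 75% usable-RAM fraction are illustrative assumptions, not the official Sizing Guidelines; substitute the figures from the wiki for a real deployment.

```python
import math


def estimate_cluster_ram_bytes(num_items, avg_key_bytes, avg_value_bytes,
                               working_set_fraction, replicas,
                               per_item_overhead_bytes=64):
    """Estimate RAM for metadata plus the cached working set (assumed model)."""
    copies = 1 + replicas  # active copy plus each replica
    # Assumption: key and per-item overhead stay in RAM for every item.
    metadata = num_items * (avg_key_bytes + per_item_overhead_bytes) * copies
    # Assumption: only the working set's values need to stay in RAM.
    values = num_items * avg_value_bytes * working_set_fraction * copies
    return int(metadata + values)


def nodes_needed(total_ram_bytes, node_ram_bytes, usable_fraction=0.75):
    """Number of nodes if only usable_fraction of each node's RAM holds data."""
    return math.ceil(total_ram_bytes / (node_ram_bytes * usable_fraction))
```

For example, one million items with 20-byte keys, 1000-byte values, a 25% working set and one replica come to 668,000,000 bytes under these assumptions.
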
  11. Hardware: Disk
      Fast disk not too important…until it is
      ● Rebalance can move a lot of data around
      ● Especially when disk > RAM
      ● Warm-up time after node restart
      ● Under-provisioning reduces elasticity
      ● More nodes spread out the I/O
      ● SSD, RAID, the usual stuff
  12. Hardware
      CouchDB:
      ● CPU usage can be significant with view updates, replication filters, formatting via /_list and /_show
      ● Fast disk helpful for many applications and disk-heavy ops (compaction)
      ● Separate data and view files onto different filesystems to improve I/O
      ● RAM can’t hurt
  13. Hardware: Cloud
      Cloud hosting brings variability
      ● Disk bandwidth can occasionally drop
      ● Even identical instances may perform differently
      ● Large instances more reliable
      ● More instances provide redundancy
      ● Best bang-for-the-buck still an open question
  14. Configuration
      Membase client
      ● Use a Membase-aware smart client (spymemcached for Java, Enyim for C#)
      ● Or, run moxi on the client host
      ● Minimizes network hops, preserves bandwidth
      ● Value compression (often automatic)
  15. Configuration
      CouchDB client
      ● Caching, ETag / If-None-Match
      ● Compression, Accept-Encoding: gzip
      ● Keep-Alive
      ● By the way, Couchbase Single Server has some killer performance increases (coming soon to an Apache CouchDB release)
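
The three client-side HTTP habits above can be sketched together. This is a minimal illustration, not a real client: `conditional_get`, the `cache` dict and the injected `send(url, headers)` callable (returning status, headers, body) are hypothetical names standing in for whatever HTTP library the application uses.

```python
import gzip


def conditional_get(url, cache, send):
    """Fetch url, revalidating a cached copy with If-None-Match.

    cache maps url -> (etag, body). send is a hypothetical transport
    callable returning (status, response_headers, body_bytes).
    """
    headers = {"Accept-Encoding": "gzip", "Connection": "keep-alive"}
    cached = cache.get(url)
    if cached is not None:
        headers["If-None-Match"] = cached[0]

    status, resp_headers, body = send(url, headers)

    # 304 Not Modified: the server sent no body, so reuse the cached one.
    if status == 304 and cached is not None:
        return cached[1]

    if resp_headers.get("Content-Encoding") == "gzip":
        body = gzip.decompress(body)

    etag = resp_headers.get("ETag")
    if etag:
        cache[url] = (etag, body)
    return body
```

A 304 response costs a round trip but no document transfer, which is where the saving comes from on large documents.
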
  16. API Usage
      Memcached API
      ● Binary protocol
      ● Multi-get and multi-set
      ● Incr, decr, append, prepend
      ● TTL expiration, get-and-touch
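
To make "binary protocol" and "multi-get" concrete, here is a sketch of how a multi-get can be framed on the wire: a run of quiet GETKQ requests terminated by a NOOP, each with the standard 24-byte request header. The helper names are made up for illustration; in practice a smart client builds these packets for you.

```python
import struct

# 24-byte request header: magic, opcode, key length, extras length,
# data type, vbucket id (reserved), total body length, opaque, CAS.
HEADER = struct.Struct("!BBHBBHIIQ")
REQUEST_MAGIC = 0x80
OP_NOOP = 0x0A
OP_GETKQ = 0x0D  # quiet get that echoes the key; misses send no reply


def build_request(opcode, key=b"", extras=b"", value=b"", opaque=0, cas=0):
    """Pack one binary-protocol request (header + extras + key + value)."""
    body_len = len(extras) + len(key) + len(value)
    header = HEADER.pack(REQUEST_MAGIC, opcode, len(key), len(extras),
                         0, 0, body_len, opaque, cas)
    return header + extras + key + value


def multi_get_packets(keys):
    """Pipeline one GETKQ per key, then a NOOP to mark the end of the batch."""
    packets = [build_request(OP_GETKQ, key=k, opaque=i)
               for i, k in enumerate(keys)]
    packets.append(build_request(OP_NOOP, opaque=len(keys)))
    return b"".join(packets)
```

Because the quiet gets stay silent on a miss, the NOOP reply is what tells the client the batch is complete, so the whole batch costs one round trip.
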
  17. API Usage
      CouchDB API
      ● HEAD vs. GET, ?limit=1
      ● ?startkey, not ?skip
      ● Use built-in reduce functions: _sum, _count, _stats; write views in Erlang
      ● Keep view index size in mind: emit just what you need
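
The "?startkey, not ?skip" advice is usually applied as follows: request one row more than the page size, and use that extra row's key and document id as the start of the next page. A minimal sketch, with hypothetical helper names and rows shaped like CouchDB view rows (`{"id": ..., "key": ..., "value": ...}`):

```python
import json
from urllib.parse import urlencode


def page_query(page_size, next_start=None):
    """Build the view query string for one page.

    next_start is the extra row returned by the previous page, or None
    for the first page.
    """
    params = {"limit": page_size + 1}  # one extra row marks the next page
    if next_start is not None:
        params["startkey"] = json.dumps(next_start["key"])  # keys are JSON
        params["startkey_docid"] = next_start["id"]  # disambiguates equal keys
    return urlencode(params)


def split_page(rows, page_size):
    """Return (rows to display, row that starts the next page or None)."""
    if len(rows) > page_size:
        return rows[:page_size], rows[page_size]
    return rows, None
```

With ?skip the server still walks every skipped row, so deep pages get slower; starting from a key lets the index seek directly to the page.
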
  18. API Usage
      CouchDB API
      ● Use ?group_level to aggregate over structured keys
      ● Emit null, and use ?include_docs to get more data (faster view generation)
      ● Emit more data, so ?include_docs isn’t needed (avoid random I/O on query)
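
What ?group_level does to structured (array) keys can be mimicked locally. The sketch below is only an illustration of the grouping idea with a _sum-style reduce, for levels of 1 or more; it is not how the server computes it, and the sample data is made up.

```python
def group_level_sum(rows, level):
    """Sum values over array keys truncated to their first `level` elements.

    rows is a list of (key, value) pairs where each key is a list,
    e.g. [year, month, day].
    """
    totals = {}
    for key, value in rows:
        group = tuple(key[:level])  # truncate the key to the requested depth
        totals[group] = totals.get(group, 0) + value
    # Return rows in key order, as a view query would.
    return [(list(group), total) for group, total in sorted(totals.items())]


# Hypothetical emitted rows: key = [year, month, day], value = event count.
rows = [([2011, 7, 29], 3), ([2011, 7, 30], 5), ([2011, 8, 1], 2)]
per_month = group_level_sum(rows, 2)
per_year = group_level_sum(rows, 1)
```

One view with [year, month, day] keys therefore answers daily, monthly and yearly questions, depending only on the group level passed at query time.
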
  19. Modeling: Doc Size
      Bundle related info into one document
      ● Fewer items → less caching overhead
      ● Reduce number of requests clients make
      ● Promotes server-side processing with _show functions
      ● More context available for flexible maps
  20. Modeling: Doc Size
      Break up serial items to separate docs
      ● E.g., comments, events, other “feeds”
      ● Each entry is self-described
      ● Avoids write contention on a container
      ● Avoid read/write of container contents just to make a small addition
      ● May be gathered with map/reduce view
  21. Modeling: Key Size
      Use short key values
      ● At the clustering layer, all keys are kept in RAM, tracked for replicas, etc.
      ● 255 bytes max length, but prefer short keys
      ● At CouchDB layer, id is likewise used in many places, and short ids are more efficient
      ● Semantic keys
  22. Modeling: Indexes
      Consider other index types
      ● Full-text integration
      ● Geo-spatial (can be used for non-spatial data, too)
      ● Hadoop connector w/ Couchbase Server (via TAP)
  23. Modeling: K/V Tricks
      Non-obvious models in key/value space
      ● Example: level of indirection to “remove” a bunch of keys without knowing their keys:
      ● Define a master key, e.g. obj_rev: 3
      ● Define subordinate attribute keys with the master value in the key name, e.g. obj_foo-3, obj_bar-3
      ● Increment obj_rev, and rely on TTL to reap stale attribute items
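
The indirection trick can be sketched against a tiny in-memory stand-in for a key/value store. `TinyStore` and the helper functions are hypothetical and exist only so the example runs on its own; against a real cluster the same pattern would use the client's get, set-with-TTL and incr calls.

```python
import time


class TinyStore:
    """In-memory stand-in for a key/value store with TTL (illustration only)."""

    def __init__(self, clock=time.monotonic):
        self._data = {}
        self._clock = clock

    def set(self, key, value, ttl=None):
        expires = None if ttl is None else self._clock() + ttl
        self._data[key] = (value, expires)

    def get(self, key):
        item = self._data.get(key)
        if item is None:
            return None
        value, expires = item
        if expires is not None and self._clock() >= expires:
            del self._data[key]  # lazily reap the expired item
            return None
        return value

    def incr(self, key, delta=1):
        value = (self.get(key) or 0) + delta
        self.set(key, value)
        return value


def attr_key(store, obj, attr):
    """Name an attribute key after the current master revision, e.g. obj_foo-3."""
    return f"{obj}_{attr}-{store.get(obj + '_rev')}"


def write_attr(store, obj, attr, value, ttl=3600):
    store.set(attr_key(store, obj, attr), value, ttl)


def read_attr(store, obj, attr):
    return store.get(attr_key(store, obj, attr))


def invalidate_all(store, obj):
    """Bump the master key; old attribute keys become unreachable at once."""
    store.incr(obj + "_rev")
```

After `invalidate_all`, readers compute new key names and simply miss; the stale items are never deleted explicitly and disappear when their TTL runs out.
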
  24. Diagnostic Stats
      Monitoring Couchbase Server
      ● Ops/sec
      ● RAM usage vs. high/low water marks
      ● Growth of RAM usage (mem_used)
      ● Growth of metadata usage (ep_overhead)
  25. Diagnostic Stats
      Monitoring Couchbase Server
      ● RAM ejections for active/replica data (*eject*)
      ● Cache miss ratio (get_hits vs. ep_bg_fetched)
      ● Disk write queue size (ep_queue_size + flusher_todo)
      ● Disk space available
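
Two of these figures are derived from pairs of raw stats. A small sketch, assuming the stats have already been collected into a dict of name to string value (how they are fetched is left out), and assuming the miss ratio is taken as background fetches divided by hits, which is one reasonable reading of "get_hits vs. ep_bg_fetched":

```python
def cache_miss_ratio(stats):
    """Background disk fetches per cache hit (assumed definition)."""
    hits = int(stats["get_hits"])
    fetched = int(stats["ep_bg_fetched"])
    return fetched / hits if hits else 0.0


def disk_write_queue_size(stats):
    """Items waiting to be persisted: ep_queue_size + flusher_todo."""
    return int(stats["ep_queue_size"]) + int(stats["flusher_todo"])


# Made-up sample values for illustration.
sample = {
    "get_hits": "1000",
    "ep_bg_fetched": "50",
    "ep_queue_size": "1200",
    "flusher_todo": "300",
}
```

Both numbers are most useful as trends: a rising miss ratio suggests the working set no longer fits in RAM, and a queue that keeps growing suggests the disk cannot keep up with writes.
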
  26. Diagnostic Stats
      Error condition stats
      ● Disk write errors (*failed*)
      ● Uptime resets
      ● Out of memory conditions (*oom*)
      ● Swap usage
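
The wildcard patterns on this slide lend themselves to a simple filter over a stats snapshot. The sketch below assumes a dict of stat name to string value; the function names are hypothetical, and the stat names in the usage example are samples chosen to match the patterns rather than a complete list.

```python
from fnmatch import fnmatchcase

ERROR_PATTERNS = ("*failed*", "*oom*")


def nonzero_error_stats(stats, patterns=ERROR_PATTERNS):
    """Return stats whose names match an error pattern and whose value is not 0."""
    return {
        name: int(value)
        for name, value in stats.items()
        if any(fnmatchcase(name, p) for p in patterns) and int(value) != 0
    }


def uptime_was_reset(previous_uptime, current_uptime):
    """An uptime lower than the last sample means the process restarted."""
    return int(current_uptime) < int(previous_uptime)


# Sample snapshot with made-up values.
snapshot = {
    "ep_item_commit_failed": "2",
    "ep_oom_errors": "0",
    "ep_tmp_oom_errors": "14",
    "get_hits": "1000",
}
alerts = nonzero_error_stats(snapshot)
```
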
  27. What’d I Miss
      Before questions, I want assertions.
      That is, you’re smarter than I am
      ● …and you’ve got more experience
      ● What’s the most important tip you know?
      ● What mistakes did I make?
  28. Thank You
      Tim Smith
      tim@couchbase.com
      @couchtim