CouchConf Tokyo 2013_App Development with Documents Indexes and Queries



  • Example: running the mapper and reducer over all of the docs. Caveat: use careful guards to be sure view execution doesn't stop due to unbound variables.
  • First, walk through the options. Then mention Observe.
  • Mention in summary… Also, note that other systems such as Hadoop and ElasticSearch have a lot of merit to add, and that systems like relational databases don't address contemporary needs.


  • 1. App Development: Documents, Indexes and Queries. Matt Ingenthron, Director, Developer Solutions
  • 2. Agenda
      • Introduction to Indexing and Querying in Couchbase
      • The lifecycle of Couchbase Views
      • Indexing and Querying with related documents
      • Patterns
  • 4. Couchbase Server 2.0: Views
      • Views can cover a few different use cases
          – Primary index
          – Simple secondary indexes (the most common)
          – Complex secondary, tertiary and composite indexes
          – Aggregation functions (reduction). Example: count the number of North American Ales
          – Organizing related data
      • Built using Map/Reduce
          – Map function creates a matrix from document fields
          – Reduce function summarizes (reduces) information
          – Written using superfast JavaScript
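The "count the number of North American Ales" example can be sketched as a map function plus a count. This is a minimal illustration, not the code from the slides: the `type` field value `"beer"`, the sample documents and the local `emit()` harness are assumptions made so the sketch runs outside Couchbase. In the server, the view engine supplies `emit()` and the built-in `_count` reduce does the counting.

```javascript
// Local stand-in for the view engine's emit(); Couchbase supplies the real one.
var rows = [];
function emit(key, value) {
  rows.push({ key: key, value: value });
}

// Map function: index each beer by its category (field names are assumptions).
function map(doc, meta) {
  if (doc.type === "beer" && doc.category) {
    emit(doc.category, null);
  }
}

// Hypothetical sample documents, not taken from the slides.
var docs = [
  { id: "beer_1", body: { type: "beer", category: "North American Ale" } },
  { id: "beer_2", body: { type: "beer", category: "Belgian and French Ale" } },
  { id: "beer_3", body: { type: "beer", category: "North American Ale" } },
  { id: "brewery_1", body: { type: "brewery", city: "Juneau" } }
];
docs.forEach(function (d) {
  map(d.body, { id: d.id });
});

// Stand-in for a _count reduce restricted to the key "North American Ale".
var northAmericanAles = rows.filter(function (row) {
  return row.key === "North American Ale";
}).length;

console.log(northAmericanAles);
```

The guard in `map` means documents without a category contribute no rows at all, so the count only ever sees beers.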
  • 5. Querying from Views: Querying from Ruby Client
        blog = c.design_docs["blog"]
        blog.views          #=> ["recent_posts"]
        blog.recent_posts   #=> [#<Couchbase::ViewRow:9855800 @id="hello-world" @key="2009/01/15 15:52:20" @value="Hello World" @doc=nil @meta={} @views=[]>, ...]
        blog.recent_posts.each do |doc|
          # do something
          # with doc object
          doc.key     # gives the key argument of the emit()
          doc.value   # gives the value argument of the emit()
        end
  • 7. View Definition (in JavaScript), like: CREATE INDEX city ON brewery city;
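The slide's code did not survive in this transcript, so the following is only a sketch of what a view playing the role of that `CREATE INDEX` statement might look like. The `type` value `"brewery"`, the sample documents, and the `emit()`/`currentId` harness are assumptions for illustration. The guards reflect the speaker note about unbound variables: documents that lack the field simply produce no row.

```javascript
// Local stand-in for the view engine; Couchbase provides emit() itself.
var rows = [];
var currentId = null;
function emit(key, value) {
  rows.push({ id: currentId, key: key, value: value });
}

// Map function: index breweries by city, skipping documents without one.
function map(doc, meta) {
  if (doc.type === "brewery" && doc.city) {
    emit(doc.city, null);
  }
}

// Hypothetical sample documents.
var docs = [
  { id: "brewery_a", body: { type: "brewery", city: "Portland" } },
  { id: "brewery_b", body: { type: "brewery", city: "Juneau" } },
  { id: "brewery_c", body: { type: "brewery" } },          // no city: skipped by the guard
  { id: "beer_a", body: { type: "beer", city: "Juneau" } } // not a brewery: skipped
];
docs.forEach(function (d) {
  currentId = d.id;
  map(d.body, { id: d.id });
});

// A view index is kept sorted by key; plain string order is used here.
rows.sort(function (a, b) {
  return a.key < b.key ? -1 : a.key > b.key ? 1 : 0;
});

console.log(rows);
```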
  • 8. Distributed Index Build Phase
      • Optimized for lookups, in-order access and aggregations
      • All view reads from disk (different performance profile)
      • View builds against every document on every node
          – This is why you should group them in a design document
      • Automatically kept up to date
      [Diagram: three servers, each holding its own active docs and replica docs]
  • 9. Dynamic Range Queries with Optional Aggregation
      • Efficiently fetch a row or group of related rows.
      • Queries use cached values from B-tree inner nodes when possible
      • Take advantage of in-order tree traversal with group_level queries
      Query: ?startkey="J"&endkey="K"
      Result: {"rows":[{"key":"Juneau","value":null}]}
      [Diagram: three servers, each holding its own active docs and replica docs]
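The range query on this slide can be sketched as follows. The index rows other than Juneau are hypothetical, and plain JavaScript string comparison stands in for the view engine's collation, which is a simplification. Keys in the query string are JSON values, which is why the slide shows them quoted.

```javascript
// Hypothetical index rows, already sorted by key as a view index would be.
var indexRows = [
  { key: "Boston", value: null },
  { key: "Juneau", value: null },
  { key: "Kansas City", value: null },
  { key: "Portland", value: null }
];

// Keys are JSON-encoded in the query string, so "J" travels with its quotes.
function rangeQueryString(startkey, endkey) {
  return "?startkey=" + encodeURIComponent(JSON.stringify(startkey)) +
         "&endkey=" + encodeURIComponent(JSON.stringify(endkey));
}

// Simplified range scan: string comparison stands in for view collation.
function rangeScan(rows, startkey, endkey) {
  return rows.filter(function (row) {
    return row.key >= startkey && row.key <= endkey;
  });
}

var query = rangeQueryString("J", "K");
var result = { rows: rangeScan(indexRows, "J", "K") };

console.log(query);
console.log(JSON.stringify(result));
```

"Kansas City" sorts after "K", so it falls outside the range even though it starts with the end key.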
  • 10. Queries run against stale indexes by default
      • stale=update_after (default if nothing is specified)
          – Always get fastest response
          – Can take two queries to read your own writes
      • stale=ok
          – Auto update will trigger eventually
          – Might not see your own writes for a few minutes
          – Least frequent updates -> least resource impact
      • stale=false
          – Use with persistence observe if data needs to be included in view results
          – But be aware of the delay it adds; only use when really required
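As a small illustration of choosing among these options, the helper below builds the query-string part of a view request. The helper name `viewQueryString` and its options object are hypothetical; only the three `stale` values come from the slide.

```javascript
// The three stale values named on the slide.
var STALE_VALUES = ["update_after", "ok", "false"];

// Hypothetical helper: build the query string for a view request.
// Leaving stale out keeps the default behaviour (update_after).
function viewQueryString(options) {
  var parts = [];
  if (options.stale !== undefined) {
    if (STALE_VALUES.indexOf(options.stale) === -1) {
      throw new Error("unknown stale value: " + options.stale);
    }
    parts.push("stale=" + options.stale);
  }
  if (options.limit !== undefined) {
    parts.push("limit=" + options.limit);
  }
  return parts.length ? "?" + parts.join("&") : "";
}

console.log(viewQueryString({}));                            // default: update_after
console.log(viewQueryString({ stale: "false", limit: 10 })); // wait for the index update
```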
  • 11. Development vs. Production Views
      • Development views index a subset of the data.
      • Publishing a view builds the index across the entire cluster.
      • Queries on production views are scattered to all cluster members, and results are gathered and returned to the client.
  • 12. Emergent Schema
      • Falls out of your key-value usage
      • Helps to know what's efficient
      • Mostly you can relax
      "Capture the user's intent"
      Examples: GitHub API, Twitter API
  • 14. Use a built-in reduce function with a group query
      • Let's find the average abv for each brewery!
  • 15. We are reducing doc.abv with _stats
  • 16. Group reduce (reduce by unique key)
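The three slides above (built-in reduce, `_stats` on `doc.abv`, grouping by unique key) can be sketched together. The `stats()` function below is a local stand-in for the built-in `_stats` reduce, and `groupReduce()` stands in for a grouped query. Both, along with the sample beers other than the field names `brewery` and `abv`, are assumptions for illustration.

```javascript
// Local stand-in for the view engine's emit().
var rows = [];
function emit(key, value) {
  rows.push({ key: key, value: value });
}

// Map function: one row per beer, keyed by brewery, valued by abv.
function map(doc, meta) {
  if (doc.brewery && typeof doc.abv === "number") {
    emit(doc.brewery, doc.abv);
  }
}

// Stand-in for _stats: sum, count, min, max and sum of squares.
function stats(values) {
  var result = { sum: 0, count: 0, min: Infinity, max: -Infinity, sumsqr: 0 };
  values.forEach(function (v) {
    result.sum += v;
    result.count += 1;
    result.min = Math.min(result.min, v);
    result.max = Math.max(result.max, v);
    result.sumsqr += v * v;
  });
  return result;
}

// Stand-in for a grouped query: one reduced row per unique key.
function groupReduce(sortedRows) {
  var groups = [];
  sortedRows.forEach(function (row) {
    var last = groups[groups.length - 1];
    if (last && last.key === row.key) {
      last.values.push(row.value);
    } else {
      groups.push({ key: row.key, values: [row.value] });
    }
  });
  return groups.map(function (g) {
    return { key: g.key, value: stats(g.values) };
  });
}

// Hypothetical sample beers.
var beers = [
  { brewery: "New Belgium Brewing", abv: 5.5 },
  { brewery: "Hypothetical Brewing", abv: 4 },
  { brewery: "New Belgium Brewing", abv: 6.5 },
  { brewery: "Hypothetical Brewing" } // no abv: skipped by the guard
];
beers.forEach(function (doc, i) {
  map(doc, { id: "beer_" + i });
});
rows.sort(function (a, b) {
  return a.key < b.key ? -1 : a.key > b.key ? 1 : 0;
});

var grouped = groupReduce(rows);
grouped.forEach(function (row) {
  // The average is derived by the client from sum and count.
  console.log(row.key, row.value.sum / row.value.count);
});
```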
  • 18. Find patterns in beer comments by time
      { "type": "comment", "about_id": "beer_Enlightened_Black_Ale", "user_id": 525, "text": "tastes like college!", "updated": "2010-07-22 20:00:20" }
      { "id": "u525_c1" }
      [Slide label: timestamp]
  • 19. Query with group_level=2 to get monthly rollups
  • 20. dateToArray() is your friend
      • String or Integer based timestamps
      • Output optimized for group_level queries
      • Array of JSON numbers: [2012,9,21,11,30,44]
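A map function for the comment document shown earlier might use `dateToArray()` as sketched below. Inside Couchbase the function is built in; the version here is a simplified stand-in that only handles "YYYY-MM-DD hh:mm:ss" strings, and the `emit()` harness is likewise an assumption so the sketch runs on its own.

```javascript
// Local stand-in for the view engine's emit().
var rows = [];
function emit(key, value) {
  rows.push({ key: key, value: value });
}

// Simplified stand-in for the built-in dateToArray(); handles only
// "YYYY-MM-DD hh:mm:ss" strings by pulling out the runs of digits.
function dateToArray(timestamp) {
  return timestamp.match(/\d+/g).map(Number);
}

// Map function: key each comment by its timestamp as an array.
function map(doc, meta) {
  if (doc.type === "comment" && doc.updated) {
    emit(dateToArray(doc.updated), null);
  }
}

// The comment document from the earlier slide.
map({
  type: "comment",
  about_id: "beer_Enlightened_Black_Ale",
  user_id: 525,
  text: "tastes like college!",
  updated: "2010-07-22 20:00:20"
}, { id: "u525_c1" });

console.log(JSON.stringify(rows[0].key));
```

Because the key is an array ordered from year down to second, a prefix of it identifies a year, a month, a day and so on.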
  • 21. group_level=2 results
      • Monthly rollup
      • Sorted by time. Sort the query results in your application if you want to rank by value. No chained map-reduce.
  • 22. group_level=3: daily results, great for graphing
      • Daily, hourly, minute or second rollup all possible with the same index.
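How one index serves both monthly and daily rollups can be sketched by truncating each array key to its first `level` elements and counting. The `groupLevel()` function is a local stand-in for a `group_level` query with a count reduce, and the index rows are hypothetical.

```javascript
// Hypothetical index rows keyed by dateToArray() output, sorted by key.
var indexRows = [
  { key: [2010, 7, 22, 20, 0, 20], value: null },
  { key: [2010, 7, 22, 21, 15, 0], value: null },
  { key: [2010, 7, 23, 9, 30, 0], value: null },
  { key: [2010, 8, 1, 12, 0, 0], value: null }
];

// Stand-in for a group_level query with a count reduce: truncate each key
// to its first `level` elements and count the rows sharing that prefix.
function groupLevel(rows, level) {
  var counts = new Map();
  rows.forEach(function (row) {
    var prefix = JSON.stringify(row.key.slice(0, level));
    counts.set(prefix, (counts.get(prefix) || 0) + 1);
  });
  var result = [];
  counts.forEach(function (count, prefix) {
    result.push({ key: JSON.parse(prefix), value: count });
  });
  return result;
}

var monthly = groupLevel(indexRows, 2); // [year, month]
var daily = groupLevel(indexRows, 3);   // [year, month, day]

console.log(JSON.stringify(monthly));
console.log(JSON.stringify(daily));
```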
  • 24. Aggregate value stored in a document
      • Let's find the top-rated beers!
      { "brewery": "New Belgium Brewing", "name": "1554 Enlightened Black Ale", "abv": 5.5, "description": "Born of a flood...", "category": "Belgian and French Ale", "style": "Other Belgian-Style Ales", "updated": "2010-07-22 20:00:20", "ratings": { "ingenthr": 5, "jchris": 4, "scalabl3": 5, "damienkatz": 1 } }
      [Slide label: ratings]
  • 25. Sort each beer by its average rating
      • Let's find the top-rated beers!
      [Slide label: average]
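The slide's code is not in this transcript, so here is one way such a view could be sketched: the map function averages the values in `doc.ratings` and emits that average as the key, so the index itself is ordered by rating. The `emit()` harness, the final sort (standing in for a descending query) and the second and third sample beers are assumptions.

```javascript
// Local stand-in for the view engine's emit().
var rows = [];
function emit(key, value) {
  rows.push({ key: key, value: value });
}

// Map function: key each beer by the average of its ratings.
function map(doc, meta) {
  if (!doc.ratings) { return; }
  var users = Object.keys(doc.ratings);
  if (users.length === 0) { return; }
  var total = 0;
  users.forEach(function (user) {
    total += doc.ratings[user];
  });
  emit(total / users.length, doc.name);
}

var beers = [
  // Ratings from the slide's document.
  { name: "1554 Enlightened Black Ale",
    ratings: { ingenthr: 5, jchris: 4, scalabl3: 5, damienkatz: 1 } },
  // Hypothetical documents.
  { name: "Hypothetical Pale Ale", ratings: { alice: 5, bob: 4 } },
  { name: "Unrated Stout" } // no ratings: skipped by the guard
];
beers.forEach(function (doc, i) {
  map(doc, { id: "beer_" + i });
});

// Stand-in for querying the view in descending key order (top-rated first).
rows.sort(function (a, b) {
  return b.key - a.key;
});

console.log(rows);
```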
  • 27. Join Through Collation. See Bradley Holt's presentation from CouchConf Boston:
  • 28. Anti-patterns
      • Emitting document or too much data into a view
          – Especially avoid including the doc itself in an emit() call
      • Reduces that don't reduce
          – If you implement a custom reduce, make sure it doesn't expand!
      • Expecting a query on an index to be as fast
          – Secondary indexes need to be built, happen asynchronously, and are cached at the filesystem level
      • Trying to do too much with one view
          – Instead, co-locate views in design documents, or have separate design documents
      • Note that sometimes, you may need to make requests of multiple views
          – There is not directly a method of doing a join, but there is a technique
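The first anti-pattern can be made concrete by comparing the size of an index row that carries the whole document with one that carries no value. Everything here is illustrative: the brewery document is hypothetical, and `emit` is passed in as a parameter only because this sketch has no view engine to supply it as a global.

```javascript
// Hypothetical brewery document with a long description field.
var brewery = {
  type: "brewery",
  name: "Hypothetical Brewing",
  city: "Juneau",
  description: new Array(101).join("filler ")
};

// Run a map function over one document and collect the rows it emits.
function collect(mapFn, doc, id) {
  var out = [];
  mapFn(doc, { id: id }, function (key, value) {
    out.push({ id: id, key: key, value: value });
  });
  return out;
}

// Anti-pattern: the whole document is copied into the index.
function heavyMap(doc, meta, emit) {
  emit(doc.city, doc);
}

// Preferred: emit only the key; the row's id is enough to fetch the document.
function leanMap(doc, meta, emit) {
  emit(doc.city, null);
}

var heavyBytes = JSON.stringify(collect(heavyMap, brewery, "brewery_x")).length;
var leanBytes = JSON.stringify(collect(leanMap, brewery, "brewery_x")).length;

console.log("row with doc:", heavyBytes, "lean row:", leanBytes);
```

Each row still carries the document id, so the application can fetch the full document by key when it actually needs it.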
  • 30. Integration with ElasticSearch
      1. ElasticSearch Query
      2. ElasticSearch Result
      3. Couchbase Multi-GET
      4. Couchbase Result
  • 31. The Learning Portal
      • Designed and built as a collaboration between MHE Labs and Couchbase
      • Serves as proof-of-concept and testing harness for Couchbase + ElasticSearch integration
      • Available for download and further development as open source code
  • 32. Integration with Hadoop
      [Diagram: Ad Targeting Platform, Couchbase Server Cluster and Hadoop Cluster, connected by logs (flume flow), sqoop import and sqoop export]
  • 33. In Summary
      • Couchbase has Views for Indexing and Querying: Views are incremental map-reduce code that run across all documents.
      • Views Allow Common Methods of Querying: Common patterns such as simple secondary indexes, count and average aggregations, and time series rollups are simple and fast.
      • Couchbase Integrates for Full Text and Large Analytics: Couchbase integrates with ElasticSearch, Hadoop and other systems.
  • 34. Q&A
  • 35. THANKS!