Published in: Technology, Business

  • Example: running the mapper and reducer over all of the docs. Caveat: use careful guards to be sure view execution doesn’t stop due to unbound variables.
  • First, walk through the options. Then mention Observe.
  • Mention in summary… Also, note there are other systems, such as Hadoop and ElasticSearch, that have a lot of merit to add, and there are systems like relational databases that don’t address contemporary needs.
  • Couchbase_UK_2013_App_Dev_with_Indexes_Queries_Geo

    1. 1. App Development with Indexes, Queries and Geo. Michael Nitschinger, Developer Advocate
    2. 2. Agenda • Introduction to Indexing and Querying in Couchbase • The lifecycle of Couchbase Views • Indexing and Querying with related documents • Patterns
    3. 3. Indexing and Querying
    4. 4. Couchbase Server 2.0: Views • Views can cover a few different use cases - Simple secondary indexes (the most common) - Complex secondary, tertiary and composite indexes - Aggregation functions (reduction), for example: count the number of North American Ales - Organizing related data • Built using Map/Reduce - Map function creates a matrix from document fields - Reduce function summarizes (reduces) information - Written using superfast JavaScript (Google V8)
    5. 5. Querying from Views Querying from Ruby Client
    6. 6. View Lifecycle: Define - Build - Query
    7. 7. View Definition (in JavaScript) like: CREATE INDEX city ON;
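The code on this slide was not captured in the transcript. As a minimal sketch only, a map function for a "city" index might look like the following, assuming documents carry a `city` field. In Couchbase, `emit()` is supplied by the server at index time; it is stubbed here so the sketch runs on its own, and the sample documents are hypothetical.

```javascript
// Stand-in for the emit() that Couchbase supplies while building a view.
const rows = [];
function emit(key, value) {
  rows.push({ key: key, value: value });
}

// Map function in the Couchbase 2.0 style: called once per document.
// The guard skips documents without a city, so the view never trips
// over a missing field.
function map(doc, meta) {
  if (doc.city) {
    emit(doc.city, null);
  }
}

// Hypothetical sample documents, for illustration only.
const docs = [
  { id: "c1", body: { city: "Juneau" } },
  { id: "c2", body: { name: "no city here" } },
  { id: "c3", body: { city: "Austin" } }
];
docs.forEach(function (d) { map(d.body, { id: d.id }); });

console.log(JSON.stringify(rows));
```

Emitting `null` as the value keeps the index small; the document itself can be fetched by id when needed.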
    8. 8. Distributed Index Build Phase • Optimized for lookups, in-order access and aggregations • All view reads come from disk (different performance profile) • View builds run against every document on every node • Automatically kept up to date (on writes and reads) [Diagram: three servers, each holding active docs and replica docs; Docs 1-9 are spread across the servers]
    9. 9. Dynamic Range Queries with Optional Aggregation • Efficiently fetch a row or group of related rows. • Queries use cached values from B-tree inner nodes when possible • Take advantage of in-order tree traversal with group_level queries ?startkey="J"&endkey="K" {"rows":[{"key":"Juneau","value":null}]} [Diagram: three servers, each holding active docs and replica docs; Docs 1-9 are spread across the servers]
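A rough sketch of what the `startkey`/`endkey` range above selects. The index rows are hypothetical, and plain string comparison is an assumption made for brevity: Couchbase orders keys by Unicode collation and walks its B-tree rather than scanning every row.

```javascript
// Hypothetical index rows, already sorted by key as a view index would be.
const indexRows = [
  { key: "Austin", value: null },
  { key: "Juneau", value: null },
  { key: "Kodiak", value: null },
  { key: "Seattle", value: null }
];

// Return the rows whose key falls in [startkey, endkey].
// Plain string comparison stands in for the server's collation rules.
function rangeQuery(sortedRows, startkey, endkey) {
  return sortedRows.filter(function (row) {
    return row.key >= startkey && row.key <= endkey;
  });
}

const result = { rows: rangeQuery(indexRows, "J", "K") };
console.log(JSON.stringify(result));
```

"Kodiak" sorts after "K", so only keys beginning with "J" fall inside the range.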
    10. 10. Queries run against stale indexes by default • stale = UPDATE_AFTER (default if nothing is specified) - always get the fastest response - can take two queries to read your own writes • stale = OK - auto update will trigger eventually - might not see your own writes for a few minutes - least frequent updates -> least resource impact • stale = FALSE - use with persistence observe if data needs to be included in view results - but be aware of the delay it adds; only use when really required
    11. 11. Development vs. Production Views • Development views index a subset of the data. • Publishing a view builds the index across the entire cluster. • Queries on production views are scattered to all cluster members and results are gathered and returned to the client.
    12. 12. Emergent Schema • Falls out of your key-value usage • Helps to know what's efficient • Deal with unstructured data more easily - Different schemas/APIs "Capture the user's intent" Github API
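A small sketch of how the stale option could be attached to a view request. It assumes the REST view endpoint's path layout and its lowercase parameter values (`update_after`, `ok`, `false`); the bucket, design document and view names are hypothetical, and SDKs expose the same choice through their own constants.

```javascript
// Build the path and query string for a view request.
// Bucket, design document and view names below are hypothetical.
function viewPath(bucket, designDoc, view, params) {
  const query = Object.keys(params)
    .map(function (name) {
      return encodeURIComponent(name) + "=" + encodeURIComponent(params[name]);
    })
    .join("&");
  return "/" + bucket + "/_design/" + designDoc + "/_view/" + view +
    (query ? "?" + query : "");
}

// Fastest response; the index update is triggered after the query returns.
const fastest = viewPath("beer-sample", "beer", "by_city", { stale: "update_after" });

// Waits for the index to catch up before answering; slower, use sparingly.
const consistent = viewPath("beer-sample", "beer", "by_city", { stale: "false" });

console.log(fastest);
console.log(consistent);
```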
    13. 13. Query Pattern: Basic Aggregations
    14. 14. Simple secondary index • Let's find the average abv for each brewery!
    15. 15. Aggregation: Reducing doc.abv with _stats
    16. 16. Group reduce (reduce by unique key)
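The three slides above showed code that the transcript did not capture. A sketch of the idea, assuming beer documents with `type`, `brewery_id` and `abv` fields: the map emits one row per beer, and a local `stats()` function stands in for the built-in `_stats` reduce, applied once per unique key as a grouped query would. The sample documents are hypothetical.

```javascript
// Stand-in for the emit() that Couchbase supplies while building a view.
const rows = [];
function emit(key, value) {
  rows.push({ key: key, value: value });
}

// Map: one row per beer, keyed by brewery, valued by abv.
// The field names are assumptions about the document layout.
function map(doc, meta) {
  if (doc.type === "beer" && doc.brewery_id && typeof doc.abv === "number") {
    emit(doc.brewery_id, doc.abv);
  }
}

// Local stand-in for the built-in _stats reduce.
function stats(values) {
  return values.reduce(function (acc, v) {
    return {
      sum: acc.sum + v,
      count: acc.count + 1,
      min: Math.min(acc.min, v),
      max: Math.max(acc.max, v),
      sumsqr: acc.sumsqr + v * v
    };
  }, { sum: 0, count: 0, min: Infinity, max: -Infinity, sumsqr: 0 });
}

// Group reduce: one stats object per unique key.
function groupReduce(mapRows) {
  const groups = {};
  mapRows.forEach(function (row) {
    (groups[row.key] = groups[row.key] || []).push(row.value);
  });
  return Object.keys(groups).sort().map(function (key) {
    return { key: key, value: stats(groups[key]) };
  });
}

// Hypothetical sample documents.
const docs = [
  { type: "beer", brewery_id: "alpha_brewing", abv: 5.0 },
  { type: "beer", brewery_id: "alpha_brewing", abv: 7.0 },
  { type: "beer", brewery_id: "beta_brewing", abv: 4.0 },
  { type: "brewery", name: "Alpha Brewing" }
];
docs.forEach(function (doc) { map(doc, {}); });

const grouped = groupReduce(rows);
grouped.forEach(function (row) {
  console.log(row.key, "average abv:", row.value.sum / row.value.count);
});
```

The average is not stored in the index; the client derives it from `sum` and `count`.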
    17. 17. Query Pattern: Time Based Rollups
    18. 18. Find patterns in beer comments by time { "type": "comment", "about_id": "beer_Enlightened_Black_Ale", "user_id": 525, "text": "tastes like college!", "updated": "2010-07-22 20:00:20" } { "id": "u525_c1" } [callout: "updated" is the timestamp]
    19. 19. Query with group_level=2 to get monthly rollups
    20. 20. dateToArray() is your friend • String or Integer based timestamps • Output optimized for group_level queries • Array of JSON numbers: [2012,9,21,11,30,44]
    21. 21. group_level=2 results • Monthly rollup • Sorted by time; sort the query results in your application if you want to rank by value (no chained map-reduce)
    22. 22. group_level=3 - daily results - great for graphing • Daily, hourly, minute or second rollup all possible with the same index.
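A sketch of the time-based rollup described in the slides above. `dateToArraySketch()` is a local stand-in for the built-in `dateToArray()` and only handles the "YYYY-MM-DD HH:MM:SS" strings used in the comment document; `rollup()` mimics a `_count` reduce queried with a given `group_level`. The sample comments are hypothetical apart from the first timestamp.

```javascript
// Stand-in for the emit() that Couchbase supplies while building a view.
const rows = [];
function emit(key, value) {
  rows.push({ key: key, value: value });
}

// Local stand-in for the built-in dateToArray(); handles only
// "YYYY-MM-DD HH:MM:SS" strings.
function dateToArraySketch(text) {
  return text.split(/[- :]/).map(Number);
}

// Map: key each comment by its timestamp, broken into an array.
function map(doc, meta) {
  if (doc.type === "comment" && doc.updated) {
    emit(dateToArraySketch(doc.updated), null);
  }
}

// Count rows per key prefix, mimicking _count with group_level.
function rollup(mapRows, groupLevel) {
  const counts = new Map();
  mapRows.forEach(function (row) {
    const prefix = JSON.stringify(row.key.slice(0, groupLevel));
    counts.set(prefix, (counts.get(prefix) || 0) + 1);
  });
  return Array.from(counts, function (entry) {
    return { key: JSON.parse(entry[0]), value: entry[1] };
  });
}

// Sample comments, listed in time order as the index would return them.
const comments = [
  { type: "comment", updated: "2010-07-22 20:00:20" },
  { type: "comment", updated: "2010-07-23 09:15:00" },
  { type: "comment", updated: "2010-08-01 12:00:00" }
];
comments.forEach(function (doc) { map(doc, {}); });

const monthly = rollup(rows, 2); // group_level=2
const daily = rollup(rows, 3);   // group_level=3
console.log(JSON.stringify(monthly));
console.log(JSON.stringify(daily));
```

The same emitted keys serve every granularity; only the `group_level` passed at query time changes.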
    23. 23. Query Pattern: Leaderboard
    24. 24. Aggregate value stored in a document • Let's find the top-rated beers! { "brewery": "New Belgium Brewing", "name": "1554 Enlightened Black Ale", "style": "Other Belgian-Style Ales", "updated": "2010-07-22 20:00:20", "ratings": { "ingenthr": 5, "jchris": 4, "scalabl3": 5, "damienkatz": 1 }, "comments": [ "f1e62", "6ad8c" ] }
    25. 25. Sort each beer by its average rating • Let's find the top-rated beers! [callout: average]
    26. 26. Query Pattern: Collation of Related Docs
    27. 27. Join Through Collation. See Bradley Holt’s presentation from CouchConf Boston:
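The map function from this slide is missing from the transcript. One way it could work, as a sketch: compute the average of the `ratings` object inside the map and emit it as the key, so the index is ordered by rating. The top-N step below imitates a query with `descending=true` and a `limit`. The second beer is hypothetical.

```javascript
// Stand-in for the emit() that Couchbase supplies while building a view.
const rows = [];
function emit(key, value) {
  rows.push({ key: key, value: value });
}

// Average of the per-user scores, or null when there are none.
function averageRating(ratings) {
  const scores = Object.keys(ratings).map(function (user) {
    return ratings[user];
  });
  if (scores.length === 0) {
    return null;
  }
  const total = scores.reduce(function (a, b) { return a + b; }, 0);
  return total / scores.length;
}

// Map: key each beer by its average rating.
function map(doc, meta) {
  if (doc.ratings) {
    const avg = averageRating(doc.ratings);
    if (avg !== null) {
      emit(avg, doc.name);
    }
  }
}

const beers = [
  { name: "1554 Enlightened Black Ale",
    ratings: { ingenthr: 5, jchris: 4, scalabl3: 5, damienkatz: 1 } },
  { name: "Hypothetical Pale Ale", ratings: { alice: 5, bob: 4 } },
  { name: "Unrated Beer" }
];
beers.forEach(function (doc) { map(doc, {}); });

// Imitates querying the view with descending=true and limit=2.
const top = rows.slice()
  .sort(function (a, b) { return b.key - a.key; })
  .slice(0, 2);
console.log(JSON.stringify(top));
```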
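A sketch of the collation technique, not taken from the referenced presentation: emit array keys so that a beer row and the comments about it sort next to each other, then read them back with one range query. The `type` and `about_id` fields follow the comment document shown earlier; the beer document and the helper names are assumptions.

```javascript
// Stand-in for the emit() that Couchbase supplies while building a view.
const rows = [];
function emit(key, value) {
  rows.push({ key: key, value: value });
}

// Map: [beerId, 0] for the beer itself, [beerId, 1, time] for its comments.
function map(doc, meta) {
  if (doc.type === "beer") {
    emit([meta.id, 0], doc.name);
  } else if (doc.type === "comment" && doc.about_id) {
    emit([doc.about_id, 1, doc.updated], doc.text);
  }
}

// Element-by-element comparison; a shorter array sorts first on a tie.
// This simplification is enough here because each position holds one type.
function compareKeys(a, b) {
  const n = Math.min(a.length, b.length);
  for (let i = 0; i < n; i++) {
    if (a[i] < b[i]) return -1;
    if (a[i] > b[i]) return 1;
  }
  return a.length - b.length;
}

const docs = [
  { id: "u525_c1", body: { type: "comment", about_id: "beer_Enlightened_Black_Ale",
      text: "tastes like college!", updated: "2010-07-22 20:00:20" } },
  { id: "beer_Enlightened_Black_Ale", body: { type: "beer",
      name: "1554 Enlightened Black Ale" } },
  { id: "beer_Other", body: { type: "beer", name: "Hypothetical Other Beer" } }
];
docs.forEach(function (d) { map(d.body, { id: d.id }); });

// Imitates startkey=[id] and endkey=[id, {}] against the sorted index.
function beerWithComments(beerId) {
  return rows.slice()
    .sort(function (a, b) { return compareKeys(a.key, b.key); })
    .filter(function (row) { return row.key[0] === beerId; });
}

const joined = beerWithComments("beer_Enlightened_Black_Ale");
console.log(JSON.stringify(joined));
```

The beer row always comes first, followed by its comments in time order, so the client can assemble the pair from a single response.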
    28. 28. Anti-patterns • Emitting the document or too much data into a view - Especially avoid including the doc itself in an emit() call • Reduces that don’t reduce - If you implement a custom reduce, make sure it doesn’t expand! • Expecting a query on an index to be as fast as a direct key lookup - Secondary indexes need to be built, are updated asynchronously, and are (currently) cached at the filesystem level • Trying to do too much with one view - Instead, co-locate views in design documents, or have separate design documents • Note that sometimes you may need to make requests to multiple views - There is no direct method of doing a join, but there is a technique
    29. 29. What about Geo? • Experimental in the 2.0 release • Currently being completely rewritten internally • Supports GeoJSON, will support richer queries soon. • Java SDK contains Geo support right now!
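To illustrate "reduces that don't reduce", a sketch contrasting a custom reduce whose output grows with its input against one that returns a fixed-size result. Both use the `(keys, values, rereduce)` signature; the function names and sample values are hypothetical, and where it fits a built-in reduce such as `_count` is the simpler choice.

```javascript
// Anti-pattern: the output is as large as the input, so nothing is reduced.
function expandingReduce(keys, values, rereduce) {
  return values;
}

// Better: a single number, whatever the input size. On rereduce the
// values are partial counts, so they are summed rather than counted.
function countReduce(keys, values, rereduce) {
  if (rereduce) {
    return values.reduce(function (a, b) { return a + b; }, 0);
  }
  return values.length;
}

// Two partial reductions over different slices of the index, then a
// rereduce that combines them.
const partialA = countReduce(null, [null, null, null], false);
const partialB = countReduce(null, [null, null], false);
const total = countReduce(null, [partialA, partialB], true);

console.log(total);
console.log(expandingReduce(null, [1, 2, 3], false).length);
```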
    30. 30. Couchbase Integration
    31. 31. Integration with ElasticSearch: 1. ElasticSearch Query 2. ElasticSearch Result 3. Couchbase Multi-GET 4. Couchbase Result
    32. 32. The Learning Portal • Designed and built as a collaboration between MHE Labs and Couchbase • Serves as proof-of-concept and testing harness for Couchbase + ElasticSearch integration • Available for download and further development as open source code
    33. 33. Integration with Hadoop [Diagram labels: Ad Targeting Platform, Logs, Couchbase Server Cluster, Hadoop Cluster, sqoop import, sqoop export, flume flow]
    34. 34. Summary • Couchbase has Views for Indexing and Querying: Views are incremental map-reduce code that runs across all documents. • Views Allow Common Methods of Querying: Common patterns such as simple secondary indexes, count and average aggregations, and time series rollups are simple and fast. • Couchbase Integrates for Full Text and Large Analytics: Couchbase integrates with ElasticSearch, Hadoop and other systems.
    35. 35. Q&A
    36. 36. Thanks!