CCB12 App Development with Indexes, Queries and Geo

  • Example: running the mapper and reducer over all of the docs. Caveat: use careful guards to be sure view execution doesn’t stop due to unbound variables.
  • First, walk through the options. Then mention Observe.

    1. Developing with Views: See Inside the Data
       Matt Ingenthron, Director, Developer Solutions
    2. What we’ll talk about
       • Lifecycle of a view
       • Index definition, build, and query phase
       • Consistency options (async by default)
       • Emergent Schema - Views and Documents
       • Patterns:
         • Secondary index
         • Basic aggregations (avg ratings by brewery)
         • Time-based analytics with group_level
         • Leaderboard
         • Schema Evolution
    3. VIEW LIFECYCLE: DEFINE - BUILD - QUERY
    4. View Definition (in JavaScript)
       like: CREATE INDEX city ON brewery city;
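
The JavaScript from this slide is not preserved in the transcript. As a minimal sketch, a map function playing the role of the SQL-like index above might look like the following. It assumes brewery documents carry `type` and `city` fields (both assumed here). On the server, `emit` is supplied by the view engine; the `emit` function and the loop that feeds documents to the map are local stand-ins so the sketch runs on its own.

```javascript
// Map function in the (doc, meta) shape used by Couchbase views.
// Assumption: brewery documents have type === "brewery" and a "city" field.
function cityMap(doc, meta) {
  if (doc.type === "brewery" && doc.city) {
    emit(doc.city, null); // key = city; no value needed for a plain index
  }
}

// --- Local stand-ins (not part of Couchbase) so the sketch is runnable ---
const rows = [];
let currentId = null;
function emit(key, value) {
  rows.push({ id: currentId, key: key, value: value });
}

// Made-up sample documents.
const docs = [
  { id: "b1", body: { type: "brewery", city: "Juneau" } },
  { id: "b2", body: { type: "brewery", city: "Boston" } },
  { id: "x1", body: { type: "beer", name: "No city here" } },
];

for (const d of docs) {
  currentId = d.id;
  cityMap(d.body, { id: d.id });
}

// An index keeps its rows ordered by key; a plain string sort stands in for that.
rows.sort((a, b) => (a.key < b.key ? -1 : a.key > b.key ? 1 : 0));
console.log(rows);
```

Only the two brewery documents produce rows, and the rows come back ordered by city.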
    5. Distributed Index Build Phase
       • Optimized for lookups, in-order access and aggregations
       • All view reads come from disk (different performance profile)
       • View builds run against every document on every node
         – This is why you should group views in a design document
       • Automatically kept up to date
       [Diagram: three servers, each holding active docs and replica docs]
    6. Dynamic Range Queries with Optional Aggregation
       • Efficiently fetch a row or group of related rows.
       • Queries use cached values from B-tree inner nodes when possible
       • Take advantage of in-order tree traversal with group_level queries
       ?startkey="J"&endkey="K"
       {"rows":[{"key":"Juneau","value":null}]}
       [Diagram: three servers, each holding active docs and replica docs]
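
Key parameters such as `startkey` and `endkey` are JSON values, which is why the strings in the query above carry their own quotes. A small sketch of building such a query string follows. The host, bucket, design document and view names in `base` are placeholders, and `viewQueryUrl` is a hypothetical helper, not a client library call.

```javascript
// Build a view query URL. Key parameters are JSON-encoded first,
// then URL-encoded, so the string "J" travels as %22J%22.
function viewQueryUrl(base, params) {
  const parts = Object.keys(params).map(
    (name) => encodeURIComponent(name) + "=" + encodeURIComponent(params[name])
  );
  return base + "?" + parts.join("&");
}

// Placeholder endpoint: host, bucket, design document and view are assumptions.
const base =
  "http://localhost:8092/beer-sample/_design/breweries/_view/by_city";

const url = viewQueryUrl(base, {
  startkey: JSON.stringify("J"),
  endkey: JSON.stringify("K"),
});
console.log(url);
```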
    7. Queries run against stale indexes by default
       • stale=update_after (default if nothing is specified)
         – always get fastest response
         – can take two queries to read your own writes
       • stale=ok
         – auto update will trigger eventually
         – might not see your own writes for a few minutes
         – least frequent updates -> least resource impact
       • stale=false
         – Use with Persistence observe if data needs to be included in view results
         – But be aware of the delay it adds; only use it when really required
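
The difference between the three modes is easiest to see in a toy model. The sketch below is an in-memory stand-in, not Couchbase code: `ToyView` and its methods are made up for illustration, and the automatic background update that eventually refreshes a `stale=ok` index is left out.

```javascript
// Toy in-memory model of the stale modes. This is NOT Couchbase code;
// ToyView and its methods are made up for illustration.
class ToyView {
  constructor() {
    this.docs = [];  // keys that have been "stored"
    this.index = []; // keys the index has caught up with
  }
  store(key) {
    this.docs.push(key);
  }
  rebuild() {
    this.index = this.docs.slice().sort();
  }
  query(stale = "update_after") {
    if (stale === "false") this.rebuild();        // update first, then answer
    const result = this.index.slice();
    if (stale === "update_after") this.rebuild(); // answer first, then update
    return result;                                // "ok" triggers no update in this model
  }
}

const view = new ToyView();
view.store("Juneau");
const first = view.query();              // answered from the not-yet-updated index
const second = view.query();             // the write shows up on the second query
view.store("Boston");
const okResult = view.query("ok");       // still the old index
const freshResult = view.query("false"); // index is updated before answering
console.log(first, second, okResult, freshResult);
```

In this model the first default query misses the write and the second one sees it, which is the "two queries to read your own writes" behaviour described above.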
    8. Development vs. Production Views
       • Development views index a subset of the data.
       • Publishing a view builds the index across the entire cluster.
       • Queries on production views are scattered to all cluster members and results are gathered and returned to the client.
    9. EMERGENT SCHEMA
    10. Emergent Schema
        • Falls out of your key-value usage
        • Helps to know what’s efficient
        • Mostly you can relax
        "Capture the user’s intent"
        JSON.org, GitHub API, Twitter API
    11. QUERY PATTERN: FIND BY ATTRIBUTE
    12. Find documents by a specific attribute
        • Let’s find beers by brewery_id!
    13. The index definition
    14. The result set: beers keyed by brewery_id
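
The index definition and result set were screenshots and did not survive in the transcript. A sketch of what a find-by-attribute view could look like is given below. The `type` field, the document IDs and the sample data are assumptions; `emit` and the driver loop are local stand-ins for the view engine.

```javascript
// Secondary index: one row per beer, keyed by its brewery_id.
// Assumption: beer documents have type === "beer" and a brewery_id field.
function byBreweryId(doc, meta) {
  if (doc.type === "beer" && doc.brewery_id) {
    emit(doc.brewery_id, null);
  }
}

// --- Local stand-ins (not part of Couchbase) ---
const rows = [];
let currentId = null;
function emit(key, value) {
  rows.push({ id: currentId, key: key, value: value });
}

// Made-up sample documents, keyed by document ID.
const docs = {
  beer_a1: { type: "beer", name: "Sample Ale", brewery_id: "brewery_a" },
  beer_a2: { type: "beer", name: "Sample Stout", brewery_id: "brewery_a" },
  beer_b1: { type: "beer", name: "Other Lager", brewery_id: "brewery_b" },
  brewery_a: { type: "brewery", name: "Brewery A" },
};

for (const id of Object.keys(docs)) {
  currentId = id;
  byBreweryId(docs[id], { id: id });
}

// Equivalent of querying the view with key="brewery_a".
const hits = rows
  .filter((row) => row.key === "brewery_a")
  .map((row) => row.id)
  .sort();
console.log(hits);
```

The row IDs can then be used to fetch the full beer documents through the normal key-value path.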
    15. QUERY PATTERN: BASIC AGGREGATIONS
    16. Use a built-in reduce function with a group query
        • Let’s find average abv for each brewery!
    17. We are reducing doc.abv with _stats
    18. Group reduce (reduce by unique key)
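
The built-in `_stats` reduce returns an object with `sum`, `count`, `min`, `max` and `sumsqr`, so the average is `sum / count` computed by the application. The sketch below imitates a grouped query locally. `statsStandIn`, `emit`, the `type` field and the sample beers are stand-ins and assumptions, not server code.

```javascript
// Map: emit each beer's abv under its brewery_id.
function abvMap(doc, meta) {
  if (doc.type === "beer" && typeof doc.abv === "number") {
    emit(doc.brewery_id, doc.abv);
  }
}

// Local stand-in producing the same fields as the built-in _stats reduce.
function statsStandIn(values) {
  let sum = 0, min = Infinity, max = -Infinity, sumsqr = 0;
  for (const v of values) {
    sum += v;
    sumsqr += v * v;
    if (v < min) min = v;
    if (v > max) max = v;
  }
  return { sum: sum, count: values.length, min: min, max: max, sumsqr: sumsqr };
}

// --- Local stand-ins (not part of Couchbase) ---
const rows = [];
function emit(key, value) {
  rows.push({ key: key, value: value });
}

// Made-up sample beers.
const beers = [
  { type: "beer", brewery_id: "brewery_a", abv: 5.5 },
  { type: "beer", brewery_id: "brewery_a", abv: 6.5 },
  { type: "beer", brewery_id: "brewery_b", abv: 4.0 },
  { type: "beer", brewery_id: "brewery_b" }, // no abv: skipped by the guard
];
for (const beer of beers) abvMap(beer, {});

// Group reduce: one reduction per unique key.
const valuesByKey = new Map();
for (const row of rows) {
  if (!valuesByKey.has(row.key)) valuesByKey.set(row.key, []);
  valuesByKey.get(row.key).push(row.value);
}

const averages = {};
for (const [key, values] of valuesByKey) {
  const stats = statsStandIn(values);
  averages[key] = stats.sum / stats.count; // the average is derived client-side
}
console.log(averages);
```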
    19. QUERY PATTERN: TIME-BASED ROLLUPS
    20. Find patterns in beer comments by time
        Document:
          { "type": "comment", "about_id": "beer_Enlightened_Black_Ale", "user_id": 525, "text": "tastes like college!", "updated": "2010-07-22 20:00:20" }
        Metadata:
          { "id": "f1e62" }
        [Callout: "timestamp", on the "updated" field]
    21. Query with group_level=2 to get monthly rollups
    22. dateToArray() is your friend
        • String or Integer based timestamps
        • Output optimized for group_level queries
        • array of JSON numbers:
    23. group_level=2 results
        • Monthly rollup
        • Sorted by time; sort the query results in your application if you want to rank by value (no chained map-reduce)
    24. group_level=3 - daily results - great for graphing
        • Daily, hourly, minute or second rollup all possible with the same index.
        • http://crate.im/posts/couchbase-views-reddit-data/
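
The rollup works because array keys sort element by element, so `group_level=N` can merge all rows that share the first N key elements. The sketch below imitates this locally. `dateToArrayStandIn` is a simplified stand-in for the built-in `dateToArray()` that only handles the "YYYY-MM-DD HH:MM:SS" format shown in the comment document. Counting rows per group assumes a `_count`-style reduce, which the transcript does not confirm. The sample timestamps other than the first are made up.

```javascript
// Simplified stand-in for dateToArray(): "YYYY-MM-DD HH:MM:SS" -> array of numbers.
function dateToArrayStandIn(text) {
  var m = /^(\d{4})-(\d{2})-(\d{2}) (\d{2}):(\d{2}):(\d{2})$/.exec(text);
  return m ? m.slice(1).map(Number) : null;
}

// Map: key each comment by its timestamp array.
function commentsByTimeMap(doc, meta) {
  if (doc.type === "comment" && doc.updated) {
    var key = dateToArrayStandIn(doc.updated);
    if (key) emit(key, null);
  }
}

// --- Local stand-ins (not part of Couchbase) ---
const rows = [];
function emit(key, value) {
  rows.push({ key: key, value: value });
}
function compareKeys(a, b) {
  for (let i = 0; i < Math.min(a.length, b.length); i++) {
    if (a[i] !== b[i]) return a[i] - b[i];
  }
  return a.length - b.length;
}
// Count rows per key prefix, like a _count reduce queried with group_level.
function groupCount(sortedRows, level) {
  const out = [];
  for (const row of sortedRows) {
    const prefix = row.key.slice(0, level);
    const last = out[out.length - 1];
    if (last && JSON.stringify(last.key) === JSON.stringify(prefix)) {
      last.value += 1;
    } else {
      out.push({ key: prefix, value: 1 });
    }
  }
  return out;
}

const comments = [
  { type: "comment", updated: "2010-07-22 20:00:20" },
  { type: "comment", updated: "2010-08-01 12:00:00" },
  { type: "comment", updated: "2010-07-23 09:15:00" },
];
for (const c of comments) commentsByTimeMap(c, {});
rows.sort((a, b) => compareKeys(a.key, b.key));

const monthly = groupCount(rows, 2); // [year, month]
const daily = groupCount(rows, 3);   // [year, month, day]
console.log(JSON.stringify(monthly));
console.log(JSON.stringify(daily));
```

The same sorted rows serve both queries; only the prefix length changes.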
    25. QUERY PATTERN: LEADERBOARD
    26. Aggregate value stored in a document
        • Let’s find the top-rated beers!
        {
          "brewery": "New Belgium Brewing",
          "name": "1554 Enlightened Black Ale",
          "abv": 5.5,
          "description": "Born of a flood...",
          "category": "Belgian and French Ale",
          "style": "Other Belgian-Style Ales",
          "updated": "2010-07-22 20:00:20",
          "ratings" : { "ingenthr" : 5, "jchris" : 4, "scalabl3" : 5, "damienkatz" : 1 },
        [Callout: "ratings"]
    27. Sort each beer by its average rating
        • Let’s find the top-rated beers!
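
The view code for this slide is missing from the transcript. One way to get a leaderboard, sketched here under assumptions, is to emit each beer's average rating as the key and read the index in descending order with a limit. The `ratings` object of the first document is taken from the slide above; the second and third documents are made up, and `emit` plus the sort are local stand-ins for the view engine and a `descending=true&limit=2` query.

```javascript
// Map: key each beer by the average of its ratings.
function avgRatingMap(doc, meta) {
  if (!doc.ratings || typeof doc.ratings !== "object") return;
  var total = 0;
  var count = 0;
  for (var user in doc.ratings) {
    if (typeof doc.ratings[user] === "number") {
      total += doc.ratings[user];
      count += 1;
    }
  }
  if (count > 0) emit(total / count, doc.name);
}

// --- Local stand-ins (not part of Couchbase) ---
const rows = [];
function emit(key, value) {
  rows.push({ key: key, value: value });
}

const beers = [
  {
    name: "1554 Enlightened Black Ale",
    ratings: { ingenthr: 5, jchris: 4, scalabl3: 5, damienkatz: 1 },
  },
  { name: "Sample Porter", ratings: { user_a: 5, user_b: 4 } }, // made up
  { name: "Unrated Beer" },                                     // made up, no ratings
];
for (const beer of beers) avgRatingMap(beer, {});

// Stand-in for querying with descending=true&limit=2.
const top = rows
  .slice()
  .sort((a, b) => b.key - a.key)
  .slice(0, 2);
console.log(top);
```

Because the average lives in the key, the index order is the ranking and no reduce is needed.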
    28. WHAT NOT TO WRITE
    29. Most common mistakes
        • Reduces that don’t reduce
        • Trying to do too many things with one view
        • Emitting too much data into a view value
        • Expecting view query performance to be as fast as get/set
        • Recursive queries require application code.
    30. GEOGRAPHIC INDEX
    31. Experimental Status
        • Not yet using Superstar trees
          • (only fast on large clusters)
        • Optimized for bulk loading
    32. FULL TEXT INDEX
    33. Elastic Search Adapter
        • Elastic Search is good for ad-hoc queries and faceted browsing
        • Our adapter is aware of changing Couchbase topology
        • Documents are indexed by Elastic Search after they are stored to disk in Couchbase
    34. QUESTIONS?
    35. Views Under The Hood
        THIS TALK IS NOT WRITTEN YET; maybe combine with Dustin’s internals talk about vbucket handoff
        J Chris Anderson, Architect
    36. What we’ll talk about
        • Key areas/topics discussed
    37. Dynamic Time Range Queries
    38. The B-tree Index
        • Helps to know what’s efficient
        • Superstar: http://damienkatz.net/2012/05/stabilizing_couchbase_server_2.html
    39. Logical View B-tree
        • Incremental reduce values are stored in the tree
    40. Logical View B-tree
        • Incremental reduce values are stored in the tree
        [Diagram: B-tree node reduce values: 25; 7, 5, 5, 3, 2, 3]
    41. Reduce!
        • Incremental reduce values are stored in the tree
        _count
        function(keys, values) { return keys ? values.length : sum(values); }
        [Diagram: B-tree node reduce values: 25; 7, 5, 5, 3, 2, 3]
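
The `_count` function above handles two cases. In the first pass it receives real keys and counts the rows. In a rereduce pass `keys` is null and `values` holds earlier counts, which are summed. The sketch below replays this with the node sizes from the diagram, treating 7, 5, 5, 3, 2, 3 as row counts of six nodes (an interpretation of the garbled diagram). `sum` is a local stand-in for the view engine's built-in helper, and the generated keys are placeholders.

```javascript
// Local stand-in for the built-in sum() helper available to reduce functions.
function sum(values) {
  var total = 0;
  for (var i = 0; i < values.length; i++) total += values[i];
  return total;
}

// The reduce function from the slide: count rows on the first pass,
// add up earlier counts on rereduce (keys is null in that case).
function countReduce(keys, values) {
  return keys ? values.length : sum(values);
}

// Assumed row counts per node, taken from the diagram's numbers.
const nodeSizes = [7, 5, 5, 3, 2, 3];

// First pass: reduce the rows of each node separately.
const nodeCounts = nodeSizes.map(function (size) {
  const keys = [];
  const values = [];
  for (let i = 0; i < size; i++) {
    keys.push(["key" + i, "doc" + i]); // placeholder [key, docid] pairs
    values.push(null);
  }
  return countReduce(keys, values);
});

// Rereduce: combine the per-node results into the value stored higher up.
const rootCount = countReduce(null, nodeCounts);
console.log(nodeCounts, rootCount);
```

The per-node counts add up to the 25 shown at the top of the diagram.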
    42. Dynamic Queries
        • You can query that tree dynamically
        • Lots of the patterns are about pulling value from this data structure
        _count
        function(keys, values) { return keys ? values.length : sum(values); }
        ?startkey="abba"&endkey="robot"
        {"value":19}
        [Diagram: B-tree node reduce values: 25; 7, 5, 5, 3, 2, 3]
    43. Dynamic Queries
        • Queries use cached values from B-tree inner nodes when possible
        • Take advantage of in-order tree traversal with group_level queries
        _count
        function(keys, values) { return keys ? values.length : sum(values); }
        ?startkey="abba"&endkey="robot"
        {"value":19}
        [Diagram: the same B-tree with the queried range highlighted; the range reduces to 19 of the total 25]
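
A rough sketch of why cached reduce values help with range queries: nodes that lie entirely inside the requested range contribute their stored count, and only the nodes at the edges of the range need their rows examined. The tree here is flattened to a list of leaf nodes, and the keys, node layout and `rangeCount` helper are all made up for illustration; the real index structure is more involved.

```javascript
// Each node stores its sorted keys plus a cached count (the reduce value).
const leaves = [
  { keys: ["apple", "banana", "cherry"], count: 3 },
  { keys: ["grape", "kiwi", "lemon"], count: 3 },
  { keys: ["mango", "peach", "plum"], count: 3 },
];

// Count rows with startkey <= key <= endkey (inclusive end).
function rangeCount(nodes, startkey, endkey) {
  let total = 0;
  let scanned = 0; // rows that had to be looked at one by one
  for (const node of nodes) {
    const first = node.keys[0];
    const last = node.keys[node.keys.length - 1];
    if (last < startkey || first > endkey) continue; // entirely outside
    if (first >= startkey && last <= endkey) {
      total += node.count; // entirely inside: use the cached value
      continue;
    }
    for (const key of node.keys) { // partial overlap: check each row
      scanned += 1;
      if (key >= startkey && key <= endkey) total += 1;
    }
  }
  return { total: total, scanned: scanned };
}

const result = rangeCount(leaves, "banana", "mango");
console.log(result);
```

In this run the middle node is answered from its cached count, while the first and last nodes are scanned because the range only partly covers them.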
    44. Respect Reduce! (anti-pattern)
        • Incremental reduce values are stored in the tree
        function(keys, values) { return values; }
        [Diagram: node values ["ace", "argh!", "asphalt"] and ["front", "garage", "hibernate"] combine into ["ace", "argh!", "asphalt", "front", "garage", "hibernate"] higher in the tree, so the stored reduce values grow instead of shrinking]
    45. Just use the Map
        • If you think you need "the identity reduce", just use the map.
        [Diagram: map rows ["ace", "argh!", "asphalt", "front", "garage", "hibernate"]]
    46. Lookup via key-range
        • Find tables during yesterday’s lunch shift
        • Find shifts owned by which manager
        ?startkey="abba"&endkey="robot"
        {"value":19}
        [Diagram: B-tree node reduce values: 25; 7, 5, 5, 3, 2, 3]
    47. Schema evolution
    48. Application and Views
        • Interactive schema fully controlled by application
        • If your code can handle it, the database can
        • Learn to write views defensively
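
Writing views defensively mostly means checking that a field exists and has the expected type before touching it, since documents in the same bucket need not share a shape. The sketch below contrasts an unguarded map with a guarded one. The `address.city` field and the sample documents are made up, and `emit` is a local stand-in for the view engine.

```javascript
// --- Local stand-in (not part of Couchbase) ---
const rows = [];
function emit(key, value) {
  rows.push({ key: key, value: value });
}

// Unguarded: throws as soon as a document has no "address" object.
function fragileMap(doc, meta) {
  emit(doc.address.city, null);
}

// Guarded: skips documents that do not have the expected shape.
function defensiveMap(doc, meta) {
  if (doc.address && typeof doc.address.city === "string") {
    emit(doc.address.city, null);
  }
}

// Made-up documents with differing shapes.
const docs = [
  { address: { city: "Juneau" } },
  { name: "no address at all" },
  { address: {} },
];

let fragileError = null;
try {
  docs.forEach((doc) => fragileMap(doc, {}));
} catch (err) {
  fragileError = err; // the second document triggers a TypeError
}
const rowsBeforeFailure = rows.length;

rows.length = 0; // reset and run the guarded version over the same documents
docs.forEach((doc) => defensiveMap(doc, {}));
console.log(fragileError && fragileError.name, rowsBeforeFailure, rows.length);
```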
    49. Incremental schema evolution
        • Use a view to decide which documents need work
        • Make your workers idempotent
        • Once all your data is cleaned up and old clients are no longer writing the old format, the cleanup view is obsolete, and so is any app code for dealing with the old case
        • You’ve evolved!
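
The steps above can be sketched as a cleanup view plus an idempotent worker. The example assumes a hypothetical old format in which `abv` was stored as a string and a new format in which it is a number. The in-memory `store`, the `emit` stand-in and both function names are made up for illustration.

```javascript
// Cleanup view: lists the documents that are still in the old format.
// Assumption: the old format stored abv as a string, the new one as a number.
function needsUpgradeMap(doc, meta) {
  if (doc.type === "beer" && typeof doc.abv === "string") {
    emit(meta.id, null);
  }
}

// Idempotent worker: upgrading an already-upgraded document changes nothing.
function upgrade(doc) {
  if (typeof doc.abv === "string") {
    return Object.assign({}, doc, { abv: parseFloat(doc.abv) });
  }
  return doc;
}

// --- Local stand-ins (not part of Couchbase) ---
let rows = [];
function emit(key, value) {
  rows.push({ key: key, value: value });
}
const store = {
  beer_old: { type: "beer", abv: "5.5" }, // old format
  beer_new: { type: "beer", abv: 4.2 },   // new format
};
function runCleanupView() {
  rows = [];
  for (const id of Object.keys(store)) needsUpgradeMap(store[id], { id: id });
  return rows.map((row) => row.key);
}

const before = runCleanupView();
for (const id of before) store[id] = upgrade(store[id]);
for (const id of before) store[id] = upgrade(store[id]); // running again is harmless
const after = runCleanupView();
console.log(before, after, store.beer_old);
```

Once the cleanup view returns no rows and no client writes the old format any more, the view and the worker can be retired.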
