Couchbase 2.0: Indexing and Querying


  • JSON support: documents are stored natively as JSON, so when you build an app there is no conversion required. New document viewing and editing capability. Indexing and querying: look inside your JSON, build views, and query for a key, for ranges, or to aggregate data. Incremental MapReduce: powers indexing; build complex views over your data; great for real-time analytics. XDCR: replicate information from one cluster to another cluster.
  • If you are ingesting tweets, Git commits, and LinkedIn API data, there's little value in transforming it before you save it. Just store it and sort it out later. The same holds for user data.
  • Most of you are probably familiar with the table layout. A table is defined with a set of columns, and each record in the table conforms to the schema. If you wish to capture different data in the future, the table schema must be changed using the ALTER TABLE statement. Typically data is normalized to third normal form to reduce duplication. Large tables are split into smaller tables connected by foreign keys.
  • Example: a normalized schema with two tables, connected by a foreign key. To get information about a specific error, you perform a join across the two tables.
  • The data is modeled for the application code and not for the database.
  • no downtime deploy?
  • Schemaless is good as far as it goes, but what it's really saying is "don't worry about the database", so a lot of the patterns move to the application. That's what this section is about.
  • Bulletize the text. Make sure the builds work.
  • Views are defined via the SDKs or the administration console. Deploying a new view to production is an online operation, but it can be heavy.
  • If Sum = 11 and Count = 2, the Average is 5.5
  • ratings are stored in a hash to ensure each user can only rate each beer once
  • Could be used to integrate with other offerings like graph databases, relational databases, etc
  • (Indexed only once it's on disk.) (Scatter-gather graphic.) The actual story about scale is more complex; all you need to know is the abstraction: a basic sorted and groupable set, implemented as a B-tree.

    1. Couchbase Server 2.0: Indexing and Querying. Dipti Borkar, Director, Product Management
    2. Couchbase Server 2.0 Webinar Series
       • Introducing Couchbase Server 2.0
       • Couchbase Server 2.0 and Indexing/Querying
       • Couchbase Server 2.0 and Incremental Map Reduce for Real-Time Analytics
       • Couchbase Server 2.0 and Cross Data Center Replication
       • Couchbase Server 2.0 and Full-Text Search Integration
       • Couchbase Server 2.0 Use Cases Overview
    3. New in Two
       • JSON support
       • Indexing and Querying
       • Incremental Map Reduce
       • Cross data center replication
    4. What we'll talk about
       • Records vs Documents
       • Lifecycle of a view
         – Index definition, build, and query phase
         – Indexing details
       • Replica indexes, failover and compaction
       • Patterns
         – Primary and Secondary indexes
         – Basic aggregations (avg ratings by brewery)
         – Time-based analytics with group_level
         – Leaderboard
    5. Relational vs Document data model
       • Relational data model: highly-structured table organization with rigidly-defined data formats and record structure.
       • Document data model: collection of complex documents with arbitrary, nested data formats and varying "record" format.
    6. Example: User Profile
       User Info
         KEY  First  Last    ZIP_id
         1    Dipti  Borkar  2
         2    Joe    Smith   2
         3    Ali    Dodson  2
         4    John   Doe     3
       Address Info
         ZIP_id  CITY  STATE  ZIP
         1       DEN   CO     30303
         2       MV    CA     94040
         3       CHI   IL     60609
         4       NY    NY     10010
       To get information about a specific user, you perform a join across the two tables.
    7. Document Example: User Profile
       { "ID": 1, "FIRST": "Dipti", "LAST": "Borkar", "ZIP": "94040", "CITY": "MV", "STATE": "CA" }
       All data in a single JSON document.
    8. Making the Same Change with a Document Database
       { "ID": 1, "FIRST": "Dipti", "LAST": "Borkar", "ZIP": "94040", "CITY": "MV", "STATE": "CA", "STATUS": { "TEXT": "At Conf", "GEO_LOC": "134" }, "COUNTRY": "USA" }
       Just add information to a document.
    9. JSON Documents
       • Maps more closely to objects or entities
       • CRUD operations, lightweight schema: { "fields": ["with basic types", 3.14159, true], "like": "your favorite language" }
       • Stored under an identifier key: client.set("mydocumentid", myDocument); mySavedDocument = client.get("mydocumentid");
    10. Couchbase Server 2.0: Views
        • Views can cover a few different use cases
          – Primary index
          – Simple secondary indexes (the most common)
          – Complex secondary, tertiary and composite indexes
          – Aggregation functions (reduction), for example: count the number of North American Ales
          – Organizing related data
        • Built using Map/Reduce
          – Map function creates a matrix from document fields
          – Reduce function summarizes (reduces) information
          – Written using superfast JavaScript
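To make the map and reduce roles concrete, here is a minimal sketch that can be run in Node. The sample documents, the `category` field and the `runMap` harness are assumptions for illustration; in a real view the engine supplies `emit` and runs the function for you.

```javascript
// Hypothetical sample documents; field names are assumptions for illustration.
const docs = [
  { id: "beer_1", doc: { type: "beer", name: "Pale One", category: "North American Ale" } },
  { id: "beer_2", doc: { type: "beer", name: "Dark Two", category: "Irish Ale" } },
  { id: "beer_3", doc: { type: "beer", name: "Amber Three", category: "North American Ale" } },
  { id: "brewery_1", doc: { type: "brewery", name: "Some Brewery" } },
];

// Map function in the (doc, meta) shape used by views.
// emit is passed in explicitly here so the sketch runs outside the server.
function mapByCategory(doc, meta, emit) {
  if (doc.type === "beer" && doc.category) {
    emit(doc.category, null);
  }
}

// Tiny stand-in for the view engine: run the map over every document
// and collect the emitted rows.
function runMap(mapFn, documents) {
  const rows = [];
  for (const { id, doc } of documents) {
    mapFn(doc, { id }, (key, value) => rows.push({ id, key, value }));
  }
  return rows;
}

// Reduce step: count the rows for one key, as a count reduce does per group.
function countForKey(rows, key) {
  return rows.filter((row) => row.key === key).length;
}

const categoryRows = runMap(mapByCategory, docs);
const northAmericanAles = countForKey(categoryRows, "North American Ale");
console.log(northAmericanAles);
```

The map step decides what goes into the index; the reduce step only summarizes rows that the map already emitted.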
    11. What are Views?
        • Extract fields from JSON documents and produce an index of the selected information
    12. Development vs. Production Views
        • Development views index a subset of the data.
        • Publishing a view builds the index across the entire cluster.
        • Queries on production views are scattered to all cluster members, and results are gathered and returned to the client.
    13. View Lifecycle: Define -> Build -> Query
    14. Buckets & Design docs & Views
        • Create design documents on a bucket
        • Create views within a design document
        • (Diagram: Bucket 1 holds design document 1 with views 1 to 3, design document 2 with views 4 and 5, and design document 3 with views 6 and 7; Bucket 2 is shown alongside.)
    15. Distributed Indexing and Querying
        • Indexing work is distributed amongst nodes
        • Parallelize the effort
        • Each node has the index for the data stored on it
        • Queries combine the results from the required nodes
        • (Diagram: two app servers, each with a Couchbase client library and cluster map, create an index/view and query a three-server Couchbase Server cluster; each server holds active and replica documents. User-configured replica count = 1.)
    16. DEFINE: Index / View Definition in JavaScript
        CREATE INDEX City ON Brewery.City;
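The SQL-style statement on the slide corresponds roughly to a map function that emits the city as the key. The following is a sketch under assumptions: the `type` and `city` field names and the sample documents are hypothetical, and `emit` is a parameter only so the code runs standalone.

```javascript
// Map function roughly equivalent to "index breweries by city".
// In a design document it would take (doc, meta) and call the engine's emit().
function mapBreweryByCity(doc, meta, emit) {
  if (doc.type === "brewery" && doc.city) {
    emit(doc.city, null);
  }
}

// Hypothetical documents for illustration.
const sampleDocs = [
  { id: "brewery_a", doc: { type: "brewery", city: "Juneau" } },
  { id: "brewery_b", doc: { type: "brewery", city: "Denver" } },
  { id: "beer_a", doc: { type: "beer", name: "Some Ale" } },
];

// Collect emitted rows, then sort by key the way an index keeps them ordered.
const cityRows = [];
for (const { id, doc } of sampleDocs) {
  mapBreweryByCity(doc, { id }, (key, value) => cityRows.push({ id, key, value }));
}
cityRows.sort((a, b) => (a.key < b.key ? -1 : a.key > b.key ? 1 : 0));

console.log(cityRows.map((row) => row.key));
```

Documents that do not match the condition simply emit nothing, so they never appear in the index.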
    17. BUILD: Distributed Index Build Phase
        • Optimized for lookups, in-order access and aggregations
        • All view reads are served from disk (different performance profile)
        • View builds run against every document on every node
          – This is why you should group views in a design document
        • Automatically kept up to date
    18. QUERY: Dynamic Queries with Optional Aggregation
        • Efficiently fetch a document or group of similar documents
        • Queries use cached values from B-tree inner nodes when possible
        • Take advantage of in-order tree traversal with group_level queries
        • Query: ?startkey="J"&endkey="K"
        • Result: {"rows":[{"key":"Juneau","value":null}]}
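The range query on this slide can be imitated over a plain sorted array. This is only a sketch: the rows other than "Juneau" are made up, the end key is treated as inclusive, and plain JavaScript string comparison stands in for the server's collation rules, which differ in detail.

```javascript
// Hypothetical index rows, already sorted by key as a view index would be.
const indexRows = [
  { key: "Denver", value: null },
  { key: "Juneau", value: null },
  { key: "Kalamazoo", value: null },
  { key: "Portland", value: null },
];

// Return the rows whose key falls between startkey and endkey (both inclusive).
// A real index walks the B-tree instead of scanning every row.
function rangeQuery(rows, startkey, endkey) {
  return rows.filter((row) => row.key >= startkey && row.key <= endkey);
}

const result = { rows: rangeQuery(indexRows, "J", "K") };
console.log(JSON.stringify(result));
```

"Kalamazoo" sorts after "K", so it falls outside the range; only keys beginning with "J" come back.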
    19. Index building details
        – All the views within a design document are incrementally updated when the view is accessed or auto-indexing kicks in
        – Automatic view updates
          • In addition to forcing an index build, active and replica indexes are updated every 3 seconds of inactivity if there are at least 5000 new changes (configurable)
        – The entire view is recreated if the view definition has changed
        – Views can be conditionally updated by specifying the "stale" argument to the view query
        – The index information stored on disk consists of both the key and the value information defined within your view
    20. Queries run against stale indexes by default
        • stale=update_after (default if nothing is specified)
          – Always get the fastest response
          – Can take two queries to read your own writes
        • stale=ok
          – Auto update will trigger eventually
          – Might not see your own writes for a few minutes
          – Least frequent updates -> least resource impact
        • stale=false
          – Use with "set with persistence" if data needs to be included in view results
          – BUT be aware of the delay it adds; only use it when really required
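As a sketch of how the stale setting travels with a query, the helper below builds a view URL with an explicit `stale` parameter. The host, port 8092, bucket, design document and view names are all assumptions for illustration; check them against your own cluster and SDK before relying on them.

```javascript
// Build a view query URL with an explicit stale setting.
// All names passed in below are placeholders, not values from the slides.
function viewUrl({ host, bucket, designDoc, view, stale }) {
  const allowed = ["update_after", "ok", "false"];
  if (!allowed.includes(stale)) {
    throw new Error(`unsupported stale value: ${stale}`);
  }
  const params = new URLSearchParams({ stale });
  return `http://${host}:8092/${bucket}/_design/${designDoc}/_view/${view}?${params}`;
}

const url = viewUrl({
  host: "localhost",
  bucket: "beer-sample",
  designDoc: "beer",
  view: "by_name",
  stale: "false",
});
console.log(url);
```

Passing `stale: "false"` asks for the index to be updated before the result is returned, which is the slowest of the three options.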
    21. Views and Replica indexes
        • In addition to replicas for data (up to 3 copies), optionally create replicas for indexes
          – Set at a bucket level
          – Replica index is populated from replica data
          – Replica index is used after a failover
        • Each node manages replica index data structures
          – Index structure optimized to handle sharded data (vBuckets)
          – One for all active shards on the node
          – One for all replica shards on the node
    22. Views and failover
        • Replica indexes are enabled on failover
        • Replica indexes are rebuilt on replica nodes
          – Automatically and incrementally built based on replica data
          – Updated every 3 seconds of inactivity if there are at least 5000 new changes
          – Not copied/moved, to be consistent with persisted replica data
    23. View Compaction
        • Compaction is ONLINE
        • Reclaims empty allocated space from disk
        • Indexes are stored on disk for active vBuckets on each node and updated in an append-only manner
        • Auto-compaction is performed in the background
          – Set the database fragmentation levels
          – Set the index fragmentation levels
          – Choose a schedule
          – Global and bucket-specific settings
    24. Primary and Secondary Indexing
    25. Example Document (with its Document ID)
    26. Define a primary index on the bucket
        • Look up the document ID / key by key, range, prefix, suffix
        • (Figure: index definition)
    27. Define a secondary index on the bucket
        • Look up an attribute by value, range, prefix, suffix
        • (Figure: index definition)
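The index definitions on these two slides were images, so the following is a sketch of what such definitions typically look like, not the deck's exact code. The `name` attribute, the sample documents and the `buildIndex` and `prefixQuery` helpers are assumptions; the prefix lookup is written as a key range ending in a very high code point, and plain string comparison stands in for the server's collation.

```javascript
// Primary index: the key is the document ID, taken from meta.id.
function mapPrimary(doc, meta, emit) {
  emit(meta.id, null);
}

// Secondary index: the key is a document attribute (a hypothetical "name" field).
function mapByName(doc, meta, emit) {
  if (doc.name) {
    emit(doc.name, null);
  }
}

// Stand-in for the view engine: run the map and keep rows sorted by key.
function buildIndex(mapFn, documents) {
  const rows = [];
  for (const { id, doc } of documents) {
    mapFn(doc, { id }, (key, value) => rows.push({ id, key, value }));
  }
  return rows.sort((a, b) => (a.key < b.key ? -1 : a.key > b.key ? 1 : 0));
}

// Prefix lookup expressed as a range: from the prefix up to the prefix
// followed by a very high code point.
function prefixQuery(rows, prefix) {
  const end = prefix + "\uefff";
  return rows.filter((row) => row.key >= prefix && row.key <= end);
}

// Hypothetical documents for illustration.
const documents = [
  { id: "beer_porter", doc: { name: "Porter" } },
  { id: "brewery_acme", doc: { name: "Acme" } },
  { id: "beer_stout", doc: { name: "Stout" } },
];

const primary = buildIndex(mapPrimary, documents);
const byName = buildIndex(mapByName, documents);
const beerIds = prefixQuery(primary, "beer_").map((row) => row.key);
console.log(beerIds, byName.map((row) => row.key));
```

A suffix lookup would need a different key, for example the reversed string, because a sorted index can only answer ranges over the front of the key.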
    28. Query Patterns: Find by Attribute
    29. Find documents by a specific attribute
        • Let's find beers by brewery_id!
    30. The index definition (key, value)
    31. The result set: beers keyed by brewery_id
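A possible shape for this pattern, assuming each beer document carries `type`, `name` and `brewery_id` fields (the documents and IDs below are made up, and the deck's own definition was an image):

```javascript
// Map function: key is the brewery, value is the beer name.
// emit is a parameter only so the sketch runs standalone.
function mapBeerByBrewery(doc, meta, emit) {
  if (doc.type === "beer" && doc.brewery_id) {
    emit(doc.brewery_id, doc.name);
  }
}

// Hypothetical documents for illustration.
const beerDocs = [
  { id: "beer_1554", doc: { type: "beer", name: "1554", brewery_id: "new_belgium" } },
  { id: "beer_fat_tire", doc: { type: "beer", name: "Fat Tire", brewery_id: "new_belgium" } },
  { id: "beer_other", doc: { type: "beer", name: "Other Ale", brewery_id: "some_brewery" } },
];

const breweryRows = [];
for (const { id, doc } of beerDocs) {
  mapBeerByBrewery(doc, { id }, (key, value) => breweryRows.push({ id, key, value }));
}

// Equivalent of querying the view for a single key.
const newBelgiumBeers = breweryRows
  .filter((row) => row.key === "new_belgium")
  .map((row) => row.value);
console.log(newBelgiumBeers);
```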
    32. Query Pattern: Basic Aggregations
    33. Use a built-in reduce function with a group query
        • Let's find average abv for each brewery!
    34. We are reducing doc.abv with _stats
    35. Group reduce (reduce by unique key)
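The grouped `_stats` reduce can be imitated in a few lines. This is a sketch, not the server's implementation: the rows are hypothetical output of a map that emits `(doc.brewery_id, doc.abv)`, and `stats` mirrors the sum, count, min, max and sumsqr fields that `_stats` reports. The average is then computed by the caller as sum divided by count.

```javascript
// Hypothetical rows, as a map emitting (doc.brewery_id, doc.abv) would produce.
const abvRows = [
  { key: "brewery_a", value: 5.0 },
  { key: "brewery_a", value: 6.0 },
  { key: "brewery_b", value: 4.5 },
];

// Same fields the built-in _stats reduce reports.
function stats(values) {
  return {
    sum: values.reduce((a, b) => a + b, 0),
    count: values.length,
    min: Math.min(...values),
    max: Math.max(...values),
    sumsqr: values.reduce((a, b) => a + b * b, 0),
  };
}

// Group reduce: one reduced row per unique key.
function groupReduce(rows, reduceFn) {
  const groups = new Map();
  for (const { key, value } of rows) {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  }
  return [...groups].map(([key, values]) => ({ key, value: reduceFn(values) }));
}

const grouped = groupReduce(abvRows, stats);
const averages = grouped.map(({ key, value }) => ({ key, avg: value.sum / value.count }));
console.log(averages);
```

For brewery_a the sum is 11 and the count is 2, so the average is 5.5.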
    36. Query Pattern: Time-based Rollups
    37. Find patterns in beer comments by time
        • Document: { "type": "comment", "about_id": "beer_Enlightened_Black_Ale", "user_id": 525, "text": "tastes like college!", "updated": "2010-07-22 20:00:20" } (the "updated" field is the timestamp)
        • Metadata: { "id": "f1e62" }
    38. Query with group_level=2 to get monthly rollups
    39. dateToArray() is your friend
        • String or Integer based timestamps
        • Output optimized for group_level queries
        • Array of JSON numbers
    40. group_level=2 results
        • Monthly rollup
        • Sorted by time; sort the query results in your application if you want to rank by value (no chained map-reduce)
    41. group_level=3: daily results, great for graphing
        • Daily, hourly, minute or second rollup all possible with the same index.
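To show why an array key makes every rollup available from one index, the sketch below uses a stand-in for `dateToArray()` that handles only the "YYYY-MM-DD HH:MM:SS" format from the sample comment, and a small helper that imitates a count reduce queried at a given group_level. Both helpers and the extra timestamps are assumptions for illustration.

```javascript
// Stand-in for the server-side dateToArray() helper; it only parses the
// "YYYY-MM-DD HH:MM:SS" format shown in the sample comment document.
function dateToArraySketch(timestamp) {
  return timestamp.split(/[- :]/).map(Number);
}

// Hypothetical comment timestamps.
const comments = [
  "2010-07-22 20:00:20",
  "2010-07-23 09:15:00",
  "2010-08-01 12:30:45",
];

// Imitate a count reduce queried with a given group_level:
// rows are grouped on the first `level` elements of the array key.
function countByGroupLevel(keys, level) {
  const counts = new Map();
  for (const key of keys) {
    const prefix = JSON.stringify(key.slice(0, level));
    counts.set(prefix, (counts.get(prefix) || 0) + 1);
  }
  return [...counts].map(([prefix, value]) => ({ key: JSON.parse(prefix), value }));
}

const keys = comments.map(dateToArraySketch);
const monthly = countByGroupLevel(keys, 2); // [year, month]
const daily = countByGroupLevel(keys, 3); // [year, month, day]
console.log(monthly, daily);
```

The same keys serve both queries; only the number of leading array elements used for grouping changes.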
    42. Query Pattern: Leaderboard
    43. Aggregate value stored in a document
        • Let's find the top-rated beers!
        • { "brewery": "New Belgium Brewing", "name": "1554 Enlightened Black Ale", "abv": 5.5, "description": "Born of a flood...", "category": "Belgian and French Ale", "style": "Other Belgian-Style Ales", "updated": "2010-07-22 20:00:20", "ratings": { "jchris": 5, "scalabl3": 4, "damienkatz": 1 },
    44. Sort each beer by its average rating
        • Let's find the top-rated beers!
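One way to build the leaderboard is to emit each beer's average rating as the key, so the index is already sorted by rating. This is a sketch: the second beer and its ratings are made up, and the descending sort imitates reading the index from the top.

```javascript
// Map function: the key is the beer's average rating, the value is its name.
// emit is a parameter only so the sketch runs standalone.
function mapByAverageRating(doc, meta, emit) {
  if (doc.ratings) {
    let total = 0;
    let count = 0;
    for (const user in doc.ratings) {
      total += doc.ratings[user];
      count += 1;
    }
    if (count > 0) {
      emit(total / count, doc.name);
    }
  }
}

const ratedBeers = [
  {
    id: "beer_1554",
    doc: { name: "1554 Enlightened Black Ale", ratings: { jchris: 5, scalabl3: 4, damienkatz: 1 } },
  },
  // Hypothetical second beer for comparison.
  { id: "beer_other", doc: { name: "Other Ale", ratings: { user_a: 4, user_b: 5 } } },
];

const ratingRows = [];
for (const { id, doc } of ratedBeers) {
  mapByAverageRating(doc, { id }, (key, value) => ratingRows.push({ id, key, value }));
}

// Highest average first, as a descending read of the index would return.
const leaderboard = ratingRows.slice().sort((a, b) => b.key - a.key);
console.log(leaderboard.map((row) => row.value));
```

Because ratings are stored as a hash keyed by user, a user who rates again overwrites the earlier rating instead of adding a second one.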
    45. What NOT to Write
    46. Most common mistakes
        • Reduces that don't reduce
        • Trying to do too many things with one view
        • Emitting too much data into a view value
        • Expecting view query performance to be as fast as get/set
        • Recursive queries require application code.
    47. Full Text index
    48. Elastic Search Adapter
        • Elastic Search is good for ad-hoc queries and faceted browsing
        • Our adapter is aware of changing Couchbase topology
        • Documents are indexed by Elastic Search after they are stored to disk in Couchbase
        • Related webinar: Couchbase Server 2.0 and Full-Text Search Integration
    49. Geographic index
    50. Experimental Status
        • Not yet using Superstar trees
          • (only fast on large clusters)
        • Optimized for bulk loading
    51. Questions?
    53. The B-tree Index
        • Helps to know what's efficient
        • Superstar
    54. Logical View B-tree
        • Incremental reduce values are stored in the tree
    55. Logical View B-tree
        • Incremental reduce values are stored in the tree
        • (Diagram: a tree whose root holds the reduce value 25, above nodes holding 7, 5, 5, 3, 2 and 3.)
    56. Reduce!
        • Incremental reduce values are stored in the tree
        • _count: function(keys, values) { return keys ? values.length : sum(values); }
        • (Diagram: a tree whose root holds the reduce value 25, above nodes holding 7, 5, 5, 3, 2 and 3.)
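The count function on this slide does two jobs: on a first pass `keys` is an array, so it counts values; on a rereduce `keys` is null, so it adds up partial counts. The sketch below runs it over six hypothetical leaf nodes sized like the diagram. `sum()` is provided by the view engine and is defined here only so the code runs standalone.

```javascript
// sum() is a helper the view engine provides; defined here for the sketch.
function sum(values) {
  return values.reduce((a, b) => a + b, 0);
}

// The count-style reduce from the slide.
function countReduce(keys, values) {
  return keys ? values.length : sum(values);
}

// Hypothetical leaf nodes holding 7, 5, 5, 3, 2 and 3 rows respectively.
const leafSizes = [7, 5, 5, 3, 2, 3];
const leaves = leafSizes.map((n) =>
  Array.from({ length: n }, (_, i) => ({ key: "key" + i, value: null }))
);

// First pass: reduce the rows inside each leaf.
const partials = leaves.map((rows) =>
  countReduce(rows.map((row) => row.key), rows.map((row) => row.value))
);

// Rereduce: combine the partial counts at the root.
const rootCount = countReduce(null, partials);
console.log(partials, rootCount);
```

Storing the partial values in the inner nodes is what lets a range query reuse them instead of recounting every row.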
    57. Dynamic Queries
        • You can query that tree dynamically
        • Lots of the patterns are about pulling value from this data structure
        • _count: function(keys, values) { return keys ? values.length : sum(values); }
        • Query: ?startkey="abba"&endkey="robot"
        • Result: {"value":19}
    58. Lookup via key-range
        • Find tables during yesterday's lunch shift
        • Find shifts owned by which manager
        • Query: ?startkey="abba"&endkey="robot"
        • Result: {"value":19}