



Published in: Business, Technology


  1. 1. Couchbase and Hadoop. Perry Krug, Sr. Solutions Architect
  2. 2. Agenda
     • View basics
     • Lifecycle of a view
     • Index definition, build, and query phase
     • Indexing details
     • Replica indexes, failover and compaction
     • Primary and secondary indexes
     • View best practices
     • Couchbase and Elastic Search
     • Couchbase and Hadoop
  3. 3. pol·y·glot /ˈpäliˌglät/
     Adjective: Knowing or using several languages.
     Noun: A person who knows several languages.
     Synonyms: multilingual
     per·sist·ence /pərˈsistəns/
     Noun: The continued or prolonged existence of something.
     Synonyms: perseverance, tenacity, pertinacity, stubbornness
  4. 4. Couchbase Views – The basics
     • Define materialized views on JSON documents and then query across the data set
     • Using views you can define:
       • Primary indexes
       • Simple secondary indexes (most common use case)
       • Complex secondary, tertiary and composite indexes
       • Aggregations (reduction)
     • Documents are eventually indexed
     • Queries are eventually consistent with respect to documents
     • Built using Map/Reduce technology
     • Map and Reduce functions are written in JavaScript
  5. 5. View Lifecycle: Define -> Build -> Query
  6. 6. Buckets & Design docs & Views
     • Create design documents on a bucket
     • Create views within a design document
     [Diagram: Bucket 1 holds Design document 1 (Views 1 to 3), Design document 2 (Views 4 and 5), and Design document 3 (Views 6 and 7); Bucket 2 is shown alongside.]
  7. 7. Distributed Indexing and Querying
     • Indexing work is distributed amongst nodes
     • Parallelize the effort
     • Each node has index for data stored on it
     • Queries combine the results from required nodes
     [Diagram: Couchbase Server cluster of three servers, each holding active and replica documents (user-configured replica count = 1); two app servers, each with the Couchbase client library and cluster map, send "Create Index / View" and "Query" requests.]
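
The last bullet on this slide, "queries combine the results from required nodes", can be pictured as a merge of per-node result lists that are each already sorted by key. The sketch below only illustrates that idea; it is not Couchbase's actual implementation, and the node contents and the `mergeSorted` helper are hypothetical.

```javascript
// Hypothetical per-node index rows; each node indexes only the data it stores.
const nodeResults = [
  [{ key: "Boulder", id: "doc2" }, { key: "Portland", id: "doc5" }],
  [{ key: "Juneau", id: "doc1" }],
  [{ key: "Denver", id: "doc4" }, { key: "Kodiak", id: "doc6" }],
];

// Merge already-sorted lists by repeatedly taking the smallest head row.
function mergeSorted(lists) {
  const heads = lists.map(() => 0); // next unread position in each list
  const merged = [];
  for (;;) {
    let best = -1;
    for (let i = 0; i < lists.length; i++) {
      if (heads[i] >= lists[i].length) continue; // this list is exhausted
      if (best === -1 || lists[i][heads[i]].key < lists[best][heads[best]].key) {
        best = i;
      }
    }
    if (best === -1) return merged; // every list is exhausted
    merged.push(lists[best][heads[best]++]);
  }
}

const mergedKeys = mergeSorted(nodeResults).map((row) => row.key);
console.log(mergedKeys);
```

Because each node's list is already in key order, the combined result stays sorted without re-sorting everything.
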
  8. 8. Eventually indexed Views – Data flow
     [Diagram: App Server and Couchbase Server Node with Managed Cache, Replication Queue (to other node), Disk Queue, Disk, and View engine; Doc 1 moves through numbered steps.]
  9. 9. DEFINE: Index / View Definition in JavaScript
     CREATE INDEX City ON Brewery.City;
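
The JavaScript definition shown on this slide did not survive extraction. As a minimal sketch, and not the author's actual view, a map function corresponding to the pseudo-SQL `CREATE INDEX City ON Brewery.City` might look like the following. The `type` and `city` field names, the sample documents, and the local `emit` stand-in are assumptions made so the sketch runs outside Couchbase; inside the server, `emit` is supplied by the view engine.

```javascript
// Rows collected by a local stand-in for the view engine's emit().
const rows = [];
function emit(key, value) {
  rows.push({ key, value });
}

// The view's map function: called once per document.
// Assumes brewery documents carry "type" and "city" fields (hypothetical schema).
function map(doc, meta) {
  if (doc.type === "brewery" && doc.city) {
    emit(doc.city, null); // key = city; no value is stored in the index
  }
}

// Hypothetical sample documents, keyed by document ID.
const docs = {
  brewery_a: { type: "brewery", name: "Brewery A", city: "Juneau" },
  brewery_b: { type: "brewery", name: "Brewery B", city: "Boulder" },
  beer_c: { type: "beer", name: "Beer C" },
};

for (const [id, doc] of Object.entries(docs)) {
  map(doc, { id });
}

// A view index is kept in key order; a plain string comparison stands in
// for the server's own key collation here.
rows.sort((a, b) => (a.key < b.key ? -1 : a.key > b.key ? 1 : 0));
console.log(rows);
```

Only documents that pass the `type` check produce index rows, so the beer document is skipped.
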
  10. 10. BUILD: Distributed Index Build Phase
      • Optimized for lookups, in-order access and aggregations
      • View reads are from disk (different performance profile than GET/SET)
      • Views built against every document on every node
        • Group them in a design document
      • Views are automatically kept up to date
  11. 11. QUERY: Dynamic Queries with Optional Aggregation
      • Eventually consistent with respect to document updates
      • Efficiently fetch a document or group of similar documents
      • Queries will use cached values from B-tree inner nodes when possible
      • Take advantage of in-order tree traversal with group_level queries
      Query: ?startkey="J"&endkey="K"
      Result: {"rows":[{"key":"Juneau","value":null}]}
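
To see why the range `startkey="J"` to `endkey="K"` returns only "Juneau", the query can be simulated against a small sorted list of keys. This is a sketch under assumptions: the extra city names are hypothetical, and plain string comparison stands in for the server's key collation.

```javascript
// Hypothetical index rows, already sorted by key as a view index would be.
const indexRows = [
  { key: "Boulder", value: null },
  { key: "Juneau", value: null },
  { key: "Kodiak", value: null },
  { key: "Portland", value: null },
];

// Return rows with startkey <= key <= endkey (the end key is inclusive).
function rangeQuery(rows, startkey, endkey) {
  return { rows: rows.filter((r) => r.key >= startkey && r.key <= endkey) };
}

const result = rangeQuery(indexRows, "J", "K");
console.log(JSON.stringify(result));
```

"Kodiak" is excluded because it sorts after the bare string "K", so the range effectively means "keys beginning with J".
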
  12. 12. Simple Primary and Secondary Indexing
  13. 13. Example Document
      [Screenshot: example document with its document ID labeled]
  14. 14. Define a primary index on the bucket
      • Lookup the document ID / key by key, range, prefix, suffix
      [Screenshot: index definition]
  15. 15. Define a secondary index on the bucket
      • Lookup an attribute by value, range, prefix, suffix
      [Screenshot: index definition]
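
The primary index definition on this slide was an image. A minimal sketch of the usual shape, assuming the view simply emits each document's ID, is shown below together with a prefix lookup expressed as a key range. The document IDs, the local `emit` stand-in, and the `"\uefff"` end-of-range sentinel are assumptions for illustration.

```javascript
// Local stand-in for the view engine's emit().
const rows = [];
function emit(key, value) {
  rows.push({ key, value });
}

// Primary index: key every document by its own ID.
function map(doc, meta) {
  emit(meta.id, null);
}

// Hypothetical document IDs.
const ids = ["brewery_new_belgium", "beer_abbey", "beer_1554"];
ids.forEach((id) => map({}, { id }));
rows.sort((a, b) => (a.key < b.key ? -1 : a.key > b.key ? 1 : 0));

// Prefix lookup as a range: every key starting with `prefix` falls between
// the prefix itself and the prefix followed by a high sentinel character.
function prefixQuery(sortedRows, prefix) {
  const startkey = prefix;
  const endkey = prefix + "\uefff";
  return sortedRows.filter((r) => r.key >= startkey && r.key <= endkey);
}

const beerKeys = prefixQuery(rows, "beer_").map((r) => r.key);
console.log(beerKeys);
```
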
  16. 16. Find documents by a specific attribute
      • Let's find beers by brewery_id!
  17. 17. The index definition
      [Screenshot: index definition with key and value labeled]
  18. 18. The result set: beers keyed by brewery_id
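
Since the index definition and result set were screenshots, here is a hedged sketch of what "beers keyed by brewery_id" could look like. The choice of the beer name as the emitted value, the `type` field, the sample documents, and the local `emit` stand-in are all assumptions.

```javascript
// Stand-in for the view engine: records each emitted row with its document ID.
const rows = [];
let currentId = null;
function emit(key, value) {
  rows.push({ id: currentId, key, value });
}

// Key = brewery_id, value = beer name (the choice of value is an assumption).
function map(doc, meta) {
  if (doc.type === "beer" && doc.brewery_id) {
    emit(doc.brewery_id, doc.name);
  }
}

// Hypothetical documents.
const docs = {
  beer_1554: { type: "beer", name: "1554 Enlightened Black Ale", brewery_id: "new_belgium" },
  beer_abbey: { type: "beer", name: "Abbey Belgian Style Ale", brewery_id: "new_belgium" },
  beer_other: { type: "beer", name: "Other Ale", brewery_id: "other_brewery" },
  new_belgium: { type: "brewery", name: "New Belgium Brewing" },
};
for (const [id, doc] of Object.entries(docs)) {
  currentId = id;
  map(doc, { id });
}

// Equivalent of querying the view with an exact key, e.g. ?key="new_belgium".
const byBrewery = (breweryId) => rows.filter((r) => r.key === breweryId);
console.log(byBrewery("new_belgium"));
```

Each row carries the ID of the document that produced it, so the application can fetch the full beer documents afterwards.
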
  19. 19. Query Pattern: Basic Aggregations
  20. 20. Use a built-in reduce function with a group query
      • Let's find average abv for each brewery!
  21. 21. Group reduce (reduce by unique key)
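
The slide does not say which built-in reduce it uses. Assuming `_stats`, which returns sum, count, min, max and sum of squares, a grouped query yields one statistics object per brewery, and the average is `sum / count`. The sketch below reproduces that locally; the `stats` and `groupReduce` helpers and the sample rows are hypothetical stand-ins, not server code.

```javascript
// Hypothetical map output: one row per beer, key = brewery_id, value = abv.
const mapRows = [
  { key: "brewery_a", value: 5.0 },
  { key: "brewery_a", value: 7.0 },
  { key: "brewery_b", value: 4.5 },
];

// Local stand-in for a _stats-style reduce over a list of numbers.
function stats(values) {
  return {
    sum: values.reduce((acc, v) => acc + v, 0),
    count: values.length,
    min: Math.min(...values),
    max: Math.max(...values),
    sumsqr: values.reduce((acc, v) => acc + v * v, 0),
  };
}

// group=true: reduce separately for each unique key.
function groupReduce(rows) {
  const byKey = new Map();
  for (const { key, value } of rows) {
    if (!byKey.has(key)) byKey.set(key, []);
    byKey.get(key).push(value);
  }
  return [...byKey].map(([key, values]) => ({ key, value: stats(values) }));
}

const grouped = groupReduce(mapRows);
for (const { key, value } of grouped) {
  console.log(key, "average abv:", value.sum / value.count);
}
```
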
  22. 22. Query Pattern: Time-based Rollups
  23. 23. Find patterns in beer comments by time
      {
        "type": "comment",
        "about_id": "beer_Enlightened_Black_Ale",
        "user_id": 525,
        "text": "tastes like college!",
        "updated": "2010-07-22 20:00:20"
      }
      { "id": "f1e62" }
      [Callout: timestamp]
  24. 24. Query with group_level=2 to get monthly rollups
  25. 25. group_level=3 - daily results - great for graphing
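
The rollup pattern works because the map function emits the timestamp as an array key such as `[year, month, day, ...]`, and `group_level=n` groups rows by the first `n` elements of that key. The sketch below simulates this locally. Apart from the first timestamp, which comes from the slide, the sample data is hypothetical, and `toDateArray` and `rollup` are illustrative helpers (Couchbase views provide a `dateToArray` helper for the key-building step).

```javascript
// Split "YYYY-MM-DD hh:mm:ss" into [year, month, day, hour, minute, second].
function toDateArray(timestamp) {
  return timestamp.match(/\d+/g).map(Number);
}

// Comment timestamps; only the first appears in the slides.
const updated = [
  "2010-07-22 20:00:20",
  "2010-07-22 21:15:00",
  "2010-07-23 09:30:00",
  "2010-08-01 12:00:00",
];

// Map phase: key = date array, value = 1 per comment.
const rows = updated.map((ts) => ({ key: toDateArray(ts), value: 1 }));

// Reduce with group_level=n: truncate each key to its first n elements
// and sum the values of rows that share the truncated key.
function rollup(mapRows, groupLevel) {
  const totals = new Map();
  for (const { key, value } of mapRows) {
    const group = JSON.stringify(key.slice(0, groupLevel));
    totals.set(group, (totals.get(group) || 0) + value);
  }
  return [...totals].map(([group, value]) => ({ key: JSON.parse(group), value }));
}

const monthly = rollup(rows, 2); // group_level=2
const daily = rollup(rows, 3); // group_level=3
console.log(JSON.stringify(monthly));
console.log(JSON.stringify(daily));
```
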
  26. 26. Query Pattern: Leaderboard
  27. 27. Aggregate value stored in a document
      • Let's find the top-rated beers!
      {
        "brewery": "New Belgium Brewing",
        "name": "1554 Enlightened Black Ale",
        "abv": 5.5,
        "description": "Born of a flood...",
        "category": "Belgian and French Ale",
        "style": "Other Belgian-Style Ales",
        "updated": "2010-07-22 20:00:20",
        "ratings": { "jchris": 5, "scalabl3": 4, "damienkatz": 1 },
        "comments": [ "f1e62", "6ad8c"
      [Callout: ratings]
  28. 28. Sort each beer by its average rating
      • Let's find the top-rated beers!
      [Callout: average]
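
The view definition for this slide was an image. One way to get a leaderboard, sketched here under assumptions, is to emit each beer's average rating as the key and then read the index in descending order. The second beer, the local `emit` stand-in, and the `top` helper are hypothetical; `Object.values` is used for brevity and may need replacing with a `for...in` loop on older JavaScript engines.

```javascript
// Local stand-in for the view engine's emit().
const rows = [];
function emit(key, value) {
  rows.push({ key, value });
}

// Map: key each beer by the mean of its ratings so the index sorts by rating.
function map(doc, meta) {
  if (doc.ratings) {
    const scores = Object.values(doc.ratings);
    if (scores.length > 0) {
      const total = scores.reduce((acc, s) => acc + s, 0);
      emit(total / scores.length, doc.name);
    }
  }
}

// The first document uses the ratings from the slide; the second is made up.
const docs = [
  { name: "1554 Enlightened Black Ale", ratings: { jchris: 5, scalabl3: 4, damienkatz: 1 } },
  { name: "Hypothetical Pale Ale", ratings: { jchris: 4, scalabl3: 5 } },
];
docs.forEach((doc, i) => map(doc, { id: "beer_" + i }));

// Equivalent of querying with descending=true and a limit: best average first.
function top(indexRows, limit) {
  return [...indexRows].sort((a, b) => b.key - a.key).slice(0, limit);
}

const leader = top(rows, 1);
console.log(leader);
```
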
  29. 29. Couchbase and Elastic Search
  30. 30. Full Text Search
  31. 31. Search Across Full JSON Body
      Search term: abbey
      {
        "name": "Abbey Belgian Style Ale",
        "description": "Winner of four World Beer Cup medals and eight medals at the Great American Beer Fest, Abbey Belgian Ale is the Mark Spitz of New Belgium’s lineup – but it didn’t start out that way."
      }
  32. 32. Search Across Full JSON Body
      Search term: abbey
      {
        "name": "Abbey Belgian Style Ale",
        "description": "Winner of four World Beer Cup medals and eight medals at the Great American Beer Fest, Abbey Belgian Ale is the Mark Spitz of New Belgium’s lineup – but it didn’t start out that way."
      }
  33. 33. Faceted Search
      • Categories
      • Items with Counts
      • Range Facets
  34. 34. Learning Portal – Proof of Concept
  35. 35. Couchbase and Hadoop
  36. 36. Operational vs. Analytic Databases
      • Analytic databases (Cloudera, etc.): get insights from data
      • Real-time, interactive databases (Couchbase, NoSQL): fast access to data
  37. 37. What is Sqoop?
      Sqoop is a tool designed to transfer data between Hadoop and [OLTP] databases. You can use Sqoop to import data from [an OLTP] database management system (RDBMS) such as MySQL or Oracle [or Couchbase] into the Hadoop Distributed File System (HDFS), transform the data in Hadoop MapReduce, and then export the data
  38. 38. Traditional ETL
      [Diagram: Application, Data, a transform step (T), Data]
  39. 39. A different paradigm
      [Diagram: Data, Application, Data]
  40. 40. A very scalable different paradigm
      [Diagram: Data and Application repeated several times]
  41. 41. Where did the Transform go?
      [Diagram: Application, Data, and many transform steps (T)]
  42. 42. Couchbase Import and Export
      $ sqoop import --connect http://localhost:8091/pools --table DUMP
      $ sqoop import --connect http://localhost:8091/pools --table BACKFILL_5
      $ sqoop export --connect http://localhost:8091/pools --table DUMP --export-dir DUMP
      • For imports, table must be:
        – DUMP: All keys currently in Couchbase
        – BACKFILL_n: All key mutations for n minutes
      • Specified --username maps to bucket
        – By default set to "default" bucket
  43. 43. Hadoop and Couchbase – Ad Targeting
      [Diagram with three numbered flows: click stream events; profiles, campaigns; profiles, real time campaign statistics. Callout: 40 milliseconds to respond with the decision.]
  44. 44. Moving Parts
  45. 45. Content & Recommendation Targeting
  46. 46. Moving Parts
  47. 47. Thank you. Couchbase: NoSQL Document Database