Couchbase_John_Bryce_Israel_Training_couchbase_hadoop

Published in: Business, Technology

Notes
  • If you are ingesting tweets, Git commits, and LinkedIn API data, there's little value in transforming it before you save it. Just store it and sort it out later. The same holds for user data.
  • Schemaless is good as far as it goes, but what it's really saying is "don't worry about the database", so a lot of the patterns move to the application. That's what this section is about.
  • Bulletize the text. Make sure the builds work.
  • 1. A set request comes in from the application. 2. Couchbase Server responds that the key is written. 3. Couchbase Server then replicates the data out to memory on the other nodes. 4. At the same time, it puts the data into a write queue to be persisted to disk.
  • Defined via SDKs or the administration console. Deploying a new view to production is an online operation, but it can be heavy.
  • See the appendix for more in-depth query patterns.
  • If Sum = 11 and Count = 2, the Average is 5.5
  • Ratings are stored in a hash to ensure each user can rate each beer only once.
  • http://blog.couchbase.com/couchbase-and-full-text-search-couchbase-transport-elastic-search The Elasticsearch cluster is fed documents from the Couchbase Server cluster. Elasticsearch indexes the fields (which ones is configurable) and by default stores only references back to the document ID. The application does document access via the Couchbase Server cluster and uses views and incremental map-reduce on the Couchbase cluster. For full-text queries it queries the Elasticsearch cluster directly (simple HTTP and JSON interface). Full-text queries typically return the IDs of the matching documents, which are then retrieved from the Couchbase Server cluster. This way, high-throughput document access always comes from the high-performance Couchbase cluster.
  • There are two types of databases, each focused on a very different problem. Analytic databases, referred to in the past as OLAP databases, are focused on looking through every record in a huge database to answer a question or gain an insight about the data it contains. These analyses are batch processes that access every piece of data in the database, are very read-heavy, and produce results in seconds, minutes, or sometimes days. For analytic databases, "real time" means an analysis takes a few seconds to run. Real-time interactive databases are often referred to as operational databases. They store a lot of data, but usually much less than an analytic database. They must provide access to individual records in milliseconds so that users of an application get good response times. Since the requirements of each type are very different, their architectures and capabilities are very different as well. When I refer to NoSQL in my presentation, I am referring to real-time, interactive databases. This is the type of NoSQL database Couchbase provides.

    1. 1. Couchbase and Hadoop. Perry Krug, Sr. Solutions Architect
    2. 2. Agenda
          • View basics
          • Lifecycle of a view
          • Index definition, build, and query phase
          • Indexing details
          • Replica indexes, failover and compaction
          • Primary and Secondary indexes
          • View best practices
          • Couchbase and Elastic Search
          • Couchbase and Hadoop
    3. 3. pol·y·glot /ˈpäliˌglät/
          Adjective: Knowing or using several languages.
          Noun: A person who knows several languages.
          Synonyms: multilingual
          per·sist·ence /pərˈsistəns/
          Noun: The continued or prolonged existence of something.
          Synonyms: perseverance, tenacity, pertinacity, stubbornness
    4. 4. Couchbase Views – The basics
          • Define materialized views on JSON documents and then query across the data set
          • Using views you can define
            - Primary indexes
            - Simple secondary indexes (most common use case)
            - Complex secondary, tertiary and composite indexes
            - Aggregations (reduction)
          • Indexes are eventually indexed
          • Queries are eventually consistent with respect to documents
          • Built using Map/Reduce technology
          • Map and Reduce functions are written in JavaScript
    5. 5. View Lifecycle: Define -> Build -> Query
    6. 6. Buckets & Design docs & Views
          • Create design documents on a bucket
          • Create views within a design document
          [Diagram: BUCKET 1 contains Design document 1 (View 1, View 2, View 3), Design document 2 (View 4, View 5) and Design document 3 (View 6, View 7); BUCKET 2 is shown alongside.]
    7. 7. Couchbase Server Cluster: Distributed Indexing and Querying
          • Indexing work is distributed amongst nodes
          • Parallelize the effort
          • Each node has index for data stored on it
          • Queries combine the results from required nodes
          [Diagram: App Server 1 and App Server 2, each with a Couchbase client library and cluster map, send "Create Index / View" and "Query" requests to Servers 1 to 3; each server holds active and replica documents. User-configured replica count = 1.]
    8. 8. Eventually indexed Views – Data flow
          [Diagram labels: App Server, Couchbase Server Node, Managed Cache, Replication Queue (to other node), Disk Queue, Disk, View engine; Doc 1 is shown at each stage.]
    9. 9. DEFINE: Index / View Definition in JavaScript
          CREATE INDEX City ON Brewery.City;
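As a rough illustration of what such a definition might look like as a view, the sketch below writes a map function that indexes breweries by city. The `type` and `city` field names, the sample documents, and the local `emit` stub are assumptions made so the snippet can run outside Couchbase; inside a real view only the map function body would be deployed.

```javascript
// Stand-in for the emit() that the view engine provides to map functions.
const rows = [];
function emit(key, value) {
  rows.push({ key, value });
}

// Map function: roughly the view counterpart of
//   CREATE INDEX City ON Brewery.City;
// The "type" and "city" field names are assumptions about the document shape.
function cityMap(doc, meta) {
  if (doc.type === "brewery" && doc.city) {
    emit(doc.city, null);
  }
}

// Hypothetical sample documents, used only to exercise the sketch.
const sampleDocs = [
  { id: "brewery_a", doc: { type: "brewery", city: "Juneau" } },
  { id: "brewery_b", doc: { type: "brewery", city: "Boulder" } },
  { id: "beer_c", doc: { type: "beer", name: "Some Ale" } },
];

for (const { id, doc } of sampleDocs) {
  cityMap(doc, { id });
}

// The index is kept sorted by key; a plain sort imitates that here.
rows.sort((a, b) => (a.key < b.key ? -1 : a.key > b.key ? 1 : 0));
console.log(rows);
```

Emitting `null` as the value keeps the index small; the document itself can still be fetched by ID after the query.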
    10. 10. BUILD: Distributed Index Build Phase
            • Optimized for lookups, in-order access and aggregations
            • View reads are from disk (different performance profile than GET/SET)
            • Views built against every document on every node
              - Group them in a design document
            • Views are automatically kept up to date
    11. 11. QUERY: Dynamic Queries with Optional Aggregation
            • Eventually consistent with respect to document updates
            • Efficiently fetch a document or group of similar documents
            • Queries will use cached values from B-tree inner nodes when possible
            • Take advantage of in-order tree traversal with group_level queries
            Query: ?startkey="J"&endkey="K"
            Result: {"rows":[{"key":"Juneau","value":null}]}
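The range query on this slide can be pictured as a scan over the sorted index. The sketch below is only a model of that behavior: the index rows are hypothetical, `queryRange` is an illustrative helper rather than an SDK call, and plain JavaScript string comparison stands in for the collation a real server applies.

```javascript
// Index rows as a view might hold them, already sorted by key (hypothetical data).
const indexRows = [
  { key: "Boulder", value: null },
  { key: "Juneau", value: null },
  { key: "Kalamazoo", value: null },
  { key: "Portland", value: null },
];

// Range scan: keep rows with startkey <= key <= endkey.
// Plain string comparison is an assumption made for this sketch.
function queryRange(rows, startkey, endkey) {
  return rows.filter((row) => row.key >= startkey && row.key <= endkey);
}

const result = { rows: queryRange(indexRows, "J", "K") };
console.log(JSON.stringify(result));
```

"Kalamazoo" sorts after the end key "K", so only keys beginning with "J" fall inside the range.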
    12. 12. Simple Primary and Secondary Indexing
    13. 13. Example Document [image label: Document ID]
    14. 14. Define a primary index on the bucket
            • Lookup the document ID / key by key, range, prefix, suffix
            [image label: Index definition]
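A primary index of this kind is commonly written as a map function that emits the document ID from `meta.id`. The sketch below shows that pattern together with a prefix lookup expressed as a key range. The document IDs, the `emit` stub, and the `queryPrefix` helper are hypothetical, and the `"\uffff"` end marker relies on plain JavaScript string ordering, which is an assumption of this sketch.

```javascript
// Stand-in for the emit() that the view engine provides to map functions.
const rows = [];
function emit(key, value) {
  rows.push({ key, value });
}

// Primary index: every document is indexed under its own ID.
function primaryIndexMap(doc, meta) {
  emit(meta.id, null);
}

// Hypothetical documents keyed by ID.
const docs = {
  beer_abbey: { type: "beer" },
  brewery_new_belgium: { type: "brewery" },
  beer_1554: { type: "beer" },
  comment_f1e62: { type: "comment" },
};

for (const [id, doc] of Object.entries(docs)) {
  primaryIndexMap(doc, { id });
}
rows.sort((a, b) => (a.key < b.key ? -1 : a.key > b.key ? 1 : 0));

// Prefix lookup as a range: from the prefix up to the prefix plus a high code unit.
function queryPrefix(indexRows, prefix) {
  const startkey = prefix;
  const endkey = prefix + "\uffff";
  return indexRows.filter((row) => row.key >= startkey && row.key <= endkey);
}

const beerIds = queryPrefix(rows, "beer_").map((row) => row.key);
console.log(beerIds);
```

Suffix lookups are not covered here; they would need a differently shaped key, for example the reversed ID.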
    15. 15. Define a secondary index on the bucket
            • Lookup an attribute by value, range, prefix, suffix
            [image label: Index definition]
    16. 16. Find documents by a specific attribute
            • Let's find beers by brewery_id!
    17. 17. The index definition [image labels: Key, Value]
    18. 18. The result set: beers keyed by brewery_id
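The slide images with the actual index definition and result set did not survive, so the following is only a guess at their shape: a map function keyed on `brewery_id`, and an exact-key lookup that returns one row per matching beer along with the ID of the document that emitted it. The document IDs, the `brewery_id` values, and the `emit` stub are hypothetical.

```javascript
// Stand-in for the view engine: emit() records which document produced each row.
const rows = [];
let currentId = null;
function emit(key, value) {
  rows.push({ id: currentId, key, value });
}

// Secondary index: beers keyed by the brewery they belong to.
function beersByBrewery(doc, meta) {
  if (doc.type === "beer" && doc.brewery_id) {
    emit(doc.brewery_id, null);
  }
}

// Hypothetical documents; the IDs and brewery_id values are made up.
const docs = {
  beer_1554: { type: "beer", brewery_id: "new_belgium", name: "1554 Enlightened Black Ale" },
  beer_abbey: { type: "beer", brewery_id: "new_belgium", name: "Abbey Belgian Style Ale" },
  beer_other: { type: "beer", brewery_id: "other_brewery", name: "Some Other Ale" },
  new_belgium: { type: "brewery", name: "New Belgium Brewing" },
};

for (const [id, doc] of Object.entries(docs)) {
  currentId = id;
  beersByBrewery(doc, { id });
}

// Exact-key lookup, modelling a query with ?key="new_belgium".
function queryKey(indexRows, key) {
  return indexRows.filter((row) => row.key === key);
}

const newBelgiumBeers = queryKey(rows, "new_belgium").map((row) => row.id);
console.log(newBelgiumBeers);
```

One key can map to many rows, which is what makes this a secondary index rather than a unique lookup.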
    19. 19. Query Pattern: Basic Aggregations
    20. 20. Use a built-in reduce function with a group query
            • Let's find average abv for each brewery!
    21. 21. Group reduce (reduce by unique key)
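A grouped reduce of this kind can be modelled in a few lines. In the sketch below the map function emits `abv` keyed by `brewery_id`, and `groupStats` imitates only the sum and count parts of a statistics-style reduce applied per unique key; it is not the server's built-in implementation. The sample beers are hypothetical and chosen so that one brewery has a sum of 11 over 2 beers.

```javascript
// Stand-in for the emit() that the view engine provides to map functions.
const rows = [];
function emit(key, value) {
  rows.push({ key, value });
}

// Map: one row per beer, keyed by brewery, with abv as the value.
function abvMap(doc, meta) {
  if (doc.type === "beer" && doc.brewery_id && typeof doc.abv === "number") {
    emit(doc.brewery_id, doc.abv);
  }
}

// Simplified grouped reduce: sum and count of values for each unique key.
function groupStats(mapRows) {
  const groups = new Map();
  for (const { key, value } of mapRows) {
    const stats = groups.get(key) || { sum: 0, count: 0 };
    stats.sum += value;
    stats.count += 1;
    groups.set(key, stats);
  }
  return [...groups.entries()]
    .sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0))
    .map(([key, value]) => ({ key, value }));
}

// Hypothetical beers.
const beers = [
  { type: "beer", brewery_id: "brewery_a", abv: 5.5 },
  { type: "beer", brewery_id: "brewery_a", abv: 5.5 },
  { type: "beer", brewery_id: "brewery_b", abv: 4 },
];
beers.forEach((doc, i) => abvMap(doc, { id: "beer_" + i }));

// The client derives the average from sum and count.
const averages = groupStats(rows).map(({ key, value }) => ({
  key,
  average: value.sum / value.count,
}));
console.log(averages);
```

Keeping sum and count separate, instead of storing an average, lets partial results from different nodes be combined correctly.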
    22. 22. Query Pattern: Time-based Rollups
    23. 23. Find patterns in beer comments by time
            { "id": "f1e62" }
            {
              "type": "comment",
              "about_id": "beer_Enlightened_Black_Ale",
              "user_id": 525,
              "text": "tastes like college!",
              "updated": "2010-07-22 20:00:20"
            }
            [Callout label: timestamp]
    24. 24. Query with group_level=2 to get monthly rollups
    25. 25. group_level=3 - daily results - great for graphing
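The rollups on these slides depend on emitting the timestamp as an array key such as [year, month, day, ...], so that `group_level` can truncate it. The sketch below models that idea with a hypothetical `toDateArray` helper (Couchbase views offer a built-in `dateToArray()` for the same purpose) and a `groupCount` function that imitates a counting reduce at a given group level. The comment documents are made up apart from the timestamp format.

```javascript
// Stand-in for the emit() that the view engine provides to map functions.
const rows = [];
function emit(key, value) {
  rows.push({ key, value });
}

// Hypothetical helper: "2010-07-22 20:00:20" -> [2010, 7, 22, 20, 0, 20].
function toDateArray(timestamp) {
  return timestamp.match(/\d+/g).map(Number);
}

// Map: one row per comment, keyed by its timestamp broken into parts.
function commentsByTime(doc, meta) {
  if (doc.type === "comment" && doc.updated) {
    emit(toDateArray(doc.updated), null);
  }
}

// Order array keys element by element.
function compareKeys(a, b) {
  for (let i = 0; i < Math.min(a.length, b.length); i++) {
    if (a[i] !== b[i]) return a[i] - b[i];
  }
  return a.length - b.length;
}

// Imitates a counting reduce with group_level=N: truncate each key to its
// first N elements and count the rows that share the truncated key.
function groupCount(mapRows, groupLevel) {
  const counts = new Map();
  for (const row of mapRows) {
    const groupKey = JSON.stringify(row.key.slice(0, groupLevel));
    counts.set(groupKey, (counts.get(groupKey) || 0) + 1);
  }
  return [...counts.entries()]
    .map(([k, value]) => ({ key: JSON.parse(k), value }))
    .sort((a, b) => compareKeys(a.key, b.key));
}

// Hypothetical comments.
const comments = [
  { type: "comment", updated: "2010-07-22 20:00:20" },
  { type: "comment", updated: "2010-07-22 21:15:00" },
  { type: "comment", updated: "2010-07-30 09:00:00" },
  { type: "comment", updated: "2010-08-01 12:30:00" },
];
comments.forEach((doc, i) => commentsByTime(doc, { id: "comment_" + i }));

const monthly = groupCount(rows, 2); // keys like [2010, 7]
const daily = groupCount(rows, 3); // keys like [2010, 7, 22]
console.log(JSON.stringify(monthly), JSON.stringify(daily));
```

The same index serves yearly, monthly, and daily rollups; only the group level in the query changes.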
    26. 26. Query Pattern: Leaderboard
    27. 27. Aggregate value stored in a document
            • Let's find the top-rated beers!
            {
              "brewery": "New Belgium Brewing",
              "name": "1554 Enlightened Black Ale",
              "abv": 5.5,
              "description": "Born of a flood...",
              "category": "Belgian and French Ale",
              "style": "Other Belgian-Style Ales",
              "updated": "2010-07-22 20:00:20",
              "ratings": { "jchris": 5, "scalabl3": 4, "damienkatz": 1 },
              "comments": [ "f1e62", "6ad8c"
            [Callout label: ratings]
    28. 28. Sort each beer by its average rating
            • Let's find the top-rated beers!
            [Callout label: average]
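One way to read this slide is that the map function computes the average of the `ratings` hash and emits it as the key, so the index is sorted by rating and a descending query yields a leaderboard. The sketch below follows that reading; the second beer's ratings, the `emit` stub, and the `topRated` helper are hypothetical.

```javascript
// Stand-in for the emit() that the view engine provides to map functions.
const rows = [];
function emit(key, value) {
  rows.push({ key, value });
}

// Map: key each beer by the average of its per-user ratings.
function averageRatingMap(doc, meta) {
  if (doc.ratings) {
    const values = Object.values(doc.ratings);
    if (values.length > 0) {
      const total = values.reduce((sum, rating) => sum + rating, 0);
      emit(total / values.length, doc.name);
    }
  }
}

// The first document uses the ratings from the slide; the second is made up.
const beers = [
  { name: "1554 Enlightened Black Ale", ratings: { jchris: 5, scalabl3: 4, damienkatz: 1 } },
  { name: "Abbey Belgian Style Ale", ratings: { jchris: 5, scalabl3: 4 } },
];
beers.forEach((doc, i) => averageRatingMap(doc, { id: "beer_" + i }));

// Models a query with descending=true and a limit: highest averages first.
function topRated(indexRows, limit) {
  return [...indexRows].sort((a, b) => b.key - a.key).slice(0, limit);
}

const leaderboard = topRated(rows, 10);
console.log(leaderboard);
```

Because each user appears once in the hash, re-rating a beer overwrites the earlier score instead of adding a second vote.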
    29. 29. Couchbase and Elastic Search
    30. 30. Full Text Search
    31. 31. Search Across Full JSON Body
            Search term: abbey
            {
              "name": "Abbey Belgian Style Ale",
              "description": "Winner of four World Beer Cup medals and eight medals at the Great American Beer Fest, Abbey Belgian Ale is the Mark Spitz of New Belgium’s lineup – but it didn’t start out that way."
            }
    32. 32. Search Across Full JSON Body
            Search term: abbey
            {
              "name": "Abbey Belgian Style Ale",
              "description": "Winner of four World Beer Cup medals and eight medals at the Great American Beer Fest, Abbey Belgian Ale is the Mark Spitz of New Belgium’s lineup – but it didn’t start out that way."
            }
    33. 33. Faceted Search: Categories, Items with Counts, Range Facets
    34. 34. Learning Portal – Proof of Concept
    35. 35. Couchbase and Hadoop
    36. 36. Operational vs. Analytic Databases
            • Analytic Databases (Cloudera, etc.): Get insights from data
            • Real-time, Interactive Databases (Couchbase): Fast access to data
            NoSQL
    37. 37. What is Sqoop?
            Sqoop is a tool designed to transfer data between Hadoop and [OLTP] databases. You can use Sqoop to import data from [an OLTP] database management system (RDBMS) such as MySQL or Oracle [or Couchbase] into the Hadoop Distributed File System (HDFS), transform the data in Hadoop MapReduce, and then export the data back.
            sqoop.apache.org
    38. 38. Traditional ETL [Diagram: Application, Data, T (transform), Data]
    39. 39. A different paradigm [Diagram: Data, Application, Data]
    40. 40. A very scalable different paradigm [Diagram: Data, Application, Data, Application, Data, Application, Data]
    41. 41. Where did the Transform go? [Diagram: Application, Data, and many small T (transform) boxes]
    42. 42. Couchbase Import and Export
            $ sqoop import --connect http://localhost:8091/pools --table DUMP
            $ sqoop import --connect http://localhost:8091/pools --table BACKFILL_5
            $ sqoop export --connect http://localhost:8091/pools --table DUMP --export-dir DUMP
            • For imports, table must be:
              - DUMP: All keys currently in Couchbase
              - BACKFILL_n: All key mutations for n minutes
            • Specified --username maps to bucket
              - By default set to "default" bucket
    43. 43. Hadoop and Couchbase – Ad Targeting
            [Diagram labels, in steps numbered 1 to 3: click stream events; profiles, campaigns; profiles, real-time campaign statistics.]
            40 milliseconds to respond with the decision.
    44. 44. Moving Parts
    45. 45. Content & Recommendation Targeting
    46. 46. Moving Parts
    47. 47. Thank you. Couchbase: NoSQL Document Database
