Couchbase_John_Bryce_Israel_Training_couchbase_hadoop

Published in Business, Technology
  • If you are ingesting Tweets, git commits, and LinkedIn API data, there's little value in transforming it before you save it. Just store it and sort it out later; the same holds for user data.
  • Schemaless is good as far as it goes, but what it's really saying is: "don't worry about the database", so a lot of the patterns move to the application. That's what this section is about.
  • Bulletize the text. Make sure the builds work.
  • 1. A set request comes in from the application. 2. Couchbase Server responds that the key is written. 3. Couchbase Server then replicates the data out to memory on the other nodes. 4. At the same time, it puts the data into a write queue to be persisted to disk.
  • Defined via SDKs or the administration console. Deploying a new view to production is an online operation, but can be heavy.
  • See appendix for more in-depth query patterns
  • If Sum = 11 and Count = 2, the Average is 5.5
  • ratings are stored in a hash to ensure each user can only rate each beer once
  • http://blog.couchbase.com/couchbase-and-full-text-search-couchbase-transport-elastic-search The Elasticsearch cluster is fed the documents from the Couchbase Server cluster. Elasticsearch indexes the fields (which ones is configurable) and by default stores only references back to the document ID. The application does document access via the Couchbase Server cluster and uses the views and incremental map-reduce on the Couchbase cluster. For full-text queries it queries the Elasticsearch cluster directly (simple HTTP and JSON interface). Full-text queries typically return the IDs of the matching documents. Documents are then retrieved from the Couchbase Server cluster. This way the high-throughput document access always comes from the high-performance Couchbase cluster.
  • There are two types of databases. Each is focused on a very different problem. Analytic databases were referred to in the past as OLAP databases. They are focused on looking through every record in a huge database to answer a question or gain an insight about the data contained in it. These analyses are batch processes that access every piece of data in the database, are very "read" heavy, and produce results in seconds, minutes, or sometimes days. For analytic databases, "real time" means an analysis takes a few seconds to run. Real-time interactive databases are often referred to as operational databases. They store a lot of data but usually much less than an analytic database. They must provide access to individual records in a database in milliseconds so that users of an application get good response time. Since the requirements of each database are very different, the architectures and capabilities of each are very different as well. When I refer to NoSQL in my presentation, I am referring to real-time, interactive databases. This is the type of NoSQL database Couchbase provides.

Transcript

  • 1. Couchbase and Hadoop. Perry Krug, Sr. Solutions Architect
  • 2. Agenda • View basics • Lifecycle of a view • Index definition, build, and query phase • Indexing details • Replica indexes, failover and compaction • Primary and Secondary indexes • View best practices • Couchbase and Elastic Search • Couchbase and Hadoop
  • 3. pol·y·glot /ˈpäliˌglät/ Adjective: Knowing or using several languages. Noun: A person who knows several languages. Synonyms: multilingual. per·sist·ence /pərˈsistəns/ Noun: The continued or prolonged existence of something. Synonyms: perseverance - tenacity - pertinacity - stubbornness
  • 4. Couchbase Views – The basics • Define materialized views on JSON documents and then query across the data set • Using views you can define: - Primary indexes - Simple secondary indexes (most common use case) - Complex secondary, tertiary and composite indexes - Aggregations (reduction) • Indexes are eventually indexed • Queries are eventually consistent with respect to documents • Built using Map/Reduce technology • Map and Reduce functions are written in JavaScript
  • 5. View Lifecycle: Define -> Build -> Query
  • 6. Buckets & Design docs & Views • Create design documents on a bucket • Create views within a design document [Diagram: buckets containing design documents, each grouping several views (design document 1: views 1-3; design document 2: views 4-5; design document 3: views 6-7)]
  • 7. Couchbase Server Cluster: Distributed Indexing and Querying • Indexing work is distributed amongst nodes • Parallelize the effort • Each node has index for data stored on it • Queries combine the results from required nodes [Diagram: three servers, each holding active and replica documents, with user-configured replica count = 1; two app servers use the Couchbase client library and cluster map to create an index / view and to query]
  • 8. Eventually indexed Views – Data flow [Diagram components: App Server, Couchbase Server Node, Managed Cache, Replication Queue (to other node), Disk Queue, Disk, View engine; Doc 1 shown at each stage]
  • 9. DEFINE: Index / View Definition in JavaScript. CREATE INDEX City ON Brewery.City;
  • 10. BUILD: Distributed Index Build Phase • Optimized for lookups, in-order access and aggregations • View reads are from disk (different performance profile than GET/SET) • Views built against every document on every node - Group them in a design document • Views are automatically kept up to date
  • 11. QUERY: Dynamic Queries with Optional Aggregation • Eventually consistent with respect to document updates • Efficiently fetch a document or group of similar documents • Queries will use cached values from B-tree inner nodes when possible • Take advantage of in-order tree traversal with group_level queries. Query ?startkey="J"&endkey="K" {"rows":[{"key":"Juneau","value":null}]}
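
The define, build, and query phases on slides 9 to 11 can be sketched in plain JavaScript. This is only an in-memory illustration, not Couchbase's implementation: the sample documents, `buildIndex`, and `queryRange` are hypothetical names, and `emit` is passed in as a parameter so the sketch runs standalone (in a real view the server supplies `emit` and the function takes `doc` and `meta` only).

```javascript
// Hypothetical sample documents; only the "city" field matters here.
const breweries = {
  brewery_a: { type: "brewery", city: "Juneau" },
  brewery_b: { type: "brewery", city: "Portland" },
  brewery_c: { type: "brewery", city: "Austin" },
};

// DEFINE: a map function that indexes each document by its city.
// Assumption: emit is injected so this runs outside Couchbase.
function cityMap(doc, meta, emit) {
  if (doc.city) {
    emit(doc.city, null);
  }
}

// BUILD: run the map function over every document and keep rows sorted by key.
function buildIndex(docs, mapFn) {
  const rows = [];
  for (const [id, doc] of Object.entries(docs)) {
    mapFn(doc, { id }, (key, value) => rows.push({ id, key, value }));
  }
  // Plain string comparison stands in for the server's collation rules.
  return rows.sort((a, b) => (a.key < b.key ? -1 : a.key > b.key ? 1 : 0));
}

// QUERY: return rows whose key falls between startkey and endkey.
function queryRange(index, startkey, endkey) {
  return index.filter((row) => row.key >= startkey && row.key <= endkey);
}

const cityIndex = buildIndex(breweries, cityMap);
const rows = queryRange(cityIndex, "J", "K");
console.log(JSON.stringify({ rows }));
```

With these sample documents, only the "Juneau" row falls in the range, which mirrors the query on slide 11.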
  • 12. Simple Primary and Secondary Indexing
  • 13. Example Document (callout: Document ID)
  • 14. Define a primary index on the bucket • Lookup the document ID / key by key, range, prefix, suffix (callout: Index definition)
  • 15. Define a secondary index on the bucket • Lookup an attribute by value, range, prefix, suffix (callout: Index definition)
  • 16. Find documents by a specific attribute • Let's find beers by brewery_id!
  • 17. The index definition (callouts: Key, Value)
  • 18. The result set: beers keyed by brewery_id
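
The index definition on slide 17 did not survive in the transcript. A minimal sketch of what a secondary index keyed by brewery_id might look like follows; the sample documents and the `runMap` helper are hypothetical, and `emit` is again passed in so the code runs outside the server.

```javascript
// Hypothetical documents: three beers and one brewery in the same bucket.
const beers = {
  beer_1: { type: "beer", name: "Sample Ale", brewery_id: "brewery_x" },
  beer_2: { type: "beer", name: "Sample Stout", brewery_id: "brewery_y" },
  beer_3: { type: "beer", name: "Sample Lager", brewery_id: "brewery_x" },
  brewery_x: { type: "brewery", name: "Brewery X" },
};

// Map function: the key is brewery_id, the value is left empty.
// Non-beer documents are skipped, so they never appear in the index.
function beersByBrewery(doc, meta, emit) {
  if (doc.type === "beer" && doc.brewery_id) {
    emit(doc.brewery_id, null);
  }
}

// Stand-in for the view engine: collect every emitted row with its document ID.
function runMap(docs, mapFn) {
  const rows = [];
  for (const [id, doc] of Object.entries(docs)) {
    mapFn(doc, { id }, (key, value) => rows.push({ id, key, value }));
  }
  return rows;
}

// Roughly what querying the view with key="brewery_x" would select.
const beerRows = runMap(beers, beersByBrewery).filter((r) => r.key === "brewery_x");
const beerIds = beerRows.map((r) => r.id);
console.log(beerIds);
```

Each row carries the document ID, so the application can fetch the full beer documents by key afterwards.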
  • 19. Query Pattern: Basic Aggregations
  • 20. Use a built-in reduce function with a group query • Let's find average abv for each brewery!
  • 21. Group reduce (reduce by unique key)
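
A group reduce can be pictured as keeping a running sum and count per unique key, then dividing on the client. The sketch below is an in-memory illustration with hypothetical rows and a hypothetical `groupStats` helper; it does not call Couchbase's built-in reduce functions.

```javascript
// Hypothetical map output: key is the brewery, value is a beer's abv.
const abvRows = [
  { key: "brewery_x", value: 5.0 },
  { key: "brewery_x", value: 6.0 },
  { key: "brewery_y", value: 4.5 },
];

// Reduce by unique key: accumulate sum and count for each brewery.
function groupStats(rows) {
  const groups = new Map();
  for (const { key, value } of rows) {
    const g = groups.get(key) || { sum: 0, count: 0 };
    g.sum += value;
    g.count += 1;
    groups.set(key, g);
  }
  return groups;
}

// The average is derived from the reduced values: sum / count.
const averages = {};
for (const [key, { sum, count }] of groupStats(abvRows)) {
  averages[key] = sum / count;
}
console.log(averages);
```

For brewery_x the sum is 11 and the count is 2, so the average is 5.5, matching the speaker note.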
  • 22. Query Pattern: Time-based Rollups
  • 23. Find patterns in beer comments by time { "type": "comment", "about_id": "beer_Enlightened_Black_Ale", "user_id": 525, "text": "tastes like college!", "updated": "2010-07-22 20:00:20" } { "id": "f1e62" } (callout: timestamp)
  • 24. Query with group_level=2 to get monthly rollups
  • 25. group_level=3 - daily results - great for graphing
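
The rollups on slides 24 and 25 rely on an array key such as [year, month, day, ...]: group_level=N groups rows by the first N elements of the key. The sketch below assumes that key shape and a count reduce; the sample comments, `dateKey`, and `countByGroupLevel` are hypothetical and run in memory only.

```javascript
// Hypothetical comment documents; only the "updated" timestamp is used.
const comments = [
  { type: "comment", updated: "2010-07-22 20:00:20" },
  { type: "comment", updated: "2010-07-22 21:15:00" },
  { type: "comment", updated: "2010-07-23 09:30:00" },
  { type: "comment", updated: "2010-08-01 12:00:00" },
];

// Map step: turn "YYYY-MM-DD hh:mm:ss" into [year, month, day, hour, minute, second].
function dateKey(timestamp) {
  return timestamp.split(/[- :]/).map(Number);
}

// Reduce step with grouping: count rows that share the first groupLevel key elements.
function countByGroupLevel(keys, groupLevel) {
  const counts = new Map();
  for (const key of keys) {
    const prefix = JSON.stringify(key.slice(0, groupLevel));
    counts.set(prefix, (counts.get(prefix) || 0) + 1);
  }
  return [...counts].map(([prefix, value]) => ({ key: JSON.parse(prefix), value }));
}

const keys = comments.map((c) => dateKey(c.updated));
const monthly = countByGroupLevel(keys, 2); // like group_level=2
const daily = countByGroupLevel(keys, 3); // like group_level=3
console.log(JSON.stringify(monthly));
console.log(JSON.stringify(daily));
```

The same index serves both rollups; only the grouping depth changes at query time.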
  • 26. Query Pattern: Leaderboard
  • 27. Aggregate value stored in a document • Let's find the top-rated beers! { "brewery": "New Belgium Brewing", "name": "1554 Enlightened Black Ale", "abv": 5.5, "description": "Born of a flood...", "category": "Belgian and French Ale", "style": "Other Belgian-Style Ales", "updated": "2010-07-22 20:00:20", "ratings": { "jchris": 5, "scalabl3": 4, "damienkatz": 1 }, "comments": [ "f1e62", "6ad8c" (callout: ratings)
  • 28. Sort each beer by its average rating • Let's find the top-rated beers! (callout: average)
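
A leaderboard along these lines can be sketched by emitting each beer's average rating as the key and reading the index from the highest key down. The view code from the slide is not in the transcript, so the map function below is an assumption; the second and third beers are hypothetical, and `emit` is passed in so the code runs standalone.

```javascript
// The first beer's ratings come from slide 27; the other two are hypothetical.
const ratedBeers = {
  beer_a: { name: "1554 Enlightened Black Ale", ratings: { jchris: 5, scalabl3: 4, damienkatz: 1 } },
  beer_b: { name: "Hypothetical Pale Ale", ratings: { jchris: 4, scalabl3: 5 } },
  beer_c: { name: "Hypothetical Unrated Beer", ratings: {} },
};

// Ratings are keyed by user, so a repeat rating overwrites the earlier one
// instead of adding a second vote.
ratedBeers.beer_b.ratings.jchris = 5;

// Map function: key is the average rating, value is the beer name.
// Beers with no ratings are left out of the index.
function averageRatingMap(doc, meta, emit) {
  const scores = Object.values(doc.ratings || {});
  if (scores.length === 0) return;
  const total = scores.reduce((a, b) => a + b, 0);
  emit(total / scores.length, doc.name);
}

const leaderboard = [];
for (const [id, doc] of Object.entries(ratedBeers)) {
  averageRatingMap(doc, { id }, (key, value) => leaderboard.push({ id, key, value }));
}

// Highest average first, similar in spirit to a descending view query.
leaderboard.sort((a, b) => b.key - a.key);
console.log(leaderboard);
```

Because the average is computed inside the map function, the index is already ordered by rating and the query only needs to read it in reverse.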
  • 29. Couchbase and Elastic Search
  • 30. Full Text Search
  • 31. {"name": "Abbey Belgian Style Ale", "description": "Winner of four World Beer Cup medals and eight medals at the Great American Beer Fest, Abbey Belgian Ale is the Mark Spitz of New Belgium's lineup – but it didn't start out that way."} Search Across Full JSON Body. Search term: abbey
  • 32. {"name": "Abbey Belgian Style Ale", "description": "Winner of four World Beer Cup medals and eight medals at the Great American Beer Fest, Abbey Belgian Ale is the Mark Spitz of New Belgium's lineup – but it didn't start out that way."} Search Across Full JSON Body. Search term: abbey
  • 33. Faceted Search: Categories, Items with Counts, Range Facets
  • 34. Learning Portal – Proof of Concept
  • 35. Couchbase and Hadoop
  • 36. Operational vs. Analytic Databases [Diagram: NoSQL split into Analytic Databases (Cloudera, etc.): get insights from data; and Real-time, Interactive Databases (Couchbase): fast access to data]
  • 37. What is Sqoop? Sqoop is a tool designed to transfer data between Hadoop and [OLTP] databases. You can use Sqoop to import data from [an OLTP] database management system (RDBMS) such as MySQL or Oracle [or Couchbase] into the Hadoop Distributed File System (HDFS), transform the data in Hadoop MapReduce, and then export the data back. (sqoop.apache.org)
  • 38. Traditional ETL [Diagram labels: Application, Data, Data, T]
  • 39. A different paradigm [Diagram labels: Data, Application, Data]
  • 40. A very scalable different paradigm [Diagram labels: Data and Application, repeated several times]
  • 41. Where did the Transform go? [Diagram labels: Application, Data, and many T markers]
  • 42. Couchbase Import and Export
    $ sqoop import --connect http://localhost:8091/pools --table DUMP
    $ sqoop import --connect http://localhost:8091/pools --table BACKFILL_5
    $ sqoop export --connect http://localhost:8091/pools --table DUMP --export-dir DUMP
    • For imports, table must be: - DUMP: All keys currently in Couchbase - BACKFILL_n: All key mutations for n minutes
    • Specified --username maps to bucket - By default set to "default" bucket
  • 43. Hadoop and Couchbase – Ad Targeting [Diagram labels: click stream events; profiles, campaigns; profiles, real time campaign statistics; steps 1-3] 40 milliseconds to respond with the decision.
  • 44. Moving Parts
  • 45. Content & Recommendation Targeting
  • 46. Moving Parts
  • 47. Thank you. Couchbase: NoSQL Document Database