
KCDC_Conference_Nosql_for_sql_professionals

Published in: Technology

  1. 1. NoSQL for Interactive Applications: NoSQL for SQL Professionals. Matt Ingenthron, Director, Developer Solutions
  2. 2. Relational compared to NoSQL Databases
  3. 3. Relational vs Document data model
       • Relational data model: highly-structured table organization with rigidly-defined data formats and record structure (columns C1, C2, C3, C4).
       • Document data model: collection of complex documents with arbitrary, nested data formats and varying “record” format (JSON).
  4. 4. SQL Normalized Tables
       Users
         KEY  First  Last        ZIP_ID
         1    Matt   Ingenthron  2
         2    Joe    Smith       2
         3    Ali    Dodson      2
         4    John   Doe         3
       Addresses
         ZIP_ID  CITY  STATE  ZIP
         1       DEN   CO     30303
         2       MV    CA     94040
         3       CHI   IL     60609
         4       NY    NY     10010
       ZIP_ID in Users is a foreign key into Addresses. To get information about a specific user, you perform a join across the two tables:
       SELECT * FROM Users u INNER JOIN Addresses a ON u.zip_id = a.zip_id WHERE key=1
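The join on this slide can be tried with Python's built-in `sqlite3` module. This is a minimal sketch: the table and column names and the rows come from the slide, while the column types, the quoting of `key`, and the helper names `open_demo_db` and `fetch_user_with_address` are assumptions made for illustration.

```python
import sqlite3

# Rows are taken from the slide; the column types are assumed.
SCHEMA_AND_DATA = """
CREATE TABLE Addresses (
    zip_id INTEGER PRIMARY KEY,
    city   TEXT,
    state  TEXT,
    zip    TEXT
);
CREATE TABLE Users (
    "key"  INTEGER PRIMARY KEY,
    first  TEXT,
    last   TEXT,
    zip_id INTEGER REFERENCES Addresses(zip_id)  -- foreign key
);
INSERT INTO Addresses VALUES
    (1, 'DEN', 'CO', '30303'),
    (2, 'MV',  'CA', '94040'),
    (3, 'CHI', 'IL', '60609'),
    (4, 'NY',  'NY', '10010');
INSERT INTO Users VALUES
    (1, 'Matt', 'Ingenthron', 2),
    (2, 'Joe',  'Smith',      2),
    (3, 'Ali',  'Dodson',     2),
    (4, 'John', 'Doe',        3);
"""


def open_demo_db():
    """Create an in-memory database holding the two normalized tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA_AND_DATA)
    return conn


def fetch_user_with_address(conn, user_key):
    """Join Users to Addresses over zip_id and return one combined row (or None)."""
    query = (
        "SELECT u.first, u.last, a.city, a.state, a.zip "
        "FROM Users u INNER JOIN Addresses a ON u.zip_id = a.zip_id "
        'WHERE u."key" = ?'
    )
    return conn.execute(query, (user_key,)).fetchone()


conn = open_demo_db()
print(fetch_user_with_address(conn, 1))
```

The application gets a complete user only after the database has reassembled it from two tables, which is the "aggregating through queries" step the later slides contrast with document storage.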
  5. 5. The Classic Order Entry Structure
       Source: http://martinfowler.com/bliki/AggregateOrientedDatabase.html
       Relational databases were not designed with clusters in mind, which is why people have cast around for an alternative. Storing aggregates as fundamental units makes a lot of sense for running on a cluster.
  6. 6. What is an aggregate?
       When you need to retrieve data from an RDBMS, you are "aggregating" or "denormalizing" the data for your application through queries with joins, where clauses and order by clauses.
       In document databases, instead of breaking data into tables and foreign keys, you store the aggregate data together in JSON document(s).
  7. 7. Aggregate by Comparison
       o::1001
       {
         uid: "ji22jd",
         customer: "Ann",
         line_items: [
           { sku: 0321293533, quan: 3, unit_price: 48.0 },
           { sku: 0321601912, quan: 1, unit_price: 39.0 },
           { sku: 0131495054, quan: 1, unit_price: 51.0 }
         ],
         payment: { type: "Amex", expiry: "04/2001", last5: 12345 }
       }
       • Easy to distribute data
       • Makes sense to application programmers
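To illustrate why an aggregate "makes sense to application programmers", here is a small sketch that loads the order document and totals it without any join. The field names and values follow the slide. Two things are assumptions made so the document is valid JSON: the keys are quoted, and the `sku` values are strings because JSON numbers cannot have leading zeros. The helper `order_total` is hypothetical.

```python
import json

# The order aggregate from the slide, written as strict JSON.
ORDER_JSON = """
{
  "uid": "ji22jd",
  "customer": "Ann",
  "line_items": [
    {"sku": "0321293533", "quan": 3, "unit_price": 48.0},
    {"sku": "0321601912", "quan": 1, "unit_price": 39.0},
    {"sku": "0131495054", "quan": 1, "unit_price": 51.0}
  ],
  "payment": {"type": "Amex", "expiry": "04/2001", "last5": 12345}
}
"""


def order_total(order):
    """Sum quantity * unit price over the line items nested in the order."""
    return sum(item["quan"] * item["unit_price"] for item in order["line_items"])


order = json.loads(ORDER_JSON)
print(order["customer"], order_total(order))  # prints: Ann 234.0
```

Everything needed to answer the question lives in one document, so the whole order can be placed on a single node of a cluster.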
  8. 8. Documents are Aggregates
       Document data is an aggregate: the Users row for KEY 1 and its Addresses row are combined, so all data is in a single document.
       {"ID": 1, "First": "Matt", "Last": "Ingenthron", "ZIP": "92648", "CITY": "HB", "STATE": "CA"}
       JSON = couchbase.get("user::1")
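The single-lookup access pattern can be sketched with an in-memory stand-in. The `DocumentStore` class below is hypothetical and is not the Couchbase client API; it only mimics the shape of the slide's `couchbase.get("user::1")` call, storing each document as JSON text under its key.

```python
import json


class DocumentStore:
    """Toy key-to-JSON store; a stand-in for a document database, not a real client."""

    def __init__(self):
        self._docs = {}  # key -> JSON text

    def set(self, key, doc):
        """Serialize the document and store it under the key."""
        self._docs[key] = json.dumps(doc)

    def get(self, key):
        """Return the parsed document, or None if the key is absent."""
        raw = self._docs.get(key)
        return None if raw is None else json.loads(raw)


store = DocumentStore()
store.set("user::1", {
    "ID": 1, "First": "Matt", "Last": "Ingenthron",
    "ZIP": "92648", "CITY": "HB", "STATE": "CA",
})

user = store.get("user::1")  # one lookup by key, no join
print(user["First"], user["CITY"], user["STATE"])
```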
  9. 9. NoSQL Taxonomy
  10. 10. The CAP Theorem
       • In a distributed system:
         - Consistency
         - Availability
         - Partition Tolerance
       • When a partition happens:
         - Choose either Consistency (only respond to a subset)
         - or Availability (accept stale data and conflicting writes): Conflict Resolution!
  11. 11. Clarification
       • Big Data
         - Large-scale datastore (“>= 100TB or Petabytes”)
         - Optimized for Batch Processing
         - Data Warehouse
       • Big Users
         - very high get/set rate (thousands of ops/s)
         - working set in RAM
         - latency and throughput matter most
         - (near) Real-Time use cases
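The choice described on the CAP slide can be sketched with a toy replica that has been cut off from its peers. The `Replica` class, its `mode` flag, and the `Unavailable` exception are hypothetical names, and real systems are far more involved. The sketch shows only the two behaviours named on the slide: refusing to answer (consistency) or answering with possibly stale data and accepting writes that may conflict later (availability).

```python
class Unavailable(Exception):
    """Raised when a consistency-first replica cannot safely answer."""


class Replica:
    def __init__(self, mode):
        self.mode = mode          # "CP" (consistency) or "AP" (availability)
        self.value = None
        self.partitioned = False  # True while cut off from the other replicas

    def _check(self):
        # A CP replica refuses requests it cannot coordinate with its peers.
        if self.partitioned and self.mode == "CP":
            raise Unavailable("partitioned from peers")

    def read(self):
        self._check()
        return self.value  # on an AP replica this may be stale

    def write(self, value):
        self._check()
        self.value = value  # on an AP replica this may conflict after healing


# Two AP replicas accept different writes during a partition.
left, right = Replica("AP"), Replica("AP")
left.partitioned = right.partitioned = True
left.write("blue")
right.write("green")
conflict = left.read() != right.read()  # must be resolved once the partition heals
print(conflict)
```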
  12. 12. The Key-Value Store / “Cache” – the foundation of NoSQL
       Key → Opaque Binary Value
  13. 13. Memcached – the NoSQL precursor
       Key → Opaque Binary Value
       • In-memory only
       • Limited set of operations
         - Blob Storage: Set, Add, Replace, CAS
         - Retrieval: Get
         - Structured Data: Append, Increment
       “Simple and fast.”
       Challenges:
         - cold cache
         - disruptive elasticity
         - missing persistence
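The operation set listed on the Memcached slide can be sketched as a small in-memory class. `ToyCache` is a hypothetical illustration of the semantics (store unconditionally, store only if absent, store only if present, compare-and-swap, append, increment). It is not the memcached protocol or any client library, and the `gets` helper that returns the CAS id alongside the value is an assumption of this sketch.

```python
class ToyCache:
    """In-memory sketch of memcached-style operations on opaque byte values."""

    def __init__(self):
        self._items = {}    # key -> (value bytes, cas id)
        self._next_cas = 1

    def _store(self, key, value):
        # Every successful store gives the item a fresh CAS id.
        self._items[key] = (value, self._next_cas)
        self._next_cas += 1
        return True

    def set(self, key, value):
        return self._store(key, value)

    def add(self, key, value):
        # Succeeds only if the key is not present yet.
        return False if key in self._items else self._store(key, value)

    def replace(self, key, value):
        # Succeeds only if the key is already present.
        return self._store(key, value) if key in self._items else False

    def get(self, key):
        item = self._items.get(key)
        return None if item is None else item[0]

    def gets(self, key):
        # Returns (value, cas id) so a later cas() can detect concurrent changes.
        return self._items.get(key)

    def cas(self, key, value, cas_id):
        item = self._items.get(key)
        if item is None or item[1] != cas_id:
            return False  # someone else changed the item first
        return self._store(key, value)

    def append(self, key, suffix):
        item = self._items.get(key)
        return False if item is None else self._store(key, item[0] + suffix)

    def incr(self, key, delta=1):
        item = self._items.get(key)
        if item is None:
            return None
        new_value = int(item[0]) + delta
        self._store(key, str(new_value).encode())
        return new_value


cache = ToyCache()
cache.set("hits", b"10")
print(cache.incr("hits"))  # prints: 11
```

Because the data lives only in a Python dict, restarting the process empties it, which mirrors the "cold cache" and "missing persistence" challenges on the slide.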
  14. 14. NoSQL catalog (Cache (memory only) vs. Database (memory/disk))
       • Key-Value: Memcached
  15. 15. Redis – More “Structured Data” commands
       Key → “Data Structures”: Blob, List, Set, Hash, …
       • Disk persistence (eventual consistency on the disk)
       • Vast set of operations
         - Blob Storage: Set, Add, Replace, CAS
         - Retrieval: Get, Pub-Sub
         - Structured Data: Strings, Hashes, Lists, Sets, Sorted sets
       Challenges:
         - clustering (to come)
         - RAM limit (no eviction)
  16. 16. NoSQL catalog (Cache (memory only) vs. Database (memory/disk))
       • Key-Value: Memcached
       • Data Structure: Redis
  17. 17. Membase – From key-value cache to database
       Key → Opaque Binary Value
       • Disk-based with built-in memcached cache
       • Cache refill on restart
       • Memcached compatible (drop-in replacement)
       • Highly-available (data replication)
       • Add or remove capacity to live cluster
       “Simple, fast, elastic.”
  18. 18. NoSQL catalog (Cache (memory only) vs. Database (memory/disk))
       • Key-Value: Memcached, Membase
       • Data Structure: Redis
  19. 19. Couchbase – Document-oriented database
       Key → JSON & opaque object (“document”):
       { "string": "string", "string": value, "string": { "string": "string", "string": value }, "string": [ array ] }
       • Auto-sharding
       • Disk-based with built-in memcached cache
       • Memcached compatible
       • Highly-available (data replication)
       • Add or remove capacity to live cluster
       • When values are JSON objects (“documents”): create indices and views, and query against the views
       • Chooses Consistency over Availability
  20. 20. NoSQL catalog (Cache (memory only) vs. Database (memory/disk))
       • Key-Value: Memcached, Membase
       • Data Structure: Redis
       • Document: Couchbase
  21. 21. MongoDB – Document-oriented database
       Key → BSON object (“document”):
       { "string": "string", "string": value, "string": { "string": "string", "string": value }, "string": [ array ] }
       • Disk-based with in-memory “caching”
       • BSON (“binary JSON”) format and wire protocol
       • Master-slave replication
       • Proxied-sharding tier
       • Values are BSON objects
       • Supports ad hoc queries – best when indexed
       • More similar to RDBMS modeling than caches
       • Scaling over sharding requires special nodes
  22. 22. NoSQL catalog (Cache (memory only) vs. Database (memory/disk))
       • Key-Value: Memcached, Membase
       • Data Structure: Redis
       • Document: Couchbase, MongoDB
  23. 23. Cassandra – Column overlays
       Key → Opaque Binary Value, overlaid with Column 1, Column 2, Column 3 (not present)
       • Disk-based system
       • Clustered
       • External caching required for low-latency reads
       • “Columns” are overlaid on the data
       • Not all rows must have all columns
       • Supports efficient queries on columns
       • Restart required when adding columns
       • Multi-Data-Center replication supported
       • Column model may be complex to start with
       • Chooses Availability over Consistency
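The idea that "not all rows must have all columns" can be sketched with plain dictionaries. This is only an illustration of sparse rows and of querying by column. The row keys, column names, and the helper `column_values` are invented for the example, and nothing here reflects Cassandra's actual storage format or query language.

```python
# Each row key maps to only the columns that row actually has.
rows = {
    "user:1": {"name": "Ann", "email": "ann@example.com", "city": "KC"},
    "user:2": {"name": "Bob"},                  # no email, no city
    "user:3": {"name": "Cy", "city": "Denver"},  # no email
}


def column_values(table, column):
    """Return {row key: value} for every row where the column is present."""
    return {key: cols[column] for key, cols in table.items() if column in cols}


print(column_values(rows, "city"))
```

Rows that lack a column simply do not appear in the result, so a missing column costs no storage for that row.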
  24. 24. NoSQL catalog (Cache (memory only) vs. Database (memory/disk))
       • Key-Value: Memcached, Membase
       • Data Structure: Redis
       • Document: Couchbase, MongoDB
       • Column: Cassandra
  25. 25. Neo4j – Graph database
       Diagram: several Key → Opaque Binary Value nodes connected by relationships
       • Disk-based system
       • External caching required for low-latency reads
       • Nodes, relationships and paths
       • Properties on nodes
       • Delete, Insert, Traverse, etc.
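The graph operations named on the slide (insert, delete, traverse, properties on nodes) can be sketched with a tiny in-memory graph. The `Graph` class and its method names are hypothetical and unrelated to Neo4j's API or query language; traversal here is a plain breadth-first search that returns a shortest path.

```python
from collections import deque


class Graph:
    """Toy directed property graph: nodes carry properties, edges carry a type."""

    def __init__(self):
        self.nodes = {}  # node id -> dict of properties
        self.edges = {}  # node id -> list of (relationship type, target id)

    def insert_node(self, node_id, **props):
        self.nodes[node_id] = props
        self.edges.setdefault(node_id, [])

    def relate(self, src, rel_type, dst):
        self.edges[src].append((rel_type, dst))

    def delete_node(self, node_id):
        # Remove the node and every relationship pointing at it.
        self.nodes.pop(node_id, None)
        self.edges.pop(node_id, None)
        for src in self.edges:
            self.edges[src] = [(r, d) for r, d in self.edges[src] if d != node_id]

    def shortest_path(self, start, goal):
        """Breadth-first traversal; returns a list of node ids or None."""
        queue = deque([[start]])
        seen = {start}
        while queue:
            path = queue.popleft()
            if path[-1] == goal:
                return path
            for _, nxt in self.edges.get(path[-1], []):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(path + [nxt])
        return None


g = Graph()
for name in ("ann", "bob", "cy"):
    g.insert_node(name, kind="person")
g.relate("ann", "KNOWS", "bob")
g.relate("bob", "KNOWS", "cy")
print(g.shortest_path("ann", "cy"))  # prints: ['ann', 'bob', 'cy']
```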
  26. 26. NoSQL catalog (Cache (memory only) vs. Database (memory/disk))
       • Key-Value: Memcached, Membase
       • Data Structure: Redis
       • Document: Couchbase, MongoDB
       • Column: Cassandra
       • Graph: Neo4j
  27. 27. NoSQL catalog (Cache (memory only) vs. Database (memory/disk))
       • Key-Value: Memcached, Membase
       • Data Structure: Redis
       • Document: Couchbase, MongoDB
       • Column: Cassandra
       • Graph: Neo4j
       • Also shown: Riak, HBase, InfiniteGraph, Coherence
  28. 28. What about Hadoop?
  29. 29. Hadoop: Big Data Swiss Army Knife
       • Oozie: Workflow, coordination
       • Sqoop: Data connector to import/export data
       • Hive: SQL-like interface
       • Pig: High-level programming language
       • Mahout: Machine learning library
       • Whirr: Hadoop management tools for cloud services
       • Flume: Aggregator
       • MapReduce: Framework to process large volumes of data
       • HBase: Column data store
       • ZooKeeper: Centralized configuration management
       • HDFS: Distributed file system
  30. 30. So what? Connecting Hadoop
       Diagram labels: click stream events; profiles, campaigns; profiles, real-time campaign statistics; 40 milliseconds to respond with the decision.
  31. 31. Use Case and Application Examples
  32. 32. Market Adoption
       Internet Companies
       • Social Gaming
       • Ad Networks
       • Social Networks
       • Online Business Services
       • E-Commerce
       • Online Media
       • Content Management
       • Cloud Services
       Enterprises
       • Communications
       • Retail
       • Financial Services
       • Health Care
       • Automotive/Airline
       • Agriculture
       • Consumer Electronics
       • Business Systems
  33. 33. Market Adoption – Customers (Internet Companies, Enterprises)
       More than 300 customers and 5,000 production deployments worldwide
  34. 34. Data driven use cases
       • Support for unlimited data growth
       • Data with non-homogenous structure
       • Need to quickly and often change data structure
       • 3rd party or user defined structure
       • Variable length documents
       • Sparse data records
       • Hierarchical data
  35. 35. Performance driven use cases
       • Low latency matters
       • High throughput matters
       • Large number of users
       • Unknown demand with sudden growth of users/data
       • Predominantly direct document access
       • Workloads with very high mutation rate per document
  36. 36. Use Case Examples
       Web app or use case                       Couchbase solution                                  Example customer
       Content and Metadata Management System    Couchbase document store + Elastic Search           McGraw-Hill…
       Social Game or Mobile App                 Couchbase stores game and player data               Zynga…
       Ad Targeting                              Couchbase stores user information for fast access   AOL…
       User Profile Store                        Couchbase Server as a key-value store               TuneWiki…
       Session Store                             Couchbase Server as a key-value store               Concur…
       High Availability Caching Tier            Couchbase Server as a memcached tier replacement    Orbitz…
       Chat/Messaging Platform                   Couchbase Server                                    DOCOMO…
  37. 37. Use Case: Social Gaming (Social and Mobile Gaming)
       Types of Data
       • User account information
       • User game profile info
       • User’s social graph
       • State of the game
       • Player badges and stats
       Application Requirements
       • Ability to support rapid growth
       • Fast response times for awesome user experience
       • Game uptime – 24x7x365
       • Easy to update apps with new features
       Why NoSQL and Couchbase
       • Scalability ensures that games are ready to handle the millions of users that come with viral growth.
       • High performance guarantees players are never left waiting to make their next move.
       • Always-on operations means zero interruption to game play (and revenue).
       • Flexible data model means games can be developed rapidly and updated easily with new features.
  38. 38. Use Case: Ad Targeting
       Types of Data
       • User profile: preferences and psychographic data
       • Ad serving history by user
       • Ad buying history by advertiser
       • Ad serving history by advertiser
       Application Requirements
       • High performance to meet limited ad serving budget; time allowance is typically <40 msec
       • Scalability to handle hundreds of millions of user profiles and rapidly growing amount of data
       • 24x7x365 availability to avoid ad revenue loss
       Why NoSQL and Couchbase
       • Sub-millisecond reads/writes means less time is needed for data access, more time is available for ad logic processing, and more highly optimized ads will be served.
       • Ease of scalability ensures that the data cluster can be grown seamlessly as the amount of user and ad data grows.
       • Always-on operations = always-on revenue. You will never miss the opportunity to serve an ad because of downtime.
  39. 39. Use Case: Content and metadata store
       Building a self-adapting, interactive learning portal with Couchbase
       • Designed and built as a collaboration between MHE Labs and Couchbase
       • Serves as proof-of-concept and testing harness for Couchbase + ElasticSearch integration
       • Available for download and further development as open source code
  40. 40. The Technologies
  41. 41. Couchbase: Deployment and Demo
  42. 42. Couchbase Server Cluster: Basic Operation
       • Docs distributed evenly across servers
       • Each server stores both active and replica docs
         – Only one server is active for a given doc at a time
       • Client library provides app with simple interface to database
       • Cluster map provides map to which server a doc is on
         – App never needs to know
       • App reads, writes, updates docs
       • Multiple app servers can access same document at same time
       Diagram: a Couchbase Server cluster of three servers, each holding active and replica docs; two app servers, each with a Couchbase client library and cluster map, issuing READ/WRITE/UPDATE; user-configured replica count = 1.
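How a client library might use a cluster map to find the server for a document can be sketched as follows. This is an illustration under stated assumptions, not Couchbase's implementation: the CRC32 hash, the partition count of 1024, the round-robin placement, and the names `partition_for`, `build_cluster_map`, and `server_for` are all choices made for the example.

```python
import zlib

NUM_PARTITIONS = 1024  # assumed partition count for this sketch


def partition_for(key):
    """Hash the document key to a partition number (CRC32 is an assumption)."""
    return zlib.crc32(key.encode("utf-8")) % NUM_PARTITIONS


def build_cluster_map(servers):
    """Spread partitions evenly; each gets one active and one replica server."""
    n = len(servers)
    return [
        {"active": servers[p % n], "replica": servers[(p + 1) % n]}
        for p in range(NUM_PARTITIONS)
    ]


def server_for(cluster_map, key):
    """The app only supplies a key; the map decides which server is active for it."""
    return cluster_map[partition_for(key)]["active"]


cluster_map = build_cluster_map(["server1", "server2", "server3"])
print(server_for(cluster_map, "user::1"))
```

Because the lookup lives in the client library, the application code never names a server, which matches the "App never needs to know" point on the slide.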
  43. 43. Add Nodes to Cluster
       • Two servers added with one-click operation
       • Docs automatically rebalance across cluster
         – Even distribution of docs
         – Minimum doc movement
       • Cluster map updated
       • App database calls now distributed over larger number of servers
       Diagram: the Couchbase Server cluster grows from three to five servers, each holding active and replica docs; two app servers with Couchbase client library and cluster map issuing READ/WRITE/UPDATE; user-configured replica count = 1.
  44. 44. Fail Over Node
       • App servers accessing docs
       • Requests to Server 3 fail
       • Cluster detects server failed
         – Promotes replicas of docs to active
         – Updates cluster map
       • Requests for docs now go to appropriate server
       • Typically rebalance would follow
       Diagram: five-server Couchbase Server cluster with Server 3 failed; two app servers with Couchbase client library and cluster map; user-configured replica count = 1.
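The failover steps can be sketched against a simplified cluster map. This is a toy model in which each entry stands for one group of docs with one active and one replica server; the function `fail_over` and the map layout are hypothetical and do not describe how Couchbase performs failover internally.

```python
def fail_over(cluster_map, failed):
    """Return an updated map where replicas replace the failed server's active copies."""
    new_map = []
    for entry in cluster_map:
        active, replica = entry["active"], entry["replica"]
        if active == failed:
            # Promote the replica to active; no replica remains until a rebalance.
            active, replica = replica, None
        elif replica == failed:
            # The replica copy is lost; the active copy keeps serving requests.
            replica = None
        new_map.append({"active": active, "replica": replica})
    return new_map


cluster_map = [
    {"active": "server1", "replica": "server2"},
    {"active": "server2", "replica": "server3"},
    {"active": "server3", "replica": "server1"},
]

after = fail_over(cluster_map, "server3")
print(after[2])  # prints: {'active': 'server1', 'replica': None}
```

Entries left with no replica after the promotion are the reason a rebalance typically follows a failover.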
  45. 45. Couchbase Server Admin Console
  46. 46. Demo
  47. 47. Q & A
  48. 48. Thanks! Matt Ingenthron, @ingenthr, matt@couchbase.com
