KCDC_Conference_Nosql_for_sql_professionals
Published in: Technology
  • NEED A BETTER TRANSITION SLIDE
  • Most of you are probably familiar with the table layout. A table is defined with a set of columns, and each record in the table conforms to the schema. If you wish to capture different data in the future, the table schema must be changed using the ALTER TABLE statement. Typically data is normalized to third normal form to reduce duplication. Large tables are split into smaller tables linked by foreign keys.
  • Example: a normalized schema with two tables, connected by a foreign key. To get information about a specific user, you perform a join across the two tables.
  • Relational databases are all about relational algebra. The reality, though, is that over the years, as systems have become more capable and development has moved toward mixing logic with data (OO development), we work more with an object model, which is closer to aggregates. So you end up with things like this SQL statement. Note that we really work around that with ORMs. Hibernate/NHibernate, anyone?
    Relational algebra: an offshoot of first-order logic and of the algebra of sets, concerned with operations over finitary relations. It is usually made more convenient to work with by identifying the components of a tuple by a name (called an attribute) rather than by a numeric column index; the result is what database terminology calls a relation.
  • Getting info from different tables via joins. We create an aggregate.
  • Also touch on how we wouldn’t have to perform expensive ALTER TABLE statements when we update our data model. Easier to distribute, as there is no need for joins.
  • In a typical architecture, we have stateless application servers sitting behind a load balancer. As usage grows, you add app servers, update the load balancer, and scale out the application linearly in both cost and performance. But the data tier has a shared-everything architecture; at a minimum, these are shared-cache or shared-disk systems. So to scale up you need expensive hardware, and even then performance hits a limit. Both cost and performance are non-linear with this approach.
  • Contrast that relational architecture with NoSQL systems: with a document model and auto-sharding, the database now scales horizontally along with your app server tier, giving you the linear cost and performance you want.
  • These are the market segments
  • Partial listing of companies with paid production deployments. Thousands more using open source.
  • Transcript

    • 1. NoSQL for Interactive Applications: NoSQL for SQL Professionals
        Matt Ingenthron, Director, Developer Solutions
    • 2. Relational compared to NoSQL Databases
    • 3. Relational vs Document data model
        Relational data model: highly-structured table organization with rigidly-defined data formats and record structure.
        Document data model: collection of complex documents with arbitrary, nested data formats and varying “record” format.
        [Figure: a table with columns C1 to C4 next to a set of JSON documents]
    • 4. SQL Normalized Tables
        Users
          KEY  First  Last        ZIP_ID
          1    Matt   Ingenthron  2
          2    Joe    Smith       2
          3    Ali    Dodson      2
          4    John   Doe         3
        Addresses
          ZIP_ID  CITY  STATE  ZIP
          1       DEN   CO     30303
          2       MV    CA     94040
          3       CHI   IL     60609
          4       NY    NY     10010
        ZIP_ID is the foreign key. To get information about a specific user, you perform a join across two tables:
          SELECT * FROM Users u INNER JOIN Addresses a ON u.zip_id = a.zip_id WHERE key=1
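The join on this slide can be tried out with Python's built-in sqlite3 module. This is only a sketch under stated assumptions: the in-memory database and the helper `user_with_address` are illustrative, and the slide's KEY column is renamed `user_key` here because KEY is a keyword in many SQL dialects.

```python
import sqlite3

# In-memory database. Table names and rows follow the slide; user_key is an
# assumed name standing in for the slide's KEY column.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Addresses (
        zip_id INTEGER PRIMARY KEY, city TEXT, state TEXT, zip TEXT
    );
    CREATE TABLE Users (
        user_key INTEGER PRIMARY KEY,
        first TEXT,
        last TEXT,
        zip_id INTEGER REFERENCES Addresses(zip_id)  -- the foreign key
    );
    INSERT INTO Addresses VALUES
        (1, 'DEN', 'CO', '30303'), (2, 'MV', 'CA', '94040'),
        (3, 'CHI', 'IL', '60609'), (4, 'NY', 'NY', '10010');
    INSERT INTO Users VALUES
        (1, 'Matt', 'Ingenthron', 2), (2, 'Joe', 'Smith', 2),
        (3, 'Ali', 'Dodson', 2), (4, 'John', 'Doe', 3);
""")


def user_with_address(conn, user_key):
    """Hypothetical helper: join the two tables to assemble one user's data."""
    return conn.execute(
        "SELECT u.first, u.last, a.city, a.state, a.zip "
        "FROM Users u INNER JOIN Addresses a ON u.zip_id = a.zip_id "
        "WHERE u.user_key = ?",
        (user_key,),
    ).fetchone()


matt = user_with_address(conn, 1)
```

Every read of a complete user record pays for the join, which is the cost the following slides contrast with storing the aggregate directly.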
    • 5. The Classic Order Entry Structure
        “Relational databases were not designed with clusters in mind, which is why people have cast around for an alternative. Storing aggregates as fundamental units makes a lot of sense for running on a cluster.”
        Source: http://martinfowler.com/bliki/AggregateOrientedDatabase.html
    • 6. What is an aggregate?
        When you need to retrieve data from an RDBMS, you are "aggregating" or "denormalizing" the data for your application through queries with joins, where clauses and order by clauses.
        In Document Databases, instead of breaking data into tables and foreign keys, you store the aggregate data together in JSON document(s).
    • 7. Aggregate by Comparison
        o::1001
        {
          uid: “ji22jd”,
          customer: “Ann”,
          line_items: [
            { sku: 0321293533, quan: 3, unit_price: 48.0 },
            { sku: 0321601912, quan: 1, unit_price: 39.0 },
            { sku: 0131495054, quan: 1, unit_price: 51.0 }
          ],
          payment: { type: “Amex”, expiry: “04/2001”, last5: 12345 }
        }
        • Easy to distribute data
        • Makes sense to application programmers
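To illustrate why an aggregate "makes sense to application programmers", here is a minimal sketch that treats the order on this slide as a plain Python dict. The function `order_total` is hypothetical, and the SKUs are written as strings (an assumption, to preserve their leading zeros).

```python
# The order aggregate from the slide as a Python dict.
order = {
    "uid": "ji22jd",
    "customer": "Ann",
    "line_items": [
        {"sku": "0321293533", "quan": 3, "unit_price": 48.0},
        {"sku": "0321601912", "quan": 1, "unit_price": 39.0},
        {"sku": "0131495054", "quan": 1, "unit_price": 51.0},
    ],
    "payment": {"type": "Amex", "expiry": "04/2001", "last5": 12345},
}


def order_total(order):
    """Sum quantity * unit price over the line items nested in the order."""
    return sum(item["quan"] * item["unit_price"] for item in order["line_items"])


total = order_total(order)
```

Everything needed to price the order travels with the order itself, so no join against a separate line-items table is required.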
    • 8. Documents are Aggregates
        Document data is an aggregate: all data in a single document.
        JSON = couchbase.get(“user::1”)
        {
          “ID”: 1,
          “First”: “Matt”,
          “Last”: “Ingenthron”,
          “ZIP”: “92648”,
          “CITY”: “HB”,
          “STATE”: “CA”
        }
        [Figure: the Users and Addresses tables shown alongside the single document for user 1]
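The single-key lookup on this slide can be sketched without any database at all. In the sketch below a plain dict stands in for the document store, and `put_user` and `get_user` are hypothetical helpers, not the Couchbase client API; only the `user::1` key style and the document fields come from the slide.

```python
import json

# A dict stands in for the document store (an assumption for illustration);
# a real client library would fetch the value from the cluster.
store = {}


def put_user(store, user):
    """Store the whole user aggregate as one JSON document under one key."""
    key = f"user::{user['ID']}"
    store[key] = json.dumps(user)
    return key


def get_user(store, key):
    """Fetch and decode the document with a single key lookup, no join."""
    return json.loads(store[key])


key = put_user(store, {
    "ID": 1, "First": "Matt", "Last": "Ingenthron",
    "ZIP": "92648", "CITY": "HB", "STATE": "CA",
})
user = get_user(store, "user::1")
```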
    • 9. NoSQL Taxonomy
    • 10. The CAP Theorem
        • In a distributed system:
          - Consistency
          - Availability
          - Partition Tolerance
        • When a partition happens:
          - Choose either Consistency (only respond to subset)
          - or Availability (accept stale data and conflicting writes)
        Conflict Resolution!
    • 11. Clarification
        • Big Data
          - Large scale datastore (“>= 100TB or Petabytes”)
          - Optimized for Batch Processing
          - Data Warehouse
        • Big Users
          - very high get/set rate (thousands of ops/s)
          - working set in RAM
          - latency and throughput matter most
          - (near) Real-Time use cases
    • 12. The Key-Value Store / “Cache” – the foundation of NoSQL
        [Figure: a key pointing to an opaque binary value]
    • 13. Memcached – the NoSQL precursor
        “Simple and fast.”
        - In-memory only
        - Limited set of operations
          - Blob Storage: Set, Add, Replace, CAS
          - Retrieval: Get
          - Structured Data: Append, Increment
        Challenges:
        - cold cache
        - disruptive elasticity
        - missing persistence
        [Figure: a key pointing to an opaque binary value]
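The operations named on this slide can be sketched as a toy in-memory store. `ToyCache` is a hypothetical class written for illustration, not the memcached protocol or any client library; it only mimics the usual semantics (add succeeds only when the key is absent, replace only when it is present, and CAS only when the caller's token is still current).

```python
class ToyCache:
    """Toy in-memory key-value store with memcached-style operations."""

    def __init__(self):
        self._data = {}       # key -> (value, cas_token)
        self._next_cas = 1    # token changes on every successful write

    def _store(self, key, value):
        self._data[key] = (value, self._next_cas)
        self._next_cas += 1
        return True

    def set(self, key, value):
        return self._store(key, value)          # unconditional write

    def add(self, key, value):
        return key not in self._data and self._store(key, value)

    def replace(self, key, value):
        return key in self._data and self._store(key, value)

    def gets(self, key):
        return self._data.get(key, (None, None))  # (value, cas_token)

    def get(self, key):
        return self.gets(key)[0]

    def cas(self, key, value, token):
        # Write only if nobody else has written since `token` was read.
        if key not in self._data or self._data[key][1] != token:
            return False
        return self._store(key, value)

    def append(self, key, suffix):
        return key in self._data and self._store(key, self._data[key][0] + suffix)

    def incr(self, key, delta=1):
        if key not in self._data:
            return None
        new_value = int(self._data[key][0]) + delta
        self._store(key, str(new_value))
        return new_value


cache = ToyCache()
added = cache.add("counter", "41")         # key absent, so add succeeds
added_again = cache.add("counter", "0")    # key present, so add is refused
after_incr = cache.incr("counter")
value, token = cache.gets("counter")
cache.set("counter", "100")                # a competing writer gets in first
cas_ok = cache.cas("counter", "43", token) # stale token, so CAS is refused
```

The values are opaque to the store except for append and increment, which is the sense in which the slide calls those two "structured data" operations.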
    • 14. NoSQL catalog
        Storage: Cache (memory only), Database (memory/disk)
        Key-Value: Memcached
    • 15. Redis – More “Structured Data” commands
        - Disk persistence (eventual consistency on the disk)
        - Vast set of operations
          - Blob Storage: Set, Add, Replace, CAS
          - Retrieval: Get, Pub-Sub
          - Structured Data: Strings, Hashes, Lists, Sets, Sorted lists
        Challenges:
        - clustering (to come)
        - RAM limit (no eviction)
        [Figure: a key pointing to “Data Structures”: Blob, List, Set, Hash, …]
    • 16. NoSQL catalog
        Storage: Cache (memory only), Database (memory/disk)
        Key-Value: Memcached
        Data Structure: Redis
    • 17. Membase – From key-value cache to database
        “Simple, fast, elastic.”
        - Disk-based with built-in memcached cache
        - Cache refill on restart
        - Memcached compatible (drop-in replacement)
        - Highly-available (data replication)
        - Add or remove capacity to live cluster
        [Figure: a key pointing to an opaque binary value]
    • 18. NoSQL catalog
        Storage: Cache (memory only), Database (memory/disk)
        Key-Value: Memcached, Membase
        Data Structure: Redis
    • 19. Couchbase – Document-oriented database
        - Auto-sharding
        - Disk-based with built-in memcached cache
        - Memcached compatible
        - Highly-available (data replication)
        - Add or remove capacity to live cluster
        - When values are JSON objects (“documents”): create indices, views and query against the views
        - Chooses Consistency over Availability
        [Figure: a key pointing to a JSON or opaque object (“document”), such as {“string” : “string”, “string” : value, “string” : { “string” : “string”, “string” : value }, “string” : [ array ]}]
    • 20. NoSQL catalog
        Storage: Cache (memory only), Database (memory/disk)
        Key-Value: Memcached, Membase
        Data Structure: Redis
        Document: Couchbase
    • 21. MongoDB – Document-oriented database
        - Disk-based with in-memory “caching”
        - BSON (“binary JSON”) format and wire protocol
        - Master-slave replication
        - Proxied-sharding tier
        - Values are BSON objects
        - Supports ad hoc queries – best when indexed
        - More similar to RDBMS modeling than caches
        - Scaling over sharding requires special nodes
        [Figure: a key pointing to a BSON object (“document”), such as {“string” : “string”, “string” : value, “string” : { “string” : “string”, “string” : value }, “string” : [ array ]}]
    • 22. NoSQL catalog
        Storage: Cache (memory only), Database (memory/disk)
        Key-Value: Memcached, Membase
        Data Structure: Redis
        Document: Couchbase, MongoDB
    • 23. Cassandra – Column overlays
        - Disk-based system
        - Clustered
        - External caching required for low-latency reads
        - “Columns” are overlaid on the data
        - Not all rows must have all columns
        - Supports efficient queries on columns
        - Restart required when adding columns
        - Multi-Data-Center replication supported
        - Column-Model may be complex to start with
        - Chooses Availability over Consistency
        [Figure: a key pointing to an opaque binary value overlaid with Column 1, Column 2, and Column 3 (not present)]
    • 24. NoSQL catalog
        Storage: Cache (memory only), Database (memory/disk)
        Key-Value: Memcached, Membase
        Data Structure: Redis
        Document: Couchbase, MongoDB
        Column: Cassandra
    • 25. Neo4j – Graph database
        - Disk-based system
        - External caching required for low-latency reads
        - Nodes, relationships and paths
        - Properties on nodes
        - Delete, Insert, Traverse, etc.
        [Figure: several keys, each pointing to an opaque binary value]
    • 26. NoSQL catalog
        Storage: Cache (memory only), Database (memory/disk)
        Key-Value: Memcached, Membase
        Data Structure: Redis
        Document: Couchbase, MongoDB
        Column: Cassandra
        Graph: Neo4j
    • 27. NoSQL catalog
        Storage: Cache (memory only), Database (memory/disk)
        Key-Value: Memcached, Membase, Riak, Coherence
        Data Structure: Redis
        Document: Couchbase, MongoDB
        Column: Cassandra, HBase
        Graph: Neo4j, InfiniteGraph
    • 28. What about Hadoop?
    • 29. Hadoop: Big Data Swiss Army Knife
        • Oozie: Workflow, coordination
        • Sqoop: Data connector to import/export data
        • Hive: SQL-like interface
        • Pig: High level programming language
        • Mahout: Machine learning library
        • Whirr: Hadoop management tools for cloud services
        • Flume: Aggregator
        • Map Reduce: Framework to process large volumes of data
        • HBase: Column data store
        • Zookeeper: Centralized configuration management
        • HDFS: Distributed file system
    • 30. So what? Connecting Hadoop
        40 milliseconds to respond with the decision.
        [Figure: data flow with three numbered steps, labeled: click stream events; profiles, campaigns; profiles, real time campaign statistics]
    • 31. Use Case and Application Examples
    • 32. Market Adoption
        Internet Companies | Enterprises
        • Social Gaming
        • Ad Networks
        • Social Networks
        • Online Business Services
        • E-Commerce
        • Online Media
        • Content Management
        • Cloud Services
        • Communications
        • Retail
        • Financial Services
        • Health Care
        • Automotive/Airline
        • Agriculture
        • Consumer Electronics
        • Business Systems
    • 33. Market Adoption – Customers
        Internet Companies | Enterprises
        More than 300 customers -- 5,000 production deployments worldwide
    • 34. Data driven use cases
        • Support for unlimited data growth
        • Data with non-homogenous structure
        • Need to quickly and often change data structure
        • 3rd party or user defined structure
        • Variable length documents
        • Sparse data records
        • Hierarchical data
    • 35. Performance driven use cases
        • Low latency matters
        • High throughput matters
        • Large number of users
        • Unknown demand with sudden growth of users/data
        • Predominantly direct document access
        • Workloads with very high mutation rate per document
    • 36. Use Case Examples
        Web app or Use-case | Couchbase Solution | Example Customer
        Content and Metadata Management System | Couchbase document store + Elastic Search | McGraw-Hill…
        Social Game or Mobile App | Couchbase stores game and player data | Zynga…
        Ad Targeting | Couchbase stores user information for fast access | AOL…
        User Profile Store | Couchbase Server as a key-value store | TuneWiki…
        Session Store | Couchbase Server as a key-value store | Concur…
        High Availability Caching Tier | Couchbase Server as a memcached tier replacement | Orbitz…
        Chat/Messaging Platform | Couchbase Server | DOCOMO…
    • 37. Use Case: Social Gaming (Social and Mobile Gaming)
        Types of Data
        • User account information
        • User game profile info
        • User’s social graph
        • State of the game
        • Player badges and stats
        Application Requirements
        • Ability to support rapid growth
        • Fast response times for awesome user experience
        • Game uptime – 24x7x365
        • Easy to update apps with new features
        Why NoSQL and Couchbase
        • Scalability ensures that games are ready to handle the millions of users that come with viral growth.
        • High performance guarantees players are never left waiting to make their next move.
        • Always-on operations means zero interruption to game play (and revenue)
        • Flexible data model means games can be developed rapidly and updated easily with new features
    • 38. Use Case: Ad Targeting
        Types of Data
        • User profile: preferences and psychographic data
        • Ad serving history by user
        • Ad buying history by advertiser
        • Ad serving history by advertiser
        Application Requirements
        • High performance to meet limited ad serving budget; time allowance is typically <40 msec
        • Scalability to handle hundreds of millions of user profiles and rapidly growing amount of data
        • 24x7x365 availability to avoid ad revenue loss
        Why NoSQL and Couchbase
        • Sub-millisecond reads/writes means less time is needed for data access, more time is available for ad logic processing, and more highly optimized ads will be served
        • Ease of scalability ensures that the data cluster can be grown seamlessly as the amount of user and ad data grows
        • Always-on operations = always-on revenue. You will never miss the opportunity to serve an ad because of downtime.
    • 39. Use Case: Content and metadata store
        Building a self-adapting, interactive learning portal with Couchbase
        • Designed and built as a collaboration between MHE Labs and Couchbase
        • Serves as proof-of-concept and testing harness for Couchbase + ElasticSearch integration
        • Available for download and further development as open source code
    • 40. The Technologies
    • 41. Couchbase: Deployment and Demo
    • 42. Basic Operation
        • Docs distributed evenly across servers
        • Each server stores both active and replica docs
          – Only one server active at a time
        • Client library provides app with simple interface to database
        • Cluster map provides map to which server doc is on
          – App never needs to know
        • App reads, writes, updates docs
        • Multiple app servers can access same document at same time
        [Figure: Couchbase Server cluster with user-configured replica count = 1. Servers 1 to 3 each hold active and replica docs; App Servers 1 and 2 each use the Couchbase client library and a cluster map to read, write, and update docs.]
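The idea of a cluster map, where the client library rather than the app decides which server holds a document, can be sketched as follows. This is a simplified illustration, not Couchbase's actual algorithm: the partition count of 64, the round-robin placement, and the function names are all assumptions made for the example.

```python
import zlib

NUM_PARTITIONS = 64  # illustrative value, not taken from the slides


def build_cluster_map(servers, num_partitions=NUM_PARTITIONS):
    """Give each partition an active server and a replica on a different one.

    Assumes at least two servers, matching a replica count of 1.
    """
    cluster_map = []
    for p in range(num_partitions):
        active = servers[p % len(servers)]
        replica = servers[(p + 1) % len(servers)]
        cluster_map.append({"active": active, "replica": replica})
    return cluster_map


def partition_for(key, num_partitions):
    """Hash the document key to a partition number."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def server_for(cluster_map, key):
    """What a client library would do: look up the active server for a key."""
    return cluster_map[partition_for(key, len(cluster_map))]["active"]


servers = ["server1", "server2", "server3"]
cluster_map = build_cluster_map(servers)
target = server_for(cluster_map, "user::1")
```

Because every client computes the same hash and consults the same map, multiple app servers reach the same active copy of a document without coordinating with each other.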
    • 43. Add Nodes to Cluster
        • Two servers added with one-click operation
        • Docs automatically rebalance across cluster
          – Even distribution of docs
          – Minimum doc movement
        • Cluster map updated
        • App database calls now distributed over larger number of servers
        [Figure: Couchbase Server cluster with user-configured replica count = 1. Servers 1 to 5 each hold active and replica docs after the rebalance; App Servers 1 and 2 each use the Couchbase client library and a cluster map to read, write, and update docs.]
    • 44. Fail Over Node
        • App servers accessing docs
        • Requests to Server 3 fail
        • Cluster detects server failed
          – Promotes replicas of docs to active
          – Updates cluster map
        • Requests for docs now go to appropriate server
        • Typically rebalance would follow
        [Figure: Couchbase Server cluster with user-configured replica count = 1. Servers 1 to 5 hold active and replica docs; replicas of the docs from the failed Server 3 are promoted to active on the remaining servers. App Servers 1 and 2 each use the Couchbase client library and a cluster map.]
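The failover steps on this slide (promote replicas to active, then update the cluster map) can be sketched on a toy cluster map. The map layout and the function `fail_over` are hypothetical and only illustrate the idea; they are not how Couchbase Server implements failover.

```python
def fail_over(cluster_map, failed_server):
    """Return an updated map with replicas promoted for the failed server.

    Each entry describes one partition as {"active": ..., "replica": ...}.
    A partition that loses a copy has replica None until a rebalance runs.
    """
    new_map = []
    for entry in cluster_map:
        active, replica = entry["active"], entry["replica"]
        if active == failed_server:
            # Promote the surviving replica to active.
            active, replica = replica, None
        elif replica == failed_server:
            # The active copy survives, but its replica is gone.
            replica = None
        new_map.append({"active": active, "replica": replica})
    return new_map


# Toy map with three partitions and a replica count of 1.
before = [
    {"active": "server1", "replica": "server2"},
    {"active": "server2", "replica": "server3"},
    {"active": "server3", "replica": "server1"},
]
after = fail_over(before, "server3")
```

After the map is updated, requests for the third partition go to server1. The missing replicas are why the slide notes that a rebalance would typically follow.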
    • 45. Couchbase Server Admin Console
    • 46. Demo
    • 47. Q & A
    • 48. Thanks!
        Matt Ingenthron, @ingenthr, matt@couchbase.com
