Couchbase_John_Bryce_Israel_Training_use_cases

© All Rights Reserved

Report content

Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
  • Full Name Full Name Comment goes here.
    Are you sure you want to
    Your message goes here
    Processing…
Post Comment
Edit your comment
  • For those who don’t know what Draw Something is – it is a “social” game like Pictionary. Two players play. A player is presented with a list of three words, from which they pick one to draw. The other player then sees the drawing and has to guess the word. And it goes back and forth like that.
  • As user growth exploded, the data associated with the game expanded exponentially. By the time the company was acquired, there were over 5000 THOUSAND drawings EVERY SECOND being created and stored by Draw Something. Unprecedented growth – growth most systems would crumble under.
  • Unfortunately, not everyone prepares. On March 1, as Vinny and Pauly D of the Jersey Shore were tweeting about Draw Something, EA launched a game called The Simpsons: Tapped Out. Almost immediately the game charged to #2 on the iPad and #3 on the iPhone top free app lists. Growth started to follow the same trajectory as Draw Something! But the outcome couldn’t have been more different. While Draw Something continued to grow, EA was unable to keep up with the success of the game. Games were reportedly being “lost,” there was huge lag, and users were beginning to complain, loudly. Rather than praise on Twitter, there was a flood of negative reaction. Just 4 days later, EA was forced to pull the game from the App Store. As of the end of March 2012, it had still not returned. What a contrast.
  • This chart shows Couchbase Server throughput in average operations per second across the number of nodes in a cluster. With 4 nodes, throughput is nearly 1.15 million ops/sec, which means that 1.4 GB/sec is being transferred between the database server and the client. Operations are a mix of reads and writes in a 70:30 ratio. High write throughput is seen even with fairly large 1 KB documents. The chart also shows linear throughput as more servers are added to the cluster: 0.62 million ops/sec with 2 nodes and 1.15 million ops/sec with 4 nodes. This demonstrates the benefit of a shared-nothing architecture: all nodes are identical and independent, so throughput scales linearly. Keys are auto-sharded across the cluster and evenly distributed whether the cluster has 1 node or 8 nodes.
  • Defines the characteristics behind each claim (right now these terms are often used loosely). It is all about data access.
  • The chart shows average latency (response time) across varying document sizes (1 KB to 16 KB). It demonstrates that Couchbase Server is ridiculously fast, responding in microseconds: latency is < 100 μsec on a 10 Gigabit Ethernet network for documents of all sizes. Network latency has an impact on a 1 Gigabit Ethernet network; however, latency is flat and consistent on a 10 Gigabit Ethernet network. Couchbase Server gives you consistent, predictable latency at any document size, bounded by the network.
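Why a 1 GbE link shows up in the latency chart while 10 GbE stays flat can be reasoned about with serialization delay: the time to clock a document's bytes onto the wire is size × 8 / link rate. This is a minimal sketch that ignores propagation, protocol overhead and server time, so it gives only a lower bound on network cost, not a model of the benchmark itself.

```python
def serialization_delay_us(doc_bytes: int, link_bits_per_sec: float) -> float:
    """Time, in microseconds, to put doc_bytes onto a link of the given bit rate."""
    return doc_bytes * 8 / link_bits_per_sec * 1e6

ONE_GBE = 1e9    # 1 Gigabit Ethernet, bits per second
TEN_GBE = 10e9   # 10 Gigabit Ethernet, bits per second

# Document sizes spanning the range in the chart (1 KB to 16 KB).
for size_kb in (1, 4, 16):
    size = size_kb * 1024
    print(f"{size_kb:>2} KB: 1GbE {serialization_delay_us(size, ONE_GBE):7.1f} us, "
          f"10GbE {serialization_delay_us(size, TEN_GBE):6.1f} us")
```

At 16 KB the 1 GbE wire time alone is about 131 μsec, already above the < 100 μsec figure quoted for 10 GbE, while the same document needs only about 13 μsec on a 10 GbE link.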
  • This chart shows Couchbase Server throughput in average operations per second across the number of nodes in a cluster. With 4 nodes, throughput is nearly 1.15 million ops/sec, which means that 1.4 GB/sec is being transferred between the database server and the client. Operations are a mix of reads and writes in a 70:30 ratio. High write throughput is seen even with fairly large 1 KB documents; throughput is network limited for larger document sizes. The chart also shows linear throughput as more servers are added to the cluster: 0.62 million ops/sec with 2 nodes and 1.15 million ops/sec with 4 nodes. This demonstrates a shared-nothing architecture: all nodes are identical and independent. Keys are auto-sharded across the cluster and evenly distributed whether the cluster has 1 node or 8 nodes.
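How close the 2-node and 4-node figures above are to perfectly linear scaling can be checked with a little arithmetic. The sketch uses only the numbers quoted in the notes; "scaling efficiency" here simply means measured throughput divided by what doubling the 2-node result would predict.

```python
def scaling_efficiency(base_nodes: int, base_ops: float, nodes: int, ops: float) -> float:
    """Measured throughput divided by a perfectly linear extrapolation from the base point."""
    ideal = base_ops * nodes / base_nodes
    return ops / ideal

# Figures quoted in the speaker notes, in million ops/sec.
two_nodes = 0.62
four_nodes = 1.15

eff = scaling_efficiency(2, two_nodes, 4, four_nodes)
per_node_2 = two_nodes / 2
per_node_4 = four_nodes / 4

print(f"efficiency: {eff:.1%}")
print(f"per node: {per_node_2:.3f} -> {per_node_4:.4f} million ops/sec")
```

Doubling the node count multiplies throughput by about 1.85x, roughly 93% of perfectly linear, with per-node throughput moving from 0.31 to about 0.29 million ops/sec.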
  • Most of you are probably familiar with the table layout. A table is defined with a set of columns, and each record in the table conforms to the schema. If you wish to capture different data in the future, the table schema must be changed using the ALTER TABLE statement. Typically data is normalized to third normal form to reduce duplication: large tables are split into smaller tables linked by foreign keys.
  • More points about MongoDB. Scalability: hotspots (range partitioning vs. hash partitioning); servers can only be added in fixed increments (a new shard needs the same number of replicas as the other shards), whereas Couchbase adds 1 to n nodes in one step; data is slow to redistribute to new nodes, so it is hard to absorb sudden growth (e.g. going viral!). Performance: latencies depend on the read:write ratio because of per-database locking; latencies rise quickly as throughput ramps up; per-node throughput on mixed workloads is 3 to 4 times lower than Couchbase's. Always-on: upgrades are offline; some maintenance, like compaction, is offline (however, it can be done node by node, taking only individual nodes offline).
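The hotspot point comes down to how keys map to servers. This is a minimal sketch, assuming 4 servers and monotonically increasing keys (for example time-ordered user IDs after a game goes viral): range partitioning sends every new write to the last range, while hashing spreads them out. The hash scheme (CRC32 modulo 1024 partitions, partitions assigned round-robin to servers) is a simplified illustration in the spirit of Couchbase's vBuckets, not its exact client algorithm.

```python
import zlib
from collections import Counter

NUM_SERVERS = 4
NUM_PARTITIONS = 1024  # assumed partition count, similar to Couchbase's vBucket count

def hash_server(key: str) -> int:
    """Hash partitioning: CRC32 picks a partition, partitions map round-robin to servers."""
    partition = zlib.crc32(key.encode()) % NUM_PARTITIONS
    return partition % NUM_SERVERS

def range_server(key_id: int, boundaries: list[int]) -> int:
    """Range partitioning: server i owns ids below boundaries[i]; the last server owns the rest."""
    for server, upper in enumerate(boundaries):
        if key_id < upper:
            return server
    return len(boundaries)

# Existing data covered ids 0..39_999, so the ranges were split evenly over that span.
boundaries = [10_000, 20_000, 30_000]

# New writes arrive with increasing ids (e.g. new players signing up).
new_ids = range(40_000, 50_000)
range_load = Counter(range_server(i, boundaries) for i in new_ids)
hash_load = Counter(hash_server(f"user::{i}") for i in new_ids)

print("range:", dict(range_load))
print("hash: ", dict(hash_load))
```

With range partitioning all 10,000 new writes land on server 3, while the hashed keys spread across all four servers.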

Presentation Transcript

  • Why Companies Use Couchbase (Perry Krug, Sr. Solutions Architect)
  • Common Use Cases
    • Social Gaming: Couchbase stores player and game data. Example customers include Zynga, Tapjoy, Ubisoft, Tencent.
    • Mobile Apps: Couchbase stores user info and app content. Example customers include Kobo, Playtika.
    • Ad Targeting: Couchbase stores user information for fast access. Example customers include AOL, Mediamind, Convertro.
    • Session Store: Couchbase Server as a key-value store. Example customers include Concur, Sabre.
    • User Profile Store: Couchbase Server as a key-value store. Example customers include Tunewiki.
    • High Availability Cache: Couchbase Server used as a cache tier replacement. Example customers include Orbitz.
    • Content & Metadata Store: Couchbase document store with Elasticsearch. Example customers include McGraw Hill.
    • 3rd Party Data Aggregation: Couchbase stores social media and data feeds. Example customers include Sambacloud.
  • Use Cases & Customers (web app or use case: Couchbase solution)
    • Content Store & Metadata System: Couchbase document store + Elasticsearch
    • Social Game & Mobile App: Couchbase stores game and player data
    • Ad Targeting: Couchbase stores user information for fast access
    • User Profile Store: Couchbase Server as a key-value store
    • Session Store: Couchbase Server as a key-value store
    • High Availability Caching Tier: Couchbase Server as a memcached tier replacement
    • Chat/Messaging Platform: Couchbase Server
  • Use Case: Content and Metadata Store
    • Types of data: content metadata; content (articles, text); landing pages for website; digital content (eBooks, magazines, research material).
    • Application requirements: flexibility to store any kind of content; fast access to content metadata (most accessed objects) and content; full-text search across the data set; scales horizontally as more content gets added to the system.
    • Why NoSQL and Couchbase: fast access to metadata and content via the object-managed cache; JSON provides schema flexibility to store all types of content and metadata; indexing and querying provide real-time analytics capabilities across the data set; integration with Elasticsearch for full-text search; ease of scalability ensures that the data cluster can be grown seamlessly as the amount of content grows.
  • McGraw Hill Education Labs: Learning Portal
  • Use Case: Content and Metadata Store. Building a self-adapting, interactive learning portal with Couchbase
  • The Problem: As learning moves online in great numbers, there is a growing need to build interactive learning environments that:
    • Scale to millions of learners
    • Serve MHE as well as third-party content, including open content
    • Support learning apps
    • Self-adapt via usage data
  • The Challenge: The backend is an Interactive Content Delivery Cloud that must:
    • Allow for elastic scaling under spike periods
    • Catalog and deliver content from many sources
    • Provide consistent low-latency access to metadata and stats
    • Support full-text search for content discovery
    • Offer tunable content ranking and recommendation functions
    Experimented with a combination of XML databases, SQL/MR engines, in-memory data grids and enterprise search servers.
  • The Learning Portal
    • Designed and built as a collaboration between MHE Labs and Couchbase
    • Serves as proof-of-concept and testing harness for Couchbase + Elasticsearch integration
    • Available for download and further development as open source code
  • Techniques Used
    • Document modeling
    • Metadata and content storage
    • View querying to support content browsing
    • Elasticsearch integration (full-text search): content updated in near real-time; search content summaries; relevancy boosted based on user preferences
    • Real-time content updates
    • Event logging for offline analysis
  • Couchbase 2.0 + Elasticsearch
    • Store full-text articles as well as document metadata for image, video and text content in Couchbase
    • Combine user preference statistics with custom relevancy scoring to provide personalized search results
    • Log user behavior to calculate user preference statistics (e.g. video > text)
    • Continuously accept updates from Couchbase with new content and stats
  • Data Model
    • Content Metadata bucket: stores content metadata for media objects and content for articles; includes tags, contributors and type information; includes a pointer to the media.
    • User Profiles bucket: stores user view details per type; updated every time a user views a doc, with a running count; to be used for customizing ES search results per user preference.
    • Content Stats bucket: stores content view details; updated every time a document is viewed; to be used for boosting ES search results based on popularity.
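The three-bucket model can be sketched with Python dicts standing in for Couchbase buckets. Document keys, field names and the boost formula below are hypothetical; the slides only say that per-user and per-content view counts are kept and later used to boost Elasticsearch results, so `text_score` here merely stands in for whatever relevance score the search engine returns.

```python
import math
from collections import defaultdict

# Hypothetical stand-ins for the three buckets described on the slide.
content_metadata = {
    "doc::1": {"type": "video", "title": "Intro to Algebra", "tags": ["math"], "media": "media/1.mp4"},
    "doc::2": {"type": "text", "title": "Algebra Notes", "tags": ["math"], "media": None},
}
user_profiles = defaultdict(lambda: defaultdict(int))  # user -> content type -> running view count
content_stats = defaultdict(int)                        # doc id -> total views

def record_view(user_id: str, doc_id: str) -> None:
    """Update both running counts every time a user views a document."""
    doc_type = content_metadata[doc_id]["type"]
    user_profiles[user_id][doc_type] += 1
    content_stats[doc_id] += 1

def boosted_score(user_id: str, doc_id: str, text_score: float) -> float:
    """Combine a text relevance score with popularity and type preference (illustrative formula)."""
    prefs = user_profiles[user_id]
    total = sum(prefs.values()) or 1
    type_pref = prefs[content_metadata[doc_id]["type"]] / total  # share of this user's views of this type
    popularity = math.log1p(content_stats[doc_id])                # dampen very popular documents
    return text_score * (1 + type_pref) * (1 + popularity)

# alice watches mostly video, so with equal text scores the video ranks first for her.
for _ in range(3):
    record_view("alice", "doc::1")
record_view("alice", "doc::2")

ranked = sorted(content_metadata, key=lambda d: boosted_score("alice", d, 1.0), reverse=True)
print(ranked)
```

Keeping the counters in separate buckets from the content mirrors the slide's split: metadata changes rarely, while stats are rewritten on every view.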
  • Architecture
  • Use Case: Social and Mobile Gaming
    • Types of data: user account information; user game profile info; user’s social graph; state of the game; player badges and stats.
    • Application requirements: ability to support rapid growth; fast response times for an awesome user experience; game uptime 24x7x365; easy to update apps with new features.
    • Why NoSQL and Couchbase: scalability ensures that games are ready to handle the millions of users that come with viral growth; high performance guarantees players are never left waiting to make their next move; always-on operations mean zero interruption to game play (and revenue); a flexible data model means games can be developed rapidly and updated easily with new features.
  • Social gaming at Tencent Stomp Games
  • Use Case: Social gaming. Building a social game with an awesome user experience that can scale to millions of players
  • The Problem: Social gaming is all about the experience. Applications need:
    • User-centric data (read: key-value access)
    • Scalability
    • An easy and simple backend
  • The Challenge: The backend must be a platform for multiple games, and it must be:
    • Scalable
    • Highly available
    • Extremely performant (latency and throughput)
    • Cost effective
    • Operationally easy to maintain
    Experimented with several databases: Couchbase, MongoDB, dbShards, MySQL Cluster.
  • Evaluation considerations (Couchbase vs. MongoDB vs. dbShards vs. MySQL Cluster (NDB)): sharding strategy; replication; failover support; scalability; customized data support; system compatibility; coding effort; performance; protocol; upgrade difficulty; data persisting method; Map Reduce / join; SQL compatible; licensing price; bulk price; management / monitor tool; hardware requirement; supported OS; operation knowledge; operation training; operation difficulty; developer company size; market penetration; support; successful use cases
  • The architecture
  • Draw Something by OMGPOP
  • As Usage Grew, Game Data Went Non-Linear (chart: Draw Something by OMGPOP, daily active users in millions)
  • In Contrast… (chart: The Simpsons: Tapped Out, daily active users in millions)
  • Use Case: 3rd Party Data Aggregation
    • Types of data: social media feeds (Twitter, Facebook, LinkedIn); blogs, news, press articles; data service feeds (Hoovers, Reuters).
    • Application requirements: flexibility to store any kind of content; flexibility to handle schema changes; full-text search across the data set; high-speed data ingestion; scales horizontally as more content gets added to the system.
    • Why NoSQL and Couchbase: JSON provides schema flexibility to store all types of content and metadata; fast access to individual documents via the built-in cache, with high write throughput; indexing and querying provide real-time analytics capabilities across the data set; integration with Elasticsearch for full-text search; ease of scalability ensures that the data cluster can be grown seamlessly as the amount of content grows.
  • 3rd party data aggregation at Sambacloud
  • Use Case: 3rd party data aggregation. Building a data and content aggregation and management platform
  • The Problem: More and more data and content is coming in from external sources: social media, data services, press and news, blogs. A single content store is required for all this information, one that can handle different formats and schemas.
  • The Challenge: The platform must support:
    • A flexible data model to handle any schema and constant changes to schemas
    • Elastic scaling, particularly for cloud environments
    • Consistent low-latency access and the ability to handle incoming streams
    • Full-text search for content
    • Lightweight analytics for sorting / ranking
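A flexible data model here means each feed item can be stored with its own shape while a few shared fields support lightweight sorting. This is a minimal sketch: the raw payloads are hypothetical and simplified (they are not the real Twitter, Reuters or Facebook API formats), and a dict of JSON strings stands in for a Couchbase bucket.

```python
import json
from datetime import datetime

# Hypothetical, simplified raw items from three sources, each with its own schema.
tweet = {"id_str": "1", "text": "Launch day!", "created_at": "2013-05-01T10:00:00+00:00", "retweet_count": 12}
article = {"headline": "Market update", "body": "Stocks rose.", "published": "2013-05-01T09:30:00+00:00"}
post = {"message": "New blog post", "created_time": "2013-05-01T11:15:00+00:00", "likes": {"count": 40}}

def to_document(source: str, raw: dict, time_field: str) -> dict:
    """Wrap a raw item in a small envelope; the original payload is kept untouched."""
    return {
        "source": source,
        "published_at": datetime.fromisoformat(raw[time_field]).isoformat(),
        "raw": raw,  # arbitrary, source-specific structure; no schema change needed
    }

store = {}  # stand-in for a bucket: key -> JSON document
items = [("twitter", tweet, "created_at"), ("reuters", article, "published"), ("facebook", post, "created_time")]
for n, (source, raw, time_field) in enumerate(items):
    store[f"{source}::{n}"] = json.dumps(to_document(source, raw, time_field))

# Lightweight analytics: newest first, regardless of each source's own schema.
docs = [json.loads(v) for v in store.values()]
newest_first = [
    d["source"]
    for d in sorted(docs, key=lambda d: datetime.fromisoformat(d["published_at"]), reverse=True)
]
print(newest_first)
```

A new source with yet another shape only needs its timestamp field named when it is ingested; existing documents are never rewritten.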
  • The Technologies: SambaCloud Content Services (REST API, HTML5): Work (agile projects), Share (any content), Organize (channels), Recommend (analytics)
  • Use Case: High Availability Caching
    • Types of data: application objects; popular search query results; session information; heavily accessed web landing pages.
    • Application requirements: consistently low response times for document / key lookups; high availability, 24x7x365; operationally easy to migrate / upgrade / maintain with the app online; replacement for the entire caching tier.
    • Why NoSQL and Couchbase: sub-millisecond latency with consistently high read / write throughput; always-on operations, even for database upgrades and maintenance, with zero downtime; memcached compatibility for easy migration to Couchbase without any application changes; high availability and disaster recovery with intra-cluster and cross-cluster replication (XDCR).
  • Challenges with a Memcached Tier (problem: symptoms; Couchbase solution)
    • Cold cache: slowdown or collapse of the data service layer due to a heavily overloaded RDBMS when memcached nodes go down (on failure or for maintenance). Couchbase solution: data is automatically replicated across the Couchbase cluster, providing high availability of data even on failures.
    • Heavy RDBMS contention: multiple requests for data items that do not exist in the cache result in a sudden shift of load to the relational database, causing heavy contention. Couchbase solution: by replicating data across the cluster, Couchbase Server provides consistent performance without shifting load to the RDBMS layer.
    • Lack of scalability: adding or removing memcached nodes is complicated and causes unpredictable degradation of application performance. Couchbase solution: auto-sharding and online rebalancing in Couchbase Server provide easy, non-disruptive expansion of the cluster.
    • Complex monitoring: managing individual memcached nodes increases the complexity of operations and lacks a single consistent view of the caching layer. Couchbase solution: Couchbase Server provides a built-in admin console for cluster-wide management and monitoring, as well as RESTful APIs for easy automation and third-party integration.
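A toy model of the cold-cache row: keys are spread over three cache nodes, and when one node fails every read for its keys falls through to the RDBMS unless a replica copy lives on another node. The placement scheme (CRC32 picks the owner, the replica sits on the next node) is an assumption for illustration, not Couchbase's actual replica placement.

```python
import zlib

NODES = 3

def owner(key: str) -> int:
    """Node holding the active copy of a key (assumed hash placement)."""
    return zlib.crc32(key.encode()) % NODES

def replica(key: str) -> int:
    """Node holding the replica copy (assumed: the next node in the ring)."""
    return (owner(key) + 1) % NODES

def db_reads_after_failure(keys, failed_node: int, replicated: bool) -> int:
    """Count reads that miss the cache tier entirely and hit the database."""
    misses = 0
    for key in keys:
        if owner(key) != failed_node:
            continue  # active copy still available
        if replicated and replica(key) != failed_node:
            continue  # served from the replica copy
        misses += 1
    return misses

keys = [f"session::{i}" for i in range(9000)]
plain = db_reads_after_failure(keys, failed_node=0, replicated=False)
with_replica = db_reads_after_failure(keys, failed_node=0, replicated=True)
print(plain, with_replica)
```

Without replicas roughly a third of all reads shift to the database at once, which is the contention spike the table describes; with one replica per key, none do.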
  • Before and After: Replacing the Caching Tier with Couchbase Server
  • Memcached Tier Replacement: How it Works
    • Fully memcached protocol compatible
    • Easy to replace a tier of individual memcached servers with a Couchbase Server cluster
    • The cluster receives reads and writes, keeps frequently accessed items in memory, and persists, shards and replicates the data across the cluster
    • Reads and writes are still as low latency and high throughput as with memcached
    • The user gets all the scalability and high-availability advantages of a Couchbase Server cluster
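"Fully memcached protocol compatible" means existing clients keep sending the same messages. The sketch below only formats and parses memcached text-protocol `set` / `get` messages as defined in the memcached protocol documentation; it opens no sockets. Whether a given Couchbase deployment accepts the text protocol (for example through a proxy) or only the binary protocol is an assumption to verify for your version.

```python
from typing import Optional

def _check_key(key: str) -> None:
    # Text-protocol keys: at most 250 bytes, no whitespace or control characters.
    if len(key.encode()) > 250 or any(c.isspace() or ord(c) < 32 for c in key):
        raise ValueError(f"invalid memcached key: {key!r}")

def format_set(key: str, value: bytes, flags: int = 0, exptime: int = 0) -> bytes:
    """Storage command: set <key> <flags> <exptime> <bytes>\\r\\n<data>\\r\\n"""
    _check_key(key)
    return f"set {key} {flags} {exptime} {len(value)}\r\n".encode() + value + b"\r\n"

def format_get(key: str) -> bytes:
    """Retrieval command for a single key."""
    _check_key(key)
    return f"get {key}\r\n".encode()

def parse_get_response(resp: bytes) -> Optional[bytes]:
    """Return the value from a single-key get response, or None on a cache miss."""
    if resp == b"END\r\n":
        return None
    header, rest = resp.split(b"\r\n", 1)
    fields = header.split()  # VALUE <key> <flags> <bytes>
    if len(fields) < 4 or fields[0] != b"VALUE":
        raise ValueError(f"unexpected response header: {header!r}")
    nbytes = int(fields[3])
    value, trailer = rest[:nbytes], rest[nbytes:]
    if trailer != b"\r\nEND\r\n":
        raise ValueError("response is truncated or holds more than one item")
    return value

print(format_set("user::42", b"hello"))
print(parse_get_response(b"VALUE user::42 0 5\r\nhello\r\nEND\r\n"))
```

Because the wire format is unchanged, swapping the memcached servers for a cluster endpoint is a configuration change rather than a code change on the client side.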
  • Use Case: Ad Targeting
    • Types of data: user profile (preferences and psychographic data); ad serving history by user; ad buying history by advertiser; ad serving history by advertiser.
    • Application requirements: high performance to meet the limited ad-serving budget (time allowance is typically < 40 msec); scalability to handle hundreds of millions of user profiles and a rapidly growing amount of data; 24x7x365 availability to avoid ad revenue loss.
    • Why NoSQL and Couchbase: sub-millisecond reads/writes mean less time is needed for data access, more time is available for ad logic processing, and more highly optimized ads will be served; ease of scalability ensures that the data cluster can be grown seamlessly as the amount of user and ad data grows; always-on operations = always-on revenue: you will never miss the opportunity to serve an ad because of downtime.
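The budget argument is simple arithmetic. Taking the < 40 msec allowance from the slide and a purely hypothetical 20 sequential profile/history reads per ad request, the time left for ad logic depends heavily on per-lookup latency.

```python
def time_left_for_logic_ms(budget_ms: float, lookups: int, latency_ms: float) -> float:
    """Budget remaining after sequential data lookups (ignores parallelism and jitter)."""
    return budget_ms - lookups * latency_ms

BUDGET_MS = 40.0  # ad-serving time allowance from the slide
LOOKUPS = 20      # hypothetical number of reads per ad request

for latency in (0.5, 1.0, 1.5):
    left = time_left_for_logic_ms(BUDGET_MS, LOOKUPS, latency)
    print(f"{latency} ms/lookup -> {left:.1f} ms left for ad logic")
```

With 0.5 ms lookups three quarters of the budget remains for ad logic; at 1.5 ms per lookup only a quarter does.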
  • Couchbase is the Complete Solution
    • Easy scalability: grow the cluster without application changes and without downtime, with a single click
    • Consistent high performance: consistent sub-millisecond read and write response times with consistent high throughput
    • Always on 24x7x365: no downtime for software upgrades, hardware maintenance, etc.
    • Flexible data model: JSON document model with no fixed schema
  • Proven Easy, Online Scalability
  • Scaling
    • Fully online throughout
    • A single REST call or click to add or remove an arbitrary number of nodes
    • Data movement is parallelized on rebalance and throttled to prevent overload
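Rebalancing a partitioned store moves whole partitions rather than individual keys. This is a minimal sketch, assuming 1024 partitions (as with Couchbase vBuckets) and a simple greedy plan that moves partitions only off removed or overloaded nodes; Couchbase's real rebalance planner also handles replicas and throttling and is not reproduced here.

```python
from collections import Counter

NUM_PARTITIONS = 1024

def initial_map(nodes: list[str]) -> dict[int, str]:
    """Round-robin assignment of partitions to nodes."""
    return {p: nodes[p % len(nodes)] for p in range(NUM_PARTITIONS)}

def rebalance(pmap: dict[int, str], nodes: list[str]) -> dict[int, str]:
    """Move the fewest partitions needed so every node owns an (almost) equal share."""
    target, extra = divmod(NUM_PARTITIONS, len(nodes))
    quota = {n: target + (1 if i < extra else 0) for i, n in enumerate(nodes)}
    load = Counter(owner for owner in pmap.values() if owner in quota)
    new_map = dict(pmap)
    # Partitions to hand off: all those on removed nodes, plus the excess on overloaded nodes.
    to_move = []
    for p, owner in sorted(pmap.items()):
        if owner not in quota:
            to_move.append(p)
        elif load[owner] > quota[owner]:
            load[owner] -= 1
            to_move.append(p)
    # Give each one to the first node still below its quota.
    for p in to_move:
        receiver = next(n for n in nodes if load[n] < quota[n])
        new_map[p] = receiver
        load[receiver] += 1
    return new_map

before = initial_map(["a", "b"])
after = rebalance(before, ["a", "b", "c", "d"])
moved = sum(before[p] != after[p] for p in before)
print(moved, Counter(after.values()))
```

Growing from 2 to 4 nodes moves exactly half of the partitions (512 of 1024), the minimum possible; partitions that are not moved stay where they are.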
  • Couchbase: High throughput that scales linearly (chart: linear throughput scalability; high throughput with a 1.4 GB/sec data transfer rate using 4 servers). Source: http://www.cisco.com/en/US/prod/collateral/switches/ps9441/ps9670/white_paper_c11-708169.pdf
  • Proven Rapid Growth Scalability (chart: Draw Something by OMGPOP, daily active users in millions, Feb 2012 to March 2012)
  • Consistent High Performance
  • Consistent High Performance
    • Consistent, predictable sub-millisecond latency: apps need fast, predictable access to data; it’s not good enough to be fast some of the time
    • Consistent, predictable throughput: the throughput capacity of your data layer should be independent of the mix of reads and writes
  • Consistent low latency with varying doc sizes (chart: consistently low latencies in microseconds for varying document sizes with a mixed workload). Source: http://www.cisco.com/en/US/prod/collateral/switches/ps9441/ps9670/white_paper_c11-708169.pdf
  • High throughput that scales linearly (chart: linear throughput scalability; high throughput with a 1.4 GB/sec data transfer rate using 4 servers)
  • LinkedIn 4-node cluster
  • Always On 24x7x365
  • Always On 24x7x365
    • Online upgrades: rebalance in nodes with new versions
    • Online backup
    • Online compaction
    • Built-in monitoring plus REST interface: cluster-wide to per-node drill-down
    • Full admin REST interface for easy integration
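As an illustration of the REST interface, the sketch below reads cluster status from Couchbase Server's `/pools/default` endpoint on the admin port (8091) using only the Python standard library. Treat the response fields used here (`nodes`, `hostname`, `status`), the host name, the credentials and the sample payload as assumptions to verify against the REST API reference for your server version.

```python
import base64
import json
import urllib.request

def pools_default_request(host: str, user: str, password: str) -> urllib.request.Request:
    """Build an authenticated GET for the cluster overview (host and credentials are placeholders)."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        f"http://{host}:8091/pools/default",
        headers={"Authorization": f"Basic {token}"},
    )

def unhealthy_nodes(pools_default: dict) -> list[str]:
    """Return hostnames whose reported status is not 'healthy' (assumed field names)."""
    return [n["hostname"] for n in pools_default.get("nodes", []) if n.get("status") != "healthy"]

def fetch_unhealthy(host: str, user: str, password: str) -> list[str]:
    """Query a live cluster; needs network access and valid admin credentials."""
    with urllib.request.urlopen(pools_default_request(host, user, password), timeout=5) as resp:
        return unhealthy_nodes(json.load(resp))

# Hypothetical, trimmed payload so the parsing can be tried without a cluster.
sample = {"nodes": [{"hostname": "10.0.0.1:8091", "status": "healthy"},
                    {"hostname": "10.0.0.2:8091", "status": "unhealthy"}]}
print(unhealthy_nodes(sample))
```

Separating request building, fetching and parsing keeps the health check easy to plug into an existing monitoring system, which is the kind of third-party integration the slide refers to.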
  • Availability (chart comparing CACHE 1, CACHE 2, CACHE 3 and Couchbase on a 0 to 100 scale)
  • Flexible Data Model
  • Relational vs. Document Data Model
    • Relational data model: highly-structured table organization with rigidly-defined data formats and record structure (columns C1, C2, C3, C4).
    • Document data model: collection of complex documents with arbitrary, nested data formats and varying “record” format (JSON documents).
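To make the contrast concrete, here is a small sketch using plain Python structures; the user/address example and the `user::1` key format are hypothetical. The relational shape needs a join across two normalized tables to rebuild one user, while the document shape keeps the nested data together, is fetched by key, and can gain a new field without an ALTER TABLE.

```python
import json

# Relational shape: two normalized tables linked by a foreign key (user_id).
users = [{"user_id": 1, "name": "Dana"}]
addresses = [
    {"address_id": 10, "user_id": 1, "city": "Tel Aviv"},
    {"address_id": 11, "user_id": 1, "city": "Haifa"},
]

def join_user(user_id: int) -> dict:
    """Reassemble a user by joining the two tables, as a SQL JOIN would."""
    user = next(u for u in users if u["user_id"] == user_id)
    cities = [a["city"] for a in addresses if a["user_id"] == user_id]
    return {"name": user["name"], "cities": cities}

# Document shape: one self-contained JSON document per user, fetched by key.
documents = {
    "user::1": json.dumps({"name": "Dana", "addresses": [{"city": "Tel Aviv"}, {"city": "Haifa"}]}),
}

def get_user(key: str) -> dict:
    """Single key lookup; no join needed because the addresses are nested."""
    doc = json.loads(documents[key])
    return {"name": doc["name"], "cities": [a["city"] for a in doc["addresses"]]}

print(join_user(1) == get_user("user::1"))

# Schema flexibility: add a field to one document only, with no schema migration.
doc = json.loads(documents["user::1"])
doc["badges"] = ["early adopter"]
documents["user::1"] = json.dumps(doc)
```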
  • Comparisons
  • Couchbase Server vs. MongoDB
    • Easy scalability. Couchbase ✔: with 1 click, horizontally grow the cluster, even scale across data centers. MongoDB ✖: complex multi-step scaling, no write scaling across data centers.
    • Consistent, high performance. Couchbase ✔: consistent sub-millisecond reads/writes; consistent high throughput. MongoDB ✖: high and inconsistent latency; lower throughput.
    • Flexible data model. Couchbase ✔: schemaless data model for rapid development. MongoDB ✔: schemaless data model for rapid development.
    • Always on 24x7x365. Couchbase ✔: no downtime for software upgrades, hardware maintenance, etc. MongoDB ✖: difficult online upgrade; not all maintenance is online.
  • Couchbase Server Leadership vs. Cassandra
    • Easy scalability. Couchbase ✔: with 1 click, horizontally grow the cluster, even scale across data centers. Cassandra ✖: complex multi-step scaling, coarse-grained growth recommended.
    • Consistent, high performance. Couchbase ✔: consistent sub-millisecond reads/writes and high throughput. Cassandra ✖: high and inconsistent latency; medium throughput.
    • Flexible data model. Couchbase ✔: schemaless data model for rapid development. Cassandra ✖: very complex columnar data model.
    • Always on 24x7x365. Couchbase ✔: no downtime for software upgrades, hardware maintenance, etc. Cassandra ✔: online upgrades and online maintenance.
  • Read performance comparison: NoSQL databases (chart: 95th percentile read latency in ms against throughput in operations per second for MongoDB, Cassandra and Couchbase; third-party data from Altoros). MongoDB cannot handle throughput above ~8000 ops/sec; Couchbase handles ~3X the throughput with significantly lower latency.
  • Write performance comparison: NoSQL databases (chart: 95th percentile insert/update latency in ms against throughput in operations per second for MongoDB, Cassandra and Couchbase; third-party data from Altoros). MongoDB latency shoots up beyond 6000 ops/sec; Couchbase latency stays consistently low even at 20000 ops/sec.
  • Thank you! Get Couchbase: http://www.couchbase.com/download (perry@couchbase.com)