Navigating the NoSQL Landscape using Lego Mindstorms and Java


Transcript

  • 1. Navigating the NoSQL Landscape using Lego Mindstorms and Java. Michael Nitschinger, Developer Advocate, Couchbase Inc.
  • 2. Navigating the NoSQL Landscape using Lego Mindstorms and Java. Michael Nitschinger, Developer Advocate, Couchbase Inc.
  • 3. {“about”: “me”}
      • Developer Advocate at Couchbase, Inc.
      • Maintainer of the Couchbase Java SDK
      • Speaking at Conferences and Meetups
      • Living and Working here in Vienna, Austria
  • 4. What we’ll talk about
      • What are the limits of RDBMS solutions?
      • What are the different NoSQL taxonomies?
      • Which NoSQL solution is right for me?
  • 5. Growth is the New Reality
      • Instagram gained nearly 1 million users overnight when they expanded to Android
  • 6. Showcase: Draw Something
  • 7. Showcase: Draw Something
  • 8. Showcase: Draw Something
  • 9. Does it work with an RDBMS backend?
      • Application Scales Out: Just add more commodity web servers
      • Database Scales Up: Get a bigger, more complex server
      Note – Relational database technology is great for what it is great for, but it is not great for this.
  • 10. Some alternatives to scale out your RDBMS
      Scale out your RDBMS
      • Run many SQL Servers
      • Data is sharded (on the app level!)
      • Memcached/Cache for faster response time
      • Writes are still slow
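Sharding "on the app level" means the application itself decides which SQL server holds a given row. The following is a minimal sketch of that idea, not something taken from the slides: the class name `ShardRouter`, the CRC32 hash, and the JDBC URLs are all assumptions chosen for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.zip.CRC32;

/** Application-level sharding: the app, not the database, picks the SQL server for a key. */
final class ShardRouter {
    private final List<String> shardUrls; // hypothetical JDBC URLs, one per SQL server

    ShardRouter(List<String> shardUrls) {
        if (shardUrls.isEmpty()) {
            throw new IllegalArgumentException("need at least one shard");
        }
        this.shardUrls = List.copyOf(shardUrls);
    }

    /** Maps a key to a shard index with a stable hash (CRC32) modulo the shard count. */
    int shardIndexFor(String key) {
        CRC32 crc = new CRC32();
        crc.update(key.getBytes(StandardCharsets.UTF_8));
        return (int) (crc.getValue() % shardUrls.size());
    }

    String shardUrlFor(String key) {
        return shardUrls.get(shardIndexFor(key));
    }

    public static void main(String[] args) {
        ShardRouter router = new ShardRouter(List.of(
                "jdbc:mysql://db0.example.com/app",
                "jdbc:mysql://db1.example.com/app",
                "jdbc:mysql://db2.example.com/app"));
        System.out.println("user:42 -> " + router.shardUrlFor("user:42"));
    }
}
```

With plain modulo hashing, adding a fourth server changes the target shard for most keys, so existing rows have to be migrated by hand. That is one reason this approach is hard to operate under sudden growth.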
  • 11. Scale out with RDBMS
      Is this a good approach to scale?
      • Lot of components to deploy
      • Scale by Hand
          - Caching
          - Sharding/Replication
      Learn From Others: This Scenario Costs Time and Money.
      Scaling SQL is potentially disastrous when going Viral: Very risky time for major code changes and migrations... You have no Time when skyrocketing up!
  • 12. The Relational Model
      • Formulated and proposed by Edgar Codd in 1969.
          - http://en.wikipedia.org/wiki/Relational_model
      • Based on Relational Algebra
          - which is based on Set Theory
      • Not all Problems fit into Set Theory
          - i.e. Graph Theory
          - Relationships
          - Recommendations
      http://en.wikipedia.org/wiki/Honeywell_316
  • 13. Lacking market solutions, users forced to invent
      Bigtable (November 2006), Dynamo (October 2007), Cassandra (August 2008), Voldemort (February 2009)
      • No schema required before inserting data
      • No schema change required to change data format
      • Auto-sharding without application participation
      • Distributed queries
      • Integrated main memory caching
      • Data synchronization (mobile, multi-datacenter)
      Very few organizations want to (fewer can) build and maintain database software technology. But every organization building interactive web applications needs this technology.
  • 14. Survey: Schema inflexibility #1 adoption driver
      What is the biggest data management problem driving your use of NoSQL in the coming year?
          Lack of flexibility/rigid schemas   49%
          Inability to scale out data         35%
          High latency/low performance        29%
          Costs                               16%
          All of these                        12%
          Other                               11%
      Source: Couchbase NoSQL Survey, December 2011, n=1351
  • 15. NoSQL database matches application logic tier architecture
      Data layer now scales with linear cost and constant performance
      • Application Scales Out: Just add more commodity web servers
      • NoSQL Database Servers. Database Scales Out: Just add more commodity data servers
      Scaling out flattens the cost and performance curves.
  • 16. NoSQL Taxonomy
  • 17. The CAP Theorem
      • In a distributed System:
          - Consistency (C)
          - Availability (A)
          - Partition Tolerance (P)
      • When Partition happens
          - Choose either Consistency (only respond to subset)
          - or Availability (accept stale data and conflict writes)
      Conflict Resolution!
  • 18. Clarification
      • Big Data
          - Large scale datastore (“>= 100TB or Petabytes”)
          - Optimized for Batch Processing
          - Data Warehouse
      • Big Users
          - very high get/set rate (thousands of ops/s)
          - working set in RAM
          - latency and throughput matters most
          - (near) Real-Time use cases
  • 19. The Key-Value Store / “Cache” – the foundation of NoSQL
      [Diagram: a Key mapped to an Opaque Binary Value]
  • 20. Memcached – the NoSQL precursor
      [Diagram: a Key mapped to an Opaque Binary Value]
      Memcached
      • In-memory only
      • Limited set of operations
          - Blob Storage: Set, Add, Replace, CAS
          - Retrieval: Get
          - Structured Data: Append, Increment
      “Simple and fast.”
      Challenges:
          - cold cache
          - disruptive elasticity
          - missing persistence
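The operation set named on the slide (Set, Add, Replace, CAS, Get) can be illustrated with a toy in-process cache. This is a sketch built on the JDK's `ConcurrentHashMap`, not the memcached client API or wire protocol; `ToyCache` and `Entry` are hypothetical names, and the record syntax assumes Java 16 or later.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

/** Toy in-memory key-value cache mirroring the memcached-style operation set. */
final class ToyCache {
    /** A stored opaque value plus the version token that CAS compares against. */
    record Entry(byte[] value, long cas) {}

    private final ConcurrentHashMap<String, Entry> store = new ConcurrentHashMap<>();
    private final AtomicLong casCounter = new AtomicLong();

    private Entry fresh(byte[] value) {
        return new Entry(value.clone(), casCounter.incrementAndGet());
    }

    /** set: store unconditionally. */
    void set(String key, byte[] value) {
        store.put(key, fresh(value));
    }

    /** add: store only if the key is absent. */
    boolean add(String key, byte[] value) {
        return store.putIfAbsent(key, fresh(value)) == null;
    }

    /** replace: store only if the key is already present. */
    boolean replace(String key, byte[] value) {
        return store.replace(key, fresh(value)) != null;
    }

    /** get: returns the entry (value and CAS token), or null on a miss. */
    Entry get(String key) {
        return store.get(key);
    }

    /** cas: store only if the entry still carries the token the caller read earlier. */
    boolean cas(String key, long expectedCas, byte[] value) {
        Entry current = store.get(key);
        if (current == null || current.cas() != expectedCas) {
            return false;
        }
        // Atomic swap: fails if another writer replaced the entry in between.
        return store.replace(key, current, fresh(value));
    }
}
```

The CAS token gives optimistic concurrency: a client reads a value together with its token, computes an update, and the write is rejected if someone else changed the value in the meantime. Everything lives in the JVM heap, so the sketch also shares the "missing persistence" and "cold cache" challenges listed above.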
  • 21. NoSQL catalog
      Cache (memory only): Memcached (Key-Value)
      Database (memory/disk): (none yet)
  • 22. Redis – More “Structured Data” commands
      [Diagram: a Key mapped to “Data Structures”: Blob, List, Set, Hash, …]
      Redis
      • Disk Persistence (eventual consistency on the disk)!
      • Vast set of operations
          - Blob Storage: Set, Add, Replace, CAS
          - Retrieval: Get, Pub-Sub
          - Structured Data: Strings, Hashes, Lists, Sets, Sorted lists
      Challenges:
          - clustering (to come)
          - RAM limit (no eviction)
  • 23. NoSQL catalog
      Cache (memory only): Memcached (Key-Value), Redis (Data Structure)
      Database (memory/disk): (none yet)
  • 24. Membase – From key-value cache to database
      [Diagram: a Key mapped to an Opaque Binary Value]
      Membase
      • Disk-based with built-in memcached cache
      • Cache refill on restart
      • Memcached compatible (drop in replacement)
      • Highly-available (data replication)
      • Add or remove capacity to live cluster
      “Simple, fast, elastic.”
  • 25. NoSQL catalog
      Cache (memory only): Memcached (Key-Value), Redis (Data Structure)
      Database (memory/disk): Membase (Key-Value)
  • 26. Couchbase – Document-oriented database
      [Diagram: a Key mapped to a JSON & Opaque OBJECT (“DOCUMENT”): { “string” : “string”, “string” : value, “string” : { “string” : “string”, “string” : value }, “string” : [ array ] }]
      Couchbase
      • Auto-sharding
      • Disk-based with built-in memcached cache
      • Cache refill on restart
      • Memcached compatible (drop in replace)
      • Highly-available (data replication)
      • Add or remove capacity to live cluster
      • When values are JSON objects (“documents”): Create indices, views and query against the views
      • Chooses Consistency over Availability
  • 27. NoSQL catalog
      Cache (memory only): Memcached (Key-Value), Redis (Data Structure)
      Database (memory/disk): Membase (Key-Value), Couchbase (Document)
  • 28. MongoDB – Document-oriented database
      [Diagram: a Key mapped to a BSON OBJECT (“DOCUMENT”): { “string” : “string”, “string” : value, “string” : { “string” : “string”, “string” : value }, “string” : [ array ] }]
      MongoDB
      • Disk-based with in-memory “caching”
      • BSON (“binary JSON”) format and wire protocol
      • Master-slave replication
      • Auto-sharding
      • Values are BSON objects
      • Supports ad hoc queries – best when indexed
      • more similar to RDBMS modeling than Caches
      • Scaling over sharding requires special nodes
  • 29. NoSQL catalog
      Cache (memory only): Memcached (Key-Value), Redis (Data Structure)
      Database (memory/disk): Membase (Key-Value), Couchbase (Document), MongoDB (Document)
  • 30. Cassandra – Column overlays
      [Diagram: a Key mapped to an Opaque Binary Value, overlaid with Column 1, Column 2 and Column 3 (not present)]
      Cassandra
      • Disk-based system
      • Clustered
      • External caching required for low-latency reads
      • “Columns” are overlaid on the data
      • Not all rows must have all columns
      • Supports efficient queries on columns
      • Restart required when adding columns
      • Multi-Data-Center replication supported
      • Column-Model may be complex to start with
      • Chooses Availability over Consistency
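The point that "not all rows must have all columns" can be pictured as a map of maps, where each row stores only the columns it actually has. This is a conceptual sketch in plain Java, not Cassandra's storage engine or client API; `SparseColumnTable` and its methods are hypothetical names.

```java
import java.util.Collections;
import java.util.Map;
import java.util.Optional;
import java.util.Set;
import java.util.TreeMap;

/** Sparse column layout: row key -> sorted map of column name -> value. */
final class SparseColumnTable {
    private final Map<String, TreeMap<String, String>> rows = new TreeMap<>();

    /** Writes one column of one row; the row is created on first write. */
    void put(String rowKey, String column, String value) {
        rows.computeIfAbsent(rowKey, k -> new TreeMap<>()).put(column, value);
    }

    /** Reads one column; empty if the row or the column is not present. */
    Optional<String> get(String rowKey, String column) {
        TreeMap<String, String> columns = rows.get(rowKey);
        return columns == null ? Optional.empty() : Optional.ofNullable(columns.get(column));
    }

    /** Column names stored for a row, in sorted order; absent columns take no space. */
    Set<String> columnsOf(String rowKey) {
        TreeMap<String, String> columns = rows.get(rowKey);
        return columns == null ? Set.of() : Collections.unmodifiableSet(columns.keySet());
    }
}
```

A row for "ann" can carry `email` and `city` while a row for "bob" carries only `email`; reading `city` for "bob" simply yields an empty result rather than a NULL placeholder in a fixed schema.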
  • 31. NoSQL catalog
      Cache (memory only): Memcached (Key-Value), Redis (Data Structure)
      Database (memory/disk): Membase (Key-Value), Couchbase (Document), MongoDB (Document), Cassandra (Column)
  • 32. Neo4j – Graph database
      [Diagram: several Key / Opaque Binary Value nodes connected to each other]
      Neo4j
      • Disk-based system
      • External caching required for low-latency reads
      • Nodes, relationships and paths
      • Properties on nodes
      • Delete, Insert, Traverse, etc.
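The graph model of nodes, relationships, properties and traversal can be sketched in a few lines of plain Java. This is an illustration of the data model only and does not use the Neo4j API; `TinyGraph`, `Relationship` and the method names are hypothetical, and the record syntax assumes Java 16 or later.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Minimal property graph: nodes with properties, typed directed relationships, BFS traversal. */
final class TinyGraph {
    record Relationship(String type, String target) {}

    private final Map<String, Map<String, Object>> nodeProperties = new HashMap<>();
    private final Map<String, List<Relationship>> outgoing = new HashMap<>();

    void addNode(String id, Map<String, Object> properties) {
        nodeProperties.put(id, new HashMap<>(properties));
        outgoing.putIfAbsent(id, new ArrayList<>());
    }

    void relate(String from, String type, String to) {
        if (!nodeProperties.containsKey(from) || !nodeProperties.containsKey(to)) {
            throw new IllegalArgumentException("unknown node");
        }
        outgoing.get(from).add(new Relationship(type, to));
    }

    /**
     * Breadth-first traversal that follows relationships of one type for at most
     * maxDepth hops. Returns the reached nodes in discovery order, excluding start.
     */
    Set<String> traverse(String start, String type, int maxDepth) {
        Set<String> seen = new LinkedHashSet<>();
        seen.add(start);
        List<String> frontier = List.of(start);
        for (int depth = 0; depth < maxDepth && !frontier.isEmpty(); depth++) {
            List<String> next = new ArrayList<>();
            for (String node : frontier) {
                for (Relationship rel : outgoing.getOrDefault(node, List.of())) {
                    if (rel.type().equals(type) && seen.add(rel.target())) {
                        next.add(rel.target());
                    }
                }
            }
            frontier = next;
        }
        seen.remove(start);
        return seen;
    }
}
```

A "friends of friends" query is then a traversal of depth 2 over a `KNOWS` relationship, which is the kind of question that is awkward to express as repeated self-joins in a relational schema.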
  • 33. NoSQL catalog
      Cache (memory only): Memcached (Key-Value), Redis (Data Structure)
      Database (memory/disk): Membase (Key-Value), Couchbase (Document), MongoDB (Document), Cassandra (Column), Neo4j (Graph)
  • 34. NoSQL catalog
      Categories: Key-Value, Data Structure, Document, Column, Graph
      Cache (memory only): Memcached, Redis, Coherence
      Database (memory/disk): Membase, Couchbase, Cassandra, Neo4j, Riak, MongoDB, HBase, InfiniteGraph
  • 35. What about Hadoop?
  • 36. Hadoop: Big Data Swiss Army Knife
      • Oozie: Workflow, coordination
      • Sqoop: Data connector to import/export data
      • Hive: SQL-Like interface
      • Pig: High level programming language
      • Mahout: Machine learning library
      • Whirr: Hadoop management tools for cloud services
      • Flume: Aggregator
      • Map Reduce: Framework to process large volume of data
      • HBase: Key Value data store
      • Zookeeper: Centralized configuration management
      • HDFS: Distributed file system
  • 37. So what? Connecting Hadoop
      [Diagram with steps 1, 2 and 3. Labels: click stream events; profiles, campaigns; profiles, real time campaign statistics; 40 milliseconds to respond with the decision.]
  • 38. Which one is right for me?
  • 39. Survey: Schema inflexibility #1 adoption driver
      What is the biggest data management problem driving your use of NoSQL in the coming year?
          Lack of flexibility/rigid schemas   49%
          Inability to scale out data         35%
          High latency/low performance        29%
          Costs                               16%
          All of these                        12%
          Other                               11%
      Source: Couchbase NoSQL Survey, December 2011, n=1351
  • 40. Lack of Flexibility / Rigid Schema
      • Aggregate Data Models (Martin Fowler)
          - Flexible Data Structure
          - Optimized Access
          - Easy to distribute data
      o::1001
      {
        uid: ji22jd,
        customer: Ann,
        line_items: [
          { sku: 0321293533, quan: 3, unit_price: 48.0 },
          { sku: 0321601912, quan: 1, unit_price: 39.0 },
          { sku: 0131495054, quan: 1, unit_price: 51.0 }
        ],
        payment: { type: Amex, expiry: 04/2001, last5: 12345 }
      }
      http://martinfowler.com/bliki/AggregateOrientedDatabase.html
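The order document `o::1001` on this slide maps naturally onto a single object graph in application code. The sketch below models it with Java records (Java 16 or later); the type and field names are assumptions derived from the slide's keys, and no JSON library or database client is involved.

```java
import java.math.BigDecimal;
import java.util.List;

/** The order aggregate from the slide, modeled as one unit that is read and written together. */
final class OrderAggregateDemo {
    record LineItem(String sku, int quantity, BigDecimal unitPrice) {}

    record Payment(String type, String expiry, String last5) {}

    record Order(String uid, String customer, List<LineItem> lineItems, Payment payment) {
        /** Sum of quantity times unit price over all line items. */
        BigDecimal total() {
            return lineItems.stream()
                    .map(item -> item.unitPrice().multiply(BigDecimal.valueOf(item.quantity())))
                    .reduce(BigDecimal.ZERO, BigDecimal::add);
        }
    }

    /** Builds the same data as document o::1001 on the slide. */
    static Order sampleOrder() {
        return new Order("ji22jd", "Ann",
                List.of(new LineItem("0321293533", 3, new BigDecimal("48.0")),
                        new LineItem("0321601912", 1, new BigDecimal("39.0")),
                        new LineItem("0131495054", 1, new BigDecimal("51.0"))),
                new Payment("Amex", "04/2001", "12345"));
    }

    public static void main(String[] args) {
        System.out.println("total = " + sampleOrder().total());
    }
}
```

Because line items and payment live inside the order, the whole aggregate can be stored under one key and fetched in one read, which is what makes it easy to distribute across servers. In a normalized relational schema the same data would typically be spread over several tables and joined at read time.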
  • 41. Use Cases
      Key Value
          • Session Management
          • User Profile/Preferences
          • Shopping Cart
      Document
          • Event Logging
          • Content Management
          • Web Analytics
          • E-Commerce Application
      Columns
          • Event Logging
          • Content Management
          • Counters
      Graph
          • Connected Data / Social Networks
          • Routing, Dispatch
          • Recommendations based on Social Graph
  • 42. Production Environment
      [Diagram: EMEA DC, US DATA CENTER, APAC DC]
  • 43. How do I want to scale out?
      • Modifying the cluster topology should be simple
          - Add, Remove, Configure Nodes on a running system
      • What is the impact of topology changes?
          - Sharding, Caching of the data
          - Availability of the service during cluster changes
      • More hardware = More failures
          - Availability, reliability of the system: failover support
  • 44. Add Nodes to Cluster
      [Diagram: APP SERVER 1 and APP SERVER 2, each with a COUCHBASE Client Library and a CLUSTER MAP, send READ/WRITE/UPDATE requests to SERVER 1 through SERVER 5 of the COUCHBASE SERVER CLUSTER. Each server holds ACTIVE docs and REPLICA docs. User Configured Replica Count = 1.]
      • Two servers added
          - One-click operation
      • Docs automatically rebalanced across cluster
          - Even distribution of docs
          - Minimum doc movement
      • Cluster map updated
      • App database calls now distributed over larger number of servers
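The rebalance described here depends on a level of indirection: a key always hashes to the same partition, and only the partition-to-server table (the cluster map) changes when servers are added. The following is a generic sketch of that idea under stated assumptions; it is not Couchbase's actual algorithm, and `ClusterMapSketch`, the CRC32 hash, and the partition counts are illustrative choices.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

/** Client-side cluster map: key -> partition (fixed) -> server (changes on rebalance). */
final class ClusterMapSketch {
    private final int[] ownerOfPartition; // index: partition id, value: server id
    private int serverCount;

    ClusterMapSketch(int partitions, int servers) {
        if (servers <= 0 || partitions < servers) {
            throw new IllegalArgumentException("need at least one partition per server");
        }
        ownerOfPartition = new int[partitions];
        serverCount = servers;
        for (int p = 0; p < partitions; p++) {
            ownerOfPartition[p] = p % servers; // initial round-robin placement
        }
    }

    /** The key-to-partition mapping never changes, whatever the cluster size. */
    int partitionFor(String key) {
        CRC32 crc = new CRC32();
        crc.update(key.getBytes(StandardCharsets.UTF_8));
        return (int) (crc.getValue() % ownerOfPartition.length);
    }

    int serverFor(String key) {
        return ownerOfPartition[partitionFor(key)];
    }

    int partitionsOwnedBy(int server) {
        int count = 0;
        for (int owner : ownerOfPartition) {
            if (owner == server) count++;
        }
        return count;
    }

    /** Adds servers and moves only surplus partitions onto them; returns the number moved. */
    int addServers(int added) {
        int newCount = serverCount + added;
        int fairShare = ownerOfPartition.length / newCount; // rounded down
        if (added <= 0 || fairShare == 0) {
            throw new IllegalArgumentException("invalid number of servers to add");
        }
        int[] load = new int[newCount];
        for (int owner : ownerOfPartition) load[owner]++;
        int moved = 0;
        int receiver = serverCount; // id of the first new server
        for (int p = 0; p < ownerOfPartition.length && receiver < newCount; p++) {
            int donor = ownerOfPartition[p];
            if (load[donor] <= fairShare) continue; // donor has no surplus left
            ownerOfPartition[p] = receiver;
            load[donor]--;
            load[receiver]++;
            moved++;
            if (load[receiver] == fairShare) receiver++; // this new server is full
        }
        serverCount = newCount;
        return moved;
    }
}
```

With 12 partitions on 3 servers, adding 2 servers moves 4 partitions and leaves every server with 2 or 3. Compared with hashing keys directly onto servers, only the moved partitions change location, which matches the slide's goals of even distribution and minimum doc movement.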
  • 45. Fail Over Node
      [Diagram: APP SERVER 1 and APP SERVER 2, each with a COUCHBASE Client Library and a CLUSTER MAP, in front of SERVER 1 through SERVER 5 of the COUCHBASE SERVER CLUSTER. Each server holds ACTIVE docs and REPLICA docs. User Configured Replica Count = 1.]
      • App servers accessing docs
      • Requests to Server 3 fail
      • Cluster detects server failed
          - Promotes replicas of docs to active
          - Updates cluster map
      • Requests for docs now go to appropriate server
      • Typically rebalance would follow
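The failover steps on this slide (detect the failed server, promote replicas to active, update the cluster map) can be sketched with two lookup tables. This is a simplified model with a replica count of 1 and is not Couchbase's implementation; `FailoverSketch` and its method names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

/** Failover with replica count 1: each partition has one active and at most one replica server. */
final class FailoverSketch {
    private final Map<Integer, String> active = new HashMap<>();  // partition -> active server
    private final Map<Integer, String> replica = new HashMap<>(); // partition -> replica server

    void assign(int partition, String activeServer, String replicaServer) {
        active.put(partition, activeServer);
        replica.put(partition, replicaServer);
    }

    String activeFor(int partition) {
        return active.get(partition);
    }

    boolean hasReplica(int partition) {
        return replica.containsKey(partition);
    }

    /** Promotes replicas for partitions whose active copy was on failedServer; returns how many. */
    int failOver(String failedServer) {
        int promoted = 0;
        for (Map.Entry<Integer, String> entry : active.entrySet()) {
            if (!entry.getValue().equals(failedServer)) continue;
            String standby = replica.remove(entry.getKey());
            if (standby == null) {
                throw new IllegalStateException("partition " + entry.getKey() + " has no replica left");
            }
            entry.setValue(standby); // the updated cluster map now points at the promoted copy
            promoted++;
        }
        // Replica copies hosted on the failed server are gone too, until a rebalance recreates them.
        replica.values().removeIf(failedServer::equals);
        return promoted;
    }
}
```

After a failover, the affected partitions run without a replica, which is why the slide notes that a rebalance would typically follow to restore the configured replica count.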
  • 46. Performance
      • What is my working set?
          - Different Patterns based on the Application
          - Social Games vs. Analytics
      • What do I need to cache / how often?
          - Put your data in RAM
          - Read/Write rates
      • How to design my data model?
          - Trim towards your “hot code path”
          - Aggregate Model
          - Easy to change
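Keeping the working set in RAM usually means bounding the cache and evicting whatever was used least recently. A minimal sketch using the JDK's `LinkedHashMap` in access order is shown below; `WorkingSetCache` is a hypothetical name, and real caches would also account for item size rather than just item count.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** LRU cache sized to the working set: hot items stay in RAM, the coldest one is evicted. */
final class WorkingSetCache<K, V> extends LinkedHashMap<K, V> {
    private final int capacity;

    WorkingSetCache(int capacity) {
        super(16, 0.75f, true); // accessOrder = true: entries are ordered least to most recently used
        this.capacity = capacity;
    }

    /** Called by LinkedHashMap after each insert; returning true evicts the eldest entry. */
    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > capacity;
    }
}
```

If the capacity is smaller than the application's real working set, hot items keep evicting each other and reads fall through to disk, so measuring read/write rates and the size of the hot data comes before picking a RAM size.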
  • 47. Management and Monitoring
      • Do not forget about Operations!
          - Service Reliability Engineering Team will thank you!
      • Manage your cluster easily:
          - Command Line, Administration Console to change cluster topology
      • Monitor “your NoSQL”
          - Analyze the overall status of your cluster
          - View and fix bottlenecks
  • 48. Conclusion
      • One Size Does Not Fit All
      • Overview of the NoSQL types
      • Choose the right solution for your application
      • Don’t mix Big Data with Big Users!
  • 49. Q&A
  • 50. Thank you!
      michael.nitschinger@couchbase.com
      @daschl
      Get Couchbase Server at http://www.couchbase.com/download