Couchbase_NoSQL Matters_Introduction_to_map_reduce_2013

Published in: Technology
Transcript of "Couchbase_NoSQL Matters_Introduction_to_map_reduce_2013"

  1. Friday, April 26, 13
  2. Introduction to Map Reduce with Couchbase
     Tugdual Grall / @tgrall
     NoSQL Matters '13 - Cologne - April 25th 2013
  3. About Me
     • Tugdual "Tug" Grall
       - Couchbase: Technical Evangelist
       - eXo: CTO
       - Oracle: Developer/Product Manager, mainly Java/SOA
       - Developer in consulting firms
     • Web
       - @tgrall
       - http://blog.grallandco.com
       - tgrall
       - NantesJUG co-founder
     • Pet Project: http://www.resultri.com
  4. What's the Problem?
     • Lots of Data
     • Big Data
     • SaaS/Cloud Computing
     • Big Users
  5. Solution
     Distribute:
     • the data
     • the processing of the data
  6. Map Reduce
     MapReduce is a programming model for processing large data sets, and the name of an implementation of the model by Google. MapReduce is typically used to do distributed computing on clusters of computers.
     http://research.google.com/archive/mapreduce.html
  7. In details
     • Developer specifies 2 methods:
       - map (in_key, in_value) -> list(out_key, intermediate_value)
         • Processes input data
         • Produces key, value pairs
       - reduce (out_key, list(intermediate_value)) -> list(out_value)
         • Combines all intermediate values for a particular key
         • Produces a set of merged output values
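
The two signatures above can be tried out with a small in-memory driver. This is only a sketch to illustrate the model, not Google's or Couchbase's implementation: `mapReduce`, `wordCountMap` and `wordCountReduce` are hypothetical names, everything runs in one process, and the reduce step returns a single value per key rather than a list.

```javascript
// Minimal in-memory MapReduce driver (illustrative sketch, single process).
function mapReduce(inputs, map, reduce) {
  // "Shuffle": group every intermediate value under its output key.
  const groups = new Map();
  for (const [inKey, inValue] of inputs) {
    for (const [outKey, value] of map(inKey, inValue)) {
      if (!groups.has(outKey)) groups.set(outKey, []);
      groups.get(outKey).push(value);
    }
  }
  // Reduce: combine all intermediate values collected for each key.
  const output = {};
  for (const [outKey, values] of groups) {
    output[outKey] = reduce(outKey, values);
  }
  return output;
}

// map(in_key, in_value) -> list of [out_key, intermediate_value]
function wordCountMap(docId, text) {
  return text.toLowerCase().split(/\s+/).filter(Boolean).map((word) => [word, 1]);
}

// reduce(out_key, list(intermediate_value)) -> merged value
function wordCountReduce(word, counts) {
  return counts.reduce((a, b) => a + b, 0);
}

// Made-up sample input: two tiny "documents".
const counts = mapReduce(
  [["d1", "map reduce map"], ["d2", "reduce the data"]],
  wordCountMap,
  wordCountReduce
);
console.log(counts);
```

In a real cluster the map calls run where the data lives and the grouping step moves data between machines; the sketch keeps only the shape of the two functions.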
  8. Execution
  9. Most common use case (© Yahoo Inc.)
  10. What about Couchbase?
  11. Couchbase Open Source Project
      • Leading NoSQL database project focused on distributed database technology and surrounding ecosystem
      • Supports both key-value and document-oriented use cases
      • All components are available under the Apache 2.0 Public License
      • Obtained as packaged software in both enterprise and community editions
  12. Couchbase Server Core Principles
      • Easy Scalability: grow cluster without application changes, without downtime, with a single click
      • Consistent High Performance: consistent sub-millisecond read and write response times with consistent high throughput
      • Always On 24x365: no downtime for software upgrades, hardware maintenance, etc.
      • Flexible Data Model: JSON document model with no fixed schema
  13. Additional Couchbase Server Features
      • Built-in clustering – all nodes equal
      • Data replication with auto-failover
      • Zero-downtime maintenance
      • Built-in managed cache
      • Append-only storage layer
      • Online compaction
      • Monitoring and admin API & UI
      • SDK for a variety of languages
  14. Couchbase Server 2.0 Architecture (diagram labels)
      • Data Manager
        - Query API (port 8092), Query Engine
        - Moxi (port 11211, Memcapable 1.0)
        - Memcached
        - Couchbase EP Engine (port 11210, Memcapable 2.0)
        - Storage interface, New Persistence Layer
      • Cluster Manager (Erlang/OTP)
        - REST management API/Web UI (HTTP, port 8091)
        - Erlang port mapper (port 4369)
        - Distributed Erlang (ports 21100 - 21199)
        - Heartbeat, Process monitor, Global singleton supervisor, Configuration manager, Rebalance orchestrator, Node health monitor, vBucket state and replication manager (labelled "on each node" or "one per cluster" in the diagram)
  15. Couchbase Server 2.0 Architecture (same diagram, annotated)
      • RAM Cache, Indexing & Persistence Management (C & V8): Query API and Query Engine (port 8092), Moxi (port 11211, Memcapable 1.0), Couchbase EP Engine (port 11210, Memcapable 2.0), Object-level Cache, Disk Persistence, storage interface, New Persistence Layer
      • Server/Cluster Management & Communication (Erlang): the Erlang/OTP components of slide 14
      • The Unreasonable Effectiveness of C by Damien Katz
  16. Basic Operation
      • Docs distributed evenly across servers
      • Each server stores both active and replica docs
        - Only one server active at a time
      • Client library provides app with simple interface to database
      • Cluster map provides map to which server doc is on
        - App never needs to know
      • App reads, writes, updates docs
      • Multiple app servers can access same document at same time
      [Diagram: Couchbase Server cluster with user-configured replica count = 1. Servers 1 to 3 each hold active and replica docs; App Servers 1 and 2 read/write/update through the Couchbase client library and its cluster map.]
  17. How to access the data?
  18. Couchbase.get("my-key");
  19. Look at a document
      Key -> JSON object ("document"):
      { "string" : "string", "string" : value, "string" : { "string" : "string", "string" : value }, "string" : [ array ] }
      • How to find document based on its attributes?
        - get employee by email
        - get products by type
        - ...
      • You need to look "into" the document/value
  20. How to? Create an index!
  21. Create the index
      Document:
      {"name": "Aventinus","abv": 8.2,"ibu": 0,"srm": 0,"upc": 0,"type": "beer","brewery_id": "110f1f2012","updated": "2010-07-22 20:00:20","description": "Dark-ruby,... Weizenbock","category": "German Ale"}
      Metadata:
      {"id": "110f37fa30","rev": "1-000000000","expiration": 0,"flags": 0,"type": "json"}
      Index:
      Key          Value
      Aventinus    8.2
      Avenue Ale   4.1
      ...          ...
  22. Concrete Example
      • This map function:
        - receives the document and metadata
        - as developer you just have to emit the K,V
  23. Map Function
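
The map function itself did not survive in the transcript, so the following is a reconstruction sketch rather than the slide's code. It assumes the view should index beers by name with the ABV as value, matching the Key/Value table on the "Create the index" slide. In Couchbase the view engine supplies `emit`; here a stand-in `emit` and a `rows` array are defined so the function can be run locally, and the second and third documents and their ids are made-up samples.

```javascript
// Stand-in for the view engine: collects emitted rows so the map function
// can be exercised outside Couchbase (assumption, not part of the server API).
const rows = [];
function emit(key, value) {
  rows.push({ key: key, value: value });
}

// View map function: receives the document and its metadata,
// and emits one key/value pair per beer document.
function map(doc, meta) {
  if (doc.type === "beer" && doc.name) {
    emit(doc.name, doc.abv);
  }
}

// Sample documents; only the first id comes from the slides.
map({ name: "Aventinus", abv: 8.2, type: "beer" }, { id: "110f37fa30", type: "json" });
map({ name: "Avenue Ale", abv: 4.1, type: "beer" }, { id: "sample-beer-2", type: "json" });
map({ name: "Sample Brewery", type: "brewery" }, { id: "sample-brewery-1", type: "json" });

console.log(rows);
```

Only the two beer documents produce index rows; the brewery document is skipped because the function emits nothing for it.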
  24. Range queries
      doc.email                meta.id
      abba@couchbase.com       u::1
      beta@couchbase.com       u::7
      jasdeep@couchbase.com    u::2
      math@couchbase.com       u::5
      matt@couchbase.com       u::6
      yeti@couchbase.com       u::4
      zorro@couchbase.com      u::3
      • ?startkey="b1"&endkey="zz"
        Pulls the index keys between the UTF-8 range specified by the startkey and endkey
      • ?startkey="bz"&endkey="zn"
        Pulls the index keys between the UTF-8 range specified by the startkey and endkey
  25. Single key query (same index as slide 24)
      • ?key="math@couchbase.com"
        Match a single index key
  26. Multiple keys query (same index as slide 24)
      • ?keys=["math@couchbase.com","yeti@couchbase.com"]
        Query multiple in the set (array notation)
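
The three query styles can be mimicked locally on the sample index from the slides (addresses as best they can be read). This sketch is an approximation: `queryRange`, `queryKey` and `queryKeys` are hypothetical helper names, and plain JavaScript string comparison stands in for the server's own key collation, which may order some keys differently.

```javascript
// Sample index from the slides, already sorted by key.
const index = [
  { key: "abba@couchbase.com", id: "u::1" },
  { key: "beta@couchbase.com", id: "u::7" },
  { key: "jasdeep@couchbase.com", id: "u::2" },
  { key: "math@couchbase.com", id: "u::5" },
  { key: "matt@couchbase.com", id: "u::6" },
  { key: "yeti@couchbase.com", id: "u::4" },
  { key: "zorro@couchbase.com", id: "u::3" },
];

// ?startkey=...&endkey=... : every row whose key falls inside the range.
function queryRange(rows, startkey, endkey) {
  return rows.filter((row) => row.key >= startkey && row.key <= endkey);
}

// ?key=... : rows matching exactly one key.
function queryKey(rows, key) {
  return rows.filter((row) => row.key === key);
}

// ?keys=[...] : rows matching any key in the set.
function queryKeys(rows, keys) {
  const wanted = new Set(keys);
  return rows.filter((row) => wanted.has(row.key));
}

const ids = (rows) => rows.map((row) => row.id);
console.log(ids(queryRange(index, "b1", "zz")));
console.log(ids(queryRange(index, "bz", "zn")));
console.log(ids(queryKey(index, "math@couchbase.com")));
console.log(ids(queryKeys(index, ["math@couchbase.com", "yeti@couchbase.com"])));
```

Because the index is kept sorted by key, a range query only has to walk a contiguous slice of it, which is why startkey/endkey lookups are cheap.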
  27. How it works?
  28. Indexing and Querying
      • Indexing work is distributed amongst nodes
      • Large data set possible
      • Parallelize the effort
      • Each node has index for data stored on it
      • Queries combine the results from required nodes
      [Diagram: Couchbase Server cluster with user-configured replica count = 1. Servers 1 to 3 each hold active and replica docs; a query from App Servers 1 and 2 goes through the Couchbase client library and cluster map to the servers.]
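
The last two bullets describe a scatter/gather pattern, sketched below under loud assumptions: `nodes`, `queryNode` and `queryCluster` are hypothetical, the "nodes" are plain arrays in one process, and the merge is a simple concatenate-and-sort rather than whatever the server does internally.

```javascript
// Hypothetical per-node indexes: each node only indexes the documents it stores.
const nodes = [
  [{ key: "abba@couchbase.com", id: "u::1" }, { key: "math@couchbase.com", id: "u::5" }],
  [{ key: "beta@couchbase.com", id: "u::7" }, { key: "zorro@couchbase.com", id: "u::3" }],
  [{ key: "jasdeep@couchbase.com", id: "u::2" }],
];

// Scatter: each node answers the range query from its own local index.
function queryNode(nodeIndex, startkey, endkey) {
  return nodeIndex.filter((row) => row.key >= startkey && row.key <= endkey);
}

// Gather: combine the partial results and restore global key order.
function queryCluster(clusterNodes, startkey, endkey) {
  const partials = clusterNodes.map((node) => queryNode(node, startkey, endkey));
  return partials.flat().sort((a, b) => (a.key < b.key ? -1 : a.key > b.key ? 1 : 0));
}

const merged = queryCluster(nodes, "b", "n");
console.log(merged.map((row) => row.id));
```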
  29. Couchbase Server 2.0: Views
      • Views can cover a few different use cases
        - Primary index
        - Simple secondary indexes (the most common)
        - Complex secondary, tertiary and composite indexes
        - Aggregation functions (reduction)
          • Example: count the number of "North American Ales"
        - Organizing related data
      • Built using Map/Reduce
        - Map function creates a matrix from document fields
        - Reduce function summarizes (reduces) information
  30. Distributed Index Build Phase
      • Optimized for lookups, in-order access and aggregations
      • All view reads from disk (different performance profile)
      • View builds against every document on every node
        - This is why you should group them in a design document
      • Automatically kept up to date
        - "Incremental" Map Reduce
  31. Dynamic Range Queries with Optional Aggregation
      • Efficiently fetch a row or group of related rows
      • Queries use cached values from B-tree inner nodes when possible
      • Take advantage of in-order tree traversal with group_level queries
      [Diagram: Servers 1 to 3, each with active docs and replica docs.]
      ?startkey="J"&endkey="K"
      {"rows":[{"key":"Juneau","value":null}]}
  32. Append Only Index
      • Disk activity is slow
      • Updating disk blocks is very slow
      • Appending new data to the end of the current file is fast
      • Overhead of reverse reading is small
      • Because existing blocks are not re-used, can lead to fragmentation
        - Couchbase will compact the index automatically
      [Diagram: Doc -> View Processor -> Disk; changed documents are appended after the original.]
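
The trade-off on this slide can be illustrated with a toy append-only store. This is a sketch of the general technique only, not Couchbase's file format: `AppendOnlyStore` is a hypothetical class, the "file" is an in-memory array, and compaction is triggered by hand rather than automatically.

```javascript
// Toy append-only store: writes never overwrite, they only append.
class AppendOnlyStore {
  constructor() {
    this.log = []; // stands in for the on-disk file
  }

  // Updating a key appends a new entry; the old entry stays behind as garbage.
  put(key, value) {
    this.log.push({ key: key, value: value });
  }

  // Read backwards from the end so the newest entry for a key wins.
  get(key) {
    for (let i = this.log.length - 1; i >= 0; i--) {
      if (this.log[i].key === key) return this.log[i].value;
    }
    return undefined;
  }

  // Compaction rewrites the log keeping only the latest entry per key.
  compact() {
    const latest = new Map();
    for (const entry of this.log) latest.set(entry.key, entry.value);
    this.log = Array.from(latest, ([key, value]) => ({ key: key, value: value }));
  }
}

const store = new AppendOnlyStore();
store.put("a", 1);
store.put("b", 2);
store.put("a", 3); // update: appended, the first "a" entry is now stale
const sizeBefore = store.log.length;
store.compact();
const sizeAfter = store.log.length;
console.log(store.get("a"), sizeBefore, sizeAfter);
```

The stale first entry for "a" is the fragmentation the slide mentions; compaction reclaims it without changing what reads return.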
  33. Adding a new Document
      [B-tree diagram with reduction values stored in the nodes.
       Leaf keys: A B C D F G H I K L N O Q R
       Leaf ranges: A-C 3, D-F 2, G-H 2, I-L 3, N-R 4
       Inner nodes: A-H 7, I-R 7
       Root: A-R 14
       Adding key M appends M-R 5, I-R 8 and A-R 15: new key, new reductions, new root.]
  34. What about Reduce?
      • Out of the box functions:
        - _count()
        - _sum()
        - _stats()
      • Create your own if needed

      function(key, values, rereduce) {
        if (rereduce) {
          var result = 0;
          for (var i = 0; i < values.length; i++) {
            result += values[i];
          }
          return result;
        } else {
          return values.length;
        }
      }
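
To see why the custom function has two branches, it helps to call it the way an index might: first on raw map values from separate parts of the index, then on the partial results. The sketch below gives the slide's function a name (`countReduce`, a hypothetical label) and uses made-up partitions; how the server actually splits the values is not shown in the slides.

```javascript
// The custom count reduce from the slide, named so it can be called directly.
function countReduce(key, values, rereduce) {
  if (rereduce) {
    // values are partial counts from earlier reduce calls: add them up.
    var result = 0;
    for (var i = 0; i < values.length; i++) {
      result += values[i];
    }
    return result;
  } else {
    // values are raw map outputs: count them.
    return values.length;
  }
}

// First pass: reduce two made-up partitions of emitted values separately.
const partA = countReduce(null, [1, 1, 1], false);
const partB = countReduce(null, [1, 1], false);

// Second pass: rereduce combines the partial counts into the final count.
const total = countReduce(null, [partA, partB], true);
console.log(partA, partB, total);
```

Counting the partial results again with `values.length` would give 2 instead of 5, which is the mistake the `rereduce` flag guards against.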
  35. Reduce Function
      • Key and arrays of values as parameters
      • Written in JavaScript
      • Called after the map function
      • Used to reduce the result of a map of single values
      • Used with grouping
      • Could be ignored when querying
        - reuse the index
  36. Reduce in Action
      • Map() result:
        Key                       Value
        Belgian-Style Dubbel      1
        Belgian-Style Dubbel      1
        Belgian-Style Dubbel      1
        Belgian-Style Pale Ale    1
        Belgian-Style White       1
        Belgian-Style White       1
        ...                       ...
      • Reduce(): _count()
      • Result:
        Key                       Value
        Belgian-Style Dubbel      3
        Belgian-Style Pale Ale    1
        Belgian-Style White       2
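
The grouped count shown on this slide can be reproduced locally. This is a sketch only: `groupCount` is a hypothetical helper standing in for the built-in `_count` reduce queried with grouping enabled, and it runs over an in-memory copy of the map rows from the slide.

```javascript
// Map output from the slide: one row per beer, keyed by style.
const mapRows = [
  { key: "Belgian-Style Dubbel", value: 1 },
  { key: "Belgian-Style Dubbel", value: 1 },
  { key: "Belgian-Style Dubbel", value: 1 },
  { key: "Belgian-Style Pale Ale", value: 1 },
  { key: "Belgian-Style White", value: 1 },
  { key: "Belgian-Style White", value: 1 },
];

// Local stand-in for a grouped _count: number of map rows per distinct key.
function groupCount(rows) {
  const counts = new Map();
  for (const row of rows) {
    counts.set(row.key, (counts.get(row.key) || 0) + 1);
  }
  return Array.from(counts, ([key, value]) => ({ key: key, value: value }));
}

const reduced = groupCount(mapRows);
console.log(reduced);
```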
  37. How to use it?
      • Use client SDK to call the view:

      // Look up the view "by_name" in the design document "beer"
      View view = client.getView("beer", "by_name");
      Query query = new Query();
      // Range from startKey up to startKey + "\uefff", a high code point,
      // so the query returns keys that begin with startKey
      query.setIncludeDocs(true)
           .setLimit(20)
           .setRangeStart(ComplexKey.of(startKey))
           .setRangeEnd(ComplexKey.of(startKey + "\uefff"));
      ViewResponse result = client.query(view, query);
      for (ViewRow row : result) {
          ....
      }
  38. Demonstration
  39. Hadoop ≠ Couchbase
      • Hadoop
        - Deal with "Big Data"
        - "More" is better than "Faster"
        - Batch oriented
        - Usually used to "extract/transform" data
        - Fully distributed: Map, Shuffle, Reduce
      • Couchbase
        - Distributed
        - Executed where the document is
        - Deal with "indexing" data
        - As fast as possible
        - Use to query the data in the database
  40. Map Reduce in Couchbase
      • Like many other NoSQL databases: used for queries!
      • Indexes are distributed on each node of the cluster
      • Indexes are updated incrementally
      • Write your Map Reduce in JavaScript
  41. Thank you!
      tug@couchbase.com
      @tgrall
      Get Couchbase Server at http://www.couchbase.com/download