Friday, April 26, 13
NoSQL Matters 2013 - Introduction to Map Reduce with Couchbase 2.0

Introduction to Map Reduce and how it is used in Couchbase Server 2.0 to query documents

  1. Friday, April 26, 13
  2. Introduction to Map Reduce with Couchbase
      Tugdual Grall / @tgrall
      NoSQL Matters '13 - Cologne - April 25th 2013
  3. About Me
      • Tugdual "Tug" Grall
        - Couchbase
          • Technical Evangelist
        - eXo
          • CTO
        - Oracle
          • Developer/Product Manager
          • Mainly Java/SOA
        - Developer in consulting firms
      • Web
        • @tgrall
        • http://blog.grallandco.com
        • tgrall
        • NantesJUG co-founder
        • Pet Project:
        • http://www.resultri.com
  4. What's the Problem?
      Lots of Data
      • Big Data
      • SaaS/Cloud Computing
      • Big Users
  5. Solution
      Distribute:
      • the data
      • the processing of the data
  6. Map Reduce
      MapReduce is a programming model for processing large data sets, and the name of an implementation of the model by Google. MapReduce is typically used to do distributed computing on clusters of computers.
      http://research.google.com/archive/mapreduce.html
  7. In detail
      • Developer specifies 2 methods:
        - map (in_key, in_value) -> list(out_key, intermediate_value)
          • Processes input data
          • Produces key/value pairs
        - reduce (out_key, list(intermediate_value)) -> list(out_value)
          • Combines all intermediate values for a particular key
          • Produces a set of merged output values
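The two signatures above can be sketched in a few lines of plain JavaScript. This is a minimal in-memory illustration of the programming model, not Couchbase or Google code: `runMapReduce`, `wordCountMap` and `wordCountReduce` are hypothetical names, and for brevity the reduce step returns a single value per key rather than a list.

```javascript
// In-memory sketch of the map/reduce model (illustrative only).
function runMapReduce(inputs, map, reduce) {
  // Map phase: each (in_key, in_value) yields a list of [out_key, intermediate_value].
  const groups = new Map();
  for (const [inKey, inValue] of inputs) {
    for (const [outKey, value] of map(inKey, inValue)) {
      if (!groups.has(outKey)) groups.set(outKey, []);
      groups.get(outKey).push(value);
    }
  }
  // Reduce phase: combine all intermediate values collected for each key.
  const output = {};
  for (const [outKey, values] of groups) {
    output[outKey] = reduce(outKey, values);
  }
  return output;
}

// Hypothetical word-count job: emit (word, 1) per word, then sum the ones.
const wordCountMap = (docName, text) =>
  text.split(/\s+/).filter(Boolean).map((word) => [word, 1]);
const wordCountReduce = (word, counts) => counts.reduce((a, b) => a + b, 0);

const counts = runMapReduce(
  [["d1", "map reduce map"], ["d2", "reduce"]],
  wordCountMap,
  wordCountReduce
);
console.log(counts);
```

In a real cluster the map calls run in parallel on the nodes that hold the data; the grouping step here stands in for that distribution.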
  8. Execution
  9. Most common use case
      © Yahoo inc.
  10. What about Couchbase?
  11. Couchbase Open Source Project
      • Leading NoSQL database project focused on distributed database technology and surrounding ecosystem
      • Supports both key-value and document-oriented use cases
      • All components are available under the Apache 2.0 Public License
      • Obtained as packaged software in both enterprise and community editions.
  12. Couchbase Server Core Principles
      • Easy Scalability: grow cluster without application changes, without downtime, with a single click
      • Consistent High Performance: consistent sub-millisecond read and write response times with consistent high throughput
      • Always On 24x365: no downtime for software upgrades, hardware maintenance, etc.
      • Flexible Data Model: JSON document model with no fixed schema.
  13. Additional Couchbase Server Features
      • Built-in clustering - All nodes equal
      • Data replication with auto-failover
      • Zero-downtime maintenance
      • Built-in managed cache
      • Append-only storage layer
      • Online compaction
      • Monitoring and admin API & UI
      • SDK for a variety of languages
  14. Couchbase Server 2.0 Architecture
      (Diagram)
      • Data Manager
        - Query API / Query Engine: port 8092
        - Moxi: port 11211 (Memcapable 1.0)
        - Memcached with Couchbase EP Engine: port 11210 (Memcapable 2.0)
        - storage interface
        - New Persistence Layer
      • Cluster Manager (Erlang/OTP)
        - http: REST management API/Web UI, HTTP port 8091
        - Erlang port mapper: port 4369
        - Distributed Erlang: ports 21100 - 21199
        - on each node: Heartbeat, Process monitor, Configuration manager
        - one per cluster: Global singleton supervisor, Rebalance orchestrator, Node health monitor, vBucket state and replication manager
  15. Couchbase Server 2.0 Architecture
      (Same diagram, annotated)
      • RAM Cache, Indexing & Persistence Management (C & V8): Query Engine, Object-level Cache, Disk Persistence
      • Server/Cluster Management & Communication (Erlang)
      • "The Unreasonable Effectiveness of C" by Damien Katz
  16. Basic Operation
      • Docs distributed evenly across servers
      • Each server stores both active and replica docs
        - Only one server active at a time
      • Client library provides app with simple interface to database
      • Cluster map provides map to which server doc is on
        - App never needs to know
      • App reads, writes, updates docs
      • Multiple app servers can access same document at same time
      (Diagram: COUCHBASE SERVER CLUSTER with user-configured replica count = 1. App Servers 1 and 2 each run the Couchbase client library with a cluster map and send READ/WRITE/UPDATE requests to Servers 1-3, each of which holds ACTIVE and REPLICA docs.)
  17. How to access the data?
  18. Couchbase.get("my-key");
  19. Look at a document
      Key -> JSON object ("document"):
      {
          "string" : "string",
          "string" : value,
          "string" : { "string" : "string", "string" : value },
          "string" : [ array ]
      }
      • How to find a document based on its attributes?
        - get employee by email
        - get products by type
        - ...
      • You need to look "into" the document/value
  20. Create an index!
      How to?
  21. Create the index
      Document:
      {"name": "Aventinus","abv": 8.2,"ibu": 0,"srm": 0,"upc": 0,"type": "beer","brewery_id": "110f1f2012","updated": "2010-07-22 20:00:20","description": "Dark-ruby,... Weizenbock","category": "German Ale"}
      Metadata:
      {"id": "110f37fa30","rev": "1-000000000","expiration": 0,"flags": 0,"type": "json"}
      Index:
      Key          Value
      Aventinus    8.2
      Avenue Ale   4.1
      ...          ...
  22. Concrete Example
      • This map function:
        - receives the document and its metadata
        - as a developer, you just have to emit the key and value (K,V)
  23. Map Function
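A map function of the kind these slides describe might look like the sketch below. The `function (doc, meta)` shape and the `emit` call follow the slides; the local `emit` shim and the `rows` array are assumptions added here so the function can be run outside the server, and the choice of `doc.name` as key and `doc.abv` as value mirrors the index shown on slide 21.

```javascript
// Local stand-in for the server: collects whatever the map function emits.
// (Hypothetical helper; on the server, emit is provided for you.)
const rows = [];
function emit(key, value) {
  rows.push({ key, value });
}

// View map function: receives the document and its metadata,
// and emits the key/value pair to store in the index.
function map(doc, meta) {
  if (doc.type === "beer" && doc.name) {
    emit(doc.name, doc.abv);
  }
}

// Sample document and metadata taken from the slide (fields abbreviated).
const doc = { name: "Aventinus", abv: 8.2, type: "beer", brewery_id: "110f1f2012" };
const meta = { id: "110f37fa30", type: "json" };

map(doc, meta);
console.log(rows);
```

The `doc.type` guard is a common precaution when a bucket mixes several document types, so that only beers end up in this index.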
  24. Range query
      doc.email                meta.id
      abba@couchbase.com       u::1
      beta@couchbase.com       u::7
      jasdeep@couchbase.com    u::2
      math@couchbase.com       u::5
      matt@couchbase.com       u::6
      yeti@couchbase.com       u::4
      zorro@couchbase.com      u::3
      • ?startkey="b1" & endkey="zz"
      • ?startkey="bz" & endkey="zn"
      Pulls the index keys within the UTF-8 range specified by startkey and endkey.
  25. Single-key query (same index as slide 24)
      • ?key="math@couchbase.com"
      Matches a single index key.
  26. Multi-key query (same index as slide 24)
      • ?keys=["math@couchbase.com","yeti@couchbase.com"]
      Queries multiple keys in the set (array notation).
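To make the three query styles concrete, here is a small sketch that applies them to the index rows from the slides. It assumes plain JavaScript string comparison as a stand-in for the server's collation, and `range`, `byKey` and `byKeys` are hypothetical helper names, not Couchbase APIs.

```javascript
// Index rows from the slides: key = doc.email, id = meta.id, sorted by key.
const index = [
  ["abba@couchbase.com", "u::1"],
  ["beta@couchbase.com", "u::7"],
  ["jasdeep@couchbase.com", "u::2"],
  ["math@couchbase.com", "u::5"],
  ["matt@couchbase.com", "u::6"],
  ["yeti@couchbase.com", "u::4"],
  ["zorro@couchbase.com", "u::3"],
].map(([key, id]) => ({ key, id }));

// ?startkey=...&endkey=... : every row whose key falls inside the range.
// Assumption: simple string comparison approximates the server's ordering.
function range(startkey, endkey) {
  return index.filter((r) => r.key >= startkey && r.key <= endkey).map((r) => r.id);
}

// ?key=... : exact match on a single index key.
function byKey(key) {
  return index.filter((r) => r.key === key).map((r) => r.id);
}

// ?keys=[...] : exact match on any key in the array.
function byKeys(keys) {
  return index.filter((r) => keys.includes(r.key)).map((r) => r.id);
}

console.log(range("b1", "zz"));
console.log(range("bz", "zn"));
console.log(byKey("math@couchbase.com"));
console.log(byKeys(["math@couchbase.com", "yeti@couchbase.com"]));
```

Note how the second range drops both `beta@...` (which sorts before "bz") and `zorro@...` (which sorts after "zn").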
  27. How does it work?
  28. Indexing and Querying
      • Indexing work is distributed amongst nodes
        - Large data set possible
        - Parallelize the effort
      • Each node has index for data stored on it
      • Queries combine the results from required nodes
      (Diagram: COUCHBASE SERVER CLUSTER with user-configured replica count = 1. A query from the app servers' client libraries reaches Servers 1-3, each holding ACTIVE and REPLICA docs.)
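The "each node indexes its own data, queries combine the results" idea can be sketched as a scatter/gather step. This is an illustrative model only: the `nodes` array, its sample rows and the `query` function are hypothetical, and a real cluster does this over the network rather than in one process.

```javascript
// Each inner array is the local index of one node: it only covers
// the documents stored on that node (hypothetical sample data).
const nodes = [
  [{ key: "abba", id: "u::1" }, { key: "math", id: "u::5" }],
  [{ key: "beta", id: "u::7" }, { key: "zorro", id: "u::3" }],
  [{ key: "jasdeep", id: "u::2" }],
];

// Scatter: run the same range query against every node's local index.
// Gather: merge the partial results back into a single key-ordered list.
function query(startkey, endkey) {
  return nodes
    .map((local) => local.filter((r) => r.key >= startkey && r.key <= endkey))
    .flat()
    .sort((a, b) => (a.key < b.key ? -1 : a.key > b.key ? 1 : 0));
}

const merged = query("b", "n").map((r) => r.key);
console.log(merged);
```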
  29. Couchbase Server 2.0: Views
      • Views can cover a few different use cases
        - Primary Index
        - Simple secondary indexes (the most common)
        - Complex secondary, tertiary and composite indexes
        - Aggregation functions (reduction)
          • Example: count the number of "North American Ales"
        - Organizing related data
      • Built using Map/Reduce
        - Map function creates a matrix from document fields
        - Reduce function summarizes (reduces) information
  30. Distributed Index Build Phase
      • Optimized for lookups, in-order access and aggregations
      • All view reads from disk (different performance profile)
      • View builds against every document on every node
        - This is why you should group them in a design document
      • Automatically kept up to date
        - "Incremental" Map Reduce
  31. Dynamic Range Queries with Optional Aggregation
      • Efficiently fetch a row or group of related rows.
      • Queries use cached values from B-tree inner nodes when possible
      • Take advantage of in-order tree traversal with group_level queries
      (Diagram: Servers 1-3, each with active docs and replica docs.)
      Query: ?startkey="J"&endkey="K"
      Result: {"rows":[{"key":"Juneau","value":null}]}
  32. Append Only Index
      • Disk activity is slow
      • Updating disk blocks is very slow
      • Appending new data to the end of the current file is fast
      • Overhead of reverse reading is small
      • Because existing blocks are not re-used, can lead to fragmentation
        - Couchbase will compact the index automatically
      (Diagram labels: Doc, View Processor, Disk; Original; Changed Documents; Appended)
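The trade-off on this slide (fast appends, reads from the end, periodic compaction) can be illustrated with a toy log. This is a deliberately simplified sketch: Couchbase appends B-tree nodes to a file, whereas the hypothetical `AppendOnlyIndex` class below keeps a flat in-memory array of records.

```javascript
// Toy append-only store: updates never overwrite, they append a newer record.
class AppendOnlyIndex {
  constructor() {
    this.log = [];
  }

  // Writing is always an append to the end of the log.
  put(key, value) {
    this.log.push({ key, value });
  }

  // Read backwards from the end so the newest record for a key wins.
  get(key) {
    for (let i = this.log.length - 1; i >= 0; i--) {
      if (this.log[i].key === key) return this.log[i].value;
    }
    return undefined;
  }

  // Compaction rewrites the log, keeping only the latest record per key.
  compact() {
    const latest = new Map();
    for (const rec of this.log) latest.set(rec.key, rec.value);
    this.log = [...latest].map(([key, value]) => ({ key, value }));
  }
}

const idx = new AppendOnlyIndex();
idx.put("a", 1);
idx.put("b", 2);
idx.put("a", 3); // stale record for "a" stays in the log until compaction
const sizeBefore = idx.log.length;
idx.compact();
const sizeAfter = idx.log.length;
console.log(sizeBefore, sizeAfter, idx.get("a"));
```

The stale record left behind by the second `put("a", ...)` is the fragmentation the slide mentions; `compact()` reclaims it.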
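  33. Adding a new Document
      (B-tree diagram; each inner node stores a key range and a reduction count)
      • Leaf keys: A B C D F G H I K L N O Q R
      • Inner nodes: A-C (3), D-F (2), G-H (2), I-L (3), N-R (4)
      • Upper level: A-H (7), I-R (7)
      • Root: A-R (14)
      • Adding the new key M appends new nodes with new reductions: M-R (5), I-R (8) and a new root A-R (15)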
  34. What about Reduce?
      • Out of the box functions:
        - _count()
        - _sum()
        - _stats()
      • Create your own if needed

      function(key, values, rereduce) {
        if (rereduce) {
          var result = 0;
          for (var i = 0; i < values.length; i++) {
            result += values[i];
          }
          return result;
        } else {
          return values.length;
        }
      }
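The `rereduce` flag is easier to follow with a worked run. The sketch below gives the slide's custom function a name (`countReduce`, an assumption made here so it can be called locally) and feeds it a hypothetical split of six mapped values across two partitions: the first pass counts values, the second pass sums the partial counts.

```javascript
// The custom reduce from the slide, named so it can be invoked directly.
function countReduce(key, values, rereduce) {
  if (rereduce) {
    // values are partial counts produced by earlier reduce calls: sum them.
    let result = 0;
    for (let i = 0; i < values.length; i++) {
      result += values[i];
    }
    return result;
  }
  // values are raw emitted values from map: count them.
  return values.length;
}

// Hypothetical split of six mapped values across two index partitions.
const partA = [1, 1, 1, 1];
const partB = [1, 1];

// First pass: reduce each partition on its own (rereduce = false).
const partials = [countReduce(null, partA, false), countReduce(null, partB, false)];

// Second pass: combine the partial results (rereduce = true).
const total = countReduce(null, partials, true);
console.log(partials, total);
```

Summing on the second pass matters: counting `partials.length` instead would report 2 rather than 6.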
  35. Reduce Function
      • Key and arrays of values as parameters
      • Written in JavaScript
      • Called after the map function
      • Used to reduce the result of a map of single values
      • Used with grouping
      • Could be ignored when querying
        - reuse the index
  36. Reduce in Action
      • Map() result
        Key                       Value
        Belgian-Style Dubbel      1
        Belgian-Style Dubbel      1
        Belgian-Style Dubbel      1
        Belgian-Style Pale Ale    1
        Belgian-Style White       1
        Belgian-Style White       1
        ...                       ...
      • Reduce(): _count()
      • Result
        Key                       Value
        Belgian-Style Dubbel      3
        Belgian-Style Pale Ale    1
        Belgian-Style White       2
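The table above is what a grouped `_count()` produces. As a rough local equivalent, the sketch below groups the mapped rows by key and counts them; `groupAndCount` is a hypothetical helper, not the server's implementation of `_count()`.

```javascript
// Rows produced by map(), as listed on the slide: [key, value].
const mapped = [
  ["Belgian-Style Dubbel", 1],
  ["Belgian-Style Dubbel", 1],
  ["Belgian-Style Dubbel", 1],
  ["Belgian-Style Pale Ale", 1],
  ["Belgian-Style White", 1],
  ["Belgian-Style White", 1],
];

// Group rows by key and count how many rows each key has.
function groupAndCount(rows) {
  const result = new Map();
  for (const [key] of rows) {
    result.set(key, (result.get(key) || 0) + 1);
  }
  return [...result];
}

const grouped = groupAndCount(mapped);
console.log(grouped);
```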
  37. How to use it?
      • Use the client SDK to call the view:

      View view = client.getView("beer", "by_name");
      Query query = new Query();
      query.setIncludeDocs(true)
           .setLimit(20)
           .setRangeStart(ComplexKey.of(startKey))
           .setRangeEnd(ComplexKey.of(startKey + "\uefff"));
      ViewResponse result = client.query(view, query);
      for (ViewRow row : result) {
          ....
      }
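The range in this call is a prefix search: the start is the prefix itself and the end is the prefix followed by a very high code point, so every key beginning with the prefix falls inside the range. The sketch below shows that idea in JavaScript under the assumption that keys compare as plain strings. `prefixRange` and `queryByPrefix` are hypothetical helpers, and "Belgian White" is a made-up sample name next to the two beers from the slides.

```javascript
// Sorted index keys (the first two names come from the slides).
const names = ["Aventinus", "Avenue Ale", "Belgian White"].sort();

// Build the [startkey, endkey] pair for a prefix search:
// any string starting with the prefix sorts at or below prefix + "\uefff".
function prefixRange(prefix) {
  return { startkey: prefix, endkey: prefix + "\uefff" };
}

// Apply the range to the sorted keys and cap the number of rows, like setLimit.
function queryByPrefix(sortedKeys, prefix, limit) {
  const { startkey, endkey } = prefixRange(prefix);
  return sortedKeys.filter((k) => k >= startkey && k <= endkey).slice(0, limit);
}

const aven = queryByPrefix(names, "Aven", 20);
const avenu = queryByPrefix(names, "Avenu", 20);
console.log(aven, avenu);
```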
  38. Demonstration
  39. Hadoop & Couchbase
      Hadoop:
      • Deals with "Big Data"
      • "More" is better than "Faster"
      • Batch oriented
      • Usually used to "extract/transform" data
      • Fully distributed
        - Map, Shuffle, Reduce
      ≠
      Couchbase:
      • Distributed
      • Executed where the document is
      • Deals with "indexing" data
      • As fast as possible
      • Used to query the data in the database
  40. Map Reduce in Couchbase
      • Like in many other NoSQL databases: used for queries!
      • Indexes are distributed on each node of the cluster
      • Indexes are updated incrementally
      • Write your Map Reduce in JavaScript
  41. Thank you!
      tug@couchbase.com
      @tgrall
      Get Couchbase Server at http://www.couchbase.com/download