Navigating the Transition from Relational to NoSQL Technology

Transcript

  • 1. SFDAMA presents: Navigating the Transition from Relational to NoSQL Technology. Dipti Borkar, Director, Product Management
  • 2. WHY TRANSITION TO NOSQL?
  • 3. Changes in interactive software – NoSQL driver
  • 4. Survey: Two big drivers for NoSQL adoption
    What is the biggest data management problem driving your use of NoSQL in the coming year?
      Lack of flexibility/rigid schemas   49%
      Inability to scale out data         35%
      High latency/low performance        29%
      Costs                               16%
      All of these                        12%
      Other                               11%
    Source: Couchbase NoSQL Survey, December 2011, n=1351
  • 5. NoSQL catalog
    Categories: Key-Value, Data Structure, Document, Column, Graph
    Cache (memory only): memcached, redis
    Database (memory/disk): membase, couchbase, cassandra, Neo4j, couchDB, mongoDB
  • 6. Are you being impacted by these?
    Schema rigidity problems
      • Do you store serialized objects in the database?
      • Do you have lots of sparse tables with very few columns being used by most rows?
      • Do you find that your application developers require schema changes frequently due to constantly changing data?
      • Are you using your database as a key-value store?
    Scalability problems
      • Do you periodically need to upgrade systems to more powerful servers and scale up?
      • Are you reaching the read/write throughput limit of a single database server?
      • Is your server's read/write latency not meeting your SLA?
      • Is your user base growing at a frightening pace?
  • 7. DISTRIBUTED DOCUMENT DATABASES
  • 8. Document Databases
      • Each record in the database is a self-describing document
      • Each document has an independent structure
      • Documents can be complex
      • All databases require a unique key
      • Documents are stored using JSON or XML or their derivatives
      • Content can be indexed and queried
      • Offer auto-sharding for scaling and replication for high availability
    Example document (values cut off on the slide are left as "…"):
      {
        "UUID": "21f7f8de-8051-5b89-86…",
        "Time": "2011-04-01T13:01:02.42…",
        "Server": "A2223E",
        "Calling Server": "A2213W",
        "Type": "E100",
        "Initiating User": "dsallings@spy.net",
        "Details": {
          "IP": "10.1.1.22",
          "API": "InsertDVDQueueItem",
          "Trace": "cleansed",
          "Tags": [ "SERVER", "US-West", "API" ]
        }
      }
  • 9. COMPARING DATA MODELS
  • 10. http://www.geneontology.org/images/diag-godb-er.jpg
  • 11. Relational vs Document data model
    Relational data model (figure: a grid of cells R1C1 through R4C4): highly structured table organization with rigidly defined data formats and record structure.
    Document data model (figure: a stack of JSON documents like the example on slide 8): collection of complex documents with arbitrary, nested data formats and varying "record" format.
  • 12. Example: Error Logging Use case
    Table 1: Error Log
      KEY  ERR  TIME  DC
      1    ERR  TIME  FK(DC2)
      2    ERR  TIME  FK(DC2)
      3    ERR  TIME  FK(DC2)
      4    ERR  TIME  FK(DC3)
    Table 2: Data Centers
      KEY  LOC  NUM
      1    DEN  303-223-2332
      2    NYC  212-223-2332
      3    SFO  415-223-2332
  • 13. Document design with flexible schema
    Existing documents (the slide shows a stack of overlapping documents of this shape):
      { "ID": 4, "ERR": "Out of Memory", "TIME": "2004-09-16T23:59:58.75", "DC": "NYC", "NUM": "212-223-2332" }
    SCHEMA CHANGE (new fields COMPONENT and SEV):
      { "ID": 5, "ERR": "Out of Memory", "TIME": "2004-09-16T23:59:58.75", "COMPONENT": "DMS", "SEV": "LEVEL1", "DC": "NYC", "NUM": "212-223-2332" }
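The flexible-schema idea on this slide can be sketched in a few lines of Python. This is a minimal illustration, not any product's API: the `bucket` dict, `store`, and `severity` are hypothetical names, and a plain dictionary stands in for the database.

```python
# Hypothetical in-memory "bucket": document ID -> JSON-like dict.
bucket = {}

def store(doc):
    """Store a document under its ID; no schema is declared or enforced."""
    bucket[doc["ID"]] = doc

# Document with the original shape from the slide.
store({"ID": 4, "ERR": "Out of Memory", "TIME": "2004-09-16T23:59:58.75",
       "DC": "NYC", "NUM": "212-223-2332"})

# Later document carrying the two new fields; older documents are untouched.
store({"ID": 5, "ERR": "Out of Memory", "TIME": "2004-09-16T23:59:58.75",
       "COMPONENT": "DMS", "SEV": "LEVEL1", "DC": "NYC", "NUM": "212-223-2332"})

def severity(doc_id):
    """Readers tolerate missing fields instead of relying on fixed columns."""
    return bucket[doc_id].get("SEV", "UNKNOWN")
```

The cost of skipping a schema migration is that readers have to handle both shapes, which is what the `.get(..., default)` call does here.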
  • 14. Document modeling
    Q:
      • Are these separate objects in the model layer?
      • Are these objects accessed together?
      • Do you need updates to these objects to be atomic?
      • Are multiple people editing these objects concurrently?
    When considering how to model data for a given application:
      • Think of a logical container for the data
      • Think of how data groups together
  • 15. Document Design Options
      • One document that contains all related data
          – Data is de-normalized
          – Better performance and scale
          – Eliminate client-side joins
      • Separate documents for different object types with cross references
          – Data duplication is reduced
          – Objects may not be co-located
          – Transactions supported only on a document boundary
          – Most document databases do not support joins
  • 16. Document ID / Key selection
      • Similar to primary keys in relational databases
      • Documents are sharded based on the document ID
      • ID-based document lookup is extremely fast
      • Usually an ID can only appear once in a bucket
    Q:
      • Do you have a unique way of referencing objects?
      • Are related objects stored in separate documents?
    Options
      • UUIDs, date-based IDs, numeric IDs
      • Hand-crafted (human readable)
      • Matching prefixes (for multiple related objects)
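The key options listed on this slide might look like the following in Python. The helper names are hypothetical. The key formats imitate the `jchris_Hello_World` and `comment1_jchris_Hello_World` IDs used in the blog example later in the deck, where related documents share a key component so one key can be derived from the other.

```python
import uuid

def post_key(author, title):
    """Hand-crafted, human-readable key such as 'jchris_Hello_World'."""
    return f"{author}_{title.replace(' ', '_')}"

def comment_key(post_id, n):
    """Key for a related document, derived from the parent document's key."""
    return f"comment{n}_{post_id}"

def random_key():
    """Opaque alternative: a random UUID rendered as a string."""
    return str(uuid.uuid4())
```

Derived keys let the application compute a related document's ID without an index lookup. The trade-off is that renaming the parent changes every derived key.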
  • 17. Example: Entities for a Blog
      • User profile: the main pointer into the user data
          – Blog entries
          – Badge settings, like a Twitter badge
      • Blog posts: contains the blogs themselves
      • Blog comments
          – Comments from other users
  • 18. Blog Document – Option 1 – Single document
      {
        "_id": "jchris_Hello_World",
        "author": "jchris",
        "type": "post",
        "title": "Hello World",
        "format": "markdown",
        "body": "Hello from [Couchbase](http://couchbase.com).",
        "html": "<p>Hello from <a href=\"http: …",
        "comments": [
          { "format": "markdown", "body": "Awesome post!" },
          { "format": "markdown", "body": "Like it." }
        ]
      }
  • 19. Blog Document – Option 2 – Split into multiple docs
    BLOG DOC
      {
        "_id": "jchris_Hello_World",
        "author": "jchris",
        "type": "post",
        "title": "Hello World",
        "format": "markdown",
        "body": "Hello from [Couchbase](http://couchbase.com).",
        "html": "<p>Hello from <a href=\"http: …",
        "comments": [ "comment1_jchris_Hello_World" ]
      }
    COMMENT
      {
        "_id": "comment1_jchris_Hello_World",
        "format": "markdown",
        "body": "Awesome post!"
      }
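When the post and its comments live in separate documents, the application reassembles them itself. The sketch below shows that client-side join. It assumes a hypothetical in-memory `store` dict and `get` helper in place of a real client library, and it reuses the document IDs from the slide.

```python
# Hypothetical in-memory stand-in for a document store keyed by _id.
store = {
    "jchris_Hello_World": {
        "_id": "jchris_Hello_World",
        "author": "jchris",
        "type": "post",
        "title": "Hello World",
        "comments": ["comment1_jchris_Hello_World"],
    },
    "comment1_jchris_Hello_World": {
        "_id": "comment1_jchris_Hello_World",
        "format": "markdown",
        "body": "Awesome post!",
    },
}

def get(doc_id):
    """ID-based lookup, the fast path for a document database."""
    return store[doc_id]

def post_with_comments(post_id):
    """Client-side join: one lookup for the post, one per referenced comment."""
    post = dict(get(post_id))  # shallow copy so the stored document is unchanged
    post["comments"] = [get(cid) for cid in post["comments"]]
    return post
```

A page that shows only the post title can skip the comment lookups entirely.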
  • 20. Threaded Comments
      • You can imagine how to take this to a threaded list
    (Diagram labels: Blog, List, First comment, List, Reply to comment, More comments)
    Advantages
      • Only fetch the data when you need it
          – For example, rendering part of a web page
      • Spread the data and load across the entire cluster
  • 21. COMPARING SCALING MODEL
  • 22. Modern interactive software architecture
      • Application scales out: just add more commodity web servers
      • Database scales up: get a bigger, more complex server
    Note – Relational database technology is great for what it is great for, but it is not great for this.
  • 23. NoSQL database matches application logic tier architecture
    Data layer now scales with linear cost and constant performance.
      • Application scales out: just add more commodity web servers
      • Database (NoSQL database servers) scales out: just add more commodity data servers
    Scaling out flattens the cost and performance curves.
  • 24. Other considerations
      • Accessing data
          – No standards exist yet
          – Typically via SDKs or over HTTP
          – Check if the programming language of your choice is supported
      • Consistency
          – Consistent only at the document level
          – Most document stores currently don't support multi-document transactions
          – Analyze your application needs
      • Availability
          – Each node stores active and replica data (Couchbase)
          – Each node is either a master or slave (MongoDB)
  • 25. Other considerations
      • Operations
          – Monitoring the system
          – Backup and restore the system
          – Upgrades and maintenance
          – Support
      • Scaling
          – Ease of adding and reducing capacity
          – Application availability on topology changes
      • Indexing and Querying
          – Secondary indexes (Map functions)
          – Aggregates and grouping (Reduce functions)
          – Basic querying
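The map and reduce functions mentioned under Indexing and Querying can be illustrated with a small Python sketch. This shows the general technique only, not the view API of any particular database. The function names and the sample documents (including the "Disk Full" error) are made up for the example.

```python
from collections import defaultdict

# Made-up sample documents, loosely modelled on the error-log example.
docs = [
    {"ID": 1, "ERR": "Out of Memory", "DC": "NYC"},
    {"ID": 2, "ERR": "Out of Memory", "DC": "DEN"},
    {"ID": 3, "ERR": "Disk Full", "DC": "NYC"},
]

def map_by_dc(doc):
    """Map function: emit (key, value) pairs; here one pair per document."""
    yield doc["DC"], 1

def build_index(documents, map_fn):
    """Secondary index: emitted key -> list of (document ID, value)."""
    index = defaultdict(list)
    for doc in documents:
        for key, value in map_fn(doc):
            index[key].append((doc["ID"], value))
    return index

def reduce_count(index):
    """Reduce function: aggregate the values emitted for each key."""
    return {key: sum(v for _, v in rows) for key, rows in index.items()}
```

Looking up `index["NYC"]` answers "which documents came from NYC", and the reduce step answers "how many per data center".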
  • 26. Is NoSQL the right choice for you?
    Does your application need rich database functionality?
      • Multi-document transactions
      • Complex security needs – user roles, document level security, authentication, authorization integration
      • Complex joins across buckets / collections
      • BI integration
      • Extreme compression needs
    NoSQL may not be the right choice for your application
  • 27. WHERE IS NOSQL A GOOD FIT?
  • 28. Performance driven use cases
      • Low latency
      • High throughput matters
      • Large number of users
      • Unknown demand with sudden growth of users/data
      • Predominantly direct document access
      • Workloads with very high mutation rate per document (temporal locality)
      • Working set with heavy writes
  • 29. Data driven use cases
      • Support for unlimited data growth
      • Data with non-homogeneous structure
      • Need to quickly and often change data structure
      • 3rd party or user defined structure
      • Variable length documents
      • Sparse data records
      • Hierarchical data
  • 30. BRIEF OVERVIEW COUCHBASE SERVER
  • 31. Couchbase Server
    Simple. Fast. Elastic. NoSQL.
    Couchbase automatically distributes data across commodity servers. Built-in caching enables apps to read and write data with sub-millisecond latency. And with no schema to manage, Couchbase effortlessly accommodates changing data management requirements.
  • 32. Representative user list
  • 33. Couchbase architecture
    (Diagram labels: Data Manager; Cluster Manager; Database Operations; Cluster Management; REST management API/Web UI; vBucket state and replication manager; built-in memcached; Global singleton supervisor; Rebalance orchestrator; Configuration manager; Node health monitor; Process monitor; Membase EP Engine; Heartbeat; storage interface; CouchDB; http; on each node; one per cluster; Erlang/OTP)
  • 34. Couchbase deployment
    (Diagram labels: Web Application; Couchbase Client Library; Data Flow; Cluster Management)
  • 35. Clustering With Couchbase
    (Diagram with numbered steps 1 to 4; the step order is not recoverable from the transcript. Captions: "SET request arrives at KEY's master server"; "SET acknowledgement returned to application". Components: Listener-Sender, RAM, Couchbase storage engine, Disk. Servers: Replica Server 1 for KEY, Master server for KEY, Replica Server 2 for KEY)
  • 36. Basic Operation
      • Docs distributed evenly across servers in the cluster
      • Each server stores both active & replica docs
          – Only one server active at a time
      • Client library provides app with simple interface to database
      • Cluster map provides map to which server doc is on
          – App never needs to know
      • App reads, writes, updates docs
      • Multiple App Servers can access same document at same time
    (Diagram: APP SERVER 1 and APP SERVER 2, each with a COUCHBASE CLIENT LIBRARY and CLUSTER MAP, send Read/Write/Update requests to the cluster)
    COUCHBASE SERVER CLUSTER (User Configured Replica Count = 1)
                     SERVER 1        SERVER 2        SERVER 3
      Active Docs    Doc 5, 2, 9     Doc 4, 7, 8     Doc 1, 3, 6
      Replica Docs   Doc 4, 1, 8     Doc 6, 3, 2     Doc 7, 9, 5
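A cluster map of the kind described on this slide can be sketched as a lookup from a hashed document ID to a server. Everything below is an assumption made for illustration: the partition count, the CRC32 hash, the round-robin placement, and the function names are not taken from the deck.

```python
import zlib

NUM_PARTITIONS = 8  # hypothetical; kept small for readability
SERVERS = ["SERVER 1", "SERVER 2", "SERVER 3"]

# Cluster map: partition number -> (active server, replica server).
# Replica count = 1, and the replica always lives on a different server.
cluster_map = {
    p: (SERVERS[p % len(SERVERS)], SERVERS[(p + 1) % len(SERVERS)])
    for p in range(NUM_PARTITIONS)
}

def partition_for(key):
    """Hash the document ID to a partition (CRC32 is an assumption here)."""
    return zlib.crc32(key.encode("utf-8")) % NUM_PARTITIONS

def active_server(key):
    """The client library resolves the server locally; the app never needs to know."""
    return cluster_map[partition_for(key)][0]
```

Because the map is keyed by partition rather than by document, a topology change only requires updating a small table on each client.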
  • 37. Add Nodes
      • Two servers added to cluster
          – One-click operation
      • Docs automatically rebalanced across cluster
          – Even distribution of docs
          – Minimum doc movement
      • Cluster map updated
      • App database calls now distributed over larger # of servers
    (Diagram: APP SERVER 1 and APP SERVER 2 send Read/Write/Update requests to SERVER 1 through SERVER 5, each holding Active Docs and Replica Docs. COUCHBASE SERVER CLUSTER, User Configured Replica Count = 1)
  • 38. Fail Over Node
      • App servers happily accessing docs on Server 3
      • Server fails
      • App server requests to server 3 fail
      • Cluster detects server has failed
          – Promotes replicas of docs to active
          – Updates cluster map
      • App server requests for docs now go to appropriate server
      • Typically rebalance would follow
    (Diagram: APP SERVER 1 and APP SERVER 2 with SERVER 1 through SERVER 5, each holding Active Docs and Replica Docs. COUCHBASE SERVER CLUSTER, User Configured Replica Count = 1)
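The failover steps above (detect the failure, promote replicas, update the cluster map) can be sketched as a pure function over a cluster map. The map layout and the `fail_over` name are hypothetical. The three sample entries follow the three-server layout from slide 36.

```python
# Hypothetical cluster map: doc -> {"active": server, "replica": server}.
cluster_map = {
    "Doc 1": {"active": "SERVER 3", "replica": "SERVER 1"},
    "Doc 3": {"active": "SERVER 3", "replica": "SERVER 2"},
    "Doc 5": {"active": "SERVER 1", "replica": "SERVER 3"},
}

def fail_over(cmap, failed):
    """Return a new map with the failed server removed.

    Docs whose active copy was on the failed server are served by their
    replica; the lost copies are only recreated by a later rebalance.
    """
    new_map = {}
    for doc, entry in cmap.items():
        active, replica = entry["active"], entry["replica"]
        if active == failed:
            active, replica = replica, None  # promote replica to active
        elif replica == failed:
            replica = None                   # replica lost until rebalance
        new_map[doc] = {"active": active, "replica": replica}
    return new_map
```

After `fail_over(cluster_map, "SERVER 3")` no document has a replica, which is why the slide notes that a rebalance would typically follow.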
  • 39. Reading and Writing
    Reading Data
      • Application Server: "Give me document A"
      • Server: "Here is document A"
    Writing Data
      • Application Server: "Please store document A"
      • Server: "OK, I stored document A"
    (Diagram: in both cases the server holds document A in RAM and on DISK)
  • 40. Flow of data when writing
    (Diagram: three Application Servers writing data to a Server. Captions: "Applications writing to Couchbase"; "Couchbase transmitting replicas" (Replication queue, over the network); "Couchbase writing to disk" (Disk write queue))
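The two queues on this slide suggest a write path in which replication and disk persistence happen behind the write itself. The toy model below is a sketch under that assumption. The `Node` class and its methods are hypothetical, and acknowledging before the queues drain is an assumed ordering that the transcript does not spell out.

```python
from collections import deque

class Node:
    """Toy model of one server: RAM cache plus two asynchronous queues."""

    def __init__(self):
        self.ram = {}
        self.disk = {}
        self.replication_queue = deque()
        self.disk_write_queue = deque()

    def set(self, key, value):
        """Write to RAM, enqueue background work, and acknowledge (assumed order)."""
        self.ram[key] = value
        self.replication_queue.append((key, value))
        self.disk_write_queue.append((key, value))
        return "OK"

    def flush_to_disk(self):
        """Drain the disk write queue (would run in the background)."""
        while self.disk_write_queue:
            key, value = self.disk_write_queue.popleft()
            self.disk[key] = value

    def replicate_to(self, replica):
        """Drain the replication queue into a replica node's RAM."""
        while self.replication_queue:
            key, value = self.replication_queue.popleft()
            replica.ram[key] = value
```

In this model a write is readable from RAM immediately, while its durability and redundancy depend on how quickly the two queues drain.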
  • 41. THANK YOU. DIPTI@COUCHBASE.COM