NoSQL Databases

Yousof Alsatom
Wirtschaftsinformatik Master Program
Humboldt-Universität zu Berlin
2012
  
Agenda

  •  Relational database model
       •  Advantages & Disadvantages

  •  NoSQL

  •  Basic concepts, techniques and patterns in comparison with RDBMS
       •  Consistency
       •  Partitioning
       •  Storage Layout
  
Agenda (continued)

  •  NoSQL data model
       •  Key – Value
            •  DynamoDB
       •  Bigtable – column family
            •  Google Bigtable
       •  Document Databases
            •  CouchDB
       •  GraphDB
            •  Neo4j

  •  Conclusion
  
Database and DBMS

  •  In essence, a database is a collection of data that exists over a long period of
     time, often many years.

  •  Commonly, the term database refers to a collection of data that is managed
     by a Database Management System (DBMS).

  •  A DBMS is a (powerful) tool for creating and managing large amounts of data
     efficiently and allowing it to persist over long periods of time, safely.
  
Relational Model

  •  A relational database is a collection of data items organized as a set of
     formally-described tables from which data can be accessed or reassembled in
     many different ways without having to reorganize the database tables
     [techtarget.com].

   Edgar Frank "Ted" Codd
   (August 23, 1923 – April 18, 2003)
   IBM
  
Relational Database

  •  A relational database is a collection of data items organized as a set of
     formally described tables from which data can be accessed easily [Wikipedia].
  
Example: Project Management System [Qian Sha, 2003]

  •  Possible queries

       •  Give me all employees who are working on project X

       •  Give me the percentage of progress for project Y
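The two queries above could be expressed roughly as follows. This is a minimal sketch using Python's built-in sqlite3 module; the tables employee, project and task, their columns, and the sample rows are hypothetical and are not taken from the original example, whose schema is not shown here.

```python
import sqlite3

# Hypothetical schema and data, invented for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE project  (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE task     (id INTEGER PRIMARY KEY, project_id INTEGER,
                           employee_id INTEGER, done INTEGER);
    INSERT INTO employee VALUES (1, 'Ada'), (2, 'Bob'), (3, 'Eve');
    INSERT INTO project  VALUES (1, 'X'), (2, 'Y');
    INSERT INTO task VALUES (1, 1, 1, 1), (2, 1, 2, 0),
                            (3, 2, 3, 1), (4, 2, 3, 1),
                            (5, 2, 2, 0), (6, 2, 2, 0);
""")

def employees_on(project_name):
    """All employees who have at least one task in the given project."""
    rows = conn.execute("""
        SELECT DISTINCT e.name
        FROM employee e
        JOIN task t    ON t.employee_id = e.id
        JOIN project p ON p.id = t.project_id
        WHERE p.name = ?
        ORDER BY e.name""", (project_name,))
    return [name for (name,) in rows]

def progress_percent(project_name):
    """Share of finished tasks in the given project, in percent."""
    (pct,) = conn.execute("""
        SELECT 100.0 * SUM(t.done) / COUNT(*)
        FROM task t
        JOIN project p ON p.id = t.project_id
        WHERE p.name = ?""", (project_name,)).fetchone()
    return pct
```

Both questions need joins across several tables, which is the operation that becomes difficult once the tables are spread over many servers.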
  
Relational Database, Advantages

  •  Reliability

  •  ACID

       •  Atomicity: all or nothing

       •  Consistency

       •  Isolation
            •  Concurrent execution of transactions results in a system state that could
               have been obtained if the transactions had been executed serially

       •  Durability
            •  Once a transaction has been committed, it will remain so,
               even in the event of power loss, crashes, or errors.
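Atomicity ("all or nothing") can be illustrated with a small sketch, assuming SQLite through Python's sqlite3 module. The account table and the transfer helper are hypothetical names invented for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table; the CHECK constraint forbids negative balances.
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, "
             "balance INTEGER CHECK (balance >= 0))")
conn.execute("INSERT INTO account VALUES ('alice', 100), ('bob', 0)")
conn.commit()

def transfer(src, dst, amount):
    """Move money between accounts; either both updates happen or neither."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute("UPDATE account SET balance = balance + ? WHERE name = ?",
                         (amount, dst))
            conn.execute("UPDATE account SET balance = balance - ? WHERE name = ?",
                         (amount, src))
        return True
    except sqlite3.IntegrityError:
        # The debit violated the CHECK constraint, so the credit was undone too.
        return False

def balances():
    return dict(conn.execute("SELECT name, balance FROM account"))
```

A transfer that would overdraw the source account fails as a whole: the credit that was already applied inside the transaction is rolled back.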
  
Relational Database, Limitation

  •  Scalability
       •  Users can scale a relational database by running it on a more powerful
          (and more expensive) computer.
       •  To scale beyond a certain point, though, it must be distributed across
          multiple servers.
       •  Relational databases don’t work easily in a distributed manner because
          joining their tables across a distributed system is difficult [Jeremy Zawodny].

  •  Complexity
       •  All data has to be converted into tables, which can be complex and slow (example: Wikipedia)

  •  SQL can work only with structured data [Prof. Stefan Edlich, Beuth University
     of Applied Sciences in Berlin]
  
Relational Database, Limitation

     Spandauer Str.1, Berlin
  
Problem!

  •  Diversity ?
  •  Connectivity ?
  •  Data size ?
  
  
NoSQL

  •  Not using the relational model (nor the SQL language)

  •  No schema, allowing fields to be added to any record without controls

  •  Open source

  •  Designed to work on large clusters

  •  Based on the needs of 21st century web properties
  
NoSQL, History

  •  Carlo Strozzi used the term NoSQL in 1998 to name his lightweight, open-source
     relational database that did not expose the standard SQL interface.

  •  Johan Oskarsson has organized a meetup for folks interested in distributed
     structured data storage and is calling it NoSQL. The event, being held June
     11th in San Francisco,
  
NoSQL

  •  Consistency
       •  Uses eventual consistency (a consistency model used in parallel programming)
       •  Weak consistency

  •  Partitioning
       •  Automatic partitioning (data is growing)

  •  Storage Layout
       •  Row-based storage layout
       •  Columnar storage layout
       •  …
  
NoSQL

  •  Data Model
       •  Key / Value
       •  Bigtable
       •  DocumentDB
       •  GraphDB
  
Key / Value
  
Hash Table

  •  Type: unsorted associative array
  •  Invented: 1953
  •  Time complexity in big O notation:

                 Average        Worst case
    Space        O(n)           O(n)
    Search       O(1 + n/k)     O(n)
    Insert       O(1)           O(n)
    Delete       O(1 + n/k)     O(n)

    (n: number of stored entries, k: number of buckets)

  Source: Wikipedia, http://en.wikipedia.org/wiki/Hash_tables
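Where the O(1 + n/k) term comes from can be seen in a hash table with separate chaining. The following is a minimal sketch; the class name ChainedHashTable and its methods are chosen for illustration. Each operation hashes the key in constant time and then scans one bucket, which holds n/k entries on average when n entries are spread over k buckets. Unlike the O(1) insert listed in the table, put here also scans the bucket so that an existing key is overwritten instead of duplicated.

```python
class ChainedHashTable:
    """Hash table with separate chaining (illustrative sketch, no resizing)."""

    def __init__(self, buckets=8):
        self._buckets = [[] for _ in range(buckets)]  # k buckets
        self._size = 0                                # n entries

    def _chain(self, key):
        # O(1): pick the bucket from the key's hash value.
        return self._buckets[hash(key) % len(self._buckets)]

    def put(self, key, value):
        chain = self._chain(key)
        for i, (existing, _) in enumerate(chain):     # scan about n/k entries
            if existing == key:
                chain[i] = (key, value)
                return
        chain.append((key, value))
        self._size += 1

    def get(self, key, default=None):
        for existing, value in self._chain(key):      # scan about n/k entries
            if existing == key:
                return value
        return default

    def delete(self, key):
        chain = self._chain(key)
        for i, (existing, _) in enumerate(chain):
            if existing == key:
                del chain[i]
                self._size -= 1
                return True
        return False

    def __len__(self):
        return self._size
```

The worst case O(n) arises when all keys land in the same bucket, so that a single chain holds every entry.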
  
Key – Value

  •  Amazon’s infrastructure is made up of tens of thousands of servers and network
     components located in many datacenters around the world.

  •  Availability and reliability are the most important factors for Amazon.

  •  Dynamo aims to achieve high availability at the cost of weaker consistency.

  [Figure: Service-oriented architecture of Amazon’s platform]

  Source: Dynamo: Amazon’s Highly Available Key-value Store. September 2007.
  
Key – Value, Dynamo History

  •  Giuseppe DeCandia et al. argue against using RDBMSs at Amazon.

  •  They admit that advances have been made to scale and partition RDBMSs
     but state that such setups remain difficult to configure and operate (2006).

  •  Dynamo was described publicly in 2007.
  
Dynamo, Consistent Hashing

  Data is partitioned and replicated using consistent hashing.

       •  Goal: scalability and availability

       •  The output range of a hash function is treated as a fixed circular space or “ring”.

       •  Nodes are ordered on the ring: each new node is assigned a random position (key).

       •  A data item is assigned by hashing its key and walking the ring clockwise to the
          first node with a larger position.

       •  Each node becomes responsible for the region of the ring between it and its
          predecessor node on the ring.

       •  The departure or arrival of a node affects only its immediate neighbors.

       •  “Virtual nodes”: each physical node can be responsible for more than one virtual node.

  Source: Dynamo: Amazon’s Highly Available Key-value Store. September 2007.
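The ring described on this slide can be sketched in a few lines. This is an illustrative sketch, not Dynamo's implementation: the class HashRing, the vnodes parameter, and the choice to derive virtual-node positions from an MD5 hash of "node#i" (instead of drawing random positions) are assumptions made to keep the example deterministic.

```python
import bisect
import hashlib

def ring_position(label):
    """Map a string to a position on a 128-bit ring."""
    return int.from_bytes(hashlib.md5(label.encode("utf-8")).digest(), "big")

class HashRing:
    def __init__(self, vnodes=64):
        self.vnodes = vnodes    # virtual nodes per physical node
        self._positions = []    # sorted ring positions
        self._owner = {}        # ring position -> physical node

    def add_node(self, node):
        for i in range(self.vnodes):
            pos = ring_position(f"{node}#{i}")
            self._owner[pos] = node
            bisect.insort(self._positions, pos)

    def remove_node(self, node):
        self._positions = [p for p in self._positions if self._owner[p] != node]
        self._owner = {p: n for p, n in self._owner.items() if n != node}

    def node_for(self, key):
        """Walk clockwise from the key's position to the next virtual node."""
        idx = bisect.bisect_right(self._positions, ring_position(key))
        return self._owner[self._positions[idx % len(self._positions)]]

ring = HashRing(vnodes=32)
for name in ("A", "B", "C"):
    ring.add_node(name)
```

When a node joins, only keys that fall into the regions it takes over change owner; all other keys stay where they were. The same holds in reverse when a node leaves.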
  
Dynamo, Vector Clock

  •  Data versioning: Dynamo uses vector clocks in order to capture causality
     between different versions of the same object.

  •  A vector clock is a list of (node, counter) pairs.

  •  Every version of every object is associated with one vector clock.

  •  If the counters on the first object’s clock are less than or equal to the
     counters of all of the nodes in the second clock, then the first is an
     ancestor of the second and can be forgotten.

  [Figure labels: Object, Node, Clock]

  Source: Dynamo: Amazon’s Highly Available Key-value Store. September 2007.
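The ancestor rule above can be written down directly. In this sketch a vector clock is represented as a Python dict from node name to counter; the function names descends, increment and reconcile are chosen for illustration, and the node names Sx, Sy, Sz follow the style of the example in the Dynamo paper.

```python
def descends(a, b):
    """True if clock `a` has seen everything recorded in clock `b`,
    i.e. `b` is an ancestor of (or equal to) `a`."""
    return all(a.get(node, 0) >= counter for node, counter in b.items())

def increment(clock, node):
    """Return a copy of `clock` with the counter of `node` advanced by one."""
    new = dict(clock)
    new[node] = new.get(node, 0) + 1
    return new

def reconcile(versions):
    """Drop every (value, clock) pair that is a strict ancestor of another.
    What remains are the versions that conflict and must be merged by the client."""
    return [
        (value, clock) for value, clock in versions
        if not any(descends(other, clock) and other != clock
                   for _, other in versions)
    ]

# Two replicas (Sy and Sz) update the same ancestor version independently.
d2 = increment(increment({}, "Sx"), "Sx")   # written twice via node Sx
d3 = increment(d2, "Sy")
d4 = increment(d2, "Sz")
```

Here d2 is an ancestor of both d3 and d4 and can be forgotten, while d3 and d4 are concurrent: neither descends from the other, so both are kept until they are reconciled.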
  
Dynamo, Overview

  Source: http://de.wikipedia.org/wiki/Amazon_Dynamo
  
Dynamo, Sloppy Quorum

  •  Handling failures: sloppy quorum

  •  A quorum is the minimum number of votes that a distributed transaction
     has to obtain in order to be allowed to perform an operation in a
     distributed system [Wikipedia].

  •  Sloppy quorum
       •  Read and write operations are performed on the first N healthy nodes
          from the preference list, which may not always be the first N nodes
          encountered while walking the consistent hashing ring.
       •  Example:
            •  A is down …
            •  D receives the replica, with metadata (a hint) naming A as the intended recipient
            •  When A comes back, D will attempt to deliver the replica to A

  Source: Dynamo: Amazon’s Highly Available Key-value Store. September 2007.
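The selection of "the first N healthy nodes" can be sketched as follows. The function name sloppy_quorum_targets and its arguments are hypothetical; the sketch only picks the target nodes and records the hint, it does not perform any reads or writes.

```python
def sloppy_quorum_targets(ring_walk, healthy, n):
    """Return [(node, hinted_for)] for the first n healthy nodes on the walk.

    ring_walk  -- nodes in the order met when walking the ring clockwise
                  from the key's position
    healthy    -- set of nodes that are currently reachable
    hinted_for -- the unreachable node whose replica is held temporarily,
                  or None if the node is one of the intended replicas
    """
    intended = ring_walk[:n]
    down = [node for node in intended if node not in healthy]
    targets = []
    for node in ring_walk:
        if node not in healthy:
            continue
        if node in intended:
            targets.append((node, None))
        else:
            # A stand-in further along the ring takes over for a down node.
            targets.append((node, down.pop(0)))
        if len(targets) == n:
            break
    return targets
```

With N = 3 and A unreachable, a walk A, B, C, D, E yields B and C as regular replicas and D as a stand-in holding a hint for A, which matches the example on the slide.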
  
  	
  
Dynamo, Gossip-based membership protocol and failure detection

  •  A gossip-based protocol propagates membership changes and maintains
     an eventually consistent view of membership.
  
Key – Value, Dynamo

  Problem                              Technique                                         Advantage

  Partitioning                         Consistent hashing                                Incremental scalability

  High availability for writes         Vector clocks with reconciliation during reads    Version size is decoupled from update rates.

  Handling temporary failures          Sloppy quorum and hinted handoff                  Provides high availability and durability
                                                                                         guarantee when some of the replicas are
                                                                                         not available.

  Recovering from permanent failures   Anti-entropy using Merkle trees                   Synchronizes divergent replicas in the
                                                                                         background.

  Membership and failure detection     Gossip-based membership protocol and              Preserves symmetry and avoids having a
                                       failure detection                                 centralized registry for storing membership
                                                                                         and node liveness information.

  Source: Dynamo: Amazon’s Highly Available Key-value Store. September 2007.
  
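Key – Value, Dynamo

  •  Query model

       •  get(key): returns the object (or a list of conflicting objects) together with a context

            •  Context: metadata such as the object version, stored along with the object;
               it is useful in case of conflict

       •  put(key, context, object)

       •  The key is hashed with the MD5 algorithm to a 128-bit identifier that
          determines the responsible storage nodes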
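The shape of this interface can be tried out with a single-node, in-memory stand-in. This is a heavily simplified sketch: the class ToyKeyValueStore is hypothetical, and the context is reduced to one integer version, whereas Dynamo carries vector clock information in it. Only the MD5 hashing of the key follows the slide directly.

```python
import hashlib

class ToyKeyValueStore:
    """In-memory stand-in for one Dynamo-style node (illustrative only)."""

    def __init__(self):
        # 128-bit key identifier -> list of (object, version) siblings
        self._versions = {}

    @staticmethod
    def key_id(key: bytes) -> int:
        # MD5 yields 16 bytes, i.e. a 128-bit identifier for the key.
        return int.from_bytes(hashlib.md5(key).digest(), "big")

    def get(self, key: bytes):
        """Return (objects, context); several objects mean a conflict."""
        siblings = self._versions.get(self.key_id(key), [])
        objects = [obj for obj, _ in siblings]
        context = max((version for _, version in siblings), default=0)
        return objects, context

    def put(self, key: bytes, context: int, obj: bytes) -> None:
        """Store obj as the successor of everything the writer has seen."""
        kid = self.key_id(key)
        # Versions newer than the writer's context were not seen by the
        # writer; they are kept as conflicting siblings.
        unseen = [(o, v) for o, v in self._versions.get(kid, []) if v > context]
        self._versions[kid] = unseen + [(obj, context + 1)]
```

Two writers that both start from the same context produce two siblings; a later put that passes the context returned by get replaces both with the merged object.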
  
Other Key / Value NoSQL tools

  Riak makes data highly available for use in read- and write-intensive web
  applications.
  
Bigtable
  
Bigtable

  •  Bigtable is described as “a distributed storage system for managing
     structured data that is designed to scale to a very large size: petabytes of
     data across thousands of commodity servers” [Google Labs]

  •  Bigtable is a
       •  distributed,
       •  persistent, multi-dimensional sorted map.
       •  The map is indexed by a row key, a column key, and a timestamp.
       •  Each value in the map is an uninterpreted array of bytes.
       •  (row:string, column:string, time:int64) → string
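The mapping (row, column, time) → string can be mimicked with a sorted list of keys. The sketch below is a toy, single-process model; the class ToyBigtable and its methods are hypothetical and are not Bigtable's API. Timestamps are stored negated so that, within one cell, the newest version sorts first.

```python
import bisect

class ToyBigtable:
    """Sorted map from (row, column, timestamp) to bytes (illustrative only)."""

    def __init__(self):
        self._keys = []    # sorted list of (row, column, -timestamp)
        self._cells = {}   # same key -> value bytes

    def put(self, row, column, timestamp, value: bytes):
        key = (row, column, -timestamp)
        if key not in self._cells:
            bisect.insort(self._keys, key)
        self._cells[key] = value

    def get(self, row, column, timestamp=None):
        """Newest value at or before `timestamp` (the latest one if None)."""
        bound = float("-inf") if timestamp is None else -timestamp
        i = bisect.bisect_left(self._keys, (row, column, bound))
        if i < len(self._keys) and self._keys[i][:2] == (row, column):
            return self._cells[self._keys[i]]
        return None

    def scan_rows(self, prefix):
        """Yield (row, column, timestamp, value) for all rows with the prefix."""
        i = bisect.bisect_left(self._keys, (prefix,))
        while i < len(self._keys) and self._keys[i][0].startswith(prefix):
            row, column, neg_ts = self._keys[i]
            yield row, column, -neg_ts, self._cells[self._keys[i]]
            i += 1

table = ToyBigtable()
table.put("com.cnn.www", "contents:", 3, b"<html>v3")
table.put("com.cnn.www", "contents:", 6, b"<html>v6")
table.put("com.cnn.www", "anchor:cnnsi.com", 9, b"CNN")
table.put("com.cnn.www", "anchor:my.look.ca", 8, b"CNN.com")
```

Because all keys are kept in one sorted order, reading the latest version of a cell and scanning a contiguous range of rows are both simple positional lookups.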
  
Google’s Bigtable

  •  It was used by over sixty projects at Google as of 2006, including:
       •  Web indexing
       •  Google Earth
       •  Google Analytics
       •  Orkut
       •  Google Docs
  
Google’s Bigtable, Data Model

  •  Example: storing CNN web pages
       •  The row name is the reversed URL
       •  The contents column family contains the page contents
       •  The anchor column family contains the text of any anchors that
          reference the page

  [Figure labels: Row, Column Family]

  Source: A Distributed Storage System for Structured Data. November 2006.
  http://labs.google.com/papers/bigtable-osdi06.pdf
  
Google’s Bigtable, Data Model

  •  CNN’s home page is referenced by both the Sports Illustrated and the MY-look
     home pages.
  •  The row contains columns named anchor:cnnsi.com and anchor:my.look.ca.
  •  t3: timestamp

  [Figure labels: Row, Column Family]

  Source: A Distributed Storage System for Structured Data. November 2006.
  http://labs.google.com/papers/bigtable-osdi06.pdf
  
Google’s Bigtable, Data Model

  •  Tablet: rows from the same domain
       com.google.docs
       com.google.mail
       com.google.play

  •  Tablet: a range of rows kept in lexicographic order
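Why reversed host names end up next to each other can be shown in a few lines. The helper row_key is a hypothetical name used for illustration; it reverses only the host name components, which is the part of the URL that matters for grouping by domain.

```python
def row_key(hostname):
    """Reverse the dot-separated components: docs.google.com -> com.google.docs"""
    return ".".join(reversed(hostname.split(".")))

hosts = ["docs.google.com", "www.cnn.com", "mail.google.com", "play.google.com"]
rows = sorted(row_key(h) for h in hosts)
```

After sorting, all google.com rows are contiguous, so they are likely to fall into the same tablet and can be read with one range scan.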
  
Google’s Bigtable, Data Model

  •  Notes
       •  Has no fixed number of rows or columns
       •  Every value also has an associated timestamp
       •  Each value is addressed by the triple (domain-name, column-name,
          timestamp)
  
Google’s Bigtable, Query Model

  •  Writing to table

  •  Reading from table
  
Google’s Bigtable, More

  •  Example with Eclipse: http://www.kobu.com/appeng/index-en.htm

  •  Bigtable as a web service: http://bigtable.appspot.com/

  •  Performance and benchmarking: Chang, Fay; Dean, Jeffrey; Ghemawat,
     Sanjay; Hsieh, Wilson C.; Wallach, Deborah A.; Burrows, Mike; Chandra,
     Tushar; Fikes, Andrew; Gruber, Robert E.: Bigtable: A Distributed
     Storage System for Structured Data. November 2006.
     http://labs.google.com/papers/bigtable-osdi06.pdf
  
Other Bigtable NoSQL tools

  Use HBase when you need random, realtime read/write access to your Big
  Data. This project's goal is the hosting of very large tables
  
Document Databases
  
Document Databases

 •  Designed for storing, retrieving, and managing document-oriented
    (semi-structured) information

 •  Documents encapsulate and encode data (or information) in some
    standard formats or encodings.

 •  Encodings in use include XML, YAML, JSON, and BSON, as well as binary
    forms like PDF and Microsoft Office documents (MS Word, Excel, and so
    on).

 Source: Wikipedia, http://en.wikipedia.org/wiki/Document-oriented_database
  
CouchDB

 •  Distributed database system

 •  Originally, each document was saved as XML

 •  JavaScript functions select and aggregate documents (JSON is used for serialization)

 •  Current release: 1.2 (April 2012)

 •  Started in 2005

 •  Initiated by Damien Katz
  
CouchDB, Overview

  •  Implemented in Erlang

  •  Erlang
       •  Functional language
       •  It was designed by Ericsson to support distributed, fault-tolerant,
          soft-real-time, non-stop applications.
       •  Code example (factorial):
            fac(0) -> 1;
            fac(N) when N > 0, is_integer(N) -> N * fac(N-1).
  
CouchDB, Overview

  •  Documents consist of named fields
       •  Each field has a key/name and a value.

  •  A field name has to be unique within a document
  •  A value may be a string (of arbitrary length), a number, a boolean, a date, an
     ordered list or an associative map; a document can also refer to another
     document

  •  Example, wiki article (document):

       "Title": "CouchDB",
       "Last editor": "172.5.123.91",
       "Last modified": "9/23/2010",
       "Categories": ["Database", "NoSQL", "Document Database"],
       "Body": "CouchDB is a ...",
       "Reviewed": false
  
CouchDB, Overview

  •  Each document has an id: a 128-bit value

  •  Each document has a version number: a 32-bit value

  •  B-trees index the documents (id, version, some metadata)
  
CouchDB

 •  CouchDB uses a B-tree storage engine for all internal data, documents, and
    views.
 •  Views are built with MapReduce; looking up a single key or a key range
    in the B-tree has complexity O(log N)

 Source: CouchDB: The Definitive Guide, O’Reilly, Anderson, Lehnardt & Slater
  
CouchDB, Revisions

  •  What if you want to change a field in a specific document?
       •  Load the document
       •  Change it in the JSON, or in the corresponding object of your programming language
       •  To update or delete a document, CouchDB expects you to include its current _rev
       •  When CouchDB accepts the change, it generates a new _rev
       •  This revision system is also called Multi-Version Concurrency Control
          (MVCC)
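The load, change, save cycle with _rev can be modelled without a running server. The sketch below is an in-memory toy, not CouchDB's implementation: the classes ToyCouch and Conflict are hypothetical, and the way the revision string is computed (generation number plus an MD5 digest of the body) is an assumption that only imitates the general "N-hash" look of CouchDB revisions.

```python
import hashlib
import json

class Conflict(Exception):
    """Raised when a document is saved with a stale or missing _rev."""

class ToyCouch:
    def __init__(self):
        self._docs = {}  # _id -> latest stored document

    def get(self, doc_id):
        return dict(self._docs[doc_id])   # hand out a copy

    def save(self, doc):
        doc_id = doc["_id"]
        current = self._docs.get(doc_id)
        # An update must name the revision it was based on.
        if current is not None and doc.get("_rev") != current["_rev"]:
            raise Conflict(f"{doc_id}: expected _rev {current['_rev']}")
        generation = 1 if current is None else int(current["_rev"].split("-")[0]) + 1
        body = {k: v for k, v in doc.items() if k != "_rev"}
        digest = hashlib.md5(json.dumps(body, sort_keys=True).encode()).hexdigest()
        stored = dict(body, _rev=f"{generation}-{digest}")
        self._docs[doc_id] = stored
        return stored["_rev"]

db = ToyCouch()
first_rev = db.save({"_id": "hello-world", "title": "Hello World"})
```

A writer that still holds an old _rev is rejected instead of silently overwriting a newer version; it has to reload the document and apply its change again.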
  
CouchDB, Locking Mechanism

  •  Multi-Version Concurrency Control (MVCC)
  •  Documents in CouchDB are versioned much as they would be in a version
     control system such as Subversion

  Source: CouchDB: The Definitive Guide, O’Reilly, Anderson, Lehnardt & Slater
  
CouchDB, Views

  {
    "_id": "hello-world",
    "_rev": "43FBA4E7AB",
    "title": "Hello World",
    "body": "Well hello and welcome to my new blog...",
    "date": "2009/01/15 15:52:20"
  }

  {
    "_id": "bought-a-cat",
    "_rev": "4A3BBEE711",
    "title": "Bought a Cat",
    "body": "I went to the pet store earlier and brought home a little kitty...",
    "date": "2009/02/17 21:13:39"
  }

  function(doc) {
    if (doc.date && doc.title) {
      emit(doc.date, doc.title);
    }
  }
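What the map function above produces can be imitated in Python. This sketch is not how CouchDB evaluates views (it stores the rows in a B-tree, while this version sorts a list and scans it); the names map_doc, build_view and query are hypothetical. The third sample document is invented to show that documents without a date emit nothing.

```python
DOCS = [
    {"_id": "hello-world", "title": "Hello World",
     "date": "2009/01/15 15:52:20"},
    {"_id": "bought-a-cat", "title": "Bought a Cat",
     "date": "2009/02/17 21:13:39"},
    {"_id": "draft", "title": "Untitled draft"},   # no date: emits nothing
]

def map_doc(doc):
    """Python counterpart of the JavaScript map function on this slide."""
    if doc.get("date") and doc.get("title"):
        yield doc["date"], doc["title"]

def build_view(docs, map_fn):
    """Run map_fn over all documents and sort the emitted rows by key."""
    rows = [(key, value, doc["_id"]) for doc in docs for key, value in map_fn(doc)]
    return sorted(rows, key=lambda row: (row[0], row[2]))

def query(view, startkey=None, endkey=None):
    """Return the rows whose key lies in [startkey, endkey]."""
    return [row for row in view
            if (startkey is None or row[0] >= startkey)
            and (endkey is None or row[0] <= endkey)]

view = build_view(DOCS, map_doc)
```

Because the rows are ordered by the emitted key (here the date), a range query such as "all posts since February 2009" is a contiguous slice of the view.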
  
CouchDB, Attachment

 •  CouchDB documents can have attachments just like an email message can
    have attachments.
 •  An attachment is identified by
      •  Name
      •  MIME type (or Content-Type); any data can be attached
      •  Number of bytes the attachment contains

 •  Example:
      •  curl -vX PUT http://127.0.0.1:5984/albums/6e1295ed6c29495e54cc05947f18c8af/artwork.jpg?rev=2-2739352689 --data-binary @artwork.jpg -H "Content-Type: image/jpg"

 •  Retrieve the attachment:
      •  http://127.0.0.1:5984/albums/6e1295ed6c29495e54cc05947f18c8af/artwork.jpg
  
CouchDB, Replication

  •  CouchDB replication is a mechanism to synchronize databases.

  •  Replication synchronizes two databases locally or remotely.
  
CouchDB, Replication

  •  Create the target database (it is not created automatically)
       •  curl -X PUT http://127.0.0.1:5984/albums-replica

  •  Perform replication:
       •  curl -vX POST http://127.0.0.1:5984/_replicate -d '{"source":"albums","target":"albums-replica"}'

  •  What we just did is local replication; it is useful for backups or for
     keeping a snapshot to roll back to

  •  It is important to note that replication replicates the database only as it
     was at the point in time when replication was started.
  
Other Document Database tools

  •  MongoDB (from "humongous") is a scalable, high-performance, open
     source NoSQL database. Written in C++,
  
Graph Database

  Source: http://www.herr-rau.de/wordpress/2006/06/your-website-as-a-graph.htm
  
Graph	
  Databases	
  

  •  A	
  graph	
  database	
  uses	
  graph	
  structures	
  with	
  nodes,	
  edges,	
  and	
  proper3es	
  
     to	
  represent	
  and	
  store	
  data.	
  By	
  defini3on,	
  a	
  graph	
  database	
  is	
  any	
  storage	
  
     system	
  that	
  provides	
  index-­‐free	
  adjacency.	
  This	
  means	
  that	
  every	
  element	
  
     contains	
  a	
  direct	
  pointer	
  to	
  its	
  adjacent	
  element	
  and	
  no	
  index	
  lookups	
  are	
  
     necessary	
  [Wikipedia].	
  




Graph Databases

  Source: Renzo Angles and Claudio Gutierrez, University of Chile: Survey of
  Graph Database Models, ACM Computing Surveys, Vol. 40, No. 1, Article 1,
  Publication date: February 2008.
  	
  	
  


Graph Databases, Data model properties

  •  Graph databases are often faster for associative data sets.

  •  They scale more naturally to large data sets, as they do not typically
     require expensive join operations.

  •  As they depend less on a rigid schema, they are more suitable for managing
     ad hoc and changing data with evolving schemas.

  •  Graph databases are a powerful tool for graph-like queries:

       •  Computing the shortest path between two nodes in the graph.

       •  Other graph-like queries can be performed over a graph database in a
          natural way (for example, computing a graph's diameter or detecting
          communities).
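
As an illustration of a graph-like query, the following Python sketch finds a shortest path (fewest edges) with breadth-first search. It is a generic textbook technique over a plain adjacency list, not the algorithm of any specific graph database; the function name and the example graph are hypothetical.

```python
from collections import deque


def shortest_path(graph, start, goal):
    """Return one shortest path (fewest edges) from start to goal, or None."""
    previous = {start: None}  # also serves as the visited set
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:  # walk back to the start
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbour in graph.get(node, ()):
            if neighbour not in previous:
                previous[neighbour] = node
                queue.append(neighbour)
    return None


# Hypothetical example graph as a directed adjacency list.
graph = {
    "A": ["B", "C"],
    "B": ["D"],
    "C": ["D"],
    "D": ["E"],
    "E": [],
}
```

In a relational schema the same question would typically need one self-join per hop, which is the cost the bullets above refer to.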
  


Graph Databases, Neo4j

  •  Neo4j is an open-source graph database, implemented in Java.

  •  The developers describe Neo4j as "embedded, disk-based, fully
     transactional Java persistence engine that stores data structured in graphs
     rather than in tables".

  •  Neo4j version 1.0 was released in February 2010.

  •  Neo4j was developed by Neo Technology, Inc., based in the San Francisco
     Bay Area, US, and Malmö, Sweden.
  	
  




Neo4j, Node & Relation

  •  A Graph contains Nodes and Relationships

      •  “A Graph —records data in→ Nodes —which have→ Properties”

      •  “Nodes —are organized by→ Relationships —which also have→ Properties”
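
The property-graph model described above can be sketched in a few lines of Python. This is an illustrative data structure only, not Neo4j's API; the class names, the `relate` helper, and the example properties are assumptions made for the sketch.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # identity comparison; nodes and relationships refer to each other
class Node:
    properties: dict = field(default_factory=dict)
    relationships: list = field(default_factory=list)


@dataclass(eq=False)
class Relationship:
    start: Node
    end: Node
    rel_type: str
    properties: dict = field(default_factory=dict)


def relate(start, end, rel_type, **properties):
    """Create a relationship and register it on both of its nodes."""
    rel = Relationship(start, end, rel_type, dict(properties))
    start.relationships.append(rel)
    end.relationships.append(rel)
    return rel


alice = Node({"name": "Alice"})
bob = Node({"name": "Bob"})
knows = relate(alice, bob, "KNOWS", since=2012)
```

Both nodes and relationships carry their own property dictionaries, which matches the two quoted statements on the slide.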
  




Neo4j, Traversal

  •  Query a Graph with a Traversal

       •  Traversal —navigates→ a Graph; it —identifies→ Paths —which order→ Nodes

       •  A Traversal is how you query a Graph, navigating from starting Nodes to
          related Nodes according to an algorithm, finding answers to questions
          like “what music do my friends like that I don’t yet own,” or “if this
          power supply goes down, what web services are affected?”
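
The first example question can be answered with a two-hop traversal. The sketch below uses plain Python dictionaries rather than Neo4j's traversal framework; all names and data are hypothetical.

```python
# Hypothetical toy graph: who is friends with whom, who likes and owns what.
friends = {"me": ["ann", "ben"], "ann": ["me"], "ben": ["me"]}
likes = {
    "ann": ["Kind of Blue", "Abbey Road"],
    "ben": ["Abbey Road", "Thriller"],
}
owns = {"me": ["Abbey Road"]}


def recommendations(person):
    """Traverse person -> friends -> liked albums, skipping owned albums."""
    owned = set(owns.get(person, []))
    found = []
    for friend in friends.get(person, []):
        for album in likes.get(friend, []):
            if album not in owned and album not in found:
                found.append(album)
    return found
```

The traversal starts at one node and only touches the nodes reachable from it, so the rest of the graph is never scanned.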
  


Neo4j, Indexes

  •  Indexes look up Nodes or Relationships

      •  “An Index —maps from→ Properties —to either→ Nodes or Relationships”

      •  Often, you want to find a specific Node or Relationship according to a
         Property it has. Rather than traversing the entire graph, use an Index
         to perform a look-up, for questions like “find the Account for username
         master-of-graphs.”
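
The difference between scanning and an index look-up can be sketched with a Python dictionary standing in for the index. This is an illustration of the idea only, not Neo4j's index API; the account data and function names are hypothetical.

```python
# Hypothetical account nodes, each a dict of properties.
accounts = [
    {"username": "master-of-graphs", "plan": "pro"},
    {"username": "edge-case", "plan": "free"},
]

# Build the index once: property value -> node.
username_index = {node["username"]: node for node in accounts}


def find_by_scan(username):
    """Visit every node until one matches (cost grows with graph size)."""
    for node in accounts:
        if node["username"] == username:
            return node
    return None


def find_by_index(username):
    """Single dictionary look-up instead of a full scan."""
    return username_index.get(username)
```

Both functions return the same node; the index version simply avoids visiting unrelated nodes, at the price of keeping the index up to date when properties change.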
  




Neo4j, Database

  •  Neo4j is a Graph Database

       •  “A Graph Database —manages a→ Graph and —also manages related→ Indexes”
  




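Neo4j Hello World example

  // Embedded Neo4j 1.x needs a transaction for writes; RelTypes is assumed to be an enum of RelationshipType.
  Transaction tx = graphDb.beginTx();
  try {
      firstNode = graphDb.createNode();
      firstNode.setProperty( "message", "Hello, " );
      secondNode = graphDb.createNode();
      secondNode.setProperty( "message", "World!" );

      relationship = firstNode.createRelationshipTo( secondNode, RelTypes.KNOWS );
      relationship.setProperty( "message", "brave Neo4j " );
      tx.success();
  } finally {
      tx.finish();
  }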
  




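Neo4j & Java & Eclipse

   Tutorial:
   http://technoracle.blogspot.de/2012/05/third-neo4j-tutorial-getting-started.html

   import org.neo4j.graphdb.GraphDatabaseService;
   import org.neo4j.graphdb.Node;
   import org.neo4j.graphdb.Relationship;
   import org.neo4j.graphdb.Transaction;
   import org.neo4j.graphdb.factory.GraphDatabaseFactory;

   String DB_PATH = "/Users/neo4j-1.8";

   GraphDatabaseService graphDb;
   Node myFirstNode;
   Node mySecondNode;
   Relationship myRelationship;

   graphDb = new GraphDatabaseFactory().newEmbeddedDatabase( DB_PATH );

   // RelTypes is assumed to be an enum implementing RelationshipType.
   // In embedded Neo4j 1.x, write operations run inside a transaction.
   Transaction tx = graphDb.beginTx();
   try {
       myFirstNode = graphDb.createNode();
       myFirstNode.setProperty( "name", "Duane Nickull, I Braineater" );
       mySecondNode = graphDb.createNode();
       mySecondNode.setProperty( "name", "Randy Rampage, Annihilator" );

       myRelationship = myFirstNode.createRelationshipTo( mySecondNode, RelTypes.KNOWS );
       myRelationship.setProperty( "relationship-type", "knows" );
       tx.success();
   } finally {
       tx.finish();
   }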
  
Other Graph Database tools

  •  BigData RDF

       •  SPARQL

       •  RDFS+ inference
  




Conclusion

NoSQL, BASE

  •  NoSQL is characterized by BASE:

  •  Basically Available: Use replication to reduce the likelihood of data
     unavailability, and use sharding (partitioning the data among many
     different storage servers) to make any remaining failures partial. The
     result is a system that is always available, even if subsets of the data
     become unavailable for short periods of time.

  •  Soft state: While ACID systems assume that data consistency is a hard
     requirement, NoSQL systems allow data to be inconsistent and relegate
     designing around such inconsistencies to application developers.

  •  Eventually consistent: Although applications must cope with the lack of
     instantaneous consistency, NoSQL systems ensure that at some future point
     in time the data assumes a consistent state. In contrast to ACID systems
     that enforce consistency at transaction commit, NoSQL guarantees
     consistency only at some undefined future time.
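
Eventual consistency can be made concrete with a toy simulation. The sketch below is a deliberately simplified model (one writer, explicit `sync` step, no conflicts) and does not reflect the replication protocol of any particular NoSQL system; the `Replica` and `Store` classes are hypothetical.

```python
class Replica:
    """One copy of the data; writes reach it only when they are delivered."""

    def __init__(self):
        self.data = {}


class Store:
    """Accepts a write on one replica and propagates it to the others later."""

    def __init__(self, replica_count=3):
        self.replicas = [Replica() for _ in range(replica_count)]
        self.pending = []  # (replica index, key, value) not yet delivered

    def write(self, key, value):
        self.replicas[0].data[key] = value  # acknowledged immediately
        for index in range(1, len(self.replicas)):
            self.pending.append((index, key, value))  # delivered later

    def read(self, key, replica_index):
        return self.replicas[replica_index].data.get(key)

    def sync(self):
        """Deliver all pending updates; afterwards every replica agrees."""
        for index, key, value in self.pending:
            self.replicas[index].data[key] = value
        self.pending.clear()


store = Store()
store.write("cart", ["book"])
stale = store.read("cart", replica_index=2)  # replica 2 has not seen the write
store.sync()
fresh = store.read("cart", replica_index=2)  # now consistent with replica 0
```

Between `write` and `sync` a reader may see the old state (soft state); after `sync` all replicas return the same value, which is the guarantee the last bullet describes.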
  	
  

ACID vs. BASE

  Source: NoSQL Databases, Prof. Walter Kriha, Stuttgart Media University
  




Statistics

  •  The worldwide NoSQL market is expected to reach $3.4 billion by 2018 at a
     CAGR of 21% between 2013 and 2018. The NoSQL market will generate $14
     billion in revenues over the period 2013 – 2018.

  •  CAGR: compound annual growth rate

       CAGR(t0, tn) = (V(tn) / V(t0))^(1 / (tn - t0)) - 1

  •  V(t0): start value, V(tn): finish value,
  •  tn - t0: number of years.

  Source: http://www.marketresearchmedia.com/2010/11/11/nosql-market/
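
The compound annual growth rate is the constant yearly rate that turns the start value into the finish value over the given number of years. A minimal Python sketch, with hypothetical function names, that also works backwards from the slide's figures:

```python
def cagr(start_value, end_value, years):
    """Compound annual growth rate as a fraction (0.21 means 21%)."""
    return (end_value / start_value) ** (1.0 / years) - 1.0


def project(start_value, rate, years):
    """Value after compounding `rate` for `years` periods."""
    return start_value * (1.0 + rate) ** years


# Working backwards from the slide's figures: $3.4 billion in 2018 at a 21%
# CAGR over five years implies a 2013 market of 3.4 / 1.21**5 billion dollars.
implied_2013 = 3.4 / 1.21 ** 5
```

The implied 2013 figure is derived arithmetic, not a number stated in the cited report.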
  


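When to USE?

  Figure (from Neo4j): data size plotted against data complexity. From largest
  size and lowest complexity to smallest size and highest complexity:
  Key-Value, Bigtable, Doc-DB, GraphDB.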




When to USE?

  Source: http://paolodedios.com/blog/2010/5/19/the-visual-guide-to-nosql-systems.html
  
Who uses NoSQL

  Systems named on the slide: FlockDB, Dynamo, Cassandra, Bigtable.
  




  
Resources	
  




                http://www.stu-dentdiaries.com/2010_05_01_archive.html

Resources, Books
  




Papers

  1.  DeCandia, Giuseppe; Hastorun, Deniz; Jampani, Madan; Kakulapati,
      Gunavardhan; Lakshman, Avinash; Pilchin, Alex; Sivasubramanian,
      Swaminathan; Vosshall, Peter; Vogels, Werner: Dynamo: Amazon’s Highly
      Available Key-value Store. September 2007.

  2.  Chang, Fay; Dean, Jeffrey; Ghemawat, Sanjay; Hsieh, Wilson C.; Wallach,
      Deborah A.; Burrows, Mike; Chandra, Tushar; Fikes, Andrew; Gruber, Robert
      E.: Bigtable: A Distributed Storage System for Structured Data. November
      2006. http://labs.google.com/papers/bigtable-osdi06.pdf

  3.  Fay Chang, Jeffrey Dean, Sanjay Ghemawat, Wilson C. Hsieh, Deborah A.
      Wallach, Mike Burrows, Tushar Chandra, Andrew Fikes, Robert E. Gruber:
      Bigtable: A Distributed Storage System for Structured Data, 2006.

  4.  Renzo Angles and Claudio Gutierrez, University of Chile: Survey of Graph
      Database Models, ACM Computing Surveys, Vol. 40, No. 1, Article 1,
      Publication date: February 2008.
  	
  



Papers

  5.  Survey of Graph Database Performance on the HPC Scalable Graph Analysis
      Benchmark, D. Dominguez-Sal, P. Urbón-Bayes, A. Giménez-Vañó, S.
      Gómez-Villamor, N. Martínez-Bazán, and J.L. Larriba-Pey, Universitat
      Politècnica de Catalunya, 2010.

  6.  Chad Vicknair, Michael Macias: A Comparison of a Graph Database and a
      Relational Database, A Data Provenance Perspective, ACMSE ’10, April
      15-17, 2010, Oxford, MS, USA.

  7.  Bradford Stephens: HBase vs. Cassandra: NoSQL Battle!, 2009.
      http://www.roadtofailure.com/2009/10/29/hbase-vs-cassandra-nosql-battle/comment-page-1/,
      last accessed February 2011.

  8.  On-Line Project Management System, Qian Sha, Bachelor of Economics,
      Capital University of Economics and Business, 2003.
      Will NoSQL Databases Live Up to Their Promise?, Neal Leavitt, 2010.
  


Papers

  9.  Karger, D., Lehman, E., Leighton, T., Panigrahy, R., Levine, M., and
      Lewin, D. 1997. Consistent hashing and random trees: distributed caching
      protocols for relieving hot spots on the World Wide Web. In Proceedings
      of the Twenty-Ninth Annual ACM Symposium on Theory of Computing (El Paso,
      Texas, United States, May 04-06, 1997). STOC '97. ACM Press, New York,
      NY, 654-663.

  10. Lamport, L.: Time, clocks, and the ordering of events in a distributed
      system. Communications of the ACM, 21(7), pp. 558-565, 1978.

  11. André Allavena, Alan Demers, John E. Hopcroft: Correctness of a Gossip
      Based Membership Protocol, NY 2005, ACM 1-58113-994-2/05/0007.
  	
  




Resources, Web links

  •  Introduction data structure for GraphDB, Shunya Kimura:
     http://www.slideshare.net/skimura/graphdatabase-data-structure
  •  Compare NoSQL databases: http://nosql.findthebest.com/
  •  Oracle white paper, Sep. 2011: Oracle NoSQL Database
  •  CouchDB: http://www.couchbase.com/
  •  Open source implementation of Bigtable: HBase, http://hbase.apache.org/
  •  http://www.db-class.org/course/video/preview_list (Stanford University)
  •  http://technirvanaa.wordpress.com/tag/nosql-disadvantages/ (March 2011)
  •  http://www.kavistechnology.com/blog/?p=1577 (March 2010)
  •  http://www.couchbase.com/press-releases/couchbase-survey-shows-accelerated-adoption-nosql-2012 (Survey 2012)
  •  http://www.couchbase.com/why-nosql/nosql-database
  •  CouchDB wiki: http://wiki.apache.org/couchdb/
  •  http://highlyscalable.wordpress.com/2012/03/01/nosql-data-modeling-techniques/ (very good)
  •  http://neo4j.org/
  •  http://blog.neo4j.org/2010/03/modeling-categories-in-graph-database.html
  •  Neo4j documentation: http://components.neo4j.org/neo4j/1.8.M05/apidocs/
  •  SQL Databases v. NoSQL Databases, Michael Stonebraker, MIT, 2010
  
        	
  
Do you want to know more?

  •  What The Heck Are You Actually Using NoSQL For?
     http://highscalability.com/blog/2010/12/6/what-the-heck-are-you-actually-using-nosql-for.html

  •  Nice tutorials for CouchDB:
     http://couchapp.org/page/videos
  
  	
  




CouchDB, Example

  •  Download CouchDB from: http://couchdb.apache.org/

  •  Example source: CouchDB: The Definitive Guide, O’Reilly, Anderson,
     Lehnardt & Slater (http://guide.couchdb.org/draft/tour.html#figure/4)

  •  Go to http://127.0.0.1:5984/
  




Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
 
Mind map of terminologies used in context of Generative AI
Mind map of terminologies used in context of Generative AIMind map of terminologies used in context of Generative AI
Mind map of terminologies used in context of Generative AI
 
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
 
Free Complete Python - A step towards Data Science
Free Complete Python - A step towards Data ScienceFree Complete Python - A step towards Data Science
Free Complete Python - A step towards Data Science
 
Pushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 daysPushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 days
 
Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !
 
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfObservability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
 
Uni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdfUni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdf
 
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
 
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
 
Large Language Model (LLM) and it’s Geospatial Applications
Large Language Model (LLM) and it’s Geospatial ApplicationsLarge Language Model (LLM) and it’s Geospatial Applications
Large Language Model (LLM) and it’s Geospatial Applications
 
A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
 
20240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 202420240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 2024
 
Enchancing adoption of Open Source Libraries. A case study on Albumentations.AI
Enchancing adoption of Open Source Libraries. A case study on Albumentations.AIEnchancing adoption of Open Source Libraries. A case study on Albumentations.AI
Enchancing adoption of Open Source Libraries. A case study on Albumentations.AI
 
Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
 

NoSQL

  • 1. NoSQL Databases. Yousof Alsatom, Wirtschaftsinformatik Master Program, Humboldt-Universität zu Berlin, 2012
  • 2. Agenda • Relational database model • Advantages & disadvantages • NoSQL • Basic concepts, techniques, and patterns in comparison with RDBMS • Consistency • Partitioning • Storage layout
  • 3. Agenda • NoSQL data models • Key-value • DynamoDB • Bigtable (column family) • Google Bigtable • Document databases • CouchDB • Graph databases • Neo4j • Conclusion
  • 4. Database and DBMS • In essence, a database is a collection of data that exists over a long period of time, often many years. • Commonly, the term database refers to a collection of data that is managed by a Database Management System (DBMS). • A DBMS is a (powerful) tool for creating and managing large amounts of data efficiently and allowing it to persist safely over long periods of time.
  • 5. Relational Model • A relational database is a collection of data items organized as a set of formally described tables from which data can be accessed or reassembled in many different ways without having to reorganize the database tables [techtarget.com]. • Edgar Frank "Ted" Codd (August 23, 1923 – April 18, 2003), IBM
  • 6. Relational Database • A relational database is a collection of data items organized as a set of formally described tables from which data can be accessed easily [Wikipedia].
  • 7. Example: Project Management System [Qian Sha, 2003]
  • 8. Example: Project Management System [Qian Sha, 2003]
  • 9. Example: Project Management System [Qian Sha, 2003] • Possible queries • Give me all employees who are working on project X • Give me the percentage of progress for project Y
  • 10. Relational Database, Advantages • Reliability • ACID • Atomicity: all or nothing • Consistency • Isolation: concurrent execution of transactions results in a system state that could have been obtained if the transactions had been executed serially • Durability: once a transaction has been committed, it remains so, even in the event of power loss, crashes, or errors.
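The "all or nothing" property can be illustrated with a small sketch. It uses Python's built-in `sqlite3` module as a stand-in for any relational database; the `accounts` table, the `transfer` helper, and the sample names are assumptions made up for this illustration and do not come from the slides.

```python
import sqlite3


def transfer(conn, src, dst, amount):
    """Move `amount` between two accounts: both updates commit, or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes the block
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src)
            )
            (balance,) = conn.execute(
                "SELECT balance FROM accounts WHERE name = ?", (src,)
            ).fetchone()
            if balance < 0:
                raise ValueError("insufficient funds")  # triggers the rollback
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst)
            )
        return True
    except ValueError:
        return False


def balances(conn):
    """Return {name: balance} for every account."""
    return dict(conn.execute("SELECT name, balance FROM accounts ORDER BY name"))


# Hypothetical sample data for the illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()
```

A failed transfer leaves both balances untouched, because the first `UPDATE` is rolled back together with the rest of the transaction.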
  • 11. Relational Database, Limitation • Scalability • Users can scale a relational database by running it on a more powerful (and more expensive) computer. • To scale beyond a certain point, though, it must be distributed across multiple servers. • Relational databases don't work easily in a distributed manner because joining their tables across a distributed system is difficult. [Jeremy Zawodny] • Complexity • All data must be converted into tables, which is complex and slow (example: Wikipedia) • SQL can work only with structured data [Prof. Stefan Edlich, Beuth University of Applied Sciences in Berlin]
  • 12. Relational Database, Limitation • Spandauer Str. 1, Berlin
  • 13. Problem! • Diversity? • Connectivity? • Data size?
  • 15. NoSQL • Not using the relational model (nor the SQL language) • No schema, allowing fields to be added to any record without controls • Open source • Designed to work on large clusters • Based on the needs of 21st century web properties
  • 16. NoSQL, History • Carlo Strozzi used the term NoSQL in 1998 to name his lightweight, open-source relational database that did not expose the standard SQL interface. • In 2009, Johan Oskarsson organized a meetup for people interested in distributed structured data storage and called it NoSQL. The event was held on June 11th in San Francisco.
  • 17. NoSQL • Consistency • Uses eventual consistency (a consistency model also used in parallel programming) • Weak consistency • Partitioning • Automatic partitioning (as data grows) • Storage layout • Row-based storage layout • Columnar storage layout • …
  • 18. NoSQL • Data models • Key / Value • Bigtable • Document DB • Graph DB
  • 19. Key / Value
  • 20. Hash Table • Type: unsorted associative array • Invented: 1953 • Time complexity in big O notation (average / worst case) • Space: O(n) / O(n) • Search: O(1 + n/k) / O(n) • Insert: O(1) / O(n) • Delete: O(1 + n/k) / O(n) • Source: Wikipedia, http://en.wikipedia.org/wiki/Hash_tables
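The complexity figures above can be made concrete with a minimal hash table using separate chaining. This is a sketch only: the class name `ChainedHashTable` and its methods are invented for this example, and reading `k` as the number of buckets (so that `n/k` is the average chain length) is an assumption the slide does not spell out.

```python
class ChainedHashTable:
    """Unsorted associative array backed by a fixed number of buckets (chains)."""

    def __init__(self, buckets=8):
        self._buckets = [[] for _ in range(buckets)]

    def _bucket(self, key):
        # Hashing picks the bucket in O(1); the scan inside it costs O(n/k) on average.
        return self._buckets[hash(key) % len(self._buckets)]

    def put(self, key, value):
        bucket = self._bucket(key)
        for i, (existing, _) in enumerate(bucket):
            if existing == key:
                bucket[i] = (key, value)  # overwrite an existing entry
                return
        bucket.append((key, value))

    def get(self, key, default=None):
        for existing, value in self._bucket(key):
            if existing == key:
                return value
        return default

    def delete(self, key):
        bucket = self._bucket(key)
        for i, (existing, _) in enumerate(bucket):
            if existing == key:
                del bucket[i]
                return True
        return False
```

The worst case of O(n) arises when every key lands in the same bucket, so that a lookup degenerates into a linear scan.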
  • 21. Key-Value • Amazon's infrastructure is made up of tens of thousands of servers and network components located in many datacenters around the world. • Availability and reliability are the most important factors for Amazon. • Dynamo aims to achieve high availability at the cost of weaker consistency. • Figure: Service-oriented architecture of Amazon's platform. Source: Dynamo: Amazon's Highly Available Key-value Store, September 2007.
  • 22. Key-Value, Dynamo History • Giuseppe DeCandia and his co-authors argue against using RDBMSs at Amazon. • They admit that advances have been made to scale and partition RDBMSs, but state that such setups remain difficult to configure and operate (2006). • The Dynamo paper was published in 2007.
  • 23. Dynamo, Consistent Hashing • Data is partitioned and replicated using consistent hashing. • Goal: scalability and availability • The output range of a hash function is treated as a fixed circular space or "ring". • Ordered: each new node is assigned a random position (key) on the ring. • The ring is walked clockwise. • The departure or arrival of a node affects only its immediate neighbors. • Each node becomes responsible for the region of the ring between it and its predecessor node. • "Virtual nodes": each physical node can be responsible for more than one virtual node. • Source: Dynamo: Amazon's Highly Available Key-value Store, September 2007.
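A minimal sketch of such a ring follows. It is not Dynamo's implementation: the `HashRing` class, the `ring_position` helper, the choice of MD5 for placing nodes, and the `node#i` labels for virtual nodes are all assumptions made for illustration.

```python
import bisect
import hashlib


def ring_position(label):
    """Map a string to a point on the ring (here: the 128-bit MD5 value)."""
    return int(hashlib.md5(label.encode("utf-8")).hexdigest(), 16)


class HashRing:
    def __init__(self, vnodes=16):
        self.vnodes = vnodes    # virtual nodes per physical node
        self._positions = []    # sorted ring positions
        self._owner = {}        # ring position -> physical node

    def add_node(self, node):
        for i in range(self.vnodes):
            pos = ring_position(f"{node}#{i}")
            bisect.insort(self._positions, pos)
            self._owner[pos] = node

    def remove_node(self, node):
        self._positions = [p for p in self._positions if self._owner[p] != node]
        self._owner = {p: n for p, n in self._owner.items() if n != node}

    def node_for(self, key):
        """Walk clockwise from the key's position to the first virtual node."""
        if not self._positions:
            raise LookupError("ring is empty")
        i = bisect.bisect_left(self._positions, ring_position(key))
        return self._owner[self._positions[i % len(self._positions)]]
```

When a node joins, the only keys that change owner are those that move to the new node; all other assignments stay put, which is the incremental scalability the slide refers to.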
  • 24. Dynamo, Vector Clock • Data versioning: Dynamo uses vector clocks in order to capture causality between different versions of the same object. • A vector clock is a list of (node, counter) pairs. • Every version of every object is associated with one vector clock. • If every counter in the first object's clock is less than or equal to the corresponding counter in the second object's clock, then the first is an ancestor of the second and can be forgotten. • Figure labels: Object, Node, Clock. • Source: Dynamo: Amazon's Highly Available Key-value Store, September 2007.
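The ancestor test can be sketched in a few lines. Clocks are represented here as dictionaries from node name to counter; the function names `increment`, `descends`, and `compare` and the node names `Sx`, `Sy`, `Sz` are assumptions for this illustration, not Dynamo's API.

```python
def increment(clock, node):
    """Return a copy of `clock` with `node`'s counter advanced by one."""
    updated = dict(clock)
    updated[node] = updated.get(node, 0) + 1
    return updated


def descends(a, b):
    """True if clock `a` has seen every event recorded in clock `b`."""
    return all(a.get(node, 0) >= counter for node, counter in b.items())


def compare(a, b):
    """Classify version `a` relative to version `b`."""
    if a == b:
        return "equal"
    if descends(b, a):
        return "ancestor"      # a is older and can be forgotten
    if descends(a, b):
        return "descendant"    # a supersedes b
    return "concurrent"        # conflicting versions that need reconciliation
```

Two versions written on different nodes from the same parent compare as `concurrent`; that is the case in which both versions have to be kept and reconciled later.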
  • 25. Dynamo, Overview • Source: http://de.wikipedia.org/wiki/Amazon_Dynamo
  • 26. Dynamo, Sloppy Quorum • Handling failures: sloppy quorum • A quorum is the minimum number of votes that a distributed transaction has to obtain in order to be allowed to perform an operation in a distributed system. [Wikipedia] • Sloppy quorum: read and write operations are performed on the first N healthy nodes from the preference list, which may not always be the first N nodes encountered while walking the consistent hashing ring. • Example: • A is down … • D receives the replica, with metadata naming A as the intended recipient • When A comes back, D will attempt to deliver the replica to A. • Source: Dynamo: Amazon's Highly Available Key-value Store, September 2007.
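The node selection in the example can be sketched as follows. This is a simplified model under stated assumptions: the function `choose_replicas`, the node letters, and the representation of a hint as the name of the unavailable node are invented for this illustration and do not reproduce Dynamo's actual code.

```python
def choose_replicas(preference_list, healthy, n):
    """Pick the first n healthy nodes from the preference list.

    Returns (node, hint) pairs. `hint` is None for a node that is one of the
    key's n intended replicas; otherwise it names the unavailable node that
    the stand-in should hand the data back to later (hinted handoff).
    """
    intended = preference_list[:n]
    missing = [node for node in intended if node not in healthy]
    chosen = []
    for node in preference_list:
        if len(chosen) == n:
            break
        if node not in healthy:
            continue
        if node in intended:
            chosen.append((node, None))
        else:
            chosen.append((node, missing.pop(0)))  # stand-in for a failed node
    return chosen
```

With the preference list A, B, C, D, E, N = 3, and A unavailable, the write goes to B, C, and D, and D keeps a hint that its copy belongs to A.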
  • 27. Dynamo, Gossip-based membership protocol and failure detection • A gossip-based protocol propagates membership changes and maintains an eventually consistent view of membership.
  • 28. Key-Value, Dynamo: summary of techniques (Problem | Technique | Advantage)
      Partitioning | Consistent hashing | Incremental scalability
      High availability for writes | Vector clocks with reconciliation during reads | Version size is decoupled from update rates.
      Handling temporary failures | Sloppy quorum and hinted handoff | Provides high availability and durability guarantee when some of the replicas are not available.
      Recovering from permanent failures | Anti-entropy using Merkle trees | Synchronizes divergent replicas in the background.
      Membership and failure detection | Gossip-based membership protocol and failure detection | Preserves symmetry and avoids having a centralized registry for storing membership and node liveness information.
      Source: Dynamo: Amazon's Highly Available Key-value Store, September 2007.
  • 29. Key-Value, Dynamo • Query model • get(key): returns the object (or a list of conflicting objects) together with a context • Context: metadata such as the object's version is stored with the object; it is useful in case of conflict. • put(key, context, object): the key is hashed with the MD5 algorithm.
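The shape of this interface can be sketched with a toy in-memory store. Several simplifications are assumptions of this example rather than Dynamo's behaviour: the class `TinyStore` is invented, the context is a plain integer version number where Dynamo uses vector clocks, and everything lives on a single node.

```python
import hashlib


class TinyStore:
    """Single-node toy with a get(key) / put(key, context, object) interface."""

    def __init__(self):
        self._data = {}  # MD5 identifier -> (version, [objects])

    @staticmethod
    def _identifier(key):
        # The key is hashed with MD5 to obtain a 128-bit identifier.
        return hashlib.md5(key.encode("utf-8")).hexdigest()

    def get(self, key):
        """Return (objects, context); more than one object signals a conflict."""
        version, objects = self._data.get(self._identifier(key), (0, []))
        return list(objects), version

    def put(self, key, context, obj):
        ident = self._identifier(key)
        version, objects = self._data.get(ident, (0, []))
        if context == version:
            # The writer saw the latest version, so the new object replaces it.
            self._data[ident] = (version + 1, [obj])
        else:
            # Stale context: keep both versions for the client to reconcile.
            self._data[ident] = (version + 1, objects + [obj])
```

A client reads first, then passes the returned context back in its `put`, which lets the store tell an informed update from a write based on outdated data.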
  • 30. Other Key / Value NoSQL tools • Riak makes data highly available for use in read- and write-intensive web applications.
  • 31. Bigtable
  • 32. Bigtable • Bigtable is described as "a distributed storage system for managing structured data that is designed to scale to a very large size: petabytes of data across thousands of commodity servers" [Google Labs] • Bigtable is a distributed, persistent, multi-dimensional sorted map. • The map is indexed by a row key, a column key, and a timestamp. • Each value in the map is an uninterpreted array of bytes. • (row:string, column:string, time:int64) → string
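The mapping `(row, column, time) → string` can be modelled with an in-memory dictionary. This is a sketch of the data model only, with no distribution or persistence; the class `CellMap` and its methods are hypothetical names and are not part of any Bigtable client API.

```python
class CellMap:
    """In-memory model of a map indexed by (row key, column key, timestamp)."""

    def __init__(self):
        self._cells = {}  # (row, column, timestamp) -> bytes

    def put(self, row, column, timestamp, value):
        self._cells[(row, column, timestamp)] = value

    def get(self, row, column, timestamp=None):
        """Return the newest value, or the newest one at or before `timestamp`."""
        versions = [
            (ts, value)
            for (r, c, ts), value in self._cells.items()
            if r == row and c == column and (timestamp is None or ts <= timestamp)
        ]
        if not versions:
            return None
        return max(versions, key=lambda pair: pair[0])[1]

    def scan(self, row_prefix=""):
        """Return ((row, column, timestamp), value) pairs in sorted key order."""
        keys = sorted(k for k in self._cells if k[0].startswith(row_prefix))
        return [(k, self._cells[k]) for k in keys]
```

Because keys are kept in sorted order, a prefix scan over row keys returns neighbouring rows together, which matters for the row-naming scheme in the data model slides.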
  • 33. Google's Bigtable • It was used by over sixty projects at Google as of 2006, including: • Web indexing • Google Earth • Google Analytics • Orkut • Google Docs
  • 34. Google's Bigtable, Data Model • Store CNN web pages • The row name is the reversed URL. • The contents column family contains the page contents. • The anchor column family contains the text of any anchors that reference the page. • Figure: row and column families. Source: Bigtable: A Distributed Storage System for Structured Data, November 2006, http://labs.google.com/papers/bigtable-osdi06.pdf
  • 35. Google's Bigtable, Data Model • CNN's home page is referenced by both the Sports Illustrated and the MY-look home pages. • The row contains columns named anchor:cnnsi.com and anchor:my.look.ca. • t3: timestamp • Figure: row and column families. Source: Bigtable: A Distributed Storage System for Structured Data, November 2006, http://labs.google.com/papers/bigtable-osdi06.pdf
  • 36. Google's Bigtable, Data Model • Rows from the same domain are stored together in a tablet, in lexicographic order: com.google.docs, com.google.mail, com.google.play
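Why reversed host names keep a domain's pages together can be shown with a short sketch. The helper name `row_key` and the sample host names are assumptions made for this example.

```python
def row_key(hostname):
    """Reverse the dot-separated components of a host name."""
    return ".".join(reversed(hostname.split(".")))


hosts = ["mail.google.com", "www.cnn.com", "play.google.com", "docs.google.com"]

# Sorting the reversed names lexicographically groups hosts by domain.
sorted_keys = sorted(row_key(host) for host in hosts)
```

After sorting, all `com.google.*` rows are adjacent, so they are likely to fall into the same tablet and can be read with one contiguous scan.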
  • 37. Google's Bigtable, Data Model • Notes • Has no fixed number of rows or columns • Every value also has an associated timestamp • Each value is addressed by the triple (domain-name, column-name, timestamp)
  • 38. Google's Bigtable, Query Model • Writing to a table
  • 39. Google's Bigtable, Query Model • Reading from a table
  • 40. Google's Bigtable, More • Example with Eclipse: http://www.kobu.com/appeng/index-en.htm • Bigtable as a web service: http://bigtable.appspot.com/ • Performance and benchmarking: Chang, Fay; Dean, Jeffrey; Ghemawat, Sanjay; Hsieh, Wilson C.; Wallach, Deborah A.; Burrows, Mike; Chandra, Tushar; Fikes, Andrew; Gruber, Robert E.: Bigtable: A Distributed Storage System for Structured Data. November 2006. http://labs.google.com/papers/bigtable-osdi06.pdf
  • 41. Other Bigtable NoSQL tools • Use HBase when you need random, real-time read/write access to your Big Data. The project's goal is the hosting of very large tables.
  • 43. Document Databases • Storing, retrieving, and managing document-oriented, or semi-structured, data • Documents encapsulate and encode data (or information) in some standard formats or encodings. • Encodings in use include XML, YAML, JSON, and BSON, as well as binary forms like PDF and Microsoft Office documents (MS Word, Excel, and so on). • Source: Wikipedia, http://en.wikipedia.org/wiki/Document-oriented_database
  • 44. CouchDB • Distributed database system • Previously, each document was saved as XML. • JavaScript functions (with JSON for serialization) select and aggregate documents. • Current release: 1.2 (April 2012) • Started in 2005 • Initiator: Damien Katz
  • 45. CouchDB, Overview • Implemented in Erlang • Erlang • Functional language • It was designed by Ericsson to support distributed, fault-tolerant, soft-real-time, non-stop applications. • Code example: fac(0) -> 1; fac(N) when N > 0, is_integer(N) -> N * fac(N-1).
  • 46. CouchDB, Overview • Documents consist of named fields, each with a key/name and a value. • A field name has to be unique within a document. • A value may be a string (of arbitrary length), number, boolean, date, an ordered list, or an associative map; a document can refer to another document. • Example, wiki article (document): • "Title": "CouchDB", • "Last editor": "172.5.123.91", • "Last modified": "9/23/2010", • "Categories": ["Database", "NoSQL", "Document Database"], • "Body": "CouchDB is a ...", • "Reviewed": false
  • 47. CouchDB, Overview • Each document has an id: a 128-bit value • Version number: a 32-bit value • B-trees index the documents (id, version, some metadata)
  • 48. CouchDB • CouchDB uses a B-tree storage engine for all internal data, documents, and views. • Views are built with MapReduce; looking up a key or a key range has complexity O(log N). • Source: CouchDB: The Definitive Guide, O'Reilly, Anderson, Lehnardt & Slater
  • 49. CouchDB, Revisions • What if you want to change a field in a specific document? • Load the document. • Change it in JSON, or in the corresponding object in your programming language. • To update or delete a document, CouchDB expects you to include its current _rev. • When CouchDB accepts the change, it generates a new _rev. • This revision system is also called Multi-Version Concurrency Control (MVCC).
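The load, change, and save cycle can be modelled without a running server. The sketch below is an in-memory stand-in: `RevisionedStore` and `ConflictError` are hypothetical names, and the way the `_rev` string is derived here (a generation counter plus an MD5 digest of the body) is an assumption for illustration, not CouchDB's exact algorithm.

```python
import hashlib
import json


class ConflictError(Exception):
    """Raised when a save is based on an outdated revision."""


class RevisionedStore:
    def __init__(self):
        self._docs = {}  # _id -> stored document (including its _rev)

    def get(self, doc_id):
        return dict(self._docs[doc_id])

    def save(self, doc):
        """Store `doc`; an update must carry the current _rev of the document."""
        current = self._docs.get(doc["_id"])
        if current is not None and doc.get("_rev") != current["_rev"]:
            raise ConflictError(f"document {doc['_id']!r} was changed by someone else")
        generation = 1 if current is None else int(current["_rev"].split("-")[0]) + 1
        body = {k: v for k, v in doc.items() if k != "_rev"}
        digest = hashlib.md5(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()
        self._docs[doc["_id"]] = dict(body, _rev=f"{generation}-{digest}")
        return self._docs[doc["_id"]]["_rev"]
```

If two clients load the same revision and both try to save, the second save fails and that client has to reload and reapply its change; no locks are held in the meantime.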
  • 50. CouchDB, Locking Mechanism • Multi-Version Concurrency Control (MVCC) • Documents in CouchDB are versioned much as they would be in a version control system such as Subversion. • Source: CouchDB: The Definitive Guide, O'Reilly, Anderson, Lehnardt & Slater
  • 51. CouchDB, Views • Document 1: { "_id": "hello-world", "_rev": "43FBA4E7AB", "title": "Hello World", "body": "Well hello and welcome to my new blog...", "date": "2009/01/15 15:52:20" } • Document 2: { "_id": "bought-a-cat", "_rev": "4A3BBEE711", "title": "Bought a Cat", "body": "I went to the pet store earlier and brought home a little kitty...", "date": "2009/02/17 21:13:39" } • Map function: function(doc) { if (doc.date && doc.title) { emit(doc.date, doc.title); } }
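What the map function produces can be emulated in a few lines. This sketch mirrors the JavaScript map function in Python and sorts the emitted rows by key, which is roughly how a view presents them; `map_doc`, `build_view`, and `query` are hypothetical helper names, and the third document is an invented counter-example.

```python
def map_doc(doc):
    """Python counterpart of the map function: emit (date, title) if both exist."""
    if doc.get("date") and doc.get("title"):
        yield doc["date"], doc["title"]


def build_view(docs, map_fn):
    """Run the map function over every document and sort the rows by key."""
    rows = [(key, value, doc["_id"]) for doc in docs for key, value in map_fn(doc)]
    return sorted(rows)


def query(view, startkey=None, endkey=None):
    """Return the rows whose key lies in [startkey, endkey]."""
    return [
        row for row in view
        if (startkey is None or row[0] >= startkey)
        and (endkey is None or row[0] <= endkey)
    ]


docs = [
    {"_id": "bought-a-cat", "title": "Bought a Cat", "date": "2009/02/17 21:13:39"},
    {"_id": "hello-world", "title": "Hello World", "date": "2009/01/15 15:52:20"},
    {"_id": "draft", "title": "No date yet"},  # skipped: it has no date field
]
view = build_view(docs, map_doc)
```

Because the rows are ordered by date, a range query such as "all posts from February 2009" only needs the contiguous slice between two keys.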
  • 52. CouchDB, Attachment • CouchDB documents can have attachments, just as an email message can. • An attachment is identified by • Name • MIME type (or Content-Type), any data • Number of bytes the attachment contains • Example: • curl -vX PUT http://127.0.0.1:5984/albums/6e1295ed6c29495e54cc05947f18c8af/artwork.jpg?rev=2-2739352689 --data-binary @artwork.jpg -H "Content-Type: image/jpg" • Retrieve the attachment: • http://127.0.0.1:5984/albums/6e1295ed6c29495e54cc05947f18c8af/artwork.jpg
  • 53. CouchDB, Replication • CouchDB replication is a mechanism to synchronize databases. • Replication synchronizes two databases locally or remotely.
  • 54. CouchDB, Replication • Create the target database (this is not automatic): • curl -X PUT http://127.0.0.1:5984/albums-replica • Perform replication: • curl -vX POST http://127.0.0.1:5984/_replicate -d '{"source":"albums","target":"albums-replica"}' • This was a local replication, which is useful for backups or for enabling rollback. • It is important to note that replication replicates the database only as it was at the point in time when replication was started.
  • 55. Other Document Database tools • MongoDB (from "humongous") is a scalable, high-performance, open source NoSQL database written in C++.
  • 56. Graph Database • http://www.herr-rau.de/wordpress/2006/06/your-website-as-a-graph.htm
  • 57. Graph Databases • A graph database uses graph structures with nodes, edges, and properties to represent and store data. By definition, a graph database is any storage system that provides index-free adjacency. This means that every element contains a direct pointer to its adjacent element and no index lookups are necessary [Wikipedia].
  • 58. Graph Databases • Source: Renzo Angles and Claudio Gutierrez, University of Chile: Survey of Graph Database Models, ACM Computing Surveys, Vol. 40, No. 1, Article 1, February 2008.
  • 59. Graph Databases, Data model properties • Graph databases are often faster for associative data sets. • They scale more naturally to large data sets, as they do not typically require expensive join operations. • As they depend less on a rigid schema, they are more suitable for managing ad-hoc and changing data with evolving schemas. • Graph databases are a powerful tool for graph-like queries: • Computing the shortest path between two nodes in the graph. • Other graph-like queries can be performed over a graph database in a natural way (for example, graph diameter computations or community detection).
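The shortest-path query mentioned above can be sketched as a breadth-first search over adjacency lists. This is not how any particular graph database implements it; the function name `shortest_path` and the sample `friends` graph are assumptions made for this illustration.

```python
from collections import deque


def shortest_path(graph, start, goal):
    """Return the path with the fewest edges from start to goal, or None."""
    if start == goal:
        return [start]
    previous = {start: None}  # node -> node it was first reached from
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor in previous:
                continue  # already reached by a path that is at least as short
            previous[neighbor] = node
            if neighbor == goal:
                path = [goal]
                while previous[path[-1]] is not None:
                    path.append(previous[path[-1]])
                return path[::-1]
            queue.append(neighbor)
    return None


# Hypothetical sample graph: each node maps to the nodes it points to.
friends = {
    "alice": ["bob", "carol"],
    "bob": ["dave"],
    "carol": ["dave", "erin"],
    "dave": ["frank"],
    "erin": ["frank"],
}
```

Each step follows direct references from a node to its neighbours, which corresponds to the index-free adjacency described earlier: no join or index lookup is needed to move along an edge.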
  • 60. Graph Databases, Neo4j • Neo4j is an open-source graph database, implemented in Java. • The developers describe Neo4j as an "embedded, disk-based, fully transactional Java persistence engine that stores data structured in graphs rather than in tables". • Neo4j version 1.0 was released in February 2010. • Neo4j was developed by Neo Technology, Inc., based in the San Francisco Bay Area, US, and Malmö, Sweden.
  • 61. Neo4j, Node & Relation • A Graph contains Nodes and Relationships • "A Graph --records data in--> Nodes --which have--> Properties" • "Nodes --are organized by--> Relationships --which also have--> Properties"
  • 62. Neo4j, Traversal • Query a Graph with a Traversal • "A Traversal --navigates--> a Graph; it --identifies--> Paths --which order--> Nodes" • A Traversal is how you query a Graph, navigating from starting Nodes to related Nodes according to an algorithm, finding answers to questions like "what music do my friends like that I don't yet own," or "if this power supply goes down, what web services are affected?"
  • 63. Neo4j, Indexes • Indexes look up Nodes or Relationships • "An Index --maps from--> Properties --to either--> Nodes or Relationships" • Often, you want to find a specific Node or Relationship according to a Property it has. Rather than traversing the entire graph, use an Index to perform a look-up, for questions like "find the Account for username master-of-graphs."
  • 64. Neo4j, Database • Neo4j is a Graph Database • "A Graph Database --manages a--> Graph and --also manages related--> Indexes"
  • 65. Neo4j Hello World example: firstNode = graphDb.createNode(); firstNode.setProperty( "message", "Hello, " ); secondNode = graphDb.createNode(); secondNode.setProperty( "message", "World!" ); relationship = firstNode.createRelationshipTo( secondNode, RelTypes.KNOWS ); relationship.setProperty( "message", "brave Neo4j " );
  • 66. Neo4j & Java & Eclipse • Tutorial: http://technoracle.blogspot.de/2012/05/third-neo4j-tutorial-getting-started.html • import org.neo4j.graphdb.GraphDatabaseService; • String DB_PATH = "/Users/neo4j-1.8"; • GraphDatabaseService graphDb; • Node myFirstNode; • Node mySecondNode; • Relationship myRelationship; • graphDb = new GraphDatabaseFactory().newEmbeddedDatabase( DB_PATH ); • myFirstNode = graphDb.createNode(); • myFirstNode.setProperty( "name", "Duane Nickull, I Braineater" ); • mySecondNode = graphDb.createNode(); • mySecondNode.setProperty( "name", "Randy Rampage, Annihilator" ); • myRelationship = myFirstNode.createRelationshipTo( mySecondNode, RelTypes.KNOWS ); • myRelationship.setProperty( "relationship-type", "knows" );
  • 67. Other Graph Database tools • BigData RDF • SPARQL • RDFS+ inference
  • 68. Conclusion
  • 69. NoSQL, BASE • NoSQL is characterized by BASE: • Basically Available: Use replication to reduce the likelihood of data unavailability and use sharding, or partitioning the data among many different storage servers, to make any remaining failures partial. The result is a system that is always available, even if subsets of the data become unavailable for short periods of time. • Soft state: While ACID systems assume that data consistency is a hard requirement, NoSQL systems allow data to be inconsistent and relegate designing around such inconsistencies to application developers. • Eventually consistent: Although applications must cope with the lack of instantaneous consistency, NoSQL systems ensure that at some future point in time the data assumes a consistent state. In contrast to ACID systems, which enforce consistency at transaction commit, NoSQL guarantees consistency only at some undefined future time.
  • 70. ACID vs. BASE • Source: NoSQL Databases, Prof. Walter Kriha, Stuttgart Media University
  • 71. Statistics • The worldwide NoSQL market is expected to reach $3.4 billion by 2018 at a CAGR of 21% between 2013 and 2018. The NoSQL market will generate $14 billion in revenues over the period 2013 – 2018. • CAGR: compound annual growth rate, CAGR = (V(tn) / V(t0))^(1 / (tn - t0)) - 1 • V(t0): start value, V(tn): finish value • tn - t0: number of years • Source: http://www.marketresearchmedia.com/2010/11/11/nosql-market/
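The compound annual growth rate is the constant yearly rate that turns the start value into the finish value over the given number of years, that is, (finish / start) raised to the power 1 / years, minus 1. A small sketch with made-up numbers (the function names `cagr` and `project` are assumptions for this example, and the sample values are not market data):

```python
def cagr(start_value, end_value, years):
    """Compound annual growth rate between two values `years` apart."""
    return (end_value / start_value) ** (1 / years) - 1


def project(start_value, rate, years):
    """Value reached after growing at `rate` per year for `years` years."""
    return start_value * (1 + rate) ** years


# Illustrative numbers only: growing from 100 to 121 in two years.
rate = cagr(100, 121, 2)
```

The two functions are inverses of each other: projecting the start value forward at the computed rate gives back the finish value.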
  • 72. When to use? • Figure (from Neo4j): data size plotted against data complexity for Key-Value, Bigtable, Doc-DB, and GraphDB
  • 73. When to use? • http://paolodedios.com/blog/2010/5/19/the-visual-guide-to-nosql-systems.html
  • 74. Who uses NoSQL • FlockDB • Dynamo • Cassandra • Bigtable
  • 75. Resources • http://www.stu-dentdiaries.com/2010_05_01_archive.html
  • 77. Papers   1.  DeCandia,  Giuseppe  ;  Hastorun,  Deniz  ;  Jampani,  Madan  ;  Kakulapa3,  Gu-­‐   navardhan  ;  Lakshman,  Avinash  ;  Pilchin,  Alex  ;  Sivasubramanian,  Swaminathan  ;   Vosshall,  Peter  ;  Vogels,  Werner:  Dynamo:  Amazon’s  Highly  Available  Key-­‐value   Store.  September  2007.     2.  Chang,  Fay  ;  Dean,  Jeffrey  ;  Ghemawat,  Sanjay  ;  Hsieh,  Wilson  C.  ;  Wallach,  Deborah   A.  ;  Burrows,  Mike  ;  Chandra,  Tushar  ;  Fikes,  Andrew  ;  Gruber,  Robert  E.:  Bigtable:  A   Distributed  Storage  System  for  Structured  Data.  November  2006.  –  hOp:// labs.google.com/papers/bigtable-­‐osdi06.pdf     3.  Fay  Chang,  Jeffrey  Dean,  Sanjay  Ghemawat,  Wilson  C.  Hsieh,  Deborah  A.  Wallach   Mike  Burrows,  Tushar  Chandra,  Andrew  Fikes,  Robert  E.  Gruber:  Bigtable:  A   Distributed  Storage  System  for  Structured  Data  2006   4.  RENZO  ANGLES  and  CLAUDIO  GUTIERREZ,  University  Chile  :  Survey  of  Graph   Database  Models  ,  ACM  Compu3ng  Surveys,  Vol.  40,  No.  1,  Ar3cle  1,  Publica3on   date:  February  2008.     77  
  • 78. Papers   5.  Survey  of  Graph  Database  Performance  on  the  HPC  Scalable  Graph  Analysis   Benchmark,  D.  Dominguez-­‐Sal,  P.  Urb  ́on-­‐Bayes,  A.  Gim   enez-­‐Van  ̃o  ́,  S.  Go   ́ ́mez-­‐Villamor,   N.   Mart   ́ınez-­‐Baz   ́an,   and   J.L.   Larriba-­‐Pey,   Universitat   Polit`ecnica  de  Catalunya,    2010   6.  Chad   Vicknair,   Michael   Macias:   A   Comparison   of   a   Graph   Database   and   a   Rela3onal   Database,   A   Data   Provenance   Perspec3ve   ,   ACMSE   ’10,   April   15-­‐17,  2010,  Oxford,  MS,  USA     7.  Bradford   Stephens.   HBase   vs.   Cassandra:   NoSQL   Bat-­‐   tle!,   2009.   hOp:// www.roadtofailure.com/2009/10/29/   hbase-­‐vs-­‐cassandra-­‐nosql-­‐baOle/ comment-­‐page-­‐1/,  last  accessed  on  February  2011.     8.  ON-­‐LINE  PROJECT  MANAGEMENT  SYSTEM,  Qian  Sha   Bachelor   of   Economics,   Capital   University   of   Economics   and   Business,   2003   Will  NoSQL  Databases  Live  Up  to  Their  Promise?  Neal  LeaviO,  2010   78  
  • 79. Papers
      9. Karger, D., Lehman, E., Leighton, T., Panigrahy, R., Levine, M., and Lewin, D. 1997. Consistent hashing and random trees: distributed caching protocols for relieving hot spots on the World Wide Web. In Proceedings of the Twenty-Ninth Annual ACM Symposium on Theory of Computing (El Paso, Texas, United States, May 04-06, 1997). STOC '97. ACM Press, New York, NY, 654-663.
      10. Lamport, L.: Time, clocks and the ordering of events in a distributed system. Communications of the ACM, 21(7), pp. 558-565, 1978.
      11. André Allavena, Alan Demers, John E. Hopcroft: Correctness of a Gossip Based Membership Protocol. NY 2005, ACM 1-58113-994-2/05/0007.
  • 80. Resources, Web links
      •  Introduction to data structures for GraphDB, Shunya Kimura: http://www.slideshare.net/skimura/graphdatabase-data-structure
      •  Compare NoSQL databases: http://nosql.findthebest.com/
      •  Oracle white paper, Sep. 2011: Oracle NoSQL Database
      •  CouchDB: http://www.couchbase.com/
      •  Open source implementation of Bigtable: HBase, http://hbase.apache.org/
      •  http://www.db-class.org/course/video/preview_list (Stanford University)
      •  http://technirvanaa.wordpress.com/tag/nosql-disadvantages/ (March 2011)
      •  http://www.kavistechnology.com/blog/?p=1577 (March 2010)
      •  http://www.couchbase.com/press-releases/couchbase-survey-shows-accelerated-adoption-nosql-2012 (Survey 2012)
      •  http://www.couchbase.com/why-nosql/nosql-database
      •  CouchDB wiki: http://wiki.apache.org/couchdb/
      •  http://highlyscalable.wordpress.com/2012/03/01/nosql-data-modeling-techniques/ (very good)
      •  http://neo4j.org/
      •  http://blog.neo4j.org/2010/03/modeling-categories-in-graph-database.html
      •  Neo4j documentation: http://components.neo4j.org/neo4j/1.8.M05/apidocs/
      •  SQL Databases v. NoSQL Databases, Michael Stonebraker, MIT, 2010
  • 81. Do you want to know more?
      •  What The Heck Are You Actually Using NoSQL For? http://highscalability.com/blog/2010/12/6/what-the-heck-are-you-actually-using-nosql-for.html
      •  Nice tutorials for CouchDB: http://couchapp.org/page/videos
  • 82. CouchDB, Example
      •  Download CouchDB from: http://couchdb.apache.org/
      •  Example source: CouchDB: The Definitive Guide, O'Reilly, Anderson, Lehnardt & Slater (http://guide.couchdb.org/draft/tour.html#figure/4)
      •  Go to: http://127.0.0.1:5984/
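
The last step of the CouchDB example, opening http://127.0.0.1:5984/, can also be done from a script instead of a browser. The following is a minimal sketch, not part of the original slides. It assumes a CouchDB instance is already running locally on the default port, and that the root URL answers with a small JSON welcome document containing a "couchdb" field. The helper names `parse_welcome` and `fetch_welcome` are hypothetical and chosen only for illustration.

```python
import json
from urllib.request import urlopen

# Assumption: CouchDB runs locally on its default port, as on the slide.
COUCHDB_URL = "http://127.0.0.1:5984/"


def parse_welcome(body: str) -> dict:
    """Decode the JSON document that CouchDB serves at its root URL."""
    info = json.loads(body)
    # Assumption: the welcome document carries a "couchdb" field.
    if "couchdb" not in info:
        raise ValueError("response does not look like a CouchDB welcome document")
    return info


def fetch_welcome(url: str = COUCHDB_URL, timeout: float = 2.0) -> dict:
    """Send a GET request to the root URL and return the decoded document."""
    with urlopen(url, timeout=timeout) as response:
        return parse_welcome(response.read().decode("utf-8"))
```

With a server running, `fetch_welcome()` returns the decoded welcome document as a dictionary. Without one, `urlopen` raises an error, which is a quick way to check whether the installation step worked.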