Content
•  Introduction
•  Databases
    –  ACID
    –  Data structures, algorithms
    –  Scalability issues
    –  Scaling patterns
•  Search engines
    –  Data structures, algorithms
    –  Pros & cons
•  NoSQL Movement
    –  Why and What
•  NoSQL Families
    –  Key value stores
    –  Column stores
    –  Document stores
    –  Graph DB
•  Principles: CAP, Scaling patterns, High availability patterns, Elasticity
•  How to choose?
•  Conclusion

Introduction
• Who we are:
    – Clément STENAC (Indexing and search techs)
    – Jérémie BORDIER (360 team (a bit of everything))
• Exalead:
    – Indexing technologies provider since 1998
    – Online search engine: http://www.exalead.com
    – Daily challenge: Tackle information access problems for large companies.

Introduction
• Universal answer to data storage: RELATIONAL DATABASES
• Well known data representation: Objects and relationships
• Powerful query language: SQL
• Open source implementations:
   – MySQL
   – PostgreSQL
   – …

Introduction
• Database scalability problems?
• Used to be a telco and bank problem…
• Until the internet came!

[Figure: Twitter whale, 2008]

Introduction
• Thanks to the internet…
• …millions of rows are common…
• …real-time websites.

How to deal with massive amounts of structured data? Are there alternatives?
What’s this NoSQL buzz?

Knowing your enemy:
RELATIONAL DATABASES

Databases: ACID

ACID constraints
• Atomicity
  • Transactions succeed or fail atomically
• Consistency
  • Transactions leave the database in a consistent state
• Isolation
  • Transactions do not see the effects of concurrent transactions
• Durability
  • Once a transaction is committed, it can’t be lost
  
Database structures: Primary storage

    CREATE TABLE author (
        id INTEGER PRIMARY KEY,
        nick VARCHAR(16),          -- fixed size
        age INTEGER,
        firstname VARCHAR(128),    -- heuristics change it to variable-size
        biography TEXT             -- variable size
    );
    CREATE TABLE post (
        id INTEGER PRIMARY KEY,
        author_id INTEGER REFERENCES author(id),
        timestamp TIMESTAMP,
        title VARCHAR(256),
        text TEXT
    );

• Each value or pointer can be retrieved at a known offset in the row

                 id         age        nick        firstname   biography
    Row 1      4 bytes    4 bytes    16 bytes      pointer     pointer
    Row 2      4 bytes    4 bytes    16 bytes      pointer     pointer

    Table strings:   len | data | len | data | len | data | len | data

Searching in a database

    SELECT * FROM author WHERE age=24;

The raw way: full scan
• Enumerate all records in the table
• For each record, fetch the condition value
  • Inline value: direct access at row_address + offset(column)
  • Outside value: fetch pointer and fetch data
• Perform comparison

Analysis
• Need to analyse the full table
• Very CPU intensive
• If the table does not fit in memory: I/O on the whole table
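
A minimal sketch of the full scan above, assuming a tiny in-memory table where each row is a Python dict. The table contents and the `full_scan` name are illustrative, not part of the original slides.

```python
# Hypothetical in-memory "author" table: one dict per row.
authors = [
    {"id": 1, "nick": "ann", "age": 24},
    {"id": 2, "nick": "bob", "age": 42},
    {"id": 3, "nick": "cat", "age": 24},
]

def full_scan(rows, column, value):
    """Enumerate every row, fetch the condition value, compare."""
    hits = []
    for row in rows:                # touches the whole table
        if row[column] == value:    # one comparison per row
            hits.append(row)
    return hits

# Equivalent of: SELECT * FROM author WHERE age=24;
matches = full_scan(authors, "age", 24)
```

The cost grows with the table size whatever the number of matching rows, which is the point of the analysis above.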
  
Database structures: Indexes

What is an index?
• Primary storage: forward mapping
  row_id –> row data
• Index: reverse mapping
  row data –> row_id(s)
• Updated together with the primary storage

Searching with an index
• Retrieve the row ids using the index
• Fetch the row data from primary storage
  
Database structures: Indexes – Hash index

How it works
• Stores hashes of column values in a hash table
• Retrieve through the hash table

Pros
• Very easy and fast to update
• Fast lookup – single hashtable lookup

Cons
• Only provides equality matching
• Unable to answer inequality queries
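
As a rough illustration of a hash index, the sketch below uses a Python dict (itself a hash table) to map a column value to the row ids holding it. The table contents and function names are assumptions made for the example.

```python
from collections import defaultdict

# Primary storage (forward mapping): row_id -> row data. Hypothetical rows.
authors = {
    1: {"nick": "ann", "age": 24},
    2: {"nick": "bob", "age": 42},
    3: {"nick": "cat", "age": 24},
}

def build_hash_index(rows, column):
    """Reverse mapping: column value -> list of row ids."""
    index = defaultdict(list)
    for row_id, row in rows.items():
        index[row[column]].append(row_id)
    return index

def lookup(index, rows, value):
    """Single hash-table lookup, then fetch rows from primary storage."""
    return [rows[row_id] for row_id in index.get(value, [])]

age_index = build_hash_index(authors, "age")
aged_24 = lookup(age_index, authors, 24)
```

Nothing in this structure is ordered, so a query such as `age < 30` would still need to visit every key; that is the limitation listed under Cons.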
  
Database structures: Indexes – BTree index

[Figure: Binary search tree vs. B-Tree]

Pros
• Provides range and inequality queries easily
• Quite fast (logarithmic) operations

Cons
• More complex and expensive to update
  • B-Tree rebalancing
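
The sketch below is not a B-Tree. It uses a sorted list with binary search (Python's `bisect`) as a stand-in, only to show why an ordered index can answer range queries in logarithmic time plus the size of the result. Table contents and names are hypothetical.

```python
import bisect

# Hypothetical primary storage: row_id -> row data.
authors = {
    1: {"nick": "ann", "age": 24},
    2: {"nick": "bob", "age": 42},
    3: {"nick": "cat", "age": 24},
    4: {"nick": "dan", "age": 31},
}

def build_sorted_index(rows, column):
    """Ordered index: list of (value, row_id) pairs sorted by value."""
    return sorted((row[column], row_id) for row_id, row in rows.items())

def range_lookup(index, low, high):
    """Row ids whose indexed value v satisfies low <= v < high."""
    start = bisect.bisect_left(index, (low,))   # first entry with v >= low
    end = bisect.bisect_left(index, (high,))    # first entry with v >= high
    return [row_id for _, row_id in index[start:end]]

age_index = build_sorted_index(authors, "age")
between_24_and_31 = range_lookup(age_index, 24, 32)
```

Inserting into a sorted Python list costs linear time; a real B-Tree keeps updates logarithmic at the price of the rebalancing mentioned under Cons.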
  
Choosing how to search
Is indexed search always better?
• SELECT * from author where age < 300;

Analysis
•  The condition matches every row: the whole table is fetched either way
•  Index: random lookups
•  Full scan: sequential fetch

Choosing wisely
• Identify the expensive queries
• Use the EXPLAIN statement
• Only add indexes where they are required
  • Indexes are expensive to update
  
Joining

Goal
• Put together data from several tables
• For some values in table A, find matching values in table B

Example
• SELECT * FROM post
  INNER JOIN author
  ON author.id = post.author_id
  WHERE author.age = 42;
Join algorithms

Nested loops
• Foreach (author WHERE age=42) {
      Foreach (post) {
          if (post.author_id == author.id) {
              append post to the result set;
          }
      }
  }
• Very naive algorithm: runs in P x A time (P posts, A matching authors)
• Supports all join predicates

Hash join
• Algorithm
  • Make a hashtable of author ids matching the « age = 42 » condition
  • Scan the post table once
  • For each post, lookup in the hashtable to check if it matches a valid author
• Faster than nested loops (2 scans instead of A)
• Requires memory to store the hashtable
• Only provides equality predicate
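
The two algorithms above can be sketched side by side. This assumes small in-memory tables stored as lists of dicts; the rows and function names are made up for the example.

```python
# Hypothetical tables.
authors = [
    {"id": 1, "age": 42},
    {"id": 2, "age": 30},
    {"id": 3, "age": 42},
]
posts = [
    {"id": 10, "author_id": 1},
    {"id": 11, "author_id": 2},
    {"id": 12, "author_id": 3},
    {"id": 13, "author_id": 1},
]

def nested_loop_join(authors, posts, age):
    """For each matching author, rescan the whole post table."""
    result = []
    for author in authors:
        if author["age"] != age:
            continue
        for post in posts:
            if post["author_id"] == author["id"]:
                result.append(post)
    return result

def hash_join(authors, posts, age):
    """One scan of authors to build the hash set, one scan of posts."""
    wanted_ids = {a["id"] for a in authors if a["age"] == age}
    return [p for p in posts if p["author_id"] in wanted_ids]

nested = nested_loop_join(authors, posts, 42)
hashed = hash_join(authors, posts, 42)
```

Both return the same posts, possibly in a different order. The hash join only works because the predicate is an equality on `author_id`.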
  
Join algorithms

Merge join
• Need to have both tables sorted by join key
  • Post sorted by author_id
  • Author sorted by id
• Perform a single parallel scan of the two tables and identify matches
• Fastest algorithm, but needs sorted data
  • Disk-based sort for large data sets
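
A minimal merge join sketch, assuming both inputs are already sorted on the join key and that author ids are unique. The rows and the `merge_join` name are illustrative.

```python
# Hypothetical tables, already sorted by the join key.
authors = [{"id": 1}, {"id": 2}, {"id": 3}]            # sorted by id
posts = [
    {"id": 10, "author_id": 1},
    {"id": 13, "author_id": 1},
    {"id": 11, "author_id": 2},
    {"id": 12, "author_id": 3},
]                                                      # sorted by author_id

def merge_join(authors, posts):
    """Single parallel scan of two sorted inputs."""
    result = []
    i = j = 0
    while i < len(authors) and j < len(posts):
        author_id = authors[i]["id"]
        post_key = posts[j]["author_id"]
        if author_id < post_key:
            i += 1          # this author has no more posts
        elif author_id > post_key:
            j += 1          # post refers to an author we do not have
        else:
            result.append((authors[i], posts[j]))
            j += 1          # several posts may share the same author
    return result

pairs = merge_join(authors, posts)
```

Each cursor only moves forward, so the join reads each table once; the expensive part in practice is obtaining the sorted inputs.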
  

Choice of join algorithm
• Performed automatically by the query optimizer (EXPLAIN)
• Main parameters:
  • Relations cardinalities
  • Data order (presence of an ORDER BY clause?)
  • Available indexes
• JOINs are always expensive -> schema denormalization
  
Database scaling: Typical workloads

Mostly read workloads
• Example: Wikipedia
• First solution: high-level (frontend tier) caching
• Database scaling: 1 master – N slaves
  • Replication of changes from master to slaves
• Does not solve the write bottleneck problem

High write workloads
• Examples: credit cards, Twitter (>1000 tweets/second, 1000s of deliveries)
• Performance limited by write I/O throughput
  • Because of the « D » (durability) constraint
  • Hard to have more than 1000-2000 writes/second
  
Database scaling: Scaling writes

Multiple master setups
•  All masters have the same data and share the updates
   •  « share-all » cluster architecture
•  Extremely complex synchronization
   •  Bi-directional replication
   •  Conflict detection
•  Bad performance
•  Complex resilience
   •  Downtime of a master: need a resync
•  Complex, heavy and expensive architectures

    Client 1 <-> Master 1 <-- bi-directional replication flow --> Master 2 <-> Client 2

Database scaling: Scaling writes

Sharding
• Split the data between the masters based on a criterion
  • Date
  • User id
  • hash(url), …
• Clients query the correct master for each piece of data
• No shared data between masters (« share-nothing »)

[Diagram: Client 1 and Client 2 each talk to Master 1 or Master 2 depending on the data; the masters are not linked]
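
A small sketch of client-side shard routing with a `hash(url)` style criterion. The master names and the `shard_for` helper are hypothetical; MD5 is used only as a stable, evenly spread hash, not for security.

```python
import hashlib

# Hypothetical list of masters; each holds a disjoint part of the data.
SHARDS = ["master-1", "master-2"]

def shard_for(key, shards=SHARDS):
    """Pick the master responsible for a key (share-nothing routing)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % len(shards)
    return shards[bucket]

target = shard_for("http://www.exalead.com")
```

Python's built-in `hash()` is avoided here because it is randomized per process for strings, so two clients could disagree on where a key lives.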
Database scaling: Problems with SQL sharding

Complexity
• Not integrated in SQL
• Need to perform the sharding in application code

Resilience
• Several machines but no resilience
• Loss of one master = loss of data (compare to RAID-0)

Loss of features
• You can’t do cross-shard joins

Complex evolutions
• How do you keep scaling?
• To add another machine, you need to change the distribution function
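
To illustrate why changing the distribution function hurts, the sketch below counts how many keys change shard when a `hash(key) % N` scheme grows from 4 to 5 machines. The key names and shard counts are arbitrary assumptions.

```python
import hashlib

def bucket(key, shard_count):
    """Modulo-based distribution function: hash(key) % shard_count."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count

# Hypothetical keys.
keys = [f"user:{i}" for i in range(10000)]

# A key must be migrated when its bucket differs between the two layouts.
moved = sum(1 for key in keys if bucket(key, 4) != bucket(key, 5))
fraction_moved = moved / len(keys)
```

A key keeps its place only when its hash gives the same remainder modulo 4 and modulo 5, which happens for about one hash value in five, so roughly 80% of the data has to move to add a single machine.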
  
Database scaling: Other SQL shortcomings

Strict schema
• It is good, it provides strong typing
• But, migration hell!
• Web applications change quickly
• Not « Agile »
  
On the other side:
SEARCH ENGINES

A quick look at search engines

Differences from a traditional database
• Not designed for OLTP
• Update by batches
  • No transactions, updates are available to readers « later »
• Heavily read-optimized

Full text search
• It’s more complex than LIKE ’%myword%’;
• Need specific data structures
  
Search engines: Inverted lists

What it is
• A data structure mapping a « word identifier » to a list of « document identifiers »
• For each word of each document, store the positions

Documents
• Document 1: The quick fox
• Document 2: The lazy dog
• Document 3: The dog quick dog

Dictionary
• the = 1
• quick = 2
• fox = 3
• lazy = 4
• dog = 5

Inverted lists
• List for word 1 (the): doc 1 (at position 0), doc 2 (at position 0), doc 3 (at position 0)
• List for word 2 (quick): doc 1 (at position 1), doc 3 (at position 2)
• List for word 3 (fox): doc 1 (at position 2)
• List for word 4 (lazy): doc 2 (at position 1)
• List for word 5 (dog): doc 2 (at position 2), doc 3 (at positions 1, 3)
  
                                                                                                     	
  


                                                                                                                                          Exalead S.A. © 2010
                                                                                                                                              CONFIDENTIAL
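
The three example documents can be turned into a dictionary and positional inverted lists with a few lines of code. This is a simplified sketch (lowercasing and whitespace splitting only); the function name is an assumption.

```python
from collections import defaultdict

docs = {
    1: "The quick fox",
    2: "The lazy dog",
    3: "The dog quick dog",
}

def build_inverted_index(docs):
    """Return (dictionary, postings).

    dictionary: word -> word id, numbered from 1 in order of first appearance
    postings:   word id -> {doc id: [positions of the word in that doc]}
    """
    dictionary = {}
    postings = defaultdict(dict)
    for doc_id in sorted(docs):
        for position, word in enumerate(docs[doc_id].lower().split()):
            word_id = dictionary.setdefault(word, len(dictionary) + 1)
            postings[word_id].setdefault(doc_id, []).append(position)
    return dictionary, postings

dictionary, postings = build_inverted_index(docs)
```

With these inputs the word ids and lists come out the same as on the slide, for instance `dog` gets id 5 and appears in document 3 at positions 1 and 3.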
Search engines: Searching with inverted lists

Single word query: dog
• Resolve the word to its id using the dictionary (wid 5)
• Fetch the inverted list for this id
• Simply read the inverted list for its id
• We have the hits: document 2 and document 3

Boolean query: the AND dog
• Resolve words, fetch inverted lists
• The: 1,2,3    Dog: 2,3
• Perform intersection: hits = 2,3

Boolean query: the OR dog
• Resolve/fetch
• Perform union: hits = 1, 2, 3
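
A sketch of the AND and OR queries above, assuming each inverted list is a sorted list of document ids. The function names are illustrative.

```python
# Inverted lists from the example, as sorted doc id lists.
postings = {
    "the": [1, 2, 3],
    "dog": [2, 3],
}

def intersect(a, b):
    """AND: walk both sorted lists in parallel, keep common doc ids."""
    hits = []
    i = j = 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            hits.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return hits

def union(a, b):
    """OR: every doc id present in at least one list, sorted."""
    return sorted(set(a) | set(b))

and_hits = intersect(postings["the"], postings["dog"])
or_hits = union(postings["the"], postings["dog"])
```

Keeping the lists sorted by document id is what makes the intersection a single linear pass.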
  

Search engines: Searching with inverted lists

Positional query: the NEXT dog
• Fetch the inverted lists and also read the positions
• The: 1(0), 2(0), 3(0)
  Dog: 2(2), 3(1,3)
• Identify “simple boolean” matches: docs 2 and 3
• For each possible match, check if positions form a sequence
• Only document 3 matches on sequence (0,1)

• Positional queries are more expensive and storing word positions is expensive (disk space, decoding CPU, I/O)
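
The positional check can be sketched as follows, assuming each inverted list maps a document id to the sorted positions of the word. The `next_query` name is made up for the example.

```python
# Positional inverted lists from the example: doc id -> positions.
the = {1: [0], 2: [0], 3: [0]}
dog = {2: [2], 3: [1, 3]}

def next_query(first, second):
    """Docs where some occurrence of `second` directly follows `first`."""
    hits = []
    # Step 1: simple boolean match (documents containing both words).
    for doc_id in sorted(first.keys() & second.keys()):
        second_positions = set(second[doc_id])
        # Step 2: look for a pair of positions forming the sequence (p, p + 1).
        if any(p + 1 in second_positions for p in first[doc_id]):
            hits.append(doc_id)
    return hits

hits = next_query(the, dog)
```

Document 2 passes the boolean step but fails the positional one, since `the` is at position 0 and `dog` at position 2.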
  
The revolution:
THE NOSQL MOVEMENT

NoSQL Movement
• « NoSQL » © Eric EVANS (Rackspace, 2009)

    “The name was an attempt to describe the emergence of a growing number of non-relational, distributed data stores that often did not attempt to provide ACID guarantees.”
    (Wikipedia)

NoSQL Movement: Issue
• RDBMSs fail with huge amounts of data
    – Facebook’s 70TB of inbox
    – Digg’s 3TB
    – eBay’s 2PB…
• High scale SQL systems are either:
    – Very expensive to buy and quite to maintain
    – Very expensive to maintain

NoSQL Movement
• We need new systems that:
   – Scale horizontally (both reads and writes)
   – Have no single point of failure
   – Are fault tolerant
   – Are elastic (adding nodes is easy)
   – Have flexible data schemas
   – Are friendlier to web applications

NoSQL: Families
• Different types of data stores:
    – Key-Value stores (Dynamo, Redis, Voldemort…)
    – Column stores (BigTable, Cassandra, HBase…)
    – Document stores (CouchDB, MongoDB…)
    – Graph stores (Neo4J, Swarm…)

NoSQL: Key-Value stores
•  Distributed hashtables
    –  Btrees
    –  Fixed sized tables
•  Benefits:
    –  Very simple API (get/put/delete/range)
    –  Easily shardable
    –  Fast reads
•  Drawbacks:
    –  No data schema (no joins, data flattening…)
    –  No query language
•  Implementations: Redis, Amazon Dynamo, Voldemort

NoSQL: Column Stores

    Id    Lastname    Firstname    Salary
    1     Smith       Joe          40000
    2     Jones       Mary         50000
    3     Johnson     Cathy        44000

•  Row based storage:
    –  1,Smith,Joe,40000;2,Jones,Mary,50000;3,Johnson,Cathy,44000;
•  Column based storage:
    –  1,2,3;Smith,Jones,Johnson;Joe,Mary,Cathy;40000,50000,44000;

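The two layouts can be reproduced from the same table with a short sketch. The serialization format simply mirrors the slide; variable names are illustrative.

```python
rows = [
    (1, "Smith", "Joe", 40000),
    (2, "Jones", "Mary", 50000),
    (3, "Johnson", "Cathy", 44000),
]

# Transpose the table: one list per column instead of one tuple per row.
columns = [list(column) for column in zip(*rows)]

def serialize(groups):
    """Join values with ',' inside a group and ';' after each group."""
    return "".join(",".join(str(v) for v in group) + ";" for group in groups)

row_layout = serialize(rows)
column_layout = serialize(columns)

# An aggregate only needs one contiguous run of values in the column layout.
total_salary = sum(columns[3])
```

In the column layout all salaries sit next to each other, which is why aggregates over one column read less data than in the row layout.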
NoSQL: Column Stores
• Benefits:
    – Reading all the values of a given column is faster (ex: aggregates)
    – Batch writes are faster
• Joins are faster
    – Comparing two columns is sequential
    – Many more L1 CPU cache hits
    – L1 cache reference: 0.5ns
    – L2 cache reference: 7ns

NoSQL: Column Stores
• Drawbacks:
    – Reading a single object is slower (multiple I/Os)
    – Writing a single object is slower (multiple I/Os)
    – Doesn’t fit most applications
•  Finally:
    – Well suited for heavy write / read applications
         •  (eg: Facebook inbox indexes)

NoSQL: Document Stores
• Can be seen as schema free, hierarchical database (usually represented as JSON)

SQL Schema:
    Person (1) ---- (N) Animal
    Person:               Animal:
      - id                  - id
      - name                - person_id
      - address             - name
      - phone               - address
                            - phone

Document store:
    Person:
      - id
      - name
      - address
      - phone
      - animals =
          - id
          - person_id
          - name
          - address
          - phone

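As an illustration of the document model, the Person above can be written as one nested JSON document. All field values below are invented for the example.

```python
import json

# Hypothetical Person document with its animals embedded.
person = {
    "id": 1,
    "name": "Alice",
    "address": "1 Example Street",
    "phone": "555-0100",
    "animals": [
        {"id": 7, "person_id": 1, "name": "Rex",
         "address": "1 Example Street", "phone": "555-0100"},
    ],
}

# The whole object is stored and fetched as a single value.
encoded = json.dumps(person)
decoded = json.loads(encoded)

# Related entries come back with the parent, without a join.
animal_names = [animal["name"] for animal in decoded["animals"]]
```

The relational version needs a join on `person_id` to get the same result; here the link is resolved by nesting.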
NoSQL: Document Stores
• Benefits:
   – Data spatiality (locality)! Everything in one place
   – Efficient writes and updates (in place)
   – Efficient reads
   – Highly flexible data schema
   – Usually provides indexes over each object key to offer a powerful query language
• Drawbacks
   – Doesn’t encourage well designed data schemas

NoSQL: Graph Stores
• An entry is a node
• Nodes have properties
• Edges are links between nodes

NoSQL: Graph Stores
• Benefits:
   – Faster to fetch an entry and its related entries (links are already resolved, no need to join)
   – Flexible data schema
• Drawbacks:
   – Complex APIs
   – Slow for batch operations
   – Open source implementations are not that good…

The real issues…
SCALABILITY IN PRACTICE

CAP Theorem
• CAP:
   – Consistency: Operating fully or not at all.
   – Availability: The service must be reachable at any time.
   – Partition Tolerance: No set of failures less than total network failure is allowed to cause the system to respond incorrectly.

    “Any shared-data system can only achieve two of these three.”
    CAP Theorem, Dr. Eric Brewer, Berkeley (2000)

Consistent Hashing
• Ensuring data availability: replication!
• Reaching the right nodes? Hashing
• Consistent hashing: Hash ring
   – Objects are mapped into a range
   – Nodes are mapped into that range
   – We write the object into the nearest node, clockwise

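A minimal hash ring sketch, assuming MD5 as the function that maps both objects and nodes into the same range. Node names, key names and the `HashRing` class are hypothetical, and real systems usually add virtual nodes and replication on top of this.

```python
import bisect
import hashlib

def ring_position(name):
    """Map a node name or an object key to a position on the ring."""
    digest = hashlib.md5(name.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")

class HashRing:
    def __init__(self, nodes):
        # Nodes sorted by their position on the ring.
        self._ring = sorted((ring_position(node), node) for node in nodes)
        self._positions = [position for position, _ in self._ring]

    def node_for(self, key):
        """Nearest node clockwise: first node at or after the key position."""
        index = bisect.bisect_left(self._positions, ring_position(key))
        # Past the last node, wrap around to the first one.
        return self._ring[index % len(self._ring)][1]

ring3 = HashRing(["node-a", "node-b", "node-c"])
ring4 = HashRing(["node-a", "node-b", "node-c", "node-d"])
keys = [f"object:{i}" for i in range(1000)]
moved = [k for k in keys if ring3.node_for(k) != ring4.node_for(k)]
```

Every key in `moved` now belongs to `node-d`: adding a node only takes over part of one neighbour's range, instead of reshuffling most keys as a modulo-based scheme does.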
Data consistency
•  Ensuring eventual data consistency: Quorum writes
     –  W = number of writes to ensure before returning OK
     –  R = number of reads to ensure
     –  N = replication factor

•  W < N == High write availability
     –  Data may be lost or outdated if read from another node
•  R < N == High read availability
     –  Data may be outdated
•  W + R > N == Full consistency!
     –  But slower writes / reads

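The quorum rules can be checked with a small helper. This is only arithmetic on N, W and R under the definitions above; the function and key names are assumptions.

```python
def quorum_properties(n, w, r):
    """Summarize what a (N, W, R) setting gives you."""
    if not (1 <= w <= n and 1 <= r <= n):
        raise ValueError("W and R must be between 1 and N")
    return {
        # Replicas that may be down while writes still succeed.
        "write_failures_tolerated": n - w,
        # Replicas that may be down while reads still succeed.
        "read_failures_tolerated": n - r,
        # W + R > N: every read set overlaps every write set in at
        # least one replica, so a read always sees the latest write.
        "read_sees_latest_write": w + r > n,
    }

balanced = quorum_properties(n=3, w=2, r=2)
fast = quorum_properties(n=3, w=1, r=1)
```

With N = 3, W = 2, R = 2, any two replicas read must include at least one of the two replicas written. With W = 1, R = 1, a read may hit a replica the write has not reached yet.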
Conflict resolution
•  What happens when R > 1 and two different versions are found?
•  Conflict resolution!
•  Common algorithm: Vector clocks

Vector clocks
• Assign to each node a unique ID
• A node increments its own counter in the vector and keeps track of the old entries

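A simplified vector clock sketch, assuming a clock is a dict mapping a node ID to a counter. Node IDs and function names are hypothetical; production systems also prune old entries.

```python
def increment(clock, node_id):
    """Return a copy of the clock with this node's counter bumped."""
    updated = dict(clock)
    updated[node_id] = updated.get(node_id, 0) + 1
    return updated

def descends(a, b):
    """True if clock a has seen every update recorded in clock b."""
    return all(a.get(node, 0) >= count for node, count in b.items())

def compare(a, b):
    """Classify two versions of the same object."""
    if descends(a, b) and descends(b, a):
        return "equal"
    if descends(a, b):
        return "a is newer"
    if descends(b, a):
        return "b is newer"
    return "conflict"   # concurrent updates: needs resolution

v1 = increment({}, "A")    # written through node A
v2 = increment(v1, "B")    # then updated through node B
v3 = increment(v1, "C")    # concurrently updated through node C
```

`v2` descends from `v1`, so it can replace it silently. `v2` and `v3` each contain an update the other has not seen, which is the conflict case that has to be resolved.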
Elasticity: Gossip Membership
• When a node joins…

Elasticity: Gossip Membership
• When a node crashes!

I’m starting the next big startup…
WHAT’S THE BEST SYSTEM?
  
Choosing your storage system
• “Don’t optimize too early”
• MySQL is robust and works VERY well
    – You’ll know where bugs come from (you)
• Key-Value stores are hyped, and often badly implemented
• Anyway, most mature “NoSQL” systems:
    – MongoDB
    – Cassandra

Questions?
  

Cassandra Day London 2015: Introduction to Apache Cassandra and DataStax Ente...
 
Cassandra Day Atlanta 2015: Introduction to Apache Cassandra & DataStax Enter...
Cassandra Day Atlanta 2015: Introduction to Apache Cassandra & DataStax Enter...Cassandra Day Atlanta 2015: Introduction to Apache Cassandra & DataStax Enter...
Cassandra Day Atlanta 2015: Introduction to Apache Cassandra & DataStax Enter...
 
Cassandra Day Chicago 2015: Introduction to Apache Cassandra & DataStax Enter...
Cassandra Day Chicago 2015: Introduction to Apache Cassandra & DataStax Enter...Cassandra Day Chicago 2015: Introduction to Apache Cassandra & DataStax Enter...
Cassandra Day Chicago 2015: Introduction to Apache Cassandra & DataStax Enter...
 
L6.sp17.pptx
L6.sp17.pptxL6.sp17.pptx
L6.sp17.pptx
 
The Cassandra Distributed Database
The Cassandra Distributed DatabaseThe Cassandra Distributed Database
The Cassandra Distributed Database
 
Why you should care about data layout in the file system with Cheng Lian and ...
Why you should care about data layout in the file system with Cheng Lian and ...Why you should care about data layout in the file system with Cheng Lian and ...
Why you should care about data layout in the file system with Cheng Lian and ...
 
HBase Data Modeling and Access Patterns with Kite SDK
HBase Data Modeling and Access Patterns with Kite SDKHBase Data Modeling and Access Patterns with Kite SDK
HBase Data Modeling and Access Patterns with Kite SDK
 
MongoDB for Genealogy
MongoDB for GenealogyMongoDB for Genealogy
MongoDB for Genealogy
 

Recently uploaded

#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...HostedbyConfluent
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphNeo4j
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
Azure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAzure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAndikSusilo4
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksSoftradix Technologies
 

Recently uploaded (20)

#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
Azure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAzure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & Application
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other Frameworks
 

Exalead managing terrabytes

  • 1. Content
      • Introduction
      • Databases
          – ACID
          – Data structures, algorithms
          – Scalability issues
          – Scaling patterns
      • Search engines
          – Data structures, algorithms
          – Pros & cons
      • NoSQL Movement
          – Why and What
  • 2. Content
      • NoSQL Families
          – Key-value stores
          – Column stores
          – Document stores
          – Graph DB
      • Principles: CAP, scaling patterns, high availability patterns, elasticity
      • How to choose?
      • Conclusion
  • 3. Introduction
      • Who we are:
          – Clément STENAC (indexing and search technologies)
          – Jérémie BORDIER (360 team, a bit of everything)
      • Exalead:
          – Indexing technologies provider since 1998
          – Online search engine: http://www.exalead.com
          – Daily challenge: tackle information access problems for large companies
  • 4. Introduction
      • Universal answer to data storage: RELATIONAL DATABASES
      • Well-known data representation: objects and relationships
      • Powerful query language: SQL
      • Open source implementations:
          – MySQL
          – PostgreSQL
          – …
  • 5. Introduction
      • Database scalability problems?
      • Used to be a telco and bank problem…
      • Until the internet came along!
      (Image: Twitter whale, 2008)
  • 6. Introduction
      • Thanks to the internet…
      • …tables with millions of rows are frequent…
      • …and so are real-time websites.
      How to deal with massive amounts of structured data? Are there alternatives? What's this NoSQL buzz?
  • 7. Knowing your enemy: RELATIONAL DATABASES
  • 8. Databases: ACID
      ACID constraints
      • Atomicity
          – Transactions succeed or fail atomically
      • Consistency
          – Transactions leave the database in a consistent state
      • Isolation
          – Transactions do not see the effects of concurrent transactions
      • Durability
          – Once a transaction is committed, it can't be lost
  • 9. Database structures: Primary storage
        CREATE TABLE author (
            id INTEGER PRIMARY KEY,
            nick VARCHAR(16),
            age INTEGER,
            firstname VARCHAR(128),
            biography TEXT);
        CREATE TABLE post (
            id INTEGER PRIMARY KEY,
            author_id INTEGER REFERENCES author(id),
            timestamp TIMESTAMP,
            title VARCHAR(256),
            text TEXT);
      • Fixed size: id, age, nick (heuristics may change a column from fixed-size to variable-size storage)
      • Variable size: firstname, biography
      • Each value or pointer can be retrieved at a known offset in the row
      • Row layout (identical for row 1, row 2, …): id (4 bytes) | age (4 bytes) | nick (16 bytes) | firstname (pointer) | biography (pointer)
      • Table strings (the pointer targets): len, data, len, data, …
  • 10. Searching in a database
      SELECT * FROM author WHERE age=24;
      The raw way: full scan
      • Enumerate all records in the table
      • For each record, fetch the condition value
          – Inline value: direct access at row_address + offset(column)
          – Outside value: fetch the pointer, then fetch the data
      • Perform the comparison
      Analysis
      • Need to analyse the full table
      • Very CPU intensive
      • If the table does not fit in memory: I/O on the whole table
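The full scan described above can be sketched in a few lines of Python. This is only an illustration: the in-memory `authors` list and its rows are made up, and a real engine works on disk pages rather than dictionaries.

```python
# Hypothetical in-memory "author" table; the rows are invented for illustration.
authors = [
    {"id": 1, "nick": "ann", "age": 24},
    {"id": 2, "nick": "bob", "age": 31},
    {"id": 3, "nick": "cat", "age": 24},
]


def full_scan(rows, column, value):
    """Visit every row, fetch the condition column, and compare it."""
    return [row for row in rows if row[column] == value]


matches = full_scan(authors, "age", 24)
print([row["id"] for row in matches])
```

The cost is proportional to the table size whatever the number of matching rows, which is the point made in the analysis.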
  • 11. Database structures: Indexes
      What is an index?
      • Primary storage: forward mapping, row_id –> row data
      • Index: reverse mapping, row data –> row_id(s)
      • Updated together with the primary storage
      Searching with an index
      • Retrieve the row ids using the index
      • Fetch the row data from primary storage
  • 12. Database structures: Indexes – Hash index
      How it works
      • Stores hashes of column values in a hash table
      • Retrieves row ids through the hash table
      Pros
      • Very easy and fast to update
      • Fast lookup: a single hash table lookup
      Cons
      • Only provides equality matching
      • Unable to answer inequality queries
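As a rough sketch of the reverse mapping and of the hash index, a Python dictionary can stand in for the hash table. The `rows` data and the helper name `build_hash_index` are assumptions made for this example, not part of any database API.

```python
from collections import defaultdict

# Primary storage (forward mapping): row_id -> row data. Rows are invented.
rows = {
    1: {"nick": "ann", "age": 24},
    2: {"nick": "bob", "age": 31},
    3: {"nick": "cat", "age": 24},
}


def build_hash_index(rows, column):
    """Reverse mapping: column value -> list of row ids."""
    index = defaultdict(list)
    for row_id, row in rows.items():
        index[row[column]].append(row_id)
    return index


age_index = build_hash_index(rows, "age")

# Search with the index: get the row ids, then fetch from primary storage.
row_ids = age_index.get(24, [])
hits = [rows[row_id] for row_id in row_ids]
print(row_ids, [hit["nick"] for hit in hits])
```

A query such as `age < 30` cannot use this structure without visiting every key, which illustrates the "equality only" limitation.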
  • 13. Database structures: Indexes – BTree index
      (Figure: a binary search tree next to a B-Tree)
      Pros
      • Provides range and inequality queries easily
      • Quite fast (logarithmic) operations
      Cons
      • More complex and expensive to update
      • B-Tree rebalancing
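The benefit of an ordered index for range queries can be illustrated with a sorted list and binary search. Note the substitution: this is not a B-Tree (there are no nodes, no paging, no rebalancing), only a sorted-array stand-in that shares the logarithmic lookup and the cheap range scan. The data and the name `range_lookup` are assumptions.

```python
import bisect

# Hypothetical column values: row_id -> age.
ages = {1: 24, 2: 31, 3: 24, 4: 57}

# Ordered index: (key, row_id) pairs kept sorted by key.
sorted_index = sorted((age, row_id) for row_id, age in ages.items())
keys = [age for age, _ in sorted_index]


def range_lookup(low, high):
    """Return row ids whose key k satisfies low <= k < high."""
    start = bisect.bisect_left(keys, low)   # logarithmic search for the start
    end = bisect.bisect_left(keys, high)    # logarithmic search for the end
    return [row_id for _, row_id in sorted_index[start:end]]


print(range_lookup(24, 32))
```

Inserting a new key into the sorted list costs a shift of the following entries; a real B-Tree pays for its updates with node splits and rebalancing instead.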
  • 14. Choosing how to search
      Is indexed search always better?
      • SELECT * FROM author WHERE age < 300;
      Analysis
      • The condition matches every row, so the whole table is fetched
      • Index: random lookups
      • Full scan: sequential fetch
      Choosing wisely
      • Identify the expensive queries
      • Use the EXPLAIN statement
      • Only add indexes where they are required
      • Indexes are expensive to update
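To see a query plan without installing a database server, one option is Python's built-in `sqlite3` module. This is a sketch under assumptions: SQLite spells the statement `EXPLAIN QUERY PLAN` rather than the plain `EXPLAIN` of the slide, the index name `idx_author_age` is invented, and the exact wording of the plan differs between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE author (id INTEGER PRIMARY KEY, nick VARCHAR(16), age INTEGER)"
)
conn.execute("CREATE INDEX idx_author_age ON author(age)")  # hypothetical index name
conn.executemany(
    "INSERT INTO author (nick, age) VALUES (?, ?)",
    [("ann", 24), ("bob", 31), ("cat", 24)],
)

# Ask the optimizer how it would run the query, without running it.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM author WHERE age = 24"
).fetchall()
for row in plan:
    print(row[-1])  # the last column holds the human-readable plan step
```

Dropping the index and running the same statement again should show a table scan instead, which is a quick way to check whether an index is actually used.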
  • 15. Joining
      Goal
      • Put together data from several tables
      • For some values in table A, find matching values in table B
      Example
      • SELECT * FROM post INNER JOIN author ON author.id = post.author_id WHERE author.age = 42;
  • 16. Join algorithms
      Nested loops
      • Foreach (author WHERE age=42) {
            Foreach (post) {
                if (post.author_id == author.id) {
                    append post to the result set;
                }
            }
        }
      • Very naive algorithm: runs in P×A time (P posts, A matching authors)
      • Supports all join predicates
      Hash join
      • Algorithm
          – Make a hashtable of the author ids matching the « age = 42 » condition
          – Scan the post table once
          – For each post, look up the hashtable to check whether it matches a valid author
      • Faster than nested loops (2 scans instead of A)
      • Requires memory to store the hashtable
      • Only supports equality predicates
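The two algorithms above can be compared side by side in Python. This is a minimal sketch over invented in-memory tables; the function names are assumptions, and a real engine would stream rows from disk.

```python
# Hypothetical tables, invented for illustration.
authors = [
    {"id": 1, "age": 42},
    {"id": 2, "age": 30},
    {"id": 3, "age": 42},
]
posts = [
    {"id": 10, "author_id": 1},
    {"id": 11, "author_id": 2},
    {"id": 12, "author_id": 3},
    {"id": 13, "author_id": 1},
]


def nested_loop_join(authors, posts, age):
    """Scan the whole post table once per matching author."""
    result = []
    for author in authors:
        if author["age"] != age:
            continue
        for post in posts:
            if post["author_id"] == author["id"]:
                result.append(post["id"])
    return result


def hash_join(authors, posts, age):
    """Build a hash set of matching author ids, then scan posts once."""
    wanted = {author["id"] for author in authors if author["age"] == age}
    return [post["id"] for post in posts if post["author_id"] in wanted]


print(nested_loop_join(authors, posts, 42))
print(hash_join(authors, posts, 42))
```

Both return the same posts, possibly in a different order; the hash join trades the memory held by `wanted` for a single pass over `posts`.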
  • 17. Join algorithms
      Merge join
      • Needs both tables sorted by the join key
          – Post sorted by author_id
          – Author sorted by id
      • Performs a single parallel scan of the two tables and identifies matches
      • Fastest algorithm, but needs sorted data
      • Disk-based sort for large data sets
      Choice of join algorithm
      • Performed automatically by the query optimizer (EXPLAIN)
      • Main parameters:
          – Relation cardinalities
          – Data order (presence of an ORDER BY clause?)
          – Available indexes
      • JOINs are always expensive -> schema denormalization
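The merge join described on this slide can be sketched as follows, assuming author ids are unique and both inputs are already sorted on the join key. The tables are invented and `merge_join` is a name chosen for the example.

```python
def merge_join(authors, posts):
    """Join two inputs in one parallel pass.

    Assumes `authors` is sorted by unique "id" and `posts` by "author_id".
    """
    result = []
    i = j = 0
    while i < len(authors) and j < len(posts):
        author_id = authors[i]["id"]
        post_key = posts[j]["author_id"]
        if author_id < post_key:
            i += 1          # this author has no more posts
        elif author_id > post_key:
            j += 1          # this post has no matching author
        else:
            result.append((author_id, posts[j]["id"]))
            j += 1          # ids are unique, so only the post side advances
    return result


authors = [{"id": 1}, {"id": 3}]
posts = [
    {"id": 10, "author_id": 1},
    {"id": 13, "author_id": 1},
    {"id": 11, "author_id": 2},
    {"id": 12, "author_id": 3},
]
pairs = merge_join(authors, posts)
print(pairs)
```

Each input is read exactly once, which is why the algorithm is attractive when the data is already sorted, for example by an index.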
  • 18. Database scaling: Typical workloads
      Mostly-read workloads
      • Example: Wikipedia
      • First solution: high-level (frontend tier) caching
      • Database scaling: 1 master, N slaves
      • Replication of changes from the master to the slaves
      • Does not solve the write bottleneck problem
      High-write workloads
      • Examples: credit cards, Twitter (>1000 tweets/second, 1000s of deliveries)
      • Performance limited by write I/O throughput
      • Because of the « D » (durability) constraint
      • Hard to have more than 1000-2000 writes/second
  • 19. Database scaling: Scaling writes
      Multiple master setups
      • All masters have the same data and share the updates
      • « Share-all » cluster architecture
      • Extremely complex synchronization
          – Bi-directional replication
          – Conflict detection
          – Bad performance
      • Complex resilience
          – Downtime of a master: a resync is needed
      • Complex, heavy and expensive architectures
      (Diagram: Client 1, Client 2, Master 1 and Master 2, with a bi-directional replication flow between the masters)
  • 20. Database scaling: Scaling writes
      Sharding
      • Split the data between the masters based on a criterion
          – Date
          – User id
          – hash(url), …
      • Clients query the correct master for each piece of data
      • No shared data between masters (« share-nothing »)
      (Diagram: Client 1, Client 2, Master 1 and Master 2)
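Client-side sharding on a hashed key can be sketched like this. The `ShardedStore` class is hypothetical: plain dictionaries stand in for the masters, and MD5 is used only because it gives a hash that is stable across processes.

```python
import hashlib


class ShardedStore:
    """Hypothetical client-side router: each key lives on exactly one shard."""

    def __init__(self, shard_count):
        # One dict per master; a real setup would hold database connections.
        self.shards = [{} for _ in range(shard_count)]

    def _shard_for(self, key):
        digest = hashlib.md5(key.encode("utf-8")).hexdigest()
        return self.shards[int(digest, 16) % len(self.shards)]

    def put(self, key, value):
        self._shard_for(key)[key] = value

    def get(self, key):
        return self._shard_for(key).get(key)


store = ShardedStore(shard_count=2)
for user_id in range(100):
    store.put(f"user-{user_id}", {"id": user_id})

print(store.get("user-42"))
print([len(shard) for shard in store.shards])
```

No key is stored twice, which is the « share-nothing » property; it also means that losing one shard loses its keys.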
  • 21. Database scaling: Problems with SQL sharding
      Complexity
      • Not integrated in SQL
      • The sharding has to be done in application code
      Resilience
      • Several machines but no resilience
      • Loss of one master = loss of data (compare to RAID-0)
      Loss of features
      • You can't do cross-shard joins
      Complex evolutions
      • How do you keep scaling?
      • To add another machine, you need to change the distribution function
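The cost of changing the distribution function can be estimated with a short experiment. The sketch assumes the common `hash(key) % shard_count` scheme and invented keys; it counts how many keys land on a different shard when going from two machines to three.

```python
import hashlib


def shard_index(key, shard_count):
    """Modulo-based distribution function (an assumption for this sketch)."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % shard_count


keys = [f"user-{i}" for i in range(10000)]
moved = sum(1 for key in keys if shard_index(key, 2) != shard_index(key, 3))
fraction_moved = moved / len(keys)
print(f"{fraction_moved:.0%} of the keys change shard")
```

With a uniform hash, roughly two thirds of the keys move, so adding one machine means migrating most of the data.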
  • 22. Database scaling: Other SQL shortcomings
      Strict schema
      • It is good: it provides strong typing
      • But, migration hell!
      • Web applications change quickly
      • Not « Agile »
  • 23. On the other side: SEARCH ENGINES
  • 24. A quick look at search engines
      Differences from a traditional database
      • Not designed for OLTP
      • Updates by batches
      • No transactions; updates become available to readers « later »
      • Heavily read-optimized
      Full text search
      • It's more complex than LIKE '%myword%';
      • Needs specific data structures
  • 25. Search engines: Inverted lists
      What it is
      • A data structure mapping a « word identifier » to a list of « document identifiers »
      • For each word of each document, store the positions
      Documents
      • Document 1: The quick fox
      • Document 2: The lazy dog
      • Document 3: The dog quick dog
      Dictionary
      • the = 1, quick = 2, fox = 3, lazy = 4, dog = 5
      Inverted lists
      • List for word 1 (the): doc 1 (at position 0), doc 2 (at position 0), doc 3 (at position 0)
      • List for word 2 (quick): doc 1 (at position 1), doc 3 (at position 2)
      • List for word 3 (fox): doc 1 (at position 2)
      • List for word 4 (lazy): doc 2 (at position 1)
      • List for word 5 (dog): doc 2 (at position 2), doc 3 (at positions 1, 3)
      Exalead S.A. © 2010 CONFIDENTIAL
  • 26. Search engines: Searching with inverted lists
      Single word query: dog
      • Resolve the word to its id using the dictionary (wid 5)
      • Fetch the inverted list for this id
      • Simply read the inverted list
      • We have the hits: document 2 and document 3
      Boolean query: the AND dog
      • Resolve words, fetch inverted lists
      • The: 1,2,3 Dog: 2,3
      • Perform intersection: hits = 2,3
      Boolean query: the OR dog
      • Resolve/fetch
      • Perform union: hits = 1, 2, 3
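The three example documents are enough to build a small inverted index in Python and replay the queries above. As a simplification, this sketch keys the lists by the lowercased word itself rather than by a numeric word identifier; the function names are assumptions.

```python
from collections import defaultdict

docs = {1: "The quick fox", 2: "The lazy dog", 3: "The dog quick dog"}


def build_inverted_index(docs):
    """Map each word to {doc_id: [positions]}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for position, word in enumerate(text.lower().split()):
            index[word].setdefault(doc_id, []).append(position)
    return index


def search_and(index, first, second):
    """Documents containing both words: intersection of the two lists."""
    return sorted(set(index.get(first, {})) & set(index.get(second, {})))


def search_or(index, first, second):
    """Documents containing either word: union of the two lists."""
    return sorted(set(index.get(first, {})) | set(index.get(second, {})))


index = build_inverted_index(docs)
print(sorted(index["dog"]))
print(search_and(index, "the", "dog"))
print(search_or(index, "the", "dog"))
```

Production engines keep the lists sorted by document id so that intersections and unions can be computed by merging, without building sets.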
  • 27. Search engines: Searching with inverted lists
      Positional query: the NEXT dog
      • Fetch the inverted lists and also read the positions
      • The: 1(0), 2(0), 3(0) Dog: 2(2), 3(1,3)
      • Identify the "simple boolean" matches: docs 2 and 3
      • For each possible match, check whether the positions form a sequence
      • Only document 3 matches, on sequence (0,1)
      • Positional queries are more expensive, and storing word positions is expensive (disk space, decoding CPU, I/O)
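The positional check can be sketched on the same two inverted lists. The `postings` dictionary copies the positions given on the slide; `search_next` is a hypothetical helper name.

```python
# Inverted lists with positions, as given above: word -> {doc_id: [positions]}.
postings = {
    "the": {1: [0], 2: [0], 3: [0]},
    "dog": {2: [2], 3: [1, 3]},
}


def search_next(postings, first, second):
    """Documents where `second` appears right after `first`."""
    first_list = postings.get(first, {})
    second_list = postings.get(second, {})
    hits = []
    # Step 1: simple boolean match (both words present).
    for doc_id in sorted(set(first_list) & set(second_list)):
        following = set(second_list[doc_id])
        # Step 2: check that some pair of positions forms a sequence.
        if any(position + 1 in following for position in first_list[doc_id]):
            hits.append(doc_id)
    return hits


print(search_next(postings, "the", "dog"))
```

Document 2 passes the boolean step but fails the position check (positions 0 and 2), which shows the extra work a positional query requires.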
  • 28. The revolution: THE NOSQL MOVEMENT
  • 29. NoSQL Movement
      • « NoSQL » © Eric EVANS (Rackspace, 2009)
      "The name was an attempt to describe the emergence of a growing number of non-relational, distributed data stores that often did not attempt to provide ACID guarantees." (Wikipedia)
  • 30. NoSQL Movement: Issue
      • RDBMSs fail with huge amounts of data
          – Facebook's 70TB of inbox
          – Digg's 3TB
          – eBay's 2PB…
      • High-scale SQL systems are either:
          – Very expensive to buy and quite to maintain
          – Very expensive to maintain
  • 31. NoSQL Movement
      • We need new systems that:
          – Scale horizontally (both reads and writes)
          – Have no single point of failure
          – Are fault tolerant
          – Are elastic (adding nodes is easy)
          – Have flexible data schemas
          – Are friendlier to web applications
  • 32. NoSQL: Families
      • Different types of data stores:
          – Key-value stores (Dynamo, Redis, Voldemort…)
          – Column stores (BigTable, Cassandra, HBase…)
          – Document stores (CouchDB, MongoDB…)
          – Graph stores (Neo4J, Swarm…)
  • 33. NoSQL: Key-Value stores
      • Distributed hashtables
          – Btrees
          – Fixed-size tables
      • Benefits:
          – Very simple API (get/put/delete/range)
          – Easily shardable
          – Fast reads
      • Drawbacks:
          – No data schema (no joins, data flattening…)
          – No query language
      • Implementations: Redis, Amazon Dynamo, Voldemort
  • 34. NoSQL: Column Stores
        Id | Lastname | Firstname | Salary
        1  | Smith    | Joe       | 40000
        2  | Jones    | Mary      | 50000
        3  | Johnson  | Cathy     | 44000
      • Row-based storage:
          – 1,Smith,Joe,40000;2,Jones,Mary,50000;3,Johnson,Cathy,44000;
      • Column-based storage:
          – 1,2,3;Smith,Jones,Johnson;Joe,Mary,Cathy;40000,50000,44000;
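The two layouts can be reproduced from the same table with a few lines of Python. The serialization format (commas and semicolons) follows the slide; the function names are assumptions, and real column stores use binary, often compressed, encodings.

```python
rows = [
    (1, "Smith", "Joe", 40000),
    (2, "Jones", "Mary", 50000),
    (3, "Johnson", "Cathy", 44000),
]


def row_layout(rows):
    """Store each record contiguously: all fields of row 1, then row 2, ..."""
    return "".join(",".join(str(value) for value in row) + ";" for row in rows)


def column_layout(rows):
    """Store each column contiguously: all ids, then all last names, ..."""
    columns = zip(*rows)  # transpose rows into columns
    return "".join(",".join(str(value) for value in column) + ";" for column in columns)


print(row_layout(rows))
print(column_layout(rows))

# An aggregate over one column only touches one contiguous run of values.
salaries = list(zip(*rows))[3]
print(sum(salaries))
```

Summing the salaries reads a single contiguous segment in the column layout, whereas the row layout forces a pass over every field of every record.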
  • 35. NoSQL: Column Stores
      • Benefits:
          – Reading all the values of a given column is faster (e.g. aggregates)
          – Batch writes are faster
      • Joins are faster
          – Comparing two columns is sequential
          – Many more L1 CPU cache hits
          – L1 cache reference: 0.5 ns
          – L2 cache reference: 7 ns
  • 36. NoSQL: Column Stores
      • Drawbacks:
          – Reading a single object is slower (multiple I/Os)
          – Writing a single object is slower (multiple I/Os)
          – Doesn't fit most applications
      • Finally:
          – Well suited for heavy write / read applications (e.g. Facebook inbox indexes)
  • 37. NoSQL: Document Stores
      • Can be seen as a schema-free, hierarchical database (usually represented as JSON)
      SQL schema (Person 1 – N Animal):
      • Person: id, name, address, phone
      • Animal: id, person_id, name, address, phone
      Document store:
      • Person: id, name, address, phone, animals = list of (id, person_id, name, address, phone)
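As an illustration of the document model, the Person record and its animals can be written as one nested structure and serialized to JSON. All field values below are invented; only the field names come from the schema above.

```python
import json

# One document holds the person and the nested animals (values are made up).
person = {
    "id": 1,
    "name": "Alice",
    "address": "1 Example Street",
    "phone": "555-0100",
    "animals": [
        {"id": 1, "person_id": 1, "name": "Rex",
         "address": "1 Example Street", "phone": "555-0100"},
        {"id": 2, "person_id": 1, "name": "Tom",
         "address": "1 Example Street", "phone": "555-0100"},
    ],
}

encoded = json.dumps(person, indent=2)
decoded = json.loads(encoded)

# Reading a person and all of its animals is a single fetch, with no join.
print([animal["name"] for animal in decoded["animals"]])
```

In the SQL schema the same read needs a join between Person and Animal on `person_id`.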
  • 38. NoSQL: Document Stores
      • Benefits:
          – Data locality! Everything is in one place
          – Efficient writes and updates (in place)
          – Efficient reads
          – Highly flexible data schema
          – Usually provide indexes over each object key, which gives a powerful query language
      • Drawbacks:
          – Do not encourage a well-designed data schema
  • 39. NoSQL: Graph Stores
      • An entry is a node
      • Nodes have properties
      • Edges are links between nodes
  • 40. NoSQL: Graph Stores
      • Benefits:
          – Faster to fetch an entry and its related entries (links are already resolved, no need to join)
          – Flexible data schema
      • Drawbacks:
          – Complex APIs
          – Slow for batch operations
          – Open source implementations are not that good…
  • 41. The real issues… SCALABILITY IN PRACTICE
  • 42. CAP Theorem
      • CAP:
          – Consistency: operating fully or not at all.
          – Availability: the service must be reachable at any time.
          – Partition Tolerance: no set of failures less than total network failure is allowed to cause the system to respond incorrectly.
      "Any shared-data system can only achieve two of these three." (CAP Theorem, Dr. Eric Brewer, Berkeley, 2000)
  • 43. Consistent Hashing
      • Ensuring data availability: replication!
      • Reaching the right nodes? Hashing
      • Consistent hashing: hash ring
          – Objects are mapped into a range
          – Nodes are mapped into that range
          – We write the object into the nearest node, clockwise
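A hash ring can be sketched with a sorted list of node positions and a binary search. This is a minimal version under assumptions: node names are invented, MD5 maps both nodes and keys into the same range, and replication and virtual nodes are left out.

```python
import bisect
import hashlib


def ring_position(name):
    """Map a node name or an object key into the same integer range."""
    return int(hashlib.md5(name.encode("utf-8")).hexdigest(), 16)


class HashRing:
    def __init__(self, nodes):
        self._ring = sorted((ring_position(node), node) for node in nodes)
        self._positions = [position for position, _ in self._ring]

    def node_for(self, key):
        """Nearest node clockwise: first node at or after the key's position."""
        index = bisect.bisect_left(self._positions, ring_position(key))
        return self._ring[index % len(self._ring)][1]  # wrap around the ring


keys = [f"key-{i}" for i in range(200)]
before = HashRing(["node-a", "node-b", "node-c"])
after = HashRing(["node-a", "node-b", "node-c", "node-d"])

moved = [key for key in keys if before.node_for(key) != after.node_for(key)]
print(len(moved), "of", len(keys), "keys moved")
```

Every key that moves goes to the new node and all other keys stay where they were, unlike the modulo scheme where adding a machine reshuffles most of the data.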
  • 44. Data consistency
      • Ensuring eventual data consistency: quorum writes
          – W = number of replica writes to confirm before returning OK
          – R = number of replicas to read from
          – N = replication factor
      • W < N == High write availability
          – Data may be lost, or outdated if read from another node
      • R < N == High read availability
          – Data may be outdated
      • W + R > N == Full consistency!
          – But slower writes / reads
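The quorum rule comes down to a small calculation: if W + R > N, every set of R replicas read overlaps every set of W replicas written. The helper below is a hypothetical sketch of that arithmetic, not the API of any particular store.

```python
def quorum_properties(n, w, r):
    """Summarize what a (N, W, R) configuration guarantees."""
    if not (1 <= w <= n and 1 <= r <= n):
        raise ValueError("W and R must be between 1 and N")
    return {
        # Read and write sets must share at least one replica.
        "read_sees_latest_write": w + r > n,
        # Number of replicas that may be down while writes still succeed.
        "write_failures_tolerated": n - w,
        # Number of replicas that may be down while reads still succeed.
        "read_failures_tolerated": n - r,
    }


strict = quorum_properties(n=3, w=2, r=2)
fast = quorum_properties(n=3, w=1, r=1)
print(strict)
print(fast)
```

With N = 3, choosing W = 2 and R = 2 gives overlapping quorums and still tolerates one failed replica on each path, while W = 1 and R = 1 is faster but may return outdated data.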
  • 45. Conflict resolution
      • What happens when R > 1 and two different versions are found?
      • Conflict resolution!
      • Common algorithm: vector clocks
  • 46. Vector clocks
      • Assign a unique ID to each node
      • A node increments its own entry in the vector and keeps track of the old entries
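A vector clock can be represented as a dictionary from node ID to counter. The sketch below shows the usual increment and comparison rules; the node IDs are invented, and it only detects conflicts, leaving the choice of a resolution policy to the application.

```python
def increment(clock, node_id):
    """Return a copy of the clock where `node_id` has recorded one more event."""
    updated = dict(clock)  # keep the entries of the other nodes
    updated[node_id] = updated.get(node_id, 0) + 1
    return updated


def compare(a, b):
    """Return 'equal', 'before', 'after' or 'conflict' for clocks a and b."""
    nodes = set(a) | set(b)
    a_le_b = all(a.get(node, 0) <= b.get(node, 0) for node in nodes)
    b_le_a = all(b.get(node, 0) <= a.get(node, 0) for node in nodes)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"   # b descends from a: b can safely replace a
    if b_le_a:
        return "after"
    return "conflict"     # concurrent updates: needs resolution


v1 = increment({}, "A")   # written through node A
v2 = increment(v1, "B")   # then updated through node B
v3 = increment(v1, "C")   # concurrently updated through node C

print(compare(v1, v2))
print(compare(v2, v3))
```

Versions `v2` and `v3` both descend from `v1` but neither descends from the other, which is exactly the case where a read with R > 1 finds two different versions.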
  • 47. Elasticity: Gossip Membership
      • When a node joins…
  • 48. Elasticity: Gossip Membership
      • When a node crashes!
  • 49. I'm starting the next big startup… WHAT'S THE BEST SYSTEM?
  • 50. Choosing your storage system
      • "Don't optimize too early"
      • MySQL is robust and works VERY well
          – You'll know where bugs come from (you)
      • Key-value stores are hype, and often badly implemented
      • Anyway, the most mature "NoSQL" systems are:
          – MongoDB
          – Cassandra