DotNetToscana: NoSQL Revolution - Scalability

Published on http://www.dotnettoscana.org/nosql-revolution.aspx
Published in: Technology
  1. Scalability
     Nicola Baldi (http://it.linkedin.com/in/nicolabaldi)
     Luigi Berrettini (http://it.linkedin.com/in/luigiberrettini)
  2. The need for speed (15/12/2012)
  3. Companies continuously grow
      more and more data and traffic
      more and more computing resources needed
     SOLUTION: SCALING
  4. Vertical scalability = scale up
      single server
      performance ⇒ more resources (CPUs, storage, memory)
      volumes increase ⇒ more difficult and expensive to scale
      not reliable: individual machine failures are common
     Horizontal scalability = scale out
      cluster of servers
      performance ⇒ more servers
      cheaper hardware (more likely to fail)
      volumes increase ⇒ complexity ~ constant, costs ~ linear
      reliability: CAN operate despite failures
      complex: use only if the benefits are compelling
  5. Vertical scalability
  6. All data on a single node
     Use cases
      data usage = mostly processing aggregates
      many graph databases
     Pros/Cons
      RDBMSs or NoSQL databases
      simplest and most often recommended option
      only vertical scalability
  7. Horizontal scalability: architectures and distribution models
  8. Shared everything
      every node has access to all data
      all nodes share memory and disk storage
      used in some RDBMSs
  9. Shared disk
      every node has access to all data
      all nodes share disk storage
      used in some RDBMSs
  10. Shared nothing
       nodes are independent and self-sufficient
       no shared memory or disk storage
       used in some RDBMSs and all NoSQL databases
  11. Sharding: different data put on different nodes
      Replication: same data copied over multiple nodes
      Sharding + replication: the two orthogonal techniques combined
  12. Sharding puts different parts of the data onto different nodes
       data accessed together (aggregates) are on the same node
       clumps arranged by physical location, to keep the load even, or according to any domain-specific access rule
      [Diagram: three shards, each serving reads and writes for a different subset of the data (A through I)]
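As a sketch of how "different parts of the data onto different nodes" can work in practice, the routine below hashes an aggregate's key to pick its shard. The shard names and the key format are illustrative assumptions, not something from the slides:

```python
# Illustrative sketch: route each aggregate to a shard by hashing its key,
# so everything belonging to one aggregate lands on the same node.
import hashlib

SHARDS = ["shard-A", "shard-B", "shard-C"]  # hypothetical node names

def shard_for(aggregate_key: str) -> str:
    """Deterministically map an aggregate's key to one shard."""
    digest = hashlib.md5(aggregate_key.encode()).hexdigest()
    return SHARDS[int(digest, 16) % len(SHARDS)]

# The same aggregate key always resolves to the same shard,
# so data accessed together stays on one node.
assert shard_for("customer:42") == shard_for("customer:42")
```

Real stores usually refine this with consistent hashing or range-based partitioning so that adding a node does not remap most keys.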
  13. Use cases
       different people access different parts of the dataset
       to horizontally scale writes
      Pros/Cons
       "manual" sharding possible with every RDBMS or NoSQL store
       better read performance
       better write performance
       low resilience: when a node fails, all data except that node's remains available
       high licensing costs for RDBMSs
       difficult or impossible cluster-level operations (querying, transactions, consistency controls)
  14. Master-slave replication: data replicated across multiple nodes
       one designated master (primary) node
        • contains the original
        • processes writes and passes them on
       all other nodes are slaves (secondary)
        • contain the copies
        • synchronized with the master during a replication process
  15. [Diagram: master-slave replication, with one master handling reads and writes and replicating the data (A, B, C) to two read-only slaves]
  16. Use cases
       load balancing cluster: data usage mostly read-intensive
       failover cluster: single server with hot backup
      Pros/Cons
       better read performance
       worse write performance (write management)
       high read (slave) resilience: master failure ⇒ slaves can still handle read requests
       low write (master) resilience: master failure ⇒ no writes until the old or a new master is up
       read inconsistencies: an update may not yet have propagated to all slaves
       master = bottleneck and single point of failure
       high licensing costs for RDBMSs
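The read/write split described above can be sketched as a tiny router: writes always go to the master, while reads are spread round-robin over the slaves for load balancing. Node names here are hypothetical:

```python
# Minimal sketch of master-slave request routing (node names are made up).
import itertools

class MasterSlaveRouter:
    def __init__(self, master, slaves):
        self.master = master
        self._reads = itertools.cycle(slaves)  # round-robin over slaves

    def route_write(self):
        # All writes go to the master, which propagates them to the slaves.
        return self.master

    def route_read(self):
        # Reads are load-balanced across the slaves (and may return stale
        # data if replication has not caught up yet).
        return next(self._reads)

router = MasterSlaveRouter("master", ["slave-1", "slave-2"])
assert router.route_write() == "master"
```

Routing every read through the slaves is what gives the better read performance on the slide, at the cost of the read inconsistencies it also lists.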
  17. Peer-to-peer replication: data replicated across multiple nodes
       all nodes are peers (equal weight): no master, no slaves
       all nodes can both read and write
  18. [Diagram: peer-to-peer replication, with three peer nodes each holding a full copy of the data (A, B, C) and each handling both reads and writes]
  19. Use cases
       load balancing cluster: data usage read/write-intensive
       need to scale out more easily
      Pros/Cons
       better read performance
       better write performance
       high resilience: node failure ⇒ reads/writes handled by the other nodes
       read inconsistencies: an update may not yet have propagated to all nodes
       write inconsistencies: the same record written on two nodes at the same time
       high licensing costs for RDBMSs
  20. Sharding + master-slave replication
       multiple masters
       each data item has a single master
       node configurations:
        • master
        • slave
        • master for some data / slave for other data
      Sharding + peer-to-peer replication
  21. [Diagram: two sharding + master-slave layouts, with the data (A through I) split across three nodes and each node acting as master for some shards and slave for others]
  22. [Diagram: two sharding + peer-to-peer layouts, with the data (A through I) split across three pairs of peer nodes, each pair replicating its shard and serving both reads and writes]
  23. Oracle Database
       Oracle RAC: shared everything
      Microsoft SQL Server
       all editions: shared nothing; master-slave replication
      IBM DB2
       DB2 pureScale: shared disk
       DB2 HADR: shared nothing; master-slave replication (failover cluster)
  24. Oracle MySQL
       MySQL Cluster: shared nothing; sharding, replication, sharding + replication
      The PostgreSQL Global Development Group: PostgreSQL
       PGCluster-II: shared disk
       Postgres-XC: shared nothing; sharding, replication, sharding + replication
  25. Horizontal scalability: consistency
  26. Inconsistent write = write-write conflict
       multiple writes of the same data at the same time (highly likely with peer-to-peer replication)
      Inconsistent read = read-write conflict
       a read in the middle of someone else's write
  27. Pessimistic approach: prevent conflicts from occurring
      Optimistic approach: detect conflicts and fix them
  28. Pessimistic approach
      Implementation
       write locks ⇒ acquire a lock before updating a value (only one lock at a time can be taken)
      Pros/Cons
       often severely degrades system responsiveness
       often leads to deadlocks (hard to prevent and debug)
       relies on a consistent serialization of the updates*
      * sequential consistency: ensuring that all nodes apply operations in the same order
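A minimal in-process illustration of the write-lock idea (a distributed implementation would need a lock manager; this sketch only shows the "one lock holder at a time" rule):

```python
import threading

class Account:
    """Pessimistic concurrency sketch: take a write lock before updating."""
    def __init__(self, balance):
        self.balance = balance
        self._lock = threading.Lock()

    def withdraw(self, amount):
        # Only one writer can hold the lock at a time, so concurrent
        # withdrawals cannot interleave into a write-write conflict.
        with self._lock:
            self.balance -= amount

acct = Account(100)
acct.withdraw(30)
assert acct.balance == 70
```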
  29. Optimistic approach
      Implementation
       conditional updates ⇒ test a value before updating it (to see if it's changed since the last read)
       merged updates ⇒ save conflicting updates, record the conflict and merge the versions somehow
      Pros/Cons
       conditional updates rely on a consistent serialization of the updates*
      * sequential consistency: ensuring that all nodes apply operations in the same order
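A conditional update is commonly implemented as a compare-and-set on a version number; this hypothetical sketch detects a write-write conflict when the version read earlier has gone stale:

```python
class Record:
    """Optimistic concurrency sketch: a value plus a version counter."""
    def __init__(self, value):
        self.value = value
        self.version = 0

def conditional_update(record, expected_version, new_value):
    # Test before updating: apply the write only if nobody else has
    # changed the record since our last read (compare-and-set).
    if record.version != expected_version:
        return False  # conflict detected: caller must re-read and retry
    record.value = new_value
    record.version += 1
    return True

r = Record("a")
assert conditional_update(r, 0, "b") is True   # first writer succeeds
assert conditional_update(r, 0, "c") is False  # stale version ⇒ conflict
```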
  30. Logical consistency: different data make sense together
      Replication consistency: same data ⇒ same value on different replicas
      Read-your-writes consistency: users continue seeing their own updates
  31. ACID transactions ⇒ aggregate-ignorant DBs
      Partially atomic updates ⇒ aggregate-oriented DBs
       atomic updates within an aggregate
       no atomic updates between aggregates
       updates of multiple aggregates: inconsistency window
       replication can lengthen inconsistency windows
  32. Eventual consistency
       nodes may have replication inconsistencies: stale (out-of-date) data
       eventually all nodes will be synchronized
  33. Session consistency
       within a user's session there is read-your-writes consistency (no stale data read from a node after an update on another one)
       consistency is lost if
        • the session ends
        • the system is accessed simultaneously from different PCs
       implementations
        • sticky session / session affinity = sessions tied to one node
           affects load balancing
           quite intricate with master-slave replication
        • version stamps
           track the latest version stamp seen by a session
           ensure that all interactions with the data store include it
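The version-stamp implementation can be sketched as follows: the session remembers the newest stamp it has seen, and a replica may serve its reads only if it has caught up to that stamp (the class and method names are assumptions for illustration):

```python
class Session:
    """Read-your-writes via version stamps (illustrative sketch)."""
    def __init__(self):
        self.last_seen = 0  # newest version stamp this session has seen

    def record_write(self, stamp):
        # Every interaction with the data store reports its stamp here.
        self.last_seen = max(self.last_seen, stamp)

    def can_read_from(self, replica_stamp):
        # A replica behind the session's last write would return stale
        # data, so the read must be routed to a more up-to-date node.
        return replica_stamp >= self.last_seen

s = Session()
s.record_write(7)                   # session wrote at stamp 7
assert s.can_read_from(7) is True
assert s.can_read_from(5) is False  # replica not caught up yet
```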
  34. Horizontal scalability: CAP theorem
  35. Consistency: all nodes see the same data at the same time
      Latency: the response time in interactions between nodes
      Availability: every nonfailing node must reply to requests; in practice it is the limit of latency we are prepared to tolerate: once latency gets too high, we give up and treat the data as unavailable
      Partition tolerance: the cluster can survive communication breakages (which separate it into partitions unable to communicate with each other)
  36. Transaction to transfer $50 from account A to account B:
       1) read(A)
       2) A = A – 50
       3) write(A)
       4) read(B)
       5) B = B + 50
       6) write(B)
      Atomicity
       if the transaction fails after 3 and before 6, the system must ensure its updates are not reflected in the database
      Consistency
       A + B is unchanged by the execution of the transaction
  37. Isolation (same transaction as above)
       another transaction would see inconsistent data between steps 3 and 6 (A + B would be less than it should be)
       isolation can be ensured trivially by running transactions serially ⇒ performance issue
      Durability
       once the user is notified that the transaction completed ($50 transferred), its updates must persist despite failures
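The transfer above can be reproduced with any transactional store; the sketch below uses an in-memory SQLite database (the table and column names are assumptions) to show atomicity preserving the A + B invariant:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
con.executemany("INSERT INTO accounts VALUES (?, ?)", [("A", 100), ("B", 100)])

try:
    with con:  # atomicity: commit on success, roll back on any failure
        con.execute("UPDATE accounts SET balance = balance - 50 WHERE name = 'A'")
        con.execute("UPDATE accounts SET balance = balance + 50 WHERE name = 'B'")
except sqlite3.Error:
    pass  # a failed transfer leaves both balances untouched

# Consistency: A + B is unchanged by the transaction
total = con.execute("SELECT SUM(balance) FROM accounts").fetchone()[0]
assert total == 200
```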
  38. BASE: Basically Available, Soft state, Eventually consistent
      Soft state and eventual consistency are techniques that work well in the presence of partitions and thus promote availability
  39. Given the three properties of Consistency, Availability and Partition tolerance, you can only get two
  40. A single server trivially satisfies all three:
      C: being up and keeping consistency is reasonable
      A: one node: if it's up, it's available
      P: a single machine can't partition
  41. AP (sacrificing C): on a partition, an update on one node ⇒ inconsistency
  42. CP (sacrificing A): on a partition, consistency holds only if one nonfailing node stops replying to requests
  43. CA (sacrificing P)
       nodes communicate ⇒ C and A can be preserved
       on a partition ⇒ all nodes in one partition must be turned off (failing nodes preserve A)
       difficult and expensive
  44. ACID databases focus on consistency first and availability second
      BASE databases focus on availability first and consistency second
  45. Single server
       no partitions
       consistency versus performance: relaxed isolation levels or no transactions
      Cluster
       consistency versus latency/availability
       durability versus performance (e.g. in-memory DBs)
       durability versus latency (e.g. the master acknowledges the update to the client only after it has been acknowledged by some slaves)
  46. strong write consistency ⇒ write to the master
      strong read consistency ⇒ read from the master
  47. N = replication factor (nodes involved in replication, NOT nodes in the cluster)
      W = nodes confirming a write
      R = nodes needed for a consistent read
      write quorum: W > N/2
      read quorum: R + W > N
      Consistency is on a per-operation basis: choose the most appropriate combination of problems and advantages
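The two quorum conditions on the slide translate directly into code; this sketch checks them for a given replication factor:

```python
def is_write_quorum(w, n):
    # W > N/2: a strict majority of the N replicas must confirm the write,
    # so two conflicting writes can never both reach a quorum.
    return w > n / 2

def has_consistent_reads(r, w, n):
    # R + W > N: the read set and the write set must overlap in at least
    # one node, so every read sees the latest confirmed write.
    return r + w > n

# N = 3 replicas: W = 2, R = 2 is the usual balanced choice
assert is_write_quorum(2, 3)
assert has_consistent_reads(2, 2, 3)
# W = 3 writes to every replica, so a single-node read suffices
assert has_consistent_reads(1, 3, 3)
# W = 1, R = 1 favors latency but gives up read consistency
assert not has_consistent_reads(1, 1, 3)
```

This is why consistency is per-operation: each read or write can pick its own R or W, trading latency against the guarantee.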
