DotNetToscana: NoSQL Revolution - Scalability
 

http://www.dotnettoscana.org/nosql-revolution.aspx

Presentation Transcript

    • Scalability
      Nicola Baldi (http://it.linkedin.com/in/nicolabaldi)
      Luigi Berrettini (http://it.linkedin.com/in/luigiberrettini)
    • The need for speed
    • Companies grow continuously
      - more and more data and traffic
      - more and more computing resources needed
      - the solution: SCALING
    • vertical scalability = scale up
      - single server
      - performance ⇒ more resources (CPUs, storage, memory)
      - volumes increase ⇒ more difficult and expensive to scale
      - not reliable: individual machine failures are common
      horizontal scalability = scale out
      - cluster of servers
      - performance ⇒ more servers
      - cheaper hardware (more likely to fail)
      - volumes increase ⇒ complexity ~ constant, costs ~ linear
      - reliability: CAN operate despite failures
      - complex: use only if the benefits are compelling
    • Vertical scalability
    • All data on a single node
      Use cases
      - data usage = mostly processing aggregates
      - many graph databases
      Pros/Cons
      - RDBMSs or NoSQL databases
      - simplest and most often recommended option
      - only vertical scalability
    • Horizontal scalability: architectures and distribution models
    • Shared everything
      - every node has access to all data
      - all nodes share memory and disk storage
      - used on some RDBMSs
    • Shared disk
      - every node has access to all data
      - all nodes share disk storage
      - used on some RDBMSs
    • Shared nothing
      - nodes are independent and self-sufficient
      - no shared memory or disk storage
      - used on some RDBMSs and all NoSQL databases
    • Sharding: different data put on different nodes
      Replication: the same data copied over multiple nodes
      Sharding + replication: the two orthogonal techniques combined
    • Different parts of the data are put onto different nodes
      - data accessed together (aggregates) is kept on the same node
      - clumps are arranged by physical location, to keep the load even, or according to any domain-specific access rule
      [diagram: three shards, each holding a different subset of the data (A-C, D-F, G-I) and serving reads and writes for it]
    • Use cases  different people access different parts of the dataset  to horizontally scale writes Pros/Cons  “manual” sharding with every RDBMS or NoSQL store  better read performance  better write performance  low resilience: all but failing node data available  high licensing costs for RDBMSs  difficult or impossible cluster-level operations (querying, transactions, consistency controls)15/12/2012 Scalability – Horizontal scalability: architectures and distribution models 13
    • Data is replicated across multiple nodes
      - one designated master (primary) node
        • contains the original
        • processes writes and passes them on
      - all other nodes are slaves (secondary)
        • contain the copies
        • are synchronized with the master during a replication process
    • MASTER-SLAVE REPLICATION
      [diagram: one master node holding the data (A, B, C) and handling reads and writes; two slave nodes holding copies and serving reads only]
    • Use cases  load balancing cluster: data usage mostly read-intensive  failover cluster: single server with hot backup Pros/Cons  better read performance  worse write performance (write management)  high read (slave) resilience: master failure ⇒ slaves can still handle read requests  low write (master) resilience: master failure ⇒ no writes until old/new master is up  read inconsistencies: update not propagated to all slaves  master = bottleneck and single point of failure  high licensing costs for RDBMSs15/12/2012 Scalability – Horizontal scalability: architectures and distribution models 16
    • Data is replicated across multiple nodes
      - all nodes are peers (equal weight): no master, no slaves
      - all nodes can handle both reads and writes
    • PEER-TO-PEER REPLICATION
      [diagram: three peer nodes, each holding a full copy of the data (A, B, C) and serving both reads and writes]
    • Use cases  load balancing cluster: data usage read/write-intensive  need to scale out more easily Pros/Cons  better read performance  better write performance  high resilience: node failure ⇒ reads/writes handled by other nodes  read inconsistencies: update not propagated to all nodes  write inconsistencies: same record at the same time  high licensing costs for RDBMSs15/12/2012 Scalability – Horizontal scalability: architectures and distribution models 19
    • Sharding + master-slave replication
      - multiple masters, but each data item has a single master
      - node configurations:
        • master
        • slave
        • master for some data / slave for other data
      Sharding + peer-to-peer replication
    • [diagram: sharding + master-slave replication with three shards (A-C, D-F, G-I); each shard has one master and one or more slaves, and a node can be master for some shards and slave for others (e.g. Master 1, Master/Slave 2, Slave 3)]
    • [diagram: sharding + peer-to-peer replication with three shards (A-C, D-F, G-I); each shard is replicated across a pair of peers, and every peer serves reads and writes for the shards it holds (e.g. Peer 1/2, Peer 3/4, Peer 5/6)]
    • Oracle Database
      - Oracle RAC: shared everything
      Microsoft SQL Server
      - all editions: shared nothing, master-slave replication
      IBM DB2
      - DB2 pureScale: shared disk
      - DB2 HADR: shared nothing, master-slave replication (failover cluster)
    • Oracle MySQL
      - MySQL Cluster: shared nothing; sharding, replication, sharding + replication
      The PostgreSQL Global Development Group PostgreSQL
      - PGCluster-II: shared disk
      - Postgres-XC: shared nothing; sharding, replication, sharding + replication
    • Horizontal scalability: consistency
    • Inconsistent write = write-write conflict
      - multiple writes of the same data at the same time (highly likely with peer-to-peer replication)
      Inconsistent read = read-write conflict
      - a read in the middle of someone else's write
    • Pessimistic approach: prevent conflicts from occurring
      Optimistic approach: detect conflicts and fix them
    • Implementation
      - write locks ⇒ acquire a lock before updating a value (only one lock at a time can be taken)
      Pros/Cons
      - often severely degrade system responsiveness
      - often lead to deadlocks (hard to prevent and debug)
      - rely on a consistent serialization of the updates*
      * sequential consistency: ensuring that all nodes apply operations in the same order
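A minimal sketch of the pessimistic approach as a per-key write lock, using local threads only; in a real cluster the lock would have to be distributed (e.g. through a lock manager), which is exactly where the responsiveness and deadlock problems mentioned above come from.

```python
import threading
from collections import defaultdict

data = {"counter": 0}
locks = defaultdict(threading.Lock)   # one write lock per key (illustrative)

def pessimistic_increment(key: str) -> None:
    # Acquire the lock before touching the value: conflicting writers
    # are blocked up front instead of being detected after the fact.
    with locks[key]:
        data[key] = data.get(key, 0) + 1

threads = [threading.Thread(target=pessimistic_increment, args=("counter",))
           for _ in range(10)]
for t in threads: t.start()
for t in threads: t.join()
print(data["counter"])   # always 10: no write-write conflicts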
    • Implementation
      - conditional updates ⇒ test a value before updating it (to see if it has changed since the last read)
      - merged updates ⇒ merge conflicting updates somehow (save the updates, record the conflict and merge it somehow)
      Pros/Cons
      - conditional updates rely on a consistent serialization of the updates*
      * sequential consistency: ensuring that all nodes apply operations in the same order
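A minimal sketch of the optimistic approach as a conditional update: each value carries a version number, and a write succeeds only if the version has not changed since it was read (compare-and-set). The store class and the retry loop are illustrative assumptions.

```python
class VersionedStore:
    """Each key maps to (version, value)."""

    def __init__(self):
        self._data = {}

    def read(self, key):
        return self._data.get(key, (0, None))

    def conditional_update(self, key, expected_version, new_value) -> bool:
        version, _ = self._data.get(key, (0, None))
        if version != expected_version:
            return False               # someone else wrote in the meantime
        self._data[key] = (version + 1, new_value)
        return True

def increment_with_retry(store: VersionedStore, key: str) -> None:
    # Detect the conflict and fix it by re-reading and retrying.
    while True:
        version, value = store.read(key)
        if store.conditional_update(key, version, (value or 0) + 1):
            return

store = VersionedStore()
for _ in range(3):
    increment_with_retry(store, "counter")
print(store.read("counter"))           # (3, 3)
```

Retrying is only one way to "fix" a detected conflict; merged updates instead keep both versions and reconcile them with domain-specific logic.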
    • Logical consistency: different data make sense together
      Replication consistency: the same data has the same value on different replicas
      Read-your-writes consistency: users continue seeing their own updates
    • ACID transactions ⇒ aggregate-ignorant DBs
      Partially atomic updates ⇒ aggregate-oriented DBs
      - atomic updates within an aggregate
      - no atomic updates between aggregates
      - updates of multiple aggregates: inconsistency window
      - replication can lengthen inconsistency windows
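A minimal sketch of why an update within one aggregate can be atomic while an update spanning two aggregates opens an inconsistency window; the dict-based document store and the simulated failure are illustrative assumptions.

```python
import copy

# Aggregate-oriented store: each key holds one whole document (aggregate).
db = {
    "wallet:alice": {"balance": 100},
    "wallet:bob":   {"balance": 20},
}

def update_aggregate(key, mutate):
    """Atomic within a single aggregate: build the new document,
    then swap it in with a single assignment."""
    doc = copy.deepcopy(db[key])
    mutate(doc)
    db[key] = doc

def transfer(src, dst, amount, fail_in_between=False):
    """Spans two aggregates: between the two single-aggregate updates
    the data is inconsistent (the inconsistency window)."""
    update_aggregate(src, lambda d: d.update(balance=d["balance"] - amount))
    if fail_in_between:
        raise RuntimeError("crash inside the inconsistency window")
    update_aggregate(dst, lambda d: d.update(balance=d["balance"] + amount))

try:
    transfer("wallet:alice", "wallet:bob", 30, fail_in_between=True)
except RuntimeError:
    pass
print(db)   # alice already debited, bob not yet credited
```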
    • Eventual consistency
      - nodes may have replication inconsistencies: stale (out of date) data
      - eventually all nodes will be synchronized
    • Session consistency
      - within a user's session there is read-your-writes consistency (no stale data is read from one node after an update on another one)
      - consistency is lost if
        • the session ends
        • the system is accessed simultaneously from different PCs
      - implementations
        • sticky session / session affinity = sessions tied to one node
          affects load balancing; quite intricate with master-slave replication
        • version stamps
          track the latest version stamp seen by a session and ensure that all interactions with the data store include it
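A minimal sketch of the version-stamp variant: the session remembers the highest version stamp it has written or seen, and accepts a read only from a replica that has caught up to that stamp. The Replica and Session classes are illustrative assumptions, not a real client API.

```python
class Replica:
    def __init__(self):
        self.version = 0
        self.data = {}

    def apply(self, version, key, value):
        self.version, self.data[key] = version, value

class Session:
    """Tracks the latest version stamp seen by this user's session and
    refuses reads from replicas that are still behind it."""

    def __init__(self):
        self.last_seen = 0

    def write(self, master: Replica, key, value) -> None:
        master.apply(master.version + 1, key, value)
        self.last_seen = master.version

    def read(self, replica: Replica, key):
        if replica.version < self.last_seen:
            raise RuntimeError("replica is stale for this session, try another node")
        return replica.data.get(key)

master, slave = Replica(), Replica()
session = Session()
session.write(master, "profile", "v2")
# session.read(slave, "profile") would raise here: the slave has not replicated yet
slave.apply(master.version, "profile", "v2")   # replication catches up
print(session.read(slave, "profile"))          # "v2"
```

Compared with sticky sessions, this keeps load balancing free but pushes the bookkeeping (carrying the stamp on every interaction) into the client.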
    • Horizontal scalability: CAP theorem
    • Consistency: all nodes see the same data at the same time
      Latency: the response time in interactions between nodes
      Availability: every nonfailing node must reply to requests; in practice, the limit of latency we are prepared to tolerate: once latency gets too high, we give up and treat the data as unavailable
      Partition tolerance: the cluster can survive communication breakages (which separate it into partitions unable to communicate with each other)
    • Transaction to transfer $50 from account A to account B:
      1) read(A)
      2) A = A - 50
      3) write(A)
      4) read(B)
      5) B = B + 50
      6) write(B)
      Atomicity
      - if the transaction fails after step 3 and before step 6, the system should ensure that its updates are not reflected in the database
      Consistency
      - A + B is unchanged by the execution of the transaction
    • Transaction to transfer $50 from account A to account B (steps 1-6 above)
      Isolation
      - another transaction running between steps 3 and 6 will see inconsistent data (A + B will be less than it should be)
      - isolation can be ensured trivially by running transactions serially ⇒ performance issue
      Durability
      - the user is notified that the transaction completed ($50 transferred) ⇒ the transaction's updates must persist despite failures
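A minimal sketch of the $50 transfer as an ACID transaction, using SQLite purely as a convenient single-server example: either both updates commit, or a failure between them rolls both back. The table, balances and simulated crash are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("A", 200), ("B", 100)])
conn.commit()

def transfer(amount: int, fail_in_between: bool = False) -> None:
    try:
        with conn:   # commits on success, rolls back if an exception escapes
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = 'A'",
                         (amount,))
            if fail_in_between:
                raise RuntimeError("simulated crash between the two writes")
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = 'B'",
                         (amount,))
    except RuntimeError:
        pass   # atomicity: the debit of A was rolled back

transfer(50, fail_in_between=True)
print(conn.execute("SELECT name, balance FROM accounts").fetchall())
# [('A', 200), ('B', 100)]: A + B is unchanged despite the failure
```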
    • Basically Available, Soft state, Eventually consistent
      - soft state and eventual consistency are techniques that work well in the presence of partitions and thus promote availability
    • Given the three properties of Consistency, Availability and Partition tolerance, you can only get two
    • C: being up and keeping consistency is reasonable
      A: with one node, if it's up it's available
      P: a single machine can't partition
    • AP (no C): partition ⇒ an update on one node = inconsistency
    • CP (no A): partition ⇒ consistency is kept only if a nonfailing node stops replying to requests
    • CA (no P)
      - while nodes can communicate ⇒ C and A can be preserved
      - partition ⇒ all nodes in one of the partitions must be turned off (failed nodes do not violate A, since availability only requires nonfailing nodes to reply)
      - difficult and expensive
    • ACID databases focus on consistency first and availability second
      BASE databases focus on availability first and consistency second
    • Single server
      - no partitions
      - consistency versus performance: relaxed isolation levels or no transactions
      Cluster
      - consistency versus latency/availability
      - durability versus performance (e.g. in-memory DBs)
      - durability versus latency (e.g. the master acknowledges the update to the client only after it has been acknowledged by some slaves)
    • strong write consistency ⇒ write to the master
      strong read consistency ⇒ read from the master
    • N = replication factor (nodes involved in replication, NOT nodes in the cluster)
      W = nodes confirming a write
      R = nodes contacted for a consistent read
      write quorum: W > N/2
      read quorum: R + W > N
      Consistency is on a per-operation basis: choose the most appropriate combination of problems and advantages
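A small worked example of the quorum conditions, assuming a replication factor N = 3 for illustration: W = 2 satisfies the write quorum (2 > 3/2) and R = 2 the read quorum (2 + 2 > 3), so every quorum read overlaps at least one node that saw the latest quorum write.

```python
def has_write_quorum(n: int, w: int) -> bool:
    return w > n / 2          # a majority of replicas confirmed the write

def has_read_quorum(n: int, w: int, r: int) -> bool:
    return r + w > n          # the read set must overlap the write set

N = 3                          # replication factor (assumed for the example)
for W, R in [(1, 1), (2, 2), (3, 1), (1, 3)]:
    print(f"W={W}, R={R}: "
          f"write quorum={has_write_quorum(N, W)}, "
          f"read quorum={has_read_quorum(N, W, R)}")
# W=2/R=2 and W=3/R=1 satisfy both conditions; W=1/R=1 gives fast but
# potentially inconsistent operations, which is the per-operation trade-off
```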