Scalability




Nicola Baldi
http://it.linkedin.com/in/nicolabaldi
Luigi Berrettini
http://it.linkedin.com/in/luigiberrettini
The need for speed




15/12/2012           Scalability   2
Companies grow continuously

      More and more data and traffic

      More and more computing resources needed

                            SOLUTION



                         SCALING

vertical scalability = scale up
   single server
   performance ⇒ more resources (CPUs, storage, memory)
   volumes increase ⇒ more difficult and expensive to scale
   not reliable: individual machine failures are common

 horizontal scalability = scale out
   cluster of servers
   performance ⇒ more servers
   cheaper hardware (more likely to fail)
   volumes increase ⇒ complexity ~ constant, costs ~ linear
   reliability: CAN operate despite failures
   complex: use only if benefits are compelling

Vertical scalability




All data on a single node

  Use cases
    data usage = mostly processing aggregates
    many graph databases


  Pros/Cons
    RDBMSs or NoSQL databases
    simplest and most often recommended option
    only vertical scalability



Horizontal scalability

  Architectures and
  distribution models



Shared everything
    every node has access to all data
    all nodes share memory and disk storage
    used on some RDBMSs




Shared disk
    every node has access to all data
    all nodes share disk storage
    used on some RDBMSs




Shared nothing
    nodes are independent and self-sufficient
    no shared memory or disk storage
    used on some RDBMSs and all NoSQL databases




Sharding
       different data put on different nodes

       Replication
       same data copied over multiple nodes

       Sharding + replication
       the two orthogonal techniques combined


Different parts of the data onto different nodes
  data accessed together (aggregates) are on the same node
  clumps arranged by physical location, to keep load
   even, or according to any domain-specific access rule


  [Diagram: three shards, each serving reads (R) and writes (W) for its own data: one shard holds A, F, H; one holds B, E, G; one holds C, D, I]
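
The placement rule above (data accessed together stays on the same node) can be sketched as a routing function. This is a minimal sketch, assuming hash-based placement; the shard names and the `shard_for` helper are hypothetical and not part of the original slides, which also allow placement by physical location or by a domain-specific rule.

```python
import hashlib

# Hypothetical node names, used only for illustration.
SHARDS = ["shard-0", "shard-1", "shard-2"]

def shard_for(aggregate_key: str, shards=SHARDS) -> str:
    """Map an aggregate key to one shard using a stable hash."""
    digest = hashlib.sha256(aggregate_key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(shards)
    return shards[index]

# Every record of one aggregate is stored under the aggregate's key,
# so all of them are routed to the same shard.
customer_shard = shard_for("customer:42")
print(customer_shard)
```

A stable hash keeps load roughly even, but changing the number of shards with this simple modulo scheme moves most keys, which is one reason cluster-level operations get harder once data is sharded.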


Use cases
    different people access different parts of the dataset
    to horizontally scale writes

  Pros/Cons
    “manual” sharding with every RDBMS or NoSQL store
    better read performance
    better write performance
    low resilience: when a node fails, its data is unavailable
     (data on the other nodes remains available)
    high licensing costs for RDBMSs
    difficult or impossible cluster-level operations
     (querying, transactions, consistency controls)


Data replicated across multiple nodes

  One designated master (primary) node
   • contains the original
   • processes writes and passes them on

  All other nodes are slaves (secondaries)
   • contain the copies
   • synchronized with the master during a replication process




  [Diagram: master-slave replication. The master holds A, B, C and serves reads (R) and writes (W); two slaves each hold a copy of A, B, C and serve reads (R) only]
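
The write path described above can be sketched in a few lines. This is a toy, in-memory model under stated assumptions: the `Node` and `MasterSlaveCluster` names are hypothetical, and replication is an explicit `replicate()` call standing in for whatever replication process a real product uses.

```python
class Node:
    """One server holding a full copy of the data."""

    def __init__(self):
        self.data = {}

class MasterSlaveCluster:
    def __init__(self, slave_count=2):
        self.master = Node()
        self.slaves = [Node() for _ in range(slave_count)]
        self.pending = []  # writes accepted by the master, not yet passed on

    def write(self, key, value):
        # All writes go through the master.
        self.master.data[key] = value
        self.pending.append((key, value))

    def read_from_slave(self, index, key):
        # Reads can be served by any slave.
        return self.slaves[index].data.get(key)

    def replicate(self):
        # The replication process copies pending writes to every slave.
        for key, value in self.pending:
            for slave in self.slaves:
                slave.data[key] = value
        self.pending.clear()

cluster = MasterSlaveCluster()
cluster.write("A", 1)
stale = cluster.read_from_slave(0, "A")  # None: update not propagated yet
cluster.replicate()
fresh = cluster.read_from_slave(0, "A")  # 1 once replication has run
```

The gap between `write` and `replicate` is where a read from a slave can return out-of-date data.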

Use cases
   load balancing cluster: data usage mostly read-intensive
   failover cluster: single server with hot backup

 Pros/Cons
   better read performance
   worse write performance (write management)
   high read (slave) resilience:
    master failure ⇒ slaves can still handle read requests
   low write (master) resilience:
    master failure ⇒ no writes until old/new master is up
   read inconsistencies: update not propagated to all slaves
   master = bottleneck and single point of failure
   high licensing costs for RDBMSs

Data replicated across multiple nodes

     All nodes are peers (equal weight): no master, no slaves


     All nodes can both read and write




  [Diagram: three peers, each holding A, B, C and serving both reads (R) and writes (W)]

Use cases
    load balancing cluster: data usage read/write-intensive
    need to scale out more easily

  Pros/Cons
    better read performance
    better write performance
    high resilience:
     node failure ⇒ reads/writes handled by other nodes
    read inconsistencies: update not propagated to all nodes
    write inconsistencies: same record updated on different
     nodes at the same time
    high licensing costs for RDBMSs

Sharding + master-slave replication
    multiple masters
    each data item has a single master
    node configurations:
         • master
         • slave
         • master for some data / slave for other data


  Sharding + peer-to-peer replication




  [Diagram: sharding + master-slave replication. Master 1 (A, F, H: reads and writes), Master/Slave 2 (B, E, G: reads and writes), Slave 3 (C, D, I: reads only); their counterparts are Slave 1 (A, F, H: reads only), Slave/Master 2 (B, E, G: reads and writes), Master 3 (C, D, I: reads and writes)]

  [Diagram: sharding + peer-to-peer replication. Six nodes, all serving reads and writes: Peer 1/2 (A, F, H), Peer 3/4 (B, E, G), Peer 5/6 (C, D, I), Peer 1/4 (A, F, E), Peer 2/3 (B, H, G), Peer 5/6 (C, D, I)]

Oracle Database
   Oracle RAC               shared everything

  Microsoft SQL Server
   All editions    shared nothing
                   master-slave replication

  IBM DB2
   DB2 pureScale            shared disk
   DB2 HADR                 shared nothing
                            master-slave replication (failover cluster)



Oracle MySQL
   MySQL Cluster            shared nothing
                            sharding, replication, sharding + replication

  The PostgreSQL Global Development Group PostgreSQL
   PGCluster-II   shared disk
   Postgres-XC    shared nothing
                  sharding, replication, sharding + replication




Horizontal scalability

  Consistency



Inconsistent write = write-write conflict
  multiple writes of the same data at the same time
  (highly likely with peer-to-peer replication)


  Inconsistent read = read-write conflict
  read in the middle of someone else’s write




 Pessimistic approach
      prevent conflicts from occurring


     Optimistic approach
      detect conflicts and fix them




Implementation (pessimistic approach)
   write locks ⇒ acquire a lock before updating a value
    (only one lock at a time can be taken)

 Pros/Cons
   often severely degrades system responsiveness
   often leads to deadlocks (hard to prevent/debug)
   relies on a consistent serialization of the updates*


   * sequential consistency
   ensuring that all nodes apply operations in the same order
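
A write lock can be illustrated inside a single process. This is only a sketch, assuming Python's standard `threading.Lock`; the `LockedAccount` class is hypothetical, and a lock shared across cluster nodes would need a coordination mechanism that the slides do not describe.

```python
import threading

class LockedAccount:
    """A value guarded by a write lock (pessimistic approach)."""

    def __init__(self, balance=0):
        self._lock = threading.Lock()
        self.balance = balance

    def deposit(self, amount):
        # Acquire the lock before the read-modify-write; only one thread
        # can hold it at a time, so the updates are serialized.
        with self._lock:
            current = self.balance
            self.balance = current + amount

def worker(account, times):
    for _ in range(times):
        account.deposit(1)

account = LockedAccount()
threads = [threading.Thread(target=worker, args=(account, 1000)) for _ in range(4)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()
print(account.balance)  # 4000: no update is lost
```

Every writer waits for the lock, which is the responsiveness cost listed above.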


Implementation (optimistic approach)
   conditional updates ⇒ test a value before updating it
    (to see if it's changed since the last read)
   merged updates ⇒ merge conflicted updates somehow
    (save updates, record conflict and merge somehow)

 Pros/Cons
   conditional updates
    rely on a consistent serialization of the updates*

   * sequential consistency
   ensuring that all nodes apply operations in the same order
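
A conditional update can be sketched with a version number stored next to each value. The `VersionedStore` class below is a hypothetical in-memory stand-in, not a real database API; it assumes the store performs the check and the write as one atomic step.

```python
class VersionedStore:
    """Keeps a (version, value) pair per key."""

    def __init__(self):
        self._items = {}

    def read(self, key):
        # Unknown keys start at version 0 with no value.
        return self._items.get(key, (0, None))

    def conditional_write(self, key, expected_version, value):
        # Update only if nobody has changed the value since it was read.
        current_version, _ = self._items.get(key, (0, None))
        if current_version != expected_version:
            return False  # conflict detected: the caller must re-read and retry
        self._items[key] = (current_version + 1, value)
        return True

store = VersionedStore()
alice_version, _ = store.read("room")
bob_version, _ = store.read("room")
alice_ok = store.conditional_write("room", alice_version, "booked by Alice")
bob_ok = store.conditional_write("room", bob_version, "booked by Bob")
print(alice_ok, bob_ok)  # True False
```

Bob's write fails because the version changed after his read, so the conflict is detected rather than silently overwriting Alice's update.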

 Logical consistency
      different data make sense together

     Replication consistency
      same data ⇒ same value on different replicas

     Read-your-writes consistency
      users continue seeing their updates




ACID transactions ⇒ aggregate-ignorant DBs

Partially atomic updates ⇒ aggregate-oriented DBs
  atomic updates within an aggregate
  no atomic updates between aggregates
  updates of multiple aggregates: inconsistency window
  replication can lengthen inconsistency windows




Eventual consistency
     nodes may have replication inconsistencies:
       stale (out of date) data

     eventually all nodes will be synchronized




Session consistency
   within a user’s session there is read-your-writes consistency
    (no stale data read from a node after an update on another one)
   consistency lost if
       • session ends
       • the system is accessed simultaneously from different PCs
   implementations
     • sticky session/session affinity = sessions tied to one node
              affects load balancing
              quite intricate with master-slave replication
       • version stamps
              track latest version stamp seen by a session
              ensure that all interactions with the data store include it
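
The version-stamp option can be sketched as follows. This is a simplified model under stated assumptions: each replica carries a single counter as its version stamp, and the `Replica` and `Session` classes are hypothetical names introduced only for illustration.

```python
class Replica:
    """One copy of the data plus the version stamp of its latest update."""

    def __init__(self):
        self.version = 0
        self.data = {}

class Session:
    """Tracks the latest version stamp this session has seen."""

    def __init__(self):
        self.last_seen = 0

    def write(self, replica, key, value):
        replica.version += 1
        replica.data[key] = value
        self.last_seen = max(self.last_seen, replica.version)

    def read(self, replicas, key):
        # Accept only a replica at least as fresh as the session's stamp.
        for replica in replicas:
            if replica.version >= self.last_seen:
                self.last_seen = replica.version
                return replica.data.get(key)
        raise RuntimeError("no replica is fresh enough for this session")

up_to_date, lagging = Replica(), Replica()
session = Session()
session.write(up_to_date, "cart", ["book"])
value = session.read([lagging, up_to_date], "cart")  # skips the lagging replica
other = Session().read([lagging, up_to_date], "cart")  # a new session may read stale data
```

The session that wrote the cart never sees the lagging replica, while a fresh session has no stamp yet and can still be served stale data.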


Horizontal scalability

  CAP theorem



Consistency
all nodes see the same data at the same time

Latency
the response time in interactions between nodes

Availability
 every nonfailing node must reply to requests
 the limit of latency that we are prepared to tolerate:
  once latency gets too high, we give up and treat data as
  unavailable

Partition tolerance
the cluster can survive communication breakages
(separating it into partitions unable to communicate with each other)

Transaction to transfer $50 from account A to account B
   1) read(A)
   2) A = A – 50
   3) write(A)
   4) read(B)
   5) B = B + 50
   6) write(B)


 Atomicity
     • transaction fails after 3 and before 6 ⇒ the system should
       ensure that its updates are not reflected in the database
 Consistency
     • A + B is unchanged by the execution of the transaction



Transaction to transfer $50 from account A to account B
   1) read(A)
   2) A = A – 50
   3) write(A)
   4) read(B)
   5) B = B + 50
   6) write(B)
 Isolation
     • another transaction will see inconsistent data between 3 and 6
       (A + B will be less than it should be)
     • Isolation can be ensured trivially by running transactions
       serially ⇒ performance issue

 Durability
     • user notified that transaction completed ($50 transferred)
       ⇒ transaction updates must persist despite failures
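
The transfer can be tried out with Python's built-in `sqlite3` module. This is a minimal sketch: the `account` table, the starting balances and the `fail_midway` flag are assumptions made for the example, and the simulated failure stands in for a crash after step 3 and before step 6.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 100), ("B", 20)])
conn.commit()

def transfer(conn, source, target, amount, fail_midway=False):
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?",
                (amount, source),
            )
            if fail_midway:
                raise RuntimeError("simulated failure after write(A)")
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = ?",
                (amount, target),
            )
    except RuntimeError:
        pass  # the partial update has already been rolled back

def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM account"))

transfer(conn, "A", "B", 50, fail_midway=True)
after_failure = balances(conn)  # {'A': 100, 'B': 20}: atomicity
transfer(conn, "A", "B", 50)
after_success = balances(conn)  # {'A': 50, 'B': 70}: A + B is unchanged
```

In both cases A + B stays at 120, which is the consistency condition stated for this transaction.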

Basically Available
                 Soft state
                 Eventually consistent

     Soft state and eventual consistency are techniques that work
    well in the presence of partitions and thus promote availability




Given the three properties of
             Consistency, Availability and
                  Partition tolerance,
                 you can only get two




C
  being up and keeping consistency is reasonable

  A
  one node: if it’s up it’s available

  P
  a single machine can’t partition



AP (C is given up)
   partition ⇒ update on one node = inconsistency




CP (A is given up)
   partition ⇒ consistency only if one nonfailing
               node stops replying to requests




CA (P is given up)
  nodes communicate ⇒ C and A can be preserved
  partition ⇒ all nodes on one side of the partition must be
              turned off (failed nodes do not violate A):
              difficult and expensive




ACID databases
  focus on consistency first and availability second


  BASE databases
  focus on availability first and consistency second




Single server
    no partitions
    consistency versus performance: relaxed isolation
     levels or no transactions

  Cluster
    consistency versus latency/availability
    durability versus performance (e.g. in memory DBs)
    durability versus latency (e.g. the master
     acknowledges the update to the client only after
     having been acknowledged by some slaves)


strong write consistency ⇒ write to the master


  strong read consistency ⇒ read from the master




N = replication factor
         (nodes involved in replication NOT nodes in the cluster)
W = nodes confirming a write
R = nodes needed for a consistent read

write quorum: W > N/2                                read quorum: R + W > N

Consistency is on a per operation basis

Choose the most appropriate combination of
problems and advantages
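
The two quorum conditions are simple arithmetic and can be checked per operation. The `quorum_properties` helper below is a hypothetical name used only for this sketch; it evaluates W > N/2 and R + W > N for a given replication factor.

```python
def quorum_properties(n: int, w: int, r: int) -> dict:
    """Check the write-quorum and read-quorum conditions for N, W, R."""
    if not (1 <= w <= n and 1 <= r <= n):
        raise ValueError("W and R must be between 1 and N")
    return {
        "write_quorum": w > n / 2,  # a majority of replicas confirms each write
        "read_quorum": r + w > n,   # every read set overlaps the latest write set
    }

balanced = quorum_properties(n=3, w=2, r=2)    # both conditions hold
fast = quorum_properties(n=3, w=1, r=1)        # neither condition holds
read_cheap = quorum_properties(n=3, w=3, r=1)  # slow writes, single-node reads
print(balanced, fast, read_cheap)
```

With N = 3, choosing W = 2 and R = 2 satisfies both conditions, while W = 1 and R = 1 trades consistency for lower latency.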


GetSmarter
 
ChatGPT webinar slides
ChatGPT webinar slidesChatGPT webinar slides
ChatGPT webinar slides
Alireza Esmikhani
 

DotNetToscana: NoSQL Revolution - Scalability

  • 2. The need for speed 15/12/2012 Scalability 2
  • 3. Companies keep growing
      - more and more data and traffic
      - more and more computing resources needed
      - SOLUTION: SCALING
  • 4. vertical scalability = scale up
      - single server
      - performance ⇒ more resources (CPUs, storage, memory)
      - volumes increase ⇒ more difficult and expensive to scale
      - not reliable: individual machine failures are common
    horizontal scalability = scale out
      - cluster of servers
      - performance ⇒ more servers
      - cheaper hardware (more likely to fail)
      - volumes increase ⇒ complexity ~ constant, costs ~ linear
      - reliability: CAN operate despite failures
      - complex: use only if benefits are compelling
  • 6. All data on a single node
    Use cases
      - data usage = mostly processing aggregates
      - many graph databases
    Pros/Cons
      - RDBMSs or NoSQL databases
      - simplest and most often recommended option
      - only vertical scalability
  • 7. Horizontal scalability: architectures and distribution models
  • 8. Shared everything
      - every node has access to all data
      - all nodes share memory and disk storage
      - used on some RDBMSs
  • 9. Shared disk
      - every node has access to all data
      - all nodes share disk storage
      - used on some RDBMSs
  • 10. Shared nothing
      - nodes are independent and self-sufficient
      - no shared memory or disk storage
      - used on some RDBMSs and all NoSQL databases
  • 11. Sharding: different data put on different nodes
    Replication: same data copied over multiple nodes
    Sharding + replication: the two orthogonal techniques combined
  • 12. Sharding: different parts of the data are put onto different nodes
      - data accessed together (aggregates) are kept on the same node
      - clumps are arranged by physical location, to keep load even, or according to any domain-specific access rule
    [Diagram: data items A to I spread over three shards, each shard serving reads (R) and writes (W) for its own data]
  • 13. Sharding
    Use cases
      - different people access different parts of the dataset
      - to horizontally scale writes
    Pros/Cons
      - “manual” sharding with every RDBMS or NoSQL store
      - better read performance
      - better write performance
      - low resilience: when a node fails, only the data on the other nodes remains available
      - high licensing costs for RDBMSs
      - difficult or impossible cluster-level operations (querying, transactions, consistency controls)
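The sharding idea above can be sketched in a few lines of Python. This is only an illustration, not something taken from the slides: the `ShardedStore` class, the use of plain dictionaries as stand-ins for nodes, and the hash-based placement rule are all assumptions (the slides also mention placement by physical location or by domain-specific rules).

```python
import hashlib


class ShardedStore:
    """Toy key-value store that places each key on exactly one shard (hypothetical)."""

    def __init__(self, shard_count):
        # Each shard is a plain dict standing in for an independent node.
        self.shards = [{} for _ in range(shard_count)]

    def shard_for(self, key):
        # A stable hash (unlike the built-in hash()) gives the same placement on every run.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % len(self.shards)

    def put(self, key, value):
        self.shards[self.shard_for(key)][key] = value

    def get(self, key):
        return self.shards[self.shard_for(key)].get(key)


store = ShardedStore(shard_count=3)
for item in "ABCDEFGHI":
    store.put(item, {"name": item})
```

Each read or write touches a single shard, which is why both scale out; it also shows the resilience problem: if one dictionary were lost, its keys would be gone while the others stayed readable.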
  • 14. Master-slave replication: data replicated across multiple nodes
      - one designated master (primary) node
          - contains the original
          - processes writes and passes them on
      - all other nodes are slaves (secondaries)
          - contain the copies
          - are synchronized with the master during a replication process
  • 15. [Diagram: master-slave replication. One master serving reads (R) and writes (W) and two slaves serving reads only; every node holds a copy of A, B and C]
  • 16. Master-slave replication
    Use cases
      - load balancing cluster: data usage mostly read-intensive
      - failover cluster: single server with hot backup
    Pros/Cons
      - better read performance
      - worse write performance (write management)
      - high read (slave) resilience: master failure ⇒ slaves can still handle read requests
      - low write (master) resilience: master failure ⇒ no writes until old/new master is up
      - read inconsistencies: update not propagated to all slaves
      - master = bottleneck and single point of failure
      - high licensing costs for RDBMSs
  • 17. Peer-to-peer replication: data replicated across multiple nodes
      - all nodes are peers (equal weight): no master, no slaves
      - all nodes can both read and write
  • 18. [Diagram: peer-to-peer replication. Three peers, each holding a copy of A, B and C and each serving reads (R) and writes (W)]
  • 19. Peer-to-peer replication
    Use cases
      - load balancing cluster: data usage read/write-intensive
      - need to scale out more easily
    Pros/Cons
      - better read performance
      - better write performance
      - high resilience: node failure ⇒ reads/writes handled by other nodes
      - read inconsistencies: update not propagated to all nodes
      - write inconsistencies: the same record is written on different nodes at the same time
      - high licensing costs for RDBMSs
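To illustrate how peer-to-peer replication can produce write inconsistencies, here is a rough sketch in which two peers accept writes independently and exchange state later. The `Peer` class, the per-key version counter and the `sync` function are assumptions made for this example; real systems use more robust schemes to detect concurrent writes.

```python
class Peer:
    """Toy peer that accepts both reads and writes (hypothetical)."""

    def __init__(self, name):
        self.name = name
        self.data = {}  # key -> (version, value)

    def write(self, key, value):
        version, _ = self.data.get(key, (0, None))
        self.data[key] = (version + 1, value)

    def read(self, key):
        return self.data.get(key, (0, None))[1]


def sync(a, b):
    """Exchange updates between two peers; return the keys with write-write conflicts."""
    conflicts = []
    for key in sorted(set(a.data) | set(b.data)):
        va = a.data.get(key, (0, None))
        vb = b.data.get(key, (0, None))
        if va[0] == vb[0] and va[1] != vb[1]:
            conflicts.append(key)   # same version, different values: concurrent writes
        elif va[0] > vb[0]:
            b.data[key] = va        # b was behind: copy a's newer value
        elif vb[0] > va[0]:
            a.data[key] = vb        # a was behind: copy b's newer value
    return conflicts


p1, p2 = Peer("peer-1"), Peer("peer-2")
p1.write("A", "red")
first_sync = sync(p1, p2)       # no conflict: peer-2 simply receives the update
replicated = p2.read("A")
p1.write("A", "green")          # both peers now update the same record
p2.write("A", "blue")           # before they have exchanged state
second_sync = sync(p1, p2)      # the conflict on "A" is detected, not resolved
```

The sketch only detects the conflict; deciding which value wins is the subject of the consistency slides that follow.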
  • 20. Sharding + master-slave replication
      - multiple masters
      - each data item has a single master
      - node configurations:
          - master
          - slave
          - master for some data / slave for other data
    Sharding + peer-to-peer replication
  • 21. [Diagram: sharding + master-slave replication. Data items A to I on three nodes, drawn in two configurations (Master 1, Master/Slave 2, Slave 3 and Slave 1, Slave/Master 2, Master 3); R and W mark which nodes serve reads and writes]
  • 22. [Diagram: sharding + peer-to-peer replication. Data items A to I distributed over six peers, drawn in two groupings (Peer 1/2, Peer 3/4, Peer 5/6 and Peer 1/4, Peer 2/3, Peer 5/6); every peer serves reads (R) and writes (W)]
  • 23. Oracle Database
      - Oracle RAC: shared everything
    Microsoft SQL Server
      - All editions: shared nothing, master-slave replication
    IBM DB2
      - DB2 pureScale: shared disk
      - DB2 HADR: shared nothing, master-slave replication (failover cluster)
  • 24. Oracle MySQL
      - MySQL Cluster: shared nothing; sharding, replication, sharding + replication
    PostgreSQL (The PostgreSQL Global Development Group)
      - PGCluster-II: shared disk
      - Postgres-XC: shared nothing; sharding, replication, sharding + replication
  • 25. Horizontal scalability: consistency
  • 26. Inconsistent write = write-write conflict: multiple writes of the same data at the same time (highly likely with peer-to-peer replication)
    Inconsistent read = read-write conflict: read in the middle of someone else’s write
  • 27. Pessimistic approach: prevent conflicts from occurring
    Optimistic approach: detect conflicts and fix them
  • 28. Pessimistic approach
    Implementation
      - write locks ⇒ acquire a lock before updating a value (only one lock at a time can be taken)
    Pros/Cons
      - often severely degrades system responsiveness
      - often leads to deadlocks (hard to prevent/debug)
      - relies on a consistent serialization of the updates*
    * sequential consistency: ensuring that all nodes apply operations in the same order
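The write-lock idea can be sketched with Python's standard `threading.Lock`. This is an assumption-laden illustration: a lock inside one process stands in for the cluster-wide lock the slide implies, and the `LockedCounter` class is hypothetical.

```python
import threading


class LockedCounter:
    """Value protected by a write lock: only one writer can update it at a time."""

    def __init__(self):
        self.value = 0
        self._write_lock = threading.Lock()

    def increment(self):
        # Acquire the lock before the read-modify-write, release it afterwards.
        with self._write_lock:
            current = self.value
            self.value = current + 1


counter = LockedCounter()


def worker(times):
    for _ in range(times):
        counter.increment()


threads = [threading.Thread(target=worker, args=(1000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Because every update waits for the lock, no increment is lost, but writers are serialized, which is the responsiveness cost listed above.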
  • 29. Optimistic approach
    Implementation
      - conditional updates ⇒ test a value before updating it (to see if it's changed since the last read)
      - merged updates ⇒ save the conflicting updates, record the conflict and merge them later in some way
    Pros/Cons
      - conditional updates rely on a consistent serialization of the updates*
    * sequential consistency: ensuring that all nodes apply operations in the same order
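A conditional update can be sketched as a compare-and-set on a version number. The `VersionedStore` class, its method names and the `VersionConflict` exception are hypothetical names chosen for this example, not an API from any particular database.

```python
class VersionConflict(Exception):
    """Raised when the value changed since the caller last read it."""


class VersionedStore:
    def __init__(self):
        self._items = {}  # key -> (version, value)

    def read(self, key):
        # Returns (version, value); version 0 means "never written".
        return self._items.get(key, (0, None))

    def conditional_update(self, key, expected_version, new_value):
        current_version, _ = self._items.get(key, (0, None))
        if current_version != expected_version:
            # Someone else updated the value after our read: refuse the write.
            raise VersionConflict(f"{key}: expected v{expected_version}, found v{current_version}")
        self._items[key] = (current_version + 1, new_value)
        return current_version + 1


stock_db = VersionedStore()
stock_db.conditional_update("stock", 0, 10)

# Two clients read the same version, then both try to decrement the stock.
v_first, stock_first = stock_db.read("stock")
v_second, stock_second = stock_db.read("stock")

stock_db.conditional_update("stock", v_first, stock_first - 1)   # succeeds
try:
    stock_db.conditional_update("stock", v_second, stock_second - 1)
    second_conflicted = False
except VersionConflict:
    second_conflicted = True   # the second client must re-read and retry
```

The conflict is detected rather than prevented: the losing client re-reads the value and retries, so no lock is held while clients are thinking.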
  • 30. Logical consistency: different data make sense together
    Replication consistency: same data ⇒ same value on different replicas
    Read-your-writes consistency: users continue seeing their updates
  • 31. ACID transactions ⇒ aggregate-ignorant DBs
    Partially atomic updates ⇒ aggregate-oriented DBs
      - atomic updates within an aggregate
      - no atomic updates between aggregates
      - updates of multiple aggregates: inconsistency window
      - replication can lengthen inconsistency windows
  • 32. Eventual consistency
      - nodes may have replication inconsistencies: stale (out of date) data
      - eventually all nodes will be synchronized
  • 33. Session consistency
      - within a user’s session there is read-your-writes consistency (no stale data read from a node after an update on another one)
      - consistency is lost if
          - the session ends
          - the system is accessed simultaneously from different PCs
      - implementations
          - sticky session/session affinity = sessions tied to one node
              - affects load balancing
              - quite intricate with master-slave replication
          - version stamps
              - track the latest version stamp seen by a session
              - ensure that all interactions with the data store include it
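The version-stamp implementation can be sketched as follows. All names here (`Replica`, `Session`, `write`, `read`) are hypothetical, and a single version counter per replica is a simplifying assumption; the point is only that a session refuses answers from replicas older than what it has already seen.

```python
class Replica:
    def __init__(self):
        self.version = 0   # version stamp of the last update applied
        self.data = {}

    def apply(self, version, key, value):
        self.data[key] = value
        self.version = version


class Session:
    """Tracks the latest version stamp this session has seen."""

    def __init__(self):
        self.last_seen = 0


def write(session, replica, key, value):
    replica.apply(replica.version + 1, key, value)
    session.last_seen = max(session.last_seen, replica.version)


def read(session, replicas, key):
    # Only a replica at least as recent as the session's stamp may answer.
    for replica in replicas:
        if replica.version >= session.last_seen:
            session.last_seen = replica.version
            return replica.data.get(key)
    raise RuntimeError("no replica is up to date for this session")


primary, lagging = Replica(), Replica()

writer = Session()
write(writer, primary, "cart", ["book"])
own_view = read(writer, [lagging, primary], "cart")     # skips the lagging replica

other = Session()
other_view = read(other, [lagging, primary], "cart")    # may still get stale data
```

The writing session always sees its own update, while a different session (for example the same user on another PC) can still read the stale replica, matching the limits listed above.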
  • 34. Horizontal scalability: CAP theorem
  • 35. Consistency: all nodes see the same data at the same time
    Latency: the response time in interactions between nodes
    Availability
      - every nonfailing node must reply to requests
      - the limit of latency that we are prepared to tolerate: once latency gets too high, we give up and treat data as unavailable
    Partition tolerance: the cluster can survive communication breakages (separating it into partitions unable to communicate with each other)
  • 36. Transaction to transfer $50 from account A to account B
      1) read(A)
      2) A = A - 50
      3) write(A)
      4) read(B)
      5) B = B + 50
      6) write(B)
    Atomicity
      - transaction fails after 3 and before 6 ⇒ the system should ensure that its updates are not reflected in the database
    Consistency
      - A + B is unchanged by the execution of the transaction
  • 37. Same transaction (steps 1 to 6 of slide 36)
    Isolation
      - another transaction will see inconsistent data between 3 and 6 (A + B will be less than it should be)
      - isolation can be ensured trivially by running transactions serially ⇒ performance issue
    Durability
      - user notified that transaction completed ($50 transferred) ⇒ transaction updates must persist despite failures
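The transfer above can be tried out with Python's built-in `sqlite3` module. The table layout, the starting balances of 100 and the `fail_midway` switch are assumptions made for this sketch; it demonstrates atomicity (a failure between steps 3 and 6 leaves no trace) and the A + B invariant.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("A", 100), ("B", 100)])
conn.commit()


def balance(name):
    row = conn.execute("SELECT balance FROM accounts WHERE name = ?", (name,)).fetchone()
    return row[0]


def transfer(source, target, amount, fail_midway=False):
    try:
        with conn:  # commits if the block succeeds, rolls back if it raises
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, source),
            )
            if fail_midway:
                raise RuntimeError("simulated failure after step 3 and before step 6")
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, target),
            )
        return True
    except RuntimeError:
        return False


failed = transfer("A", "B", 50, fail_midway=True)
after_failure = (balance("A"), balance("B"))   # the debit was rolled back
succeeded = transfer("A", "B", 50)
after_success = (balance("A"), balance("B"))   # A + B is the same as before
```

Isolation and durability are not exercised here: there is a single connection and the database lives in memory.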
  • 38. BASE: Basically Available, Soft state, Eventually consistent
    Soft state and eventual consistency are techniques that work well in the presence of partitions and thus promote availability
  • 39. Given the three properties of Consistency, Availability and Partition tolerance, you can only get two
  • 40. C: being up and keeping consistency is reasonable
    A: one node: if it’s up it’s available
    P: a single machine can’t partition
  • 41. AP (giving up C): partition ⇒ update on one node = inconsistency
  • 42. CP (giving up A): partition ⇒ consistency only if one nonfailing node stops replying to requests
  • 43. CA (giving up P)
      - nodes communicate ⇒ C and A can be preserved
      - partition ⇒ all nodes on one partition must be turned off (nodes that are off count as failed, so A is preserved)
      - difficult and expensive
  • 44. ACID databases focus on consistency first and availability second
    BASE databases focus on availability first and consistency second
  • 45. Single server
      - no partitions
      - consistency versus performance: relaxed isolation levels or no transactions
    Cluster
      - consistency versus latency/availability
      - durability versus performance (e.g. in-memory DBs)
      - durability versus latency (e.g. the master acknowledges the update to the client only after having been acknowledged by some slaves)
  • 46. strong write consistency ⇒ write to the master
    strong read consistency ⇒ read from the master
  • 47. N = replication factor (nodes involved in replication, NOT nodes in the cluster)
    W = nodes confirming a write
    R = nodes needed for a consistent read
    write quorum: W > N/2
    read quorum: R + W > N
    Consistency is on a per-operation basis
    Choose the most appropriate combination of problems and advantages
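The two quorum inequalities can be checked with a few helper functions. The function names are hypothetical; the formulas are the ones on the slide, and `min_read_nodes` simply solves R + W > N for the smallest integer R.

```python
def has_write_quorum(n, w):
    """Write quorum: more than half of the N replicas confirm the write (W > N/2)."""
    return w > n / 2


def has_read_quorum(n, r, w):
    """Read quorum: the read set overlaps every successful write set (R + W > N)."""
    return r + w > n


def min_read_nodes(n, w):
    """Smallest R that satisfies R + W > N for a given W."""
    return n - w + 1


# With a replication factor of 3 and writes confirmed by 2 nodes,
# reading from 2 nodes is enough to see the latest confirmed write.
example = {
    "write_quorum": has_write_quorum(3, 2),
    "read_quorum": has_read_quorum(3, 2, 2),
    "min_r": min_read_nodes(3, 2),
}
```

Since the choice is per operation, a request that needs strong consistency can use a larger R while a latency-sensitive one reads from a single node and accepts possibly stale data.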