Durability for Memory-Based
     Key-Value Stores


       Kiarash Rezahanjani
             July 4, 2012




                              1
Durability
                          Data Store
                        (university, KTH)
set(university, UPC)




 Ack




get(university)




  UPC


                                              2
Durability
                         Data Store
set(university, UPC)


                                        Commodity
Ack


                                      Non Volatile




                                                     3
Durability
                     Data Store
set(myKey, U)


                                  Commodity
Ack




                                        4
Durability


  Seek time
        +
                      SLOW
Rotational time
      +           Write          Read
 Transfer time




                          Disk

                                        5
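The cost on the slide above (seek time + rotational time + transfer time) can be sketched numerically. The drive parameters below (9 ms average seek, 7200 RPM, 100 MB/s transfer rate) are illustrative assumptions, not figures from the talk.

```python
def disk_access_ms(seek_ms, rpm, transfer_mb_per_s, size_bytes):
    """Estimate one random disk access: seek + rotational + transfer time."""
    # On average the platter has to turn half a revolution.
    rotational_ms = 0.5 * 60_000 / rpm
    transfer_ms = size_bytes / (transfer_mb_per_s * 1_000_000) * 1000
    return seek_ms + rotational_ms + transfer_ms

# Hypothetical commodity drive; all three parameters are assumptions.
t = disk_access_ms(seek_ms=9.0, rpm=7200, transfer_mb_per_s=100, size_bytes=200)
print(f"{t:.2f} ms per random 200-byte write")
```

With these assumed numbers, almost all of the time goes to seeking and rotation rather than to transferring the 200 bytes, which is why small random writes are slow.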
Cache in memory

Slow   Writes          Reads   Fast



                   Cached Objects


                                          Consistency ?
                Primary copy of objects

                                                     6
Cache in memory

        Stale data
            Application Servers


                Set ObjA             Read ObjA - > Cache Miss

                 Spending resources
Read Obj A                                                      Memcache servers


Complicates development                                         Delete Obj A


Update Obj A
               Writes are still Slow
                                  MySQL Servers

                                                                                   7
Memory-Based Databases
No inconsistency Writes             Reads
                                             No stale data


 Reads are fast    Primary Copy of Objects
                    Durability?


                                    Write latency?
                          Back up

                                                             8
Approaches towards durability

State A            State B     Periodic Snapshots   Data loss


Snapshot           Snapshot


                              Synchronous logging Slow


   Log       Log        Log



                              Asynchronous logging Data loss


          Logs       Logs
                                                            9
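The trade-off between synchronous and asynchronous logging can be sketched with a small append-only log. This is a minimal illustration, not the system from the talk; the class name `WriteAheadLog` and the file layout are hypothetical.

```python
import os
import tempfile

class WriteAheadLog:
    """Append-only log; sync=True forces each entry to disk before returning."""

    def __init__(self, path, sync):
        self.sync = sync
        self.f = open(path, "ab")

    def append(self, entry: bytes):
        self.f.write(entry + b"\n")
        if self.sync:
            # Synchronous logging: durable before the ack, at the cost of disk latency.
            self.f.flush()
            os.fsync(self.f.fileno())
        # Asynchronous logging: the entry may still sit in a buffer, so a crash
        # before the next flush can lose it.

    def close(self):
        self.f.flush()
        os.fsync(self.f.fileno())
        self.f.close()

path = os.path.join(tempfile.mkdtemp(), "wal.log")
log = WriteAheadLog(path, sync=True)
log.append(b"set university UPC")
with open(path, "rb") as check:
    durable = check.read()   # already on disk before close() is called
log.close()
```

With `sync=False` the same `append` returns sooner, but nothing guarantees the entry is on disk at the moment the caller is acknowledged.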
Approaches towards durability

                    Replica



                                  Expensive
                     Data

Catastrophic failure, all gone
       Replica                     Replica




                                             10
Project Goals

          Durable write
           Low latency

Availability, able to recover quickly

 Cheap, commodity hardware


                                        11
Target systems
•   Data is big = many machines
•   Read dominant workload
•   Simple key-value store
•   Small writes
    – Example: Facebook
       •   Terabytes of data = 2000 memcache servers
       •   Write/read ratio < 6%
       •   Memcache is a key-value store
       •   Status update, tag photo, profile update, etc.

                                                           12
Solution




           13
Design decisions


  Periodic snapshot
       vs.
  Message logging     



                          14
Design decisions


    Local disk
       vs.
  Remote location   



                        15
Design decisions


      Remote file server
               vs.
Local disks of database cluster   



                                      16
Design Decision
                write


         Database
           client


  Ack               Log




        Remote storage
                          17
Design Decision
            write
                       Two Problems
      Database
       client           1) Synchronous logging

Ack              Log             Must
                          Asynchronous logging
                                 Problems: Data loss

                        2) Data availability

 Replication
                                                       18
Replication

                   Ack                  Log
Ack     Log


                 Log        Log   Log

Replication




                                              19
Replication
              Broadcast                              Chain replication


        Ack               Log           Ack                                Log



                master                        tail                       head



slave                           slave




                                                                            20
Replication
          Broadcast


    Ack               Log


            master



slave                  slave



            slave

                               21
Replication
               Chain replication


Ack                                  Log




      tail                         head




                                           22
Replication
               Chain replication


                                          Log
Ack



      tail                         head




                                                23
Chain Replication
                       write


            Database
      Ack     client   Log




Log         Log                Log




                                     24
Chain Replication
Synchronous logging abstraction
                           write


Low latency             Database
                  Ack     client   Log


Available Logs


        Log             Log              Log
                  Stable Storage Unit

                                               25
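The chain on the slides above (log entry enters at the head, ack leaves from the tail) can be sketched in memory. This is an illustration only: there is no network, no disk, and no failure handling, and the `LogServer` and `build_chain` names are hypothetical.

```python
class LogServer:
    """One node in the chain; keeps entries in an in-memory list."""

    def __init__(self, name):
        self.name = name
        self.entries = []
        self.next = None  # successor in the chain, None for the tail

    def append(self, entry):
        self.entries.append(entry)
        if self.next is None:
            # Tail: every server before it already holds the entry, so ack.
            return self.name
        return self.next.append(entry)

def build_chain(names):
    servers = [LogServer(n) for n in names]
    for a, b in zip(servers, servers[1:]):
        a.next = b
    return servers

chain = build_chain(["head", "middle", "tail"])
acked_by = chain[0].append("set(university, UPC)")
```

Because the ack is produced only by the tail, an acknowledged write is known to be present on every server in the chain.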
Log Server


 Log




             26
Log Server
                                                       3        2 1
                                        Reader


           7
Receiver                6     5     3




                                        Persister


                 Sequential Write

                 Seek time

                                                 2 1
                                                           27
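The Receiver and Persister stages on the slide above can be sketched as a queue drained by a background thread that appends to a single file. This is a simplified assumption about how the stages interact (the Reader stage is omitted), and the function and file names are hypothetical.

```python
import os
import queue
import tempfile
import threading

def persister(q, path):
    """Drain the queue and append entries to one file, in arrival order."""
    with open(path, "ab") as f:
        while True:
            entry = q.get()
            if entry is None:       # sentinel: stop the thread
                break
            f.write(entry + b"\n")  # append-only, so writes stay sequential
        f.flush()
        os.fsync(f.fileno())

path = os.path.join(tempfile.mkdtemp(), "server.log")
q = queue.Queue()
worker = threading.Thread(target=persister, args=(q, path))
worker.start()

# Receiver side: buffer incoming entries in memory and return immediately.
for seq in range(1, 4):
    q.put(f"entry-{seq}".encode())
q.put(None)
worker.join()

with open(path, "rb") as f:
    persisted = f.read().splitlines()
```

Decoupling the two stages keeps the receive path off the disk, while the persister writes to the end of one file and so avoids seeks.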
Forming storage units

1. Query Zookeeper
2. Get list of servers
3. Leader sends request
4. Leader sends list of members
5. Upload storage unit data
6. Start the service

(Diagram: Zookeeper; servers ID1, ID2, ID3)
                                                 28
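The six steps above can be walked through with an in-memory stand-in for the coordination service. Everything here is an assumption made for illustration: `Registry` is not a Zookeeper client, the selection rule (take the first available servers) is invented, and "upload storage unit data" is read as publishing the member list to the registry.

```python
class Registry:
    """In-memory stand-in for the coordination service (not a Zookeeper client)."""

    def __init__(self, servers):
        self.servers = list(servers)
        self.units = {}

    def list_servers(self):
        return list(self.servers)

    def publish_unit(self, unit_id, members):
        self.units[unit_id] = list(members)

def form_storage_unit(registry, leader, unit_id, size):
    # Steps 1-2: query the registry and get the list of servers.
    candidates = [s for s in registry.list_servers() if s != leader]
    # Step 3: the leader asks candidates to join (here all of them accept).
    members = [leader] + candidates[: size - 1]
    # Step 4: the leader would send `members` to each follower at this point.
    # Step 5: record the storage unit's data in the registry.
    registry.publish_unit(unit_id, members)
    # Step 6: the unit is ready to start serving.
    return members

registry = Registry(["ID1", "ID2", "ID3", "ID4"])
unit = form_storage_unit(registry, leader="ID1", unit_id="unit-1", size=3)
```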
Storage System
                                 Zookeeper




Client


Client     Stable storage unit               Stable storage unit



Client




           Stable storage unit               Stable storage unit
                                                                   29
Failover
                          Client




ID 1                              ID 2             ID 3
50%                               20%              30%




ID 4               ID 5                            ID 6
40%                45%                             20%



 Stable Storage Unit                Stable Storage Unit   30
Evaluation
• Throughput and latency of stable storage unit
  – Log entry sizes
  – Replication factors
• Comparison with WAL into local disk




                                                  33
Single synchronous client
             Replication factor of 3


Entry size (bytes)    Latency (ms)    Throughput (entries/sec)
200                   0.45            2200
1024                  0.62            1600
4096                  0.99            1000




                                                           34
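The table above is consistent with a simple relation: a single synchronous client issues one write per round trip, so throughput is roughly the inverse of latency. The check below uses only the figures from the slide; the function name is hypothetical.

```python
# entry size (bytes) -> (latency in ms, reported throughput in entries/sec)
measurements = {200: (0.45, 2200), 1024: (0.62, 1600), 4096: (0.99, 1000)}

def expected_throughput(latency_ms):
    """One outstanding write at a time: entries/sec = 1000 ms / latency."""
    return 1000.0 / latency_ms

for size, (latency_ms, reported) in measurements.items():
    print(size, round(expected_throughput(latency_ms)), reported)
```

The computed values land within about 1% of the reported throughput for each entry size.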
Throughput vs. Latency
                                          Replication factor of 3
[Figure: latency (ms) versus throughput (entries/sec) for entry sizes of 5 B, 200 B, 1 KB, 4 KB, and 10 KB. Points labeled on the plot: 5000, 14000, 28000, 34000.]


                                                                                                                     35
Additional replica
                                                   Entry size of 200 bytes
[Figure: latency (microseconds) versus throughput (entries/sec) for replication factors RF 2 and RF 3.]



                                                                                                                     36
Sustained load




                 37
Resource utilization

• Throughput of 6,000 entries/sec
• Log entries of 200 bytes
  – CPU utilization = 9%
  – Bandwidth = 29 Mb/s
  – Dedicated disk
  – Small memory requirement


                                    39
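A quick payload-only calculation puts the bandwidth figure above in context. The slide does not say how the 29 Mb/s was measured, so the factor of 3 below is an assumption borrowed from the replication factor used elsewhere in the evaluation, and protocol overhead is ignored.

```python
entries_per_sec = 6000
entry_bytes = 200

# Payload for one copy of the log stream, in megabits per second.
payload_mbps = entries_per_sec * entry_bytes * 8 / 1_000_000

# Assumed: three copies of each entry cross the network (replication factor 3).
three_copies_mbps = payload_mbps * 3

print(round(payload_mbps, 1), round(three_copies_mbps, 1))
```

One copy of the payload is 9.6 Mb/s and three copies are 28.8 Mb/s, which is close to the reported 29 Mb/s under that assumption.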
Summary
 Durable write

 Low latency

 High availability

 Scalable

 No additional resources

  Avoid dependencies       40

More Related Content

What's hot

Large customers want postgresql too !!
Large customers want postgresql too !!Large customers want postgresql too !!
Large customers want postgresql too !!rosensteel
 
ScalableCore System: A Scalable Many-core Simulator by Employing Over 100 FPGAs
ScalableCore System: A Scalable Many-core Simulator by Employing Over 100 FPGAsScalableCore System: A Scalable Many-core Simulator by Employing Over 100 FPGAs
ScalableCore System: A Scalable Many-core Simulator by Employing Over 100 FPGAsShinya Takamaeda-Y
 
Solving MySQL replication problems with Tungsten
Solving MySQL replication problems with TungstenSolving MySQL replication problems with Tungsten
Solving MySQL replication problems with TungstenGiuseppe Maxia
 
Exchange 2010 ha ctd
Exchange 2010 ha ctdExchange 2010 ha ctd
Exchange 2010 ha ctdKaliyan S
 
Preventing multi master conflicts with tungsten
Preventing multi master conflicts with tungstenPreventing multi master conflicts with tungsten
Preventing multi master conflicts with tungstenGiuseppe Maxia
 
Building a High-Volume Reporting System on Amazon AWS with MySQL, Tungsten, a...
Building a High-Volume Reporting System on Amazon AWS with MySQL, Tungsten, a...Building a High-Volume Reporting System on Amazon AWS with MySQL, Tungsten, a...
Building a High-Volume Reporting System on Amazon AWS with MySQL, Tungsten, a...Jeff Malek
 
Microsoft Exchange Server 2007 High Availability And Disaster Recovery Deep Dive
Microsoft Exchange Server 2007 High Availability And Disaster Recovery Deep DiveMicrosoft Exchange Server 2007 High Availability And Disaster Recovery Deep Dive
Microsoft Exchange Server 2007 High Availability And Disaster Recovery Deep Diversnarayanan
 
MySQL Cluster NoSQL Memcached API
MySQL Cluster NoSQL Memcached APIMySQL Cluster NoSQL Memcached API
MySQL Cluster NoSQL Memcached APIMat Keep
 
The Native NDB Engine for Memcached
The Native NDB Engine for MemcachedThe Native NDB Engine for Memcached
The Native NDB Engine for MemcachedJohn David Duncan
 
High speed networks and Java (Ryan Sciampacone)
High speed networks and Java (Ryan Sciampacone)High speed networks and Java (Ryan Sciampacone)
High speed networks and Java (Ryan Sciampacone)Chris Bailey
 
Distributed Multi-Threading in GNU-Prolog
Distributed Multi-Threading in GNU-PrologDistributed Multi-Threading in GNU-Prolog
Distributed Multi-Threading in GNU-PrologNuno Morgadinho
 
Advocacy for the OpenSource Relational Data Base Management System Over FreeBSD
Advocacy for the OpenSource Relational Data Base Management System Over FreeBSDAdvocacy for the OpenSource Relational Data Base Management System Over FreeBSD
Advocacy for the OpenSource Relational Data Base Management System Over FreeBSDtheManda
 
Oracle rac cachefusion - High Availability Day 2015
Oracle rac cachefusion - High Availability Day 2015Oracle rac cachefusion - High Availability Day 2015
Oracle rac cachefusion - High Availability Day 2015aioughydchapter
 

What's hot (20)

Large customers want postgresql too !!
Large customers want postgresql too !!Large customers want postgresql too !!
Large customers want postgresql too !!
 
Introduction to visual DSP++ Kernel
Introduction to visual DSP++ KernelIntroduction to visual DSP++ Kernel
Introduction to visual DSP++ Kernel
 
ScalableCore System: A Scalable Many-core Simulator by Employing Over 100 FPGAs
ScalableCore System: A Scalable Many-core Simulator by Employing Over 100 FPGAsScalableCore System: A Scalable Many-core Simulator by Employing Over 100 FPGAs
ScalableCore System: A Scalable Many-core Simulator by Employing Over 100 FPGAs
 
Solving MySQL replication problems with Tungsten
Solving MySQL replication problems with TungstenSolving MySQL replication problems with Tungsten
Solving MySQL replication problems with Tungsten
 
Exchange 2010 ha ctd
Exchange 2010 ha ctdExchange 2010 ha ctd
Exchange 2010 ha ctd
 
Preventing multi master conflicts with tungsten
Preventing multi master conflicts with tungstenPreventing multi master conflicts with tungsten
Preventing multi master conflicts with tungsten
 
Building a High-Volume Reporting System on Amazon AWS with MySQL, Tungsten, a...
Building a High-Volume Reporting System on Amazon AWS with MySQL, Tungsten, a...Building a High-Volume Reporting System on Amazon AWS with MySQL, Tungsten, a...
Building a High-Volume Reporting System on Amazon AWS with MySQL, Tungsten, a...
 
Microsoft Exchange Server 2007 High Availability And Disaster Recovery Deep Dive
Microsoft Exchange Server 2007 High Availability And Disaster Recovery Deep DiveMicrosoft Exchange Server 2007 High Availability And Disaster Recovery Deep Dive
Microsoft Exchange Server 2007 High Availability And Disaster Recovery Deep Dive
 
Hdfs high availability
Hdfs high availabilityHdfs high availability
Hdfs high availability
 
MySQL Cluster NoSQL Memcached API
MySQL Cluster NoSQL Memcached APIMySQL Cluster NoSQL Memcached API
MySQL Cluster NoSQL Memcached API
 
Smashing The Stack
Smashing The StackSmashing The Stack
Smashing The Stack
 
The Native NDB Engine for Memcached
The Native NDB Engine for MemcachedThe Native NDB Engine for Memcached
The Native NDB Engine for Memcached
 
Twee remedies tegen systeemuitval en datacorruptie
Twee remedies tegen systeemuitval en datacorruptieTwee remedies tegen systeemuitval en datacorruptie
Twee remedies tegen systeemuitval en datacorruptie
 
High speed networks and Java (Ryan Sciampacone)
High speed networks and Java (Ryan Sciampacone)High speed networks and Java (Ryan Sciampacone)
High speed networks and Java (Ryan Sciampacone)
 
Distributed Multi-Threading in GNU-Prolog
Distributed Multi-Threading in GNU-PrologDistributed Multi-Threading in GNU-Prolog
Distributed Multi-Threading in GNU-Prolog
 
Build Programming Language Runtime with LLVM
Build Programming Language Runtime with LLVMBuild Programming Language Runtime with LLVM
Build Programming Language Runtime with LLVM
 
Advocacy for the OpenSource Relational Data Base Management System Over FreeBSD
Advocacy for the OpenSource Relational Data Base Management System Over FreeBSDAdvocacy for the OpenSource Relational Data Base Management System Over FreeBSD
Advocacy for the OpenSource Relational Data Base Management System Over FreeBSD
 
Edition based redefinition joords
Edition based redefinition joordsEdition based redefinition joords
Edition based redefinition joords
 
Vastsky xen summit20100428
Vastsky xen summit20100428Vastsky xen summit20100428
Vastsky xen summit20100428
 
Oracle rac cachefusion - High Availability Day 2015
Oracle rac cachefusion - High Availability Day 2015Oracle rac cachefusion - High Availability Day 2015
Oracle rac cachefusion - High Availability Day 2015
 

Similar to Presentation

Sql server 2012 - always on deep dive - bob duffy
Sql server 2012 - always on deep dive - bob duffySql server 2012 - always on deep dive - bob duffy
Sql server 2012 - always on deep dive - bob duffyAnuradha
 
Always On - Wydajność i bezpieczeństwo naszych danych - High Availability SQL...
Always On - Wydajność i bezpieczeństwo naszych danych - High Availability SQL...Always On - Wydajność i bezpieczeństwo naszych danych - High Availability SQL...
Always On - Wydajność i bezpieczeństwo naszych danych - High Availability SQL...SQLExpert.pl
 
Cистема распределенного, масштабируемого и высоконадежного хранения данных дл...
Cистема распределенного, масштабируемого и высоконадежного хранения данных дл...Cистема распределенного, масштабируемого и высоконадежного хранения данных дл...
Cистема распределенного, масштабируемого и высоконадежного хранения данных дл...Ontico
 
Collaborate vdb performance
Collaborate vdb performanceCollaborate vdb performance
Collaborate vdb performanceKyle Hailey
 
WalB: Block-level WAL. Concept.
WalB: Block-level WAL. Concept.WalB: Block-level WAL. Concept.
WalB: Block-level WAL. Concept.Takashi Hoshino
 
Less01 architecture
Less01 architectureLess01 architecture
Less01 architectureAmit Bhalla
 
Scaling at Showyou: Operations
Scaling at Showyou: OperationsScaling at Showyou: Operations
Scaling at Showyou: Operationsaphyr_
 
Less14 br concepts
Less14 br conceptsLess14 br concepts
Less14 br conceptsAmit Bhalla
 
ASPLOS2011 workshop RESoLVE "Effect of Disk Prefetching of Guest OS "
ASPLOS2011 workshop RESoLVE "Effect of Disk Prefetching of Guest OS "ASPLOS2011 workshop RESoLVE "Effect of Disk Prefetching of Guest OS "
ASPLOS2011 workshop RESoLVE "Effect of Disk Prefetching of Guest OS "Kuniyasu Suzaki
 
Deploying Maximum HA Architecture With PostgreSQL
Deploying Maximum HA Architecture With PostgreSQLDeploying Maximum HA Architecture With PostgreSQL
Deploying Maximum HA Architecture With PostgreSQLDenish Patel
 
Power of the Log: LSM & Append Only Data Structures
Power of the Log: LSM & Append Only Data StructuresPower of the Log: LSM & Append Only Data Structures
Power of the Log: LSM & Append Only Data Structuresconfluent
 
The Power of the Log
The Power of the LogThe Power of the Log
The Power of the LogBen Stopford
 
Castle enhanced Cassandra
Castle enhanced CassandraCastle enhanced Cassandra
Castle enhanced CassandraEric Evans
 
Playing in the Same Sandbox: MySQL and Oracle
Playing in the Same Sandbox:  MySQL and OraclePlaying in the Same Sandbox:  MySQL and Oracle
Playing in the Same Sandbox: MySQL and Oraclelynnferrante
 
Introduction to Apache Kafka
Introduction to Apache KafkaIntroduction to Apache Kafka
Introduction to Apache KafkaShiao-An Yuan
 
Deploying Maximum HA Architecture With PostgreSQL
Deploying Maximum HA Architecture With PostgreSQLDeploying Maximum HA Architecture With PostgreSQL
Deploying Maximum HA Architecture With PostgreSQLDenish Patel
 
Open stack in sina
Open stack in sinaOpen stack in sina
Open stack in sinaHui Cheng
 
Locality of (p)reference
Locality of (p)referenceLocality of (p)reference
Locality of (p)referenceFromDual GmbH
 
Maximizing Application Performance on Cray XT6 and XE6 Supercomputers DOD-MOD...
Maximizing Application Performance on Cray XT6 and XE6 Supercomputers DOD-MOD...Maximizing Application Performance on Cray XT6 and XE6 Supercomputers DOD-MOD...
Maximizing Application Performance on Cray XT6 and XE6 Supercomputers DOD-MOD...Jeff Larkin
 

Similar to Presentation (20)

Sql server 2012 - always on deep dive - bob duffy
Sql server 2012 - always on deep dive - bob duffySql server 2012 - always on deep dive - bob duffy
Sql server 2012 - always on deep dive - bob duffy
 
Always On - Wydajność i bezpieczeństwo naszych danych - High Availability SQL...
Always On - Wydajność i bezpieczeństwo naszych danych - High Availability SQL...Always On - Wydajność i bezpieczeństwo naszych danych - High Availability SQL...
Always On - Wydajność i bezpieczeństwo naszych danych - High Availability SQL...
 
Cистема распределенного, масштабируемого и высоконадежного хранения данных дл...
Cистема распределенного, масштабируемого и высоконадежного хранения данных дл...Cистема распределенного, масштабируемого и высоконадежного хранения данных дл...
Cистема распределенного, масштабируемого и высоконадежного хранения данных дл...
 
MySQL高可用
MySQL高可用MySQL高可用
MySQL高可用
 
Collaborate vdb performance
Collaborate vdb performanceCollaborate vdb performance
Collaborate vdb performance
 
WalB: Block-level WAL. Concept.
WalB: Block-level WAL. Concept.WalB: Block-level WAL. Concept.
WalB: Block-level WAL. Concept.
 
Less01 architecture
Less01 architectureLess01 architecture
Less01 architecture
 
Scaling at Showyou: Operations
Scaling at Showyou: OperationsScaling at Showyou: Operations
Scaling at Showyou: Operations
 
Less14 br concepts
Less14 br conceptsLess14 br concepts
Less14 br concepts
 
ASPLOS2011 workshop RESoLVE "Effect of Disk Prefetching of Guest OS "
ASPLOS2011 workshop RESoLVE "Effect of Disk Prefetching of Guest OS "ASPLOS2011 workshop RESoLVE "Effect of Disk Prefetching of Guest OS "
ASPLOS2011 workshop RESoLVE "Effect of Disk Prefetching of Guest OS "
 
Deploying Maximum HA Architecture With PostgreSQL
Deploying Maximum HA Architecture With PostgreSQLDeploying Maximum HA Architecture With PostgreSQL
Deploying Maximum HA Architecture With PostgreSQL
 
Power of the Log: LSM & Append Only Data Structures
Power of the Log: LSM & Append Only Data StructuresPower of the Log: LSM & Append Only Data Structures
Power of the Log: LSM & Append Only Data Structures
 
The Power of the Log
The Power of the LogThe Power of the Log
The Power of the Log
 
Castle enhanced Cassandra
Castle enhanced CassandraCastle enhanced Cassandra
Castle enhanced Cassandra
 
Playing in the Same Sandbox: MySQL and Oracle
Playing in the Same Sandbox:  MySQL and OraclePlaying in the Same Sandbox:  MySQL and Oracle
Playing in the Same Sandbox: MySQL and Oracle
 
Introduction to Apache Kafka
Introduction to Apache KafkaIntroduction to Apache Kafka
Introduction to Apache Kafka
 
Deploying Maximum HA Architecture With PostgreSQL
Deploying Maximum HA Architecture With PostgreSQLDeploying Maximum HA Architecture With PostgreSQL
Deploying Maximum HA Architecture With PostgreSQL
 
Open stack in sina
Open stack in sinaOpen stack in sina
Open stack in sina
 
Locality of (p)reference
Locality of (p)referenceLocality of (p)reference
Locality of (p)reference
 
Maximizing Application Performance on Cray XT6 and XE6 Supercomputers DOD-MOD...
Maximizing Application Performance on Cray XT6 and XE6 Supercomputers DOD-MOD...Maximizing Application Performance on Cray XT6 and XE6 Supercomputers DOD-MOD...
Maximizing Application Performance on Cray XT6 and XE6 Supercomputers DOD-MOD...
 

Recently uploaded

08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhisoniya singh
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
Snow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter RoadsSnow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter RoadsHyundai Motor Group
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxOnBoard
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraDeakin University
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your Budget
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your BudgetHyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your Budget
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your BudgetEnjoy Anytime
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptxLBM Solutions
 

Recently uploaded (20)

08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food Manufacturing
 

Presentation

  • 1. Durability for Memory-Based Key-Value Stores. Kiarash Rezahanjani, July 4, 2012
  • 2. Durability: the data store holds (university, KTH); set(university, UPC) is answered with an Ack; a later get(university) returns UPC
  • 3. Durability: data store; set(university, UPC); Ack; commodity; non-volatile
  • 4. Durability: data store; set(myKey, U); Ack; commodity
  • 5. Durability: disk writes and reads are slow (seek time + rotational time + transfer time)
  • 6. Cache in memory: slow writes, fast reads; cached objects; primary copy of objects; consistency?
  • 7. Cache in memory: stale data; spending resources; complicates development; writes are still slow. Diagram: application servers, Memcache servers, MySQL servers, with the operations Set ObjA, Read ObjA -> cache miss, Read Obj A, Delete Obj A, Update Obj A
  • 8. Memory-Based Databases: no inconsistency; no stale data; reads are fast; primary copy of objects; durability? write latency? backup
  • 9. Approaches towards durability (State A, State B): periodic snapshots (data loss); synchronous logging (slow); asynchronous logging (data loss)
  • 10. Approaches towards durability: replicas (expensive; catastrophic failure, all gone)
  • 11. Project goals: durable write; low latency; availability, able to recover quickly; cheap, commodity hardware
  • 12. Target systems: data is big = many machines; read-dominant workload; simple key-value store; small writes. Example, Facebook: terabytes of data = 2000 memcache servers; write/read ratio < 6%; Memcache is a key-value store; status update, tag photo, profile update, etc.
  • 13. Solution
  • 14. Design decisions: periodic snapshot vs. message logging
  • 15. Design decisions: local disk vs. remote location
  • 16. Design decisions: remote file server vs. local disks of database cluster
  • 17. Design decision: database client, write, Ack, log, remote storage
  • 18. Design decision, two problems: 1) synchronous logging (must use asynchronous logging; problem: data loss); 2) data availability (replication)
  • 19. Replication (diagram): Ack, replicated logs
  • 20. Replication: broadcast (master, slaves) and chain replication (head, tail); Ack, log
  • 21. Replication, broadcast: master and slaves; Ack, log
  • 22. Replication, chain replication: head, tail; Ack, log
  • 23. Replication, chain replication: head, tail; log, Ack
  • 24. Chain Replication: database client, write, Ack; replicated logs
  • 25. Chain Replication: synchronous logging abstraction; low latency; available logs; stable storage unit; database client, write, Ack
  • 27. Log Server (diagram of numbered log entries): Receiver, Reader, Persister; sequential write; seek time
  • 28. Forming storage units: 1. Query ZooKeeper; 2. Get list of servers; 3. Leader sends request; 4. Leader sends list of members (ID1, ID2, ID3); 5. Upload storage unit data; 6. Start the service
  • 29. Storage system: ZooKeeper, clients, stable storage units
  • 30. Failover: client; stable storage unit with ID 1 (50%), ID 2 (20%), ID 3 (30%); stable storage unit with ID 4 (40%), ID 5 (45%), ID 6 (20%)
  • 31. Failover: client; stable storage unit with ID 1 (50%), ID 2 (20%), ID 3 (30%); stable storage unit with ID 4 (40%), ID 5 (45%), ID 6 (20%)
  • 32. Failover: client; stable storage unit with ID 1 (50%), ID 2 (20%), ID 3 (30%); stable storage unit with ID 4 (40%), ID 5 (45%), ID 6 (20%)
  • 33. Evaluation: throughput and latency of stable storage unit (log entry sizes, replication factors); comparison with WAL into local disk
  • 34. Single synchronous client, replication factor of 3. Entry size (bytes): latency (ms), throughput (entries/sec). 200: 0.45, 2200; 1024: 0.62, 1600; 4096: 0.99, 1000
  • 35. Throughput vs. Latency, replication factor of 3. [Plot: latency (ms) versus throughput (entries/sec) for entry sizes of 5 B, 200 B, 1 KB, 4 KB, and 10 KB]
  • 36. Additional replica, entry size of 200 bytes. [Plot: latency (microseconds) versus throughput (entries/sec) for RF 2 and RF 3]
  • 39. Resource utilization: throughput of 6,000 entries/sec; log entries of 200 bytes; CPU utilization = 9%; bandwidth = 29 Mb/s; dedicated disk; small memory requirement
  • 40. Summary: durable write; low latency; high availability; scalable; no additional resources; avoid dependencies
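
The chain replication slides (20 to 25) and editor's note 4 describe a write being acknowledged once its log entry sits in the memory of several log servers, with persistence to disk happening afterwards. The following is a minimal single-process sketch of that idea, not the author's implementation: the `LogServer` class, its `memory` and `disk` lists, and `build_chain` are hypothetical names introduced for illustration, and real servers would communicate over the network rather than by method calls.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class LogServer:
    """One log server in a chain (hypothetical model for illustration)."""
    name: str
    next_server: Optional["LogServer"] = None
    memory: List[bytes] = field(default_factory=list)  # entries buffered in RAM
    disk: List[bytes] = field(default_factory=list)    # stands in for the on-disk log

    def append(self, entry: bytes) -> str:
        """Buffer the entry, forward it down the chain, return the acker's name."""
        self.memory.append(entry)
        if self.next_server is not None:
            return self.next_server.append(entry)
        # Tail reached: every server in the chain now holds the entry in memory,
        # so the acknowledgement can go back to the client.
        return self.name

    def persist(self) -> int:
        """Move buffered entries to 'disk' in arrival order (sequential write)."""
        flushed = len(self.memory)
        self.disk.extend(self.memory)
        self.memory.clear()
        return flushed


def build_chain(names: List[str]) -> LogServer:
    """Link servers head -> ... -> tail and return the head."""
    servers = [LogServer(n) for n in names]
    for current, nxt in zip(servers, servers[1:]):
        current.next_server = nxt
    return servers[0]


# Replication factor of three, as in the evaluation slides.
head = build_chain(["head", "middle", "tail"])
acked_by = head.append(b"set(university, UPC)")
```

In this sketch the client only talks to the head and the acknowledgement comes from the tail, so an acknowledged entry is already held by every server in the chain. If one server crashed before calling `persist()`, the remaining servers would still hold the entry and could write it to their own disks.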

Editor's Notes

  1. Resume
  2. Periodic snapshot: degrades performance at the time of the snapshot and generates a load spike on the machine.
  3. Important not to try to be all things to all people. Clients might be demanding 8 different things: doing 6 of them is easy, handling 7 of them requires real thought, and dealing with all 8 usually results in a worse system (more complex, compromises other clients in trying to satisfy everyone). E.g. Facebook: 800 memcache servers in 2008, 2000 now; < 6% writes. Updates are small (expect tag, add friend, new ads, status, profile update, sharing).
  4. After the log is replicated in the memory of several machines, an ack is sent to the client. If some of the processes crash, some other process on another machine will still persist the data. Several replicas provide better availability of the data at the time of recovery. Aggregate the read bandwidth of the servers to accelerate the recovery.
  5. Adding a replica doesn't introduce a bottleneck and does not impact throughput.
  6. Scalability
  7. Replication factor of three
  8. Common approach: WAL to local disk. Redis is an example of a popular in-memory database that uses WAL to disk. To guarantee durability of every log entry, it should be written to disk upon every write operation. Even when the log is written to disk, there is no guarantee that it is persisted on disk, because by default the disk caches are enabled. Process crash 1.7, also power outage 49; no availability if the server is down. Ours is a factor of 4 better than disk with cache disabled. Saturation can be prevented.
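
Note 8 describes the baseline the talk compares against: a write-ahead log on the local disk that is forced to disk on every write. A minimal sketch of that baseline follows, assuming a simple length-prefixed record format. The function names `append_durably` and `read_entries` and the record layout are hypothetical and are not taken from Redis or from the presented system.

```python
import os
import tempfile


def append_durably(fd: int, entry: bytes) -> None:
    """Append one length-prefixed entry, then ask the OS to flush it to the device."""
    record = len(entry).to_bytes(4, "big") + entry
    os.write(fd, record)  # small write to a regular file; partial writes not handled here
    os.fsync(fd)          # returns once the OS reports the data as written out


def read_entries(path: str) -> list:
    """Replay the log from the start, as a recovery procedure would."""
    entries = []
    with open(path, "rb") as f:
        while True:
            header = f.read(4)
            if len(header) < 4:
                break
            entries.append(f.read(int.from_bytes(header, "big")))
    return entries


log_path = os.path.join(tempfile.mkdtemp(), "wal.log")
fd = os.open(log_path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
try:
    append_durably(fd, b"set(university, UPC)")
    append_durably(fd, b"set(myKey, U)")
finally:
    os.close(fd)

recovered = read_entries(log_path)
```

The `os.fsync` call on every write is what makes this approach slow on a spinning disk. As the note points out, it may still not be enough when the drive's own write cache is enabled, because the device can report completion before the data reaches the platter.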