Evaluating Data Freshness in Large Scale Replicated Databases
 

My presentation of the paper titled "Evaluating Data Freshness in Large Scale Replicated Databases" at INForum 2010

Presentation Transcript

    • Evaluating Data Freshness in Large Scale Replicated Databases
      Miguel Araújo and José Pereira
      Partially funded by project ReD - Resilient Database Clusters (PDTC / EIA-EIA / 109044 / 2008)
    • Introduction
      • Why replicate?
        • Replicating data improves fault-tolerance
        • Allows the construction of large scale systems
        • Improves performance
      • There are different replication protocols:
        • Lazy schemes use separate transactions for execution and propagation
        • Eager schemes distribute updates to replicas in the context of the original updating transaction
      • Most database management systems implement lazy master-slave replication: the application has to deal with stale data in order to avoid data inconsistency among replicas
    • Background: Replication Mechanism
      • Replication mechanism of MySQL:
        1. The master records changes to its data in its binary log (these records are called binary log events)
        2. The slave copies the master's binary log events to its own log (the relay log)
        3. The slave replays the events in the relay log, applying the changes to its own data
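The log positions this mechanism exposes can be read directly, which is what the probe described later relies on. A minimal sketch, assuming a reachable MySQL server and the PyMySQL client library; host names, user and password are placeholders:

```python
# Minimal sketch: read replication log positions from a master and a slave.
# Assumes PyMySQL is installed; user name and password are placeholders.
import pymysql

def master_position(host, user="probe", password="secret"):
    """Return (binlog_file, position) currently being written by the master."""
    conn = pymysql.connect(host=host, user=user, password=password)
    try:
        with conn.cursor() as cur:
            cur.execute("SHOW MASTER STATUS")
            binlog_file, position = cur.fetchone()[:2]
            return binlog_file, int(position)
    finally:
        conn.close()

def slave_position(host, user="probe", password="secret"):
    """Return (master_log_file, position) already applied by a slave."""
    conn = pymysql.connect(host=host, user=user, password=password)
    try:
        with conn.cursor(pymysql.cursors.DictCursor) as cur:
            cur.execute("SHOW SLAVE STATUS")
            status = cur.fetchone()
            return status["Relay_Master_Log_File"], int(status["Exec_Master_Log_Pos"])
    finally:
        conn.close()
```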
    • Background: Replication Topologies
      • MySQL in particular allows almost any configuration of masters and slaves, as long as each server has at most one master
        • Master and multiple slaves
        • Chain
    • Background: Replication Topologies
        • Tree
        • Ring: the only topology that allows update-everywhere replication (as long as the application does not run into any conflicts)
    • Background: Data Freshness
      • But... these techniques do not provide data freshness guarantees
      • So, asynchronous replication leads to periods of time in which copies of the same data diverge
      • MySQL replication is commonly known as being very fast
      • Yet there have not been systematic efforts to characterize data freshness in MySQL
    • Goal
      Assessing the impact of replication topology in MySQL, towards maximizing scalability and data freshness
    • Agenda
      • Approach
      • Experiments: Workload, Setting, Results
      • Conclusions
    • Measuring Propagation Delay: Approach
      • Our approach is based on using a centralized probe
        • It periodically queries each of the replicas, thus discovering the last update applied
        • By comparing such positions, it should be possible to discover the propagation delay
      • There are, however, several challenges that have to be tackled to obtain correct results:
    • Measuring Propagation Delay: Approach
      • Measuring updates
      • Non-simultaneous probing
        Problem: the master and a slave cannot be probed simultaneously, so the readings are not directly comparable
        Solution: interpolate with a line the time-log position values of the master and then compare each replica value point with it; at the end, calculate the average
        [Figure: log position over time for the master and a slave]
    • Measuring Propagation Delay: Approach
      • Eliminating quiet periods
        Problem: sampling twice without updates erroneously biases the estimate
        Solution: select periods where the line segments from both replicas have a positive slope
        [Figure: log position over time for the master and a slave]
    • Measuring Propagation Delay: Approach
      • Dealing with variability
        Solution: consider a sufficient number of samples and assume that each probe reading happens after half of the observed round-trip
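Putting these ideas together, the delay estimation can be sketched roughly as follows. This is only an illustration of the interpolation described above, with hypothetical helper names; it assumes the probe has already collected (timestamp, log position) samples for the master and for each slave, with timestamps corrected by half of the observed round-trip time:

```python
# Sketch of the propagation delay estimate described in the slides.
# Samples are lists of (timestamp, log_position) pairs, ordered by time;
# timestamps are assumed to be corrected by half of the probe round-trip.

def interpolate_master_time(master_samples, position):
    """Linearly interpolate the time at which the master reached `position`."""
    for (t0, p0), (t1, p1) in zip(master_samples, master_samples[1:]):
        if p0 <= position <= p1 and p1 > p0:   # only segments with a positive slope
            return t0 + (t1 - t0) * (position - p0) / (p1 - p0)
    return None                                # position outside the sampled range

def average_delay(master_samples, slave_samples):
    """Average propagation delay of one slave with respect to the master."""
    delays = []
    for (prev_t, prev_p), (t, p) in zip(slave_samples, slave_samples[1:]):
        if p <= prev_p:          # quiet period on the slave: skip to avoid biasing the estimate
            continue
        t_master = interpolate_master_time(master_samples, p)
        if t_master is not None:
            delays.append(t - t_master)
    return sum(delays) / len(delays) if delays else None
```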
    • Experiments: Workload
      • We have chosen the workload model defined by the TPC-C benchmark
      • TPC-C is composed mostly of update-intensive transactions
      • The master server is almost entirely dedicated to update transactions even in a small scale experimental setting, mimicking what would happen in a very large scale MySQL setup
      • Each client is attached to a database server and produces a stream of transaction requests
    • Experiments: Setting
      • Two replication schemes were installed and configured: a five machine master and multiple slaves topology, and a five machine daisy chain topology
      • All machines are connected through a LAN and are named PD01 to PD06: PD01 is the master instance, PD04 is the remote machine on which the interrogation client executed, and the others are slave instances
      • Machines: HP lab machines
        OS: Linux Ubuntu Server 2.6.31-14
        CPU: 2x Intel(R) Core(TM)2 CPU 6400 @ 2.13GHz
        MEM: 1 GB
        DBMS: MySQL 5.1.39
      • Clients: 20, 40, 60, 80, 100
      • Scale factor (warehouses): 2
      • Duration: 20 minutes
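As an illustration of how the two topologies could be wired up, the sketch below issues standard CHANGE MASTER TO / START SLAVE statements against each slave. This is only a sketch under assumptions not stated in the slides: PyMySQL as the client, placeholder credentials and binlog coordinates, and chained servers already configured with log-bin and log-slave-updates:

```python
# Sketch: wiring the two experimental topologies (MySQL 5.1 replication syntax).
# Credentials and binlog coordinates are placeholders; assumes replication is
# not yet running on the target servers and that chained slaves have log-bin
# and log-slave-updates enabled.
import pymysql

def attach_slave(slave_host, master_host,
                 log_file="mysql-bin.000001", log_pos=4):
    """Point slave_host at master_host and start replication."""
    conn = pymysql.connect(host=slave_host, user="root", password="secret")
    try:
        with conn.cursor() as cur:
            cur.execute(
                "CHANGE MASTER TO MASTER_HOST=%s, MASTER_USER=%s, "
                "MASTER_PASSWORD=%s, MASTER_LOG_FILE=%s, MASTER_LOG_POS=%s",
                (master_host, "repl", "secret", log_file, log_pos),
            )
            cur.execute("START SLAVE")
    finally:
        conn.close()

# Master and multiple slaves: every slave replicates directly from PD01.
def build_star():
    for slave in ("PD02", "PD03", "PD05", "PD06"):
        attach_slave(slave, "PD01")

# Daisy chain: PD01 -> PD02 -> PD03 -> PD05 -> PD06.
def build_chain():
    hosts = ["PD01", "PD02", "PD03", "PD05", "PD06"]
    for master, slave in zip(hosts, hosts[1:]):
        attach_slave(slave, master)
```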
    • Results
      • Results for the different number of TPC-C clients on the master and multiple slaves topology
        [Chart: average delay (µs) per slave (PD02, PD03, PD05, PD06) vs. number of clients (20 to 100); y-axis up to 11000 µs]
    • Results
      • Results for the different number of TPC-C clients on the chain topology
        [Chart: average delay (µs) per slave (PD02, PD03, PD05, PD06) vs. number of clients (20 to 100); y-axis up to 32000 µs]
    • Conclusions
      We were able to measure freshness accurately and with a realistic workload
      Results show how much:
      • Delay grows with the workload
      • Delay grows with the number of replicas attached
      • Delay grows with the number of hops
      The conclusion is that the scalability of MySQL using its replication mechanism is limited when data freshness is a concern
    • Questions? Thank you