Master's thesis in Computer Science, based on distributed systems and database systems. It is about replica management in distributed database systems.
Distributed Dynamic Replication Management Mechanism Based on Accessing Frequency Detecting
1. Supervisor : Daw Khaing
Name : Ma May Sit Hman
Roll No. : 5MCS – 5
Batch : 23rd Batch
Seminar : Third Seminar
Date : 11.12.2018
2. Outline
Abstract
Objective
Introduction
Basic Model of FDRM
Rules for Replica Adding and Deleting
Interval of Periodicity
Advantages of FDRM
Algorithm of System
System Overview
System Flow
Database Design
Sequence Diagram
Example Calculations
System Implementations
Related Works
Conclusions
References
Thesis Time Schedule
3. Abstract
This system introduces replication management policies for distributed file systems.
This system presents a decentralized dynamic replication management mechanism
based on accessing frequency detecting named FDRM (Frequency Detecting
Replication Management).
In FDRM, to provide better system performance and reduce network traffic, each node scans its local data replicas to monitor their access patterns and independently decides to add, delete or migrate replicas.
In addition, the scanning interval of a replica varies with the access frequency of that replica, which makes FDRM more sensitive to changes in system behavior.
4. Objectives
To build a reliable system (ensuring consistency among redundant resources to improve reliability and accessibility).
To improve system performance by dynamically adding or deleting replicas to
optimize the message traffic in the network.
To reduce network traffic effectively with low system overhead.
To provide a uniform view of the multiple replicas of a file.
5. Introduction
Replication of data is an essential component of distributed file systems and peer-
to-peer systems.
There are two major motivations for data replication: increasing data availability
and improving system performance.
Adding replicas of files in a distributed file system keeps file data accessible even when some copies are unavailable, which increases file reliability and availability.
Also, if a file is replicated near the location where it is often accessed, communication costs can be reduced by reading the data nearby; response time is then reduced and system performance is enhanced.
6. Introduction (Cont’d)
A distributed file system must provide a uniform view of the multiple replicas of a file, which requires special effort to maintain data consistency.
As network environments rapidly grow more complex, the network costs and system overhead of synchronizing data among file replicas also increase greatly.
Under these circumstances, an efficient replication management mechanism for large-scale distributed file systems is needed.
7. Introduction (Cont’d)
This proposal presents an effective distributed dynamic replica management
mechanism based on accessing frequency detecting named FDRM.
The aim of FDRM is to improve system performance by dynamically adding or
deleting file replicas to optimize the message traffic in the network.
FDRM runs in a distributed manner. Every node makes its own decision
independently to add, delete or move its replicas.
In this way, FDRM treats each replica individually and achieves better performance with low system overhead.
8. Basic Model of FDRM
FDRM works in the read-one-write-all context:
a file read involves only one replica
each write update is propagated to all replicas
a read request is satisfied locally or by the nearest replica
a write request is applied to all replicas of the file, including the original file.
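The read-one-write-all behavior listed above can be sketched as follows. This is a minimal in-memory illustration under stated assumptions, not FDRM's actual implementation: the class name `ReplicatedFile`, the dict of replicas keyed by node name, and the `distance` map used to pick the nearest replica are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ReplicatedFile:
    # Hypothetical model: node name -> content of the replica stored on that node.
    replicas: dict[str, str] = field(default_factory=dict)

    def read(self, reader: str, distance: dict[str, int]) -> tuple[str, str]:
        """Read-one: serve locally if the reader holds a replica, else use the nearest one.

        `distance` (hypothetical) maps each replica node to its network distance
        from `reader`. Returns (serving_node, content).
        """
        if reader in self.replicas:
            return reader, self.replicas[reader]
        nearest = min(self.replicas, key=lambda node: distance[node])
        return nearest, self.replicas[nearest]

    def write(self, content: str) -> int:
        """Write-all: apply the update to every replica; return how many were updated."""
        for node in self.replicas:
            self.replicas[node] = content
        return len(self.replicas)
```

For example, with replicas on nodes A, B and C, a write updates all three, and a read from a node D that holds no replica is served by whichever of A, B or C is closest to D.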
9. Basic Model of FDRM (Cont’d)
Rules for Replica Adding and Deleting
Assume that there are n replicas of a given file f in the system. Let node i (i = 1, 2, …, n) denote the n nodes that store these replicas, let α denote the system cost to satisfy a read request at the closest replica, and let β denote the system cost to update a replica.
Working in a distributed manner, FDRM uses only the local data of a replica to determine the replication scheme.
During a time interval t, the replica of f on node i maintains counters that record requests to f, both local and remote: $r_{in}$ is the number of local read requests, $r_{out,j}$ is the number of read requests from node j, $w_{in}$ is the number of local write requests, and $w_{out}$ is the number of update requests that node i receives from other nodes.
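The per-interval counters defined above could be kept in a small structure such as the one below. This is a sketch under assumptions: the class `ReplicaCounters`, its `record`/`reset` methods and the `"read"`/`"write"` request kinds are hypothetical names chosen for illustration; only the four counters come from the model.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class ReplicaCounters:
    """Access counters kept by the replica of file f on node i during one interval t."""

    node: str                                         # the node i holding this replica
    r_in: int = 0                                     # local read requests
    r_out: Counter = field(default_factory=Counter)   # r_out[j]: read requests from node j
    w_in: int = 0                                     # local write requests
    w_out: int = 0                                    # update requests received from other nodes

    def record(self, kind: str, origin: str) -> None:
        """Count one request of `kind` ("read" or "write") issued by node `origin`."""
        local = origin == self.node
        if kind == "read":
            if local:
                self.r_in += 1
            else:
                self.r_out[origin] += 1
        elif kind == "write":
            if local:
                self.w_in += 1
            else:
                self.w_out += 1
        else:
            raise ValueError(f"unknown request kind: {kind!r}")

    def reset(self) -> None:
        """Clear all counters at the start of a new interval t."""
        self.r_in = self.w_in = self.w_out = 0
        self.r_out.clear()
```

Using a `Counter` for $r_{out,j}$ keeps one entry per requesting node j without having to know the set of nodes in advance.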
10. Basic Model of FDRM (Cont’d)
Rules for Replica Adding and Deleting (Cont’d)
The precondition for node i to add a new replica on another node j is that the overall cost after adding is less than the cost before adding:
Equation no. (1) :
$$r_{out,j}\,\alpha + n\,w_{out}\,\beta + (n-1)\,w_{in}\,\beta > (n+1)\,w_{out}\,\beta + n\,w_{in}\,\beta$$
The precondition for node i to delete its replica of f is that the overall cost after deleting is less than the cost before deleting:
Equation no. (2) :
$$n\,w_{out}\,\beta + (n-1)\,w_{in}\,\beta > r_{in}\,\alpha + (n-1)\,w_{out}\,\beta + (n-1)\,w_{in}\,\beta$$
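Equations (1) and (2) translate directly into two predicates that a node could evaluate at the end of each interval. A minimal sketch; the function names `should_add_replica` and `should_delete_replica` are hypothetical, and α and β are passed in as plain numbers.

```python
def should_add_replica(r_out_j: int, w_in: int, w_out: int, n: int,
                       alpha: float, beta: float) -> bool:
    """Eq. (1): add a replica of f on node j if the cost after adding is lower."""
    cost_before = r_out_j * alpha + n * w_out * beta + (n - 1) * w_in * beta
    cost_after = (n + 1) * w_out * beta + n * w_in * beta
    return cost_before > cost_after


def should_delete_replica(r_in: int, w_in: int, w_out: int, n: int,
                          alpha: float, beta: float) -> bool:
    """Eq. (2): delete node i's replica of f if the cost after deleting is lower."""
    cost_before = n * w_out * beta + (n - 1) * w_in * beta
    cost_after = r_in * alpha + (n - 1) * w_out * beta + (n - 1) * w_in * beta
    return cost_before > cost_after
```

For instance, with n = 3, α = β = 1, $w_{in}$ = 1 and $w_{out}$ = 2, five reads from node j give a cost of 13 before adding against 11 after, so a replica is added; three reads give 11 against 11, so it is not.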
11. Basic Model of FDRM (Cont’d)
Rules for Replica Adding and Deleting (Cont’d)
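In a real system, we can assume α ≈ β, so Eqs. (1) and (2) simplify to
$$r_{out,j} > w_{out} + w_{in}$$
and
$$r_{in} < w_{out}$$
In other words, node i adds a replica of f on node j when the reads from node j outnumber the local and remote writes to f, and deletes its own replica when its local reads are fewer than the updates it receives.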
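Setting α = β and dividing both sides by β (assumed positive), the reduction of Eqs. (1) and (2) to the two simplified rules can be checked term by term:

$$
\begin{aligned}
\text{(1)}:\quad & r_{out,j} + n\,w_{out} + (n-1)\,w_{in} > (n+1)\,w_{out} + n\,w_{in} \\
\iff\quad & r_{out,j} > \big[(n+1) - n\big]\,w_{out} + \big[n - (n-1)\big]\,w_{in} = w_{out} + w_{in} \\[4pt]
\text{(2)}:\quad & n\,w_{out} + (n-1)\,w_{in} > r_{in} + (n-1)\,w_{out} + (n-1)\,w_{in} \\
\iff\quad & \big[n - (n-1)\big]\,w_{out} > r_{in} \iff r_{in} < w_{out}
\end{aligned}
$$

Note that n cancels in both cases, so a node can apply the simplified rules using only its own counters, without knowing how many replicas of f exist.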
12. Basic Model of FDRM (Cont’d)
Interval of Periodicity
After each period of time t, a node decides whether to add or delete a replica.
The length of t strongly affects system performance.
In FDRM, the interval of periodicity varies with the frequency of file access.
The interval is short when access is infrequent and becomes longer as the frequency of file access increases.
In this way, different parts of the replication scheme may have different scan frequencies.
To keep intervals from becoming too short or too long, we set lower and upper limits on t:
MIN_PERIOD <= t <= MAX_PERIOD
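One way to realize the variable, clamped interval is sketched below, following the direction stated above (a longer interval as the access frequency rises, a shorter one as it falls). The slides specify only that direction and the clamping; the function name `next_period`, the doubling/halving factor and the concrete MIN_PERIOD/MAX_PERIOD values are illustrative assumptions.

```python
MIN_PERIOD = 10.0    # seconds; illustrative value, not from FDRM
MAX_PERIOD = 600.0   # seconds; illustrative value, not from FDRM


def next_period(t: float, rate: float, prev_rate: float) -> float:
    """Return the next scan interval for one replica.

    `rate` and `prev_rate` are access frequencies (accesses per second) measured
    over the current and previous intervals. The factor of 2 is an assumption.
    """
    if rate > prev_rate:
        t *= 2    # access frequency increased: lengthen the interval
    elif rate < prev_rate:
        t /= 2    # access frequency decreased: shorten the interval
    # Enforce MIN_PERIOD <= t <= MAX_PERIOD.
    return max(MIN_PERIOD, min(MAX_PERIOD, t))
```

Frequencies rather than raw counts are compared, because the interval length itself changes between scans.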
14. Basic Model of FDRM (Cont’d)
Advantages of FDRM
FDRM works in the read-one-write-all context, which means a file read involves only one replica and each write update is propagated to all replicas; this is an advantage for data consistency.
A read request is satisfied locally or by the nearest replica; this is an advantage for reducing user latency.
It also helps avoid lost updates.
22. Related Work
“Data Replication in Aircraft Components Database System using Distributed
Database System”, Nann Thin Thin Nwe, University of Computer Studies, Yangon, 2010.
Method : Push-based Algorithm
Whenever a master copy (for example, of data A) is updated at replica 1, an invalidation message M is constructed and flooded through the network to the other replicas.
Message M contains at least the following information: the origin server ID (e.g., IP address), the data record, and the new version number of that data record. A replica that receives M checks whether it caches data A; if so, it compares its local version number of A with the one in the message and invalidates its local copy if the message carries a newer version number. It then forwards M to its neighbors regardless of whether it holds the specified data, in the same way that it forwards queries.
Application Area : This system presents the replication process used to improve data availability in the Aircraft Components Database System by implementing a distributed database system across different physical locations.
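The push-based invalidation described above can be simulated as a breadth-first flood over the replica graph. This is a sketch under assumptions: the names `Invalidation` and `flood_invalidation`, the adjacency-list and cache dictionaries, and the `seen` set that stops a node from processing the same message twice are hypothetical (the description does not say how duplicate messages are suppressed).

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Invalidation:
    origin: str    # origin server ID, e.g. an IP address
    record: str    # the data record, e.g. "A"
    version: int   # new version number of that record


def flood_invalidation(msg: Invalidation,
                       neighbors: dict[str, list[str]],
                       cache: dict[str, dict[str, int]]) -> set[str]:
    """Flood `msg` from its origin; return the nodes that invalidated a stale copy.

    `neighbors` maps each node to its neighbor nodes; `cache` maps each node to
    the {record: version} pairs it currently caches.
    """
    invalidated: set[str] = set()
    seen = {msg.origin}               # assumption: each node handles M only once
    queue = deque([msg.origin])
    while queue:
        node = queue.popleft()
        for nb in neighbors.get(node, []):
            if nb in seen:
                continue
            seen.add(nb)
            local = cache.get(nb, {})
            if msg.record in local and local[msg.record] < msg.version:
                del local[msg.record]   # newer version announced: drop the stale copy
                invalidated.add(nb)
            queue.append(nb)            # forward M whether or not nb caches the record
    return invalidated
```

A node that does not cache the record still forwards M, so replicas further away (such as one reachable only through a non-caching node) are still reached.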
23. Related Work (Cont’d)
“A System for Dynamic Replication on Distributed Data”, Aye Thant, University of
Computer Studies, Yangon, 2011.
Method : Proportional Replication Strategy
Because the application is read-only in nature, replication can greatly improve performance. The system finds the average search size of the files. The uniform replication strategy, which gives every object the same number of replicas regardless of how often it is queried, is inefficient. A more natural policy is to replicate in proportion to the querying rate, which should reduce the search sizes for the more popular objects. In the proportional strategy, replicas are created according to each object's popularity. The Replica Manager calculates the overall average search size and compares the query count of each file with it. If the query count of a file is greater than the average search size, the Replica Manager creates a replica in a node that accesses the file frequently.
Application Area : The system was implemented using files in the network and focused on a digital library system.
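The proportional strategy and the Replica Manager's decision rule described above can be sketched as two small functions. Both names are hypothetical. `proportional_allocation` splits a fixed replica budget in proportion to query counts; the largest-remainder rounding it uses to keep the total exact is an assumption, not something the cited work specifies. `files_to_replicate` applies the comparison exactly as stated in the description.

```python
def proportional_allocation(query_counts: dict[str, int],
                            total_replicas: int) -> dict[str, int]:
    """Give each file a share of `total_replicas` proportional to its query count."""
    total_queries = sum(query_counts.values())
    if total_queries == 0:
        return {f: 0 for f in query_counts}   # no popularity information yet
    exact = {f: total_replicas * q / total_queries for f, q in query_counts.items()}
    alloc = {f: int(x) for f, x in exact.items()}   # floor of each exact share
    remaining = total_replicas - sum(alloc.values())
    # Hand leftover replicas to the files with the largest fractional parts.
    by_remainder = sorted(exact, key=lambda f: exact[f] - alloc[f], reverse=True)
    for f in by_remainder[:remaining]:
        alloc[f] += 1
    return alloc


def files_to_replicate(query_counts: dict[str, int],
                       avg_search_size: float) -> list[str]:
    """Replicate files whose query count exceeds the average search size."""
    return [f for f, q in query_counts.items() if q > avg_search_size]
```

With query counts of 60, 30 and 10 and a budget of 10 replicas, the allocation is 6, 3 and 1, so the more popular files receive more copies.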
24. Conclusions
The proposed decentralized dynamic replication management mechanism, named FDRM, is based on accessing frequency detecting.
In FDRM, each system node scans its local replicas to detect their access patterns and independently decides to add, delete or migrate a replica.
In addition, the scanning interval of a replica varies with the access frequency of that replica, which makes FDRM more sensitive to changes in system behavior.
25. References
1. ZHOU Xu, LU Xian-liang, HOU Meng-shu, WU Jin. “Research on Distributed Dynamic Replication Management Policy”. School of Computer Science and Engineering, University of Electronic Science and Technology of China, 2005.
2. Ranganathan K, Foster I. “Identifying Dynamic Replication Strategies for a High-Performance Data Grid”. International Workshop on Grid Computing, Denver, Springer-Verlag, 2001.
3. Cabri G, Corradi A, Zambonelli F. “Experience of Adaptive Replication in Distributed File Systems”. Proc. of EUROMICRO-22, Prague, 1996, pp. 459-466.