LIQUID: A Scalable Deduplication File System for Virtual Machine Images
INTRODUCTION: Cloud computing means storing and accessing data and programs over the internet instead of on your computer's hard drive.
A virtual machine is software that creates a virtualized environment between the computer platform and the end user, in which the end user can operate software.
Data deduplication is a data compression technique.
It eliminates duplicate copies of repeating data.
A redundant data block is replaced with a reference to the stored copy instead of being stored multiple times, which improves storage utilization.
ADVANTAGES OF LIQUID:
*Fast virtual machine deployment with peer-to-peer data transfer.
*Low storage consumption by means of deduplication.
*Instant cloning of virtual machine images.
*On-demand fetching over the network, with caching on local disks.
*LIQUID files have no specific limit.
CONCLUSION:
We presented LIQUID, a deduplication file system with good IO performance.
This is achieved by caching frequently accessed data blocks in a memory cache.
The cache avoids additional disk operations.
Deduplication of VM images proved to be effective.
1. LIQUID: A Scalable Deduplication File System for Virtual Machine Images
2. CONTENTS
INTRODUCTION
ADVANTAGES OF CLOUD COMPUTING
VIRTUAL MACHINE
ADVANTAGES OF VIRTUAL MACHINE
DISADVANTAGES OF VIRTUAL MACHINE
DEDUPLICATION
BENEFITS OF DEDUPLICATION
EXISTING SYSTEM
ISSUES IN VM STORAGE
LIQUID SYSTEM ARCHITECTURE
DEDUPLICATION IN LIQUID
3. CONTENTS (cont)
OPTIMIZATION ON FINGERPRINT CALCULATION
FILE SYSTEM LAYOUT
COMMUNICATION AMONG COMPONENTS
HEARTBEAT PROTOCOL
FAST CLONING FOR VM IMAGE
FAULT TOLERANCE
GARBAGE COLLECTION
ADVANTAGES OF LIQUID
CONCLUSION
REFERENCES
4. INTRODUCTION
Cloud computing means storing and accessing data and programs over the internet instead of on your computer's hard drive.
Figure 1: A Sample Cloud Computing Network[1]
5. ADVANTAGES OF CLOUD COMPUTING
Lower computer cost.
Improved performance.
Reduced software cost.
Instant software updates.
Unlimited storage capacity.
Increased data reliability.
Device independence and “always on, anywhere, any place” access.
Freedom from maintenance, with no need to know the underlying details.
6. VIRTUAL MACHINE
A virtual machine is software that creates a virtualized environment between the computer platform and the end user, in which the end user can operate software.
Virtualization deals with extending or replacing an existing interface so as to mimic the behavior of another system.
Virtual machines are a crucial component of cloud computing.
7. VIRTUAL MACHINE (cont)
A virtual machine is a hypothetical computer.
It executes programs like a physical machine.
The initial state of a virtual machine is stored in a file called a virtual machine image.
9. ADVANTAGES OF VIRTUAL MACHINE
Familiar interfaces
Isolation
-Each OS runs separately with its own virtual resources.
High availability
-If one VM server fails, data can easily be accessed from another one.
Scalability
-Add or remove resources easily.
10. ADVANTAGES OF VIRTUAL MACHINE (cont)
Backup with fast recovery
-Using a VMDK data recovery tool.
Reduction of cost
-Saves cost by running multiple OSes on a single machine.
-Sharing of hardware.
11. DISADVANTAGES OF VIRTUAL MACHINE
Difficulty in direct access to hardware.
High RAM consumption, since each virtual machine occupies its own separate area of RAM.
Heavy use of disk space, since each virtual machine holds all the files of the operating system installed on it.
A virtual machine is less efficient than an actual machine because it accesses the host hard drive indirectly.
12. DEDUPLICATION
Data deduplication is a data compression technique.
It eliminates duplicate copies of repeating data.
A redundant data block is replaced with a reference to the stored copy instead of being stored multiple times.
This improves storage utilization.
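
To make the idea concrete, here is a minimal sketch of block-level deduplication. It is an illustration only, not LIQUID's implementation: the `DedupStore` class and its method names are hypothetical, and SHA-1 is assumed as the fingerprint function.

```python
import hashlib

class DedupStore:
    """Toy content-addressed store (hypothetical): each unique block is kept once."""

    def __init__(self):
        self.blocks = {}   # fingerprint -> block bytes
        self.refs = {}     # fingerprint -> reference count

    def put(self, block: bytes) -> str:
        """Store a block and return its fingerprint; duplicates only bump a counter."""
        fp = hashlib.sha1(block).hexdigest()
        if fp not in self.blocks:
            self.blocks[fp] = block
        self.refs[fp] = self.refs.get(fp, 0) + 1
        return fp

    def get(self, fp: str) -> bytes:
        return self.blocks[fp]

store = DedupStore()
# Three 4 KB blocks are written, two of them identical.
refs = [store.put(b) for b in (b"A" * 4096, b"B" * 4096, b"A" * 4096)]
stored_bytes = sum(len(b) for b in store.blocks.values())
print(len(refs), "blocks written,", len(store.blocks), "stored,", stored_bytes, "bytes kept")
```

The caller keeps only the returned fingerprints, so a repeated block costs one reference rather than another full copy.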
15. BENEFITS OF DEDUPLICATION
Lower storage space requirements.
Lower additional storage costs.
Improved performance.
Increased network efficiency.
Efficient volume replication.
16. EXISTING SYSTEM
Hypervisors such as Xen and KVM.
Network Attached Storage (NAS)
Storage Area Network (SAN)
Direct Attached Storage (DAS)
17. ISSUES IN VM STORAGE
High demand on VM storage remains a challenging problem.
Existing systems have made efforts to reduce storage consumption.
They use a SAN cluster.
This cannot satisfy increasing demand due to cost limitations.
Hence we propose LIQUID.
18. LIQUID SYSTEM ARCHITECTURE
Three components: a single meta server with a hot backup, multiple data servers, and multiple clients.
Each component runs as a user-level service process.
VM images are split into fixed-size data blocks.
Meta server: maintains the namespace, fingerprints, and reference counts.
The meta server is mirrored to a hot backup shadow meta server.
19. LIQUID SYSTEM ARCHITECTURE (cont)
Data servers are in charge of managing the data blocks in VM images.
They are organized in a distributed hash table.
A LIQUID client provides a POSIX-compatible file system.
The client is a critical component (it provides deduplication).
Fault tolerance is achieved by mirroring the meta server.
Replicas of data blocks are also stored.
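
As a rough illustration of how a distributed hash table can place blocks and their replicas, the sketch below derives a server index from the block fingerprint. The placement rule, the `servers_for_block` function and the server names are assumptions made for this example; the slides do not describe LIQUID's actual placement scheme.

```python
import hashlib

def servers_for_block(block: bytes, servers: list, replicas: int = 2) -> list:
    """Pick `replicas` distinct servers for a block from its fingerprint (assumed rule)."""
    fp = hashlib.sha1(block).digest()
    start = int.from_bytes(fp[:8], "big") % len(servers)
    count = min(replicas, len(servers))
    # The primary is chosen by the hash; replicas go to the following servers.
    return [servers[(start + i) % len(servers)] for i in range(count)]

data_servers = ["ds0", "ds1", "ds2", "ds3"]   # hypothetical server names
placement = servers_for_block(b"\x00" * 4096, data_servers)
print(placement)
```

Because placement depends only on block contents, any client can locate a block from its fingerprint without asking a central directory.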
21. DEDUPLICATION IN LIQUID
Liquid chooses fixed-size chunking instead of variable-size chunking.
This works well since all files stored in VM images are aligned on disk block boundaries.
Advantage: simplicity.
Block size choice:
Block size is a balancing factor that is hard to choose.
It has a great impact on both deduplication and IO performance.
22. DEDUPLICATION IN LIQUID (cont)
A smaller block size causes more random seeks when accessing a VM image.
This is not tolerable.
A large block size is also not preferable, since it reduces the deduplication ratio.
Liquid chooses different block sizes in different situations.
It is advised to use a multiple of 4 KB between 256 KB and 1 MB to achieve a good balance between IO performance and deduplication ratio.
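
The trade-off can be seen on a small synthetic example. The sketch below is only an illustration under stated assumptions: the helper names are hypothetical, the "image" is artificial data built from two repeating 256 KB regions, and the deduplication ratio is taken here as total blocks divided by unique blocks.

```python
import hashlib

KB = 1024

def check_block_size(size: int) -> int:
    """Accept only multiples of 4 KB in the range 256 KB to 1 MB."""
    if size % (4 * KB) != 0 or not (256 * KB <= size <= 1024 * KB):
        raise ValueError(f"unsuitable block size: {size}")
    return size

def split_fixed(image: bytes, block_size: int) -> list:
    """Split an image into fixed-size blocks (the last one may be shorter)."""
    return [image[i:i + block_size] for i in range(0, len(image), block_size)]

def dedup_ratio(image: bytes, block_size: int) -> float:
    """Total blocks divided by unique blocks; higher means more redundancy removed."""
    blocks = split_fixed(image, check_block_size(block_size))
    unique = {hashlib.sha1(b).digest() for b in blocks}
    return len(blocks) / len(unique)

# Synthetic 2 MB image made of two kinds of 256 KB regions in the order A B A A B B A B.
a = bytes(256 * KB)
b = b"\x01" * (256 * KB)
image = a + b + a + a + b + b + a + b

for size in (256 * KB, 512 * KB, 1024 * KB):
    print(size // KB, "KB blocks -> ratio", round(dedup_ratio(image, size), 2))
```

On this input the ratio drops as the block size grows, because larger blocks are less likely to match each other exactly.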
25. OPTIMIZATION ON FINGERPRINT CALCULATION
Deduplication relies on comparing data block fingerprints to detect redundancy.
Fingerprint: a collision-resistant hash value calculated from the data block contents.
MD5 [26] and SHA-1 [12] are frequently used for this purpose.
The probability of a fingerprint collision is very small, orders of magnitude smaller than hardware error rates.
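
A back-of-the-envelope check of the collision claim, using the standard birthday bound. This is a sketch under assumptions: the hash is treated as ideal, only accidental collisions are covered (not deliberately crafted ones), and the store size of 2**40 blocks is a hypothetical figure chosen for illustration.

```python
def collision_bound(num_blocks: int, hash_bits: int) -> float:
    """Birthday upper bound: P(any collision) <= (n * (n - 1) / 2) / 2**bits."""
    pairs = num_blocks * (num_blocks - 1) // 2
    return pairs / 2 ** hash_bits

# Hypothetical store holding 2**40 distinct blocks.
n = 2 ** 40
print("128-bit fingerprint (MD5 length):  ", collision_bound(n, 128))
print("160-bit fingerprint (SHA-1 length):", collision_bound(n, 160))
```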
26. OPTIMIZATION ON FINGERPRINT CALCULATION (cont)
So we can safely assume that two data blocks with the same fingerprint are identical.
Fingerprint calculation is expensive.
Liquid delays fingerprint calculation for recently modified data blocks.
It runs deduplication lazily, only when it is necessary.
The client side maintains a shared cache that contains recently accessed data blocks.
27. OPTIMIZATION ON FINGERPRINT CALCULATION (cont)
A portion of memory is used by the client side of Liquid as a private cache.
The private cache holds modified data blocks and delays fingerprint calculation on them.
A modified data block is ejected from the shared cache and added to the private cache.
Modified data blocks are ejected when the private cache becomes full.
28. OPTIMIZATION ON FINGERPRINT CALCULATION (cont)
Ejection follows an LRU policy.
Only then is the modified data block's fingerprint calculated.
Liquid uses multiple threads for fingerprint calculation.
Multiple threads process different data blocks concurrently.
This provides good IO performance.
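
The lazy scheme described on the slides above can be sketched as follows. This is a simplified illustration, not LIQUID's code: the `PrivateCache` class, its capacity measured in blocks, and the `flush` method are hypothetical, and the shared cache is left out.

```python
import hashlib
from collections import OrderedDict
from concurrent.futures import ThreadPoolExecutor

class PrivateCache:
    """Holds modified blocks; fingerprints them only when they are evicted."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.dirty = OrderedDict()   # block number -> bytes, least recently used first
        self.fingerprinted = {}      # block number -> SHA-1 hex digest

    def write(self, block_no: int, data: bytes) -> None:
        self.dirty[block_no] = data
        self.dirty.move_to_end(block_no)         # mark as most recently used
        self.fingerprinted.pop(block_no, None)   # any older fingerprint is stale
        while len(self.dirty) > self.capacity:
            victim, old = self.dirty.popitem(last=False)   # evict the LRU block
            self.fingerprinted[victim] = hashlib.sha1(old).hexdigest()

    def flush(self, workers: int = 4) -> None:
        """Fingerprint all remaining dirty blocks using a pool of threads."""
        items = list(self.dirty.items())
        with ThreadPoolExecutor(max_workers=workers) as pool:
            digests = pool.map(lambda kv: hashlib.sha1(kv[1]).hexdigest(), items)
            for (block_no, _), digest in zip(items, digests):
                self.fingerprinted[block_no] = digest
        self.dirty.clear()

cache = PrivateCache(capacity=2)
cache.write(0, b"v1")
cache.write(0, b"v2")   # overwritten while cached: "v1" is never hashed
cache.write(1, b"x")
cache.write(2, b"y")    # cache is full, so block 0 is evicted and hashed now
print(sorted(cache.dirty), cache.fingerprinted)
```

A block that is rewritten several times while it sits in the cache is hashed once at most, which is where the saving comes from.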
29. FILE SYSTEM LAYOUT
All file system metadata is stored on the meta server.
It is organized in a file system tree.
The client side can cache portions of the file system metadata for fast access.
When a VM is stopped, modified metadata and data blocks are pushed back to the meta server and data servers.
This ensures that modifications to the VM image are visible to other client nodes.
32. HEARTBEAT PROTOCOL
The meta server manages all data servers.
It exchanges regular heartbeat messages with each data server in a round-robin fashion.
Round-robin detection of failed data servers becomes slow when there are many data servers.
To speed up failure detection, data servers also send an error signal to the meta server.
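
A minimal sketch of the two detection paths, assuming a `probe` callable that reports whether a server answers. The `MetaServer` class, its method names, and the choice to re-probe a server when an error signal arrives are all assumptions for illustration; real heartbeats would be network messages on a timer.

```python
import itertools

class MetaServer:
    """Probes one data server per tick in round-robin order; also accepts error signals."""

    def __init__(self, servers, probe):
        self.servers = list(servers)
        self.probe = probe                       # callable: server name -> bool (alive?)
        self.failed = set()
        self._ring = itertools.cycle(self.servers)

    def tick(self) -> str:
        """One heartbeat step: probe the next server in the ring."""
        server = next(self._ring)
        if self.probe(server):
            self.failed.discard(server)
        else:
            self.failed.add(server)
        return server

    def report_error(self, server: str) -> None:
        """Fast path: an error signal arrived, so check that server immediately."""
        if not self.probe(server):
            self.failed.add(server)

alive = {"ds0": True, "ds1": True, "ds2": False}   # hypothetical cluster state
meta = MetaServer(alive, alive.get)
meta.tick()                  # probes ds0 only; ds2 has not been reached yet
meta.report_error("ds2")     # the error signal flags ds2 without waiting for its turn
print(meta.failed)
```

With N data servers the ring needs up to N ticks to reach a failed server, while the error signal is handled in a single step.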
33. FAST CLONING FOR VM IMAGE
Copying large images can be time consuming.
Liquid provides an efficient solution by means of fast cloning.
VM images are represented by metadata files holding references to data blocks.
A cloned VM image is obtained by copying the metadata file and updating the reference counts.
Modifications to a cloned image do not affect the original image.
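
The cloning steps above can be sketched with an in-memory metadata table. The `MetaStore` class, its method names, and the fingerprint strings are hypothetical; the point is only that a clone copies the list of references, not the data blocks.

```python
class MetaStore:
    """Toy metadata table (hypothetical): images are lists of block fingerprints."""

    def __init__(self):
        self.images = {}     # image name -> list of block fingerprints
        self.refcount = {}   # fingerprint -> number of references

    def create(self, name, fingerprints):
        self.images[name] = list(fingerprints)
        for fp in fingerprints:
            self.refcount[fp] = self.refcount.get(fp, 0) + 1

    def clone(self, src, dst):
        """Copy only the metadata and bump reference counts; no data block is copied."""
        self.create(dst, self.images[src])

    def write_block(self, name, index, new_fp):
        """Point one block of one image at new content, leaving other images alone."""
        old_fp = self.images[name][index]
        self.refcount[old_fp] -= 1
        self.images[name][index] = new_fp
        self.refcount[new_fp] = self.refcount.get(new_fp, 0) + 1

meta = MetaStore()
meta.create("base", ["fp1", "fp2"])
meta.clone("base", "vm1")
meta.write_block("vm1", 1, "fp3")   # modify the clone only
print(meta.images, meta.refcount)
```

After the write, the clone refers to the new block while the original image still lists its old fingerprints, so the cost of a clone is proportional to the metadata size rather than the image size.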
35. GARBAGE COLLECTION
Removes unused (garbage) data blocks when running out of space.
Reference counts of all data blocks are maintained by the meta server.
A garbage collection request is issued periodically to the data servers.
Garbage collection is executed based on data block membership in the Bloom filter.
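
One way such a collection pass could work is sketched below, assuming the meta server builds a Bloom filter of fingerprints that are still referenced and a data server deletes every stored block that is not in it. The filter parameters, the `BloomFilter` class and the `collect_garbage` function are illustrative assumptions, not LIQUID's actual design.

```python
import hashlib

class BloomFilter:
    """Small Bloom filter; sizes are illustrative (bits must be a multiple of 8)."""

    def __init__(self, bits: int = 1 << 16, hashes: int = 4):
        self.bits, self.hashes = bits, hashes
        self.array = bytearray(bits // 8)

    def _positions(self, key: bytes):
        # Derive several bit positions by salting SHA-1 with the hash index.
        for i in range(self.hashes):
            digest = hashlib.sha1(bytes([i]) + key).digest()
            yield int.from_bytes(digest[:8], "big") % self.bits

    def add(self, key: bytes) -> None:
        for p in self._positions(key):
            self.array[p // 8] |= 1 << (p % 8)

    def __contains__(self, key: bytes) -> bool:
        return all(self.array[p // 8] & (1 << (p % 8)) for p in self._positions(key))

def collect_garbage(stored: dict, live: BloomFilter) -> list:
    """Delete stored blocks whose fingerprint is definitely not in the live set."""
    dead = [fp for fp in stored if fp not in live]
    for fp in dead:
        del stored[fp]
    return dead

# Meta server side: only fingerprints with a positive reference count are live.
refcount = {b"fp1": 2, b"fp2": 0, b"fp3": 1}
live = BloomFilter()
for fp, count in refcount.items():
    if count > 0:
        live.add(fp)

# Data server side: drop whatever the filter does not contain.
stored = {b"fp1": b"...", b"fp2": b"...", b"fp3": b"..."}
removed = collect_garbage(stored, live)
print(removed, sorted(stored))
```

A Bloom filter can report false positives but never false negatives, so in this sketch a garbage block may occasionally survive a pass, but a live block is never deleted.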
36. ADVANTAGES OF LIQUID
Fast virtual machine deployment with peer-to-peer data transfer.
Low storage consumption by means of deduplication.
Instant cloning of virtual machine images.
On-demand fetching over the network, with caching on local disks.
LIQUID files have no specific limit.
37. CONCLUSION
We presented LIQUID, a deduplication file system with good IO performance.
This is achieved by caching frequently accessed data blocks in a memory cache.
The cache avoids additional disk operations.
Deduplication of VM images proved to be effective.