OSDC 2014: Fabrizio Manfredi - Data replication

Data replication is a crucial component for distributed services deployed in a multi-data-center environment. The replication schema needs to be carefully evaluated before it is implemented: a wrong design or misuse in most cases ends in a major service outage.
To understand replication, you need to understand the algorithms behind it. For this reason the session starts by explaining the algorithms most commonly used to deal with the CAP theorem (Consistency, Availability and Partition Tolerance), such as consistent hashing, vector clocks, gossip protocols, Paxos and Raft.
The second part of the talk analyzes how products on the market do replication (replication in action), with their advantages and disadvantages. It covers distributed filesystems (Ceph, Tahoe-LAFS, XtreemFS, ...), distributed databases (DB replication primitives and external tools such as Tungsten), NoSQL (Riak, Cassandra, MongoDB, CouchDB) and frameworks for in-house solutions (beardb, open replication, ...). The talk also shows the evaluation methods and the testing process for identifying the best solution for your environment.

Presentation Transcript

  • Beolink.org: Data replication. Fabrizio Manfredi Furuholmen
  • Slide 2 (footer: FOSDEM 2014): Agenda: Introduction, Overview, Theorem, Common Pattern, Implementation, Filesystem, RDBMS, NoSQL, Framework, Example
  • Slide 3: Data Replication. http://blog.open-e.com/in-a-nutshell-data-replication-snapshots-and-backup/
  • Slide 4: Data Replication. http://www.dreamstime.com/stock-images-cloud-computing-scalability-reliability-background-concept-word-image34898574
  • Slide 5: Introduction
  • Slide 6: World Connection
  • Slide 7: Main Problem: VS
  • Slide 8: Main Problem
  • Slide 9: CAP theorem. According to Brewer’s CAP theorem, it is impossible for any distributed computer system to simultaneously provide all three of Consistency, Availability and Partition Tolerance. You can’t have the three at the same time and get an acceptable latency.
  • Slide 10: CAP
    ACID (RDBMS):
    Atomic: Everything in a transaction succeeds or the entire transaction is rolled back.
    Consistent: A transaction cannot leave the database in an inconsistent state.
    Isolated: Transactions cannot interfere with each other.
    Durable: Completed transactions persist, even when servers restart etc.
    - Strong consistency for transaction highest priority
    - Pessimistic
    - Complex mechanisms
    BASE (NoSQL):
    Basic Availability, Soft-state, Eventual consistency
    - Availability and scaling highest priorities
    - Weak consistency
    - Optimistic
    - Best effort
    - Simple and fast
  • Slide 11: Data Distribution. Business decision!
  • Slide 12: Start with some Algorithms
  • Slide 13: Data Distribution. Replication: data placement, data consistency, system coordination, data transmission
  • Slide 14: Data Placement. Better distribution = partitioning. Parallel operation = parallel stream / multi-core.
  • Slide 15: Data Placement
  • Slide 16: Data placement by hash. It isn’t rocket science!
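Placement by hash can be sketched in a few lines. The following is a minimal illustration, not code from the talk: the node names, the key names and the choice of SHA-256 are assumptions made for the example. It also measures how many keys change owner when a node is added, which is the weakness of plain hash-modulo-N placement that consistent hashing addresses.

```python
import hashlib


def stable_hash(key: str) -> int:
    """Map a key to an integer that is stable across processes (unlike built-in hash())."""
    return int.from_bytes(hashlib.sha256(key.encode("utf-8")).digest()[:8], "big")


def place(key: str, nodes: list[str]) -> str:
    """Pick the owner of a key with hash(key) mod N."""
    return nodes[stable_hash(key) % len(nodes)]


def moved_fraction(keys: list[str], old_nodes: list[str], new_nodes: list[str]) -> float:
    """Fraction of keys whose owner changes between two cluster layouts."""
    moved = sum(1 for k in keys if place(k, old_nodes) != place(k, new_nodes))
    return moved / len(keys)


# Hypothetical cluster and object names, used only for illustration.
keys = [f"object-{i}" for i in range(10_000)]
nodes4 = ["node-a", "node-b", "node-c", "node-d"]
nodes5 = nodes4 + ["node-e"]

fraction = moved_fraction(keys, nodes4, nodes5)
print(f"keys that moved after adding one node: {fraction:.0%}")
```

Going from 4 to 5 nodes, a key keeps its owner only when its hash gives the same remainder modulo 4 and modulo 5, so most keys are expected to move.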
  • Slide 17: Data Distribution. Consistent hash; Chord; space-based / multi-dimension. http://www.cs.rutgers.edu/~pxk/417/notes/23-lookup.html
  • Slide 18: Data placement. Vnode-based; proximity-based; replication. http://highlyscalable.wordpress.com/2012/09/18/distributed-algorithms-in-nosql-databases/
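A consistent hash ring with virtual nodes and replica selection might look like the sketch below. It is an illustration under assumptions: the class name `HashRing`, the `vnodes=64` default and the node names are invented for the example and do not come from the slides or from any specific product.

```python
import bisect
import hashlib


def ring_hash(value: str) -> int:
    """Position of a string on the ring (64-bit slice of SHA-256)."""
    return int.from_bytes(hashlib.sha256(value.encode("utf-8")).digest()[:8], "big")


class HashRing:
    """Consistent hash ring where every physical node owns several virtual nodes."""

    def __init__(self, nodes, vnodes=64):
        # Each (position, node) pair is one virtual node on the ring.
        self._points = sorted(
            (ring_hash(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self._positions = [position for position, _ in self._points]
        self._node_count = len(set(nodes))

    def replicas(self, key, n=3):
        """Walk clockwise from the key and collect the first n distinct nodes."""
        n = min(n, self._node_count)
        start = bisect.bisect_right(self._positions, ring_hash(key))
        owners = []
        for offset in range(len(self._points)):
            node = self._points[(start + offset) % len(self._points)][1]
            if node not in owners:
                owners.append(node)
                if len(owners) == n:
                    break
        return owners


# Hypothetical cluster, used only for illustration.
ring4 = HashRing(["node-a", "node-b", "node-c", "node-d"])
ring5 = HashRing(["node-a", "node-b", "node-c", "node-d", "node-e"])
keys = [f"object-{i}" for i in range(10_000)]
moved = [k for k in keys if ring4.replicas(k, 1) != ring5.replicas(k, 1)]
print(f"primary owner changed for {len(moved) / len(keys):.0%} of keys")
print("replica set for object-1:", ring5.replicas("object-1", 3))
```

Adding a node only moves the keys that the new node takes over, roughly 1/N of the data, and every moved key lands on the new node.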
  • Slides 19-20: Data Consistency. To avoid an ACID implementation while still guaranteeing consistency, some solutions leave ownership of the algorithm to the client.
    - Read and write quorum
    - Write quorum, read all
    http://highlyscalable.wordpress.com/2012/09/18/distributed-algorithms-in-nosql-databases/
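The read and write quorum idea can be illustrated with a small in-memory model. This is a sketch under assumptions: the `QuorumStore` class, its single version counter and the `reachable` argument (the replica indices that answer a request) are simplifications invented for the example. The rule it relies on is R + W > N, which forces every read set to overlap the latest write set.

```python
class QuorumStore:
    """N in-memory replicas; writes need W answers, reads need R answers."""

    def __init__(self, n: int, r: int, w: int):
        if r + w <= n:
            raise ValueError("R + W must be greater than N so reads overlap writes")
        self.replicas = [dict() for _ in range(n)]  # key -> (version, value)
        self.r, self.w = r, w
        self._version = 0  # simplification: one coordinator hands out versions

    def write(self, key, value, reachable):
        """Store the value on every reachable replica if a write quorum answers."""
        if len(reachable) < self.w:
            raise RuntimeError("write quorum not reached")
        self._version += 1
        for index in reachable:
            self.replicas[index][key] = (self._version, value)

    def read(self, key, reachable):
        """Ask the reachable replicas and return the value with the highest version."""
        if len(reachable) < self.r:
            raise RuntimeError("read quorum not reached")
        answers = [self.replicas[index].get(key, (0, None)) for index in reachable]
        return max(answers, key=lambda answer: answer[0])[1]


store = QuorumStore(n=3, r=2, w=2)
store.write("session", "v1", reachable=[0, 1, 2])
store.write("session", "v2", reachable=[0, 1])      # replica 2 misses this write
latest = store.read("session", reachable=[1, 2])    # replica 1 still has v2
print(latest)
```

Replica 2 holds a stale value, but any two replicas include at least one that saw the last write, so the read resolves to the newest version.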
  • Slide 21: Coordination Protocol
    Consensus protocol: Paxos, Raft, etc. Based on the state machine approach (the state machine approach is a technique for converting an algorithm into a fault-tolerant, distributed implementation).
    Epidemic (gossip): anybody can infect anyone else with equal probability. Anti-entropy protocols assume that synchronization is performed on a fixed schedule: every node regularly chooses another node at random or by some rule and exchanges database contents, resolving differences. O(log n)
    http://www.cis.cornell.edu/IAI/events/Gossip_Tutorial.pdf
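The O(log n) behaviour of epidemic dissemination can be observed with a small simulation. The sketch below models a simple push-style rumor spread, not a full anti-entropy exchange of database contents; the function name `gossip_rounds`, the cluster size and the random seed are assumptions chosen for the example.

```python
import math
import random


def gossip_rounds(n: int, rng: random.Random) -> int:
    """Rounds until all n nodes know a rumor that starts at node 0.

    In every round each informed node pushes the rumor to one peer chosen
    uniformly at random (a node may pick itself or an already informed peer).
    """
    informed = {0}
    rounds = 0
    while len(informed) < n:
        targets = {rng.randrange(n) for _ in informed}
        informed |= targets
        rounds += 1
    return rounds


rng = random.Random(42)  # fixed seed so the run is repeatable
n = 1024
rounds = gossip_rounds(n, rng)
print(f"{n} nodes informed after {rounds} rounds (log2 n = {math.log2(n):.0f})")
```

The informed set can at most double per round, so at least log2(n) rounds are needed; in practice the total stays within a small multiple of that.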
  • Slide 22: Transmission Protocol
    Optimization: reorder, deduplication
    Transmission: by difference (Merkle tree), callback, compression, auto-correction
    Locking: distributed locking, multiversioning, …
    Mitosis!
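Transfer by difference with a Merkle tree can be sketched as follows. This is an illustrative example under assumptions: the helper names `build_tree` and `diff_blocks`, the use of SHA-256 and the sample blocks are invented here, and both sides are assumed to hold the same number of blocks.

```python
import hashlib


def digest(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def build_tree(blocks):
    """Return the tree as a list of levels: levels[0] holds leaf hashes, levels[-1] the root."""
    level = [digest(block) for block in blocks]
    levels = [level]
    while len(level) > 1:
        if len(level) % 2:
            level = level + [level[-1]]  # duplicate the last hash on odd levels
        level = [digest(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def diff_blocks(tree_a, tree_b):
    """Indices of blocks that differ, found by descending only into unequal subtrees."""
    if len(tree_a[0]) != len(tree_b[0]):
        raise ValueError("both trees must cover the same number of blocks")
    if not tree_a[0]:
        return []
    changed = []
    stack = [(len(tree_a) - 1, 0)]  # (level, index), starting at the root
    while stack:
        depth, index = stack.pop()
        if tree_a[depth][index] == tree_b[depth][index]:
            continue  # identical subtree: nothing below needs to be transferred
        if depth == 0:
            changed.append(index)
            continue
        for child in (2 * index, 2 * index + 1):
            if child < len(tree_a[depth - 1]):
                stack.append((depth - 1, child))
    return sorted(changed)


local = [f"block-{i}".encode() for i in range(8)]
remote = list(local)
remote[5] = b"block-5 (modified)"
to_transfer = diff_blocks(build_tree(local), build_tree(remote))
print("blocks to transfer:", to_transfer)
```

Only the root hashes need to be exchanged when the replicas are identical; when they differ, the comparison narrows down to the changed blocks without hashing or sending the rest.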
  • Slide 23: Implementation
  • Slide 24: Answer … no answer
    Block replication, file: distributed file system
    Information: RDBMS
    Document, blog, session: NoSQL
    Content with a TTL over 1m: caching system
  • Slide 25: Distributed Filesystem. DFS is a service that provides a single point of reference and a logical tree structure for file system resources that may be physically located anywhere on the network. One significant responsibility of a file system is to ensure that, regardless of the actions by programs accessing the data, the structure remains consistent…
  • Slide 26: Filesystem
    Properties of DFS: simple from the application point of view; data consistency
    Based on the solution: partition tolerance; scalability; high availability
  • Slide 27: Filesystem: DRBD
    Replication mode: asynchronous, memory synchronous, synchronous
    Transfer optimization: DRBD Proxy
    Main goals: disk replication, single service availability; disaster recovery
  • Slide 28: Filesystem: Ceph
    Data distribution: hash-based
    Consensus protocol: Raft for consensus
    Write mode: write one, read one; the client is notified when all replicas have been written
    Weak consistency with cache pool
    OpenStack backend at CERN: 1128 OSDs, 3 PB, XXX VMs. http://www.slideshare.net/Inktank_Ceph/scaling-ceph-at-cern
    Main goals: block device / base for other filesystems; cloud support, image storage and VM storage
  • Slide 29: Ceph. Users: > 5000; VMs: > 7000; > 250k VMs spawned. http://www.synnefo.org/resources.html
  • Slide 30: RDBMS
    Properties of RDBMS: quite simple from the application point of view; data consistency
    Based on the solution: low partition tolerance; low scalability; low high availability
  • Slide 31: RDBMS. Asynchronous replication; semi-synchronous. Postgres: synchronous; asynchronous.
  • Slide 32: NoSQL
    Properties of DFS: fast
    Based on the solution: partition tolerance; scalability; high availability; simple
  • Slide 33: NoSQL Performance. http://planetcassandra.org/nosql-performance-benchmarks/
  • Slide 34: Riak. Geo replication. Tunable trade-offs for distribution and replication (N, R, W). Distributed hash table.
  • Slide 35: Filesystem over NoSQL
    FUSE: in most cases not stable
    S3 interface: de facto Internet standard
  • Slide 36: Filesystem over NoSQL. Wooga: http://www.slideshare.net/wooga/riak-at-woogariak-meetup-sept-2013?qid=4809eca2-8378-4e70-8e75-0db29b635fa5&v=qf1&b=&from_search=3 https://fosdem.org/2014/schedule/event/nyt_cassandra/
  • Slide 37: Combine different solution
    Diagram labels: Edge node (Varnish); NoSQL; Local cache; Centralize cache; Info; Storage; DFS; Origin (Distribute cache); Local DB; NoSQL
    Axis labels: Decrease the number of the requests; Increase of the age of the data
  • Slide 38: Framework. Build your system if you need … ….do you really need. CERN
  • Slide 39: Framework. Don’t forget Rsync!
  • Slide 40: Framework. Replication or caching?
  • Slide 41: Build a solution
    - Split in pieces
    - Track versions
    - Transfer when needed
    - Transfer the difference
    - Use notifications when possible
    - Move data close to computation
    - Move the master close to the write operations
    - Split counters to avoid deadlock
    - In HTTP, don’t forget ETag and Last-Modified
    Tools: openkad, open-chord, openReplica, Raft
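The ETag advice from the list above can be illustrated without any network code. The sketch below is a simplified model under assumptions: `make_etag` and `respond` are hypothetical helpers, the ETag is derived from a SHA-256 of the body, and `If-None-Match` is treated as a single value (real requests may carry a list or `*`). `Last-Modified` with `If-Modified-Since` follows the same pattern with a timestamp instead of a hash.

```python
import hashlib


def make_etag(body: bytes) -> str:
    """Derive a strong ETag from the content (quoted, as HTTP requires)."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def respond(body: bytes, request_headers: dict) -> tuple[int, dict, bytes]:
    """Answer a GET: 304 with no body if the client's copy is current, else 200."""
    etag = make_etag(body)
    if request_headers.get("If-None-Match") == etag:
        return 304, {"ETag": etag}, b""
    return 200, {"ETag": etag}, body


# First request: the client has nothing cached and receives the full body.
status, headers, payload = respond(b"replica content v1", {})

# Revalidation: the client sends the ETag back and nothing is transferred.
status_again, _, payload_again = respond(
    b"replica content v1", {"If-None-Match": headers["ETag"]}
)

# The content changed on the server: the old ETag no longer matches.
status_new, _, payload_new = respond(
    b"replica content v2", {"If-None-Match": headers["ETag"]}
)
print(status, status_again, status_new)
```

This is the "track versions" and "transfer when needed" points applied to HTTP: the version marker travels with every response, and the body is sent only when the marker differs.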
  • Slide 42: Build a solution
  • Slide 43: Five pylons
    Objects: separation between data and metadata; each element is marked with a revision; each element is marked with a hash
    Cache: client side; callback / notify; persistent
    Transmission: parallel operation; HTTP-like protocol; compression; transfer by difference
    Distribution: resource discovery by DNS; data spread on a multi-node cluster; decentralized; independent clusters; data replication
    Security: secure connection; client-side encryption; extended ACL; delegation / federation; admin delegation
  • Slide 44: Build a solution
    - Consistent hash
    - ZMQ transport protocol
    - Gossip protocol for failure detection
    - Tunable trade-offs
    Pisa is a simple block data replication on a wide range of nodes
  • Slide 45: And … “There is always a failure waiting around the corner” (Werner Vogels)
  • Thank you. http://restfs.beolink.org manfred.furuholmen@gmail.com