+
Fault-tolerant mechanisms
in Big Data
Karan Pardeshi
+
Agenda
 Introduction
 Distributed Fault-tolerant mechanisms in Big Data
 Current Model
 Use of Features to build a better model
 Future Work
+
Introduction
 Cloud computing is everywhere.
 Advantages
 Cost Efficient
 Unlimited storage
 Seamless access
 Importance of Fault Tolerance
 Mass outage at Amazon Web Services
 An availability zone was down for an entire day!
 Time critical systems
 Rocket on a mission
 Bank applications
+
Fault tolerant mechanisms in
Distributed Systems
 Google File System (GFS)
 Focused on storage
 Replication mechanism
 Replicas stored on different machines on different racks, N=3.
 Shadow masters in support of the primary master
 Read-only access while the primary master is down
 Checksums for data reliability
 CRC
 Amazon Dynamo
 Focused on High Availability
 Uses vector clocks
 For semantic reconciliation
 Hinted hand-off
 Merkle tree
 To detect and repair inconsistencies between replicas
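The vector-clock bullet above can be illustrated with a minimal sketch. It is not Dynamo's implementation; the function names (`increment`, `compare`, `merge`) and the node labels are assumptions made for illustration. It shows how two versions are classified as ordered or concurrent, and how concurrent versions get a merged clock once a client has reconciled them.

```python
from typing import Dict

VectorClock = Dict[str, int]  # node name -> number of updates seen from that node


def increment(clock: VectorClock, node: str) -> VectorClock:
    """Return a copy of the clock with one more update recorded for `node`."""
    updated = dict(clock)
    updated[node] = updated.get(node, 0) + 1
    return updated


def descends(a: VectorClock, b: VectorClock) -> bool:
    """True if `a` has seen every update that `b` has seen."""
    return all(a.get(node, 0) >= count for node, count in b.items())


def compare(a: VectorClock, b: VectorClock) -> str:
    """Classify `a` relative to `b`: equal, after, before, or concurrent."""
    a_ge, b_ge = descends(a, b), descends(b, a)
    if a_ge and b_ge:
        return "equal"
    if a_ge:
        return "after"
    if b_ge:
        return "before"
    return "concurrent"  # conflicting versions: reconciliation is needed


def merge(a: VectorClock, b: VectorClock) -> VectorClock:
    """Clock for a reconciled version: component-wise maximum."""
    return {node: max(a.get(node, 0), b.get(node, 0)) for node in a.keys() | b.keys()}


v1 = increment({}, "A")   # first write handled by node A
v2 = increment(v1, "B")   # later write handled by node B
v3 = increment(v1, "C")   # write handled by node C without seeing v2
```

Here `compare(v2, v1)` reports that `v2` supersedes `v1`, while `v2` and `v3` are concurrent and must be reconciled before `merge(v2, v3)` is attached to the result.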
+
Fault tolerant mechanisms in
Distributed Systems (continued)
 Facebook’s Cassandra
 Accrual failure detection mechanism with a gossip-based protocol.
 First of its kind
 Probabilistic failure rate estimator
 Zookeeper
 Group of workstations acting as servers
 One master (leader); the other servers provide service in sync with it
 High availability
 Bigtable
 Works on top of GFS
 Chubby service – metadata storage
 Heart of Bigtable
 Primary co-ordinator of Bigtable
 Data persistence
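The probabilistic failure estimator listed under Cassandra above can be sketched as follows. This is a simplified illustration, not Cassandra's code: the class name, the window size, and the assumption that heartbeat intervals are exponentially distributed are all choices made for this sketch. Instead of a yes/no verdict, the detector outputs a suspicion level phi that grows the longer a heartbeat is overdue.

```python
import math
from collections import deque


class AccrualFailureDetector:
    """Hypothetical accrual detector: suspicion rises continuously over time."""

    def __init__(self, window_size: int = 100):
        self.intervals = deque(maxlen=window_size)  # recent heartbeat gaps
        self.last_heartbeat = None

    def heartbeat(self, now: float) -> None:
        """Record a heartbeat (for example, one carried by a gossip message)."""
        if self.last_heartbeat is not None:
            self.intervals.append(now - self.last_heartbeat)
        self.last_heartbeat = now

    def phi(self, now: float) -> float:
        """Suspicion level: -log10 of the chance the heartbeat is merely late."""
        if not self.intervals:
            return 0.0  # not enough history to suspect anything
        mean = sum(self.intervals) / len(self.intervals)
        if mean <= 0:
            return 0.0
        elapsed = now - self.last_heartbeat
        # Assumed model: P(gap > elapsed) = exp(-elapsed / mean), so
        # phi = -log10(exp(-elapsed / mean)) = elapsed / (mean * ln 10).
        return elapsed / (mean * math.log(10))


detector = AccrualFailureDetector()
for t in (0.0, 1.0, 2.0, 3.0):  # heartbeats arriving once per time unit
    detector.heartbeat(t)
```

A caller would compare `detector.phi(now)` against a threshold of its own choosing; a higher threshold means fewer false suspicions but slower detection.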
+
Fault tolerant mechanisms in
Distributed Systems (continued)
 MapReduce
 Classic Master-Slave configuration
 e.g. Hadoop
 Re-execution of the entire operation
 If the operation terminates before completing
 Operational even if some workers fail
 Efficient load balancing
 HDFS
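The re-execution bullet above can be sketched as a small master loop. This is an illustration only and not Hadoop's scheduler; `run_job`, `WorkerFailure`, the round-robin assignment, and the two example workers are hypothetical. When a worker fails partway through an operation, the master runs that operation again from the start on another worker, so the job completes even though some workers fail.

```python
from typing import Callable, List, Sequence


class WorkerFailure(Exception):
    """Raised when a worker dies or becomes unreachable mid-operation."""


def run_job(chunks: Sequence[str],
            workers: Sequence[Callable[[str], int]],
            max_attempts: int = 3) -> List[int]:
    """Run one operation per chunk, re-executing failed operations elsewhere."""
    results = []
    for index, chunk in enumerate(chunks):
        for attempt in range(max_attempts):
            # Each retry goes to the next worker in round-robin order.
            worker = workers[(index + attempt) % len(workers)]
            try:
                results.append(worker(chunk))  # whole operation runs again
                break
            except WorkerFailure:
                continue  # discard partial work and retry from scratch
        else:
            raise RuntimeError(f"chunk {index} failed {max_attempts} times")
    return results


def healthy_worker(chunk: str) -> int:
    return len(chunk.split())  # toy operation: count words in the chunk


def crashed_worker(chunk: str) -> int:
    raise WorkerFailure("worker unreachable")


word_counts = run_job(["a b c", "d e"], [crashed_worker, healthy_worker])
```

The first chunk is initially assigned to the crashed worker and then re-executed on the healthy one, so the job still returns a count for every chunk.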
+
Existing Fault tolerant model for
Cloud Computing
 Proposed by Anjali Meshram, A.S Sambare, S.D Zade
 Input is passed to all VMs
 Accepter
 Tests the result of the algorithm on every VM.
 Timer
 Monitors the time constraint for each VM
 Reliability Assessor (RA)
 Starts with reliability of 100% for every VM
 Recalculated from the time taken for every result from each VM
 Decision Maker
 Selects output of node with highest reliability.
 Raises a failure if a node's reliability falls below the minimum; the node is then removed.
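The interplay of the modules on this slide can be sketched roughly as below. The slide does not give the update formula, so the constants `MIN_RELIABILITY`, `REWARD`, and `PENALTY`, the multiplicative update rule, and all names are assumptions made for illustration, not the authors' algorithm. Only the overall flow follows the slide: reliability starts at 100%, it is adjusted after every result, the output of the most reliable node is selected, and nodes below the minimum are removed.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Tuple

MIN_RELIABILITY = 0.5  # hypothetical removal threshold
REWARD = 1.05          # hypothetical gain for a correct, timely result
PENALTY = 0.8          # hypothetical loss for a wrong or late result


@dataclass
class VirtualMachine:
    name: str
    reliability: float = 1.0  # every VM starts at 100%


def assess(vm: VirtualMachine, accepted: bool, elapsed: float, limit: float) -> bool:
    """Reliability Assessor: update a VM from the Accepter and Timer verdicts."""
    ok = accepted and elapsed <= limit
    if ok:
        vm.reliability = min(1.0, vm.reliability * REWARD)
    else:
        vm.reliability *= PENALTY
    return ok


def decide(vms: List[VirtualMachine],
           outcomes: Dict[str, Tuple[Any, bool, float]],
           limit: float) -> Any:
    """Decision Maker: pick the output of the most reliable passing VM."""
    candidates = []
    for vm in list(vms):  # iterate over a copy so removal is safe
        output, accepted, elapsed = outcomes[vm.name]
        ok = assess(vm, accepted, elapsed, limit)
        if vm.reliability < MIN_RELIABILITY:
            vms.remove(vm)  # node dropped below the minimum reliability
        elif ok:
            candidates.append((vm.reliability, output))
    if not candidates:
        raise RuntimeError("no VM produced an acceptable result in time")
    return max(candidates, key=lambda c: c[0])[1]


vms = [VirtualMachine("vm1"), VirtualMachine("vm2"), VirtualMachine("vm3")]
# (output, passed acceptance test, seconds taken) for each VM in one cycle
outcomes = {"vm1": (42, True, 0.2), "vm2": (42, True, 5.0), "vm3": (41, False, 0.1)}
chosen = [decide(vms, outcomes, limit=1.0) for _ in range(4)]
```

With these assumed constants, the late node and the incorrect node lose reliability each cycle and are removed on the fourth cycle, while the output of `vm1` is selected every time.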
+
Fig.
+
Features that can be combined to
create a new Fault Tolerant Model
 Master Node
 Co-ordinator
 Built on Zookeeper service
 Each job carried out on three different nodes
 Accrual Fault Detectors
 Probabilistic failure value
 Measured on ping responses from
Master
 Decision Maker
 Selects the majority vote to produce the final output
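The majority-vote decision step above can be sketched in a few lines. This is an illustrative assumption about how such a Decision Maker might behave, not a finished design; the function name and the error handling are hypothetical. Each job runs on three nodes, and the value returned by more than half of them becomes the final output.

```python
from collections import Counter
from typing import Hashable, Sequence


def majority_vote(outputs: Sequence[Hashable]) -> Hashable:
    """Return the output produced by a strict majority of the replica nodes."""
    if not outputs:
        raise ValueError("no outputs to vote on")
    value, votes = Counter(outputs).most_common(1)[0]
    if votes * 2 <= len(outputs):
        raise RuntimeError("no majority among node outputs")
    return value


# Three nodes ran the same job; one of them returned a faulty value.
final_output = majority_vote([7, 7, 9])
```

With three replicas, one faulty or failed node can be outvoted; if all three disagree, the sketch raises an error rather than guessing.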
+
Future Work
 Develop a better, more robust fault-tolerant model
using the features described in earlier slides.
+
Thank You
