Bft mr-clouds-of-clouds-discco2012 - navtalk

Byzantine Fault-Tolerant MapReduce
in Cloud-of-Clouds
Joint work with: Miguel Correia, Marcelo Pasin,
Alysson Bessani, Fernando Ramos, Paulo Verissimo
Presenter: Pedro Costa

Navtalk

Motivation
• How to count the number of words on the Internet?
• How to do it with the help of a cloud-of-clouds (i.e., several clouds)?
• Guarantee integrity and availability of data


Outline
• Introduction
– MapReduce programming model
– Fault tolerance in Cloud-of-clouds
– 3 problems of the Basic scheme
• Our approach
– Byzantine fault-tolerant MapReduce in cloud-of-clouds
• Evaluation


MAPREDUCE AND FAULTS


What is MapReduce?
• Programming model + execution environment
• Introduced by Google in 2004
• Used for processing large data sets using clusters of servers
• A few implementations available, used by many companies
• Hadoop MapReduce, an open-source MapReduce implementation from Apache
– The most widely used, and the one we have been using
– Includes HDFS, a distributed file system for large files


MapReduce basic idea
[Figure: a file with all the words on the Internet is processed by a Map phase, which emits <word,1> pairs, and a Reduce phase, which emits <word,n> pairs; both phases run on task tracker servers.]
Job tracker detects and recovers crashed map/reduce tasks
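
The map and reduce phases above can be sketched in a few lines of Python. This is a single-process illustration of the programming model only, not Hadoop code; the function names `map_phase`, `reduce_phase` and `wordcount` are hypothetical.

```python
from collections import defaultdict


def map_phase(split):
    """Emit a <word, 1> pair for every word in one input split."""
    return [(word, 1) for word in split.split()]


def reduce_phase(pairs):
    """Sum the counts per word, producing <word, n> pairs."""
    counts = defaultdict(int)
    for word, one in pairs:
        counts[word] += one
    return dict(counts)


def wordcount(splits):
    """Run every split through the map phase, then reduce all pairs."""
    intermediate = []
    for split in splits:
        intermediate.extend(map_phase(split))
    return reduce_phase(intermediate)
```

In a real deployment each split would be mapped by a different task tracker and the pairs partitioned among several reduce tasks; here everything runs in one process for clarity.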

MapReduce components
[Figure: components of a Wordcount job; tasks run on task trackers (TT) TT1, TT2 and TT3.]

But there are more faults…
• Problem: Accidental faults may affect the correctness of the results
of MapReduce
• Task corruptions: memory errors, chipset errors, …
• Cloud outages: MapReduce job interruptions
(as reported in popular clouds)

• Our goal:
– Guarantee integrity and availability (despite task corruptions and cloud outages)
– Develop a new model to compute MapReduce in cloud-of-clouds
• Commercially feasible? Yes, but out of scope of this presentation
Tobias Kurze et al., Cloud federation. In Proceedings of the 2nd International Conference on Cloud Computing, GRIDs, and Virtualization (CLOUD COMPUTING 2011).


Byzantine fault-tolerant MapReduce
• Basic idea: replicate tasks in different clouds and vote on the results returned by the replicas
• The set of clouds forms a cloud, hence cloud-of-clouds
• Inputs are initially stored in all clouds (i.e., not our problem)
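
The voting step can be sketched as follows. This is a minimal illustration, assuming each replica output is a hashable value (e.g., a string) and that an output is accepted once t+1 replicas agree on it, so that at least one of them comes from a correct cloud; the function name `vote` is hypothetical.

```python
from collections import Counter


def vote(replica_outputs, t):
    """Return the output reported by at least t+1 replicas, or None if no
    output has that many matching copies yet."""
    if not replica_outputs:
        return None
    value, count = Counter(replica_outputs).most_common(1)[0]
    return value if count >= t + 1 else None
```

With t=1 and three replicas, `vote(["x", "x", "y"], 1)` accepts "x", while three different outputs yield no decision.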

[Figure: three clouds (Cloud 1, Cloud 2, Cloud 3) forming the cloud-of-clouds.]

System model
• Client is correct (not part of MapReduce)
• Clouds: up to t clouds can arbitrarily corrupt all tasks and
other modules they execute
• Why use t and not f? t≤f

• Next:
• Basic BFT MapReduce scheme
• 3 problems of the Basic scheme
• Our approach: Full BFT MapReduce scheme


MapReduce: Map perspective

[Figure: map phase in the official version versus the cloud-of-clouds version, with replicas in different clouds.]

MapReduce: Reduce perspective

[Figure: reduce phase in the official version versus the cloud-of-clouds version, with replicas in different clouds.]
But we can do better.

Improvements over basic version
• 3 problems have arisen
– Computation problem
– Communication problem
– Job execution control problem

• 3 solutions: our BFT MapReduce can be thought of as this basic version plus the following mechanisms:
– Deferred execution (computation problem)
– Digest communication (communication problem)
– Distributed Job tracker (job execution control problem)


Problem 1: computation

[Figure: the same map task (split 0) and reduce task (part 0) replicated in three clouds.]
Tasks are executed 2t+1 times

Solution 1: Deferred execution
• Computation problem is uncommon
• Job Tracker replicates tasks across t+1 clouds (t in standby)
• If results differ or one cloud stops, request 1 more (up to t)
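
Deferred execution can be sketched as below. This is an illustration under stated assumptions, not the authors' implementation: each cloud is modelled as a hypothetical callable that returns the task output or raises `RuntimeError` when the cloud stops, replicas run sequentially rather than in parallel, and a result is accepted once t+1 outputs match.

```python
from collections import Counter


def run_deferred(task, clouds, t):
    """Run `task` in t+1 clouds first; launch standby clouds one at a time
    (up to t more) only while no output has t+1 matching copies.
    Returns (accepted_output_or_None, number_of_replicas_launched)."""
    votes = Counter()
    launched = 0

    def launch(cloud):
        nonlocal launched
        launched += 1
        try:
            votes[cloud(task)] += 1
        except RuntimeError:
            pass  # the cloud stopped: no output to count

    for cloud in clouds[: t + 1]:          # initial t+1 replicas
        launch(cloud)
    standby = iter(clouds[t + 1 : 2 * t + 1])  # the t clouds in standby
    while True:
        if votes:
            value, count = votes.most_common(1)[0]
            if count >= t + 1:
                return value, launched
        cloud = next(standby, None)
        if cloud is None:
            return None, launched
        launch(cloud)
```

In the fault-free case only t+1 replicas run; an extra replica is requested only when the first results differ or a cloud stops.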

[Figure: map task (split 0) and reduce task (part 0) replicated in two clouds.]

Problem 2: communication

[Figure: map outputs (split 0) fetched by reduce tasks (part 0) across three clouds.]
All this communication through the Internet (delay, cost)!

Solution 2: Transferring Digests
• Reduces must fetch the map task outputs
• Intra-cloud fetch: output fetched normally
• Inter-cloud fetch: only hash of the output fetched – key idea
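
The digest comparison can be sketched as follows. The slides only say that a hash of the output is fetched from other clouds; the choice of SHA-256, the t+1 acceptance rule, and the names `digest` and `accept_map_output` are assumptions made for illustration.

```python
import hashlib


def digest(output: bytes) -> str:
    """Hash of a map task output (SHA-256 is an assumed choice)."""
    return hashlib.sha256(output).hexdigest()


def accept_map_output(local_output: bytes, remote_digests, t):
    """Accept the output fetched inside the local cloud if, counting the
    local copy, at least t+1 clouds report the same digest."""
    local = digest(local_output)
    matching = 1 + sum(1 for d in remote_digests if d == local)
    return matching >= t + 1
```

Only the fixed-size digests cross the Internet; the full map output is read within the reduce task's own cloud.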

[Figure: reduce task part 0 fetches the full output of split 0 from the same cloud and only digests from the other clouds.]

Problem 3: Job execution control
• Job tracker controls all task executions in the task trackers in
all clouds
• If Job tracker is in one cloud separated from many task
trackers by the internet:
• Communication is slow
• Large timeouts for detecting task tracker failure
• …and it’s a single point of failure (this is the case in MR & Hadoop MR)


Solution 3: Job execution control
[Figure: the client submits the job through a VJT; each cloud has its own job tracker controlling the task trackers in that cloud.]

Setup and Test
Platform configuration
• 3 clouds
• Each cloud has 3 nodes
• 1 JT and 3 TTs for each cloud
• All JTs are interconnected

Job submitted (Wordcount)
• Input data: 26 chunks of 64 MB (total 1.5 GB)
• Map tasks: 26
• Reduce tasks: 120, 180, 360, 400


Number of reduce tasks executed
(no faults, t=1)

Nr. reduce tasks   Job duration (Official)   Job duration (CoC)   Diff
120                00:15:35                  00:17:13             00:02:35
180                00:19:35                  00:21:36             00:02:01
360                00:31:12                  00:33:30             00:02:18
400                00:33:37                  00:36:24             00:02:47

Task details
Phase             Official   BFT Cloud-of-clouds: 1 view
Map duration      00:06:47   00:07:08
Reduce duration   00:13:18   00:14:46

Conclusions
• Our method guarantees integrity and availability despite task corruptions and cloud outages
• BFT MapReduce in cloud-of-clouds is feasible!
• No need to execute in all 2t+1 clouds
• Only digests sent through the Internet (no “big data”)
• Control job execution within each cloud

Thank you