The document discusses distributed computing and provides an overview of key distributed technologies including distributed file systems, MapReduce, and Hadoop. It explains that distributed computing refers to using multiple computers that communicate over a network to solve computational problems. Technologies covered include Google's distributed file system GFS, the MapReduce framework, and the open source Hadoop implementation of MapReduce.
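The MapReduce model mentioned above can be illustrated with a minimal, single-process sketch (plain Python, not the actual Hadoop API): a map phase emits key-value pairs, a shuffle groups them by key, and a reduce phase aggregates each group. In a real cluster, each phase would run across many machines.

```python
from collections import defaultdict

def map_phase(documents):
    # Map: emit a (word, 1) pair for every word in every document.
    for doc in documents:
        for word in doc.split():
            yield (word.lower(), 1)

def shuffle(pairs):
    # Shuffle: group all values by key, as the framework does between phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reduce: aggregate the list of values for each key.
    return {word: sum(counts) for word, counts in groups.items()}

docs = ["the quick brown fox", "the lazy dog", "the fox"]
counts = reduce_phase(shuffle(map_phase(docs)))
print(counts["the"])  # word count for "the" across all documents
```

The three functions are deliberately independent: because the map and reduce steps share no state, the framework can run them on different nodes and only the shuffle requires moving data across the network.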
IEEE research paper representation.
An IoT system can be managed effectively with distributed computing. This topic is explained here using the example of a "System-on-a-Chip for Smart Cameras".
This document summarizes an Internet of Things (IoT) meetup that covered various topics:
- Introduction to IoT and how objects can transfer data over networks.
- Introduction to cloud computing and how resources are shared over the internet.
- IoT architecture including things, gateways, and networks/cloud.
- IoT gateways like Raspberry Pi that interface devices and cloud.
- Sensor interfaces like XBee and RS-485 that connect to gateways.
- Network interfaces like WiFi and GPRS to connect gateways to cloud.
- Cloud architecture models from various sources.
- Data acquisition from devices using open-source Ponte software.
- Data storage
The document discusses the basics of the Python programming language. It introduces Python as a general purpose, object oriented language and discusses its key features like garbage collection and support for both procedural and object oriented programming. It also covers Python versions, how to start an interactive session, language basics like indentation, numbers, strings and escape sequences. The document is intended to provide an introduction to the Python language for beginners.
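The language basics listed above (indentation, numbers, strings, escape sequences) can be tried directly in an interactive session; a short illustrative snippet:

```python
# Indentation delimits blocks; no braces or keywords are needed.
def classify(n):
    if n % 2 == 0:
        return "even"
    return "odd"

# Numbers: integers and floats; unused objects are garbage collected.
total = 3 + 4 * 2   # 11, with the usual operator precedence
ratio = 7 / 2       # 3.5, true division in Python 3

# Strings with escape sequences: newline, tab, and escaped quotes.
greeting = "Hello,\n\t\"Python\""

print(classify(10), total, ratio)
print(greeting)
```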
The Computer Science Behind a Modern Distributed Database (ArangoDB Database)
What we see in the modern data store world is a race between different approaches to achieve a distributed and resilient storage of data. Every application needs a stateful layer which holds the data. There are several different necessary components which are anything but trivial to combine, and, of course, even more challenging when attempting to optimize for performance. Over the past years there has been significant progress in both the science and practical implementations of such data stores. In this talk Dan Larkin-York will introduce the audience to some of the challenges, address the difficulties of their interplay, and cover key approaches taken by some of the industry’s leaders (ArangoDB, Cassandra, CockroachDB, MarkLogic, and more).
The document discusses scaling web data at low cost. It begins by presenting Javier D. Fernández and providing context about his work in semantic web, open data, big data management, and databases. It then discusses techniques for compressing and querying large RDF datasets at low cost using binary RDF formats like HDT. Examples of applications using these techniques include compressing and sharing datasets, fast SPARQL querying, and embedding systems. It also discusses efforts to enable web-scale querying through projects like LOD-a-lot that integrate billions of triples for federated querying.
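The core operation behind RDF querying, matching triple patterns against a set of (subject, predicate, object) statements, can be sketched without any RDF library (plain Python for illustration; HDT itself uses far more compact binary indexes to make this scale to billions of triples):

```python
# A tiny in-memory triple store. None in a pattern acts as a wildcard,
# playing the role of a variable in a SPARQL basic graph pattern.
triples = {
    ("alice", "knows", "bob"),
    ("bob", "knows", "carol"),
    ("alice", "age", "30"),
}

def match(pattern, store):
    # Return every triple consistent with the (s, p, o) pattern.
    s, p, o = pattern
    return [t for t in store
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]

# "Whom does alice know?"  ~  SELECT ?x WHERE { :alice :knows ?x }
print(match(("alice", "knows", None), triples))
```

Formats like HDT replace this linear scan with dictionary-encoded identifiers and sorted indexes, which is what makes low-cost querying over very large datasets feasible.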
CNI Fall 2009: Enhanced Publications (John Doove, SURFfoundation)
- SURF is an organization in the Netherlands that works to improve ICT infrastructure for higher education and research.
- SURF is working on projects to develop "enhanced publications" which combine traditional publications like text with additional materials like data, maps, images and annotations.
- Several projects have been funded to create enhanced publications in fields like archaeology and psychology. Challenges include presentation, identification, long-term preservation and developing tools and infrastructure to support enhanced publications.
- Moving forward, SURF will work on developing repository infrastructure to store and share enhanced publications, creating guidelines and incentivizing their creation through things like legal reports and reward systems.
The RMOD research group at INRIA Lille focuses on software evolution. They have 4 full-time researchers, 2 engineers, 1 postdoc, and 3 PhD students working on topics related to evolving applications and language support. They collaborate with other universities and have implemented a platform called Moose for software and data analysis in Pharo Smalltalk. Their work includes developing tools to support code history, program understanding, and software visualization to help developers evolve software over time.
The document is an introduction to LaTeX presented by Kartik Mandaville of LUG Manipal on March 20, 2010. It outlines the topics to be covered which include getting started with LaTeX, typesetting basics, math, lists and tables. It also discusses compiling LaTeX files and differences from word processors. The presentation aims to explain the basics of using LaTeX for technical documents in a casual discussion format.
OSDC 2018 | The Computer Science Behind a Modern Distributed Data Store by Ma... (NETWAYS)
What we see in the modern data store world is a race between different approaches to achieve distributed and resilient storage of data. Most applications need a stateful layer which holds the data. There are at least three necessary ingredients that are anything but trivial to combine, and combining them becomes even more challenging when aiming for acceptable performance. Over the past years there has been significant progress in both the science and the practical implementation of such data stores. In his talk Max Neunhoeffer will introduce the audience to some of the needed ingredients, address the difficulties of their interplay, and show four modern approaches of distributed open-source data stores.
Topics are:
– Challenges in developing a distributed, resilient data store
– Consensus, distributed transactions, distributed query optimization and execution
– The inner workings of ArangoDB, Cassandra, Cockroach and RethinkDB
The talk touches on complex and difficult computer science, but at the same time remains accessible to, and enjoyable for, a wide range of developers.
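Majority quorums, a building block behind the consensus protocols listed in the topics above, can be sketched in a few lines (a toy model for illustration, not the actual implementation of any of the databases named): a write commits only if a strict majority of replicas acknowledges it, which guarantees that any two majorities overlap in at least one replica.

```python
class Replica:
    def __init__(self):
        self.value = None
        self.version = 0

def quorum_write(replicas, value, version, acked):
    # 'acked' simulates which replicas are reachable; the write commits
    # only when a strict majority acknowledges it.
    majority = len(replicas) // 2 + 1
    reachable = [r for i, r in enumerate(replicas) if i in acked]
    if len(reachable) < majority:
        return False
    for r in reachable:
        r.value, r.version = value, version
    return True

def quorum_read(replicas):
    # Any majority intersects the last committed write's majority, so the
    # highest version seen in a majority sample is the latest committed value.
    majority = len(replicas) // 2 + 1
    sample = replicas[:majority]
    return max(sample, key=lambda r: r.version).value

nodes = [Replica() for _ in range(5)]
assert quorum_write(nodes, "x=1", 1, acked={0, 1, 2})   # 3 of 5: commits
assert not quorum_write(nodes, "x=2", 2, acked={0, 1})  # 2 of 5: rejected
print(quorum_read(nodes))
```

Real consensus protocols such as Raft or Paxos add leader election and log replication on top of this overlap property, which is where most of the difficulty lies.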
The Computer Science Behind a Modern Distributed Data Store (J On The Beach)
What we see in the modern data store world is a race between different approaches to achieve distributed and resilient storage of data. Every application needs a stateful layer which holds the data. There are at least three necessary components that are anything but trivial to combine, and, of course, combining them becomes even more challenging when aiming for acceptable performance.
Over the past years there has been significant progress in both the science and practical implementations of such data stores. In his talk Max Neunhoeffer will introduce the audience to some of the needed ingredients, address the difficulties of their interplay and show four modern approaches of distributed open-source data stores (ArangoDB, Cassandra, Cockroach and RethinkDB).
NLP on Hadoop: A Distributed Framework for NLP-Based Keyword and Keyphrase Ex... (Paolo Nesi)
Abstract—The recent rapid growth of the World Wide Web and the number of online resources populating the Internet represent a massive source of knowledge for various research and business interests. Such knowledge is, for the most part, embedded in the textual content of web pages and documents, which is largely represented in unstructured natural language formats. To automatically ingest and process such huge amounts of data, single-machine, non-distributed architectures are proving inefficient for tasks like Big Data mining and intensive text processing and analysis. Current Natural Language Processing (NLP) systems are growing in complexity, and their computational power needs have increased significantly, requiring solutions such as distributed frameworks and parallel programming paradigms. This paper presents a distributed framework for executing NLP-related tasks in a parallel environment. This has been achieved by integrating the APIs of the widespread GATE open source NLP platform into a multi-node cluster built upon the open source Apache Hadoop file system. The proposed framework has been evaluated against a real corpus of web pages and documents.
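The parallelisation strategy described in the abstract, splitting a corpus into chunks and processing each independently before merging partial results, can be sketched as a map/merge pair (a toy frequency-based keyword extractor in plain Python, not the GATE/Hadoop pipeline the paper describes):

```python
from collections import Counter

STOPWORDS = {"the", "a", "of", "and", "to", "in", "is", "when", "into"}

def extract_keywords(chunk):
    # Map step: per-chunk term frequencies, with stopwords removed.
    words = [w.strip(".,").lower() for w in chunk.split()]
    return Counter(w for w in words if w and w not in STOPWORDS)

def merge(counters):
    # Reduce step: merge the partial counts from all chunks.
    total = Counter()
    for c in counters:
        total.update(c)
    return total

corpus = [
    "Hadoop distributes the processing of large text corpora.",
    "Keyword extraction scales when the corpus is split into chunks.",
]
# Sequential here; on Hadoop each extract_keywords call would be a map task
# running on a different node, and merge would be the reduce task.
keywords = merge(map(extract_keywords, corpus))
print(keywords.most_common(3))
```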
This document provides an overview of the SHEBANQ project, which provides tools for querying annotated Hebrew text data. It describes the data sources and contributors that have built up the underlying text corpus over many years. It also outlines the steps taken to make this data and related tools more accessible, including developing a website, depositing data in archives, running demonstration projects, and integrating the data and tools into broader research environments through additional projects and publications. The goal has been to facilitate wider use of this linguistic resource and foster more digital humanities and data science work based on its contents.
This document provides a summary of a presentation on Python and its role in big data analytics. It discusses Python's origins and growth, key packages like NumPy and SciPy, and new tools being developed by Continuum Analytics like Numba, Blaze, and Anaconda to make Python more performant for large-scale data processing and scientific computing. The presentation outlines Continuum's vision of an integrated platform for data analysis and scientific work in Python.
Stuff we do with OSS in libraries (Bergen, 2009) (Nicolas Morin)
The document discusses open source software (OSS) solutions used by libraries, including Koha, Evergreen, and Drupal. It summarizes Bibliotheque's involvement with Koha in France, including their growth as a vendor providing Koha support and services. It also briefly introduces Evergreen integrated library system and Drupal projects like SOPAC that integrate a library catalog with Drupal. The document advocates an approach using Drupal and Lucene/Solr to build a flexible catalog system.
This document is a curriculum seminar report on Hadoop submitted by a computer science student to their professor. It includes sections on the need for new technologies to handle large and diverse datasets, the history and origin of Hadoop, descriptions of the key Hadoop components like HDFS and MapReduce, and comparisons of Hadoop to RDBMS systems and discussions of its disadvantages. The report provides an overview of Hadoop for educational purposes.
The document discusses parallel computing and provides an overview of parallel platforms and programming models. It describes how parallel computing can solve problems faster by using multiple processors concurrently. Different parallel platforms are covered, including pipelines, vector processors, multi-core processors, clusters, and GPUs. Shared memory programming allows processors that share physical memory to work in parallel, while distributed memory programming requires explicit communication between processors that do not share memory. The document concludes that parallel computing is necessary to continue increasing computational power given limitations of single processors.
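The shared-memory versus distributed-memory distinction drawn above can be illustrated in one process (threads standing in for processors, and a queue standing in for the network; a conceptual sketch, not a performance demonstration):

```python
import threading
import queue

# Shared-memory style: workers update a structure all of them can see,
# guarded by a lock precisely because the memory is shared.
counter = {"value": 0}
lock = threading.Lock()

def shared_worker(n):
    for _ in range(n):
        with lock:
            counter["value"] += 1

# Distributed-memory style: each worker owns its data and communicates
# only by explicit messages (the queue plays the role of the network).
def message_worker(n, outbox):
    outbox.put(n)  # send a partial result instead of touching shared state

threads = [threading.Thread(target=shared_worker, args=(1000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

outbox = queue.Queue()
senders = [threading.Thread(target=message_worker, args=(1000, outbox)) for _ in range(4)]
for t in senders:
    t.start()
for t in senders:
    t.join()
total = sum(outbox.get() for _ in range(4))

print(counter["value"], total)
```

Both styles arrive at the same result; the difference is where the coordination cost lands: locking contention in the shared-memory case, explicit communication in the distributed-memory case.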
Andhra Pradesh workshop user manual, October 2016 (OERindia)
Subject Teacher Forum workshop for Andhra Pradesh Maths and Science teachers.
This is the handout for the workshop, created by the IT for Change Resource Centre, Bengaluru.
RMLL 2013: Build Your Personal Search Engine Using Crawlzilla (Jazz Yao-Tsung Wang)
This document discusses building a personal search engine using Crawlzilla. It begins with introductions from Jazz Wang, a speaker from NCHC Taiwan who is a co-founder of Hadoop.TW. It then provides an overview of Crawlzilla, a cluster-based web crawler and search engine that supports Chinese word segmentation and multiple users/indexes. The document demonstrates how to use Crawlzilla through a multi-step process of registering for an account, receiving an acceptance notification, and then logging in to access the search functionality.
10 concepts the enterprise decision maker needs to understand about Hadoop (Donald Miner)
Way too many enterprise decision makers have clouded and uninformed views of how Hadoop works and what it does. Donald Miner offers high-level observations about Hadoop technologies and explains how Hadoop can shift the paradigms inside of an organization, based on his report Hadoop: What You Need To Know—Hadoop Basics for the Enterprise Decision Maker, forthcoming from O’Reilly Media.
After a basic introduction to Hadoop and the Hadoop ecosystem, Donald outlines 10 basic concepts you need to understand to master Hadoop:
Hadoop masks being a distributed system: what it means for Hadoop to abstract away the details of distributed systems and why that’s a good thing
Hadoop scales out linearly: why Hadoop’s linear scalability is a paradigm shift (but one with a few downsides)
Hadoop runs on commodity hardware: an honest definition of commodity hardware and why this is a good thing for enterprises
Hadoop handles unstructured data: why Hadoop is better for unstructured data than other data systems from a storage and computation perspective
In Hadoop, you load data first and ask questions later: the differences between schema-on-read and schema-on-write and the drawbacks this represents
Hadoop is open source: what it really means for Hadoop to be open source from a practical perspective, not just a “feel good” perspective
HDFS stores the data but has some major limitations: an overview of HDFS (replication, not being able to edit files, and the NameNode)
YARN controls everything going on and is mostly behind the scenes: an overview of YARN and the pitfalls of sharing resources in a distributed environment and the capacity scheduler
MapReduce may be getting a bad rap, but it’s still really important: an overview of MapReduce (what it’s good at and bad at and why, while it isn’t used as much these days, it still plays an important role)
The Hadoop ecosystem is constantly growing and evolving: an overview of current tools such as Spark and Kafka and a glimpse of some things on the horizon
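The schema-on-read idea from the list above can be shown in miniature (plain Python over JSON lines; in Hadoop the raw files would live in HDFS and the schema would be applied by the query tool, such as Hive):

```python
import json

# Schema-on-write would validate or transform records at load time and
# reject the bad ones. Schema-on-read stores the raw lines as-is and
# imposes an interpretation only when a query runs.
raw_lines = [
    '{"user": "ann", "clicks": 3}',
    '{"user": "bob"}',                 # missing field: still stored
    '{"user": "cat", "clicks": "7"}',  # wrong type: still stored
]

def query_total_clicks(lines):
    # The schema (clicks is an int, defaulting to 0) is applied at read time.
    total = 0
    for line in lines:
        record = json.loads(line)
        total += int(record.get("clicks", 0))
    return total

print(query_total_clicks(raw_lines))  # 10
```

The drawback the list alludes to is visible here too: every query pays the parsing and coercion cost, and malformed data surfaces as a query-time failure rather than a load-time one.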
This document discusses large-scale data processing using Apache Hadoop at SARA and BiG Grid. It provides an introduction to Hadoop and MapReduce, noting that data is easier to collect, store, and analyze in large quantities. Examples are given of projects using Hadoop at SARA, including analyzing Wikipedia data and structural health monitoring. The talk outlines the Hadoop ecosystem and timeline of its adoption at SARA. It discusses how scientists are using Hadoop for tasks like information retrieval, machine learning, and bioinformatics.
Yahoo is the largest corporate contributor, tester, and user of Hadoop. They have 4000+ node clusters and contribute all their Hadoop development work back to Apache as open source. They use Hadoop for large-scale data processing and analytics across petabytes of data to power services like search and ads optimization. Some challenges of using Hadoop at Yahoo's scale include unpredictable user behavior, distributed systems issues, and the difficulties of collaboration in open source projects.
The document discusses Apache Tika, an open source content analysis and detection toolkit. It provides an overview of Tika's history and capabilities, including MIME type detection, language identification, and metadata extraction. It also describes how NASA uses Tika within its Earth science data systems to process large volumes of scientific data files in formats like HDF and netCDF.
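Extension-based MIME detection, one of the strategies a toolkit like Tika employs, is available in Python's standard library `mimetypes` module (Tika itself goes further, also inspecting content magic bytes and parsing container formats):

```python
import mimetypes

# Guess a MIME type from the filename extension alone. The second element
# of the returned tuple is the content encoding (e.g. gzip), if any.
for name in ["report.pdf", "data.csv", "image.png"]:
    mime, _encoding = mimetypes.guess_type(name)
    print(name, "->", mime)
```

Extension-based detection is fast but easily fooled by a renamed file, which is exactly why content-sniffing toolkits like Tika exist.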
Brain Imaging Data Structure and Center for Reproducible Neuroscience (Krzysztof Gorgolewski)
This document introduces the Brain Imaging Data Structure (BIDS), a new standardized format for organizing and describing neuroimaging data. BIDS aims to address the heterogeneity in how researchers currently organize their brain imaging data, which causes problems in data sharing and combining data from multiple studies. The key principles of BIDS include adopting existing file formats like NIfTI and JSON, capturing the majority of experimental designs while allowing for extensions, and making the format simple to implement through file naming conventions and folder structures. Tools are being developed to help with validation and conversion of data to the BIDS format to promote its adoption.
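The "simple to implement through file naming conventions" principle can be seen in a sketch of a filename check (a deliberately simplified pattern for illustration; the full BIDS specification defines many more entities, suffixes, and rules):

```python
import re

# Simplified BIDS-style scan filename:
#   sub-<label>[_ses-<label>]_<suffix>.nii[.gz]
BIDS_SCAN = re.compile(
    r"^sub-[a-zA-Z0-9]+"      # required subject entity
    r"(_ses-[a-zA-Z0-9]+)?"   # optional session entity
    r"_(T1w|T2w|bold)"        # scan-type suffix (subset, for illustration)
    r"\.nii(\.gz)?$"          # NIfTI, optionally compressed
)

def looks_bids(filename):
    # True if the filename follows the simplified convention above.
    return bool(BIDS_SCAN.match(filename))

print(looks_bids("sub-01_ses-02_T1w.nii.gz"))  # follows the convention
print(looks_bids("subject01_T1w.nii.gz"))      # wrong entity name
```

Because the structure lives in the names themselves, a validator needs no sidecar database: it can check a whole dataset by walking the folder tree, which is how the BIDS validation tools mentioned above operate.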
Hadoop is an open-source framework for storing and processing large datasets in a distributed computing environment. It allows for massive data storage, enormous processing power, and the ability to handle large numbers of concurrent tasks across clusters of commodity hardware. The framework includes Hadoop Distributed File System (HDFS) for reliable data storage and MapReduce for parallel processing of large datasets. An ecosystem of related projects like Pig, Hive, HBase, Sqoop and Flume extend the functionality of Hadoop.
Open source software provides many options for library services. Some key software packages discussed include Drupal for content management, DSpace for digital libraries, Koha for library automation, and Moodle for e-learning. Open source allows frequent updates and community support at no cost, but also poses challenges like technological obsolescence and copyright issues.
The document provides an outline for analyzing the U-Boot developer community. It describes the introduction, methodology, results, and conclusions sections. The methodology section discusses the tools used for the analysis including cvsanaly, mlstats, and scripts. It also covers the data sources of the U-Boot code repository, mailing list, and wiki. The results section will analyze the repository, mailing list, and perform mixed analyses.
The document provides an outline for analyzing the U-Boot developer community. It describes the introduction, methodology, results, and conclusions sections. The methodology section discusses the tools used for the analysis including cvsanaly, mlstats, and scripts. It also covers the data sources of the U-Boot code repository, mailing list, and wiki. The results section will analyze the repository, mailing list, and perform mixed analyses.
1. Distributed Computing
Varun Thacker
Linux User’s Group Manipal
April 8, 2010
Varun Thacker (LUG Manipal) Distributed Computing April 8, 2010 1 / 42
2. Outline
1 Introduction
LUG Manipal
Points To Remember
2 Distributed Computing
Technologies to be covered
Idea
Data
Why Distributed Computing is Hard
Why Distributed Computing is Important
Three Common Distributed Architectures
3 Distributed File System
GFS
What a Distributed File System Does
Google File System Architecture
GFS Architecture: Chunks
GFS Architecture: Master
GFS: Life of a Read
GFS: Life of a Write
GFS: Master Failure
4 MapReduce
Do We Need It?
Bad News!
MapReduce Paradigm
Working
Under the Hood: Scheduling
Robustness
5 Hadoop
What is Hadoop
Who uses Hadoop?
Mapper
Combiners
Reducer
Some Terminology
Job Distribution
6 Contact Information
7 Attribution
8 Copying
5. Who are we?
Linux User’s Group Manipal
Life, the Universe and FOSS!
Believers in knowledge sharing
The most technologically focused group in the University
LUG Manipal is a non-profit group, alive only on voluntary work!
http://lugmanipal.org
11. Points To Remember
If you have problems, don’t hesitate to ask.
The slides are based on documentation, so the discussions are really important; the slides are for later reference.
Please don’t treat the sessions as classes (classes are boring!).
The speaker is just like any person sitting next to you.
Documentation is really important.
Google is your friend.
If you have questions after this workshop, mail me or come to LUG Manipal’s forums: http://forums.lugmanipal.org
19. Technologies to be covered
Distributed computing refers to the use of distributed systems to solve computational problems.
A distributed system consists of multiple computers that communicate through a network.
MapReduce is a framework that implements the idea of distributed computing.
GFS is the distributed file system on which distributed programs store and process data at Google. Its free implementation is HDFS.
Hadoop is an open source framework, written in Java, which implements the MapReduce technology.
24. Idea
While the storage capacities of hard drives have increased massively over the years, access speeds (the rate at which data can be read from a drive) have not kept up.
One-terabyte drives are the norm, but the transfer speed is around 100 MB/s, so it takes more than two and a half hours to read all the data off the disk.
The obvious way to reduce the time is to read from multiple disks at once. Imagine we had 100 drives, each holding one hundredth of the data. Working in parallel, we could read all the data in under two minutes.
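The arithmetic behind this slide can be checked with a quick back-of-the-envelope calculation (a sketch using the slide’s own figures of 1 TB and 100 MB/s):

```python
# Back-of-the-envelope: time to read 1 TB at 100 MB/s,
# serially versus spread across 100 drives in parallel.
TB = 10**12          # 1 terabyte in bytes (decimal units)
MB = 10**6           # 1 megabyte in bytes

capacity = 1 * TB    # total data on the drive
speed = 100 * MB     # sustained transfer rate, bytes/second

serial_seconds = capacity / speed        # one drive reads everything
parallel_seconds = serial_seconds / 100  # 100 drives, 1/100th of the data each

print(f"one drive:  {serial_seconds / 3600:.2f} hours")    # 2.78 hours
print(f"100 drives: {parallel_seconds / 60:.2f} minutes")  # 1.67 minutes
```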
27. Data
We live in the data age. An IDC estimate put the size of the “digital universe” at 0.18 zettabytes in 2006.
By 2011 there will be a tenfold growth to 1.8 zettabytes.
One zettabyte is one million petabytes, or one billion terabytes.
The New York Stock Exchange generates about one terabyte of new trade data per day.
Facebook hosts approximately 10 billion photos, taking up one petabyte of storage.
The Large Hadron Collider near Geneva produces about 15 petabytes of data per year.
33. Why Distributed Computing is Hard
Computers crash.
Network links crash.
Talking is slow (even Ethernet has about 300 microseconds of latency, during which time your 2 GHz PC can execute 600,000 cycles).
Bandwidth is finite.
Internet scale: the computers and network are heterogeneous, untrustworthy, and subject to change at any time.
38. Why Distributed Computing is Important
Can be more reliable.
Can be faster.
Can be cheaper (a $30 million Cray versus 100 $1000 PCs).
41. Three Common Distributed Architectures
Hope: have N computers do separate pieces of work. Speed-up < N. Probability of failure = 1 − (1 − p)^N ≈ Np, where p is the probability of an individual crash.
Replication: have N computers do the same thing. Speed-up < 1. Probability of failure = p^N.
Master-servant: have 1 computer hand out pieces of work to N − 1 servants, and re-hand out pieces of work if servants fail. Speed-up < N − 1. Probability of failure ≈ p.
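The three failure models can be compared numerically. The sketch below simply evaluates the formulas above for illustrative values of N and p (the specific numbers are assumptions, not from the slides):

```python
# The slide's failure probabilities, evaluated for illustrative values:
# N = 100 machines, p = 0.001 (chance an individual machine crashes).
N, p = 100, 0.001

# Hope: N machines each do a separate piece; one crash loses the job.
hope_exact = 1 - (1 - p) ** N   # 1 - (1-p)^N
hope_approx = N * p             # ~ Np when Np is small

# Replication: N machines all do the same thing; the job is lost only
# if every machine crashes.
replication = p ** N            # p^N

# Master-servant: failed work is re-handed out, so the job is lost
# (roughly) only if the master itself crashes.
master_servant = p              # ~ p

print(f"hope:           {hope_exact:.4f} (Np approximation: {hope_approx:.3f})")
print(f"replication:    {replication:.1e}")
print(f"master-servant: {master_servant}")
```

Note how dramatic the trade-off is: replication makes failure astronomically unlikely but gives no speed-up, while "hope" gets nearly N-fold speed-up at the cost of a failure probability that grows with N.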
45. What a Distributed File System Does
1. Usual file system stuff: create, read, move, and find files.
2. Allow distributed access to files.
3. Store the files themselves distributedly.
If you just do #1 and #2, you are a network file system.
To do #3, it’s a good idea to also provide fault tolerance.
51. GFS Architecture: Chunks
Files are divided into 64 MB chunks (the last chunk of a file may be smaller).
Each chunk is identified by a unique 64-bit id.
Chunks are stored as regular files on local disks.
By default, each chunk is stored three times, preferably on more than one rack.
To protect data integrity, each 64 KB block gets a 32-bit checksum that is checked on all reads.
When idle, a chunkserver scans inactive chunks for corruption.
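GFS’s actual checksum format isn’t shown here; as an illustration of the per-block scheme, here is a minimal sketch that checksums 64 KB blocks with CRC32 (the choice of CRC32 is an assumption for the example) and re-verifies them on every read:

```python
import zlib

BLOCK = 64 * 1024  # checksum granularity: 64 KB blocks, as on the slide

def block_checksums(data):
    """Compute a 32-bit CRC for each 64 KB block of a chunk."""
    return [zlib.crc32(data[i:i + BLOCK]) for i in range(0, len(data), BLOCK)]

def verify(data, checksums):
    """Re-check every block on read; any mismatch signals corruption."""
    return block_checksums(data) == checksums

chunk = b"x" * (3 * BLOCK + 123)   # a chunk a bit over three blocks long
sums = block_checksums(chunk)

assert verify(chunk, sums)         # a clean read passes
corrupted = chunk[:70000] + b"?" + chunk[70001:]  # flip one byte in block 1
assert not verify(corrupted, sums) # ...and is caught on the next read
```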
59. GFS Architecture: Master
Stores all metadata (namespace, access control).
Stores the (file → chunks) and (chunk → locations) mappings.
Clients get the chunk locations for a file from the master, and then talk directly to the chunkservers for the data.
Advantage of a single master: simplicity.
Disadvantages of a single master:
Metadata operations are bottlenecked.
The maximum number of files is limited by the master’s memory.
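The two mappings can be pictured as a pair of in-memory dictionaries. The sketch below uses invented names and toy data purely to illustrate why the master’s memory bounds the number of files, and why clients only need the master for lookups:

```python
# A toy sketch of the master's two in-memory maps (illustrative names
# and data, not Google's actual structures).

# file name -> ordered list of chunk ids
file_to_chunks = {
    "/logs/A": [101, 102, 103],
}

# chunk id -> locations (chunkservers holding a replica; 3 by default)
chunk_to_locations = {
    101: ["cs1", "cs4", "cs9"],
    102: ["cs2", "cs4", "cs7"],
    103: ["cs1", "cs3", "cs8"],
}

def lookup(path, index):
    """What a client asks the master: where is chunk `index` of `path`?
    The client then fetches the data directly from a chunkserver."""
    chunk_id = file_to_chunks[path][index]
    return chunk_to_locations[chunk_id]

print(lookup("/logs/A", 1))  # ['cs2', 'cs4', 'cs7']
```

Because every file needs an entry in these maps, the file count is capped by the master’s RAM, and every metadata operation funnels through this one process: exactly the two disadvantages listed above.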
64. GFS: Life of a Read
The client program asks for 1 GB of file “A” starting at the 200 millionth byte.
The client GFS library asks the master for chunks 3, ..., 16387 of file “A”.
The master responds with all of the locations of chunks 2, ..., 20000 of file “A”.
The client caches all of these locations (with their cache time-outs).
The client reads chunk 2 from the closest location.
The client reads chunk 3 from the closest location.
...
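Before asking the master, the client library has to turn a byte range into chunk indices. A sketch of that arithmetic with 64 MB chunks (the chunk numbers on the slide itself are schematic):

```python
CHUNK = 64 * 1024 * 1024  # 64 MB GFS chunk size

def chunks_for_range(offset, length):
    """Which chunk indices of a file cover bytes [offset, offset+length)?
    This is the translation the client GFS library performs before
    asking the master for chunk locations."""
    first = offset // CHUNK
    last = (offset + length - 1) // CHUNK
    return range(first, last + 1)

# The slide's read: 1 GB of file "A" starting at the 200 millionth byte.
r = chunks_for_range(200_000_000, 10**9)
print(list(r))  # chunks 2 through 17
```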
71. GFS: Life of a Write
The client gets the locations of the chunk replicas as before.
For each chunk, the client sends the write data to the nearest replica.
That replica sends the data on to the replica nearest to it that has not yet received the data.
When all of the replicas have received the data, it is safe for them to actually write it.
Tricky details:
The master hands out a short-term (about 1 minute) lease for a particular replica to be the primary one.
The primary replica assigns a serial number to each mutation so that every replica performs the mutations in the same order.
78. GFS: Master Failure
The master stores its state via periodic checkpoints and a mutation log.
Both are replicated.
Master election and notification are implemented using an external lock server.
The new master restores its state from the checkpoint and the log.
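The recovery scheme above (checkpoint plus replayed mutation log) can be sketched in a few lines; the operation names and state shape below are invented for illustration:

```python
# Minimal sketch of checkpoint-plus-log recovery: the master's state can
# always be rebuilt as "last checkpoint, then replay every mutation
# logged after it". (Illustrative operations, not the real GFS log format.)

def apply(state, mutation):
    """Apply one logged mutation to the in-memory state."""
    op, key, value = mutation
    if op == "set":
        state[key] = value
    elif op == "delete":
        state.pop(key, None)
    return state

def recover(checkpoint, log):
    """What a new master does after election: start from the replicated
    checkpoint and replay the replicated mutation log, in order."""
    state = dict(checkpoint)
    for mutation in log:
        apply(state, mutation)
    return state

checkpoint = {"/f1": [101], "/f2": [102]}
log = [("set", "/f3", [103]),
       ("delete", "/f2", None),
       ("set", "/f1", [101, 104])]

print(recover(checkpoint, log))  # {'/f1': [101, 104], '/f3': [103]}
```

Because both the checkpoint and the log are replicated, any newly elected master can run `recover` and pick up exactly where the failed one left off.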
83. Do We Need It?
Yes: otherwise some problems are too big.
Example: 20+ billion web pages × 20 KB = 400+ terabytes.
One computer can read 30–35 MB/s from disk, so it would take about four months to read the web.
Spread the same problem over 1000 machines: less than 3 hours.
88. Bad News!
Distributed programs must deal with:
communication and coordination
recovering from machine failure (all the time!)
debugging
optimization
locality
Bad news II: repeat all of this for every problem you want to solve.
Good news I and II: MapReduce and Hadoop!
97. MapReduce
A simple programming model that applies to many large-scale computing problems.
Hide the messy details in the MapReduce runtime library:
automatic parallelization
load balancing
network and disk transfer optimization
handling of machine failures
robustness
Therefore we can write application-level programs and let MapReduce insulate us from many concerns.
106. Map Reduce Paradigm
Read a lot of data.
Map: extract something you care about from each record.
Shuffle and Sort.
Reduce: aggregate, summarize, filter, or transform.
Write the results.
Varun Thacker (LUG Manipal) Distributed Computing April 8, 2010 26 / 42
111. MapReduce Paradigm
Basic data type: the key-value pair (k, v).
For example, key = URL, value = HTML of the web page.
Programmer specifies two primary methods:
Map: (k, v) → <(k1, v1), (k2, v2), (k3, v3), ..., (kn, vn)>
Reduce: (k', <v'1, v'2, ..., v'n'>) → <(k', v''1), (k', v''2), ..., (k', v''n'')>
All v' with the same k' are reduced together.
(Remember the invisible “Shuffle and Sort” step.)
Varun Thacker (LUG Manipal) Distributed Computing April 8, 2010 27 / 42
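The two methods can be sketched as plain Python functions. Below is a minimal, single-process illustration (names such as `run_mapreduce` are made up for this sketch, not Hadoop or Google API) that wires Map, the invisible Shuffle-and-Sort step, and Reduce together for word count:

```python
from itertools import groupby
from operator import itemgetter

def run_mapreduce(map_fn, reduce_fn, records):
    # Map: (k, v) -> [(k1, v1), (k2, v2), ...]
    intermediate = []
    for k, v in records:
        intermediate.extend(map_fn(k, v))
    # Shuffle and Sort: bring all v' with the same k' together
    intermediate.sort(key=itemgetter(0))
    grouped = ((k, [v for _, v in kvs])
               for k, kvs in groupby(intermediate, key=itemgetter(0)))
    # Reduce: (k', [v'1, v'2, ...]) -> [(k', v''), ...]
    output = []
    for k, vs in grouped:
        output.extend(reduce_fn(k, vs))
    return output

# Word count: key = document name, value = its text
def wc_map(doc, text):
    return [(word, 1) for word in text.split()]

def wc_reduce(word, counts):
    return [(word, sum(counts))]

docs = [("d1", "the quick fox"), ("d2", "the lazy dog")]
print(run_mapreduce(wc_map, wc_reduce, docs))
```

A real runtime does the same three stages, but with the map and reduce calls spread across many machines.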
121. Under the hood: Scheduling
One master, many workers.
Input data is split into M map tasks (typically 64 MB each).
Reduce phase is partitioned into R reduce tasks (= # of output files).
Tasks are assigned to workers dynamically:
Master assigns each map task to a free worker.
Considers locality of data to worker when assigning a task.
Worker reads task input (often from local disk!).
Worker produces R local files containing intermediate (k, v) pairs.
Master assigns each reduce task to a free worker.
Worker reads intermediate (k, v) pairs from the map workers.
Worker sorts & applies user’s Reduce op to produce the output.
User may specify Partition: which intermediate keys go to which Reducer.
Varun Thacker (LUG Manipal) Distributed Computing April 8, 2010 30 / 42
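The Partition step above can be illustrated with the common hash-partitioning scheme (a sketch; `default_partition` and the choice `R = 4` are illustrative, not Hadoop's actual API). The key property: every occurrence of a key, from every map worker, lands in the same one of the R reduce partitions.

```python
R = 4  # number of reduce tasks, chosen for illustration

def default_partition(key, num_reducers=R):
    # Hash partitioning: deterministic within a job, so all map
    # workers route a given key to the same reducer.
    return hash(key) % num_reducers

# Each map worker writes R local files; pair (k, v) goes to file
# default_partition(k). Reducer r later fetches partition r from
# every map worker.
buckets = {r: [] for r in range(R)}
for k, v in [("the", 1), ("fox", 1), ("the", 1)]:
    buckets[default_partition(k)].append((k, v))
```

A user-supplied Partition function replaces `default_partition` when, say, all URLs from one host should reach the same reducer.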
133. Robustness
One master, many workers.
Detect failure via periodic heartbeats.
Re-execute completed and in-progress map tasks (their output lives on the failed worker’s local disk).
Re-execute in-progress reduce tasks.
Master assigns each map task to a free worker.
Master failure:
State is checkpointed to a replicated file system.
A new master recovers & continues.
Very robust: Google once lost 1600 of 1800 machines during a job, but it finished fine.
Varun Thacker (LUG Manipal) Distributed Computing April 8, 2010 31 / 42
143. What is Hadoop?
Apache Hadoop is a Java software framework that supports data-intensive distributed applications under a free license.
Hadoop was inspired by Google’s MapReduce and Google File System (GFS) papers.
A Map/Reduce job usually splits the input data-set into independent chunks which are processed by the map tasks in a completely parallel manner.
The map output is then made input to the reduce tasks.
The framework takes care of scheduling tasks, monitoring them, and re-executing any failed tasks.
Varun Thacker (LUG Manipal) Distributed Computing April 8, 2010 33 / 42
148. Who uses Hadoop?
Adobe
AOL
Baidu - the leading Chinese-language search engine
Cloudera, Inc. - provides commercial support and professional training for Hadoop
Facebook
Google
IBM
Twitter
Yahoo!
The New York Times, Last.fm, Hulu, LinkedIn
Varun Thacker (LUG Manipal) Distributed Computing April 8, 2010 34 / 42
159. Mapper
Mapper maps input key/value pairs to a set of intermediate key/value pairs.
The Hadoop Map/Reduce framework spawns one map task for each InputSplit generated by the InputFormat.
Output pairs do not need to be of the same types as input pairs.
Mapper implementations are passed the JobConf for the job.
The framework then calls the map method for each key/value pair.
Applications can use the Reporter to report progress.
All intermediate values associated with a given output key are subsequently grouped by the framework, and passed to the Reducer(s) to determine the final output.
The intermediate, sorted outputs are always stored in a simple (key-len, key, value-len, value) format.
The number of maps is usually driven by the total size of the inputs, that is, the total number of blocks of the input files.
Users can optionally specify a combiner to perform local aggregation of the intermediate outputs.
Varun Thacker (LUG Manipal) Distributed Computing April 8, 2010 35 / 42
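Hadoop's native Mapper API is Java, but Hadoop Streaming lets any executable act as the Mapper: it reads input records as lines on stdin and emits intermediate pairs as tab-separated `key<TAB>value` lines on stdout. A minimal word-count mapper sketch in that style (the function name `map_lines` is illustrative):

```python
def map_lines(lines):
    # Streaming delivers one input value (a line of text) per stdin
    # line; we emit one intermediate (word, 1) pair per output line,
    # formatted as "key<TAB>value".
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"

# In a real Streaming job this loop would read sys.stdin instead.
for pair in map_lines(["the quick brown fox\n"]):
    print(pair)
```

Note the output pairs (string word, count) need not match the input types (offset, line), exactly as the slide says.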
170. Combiners
When the map operation outputs its pairs, they are already available in memory.
If a combiner is used, the map key-value pairs are not immediately written to the output.
Instead they are collected in lists, one list per key.
When a certain number of key-value pairs have been buffered, the buffer is flushed: all the values of each key are passed to the combiner’s reduce method, and the resulting key-value pairs are output as if they had been created by the original map operation.
Varun Thacker (LUG Manipal) Distributed Computing April 8, 2010 36 / 42
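The buffer-and-flush behaviour described above can be sketched in Python (a simplified single-process illustration; the names and the flush threshold are made up for the sketch, not Hadoop internals):

```python
from collections import defaultdict

def map_with_combiner(pairs, combine_fn, flush_every=4):
    # Collect map output in per-key lists instead of emitting directly.
    buffer, out, n = defaultdict(list), [], 0
    for k, v in pairs:
        buffer[k].append(v)
        n += 1
        if n >= flush_every:          # flush: run the combiner's reduce
            for key, vals in buffer.items():
                out.extend(combine_fn(key, vals))
            buffer.clear()
            n = 0
    for key, vals in buffer.items():  # final flush
        out.extend(combine_fn(key, vals))
    return out

def wc_combine(word, counts):         # same signature as a reducer
    return [(word, sum(counts))]

pairs = [("the", 1), ("fox", 1), ("the", 1), ("the", 1), ("fox", 1)]
print(map_with_combiner(pairs, wc_combine))
```

The payoff is that far fewer intermediate pairs cross the network: here five map pairs shrink to three combined pairs before the shuffle.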
174. Reducer
Reducer reduces a set of intermediate values which share a key to a smaller set of values.
Reducer implementations are passed the JobConf for the job.
The framework then calls the reduce(WritableComparable, Iterator, OutputCollector, Reporter) method for each <key, (list of values)> pair in the grouped inputs.
The reducer has 3 primary phases:
Shuffle: Input to the Reducer is the sorted output of the mappers. In this phase the framework fetches the relevant partition of the output of all the mappers, via HTTP.
Sort: The framework groups Reducer inputs by key (since different mappers may have output the same key) in this stage.
Reduce: In this phase the reduce method is called for each <key, (list of values)> pair in the grouped inputs.
The generated output is a new, smaller set of values.
Varun Thacker (LUG Manipal) Distributed Computing April 8, 2010 37 / 42
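A Streaming-style reducer sketch in Python shows how little work is left after Shuffle and Sort: the reducer receives sorted `key<TAB>value` lines, so one pass with `groupby` recreates the <key, (list of values)> groups (the function name `reduce_lines` is illustrative):

```python
from itertools import groupby

def reduce_lines(lines):
    # The framework hands the reducer the sorted map output; groupby
    # rebuilds each <key, (list of values)> group from adjacent lines.
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(v) for _, v in group)}"

# In a real Streaming job the input would be sys.stdin.
sorted_map_output = ["fox\t1\n", "the\t1\n", "the\t1\n"]
for line in reduce_lines(sorted_map_output):
    print(line)
```

This only works because the input is sorted; `groupby` merges adjacent equal keys, which is exactly what the Sort phase guarantees.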
182. Some Terminology
Job – a “full program”: an execution of a Mapper and Reducer across a data set.
Task – an execution of a Mapper or a Reducer on a slice of data.
Task Attempt – a particular instance of an attempt to execute a task on a machine.
Varun Thacker (LUG Manipal) Distributed Computing April 8, 2010 38 / 42
185. Job Distribution
MapReduce programs are contained in a Java “jar” file plus an XML file containing serialized program configuration options.
Running a MapReduce job places these files into HDFS and notifies the TaskTrackers where to retrieve the relevant program code.
Data distribution is implicit in the design of MapReduce!
Varun Thacker (LUG Manipal) Distributed Computing April 8, 2010 39 / 42
188. Contact Information
Varun Thacker
Linux User’s Group Manipal
varunthacker1989@gmail.com
http://lugmanipal.org
http://forums.lugmanipal.org
http://varunthacker.wordpress.com
Varun Thacker (LUG Manipal) Distributed Computing April 8, 2010 40 / 42
189. Attribution
Google
Under the Creative Commons Attribution-Share Alike 2.5 Generic.
Varun Thacker (LUG Manipal) Distributed Computing April 8, 2010 41 / 42