
Resource Aware Scheduling for Hadoop [Final Presentation]

Presentation slides I used for the final presentation of my final year project


  1. National University of Singapore, School of Computing, Department of Information Systems. Lu Wei. Project No: H064420. Supervisor: Professor Tan Kian-Lee. RESOURCE-AWARE SCHEDULING FOR HADOOP

  2. MapReduce & Hadoop

  3. MapReduce
• Distributed data processing framework by Google
• Job
  – Map function
  – Reduce function

  4. Hadoop Architecture

  5. Existing Schedulers

  6. Early Schedulers
• FIFO: MapReduce default, by Google
  – Priority level & submission time
  – Data locality
  – Problem: starvation of other jobs in the presence of a long-running job
• Hadoop On Demand (HOD): by Yahoo!
  – Fairness: static node allocation using Torque Resource Manager
  – Problem: poor data locality & underutilization

  7. Mainstream Schedulers
• Fair Scheduler: by Facebook
  – Fairness: dynamic resource redistribution
  – Challenges:
    • Data locality – solved with delay scheduling
    • Reduce/map dependence – solved with copy-compute splitting
• Capacity Scheduler: by Yahoo!
  – Similar to Fair Scheduler
  – Special support for memory-intensive jobs

  8. Alternative Schedulers
• Adaptive Scheduler (2010–2011)
  – Goal/deadline oriented
  – Adaptively establishes predictions by job matching
  – Problem: strong assumptions & questionable performance
• Machine Learning Approach (2010)
  – Naïve Bayes & Perceptron with the aid of user hints
  – Better performance than FIFO
  – Underutilization during learning phase & overhead

  9. Existing Schedulers

| Scheduler          | Pro                                      | Con                                               | Resource-Awareness                                        |
|--------------------|------------------------------------------|---------------------------------------------------|-----------------------------------------------------------|
| FIFO               | High throughput                          | Starvation of short jobs                          | Data locality                                             |
| HOD                | Sharing of cluster                       | Poor data locality & underutilization             | –                                                         |
| Fair Scheduler     | Fairness & dynamic resource reallocation | Complicated configuration                         | Data locality; copy-compute splitting                     |
| Capacity Scheduler | Similar to FS                            | Similar to FS                                     | Special support for memory-intensive jobs                 |
| Adaptive Scheduler | Adaptive approach                        | Strong assumptions & questionable performance     | Resource utilization control using job matching           |
| Machine Learning   | Reported better performance than FIFO    | Underutilization during learning phase & overhead | Resource utilization control using pattern classification |

  10. Motivations
• Heterogeneity by Configuration
  – Hardware capacity differences among a cluster
• Heterogeneity by Usage
  – All task slots are treated equally, without consideration of the resource status of the current node or the resource demand of queuing jobs
  – Possible that a CPU-busy node is assigned a CPU-intensive job, and an I/O-busy node an I/O-intensive job

  11. Resource-Aware Scheduler

  12. Design Overview
1. Capture
  – the job's resource demand characteristics
  – the TaskTracker's static capability & runtime usage status
2. Combine and transform into quantified measurements
3. Predict how fast a given TaskTracker is expected to finish a given task
4. Apply scheduling policy of choice
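The four steps can be strung together in a few lines. A minimal sketch, not the project's implementation; the names, the score semantics, and the pick-the-fastest policy are illustrative assumptions:

```python
def predict_finish_time(task_demand, tracker_scores):
    """Step 3: predict how long a tracker would take to finish the task.

    task_demand: sampled seconds the task needs per resource at full availability (Step 1).
    tracker_scores: availability scores in (0, 1], higher = more available (Steps 1-2).
    """
    return sum(task_demand[r] / tracker_scores[r] for r in task_demand)

def schedule(task_demand, trackers):
    """Step 4: a simple policy -- assign to the tracker with the fastest prediction."""
    return min(trackers, key=lambda name: predict_finish_time(task_demand, trackers[name]))

demand = {"cpu": 10.0, "disk": 4.0}          # captured from a job sample
trackers = {
    "tt1": {"cpu": 0.9, "disk": 0.2},        # disk-busy node
    "tt2": {"cpu": 0.5, "disk": 0.8},        # moderately loaded node
}
print(schedule(demand, trackers))            # tt2: 10/0.5 + 4/0.8 = 25 < 10/0.9 + 4/0.2
```

The point of the sketch is the shape of the pipeline: measured demand and measured availability meet in one prediction function, and the policy is a pluggable last step.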

  13. Design Details
• TaskTracker Profiling
  – Resource scores: represent availability
  – Sampled every second (at every heartbeat) for each TaskTracker
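The per-heartbeat sampling could look like the following sketch; the availability definition (1 − utilization) and the smoothing are illustrative assumptions, not the project's actual scoring:

```python
class TaskTrackerProfile:
    """Per-TaskTracker resource scores, refreshed at every heartbeat (~1 s).

    Scores represent availability: 1.0 = fully idle resource, 0.0 = saturated.
    The exponential smoothing factor is an illustrative assumption.
    """

    def __init__(self, alpha=0.5):
        self.alpha = alpha                     # weight given to the newest sample
        self.scores = {}                       # resource name -> smoothed score

    def on_heartbeat(self, utilization):
        """utilization: e.g. {"cpu": 0.3, "disk": 0.9} measured since the last beat."""
        for resource, used in utilization.items():
            fresh = 1.0 - used                 # availability score for this sample
            prev = self.scores.get(resource, fresh)
            self.scores[resource] = self.alpha * fresh + (1 - self.alpha) * prev
        return self.scores
```

Smoothing matters here because a single one-second window is noisy; without it a momentary spike would flip a node between "busy" and "idle" on every heartbeat.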

  14. Design Details
• Task-Based Job Sampling
  – Assumption: t_sample = t_s-cpu + t_s-disk + t_s-network
  – Target measurements: task resource demand & TaskTracker resource statuses
  – Technique:
    • Periodic re-sampling: avoids over-reliance on one job sample
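Periodic re-sampling can be kept as simple bookkeeping. A sketch with hypothetical names; the "every N scheduled tasks" trigger is an assumption, since the slide does not say what drives re-sampling:

```python
class MapSampleReport:
    """One sampled map task's measurements for a job (field names assumed).

    Per the slide's assumption, t_sample = t_cpu + t_disk + t_network.
    """
    def __init__(self, t_cpu, t_disk, t_network):
        self.t_cpu, self.t_disk, self.t_network = t_cpu, t_disk, t_network

class SampleStore:
    """Keeps one report per job and periodically re-samples, so the scheduler
    never relies on a single, possibly unlucky, sample."""

    def __init__(self, resample_every=50):
        self.resample_every = resample_every   # illustrative refresh interval
        self.reports = {}                      # job id -> MapSampleReport
        self.scheduled = {}                    # job id -> tasks since last sample

    def needs_sample(self, job_id):
        n = self.scheduled.get(job_id, 0)
        return job_id not in self.reports or n >= self.resample_every

    def record(self, job_id, report):
        self.reports[job_id] = report
        self.scheduled[job_id] = 0

    def on_task_scheduled(self, job_id):
        self.scheduled[job_id] = self.scheduled.get(job_id, 0) + 1
```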

  15. Design Details
• Task Processing Time Estimation (t: time, c: resource score, s: data size; "s-" prefix marks sampled values, "e-" estimated ones)

  t_estimate = t_e-cpu + t_e-disk + t_e-network
             = t_s-cpu × (c_s-cpu / c_cpu)
               + t_e-disk-in + t_e-disk-out + t_e-disk-spill
               + t_e-network-in + t_e-network-out

  t_e-disk-in   = t_s-disk-in × (c_s-disk-read / c_disk-read) × (s_disk-in / s_s-disk-in)
  s_disk-spill  = (s_s-disk-spill / s_s-in) × s_in
  s_network-out = s_out / N_total-reduce = (β_s-oi-ratio × s_in) / N_total-reduce
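The estimator can be written down directly from these formulas. A sketch under inferred symbol meanings; the slide spells out only the CPU and disk-in terms plus the spill and network-output sizes, so the remaining terms here reuse the same score-and-size scaling pattern, and all field names are hypothetical:

```python
def scale(t_sampled, score_sampled, score_now, size_ratio=1.0):
    """The recurring pattern in the formulas: sampled time, scaled by the ratio
    of sampled to current resource score and by a data-size ratio."""
    return t_sampled * (score_sampled / score_now) * size_ratio

def estimate_map_time(sample, tracker, s_in, beta_oi_ratio, n_reduces):
    """t_estimate = t_e-cpu + t_e-disk + t_e-network (disk-out and network-in
    omitted for brevity; they would follow the same pattern)."""
    # CPU: sampled CPU time scaled by the CPU-score ratio
    t_cpu = scale(sample["t_cpu"], sample["c_cpu"], tracker["c_cpu"])
    # Disk input: scaled by disk-read score ratio and input-size ratio
    t_disk_in = scale(sample["t_disk_in"], sample["c_disk_read"],
                      tracker["c_disk_read"], s_in / sample["s_in"])
    # Spill size scales linearly with input size
    s_spill = sample["s_spill"] / sample["s_in"] * s_in
    t_disk_spill = scale(sample["t_disk_spill"], sample["c_disk_write"],
                         tracker["c_disk_write"], s_spill / sample["s_spill"])
    # Network output per reducer: map output (beta * input) split across reducers
    s_net_out = beta_oi_ratio * s_in / n_reduces
    t_net_out = scale(sample["t_net_out"], sample["c_net"],
                      tracker["c_net"], s_net_out / sample["s_net_out"])
    return t_cpu + t_disk_in + t_disk_spill + t_net_out
```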
  16. Design Details
• Scheduling policies
  – Map Tasks
    • Shortest Job First (SJF)
    • Starvation of long-running jobs: addressed by periodic re-sampling
  – Reduce Tasks
    • Naïve I/O Biasing
      – Do not schedule an I/O-intensive job on an I/O-busy node when there are other reduce slots with higher disk I/O availability
      – I/O-intensive job: judged using the map-phase sample
      – I/O-busy node: disk I/O scores below the cluster average
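Both policies fit in a few lines. A sketch assuming an estimator and per-node disk scores (previous slides) are already available; the field names and the 0.5 intensity threshold are illustrative:

```python
def pick_map_task(jobs, estimate):
    """Map-side policy: Shortest Job First -- pick the job whose next map task
    has the smallest estimated processing time on the offering node."""
    return min(jobs, key=estimate)

def may_schedule_reduce(job, node, cluster_avg_disk_score, slots_elsewhere):
    """Reduce-side policy: naive I/O biasing, as on the slide. Don't place an
    I/O-intensive job (judged from its map-phase sample) on an I/O-busy node
    (disk score below the cluster average) while reduce slots with better disk
    availability remain elsewhere."""
    io_intensive = job["map_sample_disk_share"] > 0.5   # threshold is an assumption
    io_busy = node["disk_score"] < cluster_avg_disk_score
    return not (io_intensive and io_busy and slots_elsewhere)
```

Note that the biasing rule only defers, never refuses outright: when no better slot exists elsewhere, the job is still scheduled, which avoids reintroducing starvation.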

  17. Implementation
[Architecture diagram. Scheduler side: the Resource Scheduler obtains estimated task processing times from a MapTaskFinishTimeEstimator and resource scores from the JobTracker. TaskTracker side: a ResourceCalculatorPlugin feeds ResourceStatus into TaskTrackerStatus, while SampleTaskStatus/TaskStatus report sampled task processing times & data sizes. Job bookkeeping: MyJobInProgress/JobInProgress hold job profiles, a Logger records resource profiles, sample reports live in a HashMap<JobID, MapSampleReport>, and TaskInProgress/Task complete the chain.]
https://github.com/weilu/Hadoop-Resource-Aware-Scheduler

  18. Evaluation & Results

  19. Estimation Accuracy
• Cluster Configuration I
  – Shared with other users and other applications
  – 1 master, 10 slave nodes
  – 1 Gbps network, same rack
  – Each node:
    • 4 processors: Intel Xeon E5607 Quad Core CPU (2.26 GHz)
    • 32 GB memory
    • 1 TB hard disk
• Hadoop Configuration
  – HDFS block size: 64 MB
  – Data replication: 1
  – Each node:
    • Map slots: 1
    • Reduce slots: 2
  – Speculative map & reduce tasks: off
  – Completed maps required before scheduling reduce: 1 out of 1000 total maps
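In Hadoop 1.x terms, the Hadoop settings above would be expressed roughly as follows (stock 1.x property names; mapping "1 out of 1000 maps" onto the slowstart fraction is an inference):

```xml
<!-- hdfs-site.xml -->
<property><name>dfs.block.size</name><value>67108864</value></property> <!-- 64 MB -->
<property><name>dfs.replication</name><value>1</value></property>

<!-- mapred-site.xml -->
<property><name>mapred.tasktracker.map.tasks.maximum</name><value>1</value></property>
<property><name>mapred.tasktracker.reduce.tasks.maximum</name><value>2</value></property>
<property><name>mapred.map.tasks.speculative.execution</name><value>false</value></property>
<property><name>mapred.reduce.tasks.speculative.execution</name><value>false</value></property>
<!-- reduces may start after 1/1000 of the maps have completed -->
<property><name>mapred.reduce.slowstart.completed.maps</name><value>0.001</value></property>
```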

  20. Estimation Accuracy
• Workload description:
  – I/O workload: word count
    • Counts the occurrence of each word in the given input files
    • Mapper: scans through the input; outputs each word as the key with 1 as the value, sorted on the key
    • Reducer: collects records with the same key by adding up the values; outputs each key and its total occurrence
  – CPU workload: pi estimation
    • Approximates the value of pi by counting the number of points that fall within the unit quarter circle
    • Mapper: reads coordinates of points; counts points inside/outside of the inscribed circle of the square
    • Reducer: accumulates the inside/outside counts from the mappers
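The two workloads' map/reduce logic can be sketched outside Hadoop; a minimal Python analogue of what the slide describes, not the benchmark code itself:

```python
import random
from collections import Counter

def wordcount_map(text):
    """Mapper: emit (word, 1) for every word in the input split."""
    return [(word, 1) for word in text.split()]

def wordcount_reduce(pairs):
    """Reducer: sum the values for each key."""
    counts = Counter()
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

def pi_map(n_points, seed=0):
    """Mapper: count random points inside/outside the unit quarter circle."""
    rng = random.Random(seed)
    inside = sum(1 for _ in range(n_points)
                 if rng.random() ** 2 + rng.random() ** 2 <= 1.0)
    return inside, n_points - inside

def pi_reduce(partials):
    """Reducer: accumulate counts from all mappers and approximate pi."""
    inside = sum(i for i, _ in partials)
    total = sum(i + o for i, o in partials)
    return 4.0 * inside / total

print(wordcount_reduce(wordcount_map("to be or not to be")))
```

The contrast motivating the benchmark is visible even here: word count does almost no arithmetic per byte read (I/O-bound), while pi estimation reads almost nothing and loops over arithmetic (CPU-bound).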

  21. Estimation Accuracy
• I/O Workload 1 (Resource Scheduler, wordcount, 10 nodes, 5 GB input data, single job)
[Chart: Estimated vs. Actual Task Execution Time, per-task estimated and actual series]

  22. Estimation Accuracy
• I/O Workload 2 (Resource Scheduler, wordcount, 10 nodes, 5 GB input data, single job)
[Chart: Estimated vs. Actual Task Execution Time, per-task estimated and actual series]

  23. Estimation Accuracy
• CPU Workload 1 (Resource Scheduler, pi, 10 nodes, 100 maps, 10^8 points each, single job)
[Chart: estimated vs. actual task execution time]

  24. Estimation Accuracy
• CPU Workload 2 (Resource Scheduler, pi, 10 nodes, 100 maps, 10^9 points each, single job)
[Chart: estimated vs. actual task execution time]

  25. Performance Benchmark: Resource Scheduler vs. FIFO Scheduler
• Cluster Configuration II (differences from Configuration I)
  – Reserved and unshared
  – 1 master, 5 slave nodes
• Workload Description (overhead evaluation; baseline establishment: reality test)
  – Single I/O job: word count
  – Single CPU job: pi estimation
  – Simultaneous submission of the I/O job and the CPU job

  26. Performance Benchmark: Resource Scheduler vs. FIFO Scheduler
Resource-Homogeneous Environment
• Overhead Evaluation
  Table 9 – evaluation and results: word count in resource-homogeneous environment, 3 runs (summary)
  Table 10 – evaluation and results: pi estimation in resource-homogeneous environment, 3 runs (summary)

  27. Performance Benchmark: Resource Scheduler vs. FIFO Scheduler
• FIFO vs. Resource Scheduler in a Resource-Homogeneous Environment

  28. Performance Benchmark: Resource Scheduler vs. FIFO Scheduler
• Analysis: FIFO vs. Resource Scheduler in a Resource-Homogeneous Environment (simultaneous submission of an I/O job and a CPU job)
  – Negligible overhead
  – The Resource Scheduler performs worse: slowdown in all measured dimensions and cases
  – Reason: the Resource Scheduler has more concurrent running reducers competing for resources
  – Expectation: same performance in a busy cluster (all reduce slots constantly filled with running tasks)
[Chart: best/average/worst total map time (sec) and total job time (sec), FIFO vs. Resource]

  29. Performance Benchmark: Resource Scheduler vs. FIFO Scheduler
Resource-Heterogeneous Environment
• Environment Simulation
  – CPU intervention: non-MapReduce pi estimation
  – Disk I/O intervention: dd 50 GB write-read
• Simulated Environment
  – 3 CPU-busy nodes + 2 disk-I/O-busy nodes
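The disk interference could be generated with a small script along these lines; SIZE_MB and the scratch path are knobs added here so the sketch stays small (the 50 GB run on the slide corresponds to SIZE_MB=51200):

```shell
# Disk I/O interference: sequential write, then sequential read-back, via dd.
SIZE_MB="${SIZE_MB:-32}"
SCRATCH="${SCRATCH:-/tmp/dd_interference.bin}"

dd if=/dev/zero of="$SCRATCH" bs=1048576 count="$SIZE_MB" 2>/dev/null
dd if="$SCRATCH" of=/dev/null bs=1048576 2>/dev/null
rm -f "$SCRATCH"
echo "disk interference pass of ${SIZE_MB} MB done"
```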

  30. Performance Benchmark: Resource Scheduler vs. FIFO Scheduler
• FIFO vs. Resource Scheduler in a Resource-Heterogeneous Environment (sequential submission of 2 jobs)

  31. Performance Benchmark: Resource Scheduler vs. FIFO Scheduler
• FIFO vs. Resource Scheduler in a Resource-Heterogeneous Environment (concurrent submission of 2 jobs)

  32. Performance Benchmark: Resource Scheduler vs. FIFO Scheduler
FIFO vs. Resource Scheduler in a Resource-Heterogeneous Environment (simultaneous submission of an I/O job and a CPU job)
[Charts: best/average/worst total map time (sec) and total job time (sec) for FIFO vs. Resource, and the percentage slowdown of the Resource Scheduler relative to FIFO for total map time and total job time, comparing the homogeneous and heterogeneous environments]

  33. Conclusion
• Resource-based map task processing time estimation is satisfactory
• The Resource Scheduler did not manage to outperform the FIFO scheduler in the resource-homogeneous environment, or in most cases of the resource-heterogeneous environment, due to extra concurrent reduce tasks
• However, we verified that the Resource Scheduler is indeed resource-aware: it performs better when moved from a resource-homogeneous environment to a resource-heterogeneous one:
  – Smaller percentage slowdown compared to FIFO in all cases and all measured dimensions
  – Observed speedup compared to FIFO in the worst cases, due to I/O-biasing scheduling during the reduce stage

  34. Recommendations for Future Work
• Evaluation
  – Heavier workload & busy cluster
    • Observe overhead
    • Benchmark performance
• Scheduling policy
  – Map Task
    • Highest Response Ratio Next (HRRN):
      priority = (t_estimated + t_waiting) / t_estimated = 1 + t_waiting / t_estimated
  – Reduce Task
    • CPU biasing for CPU-intensive jobs
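The HRRN formula is directly computable; a small sketch (the job tuples and the selection loop are illustrative):

```python
def hrrn_priority(t_estimated, t_waiting):
    """Highest Response Ratio Next: priority grows with waiting time, so long
    jobs are not starved, while short jobs (small t_estimated) still win early."""
    return (t_estimated + t_waiting) / t_estimated   # == 1 + t_waiting / t_estimated

def pick_next(jobs, now):
    """jobs: list of (name, t_estimated, submit_time); pick the highest ratio."""
    return max(jobs, key=lambda j: hrrn_priority(j[1], now - j[2]))[0]

jobs = [("long", 100.0, 0.0), ("short", 10.0, 50.0)]
print(pick_next(jobs, now=60.0))   # short: 1 + 10/10 = 2.0 > 1 + 60/100 = 1.6
```

Unlike plain SJF, the ratio guarantees that any waiting job's priority eventually dominates, so HRRN would replace the periodic re-sampling workaround for starvation rather than patch around it.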