Resource Aware Scheduling for Hadoop [Final Presentation]

Presentation slides I used for the final presentation of my final year project



  1. National University of Singapore
School of Computing, Department of Information Systems

Lu Wei
Project No: H064420
Supervisor: Professor Tan Kian-Lee

RESOURCE-AWARE SCHEDULING FOR HADOOP

  2. MapReduce & Hadoop

  3. MapReduce
•  Distributed data processing framework by Google
•  Job
  – Map function
  – Reduce function

  4. Hadoop Architecture

  5. Existing Schedulers

  6. Early Schedulers
•  FIFO: MapReduce default, by Google
  – Priority level & submission time
  – Data locality
  – Problem: starvation of other jobs in the presence of a long-running job
•  Hadoop On Demand (HOD): by Yahoo!
  – Fairness: static node allocation using the Torque Resource Manager
  – Problem: poor data locality & underutilization

  7. Mainstream Schedulers
•  Fair Scheduler: by Facebook
  – Fairness: dynamic resource redistribution
  – Challenges:
    •  Data locality – solved with delay scheduling
    •  Reduce/map dependence – solved with copy-compute splitting
•  Capacity Scheduler: by Yahoo!
  – Similar to Fair Scheduler
  – Special support for memory-intensive jobs

  8. Alternative Schedulers
•  Adaptive Scheduler (2010–2011)
  – Goal/deadline oriented
  – Adaptively establishes predictions by job matching
  – Problem: strong assumptions & questionable performance
•  Machine Learning Approach (2010)
  – Naïve Bayes & perceptron with the aid of user hints
  – Better performance than FIFO
  – Underutilization during learning phase & overhead

  9. Existing Schedulers

Scheduler          | Pro                                       | Con                                               | Resource-Awareness
FIFO               | High throughput                           | Starvation of short jobs                          | Data locality
HOD                | Sharing of cluster                        | Poor data locality & underutilization             | –
Fair Scheduler     | Fairness & dynamic resource re-allocation | Complicated configuration                         | Data locality; copy-compute splitting
Capacity Scheduler | Similar to FS                             | Similar to FS                                     | Special support for memory-intensive jobs
Adaptive Scheduler | Adaptive approach                         | Strong assumptions & questionable performance     | Resource utilization control using job matching
Machine Learning   | Reported better performance than FIFO     | Underutilization during learning phase & overhead | Resource utilization control using pattern classification

  10. Motivations
•  Heterogeneity by Configuration
  – Hardware capacity differences among the nodes of a cluster
•  Heterogeneity by Usage
  – All task slots are treated equally, without consideration of the resource status of the current node or the resource demand of queuing jobs
  – Possible that a CPU-busy node is assigned a CPU-intensive job, and an I/O-busy node an I/O-intensive job

  11. Resource-Aware Scheduler

  12. Design Overview
1.  Capture
  – the job's resource demand characteristics
  – the TaskTracker's static capability & runtime usage status
2.  Combine and transform into quantified measurements
3.  Predict how fast a given TaskTracker is expected to finish a given task
4.  Apply scheduling policy of choice
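The four steps above can be sketched end to end. This is a minimal illustration, not the project's actual classes: the names, the (0, 1] score range, and the inverse-score scaling model are all assumptions.

```python
from dataclasses import dataclass

@dataclass
class JobSample:
    """Step 1a: sampled resource demand of one task (seconds per resource)."""
    t_cpu: float
    t_disk: float
    t_network: float

@dataclass
class TrackerStatus:
    """Step 1b: availability scores in (0, 1]; 1.0 means fully free."""
    cpu: float
    disk: float
    network: float

def estimate_finish_time(sample: JobSample, tracker: TrackerStatus) -> float:
    """Steps 2-3: combine demand with availability into a predicted task time.
    Each component stretches as the matching resource gets busier."""
    return (sample.t_cpu / tracker.cpu
            + sample.t_disk / tracker.disk
            + sample.t_network / tracker.network)

def pick_tracker(sample: JobSample, trackers: dict) -> str:
    """Step 4: one possible policy - run the task where it finishes soonest."""
    return min(trackers, key=lambda n: estimate_finish_time(sample, trackers[n]))
```

For example, a task sampled at 10 s CPU, 5 s disk and 1 s network would be routed to an idle node in preference to one whose CPU is half busy.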

  13. Design Details
•  TaskTracker Profiling
  – Resource scores: represent availability
  – Sampled every second (at every heartbeat) for each TaskTracker
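A minimal sketch of what per-heartbeat profiling could look like, assuming utilization readings normalized to [0, 1] and a short averaging window; the class name and window size are illustrative, and the real scheduler reads OS-level counters through a plugin.

```python
from collections import deque

class ResourceProfiler:
    """Hypothetical TaskTracker profiler: turns once-per-second utilization
    samples into availability scores (1.0 = idle, 0.0 = saturated)."""

    def __init__(self, window: int = 5):
        self.samples = deque(maxlen=window)  # keep only the last few heartbeats

    def record(self, cpu_util: float, disk_util: float) -> None:
        """Called at every heartbeat with current utilization readings."""
        self.samples.append((cpu_util, disk_util))

    def scores(self) -> tuple:
        """Availability score = 1 - mean utilization over the window."""
        n = len(self.samples)
        cpu = sum(s[0] for s in self.samples) / n
        disk = sum(s[1] for s in self.samples) / n
        return (1.0 - cpu, 1.0 - disk)
```

Averaging over a few heartbeats smooths out momentary spikes that would otherwise make scheduling decisions jittery.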

  14. Design Details
•  Task-Based Job Sampling
  – Assumption: t_sample = t_{s-cpu} + t_{s-disk} + t_{s-network}
  – Target measurements: task resource demand & TaskTracker resource statuses
  – Technique:
    •  Periodic re-sampling: avoid over-reliance on one job sample
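The periodic re-sampling above can be sketched as a small bookkeeping class; the refresh period and the names are assumptions for illustration, not the project's code.

```python
class SamplingTracker:
    """Hypothetical bookkeeping for task-based job sampling: the first task
    of a job runs as a sample task, and a fresh sample is taken every
    `period` completed tasks so estimates never rest on a single sample."""

    def __init__(self, period: int = 50):
        self.period = period
        self.completed = 0       # tasks of this job finished so far
        self.has_sample = False  # at least one sample report received?

    def should_sample(self) -> bool:
        if not self.has_sample:
            return True                           # no profile yet: sample now
        return self.completed % self.period == 0  # periodic refresh

    def task_done(self, was_sample: bool) -> None:
        self.completed += 1
        if was_sample:
            self.has_sample = True
```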

  15. Design Details
•  Task Processing Time Estimation

  t_estimate = t_{e-cpu} + t_{e-disk} + t_{e-network}

  t_estimate = t_{s-cpu} × (c_{s-cpu} / c_{cpu})
               + t_{e-disk-in} + t_{e-disk-out} + t_{e-disk-spill}
               + t_{e-network-in} + t_{e-network-out}

  t_{e-disk-in} = t_{s-disk-in} × (c_{s-disk-read} / c_{disk-read}) × (s_{disk-in} / s_{s-disk-in})

  s_{disk-spill} = (s_{s-disk-spill} / s_{s-in}) × s_in

  s_{network-out} = s_out / N_{total-reduce} = (β_{s-oi-ratio} × s_in) / N_{total-reduce}
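As a sketch, the CPU and disk-in terms above translate directly into code; the remaining disk and network terms follow the same scaling pattern. The dictionary keys and sample values here are illustrative, not the project's field names.

```python
def scale_component(t_sample, c_sample, c_now, s_now, s_sample):
    """Scale one sampled component time by the ratio of resource scores at
    sampling time vs. now, and by the ratio of data sizes now vs. then."""
    return t_sample * (c_sample / c_now) * (s_now / s_sample)

def estimate_task_time(sample, tracker):
    """t_estimate = CPU term + disk-in term (disk-out, disk-spill and
    network terms, omitted here, scale the same way)."""
    t_cpu = sample["t_cpu"] * (sample["c_cpu"] / tracker["c_cpu"])
    t_disk_in = scale_component(sample["t_disk_in"],
                                sample["c_disk_read"], tracker["c_disk_read"],
                                tracker["s_disk_in"], sample["s_disk_in"])
    return t_cpu + t_disk_in

# Sample task: 10 s of CPU at score 0.5, 4 s reading 64 MB at disk score 0.8.
sample = {"t_cpu": 10.0, "c_cpu": 0.5,
          "t_disk_in": 4.0, "c_disk_read": 0.8, "s_disk_in": 64.0}
# Target tracker: idle CPU, but a busier disk and twice the input size.
tracker = {"c_cpu": 1.0, "c_disk_read": 0.4, "s_disk_in": 128.0}
```

With these numbers the CPU term halves (idle CPU) while the disk term quadruples (half the disk availability, double the data), giving 5 + 16 = 21 s.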
  16. Design Details
•  Scheduling policies
  – Map Tasks
    •  Shortest Job First (SJF)
    •  Starvation of long-running jobs: addressed by periodic re-sampling
  – Reduce Tasks
    •  Naïve I/O Biasing
      – Do not schedule an I/O-intensive job on an I/O-busy node when there are other reduce slots with higher disk I/O availability
      – I/O-intensive job: judged using the map phase sample
      – I/O-busy node: disk I/O scores below the cluster average
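The naïve I/O biasing rule can be sketched as a single predicate; the function name, argument shapes, and score convention (higher = more disk I/O available) are assumptions for illustration.

```python
def accept_reduce_slot(job_is_io_intensive: bool,
                       node_disk_score: float,
                       cluster_disk_scores: list) -> bool:
    """Naive I/O biasing: refuse to place an I/O-intensive reduce task on an
    I/O-busy node (disk score below the cluster average) while some other
    node offers higher disk I/O availability."""
    avg = sum(cluster_disk_scores) / len(cluster_disk_scores)
    io_busy = node_disk_score < avg
    better_slot_exists = any(s > node_disk_score for s in cluster_disk_scores)
    if job_is_io_intensive and io_busy and better_slot_exists:
        return False  # skip this heartbeat; wait for a less I/O-busy node
    return True
```

CPU-bound jobs, or a uniformly busy cluster, fall through to the normal assignment path.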

  17. Implementation

[Architecture diagram linking the Resource Scheduler to the JobTracker and TaskTracker. Components: MapTaskFinishTimeEstimator (estimated task processing time), Resource Scheduler (resource scores), MapSampleReport (sampled task processing time & data sizes), TaskTrackerStatus / ResourceStatus, ResourceCalculatorPlugin, MyJobInProgress / JobInProgress (job profiles: HashMap<JobID, MapSampleReport>), TaskInProgress, Task / TaskStatus / SampleTaskStatus, Logger.]

https://github.com/weilu/Hadoop-Resource-Aware-Scheduler

  18. Evaluation & Results

  19. Estimation Accuracy
•  Cluster Configuration I
  – Shared with other users and other applications
  – 1 master, 10 slave nodes
  – 1 Gbps network, same rack
  – Each node:
    •  4 processors: Intel Xeon E5607 Quad Core CPU (2.26 GHz)
    •  32 GB memory
    •  1 TB hard disk
•  Hadoop Configuration
  – HDFS block size: 64 MB
  – Data replication: 1
  – Each node:
    •  Map slots: 1
    •  Reduce slots: 2
  – Speculative map & reduce tasks: off
  – Completed maps required before scheduling reduce: 1 out of 1000 total maps

  20. Estimation Accuracy
•  Workload description:
  – I/O workload: word count
    •  Counts the occurrence of each word in the given input files
    •  Mapper: scans through the input; outputs each word with itself as the key and 1 as the value, sorted on the key
    •  Reducer: collects records with the same key, adding up the values; outputs the key and its total occurrence count
  – CPU workload: pi estimation
    •  Approximates the value of pi by counting the number of points that fall within the unit quarter circle
    •  Mapper: reads coordinates of points; counts points inside/outside of the inscribed circle of the square
    •  Reducer: accumulates the inside/outside counts from the mappers

  21. Estimation Accuracy
•  I/O Workload 1 (Resource Scheduler, wordcount, 10 nodes, 5 GB input data, single job)

[Chart: Estimated vs. Actual Task Execution Time; y-axis 0–160,000]

  22. Estimation Accuracy
•  I/O Workload 2 (Resource Scheduler, wordcount, 10 nodes, 5 GB input data, single job)

[Chart: Estimated vs. Actual Task Execution Time; y-axis 20,000–45,000]

  23. Estimation Accuracy
•  CPU Workload 1 (Resource Scheduler, pi, 10 nodes, 100 maps, 10^8 points each, single job)

[Chart: estimated vs. actual task execution time; y-axis 0–6,000]

  24. Estimation Accuracy
•  CPU Workload 2 (Resource Scheduler, pi, 10 nodes, 100 maps, 10^9 points each, single job)

[Chart: estimated vs. actual task execution time; y-axis 0–50,000]

  25. Performance Benchmark: Resource Scheduler vs. FIFO Scheduler
•  Cluster Configuration II (differences from Configuration I)
  – Reserved and unshared
  – 1 master, 5 slave nodes
•  Workload Description
  – Single I/O job: word count
  – Single CPU job: pi estimation
  – Simultaneous submission of the I/O job and the CPU job (overhead evaluation)
  – Baseline establishment: reality test

  26. Performance Benchmark: Resource Scheduler vs. FIFO Scheduler
Resource-Homogeneous Environment
•  Overhead Evaluation

[Table 9 – evaluation and results: word count in resource-homogeneous environment, 3 runs (summary)]
[Table 10 – evaluation and results: pi estimation in resource-homogeneous environment, 3 runs (summary)]

  27. Performance Benchmark: Resource Scheduler vs. FIFO Scheduler
•  FIFO vs Resource Scheduler in a Resource-Homogeneous Environment

[Chart]

  28. Performance Benchmark: Resource Scheduler vs. FIFO Scheduler
•  Analysis: FIFO vs Resource Scheduler in a Resource-Homogeneous Environment (simultaneous submission of an I/O job and a CPU job)
  – Negligible overhead
  – The Resource Scheduler performs worse: slowdown in all measured dimensions and cases
  – Reason: the Resource Scheduler has more concurrent running reducers competing for resources
  – Expectation: the same performance in a busy cluster (all reduce slots constantly filled with running tasks)

[Chart: total map time and total job time (sec), best/average/worst, FIFO vs Resource; y-axis 1,200–1,700]

  29. Performance Benchmark: Resource Scheduler vs. FIFO Scheduler
Resource-Heterogeneous Environment
•  Environment Simulation
  – CPU intervention: non-MapReduce pi estimation
  – Disk I/O intervention: dd 50 GB write-read
•  Simulated Environment
  – 3 CPU-busy nodes + 2 disk-I/O-busy nodes

  30. Performance Benchmark: Resource Scheduler vs. FIFO Scheduler
•  FIFO vs Resource Scheduler in a Resource-Heterogeneous Environment (sequential submission of 2 jobs)

[Chart]

  31. Performance Benchmark: Resource Scheduler vs. FIFO Scheduler
•  FIFO vs Resource Scheduler in a Resource-Heterogeneous Environment (concurrent submission of 2 jobs)

[Chart]

  32. Performance Benchmark: Resource Scheduler vs. FIFO Scheduler
FIFO vs Resource Scheduler in a Resource-Heterogeneous Environment (simultaneous submission of an I/O job and a CPU job)

[Charts: total map time and total job time (sec), best/average/worst, FIFO vs Resource; plus percentage slowdown of the Resource Scheduler relative to FIFO (best/average/worst), comparing the homogeneous and heterogeneous environments]

  33. Conclusion
•  Resource-based map task processing time estimation is satisfactory
•  The Resource Scheduler did not manage to outperform the FIFO scheduler in the resource-homogeneous environment, or in most cases of the resource-heterogeneous environment, due to extra concurrent reduce tasks
•  However, we verified that the Resource Scheduler is indeed resource-aware – it performs better when moved from a resource-homogeneous environment to a resource-heterogeneous environment:
  – Smaller percentage slowdown compared to FIFO in all cases and all measured dimensions
  – Observed speedup compared to FIFO in the worst cases, due to I/O-biased scheduling during the reduce stage

  34. Recommendations for Future Work
•  Evaluation
  – Heavier workload & busy cluster
    •  Observe overhead
    •  Benchmark performance
•  Scheduling policy
  – Map Task
    •  Highest Response Ratio Next (HRRN):
       priority = (t_estimated + t_waiting) / t_estimated = 1 + t_waiting / t_estimated
  – Reduce Task
    •  CPU biasing for CPU-intensive jobs
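The HRRN priority is one line of code; the sketch below just illustrates how it trades off short jobs against long waiters (names are illustrative).

```python
def hrrn_priority(t_estimated: float, t_waiting: float) -> float:
    """priority = (t_estimated + t_waiting) / t_estimated
                = 1 + t_waiting / t_estimated
    New jobs all start at 1.0; a job's priority grows the longer it waits,
    and grows fastest for short jobs, giving SJF-like order without
    starvation of long-running jobs."""
    return 1.0 + t_waiting / t_estimated

# A long job that has waited twice its own estimated run time (priority 3.0)
# overtakes a freshly submitted short job (priority 1.0).
```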

