Use of a Levy Distribution for Modeling Best Case Execution Time Variation

Jonathan Beard, Roger Chamberlain
SBS
Stream Based
Supercomputing Lab
http://sbs.wustl.edu
Work also supported by:

Outline
• Motivation
• Stream Processing
• Optimization Goals
• Methodology
• Distributions
• Results

Streaming Computing
[Figure: Kernel 1, Kernel 2, Kernel 3, and Kernel 2 connected by streams]

Streaming Languages
StreamIt, Auto-Pipe, Brook, Cg, S-Net, Scala-Pipe, Streams-C, and many others

Optimization
[Figure: kernels labeled Slow, Fast Kernel, Super Fast, and Medium]

Optimization
[Figure: Kernel 1, Kernel 2, Kernel 3, and Kernel 2 mapped onto multi-core A (cores 1 to 4) and multi-core B (cores 1 to 4)]
More allocation choices: NUMA node A or B to allocate each stream.

Optimization
A → B → C
Each “Stream” is modeled as a queue:
A → Q1 → B → Q2 → C

Streaming on Multi-core Systems
We want good models for streaming systems on shared multi-core systems (e.g., a cluster).
Problem: Accurate measurement is very difficult. Is there a way to decide on a model without it?
• Commodity multi-core timer availability and latency
• Frequency scaling and core migration
• Measuring modifies the application behavior

Derived Information
[Figure: Expected vs. Observed]
Is there a pattern of minimal variation within the systems we’re running on?
Avg. Service Time = E[X] + Error

Goal
Find a distribution that characterizes the minimum expected variation of a hardware and software system.
Use this characterization to accept or reject models.

Process
• Measurement
• Workload definition
• Find a distribution
• Utilize the distribution to aid model selection

Timer Mechanism
[Figure: the code asks the timer thread for the time ("Ask for Time") and receives it ("Receive Time")]

Timer Mechanism
Timer thread options:
rdtsc
• x86 assembly
• varying methods to serialize
• relatively fast
• multiple drift issues
clock_gettime
• POSIX standard
• relatively accurate
• portable
• slower than rdtsc
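
To get a feel for what "timer latency" means here, the sketch below measures the gap between back-to-back clock reads. It is an illustration only, not the timer code from the talk: Python cannot issue rdtsc directly, so it uses time.clock_gettime_ns (a thin wrapper over POSIX clock_gettime) where available and falls back to time.perf_counter_ns otherwise. The names read_clock_ns and timer_gaps_ns and the sample count are assumptions made for this example.

```python
import time


def read_clock_ns():
    """Read a monotonic clock in nanoseconds."""
    if hasattr(time, "clock_gettime_ns") and hasattr(time, "CLOCK_MONOTONIC"):
        return time.clock_gettime_ns(time.CLOCK_MONOTONIC)
    return time.perf_counter_ns()  # portable fallback


def timer_gaps_ns(samples=10000):
    """Return the gaps between consecutive back-to-back clock reads."""
    gaps = []
    previous = read_clock_ns()
    for _ in range(samples):
        current = read_clock_ns()
        gaps.append(current - previous)
        previous = current
    return gaps


gaps = timer_gaps_ns()
print("smallest gap (ns):", min(gaps))
print("largest gap (ns):", max(gaps))
```

The smallest gap approximates the cost of one clock read (plus interpreter overhead in this Python version); the largest gap hints at outside interference such as preemption.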

Two Timing Choices

NUMA Node Variations

Minimize Variation
• Restrict the timer to a single core
• Use the x86 rdtsc instruction with the processor-recommended serializers for each processor type
• Keep processes under test on the same NUMA node as the timer
• Run the timer thread with altered priority to minimize core context swaps
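
The first item, restricting execution to a single core, can be sketched as follows. This is a minimal, Linux-oriented illustration and not the authors' harness: it relies on os.sched_setaffinity, which Python only exposes on some platforms, and the helper name pin_to_single_core and the choice of the lowest allowed core are assumptions. NUMA placement and priority changes are not shown.

```python
import os


def pin_to_single_core():
    """Pin the calling process to one allowed core.

    Returns (original, pinned) affinity sets, or None when the platform
    does not support pinning or refuses the request.
    """
    if not hasattr(os, "sched_setaffinity"):
        return None                         # e.g. macOS or Windows
    try:
        original = os.sched_getaffinity(0)  # pid 0 means the calling process
        core = min(original)                # assumption: lowest allowed core
        os.sched_setaffinity(0, {core})
        return original, os.sched_getaffinity(0)
    except OSError:
        return None


result = pin_to_single_core()
if result is None:
    print("core pinning not available here")
else:
    original, pinned = result
    print("pinned to core(s):", sorted(pinned))
    os.sched_setaffinity(0, original)       # restore the original affinity
```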

Best Case Execution Time Variation
• The no-op instruction is implemented in most processors
• It usually takes exactly 1 cycle
• No real functional units are involved, so it is the least taxing instruction
• Variation observed in execution time should therefore be external to the process

Data Collection
• No-op loops are calibrated for various nominal times, tied to a single core, and run thousands of times
• Execution time is measured end to end for each run, and the environment is recorded
• Parameters include:
  • Number of processes executing on the core
  • Number of context swaps (voluntary, involuntary)
  • Many others
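
A rough sketch of this collection loop is shown below. It is not the test harness referenced in the talk: a Python empty loop stands in for a calibrated hardware no-op loop, the iteration and run counts are arbitrary, and the function names run_once and collect are assumptions. Context swap counts come from resource.getrusage, which is available on Unix-like systems only.

```python
import resource
import time


def run_once(iterations):
    """Time one empty loop end to end and count context swaps around it."""
    before = resource.getrusage(resource.RUSAGE_SELF)
    start = time.perf_counter_ns()
    for _ in range(iterations):
        pass                                  # stand-in for a hardware no-op
    elapsed = time.perf_counter_ns() - start
    after = resource.getrusage(resource.RUSAGE_SELF)
    return {
        "elapsed_ns": elapsed,
        "voluntary": after.ru_nvcsw - before.ru_nvcsw,
        "involuntary": after.ru_nivcsw - before.ru_nivcsw,
    }


def collect(iterations=10000, runs=1000):
    """Repeat the measurement and return one record per run."""
    return [run_once(iterations) for _ in range(runs)]


records = collect(runs=200)
mean_ns = sum(r["elapsed_ns"] for r in records) / len(records)
errors = [r["elapsed_ns"] - mean_ns for r in records]   # obs - mean
print("mean (ns):", mean_ns)
print("largest positive error (ns):", max(errors))
```

The errors list holds the observed-minus-mean values, which is the quantity whose distribution the rest of the talk examines.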

Levy Distribution
[Figures: execution time error (obs - mean), shown in turn with a Normal distribution, a Gumbel distribution, and a Levy distribution]
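
For reference, the standard Levy density and distribution function can be written in a few lines. This is the textbook form with location mu and scale c; the parameter names and default values are choices made for this sketch, and nothing here reproduces the fitted parameters from the talk.

```python
import math


def levy_pdf(x, mu=0.0, c=1.0):
    """Density of a Levy distribution with location mu and scale c."""
    if x <= mu:
        return 0.0
    t = x - mu
    return math.sqrt(c / (2.0 * math.pi)) * math.exp(-c / (2.0 * t)) / t ** 1.5


def levy_cdf(x, mu=0.0, c=1.0):
    """Cumulative distribution function: erfc(sqrt(c / (2 (x - mu))))."""
    if x <= mu:
        return 0.0
    return math.erfc(math.sqrt(c / (2.0 * (x - mu))))


# The density peaks at mu + c/3 and decays slowly to the right (heavy tail).
for x in (0.1, 1.0 / 3.0, 1.0, 10.0, 100.0):
    print(x, levy_pdf(x), levy_cdf(x))
```

The one-sided support and heavy right tail are what distinguish it from the symmetric Normal: execution time error can be large and positive but is bounded below.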

Levy Distribution
• Truncation enables mean calculation, but requires fitting to each dataset to find where to truncate
• The truncation parameters are correlated with both the number of processes per core and the expected execution time
• A roughly linear relationship gives an approximate solution for the truncation parameters without refitting
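
The untruncated Levy distribution has no finite mean, which is why truncation matters. The sketch below computes the mean of a Levy distribution cut off at an upper bound by simple numerical integration. The cut-off values, the step count, and the function name truncated_levy_mean are assumptions for illustration; the talk's actual truncation parameters are not reproduced here.

```python
import math


def levy_pdf(x, mu, c):
    """Density of a Levy distribution with location mu and scale c."""
    if x <= mu:
        return 0.0
    t = x - mu
    return math.sqrt(c / (2.0 * math.pi)) * math.exp(-c / (2.0 * t)) / t ** 1.5


def truncated_levy_mean(mu, c, upper, steps=20000):
    """Mean of a Levy(mu, c) distribution truncated to (mu, upper]."""
    width = (upper - mu) / steps
    mass = 0.0      # integral of f(x) over (mu, upper]
    moment = 0.0    # integral of x * f(x) over (mu, upper]
    for i in range(steps):
        x = mu + (i + 0.5) * width          # midpoint rule
        fx = levy_pdf(x, mu, c)
        mass += fx * width
        moment += x * fx * width
    return moment / mass                    # renormalize by the retained mass


# The mean keeps growing as the cut-off moves right, so the choice matters.
for upper in (5.0, 10.0, 20.0):
    print(upper, truncated_levy_mean(0.0, 1.0, upper))
```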

Levy Fit
[Figure: Levy fits in four panels: 1 - 5 processes, 6 - 10 processes, 11 - 15 processes, 16 - 20 processes]

Test Setup
A → Q1 → B
Question: Can we use an M/M/1 queueing model to estimate the mean queue occupancy of this system?
Hypothesis: Lower Kullback-Leibler (KL) divergence between the expected and realized distributions is associated with higher model accuracy.
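
As a reminder of what the M/M/1 model predicts, the sketch below evaluates the standard closed forms, assuming Poisson arrivals and exponentially distributed service times. Whether the talk's "queue occupancy" counts the item in service is not stated, so both variants are given; the function names are assumptions for this example.

```python
def mm1_mean_in_system(arrival_rate, service_rate):
    """Mean number of items in an M/M/1 system (waiting plus in service)."""
    if arrival_rate <= 0 or service_rate <= 0:
        raise ValueError("rates must be positive")
    rho = arrival_rate / service_rate       # server utilization
    if rho >= 1.0:
        raise ValueError("unstable: arrival_rate must be below service_rate")
    return rho / (1.0 - rho)


def mm1_mean_waiting(arrival_rate, service_rate):
    """Mean number of items waiting, excluding the one in service."""
    if arrival_rate <= 0 or service_rate <= 0:
        raise ValueError("rates must be positive")
    rho = arrival_rate / service_rate
    if rho >= 1.0:
        raise ValueError("unstable: arrival_rate must be below service_rate")
    return rho * rho / (1.0 - rho)


# Kernel A produces 1 item per time unit; kernel B can consume 2 per time unit.
print(mm1_mean_in_system(1.0, 2.0))
print(mm1_mean_waiting(1.0, 2.0))
```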

Test Setup
A → Q1 → B
1. Dedicated thread of execution monitors queue occupancy
2. Calculate the estimated mean queue occupancy using the M/M/1 model
3. Calculate KL divergence for the arrival process distribution using the truncated Levy distribution noise model
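
Step 3 relies on the KL divergence between two distributions. A minimal discrete version, assuming both distributions have already been binned into histograms over the same bins and normalized, might look like this. The function name and the handling of empty bins are choices made for this sketch, not details taken from the talk.

```python
import math


def kl_divergence(p, q):
    """Discrete KL divergence D(P || Q) in nats.

    p and q are sequences of probabilities over the same bins.
    """
    if len(p) != len(q):
        raise ValueError("p and q must have the same number of bins")
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0.0:
            continue                # 0 * log(0 / q) is taken as 0
        if qi == 0.0:
            return math.inf         # P puts mass where Q has none
        total += pi * math.log(pi / qi)
    return total


realized = [0.5, 0.5]               # made-up histograms for illustration
expected = [0.25, 0.75]
print(kl_divergence(realized, expected))
print(kl_divergence(realized, realized))
```

The divergence is zero only when the two histograms match, and it is not symmetric, so the order of the arguments matters.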

Convolution with Exponential
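
One way to read this slide: if the ideal process is exponential and the system adds independent Levy-distributed noise, the distribution actually observed is the convolution of the two. The sketch below does this numerically on a coarse grid. The rate, scale, grid width, and bin count are made-up values, the finite grid acts as the truncation, and the helper names are assumptions for this example.

```python
import math


def levy_pdf(x, c):
    """Levy density with location 0 and scale c."""
    if x <= 0.0:
        return 0.0
    return math.sqrt(c / (2.0 * math.pi)) * math.exp(-c / (2.0 * x)) / x ** 1.5


def discretize(pdf, width, bins):
    """Sample a density at bin midpoints and normalize to probabilities."""
    weights = [pdf((i + 0.5) * width) for i in range(bins)]
    total = sum(weights)
    return [w / total for w in weights]


def convolve(a, b):
    """Discrete convolution: distribution of the sum of two independent variables."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


rate, c = 1.0, 0.05         # hypothetical exponential rate and Levy scale
width, bins = 0.05, 200     # grid covers [0, 10), truncating both densities

exponential = discretize(lambda x: rate * math.exp(-rate * x), width, bins)
noise = discretize(lambda x: levy_pdf(x, c), width, bins)
observed = convolve(exponential, noise)
print("bins:", len(observed), "total probability:", sum(observed))
```

The resulting histogram can then be compared against a measured one, for example with a KL divergence.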

Conclusions
• The truncated Levy distribution can be used to approximate best case execution time variation (BCETV)
• The distribution of BCETV can be used as a tool to accept or reject a stochastic queueing model based on distributional assumptions
• KL divergence between the expected and convolved distributions correlates highly with queue model accuracy

Parting Notes
Slides available here: sbs.wustl.edu
Timer C++ template code: http://goo.gl/ItJ3jP
Test harness used to collect data: http://goo.gl/U1VG6N
