1. Improving Throughput of Simultaneous Multithreading (SMT) Processors using Application Signatures and Thread Priorities
Mitesh R. Meswani
University of Texas at El Paso (UTEP)
11/20/2008
2. Simultaneous Multithreading (SMT) Utilization
[Figure: two panels, "Single-Threaded Execution" and "SMT Execution" (SMT with two hardware threads), each showing execution units (FP, FX, LSU) against processor cycles 1 to 6. Legend: Thread-X executing, Thread-Y executing, no thread executing. Annotations: "Thread X uses unused resource" and "Thread X waits until resource is free, due to sharing".]
• SMT hardware contexts share most of the processor resources
• Potential of 2x throughput with perfect resource sharing
• Throughput gains limited by contention for shared resources
3. Research Question and Hypothesis
• SMT-performance tunables:
– Enable or disable SMT mode
– Prioritize one hardware thread over the other
• Research Question: What are the optimal priority settings for best processor throughput?
• Hypothesis: Use hints from resource usage in single-threaded mode
4. Dissertation Contributions
1. Showed that prioritization of threads improves throughput for nearly half the applications studied
2. Defined and captured application “signatures”, which characterize usage of critical resources
3. Showed that only a small set of signatures is present in real-world applications
4. Developed a prediction methodology using signature microbenchmarks and showed that our predictions improve throughput over no prioritization (default)
5. Experimental Platform: Thread Priorities in IBM POWER5
• Six out of eight priorities are available to the operating system for normal mode of operation: 1, 2, 3, 4 (default), 5, and 6
• The difference in hardware thread priorities controls decode cycle sharing
– The higher-priority thread gets more decode cycles
– Equal priorities (default) give one out of two decode cycles to each thread
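The decode-cycle split can be illustrated with a short sketch. The slide states only that equal priorities give each thread one of every two decode cycles and that the higher-priority thread gets more. The specific rule used below, a window of R = 2^(|X - Y| + 1) cycles in which the lower-priority thread receives one cycle, is an assumption taken from published descriptions of POWER5 and is not given on the slide. The function name is hypothetical, and any special handling the hardware applies at the lowest priorities is ignored.

```python
def decode_cycle_shares(prio_x, prio_y):
    """Return (share_x, share_y), the fraction of decode cycles per thread.

    ASSUMPTION: window R = 2 ** (|prio_x - prio_y| + 1); the lower-priority
    thread gets 1 of R decode cycles and the other thread gets the rest.
    """
    for prio in (prio_x, prio_y):
        # The slide lists priorities 1..6 as available in normal operation.
        if not 1 <= prio <= 6:
            raise ValueError("priority must be between 1 and 6")

    if prio_x == prio_y:
        # Equal priorities: one out of two decode cycles each (from the slide).
        return (0.5, 0.5)

    window = 2 ** (abs(prio_x - prio_y) + 1)
    low_share = 1.0 / window
    high_share = 1.0 - low_share
    if prio_x > prio_y:
        return (high_share, low_share)
    return (low_share, high_share)


if __name__ == "__main__":
    for pair in [(4, 4), (5, 4), (6, 4), (3, 4)]:
        print(pair, decode_cycle_shares(*pair))
```

Under this assumed rule, each additional step of priority difference halves the lower-priority thread's share, which is why small priority changes can shift throughput noticeably.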
6. Signatures
1. Identify significant resources: floating-point unit (FPU), fixed-point unit (FXU), L2 unified cache, and L2 unified TLB
2. Capture utilization of resources using performance counters
3. Define utilization levels of resources in single-threaded mode, forming a signature
– Ten utilization levels, L1 to L10, per resource
– Example: L1L2L3L9, L9L6L7L8, L2L3L10L6…
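A minimal sketch of how a signature string such as L1L2L3L9 might be formed from four utilization values. The slide does not say how the ten levels are delimited, so equal-width 10% buckets are an assumption here, as is the resource order (FPU, FXU, L2 cache, TLB, following the order in which the slide lists them). The function and constant names are hypothetical.

```python
import math

# ASSUMPTION: signature order follows the order the resources are listed in.
RESOURCES = ("FPU", "FXU", "L2", "TLB")


def utilization_level(utilization, levels=10):
    """Map a utilization fraction in [0, 1] to a level number 1..levels.

    ASSUMPTION: equal-width buckets, so (0.0, 0.1] -> 1, ..., (0.9, 1.0] -> 10.
    A utilization of exactly 0 is placed in level 1.
    """
    if not 0.0 <= utilization <= 1.0:
        raise ValueError("utilization must be a fraction between 0 and 1")
    return min(levels, max(1, math.ceil(utilization * levels)))


def signature(utilization_by_resource):
    """Concatenate the per-resource levels into a string like 'L1L2L3L9'."""
    return "".join(
        "L%d" % utilization_level(utilization_by_resource[name])
        for name in RESOURCES
    )


if __name__ == "__main__":
    sample = {"FPU": 0.05, "FXU": 0.15, "L2": 0.25, "TLB": 0.85}
    print(signature(sample))
```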
7. Work Flow
Step 1: Find Signatures of Real Applications
– Inputs: serial application and performance counter settings
– Run the application in single-threaded mode and periodically sample the counters
– Store the resulting signatures in the signature database
Step 2: Create Signature Microbenchmarks for Frequently Appearing Signatures and Empirically Find Priority Predictions
– Run signature-microbenchmark pair X, Y in SMT mode with priorities i, j
– Store the CPI for all priorities for pair X, Y
– Identify the best-case priority for pair X, Y and store the predictions in the prediction database
Step 3: Execute Application Pairs using Predicted Priorities
– Read the signatures of application pair A, B from the signature database
– Found dominating signatures?
– If yes: read the priority of A and the priority of B from the prediction database, then run pair A, B with predicted priorities in SMT mode
– If no: run pair A, B with equal priorities in SMT mode
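Steps 2 and 3 of the work flow can be sketched as two small lookups. The slides say that CPI is stored for every priority setting and that a best case is identified, but they do not name the metric used to rank settings; the combined IPC (sum of 1/CPI for the two threads) used below is an assumption. The dictionary layouts and function names are also hypothetical stand-ins for the prediction database.

```python
DEFAULT_PRIORITIES = (4, 4)  # equal priorities; 4 is the default on the slides


def best_priorities(cpi_by_priority):
    """Step 2: pick the (i, j) priority pair with the highest throughput.

    cpi_by_priority maps (i, j) -> (cpi_x, cpi_y) measured in SMT mode.
    ASSUMPTION: throughput is ranked by combined IPC, 1/cpi_x + 1/cpi_y.
    """
    def combined_ipc(pair):
        cpi_x, cpi_y = cpi_by_priority[pair]
        return 1.0 / cpi_x + 1.0 / cpi_y

    return max(cpi_by_priority, key=combined_ipc)


def choose_priorities(sig_a, sig_b, prediction_db):
    """Step 3: use predicted priorities only if both signatures dominate.

    sig_a and sig_b are dominating signatures, or None if none was found.
    prediction_db maps (signature_x, signature_y) -> (priority_x, priority_y).
    """
    if sig_a is None or sig_b is None:
        return DEFAULT_PRIORITIES
    # Fall back to equal priorities when the pair was never measured.
    return prediction_db.get((sig_a, sig_b), DEFAULT_PRIORITIES)


if __name__ == "__main__":
    # Illustrative numbers only, not measurements from the study.
    cpi = {(4, 4): (2.0, 2.0), (6, 4): (1.0, 4.0), (4, 6): (4.0, 1.6)}
    db = {("L1L2L1L1", "L1L1L9L5"): best_priorities(cpi)}
    print(choose_priorities("L1L2L1L1", "L1L1L9L5", db))
    print(choose_priorities("L1L2L1L1", None, db))
```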
8. Details of Step 1
• Four groups of counters were measured
• Each group was measured in a separate run
• Counters were sampled at one-second intervals
• The signature of an interval is composed from the utilization measured for that interval in each of the 4 runs
[Figure: samples 0 to 21 (Sample#) for Run 1 through Run 4, with Interval 0 marked.]
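The composition of per-interval signatures from four separate runs can be sketched as follows. This assumes each run yields one utilization series for one resource and that sample i of every run covers the same one-second interval; the slides do not say how runs of different lengths are aligned, so truncating to the shortest run is an assumption, as are the equal-width level buckets and the function names.

```python
import math


def level(utilization, levels=10):
    """ASSUMPTION: equal-width buckets mapping [0, 1] onto levels 1..10."""
    return min(levels, max(1, math.ceil(utilization * levels)))


def interval_signatures(runs):
    """Build one signature per interval from several single-resource runs.

    runs is a list of utilization series, one per run, where series[i] is
    the utilization sampled in interval i. Intervals beyond the shortest
    run are dropped (ASSUMPTION).
    """
    n_intervals = min(len(series) for series in runs)
    return [
        "".join("L%d" % level(series[i]) for series in runs)
        for i in range(n_intervals)
    ]


if __name__ == "__main__":
    # Illustrative utilization values only, not measured data.
    run1 = [0.05, 0.05, 0.05]   # e.g. resource 1
    run2 = [0.15, 0.05, 0.15]   # e.g. resource 2
    run3 = [0.05, 0.85]         # e.g. resource 3 (one sample short)
    run4 = [0.05, 0.45, 0.05]   # e.g. resource 4
    print(interval_signatures([run1, run2, run3, run4]))
```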
9. Different Signatures are Present in Real Applications (SPEC CPU2006, NAS NPB SER, PETSc KSP/Matrix)
[Figure: "Signature Histogram of Four SPEC CPU2006 and Two PETSc KSP Library Functions". Bars show % of total cycles (0% to 100%) per signature for the applications 429.mcf, 416.gamess, 444.namd, 462.libquantum, cgs, and gmres. Legend signatures: L1L1L1L1, L3L1L1L1, L3L2L1L1, L2L1L1L1, L2L3L1L1, L2L2L1L1, L1L4L1L1, L1L1L9L5, L1L2L7L4, L1L1L7L4, L1L1L6L4, L1L2L6L3, L1L2L5L2, L1L3L1L1, L1L2L2L1, L1L2L3L1, L1L2L6L4, L1L2L5L4, L1L2L5L3, L1L2L4L3, L1L2L4L2, L1L2L3L2, L1L1L2L1, L1L2L1L1. Annotation: "One Signature > 80% (dominant)".]
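A sketch of how a dominant signature might be detected from a list of per-interval signatures. The 80% threshold comes from the figure annotation. The figure reports shares of total cycles, whereas this sketch counts intervals, which is an assumption that treats the equal-length sampling intervals as a proxy for cycles. The function names are hypothetical.

```python
from collections import Counter


def signature_shares(interval_sigs):
    """Return {signature: fraction of intervals in which it appears}."""
    total = len(interval_sigs)
    return {sig: n / total for sig, n in Counter(interval_sigs).items()}


def dominant_signature(interval_sigs, threshold=0.80):
    """Return the signature covering more than `threshold` of intervals.

    Returns None when no single signature exceeds the threshold.
    ASSUMPTION: interval counts stand in for the cycle counts in the figure.
    """
    if not interval_sigs:
        return None
    sig, count = Counter(interval_sigs).most_common(1)[0]
    return sig if count / len(interval_sigs) > threshold else None


if __name__ == "__main__":
    # Illustrative sequences only, not data from the study.
    steady = ["L1L2L1L1"] * 9 + ["L1L1L2L1"]
    mixed = ["L1L1L9L5"] * 5 + ["L1L2L7L4"] * 5
    print(dominant_signature(steady))
    print(dominant_signature(mixed))
```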
10. Conclusions
1. Showed that equal priorities (default) are not the best for nearly 47% of applications studied
2. Only 16 signatures are sufficient to represent 95.5% of execution time of 20 SPEC CPU2006 benchmarks, 9 NAS NPB3.2 Serial benchmarks, 119 PETSc KSP, and 180 PETSc Matrix libraries
3. Priority predictions using signature benchmarks improve throughput over default settings for 87% of the 15 PETSc KSP coschedules
11. Applications with Multiple Signatures
[Figure: plots annotated "Distinct Transitions" (twice), "Long Phases", and "Repeating Small Phases".]
12. Future Work and References
Future Work:
• Identify applications with multiple signatures
• Dynamic adaptation of priorities
• Detecting signatures on the fly
• Phase detection and prediction for a truly adaptive system
References:
• M. R. Meswani, P. J. Teller, and S. Arunangiri, “A Study of the Influence of the POWER5 Dynamic Resource Balancing Hardware on Optimal Hardware Thread Priorities,” to appear in the Proceedings of the 2008 Live Virtual Constructive Conference, Jan 2009, El Paso, TX.
• M. R. Meswani and P. J. Teller, “Evaluating the Performance Impact of Hardware Thread Priorities in Simultaneous Multithreaded Processors using SPEC CPU2000,” Proceedings of the 2nd International Workshop on Operating Systems Interference in High Performance Applications, in conjunction with the 15th International Conference on Parallel Architectures and Compilation Techniques (PACT06), sponsored by ACM and IEEE, September 2006, Seattle, WA.
13. Acknowledgements
• This work is supported by AHPCRC Grant W11NF-07-2-2007
• Dr. Patricia J. Teller, Professor, UTEP (Advisor)
• Amir Simon, IBM, for assistance with p550 machine
• Email: mitesh.meswani@gmail.com
• URL: www.linkedin.com/in/miteshmeswani