MATEX @ DAC14

MATEX: A Distributed Framework of Transient Simulation for Power Distribution Networks
Hao Zhuang1*, Shih-Hung Weng2, Jeng-Hau Lin1, Chung-Kuan Cheng1
1. Computer Science & Engineering Dept., University of California, San Diego, CA
2. Facebook Inc., Menlo Park, CA
* Email: zhuangh@ucsd.edu

Outline

Problem Formulation

MATEX Framework

Circuit Solver

Matrix Exponential Kernel

Krylov Subspace Accelerations for PDNs

Distributed Framework

Linear System’s Superposition Property and Parallel Processing

Reduce Krylov Subspace Computations

Experimental Results

Conclusions

Problem Formulation for PDN Transient Simulation
Linear differential equations
$\mathbf{C}\dot{\mathbf{x}}(t) = -\mathbf{G}\mathbf{x}(t) + \mathbf{B}\mathbf{u}(t)$
Tens of millions or billions of unknowns
$\mathbf{C}$: capacitance/inductance matrix
$\mathbf{G}$: conductance matrix
$\mathbf{x}(t)$: voltage/current vector
$\mathbf{B}$: input selection matrix
$\mathbf{u}(t)$: input current sources (vector)
Figure: PDN structure and its RLC model.

Previous Work

Time step size $h$ is determined by

Input transition distances, which define the upper bound of the time step, e.g. $h_2 = \min(h_1, h_2, h_3)$

Stiffness of systems

Local truncation error (LTE)

Figure: A pulse input example, with $h_1$, $h_2$, $h_3$ marked.

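Low order approximations, e.g. Trapezoidal method (TR)
$\left(\frac{\mathbf{C}}{h} + \frac{\mathbf{G}}{2}\right)\mathbf{x}(t+h) = \left(\frac{\mathbf{C}}{h} - \frac{\mathbf{G}}{2}\right)\mathbf{x}(t) + \mathbf{B}\,\frac{\mathbf{u}(t+h) + \mathbf{u}(t)}{2}$
TR with fixed time step $h$ was used by the top solvers in the TAU’12 power grid (PG) simulation contest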
Efficient for IBM PG Benchmarks
Only one matrix factorization for transient stepping
Process forward and backward substitutions to calculate 퐱퐱푡푡+ℎ
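A minimal sketch of the fixed-step TR loop described above, assuming NumPy/SciPy. The function name `tr_fixed_step`, the callable `u(t)`, and the 2-node example values are hypothetical and only illustrate that the left-hand matrix is factorized once and every step costs one forward/backward substitution.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import splu


def tr_fixed_step(C, G, B, u, h, n_steps, x0):
    """Trapezoidal stepping with a fixed step h and one LU factorization."""
    lu = splu(sp.csc_matrix(C / h + G / 2.0))   # factorize (C/h + G/2) once
    rhs = sp.csr_matrix(C / h - G / 2.0)
    x = np.asarray(x0, dtype=float)
    t = 0.0
    xs = [x]
    for _ in range(n_steps):
        r = rhs @ x + B @ ((u(t + h) + u(t)) / 2.0)
        x = lu.solve(r)                          # forward/backward substitution only
        t += h
        xs.append(x)
    return np.array(xs)


# Hypothetical 2-node RC example with a constant current source on node 0.
C = np.eye(2)
G = np.array([[2.0, -1.0], [-1.0, 2.0]])
B = np.eye(2)
u = lambda t: np.array([1.0, 0.0])
xs = tr_fixed_step(C, G, B, u, h=0.01, n_steps=2000, x0=np.zeros(2))
print(xs[-1])   # settles toward the DC solution G^{-1} B u
```

Because $h$ sits inside the factorized matrix, changing the step size in this scheme would require a new factorization.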

Our Matrix Exponential Method

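Analytical solution [Weng et al., IEEE TCAD 2012]
$\mathbf{x}(t+h) = e^{h\mathbf{A}}\mathbf{x}(t) + \int_0^h e^{(h-\tau)\mathbf{A}}\,\mathbf{b}(t+\tau)\,d\tau$
where $\mathbf{A} = -\mathbf{C}^{-1}\mathbf{G}$, $\mathbf{b}(t) = \mathbf{C}^{-1}\mathbf{B}\mathbf{u}(t)$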

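Input sources are piecewise linear (PWL)
$\mathbf{x}(t+h) = e^{h\mathbf{A}}\left(\mathbf{x}(t) + \mathbf{F}(t,h)\right) - \mathbf{P}(t,h)$
where
$\mathbf{F}(t,h) = \mathbf{A}^{-1}\mathbf{b}(t) + \mathbf{A}^{-2}\,\frac{\mathbf{b}(t+h) - \mathbf{b}(t)}{h}$,
$\mathbf{P}(t,h) = \mathbf{A}^{-1}\mathbf{b}(t+h) + \mathbf{A}^{-2}\,\frac{\mathbf{b}(t+h) - \mathbf{b}(t)}{h}$
Here $e^{h\mathbf{A}}$ is a matrix exponential, while $\mathbf{x}(t) + \mathbf{F}(t,h)$ and $\mathbf{P}(t,h)$ are vectors.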
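The PWL formula can be checked on a tiny dense system. The sketch below assumes SciPy's dense `expm` and `solve`, which is only practical for small matrices; the name `pwl_step` and the scalar test problem are hypothetical.

```python
import numpy as np
from scipy.linalg import expm, solve


def pwl_step(A, x, b0, b1, h):
    """One step x(t+h) = e^{hA}(x + F) - P for an input linear over [t, t+h].

    b0 = b(t), b1 = b(t+h). Dense solves stand in for A^{-1} here.
    """
    slope = (b1 - b0) / h
    a2s = solve(A, solve(A, slope))        # A^{-2} (b(t+h) - b(t)) / h
    F = solve(A, b0) + a2s
    P = solve(A, b1) + a2s
    return expm(h * A) @ (x + F) - P


# Scalar check: x' = -2x + t, x(0) = 1 has x(t) = t/2 - 1/4 + (5/4) e^{-2t}.
A = np.array([[-2.0]])
x1 = pwl_step(A, np.array([1.0]), np.array([0.0]), np.array([1.0]), h=1.0)
exact = 0.5 - 0.25 + 1.25 * np.exp(-2.0)
print(x1[0], exact)
```

For an input that really is linear over the step, the formula is exact regardless of the step size, which is why the step is limited by input transitions rather than by truncation error.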

Advantage in Accuracy
With the same $h$, the matrix exponential method reaches the reference solution, while Backward Euler cannot.
Figure legend: Reference solution.

Not $e^{\mathbf{A}}$, but $e^{\mathbf{A}}\mathbf{v}$ [Weng et al., IEEE TCAD 2012]

Computing $e^{\mathbf{A}}$ is very expensive when $\mathbf{A}$ is large!

$e^{\mathbf{A}}\mathbf{v}$: Matrix Exponential and Vector Product (MEVP)

Efficiently approximated via Krylov subspace (MEXP)

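Standard Krylov subspace $K_m(\mathbf{A}, \mathbf{v}) = \mathrm{span}\{\mathbf{v}, \mathbf{A}\mathbf{v}, \mathbf{A}^2\mathbf{v}, \ldots, \mathbf{A}^{m-1}\mathbf{v}\}$

Basis generation: $\mathbf{V}_m = [\mathbf{v}_1, \mathbf{v}_2, \cdots, \mathbf{v}_m]$

Arnoldi process and matrix reduction:
$\mathbf{A}\mathbf{V}_m = \mathbf{V}_m\mathbf{H}_m + h_{m+1,m}\,\mathbf{v}_{m+1}\mathbf{e}_m^{T}$

MEVP is computed by
$e^{\mathbf{A}}\mathbf{v} \approx \|\mathbf{v}\|_2\,\mathbf{V}_m e^{\mathbf{H}_m}\mathbf{e}_1$

Time stepping only by scaling $h$,
$e^{h\mathbf{A}}\mathbf{v} \approx \|\mathbf{v}\|_2\,\mathbf{V}_m e^{h\mathbf{H}_m}\mathbf{e}_1$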
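The Arnoldi-based MEVP above can be sketched in a few lines. This is an illustrative version assuming NumPy/SciPy, not the authors' implementation; `arnoldi`, `mevp`, the breakdown tolerance, and the tridiagonal test matrix are hypothetical choices.

```python
import numpy as np
from scipy.linalg import expm


def arnoldi(matvec, v, m):
    """Build an orthonormal basis V_m and Hessenberg H_m of K_m(A, v)."""
    n = v.size
    V = np.zeros((n, m + 1))
    H = np.zeros((m + 1, m))
    beta = np.linalg.norm(v)
    V[:, 0] = v / beta
    for j in range(m):
        w = matvec(V[:, j])
        for i in range(j + 1):              # modified Gram-Schmidt
            H[i, j] = V[:, i] @ w
            w = w - H[i, j] * V[:, i]
        H[j + 1, j] = np.linalg.norm(w)
        if H[j + 1, j] < 1e-12:             # breakdown: subspace is invariant
            return V[:, :j + 1], H[:j + 1, :j + 1], beta
        V[:, j + 1] = w / H[j + 1, j]
    return V[:, :m], H[:m, :m], beta


def mevp(V, H, beta, h):
    """e^{hA} v ~= ||v||_2 V_m e^{h H_m} e_1; only the small H_m is exponentiated."""
    e1 = np.zeros(H.shape[0])
    e1[0] = 1.0
    return beta * (V @ (expm(h * H) @ e1))


# Hypothetical, non-stiff test matrix (1-D Laplacian) and vector.
n = 20
A = -2.0 * np.eye(n) + np.eye(n, k=1) + np.eye(n, k=-1)
v = np.arange(1.0, n + 1)
V, H, beta = arnoldi(lambda x: A @ x, v, m=15)
y_half, y_one = mevp(V, H, beta, 0.5), mevp(V, H, beta, 1.0)   # same basis, two h
```

The basis is generated once and reused for both step sizes; only the small matrix exponential `expm(h * H)` is recomputed.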

Algorithm of Computing $\mathbf{x}(t+h)$
PDN is a linear system, so the input matrices $\mathbf{X}_2$, $\mathbf{L}$, $\mathbf{U}$ do not change.
$[\mathbf{L}, \mathbf{U}]$ = lu_decompose($\mathbf{X}_1$) is done only once for the whole simulation.
For MEXP: $\mathbf{X}_1 = \mathbf{C}$, $\mathbf{X}_2 = \mathbf{G}$.


Problem #1: Stiff PDN Circuits

PDNs are usually highly stiff circuits

Generalized eigenvalues spread over a wide range within the spectrum of $\mathbf{A}$ ($\mathbf{A} = -\mathbf{C}^{-1}\mathbf{G}$).

This requires the standard Krylov subspace to build a very large number of basis vectors to approximate MEVP.


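Standard Krylov subspace (MEXP)
(a) Standard Krylov Basis (MEXP):
$K_m(\mathbf{A}, \mathbf{v}) = \mathrm{span}\{\mathbf{v}, \mathbf{A}\mathbf{v}, \mathbf{A}^2\mathbf{v}, \ldots, \mathbf{A}^{m-1}\mathbf{v}\}$
Figure (a): eigenvalues of $\mathbf{A} = -\mathbf{C}^{-1}\mathbf{G}$ in the complex plane (axes Re, Im), marking the eigenvalues with small-magnitude real components and those with large-magnitude real components.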

Eigenvalues of $\mathbf{A}$ with large magnitude of real components:
• Fast modes of the circuit's dynamical behavior.
• The standard Krylov basis tends to capture these large-magnitude eigenvalues.

Eigenvalues of $\mathbf{A}$ with small magnitude of real components:
• These eigenvalues define the major dynamical behavior of circuits.
• They demand more basis vectors in order to be characterized.

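Inverted Krylov subspace (I-MATEX)
(b) Inverted Krylov Basis (I-MATEX):
$K_m(\mathbf{A}^{-1}, \mathbf{v}) = \mathrm{span}\{\mathbf{v}, \mathbf{A}^{-1}\mathbf{v}, \mathbf{A}^{-2}\mathbf{v}, \ldots, \mathbf{A}^{-m+1}\mathbf{v}\}$
Figure: eigenvalue spectra in the complex plane (axes Re, Im) for (a) the standard basis and (b) the inverted basis.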

The inverted Krylov subspace is more likely to capture these “important” eigenvalues.

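Rational Krylov subspace (R-MATEX)
(c) Rational Krylov Basis (R-MATEX):
$K_m((\mathbf{I} - \gamma\mathbf{A})^{-1}, \mathbf{v}) = \mathrm{span}\{\mathbf{v}, (\mathbf{I} - \gamma\mathbf{A})^{-1}\mathbf{v}, (\mathbf{I} - \gamma\mathbf{A})^{-2}\mathbf{v}, \ldots, (\mathbf{I} - \gamma\mathbf{A})^{-m+1}\mathbf{v}\}$
Figure: eigenvalue spectra in the complex plane (axes Re, Im) for (a) the standard basis and (c) the rational basis, with the eigenvalues of $\mathbf{A}$ that have large-magnitude real components marked.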
• The rational Krylov subspace is still likely to capture these “important” eigenvalues
• More robust numerical properties
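A hedged sketch of the rational (shift-and-invert) variant follows. It uses the identity $(\mathbf{I} - \gamma\mathbf{A})^{-1} = (\mathbf{C} + \gamma\mathbf{G})^{-1}\mathbf{C}$, so the only factorization is that of $\mathbf{C} + \gamma\mathbf{G}$, and it maps the small Hessenberg matrix back with $(\mathbf{I} - \tilde{\mathbf{H}}_m^{-1})/\gamma$. The function name, the value of `gamma`, and the small test circuit are assumptions for illustration, not values from the slides.

```python
import numpy as np
import scipy.sparse as sp
from scipy.linalg import expm
from scipy.sparse.linalg import splu


def rational_krylov_mevp(C, G, v, gamma, m, hs):
    """Approximate e^{hA} v with A = -C^{-1} G for every h in hs from one basis."""
    lu = splu(sp.csc_matrix(C + gamma * G))     # single LU of C + gamma*G
    n = v.size
    V = np.zeros((n, m + 1))
    Ht = np.zeros((m + 1, m))                    # H-tilde of (I - gamma*A)^{-1}
    beta = np.linalg.norm(v)
    V[:, 0] = v / beta
    k = m
    for j in range(m):
        w = lu.solve(C @ V[:, j])                # (I - gamma*A)^{-1} v_j
        for i in range(j + 1):                   # modified Gram-Schmidt
            Ht[i, j] = V[:, i] @ w
            w = w - Ht[i, j] * V[:, i]
        Ht[j + 1, j] = np.linalg.norm(w)
        if Ht[j + 1, j] < 1e-12:                 # breakdown: stop early
            k = j + 1
            break
        V[:, j + 1] = w / Ht[j + 1, j]
    Hm = (np.eye(k) - np.linalg.inv(Ht[:k, :k])) / gamma
    e1 = np.zeros(k)
    e1[0] = 1.0
    return [beta * (V[:, :k] @ (expm(h * Hm) @ e1)) for h in hs]


# Hypothetical small RC circuit with capacitances spread over two decades.
n = 8
C = np.diag(np.logspace(-2, 0, n))
G = 2.1 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
v = np.arange(1.0, n + 1)
hs = [0.05, 0.1]
ys = rational_krylov_mevp(C, G, v, gamma=0.1, m=n, hs=hs)
```

The example uses $m = n$ so that the result can be compared against a dense `expm`; in practice the point of the method is that a much smaller $m$ suffices, and how small depends on the circuit and on $\gamma$.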

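Error trend of R-MATEX
$\mathrm{error} = |e^{h\mathbf{A}}\mathbf{v} - \mathbf{V}_m e^{h\mathbf{H}_m}\mathbf{e}_1|$ vs. $m$ vs. $h$
Reference: directly compute $e^{h\mathbf{A}}$. Approximation: MEVP via R-MATEX.
Figure: error plotted against $m$ and $h$.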

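Same Algorithm with Different Input Matrices
Still only one $[\mathbf{L}, \mathbf{U}]$ = lu_decompose($\mathbf{X}_1$)

Method     X_1                                X_2            H_m
MEXP       $\mathbf{C}$                       $\mathbf{G}$   $\mathbf{H}_m$
I-MATEX    $\mathbf{G}$                       $\mathbf{C}$   $\mathbf{H}'^{-1}_m$
R-MATEX    $\mathbf{C} + \gamma\mathbf{G}$    $\mathbf{C}$   $(\mathbf{I} - \tilde{\mathbf{H}}_m^{-1})/\gamma$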

Testcases: RC Circuits with Different Stiffness
Speedups brought by Krylov subspace reduction
Stiffness: $|\mathrm{Re}\{\lambda_{\min}(\mathbf{A})\}| \,/\, |\mathrm{Re}\{\lambda_{\max}(\mathbf{A})\}|$
m_a: average dimension of Krylov subspace (V_m, H_m)
m_p: peak dimension of Krylov subspace (V_m, H_m)
Err(%): relative error compared to reference solution.

Method    m_a     m_p   Err(%)   Speedup/MEXP   Stiffness
MEXP      211.4   229   0.510    1X             2.1×10^16
I-MATEX   5.7     14    0.004    2616X          2.1×10^16
R-MATEX   6.9     12    0.004    2735X          2.1×10^16
MEXP      154.2   224   0.004    1X             2.1×10^12
I-MATEX   5.7     14    0.004    583X           2.1×10^12
R-MATEX   6.9     12    0.004    611X           2.1×10^12
MEXP      148.6   223   0.004    1X             2.1×10^8
I-MATEX   5.7     14    0.004    229X           2.1×10^8
R-MATEX   6.9     12    0.004    252X           2.1×10^8

Problem #2: Initial Vector Change
MEVP = $e^{\mathbf{A}}\mathbf{v}$
Once $\mathbf{v}$ changes, we need to recompute $K_m$ for MEVP.
$\mathbf{v}$ is the initial vector of $K_m((\mathbf{I} - \gamma\mathbf{A})^{-1}, \mathbf{v})$

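In the circuit solver,
$\mathbf{x}(t+h) = e^{h\mathbf{A}}\left(\mathbf{x}(t) + \mathbf{F}(t,h)\right) - \mathbf{P}(t,h)$
where
$\mathbf{F}(t,h) = \mathbf{A}^{-1}\mathbf{b}(t) + \mathbf{A}^{-2}\,\frac{\mathbf{b}(t+h) - \mathbf{b}(t)}{h}$
The initial vector of $K_m$ is $\mathbf{x}(t) + \mathbf{F}(t,h)$. It changes when the input sources cannot keep the previous trend.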

A pulse input example:
• the dashed lines are the places where the initial vector of $K_m$ changes
• each such place is a “transition spot”


Many input current sources in a PDN make the initial vector change frequently, which triggers Krylov subspace generations and consumes runtime (the trouble maker).


Input sources, the trouble maker
Figure: A PDN with three input current sources.

Some definitions

Local Transition Spot (LTS): for one input source, its transition spots.

Global Transition Spot (GTS): the union of all LTS.

Snapshot: for one input source, a spot in GTS but not in its LTS.
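The three definitions are plain set operations. A small sketch, with hypothetical source names and transition times, assuming the times are stored so that equal spots compare equal:

```python
def transition_sets(lts_by_source):
    """Return (GTS, snapshots) from a {source name: LTS times} mapping.

    GTS is the union of all LTS; a source's snapshots are the GTS spots
    that are not in its own LTS.
    """
    gts = sorted(set().union(*lts_by_source.values()))
    snapshots = {name: sorted(set(gts) - set(lts))
                 for name, lts in lts_by_source.items()}
    return gts, snapshots


# Hypothetical transition times (arbitrary units) for three current sources.
lts = {"i1": [0.0, 1.0, 2.0], "i2": [0.0, 3.0], "i3": [1.5]}
gts, snaps = transition_sets(lts)
print(gts)           # [0.0, 1.0, 1.5, 2.0, 3.0]
print(snaps["i1"])   # [1.5, 3.0]
```

Each source has fewer LTS than there are GTS, which is what the decomposition that follows exploits.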

When the circuit is simulated with all input sources as a whole, every GTS triggers a Krylov subspace generation.

How about simulating the circuit with each individual source, then summing the results later by superposition?
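Superposition can be checked numerically on a toy linear system: simulating with all sources at once should match the sum of one-source-at-a-time simulations from a zero initial state. The sketch below uses the exact PWL step with dense SciPy routines; the 3-node matrices and the pulse shapes are hypothetical.

```python
import numpy as np
from scipy.linalg import expm, solve


def simulate(A, Cinv_B, u, times):
    """Exact PWL stepping on the given time grid, starting from x(0) = 0."""
    x = np.zeros(A.shape[0])
    out = [x]
    for t0, t1 in zip(times[:-1], times[1:]):
        h = t1 - t0
        b0, b1 = Cinv_B @ u(t0), Cinv_B @ u(t1)
        a2s = solve(A, solve(A, (b1 - b0) / h))
        x = expm(h * A) @ (x + solve(A, b0) + a2s) - (solve(A, b1) + a2s)
        out.append(x)
    return np.array(out)


C = np.diag([1.0, 2.0, 0.5])
G = np.array([[3.0, -1.0, 0.0], [-1.0, 3.0, -1.0], [0.0, -1.0, 3.0]])
A = -solve(C, G)
Cinv_B = solve(C, np.eye(3))
grid = np.array([0.0, 1.0, 2.0, 3.0, 4.0])        # plays the role of the GTS
pulses = [([0, 1, 2, 4], [0, 1, 0, 0]),           # (breakpoints, values) per source
          ([0, 2, 3, 4], [0, 0, 1, 0]),
          ([0, 1, 3, 4], [0, 0, 2, 0])]


def source(k):
    """Input vector with only source k active."""
    return lambda t: np.array([np.interp(t, *pulses[j]) if j == k else 0.0
                               for j in range(3)])


whole = simulate(A, Cinv_B, lambda t: sum(source(k)(t) for k in range(3)), grid)
parts = sum(simulate(A, Cinv_B, source(k), grid) for k in range(3))
print(np.max(np.abs(whole - parts)))
```

The per-source runs are independent of each other, so they need no communication until the final sum.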

Reduce the Krylov subspace generation chances and reuse subspace

For one input source, the LTS set is much smaller than the GTS set.

Meanwhile, the solutions at snapshots still need to be recorded for the later superposition.

Compute snapshots without extra Krylov subspace generations.


Given a previous solution $\mathbf{x}(t)$


To compute the solutions at snapshots $\mathbf{x}(t+h_1)$ and $\mathbf{x}(t+h_2)$ without Krylov subspace generations


Generate $\mathbf{V}_m$ and $\mathbf{H}_m$ at $t$


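Use $\mathbf{V}_m$, $\mathbf{H}_m$ and scale $h$ to $h_1$ and $h_2$ for MEVP, until reaching the next LTS

No matrix factorizations during this adaptive stepping!
$\mathbf{x}(t+h_1) = \|\mathbf{v}\|\,\mathbf{V}_m e^{h_1\mathbf{H}_m}\mathbf{e}_1 - \mathbf{P}(t, h_1)$
$\mathbf{x}(t+h_2) = \|\mathbf{v}\|\,\mathbf{V}_m e^{h_2\mathbf{H}_m}\mathbf{e}_1 - \mathbf{P}(t, h_2)$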
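The snapshot computation can be sketched as follows: while the input slope stays constant, the initial vector $\mathbf{x}(t) + \mathbf{F}(t,h)$ does not depend on $h$, so one basis serves every snapshot. For brevity this sketch uses a standard Krylov basis on a dense $\mathbf{A}$ and dense solves for $\mathbf{A}^{-1}$ (a real solver would use the rational basis and reuse its LU factors); all names and test values are hypothetical.

```python
import numpy as np
from scipy.linalg import expm, solve


def krylov_basis(A, v, m):
    """Arnoldi basis V_m, Hessenberg H_m of K_m(A, v), and ||v||."""
    V = np.zeros((v.size, m))
    H = np.zeros((m, m))
    beta = np.linalg.norm(v)
    V[:, 0] = v / beta
    for j in range(m):
        w = A @ V[:, j]
        for i in range(j + 1):
            H[i, j] = V[:, i] @ w
            w = w - H[i, j] * V[:, i]
        if j + 1 < m:
            nrm = np.linalg.norm(w)
            if nrm < 1e-12:                       # breakdown: stop early
                return V[:, :j + 1], H[:j + 1, :j + 1], beta
            H[j + 1, j] = nrm
            V[:, j + 1] = w / nrm
    return V, H, beta


def snapshots(A, x, b_t, slope, m, hs):
    """x(t+h) for each h in hs, reusing one basis generated at time t."""
    a2s = solve(A, solve(A, slope))               # A^{-2} * input slope
    v = x + solve(A, b_t) + a2s                   # x(t) + F(t, h), same for all h
    V, H, beta = krylov_basis(A, v, m)            # generated once
    e1 = np.zeros(H.shape[0])
    e1[0] = 1.0
    out = []
    for h in hs:
        P = solve(A, b_t + h * slope) + a2s       # P(t, h)
        out.append(beta * (V @ (expm(h * H) @ e1)) - P)
    return out


A = -3.0 * np.eye(4) + np.eye(4, k=1) + np.eye(4, k=-1)
x = np.array([1.0, 0.0, 0.0, 0.0])
b_t = np.array([0.0, 1.0, 0.0, 0.0])
slope = np.array([0.5, 0.0, 0.0, 1.0])
hs = [0.1, 0.3]
xs = snapshots(A, x, b_t, slope, m=4, hs=hs)
```

Only the small exponential `expm(h * H)` and the vector `P` change between snapshots.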

MATEX’s Distributed Framework

More aggressive!

Each computing node is responsible for one set of bumps.


Test cases: IBM power grid benchmarks

TR: Trapezoidal method with fixed time step

MATEX: circuit solver uses R-MATEX

Environment

Linux workstations,

Intel Core™ i7-4770 3.40GHz processor

32GB memory.

Implemented in MATLAB 2013.

Easy to emulate a distributed environment (no synchronization during the simulation).

MATEX

Design    # Grp   t_rmatex(s)   t_rtotal(s)   Avg Err.   Speedup t_1000/t_rmatex   Speedup t_total/t_rtotal
ibmpg1t   100     0.50          0.85          2.5E-5     11.9X                     7.3X
ibmpg2t   100     2.02          3.72          4.3E-5     13.4X                     7.7X
ibmpg3t   100     20.15         45.77         3.7E-5     12.2X                     6.0X
ibmpg4t   15      22.35         65.66         3.9E-5     14.7X                     5.6X
ibmpg5t   100     35.67         54.21         1.1E-5     11.5X                     7.9X
ibmpg6t   100     47.27         74.94         3.4E-5     11.5X                     7.6X

TR with h = 10ps

Design    t_1000(s)   t_total(s)
ibmpg1t   5.94        6.20
ibmpg2t   26.98       28.61
ibmpg3t   245.92      272.47
ibmpg4t   329.36      368.55
ibmpg5t   408.78      428.43
ibmpg6t   542.04      567.38
• Avg Err.: average difference from the solutions of all output nodes provided by the IBM Power Grid Benchmarks;
• Speedups t_1000/t_rmatex: transient stepping runtime speedups of MATEX over TR;
• Speedups t_total/t_rtotal: total simulation runtime speedups of MATEX over TR.
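The two speedup columns are ratios of the TR and MATEX runtimes. The short script below recomputes them from the numbers on this slide; the dictionary layout and variable names are just one convenient arrangement.

```python
# Runtimes in seconds, copied from the tables: (transient stepping, total).
tr = {"ibmpg1t": (5.94, 6.20), "ibmpg2t": (26.98, 28.61),
      "ibmpg3t": (245.92, 272.47), "ibmpg4t": (329.36, 368.55),
      "ibmpg5t": (408.78, 428.43), "ibmpg6t": (542.04, 567.38)}
matex = {"ibmpg1t": (0.50, 0.85), "ibmpg2t": (2.02, 3.72),
         "ibmpg3t": (20.15, 45.77), "ibmpg4t": (22.35, 65.66),
         "ibmpg5t": (35.67, 54.21), "ibmpg6t": (47.27, 74.94)}

step_speedup = {d: tr[d][0] / matex[d][0] for d in tr}    # t_1000 / t_rmatex
total_speedup = {d: tr[d][1] / matex[d][1] for d in tr}   # t_total / t_rtotal
avg_step = sum(step_speedup.values()) / len(step_speedup)

for d in tr:
    print(f"{d}: stepping {step_speedup[d]:.1f}X, total {total_speedup[d]:.1f}X")
print(f"average stepping speedup: {avg_step:.1f}X")
```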



Contributions

New time-integration kernel is applied with improved Krylov-subspace-based MEVP approximations for PDNs

Adaptive time stepping without matrix re-factorization during the transient (stepping) simulation

This feature cannot be achieved by low order approximation strategies, e.g., trapezoidal (TR), due to the explicitly embedded $h$ in $\left(\frac{\mathbf{C}}{h} + \frac{\mathbf{G}}{2}\right)$

Distributed computing framework

Decompose simulation task based on LTS, then do superposition using GTS and snapshot to form the final solution.

Explore the advantages of large time stepping, also reduce and reuse Krylov subspaces.

Results of IBM PG benchmarks

Compared to TR with fixed time step (10ps), the speedup of transient stepping is 13X on average.
