MATEX: A Distributed Framework of Transient Simulation for Power Distribution Networks
Hao Zhuang1*, Shih-Hung Weng2, Jeng-Hau Lin1, Chung-Kuan Cheng1
1. Computer Science & Engineering Dept., University of California, San Diego, CA
2. Facebook Inc., Menlo Park, CA
* Email: zhuangh@ucsd.edu
Outline
• Problem Formulation
• MATEX Framework
  - Circuit Solver
    - Matrix Exponential Kernel
    - Krylov Subspace Accelerations for PDNs
  - Distributed Framework
    - Linear System's Superposition Property and Parallel Processing
    - Reduce Krylov Subspace Computations
• Experimental Results
• Conclusions
Problem Formulation for PDN Transient Simulation
• Linear differential equations:
  $\mathbf{C}\dot{\mathbf{x}}(t) = -\mathbf{G}\mathbf{x}(t) + \mathbf{B}\mathbf{u}(t)$
  - $\mathbf{C}$: capacitance/inductance matrix
  - $\mathbf{G}$: conductance matrix
  - $\mathbf{x}(t)$: voltage/current vector
  - $\mathbf{B}$: input selection matrix
  - $\mathbf{u}(t)$: input current sources (vector)
• Tens of millions or billions of unknowns
[Figure: PDN structure and its RLC model]
Previous Work
• Low-order approximations, e.g. the trapezoidal method (TR):
  $\left(\frac{\mathbf{C}}{h}+\frac{\mathbf{G}}{2}\right)\mathbf{x}(t+h) = \left(\frac{\mathbf{C}}{h}-\frac{\mathbf{G}}{2}\right)\mathbf{x}(t) + \mathbf{B}\,\frac{\mathbf{u}(t+h)+\mathbf{u}(t)}{2}$
• The time step size $h$ is determined by:
  - Input transition distances, which define the upper bound of the time step, e.g. $h_2 = \min(h_1, h_2, h_3)$
  - Stiffness of the system
  - Local truncation error (LTE)
[Figure: A pulse input example with transition distances $h_1$, $h_2$, $h_3$]
• TR with a fixed time step $h$ was used by the top solvers in the TAU'12 power grid (PG) simulation contest
  - Efficient for IBM PG benchmarks
  - Only one matrix factorization for transient stepping
  - Forward and backward substitutions are processed to calculate $\mathbf{x}(t+h)$
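A minimal sketch of fixed-step TR on a single RC node, to make the "one factorization, then substitutions" point concrete. The scalar values C = 1, G = 2, B = 1 and the constant source u = 1 are illustrative assumptions, not taken from the slides; in the scalar case the division by `lhs` stands in for forward/backward substitution with the LU factors.

```python
import math

def tr_simulate(C, G, B, u, x0, h, steps):
    """Fixed-step trapezoidal integration of C*x' = -G*x + B*u(t), scalar case."""
    lhs = C / h + G / 2.0        # computed once; valid as long as h stays fixed
    rhs_coef = C / h - G / 2.0
    x, t = x0, 0.0
    for _ in range(steps):
        rhs = rhs_coef * x + B * (u(t + h) + u(t)) / 2.0
        x = rhs / lhs            # stands in for forward/backward substitution
        t += h
    return x

# Illustrative values (assumptions): unit capacitor, G = 2, constant unit source.
C, G, B = 1.0, 2.0, 1.0
x_tr = tr_simulate(C, G, B, lambda t: 1.0, x0=0.0, h=0.01, steps=100)
x_exact = (B / G) * (1.0 - math.exp(-G / C * 1.0))   # analytic value at t = 1
print(x_tr, x_exact)
```

Changing h would change `lhs`, which in the matrix case means a new factorization; that is the cost the fixed step avoids.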
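Our Matrix Exponential Method
• Analytical solution [Weng et al., IEEE TCAD 2012]:
  $\mathbf{x}(t+h) = e^{h\mathbf{A}}\mathbf{x}(t) + \int_0^h e^{(h-\tau)\mathbf{A}}\,\mathbf{b}(t+\tau)\,d\tau$
  where $\mathbf{A} = -\mathbf{C}^{-1}\mathbf{G}$, $\mathbf{b}(t) = \mathbf{C}^{-1}\mathbf{B}\mathbf{u}(t)$
• Input sources are piecewise linear (PWL):
  $\mathbf{x}(t+h) = e^{h\mathbf{A}}\left(\mathbf{x}(t) + \mathbf{F}(t,h)\right) - \mathbf{P}(t,h)$
  where
  $\mathbf{F}(t,h) = \mathbf{A}^{-1}\mathbf{b}(t) + \mathbf{A}^{-2}\,\frac{\mathbf{b}(t+h)-\mathbf{b}(t)}{h}$,
  $\mathbf{P}(t,h) = \mathbf{A}^{-1}\mathbf{b}(t+h) + \mathbf{A}^{-2}\,\frac{\mathbf{b}(t+h)-\mathbf{b}(t)}{h}$
  - $e^{h\mathbf{A}}$: matrix exponential
  - $\mathbf{x}(t) + \mathbf{F}(t,h)$ and $\mathbf{P}(t,h)$: vectors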
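A scalar sanity check of the PWL formula, as a sketch under stated assumptions: A is the scalar a = -2 and the input is the ramp b(t) = t, both chosen here for illustration. For this case the analytic solution is x(t) = t/2 - 1/4 + e^{-2t}/4, so one large step of the matrix exponential formula can be compared against it, with one Backward Euler step for contrast.

```python
import math

def expm_step(a, b_t, b_th, x_t, h):
    """One step of x' = a*x + b(t) when b is linear on [t, t+h] (scalar A = a)."""
    slope = (b_th - b_t) / h
    F = b_t / a + slope / a**2     # F(t,h) = A^-1 b(t)   + A^-2 (b(t+h) - b(t)) / h
    P = b_th / a + slope / a**2    # P(t,h) = A^-1 b(t+h) + A^-2 (b(t+h) - b(t)) / h
    return math.exp(h * a) * (x_t + F) - P

def backward_euler_step(a, b_th, x_t, h):
    """One Backward Euler step of the same scalar equation."""
    return (x_t + h * b_th) / (1.0 - h * a)

a, h = -2.0, 1.0                   # illustrative values (assumptions)
x_ref = h / 2 - 0.25 + 0.25 * math.exp(-2.0 * h)      # analytic x(1) with x(0) = 0
x_exp = expm_step(a, b_t=0.0, b_th=h, x_t=0.0, h=h)
x_be = backward_euler_step(a, b_th=h, x_t=0.0, h=h)
print(x_ref, x_exp, x_be)
```

With the same h, the exponential step matches the analytic value to rounding error, while the Backward Euler step is visibly off.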
Advantage in Accuracy 
[Figure: simulation results compared against the reference solution]
• With the same $h$, the matrix exponential method can reach the reference solution, while Backward Euler cannot.
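Not $e^{\mathbf{A}}$, but $e^{\mathbf{A}}\mathbf{v}$ [Weng et al., IEEE TCAD 2012]
• Computing $e^{\mathbf{A}}$ is very expensive when $\mathbf{A}$ is large!
• $e^{\mathbf{A}}\mathbf{v}$: Matrix Exponential and Vector Product (MEVP)
• Efficiently approximated via Krylov subspace (MEXP)
  - Standard Krylov subspace: $K_m(\mathbf{A},\mathbf{v}) = \mathrm{span}\{\mathbf{v}, \mathbf{A}\mathbf{v}, \mathbf{A}^2\mathbf{v}, \ldots, \mathbf{A}^{m-1}\mathbf{v}\}$
  - Basis generation: $\mathbf{V}_m = [\mathbf{v}_1, \mathbf{v}_2, \cdots, \mathbf{v}_m]$
  - Arnoldi process and matrix reduction: $\mathbf{A}\mathbf{V}_m = \mathbf{V}_m\mathbf{H}_m + h_{m+1,m}\mathbf{v}_{m+1}\mathbf{e}_m^{T}$
• MEVP is computed by
  $e^{\mathbf{A}}\mathbf{v} \approx \|\mathbf{v}\|_2\,\mathbf{V}_m e^{\mathbf{H}_m}\mathbf{e}_1$
• Time stepping only by scaling $h$:
  $e^{h\mathbf{A}}\mathbf{v} \approx \|\mathbf{v}\|_2\,\mathbf{V}_m e^{h\mathbf{H}_m}\mathbf{e}_1$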
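A small pure-Python sketch of the Arnoldi-based MEVP described above. It is illustrative only: the 4x4 matrix and vector are made-up test data, the dense `expm_small` (scaling and squaring with a Taylor series) is a stand-in for whatever small-matrix exponential a real solver would use, and breakdown (a zero $h_{j+1,j}$) is not handled.

```python
import math

def matvec(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def matmul(A, B):
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def expm_small(M, terms=20):
    """Dense e^M for a small matrix: scale down, Taylor-expand, square back up."""
    n = len(M)
    norm = max(sum(abs(a) for a in row) for row in M)
    s = max(0, math.ceil(math.log2(norm)) + 1) if norm > 0 else 0
    scaled = [[a / 2**s for a in row] for row in M]
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms + 1):
        term = [[a / k for a in row] for row in matmul(term, scaled)]
        result = [[r + t for r, t in zip(rr, tr)] for rr, tr in zip(result, term)]
    for _ in range(s):
        result = matmul(result, result)
    return result

def arnoldi(A, v, m):
    """Return (beta, V, H): beta = ||v||_2, V = m orthonormal vectors, H = m x m."""
    beta = math.sqrt(dot(v, v))
    V = [[x / beta for x in v]]
    H = [[0.0] * m for _ in range(m)]
    for j in range(m):
        w = matvec(A, V[j])
        for i in range(j + 1):                      # modified Gram-Schmidt
            H[i][j] = dot(V[i], w)
            w = [wk - H[i][j] * vk for wk, vk in zip(w, V[i])]
        if j + 1 < m:
            H[j + 1][j] = math.sqrt(dot(w, w))      # assumes no breakdown
            V.append([wk / H[j + 1][j] for wk in w])
    return beta, V, H

def mevp(beta, V, H, h):
    """Approximate e^{hA} v as beta * V_m * e^{h H_m} * e_1."""
    E = expm_small([[h * a for a in row] for row in H])
    coeffs = [row[0] for row in E]                  # first column = e^{hH} e_1
    return [beta * sum(c * V[j][k] for j, c in enumerate(coeffs))
            for k in range(len(V[0]))]

# Made-up test data: a small tridiagonal A and a start vector v.
A = [[-2.0, 1.0, 0.0, 0.0], [1.0, -2.0, 1.0, 0.0],
     [0.0, 1.0, -2.0, 1.0], [0.0, 0.0, 1.0, -2.0]]
v = [1.0, 2.0, 3.0, 4.0]
beta, V, H = arnoldi(A, v, m=4)       # one basis generation
x_half = mevp(beta, V, H, 0.5)        # reuse V, H with h = 0.5
x_one = mevp(beta, V, H, 1.0)         # and again with h = 1.0
```

Here m equals the matrix dimension, so the result agrees with the dense exponential up to rounding; in practice m is much smaller than the problem size and the result is an approximation.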
Algorithm of Computing $\mathbf{x}(t+h)$
• A PDN is a linear system, so the input matrices $\mathbf{X}_2$, $\mathbf{L}$, $\mathbf{U}$ do not change.
• $\mathrm{lu\_decompose}(\mathbf{X}_1)$ is done only once for the whole simulation.

Method   X1   X2   L, U
MEXP     C    G    lu_decompose(X1)
 
Problem #1: Stiff PDN Circuits
• PDNs are usually highly stiff circuits
  - Generalized eigenvalues spread over a wide range within the spectrum of $\mathbf{A}$ ($\mathbf{A} = -\mathbf{C}^{-1}\mathbf{G}$).
  - This requires the standard Krylov subspace to build a very large number of bases to approximate MEVP.
Standard Krylov subspace (MEXP)
(a) Standard Krylov basis (MEXP):
  $K_m(\mathbf{A},\mathbf{v}) = \mathrm{span}\{\mathbf{v}, \mathbf{A}\mathbf{v}, \mathbf{A}^2\mathbf{v}, \ldots, \mathbf{A}^{m-1}\mathbf{v}\}$, with $\mathbf{A} = -\mathbf{C}^{-1}\mathbf{G}$
[Figure (a): eigenvalues of A in the complex plane (axes Re, Im), marking the eigenvalues with small and with large magnitude of real components]
• Eigenvalues of A with large magnitude of real components:
  - Fast modes of the dynamical behavior of circuits.
  - The standard Krylov basis tends to capture these eigenvalues with large magnitude.
• Eigenvalues of A with small magnitude of real components:
  - These eigenvalues define the major dynamical behavior of circuits.
  - They demand more bases in order to be characterized.
Inverted Krylov subspace (I-MATEX)
(a) Standard Krylov basis (MEXP):
  $K_m(\mathbf{A},\mathbf{v}) = \mathrm{span}\{\mathbf{v}, \mathbf{A}\mathbf{v}, \mathbf{A}^2\mathbf{v}, \ldots, \mathbf{A}^{m-1}\mathbf{v}\}$
(b) Inverted Krylov basis (I-MATEX):
  $K_m(\mathbf{A}^{-1},\mathbf{v}) = \mathrm{span}\{\mathbf{v}, \mathbf{A}^{-1}\mathbf{v}, \mathbf{A}^{-2}\mathbf{v}, \ldots, \mathbf{A}^{-m+1}\mathbf{v}\}$
[Figure: eigenvalues of A in the complex plane (axes Re, Im) for (a) and (b), marking the eigenvalues with small and with large magnitude of real components]
• The inverted Krylov subspace is more likely to capture the "important" eigenvalues of A, i.e. those with small magnitude of real components.
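Rational Krylov subspace (R-MATEX)
(a) Standard Krylov basis (MEXP):
  $K_m(\mathbf{A},\mathbf{v}) = \mathrm{span}\{\mathbf{v}, \mathbf{A}\mathbf{v}, \mathbf{A}^2\mathbf{v}, \ldots, \mathbf{A}^{m-1}\mathbf{v}\}$
(c) Rational Krylov basis (R-MATEX):
  $K_m((\mathbf{I}-\gamma\mathbf{A})^{-1},\mathbf{v}) = \mathrm{span}\{\mathbf{v}, (\mathbf{I}-\gamma\mathbf{A})^{-1}\mathbf{v}, (\mathbf{I}-\gamma\mathbf{A})^{-2}\mathbf{v}, \ldots, (\mathbf{I}-\gamma\mathbf{A})^{-m+1}\mathbf{v}\}$
[Figure: eigenvalues of A in the complex plane (axes Re, Im) for (a) and (c), marking the eigenvalues with small and with large magnitude of real components]
• The rational Krylov subspace is still likely to capture the "important" eigenvalues of A, i.e. those with small magnitude of real components.
• More robust numerical properties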
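Error trend of R-MATEX
• Reference: directly compute $e^{h\mathbf{A}}$
• Approximation: MEVP via R-MATEX
[Figure: $\mathrm{error} = |e^{h\mathbf{A}}\mathbf{v} - \mathbf{V}_m e^{h\mathbf{H}_m}\mathbf{e}_1|$ vs. $m$ vs. $h$]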
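Same Algorithm with Different Input Matrices
• Still only one $\mathbf{L}, \mathbf{U} = \mathrm{lu\_decompose}(\mathbf{X}_1)$

Method     X1        X2   H_m used in the MEVP
MEXP       C         G    $\mathbf{H}_m$
I-MATEX    G         C    $\mathbf{H}'^{-1}_m$
R-MATEX    C + γG    C    $(\mathbf{I} - \tilde{\mathbf{H}}_m^{-1})/\gamma$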
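Why the three methods only differ in their input matrices can be read off from $\mathbf{A} = -\mathbf{C}^{-1}\mathbf{G}$. The short derivation below is a sketch that uses only definitions from the slides; the symbols $\lambda$ and $\mu$ (an eigenvalue of $\mathbf{A}$ and of the shifted-and-inverted operator) are introduced here for illustration.

```latex
\begin{aligned}
\text{MEXP:}\quad & \mathbf{A}\mathbf{v} = -\mathbf{C}^{-1}\mathbf{G}\mathbf{v}
  && \Rightarrow\ \mathbf{X}_1 = \mathbf{C},\ \mathbf{X}_2 = \mathbf{G} \\
\text{I-MATEX:}\quad & \mathbf{A}^{-1}\mathbf{v} = -\mathbf{G}^{-1}\mathbf{C}\mathbf{v}
  && \Rightarrow\ \mathbf{X}_1 = \mathbf{G},\ \mathbf{X}_2 = \mathbf{C} \\
\text{R-MATEX:}\quad & (\mathbf{I}-\gamma\mathbf{A})^{-1}\mathbf{v}
  = (\mathbf{C}+\gamma\mathbf{G})^{-1}\mathbf{C}\mathbf{v}
  && \Rightarrow\ \mathbf{X}_1 = \mathbf{C}+\gamma\mathbf{G},\ \mathbf{X}_2 = \mathbf{C} \\[4pt]
\text{Spectral map:}\quad & \mu = \frac{1}{1-\gamma\lambda}
  \iff \lambda = \frac{1-\mu^{-1}}{\gamma}
\end{aligned}
```

In each case the basis is built by one multiplication with $\mathbf{X}_2$ and one solve with the factors of $\mathbf{X}_1$. The spectral map is the scalar analogue of recovering an approximation of $\mathbf{A}$ from the reduced matrix as $(\mathbf{I}-\tilde{\mathbf{H}}_m^{-1})/\gamma$.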
Test cases: RC Circuits with Different Stiffness
• $m_a$: average dimension of the Krylov subspace ($\mathbf{V}_m$, $\mathbf{H}_m$)
• $m_p$: peak dimension of the Krylov subspace ($\mathbf{V}_m$, $\mathbf{H}_m$)
• Err(%): relative error compared to the reference solution
• Stiffness: $\dfrac{|\mathrm{Re}\{\lambda_{\min}(\mathbf{A})\}|}{|\mathrm{Re}\{\lambda_{\max}(\mathbf{A})\}|}$
• Speedups are brought by the Krylov subspace reduction

Stiffness     Method     m_a     m_p   Err(%)   Speedup/MEXP
2.1×10^16     MEXP       211.4   229   0.510    1X
              I-MATEX    5.7     14    0.004    2616X
              R-MATEX    6.9     12    0.004    2735X
2.1×10^12     MEXP       154.2   224   0.004    1X
              I-MATEX    5.7     14    0.004    583X
              R-MATEX    6.9     12    0.004    611X
2.1×10^8      MEXP       148.6   223   0.004    1X
              I-MATEX    5.7     14    0.004    229X
              R-MATEX    6.9     12    0.004    252X
Problem #2: Initial Vector Change
• MEVP $= e^{\mathbf{A}}\mathbf{v}$, where $\mathbf{v}$ is the initial vector of $K_m((\mathbf{I}-\gamma\mathbf{A})^{-1},\mathbf{v})$.
• Once $\mathbf{v}$ changes, we need to compute $K_m$ again for MEVP.
• In the circuit solver,
  $\mathbf{x}(t+h) = e^{h\mathbf{A}}\left(\mathbf{x}(t) + \mathbf{F}(t,h)\right) - \mathbf{P}(t,h)$
  where
  $\mathbf{F}(t,h) = \mathbf{A}^{-1}\mathbf{b}(t) + \mathbf{A}^{-2}\,\frac{\mathbf{b}(t+h)-\mathbf{b}(t)}{h}$
  and the initial vector is $\mathbf{x}(t) + \mathbf{F}(t,h)$.
• The initial vector changes when the input sources cannot keep the previous trend.
[Figure: A pulse input example; the dashed lines mark the places where the initial vector changes ("transition spots")]
• Many input current sources in a PDN make the initial vector change frequently, which triggers Krylov subspace generations and consumes runtime (the trouble maker).
Input sources, the trouble maker
[Figure: A PDN with three input current sources]
Some definitions
• Local Transition Spot (LTS): for one input source, its transition spots.
• Global Transition Spot (GTS): the union of all LTS.
• Snapshot: for one input source, a spot in GTS but not in its LTS.
Simulating the circuit with all input sources as a whole, every GTS triggers a Krylov subspace generation.
How about simulating the circuit with each individual source, then summing the results up later by superposition?
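The three definitions map directly onto set operations. The sketch below uses hypothetical transition times (in ps) for three sources; the source names and times are assumptions chosen for illustration, not data from the slides.

```python
def transition_sets(lts_by_source):
    """Given each source's local transition spots, return (GTS, snapshots per source)."""
    gts = sorted(set().union(*lts_by_source.values()))
    snapshots = {name: [t for t in gts if t not in set(lts)]
                 for name, lts in lts_by_source.items()}
    return gts, snapshots

# Hypothetical transition times (ps) for three PWL current sources.
lts = {
    "i1": [0, 10, 20, 50],
    "i2": [0, 30, 40, 50],
    "i3": [0, 20, 30, 50],
}
gts, snapshots = transition_sets(lts)
print(gts)        # [0, 10, 20, 30, 40, 50]
print(snapshots)  # {'i1': [30, 40], 'i2': [10, 20], 'i3': [10, 40]}
```

Each source has four local spots while the union has six, so a per-source simulation restarts less often than one that treats all sources together.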
Reduce the Krylov subspace generation chances and reuse subspace
• For one input source, LTS is much smaller than GTS.
• Meanwhile, the snapshots must be tracked for later superposition.
• Compute snapshots without extra Krylov subspace generations:
  1. Start from a previous solution $\mathbf{x}(t)$.
  2. Goal: compute the solutions at the snapshots, $\mathbf{x}(t+h_1)$ and $\mathbf{x}(t+h_2)$, without Krylov subspace generations.
  3. Generate $\mathbf{V}_m$ and $\mathbf{H}_m$ at $t$.
  4. Use $\mathbf{V}_m$, $\mathbf{H}_m$ and scale $h$ to $h_1$ and $h_2$ for MEVP, until reaching the next LTS:
     $\mathbf{x}(t+h_1) = \|\mathbf{v}\|\,\mathbf{V}_m e^{h_1\mathbf{H}_m}\mathbf{e}_1 - \mathbf{P}(t,h_1)$
     $\mathbf{x}(t+h_2) = \|\mathbf{v}\|\,\mathbf{V}_m e^{h_2\mathbf{H}_m}\mathbf{e}_1 - \mathbf{P}(t,h_2)$
• No matrix factorizations during this adaptive stepping!
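The reason one basis serves every step size can be written in two lines. This is a sketch for the standard Arnoldi relation quoted earlier in the deck; the inverted and rational variants need their own mappings of the reduced matrix.

```latex
\begin{aligned}
(h\mathbf{A})\,\mathbf{V}_m
  &= \mathbf{V}_m\,(h\mathbf{H}_m) + (h\,h_{m+1,m})\,\mathbf{v}_{m+1}\mathbf{e}_m^{T},
  \qquad K_m(h\mathbf{A},\mathbf{v}) = K_m(\mathbf{A},\mathbf{v}) \\
e^{h_i\mathbf{A}}\mathbf{v}
  &\approx \|\mathbf{v}\|_2\,\mathbf{V}_m\,e^{h_i\mathbf{H}_m}\,\mathbf{e}_1,
  \qquad i = 1, 2, \ldots
\end{aligned}
```

Scaling $\mathbf{A}$ by $h$ leaves the basis $\mathbf{V}_m$ unchanged and only scales the small matrix $\mathbf{H}_m$, so each new $h_i$ costs one $m \times m$ exponential and one product with $\mathbf{V}_m$, with no solve against the large matrices.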
MATEX's Distributed Framework
• More aggressive!
  - Each computing node is responsible for one set of bumps.
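A scalar sketch of the superposition idea behind the distributed framework, under stated assumptions: the system is the single equation x' = a x + b1(t) + b2(t) with zero initial state, a = -2, and two made-up PWL sources. Each source is simulated on its own, restarting only at its own transition spots and reaching the other global spots (its snapshots) by re-scaling h from the last restart; the per-source results are then summed and compared with a simulation of both sources together.

```python
import math

def pwl(points):
    """Return b(t) that linearly interpolates sorted (time, value) breakpoints."""
    def b(t):
        for (t0, v0), (t1, v1) in zip(points, points[1:]):
            if t0 <= t <= t1:
                return v0 + (v1 - v0) * (t - t0) / (t1 - t0)
        return points[-1][1]
    return b

def exact_step(a, x, b0, b1, h):
    """x(t+h) = e^{ha}(x + F) - P for an input that is linear on [t, t+h]."""
    slope = (b1 - b0) / h
    return math.exp(h * a) * (x + b0 / a + slope / a**2) - (b1 / a + slope / a**2)

def simulate(a, b, anchors, outputs):
    """Restart only at `anchors`; reach each time in `outputs` by re-scaling h
    from the most recent anchor. Zero initial state is assumed."""
    result, x_anchor, t_anchor = {}, 0.0, anchors[0]
    for t in outputs:
        for ta in anchors:                 # move the anchor up to time t
            if t_anchor < ta <= t:
                x_anchor = exact_step(a, x_anchor, b(t_anchor), b(ta), ta - t_anchor)
                t_anchor = ta
        result[t] = x_anchor if t == t_anchor else exact_step(
            a, x_anchor, b(t_anchor), b(t), t - t_anchor)
    return result

a = -2.0                                                   # illustrative value
p1 = [(0.0, 0.0), (1.0, 1.0), (2.0, 0.0), (5.0, 0.0)]      # made-up source 1
p2 = [(0.0, 0.0), (3.0, 0.0), (4.0, 2.0), (5.0, 0.0)]      # made-up source 2
b1, b2 = pwl(p1), pwl(p2)
lts1, lts2 = [t for t, _ in p1], [t for t, _ in p2]
gts = sorted(set(lts1) | set(lts2))

whole = simulate(a, lambda t: b1(t) + b2(t), gts, gts)     # restart at every GTS
part1 = simulate(a, b1, lts1, gts)                         # restart at LTS of source 1
part2 = simulate(a, b2, lts2, gts)                         # restart at LTS of source 2
summed = {t: part1[t] + part2[t] for t in gts}
```

The two per-source runs are independent of each other, which is what lets them sit on different computing nodes with no synchronization until the final sum.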
Experimental Results
• Test cases: IBM power grid benchmarks
• TR: trapezoidal method with fixed time step
• MATEX: circuit solver uses R-MATEX
• Environment
  - Linux workstations
  - Intel Core™ i7-4770 3.40GHz processor
  - 32GB memory
• Implemented in MATLAB 2013
  - Easy to emulate a distributed environment (no synchronization during the simulation)
Experimental Results

TR with h = 10ps
Design     t1000(s)   tttotal(s)
ibmpg1t    5.94       6.20
ibmpg2t    26.98      28.61
ibmpg3t    245.92     272.47
ibmpg4t    329.36     368.55
ibmpg5t    408.78     428.43
ibmpg6t    542.04     567.38

MATEX
Design     # Grp   trmatex(s)   trtotal(s)   Avg Err.   Speedup t1000/trmatex   Speedup tttotal/trtotal
ibmpg1t    100     0.50         0.85         2.5E-5     11.9X                   7.3X
ibmpg2t    100     2.02         3.72         4.3E-5     13.4X                   7.7X
ibmpg3t    100     20.15        45.77        3.7E-5     12.2X                   6.0X
ibmpg4t    15      22.35        65.66        3.9E-5     14.7X                   5.6X
ibmpg5t    100     35.67        54.21        1.1E-5     11.5X                   7.9X
ibmpg6t    100     47.27        74.94        3.4E-5     11.5X                   7.6X

• Avg Err.: average differences compared to all output nodes' solutions provided by IBM Power Grid Benchmarks;
• Speedup t1000/trmatex: transient stepping runtime speedup of MATEX over TR;
• Speedup tttotal/trtotal: total simulation runtime speedup of MATEX over TR.
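The speedup columns are plain ratios of the reported runtimes, which the short script below recomputes as a consistency check. The numbers are transcribed from the tables on this slide; the variable names are chosen here for illustration.

```python
# Runtimes in seconds, transcribed from the TR and MATEX tables.
tr = {  # design: (t1000, tttotal)
    "ibmpg1t": (5.94, 6.20), "ibmpg2t": (26.98, 28.61), "ibmpg3t": (245.92, 272.47),
    "ibmpg4t": (329.36, 368.55), "ibmpg5t": (408.78, 428.43), "ibmpg6t": (542.04, 567.38),
}
matex = {  # design: (trmatex, trtotal)
    "ibmpg1t": (0.50, 0.85), "ibmpg2t": (2.02, 3.72), "ibmpg3t": (20.15, 45.77),
    "ibmpg4t": (22.35, 65.66), "ibmpg5t": (35.67, 54.21), "ibmpg6t": (47.27, 74.94),
}

# (transient stepping speedup, total runtime speedup) per design
speedups = {d: (tr[d][0] / matex[d][0], tr[d][1] / matex[d][1]) for d in tr}
mean_transient = sum(s[0] for s in speedups.values()) / len(speedups)

for design, (s_step, s_total) in speedups.items():
    print(f"{design}: {s_step:.1f}X transient, {s_total:.1f}X total")
print(f"mean transient speedup: {mean_transient:.1f}X")
```

Rounded to one decimal, the ratios reproduce the table entries, and the unweighted mean of the six transient-stepping speedups lands between 12X and 13X.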
Contributions
• A new time-integration kernel is applied with improved Krylov-subspace-based MEVP approximations for PDNs
  - Adaptive time stepping without matrix re-factorization during the transient (stepping) simulation
  - This feature cannot be achieved by low-order approximation strategies, e.g., trapezoidal (TR), due to the explicitly embedded $h$ in $\left(\frac{\mathbf{C}}{h}+\frac{\mathbf{G}}{2}\right)$
• Distributed computing framework
  - Decompose the simulation task based on LTS, then do superposition using GTS and snapshots to form the final solution.
  - Explore the advantages of large time stepping; also reduce and reuse Krylov subspaces.
• Results on IBM PG benchmarks
  - Compared to TR with a fixed time step (10ps), the speedup of transient stepping is 13X on average.

THANK YOU

Comparative analysis between traditional aquaponics and reconstructed aquapon...Comparative analysis between traditional aquaponics and reconstructed aquapon...
Comparative analysis between traditional aquaponics and reconstructed aquapon...
 
spirit beverages ppt without graphics.pptx
spirit beverages ppt without graphics.pptxspirit beverages ppt without graphics.pptx
spirit beverages ppt without graphics.pptx
 
Introduction to AI Safety (public presentation).pptx
Introduction to AI Safety (public presentation).pptxIntroduction to AI Safety (public presentation).pptx
Introduction to AI Safety (public presentation).pptx
 
BRAIN TUMOR DETECTION for seminar ppt.pdf
BRAIN TUMOR DETECTION for seminar ppt.pdfBRAIN TUMOR DETECTION for seminar ppt.pdf
BRAIN TUMOR DETECTION for seminar ppt.pdf
 
Certificates - Mahmoud Mohamed Moursi Ahmed
Certificates - Mahmoud Mohamed Moursi AhmedCertificates - Mahmoud Mohamed Moursi Ahmed
Certificates - Mahmoud Mohamed Moursi Ahmed
 
Textile Chemical Processing and Dyeing.pdf
Textile Chemical Processing and Dyeing.pdfTextile Chemical Processing and Dyeing.pdf
Textile Chemical Processing and Dyeing.pdf
 
gray level transformation unit 3(image processing))
gray level transformation unit 3(image processing))gray level transformation unit 3(image processing))
gray level transformation unit 3(image processing))
 
BPV-GUI-01-Guide-for-ASME-Review-Teams-(General)-10-10-2023.pdf
BPV-GUI-01-Guide-for-ASME-Review-Teams-(General)-10-10-2023.pdfBPV-GUI-01-Guide-for-ASME-Review-Teams-(General)-10-10-2023.pdf
BPV-GUI-01-Guide-for-ASME-Review-Teams-(General)-10-10-2023.pdf
 
Unit-III-ELECTROCHEMICAL STORAGE DEVICES.ppt
Unit-III-ELECTROCHEMICAL STORAGE DEVICES.pptUnit-III-ELECTROCHEMICAL STORAGE DEVICES.ppt
Unit-III-ELECTROCHEMICAL STORAGE DEVICES.ppt
 
一比一原版(CalArts毕业证)加利福尼亚艺术学院毕业证如何办理
一比一原版(CalArts毕业证)加利福尼亚艺术学院毕业证如何办理一比一原版(CalArts毕业证)加利福尼亚艺术学院毕业证如何办理
一比一原版(CalArts毕业证)加利福尼亚艺术学院毕业证如何办理
 
The Python for beginners. This is an advance computer language.
The Python for beginners. This is an advance computer language.The Python for beginners. This is an advance computer language.
The Python for beginners. This is an advance computer language.
 

MATEX @ DAC14

  • 1. MATEX: A Distributed Framework of Transient Simulation for Power Distribution Networks. Hao Zhuang (1, *), Shih-Hung Weng (2), Jeng-Hau Lin (1), Chung-Kuan Cheng (1). (1) Computer Science & Engineering Dept., University of California, San Diego, CA. (2) Facebook Inc., Menlo Park, CA. * Email: zhuangh@ucsd.edu
  • 2. Outline: Problem Formulation; MATEX Framework; Circuit Solver; Matrix Exponential Kernel; Krylov Subspace Accelerations for PDNs; Distributed Framework; Linear System's Superposition Property and Parallel Processing; Reduce Krylov Subspace Computations; Experimental Results; Conclusions.
  • 3. Problem Formulation for PDN Transient Simulation. [Figure: PDN structure and its RLC model.] Linear differential equations with tens of millions or billions of unknowns: $\mathbf{C}\dot{\mathbf{x}}(t) = -\mathbf{G}\mathbf{x}(t) + \mathbf{B}\mathbf{u}(t)$, where $\mathbf{C}$ is the capacitance/inductance matrix, $\mathbf{G}$ the conductance matrix, $\mathbf{x}(t)$ the voltage/current vector, $\mathbf{B}$ the input selection matrix, and $\mathbf{u}(t)$ the vector of input current sources.
  • 4. Previous Work
      - Time step size $h$ is determined by:
        - Input transition distances, which define the upper bound of the time step, e.g. $h_2 = \min(h_1, h_2, h_3)$. [Figure: a pulse input example with transition distances $h_1$, $h_2$, $h_3$.]
        - Stiffness of the system.
        - Local truncation error (LTE).
      - Low-order approximations, e.g. the trapezoidal method (TR): $\left(\frac{\mathbf{C}}{h} + \frac{\mathbf{G}}{2}\right)\mathbf{x}(t+h) = \left(\frac{\mathbf{C}}{h} - \frac{\mathbf{G}}{2}\right)\mathbf{x}(t) + \mathbf{B}\,\frac{\mathbf{u}(t+h) + \mathbf{u}(t)}{2}$
      - TR with a fixed time step $h$ was used by the top solvers in the TAU'12 power grid (PG) simulation contest. It is efficient for the IBM PG benchmarks: only one matrix factorization is needed for transient stepping, followed by forward and backward substitutions to calculate $\mathbf{x}(t+h)$.
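  • 5. Our Matrix Exponential Method
      - Analytical solution [Weng et al., IEEE TCAD 2012]: $\mathbf{x}(t+h) = e^{h\mathbf{A}}\mathbf{x}(t) + \int_0^h e^{(h-\tau)\mathbf{A}}\,\mathbf{b}(t+\tau)\,d\tau$, where $\mathbf{A} = -\mathbf{C}^{-1}\mathbf{G}$ and $\mathbf{b}(t) = \mathbf{C}^{-1}\mathbf{B}\mathbf{u}(t)$.
      - When the input sources are piecewise linear (PWL): $\mathbf{x}(t+h) = e^{h\mathbf{A}}\left(\mathbf{x}(t) + \mathbf{F}(t,h)\right) - \mathbf{P}(t,h)$, where $\mathbf{F}(t,h) = \mathbf{A}^{-1}\mathbf{b}(t) + \mathbf{A}^{-2}\,\frac{\mathbf{b}(t+h) - \mathbf{b}(t)}{h}$ and $\mathbf{P}(t,h) = \mathbf{A}^{-1}\mathbf{b}(t+h) + \mathbf{A}^{-2}\,\frac{\mathbf{b}(t+h) - \mathbf{b}(t)}{h}$.
      - In this update, $e^{h\mathbf{A}}$ is a matrix exponential, while $\mathbf{x}(t) + \mathbf{F}(t,h)$ and $\mathbf{P}(t,h)$ are vectors.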
  • 6. Advantage in Accuracy. With the same $h$, the matrix exponential method can reach the reference solution, while backward Euler cannot. [Figure label: reference solution.]
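A minimal sketch of the PWL update from slide 5 and the accuracy claim from slide 6, reduced to a scalar system. The values `a = -2`, the ramp input `b(t) = t`, and `h = 1` are illustrative assumptions, not taken from the slides; `exp_step` and `backward_euler_step` are hypothetical helper names.

```python
import math

def exp_step(a, x, b0, b1, h):
    """One step x(t+h) = e^{ha} (x + F) - P for scalar a and an input linear on [t, t+h]."""
    slope = (b1 - b0) / h
    F = b0 / a + slope / a**2   # F(t,h) = A^-1 b(t)   + A^-2 (b(t+h) - b(t)) / h
    P = b1 / a + slope / a**2   # P(t,h) = A^-1 b(t+h) + A^-2 (b(t+h) - b(t)) / h
    return math.exp(h * a) * (x + F) - P

def backward_euler_step(a, x, b1, h):
    """One implicit Euler step: (1 - h a) x(t+h) = x(t) + h b(t+h)."""
    return (x + h * b1) / (1.0 - h * a)

# Assumed test problem: x' = a x + t, x(0) = 0, a single step of size h.
a, h, x0 = -2.0, 1.0, 0.0
exact = (math.exp(a * h) - 1.0 - a * h) / a**2   # closed-form solution at t = h
x_exp = exp_step(a, x0, 0.0, h, h)               # b(0) = 0, b(h) = h
x_be = backward_euler_step(a, x0, h, h)

print(f"exact={exact:.6f}  exponential={x_exp:.6f}  backward Euler={x_be:.6f}")
```

Because the input is linear over the step, the exponential update reproduces the closed-form solution up to rounding, while backward Euler with the same step carries a visible first-order error.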
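  • 7. Not $e^{\mathbf{A}}$, but $e^{\mathbf{A}}\mathbf{v}$ [Weng et al., IEEE TCAD 2012]
      - Computing $e^{\mathbf{A}}$ is very expensive when $\mathbf{A}$ is large!
      - $e^{\mathbf{A}}\mathbf{v}$: matrix exponential and vector product (MEVP), efficiently approximated via a Krylov subspace (MEXP).
      - Standard Krylov subspace: $K_m(\mathbf{A}, \mathbf{v}) = \mathrm{span}\{\mathbf{v}, \mathbf{A}\mathbf{v}, \mathbf{A}^2\mathbf{v}, \ldots, \mathbf{A}^{m-1}\mathbf{v}\}$
      - Basis generation: $\mathbf{V}_m = [\mathbf{v}_1, \mathbf{v}_2, \cdots, \mathbf{v}_m]$
      - Arnoldi process and matrix reduction: $\mathbf{A}\mathbf{V}_m = \mathbf{V}_m\mathbf{H}_m + h_{m+1,m}\,\mathbf{v}_{m+1}\mathbf{e}_m^{\mathsf{T}}$
      - MEVP is computed by $e^{\mathbf{A}}\mathbf{v} \approx \|\mathbf{v}\|_2\,\mathbf{V}_m e^{\mathbf{H}_m}\mathbf{e}_1$
      - Time stepping only by scaling $h$: $e^{h\mathbf{A}}\mathbf{v} \approx \|\mathbf{v}\|_2\,\mathbf{V}_m e^{h\mathbf{H}_m}\mathbf{e}_1$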
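The Arnoldi-based MEVP on slide 7 can be sketched as follows. This is an illustration under assumptions, not the authors' MATLAB implementation: the test matrix (a small discrete Laplacian), the vector, and the helper names `expm_dense`, `arnoldi`, and `mevp` are all hypothetical, and the small dense exponential uses a plain scaling-and-squaring Taylor series.

```python
import numpy as np

def expm_dense(M, terms=20):
    """Exponential of a small dense matrix: scaling and squaring with a Taylor series."""
    norm = np.linalg.norm(M, 1)
    s = max(0, int(np.ceil(np.log2(norm))) + 1) if norm > 0 else 0
    Ms = M / 2.0**s                      # scaled so that its 1-norm is at most 1/2
    E = np.eye(M.shape[0])
    term = np.eye(M.shape[0])
    for k in range(1, terms + 1):
        term = term @ Ms / k
        E = E + term
    for _ in range(s):                   # undo the scaling by repeated squaring
        E = E @ E
    return E

def arnoldi(matvec, v, m):
    """Arnoldi with modified Gram-Schmidt; returns V_m, H_m and beta = ||v||_2."""
    V = np.zeros((v.size, m + 1))
    H = np.zeros((m + 1, m))
    beta = np.linalg.norm(v)
    V[:, 0] = v / beta
    for j in range(m):
        w = matvec(V[:, j])
        for i in range(j + 1):
            H[i, j] = V[:, i] @ w
            w = w - H[i, j] * V[:, i]
        H[j + 1, j] = np.linalg.norm(w)
        if H[j + 1, j] < 1e-12:          # invariant subspace found early
            return V[:, :j + 1], H[:j + 1, :j + 1], beta
        V[:, j + 1] = w / H[j + 1, j]
    return V[:, :m], H[:m, :m], beta

def mevp(V, H, beta, h=1.0):
    """e^{hA} v ~= ||v||_2 V_m e^{h H_m} e_1; only the small matrix h H_m is exponentiated."""
    return beta * (V @ expm_dense(h * H)[:, 0])

# Assumed test problem: A = tridiag(1, -2, 1), a small discrete Laplacian.
n, m = 20, 15
A = -2.0 * np.eye(n) + np.eye(n, k=1) + np.eye(n, k=-1)
v = np.arange(1.0, n + 1.0)
V, H, beta = arnoldi(lambda w: A @ w, v, m)    # one subspace generation
for h in (1.0, 0.5):                            # reused for two step sizes
    err = np.linalg.norm(mevp(V, H, beta, h) - expm_dense(h * A) @ v)
    print(f"h={h}: error={err:.2e}")
```

The same `V`, `H`, and `beta` serve both step sizes, which is the "time stepping only by scaling $h$" property stated on the slide.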
  • 8. Algorithm of Computing $\mathbf{x}(t+h)$. A PDN is a linear system, so the input matrices $\mathbf{X}_2$, $\mathbf{L}$, $\mathbf{U}$ do not change, and lu_decompose($\mathbf{X}_1$) is done only once for the whole simulation. For MEXP: $\mathbf{X}_1 = \mathbf{C}$, $\mathbf{X}_2 = \mathbf{G}$, and $[\mathbf{L}, \mathbf{U}]$ = lu_decompose($\mathbf{X}_1$).
  • 9. Problem #1: Stiff PDN Circuits
      - PDNs are usually highly stiff circuits.
      - The generalized eigenvalues spread over a wide range within the spectrum of $\mathbf{A}$ ($\mathbf{A} = -\mathbf{C}^{-1}\mathbf{G}$).
      - The standard Krylov subspace therefore has to build a very large number of basis vectors to approximate the MEVP.
  • 10. Next Section. Outline: Problem Formulation; MATEX Framework; Circuit Solver; Matrix Exponential Kernel; Krylov Subspace Accelerations for PDNs; Distributed Framework; Linear System's Superposition Property and Parallel Processing; Reduce Krylov Subspace Computations; Experimental Results; Conclusions.
  • 11. Standard Krylov subspace (MEXP). (a) Standard Krylov basis (MEXP): $K_m(\mathbf{A}, \mathbf{v}) = \mathrm{span}\{\mathbf{v}, \mathbf{A}\mathbf{v}, \mathbf{A}^2\mathbf{v}, \ldots, \mathbf{A}^{m-1}\mathbf{v}\}$, with $\mathbf{A} = -\mathbf{C}^{-1}\mathbf{G}$. [Figure (a): eigenvalues of $\mathbf{A}$ in the complex plane (axes Re, Im), marking those with small-magnitude real components and those with large-magnitude real components.]
  • 12. Standard Krylov subspace (MEXP), same basis and figure (a). The eigenvalues with large-magnitude real components are the fast modes of the circuit's dynamical behavior; the standard Krylov basis tends to capture these large-magnitude eigenvalues.
  • 13. Standard Krylov subspace (MEXP), same basis and figure (a). The eigenvalues with small-magnitude real components define the major dynamical behavior of circuits; more basis vectors are demanded in order to characterize them.
  • 14. Inverted Krylov subspace (I-MATEX). (a) Standard Krylov basis (MEXP): $K_m(\mathbf{A}, \mathbf{v}) = \mathrm{span}\{\mathbf{v}, \mathbf{A}\mathbf{v}, \mathbf{A}^2\mathbf{v}, \ldots, \mathbf{A}^{m-1}\mathbf{v}\}$. (b) Inverted Krylov basis (I-MATEX): $K_m(\mathbf{A}^{-1}, \mathbf{v}) = \mathrm{span}\{\mathbf{v}, \mathbf{A}^{-1}\mathbf{v}, \mathbf{A}^{-2}\mathbf{v}, \ldots, \mathbf{A}^{-m+1}\mathbf{v}\}$. [Figures (a), (b): eigenvalues of $\mathbf{A}$ in the complex plane (axes Re, Im), marking those with small-magnitude and those with large-magnitude real components.]
  • 15. Inverted Krylov subspace (I-MATEX), same bases and figures (a), (b). The inverted Krylov subspace is more likely to capture these "important" eigenvalues.
  • 16. Rational Krylov subspace (R-MATEX). (a) Standard Krylov basis (MEXP): $K_m(\mathbf{A}, \mathbf{v}) = \mathrm{span}\{\mathbf{v}, \mathbf{A}\mathbf{v}, \mathbf{A}^2\mathbf{v}, \ldots, \mathbf{A}^{m-1}\mathbf{v}\}$. (c) Rational Krylov basis (R-MATEX): $K_m((\mathbf{I} - \gamma\mathbf{A})^{-1}, \mathbf{v}) = \mathrm{span}\{\mathbf{v}, (\mathbf{I} - \gamma\mathbf{A})^{-1}\mathbf{v}, (\mathbf{I} - \gamma\mathbf{A})^{-2}\mathbf{v}, \ldots, (\mathbf{I} - \gamma\mathbf{A})^{-m+1}\mathbf{v}\}$. [Figures (a), (c): eigenvalues of $\mathbf{A}$ in the complex plane (axes Re, Im), marking those with small-magnitude and those with large-magnitude real components.] The rational Krylov subspace is still likely to capture these "important" eigenvalues, and it has more robust numerical properties.
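  • 17. Error trend of R-MATEX. $\text{error} = |e^{h\mathbf{A}}\mathbf{v} - \mathbf{V}_m e^{h\mathbf{H}_m}\mathbf{e}_1|$ versus $m$ and versus $h$, where the first term uses a directly computed $e^{h\mathbf{A}}$ and the second term is the MEVP via R-MATEX. [Figure: error plot.]
  • 18. Same Algorithm with Different Input Matrices. Still only one $[\mathbf{L}, \mathbf{U}]$ = lu_decompose($\mathbf{X}_1$).
      - MEXP: $\mathbf{X}_1 = \mathbf{C}$, $\mathbf{X}_2 = \mathbf{G}$, $\mathbf{H}_m = \mathbf{H}_m$
      - I-MATEX: $\mathbf{X}_1 = \mathbf{G}$, $\mathbf{X}_2 = \mathbf{C}$, $\mathbf{H}_m = \mathbf{H}'^{-1}_m$
      - R-MATEX: $\mathbf{X}_1 = \mathbf{C} + \gamma\mathbf{G}$, $\mathbf{X}_2 = \mathbf{C}$, $\mathbf{H}_m = (\mathbf{I} - \tilde{\mathbf{H}}_m^{-1})/\gamma$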
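As a hedged sketch of the R-MATEX row above: since $\mathbf{A} = -\mathbf{C}^{-1}\mathbf{G}$, the operator $(\mathbf{I} - \gamma\mathbf{A})^{-1}$ equals $(\mathbf{C} + \gamma\mathbf{G})^{-1}\mathbf{C}$, so Arnoldi only needs solves with $\mathbf{X}_1 = \mathbf{C} + \gamma\mathbf{G}$. The diagonal test system, `gamma`, `m`, and all function names below are assumptions chosen for illustration; the code compares the rational and standard subspaces at the same dimension on a stiff problem.

```python
import numpy as np

def expm_dense(M, terms=20):
    """Exponential of a small dense matrix: scaling and squaring with a Taylor series."""
    norm = np.linalg.norm(M, 1)
    s = max(0, int(np.ceil(np.log2(norm))) + 1) if norm > 0 else 0
    Ms = M / 2.0**s
    E = np.eye(M.shape[0])
    term = np.eye(M.shape[0])
    for k in range(1, terms + 1):
        term = term @ Ms / k
        E = E + term
    for _ in range(s):
        E = E @ E
    return E

def arnoldi(apply_op, v, m):
    """Arnoldi with modified Gram-Schmidt on an arbitrary linear operator."""
    V = np.zeros((v.size, m + 1))
    H = np.zeros((m + 1, m))
    beta = np.linalg.norm(v)
    V[:, 0] = v / beta
    for j in range(m):
        w = apply_op(V[:, j])
        for i in range(j + 1):
            H[i, j] = V[:, i] @ w
            w = w - H[i, j] * V[:, i]
        H[j + 1, j] = np.linalg.norm(w)
        if H[j + 1, j] < 1e-12:
            return V[:, :j + 1], H[:j + 1, :j + 1], beta
        V[:, j + 1] = w / H[j + 1, j]
    return V[:, :m], H[:m, :m], beta

def rational_mevp(C, G, v, h, gamma, m):
    """e^{hA} v from the Krylov subspace of (I - gamma A)^{-1} = (C + gamma G)^{-1} C."""
    X1, X2 = C + gamma * G, C
    # np.linalg.solve refactors X1 on every call; a real solver would factor X1 once
    # (the single lu_decompose on the slide) and reuse L and U.
    V, H_tilde, beta = arnoldi(lambda w: np.linalg.solve(X1, X2 @ w), v, m)
    H = (np.eye(H_tilde.shape[0]) - np.linalg.inv(H_tilde)) / gamma
    return beta * (V @ expm_dense(h * H)[:, 0])

def standard_mevp(C, G, v, h, m):
    """e^{hA} v from the standard Krylov subspace of A = -C^{-1} G."""
    V, H, beta = arnoldi(lambda w: -np.linalg.solve(C, G @ w), v, m)
    return beta * (V @ expm_dense(h * H)[:, 0])

# Assumed stiff test system: C = I, G diagonal with entries from 1 to 1e6.
n, m, h, gamma = 60, 20, 0.1, 0.01
C = np.eye(n)
G = np.diag(np.logspace(0, 6, n))
v = np.ones(n)
exact = np.exp(-h * np.diag(G)) * v            # closed form for a diagonal system
err_rational = np.linalg.norm(rational_mevp(C, G, v, h, gamma, m) - exact) / np.linalg.norm(exact)
err_standard = np.linalg.norm(standard_mevp(C, G, v, h, m) - exact) / np.linalg.norm(exact)
print(f"relative error: rational={err_rational:.2e}, standard={err_standard:.2e}")
```

Note that $\mathbf{X}_1 = \mathbf{C} + \gamma\mathbf{G}$ does not contain the step size $h$, which is why the step can be rescaled without another factorization.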
  • 19. Testcases: RC Circuits with Different Stiffness. Speedups brought by Krylov subspace reduction.
      Stiffness    Method    m_a    m_p  Err(%)  Speedup over MEXP
      2.1×10^16    MEXP      211.4  229  0.510   1X
      2.1×10^16    I-MATEX   5.7    14   0.004   2616X
      2.1×10^16    R-MATEX   6.9    12   0.004   2735X
      2.1×10^12    MEXP      154.2  224  0.004   1X
      2.1×10^12    I-MATEX   5.7    14   0.004   583X
      2.1×10^12    R-MATEX   6.9    12   0.004   611X
      2.1×10^8     MEXP      148.6  223  0.004   1X
      2.1×10^8     I-MATEX   5.7    14   0.004   229X
      2.1×10^8     R-MATEX   6.9    12   0.004   252X
      - m_a: average dimension of the Krylov subspace ($\mathbf{V}_m$, $\mathbf{H}_m$).
      - m_p: peak dimension of the Krylov subspace ($\mathbf{V}_m$, $\mathbf{H}_m$).
      - Err(%): relative error compared to the reference solution.
      - Stiffness: $|\mathrm{Re}\{\lambda_{\min}(\mathbf{A})\}| \,/\, |\mathrm{Re}\{\lambda_{\max}(\mathbf{A})\}|$.
  • 20. Problem #2: Initial Vector Change. MEVP $= e^{\mathbf{A}}\mathbf{v}$, where $\mathbf{v}$ is the initial vector of $K_m((\mathbf{I} - \gamma\mathbf{A})^{-1}, \mathbf{v})$. Once $\mathbf{v}$ changes, $K_m$ has to be recomputed for the MEVP.
  • 21. Problem #2: Initial Vector Change. In the circuit solver, $\mathbf{x}(t+h) = e^{h\mathbf{A}}\left(\mathbf{x}(t) + \mathbf{F}(t,h)\right) - \mathbf{P}(t,h)$, where $\mathbf{F}(t,h) = \mathbf{A}^{-1}\mathbf{b}(t) + \mathbf{A}^{-2}\,\frac{\mathbf{b}(t+h) - \mathbf{b}(t)}{h}$. The initial vector is $\mathbf{x}(t) + \mathbf{F}(t,h)$; it changes when the input sources cannot keep the previous trend.
  • 22. Problem #2: Initial Vector Change. [Figure: a pulse input example.] The dashed lines mark the places where the initial vector changes ("transition spots"), that is, where the input sources cannot keep the previous trend.
  • 23. Problem #2: Initial Vector Change. Many input current sources in a PDN make the initial vector change frequently, which triggers Krylov subspace generations and consumes runtime (the trouble maker).
  • 24. Next Section. Outline: Problem Formulation; MATEX Framework; Circuit Solver; Matrix Exponential Kernel; Krylov Subspace Accelerations for PDNs; Distributed Framework; Linear System's Superposition Property and Parallel Processing; Reduce Krylov Subspace Computations; Experimental Results; Conclusions.
  • 25. Input sources, the trouble maker. [Figure: a PDN with three input current sources.]
  • 26. Input sources, the trouble maker. [Figure: a PDN with three input current sources.]
  • 27. Input sources, the trouble maker. [Figure: a PDN with three input current sources.] Some definitions:
      - Local Transition Spot (LTS): for one input source, its transition spots.
      - Global Transition Spot (GTS): the union of all LTS.
      - Snapshot: for one input source, a spot in GTS but not in its LTS.
  • 28. Input sources, the trouble maker. Same figure and definitions. When the circuit is simulated with all input sources as a whole, every GTS triggers a Krylov subspace generation.
  • 29. Input sources, the trouble maker. Same figure and definitions. How about simulating the circuit with each individual source, then summing the results later by superposition?
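The LTS, GTS, and snapshot definitions above translate directly into set operations. The sketch below assumes each PWL source is described only by its breakpoint times; the source names and times (in ps) are made up for illustration, and `transition_spots` is a hypothetical helper.

```python
def transition_spots(sources):
    """Return (lts, gts, snapshots) for PWL sources given as {name: [breakpoint times]}."""
    lts = {name: sorted(set(times)) for name, times in sources.items()}
    gts = sorted(set().union(*lts.values()))          # union of all LTS
    snapshots = {}
    for name, own in lts.items():
        own_set = set(own)
        snapshots[name] = [t for t in gts if t not in own_set]   # in GTS, not in own LTS
    return lts, gts, snapshots

# Assumed example: three current sources with breakpoint times in picoseconds.
sources = {
    "i1": [0, 10, 20, 50],
    "i2": [0, 30, 40, 50],
    "i3": [0, 15, 25, 50],
}
lts, gts, snapshots = transition_spots(sources)
print("GTS:", gts)
for name in sources:
    print(name, "LTS:", lts[name], "snapshots:", snapshots[name])
```

In this example each source has 4 LTS points while the GTS has 8, so a per-source simulation restarts half as often as a joint one.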
  • 30. Reduce the Krylov subspace generation chances and reuse subspace
      - For one input source, LTS is much smaller than GTS.
      - Meanwhile, the solution at each snapshot still has to be tracked for the later superposition.
      - Compute snapshots without extra Krylov subspace generations.
  • 31. Reduce the Krylov subspace generation chances and reuse subspace. Given a previous solution $\mathbf{x}(t)$.
  • 32. Reduce the Krylov subspace generation chances and reuse subspace. Goal: compute the solutions at the snapshots, $\mathbf{x}(t+h_1)$ and $\mathbf{x}(t+h_2)$, without Krylov subspace generations.
  • 33. Reduce the Krylov subspace generation chances and reuse subspace. Generate $\mathbf{V}_m$ and $\mathbf{H}_m$ at $t$.
  • 34. Reduce the Krylov subspace generation chances and reuse subspace
      - Use $\mathbf{V}_m$, $\mathbf{H}_m$ and scale $h$ to $h_1$ and $h_2$ for the MEVP, until reaching the next LTS: $\mathbf{x}(t+h_1) = \|\mathbf{v}\|\,\mathbf{V}_m e^{h_1\mathbf{H}_m}\mathbf{e}_1 - \mathbf{P}(t,h_1)$ and $\mathbf{x}(t+h_2) = \|\mathbf{v}\|\,\mathbf{V}_m e^{h_2\mathbf{H}_m}\mathbf{e}_1 - \mathbf{P}(t,h_2)$.
      - No matrix factorizations during this adaptive stepping!
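The per-source stepping and later superposition described on slides 29 to 34 can be illustrated on a scalar system. This is only a sketch under assumptions: with a scalar `a` there is no Krylov subspace, so reusing the state at the last LTS with a different `h` stands in for reusing $\mathbf{V}_m$ and $\mathbf{H}_m$. The two PWL sources, `a = -2`, and all function names are hypothetical.

```python
import math

def pwl(points, t):
    """Value at time t of a PWL source given as sorted (time, value) breakpoints."""
    for (t0, v0), (t1, v1) in zip(points, points[1:]):
        if t0 <= t <= t1:
            return v0 + (v1 - v0) * (t - t0) / (t1 - t0)
    raise ValueError("t lies outside the source definition")

def exp_step(a, x, b0, b1, h):
    """x(t+h) = e^{ha} (x + F) - P for scalar a and an input linear on [t, t+h]."""
    slope = (b1 - b0) / h
    F = b0 / a + slope / a**2
    P = b1 / a + slope / a**2
    return math.exp(h * a) * (x + F) - P

def single_source_response(a, points, gts):
    """Zero-state response of one source at every GTS point, restarting only at its own LTS."""
    lts = {t for t, _ in points}
    t_start, x_start = gts[0], 0.0       # gts[0] is assumed to be the first breakpoint
    response = [x_start]
    for t in gts[1:]:
        # Snapshots and the next LTS are both reached from the same start state by scaling h.
        x = exp_step(a, x_start, pwl(points, t_start), pwl(points, t), t - t_start)
        response.append(x)
        if t in lts:                     # only here would a new subspace be generated
            t_start, x_start = t, x
    return response

def joint_response(a, sources, gts):
    """Reference: all sources together, restarting at every GTS point."""
    total = lambda t: sum(pwl(p, t) for p in sources)
    x, response = 0.0, [0.0]
    for t0, t1 in zip(gts, gts[1:]):
        x = exp_step(a, x, total(t0), total(t1), t1 - t0)
        response.append(x)
    return response

# Assumed example: two PWL sources sharing the start and end times.
a = -2.0
src1 = [(0.0, 0.0), (1.0, 1.0), (3.0, 0.0), (5.0, 0.0)]
src2 = [(0.0, 0.0), (2.0, 0.0), (4.0, 2.0), (5.0, 0.0)]
sources = [src1, src2]
gts = sorted({t for p in sources for t, _ in p})
per_source = [single_source_response(a, p, gts) for p in sources]
superposed = [sum(vals) for vals in zip(*per_source)]
joint = joint_response(a, sources, gts)
max_diff = max(abs(s - j) for s, j in zip(superposed, joint))
print("GTS:", gts, "max difference:", max_diff)
```

Each per-source run is independent of the others, so in a distributed setting the runs need no synchronization until the final sum.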
  • 36. More aggressive! Each computing node is responsible for one set of bumps.
  • 37. Experimental Results
      - Test cases: IBM power grid benchmarks.
      - TR: trapezoidal method with fixed time step.
      - MATEX: circuit solver uses R-MATEX.
      - Environment: Linux workstations with an Intel Core(TM) i7-4770 3.40 GHz processor and 32 GB memory.
      - Implemented in MATLAB 2013.
      - Easy to emulate a distributed environment (no synchronization during the simulation).
  • 38. Experimental Results
      MATEX:
      Design    # Grp  trmatex(s)  trtotal(s)  Avg Err.  t1000/trmatex  tttotal/trtotal
      ibmpg1t   100    0.50        0.85        2.5E-5    11.9X          7.3X
      ibmpg2t   100    2.02        3.72        4.3E-5    13.4X          7.7X
      ibmpg3t   100    20.15       45.77       3.7E-5    12.2X          6.0X
      ibmpg4t   15     22.35       65.66       3.9E-5    14.7X          5.6X
      ibmpg5t   100    35.67       54.21       1.1E-5    11.5X          7.9X
      ibmpg6t   100    47.27       74.94       3.4E-5    11.5X          7.6X
      TR with h = 10 ps:
      Design    t1000(s)  tttotal(s)
      ibmpg1t   5.94      6.20
      ibmpg2t   26.98     28.61
      ibmpg3t   245.92    272.47
      ibmpg4t   329.36    368.55
      ibmpg5t   408.78    428.43
      ibmpg6t   542.04    567.38
      - Avg Err.: average difference over all output nodes, compared with the solutions provided by the IBM Power Grid Benchmarks.
      - Speedup t1000/trmatex: transient stepping runtime speedup of MATEX over TR.
      - Speedup tttotal/trtotal: total simulation runtime speedup of MATEX over TR.
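The two speedup columns on slide 38 can be re-derived from the listed runtimes. The snippet below only copies the numbers from that slide and divides them; the dictionary layout and variable names are illustrative.

```python
# Runtimes in seconds, copied from slide 38.
tr = {        # TR with h = 10 ps: (t1000, tttotal)
    "ibmpg1t": (5.94, 6.20), "ibmpg2t": (26.98, 28.61), "ibmpg3t": (245.92, 272.47),
    "ibmpg4t": (329.36, 368.55), "ibmpg5t": (408.78, 428.43), "ibmpg6t": (542.04, 567.38),
}
matex = {     # MATEX: (trmatex, trtotal)
    "ibmpg1t": (0.50, 0.85), "ibmpg2t": (2.02, 3.72), "ibmpg3t": (20.15, 45.77),
    "ibmpg4t": (22.35, 65.66), "ibmpg5t": (35.67, 54.21), "ibmpg6t": (47.27, 74.94),
}

stepping = {d: tr[d][0] / matex[d][0] for d in tr}   # t1000 / trmatex
total = {d: tr[d][1] / matex[d][1] for d in tr}      # tttotal / trtotal
avg_stepping = sum(stepping.values()) / len(stepping)

for d in tr:
    print(f"{d}: stepping {stepping[d]:.1f}X, total {total[d]:.1f}X")
print(f"average stepping speedup: {avg_stepping:.1f}X")
```

The average stepping speedup computed this way is about 12.5, in line with the roughly 13X quoted on the contributions slide.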
  • 41. Contributions
      - A new time-integration kernel is applied, with improved Krylov-subspace-based MEVP approximations for PDNs.
      - Adaptive time stepping without matrix re-factorization during the transient (stepping) simulation. This feature cannot be achieved with a low-order approximation strategy such as trapezoidal (TR), because $h$ is explicitly embedded in $\left(\frac{\mathbf{C}}{h} + \frac{\mathbf{G}}{2}\right)$.
      - Distributed computing framework: decompose the simulation task based on LTS, then do superposition using GTS and snapshots to form the final solution. Exploit the advantages of large time stepping, and also reduce and reuse Krylov subspaces.
      - Results on the IBM PG benchmarks: compared to TR with a fixed time step (10 ps), the speedup of transient stepping is 13X on average.