Center for PRedictive Integrated
Structural Materials Science
Computational Performance of Phase Field Calculations
using a Matrix-Free (Sum-Factorization) Finite Element Method

Stephen DeWitt, Shiva Rudraraju, and Katsuyo Thornton
Department of Materials Science and Engineering
University of Michigan
PRISMS-PF
An Open-Source Phase Field Modeling Framework
Background: Phase Field Modeling
•  Diffuse interface approach to modeling microstructure evolution
•  Used to study phase separation in systems with 2+ free energy minima
•  Evolution equations derived from a free energy functional
–  May or may not have a physical basis
•  Applications include: solidification, precipitation, grain growth, phase separation in batteries, deposition, ferroics
!"
!"
= ! ⋅ !!
!"
!"
	
Allen-Cahn	Equa(on	
(non-conserved	dynamics)	
Cahn-Hilliard	Equa(on	
(conserved	dynamics)	
Spinodal	Decomposi(on	
(Cahn-Hilliard	Equa(on)
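The derivation from a free energy functional can be made concrete. For a generic functional of one conserved field (a standard textbook form, not tied to any one application here):

```latex
F[c] = \int_V \left[ f(c) + \frac{\kappa}{2} \left| \nabla c \right|^2 \right] dV,
\qquad
\frac{\delta F}{\delta c} = \frac{\partial f}{\partial c} - \kappa \nabla^2 c .
```

Inserting this variational derivative into the Cahn-Hilliard equation yields the familiar fourth-order PDE; the Allen-Cahn case follows analogously for a non-conserved field.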
Common Numerical Approaches
•  Finite difference method (in-house custom codes)
•  Spectral/FFT methods (in-house custom codes)
•  Finite volume method (MMSP)
•  Finite element method (PRISMS-PF)
Phase field calculations are expensive: simulations of scientific value often require 10s-100s of cores for days.

Phase field modelers can be very performance sensitive.

As a result, many stick to MPI-parallelized in-house finite difference or in-house spectral codes that are simple and fast.
A Quote from a Skeptic
“People keep coming around with fancy finite element codes, but in the end they’re always slower than finite difference codes”

Is there an alternative pathway with comparable performance?
(Spoiler alert: yes)
Sum-Factorization to Enhance Finite Element Code Performance

Traditional finite element approach:
•  Separate operator evaluation into two steps: finite element assembly, then linear algebra
•  A global sparse matrix is the intermediate between these steps
•  Accessing and writing to this matrix can be a performance bottleneck

Sum-factorization approach:
•  Operator is applied cell-by-cell
•  No global sparse matrix, so memory for the operator is substantially reduced
•  Evaluation on each cell can be broken into a series of 1D operations
Kronbichler	and	Kormann,	Computers	&	Fluids,	63	(2012)
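The "series of 1D operations" can be illustrated with a small sketch (not deal.II or PRISMS-PF code; the operators and sizes here are made up for illustration). On tensor-product elements, a 2D cell operator is a Kronecker product A ⊗ B of 1D operators, and sum-factorization applies it as two sweeps of small 1D matrix products instead of forming the large matrix:

```python
import numpy as np

# Illustrative sketch: apply a 2D tensor-product operator without
# ever building the (p^2 x p^2) Kronecker matrix.
rng = np.random.default_rng(0)
p = 4                              # 1D points per cell
A = rng.standard_normal((p, p))    # 1D operator in x (e.g., differentiation)
B = rng.standard_normal((p, p))    # 1D operator in y
v = rng.standard_normal(p * p)     # cell values on the p x p tensor grid

# Naive: build the full matrix, then multiply (O(p^4) work per cell)
full = np.kron(A, B) @ v

# Sum-factorized: two 1D sweeps on reshaped data (O(p^3) work per cell)
V = v.reshape(p, p)                # row-major: V[i, j] pairs A-index i, B-index j
factored = (A @ V @ B.T).reshape(-1)

assert np.allclose(full, factored)
```

In d dimensions the same idea gives d sweeps of 1D operations, which is where the per-cell cost savings at high polynomial degree come from.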
Gauss-Lobatto Quadrature to Eliminate the Need for Mass Lumping

Two advantages of Gauss-Lobatto quadrature:
1.  Nodal points of the Lagrange polynomials are clustered toward the element boundaries, improving conditioning at high degree
2.  The quadrature points coincide with the nodal points (including nodes on the element vertices), leading to a diagonal “mass matrix”
–  Trivially invertible without “mass lumping”
–  Similar to the treatment in finite difference
–  Large improvement in performance for explicit time stepping

Sum-factorization using Gauss-Lobatto quadrature is available as part of the deal.II FE library.
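A minimal sketch of why collocating nodes and quadrature points diagonalizes the mass matrix (the 3-point rule values below are the standard Gauss-Lobatto nodes and weights on [-1, 1]; everything else is illustrative):

```python
import numpy as np

# 3-point Gauss-Lobatto rule on [-1, 1]: nodes -1, 0, 1 and
# weights 1/3, 4/3, 1/3 (exact for polynomials up to degree 3).
nodes = np.array([-1.0, 0.0, 1.0])
weights = np.array([1.0, 4.0, 1.0]) / 3.0

# Lagrange basis with nodal points AT the quadrature points:
# phi_i(x_q) = delta_iq, so the quadrature-evaluated mass matrix
# M_ij = sum_q w_q * phi_i(x_q) * phi_j(x_q) collapses to diag(weights).
phi = np.eye(3)                    # phi[i, q] = phi_i(nodes[q])
M = np.einsum('q,iq,jq->ij', weights, phi, phi)

assert np.allclose(M, np.diag(weights))   # diagonal: trivially invertible
```

With a diagonal mass matrix, each explicit time step needs only a pointwise division, much like the finite difference update.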
Sum-Factorization + Gauss-Lobatto vs. Sparse Matrix
•  Even performance for linear elements
•  4x faster for quadratic elements
•  Even larger gains for higher degrees

(Timings: a single gradient operator evaluation at a fixed DoF count)
Kronbichler	and	Kormann,	Computers	&	Fluids,	63	(2012)
PRISMS-PF: An Open-Source, Finite Element, General-Purpose Phase-Field Platform
(github.com/prisms-center/phaseField)

User-Friendly:
•  Simple interface to solve an arbitrary number of coupled PDEs
•  Detailed user guide
•  22 applications to get you started

High-Performance:
•  Built on the deal.II library
•  Sum-factorization + Gauss-Lobatto elements
•  Ideal scaling for >1,000 processors
•  Adaptive meshing
Performance Comparison: Ostwald Ripening
•  PRISMS-PF vs. a custom finite difference code
–  Written in Fortran with MPI parallelization
–  Second-order central differencing
–  Explicit time stepping (forward Euler)
•  3D Ostwald ripening, 2 particles
–  Coupled Cahn-Hilliard/Allen-Cahn
–  Closely related to a number of problems of physical interest (precipitation, solidification, etc.)
•  Problem fully defined before the performance comparison was conducted
•  Simulations run on 16 cores
•  Time step set at the CFL condition
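The finite difference scheme described above can be sketched in 1D for the Allen-Cahn part (all parameter values here are assumptions for illustration, not the comparison code's actual settings):

```python
import numpy as np

# Second-order central differences + forward Euler for 1D Allen-Cahn
# with the double-well f = W * eta^2 * (1 - eta)^2.
L_mob, W, kappa = 1.0, 1.0, 0.5    # mobility, well height, gradient coeff.
nx, dx = 128, 0.5
# Explicit stability (CFL-like) limit for the diffusive term in 1D:
# dt <= dx^2 / (2 * L * kappa); step just under it.
dt = 0.9 * dx**2 / (2.0 * L_mob * kappa)

x = np.arange(nx) * dx
eta = 0.5 * (1.0 - np.tanh(x - 0.5 * nx * dx))   # diffuse step profile

for _ in range(200):
    lap = (np.roll(eta, 1) - 2.0 * eta + np.roll(eta, -1)) / dx**2  # periodic
    dfdeta = 2.0 * W * eta * (1.0 - eta) * (1.0 - 2.0 * eta)
    eta += dt * (-L_mob * (dfdeta - kappa * lap))

assert np.isfinite(eta).all()      # stable at this time step
```

The simplicity of this update loop, one stencil sweep and one pointwise nonlinearity per step, is why in-house FD codes are the usual performance baseline.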
Performance Comparison: Ostwald Ripening

PRISMS-PF w/ linear elements
•  Similar error to FD at the same mesh size
•  FD is 4.5x faster

PRISMS-PF w/ quadratic elements
•  1.6x faster than FD at moderate resolution
•  Fewer floating point operations per DoF, better order of accuracy than linear

PRISMS-PF w/ adaptive quadratic elements
•  Faster than FD at both resolutions
•  6.7x faster than FD at moderate resolution
[Figure: L2 error vs. wall time for FD and PRISMS-PF (1st regular, 2nd regular, 2nd adaptive); annotations mark 3 and 6 points in the interface and the 6.7x speedup]
Drawbacks to the Ostwald Ripening Comparison Test

The previous test problem has several drawbacks:
1.  Computationally expensive because it is 3D
2.  To avoid interpolation error, the mesh size must vary by factors of two
3.  No analytic solution, so results must be compared to a highly refined “gold standard” calculation (very expensive, not portable)
PFHub MMS Benchmark
•  Benchmark problem 7 on PFHub was developed to escape these drawbacks
•  Method of Manufactured Solutions (MMS) used to generate an Allen-Cahn solution
•  Because the solution is known, the exact error can be unambiguously calculated

https://pages.nist.gov/pfhub/
Benchmark 7b
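The MMS idea can be demonstrated with a toy 1D Allen-Cahn problem (the manufactured solution and parameters below are generic assumptions for illustration, not the actual PFHub 7b specification): pick eta_exact analytically, derive the source term S that makes it satisfy the PDE exactly, and then the solver's error against eta_exact is known exactly.

```python
import numpy as np

kappa, W = 1.0, 1.0
nx = 101
x = np.linspace(0.0, 1.0, nx)
dx = x[1] - x[0]
dt = 0.2 * dx**2 / kappa

def fprime(e):                     # derivative of f = W * e^2 * (1 - e)^2
    return 2.0 * W * e * (1.0 - e) * (1.0 - 2.0 * e)

def exact(t):                      # assumed manufactured solution
    return np.exp(-t) * np.sin(np.pi * x)

def source(t):                     # S = d(eta)/dt + f'(eta) - kappa * eta_xx
    e = exact(t)
    return -e + fprime(e) + kappa * np.pi**2 * e

# Solve d(eta)/dt = -(f'(eta) - kappa * eta_xx) + S with forward Euler
eta, t = exact(0.0), 0.0
for _ in range(500):
    lap = np.zeros_like(eta)
    lap[1:-1] = (eta[2:] - 2.0 * eta[1:-1] + eta[:-2]) / dx**2
    eta += dt * (-fprime(eta) + kappa * lap + source(t))
    eta[0] = eta[-1] = 0.0         # exact Dirichlet boundary values
    t += dt

err = np.sqrt(np.mean((eta - exact(t))**2))   # exact, unambiguous error
assert err < 1e-3
```

Because the error is computed against a closed-form solution, no expensive "gold standard" reference run is needed and the benchmark is portable.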
Performance Comparison: PFHub 7b
[Figure: L2 error vs. computational cost (core-s) for FD and PRISMS-PF (1st/2nd/3rd regular, 2nd/3rd adaptive); annotations mark 4, 8, and 16 points in the interface]
PRISMS-PF w/ linear elements
•  Always slower than quadratic or cubic elements
•  5x slower than FD (no adaptivity)
•  3x faster than FD (w/ adaptivity)

PRISMS-PF w/ quadratic elements
•  Not faster than FD until very low error
•  4x slower than FD (no adaptivity)
•  4x faster than FD (w/ adaptivity)

PRISMS-PF w/ cubic elements
•  No advantage over quadratic elements
•  4x slower than FD (no adaptivity)
•  3x faster than FD (w/ adaptivity)
Concluding Comments
•  Performance of finite element calculations can be improved using sum-factorization + Gauss-Lobatto quadrature
•  FE w/ linear elements and a regular mesh is “only” 5x slower than FD
•  Ostwald ripening test:
–  Improvements in error and stable time step make FE w/ quadratic elements 1.6x faster than FD
–  With adaptive meshing, this increases to ~7x faster
•  MMS benchmark test:
–  Smaller performance gain from linear to quadratic (time step limit?)
–  With adaptive meshing, PRISMS-PF can be 4x faster than FD
–  The benchmark may need modification to be representative of a real phase field problem

“People keep coming around with fancy finite element codes, but in the end they’re always slower than finite difference codes”
Acknowledgements

Computational Resources:

Funding:
US DOE, Office of Science, Basic Energy Sciences, Award DE-SC0008637

deal.II Developers:
Questions?
Interested in PRISMS-PF? Talk to me or visit us on GitHub:
https://github.com/prisms-center/phaseField
