Ankit Agrawal
Research Associate Professor
Department of Electrical Engineering and Computer Science,
Northwestern University
Materials Informatics and Big Data:
Realization of “Fourth Paradigm” of Science
in Materials Science
Collaborations:
Alok Choudhary (NU), BP Gautham (TRDDC), Surya Kalidindi
(GaTech), Greg Olson (NU, QuesTek), NIMS, Chris Wolverton (NU),
Logan Ward (UC), Peter Voorhees (NU), Veera Sundararaghavan
(UMich), Marc De Graef (CMU), Wei Chen (NU), Cate Brinson (Duke),
Carelyn Campbell (NIST), Kamal Choudhary (NIST), Francesca
Tavazza (NIST), Andrew Reid (NIST), Stefanos Papanikolaou (WVU)
Artificial Intelligence for Materials Science (AIMS) Workshop
NIST, Gaithersburg, MD
August 07, 2018
Research Thrusts
Materials Genome Initiative
• NIST Center of Excellence: Center for Hierarchical Materials Design (CHiMaD)
• AFOSR MURI: Managing the Mosaic of Microstructure
• DARPA SIMPLEX: Data-Driven Discovery for Designed Thermoelectric Materials
• NSF BigData Spoke: SPOKE: MIDWEST: Collaborative: Integrative Materials Design (IMaD): Leverage, Innovate, & Disseminate
• NU Data Science Initiative: Data-driven analytics for understanding processing-structure-property-performance relationships in steel alloys
• Toyota Motor Corporation: The investigation of machine learning for material development
MGI: Projects
Overview
• Materials Informatics and Big Data
★ Paradigms of Science
★ PSPP Relationships in Materials Science
★ Materials Informatics Knowledge Discovery Workflow
• Illustrative Materials Informatics
★ Forward PSPP models
★ Inverse PSPP models
★ Structure characterization
• Materials Informatics Tools
Paradigms of Science
A. Agrawal and A. Choudhary, “Perspective: Materials informatics and big data: Realization of the ‘fourth paradigm’ of science in materials science,” APL Materials, 4, 053208 (2016), doi:10.1063/1.4946894
Data Data Everywhere…
Volume: Amount of data (size)
Velocity: Speed with which new data is generated
Variety: Heterogeneity in the data
Variability: Inconsistency in the data
Veracity: How trustworthy the data is
Value: Knowledge hidden in big data (needle in a haystack)
Visualization: Ability to interpret the data and resulting insights
Finding Needle in a Haystack?
PSPP Relationships in Materials Science
A. Agrawal and A. Choudhary, “Perspective: Materials informatics and big data: Realization of the ‘fourth paradigm’ of science in materials science,” APL Materials, 4, 053208 (2016), doi:10.1063/1.4946894
Materials Informatics Knowledge Discovery Workflow
A. Agrawal and A. Choudhary, “Perspective: Materials informatics and big data: Realization of the ‘fourth paradigm’ of science in materials science,” APL Materials, 4, 053208 (2016), doi:10.1063/1.4946894
Illustrative Materials Informatics
• Forward PSPP models (property prediction)
o Steel fatigue strength prediction [IMMI 2014, CIKM 2016, IJF 2018]
o Formation energy prediction [PRB 2014, npjCM 2016, ICDM 2016, DL-KDD 2016, PRB 2017]
o Band gap and glass forming ability prediction [npjCM 2016]
o Bulk modulus prediction [RSC Adv 2016]
o Seebeck coefficient prediction [JCompChem 2018]
o Data-driven solutions to multi-scale localization relationships [IMMI 2015, IMMI 2017, CMS 2018]
• Inverse PSPP models (optimization/discovery)
o Stable compounds [PRB 2014]
o Magnetostrictive materials (Galfenol) [Nature Scientific Reports 2015, AIAA 2018]
o Semiconductors and metallic glasses [npjCM 2016]
o Microstructure design (GAN) [JMD 2018]
• Structure characterization
o EBSD Indexing [BigData-ASH 2016]
o Crack detection in macroscale images [CBM 2017, IJTTE 2018]
Steel Data Mining
Online Tool: http://info.eecs.northwestern.edu/SteelFatigueStrengthPredictor
Agrawal et al., IMMI 2014; Agrawal and Choudhary, CIKM 2016, IJF 2018
Significance
• Fatigue accounts for >90% of mechanical failures
• High cost and time of fatigue testing
Goal
• Data-driven forward models for fatigue strength of steels
Experimental data
• From NIMS Japan
• 371 carbon and low-alloy steels, 48 carburizing steels, and 18 spring steels
Results
• R² > 0.98 for cross-validated models
• Online tool deploying forward models
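The cross-validated R² quoted above can be illustrated with a minimal sketch. It assumes synthetic data and a single-feature least-squares model, not the NIMS data or the models from the cited papers. The names `fit_line`, `r2_score`, and `cross_validated_r2` are hypothetical.

```python
import random

def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b with a single feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, mean_y - a * mean_x

def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def cross_validated_r2(xs, ys, k=10, seed=0):
    """Predict every sample from a model that never saw it, then score once."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k disjoint held-out sets
    y_pred = [0.0] * len(xs)
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
        for i in fold:
            y_pred[i] = a * xs[i] + b
    return r2_score(ys, y_pred)

# Synthetic stand-in data (assumption): a noisy linear relationship.
rng = random.Random(42)
xs = [rng.uniform(100, 400) for _ in range(200)]
ys = [1.6 * x + 50 + rng.gauss(0, 15) for x in xs]
cv_r2 = cross_validated_r2(xs, ys)
print(f"10-fold cross-validated R^2: {cv_r2:.3f}")
```

Scoring only held-out predictions is what keeps a figure like R² > 0.98 from being a measure of how well the model memorized its training set.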
DFT Data Mining
Density Functional Theory
• Very slow simulations
• Require crystal structure as input
Training Data
• Hundreds of thousands of DFT calculations from the Open Quantum Materials Database (OQMD)
• JARVIS-DFT (NIST)
Composition-based models
• 145 attributes (stoichiometric/elemental/electronic/ionic)
Structure-aware models
• Voronoi tessellations to capture local environment of atoms
Deep learning models (ElemNet)
• Use only element fractions
• 20% more accurate and two orders of magnitude faster
• Learn chemistry of materials
Inverse models
• Stable compounds, metallic glasses, semiconductors, quaternary Heuslers
Software
• FEpredictor, Magpie
Meredig and Agrawal et al., PRB 2014; Agrawal et al., ICDM 2016; Ward et al., npj Comp Mat 2016; Ward et al., PRB 2017; Liu et al., DL-KDD 2016; Jha et al., under review
Online Tool: http://info.eecs.northwestern.edu/FEpredictor
Collaboration between Agrawal, Choudhary, Wolverton, Ward, NIST
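As a rough illustration of what a composition-only input looks like, the sketch below converts a flat chemical formula into element fractions and derives one fraction-weighted elemental attribute. It is not the Magpie or ElemNet implementation. The helper names are hypothetical, the parser ignores parentheses, and the small atomic-number table covers only the elements used here.

```python
import re
from collections import defaultdict

# Atomic numbers for a handful of elements (enough for this sketch).
ATOMIC_NUMBER = {"O": 8, "Na": 11, "Cl": 17, "Fe": 26, "Ga": 31}

# One element symbol followed by an optional integer or decimal amount.
TOKEN = re.compile(r"([A-Z][a-z]?)(\d+(?:\.\d+)?)?")

def element_fractions(formula):
    """Map a flat formula such as 'Fe2O3' to {element: atomic fraction}."""
    counts = defaultdict(float)
    for symbol, amount in TOKEN.findall(formula):
        counts[symbol] += float(amount) if amount else 1.0
    total = sum(counts.values())
    return {element: n / total for element, n in counts.items()}

def mean_atomic_number(fractions):
    """Fraction-weighted mean of an elemental property (here: atomic number)."""
    return sum(frac * ATOMIC_NUMBER[el] for el, frac in fractions.items())

fractions = element_fractions("Fe2O3")
print(fractions)
print(mean_atomic_number(fractions))
```

A vector of such fractions needs no crystal structure, which is why composition-based models can screen candidates that DFT has never been run on.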
Thermoelectrics Data Mining
Online Tool: http://info.eecs.northwestern.edu/ThermoEl
Experimental data
• Extracted from literature
• UCSB database
• TE Design database
Methodology
• 187 attributes (composition/crystallinity/production method)
• Random Forest models
Predictive models
• R² up to 0.84
• Work for non-stoichiometric (doped) materials
Outlier analysis
• Helped identify bad data
Software: ThermoEl toolkit
• Seebeck coefficient
• Bulk modulus
Furmanchuk et al., RSC Adv 2016; J Comp Chem 2018
Collaboration between Agrawal, Choudhary, Olson
FEM Data Mining: Homogenization and Localization
Collaboration between Agrawal, Choudhary, Kalidindi
Liu et al., IMMI 2015; Liu et al., IMMI 2017; Yang et al., CMS 2018; Yang et al., under review.
Galfenol Microstructure Optimization
Galfenol
• A magnetoelastic Fe-Ga alloy
Problem
• Discover microstructure with enhanced optimal strength and magnetostriction
Forward models known
• Theoretical models well-established (homogenization)
Inverse models unknown
• Optimization problem
• Challenging due to high dimensionality of microstructure space
• Simple realization of inverse models is prohibitively expensive!
• Non-uniqueness of solutions
Data-driven optimization
• 80% faster, 20% better than traditional methods
• Multiple solutions discovered for the first time
Liu et al., Scientific Reports 2015; Paul et al., AIAA 2018
Collaboration between Agrawal, Choudhary, Sundararaghavan
EBSD Indexing Using Deep Learning
Collaboration between Agrawal, Choudhary, De Graef
[Figure: (a) EBSD geometry (electron beam, tilted specimen, diffraction plane, screen detector); (b) angles φ1, ϕ, φ2 between axes X, Y, Z and X′, Y′, Z′]
Objective: Fast and accurate indexing of electron backscatter diffraction (EBSD) patterns
Solution: Deep convolutional neural networks (CNN) with customized loss function
[Figure panels: Prediction of First Angle (benchmark 5.7°, our method 2.5°); Prediction of Second Angle (benchmark 5.7°, our method 1.8°); Prediction of Third Angle (benchmark 7.7°, our method 4.8°)]
Predictor        MAE (degrees), angles 1/2/3    Training time    Run time
1-NN             5.7, 5.7, 7.7                  0                375 s
Deep Learning    2.5, 1.8, 4.8                  7 days           50 s
Results: On average 56% more accurate and 86% faster predictions compared to state-of-the-art (1-nearest-neighbor with cosine similarity)
Liu et al., BigData ASH 2016; Jha et al., under review.
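The benchmark named above, 1-nearest-neighbor with cosine similarity, can be sketched in a few lines. This is a toy version under stated assumptions: the four-value "patterns" and the orientation triples are made up for illustration, and `index_pattern` is a hypothetical name, not a function from the cited work. The last lines recompute the run-time reduction from the table values (375 s vs. 50 s).

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two intensity vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def index_pattern(pattern, dictionary):
    """Return the orientation attached to the most similar dictionary pattern."""
    best = max(dictionary, key=lambda entry: cosine_similarity(pattern, entry[0]))
    return best[1]

# Toy dictionary (assumption): (flattened pattern, three orientation angles in degrees).
dictionary = [
    ([0.9, 0.1, 0.0, 0.2], (10.0, 20.0, 30.0)),
    ([0.1, 0.8, 0.3, 0.0], (40.0, 50.0, 60.0)),
    ([0.0, 0.2, 0.9, 0.4], (70.0, 80.0, 90.0)),
]
observed = [0.2, 0.7, 0.4, 0.1]
predicted = index_pattern(observed, dictionary)
print(predicted)

# Run-time reduction implied by the table: 375 s (1-NN) vs. 50 s (deep learning).
runtime_reduction = 1 - 50 / 375
print(f"{runtime_reduction:.1%}")
```

Because 1-NN compares each observed pattern against the whole dictionary, its cost grows with dictionary size, whereas a trained network pays its cost once during training.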
Deep Adversarial Learning for Microstructure Design
Challenge
• Identifying a low dimensional microstructure representation
• Use it for materials design
Proposed Solution
• Deep learning
• Generative adversarial networks
• Bayesian optimization with RCWA
Data
• 5000 128x128 images synthesized using GRF method
Results
• 4x4 matrix (design variables)
• Statistically similar microstructures
• 142% better optical absorption
• Scalable generator
• Transferable discriminator
Yang and Li et al., JMD 2018, in press
Collaboration between Agrawal, Choudhary, Chen, Brinson
Pavement Crack Detection Using Deep Transfer Learning
Objective: Fast and accurate crack detection from Hot-Mix Asphalt (HMA) and Portland Cement Concrete (PCC) surfaced pavement images
Solution: A binary classifier trained on ImageNet pre-trained VGG-16 CNN features for pavement images
Results: Up to 90% classification accuracy and 0.87 AUC
Gopalakrishnan et al., CBM 2017
Data: Pavement distress images from the Federal Highway Administration’s (FHWA’s) Long-Term Pavement Performance (LTPP) program
Challenges: Inhomogeneity of crack, diversity of surface texture, background complexity, presence of non-crack features such as joints, etc.
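The two metrics reported for the crack classifier, accuracy and AUC, measure different things, and a small sketch may help. The labels and scores below are made up for illustration and do not come from the LTPP images or the VGG-16 pipeline. The function names are hypothetical.

```python
def accuracy(labels, scores, threshold=0.5):
    """Fraction of samples classified correctly at a fixed score threshold."""
    predictions = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)

def roc_auc(labels, scores):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

# Made-up classifier outputs: 1 = crack, 0 = no crack.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.4, 0.7, 0.3, 0.2, 0.1]

print(accuracy(labels, scores))  # depends on the 0.5 threshold
print(roc_auc(labels, scores))   # threshold-free ranking quality
```

Accuracy depends on the chosen threshold, while AUC summarizes how well the scores rank cracked images above uncracked ones across all thresholds.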
Application to UAV Images for Structural Health Monitoring
Objective: Monitoring condition of civil infrastructure from images captured by Unmanned Aerial Vehicles (UAVs) or drones
Solution: A binary classifier trained on ImageNet pre-trained VGG-16 CNN features for pavement images
Results: Up to 90% classification accuracy and 0.9 AUC in realistic situations without any augmentation or preprocessing
Gopalakrishnan et al., IJTTE 2018
Data: Images of civil infrastructure captured by Hexacopter-I UAV with a 30 MP high-definition Canon EOS 5D Mk IV DSLR camera mounted on a 3-axis rotatable gimbal with live video transmission. A total of 130 images: 80 images (cracked) + 50 images (uncracked)
Illustrative Materials Informatics Tools
http://info.eecs.northwestern.edu
Thank you!
