The scientific method consists of generating and analyzing data to create knowledge. Indeed, every materials scientist uses data from syntheses, characterization, and models to explain and optimize materials behavior. Yet, despite the centrality of data to progress in materials, the world’s immense body of materials data remains unstandardized, unstructured, and trapped in myriad publications, isolated repositories, and private computers. This disaggregation (the mishmash) not only prevents materials scientists from standing on the shoulders of giants, but also limits our ability to use large-scale data analytics to dramatically accelerate materials modeling, discovery, and manufacture (à la Moneyball).
Citrine Informatics is a team of materials scientists dedicated to uniting all materials data on a single platform within a single data standard, and putting user-friendly, data-driven tools into the hands of all materials researchers. The company’s vision is to make the full materials R&D pipeline—from initial discovery to scale-up and commercialization—ten times faster than it is today. In this talk, we will review the present state of affairs in materials data, notable progress to date, opportunities for the future, and the challenges likely to arise along the way.
Bryce Meredig of Citrine Informatics presents the company's materials data platform, Citrination. For academic and government users, this infrastructure is a free and open means to meet data management plan requirements of many federal funding agencies.
Optique - to provide a semantic end-to-end connection between users and data sources; to enable users to rapidly formulate intuitive queries using familiar vocabularies and conceptualisations, and to return timely answers from large-scale and heterogeneous data sources.
Machine Learning and Cultural Heritage: What Is It Good Enough For? – John Stack
Funded through the AHRC’s Towards a National Collection Programme, the Science Museum Group (SMG) is collaborating with the V&A and School of Advanced Study, University of London, on a two-year project entitled “Heritage Connector: Transforming text into data to extract meaning and make connections”.
As with almost all data, museum collection catalogues are largely unstructured, variable in consistency and overwhelmingly composed of thin records. The form of these catalogues means that the potential for new forms of research, access and scholarly enquiry that range across multiple collections and related datasets remains dormant.
The Heritage Connector project is deploying a range of machine learning-based techniques to extract information from the SMG collection catalogue and link it to third-party sources – primarily Wikidata and the V&A’s collection – and will then create a set of prototypes that demonstrate and explore the affordances of the resulting dataset.
Rather than attempting to deploy machine learning to create a perfect linked data model, Heritage Connector asks what’s “good enough” to provide useful functionality to different audiences.
https://www.aeolian-network.net/events/workshop-1-employing-machine-learning-and-artificial-intelligence-in-cultural-institutions/
Research results in peer-reviewed publications are reproducible, right? If only it were so clear cut. With high-profile paper retractions and pushes for better data sharing by funders, publishers and the community, the spotlight is now focussing on the whole way research is conducted around the world.
This talk from the Software Sustainability Institute's Collaborations Workshop 2014 describes how cloud computing, with Microsoft Azure, is helping researchers realize the goals of scientific reproducibility.
Find out more at www.azure4research.com
A talk I gave at the MMDS workshop June 2014 on the Myria system as well as some of Seung-Hee Bae's work on scalable graph clustering.
https://mmds-data.org/
Keynote on software sustainability given at the 2nd Annual Netherlands eScience Symposium, November 2014.
Based on the article:
Carole Goble, "Better Software, Better Research", IEEE Internet Computing, vol. 18, no. 5 (Sept.-Oct. 2014), pp. 4-8, IEEE Computer Society.
http://www.computer.org/csdl/mags/ic/2014/05/mic2014050004.pdf
http://doi.ieeecomputersociety.org/10.1109/MIC.2014.88
http://www.software.ac.uk/resources/publications/better-software-better-research
How cloud computing can accelerate your research. Presentation given at Moscow State University on 19th May 2015.
Apply for Azure for Research Awards at http://research.microsoft.com/en-US/projects/azure/awards.aspx
A keynote given on experiences in curating workflows and web services.
3rd International Digital Curation Conference: "Curating our Digital Scientific Heritage: a Global Collaborative Challenge"
11-13 December 2007
Renaissance Hotel
Washington DC, USA
FAIRy stories: tales from building the FAIR Research Commons – Carole Goble
Plenary Lecture Presented at INCF Neuroinformatics 2019 https://www.neuroinformatics2019.org
Title: FAIRy stories: tales from building the FAIR Research Commons
Findable, Accessible, Interoperable, Reusable. The “FAIR Principles” for research data, software, computational workflows, scripts, or any other kind of Research Object are a mantra; a method; a meme; a myth; a mystery. For the past 15 years I have been working on FAIR in a range of projects and initiatives in the Life Sciences as we try to build the FAIR Research Commons. Some are top-down, like the European Research Infrastructures ELIXIR, ISBE and IBISBA, and the NIH Data Commons. Some are bottom-up, supporting FAIR for investigator-led projects (FAIRDOM), biodiversity analytics (BioVel), and FAIR drug discovery (Open PHACTS, FAIRplus). Some have become movements, like Bioschemas, the Common Workflow Language and Research Objects. Others focus on cross-cutting approaches in reproducibility, computational workflows, metadata representation and scholarly sharing & publication. In this talk I will relate a series of FAIRy tales. Some of them are Grimm. There are villains and heroes. Some have happy endings; all have morals.
A presentation delivered by Mohammed Barakat on the 2nd Jordanian Continuous Improvement Open Day in Amman. The presentation is about Data Science and was delivered on 3rd October 2015.
Powerful Information Discovery with Big Knowledge Graphs – The Offshore Leaks ... – Connected Data World
Borislav Popov's slides from his lightning talk at Connected Data London. Borislav, a Director of Business Development at Ontotext, presented Ontotext's approach to tackling the Panama Papers leak, using a technology that mixes semantic web and graph database approaches.
Application of Clustering in Data Science using Real-life Examples – Edureka!
Clustering data into subsets is an important task for many data science applications, and is considered one of the most important unsupervised learning techniques. Keeping this in mind, we have come up with a free webinar, ‘Application of Clustering in Data Science using Real-life Examples.’
A talk given at a workshop in Atlanta on "Building an Integrated MGI Accelerator Network": see http://acceleratornetwork.org/event/building-an-integrated-mgi-accelerator-network/.
The US Materials Genome Initiative seeks to develop an infrastructure that will accelerate advanced materials development and deployment. The term Materials Genome suggests a science that is fundamentally driven by the systematic capture of large quantities of elemental data. In practice, we know, things are more complex—in materials as in biology. Nevertheless, the ability to locate and reuse data is often essential to research progress. I discuss here three aspects of networking materials data: data publication and discovery; linking instruments, computations, and people to enable new research modalities based on near-real-time processing; and organizing data generation, transformation, and analysis software to facilitate understanding and reuse. I use these three problems to motivate a discussion of recent results in cloud computing, data publication management, high-performance computing, and related topics.
New learning technologies seem likely to transform much of science, as they are already doing for many areas of industry and society. We can expect these technologies to be used, for example, to obtain new insights from massive scientific data and to automate research processes. However, success in such endeavors will require new learning systems: scientific computing platforms, methods, and software that enable the large-scale application of learning technologies. These systems will need to enable learning from extremely large quantities of data; the management of large and complex data, models, and workflows; and the delivery of learning capabilities to many thousands of scientists. In this talk, I review these challenges and opportunities and describe systems that my colleagues and I are developing to enable the application of learning throughout the research process, from data acquisition to analysis.
Why Data Science Matters - 2014 WDS Data Stewardship Award Lecture – Xiaogang (Marshall) Ma
A presentation with a review of technical trends in data management, publication and citation, and methodologies on data interoperability, provenance of research and semantic escience.
Time to Science/Time to Results: Transforming Research in the Cloud – Amazon Web Services
This session demonstrates how the cloud can accelerate breakthroughs in scientific research by providing on-demand access to powerful computing. You will gain insight into how scientific researchers are using the cloud to solve complex science, engineering, and business problems that require high-bandwidth, low-latency networking and very high compute capabilities. You will hear how leveraging the cloud reduces the cost and time of conducting large-scale, worldwide collaborative research. Researchers can access computational power, data storage, supercomputing resources, and data sharing capabilities in a cost-efficient manner without implementation delays. Disease research can be accomplished in a fraction of the time, and innovative researchers in small schools or distant corners of the world gain access to the same computing power as those at major research institutions by leveraging Amazon EC2, Amazon S3, optimized C3 instances, and more to increase collaboration. This session will provide best practices and insight from the UC Berkeley AMP Lab on the services used to connect disparate sets of data and drive meaningful new insight and impact.
The eNanoMapper database for nanomaterial safety information: storage and query – Nina Jeliazkova
A number of challenges exist in engineered nanomaterials (ENM) data representation and integration, mainly due to data complexity and provenance. We have recently described the eNanoMapper database [doi:10.1109/BIBM.2014.699936] as part of the computational infrastructure for toxicological data management of ENM, developed within the EU FP7 eNanoMapper project. The ontology-supported data model is based on an exhaustive review of existing nano-related data models, databases, and nanomaterial-related entries in chemical and toxicogenomic databases. We demonstrate how this approach provides a common ground for integration of data represented in diverse formats (ISA-TAB, OECD HT, custom RDF, and a set of spreadsheet templates used by the EU NanoSafety Cluster projects) and enables a uniform approach to the import, storage and searching of ENM physicochemical measurements and biological assay results. A configurable parser enables import of the data stored in spreadsheet templates, accommodating different organizations of the data. The configuration metadata is defined in a separate file, mapping the spreadsheet into the internal data model. The demonstration data provided by eNanoMapper partners ((i) NanoWiki, (ii) a literature dataset on protein coronas, and (iii) the ModNanoTox project dataset consisting of 86 assays and 100 different endpoints) illustrates the capability of the associated REST API to support a variety of tests and endpoints recommended by the OECD Working Party on Manufactured Nanomaterials. The API is tightly integrated with chemical structure search, allowing a component's function to be highlighted as core, coating or functionalisation. The REST API enables graphical summaries of the data and integration in applications such as NanoQSAR modelling via programmatic interaction.
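The configurable-parser idea described above (a separate configuration file that maps spreadsheet columns onto fields of the internal data model) can be illustrated with a toy sketch. The column names, field names, and config shape below are invented for illustration and are not the actual eNanoMapper configuration format:

```python
import csv
import io

# Hypothetical mapping config (normally kept in a separate file):
# internal field name -> spreadsheet column header.
config = {
    "material": "Nanomaterial name",
    "endpoint": "Assay endpoint",
    "value": "Measured value",
}

# A stand-in for one spreadsheet template exported as CSV.
template = io.StringIO(
    "Nanomaterial name,Assay endpoint,Measured value\n"
    "TiO2 NP,cell viability,87.5\n"
)

# The parser itself stays generic: a different template only needs a
# different config, not new code.
records = [
    {field: row[column] for field, column in config.items()}
    for row in csv.DictReader(template)
]
```

Because the mapping lives in data rather than code, accommodating a differently organized template reduces to editing the config.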
Accelerating Discovery via Science Services – Ian Foster
[A talk presented at Oak Ridge National Laboratory on October 15, 2015]
We have made much progress over the past decade toward harnessing the collective power of IT resources distributed across the globe. In big-science projects in high-energy physics, astronomy, and climate, thousands work daily within virtual computing systems with global scope. But we now face a far greater challenge: Exploding data volumes and powerful simulation tools mean that many more--ultimately most?--researchers will soon require capabilities not so different from those used by such big-science teams. How are we to meet these needs? Must every lab be filled with computers and every researcher become an IT specialist? Perhaps the solution is rather to move research IT out of the lab entirely: to develop suites of science services to which researchers can dispatch mundane but time-consuming tasks, and thus to achieve economies of scale and reduce cognitive load. I explore the past, current, and potential future of large-scale outsourcing and automation for science, and suggest opportunities and challenges for today’s researchers. I use examples from Globus and other projects to demonstrate what can be achieved.
Accelerating Time to Science: Transforming Research in the Cloud – Jamie Kinney
Researchers working on projects ranging from work at individual labs to some of the world's largest scientific investigations are using AWS to accelerate the pace of scientific discovery and ask questions that were previously impossible to explore. This talk explains why scientists are using Amazon Web Services and showcases a range of real-world examples.
Materials Data Facility as Community Database to Share Nano-manufacturing Rec... – Globus
This presentation was given at the 2019 GlobusWorld Conference in Chicago, IL by Ben Galewsky from the National Center for Supercomputing Applications (NCSA).
Adjusting primitives for graph: SHORT REPORT / NOTES – Subhajit Sahu
Graph algorithms, like PageRank, are commonly implemented over the Compressed Sparse Row (CSR) format, an adjacency-list-based graph representation.
Multiply with different modes (map)
1. Performance of sequential execution based vs OpenMP based vector multiply.
2. Comparing various launch configs for CUDA based vector multiply.
Sum with different storage types (reduce)
1. Performance of vector element sum using float vs bfloat16 as the storage type.
Sum with different modes (reduce)
1. Performance of sequential execution based vs OpenMP based vector element sum.
2. Performance of memcpy vs in-place based CUDA based vector element sum.
3. Comparing various launch configs for CUDA based vector element sum (memcpy).
4. Comparing various launch configs for CUDA based vector element sum (in-place).
Sum with in-place strategies of CUDA mode (reduce)
1. Comparing various launch configs for CUDA based vector element sum (in-place).
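The float-vs-bfloat16 comparison above is framed as a performance benchmark, but the storage type also affects accuracy. The effect can be emulated in plain Python by truncating values to bfloat16 precision (Python floats are doubles, and real bfloat16 hardware rounds rather than truncates, so this is a simplified sketch):

```python
import struct

def to_bfloat16(x):
    """Truncate a float to bfloat16 precision: keep only the top 16 bits
    of its float32 representation (simplified; hardware uses rounding)."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFF0000))[0]

def sum_float(xs):
    # Ordinary accumulation in Python floats (IEEE double precision).
    total = 0.0
    for x in xs:
        total += x
    return total

def sum_bf16(xs):
    # Accumulate with the running total stored at bfloat16 precision.
    total = 0.0
    for x in xs:
        total = to_bfloat16(total + to_bfloat16(x))
    return total
```

With only 8 bits of significand, the bfloat16 accumulator gets stuck once the total reaches 256: adding 1.0 to 256.0 truncates back to 256.0, so summing a thousand 1.0s returns 256.0 instead of 1000.0.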
Explore our comprehensive data analysis project presentation on predicting product ad campaign performance. Learn how data-driven insights can optimize your marketing strategies and enhance campaign effectiveness. Perfect for professionals and students looking to understand the power of data analysis in advertising. For more details, visit: https://bostoninstituteofanalytics.org/data-science-and-artificial-intelligence/
Levelwise PageRank with Loop-Based Dead End Handling Strategy: SHORT REPORT ... – Subhajit Sahu
Abstract — Levelwise PageRank is an alternative method of PageRank computation which decomposes the input graph into a directed acyclic block-graph of strongly connected components, and processes them in topological order, one level at a time. This enables calculation of ranks in a distributed fashion without per-iteration communication, unlike the standard method where all vertices are processed in each iteration. It however comes with a precondition of the absence of dead ends in the input graph. Here, the native non-distributed performance of Levelwise PageRank was compared against Monolithic PageRank on a CPU as well as a GPU. To ensure a fair comparison, Monolithic PageRank was also performed on a graph where vertices were split by components. Results indicate that Levelwise PageRank is about as fast as Monolithic PageRank on the CPU, but quite a bit slower on the GPU. Slowdown on the GPU is likely caused by a large submission of small workloads, and is expected to be a non-issue when the computation is performed on massive graphs.
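The decomposition step that Levelwise PageRank relies on, condensing the graph into strongly connected components and assigning each component a topological level, can be sketched as follows. This is an illustrative reimplementation (Tarjan's SCC algorithm plus a level pass), not the author's code:

```python
def tarjan_scc(graph):
    """Tarjan's algorithm; returns SCCs in reverse topological order
    of the condensation (block) graph."""
    index, low = {}, {}
    stack, on_stack, sccs = [], set(), []
    counter = [0]

    def strongconnect(v):
        index[v] = low[v] = counter[0]
        counter[0] += 1
        stack.append(v)
        on_stack.add(v)
        for w in graph[v]:
            if w not in index:
                strongconnect(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:          # v is the root of an SCC
            comp = []
            while True:
                w = stack.pop()
                on_stack.discard(w)
                comp.append(w)
                if w == v:
                    break
            sccs.append(comp)

    for v in graph:
        if v not in index:
            strongconnect(v)
    return sccs

def component_levels(graph, sccs):
    """Topological level of each component; components on the same level
    have no edges between them and can be processed concurrently."""
    comp = {v: i for i, c in enumerate(sccs) for v in c}
    lvl = [0] * len(sccs)
    for c in reversed(sccs):            # reversed Tarjan order = topological
        i = comp[c[0]]
        for v in c:
            for w in graph[v]:
                if comp[w] != i:
                    lvl[comp[w]] = max(lvl[comp[w]], lvl[i] + 1)
    return lvl

# Tiny example: the SCC {a, b} feeds into the SCC {c, d}.
g = {"a": ["b"], "b": ["a", "c"], "c": ["d"], "d": ["c"]}
sccs = tarjan_scc(g)
levels = component_levels(g, sccs)
```

Ranks for all components at level 0 can then be computed first, then level 1, and so on, which is what removes the per-iteration communication.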
As Europe's leading economic powerhouse and the fourth-largest hashtag#economy globally, Germany stands at the forefront of innovation and industrial might. Renowned for its precision engineering and high-tech sectors, Germany's economic structure is heavily supported by a robust service industry, accounting for approximately 68% of its GDP. This economic clout and strategic geopolitical stance position Germany as a focal point in the global cyber threat landscape.
In the face of escalating global tensions, particularly those emanating from geopolitical disputes with nations like hashtag#Russia and hashtag#China, hashtag#Germany has witnessed a significant uptick in targeted cyber operations. Our analysis indicates a marked increase in hashtag#cyberattack sophistication aimed at critical infrastructure and key industrial sectors. These attacks range from ransomware campaigns to hashtag#AdvancedPersistentThreats (hashtag#APTs), threatening national security and business integrity.
🔑 Key findings include:
🔍 Increased frequency and complexity of cyber threats.
🔍 Escalation of state-sponsored and criminally motivated cyber operations.
🔍 Active dark web exchanges of malicious tools and tactics.
Our comprehensive report delves into these challenges, using a blend of open-source and proprietary data collection techniques. By monitoring activity on critical networks and analyzing attack patterns, our team provides a detailed overview of the threats facing German entities.
This report aims to equip stakeholders across public and private sectors with the knowledge to enhance their defensive strategies, reduce exposure to cyber risks, and reinforce Germany's resilience against cyber threats.
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23... – John Andrews
Title: Chatty Kathy: Enhancing Physical Activity Among Older Adults
Description:
Discover how Chatty Kathy, an innovative project developed at the UNC Bootcamp, aims to tackle the challenge of low physical activity among older adults. Our AI-driven solution uses peer interaction to boost and sustain exercise levels, significantly improving health outcomes. This presentation covers our problem statement, the rationale behind Chatty Kathy, synthetic data and persona creation, model performance metrics, a visual demonstration of the project, and potential future developments. Join us for an insightful Q&A session to explore the potential of this groundbreaking project.
Project Team: Jay Requarth, Jana Avery, John Andrews, Dr. Dick Davis II, Nee Buntoum, Nam Yeongjin & Mat Nicholas
Techniques to optimize the PageRank algorithm usually fall into two categories: one tries to reduce the work per iteration, and the other tries to reduce the number of iterations. These goals are often at odds with one another. Skipping computation on vertices which have already converged has the potential to save iteration time. Skipping in-identical vertices, which share the same in-links, helps reduce duplicate computations and thus could also reduce iteration time. Road networks often have chains which can be short-circuited before PageRank computation to improve performance, since the final ranks of chain nodes can be easily calculated; this could reduce both the iteration time and the number of iterations. If a graph has no dangling nodes, the PageRank of each strongly connected component can be computed in topological order, which could reduce the iteration time and the number of iterations, and also enable multi-iteration concurrency in the computation. The combination of all of the above methods is the STICD algorithm [sticd]. For dynamic graphs, unchanged components whose ranks are unaffected can be skipped altogether.
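The first technique above, skipping computation on vertices that have already converged, can be sketched as a simplified pull-based PageRank (an illustrative sketch, not the STICD implementation; it assumes no dead ends):

```python
def pagerank_skipping(out_edges, d=0.85, tol=1e-10, skip_tol=1e-9, max_iter=100):
    """Pull-based PageRank that stops recomputing vertices whose rank has
    already converged (per-vertex delta < skip_tol). Assumes every vertex
    has at least one out-link (no dead ends)."""
    verts = list(out_edges)
    n = len(verts)
    in_edges = {v: [] for v in verts}
    for u, outs in out_edges.items():
        for v in outs:
            in_edges[v].append(u)
    rank = {v: 1.0 / n for v in verts}
    converged = set()
    for _ in range(max_iter):
        total_delta = 0.0
        for v in verts:
            if v in converged:
                continue  # save iteration time on already-converged vertices
            r = (1 - d) / n + d * sum(
                rank[u] / len(out_edges[u]) for u in in_edges[v]
            )
            total_delta += abs(r - rank[v])
            if abs(r - rank[v]) < skip_tol:
                converged.add(v)
            rank[v] = r
        if total_delta < tol:
            break
    return rank

# Three-vertex cycle: all ranks converge to 1/3.
ranks = pagerank_skipping({"a": ["b"], "b": ["c"], "c": ["a"]})
```

Note the approximation this introduces: once a vertex is frozen, later changes in its in-neighbours no longer reach it, which is why production implementations pair skipping with a careful tolerance choice.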
31. Materials Data Standard
JSON-based definition of arbitrary materials objects & processes
Able to accommodate a wide variety of materials data
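The kind of record such a JSON-based standard might hold can be sketched as follows; all field names and values here are invented for illustration and are not the actual Citrination schema:

```python
import json

# Hypothetical materials record: a compound, a measured property with
# units and conditions, and its processing history. Field names and the
# property value are illustrative only.
record = {
    "chemicalFormula": "Mg2Si",
    "properties": [
        {
            "name": "Seebeck coefficient",
            "scalars": [{"value": -120}],
            "units": "uV/K",
            "conditions": [
                {"name": "Temperature", "scalars": [{"value": 300}], "units": "K"}
            ],
        }
    ],
    "preparation": [{"name": "arc melting"}],
}

serialized = json.dumps(record, indent=2)
```

Because the structure is arbitrary nested JSON, the same shape accommodates a synthesis log, a characterization result, or a simulation output without schema changes.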
32. Thermoelectric Discovery
[Figure: composition map with legend “Model Input Data”, “Canonical Thermoelectrics”, “Citrine Discovery”; annotations mark the universe of known TE compounds and a distant, novel class of thermoelectrics. Compound positions determined by weighted composition (e.g., SiGe would be halfway between Si and Ge; Mg2Si is 1/3 of the way from Mg to Si.)]
MW Gaultois, AO Oliynyk, A Mar, TD Sparks, GJ Mulholland, & B Meredig, “A Recommendation Engine for Suggesting Unexpected Thermoelectric Chemistries: Initial Experimental Validation.”
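The weighted-composition placement used in the slide above (SiGe halfway between Si and Ge; Mg2Si one third of the way from Mg to Si) is simple arithmetic, sketched here with an illustrative helper function:

```python
def position_between(counts, a, b):
    """Fractional position of a compound on the a-b axis, weighted by
    composition: 0.0 at pure `a`, 1.0 at pure `b`.
    `counts` maps element symbol -> stoichiometric count."""
    total = counts[a] + counts[b]
    return counts[b] / total

# SiGe (1:1) sits halfway between Si and Ge.
sige = position_between({"Si": 1, "Ge": 1}, "Si", "Ge")

# Mg2Si is 2 parts Mg to 1 part Si, so 1/3 of the way from Mg to Si.
mg2si = position_between({"Mg": 2, "Si": 1}, "Mg", "Si")
```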
33. MW Gaultois, AO Oliynyk, A Mar, TD Sparks, GJ Mulholland, & B Meredig, “A Recommendation Engine for Suggesting Unexpected Thermoelectric Chemistries: Initial Experimental Validation.”
41. Stakeholders
universities
government labs (DOE labs, NIST...) in the US, EU, Japan, China...
funding agencies
journal publishers
scholarship search engines
professional societies
database providers
equipment makers
materials industry (Dow, DuPont, Alcoa, Corning…)
industries that rely on materials (aerospace, electronics, energy...)
and YOU.
43. Ways to Get Involved
email bryce@citrine.io to join mailing list
try citrination.com and give us feedback
contribute data
contribute models to platform (alpha)
grant proposals – drive our dev