SlideShare a Scribd company logo
The Materials Project
Validation, Provenance and Sandboxes
Goals
• Validation
– constantly guard against bugs in core data
and imported data
• Provenance
– know how data came to be
• Sandboxes
– Combine public and non-public data; "good
fences make good neighbors"
Validation
(Internal)
Database ID
External ID What we expected
What we got
Validation runs all the time
• Rules with "constraints" for every database (and sandbox)
• Test constraints against entire DB every night  email reports
• Validation engine, etc. all open-source software in pymatgen-db
Remote
server
Validation
engine
Rules
MP Databases
Reports
(email, web pages, ..)
Rules have a simple syntax
_aliases:
- snl_id = mps_id
- energy = analysis.e_above_hull
materials:
-
filter:
constraints:
- final_energy_per_atom <= 0
- initial_structure.lattice.volume > 0
- initial_structure.lattice.a > 0
- initial_structure.lattice.b > 0
- initial_structure.lattice.c > 0
- initial_structure.lattice.matrix size 3
- formation_energy_per_atom <= 5
- formation_energy_per_atom > -5
- cpu_time > 5
- e_above_hull > -0.000001
- final_energy < 0
- reduced_cell_formula size$
nelements
# Check num. ICSD sources for
selected compounds
-
filter:
- task_id = "mp-540081"
constraints:
- icsd_id size> 10
-
filter:
- task_id = "mp-20379"
constraints:
- icsd_id size 1
-
filter:
- task_id = "mp-13634"
constraints:
- icsd_id size> 0
-
filter:
- task_id = "mp-600022"
constraints:
- icsd_id size 0
# NiO2 phases should never become
stable
-
filter:
- e_above_hull = 0
constraints:
- pretty_formula != 'NiO2'
tasks:
-
filter:
- state = "successful"
constraints:
- output.final_energy_per_atom <= 0
Validation summary
Easy-to-use, integrated, efficient tools to
report errors
Next steps
– Record all check results in DB
– More sophisticated checks (Map/Reduce)
– Make it easier to add new checks internally
– Make it easier to add new check for anyone
• per-sandbox or even per-user ("MP Alerts")
Provenance: How do I know that
the data is correct?
Types of provenance in the system
1) Calculation workflows
– FireWorks records calculation inputs, .. results in great detail
2) External datasets
– Structure Notation Language standardizes the naming of data
sources and publications
3) Post-calculation data transformations
– New "builders" provides framework for tracking creation of final
database products
(1) (2)
(3)
Provenance is available
for every material
Provenance in DB
Structure Notation Language
"snl_final": {
"about": {
"created_at": {
"string": "2014-02-22
19:07:00.383869",
"@class": "datetime",
"@module": "datetime"
},
"_materialsproject": {
"submission_id":
52621,
"snl_id": 398676,
"spacegroup": {
"lattice_type":
"tetragonal",
"symbol":
"P4_2/mmc",
"number": 131,
"point_group":
"4/mmm",
"crystal_system":
"tetragonal",
"hall": "-P 4c 2"
}
},
"_cedergroup": {
"BURP_sids": [
409544,
409545,
409546
],
"icsd_ids": [
],
"e_above_hull":
0.075125350000000423734
},
"references": "",
"authors": [
{
"name": "Geoffroy
Hautier",
"email":
"geoffroy.hautier@uclouvain
.be"
},
{
"name": "Bo Xu",
"email":
"boxu14@mit.edu"
}
],
"remarks": [
"supplementary
compounds from MIT
matgen database"
],
"projects": [
"MIT matgen"
],
"history": [
{
"url": "http://www.fiz-
karlsruhe.de/icsd_home.htm
l",
"name": "Inorganic
Crystal Structure Database",
"description": {
"Collection code":
24692
}
},
{
"url": "",
"name": "",
"description": {
"source": null,
"orig_name": "Basic
substitution code.",
"formula": "O1 Pd1"
}
},
{
"url":
"http://ceder.mit.edu/",
"name": "MIT Ceder
group research database",
"description": {
"source": 105986,
"orig_name": "",
"formula": "FeO"
}
},
{
"url":
"http://www.materialsproject.
org",
"name": "Materials
Project structure
optimization",
"description": {
"fw_id": 820305,
"task_type": "GGA
optimize structure (2x)",
"task_id": "mp-
753682"
}
},
{
"url":
"http://www.materialsproject.
org",
"name": "Materials
Project structure
optimization",
"description": {
"fw_id": 820308,
"task_type":
"GGA+U optimize structure
(2x)",
"task_id": "mp-
776678"
}
}
]
},
Metadata
Crystal
DB sources
References History of
structure
optimizations
Future work: unified view of
provenance
VASP
result
ICSD
VASP
result
VASP
result
Post-
processing
Material
properties
Computation
Data import
processing
e.g., Defects
Sandbox example: Multivalent
JCESR
users
Non-
JCESR
users
Multivalent app
Sandboxes = Database + Apps
Core data Core data
+
multivalent
materials
Non-
JCESR
users
JCESR
users
Technical challenges
• Pre-process data for real-time search
• Interfaces for per-user access control
– https://materialsproject.org/materials/1234?san
dbox=jcesr
– Web UI elements
and
Future: dynamic sandbox creation
Current:
– Large & significant
additional data / apps
• e.g., JCESR
– Longer-term
connections to MP data
• e.g. porous materials
– Companies
• e.g. VW/Stanford
Future
small collab.
per-user?
CoD?
Summary
• Validation
– guard against bugs by checking all data daily
and at data import/creation time
• Provenance
– universal standard for annotating data
provenance
• Sandboxes
– unified view of distinct databases
– onramp for new collaborations and data

More Related Content

What's hot

A Threat Hunter Himself
A Threat Hunter HimselfA Threat Hunter Himself
A Threat Hunter Himself
Sergey Soldatov
 
Automated In-memory Malware/Rootkit Detection via Binary Analysis and Machin...
Automated In-memory Malware/Rootkit  Detection via Binary Analysis and Machin...Automated In-memory Malware/Rootkit  Detection via Binary Analysis and Machin...
Automated In-memory Malware/Rootkit Detection via Binary Analysis and Machin...
Malachi Jones
 
Datafoucs 2014 on line digital forensic investigations damir delija 2
Datafoucs 2014 on line digital forensic investigations damir delija 2Datafoucs 2014 on line digital forensic investigations damir delija 2
Datafoucs 2014 on line digital forensic investigations damir delija 2
Damir Delija
 
H@dfex 2015 malware analysis
H@dfex 2015   malware analysisH@dfex 2015   malware analysis
H@dfex 2015 malware analysis
Charles Lim
 
Windows Threat Hunting
Windows Threat HuntingWindows Threat Hunting
Windows Threat Hunting
GIBIN JOHN
 
Tcpdump hunter
Tcpdump hunterTcpdump hunter
Tcpdump hunter
Andrew McNicol
 
Lateral Movement - Phreaknik 2016
Lateral Movement - Phreaknik 2016Lateral Movement - Phreaknik 2016
Lateral Movement - Phreaknik 2016
Xavier Ashe
 
Malware forensics
Malware forensicsMalware forensics
Malware forensics
Sameera Amjad
 
Android Malware Analysis
Android Malware AnalysisAndroid Malware Analysis
Android Malware Analysis
JongWon Kim
 
Purpose Driven Hunt (DerbyCon 2017)
Purpose Driven Hunt (DerbyCon 2017)Purpose Driven Hunt (DerbyCon 2017)
Purpose Driven Hunt (DerbyCon 2017)
Jared Atkinson
 
Anomalies Detection: Windows OS - Part 1
Anomalies Detection: Windows OS - Part 1Anomalies Detection: Windows OS - Part 1
Anomalies Detection: Windows OS - Part 1
Rhydham Joshi
 
DC612 Day - Hands on Penetration Testing 101
DC612 Day - Hands on Penetration Testing 101DC612 Day - Hands on Penetration Testing 101
DC612 Day - Hands on Penetration Testing 101
dc612
 
Understand How Machine Learning Defends Against Zero-Day Threats
Understand How Machine Learning Defends Against Zero-Day ThreatsUnderstand How Machine Learning Defends Against Zero-Day Threats
Understand How Machine Learning Defends Against Zero-Day Threats
Rahul Mohandas
 
From Thousands of Hours to a Couple of Minutes: Automating Exploit Generation...
From Thousands of Hours to a Couple of Minutes: Automating Exploit Generation...From Thousands of Hours to a Couple of Minutes: Automating Exploit Generation...
From Thousands of Hours to a Couple of Minutes: Automating Exploit Generation...
Priyanka Aash
 
Usage aspects techniques for enterprise forensics data analytics tools
Usage aspects techniques for enterprise forensics data analytics toolsUsage aspects techniques for enterprise forensics data analytics tools
Usage aspects techniques for enterprise forensics data analytics tools
Damir Delija
 
Lateral Movement: How attackers quietly traverse your Network
Lateral Movement: How attackers quietly traverse your NetworkLateral Movement: How attackers quietly traverse your Network
Lateral Movement: How attackers quietly traverse your Network
EC-Council
 
Analisis Estatico y de Comportamiento de un Binario Malicioso
Analisis Estatico y de Comportamiento de un Binario MaliciosoAnalisis Estatico y de Comportamiento de un Binario Malicioso
Analisis Estatico y de Comportamiento de un Binario Malicioso
Conferencias FIST
 
Malware Classification Using Structured Control Flow
Malware Classification Using Structured Control FlowMalware Classification Using Structured Control Flow
Malware Classification Using Structured Control Flow
Silvio Cesare
 
Cyber Defense Forensic Analyst - Real World Hands-on Examples
Cyber Defense Forensic Analyst - Real World Hands-on ExamplesCyber Defense Forensic Analyst - Real World Hands-on Examples
Cyber Defense Forensic Analyst - Real World Hands-on Examples
Sandeep Kumar Seeram
 
Volatile IOCs for Fast Incident Response
Volatile IOCs for Fast Incident ResponseVolatile IOCs for Fast Incident Response
Volatile IOCs for Fast Incident Response
Takahiro Haruyama
 

What's hot (20)

A Threat Hunter Himself
A Threat Hunter HimselfA Threat Hunter Himself
A Threat Hunter Himself
 
Automated In-memory Malware/Rootkit Detection via Binary Analysis and Machin...
Automated In-memory Malware/Rootkit  Detection via Binary Analysis and Machin...Automated In-memory Malware/Rootkit  Detection via Binary Analysis and Machin...
Automated In-memory Malware/Rootkit Detection via Binary Analysis and Machin...
 
Datafoucs 2014 on line digital forensic investigations damir delija 2
Datafoucs 2014 on line digital forensic investigations damir delija 2Datafoucs 2014 on line digital forensic investigations damir delija 2
Datafoucs 2014 on line digital forensic investigations damir delija 2
 
H@dfex 2015 malware analysis
H@dfex 2015   malware analysisH@dfex 2015   malware analysis
H@dfex 2015 malware analysis
 
Windows Threat Hunting
Windows Threat HuntingWindows Threat Hunting
Windows Threat Hunting
 
Tcpdump hunter
Tcpdump hunterTcpdump hunter
Tcpdump hunter
 
Lateral Movement - Phreaknik 2016
Lateral Movement - Phreaknik 2016Lateral Movement - Phreaknik 2016
Lateral Movement - Phreaknik 2016
 
Malware forensics
Malware forensicsMalware forensics
Malware forensics
 
Android Malware Analysis
Android Malware AnalysisAndroid Malware Analysis
Android Malware Analysis
 
Purpose Driven Hunt (DerbyCon 2017)
Purpose Driven Hunt (DerbyCon 2017)Purpose Driven Hunt (DerbyCon 2017)
Purpose Driven Hunt (DerbyCon 2017)
 
Anomalies Detection: Windows OS - Part 1
Anomalies Detection: Windows OS - Part 1Anomalies Detection: Windows OS - Part 1
Anomalies Detection: Windows OS - Part 1
 
DC612 Day - Hands on Penetration Testing 101
DC612 Day - Hands on Penetration Testing 101DC612 Day - Hands on Penetration Testing 101
DC612 Day - Hands on Penetration Testing 101
 
Understand How Machine Learning Defends Against Zero-Day Threats
Understand How Machine Learning Defends Against Zero-Day ThreatsUnderstand How Machine Learning Defends Against Zero-Day Threats
Understand How Machine Learning Defends Against Zero-Day Threats
 
From Thousands of Hours to a Couple of Minutes: Automating Exploit Generation...
From Thousands of Hours to a Couple of Minutes: Automating Exploit Generation...From Thousands of Hours to a Couple of Minutes: Automating Exploit Generation...
From Thousands of Hours to a Couple of Minutes: Automating Exploit Generation...
 
Usage aspects techniques for enterprise forensics data analytics tools
Usage aspects techniques for enterprise forensics data analytics toolsUsage aspects techniques for enterprise forensics data analytics tools
Usage aspects techniques for enterprise forensics data analytics tools
 
Lateral Movement: How attackers quietly traverse your Network
Lateral Movement: How attackers quietly traverse your NetworkLateral Movement: How attackers quietly traverse your Network
Lateral Movement: How attackers quietly traverse your Network
 
Analisis Estatico y de Comportamiento de un Binario Malicioso
Analisis Estatico y de Comportamiento de un Binario MaliciosoAnalisis Estatico y de Comportamiento de un Binario Malicioso
Analisis Estatico y de Comportamiento de un Binario Malicioso
 
Malware Classification Using Structured Control Flow
Malware Classification Using Structured Control FlowMalware Classification Using Structured Control Flow
Malware Classification Using Structured Control Flow
 
Cyber Defense Forensic Analyst - Real World Hands-on Examples
Cyber Defense Forensic Analyst - Real World Hands-on ExamplesCyber Defense Forensic Analyst - Real World Hands-on Examples
Cyber Defense Forensic Analyst - Real World Hands-on Examples
 
Volatile IOCs for Fast Incident Response
Volatile IOCs for Fast Incident ResponseVolatile IOCs for Fast Incident Response
Volatile IOCs for Fast Incident Response
 

Similar to Materials Project Validation, Provenance, and Sandboxes by Dan Gunter

2021 04-20 apache arrow and its impact on the database industry.pptx
2021 04-20  apache arrow and its impact on the database industry.pptx2021 04-20  apache arrow and its impact on the database industry.pptx
2021 04-20 apache arrow and its impact on the database industry.pptx
Andrew Lamb
 
C* for Deep Learning (Andrew Jefferson, Tracktable) | Cassandra Summit 2016
C* for Deep Learning (Andrew Jefferson, Tracktable) | Cassandra Summit 2016C* for Deep Learning (Andrew Jefferson, Tracktable) | Cassandra Summit 2016
C* for Deep Learning (Andrew Jefferson, Tracktable) | Cassandra Summit 2016
DataStax
 
High Performance, Scalable MongoDB in a Bare Metal Cloud
High Performance, Scalable MongoDB in a Bare Metal CloudHigh Performance, Scalable MongoDB in a Bare Metal Cloud
High Performance, Scalable MongoDB in a Bare Metal Cloud
MongoDB
 
Boundary Front end tech talk: how it works
Boundary Front end tech talk: how it worksBoundary Front end tech talk: how it works
Boundary Front end tech talk: how it works
Boundary
 
Tutorial On Database Management System
Tutorial On Database Management SystemTutorial On Database Management System
Tutorial On Database Management System
psathishcs
 
Advanced data modeling with apache cassandra
Advanced data modeling with apache cassandraAdvanced data modeling with apache cassandra
Advanced data modeling with apache cassandra
Patrick McFadin
 
Black friday logs - Scaling Elasticsearch
Black friday logs - Scaling ElasticsearchBlack friday logs - Scaling Elasticsearch
Black friday logs - Scaling Elasticsearch
Sylvain Wallez
 
CLIM Program: Remote Sensing Workshop, An Introduction to Systems and Softwar...
CLIM Program: Remote Sensing Workshop, An Introduction to Systems and Softwar...CLIM Program: Remote Sensing Workshop, An Introduction to Systems and Softwar...
CLIM Program: Remote Sensing Workshop, An Introduction to Systems and Softwar...
The Statistical and Applied Mathematical Sciences Institute
 
IEEE 2014 DOTNET DATA MINING PROJECTS Trusted db a-trusted-hardware-based-dat...
IEEE 2014 DOTNET DATA MINING PROJECTS Trusted db a-trusted-hardware-based-dat...IEEE 2014 DOTNET DATA MINING PROJECTS Trusted db a-trusted-hardware-based-dat...
IEEE 2014 DOTNET DATA MINING PROJECTS Trusted db a-trusted-hardware-based-dat...
IEEEMEMTECHSTUDENTPROJECTS
 
2014 IEEE DOTNET DATA MINING PROJECT Trusteddb a-trusted-hardware-based-datab...
2014 IEEE DOTNET DATA MINING PROJECT Trusteddb a-trusted-hardware-based-datab...2014 IEEE DOTNET DATA MINING PROJECT Trusteddb a-trusted-hardware-based-datab...
2014 IEEE DOTNET DATA MINING PROJECT Trusteddb a-trusted-hardware-based-datab...
IEEEMEMTECHSTUDENTSPROJECTS
 
Scalability20140226
Scalability20140226Scalability20140226
Scalability20140226
Nick Kypreos
 
Webinar: Enterprise Data Management in the Era of MongoDB and Data Lakes
Webinar: Enterprise Data Management in the Era of MongoDB and Data LakesWebinar: Enterprise Data Management in the Era of MongoDB and Data Lakes
Webinar: Enterprise Data Management in the Era of MongoDB and Data Lakes
MongoDB
 
Public private hybrid - cmdb challenge
Public private hybrid - cmdb challengePublic private hybrid - cmdb challenge
Public private hybrid - cmdb challenge
ryszardsshare
 
Webinar: Best Practices for Getting Started with MongoDB
Webinar: Best Practices for Getting Started with MongoDBWebinar: Best Practices for Getting Started with MongoDB
Webinar: Best Practices for Getting Started with MongoDB
MongoDB
 
MongoDB Best Practices
MongoDB Best PracticesMongoDB Best Practices
MongoDB Best Practices
Lewis Lin 🦊
 
Codemotion Milano 2014 - MongoDB and the Internet of Things
Codemotion Milano 2014 - MongoDB and the Internet of ThingsCodemotion Milano 2014 - MongoDB and the Internet of Things
Codemotion Milano 2014 - MongoDB and the Internet of Things
Massimo Brignoli
 
Essential Data Engineering for Data Scientist
Essential Data Engineering for Data Scientist Essential Data Engineering for Data Scientist
Essential Data Engineering for Data Scientist
SoftServe
 
Smart Data Conference: DL4J and DataVec
Smart Data Conference: DL4J and DataVecSmart Data Conference: DL4J and DataVec
Smart Data Conference: DL4J and DataVec
Josh Patterson
 
MS Word file resumes16869r.doc.doc
MS Word file resumes16869r.doc.docMS Word file resumes16869r.doc.doc
MS Word file resumes16869r.doc.doc
butest
 
Machine Learning with ML.NET and Azure - Andy Cross
Machine Learning with ML.NET and Azure - Andy CrossMachine Learning with ML.NET and Azure - Andy Cross
Machine Learning with ML.NET and Azure - Andy Cross
Andrew Flatters
 

Similar to Materials Project Validation, Provenance, and Sandboxes by Dan Gunter (20)

2021 04-20 apache arrow and its impact on the database industry.pptx
2021 04-20  apache arrow and its impact on the database industry.pptx2021 04-20  apache arrow and its impact on the database industry.pptx
2021 04-20 apache arrow and its impact on the database industry.pptx
 
C* for Deep Learning (Andrew Jefferson, Tracktable) | Cassandra Summit 2016
C* for Deep Learning (Andrew Jefferson, Tracktable) | Cassandra Summit 2016C* for Deep Learning (Andrew Jefferson, Tracktable) | Cassandra Summit 2016
C* for Deep Learning (Andrew Jefferson, Tracktable) | Cassandra Summit 2016
 
High Performance, Scalable MongoDB in a Bare Metal Cloud
High Performance, Scalable MongoDB in a Bare Metal CloudHigh Performance, Scalable MongoDB in a Bare Metal Cloud
High Performance, Scalable MongoDB in a Bare Metal Cloud
 
Boundary Front end tech talk: how it works
Boundary Front end tech talk: how it worksBoundary Front end tech talk: how it works
Boundary Front end tech talk: how it works
 
Tutorial On Database Management System
Tutorial On Database Management SystemTutorial On Database Management System
Tutorial On Database Management System
 
Advanced data modeling with apache cassandra
Advanced data modeling with apache cassandraAdvanced data modeling with apache cassandra
Advanced data modeling with apache cassandra
 
Black friday logs - Scaling Elasticsearch
Black friday logs - Scaling ElasticsearchBlack friday logs - Scaling Elasticsearch
Black friday logs - Scaling Elasticsearch
 
CLIM Program: Remote Sensing Workshop, An Introduction to Systems and Softwar...
CLIM Program: Remote Sensing Workshop, An Introduction to Systems and Softwar...CLIM Program: Remote Sensing Workshop, An Introduction to Systems and Softwar...
CLIM Program: Remote Sensing Workshop, An Introduction to Systems and Softwar...
 
IEEE 2014 DOTNET DATA MINING PROJECTS Trusted db a-trusted-hardware-based-dat...
IEEE 2014 DOTNET DATA MINING PROJECTS Trusted db a-trusted-hardware-based-dat...IEEE 2014 DOTNET DATA MINING PROJECTS Trusted db a-trusted-hardware-based-dat...
IEEE 2014 DOTNET DATA MINING PROJECTS Trusted db a-trusted-hardware-based-dat...
 
2014 IEEE DOTNET DATA MINING PROJECT Trusteddb a-trusted-hardware-based-datab...
2014 IEEE DOTNET DATA MINING PROJECT Trusteddb a-trusted-hardware-based-datab...2014 IEEE DOTNET DATA MINING PROJECT Trusteddb a-trusted-hardware-based-datab...
2014 IEEE DOTNET DATA MINING PROJECT Trusteddb a-trusted-hardware-based-datab...
 
Scalability20140226
Scalability20140226Scalability20140226
Scalability20140226
 
Webinar: Enterprise Data Management in the Era of MongoDB and Data Lakes
Webinar: Enterprise Data Management in the Era of MongoDB and Data LakesWebinar: Enterprise Data Management in the Era of MongoDB and Data Lakes
Webinar: Enterprise Data Management in the Era of MongoDB and Data Lakes
 
Public private hybrid - cmdb challenge
Public private hybrid - cmdb challengePublic private hybrid - cmdb challenge
Public private hybrid - cmdb challenge
 
Webinar: Best Practices for Getting Started with MongoDB
Webinar: Best Practices for Getting Started with MongoDBWebinar: Best Practices for Getting Started with MongoDB
Webinar: Best Practices for Getting Started with MongoDB
 
MongoDB Best Practices
MongoDB Best PracticesMongoDB Best Practices
MongoDB Best Practices
 
Codemotion Milano 2014 - MongoDB and the Internet of Things
Codemotion Milano 2014 - MongoDB and the Internet of ThingsCodemotion Milano 2014 - MongoDB and the Internet of Things
Codemotion Milano 2014 - MongoDB and the Internet of Things
 
Essential Data Engineering for Data Scientist
Essential Data Engineering for Data Scientist Essential Data Engineering for Data Scientist
Essential Data Engineering for Data Scientist
 
Smart Data Conference: DL4J and DataVec
Smart Data Conference: DL4J and DataVecSmart Data Conference: DL4J and DataVec
Smart Data Conference: DL4J and DataVec
 
MS Word file resumes16869r.doc.doc
MS Word file resumes16869r.doc.docMS Word file resumes16869r.doc.doc
MS Word file resumes16869r.doc.doc
 
Machine Learning with ML.NET and Azure - Andy Cross
Machine Learning with ML.NET and Azure - Andy CrossMachine Learning with ML.NET and Azure - Andy Cross
Machine Learning with ML.NET and Azure - Andy Cross
 

Recently uploaded

Immersive Learning That Works: Research Grounding and Paths Forward
Immersive Learning That Works: Research Grounding and Paths ForwardImmersive Learning That Works: Research Grounding and Paths Forward
Immersive Learning That Works: Research Grounding and Paths Forward
Leonel Morgado
 
Farming systems analysis: what have we learnt?.pptx
Farming systems analysis: what have we learnt?.pptxFarming systems analysis: what have we learnt?.pptx
Farming systems analysis: what have we learnt?.pptx
Frédéric Baudron
 
23PH301 - Optics - Optical Lenses.pptx
23PH301 - Optics  -  Optical Lenses.pptx23PH301 - Optics  -  Optical Lenses.pptx
23PH301 - Optics - Optical Lenses.pptx
RDhivya6
 
11.1 Role of physical biological in deterioration of grains.pdf
11.1 Role of physical biological in deterioration of grains.pdf11.1 Role of physical biological in deterioration of grains.pdf
11.1 Role of physical biological in deterioration of grains.pdf
PirithiRaju
 
aziz sancar nobel prize winner: from mardin to nobel
aziz sancar nobel prize winner: from mardin to nobelaziz sancar nobel prize winner: from mardin to nobel
aziz sancar nobel prize winner: from mardin to nobel
İsa Badur
 
The debris of the ‘last major merger’ is dynamically young
The debris of the ‘last major merger’ is dynamically youngThe debris of the ‘last major merger’ is dynamically young
The debris of the ‘last major merger’ is dynamically young
Sérgio Sacani
 
8.Isolation of pure cultures and preservation of cultures.pdf
8.Isolation of pure cultures and preservation of cultures.pdf8.Isolation of pure cultures and preservation of cultures.pdf
8.Isolation of pure cultures and preservation of cultures.pdf
by6843629
 
(June 12, 2024) Webinar: Development of PET theranostics targeting the molecu...
(June 12, 2024) Webinar: Development of PET theranostics targeting the molecu...(June 12, 2024) Webinar: Development of PET theranostics targeting the molecu...
(June 12, 2024) Webinar: Development of PET theranostics targeting the molecu...
Scintica Instrumentation
 
Applied Science: Thermodynamics, Laws & Methodology.pdf
Applied Science: Thermodynamics, Laws & Methodology.pdfApplied Science: Thermodynamics, Laws & Methodology.pdf
Applied Science: Thermodynamics, Laws & Methodology.pdf
University of Hertfordshire
 
HOW DO ORGANISMS REPRODUCE?reproduction part 1
HOW DO ORGANISMS REPRODUCE?reproduction part 1HOW DO ORGANISMS REPRODUCE?reproduction part 1
HOW DO ORGANISMS REPRODUCE?reproduction part 1
Shashank Shekhar Pandey
 
Micronuclei test.M.sc.zoology.fisheries.
Micronuclei test.M.sc.zoology.fisheries.Micronuclei test.M.sc.zoology.fisheries.
Micronuclei test.M.sc.zoology.fisheries.
Aditi Bajpai
 
Modelo de slide quimica para powerpoint
Modelo  de slide quimica para powerpointModelo  de slide quimica para powerpoint
Modelo de slide quimica para powerpoint
Karen593256
 
Pests of Storage_Identification_Dr.UPR.pdf
Pests of Storage_Identification_Dr.UPR.pdfPests of Storage_Identification_Dr.UPR.pdf
Pests of Storage_Identification_Dr.UPR.pdf
PirithiRaju
 
Gadgets for management of stored product pests_Dr.UPR.pdf
Gadgets for management of stored product pests_Dr.UPR.pdfGadgets for management of stored product pests_Dr.UPR.pdf
Gadgets for management of stored product pests_Dr.UPR.pdf
PirithiRaju
 
在线办理(salfor毕业证书)索尔福德大学毕业证毕业完成信一模一样
在线办理(salfor毕业证书)索尔福德大学毕业证毕业完成信一模一样在线办理(salfor毕业证书)索尔福德大学毕业证毕业完成信一模一样
在线办理(salfor毕业证书)索尔福德大学毕业证毕业完成信一模一样
vluwdy49
 
Basics of crystallography, crystal systems, classes and different forms
Basics of crystallography, crystal systems, classes and different formsBasics of crystallography, crystal systems, classes and different forms
Basics of crystallography, crystal systems, classes and different forms
MaheshaNanjegowda
 
Sexuality - Issues, Attitude and Behaviour - Applied Social Psychology - Psyc...
Sexuality - Issues, Attitude and Behaviour - Applied Social Psychology - Psyc...Sexuality - Issues, Attitude and Behaviour - Applied Social Psychology - Psyc...
Sexuality - Issues, Attitude and Behaviour - Applied Social Psychology - Psyc...
PsychoTech Services
 
Eukaryotic Transcription Presentation.pptx
Eukaryotic Transcription Presentation.pptxEukaryotic Transcription Presentation.pptx
Eukaryotic Transcription Presentation.pptx
RitabrataSarkar3
 
Randomised Optimisation Algorithms in DAPHNE
Randomised Optimisation Algorithms in DAPHNERandomised Optimisation Algorithms in DAPHNE
Randomised Optimisation Algorithms in DAPHNE
University of Maribor
 
ESA/ACT Science Coffee: Diego Blas - Gravitational wave detection with orbita...
ESA/ACT Science Coffee: Diego Blas - Gravitational wave detection with orbita...ESA/ACT Science Coffee: Diego Blas - Gravitational wave detection with orbita...
ESA/ACT Science Coffee: Diego Blas - Gravitational wave detection with orbita...
Advanced-Concepts-Team
 

Recently uploaded (20)

Immersive Learning That Works: Research Grounding and Paths Forward
Immersive Learning That Works: Research Grounding and Paths ForwardImmersive Learning That Works: Research Grounding and Paths Forward
Immersive Learning That Works: Research Grounding and Paths Forward
 
Farming systems analysis: what have we learnt?.pptx
Farming systems analysis: what have we learnt?.pptxFarming systems analysis: what have we learnt?.pptx
Farming systems analysis: what have we learnt?.pptx
 
23PH301 - Optics - Optical Lenses.pptx
23PH301 - Optics  -  Optical Lenses.pptx23PH301 - Optics  -  Optical Lenses.pptx
23PH301 - Optics - Optical Lenses.pptx
 
11.1 Role of physical biological in deterioration of grains.pdf
11.1 Role of physical biological in deterioration of grains.pdf11.1 Role of physical biological in deterioration of grains.pdf
11.1 Role of physical biological in deterioration of grains.pdf
 
aziz sancar nobel prize winner: from mardin to nobel
aziz sancar nobel prize winner: from mardin to nobelaziz sancar nobel prize winner: from mardin to nobel
aziz sancar nobel prize winner: from mardin to nobel
 
The debris of the ‘last major merger’ is dynamically young
The debris of the ‘last major merger’ is dynamically youngThe debris of the ‘last major merger’ is dynamically young
The debris of the ‘last major merger’ is dynamically young
 
8.Isolation of pure cultures and preservation of cultures.pdf
8.Isolation of pure cultures and preservation of cultures.pdf8.Isolation of pure cultures and preservation of cultures.pdf
8.Isolation of pure cultures and preservation of cultures.pdf
 
(June 12, 2024) Webinar: Development of PET theranostics targeting the molecu...
(June 12, 2024) Webinar: Development of PET theranostics targeting the molecu...(June 12, 2024) Webinar: Development of PET theranostics targeting the molecu...
(June 12, 2024) Webinar: Development of PET theranostics targeting the molecu...
 
Applied Science: Thermodynamics, Laws & Methodology.pdf
Applied Science: Thermodynamics, Laws & Methodology.pdfApplied Science: Thermodynamics, Laws & Methodology.pdf
Applied Science: Thermodynamics, Laws & Methodology.pdf
 
HOW DO ORGANISMS REPRODUCE?reproduction part 1
HOW DO ORGANISMS REPRODUCE?reproduction part 1HOW DO ORGANISMS REPRODUCE?reproduction part 1
HOW DO ORGANISMS REPRODUCE?reproduction part 1
 
Micronuclei test.M.sc.zoology.fisheries.
Micronuclei test.M.sc.zoology.fisheries.Micronuclei test.M.sc.zoology.fisheries.
Micronuclei test.M.sc.zoology.fisheries.
 
Modelo de slide quimica para powerpoint
Modelo  de slide quimica para powerpointModelo  de slide quimica para powerpoint
Modelo de slide quimica para powerpoint
 
Pests of Storage_Identification_Dr.UPR.pdf
Pests of Storage_Identification_Dr.UPR.pdfPests of Storage_Identification_Dr.UPR.pdf
Pests of Storage_Identification_Dr.UPR.pdf
 
Gadgets for management of stored product pests_Dr.UPR.pdf
Gadgets for management of stored product pests_Dr.UPR.pdfGadgets for management of stored product pests_Dr.UPR.pdf
Gadgets for management of stored product pests_Dr.UPR.pdf
 
在线办理(salfor毕业证书)索尔福德大学毕业证毕业完成信一模一样
在线办理(salfor毕业证书)索尔福德大学毕业证毕业完成信一模一样在线办理(salfor毕业证书)索尔福德大学毕业证毕业完成信一模一样
在线办理(salfor毕业证书)索尔福德大学毕业证毕业完成信一模一样
 
Basics of crystallography, crystal systems, classes and different forms
Basics of crystallography, crystal systems, classes and different formsBasics of crystallography, crystal systems, classes and different forms
Basics of crystallography, crystal systems, classes and different forms
 
Sexuality - Issues, Attitude and Behaviour - Applied Social Psychology - Psyc...
Sexuality - Issues, Attitude and Behaviour - Applied Social Psychology - Psyc...Sexuality - Issues, Attitude and Behaviour - Applied Social Psychology - Psyc...
Sexuality - Issues, Attitude and Behaviour - Applied Social Psychology - Psyc...
 
Eukaryotic Transcription Presentation.pptx
Eukaryotic Transcription Presentation.pptxEukaryotic Transcription Presentation.pptx
Eukaryotic Transcription Presentation.pptx
 
Randomised Optimisation Algorithms in DAPHNE
Randomised Optimisation Algorithms in DAPHNERandomised Optimisation Algorithms in DAPHNE
Randomised Optimisation Algorithms in DAPHNE
 
ESA/ACT Science Coffee: Diego Blas - Gravitational wave detection with orbita...
ESA/ACT Science Coffee: Diego Blas - Gravitational wave detection with orbita...ESA/ACT Science Coffee: Diego Blas - Gravitational wave detection with orbita...
ESA/ACT Science Coffee: Diego Blas - Gravitational wave detection with orbita...
 

Materials Project Validation, Provenance, and Sandboxes by Dan Gunter

  • 1. The Materials Project Validation, Provenance and Sandboxes
  • 2. Goals • Validation – constantly guard against bugs in core data and imported data • Provenance – know how data came to be • Sandboxes – Combine public and non-public data; "good fences make good neighbors"
  • 3. Validation (Internal) Database ID External ID What we expected What we got
  • 4. Validation runs all the time • Rules with "constraints" for every database (and sandbox) • Test constraints against entire DB every night  email reports • Validation engine, etc. all open-source software in pymatgen-db Remote server Validation engine Rules MP Databases Reports (email, web pages, ..)
  • 5. Rules have a simple syntax _aliases: - snl_id = mps_id - energy = analysis.e_above_hull materials: - filter: constraints: - final_energy_per_atom <= 0 - initial_structure.lattice.volume > 0 - initial_structure.lattice.a > 0 - initial_structure.lattice.b > 0 - initial_structure.lattice.c > 0 - initial_structure.lattice.matrix size 3 - formation_energy_per_atom <= 5 - formation_energy_per_atom > -5 - cpu_time > 5 - e_above_hull > -0.000001 - final_energy < 0 - reduced_cell_formula size$ nelements # Check num. ICSD sources for selected compounds - filter: - task_id = "mp-540081" constraints: - icsd_id size> 10 - filter: - task_id = "mp-20379" constraints: - icsd_id size 1 - filter: - task_id = "mp-13634" constraints: - icsd_id size> 0 - filter: - task_id = "mp-600022" constraints: - icsd_id size 0 # NiO2 phases should never become stable - filter: - e_above_hull = 0 constraints: - pretty_formula != 'NiO2' tasks: - filter: - state = "successful" constraints: - output.final_energy_per_atom <= 0
  • 6. Validation summary Easy-to-use, integrated, efficient tools to report errors Next steps – Record all check results in DB – More sophisticated checks (Map/Reduce) – Make it easier to add new checks internally – Make it easier to add new check for anyone • per-sandbox or even per-user ("MP Alerts")
  • 7. Provenance: How do I know that the data is correct?
  • 8. Types of provenance in the system 1) Calculation workflows – FireWorks records calculation inputs, .. results in great detail 2) External datasets – Structure Notation Language standardizes the naming of data sources and publications 3) Post-calculation data transformations – New "builders" provides framework for tracking creation of final database products (1) (2) (3)
  • 10. Provenance in DB Structure Notation Language "snl_final": { "about": { "created_at": { "string": "2014-02-22 19:07:00.383869", "@class": "datetime", "@module": "datetime" }, "_materialsproject": { "submission_id": 52621, "snl_id": 398676, "spacegroup": { "lattice_type": "tetragonal", "symbol": "P4_2/mmc", "number": 131, "point_group": "4/mmm", "crystal_system": "tetragonal", "hall": "-P 4c 2" } }, "_cedergroup": { "BURP_sids": [ 409544, 409545, 409546 ], "icsd_ids": [ ], "e_above_hull": 0.075125350000000423734 }, "references": "", "authors": [ { "name": "Geoffroy Hautier", "email": "geoffroy.hautier@uclouvain .be" }, { "name": "Bo Xu", "email": "boxu14@mit.edu" } ], "remarks": [ "supplementary compounds from MIT matgen database" ], "projects": [ "MIT matgen" ], "history": [ { "url": "http://www.fiz- karlsruhe.de/icsd_home.htm l", "name": "Inorganic Crystal Structure Database", "description": { "Collection code": 24692 } }, { "url": "", "name": "", "description": { "source": null, "orig_name": "Basic substitution code.", "formula": "O1 Pd1" } }, { "url": "http://ceder.mit.edu/", "name": "MIT Ceder group research database", "description": { "source": 105986, "orig_name": "", "formula": "FeO" } }, { "url": "http://www.materialsproject. org", "name": "Materials Project structure optimization", "description": { "fw_id": 820305, "task_type": "GGA optimize structure (2x)", "task_id": "mp- 753682" } }, { "url": "http://www.materialsproject. org", "name": "Materials Project structure optimization", "description": { "fw_id": 820308, "task_type": "GGA+U optimize structure (2x)", "task_id": "mp- 776678" } } ] }, Metadata Crystal DB sources References History of structure optimizations
  • 11. Future work: unified view of provenance VASP result ICSD VASP result VASP result Post- processing Material properties Computation Data import processing e.g., Defects
  • 14. Sandboxes = Database + Apps Core data Core data + multivalent materials Non- JCESR users JCESR users
  • 15. Technical challenges • Pre-process data for real-time search • Interfaces for per-user access control – https://materialsproject.org/materials/1234?san dbox=jcesr – Web UI elements and
  • 16. Future: dynamic sandbox creation Current: – Large & significant additional data / apps • e.g., JCESR – Longer-term connections to MP data • e.g. porous materials – Companies • e.g. VW/Stanford Future small collab. per-user? CoD?
  • 17. Summary • Validation – guard against bugs by checking all data daily and at data import/creation time • Provenance – universal standard for annotating data provenance • Sandboxes – unified view of distinct databases – onramp for new collaborations and data

Editor's Notes

  1. Picture of 1915 Heinrich Campendonk painting, "Landscape with horses". Steve Martin paid $850K for a forged version of the painting, from a reputable art house in Paris, in 2004. He sold it at a loss of $250K before discovering it was a forgery. The forgery was performed by Wolfgang Beltracchi.
  2. Sandboxes are a way to share preliminary data in the context of MP data and tools.