Network Materials Data Discovery Cloud

computationinstitute.org
Networking materials data
Ian Foster
foster@anl.gov
ianfoster.org

Materials Innovation Infrastructure
A data sharing system to facilitate:
• Use of a broader set of data to render more accurate models
• Multi-disciplinary communication among scientists and engineers working on different stages of materials development
• Searches for advanced materials with specific, desired properties
• Curating and sharing of reliable computational code for modeling and simulation
Credit: Meredith Drosback, OSTP
Computation
Data
Experiment

Data: Rare treasure?
http://www.thejakartapost.com/news/2011/05/14/holy-water.html

It’s both …
We must manage the data deluge, both to enhance user productivity and to increase data capture
Network materials data
Or chaotic deluge?
Wellington bucket fountain: https://www.youtube.com/watch?v=_p_FNNDu16w

Linking simulation and experiment
to study disordered structures
Diffuse scattering images from Ray Osborn et al., Argonne

Sample
Experimental scattering
Material composition: La 60%, Sr 40%
Simulated structure
Simulated scattering
Detect errors (secs to mins)
Knowledge base: past experiments; simulations; literature; expert knowledge
Select experiments (mins to hours)
Contribute to knowledge base
Simulations driven by experiments (mins to days)
Knowledge-driven decision making
Evolutionary optimization
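
The loop above compares simulated scattering with experimental scattering and adjusts the simulated structure by evolutionary optimization. A minimal sketch of that idea follows. It is not the Argonne workflow: the scattering model, the single `disorder` parameter, and every function name here are hypothetical stand-ins chosen so the example runs on its own.

```python
import random


def simulate_scattering(disorder, n_points=32):
    """Toy stand-in for a scattering simulation (hypothetical model).

    The intensity is a mix of a sharp peak and a flat diffuse background;
    `disorder` in [0, 1] sets the weight of the diffuse part.
    """
    center = n_points / 2
    return [
        (1.0 - disorder) / (1.0 + (q - center) ** 2) + disorder / n_points
        for q in range(n_points)
    ]


def mismatch(simulated, measured):
    """Sum of squared differences between two intensity profiles."""
    return sum((s - m) ** 2 for s, m in zip(simulated, measured))


def evolve(measured, pop_size=20, keep=5, generations=50, sigma=0.05, seed=0):
    """Simple elitist evolutionary search over the disorder parameter."""
    rng = random.Random(seed)
    population = [rng.random() for _ in range(pop_size)]

    def score(d):
        return mismatch(simulate_scattering(d), measured)

    for _ in range(generations):
        # Keep the best candidates unchanged, refill with mutated copies.
        population.sort(key=score)
        parents = population[:keep]
        children = [
            min(1.0, max(0.0, rng.choice(parents) + rng.gauss(0.0, sigma)))
            for _ in range(pop_size - keep)
        ]
        population = parents + children
    return min(population, key=score)


# Stand-in for an experimental measurement: generated from a known value
# so the recovered parameter can be checked.
measured = simulate_scattering(0.4)
best = evolve(measured)
print(f"recovered disorder parameter: {best:.3f}")
```

In a real setting the simulation step would dominate the cost, which is why the slide quotes minutes to days for simulations driven by experiments.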

An expensive business …
Network engineer
Parallel programmer
Software engineer
Database architect
Database manager
Software engineer
Data engineer
Parallel programmer
Postdoc
Postdoc

A small business, 20 years ago
Secretary
HR manager
Marketing
Database manager
Accountant
IT department
Personal assistant
Shipping department
Intern
Payroll

A small business, today
“Business cloud”
Reduce costs
Speed innovation
Reliable, scalable, simple

Can we do the same for research?
“Discovery cloud”
Reduce costs
Speed discovery
Reliable, scalable, simple
?

Discovery cloud: Globus research data management services
File transfer & sharing
www.globus.org

Sample
Experimental scattering
Material composition: La 60%, Sr 40%
Simulated structure
Simulated scattering
Globus transfer service
Cloud hosted: reliable, secure, fast
20K users, 3B files, 50 PB transferred
Available at www.globus.org

File transfer & sharing
Identity & group management
www.globus.org

Sample
Experimental scattering
Material composition: La 60%, Sr 40%
Simulated structure
Simulated scattering
Evolutionary optimization
Globus sharing
Identities, groups, profiles
Cloud hosted

File transfer & sharing
Data publication & discovery
Identity & group management
www.globus.org

Sample
Experimental scattering
Material composition: La 60%, Sr 40%
Simulated structure
Simulated scattering
Knowledge base: past experiments; simulations; literature; expert knowledge
Contribute to knowledge base
Knowledge-driven decision making
Globus data publication and discovery
Cloud hosted

Data publication and discovery
We are looking for pilot users!
Community
Collection
Policies: metadata, access control, license, storage, curation workflow
Datasets (three shown), each with data and metadata
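
The publication model on this slide (a community holds collections, a collection carries policies, and each dataset pairs data with metadata) can be sketched as a small data structure. This is an illustration only, not the Globus publication API: the class names, field names, and the `search` helper are assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Policies:
    """Per-collection policies named on the slide (field types are assumed)."""
    metadata_schema: str
    access_control: str
    license: str
    storage: str
    curation_workflow: str


@dataclass
class Dataset:
    """A dataset pairs data (here, file paths) with descriptive metadata."""
    metadata: Dict[str, str]
    files: List[str]


@dataclass
class Collection:
    name: str
    policies: Policies
    datasets: List[Dataset] = field(default_factory=list)


@dataclass
class Community:
    name: str
    collections: List[Collection] = field(default_factory=list)

    def search(self, keyword: str) -> List[Dataset]:
        """Hypothetical helper: match a keyword across all collections."""
        keyword = keyword.lower()
        return [
            ds
            for coll in self.collections
            for ds in coll.datasets
            if any(keyword in value.lower() for value in ds.metadata.values())
        ]


# Placeholder values, for illustration only.
policies = Policies("dublin-core", "group-restricted", "example-license",
                    "example-endpoint", "curator-review")
collection = Collection("example collection", policies)
collection.datasets.append(
    Dataset({"title": "Example scattering dataset"}, ["run_001.h5"])
)
community = Community("example community", [collection])
hits = community.search("scattering")
```

Attaching policies to the collection rather than to each dataset mirrors the slide: every dataset submitted to a collection is handled under the same metadata, access, license, storage, and curation rules.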

Publish dashboard

Start a new submission

Describe submission:
1) Dublin Core
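
For readers unfamiliar with Dublin Core: the element set defines fifteen general-purpose descriptive fields. A minimal sketch of a record check follows, assuming a plain dictionary as the record format. The `check_record` helper and the choice of required fields are hypothetical, and the record values are placeholders rather than a real submission.

```python
# The fifteen elements of the Dublin Core Metadata Element Set.
DUBLIN_CORE_ELEMENTS = (
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
)


def check_record(record, required=("title", "creator", "date")):
    """Return (unknown, missing) field names for a metadata record.

    `required` is an assumption for this example; a real collection
    would define its own required fields in its metadata policy.
    """
    unknown = sorted(set(record) - set(DUBLIN_CORE_ELEMENTS))
    missing = [name for name in required if not record.get(name)]
    return unknown, missing


# Placeholder record, for illustration only.
record = {
    "title": "Example dataset title",
    "creator": "Example Author",
    "beamline": "not a Dublin Core element",
}
unknown, missing = check_record(record)
print("unknown fields:", unknown)
print("missing required fields:", missing)
```

Fields outside the fifteen elements, such as the `beamline` placeholder here, are the kind of domain-specific detail that the separate science metadata step would capture.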

Describe submission:
2) Science metadata

Assemble the dataset

Transfer files to submission endpoint

Check dataset is assembled correctly

Submission now in curation workflow

Search published datasets

Search across collections

Discover a published dataset

Select a published dataset

View downloaded dataset

File transfer & sharing
Data publication & discovery
Simulation & data analysis
Identity & group management
www.globus.org

Simulation and data analysis: Point and click parallelism
Tool shed: simulation models & analysis tools
Data space: local and remote datasets
Workflows: link data, tools in reusable form
Capture domain knowledge: data and code
Reusable workflows encode commonly used modeling and analysis pipelines
Large simulation campaigns
Builds on widely used Galaxy, Globus, and Swift systems
galaxyproject.org ✧ globus.org ✧ swift-lang.org
Hosted on Amazon cloud for reliable, on-demand access and scalability

Discovery Cloud:
Three common themes
1) Accelerate discovery via automation
2) Slash costs of trying new methods
– No local software installation
– No need to read manual
– On-demand, elastic scalability
– Low operational costs, proactive support
3) Make data preservation trivial

Take away messages
• Data has a dual nature: rare treasure
and chaotic deluge
• MGI must embrace this duality
– Treasure: Store, curate, index, preserve
– Deluge: Slash management costs, to both
accelerate use & facilitate data preservation
• Cloud services can help in both areas

Thanks to great colleagues and collaborators
• Rachana Ananthakrishnan, Ben Blaiszik, Kyle Chard, Raj Kettimuthu, Ravi Madduri, Tanu Malik, Steve Tuecke, Justin Wozniak, and other CS colleagues
• Ray Osborn, Francesco de Carlo, Chris Jacobsen, Nicola Ferrier, and other Argonne scientists
• Juan de Pablo, Peter Voorhees, and other NIST CHiMaD participants

Thank you to our sponsors!
