Funding: 2018 Argonne Advanced Computing LDRD
Collaborators: Ryan Chard, Logan Ward, Marcus Schwarting, Kyle Chard, Zhuozhao Li, Anna
Woodard, Yadu Babuji, Steve Tuecke, Mike Franklin, Ian Foster
Blue – also presenting at this workshop
Data and Learning Hub for Science
https://www.dlhub.org
A FAIR Approach to Publishing and
Sharing Machine Learning Models
Ben Blaiszik (blaiszik@uchicago.edu)
Quick Polls
• How many of you have trained a machine learning model?
• How many of you have published papers using machine learning?
• How many of you have tried to reuse models from others?
State of Machine Learning in Science
Highs
• Rapid increase in number of
journal publications
• Advances across the scientific
domains
• Achievements on par with experts
or best-in-class methods in many
domains
• Funding agencies are coalescing
around ML (AI Initiative etc.)
Chart Source and Method:
https://github.com/blaiszik/ml_publication_charts
State of Machine Learning in Science
Lows
For a given model:
• Where is the code?
• Where are the trained models?
• Where is the training data?
• How can I reproduce these
results?
Without all of these pieces,
progress is drastically slowed
Location of many ML models after a
paper is finished
GitHub is another location…
FAIR Data Principles
• Findable
• Accessible
• Interoperable
• Reusable
https://www.force11.org/group/fairgroup/fairprinciples
Set of principles to help make data as
useful as possible to the community
FAIR Data Principles
Findable
• Data have an identifier
• Data are registered in a searchable resource
Accessible
• Data accessible via identifier
• Data retrievable by open protocols
FAIR Data Principles
Interoperable
• Data leverage formalized shared vocabularies
• Vocabularies themselves follow FAIR principles
Reusable
• Clear licensing
• Descriptive metadata is sufficient to promote
reuse
What Would FAIR Look Like in ML?
(1) Find Interesting Science Paper
• Links to code repository
(GitHub/DOI)
• Links to data repository (DOI)
• Publication describes the model
and its uses and limitations
What Would FAIR Look Like in ML?
(2) Find Code
• Has unique identifier (DOI)
• Links back to publication
(DOI)
• Has well-documented code
• Tagged with metadata to aid
discovery
• Registered in a search index
• Open license
What Would FAIR Look Like in ML?
(3) Find and Run Model
• Model has identifier (DOI)
• Model has links to data (DOI)
• Model has links to the code
(DOI/GitHub)
• Model has links to publication
(DOI)
• Data are accessible
• Inference run from the cloud - no
installation necessary!
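The checklist on the three "What Would FAIR Look Like in ML?" slides can be read as a set of links that each model should carry. The sketch below is only an illustration of that idea: the `ModelRecord` class, its field names, and the placeholder identifiers are assumptions made for this example and are not part of DLHub or any FAIR specification.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ModelRecord:
    """Hypothetical record of the links a FAIR model would carry."""
    title: str
    model_doi: Optional[str] = None        # identifier for the model itself
    data_doi: Optional[str] = None         # link to the training data
    code_url: Optional[str] = None         # link to the code (DOI or GitHub)
    publication_doi: Optional[str] = None  # link to the paper
    license: Optional[str] = None          # name of an open license


def missing_links(record: ModelRecord) -> List[str]:
    """Return the names of the FAIR-related fields that are still empty."""
    checked = ["model_doi", "data_doi", "code_url", "publication_doi", "license"]
    return [name for name in checked if not getattr(record, name)]


# Placeholder values for illustration only; none of these identifiers are real.
record = ModelRecord(
    title="Example formation enthalpy model",
    model_doi="10.0000/example-model",
    code_url="https://example.org/code",
)
print(missing_links(record))
```

A record with an empty result from `missing_links` would satisfy the linking items on the checklist; whether the data are actually accessible still has to be verified separately.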
DATA AND LEARNING HUB FOR SCIENCE (DLHUB)
Goal: Deliver FAIR for ML
• Collect, publish, categorize models and pre/post processing code
• Operate models as a service to simplify sharing, consumption, and
access
• Identify models with unique and persistent identifiers (e.g., DOI)
• Implement versioning, search, access controls etc.
DLHub: Key Concepts
Run()
• Servables are containers with defined
inputs and outputs
• Servables may represent machine
learning models or other data
transformations
• Outputs can be cached for inputs
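To make the servable idea concrete, here is a toy sketch of a function wrapped with declared input and output types and a cache keyed on the input. It is not how DLHub implements servables (those are containers); the `Servable` class and the `count_elements` example are hypothetical names chosen for this illustration.

```python
import json
from typing import Any, Callable, Dict


class Servable:
    """Toy stand-in for a servable: a function plus declared input/output types."""

    def __init__(self, name: str, func: Callable[[Any], Any],
                 input_type: type, output_type: type) -> None:
        self.name = name
        self.func = func
        self.input_type = input_type
        self.output_type = output_type
        self._cache: Dict[str, Any] = {}
        self.calls = 0  # counts real executions, not cache hits

    def run(self, data: Any) -> Any:
        if not isinstance(data, self.input_type):
            raise TypeError(f"{self.name} expects {self.input_type.__name__}")
        key = json.dumps(data, sort_keys=True)  # cache key derived from the input
        if key not in self._cache:
            self.calls += 1
            result = self.func(data)
            if not isinstance(result, self.output_type):
                raise TypeError(f"{self.name} must return {self.output_type.__name__}")
            self._cache[key] = result
        return self._cache[key]


# Counts capital letters as a crude proxy for the number of elements in a formula.
count_elements = Servable(
    "count_elements",
    lambda formula: sum(ch.isupper() for ch in formula),
    str,
    int,
)
first = count_elements.run("NaCl")
second = count_elements.run("NaCl")  # served from the cache
print(first, second, count_elements.calls)
```

Declaring the types up front is what lets a service reject malformed requests before any model code runs, and the cache means repeated inputs cost nothing after the first call.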
Preprocess 1
Run()
Preprocess 2
Run()
Model predict
Run()
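The diagram above chains two preprocessing steps and a prediction step, each with its own run call. A minimal sketch of that chaining follows, assuming each step is an ordinary Python callable; the step functions and the arithmetic in `predict` are placeholders invented for this example, not a trained model or DLHub code.

```python
from typing import Any, Callable, List, Tuple

Step = Tuple[str, Callable[[Any], Any]]


def run_pipeline(steps: List[Step], data: Any) -> Any:
    """Feed the output of each step into the next one, in order."""
    for _name, step in steps:
        data = step(data)
    return data


def clean(text: str) -> str:
    """Preprocess 1: strip surrounding whitespace."""
    return text.strip()


def featurize(formula: str) -> List[int]:
    """Preprocess 2: turn a formula string into two crude numeric features."""
    return [len(formula), sum(ch.isupper() for ch in formula)]


def predict(features: List[int]) -> float:
    """Model predict: placeholder arithmetic, not a trained model."""
    return 0.5 * features[0] + features[1]


pipeline = [("preprocess_1", clean), ("preprocess_2", featurize), ("model_predict", predict)]
print(run_pipeline(pipeline, "  NaCl "))
```

Because every step has the same call shape, preprocessing code can be published and reused on its own, which is the point of treating transformations and models uniformly.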
Example: Predicting Formation Enthalpy
This is what a user has
This is what a user wants
PUBLISHING A MACHINE LEARNING MODEL
Marking up a Model – Python SDK
Existing Model
User Mark Up with
SDK
Send to DLHub
(via Globus or HTTPS)
DLHub
Containerization
Populate Search
Index / Mint
Identifiers
SDK Extracts Metadata
for Known Model
Types
Python SDK – Automated Metadata Generation
Citation Metadata (following DataCite)
DLHub Metadata
Servable Metadata
Access Control
• Public
• Globus users
• Globus groups
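The three access levels listed above can be pictured as a simple policy check. The sketch below is an assumption-laden illustration: `AccessPolicy`, `can_access`, and the identifiers are hypothetical, and a real deployment would rely on Globus Auth rather than plain string comparison.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class AccessPolicy:
    """Hypothetical access policy for one servable."""
    public: bool = False
    users: Set[str] = field(default_factory=set)   # individual user identities
    groups: Set[str] = field(default_factory=set)  # group identifiers


def can_access(policy: AccessPolicy, user_id: str, user_groups: Set[str]) -> bool:
    """Allow access if the servable is public, or the user or one of their groups is listed."""
    if policy.public:
        return True
    if user_id in policy.users:
        return True
    return bool(policy.groups & user_groups)


# Placeholder identities for illustration only.
policy = AccessPolicy(users={"alice@example.org"}, groups={"materials-lab"})
print(can_access(policy, "alice@example.org", set()))
print(can_access(policy, "bob@example.org", {"materials-lab"}))
print(can_access(policy, "carol@example.org", {"other-group"}))
```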
Using DLHub is Easy!
Python SDK
$ pip install dlhub_sdk
1. Describe
• Specify the model files
• Mark up the model with
information to make it
discoverable and usable
2. Publish
• Publish to DLHub
• DLHub service creates
containers
• DLHub service creates unique
endpoint for servable
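As a rough picture of what the describe step produces, the sketch below assembles a metadata document with the three blocks named earlier in the deck (citation, DLHub, and servable metadata). It deliberately does not call `dlhub_sdk`; the function `describe_model` and every key name in the dictionary are assumptions for illustration and should not be taken as the SDK's actual schema.

```python
import json
from typing import Dict, List


def describe_model(title: str, authors: List[str], files: List[str],
                   input_type: str, output_type: str) -> Dict[str, dict]:
    """Build a hypothetical metadata document with citation, hub, and servable blocks."""
    return {
        # Citation block, loosely modeled on DataCite-style titles and creators.
        "citation": {
            "titles": [{"title": title}],
            "creators": [{"name": name} for name in authors],
        },
        # Hub block: which files make up the model and who may see it.
        "dlhub": {"files": list(files), "visible_to": ["public"]},
        # Servable block: what goes in and what comes out.
        "servable": {
            "input": {"type": input_type},
            "output": {"type": output_type},
        },
    }


# Placeholder title, author, and file name for illustration only.
metadata = describe_model(
    title="Example formation enthalpy model",
    authors=["Example Author"],
    files=["model.pkl"],
    input_type="string",
    output_type="float",
)
print(json.dumps(metadata, indent=2))
```

Keeping the citation, hub, and servable information in separate blocks lets a search index consume the first, the service consume the second, and callers consume the third.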
Using DLHub is Easy!
3. Discover
• Discover servables with
advanced search capabilities
through Python SDK or web
UI (under construction)
4. Run
• Make predictions by sending
data to DLHub and
specifying the servable to
use
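The discover and run steps can be sketched with an in-memory registry standing in for the search index and the hosted service. Everything here is hypothetical: the `search` and `run` helpers, the registry layout, and the servable names are invented for this example and do not reflect the DLHub SDK's real calls.

```python
from typing import Any, Dict, List

Registry = Dict[str, Dict[str, Any]]


def search(registry: Registry, term: str) -> List[str]:
    """Return the names of servables whose description mentions the term."""
    term = term.lower()
    return sorted(
        name for name, entry in registry.items()
        if term in entry["description"].lower()
    )


def run(registry: Registry, name: str, data: Any) -> Any:
    """Invoke the named servable on the given input."""
    return registry[name]["func"](data)


# Placeholder servables for illustration only.
registry: Registry = {
    "example_user/formula_length": {
        "description": "Counts characters in a chemical formula",
        "func": len,
    },
    "example_user/upper": {
        "description": "Upper-cases a text label",
        "func": str.upper,
    },
}

matches = search(registry, "formula")
print(matches, run(registry, matches[0], "NaCl"))
```

The caller only needs a servable name and input data; where and how the model executes stays on the service side.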
NEXT STEPS
Combining DLHub with Data Repositories
Get Data
Run Model
• Using high-throughput optical
imaging to predict material
bandgap
Model-in-the-Loop Science
Select DLHub Use Cases
• Crystal structure • NIST PFHub
• Models linked to dynamic data sources
Community Model Benchmarking
Automated Model Retraining with New
Data
• Metallic glass discovery [active learning]
• XRD applications
XRD image tagging
(Yager, BNL)
(Ward, ANL/UC)
(Ward, ANL/UC) (Wheeler, Warren, Heinonen
NIST/UC/Argonne/NU)
(Center for Hierarchical Materials
Design NIST/UC/Argonne/NU)
CH MaD
XRD intensity → structure/phase
(Cherukara Argonne)
More Examples Available In Our Repositories
Cherukara et al.
Energy Storage Tomography X-Ray Science
Ward et al.
TomoGAN
Liu et al.
DLHub Architecture and Performance
• Task Managers (TM) to support
execution on various compute
resources
• Executors chosen by TM to invoke a
given servable
• Caching at TM
• Data staging with Globus
• Batch submissions
• Scalability through deployment of
model replicas
https://arxiv.org/abs/1811.11213
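Three of the architecture points above (caching at the Task Manager, batch submissions, and scaling through model replicas) can be illustrated with a small in-process sketch. This is a simplification under stated assumptions: the `TaskManager` class here is hypothetical, replicas are plain callables rather than containers, and round-robin dispatch is chosen for the example rather than taken from the DLHub design.

```python
import itertools
from typing import Callable, Dict, Hashable, List, Sequence

Replica = Callable[[Hashable], object]


class TaskManager:
    """Toy task manager: caches results and spreads work across replicas."""

    def __init__(self, replicas: Sequence[Replica]) -> None:
        self._next_replica = itertools.cycle(replicas)  # round-robin dispatch
        self._cache: Dict[Hashable, object] = {}
        self.cache_hits = 0

    def submit(self, item: Hashable) -> object:
        if item in self._cache:
            self.cache_hits += 1
            return self._cache[item]
        replica = next(self._next_replica)
        result = replica(item)
        self._cache[item] = result
        return result

    def submit_batch(self, items: Sequence[Hashable]) -> List[object]:
        """Process a whole batch in one call, preserving input order."""
        return [self.submit(item) for item in items]


def make_replica(log: List[Hashable]) -> Replica:
    """Create a replica that records which inputs it handled."""
    def replica(item):
        log.append(item)
        return item * 2  # placeholder for model inference
    return replica


log_a: List[Hashable] = []
log_b: List[Hashable] = []
manager = TaskManager([make_replica(log_a), make_replica(log_b)])
results = manager.submit_batch([1, 2, 3, 1])
print(results, log_a, log_b, manager.cache_hits)
```

In this run the repeated input never reaches a replica, and the remaining work alternates between the two replicas, which is the behavior the caching and replica bullets describe.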
[Figure: DLHub architecture diagram. Labels: CLI and SDK clients, REST, DLHub Management Service, Model Repository, zmq, Task Managers, Executors (Parsl, TF Serving, SageMaker), servable nodes.]
Ryan Chard Zhuozhao Li
Open Source Opportunities
https://www.dlhub.org
https://github.com/DLHub-Argonne
• Deposit models from the community
• Help build client functionality
• Build examples using existing servables
• Be you!
Contact: Ben Blaiszik (blaiszik@uchicago.edu)
Thanks to our sponsors!
U.S. DEPARTMENT OF
ENERGY
ALCF DF
Parsl Globus IMaD
DLHub Argonne
LDRD

A FAIR Approach to Publishing and Sharing Machine Learning Models

  • 1. Funding: 2018 Argonne Advanced Computing LDRD Collaborators: Ryan Chard, Logan Ward, Marcus Schwarting, Kyle Chard, Zhuozhao Li, Anna Woodard, Yadu Babuji, Steve Tuecke, Mike Franklin, Ian Foster Blue – also presenting at this workshop Data and Learning Hub for Science https://www.dlhub.org A FAIR Approach to Publishing and Sharing Machine Learning Models Ben Blaiszik (blaiszik@uchicago.edu)
  • 2. Quick Polls • How many of you have trained a machine learning model? • How many of you have published papers using machine learning? • How many of you have tried to reuse models from others?
  • 3. State of Machine Learning in Science Highs • Rapid increase in number of journal publications • Advances across the scientific domains • Achievements on par with experts or best-in-class methods in many domains • Funding agencies are coalescing around ML (AI Initiative etc.) Chart Source and Method: https://github.com/blaiszik/ml_publication_charts
  • 4. State of Machine Learning in Science For a given model: • Where is the code? • Where are the trained models? • Where is the training data? • How can I reproduce these results? Without all of these pieces, progress is drastically slowed Location of many ML models after a paper is finished Github is another location… Lows
  • 5. FAIR Data Principles • Findable • Accessible • Interoperable • Reusable https://www.force11.org/group/fairgroup/fairprinciples Set of principles to help make data as useful as possible to the community
  • 6. FAIR Data Principles Findable • Data have an identifier • Data are registered in a searchable resource Accessible • Data accessible via identifier • Data retrievable by open protocols
  • 7. FAIR Data Principles Interoperable • Data leverage formalized shared vocabularies • Vocabularies themselves follow FAIR principles Reusable • Clear licensing • Descriptive metadata is sufficient to promote reuse
  • 8. What Would FAIR Look Like in ML? (1) Find Interesting Science Paper • Links to code repository (Github/DOI) • Links to data repository (DOI) • Publication describes the model and its uses and limitations
  • 9. What Would FAIR Look Like in ML? (2) Find Code • Has unique identifier (DOI) • Links back to publication (DOI) • Has well-documented code • Tagged with metadata to aid discovery • Registered in a search index • Open license
  • 10. What Would FAIR Look Like in ML? (3) Find and Run Model • Model has identifier (DOI) • Model has links to data (DOI) • Model has links to the code (DOI/Github) • Model has links to publication (DOI) • Data are accessible • Inference run from the cloud - no installation necessary!
  • 11. DATA AND LEARNING HUB FOR SCIENCE (DLHUB) Goal: Deliver FAIR for ML • Collect, publish, and categorize models and pre/post-processing code • Operate models as a service to simplify sharing, consumption, and access • Identify models with unique and persistent identifiers (e.g., DOI) • Implement versioning, search, access controls, etc.
  • 12. DLHub: Key Concepts Run() • Servables are containers with defined inputs and outputs • Servables may represent machine learning models or other data transformations • Outputs can be cached for inputs
  • 13. DLHub: Key Concepts • Servables are containers with defined inputs and outputs • Servables may represent machine learning models or other data transformations • Outputs can be cached for inputs Preprocess 1 Run() Preprocess 2 Run() Model predict Run()
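The servable idea on slides 12 and 13 (defined inputs and outputs, a Run() entry point, cached outputs, chained preprocessing and prediction steps) can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the `Servable` class, `run_pipeline`, and the toy stage functions are hypothetical names invented here, and real DLHub servables are containers rather than in-process functions.

```python
from typing import Any, Callable, Dict, List


class Servable:
    """Hypothetical stand-in for a servable: a named function with cached outputs."""

    def __init__(self, name: str, func: Callable[[Any], Any]) -> None:
        self.name = name
        self.func = func
        self.calls = 0                     # number of times func actually executed
        self._cache: Dict[str, Any] = {}   # keyed by repr(input) so lists work too

    def run(self, data: Any) -> Any:
        key = repr(data)
        if key not in self._cache:
            self.calls += 1
            self._cache[key] = self.func(data)
        return self._cache[key]


def run_pipeline(steps: List[Servable], data: Any) -> Any:
    """Feed the output of each servable into the next one."""
    for step in steps:
        data = step.run(data)
    return data


# Toy stages (assumptions for illustration only): split a string,
# turn tokens into numbers, then "predict" by averaging.
parse = Servable("preprocess_1", lambda s: s.split())
featurize = Servable("preprocess_2", lambda tokens: [len(t) for t in tokens])
predict = Servable("model_predict", lambda feats: sum(feats) / len(feats))

pipeline = [parse, featurize, predict]
first = run_pipeline(pipeline, "Na Cl")
second = run_pipeline(pipeline, "Na Cl")   # every stage answers from its cache
```

Keying the cache on the input means a repeated request never re-executes a stage, which is the behavior the "outputs can be cached for inputs" bullet describes.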
  • 14. Example: Predicting Formation Enthalpy This is what a user has This is what a user wants
  • 15. Example: Predicting Formation Enthalpy This is what a user has This is what a user wants
  • 16. PUBLISHING A MACHINE LEARNING MODEL
  • 17. Marking up a Model – Python SDK Existing Model User Mark Up with SDK Send to DLHub (via Globus or HTTPS) DLHub Containerization Populate Search Index / Mint Identifiers SDK Extracts Metadata for Known Model Types
  • 18. Python SDK – Automated Metadata Generation Citation Metadata Following Datacite DLHub Metadata Servable Metadata Access Control • Public • Globus users • Globus groups
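The three access levels listed on slide 18 (public, specific Globus users, Globus groups) amount to a small visibility check on each servable record. The sketch below is a guess at how such a check could look; the `ServableRecord` fields and the `can_access` helper are hypothetical names and are not taken from the DLHub SDK or the DataCite schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class ServableRecord:
    # Citation-style fields (illustrative subset, names assumed)
    title: str
    creators: List[str]
    publication_year: int
    # Access control fields (names assumed)
    visible_to_public: bool = False
    allowed_users: Set[str] = field(default_factory=set)
    allowed_groups: Set[str] = field(default_factory=set)


def can_access(record: ServableRecord, user: Optional[str], groups: Set[str]) -> bool:
    """Return True if the record is public or the caller matches a user or group."""
    if record.visible_to_public:
        return True
    if user is not None and user in record.allowed_users:
        return True
    return bool(groups & record.allowed_groups)


public_model = ServableRecord("Toy model A", ["Example Author"], 2018,
                              visible_to_public=True)
group_model = ServableRecord("Toy model B", ["Example Author"], 2018,
                             allowed_groups={"example-lab"})
```

Here an anonymous caller can see `public_model`, while `group_model` is visible only to callers whose group set includes `example-lab`.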
  • 19. Using DLHub is Easy! Python SDK: $ pip install dlhub_sdk (1) Describe • Specify the model files • Mark up the model with information to make it discoverable and usable (2) Publish • Publish to DLHub • DLHub service creates containers • DLHub service creates unique endpoint for servable
  • 20. Using DLHub is Easy! (3) Discover • Discover servables with advanced search capabilities through Python SDK or web UI (under construction) (4) Run • Make predictions by sending data to DLHub and specifying the servable to use
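The four steps on slides 19 and 20 (describe, publish, discover, run) can be walked through against a small in-memory stand-in. The `MockHub` class and its `publish`, `search`, and `run` methods are hypothetical and deliberately do not mirror the real `dlhub_sdk` interface; the metadata keys and the toy model are likewise assumptions made for illustration.

```python
import uuid
from typing import Any, Callable, Dict, List


class MockHub:
    """In-memory stand-in for a model hub. Not the dlhub_sdk API."""

    def __init__(self) -> None:
        self._servables: Dict[str, Dict[str, Any]] = {}

    def publish(self, metadata: Dict[str, Any], func: Callable[[Any], Any]) -> str:
        # Step 2: register the servable and mint a (mock) unique identifier.
        identifier = f"mock-{uuid.uuid4().hex[:8]}"
        self._servables[identifier] = {"metadata": metadata, "func": func}
        return identifier

    def search(self, term: str) -> List[str]:
        # Step 3: case-insensitive match against title and domain tags.
        term = term.lower()
        hits = []
        for identifier, entry in self._servables.items():
            meta = entry["metadata"]
            haystack = " ".join([meta["title"], *meta["domains"]]).lower()
            if term in haystack:
                hits.append(identifier)
        return hits

    def run(self, identifier: str, inputs: Any) -> Any:
        # Step 4: send inputs to the chosen servable and return its outputs.
        return self._servables[identifier]["func"](inputs)


# Step 1: describe the model (toy model and metadata, purely illustrative).
def toy_model(formulas: List[str]) -> List[int]:
    return [len(f) for f in formulas]


metadata = {
    "title": "Toy formation enthalpy predictor",
    "authors": ["Example Author"],
    "domains": ["materials science"],
}

hub = MockHub()
identifier = hub.publish(metadata, toy_model)
found = hub.search("enthalpy")
predictions = hub.run(identifier, ["NaCl", "H2O"])
```

The point of the sketch is the division of labor: the publisher supplies the model and descriptive metadata once, and every later user needs only a search term or an identifier plus input data.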
  • 22. Combining DLHub with Data Repositories Get Data Run Model • Using high-throughput optical imaging to predict material bandgap
  • 23. Combining DLHub with Data Repositories Get Data Run Model
  • 24. Model-in-the-Loop Science Select DLHub Use Cases • Crystal structure • NIST PFHub • Models linked to dynamic data sources Community Model Benchmarking Automated Model Retraining with New Data • Metallic glass discovery [active learning] • XRD applications XRD image tagging (Yager, BNL) (Ward, ANL/UC) (Ward, ANL/UC) (Wheeler, Warren, Heinonen NIST/UC/Argonne/NU) (Center for Hierarchical Materials Design NIST/UC/Argonne/NU) XRD intensity → structure/phase (Cherukara, Argonne)
  • 25. More Examples Available In Our Repositories Cherukara et al. Energy Storage Tomography X-Ray Science Ward et al. TomoGAN Liu et al.
  • 26. DLHub Architecture and Performance • Task Managers (TM) to support execution on various compute resources • Executors chosen by TM to invoke a given servable • Caching at TM • Data staging with Globus • Batch submissions • Scalability through deployment of model replicas https://arxiv.org/abs/1811.11213 [Architecture diagram labels: DLHub Management Service; REST; CLI; SDK; Model Repository; zmq; Task Manager; Executor; Parsl; TF Serving; SageMaker; Model Serving; Servable; Node] Ryan Chard, Zhuozhao Li
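Three of the bullets on slide 26 (caching at the Task Manager, batch submissions, and scaling through model replicas) can be illustrated with a toy dispatcher. This is a simplified sketch, not DLHub's implementation: the `Executor` and `TaskManager` classes here are hypothetical, replicas are chosen by simple round-robin (an assumption), and everything runs in one process rather than over zmq.

```python
import itertools
from typing import Any, Callable, Dict, Iterator, List, Tuple


class Executor:
    """Hypothetical replica of a servable that counts its own invocations."""

    def __init__(self, name: str, func: Callable[[Any], Any]) -> None:
        self.name = name
        self.func = func
        self.invocations = 0

    def invoke(self, x: Any) -> Any:
        self.invocations += 1
        return self.func(x)


class TaskManager:
    """Toy dispatcher: caches results and rotates requests across replicas."""

    def __init__(self) -> None:
        self._replicas: Dict[str, Iterator[Executor]] = {}
        self._cache: Dict[Tuple[str, str], Any] = {}

    def register(self, servable: str, executors: List[Executor]) -> None:
        # Round-robin over replicas is an assumption made for this sketch.
        self._replicas[servable] = itertools.cycle(executors)

    def run(self, servable: str, x: Any) -> Any:
        key = (servable, repr(x))
        if key not in self._cache:
            executor = next(self._replicas[servable])
            self._cache[key] = executor.invoke(x)
        return self._cache[key]

    def run_batch(self, servable: str, inputs: List[Any]) -> List[Any]:
        # A batch is one request carrying many inputs.
        return [self.run(servable, x) for x in inputs]


replica_a = Executor("replica_a", lambda x: x * x)
replica_b = Executor("replica_b", lambda x: x * x)

tm = TaskManager()
tm.register("square", [replica_a, replica_b])
results = tm.run_batch("square", [1, 2, 3, 2])   # the second 2 is a cache hit
```

With two replicas and one repeated input, only three invocations are needed for four requests, and they are spread across both replicas.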
  • 27. Open Source Opportunities https://www.dlhub.org https://github.com/DLHub-Argonne • Deposit models from the community • Help build client functionality • Build examples using existing servables • Be you! Contact: Ben Blaiszik (blaiszik@uchicago.edu)
  • 28. Thanks to our sponsors! U.S. DEPARTMENT OF ENERGY ALCF DF Parsl Globus IMaD DLHub Argonne LDRD