Using PID Graph to reproduce
research
Marcus Povey and Claudia Alen Amaro
FREYA Wrap up meeting, Amsterdam
November 2020
Instruct-ERIC is the single point of access to technology and expertise for
structural biology research.
The Instruct consortium comprises ten Instruct Centres that offer
access to 23 research sites across Europe.
Instruct has 15 Members that each pay an annual subscription to allow their scientists to access the
range of services that are available through Instruct.
1. Instruct Centre BE
2. Instruct Centre CZ
3. Instruct Centre ES
4. Instruct Centre FI
5. Instruct Centre FR1
6. Instruct Centre FR2
7. Instruct Centre IL
8. Instruct Centre IT
9. Instruct Centre NL
10. Instruct Centre UK
Belgium
Italy
Denmark
Netherlands
Czech Republic
Latvia
Finland
Slovakia
France
Spain
Israel
UK
Portugal
EMBL
Lithuania
Instruct’s extensive
service catalogue
encourages a
multidisciplinary
approach to structural
biology research.
Technology
Catalogue
Instruct offers access to
over 75 different
services, from sample
preparation to
biomolecular and 3D
structural analyses.
Instruct is an ESFRI landmark and an ERIC
We are working together with other Life
science research infrastructures in the
project EOSC-Life.
We have been collaborating with FREYA and
OpenAIRE, preparing to engage with EOSC
and to make sure that our data is FAIR
We have developed our own proposal
management system: ARIA
Our Mission
• ACCESS – Facilitating access to
cutting-edge research infrastructure and
methods
• FACILITY – Helping research
infrastructures manage their equipment,
and representing their interests
• COMMUNITY – Contributing to the wider
scientific community as a whole, and
helping researchers, projects and
infrastructures work better together
• DATA – Improving access to research data,
and facilitating Open Access
• ARIA Cloud!
Structural Biologists
use Microscopes
• 12 Samples (grids) in a loader
• Each grid can potentially have multiple
structures that are of interest (projects
with 96-well grids are underway)
• Outputs ~1–3 TB of HD video per day
Electron Microscopy
1. Researcher submits a proposal for access
2. Researcher produces a sample locally
3. Sample is loaded onto a grid
4. Grid goes into Electron Microscope
5. Micrographs go into pre-processor
6. Particle picking, auto & manual processing
7. Datasets are analysed by tens of software packages
8. 3D structure determined
9. Structure deposited into PDB/EM-DB
10. Researcher submits a publication to journal
There are a lot of
things we might want
to track…
• Number of sample grids
• Potentially multiplied by samples on
a grid
• Multiplied by grids in a microscope
• Multiplied by frames of video
• Multiplied by number of microscopes per
facility
• … multiplied by the number of facilities.
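To make the multiplication above concrete, here is a rough worked example. Only the figure of 12 grids per loader comes from these slides; the function name and every other number are assumed placeholders, so the result is an illustration of scale rather than a measurement.

```python
from math import prod


def trackable_items(grids_per_loader, samples_per_grid, frames_per_sample,
                    microscopes_per_facility, facilities):
    """Multiply the factors listed on the slide into one upper-bound count."""
    factors = [grids_per_loader, samples_per_grid, frames_per_sample,
               microscopes_per_facility, facilities]
    if any(f < 1 for f in factors):
        raise ValueError("every factor must be at least 1")
    return prod(factors)


# 12 grids per loader is from the slides; all other values are assumptions.
estimate = trackable_items(
    grids_per_loader=12,
    samples_per_grid=4,
    frames_per_sample=1000,
    microscopes_per_facility=2,
    facilities=10,
)
print(f"Items that might need an identifier: {estimate:,}")
```

Even with modest placeholder values the count runs to hundreds of thousands, which is why minting and linking identifiers would need to be automated.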
... But wait, there’s
more!
• We need to know the data processing
workflows used
• We need to identify samples and
associated metadata
• We need to know a given machine’s
configuration
• Software and software versions used to
process and analyse data
• Researchers involved in project
• Funding applications (proposals)
Structural Biologists
also use
Synchrotrons…
• Similarly large data volumes
• Similarly complex machine configurations
• Similarly complex data processing
workflow to produce results
Crystallography
1. Researcher submits a proposal for access
2. Researcher produces a sample locally
3. Sample added to crystal plate
4. Crystal plate imaged regularly
5. Crystals loaded onto pins
6. Crystals shot with X-Rays at synchrotron
7. Diffraction pattern auto-analysis and re-running
8. 3D structure determined
9. Structure deposited into PDB
10. Researcher submits a publication to journal
How can we help
make the research
reproducible?
• There are a lot of parts to keep track of!
• Data sets are often too large to practically
move about
• Machine configurations are often only
available on the machine itself
• Software gets modified
• How do we make this findable, accessible
and reusable?
• Can the PID graph help?
How this might work…
Building an
experimental session
• At the beginning of an experimental
session, mint a PID.
• It is also a good idea to reference the
institution in which this takes place, if
available
• Mint a PID for all relevant assets and data
as the session progresses
• (“Relevant” is highly application specific)
• Each asset PID “cites” the experimental
identifier
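The session-building steps above could be sketched roughly as follows. This is a minimal in-memory illustration, not an implementation of any real PID service: `mint_pid`, `Record`, `start_session` and `add_asset` are hypothetical names, the `example:` identifiers are placeholders, and the choice of the relation names "References" and "Cites" is an assumption.

```python
from dataclasses import dataclass, field
from itertools import count

_counter = count(1)


def mint_pid(kind):
    """Hypothetical stand-in for a PID service: returns a placeholder ID."""
    return f"example:{kind}/{next(_counter)}"


@dataclass
class Record:
    pid: str
    resource_type: str
    # (relation_type, target_pid) pairs; relation names are assumptions
    related: list = field(default_factory=list)


def start_session(institution_pid=None):
    """Mint a PID for the session and reference the institution if known."""
    session = Record(mint_pid("session"), "ExperimentalSession")
    if institution_pid is not None:
        session.related.append(("References", institution_pid))
    return session


def add_asset(session, resource_type):
    """Mint a PID for an asset and make it 'cite' the session identifier."""
    asset = Record(mint_pid("asset"), resource_type)
    asset.related.append(("Cites", session.pid))
    return asset


session = start_session(institution_pid="example:institution/1")
micrographs = add_asset(session, "Dataset")
config = add_asset(session, "MachineConfiguration")
for record in (session, micrographs, config):
    print(record.pid, record.related)
```

Because each asset points at the session rather than the other way round, new assets can be added at any time without updating the session record.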
Building a "research
bundle"
• Once the output is produced, a research
identifier is minted
• Which links one or more experimental
sessions
• A link is established between the
identifier and the output
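A bundle record of this kind might look like the sketch below. The dictionary layout loosely mirrors the `relatedIdentifiers` field mentioned later in these slides; the function name `make_bundle`, the placeholder identifiers and the use of "HasPart" and "IsSupplementTo" as relation types are assumptions made for illustration.

```python
def make_bundle(bundle_pid, session_pids, output_pid):
    """Build a bundle record linking sessions and a published output."""
    if not session_pids:
        raise ValueError("a bundle links one or more experimental sessions")
    related = [
        {"relationType": "HasPart", "relatedIdentifier": pid}
        for pid in session_pids
    ]
    # The link between the research identifier and the output.
    related.append(
        {"relationType": "IsSupplementTo", "relatedIdentifier": output_pid}
    )
    return {"id": bundle_pid, "relatedIdentifiers": related}


bundle = make_bundle(
    bundle_pid="example:bundle/1",
    session_pids=["example:session/1", "example:session/2"],
    output_pid="example:publication/1",
)
for link in bundle["relatedIdentifiers"]:
    print(link["relationType"], "->", link["relatedIdentifier"])
```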
Using a “Research
bundle”
• Interrogate the published output
• Use the PID graph to find associated
datasets
Using a “Research
bundle”
• Next, find one or more experimental
sessions
• Interrogate relatedIdentifiers to
find referenced Dataset nodes
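Interrogating `relatedIdentifiers` could be sketched as below, assuming the records have already been fetched into a dictionary keyed by PID. The flat `resourceTypeGeneral` key is a simplification of real metadata, and the function name and identifiers are hypothetical.

```python
def referenced_datasets(session_pid, records):
    """Return PIDs of Dataset records that a session's metadata points at."""
    session = records[session_pid]
    found = []
    for link in session.get("relatedIdentifiers", []):
        target_pid = link["relatedIdentifier"]
        target = records.get(target_pid)
        # Skip targets we hold no metadata for, and anything not a Dataset.
        if target is not None and target.get("resourceTypeGeneral") == "Dataset":
            found.append(target_pid)
    return found


# Placeholder records; real ones would come from a PID graph query.
records = {
    "example:session/1": {
        "resourceTypeGeneral": "Other",
        "relatedIdentifiers": [
            {"relationType": "References", "relatedIdentifier": "example:dataset/1"},
            {"relationType": "References", "relatedIdentifier": "example:machine/1"},
            {"relationType": "References", "relatedIdentifier": "example:unknown/1"},
        ],
    },
    "example:dataset/1": {"resourceTypeGeneral": "Dataset"},
    "example:machine/1": {"resourceTypeGeneral": "Other"},
}

print(referenced_datasets("example:session/1", records))
```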
Producing a “Research
bundle”
• Finally, expand the experimental session
• Use the graph in a similar way to User
Story 8 (Fenner, 2020)
• For each data set, collect its parts.
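Collecting the parts of each data set amounts to a walk over "has part" links. The sketch below is a generic depth-first traversal written for illustration; it is not taken from User Story 8, and the `has_part` mapping and identifiers are assumed placeholders.

```python
def collect_parts(pid, has_part, seen=None):
    """Return pid plus all of its transitive parts, each listed once."""
    if seen is None:
        seen = []
    if pid in seen:
        return seen  # already visited; guards against circular links
    seen.append(pid)
    for child in has_part.get(pid, []):
        collect_parts(child, has_part, seen)
    return seen


# Placeholder graph: a session with two data sets, one of which has files.
has_part = {
    "example:session/1": ["example:dataset/1", "example:dataset/2"],
    "example:dataset/1": ["example:file/1", "example:file/2"],
    "example:file/2": ["example:dataset/1"],  # deliberate cycle
}

for item in collect_parts("example:session/1", has_part):
    print(item)
```

The visited list matters because nothing stops two records from pointing at each other, and a naive walk would then never finish.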
What a tool might look like…
• There are three main tasks
• Minting PIDs for the assets
• Linking those PIDs together to produce a bundle
• Retrieving the bundle and presenting it in a structured way
• All this needs to be wrapped into one or more services to be useful
Experiment session
manager
• A tool primarily used by infrastructures to
produce the experimental session and
research identifier
• Service provides APIs to mint identifiers
for assets
• Service manages associating assets with
the session record.
Experiment session manager
• Researcher makes a booking / drops in to use a machine
• They scan a QR code and are taken to their booking on a website
• They log in using their ORCID and click “Begin Session”
• A PID is minted identifying the session
• System adds machine PID
• System adds PIDs for the booking, machine configuration, etc.
• During the session other outputs can be added
• Facility staff can add additional information (processed data, samples,
etc.)
• ARIA could be extended to do this!
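The flow above might map onto a small service object like the one sketched here. Everything in it is hypothetical: the class and method names, the `example:` identifiers and the placeholder ORCID iD are assumptions, and it does not describe how ARIA works today.

```python
from itertools import count


class SessionManager:
    """In-memory sketch of an experiment session manager."""

    def __init__(self):
        self._ids = count(1)
        self.sessions = {}

    def _mint(self, kind):
        # Stand-in for a call to a real PID minting service.
        return f"example:{kind}/{next(self._ids)}"

    def begin_session(self, orcid, machine_pid, booking_pid, config_pid=None):
        """Called when the researcher clicks "Begin Session"."""
        session_pid = self._mint("session")
        links = [machine_pid, booking_pid]
        if config_pid is not None:
            links.append(config_pid)
        self.sessions[session_pid] = {"researcher": orcid, "links": links}
        return session_pid

    def add_output(self, session_pid, kind):
        """Mint a PID for a new output and attach it to the session."""
        if session_pid not in self.sessions:
            raise KeyError(f"unknown session: {session_pid}")
        pid = self._mint(kind)
        self.sessions[session_pid]["links"].append(pid)
        return pid


manager = SessionManager()
session_pid = manager.begin_session(
    orcid="0000-0000-0000-0000",  # placeholder, not a real ORCID iD
    machine_pid="example:machine/7",
    booking_pid="example:booking/42",
)
manager.add_output(session_pid, "dataset")
print(session_pid, manager.sessions[session_pid])
```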
The Claiming tool
• When the output has been produced, we
need to link this to all the outputs and
create a “bundle”
• For ARIA users, this would be as simple as
adding the PID of the output to their
proposal
• This is an often-requested feature!
• For others, we need the help of the PID
graph
The Claiming tool
• Researcher logs in to a service with their ORCID
• Enter the PID of their output
• A search is performed identifying datasets produced by the author,
allowing selection to be added to a bundle
• Perhaps further optimised by excluding those not “part of” an existing dataset
• They “confirm” the process, a new bundle is created and a PID
minted.
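The search and confirm steps could be sketched as follows. The record layout, function names and identifiers are assumptions, and the optional filter follows the slide's wording literally by keeping only data sets that are already "part of" something else.

```python
def candidate_datasets(orcid, datasets, only_parts=False):
    """List data set PIDs whose creators include the given ORCID iD."""
    hits = [d for d in datasets if orcid in d["creators"]]
    if only_parts:
        # Optional narrowing: drop data sets not "part of" an existing one.
        hits = [d for d in hits if d.get("isPartOf")]
    return [d["id"] for d in hits]


def confirm_bundle(bundle_pid, output_pid, selected):
    """Create the bundle record once the researcher confirms a selection."""
    if not selected:
        raise ValueError("select at least one data set")
    return {"id": bundle_pid, "output": output_pid, "datasets": list(selected)}


ME = "0000-0000-0000-0000"  # placeholder, not a real ORCID iD
datasets = [
    {"id": "example:dataset/1", "creators": [ME], "isPartOf": "example:session/1"},
    {"id": "example:dataset/2", "creators": [ME]},
    {"id": "example:dataset/3", "creators": ["0000-0000-0000-0001"]},
]

choices = candidate_datasets(ME, datasets, only_parts=True)
bundle = confirm_bundle("example:bundle/1", "example:publication/1", choices)
print(bundle)
```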
Research bundle
viewer
• Brings this all together in a simple site
• Landing page where a visitor could enter
the PID
• Performs searches and provides an
interface to drill down into the relevant
research bundles.
• … in a usable way
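As a very rough idea of the "drill down" view, the sketch below renders a bundle as an indented text outline. The `children` mapping stands in for whatever the viewer's searches would return; the function name and identifiers are hypothetical.

```python
def render(pid, children, depth=0, path=()):
    """Return indented lines showing pid and everything beneath it."""
    lines = ["  " * depth + pid]
    ancestors = path + (pid,)
    for child in children.get(pid, []):
        if child in ancestors:
            continue  # skip links that would loop back to an ancestor
        lines.extend(render(child, children, depth + 1, ancestors))
    return lines


# Placeholder data: one bundle, one session, two data sets.
children = {
    "example:bundle/1": ["example:session/1"],
    "example:session/1": ["example:dataset/1", "example:dataset/2"],
}

print("\n".join(render("example:bundle/1", children)))
```

A real viewer would show titles and links rather than raw identifiers, but the nesting is the same.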
Next steps…
• Still very much in the concept stage
• But much of this functionality has been
requested by our users
• We have a commitment to better help
link their data…
• Watch this space!
@ARIA_access
aria@instruct-eric.eu
Thanks!

Using the FREYA PID Graph to help reproduce scientific research
