COPO - Collaborative Open Plant Omics, by Rob Davey
1. COPO: Collaborative Open Plant Omics
Rob Davey
DataInfrastructureand AlgorithmsGroup Leader
robert.davey@tgac.ac.uk
@froggleston
2.
3. Acknowledgements
Oxford eResearch Centre
Susanna Sansone
Alejandra Gonzalez-Beltran
Philippe Rocca-Serra
Warwick
Jim Beynon
Katherine Denby
Ruth Bastow
EMBL-EBI
Paul Kersey
TGAC
Vicky Schneider
Tanya Dickie
EmilyAngiolini
Matt Drew
Toni Etuk Felix Shaw
4. COPO
• Recently awarded BBSRC grant
• TGAC, Univ. Oxford, Univ. Warwick, EMBL-EBI
• Supported by GARNet, iPlant, Eagle Genomics
• Empower bioscience plant researchers to:
1. Enable standards-compliant data collection, curation and
integration
2. Enhance access to data analysis and visualisation pipelines
3. Facilitate data sharing and publication to promote reuse
• Train plant researchers in best practice for data sharing and
producing citable Research Objects
5. COPO
• (Good) Science is founded on reproducibility
• Reproducibility depends on:
• reducing reinvention (“friction”)*
• describing methods and data
• maximising benefit to the researcher
• Papers are seen as the typical way of distributing information
• Data description sorely under-represented and used
• Benefits are often opaque
• Fear of being scooped, loss of control, reputation, etc
* http://cameronneylon.net/blog/network-enabled-research/
6. COPO
• What prevents plant scientists from openly depositing their data
and metadata?
• Lack of interoperability between:
• metadata annotation services
• data repository services
• data analysis services
• data publishing services
• Researchers might not:
• be aware that the services exist
• have the expertise to use them
• see the value in properly describing their data
7. COPO
• What prevents plant scientists from openly depositing their data
and metadata?
• Lack of interoperability between:
• metadata annotation services
• data repository services
• data analysis services
• data publishing services
• Researchers might not:
• be aware that the services exist
• have the expertise to use them
• see the value in properly describing their data
5% TECHNICAL
8. COPO
• What prevents plant scientists from openly depositing their data
and metadata?
• Lack of interoperability between:
• metadata annotation services
• data repository services
• data analysis services
• data publishing services
• Researchers might not:
• be aware that the services exist
• have the expertise to use them
• see the value in properly describing their data
5% TECHNICAL
95% CULTURAL
10. COPO
• It's not because these services don't exist!
• Clearly, barriers exist between the scientist and the service
• Infrastructure can help by:
• wiring existing services together
• improving access to services
• making consistent metadata attribution easier
• facilitating collaboration
• raising profile of the benefits of open science
• How do we collaborate successfully to make this happen?
• Mapping services with Application Programming Interfaces
12. COPO
• Single-sign on (SSO), e.g. ORCID
• Deposit multi-omicsdata in one go
• Genomics, metabolomics, proteomics, and so on
• No context-switching between services
• Brokering of research data
• Wizard systemsguide users based on selected semantic terms
• Selecting options at given points informs the path
• Intuitive and less time intensive
• Mix of required repository metadataand extra descriptors
• Building knowledge graphs of linked data
INTEROPERABILITY
13. COPO
• Run and deposit analyticalworkflows
• Describe softwareused, versions
• Data IO between platforms, e.g. Galaxy, iPlant
• Support virtualisation, e.g. iPlant Atmosphere, Docker,Amazon AWS
• Data is well-described, open, and DOIs will be minted
• Finding and integrating data improved greatly
• Make suggestions to users based on their metadata/data/workflows
• Users get recognition for sharing well-described data
• Datacitation
• Programmatic access to all layers
REPRODUCIBILITY
17. COPO
• Build graphs of interconnected data, analyses and outputs
• Searches hitting any part of the graph will allow retrieval of
the rest
• Including any citations, data or text
18. COPO
• Not just raw/processed data is valuable
• COPO supports submission of supplementary data to Figshare
• PDFs (posters, papers)
• CSV/Excel
• movies/images (size permitting)
• Zenodo/Github releases for code DOIs
• e.g. ENCODE Digital Curation Center’s software metadata
descriptors
• EDAM/Software ontology
19. COPO
• What have we achieved so far?
• TGAC infrastructure to support brokering of data
• iRODS and web server virtual machines
• High speed transfer Aspera links to EBI
• Prototype user interface for multi-omics data submissions
• Oauth2 support (“sign in with” ORCiD, Google, Twitter)
• Moving to OpenID Connect?
• Profiles and collections for managing metadata
• Metadata for sequencing
• Tracking data uploads and accession status
20. COPO
• What have we achieved so far?
• Developing JSON specification for COPO objects
• Easily stored in document-based databases, e.g. MongoDB
• Interconversion between ISA formats
• ISATab (TSV based) to JSON, and vice versa
• Linked Data specifications
• Community interactions
• First user requirements workshop at TGAC
• F1000, GigaScience
• This meeting!
21. COPO
• COPOwill:
• Facilitate easy relevantdata description
• Intuitively submit data and metadatato multiple public repositories
• Integratewith other research data managementplatforms
• Wheat Information System, iPlant, …
What are the barriers for plant research data?
5% technical,95% cultural
• Work with other infrastructurebuilders to provide truly open connected
ecosystems for data?
• How can we help researchersrealise the benefits of sharingdata?