Data management, data sharing: the SysMO-SEEK Story



My experiences of the SysMO-DB project for data sharing and data management among systems biologists, presented to the Research Information Network.

Published in: Technology


  1. Data sharing, data management: the SysMO-SEEK Story. Professor Carole Goble FREng FBCS CITP, University of Manchester, UK (carole.goble@manchester.ac.uk)
  2. Pan-European Systems Biology (http://www.sysmo.net): 13 teams, 91 institutes, 300 scientists; multi-site and multi-disciplinary; each project of three years' duration. Data generation, data consumption, data analysis. Data management: local, shared, long term.
  3. Legacy: own data solutions, such as wikis, e-Groupware, PHProjekt, BaseCamp, PLONE, Alfresco, bespoke, commercial … files and spreadsheets. Suspicion: extreme caution over sharing; modeller vs experimentalist tribalism. Dynamics: many institutions, many projects, overlapping and changing memberships; projects ending, starting, carrying on the same, carrying on differently. Skills: expert scientists, inexpert informaticians; few resources. Data: patchy standards, incomparable data, an afterthought.
  4. [Diagram: Scientist, Lab, Collaborators, Competitors, Programme, Published; pre-publication and post-publication sharing.]
  5. Data mine-ing. “My impression of researchers, and I can criticize myself in this, is that we’re much more interested in sharing data when we mean sharing somebody else’s as opposed [to] sharing ours.” (E-infrastructure: taking forward the strategy, RIN report, 2010)
  6. 1. Sharing. Rewards: competitive advantage, adoption, kudos & credit, help, fame, reputation. Risks: being scooped, scrutiny, misinterpretation, cost, blame, reputation. (Nature 461, 145, 10 September 2009)
  7. “It’s not ready yet.” “I need to get (another) publication first.” “We don’t have the resources or skills to prepare it for others, esp. now we’ve finished that project.” “It’s faster/easier to do it myself, and I’ll keep the credit/control too.” “It’s not described well enough to be usable.” “I don’t trust the quality. It’s not reliable enough. It’s too noisy.” “Others won’t use it properly.” “It’s not worth my while.” “They are my competitors!!”
  8. Pseudo sharing
  9. 2. Preparation for Use: curation, standards, reusability, reproducibility, accountability & quality, data discipline, silo busting.
  10. Metadata Minefield. MIBBI, Minimum Information for Biological and Biomedical Investigations (http://www.mibbi.org/index.php/MIBBI_portal), and related standards:
      • CIMR: Core Information for Metabolomics Reporting
      • MIABE: Minimal Information About a Bioactive Entity
      • MIACA: Minimal Information About a Cellular Assay
      • MIAME: Minimum Information About a Microarray Experiment
      • MIAME/Env: MIAME / Environmental transcriptomic experiment
      • MIAME/Nutr: MIAME / Nutrigenomics
      • MIAME/Plant: MIAME / Plant transcriptomics
      • MIAME/Tox: MIAME / Toxicogenomics
      • MIAPA: Minimum Information About a Phylogenetic Analysis
      • MIAPAR: Minimum Information About a Protein Affinity Reagent
      • MIAPE: Minimum Information About a Proteomics Experiment
      • MIARE: Minimum Information About a RNAi Experiment
      • MIASE: Minimum Information About a Simulation Experiment
      • MIENS: Minimum Information about an ENvironmental Sequence
      • MIFlowCyt: Minimum Information for a Flow Cytometry Experiment
      • MIGen: Minimum Information about a Genotyping Experiment
      • MIGS: Minimum Information about a Genome Sequence
      • MIMIx: Minimum Information about a Molecular Interaction Experiment
      • MIMPP: Minimal Information for Mouse Phenotyping Procedures
      • MINI: Minimum Information about a Neuroscience Investigation
      • MINIMESS: Minimal Metagenome Sequence Analysis Standard
      • MINSEQE: Minimum Information about a high-throughput SeQuencing Experiment
      • MIPFE: Minimal Information for Protein Functional Evaluation
      • MIQAS: Minimal Information for QTLs and Association Studies
      • MIqPCR: Minimum Information about a quantitative Polymerase Chain Reaction experiment
      • MIRIAM: Minimal Information Required In the Annotation of biochemical Models
      • MISFISHIE: Minimum Information Specification For In Situ Hybridization and Immunohistochemistry Experiments
      • STRENDA: Standards for Reporting Enzymology Data
      • TBC: Tox Biology Checklist
      • BioPAX: Biological Pathways Exchange (http://www.biopax.org/)
      • FuGE: Functional Genomics Experiment
  11. Publishing process: models, software, methods, scripts and standard operating procedures (http://usefulchem.wikispaces.com/page/code/EXPLAN001, http://www.mygrid.org.uk/tools/taverna/, http://openwetware.org).
  12. Community curation. Responsibility.
  13. Blue Collar Science (John Quackenbush): difficult and time-consuming; poor credit or reward; shabby career paths and prospects.
  14. 3. Credit Crisis
      • Reward sharing, curation and reuse rather than reinvention.
      • Credit. Attribution. Citation.
      • For software, methods and standards too.
      • Technical (DataCite.org).
      • Cultural (respected policy).
      • Institutional.
      • Funding bodies.
  15. 4. Infrastructure, Capability & Capacity
      • Three-year PhD/project cycle.
      • Local data control.
      • Realistic paths to adoption by busy people.
      • Spreadsheets, wikis, catalogues and yellow pages.
      • Content and tools.
  16. 5. Data Ecosystem. Resources (http://www.biosharing.org). Identity management: Sharednames, DataCite, LSIDs, DOIs, ORCID.
  17. 6. Sustained Resources
      • Three-year projects.
      • Three-year lifespan of data (and its software).
      • Sunsets and sustains.
      • Reinvention rewarded.
      • Institutions.
      • Funding councils.
      • Funding panels.
      • Publishers.
      • Libraries.
      • National data centres.
      • International data centres.
      Free. Like puppies.
  18. Incentives, sensitivity to behaviours, infrastructure, community building, trusted service, coordination, governance, policy, capability, community integration.
  19. A Partnership
      • Software engineers.
      • Computational scientists.
      • Experimental scientists.
      • Domain informaticians.
      • Service providers.
      • Funding agencies.
      • But the community credit crisis continues…
  20. Summary
      • Science is a complex social activity undertaken by tribes of people and dominated by trust issues.
      • Infrastructure has to be there and fit for purpose, but it's not the real problem.
      • We need a cultural shift (on all sides) that truly honours data.
