Data sharing - Data management - The SysMO-SEEK Story

Data sharing
Data management
The SysMO-SEEK
Story
Professor Carole Goble FREng FBCS CITP
University of Manchester, UK
carole.goble@manchester.ac.uk

13 teams
91 institutes, 300 scientists
Multi-site, multi-disciplinary
Each three year duration
Data generation
Data consumption
Data analysis
Data management:
Local – Shared – Long term
Pan European
Systems Biology
http://www.sysmo.net

Own data solutions. wikis, e-Groupware,
PHProjekt, BaseCamp, PLONE, Alfresco, bespoke
commercial … files and spreadsheets.
Extreme caution over sharing.
Modellers vs experimentalist tribalism
Many institutions, many projects, overlapping
memberships, changing membership. Projects
ending, starting, carrying on the same, carrying
on differently.
Legacy
Suspicion
Dynamics
Expert scientists, inexpert informaticians. Few
resources.
Skills
Patchy standards, incomparable data,
afterthought.
Data

Scientist Lab Collaborators Competitors
ProgrammePublished
Post-
Publication
Pre-
Publication

Data mine-ing
“my impression of researchers, and I can
criticize myself in this, is that we’re much
more interested in sharing data when we
mean sharing somebody else’s as opposed
[to] sharing ours.”
E-infrastructure - taking forward the strategy, RIN report, 2010

Competitive advantage.
Adoption.
Kudos & Credit.
Help.
Fame.
Reputation.
Being scooped.
Scrutiny.
Misinterpretation.
Cost.
Blame.
Reputation.
RewardsRisks
Nature 461, 145 (10 September 2009)
1. Sharing

“It’s not ready yet”
“I need to get (another) publication first”
“We don’t have the resources or skills to prepare
it for others, esp. now we finished that project”
“Its faster/easier to do it myself, and will keep the
credit/control too”
“Its not described enough to be usable”
“I don’t trust the quality. Its not reliable enough. Its
too noisy.
“Others won’t use it properly.”
“It’s not worth
my while”“They are my competitors!!”

2. Preparation for Use
Curation
Standards
Reusability
Reproducibility
Accountability & Quality
Data discipline Silo busting

CIMR Core Information for Metabolomics Reporting
MIABE Minimal Information About a Bioactive Entity
MIACA Minimal Information About a Cellular Assay
MIAME Minimum Information About a Microarray Experiment
MIAME/Env MIAME / Environmental transcriptomic experiment
MIAME/Nutr MIAME / Nutrigenomics
MIAME/Plant MIAME / Plant transcriptomics
MIAME/Tox MIAME / Toxicogenomics
MIAPA Minimum Information About a Phylogenetic Analysis
MIAPAR Minimum Information About a Protein Affinity Reagent
MIAPE Minimum Information About a Proteomics Experiment
MIARE Minimum Information About a RNAi Experiment
MIASE Minimum Information About a Simulation Experiment
MIENS Minimum Information about an ENvironmental Sequence
MIFlowCyt Minimum Information for a Flow Cytometry Experiment
MIGen Minimum Information about a Genotyping Experiment
MIGS Minimum Information about a Genome Sequence
MIMIx Minimum Information about a Molecular Interaction Experiment
MIMPP Minimal Information for Mouse Phenotyping Procedures
MINI Minimum Information about a Neuroscience Investigation
MINIMESS Minimal Metagenome Sequence Analysis Standard
MINSEQE Minimum Information about a high-throughput SeQuencing Experiment
MIPFE Minimal Information for Protein Functional Evaluation
MIQAS Minimal Information for QTLs and Association Studies
MIqPCR Minimum Information about a quantitative Polymerase Chain Reaction experiment
MIRIAM Minimal Information Required In the Annotation of biochemical Models
MISFISHIE Minimum Information Specification For In Situ Hybridization and Immunohistochemistry
Experiments
STRENDA Standards for Reporting Enzymology Data
TBC Tox Biology Checklist
BioPAX : Biological Pathways Exchange http://www.biopax.org/
FuGE Functional Genomics Experimenthttp://www.mibbi.org/index.php/MIBBI_portal
Minimum
Information for
Biological and
Biomedical
Investigations
Metadata Minefield

http://usefulchem.wikispaces.com/page/code/EXPLAN001
http://www.mygrid.org.uk/tools/taverna/
Publishing Process
models
software
methods
scripts
http://openwetware.org
standard operating
procedures

Community Curation
Responsiblity

Blue Collar Science
John Quackenbush
Difficult
and time
consuming
Poor Credit
or Reward
Shabby
Career
Paths &
Prospects

3. Credit Crisis
• Reward sharing, curation and
reuse rather than reinvention.
• Credit. Attribution. Citation.
• For software, methods and
standards too.
• Technical (DataCite.org).
• Cultural (Respected policy).
• Institutional.
• Funding bodies.

4. Infrastructure, Capability & Capacity
• Three year
PhD/project cycle
• Local data control
• Realistic paths to
adoption by busy
people.
• Spreadsheets, wikis,
catalogues and
yellow pages.
• Content and Tools

http://www.biosharing.org
Identity Management
Sharednames DataCite
LSID DOIs ORCID
5. Data Ecosystem
Resources

6. Sustained Resources
• Three year projects.
• Three year lifespan of data (and its software).
• Sunsets and Sustains
• Reinvention rewarded
• Institution.
• Funding councils.
• Funding panels.
• Publishers
• Libraries
• National data centres
• International data centres

Incentives.
Sensitivity to
Behaviours
Infrastructure
Community building
Trusted service
Coordination
Governance
Policy
Capability
Community
Integration

A Partnership
• Software engineers
• Computational scientists
• Experimental Scientists
• Domain informaticians
• Service providers
• Funding agencies
• But the community
credit crisis continues….

Summary
• Science is a complex social activity
undertaken by tribes of people and
dominated by trust issues.
• Infrastructure has to be there and fit for
purpose but its not the real the problem.
• Need a cultural shift (on all sides) that
truly honours data.

Data sharing - Data management - The SysMO-SEEK Story

Recommended

Recommended

More Related Content

What's hot

What's hot (20)

Viewers also liked

Viewers also liked (8)

Similar to Data sharing - Data management - The SysMO-SEEK Story

Similar to Data sharing - Data management - The SysMO-SEEK Story (20)

More from Research Information Network

More from Research Information Network (20)

Recently uploaded

Recently uploaded (20)

Data sharing - Data management - The SysMO-SEEK Story

Editor's Notes