E-Lab and Taverna: all my software. Elephants: the elephant in the room, blind men and elephants, the danger of being white elephants? SysMO and other e-Science projects. Each of these applies to all our projects. Just one of them is not enough, not even for Taverna. To sustain it as a service we must sustain the software and the content in its repositories.
Data management, data sharing: the SysMO-SEEK Story
Professor Carole Goble FREng FBCS CITP
University of Manchester, UK
91 institutes, 300 scientists
Each three year duration
Local – Shared – Long term
Own data solutions: wikis, e-Groupware,
PHProjekt, BaseCamp, PLONE, Alfresco, bespoke
commercial … files and spreadsheets.
Extreme caution over sharing.
Modellers vs experimentalist tribalism
Many institutions, many projects, overlapping
memberships, changing membership. Projects
ending, starting, carrying on the same, carrying
Expert scientists, inexpert informaticians. Few
Patchy standards, incomparable data,
“my impression of researchers, and I can
criticize myself in this, is that we’re much
more interested in sharing data when we
mean sharing somebody else’s as opposed
[to] sharing ours.”
E-infrastructure - taking forward the strategy, RIN report, 2010
“It’s not ready yet”
“I need to get (another) publication first”
“We don’t have the resources or skills to prepare
it for others, esp. now we finished that project”
“It’s faster/easier to do it myself, and will keep the
“It’s not described enough to be usable”
“I don’t trust the quality. It’s not reliable enough. It’s
“Others won’t use it properly.”
“It’s not worth my while”
“They are my competitors!!”
2. Preparation for Use
Accountability & Quality
Data discipline Silo busting
CIMR Core Information for Metabolomics Reporting
MIABE Minimal Information About a Bioactive Entity
MIACA Minimal Information About a Cellular Assay
MIAME Minimum Information About a Microarray Experiment
MIAME/Env MIAME / Environmental transcriptomic experiment
MIAME/Nutr MIAME / Nutrigenomics
MIAME/Plant MIAME / Plant transcriptomics
MIAME/Tox MIAME / Toxicogenomics
MIAPA Minimum Information About a Phylogenetic Analysis
MIAPAR Minimum Information About a Protein Affinity Reagent
MIAPE Minimum Information About a Proteomics Experiment
MIARE Minimum Information About a RNAi Experiment
MIASE Minimum Information About a Simulation Experiment
MIENS Minimum Information about an ENvironmental Sequence
MIFlowCyt Minimum Information for a Flow Cytometry Experiment
MIGen Minimum Information about a Genotyping Experiment
MIGS Minimum Information about a Genome Sequence
MIMIx Minimum Information about a Molecular Interaction Experiment
MIMPP Minimal Information for Mouse Phenotyping Procedures
MINI Minimum Information about a Neuroscience Investigation
MINIMESS Minimal Metagenome Sequence Analysis Standard
MINSEQE Minimum Information about a high-throughput SeQuencing Experiment
MIPFE Minimal Information for Protein Functional Evaluation
MIQAS Minimal Information for QTLs and Association Studies
MIqPCR Minimum Information about a quantitative Polymerase Chain Reaction experiment
MIRIAM Minimal Information Required In the Annotation of biochemical Models
MISFISHIE Minimum Information Specification For In Situ Hybridization and Immunohistochemistry
STRENDA Standards for Reporting Enzymology Data
TBC Tox Biology Checklist
BioPAX : Biological Pathways Exchange http://www.biopax.org/
FuGE Functional Genomics Experiment
http://www.mibbi.org/index.php/MIBBI_portal
Blue Collar Science
3. Credit Crisis
• Reward sharing, curation and
reuse rather than reinvention.
• Credit. Attribution. Citation.
• For software, methods and
• Technical (DataCite.org).
• Cultural (Respected policy).
• Funding bodies.
4. Infrastructure, Capability & Capacity
• Three year
• Local data control
• Realistic paths to
adoption by busy
• Spreadsheets, wikis,
• Content and Tools
LSID DOIs ORCID
5. Data Ecosystem
6. Sustained Resources
• Three year projects.
• Three year lifespan of data (and its software).
• Sunsets and Sustains
• Reinvention rewarded
• Funding councils.
• Funding panels.
• National data centres
• International data centres
Free. Like Puppies
• Software engineers
• Computational scientists
• Experimental scientists
• Domain informaticians
• Service providers
• Funding agencies
• But the community
credit crisis continues….
• Science is a complex social activity
undertaken by tribes of people and
dominated by trust issues.
• Infrastructure has to be there and fit for
purpose but it’s not the real problem.
• Need a cultural shift (on all sides) that
truly honours data.