Learn about JISC’s work in the area of shared services for STEM subjects, particularly the JANET network service and virtual research environments (i.e., web tools for helping research processes).
Explore new opportunities for research being opened up via shared services, and also the economic savings this creates.
Consider the role their university might play in providing a shared service to other institutions.
Not major data centres but the long tail
Data pipeline. Data funnel. Fuzzy line between collaborators and competitors. USB drives, wikis, databases, distributed in email, etc.
Sharing without fear
MaDaM project
Competitive advantage. Academic vanity. Adoption. Reputation. Acceleration. Novel insights. Help. Scrutiny. Being scooped. Misinterpretation. Reputation. Trust. Not comprehensible.
Competitive advantage. Academic vanity. Reputation. Adoption. Scrutiny. Being scooped. Misinterpretation.
New reward schemes. But we have to be aware of the drivers for collaboration.
Competitive advantage. Be the first with the Nature paper.
Academic vanity. Credit, credibility, fame, acclaim, recognition, peer respect, reputation.
Adoption. Get my stuff adopted / recognised. More funding.
Being found out. Open to rigorous inspection.
Being scooped. Beaten by lab X. Protecting my turf. Releasing results too early. Getting left behind. Being out of fashion.
Looking stupid. Being misinterpreted or misrepresented. Looking stupid. Losing control. Taking a risk.
Genomic Standards Consortium http://gensc.org/gc_wiki/index.php/MIBBI_workshop All or nothing
Credit, Citation, Career Personal and institutional visibility Scholarly citation metrics
Contribute, curate, review, reuse. Data is not respected. John Quackenbush, Professor of Computational Biology and Bioinformatics, Department of Biostatistics, Harvard School of Public Health.
58% developed by students, 24% stated not maintained (Schultheiss et al. (2010) PLoS Comp Biol (in review)) Tools, commons Preparing data for sharing is free like puppies are free
National Centre for BioOntologies The Open Biological and Biomedical Ontologies Standardise messages not structures Only as good as your data services Minimum models and Controlled vocabularies 63% 47%
DOIs cost
Hard core are the PALs. Commons-based cleanup:
● Manual and automated curation workflows
● Curators emergent and assigned
● Curation tools
Incentives. Right time, right place – also email! Third party curation is really hard.
Expert curation. Classification. Weeding. Added value. Structured metadata. Prompting. Classification. Filtering. Facetted browsing. Time to get organised.
One example workflow can be found at: http://www.myexperiment.org/workflows/16
This is the old example workflow, but I have tagged it as a benchmark. You can see the breakdown of tags given to it at: http://www.myexperiment.org/workflows/16/curation ... or by clicking on the breakdown section (see attached image).
There were 14 curation tags. Some are slightly ambiguous and others have little meaning. These were:
* test workflow
* component - part of whole solution
* whole solution
* tutorial / example
* incomplete
* junk
* obsolete - deprecated
* runnable
* not runnable
* requires description
* requires credit / attribution
* requires example input data
* description; [Description Text]
* example data; [port : value]
Each tag was preceded by a "c:" so that it would be picked up by the myExperiment plugin and could be differentiated from other myExperiment tags. If some example data was known, I tried to add it using the example tag "example data; [port : value]", where the port name is given, along with the data to be put into the port.
The whole process was very time consuming, as I had to try and open each workflow in T2, run it using some example data (or figure out what it did and run it with lots of test data), and then add each comment (checking each workflow on myExperiment to see if it had completed properly).
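The "c:" prefix convention described in these notes can be illustrated with a small sketch. This is not the myExperiment plugin's actual code; it only assumes that tags arrive as plain strings, that curation tags start with "c:", and that the square brackets in "example data; [port : value]" are optional delimiters. The function names and the sample port name and value are hypothetical.

```python
# Minimal sketch (assumption: tags are plain strings, curation tags start with "c:").
# Function names are hypothetical and not part of myExperiment or its plugin.

CURATION_PREFIX = "c:"


def split_curation_tags(tags):
    """Separate curation tags (prefixed with 'c:') from ordinary tags."""
    curation, other = [], []
    for tag in tags:
        if tag.startswith(CURATION_PREFIX):
            # Drop the prefix so only the curation label (and payload) remains.
            curation.append(tag[len(CURATION_PREFIX):].strip())
        else:
            other.append(tag)
    return curation, other


def parse_curation_tag(tag):
    """Split a curation tag into (label, payload); payload is None for plain labels."""
    label, sep, payload = tag.partition(";")
    if not sep:
        return label.strip(), None
    # Assumption: square brackets around the payload are optional delimiters.
    return label.strip(), payload.strip().strip("[]").strip()


def parse_example_data(payload):
    """Split an 'example data' payload of the form 'port : value' into a pair."""
    port, _, value = payload.partition(":")
    return port.strip(), value.strip()


if __name__ == "__main__":
    # Hypothetical tags on one workflow; "sequence" and "ACGT" are made up.
    tags = ["c:runnable", "bioinformatics", "c:example data; [sequence : ACGT]"]
    curation, other = split_curation_tags(tags)
    for tag in curation:
        label, payload = parse_curation_tag(tag)
        if label == "example data" and payload is not None:
            print(label, parse_example_data(payload))
        else:
            print(label, payload)
```

Keeping the prefix check separate from the payload parsing means ordinary community tags pass through untouched, which matches the stated purpose of the prefix: telling curation tags apart from other myExperiment tags.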
E-Lab and Taverna – all my software – elephants: elephant in the room, blind men and elephants, danger of being white elephants? SysMO and other e-Science projects. Each of these applies to all our projects. Just one of them is not enough. Not even for Taverna. To sustain it as a service we must sustain the software and the content in its repositories.
Data sharing - Data management - The SysMO-SEEK Story
Professor Carole Goble FREng FBCS CITP
University of Manchester, UK
91 institutes, 300 scientists
Each three year duration
Local – Shared – Long term
Own data solutions. wikis, e-Groupware,
PHProjekt, BaseCamp, PLONE, Alfresco, bespoke
commercial … files and spreadsheets.
Extreme caution over sharing.
Modellers vs experimentalist tribalism
Many institutions, many projects, overlapping
memberships, changing membership. Projects
ending, starting, carrying on the same, carrying
Expert scientists, inexpert informaticians. Few
Patchy standards, incomparable data,
“my impression of researchers, and I can
criticize myself in this, is that we’re much
more interested in sharing data when we
mean sharing somebody else’s as opposed
[to] sharing ours.”
E-infrastructure - taking forward the strategy, RIN report, 2010
“It’s not ready yet”
“I need to get (another) publication first”
“We don’t have the resources or skills to prepare
it for others, esp. now we finished that project”
“It’s faster/easier to do it myself, and will keep the
“It’s not described enough to be usable”
“I don’t trust the quality. It’s not reliable enough. It’s
“Others won’t use it properly.”
“It’s not worth my while”
“They are my competitors!!”
2. Preparation for Use
Accountability & Quality
Data discipline Silo busting
CIMR Core Information for Metabolomics Reporting
MIABE Minimal Information About a Bioactive Entity
MIACA Minimal Information About a Cellular Assay
MIAME Minimum Information About a Microarray Experiment
MIAME/Env MIAME / Environmental transcriptomic experiment
MIAME/Nutr MIAME / Nutrigenomics
MIAME/Plant MIAME / Plant transcriptomics
MIAME/Tox MIAME / Toxicogenomics
MIAPA Minimum Information About a Phylogenetic Analysis
MIAPAR Minimum Information About a Protein Affinity Reagent
MIAPE Minimum Information About a Proteomics Experiment
MIARE Minimum Information About a RNAi Experiment
MIASE Minimum Information About a Simulation Experiment
MIENS Minimum Information about an ENvironmental Sequence
MIFlowCyt Minimum Information for a Flow Cytometry Experiment
MIGen Minimum Information about a Genotyping Experiment
MIGS Minimum Information about a Genome Sequence
MIMIx Minimum Information about a Molecular Interaction Experiment
MIMPP Minimal Information for Mouse Phenotyping Procedures
MINI Minimum Information about a Neuroscience Investigation
MINIMESS Minimal Metagenome Sequence Analysis Standard
MINSEQE Minimum Information about a high-throughput SeQuencing Experiment
MIPFE Minimal Information for Protein Functional Evaluation
MIQAS Minimal Information for QTLs and Association Studies
MIqPCR Minimum Information about a quantitative Polymerase Chain Reaction experiment
MIRIAM Minimal Information Required In the Annotation of biochemical Models
MISFISHIE Minimum Information Specification For In Situ Hybridization and Immunohistochemistry
STRENDA Standards for Reporting Enzymology Data
TBC Tox Biology Checklist
BioPAX : Biological Pathways Exchange http://www.biopax.org/
FuGE Functional Genomics Experiment
http://www.mibbi.org/index.php/MIBBI_portal
Blue Collar Science
3. Credit Crisis
• Reward sharing, curation and
reuse rather than reinvention.
• Credit. Attribution. Citation.
• For software, methods and
• Technical (DataCite.org).
• Cultural (Respected policy).
• Funding bodies.
4. Infrastructure, Capability & Capacity
• Three year
• Local data control
• Realistic paths to
adoption by busy
• Spreadsheets, wikis,
• Content and Tools
LSID DOIs ORCID
5. Data Ecosystem
6. Sustained Resources
• Three year projects.
• Three year lifespan of data (and its software).
• Sunsets and Sustains
• Reinvention rewarded
• Funding councils.
• Funding panels.
• National data centres
• International data centres
• Software engineers
• Computational scientists
• Experimental Scientists
• Domain informaticians
• Service providers
• Funding agencies
• But the community
credit crisis continues….
• Science is a complex social activity
undertaken by tribes of people and
dominated by trust issues.
• Infrastructure has to be there and fit for
purpose but it’s not the real problem.
• Need a cultural shift (on all sides) that
truly honours data.