10. Katy Wolstencroft, University of Manchester
  • Focuses on the dynamic and interacting processes in biological systems
  • The social engineering bit
  • Common framework for describing each type of experiment and data type. Common framework for integration. Requirement for integration and interoperation. Requirement for standard naming conventions.
  • Not all MIBBIs are equal
  • Linking methods with data and linking models with data. Spreadsheets everywhere.
  • The e-lab features

Transcript

  • 1. Sharing data, models and procedures between Systems Biologists: technical and social engineering. Katy Wolstencroft, University of Manchester; Vrije Universiteit, Amsterdam; myGrid, SysMO-DB
  • 2. Systems Biology Method (diagram: an experimental cycle and a modelling cycle, linked by hypothesis generation, experiment, data generation, analysis, model construction, validation and biological insight, with public data acquisition feeding both sides)
  • 3. Data Acquisition and Generation. Some data is large: problems of storage, transmission over the network etc. Public and new data must be combined. Data must be integrated into models. Some public data is stored in distributed silos. Some data is buried in the literature.
  • 4. (diagram: the same experiment and modelling cycle, with data and models held on local hard drives, in a model repository and in public databases)
  • 5. Barriers to Improving the Situation. No environments for sharing Systems Biology data and combining it with models. Heterogeneous data (and models). Few incentives for data generators to share. Few incentives for modellers to share intermediate versions of models. Few incentives for experimentalists or modellers to annotate or curate their data. Poorly annotated data.
  • 6. SEEK: Systems Biology Data Sharing. A platform for sharing Systems Biology data, models and protocols in the context of systems biology experiments. Web-based environment for sharing within a consortium, disseminating to the community, and exploring data and models (an e-Laboratory). Standards compliant. Fitting in with laboratory practices.
  • 7. Search and exchange of data, models and processes. Maximise the “shelf-life” of SysMO assets. Disseminate to the wider scientific community. Started at the end of 2008.
  • 8. Systems Biology of Microorganisms (http://www.sysmo.net). Pan-European collaboration. 13 individual projects, >100 institutes. Different research outcomes. A cross-section of microorganisms, incl. bacteria, archaea and yeast. Record and describe the dynamic molecular processes occurring in microorganisms and present these processes in the form of computerized mathematical models. Pool research capacities and know-how. Disseminate activities of the consortium. Started April 2007.
  • 9. Assets Catalogue. Archive. Social Network. Sharing Space. Gateway. Communication. Yellow Pages (community): people, expertise, projects, institutions. I-S-A structure: linking data, models and SOPs. Publications: gateway to PubMed. Models: store, simulate with JWS Online, annotate; gateway to COPASI, JWS Online, BioModels; gateway to BioPortal. Data: experimental data sets and analysed results; gateway to public data stores (SABIO-RK, ‘omics). Processes: Standard Operating Procedures, computational workflows (Taverna); gateway to myExperiment. Versioning/Sharing.
  • 10. ~2110 assets: People 362; Investigations 43; Studies 98; Assays 183; Data sets 1002; Models 84; SOPs 148; Publications 193
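As a quick check of the headline figure, the per-category counts on this slide can be summed. This is only arithmetic on the numbers quoted above; treating every listed category (including People) as an "asset" is an assumption about how the slide arrived at its total.

```python
# Counts as quoted on slide 10.
counts = {
    "People": 362,
    "Investigations": 43,
    "Studies": 98,
    "Assays": 183,
    "Data sets": 1002,
    "Models": 84,
    "SOPs": 148,
    "Publications": 193,
}

# Sum of all listed categories, to compare with the "~2110" headline.
total = sum(counts.values())
print(total)
```

The sum is within a few items of the rounded figure on the slide.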
  • 11. SEEK in Use. Adopted by: CISBIC (host–pathogen interactions); the consensus model of yeast glycolysis.
  • 12. Virtual Liver Network. ~45 organisations, ~70 groups. BMBF “Großprojekt”: multiscale representation of the liver, clinical impact, general public portal. Same key requirements: yellow pages, exchange of all SOPs/data/models, sharing rights. Different biology: multiscale data, multiscale models, imaging. Different project structure: hierarchies (A, A1, A1.2), regional groups of groups. Flexibility, extensibility and open-sourceness of SEEK are key.
  • 13. Social Challenges and Social Engineering. How did we make sure we built to the specifications of the whole community? How did we create a resource that was immediately useful and incremental? How did we encourage the use of the SEEK?
  • 14. SysMO-DB PALS. Post-docs and post-grad students. Funded mechanism. Co-designers. Intense collaboration: 13 meetings, 6 workshops, 79 visits. Audits and sharing: methods, data, models, standards, software, schemas, spreadsheets, SOPs… 20 questions. Bridge between modellers and experimentalists. Big training/engagement effort: 14 summer schools / tutorials. Systems Biology Software Developers Foundry. (Timeline markers: 2008, 2009, 2010, 2012.)
  • 15. PALs Communication (diagram of exchanges between the DB team, the focus group and the projects): show what is there; double check; suggest what is possible; transmit; ask for requirements; disseminate; give requirements; tell priorities; rate outcomes; collect answers; suggest improvements.
  • 16. The PALs told us. What was difficult: communication between experimentalists and modellers; cross-site collaboration; managing versions of data and models; justifying/wanting to spend time annotating data and models. What was important: sharing SOPs and protocols; finding out who does what (yellow pages); remaining in control of each uploaded item (share with chosen few, share with consortium, share with world; update, version, remove); credit and attribution.
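The "remaining in control of each uploaded item" requirement can be pictured as a small access model with three sharing scopes. The sketch below is purely illustrative: the names `Scope`, `Asset` and `can_view` are hypothetical and are not taken from the SEEK code base.

```python
from dataclasses import dataclass, field
from enum import Enum


class Scope(Enum):
    """The three sharing levels named on the slide."""
    CHOSEN_FEW = "chosen few"
    CONSORTIUM = "consortium"
    WORLD = "world"


@dataclass
class Asset:
    """Hypothetical uploaded item; the owner decides who may see it."""
    title: str
    owner: str
    scope: Scope = Scope.CHOSEN_FEW
    shared_with: set = field(default_factory=set)
    version: int = 1


def can_view(asset, user, consortium_members):
    """Return True if `user` may view `asset` under its current scope."""
    if user == asset.owner or asset.scope is Scope.WORLD:
        return True
    if asset.scope is Scope.CONSORTIUM:
        return user in consortium_members
    # CHOSEN_FEW: only explicitly named people.
    return user in asset.shared_with


members = {"alice", "bob", "carol"}
sop = Asset("Glycolysis SOP", owner="alice", shared_with={"bob"})
print(can_view(sop, "bob", members), can_view(sop, "carol", members))
```

Widening the scope stays a decision of the owner, which is the point the PALs made: the default here is the most restrictive level.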
  • 17. The PALs told us what tools were used For storing data
  • 18. The PALs told us what tools were used For storing data  Spreadsheets on a hard drive, or maybe on a wiki
  • 19. The PALs told us what tools were used For storing data  Spreadsheets on a hard drive, or maybe on a wiki For analysing data
  • 20. The PALs told us what tools were used For storing data  Spreadsheets on a hard drive, or maybe on a wiki For analysing data  Spreadsheets (graphs, calculations)
  • 21. The PALs told us what tools were used For storing data  Spreadsheets on a hard drive, or maybe on a wiki For analysing data  Spreadsheets (graphs, calculations) For sharing data
  • 22. The PALs told us what tools were used For storing data  Spreadsheets on a hard drive, or maybe on a wiki For analysing data  Spreadsheets (graphs, calculations) For sharing data  Spreadsheets (attached to emails or in a wiki)
  • 23. The PALs told us what tools were used For storing data  Spreadsheets on a hard drive, or maybe on a wiki For analysing data  Spreadsheets (graphs, calculations) For sharing data  Spreadsheets (attached to emails) We should build some spreadsheet tools!
  • 24. THE SEEK IN USE. BASIC PRINCIPLES: USE WHAT IS AVAILABLE
  • 25. Types of data. Multiple omics: genomics, transcriptomics, proteomics, metabolomics, fluxomics, reactomics. Images. Molecular biology. Reaction kinetics. Models: metabolic, gene network, kinetic. Relationships between data sets/experiments: procedures, experiments, data, results and models. Analysis of data.
  • 26. Minimum Information Models. Formats, ontologies, naming schemes and controlled vocabularies. What is the least amount of information required to find, interpret, understand and reuse? Realistic and pragmatic expectations for annotation: ask for too much = people are overwhelmed and provide less; ask for too little = people will oblige. Transcriptomics, proteomics, metabolomics: 40+ MIBBI.
  • 27. Not quite available “off the shelf”. Loose guidelines or checklists. Specific formats (generally in XML). Specific formats with associated ontologies. Remaining questions for the scientists: How can I generate standards-compliant data? Which vocabularies/ontologies should I use? How do I know which ontology terms to use where?
  • 28. Data Templates and Vocabularies (diagram: investigations, studies and assays linked to SOPs and data templates for metabolomics, proteomics, mass spec, fluxomics and transcriptomics, alongside model construction and validation)
  • 29. Overarching ‘MIBBI’ standard for data: the “Just Enough Results Model” (JERM) in SEEK. What type of data is it: microarray, growth curve, enzyme activity… What was measured: gene expression, OD, metabolite concentration… What do the values in the datasets mean: units, time series, repeats… Format and JERM ontology.
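The three JERM questions amount to a small set of required metadata fields per data set. A minimal sketch of that idea follows, assuming a hypothetical `JermRecord` structure; the field names are illustrative and do not come from the JERM ontology itself.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class JermRecord:
    """Hypothetical 'just enough' description of one data set."""
    data_type: Optional[str] = None    # e.g. microarray, growth curve
    measured: Optional[str] = None     # e.g. gene expression, OD
    units: Optional[str] = None        # what the values mean
    file_format: Optional[str] = None  # e.g. spreadsheet template


def missing_fields(record):
    """List the fields the submitter still has to fill in."""
    return [f.name for f in fields(record)
            if getattr(record, f.name) in (None, "")]


record = JermRecord(data_type="growth curve", measured="OD")
print(missing_fields(record))
```

Keeping the list of required fields short reflects the slide's point about pragmatic expectations: ask for too much and people provide less.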
  • 30. Data: standard-based templates
  • 31. RightField: Managing Vocabularies. Excel workbook with marked-up cells. Selected parent term from the ontology. Methods for specifying ontology terms. Value type and property. Term lists for selected cells.
  • 32. RightField Results. Ontology details hidden, simple drop-down lists. Vanilla spreadsheets: no macros or plugins. Ontology and term IRIs preserved for versioning/provenance.
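The idea of showing a plain drop-down list while keeping the ontology term IRI behind each choice can be sketched in a few lines. This does not use RightField itself; `ALLOWED_TERMS`, `resolve_cell` and the example.org IRIs are hypothetical placeholders.

```python
# Hypothetical term list for one marked-up cell: the user sees the label,
# the IRI is kept for provenance. The IRIs are placeholders, not real terms.
ALLOWED_TERMS = {
    "gene expression": "http://example.org/ontology#GeneExpression",
    "optical density": "http://example.org/ontology#OpticalDensity",
    "metabolite concentration": "http://example.org/ontology#MetaboliteConcentration",
}


def resolve_cell(value):
    """Map a cell value chosen from the drop-down list to its term IRI.

    Raises ValueError if the value is not one of the allowed labels.
    """
    key = value.strip().lower()
    if key not in ALLOWED_TERMS:
        raise ValueError(f"'{value}' is not in the term list for this cell")
    return ALLOWED_TERMS[key]


print(resolve_cell("Optical density"))
```

The scientist only ever picks a label; the IRI travels with the spreadsheet so the annotation can still be traced if the ontology changes version.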
  • 33. Omics data: standard-based templates
  • 34. Semantic Linking and Querying. Extracting and storing metadata in RDF (Resource Description Framework), via RightField. Better searching and querying. New representations and visualisation of relationships between SEEK assets. Linking SEEK data to the web of Linked Data.
  • 35. Exploring and Analysing: The SEEK as a Gateway. Explore spreadsheets. Cytoscape. JWS Online. SYCAMORE.
  • 36. Public SEEK: http://www.seek4science.org
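To illustrate what "storing metadata in RDF" means in practice, the sketch below turns a dictionary of annotations into N-Triples lines using only the standard library. The subject IRI, the vocabulary namespace and the function names are assumptions made for this example, not SEEK's actual schema or extraction code.

```python
def escape_literal(text):
    """Escape backslashes, quotes and newlines for an N-Triples literal."""
    return (text.replace("\\", "\\\\")
                .replace('"', '\\"')
                .replace("\n", "\\n"))


def to_ntriples(subject, properties, vocab="http://example.org/vocab#"):
    """Serialise {property: value} metadata about `subject` as N-Triples.

    `vocab` is a placeholder namespace; every value is written as a
    plain string literal.
    """
    lines = []
    for name, value in sorted(properties.items()):
        literal = escape_literal(str(value))
        lines.append(f'<{subject}> <{vocab}{name}> "{literal}" .')
    return lines


triples = to_ntriples(
    "http://example.org/data/1",
    {"measured": "metabolite concentration", "units": "mM"},
)
print("\n".join(triples))
```

Once metadata is in this subject-property-value form, assets that share a property or value can be queried and linked together, which is the benefit the slide lists.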
  • 37. DEALING WITH LARGE DATA
  • 38. Dealing with Large Data
  • 39. Dealing with Large Data. Submit big data via drag and drop. Submit by mail (To: Alice; Cc: seek@virtual-liver.de).
  • 40. BUDGET CUTS: MAKING THE MOST OF YOUR RESOURCES
  • 41. SEEK Properties. Virtual machine: easy to install and maintain. Basic hardware/software requirements: low-cost start-up. Extensible: add own plugins. Open source: affordable / free. Long-term storage and dissemination still an issue.
  • 42. Incentives for using the SEEK. Funding requirement to keep and give access to the data for 10 years. SEEK is a safe haven for the data: 10-year guarantee from HITS. Other SEEK administrators providing the same. ‘SEEK Mothership’ planned for the European Systems Biology community.
  • 43. Incentives for using SEEK. Credit and attribution: linking people to the data; linking data to publications. Accountability: SEEK records who owns what; SEEK records when data was uploaded/shared; natural competitiveness between projects.
  • 44. Incentives for using SEEK. Exporting and linking: submission to public repositories (silos not ideal for Systems Biology, but useful for meta-analysis); supplementary materials store with persistent URIs; linking publications and data.
  • 45. Why it works for us. Off-the-shelf Systems Biology sharing environment. Fits in with existing data and model management practices. Incremental production with rapid prototypes and feedback from PALs. Publish and share within the consortium and beyond. Scientists stay in control.
  • 46. http://bib.oxfordjournals.org/content/early/2012/10/09/bib.bbs064.full.pdf
  • 47. Acknowledgements. SysMO-DB Team: Carole Goble, Stuart Owen, Katy Wolstencroft, Rob Haines, Niall Beard, Finn Bacall, Matt Horridge, Sergejs Aleksejevs. Stellenbosch: Franco Du Preez, Jacky Snoep. HITS: Wolfgang Mueller, Olga Krebs, Quyen Nguyen, Andreas Weidemann. SysMO PALS.