SyMBA: Overview

SyMBA (http://symba.sf.net) is a data archive and integrator based on Version 1 of the Functional Genomics Experiment (FuGE, http://fuge.sf.net) Object Model (FuGE-OM). It archives, stores, and retrieves raw high-throughput data. Until now, few published systems have successfully integrated multiple omics data types and information about experiments in a single database. SyMBA includes a database back-end, expert and standard interfaces, and a Life Science Identifier (LSID) Resolution and Assigning service to identify objects and provide programmatic access to the database. A central data repository prevents deletion, loss, or accidental modification of primary data, while giving convenient access to the data for publication and analysis. It also provides a central location for storing the metadata of high-throughput data sets, which will facilitate subsequent data integration strategies.
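
To give a feel for the identifiers an LSID service hands out, here is a minimal Python sketch that splits an LSID of the general form `urn:lsid:<authority>:<namespace>:<object>[:<revision>]` into its parts. The `parse_lsid` helper and the example identifier are hypothetical illustrations and are not taken from SyMBA's own code.

```python
from typing import NamedTuple, Optional


class Lsid(NamedTuple):
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str]


def parse_lsid(text: str) -> Lsid:
    """Split an LSID string into its components; raise ValueError if malformed."""
    parts = text.split(":")
    # urn + lsid + authority + namespace + object, plus an optional revision.
    if len(parts) not in (5, 6):
        raise ValueError(f"not an LSID: {text!r}")
    # The "urn:lsid" prefix is matched case-insensitively.
    if parts[0].lower() != "urn" or parts[1].lower() != "lsid":
        raise ValueError(f"not an LSID: {text!r}")
    if any(part == "" for part in parts[2:5]):
        raise ValueError(f"empty LSID component in: {text!r}")
    revision = parts[5] if len(parts) == 6 else None
    return Lsid(parts[2], parts[3], parts[4], revision)


# Hypothetical identifier, for illustration only.
example = parse_lsid("urn:lsid:example.org:experiment:12345:2")
print(example.authority, example.namespace, example.object_id, example.revision)
```

A resolution service would then look up the object named by the authority, namespace, and object id. That lookup is left out here because it depends on the installation.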

We encourage the use, installation and development of SyMBA by other groups. Please let us know if you are interested in using or evaluating SyMBA for use at your own Centre. Contact us: symba-devel at lists.sourceforge.net

Published in: Technology, Art & Photos

  1. SyMBA Overview
     Allyson Lister [email_address], CISBAN, Newcastle University, March 2009
     Allyson Lister, CC BY-SA 3.0 unless otherwise specified
  2. Systems and Molecular Biology Data and Metadata Archive
       • Background: Handling Big Data
       • Why use SyMBA?
       • What is SyMBA?
       • How is SyMBA used?
     Image: CC-SA-2.0, Tom Murphy VII, commons.wikimedia.org
  3. Background: Handling Big Data
     Image: CC-SA-2.0, Tom Murphy VII, commons.wikimedia.org
  4. Responsible Data Management
       • “Sooner or later, the research community will need to be involved in the annotation effort to scale up to the rate of data generation.” (Nature 455, 47-50)
       • “This transition will require...standardized methods” (Nature 455, 47-50)
     Release of September 2 2008: http://uniprot.org
  5. Commitment to Curation
       • “... standards require support from researchers, who should adopt them and deploy them consistently.” (Nature 455, 1)
       • “This takes a degree of intellectual and practical commitment to what can seem like tedious bookkeeping.” (Nature 455, 1)
     Nature Biotechnology 25, 1127-1133
  6. Documentation as Part of the Experiment
       • “Researchers need to adapt their institutions and practices in response to torrents of new data...” (Nature 455, 1)
       • “Researchers need to be obliged to document and manage their data with as much professionalism as they devote to their experiments.” (Nature 455, 1)
     Image: CC-NC-2.0
  7. It's Not Just Researchers...
       • “Funding agencies have been slow to support data infrastructure and this is one cultural shift that needs to accelerate” (Nature 455, 1)
       • [researchers]... “should receive greater support in this endeavour than they are afforded at present.” (Nature 455, 1)
  8. Researchers as Stewards
       • From Nature 455, 28-29: scientists should act as stewards by
           • Honouring disciplinary standards
           • Defining and recording appropriate metadata to allow for later interpretation of the data
           • Defining that metadata at the time of data capture, when it is best done
           • Including provenance, parameters, and more
       • This is where SyMBA comes in
           • It allows the above, and removes tedious repetition
  9. What is SyMBA?
     Image: CC-SA-2.0, Tom Murphy VII, commons.wikimedia.org
  10. The Three Foundations
        • Content: the information about the experiment
        • Syntax: the structure for that information
        • Semantics: providing agreed-upon definitions for the information
      Image: PD, http://commons.wikimedia.org/wiki/Image:Duke_Ellington_-_Hurricane_Ballroom_-_trio.jpg
  11. Content: MIBBI, e.g.
        • MIAME: what is considered “minimal” for microarrays:
            • the raw data for each hybridisation (e.g., CEL or GPR)
            • the final processed (normalised) data for the set of hybridisations in the experiment
            • the essential sample annotation
            • the experimental design
            • sufficient annotation of the array
            • the essential laboratory and data processing protocols
      Image adapted from mibbi.org; text from http://www.mged.org/Workgroups/MIAME/miame.html
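
A checklist like the MIAME list above lends itself to a simple completeness check. The Python sketch below reports which of the six elements a submission is still missing. The dictionary keys, the `missing_miame_items` helper, and the sample submission are all hypothetical and are not part of MIAME, MIBBI, or SyMBA.

```python
from typing import Dict, List

# The six MIAME elements listed above; the keys are hypothetical field names.
MIAME_ITEMS: Dict[str, str] = {
    "raw_data": "raw data for each hybridisation",
    "processed_data": "final processed (normalised) data",
    "sample_annotation": "essential sample annotation",
    "experimental_design": "experimental design",
    "array_annotation": "sufficient annotation of the array",
    "protocols": "essential laboratory and data processing protocols",
}


def missing_miame_items(submission: Dict[str, object]) -> List[str]:
    """Return descriptions of checklist elements that are absent or empty."""
    return [
        description
        for key, description in MIAME_ITEMS.items()
        if not submission.get(key)
    ]


# Hypothetical, partly completed submission.
submission = {
    "raw_data": ["hyb1.CEL", "hyb2.CEL"],
    "processed_data": "normalised.txt",
    "sample_annotation": {"organism": "Mus musculus"},
    "experimental_design": "",
}

for item in missing_miame_items(submission):
    print("missing:", item)
```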
  12. Syntax: FuGE
        • The Functional Genomics Experiment Object Model & Markup Language (FuGE-OM, FuGE-ML)
        • Standardizes and structures experimental metadata for a range of omics experiments
        • Models experimental objects such as samples, protocols, instruments, and software
        • Provides extension points for the creation of individual community standards
      Image: PD, http://commons.wikimedia.org/wiki/Image:Syntax_tree.svg
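
To make "models experimental objects" concrete, here is a deliberately simplified Python sketch of how samples, protocols, instruments, and software might relate to one another. The class and field names are hypothetical and much smaller than the real FuGE-OM, which this sketch does not attempt to reproduce.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Sample:
    name: str


@dataclass
class Instrument:
    name: str


@dataclass
class Software:
    name: str
    version: str = ""


@dataclass
class Protocol:
    name: str
    instruments: List[Instrument] = field(default_factory=list)
    software: List[Software] = field(default_factory=list)


@dataclass
class ProtocolRun:
    """One application of a protocol to input samples, producing data files."""
    protocol: Protocol
    inputs: List[Sample]
    output_files: List[str] = field(default_factory=list)


def describe(run: ProtocolRun) -> str:
    """Summarise which samples went through which protocol with which tools."""
    samples = ", ".join(sample.name for sample in run.inputs)
    tools = ", ".join(
        [instrument.name for instrument in run.protocol.instruments]
        + [program.name for program in run.protocol.software]
    )
    return f"{run.protocol.name} on [{samples}] using [{tools}]"


# Hypothetical example objects.
protocol = Protocol(
    "hybridisation scan",
    instruments=[Instrument("array scanner")],
    software=[Software("image analysis", "1.0")],
)
run = ProtocolRun(protocol, [Sample("sample A"), Sample("sample B")], ["scan1.CEL"])
print(describe(run))
```

Keeping the protocol separate from each run of it means that the protocol, equipment, and software are described once and then referenced by every experiment that uses them.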
  13. Semantics: OBI and others
        • Encourages unambiguous names for things
        • 'Universal' terms that are applicable across various biological and technological domains
        • Enables computational exploitation of information
      Image: PD, http://commons.wikimedia.org/wiki/Image:Enigma.jpg
  14. Why Use SyMBA?
      Image: CC-SA-2.0, Tom Murphy VII, commons.wikimedia.org
  15. Curation Starts at Home
        • Nature's recent Big Data special emphasized the importance of data curation by the researchers who create the data
        • CISBAN has a way for researchers to provide this metadata at the same time as they archive and back up their data: SyMBA
        • The Big Data special was only 2 weeks ago, but SyMBA has been in development for more than 2 years!
      Image: CC BY-SA 3.0, http://commons.wikimedia.org/wiki/File:DNA_microarray.svg
  16. What does SyMBA do for me?
        • Storage for primary, large-scale data is:
            • Long-term
            • Protected
            • Well organized
            • Easily accessible
            • Searchable
      Image: PD, http://commons.wikimedia.org/wiki/Image:Affymetrix_GeneChip.jpg
  17. What does SyMBA do for me?
        • Keeps histories
        • Promotes data sharing through the use of standards
        • Aids conformance to journal standards of data deposition and description
      Image: nature.com
  18. What does SyMBA do for me?
        • Open Source
            • Code (but not data!) freely available for anyone's contributions
            • Could speed development with a larger programmer base
        • Aids fulfilment of BBSRC best practices
      Image: PD, commons.wikimedia.org/wiki/Image:Wikimedia_Community_Logo-Commons_from_a_blue_planet.svg
  19. How is SyMBA Used?
      Image: CC-SA-2.0, Tom Murphy VII, commons.wikimedia.org
  20. What does SyMBA look like?
        • To the user, SyMBA is a website
        • When the website was being designed, users said they wanted something quick and simple to use.
  21. How do developers prepare SyMBA for users?
        • Developers talk with users
        • Discover what protocols, equipment, and software are used (e.g. answers to MIBBI checklists)
        • Templates are made
        • This saves users from entering data multiple times!
      Image: GNU, commons.wikimedia.org/wiki/Image:Cyberduck_document.png
  22. [Diagram: SyMBA developer-created templates (Template) and the user-created experiments based on them (Exp. 1, Exp. 2, Exp. 3)]
      Image: GNU, commons.wikimedia.org/wiki/Image:Cyberduck_document.png
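
The template idea can be sketched in a few lines of Python: a developer fills in the shared protocol, equipment, and software once, and each experiment copies the template and adds only what is specific to it. The `new_experiment` helper, the field names, and the file names are hypothetical and do not reflect SyMBA's actual template format.

```python
import copy
from typing import Any, Dict


def new_experiment(template: Dict[str, Any], **details: Any) -> Dict[str, Any]:
    """Copy a template and fill in the experiment-specific details."""
    experiment = copy.deepcopy(template)  # the shared template is never mutated
    experiment.update(details)
    return experiment


# Developer-created template: entered once, reused by every experiment.
microarray_template = {
    "protocol": "microarray hybridisation",
    "equipment": ["array scanner"],
    "software": ["image analysis"],
}

# User-created experiments: only the name and data file differ.
experiments = [
    new_experiment(microarray_template, name=f"Exp. {n}", data_file=f"exp{n}.CEL")
    for n in (1, 2, 3)
]

for experiment in experiments:
    print(experiment["name"], experiment["protocol"], experiment["data_file"])
```

The deep copy is the important design choice: a change made to one experiment's equipment list should not leak back into the template or into other experiments.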
  23. The Future...
        • Update the interface to make it prettier
        • Template Creation Wizard
        • Provide batch loading features
      Image: CC BY 2.5, http://commons.wikimedia.org/wiki/Image:DeLorean_DMC-12_Head_with_doors_open.png
  24. When FuGE is more extensively used...
        • EBI plans on having databases that understand FuGE
        • This could mean automatic upload from SyMBA to EBI
        • If other research groups store data using the FuGE format, then we could share experimental information much more easily
  25. Credits
        • Programmers
            • Allyson Lister, Olly Shaw, Frank Gibson, Joerg Servos, Rainer Schopf
        • Bioinformatics Support Unit, Newcastle Uni
            • Dan Swan, Simon Cockell
        • Ideas People
            • Matt Pocock, Neil Wipat, Jen Hallinan, Phil Lord, Andy Jones
        • Tom Kirkwood and all at CISBAN for all their testing and more
  26. Thank You
      Image: CC-SA-2.0, http://commons.wikimedia.org/wiki/Image:Thank_you_trashcan.jpg
  27. More information
        • Developed mainly at: http://www.cisban.ac.uk
        • Project documentation: http://symba.sf.net
        • Mailing list: [email_address]
        • Sandbox (playground) installation: http://www.cisban.ac.uk/symba-sandbox
  28. Small Print
        • Legend for license abbreviations in the body of the presentation:
            • CC-SA-2.0 is the Creative Commons Attribution-ShareAlike 2.0 Generic license. Details here: http://creativecommons.org/licenses/by-sa/2.0/
            • CC-BY-2.5 is the Creative Commons Attribution 2.5 Generic license. Details here: http://creativecommons.org/licenses/by/2.5
            • CC-NC-2.0 is the Creative Commons Non-Commercial 2.0 license. Details here: http://creativecommons.org/licenses/by-nc/2.0/uk/
        • PD: Public Domain, no restrictions
        • I have tried to keep attribution for all images used. Please let me know if I have gotten anything wrong. All other portions of this presentation are copyright Allyson Lister and her employers, under the CC BY-SA 3.0 license. See http://creativecommons.org/licenses/by-sa/3.0
