SyMBA Overview Allyson Lister [email_address] CISBAN, Newcastle University March 2009 Allyson Lister, CC BY-SA 3.0 unless otherwise specified
Systems and Molecular Biology Data and Metadata Archive
Background: Handling Big Data
Why use SyMBA?
What is SyMBA?
How is SyMBA used?
CC-SA-2.0, Tom Murphy VII, commons.wikimedia.org
Background: Handling Big Data
CC-SA-2.0, Tom Murphy VII, commons.wikimedia.org
Responsible Data Management
“Sooner or later, the research community will need to be involved in the annotation effort to scale up to the rate of data generation.” Nature 455, 47-50
“This transition will require...standardized methods” Nature 455, 47-50
Release of September 2 2008: http://uniprot.org
Commitment to Curation
“... standards require support from researchers, who should adopt them and deploy them consistently.” Nature 455, 1
“This takes a degree of intellectual and practical commitment to what can seem like tedious bookkeeping.” Nature 455, 1
Nature Biotechnology 25, 1127 - 1133
Documentation as Part of the Experiment
“Researchers need to adapt their institutions and practices in response to torrents of new data...” (Nature 455, 1)
“Researchers need to be obliged to document and manage their data with as much professionalism as they devote to their experiments.” (Nature 455, 1)
CC-NC-2.0
It's Not Just Researchers...
“Funding agencies have been slow to support data infrastructure and this is one cultural shift that needs to accelerate” Nature 455, 1
[researchers]... “should receive greater support in this endeavour than they are afforded at present.” Nature 455, 1
Researchers as Stewards
From Nature 455, 28-29: Scientists should act as stewards by
Honouring disciplinary standards
Defining and recording appropriate metadata to allow for later interpretation of the data
Metadata is best defined at the time of data capture
This includes provenance, parameters, and more
This is where SyMBA comes in
Allows the above, and removes tedious repetition
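As a rough illustration of recording metadata at the time of data capture, here is a minimal Python sketch. It is not SyMBA's interface: the helper names `capture_metadata` and `write_sidecar`, the field names, and the `.meta.json` sidecar convention are all hypothetical, chosen only to show provenance (who, when, a checksum of the raw file) and parameters being stored alongside the data.

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path


def capture_metadata(data_file, investigator, parameters):
    """Build a metadata record for a raw data file at capture time.

    Hypothetical helper for illustration only; not part of SyMBA.
    """
    raw = Path(data_file).read_bytes()
    return {
        "file": Path(data_file).name,
        # A checksum lets later users detect accidental modification.
        "sha256": hashlib.sha256(raw).hexdigest(),
        "size_bytes": len(raw),
        # Provenance: who produced the data and when it was recorded.
        "investigator": investigator,
        "captured_at": datetime.now(timezone.utc).isoformat(),
        # Experimental parameters, copied so that later changes to the
        # caller's dict do not alter the stored record.
        "parameters": dict(parameters),
    }


def write_sidecar(data_file, record):
    """Store the record next to the data file as <name>.meta.json."""
    sidecar = Path(str(data_file) + ".meta.json")
    sidecar.write_text(json.dumps(record, indent=2, sort_keys=True))
    return sidecar
```

Calling `capture_metadata("run1.dat", "A. Researcher", {"temperature_c": 30})` and passing the result to `write_sidecar` would leave a `run1.dat.meta.json` file beside the raw data. The point of the sketch is that the record is created in the same step as the data, rather than reconstructed afterwards.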
What is SyMBA?
CC-SA-2.0, Tom Murphy VII, commons.wikimedia.org
The Three Foundations
Content: the information about the experiment
Syntax: the structure for that information
Semantics: providing agreed-upon definitions for the information
Legend for license abbreviations in the body of the presentation:
CC-SA-2.0 is the Creative Commons Attribution – Share Alike 2.0 Generic license. Details here: http://creativecommons.org/licenses/by-sa/2.0/
CC-BY-2.5 is the Creative Commons Attribution 2.5 Generic license. Details here: http://creativecommons.org/licenses/by/2.5
CC-NC-2.0 is the Creative Commons Attribution – Non-Commercial 2.0 license. Details here: http://creativecommons.org/licenses/by-nc/2.0/uk/
PD: Public Domain, no restrictions
I have strived to keep attribution for all images used. Please let me know if I have gotten anything wrong. Please note that all other portions of this presentation are copyright Allyson Lister and her employers, and are licensed under CC BY-SA 3.0. See http://creativecommons.org/licenses/by-sa/3.0
SyMBA (http://symba.sf.net) is a data archive and integrator based on Version 1 of the Functional Genomics Experiment (FuGE, http://fuge.sf.net) Object Model (FuGE-OM). It archives, stores, and retrieves raw high-throughput data. Until now, few published systems have successfully integrated multiple omics data types and information about experiments in a single database. SyMBA includes a database back-end, expert and standard interfaces, and a Life Science Identifier (LSID) Resolution and Assigning service to identify objects and provide programmatic access to the database. Having a central data repository prevents deletion, loss, or accidental modification of primary data, while giving convenient access to the data for publication and analysis. It also provides a central location for storing metadata for the high-throughput data sets, and will facilitate subsequent data integration strategies.
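To make the LSID idea concrete, the following Python sketch parses the general LSID URN layout, `urn:lsid:authority:namespace:object[:revision]`. It does not show SyMBA's own resolution or assigning service; the `LSID` class, the `parse_lsid` function, and the example identifier are hypothetical and exist only for illustration.

```python
from typing import NamedTuple, Optional


class LSID(NamedTuple):
    """The parts of a Life Science Identifier (illustrative type)."""
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str] = None

    def __str__(self):
        parts = ["urn", "lsid", self.authority, self.namespace, self.object_id]
        if self.revision is not None:
            parts.append(self.revision)
        return ":".join(parts)


def parse_lsid(text):
    """Split an LSID string into its parts, raising ValueError if malformed."""
    parts = text.split(":")
    # Expect "urn", "lsid", three mandatory fields, and an optional revision.
    if len(parts) not in (5, 6):
        raise ValueError(f"not an LSID: {text!r}")
    # The "urn:lsid" prefix is matched case-insensitively.
    if parts[0].lower() != "urn" or parts[1].lower() != "lsid":
        raise ValueError(f"not an LSID: {text!r}")
    if any(field == "" for field in parts[2:5]):
        raise ValueError(f"empty LSID field in: {text!r}")
    revision = parts[5] if len(parts) == 6 and parts[5] else None
    return LSID(parts[2], parts[3], parts[4], revision)


# Made-up identifier; the authority and namespace are not real SyMBA values.
example = parse_lsid("urn:lsid:example.org:experiment:12345:1")
```

Because the identifier carries its authority and namespace, a client can decide which service to ask for the object without knowing anything about the database schema behind it.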
We encourage the use, installation and development of SyMBA by other groups. Please let us know if you are interested in using or evaluating SyMBA for use at your own Centre. Contact us: symba-devel at lists.sourceforge.net