Software Development by the Genomics Standards Consortium

Bringing Standards to Life:
Software Development by the Genomic Standards Consortium

Renzo Kottmann
Microbial Genomics Group
Max Planck Institute for Marine Microbiology

M3 SIG, Stockholm, July 2009

Genomic Standards Consortium (GSC)

Goal
• Promote mechanisms that
  ◦ standardize the description of genomes
  ◦ exchange and integrate genomic data

Open-membership, international working body
• Established in Sept 2005
• Participants include DDBJ, EMBL, GenBank, Sanger,
JCVI, JGI, EBI and a range of US, UK and EU research
institutions
• Organized a series of workshops

http://gensc.org and http://gensc.org/gc_wiki/index.php/GSC_Membership

Minimum Information about a Genome Sequence
(MIGS) Specification

MIGS extends what DDBJ/EMBL/GenBank request
upon submission of a genome sequence
• Examples:
  ◦ Description of geographic location of a sample and habitat
  ◦ “Minimum Information about a Metagenomic Sequence” (MIMS)
    – Temperature
    – pH
  ◦ Description of sequence generation
    – Sequencing method
    – Assembly method

Field et al. Nat Biotechnol. 2008

MIGS Checklist 2.0

[Figure: MIGS Checklist 2.0 table; M = mandatory]
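The idea of a checklist with mandatory (M) fields can be illustrated with a minimal Python sketch. The field names below paraphrase the examples named on the MIGS slide; which of them are flagged as mandatory is an assumption made for illustration and does not reproduce the real MIGS 2.0 assignments. `ChecklistField`, `CHECKLIST` and `missing_mandatory` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChecklistField:
    name: str
    mandatory: bool


# Illustrative only: names paraphrase the slide examples; the mandatory
# flags are assumptions, not the actual MIGS 2.0 checklist.
CHECKLIST = [
    ChecklistField("geographic_location", True),
    ChecklistField("habitat", True),
    ChecklistField("sequencing_method", True),
    ChecklistField("assembly_method", True),
    ChecklistField("temperature", False),
    ChecklistField("ph", False),
]


def missing_mandatory(report, checklist=CHECKLIST):
    """Return the names of mandatory fields that are absent or empty."""
    missing = []
    for item in checklist:
        if not item.mandatory:
            continue
        value = report.get(item.name)
        if value is None or not str(value).strip():
            missing.append(item.name)
    return missing
```

A report would count as compliant in this sketch when `missing_mandatory(report)` returns an empty list.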

Software Development for MIGS/MIMS

Mechanisms for achieving compliance are needed:
• Such mechanisms involve
  ◦ an appropriate reporting structure for capturing and exchanging data,
  ◦ software,
  ◦ databases,
  ◦ and controlled vocabularies and/or ontologies for defining the terms used in the annotations.

Supporting Projects:
• Habitat-Lite (Ontology specification)
• Genomic Rosetta Stone (Identifier Mapping)
• GCDML (MIGS/MIMS specification in XML)
• Genome Catalogue (Database and Web Server)


Habitat-Lite (= EnvO-Lite)
Easy-to-use (small) set of terms
• Captures high-level information about habitat
• Derived from the Environment Ontology (EnvO).

Meets the needs of multiple users
• Annotators, database providers, biologists, and bioinformaticians who need to search and use such data in comparative analyses.

Hirschman et al. OMICS. 2008

Habitat-Lite

Level 1 terms
• Aquatic
• Aquatic: Freshwater
• Aquatic: Marine
• Terrestrial
• Air
• Fossil
• Food
• Organism-Associated
• Extreme Habitat
• Other

Level 2 terms
• soil
• sediment
• sludge
• waste water
• hot spring
• hydrothermal vent
• biofilm
• microbial mat

Fewer than 20 terms in total

Hirschman et al. OMICS. 2008

Habitat-Lite applied

http://www.megx.net/genomes

Genomic Rosetta Stone (GRS)
Create a unified mapping between different genomic resources
Improve navigation across these resources
Enable the integration of this information in the near future

Van Brabant et al. OMICS. 2008
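A unified mapping between resources can be pictured as a table that groups the identifiers each resource uses for the same genome. The Python sketch below is only an illustration of that idea; the class `IdentifierMap`, the resource names and the identifiers are hypothetical and do not describe the actual GRS implementation.

```python
from collections import defaultdict


class IdentifierMap:
    """Groups identifiers from different resources under one shared entry."""

    def __init__(self):
        self._by_entry = defaultdict(dict)  # entry -> {resource: identifier}
        self._reverse = {}                  # (resource, identifier) -> entry

    def link(self, entry, resource, identifier):
        """Record that `resource` knows `entry` under `identifier`."""
        self._by_entry[entry][resource] = identifier
        self._reverse[(resource, identifier)] = entry

    def translate(self, resource, identifier, target):
        """Map an identifier in one resource to the one used by `target`."""
        entry = self._reverse.get((resource, identifier))
        if entry is None:
            return None
        return self._by_entry[entry].get(target)
```

With such a structure, navigation between resources becomes a lookup: `translate("resource_a", "A123", "resource_b")` returns the identifier that a second, hypothetical resource uses for the same genome, or `None` if no link is recorded.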



Genomic Contextual Data
Markup Language (GCDML)

A markup language based on the Extensible Markup Language (XML)

Aim
• Implement MIGS/MIMS
• Provide even more descriptors
• Facilitate exchange and integration of genomic data

Kottmann et al. OMICS. 2008

GCDML Example (excerpt)

<gcdml:originalSample>
  <gcdml:physicalMaterial>
    <gcdml:samplingTime><gcdml:notGiven>unknown</gcdml:notGiven></gcdml:samplingTime>

    <gcdml:samplePointLocation>
      <gml:LocationKeyWord>Baltic Sea</gml:LocationKeyWord>
      <gml:LocationString>Kiel Fjord, Baltic Sea, Germany</gml:LocationString>
      <gcdml:pos2D>54.329 10.149</gcdml:pos2D>
      <gcdml:determinationMethod>derived from literature</gcdml:determinationMethod>
    </gcdml:samplePointLocation>

    <gcdml:marineHabitat>
      <gcdml:waterBody>
        <gcdml:depth>
          <gcdml:measure min="0.00" max="0.05"><gcdml:values uom="m">0.00 0.05</gcdml:values></gcdml:measure>
        </gcdml:depth>
      </gcdml:waterBody>
    </gcdml:marineHabitat>

    <gcdml:materialType>seawater</gcdml:materialType>
    <gcdml:amount><gcdml:measure><gcdml:values uom="ml">100</gcdml:values></gcdml:measure></gcdml:amount>
  </gcdml:physicalMaterial>
</gcdml:originalSample>
Kottmann et al. OMICS. 2008
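To show how software might consume such a report, here is a minimal Python sketch that reads position, depth and material type from a shortened copy of the excerpt using the standard library. The excerpt does not show its namespace declarations, so the `urn:example:gcdml` URI is a placeholder, not the real GCDML namespace. Reading `pos2D` as "latitude longitude" is an assumption based on the Kiel Fjord coordinates; `read_sample` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace URI: the real declaration is not shown in the excerpt.
NS = {"gcdml": "urn:example:gcdml"}

SAMPLE = """
<gcdml:originalSample xmlns:gcdml="urn:example:gcdml">
  <gcdml:physicalMaterial>
    <gcdml:samplePointLocation>
      <gcdml:pos2D>54.329 10.149</gcdml:pos2D>
    </gcdml:samplePointLocation>
    <gcdml:marineHabitat>
      <gcdml:waterBody>
        <gcdml:depth>
          <gcdml:measure min="0.00" max="0.05"><gcdml:values uom="m">0.00 0.05</gcdml:values></gcdml:measure>
        </gcdml:depth>
      </gcdml:waterBody>
    </gcdml:marineHabitat>
    <gcdml:materialType>seawater</gcdml:materialType>
  </gcdml:physicalMaterial>
</gcdml:originalSample>
""".strip()


def read_sample(xml_text):
    """Extract a few contextual data items from a GCDML-like sample record."""
    root = ET.fromstring(xml_text)
    # Assumed order: latitude first, then longitude.
    lat, lon = (float(v) for v in root.find(".//gcdml:pos2D", NS).text.split())
    values = root.find(".//gcdml:depth/gcdml:measure/gcdml:values", NS)
    return {
        "latitude": lat,
        "longitude": lon,
        "depth": [float(v) for v in values.text.split()],
        "depth_unit": values.get("uom"),
        "material": root.findtext(".//gcdml:materialType", namespaces=NS),
    }
```

Calling `read_sample(SAMPLE)` yields a plain dictionary that a database loader or analysis script could use without knowing the XML layout.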






Genome Catalogue
Online system for capturing MIGS/MIMS-compliant reports

Field et al. Nature 2008

Genome Catalogue
Requirements
• Rich toolkit, user-friendly
• Designed to give credit to all contributors
• XML-based (GCDML)
  ◦ Able to maintain all versions of GCDML schemas
• Web services-based
  ◦ Supporting the automated exchange of content
• Serve as the international GCAT identifier authority
• Comprehensive
  ◦ Containing reports for all taxa and metagenomes
• Ontology-supportive
• Shared by the GSC

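Several of these requirements (issuing identifiers, crediting contributors, keeping track of the schema version a report was written against) can be pictured with a small in-memory Python sketch. It is not the Genome Catalogue design: the class names, the `EXAMPLE_` identifier format and the version strings are all hypothetical.

```python
from dataclasses import dataclass
from itertools import count


@dataclass(frozen=True)
class Report:
    identifier: str
    schema_version: str   # schema version the XML was written against
    xml: str
    contributors: tuple   # everyone to be credited for this report


class Catalogue:
    """Toy report store that issues identifiers and records provenance."""

    def __init__(self, schema_versions):
        self._schema_versions = set(schema_versions)
        self._reports = {}
        self._next = count(1)

    def submit(self, xml, schema_version, contributors):
        """Store a report and return its newly issued identifier."""
        if schema_version not in self._schema_versions:
            raise ValueError(f"unknown schema version: {schema_version}")
        # Hypothetical identifier format, for illustration only.
        identifier = f"EXAMPLE_{next(self._next):06d}"
        self._reports[identifier] = Report(
            identifier, schema_version, xml, tuple(contributors)
        )
        return identifier

    def get(self, identifier):
        return self._reports[identifier]
```

A web service layer, as listed in the requirements, would expose `submit` and `get` over the network; that part is left out of the sketch.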

Current Status
We have specifications:
• MIGS/MIMS
• Habitat-Lite
• Genomic Rosetta Stone
Work on supporting software is ongoing:
• Genome Catalogue is in prototype status
• Funding
  ◦ This is a long-term endeavour that cannot be done on a voluntary basis


Discussion
Software is needed for:
• Creation of MIGS/MIMS data
• Storage
• Analysis
Expand standardization efforts to
• Software specification/development
• Work on a standardized genomic data management architecture / cyberinfrastructure
Data-intensive science is successful if it works towards one community with one vision
• World Wide Genomics project


Acknowledgements

All members of the GSC, incl.
  ◦ Dawn Field
  ◦ Peter Sterk
  ◦ Saul Kravitz
  ◦ Tanya Gray

Megx.net team
  ◦ Frank Oliver Glöckner
  ◦ Ivaylo Kostadinov
  ◦ Melissa Beth Duhaime
  ◦ Pier Luigi Buttigieg
  ◦ Wolfgang Hankeln
  ◦ Pelin Yilmaz


END

Looking forward to the discussion

Join the GSC
http://gensc.org
