Phyloinformatics VoCamp

Report to the 2009 TDWG Conference in Montpellier, France, about the Phyloinformatics VoCamp that we ran just prior to and into the beginning of the conference. Full details about the VoCamp are here:

http://www.evoio.org/wiki/VoCamp1

Published in: Education
  • Rapidly emerging and changing requirements. Is a monolithic domain ontology better or worse than smaller modular reference ontologies?
  • Phyloinformatics VoCamp

    1. Phyloinformatics VoCamp
       Hilmar Lapp, National Evolutionary Synthesis Center (NESCent)
       TDWG Conference 2009, Montpellier
       http://evoio.org/wiki/VoCamp1
    2. Acknowledgments
       Organizing Committee: Arlin Stoltzfus (Chair), Nico Cellinese, Karen Cranston, Hilmar Lapp, Sheldon McKay, Enrico Pontelli
       30 participants from 7 countries, 28 institutions, and >40 projects
       Sponsors: NESCent, TDWG, LIRMM
       The VoCamp is a Phylogenetics Standards Interest Group activity.
    3. Evolutionary Informatics Working Group (2006-2009)
    4. Evolutionary Database Interoperability Hackathon
       NESCent, March 2009
    5. NSF INTEROP Proposal*: A network for enabling community-driven standards to link evolution into the global web of data (EvoIO)
       (*) Submitted July 2009, currently under review
    6. Motivation
         • Integrating data with the evolution of historical and extant biodiversity is a grand challenge.
         • Data interoperability rests on formalized, shared vocabularies.
         • Such ontologies have emerged as a key gap, and developing them is a community-based project.
         • Hackathons have been highly successful at collaborative development of open & standards-supporting software.
    7. Phyloinformatics VoCamp: The event
         • Hands-on, face-to-face collaboration
         • Participants: 1/2 invited, 1/2 through open CfP
         • Interdisciplinary composition fosters cross-pollination, learning
         • 4 days total, 2 segments
         • Self-organized into 6 subgroups
    8. Publishing Taxonomies Subgroup
       Goal: Determine RDF modeling rules that best support reuse and inference over biological classifications published as linked-data RDF.
       Outcomes:
         • Illuminated modeling taxa as individuals versus as classes.
         • Explored the problems with expressing and inferring monophyly in OWL-DL.
         • Conceptual model of inferring the species of specimens based on taxonomic definitions.
    9. [Diagram: conceptual model of identifying specimens to species using taxonomic definitions. Recoverable labels: class of all organisms; class of things identifiable to any taxon that has been named by name 1; class of annotation; equivalent classes; class of specimens that are correctly identifiable as belonging to taxon 2; taxon 1 and taxon 2 as logical individuals; definition of taxon 1, definition of taxon 2; specimen 1 (type?), specimen 2; name 1; SpeciesID identification process using taxonomic definitions; agent (expert, etc.). Recoverable relations: rdf:type, describes, cites, is defined by, is correctly identifiable as, has been tagged, participates in.]
    10. Taxonomic Reasoning Subgroup
        Goal: Use-case questions requiring inference over phylogenies and related data, and how to answer those with data, ontologies and reasoners.
        Outcomes:
          • Core question: Location -> species present there -> traits of those. Most recent ancestor -> all descendants -> traits of those.
          • Requires geo-referenced species observations, trait ontology with subsumption hierarchy, phylogeny.
          • Many lessons. E.g., real data for this are hard to come by.
          • Iterative development rested on interdisciplinary group.
    11. Integrating Ontologies Subgroup
        Goal: Determine best practices for the building, maintenance and integration of ontologies shared across domains.
        Outcomes:
          • Use-case query: In which environments do we find social tuco-tucos? (Integrates DarwinCore records with phylogenetic character state data.)
          • SPARQL query representing the above.
          • Recommendations for ontology development and annotation based on encountered obstacles.
    12. Triplestore Subgroup
        Goal: Geospatial reasoning over GBIF occurrence data integrated with the IUCN Redlist, using RDF and Franz’ AllegroGraph.
        Outcomes:
          • Worked out many bugs and data-loading issues with Franz.
          • Identified RDF parsing and spec issues of the RDF data sources.
    13. Phyloreferencing Subgroup
        Goal: Define syntax, semantics, and query expressions for nodes (and their subtrees) defined by phylogenetic ancestry.
        Outcomes:
          • 3 principal use-case queries
          • Phyloreference expressions for each
          • Vocabulary for query and specifier semantics started (needs definitions)
          • PhyloWS query specification (CQL-based) started
    14. Phyloreferencing: Example
          • Give me a subtree for my group of interest. E.g., Magnoliaceae from a tree of all plants.
          • Formally: the clade originating with the most recent common ancestor of S1,..., and Sn, with n≥2.
          • Sn are node specifiers: e.g., taxon name, taxonID, specimen, sequence accession
          • Phyloreference: <S1&...&Sn
          • Specifier and query resolution semantics are defined by vocabulary terms.
    15. SADI Subgroup
        Goal: Use the SADI* web service framework to link species occurrence data from GBIF to molecular data in UniProt.
        Outcomes:
          • Used taxon names from UniProt to integrate GBIF occurrence records.
          • Created web services that enable locality queries on UniProt through SPARQL
        (*) See M. Wilkinson’s Wild Ideas presentation this afternoon.
    16. SADI example
    17. Lessons learnt
          • Bootcamps help tremendously, but only with the right level of detail. An expert in the group can be as or more effective.
          • Good use-cases take much more time than you think.
          • Real data is much less available than you think. Use made-up data for learning.
    18. Non-Tangible Outcomes
          • “I learned a lot.”
          • “We have been talking about possible collaborations and grant proposals.”
          • Shared goals and cohesion across communities: e.g., DwC, CDAO, TDWG, GCP
          • Tried to integrate data beyond own domains. E.g., UniProt and GBIF
    19. VoCamp and TDWG
          • More than 10 participants who would not have come otherwise experienced TDWG.
          • Future such events need better coordination with program committee.
          • Can we leverage TDWG to organize a 2010 follow-up event?
          • How do we sustain future such events at TDWG? Can TDWG help with securing funding?
    20. URLs
          • Phyloinformatics VoCamp: http://evoio.org/wiki/VoCamp1
          • Evolutionary Informatics Working Group: http://evoinfo.nescent.org
          • NESCent-sponsored hackathons: http://hackathon.nescent.org
          • EvoIO Interoperability network (NSF proposal under review): http://evoio.org
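
The phyloreferencing example in the slides defines a clade as everything descending from the most recent common ancestor (MRCA) of two or more node specifiers. A minimal Python sketch of that resolution step follows. It is an illustration only, not the subgroup's syntax or the PhyloWS specification: the toy tree, the child-to-parent map, and the function names (`ancestors`, `mrca`, `clade`, `resolve_phyloreference`) are all assumptions made up for this example, and specifiers are simplified to plain node names.

```python
from typing import Dict, List, Optional, Set

# Hypothetical toy tree as a child -> parent map (made-up data for illustration).
PARENT: Dict[str, Optional[str]] = {
    "plants": None,
    "magnoliids": "plants",
    "monocots": "plants",
    "Magnoliaceae": "magnoliids",
    "Lauraceae": "magnoliids",
    "Magnolia": "Magnoliaceae",
    "Liriodendron": "Magnoliaceae",
    "Oryza": "monocots",
}


def ancestors(node: str, parent: Dict[str, Optional[str]]) -> List[str]:
    """Return the path from node up to the root, starting with node itself."""
    path: List[str] = []
    current: Optional[str] = node
    while current is not None:
        path.append(current)
        current = parent[current]
    return path


def mrca(specifiers: List[str], parent: Dict[str, Optional[str]]) -> str:
    """Most recent common ancestor of n >= 2 specifiers (here: node names)."""
    if len(specifiers) < 2:
        raise ValueError("a phyloreference needs at least two specifiers")
    common: Set[str] = set(ancestors(specifiers[0], parent))
    for spec in specifiers[1:]:
        common &= set(ancestors(spec, parent))
    # Walking up from the first specifier, the first shared node is the MRCA.
    for node in ancestors(specifiers[0], parent):
        if node in common:
            return node
    raise ValueError("specifiers share no common ancestor")


def clade(root: str, parent: Dict[str, Optional[str]]) -> Set[str]:
    """All nodes descending from root, including root itself."""
    members: Set[str] = {root}
    changed = True
    while changed:
        changed = False
        for child, par in parent.items():
            if par in members and child not in members:
                members.add(child)
                changed = True
    return members


def resolve_phyloreference(
    specifiers: List[str], parent: Dict[str, Optional[str]]
) -> Set[str]:
    """Clade originating with the MRCA of the given specifiers."""
    return clade(mrca(specifiers, parent), parent)


if __name__ == "__main__":
    print(sorted(resolve_phyloreference(["Magnolia", "Liriodendron"], PARENT)))
```

The same MRCA-then-descendants step is what the Taxonomic Reasoning subgroup's second core question (most recent ancestor -> all descendants -> traits of those) would need before looking up traits. A real resolver would also have to map specifier types such as taxon IDs, specimens, or sequence accessions onto tree nodes, which this sketch leaves out.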
