NeXML - phylogenetic data as XML
 

NeXML is an exchange standard for representing phyloinformatic data — inspired by the commonly used NEXUS format, but more robust and easier to process.

Usage Rights

CC Attribution License

    Presentation Transcript

    • Nexml: A future data exchange standard for phylogenetics (Rutger Vos)
    • Introduction (1/7) The problem
        • Increased automation in evolutionary informatics is hampered by poorly defined “standards”
      Outline: Introduction (The problem, EvoInfo interests, This subproject, Nexus issues, Parsing, Extensibility, XML goodies); Design (Principles, Re-use, Patterns, Inheritance, References); Implementation (Approach, ERD, Inheritance, Anatomy, Characters, Trees); Current status (Schema blocks, Parsers & writers, Experiments, To do); Resources
    • Introduction (2/7) EvoInfo.nescent.org interests
        • Addressing interoperability problems by coding our way out of it
          • Syntax: Nexml
          • Semantics: CDAO
          • Transport: PhyloWS
    • Introduction (3/7) This subproject’s mission
        • To create a file format like nexus*, but:
          • Fix (some) problems with nexus
          • Give access to data at higher level
          • Be extensible
          • Expose data to xml goodies
          • * Maddison, Swofford and Maddison, 1997. NEXUS: An Extensible File Format for Systematic Information. Syst. Biol. 46(4):590-621
    • Introduction (4/7) Nexus issues
        • Hard/impossible to validate
        • No explicit versions
          • Nothing ever deprecated
        • No public extensions
          • Leads to hacks such as ‘mixed’ data, ‘hot comments’
          • Phylogenetics post-’80s in private blocks 
        • https://www.nescent.org/wg_evoinfo/NEXUS_Problems
    • Introduction (5/7) Parsing plain text versus parsing XML
        • Processing nexus data involves lexing + parsing + processing
        • XML allows choosing a parser library; the data can then be processed as a structure that hides tokenization issues
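
As a rough illustration of the second point, the sketch below reads OTU labels with Python's standard `xml.etree.ElementTree` and no hand-written lexer. The XML fragment is a simplified, namespace-free stand-in for a nexml document (an assumption made for brevity, not schema-valid nexml), and `otu_labels` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Simplified, namespace-free stand-in for a nexml fragment (illustrative only).
DOC = """
<nexml>
  <otus id="taxa1">
    <otu id="t1" label="Homo sapiens"/>
    <otu id="t2" label="Pan troglodytes"/>
  </otus>
</nexml>
"""

def otu_labels(xml_text):
    """Return {otu id: label} without writing any tokenizer or grammar code."""
    root = ET.fromstring(xml_text)
    # The parser library has already handled quoting, whitespace and escaping;
    # the application code only walks the resulting element tree.
    return {otu.get("id"): otu.get("label") for otu in root.iter("otu")}

print(otu_labels(DOC))
```

Any other XML parser library could be swapped in here; the application code would still work on elements and attributes rather than on raw tokens.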
    • Introduction (6/7) Extensibility
        • An ‘extensible’ file format should provide the ability to:
          • Define new data types that implement described ‘interfaces’
          • Attach typed data structures to core types
          • Attach custom XML
    • Introduction (7/7) XML goodies
        • Large stack of off-the-shelf tools:
          • XML parser libraries
          • Web service toolkits
          • Native XML databases
          • Editors / IDEs
          • Serialization / data binding tools
    • Design (1/5) Design principles
        • Re-use of prior art
        • Follow design patterns
        • Referencing
        • Verbose and compact representations
    • Design (2/5) Re-use of prior art
        • Generic key/value attachments using RDFa
        • Trees and networks following graphml
        • General file structure following nexus concepts, i.e. blocks that reference each other
    • Design (3/5) XML design patterns
        • http://www.xmlpatterns.com
        • “Declare before use”
        • “Metadata first”
        • “Venetian blinds”
        • Abstract inheritance through extension, concrete inheritance through restriction
    • Design (4/5) Inheritance
        • Diagram of a type chain (arrow labels: four “extends”, one “restricts”):
          • IDTagged (required id attribute)
          • Labelled (optional label attribute)
          • Annotated (optional dict elements)
          • Base (optional base/lang/href attributes)
          • AbstractElement (in root schema)
          • ConcreteElement (in instance document)
    • Design (5/5) Referencing
        • Elements sometimes refer to other elements, much like in nexus
        • In nexml, an element refers to another element by its id, using an attribute named after the referenced element:
      •   <otu id="t1"/>
      •   <!-- referenced later: -->
      •   <node id="n1" otu="t1"/>
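
The referencing convention above can be sketched in a few lines of Python. This is a minimal illustration, assuming a simplified, namespace-free fragment. The `label` attributes, the surrounding `otus`/`trees`/`tree` wrappers as written here, and the helper name `resolve_otus` are assumptions for the example, not taken from the slides.

```python
import xml.etree.ElementTree as ET

# Simplified, namespace-free fragment (illustrative only).
DOC = """
<nexml>
  <otus id="taxa1">
    <otu id="t1" label="Homo sapiens"/>
  </otus>
  <trees id="trees1" otus="taxa1">
    <tree id="tree1">
      <node id="n1" otu="t1"/>
      <node id="n2"/>
    </tree>
  </trees>
</nexml>
"""

def resolve_otus(xml_text):
    """Map each node id to the label of the otu it references, if any."""
    root = ET.fromstring(xml_text)
    # Index every element that carries an id attribute.
    by_id = {el.get("id"): el for el in root.iter() if el.get("id") is not None}
    resolved = {}
    for node in root.iter("node"):
        ref = node.get("otu")  # attribute is named after the referenced element
        if ref is None:
            continue  # e.g. an internal node without an otu
        target = by_id[ref]
        if target.tag != "otu":
            raise ValueError(f"{ref!r} does not refer to an otu element")
        resolved[node.get("id")] = target.get("label")
    return resolved

print(resolve_otus(DOC))
```

Because the attribute name states which kind of element is referenced, a processor can check the target's type as well as its existence.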
    • Implementation (1/6) Approach
        • Schema design
        • Community feedback through wiki, email, telecon, projects (evoinfo, ppod, MIAPA) etc.
        • Development of processors (perl, java, python, c++, javascript, VB) in parallel
        • Experiments with xml tools (ws, db, data binding tools)
    • Implementation (2/6) Entity relationships
    • Implementation (3/6) Inheritance tree for elements
    • Implementation (4/6) anatomy of a “block”
      •   <characters id="c1" xsi:type="nex:DnaSeqs" otus="t1">
      •     <meta id="m1" datatype="xsd:string" xsi:type="nex:LiteralMeta" property="dwc:catalogNumber" content="12345"/>
      •     Contents…
      •   </characters>
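
A minimal Python sketch of how a processor might read such a block: it pulls out the block's id, its `xsi:type`, the `otus` reference, and any `meta` annotations as property/content pairs. Placing `meta` as a child of `characters` and the helper name `describe_block` are assumptions of this sketch. The `xsi` namespace declaration is added so the fragment parses on its own.

```python
import xml.etree.ElementTree as ET

XSI = "http://www.w3.org/2001/XMLSchema-instance"

# The block from the slide, with the xsi prefix declared so it is well-formed.
DOC = """
<characters xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
            id="c1" xsi:type="nex:DnaSeqs" otus="t1">
  <meta id="m1" datatype="xsd:string" xsi:type="nex:LiteralMeta"
        property="dwc:catalogNumber" content="12345"/>
</characters>
"""

def describe_block(xml_text):
    """Summarise a block: identity, concrete type, otus link and annotations."""
    block = ET.fromstring(xml_text)
    annotations = {
        meta.get("property"): meta.get("content")
        for meta in block.findall("meta")
    }
    return {
        "id": block.get("id"),
        # ElementTree exposes namespaced attributes as "{uri}localname".
        "type": block.get(f"{{{XSI}}}type"),
        "otus": block.get("otus"),
        "annotations": annotations,
    }

print(describe_block(DOC))
```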
    • Implementation (5/6) Character Classes
        • Each data type has a Cells class and a Sequence class:
          • DNA: DnaCells, DnaSeqs
          • RNA: RnaCells, RnaSeqs
          • Protein: ProteinCells, ProteinSeqs
          • Standard: StandardCells, StandardSeqs
          • Continuous: ContinuousCells, ContinuousSeqs
          • Restriction: RestrictionCells, RestrictionSeqs
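
The Cells and Seqs variants correspond to the verbose and compact representations mentioned under the design principles. The Python sketch below shows one way a reader could normalise both forms to the same list of states. The element names (`row`, `seq`, `cell`) and the use of the symbol itself as the `state` value are simplifying assumptions. The real schema is richer (for example, state and character definitions) and is not reproduced here.

```python
import xml.etree.ElementTree as ET

# Compact ("Seqs"-style) row: one string of symbols.
COMPACT = '<row id="r1" otu="t1"><seq>ACGT</seq></row>'

# Verbose ("Cells"-style) row: one element per character.
VERBOSE = """
<row id="r1" otu="t1">
  <cell char="c1" state="A"/>
  <cell char="c2" state="C"/>
  <cell char="c3" state="G"/>
  <cell char="c4" state="T"/>
</row>
"""

def row_states(xml_text):
    """Return the row's states as a list, whichever representation is used."""
    row = ET.fromstring(xml_text)
    seq = row.find("seq")
    if seq is not None:
        # Compact form: one symbol per character, whitespace ignored.
        return [symbol for symbol in (seq.text or "") if not symbol.isspace()]
    # Verbose form: each cell names its character and its state.
    return [cell.get("state") for cell in row.findall("cell")]

print(row_states(COMPACT))
print(row_states(VERBOSE))
```

The verbose form leaves room to attach annotations to individual cells, while the compact form keeps large alignments small.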
    • Implementation (6/6) Tree Classes
        • Each structure has an Int class and a Float class:
          • Tree: IntTree, FloatTree
          • Network: IntNetwork, FloatNetwork
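
Since trees follow the graphml style of separate node and edge elements, a reader can rebuild the topology from `source`/`target` pairs. The Python sketch below converts a small tree to a Newick string and uses the declared class to decide how to parse edge lengths. This assumes Int and Float refer to edge lengths. The fragment is simplified, and `to_newick` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

XSI_TYPE = "{http://www.w3.org/2001/XMLSchema-instance}type"

# Simplified tree fragment (illustrative only).
DOC = """
<tree xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      id="tree1" xsi:type="nex:FloatTree">
  <node id="n1" root="true"/>
  <node id="n2" label="A"/>
  <node id="n3" label="B"/>
  <edge id="e1" source="n1" target="n2" length="0.5"/>
  <edge id="e2" source="n1" target="n3" length="1.25"/>
</tree>
"""

def to_newick(xml_text):
    """Rebuild a rooted tree from node/edge elements and emit Newick."""
    tree = ET.fromstring(xml_text)
    # Pick the number parser from the declared class (IntTree vs FloatTree).
    parse_length = int if tree.get(XSI_TYPE, "").endswith("IntTree") else float
    labels = {n.get("id"): n.get("label", "") for n in tree.findall("node")}
    children, lengths, targets = {}, {}, set()
    for edge in tree.findall("edge"):
        source, target = edge.get("source"), edge.get("target")
        children.setdefault(source, []).append(target)
        targets.add(target)
        if edge.get("length") is not None:
            lengths[target] = parse_length(edge.get("length"))
    # The root is the one node that is never the target of an edge.
    root = next(node_id for node_id in labels if node_id not in targets)

    def render(node_id):
        text = labels[node_id]
        kids = children.get(node_id, [])
        if kids:
            text = "(" + ",".join(render(k) for k in kids) + ")" + text
        if node_id in lengths:
            text += ":" + str(lengths[node_id])
        return text

    return render(root) + ";"

print(to_newick(DOC))
```

The recursive renderer is only suitable for small trees. A network, where a node may have more than one parent, would need a different traversal.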
    • Current status (1/4) Schema blocks
        • Done:
          • OTUs
          • characters : dna, rna, nucleotide, protein, categorical, continuous, restriction (compact and verbose)
          • trees : graphml trees and networks, various edge formats and rootings
    • Current status (2/4) Parsers and writers
        • Nexml parsers and writers:
          • Phenex
          • TreeBASE
          • Mesquite
          • Bio::Phylo
          • DendroPy
          • DAMBE
          • Etc.
    • Current status (3/4) Experiments
        • Included schema in soap wsdl
        • Indexed files in dbxml
        • Created large files from tolweb, rbcl
        • XInclude with tinyseq xml
        • REST service described using nexml
    • Current status (4/4) To do
        • Cross-reference with glossary, ontology
        • Substitution model descriptions
        • Publish standard
        • Compact trees
        • Distances
        • Splits
    • Resources
      • NeXML Base URL: http://nexml.org
        • Wiki: /wiki
        • Mailing list: /mail
        • Issue tracker: /tracker
        • SVN repository: /code
      • EvoInfo: http://evoinfo.nescent.org  
      • CDAO: http://www.evolutionaryontology.org
    • Acknowledgements
        • Contributions: Jason Caravas, Mark Holder, Peter Midford, Jeet Sukumaran, Xuhua Xia, Chase Miller, Anurag Priyam, Jaime Huerta-Cepas, Matt Yoder, Andrew Hill, Sam Smits, Mike Keesey, Apurv Verma, Mark Jensen
        • Feedback: wg-evoinfo, pPOD, Wayne Maddison, David Maddison
        • Additional funding, support: NESCent, GSoC