NeXML: A future data exchange standard for phylogenetics
Rutger Vos, University of British Columbia
Published in: Technology, Education
Transcript of "NeXML"

  1. NeXML: A future data exchange standard for phylogenetics
     Rutger Vos, University of British Columbia
  2. Introduction (1/7): The problem
     Increased automation in evolutionary informatics is hampered by poorly defined “standards”
     Outline:
     • Introduction: The problem, EvoInfo interests, This subproject, Nexus issues, Parsing, Extensibility, XML goodies
     • Design: Principles, Re-use, Patterns, Inheritance, References
     • Implementation: Approach, ERD, Inheritance, Anatomy, Characters, Trees
     • Current status: Schema blocks, Parsers & writers, Experiments, To do
     • Resources
  3. Introduction (2/7): EvoInfo interests
     Addressing interoperability problems by coding our way out of it
     • Syntax: NeXML
     • Semantics: CDAO
     • Transport: PhyloWS
  4. Introduction (3/7): This subproject’s mission
     To create a file format like nexus*, but:
     • Fix (some) problems with nexus
     • Give access to data at higher level
     • Be extensible
     • Expose data to xml goodies
     * Maddison, Swofford and Maddison, 1997. NEXUS: An Extensible File Format for Systematic Information. Syst. Biol. 46 (4):590-621
  5. Introduction (4/7): Nexus issues
     https://www.nescent.org/wg_evoinfo/NEXUS_Problems
     • No explicit versions
         - Nothing ever deprecated
     • No public extensions
         - Leads to hacks such as ‘mixed’ data, ‘hot comments’
         - Phylogenetics post-’80s in private blocks
     • Hard/impossible to validate
  6. Introduction (5/7): Parsing plain text versus parsing XML
     • Processing nexus data involves lexing + parsing + processing
     • XML allows choosing a parser library; data can then be processed as a structure that hides tokenization issues
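To make the second point concrete, here is a minimal sketch using Python's standard `xml.etree.ElementTree`. Only the `otu` element and its `id` and `label` attributes come from the slides; the enclosing `otus` wrapper, the sample labels, and the helper name `read_otus` are assumptions made for illustration. Note that a label containing a space needs no special quoting rules on the consumer's side, because the parser library handles tokenization.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: the <otus> wrapper and the labels are illustrative
# assumptions, not taken from the NeXML schema.
DOC = """
<otus id="taxa1">
  <otu id="t1" label="Homo sapiens"/>
  <otu id="t2" label="Pan troglodytes"/>
</otus>
"""


def read_otus(xml_text):
    """Return a mapping of otu id -> label from an XML string."""
    root = ET.fromstring(xml_text)
    # The parser library has already dealt with quoting and whitespace;
    # we only walk the resulting tree structure.
    return {otu.get("id"): otu.get("label") for otu in root.iter("otu")}


if __name__ == "__main__":
    for otu_id, label in read_otus(DOC).items():
        print(otu_id, label)
```

The same job on a plain-text format would require writing the lexer and parser by hand before any processing could start.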
  7. Introduction (6/7): Extensibility
     An extensible file format should provide the ability to:
     • Define new data types that implement described ‘interfaces’
     • Attach typed data structures to core types
     • Attach custom XML
  8. Introduction (7/7): XML goodies
     Large stack of off-the-shelf tools:
     • XML parser libraries
     • Web service toolkits
     • Native XML databases
     • Editors / IDEs
     • Serialization / data binding tools
  9. Design (1/5): Design principles
     • Re-use of prior art
     • Follow design patterns
     • Referencing
     • Verbose and compact representations
  10. Design (2/5): Re-use of prior art
     • Generic key/value attachments following Apple’s plist semantics:
         <dict>
           <key>prior</key>
           <float>0.78</float>
         </dict>
     • Trees and networks following graphml
     • General file structure following nexus concepts, i.e. blocks that reference each other
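As a rough sketch of how such a key/value attachment could be consumed, the snippet below reads the `dict` from the slide into a Python dictionary. The helper name `read_dict` and the set of supported value elements (`float`, `integer`, `string`) are assumptions borrowed from plist conventions, not a statement of what the NeXML schema allows.

```python
import xml.etree.ElementTree as ET

# The dict element as shown on the slide.
SNIPPET = "<dict><key>prior</key><float>0.78</float></dict>"

# Assumed subset of plist-style value elements and their Python converters.
CONVERTERS = {"float": float, "integer": int, "string": str}


def read_dict(dict_elem):
    """Convert alternating <key>/<value> children into a Python dict."""
    children = list(dict_elem)
    result = {}
    # Children come in pairs: a <key> element followed by a typed value.
    for key_elem, value_elem in zip(children[0::2], children[1::2]):
        if key_elem.tag != "key":
            raise ValueError("expected <key>, found <%s>" % key_elem.tag)
        convert = CONVERTERS[value_elem.tag]
        result[key_elem.text] = convert(value_elem.text)
    return result


if __name__ == "__main__":
    print(read_dict(ET.fromstring(SNIPPET)))
```

Because the value element names its own type, the consumer gets a typed value without guessing from the text.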
  11. Design (3/5): XML design patterns
     • “Declare before use”
     • “Metadata first”
     • “Venetian blinds”
     • Abstract inheritance through extension, concrete inheritance through restriction
  12. Design (4/5): Inheritance
     • IDTagged (required id attribute)
     • Labelled (optional label attribute)
     • Annotated (optional dict elements)
     • Base (optional base/lang/href attributes)
     • AbstractElement (in root schema)
     • ConcreteElement (in instance document)
     Relations: Annotated extends Base; Labelled extends Annotated; IDTagged extends Labelled; AbstractElement extends IDTagged; ConcreteElement restricts AbstractElement
  13. Design (5/5): Referencing
     • Elements sometimes refer to other elements, much like in nexus
     • In nexml, elements refer to the id of other elements by the name of the referenced element:
         <otu id="t1"/>
         <!-- referenced later: -->
         <node id="n1" otu="t1"/>
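The referencing convention can be sketched as follows. This is a simplified illustration, not the behaviour of any actual NeXML library: the `nexml` wrapper element and the function name `resolve_otu_refs` are assumptions, and the real nesting of `otu` and `node` elements is flattened here. The sketch walks the document in order, so it also enforces the “declare before use” pattern listed among the design patterns.

```python
import xml.etree.ElementTree as ET

# The <nexml> wrapper is an assumption; the otu and node elements
# are the ones shown on the slide.
DOC = """
<nexml>
  <otu id="t1"/>
  <!-- referenced later: -->
  <node id="n1" otu="t1"/>
</nexml>
"""


def resolve_otu_refs(root):
    """Map each node id to the otu element it refers to.

    Elements are visited in document order, so a node that refers to an
    otu not yet declared raises ValueError.
    """
    declared = {}
    resolved = {}
    for elem in root.iter():
        if elem.tag == "otu":
            declared[elem.get("id")] = elem
        elif elem.tag == "node" and elem.get("otu") is not None:
            ref = elem.get("otu")
            if ref not in declared:
                raise ValueError("node %r refers to undeclared otu %r"
                                 % (elem.get("id"), ref))
            resolved[elem.get("id")] = declared[ref]
    return resolved


if __name__ == "__main__":
    for node_id, otu in resolve_otu_refs(ET.fromstring(DOC)).items():
        print(node_id, "->", otu.get("id"))
```

Since the attribute is named after the element it points to (`otu="t1"`), a reader can tell what kind of element is being referenced without consulting the schema.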
  14. Implementation (1/6): Approach
     • Schema design
     • Community feedback through wiki, email, telecon, projects (evoinfo, ppod, MIAPA) etc.
     • Processors (perl, java, python, c++, VB, JavaScript) development in parallel
     • Experiments with xml tools (ws, db, data binding tools)
  15. Implementation (2/6): Entity relationships
     [Entity-relationship diagram]
  16. Implementation (3/6): Inheritance tree for elements
     [Inheritance tree diagram]
  17. Implementation (4/6): Anatomy of a “block”
         <characters id="c1" xsi:type="nex:DnaSeqs" otus="t1">
           <dict>
             <key>desc</key>
             <string>description …</string>
           </dict>
           Contents…
         </characters>
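A minimal sketch of reading the block's header attributes with Python's standard library follows. The `xsi` namespace URI is the standard XML Schema instance namespace; the URI bound to the `nex` prefix is a placeholder assumption, since the slide does not give it, and `describe_block` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

XSI = "http://www.w3.org/2001/XMLSchema-instance"

# The nex namespace URI below is a placeholder assumption, not the real one.
DOC = """
<characters xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
            xmlns:nex="http://example.org/nexml-placeholder"
            id="c1" xsi:type="nex:DnaSeqs" otus="t1">
  <dict>
    <key>desc</key>
    <string>description</string>
  </dict>
</characters>
"""


def describe_block(elem):
    """Return the id, the referenced otus block and the concrete type."""
    # ElementTree stores namespaced attributes under "{uri}localname".
    qname = elem.get("{%s}type" % XSI)
    # The attribute value is a prefixed name such as "nex:DnaSeqs";
    # keep only the local part.
    local_type = qname.split(":")[-1]
    return {"id": elem.get("id"),
            "otus": elem.get("otus"),
            "type": local_type}


if __name__ == "__main__":
    print(describe_block(ET.fromstring(DOC)))
```

The `xsi:type` attribute is what lets a generic `characters` element declare which concrete subtype it holds.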
  18. Implementation (5/6): Character classes
         Data type     Cells              Sequence
         DNA           DnaCells           DnaSeqs
         RNA           RnaCells           RnaSeqs
         Protein       ProteinCells       ProteinSeqs
         Standard      StandardCells      StandardSeqs
         Continuous    ContinuousCells    ContinuousSeqs
         Restriction   RestrictionCells   RestrictionSeqs
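The class names follow a regular scheme: a data type prefix combined with a `Cells` or `Seqs` suffix (presumably the verbose and compact representations mentioned under the design principles, though the slide does not say which is which). The sketch below, with hypothetical helper names, generates the twelve names and splits a name back into its two parts.

```python
# Prefixes and suffixes as they appear in the class names on the slide.
DATA_TYPES = ("Dna", "Rna", "Protein", "Standard", "Continuous", "Restriction")
REPRESENTATIONS = ("Cells", "Seqs")


def all_class_names():
    """Return every data type / representation combination."""
    return [dtype + rep for dtype in DATA_TYPES for rep in REPRESENTATIONS]


def split_class_name(name):
    """Split e.g. 'DnaSeqs' into ('Dna', 'Seqs'); raise on unknown names."""
    for rep in REPRESENTATIONS:
        prefix = name[:-len(rep)]
        if name.endswith(rep) and prefix in DATA_TYPES:
            return prefix, rep
    raise ValueError("not a known character class: %r" % name)


if __name__ == "__main__":
    for class_name in all_class_names():
        print(class_name, split_class_name(class_name))
```

A processor could use such a split to pick a state alphabet from the prefix and a reading strategy from the suffix.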
  19. Implementation (6/6): Tree classes
         Structure   Int          Float
         Tree        IntTree      FloatTree
         Network     IntNetwork   FloatNetwork
  20. Current status (1/4): Schema blocks
     • Done:
         - OTUs
         - characters: dna, rna, nucleotide, protein, categorical, continuous, restriction (compact and verbose)
         - trees: graphml trees and networks, various edge formats and rootings
  21. Current status (2/4): Parsers and writers
     • NeXML parsers and writers:
         - mesquite (java NeXML class libraries)
         - Bio::Phylo (BioPerl compatible)
         - pyNexml (python)
         - DAMBE (Visual Basic)
         - NCL (C++)
         - JavaScript
  22. Current status (3/4): Experiments
     • Semantic annotation (CDAO) using SAWSDL
     • Scalability:
         - Indexed files in dbxml
         - Created large files from tolweb, rbcl
         - XInclude with tinyseq xml
     • REST Web services:
         - ToL service
         - validation service
         - nexml2json, nexus2xml
         - Schema inclusion in wsdl
  23. Current status (4/4): To do
     • Publish standard
     • More restricted vocabulary attachments (e.g. Darwin core, CDAO-mediated terms)
     • Substitution model descriptions
     • Sets (in progress, using class identifiers)
     • Distances
     • Splits
  24. Resources
     • NeXML base URL: http://www.nexml.org
         - Wiki: /wiki
         - Mailing list: /mail
         - Issue tracker: /tracker
         - SVN repository: /code
     • EvoInfo: http://evoinfo.nescent.org
     • CDAO: http://www.evolutionaryontology.org
  25. Acknowledgements
     • Contributions: Jason Caravas, Mark Holder, Peter Midford, Jeet Sukumaran, Xuhua Xia
     • Feedback: wg-evoinfo, pPOD, Wayne Maddison, David Maddison
     • Additional funding, support: NESCent, GSoC