XML for bioinformatics

My short course about XML and bioinformatics. January 2013.

  1. <XML>
      Pierre Lindenbaum, http://plindenbaum.blogspot.com, @yokofakun (http://twitter.com/yokofakun)
      INSERM-UMR1087, Nantes, January 2013
      https://github.com/lindenb/courses/tree/master/about.xml
  2. Extensible Markup Language
  3. Machine-readable
  4. Human-readable
  5. DOM
  6. ... not always
      [Image: a dense, hard-to-read excerpt of UniProt taxonomy RDF/XML (rdf:Description, rdf:type, scientificName, commonName, rdfs:subClassOf, http://purl.uniprot.org/core/Species)]
  7. Just a format
  8. *.txt
      PMID- 16381885
      OWN - NLM
      STAT- MEDLINE
      DA  - 20051229
      DCOM- 20060228
      LR  - 20091118
      IS  - 1362-4962 (Electronic)
      IS  - 0305-1048 (Linking)
      VI  - 34
      IP  - Database issue
      DP  - 2006 Jan 1
      TI  - From genomics to chemical genomics: new developments in KEGG.
      PG  - D354-7
      AB  - The increasing amount of genomic and molecular information is the basis for
            understanding higher-order biological systems, such as the cell and the
            organism and their interactions with the environment, as well as for medical,
            industrial and other practical applications. The KEGG resource
            (http://www.genome.jp/ke(...)) provides a reference knowledge base for linking
            genomes to biological systems, categorized as building blocks in the genomic
            space (KEGG GENES) and the chemical space (KEGG LIGAND), and wiring diagrams of
            interaction networks and reaction networks (KEGG PATHWAY). A fourth component,
            KEGG BRITE, has been formally added to the KEGG suite of databases. This reflects
            our attempt to computerize functional interpretations as part of the pathway
            reconstruction process based on the hierarchically structured knowledge about
            the genomic, chemical and network spaces. In accordance with the new chemical
            genomics initiatives, the scope (...)
  9. *.xml
      <?xml version="1.0"?>
      <!DOCTYPE PubmedArticleSet PUBLIC "-//NLM//DTD PubMedArticle, 1st January 2008//EN" (...)>
      <PubmedArticleSet>
      <PubmedArticle>
        <MedlineCitation Status="MEDLINE" Owner="NLM">
          <PMID Version="1">16381885</PMID>
          <DateCreated>
            <Year>2005</Year>
            <Month>12</Month>
            <Day>29</Day>
          </DateCreated>
          <DateCompleted>
            <Year>2006</Year>
            <Month>02</Month>
            <Day>28</Day>
          </DateCompleted>
          <DateRevised>
            <Year>2009</Year>
            <Month>11</Month>
            <Day>18</Day>
          </DateRevised>
          <Article PubModel="Print">
            <Journal>
              <ISSN IssnType="Electronic">1362-4962</ISSN>
              <JournalIssue CitedMedium="Internet">
              (...)
  10. *.json
      {
        "header": { "type": "efetch.pubmed", "version": "0.3" },
        "result": [ {
          "medlinecitation": {
            "pmid": { "version": "1", "value": "17284678" },
            "datecreated": { "year": "2007", "month": "03", "day": "02" },
            "datecompleted": { "year": "2007", "month": "04", "day": "05" },
            "daterevised": { "year": "2009", "month": "11", "day": "18" },
            "article": {
              "journal": {
                "issn": { "issntype": "Print", "value": "1088-9051" },
                "journalissue": { "citedmedium": "Print", "volume": "17", "issue": "3", "pubdate": [ "2007", "Mar" ] },
                "title": "Genome research",
                "isoabbreviation": "Genome Res."
              },
              "articletitle": "Sequencing and analysis of chromosome 1 of Eimeria tenella reveals a unique segmental organization.",
              "pagination": [ "311-9" ],
              "abstract": {
                "abstracttexts": [ {
                  "value": "Eimeria tenella is an intracellular protozoan parasite that infects the intestinal tracts of domestic fowl and causes coccidiosis, a serious and sometimes lethal enteritis. Eimeria falls in the same phylum (Apicomplexa) as several human and animal parasites such as Cryptosporidium, Toxoplasma, and the malaria parasite, Plasmodium. Here we report the sequencing and analysis of the first chromosome of E. tenella, a chromosome believed to carry loci associated with drug resi(...)
  11. XML namespace
      Without namespaces (two unrelated <title> elements):
      <my-database>
        <record>
          <title>Record1</title>
          <html>
            <head>
              <title>hello</title>
            </head>
            <body>
              <h1>Hello</h1>
            </body>
          </html>
        </record>
      </my-database>

      With namespaces:
      <my-database
        xmlns="http://mydatabase.org"
        xmlns:h="http://www.w3.org/1999/xhtml">
        <record>
          <title>Record1</title>
          <h:html>
            <h:head>
              <h:title>hello</h:title>
            </h:head>
            <h:body>
              <h:h1>Hello</h:h1>
            </h:body>
          </h:html>
        </record>
      </my-database>
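A minimal Java sketch of what the namespaces buy you, assuming the namespaced document above is inlined as a string (the class `NamespaceDemo` and method `titles` are hypothetical names). With a namespace-aware JAXP parser, both title elements share the local name `title` but report different namespace URIs.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class NamespaceDemo {
    // The namespaced <my-database> document from the slide, inlined as a string
    static final String XML =
        "<my-database xmlns=\"http://mydatabase.org\" xmlns:h=\"http://www.w3.org/1999/xhtml\">"
        + "<record><title>Record1</title>"
        + "<h:html><h:head><h:title>hello</h:title></h:head>"
        + "<h:body><h:h1>Hello</h:h1></h:body></h:html>"
        + "</record></my-database>";

    /** Returns "namespaceURI|text" for every element whose local name is "title". */
    static List<String> titles(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // JAXP parsers are not namespace-aware by default
        Document doc = factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        // "*" matches any namespace: both <title> and <h:title> have the local name "title"
        NodeList nodes = doc.getElementsByTagNameNS("*", "title");
        List<String> out = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            Node n = nodes.item(i);
            out.add(n.getNamespaceURI() + "|" + n.getTextContent());
        }
        return out;
    }

    public static void main(String[] args) throws Exception {
        titles(XML).forEach(System.out::println);
    }
}
```

Running it should print the record title in http://mydatabase.org first, then the XHTML title in http://www.w3.org/1999/xhtml.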
  12. xmllint
  13. xsltproc
  14. Parsing
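  15. DOM
      // document: an already parsed org.w3c.dom.Document (e.g. genes1.xml)
      Element root = document.getDocumentElement();
      // visit the direct children of the root, skipping text (whitespace) nodes
      for (Node item = root.getFirstChild(); item != null; item = item.getNextSibling()) {
          if (item.getNodeType() == Node.ELEMENT_NODE) {
              System.out.println(((Element) item).getAttribute("id"));
          }
      }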
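The DOM snippet assumes a `document` that has already been parsed. A self-contained sketch, assuming genes1.xml (shown on the XPath slide) is inlined as a string and using the JDK's built-in JAXP parser; `DomGenes` and `geneIds` are illustrative names, not part of the course.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class DomGenes {
    // genes1.xml inlined as a string (assumption, to keep the example self-contained)
    static final String GENES1 = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<genes>\n"
        + "  <gene id=\"1\"><name>Gene1</name><name>gene-1</name>"
        + "<sequence>ATAATGCTAGCTAGCTATCGAATG</sequence></gene>\n"
        + "  <gene id=\"2\"><name>Gene2</name><name>gene-2</name>"
        + "<sequence>AATTGCGATTCATCGATGCTATA</sequence></gene>\n"
        + "</genes>\n";

    /** Loads the whole tree in memory, then walks the children of the root element. */
    static List<String> geneIds(String xml) throws Exception {
        Document document = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
        Element root = document.getDocumentElement();
        List<String> ids = new ArrayList<>();
        // same loop as the slide: the whitespace text nodes between <gene> elements are skipped
        for (Node item = root.getFirstChild(); item != null; item = item.getNextSibling()) {
            if (item.getNodeType() == Node.ELEMENT_NODE) {
                ids.add(((Element) item).getAttribute("id"));
            }
        }
        return ids;
    }

    public static void main(String[] args) throws Exception {
        geneIds(GENES1).forEach(System.out::println);
    }
}
```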
  16. StAX
      public interface XMLStreamReader {
          public int next();
          public boolean hasNext();
          public String getText();
          public String getLocalName();
          public String getNamespaceURI();
          // ...other methods not shown
      }
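A sketch of the pull model with the JDK's StAX implementation (javax.xml.stream), again assuming genes1.xml is inlined as a string: the program, not the parser, drives the loop by calling `next()`. It collects the first `<name>` of each `<gene>`; `StaxGenes` and `firstNames` are hypothetical names.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

class StaxGenes {
    // genes1.xml inlined as a string (assumption)
    static final String GENES1 = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<genes>\n"
        + "  <gene id=\"1\"><name>Gene1</name><name>gene-1</name>"
        + "<sequence>ATAATGCTAGCTAGCTATCGAATG</sequence></gene>\n"
        + "  <gene id=\"2\"><name>Gene2</name><name>gene-2</name>"
        + "<sequence>AATTGCGATTCATCGATGCTATA</sequence></gene>\n"
        + "</genes>\n";

    static List<String> firstNames(String xml) throws Exception {
        XMLStreamReader r = XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
        List<String> names = new ArrayList<>();
        boolean waitingForFirstName = false;
        while (r.hasNext()) {
            int event = r.next(); // pull the next event
            if (event == XMLStreamConstants.START_ELEMENT) {
                if (r.getLocalName().equals("gene")) {
                    waitingForFirstName = true;
                } else if (r.getLocalName().equals("name") && waitingForFirstName) {
                    names.add(r.getElementText()); // reads the text up to </name>
                    waitingForFirstName = false;
                }
            }
        }
        r.close();
        return names;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(firstNames(GENES1));
    }
}
```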
  17. SAX
      public interface ContentHandler {
          public void startDocument();
          public void endDocument();
          public void startElement(String uri, String localName, String qName, Attributes atts);
          public void endElement(String uri, String localName, String qName);
          public void characters(char[] ch, int start, int length);
          // ...other methods not shown
      }
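With SAX the parser pushes events into a handler instead. A minimal sketch, assuming genes1.xml is inlined as a string and extending `org.xml.sax.helpers.DefaultHandler` (which implements ContentHandler with no-op methods); it returns `id:sequence` pairs. `SaxGenes` is a hypothetical name. Because `characters()` may deliver one text node in several chunks, the text is accumulated in a buffer.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class SaxGenes {
    // genes1.xml inlined as a string (assumption)
    static final String GENES1 = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<genes>\n"
        + "  <gene id=\"1\"><name>Gene1</name><name>gene-1</name>"
        + "<sequence>ATAATGCTAGCTAGCTATCGAATG</sequence></gene>\n"
        + "  <gene id=\"2\"><name>Gene2</name><name>gene-2</name>"
        + "<sequence>AATTGCGATTCATCGATGCTATA</sequence></gene>\n"
        + "</genes>\n";

    static List<String> idAndSequence(String xml) throws Exception {
        List<String> out = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            private String currentId;
            private StringBuilder text; // non-null only while inside <sequence>

            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                // the factory is not namespace-aware, so qName holds the element name
                if (qName.equals("gene")) currentId = atts.getValue("id");
                else if (qName.equals("sequence")) text = new StringBuilder();
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                if (text != null) text.append(ch, start, length);
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if (qName.equals("sequence")) {
                    out.add(currentId + ":" + text);
                    text = null;
                }
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
            .parse(new InputSource(new StringReader(xml)), handler);
        return out;
    }

    public static void main(String[] args) throws Exception {
        idAndSequence(GENES1).forEach(System.out::println);
    }
}
```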
  18. XPath
      genes1.xml:
      <?xml version="1.0" encoding="UTF-8"?>
      <genes>
        <gene id="1">
          <name>Gene1</name>
          <name>gene-1</name>
          <sequence>ATAATGCTAGCTAGCTATCGAATG</sequence>
        </gene>
        <gene id="2">
          <name>Gene2</name>
          <name>gene-2</name>
          <sequence>AATTGCGATTCATCGATGCTATA</sequence>
        </gene>
      </genes>

      $ xmllint -xpath "/genes/gene[1]/name[2]/text()" genes1.xml
      gene-1
      $ xmllint -xpath "/genes/gene[1]/name[2]" genes1.xml
      <name>gene-1</name>
      $ xmllint -xpath "count(/genes/gene)" genes1.xml
      2
      $ xmllint -xpath "/genes/gene[@id=2]/name[1]/text()" genes1.xml
      Gene2
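The same queries can be run from Java with the JDK's javax.xml.xpath API (XPath 1.0). A sketch assuming genes1.xml is inlined as a string; `XPathGenes`, `string` and `number` are hypothetical helper names.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

class XPathGenes {
    // genes1.xml inlined as a string (assumption)
    static final String GENES1 = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<genes>\n"
        + "  <gene id=\"1\"><name>Gene1</name><name>gene-1</name>"
        + "<sequence>ATAATGCTAGCTAGCTATCGAATG</sequence></gene>\n"
        + "  <gene id=\"2\"><name>Gene2</name><name>gene-2</name>"
        + "<sequence>AATTGCGATTCATCGATGCTATA</sequence></gene>\n"
        + "</genes>\n";

    /** Evaluates an XPath expression and returns its string value. */
    static String string(String expr, String xml) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        // an InputSource can only be read once, so a fresh one is built per call
        return xpath.evaluate(expr, new InputSource(new StringReader(xml)));
    }

    /** Evaluates an XPath expression that yields a number, such as count(...). */
    static double number(String expr, String xml) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        return (Double) xpath.evaluate(expr, new InputSource(new StringReader(xml)), XPathConstants.NUMBER);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(string("/genes/gene[1]/name[2]/text()", GENES1));
        System.out.println(number("count(/genes/gene)", GENES1));
        System.out.println(string("/genes/gene[@id=2]/name[1]/text()", GENES1));
    }
}
```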
  19. XInclude
      <?xml version="1.0" encoding="UTF-8"?>
      <genes xmlns:xi="http://www.w3.org/2001/XInclud(...)
        <gene id="1">
          <name>Gene1</name>
          <name>gene-1</name>
          <sequence><xi:include href="sequence.txt" parse="text"/></sequence>
        </gene>
        <xi:include href="gene2.xml" parse="xml"/>
      </genes>
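In Java, XInclude processing is opt-in on the JAXP parser factory. A sketch, assuming the two files named on the slide (sequence.txt and gene2.xml) are written to a temporary directory with the contents of genes1.xml; the main file name genes.xml and the helper methods are hypothetical. The XInclude namespace is http://www.w3.org/2001/XInclude.

```java
import java.io.File;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

class XIncludeGenes {
    static Document parseWithXInclude(File main) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // XInclude relies on the xi: namespace
        factory.setXIncludeAware(true);  // replace xi:include elements while parsing
        return factory.newDocumentBuilder().parse(main); // hrefs resolve relative to this file
    }

    /** Writes hypothetical example files into dir and returns the main document. */
    static File writeExample(Path dir) throws Exception {
        Files.writeString(dir.resolve("sequence.txt"), "ATAATGCTAGCTAGCTATCGAATG");
        Files.writeString(dir.resolve("gene2.xml"),
            "<gene id=\"2\"><name>Gene2</name><name>gene-2</name>"
            + "<sequence>AATTGCGATTCATCGATGCTATA</sequence></gene>");
        Path main = dir.resolve("genes.xml");
        Files.writeString(main,
            "<genes xmlns:xi=\"http://www.w3.org/2001/XInclude\">"
            + "<gene id=\"1\"><name>Gene1</name><name>gene-1</name>"
            + "<sequence><xi:include href=\"sequence.txt\" parse=\"text\"/></sequence></gene>"
            + "<xi:include href=\"gene2.xml\" parse=\"xml\"/></genes>");
        return main.toFile();
    }

    public static void main(String[] args) throws Exception {
        Path dir = Files.createTempDirectory("xinclude-demo");
        Document doc = parseWithXInclude(writeExample(dir));
        System.out.println(doc.getElementsByTagName("gene").getLength() + " genes");
    }
}
```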
  20. XHTML
  21. SVG
      <svg xmlns="http://www.w3.org/2000/svg" width="300px" height="300px">
        <circle cx="120" cy="150" r="60" style="fill: gold;"/>
        <polyline points="120 30, 25 150, 290 150" stroke-width="4" stroke="brown" style="fill: none;"/>
        <polygon points="210 100, 210 200, 270 150" style="fill: lawngreen;"/>
        <text x="60" y="250" fill="blue">Hello, World!</text>
      </svg>
  22. XSL-FO
      <?xml version="1.0" encoding="ISO-8859-1"?>
      <fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format">
        <fo:layout-master-set>
          <fo:simple-page-master master-name="A4">
            <!-- Page template goes here -->
          </fo:simple-page-master>
        </fo:layout-master-set>
        <fo:page-sequence master-reference="A4">
          <!-- Page content goes here -->
        </fo:page-sequence>
      </fo:root>
  23. RDF
      <?xml version="1.0" encoding="UTF-8"?>
      <rdf:RDF (...)>
        <rdf:Description rdf:about="http://…/isbn/20203(...)
          <f:titre xml:lang="fr">Le palais des mirroi(...)
          <f:original rdf:resource="http://…/isbn/000(...)
        </rdf:Description>
      </rdf:RDF>
  24. RDF
  25. SOAP
      <?xml version="1.0" encoding="UTF-8"?>
      <SOAP-ENV:Envelope (...)>
        <SOAP-ENV:Body>
          <r:queryPathwaysForReferenceIdentifiers>
            <r:referenceIdentifiers>
              <soapenc:string>Q9Y266</soapenc:string>
              <soapenc:string>P17480</soapenc:string>
              <soapenc:string>P2048</soapenc:string>
            </r:referenceIdentifiers>
          </r:queryPathwaysForReferenceIdentifiers>
        </SOAP-ENV:Body>
      </SOAP-ENV:Envelope>
  26. WSDL
      (...)
      <wsdl:message name="getEvsData">
        <wsdl:part element="tns:getEvsData" name="p(...)
        </wsdl:part>
      </wsdl:message>
      <wsdl:message name="getEvsDataResponse">
        <wsdl:part element="tns:getEvsDataResponse" (...)
        </wsdl:part>
      </wsdl:message>
      <wsdl:portType name="DataQuery">
        <wsdl:operation name="getEvsData">
          <wsdl:input message="tns:getEvsData" name(...)
          </wsdl:input>
          <wsdl:output message="tns:getEvsDataRespo(...)
  27. WSDL
      $ wsimport "http://evs.gs.washington.edu/wsEVS/EVSDataQue(...)"
      parsing WSDL...
      Generating code...
      Compiling code...
  28. WSDL
      $ more ./edu/washington/gs/evs/webservice/Locus.java
      package edu.washington.gs.evs.webservice;
      (...)
      @XmlAccessorType(XmlAccessType.FIELD)
      @XmlType(name = "locus", propOrder = {
          "geneName", "chromosome", "strand", "mrnaAccession",
          "geneId", "txStart", "txEnd", "keggPathwayIds"
      })
      public class Locus {
          protected String geneName;
          protected String chromosome;
          protected String strand;
          protected String mrnaAccession;
          protected int geneId;
          protected int txStart;
          protected int txEnd;
          @XmlElement(nillable = true)
          (...)
  29. Well formed?
      <a><b>c</a></b> is not well-formed: the elements overlap. Tags must nest properly, as in <a><b>c</b></a>.
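A parser rejects such a document with a fatal error. A minimal Java check, assuming the JDK's built-in DOM parser; `WellFormed` and `isWellFormed` are hypothetical names. Installing a `DefaultHandler` as error handler makes fatal errors propagate as exceptions without the default handler's console messages.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class WellFormed {
    static boolean isWellFormed(String xml) throws ParserConfigurationException, IOException {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        // DefaultHandler.fatalError rethrows the SAXParseException
        builder.setErrorHandler(new DefaultHandler());
        try {
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // e.g. an end tag that does not match the open element
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(isWellFormed("<a><b>c</b></a>"));
        System.out.println(isWellFormed("<a><b>c</a></b>"));
    }
}
```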
  30. Validated (DTD)
      $ cat genes1.dtd
      <!ELEMENT genes (gene+)>
      <!ELEMENT gene ((name+),sequence)>
      <!ELEMENT name (#PCDATA)>
      <!ELEMENT sequence (#PCDATA)>
      <!ATTLIST gene id CDATA #REQUIRED>
      $ xmllint --dtdvalid genes1.dtd genes1.xml
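`xmllint --dtdvalid` validates against an external DTD given on the command line. A JAXP validating DOM parser instead validates against the document's own DOCTYPE, so this sketch (an assumption, not the course's setup) inlines the declarations of genes1.dtd as an internal subset and collects the validity errors. `DtdValidate` and `validate` are hypothetical names.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

class DtdValidate {
    // Declarations of genes1.dtd, as on the slide
    static final String DTD =
        "<!ELEMENT genes (gene+)>"
        + "<!ELEMENT gene ((name+),sequence)>"
        + "<!ELEMENT name (#PCDATA)>"
        + "<!ELEMENT sequence (#PCDATA)>"
        + "<!ATTLIST gene id CDATA #REQUIRED>";

    /** Returns the validity errors for body; an empty list means the document is valid. */
    static List<String> validate(String body) throws Exception {
        String xml = "<!DOCTYPE genes [" + DTD + "]>" + body; // internal DTD subset
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(true); // validate against the DOCTYPE
        DocumentBuilder builder = factory.newDocumentBuilder();
        List<String> errors = new ArrayList<>();
        builder.setErrorHandler(new DefaultHandler() {
            @Override
            public void error(SAXParseException e) {
                errors.add(e.getMessage()); // validity errors are reported here, not thrown
            }
        });
        builder.parse(new InputSource(new StringReader(xml)));
        return errors;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(validate("<genes><gene id=\"1\"><name>Gene1</name><sequence>ACGT</sequence></gene></genes>"));
        System.out.println(validate("<genes><gene><sequence>ACGT</sequence></gene></genes>"));
    }
}
```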
  31. DTD/JAXB: no need to create a parser
      $ xjc -dtd genes1.dtd
      parsing a schema...
      compiling a schema...
      generated/Gene.java
      generated/Genes.java
      generated/Name.java
      generated/ObjectFactory.java
  32. Validated (XSD)
      <?xml version="1.0" encoding="UTF-8"?>
      <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
        <xsd:complexType name="Genes">
          <xsd:sequence>
            <xsd:element name="gene" type="Gene" maxOccurs="unbounded"/>
          </xsd:sequence>
        </xsd:complexType>
        <xsd:complexType name="Gene">
          <xsd:sequence>
            <xsd:element name="name" maxOccurs="unbounded" type="xsd:string"/>
            <xsd:element name="sequence" type="xsd:string"/>
          </xsd:sequence>
          <xsd:attribute name="id" use="required" type="xsd:int"/>
        </xsd:complexType>
        <xsd:element type="Genes" name="genes"/>
      </xsd:schema>
  33. Validated (XSD)
      $ xmllint --noout --schema genes1.xsd genes1.xml
      genes1.xml validates
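The same check with the JDK's javax.xml.validation API, assuming the schema above is inlined as a string; `XsdValidate` and `isValid` are hypothetical names. Because `id` is typed xsd:int in this schema, a non-numeric id is rejected, whereas the DTD's CDATA would accept it.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

class XsdValidate {
    // genes1.xsd inlined as a string (assumption)
    static final String XSD = "<xsd:schema xmlns:xsd=\"http://www.w3.org/2001/XMLSchema\">"
        + "<xsd:complexType name=\"Genes\"><xsd:sequence>"
        + "<xsd:element name=\"gene\" type=\"Gene\" maxOccurs=\"unbounded\"/>"
        + "</xsd:sequence></xsd:complexType>"
        + "<xsd:complexType name=\"Gene\"><xsd:sequence>"
        + "<xsd:element name=\"name\" maxOccurs=\"unbounded\" type=\"xsd:string\"/>"
        + "<xsd:element name=\"sequence\" type=\"xsd:string\"/>"
        + "</xsd:sequence>"
        + "<xsd:attribute name=\"id\" use=\"required\" type=\"xsd:int\"/>"
        + "</xsd:complexType>"
        + "<xsd:element type=\"Genes\" name=\"genes\"/>"
        + "</xsd:schema>";

    static boolean isValid(String xml) throws Exception {
        SchemaFactory sf = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        Schema schema = sf.newSchema(new StreamSource(new StringReader(XSD)));
        Validator validator = schema.newValidator();
        try {
            // with no ErrorHandler set, the first validation error is thrown
            validator.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(isValid("<genes><gene id=\"1\"><name>Gene1</name><sequence>ACGT</sequence></gene></genes>"));
        System.out.println(isValid("<genes><gene id=\"one\"><name>Gene1</name><sequence>ACGT</sequence></gene></genes>"));
    }
}
```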
  34. XSD/JAXB: no need to create a parser
      $ xjc genes1.xsd
      parsing a schema...
      compiling a schema...
      generated/Gene.java
      generated/Genes.java
      generated/ObjectFactory.java
  35. XSLT
  36. XSLT (text)
      <?xml version="1.0" encoding="ISO-8859-1"?>
      <xsl:stylesheet
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
        version="1.0">
      <xsl:output method="text"/>
      <xsl:template match="/">
        <xsl:apply-templates select="genes"/>
      </xsl:template>
      <xsl:template match="genes">
        <xsl:apply-templates select="gene"/>
      </xsl:template>
      <xsl:template match="gene">
        <xsl:text>&gt;id:</xsl:text>
        <xsl:value-of select="@id"/>
        <xsl:text>|</xsl:text>
        <xsl:value-of select="name[1]"/>
        <xsl:text>&#10;</xsl:text>
        <xsl:value-of select="sequence"/>
        <xsl:text>&#10;</xsl:text>
      </xsl:template>
      </xsl:stylesheet>

      $ xsltproc genes2txt.xsl genes1.xml
      >id:1|Gene1
      ATAATGCTAGCTAGCTATCGAATG
      >id:2|Gene2
      AATTGCGATTCATCGATGCTATA
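The same transformation can be run from Java through TrAX (javax.xml.transform), which ships with the JDK. A sketch assuming both the stylesheet and genes1.xml are inlined as strings; `XsltText` and `toFasta` are hypothetical names. It should produce the same FASTA-like text as the xsltproc run.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class XsltText {
    // genes2txt.xsl inlined as a string (assumption); &#10; is a newline
    static final String XSL = "<xsl:stylesheet xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\" version=\"1.0\">"
        + "<xsl:output method=\"text\"/>"
        + "<xsl:template match=\"/\"><xsl:apply-templates select=\"genes\"/></xsl:template>"
        + "<xsl:template match=\"genes\"><xsl:apply-templates select=\"gene\"/></xsl:template>"
        + "<xsl:template match=\"gene\">"
        + "<xsl:text>&gt;id:</xsl:text><xsl:value-of select=\"@id\"/>"
        + "<xsl:text>|</xsl:text><xsl:value-of select=\"name[1]\"/>"
        + "<xsl:text>&#10;</xsl:text><xsl:value-of select=\"sequence\"/>"
        + "<xsl:text>&#10;</xsl:text>"
        + "</xsl:template></xsl:stylesheet>";

    // genes1.xml inlined as a string (assumption)
    static final String GENES1 = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<genes>\n"
        + "  <gene id=\"1\"><name>Gene1</name><name>gene-1</name>"
        + "<sequence>ATAATGCTAGCTAGCTATCGAATG</sequence></gene>\n"
        + "  <gene id=\"2\"><name>Gene2</name><name>gene-2</name>"
        + "<sequence>AATTGCGATTCATCGATGCTATA</sequence></gene>\n"
        + "</genes>\n";

    static String toFasta(String xml) throws Exception {
        Transformer t = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(XSL)));
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.print(toFasta(GENES1));
    }
}
```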
  37. XSLT (html)
      <?xml version="1.0" encoding="ISO-8859-1"?>
      <xsl:stylesheet
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
        version="1.0">
      <xsl:output method="html"/>
      <xsl:template match="/">
        <html><body>
        <xsl:apply-templates select="genes"/>
        </body></html>
      </xsl:template>
      <xsl:template match="genes">
        <h1><xsl:value-of select="count(gene)"/> genes</h1>
        <xsl:apply-templates select="gene"/>
      </xsl:template>
      <xsl:template match="gene">
        <h2>
        <xsl:text>&gt;id:</xsl:text>
        <xsl:value-of select="@id"/>
        <xsl:text>|</xsl:text>
        <xsl:value-of select="name[1]"/>
        </h2>
        <pre><xsl:value-of select="sequence"/></pre>
      </xsl:template>
      </xsl:stylesheet>

      $ xsltproc genes2html.xsl genes1.xml
      <html><body>
      <h1>2 genes</h1>
      <h2>&gt;id:1|Gene1</h2>
      <pre>ATAATGCTAGCTAGCTATCGAATG</pre>
      <h2>&gt;id:2|Gene2</h2>
      <pre>AATTGCGATTCATCGATGCTATA</pre>
      </body></html>
  38. XSLT Embedded
      <?xml-stylesheet type="text/xsl" href="genes2html.xsl"?>
  39. XSLT (xml)
      <?xml version="1.0" encoding="ISO-8859-1"?>
      <xsl:stylesheet
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
        xmlns="http://www.w3.org/2000/svg"
        xmlns:math="http://exslt.org/math"
        version="1.0">
      <xsl:output method="xml"/>
      <xsl:template match="/">
        <svg width="500" height="500" version="1.0">
          <xsl:apply-templates select="genes"/>
        </svg>
      </xsl:template>
      <xsl:template match="genes">
        <xsl:apply-templates select="gene[1]"/>
      </xsl:template>
      <xsl:template match="gene">
        <text x="250" y="250"><xsl:value-of select="name[1]"/></text>
        <xsl:call-template name="drawseq">
          <xsl:with-param name="i" select="number(1.0)"/>
          <xsl:with-param name="s" select="sequence"/>
        </xsl:call-template>
      </xsl:template>
      <xsl:template name="drawseq">
        <xsl:param name="i"/>
        <xsl:param name="s"/>
        <xsl:variable name="L" select="string-length($s)"/>
        <text>
          <xsl:variable name="angle" select="$i * ((2.0 * 3.14159) div $L)"/>
          <xsl:attribute name="x"><xsl:value-of select="250 + 200 * math:cos($angle)"/></xsl:attribute>
          <xsl:attribute name="y"><xsl:value-of select="250 + 200 * math:sin($angle)"/></xsl:attribute>
          <xsl:value-of select="substring($s, $i, 1)"/>
        </text>
        <xsl:if test="$i + 1 &lt;= $L">
          <xsl:call-template name="drawseq">
            <xsl:with-param name="i" select="1 + $i"/>
            <xsl:with-param name="s" select="$s"/>
          </xsl:call-template>
        </xsl:if>
      </xsl:template>
      </xsl:stylesheet>
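The drawseq template is a loop written as recursion: for base i (from 1 to L) it places the letter at angle i * 2 * 3.14159 / L on a circle of radius 200 centred at (250, 250). The same arithmetic as a plain Java loop, a sketch for checking the coordinates; `CircularLayout` and `positions` are hypothetical names.

```java
class CircularLayout {
    /** Returns {x, y} for each base, mirroring the drawseq template (1-based i). */
    static double[][] positions(String seq) {
        int len = seq.length();
        double[][] xy = new double[len][2];
        for (int i = 1; i <= len; i++) {
            // same approximation of pi as the stylesheet
            double angle = i * ((2.0 * 3.14159) / len);
            xy[i - 1][0] = 250 + 200 * Math.cos(angle);
            xy[i - 1][1] = 250 + 200 * Math.sin(angle);
        }
        return xy;
    }

    public static void main(String[] args) {
        String seq = "ATAATGCTAGCTAGCTATCGAATG";
        double[][] xy = positions(seq);
        for (int i = 0; i < seq.length(); i++) {
            System.out.printf("%c %.1f %.1f%n", seq.charAt(i), xy[i][0], xy[i][1]);
        }
    }
}
```

With four bases, the first letter lands near (250, 450) and the last near (450, 250), i.e. a full turn.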
  40. END
  41. Photos from Wikipedia and W3C.