Data Interchange-Integration
Biological XML DTD
HTML-XML
25102023
Data Integration in the Life Sciences
Much unintegrated data:
• from a variety of incompatible sources
• no standard naming convention
• each with a custom browsing and querying mechanism (no common interface)
• and poor interaction with other data sources
Approaches to Integration
• Accessing the original data sources
• Handling redundant as well as missing data
• Normalizing analytical data from different data sources
• Conforming terminology to industry standards
• Accessing the integrated data as a single logical repository
• Metadata (used to traverse domains)
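Several of the approaches above (handling redundant and missing data, conforming terminology, and exposing a single logical repository) can be sketched in a few lines of Python. This is only an illustration: the two sources, their field names, the second accession, and the helper functions are all hypothetical and stand in for real data sources.

```python
# Hypothetical records from two sources that name the same fields differently.
SOURCE_A = [
    {"acc": "P09651", "protein_name": "Nuclear ribonucleoprotein"},  # organism missing
]
SOURCE_B = [
    # Redundant with the record in SOURCE_A, but it carries the organism.
    {"accession": "P09651", "desc": "NUCLEAR RIBONUCLEOPROTEIN", "species": "Homo sapiens"},
    # Made-up accession, present in this source only.
    {"accession": "X00001", "desc": "Example protein", "species": "Mus musculus"},
]

# Conform terminology: map each source's field names onto one shared vocabulary.
FIELD_MAPS = {
    "A": {"acc": "accession", "protein_name": "name", "organism": "organism"},
    "B": {"accession": "accession", "desc": "name", "species": "organism"},
}
COMMON_FIELDS = ("accession", "name", "organism")


def normalize(record, field_map):
    """Rename a record's fields to the shared vocabulary; missing fields become None."""
    renamed = {field_map[key]: value for key, value in record.items() if key in field_map}
    return {field: renamed.get(field) for field in COMMON_FIELDS}


def integrate(sources):
    """Merge all sources into one repository keyed by accession."""
    merged = {}
    for source_id, records in sources:
        for record in records:
            row = normalize(record, FIELD_MAPS[source_id])
            existing = merged.setdefault(row["accession"], row)
            # Redundant record: keep the first value seen, fill only the gaps.
            for field in COMMON_FIELDS:
                if existing[field] is None:
                    existing[field] = row[field]
    return merged


repository = integrate([("A", SOURCE_A), ("B", SOURCE_B)])
for accession, row in repository.items():
    print(accession, row["name"], row["organism"])
```

The first source seen wins when two sources disagree; a real integration layer would need an explicit policy for such conflicts.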
XML For Bioinformatics
• Biology is a complex discipline
• Wide variety of data resources and repositories
• Biological data is represented in multiple formats, e.g. FASTA, GFF, etc.
• No standard protocol exists to interrogate biological data stores
• Data Interchange
• EMBL format
• ASN.1
• XML
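As a rough sketch of what data interchange through XML can look like, the Python below converts one FASTA record into an XML element. The <seq> and <residues> names follow the sequence sample that appears later in this deck; the FASTA text is a shortened example and the function name is an assumption made for illustration.

```python
import xml.etree.ElementTree as ET

# A shortened, illustrative FASTA record (header line, then sequence lines).
FASTA = """>my_seq NUCLEAR RIBONUCLEOPROTEIN
SKSESPKEPE
QLRKLFIGGL
"""


def fasta_to_xml(text):
    """Convert a single FASTA record into a <seq> element."""
    lines = [line.strip() for line in text.strip().splitlines()]
    header, sequence_lines = lines[0], lines[1:]
    if not header.startswith(">"):
        raise ValueError("a FASTA record must start with '>'")
    # The first word of the header is the identifier, the rest is the description.
    seq_id, _, description = header[1:].partition(" ")
    seq = ET.Element("seq", {"id": seq_id, "name": description})
    residues = ET.SubElement(seq, "residues", {"type": "aa"})
    residues.text = "".join(sequence_lines)
    return seq


record = fasta_to_xml(FASTA)
xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)
```

Once the record is XML, any XML-aware application can read it without a FASTA-specific parser.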
Why XML
• Data in incompatible formats
• Difficulties in Exchanging data
• Software and hardware independent way of sharing data
• XML is used to store and transport data
• With XML, data becomes available to more users
XML
• Allows uniform description of data and metadata
• Metadata described through DTDs (Document Type Definition)
• Data conforms to metadata description
• Provides an open-standard solution for data integration between components
• Lots of support in Computer Science community (modules developed)
• XML::CGI - a module to convert CGI parameters to and from XML
• XML::DOM - a Perl extension to XML::Parser. It adds a new 'Style' to XML::Parser, called 'Dom', that allows XML::Parser to build an Object Oriented data structure with a DOM Level 1 compliant interface.
• XML::Dumper - a simple package to experiment with converting Perl data structures to XML and converting XML to Perl data structures.
• XML::Encoding - a subclass of XML::Parser, parses encoding map XML files.
• XML::Generator is an extremely simple module to help in the generation of XML.
• XML::Grove - provides simple objects for parsed XML documents. The objects may be modified but no checking is performed.
• XML::Parser - a Perl extension interface to James Clark's XML parser, expat
• XML::QL - an early implementation of a note published by the W3C called "XML-QL: A Query Language for XML".
• XML::XQL - a Perl extension that allows you to perform XQL queries on XML object trees.
How the Web is
• HTML documents
• all intended for human consumption
• many generated automatically by applications
Easy to fetch any Web page, from any server, any platform
Limits of the Web
• applications cannot easily consume HTML
• HTML wrapper technology is brittle
• need interoperability fast
Paradigm Shift on the Web
• new Web standard XML:
• XML generated by applications
• XML consumed by applications
• data exchange
• across platforms: enterprise interoperability
• across enterprises
Web: from collection of documents to data and documents
What is XML
• XML stands for eXtensible Markup Language
• XML is a markup language much like HTML
• XML was designed to store and transport data
• XML was designed to be self-descriptive
• XML is a W3C Recommendation
• It is a hierarchical data description language
• XML was designed to describe data and focus on what data is.
• Derived from SGML (Standard Generalized Markup Language), but simpler to use than SGML
• Documents have tags giving extra information about sections of the document
• E.g. <title> XML </title> <slide> Introduction …</slide>
• Extensible, unlike HTML
• Users can add new tags, and separately specify how the tag should be handled for display
What is a DTD
• DTD stands for Document Type Definition.
• A DTD defines the structure and the legal elements and attributes of an XML document.
• Valid XML Documents
• A "valid" XML document is "well formed" and also conforms to the rules of a DTD.
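To make the idea of a DTD concrete, here is a small sketch in Python. The DTD for the bibliography example is an assumption written for this illustration. Python's standard library parser reads the DTD but does not validate against it, so the check below is a hand-rolled stand-in that tests only one rule, the content model of <book>, by rewriting it as a regular expression. A validating parser would enforce the whole DTD.

```python
import re
import xml.etree.ElementTree as ET

# An XML document with an internal DTD (the DTD itself is illustrative).
DOCUMENT = """<?xml version="1.0"?>
<!DOCTYPE bibliography [
  <!ELEMENT bibliography (book*)>
  <!ELEMENT book (title, author+, publisher, year)>
  <!ELEMENT title (#PCDATA)>
  <!ELEMENT author (#PCDATA)>
  <!ELEMENT publisher (#PCDATA)>
  <!ELEMENT year (#PCDATA)>
]>
<bibliography>
  <book>
    <title>Foundations of Databases</title>
    <author>Abiteboul</author>
    <author>Hull</author>
    <author>Vianu</author>
    <publisher>Addison Wesley</publisher>
    <year>1995</year>
  </book>
</bibliography>
"""

# The content model (title, author+, publisher, year) as a regex over
# the comma-joined names of a book's child elements.
BOOK_MODEL = re.compile(r"title,(author,)+publisher,year")


def book_matches_model(book):
    """Return True if the children of <book> follow the declared order."""
    child_tags = ",".join(child.tag for child in book)
    return BOOK_MODEL.fullmatch(child_tags) is not None


root = ET.fromstring(DOCUMENT)  # raises ParseError if not well formed
results = [book_matches_model(book) for book in root.findall("book")]
print(results)
```

A book with no <author>, or with <year> before <publisher>, would still be well formed but would fail this check, which is the difference between "well formed" and "valid".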
Features of XML
• XML is an easy and automatically parseable way to describe data
• More flexible and adaptable information identification.
• XML is extensible
How does XML differ from HTML?
• HTML is a presentation markup language: its tags say how content looks, not what it means.
• There is only one standard definition of all of the tags used in HTML.
• XML tags give information about content; presentation is specified separately (for example with a style sheet).
• XML relies on custom definitions (such as DTDs) for the set of tags a document may use.
HTML
<h1> Bibliography </h1>
<p> <i> Foundations of Databases
</i>
Abiteboul, Hull, Vianu
<br> Addison Wesley, 1995
<p> <i> Data on the Web </i>
Abiteboul, Buneman, Suciu
<br> Morgan Kaufmann, 1999
<!DOCTYPE html>
<html>
<head>
<title>Page Title</title>
</head>
<body>
<h1>This is a Heading</h1>
<p>This is a paragraph.</p>
</body>
</html>
XML
<bibliography>
<book> <title> Foundations… </title>
<author> Abiteboul </author>
<author> Hull </author>
<author> Vianu </author>
<publisher> Addison Wesley </publisher>
<year> 1995 </year>
</book>
…
</bibliography>
XML describes the content
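Because the XML version names each piece of content, an application can query it directly instead of scraping layout tags. The sketch below uses Python's standard library; the two book records are filled in from the HTML slide, and the helper function name is an assumption for illustration.

```python
import xml.etree.ElementTree as ET

BIBLIOGRAPHY = """
<bibliography>
  <book>
    <title>Foundations of Databases</title>
    <author>Abiteboul</author><author>Hull</author><author>Vianu</author>
    <publisher>Addison Wesley</publisher>
    <year>1995</year>
  </book>
  <book>
    <title>Data on the Web</title>
    <author>Abiteboul</author><author>Buneman</author><author>Suciu</author>
    <publisher>Morgan Kaufmann</publisher>
    <year>1999</year>
  </book>
</bibliography>
"""


def books_by_author(root, author):
    """Return the titles of all books that list the given author."""
    return [
        book.findtext("title")
        for book in root.findall("book")
        if author in (a.text for a in book.findall("author"))
    ]


root = ET.fromstring(BIBLIOGRAPHY)
titles = books_by_author(root, "Buneman")
print(titles)
```

The same question asked of the HTML version would depend on guessing that the line after each <i> element holds the authors, which is the brittleness the earlier slides describe.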
XML separates data from HTML
• Updating a website dynamically takes considerable effort when data and markup are mixed together. Because XML separates the data from its presentation, the XML file can be updated dynamically while HTML takes care of how the data looks.
XML Terminology
• tags: book, title, author, …
• start tag: <book>, end tag: </book>
• elements: <book>…</book>, <author>…</author>
• elements are nested
• empty element: <red></red>, abbreviated <red/>
• an XML document has a single root element
• a well-formed XML document has matching, properly nested tags
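The well-formedness rules above can be checked mechanically, since any XML parser rejects a document that breaks them. A minimal sketch using Python's standard library follows; the function name is an assumption for illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if the text parses as a well-formed XML document."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


samples = [
    "<book><author>Hull</author></book>",   # matching, nested tags
    "<red/>",                                # empty element
    "<book><author>Hull</book></author>",   # tags closed in the wrong order
    "<book>Hull<book>",                      # end tag missing
    "<book/><book/>",                        # more than one root element
]
for sample in samples:
    print(sample, is_well_formed(sample))
```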
More XML: Attributes
<book price="500" currency="INR">
<title> Foundations of Databases </title>
<author> Abiteboul </author>
…
<year> 2017 </year>
</book>
attributes are alternative ways to represent data
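To illustrate that attributes and child elements are two ways of carrying the same data, the sketch below reads the attributes of a shortened <book> element and then rewrites them as child elements. The helper function is an assumption for illustration, not a standard API.

```python
import xml.etree.ElementTree as ET

book = ET.fromstring(
    '<book price="500" currency="INR">'
    "<title>Foundations of Databases</title>"
    "<author>Abiteboul</author>"
    "</book>"
)

# Attributes are read with get(); values are always strings.
price = int(book.get("price"))
currency = book.get("currency")


def attributes_as_elements(element):
    """Return a new element in which each attribute becomes a child element."""
    converted = ET.Element(element.tag)
    for name, value in element.attrib.items():
        ET.SubElement(converted, name).text = value
    converted.extend(list(element))  # keep the original children after them
    return converted


alternative = attributes_as_elements(book)
print(ET.tostring(alternative, encoding="unicode"))
```

Which form to use is a design choice: attributes suit short, single-valued properties, while child elements can repeat and can hold nested structure.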
XML namespace
• XML namespace is a collection of XML elements and attributes identified
by an Internationalized Resource Identifier (IRI); this collection is often
referred to as an XML "vocabulary."
• Since XML allows designers to choose their own tag names, it is possible that
two or more designers may choose the same tag names for some or all of
their elements. XML namespace solves this problem. It provides a way to
distinguish between XML elements that have the same local name but are,
in fact, from different vocabularies. This is done by associating an element
with a namespace. A namespace acts as scope for all elements associated
with it.
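A short sketch of the name clash described above: two vocabularies both define a <title> element, and namespaces keep them apart. The namespace IRIs under example.org and the prefixes are placeholders invented for this illustration.

```python
import xml.etree.ElementTree as ET

DOCUMENT = """
<catalog xmlns:bib="http://example.org/bibliography"
         xmlns:seq="http://example.org/sequence">
  <bib:title>Foundations of Databases</bib:title>
  <seq:title>NUCLEAR RIBONUCLEOPROTEIN</seq:title>
</catalog>
"""

# Prefixes used in queries; they only need to map to the same IRIs.
NAMESPACES = {
    "bib": "http://example.org/bibliography",
    "seq": "http://example.org/sequence",
}

root = ET.fromstring(DOCUMENT)
book_title = root.findtext("bib:title", namespaces=NAMESPACES)
sequence_title = root.findtext("seq:title", namespaces=NAMESPACES)

# Internally each tag is expanded to {namespace IRI}local-name.
expanded_tags = [child.tag for child in root]
print(book_title, sequence_title, expanded_tags)
```

Both elements have the local name "title", but their expanded names differ, so a program never confuses a book title with a sequence title.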
A minimal XML document
<?xml version="1.0" ?>
<document name="first">ABC</document>
• <document>: a tag
• name: an attribute
• "first": the attribute value
• </document>: the closing tag
A Piece of XML: a Sequence Record
<seq id="my_seq" name="NUCLEAR RIBONUCLEOPROTEIN">
<dbxref>
<database>SWISS-PROT</database>
<unique_id>P09651</unique_id>
</dbxref>
<residues type="aa">
SKSESPKEPEQLRKLFIGGLSFETTDESLRSHFEQWGTLTDCVVMRDPNTKRSRGFGFVTYATVEEV
DAAMNARPHKVDGRVVEPKRAVSREDSQRPGAHLTVKKIFVGGIKEDTEEHHLRDYFEQYGKIEVIE
IMTDRGSGKKRGFAFVTFDDHDSVDKIVIQKYHTVNGHNCEVRKALSKQEMASASSSQRGRSGSGNF
GGGRGGGFGGNDNFGRGGNFSGRGGFGGSRGGGGYGGSGDGYNGFGNDGGYGGGGPGYSGGSRGYGS
GGQGYGNQGSGYGGSGSYDSYNNGGGRGFGGGSGSNFGGGGSYNDFGNYNNQSSNFGPMKGGNFGGR
SSGPYGGGGQYFAKPRNQGGYGGSSSSSSYGSGRRF
</residues>
</seq>
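A consuming application might read such a record as sketched below. The residues are truncated to the first 20 amino acids of the sample for brevity, and the function name and the keys of the returned dictionary are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# Same layout as the sample above, with the residues truncated for brevity.
SEQ_XML = """
<seq id="my_seq" name="NUCLEAR RIBONUCLEOPROTEIN">
  <dbxref>
    <database>SWISS-PROT</database>
    <unique_id>P09651</unique_id>
  </dbxref>
  <residues type="aa">
    SKSESPKEPE
    QLRKLFIGGL
  </residues>
</seq>
"""


def read_sequence(xml_text):
    """Extract identifiers, cross-reference, and residues from a <seq> record."""
    seq = ET.fromstring(xml_text)
    residues = seq.find("residues")
    return {
        "id": seq.get("id"),
        "name": seq.get("name"),
        "database": seq.findtext("dbxref/database"),
        "accession": seq.findtext("dbxref/unique_id"),
        "type": residues.get("type"),
        # Line breaks and indentation inside <residues> are layout, not data.
        "residues": "".join(residues.text.split()),
    }


record = read_sequence(SEQ_XML)
print(record["accession"], len(record["residues"]))
```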
Biological XML
• Several DTDs have been publicly proposed as XML formats for biological data
• GAME (Genome Annotation Markup Elements)
• BIOML (The Biopolymer Markup Language)
• IOML (Interactive Outline Markup Language)
• BSML (Bioinformatic Sequence Markup Language)
• CML (Chemical Markup Language)
• GEML (Gene Expression Markup Language)
phyloXML: XML for evolutionary biology and comparative genomics
• http://www.phyloxml.org/
• phyloXML is an XML language designed to describe phylogenetic trees (or networks) and
associated data.
• It provides elements for commonly used features, such as taxonomic information, gene names and
identifiers, branch lengths, support values, and gene duplication and speciation events. Using these
standardized elements allows interoperability between various applications and databases.
Furthermore, thanks to both the extensible nature of XML itself and the <property> elements that
phyloXML provides, the format can be extended and adapted to domain-specific applications.
• The structure of phyloXML is described by XML Schema Definition (XSD) language.
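As a hedged sketch of how an application might read phyloXML, the Python below walks a tiny hand-written tree and collects branch lengths for the leaves. The sample tree is invented, and the element names and namespace are written as I understand the phyloXML format; the XSD at phyloxml.org is the authority.

```python
import xml.etree.ElementTree as ET

# A tiny, invented tree in what is assumed to be valid phyloXML layout.
PHYLOXML = """
<phyloxml xmlns="http://www.phyloxml.org">
  <phylogeny rooted="true">
    <name>Example tree</name>
    <clade>
      <clade>
        <name>A</name>
        <branch_length>0.102</branch_length>
      </clade>
      <clade>
        <name>B</name>
        <branch_length>0.23</branch_length>
      </clade>
    </clade>
  </phylogeny>
</phyloxml>
"""

PHYLO_NS = "http://www.phyloxml.org"
NS = {"phy": PHYLO_NS}


def leaf_branch_lengths(xml_text):
    """Map each leaf clade's name to its branch length."""
    root = ET.fromstring(xml_text)
    leaves = {}
    for clade in root.iter("{%s}clade" % PHYLO_NS):
        if clade.find("phy:clade", NS) is None:  # no child clades: a leaf
            name = clade.findtext("phy:name", namespaces=NS)
            length = clade.findtext("phy:branch_length", namespaces=NS)
            leaves[name] = float(length)
    return leaves


leaves = leaf_branch_lengths(PHYLOXML)
print(leaves)
```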
XML at the PDBe
• http://www.ebi.ac.uk/pdbe/docs/documentation/xml.html
• The PDBe is involved in XML at two levels.
• development of standard DTDs/XML schemas for representing macromolecular structure and other biological data.
• For example:
• structural genomics data exchange packets (with eHTPX)
• nuclear magnetic resonance experimental information (with CCPN)
• macromolecular structure data (with RCSB)
Significance of Using XML
1. Open and extensible - XML's open structure allows you to add new elements when needed, so you can adapt your system to industry-specific vocabulary.
2. It is simple to modify a DTD. XML and DTD files are human readable and can be edited by people with only basic computer skills.
3. XML is Internet-oriented and has rich capabilities for linking data.
-This can be used for interconnecting databases
4. XML provides an open framework for defining standard specifications.
-This is an important point because bioinformatics clearly lacks standardization
5. XML data is self-describing: it contains both the data and information about the data. In traditional database systems, you must define relational schemata, file description tables, external data definitions, and so on before storing data. XML does not require these up front, because the document itself carries this information.
6. XML keeps data usable across applications. This is important for seamless integration of data in business applications.
7. XML can be combined with a wide range of data formats, from text and numbers to multimedia such as sound and images to active formats such as Java applets or ActiveX components.
8. No programming required to modify the presentation of data - One can change the look and feel of documents or even entire websites with XSL style sheets without manipulating the data itself.
9. Single source for distributed data - XML documents can consist of data from many different databases distributed over multiple servers. In this sense, XML lets data spread across the World Wide Web be treated as a single logical database.
10. Future-oriented technology - XML is a World Wide Web Consortium (W3C) Recommendation and is widely supported by software providers. XML is also a standard in an increasing number of other industries, for example health care.
