Introduction and Overview of XML
Objectives
• Explain what XML is and the need for XML
• Know other markup languages: SGML, HTML and XHTML
• Understand the differences between SGML, HTML and XML
• Know the various applications of XML
• Know the pros and cons of XML
XML
• XML stands for Extensible Markup Language.
• XML is a tool for storing and transporting data in a platform- and language-neutral way.
• XML plays an important role in the exchange of a wide variety of data on the Web.
• XML defines a set of rules for encoding documents in a format that is both human-readable and machine-readable.
• These rules are defined in the XML 1.0 Specification, an open standard developed by the W3C (World Wide Web Consortium).
• Many parsers and APIs (Application Programming Interfaces) are available to process XML data.
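As a minimal sketch of the last two points, the Python code below uses the standard library's `xml.etree.ElementTree` parser to read a small document and to reject text that breaks XML's well-formedness rules. The element names (`note`, `to`, `from`, `body`), the sample values and the helper names `read_note` and `is_well_formed` are hypothetical, chosen only for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; XML lets the author choose the tag names.
DOCUMENT = """<?xml version="1.0"?>
<note priority="high">
  <to>Asha</to>
  <from>Ravi</from>
  <body>Meeting moved to 3 pm.</body>
</note>"""


def read_note(text):
    """Parse a note document and return its attribute and child fields."""
    root = ET.fromstring(text)
    return {
        "priority": root.get("priority"),  # attribute on the root element
        "to": root.findtext("to"),         # text of the first <to> child
        "from": root.findtext("from"),
        "body": root.findtext("body"),
    }


def is_well_formed(text):
    """Return True if a conforming XML parser accepts the text."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


if __name__ == "__main__":
    print(read_note(DOCUMENT))
    # A closing tag that does not match its opening tag breaks the
    # well-formedness rules, so the parser rejects the document.
    print(is_well_formed("<to>Asha</from>"))  # False
```

The same document is readable by a person as plain text and by a program through the parser, which is what "human-readable and machine-readable" means in practice.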
History of XML
• One of the W3C's primary goals is to make the Web universally accessible, regardless of disability, language, culture, etc.
• The Internet is a collection of interconnected computers.
• ARPANET (Advanced Research Projects Agency Network) was the first network to interconnect academic, government and private research organizations.
• Initially, the Internet was used for sending electronic messages and transferring files.
• FTP (File Transfer Protocol) allows users to request files from a remote system.
• Limitations
– The requester cannot tell in advance what format the requested files will be in
– The requester cannot be sure that the file can be processed on the receiving system
Contd.
• CERN browser
– Used to request files over the Internet and display them in a predefined format
– Uses
• HTTP (HyperText Transfer Protocol) and
• HTML (HyperText Markup Language)
• Presentation details cannot be transferred reliably, because they are coded in a machine-specific manner that may not be understood at the receiving end
Contd.
• Standard Generalized Markup Language (SGML) allows information about a document's structure to be preserved
• DSSSL (Document Style Semantics and Specification Language) is used to specify the presentation style of SGML documents
• SGML is used to specify markup languages.
• The purpose of SGML is to define vocabularies that can be used to mark up documents with structural tags.
• HTML is one of the most popular applications of SGML
• HTML is a markup language used for presentation, i.e. for designing web pages
• In HTML, all tags are predefined
Contd.
• Limitations of HTML
– HTML is not suited to data storage and data interchange, because its tags describe how content looks rather than what it means
– All tags are predefined, so authors cannot create their own
• XML bridges this gap
– It is human-readable, while being flexible enough to support platform- and architecture-independent data interchange
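The data-interchange idea can be sketched as a round trip: one program writes a record as XML text, and any other program that understands the same tags can rebuild it. This is a minimal sketch, assuming a hypothetical `student` record; the field names and the functions `to_xml` and `from_xml` are illustrative, and the code relies only on Python's standard `xml.etree.ElementTree` module.

```python
import xml.etree.ElementTree as ET


def to_xml(record):
    """Serialize a (hypothetical) student record as XML text."""
    root = ET.Element("student", id=record["id"])
    for field in ("name", "course", "marks"):
        # Each field becomes a child element whose tag names its meaning.
        ET.SubElement(root, field).text = str(record[field])
    return ET.tostring(root, encoding="unicode")


def from_xml(text):
    """Rebuild the record from XML text that uses the same tags."""
    root = ET.fromstring(text)
    return {
        "id": root.get("id"),
        "name": root.findtext("name"),
        "course": root.findtext("course"),
        "marks": int(root.findtext("marks")),
    }


if __name__ == "__main__":
    record = {"id": "S101", "name": "Meera", "course": "Data & Markup", "marks": 87}
    text = to_xml(record)
    print(text)
    assert from_xml(text) == record  # the round trip loses nothing
```

The `&` in the course name is written as `&amp;` and restored on reading; the XML library, not the application, applies the escaping rules, which is part of what makes the text safe to pass between platforms.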
SGML vs HTML vs XML
• HTML allows hypertext links to be specified; SGML itself does not define hypertext links
• HTML describes the presentation of data, not its meaning; XML describes the meaning of the document's content
• HTML is not extensible; XML is highly extensible, since authors can define their own tags
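To make the presentation-versus-meaning contrast concrete, here is a small sketch with hypothetical sample data: the same product encoded as an HTML table row and as XML with author-defined tags. The function names are illustrative. The HTML fragment is deliberately well-formed so that the same XML parser can read it; real-world HTML is often not well-formed XML.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data: one product, encoded two ways.
# HTML tags describe layout: a table row containing two cells.
HTML_ROW = "<tr><td>Pen</td><td>25.00</td></tr>"
# XML tags describe meaning: a product that has a name and a price.
XML_PRODUCT = "<product><name>Pen</name><price>25.00</price></product>"


def price_from_html(row):
    # The markup does not say which cell holds the price, so the program
    # has to assume it is the second <td>.
    cells = ET.fromstring(row).findall("td")
    return float(cells[1].text)


def price_from_xml(doc):
    # The tag name itself identifies the price, wherever it appears.
    return float(ET.fromstring(doc).findtext("price"))


if __name__ == "__main__":
    print(price_from_html(HTML_ROW))    # 25.0
    print(price_from_xml(XML_PRODUCT))  # 25.0
```

If the producer reorders the child elements, `price_from_xml` still finds the price, whereas `price_from_html` depends entirely on the cell order.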
Semi-Structured Data
• Data may be
– Structured
– Unstructured
– Raw data
• Text databases – unstructured
• Text markup – markup languages
• SGML – a metalanguage
• HTML – a markup language with predefined tags
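XML is commonly described as semi-structured: each element labels its own content, but records need not follow one fixed layout. A minimal sketch, assuming a hypothetical contact list in which the two records carry different fields, shows a program reading each record without knowing its fields in advance. The names `read_records` and `field_names` are illustrative.

```python
import xml.etree.ElementTree as ET

# Hypothetical contact list: every value is labelled by its tag,
# but the two records do not share the same set of fields.
CONTACTS = """<contacts>
  <contact><name>Anil</name><phone>555-0101</phone></contact>
  <contact><name>Bina</name><email>bina@example.com</email><city>Pune</city></contact>
</contacts>"""


def read_records(text):
    """Return one dict per <contact>, holding whichever fields it has."""
    root = ET.fromstring(text)
    return [
        {child.tag: child.text for child in contact}
        for contact in root.findall("contact")
    ]


def field_names(records):
    """Collect every field name that appears in any record."""
    return sorted({name for record in records for name in record})


if __name__ == "__main__":
    records = read_records(CONTACTS)
    for record in records:
        print(record)
    print(field_names(records))  # ['city', 'email', 'name', 'phone']
```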
