Intro to XML

<title>XML Overview</title>
<subtitle>eXtensible Markup Language</subtitle>
<presenter>Dan Hebert</presenter>
<email>dhebert@mitre.org</email>
<affiliation>The MITRE Corporation</affiliation>
<date>26 Feb 2004</date>

Origins of the World Wide Web
 1989 - Information
Management: A Proposal
– Tim Berners-Lee, CERN
 “We should work toward a
universal linked information
system, in which generality and
portability are more important
than fancy graphics techniques
and complex extra facilities”
 Comments from
Management:
– ‘Exciting, but a little
vague’
A “mesh” of computers
REJECTED

World Wide Web Consortium (W3C)
 Founded in 1994 to “lead the Web to its full potential”
– Develop common protocols
– Ensure interoperability
– Promote WWW evolution
 Co-hosted by MIT (U.S.), INRIA (France) and Keio (Japan)
Universities
 Supported by DARPA and the European Commission
 Over 300 members including DISA, National Labs and
MITRE

XML is all about data!
 Every organization uses data! So, XML is a very
foundational technology.
DATA
Microsoft
Sun
IBM
DoD
KMart Dell . . .
… data makes the world go around ...

Family of XML Technologies
XML
Namespaces
XSLT/XPath
XML Schemas
RDF
XQuery
SVG
SAX/DOM
SOAP,
WSDL,
UDDI
Xlink/
XPointer
RDDL
MathML
RSS

XML
(and its Associated Technologies)
 All about data:
– structuring the data
– accessing and manipulating the data
Computer 1 Computer 2
data

Passing Data between Systems
 Suppose that you’ve got book data that you want
to pass between some systems
“My Life and Times” Paul McCartney July 1998
94303-12021-43892 McMillin Publishing.
“Illusions The Adventures of a Reluctant Messiah”
Richard Bach 1977 0-440-34319-4 Dell Publishing Co..
“The First and Last Freedom” J. Krishnamurti 1954
0-06-064831-7 Harper & Row.

Passing Data between Systems
 First thing you might do is agree on how you will
structure your data:
“My Life and Times”/Paul McCartney/July 1998/94303-12021-43892/McMillin Publishing.
“Illusions The Adventures of a Reluctant Messiah”/Richard Bach/1977/0-440-34319-4/Dell Publishing Co..
“The First and Last Freedom”/J. Krishnamurti/1954/0-06-064831-7/Harper & Row.
Title / Author / Date / ISBN / Publisher
Here we are using a slash to delimit (separate) each field and a
carriage return to delimit each record.

Alternatively
<Book>
<Title>My Life and Times</Title>
<Author>Paul McCartney</Author>
<Date>July, 1998</Date>
<ISBN>94303-12021-43892</ISBN>
<Publisher>McMillin Publishing</Publisher>
</Book>
<Book>
<Title>Illusions The Adventures of a Reluctant Messiah</Title>
<Author>Richard Bach</Author>
<Date>1977</Date>
<ISBN>0-440-34319-4</ISBN>
<Publisher>Dell Publishing Co.</Publisher>
</Book>
<Book>
<Title>The First and Last Freedom</Title>
<Author>J. Krishnamurti</Author>
<Date>1954</Date>
<ISBN>0-06-064831-7</ISBN>
<Publisher>Harper &amp; Row</Publisher>
</Book>
Here we are delimiting each data item with a start and end tag.
We are enclosing each record also within a start-end tag.

Comparison
 Slash-delimited
– Advs:
– Little space overhead
– Disadv:
– Rigid format (cannot shuffle the data around)
 Tag-delimited (XML)
– Advs:
– Flexible format (can shuffle the data around)
– Tags enhance ability to search for data
– Tags enhance ability to extract subsets of data
– Disadvs:
– Verbose (XML Bloat)
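A minimal Python sketch of this trade-off, using the standard library's xml.etree.ElementTree. The variable names and the reordered record are illustrative assumptions, not part of the original slides.

```python
import xml.etree.ElementTree as ET

# Slash-delimited: the meaning of each field depends entirely on its position.
record = "The First and Last Freedom/J. Krishnamurti/1954/0-06-064831-7/Harper & Row"
title, author, date, isbn, publisher = record.split("/")

# Tag-delimited: fields are found by name, so their order can be shuffled.
book_xml = """
<Book>
  <Publisher>Harper &amp; Row</Publisher>
  <ISBN>0-06-064831-7</ISBN>
  <Title>The First and Last Freedom</Title>
  <Author>J. Krishnamurti</Author>
  <Date>1954</Date>
</Book>
"""
book = ET.fromstring(book_xml)
xml_title = book.findtext("Title")
xml_publisher = book.findtext("Publisher")

print(title == xml_title)            # same data, despite the different field order
print(len(record), len(book_xml))    # the tagged form is longer (the "bloat")
```

Note that a literal "&" must be written as "&amp;" inside XML, which is one more cost of the tagged form.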

Compressibility of XML
 XML is very compressible!
[Diagram: a text (txt) file is translated to XML. The two files are compressed with WinZip and with XMill (an XML compression tool from AT&T; alternative tool: B-Zip). Sizes shown on the slide: 674,062 bytes; 11,421,822 bytes; 148,294 bytes; 94,369 bytes.]
The compressed version of the XML document is smaller
than the compressed version of the original document!
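A rough way to see this compressibility for yourself, sketched in Python. This uses zlib, a general-purpose compressor, not XMill, and the sample data is a single hypothetical record repeated many times, so the ratios are far better than real data would give and do not reproduce the figures on the slide.

```python
import zlib

# Hypothetical sample: the same book record repeated 1000 times.
fields = ("Illusions", "Richard Bach", "1977", "0-440-34319-4", "Dell Publishing Co.")
tags = ("Title", "Author", "Date", "ISBN", "Publisher")

slash_text = "\n".join("/".join(fields) for _ in range(1000))
xml_text = "<Books>" + "".join(
    "<Book>" + "".join(f"<{t}>{v}</{t}>" for t, v in zip(tags, fields)) + "</Book>"
    for _ in range(1000)
) + "</Books>"

for label, text in (("slash", slash_text), ("xml", xml_text)):
    raw = text.encode("utf-8")
    packed = zlib.compress(raw, 9)
    print(label, len(raw), "->", len(packed))
```

The repeated tag names are exactly the kind of redundancy a compressor removes, which is why the tagged form shrinks so well.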

Transferring Large XML Documents
 Example: suppose that you have a 30MB XML file that you need
transferred
 Typical transfer rate: 1MB/sec
 Total time = 30MB * sec/1MB = 30sec
 Too long! What do we do?
 We could compress it, then it would be a smaller file and thus
would take less time
 However, there are issues with compressing - time to
compress/decompress, ensuring both sender and receiver have
the tools, etc
 There is an alternative ...

Transferring Large XML Documents
(cont.)
 The alternative is to do XML Streaming
– send the XML declaration and root element to
make the initial contact.
– Then send the first chunk of XML. While the
receiver is processing the first chunk the
succeeding chunks can be sent in the
background.
 HTML, Jabber (XML-based Instant Messaging)
do streaming
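The receiving side of such a stream can be sketched with Python's xml.etree.ElementTree.XMLPullParser, which accepts a document piece by piece. The chunk boundaries below are hypothetical; in practice they would be whatever the network delivers.

```python
import xml.etree.ElementTree as ET

# Chunks as they might arrive over a connection (hypothetical split points).
chunks = [
    '<?xml version="1.0"?><BookCatalogue>',
    "<Book><Title>My Life and Times</Title></Book><Book><Ti",
    "tle>The First and Last Freedom</Title></Book>",
    "</BookCatalogue>",
]

parser = ET.XMLPullParser(events=("end",))
titles = []
for chunk in chunks:
    parser.feed(chunk)                        # hand over whatever has arrived
    for _event, elem in parser.read_events():
        if elem.tag == "Book":                # a complete record is now available
            titles.append(elem.findtext("Title"))
            elem.clear()                      # free memory for processed records
parser.close()
print(titles)
```

Each Book is processed as soon as its end tag arrives, even though the second record is split across two chunks.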

Summary of First Step
 Thus, the first step in passing data between systems is to
agree to how the data is going to be structured
– Use slash-delimiters, or
– Use tags, or
– Use some other delimiter
 Now each system can be written to expect the data that it
receives to be in that structure. Likewise, when it sends
out data it will send it out in that format.

What Next? Express Data Business
Rules
 We will need a syntax to express our data's
business rules.
Each Book must contain data for the Title, Author, Date, ISBN, and Publisher.
The Date must have the format: year, or month-comma-year.
The ISBN must have 10 digits, 3 dashes, and must end with 0-9 or x.
etc.
Data Business Rules
Note: of course, we will want to express these constraints using a more formal
syntax than English.

What next?
 Now that the data is structured in an agreed upon
fashion, what else do we typically want to do with the
data?
– We might want to have a tool which validates that
the data is in the agreed upon format
– Such a tool would help reduce system crashes
by ensuring that the data is valid

Validation
<Book>
<Title>Illusions The Adventures of a Reluctant Messiah</Title>
<Author>Richard Bach</Author>
<Date>1977</Date>
<ISBN>0-440-34319-4-ppp</ISBN>
<Publisher>Dell Publishing Co.</Publisher>
</Book>
Validator
Error!!! Invalid ISBN!
Rules that indicate
the valid structure
of book data
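A hand-written validator for the three English rules stated earlier might look like the Python sketch below. It is only an illustration: the function names are invented here, and "10 digits ... must end with 0-9 or x" is read as nine digits plus a final digit or x, which is an assumption about what the rule intends.

```python
import re
import xml.etree.ElementTree as ET

REQUIRED = ("Title", "Author", "Date", "ISBN", "Publisher")
DATE_RE = re.compile(r"(?:[A-Z][a-z]+, )?\d{4}")   # "1977" or "July, 1998"

def isbn_ok(isbn):
    """Exactly 3 dashes; 10 other characters; the last one is 0-9 or x."""
    body = isbn.replace("-", "")
    return isbn.count("-") == 3 and re.fullmatch(r"\d{9}[\dxX]", body) is not None

def validate(book_xml):
    """Return a list of rule violations (an empty list means valid)."""
    book = ET.fromstring(book_xml)
    errors = [f"Missing {tag}" for tag in REQUIRED if book.find(tag) is None]
    if not DATE_RE.fullmatch(book.findtext("Date", "").strip()):
        errors.append("Invalid Date")
    if not isbn_ok(book.findtext("ISBN", "").strip()):
        errors.append("Invalid ISBN")
    return errors

bad = """<Book>
<Title>Illusions The Adventures of a Reluctant Messiah</Title>
<Author>Richard Bach</Author>
<Date>1977</Date>
<ISBN>0-440-34319-4-ppp</ISBN>
<Publisher>Dell Publishing Co.</Publisher>
</Book>"""
print(validate(bad))
```

In practice the rules would live in a separate rules document rather than in code, so that every system validates against the same definition.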

What else?
 You might want a tool which helps you to build
your data documents
– It would be very helpful if this tool could use
the rules document, so that you don’t need to
remember tag names and the order of the data

Creating and Editing
<BOOKCATALOGUE>
<BOOK>
<TITLE>My Life and Times</TITLE>
<AUTHOR>Paul McCartney</AUTHOR>
<DATE>July, 1998</DATE>
<ISBN>94303-12021-43892</ISBN>
<PUBLISHER>McMillin Publishing</PUBLISHER>
</BOOK>
<BOOK>
<TITLE>Illusions The Adventures of a Reluctant
Messiah</TITLE>
<AUTHOR>Richard Bach</AUTHOR>
<DATE>1977</DATE>
<ISBN>0-440-34319-4</ISBN>
<PUBLISHER>Dell Publishing Co.</PUBLISHER>
</BOOK>
<BOOK>
<TITLE>The First and Last Freedom</TITLE>
<AUTHOR>J. Krishnamurti</AUTHOR>
<DATE>1954</DATE>
<ISBN>0-06-064831-7</ISBN>
<PUBLISHER>Harper &amp; Row</PUBLISHER>
</BOOK>
</BOOKCATALOGUE>
Rules that indicate
the valid structure
of book data

What else?
 What else do you need to use the data?
– A common Application Programming Interface (API) that
allows the systems to programmatically access the data
would be very beneficial
– Such a common API would keep each system from
duplicating effort
Computer
API
Data
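As one illustration of a common API, the Python sketch below uses SAX, an event-based interface available in Python's standard library as xml.sax. The TitleCollector class and the shortened catalogue are assumptions made for this example.

```python
import xml.sax

class TitleCollector(xml.sax.ContentHandler):
    """Collects the text of every <Title> element as the parser streams by."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._in_title = False
        self._buffer = []

    def startElement(self, name, attrs):
        if name == "Title":
            self._in_title = True
            self._buffer = []

    def characters(self, content):
        if self._in_title:
            self._buffer.append(content)   # text may arrive in several pieces

    def endElement(self, name):
        if name == "Title":
            self.titles.append("".join(self._buffer))
            self._in_title = False

catalogue = b"""<BookCatalogue>
<Book><Title>My Life and Times</Title></Book>
<Book><Title>Illusions</Title></Book>
</BookCatalogue>"""

handler = TitleCollector()
xml.sax.parseString(catalogue, handler)
print(handler.titles)
```

Because the same callbacks exist in many languages, each system can reuse an existing parser instead of writing its own.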

What else?
 You might want to display the data, perhaps as
an HTML (Web) page, or filter out sensitive data,
or create a text version.
– In general, you might want a tool which
transforms the data from one form to another

XML to HTML
XML
Web page
(HTML)
Transformation
Tool
Transformation
Instructions
Raw data
(nicely organized,
as XML of course!)
Data organized
in tables, in lists,
etc
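The transformation step can be sketched in Python as follows. This stands in for a real transformation tool such as an XSLT processor, which the Python standard library does not include; the choice of columns is an assumption for the example.

```python
import xml.etree.ElementTree as ET

catalogue = ET.fromstring("""<BookCatalogue>
<Book><Title>Illusions</Title><Author>Richard Bach</Author></Book>
<Book><Title>The First and Last Freedom</Title><Author>J. Krishnamurti</Author></Book>
</BookCatalogue>""")

COLUMNS = ("Title", "Author")

# Build an HTML table: one header row, then one row per Book.
table = ET.Element("table")
header = ET.SubElement(table, "tr")
for name in COLUMNS:
    ET.SubElement(header, "th").text = name
for book in catalogue.findall("Book"):
    row = ET.SubElement(table, "tr")
    for name in COLUMNS:
        ET.SubElement(row, "td").text = book.findtext(name)

html = ET.tostring(table, encoding="unicode", method="html")
print(html)
```

The raw data stays untouched; only the instructions (here, the loop and COLUMNS) decide how it is presented.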

XML to XML
XML XML
Transformation
Tool
Transformation
Instructions
Contains
sensitive
data
Stripped of
the sensitive
data
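An XML-to-XML filter of this kind can be sketched in a few lines of Python. Treating ISBN as the "sensitive" field is purely a hypothetical choice for illustration, and a production system would more likely express the rule in a transformation language.

```python
import xml.etree.ElementTree as ET

SENSITIVE_TAGS = {"ISBN"}   # hypothetical choice, for illustration only

def strip_sensitive(xml_text):
    """Return the document with every element named in SENSITIVE_TAGS removed."""
    root = ET.fromstring(xml_text)
    for parent in list(root.iter()):        # snapshot: the tree is mutated below
        for child in list(parent):
            if child.tag in SENSITIVE_TAGS:
                parent.remove(child)
    return ET.tostring(root, encoding="unicode")

source = "<Book><Title>Illusions</Title><ISBN>0-440-34319-4</ISBN></Book>"
print(strip_sensitive(source))
```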

Problem – migrating legacy systems
 Problem: migrate a group of systems from an old,
legacy data-format to the new, XML format.
 Caveat: the migration strategy cannot force all
systems to migrate in lock-step!

Transforming the Data
Data
Transformation
Tool
Transformation
Instructions
HTML, XML, Text

What else?
 You might want to provide metadata for the data
(i.e., data about the data)
– When was the data created? By whom? How
long is it valid?
 Perhaps if your system is located at a Web site
you may want to serve up the metadata document
first, so that people/programs that interact with
your Web site can first decide if the data is
relevant before actually downloading the data

What else?
 You may wish to provide a query tool so that the
data can be queried
[Diagram: a Query goes into the Query tool, which runs it against the Data and returns Results]

What else?
 You might wish to provide hyperlinking
capabilities in your data, so that you can express
the relationship between this data and other data.

Name Deconfliction
Medical XML Vocabulary NIST (msrmt stds) XML Vocabulary
Endoskeleton
Nerve
Body
Spine
Lymph
Foot
Mile
Meter
Inches
Kilometer
Foot
<foot>…</foot>
<foot>…</foot>
Is this a human foot, or a
measurement foot?
If a machine processes this
document, how will it be
able to distinguish?
[Label: XML document]
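XML Namespaces resolve this by qualifying each tag with a URI. The Python sketch below shows how a program tells the two foot elements apart; both namespace URIs are invented for the example and do not belong to any real vocabulary.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URIs, invented for this example.
MED = "http://example.org/medical"
MSR = "http://example.org/measurement"

doc = ET.fromstring(f"""
<report xmlns:med="{MED}" xmlns:msr="{MSR}">
  <med:foot>left</med:foot>
  <msr:foot>0.3048</msr:foot>
</report>
""")

# ElementTree spells a qualified name as "{namespace-uri}local-name".
human = doc.find(f"{{{MED}}}foot")
unit = doc.find(f"{{{MSR}}}foot")
print(human.tag)
print(unit.tag)
```

The prefixes med and msr are only shorthand; the machine distinguishes the elements by the URIs they map to.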

Summary
 In a group of systems which pass around data, here are some things to
consider:
– Structure the data
– Syntax to express data business rules
– Validate the data
– Create/edit the data
– Provide a programmatic access API
– Transform tool to display the data
– A syntax to express metadata about the data
– Query tool
– Syntax to express relationships between documents
– Name deconfliction

XML Technologies
Need: Technology
Syntax: XML
Data business rules: DTD/XML Schema
Validator: XML Parser
Editor: XML Editor
Programmatic API: XML DOM and SAX
Transformation tool: XSL
Metadata: RDF
Query: XQL and XML-QL
Linking: XLink and XPointer
Name deconfliction: Namespaces

eXtensible Stylesheet Language Transform (XSLT)
 XML alone says nothing about how to present the data
(what should it look like?)
 XSLT is a flexible language to allow multiple
presentations and transformations of a given XML
representation
– Defines some behavior for XML elements
 XSLT is expressed in XML
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="air_tasking_order">
[action]
</xsl:template>
<xsl:template match="mission_data">
[action]
</xsl:template>
...
</xsl:stylesheet>

XML Query Language: XQuery
 Provides declarative access to XML documents.
– Resilient to changes in the underlying structure or
schema.
 Allows XML documents to be treated as database
instances.
– Information retrieved through interactive queries.
 15 Feb 2001 – First working draft released
 13 May 2001: Microsoft announces availability of XQuery
prototype (msdn.microsoft.com/xml)
for $b in doc("bib.xml")/book
where $b//name = "Dr. Bob Miller" and
$b//affil = "The MITRE Corporation"
return $b/title
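For readers without an XQuery engine, the same selection can be approximated in Python. The contents of bib.xml are not shown in the slides, so the sample document below (including its bib wrapper element and book titles) is entirely hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for bib.xml; the slides do not show the file.
bib = ET.fromstring("""<bib>
<book><title>XML Basics</title>
  <author><name>Dr. Bob Miller</name><affil>The MITRE Corporation</affil></author></book>
<book><title>Other Book</title>
  <author><name>Someone Else</name><affil>Elsewhere</affil></author></book>
</bib>""")

def any_text(elem, tag, wanted):
    """True if any descendant <tag> has exactly the wanted text (like $b//tag = wanted)."""
    return any(node.text == wanted for node in elem.iter(tag))

titles = [
    b.findtext("title")                                  # the RETURN clause
    for b in bib.findall("book")                         # the FOR clause
    if any_text(b, "name", "Dr. Bob Miller")             # the WHERE clause
    and any_text(b, "affil", "The MITRE Corporation")
]
print(titles)
```

The query language expresses in four declarative lines what the program spells out step by step.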

Simple Object Access Protocol (SOAP)
 Simple, easy to use XML-based protocol to let
software components and applications
communicate using standard Internet HTTP
SOAP = HTTP + XML
 Standard RPC (DCOM, CORBA) not easily
adaptable to the Internet (e.g., blocked by
Firewalls)
 9 July 2001: W3C SOAP 1.2 Working Draft
<SOAP-ENV:Envelope xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/">
<SOAP-ENV:Body>
<m:GetStockPrice xmlns:m="http://www.stock.org">
<StockName>MITRE</StockName>
</m:GetStockPrice>
</SOAP-ENV:Body>
</SOAP-ENV:Envelope>
HTTP Post
28 March 2001: SOAP included in ebXML Messaging Spec.
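To make "SOAP = HTTP + XML" concrete, here is a Python sketch that builds the stock-price message and wraps it in an HTTP POST. It assumes the SOAP 1.1 envelope namespace and Python 3.8 or later; the endpoint URL and the empty SOAPAction value are hypothetical, and the request is constructed but never sent.

```python
import urllib.request
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"   # SOAP 1.1 envelope namespace
STOCK_NS = "http://www.stock.org"                         # namespace from the slide

ET.register_namespace("SOAP-ENV", SOAP_ENV)
ET.register_namespace("m", STOCK_NS)

# The XML part: Envelope > Body > GetStockPrice > StockName.
envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
call = ET.SubElement(body, f"{{{STOCK_NS}}}GetStockPrice")
ET.SubElement(call, "StockName").text = "MITRE"
payload = ET.tostring(envelope, encoding="utf-8", xml_declaration=True)

# The HTTP part: hypothetical endpoint; the request is built but not sent.
request = urllib.request.Request(
    "http://example.org/stockquote",
    data=payload,
    headers={"Content-Type": "text/xml; charset=utf-8", "SOAPAction": '""'},
    method="POST",
)
print(request.get_method(), len(payload))
```

Because the message travels as an ordinary HTTP POST, it passes through the same firewalls that already allow Web traffic.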

Document Object Model (DOM)
 Set of abstract (language neutral) class declarations for the
tree representation of XML documents
 Declares several node classes that are connected to make
XML documents, such as:
– Document
– Element
– Attribute
– Text
 Includes typical operations (methods), such as:
– createElement(name)
– createAttribute(name)
– replaceChild(newChild, oldChild)
XML
Doc
Computer
Application
DOM
Implementation
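The three operations listed on this slide can be tried with xml.dom.minidom, a small DOM implementation in Python's standard library. The sample document and the lang attribute are assumptions made for the example.

```python
from xml.dom import minidom

doc = minidom.parseString("<Book><Title>Draft</Title></Book>")
book = doc.documentElement

# createElement / createTextNode build nodes owned by this document.
new_title = doc.createElement("Title")
new_title.appendChild(doc.createTextNode("Illusions"))

# createAttribute makes an attribute node; setAttributeNode attaches it.
lang = doc.createAttribute("lang")
lang.value = "en"
new_title.setAttributeNode(lang)

# replaceChild takes the new node first, then the node being replaced.
old_title = book.getElementsByTagName("Title")[0]
book.replaceChild(new_title, old_title)

print(book.toxml())
```

The same method names appear in DOM implementations for other languages, which is the point of a language-neutral interface.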

Wireless Application Protocol
 Defines Binary XML
Content Format
 Uses XML for
– Data Exchange
– User Interface via
Wireless Markup
Language (WML)
 Managed by WAP
Forum
– Over 200 members
representing over
90% of the global
handset market
– Active liaison with
W3C and IETF
Taken from Nokia’s WAP Web Site

Wireless Application Protocol (WAP)
 Used for handheld devices (e.g., cell phone, Palm Pilot)
– Makes minimal demands on air interface
– Employs light weight protocol stack to minimize
bandwidth
 Communicate with a WAP gateway to the Internet
– Works with most wireless networks
– Micro-browser spec controls the user interface
 1 August 2001: WAP Forum released WAP 2.0
– Now supports eXtensible HTML (XHTML)
– Multimedia messaging services
– Instant messaging
– Voice, Images
– WAP Push (e.g., for alerts)

Wireless Markup Language (WML)
 Adheres to XML standards
– Allows use of XML tools
 WML documents organized into well defined units of user
interaction
– Units called “cards”
– Suitable for limited display of handheld device
– Telephony (computerized phone services) aware
 Write once, use anywhere
– WML documents can be used by any network or device
that is WAP-compliant

XML Supports Multiple Interfaces
Human
Interfaces
ABCS
TBMCS
AFATDS
GCSS
IBS
Application
Interfaces
C4ISR
Databases
Data
Interfaces
Mobile
Interfaces
<air_operations_data>
<day-time> 020200Z </day-time>
<quantity> 6 </quantity>
<country> US </country>
<subject_type> FTR </subject_type>
<aircraft_type> F15 </aircraft_type>
<track_number> 401 </track_number>
</air_operations_data>
XML-MTF

The Tower of Babel Problem
What’s a Namespace?
 We need shared vocabularies and the means to specify
relationships between vocabularies
 For example, What should the <tank> tag denote?
– A tracked vehicle with turret and cannon?
– A container for aviation fuel?
 Possible solutions:
– Standardize all tags everywhere
– Have COIs standardize tags for that community
 XML namespaces provide the XML Document vocabulary
 XML namespace defined by XML schema
– More on this later……
www.ontology.org
www.rosettanet.org

What is Metadata?
 Metadata is data about data
 Metadata adds value by supplying meaning
(semantics) to data so that it is used as intended
 Two types
– Internal (about info object content)
– External (about info object as a whole)
 Metadata is exposed in various ways
– Data definitions, schema, ontology
– More is better

What’s a DTD?
 Document Type Definition (DTD):
– Supplies Metadata
– Describe the structure of XML documents
– Provide typing information of elements in those
documents
 Problems with DTDs:
– DTDs not written in XML
– variety of Typing information is limited
 Need something that supports endless variety of Types for
maximum flexibility in understanding the meaning of an
XML document

What’s an XML Schema?
 Schema is metadata about an XML document (information
object)
 Used to describe the structure and content of a given XML
document type
– What will an instance of an XML document contain?
(e.g., a purchase order, a phone book record, a target
report, etc.)
– Elements (of data)
– Type (of data)
– Structure (of XML document)
 Extensible typing -- Users can define their own types
– Allows rich semantics (metadata)
 Specification against which XML can be validated.
May 2, 2001: XML Schema specification
released as a W3C recommendation

XML Schema Example
<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
targetNamespace="http://www.publishing.org"
xmlns="http://www.publishing.org"...
<xsd:element name="Author">
<xsd:complexType>
<xsd:sequence>
<xsd:element ref="Name" minOccurs="1" maxOccurs="1"/>
<xsd:element ref="Affil" maxOccurs="1"/>
<xsd:element ref="Email" minOccurs="1"/>
</xsd:sequence>
</xsd:complexType>
</xsd:element>
<xsd:element name="Name" type="xsd:string"/>
<xsd:element name="Affil" type="affilType"/>
<xsd:element name="Email" type="emailType"/>
</xsd:schema>
Note that XML Schema is written in XML;
I.e. an XML schema is an XML document

XML Schema Example
Callouts on the same schema as the previous slide:
– XML Namespace Declaration: the xmlns:xsd, targetNamespace, and xmlns attributes on xsd:schema
– Definition of Element Structure: the Author element with its complexType and sequence
– Definition of Element Type: the Name, Affil, and Email declarations with their type attributes

Using Schemas
[Diagram: schemas in use at both ends of an exchange]
Sender: Composition and Validation (Message Preparation)
– User/Application data entry into an XML WYSIWYG Editor, with guidance from the XML Schema
– Output: XML document
Receiver: Parsing, Validation, & Translation (Message Processing)
– XML Parser validates the XML document against the XML Schema
– Translator applies XML Translation Rules

XML is not a Silver Bullet
 Some XML standards are still in development
 Some vendors offer differing implementations of the standard
 Just having systems publish XML does not ensure interoperability
– XML usage requires Communities of Interest (COI) to agree on XML tags
to ensure consistent interpretation
– Standardized schemas will also help
 XML provides a common language through which to organize, define,
structure, and deal with your data and information - it forces you to actually
MANAGE information
 XML allows many desirable practices for data management and exchange to
be applied more broadly, and at lower cost
 XML is adding tools to our information management arsenal to attack some
of the deeper problems
 Web services based on HTTP and XML provide the “least common
denominator” to integrate the wide variety of enterprise systems
But

Places to go for more information
about XML
 www.w3.org
 www.xml.com
 www.xml.org
 www.ebxml.org
 www.oasis-open.org/cover/xml.html
 www.microsoft.com/xml
 www.software.ibm.com/xml
 www.oracle.com/xml
 www.architag.com

XML and Bandwidth
 Tag bandwidth is small compared to imagery, video, etc.
 Ability to dynamically interact with XML content can reduce the
granularity of information exchanges
– e.g., XQuery returns only a portion of a document
 XML can be significantly compressed
– Commercial tools:
– Pkzip (everyone’s favorite)
– XMill -- AT&T smart compression tool
– MITRE Knowledge Based Compression
– Uses knowledge about message structure to direct
compression/decompression
– Defense Evaluation and Research Agency (DERA) UK
– XML Compression Study

Electronic Business XML Initiative (ebXML)
 Sponsored by UN/CEFACT + OASIS
– UN/CEFACT = United Nations body for e-commerce
(EDI)
– OASIS = Org. for Advancement of Structured Info Stds
(runs xml.org)
 Developing an open XML- based infrastructure enabling the
global use of electronic business information in an
interoperable, secure and consistent manner by all parties
– a set of open technical specifications that define an
interoperable eBusiness framework
– An 18-month project
 Global, open participation process supporting small,
medium and large enterprises
 ebXML specifications approved at a meeting in Vienna,
Austria on 11 May 2001.
Tech specs available at www.ebxml.org

ebXML Architecture
Messaging Services
Information
Package
Service
Interface
Enterprise
System
Registry/
Repository
Business
Process
Core Info &
Process
Information
Objects
Brokered I/S Exchange
Context For
Built With
Enterprise
System
Service
Interface
Enable one
Partner Role
Register
Enable other
Partner Roles
Ops View
Service View
11 May 2001
ebXML spec approved

The Semantic Web
 Extension of today’s WWW
 Adds meaning (semantics) through:
– Rich metadata (ontologies express
relationships)
– Logic (supports inference rules)
 Improved query and search
 Semantic mediation (transforms info)
 Supports Agents (fuselets)
http://www.scientificamerican.com/2001/0501issue/0501berners-lee.html

What’s RDF?
 A W3C model for representing metadata
 Metadata represented as ordered triple
– (Subject, Verb, Object), e.g.
– Or Node, Property, Value
– Form is (URI, URI, URI or literal) but stated in XML
– Provides graph-like representations
– Thus an agent could move along the graph gobbling up
metadata
<#drbob> <#owns> <#mustang>
<#mustang> <#type_of> <#automobile>
Note: # is URI reference
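The two triples above are enough to sketch the graph idea in Python. Plain tuples stand in for RDF here, and the helper names objects and reachable are invented for this example; a real system would use full URIs and an RDF library.

```python
# Triples from the slide, as (subject, property, value) tuples.
triples = {
    ("#drbob", "#owns", "#mustang"),
    ("#mustang", "#type_of", "#automobile"),
}

def objects(subject, prop):
    """All values v such that (subject, prop, v) is in the graph."""
    return {o for s, p, o in triples if s == subject and p == prop}

def reachable(start):
    """Every node an agent can reach by following properties from start."""
    seen, frontier = set(), [start]
    while frontier:
        node = frontier.pop()
        for s, _p, o in triples:
            if s == node and o not in seen:
                seen.add(o)
                frontier.append(o)
    return seen

print(objects("#drbob", "#owns"))
print(sorted(reachable("#drbob")))
```

Following the links from #drbob reaches #automobile even though no single triple connects them, which is the "moving along the graph" behavior the slide describes.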

Ontologies
Mustang
Dr. Bob
Automobile
owns
type of
consumer
is a
Ford
made by
Graspic DS1
partof
Tire
type of
Sam’s Club
sold by
member of
Some URL
lists sales at

Semantic Web Technologies
 XML Schemas
 XML Query
 Topic Maps
 Resource Description Framework (RDF)
 DARPA Agent Markup Language (DAML)
 Ontologies

Tim Berners-Lee Model
Unicode URI
XML Namespace XMLSchema
RDF RDFSchema
Ontology
Logic
Proof
Trust
DigitalSignature
Presented at
XML2K (Dec 2000)
SemanticWeb

Building the Semantic Web through Web Services
 Transactions initiated
automatically by a program,
not necessarily using a
browser
 Can be described, published,
discovered, and invoked
dynamically in a distributed
computing environment
 New ways of using the web:
intelligent agents,
marketplaces, auctions
All built on XML and other
internet standards!
TCP/IP - SSL
HTTP/HTTPS
Web Servers
Servlets/COM
XML
SOAP
WSDL
UDDI
Brokering
Workflow
