Page 1
<title>XML Overview</title>
<subtitle>eXtensible Markup Language</subtitle>
<presenter>Dan Hebert</presenter>
<email>dhebert@mitre.org</email>
<affiliation>The MITRE Corporation</affiliation>
<date>26 Feb 2004</date>
Page 2
Origins of the World Wide Web
 1989 - Information
Management: A Proposal
– Tim Berners-Lee, CERN
 “We should work toward a
universal linked information
system, in which generality and
portability are more important
than fancy graphics techniques
and complex extra facilities”
 Comments from
Management:
– ‘Vague, but exciting’
A “mesh” of computers
REJECTED
Page 3
World Wide Web Consortium (W3C)
 Founded in 1994 to “lead the Web to its full potential”
– Develop common protocols
– Ensure interoperability
– Promote WWW evolution
 Co-hosted by MIT (U.S.), INRIA (France) and Keio (Japan)
Universities
 Supported by DARPA and the European Commission
 Over 300 members including DISA, National Labs and
MITRE
Page 4
XML is all about data!
 Every organization uses data! So, XML is a very
foundational technology.
DATA
Microsoft
Sun
IBM
DoD
KMart Dell . . .
… data makes the world go around ...
Page 5
Family of XML Technologies
XML
Namespaces
XSLT/XPath
XML Schemas
RDF
XQuery
SVG
SAX/DOM
SOAP, WSDL, UDDI
XLink/XPointer
RDDL
MathML
RSS
Page 6
XML
(and its Associated Technologies)
 All about data:
– structuring the data
– accessing and manipulating the data
Computer 1 Computer 2
data
Page 7
Passing Data between Systems
 Suppose that you’ve got book data that you want
to pass between some systems
“My Life and Times” Paul McCartney July 1998
94303-12021-43892 McMillin Publishing.
“Illusions The Adventures of a Reluctant Messiah”
Richard Bach 1977 0-440-34319-4 Dell Publishing Co..
“The First and Last Freedom” J. Krishnamurti 1954
0-06-064831-7 Harper & Row.
Page 8
Passing Data between Systems
 First thing you might do is agree on how you will
structure your data:
“My Life and Times”/Paul McCartney/July 1998/94303-12021-43892/McMillin Publishing.
“Illusions The Adventures of a Reluctant Messiah”/Richard Bach/1977/0-440-34319-4/Dell Publishing Co..
“The First and Last Freedom”/J. Krishnamurti/1954/0-06-064831-7/Harper & Row.
Title / Author / Date / ISBN / Publisher
Here we are using a slash to delimit (separate) each field and a
carriage return to delimit each record.
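A minimal Python sketch of a reader for the slash-delimited format might look like the following. The function name parse_records and the trimmed sample records are assumptions made for illustration; only the field order comes from the slide.

```python
# Field order agreed on by both systems: Title / Author / Date / ISBN / Publisher.
FIELDS = ["Title", "Author", "Date", "ISBN", "Publisher"]


def parse_records(text):
    """Split newline-delimited records into dicts keyed by the agreed field order."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        values = line.split("/")
        if len(values) != len(FIELDS):
            # A slash inside a field, or a missing field, breaks the fixed layout.
            raise ValueError(f"expected {len(FIELDS)} fields, got {len(values)}: {line!r}")
        records.append(dict(zip(FIELDS, values)))
    return records


SAMPLE = (
    "Illusions The Adventures of a Reluctant Messiah/Richard Bach/1977/0-440-34319-4/Dell Publishing Co.\n"
    "The First and Last Freedom/J. Krishnamurti/1954/0-06-064831-7/Harper & Row\n"
)

books = parse_records(SAMPLE)
print(books[0]["Author"])
```

The reader depends entirely on position: a field that contains a slash, or fields sent in a different order, cannot be told apart from valid data without extra conventions.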
Page 9
Alternatively
<Book>
<Title>My Life and Times</Title>
<Author>Paul McCartney</Author>
<Date>July, 1998</Date>
<ISBN>94303-12021-43892</ISBN>
<Publisher>McMillin Publishing</Publisher>
</Book>
<Book>
<Title>Illusions The Adventures of a Reluctant Messiah</Title>
<Author>Richard Bach</Author>
<Date>1977</Date>
<ISBN>0-440-34319-4</ISBN>
<Publisher>Dell Publishing Co.</Publisher>
</Book>
<Book>
<Title>The First and Last Freedom</Title>
<Author>J. Krishnamurti</Author>
<Date>1954</Date>
<ISBN>0-06-064831-7</ISBN>
<Publisher>Harper &amp; Row</Publisher>
</Book>
Here we are delimiting each data item with a start and end tag.
We are enclosing each record also within a start-end tag.
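As a rough illustration of reading the tag-delimited form, the sketch below uses Python's standard xml.etree.ElementTree module. The wrapping <Books> root element and the helper name read_books are assumptions, added because an XML document needs a single root; the second record deliberately lists its fields in a different order.

```python
import xml.etree.ElementTree as ET

# <Books> is an assumed wrapper so the fragment has a single root element.
DOC = """<Books>
  <Book>
    <Title>The First and Last Freedom</Title>
    <Author>J. Krishnamurti</Author>
    <Date>1954</Date>
    <ISBN>0-06-064831-7</ISBN>
    <Publisher>Harper &amp; Row</Publisher>
  </Book>
  <Book>
    <Publisher>Dell Publishing Co.</Publisher>
    <ISBN>0-440-34319-4</ISBN>
    <Title>Illusions The Adventures of a Reluctant Messiah</Title>
    <Date>1977</Date>
    <Author>Richard Bach</Author>
  </Book>
</Books>"""


def read_books(xml_text):
    """Return one dict per <Book>, keyed by tag name rather than by position."""
    root = ET.fromstring(xml_text)
    return [{child.tag: child.text for child in book} for book in root.findall("Book")]


books = read_books(DOC)
print(books[1]["Author"])
```

Because each value is looked up by its tag, the shuffled second record is read correctly, and the parser turns the &amp; entity back into a plain ampersand.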
Page 10
Comparison
 Slash-delimited
– Advantage: little space overhead
– Disadvantage: rigid format (cannot shuffle the data around)
 Tag-delimited (XML)
– Advantages:
  – Flexible format (can shuffle the data around)
  – Tags enhance ability to search for data
  – Tags enhance ability to extract subsets of data
– Disadvantage: verbose (XML bloat)
Page 11
Compressibility of XML
 XML is very compressible!
[Figure: a text file (txt) is translated to XML, and the files are compressed with WinZip and with XMill (an XML compression tool from AT&T; alternative tool: B-Zip). Sizes shown in the figure: 674,062 bytes; 11,421,822 bytes; 148,294 bytes; 94,369 bytes.]
The compressed version of the XML document is smaller
than the compressed version of the original document!
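The compressibility claim can be explored with a small Python experiment. This sketch uses the general-purpose zlib module as a stand-in, not WinZip or XMill, and the generated book data is made up, so it does not reproduce the figures on the slide; it only shows that repeated tags compress well.

```python
import zlib


def make_text(n):
    """Slash-delimited records (synthetic data)."""
    return "\n".join(f"Title {i}/Author {i}/{1900 + i % 100}" for i in range(n))


def make_xml(n):
    """The same synthetic records, tag-delimited."""
    rows = [
        f"<Book><Title>Title {i}</Title><Author>Author {i}</Author>"
        f"<Date>{1900 + i % 100}</Date></Book>"
        for i in range(n)
    ]
    return "<Books>" + "".join(rows) + "</Books>"


txt_raw = make_text(1000).encode("utf-8")
xml_raw = make_xml(1000).encode("utf-8")
txt_zipped = zlib.compress(txt_raw, 9)
xml_zipped = zlib.compress(xml_raw, 9)

# Compression ratio = original size / compressed size.
txt_ratio = len(txt_raw) / len(txt_zipped)
xml_ratio = len(xml_raw) / len(xml_zipped)
print(len(txt_raw), len(xml_raw), len(txt_zipped), len(xml_zipped))
```

The XML version starts out larger but reaches a higher compression ratio, since most of its extra bytes are tag names that repeat in every record. Whether the compressed XML ends up smaller than the compressed text depends on the tool; XMill is designed around XML structure, which a generic compressor is not.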
Page 12
Transferring Large XML Documents
 Example: suppose that you have a 30MB XML file that you need
transferred
 Typical transfer rate: 1MB/sec
 Total time = 30 MB ÷ 1 MB/sec = 30 sec
 Too long! What do we do?
 We could compress it, then it would be a smaller file and thus
would take less time
 However, there are issues with compressing - time to
compress/decompress, ensuring both sender and receiver have
the tools, etc
 There is an alternative ...
Page 13
Transferring Large XML Documents
(cont.)
 The alternative is to do XML Streaming
– send the XML declaration and root element to
make the initial contact.
– Then send the first chunk of XML. While the
receiver is processing the first chunk the
succeeding chunks can be sent in the
background.
 HTML and Jabber (XML-based instant messaging) both use streaming
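On the receiving side, streaming can be sketched with the incremental XMLPullParser from Python's standard library. The chunk boundaries, the <Books> root element, and the function name stream_titles are assumptions for illustration; in practice the chunks would arrive from a socket.

```python
import xml.etree.ElementTree as ET


def stream_titles(chunks):
    """Yield each book title as soon as its <Book> element has fully arrived."""
    parser = ET.XMLPullParser(events=("end",))
    for chunk in chunks:
        parser.feed(chunk)  # hand over whatever bytes/text arrived so far
        for _event, elem in parser.read_events():
            if elem.tag == "Book":
                yield elem.findtext("Title")
                elem.clear()  # discard the processed record to keep memory use flat
    parser.close()


# Simulated network chunks; note that one chunk ends in the middle of a tag.
CHUNKS = [
    '<?xml version="1.0"?><Books>',
    "<Book><Title>My Life and Times</Title></Book><Book><Tit",
    "le>The First and Last Freedom</Title></Book>",
    "</Books>",
]

titles = list(stream_titles(CHUNKS))
print(titles)
```

The first title becomes available after the second chunk, before the rest of the document has been sent, which is the point of streaming: processing overlaps with transfer.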
Page 14
Summary of First Step
 Thus, the first step in passing data between systems is to
agree on how the data is going to be structured
– Use slash-delimiters, or
– Use tags, or
– Use some other delimiter
 Now each system can be written to expect the data that it
receives to be in that structure. Likewise, when it sends
out data it will send it out in that format.
Page 15
What Next? Express Data Business Rules
 We will need a syntax to express our data's
business rules.
Each Book must contain data for the Title, Author, Date, ISBN, and Publisher.
The Date must have the format: year, or month-comma-year.
The ISBN must have 10 digits, 3 dashes, and must end with 0-9 or x.
etc.
Data Business Rules
Note: of course, we will want to express these constraints using a more formal
syntax than English.
Page 16
What next?
 Now that the data is structured in an agreed upon
fashion, what else do we typically want to do with the
data?
– We might want to have a tool which validates that
the data is in the agreed upon format
– Such a tool would help reduce system crashes
by ensuring that the data is valid
Page 17
Validation
<Book>
<Title>Illusions The Adventures of a Reluctant Messiah</Title>
<Author>Richard Bach</Author>
<Date>1977</Date>
<ISBN>0-440-34319-4-ppp</ISBN>
<Publisher>Dell Publishing Co.</Publisher>
</Book>
Validator
Error!!! Invalid ISBN!
Rules that indicate
the valid structure
of book data
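A hand-written validator for the three example rules could look like the Python sketch below. It is an illustration only: a real system would express the rules in a DTD or XML Schema and let a validating parser enforce them. The names validate_book and isbn_ok are hypothetical, and the ISBN rule is read as "nine digits plus a final digit or x, with exactly three dashes".

```python
import re
import xml.etree.ElementTree as ET

REQUIRED = ["Title", "Author", "Date", "ISBN", "Publisher"]
DATE_RE = re.compile(r"(?:[A-Z][a-z]+, )?\d{4}")  # "1977" or "July, 1998"


def isbn_ok(isbn):
    """Exactly 3 dashes, 10 other characters, the last being a digit or x."""
    body = isbn.replace("-", "")
    return (
        isbn.count("-") == 3
        and not isbn.endswith("-")
        and re.fullmatch(r"[0-9]{9}[0-9xX]", body) is not None
    )


def validate_book(xml_text):
    """Return a list of error strings; an empty list means the record is valid."""
    book = ET.fromstring(xml_text)
    errors = []
    for name in REQUIRED:
        if not (book.findtext(name) or "").strip():
            errors.append(f"Missing {name}")
    date = book.findtext("Date") or ""
    if date and not DATE_RE.fullmatch(date):
        errors.append("Invalid Date")
    isbn = book.findtext("ISBN") or ""
    if isbn and not isbn_ok(isbn):
        errors.append("Invalid ISBN")
    return errors


BAD = """<Book>
  <Title>Illusions The Adventures of a Reluctant Messiah</Title>
  <Author>Richard Bach</Author>
  <Date>1977</Date>
  <ISBN>0-440-34319-4-ppp</ISBN>
  <Publisher>Dell Publishing Co.</Publisher>
</Book>"""

print(validate_book(BAD))
```

Keeping the rules in code like this means every system must reimplement them, which is the motivation for a shared, declarative rules document.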
Page 18
What else?
 You might want a tool which helps you to build
your data documents
– It would be very helpful if this tool could use
the rules document, so that you don’t need to
remember tag names and the order of the data
Page 19
Creating and Editing
<BOOKCATALOGUE>
<BOOK>
<TITLE>My Life and Times</TITLE>
<AUTHOR>Paul McCartney</AUTHOR>
<DATE>July, 1998</DATE>
<ISBN>94303-12021-43892</ISBN>
<PUBLISHER>McMillin Publishing</PUBLISHER>
</BOOK>
<BOOK>
<TITLE>Illusions The Adventures of a Reluctant Messiah</TITLE>
<AUTHOR>Richard Bach</AUTHOR>
<DATE>1977</DATE>
<ISBN>0-440-34319-4</ISBN>
<PUBLISHER>Dell Publishing Co.</PUBLISHER>
</BOOK>
<BOOK>
<TITLE>The First and Last Freedom</TITLE>
<AUTHOR>J. Krishnamurti</AUTHOR>
<DATE>1954</DATE>
<ISBN>0-06-064831-7</ISBN>
<PUBLISHER>Harper &amp; Row</PUBLISHER>
</BOOK>
</BOOKCATALOGUE>
Rules that indicate
the valid structure
of book data
Page 20
What else?
 What else do you need to use the data?
– A common Application Programming Interface (API) that
allows the systems to programmatically access the data
would be very beneficial
– Such a common API would keep each system from
duplicating effort
Computer
API
Data
Page 21
What else?
 You might want to display the data, perhaps as
an HTML (Web) page, or filter out sensitive data,
or create a text version.
– In general, you might want a tool which
transforms the data from one form to another
Page 22
XML to HTML
XML
Web page
(HTML)
Transformation
Tool
Transformation
Instructions
Raw data
(nicely organized,
as XML of course!)
Data organized
in tables, in lists,
etc
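The XML-to-HTML step might be sketched in plain Python as follows. This is a stand-in for a real transformation tool such as an XSLT processor, which the Python standard library does not include; the function name books_to_html and the choice of table columns are assumptions.

```python
import xml.etree.ElementTree as ET
from html import escape

COLUMNS = ["TITLE", "AUTHOR", "PUBLISHER"]  # assumed selection of fields to display


def books_to_html(xml_text):
    """Render each <BOOK> of a <BOOKCATALOGUE> as one row of an HTML table."""
    root = ET.fromstring(xml_text)
    header = "".join(f"<th>{name.title()}</th>" for name in COLUMNS)
    rows = []
    for book in root.findall("BOOK"):
        # escape() re-encodes characters such as & that are special in HTML.
        cells = "".join(
            f"<td>{escape(book.findtext(name, default=''))}</td>" for name in COLUMNS
        )
        rows.append(f"<tr>{cells}</tr>")
    return "<table>\n<tr>" + header + "</tr>\n" + "\n".join(rows) + "\n</table>"


CATALOGUE = """<BOOKCATALOGUE>
  <BOOK>
    <TITLE>The First and Last Freedom</TITLE>
    <AUTHOR>J. Krishnamurti</AUTHOR>
    <PUBLISHER>Harper &amp; Row</PUBLISHER>
  </BOOK>
</BOOKCATALOGUE>"""

html_page = books_to_html(CATALOGUE)
print(html_page)
```

The raw data stays untouched; a different set of instructions (here, a different COLUMNS list or row template) would yield a different presentation of the same document.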
Page 23
XML to XML
XML XML
Transformation
Tool
Transformation
Instructions
Contains
sensitive
data
Stripped of
the sensitive
data
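An XML-to-XML filter that strips sensitive elements can be sketched in a few lines of Python. Treating track_number as the sensitive field is purely an assumption for the example, as is the function name strip_elements.

```python
import xml.etree.ElementTree as ET


def strip_elements(xml_text, sensitive_tags):
    """Return the document with every element named in sensitive_tags removed."""
    root = ET.fromstring(xml_text)
    # Snapshot the tree first so removals do not disturb the iteration.
    for parent in list(root.iter()):
        for child in list(parent):
            if child.tag in sensitive_tags:
                parent.remove(child)
    return ET.tostring(root, encoding="unicode")


SOURCE = (
    "<air_operations_data>"
    "<country>US</country>"
    "<track_number>401</track_number>"
    "</air_operations_data>"
)

cleaned = strip_elements(SOURCE, {"track_number"})
print(cleaned)
```

The output is still XML with the same vocabulary, so downstream systems that do not need the removed fields can consume it unchanged.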
Page 24
Problem – migrating legacy systems
 Problem: migrate a group of systems from an old,
legacy data-format to the new, XML format.
 Caveat: the migration strategy cannot force all
systems to migrate in lock-step!
Page 25
Transforming the Data
Data
Transformation
Tool
Transformation
Instructions
HTML, XML, Text
Page 26
What else?
 You might want to provide metadata for the data
(i.e., data about the data)
– When was the data created? By whom? How
long is it valid?
 Perhaps if your system is located at a Web site
you may want to serve up the metadata document
first, so that people/programs that interact with
your Web site can first decide if the data is
relevant before actually downloading the data
Page 27
What else?
 You may wish to provide a query tool so that the
data can be queried
[Figure: a query is submitted to a query tool, which runs it against the data and returns the results.]
Page 28
What else?
 You might wish to provide hyperlinking
capabilities in your data, so that you can express
the relationship between this data and other data.
Page 29
Name Deconfliction
Medical XML Vocabulary: Endoskeleton, Nerve, Body, Spine, Lymph, Foot
NIST (measurement standards) XML Vocabulary: Mile, Meter, Inches, Kilometer, Foot
XML document:
<foot>…</foot>
<foot>…</foot>
Is this a human foot, or a measurement foot?
If a machine processes this document, how will it be able to distinguish?
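XML Namespaces address this ambiguity by qualifying each tag with a URI. The Python sketch below shows how a parser reports the two foot elements once they are qualified; both namespace URIs and the <report> wrapper are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Both namespace URIs are hypothetical placeholders.
MED = "http://example.org/medical"
MSR = "http://example.org/measurement"

DOC = f"""<report xmlns:med="{MED}" xmlns:msr="{MSR}">
  <med:foot>left</med:foot>
  <msr:foot>3</msr:foot>
</report>"""

root = ET.fromstring(DOC)

# ElementTree expands each prefixed name to {namespace-uri}local-name,
# so the two <foot> elements no longer share a name.
kinds = [child.tag for child in root]
anatomy = root.find(f"{{{MED}}}foot").text
length = root.find(f"{{{MSR}}}foot").text
print(kinds)
```

The prefixes med and msr are only shorthand; what a machine compares is the full URI plus the local name, so two vocabularies can reuse the word "foot" without colliding.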
Page 30
Summary
 In a group of systems which pass around data, here are some things to
consider:
– Structure the data
– Syntax to express data business rules
– Validate the data
– Create/edit the data
– Provide a programmatic access API
– Transform tool to display the data
– A syntax to express metadata about the data
– Query tool
– Syntax to express relationships between documents
– Name deconfliction
Page 31
XML Technologies
Need                   XML technology
Syntax                 XML
Data business rules    DTD/XML Schema
Validator              XML Parser
Editor                 XML Editor
Programmatic API       XML DOM and SAX
Transformation tool    XSL
Metadata               RDF
Query                  XQL and XML-QL
Linking                XLink and XPointer
Name deconfliction     Namespaces
Page 32
eXtensible Stylesheet Language Transformations (XSLT)
 XML alone says nothing about how to present the data
(what should it look like?)
 XSLT is a flexible language to allow multiple
presentations and transformations of a given XML
representation
– Defines some behavior for XML elements
 XSLT is expressed in XML
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="air_tasking_order">
    <!-- [action] -->
  </xsl:template>
  <xsl:template match="mission_data">
    <!-- [action] -->
  </xsl:template>
  <!-- ... -->
</xsl:stylesheet>
Page 33
XML Query Language: XQuery
 Provides declarative access to XML documents.
– Resilient to changes in the underlying structure or
schema.
 Allows XML documents to be treated as database
instances.
– Information retrieved through interactive queries.
 15 Feb 2001 – First working draft released
 13 May 2001: Microsoft announces availability of XQuery
prototype (msdn.microsoft.com/xml)
for $b in doc("bib.xml")/book
where $b//name = "Dr. Bob Miller" and
      $b//affil = "The MITRE Corporation"
return $b/title
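For readers without an XQuery processor, the same query logic can be approximated in Python with ElementTree. The sample bibliography, its <bib> root, and the function name titles_by are invented for the example; the loop only mirrors the for / where / return shape of the query and is not an XQuery implementation.

```python
import xml.etree.ElementTree as ET

# Hypothetical contents for bib.xml.
BIB = """<bib>
  <book>
    <title>XML in Practice</title>
    <author><name>Dr. Bob Miller</name><affil>The MITRE Corporation</affil></author>
  </book>
  <book>
    <title>Another Book</title>
    <author><name>Jane Roe</name><affil>Example University</affil></author>
  </book>
</bib>"""


def titles_by(xml_text, name, affil):
    """Titles of books that have a matching name and a matching affiliation."""
    root = ET.fromstring(xml_text)
    result = []
    for book in root.iter("book"):                     # for $b in ... book
        names = {e.text for e in book.iter("name")}    # $b//name
        affils = {e.text for e in book.iter("affil")}  # $b//affil
        if name in names and affil in affils:          # where ... and ...
            result.extend(t.text for t in book.findall("title"))  # return $b/title
    return result


found = titles_by(BIB, "Dr. Bob Miller", "The MITRE Corporation")
print(found)
```

The declarative query states what to retrieve and leaves the traversal to the engine; the Python version has to spell out the traversal, which is why it is more sensitive to changes in document structure.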
Page 34
Simple Object Access Protocol (SOAP)
 Simple, easy to use XML-based protocol to let
software components and applications
communicate using standard Internet HTTP
SOAP = HTTP + XML
 Standard RPC (DCOM, CORBA) not easily
adaptable to the Internet (e.g., blocked by
Firewalls)
 9 July 2001: W3C SOAP 1.2 Working Draft
<SOAP-ENV:Envelope xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/">
  <SOAP-ENV:Body>
    <m:GetStockPrice xmlns:m="http://www.stock.org">
      <StockName>MITRE</StockName>
    </m:GetStockPrice>
  </SOAP-ENV:Body>
</SOAP-ENV:Envelope>
HTTP POST
28 March 2001: SOAP included in ebXML Messaging Spec.
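The "SOAP = HTTP + XML" idea can be sketched in Python by building the envelope and wrapping it in an HTTP POST request. The endpoint URL is hypothetical and the request is only constructed, never sent; the envelope namespace shown is the SOAP 1.1 one, and the helper names are assumptions.

```python
import urllib.request
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
STOCK_NS = "http://www.stock.org"                        # namespace URI from the slide


def build_envelope(stock_name):
    """Serialize a GetStockPrice call as a SOAP envelope (UTF-8 bytes)."""
    ET.register_namespace("SOAP-ENV", SOAP_ENV)
    ET.register_namespace("m", STOCK_NS)
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    call = ET.SubElement(body, f"{{{STOCK_NS}}}GetStockPrice")
    ET.SubElement(call, "StockName").text = stock_name
    return ET.tostring(envelope, encoding="utf-8", xml_declaration=True)


def build_request(url, stock_name):
    """Wrap the envelope in an HTTP POST; the caller decides whether to send it."""
    return urllib.request.Request(
        url,
        data=build_envelope(stock_name),
        headers={"Content-Type": "text/xml; charset=utf-8", "SOAPAction": '""'},
        method="POST",
    )


# Hypothetical endpoint; nothing is sent over the network here.
request = build_request("http://example.org/stockquote", "MITRE")
print(request.get_method(), len(request.data))
```

Because the call travels as an ordinary HTTP POST with an XML body, it passes through infrastructure that already allows web traffic, which is the firewall argument made on the slide.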
Page 35
Document Object Model (DOM)
 Set of abstract (language neutral) class declarations for the
tree representation of XML documents
 Declares several node classes that are connected to make
XML documents, such as:
– Document
– Element
– Attribute
– Text
 Includes typical operations (methods), such as:
– createElement (name)
– createAttribute (name)
– replaceChild (newChild, oldChild)
XML
Doc
Computer
Application
DOM
Implementation
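Python's standard library ships one DOM implementation, xml.dom.minidom, which can be used to try the operations listed above. The sample document and the lang attribute are assumptions chosen for the example.

```python
from xml.dom.minidom import parseString

doc = parseString("<Book><Title>Old Title</Title></Book>")
book = doc.documentElement

# createElement / createTextNode build nodes; they join the tree only when attached.
new_title = doc.createElement("Title")
new_title.appendChild(doc.createTextNode("My Life and Times"))

# replaceChild takes the new node first, then the node being replaced.
old_title = book.getElementsByTagName("Title")[0]
book.replaceChild(new_title, old_title)

# createAttribute returns an Attr node; setAttributeNode attaches it to an element.
lang = doc.createAttribute("lang")
lang.value = "en"
book.setAttributeNode(lang)

result = book.toxml()
print(result)
```

The same method names exist in DOM bindings for other languages, which is what "language neutral" means in practice: the interface is fixed, while each platform supplies its own implementation.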
Page 36
Wireless Application Protocol
 Defines Binary XML
Content Format
 Uses XML for
– Data Exchange
– User Interface via
Wireless Markup
Language (WML)
 Managed by WAP
Forum
– Over 200 members
representing over
90% of the global
handset market
– Active liaison with
W3C and IETF
Taken from Nokia’s WAP Web Site
Page 37
Wireless Application Protocol (WAP)
 Used for handheld devices (e.g., cell phone, Palm Pilot)
– Makes minimal demands on air interface
– Employs light weight protocol stack to minimize
bandwidth
 Communicate with a WAP gateway to the Internet
– Works with most wireless networks
– Micro-browser spec controls the user interface
 1 August 2001: WAP Forum released WAP 2.0
– Now supports eXtensible HTML (XHTML)
– Multimedia messaging services
– Instant messaging
– Voice, Images
– WAP Push (e.g., for alerts)
Page 38
Wireless Markup Language (WML)
 Adheres to XML standards
– Allows use of XML tools
 WML documents organized into well defined units of user
interaction
– Units called “cards”
– Suitable for limited display of handheld device
– Telephony (computerized phone services) aware
 Write once, use anywhere
– WML documents can be used by any network or device
that is WAP-compliant
Page 39
XML Supports Multiple Interfaces
Human
Interfaces
ABCS
TBMCS
AFATDS
GCSS
IBS
Application
Interfaces
C4ISR
Databases
Data
Interfaces
Mobile
Interfaces
<air_operations_data>
<day-time> 020200Z </day-time>
<quantity> 6 </quantity>
<country> US </country>
<subject_type> FTR </subject_type>
<aircraft_type> F15 </aircraft_type>
<track_number> 401 </track_number>
</air_operations_data>
XML-MTF
Page 40
The Tower of Babel Problem
What’s a Namespace?
 We need shared vocabularies and the means to specify
relationships between vocabularies
 For example, what should the <tank> tag denote?
– A tracked vehicle with turret and cannon?
– A container for aviation fuel?
 Possible solutions:
– Standardize all tags everywhere
– Have COIs standardize tags for that community
 XML namespaces provide the XML Document vocabulary
 XML namespace defined by XML schema
– More on this later……
www.ontology.org
www.rosettanet.org
Page 41
What is Metadata?
 Metadata is data about data
 Metadata adds value by supplying meaning
(semantics) to data so that it is used as intended
 Two types
– Internal (about info object content)
– External (about info object as a whole)
 Metadata is exposed in various ways
– Data definitions, schema, ontology
– More is better
Page 42
What’s a DTD?
 Document Type Definition (DTD):
– Supplies Metadata
– Describe the structure of XML documents
– Provide typing information of elements in those
documents
 Problems with DTDs:
– DTDs not written in XML
– variety of Typing information is limited
 Need something that supports endless variety of Types for
maximum flexibility in understanding the meaning of an
XML document
Page 43
What’s an XML Schema?
 Schema is metadata about an XML document (information
object)
 Used to describe the structure and content of a given XML
document type
– What will an instance of an XML document contain?
(e.g., a purchase order, a phone book record, a target
report, etc.)
– Elements (of data)
– Type (of data)
– Structure (of XML document)
 Extensible typing -- Users can define their own types
– Allows rich semantics (metadata)
 Specification against which XML can be validated.
May 2, 2001: XML Schema specification
released as a W3C recommendation
Page 44
XML Schema Example
<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            targetNamespace="http://www.publishing.org"
            xmlns="http://www.publishing.org" ...>
  <xsd:element name="Author">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element ref="Name" minOccurs="1" maxOccurs="1"/>
        <xsd:element ref="Affil" maxOccurs="1"/>
        <xsd:element ref="Email" minOccurs="1"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
  <xsd:element name="Name" type="xsd:string"/>
  <!-- affilType and emailType are user-defined types, assumed to be declared elsewhere in this schema -->
  <xsd:element name="Affil" type="affilType"/>
  <xsd:element name="Email" type="emailType"/>
</xsd:schema>
Note that XML Schema is written in XML;
i.e., an XML schema is an XML document
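Because a schema is itself an XML document, ordinary XML tools can read it. The Python sketch below parses a cut-down version of the example schema with ElementTree and lists what it declares; it does not validate anything, since the Python standard library has no XML Schema validator. The schema shown uses the namespace of the final W3C Recommendation and omits the target namespace for brevity.

```python
import xml.etree.ElementTree as ET

XSD = "http://www.w3.org/2001/XMLSchema"  # namespace of the XML Schema Recommendation

SCHEMA = f"""<xsd:schema xmlns:xsd="{XSD}">
  <xsd:element name="Author">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element ref="Name"/>
        <xsd:element ref="Affil"/>
        <xsd:element ref="Email"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
  <xsd:element name="Name" type="xsd:string"/>
</xsd:schema>"""

root = ET.fromstring(SCHEMA)

# Top-level declarations are direct children of <xsd:schema>.
declared = [e.get("name") for e in root.findall(f"{{{XSD}}}element")]

# References can sit anywhere below, so search the whole tree.
referenced = [e.get("ref") for e in root.iter(f"{{{XSD}}}element") if e.get("ref")]

print(declared, referenced)
```

This is the practical advantage over DTDs noted earlier: editors, transformers, and query tools built for XML work on schemas without a separate parser.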
Page 45
XML Schema Example
<?xml version="1.0"?>
<!-- XML Namespace Declaration: the xmlns attributes on xsd:schema -->
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            targetNamespace="http://www.publishing.org"
            xmlns="http://www.publishing.org" ...>
  <!-- Definition of Element Structure -->
  <xsd:element name="Author">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element ref="Name" minOccurs="1" maxOccurs="1"/>
        <xsd:element ref="Affil" maxOccurs="1"/>
        <xsd:element ref="Email" minOccurs="1"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
  <!-- Definition of Element Type -->
  <xsd:element name="Name" type="xsd:string"/>
  <xsd:element name="Affil" type="affilType"/>
  <xsd:element name="Email" type="emailType"/>
</xsd:schema>
Page 46
Using Schemas
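[Figure: two-stage message flow between a sender and a receiver.
Sender: Composition and Validation (Message Preparation). Data entry goes through an XML WYSIWYG editor, which takes guidance from the XML Schema, to produce an XML document.
Receiver: Parsing, Validation, & Translation (Message Processing). An XML parser validates the XML document against the XML Schema, and a translator applies XML translation rules before the result reaches the user/application.]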
Page 47
XML is not a Silver Bullet
 Some XML standards are still in development
 Some vendors offer differing implementations of the standard
 Just having systems publish XML does not ensure interoperability
– XML usage requires Communities of Interest (COI) to agree on XML tags
to ensure consistent interpretation
– Standardized schemas will also help
But
 XML provides a common language through which to organize, define,
structure, and deal with your data and information - it forces you to actually
MANAGE information
 XML allows many desirable practices for data management and exchange to
be applied more broadly, and at lower cost
 XML is adding tools to our information management arsenal to attack some
of the deeper problems
 Web services based on HTTP and XML provide the “least common
denominator” to integrate the wide variety of enterprise systems
Page 48
Places to go for more information
about XML
 www.w3.org
 www.xml.com
 www.xml.org
 www.ebxml.org
 www.oasis-open.org/cover/xml.html
 www.microsoft.com/xml
 www.software.ibm.com/xml
 www.oracle.com/xml
 www.architag.com
Page 49
XML and Bandwidth
 Tag bandwidth is small compared to imagery, video, etc.
 Ability to dynamically interact with XML content can reduce the
granularity of information exchanges
– e.g., XQuery returns only a portion of a document
 XML can be significantly compressed
– Commercial tools:
– Pkzip (everyone’s favorite)
– XMill -- AT&T smart compression tool
– MITRE Knowledge Based Compression
– Uses knowledge about message structure to direct
compression/decompression
– Defense Evaluation and Research Agency (DERA) UK
– XML Compression Study
Page 50
Electronic Business XML Initiative (ebXML)
 Sponsored by UN/CEFACT + OASIS
– UN/CEFACT = United Nations body for e-commerce
(EDI)
– OASIS = Org. for Advancement of Structured Info Stds
(runs xml.org)
 Developing an open XML- based infrastructure enabling the
global use of electronic business information in an
interoperable, secure and consistent manner by all parties
– a set of open technical specifications that define an
interoperable eBusiness framework
– An 18-month project
 Global, open participation process supporting small,
medium and large enterprises
 ebXML specifications approved at a meeting in Vienna,
Austria on 11 May 2001.
Tech specs available at www.ebxml.org
Page 51
ebXML Architecture
[Figure: ebXML architecture. Components shown: Messaging Services, Information Package, Service Interface, Enterprise System (on both sides of the exchange), Registry/Repository, Business Process, Core Info & Process, Information Objects, Brokered I/S Exchange. Relationship labels: Context For, Built With, Register, Enable one Partner Role, Enable other Partner Roles. Views: Ops View, Service View.]
14 May 2001: ebXML spec approved
Page 52
The Semantic Web
 Extension of today’s WWW
 Adds meaning (semantics) through:
– Rich metadata (ontologies express
relationships)
– Logic (supports inference rules)
 Improved query and search
 Semantic mediation (transforms info)
 Supports Agents (fuselets)
http://www.scientificamerican.com/2001/0501issue/0501berners-lee.html
Page 53
What’s RDF?
 A W3C model for representing metadata
 Metadata represented as ordered triple
– (Subject, Verb, Object), e.g.
– Or Node, Property, Value
– Form is (URI, URI, URI or literal) but stated in XML
– Provides graph-like representations
– Thus an agent could move along the graph gobbling up
metadata
<#drbob> <#owns> <#mustang>
<#mustang> <#type_of> <#automobile>
Note: # is URI reference
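The triple model is simple enough to sketch with plain Python tuples. The two triples are the ones shown above; the helper names objects and reachable are assumptions, and the traversal stands in for the "agent moving along the graph" rather than for any RDF library.

```python
# Each statement is a (subject, property, value) triple.
TRIPLES = [
    ("#drbob", "#owns", "#mustang"),
    ("#mustang", "#type_of", "#automobile"),
]


def objects(subject, predicate, triples=TRIPLES):
    """All values recorded for one subject/property pair."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def reachable(start, triples=TRIPLES):
    """Every node an agent can reach from start by following triples forward."""
    seen, frontier = set(), [start]
    while frontier:
        node = frontier.pop()
        for s, _p, o in triples:
            if s == node and o not in seen:
                seen.add(o)
                frontier.append(o)
    return seen


print(objects("#drbob", "#owns"))
print(reachable("#drbob"))
```

Starting from #drbob, the walk reaches #mustang and then #automobile, so an agent can learn that Dr. Bob owns a kind of automobile even though no single triple says so.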
Page 54
Ontologies
[Figure: ontology graph. Nodes: Dr. Bob, Mustang, Automobile, consumer, Ford, Graspic DS1, Tire, Sam’s Club, Some URL. Edge labels: owns, type of, is a, made by, part of, sold by, member of, lists sales at.]
Page 55
The Semantic Web
Page 56
Semantic Web Technologies
 XML Schemas
 XML Query
 Topic Maps
 Resource Description Framework (RDF)
 DARPA Agent Markup Language (DAML)
 Ontologies
Page 57
Tim Berners-Lee Model
[Figure: Semantic Web layer stack, bottom to top: Unicode and URI; XML, Namespaces, and XML Schema; RDF and RDF Schema; Ontology; Logic; Proof; Trust. Digital Signature runs alongside the layers. Presented at XML2K (Dec 2000).]
Page 58
Building the Semantic Web through Web Services
 Transactions initiated
automatically by a program,
not necessarily using a
browser
 Can be described, published,
discovered, and invoked
dynamically in a distributed
computing environment
 New ways of using the web:
intelligent agents,
marketplaces, auctions
All built on XML and other
internet standards!
TCP/IP - SSL
HTTP/HTTPS
Web Servers
Servlets/COM
XML
SOAP
WSDL
UDDI
Brokering
Workflow

More Related Content

Viewers also liked

Amplop idul fitri
Amplop idul fitriAmplop idul fitri
Amplop idul fitri
Solo Timur
 
Ppt kuliner khas kota solo
Ppt kuliner khas kota soloPpt kuliner khas kota solo
Ppt kuliner khas kota solorivara
 
Ppt kuliner khas kota solo
Ppt kuliner khas kota soloPpt kuliner khas kota solo
Ppt kuliner khas kota solo
rivara
 
7 stars
7 stars7 stars
7 stars
RKittu
 
Grupos vulnerables web
Grupos vulnerables webGrupos vulnerables web
Grupos vulnerables web
RDamian199512345
 
Sistem Sirkulasi
Sistem SirkulasiSistem Sirkulasi
Sistem Sirkulasi
anisayunmima
 
Ppt kuliner khas kota solo
Ppt kuliner khas kota soloPpt kuliner khas kota solo
Ppt kuliner khas kota solorivara
 
Keseimbangan 4 sektor
Keseimbangan 4 sektorKeseimbangan 4 sektor
Keseimbangan 4 sektor
Sudirman Jie
 
Ppt kuliner khas kota solo
Ppt kuliner khas kota soloPpt kuliner khas kota solo
Ppt kuliner khas kota solorivara
 
Hotel Management_MiniProject
Hotel Management_MiniProjectHotel Management_MiniProject
Hotel Management_MiniProject
sudhakar mandal
 

Viewers also liked (10)

Amplop idul fitri
Amplop idul fitriAmplop idul fitri
Amplop idul fitri
 
Ppt kuliner khas kota solo
Ppt kuliner khas kota soloPpt kuliner khas kota solo
Ppt kuliner khas kota solo
 
Ppt kuliner khas kota solo
Ppt kuliner khas kota soloPpt kuliner khas kota solo
Ppt kuliner khas kota solo
 
7 stars
7 stars7 stars
7 stars
 
Grupos vulnerables web
Grupos vulnerables webGrupos vulnerables web
Grupos vulnerables web
 
Sistem Sirkulasi
Sistem SirkulasiSistem Sirkulasi
Sistem Sirkulasi
 
Ppt kuliner khas kota solo
Ppt kuliner khas kota soloPpt kuliner khas kota solo
Ppt kuliner khas kota solo
 
Keseimbangan 4 sektor
Keseimbangan 4 sektorKeseimbangan 4 sektor
Keseimbangan 4 sektor
 
Ppt kuliner khas kota solo
Ppt kuliner khas kota soloPpt kuliner khas kota solo
Ppt kuliner khas kota solo
 
Hotel Management_MiniProject
Hotel Management_MiniProjectHotel Management_MiniProject
Hotel Management_MiniProject
 

Similar to Intro toxml

SC4 Workshop 2 : Pieter Colpaert - Maximizing the reuse of open transport data
SC4 Workshop 2 : Pieter Colpaert - Maximizing the reuse of open transport dataSC4 Workshop 2 : Pieter Colpaert - Maximizing the reuse of open transport data
SC4 Workshop 2 : Pieter Colpaert - Maximizing the reuse of open transport data
BigData_Europe
 
Cs 891 2rev B
Cs 891 2rev BCs 891 2rev B
Cs 891 2rev B
Carlton Northern
 
A Semantic Data Model for Web Applications
A Semantic Data Model for Web ApplicationsA Semantic Data Model for Web Applications
A Semantic Data Model for Web Applications
Armin Haller
 
Basics of Open Data: what you need to know by Wouter Degadt & Pieter Colpaert
Basics of Open Data: what you need to know by Wouter Degadt & Pieter ColpaertBasics of Open Data: what you need to know by Wouter Degadt & Pieter Colpaert
Basics of Open Data: what you need to know by Wouter Degadt & Pieter Colpaert
Opening-up.eu
 
Sem web tutorial general
Sem web tutorial generalSem web tutorial general
Sem web tutorial general
Swapnil & Patil
 
Cloud Computingfor Librarian To Librarian Networking Summit
Cloud Computingfor Librarian To Librarian Networking SummitCloud Computingfor Librarian To Librarian Networking Summit
Cloud Computingfor Librarian To Librarian Networking Summit
Lynn McCormick
 
COMP303-Lecture-01_1539277777777777.pptx
COMP303-Lecture-01_1539277777777777.pptxCOMP303-Lecture-01_1539277777777777.pptx
COMP303-Lecture-01_1539277777777777.pptx
AqeelaTahir3
 
D B M S Animate
D B M S AnimateD B M S Animate
D B M S Animate
Indu George
 
Build Your Own World Class Directory Search From Alpha to Omega
Build Your Own World Class Directory Search From Alpha to OmegaBuild Your Own World Class Directory Search From Alpha to Omega
Build Your Own World Class Directory Search From Alpha to Omega
Ravi Mynampaty
 
Ms access 2007 pptx
Ms access 2007 pptxMs access 2007 pptx
Ms access 2007 pptx
Abenezer Abiti
 
Tahira I.T
Tahira I.TTahira I.T
Tahira I.T
Tahira Sultana
 
MicrosoftAccessHandout.doc
MicrosoftAccessHandout.docMicrosoftAccessHandout.doc
MicrosoftAccessHandout.doc
Com2K22Class
 
Week 6 - Discussion ForumRequired ResourcesTextSharpe, N. .docx
Week 6 - Discussion ForumRequired ResourcesTextSharpe, N. .docxWeek 6 - Discussion ForumRequired ResourcesTextSharpe, N. .docx
Week 6 - Discussion ForumRequired ResourcesTextSharpe, N. .docx
helzerpatrina
 
BarCamp Sd Microformats
BarCamp Sd MicroformatsBarCamp Sd Microformats
BarCamp Sd Microformats
Joshua Brewer
 
Basics of XML
Basics of XMLBasics of XML
Basics of XML
indiangarg
 
Lecture 1.pptx
Lecture 1.pptxLecture 1.pptx
Lecture 1.pptx
ArslanButt52
 
LITERATURE SURVEY ON BIG DATA AND PRESERVING PRIVACY FOR THE BIG DATA IN CLOUD
LITERATURE SURVEY ON BIG DATA AND PRESERVING PRIVACY FOR THE BIG DATA IN CLOUDLITERATURE SURVEY ON BIG DATA AND PRESERVING PRIVACY FOR THE BIG DATA IN CLOUD
LITERATURE SURVEY ON BIG DATA AND PRESERVING PRIVACY FOR THE BIG DATA IN CLOUD
International Journal of Technical Research & Application
 
Lecture 1 database system notes full.pptx
Lecture 1 database system notes full.pptxLecture 1 database system notes full.pptx
Lecture 1 database system notes full.pptx
salutiontechnology
 
DITA's New Thang: Going Mapless!
DITA's New Thang: Going Mapless!DITA's New Thang: Going Mapless!
DITA's New Thang: Going Mapless!
dclsocialmedia
 
Semantic framework for web scraping.
Semantic framework for web scraping.Semantic framework for web scraping.
Semantic framework for web scraping.
Shyjal Raazi
 

Similar to Intro toxml (20)

SC4 Workshop 2 : Pieter Colpaert - Maximizing the reuse of open transport data
SC4 Workshop 2 : Pieter Colpaert - Maximizing the reuse of open transport dataSC4 Workshop 2 : Pieter Colpaert - Maximizing the reuse of open transport data
SC4 Workshop 2 : Pieter Colpaert - Maximizing the reuse of open transport data
 
Cs 891 2rev B
Cs 891 2rev BCs 891 2rev B
Cs 891 2rev B
 
A Semantic Data Model for Web Applications
A Semantic Data Model for Web ApplicationsA Semantic Data Model for Web Applications
A Semantic Data Model for Web Applications
 
Basics of Open Data: what you need to know by Wouter Degadt & Pieter Colpaert
Basics of Open Data: what you need to know by Wouter Degadt & Pieter ColpaertBasics of Open Data: what you need to know by Wouter Degadt & Pieter Colpaert
Basics of Open Data: what you need to know by Wouter Degadt & Pieter Colpaert
 
Sem web tutorial general
Sem web tutorial generalSem web tutorial general
Sem web tutorial general
 
Cloud Computingfor Librarian To Librarian Networking Summit
Cloud Computingfor Librarian To Librarian Networking SummitCloud Computingfor Librarian To Librarian Networking Summit
Cloud Computingfor Librarian To Librarian Networking Summit
 
COMP303-Lecture-01_1539277777777777.pptx
COMP303-Lecture-01_1539277777777777.pptxCOMP303-Lecture-01_1539277777777777.pptx
COMP303-Lecture-01_1539277777777777.pptx
 
D B M S Animate
D B M S AnimateD B M S Animate
D B M S Animate
 
Build Your Own World Class Directory Search From Alpha to Omega
Build Your Own World Class Directory Search From Alpha to OmegaBuild Your Own World Class Directory Search From Alpha to Omega
Build Your Own World Class Directory Search From Alpha to Omega
 
Ms access 2007 pptx
Ms access 2007 pptxMs access 2007 pptx
Ms access 2007 pptx
 
Tahira I.T
Tahira I.TTahira I.T
Tahira I.T
 
MicrosoftAccessHandout.doc
MicrosoftAccessHandout.docMicrosoftAccessHandout.doc
MicrosoftAccessHandout.doc
 
Week 6 - Discussion ForumRequired ResourcesTextSharpe, N. .docx
Week 6 - Discussion ForumRequired ResourcesTextSharpe, N. .docxWeek 6 - Discussion ForumRequired ResourcesTextSharpe, N. .docx
Week 6 - Discussion ForumRequired ResourcesTextSharpe, N. .docx
 
BarCamp Sd Microformats
BarCamp Sd MicroformatsBarCamp Sd Microformats
BarCamp Sd Microformats
 
Basics of XML
Basics of XMLBasics of XML
Basics of XML
 
Lecture 1.pptx
Lecture 1.pptxLecture 1.pptx
Lecture 1.pptx
 
LITERATURE SURVEY ON BIG DATA AND PRESERVING PRIVACY FOR THE BIG DATA IN CLOUD
LITERATURE SURVEY ON BIG DATA AND PRESERVING PRIVACY FOR THE BIG DATA IN CLOUDLITERATURE SURVEY ON BIG DATA AND PRESERVING PRIVACY FOR THE BIG DATA IN CLOUD
LITERATURE SURVEY ON BIG DATA AND PRESERVING PRIVACY FOR THE BIG DATA IN CLOUD
 
Lecture 1 database system notes full.pptx
Lecture 1 database system notes full.pptxLecture 1 database system notes full.pptx
Lecture 1 database system notes full.pptx
 
DITA's New Thang: Going Mapless!
DITA's New Thang: Going Mapless!DITA's New Thang: Going Mapless!
DITA's New Thang: Going Mapless!
 
Semantic framework for web scraping.
Semantic framework for web scraping.Semantic framework for web scraping.
Semantic framework for web scraping.
 

Intro toxml

  • 1. Page 1 <title>XML Overview</title> <subtitle>eXtensible Markup Language</subtitle> <presenter>Dan Hebert</presenter> <email>dhebert@mitre.org</email> <affiliation>The MITRE Corporation</affiliation> <date>26 Feb 2004</date>
  • 2. Page 2 Origins of the World Wide Web  1989 - Information Management: A Proposal – Tim Berners-Lee, CERN  “We should work toward a universal linked information system, in which generality and portability are more important than fancy graphics techniques and complex extra facilities”  Comments from Management: – ‘Exciting, but a little vague’ A “mesh” of computers REJECTED
  • 3. Page 3 World Wide Web Consortium (W3C)  Founded in 1994 to “lead the Web to its full potential” – Develop common protocols – Ensure interoperability – Promote WWW evolution  Co-hosted by MIT (U.S.), INRIA (France) and Keio (Japan) Universities  Supported by DARPA and the European Commission  Over 300 members including DISA, National Labs and MITRE
  • 4. Page 4 XML is all about data!  Every organization uses data! So, XML is a very foundational technology. DATA Microsoft Sun IBM DoD KMart Dell . . . … data makes the world go around ...
  • 5. Page 5 Family of XML Technologies XML Namespaces XSLT/XPath XML Schemas RDF XQuery SVG SAX/DOM SOAP, WSDL, UDDI Xlink/ XPointer RDDL MathML RSS
  • 6. Page 6 XML (and its Associated Technologies)  All about data: – structuring the data – accessing and manipulating the data Computer 1 Computer 2 data
  • 7. Page 7 Passing Data between Systems  Suppose that you’ve got book data that you want to pass between some systems “My Life and Times” Paul McCartney July 1998 94303-12021-43892 McMillin Publishing. “Illusions The Adventures of a Reluctant Messiah” Richard Bach 1977 0-440-34319-4 Dell Publishing Co.. “The First and Last Freedom” J. Krishnamurti 1954 0-06-064831-7 Harper & Row.
  • 8. Page 8 Passing Data between Systems  First thing you might do is agree on how you will structure your data: “My Life and Times”/Paul McCartney/July 1998/94303-12021-43892/McMillin Publishing. “Illusions The Adventures of a Reluctant Messiah”/Richard Bach/1977/0-440-34319-4/Dell Publishing Co.. “The First and Last Freedom”/J. Krishnamurti/1954/0-06-064831-7/Harper & Row. Title / Author / Date / ISBN / Publisher Here we are using a slash to delimit (separate) each field and a carriage return to delimit each record.
  • 9. Page 9 Alternatively <Book> <Title>My Life and Times</Title> <Author>Paul McCartney</Author> <Date>July, 1998</Date> <ISBN>94303-12021-43892</ISBN> <Publisher>McMillin Publishing</Publisher> </Book> <Book> <Title>Illusions The Adventures of a Reluctant Messiah</Title> <Author>Richard Bach</Author> <Date>1977</Date> <ISBN>0-440-34319-4</ISBN> <Publisher>Dell Publishing Co.</Publisher> </Book> <Book> <Title>The First and Last Freedom</Title> <Author>J. Krishnamurti</Author> <Date>1954</Date> <ISBN>0-06-064831-7</ISBN> <Publisher>Harper &amp; Row</Publisher> </Book> Here we are delimiting each data item with a start and end tag. We are enclosing each record also within a start-end tag.
  • 10. Page 10 Comparison  Slash-delimited – Advs: – Little space overhead – Disadv: – Rigid format (cannot shuffle the data around)  Tag-delimited (XML) – Advs: – Flexible format (can shuffle the data around) – Tags enhance ability to search for data – Tags enhance ability to extract subsets of data – Disadvs: – Verbose (XML Bloat)
  • 11. Page 11 Compressibility of XML  XML is very compressible! Original txt file: 674,062 bytes; translated to XML: 11,421,822 bytes. txt compressed with WinZip: 148,294 bytes; XML compressed with XMill (an XML compression tool from AT&T): 94,369 bytes. Alternative tool: B-Zip. The compressed version of the XML document is smaller than the compressed version of the original document!
  • 12. Page 12 Transferring Large XML Documents  Example: suppose that you have a 30MB XML file that you need transferred  Typical transfer rate: 1MB/sec  Total time = 30MB * sec/1MB = 30sec  Too long! What do we do?  We could compress it, then it would be a smaller file and thus would take less time  However, there are issues with compressing - time to compress/decompress, ensuring both sender and receiver have the tools, etc  There is an alternative ...
  • 13. Page 13 Transferring Large XML Documents (cont.)  The alternative is to do XML Streaming – send the XML declaration and root element to make the initial contact. – Then send the first chunk of XML. While the receiver is processing the first chunk the succeeding chunks can be sent in the background.  HTML and Jabber (XML-based Instant Messaging) do streaming
  • 14. Page 14 Summary of First Step  Thus, the first step in passing data between systems is to agree on how the data is going to be structured – Use slash-delimiters, or – Use tags, or – Use some other delimiter  Now each system can be written to expect the data that it receives to be in that structure. Likewise, when it sends out data it will send it out in that format.
  • 15. Page 15 What Next? Express Data Business Rules  We will need a syntax to express our data's business rules. Each Book must contain data for the Title, Author, Date, ISBN, and Publisher. The Date must have the format: year, or month-comma-year. The ISBN must have 10 digits, 3 dashes, and must end with 0-9 or x. etc. Data Business Rules Note: of course, we will want to express these constraints using a more formal syntax than English.
  • 16. Page 16 What next?  Now that the data is structured in an agreed upon fashion, what else do we typically want to do with the data? – We might want to have a tool which validates that the data is in the agreed upon format – Such a tool would help reduce system crashes by ensuring that the data is valid
  • 17. Page 17 Validation <Book> <Title>Illusions The Adventures of a Reluctant Messiah</Title> <Author>Richard Bach</Author> <Date>1977</Date> <ISBN>0-440-34319-4-ppp</ISBN> <Publisher>Dell Publishing Co.</Publisher> </Book> Validator Error!!! Invalid ISBN! Rules that indicate the valid structure of book data
  • 18. Page 18 What else?  You might want a tool which helps you to build your data documents – It would be very helpful if this tool could use the rules document, so that you don’t need to remember tag names and the order of the data
  • 19. Page 19 Creating and Editing <BOOKCATALOGUE> <BOOK> <TITLE>My Life and Times</TITLE> <AUTHOR>Paul McCartney</AUTHOR> <DATE>July, 1998</DATE> <ISBN>94303-12021-43892</ISBN> <PUBLISHER>McMillin Publishing</PUBLISHER> </BOOK> <BOOK> <TITLE>Illusions The Adventures of a Reluctant Messiah</TITLE> <AUTHOR>Richard Bach</AUTHOR> <DATE>1977</DATE> <ISBN>0-440-34319-4</ISBN> <PUBLISHER>Dell Publishing Co.</PUBLISHER> </BOOK> <BOOK> <TITLE>The First and Last Freedom</TITLE> <AUTHOR>J. Krishnamurti</AUTHOR> <DATE>1954</DATE> <ISBN>0-06-064831-7</ISBN> <PUBLISHER>Harper &amp; Row</PUBLISHER> </BOOK> </BOOKCATALOGUE> Rules that indicate the valid structure of book data
  • 20. Page 20 What else?  What else do you need to use the data? – A common Application Programming Interface (API) that allows the systems to programmatically access the data would be very beneficial – Such a common API would keep each system from duplicating effort Computer API Data
  • 21. Page 21 What else?  You might want to display the data, perhaps as an HTML (Web) page, or filter out sensitive data, or create a text version. – In general, you might want a tool which transforms the data from one form to another
  • 22. Page 22 XML to HTML XML Web page (HTML) Transformation Tool Transformation Instructions Raw data (nicely organized, as XML of course!) Data organized in tables, in lists, etc
  • 23. Page 23 XML to XML XML XML Transformation Tool Transformation Instructions Contains sensitive data Stripped of the sensitive data
  • 24. Page 24 Problem – migrating legacy systems  Problem: migrate a group of systems from an old, legacy data-format to the new, XML format.  Caveat: the migration strategy cannot force all systems to migrate in lock-step!
  • 25. Page 25 Transforming the Data Data Transformation Tool Transformation Instructions HTML, XML, Text
  • 26. Page 26 What else?  You might want to provide metadata for the data (i.e., data about the data) – When was the data created? By whom? How long is it valid?  Perhaps if your system is located at a Web site you may want to serve up the metadata document first, so that people/programs that interact with your Web site can first decide if the data is relevant before actually downloading the data
  • 27. Page 27 What else?  You may wish to provide a query tool so that the data can be queried DataQuery tool Query Results
  • 28. Page 28 What else?  You might wish to provide hyperlinking capabilities in your data, so that you can express the relationship between this data and other data.
  • 29. Page 29 Name Deconfliction Medical XML Vocabulary NIST (msrmt stds) XML Vocabulary Endoskeleton Nerve Body Spine Lymph Foot Mile Meter Inches Kilometer Foot XML document: <foot>…</foot> <foot>…</foot> Is this a human foot, or a measurement foot? If a machine processes this document, how will it be able to distinguish?
  • 30. Page 30 Summary  In a group of systems which pass around data, here are some things to consider: – Structure the data – Syntax to express data business rules – Validate the data – Create/edit the data – Provide a programmatic access API – Transform tool to display the data – A syntax to express metadata about the data – Query tool – Syntax to express relationships between documents – Name deconfliction
  • 31. Page 31 XML Technologies Syntax: XML; Data business rules: DTD/XML Schema; Validator: XML Parser; Editor: XML Editor; Programmatic API: XML DOM and SAX; Transformation tool: XSL; Metadata: RDF; Query: XQL and XML-QL; Linking: XLink and XPointer; Name deconfliction: Namespaces
  • 32. Page 32 eXtensible Stylesheet Language Transform (XSLT)  XML alone says nothing about how to present the data (what should it look like?)  XSLT is a flexible language to allow multiple presentations and transformations of a given XML representation – Defines some behavior for XML elements  XSLT is expressed in XML <?xml version="1.0"?> <xsl:stylesheet> <xsl:template match="air_tasking_order"> [action] </xsl:template> <xsl:template match="mission_data"> [action] </xsl:template> ... </xsl:stylesheet>
  • 33. Page 33 XML Query Language: XQuery  Provides declarative access to XML documents. – Resilient to changes in the underlying structure or schema.  Allows XML documents to be treated as database instances. – Information retrieved through interactive queries.  15 Feb 2001 – First working draft released  13 May 2001: Microsoft announces availability of XQuery prototype (msdn.microsoft.com/xml) FOR $b IN document("bib.xml")/book WHERE $b//name = "Dr. Bob Miller" AND $b//affil = "The MITRE Corporation" RETURN $b/title
  • 34. Page 34 Simple Object Access Protocol (SOAP)  Simple, easy to use XML-based protocol to let software components and applications communicate using standard Internet HTTP SOAP = HTTP + XML  Standard RPC (DCOM, CORBA) not easily adaptable to the Internet (e.g., blocked by Firewalls)  9 July 2001: W3C SOAP 1.2 Working Draft <SOAP-ENV:Envelope> <SOAP-ENV:Body> <m:GetStockPrice xmlns:m="http://www.stock.org"> <StockName>MITRE</StockName> </m:GetStockPrice> </SOAP-ENV:Body> </SOAP-ENV:Envelope> HTTP Post 28 March 2001: SOAP included in ebXML Messaging Spec.
  • 35. Page 35 Document Object Model (DOM)  Set of abstract (language neutral) class declarations for the tree representation of XML documents  Declares several node classes that are connected to make XML documents, such as: – Document – Element – Attribute – Text  Includes typical operations (methods), such as: – createElement (name) – createAttribute (name) – replaceChild (newChild, oldChild) XML Doc Computer Application DOM Implementation
  • 36. Page 36 Wireless Application Protocol  Defines Binary XML Content Format  Uses XML for – Data Exchange – User Interface via Wireless Markup Language (WML)  Managed by WAP Forum – Over 200 members representing over 90% of the global handset market – Active liaison with W3C and IETF Taken from Nokia’s WAP Web Site
  • 37. Page 37 Wireless Application Protocol (WAP)  Used for handheld devices (e.g., cell phone, Palm Pilot) – Makes minimal demands on air interface – Employs light weight protocol stack to minimize bandwidth  Communicate with a WAP gateway to the Internet – Works with most wireless networks – Micro-browser spec controls the user interface  1 August 2001: WAP Forum released WAP 2.0 – Now supports eXtensible HTML (XHTML) – Multimedia messaging services – Instant messaging – Voice, Images – WAP Push (e.g., for alerts)
  • 38. Page 38 Wireless Markup Language (WML)  Adheres to XML standards – Allows use of XML tools  WML documents organized into well defined units of user interaction – Units called “cards” – Suitable for limited display of handheld device – Telephony (computerized phone services) aware  Write once, use anywhere – WML documents can be used by any network or device that is WAP-compliant
  • 39. Page 39 XML Supports Multiple Interfaces Human Interfaces ABCS TBMCS AFATDS GCSS IBS Application Interfaces C4ISR Databases Data Interfaces Mobile Interfaces <air_operations_data> <day-time> 020200Z </day-time> <quantity> 6 </quantity> <country> US </country> <subject_type> FTR </subject_type> <aircraft_type> F15 </aircraft_type> <track_number> 401 </track_number> </air_operations_data> XML-MTF
  • 40. Page 40 The Tower of Babel Problem What’s a Namespace?  We need shared vocabularies and the means to specify relationships between vocabularies  For example, What should the <tank> tag denote? – A tracked vehicle with turret and cannon? – A container for aviation fuel?  Possible solutions: – Standardize all tags everywhere – Have COIs standardize tags for that community  XML namespaces provide the XML Document vocabulary  XML namespace defined by XML schema – More on this later…… www.ontology.org www.rosettanet.org
  • 41. Page 41 What is Metadata?  Metadata is data about data  Metadata adds value by supplying meaning (semantics) to data so that it is used as intended  Two types – Internal (about info object content) – External (about info object as a whole)  Metadata is exposed in various ways – Data definitions, schema, ontology – More is better
  • 42. Page 42 What’s a DTD?  Document Type Definition (DTD): – Supplies Metadata – Describe the structure of XML documents – Provide typing information of elements in those documents  Problems with DTDs: – DTDs not written in XML – variety of Typing information is limited  Need something that supports endless variety of Types for maximum flexibility in understanding the meaning of an XML document
  • 43. Page 43 What’s an XML Schema?  Schema is metadata about an XML document (information object)  Used to describe the structure and content of a given XML document type – What will an instance of an XML document contain? (e.g., a purchase order, a phone book record, a target report, etc.) – Elements (of data) – Type (of data) – Structure (of XML document)  Extensible typing -- Users can define their own types – Allows rich semantics (metadata)  Specification against which XML can be validated. May 2, 2001: XML Schema specification released as a W3C recommendation
  • 44. Page 44 XML Schema Example <?xml version="1.0"?> <xsd:schema xmlns:xsd="http://www.w3.org/2000/10/XMLSchema" targetNamespace="http://www.publishing.org" xmlns="http://www.publishing.org"... <xsd:element name="Author"> <xsd:complexType> <xsd:sequence> <xsd:element ref="Name" minOccurs="1" maxOccurs="1"/> <xsd:element ref="Affil" maxOccurs="1"/> <xsd:element ref="Email" minOccurs="1"/> </xsd:sequence> </xsd:complexType> </xsd:element> <xsd:element name="Name" type="xsd:string"/> <xsd:element name="Affil" type="xsd:affilType"/> <xsd:element name="Email" type="xsd:emailType"/> </xsd:schema> Note that XML Schema is written in XML; i.e., an XML schema is an XML document
  • 45. Page 45 XML Schema Example <?xml version="1.0"?> <xsd:schema xmlns:xsd="http://www.w3.org/2000/10/XMLSchema" targetNamespace="http://www.publishing.org" xmlns="http://www.publishing.org"... <xsd:element name="Author"> <xsd:complexType> <xsd:sequence> <xsd:element ref="Name" minOccurs="1" maxOccurs="1"/> <xsd:element ref="Affil" maxOccurs="1"/> <xsd:element ref="Email" minOccurs="1"/> </xsd:sequence> </xsd:complexType> </xsd:element> <xsd:element name="Name" type="xsd:string"/> <xsd:element name="Affil" type="xsd:affilType"/> <xsd:element name="Email" type="xsd:emailType"/> </xsd:schema> Definition of Element Structure Definition of Element Type XML Namespace Declaration
  • 46. Page 46 Using Schemas Composition and Validation (Message Preparation) Data entry XML Schema XML WYSIWYG Editor XML document Guidance XML Parser XML document Validation XML Schema XML Translation Rules Translator Parsing, Validation, & Translation (Message Processing) Sender Receiver User/Application
  • 47. Page 47 XML is not a Silver Bullet  Some XML standards are still in development  Some vendors offer differing implementations of the standard  Just having systems publish XML does not ensure interoperability – XML usage requires Communities of Interest (COI) to agree on XML tags to ensure consistent interpretation – Standardized schemas will also help But  XML provides a common language through which to organize, define, structure, and deal with your data and information - it forces you to actually MANAGE information  XML allows many desirable practices for data management and exchange to be applied more broadly, and at lower cost  XML is adding tools to our information management arsenal to attack some of the deeper problems  Web services based on HTTP and XML provide the “least common denominator” to integrate the wide variety of enterprise systems
  • 48. Page 48 Places to go for more information about XML  www.w3.org  www.xml.com  www.xml.org  www.ebxml.org  www.oasis-open.org/cover/xml.html  www.microsoft.com/xml  www.software.ibm.com/xml  www.oracle.com/xml  www.architag.com
  • 49. Page 49 XML and Bandwidth  Tag bandwidth is small compared to imagery, video, etc.  Ability to dynamically interact with XML content can reduce the granularity of information exchanges – e.g., XQuery returns only a portion of a document  XML can be significantly compressed – Commercial tools: – Pkzip (everyone’s favorite) – XMill -- AT&T smart compression tool – MITRE Knowledge Based Compression – Uses knowledge about message structure to direct compression/decompression – Defense Evaluation and Research Agency (DERA) UK – XML Compression Study
  • 50. Page 50 Electronic Business XML Initiative (ebXML)  Sponsored by UN/CEFACT + OASIS – UN/CEFACT = United Nations body for e-commerce (EDI) – OASIS = Org. for Advancement of Structured Info Stds (runs xml.org)  Developing an open XML- based infrastructure enabling the global use of electronic business information in an interoperable, secure and consistent manner by all parties – a set of open technical specifications that define an interoperable eBusiness framework – An 18-month project  Global, open participation process supporting small, medium and large enterprises  ebXML specifications approved at a meeting in Vienna, Austria on 11 May 2001. Tech specs available at www.ebxml.org
  • 51. Page 51 ebXML Architecture Messaging Services Information Package Service Interface Enterprise System Registry/ Repository Business Process Core Info & Process Information Objects Brokered I/S Exchange ContextFor BuiltWith Enterprise System Service Interface Enable one Partner Role Register Enable other Partner Roles Ops View Service View 14 May 2001 ebXML spec approved
  • 52. Page 52 The Semantic Web  Extension of today’s WWW  Adds meaning (semantics) through: – Rich metadata (ontologies express relationships) – Logic (supports inference rules)  Improved query and search  Semantic mediation (transforms info)  Supports Agents (fuselets) http://www.scientificamerican.com/2001/0501issue/0501berners-lee.html
  • 53. Page 53 What’s RDF?  A W3C model for representing metadata  Metadata represented as ordered triple – (Subject, Verb, Object), e.g. – Or Node, Property, Value – Form is (URI, URI, URI or literal) but stated in XML – Provides graph-like representations – Thus an agent could move along the graph gobbling up metadata <#drbob> <#owns> <#mustang> <#mustang> <#type_of> <#automobile> Note: # is URI reference
  • 54. Page 54 Ontologies Mustang Dr. Bob Automobile owns type of consumer is a Ford made by Graspic DS1 partof Tire type of Sam’s Club sold by member of Some URLlistssalesat
  • 56. Page 56 Semantic Web Technologies  XML Schemas  XML Query  Topic Maps  Resource Description Framework (RDF)  DARPA Agent Markup Language (DAML)  Ontologies
  • 57. Page 57 Tim Berners-Lee Model Unicode URI XML Namespace XML Schema RDF RDF Schema Ontology Logic Proof Trust Digital Signature Presented at XML2K (Dec 2000) Semantic Web
  • 58. Page 58 Building the Semantic Web through Web Services  Transactions initiated automatically by a program, not necessarily using a browser  Can be described, published, discovered, and invoked dynamically in a distributed computing environment  New ways of using the web: intelligent agents, marketplaces, auctions All built on XML and other internet standards! TCP/IP - SSL HTTP/HTTPS Web Servers Servlets/COM XML SOAP WSDL UDDI Brokering Workflow
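
The validation idea from the "Express Data Business Rules" and "Validation" slides can be sketched in a few lines of Python. This is only an illustrative reading of the rules stated in English on the slides, not the formal rule syntax (DTD or XML Schema) that the deck goes on to recommend. The function names, the `BookCatalogue` wrapper element, and the exact regular expressions are assumptions made for this sketch.

```python
import re
import xml.etree.ElementTree as ET

# Fields every Book must contain, per the business rules slide.
REQUIRED_FIELDS = ("Title", "Author", "Date", "ISBN", "Publisher")


def isbn_is_valid(isbn):
    """Assumed reading of the slide's rule: exactly 3 dashes, and 10
    remaining characters that are digits, except the last may be x."""
    if isbn.count("-") != 3:
        return False
    return re.fullmatch(r"\d{9}[\dxX]", isbn.replace("-", "")) is not None


def date_is_valid(date):
    """Accept 'year' or 'month-comma-year', e.g. '1977' or 'July, 1998'."""
    return re.fullmatch(r"(?:[A-Za-z]+, )?\d{4}", date) is not None


def validate_book(book):
    """Return a list of rule violations for one <Book> element."""
    errors = []
    for name in REQUIRED_FIELDS:
        if not (book.findtext(name) or "").strip():
            errors.append(f"Missing {name}")
    isbn = (book.findtext("ISBN") or "").strip()
    if isbn and not isbn_is_valid(isbn):
        errors.append("Invalid ISBN")
    date = (book.findtext("Date") or "").strip()
    if date and not date_is_valid(date):
        errors.append("Invalid Date")
    return errors


# Two records taken from the slides: one valid, one with the bad ISBN
# shown on the Validation slide.
CATALOGUE = """<BookCatalogue>
  <Book>
    <Title>The First and Last Freedom</Title>
    <Author>J. Krishnamurti</Author>
    <Date>1954</Date>
    <ISBN>0-06-064831-7</ISBN>
    <Publisher>Harper &amp; Row</Publisher>
  </Book>
  <Book>
    <Title>Illusions The Adventures of a Reluctant Messiah</Title>
    <Author>Richard Bach</Author>
    <Date>1977</Date>
    <ISBN>0-440-34319-4-ppp</ISBN>
    <Publisher>Dell Publishing Co.</Publisher>
  </Book>
</BookCatalogue>"""

for book in ET.fromstring(CATALOGUE).findall("Book"):
    print(book.findtext("Title"), validate_book(book))
```

Hand-written checks like these have to be duplicated in every system that touches the data, which is the motivation the slides give for expressing the rules once in a shared schema and letting a validating parser enforce them.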

Editor's Notes

  1. While working at CERN, Tim Berners-Lee’s vision of a World Wide Web where people and systems could exchange information was rejected. However, there was enough peripheral funding through other tasks to keep the effort going.
  2. … The World Wide Web Consortium does.
  3. XSLT has standard function calls to perform actions. Programming code (Java, C++, etc.) and scripts can be used through extensions. But this creates the problem of changing the XSLT code when you migrate to a different engine. One common XSLT engine is Xalan from Apache. It is the new name for Lotus XSLT and was given to Apache as open source code by IBM. Microsoft’s XSLT engine is included with MSXML.
  4. XQuery permits treating an XML document as a database. XQuery allows a user to navigate and search through an XML document for specified information.
  5. XML Protocol WG released the SOAP version 1.2 Working Draft on 9 July 2001. SOAP version 1.2 is a lightweight protocol for exchange of information in a decentralized, distributed environment. It is an XML based protocol that consists of four parts: an envelope that defines a framework for describing what is in a message and how to process it, a set of encoding rules for expressing instances of application-defined data types, a convention for representing remote procedure calls and responses, and a binding convention for exchanging messages using an underlying protocol. SOAP can potentially be used in combination with a variety of other protocols; however, the only bindings defined in the Working Draft document describe how to use SOAP in combination with HTTP and the experimental HTTP Extension Framework. 28 March 2001: ebXML announces the inclusion of SOAP in its messaging Specification.
  6. We need ways for computer programs to interrogate XML documents for information. In addition, we would want to have these programs be able to change the content and structure of an XML document (based on a description of what the form of the document should be – see XML schema). The Document Object Model (DOM) provides a programming interface to an XML document. The DOM specifies operations (methods) that can be done on the XML document by a computer program.
  7. WAP – Wireless Application Protocol – is an open standard for delivery and presentation of wireless information and telephony services on wireless devices (cellular phones, pagers, PDAs, terminals). WAP is managed by the WAP Forum, a consortium of over 200 wireless device manufacturers representing over 100 million subscribers worldwide. Through WAP servers and gateways, WAP supports handheld digital wireless devices such as mobile phones, pagers, two-way radios, smartphones and communicators -- from low-end to high-end. WAP is designed to work with most wireless networks such as CDPD, CDMA, GSM, PDC, PHS, TDMA, FLEX, ReFLEX, iDEN, TETRA, DECT, DataTAC, Mobitex. WAP is a communications protocol and application environment. It can be built on any operating system including PalmOS, EPOC, Windows CE, FLEXOS, OS/9, JavaOS etc. It provides service interoperability even between different device families. The WAP specification includes Wireless Markup Language (WML) based on XML that is designed to enable powerful information applications within the constraints of handheld devices and wireless terminals.
  8. The WAP specification provides all of the features needed to support efficient wireless information exchange, thus addressing the constraints of a wireless network. The WAP standard is both air interface and device independent. This means that applications developed for one standard can operate on a wide variety of devices that implement the WAP specification. WAP employs a lightweight protocol stack to minimize bandwidth requirements and at the same time optimizes standard WWW protocols such as HTTP. A microbrowser specification analogous to a web browser controls the user interface. The microbrowser spec has been designed for wireless handsets so the resulting code will be compact and efficient, yet provide a flexible and powerful user interface. Capabilities such as a push mechanism are provided to alert the subscriber to time sensitive information changes (e.g., stock market quotes).
  9. The WAP programming model closely follows the WWW development model. The specification employs the Wireless Markup Language or WML. WML is a tag-based document language specified as an XML document type. Thus, Web developers will find it easy to develop WAP applications since existing XML authoring tools as well as many HTML development environments can be used to develop WML applications. Using XSL, content written in XML can be automatically translated into content suitable for either HTML or WML. WML documents are divided into a set of well-defined units of interaction called “cards”. Services are created by letting users navigate back and forth between cards. The QWERTY keyboard or mouse is not assumed; the user interface supports use of the keypads found on handheld devices. WML allows the use of icons and bitmapped graphics for devices that support them. One application will work equally well on a phone with or without graphics by offering alternate text to the phone that is not capable of displaying images. These types of features support the “write once, use anywhere” paradigm.
  10. XML supports enterprise application integration by allowing diverse systems to interoperate through XML documents. This type of integration is called “loose coupling.” Loose coupling through “messaging” is typically easier to achieve than program based interfaces (tight coupling) in distributed, diverse environments. The World Wide Web and interactions between coalition partners are examples of environments where loose coupling provides benefits.
  11. XML exposes (some of) the meaning (semantics) behind information. This is called self describing information. However, there is room for confusion or misreading since many terms have different meanings depending on their community of use. Thus while tagging information is a step in the right direction, tagging alone doesn’t guarantee that the information will be used as intended by the process that produced it. We could try to obtain agreement on a standardized set of tags. Such attempts at standardization have not worked in the past. We could also attempt to standardize tagging conventions within communities of interest. For example, have the business community agree to a particular tag set to be used in business documents and processes. At the other extreme we could allow people and systems to use whatever tags they want provided they somehow explain what the tags mean or represent. Those “explanations” might be used to translate between documents based on different tag sets. Various business communities are working towards common vocabularies (tags and schema) and business practices. Ontology.org provides a common business library. RosettaNet is a consortium of major Information Technology, Electronic Components and Semiconductor Manufacturing companies working to create and implement industry-wide, open e-business process standards. These standards form a common e-business language, aligning processes between supply chain partners on a global basis.
  12. Metadata (or data about data) provides additional detail (definitions, structure) about data and information. Metadata assists in making the correct (intended) interpretation of data. Metadata helps ensure that information will be used properly (as intended). Metadata has often been implicit or embedded in procedures, standards documents, or conventions. A good example is a person’s postal address. We (generally) use 3 lines – name, street address, city/state/zip code – in that order. Consequently, an address of “First Street / Commerce Bank / Hampton, VA 23665” (one item per line) would be interpreted as an organization with the name of First Street located on a street called Commerce Bank in the city of Hampton, VA. This typically would be viewed as a mistake by a human. However, an automated process depending on programmed logic may add the organization First Street to its mailing list! On the other hand, there may actually be an organization with the name of First Street and there may be a Commerce Bank street in Hampton, VA. Additional metadata, such as a list of street names in Hampton, VA, could be consulted (via programmed logic) to address this issue.
  13. A DTD supplies typing and structural information about XML documents. By interrogating the DTD, a process extracts details about the content and structure of the XML document. For example, the process could determine that the XML document is not organized properly. The process may then send the document back to have errors corrected. Of particular value is typing information about elements. This allows processes to interpret the content of the XML document. Thus, a rich typing environment is to be preferred. However, the typing ability with DTD’s is rudimentary. We need something better than DTD’s to support automated processing.
  14. As with DTD’s, XML schema provides metadata about the content and organization of XML documents. XML schemas are written in XML and provide a rich typing facility. This is often described in programming lingo as “user defined types.”
  15. Note how the schema definition gives EXPLICIT details about the content and structure of the XML document. Included are (data) element definitions and typing information. The structure of the XML document is laid out in the schema.
  16. The definition of an XML schema leads to the definition of an XML NAMESPACE, which is referenced by a URI. The specified namespace also appears in the XML instance document. Thus the XML schema pertaining to the XML instance document can be retrieved for use in processing (understanding) the XML instance document.
  17. XML schema provides semantic information to understand and use the content of an XML instance document. The XML schema can be used to “direct” the generation of an XML document. The schema guides the word processor or WYSIWYG editor, prompting the human as to what information to supply in which portion of the document. This technique can also support the “automatic generation” of XML messages, say from a database. The generated document can be “validated” during (or after) the process against the schema. At the receiving end, the schema is used to both verify that the received message is valid and to guide the extraction of the contained information. Additional processing to manipulate the extracted information can be accomplished using a rules-based translator, such as an XSLT engine operating against a style sheet. This information can be presented to the user or to an application.
  18. Due to the diverse way that we communicate and build automated systems, XML does not solve all interoperability problems. But XML is a big step towards the goal of system interoperability. It forces users and developers to focus on the management of information. XML provides tools that can be used to enable diverse distributed systems to share information and interoperate.
  19. These are just a few of the web sites that provide information about XML.
  20. Adding tags to data to turn it into structured information involves some trade-offs. The same tags that add improved data processing and searching capabilities lead to increases in the size of the information object. However, compared to graphics, images, and video, the “size” of XML documents attributable to the tags is not large. The increased information capability more than makes up for the increased size. However, increased object size can be a problem in certain applications, such as with wireless devices. In these cases, increased size can be addressed by (1) a variety of compression techniques and (2) finer-grained information transactions enabled by precision tagging and special wireless protocols. Current research is investigating the best compression techniques to use depending on the application. XMill was developed at UPenn and AT&T, and it has a simple command interface. They have Win32 and Unix executables and their source is available. Here are some stats showing file sizes for some sample ATOs (XMI is the compressed XML file), given as MTF / XML / XMI: 11k / 166k / 7k; 19k / 213k / 11k; 17k / 297k / 5k; 101k / 1862k / 15k; 145k / 2645k / 22k. XMill is a new tool for compressing XML data efficiently. It is based on a regrouping strategy that leverages the effect of highly-efficient compression techniques in compressors such as gzip. XMill groups XML text strings with respect to their meaning and exploits similarities between those text strings for compression. Hence, XMill typically achieves much better compression rates than conventional compressors such as gzip. The Defense Evaluation and Research Agency (UK) has achieved significant compression of XML-MTF. A 134KB XML-MTF was compressed to 1.6KB.
  21. UN/CEFACT = UN Center for Trade Facilitation and Electronic Business.
  22. Ontologies establish a joint terminology between members of a community of interest. These members can be human or automated agents.
  23. Topic Maps: A collection of topics (resource that acts as a proxy for some subject), associations (relations between topics), or scopes (topic’s context). See www.topicmaps.org. XML Topic Maps (XTM): This specification provides a model and grammar for representing the structure of information resources used to define topics, and the associations (relationships) between topics. Names, resources, and relationships are said to be characteristics of abstract subjects, which are called topics. Topics have their characteristics within scopes: i.e. the limited contexts within which the names and resources are regarded as their name, resource, and relationship characteristics. One or more interrelated documents employing this grammar is called a “topic map.” TopicMaps.Org is an independent consortium of parties developing the applicability of the topic map paradigm [ISO13250] to the World Wide Web by leveraging the XML family of specifications. This specification describes version 1.0 of XML Topic Maps (XTM) 1.0 [XTM], an abstract model and XML grammar for interchanging Web-based topic maps, written by the members of the TopicMaps.Org Authoring Group. More information on XTM and TopicMaps.Org is available at http://www.topicmaps.org/about.html.
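
The DOM operations named in the slides and in the DOM note above (createElement, createAttribute, replaceChild) can be tried out with Python's standard `xml.dom.minidom` module, which is one DOM implementation among many. The following is a minimal sketch, not a recommended design: the sample record is adapted from the book data in the slides, and the `format` attribute is a hypothetical addition used only to show `createAttribute`.

```python
from xml.dom import minidom

# Parse a small document into a DOM tree (Document -> Element -> Text nodes).
doc = minidom.parseString(
    "<Book><Title>My Life and Times</Title><Date>1998</Date></Book>"
)
book = doc.documentElement

# createElement / createTextNode: build a replacement <Date> element.
new_date = doc.createElement("Date")
new_date.appendChild(doc.createTextNode("July, 1998"))

# createAttribute: "format" is a hypothetical attribute for illustration.
fmt = doc.createAttribute("format")
fmt.value = "month-comma-year"
new_date.setAttributeNode(fmt)

# replaceChild takes the new node first, then the node being replaced.
old_date = book.getElementsByTagName("Date")[0]
book.replaceChild(new_date, old_date)

print(book.toxml())
```

Because the DOM interfaces are language neutral, the same sequence of calls carries over with minor syntax changes to DOM implementations in other languages, which is the duplication-of-effort saving the "common API" slide describes.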