XML From The Ground Up
Xml pres 1
A presentation on XML from 2006! but still useful

Published in: Business
  • Restrictions for Datatypes
    enumeration: Defines a list of acceptable values
    fractionDigits: Specifies the maximum number of decimal places allowed. Must be equal to or greater than zero
    length: Specifies the exact number of characters or list items allowed. Must be equal to or greater than zero
    maxExclusive: Specifies the upper bound for numeric values (the value must be less than this value)
    maxInclusive: Specifies the upper bound for numeric values (the value must be less than or equal to this value)
    maxLength: Specifies the maximum number of characters or list items allowed. Must be equal to or greater than zero
    minExclusive: Specifies the lower bound for numeric values (the value must be greater than this value)
    minInclusive: Specifies the lower bound for numeric values (the value must be greater than or equal to this value)
    minLength: Specifies the minimum number of characters or list items allowed. Must be equal to or greater than zero
    pattern: Defines the exact sequence of characters that are acceptable
    totalDigits: Specifies the maximum number of digits allowed. Must be greater than zero
    whiteSpace: Specifies how white space (line feeds, tabs, spaces, and carriage returns) is handled
  • Order indicators are used to define how elements should occur.
    The <all> indicator specifies by default that the child elements can appear in any order and that each child element must occur once and only once.
    The <choice> indicator specifies that either one child element or another can occur.
    The <sequence> indicator specifies that the child elements must appear in a specific order.
    Occurrence indicators are used to define how often an element can occur.
    The maxOccurs indicator (an attribute) specifies the maximum number of times an element can occur.
    The minOccurs indicator (an attribute) specifies the minimum number of times an element can occur.
    Group indicators are used to define related sets of elements.
  • Xml pres 1

    1. 1. XML From The Ground Up
    2. 2. ?
        12345678901234567890123456789
        simpson   bart    springfield
        flintstonefred    bedrock
        rubble    barney  bedrock
    3. 3. Fixed Width Field
        12345678901234567890123456789
        simpson   bart    springfield
        flintstonefred    bedrock
        rubble    barney  bedrock
    4. 4. Fixed Width cont…
        simpson     bart    springfield
        flintstone  fred    bedrock
        rubble      barney  bedrock
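A minimal Python sketch of how a fixed-width record is split by position rather than by delimiter. The column widths (10, 8, 11) are an assumption inferred from the 29-character ruler and the run-together value "flintstonefred"; the field names surname, forename and town are hypothetical labels, not taken from the slides.

```python
# Assumed layout: (field name, width). Widths are inferred, names are hypothetical.
FIELDS = (("surname", 10), ("forename", 8), ("town", 11))

def parse_fixed(line):
    """Slice one fixed-width line into a dict, trimming the padding spaces."""
    record, pos = {}, 0
    for name, width in FIELDS:
        record[name] = line[pos:pos + width].strip()
        pos += width
    return record

lines = [
    "simpson   bart    springfield",
    "flintstonefred    bedrock",
    "rubble    barney  bedrock",
]
records = [parse_fixed(line) for line in lines]
for record in records:
    print(record)
```

The data only makes sense to a reader who already knows the widths, which is the weakness the slides are pointing at.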
    5. 5. ?
        1997,Ford,E350,"ac, abs, moon",3000.00
        1999,Chevy,"Venture ""Extended Edition""",,4900.00
        1996,Jeep,Grand Cherokee,"MUST SELL! air - moon roof loaded",4799.00
    6. 6. CSV
        1997,Ford,E350,"ac, abs, moon",3000.00
        1999,Chevy,"Venture ""Extended Edition""",,4900.00
        1996,Jeep,Grand Cherokee,"MUST SELL! air - moon roof loaded",4799.00
    7. 7. CSV cont…
        1997 | Ford  | E350                       | ac, abs, moon                     | 3000.00
        1999 | Chevy | Venture "Extended Edition" |                                   | 4900.00
        1996 | Jeep  | Grand Cherokee             | MUST SELL! air - moon roof loaded | 4799.00
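The quoting rules in the CSV sample (commas inside quoted fields, doubled quotes for a literal quote, an empty field between two commas) can be checked with Python's standard csv module. This is a sketch using the three rows from the slide; the function name parse_cars is hypothetical.

```python
import csv
import io

# The three sample rows from the slide, as raw CSV text.
RAW = '''1997,Ford,E350,"ac, abs, moon",3000.00
1999,Chevy,"Venture ""Extended Edition""",,4900.00
1996,Jeep,Grand Cherokee,"MUST SELL! air - moon roof loaded",4799.00
'''

def parse_cars(text):
    """Return each CSV record as a list of field strings."""
    return list(csv.reader(io.StringIO(text)))

rows = parse_cars(RAW)
for row in rows:
    print(row)
```

Every row comes back with five fields, but nothing in the file itself says what those fields mean.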
    8. 8. ?  01041cam 2200265 a 450000100200000000300040002000 50017000240080041000410100024000820200025001060200 04400131040001800175050002400193082001800217100003 20023524500870026724600360035425000120039026000370 04023000029004395000042004685200220005106500033007 30650001200763^###89048230#/AC/r91^DLC^19911106082 810.9^891101s1990####maua###j######000#0#eng##^##$ a###89048230#/AC/r91^##$a0316107514 :$c$12.95^##$a 0316107506 (pbk.) : $c$5.95 ($6.95 Can.)^##$aDLC$cD LC$dDLC^00$aGV943.25$b.B74 1990^00$a796.334/2$220^ 10$aBrenner, Richard J.,$d1941-^10$aMake the team. $pSoccer :$ba heads up guide to super soccer! /$cR ichard J. Brenner.^30$aHeads up guide to super soccer.^##$a1st ed.^##$aBoston : $bLittle, Brown,$cc19 90.^##$a127 p. :$bill. ;$c19 cm.^##$a"A Sports ill ustrated for kids book."^##$aInstructions for improving soccer skills. Discusses dribbling, heading, playmaking, defense, conditioning, mental attitud e, how to handle problems with coaches, parents, and other players, and the history of soccer.^#0$aS occer$vJuvenile literature.^#1$aSoccer.^
    9. 9. MARC  01041cam 2200265 a 450000100200000000300040002000 50017000240080041000410100024000820200025001060200 04400131040001800175050002400193082001800217100003 20023524500870026724600360035425000120039026000370 04023000029004395000042004685200220005106500033007 30650001200763^###89048230#/AC/r91^DLC^19911106082 810.9^891101s1990####maua###j######000#0#eng##^##$ a###89048230#/AC/r91^##$a0316107514 :$c$12.95^##$a 0316107506 (pbk.) : $c$5.95 ($6.95 Can.)^##$aDLC$cD LC$dDLC^00$aGV943.25$b.B74 1990^00$a796.334/2$220^ 10$aBrenner, Richard J.,$d1941-^10$aMake the team. $pSoccer :$ba heads up guide to super soccer! /$cR ichard J. Brenner.^30$aHeads up guide to super soccer.^##$a1st ed.^##$aBoston : $bLittle, Brown,$cc19 90.^##$a127 p. :$bill. ;$c19 cm.^##$a"A Sports ill ustrated for kids book."^##$aInstructions for improving soccer skills. Discusses dribbling, heading, playmaking, defense, conditioning, mental attitud e, how to handle problems with coaches, parents, and other players, and the history of soccer.^#0$aS occer$vJuvenile literature.^#1$aSoccer.^
    10. 10. MARC cont…
        Leader                 01041cam 2200265 a 4500
        Control No.      001   ###89048230
        Control No. ID   003   DLC
        DTLT             005   19911106082810.9
        Fixed Data       008   891101s1990 maua j 001 0 eng
        LCCN             010   ## $a ###89048230
        ISBN             020   ## $a 0316107514 : $c $12.95
        ISBN             020   ## $a 0316107506 (pbk.) : $c $5.95 ($6.95 Can.)
        Cat. Source      040   ## $a DLC $c DLC $d DLC
        LC Call No.      050   00 $a GV943.25 $b .B74 1990
        Dewey No.        082   00 $a 796.334/2 $2 20
        …
    11. 11. ? :p.Here's an example of some BASIC statements: :xmp. 10 PRINT USING 55 A, B, C 20 LET J = K + 2 30 IF J = X GO TO 80 :exmp. :pc.that will solve this problem. :fig place=inline width=page frame=box. AN INLINE, PAGE-WIDE FIGURE Because the contents of a figure format EXACTLY as entered, you can enter blanks on the line (before text) and the lines will print exactly the same as they were entered! :figcap. An Inline, Page-Wide Figure :figdesc. This is the first figure I have entered myself. :efig. :p.This paragraph follows the FIG end tag. Here we have another figure (inline and column wide): :fig place=inline width=column. Let's create another figure that is column wide, which will create a second item for a list of illustrations in a future exercise. :figcap. A Column-Wide Figure :efig.
    12. 12. GML :p.Here's an example of some BASIC statements: :xmp. 10 PRINT USING 55 A, B, C 20 LET J = K + 2 30 IF J = X GO TO 80 :exmp. :pc.that will solve this problem. :fig place=inline width=page frame=box. AN INLINE, PAGE-WIDE FIGURE Because the contents of a figure format EXACTLY as entered, you can enter blanks on the line (before text) and the lines will print exactly the same as they were entered! :figcap. An Inline, Page-Wide Figure :figdesc. This is the first figure I have entered myself. :efig. :p.This paragraph follows the FIG end tag. Here we have another figure (inline and column wide): :fig place=inline width=column. Let's create another figure that is column wide, which will create a second item for a list of illustrations in a future exercise. :figcap. A Column-Wide Figure :efig.
    13. 13. GML cont…
    14. 14. SGML <QUOTE TYPE="example"> typically something like <ITALICS>this</ITALICS> </QUOTE>
    15. 15. HTML
    16. 16. XML - 1
        <stats21>
          <ARN ref="E008026">
            <AttendantCircumstancesRecord>
              <PoliceForce>96</PoliceForce>
              <YearOfRecord>00</YearOfRecord>
              <MonthOfRecord>00</MonthOfRecord>
              <AccidentReferenceNumber>E008026</AccidentReferenceNumber>
              <AccidentSeverity>3</AccidentSeverity>
              <NumberOfVehicles>002</NumberOfVehicles>
              <NumberOfCasualties>001</NumberOfCasualties>
              …
            </AttendantCircumstancesRecord>
          </ARN>
        </stats21>
    17. 17. XML - 2
        Is for structuring data
        Is derived from SGML (as is HTML)
        Is text, but isn’t meant to be read
        Is verbose by design
    18. 18. Basic Syntax of XML
        All XML elements must have a closing tag
        Empty elements must close with /
        XML tags are case sensitive
        All XML elements must be properly nested
        All XML documents must have a root element
        Attribute values must always be quoted
        XML entities must be used for special characters
    19. 19. Special Characters in XML strings
        &  -  &amp;
        <  -  &lt;
        >  -  &gt;
        "  -  &quot;
        '  -  &apos;
    20. 20. Example of Special Characters
        Invalid XML
          <Organization>Logica & SE</Organization>
        Valid XML
          <Organization>Logica &amp; SE</Organization>
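The invalid and valid Organization examples can be checked with a short Python sketch. It uses the standard library's xml.sax.saxutils.escape to produce the entity and xml.etree.ElementTree to test well-formedness; the helper name is_well_formed is hypothetical.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

def is_well_formed(xml_text):
    """Return True if the parser accepts the text, False on a parse error."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

# A bare ampersand is rejected by the parser.
invalid = "<Organization>Logica & SE</Organization>"

# escape() replaces &, < and > with their entities before the text is embedded.
valid = "<Organization>" + escape("Logica & SE") + "</Organization>"

print(is_well_formed(invalid), is_well_formed(valid))
print(ET.fromstring(valid).text)
```

Note that the parser hands back the original text: the entity is a transport detail, not part of the data.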
    21. 21. XML Structure
        <?xml version="1.0" encoding="utf-8" standalone="no"?>    Prolog (optional)
        <?xml-stylesheet type="text/css" href="xmlstyle.css"?>   Processing Instruction (optional)
        <bookstore xml:lang="en-US" xmlns:def="Definitions">     Document Element (namespace/s)
          <book id="1">The Bible</book>                           Child node/s
          …
        </bookstore>                                              Closing tag of Document Element
    22. 22. XML Example
        <?xml version="1.0" encoding="UTF-8"?>
        <Recipe name="bread" prep_time="5 mins" cook_time="3 hours">
          <title>Basic bread</title>
          <ingredient amount="3" unit="cups">Flour</ingredient>
          <ingredient amount="0.25" unit="ounce">Yeast</ingredient>
          <ingredient amount="1.5" unit="cups" state="warm">Water</ingredient>
          <ingredient amount="1" unit="teaspoon">Salt</ingredient>
          <Instructions>
            <step>Mix all ingredients together, and knead thoroughly.</step>
            <step>Cover with a cloth, and leave for one hour in warm room.</step>
            <step>Knead again, place in a tin, and then bake in the oven.</step>
          </Instructions>
        </Recipe>
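    23. 23. [Figure: tree diagram of the Recipe document. Key: root, p-i, text, attribute, element. The Root holds the ?xml p-i and the Recipe element. Recipe carries the attributes name (bread), prep_time (5 mins) and cook_time (3 hours), and the child elements title (Basic …), ingredient (amount 3, text Flour) and Instructions, which holds three step elements (Mix…, Cover…, Knead…).]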
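The tree view of the Recipe document can be reproduced with a few lines of Python. This is a sketch over a shortened copy of the recipe, using the standard xml.etree.ElementTree module; the function name outline and the "@name" notation for attributes are hypothetical choices for display only.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the Recipe example (one ingredient, one step).
RECIPE = """<Recipe name="bread" prep_time="5 mins" cook_time="3 hours">
  <title>Basic bread</title>
  <ingredient amount="3" unit="cups">Flour</ingredient>
  <Instructions>
    <step>Mix all ingredients together, and knead thoroughly.</step>
  </Instructions>
</Recipe>"""

def outline(elem, depth=0):
    """Return one indented line per element: tag, attributes, then text."""
    text = (elem.text or "").strip()
    attrs = " ".join(f'@{key}="{value}"' for key, value in elem.attrib.items())
    parts = [part for part in (elem.tag, attrs, text) if part]
    lines = ["  " * depth + " ".join(parts)]
    for child in elem:  # recurse into child elements in document order
        lines.extend(outline(child, depth + 1))
    return lines

tree_lines = outline(ET.fromstring(RECIPE))
print("\n".join(tree_lines))
```

Each level of indentation corresponds to one parent-child edge in the tree diagram.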
    24. 24. Attributes vs Elements
        Data can be stored in child elements or in attributes.
        <person sex="female">
          <fname>Anna</fname>
          <lname>Smith</lname>
        </person>
        <person>
          <sex>female</sex>
          <fname>Anna</fname>
          <lname>Smith</lname>
        </person>
    25. 25. Namespaces
        Disambiguation mechanism
        <x xmlns:edi='http://ecommerce.org/schema'>
          <!-- the "edi" prefix is bound to http://ecommerce.org/schema
               for the "x" element and contents -->
        </x>
        <x xmlns:edi='http://ecommerce.org/schema'>
          <!-- the 'price' element's namespace is http://ecommerce.org/schema -->
          <edi:price units='Euro'>32.18</edi:price>
        </x>
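A short Python sketch of what the prefix binding means to a parser, using the standard xml.etree.ElementTree module. The prefix-to-URI mapping NS is supplied by the caller; the prefix is only shorthand, and the element's real name combines the namespace URI with the local name.

```python
import xml.etree.ElementTree as ET

DOC = """<x xmlns:edi='http://ecommerce.org/schema'>
  <edi:price units='Euro'>32.18</edi:price>
</x>"""

# Prefix-to-URI mapping used only for lookups in this script.
NS = {"edi": "http://ecommerce.org/schema"}

root = ET.fromstring(DOC)
price = root.find("edi:price", NS)       # namespace-aware lookup
unqualified = root.find("price")         # no namespace given, so no match

print(price.tag)                         # expanded name: {URI}local-name
print(price.get("units"), price.text)
```

Two vocabularies can both define a price element without clashing, because the expanded names differ.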
    26. 26. XML Document Structure
    27. 27. Tree Representation
    28. 28. Tree
    29. 29. Pruning
    30. 30. Grafting
    31. 31. Hierarchy
    32. 32. Tree Traversal
    33. 33. Tree Models
    34. 34. Trees – Nested Set view
    35. 35. Take Home …
        XML is a syntax for marking up data
        Markup tags are not pre-defined
        Namespaces make identical tag names unique
        An XML instance document is made up of markup tags and text (data)
        XML documents are tree structures
    36. 36. XPath
        language for addressing part/s of an XML document
        designed to be used by XSLT
        models XML document as tree of nodes
        fully supports XML Namespaces
    37. 37. XPath & XML Document Structure
        <xml>                                          xml
          <table>                                      xml/table
            <rec id="1">                               xml/table/rec
              <numField>123</numField>                 xml/table/rec/numField
              <stringField>StringValue</stringField>   xml/table/rec/stringField
            </rec>
            <rec id="2">                               xml/table/rec/@id
              <numField>346</numField>
              <stringField>Text Value</stringField>
            </rec>
          </table>
        </xml>
        xml/table/rec[@id='2']
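The paths on this slide can be tried out in Python. This sketch uses xml.etree.ElementTree, which supports only a subset of XPath: paths are evaluated relative to the root element (so the slide's xml/table/rec becomes table/rec), and an attribute step such as @id cannot be selected directly, so the attribute is read with get() instead.

```python
import xml.etree.ElementTree as ET

DOC = """<xml><table>
<rec id="1"><numField>123</numField><stringField>StringValue</stringField></rec>
<rec id="2"><numField>346</numField><stringField>Text Value</stringField></rec>
</table></xml>"""

root = ET.fromstring(DOC)  # root is the <xml> element itself

# xml/table/rec: every rec element
recs = root.findall("table/rec")

# xml/table/rec/@id: read the attribute from each selected element
ids = [rec.get("id") for rec in recs]

# xml/table/rec/numField: text of every numField
num_fields = [elem.text for elem in root.findall("table/rec/numField")]

# xml/table/rec[@id='2']: predicate on an attribute value
rec2 = root.find("table/rec[@id='2']")

print(ids, num_fields, rec2.findtext("stringField"))
```

A path without a predicate returns a collection of nodes; the predicate narrows that collection to the nodes that satisfy it.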
    38. 38. XSL/XSLT
    39. 39. XSL/XSLT Example - Source
        <persons>
          <person username="MP123456">
            <name>John</name>
            <family_name>Smith</family_name>
          </person>
          <person username="PK123456">
            <name>Sally</name>
            <family_name>Jones</family_name>
          </person>
        </persons>
    40. 40. XSLT Stylesheet
        <?xml version="1.0"?>
        <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
          <xsl:template match="/">
            <transform>
              <xsl:apply-templates/>
            </transform>
          </xsl:template>
          <xsl:template match="person">
            <record>
              <username>
                <xsl:value-of select="@username"/>
              </username>
              <name>
                <xsl:value-of select="name"/>
              </name>
            </record>
          </xsl:template>
        </xsl:stylesheet>
    41. 41. Transformed Output
        <?xml version="1.0" encoding="UTF-8"?>
        <transform>
          <record>
            <username>MP123456</username>
            <name>John</name>
          </record>
          <record>
            <username>PK123456</username>
            <name>Sally</name>
          </record>
        </transform>
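Python's standard library has no XSLT processor, so the following sketch is not XSLT. It performs the same restructuring by hand with xml.etree.ElementTree, to show what the two templates do: one output record per person, with the username attribute promoted to an element and family_name dropped. The function name restructure is hypothetical.

```python
import xml.etree.ElementTree as ET

SOURCE = """<persons>
  <person username="MP123456"><name>John</name><family_name>Smith</family_name></person>
  <person username="PK123456"><name>Sally</name><family_name>Jones</family_name></person>
</persons>"""

def restructure(source_xml):
    """Build <transform> with one <record> per <person> in the source."""
    out = ET.Element("transform")                  # plays the role of match="/"
    for person in ET.fromstring(source_xml).iter("person"):
        record = ET.SubElement(out, "record")      # plays the role of match="person"
        ET.SubElement(record, "username").text = person.get("username")
        ET.SubElement(record, "name").text = person.findtext("name")
    return out

result = restructure(SOURCE)
print(ET.tostring(result, encoding="unicode"))
```

The stylesheet expresses the same mapping declaratively, which is why it can be reused on any document with the same structure.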
    42. 42. XSLT Functions current document element-available format-number function-available generate-id key system-property unparsed-entity-uri
    43. 43. XPath Functions boolean ceiling concat contains count false floor id lang last local-name name namespace-uri normalize-space not number position round starts-with string string-length substring substring-after substring-before sum translate true
    44. 44. XSL-FO Processor
    45. 45. Take Home …
        XPath to address data within XML
        XSLT to re-structure XML
        They operate on collections of nodes
        They work with any type of XML
    46. 46. XSLT_test.htm
    47. 47. XML Schema
        A pattern for XML documents
          Content
          Structure
          Constraints
    48. 48. XML Schema Defines …
        Content
          elements & attributes
        Structure
          parent-child relationships
          order of child elements
          number of child elements
        Constraints
          whether an element is empty or can include text
          data types for elements and attributes
          default/fixed values for elements & attributes
    49. 49. Example: Simple XML File
        <?xml version="1.0"?>
        <note>
          <to>Peter</to>
          <from>Clare</from>
          <heading>Reminder</heading>
          <body>Don't forget the pub this weekend!</body>
        </note>
    50. 50. Example: XML Schema
        <?xml version="1.0"?>
        <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
          <xs:element name="note">
            <xs:complexType>
              <xs:sequence>
                <xs:element name="to" type="xs:string"/>
                <xs:element name="from" type="xs:string"/>
                <xs:element name="heading" type="xs:string"/>
                <xs:element name="body" type="xs:string"/>
              </xs:sequence>
            </xs:complexType>
          </xs:element>
        </xs:schema>
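Python's standard library cannot validate against an XML Schema, so the sketch below is a hand-written check of the same constraints the note schema expresses, not real schema validation: a note root, the four children in sequence order, each occurring exactly once, and each a simple element with no children or attributes. The names EXPECTED_CHILDREN and matches_note_sequence are hypothetical.

```python
import xml.etree.ElementTree as ET

# Child element names in the order given by the schema's xs:sequence.
EXPECTED_CHILDREN = ["to", "from", "heading", "body"]

def matches_note_sequence(xml_text):
    """Hand-rolled check mirroring the note schema; not an XSD validator."""
    root = ET.fromstring(xml_text)
    if root.tag != "note":
        return False
    # xs:sequence with default minOccurs/maxOccurs of 1: exact order, once each.
    if [child.tag for child in root] != EXPECTED_CHILDREN:
        return False
    # Simple elements: text only, no child elements and no attributes.
    return all(len(child) == 0 and not child.attrib for child in root)

good = ("<note><to>Peter</to><from>Clare</from>"
        "<heading>Reminder</heading><body>Don't forget!</body></note>")
reordered = ("<note><from>Clare</from><to>Peter</to>"
             "<heading>Reminder</heading><body>Don't forget!</body></note>")

print(matches_note_sequence(good), matches_note_sequence(reordered))
```

A validating parser derives this kind of check from the schema automatically, so the rules live in one declarative file instead of in application code.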
    51. 51. Schema components [1] The <schema> element <?xml version="1.0"?> <xs:schema ….. ... ... </xs:schema>
    52. 52. Schema components [2]  Simple element  can contain only text. It cannot contain any other elements or attributes. <xs:element name="to" type="xs:string"/>
    53. 53. Schema components [3]  Attributes e.g. <xs:attribute name="lang" type="xs:string"/> <lastname lang="EN">Smith</lastname>
    54. 54. Schema components [4]
        Built-in data types, e.g.:
          xs:string
          xs:decimal
          xs:integer
          xs:boolean
          xs:date
          xs:time
    55. 55. Schema restrictions [restriction base]
        <xs:element name="age">
          <xs:simpleType>
            <xs:restriction base="xs:integer">
              <xs:minInclusive value="0"/>
              <xs:maxInclusive value="100"/>
            </xs:restriction>
          </xs:simpleType>
        </xs:element>
    56. 56. Schema restrictions [enumeration]
        <xs:element name="car">
          <xs:simpleType>
            <xs:restriction base="xs:string">
              <xs:enumeration value="Audi"/>
              <xs:enumeration value="Golf"/>
              <xs:enumeration value="BMW"/>
            </xs:restriction>
          </xs:simpleType>
        </xs:element>
    57. 57. Schema restrictions [pattern/regular expression]
        <xs:element name="letter">
          <xs:simpleType>
            <xs:restriction base="xs:string">
              <xs:pattern value="[a-z]"/>
            </xs:restriction>
          </xs:simpleType>
        </xs:element>
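    58. 58. Regular Expressions
        Wildcards on steroids
        ab|c{2}|de                                 “ab”; “cc”; “de”
        [A-Z]{1,4}                                 “ABDS”; “A”; “ZXS”
        19[7-9][0-9]|20[0-2][0-9]|2030             e.g. years in the range 1970 to 2030
        [A-Z]{1,2}[0-9R][0-9AZ]? [0-9][A-Z]{2}     Post Codes
        Note: [1970-2030] does not express a year range. It is a character class that matches a single character (0, 1, 2, 3, 7 or 9).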
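The patterns on this slide can be tested with Python's re module. Two assumptions apply: Python's regular expression dialect is not identical to the XML Schema one (these simple patterns behave the same in both), and an XML Schema pattern must match the whole value, which is why the sketch uses re.fullmatch. The helper name xsd_match and the YEAR_RANGE pattern are hypothetical; the latter is one way to express "years 1970 to 2030", since [1970-2030] is a character class that matches a single character.

```python
import re

def xsd_match(pattern, value):
    """Whole-value match, mirroring how a schema pattern facet is applied."""
    return re.fullmatch(pattern, value) is not None

# Alternation and repetition
print(xsd_match("ab|c{2}|de", "cc"))        # one of "ab", "cc", "de"
print(xsd_match("[A-Z]{1,4}", "ZXS"))       # one to four capital letters

# [1970-2030] is a character class: the characters 1, 9, 7, the range 0-2, 0, 3, 0
print(xsd_match("[1970-2030]", "7"))        # a single listed character
print(xsd_match("[1970-2030]", "1985"))     # a four-digit year does not match

# Assumed replacement that really restricts values to the years 1970..2030
YEAR_RANGE = "19[7-9][0-9]|20[0-2][0-9]|2030"
print(xsd_match(YEAR_RANGE, "1985"), xsd_match(YEAR_RANGE, "2031"))
```

Numeric ranges are usually easier to express with minInclusive and maxInclusive on a numeric type than with a pattern.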
    59. 59. Restrictions for Datatypes  enumeration  minExclusive  fractionDigits  minInclusive  length  minLength  maxExclusive  pattern  maxInclusive  totalDigits  maxLength  whiteSpace
    60. 60. Complex Element
        contains other elements and/or attributes. [4 kinds]
        1) empty elements
        2) elements that contain only other elements
        3) elements that contain only text
        4) elements that contain both other elements and text
    61. 61. Complex Element examples
        a) <product pid="1345"/>
        b) <employee>
             <firstname>John</firstname>
             <lastname>Smith</lastname>
           </employee>
        c) <food type="dessert">Ice cream</food>
    62. 62. Complex Element Definition
        <xs:element name="employee">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="firstname" type="xs:string"/>
              <xs:element name="lastname" type="xs:string"/>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
    63. 63. Complex Element Definition /2
        Reference to complex type
        <xs:element name="employee" type="personinfo"/>
        <xs:complexType name="personinfo">
          <xs:sequence>
            <xs:element name="firstname" type="xs:string"/>
            <xs:element name="lastname" type="xs:string"/>
          </xs:sequence>
        </xs:complexType>
    64. 64. Type Reuse  Several elements based on same type <xs:element name="employee" type="personinfo"/> <xs:element name="student" type="personinfo"/> <xs:element name="member" type="personinfo"/>
    65. 65. Type Extension
        <xs:complexType name="fullpersoninfo">
          <xs:complexContent>
            <xs:extension base="personinfo">
              <xs:sequence>
                <xs:element name="address" type="xs:string"/>
                <xs:element name="city" type="xs:string"/>
                <xs:element name="country" type="xs:string"/>
              </xs:sequence>
            </xs:extension>
          </xs:complexContent>
        </xs:complexType>
    66. 66. Indicators
        Seven types of indicator enable composition
        Order indicators: All, Choice, Sequence
        Occurrence indicators: maxOccurs, minOccurs
        Group indicators: Group name, attributeGroup name
    67. 67. <any>   The <any> element enables us to extend the XML document with elements not specified by the schema. The <anyAttribute> element enables us to extend the XML document with attributes not specified by the schema.
    68. 68. Where’s the beef?
        XML Schema permits…
          Standard libraries of data specifications
          Formal specification of data models
          Automated validation of XML instance files based on XML Schema
          Simplified creation of structured documents
    69. 69. XML Schema QA    Automated using a QA XSLT GovTalk – Schema QA Stylesheet schemaQA_1.htm
    70. 70. Schema Libraries    Govtalk Ordnance Survey MasterMap Environmental Information Exchange
    71. 71. XML Toolkit
        Parsers (validating & non-validating)
          DOM (Document Object Model)
          SAX (Simple API for XML)
          Hybrid pull parsers
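The three parser styles can be compared side by side in Python, using only the standard library (all three are non-validating). This sketch lists the child element names of a small note document three ways: DOM with xml.dom.minidom (the whole tree in memory), SAX with xml.sax (callbacks pushed to a handler), and pull-style iteration with ElementTree's iterparse (the caller asks for the next event). The function and class names are hypothetical.

```python
import io
import xml.sax
import xml.etree.ElementTree as ET
from xml.dom import minidom

DOC = "<note><to>Peter</to><from>Clare</from><heading>Reminder</heading></note>"

def names_dom(text):
    """DOM: build the full tree, then navigate it."""
    dom = minidom.parseString(text)
    return [node.tagName for node in dom.documentElement.childNodes
            if node.nodeType == node.ELEMENT_NODE]

class NameCollector(xml.sax.ContentHandler):
    """SAX: the parser calls back on every start and end tag."""
    def __init__(self):
        super().__init__()
        self.names, self.depth = [], 0

    def startElement(self, name, attrs):
        if self.depth == 1:          # direct children of the root only
            self.names.append(name)
        self.depth += 1

    def endElement(self, name):
        self.depth -= 1

def names_sax(text):
    handler = NameCollector()
    xml.sax.parseString(text.encode("utf-8"), handler)
    return handler.names

def names_pull(text):
    """Pull style: the caller iterates over parse events as they arrive."""
    names, depth = [], 0
    stream = io.BytesIO(text.encode("utf-8"))
    for event, elem in ET.iterparse(stream, events=("start", "end")):
        if event == "start":
            if depth == 1:
                names.append(elem.tag)
            depth += 1
        else:
            depth -= 1
    return names

print(names_dom(DOC), names_sax(DOC), names_pull(DOC))
```

DOM is the most convenient for small documents; the streaming styles matter when files are too large to hold in memory.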
    72. 72. Schema & Validation
        Schemas provide the basis for automated validation of XML
        xmlValidation.dot
    73. 73. Schema & Document Creation
    74. 74. SAS XML Mapper
    75. 75. SAS XMLMap
        <?xml version="1.0" encoding="UTF-8" ?>
        <SXLEMAP >
          <TABLE name="docDscr_citation__titl">
            <TABLE-PATH syntax="XPath">/codeBook/docDscr/citation/titlStmt/titl</TABLE-PATH>
            <COLUMN name="docDscrcitationtitl ">
              <PATH syntax="XPath">/codeBook/docDscr/citation/titlStmt/titl</PATH>
              <TYPE>character</TYPE>
              <DATATYPE>string</DATATYPE>
              <LENGTH>950</LENGTH>
              <LABEL>Full authoritative title of the documentation (DC Title)</LABEL>
            </COLUMN>
          </TABLE>
        </SXLEMAP>
    76. 76. SAS XMLMap Manager Plugin
    77. 77. Benefits of the XML route
        Open Standards
        Vendor Neutral
        e-GIF/OSIAF compliant
        Very flexible – one source, many uses
    78. 78. Problems with the XML route
        XML files tend to be large
        DOM (Drudgery Object Model)
        Inter-record linking & validation across records is not trivial
        Many tools are not mature (but this situation is improving rapidly)
    79. 79. OK, What next…?
        Vocabularies
        Schemas
        Additional intra-record validation based on XSLT and XPath
        Publish
    80. 80. Vocabularies
        Domain experts identify data items and agree a vocabulary
        Arrange items into logical data groupings
    81. 81. XML Schemas
        Model the data items (UML?)
        Isolate common data definitions
        Prepare Schemas
        Disambiguate using namespaces
        Validate model
        QA Schemas for compliance with standards (automated)
    82. 82. Intra-record validation
        Options include…
          XSLT
          XPath
        (SE examples: Pupil Census; Road Accident Stats.)
    83. 83. Publication
        Add to Schema Library
          Govtalk
          Ordnance Survey MasterMap
          Environmental Information Exchange
        Example: BS7666
