This document discusses XML::Pastor, a Perl module that generates Perl code from XML schemas so that XML data can be roundtripped to and from Perl objects without losing schema information. It abstracts away some of the pain of working with XML: the classes it generates from a schema can be used to create, parse, modify and validate XML documents programmatically. The document gives examples of how XML::Pastor lets you work with XML data in a more object-oriented way than manipulating the DOM directly through XML::LibXML. It also covers the module's limitations and compares it with other XML modules such as XML::Twig, XML::Compile, XML::Simple and XML::Smart.
7. XML is hard, right?
Some hard things:
• Roundtripping data
• Manipulating XML via DOM API
• Preserving element sibling order, comments, XML entities, etc.
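To make the roundtripping problem concrete, here is a small illustration in Python rather than Perl (chosen only because its standard library ships an XML parser; none of this is XML::Pastor). A naive parse followed by re-serialization silently drops the comment and rewrites the character reference `&#x26;` as `&amp;`. The `insert_comments` option assumes Python 3.8 or later.

```python
import xml.etree.ElementTree as ET

# A small document with a comment and a numeric character reference (&#x26; is "&").
source = '<order><!-- keep me --><item>Fish &#x26; Chips</item></order>'

# Default parser: the comment is discarded and the reference is decoded to "&".
plain = ET.tostring(ET.fromstring(source), encoding='unicode')

# Opting in to comments keeps them, but the reference still comes back as "&amp;".
parser = ET.XMLParser(target=ET.TreeBuilder(insert_comments=True))
kept = ET.tostring(ET.fromstring(source, parser=parser), encoding='unicode')

print(plain)  # <order><item>Fish &amp; Chips</item></order>
print(kept)   # <order><!-- keep me --><item>Fish &amp; Chips</item></order>
```

Neither output is byte-identical to the input, which is exactly the kind of loss that makes faithful roundtripping hard.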
9. XML::Pastor
• I didn’t write it
• Written by Ayhan Ulusoy
• Available on CPAN
• Abstracts away some of the pain of XML
10. What does it do?
• Generates Perl code from W3C XML Schema (XSD)
• Roundtrip and validate XML to/from Perl without loss of schema information
• Lets you program without caring about the XML structure
11. Parsing with Pastor
• Parse the entire XML document into an XML::LibXML DOM object
• Convert the DOM tree into native Perl objects
• Throw away the DOM; it is no longer needed
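The three parsing steps can be sketched in Python as follows. This is an analogy under stated assumptions, not Pastor's code: `Country` is a hypothetical hand-written class standing in for a generated one, and the element names and values are made up.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class Country:
    # Hypothetical stand-in for a class that a code generator would emit.
    code: str
    name: str
    population: int


def country_from_xml(text: str) -> Country:
    # Step 1: parse the whole document into an in-memory tree (the "DOM").
    root = ET.fromstring(text)
    # Step 2: copy the data out of the tree into a native object.
    obj = Country(
        code=root.get('code'),
        name=root.findtext('name'),
        population=int(root.findtext('population')),
    )
    # Step 3: the tree is no longer referenced once we return; only obj survives.
    return obj


country = country_from_xml(
    '<country code="XX"><name>Exampleland</name><population>1000</population></country>'
)
print(country)
```

Because the tree is discarded, later code only ever touches `Country` attributes and never the XML structure.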
12. Reasons not to use XML::Pastor
• When you have no XML Schema
  • Although several tools can infer XML schemata from documents
• It’s a code generator
• No stream parsing
13. XML::Pastor: Code Generation
• Write out static code to a tree of .pm files
• Write out static code to a single .pm file
• Create code in a scalar in memory
• Create code and eval() it for use
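The difference between these output modes is easiest to see with a toy generator. The sketch below is Python, not Pastor: `generate_class_source` is a hypothetical helper that emits source text for a class. The same string is then written to a module file (the "static code" modes), kept in memory (the "scalar" mode), or executed immediately (the "eval()" mode).

```python
import os
import tempfile


def generate_class_source(class_name, fields):
    """Hypothetical generator: emit Python source for a class with the given fields."""
    lines = [f'class {class_name}:']
    params = ', '.join(f'{name}=None' for name in fields)
    lines.append(f'    def __init__(self, {params}):')
    if not fields:
        lines.append('        pass')
    for name in fields:
        lines.append(f'        self.{name} = {name}')
    return '\n'.join(lines) + '\n'


# The generated code lives in memory as a plain string.
source = generate_class_source('City', ['name', 'population'])

# Static mode: write the code out to a module file for later import.
out_dir = tempfile.mkdtemp()
path = os.path.join(out_dir, 'city.py')
with open(path, 'w', encoding='utf-8') as fh:
    fh.write(source)

# Eval mode: execute the code now and use the class straight away.
namespace = {}
exec(source, namespace)
City = namespace['City']
city = City(name='Exampleville', population=42)
print(source)
```

Writing files suits an offline build step; executing in memory avoids a build step at the cost of regenerating on every run.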
15. How Pastor works: Code generation
• Parse schemata into a schema model
  • Perl data structures containing all the global elements, types, attributes, ...
• “Resolve” the model: determine class names, resolve references, etc.
• Create boilerplate code, then write it out or eval() it
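A minimal Python sketch of the first two steps, assuming a tiny schema with no target namespace: the XSD is read into plain dictionaries (the "model"), then a class name is assigned to every global element and type. The naming scheme (`prefix + 'Type.' + name`) and the `isa` key are hypothetical choices for illustration and are not necessarily what Pastor produces.

```python
import xml.etree.ElementTree as ET

XS = '{http://www.w3.org/2001/XMLSchema}'

XSD = '''<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="city" type="City"/>
  <xs:complexType name="City">
    <xs:sequence>
      <xs:element name="name" type="xs:string"/>
      <xs:element name="population" type="xs:int"/>
    </xs:sequence>
    <xs:attribute name="code" type="xs:string"/>
  </xs:complexType>
</xs:schema>'''


def parse_schema(text):
    """Step 1: turn the XSD into plain data structures (the schema model)."""
    root = ET.fromstring(text)
    model = {'elements': {}, 'types': {}}
    for el in root.findall(f'{XS}element'):  # global elements only
        model['elements'][el.get('name')] = el.get('type')
    for ct in root.findall(f'{XS}complexType'):
        model['types'][ct.get('name')] = {
            'children': [(e.get('name'), e.get('type')) for e in ct.iter(f'{XS}element')],
            'attributes': [a.get('name') for a in ct.iter(f'{XS}attribute')],
        }
    return model


def resolve(model, prefix):
    """Step 2: pick a class name per type and link each element to its type's class."""
    type_classes = {t: f'{prefix}Type.{t}' for t in model['types']}
    element_classes = {
        e: {'class': f'{prefix}{e}', 'isa': type_classes.get(t, t)}
        for e, t in model['elements'].items()
    }
    return type_classes, element_classes


model = parse_schema(XSD)
type_classes, element_classes = resolve(model, 'MyApp.Data.')
print(element_classes)
```

A real implementation also has to follow imports and includes and resolve namespaced references, which this sketch ignores.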
17. How Pastor works: Generated classes
• Each generated class (i.e., type) has class data “XmlSchemaType” containing the schema model
• If the class isa SimpleType, it may contain restriction facets
• If the class isa ComplexType, it will contain info about child elements and attributes
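The idea of schema information carried as class-level data can be sketched in Python. `XML_SCHEMA_TYPE` below is a hypothetical analogue of Pastor's `XmlSchemaType` class data, and `CountryCode` and `City` are made-up types. The simple type checks three common XSD facets; `re.fullmatch` is used because an XSD `pattern` must match the whole value.

```python
import re


class SimpleType:
    # Class-level schema info, loosely analogous to Pastor's "XmlSchemaType" class data.
    XML_SCHEMA_TYPE = {'facets': {}}

    def __init__(self, value):
        self.value = value

    def is_valid(self):
        """Check the value against the restriction facets declared on the class."""
        facets = self.XML_SCHEMA_TYPE['facets']
        v = self.value
        if 'maxLength' in facets and len(v) > facets['maxLength']:
            return False
        if 'pattern' in facets and not re.fullmatch(facets['pattern'], v):
            return False
        if 'enumeration' in facets and v not in facets['enumeration']:
            return False
        return True


class CountryCode(SimpleType):
    # Hypothetical restriction of xs:string: exactly two upper-case letters.
    XML_SCHEMA_TYPE = {'base': 'xs:string', 'facets': {'maxLength': 2, 'pattern': '[A-Z]{2}'}}


class City:
    # A complex type records its child elements and attributes instead of facets.
    XML_SCHEMA_TYPE = {
        'elements': [('name', 'xs:string'), ('population', 'xs:int')],
        'attributes': [('code', 'CountryCode')],
    }


print(CountryCode('XX').is_valid(), CountryCode('xx').is_valid())  # True False
```

Keeping the model on the class rather than on each instance means every object of a type shares one copy of its schema information.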
18. How Pastor works: In use
• If the classes were generated offline, “use” them; if they were generated online, they are already loaded
• These classes have methods to create objects and to load/save them from/to XML
• Manipulate/query data through an OO API to complexType fields
• Validate modified objects against the schema
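The load, modify, validate and save cycle can be sketched with a hand-written Python class. The method names `from_xml`, `to_xml` and `validate` are hypothetical, chosen to mirror the slide rather than copied from Pastor's API. `validate` is a simplified stand-in that checks only what this made-up `<city>` schema would require.

```python
import xml.etree.ElementTree as ET


class City:
    # Hypothetical bound class for a made-up <city> element.
    REQUIRED = ('name', 'population')

    def __init__(self, code=None, name=None, population=None):
        self.code = code
        self.name = name
        self.population = population

    @classmethod
    def from_xml(cls, text):
        """Load an object from an XML string."""
        root = ET.fromstring(text)
        pop = root.findtext('population')
        return cls(root.get('code'), root.findtext('name'),
                   int(pop) if pop is not None else None)

    def to_xml(self):
        """Save the object back to an XML string."""
        root = ET.Element('city')
        if self.code is not None:
            root.set('code', self.code)
        for field in self.REQUIRED:
            value = getattr(self, field)
            if value is not None:
                ET.SubElement(root, field).text = str(value)
        return ET.tostring(root, encoding='unicode')

    def validate(self):
        """Simplified schema check: required children present, population a non-negative int."""
        missing = [f for f in self.REQUIRED if getattr(self, f) is None]
        if missing:
            raise ValueError(f'missing required elements: {missing}')
        if not isinstance(self.population, int) or self.population < 0:
            raise ValueError('population must be a non-negative integer')
        return True


city = City.from_xml('<city code="XX"><name>Exampleville</name><population>10</population></city>')
city.population += 5          # manipulate the data as a plain attribute
city.validate()               # check the modified object before saving
saved = city.to_xml()
print(saved)
```

Validating before serializing means an invalid edit is caught in Perl (here, Python) code rather than by whoever consumes the XML later.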
38. XML::Pastor Scope
• Good for “data XML”
• Unsuitable for “mixed markup”
  • e.g. XHTML
• Unsuitable for “huge” documents
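Why mixed markup is a poor fit can be seen in a few lines of Python (an illustration of the XML itself, not of Pastor). A data binding maps child elements to fields, but in mixed content the running text is split around the child elements, and there is no natural field to hold it.

```python
import xml.etree.ElementTree as ET

# Mixed content: text and a child element interleaved inside <p>.
p = ET.fromstring('<p>Hello <b>big</b> world</p>')

# The sentence ends up scattered across p.text, the child's text and the child's tail.
pieces = [p.text, p[0].text, p[0].tail]
print(pieces)  # ['Hello ', 'big', ' world']
```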
39. XML::Pastor: Supported XML Schema Features
• Simple and Complex Types
• Global Elements
• Groups, Attributes, AttributeGroups
• Derive simpleTypes by extension
• Derive complexTypes by restriction
• W3C built-in Types, Unions, Lists
• (Most) Restriction Facets for Simple types
• External Schema import, include, redefine
40. XML::Pastor: Known limitations
• Mixed elements unsupported
• Substitution groups unsupported
• ‘any’ and ‘anyAttribute’ elements unsupported
• Encodings (only UTF-8 officially supported)
• Default values for attributes (help needed)
41. XML Data Binding
• Binding XML documents to objects specifically designed for the data in those documents
• Allows, e.g., data-centric applications to manipulate data more naturally than through the DOM API
47. XML::Twig
• Manipulates XML directly
• Code that uses it is closely coupled to the document structure
• Optimised for processing huge documents as trees, one chunk (twig) at a time
• No schemata, no validation
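The "huge documents as trees" approach (build a small tree for one chunk, process it, then discard it) has a close analogue in Python's `ET.iterparse`. The sketch below uses that analogue; it is not XML::Twig's API. The `<cities>` document is synthetic test data generated in the code.

```python
import io
import xml.etree.ElementTree as ET


def total_population(stream):
    """Sum <population> over every <city> without keeping the whole tree in memory."""
    total = 0
    context = ET.iterparse(stream, events=('start', 'end'))
    _, root = next(context)  # the first event is the start of the root element
    for event, elem in context:
        if event == 'end' and elem.tag == 'city':
            total += int(elem.findtext('population'))
            root.clear()  # drop processed <city> subtrees so memory stays bounded
    return total


# Synthetic document: 1000 cities with populations 0..999.
doc = '<cities>' + ''.join(
    f'<city><name>c{i}</name><population>{i}</population></city>' for i in range(1000)
) + '</cities>'
print(total_population(io.BytesIO(doc.encode('utf-8'))))  # 499500
```

Each `<city>` is a complete small tree when its end event fires, so the handler can use ordinary tree navigation, which is the same trade-off XML::Twig makes.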
48. XML::Compile
• Original design rationale was to deal with SOAP envelopes and WSDL documents
• Different approach but similar goals to Pastor: processes XML into Perl data structures based on an XSD
• More like XML::Simple with Schema support
49. XML::Compile pt. 2
• Schema support incomplete
• Shaky support for imports, includes
• Include restriction on targetNamespace
• I haven’t used it yet but it looks good
50. XML::Simple
• Working roundtrip binding for simple cases
  • e.g. XMLout(XMLin($file)) works
• Simple API
• Produces a single deep data structure
• Gotchas with element multiplicity
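The multiplicity gotcha is easy to reproduce with a naive XML-to-dict converter. The Python function below only imitates the spirit of `XMLin()`; it is not XML::Simple. A single `<item>` becomes a scalar, two become a list, so consuming code must handle both shapes. (XML::Simple's `ForceArray` option exists to avoid this.)

```python
import xml.etree.ElementTree as ET


def simple_in(elem):
    """Naive XML-to-dict conversion in the spirit of XMLin()."""
    children = list(elem)
    if not children and not elem.attrib:
        return elem.text
    result = dict(elem.attrib)
    for child in children:
        value = simple_in(child)
        if child.tag in result:
            # Second occurrence of a tag: silently switch from scalar to list.
            if not isinstance(result[child.tag], list):
                result[child.tag] = [result[child.tag]]
            result[child.tag].append(value)
        else:
            result[child.tag] = value
    return result


one = simple_in(ET.fromstring('<order><item>tea</item></order>'))
two = simple_in(ET.fromstring('<order><item>tea</item><item>milk</item></order>'))
print(one)  # {'item': 'tea'}
print(two)  # {'item': ['tea', 'milk']}
```

A schema-driven binder avoids this because `maxOccurs` in the schema, not the current document, decides whether a field is a list.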
51. XML::Simple pt. 2
• No schemata, no validation
• Can be teamed with a SAX parser
• More suitable for configuration files?
52. XML::Smart
• Similar implementation to XML::Pastor
• Uses tie() and lots of crac^H^H^H^Hmagic
• Gathers structure information from the XML instance rather than from a schema
• No code generation!
53. XML::Smart pt. 2
• No schemata, so no schema validation
• Based on Object::MultiType: objects overloaded as HASH, ARRAY, SCALAR, CODE and GLOB
• Like Pastor, overloads array/hashref access to the data, which promotes decoupling
• Reasonable docs, growing community
57. XML Schema Inference
• Create an XML schema from an XML document instance
• Every document has an (implicit) schema
• Tools like Relaxer and Trang, as well as System.Xml.Serializer in the .NET Framework, can all infer XML schemata from document instances
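A toy version of schema inference fits in a short Python function. This is a rough sketch of the idea, not how Trang or the other tools work. For each tag it records attributes, which child tags repeat (mapped to `maxOccurs="unbounded"`), and a guessed type for leaf elements. It cannot detect optional elements or richer types, which is why inferred schemas usually need hand editing.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def infer(root):
    """Infer a rough content model from one document instance."""
    model = {}
    for elem in root.iter():
        entry = model.setdefault(elem.tag, {'children': {}, 'attributes': set()})
        entry['attributes'].update(elem.attrib)
        # A child tag seen more than once under one parent gets maxOccurs="unbounded".
        for tag, n in Counter(child.tag for child in elem).items():
            repeated = n > 1 or entry['children'].get(tag) == 'unbounded'
            entry['children'][tag] = 'unbounded' if repeated else 1
        if len(elem) == 0:
            text = (elem.text or '').strip()
            guess = 'xs:integer' if text.lstrip('-').isdigit() else 'xs:string'
            # One non-numeric value anywhere is enough to demote the type to xs:string.
            if entry.get('type') != 'xs:string':
                entry['type'] = guess
    return model


doc = ET.fromstring(
    '<cities><city code="XX"><name>A</name><population>10</population></city>'
    '<city><name>B</name><population>20</population></city></cities>'
)
model = infer(doc)
print(model['cities']['children'], model['city']['children'], model['population']['type'])
```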