Schematron (and other useful tools)
Stuart Myles
smyles@ap.org
An Aside: AP’s Ingestion Pipeline
  • One way we ingest content: we transform ATOM and XHTML into our internal XML (APPL) and NITF
  • ATOM + XHTML -> XSLT Transform -> APPL + NITF
  • This is greatly simplified, obviously.
Converting from HTML to XML
  <p>The budget was just £100.</p>
  <p>How could it be done for so little money?
  <p>Luckily open source tools were available.</p>
  These are not new problems.</p>
  The solutions were even standardized.<p/>
Hard to enforce rules in the spec
  • “HeadLine - this element must contain the same value as the entry’s <title> element”
  • “summary is required for non-text content items, such as news photos and video. This element is optional for text story content items.”
  • XML structure complies with XSD…
  • …but can fail in downstream systems
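
The two quoted rules could be written as Schematron asserts along these lines. This is only a sketch: the `ap` namespace URI, the `ap:HeadLine` location, and the `ap:Characteristics/@MediaType` marker for non-text items are assumptions made for illustration, not taken from the AP spec.

```xml
<sch:schema xmlns:sch="http://purl.oclc.org/dsdl/schematron" queryBinding="xslt">
  <sch:ns prefix="atom" uri="http://www.w3.org/2005/Atom"/>
  <!-- Hypothetical namespace URI and element names for the AP metadata -->
  <sch:ns prefix="ap" uri="http://example.org/ap-metadata"/>

  <!-- Rule 1: HeadLine must repeat the entry's title -->
  <sch:pattern id="headline-matches-title">
    <sch:rule context="atom:entry">
      <sch:assert test="normalize-space(.//ap:HeadLine) = normalize-space(atom:title)">
        HeadLine must contain the same value as the entry's title element
      </sch:assert>
    </sch:rule>
  </sch:pattern>

  <!-- Rule 2: summary is mandatory only when the item is not text -->
  <sch:pattern id="summary-for-non-text">
    <sch:rule context="atom:entry[.//ap:Characteristics/@MediaType != 'Text']">
      <sch:assert test="normalize-space(atom:summary) != ''">
        summary is required for non-text content items
      </sch:assert>
    </sch:rule>
  </sch:pattern>
</sch:schema>
```

The rules sit in separate patterns on purpose: within one pattern a node is only checked against the first rule whose context matches it, so two `atom:entry` rules in the same pattern would not both fire.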
Validate and Fix Prior to Ingestion
  • Original ATOM + XHTML
  • Tidy fixes sloppy HTML
  • Custom XSLT tidies up XML
  • W3C schema validates structure & syntax
  • Schematron schema validates business rules
  • Valid ATOM + XHTML, ready for ingestion
HTML Tidy
  • Fix sloppy HTML
  • HTML -> XHTML
Schematron
  • Fact checker for XML documents
  • Business rules that can’t be expressed in W3C XSD schema
  • MediaType="Video" Format="ANPA1312"
  • Previously, we had to inspect new feeds to catch errors
  • The risk is that feeds are approved but errors appear later
  • (Not to mention manual checking of XML is tedious)
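
A rule that catches the MediaType/Format mismatch above might look like the following sketch. It assumes the intended constraint is that the ANPA1312 format only makes sense for text items, and that both attributes live on a hypothetical `ap:Characteristics` element in a hypothetical namespace.

```xml
<sch:schema xmlns:sch="http://purl.oclc.org/dsdl/schematron" queryBinding="xslt">
  <!-- Hypothetical namespace URI and element name; only the attribute
       names and values come from the slide -->
  <sch:ns prefix="ap" uri="http://example.org/ap-metadata"/>

  <sch:pattern id="format-agrees-with-media-type">
    <sch:rule context="ap:Characteristics[@Format = 'ANPA1312']">
      <sch:assert test="@MediaType = 'Text'">
        Format "ANPA1312" is only expected on text items, but MediaType is
        "<sch:value-of select="@MediaType"/>"
      </sch:assert>
    </sch:rule>
  </sch:pattern>
</sch:schema>
```

Each attribute is individually legal under an XSD enumeration; it is the combination that is wrong, which is the kind of co-occurrence constraint XSD 1.0 cannot express.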
Schematron
  • Small, powerful, lightweight fact-checker for XML documents
  • Specify constraints using XPATH rules
  • You write the error messages
Schematron Schema
  • One time compile into an XSLT
  • Validation as an XSLT transform
Validate
  • Presence or absence of specific content
  • Relationships between elements and attributes
Reports
  • Validation reports
Anatomy of a Schematron Rule
  • Establish the context of the rule with an XPATH expression
  • XSLT-style test establishes the constraint for each assert
  • You write the error message to be used if the assert fails

    <sch:rule context="atom:feed/atom:link">
      <sch:assert test="starts-with(@href, 'http://')">
        The feed/link/@href must contain an http url
      </sch:assert>
    </sch:rule>
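
On its own the rule is a fragment; to be compiled it has to sit inside a pattern in a schema that declares the `atom` prefix. A minimal wrapper, assuming the ISO Schematron namespace (an older Schematron 1.5 schema would use a different namespace URI), could be:

```xml
<sch:schema xmlns:sch="http://purl.oclc.org/dsdl/schematron" queryBinding="xslt">
  <!-- Prefixes used in XPath expressions are declared with sch:ns,
       not with xmlns attributes -->
  <sch:ns prefix="atom" uri="http://www.w3.org/2005/Atom"/>

  <sch:pattern id="feed-links">
    <sch:rule context="atom:feed/atom:link">
      <sch:assert test="starts-with(@href, 'http://')">
        The feed/link/@href must contain an http url
      </sch:assert>
    </sch:rule>
  </sch:pattern>
</sch:schema>
```

The pattern id `feed-links` is an illustrative name. Note that the test as written would also flag `https://` links, so a feed served over TLS would need the assert loosened.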
DSDL – Pipeline Validation
  • XSD, RELAX NG: Grammar
  • Schematron: Rules
  • NVDL: Namespace dispatch
  • DTLL: Datatype
  • CREPDL: Character repertoire
  • DSRL: Document Semantic Renaming
  • Still under development
Declaratively specify a pipeline (using XML, naturally)
  • Similar in concept to Yahoo! Pipes, BizTalk
  • But XML specific and a W3C standard
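
Assuming the W3C pipeline language meant here is XProc, the XML stages of the validate-and-fix flow could be declared roughly as follows. The three file names are hypothetical, and the HTML Tidy stage is assumed to have run beforehand because XProc 1.0 has no required step for it.

```xml
<p:pipeline xmlns:p="http://www.w3.org/ns/xproc" version="1.0">
  <!-- Custom XSLT tidies up the XML (hypothetical stylesheet name) -->
  <p:xslt>
    <p:input port="stylesheet">
      <p:document href="tidy-up.xsl"/>
    </p:input>
  </p:xslt>

  <!-- W3C schema validates structure and syntax (hypothetical schema name) -->
  <p:validate-with-xml-schema>
    <p:input port="schema">
      <p:document href="feed.xsd"/>
    </p:input>
  </p:validate-with-xml-schema>

  <!-- Schematron schema validates business rules (hypothetical schema name) -->
  <p:validate-with-schematron>
    <p:input port="schema">
      <p:document href="business-rules.sch"/>
    </p:input>
  </p:validate-with-schematron>
</p:pipeline>
```

Each step reads the output of the previous one, and both validation steps fail the pipeline by default when the document is invalid, so a document that reaches the end is ready for ingestion.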
Thanks!
