
Schematron and Other Useful Tools

Presentation by Stuart Myles to IPTC about Schematron and Tidy.

Slide 2. Schematron (and other useful tools)
  Stuart Myles
  smyles@ap.org
Slide 4. An Aside: AP’s Ingestion Pipeline
  One way we ingest content: we transform ATOM and XHTML into our internal XML (APPL) and NITF.
  ATOM + XHTML → XSLT transform → APPL + NITF
  (This is greatly simplified, obviously.)
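
The real transform on this slide is XSLT, and APPL is an internal AP format, so neither is reproduced here. As a rough illustration of what "transform ATOM into NITF" means, the sketch below maps an Atom entry's `title` and `summary` onto a handful of NITF elements (`head/title`, `body.head/hedline/hl1`, `body.content/p`) using Python's standard ElementTree. The function name is hypothetical and the mapping is deliberately minimal; a production mapping carries far more metadata.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"

def atom_entry_to_nitf(entry_xml: str) -> str:
    """Map a few Atom entry fields onto NITF elements (heavily simplified)."""
    entry = ET.fromstring(entry_xml)
    title = entry.findtext(f"{ATOM}title", default="").strip()
    summary = entry.findtext(f"{ATOM}summary", default="").strip()

    nitf = ET.Element("nitf")
    head = ET.SubElement(nitf, "head")
    ET.SubElement(head, "title").text = title

    body = ET.SubElement(nitf, "body")
    body_head = ET.SubElement(body, "body.head")
    hedline = ET.SubElement(body_head, "hedline")
    ET.SubElement(hedline, "hl1").text = title  # the Atom title doubles as the headline

    body_content = ET.SubElement(body, "body.content")
    if summary:
        ET.SubElement(body_content, "p").text = summary
    return ET.tostring(nitf, encoding="unicode")
```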
Slide 5. Converting from HTML to XML
  <p>The budget was just £100.</p>
  <p>How could it be done for so little money?
  <p>Luckily open source tools were available.</p>
  These are not new problems.</p>
  The solutions were even standardized.<p/>
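
An XML parser will not repair markup like the sample above. Fed as-is, the fragment is rejected because it has more than one top-level element. Wrapped in a single root it does parse, but the unclosed second `<p>` makes the parser nest the third paragraph inside it, and the stray `</p>` then closes the second paragraph. A browser would instead end the open paragraph when the next `<p>` starts. The sketch below (function name hypothetical) shows both effects with Python's ElementTree.

```python
import xml.etree.ElementTree as ET

# The markup from the slide, wrapped in a <div> so it has a single root element.
SAMPLE = (
    "<div>"
    "<p>The budget was just \u00a3100.</p>"
    "<p>How could it be done for so little money?"
    "<p>Luckily open source tools were available.</p>"
    "These are not new problems.</p>"
    "The solutions were even standardized.<p/>"
    "</div>"
)

def paragraphs_inside_paragraphs(xml_text: str) -> int:
    """Count <p> elements that the XML parser places inside another <p>."""
    root = ET.fromstring(xml_text)
    # Build a child -> parent map, since ElementTree elements do not track parents.
    parent = {child: node for node in root.iter() for child in node}
    count = 0
    for p in root.iter("p"):
        ancestor = parent.get(p)
        while ancestor is not None:
            if ancestor.tag == "p":
                count += 1
                break
            ancestor = parent.get(ancestor)
    return count
```

Here `paragraphs_inside_paragraphs(SAMPLE)` finds one nested paragraph: the markup is well-formed once wrapped, yet its structure is not what the author meant.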
Slide 6. Hard to enforce rules in the spec
  “HeadLine - this element must contain the same value as the entry’s <title> element”
  “summary is required for non-text content items, such as news photos and video. This element is optional for text story content items.”
  XML structure complies with XSD…
  …but can fail in downstream systems
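
Both quoted rules depend on the values of other elements, which an XSD content model cannot express. A minimal sketch of checking them in Python, assuming a hypothetical extension namespace `urn:example:ap-extensions` holding `HeadLine` and `MediaType` elements, and assuming `Text` is the media type of text stories. The slides do not say where these fields live, so every name here is illustrative.

```python
import xml.etree.ElementTree as ET
from typing import List

NS = {
    "atom": "http://www.w3.org/2005/Atom",
    "ap": "urn:example:ap-extensions",  # hypothetical namespace for AP extension elements
}

def check_entry(entry_xml: str) -> List[str]:
    """Check two spec rules that an XSD cannot express; return error messages."""
    entry = ET.fromstring(entry_xml)
    errors = []

    # Rule 1: HeadLine (when present) must repeat the entry's <title>.
    title = entry.findtext("atom:title", default="", namespaces=NS).strip()
    headline = entry.findtext("ap:HeadLine", namespaces=NS)
    if headline is not None and headline.strip() != title:
        errors.append("HeadLine must contain the same value as the entry's <title>")

    # Rule 2: summary is required for non-text items (photos, video, ...).
    media_type = entry.findtext("ap:MediaType", default="Text", namespaces=NS).strip()
    if media_type != "Text" and entry.find("atom:summary", NS) is None:
        errors.append("summary is required for non-text content items")

    return errors
```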
Slide 8. Validate and Fix Prior to Ingestion
  Original ATOM + XHTML
  Tidy fixes sloppy HTML
  Custom XSLT tidies up XML
  W3C schema validates structure & syntax
  Schematron schema validates business rules
  Valid ATOM + XHTML, ready for ingestion
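
The stages on this slide run in a fixed order: fix first, then validate, stopping at the first stage that rejects the document. A minimal Python sketch of that control flow follows. The three stage functions are hypothetical stand-ins, not the real tools: the "fix" stage only replaces one HTML-only entity, the "structure" stage checks well-formedness rather than validating against an XSD, and the "rules" stage checks a single made-up business rule.

```python
import xml.etree.ElementTree as ET
from typing import Callable, List, Tuple

Stage = Tuple[str, Callable[[str], str]]

class StageFailed(Exception):
    """Raised when a pipeline stage rejects the document."""
    def __init__(self, stage: str, reason: str):
        super().__init__(f"{stage}: {reason}")
        self.stage = stage

def run_pipeline(doc: str, stages: List[Stage]) -> str:
    """Run each stage in order; stop at the first failure and name the stage."""
    for name, step in stages:
        try:
            doc = step(doc)
        except Exception as exc:
            raise StageFailed(name, str(exc)) from exc
    return doc

def fix_markup(doc: str) -> str:
    # Stand-in for Tidy + custom XSLT: replace an HTML entity XML does not predefine.
    return doc.replace("&pound;", "\u00a3")

def check_structure(doc: str) -> str:
    # Stand-in for W3C XSD validation: only checks that the XML is well-formed.
    ET.fromstring(doc)
    return doc

def check_business_rules(doc: str) -> str:
    # Stand-in for Schematron: an entry must carry a non-empty Atom title.
    root = ET.fromstring(doc)
    title = root.findtext("{http://www.w3.org/2005/Atom}title", default="")
    if not title.strip():
        raise ValueError("entry must have a non-empty title")
    return doc

STAGES: List[Stage] = [
    ("fix", fix_markup),
    ("structure", check_structure),
    ("rules", check_business_rules),
]
```

Reporting which stage failed matters in practice: a structural error points at the feed's markup, while a rules error points at its content.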
Slide 9. HTML Tidy
  Fix sloppy HTML
  HTML -> XHTML
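
HTML Tidy is a standalone tool, and the sketch below neither uses nor reproduces it. It is a toy built on Python's `html.parser` that shows the kind of repair involved in going from HTML to XHTML: decoding named entities such as `&pound;`, closing an open `<p>` when a new one starts, dropping end tags that match nothing, self-closing void elements, and closing whatever is still open at the end. The class and function names are hypothetical.

```python
from html import escape
from html.parser import HTMLParser
from typing import List

VOID_ELEMENTS = {"br", "hr", "img", "meta", "link", "input"}

class MiniTidy(HTMLParser):
    """Rebuild HTML as well-formed XHTML-style markup (a toy, not HTML Tidy)."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)  # entities like &pound; arrive decoded
        self.out: List[str] = []
        self.open_tags: List[str] = []

    def handle_starttag(self, tag, attrs):
        # In HTML a new <p> implicitly ends an open <p>; make that explicit.
        if tag == "p" and "p" in self.open_tags:
            self._close_through("p")
        attr_text = "".join(
            f' {name}="{escape(value or "", quote=True)}"' for name, value in attrs
        )
        if tag in VOID_ELEMENTS:
            self.out.append(f"<{tag}{attr_text}/>")
        else:
            self.out.append(f"<{tag}{attr_text}>")
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag in self.open_tags:
            self._close_through(tag)
        # An end tag with no matching open tag is simply dropped.

    def handle_data(self, data):
        self.out.append(escape(data, quote=False))  # re-escape &, <, > for XML

    def _close_through(self, tag):
        # Close every element opened after `tag`, then `tag` itself.
        while self.open_tags:
            current = self.open_tags.pop()
            self.out.append(f"</{current}>")
            if current == tag:
                break

def mini_tidy(html_text: str) -> str:
    parser = MiniTidy()
    parser.feed(html_text)
    parser.close()  # flush any buffered text
    while parser.open_tags:  # close anything still open at end of input
        parser.out.append(f"</{parser.open_tags.pop()}>")
    return "".join(parser.out)
```

For example, `mini_tidy("<p>Only &pound;100<p>Next")` returns `<p>Only £100</p><p>Next</p>`.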
Slide 10. Schematron
  Fact checker for XML documents
  Business rules that can’t be expressed in W3C XSD schema
    MediaType="Video"
    Format="ANPA1312"
  Previously, we had to inspect new feeds to catch errors
  The risk is that feeds are approved but errors appear later
  (Not to mention manual checking of XML is tedious)
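
The `MediaType="Video"` / `Format="ANPA1312"` pair is presumably an example of a co-occurrence constraint: each attribute is valid on its own, but ANPA 1312 is a text wire format, so the combination makes no sense. A sketch of that check in Python, assuming (hypothetically) that both attributes sit on the same element and that `Text` is the only media type allowed to use ANPA1312:

```python
import xml.etree.ElementTree as ET
from typing import List

# Assumption: these formats carry text only, so other media types may not use them.
TEXT_ONLY_FORMATS = {"ANPA1312"}

def check_format(doc_xml: str) -> List[str]:
    """Flag elements whose Format attribute contradicts their MediaType."""
    root = ET.fromstring(doc_xml)
    errors = []
    for el in root.iter():
        media_type = el.get("MediaType")
        fmt = el.get("Format")
        if media_type is None or fmt is None:
            continue  # the rule only applies when both attributes are present
        if fmt in TEXT_ONLY_FORMATS and media_type != "Text":
            errors.append(
                f'<{el.tag}>: Format="{fmt}" is not valid for MediaType="{media_type}"'
            )
    return errors
```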
Slide 11. Schematron
  Small, powerful, lightweight fact-checker for XML documents
  Specify constraints using XPath rules
  You write the error messages
  Schematron schema: one-time compile into XSLT, so validation is an XSLT transform
  Validates: presence or absence of specific content; relationships between elements and attributes
  Reports: validation reports
Slide 12. Anatomy of a Schematron Rule
  Establish the context of the rule with an XPath expression
  An XSLT-style test establishes the constraint for each assert

    <sch:rule context="atom:feed/atom:link">
      <sch:assert test="starts-with(@href, 'http://')">
        The feed/link/@href must contain an http URL
      </sch:assert>
    </sch:rule>

  You write the error message to be used if the assert fails
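
To make the anatomy concrete, the sketch below mirrors the rule's parts as Python data: a context that selects nodes, a test evaluated on each selected node, and an author-written message collected into a report when the test fails. This is a swapped-in technique for illustration only. Real Schematron compiles the schema to XSLT and uses full XPath, while this sketch interprets rules directly and uses ElementTree's limited path syntax. The classes `Rule` and `Assert` and the function `validate` are hypothetical names.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Callable, List

NS = {"atom": "http://www.w3.org/2005/Atom"}

@dataclass
class Assert:
    test: Callable[[ET.Element], bool]  # plays the role of sch:assert/@test
    message: str                        # the error message the schema author writes

@dataclass
class Rule:
    context: str           # ElementTree path standing in for sch:rule/@context
    asserts: List[Assert]

def validate(doc: ET.Element, rules: List[Rule]) -> List[str]:
    """Evaluate every assert on every context node; collect failure messages."""
    report = []
    for rule in rules:
        for node in doc.findall(rule.context, NS):
            for check in rule.asserts:
                if not check.test(node):
                    report.append(check.message)
    return report

FEED_RULES = [
    Rule(
        # Evaluated against the <atom:feed> root, so "atom:link" selects the
        # same links as the Schematron context "atom:feed/atom:link".
        context="atom:link",
        asserts=[Assert(
            test=lambda link: link.get("href", "").startswith("http://"),
            message="The feed/link/@href must contain an http URL",
        )],
    ),
]
```

Usage: parse a feed with `ET.fromstring` and pass the root element to `validate(feed, FEED_RULES)`. A feed whose second link uses `ftp://` yields a one-item report containing the rule's message.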
Slide 13. DSDL – Pipeline Validation
  Grammar: XSD, RELAX NG
  Rules: Schematron
  Namespace dispatch: NVDL
  Datatypes: DTLL
  Character repertoire: CRSL
  Document schema renaming: DSRL
  (Still under development)
Slide 14. Declaratively specify a pipeline (using XML, naturally)
  Similar in concept to Yahoo! Pipes and BizTalk
  But XML-specific and a W3C standard
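
"Declaratively" here means the pipeline is data: a document lists the steps, and a generic runner executes them. The sketch below illustrates that idea in Python with a hypothetical `<pipeline>`/`<step>` vocabulary. It is not the syntax of the W3C standard the slide refers to; each `<step>` names a registered function, and any other attributes become keyword options for that step.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Dict

# Hypothetical pipeline vocabulary (NOT the W3C standard's syntax).
PIPELINE_XML = """
<pipeline>
  <step name="normalize-whitespace"/>
  <step name="require-root" element="entry"/>
</pipeline>
"""

def normalize_whitespace(doc: str) -> str:
    # Collapse every run of whitespace to a single space (crude, for illustration).
    return " ".join(doc.split())

def require_root(doc: str, element: str) -> str:
    root = ET.fromstring(doc)
    if root.tag != element:
        raise ValueError(f"expected root <{element}>, found <{root.tag}>")
    return doc

# Registry mapping step names used in the pipeline document to functions.
STEPS: Dict[str, Callable[..., str]] = {
    "normalize-whitespace": normalize_whitespace,
    "require-root": require_root,
}

def run_declared_pipeline(pipeline_xml: str, doc: str) -> str:
    """Execute the steps listed in pipeline_xml, in document order."""
    for step in ET.fromstring(pipeline_xml.strip()).findall("step"):
        options = {k: v for k, v in step.attrib.items() if k != "name"}
        doc = STEPS[step.get("name")](doc, **options)
    return doc
```

Reordering or adding steps then means editing the pipeline document, not the runner.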
Slide 15. Thanks!
