Schematron (and other useful tools). Stuart Myles, smyles@ap.org
An Aside: AP’s Ingestion Pipeline. One way we ingest content: we transform ATOM and XHTML into our internal XML (APPL) and NITF. ATOM + XHTML -> XSLT Transform -> APPL + NITF. This is greatly simplified, obviously.
Converting from HTML to XML: <p>The budget was just £100.</p> <p>How could it be done for so little money? <p>Luckily open source tools were available.</p> These are not new problems.</p> The solutions were even standardized.<p/>
Hard to enforce rules in the spec: “HeadLine - this element must contain the same value as the entry’s <title> element” “summary is required for non-text content items, such as news photos and video. This element is optional for text story content items.” The XML structure complies with the XSD, but can fail in downstream systems.
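The two quoted rules compare one field against another, which is why a grammar-style schema struggles with them. A minimal sketch of the same checks in plain Python (not Schematron), assuming a hypothetical `urn:example:ap-content` namespace and a hypothetical `MediaType` element whose value is `Text` for text stories; none of these names come from the AP spec:

```python
import xml.etree.ElementTree as ET

NS = {
    "atom": "http://www.w3.org/2005/Atom",
    "ex": "urn:example:ap-content",  # hypothetical namespace, for illustration only
}

def check_entry(entry):
    """Return the list of business-rule violations for one Atom entry."""
    errors = []

    # Rule 1: HeadLine must carry the same value as the entry's title.
    title = entry.findtext("atom:title", default="", namespaces=NS).strip()
    headline = entry.findtext("ex:HeadLine", default="", namespaces=NS).strip()
    if headline != title:
        errors.append("HeadLine must contain the same value as the entry's title")

    # Rule 2: summary is required unless the item is a text story.
    # ex:MediaType is a hypothetical element; a missing value is treated as "Text".
    media_type = entry.findtext("ex:MediaType", default="Text", namespaces=NS).strip()
    summary = entry.findtext("atom:summary", default="", namespaces=NS).strip()
    if media_type != "Text" and not summary:
        errors.append("summary is required for non-text content items")

    return errors

sample = ET.fromstring(
    '<entry xmlns="http://www.w3.org/2005/Atom" xmlns:ex="urn:example:ap-content">'
    "<title>Storm hits coast</title>"
    "<ex:HeadLine>Storm hits the coast</ex:HeadLine>"
    "<ex:MediaType>Video</ex:MediaType>"
    "</entry>"
)
for problem in check_entry(sample):
    print(problem)
```

Both checks depend on the value of a sibling element, so the structure can be schema-valid while the entry still breaks the rules.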
Validate and Fix Prior to Ingestion: Original ATOM + XHTML -> Tidy fixes sloppy HTML -> Custom XSLT tidies up XML -> W3C schema validates structure & syntax -> Schematron schema validates business rules -> Valid ATOM + XHTML, ready for ingestion.
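The ordering of the stages (fix first, then structure, then business rules) can be sketched as a chain of functions. This is only an illustration of the control flow: the Python standard library cannot run Tidy, XSD validation, or Schematron, so each function below is a hypothetical stand-in, and the "non-empty title" rule is an invented example.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

ATOM_NS = "http://www.w3.org/2005/Atom"

def fix_input(text: str) -> str:
    """Stand-in for the fix stages (Tidy, custom XSLT): trim a BOM and whitespace."""
    return text.lstrip("\ufeff").strip()

def check_structure(text: str) -> List[str]:
    """Stand-in for W3C schema validation: well-formed, and the root is atom:feed."""
    try:
        root = ET.fromstring(text)
    except ET.ParseError as exc:
        return [f"not well-formed: {exc}"]
    if root.tag != f"{{{ATOM_NS}}}feed":
        return ["root element must be atom:feed"]
    return []

def check_business_rules(text: str) -> List[str]:
    """Stand-in for Schematron: every entry needs a non-empty title (invented rule)."""
    root = ET.fromstring(text)
    errors = []
    for i, entry in enumerate(root.findall(f"{{{ATOM_NS}}}entry"), start=1):
        title = entry.find(f"{{{ATOM_NS}}}title")
        if title is None or not (title.text or "").strip():
            errors.append(f"entry {i}: title must not be empty")
    return errors

def run_pipeline(text: str) -> Tuple[str, List[str]]:
    """Run the stages in order; report the first stage that finds errors."""
    text = fix_input(text)
    for validate in (check_structure, check_business_rules):
        errors = validate(text)
        if errors:
            return validate.__name__, errors
    return "ok", []

feed = '<feed xmlns="http://www.w3.org/2005/Atom"><entry><title> </title></entry></feed>'
print(run_pipeline(feed))
```

Stopping at the first failing stage means the business-rule checks only ever see documents that already parsed and passed the structural check.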
HTML Tidy: Fix sloppy HTML. HTML -> XHTML.
Schematron: Fact checker for XML documents. Business rules that can’t be expressed in a W3C XSD schema, e.g. MediaType="Video" Format="ANPA1312". Previously, we had to inspect new feeds to catch errors. The risk is that feeds are approved but errors appear later. (Not to mention that manual checking of XML is tedious.)
Schematron: Small, powerful, lightweight fact-checker for XML documents. Specify constraints using XPath rules. You write the error messages. Schematron schema: one-time compile into an XSLT; validation runs as an XSLT transform. Validate: presence or absence of specific content; relationships between elements and attributes. Reports: validation reports.
Anatomy of a Schematron Rule: Establish the context of the rule with an XPath expression. An XSLT-style test establishes the constraint for each assert. You write the error message to be used if the assert fails.
    <sch:rule context="atom:feed/atom:link">
      <sch:assert test="starts-with(@href, 'http://')">
        The feed/link/@href must contain an http url
      </sch:assert>
    </sch:rule>
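To make the three parts of the rule concrete (context, test, message), here is a rough plain-Python equivalent of the assert above. It is a sketch using only `xml.etree.ElementTree`, not a Schematron processor, and the function name `check_feed_links` is made up for illustration. One simplification: it only looks at `atom:link` children of a root `atom:feed`, whereas the Schematron context would match such links wherever an `atom:feed` appears.

```python
import xml.etree.ElementTree as ET

ATOM = {"atom": "http://www.w3.org/2005/Atom"}
FEED_TAG = "{http://www.w3.org/2005/Atom}feed"

# The message from the sch:assert, reported when the test is false.
MESSAGE = "The feed/link/@href must contain an http url"

def check_feed_links(feed_xml: str):
    """Return one failure message per atom:feed/atom:link that fails the test."""
    root = ET.fromstring(feed_xml)
    failures = []
    if root.tag != FEED_TAG:
        return failures  # no node matches the rule context, so nothing fires

    # context="atom:feed/atom:link": each link directly under the feed
    for link in root.findall("atom:link", ATOM):
        href = link.get("href", "")
        # test="starts-with(@href, 'http://')"
        if not href.startswith("http://"):
            failures.append(f"{MESSAGE} (found {href!r})")
    return failures

sample = (
    '<feed xmlns="http://www.w3.org/2005/Atom">'
    '<link href="http://example.com/feed"/>'
    '<link href="ftp://example.com/feed"/>'
    "</feed>"
)
for failure in check_feed_links(sample):
    print(failure)
```

As written, the test also rejects `https://` URLs; the slides do not say whether that is intended.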
DSDL: Pipeline Validation. XSD, RELAX NG: grammar. Schematron: rules. NVDL: namespace dispatch. DTLL: datatypes. CREPDL: character repertoire. DSRL: document semantics renaming. Still under development.
Declaratively specify a pipeline (using XML, naturally). Similar in concept to Yahoo! Pipes and BizTalk, but XML-specific and a W3C standard.
Thanks!
