This document describes AP's content ingestion pipeline and how it uses open source tools such as Schematron and HTML Tidy, along with XML validation, to clean up and validate content before ingestion. Schematron in particular lets the pipeline check business rules that cannot be expressed in an XSD schema. Because validation runs before content is approved and ingested, errors are caught early, reducing the risk that problems surface later in downstream systems. The document gives an example of a Schematron rule and explains how Schematron works: it compiles a rule specification into an XSLT transform, which is then run against a document to validate it. It also briefly mentions the DSDL standard for declaratively specifying validation pipelines.
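To make the idea of a business-rule check concrete, here is a minimal sketch in plain Python. It is not Schematron and not AP's pipeline: real Schematron rules are written as XPath assertions in XML and compiled to XSLT, whereas this only imitates the same shape (a context to visit, an assertion that must hold there, a message on failure) using the standard library. The element names (`feed`, `story`, `headline`, `releaseDate`), the `status` attribute, and both rules are hypothetical, invented for illustration. The second rule is a conditional, co-occurrence constraint of the kind that XSD 1.0 content models do not express.

```python
import xml.etree.ElementTree as ET

# Hypothetical rules: each pairs a context (which elements to visit) with an
# assertion that must hold there, plus a message reported when it does not.
RULES = [
    (
        ".//story",
        lambda el: bool((el.findtext("headline") or "").strip()),
        "a story must have a non-empty headline",
    ),
    (
        ".//story",
        lambda el: el.get("status") != "embargoed"
        or el.find("releaseDate") is not None,
        "an embargoed story must carry a releaseDate",
    ),
]


def check(xml_text):
    """Return the message of every failed assertion, grouped by rule order."""
    root = ET.fromstring(xml_text)
    failures = []
    for context, test, message in RULES:
        for element in root.findall(context):
            if not test(element):
                failures.append(f"{element.get('id', '?')}: {message}")
    return failures


# Hypothetical sample input: a2 is embargoed without a release date,
# and a3 has a headline that contains only whitespace.
SAMPLE = """
<feed>
  <story id="a1" status="published"><headline>Markets rally</headline></story>
  <story id="a2" status="embargoed"><headline>Budget preview</headline></story>
  <story id="a3" status="published"><headline>  </headline></story>
</feed>
"""

if __name__ == "__main__":
    for failure in check(SAMPLE):
        print(failure)
```

An empty list from `check` means every assertion held, so the content could move on to approval; any returned message would block it at this stage instead of letting the problem reach downstream systems.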