Using Schematron for Appropriate Layer Validation: a Case Study
                                   Alexander (‘Sasha’) Schwarzman, AGU (sschwarzman@agu.org)
                                    Balisage 2011: The Markup Conference, Montréal, Canada
                                                         August 2–5, 2011


Appropriate layer validation: advantages
         Even the most “Prussian” (strict) DTD cannot enforce all business rules, data types, and house style

         Rules-based checking is needed anyway
         May therefore use a “Californian” (permissive) DTD, such as JATS, the de facto industry standard adopted by
          publishers, conversion and composition vendors, archives, etc.
         Can use tools developed for JATS: Preview XSLT stylesheets, EPUB conversion processes, etc.
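To make the first point concrete, here is a minimal Python sketch of a check that belongs in the rules layer rather than in the DTD: no content model can compare two date values. It assumes JATS-style `<history>` dates (`date-type="received"` and `"accepted"`, each with `<day>`, `<month>`, and `<year>`); the rule itself is a hypothetical example, not one of AGU's documented checks. In Schematron it would be a single `<assert>`.

```python
import datetime
import xml.etree.ElementTree as ET

# Hypothetical JATS-like fragment: structurally valid, but the dates are out of order.
ARTICLE = """
<article>
  <front>
    <article-meta>
      <history>
        <date date-type="received"><day>14</day><month>3</month><year>2011</year></date>
        <date date-type="accepted"><day>2</day><month>1</month><year>2011</year></date>
      </history>
    </article-meta>
  </front>
</article>
"""


def history_date(root, date_type):
    """Return the <history> date of the given type as a datetime.date, or None."""
    node = root.find(f".//history/date[@date-type='{date_type}']")
    if node is None:
        return None
    return datetime.date(
        int(node.findtext("year")),
        int(node.findtext("month")),
        int(node.findtext("day")),
    )


def check_date_order(xml_text):
    """Hypothetical business rule: the accepted date must not precede the received date.

    Returns a list of error messages (empty if the rule passes).
    """
    root = ET.fromstring(xml_text)
    received = history_date(root, "received")
    accepted = history_date(root, "accepted")
    if received and accepted and accepted < received:
        return [f"accepted date {accepted} precedes received date {received}"]
    return []


if __name__ == "__main__":
    for message in check_date_order(ARTICLE):
        print("ERROR:", message)
```

A DTD would accept this fragment unchanged; only a value-level rule catches the inverted dates.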

Why Schematron?
         Multiple genres (document types)
                Journal article

                Book chapter

                Book

                Newspaper article

         Different lifecycle phases
                Papers in press (journal article)

                Initial validation (journal article, book chapter)

                Final validation (all genres)
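The genre and phase lists above amount to a small matrix of which validation passes each document type goes through. A sketch of that mapping in Python, transcribed from the lists (the string identifiers are illustrative, not AGU's file or phase names):

```python
# Lifecycle phases and the genres each applies to, transcribed from the slide.
# Identifiers are illustrative, not AGU's actual names.
PHASE_GENRES = {
    "papers in press": {"journal article"},
    "initial validation": {"journal article", "book chapter"},
    "final validation": {"journal article", "book chapter", "book", "newspaper article"},
}


def phases_for(genre):
    """Return the phases a genre must pass, in the order listed on the slide."""
    # Dicts preserve insertion order (Python 3.7+), so the slide's order is kept.
    return [phase for phase, genres in PHASE_GENRES.items() if genre in genres]


def combinations():
    """All (genre, phase) pairs that need a rule set."""
    return [(g, p) for p, genres in PHASE_GENRES.items() for g in sorted(genres)]
```

Each (genre, phase) pair needs its own combination of rules. ISO Schematron can express such combinations either as separate top-level schemas or as `<phase>` elements that activate selected patterns, which is one reason a modular rule base pays off.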


                                                 Journal article




Alexander (‘Sasha’) Schwarzman, AGU      Balisage 2011: The Markup Conference, Montréal, Canada   Page 1 of 3
(sschwarzman@agu.org)                    August 2–5, 2011
Book chapter and book




                                             Newspaper article




AGU Schematrons
                                            FBA   IJA    FJA    SWN    FBK    EOM    EOS    PIP       IBA
            AGUcontribs.sch                 ✓     ✓      ✓             ✓      ✓      ✓      ✓         ✓
            bibr-adhoc.sch                  ✓     ✓      ✓      ✓      ✓      ✓      ✓                ✓
            bibr-ids.sch                    ✓     ✓      ✓      ✓      ✓      ✓      ✓                ✓
            bibr-italics.sch                ✓     ✓      ✓      ✓      ✓      ✓      ✓                ✓
            bibr-structures.sch             ✓     ✓      ✓      ✓      ✓      ✓      ✓                ✓
            book-bookarticle.sch                                       ✓
            book-meta.sch                                              ✓
            bookarticle-meta-final.sch      ✓                          ✓
            bookarticle-meta.sch            ✓                          ✓                              ✓
            common-back.sch                 ✓     ✓      ✓      ✓      ✓      ✓      ✓                ✓
            common-final.sch                ✓            ✓      ✓      ✓      ✓      ✓
            common-meta.sch                 ✓     ✓      ✓      ✓      ✓      ✓      ✓      ✓         ✓
            common.sch                      ✓     ✓      ✓      ✓      ✓      ✓      ✓      ✓         ✓
            dates.sch                       ✓            ✓      ✓      ✓      ✓      ✓      ✓
            eos-only.sch                                                      ✓      ✓
            filetypes.sch                   ✓     ✓      ✓      ✓      ✓      ✓      ✓      ✓         ✓
            global.sch                      ✓     ✓      ✓      ✓      ✓      ✓      ✓      ✓         ✓
            index-codes.sch                                     ✓             ✓             ✓
            index-terms.sch                 ✓     ✓      ✓             ✓             ✓                ✓
            journalarticle-meta-final.sch                ✓      ✓             ✓      ✓
            journalarticle-meta.sch               ✓      ✓      ✓             ✓      ✓      ✓
            journalarticle-tech.sch               ✓      ✓
            mddb-ws.sch                     ✓     ✓      ✓      ✓      ✓      ✓      ✓      ✓         ✓
            names.sch                       ✓     ✓      ✓      ✓      ✓      ✓      ✓      ✓         ✓
            print-final.sch                 ✓     ✓      ✓      ✓      ✓      ✓      ✓      ✓         ✓
            ref-misc.sch                    ✓     ✓      ✓      ✓      ✓      ✓      ✓      ✓         ✓
            setup.sch                       ✓     ✓      ✓      ✓      ✓      ✓      ✓      ✓         ✓
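The matrix can be read as a mapping from each module to the top-level Schematrons that use it. The Python sketch below encodes a few rows of the table and derives the include list for one top-level Schematron, plus the modules that every top-level file shares. It assumes, hypothetically, that top-level files pull in modules with ISO Schematron `<sch:include href="..."/>`; the slide does not say how AGU assembles them.

```python
# Excerpt of the "AGU Schematrons" matrix: module -> top-level Schematrons using it.
ALL = {"FBA", "IJA", "FJA", "SWN", "FBK", "EOM", "EOS", "PIP", "IBA"}

MODULES = {
    "common.sch": set(ALL),
    "global.sch": set(ALL),
    "bibr-ids.sch": ALL - {"PIP"},
    "common-final.sch": {"FBA", "FJA", "SWN", "FBK", "EOM", "EOS"},
    "eos-only.sch": {"EOM", "EOS"},
    "journalarticle-tech.sch": {"IJA", "FJA"},
    "book-meta.sch": {"FBK"},
}


def includes_for(top_level):
    """Include lines for one top-level Schematron (assumed sch:include layout)."""
    return [
        f'<sch:include href="{module}"/>'
        for module in sorted(MODULES)
        if top_level in MODULES[module]
    ]


def shared_by_all():
    """Modules (in this excerpt) that every top-level Schematron includes."""
    return sorted(m for m, users in MODULES.items() if users == ALL)
```

Keeping the matrix as data makes it easy to see which top-level files a change to one module will affect.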

         9 top-level Schematrons, 27 modules
         350+ requirements checked

         Perform Web Services-based verifications against a relational metadata database
         Check data quality, markup integrity, business rules, data types, and house style

         Provide control over production processes
         Work best in oXygen (context-sensitive), but can also be compiled and integrated into pipeline scripts
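The slide does not show how the web-service checks work. As a hedged sketch, the Python below compares values in the XML with a database record. The `lookup` function is a hypothetical stand-in for the web service (in production an HTTP call, or in an XSLT 2.0-based Schematron perhaps a `doc()` call to a service URL), and the record fields, manuscript ID, and DOIs are all invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the relational metadata database behind the web service.
# Keys are manuscript IDs; fields and values are illustrative only.
METADATA_DB = {
    "EX-0001": {"doi": "10.5555/example.0001", "volume": "116"},
}

# Hypothetical article whose DOI deliberately disagrees with the database record.
SAMPLE = """<article><front><article-meta>
<article-id pub-id-type="manuscript">EX-0001</article-id>
<article-id pub-id-type="doi">10.5555/example.0002</article-id>
<volume>116</volume>
</article-meta></front></article>"""


def lookup(manuscript_id):
    """Hypothetical web-service call; returns the record dict or None."""
    return METADATA_DB.get(manuscript_id)


def verify_against_db(xml_text):
    """Compare <article-meta> values in the XML with the database record."""
    meta = ET.fromstring(xml_text).find(".//article-meta")
    if meta is None:
        return ["no <article-meta> element found"]
    manuscript_id = meta.findtext("article-id[@pub-id-type='manuscript']")
    record = lookup(manuscript_id)
    if record is None:
        return [f"manuscript {manuscript_id!r} not found in metadata database"]
    errors = []
    # (path relative to article-meta, key in the database record)
    for path, key in [("article-id[@pub-id-type='doi']", "doi"), ("volume", "volume")]:
        value = meta.findtext(path)
        if value != record[key]:
            errors.append(f"{key} {value!r} does not match database value {record[key]!r}")
    return errors
```

Because the database, not the XML, is treated as the source of truth here, a mismatch is reported rather than silently accepted.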

Paradigm shift: validation focus moves from the XML parser to the Schematron engine
      ☛ Content may be valid to the DTD but make no sense

      ☛ Semantic integrity now depends on Schematron

      ☛ Should each Schematron release be preserved and its version recorded in the document metadata?

      ☛ Constraints on business partners: they must be Schematron-capable and have the necessary tools

      ☛ Schematron does not “fix” problems; people do! Processes and procedures must be defined


How to build a good Schematron
         Elicit, document, convey, and clarify the requirements

         Ensure Schematron fits into your workflow
         Modularize Schematron

         Ensure that individual Schematron rules aren’t in conflict
         Optimize Schematron performance

         Employ XSLT 2.0
         Test, test, test

         Cultivate Schematron & XSLT 2.0 expertise in-house
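Two of these points, optimizing performance and testing, can be illustrated together. The sketch below is a Python analog, not Schematron: a hypothetical rule that every `<xref rid="...">` points to an existing `id` builds the set of IDs once (the role an `xsl:key` can play in an XSLT 2.0-based Schematron) instead of rescanning the document for each reference, and is then run against small passing and failing fixtures. The rule and fixtures are illustrative assumptions.

```python
import xml.etree.ElementTree as ET


def check_xref_targets(xml_text):
    """Hypothetical rule: report <xref> rids that match no id in the document."""
    root = ET.fromstring(xml_text)
    # Build the index in one pass instead of rescanning the tree for every xref.
    ids = {el.get("id") for el in root.iter() if el.get("id") is not None}
    errors = []
    for xref in root.iter("xref"):
        # In JATS, rid may hold several space-separated IDREFs.
        for rid in xref.get("rid", "").split():
            if rid not in ids:
                errors.append(f"xref points to missing id {rid!r}")
    return errors


# Hypothetical fixtures: (description, XML, expected number of errors).
FIXTURES = [
    ("valid reference", '<article><p><xref rid="f1"/></p><fig id="f1"/></article>', 0),
    ("dangling reference", '<article><p><xref rid="f2"/></p><fig id="f1"/></article>', 1),
    ("multiple rids", '<article><p><xref rid="f1 t9"/></p><fig id="f1"/></article>', 1),
]


def run_fixtures():
    """Return descriptions of fixtures whose error count differs from the expectation."""
    failures = []
    for description, xml_text, expected in FIXTURES:
        if len(check_xref_targets(xml_text)) != expected:
            failures.append(description)
    return failures


if __name__ == "__main__":
    failed = run_fixtures()
    print("all fixtures passed" if not failed else f"failed: {failed}")
```

Keeping a failing fixture next to each passing one guards against a rule that silently never fires.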
