Developing an STM DTD/Schema:
      Strategic Design Choices
                          Alexander (‘Sasha’) Schwarzman, AGU (sschwarzman@agu.org)
                              Extreme Markup Languages 2006, Montréal, Canada
                                              August 7 – 11, 2006


Requirements
         Does an agreed-upon Requirements document exist? (Get one!)

         What is your XML’s role?
                Archival copy-of-record (preserving scientific content)?

                Means of producing a pretty PDF?

                Both?

                Much more?


Architecture
         When during production is XML created? How is accuracy checked at each stage?
                Dummy empty elements for not-yet-assigned metadata plus use of configurable
                 production-stage-specific Business Rules Checker / Validator / QC Tool?

                Multiple DTDs: a separate one for each production stage?

         XML “layering”: What “layer” to use for enforcing editorial style and business
          rules?
                DTD / parser?

                Validator / Schematron?

                Human editors?

         Revisable unit (what is the elemental unit?)
                Article?

                Issue?

                Arbitrary / cross-journal article collection?

                Volume / year?

                Journal?

                More than one of these?
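
The "dummy empty elements plus production-stage-specific checker" option raised under Architecture could be sketched roughly as follows. This is only an illustration under stated assumptions: the stage names, the element names (meta, volume, issue, doi), and the MAY_BE_EMPTY table are hypothetical and are not taken from any actual DTD or QC tool.

```python
import xml.etree.ElementTree as ET

# Hypothetical rule table: which metadata elements may still be empty
# placeholders at each production stage (names are illustrative only).
MAY_BE_EMPTY = {
    "copyedit": {"volume", "issue", "doi"},
    "issue-assembly": {"doi"},
    "final": set(),
}

def check_stage(xml_text, stage):
    """Return messages for metadata that is empty but must be filled by this stage."""
    allowed_empty = MAY_BE_EMPTY[stage]
    root = ET.fromstring(xml_text)
    problems = []
    for name in ("volume", "issue", "doi"):
        elem = root.find(f"./meta/{name}")
        if elem is None:
            # Presence is a structural question, normally left to the DTD / parser layer.
            problems.append(f"{name}: element missing")
        elif not (elem.text or "").strip() and name not in allowed_empty:
            problems.append(f"{name}: empty at stage '{stage}'")
    return problems

ARTICLE = """<article>
  <meta><volume>111</volume><issue/><doi/></meta>
</article>"""

for stage in MAY_BE_EMPTY:
    print(stage, check_stage(ARTICLE, stage))
```

In this sketch a single document model is kept for all stages, and only the rule table changes per stage. The alternative listed above, a separate DTD for each production stage, would move the same constraints into the grammar itself.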


Scope
         For what material?
                Current?

                Future-only?

                Legacy?

                All of the above or some combination?

         What is the extent of an article / book?
                Does it include supplementary material, like datasets and computable spreadsheets?

                Do you model “extra stuff” as just another structured section or is it something different?

                Special links (“related links”) section?


Modeling Language Choices
         Which constraint language is primary?
                DTD?

                XSD?

                RELAX NG?

         How many DTDs / schemas (purpose of each)?
                Authoring?

                Conversion / Transformation?

                Production?

                Archiving?

         Separate or shared: If your content includes journal article, newspaper article, book
          chapter, book, case study, lecture notes, etc., should you use:
                Distinct DTD / schema for each?

                A large shared structure?

                A DTD / schema suite with common modules?

         “Off-the-shelf, Altered-to-fit, or Bespoke?” (T. Usdin)
                If altered, what public model?

                “compatible with” or “informed by” (subset or superset)?

                If bespoke, do you use any public models at all (for tables and math, for instance)?


Modeling Design Choices
         “Prussian” or “Californian”: prescriptive or descriptive? Flexible or enforcing?

         Generated or Explicit text? (depends on XML’s role)
                Preserve generation / rendition rules?

                Different approach for text and bibliographic references?

         How to model bibliographic references?
                Mixed content?

                Genre-specific “strict models” (with an escape hatch provided)?

                “Tag abuse” tolerance?

         How to reference non-XML components, e.g., figures, in XML?
                By an ID that maps to a set of multiple images in an archive?

                By naming a specific file from the set? Which one is “the mother of all images”?

                Which components to store / migrate? Is “storing cheaper than thinking”? (D. Lapeyre)

         How to model math?
                MathML presentation versus content (computation)?
                       How to ensure that the same math symbols render identically in different browsers (the same Unicode
                        codepoints look different in various browsers, e.g., epsilon and varepsilon)?

                LaTeX plus GIFs?
                       How to ensure that special characters look identical when they occur both in a displayed formula and
                        inline?

                Just GIFs?

         “Just because you can, doesn’t mean you should” (D. Lapeyre)
                The lure of modeling for its own sake. Simpler models are easier to maintain over time.
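
For the epsilon / varepsilon question under "How to model math?", a small Python sketch can at least make the ambiguity visible in the data. Unicode has separate codepoints for the two epsilon shapes (U+03B5 and U+03F5), and which glyph a browser draws for each depends on the font. The watch list below is an illustrative assumption, not an exhaustive or authoritative inventory, and report_variants is a hypothetical helper name.

```python
import unicodedata

# Pairs of distinct codepoints whose glyphs are often swapped or confused,
# depending on the font (illustrative watch list, not exhaustive).
VARIANT_PAIRS = {
    "\u03b5": "\u03f5",  # small letter epsilon vs. lunate epsilon symbol
    "\u03c6": "\u03d5",  # small letter phi vs. phi symbol
}
WATCH = set(VARIANT_PAIRS) | set(VARIANT_PAIRS.values())

def report_variants(text):
    """List (offset, codepoint, Unicode name) for each watched character in text."""
    return [
        (i, f"U+{ord(ch):04X}", unicodedata.name(ch))
        for i, ch in enumerate(text)
        if ch in WATCH
    ]

sample = "<mi>\u03b5</mi> and <mi>\u03f5</mi>"
for hit in report_variants(sample):
    print(hit)
```

A report like this does not solve the rendering problem, but it shows where an editorial rule (for example, which epsilon a journal treats as its default) would have to be applied and checked.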

