JATSPack and JATSPAN, a packaging format specification and a web site (mostly) for schema customizations. Chris Maloney August 4, 2011
Note
  • JATSPack and JATSPAN are not part of the NLM/NISO JATS.
  • JATSPack is a proposed specification that is completely independent of the tag suite.
  • JATSPAN is a non-commercial web site with no affiliation with NLM or NISO.
Extensibility, Customizability, and Interchange
  • Several different perspectives
  • Eliot Kimber: It's all about interchange
      • The goal should be "blind" interchange: “By blind interchange I mean interchange that requires the least amount of pre-interchange negotiation and knowledge exchange between interchange partners.”
  • JATSPack is about lowering the barrier to interchange, but not quite down to the level of "blind" (depending on how you define “least”).
Extensibility, Customizability, and Interchange
  • Wendell Piez: The problem with schema extensibility:
      • “Extensions to a tag set, even as they successfully address new requirements, raise interoperability issues with systems that do not know about them.”
      • “... we have a devil's choice: fork or bloat.”
  • But maybe schema extensions can be made more manageable
Expressiveness vs. Interoperability
  • Yes, there’s a tradeoff
  • But maybe not zero-sum
  • Maybe we can push both forward together
Extensions and customizations happen
  • When a publisher needs a feature, they will find a way to get it in.
  • Standards bodies are sometimes, maybe, a little bit too slow.
  • Leading to extending the "wrong" way:
      • Documents that ostensibly are the same "type", but that are not interchangeable, because of different special vocabularies or tagging styles.
  • Leading to interchange problems.
Schema languages are designed for this
  • Provide the proper ways to extend and customize
  • DTD, W3C Schema, Relax NG, Schematron, and NVDL exist for a reason
  • XML = "Extensible Markup Language"
  • Escape hatches are necessary, but there are advantages to using core schema technologies.
Users should customize
  • They know their requirements better than others.
  • The environment is evolving too fast. (Can’t emphasize this enough)
  • Crowdsourcing might be a solution
  • But crowdsourcing needs an infrastructure.
Interoperability problems
  • These are real
  • Maybe these are part of the cause:
      • Lack of a standard way to communicate customizations
      • Dearth of simple, step-by-step tutorials and examples on doing customizations right.
Motivation for JATSPack
  • Facilitate systems that can use many different schema types easily
  • Ease the installation of the complete set of all JATS schemas.
  • Ease reuse and interchange of schema customizations.
  • Ease reuse and interchange of libraries that go along with customizations.
  • These should, in turn, allow for easier interchange of document instances
Inspirations
  • oXygen "frameworks"
  • TEI's ODD
Requirements
  • JATSPacks should be usable on existing systems without any special infrastructure
      • Avoid the "chicken/egg" problem to adoption
  • Backwards compatibility with core JATS
  • Don’t reinvent the wheel
      • Reuse/extend some existing packaging specification
What is JATS?
  • Journal Article Tag Suite
  • Old name: NLM Journal Archiving and Interchange Tag Suite
  • Recent NISO standard for trial use
What is JATS?
  • Primarily for publishing journal articles.
  • Used for other things too (books, archiving).
  • Many “flavors” and versions.
  • Mostly used as DTDs; also distributed as W3C schema and Relax NG.
JATSPack
  • A packaging format specification based on Florent Georges' EXPath packaging
  • A way to package schema customizations and extensions
  • And more:
      • XProc, XQuery, XSLT, and XPath code libraries
      • OASIS catalog files
      • Documentation and other resources
      • Some metadata
Extension of EXPath Packaging (EXPath-pkg)
  • JATSPack will be forwards-compatible
      • Right now there are some incompatibilities.
  • Every JATSPack is an EXPath package
      • Zip file with a .xar extension
      • Every package has an abbreviated name (abbrev) (one-part, two-part, or hierarchical?)
      • Contains a top-level package descriptor.
  • Any JATSPack-enabled system should be able to use EXPath packages from CXAN.
  • EXPath-pkg is already supported by several tools
JATSPack is also a Zip file
  • Forwards-compatible extension of a simple Zip file
  • Can be used without any special infrastructure
      • Simply by unpacking the Zip file to the right place,
      • And adding a "nextCatalog" entry in your master catalog file
  • (Note: this introduced an incompatibility with EXPath packaging. I require that the on-disk repository layout be the same as the in-zip directory layout; that is, that the install process does no moving of files around after they are unzipped.)
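The two manual steps above (unzip, then add a "nextCatalog" entry) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `install_pack`, the master catalog living at `catalog.xml` in the repository root, and the relative catalog path passed in are all hypothetical, not part of the JATSPack proposal.

```python
import os
import zipfile
import xml.etree.ElementTree as ET

# Namespace of OASIS XML Catalog files.
CATALOG_NS = "urn:oasis:names:tc:entity:xmlns:xml:catalog"
ET.register_namespace("", CATALOG_NS)


def install_pack(xar_path, repo_dir, pack_catalog_relpath):
    """Unzip a pack into repo_dir and register its catalog in the master catalog.

    pack_catalog_relpath is the pack's catalog file, relative to repo_dir
    (hypothetical parameter; forward slashes, as in a relative URI).
    """
    # Step 1: unpack. The on-disk layout is exactly the in-zip layout.
    with zipfile.ZipFile(xar_path) as archive:
        archive.extractall(repo_dir)

    # Step 2: load the master catalog, or start an empty one.
    # Assumption: the master catalog is repo_dir/catalog.xml.
    master_path = os.path.join(repo_dir, "catalog.xml")
    if os.path.exists(master_path):
        tree = ET.parse(master_path)
    else:
        tree = ET.ElementTree(ET.Element(f"{{{CATALOG_NS}}}catalog"))
    root = tree.getroot()

    # Step 3: add a nextCatalog entry unless it is already there.
    # Note: ElementTree drops any DOCTYPE and comments when it rewrites the file.
    tag = f"{{{CATALOG_NS}}}nextCatalog"
    existing = {entry.get("catalog") for entry in root.findall(tag)}
    if pack_catalog_relpath not in existing:
        ET.SubElement(root, tag, {"catalog": pack_catalog_relpath})
        tree.write(master_path, encoding="utf-8", xml_declaration=True)
    return master_path
```

Because nothing is moved after unzipping, the relative path of the pack's catalog inside the Zip file is also the value of the "nextCatalog" entry.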
JATSPack packaging of related resources
  • DTDs, W3C Schemas, Relax NG, Schematron, NVDL
  • OASIS catalog files
  • XQuery, XSLT to provide function modules
  • XProc to bind them all
  • Documentation
  • Examples
  • Self-tests
JATSPack directory structure
  [root]
      abbrev-1/
          abbrev-2/
              version/
                  README.txt (optional)
                  expath-pkg.xml
                  catalog.xml
                  dtd/
                  rng/
                  rnc/
                  xsd/
                  xslt/
                  xquery/
                  xproc/
                  doc/
                  samples/
                  resources/
                  test/
JATSPack package descriptor
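As a rough illustration of what a descriptor might contain, the sketch below builds a minimal EXPath-style `expath-pkg.xml` in Python. The `package`, `title`, and `dependency` elements and the `name`, `abbrev`, `version`, and `spec` attributes come from EXPath packaging; the package URIs and the dependency used in the example are hypothetical, and any JATSPack-specific extensions to the descriptor are not shown.

```python
import xml.etree.ElementTree as ET

# Namespace of the EXPath package descriptor.
PKG_NS = "http://expath.org/ns/pkg"
ET.register_namespace("", PKG_NS)


def build_descriptor(name, abbrev, version, title, dependencies=()):
    """Return a minimal EXPath-style package descriptor as a string."""
    package = ET.Element(
        f"{{{PKG_NS}}}package",
        {"name": name, "abbrev": abbrev, "version": version, "spec": "1.0"},
    )
    ET.SubElement(package, f"{{{PKG_NS}}}title").text = title
    # One <dependency> element per prerequisite package.
    for dep in dependencies:
        ET.SubElement(package, f"{{{PKG_NS}}}dependency", {"package": dep})
    return ET.tostring(package, encoding="unicode")


# Hypothetical values, for illustration only.
print(build_descriptor(
    name="http://example.org/jatspack/taxpub-schema",
    abbrev="taxpub-schema",
    version="0.1",
    title="TaxPub schema customization",
    dependencies=["http://example.org/jatspack/jats-core"],
))
```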
JATSPack OASIS catalog file
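To make the role of the per-pack catalog concrete, here is a small sketch: a catalog with one `public` entry, and a deliberately simplified lookup that maps a public identifier to a file inside the pack. The public identifier is the TaxPub one quoted later in this talk; the catalog content, the `resolve_public` helper, and the `file:///repo/...` location are assumptions for illustration. A real resolver also handles `system`, `nextCatalog`, `xml:base`, and more.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

CATALOG_NS = "urn:oasis:names:tc:entity:xmlns:xml:catalog"

# Illustrative catalog; the uri is relative to the catalog file itself.
EXAMPLE_CATALOG = """<?xml version="1.0"?>
<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
  <public publicId="-//TaxonX//DTD Taxonomic Treatment Publishing DTD v0 20100105//EN"
          uri="dtd/tax-treatment-NS0.dtd"/>
</catalog>
"""


def resolve_public(catalog_xml, catalog_uri, public_id):
    """Return the URI mapped to public_id, or None (handles <public> entries only)."""
    root = ET.fromstring(catalog_xml)
    for entry in root.findall(f"{{{CATALOG_NS}}}public"):
        if entry.get("publicId") == public_id:
            # Relative uris resolve against the location of the catalog file.
            return urljoin(catalog_uri, entry.get("uri"))
    return None


print(resolve_public(
    EXAMPLE_CATALOG,
    "file:///repo/taxpub/schema/0.1/catalog.xml",
    "-//TaxonX//DTD Taxonomic Treatment Publishing DTD v0 20100105//EN",
))
```

Since each pack's catalog references only files inside that pack, every entry can stay relative, which is what lets a pack be unzipped anywhere.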
JATSPAN
  • JATSPack archive network
  • Analogous to CPAN or CXAN
  • A web site: jatspan.org
  • Allows authors to share and reuse JATSPacks
  • Allows other users to discover relevant JATSPacks
jatspan
  • A client program
  • Not necessary to use JATSPacks
  • Manages local repositories
      • At a specified directory on the local filesystem
      • Contains a master OASIS catalog file
  • Automates installation of JATSPacks
  • Resolves dependencies, and downloads and installs prerequisite packs
  • Here's how you use it to install the TaxPub JATSPack:
      • jatspan install taxpub-schema
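The dependency resolution step can be pictured as a depth-first walk that lists prerequisites before the pack that needs them. This is a sketch of the general technique, not the jatspan implementation; the `install_order` function, the in-memory index, and every pack name other than `taxpub-schema` are hypothetical.

```python
def install_order(target, index):
    """Return the packs to install for target, prerequisites first.

    index maps a pack name to the list of pack names it depends on.
    """
    order = []        # packs in the order they should be installed
    visiting = set()  # packs on the current path, used to detect cycles

    def visit(name):
        if name in order:
            return
        if name in visiting:
            raise ValueError(f"dependency cycle at {name}")
        if name not in index:
            raise KeyError(f"unknown pack: {name}")
        visiting.add(name)
        for dep in index[name]:
            visit(dep)
        visiting.discard(name)
        order.append(name)

    visit(target)
    return order


# Hypothetical index; a real client would read dependencies from each
# pack's descriptor on the archive site.
INDEX = {
    "jats-core": [],
    "jats-publishing-3.0": ["jats-core"],
    "taxpub-schema": ["jats-publishing-3.0"],
}

print(install_order("taxpub-schema", INDEX))
# -> ['jats-core', 'jats-publishing-3.0', 'taxpub-schema']
```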
Use Cases / Examples
Use Case - A publisher evaluates JATS for the first time
  • JATS has many flavors and versions (currently 34 permutations)
  • Downloadable from the NLM archive_dtd and JATS FTP sites
  • Can seem overwhelming and complicated
  • Many publishers still use older versions for their published articles
  • Each flavor / version is distributed as a separate, flattened Zip file
      • Includes the bundled version of all of the files for that particular set
  • Installation of each requires manually tweaking the OASIS catalog file
  • Difficult/tedious to configure a system that can use all/any of them simultaneously
Example: Core JATS Bundle
  • Each of the 34 flavors and versions has been refactored as a JATSPack
  • Core modules factored out into "core" JATSPack
  • Each pack has an OASIS catalog file that references only the modules in that pack
  • All of these can be downloaded and installed as a single bundle.
  • Bundle has a single top-level OASIS catalog file
  • Currently just the DTDs (not W3C schema or Relax NG)
  • Also includes sample XML instance documents
Changes to the core JATS
  • Might be controversial (I don't know)
  • Mostly necessitated by changing the directory structure and moving files around
  • Changed relative URIs that cross-reference between the modules
  • Cleaned up some discrepancies in old versions
  • Didn't change any top-level public identifiers
  • My bundle is 100% compatible (I’m 99% sure of this)
Use Case - A publisher develops a new JATS customization
  • There is an ongoing sea change in the nature of journal articles
  • Articles are no longer limited to the (print media) figures, tables, and equations.
  • The lines between traditional definitions of media types, such as journal articles, books, wikis, blog posts, data-only articles, presentations, etc., are continually getting blurred
  • Open-science movement
      • Scientists are sharing their data more often.
      • Grass roots efforts to bypass traditional publishing models
  • This trend is moving/evolving very fast
  • We cannot anticipate what the needs of the users will be
Supplemental materials
  • Supplemental material (data) moving into main content
  • “Pseudo-supplemental”: essential material, but doesn't “fit” into the journal article. (Sasha Schwarzman)
  • Also called "integral content".
  • E.g., Cell doesn't embed movies because they don't fit into PDFs.
  • Sasha quoted E. Marcus: “... over time the concept of supplemental material will gradually give way to a more modern concept of a hierarchical or layered presentation in which a reader can define which level of detail best fits their interests and needs.”
  • We need to be facilitating this transformation
JATS Customizations
  • JATS was designed in a modular, extensible way, but the barrier to customizing is still high
  • Alternatives to customizing:
      • Suggest a change to the standard, and wait
      • Create a local customization, and forego interchangeability
      • Pseudo-customization
Pseudo-customization
  • Strategies:
      • Put the data into a separate file and link to it.
      • CDATA section (à la RSS)
      • "Escape hatches" with custom vocabulary
      • Processing Instructions
  • These are all ways of getting around the DTD (schema)
  • So validation has to use a different mechanism
  • This is the tail wagging the dog: the DTD (schema) should work for us, not the other way around.
JATS and supplemental data
  • JATS has "escape hatches" for different kinds of data objects, and links to external objects.
  • But it would often make more sense to include the data natively.
  • Bottom line: extensions and customizations will happen. It would be nice to have an infrastructure for communicating and managing them.
Example - TaxPub
  • Customization of JATS
  • Allows inclusion of taxonomic treatments into journal articles
  • As described by Terry Catapano at last year's JATS-Con
  • Used in ZooKeys, published by Pensoft.
  • Articles are simultaneously released to the Species-ID wiki
TaxPub JATSPack
  • Named “taxpub-schema”
  • Directory structure:
      taxpub/
          schema/
              0.1/
                  dtd/
                  doc/
                  samples/
Converting TaxPub into a JATSPack
  • Fixed relative system identifiers
  • Fixed doctype declarations, for example:
      From:
          <!DOCTYPE article SYSTEM "../tax-treatment-NS0.dtd">
      To:
          <!DOCTYPE book PUBLIC
              "-//TaxonX//DTD Taxonomic Treatment Publishing DTD v0 20100105//EN"
              "../dtd/tax-treatment-NS0.dtd">
  • Created an OASIS catalog file
  • Zipped the results into a .xar file
  • Uploaded to JATSPAN
TaxPub to JATSPack: advantages
  • The advantages are not dramatic
  • Lower the activation energy for others to discover and install
  • Increase visibility
  • Could allow for inclusion of (for example) XSLT libraries, self tests, documentation, in a consistent way
  • Easier for some other developer to extend TaxPub
Use Case - Publisher or archive adds support for new document type
  • Currently there is no standard way of packaging the information relevant to a document type.
  • Installation is not especially hard, but does require some expertise and coordination of resources
Use Case - Publisher or archive adds support for new document type
  • With JATSPack but not jatspan:
      • Just download the Zip file, unzip it, and update your catalog file
  • With jatspan:
      • jatspan install
      • This automatically resolves dependencies.
Use Case - JATS-related libraries
  • These are not schema extensions; just code libraries.
  • Right now, there is no standard way to deploy a library
  • Advantages here are the same as for EXPath packaging
  • In fact they could be deployed as EXPath packages.
Example - Journal Publishing 3.0 Preview Stylesheets as a JATSPack
  • By Wendell Piez, presented at JATS-Con last year
  • Repackaged as a JATSPack
  • Adapted to use XProc
  • Not a major improvement, but, again, incrementally lowers the activation energy to find/install/use/extend these.
      • Especially extend
  • Other authors could write new JATSPacks that depend on these.
  • When installing those, the dependency would be automatically resolved.
Example - JATS-to-EPub transformation
  • By Laura Kelly, presented at JATS-Con last year
  • Depends on the preview stylesheets mentioned above
  • This could be deployed without the preview stylesheets, and that dependency would be resolved by jatspan
Customizations and compatibility
JATSPack supports any schema language
  • Schematron is the best language to use for validation (Eliot Kimber)
  • Relax NG is very expressive and easy to use
  • NVDL looks cool
  • The documentation is the final word (Eliot again)
      • JATSPack can (should) include this documentation
Forwards compatibility - review
  • Means that newer documents (version 2) can be used by existing/old processing systems (version 1).
  • E.g. "must ignore" pattern of extensibility of HTML
      • HTML renderers must ignore any tags that they don't understand
      • This is a forwards-compatibility extension substitution rule
  • This allows future designers to customize the HTML schema, adding elements and attributes, while being able to predict how document instances in the new schema will be processed by old systems.
Forwards compatibility - we can do better than HTML
  • TaxPub, for example, adds new elements and attributes
  • The package could include XSLTs that transform those into "standard JATS"
  • More powerful forwards-compatibility extension substitution rules.
  • Gets close to useful, blind interchange
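The simplest possible substitution rule is "drop the extension wrapper, keep what is inside it". The sketch below shows that rule in Python rather than XSLT, purely to keep the examples in one language; a packaged XSLT could apply richer, per-element rules. The extension namespace `http://example.org/ext`, the `tp:` prefix, and the element names in the sample are hypothetical stand-ins, not the actual TaxPub vocabulary.

```python
import xml.etree.ElementTree as ET


def _content(elem, ext_ns):
    """Yield elem's content as text strings and converted child elements."""
    if elem.text:
        yield elem.text
    for child in elem:
        if child.tag.startswith(f"{{{ext_ns}}}"):
            # Extension element: discard the wrapper, keep its content.
            yield from _content(child, ext_ns)
        else:
            yield downgrade(child, ext_ns)
        if child.tail:
            yield child.tail


def downgrade(elem, ext_ns):
    """Return a copy of elem with extension elements and attributes removed."""
    attrs = {k: v for k, v in elem.attrib.items()
             if not k.startswith(f"{{{ext_ns}}}")}
    out = ET.Element(elem.tag, attrs)
    last = None  # most recently appended child, which receives following text
    for item in _content(elem, ext_ns):
        if isinstance(item, str):
            if last is None:
                out.text = (out.text or "") + item
            else:
                last.tail = (last.tail or "") + item
        else:
            out.append(item)
            last = item
    return out


SAMPLE = ('<p xmlns:tp="http://example.org/ext">The species '
          '<tp:taxon-name><italic>Aus bus</italic></tp:taxon-name> is new.</p>')

result = downgrade(ET.fromstring(SAMPLE), "http://example.org/ext")
print(ET.tostring(result, encoding="unicode"))
# -> <p>The species <italic>Aus bus</italic> is new.</p>
```

Unlike HTML's implicit "must ignore", a rule shipped with the pack is explicit, so the author of the extension decides how older systems see the content.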
How JATSPack and JATSPAN help Interchange
  • By lowering the activation energy (just a little) at several rate-limiting steps in the reaction:
      • Easier to customize ... correctly and robustly
      • Easier to package
      • Easier to share
      • Easier to discover
      • Easier to install
Closing remarks
Format is not JATS specific
  • This format could be used to package customizations of any other XML standard.
  • I hope to merge my extensions back into EXPath-pkg
  • Could use CXAN
Future work
  • Current work (not as far along as I'd hoped).
      • Adding other existing resources (Relax NG and docs) to core bundle.
      • Finish up the examples described in the paper.
  • Get the JATS core bundle to be packaged with oXygen XML editor as JATS “framework”.
      • This is an idea/suggestion. Don’t know if it would be acceptable, but I think it’s a good fit.
Future work - forwards compatible extension mechanism
  • I think JATSPack is an important first step, but more work is needed to realize this goal.
  • A lot of prior work on this topic.
  • Eliot seems to have some ideas.
Future work - JATSPAN
  • Throw examples and how-tos on the JATSPAN site
  • JATSPacks should be usable directly off of JATSPAN, without installing to a local machine
  • Should be able to browse package documentation on JATSPAN, w/o downloading
  • JATSPAN could provide document instance tools, such as a validator, style checker, and document previewer.
      • Not just for DTDs but for any of the schema languages in the JATSPack.
      • "Roma for JATS"
Help!
  • Suggestions / criticisms welcome
  • Jatspan-users mailing list
      • https://lists.sourceforge.net/lists/listinfo/jatspan-users
      • jatspan-users@lists.sourceforge.net (No need to subscribe. Just send “+1” to this address. It will help!)
  • Help with development
  • Help with ideas
Links
  • Sourceforge site
  • Latest version of Balisage paper
  • Sourceforge project
  • JATSPAN (Coming soon!)
  • Interesting article on ZooKeys / TaxPub / Species-ID
Candy

JATSPack and JATSPAN, a packaging format specification and a web site

  • 1. JATSPack and JATSPAN, a packaging format specification and a web site (mostly) for schema customizations. Chris Maloney August 4, 2011
  • 2. Note JATSPack and JATSPAN are not part of the NLM/NISO JATS. JATSPack is a proposed specification that is completely independent of the tag suite. JATSPAN is a non-commercial web site with no affiliation with NLM or NISO.
  • 3. Extensibility, Customizability, and Interchange Several different perspectives Eliot Kimber: It's all about interchange The goal should be "blind" interchange:“By blind interchange I mean interchange that requires the least amount of pre-interchange negotiation and knowledge exchange between interchange partners.” JATSPack is about lowering the barrier to interchange, but not quite down to the level of "blind" (depending on how you define “least”).
  • 4. Extensibility, Customizability, and Interchange Wendell Piez: The problem with schema extensibility:“Extensions to a tag set, even as they successfully address new requirements, raise interoperability issues with systems that do not know about them.” “... we have a devil's choice: fork or bloat.” But, maybe schema extensions can be made more manageable
  • 5. Expressiveness vs. Interoperability Yes, there’s a tradeoff But maybe not zero-sum Maybe we can push both forward together
  • 6. Extensions and customizations happen When a publisher needs a feature, they will find a way to get it in. Standards bodies are sometimes, maybe, a little bit too slow. Leading to extending the "wrong" way: Documents that ostensibly are the same "type", but that are not interchangeable, because of different special vocabularies or tagging styles. Leading to interchange problems.
  • 7. Schema languages are designed for this Provide the proper ways to extend and customize DTD, W3C Schema, Relax NG, Schematron, and NVDL exist for a reason XML = "Extensible Markup Language" Escape hatches are necessary, but there are advantages to using core schema technologies.
  • 8. Users should customize They know their requirements better than others. The environment is evolving too fast. (Can’t emphasize this enough) Crowdsourcing might be a solution But, crowdsourcing needs an infrastructure.
  • 9. Interoperability problems These are real Maybe, these are part of the cause: Lack of a standard way to communicate customizations Dearth of simple, step-by-step tutorials and examples on doing customizations right.
  • 10. Motivation for JATSPack Facilitate systems that can use many different schema types easily Ease the installation of the complete set of all JATS schemas. Ease reuse and interchange of schema customizations. Ease reuse and interchange of libraries that go along with customizations. These should, in turn, allow for easier interchange of document instances.
  • 12. Requirements JATSPacks should be usable on existing systems without any special infrastructure Avoid the "chicken/egg" problem to adoption Backwards compatibility with core JATS Don’t reinvent the wheel Reuse/extend some existing packaging specification
  • 13. What is JATS? Journal Article Tag Suite Old name: NLM Journal Archiving and Interchange Tag Suite Recent NISO standard for trial use
  • 14. What is JATS? Primarily for publishing journal articles. Used for other things too (books, archiving). Many “flavors” and versions. Mostly used as DTDs. Also distributed as W3C Schema and Relax NG.
  • 15. JATSPack A packaging format specification based on Florent Georges' EXPath packaging A way to package schema customizations and extensions And more: XProc, XQuery, XSLT, and XPath code libraries OASIS catalog files Documentation and other resources Some metadata
  • 16. Extension of EXPath Packaging (EXPath-pkg) JATSPack will be forwards-compatible Right now there are some incompatibilities. Every JATSPack is an EXPath package Zip file with a .xar extension Every package has an abbreviated name (abbrev) (one-part, two-part, or hierarchical?) Contains a top-level package descriptor. Any JATSPack-enabled system should be able to use EXPath packages from CXAN. EXPath-pkg is already supported by several tools
  • 17. JATSPack is also a Zip file Forwards-compatible extension of a simple Zip file Can be used without any special infrastructure Simply by unpacking the Zip file to the right place, And adding a "nextCatalog" entry in your master catalog file (Note: this introduced an incompatibility with EXPath packaging. I require that the on-disk repository layout be the same as the in-zip directory layout; that is, that the install process does no moving of files around after they are unzipped.)
  • 18. JATSPack packaging of related resources DTDs, W3C Schemas, Relax NG, Schematron, NVDL OASIS catalog files XQuery, XSLT to provide function modules XProc to bind them all Documentation Examples Self-tests
  • 19. JATSPack directory structure [root] abbrev-1/ abbrev-2/ version/ README.txt (optional) expath-pkg.xml catalog.xml dtd/rng/rnc/xsd/xslt/xquery/xproc/ doc/ samples/ resources/ test/
  • 22. JATSPAN JATSPack archive network Analogous to CPAN or CXAN A web site jatspan.org Allows authors to share and reuse JATSPacks Allows other users to discover relevant JATSPacks
  • 23. jatspan A client program Not necessary to use JATSPacks Manages local repositories At a specified directory on the local filesystem Contains a master OASIS catalog file Automates installation of JATSPacks Resolves dependencies, and downloads and installs prerequisite packs Here's how you use it to install the TaxPub JATSPack: jatspan install taxpub-schema
  • 24. Use Cases / Examples
  • 25. Use Case - A publisher evaluates JATS for the first time JATS has many flavors and versions (currently 34 permutations) Downloadable from the NLM archive_dtd and JATS FTP sites Can seem overwhelming and complicated Many publishers still use older versions for their published articles Each flavor / version is distributed as a separate, flattened Zip file that includes the bundled version of all of the files for that particular set Installation of each requires manually tweaking the OASIS catalog file Difficult/tedious to configure a system that can use all/any of them simultaneously
  • 26. Example: Core JATS Bundle Each of the 34 flavors and versions has been refactored as a JATSPack Core modules factored out into "core" JATSPack Each pack has an OASIS catalog file that references only the modules in that pack All of these can be downloaded and installed as a single bundle. Bundle has a single top-level OASIS catalog file Currently just the DTDs (not W3C schema or Relax NG) Also includes sample XML instance documents
  • 27. Changes to the core JATS Might be controversial (I don't know) Mostly necessitated by changing the directory structure and moving files around Changed relative URIs that cross-reference between the modules Cleaned up some discrepancies in old versions Didn't change any top-level public identifiers My bundle is 100% compatible (I’m 99% sure of this)
  • 28. Use Case - A publisher develops a new JATS customization There is an ongoing sea change in the nature of journal articles Articles are no longer limited to the (print media) figures, tables, and equations. The lines between traditional definitions of media types, such as journal articles, books, wikis, blog posts, data-only articles, presentations, etc., are continually getting blurred Open-science movement Scientists are sharing their data more often. Grass roots efforts to bypass traditional publishing models This trend is moving/evolving very fast We cannot anticipate what will be the needs of the users
  • 29. Supplemental materials Supplemental material (data) moving into main content “Pseudo-supplemental”: essential material, but doesn't "fit" into the journal article. (Sasha Schwarzman) Also called "integral content". E.g., Cell doesn't embed movies because they don't fit into PDFs. Sasha quoted E. Marcus: “... over time the concept of supplemental material will gradually give way to a more modern concept of a hierarchical or layered presentation in which a reader can define which level of detail best fits their interests and needs.” We need to be facilitating this transformation
  • 30. JATS Customizations JATS was designed in a modular, extensible way, but the barrier to customizing is still high Alternatives to customizing: Suggest a change to the standard, and wait Create a local customization, and forego interchangeability Pseudo-customization
  • 31. Pseudo-customization Strategies Put the data into a separate file and link to it. CDATA section (à la RSS) "Escape hatches" with custom vocabulary Processing Instructions These are all ways of getting around the DTD (schema) So validation has to use a different mechanism This is the tail wagging the dog: the DTD (schema) should work for us, not the other way around.
  • 32. JATS and supplemental data JATS has "escape hatches" for different kinds of data objects, and links to external objects. But it would often make more sense to include it natively. Bottom line: extensions and customizations will happen. It would be nice to have an infrastructure for communicating and managing them.
  • 33. Example - TaxPub Customization of JATS Allows inclusion of Taxonomic treatments into journal articles As described by Terry Catapano at last year's JATS-Con Used in ZooKeys, published by PenSoft. Articles are simultaneously released to the Species-ID wiki
  • 34. TaxPub JATSPack Named “taxpub-schema” Directory structure: taxpub/ schema/ 0.1/ dtd/ doc/ samples/
  • 35. Converting TaxPub into a JATSPack Fixed relative system identifiers Fixed doctype declarations, for example: From: <!DOCTYPE article SYSTEM "../tax-treatment-NS0.dtd"> To: <!DOCTYPE book PUBLIC "-//TaxonX//DTD Taxonomic Treatment Publishing DTD v0 20100105//EN" "../dtd/tax-treatment-NS0.dtd"> Created an OASIS catalog file Zip the results into a .xar file. Upload to JATSPAN
  • 36. TaxPub to JATSPack: advantages The advantages are not dramatic Lower the activation energy for others to discover and install Increase visibility Could allow for inclusion of (for example) XSLT libraries, self tests, documentation, in a consistent way Easier for some other developer to extend TaxPub
  • 37. Use Case - Publisher or archive adds support for new document type Currently there is no standard way of packaging the information relevant to a document type. Installation is not especially hard, but does require some expertise and coordination of resources
  • 38. Use Case - Publisher or archive adds support for new document type With JATSPack but not jatspan: Just download the Zip file, unzip it, and update your catalog file With jatspan: jatspan install This automatically resolves dependencies.
  • 39. Use Case - JATS-related libraries These are not schema extensions; just code libraries. Right now, there is no standard way to deploy a library Advantages here are the same as for EXPath packaging In fact they could be deployed as EXPath packages.
  • 40. Example - Journal Publishing 3.0 Preview Stylesheets as a JATSPack By Wendell Piez, presented at JATS-Con last year Repackaged as a JATSPack Adapted to use XProc Not a major improvement, but, again, incrementally lowers the activation energy to find/install/use/extend these. Especially extend Other authors could write new JATSPacks that depend on these. When those are installed, the dependency would be resolved automatically.
  • 41. Example, JATS-to-EPub transformation By Laura Kelly, presented at JATS-Con last year Depends on the preview stylesheets mentioned above This could be deployed without the preview stylesheets, and that dependency would be resolved by jatspan
  • 43. JATSPack supports any schema language Schematron is the best language to use for validation -- Eliot Kimber Relax NG is very expressive and easy to use NVDL looks cool The documentation is the final word (Eliot again) JATSPack can (should) include this documentation
  • 44. Forwards compatibility - review Means that newer documents (version 2) can be used by existing/old processing systems (version 1). E.g. "must ignore" pattern of extensibility of HTML HTML renderers must ignore any tags that they don't understand This is a forwards-compatibility extension substitution rule This allows future designers to customize the HTML schema, adding elements and attributes, while being able to predict how document instances in the new schema will be processed by old systems.
  • 45. Forwards compatibility – we can do better than HTML TaxPub, for example, adds new elements and attributes The package could include XSLTs that transform those into "standard JATS" More powerful extension forward-compatibility substitution rules. Gets close to useful, blind interchange
  • 46. How JATSPack and JATSPAN help Interchange By lowering the activation energy (just a little) at several rate-limiting steps in the reaction: Easier to customize ... correctly and robustly Easier to package Easier to share Easier to discover Easier to install
  • 48. Format is not JATS specific This format could be used to package customizations of any other XML standard. I hope to merge my extensions back into EXPath-pkg Could use CXAN
  • 49. Future work Current work (not as far along as I'd hoped). Adding other existing resources (Relax NG and docs) to core bundle. Finish up the examples described in the paper. Get the JATS core bundle to be packaged with oXygen XML editor as JATS “framework”. This is an idea/suggestion. Don’t know if it would be acceptable, but I think it’s a good fit.
  • 50. Future work – forwards compatible extension mechanism I think JATSPack is an important first step, but more work is needed to realize this goal. A lot of prior work on this topic. Eliot seems to have some ideas.
  • 51. Future work - JATSPAN Throw examples and how-tos on the JATSPAN site JATSPacks should be usable directly off of JATSPAN, without installing to a local machine Should be able to browse package documentation on JATSPAN, w/o downloading JATSPAN could provide document instance tools, such as a validator, style checker, and document previewer. Not just for DTDs but for any of the schema languages in the JATSPack. "Roma for JATS"
  • 52. Help! Suggestions / criticisms welcome Jatspan-users mailing list https://lists.sourceforge.net/lists/listinfo/jatspan-users jatspan-users@lists.sourceforge.net (No need to subscribe. Just send “+1” to this address. It will help!) Help with development Help with ideas
  • 53. Links Sourceforge site Latest version of Balisage paper Sourceforge project JATSPAN (Coming soon!) Interesting article on ZooKeys / TaxPub / Species-ID
  • 54. Candy
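
The manual installation route described on slides 17 and 38 (unzip the pack to the right place, then add a "nextCatalog" entry to the master catalog file) might look roughly like the Python sketch below. This is not the jatspan client and is not taken from the talk: the function name install_pack, the choice of a master catalog called catalog.xml at the repository root, and the use of repository-relative paths in the nextCatalog entries are all assumptions made for illustration. The sketch also rewrites the master catalog with ElementTree, so comments or a DOCTYPE declaration in an existing catalog would not be preserved.

```python
import zipfile
import xml.etree.ElementTree as ET
from pathlib import Path

# Namespace of OASIS XML Catalogs.
CATALOG_NS = "urn:oasis:names:tc:entity:xmlns:xml:catalog"
ET.register_namespace("", CATALOG_NS)


def install_pack(xar_path, repo_dir, master_name="catalog.xml"):
    """Unpack a .xar file into repo_dir and register its catalogs.

    Hypothetical helper: returns the repository-relative catalog paths
    that were newly added to the master catalog.
    """
    repo = Path(repo_dir)
    repo.mkdir(parents=True, exist_ok=True)

    with zipfile.ZipFile(xar_path) as xar:
        # The on-disk layout is the in-zip layout; nothing is moved afterwards.
        xar.extractall(repo)
        # Assumption: each pack keeps its catalog at <abbrev>/.../<version>/catalog.xml.
        pack_catalogs = [
            name for name in xar.namelist() if name.endswith("/catalog.xml")
        ]

    master = repo / master_name
    if master.exists():
        tree = ET.parse(str(master))
        root = tree.getroot()
    else:
        root = ET.Element(f"{{{CATALOG_NS}}}catalog")
        tree = ET.ElementTree(root)

    tag = f"{{{CATALOG_NS}}}nextCatalog"
    already = {el.get("catalog") for el in root.findall(tag)}
    added = []
    for rel in pack_catalogs:
        if rel not in already:  # keep repeated installs idempotent
            ET.SubElement(root, tag, {"catalog": rel})
            added.append(rel)

    tree.write(str(master), encoding="utf-8", xml_declaration=True)
    return added
```

Calling install_pack("taxpub-schema.xar", "repo") on a pack laid out as on slide 34 would unpack it under repo/taxpub/schema/0.1/ and point the master catalog at that pack's own catalog file. Dependency resolution, which the talk assigns to the jatspan client, is deliberately left out.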