This document provides a progress report on the NISO/NFAIS Working Group on journal article supplemental materials. It discusses challenges around classifying, marking up, and preserving supplemental materials. It summarizes recommendations from the Business Working Group on peer review, formatting, and discoverability of supplemental content. Technical challenges include heterogeneity of content and determining appropriate granularity for markup. Integration with JATS is discussed. The future may involve more hierarchical presentation of materials. A stakeholders group will provide feedback on draft recommendations.
2011-09-27-JATS-Con-Presentation_Schwarzman
1. NISO/NFAIS Supplemental Journal
Article Materials Working Group:
A Progress Report
Alexander (‘Sasha’) Schwarzman
American Geophysical Union
sschwarzman@agu.org
Co-chair, NISO/NFAIS Working Group on
Journal Article Supplemental Materials
JATS-Con 2011: Journal Article Tag Suite Conference 2011
Bethesda, MD
27 September 2011
2. Deluge: sup. mat. ratio
Bethesda, MD 27 September 2011 JATS-Con 2011 2
Chart courtesy of Ken Beauchamp, American Society for Clinical Investigation
3. Deluge: sup. mat. size
Average size of a Journal of Neuroscience article and supplemental material
Source: Maunsell, J. (2010), Announcement regarding supplemental material,
The Journal of Neuroscience 30(32): p.10599
4. What is in the Pandora’s box?
• Multimedia
• Gene sequences, protein structures, chemical
compounds, crystallographic structures, 3-D images
• Computer programs (algorithms, code, libraries, and
executables)
• Text, Tables, Figures (Materials and methods,
Extended methodology, Survey results,
Bibliographies, Derivations, …)
• Datasets (datasets are not the only type of sup. mat.)
Bethesda, MD 27 September 2011 JATS-Con 2011 4
5. Supplemental materials: Yes, we can!
• Enabling technology makes it possible for:
• authors to present supporting evidence,
e.g., datasets, multimedia
• researchers to present in-depth studies that
would not be available in print
• readers to replicate experiments and verify
results
6. Yes, we can… But should we?
• Do I (reader, reviewer) need to look at sup.
mat.? [Degree of importance]
• How do I (librarian, indexer) know sup. mat.
exists? How do I find it? [Discoverability]
• How do I cite / link to sup. mat.?
[Identification and Linking]
• Will sup. mat. be there in 20 years?
… 200 years? [Viability]
7. Yes, we can… But should we? (cont’d)
• Will sup. mat. be renderable/executable?
[Conversion/Forward migration]
• Do I see the original? [Preservation/Longevity]
• How do I send sup. mat. out? How do I know
nothing was lost in transmission? [Packaging]
• Who has custody? [Curatorial responsibility]
• Who owns it? [Intellectual property rights]
• Who pays for curating? [Business models]
8. Who cares? You should – if you are an …
• Author / Editor
• Reviewer
• Reader
• Publisher
• Hosting platform / Institutional Repository /
Data center / Individual
• A&I service
• Reference linking and Citation indexing service
• Librarian / Archivist / Historian of scholarship
9. Chronology
• February 2009: NFAIS Best Practices for
publishing journal articles
• November 2009: Schwarzman’s Report on
supplemental materials survey results
• January 2010: NISO/NFAIS supplemental
materials Thought Leader Roundtable
• August 2010: NISO/NFAIS Working Group
on journal article supplemental materials
13. Classification
• Supplemental materials
Additional content (“truly supplemental”)
Integral content (“pseudo-supplemental”)
For technical, business, or logistical reasons, it is
treated as if it were supplemental – but it is not!
• Related content
Generally resides in an official data center or
institutional repository. The publisher has no
responsibility or authority over it and does not host it.
No recommended practices offered.
14. Additional content
Provides a relevant and useful expansion of the article in
the form of text, tables, figures, multimedia, or data. May
help any reader achieve a deeper understanding of the
work through added detail and context.
Examples: expanded methods sections and bibliographies;
additional supporting data or results; copies of
instruments/surveys; and multimedia and interactive
representations of additional, relevant, and useful
information.
Generally, the author has created this content and the
publisher hosts it or places it on the open web.
15. Integral content
Essential for the full understanding of the work by the general
scientist or reader in the journal’s discipline, but placed
outside the article for technical, business, or logistical
reasons.
Examples: descriptions of methods needed to evaluate a
study, review, or technical report; detailed results required to
comprehend outcomes; tables, figures, or multimedia with
primary data required to verify/fully understand the work.
In general, the publisher maintains responsibility for hosting
and curating this content in the same way the article itself is
treated. (For some specialized journals, content held in an
external repository may be considered integral.)
16. Related content
Other content the author wishes to make the reader aware
of because it may add to the understanding of the work
or to the replication or verification of the results.
Examples: data used, created, or deposited by authors and
held in external repositories, gene sequences, protein
structures, crystallographic structures, digital recordings,
3-D images, and chemical compounds.
Generally resides in an official data center or institutional
repository. Because the publisher lacks any authority
over this type of content, no recommended practices are
offered. However, some recommendations on
preservation plans and repositories are included.
17. BWG recommendations
Aspect | Integral content | Additional content
Selecting / Peer reviewing | At the same level as core article | May not be reviewed at the same level
Copyediting | At the same level as core article; should be noted if not | May not be edited at the same level; if so, should be noted
Referencing within article | Cite/link at the same level as table or fig.; no ref. list entry, for this content is part of article | Provide in-text citation and link at the appropriate point in text, rather than at the end
Citing from other pubs | Not to be cited separately; cite article as a whole | Can be cited separately
References within sup. mat. | Integrate references into the ref. list of the core article | Keep references separate from the core article ref. list
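As a minimal sketch, assuming a Python-based production pipeline, three rows of this table can be encoded as a lookup that downstream tools consult. The `Importance` enum, the `Handling` fields, and `handling_for` are hypothetical names; only the rules themselves come from the table, and Related content deliberately maps to nothing because no practices are recommended for it.

```python
from enum import Enum
from typing import NamedTuple, Optional


class Importance(Enum):
    """The three classes of content defined by the Working Group."""
    INTEGRAL = "Integral"
    ADDITIONAL = "Additional"
    RELATED = "Related"


class Handling(NamedTuple):
    # Hypothetical flags distilled from the BWG table.
    review_at_article_level_required: bool  # "At the same level as core article"
    citable_separately: bool                # "Can be cited separately"
    merge_references_into_core: bool        # "Integrate references into the ref. list"


BWG_HANDLING = {
    Importance.INTEGRAL: Handling(True, False, True),
    Importance.ADDITIONAL: Handling(False, True, False),
    # Importance.RELATED is intentionally absent: no recommended practices.
}


def handling_for(importance: Importance) -> Optional[Handling]:
    """Return the BWG rules for a class of content, or None if none are recommended."""
    return BWG_HANDLING.get(importance)
```

For example, `handling_for(Importance.ADDITIONAL).citable_separately` is `True`, while `handling_for(Importance.RELATED)` is `None`, so a pipeline must decide explicitly what to do with Related content.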
18. BWG recommendations (cont’d)
Aspect | Integral content | Additional content
Preserving | Preserve at the same level as the core article; provide the same metadata markup; include in migration plans | Take preservation into consideration when accepting; if uncertain about preservation, have author submit to a trusted repository and link to it
Managing rights | Treat rights in the same manner as the rights for the core article; anyone who has access to online article should also have access to Integral content | Determination of rights for Additional content may differ and should be transparent to users
19. BWG recommendations (cont’d)
• Managing and hosting sup. materials
If journal content is hosted by an aggregator
or other host, that host should also deliver
supplemental materials
Use persistent identifiers to ensure links to
and from core article
An author’s website is not an appropriate
place for the sole posting of supplemental
materials
20. BWG recommendations (cont’d)
• Discovering supplemental materials
Consistent placement, naming, and navigation
Indicate presence in the table of contents
Link to the Integral content from within the article
Link to the Additional content “above the fold” on
the first PDF or HTML page of the article
Aid A&I services by including metadata that
indicate the purpose and format of the
supplemental materials
21. BWG recommendations (cont’d)
• Linking to and from supplemental materials
Provide bidirectional linking to and from both
Additional and Integral content
Assign separate DOIs for Additional and Integral
content
• Providing context for supplemental materials
Do not supply README files.
Include the following elements either on a
landing page or within the content itself:
22. BWG recommendations (cont’d)
• Providing context for sup. materials (cont’d)
Article citation and DOI
Title and/or succinct statement about the
content
For multimedia: player, file extension, and size
List multiple files
Browser information, if supplemental content
rendition is browser-dependent
A separate DOI or other unique identifier
23. Technical Working Group – “how”
Co-chairs: Dave Martinsen (ACS), Sasha Schwarzman (AGU)
• Metadata, persistent identifiers, and granularity of
markup needed to support practices recommended by
the BWG
• Referencing and linking to and from supplemental
materials; handling references cited within them
• Archiving, preservation, and forward migration of
supplemental materials
• Packaging, exchange, and delivery of supplemental
materials
• Technical support for accessibility practices
recommended by the BWG
24. Metadata schema
• Supplemental material
• Core (parent) article metadata
• Type: (Additional | Integral | Related)
• Core article item being supplemented (figure,
table, etc.)
• Descriptive metadata
• Physical metadata
• Object or Object group or Object wrapper
25. Object group vs. Object wrapper
• Object group contains logically different objects that share
some common metadata, e.g., a series of graphs or images
• Object wrapper contains objects that are associated with or
represent various aspects of the same logical object, e.g.,
A chemical structure represented by:
a connection table,
an image of a molecule in a static orientation, and
an interactive application allowing manipulation by the viewer.
Protein-related information represented by:
analytical measurements,
chemical structure, and
derived structures.
26. Metadata schema (cont’d)
• Object or Object group or Object wrapper
Core article item being supplemented
Descriptive metadata
Physical metadata
Object or Object group or Object wrapper
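The recursion in slides 24 to 26 (an Object, Object group, or Object wrapper may itself contain any of the three) maps naturally onto a recursive data structure. The sketch below is one possible Python rendering, not the Working Group's schema: class and field names are assumptions, and the descriptive and physical metadata blocks are reduced to plain dictionaries. `leaf_objects` walks the tree depth first, which is what a packaging or preservation tool would need in order to enumerate every file.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Iterator, Optional, Union


@dataclass
class Metadata:
    """Simplified stand-in for the descriptive and physical metadata blocks."""
    descriptive: dict[str, str] = field(default_factory=dict)
    physical: dict[str, str] = field(default_factory=dict)
    supplements: Optional[str] = None  # ID of the core-article item (figure, table, ...)


@dataclass
class Object:
    """A single object; an archive or PDF may nest further objects or groups."""
    name: str
    children: list[Node] = field(default_factory=list)
    meta: Metadata = field(default_factory=Metadata)


@dataclass
class ObjectGroup:
    """Logically different objects sharing common metadata (e.g., a series of graphs)."""
    name: str
    members: list[Node]
    meta: Metadata = field(default_factory=Metadata)


@dataclass
class ObjectWrapper:
    """Alternative representations of one logical object (e.g., a chemical structure)."""
    name: str
    representations: list[Node]
    meta: Metadata = field(default_factory=Metadata)


Node = Union[Object, ObjectGroup, ObjectWrapper]


def leaf_objects(node: Node) -> Iterator[Object]:
    """Yield every Object that has no nested children, depth first."""
    if isinstance(node, Object):
        if not node.children:
            yield node
        inner = node.children
    elif isinstance(node, ObjectGroup):
        inner = node.members
    else:
        inner = node.representations
    for child in inner:
        yield from leaf_objects(child)


# Example: three views of one chemical structure, plus a ZIP holding one dataset.
example = ObjectGroup("sup. mat.", [
    ObjectWrapper("chemical structure", [
        Object("connection table"), Object("static image"), Object("interactive viewer"),
    ]),
    Object("archive.zip", children=[Object("data.csv")]),
])
print([o.name for o in leaf_objects(example)])
# ['connection table', 'static image', 'interactive viewer', 'data.csv']
```

Keeping the wrapper distinct from the group preserves the information that the three chemical-structure files are one logical object, even though a flat walk lists them side by side.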
27. Descriptive metadata
• ID
• version
• label
• contrib_group
• content_descriptor
• title
• language
• alt_title
• accessibility_long_desc
• summary
• subject_descriptor
• physical_form_descriptor
• ref_count
• publication_info
• creation_date
• preservation_level
• copyright
• license
• open_access
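One way to carry these fields between systems is a flat XML record. The sketch below assumes, purely for illustration, that each field name on the slide becomes an element of the same name under a hypothetical `descriptive_metadata` root; the Working Group's recommendations may specify a different serialization entirely. It rejects field names outside the list and emits the fields in slide order.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

# Field names as listed on the slide; the element spelling and order are assumptions.
DESCRIPTIVE_FIELDS = [
    "ID", "version", "label", "contrib_group", "content_descriptor", "title",
    "language", "alt_title", "accessibility_long_desc", "summary",
    "subject_descriptor", "physical_form_descriptor", "ref_count",
    "publication_info", "creation_date", "preservation_level",
    "copyright", "license", "open_access",
]


def descriptive_metadata_xml(values: dict[str, str]) -> str:
    """Serialize known descriptive-metadata fields into a hypothetical XML record."""
    unknown = sorted(set(values) - set(DESCRIPTIVE_FIELDS))
    if unknown:
        raise ValueError(f"unknown descriptive metadata fields: {unknown}")
    root = ET.Element("descriptive_metadata")  # hypothetical root element name
    for name in DESCRIPTIVE_FIELDS:  # slide order; absent fields are simply omitted
        if name in values:
            ET.SubElement(root, name).text = values[name]
    return ET.tostring(root, encoding="unicode")


print(descriptive_metadata_xml({"label": "Figure S1", "ID": "s1", "language": "en"}))
# <descriptive_metadata><ID>s1</ID><label>Figure S1</label><language>en</language></descriptive_metadata>
```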
29. TWG: Conceptual challenges:
• Heterogeneity: an archive or a document may
contain both Additional and Integral content
• Relationships: related but different objects;
alternate representations of the same object
• Recurrence: an archive (ZIP, TAR, RAR) or a
document (PDF, MS Word) may contain
nested objects and groups
• Hierarchical structure: an archive may contain
a tree with many branches and sub-branches
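The recurrence and hierarchy challenges show up as soon as a tool has to inventory an uploaded archive. As a sketch using only Python's standard `zipfile` module, the function below turns a ZIP file's member paths into a nested dictionary and, as an assumption about the desired behavior, expands ZIP files nested inside the archive as subtrees. TAR and RAR archives, and objects embedded in PDF or Word files, would need other tools.

```python
import io
import zipfile


def archive_tree(data: bytes) -> dict:
    """Map a ZIP archive to a nested dict: folders and nested ZIPs become dicts, files None."""
    tree: dict = {}
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        for name in zf.namelist():
            parts = [p for p in name.split("/") if p]
            if not parts:
                continue
            node = tree
            for folder in parts[:-1]:  # walk or create the branch for each folder
                node = node.setdefault(folder, {})
            leaf = parts[-1]
            if name.endswith("/"):
                node.setdefault(leaf, {})  # explicit directory entry
            elif leaf.lower().endswith(".zip"):
                node[leaf] = archive_tree(zf.read(name))  # recurse into a nested archive
            else:
                node[leaf] = None
    return tree
```

The resulting tree is only the physical layout; deciding which branches are Objects, groups, or wrappers, and which are Integral or Additional, still requires human or author-supplied metadata.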
30. Challenges: conceptual (cont’d)
• Granularity down: what level to choose —
entire sup. mat., groups, objects, …?
• Granularity up: link to a specific item within
the article or to the article as a whole?
• Should Related content be marked up?
• What is the extent of differences in marking
up Integral and Additional content? (Think
about tables; now think about videos)
31. Challenges (practical)
• Is sup. mat. importance “in the eye of the
beholder?” (what’s Additional to you is
Integral to me) — some beholders are
more equal than others: a decision made
upfront determines downstream processing
• Real costs, hypothetical benefits
• Business models: is sup. mat. a money
maker or a money waster?
32. Integration with JATS
• supplementary-material wrapper already
contains such typically supplemental objects as
figure, media, table – but not a structural section!
• Parameterized list of supplementary-
material attributes can be extended to include
metadata developed by the NISO Working Group
• Attribute lists of elements that could be
supplemental, e.g., table, figure, media,
section, etc., can be extended as well
• alternatives can hold Object wrappers/groups
33. Integration with JATS (cont’d)
• What is currently missing is a mechanism for
indicating whether sup. mat. is Additional,
Integral, or Related. A dedicated attribute could be
introduced for that purpose, e.g., @importance
(Additional | Integral | Related)
(Note: Elsevier 5.1 has the @role attribute that
could be used to categorize sections, figures, and
e-components)
• Or the @specific-use attribute could be used
for that purpose (expedient – but overload danger)
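To make the @specific-use option concrete, the sketch below builds a JATS `<supplementary-material>` element with Python's `xml.etree.ElementTree`. It assumes the Additional/Integral/Related values are stored in `@specific-use`; the `@importance` attribute proposed above does not exist in JATS, so using it would require a customized schema. The file name, label, and MIME values in the example call are hypothetical.

```python
import xml.etree.ElementTree as ET

XLINK_NS = "http://www.w3.org/1999/xlink"
ET.register_namespace("xlink", XLINK_NS)  # serialize as xlink:href

IMPORTANCE_VALUES = {"Additional", "Integral", "Related"}


def supplementary_material(sm_id: str, href: str, label: str, importance: str,
                           mimetype: str, mime_subtype: str) -> ET.Element:
    """Build <supplementary-material>, recording importance in @specific-use."""
    if importance not in IMPORTANCE_VALUES:
        raise ValueError(f"unexpected importance value: {importance!r}")
    sm = ET.Element("supplementary-material", {
        "id": sm_id,
        "specific-use": importance,  # expedient, but this overloads @specific-use
        "mimetype": mimetype,
        "mime-subtype": mime_subtype,
        f"{{{XLINK_NS}}}href": href,
    })
    ET.SubElement(sm, "label").text = label
    return sm


movie = supplementary_material("S1", "movie_s1.mp4", "Movie S1", "Integral", "video", "mp4")
print(ET.tostring(movie, encoding="unicode"))
```

The overload danger is visible here: any other project-specific use of `@specific-use` on the same element would now compete with the importance value.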
34. What does the future hold?
“… over time the concept of
supplemental material will gradually give
way to a more modern concept of a
hierarchical or layered presentation in which
a reader can define which level of detail
best fits their interests and needs.”
Marcus, E. (2009), Taming supplemental material,
Cell 139(1), p.11, doi:10.1016/j.cell.2009.09.021
35. Stakeholders group
A larger group to be kept apprised of
development, to serve as a source of
feedback on drafts, and to provide
community vetting of a final document.
The group list is open; anyone who would
like to track the progress of this project
and would like to potentially provide
feedback on draft work can sign up by
visiting: www.niso.org/lists/suppinfo
36. Sources
Beebe, L. (2010), Supplemental materials for Journal articles: NISO/NFAIS Joint Working
Group, Information Standards Quarterly 22(3), p.33, doi:10.3789/isqv22n3.2010.07
Carpenter, T. (2009), Journal article supplementary materials: A Pandora’s box of issues
needing best practices, Against the Grain 21(6), p.84
Marcus, E. (2009), Taming supplemental material, Cell 139(1), p.11,
doi:10.1016/j.cell.2009.09.021
Maunsell, J. (2010), Announcement regarding supplemental material, The Journal of
Neuroscience 30(32), p.10599
NFAIS (2009), Best practices for publishing journal articles, 30 pp.,
http://www.nfais.org/files/file/Best_Practices_Final_Public.pdf
Schwarzman, S. (2010), Supplemental materials survey, Information Standards Quarterly
22(3), p.23, doi:10.3789/isqv22n3.2010.05
http://www.agu.org/dtd/Presentations/sup-mat/10.3789_isqv22n3.2010.05.pdf
NISO/NFAIS Supplemental journal article materials project
http://www.niso.org/workrooms/supplemental
sschwarzman@agu.org
And JCI is not an isolated case; the situation is typical.
Average size of a Journal of Neuroscience article and supplemental material in megabytes. Values are trimmed means (5th–95th percentile) to exclude a handful of unaccountably large articles and supplemental files. Supplemental movies are excluded to facilitate comparisons because a megabyte of a movie is arguably easier to evaluate than a megabyte of text, figures, or tables. Data include only articles published in January of each year. Error bars are standard errors of the trimmed means.
In his article, Todd Carpenter, NISO’s managing director, referred to supplemental materials as a “Pandora’s box.”
Supplemental materials are not just datasets.
It seemed like such a good idea! Who could possibly quibble with it?
So, what’s the problem? As Debbie says, “Just because we can doesn’t mean we should.”
Reference linking and Citation indexing services: CrossRef, ISI Web of Knowledge, Scopus, PubMed Central
The community has responded to the challenge in various ways. Researchers appear to be split on the issue: while some argue that more supplemental materials should be made available and are optimistic about the technology’s ability to solve some of the above problems, others argue that a scholarly journal “is not a data dump” and an article “is not an FTP site.”
Publishers, too, have responded in different ways. In 2009, Cell imposed strict limits on the number and kind of supplemental materials it would accept. In 2010, The Journal of Neuroscience banned supplemental materials altogether and announced that it would embed dynamic content in its articles’ PDF format. In 2011, The Journal of Experimental Medicine introduced a policy limiting supplemental materials to “essential supporting information.”
Marcus, E. (2009), Taming supplemental material, Cell 139(1), p.11, doi:10.1016/j.cell.2009.09.021
Maunsell, J. (2010), Announcement regarding supplemental material, The Journal of Neuroscience 30(32), p.10599, http://www.jneurosci.org/content/30/32/10599.full
NFAIS Best Practices for publishing journal articles: One key recommendation on supplemental materials was that the journal make a clear connection between an article and the supplemental materials that accompany it. Once published, the supplemental materials should be considered part of the journal’s archival record and should not be changed without a clear statement of correction. Publishers, the document noted, should always supply a recommended citation as well as good, descriptive metadata for those materials. A&I services covering the journal article should include the presence of supplemental data in the article record, indicating file types and DOIs.
A mixture of publishers, librarians, archivists, A&I and citation linking services, and independent consultants, representing governmental and non-governmental, commercial and non-commercial, and academic and non-academic organizations.
This represents my own view, not that of the NISO/NFAIS Working Group.
Institutional repositories and official data centers: GenBank, Protein Data Bank, World Data Centers, Pangaea, etc.
Images and tables, if essential, have been integrated into articles for 300 years, but what if the essential content is multimedia, chemical structures, or datasets?
Let’s consider two tables, one essential and the other not: the former appears in the main article, whereas the latter is tucked into an appendix or online supplement. The essential table is marked up using a CALS or XHTML model; the non-essential one is not tagged. Now let’s consider two videos, one essential and the other supplemental: while the non-essential one can still be dumped in an online supplement, the essential video may or may not be integrated into the main article, and it may or may not have metadata attached to it.
The distinction is not just “what cannot be printed” vs. what can. A non-essential table can be printed but is still “truly supplemental.” An essential video cannot be printed but is not supplemental at all. The distinction is conceptual, based on the degree of importance of a particular object to the scholarly discourse.
The fact that a particular object is not integrated into the article for technical or business reasons should not obscure the fact that there is nothing supplemental about such content. I refer to such content as “pseudo-supplemental”: it *pretends* to be supplemental while it is not.
The adjective “supplemental” is used in two different senses: content critical for understanding the article’s conclusions isn’t supplemental at all, yet, confusingly, the same term is used to refer to both types of content.
This is a work in progress.
Obviously, object groups contain objects, but why would objects contain object groups? Because a single archive (ZIP, TAR) or document (PDF, MS Word) may contain multiple objects and groups of objects.
label: Figure S1
content_descriptor: General type of content for the material, for example, survey, extended methodology (semi-controlled vocabulary)
summary: A brief summary of the supplemental material. Typically used to describe the contents prior to downloading or for assistive devices.
subject_descriptor: Subject keyword describing the supplemental material (controlled or non-controlled vocabulary)
physical_form_descriptor: Physical form of the material, for example, table, dataset
publication_info: Who published the material (an author, publisher, or organization); may be different from the copyright holder
preservation_level: A commitment from a publisher or archive to carry the supplemental material forward. Used to help determine the likelihood that the material will be accessible in the future. Differs for Additional and Integral content. Could be implicit.
fixity: checksum, etc.
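Fixity is typically recorded as a cryptographic digest computed at ingest and rechecked after every transfer, which also answers the packaging question of whether anything was lost in transmission. A minimal sketch with Python's `hashlib`, assuming SHA-256 is the algorithm recorded in the metadata (the note does not name one):

```python
import hashlib
from typing import BinaryIO


def sha256_checksum(stream: BinaryIO, chunk_size: int = 1 << 20) -> str:
    """Hex SHA-256 of a binary stream, read in chunks so large videos need little memory."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def fixity_ok(stream: BinaryIO, recorded: str) -> bool:
    """True if the content still matches the checksum recorded in its metadata."""
    return sha256_checksum(stream) == recorded.strip().lower()
```

A receiving system might call `fixity_ok` for each file in a package, e.g. `with open(path, "rb") as fh: fixity_ok(fh, recorded)`, where `path` and `recorded` would come from a (hypothetical) package manifest.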
relationship: Relation between objects that binds them together, RDF-like: "is-a", "is-a-child", etc. Allows the intellectual unit to be preserved as a group and indicates which object is primary (or preferred).
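The relationship field can be pictured as a small set of subject, predicate, object statements. The sketch below is not RDF tooling; it uses plain Python tuples, and both the predicate `is-primary-of` and the example file names are invented for illustration (only `is-a-child` comes from the note). It recovers the members of an intellectual unit and checks that exactly one member is marked primary.

```python
from typing import List, NamedTuple


class Relation(NamedTuple):
    subject: str
    predicate: str  # "is-a-child" from the note; "is-primary-of" is hypothetical
    target: str


def members(relations: List[Relation], unit: str) -> List[str]:
    """Objects declared to be children of the given intellectual unit."""
    return [r.subject for r in relations if r.predicate == "is-a-child" and r.target == unit]


def primary(relations: List[Relation], unit: str) -> str:
    """The single preferred object of a unit; it must also be one of its members."""
    found = [r.subject for r in relations if r.predicate == "is-primary-of" and r.target == unit]
    if len(found) != 1 or found[0] not in members(relations, unit):
        raise ValueError(f"{unit} needs exactly one primary member, found {found}")
    return found[0]


# A chemical structure with three representations; the connection table is preferred.
graph = [
    Relation("connection-table.mol", "is-a-child", "compound-1"),
    Relation("molecule.png", "is-a-child", "compound-1"),
    Relation("viewer.html", "is-a-child", "compound-1"),
    Relation("connection-table.mol", "is-primary-of", "compound-1"),
]
```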
Validity – whether a file is valid in relationship to the format it claims to be; goes with format. We suggest publishers use JHOVE terminology: “well-formed”, “well-formed and valid”, “not valid”, “not well-formed”. See http://hul.harvard.edu/jhove/index.html or http://www.jhove2.org
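Python's standard library can check XML well-formedness but not validity against a DTD or schema. The sketch below therefore maps a parse attempt onto the four terms listed above and leaves validity to a caller-supplied `validator` callable, a hypothetical hook where a real schema validator would be plugged in. It illustrates the vocabulary only, not JHOVE itself, which covers many formats beyond XML.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Optional


def jhove_style_status(data: bytes,
                       validator: Optional[Callable[[ET.Element], bool]] = None) -> str:
    """Report an XML file's status using the JHOVE-style terms from the note."""
    try:
        root = ET.fromstring(data)
    except ET.ParseError:
        return "not well-formed"
    if validator is None:  # no schema available: only well-formedness is known
        return "well-formed"
    return "well-formed and valid" if validator(root) else "not valid"
```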
Then the only difference in how an integral supplemental object and a fully integrated one are marked up may be the @specific-use value.
Moving beyond the binary “Integral/Additional” divide to a “tree-ring” (concentric circles) model: assigning weights, like in relevance ranking?
False pretenses: Pseudo-supplemental (treated as if it were supplemental but it is not)
This is a screenshot of a Cell article (doi:10.1016/j.cell.2011.04.005). Movie S1 (highlighted in yellow) is mentioned in the very first (!) section of the article. The movie is perfectly integrated in the HTML: it can be played in context, on the right. Clearly, Movie S1 is essential to the article's science; there is nothing supplemental about it. And yet that very movie shows up when one clicks on the "Supplemental Information" tab. Why?! After all, this is not your grandfather's print-only journal, which has no option but to treat non-printable content as if it were supplemental; this is the Article of the Future (actually, of the Present now) itself! It's not as if Cell were incapable of integrating the movie; indeed, it incorporated the movie into the narrative very nicely, as the screenshot demonstrates. But in the PDF version, the movie is not embedded. They certainly could have embedded it, but they chose not to. And why, despite successfully integrating the essential movie into the HTML narrative, did they still place it on the Supplemental Information tab?

Puzzled, I contacted Keith Wollman, Cell's VP for Content and Operations. He wrote, "we don't currently embed movies in the article PDF because we are conservative about the archival nature of that PDF in response to concerns from librarians."

And then it dawned on me: it is not simply the paper vs. electronic divide that brings the concept of "pseudo-supplemental" materials into being; rather, it is the version of record (VoR)! Whether a journal is an old-fashioned print-only one or a cutting-edge one like Cell, it makes the same decision: treat essential content as if it were supplemental. Why? Whether the VoR is print or a static PDF (or perhaps PDF/A), the publisher is unable or unwilling to embed dynamic objects into it. On the other side of the fence are, for example, The Journal of Neuroscience, which says it will embed dynamic content in its PDF, and AGU journals. For the former, the VoR is a dynamic PDF; for the latter, XML.

There are some interesting MARKUP implications. Suppose that the Cell article contained two movies: one essential and one truly supplemental. Outside the article's context it would be impossible to ascertain which movie bore which functional relationship to the article. A visible manifestation of that is the fact that both movies would appear on the same Supplemental Information tab. This means that the implicit degree of importance (normally inferred by the reader from context) must be made explicit: this one is essential (you must watch it); that one is supplemental (you may skip it).

Interestingly enough, The Journal of Neuroscience may face the same problem: it says there is no need for supplemental materials; it will embed them into the PDF, and that's it. Now suppose a submission, again, contains two movies: one essential and one supplemental. Will the journal reject the supplemental movie and embed only the essential one? Or will it embed both? If the latter, then again the reader will not be able to tell which is which.
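From the reader's (or rendering platform's) side, explicit importance markup is what makes the two-movie case resolvable. As a sketch, assuming importance is carried in `@specific-use` as discussed for slide 33 and that each item has a `<label>`, the function below parses a JATS fragment with `xml.etree.ElementTree` and groups the supplementary materials so a platform could present "must watch" and "may skip" items separately. The article fragment is hypothetical.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET


def partition_by_importance(jats_xml: str) -> dict[str, list[str]]:
    """Group <supplementary-material> labels by their @specific-use value."""
    root = ET.fromstring(jats_xml)
    groups: dict[str, list[str]] = {}
    for sm in root.iter("supplementary-material"):
        key = sm.get("specific-use", "unspecified")  # unmarked items stay ambiguous
        label = sm.findtext("label", default=sm.get("id", ""))
        groups.setdefault(key, []).append(label)
    return groups


article = """<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <body>
    <supplementary-material id="S1" specific-use="Integral" xlink:href="movie_s1.mp4">
      <label>Movie S1</label>
    </supplementary-material>
    <supplementary-material id="S2" specific-use="Additional" xlink:href="movie_s2.mp4">
      <label>Movie S2</label>
    </supplementary-material>
  </body>
</article>"""

print(partition_by_importance(article))
# {'Integral': ['Movie S1'], 'Additional': ['Movie S2']}
```

Without the attribute, both movies would land in the same "unspecified" bucket, which is exactly the situation of the shared Supplemental Information tab.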