REPRESENTING SERIALS
METADATA IN
INSTITUTIONAL
REPOSITORIES
Lisa Gonzalez, Electronic Resources
Librarian
NASIG 2015
Washington, DC
Do we even need journal
metadata?
 “What we want is articles,” said Gorman,
calling the idea of putting them together in
things called journals “irrelevant.”
Tenopir, Carol. “The Value of the Container.” Library
Journal 131, no. 2 (February 1, 2006): 32.
Consider Discoverability
 “If you're using repository or journal management
software, such as Eprints, DSpace, Digital Commons
or OJS, please configure it to export bibliographic data
in HTML "<meta>" tags. Google Scholar supports
Highwire Press tags (e.g., citation_title), Eprints tags
(e.g., eprints.title), BE Press tags (e.g.,
bepress_citation_title), and PRISM tags (e.g.,
prism.title). Use Dublin Core tags (e.g., DC.title) as a
last resort - they work poorly for journal papers
because Dublin Core doesn't have unambiguous
fields for journal title, volume, issue, and page
numbers.”
Google Scholar Indexing Guidelines,
https://scholar.google.com/intl/en-us/scholar/inclusion.html#indexing
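As a rough illustration of the guideline quoted above, the following sketch renders one article record as Highwire Press <meta> tags. The record keys, the sample values, and the function name are hypothetical; only the citation_* tag names come from the Highwire Press convention, and a real repository platform would produce these tags in its own templates.

```python
from html import escape

# Hypothetical record keys (left) mapped to Highwire Press tag names (right).
FIELD_MAP = [
    ("date", "citation_publication_date"),
    ("journal_title", "citation_journal_title"),
    ("issn", "citation_issn"),
    ("volume", "citation_volume"),
    ("issue", "citation_issue"),
    ("first_page", "citation_firstpage"),
    ("last_page", "citation_lastpage"),
    ("pdf_url", "citation_pdf_url"),
]


def highwire_meta_tags(record):
    """Render article metadata as a list of Highwire Press <meta> tags."""
    pairs = [("citation_title", record["title"])]
    # One citation_author tag per author, in citation order.
    pairs += [("citation_author", name) for name in record.get("authors", [])]
    # Journal-level elements are emitted only when the record has them.
    pairs += [(tag, str(record[key])) for key, tag in FIELD_MAP if record.get(key)]
    return [
        '<meta name="{}" content="{}">'.format(tag, escape(value))
        for tag, value in pairs
    ]


# Invented sample record, for illustration only.
sample = {
    "title": "An Example Article",
    "authors": ["Doe, Jane", "Roe, Richard"],
    "journal_title": "Example Journal",
    "volume": 12,
    "issue": 3,
    "first_page": 45,
    "last_page": 67,
    "date": "2015/05/01",
}
for line in highwire_meta_tags(sample):
    print(line)
```

The journal title, volume, issue, and page tags are the elements that Dublin Core alone cannot express unambiguously, which is why they get their own fields here.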
Example Article Citation
Elements
 Chicago Manual of
Style
 Article Title
 Article Author
 Journal Title
 Journal Date
 Journal Volume
 Journal Issue
 Page Range
 Journal Article Tag
Suite (JATS)
 <article>
 <article-meta>
 <journal-meta>
 <contrib>
 <ref-list>
(Peroni, Lapeyre, and
Shotton, 2012)
OpenDOAR Directory for IRs
 Includes description, policies summary,
software platform, OAI-PMH availability, and
size
 Statistics for repositories include location,
frequent languages, frequent content types,
metadata and data re-use policies, and
content, submission, and preservation policies
 About 85% of repositories represented have
unknown, unstated, or undefined metadata
re-use policies
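A repository's policy statements can often be inspected through its OAI-PMH Identify response. The sketch below builds an Identify request URL and pulls any policy elements out of a response. The sample XML is invented for illustration (including the repository name and base URL), and the rule of collecting every element whose name ends in "Policy" is an assumption of this sketch, not OpenDOAR's grading method.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def identify_url(base_url):
    """Build an OAI-PMH Identify request URL for a repository base URL."""
    return base_url + "?" + urlencode({"verb": "Identify"})


def summarize_identify(xml_text):
    """Return the repository name and any *Policy elements in the response."""
    root = ET.fromstring(xml_text.strip())
    identify = root.find(OAI_NS + "Identify")
    name = identify.findtext(OAI_NS + "repositoryName")
    policies = {}
    for elem in identify.iter():
        local_name = elem.tag.rsplit("}", 1)[-1]  # drop the XML namespace
        if local_name.endswith("Policy"):
            # Collapse whitespace; an empty string means the slot is blank.
            policies[local_name] = " ".join("".join(elem.itertext()).split())
    return name, policies


# Invented sample response, trimmed to the elements used above.
SAMPLE_RESPONSE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <Identify>
    <repositoryName>Example Repository</repositoryName>
    <baseURL>http://repository.example.org/oai</baseURL>
    <description>
      <eprints xmlns="http://www.openarchives.org/OAI/1.1/eprints">
        <metadataPolicy><text>Metadata may be re-used.</text></metadataPolicy>
        <dataPolicy><text></text></dataPolicy>
      </eprints>
    </description>
  </Identify>
</OAI-PMH>
"""

print(identify_url("http://repository.example.org/oai"))
print(summarize_identify(SAMPLE_RESPONSE))
```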
Metadata Re-use in
OpenDOAR
Repository Content Types
Open Access Repository
Software
University of Michigan - DSpace
University of Michigan
Characteristics
 DCTERMS.bibliographicCitation can refer to
pre-print or publisher’s PDF
 DC.type indicates the genre is article
 DC.date.issued is year of publication
University of Queensland-
Fedora
University of Queensland
Characteristics
 Includes journal title, volume, issue, start page,
end page and date, plus ISSN – Highwire
Press tags
 Sub-type for article not contained in <meta>
tags with other Dublin Core elements, but in
<body>
 Now has Open Access Mandate Compliance
field
Columbia University - Fedora
Columbia University
Characteristics
 Includes Publisher and CU DOIs
 Includes journal title, volume, issue, start page,
end page and date – Highwire Press tags
 Uses MODS metadata schema, but not in the
<meta> tags
Columbia University MODS
Example
eLIS - EPrints
eLIS Characteristics
 eprints.type and dc.type to indicate preprint or
journal article
 eprints tags include publication title, volume,
issue number and date range
 Identifier examples include eprints.issn,
eprints.id_number, eprints.official_url, and
dc.identifier
University of Nebraska Lincoln -
Bepress
University of Nebraska Lincoln
Characteristics
 Uses bepress_citation tags – author, title and
date
 The citation information for the journal is
contained in <body>
 PDFs appear to be formatted according to
Google Scholar inclusion guidelines
Bielefeld University - LibreCat
Bielefeld University
Characteristics
 Uses Highwire Press tags
 Includes DOI
 Includes ISSN
 RDF example:
<link rel="DC.relation" href="urn:ISSN:0361-073x" />
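To compare repositories like the ones above, it helps to list which <meta> and <link> elements a landing page actually exposes. This is a minimal sketch using Python's standard html.parser; the class name and the sample page are hypothetical, and the citation_doi value in the sample is invented (only the DC.relation link is taken from the slide).

```python
from html.parser import HTMLParser


class HeadMetadataParser(HTMLParser):
    """Collect <meta name=...> and <link rel="DC...."> values from a page."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags such as <link ... />.
        attrs = dict(attrs)
        if tag == "meta" and attrs.get("name"):
            value = attrs.get("content") or ""
            self.fields.setdefault(attrs["name"], []).append(value)
        elif tag == "link" and (attrs.get("rel") or "").startswith("DC."):
            value = attrs.get("href") or ""
            self.fields.setdefault(attrs["rel"], []).append(value)


# Hypothetical landing page fragment for illustration.
SAMPLE_PAGE = """
<html><head>
<meta name="citation_journal_title" content="Example Journal">
<meta name="citation_doi" content="10.1234/example">
<link rel="DC.relation" href="urn:ISSN:0361-073x" />
</head><body></body></html>
"""

parser = HeadMetadataParser()
parser.feed(SAMPLE_PAGE)
print(parser.fields)
```

Running the same parser over several repository landing pages gives a quick picture of which tag families each platform uses and which citation elements are missing.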
UPEI – Islandora
UPEI Characteristics
 Highwire Press tags for journal citation, except
for citation_lastpage
 Additional Dublin Core elements - DC.isPartOf
also used for journal title, DC.type for Journal
Article, and DC.identifier used for PMID
 pre-print status appears in record display
CTU - CONTENTdm
CTU Metadata
 DC.Identifier – DOI
 DC.bibliographicCitation – full citation to
journal article
Starting a Data Dictionary
 Identifier – ISSN (ISSN:1612-9768)
 Identifier – DOI (URI)
 Relation-IsPartOf – journal title
 Identifier-BibliographicCitation – full citation
 Type - “Journal Article” :
http://www.ukoln.ac.uk/repositories/digirep/index/Eprints_Type_Vocabulary_Encoding_Scheme#Journal
Article
 Type - “text” : DCMI
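A data dictionary entry for ISSN can carry a validation rule. The sketch below checks the standard ISSN check digit (weights 8 down to 2, modulus 11, with X standing for 10). The function name and the handling of "ISSN:" and "urn:ISSN:" prefixes are assumptions made for illustration.

```python
def issn_is_valid(value):
    """Check the ISSN check digit, ignoring any prefix such as 'ISSN:'."""
    # Keep only the part after the last colon, e.g. 'urn:ISSN:0361-073x'.
    issn = value.rsplit(":", 1)[-1].strip().upper().replace("-", "")
    if len(issn) != 8 or not issn[:7].isdigit():
        return False
    # Weight the first seven digits by 8, 7, ..., 2.
    total = sum(int(d) * w for d, w in zip(issn[:7], range(8, 1, -1)))
    check = (11 - total % 11) % 11
    expected = "X" if check == 10 else str(check)
    return issn[7] == expected


print(issn_is_valid("ISSN:1612-9768"))
print(issn_is_valid("urn:ISSN:0361-073x"))
```

Both ISSNs that appear on these slides pass the check; a mistyped final digit would not.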
Developing Good Practices
 Try some tools to practice with Dublin Core
metadata -
http://www.dublincoregenerator.com/generator.html
 Examples of useful documentation for our
library include UIC Data Dictionary for
CONTENTdm, Best Practices for
CONTENTdm and Other OAI-PMH Compliant
Repositories
 Examples directly related to journal articles
can be scattered across many data
dictionaries, best practices, and other
Use Case – Green OA
 “About 50% journal articles published during
the past 12 months are freely available on the
Internet. Nearly half of those OA articles are
Green OA. There are millions of them on IRs,
traditional journal Web sites, authors’ social
network sites, and other Web sites.”
Xiaotian Chen, “Open Access Articles Reaching 50% But
Their Retrieval is Lagging,” CARLI Annual Meeting,
2014.
Distinguishing Article Versions
 MIT metadata indicating publisher’s PDF
Example record: http://hdl.handle.net/1721.1/92550
dc.eprint.version – Final published version
dc.relation.isversionof -
http://dx.doi.org/10.1038/srep07467
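The two MIT fields above are enough to tell a harvester which version the repository holds and which publisher record it corresponds to. A minimal sketch, assuming the record arrives as a plain dictionary keyed by field name; the function name, the output keys, and the list of DOI resolver prefixes are assumptions of this sketch.

```python
DOI_PREFIXES = (
    "http://dx.doi.org/",
    "https://dx.doi.org/",
    "http://doi.org/",
    "https://doi.org/",
)


def describe_version(record):
    """Summarize version status and publisher DOI from two Dublin Core fields."""
    version = record.get("dc.eprint.version", "").strip()
    relation = record.get("dc.relation.isversionof", "").strip()
    doi = None
    for prefix in DOI_PREFIXES:
        if relation.startswith(prefix):
            doi = relation[len(prefix):]  # keep only the DOI itself
            break
    return {
        "is_publisher_version": version.lower() == "final published version",
        "doi": doi,
    }


# Field values taken from the MIT example on this slide.
mit_record = {
    "dc.eprint.version": "Final published version",
    "dc.relation.isversionof": "http://dx.doi.org/10.1038/srep07467",
}
print(describe_version(mit_record))
```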
Use Case – Zotero Integration and
IRs
 COinS – recognizes genre as article, but can
be missing key citation elements
 Embedded Metadata – often detects journal
articles as web pages
 DOI – can record publisher’s URL, rather than
article version present in IR
 Retrieve Metadata for PDF – only works if
article is indexed in Google Scholar
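When COinS is missing key citation elements, the usual cause is that the span's title attribute carries only a few OpenURL key/value pairs. The sketch below builds a fuller COinS span for a journal article. The article dictionary keys, the function name, and the sample values are hypothetical; the rft.* keys follow the OpenURL journal format.

```python
from html import escape
from urllib.parse import urlencode


def coins_span(article):
    """Build a COinS <span> carrying journal-article citation elements."""
    pairs = [
        ("ctx_ver", "Z39.88-2004"),
        ("rft_val_fmt", "info:ofi/fmt:kev:mtx:journal"),
        ("rft.genre", "article"),
        ("rft.atitle", article.get("title")),
        ("rft.jtitle", article.get("journal_title")),
        ("rft.issn", article.get("issn")),
        ("rft.volume", article.get("volume")),
        ("rft.issue", article.get("issue")),
        ("rft.spage", article.get("first_page")),
        ("rft.epage", article.get("last_page")),
        ("rft.date", article.get("date")),
    ]
    # Drop elements the record does not have rather than emit empty values.
    query = urlencode([(key, value) for key, value in pairs if value])
    # The query string sits in an HTML attribute, so '&' must become '&amp;'.
    return '<span class="Z3988" title="{}"></span>'.format(escape(query))


# Invented sample article, for illustration only.
print(coins_span({
    "title": "An Example Article",
    "journal_title": "Example Journal",
    "volume": "12",
    "first_page": "45",
}))
```

Including the journal title, volume, issue, and pages gives a citation manager the same elements that the Highwire Press tags provide.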
Use Case – OpenURL Link
Resolver
 SFX links to Google Scholar via
getWebSearch, which is a citation title search
 Could a link resolver link to IRs individually or,
more likely, to a collection of IR metadata, such
as OpenDOAR?
WorldCat to Google Scholar to IR
Link
Final Considerations
 Start with the specific use cases for your own
institution
 Evaluate your policies in light of OpenDOAR
policy guidelines
 Don’t be afraid to share your metadata and
your documentation

Editor's Notes

  • #7 The first place we look for a repository's policies is its OAI-PMH Identify response (e.g. for Nottingham EPrints - http://eprints.nottingham.ac.uk/perl/oai2?verb=Identify). This often includes standard sections for policies. Alternatively, we look for a relevant web page in the repository - a special 'Policies' page or an 'About' page. We then analyse the policies using standard criteria, and assign a grade for each policy. If we are unable to find any policy information at all, the status is set to 'Unknown'. If there is information on policies, but the particular policy is not covered, the status is set to 'Unstated'. In some cases, there may be a slot for the relevant policy, but all it says is 'not yet defined'. In these cases we set the status to 'Undefined'.
  • #9 Sorted the OpenDOAR list of repositories to find the largest, and then picked examples from particular platforms.
  • #10 University of Michigan Deep Blue example – has some special fields added to DSpace? Zotero detects this metadata as Embedded Metadata. The OAI-PMH URL is listed in OpenDOAR. DC.type “Article” and Highwire Press tags in the HTML <head> element are basically DSpace out of the box; i.e., you don’t have to customize these, because DSpace already has the SEO features built in.
  • #12 The University of Queensland eSpace IR is able to add additional information about articles through its scoping feature – this includes a limiter for journal articles and also for Scopus articles. Where does this information come from?
  • #14 Columbia University has a Fedora repository with a MODS schema, right? They support OAI-PMH - http://academiccommons.columbia.edu/catalog/oai?verb=Identify. Appears to be MODS only, or is this Highwire Press? It saves to Zotero as DOI.
  • #15 MODS metadata plus additional <meta> tags. Includes Publisher and CU DOIs.
  • #16 MODS metadata plus additional <meta> tags. Includes Publisher and CU DOIs.
  • #17 eLIS and Google Scholar – could not retrieve metadata via Zotero for PDF I found in Google Scholar. Saves to Zotero as Embedded Metadata, but as a web page.
  • #19 University of Nebraska has a BePress repository, which uses BePress tags. BePress can be shared through OCLC(?), and is working with Google? Can’t see the citation or DOI in the meta tags, but they do appear on the main screen display; no volume, issue, page range or DOI saves to Zotero. Saves as an article in Zotero via COinS.
  • #21 This example has almost all of the expected elements of a traditional journal citation. It saves to Zotero as a DOI. Look for ISSNs in all my examples. Bielefeld is on the Linking Open Data (LOD) Cloud diagram, but their OpenDOAR listing states that their metadata re-use policy is explicitly undefined. According to the linked data model, however, they have published the dataset themselves, rather than relying on others to harvest it, perhaps? Two different approaches – active to publish, passive to let it be harvested?
  • #23 This example also saves to Zotero as DOI; the Zotero citation includes the full citation, DOI, ISSN, CrossRef, and a link to Wiley’s website. One of the DC.identifier values is the PubMed ID, which links to the PubMed article in the display.
  • #27 Example of local adaptations that people may make – these are the adaptations I looked to make with CONTENTdm.
  • #28 Are we satisfied with the lack of best practices? Is it sufficient to use DC, MODS, BePress, Highwire tags as needed for our own particular context? What would we do if we shared the data more broadly? Is linking on citation title enough? Focused on examples that were the most extensive – many data dictionaries focus on controlled vocabulary, which is primarily subject terms, but can be other things(?). CSU also focused on authority control for names. CSU's internal identifier schema does include a prefix for articles as a genre; this is for statistical purposes (p. 24).
  • #31 Compare how Zotero finds article citation information – how does it break out all the needed citation elements from an IR? Zotero is also dependent on Google Scholar to do some of the work for PDFs. Essentially, if the article can be found in Google Scholar and you save a copy of it from the IR, you should be able to use the Retrieve Metadata for PDF feature, but this doesn’t always work; it failed, for instance, with the Kousha and Thelwall article, even though it is in eprints. I found this article by searching Google, not the interface of the IR that holds it, so I did not end up on the IR’s landing page for the article; that makes it important for the PDF itself to have embedded metadata. Eprints.rclis.org has embedded metadata that was inconsistent: Zotero was able to identify a conference paper, but it identified a journal article (the De Robbio article from the example slide) as a web page. For the De Robbio article, Zotero captured the journal title as the website title, and journal article as the website type.
  • #32 Give demo of adding OpenDOAR to SFX as discoverability.
  • #33 From WorldCat, you can follow a citation to Google Scholar and on to this document in an IR, but the link uses the getWebSearch link in SFX, which is a keyword title search. Remember, the basis of Google Scholar’s links is the citation title, not the journal title. The link is available in WorldCat because of Elsevier’s partnership with OCLC.