Simple additions to metadata

My 3-minute talk for Beyond The PDF 2 (Amsterdam, #btpdf2), pointing out that scholarly publishers rarely embed rich metadata in their products, e.g. PDFs.

Sure, arguably this data is inside the PDF, but I don't want to look inside the PDF. I want my machines to be able to grep what the PDF is in a nanosecond from structured, standardised metadata. I believe the standards exist: XMP has been around for a long time. It's just not being implemented much, or fully. In the current mixed world of Open Access and subscription-access content it's *particularly* important to state the license details clearly in the content itself, in a machine-readable way.

Can we please put this information into all 'professionally' published content? Surely not a difficult task?
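
As a rough illustration of how small the task could be, here is a minimal sketch that builds (but does not run) an ExifTool command for writing license details into a PDF's XMP. The function name `build_license_command`, the file name and the license URL are hypothetical, and the tag names (`XMP-xmpRights:Marked`, `XMP-xmpRights:WebStatement`, `XMP-xmpRights:UsageTerms`, `XMP-cc:License`) are my assumption of how ExifTool exposes the XMP rights and Creative Commons fields, so check them against ExifTool's tag documentation before relying on them.

```python
from typing import List


def build_license_command(pdf_path: str, license_url: str,
                          usage_terms: str,
                          copyrighted: bool = True) -> List[str]:
    """Return an argument list for an ExifTool call that would write
    license details into a file's XMP metadata.

    Assumption: the tag names below match ExifTool's XMP-xmpRights and
    XMP-cc groups. Nothing is executed here; the list is only built.
    """
    marked = "True" if copyrighted else "False"
    return [
        "exiftool",
        f"-XMP-xmpRights:Marked={marked}",             # is the work under copyright?
        f"-XMP-xmpRights:WebStatement={license_url}",  # URL of the rights statement
        f"-XMP-xmpRights:UsageTerms={usage_terms}",    # human-readable terms
        f"-XMP-cc:License={license_url}",              # Creative Commons license URL
        pdf_path,
    ]


if __name__ == "__main__":
    # Hypothetical file name and license, for illustration only.
    command = build_license_command(
        "article.pdf",
        "https://creativecommons.org/licenses/by/3.0/",
        "CC BY 3.0",
    )
    print(" ".join(command))
```

Passing the resulting list to `subprocess.run` would perform the write, provided ExifTool is installed; the sketch stops short of that so it can be inspected safely first.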



  1. PDF Content Metadata. Ross Mounce: University of Bath, PhD Candidate; Open Knowledge Foundation, Panton Fellow & Community Coordinator, Open Science (I have many hats, these are just some). @rmounce #btpdf2
  2. Why is metadata important? Millions of scholarly articles are published per year. What is in sdarticle.pdf, and what is in sdarticle (1).pdf? I don't want to have to read PDFs myself to know roughly what's inside!
  3. Machines can help us
  4. ExifTool reads embedded metadata. Quick & easy to get the metadata of all files. XMP (Extensible Metadata Platform) is the ISO 16684-1:2012 standard. www.sno.phy.queensu.ca/~phil/exiftool/
  5. ...in practice PDF metadata is poor. Some publishers do not have any embedded XMP in their Version of Record PDFs (e.g. Taylor & Francis, from what I've seen so far) ...and I've yet to see a PDF in the wild with embedded license information. Very important! [ If I downloaded these files years ago, are they CC BY, All Rights Reserved or CC BY-NC-ND? ] Data on:
     http://rossmounce.co.uk/2013/01/06/pdf-metadata-using-exiftool/
     http://dx.doi.org/10.6084/m9.figshare.106195
     http://rossmounce.co.uk/2012/12/31/pdf-metadata-why-so-poor/
     http://dx.doi.org/10.6084/m9.figshare.105633
  6. What producer wouldn't properly label their content? Label your products! If I can't redistribute it, put that in the metadata please!
  7. Can you tell me anything about this PDF from its metadata?
     [XMP] XMP Toolkit : Adobe XMP Core 4.2.1-c043 52.389687, 2009/06/02-13:20:35
     [XMP] Producer : Acrobat Distiller 6.0 (Windows)
     [XMP] Create Date : 2008:07:16 14:00:37+05:30
     [XMP] Creator Tool : PScript5.dll Version 5.2.2
     [XMP] Modify Date : 2010:10:11 08:21:42-07:00
     [XMP] Metadata Date : 2010:10:11 08:21:42-07:00
     [XMP] Format : application/pdf
     [XMP] Creator : Administrator
     [XMP] Title : ST&HV306704.qxd
     [XMP] Document ID : uuid:6a0d428a-3141-43c3-bdbe-8446710f5c8e
     [XMP] Instance ID : uuid:f7caddbd-1dd1-11b2-0a00-15f6e8a599ff
     PDF from http://dx.doi.org/10.1177/0162243907306704
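
The question on slide 7 can also be put to a machine. The following is a minimal sketch, assuming metadata has already been dumped as text in the `[GROUP] Tag : Value` layout shown on that slide; the helper names `parse_metadata` and `has_license_info`, and the list of license-related keywords, are hypothetical choices for illustration and not part of ExifTool.

```python
import re
from typing import Dict

# Metadata dump copied from slide 7 (one field per line).
SLIDE_7_DUMP = """\
[XMP] XMP Toolkit : Adobe XMP Core 4.2.1-c043 52.389687, 2009/06/02-13:20:35
[XMP] Producer : Acrobat Distiller 6.0 (Windows)
[XMP] Create Date : 2008:07:16 14:00:37+05:30
[XMP] Creator Tool : PScript5.dll Version 5.2.2
[XMP] Modify Date : 2010:10:11 08:21:42-07:00
[XMP] Metadata Date : 2010:10:11 08:21:42-07:00
[XMP] Format : application/pdf
[XMP] Creator : Administrator
[XMP] Title : ST&HV306704.qxd
[XMP] Document ID : uuid:6a0d428a-3141-43c3-bdbe-8446710f5c8e
[XMP] Instance ID : uuid:f7caddbd-1dd1-11b2-0a00-15f6e8a599ff
"""

# "[GROUP] Tag name : value"; the value itself may contain colons.
LINE_RE = re.compile(r"^\[(?P<group>[^\]]+)\]\s*(?P<tag>.+?)\s*:\s(?P<value>.*)$")

# Assumed keywords that would hint at rights or license fields.
LICENSE_HINTS = ("rights", "license", "licence", "usage terms", "web statement")


def parse_metadata(dump: str) -> Dict[str, str]:
    """Map each tag name to its value, skipping lines that do not match."""
    fields: Dict[str, str] = {}
    for line in dump.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            fields[match.group("tag")] = match.group("value").strip()
    return fields


def has_license_info(fields: Dict[str, str]) -> bool:
    """True if any tag name looks like a rights or license field."""
    return any(hint in tag.lower() for tag in fields for hint in LICENSE_HINTS)


if __name__ == "__main__":
    fields = parse_metadata(SLIDE_7_DUMP)
    print("Title:", fields.get("Title"))
    print("License info present:", has_license_info(fields))
```

For the dump above, the title is a typesetting file name and no tag looks like a rights or license field, which is the point of the slide.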
