Simple additions to metadata

My 3-min talk for Beyond The PDF 2 (Amsterdam) #btpdf2 pointing out that scholarly publishers don't tend to embed very rich metadata in their products e.g. PDFs.

Sure, arguably this data is inside the PDF, but I don't want to look inside the PDF. I want my machines to be able to grep what the PDF is in a nanosecond from structured, standardised metadata. I believe the standards exist: there's XMP, and it's been around for a long time. It's just not being implemented much, or fully. In the current mixed world of Open Access and subscription-access content, it's *particularly* important to indicate the license details clearly in the content, in a machine-readable way.

Can we please put this information into all 'professionally' published content? Surely not a difficult task?
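
To make the request concrete, here is a minimal sketch of how a machine could check a folder of PDFs for embedded license information using ExifTool (the tool used in the slides). It assumes the `exiftool` command is installed and on the PATH. The list of tag names is an assumption about where a publisher might put rights information (Dublin Core, XMP Rights Management and Creative Commons fields as ExifTool names them); it is not a list any publisher is known to follow. The helper names `read_metadata`, `license_info` and `unlabelled` are hypothetical.

```python
import json
import subprocess

# Assumed ExifTool "group:tag" names that could carry rights or license details.
LICENSE_TAGS = (
    "XMP-dc:Rights",
    "XMP-xmpRights:Marked",
    "XMP-xmpRights:UsageTerms",
    "XMP-xmpRights:WebStatement",
    "XMP-cc:License",
)


def read_metadata(paths):
    """Run ExifTool on the given files and return one dict of tags per file.

    -j asks for JSON output; -G1 prefixes each tag with its specific group
    (for example "XMP-dc"). Requires the exiftool executable to be installed.
    """
    result = subprocess.run(
        ["exiftool", "-j", "-G1", *paths],
        capture_output=True,
        text=True,
        check=True,
    )
    return json.loads(result.stdout)


def license_info(record):
    """Return only the license-related tags that are present in one record."""
    return {tag: record[tag] for tag in LICENSE_TAGS if tag in record}


def unlabelled(records):
    """List the source files that carry none of the license-related tags."""
    return [r.get("SourceFile") for r in records if not license_info(r)]
```

Under these assumptions, `unlabelled(read_metadata(["sdarticle.pdf", "sdarticle (1).pdf"]))` would list the files whose metadata says nothing about reuse rights.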

Usage Rights

CC Attribution License

Simple additions to metadata Presentation Transcript

  • 1. PDF Content Metadata. Ross Mounce: University of Bath, PhD Candidate; Open Knowledge Foundation, Panton Fellow & Community Coordinator, Open Science (I have many hats, these are just some). @rmounce #btpdf2
  • 2. Why is metadata important? ● Millions of scholarly articles published per year. What is in sdarticle.pdf & what is in sdarticle (1).pdf? I don't want to have to read PDFs myself to know roughly what's inside!!!
  • 3. Machines can help us
  • 4. ExifTool reads embedded metadata ● Quick & easy to get the metadata of all files. XMP (Extensible Metadata Platform): ISO 16684-1:2012 standard. www.sno.phy.queensu.ca/~phil/exiftool/
  • 5. ...in practice PDF metadata is poor. Some publishers do not have any embedded XMP in their Version of Record PDFs (e.g. Taylor & Francis, from what I've seen so far) ...and I've yet to see a PDF in the wild with embedded license information. Very important! [ If I downloaded these files years ago, are they CC BY, All Rights Reserved or CC BY-NC-ND? ] Data on:
    http://rossmounce.co.uk/2013/01/06/pdf-metadata-using-exiftool/
    http://dx.doi.org/10.6084/m9.figshare.106195
    http://rossmounce.co.uk/2012/12/31/pdf-metadata-why-so-poor/
    http://dx.doi.org/10.6084/m9.figshare.105633
  • 6. What producer wouldn't properly label their content? Label your products! If I can't redistribute it, put that in the metadata please!
  • 7. Can you tell me anything about this PDF from its metadata?
    [XMP] XMP Toolkit : Adobe XMP Core 4.2.1-c043 52.389687, 2009/06/02-13:20:35
    [XMP] Producer : Acrobat Distiller 6.0 (Windows)
    [XMP] Create Date : 2008:07:16 14:00:37+05:30
    [XMP] Creator Tool : PScript5.dll Version 5.2.2
    [XMP] Modify Date : 2010:10:11 08:21:42-07:00
    [XMP] Metadata Date : 2010:10:11 08:21:42-07:00
    [XMP] Format : application/pdf
    [XMP] Creator : Administrator
    [XMP] Title : ST&HV306704.qxd
    [XMP] Document ID : uuid:6a0d428a-3141-43c3-bdbe-8446710f5c8e
    [XMP] Instance ID : uuid:f7caddbd-1dd1-11b2-0a00-15f6e8a599ff
    PDF from http://dx.doi.org/10.1177/0162243907306704
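
The question on slide 7 can also be put to a machine. The sketch below parses the ExifTool text output shown on that slide and reports which descriptive fields are absent. The checklist in `WANTED` is a hypothetical choice of fields one might hope to find (named as ExifTool usually prints them), not a requirement of any standard, and the function names are made up for illustration.

```python
import re

# Matches lines such as "[XMP] Creator Tool : PScript5.dll Version 5.2.2".
# The tag name ends at the first colon; the value may itself contain colons.
LINE_RE = re.compile(r"^\[([^\]]+)\]\s*(.*?)\s*:\s?(.*)$")

# The XMP dump from slide 7, one tag per line.
SAMPLE = """\
[XMP] XMP Toolkit : Adobe XMP Core 4.2.1-c043 52.389687, 2009/06/02-13:20:35
[XMP] Producer : Acrobat Distiller 6.0 (Windows)
[XMP] Create Date : 2008:07:16 14:00:37+05:30
[XMP] Creator Tool : PScript5.dll Version 5.2.2
[XMP] Modify Date : 2010:10:11 08:21:42-07:00
[XMP] Metadata Date : 2010:10:11 08:21:42-07:00
[XMP] Format : application/pdf
[XMP] Creator : Administrator
[XMP] Title : ST&HV306704.qxd
[XMP] Document ID : uuid:6a0d428a-3141-43c3-bdbe-8446710f5c8e
[XMP] Instance ID : uuid:f7caddbd-1dd1-11b2-0a00-15f6e8a599ff
"""

# Hypothetical checklist of fields that would help identify an article
# and its reuse terms; an assumption for illustration only.
WANTED = ("Subject", "Description", "Identifier", "Rights", "Usage Terms")


def parse_exiftool_text(text):
    """Turn "[Group] Tag : value" lines into a {tag: value} dict."""
    tags = {}
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            _group, name, value = match.groups()
            tags[name] = value.strip()
    return tags


def missing_fields(tags, wanted=WANTED):
    """Return the wanted tag names that are absent or empty."""
    return [name for name in wanted if not tags.get(name)]


if __name__ == "__main__":
    tags = parse_exiftool_text(SAMPLE)
    print("Title:", tags.get("Title"))
    print("Creator:", tags.get("Creator"))
    print("Missing:", missing_fields(tags))
```

For this PDF, the only Title and Creator values present are a typesetting file name and "Administrator", and none of the fields on the checklist appears at all.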