On my way to the meeting, I was watching a very moving, epic work of art that's definitely going down in cinematic history. It's a story of the human pursuit, dogged and persistent, to change the world for the better, using knowledge and grit in intrepid ways to save lives, protect the helpless, and save the environment. This great movie is Baywatch. And a convicting truth emerged when I watched this profound film. Who's seen it? C'mon, don't be shy… Well, this is a cultured bunch.
In the same vein, we do what we do because we know knowledge and scholarly research matter, and the fruits of our labors are the same ones shared by the heroes and heroines of Baywatch. That's why we're all here. Especially Geoffrey Bilder, whose beard makes him look like kelp in water. He's basically the bearded version of The Rock of scholarly communications.
But Baywatch has questioned everything that I know about metadata. I'd like to ask us to take a step back and ask a central question: WHAT IS METADATA? My sense is that our notion of it is outdated, outmoded, too atrophied to adequately reflect what metadata is in our digital world. Yes, we have all the elements of bibliographic metadata: authors, ISSN/ISBN, title, publication date, even page numbers? (PAGE NUMBERS!)
Perhaps our description of it remains salient: data about data (“a set of data that describes and gives information about other data”). But what constitutes it may require us to shift our current thinking in the context of today's research enterprise, where research data lives in many places, in numerous forms and formats, moving seamlessly (or not so seamlessly) across systems, exchanged and transacted with no central power guiding or directing it.
Scholarly metadata certainly contains the standard bibliographic information, but I'm going to thread together new kinds, possibly surprising ones, that are also integral to the Crossref corpus: peer reviews, Event Data, and data & software citations.
So you publishers of scholarly content acquire and prepare the research results, tag them, and then disseminate them to the world with your metadata, which goes to Crossref.
A core part of the process, one that ensures research integrity and serves your communities, is managing the peer review process: getting experts to review and discuss the results communicated. Some of our members have been sharing these discussions online; some have even been registering the content with us as datasets, components, and journal articles.
In a month we will be rolling out a new content type to properly support review assets. We'll be extending our existing infrastructure for scholarly discussion, both pre- and post-publication.
This content type will accommodate a wide range of assets from the peer review history of a journal article…
So we have a new content type that will insert into the Crossref corpus a rich new set of metadata for scholarly discussion recognized (or identified) as peer review. But scholars share, critique, and recommend research in numerous forms, all over the web (in scholarly and non-scholarly places), not just on publisher platforms. We are also collecting metadata about these activities through a new service, now in beta, called Crossref Event Data.
Events cover a number of actions: the addition or removal of a reference on a Wikipedia page, a tweet, a bookmark on a sharing platform, an annotation, a recommendation. Event Data is not the content itself but metadata about these activities, linking each one to a piece of research in the Crossref corpus (where did it occur, who did it, when did it happen, etc.).
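To make this concrete, here is a minimal sketch of pulling events for a single work from the Event Data query API as it stands in beta. The endpoint and field names follow the beta documentation and may change; the DOI and contact email are placeholders.

```python
# A minimal sketch (beta API, subject to change): fetch events that
# point at one work. The DOI and contact email are placeholders.
import requests

resp = requests.get(
    "https://api.eventdata.crossref.org/v1/events",
    params={
        "obj-id": "10.5555/12345678",  # placeholder DOI
        "mailto": "you@example.org",   # contact address for the polite pool
    },
)
resp.raise_for_status()

for event in resp.json()["message"]["events"]:
    # Each event answers: who did what, where, and when?
    print(event["source_id"], event["relation_type_id"],
          event["subj_id"], "->", event["obj_id"], event["occurred_at"])
```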
This infrastructure effectively detaches upstream event aggregation from downstream altmetrics services in the altmetrics supply chain, making specialization possible in the production process and increasing efficiency in the entire system.
The value of this data is what we call “Evidence First”. Data aggregation has been, and continues to be, a black box, especially when the data comes from numerous locations. This is what we call the evidence gap: a mysterious black hole between the data our agent collected from the source and the corresponding ‘event’ we then produced in our service. Event Data fills the evidence gap by providing a full Evidence Record for every piece of external data we receive. We make the entire corpus of records available so that it can be verified and audited. This transparency serves as the bedrock for building trust in the data, as well as in all applications of the data by Event Data consumers.
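As a sketch of what that auditing could look like in practice: the beta docs expose the Evidence Record as a URL on each event (an "evidence_record" field, as I understand it, and not provided by every source), so a consumer can walk from any event back to the raw data behind it. Names here may change before the service leaves beta.

```python
# A minimal sketch: audit one event by following its Evidence Record.
# Field names follow the beta docs ("evidence_record" holds a URL) and
# may change; not every source provides a record.
import requests

events = requests.get(
    "https://api.eventdata.crossref.org/v1/events",
    params={"obj-id": "10.5555/12345678", "mailto": "you@example.org"},
).json()["message"]["events"]

for event in events:
    url = event.get("evidence_record")
    if url:
        record = requests.get(url).json()
        # The record preserves the raw input the agent collected, so the
        # path from source data to event can be verified end to end.
        print(event["id"], "is backed by", url, "with keys", sorted(record))
        break
```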
The last set of metadata I’d like to touch on is data and software links.
As for the how: depositing these data and software links is part of the standard deposit workflow (no changes needed!), either through references or through the relations type. For more details, we have a deposit guide online, which you can get to easily from our documentation site: https://support.crossref.org/hc/en-us/articles/215787303-Crossref-Data-Software-Citation-Deposit-Guide-for-Publishers
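For the relations route, here is a minimal sketch of the markup that would ride along inside a standard deposit; element and attribute names follow the Crossref relations schema, and both DOIs are placeholders.

```python
# A minimal sketch of the relations markup that links an article to a
# dataset inside a standard deposit. Element and attribute names follow
# the Crossref relations schema; both DOIs are placeholders.
relation_fragment = """
<rel:program xmlns:rel="http://www.crossref.org/relations.xsd">
  <rel:related_item>
    <rel:description>Data underlying the reported results</rel:description>
    <rel:inter_work_relation relationship-type="isSupplementedBy"
                             identifier-type="doi">10.5061/dryad.example</rel:inter_work_relation>
  </rel:related_item>
</rel:program>
"""
# This fragment rides along in the journal article's normal metadata
# deposit; no separate workflow or endpoint is needed.
```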
Beyond the standard bibliographic metadata, the map asserts relationships between the article and the other research objects connected to it: the article nexus. With publishers' help, we link the publication to the other outputs associated with it.
Crossref is scholarly infrastructure: linking is our business, through the metadata that connects people, places, and things in the research ecosystem.
But with all these players, and the products of all these players, how do we get a view of the ecosystem and its activities? Together, we can make these labors visible and their value concrete to the enterprise itself.
We at Crossref aim to support this work as part of infrastructure by making these connections across the scholarly web.
First, we link literature to people: authors, reviewers, editors, etc. (ORCID auto-update), and to organisations: funders and grants (Open Funder Registry) and affiliations.
We also link bibliographic descriptors: publication history (versions, updates, revisions, corrections, retractions, dates received/accepted/published), peer review (status, type, reviews), and access indicators (publication license for text & data mining, machine-mining URLs). And we link literature to clinical trial studies (registered reports, replication studies) via the clinical trial number: publishers can deposit registered clinical trial numbers (CTNs) referenced in articles, and we link all the articles that reference the same CTN, so pre-results, results, and post-results are all threaded together.
• Contributors (authors, editors, reviewers)
• Funding information (funding body, grant number)
• Publication history (versions, updates, revisions, corrections, retractions, dates received/accepted/published)
• Peer review (status, type, reviews)
• Access indicators (publication license for text & data mining, machine-mining URLs)
• Clinical trial & study information (clinical trials registry number, registered report, replication study)
• Resources & associated research artifacts (preprints, figures & tables, datasets, software, protocols, research resource IDs)
• Activity surrounding the publication (peer reviews, comments & discussions, bookmarks, social shares, recommendations)
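For the technically inclined, a minimal sketch of retrieving this richer metadata from the Crossref REST API; the DOI below is a placeholder, and each field appears only when the publisher has deposited it.

```python
# A minimal sketch: pull the richer metadata listed above for one work
# from the Crossref REST API. The DOI is a placeholder; each field is
# present only when the publisher has deposited it.
import requests

doi = "10.5555/12345678"  # placeholder
work = requests.get(f"https://api.crossref.org/works/{doi}").json()["message"]

for field in ("funder", "license", "update-to",
              "clinical-trial-number", "relation"):
    print(field, "->", work.get(field, "not deposited"))
```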
Crossref Event Data will provide the raw ingredients, with evidence from the source, like a farm shop! It'll be auditable and portable. And it's up to you how to season it and what meal to make with it. Then there are many people and tools that can serve it up in their own ways. Silver service or bicycle delivery!
What is this identifier? Under Section 702 of the FISA Amendments Act of 2008, internet companies such as Google Inc. turned over any data that matched court-approved search terms to the United States National Security Agency (NSA). The code name for the program was PRISM. What did it do? PRISM collected stored internet communications from at least nine major US internet companies, which were then used for foreign-intelligence cases involving terrorism, espionage, banned-weapons proliferation, transnational drug trafficking, and hacking.
New product developments: Metadata - Jennifer Lin - Crossref LIVE London 2017
Infrastructure for scholarly discussion
• Readers can see provenance and get context of a work
• Links to this content persist over time
• The metadata is useful, freely available, human- & machine-readable
• Reviews are connected to the full history of the published work
• Contributors are given credit for their work (we will ask for ORCID iDs)
• The citation record is clear and up-to-date
I. New content type: Reviews
• Assets across peer review history for any and all review rounds: referee reports, decision letters, and author responses
• Pre & post publication
• Dedicated metadata (see the sketch below) that:
  • characterizes the peer review asset: recommendation, type, license, contributor info, competing interests
  • makes the review process transparent: pre/post-publication, revision round, review date
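Since the content type was still a month from rollout at the time of this talk, the following is only a speculative sketch of what a peer review deposit fragment could look like, built from the metadata fields listed above; every element, attribute, and identifier here is illustrative, not the final schema.

```python
# A speculative sketch only: the content type had not yet shipped, so
# every element, attribute, and value below is illustrative rather than
# the final schema. It mirrors the fields listed above: type,
# recommendation, revision round, and a link to the reviewed article.
peer_review_fragment = """
<peer_review stage="pre-publication" type="referee-report"
             revision-round="1" recommendation="major-revision">
  <titles><title>Referee report: Round 1</title></titles>
  <review_date><month>03</month><day>15</day><year>2017</year></review_date>
  <program xmlns="http://www.crossref.org/relations.xsd">
    <related_item>
      <inter_work_relation relationship-type="isReviewOf"
                           identifier-type="doi">10.5555/12345678</inter_work_relation>
    </related_item>
  </program>
  <doi_data>
    <doi>10.5555/review.1</doi>
    <resource>https://example.org/reviews/1</resource>
  </doi_data>
</peer_review>
"""
```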
II. Event Data
• Community provisioning & preservation of event metadata
• Consistent aggregation of metadata and transparency measures as the basis of data reliability and integrity
• Data standardization for downstream users
• Evidence trail to establish trust in the source data across platforms (scholarly & non-scholarly)
III. Data & software citations
• Proper research practice (DataCite-STM Joint Statement, FORCE11 Joint Declaration of Data Citation Principles)
• Formal referencing of related resources
• Critical for validation & reproducibility of results
• Provide credit to data & software contributors
• Facilitate reuse of scholarly outputs
Event Data - some use cases
Enhance author services
Collect real-time data
Analyze publication growth
Track DOI-related activity
Measure research impact