FORCE11: Creating a data and tools ecosystem

Describes FORCE11 and its recent successes through the Data Citation Synthesis and Resource Identification Initiative working groups.

Published in: Technology

  1. Maryann E. Martone, Ph.D., Executive Director; Professor of Neuroscience, University of California, San Diego. Future of Research Communications and E-Scholarship: Creating a data and tools ecosystem
  2. What is FORCE11? Future of Research Communications and E-Scholarship: a grassroots effort to accelerate the pace and nature of scholarly communications and e-scholarship through technology, education and community
     • Why 11? We were born in 2011 in Dagstuhl, Germany
     • Principles laid out in the FORCE11 Manifesto
     • FORCE11 launched in July 2012
  3. Who is FORCE11? Anyone who has a stake in moving scholarly communication into the 21st century: publishers, library and information scientists, policy makers, tool builders, funders, and scholars (science, humanities, social sciences)
  4. FORCE11 Vision
     • Modern technologies enable vastly improved knowledge transfer and far wider impact; freed from the restrictions of paper, numerous advantages appear
     • We see a future in which scientific information and scholarly communication more generally become part of a global, universal and explicit network of knowledge
     • To enable this vision, we need to create and use new forms of scholarly publication that work with reusable scholarly artifacts
     • To obtain the benefits that networked knowledge promises, we have to put in place reward systems that encourage scholars and researchers to participate and contribute
     • To ensure that this exciting future can develop and be sustained, we have to support the rich, variegated, integrated and disparate knowledge offerings that new technologies enable
     Image credit: Beyond the PDF Visual Notes by De Jongens van de Tekeningen is licensed under a Creative Commons Attribution 3.0 Unported License.
  5. Old Model: Single type of content; single mode of distribution. Diagram labels: Scholar, Library, Scholar, Publisher
  6. The future is now... Diagram labels: Scholar, Consumer, Libraries, Data Repositories, Code Repositories, Community databases/platforms, OA, Curators, Social Networks, Peer Reviewers, Workflows, Data, Blogs/Wikis, Multimedia, Nanopublications, Narrative, Code
  7. The duality of modern scholarship
     • Observation: Those who build information systems from the machine side don’t understand the requirements of the human very well. Those who build information systems from the human side don’t understand the requirements of machines very well.
     • Scholarship requires the ability to cite and track usage of scholarly artifacts. In our current mode of working, there is no way to easily track artifacts as they move through the ecosystem; no way to incrementally add human expertise; no way to alert everyone when things go wrong.
  8. Digital objects are a new beast. New modes of representation and verification will be necessary. Trust: not just who produced it but what produced it
  9. Impetus for change: Is our current method serving science? 47 of 53 major preclinical published cancer studies could not be replicated. “The scientific community assumes that the claims in a preclinical study can be taken at face value - that although there might be some errors in detail, the main message of the paper can be relied on and the data will, for the most part, stand the test of time. Unfortunately, this is not always the case.” Begley and Ellis, Nature, vol. 483, p. 531, 29 March 2012
  10. The scientific corpus is fragmented
     • ~25 million articles total, each covering a fragment of the biomedical space
     • Each publisher owns a fragment of a particular field
     • The current process is inefficient and slow
     Diagram labels: Wiley, Elsevier, Macmillan, Oxford; Spinal Muscular Atrophy
     Machine-based access requires that we take a global view of the body scholarly and allow mining across content
  11. A new platform for scholarly communications. Components:
     • Authoring tools
       – Optimized for mark up and linked content
     • Containers
       – Expand the objects that are considered “publications”
       – Optimize the container for the content
     • Processes
       – Scholarship is code
     • Mark up
       – Data, claims, content suitable for the web
       – Suitable identifier systems
     • Reward systems
       – Incentives to change
       – Reward for new objects
     Scholarship must move from a “single currency system”; platforms must recognize diversity of output and representation
  12. FORCE11.org
     • Community platform
       – Meetings
       – Discussions
       – Tools and resources
       – Blogs
       – Event calendar
       – Community projects
     • Promote interoperability
       – Data Citation
       – Resource Identification Initiative
     500 members from diverse stakeholder groups 700
  13. Beyond the PDF
     • Conference/unconference where all stakeholders come together as equals to discuss issues
       – Publishers
       – Technologists
       – Scholars
       – Library scientists
     • Incubator for change
     • What would you do to change scholarly communication?
     San Diego, Jan 2011; Amsterdam, March 2013; ? 2015. http://www.force11.org/beyondthepdf2 YES!!! FORCE
  14. Promote community, cross-fertilization and interoperability
     • FORCE11 helps facilitate communications across disciplines and communities
     • Issues are not identical but we can learn from each other
       – Enhanced publications: Digital humanities +
       – Dealing with data: Science +
       – Open Access: Science +
     “What is an ORCID id?” (a computer scientist)
  15. ORCID; Data journals; Research Data Alliance; PeerJ, eLife; Workflow4Ever; Dataverse; ImpactStory, Rubriq; Sadie; Scalar. Resource for scholarly communications: people, organizations, publications, tools
  16. FORCE11 Working Groups
     • FORCE11 provides a neutral convening place for individuals to come together around issues in scholarly communication
       – FORCE11 provides web working space and facilitation where possible
       – 1K Challenge: Beyond the PDF
       – Short-term working groups with a clear focus
         • Deliverable specified
         • Timeline determined
  17. Data: Whose problem is it? Diagram labels: Scholar, Library, Scholar, Publisher, Domain-specific Repository, Web site/Personal data management, Computing. Scholars, data repositories and institutional repositories are taking ownership of data. Where should it go? Sometimes it can’t go anywhere.
  18. Is data like a bibliographic record?
     • Not uniform in size
     • Not uniform in type
     • Curation requires deep understanding of domain
     • Data is dynamic
     • Data is fluid
     Geoff Bilder, CrossRef
  19. Surveying the resource landscape. Neuroscience Information Framework http://neuinfo.org
  20. Deep metadata. http://neuinfo.org With the thousands of databases and other information sources available, simple descriptive metadata will not suffice
  21. A place to come together: Data citation principles
     • FORCE11 provides a neutral space for bringing groups together
     • 35 individuals representing > 20 organizations concerned with data citation
     • Conducted a review of current data citation recommendations from 4 different organizations
     • Arrived at a set of consensus principles
     Data citation synthesis group: http://www.force11.org/node/4381
  22. Process: Synthesis (July-Sept 2013); Community feedback (Nov-Dec 2013); Revision (Jan 2014); Dissemination (now). Data Citation Principles: open for endorsement
  23. Joint Declaration of Data Citation Principles
     • Designed to be high level and easy to understand
     • Supplemented with a glossary, references and examples
     http://www.force11.org/datacitation
     1. Importance 2. Credit and attribution 3. Evidence 4. Unique Identification 5. Access 6. Persistence 7. Specificity and verifiability 8. Interoperability and flexibility
  24. Significance & Scope
     • Sound, reproducible scholarship rests upon a foundation of robust, accessible data.
     • Data should be considered legitimate, citable products of research.
     • Data citation, like the citation of other evidence and sources, is good research practice.
     • The Joint Principles cover purpose, function and attributes of citations.
     • Specific practices vary across communities and technologies; we recommend communities develop practices for machine and human citations consistent with these general principles.
  25. Purpose
     1. Importance. Data should be considered legitimate, citable products of research. Data citations should be accorded the same importance in the scholarly record as citations of other research objects, such as publications [1].
     2. Credit and attribution. Data citations should facilitate giving scholarly credit and normative and legal attribution to all contributors to the data, recognizing that a single style or mechanism of attribution may not be applicable to all data [2].
     3. Evidence. In scholarly literature, whenever and wherever a claim relies upon data, the corresponding data should be cited [3].
  26. Function
     4. Unique Identification. A data citation should include a persistent method for identification that is machine-actionable, globally unique, and widely used by a community [4].
     5. Access. Data citations should facilitate access to the data themselves and to such associated metadata, documentation, code, and other materials, as are necessary for both humans and machines to make informed use of the referenced data [5].
  27. Attributes
     6. Persistence. Unique identifiers, and metadata describing the data and its disposition, should persist, even beyond the lifespan of the data they describe [6].
     7. Specificity and verifiability. Data citations should facilitate identification of, access to, and verification of the specific data that support a claim. Citations or citation metadata should include information about provenance and fixity sufficient to facilitate verifying that the specific timeslice, version and/or granular portion of data retrieved subsequently is the same as was originally cited [7].
     8. Interoperability and flexibility. Data citation methods should be sufficiently flexible to accommodate the variant practices among communities, but should not differ so much that they compromise interoperability of data citation practices across communities [8].
  28. Generic Data Citation (as it appears in printed reference list)
     Author(s), Year, Dataset Title, Data Repository or Archive, Version, Global Persistent Identifier
     • Principle 2: Credit and attribution (e.g. authors, repositories or other distributors and contributors)
     • Principle 4: Unique identifier (e.g. DOI, Handle)
     • Principles 5, 6: Access, persistence. A persistent identifier that provides access and metadata
     • Principle 7: Specificity and verification (e.g. the specific version used). Versioning or timeslice information should be supplied with any updated or dynamic dataset.
     Note:
     ● Neither the format nor specific required elements are intended to be defined with this example. Formats, optional elements, and required elements will vary across publishers and communities. [Principle 8: Interoperability and flexibility]
     ● As illustrated in the previous examples, intra-work citations may be accompanied by information including the specific portion used. [Principles 7, 8]
     ● As illustrated in the next example, printed citations should be accompanied by metadata that support credit, attribution, specificity, and verification. [Principles 2, 5 and 7]
  29. Placement of Citations
     Intra-work:
     ● Should provide sufficient information to identify the cited data reference within the included reference list.
     ● Citation to data should be in close proximity to claims relying on the data. [Principle 3]
     ● May include additional information identifying the specific portion of data supporting that claim. [Principle 7]
     Example: The plots shown in Figure X show the distribution of selected measures from the main data [Author(s), Year, portion or subset used].
     Full citation: Citation may vary in style, but should be included in the full reference list along with citations to other types of works.
     Example: References Section
     Author(s), Year, Article Title, Journal, Publisher, DOI.
     Author(s), Year, Dataset Title, Data Repository or Archive, Version, Global Persistent Identifier.
     Author(s), Year, Book Title, Publisher, ISBN.
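The generic citation on slides 28 and 29 can be sketched as a small data structure. This is a minimal illustration only: the class name, the separator between authors, and all example values (including the DOI) are hypothetical, and the slides stress that formats and required elements will vary across publishers and communities.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DataCitation:
    """Hypothetical container for the elements listed on the slide."""
    authors: List[str]
    year: int
    title: str
    repository: str
    version: str
    identifier: str  # global persistent identifier, e.g. a DOI or Handle

    def full(self) -> str:
        """Reference-list form, in the element order shown on the slide."""
        return (f"{'; '.join(self.authors)}, {self.year}, {self.title}, "
                f"{self.repository}, {self.version}, {self.identifier}.")

    def intra_work(self, portion: Optional[str] = None) -> str:
        """In-text form: [Author(s), Year, portion or subset used]."""
        parts = ["; ".join(self.authors), str(self.year)]
        if portion:
            parts.append(portion)
        return "[" + ", ".join(parts) + "]"


# All values below are made up for illustration.
example = DataCitation(
    authors=["Doe, J.", "Roe, R."],
    year=2014,
    title="Example Survey Dataset",
    repository="Example Data Archive",
    version="V2",
    identifier="doi:10.0000/example.1234",
)
print(example.full())
print(example.intra_work("Table 3"))
```

Keeping the version and the persistent identifier as separate fields mirrors Principles 4 and 7: the identifier says which dataset, the version says which state of it.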
  30. Citation Metadata
     Author(s), Year, Dataset Title, Data Repository or Archive, Version, Global Persistent Identifier.
     Metadata retrieval (EXAMPLE METADATA):
     <!-- CONTRIBUTOR METADATA -->
     <contributor role="" ORCIDid="">Name</contributor>
     <!-- FIXITY and PROVENANCE -->
     <fixity type="MD5">XXXX</fixity>
     <fixity type="UNF">UNF:XXXX</fixity>
     <!-- MACHINE UNDERSTANDABILITY -->
     <contentType>data</contentType>
     <format>HDF5</format>
     Note:
     ● Metadata location, formats, and elements will vary across publishers and communities. [Principle 8]
     ● Citation metadata is needed in addition to the information in the printed citation.
     ● Metadata describing the data and its disposition should persist beyond the lifespan of the data. [Principle 6]
     ● Citation metadata should support attribution and credit [Principle 2]; machine use [Principle 5]; specificity and verification [Principle 7]
     ● For example, additional citation metadata may be embedded in the citing document; attached to the persistent identifier for the citation, through its resolution service; stored in a separate community indexing service (e.g. DataCite, CrossRef); or provided in a machine-readable way through the surrogate (“landing page”) presented by the repository to which the identifier is resolved.
     For more detail, see the References section. http://www.force11.org/node/4772
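As a rough sketch of how the fixity metadata on slide 30 could support Principle 7, the snippet below records an MD5 checksum alongside contributor metadata and later checks that retrieved bytes match what was cited. The wrapper element `citationMetadata`, the element name `contentType`, the function names, and the sample values are assumptions for illustration; the slide's own element names are only examples, and UNF fingerprints are not computed here.

```python
import hashlib
import xml.etree.ElementTree as ET


def md5_hex(data: bytes) -> str:
    """Hex MD5 digest of the dataset bytes (the slide's MD5 fixity type)."""
    return hashlib.md5(data).hexdigest()


def build_metadata(name: str, role: str, orcid: str,
                   data: bytes, fmt: str) -> ET.Element:
    """Assemble illustrative citation metadata for one dataset."""
    root = ET.Element("citationMetadata")  # hypothetical wrapper element
    contributor = ET.SubElement(
        root, "contributor", {"role": role, "ORCIDid": orcid})
    contributor.text = name
    fixity = ET.SubElement(root, "fixity", {"type": "MD5"})
    fixity.text = md5_hex(data)
    ET.SubElement(root, "contentType").text = "data"  # assumed element name
    ET.SubElement(root, "format").text = fmt
    return root


def verify_fixity(root: ET.Element, retrieved: bytes) -> bool:
    """True if the retrieved bytes match the checksum recorded at citation time."""
    recorded = root.find("fixity[@type='MD5']").text
    return recorded == md5_hex(retrieved)


# Made-up dataset and contributor values.
cited_bytes = b"a,b\n1,2\n"
metadata = build_metadata("Jane Doe", "author", "0000-0000-0000-0000",
                          cited_bytes, "CSV")
print(ET.tostring(metadata, encoding="unicode"))
print(verify_fixity(metadata, cited_bytes))
```

A byte-level checksum only detects that something changed; it cannot say whether a reformatted file carries the same data, which is one reason the slide lists a second fixity type next to MD5.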
  31. Growing Adoption. https://www.force11.org/datacitation/endorsements
  32. Endorse the Principles! http://www.force11.org/datacitation/endorsements 148 individuals; 60 organizations
  33. Unique IDs for all! Resource Identification Initiative
     • It is currently impossible to query the biomedical literature to find out what research resources have been used to produce the results of a study
     • Impossible to find all studies that used a resource
     • Critical for reproducibility and data mining
     • Critical for troubleshooting
     http://www.force11.org/resource_identification_initiative
     “Faulty Antibodies Continue to Enter US and European Markets, Warns Top Clinical Chemistry Researcher”, GenomeWeb Daily, October 11, 2013
  34. Resource Identification Initiative
     • Have authors supply appropriate identifiers for key resources used within a study such that they are:
       – Machine-processable (i.e., a unique identifier that resolves to a single resource)
       – Outside of the paywall
       – Uniform across journals and publishers
     Launched February 2014: > 30 journals participating
  35. Pilot Project
     • Have authors identify 3 different types of research resources:
       – Software tools and databases
       – Antibodies
       – Genetically modified animals
     • Include RRID in methods section
     • RRID = RRID:Accession number
       – Just a string at this point
     • Voluntary for authors
     • Journals did not have to modify their submission system
     • Journals have flexibility in implementation. Send request to author at:
       – Submission
       – During review
       – After acceptance
     http://scicrunch.com/resources Resource Identification Portal: Aggregates accession numbers from > 10 different databases that are the authorities for registering research resources
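Because an RRID is "just a string" of the form RRID:Accession number, the question "what studies used X?" can in principle be answered with plain text matching. The sketch below assumes accession numbers consist of letters, digits, underscores and hyphens; that character set, the function names, the paper identifiers and the `AB_0000001` accession are all hypothetical. `RRID:nif-0000-30467` is the identifier shown elsewhere in the deck.

```python
import re
from collections import defaultdict
from typing import Dict, List

# Assumed shape of an accession number: letters, digits, "_" and "-".
RRID_PATTERN = re.compile(r"RRID:\s?([A-Za-z0-9_-]+)")


def extract_rrids(text: str) -> List[str]:
    """Return accession numbers for every RRID mentioned in the text."""
    return RRID_PATTERN.findall(text)


def index_by_resource(papers: Dict[str, str]) -> Dict[str, List[str]]:
    """Map each accession number to the papers whose methods text cites it."""
    index = defaultdict(list)
    for paper_id, methods_text in papers.items():
        for accession in sorted(set(extract_rrids(methods_text))):
            index[accession].append(paper_id)
    return dict(index)


# Made-up methods sections for two hypothetical papers.
papers = {
    "paper-A": "Resources were located via the portal (RRID:nif-0000-30467).",
    "paper-B": ("Sections were stained with antibody X (RRID:AB_0000001); "
                "resources were located via RRID:nif-0000-30467."),
}
resource_index = index_by_resource(papers)
print(resource_index["nif-0000-30467"])
```

This only works when the full text is reachable, which is why the initiative asks for identifiers to sit outside the paywall and to be uniform across journals.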
  36. First results are in the literature. Google Scholar: search RRID; select since 2014
  37. What studies used X? To date:
     • 30 articles have appeared
     • 2 articles have disappeared, i.e., the RRIDs were removed at copyediting
     • 195 RRIDs were reported
     • 14 were in error (about 7%)
     • > 200 antibodies were added
     • > 75 software tools/databases were added
     • A resolver service has been created
     • 3rd party tools are being created to provide linkage between resources and papers
     Example: RRID:nif-0000-30467
  38. What have we learned? (Utopia plug-in: Steve Pettifer)
     • Authors are willing to adopt new types of citations
     • RRID = usage of research resource
     • Ideal: resolved by search engines without requiring specialized citation services
     • Citation drives registration
     • Clear role for repositories as authorities
     • Should RRIDs be DOIs? Will the system work for data citation and more complicated research objects?
  39. Data Citation Implementation Group
  40. FORCE11 Vision
     • Modern technologies enable vastly improved knowledge transfer and far wider impact; freed from the restrictions of paper, numerous advantages appear
     • We see a future in which scientific information and scholarly communication more generally become part of a global, universal and explicit network of knowledge
     • To enable this vision, we need to create and use new forms of scholarly publication that work with reusable scholarly artifacts
     • To obtain the benefits that networked knowledge promises, we have to put in place reward systems that encourage scholars and researchers to participate and contribute
     • To ensure that this exciting future can develop and be sustained, we have to support the rich, variegated, integrated and disparate knowledge offerings that new technologies enable
     No single infrastructure serves everything; cooperation in defining a global system of scholarly communication
  41. Notes & References for Data Citation Principles
     Notes
     [1] CODATA 2013, Sec 3.2.1; Uhlir (ed.) 2012, ch. 14; Altman & King 2007
     [2] CODATA 2013, Sec 3.2, 7.2.3; Uhlir (ed.) 2012, ch. 14
     [3] CODATA 2013, Sec 3.1, 7.2.3; Uhlir (ed.) 2012, ch. 14
     [4] Altman & King 2007; CODATA 2013, Sec 3.2.3, Ch. 5; Ball & Duke 2012
     [5] CODATA 2013, Sec 3.2.4, 3.2.5, 3.2.8
     [6] Altman & King 2007; Ball & Duke 2012; CODATA 2013, Sec 3.2.2
     [7] Altman & King 2007; CODATA 2013, Sec 3.2.7, 3.2.8
     [8] CODATA 2013, Sec 3.2.10
     References
     • Altman, M., King, G. (2007). ‘A Proposed Standard for the Scholarly Citation of Quantitative Data’. D-Lib Magazine.
     • Ball, A., Duke, M. (2012). ‘Data Citation and Linking’. DCC Briefing Papers. Edinburgh: Digital Curation Centre.
     • CODATA-ICSTI Task Group on Data Citation (2013). ‘Out of Cite, Out of Mind: The Current State of Practice, Policy, and Technology for the Citation of Data’. Data Science Journal.
     • Uhlir, P. (ed.) (2012). For Attribution: Developing Data Attribution and Citation Practices and Standards. National Academies of Sciences.