The concept of FAIR (Findable, Accessible, Interoperable and Reusable) data is becoming a reality as stakeholders from industry, academia, funding agencies and publishers embrace this approach. For BioPharma, being able to effectively share and reuse data is a tremendous competitive advantage within a company and with peer organizations, key opinion leaders and regulatory agencies. A few key drivers, success stories and preliminary results of an industry data stewardship survey are presented.
BioPharma and FAIR Data, a Collaborative Advantage
1. BioPharma Adoption of FAIR* Data,
a Collaborative Advantage
Tom Plasterer, PhD
Research & Development Information (RDI); US Cross-Science Director 25 May 2017
* Findable, Accessible, Interoperable and Reusable
2. We Want Data Nirvana!
The right data is there when I need it
Your data and my data are mutually understandable
Our data can be effortlessly combined
I am permitted to use any data I can access
Data can be reshaped for a different purpose
Data sharing is rewarded
‘I’ can be a human or a machine
3. FAIR Data: Overview
To be Findable:
• Globally unique, resolvable and persistent identifiers
• Machine-actionable contextual information supporting discovery
To be Accessible:
• Clearly defined access protocol
• Clearly defined rules for authorization/authentication
To be Interoperable:
• Use shared vocabularies and/or ontologies
• Syntactically and semantically machine-accessible format
To be Reusable:
• Be compliant with the F, A and I Principles
• Contextual information, allowing proper interpretation
• Rich provenance information facilitating accurate citation
Mark Wilkinson,
Data Interoperability and FAIRness Through Existing Web Technologies
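The four groups of criteria above can be read as a checklist over a dataset's metadata. The following is a minimal sketch of that idea, not an established FAIR metric: the `DatasetRecord` fields, the `fair_checklist` function and the example.org URIs are all hypothetical names chosen for illustration, and the identifier check is purely syntactic (it does not test whether the URI actually resolves or persists).

```python
from dataclasses import dataclass, field
from urllib.parse import urlparse


@dataclass
class DatasetRecord:
    """Hypothetical metadata record; field names are illustrative only."""
    identifier: str = ""        # F: globally unique, resolvable, persistent ID
    description: str = ""       # F: contextual information supporting discovery
    access_protocol: str = ""   # A: clearly defined access protocol
    access_rules: str = ""      # A: authorization/authentication rules
    vocabularies: list = field(default_factory=list)  # I: shared vocabularies
    media_type: str = ""        # I: machine-accessible format
    provenance: str = ""        # R: provenance supporting citation


def is_resolvable_uri(value: str) -> bool:
    """Syntactic check only: an http(s) URI with a host; no network call."""
    parts = urlparse(value)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def fair_checklist(rec: DatasetRecord) -> dict:
    """Return a pass/fail flag per principle, following the slide's grouping."""
    findable = is_resolvable_uri(rec.identifier) and bool(rec.description)
    accessible = bool(rec.access_protocol) and bool(rec.access_rules)
    interoperable = bool(rec.vocabularies) and bool(rec.media_type)
    # Reusable requires F, A and I, plus provenance information.
    reusable = findable and accessible and interoperable and bool(rec.provenance)
    return {"F": findable, "A": accessible, "I": interoperable, "R": reusable}


record = DatasetRecord(
    identifier="https://example.org/dataset/42",
    description="Example assay results",
    access_protocol="https",
    access_rules="open",
    vocabularies=["https://example.org/vocab/assay"],
    media_type="text/turtle",
    provenance="Generated by an example lab, 2017",
)
print(fair_checklist(record))
```

A record with no provenance would pass F, A and I but fail R, which mirrors the slide's point that reusability builds on the other three principles.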
4. FAIR Data: A Brief History
Moving away from Narrative
• Nanopublications
Incubating Standards in Open PHACTS
• VoID, PROV-O
Lorentz Center Workshop
• FORCE 11 FAIR Guiding Principles
• Participants: IMI members, US researchers, content providers, ELIXIR, European Open Science Cloud, Big Data to Knowledge (BD2K)
Current Status:
• FAIR Data Workshops (EU-ELIXIR nodes, Bio-IT)
• Inclusion in Horizon 2020, NIH Advocacy
• IMI2 Data FAIR-ification Call
• Vendors getting up to speed
5. Rapid Adoption of Principles
Developed and endorsed by researchers, publishers, funding agencies, industry partners.
As of May 2017, 100+ citations since 2016 publication
Included in G20 communique, EOSC, H2020, NIH, and more…
Thanks to: @micheldumontier::2017-05-19
6. Introductory Nature Paper:
The FAIR Guiding Principles for scientific data management and stewardship
Thanks to: @micheldumontier::2017-05-19
This Altmetric score indicates the article is:
• In the 99th percentile (ranked 615th) of the 278,235 tracked articles of a similar age in all journals
• In the 95th percentile (ranked 1st) of the 23 tracked articles of a similar age in Scientific Data
7. FAIR Data: Systems Biology Survey
Molecular Systems Biology
Volume 11, Issue 12, 28 DEC 2015 DOI: 10.15252/msb.20156053
http://onlinelibrary.wiley.com/doi/10.15252/msb.20156053/full#msb156053-fig-0001
8. FAIR Data: Data Stewardship Survey
Data Stewardship Survey
13 Questions – One minute out of your day!
http://bit.ly/BiopharmaDataStewardship
9. Survey: What best describes your department?
[Pie chart. Legend: IT/IS, Target Discovery, Lead Discovery, Clinical Development, Marketing & Sales, Other - Write In. Values (%): 65.2, 4.3, 8.7, 13, 4.3, 4.3.]
10. Survey: What is your scientific background?
[Pie chart. Legend: Experimentalist, Modeler (Structural), Modeler (Statistical), Informatician, Project Manager, Other - Write In. Values (%): 21.7, 13, 4.3, 34.8, 8.7, 17.4.]
11. Survey: How important is data reuse to your organization?
[Bar chart of response counts. Ratings: 2, 3, 4, 5. Counts: 2, 2, 4, 14.]
12. How important is the use of public standards to structuring your data?
[Bar chart of response counts. Ratings: 2, 3, 4, 5. Counts: 2, 2, 8, 9.]
15. Are metadata and data models considered proprietary at your organization?
[Pie chart. Legend: Yes, No, Don't know. Values (%): 40, 55, 5.]
16. What controlled vocabularies and/or ontologies do you use for structuring and annotating your data and models?
[Bar chart, axis 0 to 90; category labels missing. Values (%): 31.6, 21.1, 26.3, 26.3, 36.8, 68.4, 42.1, 21.1, 15.8, 52.6, 78.9, 63.2, 15.8, 15.8, 15.8, 10.5, 52.6, 15.8, 31.6, 0.]
17. Are data usage requirements clearly understood within your organization?
[Bar chart of response counts, axis labeled No to Yes. Ratings: 0, 1, 2, 3, 4, 5. Counts: 1, 7, 3, 2, 2, 2.]
18. Is it easy or hard to get access to clinical data in your organization?
[Bar chart of response counts, axis labeled Easy to Hard. Ratings: 2, 3, 4, 5. Counts: 2, 4, 7, 2.]
19. Is it easy or hard to get access to clinical metadata in your organization?
[Bar chart of response counts, axis labeled Easy to Hard. Ratings: 1, 2, 3, 4, 5. Counts: 3, 3, 4, 4, 1.]
20. Survey: Who ‘owns’ clinical data at your organization?
[Pie chart. Legend: A drug project team, A clinical area, A third party/vendor, Don't Know/Not Applicable, Other. Values (%): 40, 13.3, 6.7, 26.7, 13.3.]
21. How do you share models and data with your collaborators before publication?
[Bar chart, axis 0 to 60. Categories: By email; Through project database/content management system; Through bespoke Systems Biology platform; Dropbox/Box/SharePoint; Software Versioning System; Don't know. Values (%): 43.8, 43.8, 6.3, 50, 12.5, 12.5.]
22. FAIR Data & Biopharma?
Collaborative & Competitive Intelligence:
• Who do we want to partner with? Are there complementary assets to our portfolio?
• What space is too crowded and not our area of expertise?
• Greenfield situations?
Mergers, Acquisitions, Partnerships:
• How do we efficiently and deeply absorb data generated elsewhere into our systems? How do we efficiently share?
• Does this make a smaller biotech/start-up a more viable partner?
Improved Patient Care:
• Can we share data and outcomes more efficiently in complicated trial settings (basket trials, adaptive trials) to better engage opinion leaders and foster dialog?
• Along with Differential Privacy approaches, can we have the broader research community help mine our data?
Data (Ir)-reproducibility:
• Is preclinical data reproducible?
• Can we utilize data credentialization? (thanks to Dan Crowther @ Sanofi)
23. Getting Started
What’s the difference between FAIR Data and Linked Data?
What’s Critical?
• URIs, PURLs
• Standards, vocabularies, cross-mapping
• Access rules
• FAIR-ness metrics
• Data and Information Scientists
FAIR and Enterprise Data Management
Adoption, Sticks and Carrots; Winners and Losers
Linked Data FAIR Data
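The "Standards, vocabularies, cross-mapping" item under "What's Critical?" can be illustrated with a small sketch. This is one possible approach under stated assumptions, not a method described in the talk: the `CROSS_MAP` table, its example.org URIs and the `map_terms` helper are hypothetical, and a real deployment would draw mappings from a curated public vocabulary rather than a hand-written dictionary.

```python
# Hypothetical cross-mapping table: local free-text labels -> shared URIs.
# The example.org URIs are placeholders, not real vocabulary terms.
CROSS_MAP = {
    "heart attack": "https://example.org/vocab/C0001",
    "myocardial infarction": "https://example.org/vocab/C0001",
    "mi": "https://example.org/vocab/C0001",
    "t2d": "https://example.org/vocab/C0002",
    "type 2 diabetes": "https://example.org/vocab/C0002",
}


def normalize(label: str) -> str:
    """Lower-case and collapse whitespace so lookups ignore formatting noise."""
    return " ".join(label.lower().split())


def map_terms(labels):
    """Split labels into those with a shared URI and those still unmapped."""
    mapped, unmapped = {}, []
    for label in labels:
        uri = CROSS_MAP.get(normalize(label))
        if uri is None:
            unmapped.append(label)
        else:
            mapped[label] = uri
    return mapped, unmapped


def coverage(labels) -> float:
    """Fraction of labels that map to a shared URI (0.0 for an empty list)."""
    if not labels:
        return 0.0
    mapped, _ = map_terms(labels)
    return len(mapped) / len(labels)


local_labels = ["Heart  Attack", "MI", "T2D", "unknown condition"]
mapped, unmapped = map_terms(local_labels)
print(mapped)
print(unmapped)
print(coverage(local_labels))
```

Synonyms that resolve to the same URI become directly comparable across datasets, and the unmapped list shows where curation effort is still needed. A coverage figure of this kind is one simple candidate for the "FAIR-ness metrics" item.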
24. R&D | RDI
FAIR Data and Collaboration: Take-aways
Interoperable: Need clearly recognized
• Use the same plumbing and your data won’t be stuck in a silo
Accessible: Open, if permitted
• Interoperate first then govern
Reusable: Use public solutions and consortia
• Don’t reinvent the wheel (OK: Ontology…)
Invest in FAIR Data Stewardship
• Investment to future-proof your efforts
25. Thanks
Key Influencers
David Wood
Toby Segaran
Tim Berners-Lee
Lee Harland
Bryn Williams-Jones
Eric Neumann
Dean Allemang
Barend Mons
Carole Goble
Bernadette Hyland
Bob Stanley
Eric Little
Michel Dumontier
John Wilbanks
Hans Constandt
Dan Crowther
Tim Hoctor
Bio-IT 2017 Conference Organizers
AZ/MedImmune Linked Data Community
Kerstin Forsberg
Rajan Desai
Jeff Saltzman
David Ruau
Kathy Reinold
Bridget Behringer
Nirmal Keshava
Sara Dempster
Bryan Takasaki
Nick Wright
David Fenstermacher
Editor's Notes
Horizon 2020: the biggest EU Research and Innovation programme ever, with nearly €80 billion of funding available over 7 years (2014 to 2020)
50% (or more) of preclinical research could not be reproduced, at a cost of $28B/year
http://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.1002165