As BioPharma adapts to incorporate nimble networks of suppliers, collaborators, and regulators, the ability to link data is critical for dynamic interoperability. Adopting the linked data paradigm allows BioPharma to focus on its core business: delivering valuable therapeutics in a timely manner.
Linked Data for BioPharma
1. Tom Plasterer, PhD
Integrated Informatics Semantic Framework Lead (i2SF)
The Path to Linked Data in BioPharma
Integrated R&D Informatics and Knowledge Management
2. R&D | RDI
Blockbuster "Patent Cliff" Gives Way to Personalized Approach
Drivers & Solutions
• Blockbuster Patent Cliff
• Growth of Generics
• Mergers & Acquisitions
• Personalized Medicine
o Pharmacogenetics
o Biomarkers
American Action Forum; Primer: The Pharmaceutical Industry (Han Zhong | Updated June 2012)
IMAP Pharma & Biotech Industry Global Report 2011
Evaluate Pharma World Preview 2018
From: http://www.liv.ac.uk/pharmacogenetics/
3. Where do the new opportunities arise?
Inside & Outside
Build from within
• Nurture "best in class" programs
• Kill early
• Repositioning
Mergers & Acquisitions
• Partner or Buy?
• Integrate cultures & technology
• Is the disruption worth it?
Pre-Competitive Consortiums
• How much can be shared, and still be useful?
• Who is driving?
Finding "KOLs" (key opinion leaders)
• Aggressive Regional Partnerships (Pfizer's Centers for Therapeutic Innovation)
• Co-locate near Academic Centers of Excellence (Novartis)
• Cherry pick (GSK, AZ, others)
4. Distributed Data in a Monolithic Environment
Managing Silos
Partitioned By Content
• Regulated Systems vs. Discovery
Partitioned By Geography & Organization
• US, EU, ASIAPAC
Data Formats
• RDB, Excel, Text, RSS, RDF?
Warehouses & Service Oriented Architecture
• Steps in the right direction?
Collaborative Environment
• eRooms, SharePoint, Yammer, "Lync" vs. Twitter, Google Docs, Skype
Standards?
• Vendor specific or open?
• Mixed Bag
Where are the "smarts"?
• UI? Services?
• Metadata?
5. Requirements of the Informatics Landscape
• Must span the entire drug development lifecycle
o and back (post-market surveillance to discovery)
• Must support large and very heterogeneous data
o single nucleotide polymorphisms to countries
• Will change as new science emerges & new regulations come into play
o Medline just under 1M articles/year
• Must be able to work with multiple, international regulatory bodies
o Emerging markets
• Partners, customers and collaborators will change
o and will have divergent technical aptitudes
• Must be able to interoperate with precompetitive consortia
o Can they perform common tasks for the community?
• Must be able to work with legacy data
o Lots of unmined gems here!
Maximal Agility
7. The 5 Stars of Open Linked Data
W3C/TBL Guidance
http://www.w3.org/DesignIssues/LinkedData.html
★ Make your stuff available on the web (any format)
★★ Make it available as structured data (e.g. Excel instead of an image scan of a table)
★★★ Use a non-proprietary format (e.g. CSV instead of Excel)
★★★★ Use URLs to identify things, so that people can point at your stuff
★★★★★ Link your data to other people's data to provide context
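The stars are cumulative: each level presumes the ones before it. A minimal sketch of that rule, assuming a dataset is summarized by five yes/no checks (the `DatasetProfile` fields are hypothetical names, not part of the W3C guidance):

```python
from dataclasses import dataclass


@dataclass
class DatasetProfile:
    # Hypothetical checklist, one field per star in the W3C/TBL scheme.
    on_web: bool           # ★ available on the web (any format)
    structured: bool       # ★★ structured, machine-readable data
    non_proprietary: bool  # ★★★ open format such as CSV
    uses_uris: bool        # ★★★★ things identified by URLs
    links_out: bool        # ★★★★★ linked to other people's data


def star_rating(profile: DatasetProfile) -> int:
    """Count stars, stopping at the first unmet criterion."""
    checks = [
        profile.on_web,
        profile.structured,
        profile.non_proprietary,
        profile.uses_uris,
        profile.links_out,
    ]
    stars = 0
    for met in checks:
        if not met:
            break  # later stars cannot be earned without the earlier ones
        stars += 1
    return stars


# A CSV file on the web, with no URIs and no outbound links, earns three stars.
print(star_rating(DatasetProfile(True, True, True, False, False)))
```

The same checklist applies unchanged to the intranet ("closed") variant; only the meaning of `on_web` shifts.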
8. The 5 Stars of Closed (not Open) Linked Data
W3C/TBL Guidance
http://www.w3.org/DesignIssues/LinkedData.html
★ Make your stuff available on the intranet rather than the web (any format)
★★ Make it available as structured data (e.g. Excel instead of an image scan of a table)
★★★ Use a non-proprietary format (e.g. CSV instead of Excel)
★★★★ Use URLs to identify things, so that people can point at your stuff
★★★★★ Link your data to other people's data to provide context
9. Towards a Linked Data Architecture
[Architecture diagram. Components: Central Identity Management (Active & Partial PURLs); Vocabulary Server; structured content in RDF Triplestores; Semi-Structured and Unstructured Content (+Tagging); Catalogues, Mapping, Queries; Semantic Visualization; Search]
Example URLs: http://research.vocab.astrazeneca.com/id/DOID/2841 and http://humandiseaseontology.astrazeneca.net/DOID/2841
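Central identity management with "active and partial" PURLs can be pictured as two redirect tables: exact identifiers, and prefixes whose remaining path is carried over to the target. A minimal sketch, assuming the first example URL above is a PURL that resolves to the second; the tables and the `example.org` entry are hypothetical:

```python
from typing import Dict, Optional

# Exact PURLs: one persistent identifier -> current location (hypothetical entry).
ACTIVE_PURLS: Dict[str, str] = {
    "https://purl.example.org/id/report-42": "https://docs.example.org/reports/42",
}

# Partial PURLs: persistent prefix -> current target prefix; the suffix is appended.
PARTIAL_PURLS: Dict[str, str] = {
    "http://research.vocab.astrazeneca.com/id/DOID/":
        "http://humandiseaseontology.astrazeneca.net/DOID/",
}


def resolve(purl: str) -> Optional[str]:
    """Return the current location of a PURL, or None if it is not registered."""
    if purl in ACTIVE_PURLS:
        return ACTIVE_PURLS[purl]
    # Check longer prefixes first so more specific rules win.
    for prefix in sorted(PARTIAL_PURLS, key=len, reverse=True):
        if purl.startswith(prefix):
            return PARTIAL_PURLS[prefix] + purl[len(prefix):]
    return None


print(resolve("http://research.vocab.astrazeneca.com/id/DOID/2841"))
```

Because consumers store only the persistent identifier, relocating the ontology server means editing one prefix rule rather than every dataset that cites its terms.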
10. Choosing Linked Vocabularies
Current LOD Cloud Adoption
Vocabulary prefix | Vocabulary link | Number of usages in data sets
dc | http://purl.org/dc/elements/1.1/ | 92 (31.19 %)
foaf | http://xmlns.com/foaf/0.1/ | 81 (27.46 %)
skos | http://www.w3.org/2004/02/skos/core# | 58 (19.66 %)
geo | http://www.w3.org/2003/01/geo/wgs84_pos# | 25 (8.47 %)
xhtml | http://www.w3.org/1999/xhtml/vocab# | 19 (6.44 %)
akt | http://www.aktors.org/ontology/portal# | 17 (5.76 %)
bibo | http://purl.org/ontology/bibo/ | 14 (4.75 %)
mo | http://purl.org/ontology/mo/ | 13 (4.41 %)
vcard | http://www.w3.org/2006/vcard/ns# | 10 (3.39 %)
sioc | http://rdfs.org/sioc/ns# | 10 (3.39 %)
cc | http://creativecommons.org/ns# | 8 (2.71 %)
geonames | http://www.geonames.org/ontology# | 6 (2.03 %)
Source: http://www4.wiwiss.fu-berlin.de/lodcloud/state/#terms
Vocabulary Server
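Reusing these vocabularies in practice means writing their namespace URIs into your own data, usually through prefixes. A minimal standard-library sketch that expands prefixed names against four namespaces copied from the table and emits the matching Turtle `@prefix` lines; the helper names `expand` and `turtle_header` are our own:

```python
# Namespaces copied from the LOD adoption table above.
PREFIXES = {
    "dc": "http://purl.org/dc/elements/1.1/",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "skos": "http://www.w3.org/2004/02/skos/core#",
    "geo": "http://www.w3.org/2003/01/geo/wgs84_pos#",
}


def expand(curie: str) -> str:
    """Expand a prefixed name such as 'skos:prefLabel' into a full URI."""
    prefix, sep, local = curie.partition(":")
    if not sep or prefix not in PREFIXES:
        raise KeyError(f"unknown prefix in {curie!r}")
    return PREFIXES[prefix] + local


def turtle_header() -> str:
    """Render the prefix map as Turtle @prefix declarations."""
    return "\n".join(f"@prefix {p}: <{ns}> ." for p, ns in PREFIXES.items())


print(expand("skos:prefLabel"))
print(turtle_header())
```

Keeping one shared prefix map, rather than letting each project declare its own, is a small step toward the "re-use rather than re-invent" principle.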
11. The 5 Stars of Open Linked Vocabularies
Bernard Vatant (Mondeca) Guidance
http://blog.hubjects.com/2012/02/is-your-linked-data-vocabulary-5-star_9588.html
★ Publish your vocabulary on the Web at a stable URI
★★ Provide human-readable documentation and basic metadata (e.g. creator, publisher, date of creation, last modification, version number)
★★★ Provide labels and descriptions, if possible in several languages, to make your vocabulary usable in multiple linguistic scopes
★★★★ Make your vocabulary available via its namespace URI, both as a formal file and human-readable documentation, using content negotiation
★★★★★ Link to other vocabularies by re-using elements rather than re-inventing
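The fourth star relies on HTTP content negotiation: one namespace URI returns a formal file (e.g. Turtle or RDF/XML) to software and HTML documentation to a browser, depending on the request's Accept header. A simplified sketch of the server-side choice, assuming three available representations (the `AVAILABLE` list is hypothetical); it honours q-values, wildcards and "most specific range wins", but ignores other media-type parameters:

```python
from typing import List, Optional, Tuple

# Hypothetical representations served from one vocabulary namespace URI.
AVAILABLE = ["text/html", "text/turtle", "application/rdf+xml"]


def parse_accept(header: str) -> List[Tuple[str, float]]:
    """Split an Accept header into (media range, q-value) pairs."""
    ranges = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media = fields[0].lower()
        if not media:
            continue
        q = 1.0
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        ranges.append((media, q))
    return ranges


def specificity(media_range: str, media_type: str) -> int:
    """-1 if the range does not match; else 0 (*/*), 1 (type/*) or 2 (exact)."""
    if media_range == "*/*":
        return 0
    if media_range.endswith("/*"):
        return 1 if media_type.split("/")[0] == media_range[:-2] else -1
    return 2 if media_range == media_type else -1


def negotiate(accept: str) -> Optional[str]:
    """Return the best representation, or None (HTTP 406) if none is acceptable."""
    if not accept.strip():
        return AVAILABLE[0]  # no stated preference: serve the HTML documentation
    ranges = parse_accept(accept)
    best, best_q = None, 0.0
    for media_type in AVAILABLE:
        scored = [(specificity(r, media_type), q) for r, q in ranges]
        scored = [s for s in scored if s[0] >= 0]
        if not scored:
            continue
        q = max(scored)[1]  # the most specific matching range sets the q-value
        if q > best_q:
            best, best_q = media_type, q
    return best


print(negotiate("text/turtle"))  # a Linked Data client
print(negotiate("text/html,application/xhtml+xml,*/*;q=0.8"))  # a browser
```

A production server would also send `Vary: Accept` so caches keep the representations apart.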
12. Domain Specific Vocabularies
Linked Open Vocabularies, NCBO
http://labs.mondeca.com/dataset/lov/index.html
http://bioportal.bioontology.org/
13. Building Linked Data Applications
• Capture Business Questions and Sources
• Domain Expert Concept Map
• Build Formal Ontology (Reuse Vocabularies!)
• Challenge with Linked Data
• Model Business Questions (SPARQL)
• Interact with RDF answer in a Faceted Browser
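The final step, exploring answers in a faceted browser, boils down to counting the distinct values of each variable across a result set so users can narrow it down. A minimal sketch, assuming SPARQL SELECT results have already been flattened into one dictionary per row; the variable names and rows are hypothetical:

```python
from collections import Counter
from typing import Dict, Iterable, List

Row = Dict[str, str]


def facet_counts(rows: Iterable[Row], facets: List[str]) -> Dict[str, Counter]:
    """Count how often each value occurs for every requested facet variable."""
    counts = {name: Counter() for name in facets}
    for row in rows:
        for name in facets:
            if name in row:  # unbound SPARQL variables are simply absent
                counts[name][row[name]] += 1
    return counts


def apply_filter(rows: Iterable[Row], facet: str, value: str) -> List[Row]:
    """Keep only rows where the chosen facet has the selected value."""
    return [row for row in rows if row.get(facet) == value]


# Hypothetical rows shaped like the bindings of a SELECT query.
results = [
    {"compound": "C1", "target": "T1", "phase": "II"},
    {"compound": "C2", "target": "T1", "phase": "I"},
    {"compound": "C3", "target": "T2", "phase": "II"},
]
print(facet_counts(results, ["target", "phase"]))
print(apply_filter(results, "target", "T1"))
```

Each click in the browser is one `apply_filter` call followed by recomputing the counts on the smaller set.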
14. Improving Internal Interoperability
Scientists, clinicians, and informaticists can now freely interoperate because:
• The PURL server provides a central identity management authority for resources that are of value (need to persist) across the enterprise. The Persistent URLs are used to connect resources found in multiple locations.
• The vocabulary server provides a way of harmonizing concepts across different domains.
o Where possible, public vocabularies are used
o Where not, they're extended
o We don't want to develop and maintain vocabularies
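The vocabulary server's policy (use public vocabularies where possible, extend them where not) can be sketched as a lookup that prefers a public concept and only falls back to a local extension term. All URIs and mappings below are hypothetical placeholders under `example.org`:

```python
from typing import Dict, Tuple

# Hypothetical mappings from labels used in different source systems
# to a concept in a public vocabulary.
PUBLIC_CONCEPTS: Dict[str, str] = {
    "asthma": "http://example.org/public-vocab/disease/0001",
    "bronchial asthma": "http://example.org/public-vocab/disease/0001",
}

# Hypothetical namespace for enterprise extensions of the public vocabulary.
LOCAL_EXTENSION_NS = "http://example.org/enterprise-vocab/"


def normalise(label: str) -> str:
    """Case- and whitespace-insensitive key for matching labels."""
    return " ".join(label.lower().split())


def harmonize(label: str) -> Tuple[str, bool]:
    """Return (concept URI, is_public) for a label from any source system."""
    key = normalise(label)
    if key in PUBLIC_CONCEPTS:
        return PUBLIC_CONCEPTS[key], True
    # Not covered publicly: mint an extension term instead of a new vocabulary.
    return LOCAL_EXTENSION_NS + key.replace(" ", "_"), False


print(harmonize("Bronchial  Asthma"))
print(harmonize("Assay Plate Type"))
```

Two systems that say "Asthma" and "bronchial asthma" end up pointing at the same concept URI, which is what lets their data be joined.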
16. Unstructured Content
• Giving Structure to Unstructured Content
o Entity Recognition
o Use of common vocabularies
o Schemas
o Domain-Specific Content? Open BEL? TMO?
o Compatibility of text indices with triplestores & middleware tools
• Encouraging Publishers to Structure Content
o How can this be "monetized" so they don't lose their ROI?
o What about interoperability & persistence?
o Can this be mandated via funding agencies?
o RDFa to start?
• Publishers or "Re-publishers"
o Thomson Reuters
o Ingenuity
o Open up vocabularies (or most of the data out there…)
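Entity recognition against common vocabularies can start as simple dictionary (gazetteer) matching: find known labels in text and tag each hit with its concept URI, giving annotations that can sit beside structured data in a triplestore. A minimal regular-expression sketch; the dictionary and URIs are hypothetical, and real pipelines add tokenization, synonym expansion and disambiguation:

```python
import re
from typing import Dict, List, NamedTuple


class Annotation(NamedTuple):
    start: int  # character offset where the mention begins
    end: int    # character offset just past the mention
    text: str   # matched surface form, as written in the text
    uri: str    # concept URI from the vocabulary


# Hypothetical label -> concept dictionary (keys must be lower case).
DICTIONARY: Dict[str, str] = {
    "asthma": "http://example.org/vocab/disease/asthma",
    "salbutamol": "http://example.org/vocab/drug/salbutamol",
    "beta-2 agonist": "http://example.org/vocab/class/beta2-agonist",
}


def annotate(text: str) -> List[Annotation]:
    """Tag every dictionary label in text (case-insensitive, whole words)."""
    # Longer labels first so multi-word terms win over shorter overlapping ones.
    labels = sorted(DICTIONARY, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(label) for label in labels) + r")\b",
        re.IGNORECASE,
    )
    return [
        Annotation(m.start(), m.end(), m.group(0), DICTIONARY[m.group(0).lower()])
        for m in pattern.finditer(text)
    ]


for a in annotate("Salbutamol, a beta-2 agonist, relieves asthma symptoms."):
    print(a.start, a.end, a.text, a.uri)
```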
17. Pre-Competitive Consortia
• Open PHACTS (Innovative Medicines Initiative)
• Pistoia Alliance
• W3C Health Care & Life Sciences Interest Group
• National Center for Biomedical Ontology (NCBO)
• Open BEL (Biological Expression Language)
18. Open PHACTS (Open Pharmacological Space)
• EU/EFPIA Innovative Medicines Initiative (IMI) project
From: Open PHACTS Architecture - Building the extensible platform (EuroQSAR 2012 in Vienna, 30.08.2012)
19. W3C HCLS
• Activities:
o Continue to develop high level (e.g. TMO) and architectural (e.g. SWAN) vocabularies.
o Implement proof-of-concept demonstrations and industry-ready code.
o Document guidelines to accelerate the adoption of the technology.
o Disseminate information about the group's work at government, industry, and academic events and by participating in community initiatives.
• Use Cases/Domains
o Drug Discovery
o Electronic Lab Notebooks
o Comparator Arm Data
o Patient Data Ownership
o Biotech Acquisition
o Supply Chain Automation
o Web Integration
o Bio-surveillance
o Co-development
http://www.w3.org/blog/hcls/
The mission of the Semantic Web Health Care and Life Sciences Interest Group (HCLS IG) is to develop, advocate for, and support the use of Semantic Web technologies across health care, life sciences, clinical research and translational medicine.
20. Pleas & Future Directions
Prognostications
RDF Content Farms
• Vendors: Someone will figure out how to monetize this
• Consortia: Who "Owns" this?
• Government in Health Care & Life Sciences; can we learn from the EPA? open.gov?
Shrinking Pharma
• Smaller (or virtual) footprint
o Back to first principles: what do we do best?
• More modeling & simulation
• Rise of the informaticist…
Community Help
Resist Silos
• Where is your data? Where is it likely to be in 5, 10 years?
• A single triplestore with all ETL streams leading to an RDF "data warehouse" is another silo
o Building on top of "standards+" may lead to silos
• Need to follow & influence the emergence of standards if you have a "horse in the race"
Support (business focused) Consortiums
• We're doing the same job many, many times
From 2010 through 2013, 30 blockbuster drugs with an annual sales total of approximately $98 billion have already had or will see their patents expire. The annual growth of the generic pharmaceutical industry (7.3%) is three times as high as the annual growth of the brand-name pharmaceutical industry (2.4%).