Elsevier is a global information analytics business that helps institutions and professionals advance healthcare and open science to improve performance for the benefit of humanity.
In this webinar, we discuss how Elsevier is increasingly leveraging the FAIR Guiding Principles to improve its products and services to better serve the scientific community.
PA webinar on benefits & costs of FAIR implementation in life sciences (Pistoia Alliance)
Slides from the Pistoia Alliance Debates Webinar, in which a panel of experts from technology providers and the biopharma industry were invited to share their views on the "Benefits and costs of FAIR implementation for the life science industry".
Knowledge graphs - Ilaria Maresi, The Hyve, 23 Apr 2020 (Pistoia Alliance)
Data for drug discovery and healthcare is often trapped in silos which hampers effective interpretation and reuse. To remedy this, such data needs to be linked both internally and to external sources to make a FAIR data landscape which can power semantic models and knowledge graphs.
Application of recently developed FAIR metrics to the ELIXIR Core Data Resources (Pistoia Alliance)
The FAIR (Findable, Accessible, Interoperable and Reusable) principles aim to maximize the discovery and reuse of digital resources. Using recently developed software and metrics to assess FAIRness and supported through an ELIXIR Implementation Study, Michel worked with a subset of ELIXIR Core Data Resources to apply these technologies. In this webinar, he will discuss their approach, findings, and lessons learned towards the understanding and promotion of the FAIR principles.
Open interoperability standards, tools and services at EMBL-EBI (Pistoia Alliance)
In this webinar Dr Henriette Harmse from EMBL-EBI presents how they are using their ontology services at EMBL-EBI to scale up the annotation of data and deliver added value through ontologies and semantics to their users.
FAIRification experience: clarifying the semantics of data matrices (Pistoia Alliance)
This webinar presents the Statistics Ontology (STATO), a semantic framework to support the creation of standardized analysis reports that help with the review of results in the form of data matrices. STATO includes a hierarchy of classes and a vocabulary for annotating statistical methods used in life, natural and biomedical science investigations, text mining and statistical analyses.
This presentation reviewed the challenges in identifying, acquiring and utilizing research data in relation to an evolving data market. Strategic solutions were examined in which the FAIR principles play a key role in the future of data management.
AI-SDV 2021: Jay ven Eman - implementation-of-new-technology-within-a-big-pha... (Dr. Haxel Consult)
Synonymy breaks search! How? Why is this important? What synonymy is and how it breaks search will be explained with real-world examples. AI-based solutions are proposed, and relevant standards are identified. How synonym solutions should be used for search is explained. Learn what you can do yourself. Tools help, but it doesn't have to be complicated or expensive. It is as straightforward as setting priorities!
OntoChem IT Solutions GmbH was founded in 2015 as a purely IT-oriented offshoot of OntoChem GmbH. Even before then we had many years of experience, and it has always been our mission to provide added value to our customers by helping them navigate today's complex information world: developing cognitive computing solutions, indexing intranet and internet data, and applying semantic search solutions for pharmaceutical, materials science and technology-driven businesses.
We strive to support our customers with the most useful tools for knowledge discovery possible, encompassing up-to-date data sources, optimized ontologies and high-throughput semantic document processing and annotation techniques.
We create new knowledge from structured and unstructured data by extracting relationships, thereby exploiting the full potential of full-text documents and databases while also scanning social media and news flows and analyzing web pages.
We aim at an unprecedented machine understanding of text and subsequent knowledge extraction and inference. Applying our methods to chemical compounds and their properties supports our customers in generating intellectual property and in using those compounds as novel therapeutics, agrochemical products, nutraceuticals, cosmetics and novel materials.
It's our mission to provide added value to customers by:
developing and applying cognitive computing solutions
creating intranet and internet data indexing and semantic search solutions
providing Big Data analytics for technology-driven businesses
supporting product development and surveillance.
We deliver useful tools for knowledge discovery for:
creating background knowledge ontologies
high-throughput semantic document processing and annotation
knowledge mining by extracting relationships
exploiting the full potential of full-text documents & databases while also scanning social media and news flows and analyzing web pages.
Bio Data World - The promise of FAIR data lakes - The Hyve - 20191204 (Kees van Bochove)
At the Bio Data World conference in Basel in December 2019, Kees van Bochove, Founder of The Hyve, gave a talk on the re-use of pharma R&D data and the strategies that could be used to operationalize FAIR data at scale.
"Big data" is a broad term that encompasses a wide range of data and content. Big data offers new approaches to analysis and decision making. At first glance, big data and IP may seem to be opposites, but they have more in common than one might think. This talk focuses on how big data will impact, and be impacted by, IP. One of the biggest promises of big data is the possibility of re-using data produced by different sources, creating new services or predicting the future through the analysis of correlations. In this context, how can companies protect information assets and analytical skills? What new skills are required to search and analyze large numbers of datasets in real time? Big data will change not only patent information but will also generate new types of patents.
SciBite is an award-winning, leading provider of semantic solutions for the life sciences industry. Our fast, scalable, easy-to-use semantic technologies understand the complexity and variability of content within the life sciences. We can quickly identify and extract scientific terminology from unstructured text and transform it into valuable machine-readable data for your downstream applications. Our hand-curated ontologies ensure the accuracy and reliability of high-quality results. Headquartered in the UK, we support our customers from additional sites in the US and Japan.
More info at: www.scibite.com
Biomax provides computational solutions for better decision making and knowledge management in the life science industry. Biomax helps customers generate value from proprietary and public resources by extracting the knowledge indispensable for efficient data exploration and interpretation. The company focuses on integrating information to enable a knowledge-based approach to developing innovative life science products, supporting its customers with a platform that combines software products with knowledge resources in areas including oncology, nutrigenomics, plant research and functional genomics. With the launch of the NeuroXM Brain Science Suite in 2018, Biomax added products tailored to the field of connectome research. The new semantic search platform AILANI provides a corporate-wide knowledge repository accessible to everyone, any time and from anywhere. Biomax's worldwide customer community includes companies and research organizations that are successful in the areas of drug discovery, diagnostics, fine chemicals, food and plant production.
FAIR Data Knowledge Graphs - from Theory to Practice (Tom Plasterer)
FAIR data has flown up the hype curve without a clear sense of return on the required data stewardship investment. The killer use case for FAIR data is a science knowledge graph, which lets you richly address novel questions across your own and the world's data. We started with data catalogues (findability) that exploited linked and referenced data using a few focused vocabularies (interoperability), for credentialed users (accessibility), with provenance and attribution (reusability). Our processes enable simple creation of dataset records and linking to source data, providing a seamless federated knowledge graph for novice and advanced users alike.
Presented May 7th, 2019 at the Knowledge Graph Conference, Columbia University.
IC-SDV 2018: Aleksandar Kapisoda (Boehringer) Using Machine Learning for Auto... (Dr. Haxel Consult)
Focusing on the significance of targets is one of the key drivers for quality of web search.
Filtering targeted companies based on the significance of their business model for the expected search results was one of our “nice to haves” last year.
Evaluating a number of artificial intelligence approaches based on neural networks, classical machine learning and semantic technologies led us to a working hybrid approach.
Kairntech combines technologies from natural language processing (NLP) and machine learning to support clients in analysing large amounts of text-based information.
You can find more information at https://kairntech.com/
Managing sensitive data at the University of Bristol (Jisc RDM)
Presentation on managing sensitive data at the University of Bristol by Kellie Snow, Research Data Librarian, delivered at the Research Data Network event, May 2016, Cardiff University.
Access the webinar: http://goo.gl/p08pTz
These slides were presented in a webinar by Denodo in collaboration with BioStorage Technologies and Indiana Clinical and Translational Sciences Institute and Regenstrief Institute.
BioStorage Technologies, Inc., Indiana Clinical and Translational Sciences Institute (CTSI), and Regenstrief Institute have joined Denodo to talk about the important role of technological advancements, such as data virtualization, in advancing biospecimen research.
By watching this webinar, you can gain insight into best practices around the integration of biospecimen and research data as well as technology solutions that provide consolidated views and rapid conversions of this data into valuable business insights. You will also learn how data virtualization can assist with the integration of data residing in heterogeneous repositories and can securely deliver aggregated data in real-time.
Themes and objectives:
To position FAIR as a key enabler to automate and accelerate R&D process workflows
FAIR Implementation within the context of a use case
Grounded in precise outcomes (e.g. faster and bigger science / more reuse of data to enhance value / increased ability to share data for collaboration and partnership)
To make data actionable through FAIR interoperability
Speakers:
Mathew Woodwark, Head of Data Infrastructure and Tools, Data Science & AI, AstraZeneca
Erik Schultes, International Science Coordinator, GO-FAIR
Georges Heiter, Founder & CEO, Databiology
Panel Discussion: Integrated Scientific Discovery (HPCC Systems)
From the 2017 HPCC Systems Community Day:
Ann Gabriel will lead a panel discussion on our academic program collaboration across RELX and how we are bringing together disparate data sources for new knowledge creation.
We are pleased to have panelists from:
Amy Apon, Professor and Chair, Division of Computer Science, School of Computing, Clemson University, SC
J. Christoph Freytag, Professor, Humboldt University Berlin, Germany
Tim Menzies, Professor, Computer Science, North Carolina State University, Raleigh, NC
Jon Preston, Interim Dean of CCSE, Kennesaw State University, Kennesaw, GA
HETT Conference Olympic Central 2014: Integrating Healthcare Delivery (Elmar Flamme)
Integrating Healthcare Delivery through the Innovative Use of Information & Technology - a user story from behind the CONTENT-covered mountains and the deep BIG DATA forest
Drive Compliance and Profit with Oracle Healthcare Analytics (Perficient, Inc.)
Learn how Oracle's Enterprise Health Analytics (EHA), coupled with Oracle Business Intelligence, speeds the delivery of clinical event reporting by leveraging data integrations to Cerner and the EHA Healthcare Data model.
How EHA integrates EMR and other operational data to provide actionable information with integrity and precision to ready you for the ACO market
How EHA integrates clinical, financial, administrative and research data to speed the time from data input to robust retrospective and predictive analytics
Examples of how Oracle EHA can unlock your EMR data for hospital-acquired conditions and prevention, ad-hoc and standard reporting and other valuable metadata
Data warehouse solutions including strategic roadmaps for Meaningful Use, Population Health Management and Accountable Care
Open Insights Harvard DBMI - Personal Health Train - Kees van Bochove - The Hyve (Kees van Bochove)
In this talk, the Personal Health Train concept is introduced, which enables running personalized medicine workflows as trains visiting data stations (e.g. hospital records, primary care records, clinical studies and registries, and patient-held data from wearable sensors). The Personal Health Train is a very powerful concept, which is, however, dependent on source medical data being coded with appropriate metadata on consent, license, scope etc., and on the data itself being encoded using biomedical data standards, an ever-growing field in biomedical informatics. To realize the Personal Health Train, biomedical data will need to be FAIR, i.e. adopt the FAIR Guiding Principles. The talk also covers the emerging international GO-FAIR movement and provides examples of how several European health data networks are adopting open-standards-based stacks to make routine healthcare data accessible for research.
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl... (Jack DiGiovanna)
Making data and analytics FAIR has transformative potential within organizations to build on existing knowledge. FAIR resources also democratize access to information and tools in underserved communities. Global standards and analysis platforms provide strong foundational elements. However, FAIRness across time and different sectors of the biomedical workforce presents challenges. Here we summarize how platforms make data and analysis FAIR today and what we see as key areas of future focus.
BioIT 2024 invited talk.
Turning FAIR into Reality - Role for Libraries (dri_ireland)
Presentation by Dr. Natalie Harrower, Director Digital Repository of Ireland and European Commission FAIR data expert group member, on what role librarians can play in the FAIR ecosystem. "Applying the FAIR data principles in day-to-day library practice" session by the Research Data Management Working Group, LIBER Steering Committee Research Infrastructures, LIBER2019, Dublin, 26 June 2019
Overview of FAIR and the IMI FAIRplus project at the UK Conference of Bioinformatics and Computational Biology 2020: https://www.earlham.ac.uk/uk-conference-bioinformatics-and-computational-biology-2020
dkNET Webinar: Creating and Sustaining a FAIR Biomedical Data Ecosystem 10/09... (dkNET)
Abstract
In this presentation, Susan Gregurick, Ph.D., Associate Director of Data Science and Director, Office of Data Science Strategy at the National Institutes of Health, will share the NIH's vision for a modernized, integrated FAIR biomedical data ecosystem and the strategic roadmap that NIH is following to achieve this vision. Dr. Gregurick will highlight projects being implemented by team members across the NIH's 27 institutes and centers and will discuss ways that industry, academia, and other communities can help NIH enable a FAIR data ecosystem. Finally, she will weave in how this strategy is being leveraged to address the COVID-19 pandemic.
Presenter: Susan Gregurick, Ph.D., Associate Director of Data Science and Director, Office of Data Science Strategy at the National Institutes of Health
dkNET Webinar Information: https://dknet.org/about/webinar
A hybrid approach to data management is emerging in healthcare as organizations recognize the value of an enterprise data warehouse in combination with a data lake.
In this SlideShare, we discuss data lakes in healthcare and we:
Provide an overview of a Hadoop-based data lake architecture and integration platform, and its application in machine learning, predictive modeling, and data discovery
Discuss several key use cases driving the adoption of data lakes for both providers and health plans
Discuss available data storage forms and the required tools for a data lake environment
Detail best practices for conducting data lake assessments and review key implementation considerations for healthcare
The Data Operating System: Changing the Digital Trajectory of Healthcare (Health Catalyst)
In 1989, John Reed, the CEO of Citibank and an early pioneer of ATMs, said, "I can see a future in which the data and information that is exchanged in our transactions are worth more than the transactions themselves." We are at an interesting digital nexus in healthcare. Few of us would argue against the notion that data and digital health will play a bigger and bigger role in the future. But are we on the right track to deliver on that future? It required $30B in federal incentive money to subsidize the uptake of Electronic Health Records (EHRs). You could argue that the federal incentives stimulated the first major step towards the digitization of health, but few physicians would celebrate its value in comparison to its expense. As the healthcare market consolidates through mergers and acquisitions (M&A), patching disparate EHRs and other information systems together becomes even more important, and more challenging. An organization is not integrated until its data is integrated, but a costly forklift replacement of these transaction systems to consolidate them into a single EHR solution is not financially viable.
Turning FAIR into Reality: Briefing on the EC's report on FAIR data (dri_ireland)
DRI Director Natalie Harrower, a member of the European Commission's Expert Group on FAIR (Findable, Accessible, Interoperable and Re-usable) data, delivered a lunchtime briefing on the recently published 'Turning FAIR into Reality' report on Tuesday 26 February in the Royal Irish Academy, Dublin.
In 2016 the FAIR Data Principles were developed to support the position that effective research data management is ‘not a goal in itself but rather is the key conduit leading to knowledge discovery and innovation’. The new publication is both a report and an action plan for turning FAIR into reality. It offers a survey and analysis of what is needed to implement FAIR and it provides a set of concrete recommendations and actions for stakeholders in Europe and beyond.
The briefing provided an overview of the contents of the report, which include the principles of FAIR, as well as the elements required to implement FAIR data.
The Data Operating System: Changing the Digital Trajectory of Healthcare (Dale Sanders)
This is the next evolution in health information exchanges and data warehouses, specifically designed to support analytics, transaction processing, and third party application development, in one platform, the Data Operating System.
Healthcare is undergoing a fundamental transformation, driven by advancing innovations and demand for a 360-degree view of patient care. Whether providers, payers, or pharmaceutical companies, organizations across the industry face an inundation of data, often in new and varied formats.
Innovative applications of microphysiological systems (MPS) have been growing over the past decade, especially with respect to the use of complex human tissues for assessing the safety of drug candidates – but broad industry adoption of MPS methods has not yet become a reality.
This webinar addresses some recent advances in MPS development and begins to explore the barriers to increased incorporation of MPS to improve drug safety assessment and to provide safer, more effective drugs into the clinical pipeline.
Federated Learning (FL) is a learning paradigm that enables collaborative learning without centralizing datasets. In this webinar, NVIDIA presents the concept of FL and discusses how it can help overcome some of the barriers seen in the development of AI-based solutions for pharma, genomics and healthcare. Following the presentation, the panel debates other elements that could drive the adoption of digital approaches more widely and help answer currently intractable science and business questions.
AI is becoming a buzzword, much like design thinking: everyone is talking about AI or wants to have AI and sees all the ideas and benefits – that's fine, but how do you get started? And what's different now? Three innovations have finally put AI on the fast track: Big Data, with the internet and sensors everywhere; massive computing power, especially through the cloud; and the development of breakthrough algorithms, so computers can be trained to accomplish more sophisticated tasks on their own with deep learning. If you use new technology, you need to explore and know what's possible. Design thinking helps outline the steps and define the ways in which you are going to create the solution, starting with mapping the customer journey and defining who will be using the service enhanced with intelligent technology, or who will benefit and gain value from it. We discuss how these two worlds are coming together and how you can get started transforming your venture with Artificial Intelligence using Design Thinking.
Speaker: Claudio Mirti, Principal Solution Specialist – Data & AI, Microsoft
2020.04.07 Automated molecular design and the Bradshaw platform webinar - Pistoia Alliance
This presentation described how data-driven chemoinformatics methods may automate much of what has historically been done by a medicinal chemist. It explored what is reasonable to expect “AI” approaches might achieve, and what is best left with a human expert. The implications of automation for the human-machine interface were explored and illustrated with examples from Bradshaw, GSK’s experimental automated design environment.
Dr. Dennis Wang discusses possible ways to make ML methods more powerful for discovery and to reduce ambiguity within translational medicine, allowing data-informed decision-making to deliver the next generation of diagnostics and therapeutics to patients faster, at lower cost, and at scale.
The talk by Dr. Dennis Wang was followed by a panel discussion with Mr. Albert Wang, M. Eng., Head, IT Business Partner, Translational Research & Technologies, Bristol-Myers Squibb.
With the explosion of interest in both enhanced knowledge management and open science, the past few years have seen considerable discussion about making scientific data “FAIR” — findable, accessible, interoperable, and reusable. The problem is that most scientific datasets are not FAIR. When left to their own devices, scientists do a poor job of creating the metadata that describe the experimental datasets that make their way into online repositories. The lack of standardization makes it extremely difficult for other investigators to locate relevant datasets, to re-analyse them, and to integrate them with other data. The Center for Expanded Data Annotation and Retrieval (CEDAR) has the goal of enhancing the authoring of experimental metadata to make online datasets more useful to the scientific community. The CEDAR workbench for metadata management will be presented in this webinar. CEDAR illustrates the importance of semantic technology in driving open science. It also demonstrates a means for simplifying access to scientific datasets and enhancing the reuse of the data to drive new discoveries.
Implementing Blockchain applications in healthcare - Pistoia Alliance
Blockchain technology can revolutionise the way information is exchanged between parties by bringing an unprecedented level of security and trust to these transactions. The technology is finding its way into multiple use cases but we are yet to see full adoption and real-world business implementation in the Healthcare industry.
In this webinar we will explore the main challenges and considerations for the implementation of Blockchain technology in Healthcare use cases. This is the third webinar in our Blockchain Education series.
Building trust and accountability - the role User Experience design can play ... - Pistoia Alliance
In this webinar our panel of UX specialists give a brief introduction to User Experience before presenting the design opportunities UX can bring to AI. We all know that AI has great potential, but it has some significant hurdles to overcome, not least the human aspects of trust and the ethical considerations of designing in the life sciences.
In the late Fall and Winter of 2018, the Pistoia Alliance, in cooperation with Elsevier and the charitable organizations Cures within Reach and Mission: Cure, ran a datathon aiming to find drugs suitable for treatment of childhood chronic pancreatitis, a rare disease that causes extreme suffering. The datathon resulted in the identification of four candidate compounds in a short time frame of just under three months. In this webinar our speakers discuss the technologies that made this leap possible.
Creating novel drugs is an extraordinarily hard and complex problem.
One of the many challenges in drug design is the sheer size of the search space for novel chemical compounds. Scientists need to find molecules that are active toward a biological target or pathway and at the same time have acceptable ADMET properties.
There is now considerable research going on using various AI and ML approaches to tackle these challenges.
Our distinguished speakers, Drs. Alex Tropsha and Ola Engkvist, will discuss their recent work in Drug Design involving Deep Reinforcement Learning and Neural Networks, and will answer questions from the audience on the current state of the research in the field.
Speakers:
Prof Alex Tropsha, Professor at University of North Carolina at Chapel Hill, USA
Dr. Ola Engkvist, Associate Director at AstraZeneca R&D, Gothenburg, Sweden
The slides from the continuing part of the Pistoia Alliance's drive to improve education and communication around new technologies for life science professionals. This webinar explored how blockchain/DLT and IoT could come together to add even more trust to the GxP domain. If you want to know more about how these new technologies could help enhance GxP compliance, then this webinar will give you much food for thought.
This talk presents an overview of the philosophy and ongoing work of the PhUSE project “Clinical Trials Results as Resource Description Framework.” The team is converting data from the CDISC Study Data Tabulation Model (SDTM) to graph data using an ontology-based approach. The wider implications of this work will be discussed, along with deployment strategies within and beyond the industry.
Pistoia Alliance harmonizing FAIR data catalog approaches webinar - Pistoia Alliance
Multiple groups in the life sciences community have started their journey towards data FAIR-ification by implementing Data Catalogs, a clear first step towards Finding your data. While in many cases the approaches are quite similar in both origin and intent, differing implementations could end up hampering interoperability and reuse. The Pistoia Alliance and the Linked Data Community of Practice hosted a panel discussion looking at three implementations and their downstream goals:
[1] Pharma cross-omics data catalogs,
[2] Clinical data catalogs
[3] Bioschemas for dataset discoverability on the inter/intranet
Joint Pistoia Alliance & PRISME AI in pharma webinar 18 Oct 2018 - Pistoia Alliance
To advance Machine Learning-driven analytic approaches, access to more data is better. To build increasingly large patient-level datasets, researchers require the pooling of data from participants across the Healthcare ecosystem.
Common requirements and technical design patterns have emerged from company-specific and industry consortia efforts, forming underlying patterns that make up an overall Reference Architecture for data that can ultimately feed new analytics and Machine Learning.
Pistoia Alliance datathon for drug repurposing for rare diseases - Pistoia Alliance
As part of the Pistoia Alliance Centre of Excellence for AI in Life Sciences, we are running a datathon.
The Rare Disease Drug Repurposing Datathon is your chance to advance knowledge on rare diseases and illustrate best practices in data science. Are you ready to help make a difference — and to showcase your organization’s data science work and skills?
As a run-up to the Pistoia Alliance Blockchain Bootcamp in October, we are pleased to bring you an in-depth introduction to blockchain technology. Hear Wolfgang Prinz and Wolfgang Gräther from Fraunhofer FIT provide an explanation of the underlying technology, the current uses of blockchain in other industries, and when blockchain is appropriate for various use cases, followed by a demo.
Pistoia Alliance Webinar Demystifying AI: Centre of Excellence for AI Webina... - Pistoia Alliance
Pistoia Alliance launched its Centre of Excellence for Artificial Intelligence (AI) in Life Sciences where we hope to bring together best practice, adoption strategy and hackathons covering a range of challenges.
Over the coming months we will be hosting a series of topics and speakers giving their perspectives on the role of Artificial & Augmented Intelligence in Life Sciences and Healthcare.
The topics will cover some of the current challenges, user stories, and value in using AI in life sciences. If you want to get involved in this series as a speaker or suggest topics, please get in touch.
Webinar 1 focused on the following:
A Brief History
Big Data/ML/DL/AI - fundamentals and concepts
Data Fidelity importance
Some best practices
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2... - pchutichetpong
M Capital Group (“MCG”) expects demand to grow and supply to evolve, driven by institutional investment rotating out of offices and into work from home (“WFH”), alongside the ever-expanding need for data storage as global internet usage grows, with experts predicting 5.3 billion users by 2023. These market factors will be underpinned by technological changes, such as advancing cloud services and edge sites, allowing the industry to expect strong annual growth of 13% over the next 4 years.
Whilst competitive headwinds remain, represented through the recent second bankruptcy filing of Sungard, which blames “COVID-19 and other macroeconomic trends including delayed customer spending decisions, insourcing and reductions in IT spending, energy inflation and reduction in demand for certain services”, the industry has seen key adjustments, where MCG believes that engineering cost management and technological innovation will be paramount to success.
MCG reports that more favorable market conditions expected over the next few years, helped by the winding down of pandemic restrictions and a hybrid working environment, will drive market momentum forward. The continuous injection of capital by alternative investment firms, as well as the growing infrastructural investment from cloud service providers and social media companies, whose revenues are expected to grow more than 3.6x by value by 2026, will likely help propel data center provision and innovation. These factors paint a promising picture for the industry players that offset rising input costs and adapt to new technologies.
According to M Capital Group: “Specifically, the long-term cost-saving opportunities available from the rise of remote managing will likely aid value growth for the industry. Through margin optimization and further availability of capital for reinvestment, strong players will maintain their competitive foothold, while weaker players exit the market to balance supply and demand.”
Explore our comprehensive data analysis project presentation on predicting product ad campaign performance. Learn how data-driven insights can optimize your marketing strategies and enhance campaign effectiveness. Perfect for professionals and students looking to understand the power of data analysis in advertising. for more details visit: https://bostoninstituteofanalytics.org/data-science-and-artificial-intelligence/
Adjusting primitives for graph: SHORT REPORT / NOTES - Subhajit Sahu
Graph algorithms like PageRank commonly operate on the Compressed Sparse Row (CSR) format, an adjacency-list based graph representation that is compact and efficient to traverse.
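The CSR layout mentioned above can be sketched as follows (illustrative code, not the report's implementation): all destination vertices live in one flat array, and an offsets array gives each vertex's slice of out-neighbours.

```python
# Build a Compressed Sparse Row (CSR) representation of a directed graph.
# edges[offsets[u]:offsets[u+1]] holds the out-neighbours of vertex u.

def to_csr(num_vertices, edge_list):
    """Convert an edge list [(u, v), ...] into (offsets, edges)."""
    degree = [0] * num_vertices
    for u, _ in edge_list:
        degree[u] += 1
    offsets = [0] * (num_vertices + 1)
    for u in range(num_vertices):
        offsets[u + 1] = offsets[u] + degree[u]
    edges = [0] * len(edge_list)
    cursor = offsets[:-1].copy()          # next free slot per vertex
    for u, v in edge_list:
        edges[cursor[u]] = v
        cursor[u] += 1
    return offsets, edges

def neighbours(offsets, edges, u):
    return edges[offsets[u]:offsets[u + 1]]

offsets, edges = to_csr(4, [(0, 1), (0, 2), (1, 2), (2, 0), (3, 2)])
print(neighbours(offsets, edges, 0))   # → [1, 2]
```

Two flat arrays replace per-vertex adjacency lists, which improves cache locality and makes the structure easy to copy to a GPU.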
Multiply with different modes (map)
1. Performance of sequential execution based vs OpenMP based vector multiply.
2. Comparing various launch configs for CUDA based vector multiply.
Sum with different storage types (reduce)
1. Performance of vector element sum using float vs bfloat16 as the storage type.
Sum with different modes (reduce)
1. Performance of sequential execution based vs OpenMP based vector element sum.
2. Performance of memcpy vs in-place based CUDA based vector element sum.
3. Comparing various launch configs for CUDA based vector element sum (memcpy).
4. Comparing various launch configs for CUDA based vector element sum (in-place).
Sum with in-place strategies of CUDA mode (reduce)
1. Comparing various launch configs for CUDA based vector element sum (in-place).
Techniques to optimize the PageRank algorithm usually fall into two categories: reducing the work per iteration, and reducing the number of iterations. These goals are often at odds with one another. Skipping computation on vertices which have already converged has the potential to save iteration time. Skipping in-identical vertices, which share the same in-links, helps avoid duplicate computations and thus could reduce iteration time. Road networks often have chains which can be short-circuited before PageRank computation to improve performance, since the final ranks of chain nodes can be easily calculated; this could reduce both the iteration time and the number of iterations. If a graph has no dangling nodes, the PageRank of each strongly connected component can be computed in topological order. This could help reduce the iteration time and the number of iterations, and also enable multi-iteration concurrency in PageRank computation. The combination of all of the above methods is the STICD algorithm. [sticd] For dynamic graphs, unchanged components whose ranks are unaffected can be skipped altogether.
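One of the optimizations above, skipping vertices whose rank has already converged, can be sketched in a few lines (a toy illustration, not the STICD implementation; the graph, damping factor, and tolerance are illustrative defaults):

```python
# Minimal pull-based PageRank that stops updating vertices once their
# rank change falls below the tolerance, saving per-iteration work.

def pagerank(graph, d=0.85, tol=1e-10, max_iter=100):
    """graph: {vertex: [out-neighbours]}; returns {vertex: rank}."""
    n = len(graph)
    in_nbrs = {v: [] for v in graph}      # pull-style: who links to me?
    for u, outs in graph.items():
        for v in outs:
            in_nbrs[v].append(u)
    out_deg = {u: len(outs) for u, outs in graph.items()}
    rank = {v: 1.0 / n for v in graph}
    converged = set()
    for _ in range(max_iter):
        new_rank = {}
        for v in graph:
            if v in converged:            # skip already-converged vertices
                new_rank[v] = rank[v]
                continue
            s = sum(rank[u] / out_deg[u] for u in in_nbrs[v] if out_deg[u])
            new_rank[v] = (1 - d) / n + d * s
            if abs(new_rank[v] - rank[v]) < tol:
                converged.add(v)
        rank = new_rank
        if len(converged) == n:
            break
    return rank

ranks = pagerank({0: [1], 1: [2], 2: [0]})   # 3-cycle: ranks are all 1/3
```

Marking a vertex converged freezes its rank for later iterations; production implementations re-check frozen vertices when their in-neighbours still move, which this sketch omits.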
Opendatabay - Open Data Marketplace.pptx - Opendatabay
Opendatabay.com unlocks the power of data for everyone. Open Data Marketplace fosters a collaborative hub for data enthusiasts to explore, share, and contribute to a vast collection of datasets.
First ever open hub for data enthusiasts to collaborate and innovate. A platform to explore, share, and contribute to a vast collection of datasets. Through robust quality control and innovative technologies like blockchain verification, opendatabay ensures the authenticity and reliability of datasets, empowering users to make data-driven decisions with confidence. Leverage cutting-edge AI technologies to enhance the data exploration, analysis, and discovery experience.
From intelligent search and recommendations to automated data productisation and quotation, Opendatabay's AI-driven features streamline the data workflow. Finding the data you need shouldn't be complex. Opendatabay simplifies the data acquisition process with an intuitive interface and robust search tools. Effortlessly explore, discover, and access the data you need, allowing you to focus on extracting valuable insights. Opendatabay breaks new ground with dedicated, AI-generated synthetic datasets.
Leverage these privacy-preserving datasets for training and testing AI models without compromising sensitive information. Opendatabay prioritizes transparency by providing detailed metadata, provenance information, and usage guidelines for each dataset, ensuring users have a comprehensive understanding of the data they're working with. By leveraging a powerful combination of distributed ledger technology and rigorous third-party audits, Opendatabay ensures the authenticity and reliability of every dataset. Security is at the core of Opendatabay: the marketplace implements stringent security measures, including encryption, access controls, and regular vulnerability assessments, to safeguard your data and protect your privacy.
Levelwise PageRank with Loop-Based Dead End Handling Strategy: SHORT REPORT ... - Subhajit Sahu
Abstract: Levelwise PageRank is an alternative method of PageRank computation which decomposes the input graph into a directed acyclic block-graph of strongly connected components, and processes them in topological order, one level at a time. This enables calculation of ranks in a distributed fashion without per-iteration communication, unlike the standard method where all vertices are processed in each iteration. It comes, however, with a precondition of the absence of dead ends in the input graph. Here, the native non-distributed performance of Levelwise PageRank was compared against Monolithic PageRank on a CPU as well as a GPU. To ensure a fair comparison, Monolithic PageRank was also performed on a graph where vertices were split by components. Results indicate that Levelwise PageRank is about as fast as Monolithic PageRank on the CPU, but quite a bit slower on the GPU. The slowdown on the GPU is likely caused by the submission of a large number of small workloads, and is expected to be a non-issue when the computation is performed on massive graphs.
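The levelwise scheme described in the abstract can be sketched as follows. This is an illustrative reconstruction, not the report's code: it assumes the SCC decomposition into topological levels has already been computed (the `levels` argument) and that the graph has no dead ends.

```python
# Levelwise PageRank sketch: ranks are computed one topological level of
# strongly connected components at a time; earlier levels are already
# final, so each level only iterates over its own vertices.

def levelwise_pagerank(graph, levels, d=0.85, tol=1e-10, max_iter=100):
    """graph: {v: [out-neighbours]}; levels: list of vertex sets in
    topological order of the SCC condensation (no dead ends assumed)."""
    n = len(graph)
    in_nbrs = {v: [] for v in graph}
    for u, outs in graph.items():
        for v in outs:
            in_nbrs[v].append(u)
    out_deg = {u: len(outs) for u, outs in graph.items()}
    rank = {v: 1.0 / n for v in graph}
    for level in levels:                  # earlier levels are already final
        for _ in range(max_iter):
            delta = 0.0
            for v in level:               # only iterate within this level
                s = sum(rank[u] / out_deg[u] for u in in_nbrs[v])
                new = (1 - d) / n + d * s
                delta = max(delta, abs(new - rank[v]))
                rank[v] = new
            if delta < tol:
                break
    return rank

# Two SCCs in topological order: {0, 1} (a 2-cycle) feeding {2} (self-loop).
ranks = levelwise_pagerank({0: [1, 2], 1: [0, 2], 2: [2]}, [{0, 1}, {2}])
```

Because a level only reads ranks from itself and from earlier (already converged) levels, each level can in principle be computed on a different machine with no per-iteration communication.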
FAIR webinar, Ted Slater: Progress towards commercial FAIR data products and services, 19 Sep 2019
1. 19 September 2019
Ted Slater, Sr. Director Product Management PaaS, Elsevier
t.slater@elsevier.com
Progress Towards
Commercial FAIR Data
Products and Services
Playing FAIR at Elsevier
2. Summary
• About Elsevier
• Elsevier’s commitment to FAIR Data
• External efforts
• Internal efforts
• Wrap up & questions
3. About Elsevier
• Elsevier is a global information
analytics company that helps
institutions and professionals
progress science, advance
healthcare and improve performance
for the benefit of humanity.
• Founded in 1880.
• The logo represents the symbiotic
relationship between publisher and
scholar. Non solus means “not
alone.”
• Empowering Knowledge™
4. RELX actively harnesses & invests in disruptive big data & analytics
• REV Venture Partners, RELX Group’s venture arm, has invested £150M in promising big data & analytics companies, including Palantir
• RELX Group’s High Performance Computing Cluster (HPCC) analyzes structured and unstructured data across all market segments
• To develop expertise in Artificial Intelligence, LexisNexis has invested $1.2MM in technology to streamline development and improve performance for customers
RELX operates in 4 major market segments:
• Scientific, Technical & Medical
• Risk & Business Analytics
• Legal
• Exhibitions
Where RELX is going:
• Deliver improved outcomes to customers
• Combine content & data with analytics & technology in global platforms
• Build leading positions in long-term global growth markets
How RELX is getting there:
• Leverage institutional skills, assets and resources across RELX
• Organic development: investment in transforming core business; build-out of new products
• Portfolio reshaping
Elsevier is part of RELX, a global provider of information-based analytics and decision tools for professional and business customers.
5. Scientific information and analytics are core RELX group capabilities
• Source strong data
• Develop deep understanding of customer needs
• Build the right infrastructure
• Apply the right analytics
• Continuous refinement
We harness deep customer understanding to create innovative solutions which combine content and data with analytics and technology.
…we serve customers in 180+ countries worldwide, with approximately 30,000 employees in offices across >50 countries; we have 25% of the world’s peer-reviewed STM content (3 petabytes) and spend $1.4bn on technology annually.
6. Some Names You May Recognize
• Today Elsevier has more than
20,000 products for educational
and professional science and
healthcare communities
worldwide, including
− Cell Press
− ClinicalKey
− Embase
− Gold Standard Drug Database
− Gray’s Anatomy
− The Lancet
− Mendeley
− Pathway Studio/ResNet
− PharmaPendium
− QUOSA
− Reaxys
− ScienceDirect
− Scopus
For more, see https://www.elsevier.com/en-gb/solutions
7. What is Elsevier doing to
provide more FAIR data
products and services?
10. Elsevier in the FSPC
The FAIR Service Provider Consortium brings together more than 10 companies to develop the tools, skills, and capacity required to meet the growing demand for professional FAIR services.
• Build consulting capacity by training FAIR data stewards and ontologists
• (Co-)develop professional FAIR tooling
• Establish a FAIR Center of Competence
http://www.phortosconsultants.com/Consortium
11. What FSPC Is About
Partners commit to
• Adhere to the GO FAIR Rules of
Engagement
• Implement the FAIR Data principles via
services and technology solutions in
accordance with GO FAIR best practices
• Share experiences and approaches
regarding development of FAIR
competence
See go-fair.org for more information.
Consortium aims:
• Enable the development of professional FAIR
support capacity in terms of services and tooling
• Develop tooling preferably as a multi-tenant
cloud-based FAIR-as-a-Service (FaaS)
• Help guide the professionalization of tools and
services
• Stimulate the adoption of FAIR principles and
their implementation
• Co-develop market opportunities, including
licensing, to build or expand services portfolio
• Develop best practices for FAIR implementations
• Liaise with public domain parties with unique
FAIR expertise
• Collaborate on skill development, training,
positioning and communication
12. FAIR Implementation Project at Pistoia Alliance
• Pistoia Alliance recognizes that it’s a big commitment to follow the FAIR Guiding Principles
• Project will provide pre-competitive support for FAIR Implementation by the life sciences industry through the development of a FAIR Toolkit
See Wise et al., Implementation and relevance of FAIR data principles in Biopharmaceutical R&D
14. “A ‘Standard for FAIR Principles
Compliance’ is currently working its way
through the Elsevier Technology review
process.”
– Greg Dart,
Elsevier’s Lead Architect, Health
16. Introduction to Mendeley Data
• An open, modular, cloud-based research data management (RDM)
platform helping research institutions to manage the entire lifecycle of
research data
• Mission: facilitate data sharing
− the findings can be verified, reproduced, and cited correctly
− the data can be reused in new ways
− discovery of relevant research is facilitated
− funders get more value from their funding investment
• https://data.mendeley.com
18. Mendeley Data Benefits
To Researchers
• Discover relevant research data
• Comply with funders' mandates
• Prevent re-work
• Save time searching, collecting, and
sharing data
• Improve the impact of research and
increase data reuse
To Institutions
• Provide transparency into the
research lifecycle
• Help researchers save time,
increase collaboration, and manage
resources effectively
• Increase the exposure of research
and showcase research outputs
• Keep track of where data are stored
and shared both within and outside
an institution
19. How Mendeley Data Helps You Be FAIR
• Makes data findable
− Provides a place to put it
− Automatically and dynamically enriches metadata via “deep-data indexing”
• Helps make data comprehensible
− Facilitate structured annotation (perhaps via Hivebench), including provenance
• Establishes and maintains clear data ownership
− Control where data are stored and who has access
− Enable citations
• Enhances interoperability
− Modular platform connects to other RDM resources via open APIs
From W. Haak,
https://www.elsevier.com/connect/4-principles-for-unlocking-the-full-potential-of-research-data
22. About H-Graph
• Medical knowledge and metadata created by subject-
matter experts, extracted from the literature via NLP,
and stored as a graph
• Assembled for clinical product developers who need
trusted, comprehensive medical knowledge to deliver
advanced clinical decision-support applications for
healthcare professionals
• Thanks to Lena Deus for the following H-Graph slides.
23. H-Graph Today
1. It is a graph-based platform
2. Contains complex medical information
3. Delivers a structured version of medically-validated literature
4. Uses federation to query healthcare databases that span the patient journey to ensure its content is always up to date
5. Provides data scientists with a source of data to validate machine learning tools
400k concepts · 5M relationships · 75k diseases · 46k drugs · 63k procedures · 90k symptoms
1 million journals · 6,000 books · 100+ years of clinical knowledge
26. Key Benefits of Knowledge Graphs (KG)
• Everything has an identifier
− The identifier is really a URL, so you can paste it into a browser
• Everything is a triple
− asthma has drug albuterol .
− albuterol has cost $100 / inhaler .
• Modern KG technologies allow “quads”
− Ferri’s Clinical Advisory said: “Asthma” “has drug” “albuterol”
• Modern KG technologies allow inference
− IF shortness of breath same as wheezing AND asthma has finding wheezing THEN asthma has finding shortness of breath
• Modern KG technologies allow query federation
− One query system can recover and integrate data from many sources
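The triple, quad, and inference ideas on this slide can be illustrated with a few lines of plain Python (a toy, not a real RDF triple store; the facts are the slide's own examples and the rule is hand-written):

```python
# Facts as (subject, predicate, object) triples, taken from the slide.
triples = {
    ("asthma", "has drug", "albuterol"),
    ("albuterol", "has cost", "$100 / inhaler"),
    ("asthma", "has finding", "wheezing"),
    ("shortness of breath", "same as", "wheezing"),
}

# A "quad" additionally records which source asserted the triple.
quads = {("Ferri's Clinical Advisory", "asthma", "has drug", "albuterol")}

def infer(facts):
    """IF x same as y AND d has finding y THEN d has finding x."""
    new = set(facts)
    for x, p, y in facts:
        if p == "same as":
            for d, q, z in facts:
                if q == "has finding" and z == y:
                    new.add((d, "has finding", x))
    return new

closed = infer(triples)
print(("asthma", "has finding", "shortness of breath") in closed)  # → True
```

Real knowledge-graph stacks express such rules declaratively (e.g. in OWL or SPARQL) and apply them over identifiers that are dereferenceable URLs, but the mechanics are the same: pattern-match existing triples, add the entailed ones.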
28. Entellect™: Elsevier’s Life Sciences Knowledge Platform
• Collect & Curate: bring together disparate data for a clean, comprehensive knowledge base. Sources can include structured & unstructured data from databases, websites, LIMs, document archives, ELNs, and applications.
• Connect & Contextualize: build a rich knowledge graph of harmonized, linked data, using advanced science-led processing of content via proprietary text and data mining, taxonomies & ontologies.
• Compute & Custom Deliver: discover knowledge using semantic search, applied analytics, and ML/AI. Entellect provides flexible compute capabilities augmented by Elsevier Professional Services’ domain expertise.
Entellect™: your data’s value, fully realized.
29. Entellect iPaaS Concept
[Diagram: Entellect™ links compounds, drugs, targets, AEs, and diseases, supporting semantic search, applied analytics, and AI/ML. Example facts: C28H33N7O2 is a compound; Osimertinib is a drug; EGFR is a gene target; dry skin is an AE; adenocarcinoma is a sub-type of non-small cell lung cancer; C28H33N7O2 is a compound in the drug Osimertinib; Osimertinib inhibits EGFR; EGFR is a gene target for non-small cell lung cancer. Inferred: Osimertinib is a therapy for EGFR-mutated non-small cell lung cancer.]
Collect & Curate → Connect & Contextualize → Compute & Custom Deliver
30. Entellect Architecture
[Architecture diagram: data sources A, B, and C are fetched and extracted (via RML mappings, text mining, and NLP) into raw data streams; an entity reconciler applies taxonomies and mapping rules, and a data shaper with a proxy ontology produces data streams and linking streams; micro-service builders expose micro-services and aggregators for use case groups, with use-case-specific ontologies & reconciliation behind an API; downstream capabilities include data stream processing, applied analytics, ML/AI, and semantic search.]
Collect & Curate → Connect & Contextualize → Compute & Custom Deliver
31. Ex 1: Unstructured data pipeline enabling semantic search & discovery
Medical Information
1. Ensuring disparate drug information is easily discoverable to healthcare
practitioners.
2. Detecting and filtering data that fails to meet regulatory standards
The solution allows clinicians to quickly search by related
terms and disease areas from the latest approved medical
information (e.g. drug labels)
Outcome: Medical practitioners can prescribe medication to patients, knowing they are using the most current information without having to
consult multiple sources of out-of-date data both online and offline
Medical Information data challenge → Entellect™ powered solution
[Pipeline diagram: drug label and medical information documents flow through an unstructured document pipeline into a search API and web portal; usage logs feed analytics that drive the authoring of improved documents.]
32. Ex 2: Structured data pipeline enabling applied analytics
Optimizing chemical synthesis
Chemists performing retrosynthesis using conventional methods typically rely on
evaluating lists of reactions recorded by others and drawing on their own intuition
to work out a step-by-step method to creating a compound.
Entellect can apply novel algorithms to an integrated
knowledgebase of proprietary and published reaction data.
(Ex*: Improve the accuracy of computer-aided retrosynthesis).
Outcome: Researchers can now use novel algorithms to plan organic chemical synthesis more effectively
Chemistry data challenge → Entellect™ powered solution
[Pipeline diagram: (1) Elsevier data and (2) 3rd party reaction data feed (3) algorithm development & deployment, yielding answers in the form of synthetic routes.]
* Sources: Coley, Connor, et al. (2017). “Prediction of Organic Reaction Outcomes Using Machine Learning.” ACS; Segler, Marwin, Mike Preuss, and Mark Waller (2017). “Planning chemical synthesis with deep neural networks and symbolic AI.” Nature.
33. Ex 3: Structured data pipeline enabling analytics for drug repurposing
In spite of available data on approved drugs, identifying opportunities for drug
repurposing remains challenging due to the siloed, heterogeneous nature of
the requisite data.
Entellect can bring together, clean, harmonize, and enrich disparate data and make it usable for advanced analytics. This opens up a wide range of opportunities for interrogation (statistical techniques, machine learning, and AI).
Outcome:
• In a recent Datathon Entellect-processed data enabled a community
of data scientists to perform analytics on disparate content (from
Pathway Studio, Reaxys Medicinal Chemistry, PharmaPendium and
OpenTargets).
• Participants applied a drug target interaction prediction model
(binding affinity between a target and all possible drugs for repurposing).
ML enabled the analyses to be performed over a large search space.
• Within 30-60 days of starting the datathon, drug candidates with
promising repurposing opportunities were identified (for chronic
pancreatitis).
34. Findable
F1: (Meta)data are assigned a globally unique and persistent identifier
• We use IRIs throughout for data sets, data items (facts), and schema elements
F2: Data are described with rich metadata
• We use the RDF data model for capturing metadata, data, and schema
• We capture provenance for both source and data transformation processes
F3: Metadata clearly and explicitly include the identifier of the data they describe
• This is sanctioned by our internal dataset metadata standards that associate all datasets with an RDF
file with metadata.
F4: (Meta)data are registered or indexed in a searchable resource
• Data sets must be registered in our data catalog; metadata is then automatically gleaned from the
RDF metadata associated with the file.
35. Accessible
A1: (Meta)data are retrievable by their identifier using a
standardized communications protocol
• All IRIs are HTTPS IRIs that are dereferenceable through our Linked
Data endpoint, which uses state of the art
authentication/authorization mechanisms.
A2: Metadata are accessible, even when the data are no
longer available
• The data catalog and data items are managed separately to ensure
metadata longevity.
36. Interoperable
I1. (Meta)data use a formal, accessible, shared, and broadly applicable language for
knowledge representation
• We use OWL ontologies to describe the data in Entellect.
• We use RDF and RDF-sanctioned serializations throughout.
I2. (Meta)data use vocabularies that follow FAIR principles
• All Entellect specific vocabularies (ontologies) are part of the larger ecosystem, and thus follow the same FAIR
principles as the data themselves.
• We use several well-known community-defined vocabularies that to a large extent follow the FAIR principles.
Where they don’t, we host them as such in our own space.
I3. (Meta)data include qualified references to other (meta)data
• We preserve and maintain this information as it's collected from sources.
• Entellect data are a part of a larger ecosystem of Life Sciences data where multiple pre-existing data sets and
coding & identification mechanisms currently create a lot of value for our customers. We reuse and build on
these to create a larger interconnected knowledge graph.
37. Reusable
R1: (Meta)data are richly described with a plurality of accurate and
relevant attributes
• R1.1: (Meta)data are released with a clear and accessible data usage license
• Entellect uses a provenance-based entitlements mechanism which allows us to
propagate licenses through the provenance trail and detect potential conflicts.
Usage licenses are part of our company-wide metadata standards.
• R1.2: (Meta)data are associated with detailed provenance
• We track provenance at the source and process level; guided especially by the
need to capture license information from sources and components, and by
requirements related to entitlements.
• R1.3: (Meta)data meet domain-relevant community standards
• We use a two-step modeling approach, where source data are captured 1)
according to a canonical representation of the source, and 2) aligned with both
internal standards and schemas, as well as external ones.
38. Summary
• Elsevier is committed to supporting external FAIR Data efforts and initiatives
• We are committed to working toward compliance with FAIR Principles with our own data
• We are developing FAIR-compliant data and analytics products, including an advanced iPaaS called Entellect, that can help our customers be FAIR
39. Thank you
Ian Harrow & the Pistoia Alliance
Wouter Haak Lena Deus
Albert Mons Greg Dart
Jack Leon Rinke Hoekstra
Lee Hollister Jabe Wilson
Tim Miller