FAIR webinar, Ted Slater: Progress towards commercial FAIR data products and services, 19 Sep 2019


Elsevier is a global information analytics business that helps institutions and professionals advance healthcare and open science to improve performance for the benefit of humanity.

In this webinar, we discuss how Elsevier is increasingly applying the FAIR Guiding Principles to improve its products and services and better serve the scientific community.

Published in: Data & Analytics

  1. 1. Playing FAIR at Elsevier: Progress Towards Commercial FAIR Data Products and Services • Ted Slater, Sr. Director Product Management PaaS, Elsevier • 19 September 2019
  2. 2. Summary • About Elsevier • Elsevier’s commitment to FAIR Data • External efforts • Internal efforts • Wrap up & questions
  3. 3. About Elsevier • Elsevier is a global information analytics company that helps institutions and professionals progress science, advance healthcare and improve performance for the benefit of humanity. • Founded in 1880. • The logo represents the symbiotic relationship between publisher and scholar. Non solus means “not alone.” • Empowering Knowledge™
  4. 4. RELX actively harnesses & invests in disruptive big data & analytics • REV Venture Partners, RELX Group’s venture arm, has invested £150M in promising big data & analytics companies, including Palantir • RELX Group’s High Performance Computing Cluster (HPCC) analyzes structured and unstructured data across all market segments • To develop expertise in Artificial Intelligence, LexisNexis has invested $1.2 MM in technology to streamline development and improve performance for customers. RELX operates in 4 major market segments: Scientific, Technical & Medical; Risk & Business Analytics; Legal; Exhibitions. Where RELX is going: • Deliver improved outcomes to customers • Combine content & data with analytics & technology in global platforms • Build leading positions in long-term global growth markets • Leverage institutional skills, assets and resources across RELX. How RELX is getting there: • Organic development: investment in transforming core business; build-out of new products • Portfolio reshaping. Elsevier is part of RELX, a global provider of information-based analytics and decision tools for professional and business customers.
  5. 5. Scientific information and analytics are core RELX group capabilities • Source strong data • Develop deep understanding of customer needs • Build the right infrastructure • Apply the right analytics • Continuous refinement. We harness deep customer understanding to create innovative solutions which combine content and data with analytics and technology. • We serve customers in 180+ countries worldwide • with approximately 30,000 employees and offices across >50 countries • We have 25% of the world’s peer-reviewed STM content (3 petabytes) • and spend $1.4bn on technology annually
  6. 6. Some Names You May Recognize • Today Elsevier has more than 20,000 products for educational and professional science and healthcare communities worldwide, including − Cell Press − ClinicalKey − Embase − Gold Standard Drug Database − Gray’s Anatomy − The Lancet − Mendeley − Pathway Studio/ResNet − PharmaPendium − QUOSA − Reaxys − ScienceDirect − Scopus. For more, see
  7. 7. What is Elsevier doing to provide more FAIR data products and services?
  8. 8. External Efforts
  9. 9. External FAIR Advocacy - Bio-IT World
  10. 10. Elsevier in the FSPC • The FAIR Service Provider Consortium comprises >10 companies built to develop the tools, skills, and capacity required to meet the growing demand for professional FAIR services. • Build consulting capacity by training FAIR data stewards and ontologists • (Co-)develop professional FAIR tooling • Establish a FAIR Center of Competence
  11. 11. What FSPC Is About. Partners commit to: • Adhere to the GO FAIR Rules of Engagement • Implement the FAIR Data principles via services and technology solutions in accordance with GO FAIR best practices • Share experiences and approaches regarding development of FAIR competence. See for more information. Consortium aims: • Enable the development of professional FAIR support capacity in terms of services and tooling • Develop tooling preferably as a multi-tenant cloud-based FAIR-as-a-Service (FaaS) • Help guide the professionalization of tools and services • Stimulate the adoption of FAIR principles and their implementation • Co-develop market opportunities, including licensing, to build or expand services portfolio • Develop best practices for FAIR implementations • Liaise with public domain parties with unique FAIR expertise • Collaborate on skill development, training, positioning and communication
  12. 12. FAIR Implementation Project at Pistoia Alliance • Pistoia Alliance recognizes that it’s a big commitment to follow the FAIR Guiding Principles • Project will provide pre-competitive support for FAIR Implementation by the life sciences industry through the development of a FAIR Toolkit. See Wise et al., Implementation and relevance of FAIR data principles in biopharmaceutical R&D
  13. 13. Internal Efforts
  14. 14. “A ‘Standard for FAIR Principles Compliance’ is currently working its way through the Elsevier Technology review process.” – Greg Dart, Elsevier’s Lead Architect, Health
  15. 15. Mendeley Data: Share Your Data With Your Research. Thanks to Wouter Haak
  16. 16. Introduction to Mendeley Data • An open, modular, cloud-based research data management (RDM) platform helping research institutions to manage the entire lifecycle of research data • Mission: facilitate data sharing so that − the findings can be verified, reproduced, and cited correctly − the data can be reused in new ways − discovery of relevant research is facilitated − funders get more value from their funding investment
  17. 17. 17
  18. 18. Mendeley Data Benefits. To Researchers: • Discover relevant research data • Comply with funders' mandates • Prevent re-work • Save time searching, collecting, and sharing data • Improve the impact of research and increase data reuse. To Institutions: • Provide transparency into the research lifecycle • Help researchers save time, increase collaboration, and manage resources effectively • Increase the exposure of research and showcase research outputs • Keep track of where data are stored and shared both within and outside an institution
  19. 19. How Mendeley Data Helps You Be FAIR • Makes data findable − Provides a place to put it − Automatically and dynamically enriches metadata via “deep-data indexing” • Helps make data comprehensible − Facilitates structured annotation (perhaps via Hivebench), including provenance • Establishes and maintains clear data ownership − Controls where data are stored and who has access − Enables citations • Enhances interoperability − Modular platform connects to other RDM resources via open APIs. From W. Haak,
  20. 20. Interoperability with Other RDMs
  21. 21. H-Graph: Curated Medical Knowledge Graph. Thanks to Helena Deus
  22. 22. About H-Graph • Medical knowledge and metadata created by subject-matter experts, extracted from the literature via NLP, and stored as a graph • Assembled for clinical product developers who need trusted, comprehensive medical knowledge to deliver advanced clinical decision-support applications for healthcare professionals • Thanks to Lena Deus for the following H-Graph slides.
  23. 23. H-Graph Today: 1. It is a graph-based platform 2. Contains complex medical information 3. Delivers a structured version of medically-validated literature 4. Uses federation to query healthcare databases that span the patient journey to ensure its content is always up to date 5. Provides data scientists with a source of data to validate machine learning tools • 400k concepts • 5M relationships • 75k diseases • 46k drugs • 63k procedures • 90k symptoms • 1 million journals • 6000 books • 100+ years of clinical knowledge
  24. 24. Creating a Web of Medical Knowledge through Linked Data
  25. 25. Enabling Potential Without Creating Friction. [Diagram; labels: H-Graph Core; LDAPI; Videos and Documents; Patient Education; Gold Standard Drug Database; Care Plans; Order Sets; Clinical Trial Data; MACRO; Radiology Images; StatDX; Pathology Images; ExpertPath; Books and Journals; ScienceDirect; Clinical Guidelines; External Vocabularies: ICD-10, RxNorm, LOINC, SNOMED, MeSH]
  26. 26. Key Benefits of Knowledge Graphs (KG) • Everything has an identifier − The identifier is really a URL, so you can paste it into a browser • Everything is a triple − asthma has drug albuterol . − albuterol has cost $100 / inhaler . • Modern KG technologies allow “quads” − Ferri’s Clinical Advisory said: “Asthma” “has drug” “albuterol” • Modern KG technologies allow inference − IF shortness of breath same as wheezing AND asthma has finding wheezing THEN asthma has finding shortness of breath • Modern KG technologies allow query federation − One query system can recover and integrate data from many sources
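
The triple, quad, and inference ideas on the slide above can be sketched in a few lines of plain Python. This is an illustration only, not H-Graph's implementation: the `https://example.org/kg/` namespace, the `Quad` type, the `iri` helper, and the `infer_findings` function are all hypothetical names, and the rule is applied only in the direction the slide states it.

```python
from typing import NamedTuple, Optional


class Quad(NamedTuple):
    """A triple plus an optional fourth element naming the source (a 'quad')."""
    subject: str
    predicate: str
    obj: str
    graph: Optional[str] = None  # None means a plain triple with no stated source


BASE = "https://example.org/kg/"  # hypothetical namespace, not a real endpoint


def iri(name: str) -> str:
    """Turn a human-readable name into a URL-shaped identifier."""
    return BASE + name.replace(" ", "_")


facts = {
    # Quad: the source is recorded as the fourth element.
    Quad(iri("asthma"), iri("has drug"), iri("albuterol"), iri("source/ferri")),
    # Plain triples.
    Quad(iri("asthma"), iri("has finding"), iri("wheezing")),
    Quad(iri("shortness of breath"), iri("same as"), iri("wheezing")),
}


def infer_findings(known: set) -> set:
    """IF x same-as y AND s has-finding y THEN s has-finding x."""
    same_as, has_finding = iri("same as"), iri("has finding")
    inferred = set()
    for eq in known:
        if eq.predicate != same_as:
            continue
        for fact in known:
            if fact.predicate == has_finding and fact.obj == eq.obj:
                inferred.add(Quad(fact.subject, has_finding, eq.subject))
    return inferred - known  # report only facts that are actually new


for new_fact in infer_findings(facts):
    print(new_fact.subject, new_fact.predicate, new_fact.obj)
```

A production knowledge graph would normally delegate this to an RDF store and a reasoner; the loop above only shows what a single rule does to a small set of facts.
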
  27. 27. Entellect: Elsevier’s New, FAIR iPaaS for the Life Sciences. Thanks to Tim Miller and Lee Hollister
  28. 28. Entellect™: Elsevier’s Life Sciences Knowledge Platform. Your data’s value, fully realized. • Collect & Curate: Bring together disparate data for a clean, comprehensive knowledge base. Sources can include: structured & unstructured data from databases, websites, LIMS, document archives, ELNs, applications • Connect & Contextualize: Build a rich knowledge graph of harmonized, linked data. We use advanced science-led processing of content via proprietary text and data mining, taxonomies & ontologies • Compute & Custom Deliver: Discover knowledge using semantic search, applied analytics, and ML/AI. Entellect provides flexible compute capabilities augmented by Elsevier Professional Services’ domain expertise
  29. 29. Entellect iPaaS Concept • Collect & Curate: compounds, drugs, targets, AEs, diseases (C28H33N7O2 is a compound; Osimertinib is a drug; EGFR is a gene target; dry skin; Adenocarcinoma is a sub-type of non-small cell lung cancer) • Connect & Contextualize: C28H33N7O2 is a compound in the drug Osimertinib; Osimertinib inhibits EGFR; EGFR is a gene target for non-small cell lung cancer; Inferred: Osimertinib is a therapy for EGFR-mutated non-small cell lung cancer • Compute & Custom Deliver: Semantic search; Applied analytics; AI/ML
  30. 30. Entellect Architecture. [Diagram; labels: Data source A; Data source B; Data source C; Fetcher; Extractor; Raw Data Streams; RML Mappings / Text Mining / NLP; Entity reconciler; Taxonomies; Mapping rules; Linking Streams; Data shaper; Data Streams; Proxy; Ontology; Knowledge Streams; Micro-service builders; Micro-Services; Aggregators; Use case groups; Use case specific ontologies & reconciliation; API; Data stream processing; Applied analytics; ML/AI; Semantic search. Stages: Collect & Curate; Connect & Contextualize; Compute & Custom Deliver]
  31. 31. Ex 1: Unstructured data pipeline enabling semantic search & discovery (Medical Information). Medical Information data challenge: 1. Ensuring disparate drug information is easily discoverable to healthcare practitioners. 2. Detecting and filtering data that fails to meet regulatory standards. Entellect™ powered solution: The solution allows clinicians to quickly search by related terms and disease areas from the latest approved medical information (e.g. drug labels). Outcome: Medical practitioners can prescribe medication to patients, knowing they are using the most current information without having to consult multiple sources of out-of-date data both online and offline. [Diagram; labels: Drug Labels; Medical Information Documents; Unstructured Document Pipeline; Search API; Web Portal; Logs; Usage analytics; Author improved documents; Outcome]
  32. 32. Ex 2: Structured data pipeline enabling applied analytics (Optimizing chemical synthesis). Chemistry data challenge: Chemists performing retrosynthesis using conventional methods typically rely on evaluating lists of reactions recorded by others and drawing on their own intuition to work out a step-by-step method to create a compound. Entellect™ powered solution: Entellect can apply novel algorithms to an integrated knowledgebase of proprietary and published reaction data. (Ex*: Improve the accuracy of computer-aided retrosynthesis.) Outcome: Researchers can now use novel algorithms to plan organic chemical synthesis more effectively. [Diagram; labels: Elsevier data; 3rd party data; Reaction Data; Algorithm Development & Deployment; Answers; Synthetic Routes] * Sources: Coley, Connor et al. (2017). “Prediction of Organic Reaction Outcomes Using Machine Learning.” ACS.; Marwin Segler, Mike Preuss, Mark Waller (2018). “Planning chemical syntheses with deep neural networks and symbolic AI,” Nature.
  33. 33. Ex 3: Structured data pipeline enabling analytics for drug repurposing. Drug repurposing data challenge: In spite of available data on approved drugs, identifying opportunities for drug repurposing remains challenging due to the siloed, heterogeneous nature of the requisite data. Entellect™ powered solution: Entellect can bring together, clean, harmonize, and enrich disparate data and make them usable for advanced analytics. This opens up a wide range of opportunities for interrogation (statistical techniques, machine learning, and AI). Outcome: • In a recent datathon, Entellect-processed data enabled a community of data scientists to perform analytics on disparate content (from Pathway Studio, Reaxys Medicinal Chemistry, PharmaPendium and OpenTargets). • Participants applied a drug target interaction prediction model (binding affinity between a target and all possible drugs for repurposing). ML enabled the analyses to be performed over a large search space. • Within 30-60 days of starting the datathon, drug candidates with promising repurposing opportunities were identified (for chronic pancreatitis).
  34. 34. Findable. F1: (Meta)data are assigned a globally unique and persistent identifier • We use IRIs throughout for data sets, data items (facts), and schema elements. F2: Data are described with rich metadata • We use the RDF data model for capturing metadata, data, and schema • We capture provenance for both source and data transformation processes. F3: Metadata clearly and explicitly include the identifier of the data they describe • This is sanctioned by our internal dataset metadata standards that associate all datasets with an RDF file with metadata. F4: (Meta)data are registered or indexed in a searchable resource • Data sets must be registered in our data catalog; metadata is then automatically gleaned from the RDF metadata associated with the file.
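
As a rough illustration of F1, F3, and F4 on the slide above, the sketch below keeps a metadata record that carries its own identifier plus the identifier of the dataset it describes, and registers it in a searchable catalog. It is not Entellect's data catalog: `DatasetMetadata`, `Catalog`, the field names, and the `example.org` IRIs are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetMetadata:
    metadata_iri: str          # F1: identifier of the metadata record itself
    describes: str             # F3: identifier of the dataset being described
    title: str
    keywords: tuple = ()       # F2: descriptive metadata used for search
    derived_from: tuple = ()   # F2: provenance, as IRIs of source datasets


class Catalog:
    """F4: a minimal in-memory registry that can be searched by keyword."""

    def __init__(self) -> None:
        self._records = {}

    def register(self, record: DatasetMetadata) -> None:
        # Assumed policy for this sketch: dataset identifiers are HTTPS IRIs.
        if not record.describes.startswith("https://"):
            raise ValueError("dataset identifier must be an HTTPS IRI")
        self._records[record.describes] = record

    def search(self, term: str) -> list:
        needle = term.lower()
        return [
            r for r in self._records.values()
            if needle in r.title.lower()
            or any(needle in k.lower() for k in r.keywords)
        ]


catalog = Catalog()
catalog.register(DatasetMetadata(
    metadata_iri="https://example.org/catalog/ds1",
    describes="https://example.org/data/ds1",
    title="Lung cancer gene targets",
    keywords=("EGFR", "oncology"),
))
print([r.describes for r in catalog.search("egfr")])
```

Keying the registry on the dataset IRI rather than on the title is the design point: the identifier, not the description, is what other records link to.
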
  35. 35. Accessible. A1: (Meta)data are retrievable by their identifier using a standardized communications protocol • All IRIs are HTTPS IRIs that are dereferenceable through our Linked Data endpoint, which uses state-of-the-art authentication/authorization mechanisms. A2: Metadata are accessible, even when the data are no longer available • The data catalog and data items are managed separately to ensure metadata longevity.
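
The two Accessible points above can be sketched as follows. The first function only builds (and does not send) an HTTPS request that asks for an RDF serialization of an identifier; the second shows metadata outliving the data by keeping them in separate stores. The bearer-token header, the `text/turtle` media type, the `example.org` IRIs, and all function and variable names are assumptions for illustration; the slides do not say which authentication scheme or formats the endpoint uses.

```python
import urllib.request


def build_dereference_request(iri: str, token: str,
                              media_type: str = "text/turtle") -> urllib.request.Request:
    """A1: prepare a GET for the identifier itself, asking for an RDF format."""
    if not iri.startswith("https://"):
        raise ValueError("expected an HTTPS IRI")
    return urllib.request.Request(
        iri,
        headers={
            "Accept": media_type,                 # content negotiation
            "Authorization": f"Bearer {token}",   # assumed auth scheme
        },
        method="GET",
    )


# A2: metadata and data live in separate stores (both hypothetical).
METADATA_STORE = {
    "https://example.org/data/ds1": {"title": "Lung cancer gene targets"},
    "https://example.org/data/ds2": {"title": "Withdrawn reaction set"},
}
DATA_STORE = {
    "https://example.org/data/ds1": b"...payload...",
    # ds2 has been removed, but its metadata record above remains.
}


def lookup(iri: str) -> dict:
    metadata = METADATA_STORE.get(iri)
    if metadata is None:
        return {"status": "unknown identifier"}
    if iri not in DATA_STORE:
        return {"status": "data no longer available", "metadata": metadata}
    return {"status": "available", "metadata": metadata, "data": DATA_STORE[iri]}


request = build_dereference_request("https://example.org/data/ds1", token="demo")
print(request.get_method(), request.full_url, request.get_header("Accept"))
print(lookup("https://example.org/data/ds2")["status"])
```
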
  36. 36. Interoperable. I1: (Meta)data use a formal, accessible, shared, and broadly applicable language for knowledge representation • We use OWL ontologies to describe the data in Entellect. • We use RDF and RDF-sanctioned serializations throughout. I2: (Meta)data use vocabularies that follow FAIR principles • All Entellect-specific vocabularies (ontologies) are part of the larger ecosystem, and thus follow the same FAIR principles as the data themselves. • We use several well-known community-defined vocabularies that to a large extent follow the FAIR principles. Where they don’t, we host them as such in our own space. I3: (Meta)data include qualified references to other (meta)data • We preserve and maintain this information as it's collected from sources. • Entellect data are a part of a larger ecosystem of Life Sciences data where multiple pre-existing data sets and coding & identification mechanisms currently create a lot of value for our customers. We reuse and build on these to create a larger interconnected knowledge graph.
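
To make "RDF serializations" concrete, here is a small sketch that writes triples in N-Triples, one of the standard RDF formats. It is a simplification rather than a full serializer: it treats any object string that starts with `http://` or `https://` as an IRI and everything else as a plain string literal, and the function names and `example.org` IRIs are hypothetical. In practice an RDF library would handle datatypes, language tags, and blank nodes.

```python
def escape_literal(text: str) -> str:
    """Escape the characters that may not appear raw in an N-Triples literal."""
    return (
        text.replace("\\", "\\\\")
            .replace('"', '\\"')
            .replace("\n", "\\n")
            .replace("\r", "\\r")
    )


def to_ntriples(triples) -> str:
    """Serialize (subject, predicate, object) string tuples, one per line."""
    lines = []
    for subject, predicate, obj in triples:
        if obj.startswith(("http://", "https://")):
            rendered = f"<{obj}>"                    # object is an IRI
        else:
            rendered = f'"{escape_literal(obj)}"'    # object is a string literal
        lines.append(f"<{subject}> <{predicate}> {rendered} .")
    return "\n".join(lines)


print(to_ntriples([
    ("https://example.org/kg/osimertinib",
     "https://example.org/kg/inhibits",
     "https://example.org/kg/EGFR"),
    ("https://example.org/kg/osimertinib",
     "http://www.w3.org/2000/01/rdf-schema#label",
     "Osimertinib"),
]))
```
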
  37. 37. Reusable. R1: (Meta)data are richly described with a plurality of accurate and relevant attributes • R1.1: (Meta)data are released with a clear and accessible data usage license • Entellect uses a provenance-based entitlements mechanism which allows us to propagate licenses through the provenance trail and detect potential conflicts. Usage licenses are part of our company-wide metadata standards. • R1.2: (Meta)data are associated with detailed provenance • We track provenance at the source and process level, guided especially by the need to capture license information from sources and components, and by requirements related to entitlements. • R1.3: (Meta)data meet domain-relevant community standards • We use a two-step modeling approach, where source data are captured 1) according to a canonical representation of the source, and 2) aligned with both internal standards and schemas, as well as external ones.
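
One way to picture "propagating licenses through the provenance trail" is a walk over a derived-from graph that collects every license an item inherits and flags pairs declared incompatible. The sketch below is only that picture, under stated assumptions: the item names, the placeholder license labels, the incompatibility table, and both functions are hypothetical and do not describe Entellect's actual entitlements mechanism.

```python
from itertools import combinations

# Provenance trail: each item maps to the items it was directly derived from.
DERIVED_FROM = {
    "fact:1": ["step:merge"],
    "step:merge": ["source:A", "source:B"],
    "fact:2": ["source:A"],
}

# Licenses attached at the source level (placeholder labels).
LICENSE = {"source:A": "license-X", "source:B": "license-Y"}

# Pairs of licenses assumed, for this example, not to be combinable.
INCOMPATIBLE = {frozenset({"license-X", "license-Y"})}


def inherited_licenses(item: str, _seen=None) -> set:
    """Collect the licenses of an item and of everything it derives from."""
    seen = set() if _seen is None else _seen
    if item in seen:          # guard against cycles in the trail
        return set()
    seen.add(item)
    found = {LICENSE[item]} if item in LICENSE else set()
    for parent in DERIVED_FROM.get(item, []):
        found |= inherited_licenses(parent, seen)
    return found


def license_conflicts(item: str) -> list:
    """Return the incompatible license pairs that an item inherits."""
    licenses = sorted(inherited_licenses(item))
    return [pair for pair in combinations(licenses, 2)
            if frozenset(pair) in INCOMPATIBLE]


print(license_conflicts("fact:1"))
print(license_conflicts("fact:2"))
```

Because licenses are looked up by walking provenance rather than stored on each derived fact, a change to a source's license is picked up by every downstream item the next time it is checked.
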
  38. 38. Summary • Elsevier is committed to supporting external FAIR Data efforts and initiatives • We are committed to working toward compliance with FAIR Principles with our own data • We are developing FAIR-compliant data and analytics products, including an advanced iPaaS called Entellect, that can help our customers be FAIR
  39. 39. Thank you: Ian Harrow & the Pistoia Alliance • Wouter Haak • Lena Deus • Albert Mons • Greg Dart • Jack Leon • Rinke Hoekstra • Lee Hollister • Jabe Wilson • Tim Miller