Elsevier is a global information analytics business that helps institutions and professionals advance healthcare and open science to improve performance for the benefit of humanity.
In this webinar, we discuss how Elsevier is increasingly leveraging the FAIR Guiding Principles to improve its products and services to better serve the scientific community.
PA webinar on benefits & costs of FAIR implementation in life sciences (Pistoia Alliance)
Slides from the Pistoia Alliance Debates Webinar, in which a panel of experts from technology providers and the biopharma industry were invited to share their views on the "Benefits and costs of FAIR implementation for the life science industry".
Knowledge graphs - Ilaria Maresi, The Hyve, 23 Apr 2020 (Pistoia Alliance)
Data for drug discovery and healthcare is often trapped in silos which hampers effective interpretation and reuse. To remedy this, such data needs to be linked both internally and to external sources to make a FAIR data landscape which can power semantic models and knowledge graphs.
Application of recently developed FAIR metrics to the ELIXIR Core Data Resources (Pistoia Alliance)
The FAIR (Findable, Accessible, Interoperable and Reusable) principles aim to maximize the discovery and reuse of digital resources. Using recently developed software and metrics to assess FAIRness and supported through an ELIXIR Implementation Study, Michel worked with a subset of ELIXIR Core Data Resources to apply these technologies. In this webinar, he will discuss their approach, findings, and lessons learned towards the understanding and promotion of the FAIR principles.
Open interoperability standards, tools and services at EMBL-EBI (Pistoia Alliance)
In this webinar Dr Henriette Harmse from EMBL-EBI presents how they are using their ontology services at EMBL-EBI to scale up the annotation of data and deliver added value through ontologies and semantics to their users.
FAIRification experience: clarifying the semantics of data matrices (Pistoia Alliance)
This webinar presents the Statistics Ontology (STATO), a semantic framework to support the creation of standardized analysis reports that help with the review of results in the form of data matrices. STATO includes a hierarchy of classes and a vocabulary for annotating statistical methods used in life, natural and biomedical science investigations, text mining and statistical analyses.
This presentation reviewed the challenges in identifying, acquiring and utilizing research data in relation to an evolving data market. Strategic solutions were examined in which the FAIR principles play a key role in the future of data management.
AI-SDV 2021: Jay ven Eman - implementation-of-new-technology-within-a-big-pha... (Dr. Haxel Consult)
Synonymy breaks search! How? Why is this important? What synonymy is and how it breaks search will be explained with real-world examples. AI-based solutions are proposed, and relevant standards are identified. How synonym solutions should be used for search is explained. Learn what you can do yourself. Tools help, but it doesn't have to be complicated or expensive. It is as straightforward as setting priorities!
OntoChem IT Solutions GmbH was founded in 2015 as a purely IT-oriented offshoot of OntoChem GmbH. Even before then we had many years of experience, and it has always been our mission to provide added value to our customers by helping them navigate today's complex information world: developing cognitive computing solutions, indexing intranet and internet data, and applying semantic search solutions for pharmaceutical, materials science and technology-driven businesses.
We strive to support our customers with the most useful tools for knowledge discovery possible, encompassing up-to-date data sources, optimized ontologies and high-throughput semantic document processing and annotation techniques.
We create new knowledge from structured and unstructured data by extracting relationships, thereby exploiting the full potential of full-text documents and databases while also scanning social media and news flows and analyzing web pages.
We aim at an unprecedented machine understanding of text and subsequent knowledge extraction and inference. Applying our methods to chemical compounds and their properties supports our customers in generating intellectual property and in using those compounds as novel therapeutics, agrochemical products, nutraceuticals, cosmetics and novel materials.
It's our mission to provide added value to customers by:
developing and applying cognitive computing solutions
creating intranet and internet data indexing and semantic search solutions
providing Big Data analytics for technology-driven businesses
supporting product development and surveillance.
We deliver useful tools for knowledge discovery for:
creating background knowledge ontologies
high-throughput semantic document processing and annotation
knowledge mining by extracting relationships
exploiting the full potential of full-text documents & databases while also scanning social media and news flows and analyzing web pages.
Bio Data World - The promise of FAIR data lakes - The Hyve - 20191204 (Kees van Bochove)
At the Bio Data World conference in Basel in December 2019, Kees van Bochove, Founder of The Hyve, gave a talk on the re-use of pharma R&D data and the strategies that could be used to operationalize FAIR data at scale.
"Big data" is a broad term that encompasses a wide range of data and content. Big data offers new approaches to analysis and decision making. At first glance, big data and IP may seem to be opposites, but they have more in common than one might think. This talk focuses on how big data will impact, and be impacted by, IP. One of the biggest promises of big data is the possibility of re-using data produced by different sources, creating new services or predicting the future through the analysis of correlations. In this context, how can companies protect information assets and analytical skills? What new skills are required to search and analyze large numbers of datasets in real time? Big data will change not only patent information but will also generate new types of patents.
SciBite is an award-winning, leading provider of semantic solutions for the life sciences industry. Our fast, scalable, easy-to-use semantic technologies understand the complexity and variability of content within the life sciences. We can quickly identify and extract scientific terminology from unstructured text and transform it into valuable machine-readable data for your downstream applications. Our hand-curated ontologies ensure the accuracy and reliability of high-quality results. Headquartered in the UK, we support our customers from additional sites in the US and Japan.
More info at: www.scibite.com
Biomax provides computational solutions for better decision making and knowledge management in the life science industry. Biomax helps customers generate value from proprietary and public resources by extracting the knowledge indispensable for efficient data exploration and interpretation. The company focuses on integrating information to enable a knowledge-based approach to developing innovative life science products, supporting its customers with a platform that combines software products with knowledge resources in areas including oncology, nutrigenomics, plant research and functional genomics. With the launch of the NeuroXM Brain Science Suite in 2018, Biomax added products tailored to the field of connectome research. The new semantic search platform AILANI provides a corporate-wide knowledge repository accessible to everyone, any time and from anywhere. Biomax's worldwide customer community includes companies and research organizations that are successful in the areas of drug discovery, diagnostics, fine chemicals, food and plant production.
FAIR Data Knowledge Graphs - from Theory to Practice (Tom Plasterer)
FAIR data has flown up the hype curve without a clear sense of return on the required data stewardship investment. The killer use case for FAIR data is a science knowledge graph, which lets you richly address novel questions across your own and the world's data. We started with data catalogues (findability) that exploited linked and referenced data using a few focused vocabularies (interoperability), for credentialed users (accessibility), with provenance and attribution (reusability). Our processes enable simple creation of dataset records and linking to source data, providing a seamless federated knowledge graph for novice and advanced users alike.
Presented May 7th, 2019 at the Knowledge Graph Conference, Columbia University.
IC-SDV 2018: Aleksandar Kapisoda (Boehringer) Using Machine Learning for Auto... (Dr. Haxel Consult)
Focusing on the significance of targets is one of the key drivers for quality of web search.
Filtering targeted companies based on the significance of their business model for the expected search results was one of our “nice to haves” last year.
Evaluating a number of artificial intelligence approaches based on neural networks, classical machine learning and semantic technologies led us to a working hybrid approach.
Kairntech combines technologies from natural language processing (NLP) and machine learning to support clients in analysing large amounts of text-based information.
You can find more information at https://kairntech.com/
Managing sensitive data at the University of Bristol (Jisc RDM)
Presentation on managing sensitive data at the University of Bristol by Kellie Snow, Research Data Librarian, delivered at the Research Data Network event, May 2016, Cardiff University.
Access the webinar: http://goo.gl/p08pTz
These slides were presented in a webinar by Denodo in collaboration with BioStorage Technologies and Indiana Clinical and Translational Sciences Institute and Regenstrief Institute.
BioStorage Technologies, Inc., Indiana Clinical and Translational Sciences Institute (CTSI), and Regenstrief Institute have joined Denodo to talk about the important role of technological advancements, such as data virtualization, in advancing biospecimen research.
By watching this webinar, you can gain insight into best practices around the integration of biospecimen and research data as well as technology solutions that provide consolidated views and rapid conversions of this data into valuable business insights. You will also learn how data virtualization can assist with the integration of data residing in heterogeneous repositories and can securely deliver aggregated data in real-time.
Themes and objectives:
To position FAIR as a key enabler to automate and accelerate R&D process workflows
FAIR Implementation within the context of a use case
Grounded in precise outcomes (e.g. faster and bigger science / more reuse of data to enhance value / increased ability to share data for collaboration and partnership)
To make data actionable through FAIR interoperability
Speakers:
Mathew Woodwark, Head of Data Infrastructure and Tools, Data Science & AI, AstraZeneca
Erik Schultes, International Science Coordinator, GO-FAIR
Georges Heiter, Founder & CEO, Databiology
Panel Discussion: Integrated Scientific Discovery (HPCC Systems)
From the 2017 HPCC Systems Community Day:
Ann Gabriel will lead a panel discussion on our academic program collaboration across RELX and how we are bringing together disparate data sources for new knowledge creation.
We are pleased to have panelists from:
Amy Apon, Professor and Chair, Division of Computer Science, School of Computing, Clemson University, SC
J. Christoph Freytag, Professor, Humboldt University Berlin, Germany
Tim Menzies, Professor, Computer Science, North Carolina State University, Raleigh, NC
Jon Preston, Interim Dean of CCSE, Kennesaw State University, Kennesaw, GA
HETT Conference Olympic Central 2014: Integrating Healthcare Delivery (Elmar Flamme)
Integrating Healthcare Delivery through the Innovative Use of Information & Technology - a user story from behind the CONTENT-covered mountains and the deep BIG DATA forest
Drive Compliance and Profit with Oracle Healthcare Analytics (Perficient, Inc.)
Learn how Oracle's Enterprise Health Analytics (EHA), coupled with Oracle Business Intelligence, speeds the delivery of clinical event reporting by leveraging data integrations to Cerner and the EHA Healthcare Data model.
How EHA integrates EMR and other operational data to provide actionable information with integrity and precision to ready you for the ACO market
How EHA integrates clinical, financial, administrative and research data to speed the time from data input to robust retrospective and predictive analytics
Examples of how Oracle EHA can unlock your EMR data for hospital-acquired conditions and prevention, ad-hoc and standard reporting and other valuable metadata
Data warehouse solutions including strategic roadmaps for Meaningful Use, Population Health Management and Accountable Care
Open Insights Harvard DBMI - Personal Health Train - Kees van Bochove - The Hyve (Kees van Bochove)
In this talk, the Personal Health Train concept is introduced, which enables running personalized medicine workflows as trains visiting data stations (e.g. hospital records, primary care records, clinical studies and registries, and patient-held data from wearable sensors). The Personal Health Train is a very powerful concept, which is, however, dependent on source medical data being coded with appropriate metadata on consent, license, scope etc., and on the data itself being encoded using biomedical data standards, an ever-growing field in biomedical informatics. To realize the Personal Health Train, biomedical data will need to be FAIR, i.e. adopt the FAIR Guiding Principles. The talk also covers the emerging international GO-FAIR movement and provides examples of how several European health data networks are adopting open-standards-based stacks to make routine healthcare data accessible for research.
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl... (Jack DiGiovanna)
Making data and analytics FAIR has transformative potential within organizations to build on existing knowledge. FAIR resources also democratize access to information and tools in underserved communities. Global standards and analysis platforms provide strong foundational elements. However, FAIRness across time and different sectors of the biomedical workforce presents challenges. Here we summarize how platforms make data and analysis FAIR today and what we see as key areas of future focus.
BioIT 2024 invited talk.
Turning FAIR into Reality - Role for Libraries (dri_ireland)
Presentation by Dr. Natalie Harrower, Director Digital Repository of Ireland and European Commission FAIR data expert group member, on what role librarians can play in the FAIR ecosystem. "Applying the FAIR data principles in day-to-day library practice" session by the Research Data Management Working Group, LIBER Steering Committee Research Infrastructures, LIBER2019, Dublin, 26 June 2019
Overview of FAIR and the IMI FAIRplus project at the UK Conference of Bioinformatics and Computational Biology 2020: https://www.earlham.ac.uk/uk-conference-bioinformatics-and-computational-biology-2020
dkNET Webinar: Creating and Sustaining a FAIR Biomedical Data Ecosystem 10/09... (dkNET)
Abstract
In this presentation, Susan Gregurick, Ph.D., Associate Director of Data Science and Director, Office of Data Science Strategy at the National Institutes of Health, will share the NIH's vision for a modernized, integrated FAIR biomedical data ecosystem and the strategic roadmap that NIH is following to achieve this vision. Dr. Gregurick will highlight projects being implemented by team members across the NIH's 27 institutes and centers and will discuss ways that industry, academia, and other communities can help NIH enable a FAIR data ecosystem. Finally, she will weave in how this strategy is being leveraged to address the COVID-19 pandemic.
Presenter: Susan Gregurick, Ph.D., Associate Director of Data Science and Director, Office of Data Science Strategy at the National Institutes of Health
dkNET Webinar Information: https://dknet.org/about/webinar
A hybrid approach to data management is emerging in healthcare as organizations recognize the value of an enterprise data warehouse in combination with a data lake.
In this SlideShare, we discuss data lakes in healthcare and we:
Provide an overview of a Hadoop-based data lake architecture and integration platform, and its application in machine learning, predictive modeling, and data discovery
Discuss several key use cases driving the adoption of data lakes for both providers and health plans
Discuss available data storage forms and the required tools for a data lake environment
Detail best practices for conducting data lake assessments and review key implementation considerations for healthcare
The Data Operating System: Changing the Digital Trajectory of Healthcare (Health Catalyst)
In 1989, John Reed, the CEO of Citibank and an early pioneer of ATMs, said, "I can see a future in which the data and information that is exchanged in our transactions are worth more than the transactions themselves." We are at an interesting digital nexus in healthcare. Few of us would argue against the notion that data and digital health will play a bigger and bigger role in the future. But are we on the right track to deliver on that future? It required $30B in federal incentive money to subsidize the uptake of Electronic Health Records (EHRs). You could argue that the federal incentives stimulated the first major step towards the digitization of health, but few physicians would celebrate its value in comparison to its expense. As the healthcare market consolidates through mergers and acquisitions (M&A), patching disparate EHRs and other information systems together becomes even more important, and more challenging. An organization is not integrated until its data is integrated, but a costly forklift replacement of these transaction systems to consolidate them into a single EHR solution is not financially viable.
Turning FAIR into Reality: Briefing on the EC's report on FAIR data (dri_ireland)
DRI Director Natalie Harrower, a member of the European Commission's Expert Group on FAIR (Findable, Accessible, Interoperable and Re-usable) data, delivered a lunchtime briefing on the recently published 'Turning FAIR into Reality' report on Tuesday 26 February in the Royal Irish Academy, Dublin.
In 2016 the FAIR Data Principles were developed to support the position that effective research data management is ‘not a goal in itself but rather is the key conduit leading to knowledge discovery and innovation’. The new publication is both a report and an action plan for turning FAIR into reality. It offers a survey and analysis of what is needed to implement FAIR and it provides a set of concrete recommendations and actions for stakeholders in Europe and beyond.
The briefing provided an overview of the contents of the report, which include the principles of FAIR, as well as the elements required to implement FAIR data.
The Data Operating System: Changing the Digital Trajectory of Healthcare (Dale Sanders)
This is the next evolution in health information exchanges and data warehouses, specifically designed to support analytics, transaction processing, and third party application development, in one platform, the Data Operating System.
Healthcare is undergoing a fundamental transformation, driven by advancing innovations and demand for a 360-degree view of patient care. Whether providers, payers, or pharmaceutical companies, organizations across the industry face an inundation of data, often in new and varied formats.
Innovative applications of microphysiological systems (MPS) have been growing over the past decade, especially with respect to the use of complex human tissues for assessing the safety of drug candidates – but broad industry adoption of MPS methods has not yet become a reality.
This webinar addresses some recent advances in MPS development and begins to explore the barriers to increased incorporation of MPS to improve drug safety assessment and to provide safer, more effective drugs into the clinical pipeline.
Federated Learning (FL) is a learning paradigm that enables collaborative learning without centralizing datasets. In this webinar, NVIDIA presents the concept of FL and discusses how it can help overcome some of the barriers seen in the development of AI-based solutions for pharma, genomics and healthcare. Following the presentation, the panel debates other elements that could drive the adoption of digital approaches more widely and help answer currently intractable science and business questions.
AI is becoming a buzzword, much like design thinking: everyone is talking about AI or wants to have AI and sees all the ideas and benefits – that's fine, but how do you get started? And what's different now? Three innovations have finally put AI on the fast track: Big Data, with the internet and sensors everywhere; massive computing power, especially through the cloud; and the development of breakthrough algorithms, so computers can be trained to accomplish more sophisticated tasks on their own with deep learning. If you use new technology, you need to explore and know what's possible. Design thinking helps outline the steps and define the ways in which you are going to create the solution, starting with mapping the customer journey and defining who will be using the service enhanced with intelligent technology, or who will benefit and gain value from it. We discuss how these two worlds are coming together and how you can get started transforming your venture with Artificial Intelligence using Design Thinking.
Speaker: Claudio Mirti, Principal Solution Specialist – Data & AI, Microsoft
2020.04.07 Automated molecular design and the Bradshaw platform webinar - Pistoia Alliance
This presentation described how data-driven chemoinformatics methods may automate much of what has historically been done by a medicinal chemist. It explored what is reasonable to expect “AI” approaches might achieve, and what is best left with a human expert. The implications of automation for the human-machine interface were explored and illustrated with examples from Bradshaw, GSK’s experimental automated design environment.
Dr. Dennis Wang discusses possible ways to make ML methods more powerful for discovery and to reduce ambiguity within translational medicine, allowing data-informed decision-making to deliver the next generation of diagnostics and therapeutics to patients faster, at lower cost, and at scale.
The talk by Dr. Dennis Wang was followed by a panel discussion with Mr. Albert Wang, M. Eng., Head, IT Business Partner, Translational Research & Technologies, Bristol-Myers Squibb.
With the explosion of interest in both enhanced knowledge management and open science, the past few years have seen considerable discussion about making scientific data “FAIR” — findable, accessible, interoperable, and reusable. The problem is that most scientific datasets are not FAIR. When left to their own devices, scientists do a poor job of creating the metadata that describe the experimental datasets that make their way into online repositories. The lack of standardization makes it extremely difficult for other investigators to locate relevant datasets, to re-analyse them, and to integrate them with other data. The Center for Expanded Data Annotation and Retrieval (CEDAR) has the goal of enhancing the authoring of experimental metadata to make online datasets more useful to the scientific community. The CEDAR workbench for metadata management will be presented in this webinar. CEDAR illustrates the importance of semantic technology in driving open science. It also demonstrates a means for simplifying access to scientific datasets and enhancing the reuse of the data to drive new discoveries.
Implementing Blockchain applications in healthcare - Pistoia Alliance
Blockchain technology can revolutionise the way information is exchanged between parties by bringing an unprecedented level of security and trust to these transactions. The technology is finding its way into multiple use cases but we are yet to see full adoption and real-world business implementation in the Healthcare industry.
In this webinar we will explore the main challenges and considerations for the implementation of Blockchain technology in Healthcare use cases. This is the third webinar in our Blockchain Education series.
Building trust and accountability - the role User Experience design can play ... - Pistoia Alliance
In this webinar our panel of UX specialists give a brief introduction to User Experience before presenting the design opportunities UX can bring to AI. We all know that AI has great potential, but it has some significant hurdles to overcome, not least the human aspects of trust and the ethical considerations of designing in the life sciences.
In the late Fall and Winter of 2018, the Pistoia Alliance, in cooperation with Elsevier and the charitable organizations Cures within Reach and Mission: Cure, ran a datathon aiming to find drugs suitable for treatment of childhood chronic pancreatitis, a rare disease that causes extreme suffering. The datathon resulted in the identification of four candidate compounds in a short time frame of just under three months. In this webinar our speakers discuss the technologies that made this leap possible.
Creating novel drugs is an extraordinarily hard and complex problem.
One of the many challenges in drug design is the sheer size of the search space for novel chemical compounds. Scientists need to find molecules that are active toward a biological target or pathway and at the same time have acceptable ADMET properties.
There is now considerable research going on using various AI and ML approaches to tackle these challenges.
Our distinguished speakers, Drs. Alex Tropsha and Ola Engkvist, will discuss their recent work in Drug Design involving Deep Reinforcement Learning and Neural Networks, and will answer questions from the audience on the current state of the research in the field.
Speakers:
Prof Alex Tropsha, Professor at University of North Carolina at Chapel Hill, USA
Dr. Ola Engkvist, Associate Director at AstraZeneca R&D, Gothenburg, Sweden
The slides from the continuing part of the Pistoia Alliance's drive to improve education and communication around new technologies for life science professionals. This webinar explored how blockchain/DLT and IoT could come together to add even more trust to the GxP domain. If you want to know more about how these new technologies could help enhance GxP compliance, then this webinar will give you much food for thought.
This talk presents an overview of the philosophy and ongoing work of the PhUSE project “Clinical Trials Results as Resource Description Framework.” The team is converting data from the CDISC Study Data Tabulation Model (SDTM) to graph data using an ontology-based approach. The wider implications of this work will be discussed, along with deployment strategies within and beyond the industry.
Pistoia Alliance harmonizing FAIR data catalog approaches webinar - Pistoia Alliance
Multiple groups in the life sciences community have started their journey towards data FAIR-ification by implementing Data Catalogs, a clear first step towards Finding your data. While in many cases the approaches are quite similar in both origin and intent, differing implementations could end up hampering interoperability and reuse. The Pistoia Alliance and the Linked Data Community of Practice hosted a panel discussion looking at three implementations and their downstream goals:
[1] Pharma cross-omics data catalogs,
[2] Clinical data catalogs
[3] Bioschemas for dataset discoverability on the inter/intranet
Joint Pistoia Alliance & PRISME AI in pharma webinar 18 Oct 2018 - Pistoia Alliance
To advance Machine Learning-driven analytic approaches, access to more data is better. To build increasingly large patient-level datasets, researchers require the pooling of data from participants across the Healthcare ecosystem.
Common requirements and technical design patterns have emerged from company-specific and industry consortia efforts, forming underlying patterns that make up an overall Reference Architecture for data that can ultimately feed new analytics and Machine Learning.
Pistoia Alliance datathon for drug repurposing for rare diseases - Pistoia Alliance
As part of the Pistoia Alliance Centre of Excellence for AI in Life Sciences, we are running a datathon.
The Rare Disease Drug Repurposing Datathon is your chance to advance knowledge on rare diseases and illustrate best practices in data science. Are you ready to help make a difference — and to showcase your organization’s data science work and skills?
As a run-up to the Pistoia Alliance Blockchain Bootcamp in October, we are pleased to bring you an in-depth introduction to blockchain technology. Hear Wolfgang Prinz and Wolfgang Gräther from Fraunhofer FIT provide an explanation of the underlying technology, the current uses of blockchain in other industries, and when blockchain is appropriate for various use cases, followed by a demo.
Pistoia Alliance Webinar Demystifying AI: Centre of Excellence for AI Webina... - Pistoia Alliance
Pistoia Alliance launched its Centre of Excellence for Artificial Intelligence (AI) in Life Sciences where we hope to bring together best practice, adoption strategy and hackathons covering a range of challenges.
Over the coming months we will be hosting a series of topics and speakers giving their perspectives on the role of Artificial & Augmented Intelligence in Life Sciences and Healthcare.
The topics will cover some of the current challenges, user stories, and value in using AI in life sciences. If you want to get involved in this series as a speaker or suggest topics, please get in touch.
Webinar 1 focused on the following:
A Brief History
Big Data/ML/DL/AI - fundamentals and concepts
Data Fidelity importance
Some best practices
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2... - pchutichetpong
M Capital Group (“MCG”) expects demand to grow and supply to evolve, driven by institutional investment rotating out of offices and into work from home (“WFH”), alongside the ever-expanding need for data storage as global internet usage grows, with experts predicting 5.3 billion users by 2023. These market factors will be underpinned by technological changes, such as advancing cloud services and edge sites, allowing the industry to expect strong annual growth of 13% over the next 4 years.
Whilst competitive headwinds remain, represented through the recent second bankruptcy filing of Sungard, which blames “COVID-19 and other macroeconomic trends including delayed customer spending decisions, insourcing and reductions in IT spending, energy inflation and reduction in demand for certain services”, the industry has seen key adjustments, where MCG believes that engineering cost management and technological innovation will be paramount to success.
MCG reports that more favorable market conditions expected over the next few years, helped by the winding down of pandemic restrictions and a hybrid working environment, will drive market momentum forward. The continuous injection of capital by alternative investment firms, as well as the growing infrastructural investment from cloud service providers and social media companies, whose revenues are expected to grow more than 3.6x by value by 2026, will likely help propel data center provision and innovation. These factors paint a promising picture for the industry players that offset rising input costs and adapt to new technologies.
According to M Capital Group: “Specifically, the long-term cost-saving opportunities available from the rise of remote managing will likely aid value growth for the industry. Through margin optimization and further availability of capital for reinvestment, strong players will maintain their competitive foothold, while weaker players exit the market to balance supply and demand.”
Explore our comprehensive data analysis project presentation on predicting product ad campaign performance. Learn how data-driven insights can optimize your marketing strategies and enhance campaign effectiveness. Perfect for professionals and students looking to understand the power of data analysis in advertising. for more details visit: https://bostoninstituteofanalytics.org/data-science-and-artificial-intelligence/
Adjusting primitives for graph: SHORT REPORT / NOTES - Subhajit Sahu
Graph algorithms like PageRank commonly operate on the Compressed Sparse Row (CSR) format, an adjacency-list based graph representation that is compact and efficient to traverse.
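The CSR layout mentioned above can be sketched as follows (illustrative code, not the report's implementation): all destination vertices live in one flat array, and an offsets array gives each vertex's slice of out-neighbours.

```python
# Build a Compressed Sparse Row (CSR) representation of a directed graph.
# edges[offsets[u]:offsets[u+1]] holds the out-neighbours of vertex u.

def to_csr(num_vertices, edge_list):
    """Convert an edge list [(u, v), ...] into (offsets, edges)."""
    degree = [0] * num_vertices
    for u, _ in edge_list:
        degree[u] += 1
    offsets = [0] * (num_vertices + 1)
    for u in range(num_vertices):
        offsets[u + 1] = offsets[u] + degree[u]
    edges = [0] * len(edge_list)
    cursor = offsets[:-1].copy()          # next free slot per vertex
    for u, v in edge_list:
        edges[cursor[u]] = v
        cursor[u] += 1
    return offsets, edges

def neighbours(offsets, edges, u):
    return edges[offsets[u]:offsets[u + 1]]

offsets, edges = to_csr(4, [(0, 1), (0, 2), (1, 2), (2, 0), (3, 2)])
print(neighbours(offsets, edges, 0))   # → [1, 2]
```

Two flat arrays replace per-vertex adjacency lists, which improves cache locality and makes the structure easy to copy to a GPU.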
Multiply with different modes (map)
1. Performance of sequential execution based vs OpenMP based vector multiply.
2. Comparing various launch configs for CUDA based vector multiply.
Sum with different storage types (reduce)
1. Performance of vector element sum using float vs bfloat16 as the storage type.
Sum with different modes (reduce)
1. Performance of sequential execution based vs OpenMP based vector element sum.
2. Performance of memcpy vs in-place based CUDA based vector element sum.
3. Comparing various launch configs for CUDA based vector element sum (memcpy).
4. Comparing various launch configs for CUDA based vector element sum (in-place).
Sum with in-place strategies of CUDA mode (reduce)
1. Comparing various launch configs for CUDA based vector element sum (in-place).
Techniques to optimize the PageRank algorithm usually fall into two categories: reducing the work per iteration, and reducing the number of iterations. These goals are often at odds with one another. Skipping computation on vertices which have already converged has the potential to save iteration time. Skipping in-identical vertices, which share the same in-links, helps avoid duplicate computations and thus could reduce iteration time. Road networks often have chains which can be short-circuited before PageRank computation to improve performance, since the final ranks of chain nodes can be easily calculated; this could reduce both the iteration time and the number of iterations. If a graph has no dangling nodes, the PageRank of each strongly connected component can be computed in topological order. This could help reduce the iteration time and the number of iterations, and also enable multi-iteration concurrency in PageRank computation. The combination of all of the above methods is the STICD algorithm. [sticd] For dynamic graphs, unchanged components whose ranks are unaffected can be skipped altogether.
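One of the optimizations above, skipping vertices whose rank has already converged, can be sketched in a few lines (a toy illustration, not the STICD implementation; the graph, damping factor, and tolerance are illustrative defaults):

```python
# Minimal pull-based PageRank that stops updating vertices once their
# rank change falls below the tolerance, saving per-iteration work.

def pagerank(graph, d=0.85, tol=1e-10, max_iter=100):
    """graph: {vertex: [out-neighbours]}; returns {vertex: rank}."""
    n = len(graph)
    in_nbrs = {v: [] for v in graph}      # pull-style: who links to me?
    for u, outs in graph.items():
        for v in outs:
            in_nbrs[v].append(u)
    out_deg = {u: len(outs) for u, outs in graph.items()}
    rank = {v: 1.0 / n for v in graph}
    converged = set()
    for _ in range(max_iter):
        new_rank = {}
        for v in graph:
            if v in converged:            # skip already-converged vertices
                new_rank[v] = rank[v]
                continue
            s = sum(rank[u] / out_deg[u] for u in in_nbrs[v] if out_deg[u])
            new_rank[v] = (1 - d) / n + d * s
            if abs(new_rank[v] - rank[v]) < tol:
                converged.add(v)
        rank = new_rank
        if len(converged) == n:
            break
    return rank

ranks = pagerank({0: [1], 1: [2], 2: [0]})   # 3-cycle: ranks are all 1/3
```

Marking a vertex converged freezes its rank for later iterations; production implementations re-check frozen vertices when their in-neighbours still move, which this sketch omits.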
Opendatabay - Open Data Marketplace.pptx - Opendatabay
Opendatabay.com unlocks the power of data for everyone. Open Data Marketplace fosters a collaborative hub for data enthusiasts to explore, share, and contribute to a vast collection of datasets.
First ever open hub for data enthusiasts to collaborate and innovate. A platform to explore, share, and contribute to a vast collection of datasets. Through robust quality control and innovative technologies like blockchain verification, opendatabay ensures the authenticity and reliability of datasets, empowering users to make data-driven decisions with confidence. Leverage cutting-edge AI technologies to enhance the data exploration, analysis, and discovery experience.
From intelligent search and recommendations to automated data productisation and quotation, Opendatabay's AI-driven features streamline the data workflow. Finding the data you need shouldn't be complex. Opendatabay simplifies the data acquisition process with an intuitive interface and robust search tools. Effortlessly explore, discover, and access the data you need, allowing you to focus on extracting valuable insights. Opendatabay breaks new ground with dedicated, AI-generated synthetic datasets.
Leverage these privacy-preserving datasets for training and testing AI models without compromising sensitive information. Opendatabay prioritizes transparency by providing detailed metadata, provenance information, and usage guidelines for each dataset, ensuring users have a comprehensive understanding of the data they're working with. By leveraging a powerful combination of distributed ledger technology and rigorous third-party audits, Opendatabay ensures the authenticity and reliability of every dataset. Security is at the core of Opendatabay: the marketplace implements stringent security measures, including encryption, access controls, and regular vulnerability assessments, to safeguard your data and protect your privacy.
Levelwise PageRank with Loop-Based Dead End Handling Strategy: SHORT REPORT ... - Subhajit Sahu
Abstract: Levelwise PageRank is an alternative method of PageRank computation which decomposes the input graph into a directed acyclic block-graph of strongly connected components, and processes them in topological order, one level at a time. This enables calculation of ranks in a distributed fashion without per-iteration communication, unlike the standard method where all vertices are processed in each iteration. It comes, however, with a precondition of the absence of dead ends in the input graph. Here, the native non-distributed performance of Levelwise PageRank was compared against Monolithic PageRank on a CPU as well as a GPU. To ensure a fair comparison, Monolithic PageRank was also performed on a graph where vertices were split by components. Results indicate that Levelwise PageRank is about as fast as Monolithic PageRank on the CPU, but quite a bit slower on the GPU. The slowdown on the GPU is likely caused by the submission of a large number of small workloads, and is expected to be a non-issue when the computation is performed on massive graphs.
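The levelwise scheme described in the abstract can be sketched as follows. This is an illustrative reconstruction, not the report's code: it assumes the SCC decomposition into topological levels has already been computed (the `levels` argument) and that the graph has no dead ends.

```python
# Levelwise PageRank sketch: ranks are computed one topological level of
# strongly connected components at a time; earlier levels are already
# final, so each level only iterates over its own vertices.

def levelwise_pagerank(graph, levels, d=0.85, tol=1e-10, max_iter=100):
    """graph: {v: [out-neighbours]}; levels: list of vertex sets in
    topological order of the SCC condensation (no dead ends assumed)."""
    n = len(graph)
    in_nbrs = {v: [] for v in graph}
    for u, outs in graph.items():
        for v in outs:
            in_nbrs[v].append(u)
    out_deg = {u: len(outs) for u, outs in graph.items()}
    rank = {v: 1.0 / n for v in graph}
    for level in levels:                  # earlier levels are already final
        for _ in range(max_iter):
            delta = 0.0
            for v in level:               # only iterate within this level
                s = sum(rank[u] / out_deg[u] for u in in_nbrs[v])
                new = (1 - d) / n + d * s
                delta = max(delta, abs(new - rank[v]))
                rank[v] = new
            if delta < tol:
                break
    return rank

# Two SCCs in topological order: {0, 1} (a 2-cycle) feeding {2} (self-loop).
ranks = levelwise_pagerank({0: [1, 2], 1: [0, 2], 2: [2]}, [{0, 1}, {2}])
```

Because a level only reads ranks from itself and from earlier (already converged) levels, each level can in principle be computed on a different machine with no per-iteration communication.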
FAIR webinar, Ted Slater: Progress towards commercial FAIR data products and services, 19 Sep 2019
1. 19 September 2019
Ted Slater, Sr. Director Product Management PaaS, Elsevier
t.slater@elsevier.com
Progress Towards
Commercial FAIR Data
Products and Services
Playing FAIR at Elsevier
2. Summary
• About Elsevier
• Elsevier’s commitment to FAIR Data
• External efforts
• Internal efforts
• Wrap up & questions
3. About Elsevier
• Elsevier is a global information
analytics company that helps
institutions and professionals
progress science, advance
healthcare and improve performance
for the benefit of humanity.
• Founded in 1880.
• The logo represents the symbiotic
relationship between publisher and
scholar. Non solus means “not
alone.”
• Empowering Knowledge™
4. RELX actively harnesses & invests in disruptive big data & analytics
• REV Venture Partners, RELX Group’s venture arm, has invested £150M in promising big data & analytics companies, including Palantir
• RELX Group’s High Performance Computing Cluster (HPCC) analyzes structured and unstructured data across all market segments
• To develop expertise in Artificial Intelligence, LexisNexis has invested $1.2MM in technology to streamline development and improve performance for customers
RELX operates in 4 major market segments:
• Scientific, Technical & Medical
• Risk & Business Analytics
• Legal
• Exhibitions
Where RELX is going:
• Deliver improved outcomes to customers
• Combine content & data with analytics & technology in global platforms
• Build leading positions in long-term global growth markets
How RELX is getting there:
• Leverage institutional skills, assets and resources across RELX
• Organic development: investment in transforming core business; build-out of new products
• Portfolio reshaping
Elsevier is part of RELX, a global provider of information-based analytics and decision tools for professional and business customers.
5. Scientific information and analytics are core RELX group capabilities
• Source strong data
• Develop deep understanding of customer needs
• Build the right infrastructure
• Apply the right analytics
• Continuous refinement
We harness deep customer understanding to create innovative solutions which combine content and data with analytics and technology.
…we serve customers in 180+ countries worldwide, with approximately 30,000 employees in offices across >50 countries; we have 25% of the world’s peer-reviewed STM content (3 petabytes) and spend $1.4bn on technology annually.
6. Some Names You May Recognize
• Today Elsevier has more than
20,000 products for educational
and professional science and
healthcare communities
worldwide, including
− Cell Press
− ClinicalKey
− Embase
− Gold Standard Drug Database
− Gray’s Anatomy
− The Lancet
− Mendeley
− Pathway Studio/ResNet
− PharmaPendium
− QUOSA
− Reaxys
− ScienceDirect
− Scopus
For more, see https://www.elsevier.com/en-gb/solutions
7. What is Elsevier doing to
provide more FAIR data
products and services?
10. Elsevier in the FSPC
The FAIR Service Provider Consortium brings together more than 10 companies to develop the tools, skills, and capacity required to meet the growing demand for professional FAIR services.
• Build consulting capacity by training FAIR data stewards and ontologists
• (Co-)develop professional FAIR tooling
• Establish a FAIR Center of Competence
http://www.phortosconsultants.com/Consortium
11. What FSPC Is About
Partners commit to
• Adhere to the GO FAIR Rules of
Engagement
• Implement the FAIR Data principles via
services and technology solutions in
accordance with GO FAIR best practices
• Share experiences and approaches
regarding development of FAIR
competence
See go-fair.org for more information.
Consortium aims:
• Enable the development of professional FAIR
support capacity in terms of services and tooling
• Develop tooling preferably as a multi-tenant
cloud-based FAIR-as-a-Service (FaaS)
• Help guide the professionalization of tools and
services
• Stimulate the adoption of FAIR principles and
their implementation
• Co-develop market opportunities, including
licensing, to build or expand services portfolio
• Develop best practices for FAIR implementations
• Liaise with public domain parties with unique
FAIR expertise
• Collaborate on skill development, training,
positioning and communication
12. FAIR Implementation Project at Pistoia Alliance
• Pistoia Alliance recognizes that it’s a big commitment to follow the FAIR Guiding Principles
• Project will provide pre-competitive support for FAIR Implementation by the life sciences industry through the development of a FAIR Toolkit
See Wise et al., Implementation and relevance of FAIR data principles in Biopharmaceutical R&D
14. “A ‘Standard for FAIR Principles
Compliance’ is currently working its way
through the Elsevier Technology review
process.”
– Greg Dart,
Elsevier’s Lead Architect, Health
16. Introduction to Mendeley Data
• An open, modular, cloud-based research data management (RDM)
platform helping research institutions to manage the entire lifecycle of
research data
• Mission: facilitate data sharing
− the findings can be verified, reproduced, and cited correctly
− the data can be reused in new ways
− discovery of relevant research is facilitated
− funders get more value from their funding investment
• https://data.mendeley.com
18. Mendeley Data Benefits
To Researchers
• Discover relevant research data
• Comply with funders' mandates
• Prevent re-work
• Save time searching, collecting, and
sharing data
• Improve the impact of research and
increase data reuse
To Institutions
• Provide transparency into the
research lifecycle
• Help researchers save time,
increase collaboration, and manage
resources effectively
• Increase the exposure of research
and showcase research outputs
• Keep track of where data are stored
and shared both within and outside
an institution
19. How Mendeley Data Helps You Be FAIR
• Makes data findable
− Provides a place to put it
− Automatically and dynamically enriches metadata via “deep-data indexing”
• Helps make data comprehensible
− Facilitate structured annotation (perhaps via Hivebench), including provenance
• Establishes and maintains clear data ownership
− Control where data are stored and who has access
− Enable citations
• Enhances interoperability
− Modular platform connects to other RDM resources via open APIs
From W. Haak,
https://www.elsevier.com/connect/4-principles-for-unlocking-the-full-potential-of-research-data
22. About H-Graph
• Medical knowledge and metadata created by subject-
matter experts, extracted from the literature via NLP,
and stored as a graph
• Assembled for clinical product developers who need
trusted, comprehensive medical knowledge to deliver
advanced clinical decision-support applications for
healthcare professionals
• Thanks to Lena Deus for the following H-Graph slides.
23. H-Graph Today
1. It is a graph-based platform
2. Contains complex medical information
3. Delivers a structured version of medically-validated literature
4. Uses federation to query healthcare databases that span the patient journey to ensure its content is always up to date
5. Provides data scientists with a source of data to validate machine learning tools
400k concepts · 5M relationships · 75k diseases · 46k drugs · 63k procedures · 90k symptoms
1 million journals · 6,000 books · 100+ years of clinical knowledge
26. Key Benefits of Knowledge Graphs (KG)
• Everything has an identifier
− The identifier is really a URL, so you can paste it into a browser
• Everything is a triple
− asthma has drug albuterol .
− albuterol has cost $100 / inhaler .
• Modern KG technologies allow “quads”
− Ferri’s Clinical Advisory said: “Asthma” “has drug” “albuterol”
• Modern KG technologies allow inference
− IF shortness of breath same as wheezing AND asthma has finding wheezing THEN asthma has finding shortness of breath
• Modern KG technologies allow query federation
− One query system can recover and integrate data from many sources
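The triple, quad, and inference ideas on this slide can be illustrated with a few lines of plain Python (a toy, not a real RDF triple store; the facts are the slide's own examples and the rule is hand-written):

```python
# Facts as (subject, predicate, object) triples, taken from the slide.
triples = {
    ("asthma", "has drug", "albuterol"),
    ("albuterol", "has cost", "$100 / inhaler"),
    ("asthma", "has finding", "wheezing"),
    ("shortness of breath", "same as", "wheezing"),
}

# A "quad" additionally records which source asserted the triple.
quads = {("Ferri's Clinical Advisory", "asthma", "has drug", "albuterol")}

def infer(facts):
    """IF x same as y AND d has finding y THEN d has finding x."""
    new = set(facts)
    for x, p, y in facts:
        if p == "same as":
            for d, q, z in facts:
                if q == "has finding" and z == y:
                    new.add((d, "has finding", x))
    return new

closed = infer(triples)
print(("asthma", "has finding", "shortness of breath") in closed)  # → True
```

Real knowledge-graph stacks express such rules declaratively (e.g. in OWL or SPARQL) and apply them over identifiers that are dereferenceable URLs, but the mechanics are the same: pattern-match existing triples, add the entailed ones.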
28. Entellect™: Elsevier’s Life Sciences Knowledge Platform
• Collect & Curate: bring together disparate data for a clean, comprehensive knowledge base. Sources can include structured & unstructured data from databases, websites, LIMs, document archives, ELNs, and applications.
• Connect & Contextualize: build a rich knowledge graph of harmonized, linked data, using advanced science-led processing of content via proprietary text and data mining, taxonomies & ontologies.
• Compute & Custom Deliver: discover knowledge using semantic search, applied analytics, and ML/AI. Entellect provides flexible compute capabilities augmented by Elsevier Professional Services’ domain expertise.
Entellect™: your data’s value, fully realized.
29. Entellect iPaaS Concept
[Diagram: Entellect™ links compounds, drugs, targets, AEs, and diseases, supporting semantic search, applied analytics, and AI/ML. Example facts: C28H33N7O2 is a compound; Osimertinib is a drug; EGFR is a gene target; dry skin is an AE; adenocarcinoma is a sub-type of non-small cell lung cancer; C28H33N7O2 is a compound in the drug Osimertinib; Osimertinib inhibits EGFR; EGFR is a gene target for non-small cell lung cancer. Inferred: Osimertinib is a therapy for EGFR-mutated non-small cell lung cancer.]
Collect & Curate → Connect & Contextualize → Compute & Custom Deliver
30. Entellect Architecture
[Architecture diagram: data sources A, B, and C are fetched and extracted (via RML mappings, text mining, and NLP) into raw data streams; an entity reconciler applies taxonomies and mapping rules, and a data shaper with a proxy ontology produces data streams and linking streams; micro-service builders expose micro-services and aggregators for use case groups, with use-case-specific ontologies & reconciliation behind an API; downstream capabilities include data stream processing, applied analytics, ML/AI, and semantic search.]
Collect & Curate → Connect & Contextualize → Compute & Custom Deliver
31. Ex 1: Unstructured data pipeline enabling semantic search & discovery
Medical Information
1. Ensuring disparate drug information is easily discoverable to healthcare
practitioners.
2. Detecting and filtering data that fails to meet regulatory standards
The solution allows clinicians to quickly search by related
terms and disease areas from the latest approved medical
information (e.g. drug labels)
Outcome: Medical practitioners can prescribe medication to patients, knowing they are using the most current information without having to
consult multiple sources of out-of-date data both online and offline
Medical Information data challenge → Entellect™ powered solution
[Pipeline diagram: drug label and medical information documents flow through an unstructured document pipeline into a search API and web portal; usage logs feed analytics that drive the authoring of improved documents.]
32. Ex 2: Structured data pipeline enabling applied analytics
Optimizing chemical synthesis
Chemists performing retrosynthesis using conventional methods typically rely on
evaluating lists of reactions recorded by others and drawing on their own intuition
to work out a step-by-step method to creating a compound.
Entellect can apply novel algorithms to an integrated
knowledgebase of proprietary and published reaction data.
(Ex*: Improve the accuracy of computer-aided retrosynthesis).
Outcome: Researchers can now use novel algorithms to plan organic chemical synthesis more effectively
Chemistry data challenge → Entellect™ powered solution
[Pipeline diagram: (1) Elsevier data and (2) 3rd party reaction data feed (3) algorithm development & deployment, yielding answers in the form of synthetic routes.]
* Sources: Coley, Connor, et al. (2017). “Prediction of Organic Reaction Outcomes Using Machine Learning.” ACS; Segler, Marwin, Mike Preuss, and Mark Waller (2017). “Planning chemical synthesis with deep neural networks and symbolic AI.” Nature.
33. Ex 3: Structured data pipeline enabling analytics for drug repurposing
In spite of available data on approved drugs, identifying opportunities for drug
repurposing remains challenging due to the siloed, heterogeneous nature of
the requisite data.
Entellect can bring together, clean, harmonize, and enrich disparate data and make it usable for advanced analytics. This opens up a wide range of opportunities for interrogation (statistical techniques, machine learning, and AI).
Outcome:
• In a recent Datathon Entellect-processed data enabled a community
of data scientists to perform analytics on disparate content (from
Pathway Studio, Reaxys Medicinal Chemistry, PharmaPendium and
OpenTargets).
• Participants applied a drug target interaction prediction model
(binding affinity between a target and all possible drugs for repurposing).
ML enabled the analyses to be performed over a large search space.
• Within 30-60 days of starting the datathon, drug candidates with
promising repurposing opportunities were identified (for chronic
pancreatitis).
34. Findable
F1: (Meta)data are assigned a globally unique and persistent identifier
• We use IRIs throughout for data sets, data items (facts), and schema elements
F2: Data are described with rich metadata
• We use the RDF data model for capturing metadata, data, and schema
• We capture provenance for both source and data transformation processes
F3: Metadata clearly and explicitly include the identifier of the data they describe
• This is sanctioned by our internal dataset metadata standards that associate all datasets with an RDF
file with metadata.
F4: (Meta)data are registered or indexed in a searchable resource
• Data sets must be registered in our data catalog; metadata is then automatically gleaned from the
RDF metadata associated with the file.
35. Accessible
A1: (Meta)data are retrievable by their identifier using a
standardized communications protocol
• All IRIs are HTTPS IRIs that are dereferenceable through our Linked
Data endpoint, which uses state of the art
authentication/authorization mechanisms.
A2: Metadata are accessible, even when the data are no
longer available
• The data catalog and data items are managed separately to ensure
metadata longevity.
36. Interoperable
I1. (Meta)data use a formal, accessible, shared, and broadly applicable language for
knowledge representation
• We use OWL ontologies to describe the data in Entellect.
• We use RDF and RDF-sanctioned serializations throughout.
I2. (Meta)data use vocabularies that follow FAIR principles
• All Entellect specific vocabularies (ontologies) are part of the larger ecosystem, and thus follow the same FAIR
principles as the data themselves.
• We use several well-known community-defined vocabularies that to a large extent follow the FAIR principles.
Where they don’t, we host them as such in our own space.
I3. (Meta)data include qualified references to other (meta)data
• We preserve and maintain this information as it's collected from sources.
• Entellect data are a part of a larger ecosystem of Life Sciences data where multiple pre-existing data sets and
coding & identification mechanisms currently create a lot of value for our customers. We reuse and build on
these to create a larger interconnected knowledge graph.
37. Reusable
R1: (Meta)data are richly described with a plurality of accurate and
relevant attributes
• R1.1: (Meta)data are released with a clear and accessible data usage license
• Entellect uses a provenance-based entitlements mechanism which allows us to
propagate licenses through the provenance trail and detect potential conflicts.
Usage licenses are part of our company-wide metadata standards.
• R1.2: (Meta)data are associated with detailed provenance
• We track provenance at the source and process level; guided especially by the
need to capture license information from sources and components, and by
requirements related to entitlements.
• R1.3: (Meta)data meet domain-relevant community standards
• We use a two-step modeling approach, where source data are captured 1)
according to a canonical representation of the source, and 2) aligned with both
internal standards and schemas, as well as external ones.
38. Summary
• Elsevier is committed to supporting external FAIR Data efforts and initiatives
• We are committed to working toward compliance with FAIR Principles with our own data
• We are developing FAIR-compliant data and analytics products, including an advanced iPaaS called Entellect, that can help our customers be FAIR
39. Thank you
Ian Harrow & the Pistoia Alliance
Wouter Haak Lena Deus
Albert Mons Greg Dart
Jack Leon Rinke Hoekstra
Lee Hollister Jabe Wilson
Tim Miller