This talk will provide a means to discuss the capture, integration and dissemination of data across large enterprises. We will show how data variety is continuing to grow, meaning new data sources are steadily becoming available for use in analysis. Data veracity is also important, since a large amount of data is fuzzy (uncertain) in nature. The ability to integrate these various data sources and provide improved capabilities to understand and use them is of increasing importance in today’s pharma climate. We call this capability Reference Master Data Management (RMDM).
This talk will span an arc of data lifecycle management, beginning with instrument data, moving across to clinical studies, production, regulatory affairs and finally e-archiving (see Fig. 1). We will show how these systems can use a common semantics for modeling important metadata, and how the FAIR principles of Findability, Accessibility, Interoperability and Reusability can be applied through a common “semantic hub” that connects data sources of different varieties across the enterprise. ADF files, for example, use their Data Description layer to provide semantic metadata about file contents. Similarly, semantics can be used to describe clinical trials data, regulatory data, etc., through to archiving, for improved storage and search over long periods of time.
1. V.2.2
Eric Little, PhD
Chief Data Officer
OSTHUS
eric.little@osthus.com
Data Lifecycle Management
Across The Enterprise
2. Slide 2
Pharma invests in R&D and has to make $ back over subsequent years
Most R&D will fail, so risk is high
Law of Diminishing Returns
R&D productivity is declining
Harder treatments have greater costs, potentially lower returns
Drugs with minimal improvements (not as many blockbusters + generics)
The Pharma Industry Is At A Tipping Point
From: Kelvin Stott - https://endpts.com/pharmas-broken-business-model-an-industry-on-the-brink-of-terminal-decline/
3. Slide 3
Reduce R&D costs through better use of data
Many experiments are re-run because scientists cannot find existing data
The cost of system integration is much higher than the cost of data integration
Standardization upstream can significantly impact costs downstream
Once data is available – automate as much as possible
Connect your internal data to other external data sources
Many open-source components can be modified more easily than they can be built from scratch
How To Help Remedy the Situation
Use the data you have before you generate more!
Start with recurring tasks – workflows, models, query patterns, analytics, etc., then build out!
Don’t reinvent the wheel! Build data communities!
4. Slide 4
THE MOVE FROM BIG DATA TO BIG ANALYSIS
STATISTICAL
SEMANTICS
MACHINE LEARNING
REASONING
5. Slide 5
Moving to Smart Data
Smart data can be added to existing systems
Does not require replacement of existing tech
Smart data provides a separation of:
Model Layer
Data Layer
Link to the model layer
Leave data in place
Smart data links information from the models to instance-level data
Smart Data uses metadata in order to capture context about data
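The separation of model layer and data layer described on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the concept identifiers (ex:Measurement, ex:pH), the system names (lims, eln) and the locations are hypothetical and do not come from the talk. The point is that the model layer holds terms and hierarchy, the links hold pointers, and the instance data is never copied.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Concept:
    """A model-layer term (hypothetical example, not from the talk)."""
    iri: str
    label: str
    parent: Optional[str] = None


@dataclass(frozen=True)
class DataLink:
    """Metadata pointing from a concept to where instance data already lives."""
    concept_iri: str
    system: str
    location: str


# Model layer: terms and their hierarchy only.
MODEL = {
    "ex:Measurement": Concept("ex:Measurement", "Measurement"),
    "ex:pH": Concept("ex:pH", "pH measurement", parent="ex:Measurement"),
}

# Links: the data stays in place; only its location is recorded.
LINKS = [
    DataLink("ex:pH", "lims", "results.ph_value"),
    DataLink("ex:pH", "eln", "experiments/ph_readings.csv"),
]


def subconcepts(iri):
    """Return the concept and everything below it in the model hierarchy."""
    found = {iri}
    changed = True
    while changed:
        changed = False
        for concept in MODEL.values():
            if concept.parent in found and concept.iri not in found:
                found.add(concept.iri)
                changed = True
    return found


def locate(iri):
    """List existing locations holding data for a concept or its children."""
    wanted = subconcepts(iri)
    return [(link.system, link.location) for link in LINKS
            if link.concept_iri in wanted]
```

Calling locate("ex:Measurement") returns both pH locations because the hierarchy lives in the model layer, not in either source system.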
6. Slide 6
Semantic Spectrum of Knowledge Organization Systems
• Deborah L. McGuinness. "Ontologies Come of Age". In Dieter Fensel, Jim Hendler, Henry Lieberman, and Wolfgang Wahlster, editors. Spinning the Semantic Web: Bringing the World Wide Web to Its Full Potential. MIT Press, 2003.
• Michael Uschold and Michael Gruninger. "Ontologies and semantics for seamless connectivity". SIGMOD Rec. 33, 4 (December 2004), 58-64. DOI=http://dx.doi.org/10.1145/1041410.1041420
• Leo Obrst. "The Ontology Spectrum". Book section in Roberto Poli, Michael Healy, Achilles Kameas, Theory and Applications of Ontology: Computer Applications. Springer Netherlands, 17 Sep 2010.
• Leo Obrst and Mills Davis. "Semantic Wave 2008 Report: Industry Roadmap to Web 3.0 & Multibillion Dollar Market Opportunities". 2008.
Sources
7. Slide 7
Advantages of Using This Tech
Use cases where customers report distinct improvement:
Better defined terms
• Differentiates between Entities and Labels – more specific data dictionary
Better taxonomic structure
• Hierarchies can be accurately captured – not buried in incorrect tables
Query Federation
• Can easily use multiple data sources (integration)
Query Faceting
• Query results can be easily refined (and shared)
Better use of metadata
• Provides context for users
• Raw data is more valuable over time
Makes data actionable across an enterprise
• Moves from local data (on people’s machines, in their heads) to explicit sharable resources
• Adding SMART DATA to BIG DATA provides the means to access and use the data
• Requires combining logical data with statistical data in order to find patterns of interest inside of large data sets
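Query federation and query faceting, as listed above, can be illustrated with a small Python sketch. Everything here is assumed for illustration: the two sources, their field names, and the shared terms (compound, site, and so on) are hypothetical. A real deployment would more likely use a semantic query language over a graph store, which this sketch does not attempt to show.

```python
from collections import Counter

# Two hypothetical sources that use different local field names.
LAB = [
    {"cmpd": "A-1", "assay": "HPLC", "site": "Basel"},
    {"cmpd": "A-2", "assay": "NMR", "site": "Basel"},
]
CLINICAL = [
    {"compound_id": "A-1", "study_phase": "II", "location": "Boston"},
]
SOURCES = {"lab": LAB, "clinical": CLINICAL}

# Mapping from each source's local field names to shared terms.
MAPPINGS = {
    "lab": {"cmpd": "compound", "assay": "assay", "site": "site"},
    "clinical": {"compound_id": "compound", "study_phase": "phase",
                 "location": "site"},
}


def federated_query(**criteria):
    """Run one query, phrased in shared terms, against every source."""
    results = []
    for name, rows in SOURCES.items():
        mapping = MAPPINGS[name]
        for row in rows:
            # Translate the local record into shared vocabulary.
            record = {mapping[key]: value for key, value in row.items()}
            record["source"] = name
            if all(record.get(k) == v for k, v in criteria.items()):
                results.append(record)
    return results


def facet(records, term):
    """Count result records per value of a shared term, for refinement."""
    return Counter(r[term] for r in records if term in r)
```

For example, federated_query(compound="A-1") returns one record from each source, and facet(federated_query(), "site") gives the per-site counts a user could click through to refine the result set.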
8. Slide 8
A Semantic Framework can connect the entire enterprise using a common semantics
The Semantic Hub should only focus on metadata (not instance level data)
Benefits: Common Terms, Models, Queries, Rules and Results (End-to-End)
Integrating Data Across the Enterprise
Lab Instruments, Clinical Trials, Regulatory Affairs, Production, eArchiving
16. Slide 16
R&D and manufacturing often cannot easily share data
Competing systems can evolve that cause incompatibilities
Manufacturing data is often less complex than R&D data, but significantly higher in throughput
QA/QC plays a major role
Far more interpretation in R&D
Manufacturing needs results fast
• Alarms
• Trends
Manufacturing data is less retrospective
Manufacturing Data Vs. R&D Data
18. Slide 18
Regulatory compliance requires accessing and mining unstructured data
Linking unstructured data to other data provides significant advantages
Text to DB links unstructured and structured data
Text to Public Data Sources leverages open source research
Regulatory Compliance
Regulatory Documentation
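The "Text to DB" idea on this slide can be sketched as follows. This is a minimal illustration, not the approach used in the talk: the compound table, its identifiers, and the sample sentence are hypothetical, and plain case-insensitive name matching stands in for whatever text analytics a production system would use.

```python
import re
import sqlite3

# Structured side: a small in-memory table of known compounds (hypothetical).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE compound (id TEXT PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO compound VALUES (?, ?)",
    [("C001", "aspirin"), ("C002", "ibuprofen")],
)


def link_text_to_db(text):
    """Return (matched_text, compound_id) pairs for compound names in text."""
    links = []
    rows = conn.execute("SELECT id, name FROM compound ORDER BY id")
    for compound_id, name in rows:
        # Whole-word, case-insensitive match on the compound name.
        pattern = r"\b" + re.escape(name) + r"\b"
        for match in re.finditer(pattern, text, flags=re.IGNORECASE):
            links.append((match.group(0), compound_id))
    return links


# Unstructured side: a sentence from a hypothetical regulatory document.
doc = "Stability data for Aspirin were submitted; ibuprofen results are pending."
```

Here link_text_to_db(doc) ties each mention in the document to a database key, which is the link between unstructured and structured data that the slide refers to.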
20. Slide 20
Data is made available for easier search and indexing (even after long periods of time)
Archiving is no longer a “vault” concept but is integrated within the Data Management Lifecycle
E-Archiving Using the Allotrope Data Framework
22. Slide 22
FAIR Data Is Now Accessible For Advanced Analytics
Data Science (machine learning, text analytics, clustering, etc.)
Analytics Tools: simulations, statistics, reasoning
Visualization: dashboards, exploration, search
…
Reporting: regulatory, internal, external
Lightweight Semantic Integration Layer
(semantic RMDM, APIs, semantic indexing, data annotation, catalogues, metadata and linking)
Linked Open Data & Open APIs
Semantic Graph DB (Knowledge Graph)
Operational DBs
…
Unstructured Documents
Semi-structured Data
Instrument Data
23. Slide 23
CONNECTING DATA, PEOPLE AND ORGANIZATIONS
Contact Information:
Email: eric.little@osthus.com
Web: www.osthus.com
www.biganalysis.com
Twitter: OntoEric