Making true “molecule”–“mechanism”–“observation” connections is a time-consuming, iterative and laborious process. It is also very easy to miss critical information that affects key decisions or helps make plausible scientific connections.
The current practice for deciphering such relationships frequently involves subject matter experts (SMEs) requesting resources from resource-constrained data science departments to refine and redo highly similar ad hoc searches. The result is an impairment of both the pace and the quality of scientific reviews.
In this presentation, I show how semantic integration can ultimately become part of an integrated learning framework for more informed scientific decision making. I will take the audience through our pilot journey and highlight practical learnings that should inform subsequent endeavours.
Powering Question-Driven Problem Solving to Improve the Chances of Finding New Medicines
Data Analytics and Visualization Director,
GSK Data & Computational Sciences
Connected Data London
4th October 2019
Most hypotheses are not going to make it
Aspirations of scientific knowledge management
– Efficient organization: the hypotheses that we validate or invalidate today need to be revisited by the next generation of scientists.
– Effective organization: without access to the right data and prior knowledge at the right time, we risk making very costly, avoidable business decisions.
GSK CEO 2008-2017
Inconsistent use of language at source = serious downstream problems
– Serine dehydratase
– Sodium dodecyl sulfate
– Shwachman–Diamond syndrome
– Safety data sheet
– Glycogen synthase kinase
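The collision the list above illustrates (several of these terms share an abbreviation such as “SDS”) can be made concrete with a minimal sketch. The dictionary below is illustrative only, not a real controlled vocabulary:

```python
# Toy illustration: one abbreviation, many possible expansions.
# The entries mirror the slide's examples; a real system would use
# a curated terminology service, not a hand-written dict.
ABBREVIATIONS = {
    "SDS": [
        "Serine dehydratase",
        "Sodium dodecyl sulfate",
        "Shwachman-Diamond syndrome",
        "Safety data sheet",
    ],
    "GSK": [
        "GlaxoSmithKline",
        "Glycogen synthase kinase",
    ],
}

def is_ambiguous(abbrev: str) -> bool:
    """An abbreviation is ambiguous if it has more than one known expansion."""
    return len(ABBREVIATIONS.get(abbrev, [])) > 1
```

Here `is_ambiguous("SDS")` returns `True`: without context, a search engine cannot tell which of the four meanings the author intended.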
Data & knowledge capture forms: regulatory in purpose, but what about reward in design?
All Pharma motivation: Acquire knowledge to assert
confidence in core types of evidence
Self-learning questionnaires: Concept
“Auto-suggest” metadata tagging [AUTHOR ACTION] and automatic literature evidence searching [AUTHOR REWARD] to improve language consistency1 at source and findability2 of reported evidence [OUTCOME]
1Great for improving search engines
2Great for making scientists effective
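The “auto-suggest” tagging idea above can be sketched as a lookup of free text against a controlled vocabulary at capture time. The vocabulary and function name below are hypothetical, and the substring matching is deliberately naive; a deployed system would use a curated ontology and proper entity linking:

```python
# Minimal sketch of auto-suggest metadata tagging at the point of capture.
# CONTROLLED_VOCAB maps author-typed variants to canonical tags
# (hypothetical entries, including an assumed default expansion for "sds").
CONTROLLED_VOCAB = {
    "glycogen synthase kinase": "Glycogen Synthase Kinase",
    "gsk-3": "Glycogen Synthase Kinase",
    "sodium dodecyl sulfate": "Sodium dodecyl sulfate",
}

def suggest_tags(free_text: str) -> list[str]:
    """Suggest canonical tags for terms an author typed into a form field."""
    text = free_text.lower()
    # Naive substring match; real systems would tokenize and disambiguate.
    return sorted({canonical for term, canonical in CONTROLLED_VOCAB.items()
                   if term in text})
```

For example, `suggest_tags("gsk-3 inhibitor assay")` suggests the canonical tag “Glycogen Synthase Kinase”, nudging authors toward consistent language at source.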
Application of the state of the art
What’s possible now? What will be possible with expert-curated data over the next few years?
1. Determine from the context of the sentence whether the author meant “GlaxoSmithKline” or “Glycogen Synthase Kinase” when writing “GSK”
2. Classify and present sentences (with links to documents) whose metadata content is most similar to expected answers
3. Recommend predominant synonyms being used by individual
departments e.g. is a particular department really working on
“Glycogen Synthase Kinase” and using the synonym “GSK”?
4. If a significant efficacy or safety event is reported in a “Clinical” questionnaire, automatically alert the author if the outcome/risk was predicted earlier in a “Pre-clinical” questionnaire
1Named entity recognition, 2Document classification,
3Reinforcement learning, 4Trigger event detection
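Capability 1 above (context-based acronym disambiguation) can be illustrated with a toy keyword-scoring sketch. The cue-word lists and function name are assumptions for illustration, not GSK’s actual approach; a production system would use a trained named-entity-recognition model:

```python
# Toy context-based disambiguation of "GSK" (capability 1 above).
# Cue-word sets are illustrative assumptions, not a real trained model.
COMPANY_CUES = {"company", "pharma", "merger", "headquarters", "acquired"}
KINASE_CUES = {"kinase", "inhibitor", "phosphorylation", "enzyme", "pathway"}

def disambiguate_gsk(sentence: str) -> str:
    """Guess whether 'GSK' means the company or the kinase from context words."""
    words = set(sentence.lower().split())
    company_score = len(words & COMPANY_CUES)
    kinase_score = len(words & KINASE_CUES)
    if kinase_score > company_score:
        return "Glycogen Synthase Kinase"
    if company_score > kinase_score:
        return "GlaxoSmithKline"
    return "ambiguous"

print(disambiguate_gsk("The GSK inhibitor blocks the kinase pathway"))
# -> Glycogen Synthase Kinase
```

The same pattern generalizes: the more expert-curated context cues accumulate, the more reliably the author’s intended meaning can be resolved at source.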
Scoring “everything”: does it make sense to do it now, or only once we actually have enough labelled training sets (e.g. as with email spam filtering)?
1. Found evidence from a rare disease clinical trial that had previously been missed
2. Found a mechanistic hypothesis that a program team had not considered
3. Identified a plausible mechanism for a lab observation
• It’s all about the questions
• Technology can help overcome cultural challenges
• Persistence and patience are key