Data: The Good, The Bad
& The Ugly
Lee Harland
@SciBitely
http://www.scibite.com
http://www.slideshare.net/scibitely
Lilly Global IT Meeting November 2016
Context
• This is an invited talk I gave at Lilly’s Internal Global IT meeting on the
subject of “data”
The Good
http://www.nejm.org/doi/full/10.1056/NEJMp1606181
What matters to me!
The Bad
+ =
…. (Promotion of) the nutritional importance of spinach over
other foods, led to an increase of over 30 per cent in its
consumption during the 1920s and 30s.
The action of S. oleracea on cardiovascular
output and muscular tone
Bad, Bad Data Point
1870 35.2 mg Fe/100g
1937 3.52 mg Fe/100g
The mythical strength-giving properties of
spinach are ... credited to a simple mistake
concerning the iron content of the vegetable.
In 1870, Dr E von Wolf published figures
which were accepted until the 1930s, when
they were rechecked
This revealed that a decimal point had been
placed wrongly and that the real figure was
only one tenth of Dr von Wolf's claim
Still Making Headlines After 140 Years
2013
There Is No
Decimal Point
Error
Spinach: One Small Data Point, One Huge Mess
1870 35.2 mg Fe/100g ✓
1937 3.52 mg Fe/100g ✓
Both Values Are Correct – The difference is down to the assay conditions
http://www.merriam-webster.com/dictionary/provenance
35.2
35.2
The datapoint + its provenance
(experimental context)
What people saw
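A minimal sketch of the point above, assuming a data point is stored together with its experimental context. The `Measurement` class and the `comparable` helper are hypothetical names, and the condition labels are placeholders because the slides do not spell out the actual assay details.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Measurement:
    """A data point that carries its experimental context with it."""
    analyte: str
    value: float
    unit: str
    year: int
    assay_conditions: str  # provenance: how the number was obtained


def comparable(a: Measurement, b: Measurement) -> bool:
    """Values are only directly comparable if analyte, unit and conditions match."""
    return (a.analyte, a.unit, a.assay_conditions) == (
        b.analyte, b.unit, b.assay_conditions)


# Values are from the slide; the condition labels are placeholders,
# not the real assay details.
m1870 = Measurement("Fe", 35.2, "mg/100g", 1870, "conditions A (placeholder)")
m1937 = Measurement("Fe", 3.52, "mg/100g", 1937, "conditions B (placeholder)")

if comparable(m1870, m1937):
    print("Same context: the values genuinely disagree")
else:
    print("Different assay conditions: both values can be correct")
```

Without the `assay_conditions` field, all a reader has is two numbers that appear to contradict each other, which is roughly what happened to the spinach figure.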
So What?
……estimates for the reproducibility of preclinical research range
from 51 percent to 89 percent. They estimate that at least half of
all U.S. preclinical biomedical research funding—about
$28 billion annually—is therefore squandered……
http://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.1002165
Provenance Is A Critical Component of Reproducibility
What L cells, where from,
how old, epigenetic profile
etc etc?
When, how often, in what
way, using what
system?????
What, when, how?
Could you accurately reproduce this experiment from this method?
* I was responsible for
this paragraph
http://www.nature.com/nrd/journal/v10/n9/full/nrd3439-c1.html
A first-of-a-kind analysis of Bayer's internal efforts to validate 'new drug
target' claims now not only supports this view but suggests that 50%
may be an underestimate; the company's in-house
experimental data do not match literature claims in
65% of target-validation projects, leading to project
discontinuation.
This is where
Informatics & Data
Science can add real
value to
Drug Discovery
Open PHACTS https://www.openphacts.org/
Open PHACTS: Adding Provenance To Data
http://nanopub.org/
sub:Head {
this: np:hasAssertion sub:assertion ;
np:hasProvenance sub:provenance ;
np:hasPublicationInfo sub:pubinfo ;
a np:Nanopublication .
}
sub:assertion {
nx:NX_P35712 bfo:BFO_0000066 ts:TS-0276 ; # Protein NX_P35712 is localized in tissue TS-0276
ro:has_quality "positive" .
}
sub:provenance {
<http://www.nextprot.org/help/quality_criteria/silver> a eco:ECO_0000205 ;
rdfs:label "neXtProt silver"^^xsd:string .
sub:_1 a efo:EFO_00027688 .
sub:_10 a eco:ECO_0000218 .
sub:_2 a eco:ECO_0000218 .
sub:_9 a efo:EFO_00027688 .
sub:assertion prv:usedData <http://bgee.unil.ch/bgee/bgee?page=expression&action=data&stage_id=HsapDO:0000087&amp;organ_id=EV:0100115&amp;gene_id=ENSG00000110693> ,
<http://bgee.unil.ch/bgee/bgee?page=expression&action=data&stage_id=HsapDO:0000088&amp;organ_id=EV:0100115&amp;gene_id=ENSG00000110693> ,
<http://bgee.unil.ch/bgee/bgee?page=expression&action=data&stage_id=HsapDO:0000090&amp;organ_id=EV:0100115&amp;gene_id=ENSG00000110693&amp;stage_children=on> ,
<http://bgee.unil.ch/bgee/bgee?page=expression&action=data&stage_id=HsapDO:0000092&amp;organ_id=EV:0100115&amp;gene_id=ENSG00000110693&amp;stage_children=on> ,
<http://bgee.unil.ch/bgee/bgee?page=expression&action=data&stage_id=HsapDO:0000094&amp;organ_id=EV:0100115&amp;gene_id=ENSG00000110693&amp;stage_children=on> ;
wi:evidence <http://www.nextprot.org/help/quality_criteria/silver> ;
a eco:ECO_0000220 ;
rdfs:comment " data, NX_P35712 is expressed in Endometrium"^^xsd:string ;
prov:wasDerivedFrom sub:_1 , sub:_3 , sub:_5 , sub:_7 , sub:_9 ;
prov:wasGeneratedBy sub:_10 , sub:_2 , sub:_4 , sub:_6 , sub:_8 .
}
sub:pubinfo {
sub:_11 a eco:ECO_0000205 .
sub:_12 a eco:ECO_0000205 . sub:_15 a eco:ECO_0000205 .
this: dcterms:created "2014-09-19T00:00:00.0Z"^^xsd:dateTime ;
dcterms:rights <http://creativecommons.org/licenses/by/3.0/> ;
dcterms:rightsHolder <http://nextprot.org> ;
prv:usedData "neXtProt database" ;
pav:authoredBy "CALIPHO project" , <http://orcid.org/0000-0001-6710-1373> , <http://orcid.org/0000-0001-6818-334X> , <http://orcid.org/0000-0002-1303-2189> , <http://orcid.org/0000-0003-1813-6857> ;
pav:versionNumber "3" ;
prov:wasGeneratedBy sub:_11 , sub:_12 , sub:_13 , sub:_14 , sub:_15 .
}
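As a rough illustration of the structure in the example above (a head graph pointing at separate assertion, provenance and publication-info graphs), here is a small Python sketch. The `Nanopublication` class is a hypothetical model written for this note, not part of any nanopub library; the triples are copied from the slide as plain strings.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Triple = Tuple[str, str, str]


@dataclass
class Nanopublication:
    """Three named graphs tied together by a head graph."""
    uri: str
    assertion: List[Triple] = field(default_factory=list)
    provenance: List[Triple] = field(default_factory=list)
    pubinfo: List[Triple] = field(default_factory=list)

    def head(self) -> List[Triple]:
        """The head graph links the nanopub URI to its three parts."""
        return [
            (self.uri, "np:hasAssertion", "sub:assertion"),
            (self.uri, "np:hasProvenance", "sub:provenance"),
            (self.uri, "np:hasPublicationInfo", "sub:pubinfo"),
            (self.uri, "rdf:type", "np:Nanopublication"),
        ]

    def is_complete(self) -> bool:
        """True only if every part has at least one statement."""
        return all([self.assertion, self.provenance, self.pubinfo])


pub = Nanopublication(
    uri="this:",
    assertion=[("nx:NX_P35712", "bfo:BFO_0000066", "ts:TS-0276")],
    provenance=[("sub:assertion", "wi:evidence",
                 "<http://www.nextprot.org/help/quality_criteria/silver>")],
    pubinfo=[("this:", "pav:versionNumber", '"3"')],
)
print(pub.is_complete())
```

The design point is that the claim, the evidence for it and the record of who published it travel as one unit, so an assertion without provenance is visibly incomplete.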
http://nanopub.org
https://explorer.openphacts.org
One of the few user interfaces where provenance is intrinsically “there”
The Ugly
80-90% of all potentially
usable business information
may originate in
unstructured form
https://en.wikipedia.org/wiki/Unstructured_data
The Ugly
“Carboxypeptidase B2” “Thrombin-Activatable
Fibrinolysis Inhibitor”
“Plasma CPU”
The True Picture
(they are the same thing)
It hasn’t just got 3 names, it’s got LOTS
carboxypeptidase B-like protein OR thrombin-activatable fibrinolysis
inhibitor OR CPB type 2 OR Carboxypeptidase type B2 OR
plasma carboxypeptidase type B OR carboxypeptidase type B2 OR
CPB2 OR Plasma carboxypeptidase type B OR CPB-2 OR
carboxypeptidase B2 (plasma) OR carboxypeptidase U OR
Carboxypeptidase type U OR carboxypeptidase type U OR plasma
carboxypeptidase B2 OR carboxy-peptidylase U OR thrombin-
activable fibrinolysis inhibitor OR plasma carboxypeptidase type
B2 OR carboxypeptidase B2 (plasma) OR CPU OR
carboxypeptidase B2 OR PCPB OR pCPB OR Carboxypeptidase U
OR plasma carboxypeptidase B OR TAFI OR Carboxypeptidase B2
OR Plasma carboxypeptidase B OR Thrombin-activable
fibrinolysis inhibitor OR carboxypeptidase B2 plasma OR
carboxypeptidase R
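One way to avoid typing that OR-list by hand is a synonym dictionary that maps every known name onto a single canonical label. The sketch below uses a few of the names from the slide. Choosing "CPB2" as the canonical key and the `normalise` helper are assumptions made for illustration, and this is not how any particular product does it.

```python
import re

# A handful of the synonyms from the slide, all pointing at one canonical label.
# Using "CPB2" as the canonical key is a choice made for this sketch.
SYNONYMS = {
    "CPB2": [
        "carboxypeptidase B2",
        "thrombin-activatable fibrinolysis inhibitor",
        "TAFI",
        "plasma carboxypeptidase B",
        "carboxypeptidase U",
        "carboxypeptidase R",
        "CPB2",
    ],
}


def _key(name: str) -> str:
    """Lower-case and collapse whitespace/hyphens so trivial variants match."""
    return re.sub(r"[\s\-]+", " ", name.strip().lower())


LOOKUP = {_key(s): canonical for canonical, names in SYNONYMS.items() for s in names}


def normalise(name: str):
    """Return the canonical label for a known synonym, else None."""
    return LOOKUP.get(_key(name))


for mention in ["Carboxypeptidase B2", "TAFI",
                "Thrombin-Activatable  Fibrinolysis Inhibitor"]:
    print(mention, "->", normalise(mention))
```

Short forms such as "CPU" are deliberately left out of the sketch: they are ambiguous without context, which is part of why a plain lookup table is only a starting point.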
“We also manually standardized
data related to lab measurement
units and terminology related to
patient race and ethnicity,
geographical study regions, and
names of drugs and drug
families.”
Yet Another Issue
(an accident waiting to happen)
VARCHAR2
PROJ_TITLE
EXPERIMENT_INFO
ASSAY_DESCRIPTION
KEYWORDS
USER_PROFILE SUMMARY
EXPT_METADATA
SETTINGS_INFO
REPORT_TEXT
EXPT_NAME
Databases: Where Knowledge Goes To Die
MEETING_MINUTES
PROJ_ACTIONS
ASSAY_CONCLUSION
COHORT_DESC
INCLUSION_CRITERIA
POLICY_DETAILS
PROJECT_OVERVIEW
RATIONALE
JUSTIFICATION
Text2Data MicroService
TERMite
TEXT: Supports basic keyword search only
DATA: Rich substrate for search and discovery & insight
Just What Is “The Data”?
• Mentions of all
• Genes, Diseases, Drugs, Tissues, Cells, Techniques,
Assays, Measures, Protocols, Compounds, Regimens,
Companies, People, Locations, Pathologies, Adverse
Events, Pathways, Metabolism, Manufacturing Concepts,
QC/QA, Pathogens, Strains, Animals … and so on...
• … And their relationships to each other
• … And their locations (section, database column)
• … Inferring relationships between documents/entries
• … Regardless of actual keyword used
Systems Integration Guide
http://yourcompany.com/termite?
text=<content>
app=<application name>
index=<e.g. page, table or column name>
ELN
Screening
Registry
PDM
Registry
Project
Management
Sharepoint
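The call pattern on the integration slide (a base URL plus `text`, `app` and `index` parameters) could be assembled like this. It is only a sketch: the host is the placeholder from the slide rather than a real endpoint, `build_request_url` is a hypothetical helper, and the sample sentence and column name are made up.

```python
from urllib.parse import urlencode

# Host and path are the placeholder from the slide, not a real endpoint.
BASE_URL = "http://yourcompany.com/termite"


def build_request_url(text: str, app: str, index: str) -> str:
    """Encode the three slide parameters into a query string."""
    return BASE_URL + "?" + urlencode({"text": text, "app": app, "index": index})


url = build_request_url(
    text="TAFI inhibition reduced clot lysis time",  # made-up sample text
    app="ELN",                                       # calling application
    index="ASSAY_DESCRIPTION",                       # where the text came from
)
print(url)
```

Passing `app` and `index` alongside the text is what lets the results keep track of where each mention was found.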
What’s going on, right now
Trending Today
Why Give Ugly Data A Makeover?
• ELN annotation using BioAssay Ontology
• “Find all experiments using any Cell Fluorescence technique”
• Pharmacovigilance
• Monitoring newsfeeds & internal data for safety signals
• Automatic Process Notification
• Alert groups based on content of CRO documents, etc.
• Synergise Both Semantic Technology & Information Professionals
• Re-energise Therapeutic Area Literature Searching
• Build Knowledge Chains (Assertional Provenance)
• Project Management → ELN Data → Screen SOP
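A query like “find all experiments using any fluorescence technique” works once entries are annotated with ontology terms, because the search can expand a parent term to everything beneath it. The sketch below shows the idea with a toy hierarchy. The term names, their arrangement and the experiment IDs are all illustrative and are not taken from the BioAssay Ontology or any real ELN.

```python
# Toy parent -> children hierarchy. Term names are illustrative only and are
# not taken from the BioAssay Ontology itself.
CHILDREN = {
    "fluorescence technique": ["fluorescence polarization", "FRET"],
    "FRET": ["time-resolved FRET"],
}

# Hypothetical ELN entries, each annotated with one technique term.
EXPERIMENTS = {
    "EXP-001": "time-resolved FRET",
    "EXP-002": "fluorescence polarization",
    "EXP-003": "radioligand binding",
}


def descendants(term: str) -> set:
    """Return the term plus everything below it in the hierarchy."""
    found = {term}
    stack = [term]
    while stack:
        for child in CHILDREN.get(stack.pop(), []):
            if child not in found:
                found.add(child)
                stack.append(child)
    return found


def find_experiments(term: str) -> list:
    """Experiments annotated with the term or any of its descendants."""
    wanted = descendants(term)
    return sorted(e for e, t in EXPERIMENTS.items() if t in wanted)


print(find_experiments("fluorescence technique"))
```

A keyword search for "fluorescence" would miss the entry annotated only as "time-resolved FRET"; the hierarchy walk picks it up.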
Before I go…..
Spinach: The Truth Is Out There!
Spinach is high
in iron (!)
..oxalic acid in spinach prevents
more than 90% of iron from being
absorbed..
Acknowledgements
IMI Open PHACTS Team
(many more involved, I just
don’t have a photo ☹ )
http://openphacts.org
SciBite Team
http://scibite.com

Data: The Good, The Bad & The Ugly

  • 1. Data: The Good, The Bad & The Ugly Lee Harland @SciBitely http://www.scibite.com http://www.slideshare.net/scibitely Lee Harland Lilly Global IT Meeting November 2016
  • 2. Context • This is an invited talk I gave at Lilly’s Internal Global IT meeting on the subject of “data”
  • 10. + = …. (Promotion of) the nutritional importance of spinach over other foods, led to an increase of over 30 per cent in its consumption during the 1920s and 30s. The action of S. oleracea on cardiovascular output and muscular tone
  • 11. Bad, Bad Data Point 1870 35.2 mg Fe/100g 1937 3.52 mg Fe/100g The mythical strength-giving properties of spinach are ... credited to a simple mistake concerning the iron content of the vegetable. In 1870, Dr E von Wolf published figures which were accepted until the 1930s, when they were rechecked. This revealed that a decimal point had been placed wrongly and that the real figure was only one tenth of Dr von Wolf's claim.
  • 12. Still Making Headlines After 140 Years 2013
  • 13. There Is No Decimal Point Error
  • 16. Spinach: One Small Data Point, One Huge Mess 1870 35.2 mg Fe/100g 1937 3.52 mg Fe/100g ✓ ✓ Both Values Are Correct – The difference is down to the assay conditions
  • 18. 35.2 35.2 The datapoint + its provenance (experimental context) What people saw
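The "datapoint + its provenance" idea on this slide can be sketched in a few lines of Python. This is only an illustrative sketch: the `Measurement` class, the `comparable` helper and the `assay_conditions` key are hypothetical names, and the condition labels are placeholders, because the slides do not say what the actual assay conditions were.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Measurement:
    """A value that always travels with its experimental context."""
    analyte: str
    value: float
    unit: str
    year: int
    # Free-form experimental context; an empty dict means "what people saw".
    provenance: dict = field(default_factory=dict)


def comparable(a, b, keys=("assay_conditions",)):
    """Two measurements are directly comparable only if every required
    provenance key is present on both and has the same value."""
    return all(
        k in a.provenance and k in b.provenance
        and a.provenance[k] == b.provenance[k]
        for k in keys
    )


# Placeholder condition labels: the real conditions are not given in the slides.
fe_1870 = Measurement("Fe", 35.2, "mg/100g", 1870,
                      {"assay_conditions": "placeholder-conditions-A"})
fe_1937 = Measurement("Fe", 3.52, "mg/100g", 1937,
                      {"assay_conditions": "placeholder-conditions-B"})
bare = Measurement("Fe", 35.2, "mg/100g", 1870)  # value with no context

print(comparable(fe_1870, fe_1937))  # different conditions, not comparable
print(comparable(bare, fe_1870))     # missing provenance, cannot tell
```

The design choice is that a missing provenance key is treated the same as a mismatch: without context, two numbers cannot safely be called contradictory or equivalent.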
  • 20. ……estimates for the reproducibility of preclinical research range from 51 percent to 89 percent. They estimate that at least half of all U.S. preclinical biomedical research funding—about $28 billion annually—is therefore squandered…… http://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.1002165
  • 22. Provenance Is A Critical Component of Reproducibility What L cells, where from, how old, epigenetic profile etc etc? When, how often, in what way, using what system????? What, when, how? Could you accurately reproduce this experiment from this method? * I was responsible for this paragraph
  • 23. http://www.nature.com/nrd/journal/v10/n9/full/nrd3439-c1.html A first-of-a-kind analysis of Bayer's internal efforts to validate 'new drug target' claims now not only supports this view but suggests that 50% may be an underestimate; the company's in-house experimental data do not match literature claims in 65% of target-validation projects, leading to project discontinuation.
  • 24. This is where Informatics & Data Science can add real value to Drug Discovery
  • 26. Open PHACTS: Adding Provenance To Data http://nanopub.org/
  • 27.
      .sub:Head {
        this: np:hasAssertion sub:assertion ;
          np:hasProvenance sub:provenance ;
          np:hasPublicationInfo sub:pubinfo ;
          a np:Nanopublication .
      }

      sub:assertion {
        nx:NX_P35712 bfo:BFO_0000066 ts:TS-0276 ;  # Protein NX_P35712 is localized in tissue TS-0276
          ro:has_quality "positive" .
      }

      sub:provenance {
        <http://www.nextprot.org/help/quality_criteria/silver> a eco:ECO_0000205 ;
          rdfs:label "neXtProt silver"^^xsd:string .
        sub:_1 a efo:EFO_00027688 .
        sub:_10 a eco:ECO_0000218 .
        sub:_2 a eco:ECO_0000218 .
        sub:_9 a efo:EFO_00027688 .
        sub:assertion prv:usedData
            <http://bgee.unil.ch/bgee/bgee?page=expression&action=data&stage_id=HsapDO:0000087&organ_id=EV:0100115&gene_id=ENSG00000110693> ,
            <http://bgee.unil.ch/bgee/bgee?page=expression&action=data&stage_id=HsapDO:0000088&organ_id=EV:0100115&gene_id=ENSG00000110693> ,
            <http://bgee.unil.ch/bgee/bgee?page=expression&action=data&stage_id=HsapDO:0000090&organ_id=EV:0100115&gene_id=ENSG00000110693&stage_children=on> ,
            <http://bgee.unil.ch/bgee/bgee?page=expression&action=data&stage_id=HsapDO:0000092&organ_id=EV:0100115&gene_id=ENSG00000110693&stage_children=on> ,
            <http://bgee.unil.ch/bgee/bgee?page=expression&action=data&stage_id=HsapDO:0000094&organ_id=EV:0100115&gene_id=ENSG00000110693&stage_children=on> ;
          wi:evidence <http://www.nextprot.org/help/quality_criteria/silver> ;
          a eco:ECO_0000220 ;
          rdfs:comment " data, NX_P35712 is expressed in Endometrium"^^xsd:string ;
          prov:wasDerivedFrom sub:_1 , sub:_3 , sub:_5 , sub:_7 , sub:_9 ;
          prov:wasGeneratedBy sub:_10 , sub:_2 , sub:_4 , sub:_6 , sub:_8 .
      }

      sub:pubinfo {
        sub:_11 a eco:ECO_0000205 .
        sub:_12 a eco:ECO_0000205 .
        sub:_15 a eco:ECO_0000205 .
        this: dcterms:created "2014-09-19T00:00:00.0Z"^^xsd:dateTime ;
          dcterms:rights <http://creativecommons.org/licenses/by/3.0/> ;
          dcterms:rightsHolder <http://nextprot.org> ;
          prv:usedData "neXtProt database" ;
          pav:authoredBy "CALIPHO project" ,
            <http://orcid.org/0000-0001-6710-1373> ,
            <http://orcid.org/0000-0001-6818-334X> ,
            <http://orcid.org/0000-0002-1303-2189> ,
            <http://orcid.org/0000-0003-1813-6857> ;
          pav:versionNumber "3" ;
          prov:wasGeneratedBy sub:_11 , sub:_12 , sub:_13 , sub:_14 , sub:_15 .
      }

      http://nanopub.org
  • 29. One of the few user interfaces where provenance is intrinsically “there”
  • 31. 80-90% of all potentially usable business information may originate in unstructured form https://en.wikipedia.org/wiki/Unstructured_data The Ugly
  • 32. “Carboxypeptidase B2” “Thrombin-Activatable Fibrinolysis Inhibitor” “Plasma CPU” The True Picture (they are the same thing)
  • 33. It hasn’t just got 3 names, it’s got LOTS: carboxypeptidase B-like protein OR thrombin-activatable fibrinolysis inhibitor OR CPB type 2 OR Carboxypeptidase type B2 OR plasma carboxypeptidase type B OR carboxypeptidase type B2 OR CPB2 OR Plasma carboxypeptidase type B OR CPB-2 OR carboxypeptidase B2 (plasma) OR carboxypeptidase U OR Carboxypeptidase type U OR carboxypeptidase type U OR plasma carboxypeptidase B2 OR carboxy-peptidylase U OR thrombin-activable fibrinolysis inhibitor OR plasma carboxypeptidase type B2 OR carboxypeptidase B2 (plasma) OR CPU OR carboxypeptidase B2 OR PCPB OR pCPB OR Carboxypeptidase U OR plasma carboxypeptidase B OR TAFI OR Carboxypeptidase B2 OR Plasma carboxypeptidase B OR Thrombin-activable fibrinolysis inhibitor OR carboxypeptidase B2 plasma OR carboxypeptidase R
  • 34. “We also manually standardized data related to lab measurement units and terminology related to patient race and ethnicity, geographical study regions, and names of drugs and drug families.” Yet Another Issue
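One way to tame a synonym list like this is to map every surface form to a single canonical identifier before searching or joining data. The sketch below is a hypothetical illustration, not SciBite's or any other product's method: `SYNONYMS`, `normalise` and `canonical_id` are made-up names, only a handful of the synonyms from the slide are included, and using `CPB2` as the canonical label is an assumption.

```python
import re

CANONICAL = "CPB2"  # assumed canonical label, taken from the slide's own list

# A small subset of the synonyms shown on the slide.
SYNONYMS = [
    "carboxypeptidase B2",
    "thrombin-activatable fibrinolysis inhibitor",
    "thrombin-activable fibrinolysis inhibitor",
    "carboxypeptidase U",
    "plasma CPU",
    "TAFI",
    "CPB2",
]


def normalise(name):
    """Lowercase and collapse hyphens and runs of whitespace to one space."""
    return re.sub(r"[\s\-]+", " ", name.strip().lower())


LOOKUP = {normalise(s): CANONICAL for s in SYNONYMS}


def canonical_id(name):
    """Return the canonical identifier for a name, or None if unknown."""
    return LOOKUP.get(normalise(name))


print(canonical_id("Carboxypeptidase B2"))
print(canonical_id("Thrombin-Activatable  Fibrinolysis Inhibitor"))
print(canonical_id("spinach"))
```

Exact lookup after normalisation is deliberately conservative. Short symbols such as "CPU" are ambiguous outside a biological context, so a real system would need context-aware disambiguation rather than blind case-insensitive matching.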
  • 35. (an accident waiting to happen)
  • 36. VARCHAR2 PROJ_TITLE EXPERIMENT_INFO ASSAY_DESCRIPTION KEYWORDS USER_PROFILE SUMMARY EXPT_METADATA SETTINGS_INFO REPORT_TEXT EXPT_NAME Databases: Where Knowledge Goes To Die MEETING_MINUTES PROJ_ACTIONS ASSAY_CONLCUSION COHORT_DESC INCLUSION_CRITERIA POLICY_DETAILS PROJECT_OVERVIEW RATIONALE JUSTIFICATION
  • 37. Text2Data MicroService TERMite Supports basic keyword search only TEXT Rich substrate for search and discovery & insight DATA
  • 38. Just What Is “The Data”? •Mentions of all • Genes, Diseases, Drugs, Tissues, Cells, Techniques, Assays, Measures, Protocols, Compounds, Regimens, Companies, People, Locations, Pathologies, Adverse Events, Pathways, Metabolism, Manufacturing Concepts, QC/QA, Pathogens, Strains, Animals … and so on... •… And their relationships to each other •… And their locations (section, database column) •… Inferring relationships between documents/entries •… Regardless of actual keyword used
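As a rough illustration of "mentions, their locations, regardless of actual keyword used", here is a toy dictionary-based tagger over free-text database columns. It is a sketch under stated assumptions and does not represent the TERMite API: `VOCAB`, `PATTERN` and `annotate` are hypothetical names, the vocabulary is a tiny hand-made sample, and the example row is invented, borrowing only the column names from the earlier slide.

```python
import re

# Hypothetical mini-vocabulary: lowercase surface form -> (entity type, canonical id)
VOCAB = {
    "tafi": ("GENE", "CPB2"),
    "carboxypeptidase b2": ("GENE", "CPB2"),
    "carboxypeptidase u": ("GENE", "CPB2"),
}

# Longest surface forms first so longer names win over shorter overlapping ones.
PATTERN = re.compile(
    r"\b(" + "|".join(sorted(map(re.escape, VOCAB), key=len, reverse=True)) + r")\b",
    re.IGNORECASE,
)


def annotate(record):
    """record maps column name -> free text; returns one dict per mention,
    recording where it was found (column, character offsets) and what it is."""
    hits = []
    for column, text in record.items():
        for m in PATTERN.finditer(text or ""):
            etype, ident = VOCAB[m.group(1).lower()]
            hits.append({
                "column": column,
                "start": m.start(),
                "end": m.end(),
                "text": m.group(0),
                "type": etype,
                "id": ident,
            })
    return hits


# Invented example row using column names from the "Databases" slide.
row = {
    "EXPT_NAME": "TAFI inhibition screen",
    "ASSAY_DESCRIPTION": "Measures carboxypeptidase U activity in plasma.",
}
hits = annotate(row)
for h in hits:
    print(h)
```

Both mentions resolve to the same identifier even though the two columns use different names, which is what turns keyword-only text into something that can be queried as data.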
  • 44. Why Give Ugly Data A Makeover? • ELN annotation using Bioassay Ontology • “Find all experiments using any Cell Fluorescence technique” • Pharmacovigilance • Monitoring newsfeeds & internal data for safety signals • Automatic Process Notification • Alert groups based on content of CRO documents Etc • Synergise Both Semantic Technology & Information Professionals • Re-energise Therapeutic Area Literature Searching • Build Knowledge Chains (Assertional Provenance) • Project Management → ELN Data → Screen SOP
  • 46. Spinach: The Truth Is Out There! Spinach is high in iron (!) ..oxalic acid in spinach prevents more than 90% of iron from being absorbed.. Acknowledgement
  • 47. Acknowledgements IMI Open PHACTS Team (many more involved, I just don’t have a photo ☹) http://openphacts.org SciBite Team http://scibite.com