253rd ACS National Meeting, San Francisco CA, USA 4th April 2017
Automatic extraction of bioactivity
data from patents
Daniel Lowe*, Stefan Senger† and Roger Sayle*
*NextMove Software Cambridge, UK
†GlaxoSmithKline, Stevenage, UK
Example Use cases
• “A patent has recently come out on a topic of
interest, can the key compounds be extracted
with their activity data?”
• “Which compounds have been found to be
active against this target?”
US Patent data freely available
patents.reedtech.com
(Or from the USPTO: bulkdata.uspto.gov)
= text-mined
What are these compounds?
Understanding table semantics
SureChEMBL Google Patents
After text-mining for chemical entities:
Green = substituent
Purple = molecule
Source: US20170050925A9
SureChEMBL
Google Patents
Patent PDF
PatFetch (NextMove Software)
Source: US20010016661A1
Understanding table semantics
5 columns
6 columns
• Columns merged such that header and body
have same number of columns
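A minimal sketch of one way such a merge could work, not necessarily the approach used in this work. It assumes that two neighbouring columns may be merged when no row has text in both of them; `merge_to_width`, the header and the sample rows are all hypothetical.

```python
def merge_to_width(rows, width):
    """Merge adjacent columns until every row has `width` cells.

    Assumption (not from the slides): two neighbouring columns may be
    merged when no row has text in both. All rows must have equal length.
    """
    rows = [list(r) for r in rows]
    while rows and len(rows[0]) > width:
        for i in range(len(rows[0]) - 1):
            # A pair is mergeable if at most one of the two cells is filled.
            if all(not (r[i].strip() and r[i + 1].strip()) for r in rows):
                for r in rows:
                    r[i:i + 2] = [(r[i] + r[i + 1]).strip()]
                break
        else:
            raise ValueError("no mergeable pair of columns found")
    return rows


# Hypothetical table: 5 header columns, 6 body columns.
header = ["Ex.", "R1", "R2", "IC50 (nM)", "Notes"]
body = [
    ["1", "Me", "", "H", "12", ""],
    ["2", "", "Et", "H", "340", "racemate"],
]
merged = merge_to_width(body, len(header))
print(merged)
```

Merging only where cells never collide keeps the sketch lossless; a real table would also need handling for spanning cells declared in the markup.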
Getting the compound
structures
• Chemical names
• Chemical sketches
• R-group tables
• Compound identifier associated with any of
the above
Chemical names
• OPSIN (Open Parser for systematic IUPAC
nomenclature)
• Dictionaries (ChEMBL/PubChem/NextMove)
• Chemical line formula parsing, especially
useful for peptide names and R-group
definitions
Chemical sketches
• Utilize the ChemDraw sketches provided by
the USPTO
• Detection and handling of repeat brackets and
positional variation
• Fixing obvious errors e.g. undervalent
nitrogen near to H atom with no bond
• Labels reinterpreted
Formula Interpretation
Input ChemDraw 15 This work
HATU
C4F9
H3PO4
CON(cHex)2 No result
III-2 No result
[Figure: structure depictions of the interpreted formulae]
R-group tables
Resolving Identifiers
• Need to “name space” identifiers
– “Compound 1”, “Reference compound 1”,
“Example 1”
– But “Compound 1” = “cmpd 1” = “cpd. #1”
• Where a column is just called “#”, is it a
compound number, an example number or just a
table row number?
• An identifier may be defined multiple times, e.g.
as a sketch and as a chemical name
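The name-spacing idea can be sketched as follows. This is an illustration only: the synonym table, the regular expression and `normalize_identifier` are hypothetical and cover just the surface forms quoted above.

```python
import re

# Hypothetical synonym table: surface label -> namespace.
_NAMESPACES = {
    "compound": "compound",
    "cmpd": "compound",
    "cpd": "compound",
    "example": "example",
    "ex": "example",
    "reference compound": "reference compound",
}

# Label, optional "." and "#"/"no.", then the number (optionally with a letter).
_ID_PATTERN = re.compile(
    r"^\s*(?P<label>[A-Za-z ]+?)\.?\s*(?:#|no\.?)?\s*(?P<num>\d+[A-Za-z]?)\s*$",
    re.IGNORECASE,
)


def normalize_identifier(text):
    """Return (namespace, number), or None if the text is not recognised."""
    m = _ID_PATTERN.match(text)
    if m is None:
        return None
    label = " ".join(m.group("label").lower().split())
    namespace = _NAMESPACES.get(label)
    if namespace is None:
        return None
    return (namespace, m.group("num"))


for raw in ["Compound 1", "cmpd 1", "cpd. #1", "Example 1", "Reference compound 1"]:
    print(raw, "->", normalize_identifier(raw))
```

Keeping the namespace in the key means “Compound 1” and “Example 1” stay distinct, while the spelling variants of “Compound 1” collapse to one key. A bare “#” header is deliberately left unresolved.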
Resolving Identifiers
(text-mining)
Resolving Identifiers
(Sketches)
Resolving Identifiers
(Tables)
Extracting compound-activity
relationships
Excel table export
Extracting compound-activity
relationships
What is the target?
Assay identification
• Naïve Bayes classifier trained from assay
descriptions identified by BindingDB curators
• 10-fold cross-validation: 98.9% recall, 94.7%
precision
• Paragraph associated with next table or table
mentioned in paragraph
• Target/organism detected
• Care taken to avoid common irrelevant
organisms/proteins e.g. bovine serum albumin
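For readers unfamiliar with the technique, a toy multinomial Naïve Bayes classifier with add-one smoothing is sketched below. It is not the classifier or training set used in this work: the four training paragraphs, the labels `assay` and `other`, and the class `NaiveBayes` are invented for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.word_counts = {}        # label -> Counter of tokens
        self.doc_counts = Counter()  # label -> number of training paragraphs
        self.vocab = set()

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts.setdefault(label, Counter()).update(tokens)
            self.doc_counts[label] += 1
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        # Tokens never seen in training are ignored.
        tokens = [t for t in tokenize(text) if t in self.vocab]
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label, counts in self.word_counts.items():
            score = math.log(self.doc_counts[label] / total_docs)
            denom = sum(counts.values()) + len(self.vocab)
            for t in tokens:
                score += math.log((counts[t] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented training paragraphs, for illustration only.
train_texts = [
    "Inhibition of kinase activity was measured and IC50 values determined",
    "Compounds were tested in a radioligand binding assay to determine Ki",
    "The mixture was stirred at room temperature and concentrated in vacuo",
    "The crude product was purified by column chromatography on silica gel",
]
train_labels = ["assay", "assay", "other", "other"]
clf = NaiveBayes().fit(train_texts, train_labels)
print(clf.predict("IC50 values were determined in an enzyme inhibition assay"))
```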
Results
Results from US patent
applications (2001 to Mar 2017)
Red = Bioactivity
Activities with associated
structures per year
[Chart: activity-structure relationships extracted (y-axis, 0 to 600,000) by publication year (x-axis)]
Comparison with BindingDB
• Activity data from ~1500 US patent grants (2013-
2016) manually extracted over the course of 3 years
• ~150,000 activities
• Comparison done on the subset that was made
available in ChEMBL 22_1 (98,898 activity values,
1012 patents)
• As some assay results are missed by the automatic
extraction, and some are considered out of scope by
BindingDB, it is difficult to distinguish differences in
coverage from genuine disagreements
Comparison with BindingDB
• Values normalized into nM
– 1000s of instances of measurements in nanometers!
• Midpoint of ranges taken
• Structures compared by StdInChI
• Target name normalized to ChEMBL target ID
(organism specific), using either:
– ChEMBL target synonyms
– Normalize to HGNC symbol and check if HGNC symbol is a
ChEMBL target synonym
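The value normalization can be sketched as below. This is an illustration under assumptions, not the code used in this work: the unit table, the accepted range separators and the `nm_means_nanomolar` switch (treating the length unit “nm” as a miswritten “nM”) are all hypothetical.

```python
import re

# Multipliers converting molar concentration units to nanomolar.
_TO_NM = {
    "pM": 1e-3,
    "nM": 1.0,
    "uM": 1e3,
    "µM": 1e3,  # micro sign
    "μM": 1e3,  # Greek mu
    "mM": 1e6,
    "M": 1e9,
}

# A number, optionally followed by a range end, then the unit.
_VALUE = re.compile(
    r"^\s*(?P<low>\d+(?:\.\d+)?)\s*"
    r"(?:(?:-|–|to)\s*(?P<high>\d+(?:\.\d+)?))?\s*"
    r"(?P<unit>\S+)\s*$"
)


def to_nanomolar(text, nm_means_nanomolar=False):
    """Return the value in nM (midpoint for ranges), or None if unparsable."""
    m = _VALUE.match(text)
    if m is None:
        return None
    unit = m.group("unit")
    # Assumption: "nm" (nanometers) in an activity column is a slip for nM.
    if unit == "nm" and nm_means_nanomolar:
        unit = "nM"
    factor = _TO_NM.get(unit)
    if factor is None:
        return None
    low = float(m.group("low"))
    high = float(m.group("high")) if m.group("high") else low
    return (low + high) / 2 * factor


for raw in ["0.5 uM", "10-50 nM", "2 mM", "12 nm"]:
    print(raw, "->", to_nanomolar(raw, nm_means_nanomolar=True))
```

Leaving the nanometer reinterpretation behind an explicit switch keeps a record of where a unit was corrected rather than read as written.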
Comparison
• Expected values found: 75%
• Expected structures found: 65%
• Expected value + structure found: 53%
• Expected value + structure + target: 18%
Unclear structure assignment
Stereochemistry and salts
[Figure: structure depictions compared in three panels: Patent, BindingDB, This work]
Long tail of difficult cases
What does this superscript term mean?
What are the units?
Targets of patent data compared
to journal data
Common Target Classes
[Charts, one panel each for ChEMBL 22_1 (excluding BindingDB) and US Patent Applications: % per year (y-axis, 0% to 40%) for 2002 to 2016 (x-axis); series: Kinase, GPCR (Family A), Protease, Nuclear receptor, Voltage-gated ion channel, Electrochemical transporter, Oxidoreductase]
Upcoming target classes
[Chart: percentage of documents with activity values against target class (y-axis, 0.0% to 3.0%) for 2002 to 2016 (x-axis); series: Epigenetic writer (Patents), Epigenetic reader (Patents), Epigenetic writer (ChEMBL ex BindingDB), Epigenetic reader (ChEMBL ex BindingDB)]
Future work
• Support for more complex R-group tables
• Improve recognition and resolution of protein
target names
• Support for activities specified in text e.g.
Example 1 has an IC50 of 12 nM measured at rat EP4
• Resolution of symbols for activity ranges e.g.
“A” indicates an IC50 value of less than 100 nM
• Improve assay metadata extraction
cf. BioAssay Express
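As a rough illustration of what the two text-based items above might involve, the sketch below matches only the two example sentences quoted in the list. It is not something the slides report as implemented; the patterns and the functions `parse_activity_sentence` and `parse_symbol_legend` are hypothetical.

```python
import re

_UNIT = r"(?P<unit>[pnuµμm]?M)\b"
_TYPE = r"(?P<type>IC50|EC50|Ki|Kd)"
_NUMBER = r"(?P<value>\d+(?:\.\d+)?)"

# e.g. Example 1 has an IC50 of 12 nM measured at rat EP4
_SENTENCE = re.compile(
    r"(?P<id>(?:Example|Compound)\s+\d+)\s+has\s+an?\s+"
    + _TYPE + r"\s+of\s+" + _NUMBER + r"\s*" + _UNIT
)

# e.g. "A" indicates an IC50 value of less than 100 nM
_LEGEND = re.compile(
    r'["“](?P<symbol>[A-Z])["”]\s+indicates\s+an?\s+'
    + _TYPE + r"\s+value\s+of\s+less\s+than\s+" + _NUMBER + r"\s*" + _UNIT
)


def parse_activity_sentence(text):
    """Extract identifier, activity type, value and unit from a sentence."""
    m = _SENTENCE.search(text)
    if m is None:
        return None
    return {
        "identifier": m.group("id"),
        "type": m.group("type"),
        "relation": "=",
        "value": float(m.group("value")),
        "unit": m.group("unit"),
    }


def parse_symbol_legend(text):
    """Extract the meaning of an activity-range symbol such as "A"."""
    m = _LEGEND.search(text)
    if m is None:
        return None
    return {
        "symbol": m.group("symbol"),
        "type": m.group("type"),
        "relation": "<",
        "value": float(m.group("value")),
        "unit": m.group("unit"),
    }


sentence = parse_activity_sentence("Example 1 has an IC50 of 12 nM measured at rat EP4")
legend = parse_symbol_legend("“A” indicates an IC50 value of less than 100 nM")
print(sentence)
print(legend)
```

Real patent text phrases these statements in many more ways, so fixed patterns like these would only be a starting point.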
Disambiguation of Conflicting
structure descriptions
Image from original filing
Redrawn by US patent office in ChemDraw
Intended structure from chemical name
Conclusions
• Processing all US patents from 2001 to present
can be done in less than a day on a desktop PC
• Technique applicable to chemical properties
other than activity values
• Compound number <-> structure relationships
useful for key compound identification
• For the majority of patents, extracting
structure-activity relationships can be
significantly expedited
Acknowledgements
• Noel O'Boyle
• John Mayfield
• Funding provided by:
Thank you for your time!
http://nextmovesoftware.com
http://nextmovesoftware.com/blog
daniel@nextmovesoftware.com

More Related Content

What's hot

CINF 18: Wikipedia and Wiktionary as resources for chemical text mining
CINF 18: Wikipedia and Wiktionary as resources for chemical text miningCINF 18: Wikipedia and Wiktionary as resources for chemical text mining
CINF 18: Wikipedia and Wiktionary as resources for chemical text miningNextMove Software
 
CINF 13: Pistachio - Search and Faceting of Large Reaction Databases
CINF 13: Pistachio - Search and Faceting of Large Reaction DatabasesCINF 13: Pistachio - Search and Faceting of Large Reaction Databases
CINF 13: Pistachio - Search and Faceting of Large Reaction DatabasesNextMove Software
 
Classification, representation and analysis of cyclic peptides and peptide-li...
Classification, representation and analysis of cyclic peptides and peptide-li...Classification, representation and analysis of cyclic peptides and peptide-li...
Classification, representation and analysis of cyclic peptides and peptide-li...NextMove Software
 
CINF 170: Regioselectivity: An application of expert systems and ontologies t...
CINF 170: Regioselectivity: An application of expert systems and ontologies t...CINF 170: Regioselectivity: An application of expert systems and ontologies t...
CINF 170: Regioselectivity: An application of expert systems and ontologies t...NextMove Software
 
Standardized Representations of ELN Reactions for Categorization and Duplicat...
Standardized Representations of ELN Reactions for Categorization and Duplicat...Standardized Representations of ELN Reactions for Categorization and Duplicat...
Standardized Representations of ELN Reactions for Categorization and Duplicat...NextMove Software
 
Pharmaceutical industry best practices in lessons learned: ELN implementation...
Pharmaceutical industry best practices in lessons learned: ELN implementation...Pharmaceutical industry best practices in lessons learned: ELN implementation...
Pharmaceutical industry best practices in lessons learned: ELN implementation...NextMove Software
 
Substructure Search Face-off
Substructure Search Face-offSubstructure Search Face-off
Substructure Search Face-offNextMove Software
 
ICIC 2016: New Product Introduction CAS
ICIC 2016: New Product Introduction CASICIC 2016: New Product Introduction CAS
ICIC 2016: New Product Introduction CASDr. Haxel Consult
 
Chemical structure representation in PubChem
Chemical structure representation in PubChemChemical structure representation in PubChem
Chemical structure representation in PubChemNextMove Software
 
ICIC 2016: Mind the Gap: The novel benefits of human-curated substance locat...
ICIC 2016: Mind the Gap:  The novel benefits of human-curated substance locat...ICIC 2016: Mind the Gap:  The novel benefits of human-curated substance locat...
ICIC 2016: Mind the Gap: The novel benefits of human-curated substance locat...Dr. Haxel Consult
 
Acs 2013 indianapolis_cvsp
Acs 2013 indianapolis_cvspAcs 2013 indianapolis_cvsp
Acs 2013 indianapolis_cvspKen Karapetyan
 
Standardization and Generation of Parents for Open PHACTS Chemical Registry S...
Standardization and Generation of Parents for Open PHACTS Chemical Registry S...Standardization and Generation of Parents for Open PHACTS Chemical Registry S...
Standardization and Generation of Parents for Open PHACTS Chemical Registry S...Ken Karapetyan
 
Resolving cryptic needles to molecular structures: The GtoPdb experience
Resolving cryptic needles to molecular structures: The GtoPdb experienceResolving cryptic needles to molecular structures: The GtoPdb experience
Resolving cryptic needles to molecular structures: The GtoPdb experienceChris Southan
 
OpenPHACTS - Chemistry Platform Update and Learnings
OpenPHACTS - Chemistry Platform Update and LearningsOpenPHACTS - Chemistry Platform Update and Learnings
OpenPHACTS - Chemistry Platform Update and LearningsValery Tkachenko
 
2020 scifinder-n manual (2020) english
2020 scifinder-n manual (2020) english2020 scifinder-n manual (2020) english
2020 scifinder-n manual (2020) englishPOSTECH Library
 

What's hot (20)

CINF 18: Wikipedia and Wiktionary as resources for chemical text mining
CINF 18: Wikipedia and Wiktionary as resources for chemical text miningCINF 18: Wikipedia and Wiktionary as resources for chemical text mining
CINF 18: Wikipedia and Wiktionary as resources for chemical text mining
 
CINF 13: Pistachio - Search and Faceting of Large Reaction Databases
CINF 13: Pistachio - Search and Faceting of Large Reaction DatabasesCINF 13: Pistachio - Search and Faceting of Large Reaction Databases
CINF 13: Pistachio - Search and Faceting of Large Reaction Databases
 
Classification, representation and analysis of cyclic peptides and peptide-li...
Classification, representation and analysis of cyclic peptides and peptide-li...Classification, representation and analysis of cyclic peptides and peptide-li...
Classification, representation and analysis of cyclic peptides and peptide-li...
 
CINF 170: Regioselectivity: An application of expert systems and ontologies t...
CINF 170: Regioselectivity: An application of expert systems and ontologies t...CINF 170: Regioselectivity: An application of expert systems and ontologies t...
CINF 170: Regioselectivity: An application of expert systems and ontologies t...
 
Standardized Representations of ELN Reactions for Categorization and Duplicat...
Standardized Representations of ELN Reactions for Categorization and Duplicat...Standardized Representations of ELN Reactions for Categorization and Duplicat...
Standardized Representations of ELN Reactions for Categorization and Duplicat...
 
Pharmaceutical industry best practices in lessons learned: ELN implementation...
Pharmaceutical industry best practices in lessons learned: ELN implementation...Pharmaceutical industry best practices in lessons learned: ELN implementation...
Pharmaceutical industry best practices in lessons learned: ELN implementation...
 
Substructure Search Face-off
Substructure Search Face-offSubstructure Search Face-off
Substructure Search Face-off
 
ICIC 2016: New Product Introduction CAS
ICIC 2016: New Product Introduction CASICIC 2016: New Product Introduction CAS
ICIC 2016: New Product Introduction CAS
 
Chemical structure representation in PubChem
Chemical structure representation in PubChemChemical structure representation in PubChem
Chemical structure representation in PubChem
 
ICIC 2016: Mind the Gap: The novel benefits of human-curated substance locat...
ICIC 2016: Mind the Gap:  The novel benefits of human-curated substance locat...ICIC 2016: Mind the Gap:  The novel benefits of human-curated substance locat...
ICIC 2016: Mind the Gap: The novel benefits of human-curated substance locat...
 
Acs 2013 indianapolis_cvsp
Acs 2013 indianapolis_cvspAcs 2013 indianapolis_cvsp
Acs 2013 indianapolis_cvsp
 
Standardization and Generation of Parents for Open PHACTS Chemical Registry S...
Standardization and Generation of Parents for Open PHACTS Chemical Registry S...Standardization and Generation of Parents for Open PHACTS Chemical Registry S...
Standardization and Generation of Parents for Open PHACTS Chemical Registry S...
 
Resolving cryptic needles to molecular structures: The GtoPdb experience
Resolving cryptic needles to molecular structures: The GtoPdb experienceResolving cryptic needles to molecular structures: The GtoPdb experience
Resolving cryptic needles to molecular structures: The GtoPdb experience
 
Adding complex expert knowledge into chemical database and transforming surfa...
Adding complex expert knowledge into chemical database and transforming surfa...Adding complex expert knowledge into chemical database and transforming surfa...
Adding complex expert knowledge into chemical database and transforming surfa...
 
Data model
Data modelData model
Data model
 
Building support for the semantic web for chemistry at the Royal Society of C...
Building support for the semantic web for chemistry at the Royal Society of C...Building support for the semantic web for chemistry at the Royal Society of C...
Building support for the semantic web for chemistry at the Royal Society of C...
 
OpenPHACTS - Chemistry Platform Update and Learnings
OpenPHACTS - Chemistry Platform Update and LearningsOpenPHACTS - Chemistry Platform Update and Learnings
OpenPHACTS - Chemistry Platform Update and Learnings
 
Open PHACTS Chemistry Platform Update and Learnings
Open PHACTS Chemistry Platform Update and Learnings Open PHACTS Chemistry Platform Update and Learnings
Open PHACTS Chemistry Platform Update and Learnings
 
The needs for chemistry standards, database tools and data curation at the ch...
The needs for chemistry standards, database tools and data curation at the ch...The needs for chemistry standards, database tools and data curation at the ch...
The needs for chemistry standards, database tools and data curation at the ch...
 
2020 scifinder-n manual (2020) english
2020 scifinder-n manual (2020) english2020 scifinder-n manual (2020) english
2020 scifinder-n manual (2020) english
 

Similar to Automatic extraction of bioactivity data from patents

Unlocking chemical information from tables and legacy articles
Unlocking chemical information from tables and legacy articlesUnlocking chemical information from tables and legacy articles
Unlocking chemical information from tables and legacy articlesNextMove Software
 
II-PIC 2017: Why did I miss that Patent? How value added databases of STN he...
II-PIC 2017: Why did I miss that Patent? How value added databases of STN  he...II-PIC 2017: Why did I miss that Patent? How value added databases of STN  he...
II-PIC 2017: Why did I miss that Patent? How value added databases of STN he...Dr. Haxel Consult
 
Preserving the currency of analytics outcomes over time through selective re-...
Preserving the currency of analytics outcomes over time through selective re-...Preserving the currency of analytics outcomes over time through selective re-...
Preserving the currency of analytics outcomes over time through selective re-...Paolo Missier
 
Building linked data large-scale chemistry platform - challenges, lessons and...
Building linked data large-scale chemistry platform - challenges, lessons and...Building linked data large-scale chemistry platform - challenges, lessons and...
Building linked data large-scale chemistry platform - challenges, lessons and...Valery Tkachenko
 
Making the Old New Again - Modern Technical Provides Access to Historical Che...
Making the Old New Again - Modern Technical Provides Access to Historical Che...Making the Old New Again - Modern Technical Provides Access to Historical Che...
Making the Old New Again - Modern Technical Provides Access to Historical Che...Iconic Translation Machines
 
Franz sterner tdwg 2016 new power balance needed for trustworthy biodiversity...
Franz sterner tdwg 2016 new power balance needed for trustworthy biodiversity...Franz sterner tdwg 2016 new power balance needed for trustworthy biodiversity...
Franz sterner tdwg 2016 new power balance needed for trustworthy biodiversity...taxonbytes
 
Supporting Dataset Descriptions in the Life Sciences
Supporting Dataset Descriptions in the Life SciencesSupporting Dataset Descriptions in the Life Sciences
Supporting Dataset Descriptions in the Life SciencesAlasdair Gray
 
Tackling the difficult areas of chemical entity extraction
Tackling the difficult areas of chemical entity extractionTackling the difficult areas of chemical entity extraction
Tackling the difficult areas of chemical entity extractionNextMove Software
 
ISA-Tab Standards at Metabolomics Society Meeting, Tsuruoka 2014, Japan
ISA-Tab Standards at Metabolomics Society Meeting, Tsuruoka 2014, JapanISA-Tab Standards at Metabolomics Society Meeting, Tsuruoka 2014, Japan
ISA-Tab Standards at Metabolomics Society Meeting, Tsuruoka 2014, JapanPhilippe Rocca-Serra
 
Free online access to experimental and predicted chemical properties through ...
Free online access to experimental and predicted chemical properties through ...Free online access to experimental and predicted chemical properties through ...
Free online access to experimental and predicted chemical properties through ...Kamel Mansouri
 
How to Find Physical Properties of Chemical Substances
How to Find Physical Properties of Chemical SubstancesHow to Find Physical Properties of Chemical Substances
How to Find Physical Properties of Chemical SubstancesBruce Slutsky
 
CAS: Transforming Discovery
CAS: Transforming DiscoveryCAS: Transforming Discovery
CAS: Transforming DiscoveryCAS
 
Semantic Search and Result Presentation with Entity Cards
Semantic Search and Result Presentation with Entity CardsSemantic Search and Result Presentation with Entity Cards
Semantic Search and Result Presentation with Entity CardsFaegheh Hasibi
 
The eCrystals Federation
The eCrystals FederationThe eCrystals Federation
The eCrystals FederationManjulaPatel
 
Open innovation contributions from RSC resulting from the Open Phacts project
Open innovation contributions from RSC resulting from the Open Phacts projectOpen innovation contributions from RSC resulting from the Open Phacts project
Open innovation contributions from RSC resulting from the Open Phacts projectKen Karapetyan
 
NLP Data Cleansing Based on Linguistic Ontology Constraints
NLP Data Cleansing Based on Linguistic Ontology ConstraintsNLP Data Cleansing Based on Linguistic Ontology Constraints
NLP Data Cleansing Based on Linguistic Ontology ConstraintsDimitris Kontokostas
 

Similar to Automatic extraction of bioactivity data from patents (20)

Unlocking chemical information from tables and legacy articles
Unlocking chemical information from tables and legacy articlesUnlocking chemical information from tables and legacy articles
Unlocking chemical information from tables and legacy articles
 
II-PIC 2017: Why did I miss that Patent? How value added databases of STN he...
II-PIC 2017: Why did I miss that Patent? How value added databases of STN  he...II-PIC 2017: Why did I miss that Patent? How value added databases of STN  he...
II-PIC 2017: Why did I miss that Patent? How value added databases of STN he...
 
The EPA Online Prediction Physicochemical Prediction Platform to Support Envi...
The EPA Online Prediction Physicochemical Prediction Platform to Support Envi...The EPA Online Prediction Physicochemical Prediction Platform to Support Envi...
The EPA Online Prediction Physicochemical Prediction Platform to Support Envi...
 
Preserving the currency of analytics outcomes over time through selective re-...
Preserving the currency of analytics outcomes over time through selective re-...Preserving the currency of analytics outcomes over time through selective re-...
Preserving the currency of analytics outcomes over time through selective re-...
 
Building linked data large-scale chemistry platform - challenges, lessons and...
Building linked data large-scale chemistry platform - challenges, lessons and...Building linked data large-scale chemistry platform - challenges, lessons and...
Building linked data large-scale chemistry platform - challenges, lessons and...
 
Making the Old New Again - Modern Technical Provides Access to Historical Che...
Making the Old New Again - Modern Technical Provides Access to Historical Che...Making the Old New Again - Modern Technical Provides Access to Historical Che...
Making the Old New Again - Modern Technical Provides Access to Historical Che...
 
Structure Identification Using High Resolution Mass Spectrometry Data and the...
Structure Identification Using High Resolution Mass Spectrometry Data and the...Structure Identification Using High Resolution Mass Spectrometry Data and the...
Structure Identification Using High Resolution Mass Spectrometry Data and the...
 
Franz sterner tdwg 2016 new power balance needed for trustworthy biodiversity...
Franz sterner tdwg 2016 new power balance needed for trustworthy biodiversity...Franz sterner tdwg 2016 new power balance needed for trustworthy biodiversity...
Franz sterner tdwg 2016 new power balance needed for trustworthy biodiversity...
 
Supporting Dataset Descriptions in the Life Sciences
Supporting Dataset Descriptions in the Life SciencesSupporting Dataset Descriptions in the Life Sciences
Supporting Dataset Descriptions in the Life Sciences
 
Tackling the difficult areas of chemical entity extraction
Tackling the difficult areas of chemical entity extractionTackling the difficult areas of chemical entity extraction
Tackling the difficult areas of chemical entity extraction
 
ISA-Tab Standards at Metabolomics Society Meeting, Tsuruoka 2014, Japan
ISA-Tab Standards at Metabolomics Society Meeting, Tsuruoka 2014, JapanISA-Tab Standards at Metabolomics Society Meeting, Tsuruoka 2014, Japan
ISA-Tab Standards at Metabolomics Society Meeting, Tsuruoka 2014, Japan
 
Free online access to experimental and predicted chemical properties through ...
Free online access to experimental and predicted chemical properties through ...Free online access to experimental and predicted chemical properties through ...
Free online access to experimental and predicted chemical properties through ...
 
How to Find Physical Properties of Chemical Substances
How to Find Physical Properties of Chemical SubstancesHow to Find Physical Properties of Chemical Substances
How to Find Physical Properties of Chemical Substances
 
Kk m5re9v2e3
Kk m5re9v2e3Kk m5re9v2e3
Kk m5re9v2e3
 
CAS: Transforming Discovery
CAS: Transforming DiscoveryCAS: Transforming Discovery
CAS: Transforming Discovery
 
Semantic Search and Result Presentation with Entity Cards
Semantic Search and Result Presentation with Entity CardsSemantic Search and Result Presentation with Entity Cards
Semantic Search and Result Presentation with Entity Cards
 
The eCrystals Federation
The eCrystals FederationThe eCrystals Federation
The eCrystals Federation
 
Open innovation contributions from RSC resulting from the Open Phacts project
Open innovation contributions from RSC resulting from the Open Phacts projectOpen innovation contributions from RSC resulting from the Open Phacts project
Open innovation contributions from RSC resulting from the Open Phacts project
 
Open innovation contributions from RSC resulting from the Open Phacts project
Open innovation contributions from RSC resulting from the Open Phacts projectOpen innovation contributions from RSC resulting from the Open Phacts project
Open innovation contributions from RSC resulting from the Open Phacts project
 
NLP Data Cleansing Based on Linguistic Ontology Constraints
NLP Data Cleansing Based on Linguistic Ontology ConstraintsNLP Data Cleansing Based on Linguistic Ontology Constraints
NLP Data Cleansing Based on Linguistic Ontology Constraints
 

More from NextMove Software

Building a bridge between human-readable and machine-readable representations...
Building a bridge between human-readable and machine-readable representations...Building a bridge between human-readable and machine-readable representations...
Building a bridge between human-readable and machine-readable representations...NextMove Software
 
CINF 35: Structure searching for patent information: The need for speed
CINF 35: Structure searching for patent information: The need for speedCINF 35: Structure searching for patent information: The need for speed
CINF 35: Structure searching for patent information: The need for speedNextMove Software
 
A de facto standard or a free-for-all? A benchmark for reading SMILES
A de facto standard or a free-for-all? A benchmark for reading SMILESA de facto standard or a free-for-all? A benchmark for reading SMILES
A de facto standard or a free-for-all? A benchmark for reading SMILESNextMove Software
 
Recent Advances in Chemical & Biological Search Systems: Evolution vs Revolution
Recent Advances in Chemical & Biological Search Systems: Evolution vs RevolutionRecent Advances in Chemical & Biological Search Systems: Evolution vs Revolution
Recent Advances in Chemical & Biological Search Systems: Evolution vs RevolutionNextMove Software
 
Can we agree on the structure represented by a SMILES string? A benchmark dat...
Can we agree on the structure represented by a SMILES string? A benchmark dat...Can we agree on the structure represented by a SMILES string? A benchmark dat...
Can we agree on the structure represented by a SMILES string? A benchmark dat...NextMove Software
 
Comparing Cahn-Ingold-Prelog Rule Implementations
Comparing Cahn-Ingold-Prelog Rule ImplementationsComparing Cahn-Ingold-Prelog Rule Implementations
Comparing Cahn-Ingold-Prelog Rule ImplementationsNextMove Software
 
Eugene Garfield: the father of chemical text mining and artificial intelligen...
Eugene Garfield: the father of chemical text mining and artificial intelligen...Eugene Garfield: the father of chemical text mining and artificial intelligen...
Eugene Garfield: the father of chemical text mining and artificial intelligen...NextMove Software
 
Recent improvements to the RDKit
Recent improvements to the RDKitRecent improvements to the RDKit
Recent improvements to the RDKitNextMove Software
 
Digital Chemical Representations
Digital Chemical RepresentationsDigital Chemical Representations
Digital Chemical RepresentationsNextMove Software
 
PubChem as a Biologics Database
PubChem as a Biologics DatabasePubChem as a Biologics Database
PubChem as a Biologics DatabaseNextMove Software
 
Building on Sand: Standard InChIs on non-standard molfiles
Building on Sand: Standard InChIs on non-standard molfilesBuilding on Sand: Standard InChIs on non-standard molfiles
Building on Sand: Standard InChIs on non-standard molfilesNextMove Software
 
Chemical Structure Representation of Inorganic Salts and Mixtures of Gases: A...
Chemical Structure Representation of Inorganic Salts and Mixtures of Gases: A...Chemical Structure Representation of Inorganic Salts and Mixtures of Gases: A...
Chemical Structure Representation of Inorganic Salts and Mixtures of Gases: A...NextMove Software
 
Challenges in Chemical Information Exchange
Challenges in Chemical Information ExchangeChallenges in Chemical Information Exchange
Challenges in Chemical Information ExchangeNextMove Software
 
RDKit: Six Not-So-Easy Pieces [RDKit UGM 2016]
RDKit: Six Not-So-Easy Pieces [RDKit UGM 2016]RDKit: Six Not-So-Easy Pieces [RDKit UGM 2016]
RDKit: Six Not-So-Easy Pieces [RDKit UGM 2016]NextMove Software
 
RDKit UGM 2016: Higher Quality Chemical Depictions
RDKit UGM 2016: Higher Quality Chemical DepictionsRDKit UGM 2016: Higher Quality Chemical Depictions
RDKit UGM 2016: Higher Quality Chemical DepictionsNextMove Software
 
GHS and NFPA diamonds: where they come from and how they can be useful
GHS and NFPA diamonds: where they come from and how they can be usefulGHS and NFPA diamonds: where they come from and how they can be useful
GHS and NFPA diamonds: where they come from and how they can be usefulNextMove Software
 
Line notations for nucleic acids (both natural and therapeutic)
Line notations for nucleic acids (both natural and therapeutic)Line notations for nucleic acids (both natural and therapeutic)
Line notations for nucleic acids (both natural and therapeutic)NextMove Software
 
Which is the best fingerprint for medicinal chemistry?
Which is the best fingerprint for medicinal chemistry?Which is the best fingerprint for medicinal chemistry?
Which is the best fingerprint for medicinal chemistry?NextMove Software
 
CINF 29: Visualization and manipulation of Matched Molecular Series for decis...
CINF 29: Visualization and manipulation of Matched Molecular Series for decis...CINF 29: Visualization and manipulation of Matched Molecular Series for decis...
CINF 29: Visualization and manipulation of Matched Molecular Series for decis...NextMove Software
 

Automatic extraction of bioactivity data from patents

  • 1. 253rd ACS National Meeting, San Francisco CA, USA, 4th April 2017. Automatic extraction of bioactivity data from patents. Daniel Lowe*, Stefan Senger† and Roger Sayle*. *NextMove Software, Cambridge, UK; †GlaxoSmithKline, Stevenage, UK
  • 2. Example use cases
      • “A patent has recently come out on a topic of interest, can the key compounds be extracted with their activity data?”
      • “Which compounds have been found to be active against this target?”
  • 3. US patent data freely available: patents.reedtech.com (or from the USPTO: bulkdata.uspto.gov)
  • 4. What are these compounds? [Figure; legend: colour key = text-mined]
  • 5. Understanding table semantics
      • [Figure panels: SureChEMBL; Google Patents]
      • After text-mining for chemical entities: green = substituent, purple = molecule
      • Source: US20170050925A9
  • 6. [Figure panels: SureChEMBL; Google Patents; Patent PDF; PatFetch (NextMove Software)] Source: US20010016661A1
  • 7. Understanding table semantics
      • [Figure labels: 5 columns; 6 columns]
      • Columns are merged so that the header and body have the same number of columns
  • 8. Getting the compound structures
      • Chemical names
      • Chemical sketches
      • R-group tables
      • Compound identifier associated with any of the above
  • 9. Chemical names
      • OPSIN (Open Parser for Systematic IUPAC Nomenclature)
      • Dictionaries (ChEMBL/PubChem/NextMove)
      • Chemical line formula parsing, especially useful for peptide names and R-group definitions
  • 10. Chemical sketches
      • Utilize the ChemDraw sketches provided by the USPTO
      • Detection and handling of repeat brackets and positional variation
      • Fixing obvious errors, e.g. an undervalent nitrogen next to an H atom with no bond between them
      • Labels reinterpreted
  • 11. Formula interpretation [Table: columns Input, ChemDraw 15, This work; inputs HATU, C4F9, H3PO4, CON(cHex)2, III-2; two cells read “No result”; structure drawings omitted]
  • 12. R-group tables
  • 13. Resolving identifiers
      • Need to “name space” identifiers
          – “Compound 1”, “Reference compound 1”, “Example 1”
          – But “Compound 1” = “cmpd 1” = “cpd. #1”
      • Where a column is just called “#”, is it a compound number, an example number or just a table row number?
      • An identifier may be defined multiple times, e.g. as a sketch and as a chemical name
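The identifier equivalences on slide 13 can be illustrated with a small normalizer. This is a minimal sketch, not the authors' implementation: the function name `normalize_identifier`, the synonym table and the regular expression are all assumptions made for illustration.

```python
import re

# Hypothetical synonym table: surface form -> canonical name space.
_NAMESPACES = {
    "compound": "compound", "cmpd": "compound", "cpd": "compound",
    "example": "example", "ex": "example",
}

# Optional "reference" prefix, a kind word, optional "." / "#" / "no.", then the label.
_ID_PATTERN = re.compile(
    r"^\s*(?P<ref>reference\s+|ref\.?\s+)?"
    r"(?P<kind>[A-Za-z]+)\.?\s*(?:#|no\.?\s*)?\s*"
    r"(?P<label>\d+[A-Za-z]?)\s*$",
    re.IGNORECASE,
)

def normalize_identifier(text):
    """Return (namespace, label) for an identifier, or None if it is not recognised."""
    match = _ID_PATTERN.match(text)
    if match is None:
        return None
    kind = _NAMESPACES.get(match.group("kind").lower())
    if kind is None:
        return None
    if match.group("ref"):
        # Keep reference compounds in a separate name space from ordinary compounds.
        kind = "reference " + kind
    return (kind, match.group("label").lower())

print(normalize_identifier("cpd. #1"))
print(normalize_identifier("Reference compound 1"))
```

A bare “#” header returns `None` here, since nothing in the string itself says whether it numbers compounds, examples or table rows.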
  • 14. Resolving Identifiers (text-mining)
  • 15. Resolving Identifiers (Sketches)
  • 16. Resolving Identifiers (Tables)
  • 17. Extracting compound-activity relationships
  • 18. Excel table export
  • 19. Extracting compound-activity relationships: What is the target?
  • 20. Assay identification
      • Naïve Bayes classifier trained on assay descriptions identified by BindingDB curators
      • 10-fold cross-validation: 98.9% recall, 94.7% precision
      • A paragraph is associated with the next table, or with the table mentioned in the paragraph
      • Target/organism detected
      • Care taken to avoid common irrelevant organisms/proteins, e.g. bovine serum albumin
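A multinomial naïve Bayes text classifier of the kind named on slide 20 can be sketched in a few lines. The training sentences below are invented for illustration (the talk used assay descriptions identified by BindingDB curators), and the class name, tokenizer and smoothing choice are assumptions; the slides do not say which features or implementation were used.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lower-case the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = {label: Counter() for label in self.class_counts}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denominator = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            for token in tokenize(text):
                if token in self.vocab:  # ignore words never seen in training
                    score += math.log((counts[token] + 1) / denominator)
            if score > best_score:
                best_label, best_score = label, score
        return best_label

# Invented toy paragraphs, NOT the BindingDB-derived training set.
TEXTS = [
    "Compounds were tested for inhibition of kinase activity and IC50 values were determined",
    "Binding affinity was measured in a radioligand displacement assay using membranes",
    "The mixture was stirred at room temperature and concentrated under reduced pressure",
    "The residue was purified by column chromatography to give the title compound",
]
LABELS = ["assay", "assay", "other", "other"]

classifier = NaiveBayes().fit(TEXTS, LABELS)
print(classifier.predict("IC50 values were measured in an enzyme inhibition assay"))
```

With four training paragraphs this only demonstrates the mechanics; the recall and precision figures on the slide come from the authors' own model and data.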
  • 21. Results
  • 22. Results from US patent applications (2001 to March 2017) [Figure legend: red = bioactivity]
  • 23. Activities with associated structures per year [Chart: activity-structure relationships extracted (y-axis, 0 to 600,000) by publication year (x-axis)]
  • 24. Comparison with BindingDB
      • Activity data from ~1500 US patent grants (2013-2016), manually extracted over the course of 3 years
      • ~150,000 activities
      • Comparison done on the subset that was made available in ChEMBL 22_1 (98,898 activity values, 1012 patents)
      • As some assay results are missed by the automatic extraction, and some are considered out of scope by BindingDB, it is difficult to distinguish differences in coverage from genuine disagreements
  • 25. Comparison with BindingDB
      • Values normalized into nM
          – 1000s of instances of measurements in nanometers!
      • Midpoint of ranges taken
      • Structures compared by StdInChI
      • Target name normalized to ChEMBL target ID (organism specific), using either:
          – ChEMBL target synonyms
          – Normalization to an HGNC symbol, then checking whether the HGNC symbol is a ChEMBL target synonym
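The value normalization on slide 25 can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function name `to_nanomolar`, the accepted range separators and the unit table are hypothetical, and treating a literal "nm" as a mistyped "nM" is one reading of the slide's remark about nanometers.

```python
import re

# Conversion factors to nM. The "nm" entry is an ASSUMPTION: it treats
# "nanometers" in an activity column as a mistyped "nM".
_TO_NM = {
    "pM": 1e-3,
    "nM": 1.0,
    "nm": 1.0,
    "uM": 1e3,
    "\u00b5M": 1e3,  # micro sign
    "\u03bcM": 1e3,  # Greek small letter mu
    "mM": 1e6,
    "M": 1e9,
}

# A number, optionally followed by a range separator and a second number, then a unit.
_VALUE = re.compile(
    r"^\s*(?P<low>\d+(?:\.\d+)?)\s*"
    r"(?:(?:-|\u2013|to)\s*(?P<high>\d+(?:\.\d+)?)\s*)?"
    r"(?P<unit>\S+)\s*$"
)

def to_nanomolar(text):
    """Return the value in nM (midpoint for a range), or None if unparseable."""
    match = _VALUE.match(text)
    if match is None:
        return None
    factor = _TO_NM.get(match.group("unit"))
    if factor is None:
        return None
    low = float(match.group("low"))
    high = float(match.group("high")) if match.group("high") else low
    return (low + high) / 2 * factor

print(to_nanomolar("10-100 nM"))
print(to_nanomolar("1 to 3 uM"))
```

The unit lookup is deliberately case-sensitive, since "mM" and "M" differ by a factor of a thousand and lower-casing would blur that distinction.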
  • 26. Comparison
      • Expected values found: 75%
      • Expected structures found: 65%
      • Expected value + structure found: 53%
      • Expected value + structure + target: 18%
  • 27. Unclear structure assignment [Figure]
  • 28. Stereochemistry and salts [Figure panels: Patent; BindingDB; This work]
  • 29. Long tail of difficult cases
      • What does this superscript term mean?
      • What are the units?
  • 30. Targets of patent data compared to journal data [Charts: common target classes as % per year, 2002 to 2016, for ChEMBL 22_1 (excluding BindingDB) and for US patent applications; series: Kinase, GPCR (Family A), Protease, Nuclear receptor, Voltage-gated ion channel, Electrochemical transporter, Oxidoreductase]
  • 31. Upcoming target classes [Chart: percentage of documents with activity values against target class, 2002 to 2016; series: Epigenetic writer (Patents), Epigenetic reader (Patents), Epigenetic writer (ChEMBL ex BindingDB), Epigenetic reader (ChEMBL ex BindingDB)]
  • 32. Future work
      • Support for more complex R-group tables
      • Improve recognition and resolution of protein target names
      • Support for activities specified in text, e.g. “Example 1 has an IC50 of 12 nM measured at rat EP4”
      • Resolution of symbols for activity ranges, e.g. “A” indicates an IC50 value of less than 100 nM
      • Improve assay metadata extraction, cf. BioAssay Express
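The two text patterns quoted under future work can be matched with regular expressions, as a rough sketch of what such support might look like. Everything here is hypothetical: the function names, the patterns and the output shapes are assumptions, they cover only phrasings close to the two examples on the slide, and the target ("rat EP4") is left unparsed.

```python
import re

# Legend entries such as: “A” indicates an IC50 value of less than 100 nM
_LEGEND = re.compile(
    r'["“](?P<symbol>[A-Za-z+]+)["”]\s+indicates\s+an?\s+(?P<measure>\w+)\s+value\s+of\s+'
    r'(?P<relation>less than|greater than)\s+(?P<value>\d+(?:\.\d+)?)\s*(?P<unit>[A-Za-zµμ]+)'
)

# Sentences such as: Example 1 has an IC50 of 12 nM ...
_SENTENCE = re.compile(
    r'(?P<identifier>(?:Example|Compound)\s+\d+[A-Za-z]?)\s+has\s+an?\s+'
    r'(?P<measure>\w+)\s+of\s+(?P<value>\d+(?:\.\d+)?)\s*(?P<unit>[A-Za-zµμ]+)'
)

_RELATIONS = {"less than": "<", "greater than": ">"}

def parse_legend(text):
    """Map each range symbol to (measure, relation, value, unit)."""
    legend = {}
    for match in _LEGEND.finditer(text):
        legend[match.group("symbol")] = (
            match.group("measure"),
            _RELATIONS[match.group("relation")],
            float(match.group("value")),
            match.group("unit"),
        )
    return legend

def parse_activity_sentence(text):
    """Return (identifier, measure, value, unit) for the first match, or None."""
    match = _SENTENCE.search(text)
    if match is None:
        return None
    return (
        match.group("identifier"),
        match.group("measure"),
        float(match.group("value")),
        match.group("unit"),
    )

print(parse_legend("“A” indicates an IC50 value of less than 100 nM"))
print(parse_activity_sentence("Example 1 has an IC50 of 12 nM measured at rat EP4"))
```

Once a legend is parsed, a table cell containing only “A” could be mapped to the qualified value "< 100 nM" rather than discarded as non-numeric.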
  • 33. Disambiguation of conflicting structure descriptions [Figure panels: image from original filing; redrawn by US patent office in ChemDraw; intended structure from chemical name]
  • 34. Conclusions
      • Processing all US patents from 2001 to present can be done in less than a day on a desktop PC
      • Technique applicable to chemical properties other than activity values
      • Compound number <-> structure relationships useful for key compound identification
      • For the majority of patents, extracting structure-activity relationships can be significantly expedited
  • 35. Acknowledgements
      • Noel O'Boyle
      • John Mayfield
      • Funding provided by:
  • 36. Thank you for your time!
      • http://nextmovesoftware.com
      • http://nextmovesoftware.com/blog
      • daniel@nextmovesoftware.com