256th ACS National Meeting, Boston, Aug 2018
Structure searching for patent information: The need for speed
John Mayfield, Noel O’Boyle, and Roger Sayle

NextMove Software
Cambridge, UK
Data Search Algorithms
Daniel M. Lowe. Extraction of chemical structures and reactions from the literature. Ph.D. Thesis,
University of Cambridge, 2012
To 7-chloro-4-oxo-4,5-dihydrofuro[2,3-d]pyridazine-2-carboxylic acid (Peakdale) (220 mg, 1.025 mmol) and (3,4-
dimethoxyphenyl)boronic acid (187 mg, 1.025 mmol) in 1,4-dioxane (3 mL) and water (1.5 mL) was
added sodium carbonate (435 mg, 4.10 mmol) and tetrakis(triphenylphosphine)palladium(0) (110 mg, 0.095
mmol). The reaction was heated in the microwave at 80° C. for 2 hours and at 100° C. for a further 2 hours.
The solvent was removed and the residue was suspended in DMSO, filtered and purified by MDAP. Appropriate
fractions were combined and the solvent removed to give 7-(3,4-dimethoxyphenyl)-4-oxo-4,5-dihydrofuro[2,3-
d]pyridazine-2-carboxylic acid (25 mg, 7%) as a yellow solid.
[0517]
US 2016/16966 A1
Product Properties
7-(3,4-dimethoxyphenyl)-4-oxo-4,5-dihydrofuro[2,3-d]pyridazine-2-carboxylic acid: 25 mg, 7% yield, yellow solid
Reactant Properties
7-chloro-4-oxo-4,5-dihydrofuro[2,3-d]pyridazine-2-carboxylic acid: 220 mg, 1.025 mmol
(3,4-dimethoxyphenyl)boronic acid: 187 mg, 1.025 mmol
Agent Properties
1,4-dioxane: 3 mL
water: 1.5 mL
sodium carbonate: 435 mg, 4.10 mmol
tetrakis(triphenylphosphine)palladium(0): 110 mg, 0.095 mmol
DMSO
US 2016/16966 A1
252nd ACS National Meeting, Philadelphia PA, USA 25th August 2016
SKETCH PROCESSING
US 2004/101442 C00025
Original Sketch
Default Interpretation (USPTO molfile)
Our Interpretation
Re-interpretation of ChemDraw sketches

1. Correct systematic errors

2. Extract extra semantics (structure variation, reaction schemes)

3. Categorise output (is this something we can’t interpret)
John May, et al. Sketchy Sketches: Hiding Chemistry in Plain Sight. Seventh Joint Sheffield Conference on Cheminformatics. 2016
Example 26, US 09718816 B2
Reaction SCHEME SKETCHES
[Figure: reaction scheme sketch with parts labelled Step 1, Step 2, Step 3, Step 4, etc.]
SKETCH CATEGORISATION
A category is assigned to each extracted sketch:
Molecule/Specific
Molecule/Generic
Reaction/Specific
Reaction/Generic
NoConnectionTable
Example sketch: C000001.CDX from US 7092578 B2, Table 1, “Signaling adaptive-quantization matrices in JPEG using end-of-block codes”
mixtures and formulations
(cocktails)
US 2001/2252 A1
“TOOTH WHITENING PREPARATIONS”
R group Tables
US 2016/0002208 A1
Chemical name translation
6-aminopyrimidine-2,4,5-triol
Chinese (Hanzi used for each morpheme): 6-氨基嘧啶-2,4,5-三醇 (ammonia radical, pyrimidine, three, alcohol)
Japanese (Phonetic translation to Katakana): 6-アミノピリミジン-2,4,5-トリオール (amino, pyrimidine, tri, ol)
Korean (Phonetic translation to Hangul): 6-아미노피리미딘-2,4,5-트리올 (amino, pyrimidine, tri, ol)
EXTRACTED CHEMICAL DATA GROWTH
[Chart: cumulative number of records (0 to 20M) by year, 2000 to 2018. Series: USPTO Exemplified Compounds, USPTO Reactions, EPO Reactions, USPTO Mixtures. Annotated totals: ~22M, ~6M, ~1M.]
Rule-based text-mining SPEED
Chih-Hsuan Wei et al. Assessing the state of the art in biomedical relation extraction: overview of the BioCreative V
chemical-disease relation (CDR) task. Database (Oxford). 2016; 2016: baw032. PMC4799720
BioCreAtIvE V challenge
evaluating text-mining and
extraction systems.

Web service response time to
annotate an abstract evaluated for
CDR task.

Efficient rule-based text-mining provides provenance for annotations and can mine the entire back-archive of US patents in ~24 hours on a single machine.
Data Search Algorithms
Arthor Demo Video
Intelligent query box
Systematic Name, Date Range, Trivial Name, Yield Range, Affiliation, Reaction SMARTS, Disease Target, Document, Line Formula, SMILES, InChI, Author, Protein Target, Collection, Reaction Type (NameRxn), SMARTS, Source
…and logical combinations thereof
Pistachio: Reactions
make/break REACTION SEARCH
Find: “7H-purine substructure product”
Find: “Synthesis of 7H-purine”
Requires fast substructure search, since the result is computed as the relative complement (set difference) of two hit sets.
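The set arithmetic behind a make/break search can be sketched as follows. This is a minimal illustration, not the Pistachio implementation: it assumes (hypothetically) that each substructure search returns a `BitSet` of reaction indices, one for hits among products and one for hits among reactants, and the class and method names are made up for the example.

```java
import java.util.BitSet;

final class MakeBreakSearch {
    /**
     * Reactions that "make" the substructure: it matches a product
     * but does not match any reactant. Inputs are hit sets indexed by
     * reaction number (an assumed layout) and are left unmodified.
     */
    static BitSet made(BitSet productHits, BitSet reactantHits) {
        BitSet result = (BitSet) productHits.clone();
        result.andNot(reactantHits); // relative complement: productHits \ reactantHits
        return result;
    }

    /** Reactions that "break" the substructure: present in a reactant, absent from the products. */
    static BitSet broken(BitSet productHits, BitSet reactantHits) {
        return made(reactantHits, productHits);
    }
}
```

Both hit sets come from a full substructure search over the reaction collection, which is why the query is only interactive when that search is fast.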
Cocktails: Mixtures and formulations
Data Search Algorithms
ARTHOR - MOTIVATION
History in optimising search:

– R. Sayle, “1st-class SMARTS patterns”, Daylight CIS, European UGM, EuroMUG 1997, Verona, Italy
– R. Sayle, “Improved SMILES Substructure Searching”, Daylight CIS, European UGM, EuroMUG 2000, Cambridge, UK.
– R. Sayle, “Efficient Matching of Chemical Subgraphs”, 9th ICCS, Noordwijkerhout, The Netherlands, 9th June 2011.
“A substructure search of indole against eMolecules (~7M at the time) takes 17 seconds” - 2014
Benchmark of 3.4K queries on 7M compounds from eMolecules
– John May and Roger Sayle, “Substructure Search Face-Off”, CCNM, Cambridge, May 2015
SUBSEARCH PERFORMANCE
Updated from: John May and Roger Sayle, Substructure Search Face-Off, Presented at CCNM, Cambridge, May 2015
https://www.slideshare.net/NextMoveSoftware/substructure-search-faceoff
[Chart: number of queries completed (n, up to 3341) against time (ms, log scale from 1e+00 to 1e+07, with markers at 1s, 10s, 1m, 5m and 1h) and a 90% line.]
Tools compared: BioVia Direct, EPAM Bingo NoSQL, EPAM Bingo Cart, ChemAxon JCART, RDKit Cart, OpenChemLib, OB FastSearch, Sachem, Arthor.
Total times labelled by tool: Sachem 16m50s, Arthor (Brute force) 27m17s, Arthor 46s, Arthor (8 threads) 12s.
Other total times shown: 50m35s, 1h2m59s, 2h9m11s, 2h44m47s, 5h13m19s, 5h53m40s, 2d11h42m14s.
Substructure Optimisations
Ahead-of-time (AOT)
• Chemical records converted to pointer-free memory optimised
data structure (~166B per molecule)

• Path-based fingerprint computed and stored in inverted index

• Sensible ordering of results

Just-in-time (JIT)
• SMARTS traversal based on frequency statistics

• Atom/Bond expressions compiled and optimised using
boolean algebra

• Fingerprint screening bit selection
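As an illustration of screening with an inverted index, the sketch below keeps one posting list per fingerprint bit and intersects the lists for a chosen subset of query bits. It is a simplified stand-in, not Arthor's data structure: the class name, the use of `BitSet` posting lists, and the way bits are selected are all assumptions made for the example.

```java
import java.util.BitSet;
import java.util.List;

final class InvertedIndexScreen {
    private final BitSet[] postings; // postings[b] = molecules whose fingerprint has bit b set
    private final int numMolecules;

    /** Builds the index ahead of time; every fingerprint bit is assumed to be below numBits. */
    InvertedIndexScreen(List<BitSet> fingerprints, int numBits) {
        numMolecules = fingerprints.size();
        postings = new BitSet[numBits];
        for (int b = 0; b < numBits; b++) {
            postings[b] = new BitSet(numMolecules);
        }
        for (int m = 0; m < numMolecules; m++) {
            BitSet fp = fingerprints.get(m);
            for (int b = fp.nextSetBit(0); b >= 0; b = fp.nextSetBit(b + 1)) {
                postings[b].set(m);
            }
        }
    }

    /** Molecules containing every selected query bit; only these need atom-by-atom matching. */
    BitSet candidates(int[] selectedQueryBits) {
        BitSet result = new BitSet(numMolecules);
        result.set(0, numMolecules);       // start from "all molecules"
        for (int b : selectedQueryBits) {
            result.and(postings[b]);       // each bit can only shrink the candidate set
        }
        return result;
    }
}
```

Because each additional bit costs one pass over a posting list, choosing a few rare query bits at query time tends to remove most non-matches cheaply, which is one plausible reading of "fingerprint screening bit selection".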
AOT: Storage order
Order by those most similar to the query and favour plain molecules.
CID 60795
CID 11669779
CID 11576259
CID 37888405

Tanimoto can’t be calculated ahead of time, but can be approximated.
Generate a hexadecimal key based on size and other properties
favouring “plain” molecules and order by this.
000e000e01000a0004000065000000 CCC(C(=O)O)Oc1ccc(cc1)Cl CHEMBL23477
Key fields, in order: AtomCount, BondCount, PartCount, CarbonCount, CommonHeteroCount, AtomicNumberSum, RadicalCount, ChargeCount, IsotopeCount
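A sort key of this shape can be sketched with fixed-width hexadecimal fields, so that plain string ordering puts smaller and "plainer" molecules first. The field widths below are inferred from the CHEMBL23477 example (14 heavy atoms, 14 bonds, 1 part, 10 carbons, 4 common heteroatoms if the three oxygens and the chlorine are counted, atomic number sum 101 = 0x65) and are an assumption, as are the class and method names.

```java
final class StorageKey {
    /**
     * Builds a fixed-width hex key. Widths (4,4,2,4,4,6,2,2,2 hex digits) are
     * inferred from the example key on the slide, not from a published format.
     * Counts are assumed to be non-negative and to fit their field.
     */
    static String key(int atoms, int bonds, int parts, int carbons, int commonHetero,
                      int atomicNumberSum, int radicals, int charges, int isotopes) {
        return String.format("%04x%04x%02x%04x%04x%06x%02x%02x%02x",
                atoms, bonds, parts, carbons, commonHetero,
                atomicNumberSum, radicals, charges, isotopes);
    }
}
```

For example, `StorageKey.key(14, 14, 1, 10, 4, 101, 0, 0, 0)` reproduces the key shown for CHEMBL23477, and sorting records by this string orders them by atom count first, then by the later fields.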
JIT: Pattern Traversal
The same query can be traversed (and matched) in different orders.
How much slower?

Best BrCCCC
3.4x CC(Br)CC
5.6x CCCCBr

Best n1ccc2c1cccc2
1.4x c12c(ccn1)cccc2
2.3x c12ccccc1ccn2
3.3x c1cnc2ccccc12
3.3x c12ccnc1cccc2
4.8x c1c2ccccc2nc1

Before the query is matched it is rearranged to the “best” traversal order based on frequency statistics.
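One simple way to derive such an order is a greedy walk that starts at the rarest query atom and always expands to the rarest atom adjacent to those already visited. The sketch below is an illustration of that idea only: the frequency table is made up, the query is given as element symbols plus bonds rather than parsed SMARTS, and the slides do not say that Arthor uses exactly this heuristic.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

final class TraversalOrder {
    // Hypothetical fraction of database atoms per element; illustrative values only.
    static final Map<String, Double> FREQ =
            Map.of("C", 0.75, "N", 0.10, "O", 0.12, "Br", 0.005);

    /** Returns atom indices in visiting order; assumes the query graph is connected. */
    static List<Integer> order(String[] elements, int[][] bonds) {
        int n = elements.length;
        List<List<Integer>> adj = new ArrayList<>();
        for (int i = 0; i < n; i++) adj.add(new ArrayList<>());
        for (int[] b : bonds) {
            adj.get(b[0]).add(b[1]);
            adj.get(b[1]).add(b[0]);
        }
        // Rarest element first; ties broken by atom index for a deterministic result.
        Comparator<Integer> byRarity = Comparator
                .<Integer>comparingDouble(i -> FREQ.getOrDefault(elements[i], 1.0))
                .thenComparingInt(i -> i);

        int start = 0;
        for (int i = 1; i < n; i++) {
            if (byRarity.compare(i, start) < 0) start = i;
        }
        boolean[] visited = new boolean[n];
        List<Integer> result = new ArrayList<>();
        PriorityQueue<Integer> frontier = new PriorityQueue<>(byRarity);
        frontier.add(start);
        while (!frontier.isEmpty()) {
            int atom = frontier.poll();
            if (visited[atom]) continue;   // may have been queued more than once
            visited[atom] = true;
            result.add(atom);
            for (int nb : adj.get(atom)) {
                if (!visited[nb]) frontier.add(nb);
            }
        }
        return result;
    }
}
```

With these frequencies, both `BrCCCC` and `CCCCBr` are visited starting from the bromine, which matches the slide's observation that the bromine-first form is the fastest: a rare atom fails quickly on most database molecules.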
SIMILARITY Optimisations
Ahead-of-time (AOT)
• Store binary fingerprints in buckets based on the cardinality of
the fingerprint as the number of set bits: pop(ulation) count

• Stripe (or “transpose”) fingerprints reducing the memory reads
for the JIT code

Just-in-time (JIT)
• Generate machine code to calculate the Tanimoto
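One common reason to bucket fingerprints by population count is that a Tanimoto threshold bounds which buckets can contain hits: with query popcount a and database popcount b, the Tanimoto is at most min(a, b) / max(a, b). The sketch below computes the resulting bucket range. It illustrates that standard bound under the assumption that it is how the buckets are used; the slides do not spell this out, and the names are hypothetical.

```java
final class PopcountBounds {
    /** Smallest database popcount that can still reach the threshold (0 < threshold <= 1). */
    static int minPop(int queryPop, double threshold) {
        return (int) Math.ceil(queryPop * threshold);
    }

    /** Largest database popcount that can still reach the threshold (0 < threshold <= 1). */
    static int maxPop(int queryPop, double threshold) {
        return (int) Math.floor(queryPop / threshold);
    }
}
```

For a query with 50 set bits and a threshold of 0.5, only buckets with 25 to 100 set bits need to be scanned. A production version would need care with floating-point rounding at bucket boundaries.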
TANIMOTO CODE GEN
Tanimoto Calculation (Java, 64-bit words)
double similarity(long[] q_fp, long[] db_fp) {
    int intersect = 0;
    int union = 0;
    for (int i = 0; i < q_fp.length; i++) {
        intersect += Long.bitCount(q_fp[i] & db_fp[i]);
        union += Long.bitCount(q_fp[i] | db_fp[i]);
    }
    return intersect / (double) union;
}
Equivalent Tanimoto Calculating Union from Intersect
double similarity(long[] q_fp, long[] db_fp, int q_pop, int db_pop) {
    int intersect = 0;
    for (int i = 0; i < q_fp.length; i++) {
        intersect += Long.bitCount(q_fp[i] & db_fp[i]);
    }
    return intersect / (double) (q_pop + db_pop - intersect);
}
Intersect Function
int intersect(long[] q_fp, long[] db_fp) {
    int pop = 0;
    for (int i = 0; i < q_fp.length; i++) {
        pop += Long.bitCount(q_fp[i] & db_fp[i]);
    }
    return pop;
}
Intersect Function Unrolled (16 words of 64 bits)
int intersect(long[] q_fp, long[] db_fp) {
    int pop = 0;
    pop += Long.bitCount(q_fp[0] & db_fp[0]);
    pop += Long.bitCount(q_fp[1] & db_fp[1]);
    pop += Long.bitCount(q_fp[2] & db_fp[2]);
    pop += Long.bitCount(q_fp[3] & db_fp[3]);
    pop += Long.bitCount(q_fp[4] & db_fp[4]);
    pop += Long.bitCount(q_fp[5] & db_fp[5]);
    pop += Long.bitCount(q_fp[6] & db_fp[6]);
    pop += Long.bitCount(q_fp[7] & db_fp[7]);
    pop += Long.bitCount(q_fp[8] & db_fp[8]);
    pop += Long.bitCount(q_fp[9] & db_fp[9]);
    pop += Long.bitCount(q_fp[10] & db_fp[10]);
    pop += Long.bitCount(q_fp[11] & db_fp[11]);
    pop += Long.bitCount(q_fp[12] & db_fp[12]);
    pop += Long.bitCount(q_fp[13] & db_fp[13]);
    pop += Long.bitCount(q_fp[14] & db_fp[14]);
    pop += Long.bitCount(q_fp[15] & db_fp[15]);
    return pop;
}
For a given query (e.g. CHEMBL1906145) we can hard code the fingerprint.
int intersectChembl1906145(long[] db_fp) {
    int pop = 0;
    pop += Long.bitCount(0x0000000000000004L & db_fp[0]);
    pop += Long.bitCount(0x0000000000000000L & db_fp[1]);
    pop += Long.bitCount(0x0000000000400020L & db_fp[2]);
    pop += Long.bitCount(0x0010000008000002L & db_fp[3]);
    pop += Long.bitCount(0x0160000000000200L & db_fp[4]);
    pop += Long.bitCount(0x00000800000a1000L & db_fp[5]);
    pop += Long.bitCount(0x1000000001580000L & db_fp[6]);
    pop += Long.bitCount(0x0800002000000000L & db_fp[7]);
    pop += Long.bitCount(0x0000000000000841L & db_fp[8]);
    pop += Long.bitCount(0x0000000006000100L & db_fp[9]);
    pop += Long.bitCount(0x0000280002002100L & db_fp[10]);
    pop += Long.bitCount(0x0100000000048000L & db_fp[11]);
    pop += Long.bitCount(0x0000002088000000L & db_fp[12]);
    pop += Long.bitCount(0x0000008000400000L & db_fp[13]);
    pop += Long.bitCount(0x0008000000000100L & db_fp[14]);
    pop += Long.bitCount(0x0000000000010180L & db_fp[15]);
    return pop;
}
bitCount on empty and singleton words (for CHEMBL1906145) can be eliminated.
int intersectChembl1906145(long[] db_fp) {
    int pop = 0;
    pop += (db_fp[0] >> 2) & 0x1;
    // pop += Long.bitCount(0x0000000000000000L & db_fp[1]);
    pop += Long.bitCount(0x0000000000400020L & db_fp[2]);
    pop += Long.bitCount(0x0010000008000002L & db_fp[3]);
    pop += Long.bitCount(0x0160000000000200L & db_fp[4]);
    pop += Long.bitCount(0x00000800000a1000L & db_fp[5]);
    pop += Long.bitCount(0x1000000001580000L & db_fp[6]);
    pop += Long.bitCount(0x0800002000000000L & db_fp[7]);
    pop += Long.bitCount(0x0000000000000841L & db_fp[8]);
    pop += Long.bitCount(0x0000000006000100L & db_fp[9]);
    pop += Long.bitCount(0x0000280002002100L & db_fp[10]);
    pop += Long.bitCount(0x0100000000048000L & db_fp[11]);
    pop += Long.bitCount(0x0000002088000000L & db_fp[12]);
    pop += Long.bitCount(0x0000008000400000L & db_fp[13]);
    pop += Long.bitCount(0x0008000000000100L & db_fp[14]);
    pop += Long.bitCount(0x0000000000010180L & db_fp[15]);
    return pop;
}
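The specialisation step can be sketched as a small source generator that walks the query words and emits one statement per word, skipping empty words and replacing the pop count with a shift-and-mask for singleton words. This is a toy that emits Java text to mirror the slides; the talk describes generating machine code, and the class and method names here are hypothetical.

```java
final class IntersectCodeGen {
    /** Emits the body of an intersect function specialised for one query fingerprint. */
    static String generate(long[] query) {
        StringBuilder sb = new StringBuilder("int pop = 0;\n");
        for (int i = 0; i < query.length; i++) {
            long word = query[i];
            if (word == 0L) {
                continue; // empty word: it can never contribute to the intersection
            }
            if (Long.bitCount(word) == 1) {
                // singleton word: test the one set bit instead of counting
                int bit = Long.numberOfTrailingZeros(word);
                sb.append(String.format("pop += (db_fp[%d] >> %d) & 0x1;\n", i, bit));
            } else {
                sb.append(String.format("pop += Long.bitCount(0x%016xL & db_fp[%d]);\n", word, i));
            }
        }
        return sb.append("return pop;\n").toString();
    }
}
```

Given the first three words of the example fingerprint (`0x4`, `0x0`, `0x0000000000400020`), the generator emits the shift-and-mask line for word 0, nothing for word 1, and a `Long.bitCount` line for word 2.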
To optimise the remaining 64-bit words (numbered 2-15) we can derive a graph by connecting any two words that share a common bit.
[Figure: graph whose nodes are the words 2 to 15]
Colouring the graph (such that no two adjacent words have the same colour) tells us how many pop counts we will need (the number of colours).
We can combine bitCount on words of the same colour:
int intersectChembl1906145(long[] db_fp) {
    int pop = 0;
    pop += (db_fp[0] >> 2) & 0x1;
    // pop += Long.bitCount(0x0000000000000000L & db_fp[1]);
    pop += Long.bitCount((0x0000000000400020L & db_fp[2]) |
                         (0x00000800000a1000L & db_fp[5]) |
                         (0x0000000006000100L & db_fp[9]) |
                         (0x0010000008000002L & db_fp[3]) |
                         (0x0800002000000000L & db_fp[7]) |
                         (0x0160000000000200L & db_fp[4]));
    pop += Long.bitCount((0x1000000001580000L & db_fp[6]) |
                         (0x0000280002002100L & db_fp[10]) |
                         (0x0100000000048000L & db_fp[11]) |
                         (0x0000002088000000L & db_fp[12]) |
                         (0x0000000000000841L & db_fp[8]));
    pop += Long.bitCount((0x0000008000400000L & db_fp[13]) |
                         (0x0008000000000100L & db_fp[14]));
    pop += Long.bitCount(0x0000000000010180L & db_fp[15]);
    return pop;
}
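The colouring can be approximated with a greedy pass: each word joins the first group whose accumulated mask shares no bit with it, so every group holds pairwise-disjoint masks and one pop count per group gives the same total. The sketch below is an assumed, simplified version (first-fit greedy colouring, evaluated by interpretation rather than generated code), and it may use more groups than an optimal colouring would.

```java
import java.util.ArrayList;
import java.util.List;

final class ColouredIntersect {
    /** Greedily partitions non-empty word indices into groups whose query masks share no set bit. */
    static List<List<Integer>> colour(long[] query) {
        List<List<Integer>> groups = new ArrayList<>();
        List<Long> used = new ArrayList<>(); // OR of the masks already placed in each group
        for (int i = 0; i < query.length; i++) {
            if (query[i] == 0L) continue;    // empty words contribute nothing
            int g = 0;
            while (g < groups.size() && (used.get(g) & query[i]) != 0L) {
                g++;                         // conflict: this group already uses one of the bits
            }
            if (g == groups.size()) {
                groups.add(new ArrayList<>());
                used.add(0L);
            }
            groups.get(g).add(i);
            used.set(g, used.get(g) | query[i]);
        }
        return groups;
    }

    /** Intersection popcount using one Long.bitCount per group instead of one per word. */
    static int intersect(long[] query, List<List<Integer>> groups, long[] db) {
        int pop = 0;
        for (List<Integer> group : groups) {
            long merged = 0L;
            for (int i : group) {
                merged |= query[i] & db[i];  // masks are disjoint, so no set bit is lost
            }
            pop += Long.bitCount(merged);
        }
        return pop;
    }
}
```

The merge is safe because two words in the same group never have a query bit at the same position, so OR-ing the masked words cannot collapse two set bits into one.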
CONCLUSIONS
Speedy tools for structure searching
• Quick feedback from a search allows refinement if needed
• Enables different types of search (e.g. make/break)
Speedy tools for text-mining patents
• Assists in improvement of grammar and dictionaries
• Extract from all patents, not just a subset of IPC codes
Future Work
• Extract additional types of chemical data
• Advanced query features beyond SMARTS
Acknowledgements
Yurii Moroz, Chemspace

Pat Walters, Relay Therapeutics

James Davidson, Vernalis

Mathew Swain, Vernalis
Daniel Lowe, Minesoft
Related Talks:

• R Sayle. Recent Advances in Chemical & Biological Search Systems: Evolution v
Resolution. ICCS, May 2018

• J Mayfield, Pistachio: Search and Faceting of Large Reaction Databases. 254th ACS
National Meeting, Aug 2017

• D Lowe. Sketchy sketches: Hiding chemistry in plain sight. 252nd ACS National
Meeting, Aug 2016
Available at: https://www.slideshare.net/NextMoveSoftware
CINF 162: NextMove for Chemspace: Millisecond search in a database of 100
million structures. Thursday 10:25, Grand Ballroom A

CINF 170: Regioselectivity: An application of expert systems and ontologies to
chemical (named) reaction analysis. Thursday 10:40, Lewis

More Related Content

What's hot

CHAS 31: Encoding reactive chemical hazards and incompatibilities in an alert...
CHAS 31: Encoding reactive chemical hazards and incompatibilities in an alert...CHAS 31: Encoding reactive chemical hazards and incompatibilities in an alert...
CHAS 31: Encoding reactive chemical hazards and incompatibilities in an alert...NextMove Software
 
Reading and Writing Molecular File Formats for Data Exchange of Small Molecul...
Reading and Writing Molecular File Formats for Data Exchange of Small Molecul...Reading and Writing Molecular File Formats for Data Exchange of Small Molecul...
Reading and Writing Molecular File Formats for Data Exchange of Small Molecul...NextMove Software
 
Line notations for nucleic acids (both natural and therapeutic)
Line notations for nucleic acids (both natural and therapeutic)Line notations for nucleic acids (both natural and therapeutic)
Line notations for nucleic acids (both natural and therapeutic)NextMove Software
 
Extraction, Analysis, Atom Mapping, Classification and Naming of Reactions fr...
Extraction, Analysis, Atom Mapping, Classification and Naming of Reactions fr...Extraction, Analysis, Atom Mapping, Classification and Naming of Reactions fr...
Extraction, Analysis, Atom Mapping, Classification and Naming of Reactions fr...NextMove Software
 
Unlocking chemical information from tables and legacy articles
Unlocking chemical information from tables and legacy articlesUnlocking chemical information from tables and legacy articles
Unlocking chemical information from tables and legacy articlesNextMove Software
 
CINF 17: Comparing Cahn-Ingold-Prelog Rule Implementations: The need for an o...
CINF 17: Comparing Cahn-Ingold-Prelog Rule Implementations: The need for an o...CINF 17: Comparing Cahn-Ingold-Prelog Rule Implementations: The need for an o...
CINF 17: Comparing Cahn-Ingold-Prelog Rule Implementations: The need for an o...NextMove Software
 
Automatic extraction of bioactivity data from patents
Automatic extraction of bioactivity data from patentsAutomatic extraction of bioactivity data from patents
Automatic extraction of bioactivity data from patentsNextMove Software
 
GHS and NFPA diamonds: where they come from and how they can be useful
GHS and NFPA diamonds: where they come from and how they can be usefulGHS and NFPA diamonds: where they come from and how they can be useful
GHS and NFPA diamonds: where they come from and how they can be usefulNextMove Software
 
Chemical structure representation in PubChem
Chemical structure representation in PubChemChemical structure representation in PubChem
Chemical structure representation in PubChemNextMove Software
 
Classification, representation and analysis of cyclic peptides and peptide-li...
Classification, representation and analysis of cyclic peptides and peptide-li...Classification, representation and analysis of cyclic peptides and peptide-li...
Classification, representation and analysis of cyclic peptides and peptide-li...NextMove Software
 
RDKit: Six Not-So-Easy Pieces [RDKit UGM 2016]
RDKit: Six Not-So-Easy Pieces [RDKit UGM 2016]RDKit: Six Not-So-Easy Pieces [RDKit UGM 2016]
RDKit: Six Not-So-Easy Pieces [RDKit UGM 2016]NextMove Software
 
Sketchy sketches hiding chemistry in plain sight
Sketchy sketches hiding chemistry in plain sightSketchy sketches hiding chemistry in plain sight
Sketchy sketches hiding chemistry in plain sightNextMove Software
 
CINF 4: Naming algorithms for derivatives of peptide-like natural products
CINF 4: Naming algorithms for derivatives of peptide-like natural productsCINF 4: Naming algorithms for derivatives of peptide-like natural products
CINF 4: Naming algorithms for derivatives of peptide-like natural productsNextMove Software
 
Challenges and successes in machine interpretation of Markush descriptions
Challenges and successes in machine interpretation of Markush descriptionsChallenges and successes in machine interpretation of Markush descriptions
Challenges and successes in machine interpretation of Markush descriptionsNextMove Software
 
Challenges in Chemical Information Exchange
Challenges in Chemical Information ExchangeChallenges in Chemical Information Exchange
Challenges in Chemical Information ExchangeNextMove Software
 
CINF 1: Generating Canonical Identifiers For (Glycoproteins And Other Chemica...
CINF 1: Generating Canonical Identifiers For (Glycoproteins And Other Chemica...CINF 1: Generating Canonical Identifiers For (Glycoproteins And Other Chemica...
CINF 1: Generating Canonical Identifiers For (Glycoproteins And Other Chemica...NextMove Software
 
Acs 2013 indianapolis_cvsp
Acs 2013 indianapolis_cvspAcs 2013 indianapolis_cvsp
Acs 2013 indianapolis_cvspKen Karapetyan
 
Standardization and Generation of Parents for Open PHACTS Chemical Registry S...
Standardization and Generation of Parents for Open PHACTS Chemical Registry S...Standardization and Generation of Parents for Open PHACTS Chemical Registry S...
Standardization and Generation of Parents for Open PHACTS Chemical Registry S...Ken Karapetyan
 
Advanced grammars for state-of-the-art named entity recognition (NER)
Advanced grammars for state-of-the-art named entity recognition (NER)Advanced grammars for state-of-the-art named entity recognition (NER)
Advanced grammars for state-of-the-art named entity recognition (NER)NextMove Software
 
Condensed Lipid Poster Sigma Xi Final
Condensed Lipid Poster Sigma Xi FinalCondensed Lipid Poster Sigma Xi Final
Condensed Lipid Poster Sigma Xi FinalMax Bourdillon
 

What's hot (20)

CHAS 31: Encoding reactive chemical hazards and incompatibilities in an alert...
CHAS 31: Encoding reactive chemical hazards and incompatibilities in an alert...CHAS 31: Encoding reactive chemical hazards and incompatibilities in an alert...
CHAS 31: Encoding reactive chemical hazards and incompatibilities in an alert...
 
Reading and Writing Molecular File Formats for Data Exchange of Small Molecul...
Reading and Writing Molecular File Formats for Data Exchange of Small Molecul...Reading and Writing Molecular File Formats for Data Exchange of Small Molecul...
Reading and Writing Molecular File Formats for Data Exchange of Small Molecul...
 
Line notations for nucleic acids (both natural and therapeutic)
Line notations for nucleic acids (both natural and therapeutic)Line notations for nucleic acids (both natural and therapeutic)
Line notations for nucleic acids (both natural and therapeutic)
 
Extraction, Analysis, Atom Mapping, Classification and Naming of Reactions fr...
Extraction, Analysis, Atom Mapping, Classification and Naming of Reactions fr...Extraction, Analysis, Atom Mapping, Classification and Naming of Reactions fr...
Extraction, Analysis, Atom Mapping, Classification and Naming of Reactions fr...
 
Unlocking chemical information from tables and legacy articles
Unlocking chemical information from tables and legacy articlesUnlocking chemical information from tables and legacy articles
Unlocking chemical information from tables and legacy articles
 
CINF 17: Comparing Cahn-Ingold-Prelog Rule Implementations: The need for an o...
CINF 17: Comparing Cahn-Ingold-Prelog Rule Implementations: The need for an o...CINF 17: Comparing Cahn-Ingold-Prelog Rule Implementations: The need for an o...
CINF 17: Comparing Cahn-Ingold-Prelog Rule Implementations: The need for an o...
 
Automatic extraction of bioactivity data from patents
Automatic extraction of bioactivity data from patentsAutomatic extraction of bioactivity data from patents
Automatic extraction of bioactivity data from patents
 
GHS and NFPA diamonds: where they come from and how they can be useful
GHS and NFPA diamonds: where they come from and how they can be usefulGHS and NFPA diamonds: where they come from and how they can be useful
GHS and NFPA diamonds: where they come from and how they can be useful
 
Chemical structure representation in PubChem
Chemical structure representation in PubChemChemical structure representation in PubChem
Chemical structure representation in PubChem
 
Classification, representation and analysis of cyclic peptides and peptide-li...
Classification, representation and analysis of cyclic peptides and peptide-li...Classification, representation and analysis of cyclic peptides and peptide-li...
Classification, representation and analysis of cyclic peptides and peptide-li...
 
RDKit: Six Not-So-Easy Pieces [RDKit UGM 2016]
RDKit: Six Not-So-Easy Pieces [RDKit UGM 2016]RDKit: Six Not-So-Easy Pieces [RDKit UGM 2016]
RDKit: Six Not-So-Easy Pieces [RDKit UGM 2016]
 
Sketchy sketches hiding chemistry in plain sight
Sketchy sketches hiding chemistry in plain sightSketchy sketches hiding chemistry in plain sight
Sketchy sketches hiding chemistry in plain sight
 
CINF 4: Naming algorithms for derivatives of peptide-like natural products
CINF 4: Naming algorithms for derivatives of peptide-like natural productsCINF 4: Naming algorithms for derivatives of peptide-like natural products
CINF 4: Naming algorithms for derivatives of peptide-like natural products
 
Challenges and successes in machine interpretation of Markush descriptions
Challenges and successes in machine interpretation of Markush descriptionsChallenges and successes in machine interpretation of Markush descriptions
Challenges and successes in machine interpretation of Markush descriptions
 
Challenges in Chemical Information Exchange
Challenges in Chemical Information ExchangeChallenges in Chemical Information Exchange
Challenges in Chemical Information Exchange
 
CINF 1: Generating Canonical Identifiers For (Glycoproteins And Other Chemica...
CINF 1: Generating Canonical Identifiers For (Glycoproteins And Other Chemica...CINF 1: Generating Canonical Identifiers For (Glycoproteins And Other Chemica...
CINF 1: Generating Canonical Identifiers For (Glycoproteins And Other Chemica...
 
Acs 2013 indianapolis_cvsp
Acs 2013 indianapolis_cvspAcs 2013 indianapolis_cvsp
Acs 2013 indianapolis_cvsp
 
Standardization and Generation of Parents for Open PHACTS Chemical Registry S...
Standardization and Generation of Parents for Open PHACTS Chemical Registry S...Standardization and Generation of Parents for Open PHACTS Chemical Registry S...
Standardization and Generation of Parents for Open PHACTS Chemical Registry S...
 
Advanced grammars for state-of-the-art named entity recognition (NER)
Advanced grammars for state-of-the-art named entity recognition (NER)Advanced grammars for state-of-the-art named entity recognition (NER)
Advanced grammars for state-of-the-art named entity recognition (NER)
 
Condensed Lipid Poster Sigma Xi Final
Condensed Lipid Poster Sigma Xi FinalCondensed Lipid Poster Sigma Xi Final
Condensed Lipid Poster Sigma Xi Final
 

Similar to CINF 35: Structure searching for patent information: The need for speed

Mukesh Kumar Resume
Mukesh Kumar ResumeMukesh Kumar Resume
Mukesh Kumar Resumemukeshkr1
 
CV 22-09-15 - updated
CV  22-09-15 - updatedCV  22-09-15 - updated
CV 22-09-15 - updatedSyeda Gilani
 
Alkaline Extraction of Cobia (Rachycentroncanadum) Proteins: Physicochemical ...
Alkaline Extraction of Cobia (Rachycentroncanadum) Proteins: Physicochemical ...Alkaline Extraction of Cobia (Rachycentroncanadum) Proteins: Physicochemical ...
Alkaline Extraction of Cobia (Rachycentroncanadum) Proteins: Physicochemical ...IJERA Editor
 
An Efficient Synthetic Approach Towards 4-Cyano-3-(Methylthio)-5-Oxo-2H-Pyraz...
An Efficient Synthetic Approach Towards 4-Cyano-3-(Methylthio)-5-Oxo-2H-Pyraz...An Efficient Synthetic Approach Towards 4-Cyano-3-(Methylthio)-5-Oxo-2H-Pyraz...
An Efficient Synthetic Approach Towards 4-Cyano-3-(Methylthio)-5-Oxo-2H-Pyraz...inventionjournals
 
Preparation and Characterization of Phosphate Based Glasses
CINF 35: Structure searching for patent information: The need for speed

  • 1. 256th ACS National Meeting, Boston, Aug 2018 Structure searching for patent information: The need for speed John Mayfield, Noel O’Boyle, and Roger Sayle NextMove Software Cambridge, UK
  • 2. Data Search Algorithms
  • 3. Daniel M. Lowe. Extraction of chemical structures and reactions from the literature. Ph.D. Thesis, University of Cambridge, 2012 To 7-chloro-4-oxo-4,5-dihydrofuro[2,3-d]pyridazine-2-carboxylic acid (Peakdale) (220 mg, 1.025 mmol) and (3,4-dimethoxyphenyl)boronic acid (187 mg, 1.025 mmol) in 1,4-dioxane (3 mL) and water (1.5 mL) was added sodium carbonate (435 mg, 4.10 mmol) and tetrakis(triphenylphosphine)palladium(0) (110 mg, 0.095 mmol). The reaction was heated in the microwave at 80° C. for 2 hours and at 100° C. for a further 2 hours. The solvent was removed and the residue was suspended in DMSO, filtered and purified by MDAP. Appropriate fractions were combined and the solvent removed to give 7-(3,4-dimethoxyphenyl)-4-oxo-4,5-dihydrofuro[2,3-d]pyridazine-2-carboxylic acid (25 mg, 7%) as a yellow solid. [0517] US 2016/16966 A1
  • 5. Daniel M. Lowe. Extraction of chemical structures and reactions from the literature. Ph.D. Thesis, University of Cambridge, 2012 To 7-chloro-4-oxo-4,5-dihydrofuro[2,3-d]pyridazine-2-carboxylic acid (Peakdale) (220 mg, 1.025 mmol) and (3,4-dimethoxyphenyl)boronic acid (187 mg, 1.025 mmol) in 1,4-dioxane (3 mL) and water (1.5 mL) was added sodium carbonate (435 mg, 4.10 mmol) and tetrakis(triphenylphosphine)palladium(0) (110 mg, 0.095 mmol). The reaction was heated in the microwave at 80° C. for 2 hours and at 100° C. for a further 2 hours. The solvent was removed and the residue was suspended in DMSO, filtered and purified by MDAP. Appropriate fractions were combined and the solvent removed to give 7-(3,4-dimethoxyphenyl)-4-oxo-4,5-dihydrofuro[2,3-d]pyridazine-2-carboxylic acid (25 mg, 7%) as a yellow solid. [0517] US 2016/16966 A1
      Extracted data:
        Product: 7-(3,4-dimethoxyphenyl)-4-oxo-4,5-dihydrofuro[2,3-d]pyridazine-2-carboxylic acid (25 mg, 7% yield, yellow solid)
        Reactant: 7-chloro-4-oxo-4,5-dihydrofuro[2,3-d]pyridazine-2-carboxylic acid (220 mg, 1.025 mmol)
        Reactant: (3,4-dimethoxyphenyl)boronic acid (187 mg, 1.025 mmol)
        Agent: 1,4-dioxane (3 mL)
        Agent: water (1.5 mL)
        Agent: sodium carbonate (435 mg, 4.10 mmol)
        Agent: tetrakis(triphenylphosphine)palladium(0) (110 mg, 0.095 mmol)
        Agent: DMSO
  • 6. 252nd ACS National Meeting, Philadelphia PA, USA 25th August 2016 SKETCH PROCESSING US 2004/101442 C00025 [Figure panels: Original Sketch; Default Interpretation (USPTO molfile); Our Interpretation] Re-interpretation of ChemDraw sketches: 1. Correct systematic errors 2. Extract extra semantics (structure variation, reaction schemes) 3. Categorise output (is this something we can’t interpret?) John May, et al. Sketchy Sketches: Hiding Chemistry in Plain Sight. Seventh Joint Sheffield Conference on Cheminformatics. 2016
  • 7. Reaction SCHEME SKETCHES Example 26, US 09718816 B2 [Figure: reaction scheme sketch annotated Step 1, Step 2, Step 3, Step 4, etc.] John May, et al. Sketchy Sketches: Hiding Chemistry in Plain Sight. Seventh Joint Sheffield Conference on Cheminformatics. 2016
  • 8. SKETCH CATEGORISATION A category is assigned to each extracted sketch: Molecule/Specific, Molecule/Generic, Reaction/Specific, Reaction/Generic, NoConnectionTable. Example: US 7092578 B2, Table 1 (C000001.CDX), “Signaling adaptive-quantization matrices in JPEG using end-of-block codes”
  • 9. SKETCH CATEGORISATION US 7092578 B2, Table 1 (C000001.CDX), “Signaling adaptive-quantization matrices in JPEG using end-of-block codes”
  • 10. Mixtures and formulations (cocktails) US 2001/2252 A1 “TOOTH WHITENING PREPARATIONS”
  • 11. R group Tables US 2016/0002208 A1
  • 13. Chemical name translation: 6-aminopyrimidine-2,4,5-triol
      Chinese (Hanzi used for each morpheme): 6-氨基嘧啶-2,4,5-三醇 (ammonia radical, pyrimidine, three, alcohol)
      Japanese (phonetic translation to Katakana): 6-アミノピリミジン-2,4,5-トリオール (amino, pyrimidine, tri, ol)
      Korean (phonetic translation to Hangul): 6-아미노피리미딘-2,4,5-트리올 (amino, pyrimidine, tri, ol)
      [Figure: structure drawing of 6-aminopyrimidine-2,4,5-triol]
  • 14. EXTRACTED CHEMICAL DATA GROWTH [Chart: cumulative number of records by year, 2000 to 2018, y-axis 0 to 20M. Series: USPTO Exemplified Compounds, USPTO Reactions, EPO Reactions, USPTO Mixtures. Annotated totals: ~22M, ~6M, ~1M]
  • 15. Rule-based text-mining SPEED Chih-Hsuan Wei et al. Assessing the state of the art in biomedical relation extraction: overview of the BioCreative V chemical-disease relation (CDR) task. Database (Oxford). 2016; 2016: baw032. PMC4799720 BioCreAtIvE V challenge evaluating text-mining and extraction systems. Web service response time to annotate an abstract evaluated for CDR task.
  • 16. Rule-based text-mining SPEED Chih-Hsuan Wei et al. Assessing the state of the art in biomedical relation extraction: overview of the BioCreative V chemical-disease relation (CDR) task. Database (Oxford). 2016; 2016: baw032. PMC4799720 BioCreAtIvE V challenge evaluating text-mining and extraction systems. Web service response time to annotate an abstract evaluated for CDR task. Efficient rule-based text-mining provides provenance for annotations and can mine the entire back-archive of US patents in ~24 hours on a single machine.
  • 17. Data Search Algorithms
  • 19. Arthor Demo Video
  • 20. Intelligent query box: Systematic Name, Date Range, Trivial Name, Yield Range, Affiliation, Reaction SMARTS, Disease Target, Document, Line Formula, SMILES, InChI, Author, Protein Target, Collection, Reaction Type (NameRxn), SMARTS, Source, …and logical combinations thereof
  • 21. Pistachio: Reactions
  • 22. Make/break REACTION SEARCH Find: “7H-purine substructure product” Find: “Synthesis of 7H-purine” Requires fast substructure search, because the answer is computed as the complement of two hit sets.
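The set arithmetic behind slide 22 can be sketched as follows. This is a minimal illustration, not Arthor's or Pistachio's implementation: it assumes two substructure searches have already returned the ids of reactions whose products match and whose reactants match, and the class and method names (`MakeBreakSearch`, `formed`, `broken`) are hypothetical.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper: combines two substructure hit sets of reaction ids.
final class MakeBreakSearch {

    // "Synthesis of X": the product contains X but no reactant does,
    // i.e. product hits minus reactant hits.
    static Set<Integer> formed(Set<Integer> productHits, Set<Integer> reactantHits) {
        Set<Integer> result = new TreeSet<>(productHits);
        result.removeAll(reactantHits);
        return result;
    }

    // The reverse case: a reactant contains X but the product does not.
    static Set<Integer> broken(Set<Integer> productHits, Set<Integer> reactantHits) {
        Set<Integer> result = new TreeSet<>(reactantHits);
        result.removeAll(productHits);
        return result;
    }
}
```

With product hits {1, 2, 3, 5} and reactant hits {2, 5, 8}, `formed` keeps reactions 1 and 3, and `broken` keeps reaction 8. Both hit sets must be complete before the difference is taken, which is why the slide ties this query type to fast substructure search.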
  • 23. Cocktails: Mixtures and formulations
  • 24. Data Search Algorithms
  • 25. ARTHOR - MOTIVATION
      History in optimising search:
        – R. Sayle, “1st-class SMARTS patterns”, Daylight CIS, European UGM, EuroMUG 1997, Verona, Italy
        – R. Sayle, “Improved SMILES Substructure Searching”, Daylight CIS, European UGM, EuroMUG 2000, Cambridge, UK.
        – R. Sayle, “Efficient Matching of Chemical Subgraphs”, 9th ICCS, Noordwijkerhout, The Netherlands, 9th June 2011.
      “A substructure search of indole against eMolecules (~7M at the time) takes 17 seconds” - 2014
      Benchmark of 3.4K queries on 7M compounds from eMolecules
        – John May and Roger Sayle, “Substructure Search Face-Off”, CCNM, Cambridge, May 2015
  • 26. SUBSEARCH PERFORMANCE Updated from: John May and Roger Sayle, Substructure Search Face-Off, Presented at CCNM, Cambridge, May 2015 https://www.slideshare.net/NextMoveSoftware/substructure-search-faceoff [Chart: number of queries (n, log scale, 1 to 3341) against time (ms, log scale, 1e+00 to 1e+07), with markers at 1s, 10s, 1m, 5m, 1h and a 90% line. Legend labels in extracted order: BioVia Direct, EPAM Bingo NoSQL, ChemAxon JCART, RDKit Cart, OpenChemLib, OB FastSearch, EPAM Bingo Cart, Sachem. Total times in extracted order: 50m35s, 1h2m59s, 2h9m11s, 2h44m47s, 5h13m19s, 5h53m40s, 2d11h42m14s, 16m50s]
  • 27. SUBSEARCH PERFORMANCE John May and Roger Sayle, Substructure Search Face-Off, Presented at CCNM, Cambridge, May 2015 https://www.slideshare.net/NextMoveSoftware/substructure-search-faceoff [Chart: as on the previous slide, with Arthor added. Legend labels in extracted order: BioVia Direct, EPAM Bingo NoSQL, ChemAxon JCART, RDKit Cart, OpenChemLib, OB FastSearch, EPAM Bingo Cart, Sachem. Total times in extracted order: 50m35s, 1h2m59s, 2h9m11s, 2h44m47s, 5h13m19s, 5h53m40s, 2d11h42m14s, 16m50s. Arthor (Brute force): 27m17s; Arthor: 46s; Arthor (8 threads): 12s]
  • 28. Substructure Optimisations
      Ahead-of-time (AOT)
        • Chemical records converted to a pointer-free, memory-optimised data structure (~166B per molecule)
        • Path-based fingerprint computed and stored in an inverted index
        • Sensible ordering of results
      Just-in-time (JIT)
        • SMARTS traversal based on frequency statistics
        • Atom/Bond expressions compiled and optimised using boolean algebra
        • Fingerprint screening bit selection
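Two of the points on slide 28, the inverted fingerprint index and screening bit selection, can be illustrated with a small sketch. It assumes one posting list (a `BitSet` of molecule ids) per fingerprint bit and a screen that intersects the posting lists of only the rarest query bits. The class name, the `maxBits` parameter and the rarest-first rule are assumptions for illustration; the slides do not describe Arthor's actual data layout or selection rule.

```java
import java.util.Arrays;
import java.util.BitSet;
import java.util.Comparator;

// Hypothetical inverted index: postings[bit] holds the ids of molecules with that bit set.
final class InvertedIndexScreen {
    private final BitSet[] postings;
    private final int numMols;

    InvertedIndexScreen(int numBits, int numMols) {
        this.numMols = numMols;
        this.postings = new BitSet[numBits];
        for (int i = 0; i < numBits; i++) postings[i] = new BitSet(numMols);
    }

    // Ahead-of-time step: record which fingerprint bits a molecule sets.
    void add(int molId, int... setBits) {
        for (int bit : setBits) postings[bit].set(molId);
    }

    // Query-time step: intersect the posting lists of at most maxBits query bits,
    // choosing the rarest bits first because they eliminate the most molecules.
    BitSet candidates(int[] queryBits, int maxBits) {
        int[] rarestFirst = Arrays.stream(queryBits).boxed()
                .sorted(Comparator.comparingInt((Integer bit) -> postings[bit].cardinality()))
                .mapToInt(Integer::intValue)
                .toArray();
        BitSet result = new BitSet(numMols);
        result.set(0, numMols); // start with every molecule as a candidate
        for (int i = 0; i < Math.min(maxBits, rarestFirst.length); i++) {
            result.and(postings[rarestFirst[i]]);
        }
        return result;
    }
}
```

The returned set is only a screen: every candidate still needs atom-by-atom matching, since sharing fingerprint bits does not guarantee a substructure match.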
  • 29. AOT: Storage order Order by those most similar to the query and favour plain molecules. CID 60795 CID 11669779 CID 11576259 CID 37888405
  • 30. AOT: Storage order Order by those most similar to the query and favour plain molecules. CID 60795 CID 11669779 CID 11576259 CID 37888405 CID 60795 CID 11669779 CID 11576259 CID 37888405
  • 31. Storage order Order by those most similar to the query and favour plain molecules. Tanimoto can’t be calculated ahead of time, but can be approximated. Generate a hexadecimal key based on size and other properties favouring “plain” molecules and order by this.
      Example: CCC(C(=O)O)Oc1ccc(cc1)Cl (CHEMBL23477) has key 000e000e01000a0004000065000000
      Key fields, in order: AtomCount, BondCount, PartCount, CarbonCount, CommonHeteroCount, AtomicNumberSum, RadicalCount, ChargeCount, IsotopeCount
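The example key on slide 31 can be reproduced with a short sketch. The slide names the fields but not their widths; the widths below are an assumption inferred from the one example (14 heavy atoms = 0x0e, 14 bonds, 1 part, 10 carbons, 4 common heteroatoms, atomic number sum 101 = 0x65), and the class name is hypothetical. Deriving the counts from a structure is left to a cheminformatics toolkit and is not shown.

```java
// Hypothetical key builder; field widths are inferred from the single example on the slide.
final class StorageOrderKey {

    static String key(int atomCount, int bondCount, int partCount, int carbonCount,
                      int commonHeteroCount, int atomicNumberSum,
                      int radicalCount, int chargeCount, int isotopeCount) {
        // Assumed widths in hex digits: 4, 4, 2, 4, 4, 6, 2, 2, 2 (15 bytes in total).
        return String.format("%04x%04x%02x%04x%04x%06x%02x%02x%02x",
                atomCount, bondCount, partCount, carbonCount, commonHeteroCount,
                atomicNumberSum, radicalCount, chargeCount, isotopeCount);
    }
}
```

For CCC(C(=O)O)Oc1ccc(cc1)Cl, `StorageOrderKey.key(14, 14, 1, 10, 4, 101, 0, 0, 0)` yields `000e000e01000a0004000065000000`, matching the slide. Because every field is zero-padded to a fixed width, sorting the keys as plain strings orders molecules by atom count first, then by the later fields, so small molecules without radicals, charges or isotopes sort early.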
  • 32. JIT: Pattern Traversal
      The same query can be traversed (and matched) in different orders. How much slower is each order than the best?
      Relative time   Traversal
      Best            BrCCCC
      3.4x            CC(Br)CC
      5.6x            CCCCBr
      Relative time   Traversal
      Best            n1ccc2c1cccc2
      1.4x            c12c(ccn1)cccc2
      2.3x            c12ccccc1ccn2
      3.3x            c1cnc2ccccc12
      3.3x            c12ccnc1cccc2
      4.8x            c1c2ccccc2nc1
      Before the query is matched, it is rearranged into the “best” traversal order based on frequency statistics.
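The idea of starting from the rarest feature can be sketched as a greedy ordering of the query atoms. This is an illustration only: the class `TraversalOrderSketch`, the per-atom frequency array and the greedy rule (always extend to the rarest atom adjacent to what has been visited) are assumptions introduced here, and the slides do not say how Arthor actually scores traversals.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of frequency-guided traversal; not Arthor's actual ordering.
class TraversalOrderSketch {
    // freq[i]: assumed background frequency of query atom i; adj[i]: neighbours of atom i.
    static List<Integer> order(double[] freq, int[][] adj) {
        int n = freq.length;
        boolean[] visited = new boolean[n];
        boolean[] frontier = new boolean[n];
        List<Integer> order = new ArrayList<>();
        while (order.size() < n) {
            // Is there an unvisited atom adjacent to the part already traversed?
            boolean anyFrontier = false;
            for (int i = 0; i < n; i++) {
                if (frontier[i] && !visited[i]) anyFrontier = true;
            }
            // Pick the rarest eligible atom; with no frontier, start a new component.
            int best = -1;
            for (int i = 0; i < n; i++) {
                if (visited[i]) continue;
                if (anyFrontier && !frontier[i]) continue;
                if (best < 0 || freq[i] < freq[best]) best = i;
            }
            visited[best] = true;
            order.add(best);
            for (int nbr : adj[best]) frontier[nbr] = true;
        }
        return order;
    }
}
```

For the query CCCCBr with atoms numbered 0 to 4 and bromine assumed to be far rarer than carbon, the ordering starts at the bromine and walks down the chain, which corresponds to the BrCCCC form that the slide reports as best.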
  • 33. SIMILARITY Optimisations
      Ahead-of-time (AOT)
      • Store binary fingerprints in buckets keyed by the cardinality of the fingerprint, i.e. the number of set bits: the pop(ulation) count
      • Stripe (or “transpose”) fingerprints, reducing the memory reads for the JIT code
      Just-in-time (JIT)
      • Generate machine code to calculate the Tanimoto coefficient
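One well-known benefit of bucketing by pop count is that whole buckets can be skipped for a thresholded search: the intersection can never exceed the smaller pop count and the union can never be less than the larger one, so the Tanimoto coefficient is at most min/max of the two pop counts. The sketch below computes the bucket range worth scanning; the class and method names are hypothetical, and the slides do not state that Arthor prunes in exactly this way.

```java
// Hypothetical sketch of pop-count bucket pruning for a Tanimoto threshold search.
class PopcountBucketSketch {
    // Returns {minPop, maxPop}: only buckets in this range can reach the threshold.
    static int[] bucketRange(int queryPop, double threshold, int numBits) {
        // T <= dbPop / queryPop when dbPop is smaller, so dbPop >= threshold * queryPop.
        int min = (int) Math.ceil(queryPop * threshold);
        // T <= queryPop / dbPop when dbPop is larger, so dbPop <= queryPop / threshold.
        int max = threshold > 0
                ? (int) Math.min(numBits, Math.floor(queryPop / threshold))
                : numBits;
        return new int[]{min, max};
    }
}
```

With a query of 40 set bits and a threshold of 0.5 on a 1024-bit fingerprint, only buckets 20 to 80 need to be read.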
  • 34. TANIMOTO CODE GEN
      Tanimoto calculation (Java, 64-bit words):
        double similarity(long[] q_fp, long[] db_fp) {
          int intersect = 0;
          int union = 0;
          for (int i = 0; i < q_fp.length; i++) {
            intersect += Long.bitCount(q_fp[i] & db_fp[i]);
            union += Long.bitCount(q_fp[i] | db_fp[i]);
          }
          return intersect / (double) union;
        }
      Equivalent Tanimoto calculation, deriving the union from the intersect:
        double similarity(long[] q_fp, long[] db_fp, int q_pop, int db_pop) {
          int intersect = 0;
          for (int i = 0; i < q_fp.length; i++) {
            intersect += Long.bitCount(q_fp[i] & db_fp[i]);
          }
          return intersect / (double) (q_pop + db_pop - intersect);
        }
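The second form rests on the inclusion-exclusion identity for set sizes. Writing A and B for the sets of bits set in the query and database fingerprints (symbols introduced here for illustration), the step is:

```latex
\[
|A \cup B| = |A| + |B| - |A \cap B|
\quad\Longrightarrow\quad
T(A,B) = \frac{|A \cap B|}{|A \cup B|}
       = \frac{|A \cap B|}{|A| + |B| - |A \cap B|}
\]
```

Here |A| and |B| correspond to `q_pop` and `db_pop`, which can be computed once and stored, so only the intersection needs a `bitCount` per word.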
  • 35. TANIMOTO CODE GEN
      Intersect function:
        int intersect(long[] q_fp, long[] db_fp) {
          int pop = 0;
          for (int i = 0; i < q_fp.length; i++) {
            pop += Long.bitCount(q_fp[i] & db_fp[i]);
          }
          return pop;
        }
      Intersect function unrolled (16 words of 64 bits):
        int intersect(long[] q_fp, long[] db_fp) {
          int pop = 0;
          pop += Long.bitCount(q_fp[0] & db_fp[0]);
          pop += Long.bitCount(q_fp[1] & db_fp[1]);
          pop += Long.bitCount(q_fp[2] & db_fp[2]);
          pop += Long.bitCount(q_fp[3] & db_fp[3]);
          pop += Long.bitCount(q_fp[4] & db_fp[4]);
          pop += Long.bitCount(q_fp[5] & db_fp[5]);
          pop += Long.bitCount(q_fp[6] & db_fp[6]);
          pop += Long.bitCount(q_fp[7] & db_fp[7]);
          pop += Long.bitCount(q_fp[8] & db_fp[8]);
          pop += Long.bitCount(q_fp[9] & db_fp[9]);
          pop += Long.bitCount(q_fp[10] & db_fp[10]);
          pop += Long.bitCount(q_fp[11] & db_fp[11]);
          pop += Long.bitCount(q_fp[12] & db_fp[12]);
          pop += Long.bitCount(q_fp[13] & db_fp[13]);
          pop += Long.bitCount(q_fp[14] & db_fp[14]);
          pop += Long.bitCount(q_fp[15] & db_fp[15]);
          return pop;
        }
  • 36. TANIMOTO CODE GEN
      For a given query (e.g. CHEMBL1906145) we can hard code the fingerprint.
        int intersectChembl1906145(long[] db_fp) {
          int pop = 0;
          pop += Long.bitCount(0x0000000000000004L & db_fp[0]);
          pop += Long.bitCount(0x0000000000000000L & db_fp[1]);
          pop += Long.bitCount(0x0000000000400020L & db_fp[2]);
          pop += Long.bitCount(0x0010000008000002L & db_fp[3]);
          pop += Long.bitCount(0x0160000000000200L & db_fp[4]);
          pop += Long.bitCount(0x00000800000a1000L & db_fp[5]);
          pop += Long.bitCount(0x1000000001580000L & db_fp[6]);
          pop += Long.bitCount(0x0800002000000000L & db_fp[7]);
          pop += Long.bitCount(0x0000000000000841L & db_fp[8]);
          pop += Long.bitCount(0x0000000006000100L & db_fp[9]);
          pop += Long.bitCount(0x0000280002002100L & db_fp[10]);
          pop += Long.bitCount(0x0100000000048000L & db_fp[11]);
          pop += Long.bitCount(0x0000002088000000L & db_fp[12]);
          pop += Long.bitCount(0x0000008000400000L & db_fp[13]);
          pop += Long.bitCount(0x0008000000000100L & db_fp[14]);
          pop += Long.bitCount(0x0000000000010180L & db_fp[15]);
          return pop;
        }
  • 37. TANIMOTO CODE GEN
      bitCount on empty and singleton words (for CHEMBL1906145: word 1 is empty and word 0 has a single bit set) can be eliminated.
        int intersectChembl1906145(long[] db_fp) {
          int pop = 0;
          pop += (db_fp[0] >> 2) & 0x1;  // singleton word: test the one set bit directly
          // pop += Long.bitCount(0x0000000000000000L & db_fp[1]);  // empty word: always 0
          pop += Long.bitCount(0x0000000000400020L & db_fp[2]);
          pop += Long.bitCount(0x0010000008000002L & db_fp[3]);
          pop += Long.bitCount(0x0160000000000200L & db_fp[4]);
          pop += Long.bitCount(0x00000800000a1000L & db_fp[5]);
          pop += Long.bitCount(0x1000000001580000L & db_fp[6]);
          pop += Long.bitCount(0x0800002000000000L & db_fp[7]);
          pop += Long.bitCount(0x0000000000000841L & db_fp[8]);
          pop += Long.bitCount(0x0000000006000100L & db_fp[9]);
          pop += Long.bitCount(0x0000280002002100L & db_fp[10]);
          pop += Long.bitCount(0x0100000000048000L & db_fp[11]);
          pop += Long.bitCount(0x0000002088000000L & db_fp[12]);
          pop += Long.bitCount(0x0000008000400000L & db_fp[13]);
          pop += Long.bitCount(0x0008000000000100L & db_fp[14]);
          pop += Long.bitCount(0x0000000000010180L & db_fp[15]);
          return pop;
        }
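The specialisation steps so far (hard-coding the query words, dropping empty words, replacing singleton words with a shift and mask) are mechanical, so they can be produced by a small generator. The sketch below emits Java source text for readability; the slides describe generating machine code, so this is an analogy, not the actual code generator, and `IntersectCodeGenSketch` is a hypothetical name.

```java
// Hypothetical generator: emits Java source specialised to one query fingerprint.
class IntersectCodeGenSketch {
    static String generate(String name, long[] queryFp) {
        StringBuilder sb = new StringBuilder();
        sb.append("int ").append(name).append("(long[] db_fp) {\n");
        sb.append("  int pop = 0;\n");
        for (int i = 0; i < queryFp.length; i++) {
            long word = queryFp[i];
            if (word == 0) {
                continue; // empty word: can never contribute to the intersection
            }
            if (Long.bitCount(word) == 1) {
                // Singleton word: a shift and mask replaces the bitCount.
                int bit = Long.numberOfTrailingZeros(word);
                sb.append(String.format("  pop += (db_fp[%d] >>> %d) & 0x1;\n", i, bit));
            } else {
                sb.append(String.format("  pop += Long.bitCount(0x%016xL & db_fp[%d]);\n", word, i));
            }
        }
        sb.append("  return pop;\n}\n");
        return sb.toString();
    }
}
```

Calling `generate("intersectChembl1906145", fp)` with the 16 query words shown on the slide would emit one line per non-empty word, with word 0 reduced to a single-bit test and word 1 omitted.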
  • 38. TANIMOTO CODE GEN
      To optimise the remaining 64-bit words (numbered 2-15) we can derive a graph by connecting any two words that share a common bit.
      [Figure: graph with one node per word, labelled 2 to 15]
  • 39. TANIMOTO CODE GEN
      Colouring the graph (such that no two adjacent nodes share a colour) tells us how many pop counts we will need (the number of colours).
      [Figure: coloured graph with one node per word, labelled 2 to 15]
  • 40. TANIMOTO CODE GEN
      We can combine bitCount on words of the same colour: such words share no query bits, so their masked values can be ORed together without losing any set bit.
      [Figure: coloured graph with one node per word, labelled 2 to 15]
        int intersectChembl1906145(long[] db_fp) {
          int pop = 0;
          pop += (db_fp[0] >> 2) & 0x1;
          // pop += Long.bitCount(0x0000000000000000L & db_fp[1]);
          pop += Long.bitCount((0x0000000000400020L & db_fp[2]) |
                               (0x00000800000a1000L & db_fp[5]) |
                               (0x0000000006000100L & db_fp[9]) |
                               (0x0010000008000002L & db_fp[3]) |
                               (0x0800002000000000L & db_fp[7]) |
                               (0x0160000000000200L & db_fp[4]));
          pop += Long.bitCount((0x1000000001580000L & db_fp[6]) |
                               (0x0000280002002100L & db_fp[10]) |
                               (0x0100000000048000L & db_fp[11]) |
                               (0x0000002088000000L & db_fp[12]) |
                               (0x0000000000000841L & db_fp[8]));
          pop += Long.bitCount((0x0000008000400000L & db_fp[13]) |
                               (0x0008000000000100L & db_fp[14]));
          pop += Long.bitCount(0x0000000000010180L & db_fp[15]);
          return pop;
        }
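The grouping step can be sketched with a simple greedy colouring: each query word joins the first group whose accumulated mask it does not overlap. The slides do not say which colouring method is used, so the greedy rule, the class `ColouredIntersectSketch` and its method names are assumptions made for illustration. Unlike the slide, this sketch also lets singleton words join a group.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: greedy colouring of query words that share no bits.
class ColouredIntersectSketch {
    // Assign each non-empty word to the first group whose combined mask it does not overlap.
    static List<List<Integer>> groups(long[] queryFp) {
        List<List<Integer>> groups = new ArrayList<>();
        List<Long> masks = new ArrayList<>();
        for (int i = 0; i < queryFp.length; i++) {
            if (queryFp[i] == 0) continue; // empty words never contribute
            int g = 0;
            while (g < groups.size() && (masks.get(g) & queryFp[i]) != 0) g++;
            if (g == groups.size()) {
                groups.add(new ArrayList<>());
                masks.add(0L);
            }
            groups.get(g).add(i);
            masks.set(g, masks.get(g) | queryFp[i]);
        }
        return groups;
    }

    // One bitCount per group: masked words in a group are disjoint, so OR loses no bits.
    static int intersect(long[] queryFp, long[] dbFp, List<List<Integer>> groups) {
        int pop = 0;
        for (List<Integer> group : groups) {
            long merged = 0;
            for (int i : group) merged |= queryFp[i] & dbFp[i];
            pop += Long.bitCount(merged);
        }
        return pop;
    }

    // Reference implementation: one bitCount per word.
    static int naive(long[] queryFp, long[] dbFp) {
        int pop = 0;
        for (int i = 0; i < queryFp.length; i++) pop += Long.bitCount(queryFp[i] & dbFp[i]);
        return pop;
    }
}
```

Because a word is only added to a group when it shares no bit with the group's combined mask, the grouped result always equals the per-word sum, while the number of `bitCount` calls drops to the number of groups. A generated, query-specific version would bake the groups into straight-line code as on the slide.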
  • 41. CONCLUSIONS
      Speedy tools for structure searching
      • Quick feedback from a search allows refinement if needed
      • Enables different types of search (e.g. make/break)
      Speedy tools for text-mining patents
      • Assist in the improvement of grammars and dictionaries
      • Extract from all patents, not just a subset of IPC codes
      Future Work
      • Extract additional types of chemical data
      • Advanced query features beyond SMARTS
  • 42. Acknowledgements
      • Yurii Moroz, Chemspace
      • Pat Walters, Relay Therapeutics
      • James Davidson, Vernalis
      • Mathew Swain, Vernalis
      • Daniel Lowe, Minesoft
      Related Talks:
      • R Sayle. Recent Advances in Chemical & Biological Search Systems: Evolution v Resolution. ICCS, May 2018
      • J Mayfield. Pistachio: Search and Faceting of Large Reaction Databases. 254th ACS National Meeting, Aug 2017
      • D Lowe. Sketchy sketches: Hiding chemistry in plain sight. 252nd ACS National Meeting, Aug 2016
      Available at: https://www.slideshare.net/NextMoveSoftware
      CINF 162: NextMove for Chemspace: Millisecond search in a database of 100 million structures. Thursday 10:25, Grand Ballroom A
      CINF 170: Regioselectivity: An application of expert systems and ontologies to chemical (named) reaction analysis. Thursday 10:40, Lewis