AI Drug Discovery in Patent Space
Hanjo Kim
Principal Scientist at Standigm Inc.
hanjo.kim@standigm.com
business@standigm.com
apply@standigm.com
www.standigm.com
Disclaimer
• Statements of fact and opinions expressed in this presentation
and on the following slides are solely those of the presenter and
not necessarily those of Standigm Inc.
Standigm Inc.
• 2015: Founded by three researchers at Samsung Advanced Institute of Technology
  - Jinhan Kim, PhD Artificial Intelligence (The University of Edinburgh)
  - Sang Ok Song, PhD Chemical Engineering (Seoul National University)
  - So Jeong Yun, PhD Systems Biology (POSTECH)
• $23M funding raised
  - SK Holdings, Mirae Asset Capital, Mirae Asset Venture Investment, DSC Investment, Wonik Investment, Atinum Investment, LB Investment, Kakao Ventures
• Locations: Seoul, Korea (33); Ann Arbor, Michigan (2); Cambridge, UK (1)
• Standigm = a drug discovery company that generates and optimizes therapeutic lead compounds using advanced artificial intelligence, with the aim of licensing them out
• Team: AI, 16; Biology, 6; Chemistry, 8; Systems Biology, 4; Advisor, 3
• PhD: 20/37*
* Except Operation 5, Patent attorney 1
The AI solution
The Standigm AI solution is industrializing drug discovery
Discovery at Scale
• Pipeline stages: Disease, Target, Hit, Lead, Preclinical, Clinical, Drug
• Drug repositioning
• Platforms: BEST™, ASK™, Insight™, FIRST* (* developing)
Standigm ASK™ is freely available at https://icluenask.standigm.com
Standigm BEST Platform
• Standigm BEST
• Standigm ASK: knowledge-based biology platform for novel targets, pathways, and MoA discovery
• Standigm FIRST: hit generation platform for novel and/or undruggable targets
• Generative Models
  - Graph-based VAE
  - Scaffold-based conditional enumerator
  - Novel Molecular Representation
• Scoring Functions
  - Simulations
  - AI rescoring models
  - Machine learning models
• Compound Database
  - Known Molecules
  - Seed Molecules
  - Novel Virtual Structures
  - Commercial Library
  - Privileged Standigm Library
  - Public Library
• Target Database
  - Public data (gene, protein, function)
  - BEST Feasibility
• Workflow: Strategy setup, Hit Generation, Hit-2-Lead
• Predictive Models
  - ADME/Tox predictors
  - Novelty (patentability)
  - Synthetic accessibility
  - Filters/Ranking models
• External CROs: organic synthesis, in vitro/in vivo assays
• Novel/Commercial Hits, Lead Series
Graph-based VAE
• Learning chemical space: Chemical space, Encoder (E), Latent space (Z), Decoder (D), Chemical space
• Training DB: ~4M
• Y: Property/Target information (property 1, property 2, property 3)
• Contextualizing:
  - substructures
  - topology
  - shape
  - etc.
• X: original chemical space
  - encoder q(z|x)
• Z: latent space
  - predictor q(y|z)
  - decoder p(x|z)
• Seed molecules, decoder p(x|z)
  - Analogue structure generation: functionally similar but novel scaffolds/molecules
  - Lead optimization: novel molecules with better desired properties
• Smart library expansion
• IP generation & expansion
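The slide names three distributions (encoder q(z|x), decoder p(x|z), predictor q(y|z)) but does not give a training objective. One common way to combine them, assumed here and not taken from the presentation, is a VAE evidence lower bound with an added property-prediction term. The standard normal prior p(z) and the weights β and λ are illustrative assumptions.

```latex
% Assumed objective (not stated on the slide): reconstruction, KL regularization,
% and a property-prediction term on the latent code z.
\mathcal{L}(x, y) =
    \mathbb{E}_{q(z \mid x)}\big[\log p(x \mid z)\big]
    \;-\; \beta \, \mathrm{KL}\big(q(z \mid x) \,\|\, p(z)\big)
    \;+\; \lambda \, \mathbb{E}_{q(z \mid x)}\big[\log q(y \mid z)\big],
\qquad p(z) = \mathcal{N}(0, I)
```

Under this reading, maximizing the last term is what would organize the latent space Z by property, so that decoding points near a seed molecule's latent code can yield analogues with similar or better predicted properties.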
Patent Space
Target A Compounds in latent space
Competitor 1
Competitor 2
Competitor 3
Interesting Area
potent / weak
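The idea of an "interesting area" in latent space can be sketched as a simple distance filter: keep candidate points that lie away from every competitor's compounds, then prefer those closest to a known potent compound. This is a minimal sketch under stated assumptions, not the presenter's method. The 2-D coordinates, the `margin` threshold, and the function names are all hypothetical, and latent-space distance is only a rough proxy for what a patent actually claims.

```python
import math

def min_distance(point, others):
    """Smallest Euclidean distance from `point` to any vector in `others`."""
    return min(math.dist(point, o) for o in others)

def rank_candidates(candidates, competitor_compounds, potent_compounds, margin):
    """Keep candidates at least `margin` away from every competitor compound,
    then sort them by distance to the nearest known potent compound."""
    free = [c for c in candidates
            if min_distance(c, competitor_compounds) >= margin]
    return sorted(free, key=lambda c: min_distance(c, potent_compounds))

# Hypothetical 2-D latent coordinates (a real latent space has many more dimensions).
competitors = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
potent = [(1.0, 0.0)]
candidates = [(0.1, 0.1), (2.0, 0.0), (3.0, 3.0), (1.0, -1.5)]

ranked = rank_candidates(candidates, competitors, potent, margin=0.8)
print(ranked)
```

The first candidate is dropped because it sits almost on top of a competitor compound; the remaining three are ordered by how close they are to the potent compound.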
Chemical Space Navigation
• Chemical Space ~ Map
• Known scaffolds ~ POIs
• Information-rich space (ChEMBL, PubChem Bioassays, etc.)
• Novel scaffold ~ New POI
• El Dorado
• Patent
• Markush structure: how to protect as wide an area as possible
• Exemplified compounds: boundary stones
Using ChemCurator
• Project types
• Google Patents (most cases)
• PDF files (do not use PDF files!)
• Text files (when the Google OCR is not good)
Using ChemCurator: Google Patents
Using ChemCurator: Text files
OCR (and chemical OCR)
• Lessons
• Google Patents is reliable in most cases
• It even provides a compound table, though a very primitive one
• Professional OCR software can give better results
• Convert the PDF file to plain text with chemical names
• Complex tables
• Image (not OCRed) tables (next 3 slides)
• Chemical OCR engine helps a lot
• Text-image comparison
• Chemical OCR engines
• CLiDE (recommended, proprietary)
• OSRA (open-source, recommended on Linux machines)
• Imago (I have no experience)
• Unsupported engines (like ChemGrapher, https://pubs.acs.org/doi/10.1021/acs.jcim.0c00459)
Chemical structures in patents
Chemical structures in patents
Chemical structures in patents
Better OCR result
Markush Structures
• Very expressive
• The same set of compounds can be written in very different forms
• Not well-validated
• ChemCurator helps
• Extracting example compounds
• Matching them to the Markush structure
• Requires manual correction
• Sentence to chemical groups
• Ambiguous/incomplete R-group definitions
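Matching example compounds to a Markush structure can be illustrated with a toy enumeration: expand a core template over its R-group lists, then check which exemplified compounds fall inside the enumerated set. This is a sketch under loud assumptions, not how ChemCurator works. The core, the R-group lists, and the example SMILES are hypothetical, and plain string comparison only works here because every string comes from the same template. Real matching needs canonical structures or substructure search, because the same compound can be written in many forms.

```python
from itertools import product

# Hypothetical Markush claim: one core template with two R-group positions.
core = "c1ccc({R1})cc1C(=O)N{R2}"
r_groups = {
    "R1": ["F", "Cl", "OC"],   # fluoro, chloro, methoxy
    "R2": ["C", "CC"],         # methyl, ethyl
}

def enumerate_markush(core, r_groups):
    """Yield every concrete structure string covered by the template."""
    names = sorted(r_groups)
    for combo in product(*(r_groups[n] for n in names)):
        yield core.format(**dict(zip(names, combo)))

claimed = set(enumerate_markush(core, r_groups))

# Hypothetical exemplified compounds extracted from the patent text.
examples = ["c1ccc(F)cc1C(=O)NC", "c1ccc(Br)cc1C(=O)NC"]
unmatched = [s for s in examples if s not in claimed]
print(len(claimed), unmatched)
```

An unmatched example, such as the bromo compound here, is the kind of case that would prompt manual correction: either the R-group definition was extracted incompletely or the example was misread.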
AI can help
• Reduction of frequent text OCR errors
• NLP techniques can correct frequent OCR errors
• The availability of a large training set is important
• Extraction of relevant data
• Biological activities
• Analytical data
• Chemical OCR can be improved
• AI can do image recognition very well
• Different drawing styles can be managed
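As a rough illustration of correcting frequent OCR errors in chemical names, the sketch below applies a small table of character confusions and accepts a correction only when it produces a known name fragment. It is a rule-based stand-in for the NLP techniques the slide mentions, not a learned model. The confusion pairs, the vocabulary, and the function names are hypothetical.

```python
# Hypothetical OCR confusions: (misread text, intended text).
CONFUSIONS = [("rn", "m"), ("1", "l"), ("0", "o"), ("vv", "w")]

# Hypothetical vocabulary of chemical name fragments.
VOCABULARY = {"methyl", "ethyl", "phenyl", "chloro", "fluoro", "amino"}

def correct_token(token, vocabulary=VOCABULARY):
    """Return a corrected token if exactly one confusion-based variant is a
    known fragment; otherwise return the token unchanged."""
    if token in vocabulary:
        return token
    candidates = {token}
    for wrong, right in CONFUSIONS:
        # Add variants with this confusion undone, on top of earlier variants.
        candidates |= {c.replace(wrong, right) for c in candidates}
    hits = sorted(c for c in candidates if c in vocabulary)
    return hits[0] if len(hits) == 1 else token

def correct_name(name):
    """Correct each hyphen-separated part of a chemical name."""
    return "-".join(correct_token(part) for part in name.split("-"))

fixed = correct_name("4-ch1oro-2-rnethyl")
print(fixed)
```

Locant digits such as "4" and "2" are left alone because no variant of them appears in the vocabulary. A trained model could learn the confusion table from data instead, which is where the large training set noted on the slide becomes important.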
Acknowledgement
• Standigm Inc.
• Sanghyung JIN, Minkyu HA, Soyeon Kim, Sangok SONG
• T&J Tech. (Korean distributor)
• Jung-A HAN

Patent Data for Artificial Intelligence based Drug Discovery
