Chemoinformatics and information management

Peter Willett, University of Sheffield, UK
Overview
• What is chemoinformatics and why is it
  necessary
• Managing structural information
• Typical facilities in chemoinformatics
  software
• Examples of current research
Drug discovery: I
• Drug discovery is a vastly complex, multi-disciplinary
  task that can extend over two decades
• The total cost for the discovery and development of a
  novel therapeutic agent is now ca. $1.5B
• Even so, only about 1 in 3 drugs recovers its R&D costs
   • But when they do, the pay-offs can be massive: Lipitor made
     $12.5B in 2006 (cf. MS Windows and Boeing 747)
• Patent cover is 20 years from initial announcement
   • Time is money, so potential drugs need to be found (and non-drugs
     rejected) much faster (and similarly for agrochemicals)
Drug discovery: II
• Chemoinformatics is one way of increasing the cost
  effectiveness of drug discovery
• Initial work in chemoinformatics as early as the Sixties:
  current interest because of developments in
   • Combinatorial chemistry
   • High throughput screening (HTS)
   • Change from sequential to massively parallel processing
• Resulting explosion in the amounts of data available in
  drug-discovery programmes, and an increased interest
  in computational methods
   • Focus on chemical structure diagram, cf development of other
     types of -informatics specialisms
Definitions
•   F.K. Brown (1998). Annual Reports in Medicinal Chemistry, 33,
    375-384
    • “The use of information technology and management has become
      a critical part of the drug discovery process. Chemoinformatics is
      the mixing of those information resources to transform data into
      information and information into knowledge for the intended
      purpose of making better decisions faster in the area of drug lead
      identification and optimization”
•   G. Paris (August 1999 ACS meeting), quoted by W.A. Warr at
    http://www.warr.com/warrzone.htm
    • “Chem(o)informatics is a generic term that encompasses the
      design, creation, organization, management, retrieval, analysis,
      dissemination, visualization and use of chemical information”
•   J. Gasteiger and T. Engel (editors) (2003). Chemoinformatics:
    a textbook. Wiley-VCH.
    •   “Chemoinformatics is the application of informatics methods to
        solve chemical problems.”
Representation of molecules
• Need for a machine-readable representation
  • 1D – computed/experimental global properties
  • 2D – the chemical structure diagram
  • 3D – atomic coordinate data
• 1D representations handled using conventional
  DBMS software
• Need to manipulate 2D and 3D data
Connection tables

[Figure: 2D structure diagram with atoms numbered 1-9]

Atom  Element  Neighbour and bond-order pairs
1     C        2 2   6 1   7 1
2     C        1 2   3 1
3     C        2 1   4 2
4     C        3 2   5 1
5     C        4 1   6 2
6     C        1 1   5 2
7     C        1 1   8 1   9 2
8     C        7 1
9     O        7 2



• An unambiguous representation of a 2D chemical
  structure diagram
• A connection table is a graph, the underlying data
  structure in chemoinformatics
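As a minimal sketch, the connection table on this slide can be held as a plain Python dictionary and checked for internal consistency. The function names and the valence table (C = 4, O = 2) are illustrative assumptions, not part of the original slides.

```python
# Connection table from the slide:
# atom number -> (element, [(neighbour atom, bond order), ...])
CONNECTION_TABLE = {
    1: ("C", [(2, 2), (6, 1), (7, 1)]),
    2: ("C", [(1, 2), (3, 1)]),
    3: ("C", [(2, 1), (4, 2)]),
    4: ("C", [(3, 2), (5, 1)]),
    5: ("C", [(4, 1), (6, 2)]),
    6: ("C", [(1, 1), (5, 2)]),
    7: ("C", [(1, 1), (8, 1), (9, 2)]),
    8: ("C", [(7, 1)]),
    9: ("O", [(7, 2)]),
}


def bonds(table):
    """Return the set of undirected bonds as (low atom, high atom, order)."""
    return {(min(a, b), max(a, b), order)
            for a, (_, nbrs) in table.items()
            for b, order in nbrs}


def is_consistent(table):
    """Check that every bond is listed from both ends with the same order."""
    return all((a, order) in table[b][1]
               for a, (_, nbrs) in table.items()
               for b, order in nbrs)


def implicit_hydrogens(table, valence=None):
    """Hydrogens needed to fill each atom's valence (assumed: C = 4, O = 2)."""
    valence = valence or {"C": 4, "O": 2}
    return {a: valence[el] - sum(order for _, order in nbrs)
            for a, (el, nbrs) in table.items()}
```

Under these assumptions the table describes nine bonds between nine heavy atoms, and filling the remaining valences gives eight hydrogens, i.e. C8H8O.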
Graph theory and chemistry
• Graph theory
   • Branch of mathematics that describes sets of objects, called
     nodes, and the relationships between them, called edges
• A 2D connection table is a graph:
   • Nodes correspond to atoms
   • Edges correspond to bonds
• Graph matching algorithms
   • Search chemical databases
• Generation of other representations
Types of search
• Exact structure search (hashed connection table with
  graph isomorphism for collision handling)
• Substructure search (subgraph isomorphism)
   • cf partial or boolean matching in text
• Similarity searching (maximal common subgraph
  isomorphism, or a simpler measure)
   • cf best match search or web searching
• Graph matching algorithms are effective
   • But worst-case running time grows factorially with the number of nodes
   • Need for efficient heuristics
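A substructure search can be sketched as a backtracking subgraph-isomorphism test. This is a simplified illustration: it matches element labels and connectivity only, ignores bond orders, and uses no screening heuristics. The function name `find_substructure` and the query are hypothetical; the target is the nine-atom molecule from the connection-table slide.

```python
# Graphs: atom id -> (element label, set of neighbouring atom ids).
# Target: the molecule from the connection-table slide, bond orders omitted.
TARGET = {
    1: ("C", {2, 6, 7}), 2: ("C", {1, 3}), 3: ("C", {2, 4}),
    4: ("C", {3, 5}), 5: ("C", {4, 6}), 6: ("C", {1, 5}),
    7: ("C", {1, 8, 9}), 8: ("C", {7}), 9: ("O", {7}),
}

# Hypothetical query: a C-C-O chain.
QUERY = {"a": ("C", {"b"}), "b": ("C", {"a", "c"}), "c": ("O", {"b"})}


def find_substructure(query, target):
    """Return one mapping query atom -> target atom, or None if none exists."""
    q_atoms = list(query)

    def extend(mapping):
        if len(mapping) == len(q_atoms):
            return dict(mapping)
        q = q_atoms[len(mapping)]
        q_label, q_nbrs = query[q]
        for t, (t_label, t_nbrs) in target.items():
            if t in mapping.values() or t_label != q_label:
                continue
            # Each already-mapped neighbour of q must map to a neighbour of t.
            if all(mapping[n] in t_nbrs for n in q_nbrs if n in mapping):
                mapping[q] = t
                result = extend(mapping)
                if result is not None:
                    return result
                del mapping[q]  # backtrack and try the next candidate
        return None

    return extend({})
```

Calling `find_substructure(QUERY, TARGET)` maps the query oxygen onto atom 9 and its carbon neighbour onto atom 7. The exhaustive candidate loop is what makes the worst case so expensive, which is why practical systems add heuristics and pre-screening.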
Fingerprints
[Figure: a molecule and example fragment substructures]

• A fingerprint (or fragment bit-string) is a binary vector
  encoding the presence (“1”) or absence (“0”) of
  fragment substructures in a molecule
• Each bit in the fingerprint represents one molecular
  fragment. Typical length is ~1000 bits
• An approximate representation, but one that can be
  processed very efficiently and hence often used as a
  precursor to graph matching
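The idea of a fragment bit-string used as a screen before graph matching can be sketched as follows. The fragment dictionary is a tiny hypothetical one (real fingerprints are around 1000 bits, as noted above), and the fragments present in each molecule are assumed to be given rather than derived from a connection table.

```python
# Hypothetical fragment dictionary: bit i encodes FRAGMENT_DICTIONARY[i].
FRAGMENT_DICTIONARY = ["C=O", "c1ccccc1", "C-N", "C-Br", "O-H", "C-C"]


def fingerprint(fragments):
    """Encode a set of fragment names as an integer bit-string."""
    bits = 0
    for i, frag in enumerate(FRAGMENT_DICTIONARY):
        if frag in fragments:
            bits |= 1 << i
    return bits


def passes_screen(query_fp, target_fp):
    """True if every bit set in the query is also set in the target.

    A molecule failing this test cannot contain the query substructure,
    so the expensive graph match can be skipped for it.
    """
    return (query_fp & target_fp) == query_fp


# Assumed fragment sets, for illustration only.
molecule_fp = fingerprint({"C=O", "c1ccccc1", "C-C"})
carbonyl_query_fp = fingerprint({"C=O"})
bromo_query_fp = fingerprint({"C-Br"})
```

Here `passes_screen(carbonyl_query_fp, molecule_fp)` is true, so the molecule goes forward to graph matching, while the bromine query is rejected by a single bitwise operation. Passing the screen is necessary but not sufficient, which is why the fingerprint is only an approximate representation.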
Chemoinformatics facilities
• Database searching as described previously
   • Structure and substructure searching originally
   • Similarity searching from mid-Eighties
   • 3D substructure searching from mid-Nineties (first rigid then
     flexible)
• Applications
   • Database clustering
   • Molecular diversity analysis
   • Drug-likeness
   • Virtual screening
      • Ligand-based
      • Structure-based
3D substructure searching
• Generation of pharmacophore patterns
• Use of MOGA and hyperstructure approaches
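A rigid 3D substructure search against a three-point pharmacophore can be sketched as a distance-range test, using the three distance ranges given in this slide's figure. Which pair of points each distance connects is an assumption made here (a: points 1-2, b: points 2-3, c: points 1-3), all points are assumed to be already-typed pharmacophore features, and only one fixed conformation is considered.

```python
from itertools import permutations
from math import dist

# (target distance, tolerance) in Angstroms, from the slide figure.
CONSTRAINTS = {"a": (8.62, 0.58), "b": (7.08, 0.56), "c": (3.35, 0.65)}


def within(value, name):
    """True if a measured distance lies inside the named range."""
    target, tolerance = CONSTRAINTS[name]
    return abs(value - target) <= tolerance


def match_pharmacophore(points):
    """Return the first ordered index triple meeting all ranges, or None."""
    for i, j, k in permutations(range(len(points)), 3):
        if (within(dist(points[i], points[j]), "a")
                and within(dist(points[j], points[k]), "b")
                and within(dist(points[i], points[k]), "c")):
            return (i, j, k)
    return None


# Hypothetical 3D feature coordinates for one rigid conformation.
candidate = [(20.0, 20.0, 20.0), (0.0, 0.0, 0.0),
             (8.62, 0.0, 0.0), (2.05, 2.65, 0.0)]
```

For the invented coordinates above, `match_pharmacophore(candidate)` returns the triple `(1, 2, 3)`. Flexible searching would repeat this kind of test over the conformations a molecule can adopt.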
[Figure: pharmacophore pattern with inter-feature distances
 a = 8.62 ± 0.58 Angstroms, b = 7.08 ± 0.56 Angstroms and
 c = 3.35 ± 0.65 Angstroms, shown alongside example structures]
Similarity searching using 2D fingerprints
    Use of data fusion methods to enhance performance,
    combining information from multiple searches
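One way to picture this is to score each database molecule against several query fingerprints and keep the best score. The sketch below uses the Tanimoto coefficient and a maximum-score fusion rule; the slide does not say which similarity coefficient or fusion rule is meant, so both are assumptions, and the fingerprints are invented sets of bit positions.

```python
def tanimoto(a, b):
    """Tanimoto coefficient between fingerprints given as sets of on-bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def max_fusion(queries, database):
    """Rank database molecules by their best similarity to any query."""
    scores = {name: max(tanimoto(q, fp) for q in queries)
              for name, fp in database.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical fingerprints (sets of bit positions), for illustration only.
queries = [{1, 2, 3, 4}, {3, 4, 5, 6}]
database = {
    "mol_A": {1, 2, 3, 4, 5},
    "mol_B": {5, 6, 7},
    "mol_C": {9, 10},
}
ranking = max_fusion(queries, database)
```

With these toy inputs `mol_A` ranks first and `mol_C` last. `mol_B` shares nothing with the first query and is ranked only because of the second, which is the kind of gain that combining multiple searches is meant to provide.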


[Figure: query structure surrounded by example molecules]
Molecular modelling and QSAR
• Use of computational chemistry to obtain the structures
  and properties of small molecules
   • Quantum mechanics
   • Molecular dynamics
   • Molecular modelling
• Statistical correlation of structure (however described)
  with physical, chemical and biological properties
   • Initially biological activity (QSAR)
   • Now pharmacokinetics and toxicity (ADMET)
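The simplest form of such a statistical correlation is a least-squares line relating one structural descriptor to a measured activity. The sketch below is only an illustration of the arithmetic: the descriptor and activity values are invented and lie exactly on a line, and real QSAR models use many descriptors and proper validation.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict(model, x_new):
    """Predict activity for a new descriptor value."""
    slope, intercept = model
    return slope * x_new + intercept


# Invented toy values (not experimental data): one descriptor, one activity.
descriptor = [1.0, 2.0, 3.0, 4.0]
activity = [3.0, 5.0, 7.0, 9.0]
model = fit_line(descriptor, activity)
```

For this toy data the fit recovers a slope of 2 and an intercept of 1, so `predict(model, 5.0)` gives 11.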
Integration with database searching
• Related, but largely separate, research areas for many
  years
   • Simple search operations on very large numbers of molecules
   • Increasingly complex operations on smaller and smaller
     (normally homogeneous) datasets
   • Substructural analysis as an early, notable exception
• The future lies in the integration of these two
  approaches, applying more sophisticated methods on
  larger datasets
   • Docking now well established
   • Property calculations at a database level
   • ADMET
General references
J. Gasteiger (ed.), Handbook of Chemoinformatics (Wiley-VCH,
   Weinheim, 2003).
W.L. Chen, Chemoinformatics: past, present and future, Journal of
  Chemical Information and Modeling 46 (2006) 2230-2255.
D.J. Wild and G.D. Wiggins, Challenges for chemoinformatics
   education in drug discovery, Drug Discovery Today 11 (2006) 436-439.
A.R. Leach and V.J. Gillet, An Introduction to Chemoinformatics
   (Kluwer, Dordrecht, 2nd edition, 2007).
P. Willett, A bibliometric analysis of chemoinformatics, Aslib
   Proceedings 60 (2008) 4-17.

The Art of the Pitch: WordPress Relationships and Sales
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
 
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualitySoftware Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
 
PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
 
Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
 
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
 
Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*
 
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
 
Essentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersEssentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with Parameters
 
Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
 
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdf
 
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
 
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 previewState of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
 
Connector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a buttonConnector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a button
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
 
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
 

Chemoinformatics and information management

  • 1. Chemoinformatics and information management Peter Willett, University of Sheffield, UK
  • 2. Overview
      • What is chemoinformatics and why is it necessary
      • Managing structural information
      • Typical facilities in chemoinformatics software
      • Examples of current research
  • 3. Drug discovery: I
      • Drug discovery is a vastly complex, multi-disciplinary task that can extend over two decades
      • The total cost for the discovery and development of a novel therapeutic agent is now ca. $1.5B
      • Even so, only about 1 in 3 cover the R&D costs
          • But when they do, the pay-offs can be massive: Lipitor in 2006 made $12.5B (cf MS Windows and Boeing 747)
      • Patent cover is 20 years from initial announcement
          • Time is money so need to find potential drugs (and to reject non-drugs) much faster (and similarly for agrochemicals)
  • 4. Drug discovery: II
      • Chemoinformatics is one way of increasing the cost effectiveness of drug discovery
      • Initial work in chemoinformatics as early as the Sixties: current interest because of developments in
          • Combinatorial chemistry
          • High throughput screening (HTS)
          • Change from sequential to massively parallel processing
      • Resulting explosion in the amounts of data available in drug-discovery programmes, and an increased interest in computational methods
          • Focus on chemical structure diagram, cf development of other types of -informatics specialisms
  • 5. Definitions
      • F.K. Brown (1998). Annual Reports in Medicinal Chemistry, 33, 375-384
          • “The use of information technology and management has become a critical part of the drug discovery process. Chemoinformatics is the mixing of those information resources to transform data into information and information into knowledge for the intended purpose of making better decisions faster in the area of drug lead identification and optimization”
      • G. Paris (August 1999 ACS meeting), quoted by W.A. Warr at http://www.warr.com/warrzone.htm
          • “Chem(o)informatics is a generic term that encompasses the design, creation, organization, management, retrieval, analysis, dissemination, visualization and use of chemical information”
      • J. Gasteiger and T. Engel (editors) (2003). Chemoinformatics: a textbook. Wiley-VCH.
          • “Chemoinformatics is the application of informatics methods to solve chemical problems.”
  • 6. Representation of molecules
      • Need for a machine-readable representation
          • 1D – computed/experimental global properties
          • 2D – the chemical structure diagram
          • 3D – atomic coordinate data
      • 1D representations handled using conventional DBMS software
      • Need to manipulate 2D and 3D data
  • 7. Connection tables
      [Figure: chemical structure diagram with atoms numbered 1 to 9]
      Atom  Element  Neighbour and bond order (repeated per bond)
      1     C        2 2   6 1   7 1
      2     C        1 2   3 1
      3     C        2 1   4 2
      4     C        3 2   5 1
      5     C        4 1   6 2
      6     C        1 1   5 2
      7     C        1 1   8 1   9 2
      8     C        7 1
      9     O        7 2
      • An unambiguous representation of a 2D chemical structure diagram
      • A connection table is a graph, the underlying data structure in chemoinformatics
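The connection table on slide 7 can be loaded into a simple graph structure. The sketch below is illustrative only: it assumes each row reads as atom number, element, then (neighbour, bond order) pairs, which is a reading of the extracted slide text and not something the slide states. The names `CONNECTION_TABLE` and `build_graph` are hypothetical.

```python
# Rows as read from the slide: (atom number, element, [(neighbour, bond order), ...]).
# The row layout is an assumption about the slide's table.
CONNECTION_TABLE = [
    (1, "C", [(2, 2), (6, 1), (7, 1)]),
    (2, "C", [(1, 2), (3, 1)]),
    (3, "C", [(2, 1), (4, 2)]),
    (4, "C", [(3, 2), (5, 1)]),
    (5, "C", [(4, 1), (6, 2)]),
    (6, "C", [(1, 1), (5, 2)]),
    (7, "C", [(1, 1), (8, 1), (9, 2)]),
    (8, "C", [(7, 1)]),
    (9, "O", [(7, 2)]),
]


def build_graph(table):
    """Return (atoms, bonds): nodes labelled by element, edges labelled by bond order."""
    atoms = {}
    bonds = {}
    for atom, element, neighbours in table:
        atoms[atom] = element
        for other, order in neighbours:
            key = frozenset((atom, other))
            # Each bond is listed from both ends; the two entries must agree.
            if bonds.setdefault(key, order) != order:
                raise ValueError(f"inconsistent bond order between {atom} and {other}")
    return atoms, bonds


atoms, bonds = build_graph(CONNECTION_TABLE)
print(len(atoms), "atoms,", len(bonds), "bonds")
```

Storing each bond once under an unordered pair keeps the graph undirected and doubles as a consistency check on the table.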
  • 8. Graph theory and chemistry
      [Figure: chemical structure diagram containing O, Br and NH2 groups]
      • Graph theory
          • Branch of mathematics that describes sets of objects, called nodes, and the relationships between them, called edges
      • A 2D connection table is a graph:
          • Nodes correspond to atoms
          • Edges correspond to bonds
      • Graph matching algorithms
          • Search chemical databases
          • Generation of other representations
  • 9. Types of search
      • Exact structure search (hashed connection table with graph isomorphism for collision handling)
      • Substructure search (subgraph isomorphism)
          • cf partial or boolean matching in text
      • Similarity searching (maximal common subgraph isomorphism (or simpler))
          • cf best match search or web searching
      • Graph matching algorithms are effective
          • But time is factorial with the number of nodes
          • Need for efficient heuristics
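Substructure search as subgraph isomorphism can be sketched with plain backtracking. This is a minimal illustration under stated assumptions, not the algorithm used by any particular system: the function names are hypothetical, atoms match on element only, bonds match on order only, and the target molecule is a small hand-built example.

```python
def make_bonds(triples):
    """Build an undirected bond map from (atom, atom, bond order) triples."""
    return {frozenset((a, b)): order for a, b, order in triples}


def find_substructure(q_atoms, q_bonds, t_atoms, t_bonds):
    """Return one mapping {query atom: target atom}, or None if there is no match."""
    order = list(q_atoms)

    def consistent(q, t, mapping):
        if q_atoms[q] != t_atoms[t]:
            return False
        # Every query bond to an already-mapped atom must exist in the target.
        for prev_q, prev_t in mapping.items():
            q_bond = q_bonds.get(frozenset((q, prev_q)))
            if q_bond is not None and t_bonds.get(frozenset((t, prev_t))) != q_bond:
                return False
        return True

    def extend(i, mapping):
        if i == len(order):
            return dict(mapping)
        q = order[i]
        for t in t_atoms:
            if t not in mapping.values() and consistent(q, t, mapping):
                mapping[q] = t
                found = extend(i + 1, mapping)
                if found is not None:
                    return found
                del mapping[q]  # backtrack
        return None

    return extend(0, {})


# Target: a three-atom chain C-C=O (illustrative only).
target_atoms = {1: "C", 2: "C", 3: "O"}
target_bonds = make_bonds([(1, 2, 1), (2, 3, 2)])

carbonyl = ({1: "C", 2: "O"}, make_bonds([(1, 2, 2)]))
hydroxyl_like = ({1: "C", 2: "O"}, make_bonds([(1, 2, 1)]))

match = find_substructure(*carbonyl, target_atoms, target_bonds)
no_match = find_substructure(*hydroxyl_like, target_atoms, target_bonds)
print(match, no_match)
```

The search tries every unused target atom for each query atom, which is where the factorial worst case mentioned on the slide comes from.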
  • 10. Fingerprints
      [Figure: chemical structure diagram]
      • A fingerprint (or fragment bit-string) is a binary vector encoding the presence (“1”) or absence (“0”) of fragment substructures in a molecule
      • Each bit in the fingerprint represents one molecular fragment. Typical length is ~1000 bits
      • An approximate representation, but one that can be processed very efficiently and hence often used as a precursor to graph matching
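The use of fingerprints as a fast precursor to graph matching can be sketched with integers as bit-strings. Everything named here is hypothetical: the five-entry fragment dictionary stands in for the ~1000 bits mentioned on the slide, and the fragments present in each molecule are given by hand instead of being detected from a structure.

```python
# Hypothetical fragment dictionary: one bit position per fragment.
FRAGMENTS = ["C=O", "benzene ring", "NH2", "Br", "OH"]


def fingerprint(present_fragments):
    """Encode a set of fragment names as an integer bit-string."""
    bits = 0
    for position, fragment in enumerate(FRAGMENTS):
        if fragment in present_fragments:
            bits |= 1 << position
    return bits


def may_contain(query_fp, molecule_fp):
    """Screen: a molecule can only contain the query if it sets every query bit."""
    return (query_fp & molecule_fp) == query_fp


# Made-up database entries for illustration.
database = {
    "mol_1": fingerprint({"C=O", "benzene ring"}),
    "mol_2": fingerprint({"C=O", "NH2"}),
    "mol_3": fingerprint({"Br"}),
}

query = fingerprint({"C=O", "benzene ring"})
candidates = [name for name, fp in database.items() if may_contain(query, fp)]
print(candidates)
```

Only the surviving candidates would need the expensive subgraph isomorphism check; the screen can pass false positives but never rejects a true match, given that the fingerprints are built from the same fragment dictionary.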
  • 11. Chemoinformatics facilities
      • Database searching as described previously
          • Structure and substructure searching originally
          • Similarity searching from mid-Eighties
          • 3D substructure searching from mid-Nineties (first rigid then flexible)
      • Applications
          • Database clustering
          • Molecular diversity analysis
          • Drug-likeness
          • Virtual screening (ligand-based, structure-based)
  • 12. 3D substructure searching
      • Generation of pharmacophore patterns
      • Use of MOGA and hyperstructure approaches
      [Figure: pharmacophore pattern with distances a = 8.62 ± 0.58 Angstroms, b = 7.08 ± 0.56 Angstroms, c = 3.35 ± 0.65 Angstroms, shown with chemical structure diagrams]
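A three-point pharmacophore pattern of the kind shown on slide 12 amounts to a set of distance ranges. The sketch below checks whether three points satisfy the slide's ranges for a, b and c. It is a simplification under stated assumptions: the slide text does not say which feature pair each distance belongs to, so the code tries every assignment, and it ignores feature types entirely. The function names are hypothetical.

```python
import math
from itertools import permutations

# Distance ranges from the slide, in Angstroms, as (centre, tolerance) for a, b, c.
CONSTRAINTS = [(8.62, 0.58), (7.08, 0.56), (3.35, 0.65)]


def within(distance, constraint):
    """True if the distance lies inside centre +/- tolerance."""
    centre, tolerance = constraint
    return abs(distance - centre) <= tolerance


def matches_pattern(p, q, r):
    """True if the three pairwise distances can be assigned to a, b and c."""
    distances = [math.dist(p, q), math.dist(q, r), math.dist(p, r)]
    return any(
        all(within(d, c) for d, c in zip(ordering, CONSTRAINTS))
        for ordering in permutations(distances)
    )


# Made-up coordinates: the first triple was chosen to fit the ranges.
hit = matches_pattern((0.0, 0.0, 0.0), (8.62, 0.0, 0.0), (6.57, 2.65, 0.0))
miss = matches_pattern((0.0, 0.0, 0.0), (5.0, 0.0, 0.0), (2.5, 4.33, 0.0))
print(hit, miss)
```

This corresponds to rigid searching on a single stored conformation; flexible searching would have to consider the range of distances reachable across conformations.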
  • 13. Similarity searching using 2D fingerprints
      • Use of data fusion methods to enhance performance, combining information from multiple searches
      [Figure: chemical structure diagrams, one labelled "Query"]
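Fingerprint-based similarity searching with data fusion can be sketched as follows. Two choices here are assumptions that the slide does not name: the Tanimoto coefficient as the similarity measure, and the MAX rule (keep each molecule's best score across searches) as the fusion method. The fingerprints are made-up four-bit values.

```python
def tanimoto(a, b):
    """Tanimoto coefficient of two integer bit-strings: shared bits / bits set in either."""
    common = bin(a & b).count("1")
    total = bin(a | b).count("1")
    return common / total if total else 0.0


def fused_ranking(queries, database):
    """Run one search per query, fuse with the MAX rule, rank best first."""
    scores = {
        name: max(tanimoto(query, fp) for query in queries)
        for name, fp in database.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Made-up fingerprints for two query structures and three database molecules.
queries = [0b1100, 0b0011]
database = {"A": 0b1100, "B": 0b0111, "C": 0b1000}

ranking = fused_ranking(queries, database)
print([name for name, _ in ranking])
```

Other fusion rules, such as summing ranks or scores across the searches, would slot into `fused_ranking` in place of `max`.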
  • 14. Molecular modelling and QSAR
      • Use of computational chemistry to obtain the structures and properties of small molecules
          • Quantum mechanics
          • Molecular dynamics
          • Molecular modelling
      • Statistical correlation of structure (however described) with physical, chemical and biological properties
          • Initially biological activity (QSAR)
          • Now pharmacokinetics and toxicity (ADMET)
  • 15. Integration with database searching
      • Related, but largely separate, research areas for many years
          • Simple search operations on very large numbers of molecules
          • Increasingly complex operations on smaller and smaller (normally homogeneous) datasets
          • Substructural analysis as an early, notable exception
      • The future lies in the integration of these two approaches, applying more sophisticated methods on larger datasets
          • Docking now well established
          • Property calculations at a database level
          • ADMET
  • 16. General references
      • J. Gasteiger (ed.), Handbook of Chemoinformatics (Wiley-VCH, Weinheim, 2003).
      • W.L. Chen, Chemoinformatics: past, present and future, Journal of Chemical Information and Modeling 46 (2006) 2230-2255.
      • D.J. Wild and G.D. Wiggins, Challenges for chemoinformatics education in drug discovery, Drug Discovery Today 11 (2006) 436-439.
      • A.R. Leach and V.J. Gillet, An Introduction to Chemoinformatics (Kluwer, Dordrecht, 2nd edition, 2007).
      • P. Willett, A bibliometric analysis of chemoinformatics, Aslib Proceedings 60 (2008) 4-17.