Chemoinformatics and information management


Presentation by Peter Willett, March 2008 in Manchester


  1. Chemoinformatics and information management
     Peter Willett, University of Sheffield, UK
  2. Overview
     • What is chemoinformatics and why is it necessary
     • Managing structural information
     • Typical facilities in chemoinformatics software
     • Examples of current research
  3. Drug discovery: I
     • Drug discovery is a vastly complex, multi-disciplinary task that can extend over two decades
     • The total cost for the discovery and development of a novel therapeutic agent is now ca. $1.5B
     • Even so, only about 1 in 3 drugs covers its R&D costs
     • But when they do, the pay-offs can be massive: Lipitor made $12.5B in 2006 (cf. MS Windows and Boeing 747)
     • Patent cover is 20 years from initial announcement
     • Time is money, so there is a need to find potential drugs (and to reject non-drugs) much faster (and similarly for agrochemicals)
  4. Drug discovery: II
     • Chemoinformatics is one way of increasing the cost effectiveness of drug discovery
     • Initial work in chemoinformatics as early as the Sixties: current interest because of developments in
         • Combinatorial chemistry
         • High throughput screening (HTS)
         • Change from sequential to massively parallel processing
     • Resulting explosion in the amounts of data available in drug-discovery programmes, and an increased interest in computational methods
     • Focus on chemical structure diagram, cf. development of other types of -informatics specialisms
  5. Definitions
     • F.K. Brown (1998). Annual Reports in Medicinal Chemistry, 33, 375-384
         • “The use of information technology and management has become a critical part of the drug discovery process. Chemoinformatics is the mixing of those information resources to transform data into information and information into knowledge for the intended purpose of making better decisions faster in the area of drug lead identification and optimization”
     • G. Paris (August 1999 ACS meeting), quoted by W.A. Warr at
         • “Chem(o)informatics is a generic term that encompasses the design, creation, organization, management, retrieval, analysis, dissemination, visualization and use of chemical information”
     • J. Gasteiger and T. Engel (editors) (2003). Chemoinformatics: a textbook. Wiley-VCH.
         • “Chemoinformatics is the application of informatics methods to solve chemical problems.”
  6. Representation of molecules
     • Need for a machine-readable representation
         • 1D – computed/experimental global properties
         • 2D – the chemical structure diagram
         • 3D – atomic coordinate data
     • 1D representations handled using conventional DBMS software
     • Need to manipulate 2D and 3D data
  7. Connection tables
     (Figure: structure diagram with atoms numbered 1 to 9, shown beside its connection table.)

     Atoms: 9
     Atom  Element  Neighbour (bond order)
     1     C        2 (2)   6 (1)   7 (1)
     2     C        1 (2)   3 (1)
     3     C        2 (1)   4 (2)
     4     C        3 (2)   5 (1)
     5     C        4 (1)   6 (2)
     6     C        1 (1)   5 (2)
     7     C        1 (1)   8 (1)   9 (2)
     8     C        7 (1)
     9     O        7 (2)

     • An unambiguous representation of a 2D chemical structure diagram
     • A connection table is a graph, the underlying data structure in chemoinformatics
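
The connection table on this slide maps directly onto an adjacency structure. The sketch below is one possible encoding, assuming the flattened table is read as atom number, element, then neighbour and bond-order pairs. The names `CONNECTION_TABLE`, `bonds` and `is_consistent` are illustrative and do not come from the presentation.

```python
# Connection table from the slide, read as:
# atom number -> (element, [(neighbour atom, bond order), ...]).
# This reading of the flattened transcript is an assumption.
CONNECTION_TABLE = {
    1: ("C", [(2, 2), (6, 1), (7, 1)]),
    2: ("C", [(1, 2), (3, 1)]),
    3: ("C", [(2, 1), (4, 2)]),
    4: ("C", [(3, 2), (5, 1)]),
    5: ("C", [(4, 1), (6, 2)]),
    6: ("C", [(1, 1), (5, 2)]),
    7: ("C", [(1, 1), (8, 1), (9, 2)]),
    8: ("C", [(7, 1)]),
    9: ("O", [(7, 2)]),
}


def bonds(table):
    """Return the undirected bonds as a set of (low atom, high atom, order)."""
    return {
        (min(a, b), max(a, b), order)
        for a, (_, neighbours) in table.items()
        for b, order in neighbours
    }


def is_consistent(table):
    """Check that every bond is listed from both ends with the same order."""
    return all(
        (a, order) in table[b][1]
        for a, (_, neighbours) in table.items()
        for b, order in neighbours
    )


if __name__ == "__main__":
    print(len(bonds(CONNECTION_TABLE)), "bonds; consistent:",
          is_consistent(CONNECTION_TABLE))
```

Each bond appears twice in the table, once per end, so a consistency check like this is a cheap way to catch transcription errors before any graph matching is attempted.
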
  8. Graph theory and chemistry
     • Graph theory
         • Branch of mathematics that describes sets of objects, called nodes, and the relationships between them, called edges
     • A 2D connection table is a graph:
         • Nodes correspond to atoms
         • Edges correspond to bonds
     • Graph matching algorithms
         • Search chemical databases
         • Generation of other representations
     (Figure: example structure diagram containing O, Br and NH2 groups.)
  9. Types of search
     • Exact structure search (hashed connection table, with graph isomorphism for collision handling)
     • Substructure search (subgraph isomorphism)
         • cf. partial or Boolean matching in text
     • Similarity searching (maximal common subgraph isomorphism, or simpler measures)
         • cf. best-match search or web searching
     • Graph matching algorithms are effective
         • But worst-case running time is factorial in the number of nodes
         • Need for efficient heuristics
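
Substructure search as subgraph isomorphism can be illustrated with a plain backtracking matcher. This is a minimal sketch, not the algorithm used in any particular chemoinformatics system: the graph encoding (atom id mapped to element and a neighbour-to-bond-order dict), the function name `find_substructure` and the two toy molecules are all assumptions made for the example.

```python
def find_substructure(query, target):
    """Return one mapping {query atom: target atom}, or None if no match.

    Both graphs are dicts: atom id -> (element, {neighbour id: bond order}).
    The search is simple backtracking, so worst-case time grows factorially.
    """
    query_atoms = list(query)

    def extend(mapping):
        if len(mapping) == len(query_atoms):
            return dict(mapping)
        q = query_atoms[len(mapping)]
        q_element, q_neighbours = query[q]
        for t, (t_element, t_neighbours) in target.items():
            if t_element != q_element or t in mapping.values():
                continue
            # Every query bond to an already-mapped atom must exist in the
            # target between the corresponding atoms, with the same order.
            if all(t_neighbours.get(mapping[n]) == order
                   for n, order in q_neighbours.items() if n in mapping):
                mapping[q] = t
                found = extend(mapping)
                if found is not None:
                    return found
                del mapping[q]
        return None

    return extend({})


# Hypothetical toy data: heavy atoms of C-C(=O) as target, C=O as query.
TARGET = {
    1: ("C", {2: 1}),
    2: ("C", {1: 1, 3: 2}),
    3: ("O", {2: 2}),
}
CARBONYL = {
    1: ("C", {2: 2}),
    2: ("O", {1: 2}),
}

if __name__ == "__main__":
    print(find_substructure(CARBONYL, TARGET))
```

The matcher tries target atom 1 first, fails to attach the oxygen, backtracks, and then succeeds with target atom 2. That wasted work is what the heuristics mentioned on the slide aim to reduce.
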
  10. Fingerprints
      (Figure: example structure diagram.)
      • A fingerprint (or fragment bit-string) is a binary vector encoding the presence (“1”) or absence (“0”) of fragment substructures in a molecule
      • Each bit in the fingerprint represents one molecular fragment. Typical length is ~1000 bits
      • An approximate representation, but one that can be processed very efficiently and hence often used as a precursor to graph matching
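
A dictionary-based fingerprint and its use as a screen before graph matching might look like the sketch below. The six-entry `FRAGMENT_DICTIONARY` and the fragment strings are hypothetical stand-ins for a real dictionary of roughly 1000 fragments, and the function names are illustrative only.

```python
# Hypothetical fragment dictionary: bit i records whether fragment i occurs.
FRAGMENT_DICTIONARY = ["C-C", "C=C", "C=O", "C-O", "C-N", "C-Br"]


def fingerprint(fragments):
    """Encode a molecule's set of fragments as a list of 0/1 bits."""
    return [1 if fragment in fragments else 0
            for fragment in FRAGMENT_DICTIONARY]


def passes_screen(query_fp, target_fp):
    """A target can contain the query only if it sets every bit the query sets."""
    return all(t >= q for q, t in zip(query_fp, target_fp))


if __name__ == "__main__":
    query = fingerprint({"C=O"})
    molecule_a = fingerprint({"C-C", "C=C", "C=O"})
    molecule_b = fingerprint({"C-C", "C-N"})
    # Only molecule_a survives the screen and needs full graph matching.
    print(passes_screen(query, molecule_a), passes_screen(query, molecule_b))
```

Passing the screen is necessary but not sufficient for a substructure match, which is why the fingerprint acts as a fast filter ahead of the slower subgraph isomorphism step.
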
  11. Chemoinformatics facilities
      • Database searching as described previously
          • Structure and substructure searching originally
          • Similarity searching from mid-Eighties
          • 3D substructure searching from mid-Nineties (first rigid then flexible)
      • Applications
          • Database clustering
          • Molecular diversity analysis
          • Drug-likeness
          • Virtual screening
              • Ligand-based
              • Structure-based
  12. 3D substructure searching
      • Generation of pharmacophore patterns
      • Use of MOGA and hyperstructure approaches
      (Figure: pharmacophore pattern with distance constraints a = 8.62 ± 0.58 Å, b = 7.08 ± 0.56 Å, c = 3.35 ± 0.65 Å, shown with example structures.)
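
A rigid 3D search reduces to checking whether inter-point distances in a stored conformation fall inside the pattern's tolerances. The sketch below uses the three distance ranges quoted on the slide. Which pair of points each distance joins is not recoverable from the transcript, so the assignment of a, b and c to point pairs here is an assumption, as are the names `PATTERN` and `matches_pattern`.

```python
import math

# Distance constraints from the slide as (centre, tolerance) in Angstroms.
# Assumed assignment: a = d(p1, p2), b = d(p2, p3), c = d(p1, p3).
PATTERN = {"a": (8.62, 0.58), "b": (7.08, 0.56), "c": (3.35, 0.65)}


def matches_pattern(p1, p2, p3, pattern=PATTERN):
    """Return True if the three 3D points satisfy every distance constraint."""
    distances = {
        "a": math.dist(p1, p2),
        "b": math.dist(p2, p3),
        "c": math.dist(p1, p3),
    }
    return all(abs(distances[key] - centre) <= tolerance
               for key, (centre, tolerance) in pattern.items())


if __name__ == "__main__":
    # Hypothetical coordinates chosen to sit close to the centre of each range.
    print(matches_pattern((0.0, 0.0, 0.0), (8.62, 0.0, 0.0), (2.05, 2.65, 0.0)))
```

This checks a single rigid conformation. Flexible searching would need to consider the range of distances reachable across conformations, which this sketch does not attempt.
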
  13. Similarity searching using 2D fingerprints
      Use of data fusion methods to enhance performance, combining information from multiple searches
      (Figure: structure diagrams, one labelled "Query".)
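
One way to combine several fingerprint searches is to score each database molecule against every query and keep its best score. The sketch below assumes the Tanimoto coefficient as the similarity measure and a MAX fusion rule. Neither is named on the slide, so both are illustrative choices, as are the function names and the toy fingerprints.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient for fingerprints stored as sets of 'on' bit positions."""
    common = len(fp_a & fp_b)
    union = len(fp_a) + len(fp_b) - common
    return common / union if union else 0.0


def fused_ranking(queries, database):
    """Rank database molecules by their best similarity to any query (MAX rule)."""
    scores = {
        name: max(tanimoto(query, fp) for query in queries)
        for name, fp in database.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


if __name__ == "__main__":
    # Hypothetical fingerprints given as sets of bit positions.
    queries = [{1, 2, 3}, {7, 8}]
    database = {"m1": {1, 2, 3, 4}, "m2": {7, 8}, "m3": {5, 6}}
    print(fused_ranking(queries, database))
```

With the MAX rule a molecule ranks highly if it resembles any one of the queries, which suits cases where the known actives are structurally diverse.
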
  14. Molecular modelling and QSAR
      • Use of computational chemistry to obtain the structures and properties of small molecules
          • Quantum mechanics
          • Molecular dynamics
          • Molecular modelling
      • Statistical correlation of structure (however described) with physical, chemical and biological properties
          • Initially biological activity (QSAR)
          • Now pharmacokinetics and toxicity (ADMET)
  15. Integration with database searching
      • Related, but largely separate, research areas for many years
          • Simple search operations on very large numbers of molecules
          • Increasingly complex operations on smaller and smaller (normally homogeneous) datasets
          • Substructural analysis as an early, notable exception
      • The future lies in the integration of these two approaches, applying more sophisticated methods on larger datasets
          • Docking now well established
          • Property calculations at a database level
          • ADMET
  16. General references
      • J. Gasteiger (ed.), Handbook of Chemoinformatics (Wiley-VCH, Weinheim, 2003).
      • W.L. Chen, Chemoinformatics: past, present and future, Journal of Chemical Information and Modeling 46 (2006) 2230-2255.
      • D.J. Wild and G.D. Wiggins, Challenges for chemoinformatics education in drug discovery, Drug Discovery Today 11 (2006) 436-439.
      • A.R. Leach and V.J. Gillet, An Introduction to Chemoinformatics (Kluwer, Dordrecht, 2nd edition, 2007).
      • P. Willett, A bibliometric analysis of chemoinformatics, Aslib Proceedings 60 (2008) 4-17.