Chemoinformatics: a computational tool for pharmaceutical chemistry

by Matteo Floris

@seminaricrs42012: http://www.crs4.it/vale/workshop-c-2012

On Twitter: #crs4seminars2012

  1. Matteo Floris. Chemoinformatics: a computational tool for pharmaceutical chemistry. CRS4 seminar series 2012, 3 May 2012
  2. About me: Matteo Floris
      • Degree in C.T.F. (Pharmaceutical Chemistry and Technology), University of Padua
      • Master's in Bioinformatics, University of Cologne (Koeln)
      • PhD in Biochemistry, University of Rome "La Sapienza"
      • Chemoinformatics: development of methods for ligand-based drug design
      • Bioinformatics at CRS4 for 6 years (computational genomics)
      • matteo.floris@gmail.com
  3. Chemoinformatics or cheminformatics? "Chemoinformatics is a vast discipline, standing on the interface of chemistry, biology and computer science." (D. Agrafiotis, J&J)
  4. Background
  7. Background
      • Drug design: rational drug design (or simply rational design) is the search for new (potential!) drugs based on knowledge of a biological target.
      • Drug design often relies on computational modeling techniques (computer-aided drug design, CADD).
      • When the three-dimensional structure of the molecular target is known, the approach is called structure-based drug design.
  10. Background: ligand-based CADD vs structure-based CADD
      • Ligand-based CADD relies on knowledge of other molecules that are able to bind the biological target of interest. These molecules can be used to build a pharmacophore hypothesis that defines the minimum features required for the interaction. Alternatively, quantitative structure-activity relationship (QSAR) techniques look for a correlation between the physico-chemical properties of a molecule and its biological activity.
      • Structure-based CADD relies on knowledge of the structure of the biological target of interest, obtained by X-ray crystallography or NMR spectroscopy. If the target structure is not available, three-dimensional homology models can be built instead. Computational tools make it possible to estimate the affinity and selectivity of one or more compounds for the target.
  11. A virtual space odyssey. "One of the main goals in drug discovery is to identify and develop new ligands with high binding affinity towards a protein target. Today, there is increased reliance on computer-based tools […]. These help select molecules from the vast expanse of chemical space and aid optimization of compounds of interest into drugs." (Cath O'Driscoll, Nature, 2004)
  12. A real world odyssey
  13. The chemical universe
  15. The chemical universe. "Chemical space is the space spanned by all possible (i.e. energetically stable) molecules and chemical compounds, that is, all stoichiometric combinations of electrons and atomic nuclei, in all possible topology isomers. Chemical reactions allow us to move in chemical space. The mapping between chemical space and molecular properties is often not unique, meaning that there can be multiple molecules which exhibit the same properties."
  16. The chemical universe
      • CAS REGISTRY is the most authoritative collection of disclosed chemical substance information, containing more than 65 million organic and inorganic substances and 63 million sequences.
      • 67,370,815 commercially available chemicals in CAS.
      • PubChem: Pcsubstance contains about 85 million records; Pccompound contains nearly 30 million unique structures; PCBioAssay contains more than 585,000 BioAssays, each with a varying number of data points.
  17. The chemical universe. GDB-13 enumerates small organic molecules of up to 13 atoms of C, N, O, S and Cl, following simple chemical stability and synthetic feasibility rules. With 977,468,314 structures, GDB-13 is the largest publicly available small organic molecule database to date.
  18. The chemical universe
  19. The chemical universe: 150 possible substituents, from mono- to 14-fold substitution, give 10^29 theoretical derivatives.
  21. The chemical universe. "Navigating chemical space for biology and medicine", Christopher Lipinski & Andrew Hopkins, Nature 432, 855–861 (16 December 2004), doi:10.1038/nature03193: "Despite over a century of applying organic synthesis to the search for drugs, we are still far from even a cursory examination of the vast number of possible small molecules that could be created. Indeed, a thorough examination of all ‘chemical space’ is practically impossible. Given this, what are the best strategies for identifying small molecules that modulate biological targets?" Salvarsan (also arsphenamine, or "606") is a drug used to treat syphilis and African trypanosomiasis. It was the first known chemotherapeutic agent.
  22. The chemical universe
  23. The chemical universe
  24. Trust, but verify. "Many scientists TRUST chemistry and biology databases that are so often reused, reanalyzed and integrated with new cheminformatics or bioinformatics tools. The authors of such articles do not appear to analyze for problems caused by poor DATA QUALITY or hypotheses that are incorrect due to poor underlying data." (Antony Williams, ChemSpider)
  25. Representing molecules: MOLECULES (real objects) → MOLECULE REPRESENTATIONS (models) → MOLECULAR DESCRIPTORS (information)
  26. Representing molecules: chemical table file (MDL molfile, V2000) for benzene

    benzene


      6  6  0  0  0  0  0  0  0  0  1 V2000
        1.9050   -0.7932    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
        1.9050   -2.1232    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
        0.7531   -0.1282    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
        0.7531   -2.7882    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
       -0.3987   -0.7932    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
       -0.3987   -2.1232    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
      2  1  1  0  0  0  0
      3  1  2  0  0  0  0
      4  2  2  0  0  0  0
      5  3  1  0  0  0  0
      6  4  1  0  0  0  0
      6  5  2  0  0  0  0
    M  END
    $$$$
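The benzene molfile on this slide uses a fixed-width layout. The counts line gives the number of atoms and bonds. Each atom line holds x, y, z and the element symbol, and each bond line holds two 1-based atom indices and a bond order.

The sketch below reads that layout using only the Python standard library. It assumes a single V2000 record whose header lines 2 and 3 are blank. `parse_v2000` and `Atom` are illustrative names, not part of any toolkit. V3000 files, property blocks and SD data fields are ignored.

```python
from dataclasses import dataclass

# Benzene molfile from the slide, with the fixed-width V2000 columns restored
# (header lines 2 and 3 are assumed to be blank).
BENZENE_MOLFILE = "\n".join([
    "benzene",
    "",
    "",
    "  6  6  0  0  0  0  0  0  0  0  1 V2000",
    "    1.9050   -0.7932    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0",
    "    1.9050   -2.1232    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0",
    "    0.7531   -0.1282    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0",
    "    0.7531   -2.7882    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0",
    "   -0.3987   -0.7932    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0",
    "   -0.3987   -2.1232    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0",
    "  2  1  1  0  0  0  0",
    "  3  1  2  0  0  0  0",
    "  4  2  2  0  0  0  0",
    "  5  3  1  0  0  0  0",
    "  6  4  1  0  0  0  0",
    "  6  5  2  0  0  0  0",
    "M  END",
])


@dataclass
class Atom:
    symbol: str
    x: float
    y: float
    z: float


def parse_v2000(text):
    """Return (name, atoms, bonds); bonds are (atom1, atom2, order) with 1-based indices."""
    lines = text.splitlines()
    counts = lines[3]
    if not counts.rstrip().endswith("V2000"):
        raise ValueError("only V2000 connection tables are handled here")
    n_atoms, n_bonds = int(counts[0:3]), int(counts[3:6])  # first two 3-character fields
    atoms = []
    for line in lines[4:4 + n_atoms]:
        # x, y, z are three 10-character columns; the symbol occupies columns 32-34
        atoms.append(Atom(line[31:34].strip(),
                          float(line[0:10]), float(line[10:20]), float(line[20:30])))
    bonds = []
    for line in lines[4 + n_atoms:4 + n_atoms + n_bonds]:
        bonds.append((int(line[0:3]), int(line[3:6]), int(line[6:9])))
    return lines[0], atoms, bonds


name, atoms, bonds = parse_v2000(BENZENE_MOLFILE)
print(name, len(atoms), "atoms,", len(bonds), "bonds")
```

Slicing by column, rather than splitting on whitespace, follows the format's fixed-width definition. Splitting on whitespace breaks once counts or coordinates run into each other.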
  27. Representing molecules: SMILES®. Benzene: c1ccccc1; methane: C; ethyne: C#C; sildenafil citrate (Viagra): OC(=O)CC(O)(CC(O)=O)C(O)=O.CCCc1nn(C)c2c1nc([nH]c2=O)-c1cc(ccc1OCC)S(=O)(=O)N1CCN(C)CC1
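The SMILES strings on this slide can be split into tokens with a single regular expression. The sketch below is a simplification covering common SMILES, not a full OpenSMILES parser. It does three things:

- tokenizes a SMILES string;
- counts the explicitly written atoms (implicit hydrogens are not counted);
- keeps the largest component, since "." separates disconnected components. This strips the citrate counterion from sildenafil citrate.

The function names are hypothetical.

```python
import re

# One token: bracket atom, organic-subset atom (Br/Cl first), aromatic atom,
# two-digit ring closure (%nn), ring-closure digit, or bond/branch/dot symbol.
SMILES_TOKEN = re.compile(r"\[[^\]]+\]|Br|Cl|[BCNOPSFI]|[bcnops]|%\d{2}|\d|[()=#\-+\\/:~@?>*$.]")
ATOM_TOKEN = re.compile(r"\[[^\]]+\]|Br|Cl|[BCNOPSFI]|[bcnops]|\*")

SILDENAFIL_CITRATE = ("OC(=O)CC(O)(CC(O)=O)C(O)=O."
                      "CCCc1nn(C)c2c1nc([nH]c2=O)-c1cc(ccc1OCC)S(=O)(=O)N1CCN(C)CC1")


def tokenize(smiles):
    tokens = SMILES_TOKEN.findall(smiles)
    if "".join(tokens) != smiles:  # something was skipped: unsupported syntax
        raise ValueError(f"unsupported characters in {smiles!r}")
    return tokens


def atom_count(smiles):
    """Atoms written explicitly in the string (implicit hydrogens are not counted)."""
    return sum(1 for token in tokenize(smiles) if ATOM_TOKEN.fullmatch(token))


def largest_fragment(smiles):
    """Keep the component with the most atoms, e.g. drop a salt's counterion."""
    return max(smiles.split("."), key=atom_count)


print(tokenize("c1ccccc1"))
print(largest_fragment(SILDENAFIL_CITRATE))
```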
  28. Representing molecules: InChI. The IUPAC International Chemical Identifier (InChI) is a non-proprietary identifier for chemical substances that can be used in printed and electronic data sources, thus enabling easier linking of diverse data compilations. http://www.inchi-trust.org/
  29. Representing molecules. InChI is short for International Chemical Identifier. InChIs are text strings comprising different layers and sublayers of information separated by slashes (/). Each InChI string starts with the InChI version number, followed by the main layer. This main layer contains sublayers for chemical formula, atom connections and hydrogen atoms. Depending on the structure of the molecule, the main layer may be followed by additional layers, e.g. for charge, stereochemical and/or isotopic information. Example (benzene): InChI=1S/C6H6/c1-2-4-6-5-3-1/h1-6H
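Because the layers are separated by slashes, splitting an InChI into its parts is straightforward. The minimal sketch below assumes that every layer after the formula starts with a one-letter prefix, as the `c` (connections) and `h` (hydrogens) layers of the benzene example do.

It keeps the layers as an ordered list of (prefix, value) pairs, so repeated prefixes would not overwrite each other. `parse_inchi` is an illustrative name, not part of the InChI Trust software.

```python
def parse_inchi(inchi):
    """Split an InChI into (version, formula, [(prefix, value), ...])."""
    prefix = "InChI="
    if not inchi.startswith(prefix):
        raise ValueError("not an InChI string")
    version, *rest = inchi[len(prefix):].split("/")
    formula = rest[0] if rest else ""
    # Assumption: each later layer begins with a single-letter prefix (c, h, q, t, ...).
    layers = [(part[0], part[1:]) for part in rest[1:] if part]
    return version, formula, layers


version, formula, layers = parse_inchi("InChI=1S/C6H6/c1-2-4-6-5-3-1/h1-6H")
print(version, formula, layers)
```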
  30. Representing molecules
  31. Representing molecules. Abstractions: graphs; abstract graphs; Markush structures; descriptors (numerical representations); fingerprints (binary representations).
  32. Molecular descriptors. "The molecular descriptor is the final result of a logic and mathematical procedure which transforms chemical information encoded within a symbolic representation of a molecule into a useful number or the result of some standardized experiment. The field of molecular descriptors is strongly interdisciplinary and involves a mass of different theories. For the definition of molecular descriptors, a knowledge of algebra, graph theory, information theory, computational chemistry, theories of organic reactivity and physical chemistry is usually required, although at different levels. For the use of the molecular descriptors, a knowledge of statistics, chemometrics, and the principles of the QSAR/QSPR approaches is necessary in addition to the specific knowledge of the problem. Moreover, programming, sophisticated software and hardware are often inseparable fellow-travelers of the researcher in this field." From the introduction to the "Handbook of Molecular Descriptors" by Roberto Todeschini and Viviana Consonni, Wiley-VCH, 2000.
  33. Molecular descriptors. The main classes of theoretical molecular descriptors are:
      • 0D descriptors (e.g. constitutional descriptors, count descriptors)
      • 1D descriptors (e.g. lists of structural fragments, fingerprints)
      • 2D descriptors (e.g. graph invariants)
      • 3D descriptors (e.g. 3D-MoRSE, WHIM and GETAWAY descriptors, quantum-chemical descriptors, size, steric, surface and volume descriptors)
      • 4D descriptors (e.g. those derived from GRID or CoMFA methods, Volsurf)
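0D (constitutional) descriptors need nothing more than the molecular formula. As a small illustration, the sketch below computes the molecular weight and a few atom counts from a simple formula such as C6H6 or C22H30N6O4S (sildenafil). It assumes the formula has no charges or isotopes.

The atomic-weight table covers only a few common elements, and the function names are hypothetical.

```python
import re

# Standard atomic weights (g/mol) for a few common elements only.
ATOMIC_WEIGHT = {"H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999, "F": 18.998,
                 "P": 30.974, "S": 32.06, "Cl": 35.45, "Br": 79.904, "I": 126.904}


def element_counts(formula):
    counts = {}
    for symbol, n in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[symbol] = counts.get(symbol, 0) + (int(n) if n else 1)
    return counts


def constitutional_descriptors(formula):
    counts = element_counts(formula)
    return {
        "molecular_weight": sum(ATOMIC_WEIGHT[el] * n for el, n in counts.items()),
        "heavy_atoms": sum(n for el, n in counts.items() if el != "H"),
        "n_nitrogen": counts.get("N", 0),
    }


print(constitutional_descriptors("C22H30N6O4S"))
```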
  35. QSAR. More than a century ago, Crum-Brown and Fraser expressed the idea that the physiological action of a substance in a certain biological system (A) was a function (f) of its chemical constitution (C): $A = f(C)$. To explain the complex relationships between molecules and observed quantities, two main streams were developed: the first related to the search for relationships between molecular structures and physico-chemical properties (QSPR, Quantitative Structure-Property Relationships), and the second between molecular structures and biological activities (QSAR, Quantitative Structure-Activity Relationships).
  36. QSAR. There is a consensus among current predictive toxicologists that Corwin Hansch is the founder of modern QSAR. His classic article illustrated that, in general, the biological activity of a group of 'congeneric' chemicals can be described by a comprehensive model:

      $$\log(1/C) = a\pi + b\varepsilon + cS + d$$

      in which C, the toxicant concentration at which an endpoint is manifested (e.g. 50% mortality or effect, hence C50), is related to a hydrophobicity term, $\pi$, an electronic term, $\varepsilon$, and a steric term, $S$ (typically Taft's substituent constant, $E_s$). The coefficients a, b, c and d are fitted by regression.
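The coefficients a, b, c and d of the Hansch equation are obtained by multiple linear regression over a congeneric series. The sketch below fits them by ordinary least squares in plain Python: it builds the normal equations and solves them with Gauss-Jordan elimination.

The descriptor values are hypothetical. The log(1/C) values are generated from known coefficients, so the example only shows that the fit recovers them.

```python
def solve_linear(a, b):
    """Gauss-Jordan elimination with partial pivoting for a small dense system."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("descriptors are collinear; the fit is not unique")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for k in range(col, n + 1):
                    m[r][k] -= factor * m[col][k]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_hansch(descriptors, log_inv_c):
    """Least-squares fit of log(1/C) = a*pi + b*eps + c*S + d; returns [a, b, c, d]."""
    x = [list(row) + [1.0] for row in descriptors]  # trailing 1.0 gives the intercept d
    p = len(x[0])
    xtx = [[sum(r[i] * r[j] for r in x) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(x, log_inv_c)) for i in range(p)]
    return solve_linear(xtx, xty)


# Hypothetical (pi, eps, S) values for six congeners.
DESCRIPTORS = [(0.00, 0.00, 0.00), (0.56, -0.17, -1.24), (0.71, 0.23, -0.97),
               (0.86, 0.23, -1.16), (-0.67, -0.37, -0.55), (1.02, 0.54, -2.40)]
TRUE_COEFFS = (0.9, 1.2, 0.4, 2.0)  # made-up a, b, c, d used to generate the activities
LOG_INV_C = [TRUE_COEFFS[0] * p + TRUE_COEFFS[1] * e + TRUE_COEFFS[2] * s + TRUE_COEFFS[3]
             for p, e, s in DESCRIPTORS]

a, b, c, d = fit_hansch(DESCRIPTORS, LOG_INV_C)
print(f"log(1/C) = {a:.2f} pi + {b:.2f} eps + {c:.2f} S + {d:.2f}")
```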
  37. Computational libraries: CDK, Openbabel, CACTVS, RDKit, SVL/MOE
  38. Computational libraries: CDK
      • Web: cdk.sf.net
      • Language: Java (Jython, Groovy)
      • GUI: n/a
      • Pros: LGPL license, Jmol
      • Cons: for programmers only
      • Related projects: AMBIT, Bioclipse, CDK Taverna, CDKDescUI, Evince, HyperDossier, JChemPaint, JOELib, Jumbo, KNIME CDK feature, LICSS, NMRShiftDB, Nomen, PaDEL, QueryConstructor, rcdk, SafeBase(TM), Scaffold Hunter, SENECA, SmileMS, Jmol; obsolete: XB Edit (working title)
  39. Computational libraries: Openbabel
      • Web: openbabel.org
      • Language: C++, with Python/Java/Perl bindings
      • GUI: yes!
      • Pros: flexibility
      • Cons: none listed
  40. Computational libraries: CACTVS
      • Web: xemistry.com
      • Language: Tcl
      • GUI: commercial (paid)
      • Pros: free for academics, team
      • Cons: Tcl
  41. Computational libraries: RDKit
      • Web: www.rdkit.org/
      • Language: Python, C++
      • GUI: n/a
      • Pros: SMIRKS, team
      • Cons: installation, age
  42. Algorithms: similarity search
  43. Algorithms: similarity search
  44. Algorithms: similarity search. Similarity measures (calculations that quantify the similarity of two molecules) and screening (a way of rapidly eliminating molecules as candidates in a substructure search) are both processes that use fingerprints. Fingerprints are a very abstract representation of certain structural features of a molecule.
  45. Algorithms: similarity search. Structural keys:
      • The presence/absence of each element; if an element is common (nitrogen, for example), several bits might represent "at least 1 N", "at least 2 N", "at least 4 N", and so forth.
      • Unusual or important electronic configurations, such as "sp3 carbon" or "triple-bonded nitrogen".
      • Rings and ring systems, such as cyclohexane, pyridine, or naphthalene.
      • Common functional groups, such as alcohols, amines, hydrocarbons, and so forth.
      • Functional groups of special importance in a particular database. For example, a database of organometallic molecules might have bits assigned for metal-containing functional groups; in a drug database one might have bits for specific skeletal features such as steroids and barbiturates.
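The element-count keys in the first bullet, together with the screening use of fingerprints described on the previous slide, can be sketched with a Python integer used as a bit set. The key list here is hypothetical and covers only "at least n atoms of element X". Ring, electronic and functional-group keys would need substructure matching, which is not shown.

A query passes the screen only if every key it sets is also set in the target. Passing does not prove the substructure is present; it only means the full atom-by-atom match still has to be run.

```python
# Hypothetical key dictionary: bit i means "at least n atoms of this element".
KEYS = [("C", 1), ("N", 1), ("N", 2), ("N", 4), ("O", 1), ("O", 2), ("S", 1), ("Cl", 1)]


def structural_keys(element_counts):
    bits = 0
    for position, (element, minimum) in enumerate(KEYS):
        if element_counts.get(element, 0) >= minimum:
            bits |= 1 << position
    return bits


def passes_screen(query_bits, target_bits):
    # A target can contain the query only if it sets every key the query sets.
    return (query_bits & target_bits) == query_bits


sildenafil = structural_keys({"C": 22, "H": 30, "N": 6, "O": 4, "S": 1})
pyridine = structural_keys({"C": 5, "H": 5, "N": 1})
chlorobenzene = structural_keys({"C": 6, "H": 5, "Cl": 1})
print(passes_screen(pyridine, sildenafil), passes_screen(chlorobenzene, sildenafil))
```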
  46. Algorithms: similarity search. For example, the molecule OC=CN would generate the following patterns:
      • 0-bond paths: C, O, N
      • 1-bond paths: OC, C=C, CN
      • 2-bond paths: OC=C, C=CN
      • 3-bond paths: OC=CN
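A path-based fingerprint in the spirit of this slide can be sketched in two steps:

1. Enumerate every linear path of 0 to 3 bonds.
2. Hash each path string into a fixed-length bit vector.

Each path is spelled in whichever direction gives the lexicographically smaller string. This stores a path once, whichever end the walk starts from, which is why the sketch writes OC=C as C=CO and OC=CN as NC=CO. The 64-bit length and SHA-1 hashing are arbitrary choices for illustration, not those of any specific fingerprint.

```python
import hashlib

BOND_SYMBOL = {1: "", 2: "=", 3: "#"}  # single bonds are left implicit, as in SMILES


def path_strings(atoms, bonds, max_bonds=3):
    """All linear paths of 0..max_bonds bonds, each spelled in a canonical direction."""
    neighbors = {i: [] for i in range(len(atoms))}
    for (i, j), order in bonds.items():
        neighbors[i].append((j, order))
        neighbors[j].append((i, order))

    def spell(path, orders):
        text = atoms[path[0]]
        for atom, order in zip(path[1:], orders):
            text += BOND_SYMBOL[order] + atoms[atom]
        return text

    found = set()

    def extend(path, orders):
        # The same path is reached from both ends; keep the smaller spelling.
        found.add(min(spell(path, orders), spell(path[::-1], orders[::-1])))
        if len(orders) == max_bonds:
            return
        for nxt, order in neighbors[path[-1]]:
            if nxt not in path:  # linear paths only: no atom is revisited
                extend(path + [nxt], orders + [order])

    for start in range(len(atoms)):
        extend([start], [])
    return found


def path_fingerprint(paths, n_bits=64):
    bits = 0
    for p in paths:
        # A stable hash (Python's built-in hash() of str is salted per process).
        index = int.from_bytes(hashlib.sha1(p.encode()).digest()[:4], "big") % n_bits
        bits |= 1 << index
    return bits


# OC=CN: atom 0 = O, 1 = C, 2 = C, 3 = N; bonds map an atom pair to a bond order.
ATOMS = ["O", "C", "C", "N"]
BONDS = {(0, 1): 1, (1, 2): 2, (2, 3): 1}
paths = path_strings(ATOMS, BONDS)
print(sorted(paths))
print(f"{path_fingerprint(paths):064b}")
```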
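  47. Algorithms: similarity search. (Figure: two example bit-string fingerprints; the individual bits are not recoverable from the extracted text.) Tanimoto index: $T = \frac{c}{a + b + c}$, where c is the number of bits set in both fingerprints, and a and b are the numbers of bits set in only one of the two.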
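With fingerprints stored as Python integers, the Tanimoto index follows directly from the three bit counts in the formula. The two 8-bit fingerprints below are made up for the example. Returning 0.0 for two empty fingerprints is a convention assumed here, since the formula is undefined in that case.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto index c / (a + b + c) for fingerprints stored as Python ints."""
    c = bin(fp_a & fp_b).count("1")    # bits set in both
    a = bin(fp_a & ~fp_b).count("1")   # bits set only in A
    b = bin(fp_b & ~fp_a).count("1")   # bits set only in B
    total = a + b + c
    return c / total if total else 0.0  # assumed convention for two empty fingerprints


fp1 = int("10110100", 2)
fp2 = int("10011100", 2)
print(tanimoto(fp1, fp2))
```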
  48. Algorithms: substructure search
  49. Public databases: PubChem, ChEMBL, ZINC, various vendors, BindingDB
  50. Chemical libraries. Issues: registration, uniqueness. Tools:
      1. filtering
      2. normalization
      3. tautomer generation
      4. ionization states
      5. de-duplication (uniqueness)
      6. conformer generation
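Two of the steps in this list, filtering (1) and uniqueness (5), can be sketched without a chemistry toolkit. This assumes each record already carries a standard InChI, computed by a toolkit after normalization. The element whitelist, vendor IDs and record layout are hypothetical, and only the formula layer of each InChI is inspected. Tautomer and ionization-state enumeration and conformer generation need a real toolkit and are not shown.

```python
import re

# Elements accepted by the (hypothetical) filtering step.
ALLOWED = {"C", "H", "N", "O", "S", "P", "F", "Cl", "Br", "I"}


def elements_of(inchi):
    formula = inchi.split("/")[1]  # formula layer; components are separated by "."
    return set(re.findall(r"[A-Z][a-z]?", formula))


def prepare_library(records):
    """records: (vendor_id, standard InChI) pairs; returns kept, filtered, duplicate IDs."""
    seen, kept, filtered, duplicates = set(), [], [], []
    for vendor_id, inchi in records:
        if not elements_of(inchi) <= ALLOWED:  # step 1: filtering
            filtered.append(vendor_id)
        elif inchi in seen:                    # step 5: uniqueness
            duplicates.append(vendor_id)
        else:
            seen.add(inchi)
            kept.append(vendor_id)
    return kept, filtered, duplicates


RECORDS = [
    ("vendorA-1", "InChI=1S/C6H6/c1-2-4-6-5-3-1/h1-6H"),  # benzene
    ("vendorA-2", "InChI=1S/CH4/h1H4"),                   # methane
    ("vendorB-7", "InChI=1S/C6H6/c1-2-4-6-5-3-1/h1-6H"),  # benzene again
    ("vendorB-9", "InChI=1S/ClH.Na/h1H;/q;+1/p-1"),       # sodium chloride
    ("vendorC-3", "InChI=1S/C2H2/c1-2/h1-2H"),            # ethyne
]
kept, filtered, duplicates = prepare_library(RECORDS)
print(kept, filtered, duplicates)
```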
  51. MMsINC 1.0
      • 3,967,056 total compounds
      • 3,297,001 parent compounds
      • 449,482 ionic states
      • 220,573 tautomers
      • 283,464,647 conformers (about 30 conformers/molecule), ordered by empirical E-pot (potential energy); at most 5 conformers/molecule kept (about 4.6 conformers per compound on average)
      • Final number of conformers: 18,461,878 (for which ph4-FP and USR descriptors are available)
      Fanton et al, IEEE, 2008; Masciocchi et al, Nucleic Acids Research, 2009
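This slide mentions USR (Ultrafast Shape Recognition) descriptors. The idea is to describe a conformer's shape by the distributions of atomic distances from four reference points:

- the centroid (ctd);
- the atom closest to the centroid (cst);
- the atom farthest from the centroid (fct);
- the atom farthest from fct (ftf).

Three moments per distribution give 12 numbers. Similarity is 1/(1 + mean absolute difference of the 12 descriptors).

This sketch assumes the three moments are the mean, the standard deviation and the signed cube root of the third central moment; published implementations differ in this detail. The coordinates are hypothetical.

```python
import math


def _distance(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def _moments(values):
    # Assumed moment set: mean, standard deviation, signed cube root of the
    # third central moment (implementations differ in this detail).
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / n
    third = sum((v - mean) ** 3 for v in values) / n
    return [mean, math.sqrt(variance), math.copysign(abs(third) ** (1 / 3), third)]


def usr_descriptor(coords):
    """12 shape descriptors from atom distances to four reference points."""
    n = len(coords)
    ctd = tuple(sum(atom[k] for atom in coords) / n for k in range(3))  # centroid
    d_ctd = [_distance(atom, ctd) for atom in coords]
    cst = coords[d_ctd.index(min(d_ctd))]  # atom closest to the centroid
    fct = coords[d_ctd.index(max(d_ctd))]  # atom farthest from the centroid
    d_fct = [_distance(atom, fct) for atom in coords]
    ftf = coords[d_fct.index(max(d_fct))]  # atom farthest from fct
    descriptor = []
    for ref in (ctd, cst, fct, ftf):
        descriptor.extend(_moments([_distance(atom, ref) for atom in coords]))
    return descriptor


def usr_similarity(d1, d2):
    """1 / (1 + mean absolute difference between two descriptor vectors)."""
    return 1.0 / (1.0 + sum(abs(a - b) for a, b in zip(d1, d2)) / len(d1))


# Hypothetical heavy-atom coordinates (angstrom) of one conformer, a translated
# copy of it, and a differently folded conformer.
CONFORMER = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (2.2, 1.3, 0.0), (3.7, 1.4, 0.5), (4.1, 2.8, 1.1)]
SHIFTED = [(x + 1.0, y - 2.0, z + 0.5) for x, y, z in CONFORMER]
FOLDED = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (2.2, 1.3, 0.0), (1.4, 2.5, 0.3), (0.1, 2.2, 0.6)]

reference = usr_descriptor(CONFORMER)
print(usr_similarity(reference, usr_descriptor(SHIFTED)))
print(usr_similarity(reference, usr_descriptor(FOLDED)))
```

Because the descriptor uses only interatomic distances, translating or rotating a conformer leaves it unchanged. No alignment step is needed, which is what makes this kind of shape screening fast.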
  52. MMsINC 2.0
      • 92,355,744 compounds from 65 public data sources and commercial catalogs
      • 71,206,303 after single-vendor-based cleaning
      • 42,073,344 unique compounds after redundancy washing
      • 40 M alternative tautomers
      • 5 M ionic states
      • Expected number of conformers: about 220 M
      • Average intra-vendor redundancy: 14%
      • 10 vendors with redundancy above 40%!
      • 4 vendors with 0% redundancy (small sets, 100 to 2,000 compounds)
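Intra-vendor redundancy can be expressed as the fraction of a catalog's entries whose normalized structure key already occurs in the same catalog. The slide does not say how its 14% average was computed. This sketch assumes an unweighted mean of per-vendor rates, and the catalogs and keys are invented.

```python
def redundancy(keys):
    """Fraction of a vendor's entries that repeat another entry of the same vendor."""
    if not keys:
        return 0.0
    return 1.0 - len(set(keys)) / len(keys)


# Hypothetical catalogs: each entry is a normalized structure key (e.g. an InChI).
CATALOGS = {
    "vendor_x": ["k1", "k2", "k2", "k3", "k3", "k3"],
    "vendor_y": ["k1", "k4", "k5", "k6"],
}
per_vendor = {name: redundancy(keys) for name, keys in CATALOGS.items()}
average = sum(per_vendor.values()) / len(per_vendor)  # unweighted mean (assumption)
print({name: f"{value:.0%}" for name, value in per_vendor.items()}, f"average {average:.0%}")
```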
  53. Chemical redundancy
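  54. The impact of tautomerism. (Bar chart, y axis 0 to 250,000 pairs. Series: total taut/neu pairs; pairs with different predictions; pairs with different AD (applicability domain); pairs with both different prediction and different AD. Endpoints: Skin, DevTox, LC50DM, LC50FM, Carcinogenicity, Mutagenicity, BCF.)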
  55. Mimicking peptides... in silico
  56. Mimicking peptides... in silico. Floris et al, Nucleic Acids Research, 2011; Floris M and Moro S, Molecular Informatics, 2011
  57. Mimicking peptides... in silico
  58. Mimicking peptides... in silico
  59. Large-scale pharmacophore screening
      • 2 minutes to screen 1 pharmacophore (ph4) model against 17 M conformers (4 M commercial compounds) on the CRS4 cluster
      • Output: an SDF file with the top commercial compounds, i.e. those with the highest overlap with the original pharmacophore hypothesis
      • Multiple simultaneous screenings and parameter tuning are possible in a reasonable time
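Taking the slide's figures at face value, and assuming the 2 minutes is wall-clock time on the whole cluster for one model, the implied throughput is:

$$
\frac{17 \times 10^{6}\ \text{conformers}}{120\ \text{s}} \approx 1.4 \times 10^{5}\ \text{conformers/s},
\qquad
\frac{4 \times 10^{6}\ \text{compounds}}{120\ \text{s}} \approx 3.3 \times 10^{4}\ \text{compounds/s}.
$$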
  60. The chemoinformatician's toolbox
      • Python, Java
      • R, Weka
      • Openbabel, CDK
      • Marvin Beans
      • a personal database
      • the Blue Obelisk
  61. The importance of a healthy working environment
  62. Acknowledgements
      • Alessandro Bulfone
      • Prof Stefano Moro
      • Silvana Urru, Andrea Cristiani, Ricardo Medda, Stefania Olla
      • colleagues in CRS4 Outreach
      • colleagues at CNR (IRGB-CNR, Prof F. Cucca)
      • Marco Fanton, Mattia Sturlese, Fabian Cedrati, Davide Sabbadin
      • all other collaborators: Alberto Manganaro, Emilio Benfenati, colleagues of the ministerial QSAR-Reach group, colleagues from the Blue Obelisk, the TNBC group
      • my family (Lolli, Ric, Vera, assorted grandparents, various sisters)
      matteo.floris@gmail.com
