How am I supposed to organize a protein database when I can't even organize my address book?

Presented at the Spring 2012 ACS National Meeting in San Diego, at the CINF flash session.


  1. How am I supposed to organize a protein database when I can't even organize my address book? Jeremy Yang, UNM & IU. CINF Flash session, ACS National Meeting, March 25, 2012, San Diego, CA
  2. Alternate title (and take-home message): Cheminformatics is so great! But is it too good to be (transferably) true?
  3. How great is cheminformatics? Example: Are these the same or different molecules?
  4. How great is cheminformatics? Example: Are these the same or different molecules? Answer: Same, that’s easy, just use a canonical graph algorithm via canonical SMILES: CNC1C(O)C(O)C(CO)OC1OC2C(OC(C)C2(O)C=O)OC7C(O)C(O)C(NC(=N)NCNC(=O)C4=C(O)C(C3CC6C(=C(O)C3(O)C4=O)C(=O)c5c(O)cccc5C6(C)O)N(C)C)C(O)C7NC(N)=N (TETRACYCLINOMETHYLSTREPTOMYCIN)
  5. Thanks to…? ?
  6. Thanks to… Harry Morgan, actor, “MASH” (Hmmm…?); Dave Weininger, Daylight (SMILES)
  7. Thanks to… Harry Morgan, ACS CAS (Morgan Algorithm); Dave Weininger, Daylight (SMILES); et al., et al….
  8. Now about those proteins… Example: Are these the same or different proteins?
     1YIN: ALSLTADQMVSALLDAEPPILYSEYDPTRPFSEASMMGLLTNLADRELVHMINWAKRVPGFVDLTLHDQVHLLECAWLEILMIGLVWRSMEHPGKLLFAPNLLLDRNQGKCVEGMVEIFDMLLATSSRFRMMNLQGEEFVCLKSIILLNSGVYTFLSSTLKSLEEKDHIHRVLDKITDTLIHLMAKAGLTLQQQHQRLAQLLLILSHIRHMSNKGMEHLYSMKCKNVVPLYDLLLEMLDAHRLHAPTS
     3OS8: SNAKRSKKNSLALSLTADQMVSALLDAEPPILYSEYDPTRPFSEASMMGLLTNLADRELVHMINWAKRVPGFVDLTRHDQVHLLECAWLEILMIGLVWRSMEHPGKLLFAPNLLLDRNQGKCVEGMVEIFDMLLATSSRFRMMNLQGEEFVCLKSIILLNSGVYTFLSSTLKSLEEKDHIHRVLDKITDTLIHLMAKAGLTLQQQHQRLAQLLLILSHIRHMSNKGMEHLYSMKCKNVVPSYDLLLEMLDAHRLHAPT
     PAM250 alignment score (gap: -3; extend: -10): 1156 (1156/1260 = 92%). Ergo, um… Maybe.
  9. Now about those proteins… Example: Are these the same or different proteins? Answer: Same… but what does that even mean?
  10. Why protein identification is hard: Proteins are large, complex, dynamic. PDB is a database of crystallography experiments, not molecules. Ligands, co-crystals, waters. Protein crystallography & NMR is hard. History, culture…
  11. How about human identification? (Should be easier, may shed light…)
  12. Human identification hard too, apparently… Credit card fraud. Homeland security. http://forms.cybersource.com/forms/NAFRDQ12012whitepaperFraudReport2012CYBSwww2012
  13. (Which brings us to…) My address book problems. How many Rob Yangs?
  14. (Philosophical tangent:) Are human entities actually identifiable? One Harry Morgan or two? How can we know?
  15. (Philosophical tangent:) Are human entities actually identifiable? Individuality may be contextual. One Harry Morgan or two? How can we know?
  16. Could I organize my address book using cheminformatics? What would the algorithm look like?
  17. Conclusions: “CINF” (cheminformatics) is awesome. But some CINF-awesomeness is not readily transferable to other domains. Cannot automate logic if not logical (How many Harry Morgans?). Perhaps CINF-awesomeness can be used as an indexing approach for chem-related domains. “Chester”
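
Slides 4 and 7 credit the "same molecule?" answer to canonical graph numbering. As a rough illustration only, the following Python sketch applies a Morgan-style extended-connectivity refinement to a toy molecular graph and compares two different atom numberings. The function names, the input layout (element list plus bond pairs), and the toy molecules are assumptions made for this sketch, not anything from the slides. Matching signatures are a necessary condition for identity, not a proof of it; a real canonical SMILES generator does considerably more (tie-breaking, bond orders, stereochemistry, aromaticity).

```python
def extended_connectivity(adjacency):
    """Morgan-style refinement: start from atom degrees, then repeatedly
    replace each value with the sum of its neighbours' values, stopping
    when the number of distinct values no longer increases."""
    values = {atom: len(nbrs) for atom, nbrs in adjacency.items()}
    n_classes = len(set(values.values()))
    while True:
        new_values = {
            atom: sum(values[n] for n in nbrs)
            for atom, nbrs in adjacency.items()
        }
        new_n = len(set(new_values.values()))
        if new_n <= n_classes:
            return values  # keep the last assignment that improved resolution
        values, n_classes = new_values, new_n


def signature(elements, bonds):
    """Numbering-independent summary: sorted (element, connectivity value) pairs.
    Hypothetical helper for this sketch; equal signatures do not prove identity."""
    adjacency = {i: [] for i in range(len(elements))}
    for a, b in bonds:
        adjacency[a].append(b)
        adjacency[b].append(a)
    values = extended_connectivity(adjacency)
    return sorted((elements[i], values[i]) for i in adjacency)


# Ethanol heavy atoms (C-C-O), numbered two different ways.
ethanol_a = signature(["C", "C", "O"], [(0, 1), (1, 2)])
ethanol_b = signature(["O", "C", "C"], [(0, 1), (1, 2)])
# Dimethyl ether heavy atoms (C-O-C): same formula, different connectivity.
ether = signature(["C", "O", "C"], [(0, 1), (1, 2)])

print(ethanol_a == ethanol_b)  # same molecule, different numbering
print(ethanol_a == ether)      # different molecules
```

The point of the sketch is that the comparison never depends on how the atoms happened to be numbered, which is the property the talk calls "CINF-awesomeness".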

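
Slide 8's "Ergo, um… Maybe" can be made concrete with a much cruder comparison than the PAM250 alignment quoted there. The Python sketch below slides one sequence along another without gaps and counts identical residues at the best offset. It is not PAM250 scoring and not a real aligner; the function name is hypothetical, and the inputs are short fragments copied from the 1YIN and 3OS8 sequences on the slide rather than the full chains.

```python
def best_ungapped_overlap(query, target):
    """Slide query along target with no gaps.
    Returns (offset, matches) for the offset with the most identical residues."""
    best_offset, best_matches = 0, -1
    for offset in range(-len(query) + 1, len(target)):
        matches = sum(
            1
            for i, residue in enumerate(query)
            if 0 <= offset + i < len(target) and target[offset + i] == residue
        )
        if matches > best_matches:
            best_offset, best_matches = offset, matches
    return best_offset, best_matches


# N-terminal fragments: 3OS8 carries extra leading residues.
start_1yin = "ALSLTADQMVSALLDAEPPILYSEYDPTRPFSEASMMG"
start_3os8 = "SNAKRSKKNSLALSLTADQMVSALLDAEPPILYSEYDPTRPFSEASMMG"

# Internal fragments: the two entries differ by one residue (L vs R).
mid_1yin = "RVPGFVDLTLHDQVHLLECAW"
mid_3os8 = "RVPGFVDLTRHDQVHLLECAW"

print(best_ungapped_overlap(start_1yin, start_3os8))  # offset of the shared region
print(best_ungapped_overlap(mid_1yin, mid_3os8))      # one mismatch out of 21
```

Even this toy comparison shows the problem: the fragments differ in length and in individual residues, so "same protein" only has an answer once a threshold or a context is chosen, whereas the molecule case has an exact one.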