Which Drug Did You Mean ?

Published on

BioIT workshop 2012

  1. Which Drug Did You Mean? Resolving the linkage spaghetti between semantic names, structures, bioactivity and mixtures. Christopher Southan, ChrisDS Consulting, Göteborg, Sweden. Prepared for BioIT, Boston, April 2012, Track 14, Tuesday. See also http://cdsouthan.blogspot.se/2012/06/will-real-bosinhib-please-stand-up-take.html
  2. History of Drug Names. Approximate timelines (drawn as overlapping bars on the slide):
     • compound registration system structure and ID
     • patent IUPAC name or image
     • internal code name(s), externally blinded
     • code name(s) > structure declared externally
     • journal papers
     • International Non-proprietary Name (INN)
     • INN indexed in MeSH
     • USAN, BAN, JAN
     • brand name(s)
     • combination brand
  3. History of Atorvastatin
     • 1985: (3R,5R)-7-[2-(4-fluorophenyl)-3-phenyl-4-(phenylcarbamoyl)-5-(propan-2-yl)-1H-pyrrol-1-yl]-3,5-dihydroxyheptanoic acid (IUPAC name)
     • ~1987: Parke-Davis internal code number CI-981
     • ~1995: Atorvastatin [INN:BAN], Atorvastatin calcium [USAN], Atorvastatin calcium trihydrate INN (error?), Atorvastatina (Spain)
     • 1997: Lipitor (brand name), Faboxim (Argentina), Zurinel (Chile), etc.
     • 2004: Caduet (brand name): Norvasc (amlodipine besylate) and Lipitor (atorvastatin calcium)
     • 2012: atorvastatin calcium, generic, Ranbaxy
     • 2012: amlodipine besylate and atorvastatin calcium, generic, Ranbaxy
  4. Causes of Drug Linkage Spaghetti (I)
     • Tautomer/stereo multiplexing and structure interconversion differences (e.g. complex antibiotics)
     • Popular structures > 100s of submitters > many vendors > more noise
     • Opaque ecosystem of primary submitters, secondary linkers, declared circularity, cryptic circularity, and submitters having independent portals with different rules
     • Older drugs accumulate 100s of synonyms and database x-refs, with errors
     • Accumulated wet assay results depend on how long the drug has been in which public screening collection
     • Deprecated structures are not always refreshed between databases globally
     • Pro-drugs, metabolites or tested combinations rarely have explicit x-refs
  5. Causes of Drug Linkage Spaghetti (II)
     • Literature extractions flowing into drug databases (including MeSH) can have
       - Author errors and paucity of standards in the primary report
       - No quality filtration at the result level
       - Curation errors and different annotation rules
       - No discrimination of independent de-novo checking from annotation recycling
     • Large-scale patent extraction feeds into databases bring in
       - Forests of analogues with no data links
       - High redundancy for drugs and leads
       - Structural differences between pipeline outputs
       - Opportunistic permutations of salts and mixtures
       - Opportunistic virtual deuteration of all best-selling drugs
     • Drug discovery operations use many drugs as reference compounds in their internal screening collections. This means
       - Name > structure cross-mapping: internal, public and commercial
       - Integration of internal and external data across the same drugs
  6. Atorvastatin
     • The scale of links provides a good cross section of problems
     • Relationship cross-mappings and the PubChem tool-box facilitate navigation through the links
     • External submissions each get a substance ID (SID); SIDs are merged into compound records (CIDs) via chemistry rules (see PubChem documentation)
     • This drug has accumulated years of submissions from different sources, BioAssay entries and pharmacology literature links
     • The parent CID 60823 has
       - 99 synonyms
       - 6 stereo forms
       - 70 canonically-related structures
       - 449 substance records
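
The synonym and substance counts on the slide above can be re-checked against the live database. The following is a minimal sketch that only builds PubChem PUG REST URLs for the parent CID and sends no request; the helper name `compound_url` is hypothetical, and the counts returned today will differ from the 2012 figures quoted on the slide.

```python
from urllib.parse import quote

# Base address of the PubChem PUG REST service.
PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"

# Parent compound record quoted on the slide.
ATORVASTATIN_CID = 60823


def compound_url(cid: int, operation: str, fmt: str = "JSON") -> str:
    """Build a PUG REST URL for one compound record (no request is sent)."""
    return f"{PUG_REST}/compound/cid/{int(cid)}/{quote(operation)}/{fmt}"


# Synonyms attached to the CID, and the SIDs that were merged into it.
synonyms_url = compound_url(ATORVASTATIN_CID, "synonyms")
sids_url = compound_url(ATORVASTATIN_CID, "sids")

print(synonyms_url)
print(sids_url)
```

Fetching either URL with any HTTP client returns a JSON document whose list length gives the current synonym or substance count.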
  7. What is Atorvastatin? For Patients
  8. Atorvastatin: for Informaticians
     • Identifiers: PubChem CID 60823, Wikipedia, ChemSpider 54810, DrugBank APRD00055, CHEMBL1487, CAS 134523-00-5
     • PubChem submissions include:
       - (3R,5R) CID 60823
       - (5R) CID 51052072
       - (3R) CID 21029434
       - (3S,5R) CID 6093359
       - (3S,5S) CID 62976
       - No stereo CID 2250
     • Query: Same, Isotopes for PubChem Compound (Select 60823)
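
One way to keep results from being split across the stereo variants listed above is to collapse every variant CID onto the parent before aggregating. This is an illustrative sketch only: the CIDs and stereo labels are copied from the slide, while the function names and the sample hit list are hypothetical.

```python
from collections import defaultdict
from typing import Optional

# Stereo variants of atorvastatin as listed on the slide (2012 snapshot).
STEREO_VARIANTS = {
    60823: "(3R,5R)",
    51052072: "(5R)",
    21029434: "(3R)",
    6093359: "(3S,5R)",
    62976: "(3S,5S)",
    2250: "no stereo",
}
PARENT_CID = 60823


def collapse_to_parent(cid: int) -> Optional[int]:
    """Return the parent CID for a known stereo variant, else None."""
    return PARENT_CID if cid in STEREO_VARIANTS else None


def group_hits(hits):
    """Group (cid, payload) pairs under the parent CID, keeping the stereo label."""
    grouped = defaultdict(list)
    for cid, payload in hits:
        parent = collapse_to_parent(cid)
        if parent is not None:
            grouped[parent].append((STEREO_VARIANTS[cid], payload))
    return dict(grouped)


# Hypothetical hit list: two variants of the drug plus one unrelated CID.
example = group_hits([(2250, "assay A"), (60823, "assay B"), (999999, "other")])
print(example)
```

Keeping the stereo label next to each payload preserves the information about which submitted structure the data was attached to.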
  9. Name Retrieval Specificity (I)
  10. Name Retrieval Specificity (II): "atorvastin" in DailyMed link not synonyms
  11. Drug BioAssay Data: Splitting by Submitted Structure Differences
     • Mainly uHTS and counterscreens from Scripps & Burnham
     • AIDs 406848-53 in ChEMBL (antimalarial assay specified salt)
     • ChEMBL antimalarial strain assays (also specified salt), in vivo plus three target links
     • Mainly qHTS from NCGC, no hits
  12. Pharmacological Activity in vivo is ~70% Active Metabolites, i.e. not Atorvastatin
     • Hazardous Substances DataBank x-ref in the CID, but no direct links to the metabolites (yet)
     • Only one in-vitro assay result for 9808225
     • Structures shown: CID 9851106, CID 60823, CID 9808225
  13. Salt Confusion (I): Atorvastatin Calcium
     • CID 656846, Mw 1209, CAS 344423-98-9
     • CID 60822, Mw 1155, CAS 134523-03-8
     • CID 11227182, Mw 598
     • FDA package insert label: hemicalcium trihydrate
     • INN = atorvastatin; USAN/BAN = atorvastatin calcium
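
Two of the molecular weights quoted above can be reproduced from composition, which shows how the salt forms relate. The sketch below assumes the free acid formula C33H35FN2O5 and standard atomic masses; it treats the 1155 entry as two deprotonated atorvastatin units per calcium and the 1209 entry as the same salt plus three waters. The Mw 598 entry is left out because the slide does not state its composition.

```python
# Standard atomic masses (rounded), used here as an assumption.
ATOMIC_MASS = {"C": 12.011, "H": 1.008, "N": 14.007,
               "O": 15.999, "F": 18.998, "Ca": 40.078}


def mass(formula: dict) -> float:
    """Sum atomic masses for a formula given as {element: count}."""
    return sum(ATOMIC_MASS[element] * count for element, count in formula.items())


ATORVASTATIN_ACID = {"C": 33, "H": 35, "F": 1, "N": 2, "O": 5}
WATER = {"H": 2, "O": 1}

acid = mass(ATORVASTATIN_ACID)              # free acid
anion = acid - ATOMIC_MASS["H"]             # carboxylate after losing one proton
hemicalcium = 2 * anion + ATOMIC_MASS["Ca"]  # two anions per Ca(2+)
trihydrate = hemicalcium + 3 * mass(WATER)   # plus three waters of hydration

print(f"free acid:   {acid:.2f}")
print(f"hemicalcium: {hemicalcium:.2f}")
print(f"trihydrate:  {trihydrate:.2f}")
```

The roughly 54 mass-unit gap between the two salt entries is therefore just the three waters, not a different drug.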
  14. Salt Confusion (II): What Gets to Patients
     • CID 656846, CID 53252956, CID 23665101
     • No INNs, USANs or clinical trials entries for these salts
  15. Mixtures: Problematic all Round
     • Atorvastatin parent (CID 60823) has 379 mixture SIDs and 147 mixture CIDs permutated from 122 component CIDs
     • Of the 122 components, 58 have a MeSH pharmacology tag, 92 have BioAssay results, 70 are in DrugBank, 101 are in ChEMBL, and 47 are below 200 Mw (and thus probably salts, not drugs)
     • Of the 147 mixture CIDs, only the 2 atorvastatin dimers have assay results or pharmacology, so none of the drug mixtures have direct data links
     • None are in DrugBank CIDs and only atorvastatin calcium is in ChEMBL
     • 138 of the 147 have been extracted from patents by Derwent/Thomson and are unlikely to get data links
     • The small number of important drug combinations that do have data and/or trial results are difficult to identify
     • Tested drug mixtures rarely get public code names; some get trade names but never INNs
     • Chemistry rules may split mixtures and synonyms in databases
     • PubMed "Drug Combinations"[MeSH Term] = 54,186 but no SID or CID links
     • Mixture components can be designated with space, /, + or "co"
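
The last point above, that components may be joined by a space, "/", "+" or "co", can be handled by normalising mixture names before comparing them. This is a rough sketch under stated assumptions: the vocabulary of component names is a hypothetical stand-in for a real synonym list, "co" is assumed to appear as a leading "co-" or "co " prefix, and multi-word salt names are matched greedily, longest first.

```python
import re

# Hypothetical component vocabulary; a real one would come from a synonym list.
VOCABULARY = {"amlodipine", "amlodipine besylate",
              "atorvastatin", "atorvastatin calcium"}


def mixture_components(name, vocabulary=VOCABULARY):
    """Return (sorted known components, unrecognised tokens) for a mixture name."""
    text = name.lower().strip()
    text = re.sub(r"^co[-\s]+", "", text)           # assumed form of the "co" prefix
    tokens = re.sub(r"[/+]", " ", text).split()     # treat "/" and "+" like spaces
    max_len = max(len(entry.split()) for entry in vocabulary)

    found, unknown, i = [], [], 0
    while i < len(tokens):
        # Try the longest phrase first so "atorvastatin calcium" beats "atorvastatin".
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n])
            if phrase in vocabulary:
                found.append(phrase)
                i += n
                break
        else:
            unknown.append(tokens[i])
            i += 1
    return tuple(sorted(found)), tuple(unknown)


slash = mixture_components("Amlodipine besylate / Atorvastatin calcium")
plus = mixture_components("atorvastatin calcium+amlodipine besylate")
print(slash == plus)
```

Sorting the components makes the key independent of the order in which a submitter wrote them, and the unrecognised tokens flag names that need manual review.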
  16. The Famous Polypill: A Fuzzy Term
     • CID 44602839, Thomson Pharma
     • 18 clinicaltrials.gov entries, but only partial component links
     • aspirin 81 mg, enalapril 2.5 mg, atorvastatin 20 mg and hydrochlorothiazide 12.5 mg (polypill), PMID: 21647425, Australian New Zealand Clinical Trials Registry ACTRN12607000099426
     • DrugBank and TTD negative
  17. Caduet: an Approved Combination. Sources shown: DrugBank, Wikipedia, http://clinicaltrials.gov/ct2/show/NCT01107743
  18. Submitter Synonym Noise in PubChem
  19. A More Recent Combination. But QA149 is negative in PubChem, DrugBank and TTD
  20. Spaghetti is Resolvable but Errors are Tough: Will the Real LX4211 Please Stand Up?
     • http://cenblog.org/the-haystack/2012/03/liveblogging-first-time-disclosures-from-acssandiego/
     • See also: http://cdsouthan.blogspot.se/2012/03/live-chemical-structure-blogging-but.html
  21. Summary
     • You can navigate the linkage spaghetti in name, synonym, structure, bioactivity and mixture space, but this needs perspicacity and circumspection
     • The current drug information ecosystem, with multiple stakeholders, seems destined to remain "fuzzy"
     • Beyond informatics challenges, the consequences, particularly from frank errors, could be more serious
     • WHO INNs and naming stems play a key positive role, but:
       - No open authoritative database, only 7000 PDF entries (!)
       - No transparent coordination between USAN, FDA, MeSH, national offices, or clinical trials registries
       - Susceptible to commercial flanking tactics
     • Drug combinations have a bright pharmacological future but a difficult informatics one
     • The fuzz includes scientific challenges (e.g. complex structures, dynamic tautomerism, active metabolites, formulation differences, paucity of standardised and comparable activity data)
     • Efforts are being made to improve the situation, including from the databases represented in this Workshop session
  22. Questions Welcome
     • ChrisDS Consulting: http://www.cdsouthan.info/Consult/CDS_cons.htm
     • Mobile: +46(0)702-530710, Skype: cdsouthan
     • Email: cdsouthan@hotmail.com
     • Twitter: http://twitter.com/#!/cdsouthan
     • Blog: http://cdsouthan.blogspot.com/
     • LinkedIn: http://www.linkedin.com/in/cdsouthan
     • Website: http://www.cdsouthan.info/CDS_prof.htm
     • Publications: http://www.citeulike.org/user/cdsouthan/publications/order/year
     • Citations: http://scholar.google.com/citations?user=y1DsHJ8AAAAJ&hl=en
     • Presentations: http://www.slideshare.net/cdsouthan
     • FYI: a short piece on identifying the names and molecular details of drugs in clinicaltrials.gov: http://www.samedanltd.com/magazine/13/issue/166/article/3152