Towards semantic systems chemical biology

Introduces a semantic framework for studying systems chemical biology and systems pharmacology, covering three major projects: Chem2Bio2RDF, Chem2Bio2OWL, and SLAP (Semantic Link Association Prediction).

Published in: Education
Notes
  • Learn the basics and use several software tools: what and how.
  • What SCB is, why we need the Semantic Web, and the whole architecture.
  • Antibacterial drug: four parts, data. We want to ask: can this drug have side effects?
  • Can we use the Semantic Web? Can the question be answered by Google or Siri?
  • Remove logic; move to applications.
  • Demo, links
  • Show index search
  • To link them: 25 databases. Why download (nobody else is doing it)? Why a relational database (we already have one, and there are doubts about the Semantic Web and data quality)? The ontology or mapping file is key.
  • Show the mapping file and demo one database; it is no different from SQL. Show Exhibit: http://cheminfov.informatics.indiana.edu:8080/exhibit/drugbank.html Show triples: select * where { <http://chem2bio2rdf.org/drugbank/resource/drugbank_drug/DB00041> ?p ?o }
  • Here is the figure of all Chem2Bio2RDF datasets. Each node represents a database, colored by its RDF vendor; red nodes are RDF data sources provided in Chem2Bio2RDF. A directed edge shows the linkage from one dataset to another, colored by the linkage type; e.g., the type "compound" includes CID, CAS, ChEBI, DBID and so on. The size of nodes and the width of edges depend on the number of triples and the number of linkages respectively. So far we have over 110 million triples.
  • Show index search
  • Now we can use our portal to answer a variety of questions in systems chemical biology, from basic ones such as "give me all information about this compound" to advanced ones such as linking KEGG and PubChem to identify potential multiple pathway inhibitors for MAPK.
  • Homogeneous data: why OWL?
  • Bottom-up: basic ontology… Top-down… recommend BFO; why bottom-up.
  • Concepts: as concise as possible.
  • Different from the pizza-only ontology; more like beverage, cheese…, with each concept like pizza. Interaction describes all kinds of relations between objects.
  • Subclass: interaction. Utility class: chemical structure, chemical physical property.
  • Show NCBO BioPortal
  • Two authors are similar
  • demo
  • Towards semantic systems chemical biology

    1. 1. From Data Integration to Data Mining in the Semantic Web: systems chemical biology as a case study. Bin Chen, School of Informatics and Computing, Indiana University at Bloomington. Lecture for S636, Nov 17, 2011
    2. 2. Outline
        • Introduction
        • RDF (Chem2Bio2RDF)
        • OWL (Chem2Bio2OWL)
        • Graph mining (SLAP)
    3. 3. What's Systems Chemical Biology? [Diagram labels: Chemical Biology; Systems; Phenotype; interacting; mapping; Compound; Drug; Protein; Gene; PPI; Metabolic Pathway; Gene Regulatory; Disease; Side effect; Toxicity; Chemogenomics.] Source: Oprea TI, et al. Systems chemical biology. Nature Chemical Biology, 2007.
    4. 4. The data are heterogeneous and scattered around the web… [Example shown: MATADOR]
    5. 5. Semantic Web: an extension of the current Web in which information is given well-defined meaning, better enabling computers and people to work in cooperation. [Figure: Semantic Web stack, http://en.wikipedia.org/wiki/Semantic_Web]
    6. 6. Architecture of Semantic Systems Chemical Biology. [Diagram labels: Data (experimental data, text mining); RDF; Ontology; SPARQL; Algorithms and tools; Applications. Components: Chem2Bio2RDF; Chem2Bio2OWL; path finding, association search, association ranking and prediction; polypharmacology, drug side effect.]
    7. 7. Outline
        • Introduction
        • RDF (Chem2Bio2RDF)
        • OWL (Chem2Bio2OWL)
        • Graph mining (SLAP)
    8. 8. RDF (Resource Description Framework): a standard model for data interchange on the Web, using triples (subject, predicate, object) to present and link data, and using URIs to identify resources. A triple has a resource (subject), a property (predicate), and a value (object). Example: the drug Lipitor, identified by the URI http://chem2bio2rdf.org/drugbank/resource/drugbank_drug/DB01076, has name "Lipitor" and company "Pfizer":
        <RDF>
          <Description about="http://chem2bio2rdf.org/drugbank/resource/drugbank_drug/DB01076">
            <name>Lipitor</name>
            <company>Pfizer</company>
          </Description>
        </RDF>
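The triple model on this slide can be sketched in plain Python without any RDF library: each statement is a (subject, predicate, object) tuple. This is only an illustration; the predicate names `name` and `company` mirror the slide, and the helper `describe` is a hypothetical name introduced here.

```python
# A minimal sketch: RDF statements as (subject, predicate, object) tuples.
LIPITOR = "http://chem2bio2rdf.org/drugbank/resource/drugbank_drug/DB01076"

triples = [
    (LIPITOR, "name", "Lipitor"),
    (LIPITOR, "company", "Pfizer"),
]


def describe(subject, statements):
    """Return {predicate: object} for every statement about `subject`.

    Hypothetical helper; it assumes each predicate occurs once per subject.
    """
    return {p: o for s, p, o in statements if s == subject}


print(describe(LIPITOR, triples))
```

Because the subject is a URI rather than a local row id, the same statements stay meaningful when they are copied out of the database that produced them.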
    9. 9. Use RDF to Integrate Data. Database 1: http://chem2bio2rdf.org/drugbank/DB01076 has name "lipitor" and company "Pfizer". Database 2: http://chem2bio2rdf.org/drugbank/DB01076 has Molecular_Weight 558.6398 and formula C33H35FN2O5. Same URI, merged!
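A minimal Python sketch of the "same URI, merged" idea, using the values shown on the slide. The function name `merge` is hypothetical, and the sketch assumes each predicate has a single value per subject (a later source would overwrite an earlier one).

```python
from collections import defaultdict

URI = "http://chem2bio2rdf.org/drugbank/DB01076"

# Two independent sources that happen to describe the same URI.
database_1 = [(URI, "name", "lipitor"), (URI, "company", "Pfizer")]
database_2 = [
    (URI, "Molecular_Weight", "558.6398"),
    (URI, "formula", "C33H35FN2O5"),
]


def merge(*sources):
    """Collect the statements of all sources into one record per subject URI."""
    merged = defaultdict(dict)
    for source in sources:
        for subject, predicate, obj in source:
            merged[subject][predicate] = obj
    return dict(merged)


record = merge(database_1, database_2)[URI]
print(record)
```

No join key has to be negotiated between the two sources: the shared URI is the key.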
    10. 10. Use RDF to Link Data. http://chem2bio2rdf.org/drugbank/DB01076 is linked by sameAs to http://www4.wiwiss.fu-berlin.de/drugbank/resource/drugs/DB01076, and by cid to http://chem2bio2rdf.org/pubchem/resource/pubchem_compound/60823.
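One way to use sameAs links is to group URIs that denote the same thing, so that statements about any of them can be read together. The sketch below uses a small union-find for this; it is an illustration under that assumption, not the procedure used by Chem2Bio2RDF, and `same_as_groups` is a hypothetical name.

```python
def same_as_groups(links):
    """Group URIs connected by sameAs links (union-find with path halving)."""
    parent = {}

    def find(uri):
        parent.setdefault(uri, uri)
        while parent[uri] != uri:
            parent[uri] = parent[parent[uri]]  # shorten the chain as we walk it
            uri = parent[uri]
        return uri

    for left, right in links:
        parent[find(left)] = find(right)

    groups = {}
    for uri in list(parent):
        groups.setdefault(find(uri), set()).add(uri)
    return list(groups.values())


C2B2R = "http://chem2bio2rdf.org/drugbank/DB01076"
LODD = "http://www4.wiwiss.fu-berlin.de/drugbank/resource/drugs/DB01076"

groups = same_as_groups([(C2B2R, LODD)])
print(groups)
```

Links are treated as symmetric and transitive here, which is how sameAs is normally interpreted.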
    11. 11. [Architecture diagram labels: data sources (uniprot, Bio2RDF, LODD, Chem2Bio2RDF, others); Virtuoso triple store; SPARQL endpoints; dereferenceable URIs; browsing; PlotViz (visualization); Cytoscape plugin; linked path generation and ranking; third-party tools.]
    12. 12. Workflow for RDF conversion. [Diagram labels: external sources; download; local copy (XML, CSV, DB, TXT, …); scripts; relational DB; D2R mapping; D2R server; dumping; Virtuoso triple store; ontology; publishing.] Chen B, et al. Chem2Bio2RDF: a semantic framework for linking and data mining chemogenomic and systems chemical biology data. BMC Bioinformatics. 2010
    13. 13. From table to D2R mapping to RDF (with Exhibit link). Example mapping:
        # Table c2b2r_DrugBankDrug
        map:c2b2r_DrugBankDrug a d2rq:ClassMap;
            d2rq:dataStorage map:database;
            d2rq:uriPattern "drugbank_drug/@@c2b2r_DrugBankDrug.DBID|urlify@@";
            d2rq:class drugbank:DrugBankDrug;
            d2rq:classDefinitionLabel "c2b2r_DrugBankDrug";
            .
        map:c2b2r_DrugBankDrug__label a d2rq:PropertyBridge;
            d2rq:belongsToClassMap map:c2b2r_DrugBankDrug;
            d2rq:property rdfs:label;
            d2rq:pattern "@@c2b2r_DrugBankDrug.Generic_Name@@";
            .
        map:c2b2r_DrugBankDrug_DBID a d2rq:PropertyBridge;
            d2rq:belongsToClassMap map:c2b2r_DrugBankDrug;
            d2rq:property drugbank:DBID;
            d2rq:propertyDefinitionLabel "c2b2r_DrugBankDrug DBID";
            d2rq:column "c2b2r_DrugBankDrug.DBID";
            .
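To make the mapping concrete, here is a rough Python sketch of what such a mapping does to one table row: it fills the `@@table.column@@` placeholders and emits triples. This is not D2R's implementation. The base URI is taken from the example URIs on earlier slides, the sample row is illustrative, `expand` and `row_to_triples` are hypothetical names, and `urlify` is approximated by plain percent-encoding.

```python
import re
from urllib.parse import quote

# Assumed base URI, taken from the example URIs on earlier slides.
BASE = "http://chem2bio2rdf.org/drugbank/resource/"
TABLE = "c2b2r_DrugBankDrug"
URI_PATTERN = "drugbank_drug/@@c2b2r_DrugBankDrug.DBID|urlify@@"
LABEL_PATTERN = "@@c2b2r_DrugBankDrug.Generic_Name@@"


def expand(pattern, table, row):
    """Replace each @@table.column@@ (optionally with |urlify) by the row value."""
    def substitute(match):
        tbl, column, modifier = match.group(1), match.group(2), match.group(3)
        if tbl != table:
            raise KeyError(f"unknown table {tbl}")
        value = str(row[column])
        # Approximation: percent-encode when the pattern asks for urlify.
        return quote(value, safe="") if modifier == "urlify" else value

    return re.sub(r"@@(\w+)\.(\w+)(?:\|(\w+))?@@", substitute, pattern)


def row_to_triples(row):
    """Emit the class triple and the two property bridges for one row."""
    subject = BASE + expand(URI_PATTERN, TABLE, row)
    return [
        (subject, "rdf:type", "drugbank:DrugBankDrug"),
        (subject, "rdfs:label", expand(LABEL_PATTERN, TABLE, row)),
        (subject, "drugbank:DBID", row["DBID"]),
    ]


# Illustrative row; the column names come from the mapping on the slide.
row = {"DBID": "DB01076", "Generic_Name": "Atorvastatin"}
for triple in row_to_triples(row):
    print(triple)
```

The relational data stays where it is; only the mapping file decides which URIs and properties the rows are published under.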
    14. 14. Chem2Bio2RDF Datasets (http://chem2bio2rdf.org). Each node represents a database, colored by its RDF vendor. A directed edge shows the linkage from one dataset to another, colored by the linkage type; e.g., the type "compound" includes CID, CAS, ChEBI, DBID and so on. The size of nodes and the width of edges depend on the number of triples and the number of linkages respectively. [Legend: Chem2Bio2RDF data; other data vendors; linkage types: compound, protein/gene, chemogenomics, literature, others.]
    15. 15. http://linkeddata.org
    16. 16. [Architecture diagram labels: data sources (uniprot, Bio2RDF, LODD, Chem2Bio2RDF, others); Virtuoso triple store; SPARQL endpoints; dereferenceable URIs; browsing; PlotViz (visualization); Cytoscape plugin; linked path generation and ranking; third-party tools.]
    17. 17. SPARQL: an SQL-like query language for RDF
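The core of a SPARQL query is a triple pattern in which some positions are variables. A toy Python matcher shows the idea for a single pattern such as `{ <uri> ?p ?o }`; it is a sketch only, covers none of SPARQL's joins or filters, and the function name `match` is hypothetical. The data reuses the Lipitor example from the RDF slide.

```python
LIPITOR = "http://chem2bio2rdf.org/drugbank/resource/drugbank_drug/DB01076"

triples = [
    (LIPITOR, "name", "Lipitor"),
    (LIPITOR, "company", "Pfizer"),
]


def match(pattern, statements):
    """Yield variable bindings for one triple pattern.

    Terms starting with '?' are variables; all other terms must match exactly.
    """
    for triple in statements:
        binding = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # A repeated variable must bind to the same value each time.
                if binding.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            yield binding


# Roughly: select * where { <LIPITOR> ?p ?o }
for row in match((LIPITOR, "?p", "?o"), triples):
    print(row)
```

A real engine evaluates many such patterns together and joins their bindings on shared variables.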
    18. 18. Implement cheminformatics and bioinformatics tools into SPARQL through ARQ function extensions (Chemistry Development Kit, BioJava, Web Services). f:tanimoto is used for compound similarity search:
        PREFIX drugbank: <http://www4.wiwiss.fu-berlin.de/drugbank/resource/drugbank/>
        PREFIX f: <java:org.bio2chem2rdf.arq.>
        SELECT ?x ?s
        WHERE {
          ?x drugbank:smilesStringCanonical ?s
          FILTER ( f:tanimoto('NS(=O)(=O)C1=CC(=C(Cl)C(Cl)=C1)S(N)(=O)=O', ?s, 'MACCS') > 0.9 )
        }
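The Tanimoto coefficient behind a filter like this is the size of the intersection of two fingerprints divided by the size of their union. The Python sketch below shows only that arithmetic. It does not compute MACCS keys from SMILES (the slide delegates that to the Chemistry Development Kit); the fingerprints are hypothetical sets of "on" bit positions and the compound names are made up.

```python
def tanimoto(a, b):
    """Tanimoto coefficient of two fingerprints given as sets of 'on' bits."""
    a, b = set(a), set(b)
    union = len(a | b)
    return len(a & b) / union if union else 0.0


# Hypothetical fingerprints, for illustration only.
query = set(range(10))
library = {
    "cmpd_a": set(range(10)) | {42},  # shares 10 of 11 bits with the query
    "cmpd_b": {0, 1, 99},             # shares 2 of 11 bits with the query
}

# Same threshold as the FILTER on the slide.
hits = [name for name, bits in library.items() if tanimoto(query, bits) > 0.9]
print(hits)
```

Exposing the function inside the query language lets one query combine a structural similarity test with graph patterns over targets, pathways, or side effects.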
    19. 19. Answer scientific questions
        • Give me all information about this compound
        • Give me all information about this target
        • Find chemical associated genes
        • Find gene associated chemicals
        • Find disease associated chemicals
        • Find side effect associated chemicals
        • Find all the drug-like compounds in PubChem BioAssay that share at least two targets with a drug in DrugBank
        • Link KEGG / Reactome Pathways and PubChem to identify potential multiple pathway inhibitors for MAPK
        More in http://chem2bio2rdf.wikispaces.com/multiple+sources
    20. 20. link
    21. 21. Outline
        • Introduction
        • RDF (Chem2Bio2RDF)
        • OWL (Chem2Bio2OWL)
        • Graph mining (SLAP)
    22. 22. Chem2Bio2RDF Datasets (http://chem2bio2rdf.org). Each node represents a database, colored by its RDF vendor. A directed edge shows the linkage from one dataset to another, colored by the linkage type; e.g., the type "compound" includes CID, CAS, ChEBI, DBID and so on. The size of nodes and the width of edges depend on the number of triples and the number of linkages respectively. [Legend: Chem2Bio2RDF data; other data vendors; linkage types: compound, protein/gene, chemogenomics, literature, others.]
    23. 23. Ontology workflow
    24. 24. Step 1: Hunting for scientific questions and targeting goals
        • What are the targets of troglitazone?
        • Find PPARG inhibitors with molecular weight smaller than 500 Da.
        • Which pathway will be affected by troglitazone?
        • Find all the common/unique genes, proteins, or drugs between two or among many nodes.
        • What genes may the compound interact with and are expressed in liver?
    25. 25. Step 2: Propose framework and basic classes
        • SmallMolecule
        • MacroMolecule
        • Disease
        • SideEffect
        • Pathway
        • BioAssay
        • Literature
        • Interaction
    26. 26. Step 3: Define classes, relations and data properties
        • Refine class: subclass, utility class
        • Object property
        • Data property
        http://chem2bio2owl.wikispaces.com/Version+1.0
    27. 27. Step 4: Align with external ontologies
        • Import BioPAX
        • Map disease to Disease Ontology
        • Standardize terms: OBO Foundry, NCBO BioPortal
    28. 28. Chem2Bio2OWL
    29. 29. Step 5: Populate Chem2Bio2OWL
        • Identifier for compound, drug, protein, gene, pathway, side effect and disease: primary source
        • Term mapping: string similarity match
        [Pipeline labels: Chem2Bio2RDF; Protégé API; Virtuoso; Pellet reasoning; Chem2Bio2OWL]
    30. 30. Step 6: Evaluation (consistency checking)
        • Data property
        • Sample reasoning results checked manually by domain experts
    31. 31. Step 6: Evaluation (case study): drug target identification. [Figure labels: Annotated Chem2Bio2OWL; Mashed Chem2Bio2RDF]
        PREFIX c2b2r: <http://chem2bio2rdf.org/chem2bio2rdf.owl#>
        PREFIX bp: <http://www.biopax.org/release/biopax-level3.owl#>
        PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
        select distinct ?target
        from <http://chem2bio2rdf.org/owl#>
        where {
          ?chemical rdfs:label ?drugName ;
                    c2b2r:hasInteraction ?interaction .
          ?interaction c2b2r:hasTarget [bp:name ?target] ;
                       c2b2r:drugTarget true .
          FILTER (str(?drugName)="Troglitazone")
        }
    32. 32. Outline
        • Introduction
        • RDF (Chem2Bio2RDF)
        • OWL (Chem2Bio2OWL)
        • Graph mining (SLAP): Semantic Link Association Prediction
    33. 34. Two objects are similar if they are related to similar objects. Examples: coauthorship; same target.
    34. 35. Two objects are related if they share the same objects or their related objects are related. [Figure: a compound-protein example (Compound 1, Compound 2, Protein 1, Protein 2) and a people example (Person 1, Person 2, Computer Science, paper1, paper2; edges: advisor, major, publish, cite, conference).]
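The rule on this slide can be written down directly as a small Python sketch over an undirected graph. The edge list is hypothetical and only loosely follows the compound-protein figure; `build_graph` and `related` are illustrative names, not part of SLAP.

```python
from collections import defaultdict


def build_graph(edges):
    """Undirected adjacency sets from a list of (node, node) edges."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return dict(graph)


def related(graph, x, y):
    """True if x and y share a neighbor, or a neighbor of x is linked to a neighbor of y."""
    near_x = graph.get(x, set())
    near_y = graph.get(y, set())
    if near_x & near_y:
        return True
    return any(b in graph.get(a, set()) for a in near_x for b in near_y)


# Hypothetical edges, for illustration only.
graph = build_graph([
    ("Compound 1", "Protein 1"),
    ("Compound 2", "Protein 1"),
    ("Compound 3", "Protein 2"),
    ("Protein 1", "Protein 2"),
    ("Compound 4", "Protein 3"),
])

print(related(graph, "Compound 1", "Compound 2"))  # shared protein
print(related(graph, "Compound 1", "Compound 3"))  # their proteins interact
print(related(graph, "Compound 1", "Compound 4"))  # no connection
```

This yes/no rule is only the starting point; the later slides replace it with scored paths.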
    35. 36. Sample patterns. [Figure: example path patterns between compounds (Cmpd 1, Cmpd 2) and proteins (Protein 1, Protein 2), built from the edge types chemogenomics, neighbor, PPI, hasGO (e.g. GO:0001), side effect (e.g. hypertension), and substructure.]
    36. 37. Association depends on its neighborhood. [Figure: network of Compounds 1 to 3 and Targets 1 to 4 linked by chemogenomics, neighbor, proteinProteinInteraction, hasGO (GO:00001), hasSideEffect (Side Effect 1), and hasGeneFamily (Gene Family 1) edges.]
    37. 39. Statistical Model: convert the question to a path surfing problem. [Figure: Gene i and Gene j with PPI, hasGO, hasPathway, and chemogenomics edges; P(i → j) = 1/3.]
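One reading of "path surfing" is a random walker that, at every node, picks one of that node's edges uniformly at random; the probability of following a given path is then the product of 1/degree over the nodes it leaves. The Python sketch below works under that assumption only. The slides do not spell out SLAP's exact formula, and the graph is hypothetical, chosen so that Gene i has three edges.

```python
from fractions import Fraction

# Hypothetical undirected graph as adjacency sets.
graph = {
    "Gene i": {"Gene j", "Gene k", "GO:x"},
    "Gene j": {"Gene i", "Gene k"},
    "Gene k": {"Gene i", "Gene j"},
    "GO:x": {"Gene i"},
}


def path_probability(graph, path):
    """Probability that a uniform random walker follows `path` step by step."""
    probability = Fraction(1)
    for node, nxt in zip(path, path[1:]):
        if nxt not in graph[node]:
            raise ValueError(f"no edge from {node} to {nxt}")
        # Assumption: each edge of `node` is equally likely to be taken.
        probability *= Fraction(1, len(graph[node]))
    return probability


print(path_probability(graph, ["Gene i", "Gene j"]))
print(path_probability(graph, ["Gene i", "Gene k", "Gene j"]))
```

Under this reading, a path through a highly connected hub contributes less than a path through a rarely used node, which matches the intuition that specific links carry more evidence.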
    38. 40. Protein 2 Cmpd1 (s) Protein 1 (t) e1 e2
    39. 41. Pattern samples and pattern distribution
        • Randomly sample 100,000 drug target pairs
        • Yielding 453,087 paths, 17 patterns
    40. 42. Statistical Model, 3: node association estimation. The raw scores of random pairs fit a normal distribution!
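If raw scores of random pairs are approximately normal, one natural use is to ask how unusual an observed score is against that background. The Python sketch below fits a normal distribution to background scores and reports an upper-tail probability. The background scores are made up, `p_value` is a hypothetical name, and the slides do not state that SLAP computes significance exactly this way.

```python
from statistics import NormalDist

# Made-up raw scores for random pairs, for illustration only.
random_pair_scores = [1.0, 2.0, 3.0, 4.0, 5.0]


def p_value(score, background):
    """Upper-tail probability of `score` under a normal fit to `background`."""
    fitted = NormalDist.from_samples(background)
    return 1.0 - fitted.cdf(score)


print(p_value(3.0, random_pair_scores))
print(p_value(10.0, random_pair_scores))
```

A small value means the pair scores far above what random pairs typically reach.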
    41. 43. Direct: drug target pairs with IC50 < 30 µM. Indirect: drug target pairs with no interaction. Random: random pairs.
    42. 45. SLAP interface
    43. 46. Acknowledgement
        • Cheminformatics/Chemogenomics Group (Dr. David Wild, Indiana University): Xiao Dong, Huijun Wang, Dazhi Jiao, Dr. Qian Zhu, Madhuvanthi Sankaranarayanan, Jaehong Shin
        • Semantic Web Lab (Dr. Ying Ding, Indiana University): Yuyin Sun, Bing He, Shanshan Chen
        • High performance computing (Indiana University): Jong Youl Choi
        • Pfizer CS COE (Dr. Eric Gifford)
    44. 47. Thanks!
