Semantic (Web) Technologies for Translational Research in Life Sciences

Published in: Education

  • Cognitive model, cognitive behavioral model
  • In parasite research, new strains of a parasite are created by knocking out specific genes. So, given a cloned sample, we may need to know which gene(s) were knocked out. Both these scenarios are real-world examples of the importance of provenance. There are many research issues in provenance management. This presentation addresses (1) the provenance modeling issue, specifically provenance interoperability, consistent modeling, and reduction of terminological heterogeneity, and (2) provenance query.
  • References: http://www.armman.org/projecthero http://www.armman.org/mmitra
  • Transcript

    • 1. Semantic (Web) Technologies for Translational Research in Life Sciences
      Ohio State University, June 16, 2011
      Amit P. Sheth
      Ohio Center of Excellence in Knowledge-enabled Computing (Kno.e.sis)
      amit.sheth@wright.edu
      Thanks to Kno.e.sis team (Satya, Priti, Rama, and Ajith);
      Collaborators at CTEGD, UGA (Dr. Tarleton, Brent Weatherly), NLM (Olivier Bodenreider), CCRC, UGA (Will York), NCBO/Stanford,
      CITAR/WSU
    • 2. Kno.e.sis: Ohio Center of Excellence in Knowledge-enabled Computing
    • 3. Web of people
      - social networks, user-created casual content
      Web of resources
      - data, service, data, mashups
      Web of databases
      - dynamically generated pages
      - web query interfaces
      Web of pages
      - text, manually created links
      - extensive navigation
      Evolution of Web & Semantic Computing
      Tech assimilated in life
      Web of Sensors, Devices/IoT
      - 40 billion sensors, 5 billion mobile connections
      2007
      Situations,
      Events
      Web 3.0
      Semantic Technology Used
      Objects
      Web 2.0
      Patterns
      Keywords
      1997
      Web 1.0
    • 4. Outline
      Semantic Web – very brief intro
      Scenarios to demonstrate the applications and benefit of semantic web technologies
      HealthCare
      Biomedical Research
      Translational
    • 5. Biomedical Informatics...
      Biomedical Informatics
      Pubmed
      ClinicalTrials.gov
      ...needs a connection
      Hypothesis Validation
      Experiment design
      Predictions
      Personalized medicine
      Semantic Web research aims at
      providing this connection!
      Etiology
      Pathogenesis
      Clinical findings
      Diagnosis
      Prognosis
      Treatment
      Genome
      Transcriptome
      Proteome
      Metabolome
      Physiome
      ...ome
      More advanced capabilities for
      search,
      integration,
      analysis,
      linking to new insights
      and discoveries!
      Genbank
      Uniprot
      Medical Informatics
      Bioinformatics
    • 6. Decision Making, Insights, Innovations, Human Performance
      Data and Facts
      Knowledge and Understanding
      Health & Performance
      Cognitive Science, Psychology
      Neuroscience
      Anatomy, Physiology
      Cellular biology
      Molecular Biology
      ACATATGGGTACTATTTACTATTCATGGGTACTATTTATGGCATATGGCGTACTATTCTAATCCTATATCCGTCTAATCTATTTACTATTATCTATTACTATACCTTTTGGGGAAAAAAATTCTATACCGTCTAATCCTATAAATCAAGCCG
      Biochemistry
    • 7. Semantic Web standards @ W3C
      Semantic Web is built in a layered manner
      Not everybody needs all the layers

      Queries: SPARQL, Rules: RIF
      Semantic Web
      Rich ontologies: OWL
      Simple data models & taxonomies: RDF Schema
      Uniform metamodel: RDF + URI
      Encoding structure: XML
      Encoding characters: Unicode
    • 8. Linked Data: Semantic Web “diluted”
      Achieve for data what Web did to documents
      Relationship with the original Semantic Web vision: no AI, no agents, no autonomy
      Interoperability is still very important
      interoperability of formats
      interoperability of semantics
      Enables interchange of large data sets
      (thus very useful in, say, collaborative research)
      Semantic Web vision is largely predicated on the availability of data
      Linked Data is a movement that gets us there
      Thanks – Ora Lassila
    • 9. Opportunity: exploiting clinical and biomedical data
      text
      Health
      Information
      Services
      Elsevier
      iConsult
      Scientific
      Literature
      PubMed
      300 Documents
      Published Online
      each day
      User-contributed
      Content (Informal)
      GeneRifs
      WikiGene
      NCBI
      Public Datasets
      Genome,
      Protein DBs
      new sequences
      daily
      Laboratory
      Data
      Lab tests,
      RTPCR,
      Mass spec
      Clinical Data
      Personal
      health history
      Search, browsing, complex query, integration, workflow, analysis, hypothesis validation, decision support.
    • 10. Major Community Efforts
      W3C Semantic Web Health Care & Life Sciences Interest Group: http://www.w3.org/2001/sw/hcls/
      Clinical Observations Interoperability: EMR + Clinical Trials: http://esw.w3.org/HCLS/ClinicalObservationsInteroperability
      National Center for Biomedical Ontologies: http://bioportal.bioontology.org/
    • 11. Major SW Projects
      OpenPHACTS: A knowledge management project of the Innovative Medicines Initiative (IMI), a unique partnership between the European Community and the European Federation of Pharmaceutical Industries and Associations (EFPIA). http://www.openphacts.org/
      LarKC: develop the Large Knowledge Collider, a platform for massive distributed incomplete reasoning that will remove the scalability barriers of currently existing reasoning systems for the Semantic Web. http://www.larkc.eu/
      NCBO: contribute to collaborative science and translational research. http://bioportal.bioontology.org/
    • 12. Semantic Web Enablers and Techniques
      Ontology: Agreement with Common Vocabulary & Domain Knowledge; Schema + Knowledge base
      Semantic Annotation (metadata extraction): Manual, Semi-automatic (automatic with human verification), Automatic
      Semantic Computation: semantics enabled search, integration, complex queries, analysis (paths, subgraph), pattern finding, mining, inferencing, reasoning, hypothesis validation, discovery, visualization
    • 13. Drug Ontology Hierarchy (showing is-a relationships)
      owl:thing
      prescription_drug_brand_name
      brandname_undeclared
      brandname_composite
      prescription_drug
      monograph_ix_class
      cpnum_group
      prescription_drug_property
      indication_property
      formulary_property
      non_drug_reactant
      interaction_property
      property
      formulary
      brandname_individual
      interaction_with_prescription_drug
      interaction
      indication
      generic_individual
      prescription_drug_generic
      generic_composite
      interaction_with_monograph_ix_class
      interaction_with_non_drug_reactant
    • 14. N-glycan_beta_GlcNAc_9
      N-glycan_alpha_man_4
      GNT-V attaches GlcNAc at position 6
      N-acetyl-glucosaminyl_transferase_V
      UDP-N-acetyl-D-glucosamine + alpha-D-Mannosyl-1,3-(R1)-beta-D-mannosyl-R2 <=>
      UDP + N-Acetyl-beta-D-glucosaminyl-1,2-alpha-D-mannosyl-1,3-(R1)-beta-D-mannosyl-R2
      UDP-N-acetyl-D-glucosamine + G00020 <=> UDP + G00021
      N-Glycosylation metabolic pathway
      GNT-I attaches GlcNAc at position 2
    • 15. Maturing capabilities and ongoing research
      Ontology Creation
      Semantic Annotation & Text Mining: Entity recognition, Relationship extraction
      Semantic Integration & Provenance:
      Integrating all types of data used in biomedical research: text, experimental data, curated/structured/public and multimedia
      Semantic search, browsing, analysis
      Clinical and Scientific Workflows with semantic web services
      Semantic Exploration of scientific literature, Undiscovered public knowledge
    • 16. Project 1: ASEMR
      Why: Improve Quality of Care and Decision Making without loss of Efficiency in active Cardiology practice.
      What: Use of semantic Web technologies for clinical decision support
      Where: Athens Heart Center & its partners and labs
      Status: In use continuously since 01/2006
    • 17. Operational since January 2006
      Details: http://knoesis.org/library/resource.php?id=00004
    • 18. Active Semantic EMR
      Annotate ICD9s
      Annotate Doctors
      Lexical Annotation
      Insurance Formulary
      Level 3 Drug Interaction
      Drug Allergy
      Demo at: http://knoesis.org/library/demos/
    • 19. Project 2: Glycomics
      Why: To help in the treatment of certain kinds of cancer and Parkinson's Disease.
      What: Semantic Annotation of Experiment Data
      Where: Complex Carbohydrate Research Center, UGA
      Status: Research prototype in use
      Workflow with Semantic Annotation of Experimental Data already in use
    • 20. N-Glycosylation Process (NGP)
      Cell Culture
      extract
      Glycoprotein Fraction
      proteolysis
      Glycopeptides Fraction
      1
      Separation technique I
      n
      Glycopeptides Fraction
      PNGase
      n
      Peptide Fraction
      Separation technique II
      n*m
      Peptide Fraction
      Mass spectrometry
      ms data
      ms/ms data
      Data reduction
      Data reduction
      ms peaklist
      ms/ms peaklist
      binning
      Peptide identification
      Glycopeptide identification
      and quantification
      Peptide list
      N-dimensional array
      Data correlation
      Signal integration
    • 21. Scientific workflow for proteome analysis
      Steps: Analysis by MS/MS; Raw Data to Standard Format; Data Pre-process; DB Search (Mascot/Sequest); Results Post-process (ProValt)
      Data: Biological Sample; Raw Data; Standard Format Data; Filtered Data; Search Results; Final Output
      Agents
      Storage
      Biological Information
      Semantic Annotation Applications
    • 22. Semantic Annotation of Experimental Data
      ms/ms peaklist data
      parent ion m/z, parent ion abundance, parent ion charge:
      830.9570 194.9604 2
      fragment ion m/z, fragment ion abundance:
      580.2985 0.3592
      688.3214 0.2526
      779.4759 38.4939
      784.3607 21.7736
      1543.7476 1.3822
      1544.7595 2.9977
      1562.8113 37.4790
      1660.7776 476.5043
      Mass Spectrometry (MS) Data
    • 23. Semantic Annotation of Experimental Data
      <ms-ms_peak_list>
      <parameter instrument="micromass_QTOF_2_quadropole_time_of_flight_mass_spectrometer"
      mode="ms-ms"/>
      <parent_ion m-z="830.9570" abundance="194.9604" z="2"/>
      <fragment_ion m-z="580.2985" abundance="0.3592"/>
      <fragment_ion m-z="688.3214" abundance="0.2526"/>
      <fragment_ion m-z="779.4759" abundance="38.4939"/>
      <fragment_ion m-z="784.3607" abundance="21.7736"/>
      <fragment_ion m-z="1543.7476" abundance="1.3822"/>
      <fragment_ion m-z="1544.7595" abundance="2.9977"/>
      <fragment_ion m-z="1562.8113" abundance="37.4790"/>
      <fragment_ion m-z="1660.7776" abundance="476.5043"/>
      </ms-ms_peak_list>
      Ontological Concepts
      Semantically Annotated MS Data
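The step from a raw ms/ms peak list to annotated XML can be sketched in a few lines of Python. This is only an illustration, not the tooling used in the project: the function name `annotate_peak_list` is hypothetical, and the row layout (first row is the parent ion, remaining rows are fragment ions) is assumed from the example data on the slides.

```python
import xml.etree.ElementTree as ET


def annotate_peak_list(lines, instrument, mode="ms-ms"):
    """Turn a plain-text ms/ms peak list into annotated XML.

    Assumed layout: the first row holds the parent ion (m/z, abundance,
    charge); every following row holds a fragment ion (m/z, abundance).
    """
    rows = [line.split() for line in lines if line.strip()]
    root = ET.Element("ms-ms_peak_list")
    ET.SubElement(root, "parameter", {"instrument": instrument, "mode": mode})

    mz, abundance, charge = rows[0]
    ET.SubElement(root, "parent_ion",
                  {"m-z": mz, "abundance": abundance, "z": charge})

    for mz, abundance in rows[1:]:
        ET.SubElement(root, "fragment_ion", {"m-z": mz, "abundance": abundance})
    return root


if __name__ == "__main__":
    peaks = ["830.9570 194.9604 2", "580.2985 0.3592", "688.3214 0.2526"]
    element = annotate_peak_list(
        peaks, "micromass_QTOF_2_quadropole_time_of_flight_mass_spectrometer")
    print(ET.tostring(element, encoding="unicode"))
```

In a real pipeline the element and attribute names would come from the ontology rather than being hard-coded as they are here.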
    • 24. Project 3:
      Why: To associate genotype and phenotype information for knowledge discovery
      What: Integrated data sources to run complex queries
      Enriching data with ontologies for integration, querying, and automation
      Ontologies beyond vocabularies: the power of relationships
      Where: NCRR (NIH)
      Status: Completed
    • 25. Use data to test hypothesis
      Gene name
      GO
      Interactions
      gene
      Sequence
      PubMed
      OMIM
      Link between glycosyltransferase activity and congenital muscular dystrophy?
      Glycosyltransferase
      Congenital muscular dystrophy
      Adapted from: Olivier Bodenreider, presentation at HCLS Workshop, WWW07
    • 26. In a Web pages world…
      (GeneID: 9215)
      has_associated_disease
      Congenital muscular dystrophy, type 1D
      has_molecular_function
      Acetylglucosaminyl-transferase activity
      Adapted from: Olivier Bodenreider, presentation at HCLS Workshop, WWW07
    • 27. With the semantically enhanced data
      glycosyltransferase
      GO:0016757
      isa
      GO:0008194
      GO:0016758
      acetylglucosaminyl-transferase
      GO:0008375
      has_molecular_function
      acetylglucosaminyl-transferase
      GO:0008375
      EG:9215
      LARGE
      Muscular dystrophy, congenital, type 1D
      MIM:608840
      has_associated_phenotype
      SELECT DISTINCT ?t ?g ?d {
      ?t is_a GO:0016757 .
      ?g has_molecular_function ?t .
      ?g has_associated_phenotype ?b2 .
      ?b2 has_textual_description ?d .
      FILTER regex(?d, "muscular dystrophy", "i") . FILTER regex(?d, "congenital", "i") }
      From medinfo paper.
      Adapted from: Olivier Bodenreider, presentation at HCLS Workshop, WWW07
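To make the query logic concrete, here is a minimal Python sketch that evaluates the same pattern over a handful of in-memory triples. It does not use a SPARQL engine. The identifiers are the ones shown on the slide, but the exact is_a edges between the GO terms are an assumption made for illustration, as is treating is_a as transitive; the helper names are hypothetical.

```python
import re

# Identifiers are from the slide; the is_a chain between them is assumed.
TRIPLES = {
    ("GO:0008375", "is_a", "GO:0008194"),
    ("GO:0008194", "is_a", "GO:0016758"),
    ("GO:0016758", "is_a", "GO:0016757"),
    ("EG:9215", "has_molecular_function", "GO:0008375"),
    ("EG:9215", "has_associated_phenotype", "MIM:608840"),
    ("MIM:608840", "has_textual_description",
     "Muscular dystrophy, congenital, type 1D"),
}


def objects(subject, predicate):
    """Objects of all triples matching (subject, predicate, ?)."""
    return {o for s, p, o in TRIPLES if s == subject and p == predicate}


def ancestors(term):
    """All terms reachable from `term` by following is_a edges."""
    seen, stack = set(), [term]
    while stack:
        for parent in objects(stack.pop(), "is_a"):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def genes_linking(go_root, *keywords):
    """(function term, gene, description) rows matching the query pattern."""
    results = set()
    for gene, predicate, term in TRIPLES:
        if predicate != "has_molecular_function" or go_root not in ancestors(term):
            continue
        for phenotype in objects(gene, "has_associated_phenotype"):
            for text in objects(phenotype, "has_textual_description"):
                # Both FILTER conditions: case-insensitive keyword match.
                if all(re.search(k, text, re.IGNORECASE) for k in keywords):
                    results.add((term, gene, text))
    return results


if __name__ == "__main__":
    for row in genes_linking("GO:0016757", "muscular dystrophy", "congenital"):
        print(row)
```

The point of the sketch is that the answer only appears once the gene record and the GO hierarchy sit in the same graph: the gene is annotated with GO:0008375, and the link to glycosyltransferase (GO:0016757) comes from walking is_a.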
    • 28. Project 4: Nicotine Dependence
      Why: For understanding the genetic basis of nicotine dependence.
      What: Integrate gene and pathway information and show how three complex biological queries can be answered by the integrated knowledge base.
      How: Semantic Web technologies (especially RDF, OWL, and SPARQL) support information integration and make it easy to create semantic mashups (semantically integrated resources).
      Where: NLM (NIH)
      Status: Completed research
    • 29. Motivation
      NIDA study on nicotine dependency
      List of candidate genes in humans
      Analysis objectives include:
      • Find interactions between genes
      • Identification of active genes – maximum number of pathways
      • Identification of genes based on anatomical locations
      Requires integration of genome and biological pathway information
    • 32. Genome and pathway information integration
      KEGG
      Reactome
      HumanCyc
      Entrez Gene
      GeneOntology
      HomoloGene
    • JBI
    • 41. Entrez Knowledge Model (EKoM)
      BioPAX ontology
    • 42. Results: Gene Pathway network and Hub Genes involved with Nicotine Dependence
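Once gene and pathway data are integrated, finding "hub" genes (genes that take part in the largest number of pathways) comes down to a grouping and a count. The sketch below assumes the integrated knowledge base can be exported as (gene, pathway) pairs; the function name and the placeholder gene and pathway labels are hypothetical and are not results from the study.

```python
from collections import defaultdict


def hub_genes(gene_pathway_pairs, top_n=1):
    """Rank genes by the number of distinct pathways they participate in."""
    pathways_by_gene = defaultdict(set)
    for gene, pathway in gene_pathway_pairs:
        pathways_by_gene[gene].add(pathway)  # a set ignores duplicate pairs
    ranked = sorted(pathways_by_gene.items(),
                    key=lambda item: (-len(item[1]), item[0]))
    return [(gene, len(pathways)) for gene, pathways in ranked[:top_n]]


if __name__ == "__main__":
    # Placeholder labels only; not data from the nicotine dependence study.
    pairs = [
        ("geneA", "pathway1"), ("geneA", "pathway2"), ("geneA", "pathway3"),
        ("geneB", "pathway1"),
        ("geneC", "pathway2"), ("geneC", "pathway3"),
    ]
    print(hub_genes(pairs, top_n=2))
```

Ties are broken alphabetically by gene name so that the ranking is deterministic.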
    • 43. Project 5: T. cruzi SPSE
      Why: For Integrative Parasite Research to help expedite knowledge discovery
      What: Semantics and Services Enabled Problem Solving Environment (PSE) for Trypanosoma cruzi
      Where: Center for Tropical and Emerging Global Diseases (CTEGD), UGA
      Who: Kno.e.sis, UGA, NCBO (Stanford)
      Status: Research prototype – in regular lab use
    • 44. Project Outline
      Data Sources
      • Internal Lab Data
      Gene Knockout
      Strain Creation
      Microarray
      Proteome
      • External Database
      Ontological Infrastructure
      • Parasite Lifecycle
      • Parasite Experiment
      Query processing
      • Cuebee
      Results
    • 46. Provenance in Parasite Research
      Gene Knockout and Strain Creation*
      Processes: Sequence Extraction, Plasmid Construction, Transfection, Drug Selection, Cell Cloning
      Data: Gene Name, 3' & 5' Region, Drug Resistant Plasmid, Knockout Construct Plasmid, T. cruzi sample, Transfected Sample, Selected Sample, Cloned Sample
      Related Queries from Biologists
      List all groups in the lab that used a Target Region Plasmid.
      Which researcher created a new strain of the parasite (with ID = 66)?
      An experiment was not successful – has this experiment been conducted earlier? What were the results?
      *T. cruzi Semantic Problem Solving Environment Project, Courtesy of D.B. Weatherly and Flora Logan, Tarleton Lab, University of Georgia
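The biologists' questions above are provenance queries: each asks who or what was involved in producing an artifact. A minimal sketch, assuming each lab step is logged with its agent, group, inputs, and output, could look like the following. The record layout, the researcher and group names, and the log entries are all hypothetical; the actual project models provenance with ontologies and RDF rather than Python objects.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """One logged lab step (hypothetical record layout)."""
    process: str
    agent: str
    group: str
    used: tuple
    generated: str


# Invented example log; names are placeholders, not lab data.
LOG = [
    Step("plasmid_construction", "researcher_1", "group_A",
         ("target_region_plasmid", "drug_resistant_plasmid"),
         "knockout_construct_plasmid"),
    Step("transfection", "researcher_2", "group_B",
         ("knockout_construct_plasmid", "t_cruzi_sample"),
         "transfected_sample"),
    Step("strain_creation", "researcher_2", "group_B",
         ("transfected_sample",), "strain_66"),
]


def groups_that_used(artifact):
    """Which groups ran a step that took `artifact` as input?"""
    return sorted({step.group for step in LOG if artifact in step.used})


def creator_of(artifact):
    """Which agent ran the step that generated `artifact`, if any?"""
    return next((s.agent for s in LOG if s.generated == artifact), None)


if __name__ == "__main__":
    print(groups_that_used("target_region_plasmid"))
    print(creator_of("strain_66"))
```

Both queries depend on every step using the same terms for processes and artifacts, which is the terminological consistency that a shared provenance model is meant to provide.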
    • 47. Research Accomplishments
      SPSE
      • Integrated internal data with external databases, such as KEGG, GO, and some datasets on TriTrypDB
      • Developed semantic provenance framework and influenced W3C community
      • SPSE supports complex biological queries that help find gene knockout, drug and/or vaccination targets. For example:
      • Show me proteins that are downregulated in the epimastigote stage and exist in a single metabolic pathway.
      • Give me the gene knockout summaries, both for plasmid construction and strain creation, for all gene knockout targets that are 2-fold upregulated in amastigotes at the transcript level and that have orthologs in Leishmania but not in Trypanosoma brucei.
    • Knowledge driven query formulation
      Complex queries can also include:
      - on-the-fly Web services execution to retrieve additional data
      - inference rules to make implicit knowledge explicit
    • Project 6: HPCO
      Why: Collaborative knowledge exploration over scientific literature
      What: An up-to-date knowledge based literature search and exploration framework
      How: Using information extraction, conventional IR, and semantic web technologies for collaborative literature exploration
      Where: AFRL
      Status: Completed research
    • 52. Focused KB Work Flow (Use case: HPCO)
      HPC keywords
      Doozer: Base Hierarchy from Wikipedia
      Focused Pattern based extraction
      SenseLab Neuroscience Ontologies
      Initial KB Creation
      Meta Knowledgebase
      PubMed Abstracts
      Knoesis: Parsing based NLP Triples
      Enrich Knowledge Base
      NLM: Rule based BKR Triples
      Final Knowledge Base
    • 53. Triple Extraction Approaches
      Open Extraction
      No fixed number of predetermined entities and predicates
      At Knoesis – NLP (parsing and dependency trees)
      Supervised Extraction
      Predetermined set of entities and predicates
      At Knoesis – Pattern based extraction to connect entities in the base hierarchy using statistical techniques
      At NLM – NLP and rule based approaches
    • 54. Mapping Triples to Base Hierarchy
      Entities in both subject and object must contain at least one concept from the hierarchy to be mapped to the KB
      Preliminary synonyms based on anchor labels and page redirects in Wikipedia
      Prolactostatin redirects to Dopamine
      Predicates (verbs) and entities are subjected to stemming using WordNet
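The mapping rule can be sketched as follows: a triple is kept only if both its subject and its object contain at least one concept from the base hierarchy, after synonyms are resolved through redirects. This is a simplified illustration, not the Knoesis implementation. The hierarchy contents and the example triples are assumptions (only the Prolactostatin redirect comes from the slide), tokens are split on whitespace, and the WordNet-based stemming step is left out.

```python
# Assumed toy hierarchy; a real one would be derived from Wikipedia by Doozer.
HIERARCHY = {"dopamine", "prolactin"}

# Synonyms from page redirects; this single entry is the slide's example.
REDIRECTS = {"prolactostatin": "dopamine"}


def concepts_in(entity):
    """Hierarchy concepts mentioned in an entity string (redirects resolved)."""
    found = set()
    for token in entity.lower().split():
        token = REDIRECTS.get(token, token)
        if token in HIERARCHY:
            found.add(token)
    return found


def map_triple(triple):
    """Map an extracted triple to hierarchy concepts, or return None."""
    subject, predicate, obj = triple
    subject_concepts = concepts_in(subject)
    object_concepts = concepts_in(obj)
    if subject_concepts and object_concepts:
        return (sorted(subject_concepts), predicate, sorted(object_concepts))
    return None  # one side has no known concept, so the triple is dropped


if __name__ == "__main__":
    # Illustrative triples, not taken from the project's extraction output.
    print(map_triple(("Prolactostatin", "inhibits", "prolactin secretion")))
    print(map_triple(("aspirin", "treats", "headache")))
```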
    • 55. Scooner: Full Architecture
    • 56. Scooner Features
      Knowledge-based browsing: Relations window, inverse relations, creating trails
      Persistent projects: Work bench, browsing history, comments, filtering
      Collaboration: comments, dashboard, exporting (sub)projects, importing projects
    • 57. Scooner Screenshot
    • 58. New Knowledge/hypothesis Example
      Three triples from different abstracts
      VIP Peptide – increases – Catecholamine Biosynthesis
      Catecholamines – induce – β-adrenergic receptor activity
      β-adrenergic receptors – are involved – fear conditioning
      New implicit knowledge
      VIP Peptide – affects – fear conditioning
      Caveat: Each triple above was observed in a different organism (cows, mice, humans), but still interesting hypothesis. Scooner’s contextual browsing makes this clear to the user.
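The hypothesis example above amounts to finding a path through a graph of triples. A minimal sketch using breadth-first search is shown below. It assumes the three triples have already been normalized so that the object of one matches the subject of the next (for example, "Catecholamine Biosynthesis" and "Catecholamines" both mapped to one concept). That normalization and the function name `find_trail` are assumptions, not a description of how Scooner works internally.

```python
from collections import deque

# The slide's three triples, with entity names normalized by assumption.
TRIPLES = [
    ("VIP peptide", "increases", "catecholamine"),
    ("catecholamine", "induces", "beta-adrenergic receptor"),
    ("beta-adrenergic receptor", "is involved in", "fear conditioning"),
]


def find_trail(triples, start, goal):
    """Shortest chain of triples leading from `start` to `goal`, or None."""
    edges = {}
    for subject, predicate, obj in triples:
        edges.setdefault(subject, []).append((predicate, obj))

    queue = deque([(start, [])])
    visited = {start}
    while queue:
        node, trail = queue.popleft()
        if node == goal:
            return trail
        for predicate, obj in edges.get(node, []):
            if obj not in visited:
                visited.add(obj)
                queue.append((obj, trail + [(node, predicate, obj)]))
    return None


if __name__ == "__main__":
    for step in find_trail(TRIPLES, "VIP peptide", "fear conditioning"):
        print(" - ".join(step))
```

A path found this way is only a candidate hypothesis. As the caveat notes, each link may come from a different organism, so the source context of every triple should stay visible to the user.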
    • 59. Project 7: Drug Abuse
      Why: To study social trends in pharmaceutical opioid abuse
      What:
      Describe drug users' knowledge, attitudes, and behaviors related to illicit use of OxyContin®
      Describe temporal patterns of non-medical use of OxyContin® tablets as discussed on Web-based forums
      Where: CITAR (Center for Interventions, Treatment and Addictions Research) at Wright State Univ.
      Status: In progress (recently funded by NIDA)
    • 61. Project 8: NMR
      Why: Streamline the NMR data processing tasks. Processing NMR experimental data is complex and time consuming.
      What: Providing biologists with tools to effectively process and manage Nuclear Magnetic Resonance (NMR) experimental data.
      How: Use Domain Specific Languages (DSL) to create scientist-friendly abstractions for complex statistical workflows. Use semantics based techniques to store and manage data.
      Where: Air Force Research Lab
      Status: In progress
    • 62. Motivation
      • NMR spectroscopy data is complex and requires significant statistical processing before interpretation
      - Writing these processes is hard
      - They have to run on many different computational platforms
      - The data collected has to be shared among multiple parties
      A simple NMR spectrum, highlighting peaks that correspond to the presence of specific chemical compounds
    • 63. A complex NMR spectrum, marked with chemical compound identifiers by human observers.
    • 64. Project Outline
      • Identify fundamental operators required for common NMR processing tasks
      • Use a DSL to provide abstractions for the operators (named SCALE)
      • Build compilers to generate multiple, cloud-enabled applications
    • Real-time Healthcare Information
      Matching medical requirements with availability of medical resources (Mumbai, India)
      Project HERO: Helpline for Emergency Response Operations
      For patients seeking immediate medical help
      Medical awareness in rural India
      mMitra: information service during pregnancy and childhood emergencies
      Medical
      Emergency
      Medical
      Resources
      Information bridge
    • 67. Future Interoperability Challenge: 360-degree health
      Insurance,
      Financial Aspects
      Clinical Care
      Follow up,
      Lifestyle
      Genetic Tests…
      Profiles
      Clinical Trials
      Social Media
    • 68. For each component in 360-degree health care, we have data, processes, knowledge and experience. Interoperability solutions need to encompass all these!
      Possibly the largest growth in data will be in sensors (e.g., Body Area Networks, biosensors) and social content. Extensive use of mobile phones.
      Credit: ece.virginia.edu
    • 69. Summary
      Semantic Web is an “interoperability technology”
      Semantic Web provides the needed interoperability, and can accommodate all necessary “points of view”
      Linked Data as a way of sharing data is highly promising
      Many examples of viable usage of Semantic Web technologies
      Words of warning about deployment
      Significant research challenges remain as Health presents the most complex domain
    • 70. Representative References
      A. Sheth, S. Agrawal, J. Lathem, N. Oldham, H. Wingate, P. Yadav, and K. Gallagher, Active Semantic Electronic Medical Record, Intl Semantic Web Conference, 2006.
      Satya Sahoo, Olivier Bodenreider, Kelly Zeng, and Amit Sheth, An Experiment in Integrating Large Biomedical Knowledge Resources with RDF: Application to Associating Genotype and Phenotype Information, WWW2007 HCLS Workshop, May 2007.
      Satya S. Sahoo, Kelly Zeng, Olivier Bodenreider, and Amit Sheth, "From Glycosyltransferase to Congenital Muscular Dystrophy: Integrating Knowledge from NCBI Entrez Gene and the Gene Ontology", Amsterdam: IOS, August 2007, PMID: 17911917, pp. 1260-4
      Satya S. Sahoo, Olivier Bodenreider, Joni L. Rutter, Karen J. Skinner, Amit P. Sheth, An ontology-driven semantic mash-up of gene and biological pathway information: Application to the domain of nicotine dependence, Journal of Biomedical Informatics, 2008.
      Cartic Ramakrishnan, Krzysztof J. Kochut, and Amit Sheth, "A Framework for Schema-Driven Relationship Discovery from Unstructured Text", Intl Semantic Web Conference, 2006, pp. 583-596
      Satya S. Sahoo, Christopher Thomas, Amit Sheth, William S. York, and Samir Tartir, "Knowledge Modeling and Its Application in Life Sciences: A Tale of Two Ontologies", 15th International World Wide Web Conference (WWW2006), Edinburgh, Scotland, May 23-26, 2006.
      Satya S. Sahoo, Olivier Bodenreider, Pascal Hitzler, Amit Sheth and Krishnaprasad Thirunarayan, "Provenance Context Entity (PaCE): Scalable provenance tracking for scientific RDF data", SSDBM, Heidelberg, Germany, 2010.
      Papers: http://knoesis.org/library
      Demos at: http://knoesis.wright.edu/library/demos/