Semantic (Web) Technologies for Translational Research in Life Sciences
  • Cognitive model, cognitive behavioral model
  • In parasite research, researchers create new strains of a parasite by knocking out specific genes. So, given a cloned sample, we may need to know which gene(s) were knocked out. Both these scenarios are real-world examples of the importance of provenance. There are many research issues in provenance management. This presentation addresses (1) provenance modeling, specifically provenance interoperability, consistent modeling, and reduction of terminological heterogeneity, and (2) provenance query.
  • References: http://www.armman.org/projecthero http://www.armman.org/mmitra


  • Semantic (Web) Technologies for Translational Research in Life Sciences
    Ohio State University, June 16, 2011
    Amit P. Sheth
    Ohio Center of Excellence in Knowledge-enabled Computing (Kno.e.sis)
    amit.sheth@wright.edu
    Thanks to Kno.e.sis team (Satya, Priti, Rama, and Ajith);
    Collaborators at CTEGD, UGA (Dr. Tarleton, Brent Weatherly), NLM (Olivier Bodenreider), CCRC, UGA (Will York), NCBO/Stanford,
    CITAR/WSU
  • Kno.e.sis: Ohio Center of Excellence in Knowledge-enabled Computing
  • Web of people
    - social networks, user-created casual content
    Web of resources
    - data, service, data, mashups
    Web of databases
    - dynamically generated pages
    - web query interfaces
    Web of pages
    - text, manually created links
    - extensive navigation
    Evolution of Web & Semantic Computing
    Tech assimilated in life
    Web of Sensors, Devices/IoT
    - 40 billion sensors, 5 billion mobile connections
    2007
    Situations,
    Events
    Web 3.0
    Semantic Technology Used
    Objects
    Web 2.0
    Patterns
    Keywords
    1997
    Web 1.0
  • Outline
    Semantic Web – very brief intro
    Scenarios to demonstrate the applications and benefit of semantic web technologies
    HealthCare
    Biomedical Research
    Translational
  • Biomedical Informatics...
    Biomedical Informatics
    PubMed
    ClinicalTrials.gov
    ...needs a connection
    Hypothesis Validation
    Experiment design
    Predictions
    Personalized medicine
    Semantic Web research aims at
    providing this connection!
    Etiology
    Pathogenesis
    Clinical findings
    Diagnosis
    Prognosis
    Treatment
    Genome
    Transcriptome
    Proteome
    Metabolome
    Physiome
    ...ome
    More advanced capabilities for
    search,
    integration,
    analysis,
    linking to new insights
    and discoveries!
    Genbank
    Uniprot
    Medical Informatics
    Bioinformatics
  • Decision Making, Insights, Innovations
    Human Performance
    Data and Facts
    Knowledge and Understanding
    Health & Performance
    Cognitive Science, Psychology
    Neuroscience
    Anatomy, Physiology
    Cellular biology
    Molecular Biology
    ACATATGGGTACTATTTACTATTCATGGGTACTATTTATGGCATATGGCGTACTATTCTAATCCTATATCCGTCTAATCTATTTACTATTATCTATTACTATACCTTTTGGGGAAAAAAATTCTATACCGTCTAATCCTATAAATCAAGCCG
    Biochemistry
  • Semantic Web standards @ W3C
    Semantic Web is built in a layered manner
    Not everybody needs all the layers

    Queries: SPARQL, Rules: RIF
    Semantic Web
    Rich ontologies: OWL
    Simple data models & taxonomies: RDF Schema
    Uniform metamodel: RDF + URI
    Encoding structure: XML
    Encoding characters: Unicode
  • Linked Data: Semantic Web “diluted”
    Achieve for data what Web did to documents
    Relationship with the original Semantic Web vision: no AI, no agents, no autonomy
    Interoperability is still very important
    interoperability of formats
    interoperability of semantics
    Enables interchange of large data sets
    (thus very useful in, say, collaborative research)
    Semantic Web vision is largely predicated on the availability of data
    Linked Data is a movement that gets us there
    Thanks – Ora Lassila
  • Opportunity: exploiting clinical and biomedical data
    text
    Health
    Information
    Services
    Elsevier
    iConsult
    Scientific
    Literature
    PubMed
    300 Documents
    Published Online
    each day
    User-contributed
    Content (Informal)
    GeneRifs
    WikiGene
    NCBI
    Public Datasets
    Genome,
    Protein DBs
    new sequences
    daily
    Laboratory
    Data
    Lab tests,
    RTPCR,
    Mass spec
    Clinical Data
    Personal
    health history
    Search, browsing, complex query, integration, workflow, analysis, hypothesis validation, decision support.
  • Major Community Efforts
    W3C Semantic Web Health Care & Life Sciences Interest Group: http://www.w3.org/2001/sw/hcls/
    Clinical Observations Interoperability: EMR + Clinical Trials: http://esw.w3.org/HCLS/ClinicalObservationsInteroperability
    National Center for Biomedical Ontologies: http://bioportal.bioontology.org/
  • Major SW Projects
    OpenPHACTS: A knowledge management project of the Innovative Medicines Initiative (IMI), a unique partnership between the European Community and the European Federation of Pharmaceutical Industries and Associations (EFPIA). http://www.openphacts.org/
    LarKC: develop the Large Knowledge Collider, a platform for massive distributed incomplete reasoning that will remove the scalability barriers of currently existing reasoning systems for the Semantic Web. http://www.larkc.eu/
    NCBO: contribute to collaborative science and translational research. http://bioportal.bioontology.org/
  • Semantic Web Enablers and Techniques
    Ontology: Agreement with Common Vocabulary & Domain Knowledge; Schema + Knowledge base
    Semantic Annotation (metadata extraction): Manual, Semi-automatic (automatic with human verification), Automatic
    Semantic Computation: semantics enabled search, integration, complex queries, analysis (paths, subgraph), pattern finding, mining, inferencing, reasoning, hypothesis validation, discovery, visualization
  • Drug Ontology Hierarchy (showing is-a relationships)
    owl:thing
    prescription_drug_brand_name
    brandname_undeclared
    brandname_composite
    prescription_drug
    monograph_ix_class
    cpnum_group
    prescription_drug_property
    indication_property
    formulary_property
    non_drug_reactant
    interaction_property
    property
    formulary
    brandname_individual
    interaction_with_prescription_drug
    interaction
    indication
    generic_individual
    prescription_drug_generic
    generic_composite
    interaction_with_monograph_ix_class
    interaction_with_non_drug_reactant
  • N-glycan_beta_GlcNAc_9
    N-glycan_alpha_man_4
    GNT-V attaches GlcNAc at position 6
    N-acetyl-glucosaminyl_transferase_V
    UDP-N-acetyl-D-glucosamine + alpha-D-Mannosyl-1,3-(R1)-beta-D-mannosyl-R2 <=> UDP + N-Acetyl-beta-D-glucosaminyl-1,2-alpha-D-mannosyl-1,3-(R1)-beta-D-mannosyl-R2
    UDP-N-acetyl-D-glucosamine + G00020 <=> UDP + G00021
    N-Glycosylation metabolic pathway
    GNT-I attaches GlcNAc at position 2
  • Maturing capabilities and ongoing research
    Ontology Creation
    Semantic Annotation & Text mining: Entity recognition, Relationship extraction
    Semantic Integration & Provenance:
    Integrating all types of data used in biomedical research: text, experimental data, curated/structured/public and multimedia
    Semantic search, browsing, analysis
    Clinical and Scientific Workflows with semantic web services
    Semantic Exploration of scientific literature, Undiscovered public knowledge
  • Project 1: ASEMR
    Why: Improve Quality of Care and Decision Making without loss of Efficiency in active Cardiology practice.
    What: Use of semantic Web technologies for clinical decision support
    Where: Athens Heart Center & its partners and labs
    Status: In use continuously since 01/2006
  • Operational since January 2006
    Details: http://knoesis.org/library/resource.php?id=00004
  • Active Semantic EMR
    Annotate ICD9s
    Annotate Doctors
    Lexical Annotation
    Insurance Formulary
    Level 3 Drug Interaction
    Drug Allergy
    Demo at: http://knoesis.org/library/demos/
  • Project 2: Glycomics
    Why: To help in the treatment of certain kinds of cancer and Parkinson's Disease.
    What: Semantic Annotation of Experiment Data
    Where: Complex Carbohydrate Research Center, UGA
    Status: Research prototype in use
    Workflow with Semantic Annotation of Experimental Data already in use
  • N-Glycosylation Process (NGP)
    Cell Culture
    extract
    Glycoprotein Fraction
    proteolysis
    Glycopeptides Fraction
    1
    Separation technique I
    n
    Glycopeptides Fraction
    PNGase
    n
    Peptide Fraction
    Separation technique II
    n*m
    Peptide Fraction
    Mass spectrometry
    ms data
    ms/ms data
    Data reduction
    Data reduction
    ms peaklist
    ms/ms peaklist
    binning
    Peptide identification
    Glycopeptide identification
    and quantification
    Peptide list
    N-dimensional array
    Data correlation
    Signal integration
  • Scientific workflow for proteome analysis
    [Figure: workflow steps run by agents, with input (I) and output (O) links to storage]
    Steps: Biological Sample, Analysis by MS/MS, Raw Data to Standard Format, Data Pre-process, DB Search (Mascot/Sequest), Results Post-process (ProValt)
    Stored data: Raw Data, Standard Format Data, Filtered Data, Search Results, Final Output
    Other labels: Biological Information, Semantic Annotation, Applications
  • Semantic Annotation of Experimental Data
    Mass Spectrometry (MS) Data: ms/ms peaklist data
    parent ion m/z    parent ion abundance    parent ion charge
    830.9570          194.9604                2
    fragment ion m/z    fragment ion abundance
    580.2985            0.3592
    688.3214            0.2526
    779.4759            38.4939
    784.3607            21.7736
    1543.7476           1.3822
    1544.7595           2.9977
    1562.8113           37.4790
    1660.7776           476.5043
  • Semantic Annotation of Experimental Data
    <ms-ms_peak_list>
    <parameter instrument="micromass_QTOF_2_quadropole_time_of_flight_mass_spectrometer"
    mode="ms-ms"/>
    <parent_ion m-z="830.9570" abundance="194.9604" z="2"/>
    <fragment_ion m-z="580.2985" abundance="0.3592"/>
    <fragment_ion m-z="688.3214" abundance="0.2526"/>
    <fragment_ion m-z="779.4759" abundance="38.4939"/>
    <fragment_ion m-z="784.3607" abundance="21.7736"/>
    <fragment_ion m-z="1543.7476" abundance="1.3822"/>
    <fragment_ion m-z="1544.7595" abundance="2.9977"/>
    <fragment_ion m-z="1562.8113" abundance="37.4790"/>
    <fragment_ion m-z="1660.7776" abundance="476.5043"/>
    </ms-ms_peak_list>
    Ontological Concepts
    Semantically Annotated MS Data
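A minimal sketch of how a raw ms/ms peak list could be wrapped in annotated XML of the kind shown on this slide. The element and attribute names and the numeric values are copied from the slide; the function name `annotate_peak_list` and the use of Python's standard `xml.etree.ElementTree` are assumptions made for illustration, not the tooling used in the project.

```python
import xml.etree.ElementTree as ET

# Values copied from the ms/ms peak list on the slide.
PARENT_ION = {"m-z": "830.9570", "abundance": "194.9604", "z": "2"}
FRAGMENT_IONS = [
    ("580.2985", "0.3592"),
    ("688.3214", "0.2526"),
    ("779.4759", "38.4939"),
    ("784.3607", "21.7736"),
    ("1543.7476", "1.3822"),
    ("1544.7595", "2.9977"),
    ("1562.8113", "37.4790"),
    ("1660.7776", "476.5043"),
]
# Instrument label spelled exactly as on the slide.
INSTRUMENT = "micromass_QTOF_2_quadropole_time_of_flight_mass_spectrometer"


def annotate_peak_list(parent, fragments, instrument):
    """Build an <ms-ms_peak_list> element (hypothetical helper)."""
    root = ET.Element("ms-ms_peak_list")
    ET.SubElement(root, "parameter", {"instrument": instrument, "mode": "ms-ms"})
    ET.SubElement(root, "parent_ion", parent)
    for mz, abundance in fragments:
        ET.SubElement(root, "fragment_ion", {"m-z": mz, "abundance": abundance})
    return root


root = annotate_peak_list(PARENT_ION, FRAGMENT_IONS, INSTRUMENT)
xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)
```

Once the bare numbers carry element and attribute names, a downstream tool can look up "parent ion charge" by name instead of by column position, which is the point of the annotation step.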
  • Project 3:
    Why: To associate genotype and phenotype information for knowledge discovery
    What: Integrate data sources to run complex queries
    Enriching data with ontologies for integration, querying, and automation
    Ontologies beyond vocabularies: the power of relationships
    Where: NCRR (NIH)
    Status: Completed
  • Use data to test hypothesis
    Gene name
    GO
    Interactions
    gene
    Sequence
    PubMed
    OMIM
    Link between glycosyltransferase activity and congenital muscular dystrophy?
    Glycosyltransferase
    Congenital muscular dystrophy
    Adapted from: Olivier Bodenreider, presentation at HCLS Workshop, WWW07
  • In a Web pages world…
    (GeneID: 9215)
    has_associated_disease
    Congenital muscular dystrophy, type 1D
    has_molecular_function
    Acetylglucosaminyl-transferase activity
    Adapted from: Olivier Bodenreider, presentation at HCLS Workshop, WWW07
  • With the semantically enhanced data
    glycosyltransferase
    GO:0016757
    isa
    GO:0008194
    GO:0016758
    acetylglucosaminyl-transferase
    GO:0008375
    has_molecular_function
    acetylglucosaminyl-transferase
    GO:0008375
    EG:9215
    LARGE
    Muscular dystrophy, congenital, type 1D
    MIM:608840
    has_associated_phenotype
    SELECT DISTINCT ?t ?g ?d {
    ?t is_a GO:0016757 .
    ?g has_molecular_function ?t .
    ?g has_associated_phenotype ?b2 .
    ?b2 has_textual_description ?d .
    FILTER regex(?d, "muscular dystrophy", "i") . FILTER regex(?d, "congenital", "i") }
    From medinfo paper.
    Adapted from: Olivier Bodenreider, presentation at HCLS Workshop, WWW07
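The query on this slide can be mimicked over a handful of triples in plain Python. This is only a sketch: the identifiers and the phenotype text come from the slide, but the two `is_a` edges are an assumed reading of the diagram, the transitive closure step stands in for whatever reasoning the real store performed, and the helper names (`with_transitive_is_a`, `match`, `query`) are hypothetical.

```python
import re

TRIPLES = {
    ("GO:0008375", "is_a", "GO:0008194"),  # assumed edge (read off the diagram)
    ("GO:0008194", "is_a", "GO:0016757"),  # assumed edge (read off the diagram)
    ("EG:9215", "has_molecular_function", "GO:0008375"),
    ("EG:9215", "has_associated_phenotype", "MIM:608840"),
    ("MIM:608840", "has_textual_description",
     "Muscular dystrophy, congenital, type 1D"),
}


def with_transitive_is_a(triples):
    """Return the triples plus every is_a edge implied by transitivity."""
    closed = set(triples)
    changed = True
    while changed:
        changed = False
        is_a = [(s, o) for s, p, o in closed if p == "is_a"]
        for s, mid in is_a:
            for mid2, o in is_a:
                if mid == mid2 and (s, "is_a", o) not in closed:
                    closed.add((s, "is_a", o))
                    changed = True
    return closed


def match(pattern, triple, binding):
    """Unify one pattern with one triple; return the extended binding or None."""
    new = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            if new.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return new


def query(triples, patterns):
    """Join the patterns left to right, like a basic graph pattern."""
    bindings = [{}]
    for pattern in patterns:
        extended = []
        for binding in bindings:
            for triple in triples:
                result = match(pattern, triple, binding)
                if result is not None:
                    extended.append(result)
        bindings = extended
    return bindings


PATTERNS = [
    ("?t", "is_a", "GO:0016757"),
    ("?g", "has_molecular_function", "?t"),
    ("?g", "has_associated_phenotype", "?b2"),
    ("?b2", "has_textual_description", "?d"),
]

rows = query(with_transitive_is_a(TRIPLES), PATTERNS)
results = sorted(
    {(b["?t"], b["?g"], b["?d"]) for b in rows
     if re.search("muscular dystrophy", b["?d"], re.I)
     and re.search("congenital", b["?d"], re.I)}
)
print(results)
```

With these five triples the only answer links GO:0008375 and gene EG:9215 to the congenital muscular dystrophy description, which is the connection the slide highlights.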
  • Project 4: Nicotine Dependence
    Why: For understanding the genetic basis of nicotine dependence.
    What: Integrate gene and pathway information and show how three complex biological queries can be answered by the integrated knowledge base.
    How: Semantic Web technologies (especially RDF, OWL, and SPARQL) support information integration and make it easy to create semantic mashups (semantically integrated resources).
    Where: NLM (NIH)
    Status: Completed research
  • Motivation
    NIDA study on nicotine dependency
    List of candidate genes in humans
    Analysis objectives include:
    • Find interactions between genes
    • Identification of active genes – maximum number of pathways
    • Identification of genes based on anatomical locations
    Requires integration of genome and biological pathway information
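As a rough illustration of the first two analysis objectives, the sketch below counts pathways per gene and lists gene pairs that share a pathway. All gene and pathway names are placeholders invented for the example, and treating a shared pathway as a sign of interaction is a simplifying assumption, not the method of the study.

```python
from collections import defaultdict
from itertools import combinations

# Placeholder (gene, pathway) pairs; not real study data.
GENE_PATHWAY = [
    ("GENE_A", "pathway_1"),
    ("GENE_A", "pathway_2"),
    ("GENE_A", "pathway_3"),
    ("GENE_B", "pathway_1"),
    ("GENE_C", "pathway_3"),
    ("GENE_C", "pathway_4"),
    ("GENE_D", "pathway_5"),
]


def pathways_by_gene(pairs):
    """Index the integrated data as gene -> set of pathways."""
    index = defaultdict(set)
    for gene, pathway in pairs:
        index[gene].add(pathway)
    return index


def hub_genes(index):
    """Genes that take part in the maximum number of pathways."""
    top = max(len(pathways) for pathways in index.values())
    return sorted(g for g, pathways in index.items() if len(pathways) == top)


def linked_gene_pairs(index):
    """Gene pairs that appear together in at least one pathway."""
    return sorted(
        (a, b) for a, b in combinations(sorted(index), 2) if index[a] & index[b]
    )


index = pathways_by_gene(GENE_PATHWAY)
print(hub_genes(index))
print(linked_gene_pairs(index))
```

Both questions need gene records and pathway records in one place, which is why the slide calls for integrating genome and pathway sources first.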
  • Genome and pathway information integration
    KEGG
    Reactome
    HumanCyc
    • pathway
    • protein
    • pmid
    Entrez Gene
    • pathway
    • protein
    • pmid
    • pathway
    • protein
    • pmid
    GeneOntology
    HomoloGene
    • GO ID
    • HomoloGene ID
  • JBI
  • Entrez
    Knowledge
    Model
    (EKoM)
    BioPAX
    ontology
  • Results: Gene Pathway network and Hub Genes involved with Nicotine Dependence
  • Project 5: T. cruzi SPSE
    Why: For Integrative Parasite Research to help expedite knowledge discovery
    What: Semantics and Services Enabled Problem Solving Environment (PSE) for Trypanosoma cruzi
    Where: Center for Tropical and Emerging Global Diseases (CTEGD), UGA
    Who: Kno.e.sis, UGA, NCBO (Stanford)
    Status: Research prototype – in regular lab use
  • Project Outline
    Data Sources
    • Internal Lab Data
    Gene Knockout
    Strain Creation
    Microarray
    Proteome
    • External Database
    Ontological Infrastructure
    • Parasite Lifecycle
    • Parasite Experiment
    Query processing
    • Cuebee
    Results
  • Provenance in Parasite Research
    Gene Name
    Sequence
    Extraction
    Gene Knockout and Strain Creation*
    Related Queries from Biologists
    List all groups in the lab that used a Target Region Plasmid?
    Which researcher created a new strain of the parasite (with ID = 66)?
    An experiment was not successful – has this experiment been conducted earlier? What were the results?
    3' & 5'
    Region
    Drug Resistant Plasmid
    Gene Name
    Plasmid
    Construction
    Knockout Construct Plasmid
    T. cruzi sample
    ?
    Transfection
    Transfected Sample
    Drug
    Selection
    Cloned Sample
    Selected Sample
    Cell
    Cloning
    Cloned
    Sample
    *T.cruzi Semantic Problem Solving Environment Project, Courtesy of D.B. Weatherly and Flora Logan, Tarleton Lab, University of Georgia
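The biologists' questions above are lineage lookups over a record of which process, run by whom, produced each artifact. The sketch below models a simplified version of the gene knockout and strain creation workflow; the process names follow the slide, while every artifact ID, researcher name, and helper function is hypothetical.

```python
# Each produced artifact maps to the process, agent, and inputs that made it.
# All IDs and names are invented for illustration.
PROVENANCE = {
    "knockout_construct_plasmid_1": {
        "process": "plasmid_construction",
        "agent": "researcher_A",
        "inputs": ["gene_X", "drug_resistant_plasmid_1"],
    },
    "transfected_sample_1": {
        "process": "transfection",
        "agent": "researcher_B",
        "inputs": ["knockout_construct_plasmid_1", "t_cruzi_sample_1"],
    },
    "selected_sample_1": {
        "process": "drug_selection",
        "agent": "researcher_B",
        "inputs": ["transfected_sample_1"],
    },
    "strain_66": {
        "process": "cell_cloning",
        "agent": "researcher_C",
        "inputs": ["selected_sample_1"],
    },
}


def created_by(artifact):
    """Who ran the process that produced this artifact?"""
    return PROVENANCE[artifact]["agent"]


def lineage(artifact):
    """All upstream artifacts, found by walking the input links."""
    seen = []
    stack = [artifact]
    while stack:
        current = stack.pop()
        for parent in PROVENANCE.get(current, {}).get("inputs", []):
            if parent not in seen:
                seen.append(parent)
                stack.append(parent)
    return seen


def knocked_out_genes(artifact):
    """Genes found anywhere upstream of a cloned sample."""
    return sorted(a for a in lineage(artifact) if a.startswith("gene_"))


print(created_by("strain_66"))
print(knocked_out_genes("strain_66"))
```

A dictionary is enough for a sketch; a provenance ontology adds shared terms for processes, agents, and artifacts so the same questions can be asked across labs.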
  • Research Accomplishments
    SPSE
    • Integrated internal data with external databases, such as KEGG, GO, and some datasets on TriTrypDB
    • Developed a semantic provenance framework and influenced the W3C community
    • SPSE supports complex biological queries that help find gene knockout, drug and/or vaccination targets. For example:
    • Show me proteins that are downregulated in the epimastigote stage and exist in a single metabolic pathway.
    • Give me the gene knockout summaries, both for plasmid construction and strain creation, for all gene knockout targets that are 2-fold upregulated in amastigotes at the transcript level and that have orthologs in Leishmania but not in Trypanosoma brucei.
  • Knowledge driven query formulation
    Complex queries can also include:
    - on-the-fly Web services execution to retrieve additional data
    - inference rules to make implicit knowledge explicit
  • Project 6: HPCO
    Why: Collaborative knowledge exploration over scientific literature
    What: An up-to-date knowledge based literature search and exploration framework
    How: Using information extraction, conventional IR, and semantic web technologies for collaborative literature exploration
    Where: AFRL
    Status: Completed research
  • Focused KB Work Flow (Use case: HPCO)
    HPC keywords
    Doozer: Base Hierarchy from Wikipedia
    Focused Pattern based extraction
    SenseLab Neuroscience Ontologies
    Initial KB Creation
    Meta Knowledgebase
    PubMed Abstracts
    Knoesis: Parsing based NLP Triples
    Enrich Knowledge Base
    NLM: Rule based BKR Triples
    Final Knowledge Base
  • Triple Extraction Approaches
    Open Extraction
    No fixed number of predetermined entities and predicates
    At Knoesis – NLP (parsing and dependency trees)
    Supervised Extraction
    Predetermined set of entities and predicates
    At Knoesis – Pattern based extraction to connect entities in the base hierarchy using statistical techniques
    At NLM – NLP and rule based approaches
  • Mapping Triples to Base Hierarchy
    Entities in both subject and object must contain at least one concept from the hierarchy to be mapped to the KB
    Preliminary synonyms based on anchor labels and page redirects in Wikipedia
    Prolactostatin redirects to Dopamine
    Predicates (verbs) and entities are subjected to stemming using Wordnet
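A small sketch of the mapping rule on this slide: a triple is kept only when both its subject and its object mention at least one hierarchy concept, after synonyms are resolved. The Prolactostatin to Dopamine redirect is from the slide; the hierarchy contents, the example triples, and the function names are assumptions, and the WordNet stemming step is left out to keep the sketch dependency-free.

```python
# Hypothetical base hierarchy (single-word concepts only, for simplicity).
HIERARCHY = {"dopamine", "prolactin"}

# Synonyms from Wikipedia redirects; this one is the slide's example.
REDIRECTS = {"prolactostatin": "dopamine"}


def normalize(entity):
    """Lowercase the entity and replace redirected words with their target."""
    return [REDIRECTS.get(word, word) for word in entity.lower().split()]


def concepts_in(entity):
    """Hierarchy concepts mentioned in an entity string."""
    return {word for word in normalize(entity) if word in HIERARCHY}


def map_triple(triple):
    """Return the triple mapped to hierarchy concepts, or None if unmappable."""
    subject, predicate, obj = triple
    subject_concepts = concepts_in(subject)
    object_concepts = concepts_in(obj)
    if not subject_concepts or not object_concepts:
        return None
    return (sorted(subject_concepts), predicate, sorted(object_concepts))


# Illustrative extracted triples (not taken from the project's data).
mapped = map_triple(("Prolactostatin", "inhibits", "prolactin secretion"))
dropped = map_triple(("Prolactostatin", "was measured in", "plasma"))
print(mapped)
print(dropped)
```

The second triple is dropped because its object contains no hierarchy concept, even though its subject resolves to one.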
  • Scooner: Full Architecture
  • Scooner Features
    Knowledge-based browsing: Relations window, inverse relations, creating trails
    Persistent projects: Work bench, browsing history, comments, filtering
    Collaboration: comments, dashboard, exporting (sub)projects, importing projects
  • Scooner Screenshot
  • New Knowledge/hypothesis Example
    Three triples from different abstracts
    VIP Peptide – increases – Catecholamine Biosynthesis
    Catecholamines – induce – β-adrenergic receptor activity
    β-adrenergic receptors – are involved – fear conditioning
    New implicit knowledge
    VIP Peptide – affects – fear conditioning
    Caveat: Each triple above was observed in a different organism (cows, mice, humans), but it is still an interesting hypothesis. Scooner’s contextual browsing makes this clear to the user.
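The implicit link in this example can be found by treating the triples as edges in a graph and searching for a trail between two concepts. The three triples below are the ones on the slide; the mention-to-concept table and the `find_trail` helper are hypothetical, and the sketch ignores the organism context that the caveat warns about.

```python
from collections import deque

TRIPLES = [
    ("VIP Peptide", "increases", "Catecholamine Biosynthesis"),
    ("Catecholamines", "induce", "β-adrenergic receptor activity"),
    ("β-adrenergic receptors", "are involved", "fear conditioning"),
]

# Hypothetical table that maps differing mentions onto one shared concept.
CONCEPT = {
    "Catecholamine Biosynthesis": "catecholamine",
    "Catecholamines": "catecholamine",
    "β-adrenergic receptor activity": "β-adrenergic receptor",
    "β-adrenergic receptors": "β-adrenergic receptor",
}


def concept(mention):
    return CONCEPT.get(mention, mention)


def find_trail(triples, start, goal):
    """Breadth-first search for a chain of triples from start to goal."""
    edges = {}
    for subject, predicate, obj in triples:
        edges.setdefault(concept(subject), []).append((predicate, concept(obj)))
    queue = deque([(concept(start), [])])
    visited = {concept(start)}
    while queue:
        node, trail = queue.popleft()
        if node == concept(goal):
            return trail
        for predicate, nxt in edges.get(node, []):
            if nxt not in visited:
                visited.add(nxt)
                queue.append((nxt, trail + [(node, predicate, nxt)]))
    return None


trail = find_trail(TRIPLES, "VIP Peptide", "fear conditioning")
hypothesis = (trail[0][0], "affects", trail[-1][2])
print(trail)
print(hypothesis)
```

Keeping the full trail, rather than only the end points, lets a reader inspect each supporting triple and its source before trusting the hypothesis.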
  • Project 7: Drug Abuse
    Why: To study social trends in pharmaceutical opioid abuse
    What:
    Describe drug users’ knowledge, attitudes, and behaviors related to illicit use of OxyContin®
    Describe temporal patterns of non-medical use of OxyContin® tablets as discussed on Web-based forums
    Where: CITAR (Center for Interventions, Treatment and Addictions Research) at Wright State Univ.
    Status: In progress (recently funded by NIDA)
  • Project 8: NMR
    Why: Streamline the NMR data processing tasks. Processing NMR experimental data is complex and time consuming.
    What: Providing biologists with tools to effectively process and manage Nuclear Magnetic Resonance (NMR) experimental data.
    How: Use Domain Specific Languages (DSL) to create scientist-friendly abstractions for complex statistical workflows. Use semantics based techniques to store and manage data.
    Where: Air Force Research Lab
    Status: In progress
  • Motivation
    • NMR spectroscopy data is complex and requires significant statistical processing before interpretation
    - Writing these processes is hard
    - They have to run on many different computational platforms
    - The data collected has to be shared among multiple parties
    A simple NMR spectrum, highlighting peaks that correspond to the presence of specific chemical compounds
  • A complex NMR spectrum, marked with chemical compound identifiers by human observers.
  • Project Outline
    • Identify fundamental operators required for common NMR processing tasks
    • Use a DSL to provide abstractions for the operators (named SCALE)
    • Build compilers to generate multiple, cloud-enabled applications
  • Real time Healthcare Information
    Matching medical requirements with availability of medical resources (Mumbai, India)
    Project HERO Helpline for Emergency Response Operations
    For patients seeking immediate medical help
    Medical awareness in rural India
    mMitra: information service for pregnancy and childhood emergencies
    Medical
    Emergency
    Medical
    Resources
    Information bridge
  • Future Interoperability Challenge: 360-degree health
    Insurance,
    Financial Aspects
    Clinical Care
    Follow up,
    Lifestyle
    Genetic Tests…
    Profiles
    Clinical Trials
    Social Media
  • For each component in 360-degree health care, we have data, processes, knowledge and experience. Interoperability solutions need to encompass all these!
    Possibly the largest growth in data will be in sensors (e.g., Body Area Networks, biosensors) and social content. Extensive use of mobile phones.
    Credit: ece.virginia.edu
  • Summary
    Semantic Web is an “interoperability technology”
    Semantic Web provides the needed interoperability, and can accommodate all necessary “points of view”
    Linked Data as a way of sharing data is highly promising
    Many examples of viable usage of Semantic Web technologies
    Words of warning about deployment
    Significant research challenges remain as Health presents the most complex domain
  • Representative References
    A. Sheth, S. Agrawal, J. Lathem, N. Oldham, H. Wingate, P. Yadav, and K. Gallagher, Active Semantic Electronic Medical Record, Intl Semantic Web Conference, 2006.
    Satya Sahoo, Olivier Bodenreider, Kelly Zeng, and Amit Sheth, An Experiment in Integrating Large Biomedical Knowledge Resources with RDF: Application to Associating Genotype and Phenotype Information, WWW2007 HCLS Workshop, May 2007.
    Satya S. Sahoo, Kelly Zeng, Olivier Bodenreider, and Amit Sheth, From "Glycosyltransferase" to "Congenital Muscular Dystrophy": Integrating Knowledge from NCBI Entrez Gene and the Gene Ontology, Amsterdam: IOS, August 2007, PMID: 17911917, pp. 1260-4.
    Satya S. Sahoo, Olivier Bodenreider, Joni L. Rutter, Karen J. Skinner, and Amit P. Sheth, An ontology-driven semantic mash-up of gene and biological pathway information: Application to the domain of nicotine dependence, Journal of Biomedical Informatics, 2008.
    Cartic Ramakrishnan, Krzysztof J. Kochut, and Amit Sheth, "A Framework for Schema-Driven Relationship Discovery from Unstructured Text", Intl Semantic Web Conference, 2006, pp. 583-596.
    Satya S. Sahoo, Christopher Thomas, Amit Sheth, William S. York, and Samir Tartir, "Knowledge Modeling and Its Application in Life Sciences: A Tale of Two Ontologies", 15th International World Wide Web Conference (WWW2006), Edinburgh, Scotland, May 23-26, 2006.
    Satya S. Sahoo, Olivier Bodenreider, Pascal Hitzler, Amit Sheth, and Krishnaprasad Thirunarayan, "Provenance Context Entity (PaCE): Scalable provenance tracking for scientific RDF data", SSDBM, Heidelberg, Germany, 2010.
    Papers: http://knoesis.org/library
    Demos at: http://knoesis.wright.edu/library/demos/