Semantic (Web) Technologies for Translational Research in Life Sciences
Ohio State University, June 16, 2011
Amit P. Sheth, Ohio Center of Excellence in Knowledge-enabled Computing (Kno.e.sis), amit.sheth@wright.edu
Thanks to Kno.e.sis team (Satya, Priti, Rama, and Ajith); Collaborators at CTEGD UGA (Dr. Tarleton, Brent Weatherly), NLM (Olivier Bodenreider), CCRC, UGA (Will York), NCBO/Stanford, CITAR/WSU
Kno.e.sis: Ohio Center of Excellence in Knowledge-enabled Computing
Evolution of Web & Semantic Computing
- Web of pages: text, manually created links, extensive navigation
- Web of databases: dynamically generated pages, web query interfaces
- Web of resources: data, services, mashups
- Web of people: social networks, user-created casual content
- Web of Sensors, Devices/IoT: 40 billion sensors, 5 billion mobile connections
Timeline labels: 1997, 2007; Web 1.0, Web 2.0, Web 3.0
Semantic technology used: Keywords, Patterns, Objects, Situations/Events
Tech assimilated in life
Outline
- Semantic Web: very brief intro
- Scenarios to demonstrate the applications and benefits of semantic web technologies: Health Care, Biomedical Research, Translational
Biomedical Informatics
- Medical Informatics (PubMed, ClinicalTrials.gov, ...): etiology, pathogenesis, clinical findings, diagnosis, prognosis, treatment
- Bioinformatics (GenBank, UniProt, ...): genome, transcriptome, proteome, metabolome, physiome, ...ome
- ...needs a connection: hypothesis validation, experiment design, predictions, personalized medicine
Semantic Web research aims at providing this connection!
More advanced capabilities for search, integration, analysis, linking to new insights and discoveries!
Human Performance
[Figure: layered view rising from Data and Facts, through Knowledge and Understanding, to Decision Making, Insights, Innovations. Disciplines shown: Biochemistry; Molecular Biology; Cellular Biology; Anatomy, Physiology; Neuroscience; Cognitive Science, Psychology; Health & Performance. A raw DNA sequence (ACATATGGGTACTATTTACTATTCATGGG...) illustrates the data level.]
Semantic Web standards @ W3C
Semantic Web is built in a layered manner. Not everybody needs all the layers...
- Queries: SPARQL; Rules: RIF
- Rich ontologies: OWL
- Simple data models & taxonomies: RDF Schema
- Uniform metamodel: RDF + URI
- Encoding structure: XML
- Encoding characters: Unicode
Linked Data: Semantic Web "diluted"
- Achieve for data what the Web did to documents
- Relationship with the original Semantic Web vision: no AI, no agents, no autonomy
- Interoperability is still very important: interoperability of formats, interoperability of semantics
- Enables interchange of large data sets (thus very useful in, say, collaborative research)
- Semantic Web vision is largely predicated on the availability of data
- Linked Data is a movement that gets us there
Thanks: Ora Lassila
Opportunity: exploiting clinical and biomedical data
- Health Information Services (text): Elsevier iConsult
- Scientific Literature: PubMed, 300 documents published online each day
- User-contributed Content (Informal): GeneRIFs, WikiGene
- NCBI Public Datasets: Genome, Protein DBs, new sequences daily
- Laboratory Data: lab tests, RT-PCR, mass spec
- Clinical Data: personal health history
Search, browsing, complex query, integration, workflow, analysis, hypothesis validation, decision support.
Major Community Efforts
- W3C Semantic Web Health Care & Life Sciences Interest Group: http://www.w3.org/2001/sw/hcls/
- Clinical Observations Interoperability (EMR + Clinical Trials): http://esw.w3.org/HCLS/ClinicalObservationsInteroperability
- National Center for Biomedical Ontologies: http://bioportal.bioontology.org/
Major SW Projects
- OpenPHACTS: a knowledge management project of the Innovative Medicines Initiative (IMI), a unique partnership between the European Community and the European Federation of Pharmaceutical Industries and Associations (EFPIA). http://www.openphacts.org/
- LarKC: develop the Large Knowledge Collider, a platform for massive distributed incomplete reasoning that will remove the scalability barriers of currently existing reasoning systems for the Semantic Web. http://www.larkc.eu/
- NCBO: contribute to collaborative science and translational research. http://bioportal.bioontology.org/
Semantic Web Enablers and Techniques
- Ontology: Agreement with Common Vocabulary & Domain Knowledge; Schema + Knowledge base
- Semantic Annotation (metadata extraction): Manual, Semi-automatic (automatic with human verification), Automatic
- Semantic Computation: semantics-enabled search, integration, complex queries, analysis (paths, subgraph), pattern finding, mining, inferencing, reasoning, hypothesis validation, discovery, visualization
Drug Ontology Hierarchy (showing is-a relationships)
[Figure: class hierarchy rooted at owl:thing. Classes shown: prescription_drug, prescription_drug_brand_name, prescription_drug_generic, prescription_drug_property, brandname_undeclared, brandname_composite, brandname_individual, generic_individual, generic_composite, monograph_ix_class, cpnum_group, property, indication_property, formulary_property, interaction_property, formulary, indication, interaction, interaction_with_prescription_drug, interaction_with_monograph_ix_class, interaction_with_non_drug_reactant, non_drug_reactant.]
N-Glycosylation metabolic pathway
Ontology instances shown: N-glycan_beta_GlcNAc_9, N-glycan_alpha_man_4, N-acetyl-glucosaminyl_transferase_V
- GNT-I attaches GlcNAc at position 2
- GNT-V attaches GlcNAc at position 6
Reaction: UDP-N-acetyl-D-glucosamine + alpha-D-Mannosyl-1,3-(R1)-beta-D-mannosyl-R2 <=> UDP + N-Acetyl-beta-D-glucosaminyl-1,2-alpha-D-mannosyl-1,3-(R1)-beta-D-mannosyl-R2
Reaction (compound IDs): UDP-N-acetyl-D-glucosamine + G00020 <=> UDP + G00021
Maturing capabilities and ongoing research
- Ontology Creation
- Semantic Annotation & Text mining: entity recognition, relationship extraction
- Semantic Integration & Provenance: integrating all types of data used in biomedical research: text, experimental data, curated/structured/public, and multimedia
- Semantic search, browsing, analysis
- Clinical and Scientific Workflows with semantic web services
- Semantic Exploration of scientific literature, undiscovered public knowledge
Project 1: ASEMR
Why: Improve quality of care and decision making without loss of efficiency in an active cardiology practice.
What: Use of semantic Web technologies for clinical decision support
Where: Athens Heart Center & its partners and labs
Status: In use continuously since 01/2006
Operational since January 2006
Details: http://knoesis.org/library/resource.php?id=00004
Active Semantic EMR
[Screenshot callouts: Annotate ICD9s; Annotate Doctors; Lexical Annotation; Insurance Formulary; Level 3 Drug Interaction; Drug Allergy]
Demo at: http://knoesis.org/library/demos/
Project 2: Glycomics
Why: To help in the treatment of certain kinds of cancer and Parkinson's Disease.
What: Semantic Annotation of Experiment Data
Where: Complex Carbohydrate Research Center, UGA
Status: Research prototype in use
Workflow with Semantic Annotation of Experimental Data already in use
N-Glycosylation Process (NGP)
[Figure: experimental workflow; fraction counts shown as 1, n, n*m]
1. Cell Culture -> extract -> Glycoprotein Fraction
2. Proteolysis -> Glycopeptides Fraction (1)
3. Separation technique I -> Glycopeptides Fraction (n)
4. PNGase -> Peptide Fraction (n)
5. Separation technique II -> Peptide Fraction (n*m)
6. Mass spectrometry -> ms data, ms/ms data
7. Data reduction -> ms peaklist, ms/ms peaklist
8. Binning -> N-dimensional array; Peptide identification -> Peptide list
9. Data correlation, Signal integration -> Glycopeptide identification and quantification
Scientific workflow for proteome analysis
[Figure: agent-run steps, each with input (I) and output (O) ports, connected through shared Storage]
1. Biological Sample Analysis by MS/MS -> Raw Data
2. Raw Data to Standard Format -> Standard Format Data
3. Data Pre-process -> Filtered Data
4. DB Search (Mascot/Sequest) -> Search Results
5. Results Post-process (ProValt) -> Final Output
Other labels: Biological Information; Semantic Annotation Applications
Semantic Annotation of Experimental Data
Mass Spectrometry (MS) Data: ms/ms peaklist data

parent ion m/z    parent ion abundance    parent ion charge
830.9570          194.9604                2

fragment ion m/z    fragment ion abundance
580.2985            0.3592
688.3214            0.2526
779.4759            38.4939
784.3607            21.7736
1543.7476           1.3822
1544.7595           2.9977
1562.8113           37.4790
1660.7776           476.5043
Semantic Annotation of Experimental Data
Semantically Annotated MS Data (element and attribute names refer to Ontological Concepts):

    <ms-ms_peak_list>
      <parameter instrument="micromass_QTOF_2_quadropole_time_of_flight_mass_spectrometer" mode="ms-ms"/>
      <parent_ion m-z="830.9570" abundance="194.9604" z="2"/>
      <fragment_ion m-z="580.2985" abundance="0.3592"/>
      <fragment_ion m-z="688.3214" abundance="0.2526"/>
      <fragment_ion m-z="779.4759" abundance="38.4939"/>
      <fragment_ion m-z="784.3607" abundance="21.7736"/>
      <fragment_ion m-z="1543.7476" abundance="1.3822"/>
      <fragment_ion m-z="1544.7595" abundance="2.9977"/>
      <fragment_ion m-z="1562.8113" abundance="37.4790"/>
      <fragment_ion m-z="1660.7776" abundance="476.5043"/>
    </ms-ms_peak_list>
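To illustrate why the annotated form is easier for applications to consume than the raw peak list, here is a minimal Python sketch that reads such a document with the standard library. The function names (`read_peak_list`, `base_peak`) are hypothetical and not part of the project; the XML is the slide's example, trimmed to three fragment ions.

```python
import xml.etree.ElementTree as ET

# Annotated peak list from the slide, with straight quotes restored and
# trimmed to three fragment ions for brevity.
ANNOTATED = """
<ms-ms_peak_list>
  <parameter instrument="micromass_QTOF_2_quadropole_time_of_flight_mass_spectrometer" mode="ms-ms"/>
  <parent_ion m-z="830.9570" abundance="194.9604" z="2"/>
  <fragment_ion m-z="580.2985" abundance="0.3592"/>
  <fragment_ion m-z="1562.8113" abundance="37.4790"/>
  <fragment_ion m-z="1660.7776" abundance="476.5043"/>
</ms-ms_peak_list>
"""


def read_peak_list(xml_text):
    """Parse the annotated peak list into a plain dictionary (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    parent = root.find("parent_ion")
    fragments = [
        (float(f.get("m-z")), float(f.get("abundance")))
        for f in root.findall("fragment_ion")
    ]
    return {
        "instrument": root.find("parameter").get("instrument"),
        "parent_mz": float(parent.get("m-z")),
        "parent_charge": int(parent.get("z")),
        "fragments": fragments,
    }


def base_peak(fragments):
    """Return the (m/z, abundance) pair with the highest abundance."""
    return max(fragments, key=lambda pair: pair[1])


peaks = read_peak_list(ANNOTATED)
print(peaks["parent_mz"], peaks["parent_charge"], base_peak(peaks["fragments"]))
```

Because each value carries its role (parent versus fragment ion, m/z versus abundance), the reader does not need to know the column order of the raw instrument output.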
Project 3:
Why: To associate genotype and phenotype information for knowledge discovery
What: Integrated data sources to run complex queries
- Enriching data with ontologies for integration, querying, and automation
- Ontologies beyond vocabularies: the power of relationships
Where: NCRR (NIH)
Status: Completed
Use data to test hypothesis
Link between glycosyltransferase activity and congenital muscular dystrophy?
[Figure: data sources to be linked (Gene name, GO, Interactions, gene, Sequence, PubMed, OMIM), connecting Glycosyltransferase to Congenital muscular dystrophy]
Adapted from: Olivier Bodenreider, presentation at HCLS Workshop, WWW07
In a Web pages world…
(GeneID: 9215)
- has_associated_disease: Congenital muscular dystrophy, type 1D
- has_molecular_function: Acetylglucosaminyl-transferase activity
Adapted from: Olivier Bodenreider, presentation at HCLS Workshop, WWW07
With the semantically enhanced data
- glycosyltransferase (GO:0016757)
- isa: GO:0008194, GO:0016758
- acetylglucosaminyl-transferase (GO:0008375)
- LARGE (EG:9215) has_molecular_function acetylglucosaminyl-transferase (GO:0008375)
- LARGE (EG:9215) has_associated_phenotype Muscular dystrophy, congenital, type 1D (MIM:608840)

Query (namespace prefixes omitted, as on the slide):

    SELECT DISTINCT ?t ?g ?d {
      ?t is_a GO:0016757 .
      ?g has_molecular_function ?t .
      ?g has_associated_phenotype ?b2 .
      ?b2 has_textual_description ?d .
      FILTER regex(?d, "muscular dystrophy", "i") .
      FILTER regex(?d, "congenital", "i")
    }

From medinfo paper.
Adapted from: Olivier Bodenreider, presentation at HCLS Workshop, WWW07
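The pattern matching behind this query can be sketched in plain Python over a handful of triples. This is an illustration, not the project's implementation: the is_a edges between GO terms are an assumption read off the diagram, the helper names are hypothetical, and the sketch follows is_a transitively, whereas the slide's query states a single is_a step and would rely on inference to reach GO:0008375.

```python
# Triples transcribed from the slide. The is_a edges between GO terms are an
# assumption based on the diagram layout.
TRIPLES = {
    ("GO:0008194", "is_a", "GO:0016757"),
    ("GO:0016758", "is_a", "GO:0016757"),
    ("GO:0008375", "is_a", "GO:0008194"),
    ("GO:0008375", "is_a", "GO:0016758"),
    ("EG:9215", "has_molecular_function", "GO:0008375"),
    ("EG:9215", "has_associated_phenotype", "MIM:608840"),
    ("MIM:608840", "has_textual_description",
     "Muscular dystrophy, congenital, type 1D"),
}


def objects(subject, predicate):
    """All objects o such that (subject, predicate, o) is a known triple."""
    return {o for (s, p, o) in TRIPLES if s == subject and p == predicate}


def subjects(predicate, obj):
    """All subjects s such that (s, predicate, obj) is a known triple."""
    return {s for (s, p, o) in TRIPLES if p == predicate and o == obj}


def descendants(term):
    """All terms that reach `term` through one or more is_a edges."""
    found, frontier = set(), {term}
    while frontier:
        children = set()
        for t in frontier:
            children |= subjects("is_a", t)
        frontier = children - found
        found |= frontier
    return found


def genes_linked_to(go_root, *keywords):
    """Join function, phenotype, and description, then filter on keywords."""
    rows = set()
    for term in descendants(go_root):
        for gene in subjects("has_molecular_function", term):
            for phenotype in objects(gene, "has_associated_phenotype"):
                for text in objects(phenotype, "has_textual_description"):
                    if all(k.lower() in text.lower() for k in keywords):
                        rows.add((term, gene, text))
    return sorted(rows)


result = genes_linked_to("GO:0016757", "muscular dystrophy", "congenital")
print(result)
```

The case-insensitive substring test plays the role of the two `FILTER regex(..., "i")` clauses.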
Project 4: Nicotine Dependence
Why: For understanding the genetic basis of nicotine dependence.
What: Integrate gene and pathway information and show how three complex biological queries can be answered by the integrated knowledge base.
How: Semantic Web technologies (especially RDF, OWL, and SPARQL) support information integration and make it easy to create semantic mashups (semantically integrated resources).
Where: NLM (NIH)
Status: Completed research
Motivation
- NIDA study on nicotine dependency
- List of candidate genes in humans
- Analysis objectives include:
  - Find interactions between genes
  - Identification of active genes: maximum number of pathways
  - Identification of genes based on anatomical locations
- Requires integration of genome and biological pathway information
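Once gene and pathway data are integrated, the "maximum number of pathways" objective above reduces to a counting problem. The following Python sketch shows the idea under loud assumptions: the gene and pathway names are placeholders rather than study data, and treating a shared pathway as a proxy for interaction is a simplification introduced here.

```python
from collections import defaultdict

# Hypothetical gene-to-pathway memberships; placeholder names, not study data.
MEMBERSHIP = [
    ("gene_A", "pathway_1"), ("gene_A", "pathway_2"), ("gene_A", "pathway_3"),
    ("gene_B", "pathway_1"),
    ("gene_C", "pathway_2"), ("gene_C", "pathway_3"),
]


def pathways_per_gene(pairs):
    """Index the distinct pathways each gene takes part in."""
    index = defaultdict(set)
    for gene, pathway in pairs:
        index[gene].add(pathway)
    return index


def hub_genes(pairs):
    """Genes that take part in the maximum number of distinct pathways."""
    index = pathways_per_gene(pairs)
    top = max(len(pathways) for pathways in index.values())
    return sorted(g for g, pathways in index.items() if len(pathways) == top)


def interacting_pairs(pairs):
    """Gene pairs sharing at least one pathway (a crude proxy for interaction)."""
    index = pathways_per_gene(pairs)
    genes = sorted(index)
    return [
        (a, b)
        for i, a in enumerate(genes)
        for b in genes[i + 1:]
        if index[a] & index[b]
    ]


print(hub_genes(MEMBERSHIP))
print(interacting_pairs(MEMBERSHIP))
```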
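Genome and pathway information integration
[Figure: two groups of sources merged into one knowledge base]
- Pathway sources: KEGG, Reactome, HumanCyc (each contributing pathway, protein, pmid), modeled with the BioPAX ontology
- Genome sources: Entrez Gene, Gene Ontology (GO ID), HomoloGene (HomoloGene ID), modeled with the Entrez Knowledge Model (EKoM)
Source label on slide: JBI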
Results: Gene Pathway network and Hub Genes involved with Nicotine Dependence
Project 5: T. cruzi SPSE
Why: For integrative parasite research to help expedite knowledge discovery
What: Semantics and Services Enabled Problem Solving Environment (PSE) for Trypanosoma cruzi
Where: Center for Tropical and Emerging Global Diseases (CTEGD), UGA
Who: Kno.e.sis, UGA, NCBO (Stanford)
Status: Research prototype, in regular lab use
Project Outline
[Figure: architecture overview]
- Data Sources: Internal Lab Data (Gene Knockout, Strain Creation, Microarray, Proteome); External Database
- Ontological Infrastructure: Parasite Lifecycle, Parasite Experiment
- Query processing: Cuebee
- Results
Provenance in Parasite Research
Gene Knockout and Strain Creation*
[Figure: Gene Name -> Sequence Extraction -> 3' & 5' Region -> Plasmid Construction (with Drug Resistant Plasmid) -> Knockout Construct Plasmid -> Transfection (with T. cruzi sample?) -> Transfected Sample -> Drug Selection -> Selected Sample -> Cell Cloning -> Cloned Sample]
Related queries from biologists:
- List all groups in the lab that used a Target Region Plasmid.
- Which researcher created a new strain of the parasite (with ID = 66)?
- An experiment was not successful: has this experiment been conducted earlier? What were the results?
*T. cruzi Semantic Problem Solving Environment Project, courtesy of D.B. Weatherly and Flora Logan, Tarleton Lab, University of Georgia
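The biologists' questions above become simple lookups once each experimental step records who ran it, what it used, and what it produced. The Python sketch below illustrates that idea only; the record layout, the helper names, and every identifier except the strain ID 66 are hypothetical and do not reflect the project's actual provenance model.

```python
# Hypothetical provenance records: one per process run. Agents, groups, and
# artifact names are invented for illustration.
RECORDS = [
    {"process": "plasmid_construction", "agent": "researcher_1",
     "group": "group_X", "used": ["target_region_plasmid"],
     "generated": "knockout_construct_plasmid_12"},
    {"process": "transfection", "agent": "researcher_2",
     "group": "group_Y",
     "used": ["knockout_construct_plasmid_12", "t_cruzi_sample_3"],
     "generated": "transfected_sample_40"},
    {"process": "cell_cloning", "agent": "researcher_2",
     "group": "group_Y", "used": ["selected_sample_41"],
     "generated": "strain_66"},
]


def creator_of(artifact):
    """Which researcher ran the process that generated this artifact?"""
    return next((r["agent"] for r in RECORDS if r["generated"] == artifact), None)


def groups_that_used(kind):
    """All lab groups that used an input whose name starts with `kind`."""
    return sorted({
        r["group"] for r in RECORDS
        if any(item.startswith(kind) for item in r["used"])
    })


def lineage(artifact):
    """Walk backwards from an artifact and list the processes behind it."""
    by_output = {r["generated"]: r for r in RECORDS}
    chain, todo = [], [artifact]
    while todo:
        record = by_output.get(todo.pop())
        if record:
            chain.append(record["process"])
            todo.extend(record["used"])
    return chain


print(creator_of("strain_66"))
print(groups_that_used("target_region_plasmid"))
print(lineage("transfected_sample_40"))
```

The third question (has a failed experiment been run before?) would follow the same pattern by comparing the `used` inputs of earlier records.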
Research Accomplishments (SPSE)
- Integrated internal data with external databases, such as KEGG, GO, and some datasets on TriTrypDB
- Developed a semantic provenance framework and influenced the W3C community
- SPSE supports complex biological queries that help find gene knockout, drug and/or vaccination targets. For example:
  - Show me proteins that are downregulated in the epimastigote stage and exist in a single metabolic pathway.
  - Give me the gene knockout summaries, both for plasmid construction and strain creation, for all gene knockout targets that are 2-fold upregulated in amastigotes at the transcript level and that have orthologs in Leishmania but not in Trypanosoma brucei.
Knowledge driven query formulation
Complex queries can also include:
- on-the-fly Web services execution to retrieve additional data
- inference rules to make implicit knowledge explicit

Project 6: HPCO
Why: Collaborative knowledge exploration over scientific literature
What: An up-to-date knowledge-based literature search and exploration framework
How: Using information extraction, conventional IR, and semantic web technologies for collaborative literature exploration
Where: AFRL
Status: Completed research
Focused KB Work Flow (Use case: HPCO)
[Figure: knowledge base construction workflow]
- Inputs: HPC keywords; SenseLab Neuroscience Ontologies; PubMed Abstracts
- Doozer: Base Hierarchy from Wikipedia
- Focused pattern based extraction
- Initial KB Creation; Meta Knowledgebase
- Enrich Knowledge Base with Knoesis parsing based NLP triples and NLM rule based BKR triples
- Final Knowledge Base
Triple Extraction Approaches
- Open Extraction: no fixed number of predetermined entities and predicates
  - At Knoesis: NLP (parsing and dependency trees)
- Supervised Extraction: predetermined set of entities and predicates
  - At Knoesis: pattern based extraction to connect entities in the base hierarchy using statistical techniques
  - At NLM: NLP and rule based approaches
Mapping Triples to Base Hierarchy
- Entities in both subject and object must contain at least one concept from the hierarchy to be mapped to the KB
- Preliminary synonyms based on anchor labels and page redirects in Wikipedia
  - Example: Prolactostatin redirects to Dopamine
- Predicates (verbs) and entities are subjected to stemming using WordNet
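The mapping rule above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: the hierarchy contents and the example triples are hypothetical, only the Prolactostatin to Dopamine redirect comes from the slide, matching is done on single words, and WordNet stemming is left out (plain lowercasing stands in for it).

```python
# Hypothetical base hierarchy and redirect table; only the
# Prolactostatin -> Dopamine redirect comes from the slide.
HIERARCHY = {"dopamine", "prolactin", "hypothalamus"}
REDIRECTS = {"prolactostatin": "dopamine"}


def normalize(entity):
    """Lowercase each word and replace it with its redirect target, if any."""
    return [REDIRECTS.get(word, word) for word in entity.lower().split()]


def concepts_in(entity):
    """Hierarchy concepts mentioned in an entity string."""
    return {word for word in normalize(entity) if word in HIERARCHY}


def map_triple(subject, predicate, obj):
    """Map a triple to the KB only if both ends contain a hierarchy concept."""
    subject_concepts = concepts_in(subject)
    object_concepts = concepts_in(obj)
    if not subject_concepts or not object_concepts:
        return None  # rejected: one side has no concept from the hierarchy
    return (sorted(subject_concepts), predicate.lower(), sorted(object_concepts))


print(map_triple("Prolactostatin", "inhibits", "prolactin secretion"))
print(map_triple("Prolactostatin", "inhibits", "cell growth"))
```

The second call returns `None` because its object contains no concept from the hierarchy, which is the rejection case the first bullet describes.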
Scooner:  Full Architecture
Scooner Features
- Knowledge-based browsing: relations window, inverse relations, creating trails
- Persistent projects: work bench, browsing history, comments, filtering
- Collaboration: comments, dashboard, exporting (sub)projects, importing projects
Scooner Screenshot

Editor's Notes

  • #7 Cognitive model, cognitive behavioral model
  • #37 In parasite research, new strains of a parasite are created by knocking out specific genes. So, given a cloned sample, we may need to know which gene(s) were knocked out. Both these scenarios are real-world examples of the importance of provenance. There are many research issues in provenance management. This presentation addresses (1) provenance modeling, specifically provenance interoperability, consistent modeling, and reduction of terminological heterogeneity, and (2) provenance query.