Bio it 2005_rdf_workshop05
 

BioIT 2005 Reported on Project with Siderean Software. http://bit.ly/gsi68E (J Web Semantics Paper,



  • Slides 78 and 87 show the graph of the heterogeneous data sets we brought together for the first ever demo of a life science mashup. This slide would later be used for about 2 years by Sir Tim Berners-Lee to communicate linked data. It was before links were URIs and so we had to link on mapping of database IDs (typically Database + ID within that database). See BioPAX (a few years later) for examples.

    Bio it 2005_rdf_workshop05 Presentation Transcript

    • An RDF Data Model for the Semantic Web. 5th Oracle Life Sciences User Group meeting, May 16-17, 2005
    • Agenda: Introduction – 5 min – Susie Stephens; Semantic Web for Life Sciences – 25 min – Susie Stephens; Oracle support of RDF in RDBMS – 25 min – Souripriya Das; Demo of Siderean’s Seamark Navigation Server – 25 min – Mike DiLascio, David LaVigna & Joanne Luciano; Discussion – 10 min – Susie Stephens
    • Semantic Web for Life Sciences Susie Stephens
    • What is the Semantic Web? A machine-readable format that is Web compatible The Semantic Web adds definition tags to information in Web pages – Enables computers to discover data more effectively – Allows new associations to form between pieces of information
    • Resource Description Framework W3C standard for the common data format Based on triples (subject–predicate–object) Everything has a URI Ontologies used to label the RDF tagged elements Image Source: W3C
    • Image Source: W3C
    • Enterprise Integration Hub Image Source: W3C
    • Semantic Web Stack Image Source: W3C
    • Pharma Productivity Source: PhRMA & FDA 2003
    • Critical Path Initiative Source: Innovation or Stagnation, FDA Report, March 2004
    • Ontology Frameworks for Integration <hasProduct> Protein <participatesIn> Gene <transcribes> <translatesTo> mRNA <located> <influences> Cascade <affectedTissue> Localization pathway Disease <probeFor> <partOf> <targets> <profiledBy> Intervention Bio-process Drug point <MOA> Microarray <drugInteraction> experiment <affecting> Target <efficacyMarkerFor> model Treatment
    • Biological Pathways Image Source: Cytoscape
    • Beyond the “Dead” Graphical Model Image Source: KEGG
    • Assigning Trust Values to Data Image Source: SWANS
    • Inferencing: If Gene G is implicated in Disease D, and its ProteinProduct P is a functional component of only Pathway P2, then Disease D directly perturbs Pathway P2.
      <rdf:Description>
        <log:is rdf:parseType='Quote'>
          <rdf:Description rdf:about='variable#Gene_G'>
            <hasProduct rdf:resource='variable#Protein_P'/>
            <isImplicatedIn rdf:resource='variable#Disease_D'/>
          </rdf:Description>
          <rdf:Description rdf:about='variable#Protein_P'>
            <inPathway rdf:resource='variable#Pathway_P2'/>
          </rdf:Description>
        </log:is>
        <log:implies rdf:parseType='Quote'>
          <rdf:Description rdf:about='variable#Disease_D'>
            <D_perturbs rdf:resource='variable#Pathway_P2'/>
          </rdf:Description>
        </log:implies>
      </rdf:Description>
    • Why Semantic Web for Life Sciences? Heterogeneous data integration using explicit semantics Expression well-defined and rich models of biological systems Annotating findings and interpretations formally and sharing with other scientists Embedding models and semantics within papers Applying logic to infer additional insights and to propose and/or capture new hypotheses
    • Questions & Answers
    • RDF Support in Oracle RDBMS Souripriya Das, Ph.D. Consultant Member of Technical Staff Oracle New England Development Center
    • Overview. Three types of database objects: Model (RDF graph consisting of a set of triples); Rulebase (set of user-defined rules); Rule Index (entailed RDF graph). We discuss the following aspects for each type of object: DDL, DML, Views, Security. RDF Query (with Inference)
    • RDF Models
    • Model: Overview. Each RDF Model (graph) consists of a set of triples. A triple (statement) consists of three components – Subject: URI or blank node – Predicate: URI – Object: URI, literal, or blank node. A statement itself can be a resource (allowing nested graphs)
    • Model: Example. Family: (:John :brotherOf :Mary) (:John :age “16”^^xsd:integer) (:Mary :parentOf :Matt) (:John :name “John”) (:Mary :name “Mary”). Reification: (:John :thinks _:S1) (_:S1 rdf:subject :Sue) (_:S1 rdf:predicate :livesIn) (_:S1 rdf:object “NYC”). [Diagram: graph with nodes :John, :Mary, :Matt, :Sue, 16 and NYC, and edges brotherOf, parentOf, age, thinks and livesIn]
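The triple model and the reification example can be sketched in a few lines of Python. This is only an illustration of the data model, not Oracle's storage format; the `Triple` type and the `unreify` helper are hypothetical names introduced here.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One RDF statement: subject, predicate, object (all kept as plain strings)."""
    subject: str
    predicate: str
    obj: str


# The family statements from the slide.
FAMILY = [
    Triple(":John", ":brotherOf", ":Mary"),
    Triple(":John", ":age", '"16"^^xsd:integer'),
    Triple(":Mary", ":parentOf", ":Matt"),
    Triple(":John", ":name", '"John"'),
    Triple(":Mary", ":name", '"Mary"'),
]

# Reification: the blank node _:S1 stands for the statement (:Sue :livesIn "NYC"),
# so that :John can "think" it without the statement being asserted itself.
REIFIED = [
    Triple(":John", ":thinks", "_:S1"),
    Triple("_:S1", "rdf:subject", ":Sue"),
    Triple("_:S1", "rdf:predicate", ":livesIn"),
    Triple("_:S1", "rdf:object", '"NYC"'),
]


def unreify(statement_node, triples):
    """Rebuild the statement that a reification node describes (hypothetical helper)."""
    parts = {t.predicate: t.obj for t in triples if t.subject == statement_node}
    return Triple(parts["rdf:subject"], parts["rdf:predicate"], parts["rdf:object"])


thought = unreify("_:S1", REIFIED)
print(thought)
```

Keeping the reified statement as three extra triples about `_:S1` is what lets a statement act as a resource in its own right.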
    • RDF Query
    • SDO_RDF_MATCH Table Function. Arguments – Graph pattern: a sequence of triple patterns (triple patterns typically use variables) – RDF Data set: a set of models – Filter – Aliases. … FROM TABLE(SDO_RDF_MATCH('(?x :brotherOf ?y) (?y :parentOf ?z)', SDO_RDF_Models('family'), … )) t …
    • SDO_RDF_MATCH: return. Columns (of type VARCHAR2) in each returned row. For each variable ?x in the graph pattern – x – x$rdfVTYP: URI, Literal, or Blank node – x$rdfLTYP: specific literal type (e.g., xsd:integer) – x$rdfCLOB: contains actual value, if ?x matches a CLOB value – x$rdfLANG: language tag, if any (e.g., “en-us”). If there is no variable in the graph pattern – a dummy column
    • SDO_RDF_MATCH: matching. Matching multiple representations: the same point in value space may have multiple representations – “10”^^xsd:integer – “10”^^xsd:positiveInteger – “010”^^xsd:integer – “000010”^^xsd:integer. SDO_RDF_MATCH automatically resolves these
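To make the idea of a graph pattern concrete, here is a minimal Python sketch of matching a sequence of triple patterns against a set of triples. It is not how SDO_RDF_MATCH is implemented; `match_pattern` and `match_graph` are hypothetical names, and filters and aliases are left out.

```python
def is_var(term):
    """Variables are written with a leading '?', as in the slide's patterns."""
    return isinstance(term, str) and term.startswith("?")


def match_pattern(pattern, triple, binding):
    """Extend `binding` so that `pattern` equals `triple`; return None if impossible."""
    extended = dict(binding)
    for p, value in zip(pattern, triple):
        if is_var(p):
            # A variable must keep the same value everywhere it appears.
            if extended.setdefault(p, value) != value:
                return None
        elif p != value:
            return None
    return extended


def match_graph(patterns, triples):
    """Return every variable binding that satisfies all triple patterns."""
    bindings = [{}]
    for pattern in patterns:
        bindings = [
            extended
            for b in bindings
            for t in triples
            if (extended := match_pattern(pattern, t, b)) is not None
        ]
    return bindings


family = [
    (":John", ":brotherOf", ":Mary"),
    (":Mary", ":parentOf", ":Matt"),
    (":John", ":name", '"John"'),
]

# Same graph pattern as the slide: brothers of parents, i.e. uncles.
rows = match_graph(
    [("?x", ":brotherOf", "?y"), ("?y", ":parentOf", "?z")],
    family,
)
print(rows)
```

Each resulting dictionary plays the role of one returned row, with one entry per variable in the pattern.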
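One way to picture resolving multiple representations is to map each typed literal to a canonical value before comparing. The sketch below does this for integer literals only; the `canonical_value` helper, the regular expression, and the decision to treat the listed integer types as one value space are assumptions made for illustration, not a description of Oracle's internals.

```python
import re

# Matches typed literals of the form "lexical"^^datatype.
LITERAL = re.compile(r'^"(?P<lexical>[^"]*)"\^\^(?P<datatype>\S+)$')

# Datatypes treated as sharing the integer value space (compared case-insensitively).
INTEGER_TYPES = {"xsd:integer", "xsd:positiveinteger"}


def canonical_value(literal):
    """Map a literal to a comparable value; integer forms collapse to one number."""
    m = LITERAL.match(literal)
    if m and m.group("datatype").lower() in INTEGER_TYPES:
        return ("integer", int(m.group("lexical")))
    return ("other", literal)


forms = [
    '"10"^^xsd:integer',
    '"10"^^xsd:positiveInteger',
    '"010"^^xsd:integer',
    '"000010"^^xsd:integer',
]
values = {canonical_value(f) for f in forms}
print(values)
```

All four lexical forms land on the same canonical value, so an equality test on canonical values treats them as the same point in value space.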
    • RDF Query: Example. Find salary and hiredate of all the uncles: SELECT emp.name, emp.salary, emp.hiredate FROM emp, TABLE(SDO_RDF_MATCH('(?x :brotherOf ?y) (?y :parentOf ?z) (?x :name ?name)', SDO_RDF_Models('family'), …)) t WHERE emp.name=t.name; Use of SDO_RDF_MATCH allows embedding a graph query in a SQL query
    • RDF Query: Example 2. Find pairs of persons residing at the same address where the first person rents a truck and the second person buys a fertilizer: SELECT t3.x name1, t3.y name2 FROM AddrTable t1, AddrTable t2, TABLE(SDO_RDF_MATCH('(?x :rents ?a) (?a rdf:type :Truck) (?y :buys ?b) (?b rdf:type :Fertilizer)', SDO_RDF_Models('Activities'), …)) t3 WHERE t1.name=t3.x and t2.name=t3.y and t1.addr=t2.addr;
    • RDF Rulebases
    • Rulebase: Overview Each RDF rulebase consists of a set of rules Each rule consists of – antecedent: graph-pattern – filter condition (optional) – Consequent: graph-pattern One or more rulebases may be used with relevant RDF models (graphs) to obtain entailed graphs
    • Rulebase: Example. Rules in a rulebase family_rb:
      Antecedent: '(?x :brotherOf ?y) (?y :parentOf ?z)' Filter: NULL Consequent: '(?x :uncleOf ?z)'
      Antecedent: '(?x :age ?a)' Filter: 'a >= 65' Consequent: '(?x :ageGroup "Senior")'
      Antecedent: '(?x :parentOf ?y) (?y :parentOf ?z)' Filter: NULL Consequent: '(?x :grandParentOf ?z)'
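The three family_rb rules can be tried out with a small forward-chaining sketch in Python: match each antecedent, apply the optional filter, and add the consequent until nothing new appears. This is a toy illustration rather than Oracle's inference engine. The function names are hypothetical, ages are stored as plain Python integers for simplicity, and the `:Ann` and `:Tom` triples are made-up data added so that every rule fires.

```python
def is_var(term):
    return isinstance(term, str) and term.startswith("?")


def match_pattern(pattern, triple, binding):
    """Extend `binding` so that `pattern` equals `triple`; return None if impossible."""
    extended = dict(binding)
    for p, value in zip(pattern, triple):
        if is_var(p):
            if extended.setdefault(p, value) != value:
                return None
        elif p != value:
            return None
    return extended


def match_graph(patterns, triples):
    """Return every variable binding that satisfies all triple patterns."""
    bindings = [{}]
    for pattern in patterns:
        bindings = [
            e for b in bindings for t in triples
            if (e := match_pattern(pattern, t, b)) is not None
        ]
    return bindings


def substitute(pattern, binding):
    """Replace variables in a consequent pattern with their bound values."""
    return tuple(binding.get(term, term) for term in pattern)


def entail(triples, rules):
    """Apply (antecedent, filter, consequent) rules until no new triple is produced."""
    known = set(triples)
    changed = True
    while changed:
        changed = False
        for antecedent, keep, consequent in rules:
            for b in match_graph(antecedent, known):
                if keep is not None and not keep(b):
                    continue
                for pattern in consequent:
                    new = substitute(pattern, b)
                    if new not in known:
                        known.add(new)
                        changed = True
    return known


FAMILY_RB = [
    ([("?x", ":brotherOf", "?y"), ("?y", ":parentOf", "?z")], None,
     [("?x", ":uncleOf", "?z")]),
    ([("?x", ":age", "?a")], lambda b: b["?a"] >= 65,
     [("?x", ":ageGroup", '"Senior"')]),
    ([("?x", ":parentOf", "?y"), ("?y", ":parentOf", "?z")], None,
     [("?x", ":grandParentOf", "?z")]),
]

family = {
    (":John", ":brotherOf", ":Mary"),
    (":Mary", ":parentOf", ":Matt"),
    (":Matt", ":parentOf", ":Tom"),   # made-up triple for illustration
    (":John", ":age", 16),
    (":Ann", ":age", 70),             # made-up triple for illustration
}

inferred = entail(family, FAMILY_RB) - family
print(sorted(inferred))
```

The set `inferred` corresponds to what a rule index would hold: only the triples entailed by the rules, kept apart from the asserted model data.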
    • RDF Rule Indexes
    • Rule Index: Overview A rule index represents an entailed graph A rule index is created on an RDF dataset (consisting of a set of RDF models and a set of RDF rulebases)
    • Rule Index: Example A rule index may be created on a dataset consisting of – family RDF data, and – family_rb rulebase (shown earlier) The rule index will contain inferred triples showing uncleOf and ageGroup information
    • RDF Query with Inference
    • SDO_RDF_MATCH with Rulebases. Arguments – Graph pattern: a sequence of triples (with variables) – RDF Data set: a set of models, a set of rulebases – Filter – Aliases. … FROM TABLE(SDO_RDF_MATCH('(?x :uncleOf ?y)', SDO_RDF_Models('family'), SDO_RDF_Rulebases('rdfs', 'family_rb') … )) t …
    • RDF Query w/ Inference: Example. Find salary and hiredate of all the uncles: SELECT emp.name, emp.salary, emp.hiredate FROM emp, TABLE(SDO_RDF_MATCH('(?x :uncleOf ?y) (?x :name ?name)', SDO_RDF_Models('family'), SDO_RDF_Rulebases('rdfs', 'family_rb'), …)) t WHERE emp.name=t.name;
    • RDF Query w/ Inference: Example 2. Find pairs of persons residing at the same address where the first person rents a truck and the second person buys a fertilizer: SELECT t3.x name1, t3.y name2 FROM AddrTable t1, AddrTable t2, TABLE(SDO_RDF_MATCH('(?x :rents ?a) (?a rdf:type :Truck) (?y :buys ?b) (?b rdf:type :Fertilizer)', SDO_RDF_Models('Activities'), SDO_RDF_Rulebases('rdfs'), …)) t3 WHERE t1.name=t3.x and t2.name=t3.y and t1.addr=t2.addr;
    • RDF Models
    • Model: DDL Procedures provided as part of the API may be used to – Create a model – Drop a model When a user creates a model, a database view gets created automatically – rdfm_family A model corresponds to a column of type SDO_RDF_TRIPLE_S in a base table Each model has exactly one base table associated with it
    • Model: DDL. Creating a Model. Create an application table: CREATE TABLE family_table (id NUMBER, family_triple SDO_RDF_TRIPLE_S); Create a model: EXEC SDO_RDF.CREATE_RDF_MODEL('family', 'family_table', 'family_triple'); This automatically creates the following database view: rdfm_family (…)
    • Loading RDF Data into Oracle Java API provided to load NTriple into NDM Sample XSLs provided – To convert RDF to NTriple – To convert RDF to INSERT statements
    • Model: DML. SQL DML commands may be used on a base table to effect DML (i.e., triple insert, delete, and update) on the corresponding model. Insert triples: INSERT INTO family_table VALUES (1, SDO_RDF_TRIPLE_S('family', '<http://example.org/family/John>', '<http://example.org/family/brotherOf>', '<http://example.org/family/Mary>'));
    • Model: Security. The creator of the base table corresponding to a model can grant privileges to other users. To perform DML on a model, a user must have DML privileges for the corresponding base table. The creator of a model can grant QUERY privileges on the corresponding database view to other users. A user can query only those models for which they have QUERY privileges on the corresponding database views. Only the creator of a model can drop the model
    • Model: Views Database views corresponding to the models
    • RDF Rulebases
    • Rulebase: DDL. Procedures provided as part of the API may be used to – Create a rulebase: create_rulebase('family_rb'); – Drop a rulebase: drop_rulebase('family_rb'); When a user creates a rulebase, a database view gets created automatically – rdfr_family_rb (rule_name, antecedent, filter, consequent, aliases)
    • Rulebase: DML. SQL DML commands may be used on the database view corresponding to a target rulebase to insert, delete, and update rules: insert into mdsys.rdfr_family_rb values('uncle_rule', '(?x :brotherOf ?y) (?y :parentOf ?z)', NULL, '(?x :uncleOf ?z)', SDO_RDF_Aliases(…));
    • Rulebase: Security Creator of a rulebase can grant privileges to the corresponding database view to other users Performing DML operations requires invoker to have appropriate privileges on the database view Only the creator of a rulebase can drop the rulebase
    • Rulebase: Views RDF_RULEBASE_INFO – Contains the list of rulebases – For each rulebase, contains additional information (such as, creator, view name, etc) Content of each rulebase is available from the corresponding database view
    • RDF Rule Indexes
    • Rule Index: DDL. Procedures provided as part of the API may be used to – Create a rule index: create_rules_index('family_rb_rix_family', SDO_RDF_Models('family'), SDO_RDF_Rulebases('rdfs', 'family_rb')); – Drop a rule index: drop_rules_index('family_rb_rix_family'); When a user creates a rule index, a database view gets created automatically – rdfi_family_rb_rix_family (…)
    • Rule Index: Security To create a rule index on an RDF dataset (models and rulebases), user needs to have QUERY privileges on those models and rulebases Creator of a rule index holds QUERY privilege on the rule index and may grant this privilege to other users Only the creator of a rule index can drop it
    • Rule Index: Views RDF_RULEINDEX_INFO – Contains the list of rule indexes – For each rule index, contains additional information (such as, creator, status, etc) RDF_RULEINDEX_DATASETS – For every rule index, stores the names of its models and rulebases
    • Rule Index: Dependencies Content of a rule index depends upon the content of each element of its dataset – Any modification to the models or rulebases in its dataset invalidates the rule index – Dropping a model or rulebase will drop dependent rule indexes automatically.
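The dependency behaviour described here can be modelled with a small bookkeeping class. This is a hedged sketch only: `RdfCatalog`, its method names, and the "VALID"/"INVALID" status strings are hypothetical and are not Oracle's catalog views or API. For brevity it also assumes model and rulebase names do not collide.

```python
class RdfCatalog:
    """Toy registry tracking which rule indexes depend on which models and rulebases."""

    def __init__(self):
        # index name -> {"dataset": set of model/rulebase names, "status": str}
        self.rule_indexes = {}

    def create_rule_index(self, name, models, rulebases):
        self.rule_indexes[name] = {
            "dataset": set(models) | set(rulebases),
            "status": "VALID",
        }

    def modify(self, element):
        """Any change to a model or rulebase invalidates the rule indexes built on it."""
        for info in self.rule_indexes.values():
            if element in info["dataset"]:
                info["status"] = "INVALID"

    def drop(self, element):
        """Dropping a model or rulebase also drops its dependent rule indexes."""
        dependents = [
            name for name, info in self.rule_indexes.items()
            if element in info["dataset"]
        ]
        for name in dependents:
            del self.rule_indexes[name]
        return dependents


catalog = RdfCatalog()
catalog.create_rule_index("family_rb_rix_family", ["family"], ["rdfs", "family_rb"])

catalog.modify("family")  # e.g. a triple was inserted into the family model
status_after_insert = catalog.rule_indexes["family_rb_rix_family"]["status"]

dropped = catalog.drop("family_rb")  # dropping the rulebase removes the index too
print(status_after_insert, dropped)
```

The point of the sketch is that a rule index is derived data: it has to be rebuilt after its inputs change, and it cannot outlive them.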
    • Summary RDF Data Model – Models (Graphs) – RDF Query using SDO_RDF_MATCH Table Function RDF Data Model with (user-defined) Rules – Models (Graphs) – Rulebases – Rule Indexes – RDF Query on entailed RDF graphs Management (DDL, DML, Security, …) – Models, Rulebases, and Rule Indexes
    • RDF Data Model Demo
    • Demo: Family Schema
    • Demo: Family Schema 2
    • Demo: Family Model Data
    • Demo: Family Model Data (Alt)
    • Demo: Query without Inference
      select m from TABLE(SDO_RDF_MATCH('(?m rdf:type :Male)', SDO_RDF_Models('family'), null, SDO_RDF_Aliases(SDO_RDF_Alias('', 'http://www.example.org/family/')), null));
      M
      --------------------------------------------------------------------------------
      http://www.example.org/family/Jack
      http://www.example.org/family/Tom
    • Demo: Query w/ RDFS Inference
      select m from TABLE(SDO_RDF_MATCH('(?m rdf:type :Male)', SDO_RDF_Models('family'), SDO_RDF_Rulebases('RDFS'), SDO_RDF_Aliases(SDO_RDF_Alias('', 'http://www.example.org/family/')), null));
      M
      --------------------------------------------------------------------------------
      http://www.example.org/family/Jack
      http://www.example.org/family/Tom
      http://www.example.org/family/John
      http://www.example.org/family/Matt
      http://www.example.org/family/Sammy
    • Demo: Family Rulebase Antecedent: ‘(?x :parentOf ?y) (?y :parentOf ?z)’ Filter: NULL Consequent: ‘(?x :grandParentOf ?z)’
    • Demo: Query w/ Family and RDFS Inference
      select x, y from TABLE(SDO_RDF_MATCH('(?x :grandParentOf ?y) (?x rdf:type :Male)', SDO_RDF_Models('family'), SDO_RDF_Rulebases('RDFS', 'family_rb'), SDO_RDF_Aliases(SDO_RDF_Alias('', 'http://www.example.org/family/')), null));
      X                                     Y
      ------------------------------------  ------------------------------------
      http://www.example.org/family/John    http://www.example.org/family/Cindy
      http://www.example.org/family/John    http://www.example.org/family/Tom
      http://www.example.org/family/John    http://www.example.org/family/Jack
      http://www.example.org/family/John    http://www.example.org/family/Cathy
    • Questions & Answers
    • Demo of Siderean’s Seamark Navigation Server Mike DiLascio & Joanne Luciano
    • Agenda About Siderean Software & Predictive Medicine, Inc. Introducing Seamark Navigation Server v.3.6 Seamark & Oracle 10g RDF Data Model Demonstration of Seamark / Oracle 10g integration Lessons Learned / Q&A
    • About Siderean Software. Aggregate, organize and navigate information the way users think, to improve analysis and decision making. Founded in 2001 and based in El Segundo, CA. Venture backed in 2004. Delivering RDF-centric navigation and analysis capabilities for end users (a.k.a. “the last mile”). Active W3C member leveraging Semantic Web standards. Demonstrating integrated Seamark navigation layer over Oracle 10g RDF Data Model in collaboration with Predictive Medicine, Inc.
    • Current solutions. “50,000 results!!! Now what?” “I give up! Hello? Get me an apple!” “Why do I get oranges when I’m looking for apples?” IT: “As soon as I fix his, hers stops working.” CONTENT PRODUCER: “I just produced three apples last week!” Enterprise search – a brute force approach. Knowledge management – breathtakingly expensive
    • Introducing Seamark Navigation Server. “I can see the big picture!” “No more staring at a blank text box.” “I can drill down quickly to what I want.” IT: “I can take my coffee break now.” CONTENT PRODUCER: “I knew we had an apple in here somewhere.” Seamark – layering organization to deliver pinpoint navigation
    • How it works: process. Metadata about data and content is aggregated… Organized into a unified information architecture… Analyzed to generate on-demand views… Providing pinpoint navigation across the data and content. [Diagram labels: Term, View, Person, Text, Place, Event]
    • How it works: architecture. [Diagram labels: User Navigation and User Tagging; Unstructured Content and Data Feeds; Structured Content Sources; Metadata Aggregator; Metadata; Navigation; Navigation Web Services; Web Browsers & Portals; Search Engines; User Alerts; Feed Aggregators]
    • Seamark/Oracle integration architecture: Phase 1. [Diagram labels: Oracle 10g RDF Data Model for scalable persistence of metadata; Batch RDFMatch Query issued from Seamark at index time; Cached Navigation Metadata; Navigation Web Services; User Navigation and User Tagging; Web Browsers & Portals; User Alerts; Feed Aggregators]
    • Seamark/Oracle integration architecture: Phase 2. [Diagram labels: Oracle 10g RDF Data Model for scalable persistence of metadata; Federated RDFMatch Queries issued from Seamark at query time; Dynamic Navigation Metadata; Navigation Web Services; User Navigation and User Tagging; Web Browsers & Portals; User Alerts; Feed Aggregators]
    • Seamark Demo: Background & Concepts Life Sciences demonstration premise RDF offers high value during early stage research Leveraging strengths of Oracle 10g & Seamark v3.6 Oracle – large datasets / scalability Seamark – useful subsets / flexible navigation & insights Project elapsed time - about one week Locating and identifying data sources represented the greatest time element Data sources in RDF required minimal integration time Non-RDF data sources required transformation and linking values (non-trivial but straightforward)
    • Seamark Demonstration: Identification of new drug candidates. Steps: 1. Differentiate different forms of disease 2. Identify patient subgroups 3. Identify top biomarkers 4. Identify function 5. Identify biological and chemical properties and disease associations of biomarker 6. Identify documents 7. Identify role in metabolic pathways 8. Identify compounds that interact 9. Identify and compare function in other organisms 10. Identify any prior art. [Diagram, data sources: GO2Keyword.rdf, Keywords.rdf, ProbeSet.rdf, GO2UniProt.rdf, GO2OMIM.rdf, OMIM.rdf, IntAct.rdf, GO.rdf, GO2Enzyme.rdf, UniProt.rdf, Taxonomy.rdf, PubMed.xml, Enzymes.rdf, KEGG.rdf; linked entities: Keyword, Probe, Protein, Gene, MIM Id, Enzyme, Organism, Citation, Compound, Pathway]
    • Live Seamark Life Sciences Demonstration: Sample Screenshots
    • Seamark application start page shows integration of OMIM, GO, KEGG, UniProt and NCBI
    • Select: Probe Set ID: “M18255_cds2_s_at”
    • Results: 9 Matches on “M18255_cds2_s_at” to the Gene Ontology Cytoplasm 1st of 9 Matches Cellular Location Via Gene Ontology
    • Cytoplasm 1st of 9 Matches Page Scroll
    • Cytoplasm, 1st of 9 Matches (Page Scroll). Plasma Membrane, …, 2nd of 9 Matches. Cellular Location Via Gene Ontology. Page Scroll for more results, etc.
    • Start Page: Optionally search across entire collection based upon keywords from the integrated data sources
    • Seamark Lessons Learned RDF offers multiple unconstrained views of data/relationships – Provides maximum flexibility during early stage research – Later stages can leverage OWL to constrain known relationships Data providers – Timing is right to publish in RDF format – Cut your customer’s integration costs – Speed discovery time Even with one week of effort… – Proof of Concept demonstrates value of broad & deep integration – Additional value in extending POC in customer pilot initiatives
    • Siderean Seamark Conclusion Getting the precise information we need from today’s data glut is profoundly difficult Solving this problem requires a solution that works the way you think Siderean is the world’s first turnkey navigation server for the enterprise and people at large
    • Thank You! To arrange a demonstration of Seamark or for more information please contact: Mike DiLascio, Office: +1 781 652 0339, Mobile: +1 781 354 7663, mdilascio@siderean.com. Siderean Software, Inc., 390 North Sepulveda Blvd., Suite 2070, El Segundo, CA 90245-4475 USA. http://www.siderean.com