SADI, SHARE and the Scientific MethodThe Quest for the Holy Grail
The Problem
The Problem
The Holy Grail:(this slide created circa 2002)Align the promoters of all serine threoninekinases involved exclusively in the regulation of cell sorting during wound healing in blood vessels.Retrieve and align 2000nt 5' from every serine/threonine kinase in Mus musculus expressed exclusively in the tunica [I | M |A] whose expression increases 5X or more within 5 hours of wounding but is not activated during the normal development of blood vessels, and is <40% homologous in the active site to kinases known to be involved in cell-cycle regulation in any other species.
Two novel technologies developed in our labare getting us very close to the Holy Grail!
Holy Grail Demo #1
Imagine there is a “virtual database” containing all of the data from all of the databases,together with the output ofevery conceivable analysis
How do we query that database?
A Brief Digression…
“Database”
?
Boxes became ovals…Straight lines became curvy lines…
Boxes became ovals…Straight lines became curvy lines……and you want us to give you a grant for THAT??
Relational Database“Graph”
Gene Table-----------------------Gene IDTissue IDType IDProtein Table-----------------------Protein IndexProtein NameRegulates IDisRepressorOfhttp://ncbi.nlm/NR/NR_14487http://pdb.org/114487
“Foreign keys” are used to link tables in a databaseGene Table-----------------------Gene IDTissue IDType IDProtein Table-----------------------Protein IndexProtein NameRegulates IDisRepressorOfhttp://ncbi.nlm/NR/NR_14487http://pdb.org/114487
Gene Table-----------------------Gene IDTissue IDType IDProtein Table-----------------------Protein IndexProtein NameRegulates IDLinks in Graphs consist of statements called“TRIPLES”   isRepressorOfhttp://ncbi.nlm/NR/NR_14487http://pdb.org/114487
Both Data Sources are on the Same MachineGene Table-----------------------Gene IDTissue IDType IDProtein Table-----------------------Protein IndexProtein NameRegulates IDisRepressorOfhttp://ncbi.nlm/NR/NR_14487http://pdb.org/114487
Gene Table-----------------------Gene IDTissue IDType IDProtein Table-----------------------Protein IndexProtein NameRegulates IDGraph Data Sources (may be) on Independent Machines on the WebisRepressorOfhttp://ncbi.nlm/NR/NR_14487http://pdb.org/114487
“Meaning” of the connection between data-points is understood only by the database administratorProtein regulates GeneGene Table-----------------------Gene IDTissue IDType IDProtein Table-----------------------Protein IndexProtein NameRegulates IDisRepressorOfhttp://ncbi.nlm/NR/NR_14487http://pdb.org/114487
Gene Table-----------------------Gene IDTissue IDType IDProtein Table-----------------------Protein IndexProtein NameRegulates ID“Meaning” of the connection in a Graph is explicitly labeled(and machine-readable!)isRepressorOfhttp://ncbi.nlm/NR/NR_14487http://pdb.org/114487
Connect all of the graphs in the world to one anotherAnd what do you get?
Mark Butler (2003) Is the semantic web hype? Hewlett Packard laboratories presentation at MMU, 2003-03-12
The lavender portion represents biology – currently ~40,000,000,000 Triples(we and our collaborators will be doubling that number in the next 12 months)
How do you find information on this“Semantic Web”??
SPARQLThe query language used to discover and extract information represented in Graphs
SPARQLUnfortunately, YOU have to know which Web resources contain which Triples(HARD!)Even if you do know this, SPARQL has significant limitations when attempting to query over disparate Graphs(SLOW  AND  CUMBERSOME)
SPARQLIf the data doesn’t existin any Graph at all…
Basically…A novel way of making Triples available on the Semantic Web, using a technology called Web Services“Services” for short
Basically…We invented SADI to overcome some/all of these problems…but I wont bore you with the technical details…
Detour EndsPlease resume speed
Holy Grail Demo #1Imagine there is a “virtual database” containing all of the data from all of the databases,together with the output ofevery conceivable analysisHow do we query that database?
SHARESemantic Health And Research EnvironmentSPARQL enhanced by SADI
A  Novel SPARQL Query EngineOvercomes some of the limitations of traditional SPARQL query-handlers
A  Novel SPARQL Query EngineOvercomes some of the limitations of traditional SPARQL query-handlers…and more…
A  Novel SPARQL Query EngineOvercomes some of the limitations of traditional SPARQL query-handlers…and more…MUCH more!!
What pathways does UniProt protein P47989 belong to?PREFIX pred: <http://sadiframework.org/ontologies/predicates.owl#>PREFIX ont: <http://ontology.dumontierlab.com/>PREFIX uniprot: <http://lsrn.org/UniProt:>SELECT ?gene ?pathway WHERE { 	uniprot:P47989 pred:isEncodedBy ?gene . 	?gene ont:isParticipantIn ?pathway . }
What pathways does UniProt protein P47989 belong to?PREFIX pred: <http://sadiframework.org/ontologies/predicates.owl#>PREFIX ont: <http://ontology.dumontierlab.com/>PREFIX uniprot: <http://lsrn.org/UniProt:>SELECT ?gene ?pathway WHERE { 	uniprot:P47989 pred:isEncodedBy ?gene . 	?gene ont:isParticipantIn ?pathway . }
What pathways does UniProt protein P47989 belong to?PREFIX pred: <http://sadiframework.org/ontologies/predicates.owl#>PREFIX ont: <http://ontology.dumontierlab.com/>PREFIX uniprot: <http://lsrn.org/UniProt:>SELECT ?gene ?pathway WHERE { uniprot:P47989pred:isEncodedBy ?gene . 	?geneont:isParticipantIn ?pathway . }Note that there is no “From” clause… I have neglected to tell the system where to look for the answer, I am simply asking my question
Now stick that query into SHARE
Recapwhat we just sawA standard SPARQL query was entered into SHARE, a SADI-aware query engine
Recapwhat we just sawThe query was interpreted to extract the individual data/relationships being requested (and any component/sub-properties, as we shall see later!)
Recapwhat we just sawThe “triple-patterns” required to answer the query are passed to SADI for Web Service discovery
Recapwhat we just sawServices capable of generating those triple-patterns are automatically executed, the triples are stored, and the query is resolved.
Recapwhat we just sawWe posed, and answered a ~complex database query WITHOUT A DATABASE(in fact, the data didn’t even have to exist...)
Holy Grail Demo #1Align the promoters of all serine threonine kinases involved exclusively in the regulation of cell sorting during wound healing in blood vessels.Retrieve and align 2000nt 5' from every serine/threonine kinase in Mus musculus expressed exclusively in the tunica [I | M |A] whose expression increases 5X or more within 5 hours of wounding but is not activated during the normal development of blood vessels, and is <40% homologous in the active site to kinases known to be involved in cell-cycle regulation in any other species.
Holy Grail Demo #2
Show me the latest Blood Urea Nitrogen and Creatinine levelsof patients who appear to be rejecting their transplantsPREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> PREFIX patient: <http://sadiframework.org/ontologies/patients.owl#> PREFIX l: <http://sadiframework.org/ontologies/predicates.owl#> SELECT ?patient ?bun ?creatFROM <http://sadiframework.org/ontologies/patients.rdf>WHERE {	?patientrdf:typepatient:LikelyRejecter .	?patient l:latestBUN ?bun . 	?patient l:latestCreatinine ?creat . }
Likely Rejecter:A patient who has creatinine levelsthat are increasing over time- - Wilkinson MD
Likely Rejecter:…but there is no “likely rejecter” column or table in our database… only blood chemistry measurementsat various time-points
?
The definition of a LikelyRejecter is encoded in a machine-readable document  written in the OWL language  (“Ontology”)“the regression line over creatinine measurements should have an increasing slope”
The machine continues to burrow down through the definition and discovers that regression lines have things like slopes and intercepts, etc…
Then… Two magical events occur…
The machine figures outby itselfthe need to do a Linear Regression analysisin order to answer your question
The machine figures outby itselfhow and wherethat analysiscan be doneand does it automatically!
http://www.impactlab.net/2009/03/22/improve-your-brain-power/
The SHARE system utilizes SADI to discover analytical services on the Web that do linear regression analysis
VOILA!
How do we do that?!?We let the data describe itself!This is a different frommost of the bioinformatics world,where the person giving you the data also tells you how to interpret it
Data exhibits “late binding”
Late binding:“purpose and meaning”of the data isnot determined untilthe moment it is required
Benefitof late bindingData is amenable toconstant re-interpretation
Example?Blood Creatinine measurementswere not dictated to be (only)Blood Creatinine measurements!
Example?The data had the ‘qualities/properties’ thatallowed the machine to inferthat they were Blood Creatinine measurements
Example?But the data also had the ‘qualities/properties’ that allowed them to be interpreted as X/Y coordinate data by another Service
http://www.flickr.com/people/faernworks/
Holy Grail Demo #2Align the promoters of all serine threonine kinases involved exclusively in the regulation of cell sorting during wound healing in blood vessels.Retrieve and align 2000nt 5' from every serine/threonine kinase in Mus musculus expressed exclusively in the tunica [I | M |A] whose expression increases 5X or more within 5 hours of wounding but is not activated during the normal development of blood vessels, and is <40% homologous in the active site to kinases known to be involved in cell-cycle regulation in any other species.
The Holy Grail may not yet be in-handbut we can at least see it from here!So… now what?
Mark’s ManifestoWhat is my next “Holy Grail”?
ScienceSupport for the in silicoScientific Method
The Scientific MethodDiscourse:  What do you believe?  What do I believe?Disagreement:  You’re wrong!  And I’m gonna prove it!Clarity:  This is the experiment I am going to doReproducibility:  This is how I did it (“provenance”)Clarity:  This is my new hypothesis
The Scientific MethodDiscourse:  What do you believe?  What do I believe?Disagreement:  You’re wrong!  And I’m gonna prove it!Clarity:  This is the experiment I am going to doReproducibility:  This is how I did it (“provenance”)Clarity:  This is my new hypothesisWorkflows                (e.g. myExperiment)
Another Brief Digression…
“Facebook” for Scientistshttp://myexperiment.org
An exciting evolution in the way Researchers express and share their in silico “Materials and Methods”Through things called ‘Workflows’
Workflows are explicit representationsof the method by which an analysis was doneand which resources are used to do it
Workflows can be very simple…   “Blast this sequence”
Or not...This workflow takes in a CEL file and a normalisation method then returns a series of images/graphs which represent the same output obtained using the MADAT software package (MicroArray Data Analysis Tool) Also returned by this workflow are a list of the top differentially expressed genes (size dependant on the number specified as input - geneNumber), which are then used to find the candidate pathways which may be influencing the observed changes in the microarray data.
Why bother?
TavernaA workbench for designing and executingScientific Workflows
Load-up your data and press “play”!…Then go home for the weekend!  You are just one click away from your M.Sc.!!
By the by…The SHARE application automatically creates a Workflow and then automatically runs it.This is where the data comes from to answer the queries… Workflows are a Good Thing™
Detour EndsPlease resume speed
WORKFLOWS
At the moment the Semantic Web in Healthcare and Life Sciencesaddresses these issues by attempting to create “consensus”
Large, centralized ontologies (e.g. the Gene Ontology)that claim to represent community agreement about “biological reality”
…is that Science?
To restore the “traditions of Science” to in silico scienceThe Semantic Web needs to encourage/facilitate personal opinion and debate
What has this got to do with SADI and SHARE?
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> PREFIX patient: <http://sadiframework.org/ontologies/patients.owl#> PREFIX l: <http://sadiframework.org/ontologies/predicates.owl#> SELECT ?patient ?bun ?creatFROM <http://sadiframework.org/ontologies/patients.rdf>WHERE {	?patientrdf:typepatient:LikelyRejecter .	?patient l:latestBUN ?bun . 	?patient l:latestCreatinine ?creat . }
Likely Rejecter
I created a small ontology describing my definition ofa Likely Rejecter
… it was MY ontology!
I can re-use it
I can modify it as I change myworld-view
I can publish it for others to use
Others can modify it and/or compare it to THEIR world-view
Sharing my ontology gives opportunities for “micro-attribution”“Credit” to me is automatic when someone uses my ontology in their ontology/query
Using SADI and SHAREmypersonal world-view isexplicitlyexpressedand can bedynamically evaluated againstglobal data and knowledge
http://www.dailymail.co.uk/femail/article-488234/Friends-dignity-self-respect---weight-wasnt-I-lost-slimming-club.html
…but there’s more…
“Likely Rejecter”
I made that up!  It came out of my head!
What’s another word for a world-view that you make-up?Hypothesis
The “Likely Rejecter” OWL Classis an explicitly-expressed hypothesis;Members of that class may or may not exist!
Ontologically-expressed Hypotheses drive the discovery, assembly, and analysis of data capable of evaluating their validityHypothesisIschemiaSADI+ SHAREHypertensionBlood PressureAnalytical AlgorithmDatabase 1Database 2
Join us!SADI and CardioSHARE are Open-Source projectsCome join us – we’re having a lot of fun!!http://sadiframework.org
 CreditsBenjamin VanderValk(SHARE & SADI)Luke McCarthy (SADI, SHARE, Taverna, CardioSHARE)SoroushSamadian(CardioSHARE)David Withers(Taverna)Edward Kawas(SADI Service auto-generator)
U of New BrunswickDr. Chris BakerAlexandreRiazanovCarleton UniversityDr. Michel Dumontier	Marc-Alexandre Nolin	Leonid Chepelev	Steve EtlingerNichaellaKieth	Jose Cruz
Microsoft Research
                          CreditsBenjamin VanderValk (SADI & CardioSHARE)Luke McCarthy (SADI & CardioSHARE)SoroushSamadian (CardioSHARE)IO Informatics (Knowledge Explorer API)Microsoft Research FinThis presentation available on SlideShare:  keywords ‘wilkinson’ ‘iCAPTURE’ ‘HLI’

The Scientific Method on the Semantic Web

  • 1.
    SADI, SHARE andthe Scientific MethodThe Quest for the Holy Grail
  • 2.
  • 3.
  • 4.
    The Holy Grail:(thisslide created circa 2002)Align the promoters of all serine threoninekinases involved exclusively in the regulation of cell sorting during wound healing in blood vessels.Retrieve and align 2000nt 5' from every serine/threonine kinase in Mus musculus expressed exclusively in the tunica [I | M |A] whose expression increases 5X or more within 5 hours of wounding but is not activated during the normal development of blood vessels, and is <40% homologous in the active site to kinases known to be involved in cell-cycle regulation in any other species.
  • 5.
    Two novel technologiesdeveloped in our labare getting us very close to the Holy Grail!
  • 6.
  • 7.
    Imagine there isa “virtual database” containing all of the data from all of the databases,together with the output ofevery conceivable analysis
  • 8.
    How do wequery that database?
  • 9.
  • 10.
  • 14.
  • 15.
    Boxes became ovals…Straightlines became curvy lines…
  • 16.
    Boxes became ovals…Straightlines became curvy lines……and you want us to give you a grant for THAT??
  • 17.
  • 18.
    Gene Table-----------------------Gene IDTissueIDType IDProtein Table-----------------------Protein IndexProtein NameRegulates IDisRepressorOfhttp://ncbi.nlm/NR/NR_14487http://pdb.org/114487
  • 19.
    “Foreign keys” areused to link tables in a databaseGene Table-----------------------Gene IDTissue IDType IDProtein Table-----------------------Protein IndexProtein NameRegulates IDisRepressorOfhttp://ncbi.nlm/NR/NR_14487http://pdb.org/114487
  • 20.
    Gene Table-----------------------Gene IDTissueIDType IDProtein Table-----------------------Protein IndexProtein NameRegulates IDLinks in Graphs consist of statements called“TRIPLES” isRepressorOfhttp://ncbi.nlm/NR/NR_14487http://pdb.org/114487
  • 21.
    Both Data Sourcesare on the Same MachineGene Table-----------------------Gene IDTissue IDType IDProtein Table-----------------------Protein IndexProtein NameRegulates IDisRepressorOfhttp://ncbi.nlm/NR/NR_14487http://pdb.org/114487
  • 22.
    Gene Table-----------------------Gene IDTissueIDType IDProtein Table-----------------------Protein IndexProtein NameRegulates IDGraph Data Sources (may be) on Independent Machines on the WebisRepressorOfhttp://ncbi.nlm/NR/NR_14487http://pdb.org/114487
  • 23.
    “Meaning” of theconnection between data-points is understood only by the database administratorProtein regulates GeneGene Table-----------------------Gene IDTissue IDType IDProtein Table-----------------------Protein IndexProtein NameRegulates IDisRepressorOfhttp://ncbi.nlm/NR/NR_14487http://pdb.org/114487
  • 24.
    Gene Table-----------------------Gene IDTissueIDType IDProtein Table-----------------------Protein IndexProtein NameRegulates ID“Meaning” of the connection in a Graph is explicitly labeled(and machine-readable!)isRepressorOfhttp://ncbi.nlm/NR/NR_14487http://pdb.org/114487
  • 25.
    Connect all ofthe graphs in the world to one anotherAnd what do you get?
  • 26.
    Mark Butler (2003)Is the semantic web hype? Hewlett Packard laboratories presentation at MMU, 2003-03-12
  • 27.
    The lavender portionrepresents biology – currently ~40,000,000,000 Triples(we and our collaborators will be doubling that number in the next 12 months)
  • 28.
    How do youfind information on this“Semantic Web”??
  • 29.
    SPARQLThe query languageused to discover and extract information represented in Graphs
  • 30.
    SPARQLUnfortunately, YOU haveto know which Web resources contain which Triples(HARD!)Even if you do know this, SPARQL has significant limitations when attempting to query over disparate Graphs(SLOW AND CUMBERSOME)
  • 31.
    SPARQLIf the datadoesn’t existin any Graph at all…
  • 33.
    Basically…A novel wayof making Triples available on the Semantic Web, using a technology called Web Services“Services” for short
  • 34.
    Basically…We invented SADIto overcome some/all of these problems…but I wont bore you with the technical details…
  • 35.
  • 36.
    Holy Grail Demo#1Imagine there is a “virtual database” containing all of the data from all of the databases,together with the output ofevery conceivable analysisHow do we query that database?
  • 37.
    SHARESemantic Health AndResearch EnvironmentSPARQL enhanced by SADI
  • 38.
    A NovelSPARQL Query EngineOvercomes some of the limitations of traditional SPARQL query-handlers
  • 39.
    A NovelSPARQL Query EngineOvercomes some of the limitations of traditional SPARQL query-handlers…and more…
  • 40.
    A NovelSPARQL Query EngineOvercomes some of the limitations of traditional SPARQL query-handlers…and more…MUCH more!!
  • 41.
    What pathways doesUniProt protein P47989 belong to?PREFIX pred: <http://sadiframework.org/ontologies/predicates.owl#>PREFIX ont: <http://ontology.dumontierlab.com/>PREFIX uniprot: <http://lsrn.org/UniProt:>SELECT ?gene ?pathway WHERE { uniprot:P47989 pred:isEncodedBy ?gene . ?gene ont:isParticipantIn ?pathway . }
  • 42.
    What pathways doesUniProt protein P47989 belong to?PREFIX pred: <http://sadiframework.org/ontologies/predicates.owl#>PREFIX ont: <http://ontology.dumontierlab.com/>PREFIX uniprot: <http://lsrn.org/UniProt:>SELECT ?gene ?pathway WHERE { uniprot:P47989 pred:isEncodedBy ?gene . ?gene ont:isParticipantIn ?pathway . }
  • 43.
    What pathways doesUniProt protein P47989 belong to?PREFIX pred: <http://sadiframework.org/ontologies/predicates.owl#>PREFIX ont: <http://ontology.dumontierlab.com/>PREFIX uniprot: <http://lsrn.org/UniProt:>SELECT ?gene ?pathway WHERE { uniprot:P47989pred:isEncodedBy ?gene . ?geneont:isParticipantIn ?pathway . }Note that there is no “From” clause… I have neglected to tell the system where to look for the answer, I am simply asking my question
  • 44.
    Now stick thatquery into SHARE
  • 47.
    Recapwhat we justsawA standard SPARQL query was entered into SHARE, a SADI-aware query engine
  • 48.
    Recapwhat we justsawThe query was interpreted to extract the individual data/relationships being requested (and any component/sub-properties, as we shall see later!)
  • 49.
    Recapwhat we justsawThe “triple-patterns” required to answer the query are passed to SADI for Web Service discovery
  • 50.
    Recapwhat we justsawServices capable of generating those triple-patterns are automatically executed, the triples are stored, and the query is resolved.
  • 51.
    Recapwhat we justsawWe posed, and answered a ~complex database query WITHOUT A DATABASE(in fact, the data didn’t even have to exist...)
  • 52.
    Holy Grail Demo#1Align the promoters of all serine threonine kinases involved exclusively in the regulation of cell sorting during wound healing in blood vessels.Retrieve and align 2000nt 5' from every serine/threonine kinase in Mus musculus expressed exclusively in the tunica [I | M |A] whose expression increases 5X or more within 5 hours of wounding but is not activated during the normal development of blood vessels, and is <40% homologous in the active site to kinases known to be involved in cell-cycle regulation in any other species.
  • 53.
  • 54.
    Show me thelatest Blood Urea Nitrogen and Creatinine levelsof patients who appear to be rejecting their transplantsPREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> PREFIX patient: <http://sadiframework.org/ontologies/patients.owl#> PREFIX l: <http://sadiframework.org/ontologies/predicates.owl#> SELECT ?patient ?bun ?creatFROM <http://sadiframework.org/ontologies/patients.rdf>WHERE { ?patientrdf:typepatient:LikelyRejecter . ?patient l:latestBUN ?bun . ?patient l:latestCreatinine ?creat . }
  • 55.
    Likely Rejecter:A patientwho has creatinine levelsthat are increasing over time- - Wilkinson MD
  • 56.
    Likely Rejecter:…but thereis no “likely rejecter” column or table in our database… only blood chemistry measurementsat various time-points
  • 57.
  • 58.
    The definition ofa LikelyRejecter is encoded in a machine-readable document written in the OWL language (“Ontology”)“the regression line over creatinine measurements should have an increasing slope”
  • 59.
    The machine continuesto burrow down through the definition and discovers that regression lines have things like slopes and intercepts, etc…
  • 60.
    Then… Two magicalevents occur…
  • 61.
    The machine figuresoutby itselfthe need to do a Linear Regression analysisin order to answer your question
  • 62.
    The machine figuresoutby itselfhow and wherethat analysiscan be doneand does it automatically!
  • 63.
  • 64.
    The SHARE systemutilizes SADI to discover analytical services on the Web that do linear regression analysis
  • 65.
  • 66.
    How do wedo that?!?We let the data describe itself!This is a different frommost of the bioinformatics world,where the person giving you the data also tells you how to interpret it
  • 67.
  • 69.
    Late binding:“purpose andmeaning”of the data isnot determined untilthe moment it is required
  • 70.
    Benefitof late bindingDatais amenable toconstant re-interpretation
  • 71.
    Example?Blood Creatinine measurementswerenot dictated to be (only)Blood Creatinine measurements!
  • 72.
    Example?The data hadthe ‘qualities/properties’ thatallowed the machine to inferthat they were Blood Creatinine measurements
  • 73.
    Example?But the dataalso had the ‘qualities/properties’ that allowed them to be interpreted as X/Y coordinate data by another Service
  • 74.
  • 75.
    Holy Grail Demo#2Align the promoters of all serine threonine kinases involved exclusively in the regulation of cell sorting during wound healing in blood vessels.Retrieve and align 2000nt 5' from every serine/threonine kinase in Mus musculus expressed exclusively in the tunica [I | M |A] whose expression increases 5X or more within 5 hours of wounding but is not activated during the normal development of blood vessels, and is <40% homologous in the active site to kinases known to be involved in cell-cycle regulation in any other species.
  • 76.
    The Holy Grailmay not yet be in-handbut we can at least see it from here!So… now what?
  • 77.
    Mark’s ManifestoWhat ismy next “Holy Grail”?
  • 78.
    ScienceSupport for thein silicoScientific Method
  • 80.
    The Scientific MethodDiscourse: What do you believe? What do I believe?Disagreement: You’re wrong! And I’m gonna prove it!Clarity: This is the experiment I am going to doReproducibility: This is how I did it (“provenance”)Clarity: This is my new hypothesis
  • 81.
    The Scientific MethodDiscourse: What do you believe? What do I believe?Disagreement: You’re wrong! And I’m gonna prove it!Clarity: This is the experiment I am going to doReproducibility: This is how I did it (“provenance”)Clarity: This is my new hypothesisWorkflows (e.g. myExperiment)
  • 82.
  • 83.
  • 84.
    An exciting evolutionin the way Researchers express and share their in silico “Materials and Methods”Through things called ‘Workflows’
  • 86.
    Workflows are explicitrepresentationsof the method by which an analysis was doneand which resources are used to do it
  • 87.
    Workflows can bevery simple… “Blast this sequence”
  • 88.
    Or not...This workflowtakes in a CEL file and a normalisation method then returns a series of images/graphs which represent the same output obtained using the MADAT software package (MicroArray Data Analysis Tool) Also returned by this workflow are a list of the top differentially expressed genes (size dependant on the number specified as input - geneNumber), which are then used to find the candidate pathways which may be influencing the observed changes in the microarray data.
  • 89.
  • 90.
    TavernaA workbench fordesigning and executingScientific Workflows
  • 92.
    Load-up your dataand press “play”!…Then go home for the weekend! You are just one click away from your M.Sc.!!
  • 93.
    By the by…TheSHARE application automatically creates a Workflow and then automatically runs it.This is where the data comes from to answer the queries… Workflows are a Good Thing™
  • 94.
  • 95.
  • 97.
    At the momentthe Semantic Web in Healthcare and Life Sciencesaddresses these issues by attempting to create “consensus”
  • 98.
    Large, centralized ontologies(e.g. the Gene Ontology)that claim to represent community agreement about “biological reality”
  • 99.
  • 104.
    To restore the“traditions of Science” to in silico scienceThe Semantic Web needs to encourage/facilitate personal opinion and debate
  • 105.
    What has thisgot to do with SADI and SHARE?
  • 106.
    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>PREFIX patient: <http://sadiframework.org/ontologies/patients.owl#> PREFIX l: <http://sadiframework.org/ontologies/predicates.owl#> SELECT ?patient ?bun ?creatFROM <http://sadiframework.org/ontologies/patients.rdf>WHERE { ?patientrdf:typepatient:LikelyRejecter . ?patient l:latestBUN ?bun . ?patient l:latestCreatinine ?creat . }
  • 107.
  • 108.
    I created asmall ontology describing my definition ofa Likely Rejecter
  • 109.
    … it wasMY ontology!
  • 110.
  • 111.
    I can modifyit as I change myworld-view
  • 112.
    I can publishit for others to use
  • 113.
    Others can modifyit and/or compare it to THEIR world-view
  • 114.
    Sharing my ontologygives opportunities for “micro-attribution”“Credit” to me is automatic when someone uses my ontology in their ontology/query
  • 115.
    Using SADI andSHAREmypersonal world-view isexplicitlyexpressedand can bedynamically evaluated againstglobal data and knowledge
  • 116.
  • 117.
  • 118.
  • 119.
    I made thatup! It came out of my head!
  • 120.
    What’s another wordfor a world-view that you make-up?Hypothesis
  • 121.
    The “Likely Rejecter”OWL Classis an explicitly-expressed hypothesis;Members of that class may or may not exist!
  • 124.
    Ontologically-expressed Hypotheses drivethe discovery, assembly, and analysis of data capable of evaluating their validityHypothesisIschemiaSADI+ SHAREHypertensionBlood PressureAnalytical AlgorithmDatabase 1Database 2
  • 125.
    Join us!SADI andCardioSHARE are Open-Source projectsCome join us – we’re having a lot of fun!!http://sadiframework.org
  • 126.
    CreditsBenjamin VanderValk(SHARE& SADI)Luke McCarthy (SADI, SHARE, Taverna, CardioSHARE)SoroushSamadian(CardioSHARE)David Withers(Taverna)Edward Kawas(SADI Service auto-generator)
  • 127.
    U of NewBrunswickDr. Chris BakerAlexandreRiazanovCarleton UniversityDr. Michel Dumontier Marc-Alexandre Nolin Leonid Chepelev Steve EtlingerNichaellaKieth Jose Cruz
  • 128.
  • 129.
    CreditsBenjamin VanderValk (SADI & CardioSHARE)Luke McCarthy (SADI & CardioSHARE)SoroushSamadian (CardioSHARE)IO Informatics (Knowledge Explorer API)Microsoft Research FinThis presentation available on SlideShare: keywords ‘wilkinson’ ‘iCAPTURE’ ‘HLI’