owl:sameAs Considered Harmful to Provenance

Presentation Abstract:
GOTO was once a standard operation in most computer programming languages. In 1968, Edsger Dijkstra argued that GOTO is a low-level operation unsuited to higher-level programming languages and advocated structured programming in its place. owl:sameAs, in its current usage, may be poised for a similar period of debate and transformation. In biomedical research, the provenance of information is nearly as important as the information itself, and sometimes more so. owl:sameAs lets someone state that two separate descriptions refer to the same entity. In current practice, operational systems respond by merging the descriptions and, at the same time, merging their provenance, so it is no longer possible to retrieve where each individual description came from. This merging of provenance can be problematic, or even catastrophic, in biomedical applications that require access to provenance information. Drawing on our experience with integrating biomedical data, we present use cases of this issue in biospecimen management and in experimental metadata representation. We suggest that systems using owl:sameAs, or any similar construct, must provide an option to preserve the provenance of the entities in question and of the ground assertions made about them.
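To make the provenance loss concrete, here is a minimal sketch, assuming each source's statements are stored as quads whose first element names the dataset that asserted them. The graph names, property names, and the `describe` helper are hypothetical and not taken from any particular triple store. Merging two co-referent nodes without the graph name yields values that can no longer be traced to their source; keeping the graph name is one way to offer the "preserve provenance" option the abstract calls for.

```python
from collections import defaultdict

# Hypothetical quads: (graph, subject, predicate, object). The graph name records
# which dataset asserted the triple; values follow the slides' cell-line example.
QUADS = [
    ("DatasetD", "LB", "createdOn", "8/31/09"),
    ("DatasetD", "LB", "passage", 0),
    ("DatasetE", "LA", "createdOn", "9/20/09"),
    ("DatasetE", "LA", "passage", 10),
]


def describe(quads, same_nodes, preserve_provenance=False):
    """Merge the descriptions of nodes declared owl:sameAs each other.

    Without provenance, each property maps to a bare set of values.
    With provenance, each value keeps its source graph and original subject.
    """
    description = defaultdict(set)
    for graph, subject, predicate, obj in quads:
        if subject in same_nodes:
            if preserve_provenance:
                description[predicate].add((obj, graph, subject))
            else:
                description[predicate].add(obj)
    return dict(description)


# Naive merge: {'createdOn': {'8/31/09', '9/20/09'}, 'passage': {0, 10}}; origin lost.
merged = describe(QUADS, {"LA", "LB"})

# Provenance-preserving merge: each value still carries its dataset and subject.
traced = describe(QUADS, {"LA", "LB"}, preserve_provenance=True)
for value, graph, subject in sorted(traced["passage"]):
    print(f"passage={value} asserted by {graph} about {subject}")
```

The naive result answers "what passage is this cell line?" with two numbers and no way to say which experiment saw which.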

Comments
  • Indeed! Our follow-on work did just that: see http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.366.5164 and http://tw.rpi.edu/web/doc/AnalysisOfIdentityInLinkedData

    Further, we have introduced two relations in W3C PROV, alternateOf and specializationOf, which address many of these issues: www.w3.org/TR/prov-primer/#alternate-entities-and-specialization-1
  • How about an ontology of the term 'similar'?
    After all, LISP distinguishes between 'eq' and 'equal', and Java distinguishes between '==' (which tests for reference equality; i.e. same object), '.equals' (tests for value equality) and 'equalsIgnoreCase' (tests for case-insensitive value equality). In many applications, misspellings are also considered 'same' depending on the Levenshtein distance from the correct spelling. When comparing RDF graphs, you can add different levels of isomorphism, plus different Levenshtein distances for every subject, predicate, and object name (and you could add regular expressions!). It gets very complicated very quickly.
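Python draws the same kinds of distinctions the comment describes. The sketch below reports several notions of "sameness" between two strings: object identity (`is`, like Java's `==` on references), value equality (`==`, like `.equals`), case-insensitive equality (`str.casefold`, roughly analogous to `equalsIgnoreCase`), and a Levenshtein-distance threshold. The function names, the default threshold of one edit, and the sample inputs are illustrative assumptions.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum insertions, deletions, and substitutions turning a into b."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


def sameness(a: str, b: str, max_edits: int = 1) -> dict:
    """Report which of several notions of "same" hold for two strings."""
    return {
        "identical": a is b,                                  # same object
        "equal": a == b,                                      # same value
        "equal_ignoring_case": a.casefold() == b.casefold(),  # case-insensitive
        "within_edit_distance": levenshtein(a, b) <= max_edits,
    }


print(sameness("YUMAC", "yumac"))
print(sameness("YUMAC", "YUMAK"))
```

Each key is a different, defensible answer to "are these the same?", which is the commenter's point about how quickly the notion fragments.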
Speaker Notes

  • There is a growing appreciation in the Linked Data community for owl:sameAs, but a growing apprehension and concern about it in the provenance and reasoning communities.
  • What is passage?
    DON’T RAMBLE!!!
    Every mention of a particular cell line across experiments should count as “the same as” that cell line, because we want to compare, for instance, kinase phosphorylation with gene expression.
    A cell line is an abstract concept. We both have “YUMAC” cells, but I have many different colonies of YUMAC, since I have been growing them for a while.
    In fact, I sent you one of my colonies so you can do other research on them.
    The provenance of one colony shouldn't affect that of the others. Each biospecimen needs its own provenance trace.
  • Better explanation for the breast cancer to melanoma issue.
    DON’T RAMBLE!!!
  • owl:sameAs allows reasoners to infer useful equality relationships.
  • DON’T RAMBLE!!!
    Deprecate owl:sameAs? Maybe, but we need an alternative; owl:sameAs is very useful for linked data.
    Weaken owl:sameAs? This would disrupt the semantics that existing applications rely on.
    A transitive, reflexive skos:exactMatch? This would provide a list of individuals that can be treated the same as the original individuals, while still distinguishing among them.

owl:sameAs Considered Harmful to Provenance: Presentation Transcript

  • owl:sameAs Considered Harmful to Provenance
    James McCusker and Deborah L. McGuinness
    Tetherless World Constellation, Rensselaer Polytechnic Institute, http://tw.rpi.edu
  • Background
    For science, provenance is the origin or source from which something comes, its intended use, who or what generated it, its manner of manufacture, the history of subsequent owners, and the place and time of its manufacture, production, or discovery, all in sufficient detail for reproducibility.
  • Provenance of this Talk
    • Discussions from the International Semantic Web Conference (ISWC) Workshop on the Role of Semantic Web in Provenance Management
    • Pat Hayes’ ISWC talk on Blogic
    • Web Science 2010 talk: An Empirical Study of owl:sameAs Use in Linked Data, Ding et al.
    • Semantic web data warehousing for caGrid, McCusker et al.
    • Other discussions about owl:sameAs.
  • SameAs and Provenance collide in experiments
    A scientist has datasets D and E and wants to link them, but D and E refer to different instances of the same cell line.
    [Diagram: the provenance behind datasets D and E]
      – Patient A: Visit Date 7/8/09; DOB 2/3/45; Dx Melanoma
      – Specimen T, derived from Patient A: Type Tumor; Created on 7/8/09; Quantity 5 g
      – Specimen LB, derived from Specimen T and used by Dataset D: Type Cell Line; Created on 8/31/09; Quantity 5 g; Passage 0
      – Specimen LA, derived from Specimen LB and used by Dataset E: Type Cell Line; Created on 9/20/09; Quantity 10 g; Passage 10
  • SameAs and Provenance collide in experiments
    The natural inclination is to state (LA owl:sameAs LB). Then D and E can refer to the same specimen.
    [Diagram: the same provenance graph, with an owl:sameAs link added between Specimen LA and Specimen LB]
  • SameAs and Provenance collide in experiments
    Oops! Now the specimen has multiple values for some important properties, and LA appears to have been derived from itself.
    [Diagram: Specimens LA and LB each now show Created on 8/31 or 9/20; Quantity 5 or 10 g; Passage 0 or 10]
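The failure on this slide can be reproduced in a few lines. This sketch assumes a simple "smushing" strategy: every node linked by owl:sameAs is rewritten to one canonical name (here, arbitrarily, the alphabetically smallest) and all triples are then pooled. The triples are transcribed from the slide diagram; the predicate names and the `smush` helper are hypothetical, and real systems may canonicalize differently.

```python
from collections import defaultdict

SAME_AS = "owl:sameAs"

# Triples transcribed from the slide diagram (predicate names are hypothetical).
TRIPLES = [
    ("PatientA", "diagnosis", "Melanoma"),
    ("T", "type", "Tumor"), ("T", "derivedFrom", "PatientA"),
    ("LB", "type", "Cell Line"), ("LB", "createdOn", "8/31/09"),
    ("LB", "quantity", "5 g"), ("LB", "passage", 0), ("LB", "derivedFrom", "T"),
    ("LA", "type", "Cell Line"), ("LA", "createdOn", "9/20/09"),
    ("LA", "quantity", "10 g"), ("LA", "passage", 10), ("LA", "derivedFrom", "LB"),
    ("D", "used", "LB"), ("E", "used", "LA"),
    ("LA", SAME_AS, "LB"),  # the "natural inclination"
]


def smush(triples):
    """Rewrite owl:sameAs-linked nodes to one canonical name and pool their triples."""
    parent = {}

    def find(node):
        # Union-find lookup with path halving.
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    for s, p, o in triples:
        if p == SAME_AS:
            a, b = sorted((find(s), find(o)))
            parent[b] = a  # alphabetically smaller name becomes canonical

    merged = defaultdict(lambda: defaultdict(set))
    for s, p, o in triples:
        if p != SAME_AS:
            merged[find(s)][p].add(find(o) if isinstance(o, str) else o)
    return merged


merged = smush(TRIPLES)
conflicts = {p: v for p, v in merged["LA"].items() if len(v) > 1}
print(sorted(conflicts))  # properties that now have more than one value
print("LA derived from itself:", "LA" in merged["LA"]["derivedFrom"])
```

After the merge, `merged["LA"]` carries two creation dates, two quantities, and two passage numbers, and lists itself under `derivedFrom`: the "Oops!" above, with no record of which dataset contributed which value.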
  • Now try to answer:
    • Experiment Analysis:
      – The data doesn't look right. What were the methods and protocols, and how consistent were they, going back to surgical resection?
      – Did the “same cell line” actually come from the same tumor, or just from the same patient? Or even from different patients?
      – What originally seemed to be a primary breast or lung cancer is now a metastasized melanoma. Now what?
    • Biospecimen Management:
      – Is a histology slide made from a tumor the same as the tumor? What about the tissue microarray, the cell culture, or the isolated molecular material?
    None of this is important, until it turns out to be.
  • Issues and Requirements
    • owl:sameAs is a powerful construct.
    • We need an alternative way of representing a portion of the owl:sameAs relationship.
    • It could be something weaker or decomposable.
    • It could be a domain-specific best-practice modeling option.
    • We need to understand what values came from where.
  • Possible Fixes
    • Deprecate owl:sameAs? X
    • Weaken owl:sameAs? X
    • Less liberal use of owl:sameAs?
    • A domain-literate modeling of a weakened notion of owl:sameAs?
    • A transitive, reflexive version of skos:exactMatch?
      ?x skos:exactMatch ?y . ?y propertyOfInterest ?value .
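The last bullet reads as a query pattern. Below is a minimal sketch of evaluating it, assuming links are stored as plain pairs. The SKOS Reference already declares skos:exactMatch symmetric and transitive; the sketch adds the reflexive step the slide proposes, so a node always matches itself. The specimen data and the helpers `match_closure` and `lookup` are hypothetical. Unlike the owl:sameAs merge, every value comes back paired with the node that asserted it, and the original descriptions are never rewritten.

```python
from collections import defaultdict

# Hypothetical per-specimen descriptions; nothing is merged.
PROPS = {
    "LA": {"passage": 10, "createdOn": "9/20/09"},
    "LB": {"passage": 0, "createdOn": "8/31/09"},
    "T": {"type": "Tumor"},
}
EXACT_MATCH = [("LA", "LB")]  # LA skos:exactMatch LB


def match_closure(pairs, nodes):
    """Map each node to every node it matches: reflexive, symmetric, transitive."""
    neighbours = defaultdict(set)
    for a, b in pairs:
        neighbours[a].add(b)
        neighbours[b].add(a)  # symmetric, as SKOS declares for exactMatch
    closure = {}
    for start in nodes:
        seen, stack = {start}, [start]  # reflexive: start matches itself
        while stack:
            for nxt in neighbours[stack.pop()]:
                if nxt not in seen:  # transitive: keep walking the links
                    seen.add(nxt)
                    stack.append(nxt)
        closure[start] = seen
    return closure


def lookup(x, prop, closure, props):
    """Evaluate ?x skos:exactMatch ?y . ?y prop ?value, returning (y, value) pairs."""
    return {(y, props[y][prop]) for y in closure[x] if prop in props.get(y, {})}


closure = match_closure(EXACT_MATCH, PROPS)
print(sorted(lookup("LA", "passage", closure, PROPS)))
```

Asking for LA's passage now yields both numbers, but each is labeled with the specimen it belongs to, which is the "treated the same while still distinguishing among them" behavior described in the speaker notes.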
  • Conclusions
    • Provenance is critical for understanding linked data in scientific applications.
    • Using owl:sameAs can result in the confusion of provenance and ground truths.
    • We are exploring some of the potential solutions.
  • Acknowledgements & References
    • Tetherless World Constellation: Jim Hendler, Deborah McGuinness, Peter Fox, Li Ding, and the rest.
    • Carole Goble (for the title)
    • P. Hayes, “Blogic,” invited talk, International Semantic Web Conference, 2009. http://www.slideshare.net/PatHayes/blogic-iswc-2009-invited-talk
    • L. Ding, J. Shinavier, T. Finin, and D. L. McGuinness, “An Empirical Study of owl:sameAs Use in Linked Data,” Web Science 2010. http://tw.rpi.edu/wiki/An_Empirical_Study_of_owl:sameAs_Use_in_Linked_Data
    • L. Moreau, “The Foundations for Provenance on the Web,” Nov. 2009. http://eprints.ecs.soton.ac.uk/18176
    • J. McCusker, J. Phillips, A. Beltran, A. Finkelstein, and M. Krauthammer, “Semantic web data warehousing for caGrid,” BMC Bioinformatics, vol. 10, 2009, p. S2.
  • skos:exactMatch Example
    select ?ds ?dx where { ?ds used ?spec . ?spec matches ?x . ?x diagnosis ?dx . }
    [Diagram: the provenance graph from the earlier example (Patient A, Specimen T, Specimens LA and LB, Datasets D and E), with Specimens LA and LB linked by "matches" instead of owl:sameAs, so each keeps its own Created on, Quantity, and Passage values]
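The example query can be evaluated over the unmerged data. This sketch is hypothetical throughout: the triples are transcribed from the slide diagram, "matches" is treated as a reflexive, symmetric, transitive link, and, because the diagram attaches the diagnosis to Patient A rather than to a specimen, it assumes (our reading, not stated on the slide) that `?x diagnosis ?dx` is reached by following derivedFrom links back to the patient.

```python
# Hypothetical triples transcribed from the slide diagram. "matches" stands in for
# the proposed reflexive, transitive variant of skos:exactMatch.
TRIPLES = [
    ("D", "used", "LB"), ("E", "used", "LA"),
    ("LA", "derivedFrom", "LB"), ("LB", "derivedFrom", "T"),
    ("T", "derivedFrom", "PatientA"),
    ("PatientA", "diagnosis", "Melanoma"),
    ("LA", "matches", "LB"),
]


def objects(subject, predicate):
    """All o such that (subject, predicate, o) is asserted."""
    return {o for s, p, o in TRIPLES if s == subject and p == predicate}


def reachable(start, step):
    """Every node reachable from start, including start itself, via step()."""
    seen, stack = {start}, [start]
    while stack:
        for nxt in step(stack.pop()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def matches(node):
    """Neighbours via "matches", followed in both directions (symmetric)."""
    return objects(node, "matches") | {s for s, p, o in TRIPLES if p == "matches" and o == node}


def query():
    """select ?ds ?dx where { ?ds used ?spec . ?spec matches ?x . ?x diagnosis ?dx . }"""
    results = set()
    for ds, p, spec in TRIPLES:
        if p != "used":
            continue
        for x in reachable(spec, matches):
            # Assumption: reach the diagnosis by following derivedFrom back to the patient.
            for ancestor in reachable(x, lambda n: objects(n, "derivedFrom")):
                results |= {(ds, dx) for dx in objects(ancestor, "diagnosis")}
    return results


print(sorted(query()))
```

Both datasets reach the melanoma diagnosis, yet Dataset D still used Specimen LB, Dataset E still used Specimen LA, and no specimen is recorded as derived from itself.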