Next Generation Cancer Data Discovery, Access, and Integration Using Prizms and Nanopublications

To encourage data sharing in the life sciences, supporting tools need to minimize effort and maximize incentives. We have created infrastructure that makes it easy to create portals that support dataset sharing and simplified publishing of the datasets as high-quality Linked Data. We report here on our infrastructure and its use in creating a melanoma dataset portal. This portal is based on the Comprehensive Knowledge Archive Network (CKAN) and Prizms, an infrastructure to acquire, integrate, and publish data using Linked Data principles. In addition, we introduce an extension to CKAN that makes it easy for others to cite datasets from within both publications and subsequently derived datasets, using the emerging nanopublication and World Wide Web Consortium (W3C) provenance standards.

  1. Next Generation Cancer Data Discovery, Access, and Integration Using Prizms and Nanopublications
     Jim McCusker (@jpmccu), Timothy Lebo (@timrdf), Michael Krauthammer, and Deborah McGuinness (@dlmcguinness)
  2. What we’re trying to fix
     (From: Data Sharing and Management SNAFU in 3 Acts)
  3. What we’re trying to fix
     “What is the content of the field called ‘SAM1’?”
     “Ah yes, SAM1 is the level of CXCR4 expression.”
     (From: Data Sharing and Management SNAFU in 3 Acts)
  4. What we’re trying to fix
     “That is logical if you think about it. And what is the content of the field called ‘SAM2’?”
     (From: Data Sharing and Management SNAFU in 3 Acts)
  5. What we’re trying to fix
     “… What is the content of the field called ‘SAM2’?”
     “I don’t remember.”
     (From: Data Sharing and Management SNAFU in 3 Acts)
  6. Life science data seems to start its life very scruffy.
  7. 5 Levels of Data Sharing, from scruffy to neat
     Level 1: Basic data sharing (who, what, when, where, why)
     Level 2: Automated Conversion (computable RDF representations)
     Level 3: Semantic enhancement (human-enhanced RDF representations)
     Level 4: Semantic eScience (use of vocabularies with formal semantics)
     Level 5: Community-Based Standards (consensus use of preferred ontologies)
  8. The Prizms Architecture
  9. Prizms User Interactions
  10. Provenance of Prizms
      Diagram: Prizms, healthdata.tw.rpi.edu, lod.melagrid.org, and Linking Open Govt. Data, connected by three prov:wasDerivedFrom links
      More Prizms nodes: https://github.com/timrdf/prizms/wiki/Prizms-Nodes
  11. 5 Levels in Prizms
      Level 1: Basic data sharing (CKAN dataset metadata + datapubs)
      Level 2: Automated Conversion (Prizms raw conversions)
      Level 3: Semantic Conversion (Prizms enhanced conversions)
      Level 4: Semantic eScience (Level 3 + NCBO ontology recommender + similar tools)
      Level 5: Community-Based Standards (Level 4 + vocabulary reuse analysis)
  12. Level 1: Basic Data Sharing
      CKAN¹ and Datapubs
      ¹ Comprehensive Knowledge Archive Network
  13. What is CKAN?
      • A data portal for all kinds of data
      • Link or upload
      • Linked Data-friendly
      • Link to:
         o Files
         o APIs
         o SPARQL endpoints
         o Metadata
         o Publications
         o Visualizations…
  14. data.melagrid.org: a portal for melanoma data
      • A data portal for all kinds of data
      • Link or upload
      • Linked Data-friendly
      • Link to:
         o Files
         o APIs
         o SPARQL endpoints
         o Metadata
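Slides 13 and 14 present CKAN as a portal that links files, APIs, and SPARQL endpoints. CKAN also exposes its catalog through its Action API, so a client can list a dataset's downloadable resources. The sketch below is a minimal example that assumes the portal at data.melagrid.org (named on the slide) runs a CKAN version with the v3 Action API and that a dataset called `exome-variants-in-melanoma` exists there; both are assumptions. It only builds a `package_show` URL and parses a hypothetical response, without making a network call.

```python
import json
from urllib.parse import urlencode


def package_show_url(portal: str, dataset_id: str) -> str:
    """Build a CKAN Action API (v3) package_show request URL for one dataset."""
    return f"{portal.rstrip('/')}/api/3/action/package_show?{urlencode({'id': dataset_id})}"


def list_resources(response_text: str) -> list[tuple[str, str]]:
    """Return (format, url) pairs for the resources in a package_show JSON response."""
    body = json.loads(response_text)
    if not body.get("success"):
        raise ValueError("CKAN reported an unsuccessful call")
    resources = body["result"].get("resources", [])
    return [(r.get("format", ""), r.get("url", "")) for r in resources]


# Hypothetical response in the shape CKAN returns; not real data from any portal.
SAMPLE_RESPONSE = json.dumps({
    "success": True,
    "result": {
        "name": "exome-variants-in-melanoma",
        "resources": [
            {"format": "XLS", "url": "http://example.org/exome_aa_variants_final.xls"},
        ],
    },
})

print(package_show_url("http://data.melagrid.org", "exome-variants-in-melanoma"))
print(list_resources(SAMPLE_RESPONSE))
```

Fetching the URL (for example with `urllib.request.urlopen`) and passing the body to `list_resources` would complete the round trip against a live CKAN instance.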
  15. What is a Datapub?
      exome-variants-in-melanoma, a Nanopublication (Groth et al., 2010), with four parts:
         hasAssertion → assertion (type Assertion)
         hasProvenance → provenance (type Provenance)
         hasAttribution → attribution (type Attribution)
         hasSupporting → supporting (type Supporting)
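Slide 15 shows a datapub as a nanopublication whose assertion, provenance, attribution, and supporting parts are separate graphs. A common way to serialize such a structure is TriG, which allows several named graphs in one file. The sketch below renders that layout; the `np:` namespace IRI and the `example.org` graph IRIs are hypothetical placeholders (the property and class names follow the slide, not necessarily the published nanopublication schema).

```python
# Hypothetical namespace: the property and class names follow the slide (Groth et al., 2010),
# but this IRI is a placeholder, not the published nanopublication schema IRI.
NP = "http://example.org/np-schema#"
BASE = "http://example.org/datapub/"
PARTS = ("assertion", "provenance", "attribution", "supporting")


def datapub_trig(name: str, graphs: dict[str, list[str]]) -> str:
    """Render a datapub as TriG: a head graph that types the publication and its four parts."""
    pub = f"<{BASE}{name}>"
    part_iri = {p: f"<{BASE}{name}/{p}>" for p in PARTS}
    links = " ;\n    ".join(f"np:has{p.capitalize()} {part_iri[p]}" for p in PARTS)
    lines = [f"@prefix np: <{NP}> .", "", f"<{BASE}{name}/head> {{",
             f"  {pub} a np:Nanopublication ;\n    {links} ."]
    lines += [f"  {part_iri[p]} a np:{p.capitalize()} ." for p in PARTS]
    lines.append("}")
    for p in PARTS:
        # Each part is its own named graph; callers supply complete Turtle triples.
        lines += ["", f"{part_iri[p]} {{"]
        lines += [f"  {triple}" for triple in graphs.get(p, [])]
        lines.append("}")
    return "\n".join(lines)


print(datapub_trig("exome-variants-in-melanoma", {
    "assertion": ["<http://example.org/dataset/exome-variants> a <http://www.w3.org/ns/dcat#Dataset> ."],
}))
```

Keeping each part in its own named graph is what lets provenance and attribution statements refer to the assertion as a unit.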
  16. Anatomy of a Datapub: Assertion
      … Attributes, and Entities in RDF (VRAER): ls2013/exome-variants-in-melanoma-assertion.ttl
      Variant data from "Exome sequencing identifies recurrent somatic RAC1 mutations in melanoma" (a Dataset)
         distribution: exome_aa_variants_final.xls (a Distribution)
            accessURL: exome_aa_variants_final.xls
            xls (value: xls)
         description: Variant data from M. Krauthammer, Y. Kong, B. Ha, P. Evans, A. Bacchiocchi, J.P. McCusker, E. Cheng, M.J. Davis, G. Goh, M. Choi, S. Ariyan, D. Narayan, K. Dutton-Regester, A. Capatana, E.C. Holman, M. Bosenberg, M. Sznol, H.M. Kluger, D.E. Brash, D.F. Stern, M.A. Materin, R.S. Lo, S. Mane, S. Ma, K.K. Kidd, N.K. Hayward, R.P. Lifton, J. Schlessinger, T.J. Boggon, and R. Halaban, Exome sequencing identifies recurrent somatic RAC1 mutations in melanoma. Nature Genetics, 2012. in press.
            **Tab 1: Description** This worksheet contains a description of the variant calling method.
            **Tab 2: SNVs** This worksheet contains automatically called somatic non-silent SNVs in matched melanoma samples. Annotations from MU2A.
            **Tab 3: InDels** This worksheet contains automatically called somatic InDels in matched melanoma samples. Annotations from VEP.
            **Tab 4: Splice Site Variants** This worksheet contains automatically called somatic splice site variants in matched melanoma samples. Annotations from VEP.
            **Tab 5: Additional mutations** This worksheet contains additional somatic mutations. These mutations are either inferred in unmatched samples (see Methods overview above), or have been Sanger-validated via PCR amplified products, after manual inspections of sequencing reads. Annotations from MU2A/VEP.
            Nomenclature
            **SNV:** Single Nucleotide Variant
            **DNV:** Dinucleotide Variant
            **DNV*:** Two SNVs affecting the same codon, at positions 1 and 3 of the codon
            **TNV:** Trinucleotide Variant
            **Parentheses in genotype calls:** Nucleotides that appear in
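The assertion on slide 16 describes a dataset, its description, and one downloadable distribution with an access URL. The labels on the slide (Dataset, Distribution, accessURL) match terms in the W3C DCAT vocabulary, so the sketch below assumes DCAT plus Dublin Core terms; which vocabularies the datapub actually uses, and all IRIs in the usage line, are assumptions. It also shows why literals need escaping: the dataset title itself contains double quotes.

```python
DCAT = "http://www.w3.org/ns/dcat#"
DCTERMS = "http://purl.org/dc/terms/"


def turtle_literal(text: str) -> str:
    """Quote a string as a Turtle literal, escaping backslashes, double quotes, and newlines."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def assertion_graph(dataset: str, title: str, description: str,
                    distribution: str, access_url: str, file_format: str) -> str:
    """Describe a dataset and one downloadable distribution with DCAT and Dublin Core terms."""
    return "\n".join([
        f"@prefix dcat: <{DCAT}> .",
        f"@prefix dcterms: <{DCTERMS}> .",
        "",
        f"<{dataset}> a dcat:Dataset ;",
        f"    dcterms:title {turtle_literal(title)} ;",
        f"    dcterms:description {turtle_literal(description)} ;",
        f"    dcat:distribution <{distribution}> .",
        "",
        f"<{distribution}> a dcat:Distribution ;",
        f"    dcat:accessURL <{access_url}> ;",
        f"    dcterms:format {turtle_literal(file_format)} .",
    ])


print(assertion_graph(
    "http://example.org/dataset/exome-variants",
    'Variant data from "Exome sequencing identifies recurrent somatic RAC1 mutations in melanoma"',
    "**Tab 1: Description** This worksheet contains a description of the variant calling method.",
    "http://example.org/distribution/exome_aa_variants_final",
    "http://example.org/files/exome_aa_variants_final.xls",
    "xls",
))
```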
  17. Anatomy of a Datapub: Attribution, Evidence
      exome-variants-in-melanoma
         rights: cc-by
         creator / contributor:
            James McCusker (mbox: mailto:james.mccusker@yale.edu)
            Michael Krauthammer (mbox: mailto:michael.krauthammer@yale.edu)
      Graphs shown: Attribution, Evidence
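The attribution graph on slide 17 records rights and the people responsible for the dataset, each with a mailbox. The slide's labels (rights, creator, contributor, mbox) match Dublin Core and FOAF term names, so this sketch assumes those vocabularies. The slide does not show which person holds which role, so the usage line uses hypothetical people rather than the two authors.

```python
def quote(text: str) -> str:
    """Turtle string literal with backslashes and double quotes escaped."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


ROLES = {"creator", "contributor"}  # mapped to dcterms:creator / dcterms:contributor


def attribution_graph(datapub: str, rights: str, people: list[tuple[str, str, str]]) -> str:
    """people holds (role, name, email) tuples; each person becomes a blank node."""
    statements = [f"dcterms:rights {quote(rights)}"]
    for role, name, email in people:
        if role not in ROLES:
            raise ValueError(f"unknown role: {role!r}")
        statements.append(
            f"dcterms:{role} [ foaf:name {quote(name)} ; foaf:mbox <mailto:{email}> ]")
    body = " ;\n    ".join(statements)
    return ("@prefix dcterms: <http://purl.org/dc/terms/> .\n"
            "@prefix foaf: <http://xmlns.com/foaf/0.1/> .\n\n"
            f"<{datapub}> {body} .")


# Hypothetical people and IRI, for illustration only.
print(attribution_graph("http://example.org/datapub/exome-variants-in-melanoma", "cc-by", [
    ("creator", "Example Creator", "creator@example.org"),
    ("contributor", "Example Contributor", "contributor@example.org"),
]))
```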
  18. Citing a Dataset using Datapubs
  19. Citing a Dataset using Datapubs
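Slides 18 and 19 carry only their titles, but the abstract says the CKAN extension lets derived datasets cite datasets using nanopublications and W3C provenance standards. One plausible way to record such a citation is a `prov:wasDerivedFrom` link from the derived dataset to the datapub's IRI. The sketch below assumes that form; it is not the extension's documented output, and all IRIs are hypothetical.

```python
PROV_WAS_DERIVED_FROM = "http://www.w3.org/ns/prov#wasDerivedFrom"
FORBIDDEN = set('<>"{}|^`\\ ')  # characters not allowed inside an N-Triples IRI reference


def check_iri(iri: str) -> str:
    """Reject empty strings and characters that would break an N-Triples IRI."""
    if not iri or any(ch in FORBIDDEN for ch in iri):
        raise ValueError(f"not usable as an N-Triples IRI: {iri!r}")
    return iri


def cite_datapubs(derived: str, cited: list[str]) -> str:
    """Emit one prov:wasDerivedFrom triple per cited datapub, in N-Triples."""
    subject = check_iri(derived)
    return "\n".join(
        f"<{subject}> <{PROV_WAS_DERIVED_FROM}> <{check_iri(src)}> ." for src in cited)


print(cite_datapubs("http://example.org/dataset/derived-analysis",
                    ["http://example.org/datapub/exome-variants-in-melanoma"]))
```

Because the cited IRI is the nanopublication itself, a consumer can follow it to the assertion, attribution, and provenance graphs of the original dataset.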
  20. Levels 2-3: Automated Conversion, Semantic Conversion
      Prizms raw conversions, enhanced conversions
  21. Prizms RDF Converter (Lebo et al., 2012)
      Input (CSV row):
         "Hawaii","Alii Garden Market Place", "75-6129 Alii Drive", "Kailua-Kona", "96740", "-155.9819183", "19.61436844"
      smart, naïve bootstrap (raw conversion):
         ds4383:thing_1367 raw:column_1 "Hawaii";
            raw:column_2 "Alii Garden Market Place";
            raw:column_3 "75-6129 Alii Drive";
            raw:column_4 "Kailua-Kona";
            raw:column_5 "96740";
            raw:column_6 "-155.9819183";
            raw:column_7 "19.61436844" .
      enhancement (enhanced conversion):
         ds4383:thing_1367 con:preferredURI ds4383:farmersMarket_1367 .
         ds4383:farmersMarket_1367 a ds4383_vocab:FarmersMarket;
            con:address :address_1367;
            dcterms:title "Alii Garden Market Place";
            wgs:lat 19.6;
            wgs:long -155.9 .
         :address_1367 a con:Address;
            con:stateOrProvince typed_state:Hawaii;
            con:street "75-6129 Alii Drive";
            con:city "Kailua-Kona";
            con:zip "96740" .
         typed_state:Hawaii a ds4383_vocab:State;
            dcterms:identifier "Hawaii";
            rdfs:label "Hawaii";
            owl:sameAs <http://sws.geonames.org/5855797/>, govtrackusgov:HI, dbpedia:Hawaii .
      Figure labels: Time, Domain Expertise, SemWeb Expertise
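Slide 21 contrasts the converter's naïve bootstrap, which turns every cell into a `raw:column_N` triple, with an enhanced conversion that maps columns to meaningful properties. The sketch below imitates those two steps for the slide's input row. It is not the Prizms converter itself; the `ENHANCEMENT` mapping is a hypothetical stand-in for its enhancement parameters, the address and state nodes are omitted, and prefix declarations are left out, so the output is Turtle fragments.

```python
import csv
import io


def quote(text: str) -> str:
    """Turtle string literal with backslashes and double quotes escaped."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def number(text: str) -> str:
    """Check that a cell is numeric and emit it as a bare Turtle decimal."""
    float(text)  # raises ValueError for non-numeric cells
    return text


def read_rows(csv_text: str) -> list[list[str]]:
    # skipinitialspace lets a quoted field follow ", " as in the slide's input row
    return list(csv.reader(io.StringIO(csv_text), skipinitialspace=True))


def raw_conversion(row: list[str], row_id: int) -> str:
    """Naive bootstrap: each cell becomes raw:column_N on a per-row subject."""
    pairs = [f"raw:column_{i} {quote(cell)}" for i, cell in enumerate(row, start=1)]
    return f"ds4383:thing_{row_id} " + " ;\n    ".join(pairs) + " ."


# Hypothetical enhancement parameters: 1-based column -> (property, value renderer).
ENHANCEMENT = {
    2: ("dcterms:title", quote),
    6: ("wgs:long", number),  # column 6 holds the longitude (-155.98...)
    7: ("wgs:lat", number),   # column 7 holds the latitude (19.61...)
}


def enhanced_conversion(row: list[str], row_id: int) -> str:
    """Apply the column mapping; address and state handling from the slide are omitted."""
    pairs = ["a ds4383_vocab:FarmersMarket"]
    for column, (prop, render) in sorted(ENHANCEMENT.items()):
        pairs.append(f"{prop} {render(row[column - 1])}")
    return f"ds4383:farmersMarket_{row_id} " + " ;\n    ".join(pairs) + " ."


ROW = ('"Hawaii","Alii Garden Market Place", "75-6129 Alii Drive", "Kailua-Kona", '
       '"96740", "-155.9819183", "19.61436844"')
row = read_rows(ROW)[0]
print(raw_conversion(row, 1367))
print(enhanced_conversion(row, 1367))
```

Keeping the raw step mechanical means it needs no domain knowledge, while all interpretation (which column is a title, which is a coordinate) lives in the separately editable mapping.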
  22. Prizms Benefits
      Prizms has worked with:
      • BFO/IAO/OBI
      • SIO
      • RDF Data Cube Vocabulary
      • PROV
      • VoID
      • FOAF
      • etc.
      For free, you get:
      • Provenance at dataset and triple levels
      • Automatic source/dataset/version URI generation
      • Automated conversion as data changes
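Slide 22 lists automatic source/dataset/version URI generation among the benefits. The sketch below assumes a nested `/source/{source}/dataset/{dataset}/version/{version}` path layout so that each version URI sits under its dataset and each dataset under its source; the exact layout Prizms uses, the base URI, the identifiers, and the `segment` cleanup rule are all assumptions for illustration.

```python
import re


def segment(text: str) -> str:
    """Turn a label into one URI path segment by replacing unsafe character runs with '-'."""
    cleaned = re.sub(r"[^A-Za-z0-9._~-]+", "-", text.strip()).strip("-")
    if not cleaned:
        raise ValueError(f"cannot build a URI segment from {text!r}")
    return cleaned


def dataset_uris(base: str, source: str, dataset: str, version: str) -> dict[str, str]:
    """Build source, dataset, and version URIs that nest under one another."""
    source_uri = f"{base.rstrip('/')}/source/{segment(source)}"
    dataset_uri = f"{source_uri}/dataset/{segment(dataset)}"
    return {
        "source": source_uri,
        "dataset": dataset_uri,
        "version": f"{dataset_uri}/version/{segment(version)}",
    }


# Hypothetical base URI and identifiers.
print(dataset_uris("http://example.org", "yale edu", "exome variants", "2012-Jul-01"))
```

Nesting the URIs this way means a new version of a changing source gets a fresh URI automatically, while earlier versions stay addressable.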
  23. Future Work: Supporting Levels 4-5
      ✔ Level 1: Basic data sharing (CKAN dataset metadata + datapubs)
      ✔ Level 2: Automated Conversion (Prizms raw conversions)
      ✔ Level 3: Semantic Conversion (Prizms enhanced conversions)
      ✔ Level 4: Semantic eScience (Level 3 + NCBO ontology recommender + similar tools)
      ✔ Level 5: Community-Based Standards (Level 4 + vocabulary reuse analysis)
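Slide 23 pairs Level 5 with "vocabulary reuse analysis" without saying how it is performed. One hypothetical starting point is to count which vocabularies a dataset already uses, both as predicates and as `rdf:type` classes. The sketch below does that over N-Triples with a deliberately small line matcher (not a full parser), and approximates each term's vocabulary by cutting its IRI after the last `#` or `/`, which is a heuristic.

```python
import re
from collections import Counter

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
# Minimal N-Triples line matcher: subject, predicate IRI, object; not a full parser.
TRIPLE = re.compile(r"^\s*(\S+)\s+<([^>]+)>\s+(.+?)\s*\.\s*$")


def namespace(iri: str) -> str:
    """Approximate a term's vocabulary by cutting after its last '#' or '/'."""
    return iri[: max(iri.rfind("#"), iri.rfind("/")) + 1]


def vocabulary_usage(ntriples: str) -> Counter:
    """Count how often each vocabulary appears as a predicate or as an rdf:type class."""
    counts = Counter()
    for line in ntriples.splitlines():
        match = TRIPLE.match(line)
        if not match:
            continue  # blank lines, comments, and lines this matcher cannot handle
        _, predicate, obj = match.groups()
        counts[namespace(predicate)] += 1
        if predicate == RDF_TYPE and obj.startswith("<") and obj.endswith(">"):
            counts[namespace(obj[1:-1])] += 1
    return counts


SAMPLE = """\
<http://example.org/d> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/ns/dcat#Dataset> .
<http://example.org/d> <http://purl.org/dc/terms/title> "Exome variants" .
<http://example.org/d> <http://purl.org/dc/terms/creator> _:b0 .
"""
print(vocabulary_usage(SAMPLE).most_common())
```

Comparing such counts across many datasets would show which vocabularies a community already favors, which is the kind of evidence Level 5 calls for.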
  24. Publishing Custom Linked Data Using LODSPeaKr
      • Custom templates for RDF and HTML
      • Templates driven by rdf:type
      • Web-based template editor
      • Embed easy-to-generate visualizations
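Slide 24 says LODSPeaKr chooses templates by `rdf:type`. The Python sketch below only illustrates that dispatch idea: a resource is rendered with the template registered for the first of its types that has one, falling back to a default. It is not LODSPeaKr's API or template format, and the template names and HTML are hypothetical.

```python
from html import escape
from string import Template

# Hypothetical templates keyed by rdf:type (prefixed names kept short for readability).
TEMPLATES = {
    "dcat:Dataset": Template("<h1>$title</h1><p>Dataset: $uri</p>"),
    "foaf:Person": Template("<h1>$title</h1><p>Person: $uri</p>"),
}
DEFAULT_TEMPLATE = Template("<h1>$title</h1><p>Resource: $uri</p>")


def render(uri: str, types: list[str], title: str) -> str:
    """Render with the first rdf:type that has a template, else the default template."""
    template = next((TEMPLATES[t] for t in types if t in TEMPLATES), DEFAULT_TEMPLATE)
    # Escape values so RDF literals cannot inject markup into the page.
    return template.substitute(uri=escape(uri), title=escape(title))


print(render("http://example.org/people/1", ["foaf:Agent", "foaf:Person"], "Ada & Co"))
```

Dispatching on type keeps one template per kind of thing, so every new resource of a known type gets a page without extra work.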
  25. Conclusions
      • Prizms is an infrastructure for sharing data at many levels of sophistication
      • Good support for Levels 1-3 data sharing
      • Initial support for Levels 4-5 data sharing
      • It didn't just make life science data better; it made future Linked Data better!
      • More to be done, but lots of progress
  26. Thanks!
      • Rensselaer Polytechnic Institute (Tetherless World):
         o Alvaro Graves
         o John Erickson
         o The LOGD Team
      • The Open Knowledge Foundation Network (OKFN)
      • Yale University:
         o Ruth Halaban
         o Tobias Kuhn
      • Grant support from:
         o Yale SPORE in Skin Cancer
         o Semantic Sea Ice Interoperability Initiative