Triplifier talk

563 views

Connecting content with a tool to convert database and spreadsheet data to be useable on the semantic web.

Published in: Education, Technology

  1. Data Curation and Biodiversity Research -- Lessons from BiSciCol and a look at the “Triplifier Simplifier”
     John Deck, University of California, Berkeley; Brian Stucky, University of Colorado, Boulder; Lukasz Ziemba, University of Florida, Gainesville; Nico Cellinese, University of Florida, Gainesville; Rob Guralnick, University of Colorado, Boulder
     BiSciCol Team: Reed Beaman, Nico Cellinese, Jonathan Coddington, Neil Davies, John Deck, Rob Guralnick, Bryan P. Heidorn, Chris Meyer, Tom Orrell, Rich Pyle, Kate Rachwal, Brian Stucky, Rob Whitton, Lukasz Ziemba
  2. BiSciCol
     • BiSciCol is National Science Foundation funded, 2010 – 2014
     • Infrastructure to tag & track specimens & derivatives in cyberspace
     • Relies on globally unique identifiers (GUIDs) to track objects
     • Implements a Linked Data approach
     • Provides support for the Global Names Architecture
  3. A Biological Relationship Graph … [Figure labels: Taxonomic Type Filter; Class Filter; X Specimens; Tissues; X Sequences]
  4. Why Linked Data? Why BiSciCol? Here is Gustav’s Problem: Generates Lots of Data… (Prefers to collect stuff)
  5. Biodiversity Data Challenges: Data is Distributed; Rapidly Changing Technologies; Covers Multiple Domains
  6. Solving Biodiversity Data Challenges with BiSciCol and Linked Data: Group data into classes. Assign identifiers. Link identifiers. Publish. [Figure labels: Is a dwc:Event; Is a dwc:Event; [ ] Ocean Sampling Day; [X] Moorea Biocode; [X] SI MSNGR System; [+] Add My Data]
  7. The Triplifier (Advanced Interface): Loading Data; Naming and Identifying Objects; Linking Objects; Publishing. Powered by:
  8. Advanced Interface: Loading Data. [Figure labels: Darwin Core Archive; Spreadsheets; MySQL; KEMU]
  9. Advanced Interface: Entities. [Figure label: 78 Tissue] Result is identifiers assigned to Entities:
       78 a door .
       427 a cat .
       <http://biocode.berkeley.edu/collectorspecimens/BMOO_2665> a <dwc:Occurrence> .
       <http://biocode.berkeley.edu/collectorevents/MIB_25> a <dwc:Event> .
     From Gary Larson and adapted by Barry Smith in Referent Tracking presentation at the Semantics of Biodiversity Workshop, 2012.
     Ceusters W, Smith B. Strategies for Referent Tracking in Electronic Health Records. J Biomed Inform. 2006 Jun;39(3):362-78.
  10. Advanced Interface: Entity Relations. Relations as Triples:
       <http://biocode.berkeley.edu/collectorevents/MIB_25> <ma:isSourceOf> <http://biocode.berkeley.edu/collectorspecimens/BMOO_2665> .
       <http://biocode.berkeley.edu/collectorevents/MIB_37> <ma:isSourceOf> <http://biocode.berkeley.edu/collectorspecimens/BMOO_2667> .
       <http://biocode.berkeley.edu/collectorspecimens/BMOO_2665> <ma:isSourceOf> <http://biocode.berkeley.edu/plate_well/Plate_M037F10> .
       <http://biocode.berkeley.edu/collectorspecimens/BMOO_2667> <ma:isSourceOf> <http://biocode.berkeley.edu/plate_well/Plate_M028G5> .
  11. Triplify!: View graph based data. [Figure labels: Response; Query]
  12. The Triplifier (Simple Interface): Publish
  13. What challenges are we facing now? (for BiSciCol, Linked Data, and data integration in general)
  14. Identifier Issues
     Persistence. Solutions:
       • DOIs (http://doi.org/)
       • EZIDs (http://ezid.net/)
     Assignment at the source is difficult. Solutions:
       • Calculated namespaces (e.g. geo:lat,lng) via PDAs
       • UUIDs (randomly unique)
     [Image caption: The digestible RFID tag]
     Semantic web requires URIs, but many standards (including Darwin Core) do not require URIs for identifiers. Solution:
       • Promote use of URIs for identifiers in all standards.
     [Figure labels: URI; scheme : string]
  15. “Occurrence” Classification Issues
     Inadequate representational units
     Confusion between representational units: “Sample, Specimen, Individual, Aggregation”
     Solutions:
       • Continue working on clarity in term definitions
       • Work from upper level ontologies (e.g. Basic Formal Ontology) to derive definitions.
  16. Relation Issues: Nonsensical conclusions are possible! Solution:
       • Apply directional links only where appropriate.
  17. Adoption Issues
     Critical mass required for effective utilization. Solutions:
       • Work with aggregators (GBIF, VertNet, NCBI).
       • View Triples as a publishable unit
     Reality is complicated. Solutions:
       • Work collaboratively (e.g. BioPortal, hackathons, interdisciplinary workshops)
  18. The BiSciCol Mission
     • BiSciCol tackles biodiversity data challenges:
       • Tracking and integration of objects across disciplines
       • Linking derivatives back to their source
     • BiSciCol is about community, collaborative practice
       • Commitment to standards, ontologies
       • Agreement on permanent, resolvable identifiers
       • Triplification of data sources to enhance linked data
     http://biscicol.blogspot.com/ http://biscicol.org
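
The entity and relation triples on slides 9 and 10 can be illustrated with a short Python sketch. It is not the Triplifier's own code: the row layout, the function names `uri`, `triplify` and `derivatives`, and the choice to leave plate wells untyped are assumptions made for illustration. Only the URIs, the `dwc:` classes and the `ma:isSourceOf` predicate come from the slides, and they are written exactly as shown there.

```python
# Illustrative sketch only; this is not the Triplifier's own code.
BASE = "http://biocode.berkeley.edu/"
IS_SOURCE_OF = "<ma:isSourceOf>"  # predicate as written on slide 10

# Hypothetical rows, as if read from a joined spreadsheet or database view.
ROWS = [
    {"event": "MIB_25", "specimen": "BMOO_2665", "well": "Plate_M037F10"},
    {"event": "MIB_37", "specimen": "BMOO_2667", "well": "Plate_M028G5"},
]


def uri(path, local_id):
    """Wrap a local identifier in a full URI, in angle brackets."""
    return f"<{BASE}{path}/{local_id}>"


def triplify(rows):
    """Return (subject, predicate, object) triples for the given rows."""
    triples = []
    for row in rows:
        event = uri("collectorevents", row["event"])
        specimen = uri("collectorspecimens", row["specimen"])
        well = uri("plate_well", row["well"])
        # Entities: assign a class to each identified object (slide 9).
        triples.append((event, "a", "<dwc:Event>"))
        triples.append((specimen, "a", "<dwc:Occurrence>"))
        # Relations: directional links from source to derivative (slide 10).
        triples.append((event, IS_SOURCE_OF, specimen))
        triples.append((specimen, IS_SOURCE_OF, well))
    return triples


def derivatives(triples, source):
    """Follow isSourceOf links forward from `source`, never backwards."""
    found = []
    frontier = [source]
    while frontier:
        current = frontier.pop()
        for subj, pred, obj in triples:
            if subj == current and pred == IS_SOURCE_OF and obj not in found:
                found.append(obj)
                frontier.append(obj)
    return found


if __name__ == "__main__":
    triples = triplify(ROWS)
    for subj, pred, obj in triples:
        print(subj, pred, obj, ".")
    print(derivatives(triples, uri("collectorevents", "MIB_25")))
```

Because `derivatives` only walks links in the source-to-derivative direction, asking for the derivatives of a plate well returns nothing, which is one way to respect the directional links called for on slide 16.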

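
Two of the identifier points on slide 14, UUIDs and the "scheme : string" shape of a URI, can also be sketched with the Python standard library. The function names `mint_uuid_urn` and `looks_like_uri` are hypothetical, and the check is purely syntactic: it says nothing about whether an identifier is persistent or resolvable.

```python
import re
import uuid

# A URI starts with a scheme (a letter, then letters, digits, "+", "-" or "."),
# followed by ":" and a non-empty remainder. This is a rough syntactic test only.
URI_SHAPE = re.compile(r"^[A-Za-z][A-Za-z0-9+.-]*:.+")


def mint_uuid_urn():
    """Create a random (version 4) UUID and return it in URN form."""
    return uuid.uuid4().urn


def looks_like_uri(identifier):
    """Return True if `identifier` has the 'scheme : string' shape."""
    return bool(URI_SHAPE.match(identifier))


if __name__ == "__main__":
    print(mint_uuid_urn())
    print(looks_like_uri("http://biocode.berkeley.edu/collectorspecimens/BMOO_2665"))
    print(looks_like_uri("BMOO_2665"))
```

A bare catalogue number such as `BMOO_2665` fails the check, while the same number embedded in an `http:` URI or a `urn:uuid:` identifier passes.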