How Linked Data Can Speed Information Discovery


Linked data platforms make it easier than ever to explore and discover data without waiting for it to be integrated into the data warehouse. In this presentation, we discuss what linked data is and present a case study on integrating separate source systems so that scientists don't have to learn the source systems' structures to get to their data.

Published in: Data & Analytics

  1. How Linked Data Can Speed Information Discovery. Alex Meadows, CSpring; Bubba Puryear, Syngenta
  2. Agenda
     • Linked Data Overview
     • Case Study: Linked Data at Syngenta
     • Q&A
  3. BI Team: "We don't know your data, it's going to take us some time." -or- "We have so many other projects we're not sure when we can get to this request."
     Business: "We're not sure what we want, but can't we have it all?" -or- "Here are our requirements, when can we have this completed?"
  4. New source: weeks to months. Existing source: days to weeks.
  5. What is Linked Data?
     • Coined in 2006 by Tim Berners-Lee
     • Provides vocabulary for every data set
     • Can combine vocabularies
     • Highly structured in triple format
  6. Vocabulary: Classes
  7. Vocabulary: Properties
  8. Triples: Pale Ale (Beer); Mark (Person); Mt. Carmel Brewing Co. (Brewer)
  9. Triples: RDF/XML
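Slides 8 and 9 introduce triples and their RDF/XML form. As a rough illustration only, the Python sketch below holds a few subject-predicate-object triples as tuples and serializes them to RDF/XML with the standard library. The property names (`hasName`, `owner_of`, `first_name`, `brews`, `isOfType`) come from the SPARQL slide; the namespace `http://example.org/beer#` and the identifiers `MtCarmelBrewingCo` and `MtCarmelPaleAle` are assumptions, since the deck's actual namespace and resource names did not survive in this transcript.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
BEER_NS = "http://example.org/beer#"  # hypothetical namespace, not from the slides

# Each entry: (subject, predicate, object, object_is_resource).
# Resource identifiers are hypothetical; predicates follow the SPARQL slide.
TRIPLES = [
    ("Mark", "first_name", "Mark", False),
    ("Mark", "owner_of", "MtCarmelBrewingCo", True),
    ("Mark", "brews", "MtCarmelPaleAle", True),
    ("MtCarmelBrewingCo", "hasName", "Mt. Carmel Brewing Company", False),
    ("MtCarmelPaleAle", "isOfType", "PaleAle", True),
]


def to_rdf_xml(triples):
    """Serialize triples as RDF/XML, one rdf:Description per subject."""
    ET.register_namespace("rdf", RDF_NS)
    ET.register_namespace("beer", BEER_NS)
    root = ET.Element(f"{{{RDF_NS}}}RDF")
    nodes = {}
    for subj, pred, obj, is_resource in triples:
        if subj not in nodes:
            nodes[subj] = ET.SubElement(
                root,
                f"{{{RDF_NS}}}Description",
                {f"{{{RDF_NS}}}about": BEER_NS + subj},
            )
        prop = ET.SubElement(nodes[subj], f"{{{BEER_NS}}}{pred}")
        if is_resource:
            # The object is another resource: point at it by IRI.
            prop.set(f"{{{RDF_NS}}}resource", BEER_NS + obj)
        else:
            # The object is a literal value: store it as element text.
            prop.text = obj
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(to_rdf_xml(TRIPLES))
```

Grouping properties under one `rdf:Description` per subject is just one of several equivalent RDF/XML layouts; the slide's own markup may have been arranged differently.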
  10. Option 1: Virtualization. New source: hours to a week. Existing source: hours to days.
  11. Ontop
     • Mapping layer between SQL and SPARQL
     • Integrates with many tools (Protégé, Sesame, etc.)
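To make the virtualization idea concrete, here is a toy Python sketch in which triples are produced on the fly from relational rows instead of being copied into a triple store. It does not use Ontop or its mapping syntax; the tables, columns, namespace, and the `MAPPINGS` structure are all hypothetical and exist only to show the shape of a SQL-to-triple mapping layer.

```python
import sqlite3

BEER = "http://example.org/beer#"  # hypothetical namespace

# Hypothetical mappings: (SQL query, subject template, predicate, object column).
MAPPINGS = [
    ("SELECT id, name FROM brewery", "brewery/{id}", "hasName", "name"),
    ("SELECT id, first_name FROM person", "person/{id}", "first_name", "first_name"),
]


def demo_connection():
    """Build a small in-memory source database (made-up schema and rows)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE brewery (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE person (id INTEGER PRIMARY KEY, first_name TEXT);
        INSERT INTO brewery VALUES (1, 'Mt. Carmel Brewing Company');
        INSERT INTO person VALUES (1, 'Mark');
        """
    )
    return conn


def virtual_triples(conn, mappings):
    """Yield (subject, predicate, object) triples straight from SQL rows.

    Nothing is materialized: each call re-reads the source tables, which is
    the essence of the virtualized approach.
    """
    conn.row_factory = sqlite3.Row
    for sql, subject_template, predicate, object_column in mappings:
        for row in conn.execute(sql):
            subject = BEER + subject_template.format(id=row["id"])
            yield (subject, BEER + predicate, row[object_column])


if __name__ == "__main__":
    for triple in virtual_triples(demo_connection(), MAPPINGS):
        print(triple)
```

Because the triples are derived at query time, a new source only needs new mapping entries, which is consistent with the shorter turnaround the slide gives for this option.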
  12. Option 2: Lift and Format. New source: days to weeks. Existing source: hours to days.
  13. SPARQL
     Query 1:
     PREFIX beer:
     SELECT ?breweryName WHERE {
       ?brewery beer:hasName ?breweryName .
       ?person beer:owner_of ?brewery .
       ?person beer:first_name "Mark" .
     }
     Query 2:
     PREFIX beer:
     SELECT ?beertype WHERE {
       ?beer beer:isOfType ?beertype .
       ?person beer:brews ?beer .
       ?person beer:first_name "Mark" .
     }
     RDF/XML fragments:
     <beer:isOfType rdf:resource="beer:PaleAle"/>
     <beer:isOfType rdf:resource="beer:Lager"/>
     <beer:hasName>Mt. Carmel Brewing Company</beer:hasName>
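The brewery query on the SPARQL slide is a basic graph pattern: several triple patterns that must all match with consistent variable bindings. The Python sketch below is a minimal, hand-rolled matcher that shows how such a join works; it is not a SPARQL engine. The predicates come from the slide, while the resource identifiers (`person1`, `brewery2`, and so on) and the second person and brewery are made up for illustration.

```python
# Sample data: predicates from the slide, identifiers and extra rows hypothetical.
TRIPLES = [
    ("brewery1", "beer:hasName", "Mt. Carmel Brewing Company"),
    ("person1", "beer:owner_of", "brewery1"),
    ("person1", "beer:first_name", "Mark"),
    ("brewery2", "beer:hasName", "Other Brewery"),
    ("person2", "beer:owner_of", "brewery2"),
    ("person2", "beer:first_name", "Ann"),
]

# The brewery query as triple patterns; terms starting with "?" are variables.
BREWERY_QUERY = [
    ("?brewery", "beer:hasName", "?breweryName"),
    ("?person", "beer:owner_of", "?brewery"),
    ("?person", "beer:first_name", "Mark"),
]


def match(patterns, triples, binding=None):
    """Yield every variable binding that satisfies all patterns."""
    binding = binding or {}
    if not patterns:
        yield dict(binding)
        return
    first, rest = patterns[0], patterns[1:]
    for triple in triples:
        candidate = dict(binding)
        consistent = True
        for term, value in zip(first, triple):
            if term.startswith("?"):
                # Bind the variable, or check it against an earlier binding.
                if candidate.setdefault(term, value) != value:
                    consistent = False
                    break
            elif term != value:
                # Constant terms must match the triple exactly.
                consistent = False
                break
        if consistent:
            yield from match(rest, triples, candidate)


if __name__ == "__main__":
    for result in match(BREWERY_QUERY, TRIPLES):
        print(result["?breweryName"])
```

The shared variables `?brewery` and `?person` are what tie the three patterns together, so only the brewery owned by a person named Mark is returned.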
  14. Case Study: Linked Data at Syngenta
  15. Syngenta: Syngenta is a leading agriculture company helping to improve global food security by enabling millions of farmers to make better use of available resources. We have two primary lines of business: Seeds and Agricultural Chemicals. We have a huge commitment to internal R&D, and that is where our linked data initiatives are.
  16. Linked Data at Syngenta
     • Concept Store: enables Syngenta applications to consume and publish linked data controlled vocabulary (reference terms and relationships)
     • ENVision Tool: enables trial placements and weightings that best represent target markets
     • MINT Data: makes genetic identity and inventory data available for discovery, analysis, and R&D-driven proofs of concept
  17. What we accomplished
     In a 3-day hackathon we:
     • Mapped about 60% of MINT's model from 2 databases to RDF
     • Built a virtualized RDF triple store
     • Created a data-discovery / browsing user interface
  18. MINT Data [architecture diagram; labeled components: MINT Browser, Semantic Wiki, SPARQL, RDF Repository Broker, Open-Sesame, Repository Configuration (Identity, Material), MINT Ontology (Identity, Material), Ontology & Mapping Designer, Ontologist, RDBMS-RDF Mapper with R2RML Mapping (Material) over JDBC to the MINT Material RDBMS, RDBMS-RDF Mapper with R2RML Mapping (Identity) over JDBC to the MINT Identity RDBMS]
  19. MINT Class Model
     • The MINT ontology was created within Protégé as shown here [screenshot]
  20. MINT Virtualization Mapping
  21. MINT Virtualization Mapping
  22. Next Steps
     • Moving from the virtualized layer to a physical triple store implementation
     • Partnering with our benefits tracking team to get accurate metrics on MINT adoption and value
     • Linking to additional data sources to provide dashboard KPIs and analytics for our R&D seeds pipeline
  23. THANK YOU!
  24. About Alex…
     • Principal Consultant, CSpring
     • Twitter, GitHub as OpenDataAlex
     • Alex has spent the last ten years working in various industries to help businesses unlock the information hidden in their data sets. He specializes in open source business intelligence solutions from data warehousing to dashboards, analytics, and beyond. His latest area of research has been on linked data (also known as triple stores). Alex has a Master's in Business Intelligence from Saint Joseph's University in Pennsylvania and a Bachelor's in Business Administration from Chowan University in North Carolina.
  25. About Bubba…
     • Team Leader, R&D IS, Syngenta
     • I've held roles as a software engineer, architect, and manager across multiple industries. For the last 13 years I've worked in the life sciences industry supporting Research & Development. I'm currently the program architect / technical lead for a standardization program within Syngenta bringing Track & Trace compliance to R&D's material operations. Many of Syngenta's R&D product decisions for our Seeds line of business are founded on data associated with plant material identity. I have a bachelor's degree in Computer Science from Rose-Hulman Institute of Technology.