
Semantic Web Standards and the Variety “V” of Big Data

TopQuadrant presentation by Bob DuCharme, given in the dual NoSQL and Semantic Technology & Business track in San Jose on August 20, 2014.

Published in: Data & Analytics

  1. 1. © Copyright 2014 TopQuadrant Inc. Slide 1 Semantic Web standards and the Variety “V” of Big Data Bob DuCharme August 20, 2014
  2. Three Vs of Big Data
       • Volume
       • Velocity
       • Variety
  3. Gartner, September 2013
  4. Which dimensions did people struggle with the most?
       • Volume: 35%
       • Velocity: 16%
       • Variety: 49%
  5. Why is variety hard?
       • Diagram labels: Furniture Inventory, Protein Database, ?, Customer Database, Conference Attendees?
       • Customer Database fields: Surname, GivenName, LastPurchase, ZipCode, Email
       • Conference Attendees fields: last_name, first_name, is_speaker, postal_code, email
  6. Schemas
       Good thing:
       • Ensure data quality
       • Make query writing* easier
       • Add efficiency
       *And essentially, all application development
       Annoying thing:
       • Can’t add property values someone didn’t see coming
       • Changing schema (and data with it) slow and expensive
       • Often tied too closely to specific implementation
       Inflexibility × 3.
  7. Schemaless NoSQL databases
       • Can’t add property values someone didn’t see coming?
       • Changing schema (and data with it) slow and expensive?
       • Often tied too closely to specific implementation?
  8. Schemaless: how do applications know what properties are available?
       • By any means necessary
       • Documentation
       • Query for properties that got used
       • App possibly written by same person or team
       • Responsibility shifted from database (designer) to application (designer)
  9. Schema: all or nothing?
       • Diagram labels: Customer Database, Conference Attendees?
       • Customer Database fields: Surname, GivenName, LastPurchase, ZipCode, Email
       • Conference Attendees fields: last_name, first_name, is_speaker, postal_code, email
       • ETL (Extract-Transform-Load)?
  10. RDF Schema (RDFS)
       • W3C Standard since 2004
       • Often overshadowed by superset standard OWL
       • Describes RDF, written using RDF syntaxes
       • Diagram labels: Semantic Web, Linked Data
  11. RDF
       • www.w3.org/RDF (second sentence!): “RDF has features that facilitate data merging even if the underlying schemas differ, and it specifically supports the evolution of schemas over time without requiring all the data consumers to be changed.”
  12. Sample schema (Customer Database, Conference Attendees)
        @prefix cust: <http://companyX.com/ns/customer#> .
        @prefix ca:   <http://companyY.com/ns/confAttendees#> .
        @prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
        cust:Surname   a rdf:Property .   # or: cust:Surname rdf:type rdf:Property .
        cust:GivenName a rdf:Property .
        cust:ZipCode   a rdf:Property .
        cust:Email     a rdf:Property .
        ca:last_name   a rdf:Property .
        ca:first_name  a rdf:Property .
        ca:postal_code a rdf:Property .
        ca:email       a rdf:Property .
        # LastPurchase and is_speaker: don't care (for now)!
  13. Relating properties
        # assuming prefix declarations from previous slide
        @prefix schema: <http://schema.org/> .
        @prefix rdfs:   <http://www.w3.org/2000/01/rdf-schema#> .
        cust:Surname   rdfs:subPropertyOf schema:familyName .
        ca:last_name   rdfs:subPropertyOf schema:familyName .
        cust:GivenName rdfs:subPropertyOf schema:givenName .
        ca:first_name  rdfs:subPropertyOf schema:givenName .
        cust:Email     rdfs:subPropertyOf schema:email .
        ca:email       rdfs:subPropertyOf schema:email .
        cust:ZipCode   rdfs:subPropertyOf schema:postalCode .
        ca:postal_code rdfs:subPropertyOf schema:postalCode .
  14. Using the combined data
        # SPARQL query: where should we open
        # a government relations office?
        SELECT ?postalCode WHERE {
          ?person schema:email ?email .
          FILTER(STRENDS(?email, ".gov"))
          ?person schema:postalCode ?postalCode .
        }
  15. Middleware to treat RDBMS as RDF
       • Application → Mapping Middleware (e.g. D2R, Ultrawrap): SPARQL query
       • Mapping Middleware → Customers: SQL query
       • Customers → Mapping Middleware: relational results
       • Mapping Middleware → Application: SPARQL query results
  16. Middleware to treat RDBMS as RDF
       • Application → Mapping Middleware (e.g. D2R, Ultrawrap): SPARQL query
       • Mapping Middleware → Customers: SQL query
       • Customers → Mapping Middleware: relational results
       • Mapping Middleware → Conference Attendees: SQL query
       • Conference Attendees → Mapping Middleware: relational results
       • Mapping Middleware → Application: SPARQL query results
       • Also shown: Schema metadata triplestore
  17. Further enhancement
        ex:Person a rdfs:Class .
        schema:familyName rdfs:domain ex:Person .
        schema:givenName  rdfs:domain ex:Person .
        schema:email      rdfs:domain ex:Person .
        schema:postalCode rdfs:domain ex:Person .
        schema:postalCode rdfs:label "postal code" .
        schema:postalCode rdfs:comment "Zip code in the USA, postcode in the UK." .
  18. Adding more with OWL
       Equipment:
         equipment code | room
         X1703          | main kitchen
         Z0439          | cold storage
       Room addresses:
         room         | building
         main kitchen | 98 Main St.
         cold storage | 14 Broad St.
        eq:room         rdfs:subPropertyOf ex:locatedIn .
        rmaddr:building rdfs:subPropertyOf ex:locatedIn .
        ex:locatedIn a owl:TransitiveProperty .
        rmaddr:98MainSt a ex:Building .
        eq:X1703 eq:room eq:mainKitchen .
        eq:mainKitchen rmaddr:building rmaddr:98MainSt .
  19. Query for which building
        # SPARQL query: what building is
        # equipment piece X1703 in?
        SELECT ?building WHERE {
          ?building a ex:Building .
          eq:X1703 ex:locatedIn ?building .
        }
       • Diagram arrow labels: located in, located in
  20. A little more OWL
        schema:email a owl:InverseFunctionalProperty .
        ex:cust401 cust:GivenName "James" .
        ex:cust401 cust:Surname "Smith" .
        ex:cust401 cust:Email "jsmith@somecompany.com" .
        ex:ca04395 ca:first_name "Jim" .
        ex:ca04395 ca:last_name "Smith" .
        ex:ca04395 ca:email "jsmith@somecompany.com" .
        # the shared email value lets a reasoner conclude:
        ex:cust401 owl:sameAs ex:ca04395 .
  21. What OWL adds to RDFS
       • RDFS gives you properties to describe your properties, classes, and instances (i.e. your resources)
       • OWL gives you:
           • More properties to describe your resources
           • Classes that you can use to describe resources
           • The ability to define your own classes that you can use to describe resources
  22. Middleware to treat RDBMS as RDF
       • Application → Mapping Middleware (e.g. D2R, Ultrawrap): SPARQL query
       • Mapping Middleware → Customers: SQL query
       • Customers → Mapping Middleware: relational results
       • Mapping Middleware → Conference Attendees: SQL query
       • Conference Attendees → Mapping Middleware: relational results
       • Mapping Middleware → Application: SPARQL query results
       • Also shown: Schema metadata triplestore
  23. Descriptive vs. Prescriptive schemas
       • Not rules to follow
           – e.g. “Employee must have a first and last name!”
           – Other ways to implement constraints
       • Machine-readable guides to what you’ve got to work with
           – Data types
           – Relationships to other resources and classes of resources
       • Metadata!
  24. Whose schemas?
       • Your own schemas can describe what you need from the data you’re using
       • Standardized schemas (e.g. schema.org, GoodRelations) can tie together your data with data from other sources
       • Tie together your custom schemas with (subsets that you’re interested in of) standardized schemas
       • Tie together (subsets that you’re interested in of) different data sets from different sources
  25. Top-down or bottom-up schema development?
       • Whichever you like
       • I like bottom-up
           – (Hey Cyc project: good luck with that!)
       • Lots of data to deal with?
           – Model just enough to drive a simple, proof-of-concept application
           – Build the model (schema) a little at a time, then add more to your application
           – Connect that model to models of (subsets of) other data sets
  26. Who is doing this now?
       • Pharma
       • Oil and gas
       • Publishing
  27. TopQuadrant Products and Solutions
       • Solutions: Asset Management; Search / Content Enrichment; Compose your own; Master Data Management; Information Discovery for Life Sciences; Information Exchange
       • TopBraid Platform: Solution Engine, IDE
       • TopQuadrant offers configurable, out-of-the-box solutions enabling organizations to evolve their information infrastructure into a semantic ecosystem
  28. TopBraid Insight™ (TBI): Connect the dots for new insights. Ease Big Data Variety
       • Dynamic Interactive Exploration - Search, Query, Filter, Browse, Navigate, Visualize, Share
       • Logical Data Warehouse - Flexible, Adaptive Information Structuring
  29. 29. © Copyright 2013 TopQuadrant Inc. Slide 29
  30. TopBraid Insight: Connects the Dots
       • Tames Big Data to empower businesses
       • Offers on-demand integrated access to diverse data, making it possible to discover information just in time
       • Delivers new levels of creativity and infrastructure flexibility
  31. Photo credits
       • Volume: (CC BY-NC 2.0) Fabrizio Monti https://www.flickr.com/photos/delphaber/3514894189
       • Velocity: (CC BY 2.0) Gabriel https://www.flickr.com/photos/cod_gabriel/1332225362
       • Variety: (CC BY-NC-SA 2.0) IRRI Photos https://www.flickr.com/photos/ricephotos/4753359957
  32. “A wonderful harmony is created when we join together the seemingly unconnected.” - Heraclitus
       Bob DuCharme, bducharme@topquadrant.com
       Thank you!
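
The property mappings on slide 13 and the query on slide 14 only work together if something applies the rdfs:subPropertyOf statements. The following is a minimal Python sketch of that idea, not the middleware or triplestore the slides refer to: it handles a single level of subPropertyOf over in-memory tuples, and the sample rows (ex:cust1, ex:att7, ex:att8 and their values) are hypothetical data invented for illustration.

```python
# Property mappings taken from slide 13 (child property -> schema.org property).
SUB_PROPERTY_OF = {
    "cust:Surname": "schema:familyName",
    "ca:last_name": "schema:familyName",
    "cust:GivenName": "schema:givenName",
    "ca:first_name": "schema:givenName",
    "cust:Email": "schema:email",
    "ca:email": "schema:email",
    "cust:ZipCode": "schema:postalCode",
    "ca:postal_code": "schema:postalCode",
}


def with_inferred(triples):
    """Return the input triples plus one inferred triple per mapped property."""
    result = set(triples)
    for s, p, o in triples:
        parent = SUB_PROPERTY_OF.get(p)
        if parent is not None:
            result.add((s, parent, o))
    return result


def gov_postal_codes(triples):
    """Mirror the slide 14 query: postal codes of people with a .gov email."""
    closed = with_inferred(triples)
    gov_people = {s for s, p, o in closed
                  if p == "schema:email" and o.endswith(".gov")}
    return sorted(o for s, p, o in closed
                  if p == "schema:postalCode" and s in gov_people)


# Hypothetical rows from the two source databases.
DATA = [
    ("ex:cust1", "cust:Email", "pat@agency.gov"),
    ("ex:cust1", "cust:ZipCode", "20500"),
    ("ex:cust1", "cust:LastPurchase", "2014-07-01"),  # unmapped, left as-is
    ("ex:att7", "ca:email", "lee@bureau.gov"),
    ("ex:att7", "ca:postal_code", "95113"),
    ("ex:att8", "ca:email", "sam@example.com"),
    ("ex:att8", "ca:postal_code", "10001"),
]

print(gov_postal_codes(DATA))  # ['20500', '95113']
```

Unmapped properties such as cust:LastPurchase pass through untouched, which matches the "don't care (for now)" comment on slide 12.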
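
The building query on slide 19 has no direct match in the slide 18 data; the answer appears only after combining the two subPropertyOf statements with the transitivity of ex:locatedIn. A rough Python sketch of that reasoning, assuming triples are held as plain tuples and using a simple fixpoint loop rather than any particular OWL reasoner:

```python
# Triples from slide 18, written as (subject, predicate, object) tuples.
TRIPLES = [
    ("eq:X1703", "eq:room", "eq:mainKitchen"),
    ("eq:mainKitchen", "rmaddr:building", "rmaddr:98MainSt"),
    ("rmaddr:98MainSt", "rdf:type", "ex:Building"),
]

# Properties declared as rdfs:subPropertyOf ex:locatedIn on slide 18.
SUB_PROPERTIES_OF_LOCATED_IN = {"eq:room", "rmaddr:building"}


def located_in_pairs(triples):
    """Return every (thing, place) pair implied for ex:locatedIn."""
    # Step 1: treat each sub-property statement as an ex:locatedIn statement.
    pairs = {(s, o) for s, p, o in triples
             if p == "ex:locatedIn" or p in SUB_PROPERTIES_OF_LOCATED_IN}
    # Step 2: transitive closure, repeated until no new pair appears.
    changed = True
    while changed:
        changed = False
        for a, b in list(pairs):
            for c, d in list(pairs):
                if b == c and (a, d) not in pairs:
                    pairs.add((a, d))
                    changed = True
    return pairs


def buildings_containing(item, triples):
    """Mirror the slide 19 query for one piece of equipment."""
    buildings = {s for s, p, o in triples
                 if p == "rdf:type" and o == "ex:Building"}
    return sorted(o for s, o in located_in_pairs(triples)
                  if s == item and o in buildings)


print(buildings_containing("eq:X1703", TRIPLES))  # ['rmaddr:98MainSt']
```

The nested loop is quadratic per pass, which is fine for a handful of triples; a real store would do this inference with indexed joins or at query time.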

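Slide 20 relies on schema:email being inverse functional: two resources that share an email value are taken to be the same resource. The sketch below shows one way that conclusion could be computed in plain Python. It assumes the slide 13 mappings from cust:Email and ca:email to schema:email, and the function name same_as_pairs is hypothetical, not part of any library.

```python
from collections import defaultdict
from itertools import combinations

# Data triples from slide 20.
TRIPLES = [
    ("ex:cust401", "cust:GivenName", "James"),
    ("ex:cust401", "cust:Surname", "Smith"),
    ("ex:cust401", "cust:Email", "jsmith@somecompany.com"),
    ("ex:ca04395", "ca:first_name", "Jim"),
    ("ex:ca04395", "ca:last_name", "Smith"),
    ("ex:ca04395", "ca:email", "jsmith@somecompany.com"),
]

# Mappings from slide 13 that feed the inverse functional property.
SUB_PROPERTY_OF = {
    "cust:Email": "schema:email",
    "ca:email": "schema:email",
}


def same_as_pairs(triples, inverse_functional="schema:email"):
    """Pair up subjects that share a value for the inverse functional property."""
    subjects_by_value = defaultdict(set)
    for s, p, o in triples:
        if p == inverse_functional or SUB_PROPERTY_OF.get(p) == inverse_functional:
            subjects_by_value[o].add(s)
    pairs = set()
    for subjects in subjects_by_value.values():
        # Sorting makes the order inside each pair deterministic.
        pairs.update(combinations(sorted(subjects), 2))
    return pairs


print(same_as_pairs(TRIPLES))  # {('ex:ca04395', 'ex:cust401')}
```

Differing given names ("James" and "Jim") do not block the match, which is the point of the slide: identity comes from the shared email value alone.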