
Stephen Buxton | Data Integration - a Multi-Model Approach - Documents and Triples


http://2016.semantics.cc/stephen-buxton-0



  1. When to Use Documents vs Triples
     Stephen Buxton, Senior Director, Product Management, MarkLogic
     © COPYRIGHT 2016 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED.
  2. [Diagram: NoSQL database families: key-value, column, document, and graph (property graphs, triple stores)]
  3. A Database That Integrates Data Better, Faster, with Less Cost
     [Diagram: NoSQL database families: key-value, column, document, and graph (property graphs, triple stores)]
  4. Leading Organizations Using MarkLogic Semantics
     • Intelligent Search
     • Semantic Metadata Hub
     • Dynamic Semantic Publishing
     • Recommendation Engines
     • Compliance
     [Customer examples shown: an entertainment company and a pharmaceutical company]
  5. Relational Databases
     PROs
     • Natural way to model strictly tabular data
     • Mature technology with a rich ecosystem
     CONs
     • Real-world entities require complex modeling up front
     • Brittle: changes require adding columns and tables
     • No inherent semantics
     Example table:
       Ph_ID  Cus_ID  Type    Number
       4001   2001    Home    555-6789
       4002   2001    Cell    555-7238
       4003   2002    Home    137-2859
       4004   2003    Home    189-2212
       4005   2003    Cell    199-2312
       4006   2003    Office  444-1898
       4007   2003    Main    199-2312
  6. Document Databases
     PROs
     • Natural way to model entities
     • Schema is flexible within and across documents
     • Self-describing
     • Query and search immediately
     • Handles hierarchical data
     • Handles repeating elements
     • Handles sparse data
     • Joins can be denormalized away
     Example document:
       {
         "ID": 1001,
         "Fname": "Paul",
         "Lname": "Jackson",
         "Phone": "415-555-1212",
         "SSN": "123-45-6789",
         "Addr": "123 Avenue Road",
         "City": "San Francisco",
         "State": "CA",
         "Zip": 94111
       }
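"Joins can be denormalized away" can be made concrete with the Phone table from the Relational Databases slide. The sketch below is an illustration, not MarkLogic code: it folds the child rows into one document per customer, turning the one-to-many join into a repeating "Phones" array. Only the four table columns are used; the document shape and function name are assumptions.

```python
import json
from collections import defaultdict

# Rows of the Phone table from the Relational Databases slide:
# (Ph_ID, Cus_ID, Type, Number)
PHONE_ROWS = [
    (4001, 2001, "Home", "555-6789"),
    (4002, 2001, "Cell", "555-7238"),
    (4003, 2002, "Home", "137-2859"),
    (4004, 2003, "Home", "189-2212"),
    (4005, 2003, "Cell", "199-2312"),
    (4006, 2003, "Office", "444-1898"),
    (4007, 2003, "Main", "199-2312"),
]


def denormalize_phones(rows):
    """Fold the child Phone rows into one document per customer.

    The one-to-many join (customer -> phones) becomes a repeating
    "Phones" array inside each customer document, so no join is
    needed at query time.
    """
    phones_by_customer = defaultdict(list)
    for _ph_id, cus_id, ph_type, number in rows:
        phones_by_customer[cus_id].append({"Type": ph_type, "Number": number})
    return {
        cus_id: {"ID": cus_id, "Phones": phones}
        for cus_id, phones in phones_by_customer.items()
    }


if __name__ == "__main__":
    docs = denormalize_phones(PHONE_ROWS)
    print(json.dumps(docs[2003], indent=2))
```

Customer 2003 ends up with a single document holding four phone entries, which is the "handles repeating elements" point from the slide.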
  7. Graph Databases – Triple Stores
     PROs
     • A triple defines a relationship:
       – Entity -> Entity
       – Entity -> Concept
       – Concept -> Value
     • Triples come together to form graphs
     • Graphs can be easily shared and combined
     • Graphs can be traversed
     • Can infer new triples using definitions (rules)
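The three graph properties on this slide (triples form a graph, graphs can be traversed, rules infer new triples) fit in a few lines of plain Python. This is a toy in-memory sketch, not a triple store. The facts are hypothetical, loosely borrowed from node names on later slides, and the single rule ("if ?x accesses ?y and ?y requires ?z, then ?x requires ?z") is an illustrative assumption.

```python
from collections import deque

# A triple is (subject, predicate, object); a graph is simply a set of triples.
# These facts are hypothetical, loosely borrowed from names on later slides.
GRAPH = {
    ("User1", "runs", "App1"),
    ("App1", "runsOn", "Cluster1"),
    ("App1", "accesses", "Database1"),
    ("Database1", "requires", "TopSecret"),
}


def reachable(graph, start):
    """Traverse outgoing edges breadth-first; return every node reachable from start."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for s, _p, o in graph:
            if s == node and o not in seen:
                seen.add(o)
                queue.append(o)
    seen.discard(start)
    return seen


def infer(graph):
    """Forward-chain one illustrative rule until no new triples appear:
    if ?x accesses ?y and ?y requires ?z, then ?x requires ?z.
    """
    inferred = set(graph)
    while True:
        new = {
            (x, "requires", z)
            for x, p1, y in inferred if p1 == "accesses"
            for y2, p2, z in inferred if p2 == "requires" and y2 == y
        }
        if new <= inferred:
            return inferred
        inferred |= new
```

Here `infer(GRAPH)` adds the fact that App1 requires TopSecret, which no one stated directly. A production triple store would express such rules declaratively rather than in application code.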
  8. Hybrid Documents + Triple Store
     PROs
     • PROs of a document store
     • PROs of a triple store
     • Combination: documents with semantic context
       – Define the semantics of your data
       – Richer search through context and facts
     • Combination: triples with document context
       – Arbitrary annotation of triples
       – Metadata, provenance, temporal, etc.
     • Rich queries over rich data
     • Fast, iterative development
     • Query through a SQL lens where appropriate
     [Figure: the Paul Jackson customer document from the Document Databases slide]
  9. Sidetrack – Documents and Data
     [Figure: two tree diagrams. Article: Title, Date, Abstract, Body, Sections, Paragraphs. Trade: Type, Date, Parties, Seller, Buyer, Channel, Amount, PaidBy, Affiliation, Name]
  10. TRIPLES AND DOCUMENTS
  11. Triples Alongside Documents
     [Figure: graph with nodes User1, Senior Manager, Geneva, Compliance Officer, High risk person, App1, Cluster1, TopSecret and Database1, connected by edges labelled rank, basedIn, role, runs, runsOn, accesses and requires]
  12. Triples Alongside Documents
     • Show me documents that mention App1 (or its dependencies)
       – … and "trades" or "markets"
       – … that were valid yesterday afternoon
       – … that were produced near HQ
       – see Intelligent Search, Infobox
     • Show me instructions to access App1
       – App1 user guide
       – How to get TopSecret access
       – Scope of Database1
       – see Dynamic Semantic Publishing
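The first query on this slide combines a graph step (expand App1 to its dependencies) with a text step (keep documents that mention one of those names plus "trades" or "markets"). A minimal sketch follows, assuming that `runsOn` and `accesses` mean "depends on" and using a hypothetical four-document corpus. Plain word matching stands in for a real search index, and the temporal and geospatial constraints are left out.

```python
import re

# Hypothetical facts: which entities App1 depends on.
TRIPLES = {
    ("User1", "runs", "App1"),
    ("App1", "runsOn", "Cluster1"),
    ("App1", "accesses", "Database1"),
}
# Assumption: these predicates mean "depends on".
DEPENDENCY_PREDICATES = {"runsOn", "accesses"}

# Hypothetical documents, keyed by URI.
DOCS = {
    "/memo/1.txt": "Cluster1 outage delayed trades on Tuesday.",
    "/memo/2.txt": "Database1 schema change; no impact on markets expected.",
    "/memo/3.txt": "App1 release notes.",
    "/memo/4.txt": "Quarterly trades summary for the desk.",
}


def dependencies(triples, entity):
    """Entities reachable from `entity` through dependency predicates (transitively)."""
    found, frontier = set(), {entity}
    while frontier:
        step = {o for s, p, o in triples if s in frontier and p in DEPENDENCY_PREDICATES}
        frontier = step - found
        found |= step
    return found


def search(docs, triples, entity, any_of_terms):
    """Documents that mention the entity (or a dependency) AND at least one term."""
    names = {n.lower() for n in {entity} | dependencies(triples, entity)}
    terms = {t.lower() for t in any_of_terms}
    hits = []
    for uri, text in docs.items():
        words = set(re.findall(r"\w+", text.lower()))
        if words & names and words & terms:
            hits.append(uri)
    return sorted(hits)
```

`search(DOCS, TRIPLES, "App1", ["trades", "markets"])` finds the two memos that never mention App1 by name but do mention its dependencies. A keyword-only search on "App1" would miss both.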
  13. Documents as Part of the Graph
     [Figure: the graph of User1, Senior Manager, Geneva, Compliance Officer, High risk person, App1, Cluster1, TopSecret and Database1, extended with document nodes labelled deep dive, license, user guide, tutorialMovie and order]
  14. Documents as Part of the Graph
     • Document as opaque object
       – Show me all the instructional documents related to App1
     • Search inside the document
       – Show me all the applications that managers use that expire in the next 6 months
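The two bullets on this slide differ in whether the document content is read. Below is a sketch under stated assumptions: documents are graph nodes identified by URI, and the predicates `hasDocument` and `docType` are hypothetical, as are the set of "instructional" types and the license's `expires` field. The first query never opens a document. The second has to look inside the license to find its expiry date.

```python
from datetime import date, timedelta

# Hypothetical graph in which documents are nodes, identified by URI.
TRIPLES = {
    ("User1", "rank", "Senior Manager"),
    ("User1", "runs", "App1"),
    ("App1", "hasDocument", "/docs/app1-user-guide.html"),
    ("App1", "hasDocument", "/docs/app1-tutorial.mp4"),
    ("App1", "hasDocument", "/docs/app1-license.json"),
    ("/docs/app1-user-guide.html", "docType", "UserGuide"),
    ("/docs/app1-tutorial.mp4", "docType", "TutorialMovie"),
    ("/docs/app1-license.json", "docType", "License"),
}
# Assumption: which document types count as "instructional".
INSTRUCTIONAL = {"UserGuide", "TutorialMovie", "DeepDive"}

# Document contents. Only the license is read, for its expiry date.
DOCS = {"/docs/app1-license.json": {"vendor": "Acme", "expires": "2016-12-31"}}


def docs_of(triples, app):
    return {o for s, p, o in triples if s == app and p == "hasDocument"}


def instructional_docs(triples, app):
    """Document as opaque object: answered from the graph alone."""
    return sorted(
        d for d in docs_of(triples, app)
        if any((d, "docType", t) in triples for t in INSTRUCTIONAL)
    )


def apps_expiring_for_managers(triples, docs, today, days=182):
    """Search inside the document: the expiry date lives in the license content."""
    horizon = today + timedelta(days=days)  # roughly six months
    managers = {s for s, p, o in triples if p == "rank" and o.endswith("Manager")}
    apps = {o for s, p, o in triples if s in managers and p == "runs"}
    expiring = set()
    for app in apps:
        for d in docs_of(triples, app):
            if (d, "docType", "License") in triples:
                expires = date.fromisoformat(docs[d]["expires"])
                if today <= expires <= horizon:
                    expiring.add(app)
    return sorted(expiring)
```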
  15. Triples About Documents – Extended Metadata
     [Figure: the User1 / App1 / Cluster1 / Database1 / TopSecret graph, with an order document carrying metadata edges: format JSON, language English, jurisdiction Delaware, expires 2016-12-31, plus a Ts and Cs node]
  16. Triples About Documents – Extended Metadata
     • Triples are a natural way to represent metadata about documents
     • "Extended" because that metadata is part of the graph
     • Example: show me all orders for a TopSecret app that will expire soon
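The example query works because the order's metadata and the app's clearance live in the same graph. The sketch below is illustrative. The metadata values for `/orders/o-17.json` follow the Extended Metadata figure. The `forApp` predicate, the document URIs, the second order, the `App1 requires TopSecret` edge and the 90-day "soon" window are all assumptions. In a real triple store this would be a single SPARQL query.

```python
from datetime import date, timedelta

# Hypothetical metadata triples about two order documents. Values for
# /orders/o-17.json follow the Extended Metadata figure; the forApp
# predicate, the second order and the requires edge are assumptions.
TRIPLES = {
    ("/orders/o-17.json", "forApp", "App1"),
    ("/orders/o-17.json", "format", "JSON"),
    ("/orders/o-17.json", "language", "English"),
    ("/orders/o-17.json", "jurisdiction", "Delaware"),
    ("/orders/o-17.json", "expires", "2016-12-31"),
    ("/orders/o-42.json", "forApp", "App2"),
    ("/orders/o-42.json", "expires", "2016-11-30"),
    ("App1", "requires", "TopSecret"),
}


def value(triples, subject, predicate):
    """First object for (subject, predicate), or None if absent."""
    return next((o for s, p, o in triples if s == subject and p == predicate), None)


def expiring_topsecret_orders(triples, today, within_days=90):
    """Orders whose app requires TopSecret and whose expiry falls within the window.

    Both the metadata and the app's clearance are graph facts, so this is a
    single join over triples; the order documents themselves are never opened.
    """
    horizon = today + timedelta(days=within_days)
    hits = []
    for doc, p, app in triples:
        if p != "forApp" or (app, "requires", "TopSecret") not in triples:
            continue
        expires = value(triples, doc, "expires")
        if expires and today <= date.fromisoformat(expires) <= horizon:
            hits.append(doc)
    return sorted(hits)
```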
  17. Triples About Documents
     • Data integration: dirty data
       – Show me license documents from vendor Acme
     • Data integration: overlapping data
       – Show me all assets from vendor Acme
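One way to read these two integration cases is sketched below, under assumptions. Source documents spell the vendor differently ("Acme", "ACME Corp.", "Acme Inc"), and two sources name the field differently (`vendor` vs `supplier`). Triples in the spirit of `owl:sameAs` reconcile the spellings without rewriting the source documents. The documents, field names and `sameAs` string predicate are hypothetical, and aliases are followed one hop only.

```python
# Hypothetical documents from two source systems. Vendor names are spelled
# inconsistently (dirty data) and the sources use different field names
# for the same idea (overlapping data).
DOCS = {
    "/licenses/l1.json": {"kind": "license", "vendor": "Acme"},
    "/licenses/l2.json": {"kind": "license", "vendor": "ACME Corp."},
    "/licenses/l3.json": {"kind": "license", "vendor": "Globex"},
    "/hardware/h1.json": {"kind": "server", "supplier": "Acme Inc"},
}

# Triples that reconcile spellings to one canonical vendor, in the spirit of
# owl:sameAs (here just a plain string predicate, applied one hop only).
TRIPLES = {
    ("ACME Corp.", "sameAs", "Acme"),
    ("Acme Inc", "sameAs", "Acme"),
}


def aliases(triples, canonical):
    """The canonical name plus every spelling linked to it by sameAs."""
    return {canonical} | {s for s, p, o in triples if p == "sameAs" and o == canonical}


def docs_from_vendor(docs, triples, vendor, kind=None):
    """Documents whose vendor (or supplier) is any alias of `vendor`."""
    names = aliases(triples, vendor)
    hits = []
    for uri, doc in docs.items():
        source_vendor = doc.get("vendor", doc.get("supplier"))
        if source_vendor in names and (kind is None or doc["kind"] == kind):
            hits.append(uri)
    return sorted(hits)
```

The design choice is that the clean-up lives in the graph. A new spelling is fixed by adding one triple, and the original documents stay as their source systems wrote them.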
  18. Triples as part of a document
     • Embed triples in a document
     • Triples and document have the same security, transactions, backup, temporality, …
     • Annotate triples in an entirely generic way (XML or JSON):
       – Provenance
       – Confidence
       – Bitemporal
     • Query across triples and documents in the same query:
       – SPARQL, restricting the result to some source, confidence range or bitemporal range
       – Search, restricting the result to documents that contain some facts or metadata
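Embedding triples means annotations are just extra fields next to each triple, and queries can filter on them. A minimal sketch follows, assuming a hypothetical JSON-style layout with `source` and `confidence` fields on each embedded triple. It is not a specific product's storage format. Bitemporal filtering would follow the same pattern and is omitted.

```python
# A hypothetical document that embeds triples alongside its own content.
# Each embedded triple is annotated generically with provenance and confidence.
# The layout is illustrative, not a specific product's storage format.
DOC = {
    "title": "Analyst note on App1",
    "body": "App1 appears to read from Database1.",
    "triples": [
        {"triple": {"subject": "App1", "predicate": "accesses", "object": "Database1"},
         "source": "analyst-note", "confidence": 0.9},
        {"triple": {"subject": "App1", "predicate": "accesses", "object": "Database2"},
         "source": "log-scan", "confidence": 0.4},
    ],
}


def facts(doc, min_confidence=0.0, source=None):
    """(s, p, o) for embedded triples that pass the annotation filters."""
    out = []
    for entry in doc.get("triples", []):
        if entry["confidence"] < min_confidence:
            continue
        if source is not None and entry["source"] != source:
            continue
        t = entry["triple"]
        out.append((t["subject"], t["predicate"], t["object"]))
    return out


def docs_containing_fact(docs, fact, min_confidence=0.0):
    """Search side: only documents that contain the given fact at the given confidence."""
    return sorted(uri for uri, d in docs.items() if fact in facts(d, min_confidence))
```

`facts` is the graph-side view (restrict by source or confidence range), and `docs_containing_fact` is the search-side view (restrict documents to those holding a fact). Both read the same stored object, so the triples share the document's lifecycle.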
  19. SUMMARY
  20. Use Triples when you want to …
     • Store and query hundreds of billions of facts and relationships
     • Explore a graph
     • Visualize a graph
     • Leverage standards: data + query
     • Infer new information
       – better insights
       – simpler data modeling
     • Semantics of data
       – integration
  21. Use Documents when you want to …
     • Easily store heterogeneous data (transactional data, records, free text)
     • Schema-agnostic
       – modeling freedom
       – integrate without ETL*
     • Search flexibility and specificity
     • Fast app development
  22. Document Store and Triple Store Combined
     All the benefits of each, plus:
     • Docs can contain triples, triples can annotate docs, graphs can contain docs
       – Faster data integration using semantics as the glue
       – Ideal model for reference data, metadata, provenance
       – Ability to run really powerful queries
     • Massive speed and scale
     • Simplicity of a single unified platform
     • Enterprise features (security, HA/DR, ACID transactions, …)
