A Graph is a Graph is a Graph: Equivalence, Transformation, and Composition of Graph Data Models

Presentation for Data Day Texas / Global Graph Summit 2019

  1. A Graph is a Graph is a Graph: Equivalence, Transformations, and Composition of Graph Data Models. Joshua Shinavier, PhD. Global Graph Summit 2019, 26.01.2019
  2. Outline: • Graphs • Categories • Data models • Transformations • Graphs @ Uber
  3. Graphs
  4. TinkerFactory.createClassic() + vertex labels
  5. Graph structure
  6. Categories
  7. An object
  8. An arrow (morphism)
  9. Composition
  10. Identity
  11. An isomorphism
  12. An isomorphism: $f^{-1} \circ f = \mathrm{id}_a$, $f \circ f^{-1} = \mathrm{id}_b$
  13. An isomorphism
  14. Composition of isomorphisms
  15. A category
  16. A functor
  17. Equivalence
  18. Graphs and operations as a category
  19. Equivalent models
  20. Data models
  21. Some data models of interest (@ Uber): • Property Graphs • Resource Description Framework (RDF) • Relational Model • SQL, CQL, HQL, DQL, ... • Data interchange formats • Protocol Buffers, Apache Thrift, Apache Avro • Mostly based on algebraic data types (ADTs)
  22. Resource Description Framework (RDF): • A simple interpretation¹ I of an RDF graph is: • A non-empty set IR of resources, called the domain or universe of I • A set IP, called the set of properties of I • A mapping IEXT from IP into 𝒫(IR × IR), i.e. the set of sets of pairs <x, y> with x and y in IR • Assuming: • Ground RDF graphs only (⇒ no blank nodes). ¹ World Wide Web Consortium. (2014). RDF 1.1 concepts and abstract syntax.
  23. Property Graph data model: • Simple • Vertices, edges, properties • No standard • W3C Property Graphs working group (2013) • W3C Graph Data Workshop (March 2019) • Next best thing? • TinkerPop’s Graph.Features
  24. TinkerPop’s Graph.Features: • Many of the features concern the data model • Vertex/edge id data types • Numeric, string, UUID, “any” • Property support • Supports vertex properties? edge properties? vertex meta-properties? • Multi-properties? • Property data types • Primitive: boolean, byte, double, float, integer, long, string • Complex: array, map, list (uniform/mixed), serializable
  25. Property graph objects: v = a vertex label (e.g. Person, Place, Dataset); e = an edge label (e.g. knows, likes, claims); p = a property key (e.g. name, weight); d = a data type (e.g. String, Integer, List<String>)
  26. Edges
  27. Vertex properties
  28. Edge properties
  29. Vertex meta-properties
  30. Meta-edge: edge to vertex
  31. Meta-edge: vertex to edge
  32. Bonus: hyper-edge
  33. Bonus: RDF statement
  34. Transformations
  35. Example: Property Graphs ↔ RDF: • neo4j-rdf-sail (2008-2010) • Wrapper based on the OpenRDF (now RDF4J) Sail API • Neo4j-internal representation of RDF resources and statements • Blueprints RDF suite (2010-2014) • SailGraph: pragmatic RDF → PG mapping • GraphSail: storage and retrieval of RDF data in a PG database • PropertyGraphSail: PG → RDF mappings (simple vs. lossless) • Cudré-Mauroux et al. 2013: RDF storage and query on various NoSQL backends • Hartig 2014: formal PG ↔ RDF* mappings (simple vs. lossless and invertible) • Das et al. 2014: PG → RDF mappings (reification-, named-graph-, and subproperty-based) • SPARQL-Gremlin (2015-present): SPARQL → Gremlin query mapping • jbarrasa/neosemantics (2016-present): RDF storage/query on Neo4j • etc.
  36. Graph transformations: • We want them: • Bidirectional • Composable • Techniques: • Bidirectional arrows • Symmetric lenses • Symmetric currying
  37. Goal: define an isomorphism
  38. Problem: internal structure is not one-to-one
  39. ...but some components are analogous
  40. Asymmetric context
  41. Symmetric context (fruit images: Wikipedia, PicsArt, PNG Mart, PNG Only, toppng)
  42. Graphs @ Uber
  43. Goal: • Let’s build a knowledge graph • E.g. riders, drivers, trips, vehicles, orders, etc. • Evaluate OLTP and OLAP queries • Use cases: • Risk and safety, recommendation, analytics
  44. Problems: • Thousands of datasets and schemas • Static, streaming, RPC • Data sources are not composable • Strong identifiers, weak semantics • Duplicate types, homonyms, synonyms • Per-language data islands • Diversity of data modeling conventions
  45. Solutions (Uber Knowledge Graph): • Define mappings on a per-dataset basis • Great precision, low recall • Build bridges between data models • Must be composable • A standardized vocabulary saves a lot of pain • Commonly used types and properties • E.g. basic data types, time, geometry and geolocation, addresses and contact info, sensors, money, etc. • Shared logical data model • Tooling for transformations • Written in Haskell, compiled to JVM bytecode via Eta
  46. Building bridges
  47. Logical
  48. YAML
  49. Protocol Buffers
  50. Apache Thrift
  51. Apache Avro
  52. N-Triples (OWL)
  53. Docs
  54. Thanks: joshsh@uber.com
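
Slide 4 builds on TinkerPop's classic toy graph. The sketch below restates that graph in plain Python so the "graph structure" of slide 5 can be inspected without TinkerPop. The vertex labels are an assumption: the slide only says "+ vertex labels", and we borrow "person"/"software" from TinkerPop's "modern" toy graph. The helper names are hypothetical.

```python
# Plain-Python restatement of TinkerPop's classic toy graph (slide 4).
# Assumption: vertex labels "person"/"software" are borrowed from the "modern" graph.

vertices = {
    1: {"label": "person", "name": "marko", "age": 29},
    2: {"label": "person", "name": "vadas", "age": 27},
    3: {"label": "software", "name": "lop", "lang": "java"},
    4: {"label": "person", "name": "josh", "age": 32},
    5: {"label": "software", "name": "ripple", "lang": "java"},
    6: {"label": "person", "name": "peter", "age": 35},
}

# Each edge: id -> (out vertex id, edge label, in vertex id, edge properties)
edges = {
    7: (1, "knows", 2, {"weight": 0.5}),
    8: (1, "knows", 4, {"weight": 1.0}),
    9: (1, "created", 3, {"weight": 0.4}),
    10: (4, "created", 5, {"weight": 1.0}),
    11: (4, "created", 3, {"weight": 0.4}),
    12: (6, "created", 3, {"weight": 0.2}),
}


def out_neighbors(vertex_id, label):
    """Names of vertices reached from vertex_id over outgoing edges with this label."""
    return sorted(
        vertices[in_v]["name"]
        for out_v, lbl, in_v, _props in edges.values()
        if out_v == vertex_id and lbl == label
    )


def with_label(label):
    """Names of all vertices carrying the given vertex label."""
    return sorted(v["name"] for v in vertices.values() if v["label"] == label)


print(out_neighbors(1, "knows"))  # ['josh', 'vadas']
print(with_label("software"))     # ['lop', 'ripple']
```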
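
Slides 7 to 15 introduce objects, arrows, composition, identity, and isomorphisms, where f: a → b is an isomorphism if some f⁻¹ satisfies f⁻¹ ∘ f = id_a and f ∘ f⁻¹ = id_b. A minimal sketch, assuming the simplest concrete category (finite sets as objects, functions stored as dicts as arrows), checks those two equations and the fact behind slide 14: a composite of isomorphisms is an isomorphism whose inverse is the composite of the inverses in reverse order. All names are illustrative.

```python
# Finite sets as objects; dicts as arrows (functions between finite sets).

def compose(g, f):
    """g ∘ f: apply f first, then g."""
    return {x: g[f[x]] for x in f}


def identity(objects):
    return {x: x for x in objects}


def inverse(f):
    """Swap keys and values; raises if f is not injective."""
    inv = {y: x for x, y in f.items()}
    if len(inv) != len(f):
        raise ValueError("not injective, so not an isomorphism")
    return inv


def is_inverse(f, f_inv, a, b):
    """Check f⁻¹ ∘ f = id_a and f ∘ f⁻¹ = id_b."""
    return compose(f_inv, f) == identity(a) and compose(f, f_inv) == identity(b)


A = {1, 2, 3}
B = {"a", "b", "c"}
C = {"x", "y", "z"}

f = {1: "a", 2: "b", 3: "c"}        # f: A -> B
g = {"a": "z", "b": "x", "c": "y"}  # g: B -> C
gf = compose(g, f)                  # g ∘ f: A -> C

print(is_inverse(f, inverse(f), A, B))                 # True
print(is_inverse(gf, inverse(gf), A, C))               # True
print(inverse(gf) == compose(inverse(f), inverse(g)))  # True
```

Because `is_inverse` compares against the full identity on `b`, a map that is injective but misses part of its codomain fails the second equation, which is exactly why both equations are needed.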
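
Slide 18 treats graphs and operations as a category, but the transcript does not say which operations. One standard choice, assumed here, takes directed graphs as objects and graph homomorphisms (vertex maps that send every edge to an edge) as arrows. The sketch checks the homomorphism condition and shows that composites and identities are again homomorphisms, which is what makes this a category. The example graphs are hypothetical.

```python
# Objects: directed graphs as (vertex set, set of (source, target) edges).
# Arrows: graph homomorphisms, i.e. vertex maps that send every edge to an edge.

def is_homomorphism(h, g1, g2):
    """True if dict h maps every vertex of g1 into g2 and preserves every edge."""
    v1, e1 = g1
    v2, e2 = g2
    if set(h) != v1 or not set(h.values()) <= v2:
        return False
    return all((h[s], h[t]) in e2 for s, t in e1)


def compose(k, h):
    """k ∘ h: apply h first, then k."""
    return {v: k[h[v]] for v in h}


def identity(g):
    return {v: v for v in g[0]}


path = ({0, 1, 2}, {(0, 1), (1, 2)})                # 0 -> 1 -> 2
two_cycle = ({"a", "b"}, {("a", "b"), ("b", "a")})  # a <-> b
loop = ({"*"}, {("*", "*")})                        # a single self-loop

h = {0: "a", 1: "b", 2: "a"}  # path -> two_cycle
k = {"a": "*", "b": "*"}      # two_cycle -> loop

print(is_homomorphism(h, path, two_cycle))                         # True
print(is_homomorphism(compose(k, h), path, loop))                  # True
print(is_homomorphism({0: "a", 1: "a", 2: "b"}, path, two_cycle))  # False
```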
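
Slide 22 lists the parts of a simple interpretation (IR, IP, IEXT) for ground RDF graphs. To actually evaluate triples, the W3C definition also needs a mapping IS from IRIs into IR ∪ IP, which the slide omits; this sketch adds it as an explicit assumption and ignores literals. A ground triple (s, p, o) is then true when IS(p) is in IP and the pair <IS(s), IS(o)> is in IEXT(IS(p)), and a ground graph is true when all of its triples are. The `ex:` names are hypothetical placeholders, not real IRIs.

```python
from dataclasses import dataclass


@dataclass
class SimpleInterpretation:
    IR: set     # non-empty domain (universe) of resources
    IP: set     # set of properties
    IEXT: dict  # property in IP -> set of (x, y) pairs with x, y in IR
    IS: dict    # IRI -> element of IR ∪ IP (assumption: not shown on the slide)

    def __post_init__(self):
        if not self.IR:
            raise ValueError("IR must be non-empty")
        for prop, pairs in self.IEXT.items():
            if prop not in self.IP:
                raise ValueError(f"IEXT is defined on {prop!r}, which is not in IP")
            if any(x not in self.IR or y not in self.IR for x, y in pairs):
                raise ValueError(f"IEXT({prop!r}) has a pair outside IR x IR")
        if any(r not in self.IR | self.IP for r in self.IS.values()):
            raise ValueError("IS maps an IRI outside IR ∪ IP")

    def triple_true(self, s, p, o):
        """True iff IS(p) is in IP and <IS(s), IS(o)> is in IEXT(IS(p)).

        Simplification: an IRI missing from IS makes the triple false.
        """
        prop = self.IS.get(p)
        if prop not in self.IP:
            return False
        return (self.IS.get(s), self.IS.get(o)) in self.IEXT.get(prop, set())

    def graph_true(self, graph):
        """A ground graph is true iff every one of its triples is true."""
        return all(self.triple_true(*t) for t in graph)


I = SimpleInterpretation(
    IR={"Alice", "Bob"},
    IP={"knows"},
    IEXT={"knows": {("Alice", "Bob")}},
    IS={"ex:alice": "Alice", "ex:bob": "Bob", "ex:knows": "knows"},
)
print(I.graph_true({("ex:alice", "ex:knows", "ex:bob")}))  # True
print(I.graph_true({("ex:bob", "ex:knows", "ex:alice")}))  # False
```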
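
Slide 24 notes that many of TinkerPop's Graph.Features flags describe the data model. In TinkerPop itself these are queried in Java, for example `graph.features().vertex().supportsMetaProperties()` or `supportsMultiProperties()`. The Python sketch below is not that API: `Features` and `violations` are hypothetical, much-reduced stand-ins that show how such flags can be used to check whether a given graph fits a backend's data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Features:
    """Hypothetical, much-reduced stand-in for TinkerPop's Graph.Features flags."""
    id_types: frozenset = frozenset({int})
    vertex_properties: bool = True
    edge_properties: bool = True
    meta_properties: bool = False
    multi_properties: bool = False
    value_types: frozenset = frozenset({bool, int, float, str})


def violations(features, vertices, edges):
    """Describe every capability the elements use that the features lack.

    Vertex properties: key -> list of (value, meta-properties dict).
    Edge properties: key -> single value.
    """
    problems = []
    for v in vertices:
        where = f"vertex {v['id']!r}"
        if type(v["id"]) not in features.id_types:
            problems.append(f"{where}: unsupported id type")
        for key, entries in v["properties"].items():
            if not features.vertex_properties:
                problems.append(f"{where}: vertex properties unsupported")
            if len(entries) > 1 and not features.multi_properties:
                problems.append(f"{where}: multi-property {key!r}")
            for value, meta in entries:
                if meta and not features.meta_properties:
                    problems.append(f"{where}: meta-properties on {key!r}")
                if type(value) not in features.value_types:
                    problems.append(f"{where}: unsupported value type for {key!r}")
    for e in edges:
        where = f"edge {e['id']!r}"
        if type(e["id"]) not in features.id_types:
            problems.append(f"{where}: unsupported id type")
        for key, value in e["properties"].items():
            if not features.edge_properties:
                problems.append(f"{where}: edge properties unsupported")
            if type(value) not in features.value_types:
                problems.append(f"{where}: unsupported value type for {key!r}")
    return problems


vertices = [
    {"id": 1, "properties": {"name": [("marko", {}), ("mark", {})]}},  # multi-property
    {"id": 2, "properties": {"name": [("vadas", {"since": 2010})]}},    # meta-property
]
edges = [{"id": 7, "properties": {"weight": 0.5}}]

print(violations(Features(), vertices, edges))
# ["vertex 1: multi-property 'name'", "vertex 2: meta-properties on 'name'"]
```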
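
Slides 25 to 31 survive in the transcript only as titles; the diagrams are lost. One reading, which we assume here and which may differ from the original diagrams, treats every edge label and property key as an arrow from a source to a target among the objects of slide 25 (v, e, p, d). Under that reading, the slide titles become shapes: an edge is e: v → v, a vertex property is p: v → d, an edge property is p: e → d, a meta-property is p: p → d, and the two meta-edges are e: e → v and e: v → e. Arrows whose endpoints meet can be chained, in the spirit of the composition slides. Hyper-edges and RDF statements (slides 32 and 33) are not covered. All schema names are hypothetical.

```python
# Kinds of schema objects (slide 25): v = vertex label, e = edge label,
# p = property key, d = data type. All names below are hypothetical examples.
kinds = {
    "Person": "v", "Place": "v",
    "knows": "e", "livesIn": "e", "claims": "e", "reportedBy": "e",
    "name": "p", "weight": "p", "since": "p",
    "String": "d", "Double": "d", "Integer": "d",
}

# Assumed reading: every edge label and property key is an arrow source -> target.
arrows = {
    "knows": ("Person", "Person"),
    "livesIn": ("Person", "Place"),
    "name": ("Person", "String"),
    "weight": ("knows", "Double"),
    "since": ("name", "Integer"),
    "reportedBy": ("knows", "Person"),
    "claims": ("Person", "knows"),
}

# (kind of arrow, kind of source, kind of target) -> corresponding slide title
SHAPES = {
    ("e", "v", "v"): "edge",
    ("p", "v", "d"): "vertex property",
    ("p", "e", "d"): "edge property",
    ("p", "p", "d"): "meta-property",
    ("e", "e", "v"): "meta-edge: edge to vertex",
    ("e", "v", "e"): "meta-edge: vertex to edge",
}


def classify(name):
    source, target = arrows[name]
    shape = (kinds[name], kinds[source], kinds[target])
    return SHAPES.get(shape, "not covered by this sketch")


def compose_path(*names):
    """Source and target of a chain of arrows, or None if consecutive arrows do not meet."""
    source, target = arrows[names[0]]
    for name in names[1:]:
        next_source, next_target = arrows[name]
        if next_source != target:
            return None
        target = next_target
    return source, target


for name in arrows:
    print(f"{name}: {classify(name)}")
print(compose_path("knows", "name"))  # ('Person', 'String')
```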
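
Slide 35 contrasts "simple" and "lossless" PG → RDF mappings. The sketch below is a simple mapping of our own and not the mapping used by PropertyGraphSail, Hartig, or Das et al.: vertices become IRIs, vertex labels become `rdf:type` triples, vertex properties become literal-valued triples, and edges become single triples. Edge ids and edge properties are dropped, which is what makes it lossy. The `http://example.org/` namespace and its `vertex/`, `type/`, `prop/`, and `rel/` paths are hypothetical, and labels and keys are assumed to be IRI-safe.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XSD = "http://www.w3.org/2001/XMLSchema#"
BASE = "http://example.org/"  # hypothetical namespace for generated IRIs


def iri(local):
    return f"<{BASE}{local}>"


def literal(value):
    """Render a Python value as an N-Triples literal."""
    if isinstance(value, bool):  # test before int: bool is a subclass of int
        return f'"{str(value).lower()}"^^<{XSD}boolean>'
    if isinstance(value, int):
        return f'"{value}"^^<{XSD}integer>'
    if isinstance(value, float):
        return f'"{value!r}"^^<{XSD}double>'
    text = str(value).replace("\\", "\\\\").replace('"', '\\"')
    text = text.replace("\n", "\\n").replace("\r", "\\r")
    return f'"{text}"'


def pg_to_rdf(vertices, edges):
    """Simple (lossy) mapping: edge ids and edge properties are dropped."""
    lines = []
    for vid, v in vertices.items():
        subject = iri(f"vertex/{vid}")
        lines.append(f"{subject} <{RDF_TYPE}> {iri('type/' + v['label'])} .")
        for key, value in v["properties"].items():
            lines.append(f"{subject} {iri('prop/' + key)} {literal(value)} .")
    for out_v, label, in_v, _edge_props in edges.values():
        subject, obj = iri(f"vertex/{out_v}"), iri(f"vertex/{in_v}")
        lines.append(f"{subject} {iri('rel/' + label)} {obj} .")
    return lines


vertices = {
    1: {"label": "Person", "properties": {"name": "marko", "age": 29}},
    2: {"label": "Person", "properties": {"name": "vadas"}},
}
edges = {7: (1, "knows", 2, {"weight": 0.5})}

for line in pg_to_rdf(vertices, edges):
    print(line)
```

A lossless variant would need somewhere to put edge 7 and its weight, for example a reified statement or a named graph, which is where the reification-, named-graph-, and subproperty-based designs on slide 35 differ.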
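
Slide 36 asks for transformations that are bidirectional and composable, and names bidirectional arrows, symmetric lenses, and symmetric currying as techniques. This sketch covers only the simplest of these: a bidirectional arrow as a pair of functions meant to be mutual inverses, in line with the goal on slide 37 of defining an isomorphism. Symmetric lenses additionally carry a "complement" for information one side lacks, and symmetric currying is not shown. `Iso` and the three representations are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Generic, TypeVar

A = TypeVar("A")
B = TypeVar("B")
C = TypeVar("C")


@dataclass(frozen=True)
class Iso(Generic[A, B]):
    """A bidirectional arrow: forward A -> B and backward B -> A, intended as inverses."""
    forward: Callable[[A], B]
    backward: Callable[[B], A]

    def then(self, other: "Iso[B, C]") -> "Iso[A, C]":
        """Composition: run self, then other; backward runs the two in reverse order."""
        return Iso(lambda a: other.forward(self.forward(a)),
                   lambda c: self.backward(other.backward(c)))

    def inverse(self) -> "Iso[B, A]":
        return Iso(self.backward, self.forward)


# Hypothetical representations: a property-graph edge as an (out, label, in) tuple,
# a triple as a dict, and a triple as a single line of text.
edge_as_triple = Iso(
    lambda e: {"subject": e[0], "predicate": e[1], "object": e[2]},
    lambda t: (t["subject"], t["predicate"], t["object"]),
)


def parse_line(line):
    # Only inverts the forward direction when no token contains a space.
    body = line[:-2] if line.endswith(" .") else line
    subject, predicate, obj = body.split(" ")
    return {"subject": subject, "predicate": predicate, "object": obj}


triple_as_text = Iso(
    lambda t: f'{t["subject"]} {t["predicate"]} {t["object"]} .',
    parse_line,
)

edge_as_text = edge_as_triple.then(triple_as_text)
print(edge_as_text.forward(("marko", "knows", "vadas")))  # marko knows vadas .
print(edge_as_text.backward("josh created lop ."))        # ('josh', 'created', 'lop')
```

The text representation shows the problem of slide 38 in miniature: the two sides are only isomorphic on a restricted domain (tokens without spaces), so a composed transformation is only as invertible as its weakest step.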
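
Slides 46 to 53 show one logical model bridged to YAML, Protocol Buffers, Apache Thrift, Apache Avro, and N-Triples (OWL). Slide 45 says the actual tooling is written in Haskell and compiled via Eta; the Python sketch below is not that tooling. It assumes a hypothetical logical record (a name plus ordered, typed fields) and renders it as a proto3 message, a Thrift struct, and an Avro record schema, to show how one shared definition can feed several formats.

```python
import json

# Hypothetical logical model: a record is a name plus ordered (field, type) pairs,
# using a small set of logical primitive types.
PERSON = ("Person", [("name", "string"), ("age", "int32"), ("verified", "boolean")])

PROTO_TYPES = {"string": "string", "int32": "int32", "int64": "int64",
               "boolean": "bool", "double": "double"}
THRIFT_TYPES = {"string": "string", "int32": "i32", "int64": "i64",
                "boolean": "bool", "double": "double"}
AVRO_TYPES = {"string": "string", "int32": "int", "int64": "long",
              "boolean": "boolean", "double": "double"}


def to_proto(record):
    """Render as a proto3 message; field numbers follow field order."""
    name, fields = record
    body = [f"  {PROTO_TYPES[t]} {f} = {i};" for i, (f, t) in enumerate(fields, start=1)]
    return "\n".join([f"message {name} {{", *body, "}"])


def to_thrift(record):
    """Render as a Thrift struct; field ids follow field order."""
    name, fields = record
    body = [f"  {i}: {THRIFT_TYPES[t]} {f}" for i, (f, t) in enumerate(fields, start=1)]
    return "\n".join([f"struct {name} {{", *body, "}"])


def to_avro(record):
    """Render as an Avro record schema (JSON)."""
    name, fields = record
    return json.dumps({"type": "record", "name": name,
                       "fields": [{"name": f, "type": AVRO_TYPES[t]} for f, t in fields]})


print(to_proto(PERSON))
print(to_thrift(PERSON))
print(to_avro(PERSON))
```

These renderings are one-way; making them bridges in the sense of slide 45 (composable, and ideally bidirectional) would require a matching parser for each format.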
