Presentation for Data Day Texas / Global Graph Summit 2019

- 1. A Graph is a Graph is a Graph: Equivalence, Transformations, and Composition of Graph Data Models
  - Joshua Shinavier, PhD
  - Global Graph Summit 2019, 26.01.2019
- 2. Outline
  - Graphs
  - Categories
  - Data models
  - Transformations
  - Graphs @ Uber
- 3. Graphs
- 15. TinkerFactory.createClassic() + vertex labels
- 16. Graph structure
- 17. Categories
- 19. An object
- 22. Composition
- 23. Identity
- 25. An isomorphism
- 26. An isomorphism: $f^{-1} \circ f = \mathrm{id}_a$, $f \circ f^{-1} = \mathrm{id}_b$
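
The two equations on slide 26 can be checked mechanically when the objects are small finite sets. The following is a minimal sketch under that assumption: the sets `a` and `b`, the dictionaries `names` and `numbers`, and the helper `is_isomorphism` are hypothetical names chosen for illustration, not part of the talk.

```python
from typing import Callable, Hashable, Set


def is_isomorphism(
    f: Callable[[Hashable], Hashable],
    f_inv: Callable[[Hashable], Hashable],
    a: Set[Hashable],
    b: Set[Hashable],
) -> bool:
    """Check f_inv . f = id_a and f . f_inv = id_b on finite sets a and b."""
    maps_into_b = all(f(x) in b for x in a)
    maps_into_a = all(f_inv(y) in a for y in b)
    left_identity = all(f_inv(f(x)) == x for x in a)   # f_inv . f = id_a
    right_identity = all(f(f_inv(y)) == y for y in b)  # f . f_inv = id_b
    return maps_into_b and maps_into_a and left_identity and right_identity


# Illustrative objects: a set of numbers and a set of their names.
a = {0, 1, 2}
b = {"zero", "one", "two"}
names = {0: "zero", 1: "one", 2: "two"}
numbers = {name: number for number, name in names.items()}

f = names.__getitem__        # arrow a -> b
f_inv = numbers.__getitem__  # candidate inverse b -> a


def collapse(_x: Hashable) -> str:
    """A constant arrow a -> b; it cannot have an inverse."""
    return "zero"


iso_ok = is_isomorphism(f, f_inv, a, b)
collapse_ok = is_isomorphism(collapse, f_inv, a, b)
```

Here `iso_ok` holds because both round trips are identities, while `collapse_ok` does not: the constant arrow sends every number to the same name, so no arrow back can recover the input.
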
- 27. An isomorphism
- 30. A category
- 32. A functor
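
As a small illustration of what a functor has to preserve, the sketch below uses Python lists: elements are lifted to lists of elements and a function on elements is lifted to a function on lists. The helper names (`compose`, `identity`, `fmap`, `double`, `show`) are assumptions made for this example; the check only samples the two functor laws on one input rather than proving them.

```python
from typing import Callable, List, TypeVar

A = TypeVar("A")
B = TypeVar("B")
C = TypeVar("C")


def compose(g: Callable[[B], C], f: Callable[[A], B]) -> Callable[[A], C]:
    """Arrow composition: (g . f)(x) = g(f(x))."""
    return lambda x: g(f(x))


def identity(x: A) -> A:
    """The identity arrow."""
    return x


def fmap(f: Callable[[A], B]) -> Callable[[List[A]], List[B]]:
    """Lift an arrow on elements to an arrow on lists."""
    return lambda xs: [f(x) for x in xs]


def double(n: int) -> int:
    return 2 * n


def show(n: int) -> str:
    return f"<{n}>"


sample = [1, 2, 3]

# Law 1: the lifted identity is the identity on lists.
preserves_identity = fmap(identity)(sample) == identity(sample)

# Law 2: lifting a composite equals composing the lifted arrows.
preserves_composition = (
    fmap(compose(show, double))(sample)
    == compose(fmap(show), fmap(double))(sample)
)
```
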
- 33. Equivalence
- 34. Graphs and operations as a category
- 37. Data models
- 38. Some data models of interest (@ Uber)
  - Property Graphs
  - Resource Description Framework (RDF)
  - Relational Model
    - SQL, CQL, HQL, DQL, ...
  - Data interchange formats
    - Protocol Buffers, Apache Thrift, Apache Avro
    - Mostly based on algebraic data types (ADTs)
- 39. Resource Description Framework (RDF)
  - A simple interpretation [1] I of an RDF graph is:
    - A non-empty set IR of resources, called the domain or universe of I
    - A set IP, called the set of properties of I
    - A mapping IEXT from IP into 𝒫(IR × IR), i.e. the set of sets of pairs <x, y> with x and y in IR
  - Assuming:
    - Ground RDF graphs only (⇒ no blank nodes)
  - [1] World Wide Web Consortium. (2014). RDF 1.1 concepts and abstract syntax.
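
The three components on slide 39 translate almost directly into a data structure. The sketch below is illustrative only: it keeps the slide's names `IR`, `IP`, and `IEXT`, but it assumes that every IRI denotes itself (the slide does not list a mapping from IRIs to resources), and the class and method names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Iterable, Tuple

Pair = Tuple[str, str]
Triple = Tuple[str, str, str]  # (subject, predicate, object), ground only


@dataclass
class SimpleInterpretation:
    IR: FrozenSet[str]                # non-empty domain of resources
    IP: FrozenSet[str]                # set of properties
    IEXT: Dict[str, FrozenSet[Pair]]  # property -> set of <x, y> pairs over IR

    def __post_init__(self) -> None:
        if not self.IR:
            raise ValueError("IR must be non-empty")
        for prop, pairs in self.IEXT.items():
            if prop not in self.IP:
                raise ValueError(f"IEXT is defined on a non-property: {prop}")
            if any(x not in self.IR or y not in self.IR for x, y in pairs):
                raise ValueError(f"IEXT({prop}) contains a pair outside IR x IR")

    def satisfies_triple(self, triple: Triple) -> bool:
        """A ground triple holds if <s, o> is in the extension of p.

        Assumption: IRIs denote themselves. A property without an entry
        in IEXT is treated as having the empty extension.
        """
        s, p, o = triple
        return p in self.IP and (s, o) in self.IEXT.get(p, frozenset())

    def satisfies(self, graph: Iterable[Triple]) -> bool:
        """A ground graph holds if every one of its triples holds."""
        return all(self.satisfies_triple(t) for t in graph)


interp = SimpleInterpretation(
    IR=frozenset({"alice", "bob"}),
    IP=frozenset({"knows"}),
    IEXT={"knows": frozenset({("alice", "bob")})},
)
graph = [("alice", "knows", "bob")]
graph_holds = interp.satisfies(graph)
reverse_holds = interp.satisfies_triple(("bob", "knows", "alice"))
```

Restricting the sketch to ground graphs, as the slide does, is what keeps `satisfies` a simple membership test: blank nodes would require searching for an assignment of resources.
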
- 40. Property Graph data model
  - Simple.
    - Vertices, edges, properties
  - No standard
    - W3C Property Graphs working group (2013)
    - W3C Graph Data Workshop (March 2019)
  - Next best thing?
    - TinkerPop’s Graph.Features
- 41. TinkerPop’s Graph.Features
  - Many features are concerned with the data model
  - Vertex/edge id data types
    - Numeric, string, UUID, “any”
  - Property support
    - Supports vertex properties? edge properties? vertex metaproperties?
    - Multi-properties?
  - Property data types
    - Primitive: boolean, byte, double, float, integer, long, string
    - Complex: array, map, list (uniform/mixed), serializable
- 42. Property graph objects
  - v = a vertex label (e.g. Person, Place, Dataset)
  - e = an edge label (e.g. knows, likes, claims)
  - p = a property key (e.g. name, weight)
  - d = a data type (e.g. String, Integer, List<String>)
- 43. Edges
- 45. Edge properties
- 47. Meta-edges: edge to vertex
- 48. Meta-edge: vertex to edge
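
One possible reading of slides 42 to 48 is a small schema in which every edge label names the labels of its two endpoints and every property key names the label it attaches to plus a data type. The sketch below follows that reading and is an assumption, not the talk's formal construction; the class names, the `observedAt` label, and the `Double` type are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass(frozen=True)
class EdgeLabel:
    name: str
    out_label: str  # label of the tail element (vertex or edge label)
    in_label: str   # label of the head element (vertex or edge label)


@dataclass(frozen=True)
class PropertyKey:
    name: str
    element_label: str  # vertex or edge label the property attaches to
    data_type: str


@dataclass
class Schema:
    vertex_labels: Set[str]
    data_types: Set[str]
    edge_labels: Dict[str, EdgeLabel] = field(default_factory=dict)
    property_keys: Dict[str, PropertyKey] = field(default_factory=dict)

    def _kind(self, label: str) -> str:
        if label in self.vertex_labels:
            return "vertex"
        if label in self.edge_labels:
            return "edge"
        raise KeyError(f"unknown label: {label}")

    def add_edge_label(self, name: str, out_label: str, in_label: str) -> str:
        """Register an edge label and report which slide's case it falls under."""
        out_kind, in_kind = self._kind(out_label), self._kind(in_label)
        self.edge_labels[name] = EdgeLabel(name, out_label, in_label)
        if (out_kind, in_kind) == ("vertex", "vertex"):
            return "edge"
        return f"meta-edge: {out_kind} to {in_kind}"

    def add_property_key(self, name: str, element_label: str, data_type: str) -> str:
        """Register a property key on a vertex label or an edge label."""
        if data_type not in self.data_types:
            raise KeyError(f"unknown data type: {data_type}")
        kind = self._kind(element_label)
        self.property_keys[name] = PropertyKey(name, element_label, data_type)
        return f"{kind} property"


schema = Schema({"Person", "Place", "Dataset"}, {"String", "Double"})
kinds = [
    schema.add_edge_label("knows", "Person", "Person"),
    schema.add_property_key("name", "Person", "String"),
    schema.add_property_key("weight", "knows", "Double"),
    schema.add_edge_label("observedAt", "knows", "Place"),  # hypothetical label
    schema.add_edge_label("claims", "Person", "knows"),
]
```

Treating edge labels as possible endpoints is the only thing that distinguishes a meta-edge from an ordinary edge in this sketch, which is why a single `add_edge_label` method covers both cases.
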
- 51. Transformations
- 52. Example: Property Graphs ↔ RDF
  - neo4j-rdf-sail (2008-2010)
    - Wrapper based on OpenRDF (now RDF4J) Sail API
    - Neo4j-internal representation of RDF resources and statements
  - Blueprints RDF suite (2010-2014)
    - SailGraph: pragmatic RDF → PG mapping
    - GraphSail: storage and retrieval of RDF data in a PG database
    - PropertyGraphSail: PG → RDF mappings (simple vs. lossless)
  - Cudré-Mauroux et al. 2013: RDF storage and query on various NoSQL backends
  - Hartig 2014: formal PG ↔ RDF* mappings (simple vs. lossless and invertible)
  - Das et al. 2014: PG → RDF mappings (reification-, NG-, subproperty-based)
  - SPARQL-Gremlin (2015-present): SPARQL → Gremlin query mapping
  - jbarrasa/neosemantics (2016-present): RDF storage/query on Neo4j
  - etc.
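
The "simple vs. lossless" distinction on slide 52 can be illustrated with a single property graph edge. The sketch below is not the mapping used by any of the listed projects or papers: it contrasts a one-triple mapping with a reification-style mapping, and the `edge:`, `pg:`, and `prop:` prefixes, the `Edge` type, and the function names are all hypothetical. Property values are kept as plain strings for brevity.

```python
from typing import NamedTuple, Set, Tuple

Triple = Tuple[str, str, str]


class Edge(NamedTuple):
    id: str
    label: str
    out_v: str
    in_v: str
    properties: Tuple[Tuple[str, str], ...]  # sorted (key, value) pairs


def edge_to_triples_simple(edge: Edge) -> Set[Triple]:
    """One triple per edge; the edge id and edge properties are dropped."""
    return {(edge.out_v, edge.label, edge.in_v)}


def edge_to_triples_reified(edge: Edge) -> Set[Triple]:
    """The edge becomes a resource of its own, so nothing is dropped."""
    subject = f"edge:{edge.id}"
    triples = {
        (subject, "pg:out", edge.out_v),
        (subject, "pg:label", edge.label),
        (subject, "pg:in", edge.in_v),
    }
    for key, value in edge.properties:
        triples.add((subject, f"prop:{key}", value))
    return triples


def triples_to_edge(triples: Set[Triple]) -> Edge:
    """Inverse of edge_to_triples_reified for the triples of a single edge."""
    (subject,) = {s for s, _, _ in triples}
    by_predicate = {p: o for _, p, o in triples}
    properties = tuple(sorted(
        (p[len("prop:"):], o)
        for p, o in by_predicate.items()
        if p.startswith("prop:")
    ))
    return Edge(
        subject[len("edge:"):],
        by_predicate["pg:label"],
        by_predicate["pg:out"],
        by_predicate["pg:in"],
        properties,
    )


edge = Edge("7", "knows", "v:1", "v:2", (("weight", "0.5"),))
other = edge._replace(id="8", properties=())

round_trip_ok = triples_to_edge(edge_to_triples_reified(edge)) == edge
simple_collides = edge_to_triples_simple(edge) == edge_to_triples_simple(other)
```

The simple mapping sends two distinct edges to the same triple, so it cannot be inverted, whereas the reified form round-trips at the cost of several triples per edge.
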
- 53. Graph transformations
  - We want them:
    - Bidirectional
    - Composable
  - Techniques
    - Bidirectional arrows
    - Symmetric lenses
    - Symmetric currying
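
The two requirements on slide 53 can be sketched with a pair of functions that travel together. This is the simplest case only, a plain invertible mapping; it is not a symmetric lens, which would also carry extra state for information that one side lacks. The `Iso` class and the record formats are assumptions made for this example.

```python
import json
from dataclasses import dataclass
from typing import Callable, Generic, TypeVar

A = TypeVar("A")
B = TypeVar("B")
C = TypeVar("C")


@dataclass(frozen=True)
class Iso(Generic[A, B]):
    """A bidirectional transformation: forward and backward travel together."""
    forward: Callable[[A], B]
    backward: Callable[[B], A]

    def then(self, other: "Iso[B, C]") -> "Iso[A, C]":
        """Compose two transformations; the backward direction composes in reverse."""
        return Iso(
            lambda a: other.forward(self.forward(a)),
            lambda c: self.backward(other.backward(c)),
        )

    def inverse(self) -> "Iso[B, A]":
        """Swap the two directions."""
        return Iso(self.backward, self.forward)


# Two small bridges between hypothetical record formats.
tuple_to_dict = Iso(
    lambda t: {"name": t[0], "age": t[1]},
    lambda d: (d["name"], d["age"]),
)
dict_to_json = Iso(lambda d: json.dumps(d, sort_keys=True), json.loads)

# Composing the bridges yields a bridge that is still bidirectional.
tuple_to_json = tuple_to_dict.then(dict_to_json)

record = ("alice", 30)
encoded = tuple_to_json.forward(record)
decoded = tuple_to_json.backward(encoded)
```

Because `then` returns another `Iso`, a chain of bridges between several data models needs only pairwise definitions, and `inverse` gives the opposite direction without writing it again.
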
- 57. Goal: define an isomorphism
- 58. Problem: internal structure is not one-to-one
- 59. ...but some components are analogous
- 61. Symmetric context (fruit images: Wikipedia, PicsArt, PNG Mart, PNG Only, toppng)
- 62. Graphs @ Uber
- 63. Goal
  - Let’s build a knowledge graph
    - E.g. riders, drivers, trips, vehicles, orders, etc.
  - Evaluate OLTP and OLAP queries
  - Use cases:
    - Risk and safety, recommendation, analytics
- 64. Problems
  - Thousands of datasets and schemas
    - Static, streaming, RPC
  - Data sources are not composable
    - Strong identifiers, weak semantics
    - Duplicate types, homonyms, synonyms
    - Per-language data islands
    - Diversity of data modeling conventions
- 65. Solutions (Uber Knowledge Graph)
  - Define mappings on a per-dataset basis
    - Great precision, low recall
  - Build bridges between data models
    - Must be composable
  - Standardized vocabulary saves a lot of pain
    - Commonly-used types and properties
    - E.g. basic data types, time, geometry and geolocation, addresses and contact info, sensors, money, etc.
  - Shared logical data model
  - Tooling for transformations
    - Written in Haskell, compiles to bytecode via Eta
- 66. Building bridges
- 67. Logical
- 68. YAML
- 69. Protocol Buffers
- 70. Apache Thrift
- 71. Apache Avro
- 72. N-Triples (OWL)
- 73. Docs