A Graph is a Graph is a Graph: Equivalence, Transformation, and Composition of Graph Data Models
• Property Graphs
• Resource Description Framework (RDF)
• Relational Model
• SQL, CQL, HQL, DQL, ...
• Data interchange formats
• Protocol Buffers, Apache Thrift, Apache Avro
• Mostly based on algebraic data types (ADTs)
Some data models of interest (@ Uber)
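To make the ADT bullet concrete, here is a minimal sketch of a product type and a sum type in Python. The names `LatLng`, `Address`, `Location`, and `describe` are hypothetical illustrations and do not come from any schema in the talk.

```python
from dataclasses import dataclass
from typing import Union


# Product type: a record that carries all of its fields at once.
@dataclass(frozen=True)
class LatLng:
    lat: float
    lng: float


# A second product type with a single field.
@dataclass(frozen=True)
class Address:
    text: str


# Sum type: a Location is exactly one of the alternatives.
Location = Union[LatLng, Address]


def describe(loc: Location) -> str:
    """Handle each alternative of the sum type separately."""
    if isinstance(loc, LatLng):
        return f"point({loc.lat}, {loc.lng})"
    return f"address({loc.text})"
```

Interchange formats such as Protocol Buffers, Thrift, and Avro express similar shapes with records (products) and unions or `oneof` fields (sums).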
• A simple interpretation¹ I of an RDF graph is:
• A non-empty set IR of resources, called the domain or universe of I
• A set IP, called the set of properties of I
• A mapping IEXT from IP into 𝒫(IR × IR), i.e. the set of sets of pairs <x, y> with x and y in IR
• Ground RDF graphs only (⇒ no blank nodes)
Resource Description Framework (RDF)
1) World Wide Web Consortium. (2014). RDF 1.1 concepts and abstract syntax.
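The definition of a simple interpretation can be sketched with plain sets. This is a toy model under a loud assumption: every name in a triple denotes itself, so the mapping from names to resources is the identity, and literals are left out. The resources `alice` and `bob` and the helper names are hypothetical.

```python
from typing import Dict, Set, Tuple

Triple = Tuple[str, str, str]

# IR: the non-empty domain (universe) of resources.
IR: Set[str] = {"alice", "bob", "knows"}

# IP: the set of properties.
IP: Set[str] = {"knows"}

# IEXT: maps each property to a set of pairs <x, y> with x and y in IR.
IEXT: Dict[str, Set[Tuple[str, str]]] = {"knows": {("alice", "bob")}}


def is_well_formed() -> bool:
    """Check that IR is non-empty and IEXT maps IP into subsets of IR x IR."""
    return bool(IR) and all(
        p in IEXT and all(x in IR and y in IR for x, y in IEXT[p]) for p in IP
    )


def satisfies(graph: Set[Triple]) -> bool:
    """A ground graph is true when every triple (s, p, o) has p in IP
    and <s, o> in IEXT(p). Assumes names denote themselves."""
    return all(p in IP and (s, o) in IEXT[p] for s, p, o in graph)
```

For instance, `satisfies({("alice", "knows", "bob")})` holds under this interpretation, while the reversed triple does not.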
• Vertices, edges, properties
• No standard
• W3C Property Graphs working group (2013)
• W3C Graph Data Workshop (March 2019)
• Next best thing?
• TinkerPop’s Graph.Features
Property Graph data model
• Many are concerned with the data model
• Vertex/edge id data types
• Numeric, string, UUID, “any”
• Property support
• Supports vertex properties? edge properties? vertex metaproperties?
• Property data types
• Primitive: boolean, byte, double, float, integer, long, string
• Complex: array, map, list (uniform/mixed), serializable
Property graph objects
v = a vertex label (e.g. Person, Place, Dataset)
e = an edge label (e.g. knows, likes, claims)
p = a property key (e.g. name, weight)
d = a data type (e.g. String, Integer, List<String>)
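One way to read the four symbols above is as the ingredients of a property graph schema: which property keys p are allowed on which labels v or e, and with which data type d. The sketch below is an assumption about how such a schema could be checked, not the schema language used in the talk; `VERTEX_PROPERTIES`, `EDGE_PROPERTIES`, and `conforms` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Tuple

# Schema: (label, property key) -> data type.
VERTEX_PROPERTIES: Dict[Tuple[str, str], type] = {
    ("Person", "name"): str,
    ("Place", "name"): str,
}
EDGE_PROPERTIES: Dict[Tuple[str, str], type] = {
    ("knows", "weight"): float,
}


@dataclass
class Element:
    """A vertex or an edge: an id, a label, and key/value properties."""
    id: Any
    label: str
    properties: Dict[str, Any] = field(default_factory=dict)


def conforms(element: Element, schema: Dict[Tuple[str, str], type]) -> bool:
    """True when every property key is declared for the element's label
    and its value has the declared data type."""
    for key, value in element.properties.items():
        expected = schema.get((element.label, key))
        if expected is None or not isinstance(value, expected):
            return False
    return True
```

A vertex `Element(1, "Person", {"name": "Alice"})` conforms to `VERTEX_PROPERTIES`, while one with `{"name": 42}` or an undeclared key does not.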
• Let’s build a knowledge graph
• E.g. Riders, drivers, trips, vehicles, orders, etc.
• Evaluate OLTP and OLAP queries
• Use cases:
• Risk and safety, recommendation, analytics
• Thousands of datasets and schemas
• Static, streaming, RPC
• Data sources are not composable
• Strong identifiers, weak semantics
• Duplicate types, homonyms, synonyms
• Per-language data islands
• Diversity of data modeling conventions
• Define mappings on a per-dataset basis
• Great precision, low recall
• Build bridges between data models
• Must be composable
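Composability here can be illustrated as ordinary function composition: if one mapping turns a record into a property graph vertex and another turns a vertex into RDF-style triples, chaining them yields a record-to-triples mapping with no extra work. The sketch below is a simplified assumption, not the tooling described in the talk; the record layout, the `label/id` subject scheme, and all function names are hypothetical.

```python
from functools import reduce
from typing import Any, Callable, Dict, List, Tuple

Triple = Tuple[str, str, Any]


def compose(*steps: Callable[[Any], Any]) -> Callable[[Any], Any]:
    """Compose mappings left to right: compose(f, g)(x) == g(f(x))."""
    return lambda x: reduce(lambda acc, step: step(acc), steps, x)


def record_to_vertex(record: Dict[str, Any]) -> Dict[str, Any]:
    """Bridge 1: a flat record becomes a labeled vertex with properties."""
    properties = {k: v for k, v in record.items() if k not in ("id", "type")}
    return {"id": record["id"], "label": record["type"], "properties": properties}


def vertex_to_triples(vertex: Dict[str, Any]) -> List[Triple]:
    """Bridge 2: a vertex becomes one type triple plus one triple per property."""
    subject = f"{vertex['label']}/{vertex['id']}"
    triples: List[Triple] = [(subject, "rdf:type", vertex["label"])]
    triples += [(subject, k, v) for k, v in sorted(vertex["properties"].items())]
    return triples


# The composite bridge comes for free from the two smaller ones.
record_to_triples = compose(record_to_vertex, vertex_to_triples)
```

With n data models, composable bridges through a shared model need on the order of n mappings, whereas per-pair mappings grow on the order of n².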
• Standardized vocabulary saves a lot of pain
• Commonly-used types and properties
• E.g. basic data types, time, geometry and geolocation, addresses and contact info, sensors, money, etc.
• Shared logical data model
• Tooling for transformations
• Written in Haskell, compiled to JVM bytecode via Eta
UBER KNOWLEDGE GRAPH