An Algebraic Data Model for Graphs and Hypergraphs
Bonus Intro to Category Theory
November 21, 2019
Joshua Shinavier, PhD (Uber)
Outline
• Graphs
• Categories
• Data Models
• Algebraic Property Graphs
Graphs
TinkerFactory.createClassic() + vertex labels
Graph structure
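As a rough illustration of the graph structure on this slide, the sketch below writes out the TinkerPop "classic" toy graph as plain Python data. The dictionary layout, the helper `out_neighbors`, and the vertex labels `Person` and `Software` are assumptions made for this example; the classic graph itself carries no vertex labels, and none of this is TinkerPop's API.

```python
# Vertices of the TinkerPop "classic" toy graph: id -> (label, properties).
# The labels are an assumption added for this sketch.
vertices = {
    1: ("Person", {"name": "marko", "age": 29}),
    2: ("Person", {"name": "vadas", "age": 27}),
    3: ("Software", {"name": "lop", "lang": "java"}),
    4: ("Person", {"name": "josh", "age": 32}),
    5: ("Software", {"name": "ripple", "lang": "java"}),
    6: ("Person", {"name": "peter", "age": 35}),
}

# Edges: id -> (label, out-vertex id, in-vertex id, properties).
edges = {
    7: ("knows", 1, 2, {"weight": 0.5}),
    8: ("knows", 1, 4, {"weight": 1.0}),
    9: ("created", 1, 3, {"weight": 0.4}),
    10: ("created", 4, 5, {"weight": 1.0}),
    11: ("created", 4, 3, {"weight": 0.4}),
    12: ("created", 6, 3, {"weight": 0.2}),
}


def out_neighbors(vertex_id, edge_label):
    """Sorted names of vertices reached by outgoing edges with the given label."""
    return sorted(
        vertices[target][1]["name"]
        for (label, source, target, _props) in edges.values()
        if source == vertex_id and label == edge_label
    )
```

For example, `out_neighbors(1, "knows")` lists the people marko knows.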
Categories
An object
An arrow (morphism)
Composition
Associativity
Identity
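The laws named on these slides can be spot-checked with ordinary functions. This is a minimal sketch that treats Python types as objects and functions as arrows; the arrows `f`, `g`, `h` and the sample values are invented for the example, and checking a few samples illustrates the laws rather than proving them.

```python
def compose(g, f):
    """Return g ∘ f: apply f first, then g."""
    return lambda x: g(f(x))


def identity(x):
    """The identity arrow: returns its argument unchanged."""
    return x


# Three composable arrows: int -> int -> str -> int.
f = lambda n: n + 1
g = lambda n: str(n)
h = lambda s: len(s)

samples = [0, 9, 41, 999]

# Associativity: h ∘ (g ∘ f) and (h ∘ g) ∘ f agree on every sample.
associative = all(
    compose(h, compose(g, f))(x) == compose(compose(h, g), f)(x) for x in samples
)

# Identity: composing with the identity on either side changes nothing.
unital = all(
    compose(f, identity)(x) == f(x) == compose(identity, f)(x) for x in samples
)
```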
An isomorphism
An isomorphism: $f^{-1} \circ f = \mathrm{id}_a$, $f \circ f^{-1} = \mathrm{id}_b$
An isomorphism
Composition of isomorphisms
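For finite sets, an isomorphism is a bijection, and both facts above can be checked directly. In this sketch the sets `a`, `b`, `c` and the maps `f`, `g` are made up for illustration, and each map is stored as a dict.

```python
def inverse(mapping):
    """Invert a finite map given as a dict; fails if the map is not injective."""
    inv = {v: k for k, v in mapping.items()}
    if len(inv) != len(mapping):
        raise ValueError("not injective, so not an isomorphism")
    return inv


a = {1, 2, 3}
b = {"x", "y", "z"}
c = {10.0, 20.0, 30.0}

f = {1: "x", 2: "y", 3: "z"}            # f : a -> b
g = {"x": 10.0, "y": 20.0, "z": 30.0}   # g : b -> c
f_inv = inverse(f)
g_inv = inverse(g)

# f⁻¹ ∘ f is the identity on a, and f ∘ f⁻¹ is the identity on b.
left_inverse = all(f_inv[f[x]] == x for x in a)
right_inverse = all(f[f_inv[y]] == y for y in b)

# The composite g ∘ f : a -> c is again an isomorphism,
# and its inverse is f⁻¹ ∘ g⁻¹ (note the reversed order).
gf = {x: g[f[x]] for x in a}
gf_inv = {z: f_inv[g_inv[z]] for z in c}
```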
A category
A functor
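One familiar functor sends each type to lists of that type and each function to its element-wise version. The sketch below, with invented arrows `f` and `g`, checks on one sample list that this mapping preserves identities and composition, which are the two functor laws.

```python
def compose(g, f):
    """Return g ∘ f: apply f first, then g."""
    return lambda x: g(f(x))


def fmap(f):
    """Arrow part of the list functor: lift f : a -> b to lists of a."""
    return lambda xs: [f(x) for x in xs]


f = lambda n: n * 2
g = lambda n: n + 1
xs = [1, 2, 3]

# Law 1: the identity arrow is sent to the identity arrow.
preserves_identity = fmap(lambda x: x)(xs) == xs

# Law 2: fmap(g ∘ f) equals fmap(g) ∘ fmap(f).
preserves_composition = fmap(compose(g, f))(xs) == compose(fmap(g), fmap(f))(xs)
```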
Graphs and graph operations
Transformations
Data models
Some data models of interest (@ Uber)
• Property Graphs
• Resource Description Framework (RDF)
• Relational Model
  • SQL, CQL, HQL, DQL, ...
• Data interchange formats
  • Protocol Buffers, Apache Thrift, Apache Avro
  • Mostly based on algebraic data types (ADTs)
Resource Description Framework (RDF)
• A simple interpretation¹ I of an RDF graph is:
  • A non-empty set IR of resources, called the domain or universe of I
  • A set IP, called the set of properties of I
  • A mapping IEXT from IP into 𝒫(IR × IR), i.e. the set of sets of pairs <x, y> with x and y in IR
• Assuming:
  • Ground RDF graphs only (⇒ no blank nodes)
1) World Wide Web Consortium. (2014). RDF 1.1 concepts and abstract syntax.
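The definition above can be written down almost literally as data. This is a small sketch under simplifying assumptions: the resource and property names are invented, every name denotes itself (so there is no separate mapping from IRIs to resources), and only ground triples are considered, as on the slide.

```python
# IR: the domain (universe) of resources. Names are made up for the example.
IR = {"alice", "bob", "carol"}

# IP: the set of properties.
IP = {"knows", "likes"}

# IEXT: maps each property to its extension, a set of pairs over IR.
IEXT = {
    "knows": {("alice", "bob"), ("bob", "carol")},
    "likes": {("carol", "alice")},
}


def is_well_formed(IR, IP, IEXT):
    """IR is non-empty and IEXT maps every property in IP to a subset of IR x IR."""
    return (
        len(IR) > 0
        and set(IEXT) == IP
        and all(x in IR and y in IR for pairs in IEXT.values() for (x, y) in pairs)
    )


def satisfies(triple):
    """A ground triple (s, p, o) holds when p is a property and <s, o> is in IEXT(p)."""
    s, p, o = triple
    return p in IP and (s, o) in IEXT[p]
```

With this interpretation, `satisfies(("alice", "knows", "bob"))` holds, while the reversed triple does not.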
Property Graph data model
• Simple.
  • Vertices, edges, properties
• No standard
  • W3C Property Graphs working group (2013)
  • W3C Graph Data Workshop (March 2019)
• Next best thing?
  • TinkerPop’s Graph.Features
TinkerPop’s Graph.Features
• Many of the features concern the data model
  • Vertex/edge id data types
    • Numeric, string, UUID, “any”
  • Property support
    • Supports vertex properties? edge properties? vertex meta-properties?
    • Multi-properties?
  • Property data types
    • Primitive: boolean, byte, double, float, integer, long, string
    • Complex: array, map, list (uniform/mixed), serializable
Property graph objects
v = a vertex label (e.g. Person, Place, Dataset)
e = an edge label (e.g. knows, likes, claims)
p = a property key (e.g. name, weight)
d = a data type (e.g. String, Integer, List<String>)
Edges
Vertex properties
Edge properties
Vertex meta-properties
Meta-edges: edge to vertex
Meta-edges: vertex to edge
Bonus: hyper-edges
Bonus: RDF statements
Algebraic Property Graphs
A formal data model for enterprise Property Graphs
● Things we need
○ Formal schema & mapping language for property graphs
○ Framework for graph transformation and integration
○ Means to build virtual graphs out of non-graph datasets (and vice versa)
● Algebraic Property Graphs
○ Based on algebraic data types
○ Strong connections to type theory, database theory, and category theory:
■ A familiar approach to type inference
■ Easy to reason about: all operations and proofs mechanically verified in Coq
■ Generalizes to algebraic databases
Product and sum types, identifiers, and values
● APG:
● Thrift example:
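The APG and Thrift examples from this slide are not reproduced in the transcript. As a stand-in, here is a minimal Python sketch of the two building blocks: a product type groups several fields together (comparable to a Thrift struct), and a sum type is one of several alternatives (comparable to a Thrift union). The types `Person`, `Place`, and `Endpoint` are hypothetical.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Person:
    """Product type: a name AND an age."""
    name: str
    age: int


@dataclass(frozen=True)
class Place:
    """Product type: a latitude AND a longitude."""
    lat: float
    lon: float


# Sum type: an endpoint is EITHER a Person OR a Place.
Endpoint = Union[Person, Place]


def describe(e: Endpoint) -> str:
    """Case analysis on the sum type."""
    if isinstance(e, Person):
        return f"person {e.name}"
    return f"place ({e.lat}, {e.lon})"
```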
Algebraic types in common data interchange languages
Graph, schema, and data
Natural Transformations of Graphs, Schemas, and Data
● Non-natural transformations are also useful and are described in our paper.
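To give a feel for what naturality means, the sketch below uses plain directed graphs instead of full APGs. A directed graph can be viewed as a functor into sets (a set of edges, a set of vertices, and source and target functions), and a natural transformation between two such functors is a pair of maps that commutes with source and target. The graphs `G` and `H` and the candidate maps are invented for the example.

```python
# A directed graph as two sets and two functions (stored as dicts).
G = {
    "vertices": {"a", "b"},
    "edges": {"e"},
    "src": {"e": "a"},
    "tgt": {"e": "b"},
}
H = {
    "vertices": {"x", "y", "z"},
    "edges": {"p", "q"},
    "src": {"p": "x", "q": "y"},
    "tgt": {"p": "y", "q": "z"},
}


def is_natural(G, H, on_vertices, on_edges):
    """Check the naturality squares for every edge of G: mapping an edge and
    then taking its source (or target) must equal taking the source (or
    target) first and then mapping the vertex."""
    return all(
        H["src"][on_edges[e]] == on_vertices[G["src"][e]]
        and H["tgt"][on_edges[e]] == on_vertices[G["tgt"][e]]
        for e in G["edges"]
    )


# Sends edge e to p and its endpoints to the endpoints of p: natural.
good = is_natural(G, H, {"a": "x", "b": "y"}, {"e": "p"})

# Sends b to z although the image edge p ends at y: not natural.
bad = is_natural(G, H, {"a": "x", "b": "z"}, {"e": "p"})
```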
Taxonomy of graph elements
● Vertex: “no data”.
● Edge: ordered pair of elements.
● Property: ordered pair (element, value).
○ Vertex property: element is a vertex.
○ Edge property: element is an edge.
○ Meta-property: element is another property.
■ Vertex / edge meta-property, etc.
● Alias: contains another element
○ Vertex or edge “tags”
○ Aliases for primitive data types
● Hyperelement: everything else
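The taxonomy above can be read as a classification of labels by the shape of their types. The following sketch assumes a hypothetical schema encoding in which each label maps to a tuple of components, and each component is either a reference to another label or a primitive data type; the encoding and all label names are illustrative and are not the paper's notation.

```python
LABEL, DATA = "label", "data"

# Hypothetical schema: each label maps to a product of components.
schema = {
    "Person": (),                                         # unit type: no data
    "knows": ((LABEL, "Person"), (LABEL, "Person")),      # pair of elements
    "name": ((LABEL, "Person"), (DATA, "String")),        # (vertex, value)
    "weight": ((LABEL, "knows"), (DATA, "Double")),       # (edge, value)
    "unit": ((LABEL, "weight"), (DATA, "String")),        # (property, value)
    "Friend": ((LABEL, "Person"),),                       # wraps one element
    "Trip": ((LABEL, "Person"), (LABEL, "Person"), (LABEL, "Person")),
}


def classify(label):
    """Classify a label following the taxonomy of graph elements."""
    parts = schema[label]
    if parts == ():
        return "vertex"
    if len(parts) == 1:
        return "alias"
    if len(parts) == 2 and parts[0][0] == LABEL:
        if parts[1][0] == LABEL:
            return "edge"
        owner = classify(parts[0][1])
        if owner == "vertex":
            return "vertex property"
        if owner == "edge":
            return "edge property"
        if owner.endswith("property"):
            return "meta-property"
    # Everything that matches no case above.
    return "hyperelement"
```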
Useful operations on graphs, schemas, and data
● Conjunctive and disjunctive queries (select/from/where/union)
○ In: APGs and a predicate. Out: APG
● Views
○ In: APG and APG schema mapping. Out: ‘virtual’ APG
● Convert APGs to algebraic databases, and vice versa
○ In: Schema mapping and APG / algebraic database. Out: algebraic database / APG.
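As a loose illustration of the last two kinds of operation, the sketch below groups the elements of a tiny graph into one table per label and then runs a select/where over one table. The element encoding, `to_tables`, and `select` are assumptions made for this example; the actual constructions in the paper are driven by schema mappings and are more general.

```python
# Hypothetical APG data: element id -> (label, value).
# Values refer to other elements by id or hold plain data.
elements = {
    "v1": ("Person", ()),
    "v2": ("Person", ()),
    "e1": ("knows", ("v1", "v2")),
    "p1": ("name", ("v1", "marko")),
    "p2": ("name", ("v2", "vadas")),
}


def to_tables(elements):
    """Group elements by label: one table per label, one row per element.
    Each row is the element id followed by the components of its value."""
    tables = {}
    for element_id, (label, value) in sorted(elements.items()):
        tables.setdefault(label, []).append((element_id, *value))
    return tables


def select(table, predicate):
    """A select/where over one table: keep only rows satisfying the predicate."""
    return [row for row in table if predicate(row)]


tables = to_tables(elements)
named_marko = select(tables["name"], lambda row: row[2] == "marko")
```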
Thanks
● joshsh@uber.com
● ryan@conexus.ai
https://arxiv.org/abs/1909.04881
(submitted to ICDT)

An Algebraic Data Model for Graphs and Hypergraphs (Category Theory meetup, November 2019)
