SlideShare a Scribd company logo
1 of 29
Download to read offline
Anything-to-Graph
Knowledge Graph Conference
May 2021
Joshua Shinavier, PhD ( : joshsh)
Data @
Overview
○ Building graphs
○ Models
○ Mappings
○ Use cases
○ TinkerPop 4
Building graphs
There are graphs, and graphs
○ Domain-specific graphs are simplest
○ ETL a few data sources into a graph
○ Mappings can be written and maintained by hand
○ Off-the-shelf tools work well
○ Difficulty increases with
○ Complexity of source schemas
○ Diversity of ownership, quality, and governance in data sources
○ Lack of standardization on languages and vocabulary
○ Some challenges are organizational, others technical
○ Organizational: see Lessons From Reality (KGC 2019)
○ Technical: let’s take a closer look
The challenges of heterogeneity
○ Diverse data sources → need supporting metadata
○ E.g. see Uber’s Databook
○ Diverse data governance → need better data isolation
○ Typically, RDF triple stores do better than PG databases here
○ Diverse domain data models → need standardized schemas
○ E.g. see Uber’s Data Standardization
○ Diverse schema and data languages → need well-defined mappings
○ Enter the Dragon
Prerequisites
○ Data must conform to a schema
○ It is OK to have a mix of schemas, and schema languages
○ Unique identifiers must be clearly distinguished, and typed
○ What is this UUID field? Does it identify a User, a Document, etc.?
○ Some degree of standardization is needed
○ Are identifiers consistent across data sources?
○ Can this timestamp value be compared with that timestamp value? Etc.
○ Need well-defined mappings
○ From each data language into a graph format
○ From each schema language into a graph schema language
○ ...without losing too much information
○ ...and while maintaining consistency between schema and data
Models
Is there a “universal data model” for graphs?
○ Desirable characteristics
○ Centrality: ease of alignment with other graph and non-graph models
○ Flexibility: captures a wide variety of graph structures
○ Formality: serves as a tool for reasoning about data models
○ Practicality: serves as a basis for inference, query optimization, etc.
○ Intuitiveness: a good graph schema captures our mental model
○ Where can the right abstractions be found?
○ Logics? Set theory? Algebras? Category theory?
○ Does the data model already exist?
○ What about RDF-based languages (RDFS, OWL, SHACL, ShEx, etc.)?
○ Is that what GQL is going to be?
Graph features are very diverse
Spreadsheet by Gábor Szárnyas
Complexity is not the answer
○ No one language will ever satisfy all graph use cases
○ Implementing the model must be simple, or developers won’t
○ Who remembers CODASYL? Why aren’t we using it?
○ Go for minimalism + extensibility
○ Dimensions being explored for GQL
○ Mathematical foundations
○ Nominal vs. structural typing
○ Inheritance vs. composition
○ Graph element types vs. property types
○ Descriptive vs. prescriptive schemas
A little algebra goes a long way
○ Algebraic Property Graphs1
○ Pragmatic and opinionated graph data model
○ Schemas are vertex/edge/etc. labels bound to algebraic data types
○ Formally defined using category theory
○ Effectively a subset of SHACL
○ Ideally, GQL will also be a superset
[1] https://arxiv.org/abs/1909.04881
Mappings
Desirable properties
○ Composability
○ If we have f : A → B and g : B → C, we should also have f;g : A → C
○ Bidirectionality
○ If we have f : A → B, we should also have f-1
: B → A, with f;f-1
≅ f-1
;f ≅ id
○ Data : schema consistency
○ Mappings should be defined for data and schemas languages in parallel
○ If data d conforms to schema s, then we need f(d) to conform to f(s)
Topology of mappings
○ Arbitrary / ad-hoc topology
○ Pros: seems easiest; just use the pairwise mappings you have
○ Cons: more indirection, and mappings may not compose well
○ Star topology
○ Pros: mappings compose well, and all paths are short
○ Cons: need to define/identify a central data model (the “dragon”)
Transforming schemas and data
○ Schema-level mappings
○ Transformations on data types
○ Data-level mappings
○ Transformations on instances of data types
○ Schema and data-level mappings must be consistent
○ a :: A ⇒ F(a) :: F(A)
○ Use common intermediate representations
Dragon
○ Framework for data and schema migration
○ Developed for Uber’s Data Standardization
○ Dragon data model
○ Extension of Algebraic Property Graphs
○ Covers the majority of schemas at Uber
○ Sources and targets
○ RPC / interface languages: Protocol Buffers, Apache Thrift, Apache Avro
○ RDF-based languages: SHACL, OWL
○ “Schemaless” formats: YAML, JSON
○ Programming languages: Haskell, Java, Scala
○ Implementations in Haskell and Java
○ Dragon generates over half of its own source code (from YAML)
$ dragon transform -i Protobuf -o SHACL $SCHEMAS -d maps /tmp/shacl
______. ______/ .______
Dragon 0.3.4 ----___ O .. O ___----
---------------------vvv----vvvv----vvv---------------------
Loading configuration from dragon.yaml
Reading from Protobuf starting at maps in base directory /Users/joshsh/projects/uber/example/idl
Found 3 *.proto sources in /Users/joshsh/projects/uber/example/idl/maps
Visiting 3 files (total of 0 so far):
/Users/joshsh/projects/uber/example/idl/maps/coordinates.proto
/Users/joshsh/projects/uber/example/idl/maps/geometry.proto
/Users/joshsh/projects/uber/example/idl/maps/position.proto
Visiting 3 files (total of 3 so far):
/Users/joshsh/projects/uber/example/idl/physics/units.proto
/Users/joshsh/projects/uber/example/idl/time/duration.proto
/Users/joshsh/projects/uber/example/idl/time/epoch.proto
Instantiated a graph of 12 schemas
Note 1 alert:
info: unsigned integers not supported for RDF targets. Using signed integers
Writing 12 SHACL artifacts to /tmp/shacl
Use cases
JSON-to-Graph
○ Graph data formats are used nowhere in the enterprise
○ RDF serialization formats, GraphML, etc. are a hard sell
○ JSON and YAML are used everywhere
○ Give the JSON or YAML a schema, then treat it like a serialized graph
○ E.g. Databook1
snapshots to RDF
[1] https://eng.uber.com/metadata-insights-databook
gRPC-to-Graph
○ Don’t let developers write individual edges / triples to the graph
○ Schema constraints are harder to enforce, errors are harder to understand
○ Transform graph schemas into developer-facing schemas
○ Use familiar schema languages and protocols like Protobuf + gRPC
○ Transform payload messages into graph data
○ Schema and data transformations guarantee constraint satisfaction
Enriching domain schemas
○ Need a little more than “just Protobuf” / “just Thrift” to build a big graph
○ E.g. entity / identifier types need to be widely re-used
○ Start with a graph schema (e.g. in YAML)
○ Propagate logical data types into domain schema languages
○ At Uber: Protobuf, Thrift, Avro
○ Re-use the logical types in many domain schemas
○ Incentivize re-use, and use schema transformations for metrics
○ Now, graph-friendly types are everywhere
Toward TinkerPop 4
○ Alibaba Graph Database
○ Amazon Neptune
○ ArangoDB
○ Bitsy
○ Blazegraph
○ CosmosDB
○ ChronoGraph
○ DSEGraph
○ GRAKN.AI
○ Hadoop (Spark)
○ HGraphDB
○ Huawei Graph Engine Service
○ IBM Graph
○ JanusGraph
○ Neo4j
○ neo4j-gremlin-bolt
○ OrientDB
○ Apache S2Graph
○ Sqlg
○ Stardog
○ TinkerGraph
○ Titan
○ Titan + Tupl
○ Unipop
Dozens of graph systems
○ Clojure: ogre
○ Cypher: cypher-for-gremlin
○ Elixir: gremlex
○ Go: grammes, gremgo
○ Haskell: greskell, gremlin-haskell
○ Java: Ferma, gremlin-objects,
Peapod, spring-data-gremlin,
gremlin-driver
○ JavaScript: gremlin-javascript,
gremlin-orm,
gremlin-template-string
○ Kotlin: kotlin-gremlin-ogm
○ .NET: Gremlin.Net, Gremlinq
○ PHP: gremlin-php
○ Python: Goblin, gremlin-python,
gremlin-py, ipython-gremlin,
gremlinclient, gremlin-python, JUGRI,
gremlinrestclient, python-gremlin-rest
○ Ruby: gremlin_client
○ Rust: gremlin-rs
○ Scala: gremlin-scala,
reactive-gremlin,
scalajs-gremlin-client
○ SPARQL: sparql-gremlin
○ SQL: sql-gremlin
○ Typescript: ts-tinkerpop
Dozens of programming languages
Interoperability via mappings
○ We need parity across languages and systems
○ Make it easier to create new Gremlin language variants in a consistent way
○ “Escape from the JVM”
○ Code generation can help
○ Generate graph APIs consistently into many programming languages
○ Generate grammars for parsing/validation
A unifying schema language
○ A few vendor-specific schema languages exist
○ Weak and disconnected from each other
○ Also note TinkerPop’s Graph.Features
○ Need a real, vendor-neutral schema language
○ Facilitate composability of data and queries
○ Enable inference for type safety and optimizations
○ Propagate logical schemas into multiple graph back-ends
○ Build vendor-agnostic tools for data migration
Serialization formats for graphs
○ Currently serialization options are limited
○ GraphML (XML), GraphSON (JSON), GraphBinary, Gryo (Kryo)
○ Generic JSON and YAML are straightforward
○ What about common RPC formats?
○ Protobuf, Thrift, Avro, etc.
○ What about RDF serialization formats?
○ N-Triples, Turtle, JSON-LD, etc.
○ Others?
○ Parquet? CBOR? FlatBuffers? MessagePack? Etc.
Stay tuned
○ “How to Build a Dragon”
○ https://www.meetup.com/Category-Theory
○ Release Dragon!
○ http://bit.ly/release_dragon
○ Gremlin-users
○ https://groups.google.com/g/gremlin-users
@KGConference
linkedin.com/company/the-knowldge-graph-conference/
youtube.com/playlist?list=PLAiy7NYe9U2Gjg-600CTV1HGypiF95d_D
@joshsh
Proprietary © 2021 Uber Technologies, Inc. All rights reserved. No part of this document may be reproduced or utilized in any
form or by any means, electronic or mechanical, including photocopying, recording, or by any information storage or retrieval
systems, without permission in writing from Uber. This document is intended only for the use of the individual or entity to
whom it is addressed and contains information that is privileged, confidential or otherwise exempt from disclosure under
applicable law. All recipients of this document are notified that the information contained herein includes proprietary and
confidential information of Uber, and recipient may not make use of, disseminate, or in any way disclose this document or any
of the enclosed information to any person other than employees of addressee to the extent necessary for consultations with
authorized personnel of Uber.

More Related Content

What's hot

Data platform architecture principles - ieee infrastructure 2020
Data platform architecture principles - ieee infrastructure 2020Data platform architecture principles - ieee infrastructure 2020
Data platform architecture principles - ieee infrastructure 2020Julien Le Dem
 
Introduction of Knowledge Graphs
Introduction of Knowledge GraphsIntroduction of Knowledge Graphs
Introduction of Knowledge GraphsJeff Z. Pan
 
Debunking some “RDF vs. Property Graph” Alternative Facts
Debunking some “RDF vs. Property Graph” Alternative FactsDebunking some “RDF vs. Property Graph” Alternative Facts
Debunking some “RDF vs. Property Graph” Alternative FactsNeo4j
 
Observability for Data Pipelines With OpenLineage
Observability for Data Pipelines With OpenLineageObservability for Data Pipelines With OpenLineage
Observability for Data Pipelines With OpenLineageDatabricks
 
Introduction to Knowledge Graphs: Data Summit 2020
Introduction to Knowledge Graphs: Data Summit 2020Introduction to Knowledge Graphs: Data Summit 2020
Introduction to Knowledge Graphs: Data Summit 2020Enterprise Knowledge
 
Introduction to Knowledge Graphs and Semantic AI
Introduction to Knowledge Graphs and Semantic AIIntroduction to Knowledge Graphs and Semantic AI
Introduction to Knowledge Graphs and Semantic AISemantic Web Company
 
CIDOC CRM Tutorial
CIDOC CRM TutorialCIDOC CRM Tutorial
CIDOC CRM TutorialISLCCIFORTH
 
Optimizing the Supply Chain with Knowledge Graphs, IoT and Digital Twins_Moor...
Optimizing the Supply Chain with Knowledge Graphs, IoT and Digital Twins_Moor...Optimizing the Supply Chain with Knowledge Graphs, IoT and Digital Twins_Moor...
Optimizing the Supply Chain with Knowledge Graphs, IoT and Digital Twins_Moor...Neo4j
 
stackconf 2021 | Weaviate Vector Search Engine – Introduction
stackconf 2021 | Weaviate Vector Search Engine – Introductionstackconf 2021 | Weaviate Vector Search Engine – Introduction
stackconf 2021 | Weaviate Vector Search Engine – IntroductionNETWAYS
 
Apache Spark in Depth: Core Concepts, Architecture & Internals
Apache Spark in Depth: Core Concepts, Architecture & InternalsApache Spark in Depth: Core Concepts, Architecture & Internals
Apache Spark in Depth: Core Concepts, Architecture & InternalsAnton Kirillov
 
Knowledge Graphs - The Power of Graph-Based Search
Knowledge Graphs - The Power of Graph-Based SearchKnowledge Graphs - The Power of Graph-Based Search
Knowledge Graphs - The Power of Graph-Based SearchNeo4j
 
Relational Database to RDF (RDB2RDF)
Relational Database to RDF (RDB2RDF)Relational Database to RDF (RDB2RDF)
Relational Database to RDF (RDB2RDF)EUCLID project
 
Choosing the Right Graph Database to Succeed in Your Project
Choosing the Right Graph Database to Succeed in Your ProjectChoosing the Right Graph Database to Succeed in Your Project
Choosing the Right Graph Database to Succeed in Your ProjectOntotext
 
Battle of the Stream Processing Titans – Flink versus RisingWave
Battle of the Stream Processing Titans – Flink versus RisingWaveBattle of the Stream Processing Titans – Flink versus RisingWave
Battle of the Stream Processing Titans – Flink versus RisingWaveYingjun Wu
 
An Introduction to SPARQL
An Introduction to SPARQLAn Introduction to SPARQL
An Introduction to SPARQLOlaf Hartig
 
Introduction to Neo4j for the Emirates & Bahrain
Introduction to Neo4j for the Emirates & BahrainIntroduction to Neo4j for the Emirates & Bahrain
Introduction to Neo4j for the Emirates & BahrainNeo4j
 
Vectorized UDF: Scalable Analysis with Python and PySpark with Li Jin
Vectorized UDF: Scalable Analysis with Python and PySpark with Li JinVectorized UDF: Scalable Analysis with Python and PySpark with Li Jin
Vectorized UDF: Scalable Analysis with Python and PySpark with Li JinDatabricks
 
SQL vs. NoSQL Databases
SQL vs. NoSQL DatabasesSQL vs. NoSQL Databases
SQL vs. NoSQL DatabasesOsama Jomaa
 
Big Pharma Problems. Big Graphs: Creating the Merck Manufacturing Mesh
Big Pharma Problems. Big Graphs: Creating the Merck Manufacturing MeshBig Pharma Problems. Big Graphs: Creating the Merck Manufacturing Mesh
Big Pharma Problems. Big Graphs: Creating the Merck Manufacturing MeshNeo4j
 

What's hot (20)

Data platform architecture principles - ieee infrastructure 2020
Data platform architecture principles - ieee infrastructure 2020Data platform architecture principles - ieee infrastructure 2020
Data platform architecture principles - ieee infrastructure 2020
 
Introduction of Knowledge Graphs
Introduction of Knowledge GraphsIntroduction of Knowledge Graphs
Introduction of Knowledge Graphs
 
Debunking some “RDF vs. Property Graph” Alternative Facts
Debunking some “RDF vs. Property Graph” Alternative FactsDebunking some “RDF vs. Property Graph” Alternative Facts
Debunking some “RDF vs. Property Graph” Alternative Facts
 
Observability for Data Pipelines With OpenLineage
Observability for Data Pipelines With OpenLineageObservability for Data Pipelines With OpenLineage
Observability for Data Pipelines With OpenLineage
 
Introduction to Knowledge Graphs: Data Summit 2020
Introduction to Knowledge Graphs: Data Summit 2020Introduction to Knowledge Graphs: Data Summit 2020
Introduction to Knowledge Graphs: Data Summit 2020
 
Introduction to Knowledge Graphs and Semantic AI
Introduction to Knowledge Graphs and Semantic AIIntroduction to Knowledge Graphs and Semantic AI
Introduction to Knowledge Graphs and Semantic AI
 
CIDOC CRM Tutorial
CIDOC CRM TutorialCIDOC CRM Tutorial
CIDOC CRM Tutorial
 
Optimizing the Supply Chain with Knowledge Graphs, IoT and Digital Twins_Moor...
Optimizing the Supply Chain with Knowledge Graphs, IoT and Digital Twins_Moor...Optimizing the Supply Chain with Knowledge Graphs, IoT and Digital Twins_Moor...
Optimizing the Supply Chain with Knowledge Graphs, IoT and Digital Twins_Moor...
 
stackconf 2021 | Weaviate Vector Search Engine – Introduction
stackconf 2021 | Weaviate Vector Search Engine – Introductionstackconf 2021 | Weaviate Vector Search Engine – Introduction
stackconf 2021 | Weaviate Vector Search Engine – Introduction
 
Apache Spark in Depth: Core Concepts, Architecture & Internals
Apache Spark in Depth: Core Concepts, Architecture & InternalsApache Spark in Depth: Core Concepts, Architecture & Internals
Apache Spark in Depth: Core Concepts, Architecture & Internals
 
Knowledge Graphs - The Power of Graph-Based Search
Knowledge Graphs - The Power of Graph-Based SearchKnowledge Graphs - The Power of Graph-Based Search
Knowledge Graphs - The Power of Graph-Based Search
 
Relational Database to RDF (RDB2RDF)
Relational Database to RDF (RDB2RDF)Relational Database to RDF (RDB2RDF)
Relational Database to RDF (RDB2RDF)
 
Choosing the Right Graph Database to Succeed in Your Project
Choosing the Right Graph Database to Succeed in Your ProjectChoosing the Right Graph Database to Succeed in Your Project
Choosing the Right Graph Database to Succeed in Your Project
 
Battle of the Stream Processing Titans – Flink versus RisingWave
Battle of the Stream Processing Titans – Flink versus RisingWaveBattle of the Stream Processing Titans – Flink versus RisingWave
Battle of the Stream Processing Titans – Flink versus RisingWave
 
Graph based data models
Graph based data modelsGraph based data models
Graph based data models
 
An Introduction to SPARQL
An Introduction to SPARQLAn Introduction to SPARQL
An Introduction to SPARQL
 
Introduction to Neo4j for the Emirates & Bahrain
Introduction to Neo4j for the Emirates & BahrainIntroduction to Neo4j for the Emirates & Bahrain
Introduction to Neo4j for the Emirates & Bahrain
 
Vectorized UDF: Scalable Analysis with Python and PySpark with Li Jin
Vectorized UDF: Scalable Analysis with Python and PySpark with Li JinVectorized UDF: Scalable Analysis with Python and PySpark with Li Jin
Vectorized UDF: Scalable Analysis with Python and PySpark with Li Jin
 
SQL vs. NoSQL Databases
SQL vs. NoSQL DatabasesSQL vs. NoSQL Databases
SQL vs. NoSQL Databases
 
Big Pharma Problems. Big Graphs: Creating the Merck Manufacturing Mesh
Big Pharma Problems. Big Graphs: Creating the Merck Manufacturing MeshBig Pharma Problems. Big Graphs: Creating the Merck Manufacturing Mesh
Big Pharma Problems. Big Graphs: Creating the Merck Manufacturing Mesh
 

Similar to Build Graphs from Diverse Data Sources

HPEC 2021 sparse binary format
HPEC 2021 sparse binary formatHPEC 2021 sparse binary format
HPEC 2021 sparse binary formatErikWelch2
 
Find your way in Graph labyrinths
Find your way in Graph labyrinthsFind your way in Graph labyrinths
Find your way in Graph labyrinthsDaniel Camarda
 
Substrait Overview.pdf
Substrait Overview.pdfSubstrait Overview.pdf
Substrait Overview.pdfRinat Abdullin
 
Brett Ragozzine - Graph Databases and Neo4j
Brett Ragozzine - Graph Databases and Neo4jBrett Ragozzine - Graph Databases and Neo4j
Brett Ragozzine - Graph Databases and Neo4jBrett Ragozzine
 
Enterprise PHP Architecture through Design Patterns and Modularization (Midwe...
Enterprise PHP Architecture through Design Patterns and Modularization (Midwe...Enterprise PHP Architecture through Design Patterns and Modularization (Midwe...
Enterprise PHP Architecture through Design Patterns and Modularization (Midwe...Aaron Saray
 
aRangodb, un package per l'utilizzo di ArangoDB con R
aRangodb, un package per l'utilizzo di ArangoDB con RaRangodb, un package per l'utilizzo di ArangoDB con R
aRangodb, un package per l'utilizzo di ArangoDB con RGraphRM
 
Ballerina Tutorial @ SummerSOC 2019
Ballerina Tutorial @ SummerSOC 2019Ballerina Tutorial @ SummerSOC 2019
Ballerina Tutorial @ SummerSOC 2019kshanth2101
 
How To Model and Construct Graphs with Oracle Database (AskTOM Office Hours p...
How To Model and Construct Graphs with Oracle Database (AskTOM Office Hours p...How To Model and Construct Graphs with Oracle Database (AskTOM Office Hours p...
How To Model and Construct Graphs with Oracle Database (AskTOM Office Hours p...Jean Ihm
 
Comparing PGQL, G-Core and Cypher
Comparing PGQL, G-Core and CypherComparing PGQL, G-Core and Cypher
Comparing PGQL, G-Core and CypheropenCypher
 
Overview of no sql
Overview of no sqlOverview of no sql
Overview of no sqlSean Murphy
 
Graph analysis over relational database
Graph analysis over relational databaseGraph analysis over relational database
Graph analysis over relational databaseGraphRM
 
Gremlin Queries with DataStax Enterprise Graph
Gremlin Queries with DataStax Enterprise GraphGremlin Queries with DataStax Enterprise Graph
Gremlin Queries with DataStax Enterprise GraphStephen Mallette
 
Neo4j Morpheus: Interweaving Table and Graph Data with SQL and Cypher in Apac...
Neo4j Morpheus: Interweaving Table and Graph Data with SQL and Cypher in Apac...Neo4j Morpheus: Interweaving Table and Graph Data with SQL and Cypher in Apac...
Neo4j Morpheus: Interweaving Table and Graph Data with SQL and Cypher in Apac...Databricks
 
Property graph vs. RDF Triplestore comparison in 2020
Property graph vs. RDF Triplestore comparison in 2020Property graph vs. RDF Triplestore comparison in 2020
Property graph vs. RDF Triplestore comparison in 2020Ontotext
 
The 2nd graph database in sv meetup
The 2nd graph database in sv meetupThe 2nd graph database in sv meetup
The 2nd graph database in sv meetupJoshua Bae
 
Graph databases & data integration v2
Graph databases & data integration v2Graph databases & data integration v2
Graph databases & data integration v2Dimitris Kontokostas
 
Apache Spark 101 - Demi Ben-Ari
Apache Spark 101 - Demi Ben-AriApache Spark 101 - Demi Ben-Ari
Apache Spark 101 - Demi Ben-AriDemi Ben-Ari
 
Machine Learning + Graph Databases for Better Recommendations V1 08/06/2022
Machine Learning + Graph Databases for Better Recommendations V1 08/06/2022Machine Learning + Graph Databases for Better Recommendations V1 08/06/2022
Machine Learning + Graph Databases for Better Recommendations V1 08/06/2022ArangoDB Database
 

Similar to Build Graphs from Diverse Data Sources (20)

HPEC 2021 sparse binary format
HPEC 2021 sparse binary formatHPEC 2021 sparse binary format
HPEC 2021 sparse binary format
 
Data quality in Real Estate
Data quality in Real EstateData quality in Real Estate
Data quality in Real Estate
 
Find your way in Graph labyrinths
Find your way in Graph labyrinthsFind your way in Graph labyrinths
Find your way in Graph labyrinths
 
Substrait Overview.pdf
Substrait Overview.pdfSubstrait Overview.pdf
Substrait Overview.pdf
 
Brett Ragozzine - Graph Databases and Neo4j
Brett Ragozzine - Graph Databases and Neo4jBrett Ragozzine - Graph Databases and Neo4j
Brett Ragozzine - Graph Databases and Neo4j
 
Enterprise PHP Architecture through Design Patterns and Modularization (Midwe...
Enterprise PHP Architecture through Design Patterns and Modularization (Midwe...Enterprise PHP Architecture through Design Patterns and Modularization (Midwe...
Enterprise PHP Architecture through Design Patterns and Modularization (Midwe...
 
aRangodb, un package per l'utilizzo di ArangoDB con R
aRangodb, un package per l'utilizzo di ArangoDB con RaRangodb, un package per l'utilizzo di ArangoDB con R
aRangodb, un package per l'utilizzo di ArangoDB con R
 
Ballerina Tutorial @ SummerSOC 2019
Ballerina Tutorial @ SummerSOC 2019Ballerina Tutorial @ SummerSOC 2019
Ballerina Tutorial @ SummerSOC 2019
 
Neo4j graph database
Neo4j graph databaseNeo4j graph database
Neo4j graph database
 
How To Model and Construct Graphs with Oracle Database (AskTOM Office Hours p...
How To Model and Construct Graphs with Oracle Database (AskTOM Office Hours p...How To Model and Construct Graphs with Oracle Database (AskTOM Office Hours p...
How To Model and Construct Graphs with Oracle Database (AskTOM Office Hours p...
 
Comparing PGQL, G-Core and Cypher
Comparing PGQL, G-Core and CypherComparing PGQL, G-Core and Cypher
Comparing PGQL, G-Core and Cypher
 
Overview of no sql
Overview of no sqlOverview of no sql
Overview of no sql
 
Graph analysis over relational database
Graph analysis over relational databaseGraph analysis over relational database
Graph analysis over relational database
 
Gremlin Queries with DataStax Enterprise Graph
Gremlin Queries with DataStax Enterprise GraphGremlin Queries with DataStax Enterprise Graph
Gremlin Queries with DataStax Enterprise Graph
 
Neo4j Morpheus: Interweaving Table and Graph Data with SQL and Cypher in Apac...
Neo4j Morpheus: Interweaving Table and Graph Data with SQL and Cypher in Apac...Neo4j Morpheus: Interweaving Table and Graph Data with SQL and Cypher in Apac...
Neo4j Morpheus: Interweaving Table and Graph Data with SQL and Cypher in Apac...
 
Property graph vs. RDF Triplestore comparison in 2020
Property graph vs. RDF Triplestore comparison in 2020Property graph vs. RDF Triplestore comparison in 2020
Property graph vs. RDF Triplestore comparison in 2020
 
The 2nd graph database in sv meetup
The 2nd graph database in sv meetupThe 2nd graph database in sv meetup
The 2nd graph database in sv meetup
 
Graph databases & data integration v2
Graph databases & data integration v2Graph databases & data integration v2
Graph databases & data integration v2
 
Apache Spark 101 - Demi Ben-Ari
Apache Spark 101 - Demi Ben-AriApache Spark 101 - Demi Ben-Ari
Apache Spark 101 - Demi Ben-Ari
 
Machine Learning + Graph Databases for Better Recommendations V1 08/06/2022
Machine Learning + Graph Databases for Better Recommendations V1 08/06/2022Machine Learning + Graph Databases for Better Recommendations V1 08/06/2022
Machine Learning + Graph Databases for Better Recommendations V1 08/06/2022
 

More from Joshua Shinavier

In Search of the Universal Data Model (ISWC 2019 Minute Madness)
In Search of the Universal Data Model (ISWC 2019 Minute Madness)In Search of the Universal Data Model (ISWC 2019 Minute Madness)
In Search of the Universal Data Model (ISWC 2019 Minute Madness)Joshua Shinavier
 
In Search of the Universal Data Model (Connected Data London 2019)
In Search of the Universal Data Model (Connected Data London 2019)In Search of the Universal Data Model (Connected Data London 2019)
In Search of the Universal Data Model (Connected Data London 2019)Joshua Shinavier
 
Algebraic Property Graphs (GQL Community Update, oct. 9, 2019)
Algebraic Property Graphs (GQL Community Update, oct. 9, 2019)Algebraic Property Graphs (GQL Community Update, oct. 9, 2019)
Algebraic Property Graphs (GQL Community Update, oct. 9, 2019)Joshua Shinavier
 
TinkerPop: a story of graphs, DBs, and graph DBs
TinkerPop: a story of graphs, DBs, and graph DBsTinkerPop: a story of graphs, DBs, and graph DBs
TinkerPop: a story of graphs, DBs, and graph DBsJoshua Shinavier
 
semantic markup using schema.org
semantic markup using schema.orgsemantic markup using schema.org
semantic markup using schema.orgJoshua Shinavier
 
The Real-time Web in the Age of Agents
The Real-time Web in the Age of AgentsThe Real-time Web in the Age of Agents
The Real-time Web in the Age of AgentsJoshua Shinavier
 
Real-time Semantic Web with Twitter Annotations
Real-time Semantic Web with Twitter AnnotationsReal-time Semantic Web with Twitter Annotations
Real-time Semantic Web with Twitter AnnotationsJoshua Shinavier
 
Real-time #SemanticWeb in 140 chars
Real-time #SemanticWeb in 140 charsReal-time #SemanticWeb in 140 chars
Real-time #SemanticWeb in 140 charsJoshua Shinavier
 
The state of the art in Linked Data
The state of the art in Linked DataThe state of the art in Linked Data
The state of the art in Linked DataJoshua Shinavier
 

More from Joshua Shinavier (11)

In Search of the Universal Data Model (ISWC 2019 Minute Madness)
In Search of the Universal Data Model (ISWC 2019 Minute Madness)In Search of the Universal Data Model (ISWC 2019 Minute Madness)
In Search of the Universal Data Model (ISWC 2019 Minute Madness)
 
In Search of the Universal Data Model (Connected Data London 2019)
In Search of the Universal Data Model (Connected Data London 2019)In Search of the Universal Data Model (Connected Data London 2019)
In Search of the Universal Data Model (Connected Data London 2019)
 
Algebraic Property Graphs (GQL Community Update, oct. 9, 2019)
Algebraic Property Graphs (GQL Community Update, oct. 9, 2019)Algebraic Property Graphs (GQL Community Update, oct. 9, 2019)
Algebraic Property Graphs (GQL Community Update, oct. 9, 2019)
 
TinkerPop: a story of graphs, DBs, and graph DBs
TinkerPop: a story of graphs, DBs, and graph DBsTinkerPop: a story of graphs, DBs, and graph DBs
TinkerPop: a story of graphs, DBs, and graph DBs
 
Semantics and Sensors
Semantics and SensorsSemantics and Sensors
Semantics and Sensors
 
semantic markup using schema.org
semantic markup using schema.orgsemantic markup using schema.org
semantic markup using schema.org
 
The Real-time Web in the Age of Agents
The Real-time Web in the Age of AgentsThe Real-time Web in the Age of Agents
The Real-time Web in the Age of Agents
 
Linked Process
Linked ProcessLinked Process
Linked Process
 
Real-time Semantic Web with Twitter Annotations
Real-time Semantic Web with Twitter AnnotationsReal-time Semantic Web with Twitter Annotations
Real-time Semantic Web with Twitter Annotations
 
Real-time #SemanticWeb in 140 chars
Real-time #SemanticWeb in 140 charsReal-time #SemanticWeb in 140 chars
Real-time #SemanticWeb in 140 chars
 
The state of the art in Linked Data
The state of the art in Linked DataThe state of the art in Linked Data
The state of the art in Linked Data
 

Recently uploaded

Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksSoftradix Technologies
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Alan Dix
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 
Snow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter RoadsSnow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter RoadsHyundai Motor Group
 
Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024Neo4j
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsAndrey Dotsenko
 
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024BookNet Canada
 
costume and set research powerpoint presentation
costume and set research powerpoint presentationcostume and set research powerpoint presentation
costume and set research powerpoint presentationphoebematthew05
 
Science&tech:THE INFORMATION AGE STS.pdf
Science&tech:THE INFORMATION AGE STS.pdfScience&tech:THE INFORMATION AGE STS.pdf
Science&tech:THE INFORMATION AGE STS.pdfjimielynbastida
 
APIForce Zurich 5 April Automation LPDG
APIForce Zurich 5 April  Automation LPDGAPIForce Zurich 5 April  Automation LPDG
APIForce Zurich 5 April Automation LPDGMarianaLemus7
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersThousandEyes
 

Recently uploaded (20)

Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other Frameworks
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
Snow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter RoadsSnow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter Roads
 
Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
 
costume and set research powerpoint presentation
costume and set research powerpoint presentationcostume and set research powerpoint presentation
costume and set research powerpoint presentation
 
Science&tech:THE INFORMATION AGE STS.pdf
Science&tech:THE INFORMATION AGE STS.pdfScience&tech:THE INFORMATION AGE STS.pdf
Science&tech:THE INFORMATION AGE STS.pdf
 
APIForce Zurich 5 April Automation LPDG
APIForce Zurich 5 April  Automation LPDGAPIForce Zurich 5 April  Automation LPDG
APIForce Zurich 5 April Automation LPDG
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
The transition to renewables in India.pdf
The transition to renewables in India.pdfThe transition to renewables in India.pdf
The transition to renewables in India.pdf
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 

Build Graphs from Diverse Data Sources

  • 1. Anything-to-Graph Knowledge Graph Conference May 2021 Joshua Shinavier, PhD ( : joshsh) Data @
  • 2. Overview ○ Building graphs ○ Models ○ Mappings ○ Use cases ○ TinkerPop 4
  • 4. There are graphs, and graphs ○ Domain-specific graphs are simplest ○ ETL a few data sources into a graph ○ Mappings can be written and maintained by hand ○ Off-the-shelf tools work well ○ Difficulty increases with ○ Complexity of source schemas ○ Diversity of ownership, quality, and governance in data sources ○ Lack of standardization on languages and vocabulary ○ Some challenges are organizational, others technical ○ Organizational: see Lessons From Reality (KGC 2019) ○ Technical: let’s take a closer look
  • 5. The challenges of heterogeneity ○ Diverse data sources → need supporting metadata ○ E.g. see Uber’s Databook ○ Diverse data governance → need better data isolation ○ Typically, RDF triple stores do better than PG databases here ○ Diverse domain data models → need standardized schemas ○ E.g. see Uber’s Data Standardization ○ Diverse schema and data languages → need well-defined mappings ○ Enter the Dragon
  • 6. Prerequisites ○ Data must conform to a schema ○ It is OK to have a mix of schemas, and schema languages ○ Unique identifiers must be clearly distinguished, and typed ○ What is this UUID field? Does it identify a User, a Document, etc.? ○ Some degree of standardization is needed ○ Are identifiers consistent across data sources? ○ Can this timestamp value be compared with that timestamp value? Etc. ○ Need well-defined mappings ○ From each data language into a graph format ○ From each schema language into a graph schema language ○ ...without losing too much information ○ ...and while maintaining consistency between schema and data
  • 8. Is there a “universal data model” for graphs? ○ Desirable characteristics ○ Centrality: ease of alignment with other graph and non-graph models ○ Flexibility: captures a wide variety of graph structures ○ Formality: serves as a tool for reasoning about data models ○ Practicality: serves as a basis for inference, query optimization, etc. ○ Intuitiveness: a good graph schema captures our mental model ○ Where can the right abstractions be found? ○ Logics? Set theory? Algebras? Category theory? ○ Does the data model already exist? ○ What about RDF-based languages (RDFS, OWL, SHACL, ShEx, etc.)? ○ Is that what GQL is going to be?
  • 9. Graph features are very diverse Spreadsheet by Gábor Szárnyas
  • 10. Complexity is not the answer ○ No one language will ever satisfy all graph use cases ○ Implementing the model must be simple, or developers won’t ○ Who remembers CODASYL? Why aren’t we using it? ○ Go for minimalism + extensibility ○ Dimensions being explored for GQL ○ Mathematical foundations ○ Nominal vs. structural typing ○ Inheritance vs. composition ○ Graph element types vs. property types ○ Descriptive vs. prescriptive schemas
  • 11. A little algebra goes a long way ○ Algebraic Property Graphs1 ○ Pragmatic and opinionated graph data model ○ Schemas are vertex/edge/etc. labels bound to algebraic data types ○ Formally defined using category theory ○ Effectively a subset of SHACL ○ Ideally, GQL will also be a superset [1] https://arxiv.org/abs/1909.04881
  • 13. Desirable properties ○ Composability ○ If we have f : A → B and g : B → C, we should also have f;g : A → C ○ Bidirectionality ○ If we have f : A → B, we should also have f-1 : B → A, with f;f-1 ≅ f-1 ;f ≅ id ○ Data : schema consistency ○ Mappings should be defined for data and schemas languages in parallel ○ If data d conforms to schema s, then we need f(d) to conform to f(s)
  • 14. Topology of mappings ○ Arbitrary / ad-hoc topology ○ Pros: seems easiest; just use the pairwise mappings you have ○ Cons: more indirection, and mappings may not compose well ○ Star topology ○ Pros: mappings compose well, and all paths are short ○ Cons: need to define/identify a central data model (the “dragon”)
  • 15. Transforming schemas and data ○ Schema-level mappings ○ Transformations on data types ○ Data-level mappings ○ Transformations on instances of data types ○ Schema and data-level mappings must be consistent ○ a :: A ⇒ F(a) :: F(A) ○ Use common intermediate representations
  • 16. Dragon ○ Framework for data and schema migration ○ Developed for Uber’s Data Standardization ○ Dragon data model ○ Extension of Algebraic Property Graphs ○ Covers the majority of schemas at Uber ○ Sources and targets ○ RPC / interface languages: Protocol Buffers, Apache Thrift, Apache Avro ○ RDF-based languages: SHACL, OWL ○ “Schemaless” formats: YAML, JSON ○ Programming languages: Haskell, Java, Scala ○ Implementations in Haskell and Java ○ Dragon generates over half of its own source code (from YAML)
  • 17. $ dragon transform -i Protobuf -o SHACL $SCHEMAS -d maps /tmp/shacl ______. ______/ .______ Dragon 0.3.4 ----___ O .. O ___---- ---------------------vvv----vvvv----vvv--------------------- Loading configuration from dragon.yaml Reading from Protobuf starting at maps in base directory /Users/joshsh/projects/uber/example/idl Found 3 *.proto sources in /Users/joshsh/projects/uber/example/idl/maps Visiting 3 files (total of 0 so far): /Users/joshsh/projects/uber/example/idl/maps/coordinates.proto /Users/joshsh/projects/uber/example/idl/maps/geometry.proto /Users/joshsh/projects/uber/example/idl/maps/position.proto Visiting 3 files (total of 3 so far): /Users/joshsh/projects/uber/example/idl/physics/units.proto /Users/joshsh/projects/uber/example/idl/time/duration.proto /Users/joshsh/projects/uber/example/idl/time/epoch.proto Instantiated a graph of 12 schemas Note 1 alert: info: unsigned integers not supported for RDF targets. Using signed integers Writing 12 SHACL artifacts to /tmp/shacl
  • 19. JSON-to-Graph ○ Graph data formats are used nowhere in the enterprise ○ RDF serialization formats, GraphML, etc. are a hard sell ○ JSON and YAML are used everywhere ○ Give the JSON or YAML a schema, then treat it like a serialized graph ○ E.g. Databook1 snapshots to RDF [1] https://eng.uber.com/metadata-insights-databook
  • 20. gRPC-to-Graph ○ Don’t let developers write individual edges / triples to the graph ○ Schema constraints are harder to enforce, errors are harder to understand ○ Transform graph schemas into developer-facing schemas ○ Use familiar schema languages and protocols like Protobuf + gRPC ○ Transform payload messages into graph data ○ Schema and data transformations guarantee constraint satisfaction
  • 21. Enriching domain schemas ○ Need a little more than “just Protobuf” / “just Thrift” to build a big graph ○ E.g. entity / identifier types need to be widely re-used ○ Start with a graph schema (e.g. in YAML) ○ Propagate logical data types into domain schema languages ○ At Uber: Protobuf, Thrift, Avro ○ Re-use the logical types in many domain schemas ○ Incentivize re-use, and use schema transformations for metrics ○ Now, graph-friendly types are everywhere
  • 23. ○ Alibaba Graph Database ○ Amazon Neptune ○ ArangoDB ○ Bitsy ○ Blazegraph ○ CosmosDB ○ ChronoGraph ○ DSEGraph ○ GRAKN.AI ○ Hadoop (Spark) ○ HGraphDB ○ Huawei Graph Engine Service ○ IBM Graph ○ JanusGraph ○ Neo4j ○ neo4j-gremlin-bolt ○ OrientDB ○ Apache S2Graph ○ Sqlg ○ Stardog ○ TinkerGraph ○ Titan ○ Titan + Tupl ○ Unipop Dozens of graph systems
  • 24. ○ Clojure: ogre ○ Cypher: cypher-for-gremlin ○ Elixir: gremlex ○ Go: grammes, gremgo ○ Haskell: greskell, gremlin-haskell ○ Java: Ferma, gremlin-objects, Peapod, spring-data-gremlin, gremlin-driver ○ JavaScript: gremlin-javascript, gremlin-orm, gremlin-template-string ○ Kotlin: kotlin-gremlin-ogm ○ .NET: Gremlin.Net, Gremlinq ○ PHP: gremlin-php ○ Python: Goblin, gremlin-python, gremlin-py, ipython-gremlin, gremlinclient, gremlin-python, JUGRI, gremlinrestclient, python-gremlin-rest ○ Ruby: gremlin_client ○ Rust: gremlin-rs ○ Scala: gremlin-scala, reactive-gremlin, scalajs-gremlin-client ○ SPARQL: sparql-gremlin ○ SQL: sql-gremlin ○ Typescript: ts-tinkerpop Dozens of programming languages
  • 25. Interoperability via mappings ○ We need parity across languages and systems ○ Make it easier to create new Gremlin language variants in a consistent way ○ “Escape from the JVM” ○ Code generation can help ○ Generate graph APIs consistently into many programming languages ○ Generate grammars for parsing/validation
  • 26. A unifying schema language ○ A few vendor-specific schema languages exist ○ Weak and disconnected from each other ○ Also note TinkerPop’s Graph.Features ○ Need a real, vendor-neutral schema language ○ Facilitate composability of data and queries ○ Enable inference for type safety and optimizations ○ Propagate logical schemas into multiple graph back-ends ○ Build vendor-agnostic tools for data migration
  • 27. Serialization formats for graphs ○ Currently serialization options are limited ○ GraphML (XML), GraphSON (JSON), GraphBinary, Gryo (Kryo) ○ Generic JSON and YAML are straightforward ○ What about common RPC formats? ○ Protobuf, Thrift, Avro, etc. ○ What about RDF serialization formats? ○ N-Triples, Turtle, JSON-LD, etc. ○ Others? ○ Parquet? CBOR? FlatBuffers? MessagePack? Etc.
  • 28. Stay tuned ○ “How to Build a Dragon” ○ https://www.meetup.com/Category-Theory ○ Release Dragon! ○ http://bit.ly/release_dragon ○ Gremlin-users ○ https://groups.google.com/g/gremlin-users @KGConference linkedin.com/company/the-knowldge-graph-conference/ youtube.com/playlist?list=PLAiy7NYe9U2Gjg-600CTV1HGypiF95d_D @joshsh
  • 29. Proprietary © 2021 Uber Technologies, Inc. All rights reserved. No part of this document may be reproduced or utilized in any form or by any means, electronic or mechanical, including photocopying, recording, or by any information storage or retrieval systems, without permission in writing from Uber. This document is intended only for the use of the individual or entity to whom it is addressed and contains information that is privileged, confidential or otherwise exempt from disclosure under applicable law. All recipients of this document are notified that the information contained herein includes proprietary and confidential information of Uber, and recipient may not make use of, disseminate, or in any way disclose this document or any of the enclosed information to any person other than employees of addressee to the extent necessary for consultations with authorized personnel of Uber.