TinkerPop 2020
Joshua Shinavier, Ph.D.
Global Graph Summit
Austin, Texas - January 25th, 2020
○ A brief history of Gremlin
○ Open problems
○ What’s next
A brief history of Gremlin
Long, long ago...
TinkerPop 0.x
○ “Making stuff for the fun of it...”
○ From RDF to the property graph data model
○ A Turing-complete path language for graphs
○ Ripple!
○ Oh, and Gremlin
○ Blueprints
○ “JDBC for graphs”
○ RDF ←→ PG support added early on
TinkerPop 0.x
○ Rexster
○ Server for Blueprints-enabled graphs
○ Predecessor of Gremlin Server
○ Pipes
○ Pull-based dataflow framework
○ Frames
○ Object-oriented graph interfaces using Java annotations
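The pull-based model behind Pipes can be illustrated, loosely, with Python generators: each stage asks its upstream stage for the next element only when its own consumer asks, so no work happens until the end of the chain is iterated. This is a minimal sketch, not the Pipes API; the `out_pipe`, `property_pipe`, and `filter_pipe` names and the adjacency-dict toy graph are hypothetical.

```python
from typing import Callable, Dict, Iterable, Iterator, List

# Hypothetical toy graph: vertex id -> outgoing neighbor ids, plus a name property.
ADJ: Dict[int, List[int]] = {1: [2, 3, 4], 2: [], 3: [], 4: [3, 5], 5: []}
NAME: Dict[int, str] = {1: "marko", 2: "vadas", 3: "lop", 4: "josh", 5: "ripple"}


def out_pipe(source: Iterable[int]) -> Iterator[int]:
    """Emit outgoing neighbors, pulling one upstream vertex at a time."""
    for v in source:
        yield from ADJ[v]


def property_pipe(source: Iterable[int], prop: Dict[int, str]) -> Iterator[str]:
    """Map each incoming vertex to a property value."""
    for v in source:
        yield prop[v]


def filter_pipe(source: Iterable[str], pred: Callable[[str], bool]) -> Iterator[str]:
    """Pass through only the elements that satisfy the predicate."""
    for x in source:
        if pred(x):
            yield x


# Building the chain computes nothing; iterating it pulls elements through every stage.
pipeline = filter_pipe(property_pipe(out_pipe(out_pipe([1])), NAME), lambda s: s != "lop")
print(list(pipeline))  # prints ['ripple']
```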
TinkerPop 1.x
○ Supported graph back-ends as of May 2012:
○ TinkerGraph (in-memory), Neo4j, OrientDB, DEX (Sparksee)
○ Blueprints adapters for
○ Sesame RDF (RDF4J), JUNG
TinkerPop 2.x
○ Furnace
○ Algorithms package built for property graphs
○ Predecessor of graph OLAP in TinkerPop3
○ New language ecosystems
○ Expansion of functionality on top of Blueprints
TinkerPop 3.x
○ Complete rewrite of TinkerPop
○ Focus on scale and performance
○ Symmetry between OLTP and OLAP
○ Gremlin becomes more central
○ Git mono-repo
○ Interfaces not only with graph DBs, but also with graph processors
Apache TinkerPop
○ Not-only-JVM
○ Gremlin in native programming languages
○ Now dozens of graph systems implementing TinkerPop
○ Third-party managed libraries and tools
Graph systems
○ Alibaba Graph Database
○ Amazon Neptune
○ ArangoDB
○ Bitsy
○ Blazegraph
○ CosmosDB
○ ChronoGraph
○ DSEGraph
○ GRAKN.AI
○ Hadoop (Spark)
○ HGraphDB
○ Huawei Graph Engine Service
○ IBM Graph
○ JanusGraph
○ Neo4j
○ neo4j-gremlin-bolt
○ OrientDB
○ Apache S2Graph
○ Sqlg
○ Stardog
○ TinkerGraph
○ Titan
○ Titan + Tupl
○ Unipop
Query languages, drivers, and GLVs
○ Clojure: ogre
○ Cypher: cypher-for-gremlin
○ Elixir: gremlex
○ Go: grammes, gremgo
○ Haskell: greskell, gremlin-haskell
○ Java: Ferma, gremlin-objects, Peapod, spring-data-gremlin, gremlin-driver
○ JavaScript: gremlin-javascript, gremlin-orm, gremlin-template-string
○ Kotlin: kotlin-gremlin-ogm
○ .NET: Gremlin.Net, Gremlinq
○ PHP: gremlin-php
○ Python: Goblin, gremlin-python, gremlin-py, ipython-gremlin, gremlinclient, gremlin-python, JUGRI, gremlinrestclient, python-gremlin-rest
○ Ruby: gremlin_client
○ Rust: gremlin-rs
○ Scala: gremlin-scala, reactive-gremlin, scalajs-gremlin-client
○ SPARQL: sparql-gremlin
○ SQL: sql-gremlin
○ TypeScript: ts-tinkerpop
Open problems
Escape from the JVM
○ TinkerPop originally 100% Java + Groovy
○ Still very JVM-heavy
○ Gremlin Server is Java-only
○ How to achieve parity across languages?
○ Ideally: complete Gremlin VM in every language ecosystem
○ Code generation?
○ How to generate both:
○ Clean APIs
○ Efficient runtime code
○ ...that fit together?
Making life easier for graph providers
○ Creating TinkerPop implementations
○ Currently a monolithic effort for each language / environment
○ How do we:
○ Ensure consistency across implementations?
○ Reduce the workload?
○ Thoughtful test suite
○ Rigorous in terms of correct operations
○ Does not force functionality that may not fit
○ Types and constraints may help
Network serialization formats
○ GraphML (XML)
○ Widely supported
○ Graphs only
○ GraphSON (JSON)
○ TinkerPop-specific
○ Graphs, elements, paths, etc.
○ Versions 1.0, 2.0, and 3.0
○ GraphBinary
○ TinkerPop-specific
○ Graphs, elements, paths, etc.
○ Good forward-compatibility
○ Gryo (Kryo)
○ JVM only
○ Bit of a format zoo
○ One format to rule them all?
○ Mappings between formats?
○ Will schemas help?
○ How about common RPC formats?
○ Thrift, Protobuf, Avro, etc.
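One reason the TinkerPop-specific formats exist is that plain JSON cannot tell a 64-bit integer from a double or a list from a set; GraphSON 2.0 and 3.0 address this by wrapping values in `@type`/`@value` objects. The sketch below imitates that convention for a few types only. The tag names `g:Int64`, `g:Double`, and `g:List` are written from memory of GraphSON 3.0 and should be checked against the TinkerPop IO documentation; this is not a complete or conformant serializer.

```python
import json
from typing import Any


def encode(value: Any) -> Any:
    """Wrap non-JSON-native values in @type/@value tags (GraphSON-style sketch)."""
    # bool is a subclass of int in Python, so it must be handled before int.
    if value is None or isinstance(value, (bool, str)):
        return value
    if isinstance(value, int):
        return {"@type": "g:Int64", "@value": value}
    if isinstance(value, float):
        return {"@type": "g:Double", "@value": value}
    if isinstance(value, list):
        return {"@type": "g:List", "@value": [encode(v) for v in value]}
    raise TypeError(f"unsupported type: {type(value).__name__}")


def decode(node: Any) -> Any:
    """Invert encode(), restoring Python types from the tags."""
    if isinstance(node, dict) and "@type" in node:
        tag, raw = node["@type"], node["@value"]
        if tag == "g:Int64":
            return int(raw)
        if tag == "g:Double":
            return float(raw)
        if tag == "g:List":
            return [decode(x) for x in raw]
        raise ValueError(f"unknown type tag: {tag}")
    return node


payload = json.dumps(encode([1, 2.5, "lop", True]))
print(decode(json.loads(payload)))
```

Every additional format in the zoo needs its own pair of functions like these, which is part of the argument for schemas and for mappings between formats.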
Schemas in TinkerPop
○ Property graphs:
○ Strong on intuitiveness
○ Historically weak on schema
○ Lightweight property graph schemas
○ E.g. in JanusGraph, Neo4j, basic Graph.Features
○ Stronger graph schemas
○ RDF triple stores, hypergraph databases, object databases, etc.
○ Schemas facilitate composability of data and queries
○ ...enabling optimizations, mappings, migration, other good stuff
○ What’s the best fit for TinkerPop?
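To make "lightweight property graph schema" concrete, here is a minimal sketch of one: declared property types per vertex label and declared endpoint labels per edge label, with checks that report violations. The `Schema` class and its method names are hypothetical and do not correspond to JanusGraph, Neo4j, or `Graph.Features`.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple, Type


@dataclass
class Schema:
    # vertex label -> {property key: expected Python type}
    vertex_props: Dict[str, Dict[str, Type]] = field(default_factory=dict)
    # edge label -> (out-vertex label, in-vertex label)
    edge_ends: Dict[str, Tuple[str, str]] = field(default_factory=dict)

    def check_vertex(self, label: str, props: Dict[str, object]) -> List[str]:
        """Return schema violations for a vertex (empty list means valid)."""
        allowed = self.vertex_props.get(label)
        if allowed is None:
            return [f"unknown vertex label {label!r}"]
        errors = []
        for key, val in props.items():
            expected = allowed.get(key)
            if expected is None:
                errors.append(f"{label}.{key} is not declared")
            elif not isinstance(val, expected):
                errors.append(f"{label}.{key} expects {expected.__name__}")
        return errors

    def check_edge(self, label: str, out_label: str, in_label: str) -> List[str]:
        """Return schema violations for an edge between labeled vertices."""
        ends = self.edge_ends.get(label)
        if ends is None:
            return [f"unknown edge label {label!r}"]
        if ends != (out_label, in_label):
            return [f"{label} must connect {ends[0]} -> {ends[1]}"]
        return []


schema = Schema(
    {"person": {"name": str, "age": int}, "software": {"name": str}},
    {"created": ("person", "software")},
)
print(schema.check_vertex("person", {"name": "marko", "age": "29"}))
print(schema.check_edge("created", "software", "person"))
```

Even this much structure is what enables the composability benefits listed above: a query or mapping can be checked against the declared labels before it touches any data.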
Getting transactions right
○ How to support diverse transactional models?
○ Neo4j is different from JanusGraph, which is different from...
○ Is there a unified approach to:
○ Threads + queries + transactions?
○ Transactional scope?
○ Transaction failures?
○ Nested transactions?
○ etc.
○ Will functional approaches to concurrency help?
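One reading of "functional approaches" is to make the transaction a value passed into a function rather than ambient thread-local state: the scope is exactly the function body, commit happens on normal return, and rollback happens when the function raises. The sketch below shows that shape against a hypothetical in-memory store; `ToyStore`, `Tx`, and `transactional` are illustrative names, not a TinkerPop API, and nested transactions and retries are not handled.

```python
from typing import Callable, Dict, TypeVar

T = TypeVar("T")


class ToyStore:
    """Hypothetical key-value store holding only committed data."""

    def __init__(self) -> None:
        self.committed: Dict[str, int] = {}


class Tx:
    """A transaction handle: writes are staged until commit."""

    def __init__(self, store: ToyStore) -> None:
        self.store = store
        self.staged: Dict[str, int] = {}

    def read(self, key: str) -> int:
        # Read-your-own-writes, falling back to committed state.
        return self.staged.get(key, self.store.committed.get(key, 0))

    def write(self, key: str, value: int) -> None:
        self.staged[key] = value


def transactional(store: ToyStore, work: Callable[[Tx], T]) -> T:
    """Run `work` in a fresh transaction; commit on return, discard on exception."""
    tx = Tx(store)
    result = work(tx)  # if this raises, tx.staged is never applied (rollback)
    store.committed.update(tx.staged)  # commit
    return result


store = ToyStore()
transactional(store, lambda tx: tx.write("count", tx.read("count") + 1))
print(store.committed)
```

Because the transaction never escapes the function, two threads cannot accidentally share one, which speaks directly to the "threads + queries + transactions" question.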
Static analysis for traversals
○ Stop supporting opaque traversals
○ Security issues
○ Portability issues
○ Need a replacement for closures/lambdas
○ “Just write Gremlin”
○ What additional features are required?
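If a traversal is plain data (a list of named steps with literal arguments), it can be inspected before execution; a closure, by contrast, is opaque to any analyzer. The sketch below flags steps outside a hypothetical allow-list and any argument that is a callable. The step tuples and `DECLARATIVE_STEPS` set are assumptions made for illustration, not TinkerPop's bytecode format.

```python
from typing import Any, List, Tuple

Step = Tuple[str, Tuple[Any, ...]]

# Hypothetical allow-list of declarative steps; anything else is treated as unknown.
DECLARATIVE_STEPS = {"V", "hasLabel", "has", "out", "in", "values", "count", "limit"}


def analyze(traversal: List[Step]) -> List[str]:
    """Return problems that would make the traversal unsafe or non-portable."""
    problems = []
    for i, (name, args) in enumerate(traversal):
        if name not in DECLARATIVE_STEPS:
            problems.append(f"step {i}: unknown step {name!r}")
        if any(callable(a) for a in args):
            problems.append(f"step {i}: {name}() takes an opaque closure")
    return problems


ok = [("V", ()), ("hasLabel", ("person",)), ("out", ("knows",)), ("values", ("name",))]
bad = [("V", ()), ("map", (lambda v: v,)), ("count", ())]
print(analyze(ok))
print(analyze(bad))
```

"Just write Gremlin" then amounts to growing the allow-list until closures are no longer needed, which is the open question on the slide.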
Graph stream processing
○ Much of the world’s data is streaming
○ Much of that data describes entities and relationships
○ Decades of research on relational stream processing
○ 10+ years of research on continuous SPARQL
○ What is continuous Gremlin?
○ (RDF)-[:betterThan]->(PG) for streaming
○ RDF stream := unbounded sequence of triples
○ Property graph stream := ?
○ Need schemas, global identifiers, set operations on graphs
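As one possible answer to "property graph stream := ?", the sketch below treats a stream as an unbounded sequence of add/remove events keyed by globally unique identifiers, applies them with set semantics (re-adding an existing edge is a no-op), and runs a toy continuous query that emits per-label edge counts after each event. The `Event` shape and operation names are assumptions for illustration, not a proposed standard.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator, Optional, Tuple


@dataclass(frozen=True)
class Event:
    op: str                      # "add_vertex", "add_edge", or "remove_edge"
    id: str                      # assumed to be a globally unique identifier
    label: str = ""
    out_v: Optional[str] = None
    in_v: Optional[str] = None


def edge_counts(stream: Iterable[Event]) -> Iterator[Dict[str, int]]:
    """Toy continuous query: yield per-label edge counts after every event."""
    vertices: Dict[str, str] = {}
    edges: Dict[str, Tuple[str, str, str]] = {}
    counts: Dict[str, int] = {}
    for ev in stream:
        if ev.op == "add_vertex":
            vertices[ev.id] = ev.label
        elif ev.op == "add_edge":
            if ev.out_v not in vertices or ev.in_v not in vertices:
                raise ValueError(f"edge {ev.id} references an unknown vertex")
            if ev.id not in edges:  # set semantics: duplicates are ignored
                edges[ev.id] = (ev.label, ev.out_v, ev.in_v)
                counts[ev.label] = counts.get(ev.label, 0) + 1
        elif ev.op == "remove_edge":
            if ev.id in edges:
                label = edges.pop(ev.id)[0]
                counts[label] -= 1
        else:
            raise ValueError(f"unknown operation {ev.op!r}")
        yield dict(counts)


events = [
    Event("add_vertex", "v1", "person"),
    Event("add_vertex", "v2", "person"),
    Event("add_edge", "e1", "knows", "v1", "v2"),
    Event("add_edge", "e1", "knows", "v1", "v2"),
    Event("remove_edge", "e1"),
]
for snapshot in edge_counts(events):
    print(snapshot)
```

Without global identifiers, the duplicate `add_edge` above could not be recognized as the same edge, which is why the slide lists them alongside schemas and set operations.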
The 10¹⁰-foot view
○ Abstractions
○ Data models, query languages, formal inference, transformations, embeddings
○ Graph + relational model, graph + streams, ...
○ Human and machine knowledge
○ Knowledge graphs: enterprise, personal, collaborative
○ Mental representations, representation learning, visualization and HCI, ...
○ Processing and performance
○ Graph ingestion, generation, partitioning, compression
○ Concurrent systems: parallel, distributed
○ Graph analytics, hardware acceleration, benchmarks and metrics, ...
What’s next
From Graph.Features to a real type system
○ No existing standard for property graphs
○ Recent community efforts
○ W3C Workshop on Web Standardization for Graph Data (March 2019)
○ Property Graph Schema Working Group (PGSWG)
○ Graph Query Language (GQL)
○ Don’t forget about external data models
○ Relational model
○ RDF and other graph models
○ Data interchange formats (Protocol Buffers, Thrift, Avro, etc.)
○ OO, ER, and semistructured data models
Taming the dragon (connecting 3+ data models)
Algebraic Property Graphs
○ Last year at Data Day...
○ A Graph is a Graph is a Graph
○ Composable and bidirectional mappings
○ Formal property graph data model
○ Taxonomy of graph elements
○ Use category theory for the model
○ Developed with Ryan Wisnesky (Conexus AI)
○ Implementations in Haskell and CQL
○ Minimal cover for enterprise data
○ Analogous features in graph and non-graph data models
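The idea of composable, bidirectional mappings can be shown without any category theory: pair each mapping with its inverse and define composition so that the result is again an invertible pair. The sketch below maps a hypothetical relational "knows" row to a property graph edge, and that edge to a JSON-style document, then composes the two. `Iso`, `Edge`, and the field names are illustrative assumptions; this is not the Algebraic Property Graphs implementation.

```python
from dataclasses import dataclass
from typing import Callable, Generic, TypeVar

A = TypeVar("A")
B = TypeVar("B")
C = TypeVar("C")


@dataclass(frozen=True)
class Iso(Generic[A, B]):
    """A mapping paired with its inverse."""
    forward: Callable[[A], B]
    backward: Callable[[B], A]

    def then(self, other: "Iso[B, C]") -> "Iso[A, C]":
        """Compose two mappings; the composite is still invertible."""
        return Iso(lambda a: other.forward(self.forward(a)),
                   lambda c: self.backward(other.backward(c)))


@dataclass(frozen=True)
class Edge:
    id: int
    label: str
    out_v: int
    in_v: int


# Hypothetical relational join-table row <-> property graph edge.
row_to_edge: Iso[dict, Edge] = Iso(
    lambda r: Edge(r["id"], "knows", r["person_a"], r["person_b"]),
    lambda e: {"id": e.id, "person_a": e.out_v, "person_b": e.in_v},
)

# Property graph edge <-> JSON-style document.
edge_to_doc: Iso[Edge, dict] = Iso(
    lambda e: {"@id": e.id, "label": e.label, "outV": e.out_v, "inV": e.in_v},
    lambda d: Edge(d["@id"], d["label"], d["outV"], d["inV"]),
)

row_to_doc = row_to_edge.then(edge_to_doc)
row = {"id": 7, "person_a": 1, "person_b": 2}
print(row_to_doc.forward(row))
```

Note that `row_to_edge` is only a true inverse pair for edges labeled "knows"; making such restrictions explicit in the types is exactly what a formal data model is for.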
Elements, labels, values, and types
Graph transformations
Building structure APIs
○ Vertices, edges, and properties
○ Special cases that can be derived from the type system
○ Graphs are different
○ Not described in terms of types
○ Graph API is often redundant in TinkerPop3
○ Structure APIs currently written by hand
○ In each language, for each Gremlin Language Variant
○ We can generate consistent interfaces across GLVs
○ Some tooling already exists
○ Build new tools if we want to make it easier
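Generating structure APIs from a schema, rather than writing them by hand per language, can be as simple as walking the schema and emitting source text. The sketch below emits one frozen Python dataclass per vertex label and compiles the result as a quick check; a second emitter targeting another language from the same `SCHEMA` would keep the interfaces consistent. The schema format and `generate_python` function are hypothetical.

```python
from typing import Dict

# Hypothetical schema: vertex label -> {property key: type name}.
SCHEMA: Dict[str, Dict[str, str]] = {
    "person": {"name": "str", "age": "int"},
    "software": {"name": "str", "lang": "str"},
}


def generate_python(schema: Dict[str, Dict[str, str]]) -> str:
    """Emit Python source with one frozen dataclass per vertex label."""
    lines = ["from dataclasses import dataclass", ""]
    for label, props in sorted(schema.items()):
        lines.append("@dataclass(frozen=True)")
        lines.append(f"class {label.capitalize()}:")
        lines.append("    id: object")
        for key, type_name in props.items():
            lines.append(f"    {key}: {type_name}")
        lines.append("")
    return "\n".join(lines)


source = generate_python(SCHEMA)
namespace: dict = {}
exec(source, namespace)  # compile the generated module to check that it is valid
Person = namespace["Person"]
print(Person(id=1, name="marko", age=29))
```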
Building process APIs
○ Need abstractions for graph processing
○ Steps, constraints, traversals
○ Freebie: every traversal has a graph representation
○ Graph programs as graph data
○ Generate process APIs for each GLV
○ Using a schema; analogous to generating structure APIs
○ Possible to also generate process implementations?
○ That would be great, but... TBD
○ Code gen options: Haskell? Idris? LLVM? Custom code...
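The "freebie" that every traversal has a graph representation can be sketched directly: give each step a vertex carrying its name and arguments, and link consecutive steps with `next` edges. Once a program is graph data, the same tools used on data graphs (queries, schemas, transformations) apply to programs. The step-tuple input and the vertex/edge layout below are assumptions for illustration, and only linear traversals are handled; nested traversals would need additional edge types.

```python
from typing import Any, Dict, List, Tuple

Step = Tuple[str, Tuple[Any, ...]]


def traversal_to_graph(steps: List[Step]) -> Tuple[Dict[int, Dict[str, Any]], List[Tuple[int, str, int]]]:
    """Encode a linear traversal as step vertices joined by 'next' edges."""
    vertices: Dict[int, Dict[str, Any]] = {}
    edges: List[Tuple[int, str, int]] = []
    for i, (name, args) in enumerate(steps):
        vertices[i] = {"label": "step", "name": name, "args": list(args)}
        if i > 0:
            edges.append((i - 1, "next", i))
    return vertices, edges


steps = [("V", ()), ("hasLabel", ("person",)), ("out", ("knows",)), ("values", ("name",))]
vertices, edges = traversal_to_graph(steps)

# Query the program as data: which steps take arguments?
print([v["name"] for v in vertices.values() if v["args"]])
```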
Abstractions for graph processing
○ Gremlin traversals are “like” monadic composition
○ Let’s make them properly monadic
○ Pure functional encapsulation of:
○ Side-effects, transactions, exception handling
○ Learn from existing functional approaches to Gremlin
○ Gremlin-Scala, Greskell, Gremlin-Haskell
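The sense in which traversals are "like" monadic composition can be made precise with the list monad: a step is a function from one element to a list of elements, `bind` applies a step to every current traverser and flattens the results, and Kleisli composition chains steps. The sketch below covers only branching; side effects, transactions, and error handling would need a richer monad, and Gremlin's actual semantics (bulking, ordering, barriers) are not modeled. The toy graph and function names are hypothetical.

```python
from typing import Callable, Dict, List, TypeVar

A = TypeVar("A")
B = TypeVar("B")
C = TypeVar("C")


def unit(x: A) -> List[A]:
    """Start with a single traverser (the list monad's 'return')."""
    return [x]


def bind(xs: List[A], f: Callable[[A], List[B]]) -> List[B]:
    """Apply a step to every traverser and flatten the results."""
    return [y for x in xs for y in f(x)]


def kleisli(f: Callable[[A], List[B]], g: Callable[[B], List[C]]) -> Callable[[A], List[C]]:
    """Compose two steps: run f, then g on each of f's results."""
    return lambda x: bind(f(x), g)


# Hypothetical toy graph.
KNOWS: Dict[str, List[str]] = {"marko": ["vadas", "josh"], "vadas": [], "josh": []}
CREATED: Dict[str, List[str]] = {"marko": ["lop"], "vadas": [], "josh": ["ripple", "lop"]}


def out_knows(v: str) -> List[str]:
    return KNOWS.get(v, [])


def out_created(v: str) -> List[str]:
    return CREATED.get(v, [])


# Similar in spirit to g.V().has('name', 'marko').out('knows').out('created')
friends_software = kleisli(out_knows, out_created)
print(friends_software("marko"))
```

Because `unit`, `bind`, and `kleisli` obey the monad laws, steps can be regrouped or rewritten without changing results, which is the kind of guarantee "properly monadic" traversals would provide.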
Mainstream languages for serialization
Transforming graph data and operations
○ Need a language for schema mappings
○ In theory, that gives us:
○ Automated query rewriting
○ Automated data migration
○ Mix-and-match operations
○ Easy, right...?
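The payoff of a schema-mapping language is that one mapping drives both data migration and query rewriting. The sketch below uses a deliberately tiny mapping (a label rename and a property rename) and applies it to a vertex record and to a traversal written as step tuples. `LABEL_MAP`, `PROPERTY_MAP`, and the step format are hypothetical, and only the `has(key, value)` form of `has` is rewritten; real mappings (splitting, merging, or restructuring elements) are where the "Easy, right...?" caveat bites.

```python
from typing import Any, Dict, List, Tuple

Step = Tuple[str, Tuple[Any, ...]]

# Hypothetical schema mapping from an old schema to a new one.
LABEL_MAP = {"person": "Person"}
PROPERTY_MAP = {"name": "fullName"}


def migrate_vertex(v: Dict[str, Any]) -> Dict[str, Any]:
    """Data migration: rewrite one vertex record into the new schema."""
    props = {PROPERTY_MAP.get(k, k): val for k, val in v["properties"].items()}
    return {"id": v["id"], "label": LABEL_MAP.get(v["label"], v["label"]), "properties": props}


def rewrite_query(steps: List[Step]) -> List[Step]:
    """Query rewriting: apply the same mapping to step arguments."""
    rewritten = []
    for name, args in steps:
        if name == "hasLabel":
            args = tuple(LABEL_MAP.get(a, a) for a in args)
        elif name == "values":
            args = tuple(PROPERTY_MAP.get(a, a) for a in args)
        elif name == "has" and len(args) == 2:  # only the has(key, value) form
            args = (PROPERTY_MAP.get(args[0], args[0]), args[1])
        rewritten.append((name, args))
    return rewritten


old_vertex = {"id": 1, "label": "person", "properties": {"name": "marko", "age": 29}}
old_query = [("V", ()), ("hasLabel", ("person",)), ("has", ("name", "marko")), ("values", ("name", "age"))]
print(migrate_vertex(old_vertex))
print(rewrite_query(old_query))
```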
Making a smooth transition
○ (TP3 → TP4) ≠ (TP2 → TP3)
○ Large user base, good support for TinkerPop3
○ Q: how do we:
○ Make new features useful to the current community
○ Make the migration to TinkerPop4 as seamless as possible
○ A: we try stuff out
○ “The revolution will be A/B tested”
○ Get involved!
○ gremlin-users@googlegroups.com
○ dev@tinkerpop.apache.org
Thanks!
Joshua Shinavier
joshsh@uber.com
With thanks to Stephen Mallette, Marko Rodriguez, Ketrina Yim, and the graph community
