Amundsen gremlin proxy design

Gremlins for Amundsen
08/13/2020

Content
● Gremlin Introduction
● Amundsen Gremlin Overview
● Lessons Learned
● Upstream Plan

● Graph traversal language
● Part of Apache TinkerPop
● Vertices and edges
● Widely supported

● Sample queries
g.V().hasLabel('airport').has('code', 'DFW')
g.V().has(Table, 'key', table_uri)
.outE().inV().hasLabel(Column).as_('column')
● Cypher equivalent
MATCH (:Airport {code: 'DFW'})
MATCH (:Table {key: $table_uri})
-[:COLUMN]->(column:Column)
● Curious to know more? See Practical Gremlin
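To make the second sample query concrete, here is a minimal plain-Python sketch of what each step does over a tiny in-memory graph. It is not Gremlin and not Amundsen code: the `Vertex` and `Edge` classes, the `columns_of` helper, and the sample keys and names are all assumptions made up for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Vertex:
    label: str
    properties: Dict[str, str]
    out_edges: List["Edge"] = field(default_factory=list)


@dataclass
class Edge:
    label: str
    in_vertex: Vertex  # the vertex this edge points to


def columns_of(vertices: List[Vertex], table_uri: str) -> List[Vertex]:
    """Plain-Python reading of:
    g.V().has('Table', 'key', table_uri).outE().inV().hasLabel('Column')
    """
    result: List[Vertex] = []
    for v in vertices:  # g.V(): start from every vertex
        # .has('Table', 'key', table_uri): filter on label and property
        if v.label != "Table" or v.properties.get("key") != table_uri:
            continue
        for e in v.out_edges:  # .outE(): follow outgoing edges
            target = e.in_vertex  # .inV(): step to the edge's target vertex
            if target.label == "Column":  # .hasLabel('Column')
                result.append(target)
    return result


# Hypothetical sample data: one table with two columns and one owner.
table = Vertex("Table", {"key": "hive://gold.test_schema/test_table"})
col_a = Vertex("Column", {"name": "id"})
col_b = Vertex("Column", {"name": "created_at"})
owner = Vertex("User", {"key": "someone@example.com"})
table.out_edges = [
    Edge("COLUMN", col_a),
    Edge("COLUMN", col_b),
    Edge("OWNER", owner),
]

found = columns_of(
    [table, col_a, col_b, owner], "hive://gold.test_schema/test_table"
)
print([c.properties["name"] for c in found])
```

The owner vertex is reachable through an outgoing edge but is dropped by the final label filter, which is the role `hasLabel(Column)` plays in the traversal.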

● Why build this?
○ Hosted graph
○ Online backups
○ Proxy is platform-agnostic*

● Amundsen
[Diagram: Amundsen architecture. Components shown: Metadata Sources (Postgres, Hive, Redshift, ..., Presto, GitHub source file), Databuilder Crawler, AWS Neptune, Elasticsearch, Metadata Service, Search Service, Frontend Service, Gremlin shared code]

● Gremlin shared code

● Metadata service
○ Gremlin proxy

● Abstract proxy tests
○ Construct one case, test against every* proxy
def test_rt_table(self) -> None:
    # Round trip: write a fixture table through the proxy, then read it back.
    expected = Fixtures.next_table()
    self.get_proxy().put_table(table=expected)
    actual: Table = self.get_proxy().get_table(table_uri=expected.key)
    self.assertEqual(expected, actual)
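The round-trip test above relies on `get_proxy()` being supplied by each concrete test class. A minimal, self-contained sketch of that pattern follows. Everything here other than the names `get_proxy`, `put_table`, `get_table`, and `test_rt_table` is an assumption for illustration: the simplified `Table`, the `Proxy` interface, the `InMemoryProxy` stand-in, and the inline fixture are hypothetical and do not reflect the real Amundsen classes.

```python
import unittest
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Table:
    """Hypothetical, heavily simplified stand-in for a table model."""
    key: str
    name: str


class Proxy(ABC):
    """Hypothetical minimal proxy interface."""

    @abstractmethod
    def put_table(self, *, table: Table) -> None: ...

    @abstractmethod
    def get_table(self, *, table_uri: str) -> Table: ...


class InMemoryProxy(Proxy):
    """Dict-backed proxy, used here in place of a real graph backend."""

    def __init__(self) -> None:
        self._tables: Dict[str, Table] = {}

    def put_table(self, *, table: Table) -> None:
        self._tables[table.key] = table

    def get_table(self, *, table_uri: str) -> Table:
        return self._tables[table_uri]


class AbstractProxyTest(ABC):
    """Test cases written once against the abstract interface.

    Not a TestCase itself, so the test loader does not collect it.
    """

    @abstractmethod
    def get_proxy(self) -> Proxy: ...

    def test_rt_table(self) -> None:
        expected = Table(key="db://cluster.schema/t1", name="t1")
        self.get_proxy().put_table(table=expected)
        actual = self.get_proxy().get_table(table_uri=expected.key)
        self.assertEqual(expected, actual)


class InMemoryProxyTest(AbstractProxyTest, unittest.TestCase):
    """One concrete subclass per proxy; only the proxy setup differs."""

    def setUp(self) -> None:
        self._proxy = InMemoryProxy()

    def get_proxy(self) -> Proxy:
        return self._proxy
```

Saved as a module, this can be run with `python -m unittest`. Adding another backend would then mean adding another small subclass that returns a different proxy from `get_proxy()`, while the test bodies stay shared.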

● Databuilder

Lessons Learned
● Failed experiments
○ Transactional gremlin for writes:
■ V only once - prefer V(id)
● g.V(id1).as_('one').V(id2).addE(label).from_('one')
■ Smaller traversals are better
■ Minimize coalesce() in write
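As a rough illustration of the "prefer V(id)" and "smaller traversals" points, the sketch below emits one short traversal per edge, addressing both vertices by id. It makes assumptions the slides do not state: it builds Gremlin-Groovy script strings (the slide's example uses the gremlin-python form with `as_` and `from_`), and the `quote`, `add_edge_script`, and `edge_scripts` helpers and the sample ids are hypothetical. In real code, parameter bindings or a language driver would usually be preferable to string building.

```python
from typing import Iterable, Iterator, Tuple

# (from_vertex_id, edge_label, to_vertex_id)
Edge = Tuple[str, str, str]


def quote(value: str) -> str:
    """Quote a value as a single-quoted Gremlin-Groovy string literal."""
    return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'"


def add_edge_script(edge: Edge) -> str:
    """Build one small traversal that adds a single edge between two ids.

    Mirrors the slide's shape:
    g.V(id1).as('one').V(id2).addE(label).from('one')
    """
    from_id, label, to_id = edge
    return (
        f"g.V({quote(from_id)}).as('one')"
        f".V({quote(to_id)}).addE({quote(label)}).from('one')"
    )


def edge_scripts(edges: Iterable[Edge]) -> Iterator[str]:
    """Yield one short traversal per edge instead of one large traversal."""
    for edge in edges:
        yield add_edge_script(edge)


# Hypothetical vertex ids and label.
edges = [
    ("table:t1", "COLUMN", "column:t1/id"),
    ("table:t1", "COLUMN", "column:t1/created_at"),
]
for script in edge_scripts(edges):
    print(script)
```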

Lessons Learned
● Failed experiments (cont)
○ SessionedClient
○ AWS Lambda write from Kinesis

Upstream Plan
TODAY
Internal refactoring
Consolidation of Gremlin code into a new shared amundsen-gremlin repository. Databuilder and the metadata service will use the shared code.
Approx. August 17
Stabilization
Improve stability and performance of the existing Gremlin code.
Approx. August 7
Ship to Amundsen
Clean up Square-specific bits of amundsen-gremlin and publish it. Publish the proxy and proxy tests that use amundsen-gremlin.
Approx. August 21

Thank you
Kudos to the rest of the Privacy Engineering team at Square who worked on this: Dan Simms, Alyssa Ransbury, Sarah Harvey, and Kat Hawthorne

Questions?
See also: amundsen-io issue 526
