MAD · NOV 22-23 · 2019
In a hyperconnected world,
graph databases
are your secret weapon
Javier Ramirez
Technical Evangelist. Amazon Web Services
Six degrees of Bacon
SAGIndie from Hollywood, USA - Flickr
CC BY 2.0
© 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
IF BY DEFAULT YOU THINK IN TABLES, YOU NEED
PROFESSIONAL HELP (OR SOME HOLIDAYS)
Purpose-built for a business process
Purpose-built to answer questions about
relationships
HIGHLY CONNECTED DATA
Retail Fraud Detection
Restaurant Recommendations
Social Networks
USE CASES FOR HIGHLY CONNECTED DATA
Social Networking
Recommendations
Knowledge Graphs
Fraud Detection
Life Sciences
Network & IT Operations
RECOMMENDATIONS BASED ON RELATIONSHIPS
KNOWLEDGE GRAPH APPLICATIONS
What museums should Alice
visit while in Paris?
Who painted the Mona Lisa?
What artists have paintings
in The Louvre?
NAVIGATE A WEB OF GLOBAL TAX POLICIES
“Our customers are increasingly required to navigate a complex web of global tax policies and
regulations. We need an approach to model the sophisticated corporate structures of our
largest clients and deliver an end-to-end tax solution. We use a microservices architecture
approach for our platforms and are beginning to leverage Amazon Neptune as a graph-based
system to quickly create links within the data.”
said Tim Vanderham, chief technology officer, Thomson Reuters Tax & Accounting
Thomson Reuters’ financial knowledge graph as a service
Airlines use case
Legacy systems running on mainframes backed by relational databases with complex workflow
engines and state machines.
Everything could be modeled as entities and relationships: planes, parts, maintenance
locations, workstations able to perform the work, availability of parts at given workstations,
personnel, and personnel skill sets. Handling the impact of sudden logistic changes and
reassignments would be greatly simplified.
Cyber security at a major Telco
RELATIONAL DATABASE CHALLENGES BUILDING APPS WITH HIGHLY CONNECTED DATA
Unnatural for querying graphs
Inefficient graph processing
Rigid schema, inflexible for changing data
A GRAPH DATABASE IS OPTIMIZED FOR EFFICIENT STORAGE AND RETRIEVAL OF HIGHLY CONNECTED DATA
GRAPHS ARE INTUITIVE.
TRIADIC CLOSURE – CLOSING TRIANGLES
[Diagram: Terry, Bill, and Sarah joined pairwise by FRIEND edges, forming a closed triangle]
RECOMMENDING NEW CONNECTIONS
[Diagram: Terry]
IMMEDIATE FRIENDSHIPS
[Diagram: Terry -FRIEND- Bill]
MEANS AND MOTIVE
[Diagram: Terry -FRIEND- Bill -FRIEND- Sarah]
RECOMMENDATION
[Diagram: Terry -FRIEND- Bill -FRIEND- Sarah]
LEADING GRAPH MODELS AND FRAMEWORKS.
PROPERTY GRAPH
Open Source Apache TinkerPop
Gremlin Traversal Language
RESOURCE DESCRIPTION FRAMEWORK (RDF)
W3C Standard
SPARQL Query Language
2 MODELS, 2 QUERY LANGUAGES
Property graph
Data model
• Vertices
• Edges
• Properties
• Labels
Gremlin 3.4.1
• Imperative traversal language
RDF
Data model
• Triples
• subject-predicate-object
SPARQL 1.1
• Declarative pattern matching language
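To make the two data models concrete, here is a minimal Python sketch that stores the same "friend" fact both ways. The identifiers, the `http://example.org/` namespace, and the `since` value are made up for illustration; this is not how Neptune stores data internally.

```python
# Property graph: vertices and edges carry a label plus key/value properties.
vertices = {
    "v1": {"label": "User", "properties": {"name": "Terry"}},
    "v2": {"label": "User", "properties": {"name": "Bill"}},
}
edges = [
    {"id": "e1", "label": "FRIEND", "from": "v1", "to": "v2",
     "properties": {"since": "2016-11-29"}},  # hypothetical edge property
]

# RDF: every fact is a (subject, predicate, object) triple.
EX = "http://example.org/"  # hypothetical namespace
triples = [
    (EX + "terry", EX + "name", "Terry"),
    (EX + "bill", EX + "name", "Bill"),
    (EX + "terry", EX + "friend", EX + "bill"),
]


def friends_in_property_graph(vertex_id):
    """Follow outgoing FRIEND edges from one vertex."""
    return [e["to"] for e in edges
            if e["label"] == "FRIEND" and e["from"] == vertex_id]


def friends_in_rdf(subject):
    """Match the triple pattern (subject, ex:friend, ?friend)."""
    return [o for s, p, o in triples if s == subject and p == EX + "friend"]
```

Note how the edge property (`since`) has a natural home in the property graph, while plain triples would need an extra modeling step to say anything about the friendship itself.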
Property Graph versus RDF
Abstraction level
• Property Graph: Vertices and connecting edges
• RDF: Triples (or quads)
Data model
• Property Graph: Naturally supports edge properties, strongly typed literals
• RDF: Multiple graphs, custom datatypes (with loose typing constraints)
Data reuse and publishing
• Property Graph: Typically application-specific model, not primarily designed for data sharing
• RDF: Access to Linked Open Data useful in many domains, ease of data publishing and sharing by use of global URIs and shared vocabulary
Gremlin versus SPARQL
Standardisation
• Gremlin (Property Graph): Apache TinkerPop spec as de-facto standard; Neptune has some implementation differences
• SPARQL (RDF): Based on W3C standards, with different standards on top (LDP, R2RML, etc.)
Query language features
• Gremlin (Property Graph): Path extraction, iterative looping, coin flips; closer to an algorithmic approach
• SPARQL (RDF): Optional selection of patterns, variable selection, nested subqueries
Approach
• Gremlin (Property Graph): DSL-like graph traversals (low fanout queries), path extraction
• SPARQL (RDF): Clause-based pattern matching (high fanout queries), entity extraction
Learning curve
• Gremlin (Property Graph): Easy to get started with simple queries, steep learning curve for complex queries
• SPARQL (RDF): Initial learning curve steeper, but easier to generalize to complex queries
RDF: URIs as Globally Unique Identifiers
URIs to identify nodes and edge labels
<https://permid.org/1-4295902158>
=> identifies the company “Netflix Inc”
organization:isIncorporatedIn [1]
=> identifies the relationship “is incorporated in”
<http://sws.geonames.org/6252001/>
=> identifies country “USA”
[1] This is a shortcut for <http://permid.org/ontology/organization/isIncorporatedIn>.
RDF uses XML prefix notation, where the prefix organization is a shortcut
for <http://permid.org/ontology/organization/>.
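The prefix shortcut is plain string substitution. A small Python sketch, assuming a hypothetical `expand` helper and a prefix table that contains only the `organization` prefix from this slide:

```python
# Prefix table: only the prefix mentioned on the slide is registered here.
PREFIXES = {"organization": "http://permid.org/ontology/organization/"}


def expand(prefixed_name, prefixes=PREFIXES):
    """Expand 'prefix:local' into a full URI using the prefix table."""
    prefix, _, local = prefixed_name.partition(":")
    if prefix not in prefixes:
        raise KeyError(f"unknown prefix: {prefix}")
    return prefixes[prefix] + local


# The statement from the slide as a (subject, predicate, object) triple.
netflix_is_incorporated_in_usa = (
    "https://permid.org/1-4295902158",
    expand("organization:isIncorporatedIn"),
    "http://sws.geonames.org/6252001/",
)
```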
Querying RDF Using SPARQL (2)
[Diagram: a graph pattern in which ?node variables are connected by ?property edges]
The Power of URIs: Linked Data
Linking across datasets by referencing globally unique URIs
GeoNames
Wikidata
PermID
Example: PermID (re)uses <http://sws.geonames.org/6252001/>
as a global Identifier for the USA, which is an identifier rooted in GeoNames.
The Linked Open Data Cloud
Linking Open Data cloud diagram 2017, by Andrejs Abele, John P. McCrae, Paul
Buitelaar, Anja Jentzsch and Richard Cyganiak. http://lod-cloud.net/ (CC-BY-SA)
Example: SNOMED CT (Systematized Nomenclature of Medicine –Clinical Terms)
The Power of Linked Data
Data from Wikidata
Data from PermID
Data from GeoNames
PROPERTY GRAPH
A property graph is a set of vertices and edges with respective properties (i.e. key/value pairs)
• Vertex represents entities/domains
• Edge represents directional relationship
between vertices.
• Each edge has a label that denotes the
type of relationship
• Each vertex & edge has a unique identifier
• Vertex and edges can have properties
• Properties express non-relational information about the vertices and edges
[Diagram: User (name: Bill) -FRIEND (Since 11/29/16)- User (name: Sarah)]
Edges – the routes to success
Performance depends on how much of the graph a query
must “touch”
• Choose domain-meaningful edge labels
• Discover only what is absolutely necessary
• “Grey out” unnecessary portions of the graph
How fine-grained should my edge labels be?
Is it an open or extensible set of label values?
Do you need to query across values in the set? (e.g. all addresses)
Bi-directional relationships
Ignore edge direction in queries
g.V().hasLabel('Person').both('FRIENDS')
Uni-directional relationships
Be explicit about edge direction in queries
g.V().hasLabel('Person').out('FOLLOWS')
g.V().hasLabel('Person').in('FOLLOWS')
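What `out`, `in`, and `both` mean can be sketched over a plain list of directed edges. This is an in-memory illustration in Python with made-up people, not the TinkerPop implementation; `in_` carries a trailing underscore because `in` is a Python keyword.

```python
# Directed edges as (source, label, target); the data is hypothetical.
EDGES = [
    ("alice", "FOLLOWS", "bob"),
    ("carol", "FOLLOWS", "alice"),
    ("alice", "FRIENDS", "dave"),
]


def out(vertex, label):
    """Vertices reached by following edges away from `vertex`."""
    return sorted(dst for src, lbl, dst in EDGES if src == vertex and lbl == label)


def in_(vertex, label):
    """Vertices reached by following edges that point at `vertex`."""
    return sorted(src for src, lbl, dst in EDGES if dst == vertex and lbl == label)


def both(vertex, label):
    """Ignore direction: the union of out() and in_()."""
    return sorted(set(out(vertex, label)) | set(in_(vertex, label)))
```

With `both`, the FRIENDS edge is found from either endpoint, so it does not matter which direction it was stored in.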
Multiple edges between vertices
You can even have self-edges
Relational DBs should really be called “row databases”
Did you ever feel you were overcomplicating your DB schema by adding
intermediate tables to model a complex relationship?
Did you ever use some obscure hack to query hierarchical or nested data?
Did you experience badly degraded performance when joining many
tables (or self-joining a table)?
GRAPH VS. RELATIONAL DATABASE MODELING.
Relational model
Write a query to give me
everything related to a
customer.
You will probably need a mega join or a mega union.
What if we add two or three
more tables in a couple of
weeks? What happens with
your code?
* Source : http://www.playnexacro.com/index.html#show:article
[Diagram: relational model next to a subset of the equivalent graph model]
Graph model subset
• Customers (CompanyName: Acme, …) -PURCHASED-> Order (OrderDate: 8/1/2018, …)
• Order -HAS_DETAILS-> Order Details (UnitPrice: $179.99, …)
• Order Details -HAS_PRODUCT-> Product (ProductName: “Echo”, …)
• Supplier (CompanyName: “Amazon”, …) -SUPPLIES-> Product
SQL RELATIONAL DATABASE QUERY
SELECT DISTINCT c.CompanyName
FROM customers AS c
JOIN orders AS o              /* Join the orders to the customer */
  ON (c.CustomerID = o.CustomerID)
JOIN order_details AS od      /* Join the order details to the order */
  ON (o.OrderID = od.OrderID)
JOIN products AS p            /* Join the products to the order details */
  ON (od.ProductID = p.ProductID)
WHERE p.ProductName = 'Echo'; /* Find the product named 'Echo' */
Find the name of companies that purchased the ‘Echo’.
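The relational query can be tried end to end with Python's built-in `sqlite3` module. The table layout below is an assumption reduced to the columns the query touches, and the sample rows (including "Globex" and "Kindle") are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (CustomerID INTEGER PRIMARY KEY, CompanyName TEXT);
CREATE TABLE orders (OrderID INTEGER PRIMARY KEY, CustomerID INTEGER);
CREATE TABLE order_details (OrderID INTEGER, ProductID INTEGER);
CREATE TABLE products (ProductID INTEGER PRIMARY KEY, ProductName TEXT);
INSERT INTO customers VALUES (1, 'Acme'), (2, 'Globex');
INSERT INTO orders VALUES (10, 1), (11, 1), (12, 2);
INSERT INTO order_details VALUES (10, 100), (11, 100), (12, 101);
INSERT INTO products VALUES (100, 'Echo'), (101, 'Kindle');
""")

# Same three-way join as on the slide, with the product name as a parameter.
QUERY = """
SELECT DISTINCT c.CompanyName
FROM customers AS c
JOIN orders AS o ON c.CustomerID = o.CustomerID
JOIN order_details AS od ON o.OrderID = od.OrderID
JOIN products AS p ON od.ProductID = p.ProductID
WHERE p.ProductName = ?
"""


def companies_that_purchased(product_name):
    """Return the distinct company names that ordered the given product."""
    rows = conn.execute(QUERY, (product_name,)).fetchall()
    return sorted(name for (name,) in rows)
```

Acme ordered the Echo twice in this sample data; `DISTINCT` keeps it to one row.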
SPARQL DECLARATIVE GRAPH QUERY
PREFIX sales_db: <http://sales.widget.com/>
SELECT DISTINCT ?comp_name WHERE {
  ?customer sales_db:HAS_ORDER ?order ;          # customer graph pattern
            sales_db:CompanyName ?comp_name .
  ?order sales_db:HAS_DETAILS ?order_d .         # orders graph pattern
  ?order_d sales_db:HAS_PRODUCT ?product .       # order details graph pattern
  ?product sales_db:ProductName "Echo" .         # products graph pattern
}
Find the name of companies that purchased the ‘Echo’.
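Declarative pattern matching can be illustrated with a toy matcher over triples. This is a deliberately naive Python sketch with invented data and short names in place of full URIs; a real SPARQL engine plans and indexes these joins rather than scanning every triple.

```python
# Hypothetical triples, using short names instead of full URIs.
TRIPLES = [
    ("cust1", "HAS_ORDER", "order1"),
    ("cust1", "CompanyName", "Acme"),
    ("order1", "HAS_DETAILS", "od1"),
    ("od1", "HAS_PRODUCT", "prod1"),
    ("prod1", "ProductName", "Echo"),
    ("cust2", "HAS_ORDER", "order2"),
    ("cust2", "CompanyName", "Globex"),
    ("order2", "HAS_DETAILS", "od2"),
    ("od2", "HAS_PRODUCT", "prod2"),
    ("prod2", "ProductName", "Kindle"),
]


def match(patterns, binding=None):
    """Yield every variable binding that satisfies all triple patterns."""
    binding = binding or {}
    if not patterns:
        yield binding
        return
    first, rest = patterns[0], patterns[1:]
    for triple in TRIPLES:
        candidate = dict(binding)
        matched = True
        for term, value in zip(first, triple):
            if term.startswith("?"):
                # A variable binds on first use and must agree afterwards.
                if candidate.setdefault(term, value) != value:
                    matched = False
                    break
            elif term != value:
                matched = False
                break
        if matched:
            yield from match(rest, candidate)


# The same graph pattern as the SPARQL query above.
PATTERNS = [
    ("?customer", "HAS_ORDER", "?order"),
    ("?customer", "CompanyName", "?comp_name"),
    ("?order", "HAS_DETAILS", "?order_d"),
    ("?order_d", "HAS_PRODUCT", "?product"),
    ("?product", "ProductName", "Echo"),
]
company_names = sorted({b["?comp_name"] for b in match(PATTERNS)})
```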
GREMLIN IMPERATIVE GRAPH TRAVERSAL
g.V().hasLabel('Product').has('ProductName', 'Echo'). /* All products named 'Echo' */
  in('HAS_PRODUCT').             /* Traverse to order details */
  in('HAS_DETAILS').             /* Traverse to order */
  in('HAS_ORDER').               /* Traverse to customer */
  values('CompanyName').dedup()  /* Unique company names */
Find the name of companies that purchased the ‘Echo’.
Recommend New Connections
g = graph.traversal()
g.V().has('name','Terry').as('user').
  both('FRIEND').aggregate('friends').
  both('FRIEND').
  where(neq('user')).where(without('friends')). // 'friends' is a collection, so filter with without()
  groupCount().by('name').
  order(local).by(values, decr)
FIND TERRY
g.V().has('name','Terry').as('user')
FIND TERRY’S FRIENDS
both('FRIEND').aggregate('friends')
AND THE FRIENDS OF THOSE FRIENDS
both('FRIEND')
[Diagram: user -FRIEND- friend -FRIEND- fof]
...WHO AREN’T TERRY AND AREN’T FRIENDS WITH TERRY
where(neq('user')).where(without('friends'))
[Diagram: user -FRIEND- friend -FRIEND- fof, with excluded candidates crossed out]
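The same friends-of-friends logic can be sketched in plain Python over a small undirected edge list. The friendships beyond Terry, Bill, and Sarah are invented, and the function names are illustrative only.

```python
from collections import Counter

# Undirected FRIEND edges; Ann and Dave are hypothetical additions.
FRIEND = [
    ("Terry", "Bill"),
    ("Bill", "Sarah"),
    ("Terry", "Ann"),
    ("Ann", "Sarah"),
    ("Ann", "Dave"),
]


def neighbours(person):
    """Everyone sharing a FRIEND edge with `person`, ignoring direction."""
    return {b if a == person else a for a, b in FRIEND if person in (a, b)}


def recommend(user):
    """Rank friends-of-friends by how many mutual friends they share with `user`."""
    friends = neighbours(user)
    counts = Counter()
    for friend in friends:
        for fof in neighbours(friend):
            # Skip the user and anyone who is already a direct friend.
            if fof != user and fof not in friends:
                counts[fof] += 1
    return counts.most_common()
```

Sarah is reachable through both Bill and Ann, so she ranks above Dave, who is reachable only through Ann.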
CHALLENGES OF EXISTING GRAPH DATABASES
Difficult to maintain
high availability
Difficult to scale
Limited support for
open standards
Too expensive
AMAZON NEPTUNE
Fully managed graph database
FAST: Query billions of relationships with millisecond latency
RELIABLE: 6 replicas of your data across 3 AZs with full backup and restore
EASY: Build powerful queries easily with Gremlin and SPARQL
OPEN: Supports Apache TinkerPop & W3C RDF graph models
AMAZON NEPTUNE HIGH LEVEL ARCHITECTURE
[Diagram: Neptune architecture, including bulk load from Amazon S3 and database management]
Fully Managed Service
Easily configurable via the console
Multi-AZ high availability
Support for up to 15 read replicas
Supports encryption at rest
Supports encryption in transit (TLS)
Backup and restore, point-in-time recovery
BENEFITS
AMAZON NEPTUNE: VPC DEPLOYMENT
• Secure deployment in a VPC
• Increased availability through deployment in two subnets in two different Availability Zones (AZs)
• Cluster volume always spans three AZs to provide durable storage
• See the Amazon Neptune Documentation for VPC setup details
BATTLE-TESTED CLOUD-NATIVE STORAGE ENGINE
OVERVIEW
Data is replicated 6 times across 3 Availability Zones
Continuous backup to Amazon S3
(built for 11 9s durability)
Continuous monitoring of nodes and disks for repair
10 GB segments as unit of repair or hotspot rebalance
Quorum system for read/write; latency tolerant
Quorum membership changes do not stall writes
Storage volume automatically grows up to 64 TB
[Diagram: Amazon Neptune on top of six storage nodes spread across AZ 1, AZ 2, and AZ 3, with storage monitoring and backup to Amazon S3]
AMAZON NEPTUNE HIGH AVAILABILITY AND FAULT
TOLERANCE (CLOUD-NATIVE STORAGE)
What can fail?
Segment failures (disks)
Node failures (machines)
AZ failures (network or datacenter)
Optimizations
4 out of 6 write quorum
3 out of 6 read quorum
Peer-to-peer replication for repairs
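The quorum numbers can be checked with simple arithmetic. The Python sketch below assumes the six copies are spread evenly, two per AZ; that even split is an assumption, as the slides only state six copies across three AZs.

```python
COPIES = 6          # copies of each piece of data
WRITE_QUORUM = 4    # acknowledgements needed for a write
READ_QUORUM = 3     # copies consulted for a read
COPIES_PER_AZ = 2   # assumption: copies are spread evenly over 3 AZs


def reads_see_latest_write():
    """Any read set overlaps any write set when Vr + Vw > V."""
    return READ_QUORUM + WRITE_QUORUM > COPIES


def writes_cannot_conflict():
    """Two write sets always overlap when Vw > V / 2."""
    return 2 * WRITE_QUORUM > COPIES


def can_write(failed_copies):
    return COPIES - failed_copies >= WRITE_QUORUM


def can_read(failed_copies):
    return COPIES - failed_copies >= READ_QUORUM
```

Under these numbers, losing a whole AZ (two copies) still leaves a write quorum, and losing an AZ plus one more copy still leaves a read quorum.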
[Diagram: Amazon Neptune with caching over storage in AZ 1, AZ 2, and AZ 3]
AMAZON NEPTUNE READ REPLICAS
Availability
• Failing database nodes are
automatically detected and replaced
• Failing database processes are
automatically detected and recycled
• Replicas are automatically promoted
to primary if needed (failover)
• Customer specifiable fail-over order
[Diagram: primary node and read replicas across AZ 1, AZ 2, and AZ 3, with cluster and instance monitoring]
Performance
• Customer applications can scale out read
traffic across read replicas
• Read balancing across read replicas
AMAZON NEPTUNE FAILOVER TIMES ARE
TYPICALLY < 30 SECONDS
[Timeline for a replica-aware app: app running, database failure, failure detection (15-20 sec), DNS propagation (3-10 sec), recovery, app running]
AMAZON NEPTUNE CONTINUOUS BACKUP (CLOUD-NATIVE STORAGE)
• Take periodic snapshot of each segment in parallel; stream the logs to Amazon S3
• Backup happens continuously without performance or availability impact
• At restore, retrieve the appropriate segment snapshots and log streams to storage nodes
• Apply log streams to segment snapshots in parallel and asynchronously
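The restore step, "apply log streams to segment snapshots", can be sketched as a replay loop. The record layout (a log sequence number, a key, and a value) and the function name are hypothetical and only illustrate the idea; they are not Neptune's storage format.

```python
def restore_segment(snapshot, log_records, recovery_point):
    """Rebuild one segment: start from its snapshot, then replay newer log records
    up to and including the requested recovery point."""
    state = dict(snapshot["data"])
    for lsn, key, value in sorted(log_records, key=lambda record: record[0]):
        if snapshot["lsn"] < lsn <= recovery_point:
            state[key] = value
    return state


# Hypothetical snapshot taken at log sequence number 10, plus later log records.
snapshot = {"lsn": 10, "data": {"a": 1}}
log_records = [(11, "a", 2), (12, "b", 3), (13, "a", 4)]
```

Because each segment is restored from its own snapshot and log stream, segments can be processed independently, which is what allows the parallel restore described above.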
[Diagram: timeline of segment snapshots and log records for Segments 1-3, with the recovery point marked]
AMAZON NEPTUNE ONLINE POINT-IN-TIME
RESTORE (CLOUD-NATIVE STORAGE)
Online point-in-time restore is a quick way to bring the database to a particular point
in time without having to restore from backups
• Rewinding the database to quickly
• Rewind multiple times to determine the desired point-in-time in the database state
[Diagram: timeline t0-t4 showing a rewind to t1 and a later rewind to t3; the abandoned stretches of the timeline are marked invisible]
DEMO TIME
https://aws.amazon.com/blogs/database/analyze-amazon-neptune-graphs-using-amazon-sagemaker-jupyter-notebooks/
https://github.com/aws-samples/amazon-neptune-samples/tree/master/gremlin/collaborative-filtering
https://aws.amazon.com/blogs/database/let-me-graph-that-for-you-part-1-air-routes/
Data modelling example
Example scenario
Employment history application
• People, companies, roles
Use cases
1. Find the companies where X has worked, and their roles at those
companies
2. Find the people who have worked for a company at a specific
location during a particular time period
3. Find the people in more senior roles at the companies where X
worked
Identify entities, relationships, and attributes
Find the companies where X has worked, and their roles at
those companies
Which companies has X worked for, and in what roles?
Companies | Company | Entity
X | Person | Entity
Worked for | Worked for | Relationship
Roles | Role | Attribute?
Identify candidate vertices, labels, and properties
Company | Entity | Vertex | Company
Person | Entity | Vertex | Person
Worked for | Relationship | Edge | WORKED_FOR
Role | Attribute? | Property? | role
[Diagram: Person and Company vertices with name properties]
Entity or attribute? Vertex or property?
Is role an entity or an attribute?
• Does it have identity (or is it a value type)? ✗
• Is it a complex type (with multiple fields)? ✗
• Are there any structural relations between values? ?
Keep it simple
• Prefer properties to vertices/edges until the need arises
Model 1
What questions would we have to ask of our
data?
Find the people in more senior roles at the companies where
X worked
Who were in senior roles at the companies where X worked?
Entity or attribute? Vertex or property?
Is role an entity or an attribute?
• Does it have identity (or is it a value type)? ✗
• Is it a complex type (with multiple fields)? ✗
• Are there any structural relations between values? ✓
Model structural relations with edges
• Promote role to being a vertex
Role hierarchy
Role
Role Role
Model 3
Traversal 3
Who were in more senior roles at the companies where Li
worked?
Gremlin query 3
g.V('p-3').
  out('JOB').as('j1').
  out('ROLE').
  repeat(out('PARENT_ROLE')).until(outE().count().is(0)).
  emit().in('ROLE').as('j2').
  or(
    (where('j1', between('j2', 'j2')).by('from').by('from').by('to')),
    (where('j1', between('j2', 'j2')).by('to').by('from').by('to')),
    (where('j1', lte('j2').and(gt('j2'))).by('from').by('from').by('to').by('from'))
  ).
  project('role', 'name').
    by(out('ROLE').values('name')).
    by(in('JOB').values('firstName', 'lastName').fold()).
  toList()
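The intent of this traversal, climbing the role hierarchy and keeping jobs whose dates overlap, can be sketched in plain Python. This is a simplified stand-in rather than a line-by-line translation: it assumes a single company, uses a generic half-open interval overlap test in place of the three `where` clauses, and every name and date apart from Li is invented.

```python
# Child role -> parent role (the PARENT_ROLE edges); hypothetical hierarchy.
PARENT_ROLE = {"engineer": "manager", "manager": "director"}

# One record per job, standing in for JOB vertices with from/to properties.
JOBS = [
    {"person": "Li", "role": "engineer", "from": 2012, "to": 2015},
    {"person": "Ana", "role": "manager", "from": 2011, "to": 2013},
    {"person": "Raj", "role": "director", "from": 2016, "to": 2018},
    {"person": "Kim", "role": "engineer", "from": 2012, "to": 2015},
]


def senior_roles(role):
    """Walk PARENT_ROLE edges upwards, collecting every ancestor role."""
    ancestors = []
    while role in PARENT_ROLE:
        role = PARENT_ROLE[role]
        ancestors.append(role)
    return ancestors


def overlaps(a, b):
    """Half-open interval overlap: [a.from, a.to) intersects [b.from, b.to)."""
    return a["from"] < b["to"] and b["from"] < a["to"]


def seniors_of(person):
    """(role, person) pairs for more senior jobs held while `person` was employed."""
    found = []
    for own_job in (job for job in JOBS if job["person"] == person):
        for role in senior_roles(own_job["role"]):
            for other in JOBS:
                if other["role"] == role and overlaps(own_job, other):
                    found.append((other["role"], other["person"]))
    return found
```

Promoting role to a vertex is what makes the upward walk possible; with role stored as a plain property there would be no hierarchy to traverse.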
Wrapping up
Graphs can be applied to a huge number of use cases
A graph database fits more naturally and performs much faster than other
databases when working with highly connected data
Scaling out a graph database is not easy. Amazon Neptune makes your
life easier
Thank you!
Javier Ramirez
Technical Evangelist. Amazon Web Services
Appendix I
Converting Other Data Models to Graph
Mapping relational to graph (RDF)
W3C Direct Mapping
• “Out-of-the-box” schema for mapping relational data to RDF
• https://www.w3.org/TR/rdb-direct-mapping/
R2RML
• Standard that allows you to specify mappings from relations to
graph
• Rules defined over logical tables
• https://www.w3.org/TR/r2rml/
Tooling
• D2RQ – access relational database as virtual, read-only RDF graph
Table
id | … | f_name
12 | … | Alice
37 | … | Bob
Foreign keys
id | … | f_name
12 | … | Alice
37 | … | Bob

id | p_id | type | addr_1
512 | 12 | home | High St
655 | 37 | work | Main St
700 | 12 | work | Any St
Join tables
id | … | f_name
12 | … | Alice
37 | … | Bob

id | … | name
512 | … | Any Co
655 | … | Example Co
700 | … | Example.com

p_id | c_id | from | to
12 | 512 | 2012 | 2015
37 | 512 | 2011 | 2016
37 | 655 | 2016 | 2017
12 | 700 | 2015 | 2017
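One way to map these tables to a property graph is to turn each entity row into a vertex and each join-table row into an edge that carries the extra columns as properties. The Python sketch below uses the rows from the slide; the `person-`/`company-` identifier scheme and the `Person`, `Company`, and `WORKED_FOR` labels are assumptions borrowed from the employment example.

```python
# Rows from the slide: two entity tables and one join table.
people = {12: "Alice", 37: "Bob"}
companies = {512: "Any Co", 655: "Example Co", 700: "Example.com"}
worked_for_rows = [  # (p_id, c_id, from, to)
    (12, 512, 2012, 2015),
    (37, 512, 2011, 2016),
    (37, 655, 2016, 2017),
    (12, 700, 2015, 2017),
]


def to_graph():
    """Entity rows become vertices; join-table rows become edges with properties."""
    vertices = {}
    for person_id, name in people.items():
        vertices[f"person-{person_id}"] = {"label": "Person", "f_name": name}
    for company_id, name in companies.items():
        vertices[f"company-{company_id}"] = {"label": "Company", "name": name}
    edges = [
        {
            "label": "WORKED_FOR",
            "out_vertex": f"person-{p_id}",
            "in_vertex": f"company-{c_id}",
            "properties": {"from": start, "to": end},
        }
        for p_id, c_id, start, end in worked_for_rows
    ]
    return vertices, edges


def companies_for(person_vertex_id):
    """Names of the companies a person is linked to by WORKED_FOR edges."""
    vertices, edges = to_graph()
    return sorted(vertices[e["in_vertex"]]["name"]
                  for e in edges if e["out_vertex"] == person_vertex_id)
```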
Facts and dimensions
Fact table
id | u_id | p_id | l_id | date | …
43 | 678 | 94 | 144 | 14-12-2018 | …
Dimension tables
id | …
94 | …

id | …
678 | …

id | …
144 | …
Documents
{ id: order-1
delivery-address: {
// address-1
}
payment-address: {
// address-1
}
{ id: order-2
delivery-address: {
// address-1
}
{ id: order-3
Key-Value
id | f_name | state | start | city:dept | tel
   | Alice | TX | | Austin:Dev |
37 | Bob | TX | | Dallas:Dev | 555-0100
99 | Dan | TX | 2016 | Austin:Ops | 555-0199
Data load options
Bulk loader API
• Load data from S3 into Neptune
• Low overhead, optimized for large datasets
• Good for append-only loads
Online endpoints
• Gremlin or SPARQL
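For property graph data, the bulk loader reads CSV files whose system columns start with a tilde (`~id`, `~label`, `~from`, `~to`) and whose property columns are written as `name:Type`. The Python sketch below only builds such files in memory, following the Gremlin load format as described in the Neptune documentation; the identifiers and rows are hypothetical, and the exact format should be checked against the current docs before loading real data.

```python
import csv
import io


def to_csv(header, rows):
    """Serialize a header and rows as CSV text, ready to upload to S3."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


# Vertex file: one row per vertex.
vertices_csv = to_csv(
    ["~id", "~label", "name:String"],
    [["p-12", "Person", "Alice"], ["c-512", "Company", "Any Co"]],
)

# Edge file: one row per edge, with typed edge properties.
edges_csv = to_csv(
    ["~id", "~from", "~to", "~label", "from:Int", "to:Int"],
    [["e-1", "p-12", "c-512", "WORKED_FOR", 2012, 2015]],
)
```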
Appendix II
Ignition One. Customer use case presented at AWS Atlanta Summit
https://www.slideshare.net/AmazonWebServices/using-amazon-neptune-to-power-identity-resolution-at-scale-adb303-atlanta-aws-summit

More Related Content

Similar to En un mundo hiperconectado, las bases de datos de grafos son tu arma secreta

Connecting the dots - How Amazon Neptune and Graph Databases can transform yo...
Connecting the dots - How Amazon Neptune and Graph Databases can transform yo...Connecting the dots - How Amazon Neptune and Graph Databases can transform yo...
Connecting the dots - How Amazon Neptune and Graph Databases can transform yo...Amazon Web Services
 
Migrating to Amazon Neptune (DAT338) - AWS re:Invent 2018
Migrating to Amazon Neptune (DAT338) - AWS re:Invent 2018Migrating to Amazon Neptune (DAT338) - AWS re:Invent 2018
Migrating to Amazon Neptune (DAT338) - AWS re:Invent 2018Amazon Web Services
 
NEW LAUNCH! How to build graph applications with SPARQL and Gremlin using Ama...
NEW LAUNCH! How to build graph applications with SPARQL and Gremlin using Ama...NEW LAUNCH! How to build graph applications with SPARQL and Gremlin using Ama...
NEW LAUNCH! How to build graph applications with SPARQL and Gremlin using Ama...Amazon Web Services
 
Amazon Cloud Directory Deep Dive (DAT364) - AWS re:Invent 2018
Amazon Cloud Directory Deep Dive (DAT364) - AWS re:Invent 2018Amazon Cloud Directory Deep Dive (DAT364) - AWS re:Invent 2018
Amazon Cloud Directory Deep Dive (DAT364) - AWS re:Invent 2018Amazon Web Services
 
Preparing Your Data for Cloud Analytics & AI/ML
Preparing Your Data for Cloud Analytics & AI/MLPreparing Your Data for Cloud Analytics & AI/ML
Preparing Your Data for Cloud Analytics & AI/MLAmazon Web Services
 
What’s the big deal with Graph Databases?
What’s the big deal with Graph Databases?What’s the big deal with Graph Databases?
What’s the big deal with Graph Databases?Daniel Zivkovic
 
MongoDB .local London 2019: Using AWS to Transform Customer Data in MongoDB i...
MongoDB .local London 2019: Using AWS to Transform Customer Data in MongoDB i...MongoDB .local London 2019: Using AWS to Transform Customer Data in MongoDB i...
MongoDB .local London 2019: Using AWS to Transform Customer Data in MongoDB i...Lisa Roth, PMP
 
Going Graph With Amazon Neptune - AWS Summit Sydney 2018
Going Graph With Amazon Neptune - AWS Summit Sydney 2018Going Graph With Amazon Neptune - AWS Summit Sydney 2018
Going Graph With Amazon Neptune - AWS Summit Sydney 2018Amazon Web Services
 
Connecting the Unconnected using GraphDB - Tel Aviv Summit 2018
Connecting the Unconnected using GraphDB - Tel Aviv Summit 2018Connecting the Unconnected using GraphDB - Tel Aviv Summit 2018
Connecting the Unconnected using GraphDB - Tel Aviv Summit 2018Amazon Web Services
 
Understanding Graph Databases: AWS Developer Workshop at Web Summit
Understanding Graph Databases: AWS Developer Workshop at Web SummitUnderstanding Graph Databases: AWS Developer Workshop at Web Summit
Understanding Graph Databases: AWS Developer Workshop at Web SummitAmazon Web Services
 
On-Ramp to Graph Databases and Amazon Neptune (DAT335) - AWS re:Invent 2018
On-Ramp to Graph Databases and Amazon Neptune (DAT335) - AWS re:Invent 2018On-Ramp to Graph Databases and Amazon Neptune (DAT335) - AWS re:Invent 2018
On-Ramp to Graph Databases and Amazon Neptune (DAT335) - AWS re:Invent 2018Amazon Web Services
 
Database su AWS scegliere lo strumento giusto per il giusto obiettivo
Database su AWS scegliere lo strumento giusto per il giusto obiettivoDatabase su AWS scegliere lo strumento giusto per il giusto obiettivo
Database su AWS scegliere lo strumento giusto per il giusto obiettivoAmazon Web Services
 
Work Backwards to Your Graph Data Model & Queries with Amazon Neptune (DAT330...
Work Backwards to Your Graph Data Model & Queries with Amazon Neptune (DAT330...Work Backwards to Your Graph Data Model & Queries with Amazon Neptune (DAT330...
Work Backwards to Your Graph Data Model & Queries with Amazon Neptune (DAT330...Amazon Web Services
 
Databases - Choosing the right Database on AWS
Databases - Choosing the right Database on AWSDatabases - Choosing the right Database on AWS
Databases - Choosing the right Database on AWSAmazon Web Services
 
The Zen of DataOps – AWS Lake Formation and the Data Supply Chain Pipeline
The Zen of DataOps – AWS Lake Formation and the Data Supply Chain PipelineThe Zen of DataOps – AWS Lake Formation and the Data Supply Chain Pipeline
The Zen of DataOps – AWS Lake Formation and the Data Supply Chain PipelineAmazon Web Services
 
Building with Purpose-Built Databases: Match Your workload to the Right Database
Building with Purpose-Built Databases: Match Your workload to the Right DatabaseBuilding with Purpose-Built Databases: Match Your workload to the Right Database
Building with Purpose-Built Databases: Match Your workload to the Right DatabaseAWS Summits
 

En un mundo hiperconectado, las bases de datos de grafos son tu arma secreta

  • 1. MAD · NOV 22-23 · 2019. En un mundo hiperconectado, las bases de datos de grafos son tu arma secreta (In a hyperconnected world, graph databases are your secret weapon). Javier Ramirez, Technical Evangelist, Amazon Web Services
  • 2. Six degrees of Bacon (photo: SAGIndie from Hollywood, USA - Flickr, CC BY 2.0)
  • 3. © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. IF BY DEFAULT YOU THINK IN TABLES, YOU NEED PROFESSIONAL HELP (OR SOME HOLIDAYS). Purpose-built for a business process. Purpose-built to answer questions about relationships.
  • 4. HIGHLY CONNECTED DATA: Retail Fraud Detection, Restaurant Recommendations, Social Networks
  • 5. USE CASES FOR HIGHLY CONNECTED DATA: Social Networking, Life Sciences, Network & IT Operations, Fraud Detection, Recommendations, Knowledge Graphs
  • 6. RECOMMENDATIONS BASED ON RELATIONSHIPS
  • 7. KNOWLEDGE GRAPH APPLICATIONS: What museums should Alice visit while in Paris? Who painted the Mona Lisa? What artists have paintings in The Louvre?
  • 8. NAVIGATE A WEB OF GLOBAL TAX POLICIES. “Our customers are increasingly required to navigate a complex web of global tax policies and regulations. We need an approach to model the sophisticated corporate structures of our largest clients and deliver an end-to-end tax solution. We use a microservices architecture approach for our platforms and are beginning to leverage Amazon Neptune as a graph-based system to quickly create links within the data,” said Tim Vanderham, chief technology officer, Thomson Reuters Tax & Accounting
  • 9. Thomson Reuters’ financial knowledge graph as a service
  • 10. Airlines use case. Legacy systems running on mainframes, backed by relational databases with complex workflow engines and state machines. Everything could be treated as entities and relationships: planes, parts, maintenance locations, workstations able to perform the work, availability of parts at given workstations, personnel and their skill sets. Assessing the impact of sudden logistic changes and reassignments would be greatly simplified.
  • 11. Cyber security at a major Telco
  • 12. RELATIONAL DATABASE CHALLENGES BUILDING APPS WITH HIGHLY CONNECTED DATA: unnatural for querying graphs; inefficient graph processing; rigid schema, inflexible for changing data
  • 13. A GRAPH DATABASE IS OPTIMIZED FOR EFFICIENT STORAGE AND RETRIEVAL OF HIGHLY CONNECTED DATA
  • 14. GRAPHS ARE INTUITIVE. TRIADIC CLOSURE: CLOSING TRIANGLES (diagram: Terry, Bill and Sarah connected by three FRIEND edges)
  • 15. RECOMMENDING NEW CONNECTIONS (diagram: Terry)
  • 16. IMMEDIATE FRIENDSHIPS (diagram: Terry and Bill, one FRIEND edge)
  • 17. MEANS AND MOTIVE (diagram: Terry, Bill and Sarah, two FRIEND edges)
  • 18. RECOMMENDATION (diagram: Terry, Bill and Sarah, two FRIEND edges)
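The friend-recommendation slides can be sketched in a few lines of Python. This is only an illustration of triadic closure, not how a graph database implements it: it assumes Bill is the mutual friend between Terry and Sarah (the flattened diagrams do not say which pairs are connected), and the helper names `build_adjacency` and `recommend_friends` are hypothetical.

```python
from collections import Counter

# Assumed undirected FRIEND edges, using the names from the slides.
FRIEND_EDGES = [("Terry", "Bill"), ("Bill", "Sarah")]


def build_adjacency(edges):
    """Return a dict mapping each person to the set of their direct friends."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency


def recommend_friends(adjacency, person):
    """Rank friends-of-friends by the number of mutual friends (open triangles)."""
    direct = adjacency.get(person, set())
    mutual_counts = Counter()
    for friend in direct:
        for candidate in adjacency.get(friend, set()):
            # Skip the person themselves and anyone who is already a friend.
            if candidate != person and candidate not in direct:
                mutual_counts[candidate] += 1
    return mutual_counts.most_common()


adjacency = build_adjacency(FRIEND_EDGES)
recommendations = recommend_friends(adjacency, "Terry")
print(recommendations)
```

Each recommendation is a triangle waiting to be closed: the candidate is two hops away and shares at least one friend with the person.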
  • 19. LEADING GRAPH MODELS AND FRAMEWORKS. 2 MODELS, 2 QUERY LANGUAGES. PROPERTY GRAPH: open source Apache TinkerPop, Gremlin traversal language. RESOURCE DESCRIPTION FRAMEWORK (RDF): W3C standard, SPARQL query language.
  • 20. Property graph. Data model: vertices, edges, properties, labels. Gremlin 3.4.1: imperative traversal language.
  • 21. RDF. Data model: triples (subject-predicate-object). SPARQL 1.1: declarative pattern matching language.
  • 22. Property Graph versus RDF
      Abstraction level. Property Graph: vertices and connecting edges. RDF: triples (or quads).
      Data model. Property Graph: naturally supports edge properties, strongly typed literals. RDF: multiple graphs, custom datatypes (with loose typing constraints).
      Data reuse and publishing. Property Graph: typically application-specific model, not primarily designed for data sharing. RDF: access to Linked Open Data useful in many domains, ease of data publishing and sharing by use of global URIs and shared vocabulary.
  • 23. Gremlin versus SPARQL
      Standardisation. Property Graph (Gremlin): Apache TinkerPop spec as de-facto standard; Neptune has some implementation differences. RDF (SPARQL): based on W3C standards, with different standards on top (LDP, R2RML, etc.).
      Query language features. Property Graph (Gremlin): path extraction, iterative looping, coin flips; closer to an algorithmic approach. RDF (SPARQL): optional selection of patterns, variable selection, nested subqueries.
      Approach. Property Graph (Gremlin): DSL-like graph traversals (low fanout queries), path extraction. RDF (SPARQL): clause-based pattern matching (high fanout queries), entity extraction.
      Learning curve. Property Graph (Gremlin): easy to get started with simple queries, steep learning curve for complex queries. RDF (SPARQL): initial learning curve steeper, but easier to generalize to complex queries.
  • 24. RDF: URIs as Globally Unique Identifiers. URIs identify nodes and edge labels:
      <https://permid.org/1-4295902158> identifies the company “Netflix Inc”
      organization:isIncorporatedIn [1] identifies the relationship “is incorporated in”
      <http://sws.geonames.org/6252001/> identifies the country “USA”
      [1] This is a shortcut for <http://permid.org/ontology/organization/isIncorporatedIn>. RDF uses XML prefix notation, where the prefix organization is a shortcut for <http://permid.org/ontology/organization/>.
  • 25. Querying RDF Using SPARQL (2) (diagram: graph pattern built from ?node and ?property variables)
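To make "declarative pattern matching" over triples concrete, here is a minimal Python sketch of a triple-pattern matcher. It is a toy, not how a SPARQL engine works internally. The two URIs and the `organization:isIncorporatedIn` predicate come from the slides; the `ex:name` predicate and the function names are hypothetical, added only for illustration.

```python
NETFLIX = "https://permid.org/1-4295902158"
USA = "http://sws.geonames.org/6252001/"

# Each triple is (subject, predicate, object). "ex:name" is a made-up predicate.
TRIPLES = [
    (NETFLIX, "organization:isIncorporatedIn", USA),
    (NETFLIX, "ex:name", "Netflix Inc"),
    (USA, "ex:name", "USA"),
]


def is_variable(term):
    """Terms starting with '?' are variables, as in SPARQL."""
    return term.startswith("?")


def match_pattern(pattern, triple, bindings):
    """Unify one pattern with one triple; return extended bindings or None."""
    new_bindings = dict(bindings)
    for term, value in zip(pattern, triple):
        if is_variable(term):
            # A variable must keep the value it was bound to earlier.
            if new_bindings.get(term, value) != value:
                return None
            new_bindings[term] = value
        elif term != value:
            return None
    return new_bindings


def query(patterns, triples):
    """Return every set of variable bindings that satisfies all patterns."""
    solutions = [{}]
    for pattern in patterns:
        next_solutions = []
        for bindings in solutions:
            for triple in triples:
                extended = match_pattern(pattern, triple, bindings)
                if extended is not None:
                    next_solutions.append(extended)
        solutions = next_solutions
    return solutions


results = query(
    [
        ("?company", "organization:isIncorporatedIn", "?country"),
        ("?company", "ex:name", "?name"),
    ],
    TRIPLES,
)
print(results)
```

The query states which shape of subgraph to find and leaves the traversal order to the matcher, which is the main contrast with an imperative traversal language.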
  • 26. The Power of URIs: Linked Data. Linking across datasets by referencing globally unique URIs (GeoNames, Wikidata, PermID). Example: PermID (re)uses <http://sws.geonames.org/6252001/> as a global identifier for the USA, which is an identifier rooted in GeoNames.
  • 27. The Linked Open Data Cloud. Linking Open Data cloud diagram 2017, by Andrejs Abele, John P. McCrae, Paul Buitelaar, Anja Jentzsch and Richard Cyganiak. http://lod-cloud.net/ (CC-BY-SA). Example: SNOMED CT (Systematized Nomenclature of Medicine – Clinical Terms)
  • 28. The Power of Linked Data: data from Wikidata, data from PermID, data from GeoNames
  • 29. PROPERTY GRAPH. A property graph is a set of vertices and edges with respective properties (i.e. key/value pairs):
      • A vertex represents an entity/domain
      • An edge represents a directional relationship between vertices
      • Each edge has a label that denotes the type of relationship
      • Each vertex and edge has a unique identifier
      • Vertices and edges can have properties
      • Properties express non-relational information about the vertices and edges
      (diagram: two User vertices, name: Bill and name: Sarah, joined by a FRIEND edge with the property Since: 11/29/16)
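The property graph definition maps naturally onto two small record types. The sketch below models the Bill/Sarah example with Python dataclasses; the identifiers `v1`, `v2` and `e1`, the field names, and the Bill-to-Sarah edge direction are assumptions made for illustration, since the slide does not state them.

```python
from dataclasses import dataclass, field


@dataclass
class Vertex:
    id: str                     # unique identifier
    label: str                  # type of entity, e.g. "User"
    properties: dict = field(default_factory=dict)


@dataclass
class Edge:
    id: str                     # unique identifier
    label: str                  # type of relationship, e.g. "FRIEND"
    out_vertex: str             # id of the vertex the edge leaves
    in_vertex: str              # id of the vertex the edge points to
    properties: dict = field(default_factory=dict)


# The example from the slide; ids and direction are hypothetical.
bill = Vertex(id="v1", label="User", properties={"name": "Bill"})
sarah = Vertex(id="v2", label="User", properties={"name": "Sarah"})
friendship = Edge(
    id="e1",
    label="FRIEND",
    out_vertex=bill.id,
    in_vertex=sarah.id,
    properties={"since": "11/29/16"},
)

print(friendship)
```

Note that the edge carries its own properties, which is the feature the comparison slide highlights as natural in property graphs.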
  • 30. Edges: the routes to success. Performance depends on how much of the graph a query must “touch”:
      • Choose domain-meaningful edge labels
      • Discover only what is absolutely necessary
      • “Grey out” unnecessary portions of the graph
  • 31. How fine-grained should my edge labels be? Is it an open or extensible set of label values? Do you need to query across values in the set (e.g. all addresses)?
  • 32. Bi-directional relationships. Ignore edge direction in queries:
      g.V().hasLabel('Person').both('FRIENDS')
  • 33. Uni-directional relationships. Be explicit about edge direction in queries:
      g.V().hasLabel('Person').out('FOLLOWS')
      g.V().hasLabel('Person').in('FOLLOWS')
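The difference between following an edge outwards, inwards, or in both directions can be illustrated without a graph database. The Python sketch below stores directed FOLLOWS edges as (follower, followed) pairs; the sample names and the three helper functions are hypothetical and only mimic the idea behind the Gremlin steps, returning sorted unique names.

```python
# Hypothetical directed FOLLOWS edges: (follower, followed).
FOLLOWS = [("alice", "bob"), ("bob", "carol")]


def out_neighbors(edges, vertex):
    """Vertices reached by leaving `vertex` along an edge (who it follows)."""
    return sorted({dst for src, dst in edges if src == vertex})


def in_neighbors(edges, vertex):
    """Vertices whose edges arrive at `vertex` (who follows it)."""
    return sorted({src for src, dst in edges if dst == vertex})


def both_neighbors(edges, vertex):
    """Neighbours in either direction, ignoring which way the edge points."""
    return sorted(set(out_neighbors(edges, vertex)) | set(in_neighbors(edges, vertex)))


print(out_neighbors(FOLLOWS, "bob"))
print(in_neighbors(FOLLOWS, "bob"))
print(both_neighbors(FOLLOWS, "bob"))
```

For a symmetric relationship such as friendship, only the direction-agnostic lookup gives the full answer; for FOLLOWS, the two directed lookups answer different questions.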
  • 35. You can even have self-edges
  • 36. MAD · NOV 22-23 · 2019 Relational DBs should really be called “row databases”Did you ever felt you were overcomplicating your db schema adding intermediate tables to model a complex relationship? Did you ever use some obscure hack to query hierarchical or nested data? Did you experience very degraded performance when having to join many tables (or to self-join a table)?
  • 39. GRAPH VS. RELATIONAL DATABASE MODELING. Relational model: write a query to give me everything related to a customer. You will probably need a mega join or a mega union. What if we add two or three more tables in a couple of weeks? What happens to your code?
  • 40. GRAPH VS. RELATIONAL DATABASE MODELING. Relational model vs. graph model subset (diagram). Graph model subset:
      Customers (CompanyName: Acme …) -[PURCHASED]- Order (OrderDate: 8/1/2018 …)
      Order -[HAS_DETAILS]- Order Details (UnitPrice: $179.99 …)
      Order Details -[HAS_PRODUCT]- Product (ProductName: “Echo” …)
      Supplier (CompanyName: “Amazon” …) -[SUPPLIES]- Product
      * Source: http://www.playnexacro.com/index.html#show:article
  • 41. SQL RELATIONAL DATABASE QUERY. Find the name of companies that purchased the ‘Echo’.
      SELECT DISTINCT c.CompanyName
      FROM customers AS c
      JOIN orders AS o                /* Join the customer from the order */
        ON (c.CustomerID = o.CustomerID)
      JOIN order_details AS od        /* Join the order details from the order */
        ON (o.OrderID = od.OrderID)
      JOIN products AS p              /* Join the products from the order details */
        ON (od.ProductID = p.ProductID)
      WHERE p.ProductName = 'Echo';   /* Find the product named 'Echo' */
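The join chain can be tried end to end with Python's built-in sqlite3 module. This is only a sketch: the table and column names follow the slide, but the column types and every sample row (including the second customer and the second product) are invented for illustration.

```python
import sqlite3

# In-memory database with a cut-down version of the slide's schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (CustomerID INTEGER PRIMARY KEY, CompanyName TEXT);
CREATE TABLE orders (OrderID INTEGER PRIMARY KEY, CustomerID INTEGER);
CREATE TABLE products (ProductID INTEGER PRIMARY KEY, ProductName TEXT);
CREATE TABLE order_details (OrderID INTEGER, ProductID INTEGER);

-- Hypothetical sample data: Acme buys the Echo twice, Globex never does.
INSERT INTO customers VALUES (1, 'Acme'), (2, 'Globex');
INSERT INTO orders VALUES (10, 1), (11, 1), (12, 2);
INSERT INTO products VALUES (100, 'Echo'), (101, 'Kindle');
INSERT INTO order_details VALUES (10, 100), (11, 100), (12, 101);
""")

# Same shape as the slide's query: three joins to get from product to customer.
ECHO_BUYERS_SQL = """
SELECT DISTINCT c.CompanyName
FROM customers AS c
JOIN orders AS o ON c.CustomerID = o.CustomerID
JOIN order_details AS od ON o.OrderID = od.OrderID
JOIN products AS p ON od.ProductID = p.ProductID
WHERE p.ProductName = 'Echo'
"""

echo_buyers = [row[0] for row in conn.execute(ECHO_BUYERS_SQL)]
print(echo_buyers)
```

DISTINCT matters here: without it, Acme would be returned once per matching order.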
  • 42. SPARQL DECLARATIVE GRAPH QUERY. Find the name of companies that purchased the ‘Echo’.
      PREFIX sales_db: <http://sales.widget.com/>
      SELECT DISTINCT ?comp_name WHERE {
        ?customer sales_db:HAS_ORDER ?order ;          # customer graph pattern
                  sales_db:CompanyName ?comp_name .
        ?order sales_db:HAS_DETAILS ?order_d .         # orders graph pattern
        ?order_d sales_db:HAS_PRODUCT ?product .       # order details graph pattern
        ?product sales_db:ProductName "Echo" .         # products graph pattern
      }
      * Source: http://www.playnexacro.com/index.html#show:article
  • 43. GREMLIN IMPERATIVE GRAPH TRAVERSAL. Find the name of companies that purchased the ‘Echo’.
      g.V().hasLabel('Product').has('name','Echo').   /* All products named "Echo" */
        in('HAS_PRODUCT').                            /* Traverse to order details */
        in('HAS_DETAILS').                            /* Traverse to order */
        in('HAS_ORDER').                              /* Traverse to Customer */
        values('CompanyName').dedup()                 /* Unique Company Name */
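The same walk can be sketched in plain Python to show what each in(...) step does: start from the matching product and hop backwards along one edge label at a time. The vertex ids, the sample data, and the in_step helper are hypothetical; only the edge labels and property names come from the slide.

```python
# Hypothetical sample graph: vertex id -> properties.
vertices = {
    "c1": {"label": "Customer", "CompanyName": "Acme"},
    "c2": {"label": "Customer", "CompanyName": "Globex"},
    "o1": {"label": "Order"},
    "o2": {"label": "Order"},
    "o3": {"label": "Order"},
    "d1": {"label": "OrderDetails"},
    "d2": {"label": "OrderDetails"},
    "d3": {"label": "OrderDetails"},
    "p1": {"label": "Product", "name": "Echo"},
    "p2": {"label": "Product", "name": "Kindle"},
}

# Directed edges as (source, label, target).
edges = [
    ("c1", "HAS_ORDER", "o1"),
    ("c1", "HAS_ORDER", "o2"),
    ("c2", "HAS_ORDER", "o3"),
    ("o1", "HAS_DETAILS", "d1"),
    ("o2", "HAS_DETAILS", "d2"),
    ("o3", "HAS_DETAILS", "d3"),
    ("d1", "HAS_PRODUCT", "p1"),
    ("d2", "HAS_PRODUCT", "p1"),
    ("d3", "HAS_PRODUCT", "p2"),
]


def in_step(frontier, label):
    """Move from each vertex in the frontier to the sources of its incoming edges."""
    return [src for v in frontier for (src, lbl, dst) in edges
            if lbl == label and dst == v]


# hasLabel('Product').has('name', 'Echo')
products = [vid for vid, props in vertices.items()
            if props["label"] == "Product" and props.get("name") == "Echo"]
details = in_step(products, "HAS_PRODUCT")   # order details containing the product
orders = in_step(details, "HAS_DETAILS")     # orders owning those details
customers = in_step(orders, "HAS_ORDER")     # customers who placed those orders

# values('CompanyName').dedup(), keeping first-seen order
company_names = list(dict.fromkeys(vertices[c]["CompanyName"] for c in customers))
print(company_names)
```

This sketch scans the whole edge list at every hop for brevity; a real graph database keeps per-vertex adjacency so each hop touches only the neighbouring edges.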
  • 44. TRIADIC CLOSURE: CLOSING TRIANGLES (diagram: Terry, Bill, and Sarah linked by FRIEND edges)
  • 45. Recommend New Connections
      g = graph.traversal()
      g.V().has('name','Terry').as('user').
        both('FRIEND').aggregate('friends').
        both('FRIEND').
        where(neq('user')).where(without('friends')).
        groupCount().by('name').
        order(local).by(values, decr)
  • 46. FIND TERRY: g.V().has('name','Terry').as('user')
  • 47. FIND TERRY’S FRIENDS: both('FRIEND').aggregate('friends')
  • 48. AND THE FRIENDS OF THOSE FRIENDS: both('FRIEND') (diagram: user -FRIEND- friend -FRIEND- fof)
  • 49. ...WHO AREN’T TERRY AND AREN’T FRIENDS WITH TERRY: where(neq('user')).where(without('friends')) (diagram: user -FRIEND- friend -FRIEND- fof, with an X marking the excluded candidates)
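The four steps of the walkthrough can be mirrored in a few lines of Python. This is an illustrative sketch, not how a Gremlin engine evaluates the traversal: the friendship list is invented (only Terry, Bill, and Sarah appear on the slides; Dana and Eli are hypothetical), and recommend is a made-up helper name.

```python
from collections import Counter

# Undirected FRIEND edges; hypothetical sample data.
friend_edges = [
    ("Terry", "Bill"),
    ("Terry", "Sarah"),
    ("Bill", "Sarah"),
    ("Bill", "Dana"),
    ("Sarah", "Dana"),
    ("Sarah", "Eli"),
]


def neighbours(name):
    """both('FRIEND'): follow FRIEND edges regardless of direction."""
    result = []
    for a, b in friend_edges:
        if a == name:
            result.append(b)
        if b == name:
            result.append(a)
    return result


def recommend(user):
    """Rank friends-of-friends by how many of the user's friends know them."""
    friends = set(neighbours(user))           # aggregate('friends')
    counts = Counter()
    for friend in friends:
        for fof in neighbours(friend):        # second both('FRIEND')
            # where(neq('user')) and where(without('friends'))
            if fof != user and fof not in friends:
                counts[fof] += 1              # groupCount().by('name')
    return counts.most_common()               # highest count first


print(recommend("Terry"))
```

Dana ranks above Eli because two of Terry's friends already know Dana, which is exactly the "closing triangles" idea: the more open triangles a candidate would close, the stronger the recommendation.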
  • 50. CHALLENGES OF EXISTING GRAPH DATABASES: difficult to maintain high availability; difficult to scale; limited support for open standards; too expensive
  • 51. AMAZON NEPTUNE: fully managed graph database
      • FAST: Query billions of relationships with millisecond latency
      • RELIABLE: 6 replicas of your data across 3 AZs with full backup and restore
      • EASY: Build powerful queries easily with Gremlin and SPARQL
      • OPEN: Supports Apache TinkerPop & W3C RDF graph models
  • 52. AMAZON NEPTUNE HIGH-LEVEL ARCHITECTURE (diagram: bulk load from Amazon S3; database management)
  • 53. Fully Managed Service: BENEFITS
      • Easily configurable via the console
      • Multi-AZ high availability
      • Support for up to 15 read replicas
      • Supports encryption at rest
      • Supports encryption in transit (TLS)
      • Backup and restore, point-in-time recovery
  • 54. AMAZON NEPTUNE: VPC DEPLOYMENT
      • Secure deployment in a VPC
      • Increased availability through deployment in two subnets in two different Availability Zones (AZs)
      • The cluster volume always spans three AZs to provide durable storage
      • See the Amazon Neptune documentation for VPC setup details
  • 55. BATTLE-TESTED CLOUD-NATIVE STORAGE ENGINE: OVERVIEW
      • Data is replicated 6 times across 3 Availability Zones
      • Continuous backup to Amazon S3 (built for 11 9s durability)
      • Continuous monitoring of nodes and disks for repair
      • 10 GB segments as the unit of repair or hotspot rebalance
      • Quorum system for reads and writes; latency tolerant
      • Quorum membership changes do not stall writes
      • Storage volume automatically grows up to 64 TB
      (Diagram: Amazon Neptune over six storage nodes spread across AZ 1, AZ 2 and AZ 3, with storage monitoring and backup to Amazon S3)
  • 56. AMAZON NEPTUNE HIGH AVAILABILITY AND FAULT TOLERANCE (CLOUD-NATIVE STORAGE)
      What can fail?
      • Segment failures (disks)
      • Node failures (machines)
      • AZ failures (network or datacenter)
      Optimizations
      • 4 out of 6 write quorum
      • 3 out of 6 read quorum
      • Peer-to-peer replication for repairs
      (Diagrams: Amazon Neptune with caching, over storage in AZ 1, AZ 2 and AZ 3)
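The quorum sizes are not arbitrary, and the arithmetic is easy to check. The sketch below applies the standard quorum rules to the numbers on the slide; the assumption that the six copies are split evenly, two per AZ, is mine, as are the constant and function names.

```python
COPIES = 6          # copies of each piece of data (from the slide)
COPIES_PER_AZ = 2   # assumption: 6 copies spread evenly over 3 AZs
WRITE_QUORUM = 4    # "4 out of 6 write quorum"
READ_QUORUM = 3     # "3 out of 6 read quorum"

# Any read set must overlap any write set, so a read sees the latest write.
reads_see_latest_write = READ_QUORUM + WRITE_QUORUM > COPIES

# Any two write sets must overlap, so two conflicting writes cannot both succeed.
writes_cannot_conflict = 2 * WRITE_QUORUM > COPIES


def availability(lost_copies):
    """Which operations still reach quorum after losing some copies."""
    alive = COPIES - lost_copies
    return {"can_write": alive >= WRITE_QUORUM, "can_read": alive >= READ_QUORUM}


after_az_loss = availability(COPIES_PER_AZ)          # a whole AZ is down
after_az_plus_one = availability(COPIES_PER_AZ + 1)  # an AZ plus one more copy

print(reads_see_latest_write, writes_cannot_conflict)
print(after_az_loss)
print(after_az_plus_one)
```

Under these assumptions, losing an entire AZ still leaves both quorums reachable, and losing an AZ plus one more copy still leaves the read quorum, which is what allows the lost copies to be rebuilt.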
  • 57. AMAZON NEPTUNE READ REPLICAS
      Availability
      • Failing database nodes are automatically detected and replaced
      • Failing database processes are automatically detected and recycled
      • Replicas are automatically promoted to primary if needed (failover)
      • Customer-specifiable failover order
      Performance
      • Customer applications can scale out read traffic across read replicas
      • Read balancing across read replicas
      (Diagram: a primary node and read replicas across AZ 1, AZ 2 and AZ 3, with cluster and instance monitoring)
  • 58. AMAZON NEPTUNE FAILOVER TIMES ARE TYPICALLY < 30 SECONDS (timeline diagram: App Running, Database Failure, Failure Detection, DNS Propagation, Recovery, App Running; a Replica-Aware App Running track is also shown; durations shown: 15-20 sec and 3-10 sec)
  • 59. AMAZON NEPTUNE CONTINUOUS BACKUP (CLOUD-NATIVE STORAGE)
      • Take a periodic snapshot of each segment in parallel; stream the logs to Amazon S3
      • Backup happens continuously without performance or availability impact
      • At restore, retrieve the appropriate segment snapshots and log streams to storage nodes
      • Apply log streams to segment snapshots in parallel and asynchronously
      (Diagram: segment snapshots and log records for Segments 1, 2 and 3 over time, with a recovery point)
  • 60. AMAZON NEPTUNE ONLINE POINT-IN-TIME RESTORE (CLOUD-NATIVE STORAGE)
      Online point-in-time restore is a quick way to bring the database to a particular point in time without having to restore from backups
      • Rewinding the database to quickly
      • Rewind multiple times to determine the desired point in time in the database state
      (Diagram: timeline t0 to t4 showing a rewind to t1 and a rewind to t3; the rewound-over stretches are marked invisible)
  • 63. Example scenario: employment history application (people, companies, roles)
      Use cases
      1. Find the companies where X has worked, and their roles at those companies
      2. Find the people who have worked for a company at a specific location during a particular time period
      3. Find the people in more senior roles at the companies where X worked
  • 64. Identify entities, relationships, and attributes
      Find the companies where X has worked, and their roles at those companies
      Which companies has X worked for, and in what roles?
      Companies   ->  Company     (Entity)
      X           ->  Person      (Entity)
      Worked for  ->  Worked for  (Relationship)
      Roles       ->  Role        (Attribute?)
  • 65. Identify candidate vertices, labels, and properties
      Company     Entity        Vertex     Company
      Person      Entity        Vertex     Person
      Worked for  Relationship  Edge       WORKED_FOR
      Role        Attribute?    Property?  role
      (Diagram labels: name, Company, Person)
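A first-cut model following this mapping keeps role as a plain property on the WORKED_FOR edge. The Python sketch below uses that shape to answer use case 1; the vertex ids, company names, role values, and the companies_and_roles helper are hypothetical.

```python
# Vertices: id -> label plus properties (sample data is invented).
vertices = {
    "p-1": {"label": "Person", "name": "X"},
    "c-1": {"label": "Company", "name": "Any Co"},
    "c-2": {"label": "Company", "name": "Example Co"},
}

# Edges: role lives on the WORKED_FOR edge as a simple property.
edges = [
    {"from": "p-1", "to": "c-1", "label": "WORKED_FOR", "role": "Engineer"},
    {"from": "p-1", "to": "c-2", "label": "WORKED_FOR", "role": "Manager"},
]


def companies_and_roles(person_id):
    """Use case 1: companies where a person worked, with the role held there."""
    return [
        (vertices[e["to"]]["name"], e["role"])
        for e in edges
        if e["label"] == "WORKED_FOR" and e["from"] == person_id
    ]


print(companies_and_roles("p-1"))
```

With role stored as a property, use case 1 needs a single hop. The model says nothing about how roles relate to each other, which only becomes a problem once a question about seniority is asked.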
  • 66. Entity or attribute? Vertex or property?
      Is role an entity or an attribute?
      • Does it have identity (or is it a value type)? ✗
      • Is it a complex type (with multiple fields)? ✗
      • Are there any structural relations between values? ?
      Keep it simple
      • Prefer properties to vertices/edges until the need arises
  • 68. What questions would we have to ask of our data?
      Find the people in more senior roles at the companies where X worked
      Who were in senior roles at the companies where X worked?
  • 69. Entity or attribute? Vertex or property?
      Is role an entity or an attribute?
      • Does it have identity (or is it a value type)? ✗
      • Is it a complex type (with multiple fields)? ✗
      • Are there any structural relations between values? ✓
      Model structural relations with edges
      • Promote role to being a vertex
  • 72. Traversal 3: Who were in more senior roles at the companies where Li worked?
  • 73. Gremlin query 3
      g.V('p-3').
        out('JOB').as('j1').
        out('ROLE').
        repeat(out('PARENT_ROLE')).until(outE().count().is(0)).
        emit().in('ROLE').as('j2').
        or(
          (where('j1', between('j2', 'j2')).by('from').by('from').by('to')),
          (where('j1', between('j2', 'j2')).by('to').by('from').by('to')),
          (where('j1', lte('j2').and(gt('j2'))).by('from').by('from').by('to').by('from'))
        ).
        project('role', 'name').
          by(out('ROLE').values('name')).
          by(in('JOB').values('firstName', 'lastName').fold()).
        toList()
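The traversal is dense, so here is the same idea in plain Python: walk up the PARENT_ROLE chain from each of Li's roles, then keep the people who held one of those senior roles while Li was employed. This is a simplified sketch, not a translation of the query: it uses one generic interval-overlap test instead of the three or() branches, it assumes a single shared role hierarchy, and all roles, people other than Li, and dates are invented.

```python
from dataclasses import dataclass

# Hypothetical role hierarchy: role -> more senior role (None at the top).
PARENT_ROLE = {"Engineer": "Manager", "Manager": "Director", "Director": None}


@dataclass(frozen=True)
class Job:
    person: str
    role: str
    start: int  # the 'from' year on the job
    end: int    # the 'to' year on the job


JOBS = [
    Job("Li", "Engineer", 2012, 2015),
    Job("Ana", "Manager", 2011, 2014),
    Job("Raj", "Director", 2016, 2018),
    Job("Sam", "Engineer", 2012, 2015),
]


def senior_roles(role):
    """repeat(out('PARENT_ROLE')).emit(): every role above the given one."""
    result = []
    parent = PARENT_ROLE.get(role)
    while parent is not None:
        result.append(parent)
        parent = PARENT_ROLE.get(parent)
    return result


def overlaps(a, b):
    """True when the two employment periods share some time."""
    return a.start < b.end and b.start < a.end


def seniors_of(person):
    """People who held a more senior role during the person's own job."""
    found = []
    for mine in (j for j in JOBS if j.person == person):
        seniors = senior_roles(mine.role)
        for other in JOBS:
            if other.role in seniors and overlaps(mine, other):
                found.append((other.role, other.person))
    return found


print(seniors_of("Li"))
```

Raj is excluded even though Director is senior to Engineer, because his job starts after Li's ends. Walking the hierarchy is only possible because role was promoted from a property to a vertex with PARENT_ROLE edges.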
  • 82. Wrapping up
      • Graphs can be applied to a huge number of use cases
      • A graph database fits more naturally and performs much faster than other databases when working with highly connected data
      • Scaling out a graph database is not easy. Amazon Neptune makes your life easier
  • 84. Appendix I: Converting Other Data Models to Graph
  • 85. Mapping relational to graph (RDF)
      W3C Direct Mapping
      • “Out-of-the-box” schema for mapping relational data to RDF
      • https://www.w3.org/TR/rdb-direct-mapping/
      R2RML
      • Standard that allows you to specify mappings from relations to graph
      • Rules defined over logical tables
      • https://www.w3.org/TR/r2rml/
      Tooling
      • D2RQ: access a relational database as a virtual, read-only RDF graph
  • 86. Table
      id  …  f_name
      12  …  Alice
      37  …  Bob
  • 87. Foreign keys
      id  …  f_name
      12  …  Alice
      37  …  Bob

      id   p_id  type  addr_1
      512  12    home  High St
      655  37    work  Main St
      700  12    work  Any St
  • 89. Join tables
      id  …  f_name
      12  …  Alice
      37  …  Bob

      id   …  name
      512  …  Any Co
      655  …  Example Co
      700  …  Example.com

      p_id  c_id  from  to
      12    512   2012  2015
      37    512   2011  2016
      37    655   2016  2017
      12    700   2015  2017
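One common convention for turning these tables into a property graph is: each row becomes a vertex, each foreign key becomes an edge, and each join-table row becomes an edge that carries the extra columns as properties. The Python sketch below applies that convention to the rows shown on the slides. The vertex id scheme, the Person, Address and Company labels, and the HAS_ADDRESS edge label are assumptions made for illustration; WORKED_FOR is borrowed from the modelling slides.

```python
# Rows as they appear on the slides (elided columns omitted).
people = [{"id": 12, "f_name": "Alice"}, {"id": 37, "f_name": "Bob"}]
addresses = [
    {"id": 512, "p_id": 12, "type": "home", "addr_1": "High St"},
    {"id": 655, "p_id": 37, "type": "work", "addr_1": "Main St"},
    {"id": 700, "p_id": 12, "type": "work", "addr_1": "Any St"},
]
companies = [
    {"id": 512, "name": "Any Co"},
    {"id": 655, "name": "Example Co"},
    {"id": 700, "name": "Example.com"},
]
employment = [
    {"p_id": 12, "c_id": 512, "from": 2012, "to": 2015},
    {"p_id": 37, "c_id": 512, "from": 2011, "to": 2016},
    {"p_id": 37, "c_id": 655, "from": 2016, "to": 2017},
    {"p_id": 12, "c_id": 700, "from": 2015, "to": 2017},
]

vertices = {}  # vertex id -> properties
edges = []     # (source id, label, target id, properties)

# Rows become vertices. Ids are prefixed because address and company
# primary keys collide (512, 655, 700 appear in both tables).
for row in people:
    vertices[f"person-{row['id']}"] = {"label": "Person", "f_name": row["f_name"]}
for row in companies:
    vertices[f"company-{row['id']}"] = {"label": "Company", "name": row["name"]}

# A foreign key (addresses.p_id) becomes an edge from the referenced row.
for row in addresses:
    vid = f"address-{row['id']}"
    vertices[vid] = {"label": "Address", "type": row["type"], "addr_1": row["addr_1"]}
    edges.append((f"person-{row['p_id']}", "HAS_ADDRESS", vid, {}))

# A join-table row becomes an edge; its extra columns become edge properties.
for row in employment:
    edges.append((
        f"person-{row['p_id']}", "WORKED_FOR", f"company-{row['c_id']}",
        {"from": row["from"], "to": row["to"]},
    ))

alice_companies = [
    vertices[dst]["name"]
    for (src, label, dst, props) in edges
    if src == "person-12" and label == "WORKED_FOR"
]
print(len(vertices), len(edges), alice_companies)
```

The join table disappears as a structure of its own: its rows turn into edges, which is why many-to-many relationships tend to look simpler in a graph model.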
  • 91. Documents
      { id: order-1
        delivery-address: { // address-1 }
        payment-address: { // address-1 }
      { id: order-2
        delivery-address: { // address-1 }
      { id: order-3
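  • 92. Key-Value (columns: id, f_name, state, start, city:dept, tel)
      f_name: Alice, state: TX, city:dept: Austin:Dev
      id: 37, f_name: Bob, state: TX, city:dept: Dallas:Dev, tel: 555-0100
      id: 99, f_name: Dan, state: TX, start: 2016, city:dept: Austin:Ops, tel: 555-0199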
  • 93. Data load options
      Bulk loader API
      • Load data from S3 into Neptune
      • Low overhead, optimized for large datasets
      • Good for append-only loads
      Online endpoints
      • Gremlin or SPARQL
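As a rough illustration of the bulk loader option, the sketch below builds (but does not send) the HTTP request that starts a load job. It assumes the loader is reached with a POST to the cluster's /loader path on the default port 8182, with a JSON body naming the S3 source, the data format, an IAM role, and a region. The endpoint host, bucket, role ARN, and region values are placeholders, and build_bulk_load_request is a hypothetical helper; check the Neptune loader documentation for the full parameter list before relying on this.

```python
import json
import urllib.request


def build_bulk_load_request(endpoint, s3_uri, iam_role_arn, region):
    """Build the POST request that would start a bulk load job (not sent here)."""
    payload = {
        "source": s3_uri,            # S3 prefix holding the files to load
        "format": "csv",             # Gremlin CSV; RDF formats also exist
        "iamRoleArn": iam_role_arn,  # role that lets Neptune read the bucket
        "region": region,
        "failOnError": "FALSE",
    }
    return urllib.request.Request(
        url=f"https://{endpoint}:8182/loader",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder values for illustration only.
request = build_bulk_load_request(
    endpoint="my-cluster.example.com",
    s3_uri="s3://my-bucket/graph-data/",
    iam_role_arn="arn:aws:iam::123456789012:role/NeptuneLoadFromS3",
    region="eu-west-1",
)
print(request.get_method(), request.full_url)
```

Sending the request with urllib.request.urlopen(request) would only work from a network that can reach the cluster, which typically means from inside its VPC.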
  • 94. Appendix II: Ignition One. Customer use case presented at AWS Atlanta Summit: https://www.slideshare.net/AmazonWebServices/using-amazon-neptune-to-power-identity-resolution-at-scale-adb303-atlanta-aws-summit