1. STRENGTHEN CONNECTIONS
BROADEN HORIZONS
BRIDGE DIVIDES
How Expedia’s
Entity Graph
Powers Global
Travel
Chris Williams
06.07.2022
Raghu Sayana
2. Agenda
SECTION CONTENT
EXPEDIA GROUP
How Expedia’s Entity Graph Powers Global Travel
Presentation is broken into 2 parts…
• 1 | Infrastructure – deploying and supporting Neo4j with a common framework
• 2 | Implementation – an example of using Neo4j to solve business problems
Raghu Sayana
Chris Williams
6. A true calling
When we power global travel for everyone, everywhere, we unleash more opportunities to strengthen connections, broaden horizons, and bridge divides.
Our conviction that travel is a force for good is core to who we are. We are called to act on that worldview through our mission and purpose.
Together, they guide our strategy and give us an edge.
7. Platform requisites
• Thousands of clusters and tens of
teams
• Varying levels of automation and
DB expertise
• License management
• Inconsistent tooling
• Lack of governance
8. Cerebro
“A fully managed self-serve database
platform that allows the domains to focus
on building high-quality products with no
need for database management”
9. Overview
10. Architecture
* https://aws.amazon.com/blogs/mt/how-expedia-group-built-database-as-a-service-dbaas-offering-using-aws-service-catalog/
12. Data Technologies at EG
13. Strangler
A Pattern to Follow
• Enable change tracking on the core tables/data
• Propagate those “events” to a cloud-based
datastore
• Write a new, cloud-based app to consume data in
the new store
• Dependent services can follow along as their
schedule allows
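The pattern above can be sketched as a small event applier: change events captured from the core tables are replayed into a new store that the new app reads. This is a minimal Python sketch, not the EG implementation. The event shape (`op`, `table`, `key`, `row`) and the in-memory `NewStore` standing in for the cloud-based datastore are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Any, Dict, Iterable, Optional, Tuple


@dataclass(frozen=True)
class ChangeEvent:
    """Hypothetical change-tracking record emitted for a core table row."""
    op: str                                # "upsert" or "delete"
    table: str
    key: str
    row: Optional[Dict[str, Any]] = None   # new row image for upserts


class NewStore:
    """In-memory stand-in for the cloud-based datastore (assumption)."""

    def __init__(self) -> None:
        self._rows: Dict[Tuple[str, str], Dict[str, Any]] = {}

    def apply(self, event: ChangeEvent) -> None:
        slot = (event.table, event.key)
        if event.op == "upsert":
            self._rows[slot] = dict(event.row or {})
        elif event.op == "delete":
            self._rows.pop(slot, None)     # deleting a missing row is a no-op
        else:
            raise ValueError(f"unknown op: {event.op}")

    def get(self, table: str, key: str) -> Optional[Dict[str, Any]]:
        return self._rows.get((table, key))


def replay(events: Iterable[ChangeEvent], store: NewStore) -> NewStore:
    """Apply events in arrival order so the new store converges on the source."""
    for event in events:
        store.apply(event)
    return store
```

Because the new store is fed only by events, the legacy tables stay untouched and dependent services can switch their reads over one at a time.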
14. An Approach
Event Driven View
Sources → Enable Change Stream → Propagate the Change → Process the Change → Store the Change → Render the View
15. It’s a classic pairing - Neo4j & MongoDB
Turns out... EG is not alone
• Example -
• And another -
16. Two actions here…
• Data Transport – populating the “view”
  • DataSync and DataPull
• Rendering the View – getting your data out
  • API calls to Neo/Mongo
17. DataSync
Populating the view
• Real-time Data Consumption
• Publisher/Subscriber Model
• In Order
• Easy to On-Board New Data Technologies
• Configuration Based
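The bullets above can be illustrated with a toy publisher/subscriber bus. This is a sketch under stated assumptions, not DataSync itself: the `CONFIG` layout, the topic and sink names, and the offset-based ordering check are all hypothetical. It shows subscriptions driven by configuration and messages delivered in order, with stale offsets skipped.

```python
from collections import defaultdict
from typing import Callable, Dict, List

Handler = Callable[[dict], None]

# Hypothetical configuration: which sinks subscribe to which topic.
# On-boarding a new data technology means adding one sink name here.
CONFIG: Dict[str, List[str]] = {
    "property-changes": ["graph_sink", "document_sink"],
}


class SyncBus:
    """Toy in-process bus; a real deployment would sit on a message bus."""

    def __init__(self, config: Dict[str, List[str]], sinks: Dict[str, Handler]) -> None:
        self._subscribers: Dict[str, List[Handler]] = defaultdict(list)
        for topic, sink_names in config.items():
            for name in sink_names:
                self._subscribers[topic].append(sinks[name])
        self._last_offset: Dict[str, int] = {}   # topic -> highest offset delivered

    def publish(self, topic: str, offset: int, message: dict) -> bool:
        """Deliver to every subscriber; return False for stale or duplicate offsets."""
        if offset <= self._last_offset.get(topic, -1):
            return False
        self._last_offset[topic] = offset
        for handler in self._subscribers[topic]:
            handler(message)
        return True
```

A usage example: build the bus with `SyncBus(CONFIG, {"graph_sink": graph_log.append, "document_sink": doc_log.append})`, where both logs are plain lists, then call `publish("property-changes", 0, {...})` for each change.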
18. DataPull
Populating the view
• Backfill historical data
• Build files that are easy to access and load
• Current State of Data
• Leverage native query language
• On-demand
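An on-demand pull of the current state can be sketched as "run a query in the source's native language, write a flat file". The sketch below is an illustration only: SQLite stands in for the source system, and the function name `pull_snapshot` and the table used in the usage note are assumptions, not part of DataPull.

```python
import csv
import io
import sqlite3
from typing import TextIO


def pull_snapshot(conn: sqlite3.Connection, query: str, out: TextIO) -> int:
    """Run a native query and write the result as CSV with a header row.

    Returns the number of data rows written.
    """
    cursor = conn.execute(query)
    writer = csv.writer(out)
    # cursor.description holds one tuple per column; item 0 is the column name.
    writer.writerow([column[0] for column in cursor.description])
    count = 0
    for row in cursor:
        writer.writerow(row)
        count += 1
    return count
```

For example, with an in-memory database holding a `property` table, `pull_snapshot(conn, "SELECT id, name FROM property ORDER BY id", io.StringIO())` writes a header line followed by one line per property. The resulting file is easy to bulk load into the target store.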
19. Neo4j
Render the View
• Views have a “root” key – always come in with that
key to fetch the entire graph
• Another key = another view – great that the graph
can be traversed in any direction
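The "root key" idea above can be shown without a database. This is a minimal sketch in plain Python, assuming relationships are available as hypothetical `(start, type, end)` tuples; in Neo4j itself the traversal would be a Cypher query from the root node. Starting from a different key yields a different view of the same graph.

```python
from collections import defaultdict, deque
from typing import Dict, Iterable, Optional, Set, Tuple

Relationship = Tuple[str, str, str]   # (start key, relationship type, end key)


def build_index(relationships: Iterable[Relationship]) -> Dict[str, Set[str]]:
    """Index each relationship from both ends so it can be walked either way."""
    neighbors: Dict[str, Set[str]] = defaultdict(set)
    for start, _rel_type, end in relationships:
        neighbors[start].add(end)
        neighbors[end].add(start)
    return neighbors


def render_view(
    neighbors: Dict[str, Set[str]],
    root_key: str,
    max_hops: Optional[int] = None,
) -> Set[str]:
    """Breadth-first walk from the root key; returns every key reached."""
    seen = {root_key}
    queue = deque([(root_key, 0)])
    while queue:
        key, hops = queue.popleft()
        if max_hops is not None and hops >= max_hops:
            continue                      # do not expand beyond the hop limit
        for nxt in neighbors.get(key, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, hops + 1))
    return seen
```

The returned set of keys is what the document store is then asked to resolve into full entities.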
20. MongoDB
Render the View
• Take the Neo4j graph of “keys” and use it to query Entity Collections by _id (PK)
• Many collections can be queried simultaneously
• Secondary Indexes are not necessary
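The lookup step above can be sketched as one primary-key fetch per collection, run concurrently. This is an illustration under assumptions: plain dicts keyed by `_id` stand in for the entity collections, and the names `fetch_by_ids` and `render_entities` are hypothetical. With PyMongo, the per-collection fetch would typically be `collection.find({"_id": {"$in": ids}})`, which is served by the default `_id` index.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Dict, List

Document = Dict[str, Any]
Collection = Dict[str, Document]   # stand-in: _id -> document


def fetch_by_ids(collection: Collection, ids: List[str]) -> List[Document]:
    """Primary-key lookup only; keys with no document are skipped."""
    return [collection[_id] for _id in ids if _id in collection]


def render_entities(
    collections: Dict[str, Collection],
    keys_by_collection: Dict[str, List[str]],
) -> Dict[str, List[Document]]:
    """Query every collection named in the key graph at the same time."""
    workers = max(1, len(keys_by_collection))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {
            name: pool.submit(fetch_by_ids, collections[name], ids)
            for name, ids in keys_by_collection.items()
        }
        return {name: future.result() for name, future in futures.items()}
```

Since every lookup goes through `_id`, no secondary index is involved, which matches the last bullet.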
21. References
• https://www.mongodb.com/
• http://cassandra.apache.org/
• https://neo4j.com/
• https://www.docker.com/
• http://agilemethodology.org/
• https://neo4j.com/developer/docker-run-neo4j/
• https://aws.amazon.com/blogs/mt/how-expedia-group-built-database-as-a-service-dbaas-offering-using-aws-service-catalog/
22. Thank You
• Chris Williams
• @anote2chris
• chriswilliamsdba
• Raghavendra Sayana
• @RaghuSayana
• raghavendrasayana
Editor's Notes
We believe travel is a force for good in the world.
So it is our mission to power global travel, for everyone, everywhere.
- We do that through the power of our brands
- And our platform and technology solutions power it all.
(Numbers based on FY 2020)
Pay attention to this… it’s our roadmap for the rest of the talk.
We’ll focus on sections from the middle to the far right.
This assumes you can get your changes onto the message bus (Kafka) to create a change stream.
What did we just see?
Saw the Neo interface I mentioned up front.
Exposed to Cypher and its ‘ASCII art’ style
Given a “bulk” load command – best practice – LIMIT 10K seems to be a sweet spot
Given a ‘single node’ create statement
Added an indexable Property after the fact
Showed how to remove nodes and relationships – might want to invest in writing a simple loop executor to do this or get APOC installed.
Showed the ‘WITH’ keyword – it works like a CTE
Showed how to use a constraint – best practice – do this before you load data.
Showed how to ‘wipe’ a neo graphDB.
Showed how to use Docker image to do this
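The “simple loop executor” mentioned above could look something like the sketch below. It is an assumption-laden illustration, not what was shown in the demo: the `Stale` label is a placeholder, the 10K batch size is borrowed from the bulk-load note, and `run_batch` is a hypothetical callable that sends one statement to the database and returns how many nodes it deleted.

```python
from typing import Callable

# Cypher for one pass; the label and batch size are placeholders.
DELETE_BATCH = (
    "MATCH (n:Stale) WITH n LIMIT 10000 "
    "DETACH DELETE n RETURN count(*) AS deleted"
)


def delete_in_batches(
    run_batch: Callable[[str], int],
    statement: str = DELETE_BATCH,
    max_passes: int = 1000,
) -> int:
    """Re-run the batch delete until a pass removes nothing; return the total."""
    total = 0
    for _ in range(max_passes):
        deleted = run_batch(statement)
        if deleted == 0:
            return total
        total += deleted
    raise RuntimeError("stopped after max_passes with nodes still remaining")


class FakeGraph:
    """Test double that pretends to delete up to `batch` nodes per call."""

    def __init__(self, nodes: int, batch: int = 10_000) -> None:
        self.nodes = nodes
        self.batch = batch

    def run(self, statement: str) -> int:
        removed = min(self.batch, self.nodes)
        self.nodes -= removed
        return removed
```

Keeping each pass small keeps each transaction small, which is the main reason to loop rather than delete everything in one statement.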
What are some gotchas?
Neo doesn’t write well – it tries to ‘batch’ writes to avoid the expense
Other add-on things – like APOC might be able to do things better for you.
Deleting rows in bulk is not fun – can’t drop an entire node.
You get what you pay for… Neo4j has a Community Edition and an Enterprise Edition (EE)… EE is so much better.
Don’t add too many properties