Airline Reservations and Routing: A Graph Use Case

Jason Plurad
Chin Huang
DataWorks Summit Berlin / April 18, 2018 / © 2018 IBM Corporation

Pilots
Jason Plurad is a software developer in IBM Digital Business Group. He
develops open source software and builds open communities in the big data
and analytics space, with a current focus on graph databases and graph
analytics. He is a Technical Steering Committee member and committer on
JanusGraph and Apache TinkerPop.
Chin Huang is a software engineer in IBM Open Technologies and
Performance. He has worked on various enterprise and open source
projects. His current focus is JanusGraph and Node.js development and
performance characterization.

How Did We Get Here?
Jason
• Raleigh (RDU)
• Detroit (DTW)
• Amsterdam (AMS)
• Berlin (TXL)
Chin
• San Francisco (SFO)
• Copenhagen (CPH)
• Berlin (TXL)

Graphs are not new

Graph Data Use Cases
• Social network analysis
• Configuration management database
• Master data management
• Recommendation engines
• Knowledge graphs
• Internet of things
• Cyber security attack analysis

Property Graph
Diagram: airport vertices RDU, DTW, AMS, TXL, SFO, CPH
Vertex
• Label: airport
• Name: Berlin Tegel
• Code: TXL
• City: Berlin
• Country: Germany
Edge
• Label: route
• Flight: 343
• Distance: 501
• Depart: 13:05
• Arrive: 14:57
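To make the vertex/edge distinction concrete, here is a minimal in-memory sketch of the property graph model. The `Vertex` and `Edge` classes are hypothetical illustrations, not the JanusGraph or TinkerPop API, and the AMS origin of flight 343 is an assumption for the example, since the slide does not say where the route starts.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Vertex:
    """A vertex: an id, a label, and arbitrary key/value properties."""
    id: int
    label: str
    properties: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Edge:
    """A directed, labeled edge between two vertices, with its own properties."""
    id: int
    label: str
    out_v: Vertex  # where the edge starts
    in_v: Vertex   # where the edge points
    properties: Dict[str, Any] = field(default_factory=dict)


# Property values taken from the slide; the AMS vertex is a hypothetical origin.
txl = Vertex(1, "airport", {"name": "Berlin Tegel", "code": "TXL",
                            "city": "Berlin", "country": "Germany"})
ams = Vertex(2, "airport", {"code": "AMS"})
route = Edge(10, "route", ams, txl, {"flight": 343, "distance": 501,
                                      "depart": "13:05", "arrive": "14:57"})

edges: List[Edge] = [route]
# Which airports can be reached directly from AMS over "route" edges?
reachable = [e.in_v.properties["code"] for e in edges
             if e.out_v is ams and e.label == "route"]
print(reachable)
```

Both vertices and edges carry labels and properties; only edges have a direction (`out_v` to `in_v`).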

Gremlin: Graph Traversal Language
What is the shortest path to Berlin?
Apache TinkerPop
https://tinkerpop.apache.org
> rdu = g.V().has('airport', 'code', 'RDU').next()
> g.V(rdu).
    repeat( out('route').simplePath() ).
    until( has('code', 'TXL') ).
    limit(5).
    path().by('code').
    toList()
==> [RDU, JFK, TXL]
==> [RDU, LAX, TXL]
==> [RDU, MIA, TXL]
==> [RDU, YYZ, TXL]
==> [RDU, SFO, TXL]
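The traversal above can be approximated outside a graph database with a breadth-first search over simple paths (paths that never revisit a vertex) that stops extending a path once it reaches the target and keeps the first `limit` paths found. This sketch assumes a small, hypothetical route map; it mirrors the intent of `repeat(out('route').simplePath()).until(...).limit(5)` but is not how Gremlin executes the query internally.

```python
from collections import deque
from typing import Dict, List


def simple_paths_bfs(routes: Dict[str, List[str]], start: str, target: str,
                     limit: int) -> List[List[str]]:
    """Return up to `limit` simple paths from start to target, fewest hops first."""
    results: List[List[str]] = []
    queue = deque([[start]])
    while queue and len(results) < limit:
        path = queue.popleft()
        for nxt in routes.get(path[-1], []):
            if nxt in path:          # like simplePath(): never revisit an airport
                continue
            new_path = path + [nxt]
            if nxt == target:        # like until(has('code', target)): stop here
                results.append(new_path)
                if len(results) == limit:
                    break
            else:
                queue.append(new_path)
    return results


# Hypothetical route map: airport code -> directly reachable airport codes.
routes = {
    "RDU": ["JFK", "LAX", "DTW"],
    "JFK": ["TXL", "AMS"],
    "LAX": ["TXL"],
    "DTW": ["AMS"],
    "AMS": ["TXL"],
}
paths = simple_paths_bfs(routes, "RDU", "TXL", limit=3)
for p in paths:
    print(p)
```

Because the queue is processed in order of path length, one-stop paths are found before two-stop paths, which is what makes the first results the shortest by hop count.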

JanusGraph
Maintainer: The Linux Foundation
License: Apache 2.0
Releases: 0.3.0 planned for 2Q 2018
https://janusgraph.org
• Established in January 2017
• Fork of TitanDB
• Scalable graph database distributed on multi-machine clusters with pluggable storage and indexing
• Vendor-neutral, open community with open governance
• Founders: Expero, Google, Grakn, Hortonworks, IBM
• Members: Amazon, Huawei, Netflix, Orchestral Developments, Seeq, Uber
• In Production: Celum, Finc, G-Data, IBM Cloud, Seeq

JanusGraph Architecture
http://docs.janusgraph.org/latest/arch-overview.html
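Storage backends can be swapped because, per the JanusGraph documentation, JanusGraph stores each vertex's adjacency list (its properties and incident edges) as one row keyed by vertex id, with each property and edge in its own column, and talks to every backend through a key-column-value abstraction. The sketch below shows that idea with an abstract interface and a toy in-memory backend; the class names, method names, and column naming scheme are hypothetical and only loosely modeled on JanusGraph's Java `KeyColumnValueStore` interface.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Tuple


class KeyColumnValueStore(ABC):
    """Hypothetical storage interface: rows keyed by vertex id, sorted columns."""

    @abstractmethod
    def mutate(self, key: int, additions: Dict[str, str], deletions: List[str]) -> None: ...

    @abstractmethod
    def get_slice(self, key: int, start: str, end: str) -> List[Tuple[str, str]]: ...


class InMemoryStore(KeyColumnValueStore):
    """Toy backend; a real one would talk to Cassandra, HBase, or Scylla."""

    def __init__(self) -> None:
        self._rows: Dict[int, Dict[str, str]] = {}

    def mutate(self, key, additions, deletions):
        row = self._rows.setdefault(key, {})
        for col in deletions:
            row.pop(col, None)
        row.update(additions)

    def get_slice(self, key, start, end):
        # Return the row's columns in sorted order within [start, end).
        row = self._rows.get(key, {})
        return [(c, row[c]) for c in sorted(row) if start <= c < end]


store: KeyColumnValueStore = InMemoryStore()
# Row for vertex 1: one property cell and two edge cells (illustrative encoding).
store.mutate(1, {"p:code": "TXL", "e:route:2": "flight=343", "e:route:3": "flight=17"}, [])
edges = store.get_slice(1, "e:", "e;")  # ';' sorts right after ':', so this selects "e:*"
```

Keeping columns sorted is what lets a backend answer "all edges of this vertex" as a single contiguous slice of one row.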

Graph database storage backends: Performance evaluation
Graph use case: Air travel reservation

Performance Test Environment
Server spec
• Physical servers: x3650 M5, 2 sockets x 14 cores, 384 GB memory (12 x 32 GB)
• CPU: Intel Xeon Processor E5-2690 v4 14C 2.6GHz 35MB Cache 2400MHz
• Network interface: Emulex VFA5.2 ML2 Dual Port 10GbE SFP+ Adapter
• Disk: 720 GB SSD, RAID 5
• Operating system: Ubuntu 16.04.2 LTS
Public tools
• JMeter: load testing tool
• nmon and nmon analyser: system performance monitoring and analysis tools
• VisualVM: all-in-one Java troubleshooting and profiling tool
• GCeasy: garbage collection log analysis tool
• Prometheus and Grafana: monitoring dashboard

JanusGraph Utility Tools
How about graph data in volume?
• Existing data is lacking or unavailable for performance evaluation
• What are the performance characteristics at various volumes?
• Graph Data Generator generates graph data in different sizes and shapes, so you can easily simulate real data and measure performance
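As an illustration of what a graph data generator does, the sketch below produces vertex and edge CSV text of a chosen size with a fixed random seed so that runs are reproducible. The function name and column layout are hypothetical and do not reflect the janusgraph-utils Graph Data Generator's actual configuration or output format; the "shape" here is simply uniform random edges.

```python
import csv
import io
import random
from typing import Tuple


def generate_graph_csv(num_vertices: int, num_edges: int, seed: int = 42) -> Tuple[str, str]:
    """Return (vertices_csv, edges_csv) for a random airport/route graph."""
    rng = random.Random(seed)  # seeded, so the same arguments give the same data
    v_buf, e_buf = io.StringIO(), io.StringIO()

    v_writer = csv.writer(v_buf)
    v_writer.writerow(["id", "label", "code"])
    for i in range(num_vertices):
        v_writer.writerow([i, "airport", f"A{i:05d}"])

    e_writer = csv.writer(e_buf)
    e_writer.writerow(["out_id", "in_id", "label", "flight"])
    for _ in range(num_edges):
        out_id, in_id = rng.sample(range(num_vertices), 2)  # two distinct ids: no self-loops
        e_writer.writerow([out_id, in_id, "route", rng.randint(1, 9999)])

    return v_buf.getvalue(), e_buf.getvalue()


vertices_csv, edges_csv = generate_graph_csv(num_vertices=100, num_edges=500)
```

Scaling `num_vertices` and `num_edges` up or down gives the "different sizes"; replacing the uniform `rng.sample` with a skewed choice would change the shape.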
How to manage graph schema?
• Graph schema management tools are lacking
• Graph schemas may need to change for optimal performance
• Graph Schema Loader enables you to quickly load and update schema definitions in JanusGraph
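Schema updates in JanusGraph are constrained: per the JanusGraph documentation, the definition of a property key (such as its data type) cannot be changed once it has been committed. A schema loader therefore needs to work out which definitions are new and which conflict with what already exists. The sketch below performs only that diff step on a hypothetical name-to-data-type dictionary; it is not the Graph Schema Loader's file format and does not call the JanusGraph management API.

```python
from typing import Dict, List, Tuple


def plan_schema_update(existing: Dict[str, str],
                       desired: Dict[str, str]) -> Tuple[List[str], List[str]]:
    """Compare property keys given as name -> data type.

    Returns (to_create, conflicts): keys missing from the graph, and keys whose
    desired data type differs from the one already defined.
    """
    to_create = sorted(name for name in desired if name not in existing)
    conflicts = sorted(name for name, dtype in desired.items()
                       if name in existing and existing[name] != dtype)
    return to_create, conflicts


existing = {"code": "String", "flight": "Integer"}
desired = {"code": "String", "flight": "String", "distance": "Integer"}
to_create, conflicts = plan_schema_update(existing, desired)
```

Keys in `to_create` can be added safely; keys in `conflicts` need a different strategy, such as a new key name and a data migration.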
How to massively load data into a graph database?
• Many RDBMSs support exporting data to CSV files
• I have millions/billions of records!
• Data Batch Importer allows you to fully utilize system resources to import data from CSV files into JanusGraph
Open source code: https://github.com/IBM/janusgraph-utils
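To use the available resources, a batch importer typically splits the input into fixed-size batches and loads them in parallel, committing once per batch rather than once per record. The sketch below shows that batching pattern with a thread pool. `load_batch` is a hypothetical stub standing in for code that would open a JanusGraph transaction, add the elements, and commit; the batch size and thread count are arbitrary, and none of this reflects the Data Batch Importer's actual implementation.

```python
import csv
import io
from concurrent.futures import ThreadPoolExecutor
from itertools import islice
from typing import Iterator, List


def batches(rows: Iterator[List[str]], size: int) -> Iterator[List[List[str]]]:
    """Yield consecutive chunks of at most `size` rows."""
    while True:
        chunk = list(islice(rows, size))
        if not chunk:
            return
        yield chunk


def load_batch(batch: List[List[str]]) -> int:
    """Hypothetical loader stub: a real importer would add these rows in one
    transaction and commit. Here it only reports how many rows it received."""
    return len(batch)


def import_csv(text: str, batch_size: int = 1000, threads: int = 4) -> int:
    """Parse CSV text, skip the header, and load batches in parallel."""
    reader = csv.reader(io.StringIO(text))
    next(reader)  # skip the header row
    with ThreadPoolExecutor(max_workers=threads) as pool:
        return sum(pool.map(load_batch, batches(reader, batch_size)))


sample = "id,code\n" + "".join(f"{i},A{i}\n" for i in range(2500))
loaded = import_csv(sample, batch_size=1000)
```

Per-batch commits keep transactions small enough to avoid memory pressure while amortizing commit overhead; threads help here only because real loading is dominated by waiting on the database.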

Performance Test Topology
Diagram: a load injector sends insert, update, and query requests to JanusGraph, which runs against a three-node database cluster; the nodes run Cassandra, HBase + HDFS + ZooKeeper, or Scylla.

Performance Evaluation: Insert Vertices
• 40 million vertices in total
• 2 properties per vertex
• Insert scenario
• Injectors fully utilized to generate load against the databases

Performance Evaluation: Insert Edges
• 30 million edges in total
• 1 property per edge
• Query and update scenario
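Inserting an edge is a query-and-update operation because both endpoint vertices must be found before the edge can be attached. The sketch below models that with a hypothetical in-memory index from airport code to vertex id (in JanusGraph this lookup would typically go through an index on the property) and counts lookups, showing that each edge insert costs two reads plus one write. The flight numbers other than 343 are illustrative.

```python
from typing import Dict, List, Tuple


class TinyGraph:
    """Hypothetical in-memory graph with a unique index on airport code."""

    def __init__(self) -> None:
        self.code_index: Dict[str, int] = {}
        self.out_edges: Dict[int, List[Tuple[int, Dict[str, object]]]] = {}
        self.lookups = 0

    def add_vertex(self, code: str) -> int:
        vid = len(self.code_index)
        self.code_index[code] = vid
        self.out_edges[vid] = []
        return vid

    def find(self, code: str) -> int:
        self.lookups += 1  # query step: one index lookup per endpoint
        return self.code_index[code]

    def add_edge(self, out_code: str, in_code: str, props: Dict[str, object]) -> None:
        out_id, in_id = self.find(out_code), self.find(in_code)
        self.out_edges[out_id].append((in_id, props))  # update step


g = TinyGraph()
for code in ["RDU", "DTW", "AMS", "TXL"]:
    g.add_vertex(code)
g.add_edge("RDU", "DTW", {"flight": 1})
g.add_edge("DTW", "AMS", {"flight": 2})
g.add_edge("AMS", "TXL", {"flight": 343})
print(g.lookups)
```

This is why edge loading behaves differently from vertex loading: its throughput depends on read latency as well as write speed.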

Performance Evaluation: Graph Traversal

Lessons Learned: Storage Backends
Cassandra
• Cluster bootstrapping takes more effort
• Smaller memory footprint
HBase
• Uneven CPU utilization caused by hot regions
• Read and write cache settings need careful configuration for better throughput
Scylla
• Easy clustering: multiple nodes can be added at once
• Self-tunes well, but lacks documentation
• Load is distributed evenly
• Fully utilizes system resources
• Reported CPU utilization misrepresents the real load
• Nice monitoring dashboard (Prometheus + Grafana)
• Works with existing Cassandra utility clients

Flight Search Use Case
Flight search
• All flights from airport A to airport B on a given date and time
• Number of stops: non-stop, one-stop, two-stop…
Data spec
• 600+ airports, 350K+ flight schedules
Graph model
• Vertex: Airport (airport code)
• Vertex: Country (country name)
• Edge: Flight Schedule (flight #, departure date, arrival date)
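The flight search above can be sketched as a bounded depth-first search over flight-schedule edges, where each connecting flight must depart after the previous one arrives. The sketch assumes a hypothetical 45-minute minimum connection time and a tiny hand-made schedule; it ignores time zones (all times share one clock) and the Country vertices, and it is not the query the authors ran.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, List


@dataclass(frozen=True)
class Flight:
    number: str
    origin: str
    dest: str
    depart: datetime
    arrive: datetime


MIN_CONNECTION = timedelta(minutes=45)  # assumption, not from the source


def search(flights: List[Flight], origin: str, dest: str,
           earliest: datetime, max_stops: int) -> List[List[str]]:
    """Return itineraries (lists of flight numbers) with at most max_stops stops."""
    by_origin: Dict[str, List[Flight]] = {}
    for f in flights:
        by_origin.setdefault(f.origin, []).append(f)

    results: List[List[str]] = []

    def extend(airport: str, ready: datetime, legs: List[Flight], visited: set) -> None:
        for f in by_origin.get(airport, []):
            if f.depart < ready or f.dest in visited:
                continue  # departs too early, or would revisit an airport
            itinerary = legs + [f]
            if f.dest == dest:
                results.append([leg.number for leg in itinerary])
            elif len(itinerary) <= max_stops:  # one more leg adds one more stop
                extend(f.dest, f.arrive + MIN_CONNECTION, itinerary, visited | {f.dest})

    extend(origin, earliest, [], {origin})
    return results


d = datetime(2018, 4, 17)
flights = [
    Flight("X1", "RDU", "DTW", d.replace(hour=7), d.replace(hour=9)),
    Flight("X2", "DTW", "AMS", d.replace(hour=9, minute=30), d.replace(hour=17)),
    Flight("X3", "DTW", "AMS", d.replace(hour=10), d.replace(hour=18)),
    Flight("X4", "AMS", "TXL", d.replace(hour=19), d.replace(hour=20, minute=30)),
    Flight("X5", "RDU", "AMS", d.replace(hour=8), d.replace(hour=18, minute=30)),
]
itineraries = search(flights, "RDU", "TXL", d, max_stops=2)
```

Here X2 is rejected because it leaves only 30 minutes after X1 lands, and X5 cannot connect to X4, so the only two-stop itinerary is X1, X3, X4. The date check on each leg is the kind of filter the next slide recommends applying as early as possible.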

Lessons Learned: Flight Search
Model your graph database for performance
• Design the data model for your use cases!
• Understand the workload's read/write ratio
• What kinds of queries do you want to support? How many levels deep will a traversal go?
• Consider denormalization…
• Design and use the various indexes supported in JanusGraph
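One common form of denormalization for this model is to copy the destination airport code onto each flight-schedule edge, so a query can filter edges by destination without loading the vertex at the other end; in JanusGraph such an edge property could also back a vertex-centric index. The sketch below uses hypothetical data structures and illustrative flight numbers, and counts vertex reads to show the difference.

```python
from typing import Dict, List

# Hypothetical storage: vertex id -> properties, and outgoing edges per vertex.
vertices: Dict[int, Dict[str, str]] = {1: {"code": "RDU"}, 2: {"code": "JFK"},
                                       3: {"code": "TXL"}, 4: {"code": "LAX"}}
# Each edge stores its target id plus a denormalized copy of the target's code.
out_edges: Dict[int, List[Dict[str, object]]] = {
    1: [{"in_v": 2, "flight": "A1", "dest_code": "JFK"},
        {"in_v": 3, "flight": "A2", "dest_code": "TXL"},
        {"in_v": 4, "flight": "A3", "dest_code": "LAX"}],
}
reads = 0


def read_vertex(vid: int) -> Dict[str, str]:
    """Load a vertex, counting each read."""
    global reads
    reads += 1
    return vertices[vid]


def flights_normalized(src: int, dest_code: str) -> List[object]:
    # Must load every neighbor vertex to learn its airport code.
    return [e["flight"] for e in out_edges[src]
            if read_vertex(e["in_v"])["code"] == dest_code]


def flights_denormalized(src: int, dest_code: str) -> List[object]:
    # The edge already carries the code, so no vertex reads are needed.
    return [e["flight"] for e in out_edges[src] if e["dest_code"] == dest_code]


normal = flights_normalized(1, "TXL")
reads_normalized = reads
denorm = flights_denormalized(1, "TXL")
reads_after = reads
```

The trade-off is on the write side: the copied code must be kept in sync if an airport's code ever changes.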
Try different approaches to get results back faster
• Use a preprocessor in a custom app
• Use Gremlin queries, applying filters as early as possible in a query to limit the number of traversals
• Use Groovy methods as programmable extensions
Fine-tune for your workloads and systems
• JanusGraph relies on pluggable storage and index backends, so tune your backends!
• JanusGraph server configurations, such as threadPoolBoss and threadPoolWorker
• JVM configurations, such as -Xms (initial and minimum Java heap size) and -Xmx (maximum Java heap size). You don't want to see annoying java.lang.OutOfMemoryError exceptions or long, slow GC pauses.
• Use multiple threads and/or instances up to your system's capacity
• Consider cloud and auto-scaling
• Be thorough and be patient, because it will take a few iterations!

Thank you
compose.com/databases/janusgraph
twitter.com/pluradj
twitter.com/chinhuang007
github.com/IBM/janusgraph-utils
developer.ibm.com/code/patterns
