Airline Reservations and Routing: A Graph Use Case
Jason Plurad
Chin Huang
DataWorks Summit Berlin / April 18, 2018 / © 2018 IBM Corporation
Pilots
Jason Plurad is a software developer in IBM Digital Business Group. He
develops open source software and builds open communities in the big data
and analytics space, with a current focus on graph databases and graph
analytics. He is a Technical Steering Committee member and committer on
JanusGraph and Apache TinkerPop.
Chin Huang is a software engineer at IBM Open Technologies and
Performance. He has worked on various enterprise and open source
projects. His current focus is JanusGraph and Node.js development and
performance characterization.
How Did We Get Here?
Jason
• Raleigh (RDU)
• Detroit (DTW)
• Amsterdam (AMS)
• Berlin (TXL)
Chin
• San Francisco (SFO)
• Copenhagen (CPH)
• Berlin (TXL)
Graphs are not new
Graph Data Use Cases
Social network analysis
Configuration management database
Master data management
Recommendation engines
Knowledge graphs
Internet of things
Cyber security attack analysis
Property Graph
Airports in the example graph: RDU, DTW, AMS, TXL, SFO, CPH
Type: vertex
Label: airport
Name: Berlin Tegel
Code: TXL
City: Berlin
Country: Germany
Type: edge
Label: route
Flight: 343
Distance: 501
Depart: 13:05
Arrive: 14:57
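As a rough illustration, the two property sets above could be held in plain Python objects like this. The Vertex and Edge classes are hypothetical names invented for this sketch, not TinkerPop or JanusGraph APIs, and the AMS origin of the route is an assumption, since the slide does not say which airports flight 343 connects.

```python
from dataclasses import dataclass, field


@dataclass
class Vertex:
    """A labeled vertex carrying key/value properties."""
    label: str
    properties: dict = field(default_factory=dict)


@dataclass
class Edge:
    """A labeled, directed edge carrying key/value properties."""
    label: str
    out_vertex: Vertex  # vertex the edge leaves
    in_vertex: Vertex   # vertex the edge arrives at
    properties: dict = field(default_factory=dict)


# Vertex properties as listed on the slide.
txl = Vertex("airport", {"name": "Berlin Tegel", "code": "TXL",
                         "city": "Berlin", "country": "Germany"})

# ASSUMPTION: the origin airport is not given on the slide; AMS is
# used here purely for illustration.
ams = Vertex("airport", {"code": "AMS"})

# Edge properties as listed on the slide.
route = Edge("route", ams, txl, {"flight": 343, "distance": 501,
                                 "depart": "13:05", "arrive": "14:57"})

print(route.out_vertex.properties["code"], "to",
      route.in_vertex.properties["code"],
      "flight", route.properties["flight"])
```

Both vertices and edges carry a label and arbitrary properties, which is the defining feature of the property graph model.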
Gremlin: Graph Traversal Language
What is the shortest path to Berlin?
Apache TinkerPop
https://tinkerpop.apache.org
> g.V(rdu).
repeat( out('route').simplePath() ).
until( has('code', 'TXL') ).
limit(5).
path().by('code').
toList()
==> [RDU, JFK, TXL]
==> [RDU, LAX, TXL]
==> [RDU, MIA, TXL]
==> [RDU, YYZ, TXL]
==> [RDU, SFO, TXL]
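The traversal above can be read as a breadth-first search that discards any path revisiting an airport. A minimal Python sketch of that idea follows. It is not how TinkerPop executes the query, the function name simple_paths is invented for this sketch, and the route table is hypothetical (the results on the slide come from a larger data set that is not shown).

```python
from collections import deque


def simple_paths(routes, start, goal, limit):
    """Breadth-first search returning up to `limit` cycle-free paths."""
    found = []
    queue = deque([[start]])
    while queue and len(found) < limit:
        path = queue.popleft()
        current = path[-1]
        if current == goal:
            # Roughly the role of until(has('code', 'TXL')).
            found.append(path)
            continue
        for nxt in routes.get(current, []):
            if nxt not in path:
                # Roughly the role of simplePath(): never revisit an airport.
                queue.append(path + [nxt])
    return found


# HYPOTHETICAL route table using airport codes that appear in the slides.
routes = {
    "RDU": ["DTW", "JFK"],
    "DTW": ["AMS"],
    "JFK": ["TXL"],
    "AMS": ["TXL"],
    "SFO": ["CPH"],
    "CPH": ["TXL"],
}

paths = simple_paths(routes, "RDU", "TXL", limit=5)
for p in paths:
    print(p)
```

Because the sketch explores breadth-first, paths with fewer hops come out before longer ones, so taking the first few results answers the shortest-path question.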
JanusGraph
Maintainer: The Linux Foundation
License: Apache
Releases: 0.3.0 planned 2Q 2018
https://janusgraph.org
• Established in January 2017
• Fork of TitanDB
• Scalable graph database distributed on multi-machine clusters with pluggable storage and indexing
• Vendor-neutral, open community with open governance
• Founders: Expero, Google, Grakn, Hortonworks, IBM
• Members: Amazon, Huawei, Netflix, Orchestral Developments, Seeq, Uber
• In Production: Celum, Finc, G-Data, IBM Cloud, Seeq
JanusGraph Architecture
http://docs.janusgraph.org/latest/arch-overview.html
Graph database storage backends: Performance evaluation
Graph use case: Air travel reservation
Performance Test Environment
Server spec
• Physical servers: x3650 M5, 2 sockets x 14 cores, 384 GB (12 x 32 GB) memory
• CPU: Intel Xeon Processor E5-2690 v4 14C 2.6GHz 35MB Cache 2400MHz
• Network interface: Emulex VFA5.2 ML2 Dual Port 10GbE SFP+ Adapter
• Disk: 720 GB SSD, RAID 5
• Operating system: Ubuntu 16.04.2 LTS
Public tools
• JMeter - load testing tool
• nmon, nmon analyser - system performance monitoring and analysis tools
• VisualVM - all-in-one Java troubleshooting/profiling tool
• GCeasy - garbage collection log analysis tool
• Prometheus and Grafana - monitoring dashboard
JanusGraph Utility Tools
How about graph data in volume?
• Existing data is often lacking or unavailable for performance evaluation
• What are the performance characteristics at various data volumes?
• Graph Data Generator generates graph data in different sizes and shapes, so you can easily simulate real data and its performance characteristics
How to manage graph schema?
• Lack of graph schema management tools
• Graph schemas may change for optimal performance
• Graph Schema Loader enables you to quickly load and update schema definitions in JanusGraph
How to massively load data into a graph database?
• Many RDBMSs support exporting data to CSV files
• I have millions/billions of records!
• Data Batch Importer allows you to fully utilize system resources to import data from CSV files into JanusGraph
Open source code: https://github.com/IBM/janusgraph-utils
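The general idea behind importing CSV data in bulk, splitting rows into batches and handing the batches to parallel workers, can be sketched in Python as follows. This is not the janusgraph-utils implementation: the batches and load_batch functions, the CSV columns, and the batch and pool sizes are all hypothetical, and load_batch only counts rows where a real worker would write them to the graph and commit.

```python
import csv
import io
from concurrent.futures import ThreadPoolExecutor
from itertools import islice


def batches(rows, size):
    """Yield lists of at most `size` rows from any iterable."""
    rows = iter(rows)
    while True:
        chunk = list(islice(rows, size))
        if not chunk:
            return
        yield chunk


def load_batch(chunk):
    """HYPOTHETICAL stand-in for a worker that writes one batch to the
    graph and commits it; here it only counts the rows it was given."""
    return len(chunk)


# HYPOTHETICAL CSV content; a real export would be read from a file.
csv_text = (
    "code,city\n"
    "RDU,Raleigh\n"
    "DTW,Detroit\n"
    "AMS,Amsterdam\n"
    "TXL,Berlin\n"
    "SFO,San Francisco\n"
)
reader = csv.DictReader(io.StringIO(csv_text))

# Each batch is processed by one worker thread; results are summed.
with ThreadPoolExecutor(max_workers=2) as pool:
    loaded = sum(pool.map(load_batch, batches(reader, size=2)))

print("rows loaded:", loaded)
```

Committing per batch rather than per row, and sizing the worker pool to the machine, are the two knobs such a loader would typically expose.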
Performance Test Topology
Diagram: load injector (query; insert, update), JanusGraph, and database cluster (Cassandra, HBase + HDFS + ZooKeeper, Scylla)
Performance Evaluation: Insert Vertices
• 40 million vertices in total
• 2 properties for each vertex
• Insert scenario
• Fully utilize the injectors to generate load against the databases
Performance Evaluation: Insert Edges
• 30 million edges in total
• 1 property for each edge
• Query and update scenario
Performance Evaluation: Graph Traversal
Lessons Learned: Storage Backends
Cassandra
• Cluster bootstrapping takes more effort
• Smaller memory footprint
HBase
• Uneven CPU utilization caused by hot regions
• Read and write cache settings need careful configuration for better throughput
Scylla
• Easy clustering: multiple nodes can be added at once
• Self-tunes well, but documentation is lacking
• Evenly distributed load
• Fully utilizes system resources
• CPU utilization misrepresents the real load
• Nice monitoring dashboard (Prometheus + Grafana)
• Works with existing Cassandra utility clients
Flight Search Use Case
Flight search
• All flights from airport A to airport B on a given date and time
• Number of stops: non-stop, one-stop, two-stop…
Data spec
• 600+ airports, 350K+ flight schedules
Graph Model
Vertex: Airport
Airport code
Vertex: Country
Country name
Edge: Flight Schedule
Flight #
Departure date
Arrival date
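To make the flight search concrete, here is a small Python sketch over this model: airports are nodes, flight schedules are edges, and a search returns itineraries with a bounded number of stops whose connections depart after the previous leg arrives. The Flight class, the search function, the flight numbers, and all times are hypothetical, and time zones and minimum connection times are ignored for simplicity.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Flight:
    """One flight schedule edge between two airport codes."""
    number: str
    origin: str
    destination: str
    departure: datetime
    arrival: datetime


def search(flights, origin, destination, earliest, max_stops):
    """Return itineraries (lists of flights) from origin to destination
    that depart no earlier than `earliest` and use at most `max_stops` stops."""
    by_origin = {}
    for f in flights:
        by_origin.setdefault(f.origin, []).append(f)

    results = []

    def extend(itinerary, airport, ready_at, visited):
        for f in by_origin.get(airport, []):
            # Skip flights that leave too early or revisit an airport.
            if f.departure < ready_at or f.destination in visited:
                continue
            legs = itinerary + [f]
            if f.destination == destination:
                results.append(legs)
            elif len(legs) <= max_stops:
                extend(legs, f.destination, f.arrival,
                       visited | {f.destination})

    extend([], origin, earliest, {origin})
    return results


# HYPOTHETICAL schedules on a single day.
flights = [
    Flight("H1", "RDU", "DTW", datetime(2018, 4, 17, 8, 0), datetime(2018, 4, 17, 10, 0)),
    Flight("H2", "DTW", "AMS", datetime(2018, 4, 17, 11, 0), datetime(2018, 4, 17, 13, 0)),
    Flight("H3", "DTW", "AMS", datetime(2018, 4, 17, 9, 0), datetime(2018, 4, 17, 11, 0)),
    Flight("H4", "AMS", "TXL", datetime(2018, 4, 17, 13, 5), datetime(2018, 4, 17, 14, 57)),
]

day_start = datetime(2018, 4, 17)
itineraries = search(flights, "RDU", "TXL", day_start, max_stops=2)
for legs in itineraries:
    print([f.number for f in legs])
```

Flight H3 is never used because it leaves DTW before H1 arrives, and lowering max_stops to 1 returns no itinerary for this data.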
Lessons Learned: Flight Search
Model your graph database for performance
• Design the data model for your use cases!
• Understand the workload's read/write ratio
• What kinds of queries do you want to support? How many levels deep does a traversal go?
• Consider denormalization…
• Design and use the various indexes supported in JanusGraph
Try different approaches to get results back faster
• Use a pre-processor in a custom app
• Use Gremlin queries, applying filters as early as possible in a query to limit the number of traversals
• Use Groovy methods as a programmable extension
Fine-tune for your workloads and systems
• JanusGraph supports pluggable storage and index backends, so tune your backends!
• JanusGraph Server configurations, such as threadPoolBoss and threadPoolWorker
• JVM configurations, such as Xms (initial and minimum Java heap size) and Xmx (maximum Java heap size). You don’t want to see annoying java.lang.OutOfMemoryError exceptions or long, slow GCs.
• Use multiple threads and/or instances, up to your system’s capacity
• Consider cloud and auto-scaling
• Be thorough and be patient, because it will take a few iterations!
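The advice to apply filters as early as possible can be illustrated with a toy two-hop search in Python. The two_hop function, the edge records, and the dates are hypothetical; the counter stands in for the number of traversers a graph engine would create, and the sketch makes no claim about actual JanusGraph numbers.

```python
def two_hop(edges, start, date, filter_early):
    """Return (paths, traversers) for two-leg paths leaving `start`
    whose first leg departs on `date`."""
    traversers = 0
    first = [e for e in edges if e["from"] == start]
    if filter_early:
        # Drop unwanted first legs before expanding the second hop.
        first = [e for e in first if e["date"] == date]
    paths = []
    for e1 in first:
        for e2 in edges:
            if e2["from"] == e1["to"]:
                traversers += 1
                paths.append((e1, e2))
    if not filter_early:
        # Same filter, applied only after every second hop was expanded.
        paths = [p for p in paths if p[0]["date"] == date]
    return paths, traversers


# HYPOTHETICAL flight edges.
edges = [
    {"from": "RDU", "to": "DTW", "date": "2018-04-17"},
    {"from": "RDU", "to": "JFK", "date": "2018-04-16"},
    {"from": "DTW", "to": "AMS", "date": "2018-04-17"},
    {"from": "JFK", "to": "TXL", "date": "2018-04-16"},
    {"from": "JFK", "to": "TXL", "date": "2018-04-17"},
]

early_paths, early_count = two_hop(edges, "RDU", "2018-04-17", filter_early=True)
late_paths, late_count = two_hop(edges, "RDU", "2018-04-17", filter_early=False)
print(early_count, late_count)
```

Both variants return the same paths, but the late filter expands second hops that are thrown away afterwards, and that gap grows with the fan-out of the graph.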
Thank you
compose.com/databases/janusgraph
twitter.com/pluradj
twitter.com/chinhuang007
github.com/IBM/janusgraph-utils
developer.ibm.com/code/patterns


Airline Reservations and Routing: A Graph Use Case

  • 1. Airline Reservations and Routing: A Graph Use Case. Jason Plurad, Chin Huang. DataWorks Summit Berlin / April 18, 2018 / © 2018 IBM Corporation
  • 2. Pilots
      • Jason Plurad is a software developer in IBM Digital Business Group. He develops open source software and builds open communities in the big data and analytics space, with a current focus on graph databases and graph analytics. He is a Technical Steering Committee member and committer on JanusGraph and Apache TinkerPop.
      • Chin Huang is a software engineer at IBM Open Technologies and Performance. He has worked on various enterprise and open source projects. His current focus is JanusGraph and Node.js development and performance characterization.
  • 3. How Did We Get Here?
      • Jason: Raleigh (RDU), Detroit (DTW), Amsterdam (AMS), Berlin (TXL)
      • Chin: San Francisco (SFO), Copenhagen (CPH), Berlin (TXL)
  • 4. Graphs are not new
  • 5. Graph Data Use Cases
      • Social network analysis
      • Configuration management database
      • Master data management
      • Recommendation engines
      • Knowledge graphs
      • Internet of things
      • Cyber security attack analysis
      • [Figure: sample graph with vertices A, B, C, D]
  • 6. Property Graph
      • [Figure: graph of airport vertices RDU, DTW, AMS, TXL, SFO, CPH]
      • Vertex example: Type: vertex; Label: airport; Name: Berlin Tegel; Code: TXL; City: Berlin; Country: Germany
      • Edge example: Type: edge; Label: route; Flight: 343; Distance: 501; Depart: 13:05; Arrive: 14:57
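The property graph on this slide can be sketched as plain data structures. This is a minimal illustration, not how JanusGraph stores data: the `Vertex` and `Edge` classes are hypothetical, and because the slide does not say which airports the route edge connects, the AMS endpoint is an assumption made only so the edge has two ends.

```python
from dataclasses import dataclass, field


@dataclass
class Vertex:
    label: str
    properties: dict = field(default_factory=dict)


@dataclass
class Edge:
    label: str
    out_vertex: Vertex  # vertex the edge leaves
    in_vertex: Vertex   # vertex the edge arrives at
    properties: dict = field(default_factory=dict)


# Vertex properties as shown on the slide.
txl = Vertex("airport", {"name": "Berlin Tegel", "code": "TXL",
                         "city": "Berlin", "country": "Germany"})

# Assumption: the origin airport is not given on the slide; AMS is illustrative.
ams = Vertex("airport", {"code": "AMS"})

# Edge properties as shown on the slide.
route = Edge("route", ams, txl, {"flight": 343, "distance": 501,
                                 "depart": "13:05", "arrive": "14:57"})

print(route.out_vertex.properties["code"], "->",
      route.in_vertex.properties["code"], route.properties)
```

Both vertices and edges carry a label plus arbitrary key/value properties, which is what distinguishes a property graph from a bare adjacency list.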
  • 7. Gremlin: Graph Traversal Language (Apache TinkerPop, https://tinkerpop.apache.org)
      What is the shortest path to Berlin?
          > g.V(rdu).
              repeat( out('route').simplePath() ).
              until( has('code', 'TXL') ).
              limit(5).
              path().by('code').
              toList()
          ==> [RDU, JFK, TXL]
          ==> [RDU, LAX, TXL]
          ==> [RDU, MIA, TXL]
          ==> [RDU, YYZ, TXL]
          ==> [RDU, SFO, TXL]
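For readers unfamiliar with Gremlin, the traversal above behaves roughly like a breadth-first search that never revisits a vertex on the same path and stops after a fixed number of results. The sketch below shows that idea in plain Python; it is not TinkerPop's implementation, and the `ROUTES` map and the `first_paths` helper are hypothetical.

```python
from collections import deque

# Hypothetical route map (adjacency list); not data from the talk.
ROUTES = {
    "RDU": ["JFK", "DTW"],
    "JFK": ["TXL", "RDU"],
    "DTW": ["AMS", "RDU"],
    "AMS": ["TXL"],
    "TXL": [],
}


def first_paths(routes, start, target, limit):
    """Return up to `limit` cycle-free paths from start to target, shortest first."""
    found = []
    queue = deque([[start]])
    while queue and len(found) < limit:
        path = queue.popleft()
        here = path[-1]
        if here == target:
            found.append(path)
            continue
        for nxt in routes.get(here, []):
            if nxt not in path:  # same role as simplePath(): no repeated vertex
                queue.append(path + [nxt])
    return found


print(first_paths(ROUTES, "RDU", "TXL", 5))
```

Because the queue is processed in breadth-first order, shorter paths are emitted before longer ones, which is why limiting the result count still yields the shortest routes first.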
  • 8. JanusGraph (https://janusgraph.org)
      • Maintainer: The Linux Foundation
      • License: Apache
      • Releases: 0.3.0 planned 2Q 2018
      • Established in January 2017
      • Fork of TitanDB
      • Scalable graph database distributed on multi-machine clusters with pluggable storage and indexing
      • Vendor-neutral, open community with open governance
      • Founders: Expero, Google, Grakn, Hortonworks, IBM
      • Members: Amazon, Huawei, Netflix, Orchestral Developments, Seeq, Uber
      • In Production: Celum, Finc, G-Data, IBM Cloud, Seeq
  • 9. JanusGraph Architecture: http://docs.janusgraph.org/latest/arch-overview.html
  • 10. Graph database storage backends: Performance evaluation. Graph use case: Air travel reservation
  • 11. Performance Test Environment
      Server spec
      • Physical servers: x3650 M5, 2 sockets x 14 cores, 384 GB (12 x 32G) memory
      • CPU: Intel Xeon Processor E5-2690 v4 14C 2.6GHz 35MB Cache 2400MHz
      • Network interface: Emulex VFA5.2 ML2 Dual Port 10GbE SFP+ Adapter
      • Disk: 720 GB SSD, RAID 5
      • Operating system: Ubuntu 16.04.2 LTS
      Public tools
      • JMeter: load testing tool
      • nmon, nmon analyser: system performance monitoring and analysis tools
      • VisualVM: all-in-one Java troubleshooting/profiling tool
      • GCeasy: garbage collection log analysis tool
      • Prometheus and Grafana: monitoring dashboard
  • 12. JanusGraph Utility Tools
      How about graph data in volume?
      • Existing data is often lacking or unavailable for performance evaluation
      • What are the performance characteristics at various volumes?
      • Graph Data Generator generates graph data in different sizes and shapes, so you can easily simulate real data and performance
      How to manage graph schema?
      • Graph schema management tools are lacking
      • Graph schemas may change for optimal performance
      • Graph Schema Loader enables you to quickly load and update schema definitions in JanusGraph
      How to massively load data into a graph database?
      • Many RDBMSs support exporting data to CSV files
      • I have millions/billions of records!
      • Data Batch Importer allows you to fully utilize system resources to import data in CSV files into JanusGraph
      Open source code: https://github.com/IBM/janusgraph-utils
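The batching idea behind a bulk CSV import can be sketched as follows. This is a generic illustration under stated assumptions, not the code or API of the Data Batch Importer: the `batches` helper, the CSV columns, and the batch size are all hypothetical.

```python
import csv
import io
from itertools import islice


def batches(rows, size):
    """Yield lists of at most `size` rows from any iterable."""
    it = iter(rows)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


# Hypothetical CSV content; a real import would read from a file.
CSV_TEXT = (
    "code,city\n"
    "RDU,Raleigh\n"
    "DTW,Detroit\n"
    "AMS,Amsterdam\n"
    "TXL,Berlin\n"
    "SFO,San Francisco\n"
)

reader = csv.DictReader(io.StringIO(CSV_TEXT))
chunks = list(batches(reader, 2))
for n, chunk in enumerate(chunks):
    # A real importer would write each chunk in one transaction,
    # possibly handing chunks to a pool of worker threads.
    print(n, [row["code"] for row in chunk])
```

Grouping rows into fixed-size chunks keeps each write bounded in size and makes it straightforward to spread the chunks across several workers.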
  • 13. Performance Test Topology
      • [Diagram: a load injector issues insert, update, and query requests to the JanusGraph database cluster; the storage backend nodes shown are Cassandra, HBase + HDFS + ZooKeeper, and Scylla]
  • 14. Performance Evaluation: Insert Vertices
      • 40 million vertices in total
      • 2 properties for each vertex
      • Insert scenario
      • Fully utilize the injectors to generate load against the databases
  • 15. Performance Evaluation: Insert Edges
      • 30 million edges in total
      • 1 property for each edge
      • Query and update scenario
  • 16. Performance Evaluation: Graph Traversal
  • 17. Lessons Learned: Storage Backends
      Cassandra
      • Cluster bootstrapping takes more effort
      • Smaller memory footprint
      HBase
      • Uneven CPU% caused by hot regions
      • Need to carefully configure read and write cache settings for better throughput
      Scylla
      • Easy clustering: multiple nodes can be added at once
      • Self-tunes well, but documentation is lacking
      • Evenly distributed load
      • Fully utilizes system resources
      • CPU utilization misrepresents real loads
      • Nice monitoring dashboard: Prometheus + Grafana
      • Works with existing Cassandra utility clients
  • 18. Flight Search Use Case
      Flight search
      • All flights from airport A to airport B on a given date and time
      • # of stops: non-stop, one-stop, two-stop…
      Data spec
      • 600+ airports, 350K+ flight schedules
      Graph model
      • Vertex: Airport (airport code)
      • Vertex: Country (country name)
      • Edge: Flight Schedule (flight #, departure date, arrival date)
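A flight search over this kind of model can be sketched as a depth-limited walk over schedule edges, where each connecting flight must depart after the previous one arrives. This is a simplified illustration, not the queries used in the talk: the `SCHEDULES` data and the `search` function are hypothetical, and time zones and minimum connection times are ignored.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M"

# Hypothetical schedules: (flight number, origin, destination, departure, arrival).
SCHEDULES = [
    ("F1", "RDU", "DTW", "2018-04-16 08:00", "2018-04-16 10:00"),
    ("F2", "DTW", "AMS", "2018-04-16 12:00", "2018-04-16 20:00"),
    ("F3", "AMS", "TXL", "2018-04-16 21:30", "2018-04-16 23:00"),
    ("F4", "DTW", "AMS", "2018-04-16 09:00", "2018-04-16 17:00"),  # departs before F1 lands
]


def parse(ts):
    return datetime.strptime(ts, FMT)


def search(schedules, origin, dest, earliest, max_stops):
    """Return itineraries (lists of flight numbers) with at most max_stops stops."""
    by_origin = {}
    for flight in schedules:
        by_origin.setdefault(flight[1], []).append(flight)

    results = []

    def extend(airport, ready_at, legs, visited):
        for number, _, nxt, dep, arr in by_origin.get(airport, []):
            if parse(dep) < ready_at or nxt in visited:
                continue  # missed connection, or airport already on the itinerary
            itinerary = legs + [number]
            if nxt == dest:
                results.append(itinerary)
            elif len(itinerary) <= max_stops:
                # An itinerary of k legs that continues will have at least k stops.
                extend(nxt, parse(arr), itinerary, visited | {nxt})

    extend(origin, parse(earliest), [], {origin})
    return results


print(search(SCHEDULES, "RDU", "TXL", "2018-04-16 06:00", max_stops=2))
```

Capping the number of stops bounds the traversal depth, which matters at the data volumes quoted on the slide because each extra hop multiplies the number of candidate paths.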
  • 19. Lessons Learned: Flight Search
      Model your graph database for performance
      • Design the data model for your use cases!
      • Understand the workload's read/write ratio
      • What kinds of queries do you want to support? How many levels deep does a traversal go?
      • Consider denormalization…
      • Design and use the various indexes supported in JanusGraph
      Try different approaches to get results back faster
      • Use a pre-processor in a custom app
      • Use Gremlin queries, applying filters as early as possible in a query to limit the number of traversals
      • Use Groovy methods as a programmable extension
      Fine-tune for your workloads and systems
      • JanusGraph relies on storage and index backends, so tune your backends!
      • JanusGraph server configurations, such as threadPoolBoss and threadPoolWorker
      • JVM configurations, such as Xms (initial and minimum Java heap size) and Xmx (maximum Java heap size). You don't want to see the annoying java.lang.OutOfMemoryError exceptions or long, slow GCs.
      • Use multiple threads and/or instances, up to your system's capacity
      • Consider cloud and auto-scaling
      • Be thorough and be patient, because it will take a few iterations!
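The advice to apply filters as early as possible can be illustrated by counting how many edges a two-hop search touches when the date filter is applied at the end versus at each hop. This is a toy sketch in plain Python, not a Gremlin query or a measurement from the talk; the `FLIGHTS` data and both functions are hypothetical.

```python
# Hypothetical flights: (origin, destination, date). Not data from the talk.
FLIGHTS = [
    ("RDU", "DTW", "2018-04-16"),
    ("RDU", "JFK", "2018-04-17"),
    ("DTW", "AMS", "2018-04-16"),
    ("JFK", "AMS", "2018-04-17"),
    ("JFK", "TXL", "2018-04-17"),
]


def out_edges(airport):
    return [f for f in FLIGHTS if f[0] == airport]


def two_hop_filter_late(origin, date):
    """Expand two hops first, then keep paths whose flights are all on `date`."""
    visited = 0
    paths = []
    for first in out_edges(origin):
        visited += 1
        for second in out_edges(first[1]):
            visited += 1
            paths.append((first, second))
    kept = [p for p in paths if p[0][2] == date and p[1][2] == date]
    return kept, visited


def two_hop_filter_early(origin, date):
    """Drop non-matching flights at each hop before expanding further."""
    visited = 0
    kept = []
    for first in out_edges(origin):
        visited += 1
        if first[2] != date:
            continue  # prune here so the second hop is never explored
        for second in out_edges(first[1]):
            visited += 1
            if second[2] == date:
                kept.append((first, second))
    return kept, visited


late_paths, late_visited = two_hop_filter_late("RDU", "2018-04-16")
early_paths, early_visited = two_hop_filter_early("RDU", "2018-04-16")
print(late_visited, early_visited, late_paths == early_paths)
```

Both variants return the same itineraries, but the early filter skips every branch that starts with a non-matching flight, and the saving grows with each additional hop.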