On-boarding with JanusGraph Performance
June 17, 2017
Chin Huang, chhuang@us.ibm.com; github: chinhuang007
Yi-Hong Wang, yh.wang@us.ibm.com; github: yhwang
Ted Chang, htchang@ibm.com; github: tedhtchang
JanusGraph: @JanusGraph
Agenda
Overview – Onboarding with graph performance
JanusGraph performance evaluation scenarios
• Bulk loader performance
• Data import performance
• Query performance
Lessons learned
Q&A
Onboarding with graph performance
Exciting era with many new technologies!!
Onboarding users/developers to graph databases
• Typical focus areas: features and benefits, ease of use, suitability,
extensibility, APIs…
• Performance is one of the most important differentiators for any
application
• Is performance just for system testing?!
• Performance and scalability are key considerations for design,
development, and operations
Journey to JanusGraph with a performance mindset!
• Check out graph structures and traversals
• Evaluate reads and writes in high volume
• Can JanusGraph scale out for future data/user growth?
• Look for bottlenecks and provide improvements
JanusGraph performance test environment
Server spec
• Physical servers: x3650 M5, 2 sockets x 14 cores, 384 GB (12 x 32G) memory
• CPU: Intel Xeon Processor E5-2690 v4 14C 2.6GHz 35MB Cache 2400MHz
• Network interface: Emulex VFA5.2 ML2 Dual Port 10GbE SFP+ Adapter
• Disk: 720 GB SSD, RAID 5
• Operating system: Ubuntu 16.04.2 LTS
Existing tools
• JMeter - load testing tool
• nmon, nmon analyser - system performance monitoring and analysis tools
• VisualVM - all-in-one Java troubleshooting/profiling tool
• GCeasy - garbage collection log analysis tool
Home-grown tools
• Graph schema loader, data generator, batch importer, batch requester
JanusGraph performance tool - Graph schema loader
Enables graph model creation via the Gremlin console or embedded in Java
• Use JSON to describe your graph model
• Supports:
• Property
• Vertex
• Edge
• Index
Benefit: Create a schema on the fly without a single line of code!
https://github.com/yhwang/janusgraph-utils
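As a rough illustration of describing a graph model in JSON, the sketch below builds a small model with properties, vertex labels, edge labels and an index, then checks that every index refers to a declared property. The field names (`propertyKeys`, `vertexLabels`, `edgeLabels`, `vertexIndexes`) and the airport/flight example are assumptions chosen for illustration; the janusgraph-utils repository defines the format the loader actually expects.

```python
import json

# Hypothetical graph model; the field names are illustrative assumptions,
# not the documented janusgraph-utils format.
schema = {
    "propertyKeys": [
        {"name": "code", "dataType": "String"},
        {"name": "departure", "dataType": "Long"},
    ],
    "vertexLabels": [{"name": "airport"}],
    "edgeLabels": [{"name": "flight"}],
    "vertexIndexes": [{"name": "byCode", "propertyKeys": ["code"]}],
}


def undeclared_index_keys(model):
    """Return property names used by an index but never declared."""
    declared = {p["name"] for p in model["propertyKeys"]}
    used = {k for idx in model["vertexIndexes"] for k in idx["propertyKeys"]}
    return sorted(used - declared)


# Text that would be saved as a .json file and handed to a schema loader.
schema_json = json.dumps(schema, indent=2)
missing = undeclared_index_keys(schema)
```

Checking the model before loading it is a cheap way to catch an index that points at a property nobody declared.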
Bulk load performance – Use case and data
Use cases
• Data migration: OneTimeBulkLoader
• Batch update: IncrementalBulkLoader
Supported formats
• Gryo: 011110100101100101
• GraphSON: {“id”: 1, “label”:…}
• Script: 1:person:marko:29
• OneTimeBulkLoader
• 128GB GraphSON file
• 31 million vertices
• 38 million edges
• 3277 propertyKeys
• 5 vertex labels
• 3 edge labels
• 78.9 properties per edge
• 18.7 properties per vertex
Bulk load performance – Topology
• Spark - 1.6.1
• Standalone Cluster
• 2 worker nodes
• 8 executors per node
• 8 cores per executor
• 2GB per executor
• Hadoop - 2.7.2
• Use HDFS to store the GraphSON file
• Cassandra - 2.1.17
• 2-node cluster
• Tinkerpop3 – 3.2.3
• GraphComputer
• JanusGraph – 0.1.1
• JanusGraphBulkLoaderVertexProgram
• Astyanax persistence provider
[Diagram: a Cluster Master and two Worker Nodes, each running Cassandra and 8 executors; the diagram also labels the BulkLoader and an HDFS client]
Bulk load performance – results
• Vertex:
• 31,594,277
• 19 mins
• 495 records/sec per core
• Edge:
• 38,322,731
• 24.8 mins
• 461.8 records/sec per core
[Charts: two CPU (%) plots, each showing Node1 and Node2, on a 0 to 100 scale]
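The per-core rates above can be approximately reproduced from the record counts and durations. The sketch below assumes a divisor of 56 cores (2 worker nodes x 2 sockets x 14 cores, from the server spec); the slides do not state which core count was used, so treat that as an inference. Under that assumption the vertex rate comes out at about 495, and the edge rate at about 460, within half a percent of the reported 461.8 (the gap is consistent with the duration being rounded).

```python
def per_core_rate(records, minutes, cores):
    """Records loaded per second per core for one load stage."""
    return records / (minutes * 60) / cores


# Assumption: 56 cores = 2 worker nodes x 2 sockets x 14 cores.
CORES = 56

vertex_rate = per_core_rate(31_594_277, 19, CORES)
edge_rate = per_core_rate(38_322_731, 24.8, CORES)
```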
Data import performance – use case and data
Synthetic data
                    Small  Medium  Large  XLarge  10x Properties  50x Properties  100x Properties
Vertices (million)  0.3    3       30     30      3               3               3
Edges (million)     0.3    3       30     300     3               3               3
PropertyKeys        7      7       7      7       70              350             700
Vertex labels       3      3       3      3       3               3               3
Edge labels         2      2       2      2       2               2               2
Public data
                    Wikimedia votes  Higgs Twitter  Panama Papers
Vertices (million)  0.007            0.456          1.04
Edges (million)     0.1              16             1.53
PropertyKeys        0                2              22
Vertex labels       1                1              5
Edge labels         1                4              261
Data import performance – topology and configuration
All-in-one node: Schema Loader + CSV Data Generator + Batch Importer + Cassandra
JanusGraph configuration:
storage.backend = astyanax
ids.block-size = 500000
storage.buffer-size = 2560
storage.batch-loading = true
schema.default = none
BatchImporter configuration:
commit size = 100
worker target size = 10000
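To make the two BatchImporter settings concrete, here is a minimal sketch of how a worker target size and a commit size could partition an import. It assumes that "worker target size" is the number of records handed to one worker and "commit size" is the number of records per commit; the function names are hypothetical and this is not the BatchImporter's own code.

```python
def chunks(items, size):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def plan_import(records, worker_target_size=10_000, commit_size=100):
    """Return, per worker, the sizes of the commit batches it would issue."""
    plan = []
    for worker_records in chunks(records, worker_target_size):
        plan.append([len(batch) for batch in chunks(worker_records, commit_size)])
    return plan


# 25,050 records split into worker chunks, then into commit batches.
plan = plan_import(list(range(25_050)))
workers = len(plan)
commits = sum(len(worker_plan) for worker_plan in plan)
```

Smaller commits mean more round trips to the backend, while larger ones hold more uncommitted state in memory, which is one reason the best value depends on the data.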
Data import performance tooling - Graph data generator
A Java application
• Vertex and edge labels
• Number of vertices and edges
• Number of properties and data types
• Native and mixed index
• Relation patterns
• Super-nodes
• Generate graph-db schema in JSON
• Generate datamap JSON for BatchImporter
https://github.ibm.com/htchang/JanusGraphBench
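The idea of a synthetic generator with a super-node can be sketched in a few lines. The example below is not the authors' Java tool; the column names, the `supernode_share` parameter and the choice of vertex 0 as the super-node are all assumptions made for illustration.

```python
import csv
import io
import random


def generate_graph(num_vertices, num_edges, supernode_share=0.2, seed=42):
    """Return (vertex_rows, edge_rows) for a synthetic graph.

    Vertex 0 acts as a super-node: roughly `supernode_share` of all edges
    point at it. Names and defaults are illustrative only.
    """
    rng = random.Random(seed)  # seeded so the data set is reproducible
    vertices = [(vid, f"name-{vid}") for vid in range(num_vertices)]
    edges = []
    for _ in range(num_edges):
        src = rng.randrange(1, num_vertices)
        if rng.random() < supernode_share:
            dst = 0
        else:
            dst = rng.randrange(1, num_vertices)
        edges.append((src, dst))
    return vertices, edges


def to_csv(header, rows):
    """Serialize rows to CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue()


vertices, edges = generate_graph(num_vertices=1_000, num_edges=5_000)
vertex_csv = to_csv(["node_id", "name"], vertices)
edge_csv = to_csv(["out_id", "in_id"], edges)
```

A fixed seed keeps repeated benchmark runs comparable, since every run imports the same graph.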
Data import performance tooling - Graph data batch importer
Java application to import CSV data into JanusGraph
Features:
• Multiple threads
• Worker record size
• Commit size
• Import schema
• Import CSV to JanusGraph with configurable data mapping
https://github.com/sdmonov/JanusGraphBatchImporter
Data import performance – results
[Chart: Batch import time vs. number of records. X axis: size of DB; Y axes: import time (min) and CPU %. Series: vertex import (min), edge import (min), CPU %. Data labels as extracted: 0.2, 1, 10, 10; 0.2, 2, 25, 241; 55, 70, 73, 73]
[Chart: Insert rate vs. number of properties per record. X axis: size of DB (medium (8 mil), 10x Properties (80 mil), 50x Properties (400 mil), 100x Properties (800 mil)); Y axes: records/sec and CPU %. Series: vertices/sec/core, edges/sec/core, CPU %. Data labels as extracted: 1,648, 319, 35, 10; 788, 450, 133, 52; 70, 80, 90, 90]
Data query performance – use case and data
Flight search
• All flights from airport A to airport B on a given date and time
• # of stops: non-stop, one-stop, two-stop…
Data spec
• 600+ airports, 350K+ flight schedules
Performance analysis
• How many requests per second can JanusGraph handle?
• Can JanusGraph scale with future volume growth?
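The flight search above is a depth-limited traversal: non-stop is one hop, one-stop is two, and so on. The sketch below shows that shape in plain Python over an in-memory list rather than in Gremlin against JanusGraph; the schedule, the airport codes and the function name are hypothetical.

```python
from collections import defaultdict

# Hypothetical schedule: (origin, destination, departure, arrival),
# with times as minutes after midnight on the travel date.
FLIGHTS = [
    ("SFO", "JFK", 480, 810),
    ("SFO", "ORD", 420, 660),
    ("ORD", "JFK", 720, 850),
    ("ORD", "JFK", 640, 770),  # departs before either inbound flight lands
    ("SFO", "DEN", 400, 550),
    ("DEN", "ORD", 600, 700),
]


def find_itineraries(flights, origin, destination, max_stops):
    """Return all itineraries with at most `max_stops` connections."""
    departures = defaultdict(list)
    for flight in flights:
        departures[flight[0]].append(flight)

    results = []

    def extend(path, airport, ready_at, visited):
        for flight in departures[airport]:
            _, dest, dep, arr = flight
            if dep < ready_at or dest in visited:
                continue  # missed connection or revisiting an airport
            if dest == destination:
                results.append(path + [flight])
            elif len(path) < max_stops:
                extend(path + [flight], dest, arr, visited | {dest})

    extend([], origin, 0, {origin})
    return results


non_stop = find_itineraries(FLIGHTS, "SFO", "JFK", max_stops=0)
one_stop = find_itineraries(FLIGHTS, "SFO", "JFK", max_stops=1)
two_stop = find_itineraries(FLIGHTS, "SFO", "JFK", max_stops=2)
```

Each extra stop adds another level of edges to expand and filter, which is why deeper traversals are the more demanding test cases.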
Data query performance - Topology and configuration
[Diagram: a Load Driver Node running JMeter (thread groups) sends REST calls (HTTP POST) to JanusGraph Server instances on JanusGraph Node 1 and JanusGraph Node 2, backed by a Storage Backend Node (Cassandra) and an Index Backend Node (ElasticSearch)]
• JanusGraph server with REST
• 1 or 10 instances per server
• Astyanax persistence provider
• threadPoolBoss: 2
• threadPoolWorker: 20
• Java heap: -Xms512m -Xmx8G
• Concurrent threads (users): 1, 5, 10, 20,
40, 100, 200
• Think time: 0 ms
• Run duration: 5 minutes
• Multiple test configurations
• 10 instances on 1 node
• 20 instances on 2 nodes
• 30 instances on 3 nodes
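With closed-loop load (each thread waits for its response before sending the next request), throughput, concurrency and response time are tied together by Little's law: TPS = threads / (response time + think time). The sketch below applies it to the thread counts used here; the 50 ms response time is a made-up input, not a measured result, and in practice response time rises as the server saturates.

```python
def expected_tps(concurrent_threads, avg_response_ms, think_time_ms=0.0):
    """Throughput of a closed-loop load test, from Little's law."""
    cycle_seconds = (avg_response_ms + think_time_ms) / 1000.0
    return concurrent_threads / cycle_seconds


# Hypothetical constant response time of 50 ms with 0 ms think time.
THREAD_COUNTS = (1, 5, 10, 20, 40, 100, 200)
tps_by_threads = {n: expected_tps(n, avg_response_ms=50.0) for n in THREAD_COUNTS}
```

When measured TPS stops growing as threads are added, the same relation says response time must be growing instead, which is a simple way to read the two chart types together.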
Data query performance – Non-stop flights (one level deep traversals)
[Charts: response time (milliseconds) and TPS (transactions) vs. concurrent threads]
Performs well regardless of the number of instances and nodes
Data query performance – One-stop flights (two levels deep traversals)
People would like to see more than just non-stop flights…
[Charts: response time (milliseconds) and TPS (transactions) vs. concurrent threads]
Data query performance – Two-stop flights (three levels deep traversals)
The query gets complicated because we need to operate and filter on
multiple vertices and edges.
[Charts: response time (milliseconds) and TPS (transactions) vs. concurrent threads]
Lessons Learned
Model your graph database for performance
• Data is yours. Design the data model for your use cases!
• What kinds of queries do you want to support? How many levels deep will traversals go?
• Consider denormalization…
• Design and use indexes (graph indexes and vertex-centric indexes in JanusGraph) for better performance, but do not overuse them
• It is recommended to create the complete data model before inserting content
Use batch commits with caution
• Batch commits allow multiple transactions to be committed together. The batch size affects performance and the
optimal size depends on the characteristics of data.
• Handle conflicts for inserts and updates in multi-threaded or multi-client implementations
• Make sure the commit is completed and closed
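The cautions above (retry on conflict, confirm the commit, always close) can be sketched as a small loop. The example uses an in-memory stand-in for a transaction so it runs on its own; `FakeTransaction`, `ConflictError` and `import_in_batches` are hypothetical names, not JanusGraph APIs.

```python
class ConflictError(Exception):
    """Raised by the stand-in transaction when a commit conflicts."""


class FakeTransaction:
    """In-memory stand-in for a graph transaction (hypothetical API)."""

    def __init__(self, store, fail_first_commit=False):
        self.store = store
        self.pending = []
        self.closed = False
        self._fail = fail_first_commit

    def add(self, record):
        self.pending.append(record)

    def commit(self):
        if self._fail:
            self._fail = False
            raise ConflictError("simulated write conflict")
        self.store.extend(self.pending)
        self.pending = []

    def rollback(self):
        self.pending = []

    def close(self):
        self.closed = True


def import_in_batches(records, open_tx, commit_size=100, max_retries=3):
    """Commit `records` in batches, retrying a batch when it conflicts."""
    for start in range(0, len(records), commit_size):
        batch = records[start:start + commit_size]
        for attempt in range(max_retries):
            tx = open_tx()
            try:
                for record in batch:
                    tx.add(record)
                tx.commit()
                break  # batch committed, move on to the next one
            except ConflictError:
                tx.rollback()
                if attempt == max_retries - 1:
                    raise  # give up after the last retry
            finally:
                tx.close()  # always release the transaction


store = []
transactions = []


def open_tx():
    # Only the very first transaction simulates a conflict.
    tx = FakeTransaction(store, fail_first_commit=(len(transactions) == 0))
    transactions.append(tx)
    return tx


import_in_batches(list(range(250)), open_tx)
```

Retrying a whole batch keeps the logic simple, at the cost of redoing up to `commit_size` records after a single conflict.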
Lessons Learned
Fine-tune for your workloads and systems
• JanusGraph relies on pluggable storage and index backends, so tune your backends!
• JanusGraph server configurations, such as threadPoolBoss and threadPoolWorker
• JVM configurations, such as Xms (initial and minimum Java heap size) and Xmx (maximum Java heap size)
You don’t want to see the annoying java.lang.OutOfMemoryError exceptions. At the same time, an oversized Xmx has a negative impact on performance due to longer, slower GCs.
• Use multiple threads and/or instances, up to your system’s capacity
• Next step… consider cloud and auto-scaling
• Be thorough and be patient because it will take a few iterations
• Just like a fine-tuned instrument, you will enjoy the beautiful music for a long time!
Compose for JanusGraph
What is it?
• Compose is a hosting provider for open-source databases
• Supports backups, monitoring, performance tuning, and a full suite of deployment management tools, backed by a 24x7 support and operations team
• Offers JanusGraph technology with Scylla database
• https://www.compose.com/janusgraph
Thank you for keeping performance in mind!!
Chin Huang, chhuang@us.ibm.com; github: chinhuang007
Yi-Hong Wang, yh.wang@ibm.com; github: yhwang
Ted Chang, htchang@ibm.com; github: tedhtchang
What’s next?
The journey continues…
• Find ways to improve JanusGraph performance
• Join us if you are interested in graph performance
• Work with us if you have graph datasets
• Talk to us if you have any comments or suggestions

On-boarding with JanusGraph Performance

  • 1. On-boarding with JanusGraph Performance June 17, 2017 Chin Huang, chhuang@us.ibm.com;github:chinhuang007 Yi-Hong Wang, yh.wang@us.ibm.com;github:yhwang Ted Chang, htchang@ibm.com;github:tedhtchang JanusGraph:@JanusGraph
  • 2. Agenda Overview – Onboarding with graph performance JanusGraph performance evaluation scenarios • Bulk loader performance • Data import performance • Query performance Lessons learned Q&A
  • 3. Onboarding with graph performance Exciting era with many new technologies!! Onboarding users/developers to graph databases • Typical focus areas: features and benefits, ease of use, suitability, extensibility, APIs… • Performance is one of the most important differentiators for any application • Is performance just for system testing?! • Performance and scalability are key considerations for design, development, and operations Journey to JanusGraph with a performance mind! • Check out graph structures and traversals • Evaluate reads and writes in high volume • Can JanusGraph scale out for future data/user growth? • Look for bottlenecks and provide improvements
  • 4. JanusGraph performance test environment
      Server spec
      • Physical servers: x3650 M5, 2 sockets x 14 cores, 384 GB (12 x 32G) memory
      • CPU: Intel Xeon Processor E5-2690 v4 14C 2.6GHz 35MB Cache 2400MHz
      • Network interface: Emulex VFA5.2 ML2 Dual Port 10GbE SFP+ Adapter
      • Disk: 720 GB SSD, RAID 5
      • Operating system: Ubuntu 16.04.2 LTS
      Existing tools
      • jMeter - load testing tool
      • nmon, nmon analyser - system performance monitoring and analysis tools
      • VisualVM - all-in-one Java troubleshooting/profiling tool
      • GCeasy - garbage collection log analysis tool
      Home grown tools
      • Graph schema loader, data generator, batch importer, batch requester
  • 5. JanusGraph performance tool - Graph schema loader
      Enables graph model creation via the Gremlin console or embedded in Java
      • Use JSON to describe your graph model
      • Support:
        • Property
        • Vertex
        • Edge
        • Index
      Benefit: Create schema on-the-fly without a single line of code!
      https://github.com/yhwang/janusgraph-utils
  • 6. Bulk load performance – Use case and data
      • Data migration: OneTimeBulkLoader
      • Batch update: IncrementalBulkLoader
      Supported formats
      • Gryo: 011110100101100101
      • GraphSON: {“id”: 1, “label”:…}
      • Script: 1:person:marko:29
      OneTimeBulkLoader
      • 128GB GraphSON file
      • 31 million vertices
      • 38 million edges
      • 3277 propertyKeys
      • 5 vertex labels
      • 3 edge labels
      • 78.9 properties per edge
      • 18.7 properties per vertex
  • 7. Bulk load performance – Topology
      • Spark - 1.6.1
        • Standalone Cluster
        • 2 worker nodes
        • 8 executors per node
        • 8 cores per executor
        • 2GB per executor
      • Hadoop - 2.7.2
        • Use HDFS to store the GraphSON file
      • Cassandra - 2.1.17
        • 2-node cluster
      • Tinkerpop3 – 3.2.3
        • GraphComputer
      • JanusGraph – 0.1.1
        • JanusGraphBulkLoaderVertexProgram
        • Astyanax persistence provider
      [Diagram: Master; two Worker Nodes with 8 executors each; BulkLoader with HDFS client; HDFS; Cassandra Cluster]
  • 8. Bulk load performance – results
      • Vertex:
        • 31,594,277
        • 19 mins
        • 495 records/sec per core
      • Edge:
        • 38,322,731
        • 24.8 mins
        • 461.8 records/sec per core
      [Two charts: CPU (%) for Node1 and Node2, y-axis 0 to 100]
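The per-core rates on this slide can be roughly reproduced from the record counts and durations. The following is a minimal sketch, assuming that "per core" refers to the 56 physical cores of the two worker nodes (2 sockets x 14 cores each, per the server spec). The slides do not state the divisor, so `ASSUMED_CORES` is an assumption rather than a reported figure.

```python
# Record counts and durations come from the results slide.
# ASSUMED_CORES is an assumption: 2 worker nodes x 2 sockets x 14 cores.
ASSUMED_CORES = 2 * 2 * 14


def records_per_sec_per_core(records: int, minutes: float,
                             cores: int = ASSUMED_CORES) -> float:
    """Average load rate per core over the whole run."""
    return records / (minutes * 60) / cores


vertex_rate = records_per_sec_per_core(31_594_277, 19)
edge_rate = records_per_sec_per_core(38_322_731, 24.8)

print(f"vertices: {vertex_rate:.1f} records/sec per core")
print(f"edges:    {edge_rate:.1f} records/sec per core")
```

Under that assumption the vertex rate comes out at about 494.9, in line with the reported 495. The edge rate comes out at about 459.9, slightly below the reported 461.8, which may simply reflect rounding of the 24.8-minute duration.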
  • 9. Data import performance – use case and data
      Synthetic data
                            Small   Medium   Large   XLarge   10x Properties   50x Properties   100x Properties
      Vertices (million)    0.3     3        30      30       3                3                3
      Edges (million)       0.3     3        30      300      3                3                3
      PropertyKeys          7       7        7       7        70               350              700
      Vertex labels         3       3        3       3        3                3                3
      Edge labels           2       2        2       2        2                2                2
      Public data
                            Wikimedia votes   Higgs Twitter   Panama Papers
      Vertices (million)    0.007             0.456           1.04
      Edges (million)       0.1               16              1.53
      PropertyKeys          0                 2               22
      Vertex labels         1                 1               5
      Edge labels           1                 4               261
  • 10. Data import performance – topology and configuration
      [Diagram: All-In-One-Node running Cassandra, Batch Importer, CSV Data Generator, and Schema Loader]
      JanusGraph configuration:
        storage.backend = astyanax
        ids.block-size = 500000
        storage.buffer-size = 2560
        storage.batch-loading = true
        schema.default = none
      BatchImporter configuration:
        commit size = 100
        worker target size = 10000
  • 11. Data import performance tooling - Graph data generator
      A Java application
      • Vertex and edge labels
      • Number of vertices and edges
      • Number of properties and data types
      • Native and mixed index
      • Relation patterns
      • Super-nodes
      • Generate graph-db schema in JSON
      • Generate datamap JSON for BatchImporter
      https://github.ibm.com/htchang/JanusGraphBench
  • 12. Data import performance tooling - Graph data batch importer
      Java application to import CSV data into JanusGraph
      Features:
      • Multiple threads
      • Worker record size
      • Commit size
      • Import schema
      • Import CSV to JanusGraph with configurable data mapping
      https://github.com/sdmonov/JanusGraphBatchImporter
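To make the "worker record size" and "commit size" settings concrete, here is a minimal sketch of how records might be split across worker threads and committed in small batches. It is not the BatchImporter's actual implementation: `FakeTransaction`, `import_worker`, and `run_import` are hypothetical names, and the fake transaction only counts records instead of talking to JanusGraph. The two constants reuse the values from the configuration slide.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import islice
from typing import Iterable, Iterator, List

COMMIT_SIZE = 100            # "commit size" from the configuration slide
WORKER_TARGET_SIZE = 10_000  # "worker target size" from the configuration slide


def chunked(items: Iterable, size: int) -> Iterator[List]:
    """Yield consecutive lists of at most `size` items."""
    it = iter(items)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


class FakeTransaction:
    """Hypothetical stand-in for a graph transaction; it only counts."""

    def __init__(self) -> None:
        self.pending = 0
        self.written = 0
        self.commits = 0

    def add(self, record) -> None:
        self.pending += 1

    def commit(self) -> None:
        self.written += self.pending
        self.pending = 0
        self.commits += 1


def import_worker(records: List) -> FakeTransaction:
    """Import one worker's share, committing every COMMIT_SIZE records."""
    tx = FakeTransaction()
    for batch in chunked(records, COMMIT_SIZE):
        for record in batch:
            tx.add(record)
        tx.commit()  # the last batch may be short but is still committed
    return tx


def run_import(records: Iterable, threads: int = 4) -> List[FakeTransaction]:
    """Hand each WORKER_TARGET_SIZE chunk to a thread in the pool."""
    with ThreadPoolExecutor(max_workers=threads) as pool:
        return list(pool.map(import_worker, chunked(records, WORKER_TARGET_SIZE)))


results = run_import(range(25_050))
print(sum(tx.written for tx in results), "records,",
      sum(tx.commits for tx in results), "commits")
```

Each worker owns its own transaction object, so no state is shared between threads. A real importer would additionally need to handle write conflicts and failed commits.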
  • 13. Data import performance – results
      [Chart: Batch import time vs. number of records. X-axis: size of DB; axes: import time (min) and CPU%]
      Data labels in x-axis order:
      • Vertex import (min): 0.2, 1, 10, 10
      • Edges import (min): 0.2, 2, 25, 241
      • CPU%: 55, 70, 73, 73
      [Chart: Insert rate vs. number of properties per record. X-axis: size of DB; axes: records/sec and CPU%]
      Size of DB                   Vertex/sec/core   Edges/sec/core   CPU%
      medium (8mil)                1,648             788              70
      10x Properties (80mil)       319               450              80
      50x Properties (400mil)      35                133              90
      100x Properties (800mil)     10                52               90
  • 14. Data query performance – use case and data
      Flight search
      • All flights from airport A to airport B on a given date and time
      • # of stops: non-stop, one-stop, two-stop…
      Data spec
      • 600+ airports, 350K+ flight schedules
      Performance analysis
      • How many requests per second can JanusGraph handle?
      • Can JanusGraph scale with future volume growth?
  • 15. Data query performance - Topology and configuration
      [Diagram: Load Driver Node running jMeter (thread groups) sends REST calls (HTTP POST) to JanusGraph Servers on JanusGraph Node 1 and JanusGraph Node 2; Storage Backend Node runs Cassandra; Index Backend Node runs ElasticSearch]
      • JanusGraph server with REST
        • 1 or 10 instances per server
        • Astyanax persistence provider
        • threadPoolBoss: 2
        • threadPoolWorker: 20
        • Java heap: -Xms512m -Xmx8G
      • Concurrent threads (users): 1, 5, 10, 20, 40, 100, 200
      • Think time: 0 ms
      • Run duration: 5 minutes
      • Multiple test configurations
        • 10 instances on 1 node
        • 20 instances on 2 nodes
        • 30 instances on 3 nodes
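For readers unfamiliar with the REST calls mentioned here, the sketch below builds the kind of HTTP POST that a Gremlin Server HTTP endpoint accepts: a JSON body with a `gremlin` script and optional `bindings`. The host, port, property key `code`, edge label `flight`, and airport codes are all assumptions for illustration; the slides do not show the actual schema or queries used in the tests.

```python
import json
import urllib.request


def build_gremlin_request(url: str, script: str,
                          bindings: dict) -> urllib.request.Request:
    """Build (but do not send) an HTTP POST carrying a Gremlin script."""
    body = json.dumps({"gremlin": script, "bindings": bindings}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical non-stop query: schema names are assumptions, not from the slides.
request = build_gremlin_request(
    "http://localhost:8182",  # assumed host and port
    "g.V().has('code', origin).out('flight').has('code', destination).count()",
    {"origin": "AAA", "destination": "BBB"},
)

print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
# Against a running server one would send it with urllib.request.urlopen(request).
```

Passing the airports as bindings rather than concatenating them into the script keeps the script text identical across requests, which is convenient when a load driver replays the same query with different parameters.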
  • 16. Data query performance – Non-stop flights (one level deep traversals)
      [Charts: response time (milliseconds) vs. concurrent threads; TPS (transactions) vs. concurrent threads]
      Performs well regardless of the number of instances and nodes
  • 17. Data query performance – One-stop flights (two levels deep traversals)
      People would like to see more than just non-stop flights…
      [Charts: response time (milliseconds) vs. concurrent threads; TPS (transactions) vs. concurrent threads]
  • 18. Data query performance – Two-stop flights (three levels deep traversals)
      The query gets complicated because we need to operate and filter on multiple vertices and edges.
      [Charts: response time (milliseconds) vs. concurrent threads; TPS (transactions) vs. concurrent threads]
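The three query slides differ only in traversal depth: each extra stop adds one more hop through airports and flights. The sketch below illustrates that idea on a toy in-memory schedule in plain Python. It is not a JanusGraph query: the `FLIGHTS` data and function names are hypothetical, and the date and time filtering from the use case is left out.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical toy schedule: (origin, destination, flight number).
FLIGHTS = [
    ("AAA", "BBB", "F1"),
    ("AAA", "CCC", "F2"),
    ("CCC", "BBB", "F3"),
    ("CCC", "DDD", "F4"),
    ("DDD", "BBB", "F5"),
]


def build_index(flights: List[Tuple[str, str, str]]) -> Dict[str, List[Tuple[str, str]]]:
    """Map each origin airport to its outgoing (destination, flight) pairs."""
    out_edges: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for origin, destination, number in flights:
        out_edges[origin].append((destination, number))
    return out_edges


def itineraries(out_edges: Dict[str, List[Tuple[str, str]]],
                origin: str, destination: str, stops: int) -> List[List[str]]:
    """Return flight-number sequences with exactly `stops` intermediate airports."""
    results: List[List[str]] = []

    def walk(airport: str, path: List[str], visited: frozenset) -> None:
        if len(path) == stops + 1:  # one hop per flight leg
            if airport == destination:
                results.append(path)
            return
        for next_airport, number in out_edges.get(airport, []):
            if next_airport not in visited:  # never revisit an airport
                walk(next_airport, path + [number], visited | {next_airport})

    walk(origin, [], frozenset({origin}))
    return results


index = build_index(FLIGHTS)
for stops in (0, 1, 2):
    print(stops, "stop(s):", itineraries(index, "AAA", "BBB", stops))
```

Every additional stop multiplies the number of candidate paths by the fan-out of the intermediate airports, which is one way to see why deeper traversals put more load on the server.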
  • 19. Lessons Learned
      Model your graph database for performance
      • Data is yours. Design the data model for your use cases!
      • What kinds of queries do you want to support? How many levels deep into a traversal?
      • Consider denormalization…
      • Design and use indexes (graph indexes and vertex-centric indexes in JanusGraph) for better performance, but do not overuse them
      • It is recommended to create the complete data model before inserting content
      Use batch commits with caution
      • Batch commits allow multiple transactions to be committed together. The batch size affects performance, and the optimal size depends on the characteristics of the data.
      • Conflicts for inserts and updates need to be handled in a multi-threaded/multi-client implementation
      • Make sure the commit is completed and closed
  • 20. Lessons Learned
      Fine-tune for your workloads and systems
      • JanusGraph supports storage and index backends, so tune your backends!
      • JanusGraph server configurations, such as threadPoolBoss and threadPoolWorker
      • JVM configurations, such as Xms (initial and minimum Java heap size) and Xmx (maximum Java heap size). You don’t want to see the annoying java.lang.OutOfMemoryError exceptions. At the same time, an oversized Xmx hurts performance because garbage collections become longer and slower.
      • Use multiple threads and/or instances, up to your system’s capacity
      • Next step… consider cloud and auto-scaling
      • Be thorough and be patient because it will take a few iterations
      • Just like a fine-tuned instrument, you will enjoy the beautiful music for a long time!
  • 21. Compose for JanusGraph
      What is it?
      • Compose is an open-source database hosting provider
      • Supports backups, monitoring, performance tuning, and a full suite of deployment management tools backed by a 24x7 support and operations team
      • Offers JanusGraph technology with Scylla database
      • https://www.compose.com/janusgraph
  • 22. Thank you for keeping performance in mind!!
      Chin Huang, chhuang@us.ibm.com; github:chinhuang007
      Yi-Hong Wang, yh.wang@ibm.com; github:yhwang
      Ted Chang, htchang@ibm.com; github:tedhtchang
      What’s next? The journey continues…
      • Find ways to improve JanusGraph performance
      • Join us if you are interested in graph performance
      • Work with us if you have graph datasets
      • Talk to us if you have any comments or suggestions