POWERS OF TEN REDUX
JASON PLURAD • @pluradj
IBM • APACHE TINKERPOP • JANUSGRAPH
DATA DAY TEXAS • #DDTX18 • JANUARY 27, 2018
OPEN SOURCE GRAPH TECH
Property Graph
Connected Data
Model
Apache TinkerPop™
Graph Computing
Framework
JanusGraph® Scalable
Graph Database
Image credits: Apache TinkerPop (ALv2) and JanusGraph (CC-BY-4.
POWERS OF
TEN
Stephen Mallette
Image credit: spmallette on Twitter
10^1
TEN
Dart Paper Airplane
Image credit: Akkana on Wikimedia Commons, CC BY-SA 3.0
GRAPH TRAVERSALS
Vertex
id: 0
label: person
• name: Jason
Vertex
id: 2
label: airplane
• name: Dart
• type: paper
Edge
id: 5, outV: 0, inV: 2
label: throws
• distance: 10
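The two vertices and one edge on this slide can be sketched in plain Python to show what a traversal does with them. This is a minimal in-memory stand-in, not TinkerPop itself: the dictionaries and the helper names `out` and `thrown_by` are assumptions made for illustration. The walk roughly corresponds to the Gremlin traversal `g.V().has('person','name','Jason').out('throws').values('name')`.

```python
# Hypothetical in-memory stand-in for the property graph on this slide.
vertices = {
    0: {"label": "person", "properties": {"name": "Jason"}},
    2: {"label": "airplane", "properties": {"name": "Dart", "type": "paper"}},
}
edges = {
    5: {"label": "throws", "outV": 0, "inV": 2, "properties": {"distance": 10}},
}


def out(vertex_id, edge_label):
    """Follow outgoing edges with the given label; return adjacent vertex ids."""
    return [
        e["inV"]
        for e in edges.values()
        if e["outV"] == vertex_id and e["label"] == edge_label
    ]


def thrown_by(name):
    """Names of everything thrown by the person called `name`."""
    starts = [
        vid
        for vid, v in vertices.items()
        if v["label"] == "person" and v["properties"].get("name") == name
    ]
    return [vertices[t]["properties"]["name"] for s in starts for t in out(s, "throws")]


print(thrown_by("Jason"))
```

Each step narrows or moves the set of current elements: first filter vertices by label and property, then hop along edges by label, then read a property.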
10^3
ONE
THOUSAND
Wright Flyer
Image credit: John T. Daniels on Wikimedia Commons, Public Domain
GREMLIN CONSOLE
• Read-Eval-Print Loop
• Instant gratification
• Help with reproducible scripts
Image credit: Apache TinkerPop, ALv2
AIR ROUTES DATA, CSV TO PROPERTY GRAPH
Vertex
id: 0
label: airport
• code: AAE
• desc: Annaba
Vertex
id: 2
label: airport
• code: ALG
• desc: Algiers
Edge
id: 5, outV: 0, inV: 2
label: route
• distance: 254
airports.csv (3,374)
routes.csv (43,400)
CSV LOADING
• Leverage CSV libraries
• Be aware of auto-iteration
• Get-or-Create pattern with coalesce()
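The get-or-create idea can be sketched without a graph database. In Gremlin it is commonly written as `g.V().has('airport','code',code).fold().coalesce(unfold(), addV('airport').property('code',code))`. The Python below mimics the same semantics against an in-memory dictionary. The CSV column names (`code`, `desc`), the sample rows, and the helper names are assumptions for illustration, not the actual layout of airports.csv.

```python
import csv
import io

# Sample rows with a deliberate duplicate; the column names are assumed.
AIRPORTS_CSV = "code,desc\nAAE,Annaba\nALG,Algiers\nAAE,Annaba\n"

vertices = {}    # vertex id -> properties
code_index = {}  # airport code -> vertex id, plays the role of an index lookup


def get_or_create_airport(code, desc):
    """Return the id of the airport vertex with this code, creating it if absent."""
    existing = code_index.get(code)
    if existing is not None:
        return existing
    vid = len(vertices)
    vertices[vid] = {"label": "airport", "code": code, "desc": desc}
    code_index[code] = vid
    return vid


def load_airports(text):
    """Parse CSV text with the standard library and upsert one vertex per row."""
    reader = csv.DictReader(io.StringIO(text))
    return [get_or_create_airport(row["code"], row["desc"]) for row in reader]


ids = load_airports(AIRPORTS_CSV)
print(ids, len(vertices))
```

The repeated AAE row resolves to the vertex created for the first one, so reloading the same file does not produce duplicates.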
10^5
ONE HUNDRED
THOUSAND
Spirit of St. Louis
Image credit: Ad Meskens on Wikimedia Commons, CC BY-SA 3.0
GREMLIN SERVER AND REMOTE GRAPHS
• Gremlin Language Variants (GLV) for queries, not for bulk loading
• Gremlin Client Drivers enable efficient batch scripting
• Use Script Parameterization. Period.
Image credit: Apache TinkerPop, ALv2
NO PARAMETERIZATION
• Each script gets compiled and cached on the server – EXPENSIVE
• Eventually will exceed the GC overhead limit
BASIC PARAMETERIZATION
• Script is compiled once and reused on future requests
ADVANCED PARAMETERIZATION
• Leverage Groovy script evaluation to handle more complex scripts
Gremlin-Groovy script
Parameters JSON
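The cost difference between these approaches can be illustrated with a toy model of a script cache keyed by script text. This is a sketch under assumptions: `ScriptCache` is a hypothetical stand-in, not Gremlin Server code, and the third airport code is only sample data. The JSON body at the end uses the `gremlin` and `bindings` keys that Gremlin Server accepts for script requests.

```python
import json


class ScriptCache:
    """Toy model of a server-side cache keyed by script text (hypothetical)."""

    def __init__(self):
        self.compiled = {}
        self.compile_count = 0

    def evaluate(self, script, bindings=None):
        if script not in self.compiled:
            self.compile_count += 1  # stands in for the expensive compile step
            self.compiled[script] = True
        return bindings or {}


codes = ["AAE", "ALG", "AUS"]

# No parameterization: the value is baked into the script text,
# so every request looks like a brand-new script to the cache.
inline = ScriptCache()
for code in codes:
    inline.evaluate("g.V().has('airport','code','%s')" % code)

# Basic parameterization: one script text, values travel as bindings.
param = ScriptCache()
for code in codes:
    param.evaluate("g.V().has('airport','code',code)", {"code": code})

# Shape of a parameterized request body.
request_body = json.dumps(
    {"gremlin": "g.V().has('airport','code',code)", "bindings": {"code": "AAE"}}
)

print(inline.compile_count, param.compile_count)
print(request_body)
```

With inlined values the cache grows by one entry per distinct value, which is the growth that the "no parameterization" slide warns about.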
STRUCTURED RETURN VALUES
• Serializing all vertex properties and values can be expensive
• Judiciously decide what to include in the response
• Leverage Groovy scripting in combination with Gremlin traversals for maximum efficiency
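A rough way to see why trimming the response matters is to compare the serialized size of full records against a projection of only the fields the caller needs. The sketch below uses hypothetical records: the `notes` padding field and the `project` helper are invented for illustration. On the server side the equivalent choice would be made in the traversal, for example with `values('code')` instead of returning whole vertices.

```python
import json

# Hypothetical airport records; "notes" is padding that stands in for
# properties the client does not actually need.
airports = [
    {"id": 0, "label": "airport", "code": "AAE", "desc": "Annaba", "notes": "x" * 500},
    {"id": 2, "label": "airport", "code": "ALG", "desc": "Algiers", "notes": "x" * 500},
]


def project(records, keys):
    """Keep only the requested keys from each record."""
    return [{k: r[k] for k in keys} for r in records]


full = json.dumps(airports)
slim = json.dumps(project(airports, ["code"]))

print(len(full), len(slim))
```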
Image credit: Apache TinkerPop, ALv2
10^6
ONE MILLION
Cessna 172 Skyhawk
Image credit: Adrian Pingstone on Wikimedia Commons, Public Domain
JANUSGRAPH
• Open source project with open governance
• Community driven development
• Full implementation of Apache TinkerPop
• Apache license
• Broad adoption
Image credits: The Linux Foundation® and JanusGraph (CC-BY-4.
JANUSGRAPH STORAGE BACKENDS
• In-Memory
• Apache Cassandra, ScyllaDB
• Apache HBase, Google Cloud Bigtable
• Oracle Berkeley DB Java Edition
• Amazon DynamoDB
Image credit: Apache TinkerPop, ALv2
JANUSGRAPH SCHEMA AND INDEXING
• Graph schema
  • Vertex labels
  • Edge labels: multiplicity
  • Vertex properties: data types, cardinality
• Indexing
  • Composite index: exact matches
  • Mixed index: full-text search, numerical range, geospatial
  • Vertex-centric index: local per vertex, a solution for supernodes
Image credit: JanusGraph, CC-BY-4.0
JANUSGRAPH QUICK-START DISTRIBUTION
• Local server mode
  • Client, Storage, and Gremlin Server on a single machine
  • Great for testing out JanusGraph, but not recommended for production use
JANUSGRAPH DEPLOYMENT OPTIONS
• Remote server mode
  • Client on first machine
  • Storage on second machine
• Remote server mode with Gremlin Server
  • Client on first machine
  • Gremlin Server on second machine
  • Storage on third machine
Image credit: JanusGraph, CC-BY-4.0
10^7
TEN MILLION
Bombardier CRJ700
Image credit: Aero Icarus on Wikimedia Commons, CC BY-SA 2.0
BATCHGRAPH FOR BOUTIQUE GRAPHS
• Wrapper for a graph instance
• Handle intermediate commits
• Maintain vertex cache
• For loading data only
• Not in Apache TinkerPop 3 or JanusGraph
• Moved away from graph wrapper approach
Image credit: Apache TinkerPop, ALv2
REPLACING BATCHGRAPH
• Intermediate commits
  • Count the mutations and commit periodically
• Vertex cache
  • Enable fast lookup of vertices to connect with edges
  • Composite index
  • LRU cache https://github.com/ben-manes/caffeine
  • Pre-sort the data to maximize cache hits
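The two pieces that replace BatchGraph, periodic commits and a bounded vertex cache, can be sketched together. This is an illustration under assumptions: `Loader` and `LRUCache` are hypothetical names, a dictionary stands in for the graph's composite index, a counter stands in for a real transaction commit, and the airport code ORY is only sample data. The slide points to Caffeine for the cache on the JVM; an `OrderedDict` plays that role here.

```python
from collections import OrderedDict


class LRUCache:
    """Small least-recently-used cache built on OrderedDict."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()

    def get(self, key):
        if key not in self.items:
            return None
        self.items.move_to_end(key)  # mark as most recently used
        return self.items[key]

    def put(self, key, value):
        self.items[key] = value
        self.items.move_to_end(key)
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)  # evict least recently used


class Loader:
    """Counts mutations, commits every `batch_size`, caches vertex lookups."""

    def __init__(self, batch_size, cache_capacity):
        self.batch_size = batch_size
        self.cache = LRUCache(cache_capacity)
        self.store = {}  # stand-in for a composite index: code -> vertex id
        self.edges = []
        self.pending = 0
        self.commits = 0
        self.index_lookups = 0

    def _mutated(self):
        self.pending += 1
        if self.pending >= self.batch_size:
            self.commit()

    def commit(self):
        if self.pending:
            self.commits += 1  # a real loader would commit the transaction here
            self.pending = 0

    def vertex(self, code):
        vid = self.cache.get(code)
        if vid is None:
            self.index_lookups += 1  # cache miss falls back to the index
            vid = self.store.get(code)
            if vid is None:
                vid = len(self.store)
                self.store[code] = vid
                self._mutated()
            self.cache.put(code, vid)
        return vid

    def add_route(self, src, dst):
        self.edges.append((self.vertex(src), self.vertex(dst)))
        self._mutated()


routes = [("ALG", "ORY"), ("AAE", "ALG"), ("AAE", "ORY")]
loader = Loader(batch_size=2, cache_capacity=2)
for src, dst in sorted(routes):  # pre-sorting keeps repeated sources adjacent
    loader.add_route(src, dst)
loader.commit()  # flush whatever is left over

print(loader.commits, loader.index_lookups, loader.edges)
```

Sorting by source code means consecutive routes tend to reuse the vertex that was just cached, which is the point of the pre-sort tip on the slide.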
Image credit: Apache TinkerPop, ALv2
storage.batch-loading
• Disables automatic schema
• Disables transaction logging
• Disables transactions on storage backend
• Bigger dirty transaction cache size
• Disables external vertex existence checks
• Disables consistency checks (verify uniqueness, acquire locks)
Image credit: Apache TinkerPop, ALv2
MULTI-MODEL APPROACHES
• Only store the data you need for graph queries in the graph
• Rehydrate non-graph properties from another store
• Direct index queries
Image credit: Apache TinkerPop, ALv2
10^8
ONE HUNDRED
MILLION
Boeing 737
Image credit: JTOcchialini on Wikimedia Commons, CC BY-SA 2.0
FAUNUS / TITAN-HADOOP
• Faunus was the distributed graph analytics engine from Aurelius
• Used Hadoop to do breadth-first traversals using MapReduce
• OLAP abstraction was pulled into Apache TinkerPop 3
Image credit: Apache TinkerPop, ALv2
HADOOPGRAPH I/O FORMATS
• TinkerPop formats pull from files
  • GraphSONInputFormat
  • GryoInputFormat
  • ScriptInputFormat
• JanusGraph formats pull from storage
  • Cassandra3InputFormat
  • HBaseInputFormat
Image credit: JanusGraph, CC-BY-4.0
SPARKGRAPHCOMPUTER AND
BULKLOADERVERTEXPROGRAM
• Flexible Spark deployment options
  • Spark local with multiple threads
  • Spark master with multiple workers
• Configure BLVP with ScriptInputFormat
  • Script and data shared across workers via HDFS
• Assorted tips
  • Pre-define schema before loading
  • Define an index on “bulkLoader.vertex.id”
  • gremlin.spark.persistStorageLevel=DISK_ONLY
Image credit: Apache TinkerPop, ALv2
10^9
ONE BILLION
Airbus A380
Image credit: Maarten Visser on Wikipedia, CC BY-SA 2.0
FULLY-DISTRIBUTED CLUSTER COMPUTING
• Same loading mechanics as pseudo-distributed
• Consider a Hadoop distribution, like Apache Ambari or Hortonworks Data Platform
• Be aware of differences between distributions, especially software versions
Image credit: Apache TinkerPop, ALv2
DON’T
WHEELIE
THE DUCATI
Ducati Wheelie
Image credit: David Hurt on Flickr, CC BY 2.0
THANK
YOU!
@pluradj
RESOURCES
• Apache TinkerPop
  • @apachetinkerpop
  • https://tinkerpop.apache.org
• JanusGraph
  • @janusgraph
  • https://janusgraph.org
• Powers of Ten
  • Stephen Mallette @spmallette
  • https://www.datastax.com/dev/blog/powers-of-ten-part-i
  • https://www.datastax.com/dev/blog/powers-of-ten-part-ii
• Practical Gremlin
  • Kelvin Lawrence @gfxman
  • https://github.com/krlawrence/graph
• JanusGraph Code Patterns
  • IBM Code @ibmcode
  • https://github.com/IBM/janusgraph-utils
• HadoopMarc's Blog
  • http://yaaics.blogspot.com
• JanusGraph Nuts and Bolts
  • Ted Wilmes @trwilmes
  • https://www.experoinc.com/post/janusgraph-nuts-and-bolts-part-1-write-performance

Modelling Up - DDDEurope 2024 - Amsterdam
Modelling Up - DDDEurope 2024 - AmsterdamModelling Up - DDDEurope 2024 - Amsterdam
Modelling Up - DDDEurope 2024 - Amsterdam
 
Mobile App Development Company In Noida | Drona Infotech
Mobile App Development Company In Noida | Drona InfotechMobile App Development Company In Noida | Drona Infotech
Mobile App Development Company In Noida | Drona Infotech
 
YAML crash COURSE how to write yaml file for adding configuring details
YAML crash COURSE how to write yaml file for adding configuring detailsYAML crash COURSE how to write yaml file for adding configuring details
YAML crash COURSE how to write yaml file for adding configuring details
 
Hand Rolled Applicative User Validation Code Kata
Hand Rolled Applicative User ValidationCode KataHand Rolled Applicative User ValidationCode Kata
Hand Rolled Applicative User Validation Code Kata
 
Energy consumption of Database Management - Florina Jonuzi
Energy consumption of Database Management - Florina JonuziEnergy consumption of Database Management - Florina Jonuzi
Energy consumption of Database Management - Florina Jonuzi
 
The Key to Digital Success_ A Comprehensive Guide to Continuous Testing Integ...
The Key to Digital Success_ A Comprehensive Guide to Continuous Testing Integ...The Key to Digital Success_ A Comprehensive Guide to Continuous Testing Integ...
The Key to Digital Success_ A Comprehensive Guide to Continuous Testing Integ...
 
J-Spring 2024 - Going serverless with Quarkus, GraalVM native images and AWS ...
J-Spring 2024 - Going serverless with Quarkus, GraalVM native images and AWS ...J-Spring 2024 - Going serverless with Quarkus, GraalVM native images and AWS ...
J-Spring 2024 - Going serverless with Quarkus, GraalVM native images and AWS ...
 
Project Management: The Role of Project Dashboards.pdf
Project Management: The Role of Project Dashboards.pdfProject Management: The Role of Project Dashboards.pdf
Project Management: The Role of Project Dashboards.pdf
 
14 th Edition of International conference on computer vision
14 th Edition of International conference on computer vision14 th Edition of International conference on computer vision
14 th Edition of International conference on computer vision
 
UI5con 2024 - Boost Your Development Experience with UI5 Tooling Extensions
UI5con 2024 - Boost Your Development Experience with UI5 Tooling ExtensionsUI5con 2024 - Boost Your Development Experience with UI5 Tooling Extensions
UI5con 2024 - Boost Your Development Experience with UI5 Tooling Extensions
 
All you need to know about Spring Boot and GraalVM
All you need to know about Spring Boot and GraalVMAll you need to know about Spring Boot and GraalVM
All you need to know about Spring Boot and GraalVM
 
Webinar On-Demand: Using Flutter for Embedded
Webinar On-Demand: Using Flutter for EmbeddedWebinar On-Demand: Using Flutter for Embedded
Webinar On-Demand: Using Flutter for Embedded
 
SQL Accounting Software Brochure Malaysia
SQL Accounting Software Brochure MalaysiaSQL Accounting Software Brochure Malaysia
SQL Accounting Software Brochure Malaysia
 
ALGIT - Assembly Line for Green IT - Numbers, Data, Facts
ALGIT - Assembly Line for Green IT - Numbers, Data, FactsALGIT - Assembly Line for Green IT - Numbers, Data, Facts
ALGIT - Assembly Line for Green IT - Numbers, Data, Facts
 
WWDC 2024 Keynote Review: For CocoaCoders Austin
WWDC 2024 Keynote Review: For CocoaCoders AustinWWDC 2024 Keynote Review: For CocoaCoders Austin
WWDC 2024 Keynote Review: For CocoaCoders Austin
 

Powers of Ten Redux

  • 1. POWERS OF TEN REDUX JASON PLURAD • @pluradj IBM • APACHE TINKERPOP • JANUSGRAPH DATA DAY TEXAS • #DDTX18 • JANUARY 27, 2018
  • 2. OPEN SOURCE GRAPH TECH Property Graph Connected Data Model Apache TinkerPop™ Graph Computing Framework JanusGraph® Scalable Graph Database Image credits: Apache TinkerPop (ALv2) and JanusGraph (CC-BY-4.
  • 3. POWERS OF TEN Stephen Mallette Image credit: spmallette on Twitter
  • 4. 10^1 TEN Dart Paper Airplane Image credit: Akkana on Wikimedia Commons, CC BY-SA 3.0
  • 5. GRAPH TRAVERSALS Vertex id: 0 label: person • name: Jason Vertex id: 2 label: airplane • name: Dart • type: paper Edge id: 5, outV: 0, inV: 2 label: throws • distance: 10
  • 6. 10^3 ONE THOUSAND Wright Flyer Image credit: John T. Daniels on Wikimedia Commons, Public Domain
  • 7. GREMLIN CONSOLE • Read-Eval-Print Loop • Instant gratification • Help with reproducible scripts Image credit: Apache TinkerPop, ALv2
  • 8. AIR ROUTES DATA, CSV TO PROPERTY GRAPH Vertex id: 0 label: airport • code: AAE • desc: Annaba Vertex id: 2 label: airport • code: ALG • desc: Algiers Edge id: 5, outV: 0, inV: 2 label: route • distance: 254 airports.csv (3,374) routes.csv (43,400)
  • 9. CSV LOADING • Leverage CSV libraries • Be aware of auto-iteration • Get-or-Create pattern with coalesce()
  • 10. 10^5 ONE HUNDRED THOUSAND Spirit of St. Louis Image credit: Ad Meskens on Wikimedia Commons, CC BY-SA 3.0
  • 11. GREMLIN SERVER AND REMOTE GRAPHS • Gremlin Language Variants (GLV) for queries, not for bulkload • Gremlin Client Drivers enable efficient batch scripting • Use Script Parameterization. Period. Image credit: Apache TinkerPop, ALv2
  • 12. NO PARAMETERIZATION • Each script gets compiled and cached on the server – EXPENSIVE • Eventually will exceed the GC overhead limit
  • 13. BASIC PARAMETERIZATION • Script is compiled once and reused on future requests
  • 14. ADVANCED PARAMETERIZATION • Leverage Groovy script evaluation to handle more complex scripts Gremlin-Groovy script Parameters JSON
  • 15. STRUCTURED RETURN VALUES • Serializing all vertex properties and values can be expensive • Judiciously decide what to include in the response • Leverage Groovy scripting in combination with Gremlin traversals for maximum efficiency Image credit: Apache TinkerPop, ALv2
  • 16. 10^6 ONE MILLION Cessna 172 Skyhawk Image credit: Adrian Pingstone on Wikimedia Commons, Public Domain
  • 17. JANUSGRAPH • Open source project with open governance • Community driven development • Full implementation of Apache TinkerPop • Apache license • Broad adoption Image credits: The Linux Foundation® and JanusGraph (CC-BY-4.
  • 18. JANUSGRAPH STORAGE BACKENDS • In-Memory • Apache Cassandra, ScyllaDB • Apache HBase, Google Cloud Bigtable • Oracle Berkeley DB Java Edition • Amazon DynamoDB Image credit: Apache TinkerPop, ALv2
  • 19. JANUSGRAPH SCHEMA AND INDEXING • Graph schema • Vertex labels • Edge labels: multiplicity • Vertex properties: data types, cardinality • Indexing • Composite index: exact matches • Mixed index: full-text search, numerical range, geospatial • Vertex-centric index: local per vertex, a solution for supernodes Image credit: JanusGraph, CC-BY-4.0
  • 20. JANUSGRAPH QUICK-START DISTRIBUTION • Local server mode • Client, Storage, and Gremlin Server on a single machine • Great for testing out JanusGraph, but not recommended for production use
  • 21. JANUSGRAPH DEPLOYMENT OPTIONS • Remote server mode • Client on first machine • Storage on second machine • Remote server mode with Gremlin Server • Client on first machine • Gremlin Server on second machine • Storage on third machine Image credit: JanusGraph, CC-BY-4.0
  • 22. 10^7 TEN MILLION Bombardier CRJ700 Image credit: Aero Icarus on Wikimedia Commons, CC BY-SA 2.0
  • 23. BATCHGRAPH FOR BOUTIQUE GRAPHS • Wrapper for a graph instance • Handle intermediate commits • Maintain vertex cache • For loading data only • Not in Apache TinkerPop 3 or JanusGraph • Moved away from graph wrapper approach Image credit: Apache TinkerPop, ALv2
  • 24. REPLACING BATCHGRAPH • Intermediate commits • Count the mutations and commit periodically • Vertex cache • Enable fast lookup of vertices to connect with edges • Composite index • LRU cache https://github.com/ben-manes/caffeine • Pre-sort the data to maximize cache hits Image credit: Apache TinkerPop, ALv2
  • 25. storage.batch-loading • Disables automatic schema • Disables transaction logging • Disables transactions on storage backend • Bigger dirty transaction cache size • Disables external vertex existence checks • Disables consistency checks (verify uniqueness, acquire locks) Image credit: Apache TinkerPop, ALv2
  • 26. MULTI-MODEL APPROACHES • Only store the data you need for graph queries in the graph • Rehydrate non-graph properties from another store • Direct index queries Image credit: Apache TinkerPop, ALv2
  • 27. 10^8 ONE HUNDRED MILLION Boeing 737 Image credit: JTOcchialini on Wikimedia Commons, CC BY-SA 2.0
  • 28. FAUNUS / TITAN-HADOOP • Faunus was the distributed graph analytics engine from Aurelius • Used Hadoop to do breadth-first traversals using MapReduce • OLAP abstraction was pulled into Apache TinkerPop 3 Image credit: Apache TinkerPop, ALv2
  • 29. HADOOPGRAPH I/O FORMATS • TinkerPop formats pull from files • GraphSONInputFormat • GryoInputFormat • ScriptInputFormat • JanusGraph formats pull from storage • Cassandra3InputFormat • HBaseInputFormat Image credit: JanusGraph, CC-BY-4.0
  • 30. SPARKGRAPHCOMPUTER AND BULKLOADERVERTEXPROGRAM • Flexible Spark deployment options • Spark local with multiple threads • Spark master with multiple workers • Configure BLVP with ScriptInputFormat • Script and data shared across workers via HDFS • Assorted tips • Pre-define schema before loading • Define an index on “bulkLoader.vertex.id” • gremlin.spark.persistStorageLevel=DISK_ONLY Image credit: Apache TinkerPop, ALv2
  • 31. 10^9 ONE BILLION Airbus A380 Image credit: Maarten Visser on Wikipedia, CC BY-SA 2.0
  • 32. FULLY-DISTRIBUTED CLUSTER COMPUTING • Same loading mechanics as pseudo-distributed • Consider a Hadoop distribution, like Apache Ambari or Hortonworks Data Platform • Be aware of differences between distributions, especially software versions Image credit: Apache TinkerPop, ALv2
  • 33. DON’T WHEELIE THE DUCATI Ducati Wheelie Image credit: David Hurt on Flickr, CC BY 2.0
  • 34. THANK YOU! @pluradj RESOURCES • Apache TinkerPop • @apachetinkerpop • https://tinkerpop.apache.org • JanusGraph • @janusgraph • https://janusgraph.org • Powers of Ten • Stephen Mallette @spmallette • https://www.datastax.com/dev/blog/powers-of-ten-part-i • https://www.datastax.com/dev/blog/powers-of-ten-part-ii • Practical Gremlin • Kelvin Lawrence @gfxman • https://github.com/krlawrence/graph • JanusGraph Code Patterns • IBM Code @ibmcode • https://github.com/IBM/janusgraph-utils • HadoopMarc's Blog • http://yaaics.blogspot.com • JanusGraph Nuts and Bolts • Ted Wilmes @trwilmes • https://www.experoinc.com/post/janusgraph-nuts-and-bolts-part-1-write-performance
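
The script parameterization advice on slides 12 to 14 can be illustrated with a small, self-contained Python sketch. It does not talk to a Gremlin Server; it only contrasts the script strings a client would send. The `gremlin` and `bindings` dictionary keys, the helper names, and the two sample airports (taken from slide 8) are assumptions chosen for illustration, not the speaker's code.

```python
import json

# Sample rows, using the two airports shown on slide 8.
airports = [
    {"code": "AAE", "desc": "Annaba"},
    {"code": "ALG", "desc": "Algiers"},
]


def inline_script(airport):
    # Anti-pattern: values are concatenated into the script text, so every
    # request carries a different string that the server must compile and cache.
    return (
        "g.addV('airport').property('code', '%s').property('desc', '%s')"
        % (airport["code"], airport["desc"])
    )


# Parameterized form: one constant script, with the values sent separately.
PARAM_SCRIPT = "g.addV('airport').property('code', code).property('desc', desc)"


def parameterized_request(airport):
    # Hypothetical request shape: the script text plus a map of bindings.
    return {
        "gremlin": PARAM_SCRIPT,
        "bindings": {"code": airport["code"], "desc": airport["desc"]},
    }


inline_scripts = {inline_script(a) for a in airports}
param_requests = [parameterized_request(a) for a in airports]
param_scripts = {r["gremlin"] for r in param_requests}

print(len(inline_scripts), "distinct script(s) without parameterization")
print(len(param_scripts), "distinct script(s) with parameterization")
print(json.dumps(param_requests[0], indent=2))
```

Without parameterization the number of distinct scripts grows with the number of rows; with parameterization it stays at one, which is why the compiled script can be reused on later requests.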

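The BatchGraph replacement described on slide 24 (count mutations and commit periodically, keep an LRU vertex cache, pre-sort the data) might look roughly like the following Python sketch. `FakeGraph` is a hypothetical in-memory stand-in for a transactional graph, not a TinkerPop or JanusGraph API, and the `ZZZ` airport and its distance are made-up sample values; only the AAE to ALG route with distance 254 comes from the slides.

```python
from collections import OrderedDict


class FakeGraph:
    """Hypothetical in-memory stand-in for a transactional graph."""

    def __init__(self):
        self.vertices = {}  # airport code -> vertex id
        self.edges = []
        self.commits = 0
        self.lookups = 0  # how often the (simulated) index was consulted

    def get_or_create_vertex(self, code):
        self.lookups += 1
        if code not in self.vertices:
            self.vertices[code] = len(self.vertices)
        return self.vertices[code]

    def add_edge(self, out_v, in_v, distance):
        self.edges.append((out_v, in_v, distance))

    def commit(self):
        self.commits += 1


class LruCache:
    """Small least-recently-used cache built on OrderedDict."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.data = OrderedDict()

    def get(self, key):
        if key not in self.data:
            return None
        self.data.move_to_end(key)  # mark as most recently used
        return self.data[key]

    def put(self, key, value):
        self.data[key] = value
        self.data.move_to_end(key)
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)  # evict the least recently used


def load_routes(graph, routes, batch_size=2, cache_size=100):
    cache = LruCache(cache_size)
    pending = 0

    def vertex(code):
        v = cache.get(code)
        if v is None:  # cache miss: fall back to the index lookup
            v = graph.get_or_create_vertex(code)
            cache.put(code, v)
        return v

    # Pre-sorting groups edges by source vertex, which improves cache hits.
    for src, dst, distance in sorted(routes):
        graph.add_edge(vertex(src), vertex(dst), distance)
        pending += 1
        if pending >= batch_size:  # intermediate commit
            graph.commit()
            pending = 0
    if pending:  # commit the final partial batch
        graph.commit()


routes = [
    ("AAE", "ALG", 254),
    ("ALG", "AAE", 254),
    ("AAE", "ZZZ", 100),  # made-up sample row
]
graph = FakeGraph()
load_routes(graph, routes)
print(len(graph.edges), "edges,", graph.lookups, "index lookups,", graph.commits, "commits")
```

In a real loader the batch size and cache capacity would need tuning against the storage backend; the values here are only small enough to make the behavior visible.
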
Editor's Notes

  1. One of the first problems a developer encounters when evaluating a graph database is how to construct a graph efficiently. Recognizing this need in 2014, TinkerPop's Stephen Mallette penned a series of blog posts titled "Powers of Ten", which addressed several bulk-loading techniques for Titan. Since then, Titan has gone away, and the open source graph database landscape has evolved significantly. Do the same approaches stand the test of time? In this session, we will take a deep dive into strategies for loading data of various sizes into modern Apache TinkerPop graph systems. We will discuss bulk loading with JanusGraph, the scalable graph database forked from Titan, to better understand how its architecture can be optimized for ingestion.
  2. spmallette on Twitter https://twitter.com/spmallette/status/931575876046729217
  3. Akkana on Wikimedia Commons, CC BY-SA 3.0 https://commons.wikimedia.org/wiki/File:Paperairplane.png
  4. John T. Daniels on Wikimedia Commons, Public Domain https://commons.wikimedia.org/wiki/File:First_flight2.jpg
  5. Ad Meskens on Wikimedia Commons, CC BY-SA 3.0 https://commons.wikimedia.org/wiki/File:Spirit_Of_St_Louis2.jpg
  6. Adrian Pingstone on Wikimedia Commons, Public Domain https://commons.wikimedia.org/wiki/File:Cessna_172S_Skyhawk_at_Bristol_Airport_(England)_23Aug2014_arp.jpg
  7. Aero Icarus on Wikimedia Commons, CC BY-SA 2.0 https://commons.wikimedia.org/wiki/File:Delta_Connection_Canadair_CRJ700;_N603QX@SLC;09.10.2011_621ds_(6299961315).jpg
  8. JTOcchialini on Wikimedia Commons, CC BY-SA 2.0 https://commons.wikimedia.org/wiki/File:WestJet_C-GWSZ_Disney_World_JTPI_9598_(14506120928).jpg
  9. Maarten Visser on Wikimedia Commons, CC BY-SA 2.0 https://commons.wikimedia.org/wiki/File:A6-EDY_A380_Emirates_31_jan_2013_jfk_(8442269364)_(cropped).jpg
  10. David Hurt on Flickr, CC BY 2.0 https://www.flickr.com/photos/davidht/1787402541