Running Neo4j in Production: Tips, Tricks and Optimizations

Nick Manning

Running Neo4j in Production
Tips, Tricks and Optimizations

This Talk...
● How we scaled our prod graph
● Challenges faced doing this
● Various lessons we learned and techniques we used
● Some stuff I’m looking forward to in Neo4j

SNAP Interactive
● Presented by David Fox (Big Data Engineer)
● Social dating app AYI (Are You Interested?)
● Friends and interests

How We Use Neo4j
● Model the friend data of our millions of users
● Indicate connections everywhere on app
● 1.1+ billion nodes
● 8.5+ billion relationships
● 450 GB+ store
● 3-instance cluster

Importing lots of data
● Find the right tool
o First try normal Cypher
o No good? Bring out the big guns - Java Batch Inserter
● Java Batch Inserter
o Sort relationships (GNU sort)
o Try to keep index lookups to in-memory lookups only
▪ Giant HashMap!
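
The two Batch Inserter tips above can be sketched without any Neo4j API. This is a minimal illustration, assuming each user has an external key (the names `assign_node_ids` and `prepare_relationships` are hypothetical): every key is resolved through one in-memory map instead of an index lookup, and relationships are sorted by start node before insertion. At real scale the sort would be done on disk with a tool like GNU sort, as the slide says.

```python
# Sketch only: prepares relationship rows for a bulk import. No Neo4j API is used.

def assign_node_ids(external_ids):
    """Map each external key to a dense internal id, kept entirely in memory."""
    lookup = {}
    for key in external_ids:
        if key not in lookup:
            lookup[key] = len(lookup)
    return lookup


def prepare_relationships(pairs, lookup):
    """Resolve both endpoints via the in-memory map, then sort by (start, end)."""
    resolved = [(lookup[a], lookup[b]) for a, b in pairs]
    resolved.sort()
    return resolved


# Hypothetical sample data
users = ["alice", "bob", "carol"]
friendships = [("carol", "alice"), ("alice", "bob"), ("bob", "carol")]

lookup = assign_node_ids(users)
rows = prepare_relationships(friendships, lookup)
print(rows)
```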

But wait!!!
● Cypher CSV import
o 2.1 M01
o Supposed to be good for importing large data sets
o Anyone tried it?

Read Querying
● Always try Cypher first
o Performance is being improved
● How can you tell if performance is where you need it to be?
o Time queries (cold vs. warm cache)
o Load testing!
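
A rough way to compare cold and warm timings is to run the same query several times and keep the first run separate from the rest. The sketch below assumes the query is wrapped in a zero-argument callable. `fake_query` is a stand-in that simulates a cache and is not a real Neo4j call. The first run only counts as cold if the caches really are empty, for example right after a restart.

```python
import time


def time_runs(run_query, repeats=5):
    """Return (first_run_seconds, later_run_seconds) for a zero-argument callable."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_query()
        timings.append(time.perf_counter() - start)
    return timings[0], timings[1:]


# Stand-in for a real query: the first call is slow, later calls hit a cache.
_cache = {}


def fake_query():
    if "result" not in _cache:
        time.sleep(0.05)  # simulated disk reads on a cold cache
        _cache["result"] = 42
    return _cache["result"]


cold, warm = time_runs(fake_query)
print(f"cold: {cold:.4f}s, warm average: {sum(warm) / len(warm):.6f}s")
```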

Read Querying cont.
● Dark querying
o Great for benchmarking a system into which Neo4j functionality is being introduced
o Mitigates risk
o Provides results that are very close to real-world patterns
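
The talk does not show how dark querying was wired in. One common shape, sketched here under that assumption, is a wrapper that always serves the existing code path and runs the new query only to record its latency and whether it agrees. All function names are hypothetical. In a real deployment the candidate would normally run off the request thread, so it cannot slow the user down.

```python
import time


def dark_query(primary, candidate, log, *args):
    """Serve the primary result; run the candidate for measurement only."""
    result = primary(*args)
    start = time.perf_counter()
    try:
        shadow = candidate(*args)
        log.append({"ok": True, "match": shadow == result,
                    "seconds": time.perf_counter() - start})
    except Exception as exc:  # a failing candidate must never reach the user
        log.append({"ok": False, "error": repr(exc),
                    "seconds": time.perf_counter() - start})
    return result


# Hypothetical stand-ins for the existing path and the new graph-backed path.
def legacy_mutual_friends(user_id):
    return {1: [2, 3]}.get(user_id, [])


def graph_mutual_friends(user_id):
    if user_id == 99:
        raise RuntimeError("graph unavailable")
    return {1: [2, 3]}.get(user_id, [])


log = []
served = dark_query(legacy_mutual_friends, graph_mutual_friends, log, 1)
served_on_error = dark_query(legacy_mutual_friends, graph_mutual_friends, log, 99)
```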

Read Querying cont.
● Reads too slow? Try these things.
o Write high-throughput, business-critical queries in Java
▪ Unmanaged extension
▪ Faster
▪ Hard limits
o Cache sharding
▪ Route by country, age, gender, etc.
▪ You hit a warm cache more often
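
Cache sharding comes down to a deterministic routing rule: requests with the same attributes always go to the same instance, so that instance keeps the relevant part of the graph warm. The sketch below is one possible rule, not the one used at SNAP. The instance names, the ten-year age bands and the hashing scheme are all assumptions.

```python
import hashlib

INSTANCES = ["neo4j-a", "neo4j-b", "neo4j-c"]  # hypothetical instance names


def pick_instance(country, gender, age, instances=INSTANCES):
    """Route users with the same country, gender and age band to one instance."""
    age_band = age // 10 * 10  # assumed bucketing: 20-29 -> 20, 30-39 -> 30
    key = f"{country}|{gender}|{age_band}".encode("utf-8")
    digest = hashlib.sha256(key).digest()  # stable across processes and restarts
    return instances[int.from_bytes(digest[:8], "big") % len(instances)]


# Two users in the same segment land on the same instance.
print(pick_instance("US", "f", 23), pick_instance("US", "f", 27))
```

A stable hash is used instead of Python's built-in `hash()` because the latter is randomized per process for strings, which would send the same segment to different instances after a restart.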

Read Querying cont.
● Warm the cache!
o Touch all the nodes
o Touch all the relationships

Writing
● Decide which writes need to be synchronous and which can be asynchronous
● Queue up asynchronous writes (routine updates, non-vital to the immediate user experience)
o Try to evenly distribute them
o How do we do this? Baserunner!
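
The synchronous/asynchronous split can be sketched as a small router. Writes the user is waiting on are applied immediately. Everything else is queued and drained in small, fixed-size batches so the load stays even. `WriteRouter` and its methods are hypothetical names, and `apply_write` stands in for whatever actually performs the write.

```python
from collections import deque


class WriteRouter:
    """Apply urgent writes at once; queue the rest for steady draining."""

    def __init__(self, apply_write):
        self.apply_write = apply_write
        self.pending = deque()

    def submit(self, write, synchronous):
        if synchronous:
            self.apply_write(write)
        else:
            self.pending.append(write)

    def drain(self, max_writes):
        """Apply at most max_writes queued writes; call this on a steady schedule."""
        applied = 0
        while self.pending and applied < max_writes:
            self.apply_write(self.pending.popleft())
            applied += 1
        return applied


applied = []
router = WriteRouter(applied.append)
router.submit("new friendship", synchronous=True)
router.submit("refresh interests 1", synchronous=False)
router.submit("refresh interests 2", synchronous=False)
router.submit("refresh interests 3", synchronous=False)
first_batch = router.drain(max_writes=2)
```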

Baserunner
● Written by a SNAP developer
● Walks the userbase randomly instead of sequentially
o This avoids pockets of heavily increased write queries
o Allows us to do high-velocity updating of our data

Tuning the JVM
● For a really high-throughput environment, G1 GC has been very helpful
o Good at adapting itself
o We experienced fewer system-stopping pauses than with CMS
o Try CMS first, but remember G1 as an option

Hardware is Important
● Lots of memory
● Working set too big for memory?
o SSDs are helpful
o The optimization techniques discussed earlier become much more important

Not Everything is Your Fault!
● Like any software, Neo4j has bugs
● Developers are receptive
● File reports on GitHub when you find issues

Some stuff to look forward to...
● Relationship grouping (2.1 M01)
o Helps mitigate the super node/dense node problem
● Ronja (rewrite of the Cypher query planner, 2.1?)
● More flexible label index searching (after 2.1)

Questions?
