SlideShare a Scribd company logo
1 of 17
Running Neo4j in Production
Tips, Tricks and Optimizations
This Talk...
● How we scaled our prod graph
● Challenges faced doing this
● Various lessons we learned and techniques
we used
● Some stuff I’m looking forward to in Neo4j
SNAP Interactive
● Presented by David Fox (Big Data Engineer)
● Social dating app AYI (Are You Interested?)
● Friends and interests
How We Use Neo4j
● Model the friend data of our millions of users
● Indicate connections everywhere on app
● 1.1+ billion nodes
● 8.5+ billion relationships
● 450gb+ store
● 3 instance cluster
Importing lots of data
● Find the right tool
o First try normal Cypher
o No good? Bring out the big guns - Java Batch
Inserter
● Java Batch Inserter
o Sort relationships (GNU sort)
o Try to keep index lookups to in-memory lookups only
 Giant HashMap!
But wait!!!
● Cypher CSV import
o 2.1 M01
o Supposed to be good for importing large data sets
o Anyone tried it?
Read Querying
● Always try Cypher first
o Performance is being improved
● How can you tell if performance is where you
need it to be?
o Time queries (cold vs. warm cache)
o Load testing!
Read Querying cont.
● Dark querying
o Great for benchmarking system where Neo4j
functionality is being injected
o Mitigates risk
o Provides results that are very close to real world
patterns
Read Querying cont.
● Reads too slow? Try these things.
o Write high-throughput business-critical queries in
Java
 unmanaged extension
 faster
 hard limits
o Cache shard
 country, age, gender, etc.
 you hit warm cache more often
Read Querying cont.
● Warm the cache!
o Touch all the nodes
o Touch all the relationships
Writing
● Decide which writes need to be synchronous
and which can be asynchronous
● Queue up asynchronous writes (routine
updates, non-vital to immediate user-
experience)
o Try to evenly distribute them
o How do we do this? Baserunner!
Baserunner
● Written by SNAP developer
● Walks userbase randomly instead of
sequentially
o This avoids pockets of heavily increased write
queries
o Allows us to do high-velocity updating of our data
Tuning the JVM
● For a really high-throughput environment,
G1 GC has been very helpful
o Good at adapting itself
o We experienced less system-stopping pauses than
with CMS
o Try CMS first but remember G1 as option
Hardware is Important
● Lots of memory
● Working set too big for memory?
o SSDs are helpful
o Optimization techniques discussed become much
more important
Not Everything is Your Fault!
● Like any software, Neo4j has bugs
● Developers are receptive
● File reports on Github when you find issues
Some stuff to look forward to...
● Relationship grouping (2.1 M01)
o helps mitigate the super node/dense node problem
● Ronja (rewrite of the Cypher query
language, 2.1?)
● More flexible label index searching (after
2.1)
Questions?

More Related Content

Similar to Running Neo4j in Production: Tips, Tricks and Optimizations

OSDC 2015: Kris Buytaert | From ConfigManagementSucks to ConfigManagementLove
OSDC 2015: Kris Buytaert | From ConfigManagementSucks to ConfigManagementLoveOSDC 2015: Kris Buytaert | From ConfigManagementSucks to ConfigManagementLove
OSDC 2015: Kris Buytaert | From ConfigManagementSucks to ConfigManagementLoveNETWAYS
 
Monitoring and automation
Monitoring and automationMonitoring and automation
Monitoring and automationRicardo Bánffy
 
The 5 Minute MySQL DBA
The 5 Minute MySQL DBAThe 5 Minute MySQL DBA
The 5 Minute MySQL DBAIrawan Soetomo
 
Tips about hibernate with spring data jpa
Tips about hibernate with spring data jpaTips about hibernate with spring data jpa
Tips about hibernate with spring data jpaThiago Dos Santos Hora
 
Writing clean scientific software Murphy cleancoding
Writing clean scientific software Murphy cleancodingWriting clean scientific software Murphy cleancoding
Writing clean scientific software Murphy cleancodingsaber tabatabaee
 
Path dependent-development (PyCon India)
Path dependent-development (PyCon India)Path dependent-development (PyCon India)
Path dependent-development (PyCon India)ncoghlan_dev
 
The Professional Programmer
The Professional ProgrammerThe Professional Programmer
The Professional ProgrammerDave Cross
 
Jonathan Coveney: Why Pig?
Jonathan Coveney: Why Pig?Jonathan Coveney: Why Pig?
Jonathan Coveney: Why Pig?mortardata
 
Unbreaking Your Django Application
Unbreaking Your Django ApplicationUnbreaking Your Django Application
Unbreaking Your Django ApplicationOSCON Byrum
 
Path Dependent Development (PyCon AU)
Path Dependent Development (PyCon AU)Path Dependent Development (PyCon AU)
Path Dependent Development (PyCon AU)ncoghlan_dev
 
OSMC 2015 | Testing in Production by Devdas Bhagat
OSMC 2015 | Testing in Production by Devdas BhagatOSMC 2015 | Testing in Production by Devdas Bhagat
OSMC 2015 | Testing in Production by Devdas BhagatNETWAYS
 
OSMC 2015: Testing in Production by Devdas Bhagat
OSMC 2015: Testing in Production by Devdas BhagatOSMC 2015: Testing in Production by Devdas Bhagat
OSMC 2015: Testing in Production by Devdas BhagatNETWAYS
 
Devops, the future is here, it's just not evenly distributed yet.
Devops, the future is here, it's just not evenly distributed yet.Devops, the future is here, it's just not evenly distributed yet.
Devops, the future is here, it's just not evenly distributed yet.Kris Buytaert
 
Bringing Open-Source Practices to Your Day Job
Bringing Open-Source Practices to Your Day JobBringing Open-Source Practices to Your Day Job
Bringing Open-Source Practices to Your Day JobBen Coe
 
Services, tools & practices for a software house
Services, tools & practices for a software houseServices, tools & practices for a software house
Services, tools & practices for a software houseParis Apostolopoulos
 
High performance json- postgre sql vs. mongodb
High performance json- postgre sql vs. mongodbHigh performance json- postgre sql vs. mongodb
High performance json- postgre sql vs. mongodbWei Shan Ang
 
What drives Innovation? Innovations And Technological Solutions for the Distr...
What drives Innovation? Innovations And Technological Solutions for the Distr...What drives Innovation? Innovations And Technological Solutions for the Distr...
What drives Innovation? Innovations And Technological Solutions for the Distr...Stefano Fago
 
Kibana+ElasticSearch+LogStash to handle Log messages on Prod servers
Kibana+ElasticSearch+LogStash to handle Log messages on Prod serversKibana+ElasticSearch+LogStash to handle Log messages on Prod servers
Kibana+ElasticSearch+LogStash to handle Log messages on Prod serversHYS Enterprise
 

Similar to Running Neo4j in Production: Tips, Tricks and Optimizations (20)

OSDC 2015: Kris Buytaert | From ConfigManagementSucks to ConfigManagementLove
OSDC 2015: Kris Buytaert | From ConfigManagementSucks to ConfigManagementLoveOSDC 2015: Kris Buytaert | From ConfigManagementSucks to ConfigManagementLove
OSDC 2015: Kris Buytaert | From ConfigManagementSucks to ConfigManagementLove
 
Monitoring and automation
Monitoring and automationMonitoring and automation
Monitoring and automation
 
The 5 Minute MySQL DBA
The 5 Minute MySQL DBAThe 5 Minute MySQL DBA
The 5 Minute MySQL DBA
 
Tips about hibernate with spring data jpa
Tips about hibernate with spring data jpaTips about hibernate with spring data jpa
Tips about hibernate with spring data jpa
 
Writing clean scientific software Murphy cleancoding
Writing clean scientific software Murphy cleancodingWriting clean scientific software Murphy cleancoding
Writing clean scientific software Murphy cleancoding
 
Path dependent-development (PyCon India)
Path dependent-development (PyCon India)Path dependent-development (PyCon India)
Path dependent-development (PyCon India)
 
The Professional Programmer
The Professional ProgrammerThe Professional Programmer
The Professional Programmer
 
Jonathan Coveney: Why Pig?
Jonathan Coveney: Why Pig?Jonathan Coveney: Why Pig?
Jonathan Coveney: Why Pig?
 
Unbreaking Your Django Application
Unbreaking Your Django ApplicationUnbreaking Your Django Application
Unbreaking Your Django Application
 
Path Dependent Development (PyCon AU)
Path Dependent Development (PyCon AU)Path Dependent Development (PyCon AU)
Path Dependent Development (PyCon AU)
 
Cloud accounting software uk
Cloud accounting software ukCloud accounting software uk
Cloud accounting software uk
 
OSMC 2015 | Testing in Production by Devdas Bhagat
OSMC 2015 | Testing in Production by Devdas BhagatOSMC 2015 | Testing in Production by Devdas Bhagat
OSMC 2015 | Testing in Production by Devdas Bhagat
 
OSMC 2015: Testing in Production by Devdas Bhagat
OSMC 2015: Testing in Production by Devdas BhagatOSMC 2015: Testing in Production by Devdas Bhagat
OSMC 2015: Testing in Production by Devdas Bhagat
 
Devops, the future is here, it's just not evenly distributed yet.
Devops, the future is here, it's just not evenly distributed yet.Devops, the future is here, it's just not evenly distributed yet.
Devops, the future is here, it's just not evenly distributed yet.
 
Bringing Open-Source Practices to Your Day Job
Bringing Open-Source Practices to Your Day JobBringing Open-Source Practices to Your Day Job
Bringing Open-Source Practices to Your Day Job
 
Spaghetti gate
Spaghetti gateSpaghetti gate
Spaghetti gate
 
Services, tools & practices for a software house
Services, tools & practices for a software houseServices, tools & practices for a software house
Services, tools & practices for a software house
 
High performance json- postgre sql vs. mongodb
High performance json- postgre sql vs. mongodbHigh performance json- postgre sql vs. mongodb
High performance json- postgre sql vs. mongodb
 
What drives Innovation? Innovations And Technological Solutions for the Distr...
What drives Innovation? Innovations And Technological Solutions for the Distr...What drives Innovation? Innovations And Technological Solutions for the Distr...
What drives Innovation? Innovations And Technological Solutions for the Distr...
 
Kibana+ElasticSearch+LogStash to handle Log messages on Prod servers
Kibana+ElasticSearch+LogStash to handle Log messages on Prod serversKibana+ElasticSearch+LogStash to handle Log messages on Prod servers
Kibana+ElasticSearch+LogStash to handle Log messages on Prod servers
 

Recently uploaded

Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
APIForce Zurich 5 April Automation LPDG
APIForce Zurich 5 April  Automation LPDGAPIForce Zurich 5 April  Automation LPDG
APIForce Zurich 5 April Automation LPDGMarianaLemus7
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Wonjun Hwang
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsMiki Katsuragi
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr LapshynFwdays
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxhariprasad279825
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 

Recently uploaded (20)

Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
APIForce Zurich 5 April Automation LPDG
APIForce Zurich 5 April  Automation LPDGAPIForce Zurich 5 April  Automation LPDG
APIForce Zurich 5 April Automation LPDG
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 

Running Neo4j in Production: Tips, Tricks and Optimizations

  • 1. Running Neo4j in Production Tips, Tricks and Optimizations
  • 2. This Talk... ● How we scaled our prod graph ● Challenges faced doing this ● Various lessons we learned and techniques we used ● Some stuff I’m looking forward to in Neo4j
  • 3. SNAP Interactive ● Presented by David Fox (Big Data Engineer) ● Social dating app AYI (Are You Interested?) ● Friends and interests
  • 4. How We Use Neo4j ● Model the friend data of our millions of users ● Indicate connections everywhere on app ● 1.1+ billion nodes ● 8.5+ billion relationships ● 450gb+ store ● 3 instance cluster
  • 5. Importing lots of data ● Find the right tool o First try normal Cypher o No good? Bring out the big guns - Java Batch Inserter ● Java Batch Inserter o Sort relationships (GNU sort) o Try to keep index lookups to in-memory lookups only  Giant HashMap!
  • 6. But wait!!! ● Cypher CSV import o 2.1 M01 o Supposed to be good for importing large data sets o Anyone tried it?
  • 7. Read Querying ● Always try Cypher first o Performance is being improved ● How can you tell if performance is where you need it to be? o Time queries (cold vs. warm cache) o Load testing!
  • 8. Read Querying cont. ● Dark querying o Great for benchmarking system where Neo4j functionality is being injected o Mitigates risk o Provides results that are very close to real world patterns
  • 9. Read Querying cont. ● Reads too slow? Try these things. o Write high-throughput business-critical queries in Java  unmanaged extension  faster  hard limits o Cache shard  country, age, gender, etc.  you hit warm cache more often
  • 10. Read Querying cont. ● Warm the cache! o Touch all the nodes o Touch all the relationships
  • 11. Writing ● Decide which writes need to be synchronous and which can be asynchronous ● Queue up asynchronous writes (routine updates, non-vital to immediate user- experience) o Try to evenly distribute them o How do we do this? Baserunner!
  • 12. Baserunner ● Written by SNAP developer ● Walks userbase randomly instead of sequentially o This avoids pockets of heavily increased write queries o Allows us to do high-velocity updating of our data
  • 13. Tuning the JVM ● For a really high-throughput environment, G1 GC has been very helpful o Good at adapting itself o We experienced less system-stopping pauses than with CMS o Try CMS first but remember G1 as option
  • 14. Hardware is Important ● Lots of memory ● Working set too big for memory? o SSDs are helpful o Optimization techniques discussed become much more important
  • 15. Not Everything is Your Fault! ● Like any software, Neo4j has bugs ● Developers are receptive ● File reports on Github when you find issues
  • 16. Some stuff to look forward to... ● Relationship grouping (2.1 M01) o helps mitigate the super node/dense node problem ● Ronja (rewrite of the Cypher query language, 2.1?) ● More flexible label index searching (after 2.1)