Neo4j, Inc. All rights reserved 2022
Taming Large Databases
Ravindranatha Anthapu
Principal Consultant – Professional Services
Objectives
• What are Large Databases
• What issues are faced
◦ How to identify,
◦ What approaches can work,
◦ And how to educate customers to avoid these issues
• Understand some common mistakes (and how to avoid them!)
What are Large Databases and Why?
• How this was determined
◦ Not a technical limitation of Neo4j.
◦ Driven more by licensing and infrastructure cost.
◦ Driven by issues observed in the field.
• What are Large Databases
◦ Databases bigger than 512 GB
• Large databases are unforgiving of non-optimal models.
• Query SLAs can be hugely impacted.
Issues Observed
• Performance issues faced
◦ Data model not optimal
◦ Over-reliance on indexes
◦ Over-reliance on property-based conditional traversals
◦ Not understanding how property access works*
• Poor write performance
◦ Not understanding how locking works in Neo4j*
* Addressed in 5.0 with the new store format
Identify issues
• We will take a look at different use cases that were gathered
• Identify what the issues can be
• Review the options to address the issues
Scenario 1
Graph Data Size
● Node Store - 13 GB
● Relationships - 45 GB
● Property Store - 207 GB
● Property (Arrays) - 7 GB
● Property (Strings) - 149 GB
● Indexes - 166 GB
● Total - 587 GB
Observations
● The node and relationship stores are a normal size.
● The index store is too big. This indicates over-reliance on indexes.
● The property store is huge. This is not an issue unless properties are accessed too often; query performance depends on how the query is written.
● The string property store is also huge, which means there are a lot of string properties. Again, query performance depends on how the query is written.
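To make the proportions on this slide easier to see, here is a minimal Python sketch that computes each store's share of the total. The numbers are copied from the slide; the function name `store_shares` and the output layout are illustrative assumptions, and the script does not read anything from Neo4j itself.

```python
# Store sizes in GB, copied from the slide above.
STORE_SIZES_GB = {
    "Node Store": 13,
    "Relationships": 45,
    "Property Store": 207,
    "Property (Arrays)": 7,
    "Property (Strings)": 149,
    "Indexes": 166,
}


def store_shares(sizes_gb):
    """Return each store's share of the total size, in percent."""
    total = sum(sizes_gb.values())
    return {name: 100.0 * size / total for name, size in sizes_gb.items()}


shares = store_shares(STORE_SIZES_GB)
total_gb = sum(STORE_SIZES_GB.values())

# Print the stores from largest to smallest share.
for name, pct in sorted(shares.items(), key=lambda item: item[1], reverse=True):
    print(f"{name:<20} {STORE_SIZES_GB[name]:>4} GB  {pct:5.1f}%")
print(f"{'Total':<20} {total_gb:>4} GB")
```

On these numbers the index store alone is roughly 28% of the database, and the three property stores together account for more than half of it.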
Scenario 1 - Continued
Before: 216608 total db hits in 202 ms.
The query retrieves the same node multiple times.
// Starts from every Person and checks each one against the Profile patterns.
MATCH (p:Person)
WHERE ((p)
  <-[:HAS_ASSOCIATED_CONTENT]-(:Content)
  <-[:HAS_ACCESS_TO]-(:Profile {profileType: $profileType, profileId: toInteger($profileId)})
OR (p)
  <-[:HAS_ASSOCIATED_CONTENT]-(:SubAccount)
  <-[:OWNS]-(:Vendor)
  <-[:HAS_ACCESS_TO]-(:Profile {profileType: $profileType, profileId: toInteger($profileId)})
OR EXISTS(
  (:Vendor {id: '*'})
  <-[:HAS_ACCESS_TO]-(:Profile {profileType: $profileType, profileId: toInteger($profileId)})
))
WITH p ORDER BY p.name
WITH collect(p) AS persons, count(p) AS totalCount
RETURN persons[$offset..($offset + $limit)] AS persons, totalCount
After: 346 total db hits in 4 ms.
The query retrieves the node once using the index and then uses that node to traverse and make decisions.
// Look up the Profile once, then traverse outwards from it.
MATCH (profile:Profile {profileType: $profileType, profileId: toInteger($profileId)})
OPTIONAL MATCH (profile)-[:HAS_ACCESS_TO|HAS_ADMIN_ACCESS_TO]->(starVendor:Vendor {id: '*'})
// Both branches return a column named person so that value.person is always populated.
CALL apoc.when(
  starVendor IS NOT NULL,
  '
  MATCH (p:Person)
  RETURN p AS person
  ',
  '
  OPTIONAL MATCH (profile)-[:HAS_ACCESS_TO]->(:Content)-[:HAS_ASSOCIATED_CONTENT]->(pub1:Person)
  OPTIONAL MATCH (profile)-[:HAS_ACCESS_TO]->(:Vendor)-[:OWNS]->(:SubAccount)-[:HAS_ASSOCIATED_CONTENT]->(pub2:Person)
  WITH COALESCE(pub1, pub2) AS person
  WHERE person IS NOT NULL
  RETURN DISTINCT person
  ',
  {
    profile: profile,
    starVendor: starVendor
  }
)
YIELD value
WITH value.person AS pub ORDER BY pub.name
WITH collect(pub) AS persons, count(pub) AS totalCount
RETURN persons[0..50] AS publishers, totalCount
Scenario 1 - Continued
• Why did the original query take more time than the modified one?
• Even though Profile has an index for name and type, the query cannot leverage
that index because the Profile lookup is part of a traversal pattern.
WHERE ((p) <-[:HAS_ASSOCIATED_CONTENT]-(:Content)
<-[:HAS_ACCESS_TO]-(:Profile {profileType: $profileType, profileId: toInteger($profileId)})
• Retrieving the Profile node first and then using it as the start of the traversal
reduces the amount of work done by the DB.
• Leverage the index store and reduce property-based conditional
traversals.
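The effect of anchoring on the Profile node can be illustrated with a toy model. The Python sketch below is not Neo4j code and its `hits` counter is not the same thing as Neo4j db hits; it is a hypothetical in-memory graph (one Profile, one Content and one Person per chain) that counts adjacency reads, so the two access patterns can be compared on equal terms.

```python
from collections import defaultdict


class ToyGraph:
    """Tiny in-memory graph that counts every adjacency-list read as one hit."""

    def __init__(self):
        self.outgoing = defaultdict(list)
        self.incoming = defaultdict(list)
        self.hits = 0

    def add_edge(self, src, dst):
        self.outgoing[src].append(dst)
        self.incoming[dst].append(src)

    def expand(self, node, direction):
        self.hits += 1
        adjacency = self.outgoing if direction == "out" else self.incoming
        return adjacency.get(node, [])


def build_graph(other_profiles=200):
    """Create profile:i -> content:i -> person:i chains (illustrative data)."""
    graph = ToyGraph()
    persons = []
    for i in range(other_profiles + 1):
        graph.add_edge(f"profile:{i}", f"content:{i}")
        graph.add_edge(f"content:{i}", f"person:{i}")
        persons.append(f"person:{i}")
    return graph, persons


def person_first(graph, persons, wanted_profile):
    """Start from every person and walk back to see whether the profile is reachable."""
    found = set()
    for person in persons:
        for content in graph.expand(person, "in"):
            if wanted_profile in graph.expand(content, "in"):
                found.add(person)
    return found


def profile_first(graph, wanted_profile):
    """Start from the single profile and walk forward to its persons."""
    found = set()
    for content in graph.expand(wanted_profile, "out"):
        found.update(graph.expand(content, "out"))
    return found


graph, all_persons = build_graph()

graph.hits = 0
scan_result = person_first(graph, all_persons, "profile:0")
scan_hits = graph.hits

graph.hits = 0
anchored_result = profile_first(graph, "profile:0")
anchored_hits = graph.hits

print("person-first hits:", scan_hits)
print("profile-first hits:", anchored_hits)
```

Both functions return the same set of persons, but the person-first version does work proportional to the number of Person nodes, while the profile-first version only touches what is connected to the one profile.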
Scenario 2
• DB has 1.5 billion nodes and 4.5 billion relationships.
• DB is 2.2 TB in size.
• Each Address has incoming and outgoing transactions, with an amount moving in or out of the address.
• Queries
◦ Return the address's current balance
◦ Return the total number of transactions an address has made
Scenario 2 - Continued
• The model is simple and does not suffer from other issues.
• The query tries to calculate the Address's current balance at run time.
• This works well for addresses with a small number of transactions.
• For addresses with millions of transactions it takes a lot of time.
• The best way to address this issue is to leverage triggers (transaction
handlers) that update the Address node with the in/out flow amounts.
• * In Neo4j, transaction handlers are at the DB level, not at the node or relationship level, so care is needed not to
create more than one trigger.
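The idea of updating the Address at write time, instead of summing its transactions at read time, can be sketched outside the database. The Python below is a minimal in-memory illustration and not a Neo4j transaction handler; the `Ledger` and `Address` classes, their field names, and the use of integer amounts are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Address:
    address_id: str
    total_in: int = 0   # running sum of incoming amounts
    total_out: int = 0  # running sum of outgoing amounts
    tx_count: int = 0   # running number of transactions

    @property
    def balance(self):
        return self.total_in - self.total_out


@dataclass
class Ledger:
    addresses: dict = field(default_factory=dict)
    transactions: list = field(default_factory=list)

    def _address(self, address_id):
        return self.addresses.setdefault(address_id, Address(address_id))

    def record(self, sender, receiver, amount):
        """Store the transaction and update both addresses in the same step,
        which is the role the trigger plays in the slide."""
        self.transactions.append((sender, receiver, amount))
        source, target = self._address(sender), self._address(receiver)
        source.total_out += amount
        source.tx_count += 1
        target.total_in += amount
        target.tx_count += 1

    def balance_by_scan(self, address_id):
        """Read-time calculation: cost grows with the number of transactions."""
        balance = 0
        for sender, receiver, amount in self.transactions:
            if receiver == address_id:
                balance += amount
            if sender == address_id:
                balance -= amount
        return balance


ledger = Ledger()
ledger.record("A", "B", 50)
ledger.record("B", "C", 20)
ledger.record("A", "B", 5)

b = ledger.addresses["B"]
print(b.balance, b.tx_count, ledger.balance_by_scan("B"))
```

Reading `balance` and `tx_count` from the address is a constant-time property read, whereas `balance_by_scan` has to visit every transaction, which is the cost the slide describes for addresses with millions of transactions.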
Scenario 2 - Continued
• The given model is fine for answering the current questions.
• Is it enough to answer these future questions?
◦ For a given date range, what is the in/out flow of amounts for the address?
◦ Provide the daily summary of in/out flows for a given address.
◦ What was the activity for the last 6 months: number of transactions and in/out flow (daily and total)?
Scenario 2 - Continued
• With the updated model we get these advantages:
◦ We can answer how many transactions happened on a given day.
◦ We can answer how much was exchanged on a given day.
◦ We can answer many statistical questions using the daily summary nodes.
◦ For a given address we can report total amounts and daily amount movements, overall or for a given date range.
• All of these queries can be answered in a few milliseconds using a small amount of page cache (< 64 GB),
even for a large database like this (> 3 TB).
• For scenarios that need to look at every transaction, users can be expected to accept
a more relaxed SLA, because a large amount of data is being shown.
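As a rough illustration of why daily summaries keep these queries cheap, the Python sketch below keeps one small record per address per day and answers date-range questions from those records alone. It is an in-memory stand-in, not the deck's actual graph model: the class name `DailySummaryStore`, the field names, and the sample amounts are hypothetical.

```python
from collections import defaultdict
from datetime import date


class DailySummaryStore:
    """Keeps one summary per (address, day), updated as transactions arrive."""

    def __init__(self):
        # address -> day -> running totals for that day
        self._by_address = defaultdict(dict)

    def record(self, address, day, inflow=0, outflow=0):
        summary = self._by_address[address].setdefault(
            day, {"tx_count": 0, "inflow": 0, "outflow": 0}
        )
        summary["tx_count"] += 1
        summary["inflow"] += inflow
        summary["outflow"] += outflow

    def daily(self, address, start, end):
        """Daily summaries for one address within an inclusive date range."""
        days = self._by_address.get(address, {})
        return {d: dict(s) for d, s in sorted(days.items()) if start <= d <= end}

    def totals(self, address, start, end):
        """Totals over the range, computed from summaries only."""
        result = {"tx_count": 0, "inflow": 0, "outflow": 0}
        for summary in self.daily(address, start, end).values():
            for key in result:
                result[key] += summary[key]
        return result


store = DailySummaryStore()
store.record("addr-1", date(2022, 1, 1), inflow=100)
store.record("addr-1", date(2022, 1, 1), outflow=30)
store.record("addr-1", date(2022, 1, 2), inflow=5)
store.record("addr-1", date(2022, 2, 1), outflow=50)

january = store.totals("addr-1", date(2022, 1, 1), date(2022, 1, 31))
print(january)
```

The work per query depends on the number of days in the range rather than on the number of transactions, which is why the working set stays small even when the underlying database is large.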
Scenario 3 – Product Recommendation
• DB is 1 TB in size.
• Around 1000 views/second are added.
• Queries
◦ On visiting a page, show related products as recommendations for the last 5 products visited.
Scenario 3 - Continued
• The model is simple and straightforward.
• The ingestion rate can be impacted for the most popular products due to locking.
• If the Browser has a lot of page views, getting product recommendations can
take time.
• It also requires more page cache, as we need to retrieve all the views the
browser is associated with and sort them to get the latest views and the
products associated with them.
• As the DB grows, it would require more page cache to be able to answer
the questions quickly.
Scenario 3 - Continued
• By introducing the StoreProduct node we reduce the locking pressure on the
Product node. This improves write performance.
• Another change is introducing the LATEST relationship, which acts as a
pointer to the latest Product view.
• We also connect the Product views to each other using the PREV relationship.
• Using the LATEST and PREV relationships we can traverse the views in the order
they were created, without relying on reading properties and sorting them. This
reduces the pressure on the page cache and the property store.
• Creating a Java stored procedure to answer the query (traversing only the
required steps) can keep query performance more constant even as the
database grows. (It reduces the working data set, so the page cache is used
more effectively.)
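The LATEST and PREV pattern is essentially a linked list that is prepended to on every write. The Python sketch below shows that idea in memory under stated assumptions: `Browser`, `View`, and `last_products` are hypothetical names, the sketch ignores StoreProduct and locking entirely, and it is not the Java stored procedure the slide mentions.

```python
class View:
    """One product view; prev points at the view that came before it."""

    def __init__(self, product, prev=None):
        self.product = product
        self.prev = prev


class Browser:
    """Holds a single pointer to the most recent view (the LATEST role)."""

    def __init__(self):
        self.latest = None

    def add_view(self, product):
        # The new view becomes LATEST and links back to the old LATEST via prev.
        self.latest = View(product, prev=self.latest)

    def last_products(self, limit=5):
        """Walk back from the latest view, stopping after `limit` steps.

        No timestamps are read and nothing is sorted; repeated products
        are returned as they occur.
        """
        products = []
        view = self.latest
        while view is not None and len(products) < limit:
            products.append(view.product)
            view = view.prev
        return products


browser = Browser()
for n in range(1, 8):
    browser.add_view(f"product-{n}")

recent = browser.last_products(5)
print(recent)
```

However many views the browser has accumulated, the read touches at most `limit` of them, which matches the slide's point about keeping the working data set small.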
Summary
• Review the DB store sizes to see how the DB is growing. These values
are available in the debug.log.
• Check indexes and understand how they are being used.
• Lookup, collect, sort and filter operations are costlier than traversals.
• Review queries for conditional property traversals and see if they can be
avoided using other traversal patterns.
• If a query follows the pattern of traversing a path until a condition is satisfied,
leveraging stored procedures might help reduce the pressure on the
page cache, giving more consistent performance.
• Think outside the box: for example, maintain aggregated data as the data builds up
if those attributes are the most frequently accessed.
Thank you!
Questions?
Answers!
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 

Taming Large Databases

  • 1. Taming Large Databases
    Ravindranatha Anthapu, Principal Consultant – Professional Services
    Neo4j, Inc. All rights reserved 2022
  • 2. Objectives
    • What are large databases?
    • What issues are faced?
      ◦ How to identify them
      ◦ What approaches can work
      ◦ How to educate customers to avoid these issues
    • Understand some common mistakes (and how to avoid them!)
  • 3. What are Large Databases and Why?
    • How this was determined
    • Not a technical limitation of Neo4j.
    • Driven more by licensing and infrastructure cost.
    • Driven by issues observed in the field.
    • What are large databases?
      ◦ Databases bigger than 512 GB
    • Large databases are unforgiving of non-optimal models.
    • Query SLAs can be hugely impacted.
  • 4. Issues Observed
    • Performance issues faced
    • Data model not optimal
    • Over-reliance on indexes
    • Over-reliance on property-based conditional traversals
    • Not understanding how property access works*
    • Bad write performance
    • Not understanding how locking works in Neo4j*
    * Addressed in 5.0 with the new store format
  • 5. Identify issues
    • We will look at different use cases gathered
    • Identify what the issues can be
    • Review the options to address the issues
  • 6. Scenario 1
    Graph Data Size
    ● Node Store - 13 GB
    ● Relationships - 45 GB
    ● Property Store - 207 GB
    ● Property (Arrays) - 7 GB
    ● Property (Strings) - 149 GB
    ● Indexes - 166 GB
    ● Total - 587 GB
    • Node and relationship stores are normal.
    • Index store size is too big. This indicates over-reliance on indexes.
    • Property store is huge. This is not an issue unless the properties are accessed too often. Query performance depends on how the query is written.
    • String property store is also huge. This means there are a lot of string properties. Query performance depends on how the query is written.
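The observations on the Scenario 1 slide can be checked with simple arithmetic. The following is a minimal sketch that takes the store sizes from the slide and computes each group's share of the total; the variable and function names are illustrative assumptions, not anything reported by Neo4j itself.

```python
# Store sizes in GB, copied from the Scenario 1 slide.
store_sizes_gb = {
    "Node Store": 13,
    "Relationships": 45,
    "Property Store": 207,
    "Property (Arrays)": 7,
    "Property (Strings)": 149,
    "Indexes": 166,
}

# The individual stores add up to the 587 GB total shown on the slide.
total_gb = sum(store_sizes_gb.values())


def share(names):
    """Return the combined size of the named stores as a percentage of the total."""
    return 100.0 * sum(store_sizes_gb[name] for name in names) / total_gb


index_share = share(["Indexes"])
property_share = share(["Property Store", "Property (Arrays)", "Property (Strings)"])
structure_share = share(["Node Store", "Relationships"])

for label, pct in [
    ("indexes", index_share),
    ("all property stores", property_share),
    ("nodes + relationships", structure_share),
]:
    print(f"{label}: {pct:.1f}% of {total_gb} GB")
```

Under these numbers the indexes alone take more than a quarter of the database, and the property stores well over half, while the graph structure itself (nodes and relationships) is roughly a tenth.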
  • 7. Scenario 1 - Continued
    Before: 216608 total db hits in 202 ms. The query retrieves the same node multiple times.

        // Every Person is tested against patterns that each look up the Profile again.
        MATCH (p:Person)
        WHERE (
          (p)<-[:HAS_ASSOCIATED_CONTENT]-(:Content)<-[:HAS_ACCESS_TO]-(:Profile {profileType: $profileType, profileId: toInteger($profileId)})
          OR (p)<-[:HAS_ASSOCIATED_CONTENT]-(:SubAccount)<-[:OWNS]-(:Vendor)<-[:HAS_ACCESS_TO]-(:Profile {profileType: $profileType, profileId: toInteger($profileId)})
          OR EXISTS(
            (:Vendor {id: '*'})<-[:HAS_ACCESS_TO]-(:Profile {profileType: $profileType, profileId: toInteger($profileId)})
          )
        )
        WITH p ORDER BY p.name
        WITH collect(p) AS persons, count(p) AS totalCount
        RETURN persons[$offset..($offset + $limit)] AS persons, totalCount

    After: 346 total db hits in 4 ms. The query retrieves the Profile node once using the index and uses that node to traverse and make decisions.

        // Anchor on the Profile once, then traverse outwards from it.
        MATCH (profile:Profile {profileType: $profileType, profileId: toInteger($profileId)})
        OPTIONAL MATCH (profile)-[:HAS_ACCESS_TO|HAS_ADMIN_ACCESS_TO]->(starVendor:Vendor {id: '*'})
        CALL apoc.when(
          starVendor IS NOT NULL,
          '
            MATCH (p:Person)
            RETURN p AS person
          ',
          '
            OPTIONAL MATCH (profile)-[:HAS_ACCESS_TO]->(:Content)-[:HAS_ASSOCIATED_CONTENT]->(pub1:Person)
            OPTIONAL MATCH (profile)-[:HAS_ACCESS_TO]->(:Vendor)-[:OWNS]->(:SubAccount)-[:HAS_ASSOCIATED_CONTENT]->(pub2:Person)
            WITH COALESCE(pub1, pub2) AS person
            WHERE person IS NOT NULL
            RETURN DISTINCT person
          ',
          { profile: profile, starVendor: starVendor }
        ) YIELD value
        // Both branches return the column "person", so value.person is always set.
        WITH value.person AS pub
        ORDER BY pub.name
        WITH collect(pub) AS persons, count(pub) AS totalCount
        RETURN persons[0..50] AS publishers, totalCount

  • 8. Scenario 1 - Continued
    • Why did the first query take more time than the modified one?
    • Even though Profile has an index for name and type, the query cannot leverage the index because the Profile is matched as part of a traversal:

        WHERE ((p)<-[:HAS_ASSOCIATED_CONTENT]-(:Content)<-[:HAS_ACCESS_TO]-(:Profile {profileType: $profileType, profileId: toInteger($profileId)})

    • Retrieving the Profile node first and then using it as the start of the traversal reduces the amount of work done by the DB.
    • Leverage the index store and reduce property-based conditional traversals.
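The difference between the two Scenario 1 queries can be illustrated with a toy in-memory graph. This is only a sketch under stated assumptions: the data is made up, the `visits` counter is a stand-in for work done and does not reproduce how Neo4j counts db hits, and the function names `person_first` and `profile_first` are hypothetical.

```python
from collections import defaultdict

# Toy graph: adjacency lists keyed by (node, relationship type).
out_edges = defaultdict(list)
in_edges = defaultdict(list)


def link(src, rel, dst):
    """Create a directed relationship src -[rel]-> dst."""
    out_edges[(src, rel)].append(dst)
    in_edges[(dst, rel)].append(src)


# Hypothetical data: one profile with access to two content items, each
# associated with one person, plus 1000 persons unrelated to the profile.
link("profile:1", "HAS_ACCESS_TO", "content:1")
link("profile:1", "HAS_ACCESS_TO", "content:2")
link("content:1", "HAS_ASSOCIATED_CONTENT", "person:1")
link("content:2", "HAS_ASSOCIATED_CONTENT", "person:2")
persons = [f"person:{i}" for i in range(1, 1003)]


def person_first(profile):
    """Scan every person and walk backwards, checking for the profile each time."""
    visits, found = 0, set()
    for person in persons:
        visits += 1
        for content in in_edges[(person, "HAS_ASSOCIATED_CONTENT")]:
            visits += 1
            for candidate in in_edges[(content, "HAS_ACCESS_TO")]:
                visits += 1
                if candidate == profile:
                    found.add(person)
    return found, visits


def profile_first(profile):
    """Look up the profile once (the anchor), then traverse outwards from it."""
    visits, found = 1, set()
    for content in out_edges[(profile, "HAS_ACCESS_TO")]:
        visits += 1
        for person in out_edges[(content, "HAS_ASSOCIATED_CONTENT")]:
            visits += 1
            found.add(person)
    return found, visits


slow_result, slow_visits = person_first("profile:1")
fast_result, fast_visits = profile_first("profile:1")
print(slow_visits, fast_visits)
```

Both functions return the same two persons, but the person-first scan touches every person in the graph, while the profile-first traversal only touches what is reachable from the anchor node.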
  • 9. Scenario 2
    • DB has 1.5 billion nodes and 4.5 billion relationships.
    • DB is 2.2 TB in size.
    • An Address has incoming and outgoing transactions, with amounts moving in to or out of the address.
    • Queries
      ◦ Return the address's current balance
      ◦ Return the total number of transactions an address has made
  • 10. Scenario 2 - Continued
    • Model is simple and does not suffer from other issues.
    • Query tries to calculate the Address's current balance at run time.
    • Works well for addresses with a small number of transactions.
    • For addresses with millions of transactions it takes a lot of time.
    • The best way to address this issue is to leverage triggers (transaction handlers) that update the Address node with the in/out flow amounts.
    • Note: in Neo4j, transaction handlers are registered at the DB level, not at the node or relationship level, so be careful not to create more than one trigger.
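The trigger idea from Scenario 2 can be sketched outside the database. The following is a minimal illustration, assuming a hypothetical `Ledger` class whose `_after_commit` method plays the role of a single database-level transaction handler; it is not Neo4j's trigger API, and amounts are plain integers for simplicity.

```python
from dataclasses import dataclass


@dataclass
class Address:
    """An address node carrying running totals maintained at write time."""
    address_id: str
    total_in: int = 0
    total_out: int = 0
    tx_count: int = 0

    @property
    def balance(self):
        return self.total_in - self.total_out


class Ledger:
    def __init__(self):
        self.addresses = {}
        self.transactions = []  # (sender, receiver, amount)

    def _address(self, address_id):
        return self.addresses.setdefault(address_id, Address(address_id))

    def commit(self, sender, receiver, amount):
        """Store the transaction, then run the single after-commit handler."""
        self.transactions.append((sender, receiver, amount))
        self._after_commit(sender, receiver, amount)

    def _after_commit(self, sender, receiver, amount):
        # Stand-in for a trigger: update the aggregates on both addresses.
        src, dst = self._address(sender), self._address(receiver)
        src.total_out += amount
        src.tx_count += 1
        dst.total_in += amount
        dst.tx_count += 1

    def balance_by_scan(self, address_id):
        """The run-time calculation: cost grows with the number of transactions."""
        incoming = sum(a for _, r, a in self.transactions if r == address_id)
        outgoing = sum(a for s, _, a in self.transactions if s == address_id)
        return incoming - outgoing


ledger = Ledger()
ledger.commit("a", "b", 100)
ledger.commit("b", "a", 30)
ledger.commit("a", "c", 20)
print(ledger.addresses["a"].balance, ledger.addresses["a"].tx_count)
```

Reading `balance` or `tx_count` from the address is a constant-time property read, whereas `balance_by_scan` has to revisit every transaction. The trade-off is a little extra work on every write.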
  • 11. Scenario 2 - Continued
    • The given model is fine for answering the current questions.
    • Is it enough to answer these future questions?
      ◦ For a given date range, what is the in/out flow of amounts for the address?
      ◦ Provide the daily summary of in/out flows for a given address.
      ◦ What was the activity for the last 6 months?
        ▪ Number of transactions
        ▪ In/out flow (daily and total)
  • 12. Scenario 2 - Continued (diagram-only slide; no text captured)
  • 13. Scenario 2 - Continued
    • With the updated model we get these advantages:
      ◦ We can answer how many transactions happened on a given day.
      ◦ We can answer how much was exchanged on a given day.
      ◦ We can answer a lot of statistical questions using the daily summary nodes.
      ◦ For a given address we can answer total amount details and daily amount movement details, as well as the same for a given date range.
    • All of these queries can be answered in a few ms using a small amount of page cache (< 64 GB), even for a large database like this (> 3 TB).
    • For scenarios that need to look at each individual transaction, users can be expected to accept a more relaxed SLA, since a large amount of data is being shown.
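The daily summary nodes described for the updated Scenario 2 model can be sketched as a small aggregation keyed by address and day. This is an illustrative assumption about the shape of the summaries (transaction count plus in and out amounts); the slide's diagram is not available in the transcript, and the names `DailySummary`, `record` and `range_summary` are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class DailySummary:
    """Aggregates for one address on one day."""
    tx_count: int = 0
    amount_in: int = 0
    amount_out: int = 0


# (address, day) -> DailySummary, updated as transactions arrive.
summaries = defaultdict(DailySummary)


def record(sender, receiver, amount, day):
    """Update the daily summaries of both addresses for one transaction."""
    outgoing = summaries[(sender, day)]
    outgoing.tx_count += 1
    outgoing.amount_out += amount
    incoming = summaries[(receiver, day)]
    incoming.tx_count += 1
    incoming.amount_in += amount


def range_summary(address, start, end):
    """Combine the daily summaries of an address over an inclusive date range."""
    total = DailySummary()
    day = start
    while day <= end:
        daily = summaries.get((address, day))  # .get avoids creating empty entries
        if daily is not None:
            total.tx_count += daily.tx_count
            total.amount_in += daily.amount_in
            total.amount_out += daily.amount_out
        day += timedelta(days=1)
    return total


record("a", "b", 100, date(2022, 1, 1))
record("b", "a", 30, date(2022, 1, 2))
record("a", "c", 20, date(2022, 1, 2))
record("c", "a", 5, date(2022, 2, 1))

january_start = range_summary("a", date(2022, 1, 1), date(2022, 1, 2))
print(january_start)
```

A date-range question touches at most one summary per day in the range, regardless of how many individual transactions the address has, which is why the working set stays small.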
  • 14. Scenario 3 – Product Recommendation
    • DB is 1 TB in size.
    • Around 1000 views/second are added.
    • Queries
      ◦ On visiting a page, show related products as recommendations for the last 5 products visited.
  • 15. Scenario 3 - Continued
    • Model is simple and straightforward.
    • Ingestion rate can be impacted for the most popular products due to locking.
    • If the Browser has a lot of page views, getting product recommendations can take time.
    • It also requires more page cache, as we need to retrieve all the views the browser is associated with and sort them to get the latest views and the products associated with them.
    • As the DB grows, it would require more page cache to answer the questions quickly.
  • 16. Scenario 3 – Continued (diagram-only slide; no text captured)
  • 17. Scenario 3 - Continued
    • By introducing the StoreProduct node we reduce the locking pressure on the Product node. This improves write performance.
    • Another change is introducing the LATEST relationship, which acts as a pointer to the latest Product view.
    • We also connect the Product views using the PREV relationship.
    • Using the LATEST and PREV relationships we can traverse the views in the order they were created, without reading properties and sorting them. This reduces the pressure on the page cache and the property store.
    • Creating a Java stored procedure to answer the query (traversing only the required steps) can keep query performance more constant even as the database grows. (Reduce the working data set so that the page cache is used more effectively.)
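The LATEST and PREV relationships from Scenario 3 form a linked list of views. The sketch below models that idea in plain Python, assuming a hypothetical `Browser` object holding the LATEST pointer and `View` objects chained by PREV; it returns the last five views rather than the last five distinct products, which is a simplification.

```python
from dataclasses import dataclass
from itertools import islice
from typing import Iterator, Optional


@dataclass
class View:
    """One product view; `prev` stands in for the PREV relationship."""
    product: str
    prev: Optional["View"] = None


class Browser:
    def __init__(self):
        # `latest` stands in for the LATEST relationship.
        self.latest: Optional[View] = None

    def add_view(self, product: str) -> None:
        """Link the new view to the previous one and move the LATEST pointer."""
        self.latest = View(product, prev=self.latest)

    def recent_views(self) -> Iterator[View]:
        """Walk the chain newest-first without sorting anything."""
        node = self.latest
        while node is not None:
            yield node
            node = node.prev

    def last_products(self, n: int = 5) -> list:
        """Stop after n steps, however long the full history is."""
        return [view.product for view in islice(self.recent_views(), n)]


browser = Browser()
for i in range(10):
    browser.add_view(f"p{i}")

print(browser.last_products())
```

Because the traversal stops after five hops, the cost of the lookup does not depend on how many views the browser has accumulated, which mirrors the "traverse only the required steps" point about the stored procedure.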
  • 18. Summary
    • Review the DB store sizes to see how the DB is growing. These values are available in the debug.log.
    • Check indexes and understand how they are being used.
    • Lookup/collect/sort and filter are costlier than traversals.
    • Review queries for conditional property traversals and see if they can be avoided using other traversal patterns.
    • If the pattern is to traverse a path until a condition is satisfied, stored procedures might help reduce the pressure on the page cache, giving more consistent performance.
  • 19. Summary
    • Think outside the box, for example by maintaining aggregated data as the data builds up, if those attributes are the most frequently accessed.
  • 20. Thank you! Questions? Answers!

Editor's Notes

  1. Databases greater than 512 GB can be considered large databases. This is driven mainly by how customers are using Neo4j rather than by any other aspect. The label is based mainly on license and infrastructure cost: in my experience, most customers use instances with around 256 GB of RAM for databases of this size. Applications with continuous data ingestion can make it difficult to get a consistent backup, and a consistency check can be very time-consuming. Points to consider: using separate servers for backup, and, if a backup is not consistent, what the options are to clean it for recovery purposes.