Neo4j
After 1 year in production
with Andrey Nikishaev
What we will talk about today
Neo4j internals
Cypher - query language
Developing extensions
Neo4j in production
Conclusion
Data
Properties
Linked lists of property records, each holding one key:value pair.
Node
Refers to its first Property & the first relationship in its
relationship chain.
Relationship
Refers to its first Property & its Start and End Nodes.
It also refers to the Prev/Next Relationship of its
Start/End Nodes.
All data in Neo4j is stored as linked lists of fixed-size records.
● ID lookup = O(1)
● It's great at localized searches. E.g. to get the
people you follow.
● It's not great at aggregation. E.g. the nodes or
relationships aren't stored in any sorted order,
so deriving the 20 most popular users
requires a full scan.
● It suffers from the "supernode problem". At
least currently, a node's neighboring
relationships are stored as a flat list, so if you
have a million followers, fetching even one
person you follow is slow.
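The O(1) ID lookup follows from the fixed record size: a record's position in its store is simply its ID multiplied by the record size. A minimal sketch, assuming a hypothetical 15-byte record and an in-memory buffer standing in for the store file (the class, the method names, and the record layout are illustrative, not Neo4j's code):

```java
import java.nio.ByteBuffer;

// Illustrative only: models a store of fixed-size records in memory.
final class FixedRecordStore {
    private final int recordSize;
    private final ByteBuffer data;

    FixedRecordStore(int recordSize, int capacity) {
        this.recordSize = recordSize;
        this.data = ByteBuffer.allocate(recordSize * capacity);
    }

    // One multiplication, so the lookup cost does not depend on the store size.
    long offsetOf(long id) {
        return id * recordSize;
    }

    // Hypothetical layout: the first 8 bytes of a record hold the "first relationship" pointer.
    void putFirstRel(long id, long firstRelId) {
        data.putLong((int) offsetOf(id), firstRelId);
    }

    long getFirstRel(long id) {
        return data.getLong((int) offsetOf(id));
    }

    public static void main(String[] args) {
        FixedRecordStore store = new FixedRecordStore(15, 100);
        store.putFirstRel(7, 42L);
        System.out.println(store.offsetOf(7) + " -> " + store.getFirstRel(7));
    }
}
```

Finding one particular relationship of a node is a different story: the chain has to be walked pointer by pointer, which is where the supernode cost comes from.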
Caching
File Cache
Blocks of the same size.
Maps blocks into memory with OS mmap.
Evicts data by LFU policy (hits vs misses).
Object Cache (removed in v2.3+)
Keeps serialized data in memory to speed up queries.
No eviction policy (can eat all your memory).
Evicted only on transaction log sync (HA) or data deletion.
To use it, warm it up with a query like this:
MATCH (n)
OPTIONAL MATCH (n)-[r]->()
RETURN count(n.prop) + count(r.prop);
Transactions
The Tx is kept as context in a thread-local object.
1. Gather the list of commands.
2. Sort the commands (predictable execution order).
3. Write the commands to the Tx log.
4. Mark the Tx in the log as finished.
5. Write to the DB.
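The commit sequence above can be sketched as a toy write-ahead log. This is an assumption-laden illustration, not Neo4j's implementation: `ToyTx`, `Command`, the string log format, and the in-memory map standing in for the store are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative only: a toy write-ahead commit.
final class ToyTx {
    // A command sets one value on one record, identified by its ID.
    static final class Command {
        final long recordId;
        final String value;

        Command(long recordId, String value) {
            this.recordId = recordId;
            this.value = value;
        }
    }

    final List<String> txLog = new ArrayList<>();     // stands in for the Tx log file
    final Map<Long, String> store = new TreeMap<>();  // stands in for the DB store

    void commit(long txId, List<Command> commands) {
        // Gather the commands and sort them for a predictable execution order.
        List<Command> sorted = new ArrayList<>(commands);
        sorted.sort(Comparator.comparingLong(c -> c.recordId));
        // Write the commands to the Tx log.
        for (Command c : sorted) {
            txLog.add("tx=" + txId + " set " + c.recordId + "=" + c.value);
        }
        // Mark the Tx as finished in the log.
        txLog.add("tx=" + txId + " done");
        // Only now apply the changes to the store.
        for (Command c : sorted) {
            store.put(c.recordId, c.value);
        }
    }

    public static void main(String[] args) {
        ToyTx tx = new ToyTx();
        tx.commit(1L, Arrays.asList(new Command(5, "b"), new Command(2, "a")));
        tx.txLog.forEach(System.out::println);
    }
}
```

Because the log entry is complete before the store is touched, a crash between the two steps can be repaired by replaying the log.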
HA
Master-slave replication only
● Slaves sync at a configurable interval.
● All writes go through the master. Writes sent to a slave are slower.
● Same Node/Rel IDs on all servers.
● Writes need a quorum; without one the cluster is read-only.
● IDs are allocated in blocks.
● The master is elected by these rules:
○ Highest Tx ID.
○ If multiple: the instance that was master for this Tx.
○ If unavailable: the instance with the lowest clock value.
○ If multiple: the instance with the lowest ID.
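The election rules read naturally as a chain of tie-breakers. A minimal sketch, assuming each candidate exposes the four values the rules mention (the `Instance` fields and the `MasterElection` class are hypothetical, not part of Neo4j's API):

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Illustrative only: the election rules as a comparator chain.
final class MasterElection {
    static final class Instance {
        final int serverId;
        final long lastTxId;
        final boolean wasMasterForLastTx;
        final long clock;

        Instance(int serverId, long lastTxId, boolean wasMasterForLastTx, long clock) {
            this.serverId = serverId;
            this.lastTxId = lastTxId;
            this.wasMasterForLastTx = wasMasterForLastTx;
            this.clock = clock;
        }
    }

    // The preferred candidate sorts first.
    static final Comparator<Instance> PREFERENCE =
            Comparator.comparingLong((Instance i) -> i.lastTxId).reversed()  // highest Tx ID
                    .thenComparing((Instance i) -> !i.wasMasterForLastTx)    // former master first (false < true)
                    .thenComparingLong(i -> i.clock)                         // lowest clock value
                    .thenComparingInt(i -> i.serverId);                      // lowest ID

    static Instance elect(List<Instance> candidates) {
        return candidates.stream()
                .min(PREFERENCE)
                .orElseThrow(() -> new IllegalArgumentException("no candidates"));
    }

    public static void main(String[] args) {
        Instance a = new Instance(1, 100L, false, 5L);
        Instance b = new Instance(2, 100L, true, 9L);
        Instance c = new Instance(3, 90L, true, 1L);
        System.out.println("elected server " + elect(Arrays.asList(a, b, c)).serverId);
    }
}
```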
Cypher
MATCH (girl:Girl)
WHERE girl.age > 18 AND girl.age < 25
  AND (
    NOT (girl)-[:HAS_BOYFRIEND]->(:Guy)
    OR NOT (girl)-[:HAS_BOYFRIEND]->(:Guy)-[:ENGAGED_IN]->(:Gym)
  )
RETURN girl
ORDER BY girl.age ASC
Cypher
No query watcher
You have to control every query that goes to the server, because a single query can kill it.
Read all data first
When you touch properties (extend operation), the data gets cached in memory; if it does not fit there,
the query will crash (or even the server). Even MATCH (n) DELETE n will fail if you have many nodes.
Locking
Running an update query doesn’t mean that you hold an update lock, even in a transaction.
FAIL:
MATCH (n:Node)
SET n.count = n.count + 1
PASS:
MATCH (n:Node)
SET n._lock = true
SET n.count = n.count + 1
More about this at: http://goo.gl/Cy3MEU
Cypher
You can try it on real data for free here: https://neo4j.com/sandbox-v2/
Similarity example.
Dataset used: recommendations, 32,314 nodes, 332,622 relationships
Top 25 similar users:
MATCH
(u1:User)-[:RATED]->(:Movie)<-[:RATED]-(u2:User)
RETURN [u1.name, u2.name] AS pairs, count(*) AS cnt
ORDER BY cnt DESC
LIMIT 25
Run time: 16,366 ms. Number of pairs: 6,246,674
Most queries will not work without warming up.
Use indexes as much as possible.
Cypher
> Sushi restaurants in New York that my friends like.
MATCH (person:Person)-[:IS_FRIEND_OF]->(friend),
(friend)-[:LIKES]->(restaurant:Restaurant),
(restaurant)-[:LOCATED_IN]->(loc:Location),
(restaurant)-[:SERVES]->(type:Cuisine)
WHERE person.name = 'Philip'
AND loc.location = 'New York'
AND type.cuisine = 'Sushi'
RETURN restaurant.name, count(*) AS occurrence
ORDER BY occurrence DESC
LIMIT 5
https://neo4j.com/developer/guide-build-a-recommendation-engine/
Developing extensions
User-Defined Procedures & Functions
Same as in SQL DBs
Unmanaged server extensions
Extensions that can add a new API for working with Neo4j. You can even create
a new dashboard.
Server plugins
Extensions that can only extend the Neo4j core API.
Kernel extensions
Here you can do almost anything.
https://github.com/creotiv/neo4j-kernel-plugin-example
User-Defined Procedures & Functions (v3.0+ only)
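import java.util.List;
import org.neo4j.procedure.Description;
import org.neo4j.procedure.Name;
import org.neo4j.procedure.UserFunction;

public class Join
{
    @UserFunction
    @Description("example.join(['s1','s2',...], delimiter) - join the given strings with the given delimiter.")
    public String join(
            @Name("strings") List<String> strings,
            @Name(value = "delimiter", defaultValue = ",") String delimiter) {
        // Null in, null out, as Cypher functions usually behave.
        if (strings == null || delimiter == null) {
            return null;
        }
        return String.join(delimiter, strings);
    }
}
Calling (the prefix is the package of the class; this call assumes org.neo4j.examples):
MATCH (p:Person)
WHERE p.age = 36
RETURN org.neo4j.examples.join(collect(p.names))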
Unmanaged extensions
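import javax.ws.rs.GET;
import javax.ws.rs.Path;
import javax.ws.rs.PathParam;
import javax.ws.rs.Produces;
import javax.ws.rs.core.Context;
import javax.ws.rs.core.MediaType;
import javax.ws.rs.core.Response;
import javax.ws.rs.core.Response.Status;
import org.neo4j.graphdb.GraphDatabaseService;
import org.neo4j.string.UTF8;

@Path("/helloworld")
public class HelloWorldResource {
    private final GraphDatabaseService database;
    // The server injects the database through the JAX-RS context.
    public HelloWorldResource(@Context GraphDatabaseService database) {
        this.database = database;
    }
    @GET
    @Produces(MediaType.TEXT_PLAIN)
    @Path("/{nodeId}")
    public Response hello(@PathParam("nodeId") long nodeId) {
        return Response.status(Status.OK).entity(
                UTF8.encode("Hello World, nodeId=" + nodeId)).build();
    }
}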
Kernel extensions - Factory
public class ExampleKernelExtensionFactory
        extends KernelExtensionFactory<ExampleKernelExtensionFactory.Dependencies> {

    public static abstract class ExampleSettings {
        public static Setting<Boolean> debug =
                setting("examplekernelextension.debug", BOOLEAN, Settings.FALSE);
    }

    public ExampleKernelExtensionFactory() {
        super(SERVICE_NAME);
    }

    @Override
    public Lifecycle newKernelExtension(Dependencies dependencies) throws Throwable {
        Config config = dependencies.getConfig();
        return new ExampleExtension(
                dependencies.getGraphDatabaseService(),
                config.get(ExampleSettings.debug), ...);
    }

    public interface Dependencies {
        GraphDatabaseService getGraphDatabaseService();
        Config getConfig();
    }
}
Kernel extensions - Extension
public class ExampleExtension implements Lifecycle {
    ...
    public ExampleExtension(GraphDatabaseService gds, Boolean debug, String somevar) {
        this.gds = gds;
        this.debug = debug;
        this.somevar = somevar;
    }

    @Override
    public void init() throws Throwable {
        handler = new ExampleEventHandler(gds, debug, somevar);
        gds.registerTransactionEventHandler(handler);
    }

    ... Start/Stop methods ...

    @Override
    public void shutdown() throws Throwable {
        gds.unregisterTransactionEventHandler(handler);
    }
}
Kernel extensions - Event Handler
class ExampleEventHandler implements TransactionEventHandler<String> {
    ...
    @Override
    public String beforeCommit(TransactionData transactionData) throws Exception {
        updateConstraints();
        return prepareCreatedNodes(transactionData);
    }

    @Override
    public void afterCommit(TransactionData transactionData, String result) {
        processCreatedNodes(result);
    }

    @Override
    public void afterRollback(TransactionData transactionData, String result) {
        error("Something bad happened, Harry: " + result);
    }
}
Kernel extensions - Event Handler
Problems
beforeCommit (which should run before the DB is changed)
You can’t access the properties, labels, or relationships of deleted nodes, because they are already
deleted. Strange, but true. You need to gather them from the event data.
afterCommit (which should run after the transaction is committed and closed)
It is executed while the transaction is still open, which leads to a deadlock (with no message or exception) if
you try to update your local DB.
Local DB
- Bad API.
- You can’t access the HA status of the local server; you need to send requests through the REST API.
- No way to access the user request.
- Plugins can conflict with each other and cause deadlocks.
Neo4j in Production
Neo4j in Production - Cache-Based Sharding
[Diagram: a Router in front of Cache A, Cache B, and Cache C]
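One way to read the diagram: every instance can answer every query, but the router always sends requests for the same key to the same instance, so each instance keeps its own slice of the graph warm in cache. A minimal sketch, assuming routing by the hash of a request key such as a user ID (the `CacheShardRouter` class, the key choice, and the instance names are hypothetical):

```java
import java.util.Arrays;
import java.util.List;

// Illustrative only: sticky routing so that each instance caches its own slice.
final class CacheShardRouter {
    private final List<String> instances;

    CacheShardRouter(List<String> instances) {
        if (instances.isEmpty()) {
            throw new IllegalArgumentException("at least one instance is required");
        }
        this.instances = instances;
    }

    // The same key always maps to the same instance while the instance list is unchanged.
    String route(String routingKey) {
        int bucket = Math.floorMod(routingKey.hashCode(), instances.size());
        return instances.get(bucket);
    }

    public static void main(String[] args) {
        CacheShardRouter router =
                new CacheShardRouter(Arrays.asList("cache-a", "cache-b", "cache-c"));
        System.out.println("user:42 -> " + router.route("user:42"));
    }
}
```

Plain modulo routing reshuffles most keys when an instance is added or removed; consistent hashing would limit that, at the cost of a more involved router.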
Neo4j in Production - Settings
Log slow queries
dbms.querylog.enabled=true
dbms.querylog.threshold=4s
Logical logs for debug
keep_logical_logs=7 days
Enable online backup
online_backup_enabled=true
online_backup_server=127.0.0.1:6362
Number of threads (for concurrent access)
org.neo4j.server.webserver.maxthreads=64
(default: number of CPUs)
Memory used for page cache
dbms.pagecache.memory=2g
Interval for pulling updates from master
ha.pull_interval=10 (seconds)
Without timeout replication
Number of slaves to which a Tx is pushed
upon commit on master (optimistic: the Tx can be marked
successful even if some pushes failed)
ha.tx_push_factor=1
Push strategy
"fixed" pushes Txs based on server ID order.
ha.tx_push_strategy=fixed|round_robin
Master to slave communication chunk size
ha.com_chunk_size=2M
Maximum number of connections a slave can have
to the master
ha.max_concurrent_channels_per_slave=20
http://neo4j.com/docs/stable/ha-configuration.html
Neo4j in Production - Performance
Use SSD
It is much cheaper than 16-32 GB of RAM.
I/O tuning
Disable file and directory access time updates.
Set the deadline scheduler for disk operations. This will increase read
speed but decrease write speed.
$ echo 'deadline' > /sys/block/sda/queue/scheduler
$ cat /sys/block/sda/queue/scheduler
Memory tuning
Set dbms.pagecache.memory to the size of the *store*.db files +
20-40% for growth.
Leave some memory for the OS:
OS Memory = 1GB + (size of graph.db/index) + (size of
graph.db/schema)
If you see swapping, increase the OS memory size.
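The two sizing rules are plain arithmetic. A minimal sketch, assuming you already know the on-disk sizes in bytes (the `MemorySizing` class and its method names are hypothetical helpers, not a Neo4j tool):

```java
// Illustrative only: the sizing rules from the slide as arithmetic.
final class MemorySizing {
    static final long GB = 1024L * 1024L * 1024L;

    // Page cache: size of the store files plus a growth margin (20-40% per the slide).
    static long pageCacheBytes(long storeBytes, int growthPercent) {
        return storeBytes + storeBytes * growthPercent / 100;
    }

    // OS reserve: 1 GB + size of graph.db/index + size of graph.db/schema.
    static long osReserveBytes(long indexBytes, long schemaBytes) {
        return GB + indexBytes + schemaBytes;
    }

    public static void main(String[] args) {
        long pageCache = pageCacheBytes(10 * GB, 30);
        long osReserve = osReserveBytes(2 * GB, GB / 2);
        System.out.println("page cache GB: " + (double) pageCache / GB);
        System.out.println("OS reserve GB: " + (double) osReserve / GB);
    }
}
```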
JVM tuning
Set dbms.memory.heap.initial_size and
dbms.memory.heap.max_size to the same value to avoid
unwanted full garbage collection pauses.
Use a concurrent garbage collector: -XX:+UseG1GC
Set the old/new generation ratio: -XX:NewRatio=N (minimum 1,
calculated as old/new = ratio).
The more data is updated in Txs, the lower the ratio you need.
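To see what a given ratio means in bytes: with old/new = N, the new generation gets heap / (N + 1) and the old generation gets the rest. A small sketch of that arithmetic (the `HeapSplit` class is a hypothetical helper; the numbers are examples, not recommendations):

```java
// Illustrative only: how -XX:NewRatio=N divides a fixed-size heap.
final class HeapSplit {
    static final long GB = 1024L * 1024L * 1024L;

    // old/new = N, so the new generation is heap / (N + 1).
    static long newGenBytes(long heapBytes, int newRatio) {
        if (newRatio < 1) {
            throw new IllegalArgumentException("NewRatio must be at least 1");
        }
        return heapBytes / (newRatio + 1);
    }

    static long oldGenBytes(long heapBytes, int newRatio) {
        return heapBytes - newGenBytes(heapBytes, newRatio);
    }

    public static void main(String[] args) {
        long heap = 8 * GB;
        System.out.println("new gen GB: " + newGenBytes(heap, 3) / GB);
        System.out.println("old gen GB: " + oldGenBytes(heap, 3) / GB);
    }
}
```

A lower ratio gives short-lived objects from write-heavy transactions more room before they are promoted to the old generation.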
Neo4j in Production - Problems
- Based on Java
- Not stable
- Problems with memory use and control
- No control over queries
- Problems with some silly queries like “delete all”
- No sharding
- No cross-DC replication
- No master-master replication
- Query planning is a mystery
- Can’t work without a large amount of memory
- Dashboard shows unrealistic execution times
- Plugin deployment is hell
- Problems with data loss on master death
- Problems with unsynced data during requests
- Coming soon ...
Conclusion
70/30
Thank You!
User Stories: https://neo4j.com/case-studies/
Free sandbox with data: https://neo4j.com/sandbox-v2/
Kernel extension example: https://github.com/creotiv/neo4j-kernel-plugin-example
Advanced locking: http://goo.gl/Cy3MEU
HA configuration: http://neo4j.com/docs/stable/ha-configuration.html
Andrey Nikishaev
creotiv@gmail.com
fb.me/anikishaev

More Related Content

What's hot

Replication and replica sets
Replication and replica setsReplication and replica sets
Replication and replica sets
Randall Hunt
 
Cassandra summit 2013 - DataStax Java Driver Unleashed!
Cassandra summit 2013 - DataStax Java Driver Unleashed!Cassandra summit 2013 - DataStax Java Driver Unleashed!
Cassandra summit 2013 - DataStax Java Driver Unleashed!
Michaël Figuière
 
glance replicator
glance replicatorglance replicator
glance replicator
irix_jp
 
Полнотекстовый поиск в PostgreSQL за миллисекунды (Олег Бартунов, Александр К...
Полнотекстовый поиск в PostgreSQL за миллисекунды (Олег Бартунов, Александр К...Полнотекстовый поиск в PostgreSQL за миллисекунды (Олег Бартунов, Александр К...
Полнотекстовый поиск в PostgreSQL за миллисекунды (Олег Бартунов, Александр К...
Ontico
 

What's hot (20)

Deep dive into PostgreSQL statistics.
Deep dive into PostgreSQL statistics.Deep dive into PostgreSQL statistics.
Deep dive into PostgreSQL statistics.
 
Replication and replica sets
Replication and replica setsReplication and replica sets
Replication and replica sets
 
Cassandra summit 2013 - DataStax Java Driver Unleashed!
Cassandra summit 2013 - DataStax Java Driver Unleashed!Cassandra summit 2013 - DataStax Java Driver Unleashed!
Cassandra summit 2013 - DataStax Java Driver Unleashed!
 
Being closer to Cassandra by Oleg Anastasyev. Talk at Cassandra Summit EU 2013
Being closer to Cassandra by Oleg Anastasyev. Talk at Cassandra Summit EU 2013Being closer to Cassandra by Oleg Anastasyev. Talk at Cassandra Summit EU 2013
Being closer to Cassandra by Oleg Anastasyev. Talk at Cassandra Summit EU 2013
 
Gnocchi Profiling 2.1.x
Gnocchi Profiling 2.1.xGnocchi Profiling 2.1.x
Gnocchi Profiling 2.1.x
 
MongoDB: Optimising for Performance, Scale & Analytics
MongoDB: Optimising for Performance, Scale & AnalyticsMongoDB: Optimising for Performance, Scale & Analytics
MongoDB: Optimising for Performance, Scale & Analytics
 
Back to Basics Spanish Webinar 3 - Introducción a los replica sets
Back to Basics Spanish Webinar 3 - Introducción a los replica setsBack to Basics Spanish Webinar 3 - Introducción a los replica sets
Back to Basics Spanish Webinar 3 - Introducción a los replica sets
 
Full Text Search in PostgreSQL
Full Text Search in PostgreSQLFull Text Search in PostgreSQL
Full Text Search in PostgreSQL
 
Troubleshooting PostgreSQL with pgCenter
Troubleshooting PostgreSQL with pgCenterTroubleshooting PostgreSQL with pgCenter
Troubleshooting PostgreSQL with pgCenter
 
MongoDB Database Replication
MongoDB Database ReplicationMongoDB Database Replication
MongoDB Database Replication
 
glance replicator
glance replicatorglance replicator
glance replicator
 
Non-Relational Postgres / Bruce Momjian (EnterpriseDB)
Non-Relational Postgres / Bruce Momjian (EnterpriseDB)Non-Relational Postgres / Bruce Momjian (EnterpriseDB)
Non-Relational Postgres / Bruce Momjian (EnterpriseDB)
 
pg / shardman: шардинг в PostgreSQL на основе postgres / fdw, pg / pathman и ...
pg / shardman: шардинг в PostgreSQL на основе postgres / fdw, pg / pathman и ...pg / shardman: шардинг в PostgreSQL на основе postgres / fdw, pg / pathman и ...
pg / shardman: шардинг в PostgreSQL на основе postgres / fdw, pg / pathman и ...
 
Don't dump thread dumps
Don't dump thread dumpsDon't dump thread dumps
Don't dump thread dumps
 
Better Full Text Search in PostgreSQL
Better Full Text Search in PostgreSQLBetter Full Text Search in PostgreSQL
Better Full Text Search in PostgreSQL
 
Полнотекстовый поиск в PostgreSQL за миллисекунды (Олег Бартунов, Александр К...
Полнотекстовый поиск в PostgreSQL за миллисекунды (Олег Бартунов, Александр К...Полнотекстовый поиск в PostgreSQL за миллисекунды (Олег Бартунов, Александр К...
Полнотекстовый поиск в PostgreSQL за миллисекунды (Олег Бартунов, Александр К...
 
Drivers APIs and Looking Forward
Drivers APIs and Looking ForwardDrivers APIs and Looking Forward
Drivers APIs and Looking Forward
 
MongoDB performance tuning and load testing, NOSQL Now! 2013 Conference prese...
MongoDB performance tuning and load testing, NOSQL Now! 2013 Conference prese...MongoDB performance tuning and load testing, NOSQL Now! 2013 Conference prese...
MongoDB performance tuning and load testing, NOSQL Now! 2013 Conference prese...
 
Cassandra
CassandraCassandra
Cassandra
 
HBaseCon 2013: OpenTSDB at Box
HBaseCon 2013: OpenTSDB at BoxHBaseCon 2013: OpenTSDB at Box
HBaseCon 2013: OpenTSDB at Box
 

Similar to Neo4j after 1 year in production

Querying federations 
of Triple Pattern Fragments
Querying federations 
of Triple Pattern FragmentsQuerying federations 
of Triple Pattern Fragments
Querying federations 
of Triple Pattern Fragments
Ruben Verborgh
 
Advanced Cassandra Operations via JMX (Nate McCall, The Last Pickle) | C* Sum...
Advanced Cassandra Operations via JMX (Nate McCall, The Last Pickle) | C* Sum...Advanced Cassandra Operations via JMX (Nate McCall, The Last Pickle) | C* Sum...
Advanced Cassandra Operations via JMX (Nate McCall, The Last Pickle) | C* Sum...
DataStax
 

Similar to Neo4j after 1 year in production (20)

Querying federations 
of Triple Pattern Fragments
Querying federations 
of Triple Pattern FragmentsQuerying federations 
of Triple Pattern Fragments
Querying federations 
of Triple Pattern Fragments
 
Apache Spark Structured Streaming for Machine Learning - StrataConf 2016
Apache Spark Structured Streaming for Machine Learning - StrataConf 2016Apache Spark Structured Streaming for Machine Learning - StrataConf 2016
Apache Spark Structured Streaming for Machine Learning - StrataConf 2016
 
OpenTSDB 2.0
OpenTSDB 2.0OpenTSDB 2.0
OpenTSDB 2.0
 
Performance Tipping Points - Hitting Hardware Bottlenecks
Performance Tipping Points - Hitting Hardware BottlenecksPerformance Tipping Points - Hitting Hardware Bottlenecks
Performance Tipping Points - Hitting Hardware Bottlenecks
 
Postgres Vienna DB Meetup 2014
Postgres Vienna DB Meetup 2014Postgres Vienna DB Meetup 2014
Postgres Vienna DB Meetup 2014
 
Spark streaming
Spark streamingSpark streaming
Spark streaming
 
Advanced Apache Cassandra Operations with JMX
Advanced Apache Cassandra Operations with JMXAdvanced Apache Cassandra Operations with JMX
Advanced Apache Cassandra Operations with JMX
 
Advanced Cassandra Operations via JMX (Nate McCall, The Last Pickle) | C* Sum...
Advanced Cassandra Operations via JMX (Nate McCall, The Last Pickle) | C* Sum...Advanced Cassandra Operations via JMX (Nate McCall, The Last Pickle) | C* Sum...
Advanced Cassandra Operations via JMX (Nate McCall, The Last Pickle) | C* Sum...
 
Data correlation using PySpark and HDFS
Data correlation using PySpark and HDFSData correlation using PySpark and HDFS
Data correlation using PySpark and HDFS
 
Migrating To PostgreSQL
Migrating To PostgreSQLMigrating To PostgreSQL
Migrating To PostgreSQL
 
Apache Flink internals
Apache Flink internalsApache Flink internals
Apache Flink internals
 
Replication MongoDB Days 2013
Replication MongoDB Days 2013Replication MongoDB Days 2013
Replication MongoDB Days 2013
 
Why and How Powershell will rule the Command Line - Barcamp LA 4
Why and How Powershell will rule the Command Line - Barcamp LA 4Why and How Powershell will rule the Command Line - Barcamp LA 4
Why and How Powershell will rule the Command Line - Barcamp LA 4
 
Big Data Day LA 2015 - Compiling DSLs for Diverse Execution Environments by Z...
Big Data Day LA 2015 - Compiling DSLs for Diverse Execution Environments by Z...Big Data Day LA 2015 - Compiling DSLs for Diverse Execution Environments by Z...
Big Data Day LA 2015 - Compiling DSLs for Diverse Execution Environments by Z...
 
Parquet performance tuning: the missing guide
Parquet performance tuning: the missing guideParquet performance tuning: the missing guide
Parquet performance tuning: the missing guide
 
Apidays Paris 2023 - Forget TypeScript, Choose Rust to build Robust, Fast and...
Apidays Paris 2023 - Forget TypeScript, Choose Rust to build Robust, Fast and...Apidays Paris 2023 - Forget TypeScript, Choose Rust to build Robust, Fast and...
Apidays Paris 2023 - Forget TypeScript, Choose Rust to build Robust, Fast and...
 
Beyond PHP - It's not (just) about the code
Beyond PHP - It's not (just) about the codeBeyond PHP - It's not (just) about the code
Beyond PHP - It's not (just) about the code
 
[Globant summer take over] Empowering Big Data with Cassandra
[Globant summer take over] Empowering Big Data with Cassandra[Globant summer take over] Empowering Big Data with Cassandra
[Globant summer take over] Empowering Big Data with Cassandra
 
DUG'20: 12 - DAOS in Lenovo’s HPC Innovation Center
DUG'20: 12 - DAOS in Lenovo’s HPC Innovation CenterDUG'20: 12 - DAOS in Lenovo’s HPC Innovation Center
DUG'20: 12 - DAOS in Lenovo’s HPC Innovation Center
 
Deep Dive with Spark Streaming - Tathagata Das - Spark Meetup 2013-06-17
Deep Dive with Spark Streaming - Tathagata  Das - Spark Meetup 2013-06-17Deep Dive with Spark Streaming - Tathagata  Das - Spark Meetup 2013-06-17
Deep Dive with Spark Streaming - Tathagata Das - Spark Meetup 2013-06-17
 

More from Andrew Nikishaev

More from Andrew Nikishaev (10)

What is ML and how it can be used in sport
What is ML and how it can be used in sportWhat is ML and how it can be used in sport
What is ML and how it can be used in sport
 
Photo echance. Problems. Solutions. Ideas
Photo echance. Problems. Solutions. Ideas Photo echance. Problems. Solutions. Ideas
Photo echance. Problems. Solutions. Ideas
 
Crypto trading - the basics
Crypto trading - the basicsCrypto trading - the basics
Crypto trading - the basics
 
Machine learning for newbies
Machine learning for newbiesMachine learning for newbies
Machine learning for newbies
 
Ideal pitch - for investors and clients
Ideal pitch - for investors and clientsIdeal pitch - for investors and clients
Ideal pitch - for investors and clients
 
От идеи до рабочей MVP
От идеи до рабочей MVPОт идеи до рабочей MVP
От идеи до рабочей MVP
 
Sit&fit - uderdesk stepper trainer with charger
Sit&fit - uderdesk stepper trainer with chargerSit&fit - uderdesk stepper trainer with charger
Sit&fit - uderdesk stepper trainer with charger
 
Тонкости работы с Facebook
Тонкости работы с FacebookТонкости работы с Facebook
Тонкости работы с Facebook
 
Построение Business Model Canvas и Value Proposition Canvas
Построение Business Model Canvas и Value Proposition CanvasПостроение Business Model Canvas и Value Proposition Canvas
Построение Business Model Canvas и Value Proposition Canvas
 
Нетворкинг и Социальная Инженерия
Нетворкинг и Социальная ИнженерияНетворкинг и Социальная Инженерия
Нетворкинг и Социальная Инженерия
 

Recently uploaded

introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdfintroduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
VishalKumarJha10
 

Recently uploaded (20)

Optimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVOptimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTV
 
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
 
VTU technical seminar 8Th Sem on Scikit-learn
VTU technical seminar 8Th Sem on Scikit-learnVTU technical seminar 8Th Sem on Scikit-learn
VTU technical seminar 8Th Sem on Scikit-learn
 
Define the academic and professional writing..pdf
Define the academic and professional writing..pdfDefine the academic and professional writing..pdf
Define the academic and professional writing..pdf
 
10 Trends Likely to Shape Enterprise Technology in 2024
10 Trends Likely to Shape Enterprise Technology in 202410 Trends Likely to Shape Enterprise Technology in 2024
10 Trends Likely to Shape Enterprise Technology in 2024
 
Microsoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdfMicrosoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdf
 
5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf
 
Azure_Native_Qumulo_High_Performance_Compute_Benchmarks.pdf
Azure_Native_Qumulo_High_Performance_Compute_Benchmarks.pdfAzure_Native_Qumulo_High_Performance_Compute_Benchmarks.pdf
Azure_Native_Qumulo_High_Performance_Compute_Benchmarks.pdf
 
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdfintroduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
 
Unlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language ModelsUnlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language Models
 
Right Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsRight Money Management App For Your Financial Goals
Right Money Management App For Your Financial Goals
 
How to Choose the Right Laravel Development Partner in New York City_compress...
How to Choose the Right Laravel Development Partner in New York City_compress...How to Choose the Right Laravel Development Partner in New York City_compress...
How to Choose the Right Laravel Development Partner in New York City_compress...
 
8257 interfacing 2 in microprocessor for btech students
8257 interfacing 2 in microprocessor for btech students8257 interfacing 2 in microprocessor for btech students
8257 interfacing 2 in microprocessor for btech students
 
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
 
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
 
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfLearn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
 
HR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comHR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.com
 
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
 
A Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docxA Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docx
 
Software Quality Assurance Interview Questions
Software Quality Assurance Interview QuestionsSoftware Quality Assurance Interview Questions
Software Quality Assurance Interview Questions
 

Neo4j after 1 year in production

  • 1. Neo4j After 1 year in production with Andrey Nikishaev
  • 2. What we will talk about today Neo4j internals Cypher - query language Extensions developing Neo4j in production Conclusion
  • 3. Data Properties Linked lists of properties records. Key:Value in each. Node Refers to its first Property & first node in its relationship chain. Relationship Refers to its first Property & Start and End Nodes. Also it refers to Prev/Next Relationship of its Start/End Nodes. All data in Neo4j is Linked lists with fixed size records. ● ID lookup = O(1) ● It's great at localized searches. E.g. to get the people you follow. ● It's not great at aggregation. E.g. the nodes or relationships aren't stored in any sorted order, so deriving the 20 most popular users requires a full scan. ● It suffers from the "supernode problem". At least currently, a node's neighboring relationships are stored as a flat list, so if you have a million followers, fetching even one person you follow is slow.
  • 4. Caching File Cache Blocks of the same size. Map blocks with OS Mmap to memory. Evicts data by LFU policy (hits vs misses). Object Cache (removed in v2.3+) Saves serialized data to memory to boost queries. No eviction policy (can eat all your memory) Evicted only on transaction log sync(HA) or data deletion. To use it you should warm it up with query like this: MATCH (n) OPTIONAL MATCH (n)-[r]->() RETURN count(n.prop) + count(r.prop);
  • 5. Transactions As a context Tx using Thread Local Object. Gathering lists of commands Sorting commands (predictable execution order) Write commands to Tx log Mark Tx in log as finished Write to DB Tx Log Tx ID
  • 6. Transactions As a context Tx using Thread Local Object. Gathering lists of commands Sorting commands (predictable execution order) Write commands to Tx log Mark Tx in log as finished Write to DB Tx Log Tx ID
  • 7. HA Only Master-Slave replication ● Sync every N time (configurable). ● All writes only through the master. Writes on slave would be done slower. ● Same Node/Rels IDs on all servers. ● Needs quorum for write else read-only mode. ● IDs allocated by blocks. ● Master elects by this rules: ○ Highest Tx ID. ○ If multiple: instance that was master for this Tx. ○ If unavailable: instance with the lowest clock value. ○ If multiple: instance with the lowest ID.
  • 8. Cypher MATCH (girl: Girl) WHERE girl.age > 18 AND girl.age < 25 AND ( NOT (girl)-[:HAS_BOYFRIEND]->(some_dick: Guy) ) OR NOT (girl)-[:HAS_BOYFRIEND]->(pussy: Guy)-[:ENGAGED_IN]->(gym: Gym) ) RETURN girl ORDER BY girl.age ASC
  • 9. Cypher No query watcher You should control each query that goes to a server, because a query can kill the server. Read all data first When you engage with properties(extend operation) data gets cached in memory, if it does not fit there then query will crash(or even the server). Evan MATCH (n) DELETE n will fail if you have many nodes. Locking Making an update query doesn’t mean that you set an update lock, even in a transaction. MATCH (n:Node) SET n.count = n.count + 1 MATCH (n:Node) SET n._lock = true SET n.count = n.count + 1 FAIL PASS More about this at: http://goo.gl/Cy3MEU
  • 10. Cypher You can try it on real data for free here: https://neo4j.com/sandbox-v2/ Similarity example. Used recommendation dataset: 32314 Nodes, 332622 Relations Top 25 similar users: MATCH (u1:User)-[:RATED]->(:Movie)<-[:RATED]-(u2:User) return [u1.name,u2.name] as pairs, count(*) as cnt order by cnt desc limit 25 Run time: 16366 ms. Number of pairs: 6 246 674 Most queries will not work without warming up. Use Indexes as much as possible.
  • 11. Cypher > Sushi restaurants in New York that my friends like. MATCH (person:Person)-[:IS_FRIEND_OF]->(friend), (friend)-[:LIKES]->(restaurant:Restaurant), (restaurant)-[:LOCATED_IN]->(loc:Location), (restaurant)-[:SERVES]->(type:Cuisine) WHERE person.name = 'Philip' AND loc.location = 'New York' AND type.cuisine = 'Sushi' RETURN restaurant.name, count(*) AS occurrence ORDER BY occurrence DESC LIMIT 5 https://neo4j.com/developer/guide-build-a-recommendation-engine/
  • 12. Extensions developing User-Defined Procedures & Functions Same as in SQL DBs Unmanaged server extensions Extensions that can create new API to work with Neo4j. You can even create new Dashboard. Server plugins Extensions that only can extend Neo4j Core API. Kernel extensions Here you can do almost anything. https://github.com/creotiv/neo4j-kernel-plugin-example
  • 13. User-Defined Procedures & Functions (v3.0+ only) public class Join { @UserFunction @Description("example.join(['s1','s2',...], delimiter) - join the given strings with the given delimiter.") public String join( @Name("strings") List<String> strings, @Name(value = "delimiter", defaultValue = ",") String delimiter) { if (strings == null || delimiter == null) { return null; } return String.join(delimiter, strings); } } Calling: MATCH (p: Person) WHERE p.age = 36 RETURN org.neo4j.examples.join(collect(p.names))
  • 14. Unmanaged extensions @Path("/helloworld") public class HelloWorldResource { private final GraphDatabaseService database; public HelloWorldResource(@Context GraphDatabaseService database) { this.database = database; } @GET @Produces(MediaType.TEXT_PLAIN) @Path("/{nodeId}") public Response hello(@PathParam("nodeId") long nodeId) { return Response.status(Status.OK).entity( UTF8.encode("Hello World, nodeId=" + nodeId)).build(); } }
  • 15. Kernel extensions - Factory public class ExampleKernelExtensionFactory extends KernelExtensionFactory<ExampleKernelExtensionFactory.Dependencies> { public static abstract class ExampleSettings { public static Setting<Boolean> debug = setting("examplekernelextension.debug", BOOLEAN, Settings.FALSE); } public ExampleKernelExtensionFactory() {super(SERVICE_NAME);} @Override public Lifecycle newKernelExtension(Dependencies dependencies) throws Throwable { Config config = dependencies.getConfig(); return new ExampleExtension(dependencies.getGraphDatabaseService(), config.get(ExampleSettings.debug), ...); } public interface Dependencies { GraphDatabaseService getGraphDatabaseService(); Config getConfig(); } }
  • 16. Kernel extensions - Extension
      public class ExampleExtension implements Lifecycle {
          ...
          public ExampleExtension(GraphDatabaseService gds, Boolean debug, String somevar) {
              this.gds = gds;
              this.debug = debug;
              this.somevar = somevar;
          }

          @Override
          public void init() throws Throwable {
              handler = new ExampleEventHandler(gds, debug, somevar);
              gds.registerTransactionEventHandler(handler);
          }

          ... Start/Stop methods ...

          @Override
          public void shutdown() throws Throwable {
              gds.unregisterTransactionEventHandler(handler);
          }
      }
  • 17. Kernel extensions - Event Handler
      class ExampleEventHandler implements TransactionEventHandler<String> {
          ...
          @Override
          public String beforeCommit(TransactionData transactionData) throws Exception {
              updateConstraints();
              return prepareCreatedNodes(transactionData);
          }

          @Override
          public void afterCommit(TransactionData transactionData, String result) {
              processCreatedNodes(result);
          }

          @Override
          public void afterRollback(TransactionData transactionData, String result) {
              error("Something bad happened, Harry: " + result);
          }
      }
  • 18. Kernel extensions - Event Handler Problems
      beforeCommit (which should run while the DB is still unchanged)
      You can't access the properties, labels, or relationships of deleted nodes, because they are already deleted. Yeah, strange. So you need to gather them from the event data.
      afterCommit (which should run after the transaction is committed and closed)
      It is executed while the transaction is still open, which leads to a deadlock (with no message and no exception) if you try to update your local DB.
      Local DB
      - Bad API.
      - You can't access the HA status of the local server; you need to send requests through the REST API.
      - No way to access the user request.
      - Plugins can conflict with each other and cause deadlocks.
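One possible way around the afterCommit deadlock is to avoid touching the database from the committing thread at all: afterCommit only enqueues the follow-up work and returns, and a separate worker thread runs it later. The sketch below uses only the Java standard library; the class name AfterCommitQueue and its methods are hypothetical and not part of the Neo4j API, and the talk does not say this is how the author solved it.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical helper (assumption): not part of the Neo4j API.
final class AfterCommitQueue {
    // A single worker thread keeps follow-up tasks in submission order.
    private final ExecutorService worker = Executors.newSingleThreadExecutor();

    // Intended to be called from afterCommit(): it only enqueues the task and
    // returns immediately, so the committing thread never waits on it.
    void submit(Runnable followUp) {
        worker.execute(followUp);
    }

    // Intended to be called from the extension's shutdown(): stop accepting
    // work and wait for queued tasks. Returns false on timeout or interrupt.
    boolean close(long timeoutMillis) {
        worker.shutdown();
        try {
            return worker.awaitTermination(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        AfterCommitQueue queue = new AfterCommitQueue();
        List<String> processed = new CopyOnWriteArrayList<>();
        // Stand-ins for "process the nodes created by this transaction".
        queue.submit(() -> processed.add("node-1"));
        queue.submit(() -> processed.add("node-2"));
        boolean drained = queue.close(1000);
        System.out.println("drained=" + drained + " processed=" + processed);
    }
}
```

The important design point is that afterCommit must not wait for the task's result; if it did, the deadlock would simply move to the worker thread.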
  • 20. Neo4j in Production - Cache-Based Sharding
      (diagram: a Router and three caches: Cache A, Cache B, Cache C)
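The slide only shows a Router in front of Cache A, Cache B and Cache C. A minimal sketch of the routing idea, assuming that the router sends requests with the same key to the same instance so each instance's cache stays warm for its own part of the graph. The class name, the instance addresses and the choice of CRC32 as the hash are illustrative assumptions, not details from the talk.

```java
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.zip.CRC32;

// Hypothetical router (assumption): names and hashing scheme are illustrative.
final class CacheShardRouter {
    private final List<String> instances;

    CacheShardRouter(List<String> instances) {
        if (instances.isEmpty()) {
            throw new IllegalArgumentException("need at least one instance");
        }
        this.instances = List.copyOf(instances);
    }

    // The same routing key always maps to the same instance, so repeated
    // requests for that key hit an already warmed cache.
    String route(String routingKey) {
        CRC32 crc = new CRC32();
        crc.update(routingKey.getBytes(StandardCharsets.UTF_8));
        // getValue() is a non-negative long, so the remainder is a valid index.
        int index = (int) (crc.getValue() % instances.size());
        return instances.get(index);
    }

    public static void main(String[] args) {
        // Made-up instance addresses standing in for Cache A, B and C.
        CacheShardRouter router = new CacheShardRouter(
                List.of("neo4j-a:7474", "neo4j-b:7474", "neo4j-c:7474"));
        for (String key : List.of("user:1", "user:2", "user:3")) {
            System.out.println(key + " -> " + router.route(key));
        }
    }
}
```

Plain modulo hashing remaps most keys when an instance is added or removed; consistent hashing would reduce that, at the cost of a more involved router.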
  • 21. Neo4j in Production - Settings
      Log slow queries
      dbms.querylog.enabled=true
      dbms.querylog.threshold=4s
      Logical logs for debugging
      keep_logical_logs=7 days
      Enable online backup
      online_backup_enabled=true
      online_backup_server=127.0.0.1:6362
      Number of threads (for concurrent access)
      org.neo4j.server.webserver.maxthreads=64 (default: number of CPUs)
      Memory used for the page cache
      dbms.pagecache.memory=2g
      Interval for pulling updates from the master
      ha.pull_interval=10 (seconds)
      Replication without a timeout
      Number of slaves to which a Tx will be pushed upon commit on the master. (Optimistic: the Tx can be marked successful even if some pushes failed.)
      ha.tx_push_factor=1
      Push strategy
      "fixed" pushes Txs based on server ID order.
      ha.tx_push_strategy=fixed|round_robin
      Master-to-slave communication chunk size
      ha.com_chunk_size=2M
      Maximum number of connections a slave can have to the master
      ha.max_concurrent_channels_per_slave=20
      http://neo4j.com/docs/stable/ha-configuration.html
  • 22. Neo4j in Production - Performance
      Use SSD
      It is much cheaper than 16-32 GB of RAM.
      IO tuning
      Disable file and directory access time updates.
      Set the deadline scheduler for disk operations. This will increase read speed but decrease write speed.
      $ echo 'deadline' > /sys/block/sda/queue/scheduler
      $ cat /sys/block/sda/queue/scheduler
      Memory tuning
      Set dbms.pagecache.memory to the size of the *store*.db files + 20-40% for growth.
      Leave some memory for the OS:
      OS Memory = 1GB + (size of graph.db/index) + (size of graph.db/schema)
      If you see swapping, increase the OS memory size.
      JVM tuning
      Set dbms.memory.heap.initial_size and dbms.memory.heap.max_size to the same size to avoid unwanted full garbage collection pauses.
      Use a concurrent garbage collector:
      -XX:+UseG1GC
      Set the old/new generation ratio:
      -XX:NewRatio=N (minimum 1; calculated as old/new = ratio)
      The more data is updated in Txs, the lower the ratio you need.
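The sizing rules on this slide can be worked through as plain arithmetic. The sketch below applies the page cache rule (store size + 20-40%), the OS memory formula, and the old/new = NewRatio relation. The class name, method names and all input sizes are made-up assumptions for illustration; real values have to be measured on the actual graph.db directory.

```java
// Hypothetical calculator (assumption): names and sample sizes are illustrative.
final class Neo4jMemoryPlan {
    static final long GB = 1024L * 1024 * 1024;

    // Page cache = size of the store files + headroom for growth
    // (the slide suggests a growth fraction between 0.2 and 0.4).
    static long pageCacheBytes(long storeBytes, double growthFraction) {
        return Math.round(storeBytes * (1.0 + growthFraction));
    }

    // OS memory = 1GB + size of graph.db/index + size of graph.db/schema.
    static long osReserveBytes(long indexBytes, long schemaBytes) {
        return GB + indexBytes + schemaBytes;
    }

    // With -XX:NewRatio=N the old generation is N times the new generation,
    // so the new generation gets heap / (N + 1).
    static long newGenBytes(long heapBytes, int newRatio) {
        return heapBytes / (newRatio + 1);
    }

    public static void main(String[] args) {
        // Made-up sizes for illustration only.
        long store = 10 * GB;
        long index = 2 * GB;
        long schema = GB / 2;
        long heap = 8 * GB;

        System.out.println("page cache bytes: " + pageCacheBytes(store, 0.3));
        System.out.println("OS reserve bytes: " + osReserveBytes(index, schema));
        System.out.println("new gen bytes at NewRatio=3: " + newGenBytes(heap, 3));
        System.out.println("new gen bytes at NewRatio=1: " + newGenBytes(heap, 1));
    }
}
```

Lowering NewRatio enlarges the new generation, which matches the slide's advice for write-heavy transactions that create many short-lived objects.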
  • 23. Neo4j in Production - Problems
      - Based on Java
      - Not stable
      - Problems with memory use and control
      - No control over queries
      - Problems with some silly queries like "delete all"
      - No sharding
      - No DC replication
      - No master-master replication
      - Query planning is a mystery
      - Can't work without a large amount of memory
      - Dashboard shows unrealistic execution times
      - Plugin deployment is hell
      - Problems with data loss on master death
      - Problems with unsynced data during requests
      - Coming soon ...
  • 25. Thank You!
      User stories: https://neo4j.com/case-studies/
      Free sandbox with data: https://neo4j.com/sandbox-v2/
      Kernel extension example: https://github.com/creotiv/neo4j-kernel-plugin-example
      Advanced locking: http://goo.gl/Cy3MEU
      HA configuration: http://neo4j.com/docs/stable/ha-configuration.html
      Andrey Nikishaev
      creotiv@gmail.com
      fb.me/anikishaev