A brief introduction to Neo4j and graph databases. The first part focuses on the differences from relational databases, the second part on the Cypher query language. At the end of the presentation I included some benchmarks of Neo4j compared to other solutions.
2. The problem
Need to scale to big data
Data is ever more connected
Each user action creates data relationships
SQL JOIN doesn’t scale
Big and complex queries
3. Solution: NoSQL and graph databases
NoSQL for scalability
Neo4j leading graph database
“Neo4j is a highly scalable native graph database that leverages data relationships as first-class entities, helping enterprises build intelligent applications to meet today’s evolving data challenges.”
Intuitive approach to data
4. Neo4j key characteristics and features
● Flexible schema
● Native graph storage
● Native graph processing
● ACID
● REST API
● Data visualization
● Cypher query language
6. Pricing and scaling
Neo4j Community edition: GPLv3
Neo4j Enterprise edition:
● Different licenses
● In-memory page cache
● Clustering
● Cache sharding
● Monitoring
7. Cypher query language
SQL is bad for graph queries
Cypher as a declarative graph query language
● Ask for data to match specific patterns
● Expressive and efficient
● Understandable by non-technical people
12. Query comparison
SQL:
SELECT product.product_name AS Recommendation, count(1) AS Frequency
FROM product, customer_product_mapping,
  -- products bought by customers who share a purchase with customer-one
  (SELECT cpm3.product_id, cpm3.customer_id
   FROM customer_product_mapping cpm, customer_product_mapping cpm2, customer_product_mapping cpm3
   WHERE cpm.customer_id = 'customer-one'
     AND cpm.product_id = cpm2.product_id
     AND cpm2.customer_id != 'customer-one'
     AND cpm3.customer_id = cpm2.customer_id
     AND cpm3.product_id NOT IN (SELECT DISTINCT product_id
                                 FROM customer_product_mapping cpm
                                 WHERE cpm.customer_id = 'customer-one')
  ) recommended_products
WHERE customer_product_mapping.product_id = product.product_id
  AND customer_product_mapping.product_id = recommended_products.product_id
  AND customer_product_mapping.customer_id = recommended_products.customer_id
GROUP BY product.product_name
ORDER BY Frequency DESC

Cypher:
MATCH (u:Customer {customer_id: 'customer-one'})-[:BOUGHT]->(p:Product)<-[:BOUGHT]-(peer:Customer)-[:BOUGHT]->(reco:Product)
WHERE NOT (u)-[:BOUGHT]->(reco)
RETURN reco AS Recommendation, count(*) AS Frequency
ORDER BY Frequency DESC LIMIT 5;
13. Neo4j vs SQL: social graph
Sample social graph
● With 1000 persons
● Average of 50 friends per person
● PathExists(a,b) limited to depth 4
● Cache warmed up
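The benchmark setup above can be sketched in a few lines of Python, assuming a random undirected friendship graph and a breadth-first search capped at four hops. The names `build_social_graph` and `path_exists`, the random graph model, and the seed are assumptions made for this illustration; the sketch shows what a `PathExists(a,b)` check does, not how Neo4j or the SQL database in the benchmark implement it, and it makes no claim about their timings.

```python
import random
from collections import deque


def build_social_graph(n_persons=1000, avg_friends=50, seed=42):
    """Build a random undirected friendship graph as an adjacency dict.

    Assumption: friendships are picked uniformly at random; a real
    social graph would be more clustered.
    """
    rng = random.Random(seed)
    friends = {person: set() for person in range(n_persons)}
    target_edges = n_persons * avg_friends // 2  # each edge adds 2 to the degree sum
    added = 0
    while added < target_edges:
        a, b = rng.sample(range(n_persons), 2)  # two distinct persons
        if b not in friends[a]:
            friends[a].add(b)
            friends[b].add(a)
            added += 1
    return friends


def path_exists(friends, a, b, max_depth=4):
    """Return True if b is reachable from a in at most max_depth hops."""
    if a == b:
        return True
    seen = {a}
    frontier = deque([(a, 0)])
    while frontier:
        person, depth = frontier.popleft()
        if depth == max_depth:
            continue  # do not expand beyond the depth limit
        for friend in friends[person]:
            if friend == b:
                return True
            if friend not in seen:
                seen.add(friend)
                frontier.append((friend, depth + 1))
    return False


if __name__ == "__main__":
    graph = build_social_graph()
    print(path_exists(graph, 0, 999))
```

In a relational schema each extra hop typically means one more self-join on the friendship table, whereas the traversal only touches the neighbours of the persons it has already reached.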
16. Case for graph databases as default choice
Graph databases as successor of RDBMS:
● ACID
● High applicability
● Good performance in the majority of scenarios
● Faster development
But for this to happen we need a standard language...