Neo4j
Theory and Practice
Tareq Abedrabbo
Graph Connect - 19/11/2013
Neo4j Theory and Practice - Tareq Abedrabbo @ GraphConnect London 2013

In this talk Tareq will discuss graph solutions based on his experiences building a varied mix of graph-based systems. He will be sharing techniques and approaches that he has learned and will focus on a number of concepts that may be applied to a wider context.

  1. 1. Neo4j Theory and Practice Tareq Abedrabbo Graph Connect - 19/11/2013
  2. 2. About me • CTO/Principal Consultant at OpenCredo • Working with Neo4j for (almost) 3 years on a number of different projects • Co-author of Neo4j in Action (Manning)
  3. 3. What is this talk about?
  4. 4. It’s for developers designing and building applications with Neo4j
  5. 5. It’s not a collection of war stories but I will refer to real-world examples
  6. 6. It is about sharing thoughts and lessons learnt in a useful way
  7. 7. “If I'm to believe Twitter, half of the earth's population are importing Wikipedia into Neo4j, for very obscure reasons.”
  8. 8. Agenda
  9. 9. • What is Neo4j? • Approaching graph-based applications • Design • Implementation • Test • Use cases • Lessons learnt
  10. 10. What really is Neo4j?
  11. 11. A graph model
  12. 12. A query engine
  13. 13. A database
  14. 14. Neo4j is a solid foundation on which to build graph-based applications
  15. 15. How should I approach graph-based applications?
  16. 16. Is there a useful way to categorise graph-based applications?
  17. 17. Domain-centric applications
  18. 18. Data-centric applications
  19. 19. Domain-Centric • Well-defined data model • Data changes through user interactions • Flexible but predictable data structure(s) • Recommendation engines, social networks, etc… • Top-down design
  20. 20. Data-Centric • Complex connected data that typically models real world networks • Integrated from a variety of different sources • Data can be unpredictable • Telco networks, utility networks, etc… • Bottom-up design
  21. 21. Typically applications fall somewhere between these 2 types
  22. 22. How can I use the information available in my graph?
  23. 23. • Search and pattern-matching: find a recommendation based on behaviour • Graph algorithms: shortest path, disconnected components • Optimisation: maximise oil flow while minimising water
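
The graph-algorithm examples named on slide 23 (shortest path, disconnected components) can be sketched without a database. The block below is plain Python over an in-memory adjacency map, not Neo4j or Cypher; the `GRAPH` data and the function names are assumptions made up for illustration.

```python
from collections import deque

# Hypothetical undirected graph as an adjacency map (illustrative data only).
GRAPH = {
    "a": {"b", "c"},
    "b": {"a", "d"},
    "c": {"a", "d"},
    "d": {"b", "c"},
    "x": {"y"},
    "y": {"x"},
}

def shortest_path(graph, start, goal):
    """Breadth-first search; returns one shortest path as a list, or None."""
    previous = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            # Walk the predecessor links back to the start node.
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbour in sorted(graph[node]):
            if neighbour not in previous:
                previous[neighbour] = node
                queue.append(neighbour)
    return None

def connected_components(graph):
    """Groups nodes into components; more than one means the graph is disconnected."""
    seen, components = set(), []
    for node in sorted(graph):
        if node in seen:
            continue
        component, stack = set(), [node]
        while stack:
            current = stack.pop()
            if current not in component:
                component.add(current)
                stack.extend(graph[current] - component)
        seen |= component
        components.append(component)
    return components
```

Here `shortest_path(GRAPH, "a", "x")` returns `None` because the two nodes sit in different components, which is the kind of disconnection the slide refers to.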
  24. 24. Graphs are naturally data-driven
  25. 25. Use case 1: Network Impact Analysis
  26. 26. Requirement: Identify the impact of failing components
  27. 27. Requirement: Identify interesting patterns, such as single points of failure
  28. 28. Labelled property graph is a natural fit for the model
  29. 29. Additional “dimensions” can be added to capture abstract concepts: network redundancy, load-balancing
  30. 30. Cypher queries are a natural solution to delivering the different requirements
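
The two requirements of this use case (impact of a failing component, single points of failure) can be illustrated roughly as follows. The talk delivers them with Cypher queries; this sketch is a plain-Python stand-in, and the network in `LINKS`, the `core` entry point, and all function names are hypothetical.

```python
# Hypothetical network: links are undirected, "core" is the entry point.
LINKS = [
    ("core", "switch1"), ("core", "switch2"),
    ("switch1", "router"), ("switch2", "router"),
    ("router", "server1"), ("router", "server2"),
]

def neighbours(links):
    """Builds an adjacency map from a list of undirected links."""
    adjacency = {}
    for a, b in links:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency

def reachable(adjacency, source, failed=frozenset()):
    """Nodes reachable from source when the failed nodes are out of service."""
    if source in failed:
        return set()
    seen, stack = {source}, [source]
    while stack:
        for nxt in adjacency[stack.pop()]:
            if nxt not in seen and nxt not in failed:
                seen.add(nxt)
                stack.append(nxt)
    return seen

def impact(adjacency, source, component):
    """Nodes that lose their connection to source if component fails."""
    healthy = reachable(adjacency, source)
    degraded = reachable(adjacency, source, failed={component})
    return healthy - degraded - {component}

def single_points_of_failure(adjacency, source):
    """Components whose individual failure cuts at least one other node off."""
    return sorted(n for n in adjacency
                  if n != source and impact(adjacency, source, n))
```

In this made-up network the two switches are redundant, so losing either one has no impact, while the router is the only single point of failure.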
  31. 31. Use case 2: Oil flow optimisation
  32. 32. Requirement: Identify candidate configurations to maximise flow
  33. 33. Requirement: Identify the most practical and valuable adjustments to the network
  34. 34. Simply connected graph with complex components
  35. 35. Interlude: Genetic Algorithms
  36. 36. • Start from an initial population of candidate solutions (individuals or phenotypes), ideally random • Attribute a score to each solution using a fitness function (the only place with specific business knowledge) • Apply genetic operators to create a new generation: cross-breeding to retain the best characteristics from each parent; mutation to maintain diversity and to avoid converging to a local optimum too quickly • Stop when you want!
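
The steps on slide 36 can be sketched as a toy genetic algorithm. This is a minimal illustration, not the oil-flow solution from the talk: the bit-string encoding, the "count the ones" fitness function, the parameter values, and the choice to carry the best individual over unchanged (elitism) are all assumptions.

```python
import random

def fitness(individual):
    """Toy fitness function: count of 1-bits (stand-in for business knowledge)."""
    return sum(individual)

def crossover(mother, father, rng):
    """Single-point cross-breeding of two parents."""
    point = rng.randrange(1, len(mother))
    return mother[:point] + father[point:]

def mutate(individual, rng, rate=0.05):
    """Flips each bit with a small probability to maintain diversity."""
    return [1 - bit if rng.random() < rate else bit for bit in individual]

def evolve(population_size=20, length=16, generations=30, seed=42):
    """Returns the best individual and the best score seen at each step."""
    rng = random.Random(seed)
    # Random initial population of candidate solutions.
    population = [[rng.randint(0, 1) for _ in range(length)]
                  for _ in range(population_size)]
    history = []
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        history.append(fitness(population[0]))
        parents = population[: population_size // 2]
        children = [population[0]]  # elitism (assumption): keep the best as-is
        while len(children) < population_size:
            mother, father = rng.sample(parents, 2)
            children.append(mutate(crossover(mother, father, rng), rng))
        population = children
    best = max(population, key=fitness)
    history.append(fitness(best))
    return best, history
```

Because the best individual is always carried over, the scores in `history` never decrease from one generation to the next. "Stop when you want" is reduced here to a fixed number of generations.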
  37. 37. Is this even a use case for Neo4j?
  38. 38. Persist and share calculated solutions
  39. 39. Inspect intermediary steps
  40. 40. Use Cypher queries to interrogate solutions
  41. 41. Lessons learnt
  42. 42. Understand your domain
  43. 43. • Don’t follow “best practices” blindly • For domain-centric applications you can use a mapping framework, such as Spring Data Neo4j • For data-centric applications, you should stay as close as possible to the graph model • In any case, don’t try to hide the graph!
  44. 44. Use Cypher
  45. 45. • Expressive • Readable • Maintainable • Performant • Cypher + the web console is the quickest way to experiment and to prototype solutions
  46. 46. Manage complexity with domain knowledge
  47. 47. • Graph algorithms are typically complex • Knowledge of the domain can simplify queries and traversals • Make Cypher queries as specific as possible • Take “shortcuts” when you know the domain
  48. 48. Write robust and flexible code
  49. 49. • Break down problems into small queries. Return graph resources (or ids) to chain queries. • Robustness principle: “Be conservative in what you do, be liberal in what you accept from others” • Use assertions as preconditions • Assertions document intent • Fail fast if data doesn’t match
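
One way to read slide 49 is shown below: two small queries chained through a node id, each guarded by an assertion that documents what it expects and fails fast otherwise. The in-memory `NODES` and `RELATIONSHIPS` data stand in for a real graph, and every name here is hypothetical.

```python
# Hypothetical in-memory stand-in for the graph: node id -> properties.
NODES = {
    1: {"label": "Customer", "name": "alice"},
    2: {"label": "Order", "total": 40},
    3: {"label": "Order", "total": 60},
}
# Each relationship is (start id, relationship type, end id).
RELATIONSHIPS = [(1, "PLACED", 2), (1, "PLACED", 3)]

def find_customer_id(name):
    """First small query: returns a node id that later queries can start from."""
    matches = [node_id for node_id, props in NODES.items()
               if props["label"] == "Customer" and props.get("name") == name]
    # Precondition for the rest of the chain: exactly one customer with this name.
    assert len(matches) == 1, f"expected one customer named {name!r}, found {len(matches)}"
    return matches[0]

def order_ids_for(customer_id):
    """Second small query, chained on the id returned by the first."""
    assert NODES[customer_id]["label"] == "Customer", "start node must be a Customer"
    return [end for start, rel, end in RELATIONSHIPS
            if start == customer_id and rel == "PLACED"]

def total_spent(name):
    """Chains the two queries and sums the order totals."""
    order_ids = order_ids_for(find_customer_id(name))
    return sum(NODES[order_id]["total"] for order_id in order_ids)
```

Note that Python drops `assert` statements when run with `-O`, so production code would likely raise explicit exceptions instead.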
  50. 50. Start with a representative dataset
  51. 51. • Create small datasets to capture the initial use cases • Write simple unit tests using these datasets to support design and implementation • These tests tend to become less useful when requirements are better understood • Throw them away!
  52. 52. Move to a realistic dataset as soon as possible
  53. 53. • A realistic data set • Should capture the complexity of the real data • Should be sufficiently large • Ideally based on production data • Write functional and integration tests against this dataset
  54. 54. Test non-functional aspects
  55. 55. • Graph data is inherently flexible and evolving • Queries need to be correct and sufficiently performant • The performance of existing queries can degrade as the underlying model changes • Assertions on timeouts should be part of the test suite to detect loops and poor performance • JUnit’s @Test(timeout=5) • Spring’s @Timeout(value=5)
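
The slide names JUnit and Spring annotations for this; the block below is only a rough Python analogue of the same idea, not either of those APIs. The helper `assert_completes_within`, the `CYCLE` graph, and both traversals are assumptions made up for illustration.

```python
import threading

def assert_completes_within(seconds, func, *args):
    """Runs func in a daemon thread and fails if it has not finished in time."""
    outcome = {}

    def target():
        try:
            outcome["value"] = func(*args)
        except Exception as error:  # re-raised in the calling thread below
            outcome["error"] = error

    worker = threading.Thread(target=target, daemon=True)
    worker.start()
    worker.join(seconds)
    assert not worker.is_alive(), f"{func.__name__} exceeded {seconds}s"
    if "error" in outcome:
        raise outcome["error"]
    return outcome["value"]

# Hypothetical cyclic graph: a -> b -> c -> a.
CYCLE = {"a": ["b"], "b": ["c"], "c": ["a"]}

def reachable(graph, start):
    """Traversal that tracks visited nodes, so cycles terminate."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(graph[node])
    return seen

def naive_walk(graph, start, stop):
    """Buggy traversal with no visited set: spins forever on a cycle.

    The stop event exists only so a demo or test can end the loop afterwards.
    """
    node = start
    while not stop.is_set():
        node = graph[node][0]
```

A test would call `assert_completes_within(1.0, reachable, CYCLE, "a")` and expect a result, whereas the same call on `naive_walk` fails with an `AssertionError` once the limit passes. A thread cannot be killed from outside, so the runaway worker is simply abandoned; a real suite would more likely rely on the test framework's own timeout support.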
  56. 56. Links • Twitter: @tareq_abedrabbo • Blog: http://www.terminalstate.net • OpenCredo: http://www.opencredo.com Thank you!
