Cloud east shutl_talk


  1. 1. Tuesday, 28 May 13
  3. 3. How Neo4j helps Shutl to deliver even faster...
  4. 4. Volker Pacher, senior developer @shutl
      @vpacher
      http://github.com/vpacher
  14. 14.
      • SaaS platform
      • we provide an API for carriers and merchants
      • shutl.it C2C platform
      • customers can choose delivery either:
          within 90 minutes of purchase
          or within a 1 hour window of their choice (same day or any day)
      • fastest delivery to date: 15:00 min
      • SOA with services built using JRuby, Sinatra, MongoDB and Neo4j
  16. 16. Problems?
  17. 17. http://xkcd.com/287/
  22. 22. problems with our previous attempt (v1):
      • exponential growth of joins in MySQL with added features
      • code base too complex and unmaintainable
      • api response time grew too large as more data was added
      • our fastest delivery was quicker than our slowest query!
  29. 29. The case for graph databases:
      • relationships are stored explicitly (an RDBMS has no first-class relationships)
      • domain modelling is simplified because adding new ‘subgraphs’ doesn’t affect the existing structure and queries (additive model)
      • whiteboard friendly
      • schema-less
      • db performance remains relatively constant because each query is localized to its portion of the graph: O(1) for the same query as the data grows
      • traversals of relationships are easy and very fast
  30. 30. What is a graph anyway?
      a collection of vertices (nodes) connected by edges (relationships)
      [diagram: Node 1, Node 2, Node 3, Node 4]
  31. 31. a short history
      Leonhard Euler, the seven bridges of Königsberg (1735)
  32. 32. directed graph
      each relationship has a direction, i.e. one start node and one end node
      [diagram: Node 1, Node 2, Node 3, Node 4]
  33. 33. property graph
      • nodes contain properties (key, value)
      • relationships have a type and are always directed
      • relationships can contain properties too
      [diagram: nodes name: Volker, name: Sam, name: Megan, name: Paul; relationships :friends, :friends, :knows (since: 2005), :knows, :works_for]
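
The property graph model on this slide can be sketched in a few lines of plain Ruby. This is only an illustration, not the neo4j gem's API: `Node`, `Relationship`, `NODES`, `RELATIONSHIPS` and `outgoing` are hypothetical names, and the endpoints of each relationship are assumed, because the slide's diagram does not survive in the transcript.

```ruby
# Minimal property-graph sketch (illustrative only, not the neo4j gem API).
Node = Struct.new(:id, :properties)
# A relationship has a type, a direction (from -> to) and its own properties.
Relationship = Struct.new(:type, :from, :to, :properties)

NODES = {
  volker: Node.new(1, { name: "Volker" }),
  sam:    Node.new(2, { name: "Sam" }),
  megan:  Node.new(3, { name: "Megan" }),
}

# Which node connects to which is an assumption made for this example.
RELATIONSHIPS = [
  Relationship.new(:friends, NODES[:volker], NODES[:sam], {}),
  Relationship.new(:knows, NODES[:volker], NODES[:megan], { since: 2005 }),
]

# Follow the outgoing relationships of one type from a node.
def outgoing(node, type)
  RELATIONSHIPS.select { |rel| rel.from == node && rel.type == type }.map(&:to)
end

names = outgoing(NODES[:volker], :knows).map { |node| node.properties[:name] }
puts names.inspect
```

Keeping properties on the relationship itself (here `since: 2005`) is what distinguishes a property graph from a plain directed graph.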
  34. 34. a graph is its own index (constant query performance)
  40. 40. the case for Neo4j
      • we can run it embedded in the same JVM
      • we can use JRuby, as we already know Ruby very well
      • lots of good Ruby libraries are available; we chose the neo4j gem by Andreas Ronge (https://github.com/andreasronge/neo4j)
      • it speaks Cypher
      • the guys from Neo Technology are awesome
  41. 41. some graph dbs available:
      • Neo4j (JVM)
      • FlockDB (JVM)
      • DEX (C++)
      • OrientDB (JVM)
      • Sones GraphDB (C#)
  42. 42. embedded vs. standalone
      embedded, pros:
      • better performance
      • transaction support
      • neo4j gem is available
      • we can use cypher and traversal
      embedded, cons:
      • only the code running the db has access to the db
      standalone, pros:
      • access via rest api and cypher
      • language independent and code doesn’t need to run on the JVM
      standalone, cons:
      • not as performant
      • only works with cypher
      • transaction is on a per query basis
      • need to write model wrappers for ourselves
  49. 49. gotchas and other stuff to consider:
      • testing proved to be difficult and we had to write our own tools
      • migrations of schemaless dbs are more difficult to stay on top of and require special solutions in the case of graph dbs
      • seeding an embedded database is hard
      • graph db partitioning is almost impossible and the whole graph needs to be in memory
      • encoding dates and times that are stored in UTC and work across timezones is non-trivial
      • nested data structures (hashes and arrays) can’t be stored and need to be converted to json
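
The last two gotchas can be illustrated with a small plain-Ruby sketch. It assumes one possible convention, which the talk does not confirm: times are written as UTC ISO 8601 strings and nested hashes or arrays as JSON strings. `to_node_properties` and `read_time` are hypothetical helper names.

```ruby
require "json"
require "time"

# Flatten a Ruby hash into values that fit into node properties:
# Time becomes a UTC ISO 8601 string, nested hashes and arrays become
# JSON strings, everything else is stored unchanged.
def to_node_properties(attrs)
  attrs.each_with_object({}) do |(key, value), props|
    props[key] = case value
                 when Time        then value.getutc.iso8601
                 when Hash, Array then JSON.generate(value)
                 else value
                 end
  end
end

# Read a stored time back and present it in the caller's UTC offset.
def read_time(stored, utc_offset = "+00:00")
  Time.iso8601(stored).getlocal(utc_offset)
end

props = to_node_properties({
  name: "parcel",
  pickup_at: Time.new(2013, 5, 28, 15, 0, 0, "+01:00"),
  dimensions: { "width" => 10, "tags" => ["fragile"] },
})
puts props.inspect
```

Converting to UTC on write and back to a local offset on read keeps comparisons between stored values independent of where the writer was running.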
  56. 56. Querying the graph: Cypher
      • declarative query language specific to neo4j
      • easy to learn and intuitive
      • lets the user specify patterns to query for (something that looks like ‘this’)
      • inspired partly by SQL (WHERE and ORDER BY) and SPARQL (pattern matching)
      • focuses on what to query for and not how to query for it
      • the switch from a MySQL world is made easier by using cypher instead of having to learn a traversal framework straight away
  57. 57. cypher clauses
      • START: Starting points in the graph, obtained via index lookups or by element IDs.
      • MATCH: The graph pattern to match, bound to the starting points in START.
      • WHERE: Filtering criteria.
      • RETURN: What to return.
      • CREATE: Creates nodes and relationships.
      • DELETE: Removes nodes, relationships and properties.
      • SET: Sets values on properties.
      • FOREACH: Performs updating actions once per element in a list.
      • WITH: Divides a query into multiple, distinct parts.
  58. 58. an example graph
      Node 1: me, Node 2: Steve, Node 3: Sam, Node 4: David, Node 5: Megan
      me -[:knows]-> Steve -[:knows]-> David
      me -[:knows]-> Sam -[:knows]-> Megan
      Megan -[:knows]-> David
  59. 59. the query
      START me=node(1)
      MATCH me-[:knows]->()-[:knows]->fof
      RETURN fof
  60. 60. START me=node(1)
      MATCH me-[:knows*2..]->fof
      WHERE fof.name =~ 'Da.*'
      RETURN fof
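
To make the two queries concrete, here is a plain-Ruby sketch that walks the same example graph by hand. It does not use Neo4j or Cypher; `KNOWS`, `friends_of_friends` and `two_or_more_hops` are hypothetical names. The second helper assumes the graph has no cycles (true for this example) and returns each person once, whereas Cypher returns one row per matching path.

```ruby
# The example graph as an adjacency list of outgoing :knows relationships.
KNOWS = {
  "me"    => ["Steve", "Sam"],
  "Steve" => ["David"],
  "Sam"   => ["Megan"],
  "Megan" => ["David"],
  "David" => [],
}

# Exactly two hops, like: me-[:knows]->()-[:knows]->fof
def friends_of_friends(start)
  KNOWS.fetch(start, []).flat_map { |friend| KNOWS.fetch(friend, []) }.uniq
end

# Two or more hops, like: me-[:knows*2..]->fof
# Assumes an acyclic graph; a cycle would keep the loop running.
def two_or_more_hops(start)
  found = []
  frontier = KNOWS.fetch(start, []) # people one hop away
  until frontier.empty?
    frontier = frontier.flat_map { |person| KNOWS.fetch(person, []) }.uniq
    found |= frontier
  end
  found
end

# Equivalent of WHERE fof.name =~ 'Da.*' (the pattern must match the whole name).
matching = two_or_more_hops("me").grep(/\ADa.*\z/)
puts friends_of_friends("me").inspect
puts matching.inspect
```

Writing the traversal out by hand shows what the declarative pattern saves: the Cypher version states the shape of the path and leaves the walking to the database.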
  61. 61. a good place to try it out: http://console.neo4j.org/
  62. 62. representing dates/times
      [diagram: a date tree. root (0) links to Year nodes (2013, 2014), Year nodes link to Month nodes (05, 06, 01), Month nodes link to Day nodes (24, 25, 26); the relationship types are the year, month and day values. Day nodes link to Event 1, Event 2 and Event 3 via "happens" relationships.]
  63. 63. find all events on a specific day
      START root=node(0)
      MATCH root-[:`2013`]-()-[:`05`]-()-[:`24`]-()-[:happens]-event
      RETURN event
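
The lookup by year, month and day can be mimicked with nested hashes in plain Ruby, where each hash key plays the role of a relationship type in the date tree. This is a sketch, not how the data is stored in Neo4j: `DATE_TREE` and `events_on` are hypothetical names, and which event hangs off which day is assumed, since the diagram is not recoverable from the transcript.

```ruby
require "date"

# year -> month -> day -> events; each key stands in for a relationship type.
# The assignment of events to days is an assumption for this example.
DATE_TREE = {
  "2013" => {
    "05" => {
      "24" => ["Event 1"],
      "25" => ["Event 2"],
      "26" => ["Event 3"],
    },
  },
  "2014" => { "01" => {}, "06" => {} },
}

# Walk root -> year -> month -> day and return the events found there.
def events_on(date)
  path = [date.strftime("%Y"), date.strftime("%m"), date.strftime("%d")]
  DATE_TREE.dig(*path) || []
end

puts events_on(Date.new(2013, 5, 24)).inspect
```

Because the path has a fixed length of three steps, the cost of the lookup does not depend on how many events are stored elsewhere in the tree.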
  64. 64. representing dates/times
      [diagram: the same date tree, with "next" relationships added between consecutive Day nodes]
  65. 65. find all events for a given range
      START root=node(0)
      MATCH root-[:`2013`]-()-[:`05`]-()-[:`24`]-start,
            root-[:`2013`]-()-[:`05`]-()-[:`26`]-end,
            start-[:next*0..]-middle-[:next*0..]-end,
            middle-[:happens]-event
      RETURN event
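
The range query relies on the "next" relationships that chain the day nodes together. A plain-Ruby sketch of the same idea uses a linked list of days; `Day`, `build_days` and `events_between` are hypothetical names, and the event-to-day assignment is assumed for the example.

```ruby
# A day node with its events and a pointer to the following day
# (the "next" relationship in the date tree).
Day = Struct.new(:label, :events, :next_day)

# Build the chain of days from an ordered hash of label => events.
def build_days(events_by_day)
  days = events_by_day.map { |label, events| Day.new(label, events, nil) }
  days.each_cons(2) { |earlier, later| earlier.next_day = later }
  days
end

# Collect events from the first day up to and including the last day by
# following next_day. If last is never reached, the walk ends at the
# end of the chain.
def events_between(first, last)
  events = []
  day = first
  while day
    events.concat(day.events)
    break if day.equal?(last)
    day = day.next_day
  end
  events
end

days = build_days({
  "2013-05-24" => ["Event 1"],
  "2013-05-25" => ["Event 2"],
  "2013-05-26" => ["Event 3"],
})
puts events_between(days.first, days.last).inspect
```

The walk only touches the days inside the range, which is the same locality argument made for graph queries earlier in the talk.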
  66. 66. representing dates/times
      [diagram: the same date tree with "next" relationships; Event 1 is node 20]
  67. 67. does an event happen on a certain date?
      START event=node(20)
      MATCH event-[:happens]-()-[:`24`]-()-[:`05`]-()-[:`2013`]-()
      RETURN event
  68. 68. testing and importing:
      • we use rspec for all tests on the api and practice tdd/bdd
      • setting up ‘scenarios’ for an integration test was difficult and slow with existing tools
      • we decided to build our own dsl, based on the geoff notation developed by Nigel Small, to set up scenarios and to import data from mysql
  69. 69. geoff:
      developed by Nigel Small (@technige, http://geoff.nigelsmall.net/)
      allows modelling of graphs in a human readable form:
        (A) {"name": "Alice"}
        (B) {"name": "Bob"}
        (A)-[:KNOWS]->(B)
      and provides a java interface to insert them into an existing graph
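
As a rough illustration of the notation, the following plain-Ruby sketch emits the three geoff lines shown on the slide. It is not part of the geoff library or of Shutl's gems: `geoff_node` and `geoff_relationship` are hypothetical helpers, and the JSON is generated without the space after the colon, which does not change its meaning.

```ruby
require "json"

# Render a node line: a name in parentheses followed by its properties as JSON.
def geoff_node(name, properties)
  "(#{name}) #{JSON.generate(properties)}"
end

# Render a directed, typed relationship between two named nodes.
def geoff_relationship(from, type, to)
  "(#{from})-[:#{type}]->(#{to})"
end

lines = [
  geoff_node("A", { "name" => "Alice" }),
  geoff_node("B", { "name" => "Bob" }),
  geoff_relationship("A", "KNOWS", "B"),
]
puts lines.join("\n")
```

Generating the text form from existing records is one way an import from another database could be staged before loading it into the graph.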
  70. 70. geoff-importer gem (https://github.com/shutl/geoff-importer)
      • imports any geoff file into a neo4j db
      • it is open source
  71. 71. geoff gem (https://github.com/shutl/geoff)
      • provides a dsl for creating a graph and inserting it into the db
      • it is open source
      • it works together with FactoryGirl (https://github.com/thoughtbot/factory_girl)
      • it currently supports only the graph structure of the neo4j gem
      • we haven’t solved all the issues with event listeners yet
  72. 72. geoff gem (https://github.com/shutl/geoff)
      Geoff(Company, Person) do
        company 'Acme' do
          address "13 Something Road"
          outgoing :employees do
            person 'Geoff'
            person 'Nigel' do
              name 'Nigel Small'
            end
          end
        end
        company 'Github' do
          outgoing :customers do
            person 'Tom'
            person 'Dick'
            person 'Harry'
          end
        end
        person 'Harry' do
          incoming :customers do
            company 'NeoTech'
          end
        end
      end
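  73. 73. [diagram of the resulting graph: a root node with :company and :person nodes; company nodes Acme (13 Something Road), NeoTech and GitHub and person nodes Geoff, Nigel Small, Tom, Dick and Harry, each attached via :all; :employees and :customers relationships between companies and people]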
  74. 74. QUESTIONS?
      Volker Pacher
      volker@shutl.com
      www.shutl.com
