dependency-solver

  1. Introducing the Dependency Solver: a method for scheduling access to a shared resource
     Colin Horne (cdfh), Lokku Ltd, December 4, 2010
  2. About
     This talk describes a small module we use at Lokku: Algorithm::DependencySolver. It is on CPAN: http://search.cpan.org/~cdfh
  3. About Lokku
     Lokku builds vertical search engines; mostly, we work on Nestoria. Users search for property near a particular location, and for this we need geodata. We use data from Yahoo! GeoPlanet, but we do many transformations on the data before it’s ready for use. That’s what the geobuild does.
  4. The Geobuild: a case study for using the dependency solver
     The geobuild is part of Lokku’s GIS (Geographical Information System) and performs various operations on geodata. Each operation takes the entire geodata as its input and outputs the entire geodata as its output; i.e., each operation is a transformation of the geodata. A mathematician might say that each operation is an endomorphism: a map from the geodata to itself.
  5. The Geodata: a case study
     Each country has its own geodata and its own set of operations to perform on it, though most operations are used by more than one country. The geodata for a single country is stored in a table with a large number of columns and a large number of rows. Each row represents a geographical area of interest, i.e., a datum in the geodata. Each operation reads from and writes to only a small subset of the table’s columns.
  6. The Store
     The Store is the backend database: a persistent object interface that uses Berkeley DB internally and implements permissions on the columns. Permissions and dependency relations (depends/affects) are one and the same, so the Store enforces the dependencies. Note: these permissions are not related to security.
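Since Lokku’s Store is not public, here is only a minimal, hypothetical sketch (in Python, not Perl) of the idea that column permissions and depends/affects relations are the same thing. The names `PermissionedTable`, `TableView` and `ColumnPermissionError` are invented for illustration, and Berkeley DB persistence is left out: an operation receives a view that refuses to read or write any column it has not declared.

```python
class ColumnPermissionError(Exception):
    """Raised when an operation touches a column it did not declare."""


class PermissionedTable:
    """Hypothetical in-memory stand-in for a Store: a list of row dicts."""

    def __init__(self, rows):
        self.rows = rows  # e.g. [{"A": 1, "B": 0}, ...]

    def view(self, depends, affects):
        """Give an operation access limited to its declared columns."""
        return TableView(self, frozenset(depends), frozenset(affects))


class TableView:
    def __init__(self, table, depends, affects):
        self._table = table
        self._depends = depends  # columns the operation may read
        self._affects = affects  # columns the operation may write

    def read(self, row, column):
        if column not in self._depends:
            raise ColumnPermissionError(f"undeclared read of column {column!r}")
        return self._table.rows[row][column]

    def write(self, row, column, value):
        if column not in self._affects:
            raise ColumnPermissionError(f"undeclared write to column {column!r}")
        self._table.rows[row][column] = value


# A hypothetical operation that depends on column A and affects column B.
table = PermissionedTable([{"A": 1, "B": 0}])
view = table.view(depends={"A"}, affects={"B"})
view.write(0, "B", view.read(0, "A") * 2)  # allowed by the declarations
print(table.rows)  # [{'A': 1, 'B': 2}]
```

Calling `view.read(0, "B")` here would raise `ColumnPermissionError`, because the operation never declared that it depends on B; this is the sense in which the declared relations double as permissions.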
  7. The Store
     The Store isn’t part of the dependency solver; it’s too tightly coupled to Lokku’s internal codebase to release on CPAN. Users of the dependency solver will have to write their own Store :-( These slides provide some ideas of what features to include.
  8. The Dependency Solver: what makes it different?
     CPAN already has Algorithm::Dependency, so what makes the dependency solver any different? Most dependency-related modules assume nodes depend on other nodes. The key difference is that the dependency solver assumes nodes (or, rather, “operations”) depend on shared resources, which they can also affect (i.e., modify). In the context of this talk, the resources are the geodata table’s columns.
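To make the contrast with node-to-node dependency modules concrete, here is a minimal Python sketch of deriving an operation ordering from shared resources. It is not the Algorithm::DependencySolver API (which is Perl); `Operation` and `resource_edges` are hypothetical names. Each operation declares what it depends on and what it affects, and an edge X → Y is derived whenever X affects a resource that Y depends on, meaning X must run before Y.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Operation:
    name: str
    depends: frozenset = field(default_factory=frozenset)  # resources read
    affects: frozenset = field(default_factory=frozenset)  # resources written


def resource_edges(operations):
    """Return {(x, y)} where x affects a resource y depends on (x runs first).

    x == y is included, so an operation that writes a resource it also reads
    produces a self-loop.
    """
    edges = set()
    for x in operations:
        for y in operations:
            if x.affects & y.depends:
                edges.add((x.name, y.name))
    return edges


# Hypothetical example: operation 1 writes column B, which operation 2 reads.
ops = [
    Operation("1", depends=frozenset({"A"}), affects=frozenset({"B"})),
    Operation("2", depends=frozenset({"B"}), affects=frozenset({"C"})),
]
print(sorted(resource_edges(ops)))  # [('1', '2')]
```

Note that the user never says "2 depends on 1"; the ordering falls out of the declared resources.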
  9. Operations and Resources
     [Diagram: operations connected to resources by depends/affects relations]
  10. Introducing some notation
      [Diagram: one operation shown as Reads → Writes over columns A B C D E; the highlighting did not survive extraction]
      This diagram shows how the table’s columns are affected by some operation. The highlighted columns on the left are those read by the operation, and those on the right are those written by the operation.
  11. Introducing some notation
      [Diagram: the same Reads → Writes row over columns A B C D E]
      All other columns are ignored by the operation. The operation in this example doesn’t care about columns B, C, D or E; to this operation, the table has only one column: A.
  12. Introducing some notation
      [Diagram: three Reads → Writes rows over columns A B C D E]
      Each row represents a single operation; i.e., in this example there are three operations.
  13. Scheduling operations
      [Diagram: five Reads → Writes rows over columns A B C D E]
      In which order should these operations be run? Can we infer a correct order from the information above?
  14-15. Operations: assumptions
      Whenever a column’s data is changed, all operations which read that column must be re-run. There must be no cycles, i.e., an operation must not directly or indirectly cause itself to be re-run; therefore, an operation must not write to a column which it reads from.
      Under these assumptions, one can only schedule n operations, where n is the number of columns in the table; any more would cause a cycle. Not very useful... Relaxing these rules allows for a more useful tool.
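The no-cycles assumption can be checked mechanically. Below is a minimal Python sketch of cycle detection on the operation graph using a three-colour depth-first search; the function name `find_cycle` and the input format (a list of operations plus a set of (before, after) edges derived from depends/affects) are assumptions for illustration. An operation that writes a column it reads shows up as a self-loop, the shortest possible cycle.

```python
# Hypothetical helper; not part of Algorithm::DependencySolver.
def find_cycle(nodes, edges):
    """Return a list of nodes forming a cycle, or None if the graph is acyclic.

    edges is a set of (before, after) pairs.
    """
    successors = {n: [] for n in nodes}
    for a, b in edges:
        successors[a].append(b)

    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    colour = {n: WHITE for n in nodes}
    path = []

    def visit(node):
        colour[node] = GREY
        path.append(node)
        for nxt in successors[node]:
            if colour[nxt] == GREY:  # back edge: nxt is already on the path
                return path[path.index(nxt):]
            if colour[nxt] == WHITE:
                cycle = visit(nxt)
                if cycle:
                    return cycle
        path.pop()
        colour[node] = BLACK
        return None

    for n in nodes:
        if colour[n] == WHITE:
            cycle = visit(n)
            if cycle:
                return cycle
    return None


print(find_cycle(["x"], {("x", "x")}))  # ['x']: x writes a column it reads
print(find_cycle([1, 2, 3], {(1, 2), (2, 3)}))  # None
```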
  16. Operation sequences
      We can let the user declare that an operation depends on other operations, as well as on shared resources. By doing so, we can break any cycles which might occur. This creates a “relative ordering”.
  17. Breaking the cycle
      Suppose some dependency configuration results in the following dependency graph (each node is an operation):
      [Dependency graph over operations 1, 2 and 3; edges not recoverable]
      By saying $node3.depends($node2), we obtain the following dependency graph:
      [Dependency graph over operations 1, 2 and 3, drawn in the order 1, 2, 3]
  18. Breaking the cycle
      Suppose some dependency configuration results in the following dependency graph (each node is an operation):
      [Dependency graph over operations 1, 2 and 3; edges not recoverable]
      By saying $node2.depends($node3), we obtain the following dependency graph:
      [Dependency graph over operations 1, 2 and 3, drawn in the order 1, 3, 2]
  19-21. An example of scheduling
      [Table: operations 1, 2 and 3, each shown as Reads → Writes over columns A B C D E]
      It would also be valid for 3 to run before 2. We can create a dependency graph to reflect this:
      [Dependency graph over operations 1, 2 and 3, drawn as 2, 1, 3; edges not recoverable]
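Once the dependency graph is acyclic, any topological order of it is a valid run order, which is why both 1, 2, 3 and 1, 3, 2 can be acceptable. A minimal Python sketch using Kahn’s algorithm follows. The function name is hypothetical, and since the slide’s column highlighting did not survive, the example simply assumes that operations 2 and 3 each must run after operation 1. Sorting the ready operations only makes the output deterministic; another tie-break would give an equally valid order.

```python
from collections import deque


# Hypothetical helper; not part of Algorithm::DependencySolver.
def topological_order(nodes, edges):
    """Kahn's algorithm: return nodes so that every (before, after) edge holds."""
    indegree = {n: 0 for n in nodes}
    successors = {n: [] for n in nodes}
    for before, after in edges:
        successors[before].append(after)
        indegree[after] += 1

    ready = deque(sorted(n for n in nodes if indegree[n] == 0))
    order = []
    while ready:
        node = ready.popleft()
        order.append(node)
        for nxt in sorted(successors[node]):
            indegree[nxt] -= 1
            if indegree[nxt] == 0:  # all of nxt's prerequisites have run
                ready.append(nxt)

    if len(order) != len(nodes):
        raise ValueError("dependency graph contains a cycle")
    return order


# Assumed edges: operations 2 and 3 both run after operation 1.
print(topological_order([1, 2, 3], {(1, 2), (1, 3)}))  # [1, 2, 3]
```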
  22-25. An example of scheduling
      [Table: operations in the order 2, 1, 3, each shown as Reads → Writes over columns A B C D E]
      What happens if we run operation 2 before 1? Column B gets written to without a subsequent read: operation 2 sees an old value for column B.
  26-27. An example of scheduling
      [Table: operations in the order 2, 1, 2, 3, each shown as Reads → Writes over columns A B C D E]
      Column B gets written to without a subsequent read, so the first run of operation 2 sees an old value for column B. However, the schedule is still valid provided operation 2 is re-run afterwards, as it is here. But the first run of operation 2 now has no effect.
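The situation on slides 22 to 27 can be detected by replaying a schedule against the declared reads and writes. The minimal Python sketch below (function name and columns are hypothetical; it assumes, for illustration, that operation 1 writes column B, which operation 2 reads) reports runs whose input is later overwritten, and checks the re-run rule from slide 14: after every write, each scheduled operation that reads the written column must run again.

```python
# Hypothetical helper; not part of Algorithm::DependencySolver.
def analyse_schedule(schedule, reads, writes):
    """Replay a schedule (a list of operation ids, repeats allowed).

    reads/writes map each operation id to the set of columns it reads/writes.
    Returns (stale_runs, valid):
      stale_runs: (position, op) for each run that read a column which some
                  later run in the schedule overwrites (wasted work);
      valid:      True if, after every write, each scheduled operation that
                  reads the written column runs again later.
    """
    stale_runs = []
    for i, op in enumerate(schedule):
        later_writes = set().union(*(writes[o] for o in schedule[i + 1:]))
        if reads[op] & later_writes:
            stale_runs.append((i, op))

    valid = all(
        reader in schedule[j + 1:]
        for j, writer in enumerate(schedule)
        for reader in set(schedule)
        if writes[writer] & reads[reader]
    )
    return stale_runs, valid


# Hypothetical columns: operation 1 writes B, which operation 2 reads.
reads = {1: {"A"}, 2: {"B"}, 3: {"A"}}
writes = {1: {"B"}, 2: {"C"}, 3: {"D"}}

print(analyse_schedule([2, 1, 3], reads, writes))     # ([(0, 2)], False)
print(analyse_schedule([2, 1, 2, 3], reads, writes))  # ([(0, 2)], True)
```

The second schedule is valid, but its first run of operation 2 is reported as stale: it has no effect, exactly as the slide says.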
  28-29. Relative orderings
      [Table: operations 1, 2 and 3, each shown as Reads → Writes over columns A B C D]
      How do we schedule this? Operation 2 must be run after operations 1 and 3, because 2 reads column B, which 1 and 3 write to. But operation 3 must be run after operation 2, because 3 reads column C, which 2 writes to. The dependency graph therefore contains a cycle between operations 2 and 3:
      [Dependency graph over operations 1, 2 and 3]
  30. Order matters
      To avoid cycles, specify an ordering. Operation 2 before operation 3 gives the order 1, 2, 3; operation 3 before operation 2 gives the order 1, 3, 2.
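Slides 28 to 30 can be reproduced in a few lines of Python. This is an assumption about how a relative ordering might be applied, not a description of Algorithm::DependencySolver’s internals, and `schedule` is a hypothetical name. It derives writer-to-reader edges from slide 28 (only the columns that slide mentions: 1 and 3 write B, 2 reads B, 2 writes C, 3 reads C), lets each explicit (before, after) ordering replace the conflicting reverse edge, and hands the result to the standard library’s `graphlib.TopologicalSorter`.

```python
from graphlib import CycleError, TopologicalSorter  # standard library, Python 3.9+

# Slide 28, restricted to the columns the slide spells out (B and C).
READS = {1: set(), 2: {"B"}, 3: {"C"}}
WRITES = {1: {"B"}, 2: {"C"}, 3: {"B"}}


# Hypothetical helper; not part of Algorithm::DependencySolver.
def schedule(reads, writes, orderings=()):
    """Return a run order, or raise CycleError if none exists.

    Edges go from a writer to every operation that reads what it writes.
    Each explicit (before, after) ordering removes the edge after -> before
    and adds before -> after.
    """
    edges = {(w, r) for w in writes for r in reads if writes[w] & reads[r]}
    for before, after in orderings:
        edges.discard((after, before))
        edges.add((before, after))

    # TopologicalSorter expects {node: set of predecessors}.
    graph = {op: {w for (w, r) in edges if r == op} for op in reads}
    return list(TopologicalSorter(graph).static_order())


try:
    schedule(READS, WRITES)
except CycleError:
    print("no order without a relative ordering: 2 and 3 form a cycle")

print(schedule(READS, WRITES, orderings=[(2, 3)]))  # [1, 2, 3]
print(schedule(READS, WRITES, orderings=[(3, 2)]))  # 2 runs last; 1 and 3 are unordered
```

With "3 before 2", operations 1 and 3 are not ordered relative to each other, so either 1, 3, 2 or 3, 1, 2 may come out; the slide draws the former.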
  31. The dependency solver
      That’s it; that’s the dependency solver. The dependency solver itself is very simple, but it has interesting applications:
      Automatic parallelisation, which is guaranteed to be free of concurrency-related bugs, provided the Store enforces every operation’s declared depends/affects relations.
      Ease of managing complex systems: add another operation, and it just works! There is no need to understand what unrelated operations do.
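One way to read "automatic parallelisation" is to run operations in waves: every operation whose predecessors have all finished can run concurrently with the others in its wave. The Python sketch below is an illustration under stated assumptions, not the module’s behaviour: `parallel_waves` is a hypothetical name, edges are (before, after) pairs, and it also keeps two operations that write the same column out of the same wave, because writer-to-reader edges alone do not order two writers of one column.

```python
# Hypothetical helper; not part of Algorithm::DependencySolver.
def parallel_waves(writes, edges):
    """Group operations into waves that can run concurrently.

    writes maps each operation to the set of columns it writes;
    edges is a set of (before, after) pairs forming an acyclic graph.
    """
    predecessors = {op: set() for op in writes}
    for before, after in edges:
        predecessors[after].add(before)

    done, waves = set(), []
    while len(done) < len(writes):
        wave, written = [], set()
        for op in sorted(writes):
            if op in done or not predecessors[op] <= done:
                continue  # already run, or still waiting on a predecessor
            if writes[op] & written:
                continue  # another op in this wave writes the same column
            wave.append(op)
            written |= writes[op]
        if not wave:
            raise ValueError("dependency graph contains a cycle")
        waves.append(wave)
        done.update(wave)
    return waves


# Hypothetical: 2 and 3 both wait on 1 and write different columns.
print(parallel_waves({1: {"B"}, 2: {"C"}, 3: {"D"}}, {(1, 2), (1, 3)}))  # [[1], [2, 3]]
```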
  32. Done
      Lokku is hiring: programming interns and an engineering manager. lokku.com/jobs
