
Coordinating distributed task execution


Describes technologies for distributed task execution and highlights the difference between the grid and client-server models.

Published in: Technology

  1. Coordinating distributed task execution. Vitalii Tymchyshyn, [email_address]
  2. What will we be talking about: You have a large set of tasks or a task stream
  3. Each task is relatively large; its processing is CPU-intensive and requires multiple algorithms
  4. The goal is to maximize overall performance, not to minimize the processing time of a single task
  5. Real-life example: The task is web crawling with content analysis
  6. Complex artificial intelligence algorithms are used; each release changes the resource consumption profile
  7. Some algorithms take a single page, others take a domain as a whole
  8. Target: 1000s of domains, 100000s of pages per day
  9. Simple way: Have multiple VMs
  10. Each VM fully processes its own set of tasks
  11. Each task has its working set at hand
  12. Scalability with clustering. But: Each algorithm's initialization data takes memory while only one algorithm is running
  13. An algorithm may require only a small part of the task data
  14. Task processing may at some point be split and processed in parallel
  15. Solution: Clustering
  16. Example: web domain processing. Steps: get domain data; mark cities (needs world city index); detect addresses; define primary address
  17. Example: cluster setup. Primary processor: reads the task and performs primary address detection. City mark processor: needs memory for the city index, works page by page, fast. Address detector: works page by page; slow, but you can run many of these because of their low memory footprint
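
The processing steps and cluster roles in the two examples above can be sketched as a small pipeline: the per-page stages run in worker pools, and the whole-domain stage runs once at the end. This is a minimal illustration under stated assumptions, not the author's implementation. The function names, the tiny `CITY_INDEX`, and the "most frequent address wins" rule are all hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import List, Optional

# Hypothetical stand-in for the world city index (the memory-heavy part).
CITY_INDEX = {"Kyiv", "Lviv", "Odesa"}


def mark_cities(page: str) -> List[str]:
    """Per-page, fast: return the known cities mentioned on the page."""
    return [word for word in page.replace(",", " ").split() if word in CITY_INDEX]


def detect_addresses(page: str, cities: List[str]) -> List[str]:
    """Per-page, slow in real life: here, any line that mentions a marked city."""
    return [line.strip() for line in page.splitlines()
            if any(city in line for city in cities)]


def define_primary_address(addresses: List[str]) -> Optional[str]:
    """Whole-domain step: assume the most frequent address is the primary one."""
    if not addresses:
        return None
    return Counter(addresses).most_common(1)[0][0]


def process_domain(pages: List[str], detector_workers: int = 4) -> Optional[str]:
    # One worker holds the city index; many cheap detector workers run in parallel.
    with ThreadPoolExecutor(max_workers=1) as city_pool, \
         ThreadPoolExecutor(max_workers=detector_workers) as detector_pool:
        cities_per_page = list(city_pool.map(mark_cities, pages))
        addresses_per_page = list(
            detector_pool.map(detect_addresses, pages, cities_per_page))
    all_addresses = [a for found in addresses_per_page for a in found]
    return define_primary_address(all_addresses)
```

In a real cluster each pool would be a separate process or node, so that only the city mark processor pays the memory cost of the index.
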
  18. Cluster in focus: 10-20 hardware nodes
  19. FreeBSD OS with jails in use, so no multicast
  20. Oracle Grid Engine (formerly SGE) as the cluster process controller
  21. Complex, memory-consuming tasks with JNI (crashes, long GC)
  22. Grid vs Client-Server: Grid, Client-Server
  23. Grid vs Client-Server. Grid: Latency is two times lower
  24. A lot more connections
  25. Everyone is watching everyone else
  26. Complex cluster membership change procedure
  27. Client-Server: More robust. Servers can be clustered
  28. Central point for debugging
  29. No overhead of everyone watching for dead nodes
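
A rough count makes the connection and latency trade-off concrete. The sketch below assumes a fully connected grid, where every node keeps a link to every other node, and a client-server setup where each client holds one connection to one server of a small server cluster. Both are simplifying assumptions, and the function names are illustrative only.

```python
def grid_connections(nodes: int) -> int:
    """Full mesh: every pair of nodes shares one connection."""
    return nodes * (nodes - 1) // 2


def client_server_connections(clients: int, servers: int) -> int:
    """One connection per client, plus a full mesh among the servers."""
    return clients + servers * (servers - 1) // 2


def hops(model: str) -> int:
    """Network hops for one node to pass a message to another."""
    if model == "grid":
        return 1  # node -> node
    if model == "client-server":
        return 2  # client -> server -> client
    raise ValueError(f"unknown model: {model}")
```

For 20 nodes the full mesh needs 190 connections, while 20 clients with 3 clustered servers need 23; the price is the extra hop through the server.
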
  30. Hazelcast: One of the few free data grids
  31. Has built-in computing grid abstractions
  32. Good support from developers
  33. But:
  34. Bugs in unusual use cases
  35. Simply did not work reliably in our environment
  36. GridGain: May fit like a glove
  37. You'd better not try to make a mitten out of a glove
  38. Heavy use of annotations: problems with runtime configuration
  39. Weakened interfaces: here is a shotgun, there is your foot
  40. Unsafe vendor support
  41. ZooKeeper: The thing we are using now
  42. Low-level primitives, yet there are third-party libraries with high-level ones
  43. Client-server architecture
  44. Clustered servers for stability and read scalability
  45. No write scalability
  46. Part of the Hadoop ecosystem
  47. Schema. Components: task coordinators (several), task data storage (HDFS), named prioritized task queue (ZooKeeper), queue processor
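
One way to picture the named prioritized task queue is the priority-queue pattern often used with ZooKeeper: each task becomes a child node whose name encodes its priority followed by an increasing sequence number, and a consumer takes the child that sorts first. The sketch below imitates this in memory and does not call ZooKeeper. The class, the `task-PP-NNNNNNNNNN` naming, and the HDFS-style payload references are assumptions for illustration; the slides do not say how the author's queue is actually laid out.

```python
import itertools
from typing import Dict, Optional, Tuple


class NamedPriorityQueues:
    """In-memory imitation of queues stored as sorted child nodes."""

    def __init__(self) -> None:
        self._queues: Dict[str, Dict[str, str]] = {}
        self._sequence = itertools.count()

    def put(self, queue: str, payload_ref: str, priority: int = 50) -> str:
        """Add a task; lower priority numbers (0-99) are served first."""
        if not 0 <= priority <= 99:
            raise ValueError("priority must be between 0 and 99")
        name = f"task-{priority:02d}-{next(self._sequence):010d}"
        self._queues.setdefault(queue, {})[name] = payload_ref
        return name

    def take(self, queue: str) -> Optional[Tuple[str, str]]:
        """Remove and return the (name, payload_ref) that sorts first, or None."""
        children = self._queues.get(queue, {})
        if not children:
            return None
        first = min(children)  # zero-padded names sort by priority, then arrival
        return first, children.pop(first)
```

Keeping only a reference in the queue entry matches the split in the schema, where the bulky task data lives in HDFS and the coordination state stays small.
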
  48. Features: Automatic coordinator/processor failure detection
  49. Reliable task storage
  50. Fully configurable load balancing/timeout/retry policy
  51. Large storage block reuse
  52. Easy queue monitoring
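
The failure detection and timeout/retry features can be illustrated with a lease-based sketch: a processor claims a task for a limited time, and the coordinator requeues tasks whose lease has expired, up to a retry limit. This is an assumption about how such a policy might look, not the author's implementation. `RetryPolicy`, `Coordinator`, and their fields are hypothetical names, and the current time is passed in explicitly to keep the example deterministic.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class RetryPolicy:
    timeout_seconds: float = 60.0
    max_attempts: int = 3


@dataclass
class Coordinator:
    policy: RetryPolicy
    pending: List[str] = field(default_factory=list)
    leases: Dict[str, float] = field(default_factory=dict)   # task -> lease deadline
    attempts: Dict[str, int] = field(default_factory=dict)   # task -> claims so far
    failed: List[str] = field(default_factory=list)

    def claim(self, now: float) -> Optional[str]:
        """Hand the next pending task to a processor and start its lease."""
        if not self.pending:
            return None
        task = self.pending.pop(0)
        self.attempts[task] = self.attempts.get(task, 0) + 1
        self.leases[task] = now + self.policy.timeout_seconds
        return task

    def complete(self, task: str) -> None:
        """The processor finished in time: drop the lease."""
        self.leases.pop(task, None)

    def reap(self, now: float) -> None:
        """Treat expired leases as processor failures: requeue or give up."""
        for task, deadline in list(self.leases.items()):
            if now >= deadline:
                del self.leases[task]
                if self.attempts[task] < self.policy.max_attempts:
                    self.pending.append(task)
                else:
                    self.failed.append(task)
```

A crashed JNI task never calls `complete`, so its lease simply runs out and the next `reap` puts the task back in the queue.
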
  53. Typical configuration. Queues: wait queue, large queue, tiny queue. Components: task coordinators; task distribution & IO; domain-to-page demultiplexer; per-page task processors (several); whole-domain task processor
  54. Author contacts: Vitalii Tymchyshyn, [email_address], Skype: vittymchyshyn, Twitter: @tivv00