Data array processing with Java language


  1. Data array processing. Vitalii Tymchyshyn, [email_address], http://michaelgr.com/
  2. Data array processing. Agenda:
     - What will we be talking about
     - Real-life example
     - Main problems to be solved
     - Configurable task processing
     - Clustering solutions
     - Grid vs client-server
     - Controlling CPU usage
     Vitalii Tymchyshyn, [email_address]
  9. What will we be talking about
     - You have a large set of tasks, or a task stream
     - Each task is relatively large; its processing is CPU-intensive and requires multiple algorithms
     - The goal is to maximize overall throughput, not to minimize the processing time of a single task
  12. Real-life example
     - The task is web crawling with content analysis
     - Complex artificial-intelligence algorithms are used; each release changes the resource-consumption profile
     - Some algorithms take a single page, others take a domain as a whole
     - Target: thousands of domains, hundreds of thousands of pages per day
  16. Main problems to be solved
     - Make single-task processing configurable, so that algorithms are independent and easily extended or replaced
     - Make the solution scalable and solid
     - Make it use the equipment fully, but without overload
  19. 1. Configurable task processing
     - The most popular approach is an IoC container, e.g. Spring
     - Another option is dataflow: beans do not call each other layer by layer, but are invoked by the container one by one, taking input and producing output
     - Let's compare the two options on "Hello, world" and see whether the second has any benefits
  22. "Hello, world" with IoC (diagram of beans calling each other: Daytime retriever, Greeting text reader, Greeting printer, Answer printer, User input reader, Hello World runner)
  23. "Hello, world" as graph dataflow (nodes: Read greeting text, Check day time, Print greeting, Read user input, Print answer)
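The dataflow variant above can be sketched in a few lines of Java. This is a minimal illustration, not code from the talk: the `Node` record and the type-matching "container" loop are my own names, invented to show the core idea that nodes never call each other; the container wires one node's output to the next node's input by data type.

```java
import java.util.List;
import java.util.function.Function;

// A minimal sketch of the graph-dataflow idea: independent nodes, driven
// by a tiny "container" that matches outputs to inputs by data type.
public class DataflowHello {

    // A node consumes a value of one type and produces a value of another.
    record Node(String name, Class<?> input, Class<?> output,
                Function<Object, Object> body) {}

    static String run() {
        List<Node> nodes = List.of(
            new Node("read greeting", Void.class, String.class, in -> "Hello"),
            new Node("print greeting", String.class, String.class, in -> in + ", world!")
        );
        // The container drives the flow; no node knows about any other node.
        Object value = null;
        Class<?> current = Void.class;
        for (Node n : nodes) {
            if (n.input() == current) {
                value = n.body().apply(value);
                current = n.output();
            }
        }
        return (String) value;
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints "Hello, world!"
    }
}
```

Because each step only declares the type it consumes and produces, steps can be reordered, replaced, or moved to another machine without touching the others, which is the decoupling claimed on the next slide.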
  24. Graph dataflow pros and cons
     Pros:
     - Full decoupling
     - Easy parallel processing, clustering, and savepoints
     - Automatic flow management
     - A single call to get data needed in many places
     - Data types instead of interfaces
  29. Graph dataflow pros and cons
     Cons:
     - IoC is still good for managing common resources and complex nodes :)
  30. Our graph vs BPM
     - Lightweight
     - Connections are data; no central storage
     - Targeted at small (minutes to hours) automated CPU-intensive jobs, with subtasks from milliseconds to minutes
     - Highly configurable clustering
  34. Conclusion: with graph dataflow we have the algorithm's parts as independent blocks. Time to use these blocks to fill our equipment efficiently.
  35. 2. Scalability with clustering
     The simple way is:
     - to have multiple VMs,
     - each fully processing its own set of tasks,
     - each task keeping its whole working set at hand
  38. 2. Scalability with clustering
     But:
     - Each algorithm's initialization data occupies memory even while only one algorithm is running
     - An algorithm may need only a small part of the task data
     - Task processing may at some point be split and done in parallel
     Solution: clustering
  42. Example: web domain processing (flow: Get domain data, Mark cities (needs world city index), Detect addresses, Define primary address)
  43. Example: cluster setup
     - Primary processor: reads tasks and performs primary address detection
     - City-mark processor: needs memory for the city index; works page by page, fast
     - Address detector: works page by page; slow, but you can run many of them thanks to the low memory footprint
  44. Cluster in focus
     - 10-20 hardware nodes
     - FreeBSD OS with jails in use, so no multicast
     - Oracle Grid Engine (formerly SGE) as the cluster process controller
     - Complex, memory-consuming tasks with JNI (crashes, long GC pauses)
  48. Two faces of the cluster Janus
     Data cluster:
     - Good for storing task data
     - Can be replaced with a central data warehouse, but scalability will suffer
     - Better kept separate from the computing VM if the computation is complex
     - Can also perform computing-cluster functions
     Computing cluster:
     - Good for running tasks from multiple task producers
     - Can be grid-based or client-server
     - Multiple small clusters may be better than one large one
  54. Hazelcast
     - One of the few free data grids
     - Has built-in computing-grid abstractions
     - Good support from the developers
     but:
     - Bugs in unusual usage patterns
     - Simply did not work reliably in our environment
  60. GridGain
     - May fit like a glove
     - But you'd better not make a mitten out of a glove
     - Heavy use of annotations: problems with runtime configuration
     - Weakened interfaces: here is a shotgun, and there is your foot
     - Unreliable vendor support
  65. ZooKeeper
     - The thing we are using now
     - Low-level primitives, but third-party libraries provide higher-level ones
     - Client-server architecture
     - Clustered servers for stability and read scalability
     - No write scalability
     - Part of Hadoop
  71. HDFS
     - Has a single point of failure
     - Name-node memory requirements grow linearly with the number of files
     - Uses TCP (don't forget to tune the OS TCP stack)
     - Much like a Unix file system
     - Again, part of Hadoop
  76. Two types of the time wall
     Time = CPU time + wait + latency
     - External wait is managed with cooperative multitasking (discussed later)
     - Latency is vital for interactive services, but has low priority for data processing
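The CPU-time vs wait split above is easy to observe from inside a JVM. The sketch below is my own illustration (not from the talk): it uses the standard `ThreadMXBean.getCurrentThreadCpuTime()` to measure CPU time for the current thread, so wall time minus CPU time approximates the wait component.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

// Measures wall time vs CPU time for a task that computes, then waits.
// The gap between the two is the "wait" term from the slide's formula.
public class TimeWall {

    // Returns {wallMillis, cpuMillis} for a compute-then-sleep task.
    static long[] measure() throws InterruptedException {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        long wallStart = System.nanoTime();
        long cpuStart = mx.getCurrentThreadCpuTime();

        long sum = 0;                                 // CPU-bound part
        for (int i = 0; i < 50_000_000; i++) sum += i;
        Thread.sleep(200);                            // external wait (stands in for IO)

        long cpuMs = (mx.getCurrentThreadCpuTime() - cpuStart) / 1_000_000;
        long wallMs = (System.nanoTime() - wallStart) / 1_000_000;
        return new long[] { wallMs, cpuMs };
    }

    public static void main(String[] args) throws InterruptedException {
        long[] t = measure();
        System.out.println("wall=" + t[0] + "ms cpu=" + t[1]
                + "ms wait~=" + (t[0] - t[1]) + "ms");
    }
}
```

A data-processing scheduler can afford to fill the wait gap with other tasks, which is exactly what the cooperative multitasking in part 3 does.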
  78. Grid vs client-server
  79. Grid vs client-server
     Grid:
     - Half the latency
     - Many more connections
     - Everyone is watching everyone
     - Complex cluster-membership change procedure
     Client-server:
     - More robust
     - Servers can be clustered
     - Central point for debugging
     - No "watching the dead" overhead for every node
  86. Conclusion: now our tasks are spread across our equipment. Time to prevent resource overload!
  87. 3. Controlling CPU usage
     - Low load means processing power goes unused
     - High load means that:
       - Parallel tasks take memory unnecessarily
       - System time is high because of context switches
       - CPU caches are split between the switching tasks
       - Total throughput is lower
  92. Parallel vs sequential on a single CPU
     Sequential (LA=1): average finish time = (1+2+3+4)/4 = 2.5
     Parallel (LA=4): average finish time = (4+4+4+4)/4 = 4
  93. Popular options to process tasks
     Thread pool:
     - Multiple tasks are processed at a time by different threads
     - There should be enough threads to keep the CPU busy while some are blocked
     - There should not be too many threads, so as not to overload the CPU
     Event-based:
     - One task is processed at a given time
     - There must be no blocking
     - Any blocking call must be replaced with a callback / event
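The thread-pool option maps directly onto the JDK's `ExecutorService`. This sketch (task and numbers are mine, not from the talk) shows the usual sizing rule from the slide: a fixed pool bounded by the CPU count, with the queue, not extra threads, absorbing the backlog of tasks:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// A fixed thread pool sized to the CPU count: enough threads to use the
// CPU, not so many that context switching and per-thread memory dominate.
public class PoolDemo {

    static long sumOfSquares(int n) throws Exception {
        int cpus = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(cpus);
        try {
            List<Future<Long>> results = new ArrayList<>();
            for (int i = 1; i <= n; i++) {
                final long v = i;
                results.add(pool.submit(() -> v * v)); // CPU-bound subtask
            }
            long sum = 0;
            for (Future<Long> f : results) sum += f.get(); // blocks until done
            return sum;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(sumOfSquares(100)); // 338350
    }
}
```

The event-based alternative would instead submit a callback per subtask and never call the blocking `Future.get()`, trading the simple procedural flow above for the inversion of control the slide warns about.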
  98. Thread pool vs event-based
     Thread pool:
     - Pros:
       - Simple procedural model
       - Lots of libraries and frameworks
     - Cons:
       - Context-switch storms
       - Per-thread memory
       - Average speed
     Event-based:
     - Pros:
       - Optimal pool size
       - No memory overhead for waiting threads
     - Cons:
       - More complex event-based programming
       - Few supporting libraries and frameworks
  104. Introducing cooperative multitasking
     - Much like the thread-pool variant, but:
     - Each wait (IO) is signaled to the multitasking coordinator
     - Only one thread can be in the no-wait state; another thread exiting a wait will block on a mutex
     - If the system is overloaded (the mutex is always taken), new tasks are not started
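The coordinator described above can be approximated with a single fair semaphore permit standing in for "the CPU". This is my own sketch of the scheme, with invented names; the talk's actual implementation is not shown. A thread holds the permit while computing and releases it around every wait, so at most one thread is in the no-wait state per coordinator:

```java
import java.util.concurrent.Semaphore;

// Sketch of a cooperative-multitasking coordinator: one fair permit
// represents the CPU. Threads hold it only while computing, never while
// waiting, so exactly one thread at a time is in the no-wait state.
public class CoopCoordinator {

    private final Semaphore cpu = new Semaphore(1, true);

    // Run a CPU-bound step; blocks until the single compute slot is free.
    public void compute(Runnable step) throws InterruptedException {
        cpu.acquire();
        try {
            step.run();
        } finally {
            cpu.release(); // signal the coordinator before any wait
        }
    }

    // Overload check: if the compute slot cannot be grabbed immediately,
    // the system is saturated and new tasks should not be started.
    public boolean canAcceptNewTask() {
        if (cpu.tryAcquire()) {
            cpu.release();
            return true;
        }
        return false;
    }

    public static void main(String[] args) throws InterruptedException {
        CoopCoordinator c = new CoopCoordinator();
        System.out.println(c.canAcceptNewTask()); // true: nothing is computing
        c.compute(() -> System.out.println("computing"));
        System.out.println(c.canAcceptNewTask()); // true again after release
    }
}
```

Blocking calls (IO, sleeps) are simply made outside `compute(...)`, which is the "each wait is signaled to the coordinator" rule: releasing the permit before the wait and re-acquiring it afterwards.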
  108. Cooperative multitasking features
     - Still a simple procedural model
     - Controlled CPU usage
     - Low waiting-thread memory usage, because graph dataflow has no layered calls
     - All regular frameworks and libraries remain available
     - Dynamic thread-pool size
  113. Author contacts: Vitalii Tymchyshyn, [email_address], Skype: vittymchyshyn
