(Slides) Task scheduling algorithm for multicore processor system for minimizing recovery time in case of single node fault


Shohei Gotoda, Naoki Shibata and Minoru Ito : "Task scheduling algorithm for multicore processor system for minimizing recovery time in case of single node fault," Proceedings of IEEE International Symposium on Cluster Computing and the Grid (CCGrid 2012), pp.260-267, DOI:10.1109/CCGrid.2012.23, May 15, 2012.

In this paper, we propose a task scheduling algorithm for a multicore processor system that reduces the recovery time in case of a single fail-stop failure of a multicore processor. Many recently developed processors have multiple cores on a single die, so one failure of a computing node results in the failure of many processors. When a multicore processor fails, all tasks that have been executed on it have to be recovered at once. The proposed algorithm is based on an existing checkpointing technique, and we assume that state is saved when nodes send results to the next node. If a series of computations that depends on earlier results is executed on a single die, all parts of that series have to be executed again when the processor fails. The proposed scheduling algorithm therefore tries not to concentrate tasks on the processors of a single die. We designed our algorithm as a parallel algorithm that achieves O(n) speedup, where n is the number of processors. We evaluated our method using simulations and experiments with four PCs. Compared with an existing scheduling method, in the simulation the execution time including recovery time in the case of a node failure was reduced by up to 50%, while the overhead in the case of no failure was a few percent in typical scenarios.

  • Recently, almost all processors are designed as multicore processors, and they are commonly used in datacenters.
    On the other hand, a computing cluster consisting of 1800 nodes experiences about 1000 failures in its first year, according to a cnet.com article.
  • So, the objective of this research is to devise a task scheduling method that minimizes recovery time
    taking account of fault tolerance of multicore processors and network contention.
  • Now I define some terms used in our research.
    A task graph is a group of tasks that can be executed in parallel.
    A vertex of a task graph is a task to be executed on a single CPU core.
    Each edge represents data dependence between these tasks.
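The task graph described above can be sketched as a small DAG structure; the class and field names here are ours, not the paper's, and the costs are illustrative:

```python
from dataclasses import dataclass, field

@dataclass
class TaskGraph:
    # comp[v]: computation time of task v on a single CPU core
    comp: dict = field(default_factory=dict)
    # edges[(u, v)]: amount of data task u sends to task v (data dependence)
    edges: dict = field(default_factory=dict)

    def successors(self, u):
        # tasks that depend on the output of task u
        return [v for (a, v) in self.edges if a == u]

# an illustrative three-task graph: tasks 1 and 2 feed task 3 (made-up costs)
g = TaskGraph(comp={1: 4, 2: 4, 3: 6}, edges={(1, 3): 2, (2, 3): 2})
print(g.successors(1))  # [3]
```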
  • Processor graph is the topology of the computing system.
    Each round vertex represents a CPU core.
    Each rectangular vertex represents a switch that does not have computing capability.
  • The task scheduling problem, which assigns a processor node to each task node, is NP-hard.
    In this figure, processor 1 is assigned to this task graph node.
  • The inputs of the task scheduling problem are these two graphs,
    and output is a schedule, which is an assignment of a processor node to each task node.
    The objective function is usually to minimize the total task execution time.
  • Our proposed method takes account of network contention based on the model
    proposed by Oliver Sinnen. In this model, if a processor link is occupied by another communication,
    there is communication delay.
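A minimal sketch of that rule (our own simplification, not Sinnen's full model): each processor link serializes the communications routed over it, so a message waits until the link becomes free:

```python
def schedule_comm(link_free_at, link, ready, duration):
    """Book a message on a processor link; it starts no earlier than
    `ready` and no earlier than the time the link becomes free."""
    begin = max(ready, link_free_at.get(link, 0))
    link_free_at[link] = begin + duration
    return begin

free = {}
first = schedule_comm(free, "link-1", ready=0, duration=3)
second = schedule_comm(free, "link-1", ready=1, duration=3)  # contends with the first
print(first, second)  # 0 3
```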
  • A multicore processor is modeled like this.
    We assign a task to each of the cores.
    Since the cores share the main memory, we assume that communication between cores finishes instantaneously.
    A network interface is shared among the cores, so one die of a multicore processor is modeled like this graph.
    We assume that all cores on a die stop simultaneously in case of a fault.
  • There is a need for considering multicore processors in scheduling.
    Since the communication link among cores on a die has high bandwidth, existing schedulers try to utilize this link to minimize the total
    execution time. As a result, many dependent tasks are assigned to cores on a single die.

    But if a failure happens, many dependent tasks and their results are destroyed at once.
  • I now explain related work. A checkpointing technique is proposed in [3].
    In that paper, node state is saved in each node, and recovery is performed from these saved states.

  • As far as we surveyed, there is no existing method for scheduling that takes account of both
    multicore processor failure and network contention.

    So, we proposed a scheduling method taking account of both of these things.
  • I now explain the assumptions made in our research.
    We assume that only a single fail-stop failure of a multicore processor can occur.
    The failed node automatically restarts after 30 seconds by rebooting.
    Failure can be detected in one second by interruption of heartbeat signals.
    We use a checkpointing technique for recovery.
    We use the network contention model proposed by Oliver Sinnen.
  • As for checkpointing and recovery,
    we assume that each processor node saves state to the main memory when each task is finished.
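The rule above can be sketched as follows; the function names and the toy tasks are ours, and only each task's output data is kept as its checkpoint:

```python
checkpoints = {}

def run_task(task_id, fn, *inputs):
    result = fn(*inputs)
    checkpoints[task_id] = result  # the output data doubles as the saved state
    return result

def recover(task_id, fn, *input_ids):
    # a task is re-executed only if its own saved output was lost
    if task_id in checkpoints:
        return checkpoints[task_id]
    return run_task(task_id, fn, *(checkpoints[i] for i in input_ids))

run_task(1, lambda: 4)                       # Task 1 finishes; output saved
run_task(2, lambda: 6)                       # Task 2 finishes; output saved
# the die running Task 3 fails before Task 3 checkpoints; 1 and 2 survive
print(recover(3, lambda a, b: a + b, 1, 2))  # only Task 3 runs again: 10
```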
  • Our method reduces the recovery time in case of a failure
    by minimizing the worst-case total execution time,
    that is, the worst case over all possible patterns of failure
    of the sum of the execution time before and after the failure.

    Our method is based on Sinnen's method, so it takes account of network contention.
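The objective can be sketched numerically; the per-die time estimates below are hypothetical stand-ins for values a scheduler would derive from a concrete schedule:

```python
def worst_case_total_time(dies, exec_before, recovery):
    # worst case over all single-die fail-stop failures:
    # (execution time before the failure) + (recovery time after it)
    return max(exec_before[d] + recovery[d] for d in dies)

t = worst_case_total_time(dies=["die0", "die1"],
                          exec_before={"die0": 5, "die1": 5},
                          recovery={"die0": 8, "die1": 3})
print(t)  # 13: die0 failing is the worst case here
```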
  • I now explain the worst-case scenario of a failure.
    The critical path is the path in the task graph from the first to the last task with the longest execution time.
    The worst case is that all tasks on the critical path are assigned to processors on one die
    and the failure happens while the last task is being executed; then twice the total execution time is needed.
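For reference, the critical path length of a task DAG can be computed as the longest path by computation time; this sketch ignores communication times and uses our own toy graph:

```python
from functools import lru_cache

def critical_path_length(comp, succ):
    # comp: task -> execution time; succ: task -> list of dependent tasks
    @lru_cache(maxsize=None)
    def longest_from(t):
        return comp[t] + max((longest_from(s) for s in succ.get(t, [])),
                             default=0)
    return max(longest_from(t) for t in comp)

# tasks 1 and 2 feed task 3, as in the schedule examples in this deck
print(critical_path_length(comp={1: 4, 2: 4, 3: 6}, succ={1: [3], 2: [3]}))  # 10
```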
  • Basically, scheduling is performed with the same method as the existing one.
    In the existing method's schedule, given a task graph like this,
    Tasks 1 and 2 can be executed in parallel, so they are assigned to cores A and B, yielding this schedule.
    Task 3 is then assigned within the same multicore processor, because the faster on-die link shortens the processing time;
    it goes to A or B, in this case A.
  • However, if a failure occurs while task node 3 is being executed, A and B are on the same multicore processor,
    so the data of all of task nodes 1, 2 and 3 is lost, and Tasks 1, 2 and 3 must be redone from the beginning.
  • In the proposed method, as shown earlier, a failure during Task 3 causes the largest increase in processing time,
    so task node 3 is assigned to a processor different from its parents'.
    In this case it is assigned to C, on a different multicore processor from A and B, so communication time is incurred for Task 3.
  • In the proposed method, if a failure occurs while task node 3 is being executed, as before,
    the tasks are not concentrated on the same processor as in the existing method, and task nodes 1 and 2 survive.
    Therefore only Task 3 needs to be reprocessed, and the processing time is shortened.
  • When no fail-stop failure occurs,
    the task processing time of the proposed method increases slightly due to communication time,
    because assigning tasks to a different processor has a cost.
  • In this way, the proposed method accepts the communication time when no fail-stop failure occurs as overhead
    in exchange for a shorter computation time when a fail-stop failure does occur.

    Therefore, the larger the computation time is relative to the communication time,
    the larger the fraction of time saved, and the more advantageous the proposed method is when a failure occurs.
  • The colors of the task nodes in the input task graph correspond to the colors in the schedule.
    Task nodes on the critical path of the task graph are shown in reddish colors.
    In the existing schedule, the critical-path task nodes are concentrated on a single processor to reduce communication time and shorten the task processing time.
    When a failure occurs, the results of all of these task nodes are lost, so redoing them from scratch nearly doubles the task processing time.
    In the proposed method, the reddish critical-path task nodes are assigned so that other processors are also used to some extent,
    so even if a failure occurs, not everything has to be redone, which shortens the task processing time.
  • Suppose a failure occurs on the multicore processor holding processors 0 and 1;
    the parts shown in black are unavailable because of the failure.
    In the existing method, the failure occurs just before the last task node on the critical path finishes processing.
    To reprocess, we try to resume using task nodes 7 and 9, but they were on the same processor,
    so their results are also lost and unusable.
    Tracing back in this way, ultimately every task on the critical path has to be redone.

    In the proposed method, the critical-path task nodes are distributed to some extent, so not everything has to be redone,
    which shortens the processing time.
  • (Timing note: being at about 12 minutes here is on track.)
  • All task graphs used in the experiments were picked from the Standard Task Graph Set, which is publicly available as a benchmark for task scheduling.


    # Parallelism : 4.363796
  • Parallelism = (sum of all task processing times) / (critical path length).

    # Parallelism : 15.868853
  • CCR = communication to computation ratio

    (sum of all task processing times)/(critical path length).

    # Parallelism : 15.868853
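The two metrics quoted in these notes can be written directly; the numbers fed in below are illustrative, not the benchmark's:

```python
def parallelism(comp_times, critical_path_length):
    # (sum of all task processing times) / (critical path length)
    return sum(comp_times) / critical_path_length

def ccr(total_comm_time, total_comp_time):
    # communication-to-computation ratio; high CCR = communication heavy
    return total_comm_time / total_comp_time

print(parallelism([4, 4, 6], critical_path_length=10))  # 1.4
print(ccr(total_comm_time=140, total_comp_time=14))     # 10.0
```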
  • CCR is the ratio between the computation time and communication time of a task graph; the higher the CCR, the longer the communication time.

    The interleaved method has a large communication overhead, and at CCR = 10 it is considerably worse than the existing method.
    The proposed method tends to be most effective on task graphs with low CCR, but even at CCR = 10,
    although the overhead when no failure occurs is on the large side, it still shortens the task processing time when a failure occurs.
  • (Slides) Task scheduling algorithm for multicore processor system for minimizing recovery time in case of single node fault

    1. 1. Task scheduling algorithm for multicore processor system for minimizing recovery time in case of single node fault 1 Shohei Gotoda†, Naoki Shibata‡, Minoru Ito† †Nara Institute of Science and Technology ‡Shiga University
    2. 2. Background • Multicore processors  Almost all processors designed recently are multicore processors • Computing cluster consisting of 1800 nodes experiences about 1000 failures[1] in the first year after deployment [1] Google spotlights data center inner workings cnet.com article on May 30, 2008
    3. 3. Objective of Research • Fault tolerance  We assume a single fail-stop failure of a multicore processor • Network contention  To generate schedules reproducible on real systems 3 Devise new scheduling method that minimizes recovery time taking account of the above points
    4. 4. Task Graph • A group of tasks that can be executed in parallel • Vertex (task node) Task to be executed on a single CPU core • Edge (task link) Data dependence between tasks 4 Task node Task link Task graph
    5. 5. Processor Graph • Topology of the computer network • Vertex (Processor node) CPU core (circle) • has only one link Switch (rectangle) • has more than 2 links • Edge (Processor link) Communication path between processors 5 Processor node Processor linkSwitch Processor graph 321
    6. 6. Task Scheduling • Task scheduling problem  assigns a processor node to each task node  minimizes total execution time An NP-hard problem 6 1 One processor node is assigned to each task node 321 Processor graph Task graph
    7. 7. Inputs and Outputs for Task Scheduling • Inputs Task graph and processor graph • Output A schedule • which is an assignment of a processor node to each task node • Objective function Minimize task execution time 7 3 31 31 321 Processor graph Task graph
    8. 8. Network Contention Model • Communication delay If processor link is occupied by another communication • We use existing network contention model[2] 8 3 31 32 Contention 321 Processor graph Task graph [2] O. Sinnen and L.A. Sousa, “Communication Contention in Task Scheduling,“ IEEE Trans. Parallel and Distributed Systems, vol. 16, no. 6, pp. 503-515, 2005.
    9. 9. Multicore Processor Model • Each core executes a task independently from other cores • Communication between cores finishes instantaneously • One network interface is shared among all cores on a die • If there is a failure, all cores on a die stop execution simultaneously 9 Core1 Core2 CPU 21 Processor graph
    10. 10. Influence of Multicore Processors 10 • Need for considering multicore processors in scheduling High speed communication link among processors on a single die • Existing schedulers try to utilize this high speed link • As a result, many dependent tasks are assigned to cores on a single die 3 31 32 321 Assigned to cores on a same die Processor graph Task graph
    11. 11. • Need for considering multicore processors in scheduling High speed communication link among processors on a single die • Existing schedulers try to utilize this high speed link • As a result, many dependent tasks are assigned to cores on a single die In case of fault • Dependent tasks tends to be destroyed at a time 11 3 31 32 321 Processor graph Task graph Influence of Multicore Processors Assigned to cores on a same die
    12. 12. Related Work (1/2) • Checkpointing [3] Node state is saved in each node Backup node is allocated Recover processing results from saved state Multicore is not considered Network contention is not considered 12 [3] Y. Gu, Z. Zhang, F. Ye, H. Yang, M. Kim, H. Lei, and Z. Liu. An empirical study of high availability in stream processing systems. In Middleware ’09: the 10th ACM/IFIP/USENIX International Conference on Middleware (Industrial Track), 2009. 1 2 3 4 Input Queue Output Queue Secondary Primary Backup
    13. 13. Related Work (2/2) • Task scheduling method[5] in which  Multiple task graph templates are prepared beforehand,  Processors are assigned according to the templates • This method is suitable for highly loaded systems [5] Wolf, J., et al.: SODA: An Optimizing Scheduler for Large- Scale Stream-Based Distributed Computer Systems. In: ACM Middleware (2008)
    14. 14. Our Contribution • There is no existing method for scheduling that takes account of both • multicore processor failure • network contention • We propose a scheduling method taking account of network contention and multicore processor failure 14
    15. 15. Assumptions • Only a single fail-stop failure of a multicore processor can occur Failed computing node automatically restart after 30 sec. • Failure can be detected in one second by interruption of heartbeat signals • Use checkpointing technique to recover from saved state • Network contention Contention model is same as the Sinnen’s model 15
    16. 16. Checkpointing and Recovery • Each processor node saves state to the main memory when each task is finished  Saved state is the data transferred to the succeeding processor nodes  Only output data from each task node is saved as a state • This is much smaller than the complete memory image  We assume saving state finishes instantaneously • Since this is just copying small data within memory • Recovery  Saved state which is not affected by the failure is found in the ancestor task nodes. Some tasks are executed again using the saved state 16 [3] Y. Gu, Z. Zhang, F. Ye, H. Yang, M. Kim, H. Lei, and Z. Liu. An empirical study of high availability in stream processing systems. In Middleware ’09: the 10th ACM/IFIP/USENIX International Conference on Middleware (Industrial Track), 2009.
    17. 17. What Proposed Method Tries to Do • Reduce recovery time in case of failure  Minimizes the worst case total execution time • Worst case in the all possible patterns of failure • Each of dies can fail  Execution time before failure + recovery
    18. 18. Worst Case Scenario • Critical path Path in task graph from first to last task with longest execution time • The worst case scenario All tasks in critical path are assigned to processors on a die Failure happens when the last task is being executed We need two times of total execution time 18 Example task graph First Last
    19. 19. Idea of Proposed Method • We distribute tasks on critical path over dies But, there is communication overhead If we distribute too many tasks, there is too much overhead • Usually, the last tasks in critical path have larger influence We check tasks from the last task in the critical path We find the last k tasks in the critical path to other dies We find the best k
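Slide 19's search for the best k can be sketched as a simple argmin; `worst_case_time` stands in for evaluating a full schedule in which the last k critical-path tasks are moved to other dies, and the cost function used here is a made-up stand-in:

```python
def best_k(max_k, worst_case_time):
    # try every k (number of trailing critical-path tasks moved off-die)
    # and keep the one minimising the worst-case total execution time
    return min(range(max_k + 1), key=worst_case_time)

# toy trade-off: moving more tasks adds communication overhead (2 * k) but
# shrinks the work lost to a single-die failure (16 / (k + 1))
k = best_k(4, lambda k: 2 * k + 16 / (k + 1))
print(k)  # 2
```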
    20. 20. Problem with Existing Method 20 1 2 3 A B C 21 3 BA Resulting execution Existing Schedule D DC • Task 1 is assigned to core A • Task 2 is assigned to core B • Task 3 is assigned to same die • because of high communication speed Time
    21. 21. • Suppose that failure happens when Task 3 is being executed • All results are lost 21 1 2 3 A B C 21 3 BA D DC Resulting execution Existing Schedule Time Problem with Existing Method
    22. 22. Problem with Existing Method 22 1 2 3 A B C 21 3 BA D DC 1’ 2’ 3’ 21 3 Resulting execution Existing Schedule Time • Suppose that failure happens when Task 3 is being executed • All results are lost • We need to execute all tasks again from the beginning on another die
    23. 23. Improvement in Proposed Method • Distribute influential tasks to other dies In this case, task 3 is the most influential 23 21 3 Proposed schedule 1 2 3 A B C BA Resulting execution D DC Comm. overhead Time
    24. 24. Recovery in Proposed Method • Suppose that failure happens when Task 3 is being executed • Results of Task 1 and 2 are saved 24 21 3 1 2 3 A B C BA D DC Resulting execution Time Proposed schedule
    25. 25. Recovery in Proposed Method • Suppose that failure happens when Task 3 is being executed • Results of Task 1 and 2 are saved • Execution can be continued from the saved state 25 3’ 21 3 1 2 3 A B C BA D DC 3 Resulting execution Time Proposed schedule
    26. 26. Communication Overhead • Communication overhead is imposed to the proposed method 26 Existing schedule Proposed schedule overhead 1 2 3 A B C D 1 2 3 A B C D Time
    27. 27. Speed-up in Recovery 27 Recovery with existing schedule Recovery with proposed schedule Proposed method has larger effect if computation time is longer than communication time 1 2 3 A B C D 1 2 3 A B C D 1’ 2’ 3’ 3’ speed-up Time
    28. 28. Comparison of Schedules 28 Existing schedule Proposed schedule Time Time Task graph 10 32 Processor graph 1 2 6 7 3 4 8 9 5 1 0 1 1 1 2 1 3
    29. 29. 29 Not available Comparison of Recovery Existing schedule Proposed schedule Time Time Task graph 10 32 Processor graph 1 2 6 7 3 4 8 9 5 1 0 1 1 1 2 1 3
    30. 30. Evaluation • Items to compare Recovery time in case of a failure Overhead in case of no failure • Compared methods PROPOSED CONTENTION • Sinnen’s method considering network contention INTERLEAVED • Scheduling algorithm that tries to spread tasks to all dies as much as possible 30
    31. 31. Test Environment • Devices 4 PCs with • Intel Core i7 920 (2.67GHz) (Quad core) • Intel Network Interface Card  Intel Gigabit CT Desktop Adaptor (PCI Express x1) • 6.0GB Memory • Program to measure execution time • Windows 7(64bit) • Java(TM) SE Runtime Environment (64bit) • Standard TCP socket 31
    32. 32. Task Graph with Low Parallelism Configuration • Number of task nodes:90 • Number of cores on a die:2 • Number of dies:2~4 • Robot control [4] 32 Task graph Processor graph 10 Die 1 Core Switch 4 5 Die # of dies 32 Die 6 7 Die [4] Standard Task Graph Set http://www.kasahara.elec.waseda.ac.jp/schedule/index.html
    33. 33. Results with Robot Control Task • We varied number of dies • In case of failure, proposed method reduced total execution time by 40% • In case of no failure, up to 6% of overhead 33 In case of a failure No failure 40% 6% Number of dies Number of dies CONTENTIONINTERLEAVED PROPOSED INTERLEAVED CONTENTION PROPOSED Executiontime(sec) Executiontime(sec)
    34. 34. Configuration • Number of task nodes:98 • Number of cores on a die:4 • Number of dies:2~4 • Sparse matrix solver [4] 34 10 Die 1 Core Switch 2 3 54 Die 6 7 # of dies Task Graph with High Parallelism Processor graph Task graph [4] Standard Task Graph Set http://www.kasahara.elec.waseda.ac.jp/schedule/index.html
    35. 35. Results with Sparse Matrix Solver • We varied number of dies • In case of failure, execution time including recovery reduced by up to 25% • In case of no failure, up to 7% of overhead 35 25% 7% In case of a failure No failure INTERLEAVEDINTERLEAVED CONTENTION CONTENTION PROPOSED PROPOSED Number of diesNumber of dies Executiontime(sec) Executiontime(sec)
    36. 36. Simulation with Varied CCR • CCR Ratio between comm. time and comp. time High CCR means long communication time • Number of tasks:50 • Number of cores on a die:4 • Number of dies:4 • Task graph 18 random graphs 10 Die 1 Core Switch 2 3 54 Die 6 7 # of dies Processor graph
    37. 37. • We varied CCR • INTERLEAVED has large overhead when CCR=10 (communication heavy) • PROPOSED has 30% overhead, but reduced execution time in case of no failure 37 5% 30% Results with Varied CCR In case of a failure No failure Executiontime(sec) Executiontime(sec) INTERLEAVED CONTENTION PROPOSED CONTENTION PROPOSED INTERLEAVED
    38. 38. Effect of Parallelization of Proposed Scheduler • Proposed algorithm is parallelized • Compared times to generate schedules 20 task graphs Multi thread vs Single Thread Speed-up : up to x4 38 Environment • Intel Core i7 920 (2.67GHz) • Windows 7(64bit) • Java(TM) SE 6 (64bit) Single thread Multi thread Timetogenerateschedule
    39. 39. Conclusion • Proposed task scheduling method considering Network contention Single fail-stop failure Multicore processor • Future work Evaluation on larger computer system 39
    40. 40. Shohei Gotoda, Naoki Shibata and Minoru Ito : "Task scheduling algorithm for multicore processor system for minimizing recovery time in case of single node fault," Proceedings of IEEE International Symposium on Cluster Computing and the Grid (CCGrid 2012), pp.260-267, 2012. DOI:10.1109/CCGrid.2012.23 40