Gabriele D’Angelo joint work with  Luciano Bononi Michele Bracuto Lorenzo Donatiello University of Bologna Department of C...
Presentation outline <ul><li>Goal and basic assumptions </li></ul><ul><li>State of the art </li></ul><ul><li>The Concurren...
Goal and basic assumptions <ul><li>Goal: to  increase the simulation speed , reduce the  Wall Clock Time (WCT)  required t...
MRIP: Multiple Replications in Parallel <ul><li>MRIP approach: launching  multiple runs  of independent sequential simulat...
Legacy PDES approach <ul><li>The whole simulation is composed by the execution of a  linear sequence of multiple parallel ...
Typical PDES problems <ul><li>Frequent synchronizations  are required among the model components </li></ul><ul><li>Each si...
The CR-PADS approach
Concurrent Replication of PADS (CR-PADS) <ul><li>The CR-PADS is a mechanism that  duplicates  the logical processes (LPs) ...
ARTìS: Advanced RTI System
The ARTìS and Replication architecture LP standard logical structure LP CR-PADS logical structure <ul><li>The Replication ...
Parallel and Distributed CR-PADS architecture <ul><li>PEU = Physical Execution Unit </li></ul><ul><li>A set of LPs on the ...
Performance evaluation: Ad-Hoc wireless network model <ul><li>Simulated model: </li></ul><ul><li>A set of  Simulated Mobil...
Ad-Hoc network model characterization <ul><li>Computation and communication issues: </li></ul><ul><li>The  computation  re...
Simulation test-beds <ul><li>Two different environments: </li></ul><ul><li>Parallel : a single Physical Execution Unit (PE...
Parallel environment: 500 SMHs M=#PEU N=#LP
Parallel environment: 1000 SMHs M=#PEU N=#LP
Parallel environment: 2000 SMHs M=#PEU N=#LP By increasing the number of SMHs we are increasing the percentage of local co...
Distributed environment: 500 SMHs M=#PEU N=#LP
Distributed environment: 1000 SMHs M=#PEU N=#LP
Distributed environment: 2000 SMHs M=#PEU N=#LP The average communication latency of distributed environments is at least ...
The overall picture The CR-PADS mechanism is able to give  significant speed-up  but can also introduce overhead when the ...
Performance analysis: processed events/sec The CR-PADS allows a  high event computation density But when the computation l...
Performance analysis: network throughput  (KB/sec) If CR-PADS is OFF the network is  under-utilized When the number of sim...
Conclusions: Concurrent Replication of PADS <ul><li>Under the Parallel (or Distributed) Discrete Event Simulation (PDES)  ...
Future work <ul><li>Further extend the ARTìS middleware </li></ul><ul><li>Improve the CR-PADS implementation </li></ul><ul...
Gabriele D’Angelo   [email_address] http://www.cs.unibo.it/~gdangelo University of Bologna Department of Computer Science ...
Upcoming SlideShare
Loading in …5
×

Concurrent Replication of Parallel and Distributed Simulations

1,990 views
1,903 views

Published on

Parallel and distributed simulations enable the analysis of complex systems by concurrently exploiting the aggregate computation power and memory of clusters of execution units. In this work we investigate a new direction for increasing both the speedup of a simulation process and the utilization of computation and communication resources. Many simulation-based investigations require to collect independent observations for a correct and significant statistical analysis of results. The execution of many independent parallel or distributed simulation runs may suffer the speedup reduction due to rollbacks under the optimistic approach, and due to idle CPU times originated by synchronization and communication bottlenecks under the conservative approach. We present a parallel and distributed simulation framework supporting concurrent replication of parallel and distributed simulations (CR-PADS), as an alternative to the execution of a linear sequence of multiple parallel or distributed simulation runs. Results obtained from tests executed under variable scenarios show that speedup and resource utilization gains could be obtained by adopting the proposed replication approach in addition to the pure parallel and distributed simulation.

Published in: Economy & Finance, Spiritual
0 Comments
6 Likes
Statistics
Notes
  • Be the first to comment

No Downloads
Views
Total views
1,990
On SlideShare
0
From Embeds
0
Number of Embeds
74
Actions
Shares
0
Downloads
0
Comments
0
Likes
6
Embeds 0
No embeds

No notes for slide

Concurrent Replication of Parallel and Distributed Simulations

  1. 1. Gabriele D’Angelo joint work with Luciano Bononi Michele Bracuto Lorenzo Donatiello University of Bologna Department of Computer Science PADS 2005 – Monterey (CA) Concurrent Replication of Parallel and Distributed Simulations
  2. 2. Presentation outline <ul><li>Goal and basic assumptions </li></ul><ul><li>State of the art </li></ul><ul><li>The Concurrent Replication of PADS (CR-PADS) mechanism </li></ul><ul><li>The ARTìS middleware </li></ul><ul><li>The CR-PADS logical structure and implementation </li></ul><ul><li>Performance evaluation: Ad Hoc network model </li></ul><ul><ul><li>Parallel and distributed environments </li></ul></ul><ul><li>Conclusions and future work </li></ul>
  3. 3. Goal and basic assumptions <ul><li>Goal: to increase the simulation speed , reduce the Wall Clock Time (WCT) required to complete the simulation runs </li></ul><ul><li>We focus on Parallel and Distributed Simulations based on conservative synchronization algorithms (time-stepped, CMB) </li></ul><ul><li>A typical simulation-based investigation requires to collect many independent observations for a correct and significant statistical analysis of results </li></ul><ul><li>And so we are interested in the WCT necessary to complete all the simulations runs (not a single run) </li></ul>
  4. 4. MRIP: Multiple Replications in Parallel <ul><li>MRIP approach: launching multiple runs of independent sequential simulations in parallel over a set of concurrent CPUs </li></ul><ul><li>Every simulation run is executed from the beginning up to the end on the same CPU ( monolithic sequential simulation ) </li></ul><ul><li>Some computational resources may remain unexploited </li></ul>
  5. 5. Legacy PDES approach <ul><li>The whole simulation is composed by the execution of a linear sequence of multiple parallel or distributed simulation run </li></ul><ul><li>The standard Parallel or Distributed event simulation (PDES) approach may introduce advantages: every independent run could exploit the whole computation architecture </li></ul><ul><li>A single run may complete in less time than a sequential run </li></ul>
  6. 6. Typical PDES problems <ul><li>Frequent synchronizations are required among the model components </li></ul><ul><li>Each simulation component swings between computation and communication phases </li></ul><ul><li>The whole set of processes advance with the speed of the slowest </li></ul><ul><li>Some phases (usually before the synchronization barriers) could be communication intensive and may led to network congestions (further increasing the communication overhead) </li></ul><ul><li>Our idea is to obtain a more fluent computation and communication by concurrently merging the execution of more PADS replicas </li></ul>
  7. 7. The CR-PADS approach
  8. 8. Concurrent Replication of PADS (CR-PADS) <ul><li>The CR-PADS is a mechanism that duplicates the logical processes (LPs) of PADS runs starting from the initialization phase of every single run </li></ul><ul><li>Every replica is based on the same model definition, realizes an independent execution based on local initial parameters, variable factors of the analysis and different random number generator seeds </li></ul><ul><li>CR-PADS is absolutely different from simulation cloning </li></ul><ul><li>The risk of this mechanism is to spend too much time in switching processes’ execution, in the creation of communication bottlenecks and live-locks, resulting in trashing effects </li></ul>
  9. 9. ARTìS: Advanced RTI System
  10. 10. The ARTìS and Replication architecture LP standard logical structure LP CR-PADS logical structure <ul><li>The Replication layer is the funnel for replicas over the RTI Communication layer. It is required to maintain transparency and to optimize the communication performances </li></ul>
  11. 11. Parallel and Distributed CR-PADS architecture <ul><li>PEU = Physical Execution Unit </li></ul><ul><li>A set of LPs on the same PEUs communicate via low latency Shared Memory (SHM) </li></ul><ul><li>Two or more LPs located on different hosts (i.e. no shared memory available) rely on standard TCP/IP connections </li></ul><ul><li>The whole structure is based on a set of inter-communicating threads </li></ul>
  12. 12. Performance evaluation: Ad-Hoc wireless network model <ul><li>Simulated model: </li></ul><ul><li>A set of Simulated Mobile Hosts (SMHs) </li></ul><ul><li>Mobility model: </li></ul><ul><ul><li>Random Mobility Motion model (RMM) </li></ul></ul><ul><ul><li>uncorrelated SMHs’ mobility </li></ul></ul><ul><li>Traffic model: </li></ul><ul><ul><li>ping messages (CBR) by every SMH to all neighbors within the wireless communication range (250 m) </li></ul></ul><ul><li>Propagation model </li></ul><ul><ul><li>open space (neighbor-SMHs within detection range) </li></ul></ul>
  13. 13. Ad-Hoc network model characterization <ul><li>Computation and communication issues: </li></ul><ul><li>The computation required for each SMH per time-step is in the order of O(#SMH ^ 2) : to determine the neighbor set </li></ul><ul><li>The communication required among SMHs is in the order of O(K*#SMH) per time-step, with K defined as a constant value based on SMHs density (assumed as constant) </li></ul>
  14. 14. Simulation test-beds <ul><li>Two different environments: </li></ul><ul><li>Parallel : a single Physical Execution Unit (PEU) </li></ul><ul><li>Distributed : a set of PEU, interconnected by a Fast Ethernet LAN (100 Mb/sec) </li></ul><ul><li>Each PEU is an Intel Dual Xeon Pentium IV 2800 MHz, with 3 GB RAM, Debian GNU/Linux OS with kernel version 2.6.x </li></ul><ul><li>Conservative time-stepped simulation: 300 time-steps </li></ul>
  15. 15. Parallel environment: 500 SMHs M=#PEU N=#LP
  16. 16. Parallel environment: 1000 SMHs M=#PEU N=#LP
  17. 17. Parallel environment: 2000 SMHs M=#PEU N=#LP By increasing the number of SMHs we are increasing the percentage of local computation with respect of the weight of communication (and synchronization) This means that we are pushing the computation to saturate all the CPU power
  18. 18. Distributed environment: 500 SMHs M=#PEU N=#LP
  19. 19. Distributed environment: 1000 SMHs M=#PEU N=#LP
  20. 20. Distributed environment: 2000 SMHs M=#PEU N=#LP The average communication latency of distributed environments is at least one order of magnitude bigger than the average latency experienced in the parallel architecture This would translate in better opportunities for CR-PADS to optimize the concurrent execution of many runs
  21. 21. The overall picture The CR-PADS mechanism is able to give significant speed-up but can also introduce overhead when the number of replicas is high and the computation saturates the CPUs It is worth to note that usually the number of required replicas is quite limited
  22. 22. Performance analysis: processed events/sec The CR-PADS allows a high event computation density But when the computation load increases we can see a trashing effect: there is not space for additional concurrency in the computation
  23. 23. Performance analysis: network throughput (KB/sec) If CR-PADS is OFF the network is under-utilized When the number of simulated SMHs increases the new bottleneck is the computation and not the communication
  24. 24. Conclusions: Concurrent Replication of PADS <ul><li>Under the Parallel (or Distributed) Discrete Event Simulation (PDES) frequent synchronizations are required among the model components </li></ul><ul><li>Every CPU swings between computation and idle , while the underlying communication infrastructure swings between idle and communication periods </li></ul><ul><li>A typical simulation-based investigation requires to collect many independent observations for a correct and significant statistical analysis of results </li></ul><ul><li>Proposed solution : launching multiple, independent and concurrent PDES runs over the system, with the goal to reduce idle computation and communication time </li></ul>
  25. 25. Future work <ul><li>Further extend the ARTìS middleware </li></ul><ul><li>Improve the CR-PADS implementation </li></ul><ul><li>Test the performances of the CR-PADS mechanism on CMB-synchronized models </li></ul><ul><li>Integrate the CR-PADS with our migration based framework ( GAIA, Generic Adaptive Interaction Architecture ) to improve load-balancing and dynamic model partitioning ( PADS 2003 ) </li></ul><ul><li>Apply the concurrent replication and simulation cloning mechanisms to High Performance Computing (HPC) environments </li></ul>
  26. 26. Gabriele D’Angelo [email_address] http://www.cs.unibo.it/~gdangelo University of Bologna Department of Computer Science PADS 2005, Monterey (CA) Concurrent Replication of Parallel and Distributed Simulations

×