OFC/NFOEC: GPU-based Parallelization of System Modeling


Check out Stephan Pachnicke's presentation on GPU-based Parallelization of System Modeling at OFC/NFOEC 2013.

Published in: Technology

  1. GPU-based Parallelization of System Modeling
     Stephan Pachnicke, 18.03.2013
  2. Outline
     • Motivation
     • Numerical System Modeling
     • GPU-Parallelization
     • Comparison of Speedup and Accuracy
     • Conclusion
     © 2013 ADVA Optical Networking. All rights reserved.
  3. Acknowledgments
     The author would like to acknowledge the help and contributions of:
     • Adam Chachaj – Krone Messtechnik
     • Heinrich Müller – TU Dortmund
     • Peter Krummrich – TU Dortmund
     • Markus Roppelt – ADVA Optical Networking
     • Michael Eiselt – ADVA Optical Networking
  4. Motivation
  5. In Short: Computational Performance
     Graphics Processing Unit (GPU) vs. CPU Cluster
  6. Increase in GFlop/s
     • GPU performance is growing even faster than predicted by Moore's law and is significantly higher than CPU performance
     • GPUs are also attractive for general-purpose computing (complex numerical simulations)
  7. Optical System Modeling
     • Simulation of (long-haul) optical transmission systems requires numerical solution of the nonlinear Schrödinger equation
       → High computational effort, because accurate simulation of nonlinear fiber effects requires small step sizes
     • Precise estimation of the bit error ratio with Monte-Carlo simulations for PMD and noise
       → Requires a high number of simulated bits
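For reference, one common textbook form of the scalar nonlinear Schrödinger equation is given below. The slides do not state the equation, so the symbols and the sign convention here are assumptions: A(z, t) is the slowly varying field envelope, z the propagation distance, t the retarded time, α the attenuation coefficient, β₂ the group-velocity dispersion parameter, and γ the nonlinear coefficient. PMD and noise, which the slide also mentions, are not included in this form.

```latex
\frac{\partial A}{\partial z}
  = -\frac{\alpha}{2} A
    - \frac{i \beta_2}{2} \frac{\partial^2 A}{\partial t^2}
    + i \gamma \lvert A \rvert^2 A
```

The first two terms on the right-hand side are linear in A; the last term is the Kerr nonlinearity.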
  8. Split-Step Fourier Method (SSFM)
     • Splits the nonlinear Schrödinger equation into linear and nonlinear parts
     • Solves the linear and nonlinear parts separately
     • Solves the linear part in the frequency domain and the nonlinear part in the time domain (acceptable for small step sizes)
     [Figure: chain of FFT and IFFT operations, with one split-step marked]
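The split-step idea described on this slide can be sketched in a few lines of plain Python. This is a minimal illustration and not the presenter's implementation: it assumes a scalar field, the sign convention dA/dz = -α/2·A - i·β₂/2·∂²A/∂t² + i·γ·|A|²·A, one full linear step followed by one full nonlinear step, and a small recursive radix-2 FFT written here only to keep the example self-contained. The function names (`fft`, `ifft`, `ssfm`) and all parameter values are illustrative assumptions.

```python
import cmath
import math


def fft(x, inverse=False):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    sign = 1.0 if inverse else -1.0
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def ifft(x):
    """Inverse FFT with 1/n normalization."""
    n = len(x)
    return [v / n for v in fft(x, inverse=True)]


def ssfm(a, dt, length, steps, beta2, gamma, alpha=0.0):
    """Propagate the sampled field `a` over `length` in `steps` split-steps."""
    n = len(a)
    h = length / steps
    # Angular frequencies in FFT order (non-negative first, then negative).
    omega = [2 * math.pi * (k if k < n // 2 else k - n) / (n * dt)
             for k in range(n)]
    # Linear operator (dispersion and loss) for one step, applied per frequency.
    lin = [cmath.exp((0.5j * beta2 * w * w - 0.5 * alpha) * h) for w in omega]
    for _ in range(steps):
        # Linear part in the frequency domain: FFT, multiply, IFFT.
        spectrum = fft(a)
        a = ifft([s * l for s, l in zip(spectrum, lin)])
        # Nonlinear part in the time domain: power-dependent phase rotation.
        a = [v * cmath.exp(1j * gamma * abs(v) ** 2 * h) for v in a]
    return a


# Illustrative values (assumptions): Gaussian pulse, 64 samples, 2 ps spacing.
n, dt = 64, 2e-12
t0, p0 = 10e-12, 0.1
pulse = [math.sqrt(p0) * cmath.exp(-((k - n // 2) * dt) ** 2 / (2 * t0 ** 2))
         for k in range(n)]
out = ssfm(pulse, dt, length=2e3, steps=20, beta2=-21.7e-27, gamma=1.3e-3)

energy_in = sum(abs(v) ** 2 for v in pulse) * dt
energy_out = sum(abs(v) ** 2 for v in out) * dt
print(energy_in, energy_out)
```

With α = 0 both sub-steps only rotate phases, so the pulse energy should be preserved up to rounding, which makes a convenient sanity check. Each split-step costs one FFT and one IFFT, which is why the FFT dominates both the run time and the numerical accuracy of the method.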
  9. Speedup Factor (GPU vs CPU)
     [Figure: speedup factor for single precision (SP) and double precision (DP). Legend: DP: Nvidia CUDA FFT; SP: FFT using pre-calculated twiddle factors]
     • Single precision arithmetic has much higher performance on GPU (because the main target group is computer gaming)
     • Longer block lengths allow better parallelization
       → Single precision implementation desirable
  10. Accuracy (in single precision)
      [Figure: FFT accuracy in single precision. Legend: CUFFT: Nvidia CUDA FFT; FFTW: Fastest Fourier Transform in the West; IPP: Intel Integrated Performance Primitives; LUT: LUT-based FFT (trigonometric functions precalculated in DP)]
      • Total accuracy of SSFM is dominated by FFT accuracy
      • Backward error grows linearly with increasing number of FFTs
      • CUDA FFT shows considerably higher error than other FFT implementations
  11. Analysis: Accuracy
      Why is the accuracy of CUFFT in SP relatively low?
      • FFT performance depends crucially on the accuracy of the "twiddle factors" (or trigonometric functions)
      • HW implementation of trigonometric functions in SP on GPUs is optimized for peak performance, not accuracy
      What can be done to increase accuracy in single precision?
      • Implementation of Taylor series expansion (slow!)
      • Compute trigonometric functions in DP on CPU and store them in a look-up table on the GPU (especially suited to the split-step Fourier method with thousands of FFTs of similar length)
      J. C. Schatzman, SIAM J. Scientific Comput. (1996).
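The look-up table idea from this slide can be sketched as follows. This is a CPU-only illustration under stated assumptions, not the GPU code from the talk: the twiddle factors are computed once in double precision, rounded to single precision (emulated here with the standard `struct` module), and then reused by every FFT of the same length. The helper names `to_single`, `make_twiddle_lut` and `fft_lut` are hypothetical, and the butterfly arithmetic itself still runs in Python's double precision.

```python
import cmath
import math
import struct


def to_single(x):
    """Round a Python float (double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def make_twiddle_lut(n):
    """Twiddle factors exp(-2*pi*i*k/n), k < n/2, computed in DP, stored as SP."""
    lut = []
    for k in range(n // 2):
        w = cmath.exp(-2j * math.pi * k / n)
        lut.append(complex(to_single(w.real), to_single(w.imag)))
    return lut


def fft_lut(x, lut, stride=1):
    """Recursive radix-2 FFT that reads all twiddle factors from `lut`.

    `lut` must come from make_twiddle_lut(len(x)) at the top-level call;
    sub-transforms of size n/stride reuse the same table with a larger stride.
    """
    n = len(x)
    if n == 1:
        return list(x)
    even = fft_lut(x[0::2], lut, stride * 2)
    odd = fft_lut(x[1::2], lut, stride * 2)
    out = [0j] * n
    for k in range(n // 2):
        t = lut[k * stride] * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


# Build the table once, then reuse it for many transforms of the same length.
size = 16
lut = make_twiddle_lut(size)
signals = [[complex(math.cos(0.3 * k * m), 0.0) for k in range(size)]
           for m in range(1, 4)]
spectra = [fft_lut(s, lut) for s in signals]
print(len(spectra), len(spectra[0]))
```

Because each table entry is the correctly rounded single-precision value of a double-precision result, its error stays within half a unit in the last place, independent of how accurate the device's fast trigonometric functions are. The table cost is paid once, which suits a split-step simulation that runs thousands of FFTs of the same length.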
  12. Illustrative Example
      [Figure: two panels, CUDA FFT (SP) and LUT-based FFT (SP), each comparing GPU and CPU results]
      • Look-up table based FFT provides significantly increased accuracy in single-precision arithmetic
      • Look-up table holds pre-calculated "twiddle factor" values
      Source: S. Pachnicke, et al, OFC 2011.
  13. System Analysis (SSFM Simulation)
      [Figure: required OSNR deviation for BER = 10^-3 [dB], GPU simulation (in SP or DP) vs. CPU simulation (in DP); 11 x 112 Gb/s CP-QPSK]
      • GPU double precision results are (almost) identical to CPU results
      • The OSNR penalty of our single precision implementation remains below 0.1 dB up to approx. 125,000 split-steps
      Source: S. Pachnicke, IEEE ICTON, 2010.
  14. Combined Simulation in SP & DP
      • Calculate approximate division of the parameter space into strata by fast simulations with single precision.
      • The ellipses represent parameter combinations for which bit errors occur during transmission.
      • Execute simulations with double precision accuracy sparsely in the different strata to assess the BER.
      • Combined simulation with single and double precision and automatic (algorithmic) choice of amount of single precision simulations
      P. Serena, et al, IEEE JLT, 2009. S. Pachnicke, et al, OFC 2011.
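The two-pass idea on this slide (a cheap pass to locate where errors occur, then sparse accurate runs per stratum) can be illustrated with a toy stratified Monte-Carlo estimator. This is a simplified sketch and not the algorithm from the cited papers: the strata are a fixed grid over a hypothetical two-dimensional parameter space, the "error region" is a made-up ellipse, the same indicator function stands in for both the fast and the accurate simulation, and the names `in_error_region` and `stratified_estimate` are assumptions.

```python
import math
import random


def in_error_region(x, y):
    """Hypothetical stand-in for 'a bit error occurs': inside an ellipse."""
    return ((x - 0.7) / 0.2) ** 2 + ((y - 0.4) / 0.1) ** 2 <= 1.0


def stratified_estimate(indicator, cells, pilot_per_cell, min_per_cell,
                        budget, rng):
    """Estimate P(indicator) over the unit square with a cells x cells grid."""
    weight = 1.0 / (cells * cells)  # probability mass of each stratum
    strata = [(i, j) for i in range(cells) for j in range(cells)]

    def draw(i, j):
        # Uniform sample inside grid cell (i, j).
        return (i + rng.random()) / cells, (j + rng.random()) / cells

    # Pilot pass (stands in for the fast single-precision runs):
    # estimate how uncertain the outcome is inside each stratum.
    spread = {}
    for s in strata:
        hits = sum(indicator(*draw(*s)) for _ in range(pilot_per_cell))
        p = hits / pilot_per_cell
        spread[s] = math.sqrt(p * (1.0 - p))
    total_spread = sum(spread.values())

    # Main pass (stands in for the accurate double-precision runs):
    # spend the budget where the pilot saw mixed outcomes.
    estimate = 0.0
    for s in strata:
        extra = int(budget * spread[s] / total_spread) if total_spread > 0 else 0
        n = min_per_cell + extra
        hits = sum(indicator(*draw(*s)) for _ in range(n))
        estimate += weight * hits / n
    return estimate


true_p = math.pi * 0.2 * 0.1  # area of the ellipse
est = stratified_estimate(in_error_region, cells=10, pilot_per_cell=20,
                          min_per_cell=10, budget=4000,
                          rng=random.Random(1))
print(est, true_p)
```

The pilot results only steer how many accurate samples each stratum receives; the final estimate uses accurate samples alone, so a misleading pilot costs efficiency rather than correctness. Keeping a minimum number of accurate samples per stratum guards against strata that the pilot wrongly judged to be uniform.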
  15. Discussion
      Robustness of the algorithm has been checked by deliberately selecting a high number of split-steps (880,000)
      • Results of combined (SP & DP) GPU simulations match well with results obtained from CPU simulations in DP
      • Speedup of up to a factor of 180 possible compared to CPU
        → Stratified Monte-Carlo sampling allows algorithmic choice of the number of required DP simulations for a given accuracy
      Source: S. Pachnicke, et al, OFC 2011.
  16. Design Advantages
      • GPU parallelization allows simulation of a long-distance 80 WDM channel system on a PC in reasonable time
      • Result: The system performance can be estimated much more precisely than with CPU-based simulations (typically modeling only 10 WDM channel systems)
      Source: C. Xia, D. van den Borne, OFC, 2011
  17. Conclusion
      • GPUs offer a much higher computational peak performance than CPUs
      • Full benefit of GPU power only in single precision
      • Increase in single precision accuracy possible by pre-computing trigonometric function values for FFTs
      • Speedup in simulation time of more than a factor of 100 possible compared to CPU
  18. Further Reading
      • N. K. Govindaraju, B. Lloyd, Y. Dotsenko, B. Smith, J. Manferdelli, “High Performance Discrete Fourier Transforms on Graphics Processors”, Proc. of IEEE conference on Supercomputing (SC), article no. 2 (2008).
      • S. Pachnicke, “Fiber-Optic Transmission Networks: Efficient Design and Dynamic Operation”, Springer (2011).
      • J. C. Schatzman, “Accuracy of the Discrete Fourier Transform and the Fast Fourier Transform”, SIAM J. Scientific Comput. 17, 1150-1166 (1996).
      • G. Falcao, V. Silva, L. Sousa, “How GPUs can outperform ASICs for fast LDPC decoding”, Proc. of ACM International Conference on Supercomputing (ICS), 390-399 (2009).
      • J. A. Stratton, S. S. Stone, W.-M. W. Hwu, “MCUDA: An Efficient Implementation of CUDA Kernels for Multi-core CPUs”, Lecture Notes in Computer Science 5335, 16-30 (2008).
      • R. R. Exposito, G. L. Taboada, S. Ramos, J. Tourino, R. Doallo, “General-purpose computation on GPUs for high performance cloud computing”, Wiley J. Concurrency and Computation 24 (2012).
  19. Thank you
      spachnicke@advaoptical.com
      IMPORTANT NOTICE
      The content of this presentation is strictly confidential. ADVA Optical Networking is the exclusive owner or licensee of the content, material, and information in this presentation. Any reproduction, publication or reprint, in whole or in part, is strictly prohibited.
      The information in this presentation may not be accurate, complete or up to date, and is provided without warranties or representations of any kind, either express or implied. ADVA Optical Networking shall not be responsible for and disclaims any liability for any loss or damages, including without limitation, direct, indirect, incidental, consequential and special damages, alleged to have been caused by or in connection with using and/or relying on the information contained in this presentation.
      Copyright © for the entire content of this presentation: ADVA Optical Networking.