Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.

High Performance Computing: an Introduction for the Society of Actuaries


Published on

Invited talk. An introduction to high-performance computing given at the Society of Actuaries Life & Annuity Symposium in 2011.

Published in: Technology
  • Be the first to comment

High Performance Computing: an Introduction for the Society of Actuaries

  1. 1. High Performance Computing Adam DeConinck R Systems NA, Inc. 1
  2. 2. Development of models begins at small scale.Working on your laptop is convenient, simple.Actual analysis, however, is slow. 2
  3. 3. Development of models begins at small scale.Working on your laptop is convenient, simple.Actual analysis, however, is slow.“Scaling up” typically means a small server orfast multi-core desktop.Speedup exists, but for very large models, notsignificant.Single machines dont scale up forever. 3
  4. 4. For the largest models, a different approach is required. 4
  5. 5. High-Performance Computing involves many distinct computer processors working together on the same calculation.Large problems are divided into smaller parts and distributed among the many computers.Usually clusters of quasi-independent computers which are coordinated by a central scheduler. 5
  6. 6. Typical HPC Cluster LoginExternalconnection Ethernet network Scheduler Computes File Server High-speed network (10GigE / Infiniband) 6
  7. 7. Performance gains High-end workstation Duration (s) Number of cores Performance test: stochastic finance model on R Systems cluster High-end workstation: 8 cores. Maximum speedup of 20x: 4.5 hrs → 14 minutes  Scale-up heavily model-dependent: 5x – 100x in our tests, can be faster No more performance gain after ~500 cores: why? Some operations cant be parallelized. Additional cores? Run multiple models simultaneously 7
  8. 8. Performance comes at a price: complexity. New paradigm: real-time analysis vs batch jobs. Applications must be written specifically to take advantage of distributed computing. Performance characteristics of applications change. Debugging becomes more of a challenge. 8
  9. 9. New paradigm: real-time analysis vs batch jobs.Most small analyses are done in Large jobs are typically done in a real time: batch model: “At-your-desk” analysis  Submit job to a queue Small models only  Much larger models Fast iterations  Slow iterations No waiting for resources  May need to wait 9
  10. 10. Applications must be written specifically to take advantage of distributed computing. Explicitly split your problem into smaller “chunks” “Message passing” between processes Entire computation can be slowed by one or two slow chunks Exception: “embarrassingly parallel” problems Easy-to-split, independent chunks of computation Thankfully, many useful models fall under “Embarrassingly parallel” = this heading. (e.g. stochastic models) No inter-process communication 10
  11. 11. Performance characteristics of applications change.On a single machine: On a cluster: CPU speed (compute)  Single-machine metrics Cache  Network Memory  File server Disk  Scheduler contention  Results from other nodes 11
  12. 12. Debugging becomes more of a challenge. More complexity = more pieces that can fail Race conditions: sequence of events no longer deterministic Single nodes can “stall” and slow the entire computation Scheduler, file server, login server all have their own challenges 12
  13. 13. External resources One solution to handling complexity: outsource it! Historical HPC facilities: universities, national labs  Often have the most absolute compute capacity, and will sell excess capacity  Competition with academic projects, typically do not include SLA or high-level support Dedicated commercial HPC facilities providing “on-demand” compute power. 13
  14. 14. External HPC Internal HPC Outsource HPC sysadmin  Requires in-house expertise No hardware investment  Major investment in hardware Pay-as-you-go  Possible idle time Easy to migrate to new tech  Upgrades require new hardware 14
  15. 15. Internal HPC External HPC No external contention  No guaranteed access All internal—easy security  Security arrangements complex Full control over configuration  Limited control of configuration Simpler licensing control  Some licensing complex Requires in-house expertise  Outsource HPC sysadmin Major investment in hardware  No hardware investment Possible idle time  Pay-as-you-go Upgrades require new hardware  Easy to migrate to new tech 15
  16. 16. “The Cloud” “Cloud computing”: virtual machines, dynamic allocation of resources in an external resource Lower performance (virtualization), higher flexibility Usually no contracts necessary: pay with your credit card, get 16 nodes Often have to do all your own sysadmin Low support, high control 16
  17. 17. CASE STUDY:Windows cluster for Actuarial Application 17
  18. 18. Global insurance company Needed 500-1000 cores on a temporary basis Preferred a utility, “pay-as-you-go” model Experimenting with external resources for “burst” capacity during high-activity periods Commercially licensed and supported application Requested a proof of concept 18
  19. 19. Cluster configuration Application embarrassingly parallel, small-to-medium data files, computationally and memory-intensive  Prioritize computation (processors), access to fileserver over inter-node communication, large storage  Upgraded memory in compute nodes to 2 GB/core 128-node cluster: 3.0 GHz Intel Xeon processors, 8 cores per node for 1024 cores total Windows 2008 HPC R2 operating system Application and fileserver on login node 19
  20. 20. Stumbling blocks Application optimizationCustomer had a wide variety of models which generated different usage patterns. (IO, compute, memory-intensive jobs) Required dynamic reconfiguration for different conditions. Technical issueIterative testing process. Application turned out to be generating massive fileserver contention. Had to make changes to both software and hardware. Human processes Users were accustomed to internal access model. Required changes both for providers (increase ease-of-use) and users (change workflow) Security Customer had never worked with an external provider before. Complex internal security policy had to be reconciled with remote access. 20
  21. 21. Lessons learned: Security was the biggest delaying factor. The initial security setup took over 3 months from the first expression of interest, even though cluster setup was done in less than a week.  Only mattered the first time though: subsequent runs started much more smoothly. A low-cost proof-of-concept run was important to demonstrate feasibility, and for working the bugs out. A good relationship with the application vendor was extremely important to solving problems and properly optimizing the model for performance. 21
  22. 22. Recent developments: GPUs 22
  23. 23. Graphics processing units CPU: complex, general-purpose processor GPU: highly-specialized parallel processor, optimized for performing operations for common graphics routines Highly specialized → many more “cores” for same cost and space  Intel Core i7: 4 cores @ 3.4 GHz: $300 = $75/core  NVIDIA Tesla M2070: 448 cores @ 575 MHz: $4500 = $10/core Also higher bandwidth: 100+ GB/s for GPU vs 10-30 GB/s for CPU Same operations can be adapted for non-graphics applications: “GPGPU” 23 Image from
  24. 24. HPC/Actuarial using GPUs  Random-number generation  Finite-difference modeling  Image processing  Numerical Algorithms Group: GPU random-number generator  MATLAB: operations on large arrays/matrices  Wolfram Mathematica: symbolic math analysisData from 24