High Performance Computing: an Introduction for the Society of Actuaries


Invited talk. An introduction to high-performance computing given at the Society of Actuaries Life & Annuity Symposium in 2011.



  1. High Performance Computing. Adam DeConinck, R Systems NA, Inc.
  2. Development of models begins at small scale. Working on your laptop is convenient and simple. The actual analysis, however, is slow.
  3. “Scaling up” typically means a small server or a fast multi-core desktop. There is some speedup, but for very large models it is not significant. Single machines don't scale up forever.
  4. For the largest models, a different approach is required.
  5. High-performance computing involves many distinct computer processors working together on the same calculation. Large problems are divided into smaller parts and distributed among the many computers. HPC systems are usually clusters of quasi-independent computers coordinated by a central scheduler.
  6. Typical HPC cluster (diagram): an external connection reaches a login node; an Ethernet network links the login node, scheduler, compute nodes, and file server; the compute nodes share a high-speed network (10GigE / InfiniBand).
  7. Performance gains. (Chart: duration in seconds vs. number of cores, for a stochastic finance model run on an R Systems cluster and compared against a high-end 8-core workstation.) Maximum speedup of 20x: 4.5 hours down to 14 minutes. Scale-up is heavily model-dependent: 5x to 100x in our tests, and it can be faster. There is no further performance gain after ~500 cores: why? Some operations can't be parallelized. What to do with additional cores? Run multiple models simultaneously.
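The deck doesn't name it, but the plateau is Amdahl's law: if even a small fraction of the work is serial, that fraction caps the total speedup no matter how many cores you add. A minimal Python sketch (the 99.8% parallel fraction is an illustrative assumption, not a figure measured in these tests):

```python
def amdahl_speedup(parallel_fraction: float, n_cores: int) -> float:
    """Amdahl's law: overall speedup is limited by the serial fraction."""
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / n_cores)

# Assume 99.8% of the model parallelizes (illustrative only):
# 500 cores already give ~250x, but 1000 cores give only ~334x,
# which is why the measured curve flattens around 500 cores.
for n in (8, 100, 500, 1000):
    print(f"{n:5d} cores -> {amdahl_speedup(0.998, n):6.1f}x")
```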
  8. Performance comes at a price: complexity. A new paradigm: real-time analysis vs. batch jobs. Applications must be written specifically to take advantage of distributed computing. The performance characteristics of applications change. Debugging becomes more of a challenge.
  9. New paradigm: real-time analysis vs. batch jobs.
     Most small analyses are done in real time: “at-your-desk” analysis; small models only; fast iterations; no waiting for resources.
     Large jobs are typically done in a batch model: submit the job to a queue; much larger models; slow iterations; may need to wait for resources.
  10. Applications must be written specifically to take advantage of distributed computing. You must explicitly split your problem into smaller “chunks,” with “message passing” between processes, and the entire computation can be slowed by one or two slow chunks. The exception: “embarrassingly parallel” problems, i.e. easy-to-split, independent chunks of computation with no inter-process communication. Thankfully, many useful models (e.g. stochastic models) fall under this heading; see the sketch below.
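As a concrete illustration, here is a minimal sketch of the embarrassingly parallel pattern using Python's standard multiprocessing module; run_scenario is a hypothetical stand-in for a stochastic actuarial model, and a genuinely coupled model would instead need explicit message passing (e.g. via MPI):

```python
import multiprocessing as mp
import random

def run_scenario(seed: int) -> float:
    """One independent chunk of work: a toy stochastic scenario.
    It needs no data from any other chunk, so no message passing."""
    rng = random.Random(seed)
    return sum(rng.gauss(0.0, 1.0) for _ in range(100_000))

if __name__ == "__main__":
    # Scatter independent seeds across the available cores, gather results.
    with mp.Pool() as pool:
        results = pool.map(run_scenario, range(32))
    print(f"mean over {len(results)} scenarios: {sum(results) / len(results):+.3f}")
```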
  11. The performance characteristics of applications change. On a single machine, performance depends on CPU speed (compute), cache, memory, and disk. On a cluster, it also depends on the network, the file server, scheduler contention, and results from other nodes.
  12. Debugging becomes more of a challenge. More complexity means more pieces that can fail. Race conditions appear: the sequence of events is no longer deterministic (see the sketch below). Single nodes can “stall” and slow the entire computation. The scheduler, file server, and login server all have their own challenges.
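To see the loss of determinism in miniature (completion-order nondeterminism rather than a full data race), this sketch, under the same toy-Python assumptions as above, prints its results in a different order on every run:

```python
import multiprocessing as mp
import random
import time

def chunk(i: int) -> int:
    # Each chunk takes a random amount of time, like cluster nodes
    # that run at different speeds or briefly stall.
    time.sleep(random.Random().uniform(0.0, 0.2))
    return i

if __name__ == "__main__":
    with mp.Pool(4) as pool:
        # imap_unordered yields results as they finish, so the ordering
        # changes from run to run: the event sequence is not deterministic.
        print(list(pool.imap_unordered(chunk, range(8))))
```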
  13. External resources. One solution to handling complexity: outsource it! The historical HPC facilities are universities and national labs; they often have the most absolute compute capacity and will sell the excess, but you compete with academic projects, and they typically do not include an SLA or high-level support. Dedicated commercial HPC facilities instead provide “on-demand” compute power.
  14. External HPC vs. internal HPC.
     External: outsource HPC sysadmin; no hardware investment; pay-as-you-go; easy to migrate to new technology.
     Internal: requires in-house expertise; major investment in hardware; possible idle time; upgrades require new hardware.
  15. Internal HPC vs. external HPC.
     Internal: no external contention; all in-house, so security is easy; full control over configuration; simpler licensing control; requires in-house expertise; major investment in hardware; possible idle time; upgrades require new hardware.
     External: no guaranteed access; complex security arrangements; limited control of configuration; some licensing is complex; outsource HPC sysadmin; no hardware investment; pay-as-you-go; easy to migrate to new technology.
  16. “The Cloud.” Cloud computing means virtual machines and dynamic allocation of resources at an external provider. Performance is lower (virtualization), but flexibility is higher. Usually no contracts are necessary: pay with your credit card and get 16 nodes. You often have to do all your own sysadmin. Low support, high control.
  17. CASE STUDY: Windows cluster for an actuarial application.
  18. A global insurance company needed 500 to 1000 cores on a temporary basis and preferred a utility, “pay-as-you-go” model. They were experimenting with external resources for “burst” capacity during high-activity periods. The application was commercially licensed and supported. They requested a proof of concept.
  19. Cluster configuration. The application was embarrassingly parallel, used small-to-medium data files, and was computationally and memory-intensive, so we prioritized computation (processors) and access to the file server over inter-node communication and large storage, and upgraded the memory in the compute nodes to 2 GB/core. The result: a 128-node cluster of 3.0 GHz Intel Xeon processors with 8 cores per node, for 1024 cores total, running the Windows HPC Server 2008 R2 operating system, with the application and file server on the login node.
  20. Stumbling blocks.
     Application optimization: the customer had a wide variety of models which generated different usage patterns (I/O-, compute-, and memory-intensive jobs), requiring dynamic reconfiguration for different conditions.
     Technical issues: the testing process was iterative, and the application turned out to be generating massive file-server contention; we had to make changes to both software and hardware.
     Human processes: users were accustomed to the internal access model, which required changes both from the provider (increase ease-of-use) and from the users (change workflow).
     Security: the customer had never worked with an external provider before, and a complex internal security policy had to be reconciled with remote access.
  21. Lessons learned. Security was the biggest delaying factor: the initial security setup took over three months from the first expression of interest, even though the cluster setup was done in less than a week. It only mattered the first time, though; subsequent runs started much more smoothly. A low-cost proof-of-concept run was important for demonstrating feasibility and for working the bugs out. A good relationship with the application vendor was extremely important for solving problems and properly optimizing the model for performance.
  22. Recent developments: GPUs.
  23. Graphics processing units. A CPU is a complex, general-purpose processor; a GPU is a highly specialized parallel processor, optimized for the operations common in graphics routines. That specialization buys many more “cores” for the same cost and space: an Intel Core i7 (4 cores @ 3.4 GHz, $300) works out to $75/core, while an NVIDIA Tesla M2070 (448 cores @ 575 MHz, $4500) works out to $10/core. GPUs also have higher memory bandwidth: 100+ GB/s vs. 10-30 GB/s for a CPU. The same operations can be adapted for non-graphics applications: “GPGPU” (see the sketch below). Image from http://blogs.nvidia.com/2009/12/whats-the-difference-between-a-cpu-and-a-gpu/
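To make the GPGPU idea concrete, here is a minimal sketch using the CuPy library (my choice for illustration, not a tool named in the talk), whose arrays mirror NumPy's interface but live in GPU memory; it performs the random-number-generation workload the next slide lists:

```python
import cupy as cp  # NumPy-compatible GPU array library (assumed installed)

# Draw ten million standard normals on the GPU: the random-number
# generation step at the heart of a stochastic model.
draws = cp.random.standard_normal(10_000_000)

# Reductions also execute on the GPU; only scalars come back to the host.
print(float(draws.mean()), float(draws.std()))
```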
  24. HPC/actuarial work using GPUs includes random-number generation, finite-difference modeling, and image processing. Vendor examples: the Numerical Algorithms Group's GPU random-number generator; MATLAB operations on large arrays and matrices; Wolfram Mathematica symbolic math analysis. Data from http://www.nvidia.com/object/computational_finance.html