SlideShare a Scribd company logo
1 of 24
Download to read offline
High Performance Computing

        Adam DeConinck
       R Systems NA, Inc.




        1
Development of models begins at small scale.

Working on your laptop is convenient, simple.

Actual analysis, however, is slow.




    2
Development of models begins at small scale.

Working on your laptop is convenient, simple.

Actual analysis, however, is slow.


“Scaling up” typically means a small server or
fast multi-core desktop.

Speedup exists, but for very large models, not
significant.

Single machines don't scale up forever.


    3
For the largest models, a different approach is required.


                    4
High-Performance Computing involves many
  distinct computer processors working together on
  the same calculation.

Large problems are divided into smaller parts and
  distributed among the many computers.

Usually clusters of quasi-independent computers
  which are coordinated by a central scheduler.


                   5
Typical HPC Cluster

             Login
External
connection                 Ethernet network




                     Scheduler



                                               Computes

                     File Server
                                              High-speed network
                                              (10GigE / Infiniband)

                      6
Performance gains
    High-end
    workstation



                    Duration (s)




                                            Number of cores


     Performance test: stochastic finance model on R Systems cluster

     High-end workstation: 8 cores. Maximum speedup of 20x: 4.5 hrs → 14 minutes
      
          Scale-up heavily model-dependent: 5x – 100x in our tests, can be faster

     No more performance gain after ~500 cores: why? Some operations can't be parallelized.

     Additional cores? Run multiple models simultaneously


                                     7
Performance comes at a price: complexity.



    New paradigm: real-time analysis vs batch jobs.

    Applications must be written specifically to take
    advantage of distributed computing.

    Performance characteristics of applications change.

    Debugging becomes more of a challenge.



                     8
New paradigm: real-time analysis vs batch jobs.




Most small analyses are done in    Large jobs are typically done in a
  real time:                          batch model:

    “At-your-desk” analysis        
                                       Submit job to a queue

    Small models only              
                                       Much larger models

    Fast iterations                
                                       Slow iterations

    No waiting for resources       
                                       May need to wait



                               9
Applications must be written specifically to
  take advantage of distributed
  computing.

    Explicitly split your problem into smaller
    “chunks”

    “Message passing” between processes

    Entire computation can be slowed by one
    or two slow chunks

    Exception: “embarrassingly parallel”
    problems

    Easy-to-split, independent chunks of
    computation

    Thankfully, many useful models fall under
                                                   “Embarrassingly parallel” =
    this heading. (e.g. stochastic models)       No inter-process communication

                              10
Performance characteristics of applications change.


On a single machine:        On a cluster:

    CPU speed (compute)     
                                Single-machine metrics

    Cache                   
                                Network

    Memory                  
                                File server

    Disk                    
                                Scheduler contention
                            
                                Results from other nodes



                   11
Debugging becomes more of a challenge.


    More complexity = more pieces that can fail

    Race conditions: sequence of events no longer deterministic

    Single nodes can “stall” and slow the entire computation

    Scheduler, file server, login server all have their own challenges




                          12
External resources

    One solution to handling complexity: outsource it!

    Historical HPC facilities: universities, national labs
    
        Often have the most absolute compute capacity, and will sell
        excess capacity
    
        Competition with academic projects, typically do not include
        SLA or high-level support

    Dedicated commercial HPC facilities providing “on-demand”
    compute power.



                           13
External HPC                      Internal HPC

    Outsource HPC sysadmin        
                                      Requires in-house expertise

    No hardware investment        
                                      Major investment in hardware

    Pay-as-you-go                 
                                      Possible idle time

    Easy to migrate to new tech   
                                      Upgrades require new hardware




                          14
Internal HPC                          External HPC

    No external contention            
                                          No guaranteed access

    All internal—easy security        
                                          Security arrangements complex

    Full control over configuration   
                                          Limited control of configuration

    Simpler licensing control         
                                          Some licensing complex


    Requires in-house expertise       
                                          Outsource HPC sysadmin

    Major investment in hardware      
                                          No hardware investment

    Possible idle time                
                                          Pay-as-you-go

    Upgrades require new hardware     
                                          Easy to migrate to new tech



                             15
“The Cloud”

    “Cloud computing”: virtual machines, dynamic allocation of resources in
    an external resource

    Lower performance (virtualization), higher flexibility

    Usually no contracts necessary: pay with your credit card, get 16 nodes

    Often have to do all your own sysadmin

    Low support, high control



                              16
CASE STUDY:
Windows cluster for Actuarial
       Application




         17
Global insurance company


    Needed 500-1000 cores on a temporary basis

    Preferred a utility, “pay-as-you-go” model

    Experimenting with external resources for “burst”
    capacity during high-activity periods

    Commercially licensed and supported application

    Requested a proof of concept

                     18
Cluster configuration

    Application embarrassingly parallel, small-to-medium data files,
    computationally and memory-intensive
    
        Prioritize computation (processors), access to fileserver over
        inter-node communication, large storage
    
        Upgraded memory in compute nodes to 2 GB/core

    128-node cluster: 3.0 GHz Intel Xeon processors, 8 cores per node for
    1024 cores total

    Windows 2008 HPC R2 operating system

    Application and fileserver on login node


                            19
Stumbling blocks

    Application optimization
Customer had a wide variety of models which generated different usage
  patterns. (IO, compute, memory-intensive jobs) Required dynamic
  reconfiguration for different conditions.

    Technical issue
Iterative testing process. Application turned out to be generating massive
   fileserver contention. Had to make changes to both software and hardware.

    Human processes
    Users were accustomed to internal access model. Required changes both
    for providers (increase ease-of-use) and users (change workflow)

    Security
    Customer had never worked with an external provider before. Complex
    internal security policy had to be reconciled with remote access.
                            20
Lessons learned:


    Security was the biggest delaying factor. The initial security setup took over
    3 months from the first expression of interest, even though cluster setup
    was done in less than a week.
    
        Only mattered the first time though: subsequent runs started much
        more smoothly.

    A low-cost proof-of-concept run was important to demonstrate feasibility,
    and for working the bugs out.

    A good relationship with the application vendor was extremely important
    to solving problems and properly optimizing the model for performance.



                              21
Recent developments: GPUs




       22
Graphics processing units




    CPU: complex, general-purpose processor

    GPU: highly-specialized parallel processor, optimized for performing operations for
    common graphics routines

    Highly specialized → many more “cores” for same cost and space
     
         Intel Core i7: 4 cores @ 3.4 GHz: $300 = $75/core
     
         NVIDIA Tesla M2070: 448 cores @ 575 MHz: $4500 = $10/core

    Also higher bandwidth: 100+ GB/s for GPU vs 10-30 GB/s for CPU

    Same operations can be adapted for non-graphics applications: “GPGPU”

                                       23
                      Image from http://blogs.nvidia.com/2009/12/whats-the-difference-between-a-cpu-and-a-gpu/
HPC/Actuarial using GPUs
                                                   
                                                        Random-number generation
                                                   
                                                        Finite-difference modeling
                                                   
                                                        Image processing

                                                   
                                                        Numerical Algorithms Group:
                                                        GPU random-number generator
                                                   
                                                        MATLAB: operations on large arrays/matrices
                                                   
                                                        Wolfram Mathematica: symbolic math analysis


Data from
http://www.nvidia.com/object/computational_finan
ce.html




                                                   24

More Related Content

What's hot

High Performance Computing
High Performance ComputingHigh Performance Computing
High Performance Computing
Divyen Patel
 
High performance computing
High performance computingHigh performance computing
High performance computing
Guy Tel-Zur
 
ZFS
ZFSZFS

What's hot (20)

Introduction to High-Performance Computing
Introduction to High-Performance ComputingIntroduction to High-Performance Computing
Introduction to High-Performance Computing
 
High Performance Computing
High Performance ComputingHigh Performance Computing
High Performance Computing
 
High performance computing
High performance computingHigh performance computing
High performance computing
 
Decoupling Compute from Memory, Storage and IO with OMI
Decoupling Compute from Memory, Storage and IO with OMIDecoupling Compute from Memory, Storage and IO with OMI
Decoupling Compute from Memory, Storage and IO with OMI
 
AMD EPYC™ Microprocessor Architecture
AMD EPYC™ Microprocessor ArchitectureAMD EPYC™ Microprocessor Architecture
AMD EPYC™ Microprocessor Architecture
 
New Accelerated Compute Infrastructure Solutions from Supermicro
New Accelerated Compute Infrastructure Solutions from SupermicroNew Accelerated Compute Infrastructure Solutions from Supermicro
New Accelerated Compute Infrastructure Solutions from Supermicro
 
Shared Memory Centric Computing with CXL & OMI
Shared Memory Centric Computing with CXL & OMIShared Memory Centric Computing with CXL & OMI
Shared Memory Centric Computing with CXL & OMI
 
Perspective on HPC-enabled AI
Perspective on HPC-enabled AIPerspective on HPC-enabled AI
Perspective on HPC-enabled AI
 
High Performance Computing using MPI
High Performance Computing using MPIHigh Performance Computing using MPI
High Performance Computing using MPI
 
YOW2021 Computing Performance
YOW2021 Computing PerformanceYOW2021 Computing Performance
YOW2021 Computing Performance
 
AMD Processor
AMD ProcessorAMD Processor
AMD Processor
 
If AMD Adopted OMI in their EPYC Architecture
If AMD Adopted OMI in their EPYC ArchitectureIf AMD Adopted OMI in their EPYC Architecture
If AMD Adopted OMI in their EPYC Architecture
 
Deep learning: Hardware Landscape
Deep learning: Hardware LandscapeDeep learning: Hardware Landscape
Deep learning: Hardware Landscape
 
Making Data Timelier and More Reliable with Lakehouse Technology
Making Data Timelier and More Reliable with Lakehouse TechnologyMaking Data Timelier and More Reliable with Lakehouse Technology
Making Data Timelier and More Reliable with Lakehouse Technology
 
ZFS
ZFSZFS
ZFS
 
Zen 2: The AMD 7nm Energy-Efficient High-Performance x86-64 Microprocessor Core
Zen 2: The AMD 7nm Energy-Efficient High-Performance x86-64 Microprocessor CoreZen 2: The AMD 7nm Energy-Efficient High-Performance x86-64 Microprocessor Core
Zen 2: The AMD 7nm Energy-Efficient High-Performance x86-64 Microprocessor Core
 
The Power of HPC with Next Generation Supermicro Systems
The Power of HPC with Next Generation Supermicro Systems The Power of HPC with Next Generation Supermicro Systems
The Power of HPC with Next Generation Supermicro Systems
 
InfiniBand In-Network Computing Technology and Roadmap
InfiniBand In-Network Computing Technology and RoadmapInfiniBand In-Network Computing Technology and Roadmap
InfiniBand In-Network Computing Technology and Roadmap
 
"Approaches for Energy Efficient Implementation of Deep Neural Networks," a P...
"Approaches for Energy Efficient Implementation of Deep Neural Networks," a P..."Approaches for Energy Efficient Implementation of Deep Neural Networks," a P...
"Approaches for Energy Efficient Implementation of Deep Neural Networks," a P...
 
Software defined storage
Software defined storageSoftware defined storage
Software defined storage
 

Viewers also liked

Multi-faceted Classification of Big Data Use Cases and Proposed Architecture ...
Multi-faceted Classification of Big Data Use Cases and Proposed Architecture ...Multi-faceted Classification of Big Data Use Cases and Proposed Architecture ...
Multi-faceted Classification of Big Data Use Cases and Proposed Architecture ...
Geoffrey Fox
 
Accelerating Hadoop, Spark, and Memcached with HPC Technologies
Accelerating Hadoop, Spark, and Memcached with HPC TechnologiesAccelerating Hadoop, Spark, and Memcached with HPC Technologies
Accelerating Hadoop, Spark, and Memcached with HPC Technologies
inside-BigData.com
 

Viewers also liked (20)

High Performance Computing - The Future is Here
High Performance Computing - The Future is HereHigh Performance Computing - The Future is Here
High Performance Computing - The Future is Here
 
High performance concrete ppt
High performance concrete pptHigh performance concrete ppt
High performance concrete ppt
 
Intro to High Performance Computing in the AWS Cloud
Intro to High Performance Computing in the AWS CloudIntro to High Performance Computing in the AWS Cloud
Intro to High Performance Computing in the AWS Cloud
 
INCITE - INtegrated Components for Interactive TEaching
INCITE - INtegrated Components for Interactive TEachingINCITE - INtegrated Components for Interactive TEaching
INCITE - INtegrated Components for Interactive TEaching
 
JAWS
JAWSJAWS
JAWS
 
High Performance Statistical Computing
High Performance Statistical ComputingHigh Performance Statistical Computing
High Performance Statistical Computing
 
High performance computing
High performance computingHigh performance computing
High performance computing
 
Kalray TURBOCARD2 @ ISC'14
Kalray TURBOCARD2 @ ISC'14Kalray TURBOCARD2 @ ISC'14
Kalray TURBOCARD2 @ ISC'14
 
High Performance Computing in the Cloud?
High Performance Computing in the Cloud?High Performance Computing in the Cloud?
High Performance Computing in the Cloud?
 
Current Trends in HPC
Current Trends in HPCCurrent Trends in HPC
Current Trends in HPC
 
High Performance Computing: The Essential tool for a Knowledge Economy
High Performance Computing: The Essential tool for a Knowledge EconomyHigh Performance Computing: The Essential tool for a Knowledge Economy
High Performance Computing: The Essential tool for a Knowledge Economy
 
AWS Webcast - An Introduction to High Performance Computing on AWS
AWS Webcast - An Introduction to High Performance Computing on AWSAWS Webcast - An Introduction to High Performance Computing on AWS
AWS Webcast - An Introduction to High Performance Computing on AWS
 
Ibm spectrum scale fundamentals workshop for americas part 5 ess gnr-usecases...
Ibm spectrum scale fundamentals workshop for americas part 5 ess gnr-usecases...Ibm spectrum scale fundamentals workshop for americas part 5 ess gnr-usecases...
Ibm spectrum scale fundamentals workshop for americas part 5 ess gnr-usecases...
 
GPFS - graphical intro
GPFS - graphical introGPFS - graphical intro
GPFS - graphical intro
 
Multi-faceted Classification of Big Data Use Cases and Proposed Architecture ...
Multi-faceted Classification of Big Data Use Cases and Proposed Architecture ...Multi-faceted Classification of Big Data Use Cases and Proposed Architecture ...
Multi-faceted Classification of Big Data Use Cases and Proposed Architecture ...
 
Unix _linux_fundamentals_for_hpc-_b
Unix  _linux_fundamentals_for_hpc-_bUnix  _linux_fundamentals_for_hpc-_b
Unix _linux_fundamentals_for_hpc-_b
 
Parasitic Computing
Parasitic ComputingParasitic Computing
Parasitic Computing
 
Accelerating Hadoop, Spark, and Memcached with HPC Technologies
Accelerating Hadoop, Spark, and Memcached with HPC TechnologiesAccelerating Hadoop, Spark, and Memcached with HPC Technologies
Accelerating Hadoop, Spark, and Memcached with HPC Technologies
 
Delivering Transformational Solutions to Industry by Dr. Frederick Streitz, D...
Delivering Transformational Solutions to Industry by Dr. Frederick Streitz, D...Delivering Transformational Solutions to Industry by Dr. Frederick Streitz, D...
Delivering Transformational Solutions to Industry by Dr. Frederick Streitz, D...
 
Biometric technology
Biometric technologyBiometric technology
Biometric technology
 

Similar to High Performance Computing: an Introduction for the Society of Actuaries

Performance testing virtualized systems v5
Performance testing virtualized systems v5Performance testing virtualized systems v5
Performance testing virtualized systems v5
Mentora
 
A Survey on in-a-box parallel computing and its implications on system softwa...
A Survey on in-a-box parallel computing and its implications on system softwa...A Survey on in-a-box parallel computing and its implications on system softwa...
A Survey on in-a-box parallel computing and its implications on system softwa...
ChangWoo Min
 

Similar to High Performance Computing: an Introduction for the Society of Actuaries (20)

Webinar: High Performance MongoDB Applications with IBM POWER8
Webinar: High Performance MongoDB Applications with IBM POWER8Webinar: High Performance MongoDB Applications with IBM POWER8
Webinar: High Performance MongoDB Applications with IBM POWER8
 
B9 cmis
B9 cmisB9 cmis
B9 cmis
 
Backend.AI Technical Introduction (19.09 / 2019 Autumn)
Backend.AI Technical Introduction (19.09 / 2019 Autumn)Backend.AI Technical Introduction (19.09 / 2019 Autumn)
Backend.AI Technical Introduction (19.09 / 2019 Autumn)
 
Matching Your Costs to Your DAU: Thin Client Back-End Infrastructure Made Easy
Matching Your Costs to Your DAU: Thin Client Back-End Infrastructure Made EasyMatching Your Costs to Your DAU: Thin Client Back-End Infrastructure Made Easy
Matching Your Costs to Your DAU: Thin Client Back-End Infrastructure Made Easy
 
Microsoft Azure in HPC scenarios
Microsoft Azure in HPC scenariosMicrosoft Azure in HPC scenarios
Microsoft Azure in HPC scenarios
 
InTech Event | Cognitive Infrastructure for Enterprise AI
InTech Event | Cognitive Infrastructure for Enterprise AIInTech Event | Cognitive Infrastructure for Enterprise AI
InTech Event | Cognitive Infrastructure for Enterprise AI
 
From Rack scale computers to Warehouse scale computers
From Rack scale computers to Warehouse scale computersFrom Rack scale computers to Warehouse scale computers
From Rack scale computers to Warehouse scale computers
 
Scaling Security on 100s of Millions of Mobile Devices Using Apache Kafka® an...
Scaling Security on 100s of Millions of Mobile Devices Using Apache Kafka® an...Scaling Security on 100s of Millions of Mobile Devices Using Apache Kafka® an...
Scaling Security on 100s of Millions of Mobile Devices Using Apache Kafka® an...
 
Cassandra in Operation
Cassandra in OperationCassandra in Operation
Cassandra in Operation
 
X13 Pre-Release Update featuring 4th Gen Intel® Xeon® Scalable Processors
X13 Pre-Release Update featuring 4th Gen Intel® Xeon® Scalable Processors X13 Pre-Release Update featuring 4th Gen Intel® Xeon® Scalable Processors
X13 Pre-Release Update featuring 4th Gen Intel® Xeon® Scalable Processors
 
Performance testing virtualized systems v5
Performance testing virtualized systems v5Performance testing virtualized systems v5
Performance testing virtualized systems v5
 
A Survey on in-a-box parallel computing and its implications on system softwa...
A Survey on in-a-box parallel computing and its implications on system softwa...A Survey on in-a-box parallel computing and its implications on system softwa...
A Survey on in-a-box parallel computing and its implications on system softwa...
 
Computação de Alto Desempenho - Fator chave para a competitividade do País, d...
Computação de Alto Desempenho - Fator chave para a competitividade do País, d...Computação de Alto Desempenho - Fator chave para a competitividade do País, d...
Computação de Alto Desempenho - Fator chave para a competitividade do País, d...
 
Adaptive Computing Using PlateSpin Orchestrate
Adaptive Computing Using PlateSpin OrchestrateAdaptive Computing Using PlateSpin Orchestrate
Adaptive Computing Using PlateSpin Orchestrate
 
Cluster Computing
Cluster ComputingCluster Computing
Cluster Computing
 
Applying Cloud Techniques to Address Complexity in HPC System Integrations
Applying Cloud Techniques to Address Complexity in HPC System IntegrationsApplying Cloud Techniques to Address Complexity in HPC System Integrations
Applying Cloud Techniques to Address Complexity in HPC System Integrations
 
Cloud Roundtable at Microsoft Switzerland
Cloud Roundtable at Microsoft Switzerland Cloud Roundtable at Microsoft Switzerland
Cloud Roundtable at Microsoft Switzerland
 
Deep Dive on Delivering Amazon EC2 Instance Performance
Deep Dive on Delivering Amazon EC2 Instance PerformanceDeep Dive on Delivering Amazon EC2 Instance Performance
Deep Dive on Delivering Amazon EC2 Instance Performance
 
Choosing the Right EC2 Instance and Applicable Use Cases - AWS June 2016 Webi...
Choosing the Right EC2 Instance and Applicable Use Cases - AWS June 2016 Webi...Choosing the Right EC2 Instance and Applicable Use Cases - AWS June 2016 Webi...
Choosing the Right EC2 Instance and Applicable Use Cases - AWS June 2016 Webi...
 
Deep learning for FinTech
Deep learning for FinTechDeep learning for FinTech
Deep learning for FinTech
 

Recently uploaded

Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
vu2urc
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
Earley Information Science
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
Enterprise Knowledge
 

Recently uploaded (20)

How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 

High Performance Computing: an Introduction for the Society of Actuaries

  • 1. High Performance Computing Adam DeConinck R Systems NA, Inc. 1
  • 2. Development of models begins at small scale. Working on your laptop is convenient, simple. Actual analysis, however, is slow. 2
  • 3. Development of models begins at small scale. Working on your laptop is convenient, simple. Actual analysis, however, is slow. “Scaling up” typically means a small server or fast multi-core desktop. Speedup exists, but for very large models, not significant. Single machines don't scale up forever. 3
  • 4. For the largest models, a different approach is required. 4
  • 5. High-Performance Computing involves many distinct computer processors working together on the same calculation. Large problems are divided into smaller parts and distributed among the many computers. Usually clusters of quasi-independent computers which are coordinated by a central scheduler. 5
  • 6. Typical HPC Cluster Login External connection Ethernet network Scheduler Computes File Server High-speed network (10GigE / Infiniband) 6
  • 7. Performance gains High-end workstation Duration (s) Number of cores  Performance test: stochastic finance model on R Systems cluster  High-end workstation: 8 cores. Maximum speedup of 20x: 4.5 hrs → 14 minutes  Scale-up heavily model-dependent: 5x – 100x in our tests, can be faster  No more performance gain after ~500 cores: why? Some operations can't be parallelized.  Additional cores? Run multiple models simultaneously 7
  • 8. Performance comes at a price: complexity.  New paradigm: real-time analysis vs batch jobs.  Applications must be written specifically to take advantage of distributed computing.  Performance characteristics of applications change.  Debugging becomes more of a challenge. 8
  • 9. New paradigm: real-time analysis vs batch jobs. Most small analyses are done in Large jobs are typically done in a real time: batch model:  “At-your-desk” analysis  Submit job to a queue  Small models only  Much larger models  Fast iterations  Slow iterations  No waiting for resources  May need to wait 9
  • 10. Applications must be written specifically to take advantage of distributed computing.  Explicitly split your problem into smaller “chunks”  “Message passing” between processes  Entire computation can be slowed by one or two slow chunks  Exception: “embarrassingly parallel” problems  Easy-to-split, independent chunks of computation  Thankfully, many useful models fall under “Embarrassingly parallel” = this heading. (e.g. stochastic models) No inter-process communication 10
  • 11. Performance characteristics of applications change. On a single machine: On a cluster:  CPU speed (compute)  Single-machine metrics  Cache  Network  Memory  File server  Disk  Scheduler contention  Results from other nodes 11
  • 12. Debugging becomes more of a challenge.  More complexity = more pieces that can fail  Race conditions: sequence of events no longer deterministic  Single nodes can “stall” and slow the entire computation  Scheduler, file server, login server all have their own challenges 12
  • 13. External resources  One solution to handling complexity: outsource it!  Historical HPC facilities: universities, national labs  Often have the most absolute compute capacity, and will sell excess capacity  Competition with academic projects, typically do not include SLA or high-level support  Dedicated commercial HPC facilities providing “on-demand” compute power. 13
  • 14. External HPC Internal HPC  Outsource HPC sysadmin  Requires in-house expertise  No hardware investment  Major investment in hardware  Pay-as-you-go  Possible idle time  Easy to migrate to new tech  Upgrades require new hardware 14
  • 15. Internal HPC External HPC  No external contention  No guaranteed access  All internal—easy security  Security arrangements complex  Full control over configuration  Limited control of configuration  Simpler licensing control  Some licensing complex  Requires in-house expertise  Outsource HPC sysadmin  Major investment in hardware  No hardware investment  Possible idle time  Pay-as-you-go  Upgrades require new hardware  Easy to migrate to new tech 15
  • 16. “The Cloud”  “Cloud computing”: virtual machines, dynamic allocation of resources in an external resource  Lower performance (virtualization), higher flexibility  Usually no contracts necessary: pay with your credit card, get 16 nodes  Often have to do all your own sysadmin  Low support, high control 16
  • 17. CASE STUDY: Windows cluster for Actuarial Application 17
  • 18. Global insurance company  Needed 500-1000 cores on a temporary basis  Preferred a utility, “pay-as-you-go” model  Experimenting with external resources for “burst” capacity during high-activity periods  Commercially licensed and supported application  Requested a proof of concept 18
  • 19. Cluster configuration  Application embarrassingly parallel, small-to-medium data files, computationally and memory-intensive  Prioritize computation (processors), access to fileserver over inter-node communication, large storage  Upgraded memory in compute nodes to 2 GB/core  128-node cluster: 3.0 GHz Intel Xeon processors, 8 cores per node for 1024 cores total  Windows 2008 HPC R2 operating system  Application and fileserver on login node 19
  • 20. Stumbling blocks  Application optimization Customer had a wide variety of models which generated different usage patterns. (IO, compute, memory-intensive jobs) Required dynamic reconfiguration for different conditions.  Technical issue Iterative testing process. Application turned out to be generating massive fileserver contention. Had to make changes to both software and hardware.  Human processes Users were accustomed to internal access model. Required changes both for providers (increase ease-of-use) and users (change workflow)  Security Customer had never worked with an external provider before. Complex internal security policy had to be reconciled with remote access. 20
  • 21. Lessons learned:  Security was the biggest delaying factor. The initial security setup took over 3 months from the first expression of interest, even though cluster setup was done in less than a week.  Only mattered the first time though: subsequent runs started much more smoothly.  A low-cost proof-of-concept run was important to demonstrate feasibility, and for working the bugs out.  A good relationship with the application vendor was extremely important to solving problems and properly optimizing the model for performance. 21
  • 23. Graphics processing units  CPU: complex, general-purpose processor  GPU: highly-specialized parallel processor, optimized for performing operations for common graphics routines  Highly specialized → many more “cores” for same cost and space  Intel Core i7: 4 cores @ 3.4 GHz: $300 = $75/core  NVIDIA Tesla M2070: 448 cores @ 575 MHz: $4500 = $10/core  Also higher bandwidth: 100+ GB/s for GPU vs 10-30 GB/s for CPU  Same operations can be adapted for non-graphics applications: “GPGPU” 23 Image from http://blogs.nvidia.com/2009/12/whats-the-difference-between-a-cpu-and-a-gpu/
  • 24. HPC/Actuarial using GPUs  Random-number generation  Finite-difference modeling  Image processing  Numerical Algorithms Group: GPU random-number generator  MATLAB: operations on large arrays/matrices  Wolfram Mathematica: symbolic math analysis Data from http://www.nvidia.com/object/computational_finan ce.html 24