Building on-premise HPC is a challenging but sometimes necessary task for companies. Having specialised hardware for sophisticated tasks such as simulation, deep learning or optimisation can be important for financial, strategic or pure availability reasons. This slide deck gives a general overview of how this can be approached.
4. Hardware for HPC
HPC starts when you feel that you need a bigger laptop!
http://www.advancedclustering.com/hpc-compute-blocks-built-to-order-for-intensive-workloads/
5. Good to know: Scaling up with Hardware
Understand what speed is possible.
http://www.moorinsightsstrategy.com/wp-content/uploads/2015/04/unnamed.png
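To get a feel for what speed is possible on a given box, it helps to measure rather than guess. The sketch below (my addition, not from the deck) estimates sustained main-memory bandwidth with a large NumPy array copy; the default buffer size is an arbitrary choice.

```python
import time
import numpy as np

def measure_copy_bandwidth(n_bytes: int = 128 * 1024 * 1024,
                           repeats: int = 5) -> float:
    """Estimate sustained memory bandwidth (GB/s) via a large array copy."""
    src = np.ones(n_bytes // 8, dtype=np.float64)
    dst = np.empty_like(src)
    best = float("inf")
    for _ in range(repeats):
        t0 = time.perf_counter()
        np.copyto(dst, src)  # streams ~2 * n_bytes through memory (read + write)
        best = min(best, time.perf_counter() - t0)
    return 2 * n_bytes / best / 1e9

print(f"~{measure_copy_bandwidth():.1f} GB/s sustained copy bandwidth")
```

Comparing such a number against the hardware's theoretical peak (as in the linked chart) shows how much headroom is left.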
6. Scaling up with Hardware
You want to optimize how your data flows!
https://www.microway.com/product/octoputer-4u-tesla-8-gpu-server-nvlink/blockdiagram-sys-4028gr-tvxrt-teslav100/
7. Python tools to help
Recommended talk: https://youtu.be/HKjM3peINtwe
tbb4py: a Python C-extension, activated by monkey-patching Python pools, that enables Intel's TBB (Threading Building Blocks) library underneath MKL (Math Kernel Library). I saw a ~20% speedup on some tests.
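The situation tbb4py targets is nested parallelism: an outer Python pool whose workers each call multi-threaded NumPy/MKL routines, which can oversubscribe the cores. A minimal sketch of that pattern (my example, assuming the `tbb4py` package is installed so the script can also be launched as `python -m tbb script.py` to let TBB arbitrate both thread levels):

```python
from multiprocessing.pool import ThreadPool
import numpy as np

def heavy_task(seed: int) -> float:
    # Inner level: the BLAS backend (MKL/OpenBLAS) threads internally.
    rng = np.random.default_rng(seed)
    a = rng.random((500, 500))
    return float(np.linalg.norm(a @ a.T))

# Outer level: a Python thread pool fans out independent tasks.
with ThreadPool(4) as pool:
    results = pool.map(heavy_task, range(8))
print(len(results))
```

Run normally, the two levels compete for cores; run under `python -m tbb`, TBB coordinates them, which is where the speedup mentioned above came from.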
8. Goal: Fast R&D turnover
Time is costly
Hardware is costly
Speed matters
Scaling is important
Data Science has a 1:n compute requirement
Cloud: large differences in offers. Costs per compute unit ...
9. … the story of a model that could not be trained in sufficient time for production ...
10. Product Strategy → ML Strategy → Data Strategy + Compute Strategy → HPC Case
16. Define your compute strategy
Why HPC? Why Cloud? When Hybrid?
Answer this as early as possible → large buy-in risks.
Cloud: Good if you have no owned infrastructure or manpower and want to get ready fast. Good for scale and resilience.
Hybrid: Best option if you have the manpower and the use case. Gives you the option to pick the best from both worlds.
On Premise: Good if you can manage the hardware. Good if you want to be highly optimised and know your case.
17. What is your compute strategy?
Why HPC? Why Cloud? When Hybrid?
Be careful with case studies!
What is your priority? Fast results, scalability, resilience, cost efficiency.
Data locality?
Utilization will drive your cost structure.
R&D turnover
Talent available?
Business case? E.g. IoT, embedded, special hardware, consumer electronics.
19. A possible HPC setup for research
Slurm network, NFS (nearline), master node (weak), compute nodes.
Users log in on the master node and send jobs to the compute nodes.
SLURM compute node: N CPUs (specialized), X RAM (main memory).
Generic resources:
- Enhanced network
- Fast internal flash storage
- GPU / Phi coprocessors
Same user ID and permissions on all system components!
Mounted network file system.
Focus:
- Connect: central model repo
- Connect: central data repo
- Use templates that can be ported to your cloud platform
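A job in this setup would be submitted to Slurm from the login/master node. The batch script below is a hypothetical illustration; job name, resource sizes and the NFS paths are placeholders, not taken from the deck.

```shell
#!/bin/bash
# Hypothetical Slurm batch script for the setup above.
#SBATCH --job-name=train-model
#SBATCH --nodes=1
#SBATCH --cpus-per-task=8      # N CPUs on the compute node
#SBATCH --mem=64G              # X RAM (main memory)
#SBATCH --gres=gpu:1           # generic resource: GPU
#SBATCH --output=%x-%j.out

# Data and models live on the mounted NFS, visible under the
# same user ID on every node.
srun python train.py --data /nfs/data --model-out /nfs/models
```

Keeping such templates generic makes it easier to port the same workflow to a cloud batch platform later.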
20. Best Practices
Educate your researchers on how to best use the system.
Develop standards and best practices for the ML dev cycle (e.g. model versioning and testing).
Develop standards for transitions, e.g. between on-premise and cloud.
Check your product use case:
- e.g. requirements for training
- many products vs. a single one (see the deepl.com example)
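Model versioning, mentioned as a best practice above, can start very small. A minimal sketch of what such a standard might look like; the file layout and metadata fields are illustrative assumptions, not a scheme from the deck:

```python
# Save each trained artifact together with a metadata record
# (version, content hash, metrics) in a central model repo.
import hashlib
import json
import pathlib

def save_versioned(model_bytes: bytes, version: str, metrics: dict,
                   repo: pathlib.Path) -> pathlib.Path:
    out = repo / version
    out.mkdir(parents=True, exist_ok=True)
    (out / "model.bin").write_bytes(model_bytes)
    meta = {
        "version": version,
        "sha256": hashlib.sha256(model_bytes).hexdigest(),
        "metrics": metrics,
    }
    (out / "meta.json").write_text(json.dumps(meta, indent=2))
    return out

repo = pathlib.Path("model_repo")
path = save_versioned(b"fake-weights", "v0.1.0", {"val_acc": 0.91}, repo)
print(path / "meta.json")
```

The content hash lets you verify later that the artifact deployed on-premise and the one uploaded to the cloud are identical.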