Architecting the Right
System for Your AI
Application—without the Vendor Fluff
Brett Newman
VP Marketing & Customer Engagement
Microway, Inc.
wespeakhpc@microway.com
Where We’re Headed
1. Before You Start
• What do you know: Datasets, Algorithms, Collaborators
2. How to Select A System
• Common training, mixed workloads, datasets too large, don’t know
3. Collaborating with Vendors
• Who, where, and what to look for
Who is This For?
End Users Who:
1. Don’t know where to start
2. Need a “checklist”
3. Are afraid of or hate working with vendors
4. Hate being sold to
Not for:
1. AI Framework Writers
2. 10+ year ninja GPU coders
Before You Start
What Do You Know?
About Your Dataset:
○ Size – overall
○ Chunkable? (batch size)
○ Size – individual datum
[Figure: "Oversimplified Example". Dataset overall: 128GB. Size labels: 128GB; 16GB; 32GB + 32GB + 32GB + 32GB; 8GB. System labels: 1 multi-GPU server; POWER9 w/NVLink or pre-process; Various Tesla V100 systems.]
Image Credit: By Leonardo da Vinci - Cropped and relevelled from File:Mona Lisa, by Leonardo da Vinci, from C2RMF.jpg. Originally C2RMF: Galerie de tableaux en très haute définition: image page, Public Domain, https://commons.wikimedia.org/w/index.php?curid=15442524
Visual Idea Inspiration Credit: Scott Soutter, IBM
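The dataset questions on this slide (overall size, chunkability, size of an individual datum) come down to simple arithmetic. The sketch below is one rough way to run the numbers; the function names are hypothetical, and it assumes GPU memory is the only constraint, ignoring the memory that model weights, activations, and the framework itself consume.

```python
import math

# Assumption: 32GB per GPU for illustration (Tesla V100 ships in 16GB and 32GB variants).
GPU_MEM_GB = 32


def gpus_to_hold(dataset_gb, gpu_mem_gb=GPU_MEM_GB):
    """Fewest GPUs whose combined memory could hold the whole dataset."""
    return math.ceil(dataset_gb / gpu_mem_gb)


def fits_on_one_gpu(size_gb, gpu_mem_gb=GPU_MEM_GB):
    """True if one batch (or one individual datum) fits in a single GPU's memory."""
    return size_gb <= gpu_mem_gb


if __name__ == "__main__":
    print(gpus_to_hold(128))       # 128GB overall on 32GB GPUs
    print(gpus_to_hold(128, 16))   # the same dataset on 16GB GPUs
    print(fits_on_one_gpu(8))      # an 8GB batch
    print(fits_on_one_gpu(48))     # a hypothetical 48GB datum
```

With the slide's numbers, a 128GB dataset needs at least four 32GB GPUs or eight 16GB GPUs. A datum that fails `fits_on_one_gpu` is the "won't fit at all" case that calls for specialized code or hardware.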
What Do You Know?
About Your Algorithm
○ Standard Framework vs. Custom Algorithm
○ Have You Run Any Profilers/Tools?
[Figure labels: PCI-E Switching OR CPU:GPU NVLink; Denser, NVLink Interconnected (+10-20% on training); Mixed Workload, Ex: Molecular Dynamics + AI Simulation Refinement]
Tool Examples: NVProf, Allinea Perf Tools, Intel Visual Profiler
What Do You Know?
About Your Collaborators
○ Running on what HW?
○ Using Larger facilities?
Ex: Summit @ ORNL
Basic Guidance to
Architecting Your AI System
Algorithm: Solely AI Training, Common Frameworks
• Primary: NVLink connected systems, with GPU count to dataset scale/budget
• Secondary: PCI-E systems (switched) with GPU count to dataset scale/budget
[Figure: 4 GPUs with NVLink; 8 GPUs with NVLink; 16 GPUs with NVLink. Axis: Dataset Size (w/ batches <32GB)]
NVLink: 10-20% training perf. increase
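To put the 10-20% figure in wall-clock terms, here is a small worked calculation. It assumes "perf. increase" means higher training throughput, so run time scales as 1 / (1 + gain). The 100-hour baseline and the function name are hypothetical.

```python
def training_hours_with_speedup(baseline_hours, perf_increase):
    """Training time after a fractional throughput increase (0.10 means +10%)."""
    if perf_increase <= -1:
        raise ValueError("perf_increase must be greater than -1")
    return baseline_hours / (1.0 + perf_increase)


if __name__ == "__main__":
    baseline = 100.0  # hypothetical training run on a PCI-E system, in hours
    for gain in (0.10, 0.20):
        hours = training_hours_with_speedup(baseline, gain)
        print(f"+{gain:.0%} throughput: {hours:.1f} h (saves {baseline - hours:.1f} h)")
```

Under that assumption, a 100-hour run drops to roughly 90.9 hours at +10% and 83.3 hours at +20%. Whether the saving justifies the cost of NVLink depends on how often you train.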
Greatest Ease of Use with Perf., AI Training
DGX-Station (4 GPUs)
DGX-1 (8 GPUs)
DGX-2 (16 GPUs)
Mixed Workloads or Small Datasets
• Balanced systems (2 sockets, full/half populated 2-4 GPUs)
• Greatest flexibility & expandability
Dataset: Too Large/Non “Chunkable”
• POWER9 Systems with Coherency + CPU:GPU NVLink (5X BW)
• Switched PCI-E Tree + Custom Algorithms with Unified Memory
[Figure labels: POWER9 with NVLink; 8 GPUs with Switches]
Don’t Know, Can’t Find Out
1. Test it! If at all possible
Upgrading from Fermi, Kepler > most system architecture choices
2. No Matter Your Choice…
GPU acceleration > CPU systems (5X-50X)
Good, Better, Best
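The guidance in this section can be read as a short checklist. The sketch below encodes it as a lookup; the function name, parameter names, and the order of the checks are assumptions made for illustration, and the final fallback covers a combination the slides do not address directly.

```python
from typing import Optional


def suggest_system(solely_ai_training: Optional[bool],
                   common_framework: Optional[bool],
                   chunkable: Optional[bool],
                   small_dataset: bool = False) -> str:
    """Map answers (True/False, or None for "don't know") to the slide guidance."""
    # Any unknown answer falls through to the "Don't Know" slide.
    if any(a is None for a in (solely_ai_training, common_framework, chunkable)):
        return "Don't know: test it if at all possible"
    # Dataset too large or not chunkable.
    if not chunkable:
        return ("POWER9 with CPU:GPU NVLink, or switched PCI-E tree "
                "with custom Unified Memory code")
    # Mixed workloads or small datasets.
    if not solely_ai_training or small_dataset:
        return "Balanced system: 2 sockets, 2-4 GPUs"
    # Solely AI training with common frameworks.
    if common_framework:
        return ("NVLink-connected system (switched PCI-E as secondary), "
                "GPU count sized to dataset and budget")
    # Assumption: the slides do not cover this combination, so fall back to testing.
    return "Not covered directly: test it if at all possible"


if __name__ == "__main__":
    print(suggest_system(True, True, True))
    print(suggest_system(False, True, True))
    print(suggest_system(True, True, False))
    print(suggest_system(None, True, True))
```

Treat the result as a starting point for a vendor conversation, not a verdict. The slides' own advice is to test wherever possible.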
Collaborating with Vendors
Vendors: Who to Look For?
People & Titles
○ Technical Sales
○ Solution Engineer
○ Anyone who proves they know something
○ Anyone with proven access to hardware
Vendors: Who to Look For?
In Tier 1 Vendors
○ Find: HPC or AI Groups, exclusively (hard)
○ Avoid: general sellers, laptop/networking guy
In Tier 2 Vendors
○ Find: Established AI/HPC Vendors
○ Avoid: parts resellers/limited integration shops
○ Find: NVIDIA NPN Elite Deep Learning Partners
Vendors: What to Look For/Signals
Signals:
○ Ask for testing/benchmarking
○ Ask to see HW architecture of solution (back of napkin OK)
○ Spending time on phone, email, or in person?
Don’t work with someone who doesn’t understand what you’re talking about!
Vendors: Strategies For a Better Engagement
Overshare
○ Every piece of data: about data, algorithm/code, your goals
○ About what is working/isn’t working today
○ About what you own
Discuss Collaborators
○ What do they own?
○ Need to plan to run together?
State Realistic Plans for Flexibility/Expansion
Review
What we Talked About
1. Before You Start
• What do you know: Datasets, Algorithms, Collaborators
2. How to Select A System
• Datasets too large, common training, mixed workloads, don’t know
3. Collaborating with Vendors
• Who, where, and what to look for
Real Experts, Real Deliveries
So, Less Confused?
Gain confidence to Solve the AI HW Puzzle
The Best Vendors are Partners & Here to Help!
microway.com/gpu-test-drive/
microway.com/configure-your-solution
calendly.com/microway/schedule-a-consulation
GPU Solutions Guide
Microway designs and builds fully-integrated clusters, servers, and
workstations. For 35 years, we have delivered high-performance
systems for data analytics, cognitive systems, research, and AI.
Leverage our expertise – We Speak HPC & AI
© Copyright 2019 Microway. All Rights Reserved.
Experts in High Performance Computing
http://www.microway.com
508-746-7341

Editor's Notes

  1. What’s the overall size of your whole dataset? Does it fit into a single GPU, or does it definitely need a number of GPUs? Is it multi-system? Chunkable: the professional term is whether you can set a reasonable batch size. Does your data fit into chunks the size of a GPU (or a portion of one)? Individual datum: sometimes your data is so large it won’t fit at all. That’s a case for specialized code or specialized HW to compensate: writing your code to manage data with CUDA unified memory or, better yet, purchasing a POWER9 with NVLink system. Similarly, if you are using image data of fairly large size (or, more likely, a batch size of many smaller images), it’s likely a case for a 32GB Tesla GPU.
  2. PCI-E switching Why CPU: GPU NVLink? If you can’t write efficiently
  3. End users underweight this. They are so focused on the concrete hardware value (how much, what’s my complicated price/performance calculation) that they miss the efficacy metric. If you and a primary collaborator need to dramatically change your ETL steps or even your runtime instructions to perform similar runs, then you are getting far less time out of your expensive hardware. Matching each other is hugely important. Similarly, if you have the opportunity for larger runs or dedicated time on a larger machine, matching this is critical.