© 2019 Cray Inc.
TRAINING AI FOR
AUTONOMOUS VEHICLES:
WHERE ENTERPRISE STORAGE HITS THE WALL
Per Nyberg | Uli Plechschmidt | London, March 28, 2019
AI IS PERVASIVE ACROSS OUR CUSTOMER BASE
Industry, Academia, Government

Pharma: Top 5 Global Pharma
Cray CS-Storm Dense GPU Cluster
Supporting core research and development in areas including cheminformatics and large machine learning workloads such as CryoEM

Energy: Major Integrated Oil and Gas Company
Cray CS-Storm and XC50 w. Urika-XC
Applying machine and deep learning techniques to identify previously undetected features in subsurface seismic imaging

Autonomous Technologies: Fortune 20 Global Technology Company
Cray CS-Storm Dense GPU Cluster
Machine and deep learning workloads to develop systems for connected cars and autonomous technologies

Insurance: Leading Personal Lines Property and Casualty Insurer
Cray CS-Storm Dense GPU Cluster
Application of deep learning for the automation of claims processing

National AI Initiative: Alan Turing Institute (UK National Data Science and Artificial Intelligence Center)
Cray Urika-GX
Enabling the development of advanced applications across a number of fields including engineering and technology, defense and security, and smart cities

Scientific Research: National Energy Research Scientific Computing Center
Cray XC40 “Cori” Supercomputer
Supporting machine and deep learning across a broad set of science disciplines
Record breaking deep learning results in Climate, High Energy Physics and Astronomy
WHAT IS AUTONOMOUS DRIVING?
MASSIVE SIMULATION TASK
“Autonomous vehicles need to be driven more than 11 billion miles to be 20% better than humans. With a fleet of 100 vehicles, 24 hours a day, 365 days a year, at 25 miles per hour, this would take 518 years*.”
As we cannot wait 518 years: Massive Simulation Using AI/ML @ Extreme Scale
* www.rand.org/content/dam/rand/pubs/research_reports/RR1400/RR1478/RAND_RR1478.pdf
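The fleet arithmetic behind the quote can be checked in a few lines. This is a minimal sketch that assumes the rounded inputs quoted on the slide (11 billion miles, 100 vehicles, 25 miles per hour, driving around the clock); the function name `fleet_years` is hypothetical. With these rounded inputs the result comes out at roughly 502 years, so the quoted 518 years presumably rests on a slightly higher, unrounded mileage figure in the RAND report.

```python
HOURS_PER_YEAR = 24 * 365  # round-the-clock driving, leap days ignored


def fleet_years(total_miles: float, vehicles: int, mph: float) -> float:
    """Years a fleet needs to log total_miles when every car drives non-stop."""
    miles_per_year = vehicles * mph * HOURS_PER_YEAR
    return total_miles / miles_per_year


# Rounded inputs from the slide; the exact RAND inputs may differ.
years = fleet_years(11e9, vehicles=100, mph=25)
print(f"{years:.0f} years")
```

Doubling the fleet or the average speed halves the result, which is why physical test driving alone does not scale and simulation carries most of the mileage.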
HIGH-LEVEL DEVELOPMENT PROCESS
Sensor data from training cars (data logger in training car)
-> Ingest -> BIG Data (AI infrastructure in data center)
-> Model Engineering (data scientists)
-> Trained Model -> Autonomous Vehicles (control unit in self-driving car)
AUTONOMOUS VEHICLES: AI WORKFLOW
Workflow stages: Ingest data, Data preprocessing, Search analysis, Training data, Test data, Model training, Re-simulation, Model testing, Report results, Model deployment
Train Test Loop
Legend: Data-intensive, Compute-intensive
AUTONOMOUS VEHICLES: THE TIP OF THE PYRAMID
Storage requirements rise from Moderate (Level 1) to Extreme (Level 4):
Level 1: Workgroup AI
Level 2: Departmental AI
Level 3: Divisional AI
Level 4: Training AI for Autonomous Vehicles
EXAMPLES OF TRAINING CARS
Up to 5 Terabyte/hour per training car x 8 hours per day x 100 training cars = 4 Petabyte per day
100 training cars running 5 working days per week = 20 Petabyte per week
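The data-volume estimate above is plain multiplication and can be sketched as follows. The sketch assumes decimal units (1 Petabyte = 1,000 Terabyte), which is what the slide's round numbers imply; the function and constant names are hypothetical.

```python
TB_PER_PB = 1000  # decimal units assumed, matching the slide's round numbers


def fleet_petabytes_per_day(tb_per_hour: float, hours_per_day: float, cars: int) -> float:
    """Daily data volume in Petabyte generated by a fleet of training cars."""
    return tb_per_hour * hours_per_day * cars / TB_PER_PB


per_day = fleet_petabytes_per_day(tb_per_hour=5, hours_per_day=8, cars=100)
per_week = per_day * 5  # five working days per week
print(f"{per_day:g} PB per day, {per_week:g} PB per week")
```

Because the slide says "up to" 5 Terabyte/hour, these figures are an upper bound for the stated fleet size rather than a measured average.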
EXTREME STORAGE REQUIREMENTS
Example: European Car Manufacturer

Useable capacity in single namespace
Base Case Scenario: 130 Petabyte
100% Growth “Buffer”: 260 Petabyte
Translation to Storage Requirements: About 170 PB (340 PB) raw capacity

Re-simulation of 100% of the data in ….. hours
Base Case Scenario: 100 hours
100% Growth “Buffer”: 50 hours
Translation to Storage Requirements:
• 130 PB in 100 hours = 361 GB/sec
• 130 PB in 50 hours = 722 GB/sec
• 260 PB in 100 hours = 722 GB/sec
• 260 PB in 50 hours = 1.4 TB/sec

~ 87 million HD movies
~ 470 HD movies per second
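The throughput figures follow from dividing capacity by the re-simulation window. The sketch below reproduces them under the assumption of decimal units (1 PB = 10^15 bytes, 1 GB = 10^9 bytes); the function name `required_gb_per_sec` is hypothetical, and the last lines only restate the ratio between the slide's useable and raw capacity figures without assuming what causes the overhead.

```python
BYTES_PER_PB = 10**15  # decimal units assumed
BYTES_PER_GB = 10**9


def required_gb_per_sec(capacity_pb: float, hours: float) -> float:
    """Sustained read rate needed to stream capacity_pb once within `hours`."""
    seconds = hours * 3600
    return capacity_pb * BYTES_PER_PB / seconds / BYTES_PER_GB


# The four capacity/window combinations from the slide.
for capacity_pb in (130, 260):
    for hours in (100, 50):
        rate = required_gb_per_sec(capacity_pb, hours)
        print(f"{capacity_pb} PB in {hours} hours = {rate:,.0f} GB/sec")

# Share of raw capacity that ends up useable, per the slide's 130 PB / 170 PB.
useable_fraction = 130 / 170
print(f"useable fraction of raw capacity: {useable_fraction:.0%}")
```

Doubling the capacity and halving the window each double the required rate, which is why the growth scenario quadruples the base-case throughput.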
STORAGE CHALLENGE TRIANGLE
Scalability (in Petabyte)
Throughput (in Gigabyte/second)
Affordability (storage spending as a percentage of the total Autonomous Driving budget)
ENTERPRISE STORAGE IS CHALLENGED
Scalability (to 100’s of Petabyte capacity)
Throughput (of 100’s of Gigabyte/sec)
Affordability (achieve the requirements at lowest possible TCO)

Classic Block/File Storage (e.g. Dell EMC PowerMax, NetApp FAS, HPE 3PAR)
Scale-Out NAS (e.g. Dell EMC Isilon, NetApp Clustered ONTAP)
Distributed Object Storage (e.g. Scality, Cloudian, Dell EMC ECS, IBM COS, Pure Storage Flashblade etc.)
Hadoop-based Storage (e.g. Cloudera/Hortonworks, MapR, Oracle Big Data Appliance etc.)
Public Cloud Storage (e.g. Amazon AWS S3, Microsoft Azure, Google Cloud Storage etc.)
SUPERCOMPUTING STORAGE WORKS!
Storage choices of the Top 100* Global Supercomputers (number of systems):
Cray ClusterStor: 27
DDN EXAScaler: 20
NetApp: 14
Chinese: 10
Lenovo DSS-G: 8
DDN GRIDScaler: 7
Fujitsu FEFS: 5
IBM ESS: 5
Unknown: 3
Supermicro: 1
#1 File System in the Top 100 with 76% share: LUSTRE (the rest: OTHERS)
#1 Storage System in the Top 100 with 27% share
*November 2018 top500.org list
WHY DID UBER ATG SWITCH TO LUSTRE?
http://wiki.lustre.org/images/e/ed/LUG2018-Lustre_at_Uber-Cobb_Xiong.pdf
HIGH LEVEL SOLUTION ARCHITECTURE
Hadoop CPU nodes for data pre-processing (Lustre client)
CPU & dense GPU nodes for deep learning (Lustre client)
Hardware-in-the-loop (HIL) simulators (data access with standard CIFS/NFS) via Scalable CIFS/NFS Gateway
I/O profile labels: 90%+ READ, 90%+ READ, READ/WRITE
Cray ClusterStor L300 Lustre Global Namespace
Scaling to 100’s of Petabyte and Terabyte/second
ENGINEERED RACK-SCALE HPC STORAGE
Data Network Switches (IB, OPA, GbE)
Management Network Switches
System Management Unit (SMU)
Metadata Management Unit (MMU)
Base Rack plus Expansion Racks
Scalable Storage Units (SSU) with embedded HA OSS*, Mix & Match:
L300N (5U84): Hybrid (SSD/HDD): Mixed I/O
L300 (5U84): All HDD: Large, sequential I/O
L300F (2U24): All Flash: Small, random I/O
*Object Storage Servers
CLUSTERSTOR FOR AUTONOMOUS VEHICLES
Unique Values

Performance Efficiency
Achieve requirements with the most efficient architecture requiring less storage gear (drives, enclosures, racks)
More budget left to spend on data science

Engineered Solution
Pre-integrated, tested, tuned, and shipped ready to deploy.
Days instead of weeks to implementation

Reliability
No single point of failure architecture and parity de-clustered RAID
Less downtime and (much) faster rebuilds

Scalability
Sustained linear performance when adding capacity up to hundreds of Petabyte
Predictable application performance @ scale

Management & Support
Integrated system management, System Snapshot Analyzer (SSA)/Call home, hardware monitoring with health alerts, API, HPC storage analytics on job level with View for ClusterStor
Less downtime and faster time-to-problem resolution
THANK YOU
When everything else breaks, it is time to consider your first Cray system!

Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptx
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 

Training AI for Autonomous Vehicles: Where Enterprise Storage Hits the Wall

  • 1. © 2019 Cray Inc. TRAINING AI FOR AUTONOMOUS VEHICLES: WHERE ENTERPRISE STORAGE HITS THE WALL. Per Nyberg | Uli Plechschmidt | London, March 28, 2019
  • 2. AI IS PERVASIVE ACROSS OUR CUSTOMER BASE (Industry, Academia, Government)
      - Pharma: Top 5 Global Pharma. Cray CS-Storm Dense GPU Cluster. Supporting core research and development in areas including cheminformatics and large machine learning workloads such as CryoEM.
      - Energy: Major Integrated Oil and Gas Company. Cray CS-Storm and XC50 w. Urika-XC. Applying machine and deep learning techniques to identify previously undetected features in subsurface seismic imaging.
      - Autonomous Technologies: Fortune 20 Global Technology Company. Cray CS-Storm Dense GPU Cluster. Machine and deep learning workloads to develop systems for connected cars and autonomous technologies.
      - Insurance: Leading Personal Lines Property and Casualty Insurer. Cray CS-Storm Dense GPU Cluster. Application of deep learning for the automation of claims processing.
      - National AI Initiative: Alan Turing Institute (UK National Data Science and Artificial Intelligence Center). Cray Urika-GX. Enabling the development of advanced applications across a number of fields including engineering and technology, defense and security, and smart cities.
      - Scientific Research: National Energy Research Scientific Computing Center. Cray XC40 “Cori” Supercomputer. Supporting machine and deep learning across a broad set of science disciplines. Record-breaking deep learning results in Climate, High Energy Physics and Astronomy.
  • 3. WHAT IS AUTONOMOUS DRIVING?
  • 4. MASSIVE SIMULATION TASK. “Autonomous vehicles need to be driven more than 11 billion miles to be 20% better than humans. With a fleet of 100 vehicles, 24 hours a day, 365 days a year, at 25 miles per hour, this would take 518 years*.” As we cannot wait 518 years: Massive Simulation Using AI/ML @ Extreme Scale. (* www.rand.org/content/dam/rand/pubs/research_reports/RR1400/RR1478/RAND_RR1478.pdf)
  • 5. HIGH-LEVEL DEVELOPMENT PROCESS. Stages: Sensor data from training cars; Ingest; BIG Data; Model Engineering; Trained Model; Autonomous Vehicles. Labels: Data logger in training car; AI infrastructure in data center; Control unit in self-driving car; Data scientists.
  • 6. AUTONOMOUS VEHICLES: AI WORKFLOW. Steps: Search analysis; Training data; Report results; Test data; Data preprocessing; Ingest data; Model training; Re-simulation; Model testing; Model deployment. Train/Test Loop. Phases are marked as data-intensive or compute-intensive.
  • 7. AUTONOMOUS VEHICLES: THE TIP OF THE PYRAMID. Storage requirements, from moderate to extreme: Level 1: Workgroup AI; Level 2: Departmental AI; Level 3: Divisional AI; Level 4: Training AI for Autonomous Vehicles.
  • 8. EXAMPLES OF TRAINING CARS. Up to 5 Terabyte/hour per training car, running 8 hours, x 100 training cars = 4 Petabyte per day. 100 training cars running 5 working days per week = 20 Petabyte per week.
  • 9. EXTREME STORAGE REQUIREMENTS (Example: European Car Manufacturer)
      - Useable capacity in single namespace: 130 Petabyte (Base Case Scenario); 260 Petabyte (100% Growth “Buffer”). Translation to Storage Requirements: about 170 PB (340 PB) raw capacity.
      - Re-simulation of 100% of the data in ….. hours: 100 hours (Base Case Scenario); 50 hours (100% Growth “Buffer”). Translation to Storage Requirements:
          - 130 PB in 100 hours = 361 GB/sec
          - 130 PB in 50 hours = 722 GB/sec
          - 260 PB in 100 hours = 722 GB/sec
          - 260 PB in 50 hours = 1.4 TB/sec
      - Callouts: ~ 470 HD movies per second; ~ 87 million HD movies
  • 10. STORAGE CHALLENGE TRIANGLE. Scalability (in Petabyte); Throughput (in Gigabyte/second); Affordability (storage spending as a percentage of the total Autonomous Driving budget).
  • 11. ENTERPRISE STORAGE IS CHALLENGED
      - Criteria: Scalability (to 100s of Petabyte capacity); Throughput (of 100s of Gigabyte/sec); Affordability (achieve the requirements at lowest possible TCO)
      - Classic Block/File Storage (e.g. Dell EMC PowerMax, NetApp FAS, HPE 3PAR)
      - Scale-Out NAS (e.g. Dell EMC Isilon, NetApp Clustered ONTAP)
      - Distributed Object Storage (e.g. Scality, Cloudian, Dell EMC ECS, IBM COS, Pure Storage Flashblade etc.)
      - Hadoop-based Storage (e.g. Cloudera/Hortonworks, MapR, Oracle Big Data Appliance etc.)
      - Public Cloud Storage (e.g. Amazon AWS S3, Microsoft Azure, Google Cloud Storage etc.)
  • 12. SUPERCOMPUTING STORAGE WORKS!
      - Storage choices of the Top 100* Global Supercomputers (bar chart, number of systems): Cray ClusterStor 27; DDN EXAScaler 20; NetApp 14; Chinese 10; Lenovo DSS-G 8; DDN GRIDScaler 7; Fujitsu FEFS 5; IBM ESS 5; Unknown 3; Supermicro 1
      - #1 File System in the Top 100 with 76% share: LUSTRE (vs. OTHERS)
      - #1 Storage System in the Top 100 with 27% share
      - *November 2018 top500.org list
  • 13. WHY DID UBER ATG SWITCH TO LUSTRE? http://wiki.lustre.org/images/e/ed/LUG2018-Lustre_at_Uber-Cobb_Xiong.pdf
  • 14. HIGH LEVEL SOLUTION ARCHITECTURE
      - Compute: Hadoop CPU nodes for data pre-processing; CPU & dense GPU nodes for deep learning; Hardware-in-the-loop (HIL) simulators (data access with standard CIFS/NFS)
      - Access paths: Lustre client (90%+ READ); Lustre client (90%+ READ); Scalable CIFS/NFS Gateway (READ/WRITE)
      - Storage: Cray ClusterStor L300 Lustre Global Namespace, scaling to 100s of Petabyte and Terabyte/second
  • 15. ENGINEERED RACK-SCALE HPC STORAGE
      - Components: Data Network Switches (IB, OPA, GbE); Management Network Switches; System Management Unit (SMU); Metadata Management Unit (MMU)
      - Racks: Base Rack plus four Expansion Racks
      - Scalable Storage Units (SSU) with embedded HA OSS*: L300N (5U84); L300 (5U84); L300F (2U24)
      - Media options: All HDD: large, sequential I/O; All Flash: small, random I/O; Hybrid (SSD/HDD): mixed I/O
      - Mix & Match Ex
      - *Object Storage Servers
  • 16. CLUSTERSTOR FOR AUTONOMOUS VEHICLES: Unique Values
      - Performance Efficiency: Achieve requirements with the most efficient architecture, requiring less storage gear (drives, enclosures, racks). More budget left to spend on data science.
      - Engineered Solution: Pre-integrated, tested, tuned, and shipped ready to deploy. Days instead of weeks to implementation.
      - Reliability: No-single-point-of-failure architecture and parity de-clustered RAID. Less downtime and (much) faster rebuilds.
      - Scalability: Sustained linear performance when adding capacity up to hundreds of Petabyte. Predictable application performance @ scale.
      - Management & Support: Integrated system management, System Snapshot Analyzer (SSA)/Call home, hardware monitoring with health alerts, API, HPC storage analytics on job level with View for ClusterStor. Less downtime and faster time-to-problem resolution.
  • 17. THANK YOU. When everything else breaks, it is time to consider your first Cray system!
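
The sizing figures quoted on slides 4, 8 and 9 are simple arithmetic, and a short script makes them easy to re-check or re-run with other inputs. The sketch below is not from the deck: the function names are hypothetical, and it assumes decimal units (1 PB = 1,000 TB = 1,000,000 GB), which is the convention that reproduces the slide numbers.

```python
"""Back-of-the-envelope checks for the figures quoted on slides 4, 8 and 9."""

HOURS_PER_YEAR = 24 * 365


def fleet_years(total_miles: float, vehicles: int, mph: float) -> float:
    """Years a fleet needs to log total_miles when driving around the clock."""
    miles_per_year = vehicles * mph * HOURS_PER_YEAR
    return total_miles / miles_per_year


def ingest_pb_per_day(tb_per_hour: float, hours_per_day: float, cars: int) -> float:
    """Daily sensor-data volume in petabytes (assumes 1 PB = 1000 TB)."""
    return tb_per_hour * hours_per_day * cars / 1000


def resim_throughput_gb_s(capacity_pb: float, hours: float) -> float:
    """Sustained read rate in GB/s needed to re-read capacity_pb within hours."""
    gigabytes = capacity_pb * 1_000_000  # assumes 1 PB = 10**6 GB
    return gigabytes / (hours * 3600)


if __name__ == "__main__":
    # Slide 4: 100 vehicles, 24/7, at 25 mph, for exactly 11 billion miles.
    print(f"Fleet time: {fleet_years(11e9, 100, 25):.0f} years")

    # Slide 8: 5 TB/hour per car, 8 hours per day, 100 cars, 5 days per week.
    daily = ingest_pb_per_day(5, 8, 100)
    print(f"Ingest: {daily:.0f} PB/day, {daily * 5:.0f} PB/week")

    # Slide 9: re-simulate 100% of the data within a fixed time window.
    for pb in (130, 260):
        for hours in (100, 50):
            rate = resim_throughput_gb_s(pb, hours)
            print(f"{pb} PB in {hours} h = {rate:.0f} GB/s")
```

With exactly 11 billion miles the fleet calculation gives about 502 years; the 518 years quoted on slide 4 corresponds to roughly 11.3 billion miles, which is consistent with the slide's "more than 11 billion". The ingest and throughput results match slides 8 and 9 (4 PB/day, 20 PB/week, and 361 GB/s up to about 1.4 TB/s).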