Towards Exascale Computing for
Autonomous Driving
Levent Gürel¹,²
¹ABAKUS Computing Technologies, Palo Alto, California, USA
²University of Illinois at Urbana-Champaign, Illinois, USA
lgurel@illinois.edu, lgurel@ieee.org
Abstract: Autonomous driving is emerging as one of the most
computationally demanding technologies of modern times. We report a
high-level roadmap towards satisfying the ever-increasing computational
needs and requirements of autonomous mobility, which are easily at the
exascale level. We demonstrate that hardware solutions alone will not be
sufficient, and exascale computing demands should be met with a
combination of hardware and software. In that context, algorithms with
reduced computational complexities will be crucial to provide software
solutions that will be integrated with available hardware, which is also
limited by various mobility restrictions, such as power, space, and weight.
I. INTRODUCTION
Exascale computing refers to computing systems capable of at least one
billion-billion ($10^{18}$) calculations or floating-point operations
(FLOPs) per second. The National Strategic Computing Initiative of the
White House Office of Science and Technology Policy identifies, as a
strategic objective, accelerating the delivery of a capable exascale
computing system that integrates hardware and software capability to
deliver approximately 100 times the performance of current 10-petaflop
systems across a range of applications representing government needs.
As such, an exascale computing system is defined as the integration of
hardware and software capabilities. We agree with this viewpoint. It is
necessary, but not sufficient, to drive progress in both hardware and
software avenues. A sufficient outcome will also require the successful
integration of hardware and software. This paper focuses on the software
component. We also pay attention to integration since software cannot be
considered independently of the hardware platform it will be used on.
II. THE NEED FOR EXASCALE COMPUTING
Reaching the goal of developing vehicles, drones, and robots that can
operate autonomously encompasses so many computing applications that
autonomous mobility appears to be capable of devouring as many
computational resources as one can throw at it.
Autonomous vehicles (AVs) need to perceive their environment with
multiple sensors, such as radars, cameras, lidars, and
acoustic/ultrasonic transducers. AVs need to process huge amounts of
data provided by the sensors to resolve the semantics of the surrounding
theater. Then, AVs need to think and make decisions based on machine
learning and artificial intelligence. Those decisions must be
implemented through the electronic and mechanical controls of the
vehicle. Feedback processes should be used to make sure that the vehicle
is acting in compliance with the decisions made. All of those
compute-intensive operations should be completed in almost real time and
repeated numerous times every second.
Among countless compute-intensive operations, consider the example of
high-resolution imaging radars operating at 77 GHz, where the wavelength
(λ) is about 3.9 mm. Credible imaging outcomes will require the solution
of inverse problems with rigorous full-wave solvers and λ/10
discretization, which is about 0.4 mm. Everyday objects in the
surrounding theater, such as other cars, bicycles, pedestrians, and road
structures, will be discretized with billions of unknowns. In this
context, the term “unknowns” is synonymous with degrees of freedom,
nodes, elements, etc. The number of unknowns is the same as the size of
the matrix equation to be solved.
Figure 1 illustrates order-of-magnitude estimates of FLOPs and computing
times (in seconds) for some basic and relevant operations involving
N × N matrices. In Fig. 1, matrix sizes of $10^{8}$–$10^{10}$ are
considered, in accordance with realistic automotive radar scenarios.
Even with exascale computing resources, inversion or solution of a
billion-by-billion matrix equation will require on the order of $10^{9}$
seconds, or equivalently, more than 30 years! Hence, it is clear that
raw hardware power will not be sufficient for brute-force solutions.

Matrix     Matrix      FLOPs        Computing Time (s)
Size (N)   Operation                Petascale    Exascale
                                    Resources    Resources
$10^{8}$   MVP         $10^{16}$    10           0.01
           MMP         $10^{24}$    $10^{9}$     $10^{6}$
           Inv         $10^{24}$    $10^{9}$     $10^{6}$
$10^{9}$   MVP         $10^{18}$    1000         1
           MMP         $10^{27}$    $10^{12}$    $10^{9}$
           Inv         $10^{27}$    $10^{12}$    $10^{9}$
$10^{10}$  MVP         $10^{20}$    $10^{5}$     100
           MMP         $10^{30}$    $10^{15}$    $10^{12}$
           Inv         $10^{30}$    $10^{15}$    $10^{12}$

Fig. 1. Order-of-magnitude estimates of computing time (in seconds) it would
take to perform some common matrix operations with petascale and exascale
computational resources. MVP: Matrix-vector product. MMP: Matrix-matrix
product. Inv: Matrix inversion.
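The arithmetic behind the 77 GHz example and the estimates of Fig. 1 can be sketched in a few lines. This is only a back-of-the-envelope illustration under stated assumptions: the function names are hypothetical, FLOP counts keep only the leading power of N (N^2 for a matrix-vector product, N^3 for a matrix-matrix product or inversion, with constant factors dropped), and the machines are assumed to sustain exactly 10^15 and 10^18 FLOPs per second.

```python
import math

C = 299_792_458.0                      # speed of light in vacuum, m/s
PETA = 1e15                            # assumed sustained FLOP/s, petascale
EXA = 1e18                             # assumed sustained FLOP/s, exascale
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def wavelength_mm(freq_hz: float) -> float:
    """Free-space wavelength in millimetres."""
    return C / freq_hz * 1e3


def dense_op_flops(n: float) -> dict:
    """Leading-order FLOP counts for dense N x N operations (constants dropped)."""
    return {"MVP": n**2, "MMP": n**3, "Inv": n**3}


def estimate(n: float) -> dict:
    """Map each operation to (FLOPs, petascale seconds, exascale seconds)."""
    return {op: (f, f / PETA, f / EXA) for op, f in dense_op_flops(n).items()}


if __name__ == "__main__":
    lam = wavelength_mm(77e9)
    print(f"wavelength at 77 GHz: {lam:.2f} mm, lambda/10: {lam / 10:.2f} mm")
    for n in (1e8, 1e9, 1e10):
        for op, (flops, t_peta, t_exa) in estimate(n).items():
            print(f"N={n:.0e} {op}: {flops:.0e} FLOPs, "
                  f"{t_peta:.0e} s petascale, {t_exa:.0e} s exascale")
    years = estimate(1e9)["Inv"][2] / SECONDS_PER_YEAR
    print(f"N=1e9 inversion at exascale: about {years:.1f} years")
```

Dropping the constant factors is deliberate: the point of the estimate is the exponent, which no realistic constant can rescue for N in the billions.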
III. ALGORITHMIC AND SOFTWARE SOLUTIONS
We perform exascale computations with the following building blocks:
1) Matrix-vector multiplications (MVMs) with fast multipole methods
(FMMs).
2) Reduced-complexity linear system solvers employing iterative
techniques.
3) Fast Fourier transforms (FFTs) with FMMs.
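How the first two building blocks fit together can be sketched as follows: an iterative Krylov solver never needs the matrix itself, only a routine that returns the product of the matrix with a given vector, so any fast MVM can be plugged in. The sketch below is an assumption-laden illustration, not the solver used in this work. It uses the conjugate gradient method only because it is the shortest Krylov method to write down (it requires a symmetric positive-definite matrix, whereas complex nonsymmetric systems would call for methods such as GMRES or BiCGSTAB), and the stand-in `laplacian_matvec` is a hypothetical O(N) operator, not an FMM.

```python
def conjugate_gradient(matvec, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for symmetric positive-definite A, given only x -> A x."""
    x = [0.0] * len(b)
    r = list(b)                          # residual b - A x for the start x = 0
    p = list(r)                          # first search direction
    rs_old = sum(v * v for v in r)
    if rs_old == 0.0:
        return x, 0                      # b is zero, so x = 0 is the solution
    for iteration in range(1, max_iter + 1):
        ap = matvec(p)                   # the only place the matrix is touched
        alpha = rs_old / sum(pi * api for pi, api in zip(p, ap))
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        rs_new = sum(v * v for v in r)
        if rs_new ** 0.5 < tol:
            return x, iteration
        p = [ri + (rs_new / rs_old) * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x, max_iter


def laplacian_matvec(v):
    """Matrix-free product with the 1-D discrete Laplacian (2 on the diagonal, -1 beside it)."""
    n = len(v)
    return [2.0 * v[i]
            - (v[i - 1] if i > 0 else 0.0)
            - (v[i + 1] if i < n - 1 else 0.0)
            for i in range(n)]


if __name__ == "__main__":
    x_true = [float(i % 7) for i in range(50)]
    b = laplacian_matvec(x_true)
    x, iterations = conjugate_gradient(laplacian_matvec, b)
    error = max(abs(u - v) for u, v in zip(x, x_true))
    print(f"converged in {iterations} iterations, max error {error:.1e}")
```

The total cost is the number of iterations times the cost of one product, which is why lowering the cost of the MVM lowers the cost of the whole solution.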
Fast multipole methods (FMMs) come in many variations that go by
different names. For example, the multilevel fast multipole algorithm
(MLFMA) [1] is an important offshoot of the single-level fast multipole
algorithm. Here, we use FMM to refer to any and all flavors of fast
multipole algorithms (including MLFMA) developed and used in various
disciplines.
FMM can also be used to implement FFTs. This means that FMM can be used
in any discipline that requires the use of FFT. Even better, FMM can
implement FFT on nonuniform and irregular data [2]. FFT is an important
tool for feature extraction in big-data analytics. Therefore, in
addition to extending the range of applicability of FMM among various
disciplines (through the use of FFT), we also propose a unified approach
to big-data analytics and scientific solvers with FMM. Figure 2
illustrates that employing reduced-complexity solvers such as multilevel
FMM and FFT with O(N log N) complexity is the right way of solving
extremely large problems involving billions of unknowns.
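To make "FFT on nonuniform and irregular data" concrete, the sketch below evaluates the Fourier sum directly at arbitrary sample positions. It is the slow reference computation, costing the number of samples times the number of frequencies, that fast algorithms such as those of [2] are designed to accelerate; it is not the fast algorithm itself. The function name `nudft`, the normalization of positions to [0, 1), and the sign convention are assumptions made for illustration.

```python
import cmath
import math


def nudft(samples, positions, num_freqs):
    """Direct nonuniform DFT: F[k] = sum_j samples[j] * exp(-2*pi*i*k*positions[j]).

    positions lie in [0, 1) and need not be equally spaced.
    The cost is O(len(samples) * num_freqs).
    """
    out = []
    for k in range(num_freqs):
        acc = 0j
        for value, x in zip(samples, positions):
            acc += value * cmath.exp(-2j * math.pi * k * x)
        out.append(acc)
    return out


if __name__ == "__main__":
    n = 8
    # On an equally spaced grid the sum reduces to the ordinary DFT.
    uniform = [j / n for j in range(n)]
    tone = [cmath.exp(2j * math.pi * 3 * x) for x in uniform]
    print([round(abs(f), 6) for f in nudft(tone, uniform, n)])

    # The same routine accepts irregular sample positions unchanged.
    irregular = [0.0, 0.11, 0.27, 0.36, 0.52, 0.61, 0.78, 0.93]
    tone_irregular = [cmath.exp(2j * math.pi * 3 * x) for x in irregular]
    print([round(abs(f), 3) for f in nudft(tone_irregular, irregular, n)])
```

An ordinary FFT cannot be applied to the second data set because its butterfly structure relies on equal spacing, which is the gap that nonuniform fast transforms fill.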
FMMs are used in a variety of disciplines, but they are
kernel-dependent. They can be used to accelerate MVMs and linear-system
solutions in many disciplines, such as electrostatics and potential
theory (Laplace’s equation), electrodynamics (Helmholtz’s equation),
acoustics, astrophysics (gravitational force), molecular dynamics,
quantum mechanics (Schrödinger’s equation), structural mechanics, fluid
dynamics, heat transfer (the heat equation), and multi-physics
solutions. However, they cannot be used to perform a simple and
straightforward MVM for an arbitrary matrix and vector. The matrices
must be constructed from specific physical equations whose kernels
satisfy particular conditions, such as diagonalizability.
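Why the kernel matters can be seen in the simplest possible case. For the 1/r kernel of Laplace's equation, the combined effect of a cluster of sources on a distant target is almost the same as that of a single lumped source. The sketch below compares the direct sum with this one-term far-field approximation. It is only the seed idea under stated assumptions (3-D points, positive charges, targets far from the cluster, hypothetical function names); an actual FMM adds higher-order expansions and hierarchical clustering, and the Helmholtz kernel requires a different expansion.

```python
import math
import random


def direct_potential(targets, sources, charges):
    """Exact sum phi(x) = sum_j q_j / |x - y_j|, costing O(M * N)."""
    return [sum(q / math.dist(x, y) for y, q in zip(sources, charges))
            for x in targets]


def monopole_potential(targets, sources, charges):
    """Far-field approximation: lump the whole cluster into one point charge.

    The total charge is placed at the center of charge, so the cluster is
    summarized once in O(N) and each target then costs O(1).
    """
    total = sum(charges)
    center = tuple(sum(q * y[d] for y, q in zip(sources, charges)) / total
                   for d in range(3))
    return [total / math.dist(x, center) for x in targets]


random.seed(0)
# 200 sources inside a unit cube around the origin (illustrative data).
sources = [tuple(random.uniform(-0.5, 0.5) for _ in range(3)) for _ in range(200)]
charges = [random.uniform(0.5, 1.5) for _ in range(200)]   # positive, so total != 0
# Targets roughly 20 cube-widths away, i.e. well separated from the cluster.
targets = [(20.0, 0.0, 0.0), (0.0, 20.0, 5.0), (15.0, 15.0, 15.0)]

exact = direct_potential(targets, sources, charges)
approx = monopole_potential(targets, sources, charges)
worst = max(abs(a - e) / abs(e) for a, e in zip(approx, exact))
print(f"worst relative error of the one-term approximation: {worst:.2e}")
```

For a generic matrix with no such structure, no compact far-field summary exists, which is why the acceleration cannot be applied to an arbitrary MVM.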
Fortunately, one kernel that is amenable to FMM treatment is that of
the Fourier transform. Once FMM finds life in efficient and parallel
implementations of FFT, it can reach into even the most remote
capillaries of the scientific roadmaps. Obviously, FFT is used for
spectral studies in every discipline, but its footprint is much broader
than that. FFT is used in time-harmonic treatment, complex analysis,
MVMs and matrix solutions (in some cases), and in the vast area of big
data. Not only can FMM be used to implement sequential and parallel FFT
efficiently, but it can also be used to compute the Fourier transform of
nonuniform and irregular multidimensional data sequences.
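One of the cases in which FFT accelerates an MVM is a circulant matrix, whose product with a vector is a circular convolution and can therefore be computed in O(N log N) instead of O(N^2). The sketch below is a minimal illustration under stated assumptions: the vector length is a power of two, the radix-2 FFT is a textbook implementation written out only to keep the example self-contained, and all function names are hypothetical.

```python
import cmath


def fft(x, inverse=False):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two.

    The inverse transform is returned unscaled (no division by N).
    """
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        tw = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + tw
        out[k + n // 2] = even[k] - tw
    return out


def circulant_matvec_fft(first_col, x):
    """y = C x in O(N log N), where C[i][j] = first_col[(i - j) % N]."""
    n = len(x)
    spectrum = [a * b for a, b in zip(fft(first_col), fft(x))]
    return [v / n for v in fft(spectrum, inverse=True)]


def circulant_matvec_direct(first_col, x):
    """Reference O(N^2) product with the same circulant matrix."""
    n = len(x)
    return [sum(first_col[(i - j) % n] * x[j] for j in range(n))
            for i in range(n)]


if __name__ == "__main__":
    first_col = [4, 1, 0, 0, 0, 0, 0, 1]
    x = [1, 2, 3, 4, 5, 6, 7, 8]
    fast = circulant_matvec_fft(first_col, x)
    slow = circulant_matvec_direct(first_col, x)
    print(max(abs(f - s) for f, s in zip(fast, slow)))
```

The matrix is never formed in the fast version; only its first column is stored, so memory drops from N^2 to N entries as well.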
Parallelization of FMM is not at all trivial; on the contrary, a
scalable implementation is painstakingly difficult to attain. We have
been making progress and claiming incremental victories in this area for
more than a decade. We have been tuning our parallelization strategies
[3] so that we can achieve scalable solutions with massive numbers of
nodes, processors, and cores on parallel supercomputers [4]. We have
been solving increasingly larger problems over the years, such as those
involving larger than billion-by-billion ($10^{9} \times 10^{9}$) dense
matrix equations.

Matrix     Solver                 Computational   FLOPs       Computing Time (s)
Size (N)                          Complexity                  Petascale    Exascale
                                                              Resources    Resources
$10^{10}$  Gaussian Elimination   $O(N^3)$        $10^{30}$   $10^{15}$    $10^{12}$
           Krylov Iterative       $O(N^2)$        $10^{20}$   $10^{5}$     100
           Single-Level FMM       $O(N^{1.5})$    $10^{15}$   1            $10^{-3}$
           Multi-Level FMM, FFT   $O(N \log N)$   $10^{11}$   $10^{-4}$    $10^{-7}$

Fig. 2. Order-of-magnitude estimates of computing time (in seconds) it would
take to solve matrix equations involving 10 billion unknowns via traditional and
fast solvers and with petascale and exascale computational resources.
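The order-of-magnitude entries of Fig. 2 follow from evaluating each complexity at N = 10^10 and dividing by the machine rate. The sketch below reproduces that arithmetic under stated assumptions: constant factors and iteration counts are dropped, the logarithm is taken base 10 (another base changes only the constant), and the machines are assumed to sustain exactly 10^15 and 10^18 FLOPs per second. The names are hypothetical.

```python
import math

PETA = 1e15   # assumed sustained FLOP/s, petascale
EXA = 1e18    # assumed sustained FLOP/s, exascale

# Leading-order operation counts for each solver class (constants dropped).
COMPLEXITIES = {
    "Gaussian elimination, O(N^3)": lambda n: n**3,
    "Krylov iterative, O(N^2)": lambda n: n**2,
    "Single-level FMM, O(N^1.5)": lambda n: n**1.5,
    "Multilevel FMM / FFT, O(N log N)": lambda n: n * math.log10(n),
}


def solver_estimates(n: float) -> dict:
    """Map each solver to (FLOPs, petascale seconds, exascale seconds)."""
    result = {}
    for name, cost in COMPLEXITIES.items():
        flops = cost(n)
        result[name] = (flops, flops / PETA, flops / EXA)
    return result


if __name__ == "__main__":
    for name, (flops, t_peta, t_exa) in solver_estimates(1e10).items():
        print(f"{name}: {flops:.0e} FLOPs, "
              f"{t_peta:.0e} s petascale, {t_exa:.0e} s exascale")
```

Moving from petascale to exascale hardware gains a factor of $10^{3}$ in every row, whereas moving from $O(N^3)$ to $O(N \log N)$ gains a factor of about $10^{19}$ at this problem size.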
IV. CONCLUSIONS
In this paper, we report a high-level roadmap towards satisfying the
ever-increasing computational needs and requirements of autonomous
mobility. We demonstrate that hardware solutions alone will not be
sufficient, and exascale computing demands should be met with a
combination of hardware and software. In that context, algorithms with
reduced computational complexities, e.g., FMM and FFT, will be crucial
to provide software solutions that will be integrated with available
hardware, which is inherently limited due to mobility restrictions.
REFERENCES
[1] Ö. Ergül and L. Gürel, The Multilevel Fast Multipole Algorithm (MLFMA)
for Solving Large-Scale Computational Electromagnetics Problems.
IEEE Press-Wiley, 2014.
[2] A. Dutt and V. Rokhlin, “Fast Fourier transforms for nonequispaced data,”
SIAM J. Scientific Computing, vol. 14, no. 6, pp. 1368–1393, 1993.
[3] L. Gürel and Ö. Ergül, “Hierarchical parallelization of the multilevel fast
multipole algorithm (MLFMA),” Proceedings of the IEEE, vol. 101, no. 2,
pp. 332–341, 2013.
[4] M. Salim, A. O. Akkirman, M. Hidayetoglu, and L. Gürel, “Comparative
benchmarking: Matrix multiplication on a multicore coprocessor and
a GPU,” in Computational Electromagnetics International Workshop
(CEM’15), 2015, pp. 38–39.

Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
Next-generation AAM aircraft unveiled by Supernal, S-A2
Next-generation AAM aircraft unveiled by Supernal, S-A2Next-generation AAM aircraft unveiled by Supernal, S-A2
Next-generation AAM aircraft unveiled by Supernal, S-A2
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptx
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Azure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAzure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & Application
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 

Exascale Computing for Autonomous Driving

Towards Exascale Computing for Autonomous Driving
Levent Gürel (1, 2)
(1) ABAKUS Computing Technologies, Palo Alto, California, USA
(2) University of Illinois at Urbana-Champaign, Illinois, USA
lgurel@illinois.edu, lgurel@ieee.org

Abstract: Autonomous driving is emerging as one of the most computationally demanding technologies of modern times. We report a high-level roadmap towards satisfying the ever-increasing computational needs and requirements of autonomous mobility, which are easily at the exascale level. We demonstrate that hardware solutions alone will not be sufficient, and exascale computing demands should be met with a combination of hardware and software. In that context, algorithms with reduced computational complexities will be crucial to provide software solutions that will be integrated with available hardware, which is also limited by various mobility restrictions, such as power, space, and weight.

I. INTRODUCTION

Exascale computing refers to computing systems capable of at least one billion-billion (10^18) calculations or floating point operations (FLOPs) per second. The National Strategic Computing Initiative of the White House Office of Science and Technology Policy identifies, as a strategic objective, accelerating delivery of a capable exascale computing system that integrates hardware and software capability to deliver approximately 100 times the performance of current 10 petaflop systems across a range of applications representing government needs. As such, an exascale computing system is defined as the integration of hardware and software capabilities. We agree with this viewpoint. It is necessary, but not sufficient, to drive progress in both hardware and software avenues. A sufficient outcome will also require the successful integration of hardware and software. This paper focuses on the software component. We also pay attention to integration since software cannot be considered independently of the hardware platform it will be used on.

II. THE NEED FOR EXASCALE COMPUTING

Reaching the goal of developing vehicles, drones, and robots that can operate autonomously encompasses so many computing applications that autonomous mobility appears to be capable of devouring as many computational resources as one can throw at it.

Autonomous vehicles (AVs) need to perceive their environment with multiple sensors, such as radars, cameras, lidars, and acoustic/ultrasonic transducers. AVs need to process huge amounts of data provided by the sensors to resolve the semantics of the surrounding theater. Then, AVs need to think and make decisions based on machine learning and artificial intelligence. Those decisions must be implemented through the electronic and mechanical controls of the vehicle. Feedback processes should be used to make sure that the vehicle is acting in compliance with the decisions made. All of those compute-intensive operations should be completed in almost real time and repeated numerous times every second.

Among countless compute-intensive operations, consider the example of high-resolution imaging radars operating at 77 GHz, where the wavelength (λ) is about 3.9 mm. Credible imaging outcomes will require the solution of inverse problems with rigorous full-wave solvers and λ/10 discretization, which is about 0.4 mm. Everyday objects in the surrounding theater, such as other cars, bicycles, pedestrians, and road structures, will be discretized with billions of unknowns. In this context, the term "unknowns" is synonymous with degrees of freedom, nodes, elements, etc. The number of unknowns is the same as the size of the matrix equation to be solved.

Figure 1 illustrates order-of-magnitude estimates of FLOPs and computing times (in seconds) for some basic and relevant operations involving N × N matrices. In Fig. 1, matrix sizes of 10^8 to 10^10 are considered, in accordance with realistic automotive radar scenarios. Even with exascale computing resources, inversion or solution of a billion-by-billion matrix equation will require on the order of 10^9 seconds (about 10^27 FLOPs at 10^18 FLOPs per second), or equivalently, more than 30 years! Hence, it is clear that raw hardware power will not be sufficient for brute-force solutions.

Matrix Size (N)   Matrix Operation   FLOPs    Computing Time (s),    Computing Time (s),
                                              Petascale Resources    Exascale Resources
10^8              MVP                10^16    10                     0.01
10^8              MMP                10^24    10^9                   10^6
10^8              Inv                10^24    10^9                   10^6
10^9              MVP                10^18    1000                   1
10^9              MMP                10^27    10^12                  10^9
10^9              Inv                10^27    10^12                  10^9
10^10             MVP                10^20    10^5                   100
10^10             MMP                10^30    10^15                  10^12
10^10             Inv                10^30    10^15                  10^12

Fig. 1. Order-of-magnitude estimates of computing time (in seconds) it would take to perform some common matrix operations with petascale and exascale computational resources. MVP: Matrix-vector product. MMP: Matrix-matrix product. Inv: Matrix inversion.

III. ALGORITHMIC AND SOFTWARE SOLUTIONS

We perform exascale computations with the following building blocks:
1) Matrix-vector multiplications (MVMs) with fast multipole methods (FMMs).
2) Reduced-complexity linear system solvers employing iterative techniques.
3) Fast Fourier transforms (FFTs) with FMMs.

Fast multipole methods (FMMs) have a plethora of variations identified with different names. For example, the multilevel fast multipole algorithm (MLFMA) [1] is an important outgrowth of the single-level fast multipole algorithm. Here, we will use FMM to refer to any and all flavors of fast multipole algorithms (including MLFMA) developed and used in various disciplines.

FMM can also be used to implement FFTs. This means that FMM can be used in any discipline that requires the use of FFT. Even better, FMM can implement FFT on nonuniform and irregular data [2]. FFT is an important tool for feature extraction in big-data analytics. Therefore, in addition to extending the range of applicability of FMM among various disciplines (through the use of FFT), we also propose a unified approach to big-data analytics and scientific solvers with FMM.

Figure 2 illustrates that employing reduced-complexity solvers, such as multilevel FMM and FFT with O(N log N) complexity, is the right way of solving extremely large problems involving billions of unknowns.

FMMs are used in a variety of disciplines, but nevertheless, they are kernel-dependent.
They can be used to accelerate MVMs and linear-system solutions in many disciplines, such as electrostatics and potential theory (Laplace's equation), electrodynamics (Helmholtz's equation), acoustics, astrophysics (gravitational force), molecular dynamics, quantum mechanics (Schrödinger's equation), structural mechanics, fluid dynamics, heat equation, and multi-physics solutions. Nevertheless, they cannot be used to perform a simple and straightforward MVM for any arbitrary matrix and vector. It turns out that the matrices should be constructed from specific physical equations, whose kernels satisfy some particular conditions, such as diagonalizability.

Fortunately, one such particular kernel that is amenable to FMM treatment is that of the Fourier transform. Once FMM finds life in efficient and parallel implementations of FFT, we can penetrate into the most unimaginable capillaries of the scientific roadmaps. Obviously, FFT is used for spectral studies in every discipline, but its footprint is much broader than that. FFT is used in time-harmonic treatment, complex analysis, MVMs and matrix solutions (in some cases), and in the vast area of big data. Not only can FMM be used to implement sequential and parallel FFT efficiently, but it can also be used to compute the Fourier transform of nonuniform and irregular multidimensional data sequences.

Parallelization of FMM is not at all trivial; on the contrary, a scalable implementation is painstakingly difficult to attain. We have been making progress and claiming incremental victories in this area for more than a decade. We have been tuning our parallelization strategies [3] so that we can achieve scalable solutions with massive numbers of nodes, processors, and cores on parallel supercomputers [4]. We have been solving increasingly larger problems over the years, such as those involving larger than billion-by-billion (10^9 × 10^9) dense matrix equations.
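To make the interplay between a fast MVM and an iterative solver concrete, the following is a minimal sketch and not the FMM/MLFMA implementation discussed above. It swaps in a much simpler stand-in: a circulant matrix, which is diagonalized by the discrete Fourier transform, so its MVM costs O(N log N) through FFTs, and a conjugate-gradient solver that only ever touches the matrix through that MVM. The test matrix (first column 4, -1, 0, ..., 0, -1), the function names, and the tolerances are all assumptions chosen for illustration.

```python
import cmath
import math


def fft(values, inverse=False):
    """Recursive radix-2 FFT (unnormalized); len(values) must be a power of two."""
    n = len(values)
    if n == 1:
        return [complex(values[0])]
    even = fft(values[0::2], inverse)
    odd = fft(values[1::2], inverse)
    sign = 1.0 if inverse else -1.0
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def circulant_matvec(first_col, x):
    """y = C x for the circulant C[i][j] = first_col[(i - j) % n], in O(n log n)."""
    n = len(x)
    spectrum = [a * b for a, b in zip(fft(first_col), fft(x))]
    return [value.real / n for value in fft(spectrum, inverse=True)]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def conjugate_gradient(matvec, b, tol=1e-10, max_iter=200):
    """Solve A x = b for symmetric positive-definite A, given only x -> A x."""
    x = [0.0] * len(b)
    b_norm = math.sqrt(dot(b, b))
    if b_norm == 0.0:
        return x, 0
    r = list(b)  # residual b - A x for the zero initial guess
    p = list(r)
    rs_old = dot(r, r)
    for iteration in range(1, max_iter + 1):
        ap = matvec(p)
        alpha = rs_old / dot(p, ap)
        x = [x_i + alpha * p_i for x_i, p_i in zip(x, p)]
        r = [r_i - alpha * ap_i for r_i, ap_i in zip(r, ap)]
        rs_new = dot(r, r)
        if math.sqrt(rs_new) <= tol * b_norm:
            return x, iteration
        p = [r_i + (rs_new / rs_old) * p_i for r_i, p_i in zip(r, p)]
        rs_old = rs_new
    return x, max_iter


# Hypothetical test problem: a symmetric, diagonally dominant circulant matrix.
n = 1024
first_col = [0.0] * n
first_col[0], first_col[1], first_col[-1] = 4.0, -1.0, -1.0
b = [math.sin(0.01 * i) + 1.0 for i in range(n)]

x, iterations = conjugate_gradient(lambda v: circulant_matvec(first_col, v), b)
residual = [b_i - y_i for b_i, y_i in zip(b, circulant_matvec(first_col, x))]
print(iterations, math.sqrt(dot(residual, residual)))
```

The solver never forms the N × N matrix, so memory stays O(N) and each iteration costs one fast MVM. In an FMM-based solver the FFT product would be replaced by the kernel-specific FMM product, and a Krylov method suited to the (generally non-symmetric) system would replace conjugate gradients.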
Matrix Size (N)   Solver                 Computational   FLOPs    Computing Time (s),    Computing Time (s),
                                         Complexity               Petascale Resources    Exascale Resources
10^10             Gaussian Elimination   O(N^3)          10^30    10^15                  10^12
10^10             Krylov Iterative       O(N^2)          10^20    10^5                   100
10^10             Single-Level FMM       O(N^1.5)        10^15    1                      10^-3
10^10             Multi-Level FMM, FFT   O(N log N)      10^11    10^-4                  10^-7

Fig. 2. Order-of-magnitude estimates of computing time (in seconds) it would take to solve matrix equations involving 10 billion unknowns via traditional and fast solvers and with petascale and exascale computational resources.

IV. CONCLUSIONS

In this paper, we report a high-level roadmap towards satisfying the ever-increasing computational needs and requirements of autonomous mobility. We demonstrate that hardware solutions alone will not be sufficient, and exascale computing demands should be met with a combination of hardware and software. In that context, algorithms with reduced computational complexities, e.g., FMM and FFT, will be crucial to provide software solutions that will be integrated with available hardware, which is inherently limited due to mobility restrictions.

REFERENCES

[1] Ö. Ergül and L. Gürel, The Multilevel Fast Multipole Algorithm (MLFMA) for Solving Large-Scale Computational Electromagnetics Problems. IEEE Press-Wiley, 2014.
[2] A. Dutt and V. Rokhlin, "Fast Fourier transforms for nonequispaced data," SIAM J. Scientific Computing, vol. 14, no. 6, pp. 1368–1393, 1993.
[3] L. Gürel and Ö. Ergül, "Hierarchical parallelization of the multilevel fast multipole algorithm (MLFMA)," Proceedings of the IEEE, vol. 101, no. 2, pp. 332–341, 2013.
[4] M. Salim, A. O. Akkirman, M. Hidayetoglu, and L. Gürel, "Comparative benchmarking: Matrix multiplication on a multicore coprocessor and a GPU," in Computational Electromagnetics International Workshop (CEM'15), 2015, pp. 38–39.
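The order-of-magnitude estimates in Figs. 1 and 2 follow from dividing a FLOP count by a sustained rate, and the short sketch below retraces that arithmetic. It rests on several assumptions that are not stated in the paper: every constant factor in the complexity models is set to 1, the logarithm in O(N log N) is taken in base 10, and the machines are idealized as delivering exactly 10^15 and 10^18 FLOPs per second with no memory or communication cost. The names SOLVERS and estimate are hypothetical.

```python
import math

PETASCALE = 1e15  # FLOPs per second, idealized peak (assumption)
EXASCALE = 1e18   # FLOPs per second, idealized peak (assumption)
SECONDS_PER_YEAR = 365.25 * 24 * 3600

# FLOP-count models with every constant factor set to 1 (assumption).
SOLVERS = {
    "Gaussian elimination": lambda n: n ** 3,
    "Krylov iterative": lambda n: n ** 2,
    "Single-level FMM": lambda n: n ** 1.5,
    "Multilevel FMM, FFT": lambda n: n * math.log10(n),
}


def estimate(n):
    """Return {solver: (flops, petascale_seconds, exascale_seconds)} for size n."""
    rows = {}
    for name, model in SOLVERS.items():
        flops = model(n)
        rows[name] = (flops, flops / PETASCALE, flops / EXASCALE)
    return rows


# Ten billion unknowns, as in Fig. 2.
for name, (flops, t_peta, t_exa) in estimate(1e10).items():
    print(f"{name:<22} {flops:9.0e} FLOPs {t_peta:9.0e} s {t_exa:9.0e} s")

# Dense inversion of a billion-by-billion matrix on an exascale machine.
years = 1e9 ** 3 / EXASCALE / SECONDS_PER_YEAR
print(f"N = 1e9, O(N^3), exascale: {years:.1f} years")
```

Because the constants are dropped, only the exponents are meaningful. The point of the exercise is the gap between the rows: moving from petascale to exascale hardware buys three orders of magnitude, whereas moving from O(N^3) to O(N log N) at this problem size buys many more.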