byteLAKE's expertise across NVIDIA architectures and configurations

NVIDIA Expertise
AI Solutions for Industries

• In-depth understanding of NVIDIA hardware and software ecosystems:
– Expertise in various NVIDIA architectures, including Fermi, Kepler (K80, GeForce GTX Titan,
Jetson), Maxwell (NVIDIA GeForce GTX 980), Pascal (P100), Volta (V100), Turing (T4), Ampere (A100), and Hopper (H100).
– Mastery of GPU programming models such as CUDA, OpenCL, and OpenACC.
– Extensive experience optimizing solutions for NVIDIA GPUs.
– Comprehensive knowledge spanning desktop, mobile, and server environments.
• Demonstrated success through multiple case studies:
– AI training (machine learning and deep learning)
– Edge AI inferencing
– Classic HPC simulations (Computational Fluid Dynamics, weather simulations)
• Active participation in cutting-edge research with numerous publications in prestigious
journals, including Concurrency and Computation: Practice and Experience, Parallel
Computing, and the Journal of Supercomputing.
Expertise Across NVIDIA Architectures
and Configurations

Performance & Scalability
AI
Always optimized for the latest hardware available.
AI training optimized for edge servers and multi-node HPC architectures.
Edge Servers and HPC
Cost-efficient, scalable solution
Maximum performance
Quick deployment & no external dependencies
On-premises
Example configurations. Other options available.
Hardware
ThinkSystem SR650V2
ThinkEdge SE350 ThinkEdge SE450
Workstations ThinkSystem SR670V2
AI Predictions AI Training
Edge HPC Server Performance: AI training
➢ Lenovo’s ThinkEdge SE450
➢ 2 NVIDIA A100 80GB Tensor Core GPUs
➢ Intel Xeon Gold 6330N CPU
➢ Download the full report

HPC Simulations optimized by Machine Learning
Automatic adaptation of an algorithm to a specific hardware architecture
• Enables software portability between different architectures:
– CPU: different numbers of cores, memory hierarchies, cache sizes;
– GPU: register file reuse, shared memory utilization, GPUDirect support, reduction of global memory transactions;
– HPC: selecting the right number of nodes, scalability estimation, overlapping data transfers & communication;
– Hybrid: load balancing
(i.e. selecting appropriate parts of the algorithm for devices that execute the code with different performance)
• Helps build adaptable algorithms:
– automatically selecting the size of data blocks, number of threads, number of processes, and precision of data
(e.g. depending on the algorithm or input data characteristics, some data can be stored in double, single, or half precision)
– selecting the optimization criterion: performance, energy consumption, accuracy of results, or a mix
(e.g. a mix of performance & energy: minimize energy while keeping the execution time, or optimize performance while keeping within an energy budget)
• Can auto-configure the system and provide the most suitable compiler flags
byteLAKE’s Software Autotuning
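The parameter selection described on this slide can be illustrated with a minimal sketch. It is not byteLAKE's autotuner: the toy kernel blocked_sum, the helper names, and the candidate block sizes are all hypothetical, and the search is a plain exhaustive sweep that keeps the configuration with the lowest measured cost.

```python
import time
from itertools import product

def blocked_sum(data, block_size):
    """Toy kernel (hypothetical): sum a list block by block."""
    total = 0.0
    for start in range(0, len(data), block_size):
        total += sum(data[start:start + block_size])
    return total

def measure_seconds(kernel, config, repeats=3):
    """Best-of-N wall-clock time of kernel(**config)."""
    best = float("inf")
    for _ in range(repeats):
        t0 = time.perf_counter()
        kernel(**config)
        best = min(best, time.perf_counter() - t0)
    return best

def autotune(kernel, search_space, cost=measure_seconds):
    """Try every combination in search_space; return (best_config, best_cost)."""
    names = list(search_space)
    best_config, best_cost = None, float("inf")
    for values in product(*(search_space[n] for n in names)):
        config = dict(zip(names, values))
        c = cost(kernel, config)
        if c < best_cost:
            best_config, best_cost = config, c
    return best_config, best_cost

# Assumed workload and candidate values, chosen only for illustration.
data = [float(i) for i in range(20_000)]
kernel = lambda block_size: blocked_sum(data, block_size)
best_config, best_time = autotune(kernel, {"block_size": [16, 256, 4096]})
print(best_config, best_time)
```

The cost argument is deliberately pluggable: swapping measure_seconds for an energy reading or an accuracy metric would cover the other optimization criteria listed above.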

• Goal: Reducing the energy consumption of the MPDATA algorithm (an algorithm for numerical simulation of
geophysical fluid flows on micro-to-planetary scales, used especially in numerical weather prediction).
• Hardware: Piz Daint supercomputer (ranked 3rd on the TOP500 list), equipped with the most advanced Pascal-based
GPUs: NVIDIA Tesla P100.
• Idea: Applying mixed precision arithmetic: one part of the operations is performed in single precision (32 bits)
and the remaining part in double precision (64 bits).
• Why do we use it? A single simulation of the weather phenomenon needed more than 10^13 operations. We
suspected that not all of them need double precision arithmetic to preserve the same simulation accuracy. We
believe that controlling the precision and accuracy of numerical results can increase performance, decrease
energy consumption, and still provide highly accurate results.
• Solution: We used unsupervised learning to estimate the correlation between the precision of each matrix and
its influence on the criteria (energy, accuracy of results). During the short, dynamic training stage we identified
the set of operations that could be performed in single precision without loss of accuracy in the weather
simulation.
Research Case: (concluded)
CFD acceleration with GPU (MPDATA, weather forecast)
Results: We reduced energy consumption by 33% and increased
performance by a factor of 1.27 while using 25% fewer GPUs,
keeping the accuracy of the results at the same level
as with double precision arithmetic.
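The idea of demoting only those arrays that tolerate single precision can be sketched as follows. This is a simplified illustration, not the method used in the study: it replaces the unsupervised learning step with a greedy sensitivity test, emulates 32-bit storage with Python's struct module, and uses made-up arrays (pressure, weights) and a made-up toy computation.

```python
import struct

def to_f32(x):
    """Round a Python float (double) to the nearest IEEE 754 single."""
    return struct.unpack("f", struct.pack("f", x))[0]

def simulate(arrays, single):
    """Toy computation; arrays named in `single` are stored in float32."""
    stored = {name: [to_f32(v) if name in single else v for v in values]
              for name, values in arrays.items()}
    return sum((p - 101325.0) * w
               for p, w in zip(stored["pressure"], stored["weights"]))

def choose_single_precision(arrays, tolerance):
    """Greedily demote arrays while the relative error stays within tolerance."""
    reference = simulate(arrays, single=set())
    single = set()
    for name in arrays:
        trial = single | {name}
        error = abs(simulate(arrays, trial) - reference) / abs(reference)
        if error <= tolerance:
            single = trial
    return single

# Hypothetical input: small perturbations on a large offset are sensitive to
# rounding, while the weights are not.
arrays = {
    "pressure": [101325.0 + 1e-4 * i for i in range(100)],
    "weights": [1.0 / (i + 1) for i in range(100)],
}
demoted = choose_single_precision(arrays, tolerance=1e-6)
print(demoted)
```

In this toy case only the weights array is demoted: rounding the pressure array to single precision destroys the small perturbations that the result depends on.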

Research Case:
Reconfiguring HPC Simulation with AI to optimize performance and energy
Tunable parameters: node count, accelerators per node, memory alignment, streams count, buffering types, CPU cores, memory policy, …
1 000 000 000 possible configurations
Ca. 5000 possible configurations
Artificial Intelligence
We develop a Machine Learning module in order to select the most fitting configuration.
This module utilizes, among others, the supervised learning method with the random forest algorithm.
The main functionality of the module is to prune the search space in order to eliminate the worst configurations.
In this way we obtain a small set that contains the best configuration in 90% of cases.
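The pruning step can be sketched on a tiny scale. The slide names a random forest as the learner; to stay dependency-free, this sketch swaps in a much simpler surrogate (the average measured cost per parameter value). The search space, the cost function toy_cost, and the balanced training subset are all hypothetical and chosen only so the example is easy to follow.

```python
from itertools import product

# Hypothetical search space: 4 x 4 x 4 = 64 configurations.
SPACE = {
    "nodes": [1, 2, 4, 8],
    "streams": [1, 2, 4, 8],
    "block": [32, 64, 128, 256],
}

def toy_cost(cfg):
    """Stand-in for a measured run time (purely additive, made up)."""
    return 8.0 / cfg["nodes"] + abs(cfg["streams"] - 4) + abs(cfg["block"] - 128) / 32.0

def all_configs(space):
    names = list(space)
    return [dict(zip(names, values)) for values in product(*space.values())]

def training_subset(space):
    """Balanced subset: keep configs whose value indices sum to a multiple of 4."""
    names = list(space)
    subset = []
    for idx in product(*(range(len(space[n])) for n in names)):
        if sum(idx) % 4 == 0:
            subset.append({n: space[n][i] for n, i in zip(names, idx)})
    return subset

def fit_surrogate(samples, costs):
    """Average measured cost for every (parameter, value) pair."""
    sums, counts = {}, {}
    for cfg, cost in zip(samples, costs):
        for key in cfg.items():
            sums[key] = sums.get(key, 0.0) + cost
            counts[key] = counts.get(key, 0) + 1
    return {key: sums[key] / counts[key] for key in sums}

def predict(model, cfg):
    return sum(model[key] for key in cfg.items())

def prune(space, cost, keep):
    """Measure a subset, rank the full space by predicted cost, keep the top."""
    samples = training_subset(space)
    model = fit_surrogate(samples, [cost(c) for c in samples])
    ranked = sorted(all_configs(space), key=lambda c: predict(model, c))
    return ranked[:keep], len(samples)

shortlist, measured = prune(SPACE, toy_cost, keep=8)
best = min(shortlist, key=toy_cost)
print(measured, len(shortlist), best)
```

Only 16 of the 64 configurations are measured, and the remaining candidates are ranked by prediction alone. With a real, non-additive cost the shortlist would contain the best configuration only with some probability, which is the situation the 90% figure above describes.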

MPDATA Accelerated
CFD / Advection algorithm optimized for heterogeneous
computing. For NVIDIA GPUs and Xilinx Alveo FPGAs.

• MPDATA
(Multidimensional Positive Definite Advection Transport Algorithm)
– main part of the dynamic core of the Eulerian/semi-Lagrangian (EULAG) model
– EULAG (MPDATA+elliptic solver) is the established computational model,
developed for simulating thermo-fluid flows across a wide range of scales
and physical scenarios
– currently, this model is being implemented as the new dynamic core of the COSMO
(Consortium for Small-scale Modeling) weather prediction framework
– advection (together with the elliptic solver) is a key part of many frameworks that allow
users to implement their simulations
• Advection
– movement of some material (dissolved or suspended) in the fluid.
Algorithm: Advection (MPDATA)
General Information

• Easy to integrate
– Can work as a standalone application or be called as a function via our dedicated interface
(e.g. can be called as a function with input and output arrays)
– Compatible with frameworks like TensorFlow for integrating deep learning with CFD codes
• Easy to visualize the results
– Results can be stored in a raw format as a binary file of the output arrays or converted via
byteLAKE tools to a ParaView format
• See benefits already in 1-node HPC configurations
– Strongly adapted to Alveo U250, where a single card supports the max size of arrays: 2.1 Gcells
(max compute domain: 1264 x 1264 x 1264) ~ 60 GB
• Scalable to many cards per node and many nodes
byteLAKE’s implementation compatibility

• First-order-accurate step of the advection scheme.
Second-order is an option.
• Input data
– Array X – non-diffusive quantity
(e.g. temperature of water vapor, ice, precipitation, etc.)
– Arrays V1, V2, V3 - each of them stores the velocity vectors in one direction
– (optional) Arrays Fi, Fe - implosion and explosion forces acting on a structure of X
– (optional) Array D with density
– (optional) Array rho which defines an interface for the coupling of COSMO and EULAG dynamic core
(used to provide the transformation of the X variable)
– DT – time step (scalar)
• Output data
– single X array that was updated in the given time step
Technical Information
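For orientation, the first-order pass of MPDATA is an upwind (donor-cell) step. The sketch below shows such a step in one dimension only. It is an illustration under stated assumptions, not byteLAKE's implementation: the function name, the periodic boundary, the uniform grid spacing dx, and the face-centred layout of the velocity are assumptions, and the optional inputs (forces, density, rho) are left out.

```python
def donor_cell_step(x, v, dt, dx=1.0):
    """One first-order upwind (donor-cell) advection step on a periodic 1D grid.

    x[i] is the cell value of the advected quantity (the slide's array X).
    v[i] is the velocity at the face between cell i and cell i+1 (one
    component, like V1), wrapping around at the end of the grid.
    dt is the time step (the slide's DT).
    """
    n = len(x)
    flux = []
    for i in range(n):
        c = v[i] * dt / dx                     # Courant number at face i+1/2
        right = x[(i + 1) % n]
        # Take the value from the upstream ("donor") cell.
        flux.append(max(c, 0.0) * x[i] + min(c, 0.0) * right)
    # flux[i - 1] with i == 0 wraps to the last face (periodic boundary).
    return [x[i] - (flux[i] - flux[i - 1]) for i in range(n)]

x0 = [0.0, 0.0, 1.0, 0.0, 0.0]
x1 = donor_cell_step(x0, v=[0.5] * 5, dt=1.0)
print(x1)  # [0.0, 0.0, 0.5, 0.5, 0.0]
```

The step conserves the total of X and keeps it non-negative as long as the Courant number stays at or below 1. The second-order option mentioned above adds a corrective pass that reduces the numerical diffusion visible in the output.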

• Applications include
– Characterizing sub-grid scale effects in global numerical simulations
of turbulent stellar interiors
– Comparing anelastic and compressible convection-permitting weather forecasts
for the Alpine region
– Modeling and prediction of forest fire spread
– Flood simulations
– Biomechanical modeling of brain injuries within the Voigt model
(a linear system of differential equations where the motion of the brain tissue depends
merely on the balance between viscous and elastic forces)
– Simulation of gravity wave turbulence in the Earth's atmosphere
– Simulation of geophysical turbulence in the Earth's atmosphere
– Ocean modeling: simulation of three-dimensional solitary wave generation and
propagation using EULAG coupled to the barotropic NCOM (Navy Coastal Ocean Model)
tidal model
Applications of Advection (MPDATA)

• Applications include (cont.)
– Oil and Gas: provides a significant return on investment (ROI) in seismic analysis,
reservoir modelling and basin modelling. Used also to monitor drilling and seismic data
to optimize drilling trajectories and minimize environmental risk.
– AgriTech: models to track and predict various environmental impacts on crop yield such
as weather changes. For example, daily weather predictions can be customized based on
the needs of each client and range from hyperlocal to global.
• Example adopters
– Poznan Supercomputing and Networking Center, Poland: prognosis of air pollution
– European Centre for Medium-Range Weather Forecasts, UK: weather forecast
– Institute of Meteorology and Water Management, Poland: weather forecasts
– German Aerospace Center: aeronautics, transport and energy areas
– University of Cape Town, South Africa: weather simulation
– Montreal University: weather simulation
– Warsaw University: ocean simulation
Applications of Advection (MPDATA), cont.
Full list

12x better performance
30% reduced energy consumption
• Our solution: machine learning managed, dynamic application of mixed precision
• Highlights:
– Dynamic estimation of the algorithm’s power consumption as a function of the
frequency of the processor and the number of cores.
– Energy-aware task management
– Auto-tuning procedure taking into account algorithm’s and GPU-specific parameters
for auto-configuring purposes.
– Result: better performance, less energy consumed.
Weather engine optimized for Europe’s fastest
supercomputer (Piz Daint)
Our mechanism provides energy savings of up to 1.43x
compared to the default Linux scaling governor.
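The two energy figures on this slide can be related with a short calculation. It assumes that the 1.43x factor means baseline energy divided by tuned energy; the slide does not state this, so treat the reading as an assumption.

```python
def savings_factor_to_reduction(factor):
    """Convert 'baseline energy / tuned energy' into a fractional reduction."""
    return 1.0 - 1.0 / factor

reduction = savings_factor_to_reduction(1.43)
print(f"{reduction:.1%}")  # 30.1%
```

Under that reading, a 1.43x saving corresponds to roughly the 30% reduction quoted at the top of the slide.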

Dynamic Mixed Precision, cont.
We reduced energy consumption by 33% and increased performance by a factor of 1.27 while using 25% fewer GPUs.
We kept the accuracy of the results at the same level as with double precision arithmetic.

Dynamic Mixed Precision
Optimize execution time
• Ported a geophysical model (EULAG) to a parallel supercomputer architecture (Piz Daint)
• Used Machine Learning (Random Forest) to optimize various numerical parameters such as data block sizes, number of GPU streams, and sizes of vector data types
Optimize energy efficiency
• Developed a mechanism (mixed precision) that keeps the energy consumption of supercomputers low while keeping the code performance at the highest possible level
• Developed a framework based on a software automatic tuning approach
Results
✓ 10 times faster
✓ Then we improved it even more, reaching a speed-up of 1.27
✓ Energy consumption reduced by 33%
✓ Optimized GPU usage while keeping the accuracy of computations
Highlights:
• C++, CUDA, MPI, OpenMP

Advanced quality inspection and data insights
AI for Manufacturing, Automotive, Paper, Chemical, and Energy sectors.
AI self-checkout stations for Restaurants and object recognition for Retail businesses.
AI Solutions
for Industries

for Manufacturing for Automotive for Paper Industry Data Insights
Cognitive Services
Advanced quality inspection and data insights.
Cognitive Services for Restaurants
Self-checkout and object recognition.
CFD Suite
AI-accelerated Computational Fluid Dynamics.
Predictive Maintenance
Featured Products

Custom AI Development
AI Services
AI Workshop Edge AI Cognitive Automation HPC
Incubation
Intel Expertise Alveo Expertise
NVIDIA Expertise
brainello Ewa Guard
Document Processing Forestry & Agriculture
+48 508 091 885
+48 505 322 282
welcome@byteLAKE.com

➢ LinkedIn.com/company/byteLAKE
➢ X.com/byteLAKEGlobal
➢ FB.com/byteLAKE/
➢ byteLAKE.com/en/YouTube
➢ Blog
Partners & Clients
“AI already plays a very important role in our daily lives. […]
The application of the Intel® Distribution of OpenVINO™ toolkit
in byteLAKE’s Cognitive Services shows that AI works efficiently
as an actual tool for optimizing company operations. Moreover,
such a combination reduces the barrier of necessary upgrades to IT
infrastructure [...],” said Krzysztof Jonak,
EMEA Territory Sales Director, Intel.
“We’re also working with a number of partners on AI initiatives
that will provide real world solutions for customers. […]
Our collaboration with partners such as Intel, NVIDIA, Mark III
Systems, and byteLAKE greatly expands the resources and
expertise we’re able to provide,” said Dr. Bhushan Desam,
Lenovo’s AI Global Business Leader, HPC and AI Business.

Our Research Studies
GO PUBLIC!
More at: byteLAKE.com/en/research

AI Deployment Plan
Case Studies
Science + business + industry know-how
& Partners

Meet byteLAKE
AI Solutions for Industries |
Quality Inspection |
Data Insights |
AI-accelerated CFD |
Self-Checkout
Products:
CFD Suite Cognitive Services
www.byteLAKE.com
Headquartered in Poland
Empowering Industries with Artificial Intelligence
Solutions.
At byteLAKE, we harness cutting-edge technology to
provide advanced quality inspection and data insights
tailored for the Manufacturing, Automotive, Paper,
Chemical, and Energy sectors.
Additionally, we offer self-checkout stations for
Restaurants and object recognition solutions for Retail
businesses.
