AES encryption on modern consumer architectures
Politehnica University of Bucharest
Automatic Control and Computers Faculty
Computer Science Department
Author: Ing. Grigore Lupescu (grigore.lupescu@gmail.com)
Scientific Advisor: Sl. Dr. Ing. Laura Gheorghe
Presentation Session - July 2014
AES Encryption
02.07.2014 Masters Presentation Session – July 2014 2
• Symmetric block cipher operating on 128-bit blocks with 128-, 192- or 256-bit keys; it can encrypt and decrypt information (adopted by NIST in 2001 as a standard for the encryption of electronic data)
• AES algorithm:
1. KeyExpansion: round keys are derived from the cipher key.
2. Initial round: AddRoundKey.
3. Rounds (9, 11 or 13 for 128-, 192- or 256-bit keys):
I. SubBytes - substitution step: each byte is replaced according to the SBOX.
II. ShiftRows - transposition step: row r of the state is cyclically shifted left by r bytes.
III. MixColumns - mixing operation on the columns of the state; the operations (+,*) are redefined in the Galois field GF(2^8).
IV. AddRoundKey - bitwise XOR of the state with the round key.
4. Final round: SubBytes, ShiftRows, AddRoundKey (no MixColumns).
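The steps above can be traced in a compact, portable C++ sketch of AES-128 single-block encryption. It is a readable reference only, not the implementation used in this work: to stay short it computes the S-box on the fly (inverse in GF(2^8), then the affine transform) instead of using a 256-entry table, it is slow and not constant-time, and the function names (`gmul`, `sbox`, `expand_key`, `aes128_encrypt_block`) are illustrative.

```cpp
#include <array>
#include <cstdint>

using Block = std::array<uint8_t, 16>;  // state bytes, column-major: index = 4 * column + row

// Multiplication in GF(2^8) modulo the AES polynomial x^8 + x^4 + x^3 + x + 1.
uint8_t gmul(uint8_t a, uint8_t b) {
    uint8_t p = 0;
    while (b) {
        if (b & 1) p ^= a;
        bool carry = a & 0x80;
        a <<= 1;
        if (carry) a ^= 0x1B;  // reduce by the AES polynomial
        b >>= 1;
    }
    return p;
}

uint8_t rotl8(uint8_t x, int n) { return static_cast<uint8_t>((x << n) | (x >> (8 - n))); }

// S-box computed on the fly: inverse a^254 in GF(2^8) (0 maps to 0), then the affine transform.
uint8_t sbox(uint8_t a) {
    uint8_t inv = 1;
    for (int i = 0; i < 254; ++i) inv = gmul(inv, a);
    return static_cast<uint8_t>(inv ^ rotl8(inv, 1) ^ rotl8(inv, 2) ^ rotl8(inv, 3) ^ rotl8(inv, 4) ^ 0x63);
}

// KeyExpansion for AES-128: 11 round keys of 16 bytes each.
std::array<uint8_t, 176> expand_key(const Block& key) {
    std::array<uint8_t, 176> w{};
    for (int i = 0; i < 16; ++i) w[i] = key[i];
    uint8_t rcon = 0x01;
    for (int i = 16; i < 176; i += 4) {
        uint8_t t[4] = {w[i - 4], w[i - 3], w[i - 2], w[i - 1]};
        if (i % 16 == 0) {  // first word of a round key: RotWord, SubWord, XOR Rcon
            uint8_t t0 = t[0];
            t[0] = static_cast<uint8_t>(sbox(t[1]) ^ rcon);
            t[1] = sbox(t[2]);
            t[2] = sbox(t[3]);
            t[3] = sbox(t0);
            rcon = gmul(rcon, 2);
        }
        for (int j = 0; j < 4; ++j) w[i + j] = static_cast<uint8_t>(w[i - 16 + j] ^ t[j]);
    }
    return w;
}

void add_round_key(Block& s, const uint8_t* rk) { for (int i = 0; i < 16; ++i) s[i] ^= rk[i]; }
void sub_bytes(Block& s) { for (auto& b : s) b = sbox(b); }

// Row r is rotated left by r positions.
void shift_rows(Block& s) {
    const Block t = s;
    for (int c = 0; c < 4; ++c)
        for (int r = 0; r < 4; ++r) s[4 * c + r] = t[4 * ((c + r) % 4) + r];
}

// Each column is multiplied by the fixed matrix [2 3 1 1; 1 2 3 1; 1 1 2 3; 3 1 1 2].
void mix_columns(Block& s) {
    for (int c = 0; c < 4; ++c) {
        uint8_t* col = &s[4 * c];
        const uint8_t a0 = col[0], a1 = col[1], a2 = col[2], a3 = col[3];
        col[0] = static_cast<uint8_t>(gmul(a0, 2) ^ gmul(a1, 3) ^ a2 ^ a3);
        col[1] = static_cast<uint8_t>(a0 ^ gmul(a1, 2) ^ gmul(a2, 3) ^ a3);
        col[2] = static_cast<uint8_t>(a0 ^ a1 ^ gmul(a2, 2) ^ gmul(a3, 3));
        col[3] = static_cast<uint8_t>(gmul(a0, 3) ^ a1 ^ a2 ^ gmul(a3, 2));
    }
}

Block aes128_encrypt_block(const Block& in, const Block& key) {
    const auto w = expand_key(key);
    Block s = in;
    add_round_key(s, &w[0]);                    // initial round
    for (int round = 1; round <= 9; ++round) {  // 9 full rounds
        sub_bytes(s);
        shift_rows(s);
        mix_columns(s);
        add_round_key(s, &w[16 * round]);
    }
    sub_bytes(s);                               // final round: no MixColumns
    shift_rows(s);
    add_round_key(s, &w[160]);
    return s;
}
```

With the FIPS-197 Appendix C.1 inputs (key 000102...0f, plaintext 00112233...ff), `aes128_encrypt_block` should return 69c4e0d86a7b0430d8cdb78070b4c55a.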
Cipher Modes
• Mode of operation: an algorithm that repeatedly applies a block cipher (e.g. AES) to the input plaintext
– Most modes of operation require an initialization vector (IV)
• Most used cipher modes: Cipher-block chaining (CBC), Counter (CTR)
• Other cipher modes: Electronic codebook (ECB), Output feedback (OFB)
• Why use ECB?
– Simple, fast, highly parallelizable, maximum throughput
– Provides a good estimate of how CTR would perform
[Figure: ECB mode encryption - each plaintext block is encrypted independently by the block cipher with the same key, producing one ciphertext block]
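To make the ECB/CTR comparison concrete, the sketch below drives a generic 16-byte block cipher in both modes. `BlockCipher`, `ecb_encrypt` and `ctr_xcrypt` are hypothetical names, and the cipher is passed in as a callable so any AES backend could be plugged in. In ECB every block is independent, so equal plaintext blocks give equal ciphertext blocks (acceptable for a throughput benchmark, not for protecting real data). In CTR the cipher only encrypts counter blocks and the result is XORed into the data, so blocks are still independent and parallelize just as well, which is why ECB throughput is a reasonable proxy for CTR.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

using Block = std::array<uint8_t, 16>;
// Any 128-bit block cipher with its key already bound in (e.g. an AES-128 backend).
using BlockCipher = std::function<Block(const Block&)>;

// ECB: each 16-byte block is encrypted independently.
std::vector<uint8_t> ecb_encrypt(const BlockCipher& enc, const std::vector<uint8_t>& data) {
    assert(data.size() % 16 == 0);  // no padding in this sketch
    std::vector<uint8_t> out(data.size());
    for (std::size_t off = 0; off < data.size(); off += 16) {
        Block b;
        for (int i = 0; i < 16; ++i) b[i] = data[off + i];
        const Block c = enc(b);
        for (int i = 0; i < 16; ++i) out[off + i] = c[i];
    }
    return out;
}

// CTR: keystream block k = E(nonce || counter k), XORed into the data.
// Encryption and decryption are the same operation; a trailing partial block is allowed.
std::vector<uint8_t> ctr_xcrypt(const BlockCipher& enc, const std::array<uint8_t, 8>& nonce,
                                const std::vector<uint8_t>& data) {
    std::vector<uint8_t> out(data.size());
    uint64_t ctr = 0;
    for (std::size_t off = 0; off < data.size(); off += 16, ++ctr) {
        Block cb{};
        for (int i = 0; i < 8; ++i) cb[i] = nonce[i];
        for (int i = 0; i < 8; ++i) cb[15 - i] = static_cast<uint8_t>(ctr >> (8 * i));  // big-endian counter
        const Block ks = enc(cb);
        for (std::size_t i = 0; i < 16 && off + i < data.size(); ++i)
            out[off + i] = static_cast<uint8_t>(data[off + i] ^ ks[i]);
    }
    return out;
}
```

Because each loop iteration in both functions depends only on its own block index, either loop can be split across threads or GPU work-items without changing the result.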
Motivation
• Explore how modern low-end commodity hardware handles AES encryption
• The literature strongly favors the GPU over the CPU, but in these comparisons:
– transfers to/from the GPU are ignored and only the actual kernel execution is measured
– the CPU code is neither optimized nor parallelized
– this creates a false impression that the GPU clearly outperforms the CPU
• Study how different scenarios impact performance on the CPU and GPU:
– performance on the CPU, iGPU, dGPU and combinations of these
– influence of the compiler, SDK and optimization techniques
– the memory hierarchy and its impact on performance
Software architecture (1)
• Written in C++; portable (Linux/Windows) and modular, built with CMake. Uses OpenMP, OpenCL and AES-NI.
• Source code is divided into three
categories: I/O, MAIN and AES.
• AES code may be further divided
based on target processing units
– CPU => AES_HWNI
– GPU => AES_GPU
– CPU+GPU => AES_HYBRID
Software architecture (2)
When processing, (Ct + Gt) threads are spawned:
• Ct threads feed work to the CPU cores (Ct = number of CPU cores)
• Gt threads feed work to the GPU devices (Gt = number of GPU devices)
• The figure on the left shows the case Ct = 2 (dual-core CPU), Gt = 2 (iGPU, dGPU)
• Each CPU core or GPU device receives a percentage of the work according to its capabilities (the work split is set statically by the user)
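A minimal sketch of such a static work split, assuming each of the Ct + Gt worker threads receives one contiguous byte range sized by a user-supplied percentage. `split_work` and `run_split` are hypothetical helpers, and each worker callable stands in for a CPU (AES-NI) or GPU (OpenCL) backend; rounding ranges to whole 16-byte AES blocks and giving the remainder to the last worker are choices of this sketch, not necessarily what the project does.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <thread>
#include <utility>
#include <vector>

using Range = std::pair<std::size_t, std::size_t>;  // [begin, end) in bytes

// Split `total` bytes into contiguous ranges proportional to `percent` (must sum to 100).
// Each range is rounded down to whole 16-byte AES blocks; the last worker takes the rest.
std::vector<Range> split_work(std::size_t total, const std::vector<unsigned>& percent) {
    assert(!percent.empty());
    unsigned sum = 0;
    for (unsigned p : percent) sum += p;
    assert(sum == 100);
    std::vector<Range> ranges;
    std::size_t begin = 0;
    for (std::size_t i = 0; i < percent.size(); ++i) {
        const bool last = (i + 1 == percent.size());
        const std::size_t end = last ? total : begin + (total * percent[i] / 100) / 16 * 16;
        ranges.emplace_back(begin, end);
        begin = end;
    }
    return ranges;
}

// Spawn one thread per worker (e.g. Ct CPU threads followed by Gt GPU-feeding threads)
// and wait for all of them to finish.
void run_split(std::size_t total, const std::vector<unsigned>& percent,
               const std::vector<std::function<void(std::size_t, std::size_t)>>& workers) {
    assert(workers.size() == percent.size());
    const auto ranges = split_work(total, percent);
    std::vector<std::thread> threads;
    for (std::size_t i = 0; i < workers.size(); ++i)
        threads.emplace_back(workers[i], ranges[i].first, ranges[i].second);
    for (auto& t : threads) t.join();
}
```

For example, `split_work(1000, {50, 30, 20})` yields the ranges [0, 496), [496, 784) and [784, 1000). Because the split is static, a mis-chosen percentage leaves the faster units idle while the slowest one finishes.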
AES Performance Testing
• Initial test system: AMD A6-5400K (dual-core x86 + iGPU HD7540, AES-NI support), dGPU R7 250, 6GB DDR3@1600, Ubuntu 12.04 LTS x64
• GPGPU implementation – OpenCL
– SubBytes – precomputed SBOX stored in constant memory
– MixColumns – precomputed Galois field matrices to avoid computing (+,*) at runtime
– ShiftRows and AddRoundKey – simple operations
• CPU implementation – OpenMP
– AES encryption using AES-NI instructions and OpenMP for parallelism on multicore CPUs
– Comparison with the OpenSSL library
Results Single Unit Processing
[Figure: AES-128 ECB throughput (OY, MB/sec) versus chunk size (OX, MB) for each compute unit]
• The A6-5400K CPU with AES-NI has the best performance among the tested compute units.
• The iGPU with 3 CUs yields a modest ~150 MB/sec, while the dGPU with 6 CUs yields a better result, ~400 MB/sec, which correlates with its higher processing power and memory bandwidth compared to the iGPU.
Results Multi Unit Processing
• Various AES hybrid processing configurations (CPU+iGPU, CPU+dGPU, …)
• Work split determined by experiments and given statically at runtime
• Performance results are below expectations
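One way to state what "expected" means for a hybrid configuration (an assumption of this note, not a formula from the slides): if unit i alone reaches throughput T_i and receives a fraction w_i of S bytes, all units finish together only when w_i is proportional to T_i, and the combined throughput can then at best equal the sum of the single-unit throughputs. Shared resources such as memory bandwidth and CPU caches, plus the CPU threads that feed the GPUs, pull real results below this bound.

```latex
t = \max_i \frac{w_i S}{T_i}, \quad \sum_i w_i = 1
\qquad\Longrightarrow\qquad
w_i^{*} = \frac{T_i}{\sum_j T_j}, \quad
T_{\mathrm{hybrid}} = \frac{S}{t} \le \sum_i T_i
```

Plugging in the single-unit figures reported above purely as an illustration (iGPU ~150 MB/sec, dGPU ~400 MB/sec), an iGPU+dGPU configuration would at best reach ~550 MB/sec, with roughly 27% / 73% of the data assigned to the iGPU / dGPU.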
Influence Multi Unit Processing
[Figure: throughput (OY, MB/sec) of each device for each device configuration (OX)]
• CPU A6-5400K (blue): performance degrades with each new processing unit added to the hybrid configuration. Tracing the CPU with AMD CodeXL points to a higher cache miss rate as the main cause.
• iGPU HD7540 (yellow): only slightly impacted by multi-device processing, making it the most stable processing device; its performance is low, but so are its variations (~10%).
Compiler influence
• The assembly code that g++ generates from the C++ AES-NI code differs between compiler versions
• The differences come from how the compiler optimizes aligned memory accesses, i.e. movdqa vs movdqu
1. g++ 4.6 -O1 => 800MB/sec AES-128
2. g++ 4.7 -O1 => 1100MB/sec AES-128
3. g++ 4.8 -O1 => 1400MB/sec AES-128
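The movdqa/movdqu distinction can be illustrated with SSE2 intrinsics: `_mm_load_si128` requires a 16-byte-aligned address and typically compiles to movdqa, while `_mm_loadu_si128` accepts any address and typically compiles to movdqu. A minimal sketch, assuming x86-64; `fold_aligned`, `fold_unaligned` and `AlignedChunk` are hypothetical names. Which instruction is finally emitted (or whether the load is folded into another instruction) still depends on the compiler version and flags, as the measurements above show, so inspect the generated assembly (e.g. `g++ -O1 -S`).

```cpp
#include <cstddef>
#include <cstdint>
#include <emmintrin.h>  // SSE2 intrinsics

// XOR-fold a buffer 16 bytes at a time; `n` must be a multiple of 16.
// Aligned variant: `p` must be 16-byte aligned (typically movdqa).
__m128i fold_aligned(const uint8_t* p, std::size_t n) {
    __m128i acc = _mm_setzero_si128();
    for (std::size_t i = 0; i < n; i += 16)
        acc = _mm_xor_si128(acc, _mm_load_si128(reinterpret_cast<const __m128i*>(p + i)));
    return acc;
}

// Unaligned variant: any address is allowed (typically movdqu).
__m128i fold_unaligned(const uint8_t* p, std::size_t n) {
    __m128i acc = _mm_setzero_si128();
    for (std::size_t i = 0; i < n; i += 16)
        acc = _mm_xor_si128(acc, _mm_loadu_si128(reinterpret_cast<const __m128i*>(p + i)));
    return acc;
}

// A buffer that is guaranteed to satisfy the aligned variant's requirement.
struct alignas(16) AlignedChunk {
    uint8_t bytes[4096];
};
```

Allocating I/O buffers with 16-byte alignment lets the compiler use aligned loads safely; calling the aligned variant on a misaligned pointer would fault.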
GPU Optimizations
• Prefer cache over constant memory for the SBOX on AMD GCN and VLIW4 architectures
• Where possible, compare precomputed tables against on-the-fly computation: MixColumns is better computed than stored in constant memory
• PCIe bus limitations must be addressed (considerable difference between PCIe x4 and PCIe x16)
• Overlapping execution with I/O could improve iGPU performance by 10-20% (figure above)
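Overlapping execution with I/O is usually done by double buffering: while chunk k is being encrypted, chunk k+1 is already being read or transferred. A minimal CPU-side sketch with `std::async`, where `pipelined`, `load(k)` and `process(k, chunk)` are hypothetical names standing in for the file read or buffer transfer and the encryption step. A real OpenCL version would instead overlap non-blocking `clEnqueueWriteBuffer` calls with kernel execution using events.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <future>
#include <vector>

using Chunk = std::vector<uint8_t>;

// Process `count` chunks, prefetching chunk k+1 on another thread while chunk k is
// processed. `load` runs concurrently with `process`, so it must not touch shared state
// that `process` modifies.
void pipelined(std::size_t count,
               const std::function<Chunk(std::size_t)>& load,
               const std::function<void(std::size_t, Chunk&)>& process) {
    if (count == 0) return;
    std::future<Chunk> next = std::async(std::launch::async, load, std::size_t{0});
    for (std::size_t k = 0; k < count; ++k) {
        Chunk cur = next.get();                                       // wait for chunk k
        if (k + 1 < count)
            next = std::async(std::launch::async, load, k + 1);       // start I/O for k+1
        process(k, cur);                                              // overlaps that I/O
    }
}
```

The best case hides whichever of I/O and compute is shorter, so the gain is bounded by the smaller of the two; this is consistent with a moderate improvement rather than a multiple.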
Proposed AES encryption system
• Build a low-end consumer system for AES encryption of large data
• The proposed configuration is a sub-100 Euro x86 system that can encrypt large data chunks (>1MB) at 0.5 - 1GB/sec with AES-128/AES-256 ECB
• AMD Sempron 3850 (quad core @ 1.3GHz, AES-NI, iGPU with OpenCL), 2GB DDR3 @1600, mini-ITX motherboard and case, booting from a USB flash drive.
Results AES encryption system
• Ubuntu 14.04 LTS x64 with g++ 4.8 and the AMD Catalyst GPU driver 13.35. The test consisted of encrypting a 500MB file (generated from /dev/urandom) with AES-128.
• CPU processing with AES-NI shows a large throughput gap versus the iGPU; the extent to which the iGPU can speed up computation is very limited.
Conclusions
• AES-NI instructions provide a simple way to improve AES performance by a large margin (the compiler may influence performance greatly)
• Using the GPU for AES acceleration makes sense when the CPU does not support AES-NI; otherwise the performance gain relative to the cost is debatable.
• Fast AES data encryption is possible on consumer systems that have CPU AES-NI extensions or, failing that, GPU accelerators
• A sub-100 Euro, low-power (<30 W) x86 system can reach a throughput of over 1GB/sec AES-128 ECB
• Future work focuses on heterogeneous Uniform Memory Access (hUMA) architectures, where communication between the CPU and GPU is further simplified.

Modelagem de um CSTR com reação endotermica.pdfModelagem de um CSTR com reação endotermica.pdf
Modelagem de um CSTR com reação endotermica.pdf
 

AES encryption on modern consumer architectures

  • 1. AES encryption on modern consumer architectures
    Author(s): Ing. Grigore Lupescu (grigore.lupescu@gmail.com)
    Scientific Advisor: Sl. Dr. Ing. Laura Gheorghe
    Politehnica University of Bucharest, Automatic Control and Computers Faculty, Computer Science Department
    Presentation Session - July 2014
  • 2. AES Encryption
    02.07.2014 Masters Presentation Session – July 2014
    • Symmetric block cipher that encrypts and decrypts data (adopted by NIST in 2001 as the standard for encrypting electronic data)
    • AES algorithm:
      1. KeyExpansion: round keys are derived from the cipher key.
      2. InitialRound: AddRoundKey.
      3. Rounds (repeated 9, 11 or 13 times for 128-, 192- or 256-bit keys):
         I. SubBytes: substitution step; each byte is replaced according to the SBOX.
         II. ShiftRows: transposition step; the rows of the state are cyclically shifted.
         III. MixColumns: a mixing operation on the columns of the state, where addition and multiplication (+, *) are redefined in the Galois field GF(2^8).
         IV. AddRoundKey: bitwise XOR of the state with the round key.
      4. Final Round: SubBytes, ShiftRows, AddRoundKey (no MixColumns).
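The round structure above can be sketched as a small, portable AES-128 block encryption. This is an illustrative sketch, not the thesis code: it favors readability over speed, builds the SBOX at start-up from the GF(2^8) multiplicative inverse plus the affine map (instead of a hard-coded 256-entry table), and re-expands the key on every call. The names `Block`, `xtime`, `gmul`, `expand_key` and `aes128_encrypt_block` are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <cstring>

using Block = std::array<uint8_t, 16>;  // state bytes in column-major order

// Multiply by {02} in GF(2^8), reducing modulo x^8 + x^4 + x^3 + x + 1.
static uint8_t xtime(uint8_t a) {
    return static_cast<uint8_t>((a << 1) ^ ((a & 0x80) ? 0x1b : 0x00));
}

// General GF(2^8) multiplication (shift-and-add): the "redefined *".
static uint8_t gmul(uint8_t a, uint8_t b) {
    uint8_t p = 0;
    while (b) {
        if (b & 1) p ^= a;  // the "redefined +" is XOR
        a = xtime(a);
        b >>= 1;
    }
    return p;
}

// SBOX[x] = affine(inverse(x)); the inverse is found by brute force for clarity.
static std::array<uint8_t, 256> make_sbox() {
    std::array<uint8_t, 256> s{};
    for (int x = 0; x < 256; ++x) {
        uint8_t inv = 0;  // 0 maps to 0 by convention
        for (int y = 1; x != 0 && y < 256; ++y)
            if (gmul(static_cast<uint8_t>(x), static_cast<uint8_t>(y)) == 1) { inv = static_cast<uint8_t>(y); break; }
        uint8_t r = inv, res = inv;
        for (int i = 1; i <= 4; ++i) {  // res = b ^ rotl(b,1) ^ rotl(b,2) ^ rotl(b,3) ^ rotl(b,4)
            r = static_cast<uint8_t>((r << 1) | (r >> 7));
            res ^= r;
        }
        s[x] = static_cast<uint8_t>(res ^ 0x63);
    }
    return s;
}

// KeyExpansion for AES-128: 11 round keys of 16 bytes each.
static void expand_key(const uint8_t key[16], const std::array<uint8_t, 256>& sbox, uint8_t rk[176]) {
    std::memcpy(rk, key, 16);
    uint8_t rcon = 0x01;
    for (int i = 16; i < 176; i += 4) {
        uint8_t t[4] = {rk[i - 4], rk[i - 3], rk[i - 2], rk[i - 1]};
        if (i % 16 == 0) {  // first word of a round key: RotWord, SubWord, XOR Rcon
            const uint8_t t0 = t[0];
            t[0] = static_cast<uint8_t>(sbox[t[1]] ^ rcon);
            t[1] = sbox[t[2]];
            t[2] = sbox[t[3]];
            t[3] = sbox[t0];
            rcon = xtime(rcon);
        }
        for (int j = 0; j < 4; ++j) rk[i + j] = static_cast<uint8_t>(rk[i - 16 + j] ^ t[j]);
    }
}

static Block aes128_encrypt_block(const Block& in, const uint8_t key[16]) {
    static const std::array<uint8_t, 256> sbox = make_sbox();
    uint8_t rk[176];
    expand_key(key, sbox, rk);
    Block s = in;
    auto add_round_key = [&](int round) { for (int i = 0; i < 16; ++i) s[i] ^= rk[16 * round + i]; };

    add_round_key(0);                                  // InitialRound
    for (int round = 1; round <= 10; ++round) {
        for (auto& b : s) b = sbox[b];                 // SubBytes
        const Block t = s;                             // ShiftRows: row r rotates left by r
        for (int c = 0; c < 4; ++c)
            for (int r = 0; r < 4; ++r) s[r + 4 * c] = t[r + 4 * ((c + r) % 4)];
        if (round != 10) {                             // MixColumns (skipped in the final round)
            for (int c = 0; c < 4; ++c) {
                uint8_t* a = &s[4 * c];
                const uint8_t a0 = a[0], a1 = a[1], a2 = a[2], a3 = a[3];
                a[0] = static_cast<uint8_t>(xtime(a0) ^ xtime(a1) ^ a1 ^ a2 ^ a3);
                a[1] = static_cast<uint8_t>(a0 ^ xtime(a1) ^ xtime(a2) ^ a2 ^ a3);
                a[2] = static_cast<uint8_t>(a0 ^ a1 ^ xtime(a2) ^ xtime(a3) ^ a3);
                a[3] = static_cast<uint8_t>(xtime(a0) ^ a0 ^ a1 ^ a2 ^ xtime(a3));
            }
        }
        add_round_key(round);                          // AddRoundKey
    }
    return s;
}
```

With the FIPS-197 Appendix C.1 inputs (key 000102030405060708090a0b0c0d0e0f, plaintext 00112233445566778899aabbccddeeff) this sketch should produce 69c4e0d86a7b0430d8cdb78070b4c55a.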
  • 3. Cipher Modes
    • A mode of operation is an algorithm that repeatedly applies a block cipher (e.g. AES) to the input plaintext
      – Most modes of operation require an initialization vector (IV)
    • Most used cipher modes: Cipher Block Chaining (CBC), Counter (CTR)
    • Other cipher modes: Electronic Codebook (ECB), Output Feedback (OFB)
    • Why use ECB?
      – Simple, fast, fully parallelizable, maximum throughput
      – Provides a good estimate of how CTR would perform, since CTR also applies the block cipher once per block, independently of the other blocks
    [Figure: ECB mode encryption; each plaintext block goes through block cipher encryption with the same key to produce its ciphertext block]
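To make the ECB/CTR comparison concrete, here is a minimal sketch of both modes around a block function. It assumes a hypothetical, deliberately insecure placeholder `toy_encrypt_block` in place of AES so the example stays short, and it assumes a counter block laid out as an 8-byte nonce followed by an 8-byte big-endian block index (one common choice, not necessarily the thesis's). Both loops have the same shape, one independent block-cipher call per 16-byte block, which is why ECB throughput approximates CTR throughput; the `#pragma omp parallel for` lines take effect only when compiled with OpenMP enabled.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

using Block = std::array<uint8_t, 16>;

// HYPOTHETICAL stand-in for AES: same interface (16 bytes in, 16 bytes out), NOT secure.
static Block toy_encrypt_block(const Block& in, const Block& key) {
    Block out{};
    for (int i = 0; i < 16; ++i) out[i] = static_cast<uint8_t>(in[(i + 1) % 16] ^ key[i]);
    return out;
}

// ECB: every full block is encrypted independently (no padding in this sketch).
static std::vector<uint8_t> ecb_encrypt(const std::vector<uint8_t>& in, const Block& key) {
    const long long n = static_cast<long long>(in.size() / 16);
    std::vector<uint8_t> out(static_cast<std::size_t>(n) * 16);
    #pragma omp parallel for
    for (long long b = 0; b < n; ++b) {
        const std::size_t base = static_cast<std::size_t>(b) * 16;
        Block p;
        std::memcpy(p.data(), in.data() + base, 16);
        const Block c = toy_encrypt_block(p, key);
        std::memcpy(out.data() + base, c.data(), 16);
    }
    return out;
}

// CTR: encrypt (nonce || block index) to get a keystream block, then XOR it with the data.
// Encryption and decryption are the same operation; the last block may be partial.
static std::vector<uint8_t> ctr_crypt(const std::vector<uint8_t>& in, const Block& key, uint64_t nonce) {
    std::vector<uint8_t> out(in.size());
    const long long n = static_cast<long long>((in.size() + 15) / 16);
    #pragma omp parallel for
    for (long long b = 0; b < n; ++b) {
        Block ctr{};
        for (int i = 0; i < 8; ++i) {
            ctr[i] = static_cast<uint8_t>(nonce >> (56 - 8 * i));
            ctr[8 + i] = static_cast<uint8_t>(static_cast<uint64_t>(b) >> (56 - 8 * i));
        }
        const Block ks = toy_encrypt_block(ctr, key);
        const std::size_t base = static_cast<std::size_t>(b) * 16;
        for (std::size_t i = base; i < in.size() && i < base + 16; ++i)
            out[i] = static_cast<uint8_t>(in[i] ^ ks[i - base]);
    }
    return out;
}
```

The checks below also show why ECB is used only as a throughput benchmark: identical plaintext blocks produce identical ciphertext blocks in ECB, but not in CTR.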
  • 4. Motivation
    • Explore how modern low-end commodity hardware handles AES encryption
    • The literature strongly favors the GPU over the CPU, but:
      – transfers to/from the GPU are ignored; only the actual kernel execution is measured
      – the CPU code is not optimized or parallelized
      – this creates a false impression that the GPU clearly outperforms the CPU
    • How different scenarios impact performance on CPU and GPU:
      – Performance on CPU, iGPU, dGPU and combinations of these
      – Influence of the compiler/SDK/optimization techniques
      – Study of the memory hierarchy and its impact on performance
    01.07.2014 Bachelor Presentation Session - July 2010
  • 5. Software architecture (1)
    • Written in C++, portable and modular, built with CMake (Linux/Windows); uses OpenMP, OpenCL and AES-NI.
    • The source code is divided into three categories: I/O, MAIN and AES.
    • The AES code is further divided by target processing unit:
      – CPU => AES_HWNI
      – GPU => AES_GPU
      – CPU+GPU => AES_HYBRID
  • 6. Software architecture (2)
    When processing, (Ct + Gt) threads are spawned:
    • Ct threads feed work to the CPU cores (Ct = number of CPU cores)
    • Gt threads feed work to the GPU devices (Gt = number of GPU devices)
    • The figure on the left shows the case Ct = 2 (dual-core CPU), Gt = 2 (iGPU, dGPU)
    • Each CPU core or GPU device receives a percentage of the work according to its capabilities (the work split is set statically by the user)
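A static, user-given work split like the one described above might look like the following sketch. It is an assumption about how such a split could be computed, not the thesis code: each unit gets a contiguous byte range proportional to its percentage, boundaries are rounded down to whole 16-byte AES blocks, and the last unit absorbs whatever remains. The names `Range` and `split_work` and the example percentages are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

struct Range { std::size_t begin, end; };  // byte offsets [begin, end) for one processing unit

// Split `total` bytes among units (Ct CPU threads followed by Gt GPU threads) by percentage.
// Boundaries fall on 16-byte AES blocks; the last unit takes the remainder, so the
// ranges always cover the whole input even if the percentages do not sum to exactly 100.
static std::vector<Range> split_work(std::size_t total, const std::vector<double>& percent) {
    std::vector<Range> out;
    std::size_t begin = 0;
    for (std::size_t u = 0; u < percent.size(); ++u) {
        std::size_t end = total;
        if (u + 1 < percent.size()) {
            std::size_t share = static_cast<std::size_t>(static_cast<double>(total) * percent[u] / 100.0);
            share -= share % 16;                  // keep whole AES blocks
            end = std::min(total, begin + share);
        }
        out.push_back({begin, end});
        begin = end;
    }
    return out;
}
```

For the Ct = 2, Gt = 2 case, `split_work(size, {30.0, 30.0, 10.0, 30.0})` would give two CPU ranges followed by an iGPU range and a dGPU range; the percentages here are illustrative, not measured values.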
  • 7. AES Performance Testing
    • Initial test system: AMD A6-5400K (dual-core x86 + iGPU HD7540, AES-NI support), dGPU R7 250, 6 GB DDR3@1600, Ubuntu 12.04 LTS x64
    • GPGPU implementation (OpenCL):
      – SubBytes: precomputed SBOX stored in constant memory
      – MixColumns: precomputed Galois field matrices to avoid the (+,*) operations
      – ShiftRows and AddRoundKey: simple operations
    • CPU implementation (OpenMP):
      – AES encryption using AES-NI instructions, with OpenMP for parallelism on multicore CPUs
      – Comparison with the OpenSSL library
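The AES-NI path can be sketched with the standard Intel intrinsics: `_mm_aeskeygenassist_si128` for key expansion, `_mm_aesenc_si128` for the nine middle rounds and `_mm_aesenclast_si128` for the final round. This is a single-threaded sketch for x86 with GCC or Clang, not the thesis code; the OpenMP parallel loop over chunks is omitted, and the helper names (`key_step`, `aes128_expand_key`, `aes128_ecb_encrypt`) are hypothetical. The `target` attribute lets it compile without `-maes`, but it must only be called on a CPU that reports AES support.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <immintrin.h>  // SSE2 + AES-NI intrinsics

#define AESNI __attribute__((target("aes,sse2")))

// One AES-128 key-schedule step; `assist` comes from _mm_aeskeygenassist_si128.
static AESNI __m128i key_step(__m128i key, __m128i assist) {
    assist = _mm_shuffle_epi32(assist, 0xff);          // broadcast RotWord(SubWord(w3)) ^ rcon
    key = _mm_xor_si128(key, _mm_slli_si128(key, 4));  // three shift/XOR steps give the
    key = _mm_xor_si128(key, _mm_slli_si128(key, 4));  // prefix XOR w0, w0^w1, w0^w1^w2, ...
    key = _mm_xor_si128(key, _mm_slli_si128(key, 4));
    return _mm_xor_si128(key, assist);
}

// KeyExpansion: 11 round keys. The rcon argument must be a compile-time constant.
static AESNI void aes128_expand_key(const uint8_t key[16], __m128i rk[11]) {
    rk[0] = _mm_loadu_si128(reinterpret_cast<const __m128i*>(key));
    rk[1] = key_step(rk[0], _mm_aeskeygenassist_si128(rk[0], 0x01));
    rk[2] = key_step(rk[1], _mm_aeskeygenassist_si128(rk[1], 0x02));
    rk[3] = key_step(rk[2], _mm_aeskeygenassist_si128(rk[2], 0x04));
    rk[4] = key_step(rk[3], _mm_aeskeygenassist_si128(rk[3], 0x08));
    rk[5] = key_step(rk[4], _mm_aeskeygenassist_si128(rk[4], 0x10));
    rk[6] = key_step(rk[5], _mm_aeskeygenassist_si128(rk[5], 0x20));
    rk[7] = key_step(rk[6], _mm_aeskeygenassist_si128(rk[6], 0x40));
    rk[8] = key_step(rk[7], _mm_aeskeygenassist_si128(rk[7], 0x80));
    rk[9] = key_step(rk[8], _mm_aeskeygenassist_si128(rk[8], 0x1b));
    rk[10] = key_step(rk[9], _mm_aeskeygenassist_si128(rk[9], 0x36));
}

static AESNI __m128i aes128_encrypt(__m128i b, const __m128i rk[11]) {
    b = _mm_xor_si128(b, rk[0]);                                   // InitialRound
    for (int r = 1; r < 10; ++r) b = _mm_aesenc_si128(b, rk[r]);   // SubBytes+ShiftRows+MixColumns+AddRoundKey
    return _mm_aesenclast_si128(b, rk[10]);                        // final round, no MixColumns
}

// ECB over n_blocks 16-byte blocks; unaligned loads/stores (movdqu) accept any pointer.
static AESNI void aes128_ecb_encrypt(const uint8_t* in, uint8_t* out, std::size_t n_blocks, const uint8_t key[16]) {
    __m128i rk[11];
    aes128_expand_key(key, rk);
    for (std::size_t i = 0; i < n_blocks; ++i) {
        const __m128i b = _mm_loadu_si128(reinterpret_cast<const __m128i*>(in + 16 * i));
        _mm_storeu_si128(reinterpret_cast<__m128i*>(out + 16 * i), aes128_encrypt(b, rk));
    }
}
```

Callers would normally guard this with `__builtin_cpu_supports("aes")` and fall back to a portable or GPU implementation otherwise.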
  • 8. Results Single Unit Processing
    [Plot: OX = chunk size in MB; OY = AES-128 ECB throughput in MB/sec]
    • The A6-5400K CPU with AES-NI has the best performance among the tested compute units.
    • The iGPU with 3 CUs yields a modest ~150 MB/sec, while the dGPU with 6 CUs yields a better ~400 MB/sec, which correlates with its higher processing power and memory bandwidth compared to the iGPU.
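Throughput figures like those above reduce to bytes processed divided by elapsed time. The sketch below assumes 1 MB = 2^20 bytes (the slides do not say whether MB means 10^6 or 2^20 bytes) and wraps the timing around any callable; `mb_per_sec` and `measure_throughput` are hypothetical helpers, not part of the thesis code.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Throughput in MB/sec, assuming 1 MB = 2^20 bytes.
static double mb_per_sec(std::size_t bytes, double seconds) {
    return static_cast<double>(bytes) / (1024.0 * 1024.0) / seconds;
}

// Time one call of `encrypt` over `bytes` of input using a monotonic clock.
// For GPU runs, the host/device copies should be inside `encrypt`, so that
// transfer time is counted rather than ignored.
template <class F>
static double measure_throughput(std::size_t bytes, F&& encrypt) {
    const auto t0 = std::chrono::steady_clock::now();
    encrypt();
    const std::chrono::duration<double> dt = std::chrono::steady_clock::now() - t0;
    return mb_per_sec(bytes, dt.count());
}
```

For example, encrypting 500 MB in 0.5 seconds corresponds to 1000 MB/sec.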
  • 9. Results Multi Unit Processing
    • Various AES hybrid processing configurations (CPU+iGPU, CPU+dGPU, …)
    • Work split determined by experiments and given statically at runtime
    • Performance results are below expectations
  • 10. Influence Multi Unit Processing
    [Plot: OX = device configuration; OY = throughput in MB/sec]
    • CPU A6 5400K (blue): performance degrades with each new processing unit added to the hybrid configuration. A trace of the CPU with AMD CodeXL points to a higher cache miss rate as the main cause.
    • iGPU HD7540 (yellow): only slightly impacted by multi-device processing; it is the most stable processing device, with low performance but also low variation (~10%).
  • 11. Compiler influence
    • CPU assembly code generated from the C++ code for AES-NI processing
    • The differences come from how the compiler optimizes aligned memory accesses, i.e. movdqa (aligned) vs movdqu (unaligned) loads:
      1. g++ 4.6 -O1 => 800 MB/sec AES-128
      2. g++ 4.7 -O1 => 1100 MB/sec AES-128
      3. g++ 4.8 -O1 => 1400 MB/sec AES-128
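Since aligned versus unaligned accesses made the difference here, one way to give the compiler (and any hand-written `_mm_load_si128`/movdqa loads) a guarantee is to allocate the data buffers on a 16-byte or wider boundary. The sketch below assumes C++17 `std::aligned_alloc`; `alloc_aligned`, `AlignedBuffer` and `is_aligned` are hypothetical helpers, and whether a given g++ version actually emits movdqa still depends on what it can prove about the pointers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <memory>
#include <new>

struct FreeDeleter { void operator()(void* p) const { std::free(p); } };
using AlignedBuffer = std::unique_ptr<uint8_t[], FreeDeleter>;

// Allocate at least `bytes` bytes aligned to `align` (a power of two). std::aligned_alloc
// requires the size to be a multiple of the alignment, so the size is rounded up.
static AlignedBuffer alloc_aligned(std::size_t bytes, std::size_t align = 64) {
    std::size_t rounded = (bytes + align - 1) / align * align;
    if (rounded == 0) rounded = align;
    void* p = std::aligned_alloc(align, rounded);
    if (!p) throw std::bad_alloc();
    return AlignedBuffer(static_cast<uint8_t*>(p));
}

static bool is_aligned(const void* p, std::size_t align) {
    return reinterpret_cast<std::uintptr_t>(p) % align == 0;
}
```

With GCC or Clang, `__builtin_assume_aligned(ptr, 16)` can additionally tell the optimizer about the alignment at the point of use. A 64-byte default also keeps each buffer start on a cache-line boundary.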
  • 12. GPU Optimizations
    • Prefer cache over constant memory for the SBOX on AMD GCN and VLIW4 architectures
    • Where possible, compare precomputed tables against on-the-fly computation: MixColumns is better computed than stored in constant memory
    • PCIe bus limitations must be addressed (considerable difference between PCIe x4 and PCIe x16)
    • Overlapping execution with I/O could improve iGPU performance by 10-20% (figure above)
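The table-versus-computation trade-off for MixColumns can be illustrated on the host with two versions of the same column transform: one uses 256-entry tables for {02}·a and {03}·a (the kind of data a GPU kernel might keep in constant memory), the other computes {02}·a with a shift and a conditional XOR. This is a sketch for clarity, not the OpenCL kernel from the thesis; `MulTables`, `mix_column_table` and `mix_column_computed` are hypothetical names.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// On-the-fly {02}*a in GF(2^8): shift left, reduce by 0x1b if the top bit was set.
static uint8_t xtime(uint8_t a) { return static_cast<uint8_t>((a << 1) ^ ((a & 0x80) ? 0x1b : 0)); }

// Precomputed alternative: lookup tables for {02}*a and {03}*a.
struct MulTables { std::array<uint8_t, 256> mul2, mul3; };
static MulTables make_tables() {
    MulTables t{};
    for (int a = 0; a < 256; ++a) {
        t.mul2[a] = xtime(static_cast<uint8_t>(a));
        t.mul3[a] = static_cast<uint8_t>(t.mul2[a] ^ a);  // {03}*a = {02}*a + a
    }
    return t;
}

// MixColumns on one 4-byte column: arithmetic only, no memory lookups.
static void mix_column_computed(uint8_t c[4]) {
    const uint8_t a0 = c[0], a1 = c[1], a2 = c[2], a3 = c[3];
    c[0] = static_cast<uint8_t>(xtime(a0) ^ xtime(a1) ^ a1 ^ a2 ^ a3);
    c[1] = static_cast<uint8_t>(a0 ^ xtime(a1) ^ xtime(a2) ^ a2 ^ a3);
    c[2] = static_cast<uint8_t>(a0 ^ a1 ^ xtime(a2) ^ xtime(a3) ^ a3);
    c[3] = static_cast<uint8_t>(xtime(a0) ^ a0 ^ a1 ^ a2 ^ xtime(a3));
}

// The same column using table lookups (eight loads per column).
static void mix_column_table(uint8_t c[4], const MulTables& t) {
    const uint8_t a0 = c[0], a1 = c[1], a2 = c[2], a3 = c[3];
    c[0] = static_cast<uint8_t>(t.mul2[a0] ^ t.mul3[a1] ^ a2 ^ a3);
    c[1] = static_cast<uint8_t>(a0 ^ t.mul2[a1] ^ t.mul3[a2] ^ a3);
    c[2] = static_cast<uint8_t>(a0 ^ a1 ^ t.mul2[a2] ^ t.mul3[a3]);
    c[3] = static_cast<uint8_t>(t.mul3[a0] ^ a1 ^ a2 ^ t.mul2[a3]);
}
```

Both versions map the FIPS-197 example column d4 bf 5d 30 to 04 66 81 e5. On a GPU, data-dependent table indices can serialize constant-memory accesses, which is one plausible reason the computed form wins there.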
  • 13. Proposed AES encryption system
    • Build a low-end consumer system for AES encryption of large data
    • The proposed configuration is a sub-100 Euro x86 system that can encrypt large data chunks (>1 MB) at 0.5 - 1 GB/sec with AES-128/AES-256 ECB
    • AMD Sempron 3850 (quad core @ 1.3 GHz, AES-NI, iGPU with OpenCL), 2 GB DDR3 @ 1600, mini-ITX motherboard and case, USB pendrive boot
  • 14. Results AES encryption system
    • Ubuntu 14.04 x64 LTS with g++ 4.8 and the AMD Catalyst GPU driver 13.35. Each test encrypted a 500 MB file (generated from /dev/urandom) with AES-128.
    • CPU processing with AES-NI shows the disproportion in throughput versus the iGPU: the extent to which the iGPU can speed up computation is very limited.
  • 15. Conclusions
    • AES-NI instructions provide a simple way to improve AES performance by a large margin (the compiler may influence performance greatly)
    • Using the GPU for AES acceleration makes sense when the CPU does not support AES-NI; otherwise the performance gain relative to cost is debatable
    • Fast AES data encryption is possible on consumer systems that have CPU AES-NI extensions, or at least GPU accelerators
    • A sub-100 Euro, low-power x86 system (<30 W) achieves a processing throughput of over 1 GB/sec AES-128 ECB
    • Future focus on heterogeneous unified memory architectures (hUMA), where communication between the CPU and GPU will be further simplified
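The second conclusion can be read as a simple dispatch rule: use AES-NI when the CPU has it, and consider the GPU only otherwise. A minimal sketch follows, assuming the GCC/Clang builtin `__builtin_cpu_supports("aes")` on x86 (other targets simply report no AES-NI here). GPU availability is passed in by the caller, since detecting OpenCL devices is outside this sketch. `Backend`, `choose_backend` and `cpu_has_aesni` are hypothetical names.

```cpp
#include <cassert>

enum class Backend { CpuAesNi, Gpu, CpuPortable };

// Decision rule from the conclusions: AES-NI first, GPU only as a fallback.
static Backend choose_backend(bool has_aesni, bool gpu_available) {
    if (has_aesni) return Backend::CpuAesNi;
    if (gpu_available) return Backend::Gpu;
    return Backend::CpuPortable;
}

// Runtime CPU feature check (GCC/Clang builtin, x86 only).
static bool cpu_has_aesni() {
#if (defined(__GNUC__) || defined(__clang__)) && (defined(__x86_64__) || defined(__i386__))
    return __builtin_cpu_supports("aes") != 0;
#else
    return false;
#endif
}
```

A hybrid configuration could still add the GPU on top of AES-NI, but the measurements above suggest the extra throughput is small relative to the added cost and complexity.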