AES encryption using GPU architectures
Author(s): Grigore Lupescu
Scientific Advisor: Emil Slusanschi
Politehnica University of Bucharest, Automatic Control and Computers Faculty, Computer Science Department
Scientific Student Projects Session - May 2014
AES Encryption (1)
17.05.2014 Scientific Student Projects Session - May 2014 2
• Mode of operation: an algorithm that repeatedly applies a block cipher (e.g. AES) to the input plaintext
• Most modes of operation require an initialization vector
• Most used cipher modes: Cipher-block chaining (CBC), Counter (CTR)
• Other cipher modes: Electronic codebook (ECB), Output feedback (OFB)
• Why use ECB?
• Simple, fast, highly parallelizable, maximum throughput
• Provides a good estimate of how CTR would perform
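The two ECB points above can be illustrated with a minimal C sketch. It assumes a hypothetical placeholder, `toy_block_encrypt`, in place of AES (it is not AES and not secure), and the function names and the nonce/counter layout are illustrative choices, not taken from the slides. The sketch shows that ECB handles every block independently, and that CTR needs the same single cipher call per block plus one XOR, which is why ECB throughput is a plausible estimate for CTR.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLOCK 16

/* Hypothetical stand-in for a block cipher: NOT AES and NOT secure.
   It is a per-byte bijection so that the mode logic can be exercised. */
static void toy_block_encrypt(const uint8_t key[BLOCK], const uint8_t in[BLOCK],
                              uint8_t out[BLOCK]) {
    for (int i = 0; i < BLOCK; i++) {
        uint8_t v = (uint8_t)(in[i] ^ key[i]);
        out[i] = (uint8_t)((v << 3) | (v >> 5)); /* rotate left by 3 bits */
    }
}

/* ECB: block b depends only on plaintext block b, so every iteration
   could run as an independent GPU work-item. */
void ecb_encrypt(const uint8_t key[BLOCK], const uint8_t *in, uint8_t *out,
                 size_t nblocks) {
    assert(in != NULL && out != NULL);
    for (size_t b = 0; b < nblocks; b++)
        toy_block_encrypt(key, in + BLOCK * b, out + BLOCK * b);
}

/* CTR: encrypt (nonce || counter) and XOR the result with the data.
   Still one independent cipher call per block, plus one XOR. */
void ctr_crypt(const uint8_t key[BLOCK], const uint8_t nonce[8],
               const uint8_t *in, uint8_t *out, size_t nblocks) {
    assert(in != NULL && out != NULL);
    for (size_t b = 0; b < nblocks; b++) {
        uint8_t counter_block[BLOCK], keystream[BLOCK];
        memcpy(counter_block, nonce, 8);
        for (int i = 0; i < 8; i++) /* big-endian 64-bit block counter */
            counter_block[8 + i] = (uint8_t)((uint64_t)b >> (8 * (7 - i)));
        toy_block_encrypt(key, counter_block, keystream);
        for (int i = 0; i < BLOCK; i++)
            out[BLOCK * b + i] = (uint8_t)(in[BLOCK * b + i] ^ keystream[i]);
    }
}
```

Applying `ctr_crypt` a second time with the same key and nonce restores the input, since CTR encryption and decryption are the same operation.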
AES Encryption (2)
• KeyExpansion: round keys are derived from the cipher key.
• InitialRound: AddRoundKey.
• Rounds:
• SubBytes: substitution step where each byte is replaced with another according to the S-box table.
• ShiftRows: transposition step where the last three rows of the state are cyclically shifted.
• MixColumns: a mixing operation on the columns of the state. Addition and multiplication are performed in the Galois field GF(2^8).
• AddRoundKey: bitwise XOR of each byte of the state with the round key.
• Final Round: SubBytes, ShiftRows, AddRoundKey (no MixColumns).
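The steps listed above can be put together as a plain C reference sketch of single-block AES-128 encryption. This is not the OpenCL kernel from the slides: it is an unoptimized CPU version written for readability, the function names are illustrative, and the S-box is computed by brute force rather than stored as a literal table. The state is assumed to be column-major (byte index = row + 4 * column), as in the AES specification.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

static uint8_t sbox[256];

/* Multiplication in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1 (shift-and-add). */
static uint8_t gmul(uint8_t a, uint8_t b) {
    uint8_t p = 0;
    while (b) {
        if (b & 1) p ^= a;
        a = (uint8_t)((a << 1) ^ ((a & 0x80) ? 0x1B : 0x00));
        b >>= 1;
    }
    return p;
}

static uint8_t rotl8(uint8_t x, int s) { return (uint8_t)((x << s) | (x >> (8 - s))); }

/* S-box entry = affine transform of the multiplicative inverse (0 maps to 0). */
static void init_sbox(void) {
    for (int x = 0; x < 256; x++) {
        uint8_t inv = 0;
        for (int y = 1; y < 256 && x != 0; y++)
            if (gmul((uint8_t)x, (uint8_t)y) == 1) { inv = (uint8_t)y; break; }
        sbox[x] = (uint8_t)(inv ^ rotl8(inv, 1) ^ rotl8(inv, 2) ^ rotl8(inv, 3) ^
                            rotl8(inv, 4) ^ 0x63);
    }
    assert(sbox[0x00] == 0x63 && sbox[0x01] == 0x7C); /* sanity check */
}

/* KeyExpansion for AES-128: 11 round keys of 16 bytes each. */
static void key_expansion(const uint8_t key[16], uint8_t rk[176]) {
    uint8_t rcon = 1;
    memcpy(rk, key, 16);
    for (int i = 16; i < 176; i += 4) {
        uint8_t t[4];
        memcpy(t, rk + i - 4, 4);
        if (i % 16 == 0) { /* RotWord, SubWord, then XOR with Rcon */
            uint8_t t0 = t[0];
            t[0] = (uint8_t)(sbox[t[1]] ^ rcon);
            t[1] = sbox[t[2]];
            t[2] = sbox[t[3]];
            t[3] = sbox[t0];
            rcon = gmul(rcon, 2);
        }
        for (int j = 0; j < 4; j++) rk[i + j] = (uint8_t)(rk[i - 16 + j] ^ t[j]);
    }
}

static void add_round_key(uint8_t s[16], const uint8_t *k) {
    for (int i = 0; i < 16; i++) s[i] ^= k[i];
}

static void sub_bytes(uint8_t s[16]) {
    for (int i = 0; i < 16; i++) s[i] = sbox[s[i]];
}

/* Row r is rotated left by r positions. */
static void shift_rows(uint8_t s[16]) {
    uint8_t t[16];
    for (int c = 0; c < 4; c++)
        for (int r = 0; r < 4; r++)
            t[r + 4 * c] = s[r + 4 * ((c + r) % 4)];
    memcpy(s, t, 16);
}

/* Each column is multiplied by the fixed matrix [2 3 1 1; 1 2 3 1; 1 1 2 3; 3 1 1 2]. */
static void mix_columns(uint8_t s[16]) {
    for (int c = 0; c < 4; c++) {
        uint8_t *a = s + 4 * c;
        uint8_t a0 = a[0], a1 = a[1], a2 = a[2], a3 = a[3];
        a[0] = (uint8_t)(gmul(a0, 2) ^ gmul(a1, 3) ^ a2 ^ a3);
        a[1] = (uint8_t)(a0 ^ gmul(a1, 2) ^ gmul(a2, 3) ^ a3);
        a[2] = (uint8_t)(a0 ^ a1 ^ gmul(a2, 2) ^ gmul(a3, 3));
        a[3] = (uint8_t)(gmul(a0, 3) ^ a1 ^ a2 ^ gmul(a3, 2));
    }
}

void aes128_encrypt_block(const uint8_t key[16], const uint8_t in[16], uint8_t out[16]) {
    uint8_t rk[176], s[16];
    init_sbox();
    key_expansion(key, rk);
    memcpy(s, in, 16);
    add_round_key(s, rk);                       /* InitialRound */
    for (int round = 1; round <= 9; round++) {  /* Rounds */
        sub_bytes(s);
        shift_rows(s);
        mix_columns(s);
        add_round_key(s, rk + 16 * round);
    }
    sub_bytes(s);                               /* Final Round: no MixColumns */
    shift_rows(s);
    add_round_key(s, rk + 160);
    memcpy(out, s, 16);
}
```

Rebuilding the S-box and round keys on every call keeps the sketch short; a real implementation would compute them once and reuse them across blocks.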
Target System (1)
• SoC CPU: AMD A4 4000K (2 cores @ 3.0 GHz, Richland architecture, AES-NI), cores denoted by BLUE
• SoC integrated GPU HD7480 (iGPU): 2 SIMD units of 64 cores each (VLIW4 architecture), SIMD units denoted by RED
• Discrete GPU AMD R7 250 (dGPU): 6 SIMD units of 64 cores each (GCN architecture), PCIe 2.0 x16 bus, SIMD units denoted by RED
• Data to be encrypted denoted by GREEN
• Software: C/C++/OpenCL, Ubuntu Linux 14.04 x64
Target System (2)
Algorithm Opt_1
• Array “indata” resides in global device memory (__global)
• Variable “state”, which holds the intermediate transformations, is kept in GPU local memory used as a cache (__local)
• Simple operation “ShiftRows” is implemented with vector component addressing (state.s05AF49E38..)
• Simple operation “AddRoundKey” is a single XOR (state ^ key)
• Complex operation “SubBytes” uses a precomputed S-box table, stored in constant memory
• Complex operation “MixColumns” uses precomputed Galois field multiplication tables, stored in constant memory
• Host sample code below (simple blocking enqueues):
  while (!done()) {
      writeData(32MB, &offset);
      execKernel(32MB, &offset);
      readData(32MB, &offset);
  }
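As a rough illustration of the Opt_1 building blocks, here is a plain C sketch rather than the OpenCL kernel itself. It assumes a column-major 16-byte state. With that layout, ShiftRows is the fixed byte permutation 0 5 A F 4 9 E 3 8 D 2 7 C 1 6 B, derived here from the AES definition; it agrees with the part of the swizzle visible on the slide. The names `MUL2`, `MUL3` and `init_tables` are hypothetical, and the tables stand in for whatever the kernel keeps in constant memory.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* ShiftRows as one fixed permutation: entry i names the source byte
   of output byte i (the C analogue of an OpenCL vector swizzle). */
static const uint8_t SHIFT_ROWS_SRC[16] = {0, 5, 10, 15, 4, 9, 14, 3,
                                           8, 13, 2, 7, 12, 1, 6, 11};

/* Precomputed GF(2^8) products by 2 and 3 (hypothetical table names). */
static uint8_t MUL2[256], MUL3[256];

void init_tables(void) {
    for (int x = 0; x < 256; x++) {
        MUL2[x] = (uint8_t)((x << 1) ^ ((x & 0x80) ? 0x1B : 0x00));
        MUL3[x] = (uint8_t)(MUL2[x] ^ x);
    }
    assert(MUL2[0x80] == 0x1B); /* reduction by the AES polynomial */
}

void shift_rows(uint8_t s[16]) {
    uint8_t t[16];
    for (int i = 0; i < 16; i++) t[i] = s[SHIFT_ROWS_SRC[i]];
    memcpy(s, t, 16);
}

/* AddRoundKey: a plain XOR of state and round key. */
void add_round_key(uint8_t s[16], const uint8_t k[16]) {
    for (int i = 0; i < 16; i++) s[i] ^= k[i];
}

/* MixColumns driven entirely by table lookups. */
void mix_columns_tables(uint8_t s[16]) {
    for (int c = 0; c < 4; c++) {
        uint8_t *a = s + 4 * c;
        uint8_t a0 = a[0], a1 = a[1], a2 = a[2], a3 = a[3];
        a[0] = (uint8_t)(MUL2[a0] ^ MUL3[a1] ^ a2 ^ a3);
        a[1] = (uint8_t)(a0 ^ MUL2[a1] ^ MUL3[a2] ^ a3);
        a[2] = (uint8_t)(a0 ^ a1 ^ MUL2[a2] ^ MUL3[a3]);
        a[3] = (uint8_t)(MUL3[a0] ^ a1 ^ a2 ^ MUL2[a3]);
    }
}
```

Every MixColumns output byte costs two table reads here, so the memory holding the tables sits directly on the critical path of each round.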
Results Opt_1
• AMD CodeXL profiling, initial results – iGPU A4 4000, ~100MB/sec AES ECB128
Algorithm Opt_2
• Array “indata” resides in global device memory (__global)
• Variable “state”, which holds the intermediate transformations, is kept in GPU local memory used as a cache (__local)
• Simple operation “ShiftRows”: unchanged
• Simple operation “AddRoundKey”: unchanged
• Complex operation “SubBytes” uses a precomputed S-box table, now stored in local memory used as a cache (__local)
• Complex operation “MixColumns” computes its values on the fly instead of using precomputed tables (an optimized version of MixColumns is used)
• Host sample code: unchanged
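The slides do not say which optimized MixColumns was used. The C sketch below shows one widely used on-the-fly formulation, given here as an assumption rather than the authors' exact code. It needs only XORs and one conditional "multiply by 2" (`xtime`) per output byte, with no table reads. The function names are illustrative.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Multiply by 2 in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1. */
static uint8_t xtime(uint8_t a) {
    return (uint8_t)((a << 1) ^ ((a & 0x80) ? 0x1B : 0x00));
}

/* One column of MixColumns without lookup tables.
   With t = a0^a1^a2^a3, the first output byte is
   a0 ^ t ^ 2*(a0^a1) = 2*a0 ^ 3*a1 ^ a2 ^ a3, and so on cyclically. */
void mix_column_on_the_fly(uint8_t a[4]) {
    uint8_t t = (uint8_t)(a[0] ^ a[1] ^ a[2] ^ a[3]);
    uint8_t first = a[0]; /* a[0] is overwritten before the last step */
    a[0] ^= (uint8_t)(t ^ xtime((uint8_t)(a[0] ^ a[1])));
    a[1] ^= (uint8_t)(t ^ xtime((uint8_t)(a[1] ^ a[2])));
    a[2] ^= (uint8_t)(t ^ xtime((uint8_t)(a[2] ^ a[3])));
    a[3] ^= (uint8_t)(t ^ xtime((uint8_t)(a[3] ^ first)));
}

/* Apply the column transform to a column-major 16-byte state. */
void mix_columns_on_the_fly(uint8_t s[16]) {
    assert(s != NULL);
    for (int c = 0; c < 4; c++) mix_column_on_the_fly(s + 4 * c);
}
```

Trading table lookups for a few extra ALU operations removes memory traffic from the round, which tends to favour hardware where arithmetic is cheap relative to memory access.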
Results Opt_2
• Profiling, Opt_1 – iGPU A4 4000, ~100MB/sec AES ECB128
• Profiling, Opt_2 – iGPU A4 4000, ~210MB/sec AES ECB128
Algorithm Opt_3
• Array “indata” resides in global device memory (__global)
• Variable “state”, which holds the intermediate transformations, is kept in GPU local memory used as a cache (__local)
• Simple operation “ShiftRows”: unchanged
• Simple operation “AddRoundKey”: unchanged
• Complex operation “SubBytes”: unchanged
• Complex operation “MixColumns”: unchanged
• Host sample code: overlap execution with I/O by creating multiple queues (R, W, E: read, write, execute)
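To see why separate queues can help, the following C sketch models the schedule instead of calling OpenCL. It is an idealized model under stated assumptions: every chunk has the same write, execute and read cost, each queue processes chunks in order, there are enough device buffers, and the three stages overlap perfectly. The `stage_cost` type and any numbers fed into it are hypothetical, not measurements from the slides.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-chunk costs (any consistent time unit). */
typedef struct {
    double write; /* host -> device transfer */
    double exec;  /* kernel execution */
    double read;  /* device -> host transfer */
} stage_cost;

static double max2(double a, double b) { return a > b ? a : b; }

/* Serial host loop: blocking write, execute, read for each chunk in turn. */
double serial_time(stage_cost c, size_t chunks) {
    assert(c.write >= 0.0 && c.exec >= 0.0 && c.read >= 0.0);
    return (double)chunks * (c.write + c.exec + c.read);
}

/* Three in-order queues (W, E, R): a stage starts once the same stage of the
   previous chunk and the preceding stage of the same chunk have both finished. */
double overlapped_time(stage_cost c, size_t chunks) {
    double w_done = 0.0, e_done = 0.0, r_done = 0.0;
    assert(c.write >= 0.0 && c.exec >= 0.0 && c.read >= 0.0);
    for (size_t i = 0; i < chunks; i++) {
        w_done = w_done + c.write;
        e_done = max2(e_done, w_done) + c.exec;
        r_done = max2(r_done, e_done) + c.read;
    }
    return r_done;
}
```

In this model the overlapped time approaches the number of chunks times the slowest stage. The gain is therefore bounded by how much of the serial time the two faster stages account for, and on real hardware transfers and kernels may compete for the same memory bandwidth.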
Algorithm Opt_3 (2)
Results Opt_3
• Right figure: AES ECB128 throughput in MB/sec, serial (Opt_2) vs overlapped (Opt_3)
• Figure below: 3 OpenCL queues (R, W, E) used for asynchronous enqueues, so that execution overlaps with I/O
Conclusions
• iGPU AES performance is good: faster than the CPU without AES-NI, although the CPU with AES-NI is fastest
• Prefer cache (__local) over constant memory
• Where possible, compare precomputed tables against computation on the fly
• Overlapping execution with I/O could improve iGPU performance by 10-20%
• The share of the x86 SoC die occupied by the iGPU increases with each generation, and its contribution to AES throughput will increase as well
• Memory transfers are expected to improve with each new generation, and with them CPU/iGPU performance
More Related Content

What's hot

Advances in the Solution of Navier-Stokes Eqs. in GPGPU Hardware. Modelling F...
Advances in the Solution of Navier-Stokes Eqs. in GPGPU Hardware. Modelling F...Advances in the Solution of Navier-Stokes Eqs. in GPGPU Hardware. Modelling F...
Advances in the Solution of Navier-Stokes Eqs. in GPGPU Hardware. Modelling F...Storti Mario
 
The Silence of the Canaries
The Silence of the CanariesThe Silence of the Canaries
The Silence of the CanariesKernel TLV
 
Performance evaluation with Arm HPC tools for SVE
Performance evaluation with Arm HPC tools for SVEPerformance evaluation with Arm HPC tools for SVE
Performance evaluation with Arm HPC tools for SVELinaro
 
第11回 配信講義 計算科学技術特論A(2021)
第11回 配信講義 計算科学技術特論A(2021)第11回 配信講義 計算科学技術特論A(2021)
第11回 配信講義 計算科学技術特論A(2021)RCCSRENKEI
 
Advances in the Solution of NS Eqs. in GPGPU Hardware. Second order scheme an...
Advances in the Solution of NS Eqs. in GPGPU Hardware. Second order scheme an...Advances in the Solution of NS Eqs. in GPGPU Hardware. Second order scheme an...
Advances in the Solution of NS Eqs. in GPGPU Hardware. Second order scheme an...Storti Mario
 
grsecurity and PaX
grsecurity and PaXgrsecurity and PaX
grsecurity and PaXKernel TLV
 
Arm tools and roadmap for SVE compiler support
Arm tools and roadmap for SVE compiler supportArm tools and roadmap for SVE compiler support
Arm tools and roadmap for SVE compiler supportLinaro
 
Parallelization Strategies for Implementing Nbody Codes on Multicore Architec...
Parallelization Strategies for Implementing Nbody Codes on Multicore Architec...Parallelization Strategies for Implementing Nbody Codes on Multicore Architec...
Parallelization Strategies for Implementing Nbody Codes on Multicore Architec...Filipo Mór
 
GPU Performance Prediction Using High-level Application Models
GPU Performance Prediction Using High-level Application ModelsGPU Performance Prediction Using High-level Application Models
GPU Performance Prediction Using High-level Application ModelsFilipo Mór
 
Yet another introduction to Linux RCU
Yet another introduction to Linux RCUYet another introduction to Linux RCU
Yet another introduction to Linux RCUViller Hsiao
 
Semtex.c [CVE-2013-2094] - A Linux Privelege Escalation
Semtex.c [CVE-2013-2094] - A Linux Privelege EscalationSemtex.c [CVE-2013-2094] - A Linux Privelege Escalation
Semtex.c [CVE-2013-2094] - A Linux Privelege EscalationKernel TLV
 
ParallelLogicToEventDrivenFirmware_Doin
ParallelLogicToEventDrivenFirmware_DoinParallelLogicToEventDrivenFirmware_Doin
ParallelLogicToEventDrivenFirmware_DoinJonny Doin
 
protothread and its usage in contiki OS
protothread and its usage in contiki OSprotothread and its usage in contiki OS
protothread and its usage in contiki OSSalah Amean
 
DUSK - Develop at Userland Install into Kernel
DUSK - Develop at Userland Install into KernelDUSK - Develop at Userland Install into Kernel
DUSK - Develop at Userland Install into KernelAlexey Smirnov
 
Use Data-Oriented Design to write efficient code
Use Data-Oriented Design to write efficient codeUse Data-Oriented Design to write efficient code
Use Data-Oriented Design to write efficient codeAlessio Coltellacci
 

What's hot (20)

Advances in the Solution of Navier-Stokes Eqs. in GPGPU Hardware. Modelling F...
Advances in the Solution of Navier-Stokes Eqs. in GPGPU Hardware. Modelling F...Advances in the Solution of Navier-Stokes Eqs. in GPGPU Hardware. Modelling F...
Advances in the Solution of Navier-Stokes Eqs. in GPGPU Hardware. Modelling F...
 
The Silence of the Canaries
The Silence of the CanariesThe Silence of the Canaries
The Silence of the Canaries
 
Performance evaluation with Arm HPC tools for SVE
Performance evaluation with Arm HPC tools for SVEPerformance evaluation with Arm HPC tools for SVE
Performance evaluation with Arm HPC tools for SVE
 
20131212
2013121220131212
20131212
 
第11回 配信講義 計算科学技術特論A(2021)
第11回 配信講義 計算科学技術特論A(2021)第11回 配信講義 計算科学技術特論A(2021)
第11回 配信講義 計算科学技術特論A(2021)
 
Afanasov14flynet slides
Afanasov14flynet slidesAfanasov14flynet slides
Afanasov14flynet slides
 
Advances in the Solution of NS Eqs. in GPGPU Hardware. Second order scheme an...
Advances in the Solution of NS Eqs. in GPGPU Hardware. Second order scheme an...Advances in the Solution of NS Eqs. in GPGPU Hardware. Second order scheme an...
Advances in the Solution of NS Eqs. in GPGPU Hardware. Second order scheme an...
 
grsecurity and PaX
grsecurity and PaXgrsecurity and PaX
grsecurity and PaX
 
Introduction to OpenCL
Introduction to OpenCLIntroduction to OpenCL
Introduction to OpenCL
 
Arm tools and roadmap for SVE compiler support
Arm tools and roadmap for SVE compiler supportArm tools and roadmap for SVE compiler support
Arm tools and roadmap for SVE compiler support
 
Parallelization Strategies for Implementing Nbody Codes on Multicore Architec...
Parallelization Strategies for Implementing Nbody Codes on Multicore Architec...Parallelization Strategies for Implementing Nbody Codes on Multicore Architec...
Parallelization Strategies for Implementing Nbody Codes on Multicore Architec...
 
GPU Performance Prediction Using High-level Application Models
GPU Performance Prediction Using High-level Application ModelsGPU Performance Prediction Using High-level Application Models
GPU Performance Prediction Using High-level Application Models
 
Yet another introduction to Linux RCU
Yet another introduction to Linux RCUYet another introduction to Linux RCU
Yet another introduction to Linux RCU
 
Semtex.c [CVE-2013-2094] - A Linux Privelege Escalation
Semtex.c [CVE-2013-2094] - A Linux Privelege EscalationSemtex.c [CVE-2013-2094] - A Linux Privelege Escalation
Semtex.c [CVE-2013-2094] - A Linux Privelege Escalation
 
Thesis Final Presentation
Thesis Final PresentationThesis Final Presentation
Thesis Final Presentation
 
Lec05 buffers basic_examples
Lec05 buffers basic_examplesLec05 buffers basic_examples
Lec05 buffers basic_examples
 
ParallelLogicToEventDrivenFirmware_Doin
ParallelLogicToEventDrivenFirmware_DoinParallelLogicToEventDrivenFirmware_Doin
ParallelLogicToEventDrivenFirmware_Doin
 
protothread and its usage in contiki OS
protothread and its usage in contiki OSprotothread and its usage in contiki OS
protothread and its usage in contiki OS
 
DUSK - Develop at Userland Install into Kernel
DUSK - Develop at Userland Install into KernelDUSK - Develop at Userland Install into Kernel
DUSK - Develop at Userland Install into Kernel
 
Use Data-Oriented Design to write efficient code
Use Data-Oriented Design to write efficient codeUse Data-Oriented Design to write efficient code
Use Data-Oriented Design to write efficient code
 

Viewers also liked

Grupo 2 gilbert 2011
Grupo 2 gilbert 2011Grupo 2 gilbert 2011
Grupo 2 gilbert 2011laveroniquita
 
MIRROR WAR 2nd CBT REPORT
MIRROR WAR 2nd CBT REPORTMIRROR WAR 2nd CBT REPORT
MIRROR WAR 2nd CBT REPORTMIRROR WAR
 
General Information/Institutional information 2014
General Information/Institutional information 2014General Information/Institutional information 2014
General Information/Institutional information 2014University of Pretoria
 
Pio outbound
Pio outboundPio outbound
Pio outboundmaureen07
 
Faculty of Engineering, Built Environment and Information Technology:/Ebit pa...
Faculty of Engineering, Built Environment and Information Technology:/Ebit pa...Faculty of Engineering, Built Environment and Information Technology:/Ebit pa...
Faculty of Engineering, Built Environment and Information Technology:/Ebit pa...University of Pretoria
 
Aqperli Petroglyphs in East Kazakhstan. Nurkassym K, Isin A, Tuganov S. Desig...
Aqperli Petroglyphs in East Kazakhstan. Nurkassym K, Isin A, Tuganov S. Desig...Aqperli Petroglyphs in East Kazakhstan. Nurkassym K, Isin A, Tuganov S. Desig...
Aqperli Petroglyphs in East Kazakhstan. Nurkassym K, Isin A, Tuganov S. Desig...University of Turan Astana
 
PALS Project model Accredited Learning & Support - World Vision
PALS Project model Accredited Learning & Support - World VisionPALS Project model Accredited Learning & Support - World Vision
PALS Project model Accredited Learning & Support - World VisionMoodlemootAU2014
 
Grupo 2 gilbert 2011
Grupo 2 gilbert 2011Grupo 2 gilbert 2011
Grupo 2 gilbert 2011laveroniquita
 
CEO Форум «Україна на перехресті: підсумки року після революції гідності» .
CEO Форум «Україна на перехресті: підсумки року після революції гідності» .CEO Форум «Україна на перехресті: підсумки року після революції гідності» .
CEO Форум «Україна на перехресті: підсумки року після революції гідності» .Dmytro Diedushenko
 
Coolerado's super-efficient air conditioning expanded to include small reside...
Coolerado's super-efficient air conditioning expanded to include small reside...Coolerado's super-efficient air conditioning expanded to include small reside...
Coolerado's super-efficient air conditioning expanded to include small reside...plantunderworld90
 
The history og Kazakh children organization "Atameken" (M.Kursabaev)
The history og Kazakh children organization "Atameken" (M.Kursabaev)The history og Kazakh children organization "Atameken" (M.Kursabaev)
The history og Kazakh children organization "Atameken" (M.Kursabaev)University of Turan Astana
 
Презентация Роста Дикого "Мужской гардероб"
Презентация Роста Дикого "Мужской гардероб" Презентация Роста Дикого "Мужской гардероб"
Презентация Роста Дикого "Мужской гардероб" Dmytro Diedushenko
 
Reflective learning across online discussion forums - Evolving educational pr...
Reflective learning across online discussion forums - Evolving educational pr...Reflective learning across online discussion forums - Evolving educational pr...
Reflective learning across online discussion forums - Evolving educational pr...MoodlemootAU2014
 
Fit beats - Startup Weekend Louisville
Fit beats - Startup Weekend LouisvilleFit beats - Startup Weekend Louisville
Fit beats - Startup Weekend LouisvilleJacob A. Heller
 

Viewers also liked (20)

Grupo 2 gilbert 2011
Grupo 2 gilbert 2011Grupo 2 gilbert 2011
Grupo 2 gilbert 2011
 
Thakurer bani
Thakurer baniThakurer bani
Thakurer bani
 
3º unit 5 transport&street
3º unit 5 transport&street3º unit 5 transport&street
3º unit 5 transport&street
 
3º unit 2 petjobs&accessories
3º unit 2 petjobs&accessories3º unit 2 petjobs&accessories
3º unit 2 petjobs&accessories
 
MIRROR WAR 2nd CBT REPORT
MIRROR WAR 2nd CBT REPORTMIRROR WAR 2nd CBT REPORT
MIRROR WAR 2nd CBT REPORT
 
General Information/Institutional information 2014
General Information/Institutional information 2014General Information/Institutional information 2014
General Information/Institutional information 2014
 
Pio outbound
Pio outboundPio outbound
Pio outbound
 
Faculty of Engineering, Built Environment and Information Technology:/Ebit pa...
Faculty of Engineering, Built Environment and Information Technology:/Ebit pa...Faculty of Engineering, Built Environment and Information Technology:/Ebit pa...
Faculty of Engineering, Built Environment and Information Technology:/Ebit pa...
 
Aqperli Petroglyphs in East Kazakhstan. Nurkassym K, Isin A, Tuganov S. Desig...
Aqperli Petroglyphs in East Kazakhstan. Nurkassym K, Isin A, Tuganov S. Desig...Aqperli Petroglyphs in East Kazakhstan. Nurkassym K, Isin A, Tuganov S. Desig...
Aqperli Petroglyphs in East Kazakhstan. Nurkassym K, Isin A, Tuganov S. Desig...
 
PALS Project model Accredited Learning & Support - World Vision
PALS Project model Accredited Learning & Support - World VisionPALS Project model Accredited Learning & Support - World Vision
PALS Project model Accredited Learning & Support - World Vision
 
Atto - Moodle HQ
Atto - Moodle HQAtto - Moodle HQ
Atto - Moodle HQ
 
Grupo 2 gilbert 2011
Grupo 2 gilbert 2011Grupo 2 gilbert 2011
Grupo 2 gilbert 2011
 
SWCDO 101: Project management
SWCDO 101: Project managementSWCDO 101: Project management
SWCDO 101: Project management
 
CEO Форум «Україна на перехресті: підсумки року після революції гідності» .
CEO Форум «Україна на перехресті: підсумки року після революції гідності» .CEO Форум «Україна на перехресті: підсумки року після революції гідності» .
CEO Форум «Україна на перехресті: підсумки року після революції гідності» .
 
Coolerado's super-efficient air conditioning expanded to include small reside...
Coolerado's super-efficient air conditioning expanded to include small reside...Coolerado's super-efficient air conditioning expanded to include small reside...
Coolerado's super-efficient air conditioning expanded to include small reside...
 
Africa (120slides)
Africa (120slides)Africa (120slides)
Africa (120slides)
 
The history og Kazakh children organization "Atameken" (M.Kursabaev)
The history og Kazakh children organization "Atameken" (M.Kursabaev)The history og Kazakh children organization "Atameken" (M.Kursabaev)
The history og Kazakh children organization "Atameken" (M.Kursabaev)
 
Презентация Роста Дикого "Мужской гардероб"
Презентация Роста Дикого "Мужской гардероб" Презентация Роста Дикого "Мужской гардероб"
Презентация Роста Дикого "Мужской гардероб"
 
Reflective learning across online discussion forums - Evolving educational pr...
Reflective learning across online discussion forums - Evolving educational pr...Reflective learning across online discussion forums - Evolving educational pr...
Reflective learning across online discussion forums - Evolving educational pr...
 
Fit beats - Startup Weekend Louisville
Fit beats - Startup Weekend LouisvilleFit beats - Startup Weekend Louisville
Fit beats - Startup Weekend Louisville
 

Similar to AES on modern GPUs

AES encryption on modern consumer architectures
AES encryption on modern consumer architecturesAES encryption on modern consumer architectures
AES encryption on modern consumer architecturesGrigore Lupescu
 
Programmable Exascale Supercomputer
Programmable Exascale SupercomputerProgrammable Exascale Supercomputer
Programmable Exascale SupercomputerSagar Dolas
 
Performance Optimization of SPH Algorithms for Multi/Many-Core Architectures
Performance Optimization of SPH Algorithms for Multi/Many-Core ArchitecturesPerformance Optimization of SPH Algorithms for Multi/Many-Core Architectures
Performance Optimization of SPH Algorithms for Multi/Many-Core ArchitecturesDr. Fabio Baruffa
 
emips_overview_apr08
emips_overview_apr08emips_overview_apr08
emips_overview_apr08Neil Pittman
 
The Best Programming Practice for Cell/B.E.
The Best Programming Practice for Cell/B.E.The Best Programming Practice for Cell/B.E.
The Best Programming Practice for Cell/B.E.Slide_N
 
Seq2Seq (encoder decoder) model
Seq2Seq (encoder decoder) modelSeq2Seq (encoder decoder) model
Seq2Seq (encoder decoder) model佳蓉 倪
 
Design of area optimized aes encryption core using pipelining technology
Design of area optimized aes encryption core using pipelining technologyDesign of area optimized aes encryption core using pipelining technology
Design of area optimized aes encryption core using pipelining technologyIAEME Publication
 
POLITEKNIK MALAYSIA
POLITEKNIK MALAYSIAPOLITEKNIK MALAYSIA
POLITEKNIK MALAYSIAAiman Hud
 
Brief Introduction to Parallella
Brief Introduction to ParallellaBrief Introduction to Parallella
Brief Introduction to ParallellaSomnath Mazumdar
 
Integrating Cache Oblivious Approach with Modern Processor Architecture: The ...
Integrating Cache Oblivious Approach with Modern Processor Architecture: The ...Integrating Cache Oblivious Approach with Modern Processor Architecture: The ...
Integrating Cache Oblivious Approach with Modern Processor Architecture: The ...Tokyo Institute of Technology
 
epscor_talk_2.pptx
epscor_talk_2.pptxepscor_talk_2.pptx
epscor_talk_2.pptxShadowCon
 
Monte Carlo on GPUs
Monte Carlo on GPUsMonte Carlo on GPUs
Monte Carlo on GPUsfcassier
 
Week1 Electronic System-level ESL Design and SystemC Begin
Week1 Electronic System-level ESL Design and SystemC BeginWeek1 Electronic System-level ESL Design and SystemC Begin
Week1 Electronic System-level ESL Design and SystemC Begin敬倫 林
 
AES effecitve software implementation
AES effecitve software implementationAES effecitve software implementation
AES effecitve software implementationRoman Oliynykov
 

Similar to AES on modern GPUs (20)

AES encryption on modern consumer architectures
AES encryption on modern consumer architecturesAES encryption on modern consumer architectures
AES encryption on modern consumer architectures
 
Programmable Exascale Supercomputer
Programmable Exascale SupercomputerProgrammable Exascale Supercomputer
Programmable Exascale Supercomputer
 
Exascale Capabl
Exascale CapablExascale Capabl
Exascale Capabl
 
Aes
AesAes
Aes
 
Performance Optimization of SPH Algorithms for Multi/Many-Core Architectures
Performance Optimization of SPH Algorithms for Multi/Many-Core ArchitecturesPerformance Optimization of SPH Algorithms for Multi/Many-Core Architectures
Performance Optimization of SPH Algorithms for Multi/Many-Core Architectures
 
A04660105
A04660105A04660105
A04660105
 
emips_overview_apr08
emips_overview_apr08emips_overview_apr08
emips_overview_apr08
 
The Best Programming Practice for Cell/B.E.
The Best Programming Practice for Cell/B.E.The Best Programming Practice for Cell/B.E.
The Best Programming Practice for Cell/B.E.
 
Seq2Seq (encoder decoder) model
Seq2Seq (encoder decoder) modelSeq2Seq (encoder decoder) model
Seq2Seq (encoder decoder) model
 
Design of area optimized aes encryption core using pipelining technology
Design of area optimized aes encryption core using pipelining technologyDesign of area optimized aes encryption core using pipelining technology
Design of area optimized aes encryption core using pipelining technology
 
POLITEKNIK MALAYSIA
POLITEKNIK MALAYSIAPOLITEKNIK MALAYSIA
POLITEKNIK MALAYSIA
 
Brief Introduction to Parallella
Brief Introduction to ParallellaBrief Introduction to Parallella
Brief Introduction to Parallella
 
Integrating Cache Oblivious Approach with Modern Processor Architecture: The ...
Integrating Cache Oblivious Approach with Modern Processor Architecture: The ...Integrating Cache Oblivious Approach with Modern Processor Architecture: The ...
Integrating Cache Oblivious Approach with Modern Processor Architecture: The ...
 
epscor_talk_2.pptx
epscor_talk_2.pptxepscor_talk_2.pptx
epscor_talk_2.pptx
 
Monte Carlo on GPUs
Monte Carlo on GPUsMonte Carlo on GPUs
Monte Carlo on GPUs
 
Week1 Electronic System-level ESL Design and SystemC Begin
Week1 Electronic System-level ESL Design and SystemC BeginWeek1 Electronic System-level ESL Design and SystemC Begin
Week1 Electronic System-level ESL Design and SystemC Begin
 
DSP Processor.pptx
DSP Processor.pptxDSP Processor.pptx
DSP Processor.pptx
 
MaPU-HPCA2016
MaPU-HPCA2016MaPU-HPCA2016
MaPU-HPCA2016
 
The Cell Processor
The Cell ProcessorThe Cell Processor
The Cell Processor
 
AES effecitve software implementation
AES effecitve software implementationAES effecitve software implementation
AES effecitve software implementation
 

Recently uploaded

So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfpanagenda
 
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...Scott Andery
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityIES VE
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 
Data governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationData governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationKnoldus Inc.
 
Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Hiroshi SHIBATA
 
Sample pptx for embedding into website for demo
Sample pptx for embedding into website for demoSample pptx for embedding into website for demo
Sample pptx for embedding into website for demoHarshalMandlekar2
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersRaghuram Pandurangan
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsRavi Sanghani
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfMounikaPolabathina
 
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...Wes McKinney
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxLoriGlavin3
 
Assure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyesAssure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyesThousandEyes
 
Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rick Flair
 
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesThousandEyes
 

Recently uploaded (20)

So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
 
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a reality
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx

AES on modern GPUs

  • 1. AES encryption using GPU architectures
      Author(s): Grigore Lupescu
      Scientific Advisor: Emil Slusanschi
      Politehnica University of Bucharest, Automatic Control and Computers Faculty, Computer Science Department
      Scientific Student Projects Session - May 2014
  • 2. AES Encryption (1)
      17.05.2014, Scientific Student Projects Session - May 2014
      ◦ A mode of operation is an algorithm that repeatedly applies a block cipher (e.g. AES) to the input plaintext
      ◦ Most modes of operation require an initialization vector
      ◦ Most used cipher modes: Cipher-block chaining (CBC), Counter (CTR)
      ◦ Other cipher modes: Electronic codebook (ECB), Output feedback (OFB)
      ◦ Why use ECB?
          - Simple, fast, very well parallelizable, maximum throughput
          - Provides a good estimate of how CTR would perform
  • 3. AES Encryption (2)
      ◦ KeyExpansion: round keys are derived from the cipher key.
      ◦ InitialRound: AddRoundKey.
      ◦ Rounds:
          - SubBytes: substitution step where each byte is replaced with another according to the S-box table.
          - ShiftRows: transposition step where the last three rows of the state are shifted cyclically.
          - MixColumns: a mixing operation which operates on the columns of the state. Addition and multiplication are defined in the Galois field GF(2^8).
          - AddRoundKey: bitwise XOR of each byte of the state with the round key.
      ◦ Final Round: SubBytes, ShiftRows, AddRoundKey (no MixColumns).
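The SubBytes step above relies on a 256-entry S-box table. As a minimal sketch, and not the code used in the presentation, the table can be derived from its standard definition (multiplicative inverse in GF(2^8) followed by an affine transform). The function names `gf_mul`, `rotl8`, `build_sbox` and `sub_bytes` are illustrative assumptions, and the brute-force inverse search is chosen for clarity rather than speed.

```c
#include <stdint.h>

/* Multiply two elements of GF(2^8) modulo the AES polynomial
 * x^8 + x^4 + x^3 + x + 1 (reduction constant 0x1B). */
static uint8_t gf_mul(uint8_t a, uint8_t b)
{
    uint8_t product = 0;
    while (b) {
        if (b & 1)
            product ^= a;
        a = (uint8_t)((a << 1) ^ ((a & 0x80) ? 0x1B : 0x00));
        b >>= 1;
    }
    return product;
}

/* Rotate an 8-bit value left by n bits (1 <= n <= 7). */
static uint8_t rotl8(uint8_t x, unsigned n)
{
    return (uint8_t)((x << n) | (x >> (8 - n)));
}

/* Fill sbox[256]: inverse in GF(2^8), then the AES affine transform. */
void build_sbox(uint8_t sbox[256])
{
    for (unsigned v = 0; v < 256; v++) {
        uint8_t inv = 0; /* 0 has no inverse and is mapped to 0 */
        for (unsigned c = 1; c < 256 && v != 0; c++) {
            if (gf_mul((uint8_t)v, (uint8_t)c) == 1) {
                inv = (uint8_t)c;
                break;
            }
        }
        sbox[v] = (uint8_t)(inv ^ rotl8(inv, 1) ^ rotl8(inv, 2) ^
                            rotl8(inv, 3) ^ rotl8(inv, 4) ^ 0x63);
    }
}

/* SubBytes: replace every byte of the 16-byte state via the table. */
void sub_bytes(uint8_t state[16], const uint8_t sbox[256])
{
    for (int i = 0; i < 16; i++)
        state[i] = sbox[state[i]];
}
```

A table built this way would be computed once on the host and then copied to the device, since every block and every round reuses the same 256 bytes.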
  • 4. Target System (1)
      ◦ SoC CPU: AMD A4 4000K (2 cores @ 3.0 GHz, Richland architecture, AES-NI), cores denoted by BLUE
      ◦ SoC integrated GPU HD7480 (iGPU): 2 SIMD units of 64 cores each (VLIW4 architecture), SIMD units denoted by RED
      ◦ Discrete GPU AMD R7 250 (dGPU): 6 SIMD units of 64 cores each (GCN architecture), PCIe 2.0 x16 bus, SIMD units denoted by RED
      ◦ Data to be encrypted denoted by GREEN
      ◦ Software: C/C++/OpenCL, Linux Ubuntu 14.04 x64
  • 5. Target System (2)
  • 6. Algorithm Opt_1
      ◦ Array “indata” will reside in global device memory (__global)
      ◦ Variable “state”, which holds the intermediate transformations, will be in GPU cache (__local, i.e. OpenCL local memory)
      ◦ Simple operation “ShiftRows” is implemented with vector addressing (state.s05AF49E38.. )
      ◦ Simple operation “AddRoundKey” is a plain XOR (state ^ key)
      ◦ Complex operation “SubBytes” will use a precomputed Sbox table, stored in constant memory
      ◦ Complex operation “MixColumns” will use precomputed Galois_FiniteField tables, stored in constant memory
      ◦ Host sample code below (simple blocking enqueues):

          while (!done()) {
              writeData(32MB, &offset);
              execKernel(32MB, &offset);
              readData(32MB, &offset);
          }
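The two "simple" operations of Opt_1 can be sketched in plain C as follows. This is an illustration under assumptions, not the kernel from the slides: the state is assumed to be stored column-major (byte index = row + 4 * column), and `SHIFT_ROWS_SRC`, `shift_rows` and `add_round_key` are hypothetical names. With that layout the standard ShiftRows permutation begins 0, 5, A, F, 4, 9, E, 3, 8, which agrees with the visible prefix of the swizzle on the slide.

```c
#include <stdint.h>
#include <string.h>

/* Output byte i is read from input byte SHIFT_ROWS_SRC[i].
 * Assumes a column-major state: index = row + 4 * column. */
static const uint8_t SHIFT_ROWS_SRC[16] = {
    0, 5, 10, 15,   4, 9, 14, 3,   8, 13, 2, 7,   12, 1, 6, 11
};

/* ShiftRows as one fixed byte permutation of the 16-byte state. */
void shift_rows(uint8_t state[16])
{
    uint8_t tmp[16];
    for (int i = 0; i < 16; i++)
        tmp[i] = state[SHIFT_ROWS_SRC[i]];
    memcpy(state, tmp, 16);
}

/* AddRoundKey: XOR every state byte with the matching round-key byte. */
void add_round_key(uint8_t state[16], const uint8_t round_key[16])
{
    for (int i = 0; i < 16; i++)
        state[i] ^= round_key[i];
}
```

On a GPU the same permutation maps naturally to a single vector swizzle on a 16-component type, which is presumably why the slides classify ShiftRows as a simple operation.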
  • 7. Results Opt_1
      ◦ AMD CodeXL profiling, initial results: iGPU A4 4000, ~100 MB/sec AES ECB128
  • 8. Algorithm Opt_2
      ◦ Array “indata” will reside in global device memory (__global)
      ◦ Variable “state”, which holds the intermediate transformations, will be in GPU cache (__local)
      ◦ Simple operation “ShiftRows”: unchanged
      ◦ Simple operation “AddRoundKey”: unchanged
      ◦ Complex operation “SubBytes” will use a precomputed Sbox table, now stored in cache memory (__local) instead of constant memory
      ◦ Complex operation “MixColumns” computes its values on the fly instead of using precomputed tables (an optimized version of MixColumns is used)
      ◦ Host sample code: unchanged
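The slides do not show the optimized MixColumns itself. One common way to compute it without lookup tables, given here only as a sketch under that assumption, is the `xtime` (multiply by 2 in GF(2^8)) formulation. The names `xtime`, `mix_column` and `mix_columns` are illustrative, and the state is again assumed to be column-major.

```c
#include <stdint.h>

/* Multiply by 2 in GF(2^8), reducing by the AES polynomial (0x1B). */
static uint8_t xtime(uint8_t x)
{
    return (uint8_t)((x << 1) ^ ((x & 0x80) ? 0x1B : 0x00));
}

/* MixColumns on one 4-byte column, using only XOR and xtime.
 * Relies on 3*a = 2*a ^ a, so e.g. 2*a0 ^ 3*a1 ^ a2 ^ a3
 * equals a0 ^ (a0^a1^a2^a3) ^ xtime(a0 ^ a1). */
void mix_column(uint8_t col[4])
{
    uint8_t a0 = col[0], a1 = col[1], a2 = col[2], a3 = col[3];
    uint8_t all = (uint8_t)(a0 ^ a1 ^ a2 ^ a3);

    col[0] = (uint8_t)(a0 ^ all ^ xtime((uint8_t)(a0 ^ a1)));
    col[1] = (uint8_t)(a1 ^ all ^ xtime((uint8_t)(a1 ^ a2)));
    col[2] = (uint8_t)(a2 ^ all ^ xtime((uint8_t)(a2 ^ a3)));
    col[3] = (uint8_t)(a3 ^ all ^ xtime((uint8_t)(a3 ^ a0)));
}

/* Apply MixColumns to all four columns of a column-major state. */
void mix_columns(uint8_t state[16])
{
    for (int c = 0; c < 4; c++)
        mix_column(state + 4 * c);
}
```

This trades table reads for a handful of shifts and XORs per column, which is the kind of memory-versus-arithmetic trade-off the Opt_2 slide describes.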
  • 9. Results Opt_2
      ◦ Profiling, Opt_1: iGPU A4 4000, ~100 MB/sec AES ECB128
      ◦ Profiling, Opt_2: iGPU A4 4000, ~210 MB/sec AES ECB128
  • 10. Algorithm Opt_3
      ◦ Array “indata” will reside in global device memory (__global)
      ◦ Variable “state”, which holds the intermediate transformations, will be in GPU cache (__local)
      ◦ Simple operation “ShiftRows”: unchanged
      ◦ Simple operation “AddRoundKey”: unchanged
      ◦ Complex operation “SubBytes”: unchanged
      ◦ Complex operation “MixColumns”: unchanged
      ◦ Host sample code: overlap execution with I/O by creating multiple queues (R, W, E: read, write, execute)
  • 11. Algorithm Opt_3 (2)
  • 12. Results Opt_3
      ◦ Figure on the right: AES ECB128 results in MB/sec, serial (Opt_2) vs overlap (Opt_3)
      ◦ Figure below: 3 OpenCL queues (R, W, E) used for asynchronous enqueues, so that execution overlaps with I/O
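A rough way to reason about the overlap is an idealized three-stage pipeline model. The sketch below is not taken from the slides and makes strong assumptions: every chunk takes the same time in each stage, the write, execute and read stages can run fully concurrently, and there is no queueing or synchronization overhead. The function names and parameters are hypothetical, and the times are in arbitrary units rather than measurements.

```c
/* Serial schedule (Opt_2 style): each chunk is written, executed and
 * read back before the next chunk starts. */
double serial_time(int chunks, double t_write, double t_exec, double t_read)
{
    return chunks * (t_write + t_exec + t_read);
}

/* Idealized overlap (Opt_3 style): after the first chunk has filled the
 * pipeline, one chunk completes every max(stage time) units. */
double overlapped_time(int chunks, double t_write, double t_exec, double t_read)
{
    double slowest = t_write;
    if (t_exec > slowest)
        slowest = t_exec;
    if (t_read > slowest)
        slowest = t_read;
    return (t_write + t_exec + t_read) + (chunks - 1) * slowest;
}
```

Under this model the saving for many chunks approaches 1 - max(stage) / sum(stages), so when kernel execution dominates, the gain from overlapping is bounded by the share of time spent in transfers.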
  • 13. Conclusions
      ◦ iGPU AES performance is good (faster than the CPU, but the CPU with AES-NI is fastest)
      ◦ Prefer cache (__local) over constant memory
      ◦ Where possible, compare precomputed tables against computation on the fly
      ◦ Overlapping execution with I/O could improve iGPU performance by 10-20%
      ◦ The share of the x86 SoC die occupied by the iGPU increases with each generation, and its contribution to AES throughput will increase as well
      ◦ Memory transfers are expected to improve with each new generation, and with them CPU/iGPU performance