AES implementation
on FPGA
From the FASTEST to the SMALLEST
By
Tim Good and Mohammed Benaissa
Presented By Leonardo Antichi
The AIM
• Show two new FPGA designs for AES implementation:
• Fastest & continuous throughput (25 Gbps)
• Smallest area (124 slices)
• Both support a 128-bit key
• Demonstrate the advantage of an FPGA-specific
optimisation over an ASIC number-of-gates approach,
showing the speed improvement achieved by the
loop-unrolled design
The Fastest
• Key Feature:
• 25 Gbps on a Xilinx
Spartan-III
• Key may be periodically
changed without loss
of throughput
• Operating mode may
be changed between
encryption and
decryption at will
• How:
• Specific FPGA pipeline
placement (deep cuts)
• Parallel loop-unrolled
design
• SubBytes operation in
computational form
(rather than an “S-box”)
• Separate clock domain
for key expansion
Fig. 1. Block diagram for each middle round Fig. 2. Block diagram for final round
The Smallest
• Key Feature:
• 124 slices only (60% of the
Xilinx Spartan-II device)
• 2.2 Mbps throughput
• How:
• Specific Processor
Architecture using 8-bit
data-path (ASIP)
• Subroutines and looping
• Pipelined design
permitting execution of a
new instruction every
cycle
• 360 bits of RAM
Fig. 3. ASIP Architecture
Fig. 4. Slice utilization versus design unit
Conclusion
Fig. 5. Throughput versus area for the different FPGA designs
Thank you
For your attention

Editor's Notes

  • #3 THE AIM. The research objective is to explore the design space associated with the AES algorithm and, in particular, its FPGA (Field Programmable Gate Array) hardware implementation. Why FPGA? Because FPGAs have been expanding from their traditional role in prototyping into mainstream production, a change driven by commercial pressure to reduce design cost and risk and to achieve a faster time to market. Advances in technology have resulted in mask-programmed, mass-produced versions of FPGA fabrics being offered by the leading manufacturers which, for some applications, remove the need to move prototype designs from FPGA to ASIC whilst still achieving a low unit cost. It is also important to underline that architectural choices for a given application are driven by the system requirements in terms of speed and resources, which can simply be viewed as throughput and area. For this purpose two new FPGA designs are presented: the first is believed to be the fastest, achieving 25 Gbps throughput on a Xilinx Spartan-III (XC3S2000) device, while the second is believed to be the smallest, fitting into a Xilinx Spartan-II (XC2S15) device and requiring only two block memories and 124 slices to achieve a throughput of 2.2 Mbps.
  • #4 DEEP-CUT PIPELINE PLACEMENT. FPGAs are particularly fast in terms of throughput when designs are implemented with deep pipeline cuts. The attainable speed is set by the longest path in the design, so it is important to place the pipeline cuts such that the path lengths between the pipeline latches are balanced. First, the algorithm must be converted into a suitable form for deep pipelining. This is achieved by removing all the loops to form a loop-unrolled design in which the data moves through stationary execution resources: on each clock cycle a new item of data is input and progresses over numerous cycles through the pipeline, so data is output every cycle, albeit with some unavoidable latency. One of the key optimisations was to express the SubBytes operation in computational form rather than as a look-up table. Earlier implementations used the look-up-table approach (the “S-box”), but this imposed an unbreakable delay due to the time required to pass through the FPGA block memories. The FIPS-197 specification provides the mathematical derivation of SubBytes in terms of Galois Field GF(2^8) arithmetic; hardware implementations exploited this efficiently using composite field arithmetic, which permitted a deeper level of pipelining and thus improved throughput.
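The computational SubBytes can be modelled in software. Below is a minimal Python sketch, not the paper's hardware: it works directly in GF(2^8) over the Rijndael polynomial 0x11B (the hardware maps to a composite field GF((2^4)^2) for smaller, more pipelinable logic), computing the field inverse and then applying the FIPS-197 affine transform.

```python
# Sketch: SubBytes computed arithmetically instead of via a lookup table.
# Simplification of the paper's composite-field hardware: plain GF(2^8)
# over the Rijndael polynomial x^8 + x^4 + x^3 + x + 1 (0x11B).

def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8) modulo 0x11B (shift-and-add)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return r

def gf_inv(a: int) -> int:
    """Multiplicative inverse in GF(2^8); 0 maps to 0 as in FIPS-197."""
    if a == 0:
        return 0
    r = 1
    for _ in range(254):  # a^254 = a^-1, since the group has order 255
        r = gf_mul(r, a)
    return r

def sub_byte(a: int) -> int:
    """SubBytes: GF(2^8) inverse followed by the FIPS-197 affine transform."""
    x = gf_inv(a)
    result = 0x63  # affine constant
    for i in range(8):
        # output bit i = x_i ^ x_{i+4} ^ x_{i+5} ^ x_{i+6} ^ x_{i+7} (mod 8)
        bit = ((x >> i) ^ (x >> ((i + 4) % 8)) ^ (x >> ((i + 5) % 8)) ^
               (x >> ((i + 6) % 8)) ^ (x >> ((i + 7) % 8))) & 1
        result ^= bit << i
    return result
```

This reproduces the S-box entry by entry; in hardware the same equations unroll into pure combinational logic that can be cut by pipeline latches at arbitrary depth, which a block-RAM table cannot.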
  • #5 KEY CHANGES. The high-speed design presented here includes support for continuous throughput during key changes, for both encryption and decryption, a feature which previous pipelined designs have omitted.
  • #6 8-BIT DESIGN. There already exist a number of good 32-bit based designs, in which AES is implemented by breaking a round into a number of smaller 32-bit-wide operations. We decided to move to an 8-bit design, increasing the number of clock cycles needed to complete the task but saving resources in terms of area occupied. As a point of comparison, AES on the PicoBlaze 8-bit microcontroller takes 119 slices plus the block memories, which are accounted for here by an equivalent number of slices (again at 32 bits per slice), giving an equivalent slice count of 452 at a 90.2 MHz maximum clock. Key expansion followed by encipher took 13546 cycles and key expansion followed by decipher 18885 cycles; the average encipher-decipher throughput was 0.71 Mbps. An application-specific instruction processor (ASIP) was therefore developed around an 8-bit datapath and a minimal program ROM. Minimising the ROM size required support for subroutines and looping, which added area to the control portion of the design, but the saving in ROM size outweighed this cost. The total design, including an equivalent number of slices for the block memories, occupies only 259 slices and gives a throughput of 2.2 Mbps.
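The quoted PicoBlaze throughput figure can be sanity-checked from the cycle counts and clock rate above, assuming one 128-bit block is processed per key-expansion-plus-cipher run:

```python
# Sanity check of the PicoBlaze baseline throughput from the figures above.
CLOCK_HZ = 90.2e6    # maximum clock
BLOCK_BITS = 128     # one AES block per run
ENC_CYCLES = 13546   # key expansion + encipher
DEC_CYCLES = 18885   # key expansion + decipher

avg_cycles = (ENC_CYCLES + DEC_CYCLES) / 2
throughput_mbps = BLOCK_BITS * CLOCK_HZ / avg_cycles / 1e6
print(f"average encipher/decipher throughput = {throughput_mbps:.2f} Mbps")
```

The result matches the 0.71 Mbps quoted in the notes, which supports the stated assumption of one block per run.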
  • #7 PROCESSOR DESIGN AREA (60% datapath, 40% control). The datapath consists of two processing units: the first performs the SubBytes operation using resource-shared composite field arithmetic, and the second performs multiply-accumulate operations in GF(2^8). A minimal set of instructions (15 in total) was developed to perform the operations required for the AES. The processor (Figure 3) uses a pipelined design permitting execution of a new instruction every cycle. Figure 4 is a pie chart depicting the balance of area between the various design units: of the processor hardware, approximately 60% of the area is required for the datapath and 40% for the control.
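The GF(2^8) multiply-accumulate primitive attributed to the second processing unit can be sketched as follows; the function names are illustrative, not taken from the paper. MixColumns reduces to exactly such a chain of MACs, one per input byte:

```python
# Sketch of a GF(2^8) multiply-accumulate primitive (illustrative names,
# not the paper's instruction set). Field: Rijndael polynomial 0x11B.

def xtime(a: int) -> int:
    """Multiply by x (i.e. 0x02) in GF(2^8)."""
    a <<= 1
    return (a ^ 0x11B) if a & 0x100 else a

def gf_mac(acc: int, coeff: int, byte: int) -> int:
    """Return acc XOR (coeff * byte) in GF(2^8), shift-and-add style."""
    prod = 0
    while coeff:
        if coeff & 1:
            prod ^= byte
        byte = xtime(byte)
        coeff >>= 1
    return acc ^ prod

def mix_column_byte(a0: int, a1: int, a2: int, a3: int) -> int:
    """One MixColumns output byte: 02*a0 ^ 03*a1 ^ 01*a2 ^ 01*a3."""
    acc = 0
    for coeff, b in ((0x02, a0), (0x03, a1), (0x01, a2), (0x01, a3)):
        acc = gf_mac(acc, coeff, b)
    return acc
```

In the 8-bit ASIP this style of primitive lets MixColumns and key expansion share one small arithmetic unit, trading cycles for slices.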
  • #8 APPLICATIONS. These designs show the extremes of what is possible and have radically different applications. In terms of speed-to-area ratio the unrolled designs perform best, as there is no controller overhead; however, such designs are very large, needing a 1-2 million-gate device, though they achieve throughputs up to 25 Gbps. They have applications in fixed infrastructure such as IPsec for e-commerce servers. The low-area design described here achieves 2.2 Mbps, which is sufficient for most wireless and home applications. The area required is small, and thus fundamentally low power, so it has utility in the future mobile arena; the design requires just over half the available resources of the smallest available Spartan-II FPGA. The advantage of the 8-bit ASIP over the more traditional 8-bit microcontroller architecture (PicoBlaze) is shown by the approximate factor-of-three improvement in throughput and the 40% reduction in area (including estimated area for the memories). The 32-bit datapath designs occupy the middle ground between the two extremes and have utility where moderate throughput in the 100-200 Mbps range is required.