Hardware acceleration of TEA and XTEA algorithms on FPGA, GPU and multi-core processors

Vivek Venugopalan, Research Scientist at United Technologies Research Center
Vivek Venugopal and Devu Manikantan Shila {venugov, manikad}@utrc.utc.com

Introduction

[Figure: application scenarios. Labels: Smart meter application; FPGA + ARM (Xilinx Zynq); Gateway to Internet; Encrypted communication; Unmanned Autonomous Vehicle; GPU + ARM (NVIDIA CARMA); Planning Computer; Flight Control and Navigation Computer.]

• In smart grids, sensitive information such as power consumption, price updates, or outage awareness is exchanged between the meters and the power utility company in real time over the Internet.
• Unmanned Autonomous Vehicles (UAVs) continuously exchange dynamic information regarding the urban environment with a gateway. The gateway also provides feedback regarding the optimization parameters that need to be fed into the UAV's path planning algorithm for mapping different routes to reach its destination safely.
• Cyber attacks on such critical and dynamic information can lead to severe losses of resources and finance.

Tiny Encryption Algorithm (TEA)

[Figure: TEA round drawn as half round 1 and half round 2. Recoverable labels: 32-bit signals v0, v1, sum, k0, k1, k2, k3; operations << 4, >> 5, +, XOR, +/-; outputs v0_new, v1_new; control input encrypt/decrypt.]

• TEA uses addition, XOR and shift operations on 32-bit words and has a very small code footprint.
• TEA has security holes and weaknesses for small numbers of rounds, especially regarding the avalanche effect seen for 6 rounds.

Extended Tiny Encryption Algorithm (XTEA)

[Figure: XTEA round drawn as half round 1 and half round 2. Recoverable labels: 32-bit signals v0, v1, sum0, sum1, kx, ky; operations << 4, >> 5, +, XOR, +/-; outputs v0_new, v1_new; control input encrypt/decrypt.]

• The Extended Tiny Encryption Algorithm (XTEA) was introduced after weaknesses for small numbers of rounds were found in TEA.
• In XTEA, the key scheduling is modified to reflect different patterns for mixing the data and key continuously per round.

Implementation platforms and Results

• Nvidia's Tesla C2070 high-end GPU, two hexa-core Intel Xeon processors, Nvidia's GeForce GT 650M notebook GPU with 384 cores, and a quad-core Intel Core i7 CPU.
• Xilinx's Zynq-7000 SoC ZC702 evaluation board. The Zynq-7000 platform consists of a dual-core ARM Cortex-A9 processor clocked at 800 MHz and Artix-7 FPGA as the programmable logic.

[Figure: Streaming Multiprocessor (SMX) Architecture, showing SMX blocks with control logic. Text in figure: "Kepler GK110’s new SMX introduces several architectural innovations that make it not only the most powerful multiprocessor we’ve built, but also the most programmable and power efficient."]

[Figure: GPU implementation flowchart, first two boxes: copy input data and keys to GPU memory; pre-compute sum values for each round and store in shared memory.]

[Figure: two plots, "Throughput (Mbps) comparison of TEA" and "Throughput (Mbps) comparison of XTEA". Y axis: Throughput in Mbps (0 to 8000). X axis: Plaintext size (8 KB, 16 KB, 8 MB, 128 MB, 1 GB). Series: Intel Xeon X5650, Nvidia C2070, Intel Quad core i7, Nvidia GT650M, Zynq.]
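
The TEA and XTEA round structures described on this poster can be sketched in C as follows. This is a minimal sketch based on the publicly known reference form of the two ciphers, not the authors' FPGA or GPU code. The constant 0x9E3779B9, the count of 32 cycles (two half rounds each), and all function and macro names are assumptions that the poster does not state. Reading the diagram labels kx and ky as the two key words selected from the running sum is also an interpretation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DELTA      0x9E3779B9u  /* assumed: public reference constant */
#define NUM_CYCLES 32u          /* assumed: 32 cycles, 2 half rounds each */

/* TEA: encrypt one 64-bit block v[0..1] in place with a 128-bit key k[0..3]. */
void tea_encrypt(uint32_t v[2], const uint32_t k[4]) {
    assert(v != NULL && k != NULL);
    uint32_t v0 = v[0], v1 = v[1], sum = 0;
    for (uint32_t i = 0; i < NUM_CYCLES; i++) {
        sum += DELTA;
        /* half round 1: update v0 from v1, k0, k1 */
        v0 += ((v1 << 4) + k[0]) ^ (v1 + sum) ^ ((v1 >> 5) + k[1]);
        /* half round 2: update v1 from the new v0, k2, k3 */
        v1 += ((v0 << 4) + k[2]) ^ (v0 + sum) ^ ((v0 >> 5) + k[3]);
    }
    v[0] = v0; v[1] = v1;
}

/* TEA decryption: same half rounds in reverse order, with subtraction. */
void tea_decrypt(uint32_t v[2], const uint32_t k[4]) {
    assert(v != NULL && k != NULL);
    uint32_t v0 = v[0], v1 = v[1];
    uint32_t sum = (uint32_t)(DELTA * NUM_CYCLES);
    for (uint32_t i = 0; i < NUM_CYCLES; i++) {
        v1 -= ((v0 << 4) + k[2]) ^ (v0 + sum) ^ ((v0 >> 5) + k[3]);
        v0 -= ((v1 << 4) + k[0]) ^ (v1 + sum) ^ ((v1 >> 5) + k[1]);
        sum -= DELTA;
    }
    v[0] = v0; v[1] = v1;
}

/* XTEA: the key word used in each half round is selected from the running sum. */
void xtea_encrypt(uint32_t v[2], const uint32_t k[4]) {
    assert(v != NULL && k != NULL);
    uint32_t v0 = v[0], v1 = v[1], sum = 0;
    for (uint32_t i = 0; i < NUM_CYCLES; i++) {
        /* half round 1 uses the sum before the increment */
        v0 += (((v1 << 4) ^ (v1 >> 5)) + v1) ^ (sum + k[sum & 3]);
        sum += DELTA;
        /* half round 2 uses the sum after the increment */
        v1 += (((v0 << 4) ^ (v0 >> 5)) + v0) ^ (sum + k[(sum >> 11) & 3]);
    }
    v[0] = v0; v[1] = v1;
}

/* XTEA decryption: undo the half rounds in reverse order. */
void xtea_decrypt(uint32_t v[2], const uint32_t k[4]) {
    assert(v != NULL && k != NULL);
    uint32_t v0 = v[0], v1 = v[1];
    uint32_t sum = (uint32_t)(DELTA * NUM_CYCLES);
    for (uint32_t i = 0; i < NUM_CYCLES; i++) {
        v1 -= (((v0 << 4) ^ (v0 >> 5)) + v0) ^ (sum + k[(sum >> 11) & 3]);
        sum -= DELTA;
        v0 -= (((v1 << 4) ^ (v1 >> 5)) + v1) ^ (sum + k[sum & 3]);
    }
    v[0] = v0; v[1] = v1;
}
```

Each function works on a single 64-bit block held as two 32-bit words, which matches the 32-bit datapaths labeled in the round diagrams. The encrypt and decrypt pairs differ only in the order of the half rounds and in adding versus subtracting, which is consistent with the +/- stage and the encrypt/decrypt control input shown in the figures.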

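The GPU flowchart on this poster lists four steps: copy input data and keys to GPU memory, pre-compute the sum values for each round and store them in shared memory, calculate ciphers for blocks in parallel, and copy the ciphers back to the CPU. The host-side C sketch below illustrates the middle two steps only. It assumes XTEA with the public reference constant and 32 cycles, and it assumes each 64-bit block is processed independently, which the poster implies but does not name as a mode. The plain loop over blocks stands in for a one-thread-per-block GPU mapping, the local table stands in for shared memory, and all names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define XTEA_DELTA  0x9E3779B9u  /* assumed: public reference constant */
#define XTEA_CYCLES 32u          /* assumed: 32 cycles */

/* Pre-compute the running sum for every cycle.
   sums[i] is the value before cycle i; sums[XTEA_CYCLES] is the final value. */
void xtea_precompute_sums(uint32_t sums[XTEA_CYCLES + 1]) {
    uint32_t sum = 0;
    for (uint32_t i = 0; i <= XTEA_CYCLES; i++) {
        sums[i] = sum;
        sum += XTEA_DELTA;
    }
}

/* Encrypt one block by reading sums from the table instead of accumulating. */
static void encrypt_block(uint32_t v[2], const uint32_t k[4], const uint32_t *sums) {
    uint32_t v0 = v[0], v1 = v[1];
    for (uint32_t i = 0; i < XTEA_CYCLES; i++) {
        uint32_t s0 = sums[i], s1 = sums[i + 1];
        v0 += (((v1 << 4) ^ (v1 >> 5)) + v1) ^ (s0 + k[s0 & 3]);
        v1 += (((v0 << 4) ^ (v0 >> 5)) + v0) ^ (s1 + k[(s1 >> 11) & 3]);
    }
    v[0] = v0; v[1] = v1;
}

/* Decrypt one block by walking the same table backwards. */
static void decrypt_block(uint32_t v[2], const uint32_t k[4], const uint32_t *sums) {
    uint32_t v0 = v[0], v1 = v[1];
    for (uint32_t i = XTEA_CYCLES; i > 0; i--) {
        uint32_t s0 = sums[i - 1], s1 = sums[i];
        v1 -= (((v0 << 4) ^ (v0 >> 5)) + v0) ^ (s1 + k[(s1 >> 11) & 3]);
        v0 -= (((v1 << 4) ^ (v1 >> 5)) + v1) ^ (s0 + k[s0 & 3]);
    }
    v[0] = v0; v[1] = v1;
}

/* Blocks do not depend on each other, so every loop iteration
   could run as a separate GPU thread. data holds 2 words per block. */
void xtea_encrypt_blocks(uint32_t *data, size_t n_blocks, const uint32_t k[4]) {
    assert(data != NULL || n_blocks == 0);
    uint32_t sums[XTEA_CYCLES + 1];
    xtea_precompute_sums(sums);
    for (size_t b = 0; b < n_blocks; b++)
        encrypt_block(&data[2 * b], k, sums);
}

void xtea_decrypt_blocks(uint32_t *data, size_t n_blocks, const uint32_t k[4]) {
    assert(data != NULL || n_blocks == 0);
    uint32_t sums[XTEA_CYCLES + 1];
    xtea_precompute_sums(sums);
    for (size_t b = 0; b < n_blocks; b++)
        decrypt_block(&data[2 * b], k, sums);
}
```

Because the sum sequence depends only on the constant and the cycle count, it is identical for every block, which is presumably why the flowchart computes it once and shares it. A side effect of independent block processing is that equal plaintext blocks produce equal ciphertext blocks under the same key.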
Motivation                                                                                                                                                                                                                                                                                                          calculate ciphers for
                                                                                                                                                                                                                                                                                                                     blocks in parallel




 • All the information from/to these smart meters need                                                           GT650M: 2 SMX with
                                                                                                                                                                                                                                                                                                                    copy ciphers back to
                                                                                                                                                                                                                                                                                                                            CPU
                                                                                                                                                                                                                                                                                                                                                                                              Conclusion
 to be decrypted/encrypted at the gateway, which in                                                                192 cores each                                                                            Inside SMX                                                                                     GPU Implementation
                                                                                                                                                                                                                                                                                                                                                                                              • GPUs and FPGAs provide better throughput for both TEA and XTEA as
                                                                                                                                                                        SMX: 192 single precision CUDA cores, 64 double precision units, 32 special function units (SFU), and 32 load/store units
                                                                                                                                                                        (LD/ST).




turn can lead to very large response times. A larger response time implies poorer performance in terms of both throughput and latency.
• Continuous transmission of data from the UAV regarding the evidence grid needs to be encrypted fast.
• FPGAs and GPUs can be used in gateways to speed up the TEA/XTEA encryption and decryption of bulk information for improved throughput and latency.

[Figure: Zynq internal block diagram, showing the processing system (dual ARM Cortex-A9 MPCore at 800 MHz, fixed peripherals, memory interfaces), the AXI interconnect, and the programmable logic.]
[Figure: Hardware-in-the-loop setup; panels: Running on Zynq board, Running in ISIM.]

compared to CPUs.
• FPGAs perform better for smaller plaintext sizes, whereas GPUs are better for larger plaintext sizes.
• In terms of development time and cost, GPUs are better suited than FPGAs as embedded cryptography co-processors.
• Future research efforts may address the use of the Zynq platform as a complete, low-cost cryptographic co-processor for more complex cryptographic algorithms.
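The conclusions rest on throughput figures reported in Mbps for plaintext sizes from 8 KB to 1 GB. As a rough illustration of how such a figure can be obtained, the sketch below times a processing function over a buffer and converts the result to megabits per second. It is not the measurement harness used for the poster: the helper names are hypothetical, 1 Mbps is assumed to mean 10^6 bits per second, and the XOR workload is only a stand-in for a real TEA or XTEA call.

```python
import time


def throughput_mbps(num_bytes, seconds):
    """Convert a byte count and elapsed time to megabits per second.

    Assumption: 1 Mbps = 10**6 bits per second.
    """
    if seconds <= 0:
        raise ValueError("elapsed time must be positive")
    return (num_bytes * 8) / (seconds * 1e6)


def measure_mbps(process, data, repeats=3):
    """Run process(data) several times and report the best observed throughput."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        process(data)
        best = min(best, time.perf_counter() - start)
    # Guard against a zero reading on very coarse timers.
    return throughput_mbps(len(data), max(best, 1e-9))


if __name__ == "__main__":
    payload = bytes(8 * 1024)  # 8 KB, the smallest plaintext size on the plots

    def stand_in_cipher(buf):
        # Placeholder workload, not a cipher: XOR every byte with a constant.
        return bytes(b ^ 0x5A for b in buf)

    print(f"{measure_mbps(stand_in_cipher, payload):.1f} Mbps")
```

For small buffers, fixed costs such as copying data to and from an accelerator make up a large share of the elapsed time, which is one plausible reason for the crossover between FPGAs and GPUs noted in the conclusion.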



