Hardware acceleration of TEA and XTEA algorithms on FPGA, GPU and multi-core processors

Presented at FPGA2013 http://fpganetworks.org/FPGA2013/

Abstract: Field programmable gate arrays (FPGAs) are extensively used for rapid prototyping in embedded system applications. While hardware acceleration can be done via specialized processors like a Graphics Processing Unit (GPU), it can also be accomplished with FPGAs for more specialized scenarios. GPUs essentially consist of massively parallel cores and have high memory bandwidth; FPGAs, on the other hand, provide flexibility in terms of customizable I/O and computational resources. In this paper, we explore the use of GPUs and FPGAs as cryptographic co-processors in streaming dataflow systems with very high data ingestion rates. Two classic lightweight encryption algorithms, Tiny Encryption Algorithm (TEA) and Extended Tiny Encryption Algorithm (XTEA), are targeted for implementation on GPUs and FPGAs. The GPU implementations of TEA and XTEA in this study show a maximum speedup of 13x over the CPU-based implementation. The pipelined FPGA implementation achieves 6-9x higher throughput than the GPU for small plaintext sizes.

  1. Hardware Acceleration of TEA and XTEA Algorithms on FPGA, GPU and Multi-Core Processors
     Vivek Venugopal and Devu Manikantan Shila, {venugov, manikad}@utrc.utc.com

Introduction

  • In smart grids, sensitive information such as power consumption, price updates, or outage awareness is exchanged between the meters and the power utility company in real time over the Internet.
  • Unmanned Autonomous Vehicles (UAVs) continuously exchange dynamic information regarding the urban environment with a gateway. The gateway also provides feedback regarding the optimization parameters that need to be fed into the UAV's path planning algorithm for mapping different routes to reach its destination safely.
  • Cyber attacks on such critical and dynamic information can lead to severe losses of resources and finance.

[Figure: Smart meter application, with a gateway to the Internet built on FPGA + ARM (Xilinx Zynq)]
[Figure: Unmanned Autonomous Vehicle, with encrypted communication between the Planning Computer, built on GPU + ARM (NVIDIA CARMA), and the Flight Control and Navigation Computer]

Motivation

  • All the information from/to these smart meters needs to be decrypted/encrypted at the gateway, which in turn can lead to very large response times. A larger response time implies poorer performance in terms of both throughput and latency.
  • Continuous transmission of data from the UAV regarding the evidence grid needs to be encrypted fast.
  • FPGAs and GPUs can be used in gateways to speed up the TEA/XTEA encryption and decryption of bulk information for improved throughput and latency.

Tiny Encryption Algorithm (TEA)

  • TEA uses addition, XOR and shift operations on 32-bit words and has a very small code footprint.
  • TEA has security holes and weaknesses for smaller rounds, especially the Avalanche Effect seen for 6 rounds.

[Figure: TEA encrypt/decrypt datapath in two half rounds. 32-bit inputs v0 and v1; keys k0, k1 (half round 1) and k2, k3 (half round 2); operations << 4, >> 5, +, XOR with sum; final +/- stage selects encrypt/decrypt; outputs v0_new and v1_new]

Extended Tiny Encryption Algorithm (XTEA)

  • The Extended Tiny Encryption Algorithm (XTEA) was introduced after weaknesses for smaller rounds were found in TEA.
  • In XTEA, the key scheduling is modified to reflect different patterns for mixing the data and key continuously per round.

[Figure: XTEA encrypt/decrypt datapath in two half rounds. 32-bit inputs v0 and v1; keys kx and ky; sums sum0 and sum1; operations << 4, >> 5, +, XOR; final +/- stage selects encrypt/decrypt; outputs v0_new and v1_new]

Implementation platforms and Results

  • Nvidia's Tesla C2070 high-end GPU, 2 hexa-core Intel Xeon processors, Nvidia's GeForce GT 650M notebook GPU consisting of 384 cores, quad-core Intel Core i7 CPU.
  • Xilinx's Zynq-7000 SoC ZC702 evaluation board. The Zynq-7000 platform consists of a dual ARM Cortex-A9 processor clocked at 800 MHz and an Artix-7 FPGA as the programmable logic.

[Figure: Throughput (Mbps) comparison of TEA. x-axis: plaintext size (8 KB, 16 KB, 8 MB, 128 MB, 1 GB); y-axis: throughput in Mbps (0 to 8000); series: Intel Xeon X5650, Nvidia C2070, Intel Quad core i7, Nvidia GT650M, Zynq]
[Figure: Throughput (Mbps) comparison of XTEA. Same axes and series as the TEA plot]

GPU Implementation

  • Copy input data and keys to GPU memory.
  • Pre-compute sum values for each round and store in shared memory.
  • Calculate ciphers for blocks in parallel.
  • Copy ciphers back to CPU.

GT650M: 2 SMX with 192 cores each.
Inside SMX: 192 single precision CUDA cores, 64 double precision units, 32 special function units (SFU), and 32 load/store units (LD/ST).

[Figure: Streaming Multiprocessor (SMX) Architecture. Caption: "Kepler GK110's new SMX introduces several architectural innovations that make it not only the most powerful multiprocessor we've built, but also the most programmable and power efficient."]

[Figure: Zynq internal block diagram. Processing System: dual ARM Cortex-A9 MPCore (800 MHz), fixed peripherals (GigE, USB, CAN, I2C, SD, UART, GPIO), memory interfaces (Flash, DRAM, SRAM), JTAG; AXI Interconnect; Programmable Logic: custom peripherals, custom displays, custom memory, PCIe, SelectIO resources, 2x 12-bit MSPS ADC, analog monitors]
[Figure: Hardware in Loop setup. Panels: Running on Zynq board; Running in ISIM]

Conclusion

  • GPUs and FPGAs provide better throughput for both TEA and XTEA as compared to CPUs.
  • FPGAs perform better for smaller plaintext sizes whereas GPUs are better for larger plaintext sizes.
  • In terms of development time and cost, GPUs are better suited as embedded cryptography co-processors as compared to FPGAs.
  • Future research efforts may address the use of the Zynq platform as a complete, low-cost cryptographic co-processor for more complex cryptographic algorithms.

References

[1] D. J. Wheeler and R. M. Needham. TEA, a tiny encryption algorithm, 1995.
[2] D. J. Wheeler and R. M. Needham. TEA extensions. Technical report, Cambridge University, England, October 1997.
[3] Xilinx Inc. Xilinx Zynq-7000 SoC ZC702 Evaluation kit.
[4] Nvidia Inc. Nvidia Tesla C2070 GPU Computing Processor, Nvidia GeForce GT650M Notebook GPU [Available Online] (Last Accessed: February 2012).
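
The shift, add and XOR structure of TEA and XTEA described in the poster, and the GPU step of pre-computing the per-round sum values, can be illustrated with a short Python sketch. This is not the authors' CUDA or FPGA code. It follows the commonly published reference forms of the two ciphers, and it assumes 32 rounds and the usual constant 0x9E3779B9, neither of which is stated in the poster. All function names are hypothetical, and the plain list returned by `xtea_sums` merely stands in for the table the poster keeps in GPU shared memory.

```python
MASK = 0xFFFFFFFF     # keep every intermediate result within 32 bits
DELTA = 0x9E3779B9    # assumed round constant (standard for TEA/XTEA)


def _tea_mix(v, s, ka, kb):
    # One TEA half round: shifts, additions and XOR on a 32-bit word.
    return ((((v << 4) & MASK) + ka) ^ (v + s) ^ ((v >> 5) + kb)) & MASK


def tea_encrypt(block, key, rounds=32):
    v0, v1 = block
    s = 0
    for _ in range(rounds):
        s = (s + DELTA) & MASK
        v0 = (v0 + _tea_mix(v1, s, key[0], key[1])) & MASK  # half round 1
        v1 = (v1 + _tea_mix(v0, s, key[2], key[3])) & MASK  # half round 2
    return v0, v1


def tea_decrypt(block, key, rounds=32):
    v0, v1 = block
    s = (DELTA * rounds) & MASK
    for _ in range(rounds):
        v1 = (v1 - _tea_mix(v0, s, key[2], key[3])) & MASK
        v0 = (v0 - _tea_mix(v1, s, key[0], key[1])) & MASK
        s = (s - DELTA) & MASK
    return v0, v1


def xtea_sums(rounds=32):
    # (sum0, sum1) for each round. They depend only on the round index,
    # so they can be computed once and shared by all blocks.
    return [((i * DELTA) & MASK, ((i + 1) * DELTA) & MASK)
            for i in range(rounds)]


def _xtea_mix(v, s, k):
    # One XTEA half round; the key word k is chosen by the caller from sum.
    return ((((v << 4) ^ (v >> 5)) + v) ^ (s + k)) & MASK


def xtea_encrypt(block, key, sums):
    v0, v1 = block
    for s0, s1 in sums:
        v0 = (v0 + _xtea_mix(v1, s0, key[s0 & 3])) & MASK
        v1 = (v1 + _xtea_mix(v0, s1, key[(s1 >> 11) & 3])) & MASK
    return v0, v1


def xtea_decrypt(block, key, sums):
    v0, v1 = block
    for s0, s1 in reversed(sums):
        v1 = (v1 - _xtea_mix(v0, s1, key[(s1 >> 11) & 3])) & MASK
        v0 = (v0 - _xtea_mix(v1, s0, key[s0 & 3])) & MASK
    return v0, v1


if __name__ == "__main__":
    key = (0x01234567, 0x89ABCDEF, 0xFEDCBA98, 0x76543210)  # example key
    block = (0xDEADBEEF, 0x00C0FFEE)                        # one 64-bit block
    sums = xtea_sums()
    print([hex(w) for w in tea_encrypt(block, key)])
    print([hex(w) for w in xtea_encrypt(block, key, sums)])
```

Because each 64-bit block is processed independently of the others in this form, the same per-block function can be applied to many blocks at once, which is the property the "calculate ciphers for blocks in parallel" step relies on.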