This paper is devoted to the design of dual core crypto processor for executing both Prime field and binary field instructions. The proposed design is specifically optimized for Field programmable gate array (FPGA) platform. Combination of two different field (prime field GF(p) and Binary field GF(2m)) instructions execution is analysed.The design is implemented in Spartan 3E and virtex5. Both the performance results are compared. The implementation result shows the execution of parallelism using dual field instructions
FPGA Implementation of SubByte & Inverse SubByte for AES Algorithmijsrd.com
Advanced encryption standard was accepted as a Federal Information Processing Standard (FIPS) standard. In traditional look up table (LUT) approaches, the unbreakable delay is longer than the total delay of the rest of operations in each round. LUT approach consumes a large area. It is more efficient to apply composite field arithmetic in the SubBytes transformation of the AES algorithm. It not only reduces the complexity but also enables deep sub pipelining such that higher speed can be achieved. Isomorphic mapping can be employed to convert GF(28) to GF(22)2)2) ,so that multiplicative inverse can be easily obtained. SubBytes and InvSubBytes transformations are merged using composite field arithmetic. It is most important responsible for the implementation of low cost and high throughput AES architecture. As compared to the typical ROM based lookup table, the presented implementation is both capable of higher speeds since it can be pipelined and small in terms of area occupancy (137/1290 slices on a Spartan III XCS200-5FPGA).
High performance pipelined architecture of elliptic curve scalar multiplicati...Ieee Xpert
High performance pipelined architecture of elliptic curve scalar multiplication over gf(2m) High performance pipelined architecture of elliptic curve scalar multiplication over gf(2m) High performance pipelined architecture of elliptic curve scalar multiplication over gf(2m) High performance pipelined architecture of elliptic curve scalar multiplication over gf(2m)
Graph based transistor network generation method for supergate designIeee Xpert
Graph based transistor network generation method for supergate design Graph based transistor network generation method for supergate design Graph based transistor network generation method for supergate design Graph based transistor network generation method for supergate design
Flexible dsp accelerator architecture exploiting carry save arithmeticIeee Xpert
This document proposes a novel flexible accelerator architecture comprising computational units (FCUs) that can efficiently perform DSP operations using carry-save arithmetic. Each FCU operates directly on carry-save operands and can be configured to perform templates of common DSP operations like multiplication and addition/subtraction. By keeping operands in carry-save format throughout the FCU, intermediate conversions are avoided, improving performance compared to prior approaches. The proposed architecture aims to achieve high computational density while reducing area and power compared to existing inflexible accelerator designs.
High performance nb-ldpc decoder with reduction of message exchange Ieee Xpert
High performance nb-ldpc decoder with reduction of message exchange High performance nb-ldpc decoder with reduction of message exchange High performance nb-ldpc decoder with reduction of message exchange High performance nb-ldpc decoder with reduction of message exchange
A high performance fir filter architecture for fixed and reconfigurable appli...Ieee Xpert
A high performance fir filter architecture for fixed and reconfigurable applications A high performance fir filter architecture for fixed and reconfigurable applications A high performance fir filter architecture for fixed and reconfigurable applications A high performance fir filter architecture for fixed and reconfigurable applications
International Journal of Engineering Research and Applications (IJERA) is an open access online peer reviewed international journal that publishes research and review articles in the fields of Computer Science, Neural Networks, Electrical Engineering, Software Engineering, Information Technology, Mechanical Engineering, Chemical Engineering, Plastic Engineering, Food Technology, Textile Engineering, Nano Technology & science, Power Electronics, Electronics & Communication Engineering, Computational mathematics, Image processing, Civil Engineering, Structural Engineering, Environmental Engineering, VLSI Testing & Low Power VLSI Design etc.
Many Machine Learning inference workloads compute predictions based on a limited number of models that are deployed together in the system. These models often share common structure and state. This scenario provides large rooms for optimizations of runtime and memory, which current systems fall short in exploring because they employ a black-box model of ML models and tasks, thus being unaware of optimization and sharing opportunities.
On the opposite side, Pretzel adopts a white-box description of ML models, which allows the framework to perform optimizations over deployed models and running tasks, saving memory and increasing the overall system performance. In this talk we will show the motivations behind Pretzel, its current design and possible future developments.
FPGA Implementation of SubByte & Inverse SubByte for AES Algorithmijsrd.com
Advanced encryption standard was accepted as a Federal Information Processing Standard (FIPS) standard. In traditional look up table (LUT) approaches, the unbreakable delay is longer than the total delay of the rest of operations in each round. LUT approach consumes a large area. It is more efficient to apply composite field arithmetic in the SubBytes transformation of the AES algorithm. It not only reduces the complexity but also enables deep sub pipelining such that higher speed can be achieved. Isomorphic mapping can be employed to convert GF(28) to GF(22)2)2) ,so that multiplicative inverse can be easily obtained. SubBytes and InvSubBytes transformations are merged using composite field arithmetic. It is most important responsible for the implementation of low cost and high throughput AES architecture. As compared to the typical ROM based lookup table, the presented implementation is both capable of higher speeds since it can be pipelined and small in terms of area occupancy (137/1290 slices on a Spartan III XCS200-5FPGA).
High performance pipelined architecture of elliptic curve scalar multiplicati...Ieee Xpert
High performance pipelined architecture of elliptic curve scalar multiplication over gf(2m) High performance pipelined architecture of elliptic curve scalar multiplication over gf(2m) High performance pipelined architecture of elliptic curve scalar multiplication over gf(2m) High performance pipelined architecture of elliptic curve scalar multiplication over gf(2m)
Graph based transistor network generation method for supergate designIeee Xpert
Graph based transistor network generation method for supergate design Graph based transistor network generation method for supergate design Graph based transistor network generation method for supergate design Graph based transistor network generation method for supergate design
Flexible dsp accelerator architecture exploiting carry save arithmeticIeee Xpert
This document proposes a novel flexible accelerator architecture comprising computational units (FCUs) that can efficiently perform DSP operations using carry-save arithmetic. Each FCU operates directly on carry-save operands and can be configured to perform templates of common DSP operations like multiplication and addition/subtraction. By keeping operands in carry-save format throughout the FCU, intermediate conversions are avoided, improving performance compared to prior approaches. The proposed architecture aims to achieve high computational density while reducing area and power compared to existing inflexible accelerator designs.
High performance nb-ldpc decoder with reduction of message exchange Ieee Xpert
High performance nb-ldpc decoder with reduction of message exchange High performance nb-ldpc decoder with reduction of message exchange High performance nb-ldpc decoder with reduction of message exchange High performance nb-ldpc decoder with reduction of message exchange
A high performance fir filter architecture for fixed and reconfigurable appli...Ieee Xpert
A high performance fir filter architecture for fixed and reconfigurable applications A high performance fir filter architecture for fixed and reconfigurable applications A high performance fir filter architecture for fixed and reconfigurable applications A high performance fir filter architecture for fixed and reconfigurable applications
International Journal of Engineering Research and Applications (IJERA) is an open access online peer reviewed international journal that publishes research and review articles in the fields of Computer Science, Neural Networks, Electrical Engineering, Software Engineering, Information Technology, Mechanical Engineering, Chemical Engineering, Plastic Engineering, Food Technology, Textile Engineering, Nano Technology & science, Power Electronics, Electronics & Communication Engineering, Computational mathematics, Image processing, Civil Engineering, Structural Engineering, Environmental Engineering, VLSI Testing & Low Power VLSI Design etc.
Many Machine Learning inference workloads compute predictions based on a limited number of models that are deployed together in the system. These models often share common structure and state. This scenario provides large rooms for optimizations of runtime and memory, which current systems fall short in exploring because they employ a black-box model of ML models and tasks, thus being unaware of optimization and sharing opportunities.
On the opposite side, Pretzel adopts a white-box description of ML models, which allows the framework to perform optimizations over deployed models and running tasks, saving memory and increasing the overall system performance. In this talk we will show the motivations behind Pretzel, its current design and possible future developments.
This document describes a low power pipelined FFT processor architecture based on the Radix-4 single delay commutator (R4SDC) algorithm. It implements and compares 16, 64, and 256-point FFT architectures using conventional R4SDC, a complex multiplier, and a multiplier-less architecture based on common subexpression technique. For the 16-point FFT, an ordered R4SDC architecture is proposed that reorders coefficients and inputs to minimize switching activity and reduce power consumption compared to the conventional design. Simulation results show the area and power requirements of the different architectural implementations.
Iaetsd finger print recognition by cordic algorithm and pipelined fftIaetsd Iaetsd
This document proposes an efficient CORDIC pipelined FFT algorithm for fingerprint recognition on FPGAs. The CORDIC algorithm uses only shift and add operations, making it suitable for replacing multipliers in the butterfly operations of an FFT. This reduces computational complexity. The proposed system takes a fingerprint, processes it with the CORDIC pipelined FFT, extracts features which are stored and then matched against a test fingerprint for recognition. The algorithm aims to provide an efficient hardware implementation of FFT and fingerprint recognition using minimal computations.
An Efficient High Speed Design of 16-Bit Sparse-Tree RSFQ AdderIJERA Editor
In this paper, we propse 16-bit sparse tree RSFQ adder (Rapid single flux quantam), kogge-stone adder, carry lookahead adder. In general N-bit adders like Ripple carry adder s(slow adders compare to other adders), and carry lookahead adders(area consuming adders) are used in earlier days. But now the most of industries are using parallel prefix adders because of their advantages compare to kogge-stone adder, carry lookahead adder, Our prefix sparse tree adders are faster and area efficient. Parallel prefix adder is a technique for increasing the speed in DSP processor while performing addition. We simulate and synthesis different types of 16-bit sparse tree RSFQ adders using Xilinx ISE10.1i tool, By using these synthesis results, We noted the performance parameters like number of LUT’s and delay. We compare these three adders interms of LUT’s represents area) and delay values.
Research of 64-bits RISC Dual-core Microprocessor with High Performance and L...TELKOMNIKA JOURNAL
A 64-bits RISC Dual-Core microprocessor with high performance and low power consumption is
presented in this paper. The processor has a symmetric architecture with two cores. Each of them has
three stage pipeline, 64-bit data-path and 64-bit address port. A novel shared register module, redundant
Booth3 algorithm and leapfrog Wallace tree architecture are introduced to the microprocessor, and both
the performance and power consumption of it has been improved enormously. As the FPGA simulation
result indicates, the power consumption is decreased by 14% and the longest data-path is shortened by
25%.
Iaetsd pipelined parallel fft architecture through folding transformationIaetsd Iaetsd
This document presents a new VLSI architecture for a real-time pipeline FFT processor using fused floating point operations. It proposes high radix floating point butterflies implemented with two fused operations: a two-term dot product and add-subtract unit. Both discrete and fused radix processors are compared in terms of area. Higher throughput is achieved using a proposed architecture with conflict-free memory access and a new addressing scheme for radix-16 FFT.
CFA based SBOX and Modified Mixcolumn Implementation of 8 Bit Datapath for AESidescitation
Secure data transmission is very important in any communication systems.
Network Security provides many techniques for efficient data transmission through
unprotected network. Cryptography provides a method for securing the transmission of
information by the process of encryption. Encryption converts the message in to unreadable
form (Cipher Text) . Decryption converts this Cipher Text back to original message.
Advanced Encryption Standard (AES) has been used as the first choice of cryptographic
algorithm for many security based applications because of the high level of security and
flexibility of implementation in hardware and software. This paper presents an area
efficient, low power design for AES based on an 8-bit data path making it suitable for
wireless security applications. It has a significant power-area-latency performance
improvements over other existing AES designs. For high performance applications, AES S-
box and inverse S-box implemented using composite field Arithmetic (CFA). Also low
resource Mixcolumn structure is used in this structure. The 8 bit data path architecture is
implemented in XILINX 13.2 and simulated using MODELSIM 6.5 software. Also the
power and area calculation is done with the help of SYNOPSYS software.
A Novel VLSI Architecture for FFT Utilizing Proposed 4:2 & 7:2 CompressorIJERD Editor
With the appearance of new innovation in the fields of VLSI and correspondence, there is likewise a perpetually developing interest for fast transforming and low range outline. It is likewise a remarkable certainty that the multiplier unit structures a fundamental piece of processor configuration. Because of this respect, rapid multiplier architectures turn into the need of the day. In this paper, we acquaint a novel structural engineering with perform high velocity duplication utilizing old Vedic math's strategies. Another fast approach using 4:2 compressors and novel 7:2 compressors for expansion has additionally been joined in the same and has been investigated. Upon examination, the compressor based multiplier present in this paper, is just about two times quicker than the mainstream routines for augmentation. Likewise we outline a FFT utilizing compressor based multiplier. This all configuration and examinations were done on a Xilinx Spartan 3e arrangement of FPGA and the timing and zone of the outline, on the same have been ascertained.
High Speed and Time Efficient 1-D DWT on Xilinx Virtex4 DWT Using 9/7 Filter ...IOSR Journals
This document describes a new efficient distributed arithmetic (NEDA) technique for implementing a high speed 1-D discrete wavelet transform (DWT) using a 9/7 filter on a Xilinx Virtex4 FPGA. The key aspects of the NEDA technique are that it uses adders as the main component and does not require multipliers, subtraction, or ROM. Simulation results show the proposed NEDA architecture requires fewer logic resources and has a shorter maximum path delay compared to existing distributed arithmetic techniques.
Design and implementation of Parallel Prefix Adders using FPGAsIOSR Journals
Abstract: Adders are known to have the frequently used in VLSI designs. In digital design we have half adder and full adder, main adders by using these adders we can implement ripple carry adders. RCA use to perform any number of addition. In this RCA is serial adder and it has commutation delay problem. If increase the ha&fa simultaneously delay also increase. That’s why we go for parallel adders(parallel prefix adders). IN the parallel prefix adder are ks adder(kogge-stone),sks adder(sparse kogge-stone),spaning tree and brentkung adder. These adders are designd and implemented on FPGA sparton3E kit. Simulated and synthesis by model sim6.4b, Xilinx ise10.1.
Design and Estimation of delay, power and area for Parallel prefix addersIJERA Editor
The document describes a study comparing the delay, power, and area of different types of parallel prefix adders implemented on a Xilinx Spartan 6 FPGA, including Kogge Stone, Han Carlson, and Brent Kung adders. It discusses the structures and computation methods of parallel prefix adders and traditional ripple carry and carry lookahead adders. The adders were designed in Verilog HDL and synthesized on the FPGA, then their delay, power usage, and resource usage were measured and compared. Simulation results showing the logic diagrams and outputs of sample adders like the ripple carry and Kogge Stone adder are also presented.
Iaetsd design and implementation of pseudo random number generatorIaetsd Iaetsd
This document proposes a new pseudo random number generator (PRNG) design called the reseeding-mixing PRNG (RM-PRNG) that improves on existing PRNGs. The RM-PRNG consists of a chaos-based PRNG and a long-period multiple recursive generator. The reseeding mechanism removes short periods from the chaos-based PRNG, while mixing it with the multiple recursive generator extends the overall system period length. This allows the RM-PRNG to achieve a high throughput rate of over 6.4 Gb/s. Simulation results show the RM-PRNG passes statistical tests and generates unpredictable random numbers suitable for cryptography applications. The design has advantages of high throughput, low hardware cost,
This document presents a new technique for designing an area efficient FFT/IFFT processor for WPAN applications using a Radix-25 algorithm and Wallace tree multiplier. The proposed design reduces the size of the twiddle factor memory and number of complex multiplications compared to previous designs. Simulation results show the proposed design occupies less area (14% of total logic elements) and has lower hardware complexity than prior approaches.
A High Throughput CFA AES S-Box with Error Correction CapabilityIOSR Journals
The document describes a proposed method for implementing a fault tolerant Advanced Encryption Standard (AES) using a Hamming error correction code. AES operates by performing rounds of transformations on blocks of data, with the most complex step being the SubBytes transformation which involves calculating multiplicative inverses in GF(28). The proposed method uses composite field arithmetic to more efficiently calculate these inverses. It also applies a (12,8) Hamming error correction code to each byte before and after processing to detect and correct single bit errors caused by radiation events, improving reliability for satellite communications. The parity check bits for the Hamming code are precalculated and stored for the AES S-box lookup tables.
SOLUTION MANUAL OF COMPUTER ORGANIZATION BY CARL HAMACHER, ZVONKO VRANESIC & ...vtunotesbysree
1) The document provides solutions to problems from Chapter 1 and Chapter 2 of a computer organization textbook.
2) In Chapter 1, it discusses basic computer structure, performance improvement through overlapping operations, and performance comparisons between RISC and CISC processors. In Chapter 2, it covers machine instructions, binary representations of numbers, assembly language programming, and addressing modes.
3) Some of the problems solved include calculating non-overlapped and overlapped execution times, comparing RISC and CISC processors under different clock rates, implementing addition and subtraction in binary, writing assembly code to calculate a dot product, and designing programs that use indexed addressing modes.
This document summarizes research on improving the performance of multiplier and accumulator (MAC) circuits used in digital signal processing. It presents four architectures for carry-select adders (CSLA) that can be used in MACs: 1) a regular CSLA, 2) a CSLA that replaces full adders with binary-to-excess converters (BEC) to reduce area, 3) a CSLA that uses D-latches to store intermediate values and reduce the number of adders, and 4) a modified CSLA architecture. The document analyzes the delay and area of each group of bits for the different CSLA architectures. It finds that BEC and D-latch based C
Iaetsd fpga implementation of cordic algorithm for pipelined fft realization andIaetsd Iaetsd
This document discusses using the CORDIC algorithm to implement a pipelined FFT for fingerprint recognition on an FPGA. It proposes a hardware-efficient CORDIC FFT architecture that minimizes computational complexity. The CORDIC algorithm replaces complex multipliers with shift-add operations, providing a simpler hardware implementation than traditional multiplier-based FFT designs. The architecture includes a butterfly structure implemented with CORDIC, an angle generator for twiddle factors, and input/output blocks with registers and multiplexers to enable pipelined processing.
Implementation and Comparison of Efficient 16-Bit SQRT CSLA Using Parity Pres...IJERA Editor
In Very Large Scale Integration (VLSI) outlines, Carry Select Adder (CSLA) is one of the quickest adder utilized as a part of numerous data processing processors to perform quick number crunching capacities. In this paper we proposed the design of SQRT CSLA using parity preserving reversible gate (P2RG). Reversible logic is emerging field in today VLSI design. In conventional circuits, the logic gates such as AND gate, OR gate is irreversible in nature and computing with irreversible logic results in energy dissipation. This problem can be circumvented by using reversible logic. In ideal condition, the reversible logic gate produces zero power dissipation. The proposed design is efficient in terms of delay as compare to irreversible SQRT CSLA. The simulation is done using Xilinx.
The 4-bit CPU was designed to execute 18 distinct operations from a lookup table. It was tested by executing an instruction set that performed operations like load, store, add, and jump instructions. The accumulator and program counter displays were observed after each instruction to verify the correct operations. Various components like a RAM, ALU, program counter, and decoders were used in the design. The group members demonstrated different parts of the instruction set to verify the CPU was functioning properly.
This paper introduces two architectures for modulo 2n+1 adders. The first architecture is based on a sparse carry computation unit that computes only some carries, enabled by a new inverted circular idempotency property. This sparse approach reduces area and power compared to prior proposals while maintaining speed. The second architecture unifies the design of modulo 2n+1 and 2n-1 adders by showing 2n+1 addition can be treated as a subcase of 2n-1 addition with minor extra logic.
ESPM2 2018 - Automatic Generation of High-Order Finite-Difference Code with T...Hideyuki Tanaka
This document summarizes research on optimizing an explicit finite-difference scheme for fluid dynamics simulations to achieve high performance on many-core systems like the PEZY-SC2 processor. The researchers developed a code generation framework that uses temporal blocking to optimize for low memory bandwidth. On a PEZY-SC2 system with 16 million cores, they achieved 4.78 PFlops and 21.5% efficiency, comparable to other works on higher bandwidth machines. Temporal blocking reduced the required memory bandwidth and allowed good weak scaling to larger core counts.
Field programmable gate array implementation of multiwavelet transform based...IJECEIAES
This article offers an efficient design and implementation of a discrete multiwavelet critical-sampling transform based orthogonal frequency division multiplexing (DMWCST-OFDM) transceiver using field programmable gate array (FPGA) platform. The design uses 16-point discrete multiwavelet critical-sampling transform (DMWCST) and its inverse as main processing modules. All modules were designed using a part of Vivado® Design Suite version (2015.2), which is Xilinx system generator (XSG), and is compatible with MATLAB Simulink version R2013b. The FPGA implementation is carried out on a Zynq (XC7Z020-1CLG484) evaluation board with joint test action group (JTAG) hardware cosimulation. According to the results obtained from the implementation tools, the implemented system is efficient in terms of resource utilization and could support the real-time operations.
Iaetsd multioperand redundant adders on fpg asIaetsd Iaetsd
This paper presents efficient implementations of redundant multi-operand adders on FPGAs. Previous work avoided redundant adders on FPGAs due to the efficient carry propagate adders (CPAs) and area overhead of redundant adders. The paper proposes carry-save compressor tree approaches that achieve fast critical paths independent of bit width with little to no area overhead compared to CPA trees. It presents a classic carry-save compressor tree and a novel linear array structure that efficiently uses fast carry chains. Compared to binary and ternary CPA trees, the approaches achieve speedups of up to 3.81 times for 64-bit width additions.
This document describes a low power pipelined FFT processor architecture based on the Radix-4 single delay commutator (R4SDC) algorithm. It implements and compares 16, 64, and 256-point FFT architectures using conventional R4SDC, a complex multiplier, and a multiplier-less architecture based on common subexpression technique. For the 16-point FFT, an ordered R4SDC architecture is proposed that reorders coefficients and inputs to minimize switching activity and reduce power consumption compared to the conventional design. Simulation results show the area and power requirements of the different architectural implementations.
Iaetsd finger print recognition by cordic algorithm and pipelined fftIaetsd Iaetsd
This document proposes an efficient CORDIC pipelined FFT algorithm for fingerprint recognition on FPGAs. The CORDIC algorithm uses only shift and add operations, making it suitable for replacing multipliers in the butterfly operations of an FFT. This reduces computational complexity. The proposed system takes a fingerprint, processes it with the CORDIC pipelined FFT, extracts features which are stored and then matched against a test fingerprint for recognition. The algorithm aims to provide an efficient hardware implementation of FFT and fingerprint recognition using minimal computations.
An Efficient High Speed Design of 16-Bit Sparse-Tree RSFQ AdderIJERA Editor
In this paper, we propse 16-bit sparse tree RSFQ adder (Rapid single flux quantam), kogge-stone adder, carry lookahead adder. In general N-bit adders like Ripple carry adder s(slow adders compare to other adders), and carry lookahead adders(area consuming adders) are used in earlier days. But now the most of industries are using parallel prefix adders because of their advantages compare to kogge-stone adder, carry lookahead adder, Our prefix sparse tree adders are faster and area efficient. Parallel prefix adder is a technique for increasing the speed in DSP processor while performing addition. We simulate and synthesis different types of 16-bit sparse tree RSFQ adders using Xilinx ISE10.1i tool, By using these synthesis results, We noted the performance parameters like number of LUT’s and delay. We compare these three adders interms of LUT’s represents area) and delay values.
Research of 64-bits RISC Dual-core Microprocessor with High Performance and L...TELKOMNIKA JOURNAL
A 64-bits RISC Dual-Core microprocessor with high performance and low power consumption is
presented in this paper. The processor has a symmetric architecture with two cores. Each of them has
three stage pipeline, 64-bit data-path and 64-bit address port. A novel shared register module, redundant
Booth3 algorithm and leapfrog Wallace tree architecture are introduced to the microprocessor, and both
the performance and power consumption of it has been improved enormously. As the FPGA simulation
result indicates, the power consumption is decreased by 14% and the longest data-path is shortened by
25%.
Iaetsd pipelined parallel fft architecture through folding transformationIaetsd Iaetsd
This document presents a new VLSI architecture for a real-time pipeline FFT processor using fused floating point operations. It proposes high radix floating point butterflies implemented with two fused operations: a two-term dot product and add-subtract unit. Both discrete and fused radix processors are compared in terms of area. Higher throughput is achieved using a proposed architecture with conflict-free memory access and a new addressing scheme for radix-16 FFT.
CFA based SBOX and Modified Mixcolumn Implementation of 8 Bit Datapath for AESidescitation
Secure data transmission is very important in any communication systems.
Network Security provides many techniques for efficient data transmission through
unprotected network. Cryptography provides a method for securing the transmission of
information by the process of encryption. Encryption converts the message in to unreadable
form (Cipher Text) . Decryption converts this Cipher Text back to original message.
Advanced Encryption Standard (AES) has been used as the first choice of cryptographic
algorithm for many security based applications because of the high level of security and
flexibility of implementation in hardware and software. This paper presents an area
efficient, low power design for AES based on an 8-bit data path making it suitable for
wireless security applications. It has a significant power-area-latency performance
improvements over other existing AES designs. For high performance applications, AES S-
box and inverse S-box implemented using composite field Arithmetic (CFA). Also low
resource Mixcolumn structure is used in this structure. The 8 bit data path architecture is
implemented in XILINX 13.2 and simulated using MODELSIM 6.5 software. Also the
power and area calculation is done with the help of SYNOPSYS software.
A Novel VLSI Architecture for FFT Utilizing Proposed 4:2 & 7:2 CompressorIJERD Editor
With the appearance of new innovation in the fields of VLSI and correspondence, there is likewise a perpetually developing interest for fast transforming and low range outline. It is likewise a remarkable certainty that the multiplier unit structures a fundamental piece of processor configuration. Because of this respect, rapid multiplier architectures turn into the need of the day. In this paper, we acquaint a novel structural engineering with perform high velocity duplication utilizing old Vedic math's strategies. Another fast approach using 4:2 compressors and novel 7:2 compressors for expansion has additionally been joined in the same and has been investigated. Upon examination, the compressor based multiplier present in this paper, is just about two times quicker than the mainstream routines for augmentation. Likewise we outline a FFT utilizing compressor based multiplier. This all configuration and examinations were done on a Xilinx Spartan 3e arrangement of FPGA and the timing and zone of the outline, on the same have been ascertained.
High Speed and Time Efficient 1-D DWT on Xilinx Virtex4 DWT Using 9/7 Filter ...IOSR Journals
This document describes a new efficient distributed arithmetic (NEDA) technique for implementing a high speed 1-D discrete wavelet transform (DWT) using a 9/7 filter on a Xilinx Virtex4 FPGA. The key aspects of the NEDA technique are that it uses adders as the main component and does not require multipliers, subtraction, or ROM. Simulation results show the proposed NEDA architecture requires fewer logic resources and has a shorter maximum path delay compared to existing distributed arithmetic techniques.
Design and implementation of Parallel Prefix Adders using FPGAsIOSR Journals
Abstract: Adders are known to have the frequently used in VLSI designs. In digital design we have half adder and full adder, main adders by using these adders we can implement ripple carry adders. RCA use to perform any number of addition. In this RCA is serial adder and it has commutation delay problem. If increase the ha&fa simultaneously delay also increase. That’s why we go for parallel adders(parallel prefix adders). IN the parallel prefix adder are ks adder(kogge-stone),sks adder(sparse kogge-stone),spaning tree and brentkung adder. These adders are designd and implemented on FPGA sparton3E kit. Simulated and synthesis by model sim6.4b, Xilinx ise10.1.
Design and Estimation of delay, power and area for Parallel prefix addersIJERA Editor
The document describes a study comparing the delay, power, and area of different types of parallel prefix adders implemented on a Xilinx Spartan 6 FPGA, including Kogge Stone, Han Carlson, and Brent Kung adders. It discusses the structures and computation methods of parallel prefix adders and traditional ripple carry and carry lookahead adders. The adders were designed in Verilog HDL and synthesized on the FPGA, then their delay, power usage, and resource usage were measured and compared. Simulation results showing the logic diagrams and outputs of sample adders like the ripple carry and Kogge Stone adder are also presented.
Iaetsd design and implementation of pseudo random number generatorIaetsd Iaetsd
This document proposes a new pseudo random number generator (PRNG) design called the reseeding-mixing PRNG (RM-PRNG) that improves on existing PRNGs. The RM-PRNG consists of a chaos-based PRNG and a long-period multiple recursive generator. The reseeding mechanism removes short periods from the chaos-based PRNG, while mixing it with the multiple recursive generator extends the overall system period length. This allows the RM-PRNG to achieve a high throughput rate of over 6.4 Gb/s. Simulation results show the RM-PRNG passes statistical tests and generates unpredictable random numbers suitable for cryptography applications. The design has advantages of high throughput, low hardware cost,
This document presents a new technique for designing an area efficient FFT/IFFT processor for WPAN applications using a Radix-25 algorithm and Wallace tree multiplier. The proposed design reduces the size of the twiddle factor memory and number of complex multiplications compared to previous designs. Simulation results show the proposed design occupies less area (14% of total logic elements) and has lower hardware complexity than prior approaches.
A High Throughput CFA AES S-Box with Error Correction CapabilityIOSR Journals
The document describes a proposed method for implementing a fault tolerant Advanced Encryption Standard (AES) using a Hamming error correction code. AES operates by performing rounds of transformations on blocks of data, with the most complex step being the SubBytes transformation which involves calculating multiplicative inverses in GF(28). The proposed method uses composite field arithmetic to more efficiently calculate these inverses. It also applies a (12,8) Hamming error correction code to each byte before and after processing to detect and correct single bit errors caused by radiation events, improving reliability for satellite communications. The parity check bits for the Hamming code are precalculated and stored for the AES S-box lookup tables.
SOLUTION MANUAL OF COMPUTER ORGANIZATION BY CARL HAMACHER, ZVONKO VRANESIC & ...vtunotesbysree
1) The document provides solutions to problems from Chapter 1 and Chapter 2 of a computer organization textbook.
2) In Chapter 1, it discusses basic computer structure, performance improvement through overlapping operations, and performance comparisons between RISC and CISC processors. In Chapter 2, it covers machine instructions, binary representations of numbers, assembly language programming, and addressing modes.
3) Some of the problems solved include calculating non-overlapped and overlapped execution times, comparing RISC and CISC processors under different clock rates, implementing addition and subtraction in binary, writing assembly code to calculate a dot product, and designing programs that use indexed addressing modes.
This document summarizes research on improving the performance of multiplier and accumulator (MAC) circuits used in digital signal processing. It presents four architectures for carry-select adders (CSLA) that can be used in MACs: 1) a regular CSLA, 2) a CSLA that replaces full adders with binary-to-excess converters (BEC) to reduce area, 3) a CSLA that uses D-latches to store intermediate values and reduce the number of adders, and 4) a modified CSLA architecture. The document analyzes the delay and area of each group of bits for the different CSLA architectures. It finds that BEC and D-latch based C
Iaetsd fpga implementation of cordic algorithm for pipelined fft realization andIaetsd Iaetsd
This document discusses using the CORDIC algorithm to implement a pipelined FFT for fingerprint recognition on an FPGA. It proposes a hardware-efficient CORDIC FFT architecture that minimizes computational complexity. The CORDIC algorithm replaces complex multipliers with shift-add operations, providing a simpler hardware implementation than traditional multiplier-based FFT designs. The architecture includes a butterfly structure implemented with CORDIC, an angle generator for twiddle factors, and input/output blocks with registers and multiplexers to enable pipelined processing.
Implementation and Comparison of Efficient 16-Bit SQRT CSLA Using Parity Pres...IJERA Editor
In Very Large Scale Integration (VLSI) outlines, Carry Select Adder (CSLA) is one of the quickest adder utilized as a part of numerous data processing processors to perform quick number crunching capacities. In this paper we proposed the design of SQRT CSLA using parity preserving reversible gate (P2RG). Reversible logic is emerging field in today VLSI design. In conventional circuits, the logic gates such as AND gate, OR gate is irreversible in nature and computing with irreversible logic results in energy dissipation. This problem can be circumvented by using reversible logic. In ideal condition, the reversible logic gate produces zero power dissipation. The proposed design is efficient in terms of delay as compare to irreversible SQRT CSLA. The simulation is done using Xilinx.
The 4-bit CPU was designed to execute 18 distinct operations from a lookup table. It was tested by executing an instruction set that performed operations like load, store, add, and jump instructions. The accumulator and program counter displays were observed after each instruction to verify the correct operations. Various components like a RAM, ALU, program counter, and decoders were used in the design. The group members demonstrated different parts of the instruction set to verify the CPU was functioning properly.
This paper introduces two architectures for modulo 2n+1 adders. The first architecture is based on a sparse carry computation unit that computes only some carries, enabled by a new inverted circular idempotency property. This sparse approach reduces area and power compared to prior proposals while maintaining speed. The second architecture unifies the design of modulo 2n+1 and 2n-1 adders by showing 2n+1 addition can be treated as a subcase of 2n-1 addition with minor extra logic.
ESPM2 2018 - Automatic Generation of High-Order Finite-Difference Code with T...Hideyuki Tanaka
This document summarizes research on optimizing an explicit finite-difference scheme for fluid dynamics simulations to achieve high performance on many-core systems like the PEZY-SC2 processor. The researchers developed a code generation framework that uses temporal blocking to optimize for low memory bandwidth. On a PEZY-SC2 system with 16 million cores, they achieved 4.78 PFlops and 21.5% efficiency, comparable to other works on higher bandwidth machines. Temporal blocking reduced the required memory bandwidth and allowed good weak scaling to larger core counts.
Field programmable gate array implementation of multiwavelet transform based...IJECEIAES
This article offers an efficient design and implementation of a discrete multiwavelet critical-sampling transform based orthogonal frequency division multiplexing (DMWCST-OFDM) transceiver using field programmable gate array (FPGA) platform. The design uses 16-point discrete multiwavelet critical-sampling transform (DMWCST) and its inverse as main processing modules. All modules were designed using a part of Vivado® Design Suite version (2015.2), which is Xilinx system generator (XSG), and is compatible with MATLAB Simulink version R2013b. The FPGA implementation is carried out on a Zynq (XC7Z020-1CLG484) evaluation board with joint test action group (JTAG) hardware cosimulation. According to the results obtained from the implementation tools, the implemented system is efficient in terms of resource utilization and could support the real-time operations.
Iaetsd multioperand redundant adders on fpg asIaetsd Iaetsd
This paper presents efficient implementations of redundant multi-operand adders on FPGAs. Previous work avoided redundant adders on FPGAs due to the efficient carry propagate adders (CPAs) and area overhead of redundant adders. The paper proposes carry-save compressor tree approaches that achieve fast critical paths independent of bit width with little to no area overhead compared to CPA trees. It presents a classic carry-save compressor tree and a novel linear array structure that efficiently uses fast carry chains. Compared to binary and ternary CPA trees, the approaches achieve speedups of up to 3.81 times for 64-bit width additions.
EFFICIENT ABSOLUTE DIFFERENCE CIRCUIT FOR SAD COMPUTATION ON FPGAVLSICS Design
Video Compression is very essential to meet the technological demands such as low power, less memory
and fast transfer rate for different range of devices and for various multimedia applications. Video
compression is primarily achieved by Motion Estimation (ME) process in any video encoder which
contributes to significant compression gain.Sum of Absolute Difference (SAD) is used as distortion metric
in ME process.In this paper, efficient Absolute Difference(AD)circuit is proposed which uses Brent Kung
Adder(BKA) and a comparator based on modified 1’s complement principle and conditional sum adder
scheme. Results shows that proposed architecture reduces delay by 15% and number of slice LUTs by 42
% as compared to conventional architecture. Simulation and synthesis are done on Xilinx ISE 14.2 using
Virtex 7 FPGA.
EFFICIENT ABSOLUTE DIFFERENCE CIRCUIT FOR SAD COMPUTATION ON FPGAVLSICS Design
Video Compression is very essential to meet the technological demands such as low power, less memory and fast transfer rate for different range of devices and for various multimedia applications. Video compression is primarily achieved by Motion Estimation (ME) process in any video encoder which contributes to significant compression gain.Sum of Absolute Difference (SAD) is used as distortion metric in ME process.In this paper, efficient Absolute Difference(AD)circuit is proposed which uses Brent Kung Adder(BKA) and a comparator based on modified 1’s complement principle and conditional sum adder scheme. Results shows that proposed architecture reduces delay by 15% and number of slice LUTs by 42 % as compared to conventional architecture. Simulation and synthesis are done on Xilinx ISE 14.2 using Virtex 7 FPGA.
Designing and Characterization of koggestone, Sparse Kogge stone, Spanning tr...IJMER
This document summarizes research on implementing various parallel prefix adders (Kogge-Stone, sparse Kogge-Stone, spanning tree, Brent-Kung) on FPGAs. It describes the structures of the different parallel prefix adders and discusses related work comparing adder performance on FPGAs. The adders were designed and tested on a Spartan 3E FPGA using VHDL simulation and Xilinx synthesis tools. Simulation waveforms and device utilization results are presented for the different adders. Measurement data from a logic analyzer was also used to compare actual performance to simulation results. In general, the research found that parallel prefix adders do not necessarily outperform ripple carry adders on FPGAs
High speed customized serial protocol for IP integration on FPGA based SOC ap...IJMER
International Journal of Modern Engineering Research (IJMER) is Peer reviewed, online Journal. It serves as an international archival forum of scholarly research related to engineering and science education.
International Journal of Modern Engineering Research (IJMER) covers all the fields of engineering and science: Electrical Engineering, Mechanical Engineering, Civil Engineering, Chemical Engineering, Computer Engineering, Agricultural Engineering, Aerospace Engineering, Thermodynamics, Structural Engineering, Control Engineering, Robotics, Mechatronics, Fluid Mechanics, Nanotechnology, Simulators, Web-based Learning, Remote Laboratories, Engineering Design Methods, Education Research, Students' Satisfaction and Motivation, Global Projects, and Assessment…. And many more.
Braun’s Multiplier Implementation using FPGA with Bypassing Techniques.VLSICS Design
The developing an Application Specific Integrated Circuits (ASICs) will cost very high, the circuits should be proved and then it would be optimized before implementation. Multiplication which is the basic building block for several DSP processors, Image processing and many other. The Braun multipliers can easily be implemented using Field Programmable Gate Array (FPGA) devices. This research presented the comparative study of Spartan-3E, Virtex-4, Virtex-5 and Virtex-6 Low Power FPGA devices. The implementation of Braun multipliers and its bypassing techniques is done using Verilog HDL. We are proposing that adder block which we implemented our design (fast addition) and we compared the results of that so that our proposed method is effective when compare to the conventional design. There is the reduction in the resources like delay LUTs, number of slices used. Results are showed and it is verified using the Spartan-3E, Virtex-4 and Virtex-5 devices. The Virtex-5 FPGA has shown the good performance as compared to Spartan-3E and Virtex-4 FPGA devices.
High Speed and Area Efficient Matrix Multiplication using Radix-4 Booth Multi...IRJET Journal
This document describes several designs for high-speed matrix multiplication. It proposes a new design called Parallel-Parallel Input Multi-Output (PPI-MO) that uses multiple multipliers and registers to perform matrix multiplication in parallel. This increases throughput. For an n×n matrix multiplication, it uses n2 multipliers, n2 registers, and n(n-2) adders. Elements of the first matrix are input to rows of multipliers simultaneously, while elements of the second matrix are input to columns. The partial products from each column are added to calculate elements of the output matrix.
EFFICIENT ABSOLUTE DIFFERENCE CIRCUIT FOR SAD COMPUTATION ON FPGAVLSICS Design
Video Compression is very essential to meet the technological demands such as low power, less memory and fast transfer rate for different range of devices and for various multimedia applications. Video compression is primarily achieved by Motion Estimation (ME) process in any video encoder which contributes to significant compression gain.Sum of Absolute Difference (SAD) is used as distortion metric in ME process.In this paper, efficient Absolute Difference(AD)circuit is proposed which uses Brent Kung Adder(BKA) and a comparator based on modified 1’s complement principle and conditional sum adder scheme. Results shows that proposed architecture reduces delay by 15% and number of slice LUTs by 42% as compared to conventional architecture. Simulation and synthesis are done on Xilinx ISE 14.2 using Virtex 7 FPGA.
The document describes a modified radix-24 single-data-path discrete Fourier transform (DFT) algorithm and architecture for implementing a 128-point inverse fast Fourier transform (IFFT) module for use in FPGA-based multiband orthogonal frequency division multiplexing (MB-OFDM) ultra-wideband (UWB) systems. The key aspects of the proposed design include:
1) A modified radix-24 single-data-flow (SDF) DFT algorithm that reorders the twiddle factor sequence to enable easier implementation of complex multiplication compared to earlier designs.
2) A single-data-path pipelined architecture that achieves the required 528 MHz processing speed for the U
International Journal of Engineering Inventions (IJEI) provides a multidisciplinary passage for researchers, managers, professionals, practitioners and students around the globe to publish high quality, peer-reviewed articles on all theoretical and empirical aspects of Engineering and Science.
The document describes the design and implementation of a digital circuit to calculate the greatest common divisor (GCD) of two 16-bit unsigned integers using the Euclidean algorithm. The design was implemented on a Xilinx Spartan-6 FPGA using different techniques, including behavioral models with loops, finite state machines, and structural models. Evaluation results showed that a structural model utilizing Spartan-6 primitives and macros achieved the minimum area-delay product compared to other architectures.
International Journal of Computational Engineering Research(IJCER)ijceronline
International Journal of Computational Engineering Research(IJCER) is an intentional online Journal in English monthly publishing journal. This Journal publish original research work that contributes significantly to further the scientific knowledge in engineering and Technology.
Design of a Novel Multiplier and Accumulator using Modified Booth Algorithm w...IRJET Journal
The document describes a novel design for a multiplier and accumulator (MAC) unit using the modified Booth algorithm and parallel self-timed adder (PASTA). The modified Booth algorithm reduces the number of partial products compared to a regular multiplication process, lowering delay. A carry save adder design is also proposed to further improve performance in terms of computation speed, power consumption, and area compared to a conventional design using the modified Booth algorithm. Simulation results show the proposed MAC design with PASTA has better performance and reduced area overhead and critical path delay compared to conventional methods.
International Journal of Computational Engineering Research(IJCER)ijceronline
This document summarizes a research paper on designing a high-speed arithmetic architecture for a parallel multiplier-accumulator based on the radix-2 modified Booth algorithm. It presents the design of a modified Booth multiplier using a carry lookahead adder for high accuracy with a fixed width. It also proposes a high-speed MAC architecture that improves performance by eliminating the accumulator and modifying the carry-save adder to add lower bits in advance, reducing the number of inputs to the final adder. The proposed CSA architecture can efficiently implement the operations of the new MAC arithmetic.
International Journal of Computational Engineering Research(IJCER)ijceronline
This document summarizes a research paper on designing a high-speed arithmetic architecture for a parallel multiplier-accumulator based on the radix-2 modified Booth algorithm. It presents the design of a modified Booth multiplier using a carry lookahead adder for high accuracy with a fixed width. It also proposes a high-speed MAC architecture that improves performance by eliminating the accumulator and modifying the carry-save adder to add lower bits in advance, reducing the number of inputs to the final adder. The proposed CSA architecture can efficiently implement the operations of the new MAC arithmetic.
IRJET - High Speed Inexact Speculative Adder using Carry Look Ahead Adder...IRJET Journal
This document presents a novel architecture for a high-speed inexact speculative adder using carry lookahead adder and Brent Kung adder. The proposed adder splits the critical path into shorter paths using fine-grain pipelining to improve speed. It uses a carry lookahead adder and Brent Kung adder in 4-bit blocks to speculate the carry and calculate sums, with error compensation. The design is implemented using a 5-stage pipeline to further reduce delay. Implementation in MAGIC shows the proposed adder achieves higher speed through optimized pipelining of key components in the speculative and compensation blocks.
COUPLED FPGA/ASIC IMPLEMENTATION OF ELLIPTIC CURVE CRYPTO-PROCESSORIJNSA Journal
In this paper, we propose an elliptic curve key generation processor over GF(2163) scheme based on the Montgomery scalar multiplication algorithm. The new architecture is performed using polynomial basis. The Finite Field operations use a cellular automata multiplier and Fermat algorithm for inversion. For real time implementation, the architecture has been tested on an ISE 9.1 Software using Xilinx Virtex II Pro FPGA and on an ASIC CMOS 45 nm technology as well. The proposed implementation provides a time of 2.07 ms and 38 percent of Slices in Xilinx Virtex II Pro FPGA. Such features reveal the high efficiently of this implementation design.
Design and Implementation A different Architectures of mixcolumn in FPGAVLSICS Design
This paper details Implementation of the Encryption algorithm AES under VHDL language In FPGA by using different architecture of mixcolumn. We then review this research investigates the AES algorithm in FPGA and the Very High Speed Integrated Circuit Hardware Description language (VHDL). Altera Quartus II software is used for simulation and optimization of the synthesizable VHDL code. The set of transformations of both Encryptions and decryption are simulated using an iterative design approach in order to optimize the hardware consumption. Altera Cyclone III Family devices are utilized for hardware evaluation.
Similar to DUAL FIELD DUAL CORE SECURE CRYPTOPROCESSOR ON FPGA PLATFORM (20)
Rainfall intensity duration frequency curve statistical analysis and modeling...bijceesjournal
Using data from 41 years in Patna’ India’ the study’s goal is to analyze the trends of how often it rains on a weekly, seasonal, and annual basis (1981−2020). First, utilizing the intensity-duration-frequency (IDF) curve and the relationship by statistically analyzing rainfall’ the historical rainfall data set for Patna’ India’ during a 41 year period (1981−2020), was evaluated for its quality. Changes in the hydrologic cycle as a result of increased greenhouse gas emissions are expected to induce variations in the intensity, length, and frequency of precipitation events. One strategy to lessen vulnerability is to quantify probable changes and adapt to them. Techniques such as log-normal, normal, and Gumbel are used (EV-I). Distributions were created with durations of 1, 2, 3, 6, and 24 h and return times of 2, 5, 10, 25, and 100 years. There were also mathematical correlations discovered between rainfall and recurrence interval.
Findings: Based on findings, the Gumbel approach produced the highest intensity values, whereas the other approaches produced values that were close to each other. The data indicates that 461.9 mm of rain fell during the monsoon season’s 301st week. However, it was found that the 29th week had the greatest average rainfall, 92.6 mm. With 952.6 mm on average, the monsoon season saw the highest rainfall. Calculations revealed that the yearly rainfall averaged 1171.1 mm. Using Weibull’s method, the study was subsequently expanded to examine rainfall distribution at different recurrence intervals of 2, 5, 10, and 25 years. Rainfall and recurrence interval mathematical correlations were also developed. Further regression analysis revealed that short wave irrigation, wind direction, wind speed, pressure, relative humidity, and temperature all had a substantial influence on rainfall.
Originality and value: The results of the rainfall IDF curves can provide useful information to policymakers in making appropriate decisions in managing and minimizing floods in the study area.
Introduction- e - waste – definition - sources of e-waste– hazardous substances in e-waste - effects of e-waste on environment and human health- need for e-waste management– e-waste handling rules - waste minimization techniques for managing e-waste – recycling of e-waste - disposal treatment methods of e- waste – mechanism of extraction of precious metal from leaching solution-global Scenario of E-waste – E-waste in India- case studies.
Electric vehicle and photovoltaic advanced roles in enhancing the financial p...IJECEIAES
Climate change's impact on the planet forced the United Nations and governments to promote green energies and electric transportation. The deployments of photovoltaic (PV) and electric vehicle (EV) systems gained stronger momentum due to their numerous advantages over fossil fuel types. The advantages go beyond sustainability to reach financial support and stability. The work in this paper introduces the hybrid system between PV and EV to support industrial and commercial plants. This paper covers the theoretical framework of the proposed hybrid system including the required equation to complete the cost analysis when PV and EV are present. In addition, the proposed design diagram which sets the priorities and requirements of the system is presented. The proposed approach allows setup to advance their power stability, especially during power outages. The presented information supports researchers and plant owners to complete the necessary analysis while promoting the deployment of clean energy. The result of a case study that represents a dairy milk farmer supports the theoretical works and highlights its advanced benefits to existing plants. The short return on investment of the proposed approach supports the paper's novelty approach for the sustainable electrical system. In addition, the proposed system allows for an isolated power setup without the need for a transmission line which enhances the safety of the electrical network
Prediction of Electrical Energy Efficiency Using Information on Consumer's Ac...PriyankaKilaniya
Energy efficiency has been important since the latter part of the last century. The main object of this survey is to determine the energy efficiency knowledge among consumers. Two separate districts in Bangladesh are selected to conduct the survey on households and showrooms about the energy and seller also. The survey uses the data to find some regression equations from which it is easy to predict energy efficiency knowledge. The data is analyzed and calculated based on five important criteria. The initial target was to find some factors that help predict a person's energy efficiency knowledge. From the survey, it is found that the energy efficiency awareness among the people of our country is very low. Relationships between household energy use behaviors are estimated using a unique dataset of about 40 households and 20 showrooms in Bangladesh's Chapainawabganj and Bagerhat districts. Knowledge of energy consumption and energy efficiency technology options is found to be associated with household use of energy conservation practices. Household characteristics also influence household energy use behavior. Younger household cohorts are more likely to adopt energy-efficient technologies and energy conservation practices and place primary importance on energy saving for environmental reasons. Education also influences attitudes toward energy conservation in Bangladesh. Low-education households indicate they primarily save electricity for the environment while high-education households indicate they are motivated by environmental concerns.
Advanced control scheme of doubly fed induction generator for wind turbine us...IJECEIAES
This paper describes a speed control device for generating electrical energy on an electricity network based on the doubly fed induction generator (DFIG) used for wind power conversion systems. At first, a double-fed induction generator model was constructed. A control law is formulated to govern the flow of energy between the stator of a DFIG and the energy network using three types of controllers: proportional integral (PI), sliding mode controller (SMC) and second order sliding mode controller (SOSMC). Their different results in terms of power reference tracking, reaction to unexpected speed fluctuations, sensitivity to perturbations, and resilience against machine parameter alterations are compared. MATLAB/Simulink was used to conduct the simulations for the preceding study. Multiple simulations have shown very satisfying results, and the investigations demonstrate the efficacy and power-enhancing capabilities of the suggested control system.
Embedded machine learning-based road conditions and driving behavior monitoringIJECEIAES
Car accident rates have increased in recent years, resulting in losses in human lives, properties, and other financial costs. An embedded machine learning-based system is developed to address this critical issue. The system can monitor road conditions, detect driving patterns, and identify aggressive driving behaviors. The system is based on neural networks trained on a comprehensive dataset of driving events, driving styles, and road conditions. The system effectively detects potential risks and helps mitigate the frequency and impact of accidents. The primary goal is to ensure the safety of drivers and vehicles. Collecting data involved gathering information on three key road events: normal street and normal drive, speed bumps, circular yellow speed bumps, and three aggressive driving actions: sudden start, sudden stop, and sudden entry. The gathered data is processed and analyzed using a machine learning system designed for limited power and memory devices. The developed system resulted in 91.9% accuracy, 93.6% precision, and 92% recall. The achieved inference time on an Arduino Nano 33 BLE Sense with a 32-bit CPU running at 64 MHz is 34 ms and requires 2.6 kB peak RAM and 139.9 kB program flash memory, making it suitable for resource-constrained embedded systems.
Generative AI Use cases applications solutions and implementation.pdfmahaffeycheryld
Generative AI solutions encompass a range of capabilities from content creation to complex problem-solving across industries. Implementing generative AI involves identifying specific business needs, developing tailored AI models using techniques like GANs and VAEs, and integrating these models into existing workflows. Data quality and continuous model refinement are crucial for effective implementation. Businesses must also consider ethical implications and ensure transparency in AI decision-making. Generative AI's implementation aims to enhance efficiency, creativity, and innovation by leveraging autonomous generation and sophisticated learning algorithms to meet diverse business challenges.
https://www.leewayhertz.com/generative-ai-use-cases-and-applications/
Use PyCharm for remote debugging of WSL on a Windo cf5c162d672e4e58b4dde5d797...shadow0702a
This document serves as a comprehensive step-by-step guide on how to effectively use PyCharm for remote debugging of the Windows Subsystem for Linux (WSL) on a local Windows machine. It meticulously outlines several critical steps in the process, starting with the crucial task of enabling permissions, followed by the installation and configuration of WSL.
The guide then proceeds to explain how to set up the SSH service within the WSL environment, an integral part of the process. Alongside this, it also provides detailed instructions on how to modify the inbound rules of the Windows firewall to facilitate the process, ensuring that there are no connectivity issues that could potentially hinder the debugging process.
The document further emphasizes on the importance of checking the connection between the Windows and WSL environments, providing instructions on how to ensure that the connection is optimal and ready for remote debugging.
It also offers an in-depth guide on how to configure the WSL interpreter and files within the PyCharm environment. This is essential for ensuring that the debugging process is set up correctly and that the program can be run effectively within the WSL terminal.
Additionally, the document provides guidance on how to set up breakpoints for debugging, a fundamental aspect of the debugging process which allows the developer to stop the execution of their code at certain points and inspect their program at those stages.
Finally, the document concludes by providing a link to a reference blog. This blog offers additional information and guidance on configuring the remote Python interpreter in PyCharm, providing the reader with a well-rounded understanding of the process.
Optimizing Gradle Builds - Gradle DPE Tour Berlin 2024Sinan KOZAK
Sinan from the Delivery Hero mobile infrastructure engineering team shares a deep dive into performance acceleration with Gradle build cache optimizations. Sinan shares their journey into solving complex build-cache problems that affect Gradle builds. By understanding the challenges and solutions found in our journey, we aim to demonstrate the possibilities for faster builds. The case study reveals how overlapping outputs and cache misconfigurations led to significant increases in build times, especially as the project scaled up with numerous modules using Paparazzi tests. The journey from diagnosing to defeating cache issues offers invaluable lessons on maintaining cache integrity without sacrificing functionality.
VARIABLE FREQUENCY DRIVE. VFDs are widely used in industrial applications for...PIMR BHOPAL
Variable frequency drive .A Variable Frequency Drive (VFD) is an electronic device used to control the speed and torque of an electric motor by varying the frequency and voltage of its power supply. VFDs are widely used in industrial applications for motor control, providing significant energy savings and precise motor operation.
Mechanical Engineering on AAI Summer Training Report-003.pdf
DUAL FIELD DUAL CORE SECURE CRYPTOPROCESSOR ON FPGA PLATFORM
1. International Journal of VLSI design & Communication Systems (VLSICS) Vol.4, No.1, February 2013
DOI : 10.5121/vlsic.2013.4105 51
DUAL FIELD DUAL CORE SECURE
CRYPTOPROCESSOR ON FPGA PLATFORM
C Veeraraghavan1
and K Rajendran2
1
Department of Electronics and communication,
Sri Krishna Arts and Science College, Coimbatore,Tamilnadu, India
veeraa@mail.com
2
Department of Electronics,
Government Arts College for women, Ramanathapuram, Tamilnadu, India
krishrazen@yahoo.co.in
ABSTRACT
This paper is devoted to the design of dual core crypto processor for executing both Prime field and binary
field instructions. The proposed design is specifically optimized for Field programmable gate array
(FPGA) platform. Combination of two different field (prime field GF(p) and Binary field GF(2m
))
instructions execution is analysed.The design is implemented in Spartan 3E and virtex5. Both the
performance results are compared. The implementation result shows the execution of parallelism using
dual field instructions
KEYWORDS
Crypto processor, FPGA, Prime field, Binary field.
1. INTRODUCTION
The term cryptography represents the encryption of data. For the secured data communication
cryptography is essential one. With rapid increases in communication and network applications,
cryptography has become a crucial issue to ensure the security of transmitted data. Cryptography
is used now a days in a variety of different applications. Every application has its own design
criteria and raises special requirements for hardware designs. While contactless powered
devices have to meet low-power constraints, battery-powered devices need energy-aware
implementations that consume as little energy as possible to increase the life-time of the battery.
The most fundamental decision concerning future hardware designs is whether to use a binary-
extension field or a prime field as basis of the used crypto processor.
A secure crypto processor is a dedicated microprocessor chip for carrying out cryptographic
operation. For increasing the speed of execution this paper is designing the dual core crypto
processor. At the same time for decreasing the power consumption this design is implemented in
the FPGA.
2. International Journal of VLSI design & Communication Systems (VLSICS) Vol.4, No.1, February 2013
52
This paper is concentrate to design a dual crypto processor used to execute both prime and binary
field instructions.
2. RELATED WORK
Only a few papers companied binary and prime fields in hardware. Dual field arithmetic unit for
GF(p) and GF(2m
) is designed by Johannes Wolkerstorfer[3]. This paper design the
multiplication, squaring, addition, subtraction and inversion in both fields GF(p) and GF(2m
)).
J. Goodman et al. presented in [13] a VLSI implementation of a dual-field arithmetic unit. Their
so-called Domain-Specific Reconfigurable Cryptographic Processor (DSRCP) is not a mere
multiplier for GF(p) and GF(2m GF(2m
). It can calculate all operations required for elliptic curve
cryptography including inversion and comparisons. These operations are executed on an
extensive datapath which is controlled by a microcoded control unit. Main components of the
datapath are a carry-propagate adder for operation in GF(p) that takes three cycles for an addition
and a reconfigurable datapath for operation in GF(2m
). An additional comparator allows
comparisons of integers or polynomials. The Montgomery algorithm is used for multiplication in
GF(p). Multiplication in GF(2m
) obeys an iterative MSB-first scheme with interleaved modular
reduction.
3. IMPLEMENTING PRIME FIELD AND BINARY FIELD IN FPGA.
In 1983, Blakley introduced an algorithm to perform Modular multiplication of two integers A
and B modulo an integer M[4]. It is an iterative binary double-and-add algorithm. The main idea
of the algorithm is that it keeps the intermediate result after each iteration below the modulus
value, which it avoids final division. In this paper, the modulus M corresponds to p and we say it
Fp multiplication. All arithmetic in Fp are performed in two’s complement number system, which
avoids input and output conversions like existing implementations [11], [12].
3.1. Fast Carry Chains for Fp –Primitives
The main difficulty of the Blakley algorithm is the computation of addition on large operands.
The modified Blakley algorithm for large operands is shown in [5] and [6]. The use of carry save
adder (CSA) helps to speed up the repeated additions on large operands. However these modified
versions require at least one final addition on large carry chain. Some pre-computed values too
are used by this technique which requires additional time and storage area.
This work exploits the features available in an FPGA device for efficient computation of Blakley
algorithm on large operands. The specific features that are available in an FPGA device are
efficiently utilized for developing arithmetic primitives in F2
m
fields in [7]. However, this paper
looks after the same for Fp. The modern FPGA consists of 16 slices (or 32 LUTs) within a single
row which is connected through an in-built fast carry chain (FCC). The FCC can perform addition
on two 32-bit operands most efficiently compared to any other adder.Structures [8] . It is
experimentally shown that on a Virtex-4 FPGA device the latency of a 32-bit addition using a fast
carry chain takes only 5.8 ns, whereas the same using a carry look ahead structure takes 8.7 ns.
Hence, fast carry chain is 1.5 times faster than carry look ahead structure for computing addition
3. International Journal of VLSI design & Communication Systems (VLSICS) Vol.4, No.1, February 2013
53
of two 32-bit operands on an FPGA platform. In order to compute an addition of two operands
longer than 32 bits, the FPGA will utilize more than one row which requires additional routing
delay. For example, the addition (A+B) of two 64-bit operands (A, B) using a single 64-bit carry
chain is slower than the same using three 32-bit FCC and a 2:1 multiplexer [8].
We develop an efficient 256-bit adder useful to Fp256–arithmetic using 32-bit fast carry chains.
The repeated Karatsuba decomposition is applied on 256-bit operands. An operand is
decomposed up to a depth of three for converting it into eight pieces of 32-bit operands. A 64-bit
addition is performed by using three 32-bit fast carry chains with a carry select structure.
Let, A = A1232
+ A0, B1232
+ B0 and C=A+B, where Ai, Bi are 32-bit integers. We compute A0 + B0,
A1 + B1 +0,and A1 + B1 +1 in parallel on three FCC. Then the carry out of the least significant
addition (A0 + B0) is used to multiplex the results of the most significant additions. Thus the
latency of a 64-bit adder is 1 FCC + 1 MUX, where MUX corresponds to a 2:1 multiplexer.
Similarly, a 128-bit adder is developed by three 64-bit adders, and a 256-bit adder is developed
by three 128-bit adders. Therefore, a 256-bit adder is developed from 32 bit adders.
Fig1. Architecture of Fp adder/subtractor/multiplier unit
4. International Journal of VLSI design & Communication Systems (VLSICS) Vol.4, No.1, February 2013
54
Fig. 2. Internal Structure of a 256 bit adder
3.2. Architecture Description for prime field
Our first objective for designing such an integrated architecture is to reduce the overall hardware
costs for computing three essential prime field operations in pairing computation. The
architecture consists of several independent blocks which operate in parallel for accelerating the
execution of respective operations. The whole architecture is subdivided into four macro-
blocks(A1,A2,A3,A4) and seven micro-blocks (B1,B2,B3,B4,B5,B6,B7). The macro-blocks are
used to compute the arithmetic operations, whereas, micro-blocks are primarily responsible for
dataflow among the macro-blocks, the registers, and the I/O ports. The functionality of the
individual blocks are described here.
• Macro-blocks A1,A2& A4 are 256-bit adders based on our proposed technique is shown
in Fig1.
• Block A3 performs 2µ for an integer µ£Fp . This is done by simply one bit left shift
having only rewiring and no additional logic cells.
• Micro-block B1 consists of one 2:1multiplexer that selects either v1 or w1 based on the
most significant bits (or carry outs) of 2µ and 2µ -p operations. Therefore, this block
completes the 2µ mod p operation.
• Block B2 selects either s1 or s2 as the input to the A3.
• Blocks B3,B4 and B5 help to compute Fp —addition and subtraction in A1 and A2. The
control signal c0 holds zero for addition and one for subtraction. Thus, if c0 =1 then
block B5 selects -s2 else it selects s2 . Similarly, if c0 =1 then block B4 selects p else it
selects -p. Block B3 completes the operation by selecting the correct result. In case of Fp -
subtraction (i.e., c0 =1), it selects either v2 or w2 based on the most significant bit (MSB)
of only, whereas, for Fp -addition it does the same based on the MSB of both v2 and w2.
• Blocks B6 and B7 multiplex t1 (the output of 2µ mod p) and t2 (the output of s1±s2 mod
p ) as the new value of s1 and s2 and registers, respectively.
At every level of hierarchy it adds one
additional MUX in the critical path. Thus
the latency of a 256-bit adder is 1 FCC + 3
MUX delay, which is 9.9 ns on a Virtex-4
FPGA, whereas the latency of a 256-bit
carry look ahead adder on the same
platform is 16.7 ns, which is 1.7 times
slower than the above technique. we
develop a programmable Fp-primitive
based on above 256-bit high-speed adder
circuits. Prime field operations carry over
are addition, subtraction, and
multiplication. Fig. 1 depicts the overall
resulting architecture of the proposed Fp -
adder/subtractor/multiplier unit, where the
internal dataflow of A256 blocks are shown
in Fig. 2.
5. International Journal of VLSI design & Communication Systems (VLSICS) Vol.4, No.1, February 2013
55
3.3 Binary field execution unit for addition
Fig.3. Binary Field Addition Unit Block Diagram
The binary field addition is designed by using the XOR gate operation.
4. INSTRUCTIONS IMPLEMENTATION FOR PRIME FIELD AND
BINARY FIELD
4.1.Instruction 1- Computation of Fp Multiplication
Proposed Fp –primitive follows the parallelism technique of Montgomery ladder [2] for
computing Blakley multiplication algorithm in Fp [4].
Basic Blakley technique for computing c=ab mod p with input parameters a and b is as follows
c←0
For i from 0 to k-1 do
c←2c+a.bk-1-i
c←c mod p
return c.
The choice of this algorithm is due to its lower hardware cost and intrinsic adaptability to
Montgomery ladder for parallelism. We rewrite it, in Algorithm 2 with parenthesized indices in
superscript in order to emphasize the intrinsic dependency as well as parallelism of the
multiplication procedure.
Algorithm 1: The interleaved multiplication based on Montgomery ladder.
Input: p, a= ∑ 2୧
ܽ݅ିଵ
ୀ and, b
= ∑ 2୧
ܾ݅ିଵ
ୀ .
Output:a.b. mod p
1. s1
(n)
←0; ← (n)
←a
2. For i=n - 1 down to 0 do
3. if bi=1 then u(i)
←s2
(i+1)
; else u(i)
← s1
(i+1)
;
4. υ1
(i)
← 2υ (i)
5. υ2
(i)
←s1
(i+1)
+s2
(i+1)
;
6. ω1
(i)
←υ1
(i)
+(p¬)+1;
7. ω2
(i)
←υ2
(i)
+(p¬)+1;;
8.c1
(i)
←(υ1
(2)
) n (ω1
(i)
) n ;
9. c2
(i)
←(υ2
(2)
) n (ω2
(i)
) n ;
10. if c1
(i)
=1 then t1
(i)
← ω1
(i)
; else t1
(i)
← υ1
(i)
;
11 .if c2
(i)
=1 then t2
(i)
← ω2
(i)
; else t2
(i)
← υ2
(i)
;
6. International Journal of VLSI design & Communication Systems (VLSICS) Vol.4, No.1, February 2013
56
12.if b1
(i)
=1 then s1
(i)
← t 1
(i)
; else s1
(i)
← t1
(i)
;
13.if b2
(i)
=1 then s2
(i)
←t 2
(i)
; else s2
(i)
← t2
(i)
;
14. end for
15. return s1
(0)
;
[In this algorithm,x(i)
represents the value of x at ith iteration,(x)n indicates the nth
bit of x , and
indicates logical OR.]
The algorithm computes two intermediate results (s1
(i)
and s2
(i)
) in each iteration. The data
transfer inside the architecture (see Fig. 1) for computing (a.b) mod p is as follows.
• The register s1 and s2 hold the iterative results s1
(i)
and s2
(i)
of Algorithm 2, which are
initialized by zero and a, respectively, as specified in step 1.
• Iterative execution starts from i = n – 1 and goes down to zero as shown in step 2. This
step is executed by a 8-bit counter, which belongs to the control part of the proposed
design and it is not shown in Fig. 1.
• Block B2 of Fig. 1 executes step 3. The modular doubling (as computed by executing the
steps 4, 6, 8, and 10) and the modular addition (as computed by executing the steps 5, 7,
9, and 11) are performed in parallel. In Fig. 1, steps 4 and 6 are performed in blocks A3
and A4, respectively, whereas, both the steps 8 and 10 are performed in block B1.
Similarly, steps 5 and 7 are performed in blocks A1 and A2, respectively, whereas, both
the steps 9 and 11 are performed in block B3. During the execution of Fp -multiplication
Control signal c0 remains zero.
• Finally, results of the current iteration are restored as specified in step 12 and step 13 in
parallel B6 and B7 blocks. All steps from step 3 to step 13 of Algorithm 2 are performed
within one clock by the proposed architecture. Therefore, to compute a multiplication in
Fp256 the proposed design takes only 256 clock cycles.
4.2. Instruction 2- Computation of Fp Addition
The proposed design executes Algorithm 2 for computing Fp -addition. As described in step 1,
the architecture initializes registers s1 and s2 by operands and , respectively. It executes steps 2
and 3 in blocks A1 and A2 . Based on the most significant bits of υ2 and ω2 it produces the
correct s1+ s2 result of mod p in block B3 as described in Step 3 and step 4. During the execution of
Fp -addition the control signal c0 holds logic zero. The proposed architecture computes a Fp -
addition in one clock cycle.
Algorithm 2: The addition in prime field.
Input p,a= ∑ 2୧
ܽ݅ିଵ
ୀ and b= ∑ 2୧
ܾ݅ିଵ
ୀ .
Output a+b mod p .
1. s1 ←a; s2 ←a;
2. υ2 ←s1 + s2
3. ω2 ←υ1 +(p¬)+1 ;
4 c2 ←(υ2) n (ω2 ) n;
5. If c2 =1 then t2 ← ω2 ; else t2 ← υ2 ;
6. Return t2;
7. International Journal of VLSI design & Communication Systems (VLSICS) Vol.4, No.1, February 2013
57
4.3. Instruction 3- Computation of Fp Subtraction
Subtraction a-b mod p.on the proposed design is performed by executing Algorithm 3.It is
executed by the architecture mostly like Algorithm 2 with additional help by block B5 for -S2.
Algorithm 3: The subtraction in prime field.
Input p,a= ∑ 2୧
ܽ݅ିଵ
ୀ and b= ∑ 2୧
ܾ݅ିଵ
ୀ .
Output a-b mod p .
1. s1 ←a; s2 ←a;
2. υ2 ←s1 + (¬s2)+1;
3. ω2 ←υ1 + p;
4 c2 ←(υ2) n ;
5. if c2 =1 then t2 ← ω2 ; else t2 ← υ2 ;
6. Return t2;
4.4. Instruction 4- Computation of Binary addition
Addition of a and b in binary field is performed by the exclusive OR gate.
Result = a XOR b.
5. EXECUTION OF PARALLEL INSTRUCTIONS
The Fig4 shows the dual core structure execution unit block diagram. The mechanism and
regularity of data access for computing all instructions are fairly simple. The distribution of
access to the registers and resolution of access conflicts are handled efficiently at the runtime by a
dedicated hardware block called data access unit which communicates among the CAUs and the
registers.. The DAU takes care of granting accesses. Therefore, a simple multiplexing protocol is
used between CAUs and registers, which is able to confirm a request within the same cycle in
order not to cause any delay cycles when trying to access data in parallel. The data accesses and
instruction sequences are hard coded into the sequence control of the architecture which avoids
the additional software development costs. The dependencies of the instructions are predefined
and thus the access conflicts are known. The priority of the data processing and the respective
execution is rearranged accordingly which achieves maximum utilization of CAUs.
Fig.4. Dual Core Structure Block Diagram
8. International Journal of VLSI design & Communication Systems (VLSICS) Vol.4, No.1, February 2013
58
Fig.5. I/O Representation Diagram
The data access unit or DAU acts as a mediator while transferring data between CAUs and
memory elements. Due to the demand of parallel access, the proposed crypto processor stores all
intermediate results in its active registers. Each of the register consists of data-in, data-out, and
enables lines. It gets updated by data-in lines when the respective enable signal is invoked. The
crossbar switch (results) redirects the outputs of each operation to registers. Similarly, the
operands are redirected from registers to the input ports of the CAUs. The respective select
signals are generated prior to the above two redirection procedures by the sequence control unit.
The access control block synchronizes the select lines of the multiplexers for operands and
results. It also synchronizes the enable signals of registers for restoring the intermediate results.
The micro instruction sequence generator finds the current operation type and generates the
respective micro instructions which are nothing but the control signals. The respective values of
control signals, which on the other hand, represents the scheduling of different operations on
CAU.
Fig.6. Prime Field Addition and Binary Field Addition parallel execution waveform
9. International Journal of VLSI design & Communication Systems (VLSICS) Vol.4, No.1, February 2013
59
6. IMPLEMENTATION RESULTS
The whole design has been done in verilog on Xilinx ISE design suit using a Spartran 3E, and
Virtex5. Performance are compared with both the architecture. The design is verified for the
following combination of instruction execution in parallel.
1)Prime field multiplication and Binary field addition .
2) Prime field addition and Binary field addition.
3) Prime field subtraction and binary field addition .
Example execution for Prime field addition and binary field addition waveform is shown in the
figure6. This design occupy only 20% of area in spartran 3E and 25.6% area in Virtex5.
TABLE 1 Comparison of Implementation in Spartan3AN&Virtex5
Sl.no Parameter Spartran 3E Virtex5
1. Area 20% 25.6%
2. Delay 30.83ns 14.94ns
3. Max.frequency 32.43Mhz 66.87Mhz
7. CONCLUSION
The dual core structure executes the parallelism with both fields of instructions. This architecture
is a sample design of crypto processor for executing prime field and binary field operations. Its
design is a low power architecture that can be realized on moderate silicon area. This is the basic
design of dual field implementation in dual core architecture. This design can be further
developed for more number of Prime and binary field instructions. This type of development of
the design can get a secured dual field crypto processor unit. This work can also be further
developed for the superscalar architecture, which will increase the speed of execution.
REFERENCES
[1] S. Mitsunari, R. Sakai, and M. Kasahara, “A new traitor tracing ,”IEICE trans. Fundam., vol.
2,pp.481-484,2002.
[2] I.Duursma and H. Lee, “Tate Pairing Implementation for Hyperelliptic Curves y2=xp-x+d,” in ASIA
CRYPT 2003, LNCS 2894, 2003, pp. 111-123.
[3] Johannes Wolkerstorfer, “Dual-Field Arithmetic Unit for GF(p) and GF(2m)”Institute for Applied
Information Processing and Communications, Graz University of Technology, Inffeldgasse 16a, 8010
Graz, Austria. This work origin from the European Commission funded project USB CRYPT
established under contract IST-2000-25169intheInformationSocietyTechnologies(IST)Program.
[4] P.C. Kocher, “Timing attacks on implementations of diffle-hellman, RSA,DSS and other systems,” in
Adv. Cryptology-CRYPTO’96,LNCS1109,1996,pp.104-113.
10. International Journal of VLSI design & Communication Systems (VLSICS) Vol.4, No.1, February 2013
60
[5] D. N. Amanor, C. Paar, J. Pelzl, V. Bunimov, and M. Schimmler, “Efficient hardware architectures
for modular multiplication on FPGAs,” in proc.Int. Conf. Field Program. LogicAppl.2005,pp.539-
542.
[6] V. Bunimov and M. Schmiller, “Area and time efficient modular multiplication of large integers,” in
Proc. ASAP, 2003,pp.400-409.
[7] C. Reberio and D. Mukhopadhyay,”High speed compact elliptic curve crypto processor for FPGA
platforms,” Indocrypt’08, LNCS5365, pp.376-388, 2008.
[8] S.Ghosh, D. Mukhopadhyay, and D.Roychowdhury,”High speed Fp multipliers and adders on FPGA
platform,” presented at the DASIP’10, Scotland,U.K.,2010.
[9] Santosh Ghosh, Debdeep Mukhopadhyay, and Dipanwita Roychowdhury,” Secure Dual- core Crypto
processor for Curves on FPGA Platform”. DigitalObjectIdentifier 0.1109/TVLSI.2012.2188655.
[10] Jun-Hong Chen, Ming-Der Shieh, Member, IEEE, and Wen-Ching Lin,” A High-Performance
Unified-Field Reconfigurable Cryptographic Processor”. Digital Object Identifier
10.1109/TVLSI.2009.2020397.
[11] D.Kammler, D.Zhang, P.Schwabe,H. charwaechter, M. Langenberg, D. Auras, G. Ascheid, and R.
Mathar, “Designing an ASIP for cryptographic pairings over Barreto-Naehrig Curves,” CHES’09,
LNCS 5747, pp. 254-271,2009.
[12] J. Fan, F. Vercauteren, and I.Verbauwhede,”Faster Fp-arithematic for Cryptographic pairings on
Barreto-Naehrig curves,” CHES’09, LNCS 5747, pp.240-253, 2009.
[13] J. Goodman, A. P. Chandrakasan, An Energy-efficient Reconfigurable Public-Key Cryptography
Processor, IEEE Journal of Solid-State Circuits, pp. 1808–1820, November 2001.
Authors
C.Veeraraghavan received the M.Sc. in Applied Electronics from the Bharathiar
University in the year 1998 and M.Phil. in Electronics with specialization of VLSI
Design from Bharathiar University in the year 2006. Currently, he is pursuing Ph.D at
Sri Krishna Arts and Science college, Department of Electronics and communication
systems, Bharathiar University, Tamilnadu, India.
Dr. K.Rajendran received the Ph.D degree in Electronics from Bharathiar University,
Coimbatore in March 2009. He is currently an Assistant Professor in the Department
of Electronics, Government Arts College for women, Ramanathapuram, Tamilnadu,
India.