This document proposes a partial prediction algorithm to reduce computational complexity in H.264/AVC 4x4 intra-prediction mode decision. It exploits the inherent symmetry in spatial prediction modes to quickly determine a subset of candidate modes for each 4x4 block. Unlike other fast algorithms, it does not assume a dominant edge is present in each block. By considering pixel-to-pixel correlation and developing simple cost measures from the symmetries, it can search 3 or fewer modes instead of 9 to select the best prediction mode, reducing complexity with minimal impact on quality.
A computationally efficient method to find transformed residue (iaemedu)
The document discusses reducing the computational complexity of intra 4x4 prediction in H.264 video encoding. It first analyzes the complexity of standard intra 4x4 prediction, which involves predicting each 4x4 block using all nine prediction modes. It finds the complexity to be high due to repeating common calculations for each mode. The paper then proposes a novel method to directly determine transformed residue coefficients for intra 4x4 blocks, reducing unnecessary computations and enhancing encoding speed. Experimental results show the new method reduces computational complexity by at least 40% without impacting coding efficiency or performance.
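To make the transform step concrete, here is a minimal sketch (not the paper's shortcut for deriving the coefficients directly) that applies the standard H.264 4x4 forward integer core transform to a prediction residual; the post-scaling and quantization stages are omitted.

```python
import numpy as np

# H.264/AVC 4x4 forward integer ("core") transform matrix.
CF = np.array([[1,  1,  1,  1],
               [2,  1, -1, -2],
               [1, -1, -1,  1],
               [1, -2,  2, -1]])

def forward_transform_4x4(block, prediction):
    """Transform the prediction residual of one 4x4 block (post-scaling omitted)."""
    residual = block.astype(int) - prediction.astype(int)
    return CF @ residual @ CF.T

rng = np.random.default_rng(0)
block = rng.integers(0, 256, (4, 4))
pred = rng.integers(0, 256, (4, 4))
print(forward_transform_4x4(block, pred))
```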
A Study of BFLOAT16 for Deep Learning Training (Subhajit Sahu)
Highlighted notes of:
A Study of BFLOAT16 for Deep Learning Training
This paper presents the first comprehensive empirical study demonstrating the efficacy of the Brain Floating Point (BFLOAT16) half-precision format for Deep Learning training across image classification, speech recognition, language modeling, generative networks, and industrial recommendation systems. BFLOAT16 is attractive for Deep Learning training for two reasons: the range of values it can represent is the same as that of the IEEE 754 floating-point format (FP32), and conversion to/from FP32 is simple. Maintaining the same range as FP32 is important to ensure that no hyper-parameter tuning is required for convergence; e.g., IEEE 754-compliant half-precision floating point (FP16) requires hyper-parameter tuning. In this paper, we discuss the flow of tensors and various key operations in mixed-precision training and delve into details of operations such as the rounding modes for converting FP32 tensors to BFLOAT16. We have implemented a method to emulate BFLOAT16 operations in Tensorflow, Caffe2, IntelCaffe, and Neon for our experiments. Our results show that deep learning training using BFLOAT16 tensors achieves the same state-of-the-art (SOTA) results across domains as FP32 tensors in the same number of iterations and with no changes to hyper-parameters.
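As a concrete illustration of the FP32-to-BFLOAT16 conversion and rounding modes mentioned above, the following sketch emulates both simple truncation and round-to-nearest-even by bit manipulation; the function names are illustrative and this is not the emulation layer used in the paper. Normal finite inputs are assumed.

```python
import struct

def f32_to_bits(x: float) -> int:
    return struct.unpack("<I", struct.pack("<f", x))[0]

def bits_to_f32(b: int) -> float:
    return struct.unpack("<f", struct.pack("<I", b))[0]

def bf16_truncate(x: float) -> float:
    """Emulate BFLOAT16 by dropping the low 16 bits of the FP32 encoding."""
    return bits_to_f32(f32_to_bits(x) & 0xFFFF0000)

def bf16_round_nearest_even(x: float) -> float:
    """Emulate BFLOAT16 with round-to-nearest-even on the dropped bits."""
    b = f32_to_bits(x)
    lsb = (b >> 16) & 1                 # lowest retained bit, breaks ties to even
    return bits_to_f32((b + 0x7FFF + lsb) & 0xFFFF0000)

for v in (1.0 / 3.0, 0.1, 123456.789):
    print(f"{v!r}: truncate={bf16_truncate(v)!r}, rne={bf16_round_nearest_even(v)!r}")
```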
ANALOG MODELING OF RECURSIVE ESTIMATOR DESIGN WITH FILTER DESIGN MODEL (VLSICS Design)
This document summarizes a research paper on implementing a low power design methodology for recursive encoders and decoders. It discusses how recursive coding can achieve better error correction performance at low signal-to-noise ratios compared to other codes. It then describes the design of a recursive decoder that uses the log-MAP algorithm to minimize power consumption. The decoder uses five main computational steps - branch metric calculation, forward metric computation, backward metric computation, log-likelihood ratio calculation, and extrinsic information calculation. It also compares the implementation of four-state and eight-state recursive encoders. The goal of the design is to optimize the power and area of recursive encoders and decoders.
International Journal of Computational Engineering Research (IJCER) (ijceronline)
International Journal of Computational Engineering Research (IJCER) is an international, English-language online journal published monthly. The journal publishes original research work that contributes significantly to furthering scientific knowledge in engineering and technology.
IRJET- Review Paper on Study of Various Interleavers and their Significance (IRJET Journal)
This document reviews various interleavers and their significance in digital communication systems. It discusses how interleavers can be used in turbo encoders and decoders to improve error correction capabilities without reducing bandwidth. The document summarizes different types of interleavers including random, QPP, helical, odd-even, and matrix interleavers. It also discusses turbo encoding and decoding processes and how convolutional codes differ from block codes. Key performance metrics like bit error rate and bit error rate curves are analyzed to evaluate and compare interleaver quality.
Tail-biting convolutional codes (TBCC) have been extensively applied in communication systems. The method replaces the fixed tail with tail-biting data, a concept needed to achieve efficient decoding computation. Unfortunately, it also makes the decoding computation more complex. Hence, several algorithms have been developed to overcome this issue, most of which are implemented iteratively with an uncertain number of iterations. In this paper, we propose a VLSI architecture to implement our proposed reversed-trellis TBCC (RT-TBCC) algorithm. The algorithm is designed by modifying the direct-terminating maximum-likelihood (ML) decoding process to achieve a better correction rate. The purpose is to offer an alternative solution for the tail-biting convolutional code decoding process with fewer computations than the existing solution. The proposed architecture has been evaluated for the LTE standard, and it significantly reduces computational time and resources compared to the existing direct-terminating ML decoder. For evaluation of functionality and Bit Error Rate (BER) analysis, several simulations, a System-on-Chip (SoC) implementation, and synthesis on FPGA are performed.
A New Approach to Linear Estimation Problem in Multiuser Massive MIMO Systems (Radita Apriana)
A novel approach for solving the linear estimation problem in multi-user massive MIMO systems is proposed. In this approach, the difficulty of matrix inversion is attributed to the incomplete definition of the dot product. The general definition of the dot product implies that the columns of the channel matrix are always orthogonal whereas, in practice, they may not be. If the latter information can be incorporated into the dot product, then the unknowns can be computed directly from projections without inverting the channel matrix. By doing so, the proposed method is able to achieve an exact solution with a 25% reduction in computational complexity compared to the QR method. The proposed method is stable, offers the extra flexibility of computing any single unknown, and can be implemented in just twelve lines of code.
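The abstract does not spell out the modified dot product, so the sketch below only illustrates the underlying contrast it relies on: with orthogonal columns each unknown follows from a single projection, while correlated columns require the cross-correlations to be accounted for (here via an ordinary least-squares solve as a stand-in for the paper's method).

```python
import numpy as np

rng = np.random.default_rng(1)
H = rng.standard_normal((8, 4))      # tall channel matrix (more antennas than users)
x_true = rng.standard_normal(4)
y = H @ x_true

# If the columns of H were orthogonal, each unknown would follow from a single
# projection: x_k = <h_k, y> / <h_k, h_k>.
x_proj = np.array([H[:, k] @ y / (H[:, k] @ H[:, k]) for k in range(H.shape[1])])

# With non-orthogonal columns the projections must account for column
# cross-correlations; a least-squares solve gives the exact answer here.
x_ls, *_ = np.linalg.lstsq(H, y, rcond=None)

print("plain projections :", x_proj)   # biased when columns are correlated
print("least squares     :", x_ls)     # matches x_true
print("ground truth      :", x_true)
```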
FPGA Implementation of Viterbi Decoder using Hybrid Trace Back and Register E... (ijsrd.com)
Error correction is an integral part of any communication system, and for this purpose convolutional codes are widely used as forward error correction codes. For decoding convolutional codes, a Viterbi decoder is employed at the receiver end. The parameters of the Viterbi algorithm can be changed to suit a specific application. High speed and small area are two important design parameters in today's wireless technology. In this paper, a high-speed feed-forward Viterbi decoder has been designed using a hybrid trace back and register exchange architecture and the embedded BRAM of the target FPGA. The proposed Viterbi decoder has been designed with Matlab, simulated with the Xilinx ISE 8.1i tool, synthesized with the Xilinx Synthesis Tool (XST), and implemented on a Xilinx Virtex-4 based xc4vlx15 FPGA device. The results show that the proposed design can operate at an estimated frequency of 107.7 MHz while consuming considerably fewer resources on the target device, providing a cost-effective solution for wireless applications.
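For reference, the algorithm being mapped to hardware can be summarized in a few lines of software; the sketch below is a plain hard-decision Viterbi decoder for a rate-1/2, constraint-length-3 code, not the hybrid trace-back/register-exchange architecture itself.

```python
import random

# Rate-1/2, constraint-length-3 convolutional code (generators 7 and 5, octal),
# decoded with a plain hard-decision Viterbi algorithm.
G = (0b111, 0b101)
K = 3
N_STATES = 1 << (K - 1)

def parity(x):
    return bin(x).count("1") & 1

def encode(bits):
    state, out = 0, []
    for b in bits:
        reg = (b << (K - 1)) | state          # newest bit enters on the left
        out.extend(parity(reg & g) for g in G)
        state = reg >> 1
    return out

def viterbi_decode(received, n_bits):
    metric = [0.0] + [float("inf")] * (N_STATES - 1)   # encoder starts in state 0
    paths = [[] for _ in range(N_STATES)]
    for t in range(n_bits):
        r = received[2 * t:2 * t + 2]
        new_metric = [float("inf")] * N_STATES
        new_paths = [None] * N_STATES
        for state in range(N_STATES):
            for b in (0, 1):
                reg = (b << (K - 1)) | state
                branch = sum(x != parity(reg & g) for x, g in zip(r, G))
                nxt = reg >> 1
                if metric[state] + branch < new_metric[nxt]:
                    new_metric[nxt] = metric[state] + branch
                    new_paths[nxt] = paths[state] + [b]
        metric, paths = new_metric, new_paths
    return paths[metric.index(min(metric))]

random.seed(2)
bits = [random.randint(0, 1) for _ in range(20)]
coded = encode(bits)
coded[3] ^= 1                                  # inject one channel error
decoded = viterbi_decode(coded, len(bits))
print("residual bit errors:", sum(a != b for a, b in zip(decoded, bits)))
```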
Exploiting 2-Dimensional Source Correlation in Channel Decoding with Paramete... (IJECEIAES)
The document describes a proposed joint source-channel coding (JSCC) system that exploits 2-dimensional source correlation in channel decoding with parameter estimation. The system uses a modified Bahl-Cocke-Jelinek-Raviv (BCJR) algorithm at the decoder to exploit source correlation on rows and columns of a 2D source. A parameter estimation technique based on the Baum-Welch algorithm is used jointly with the decoder to estimate source correlation parameters at the receiver since these parameters are not always known in practice. Simulation results show that the proposed coding scheme that performs joint decoding and parameter estimation performs very close to an ideal 2D JSCC system with perfect knowledge of source correlation parameters.
An efficient hardware logarithm generator with modified quasi-symmetrical app... (IJECEIAES)
This paper presents a low-error, low-area FPGA-based hardware logarithm generator for digital signal processing systems which require high-speed, real time logarithm operations. The proposed logarithm generator employs the modified quasi-symmetrical approach for an efficient hardware implementation. The error analysis and implementation results are also presented and discussed. The achieved results show that the proposed approach can reduce the approximation error and hardware area compared with traditional methods.
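The modified quasi-symmetrical approach itself is not reproduced here, but the baseline idea behind hardware logarithm generators, namely splitting the input into the position of its leading one plus a fractional part and approximating log2 of the fraction, can be sketched with Mitchell's classic linear approximation:

```python
import math

def mitchell_log2(x: int) -> float:
    """Mitchell-style approximation: log2(x) ~ k + f, where x = 2^k * (1 + f)."""
    assert x > 0
    k = x.bit_length() - 1          # position of the leading one
    f = (x - (1 << k)) / (1 << k)   # fractional part in [0, 1)
    return k + f                    # linear approximation of log2(1 + f)

for v in (3, 10, 100, 1000):
    approx, exact = mitchell_log2(v), math.log2(v)
    print(f"{v:5d}  approx={approx:.4f}  exact={exact:.4f}  err={exact - approx:+.4f}")
```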
This document discusses the design of a pipelined architecture for sparse matrix-vector multiplication on an FPGA. It begins with introductions to matrices, linear algebra, and matrix multiplication. It then describes the objective of building a hardware processor to perform multiple arithmetic operations in parallel through pipelining. The document reviews literature on pipelined floating point units. It provides details on the proposed pipelined design for sparse matrix-vector multiplication, including storing vector values in on-chip memory and using multiple pipelines to complete results in parallel. Simulation results showing reduced power and execution time are presented before concluding the design can improve performance for scientific applications.
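As a point of reference for what such a pipeline computes, a minimal software sparse matrix-vector multiply over a CSR (compressed sparse row) layout is shown below; the hardware design streams the same per-row dot products through parallel pipelines.

```python
from typing import List

def spmv_csr(values: List[float], col_idx: List[int], row_ptr: List[int],
             x: List[float]) -> List[float]:
    """y = A @ x for a sparse matrix A stored in CSR form."""
    y = []
    for row in range(len(row_ptr) - 1):
        acc = 0.0
        for k in range(row_ptr[row], row_ptr[row + 1]):
            acc += values[k] * x[col_idx[k]]
        y.append(acc)
    return y

# 3x3 example:  [[5, 0, 0],
#                [0, 8, 3],
#                [0, 0, 6]]
values  = [5.0, 8.0, 3.0, 6.0]
col_idx = [0, 1, 2, 2]
row_ptr = [0, 1, 3, 4]
print(spmv_csr(values, col_idx, row_ptr, [1.0, 2.0, 3.0]))   # [5.0, 25.0, 18.0]
```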
Cross-Talk Control with Analog In-Memory-Compute for Artificial Intelligence ... (Bruce Morton)
This brief paper reports on a study of the feasibility of 8-bit precision in analog, in-memory computation of Multiply-Accumulate (MAC) sums, as proposed for implementation of efficient AI/ML hardware. With conventional read-out means and plausible memory array parasitic assumptions, cross-talk renders the 8-bit goal unattainable. In contrast, with the addition of spatial filter circuitry, results show MAC sum read-out can be made insensitive to parasitic cross-talk.
REDUCTION OF BUS TRANSITION FOR COMPRESSED CODE SYSTEMS (VLSICS Design)
Low power VLSI circuit design is one of the most important issues in present-day technology. One way of reducing power is to reduce the number of transitions on the bus. The main focus here is to present a method for reducing the power consumption of compressed-code systems by inverting the bits that are transmitted on the bus. Compression will generally increase bit-toggling, as it removes redundancies from the code transmitted on the bus. An arithmetic coding technique is used for compression/decompression, and bit-toggling reduction is done using a shift-invert coding technique. Therefore, there is also an additional challenge: to find the right balance between compression ratio and bit-toggling reduction. The technique requires only 2 extra bits for the low-power coding, irrespective of the bit-width of the bus for compressed data.
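The shift-invert scheme itself is not reproduced here, but the basic bus-invert rule it builds on is easy to sketch: transmit the inverted word whenever that reduces the number of toggles below half the bus width, and signal the choice on one extra invert line.

```python
def bus_invert(prev_word: int, word: int, width: int = 8):
    """Return (word_to_send, invert_flag) minimizing bus transitions."""
    mask = (1 << width) - 1
    toggles = bin((prev_word ^ word) & mask).count("1")
    if toggles > width // 2:
        return (~word) & mask, 1      # inverting flips fewer bus lines
    return word, 0

prev = 0b00000000
for w in (0b11111110, 0b11111111, 0b00000001):
    sent, inv = bus_invert(prev, w)
    print(f"data={w:08b} sent={sent:08b} invert={inv}")
    prev = sent                       # the bus holds the value actually driven
```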
DUAL FIELD DUAL CORE SECURE CRYPTOPROCESSOR ON FPGA PLATFORM (VLSICS Design)
This paper is devoted to the design of a dual-core crypto processor for executing both prime field and binary field instructions. The proposed design is specifically optimized for the Field Programmable Gate Array (FPGA) platform. The combined execution of instructions from two different fields (prime field GF(p) and binary field GF(2^m)) is analysed. The design is implemented on Spartan 3E and Virtex-5 devices, and the performance results are compared. The implementation results demonstrate parallel execution of dual-field instructions.
The floating random walk (FRW) algorithm has several advantages for extracting interconnect capacitance. However, for the multi-layer dielectrics in VLSI technology, the efficiency of the FRW algorithm is degraded by the frequent stops of the walk at dielectric interfaces. In this paper, an approach is proposed to calculate the multi-dielectric Green's function, which is used to enable hops across dielectric interfaces in the FRW. Numerical results show that the proposed approach is about 4X faster than an existing method and brings several times speedup to FRW-based capacitance extraction for actual multi-dielectric interconnect structures.
Low complexity design of non binary ldpc decoder using extended min-sum algor... (eSAT Journals)
This document summarizes a research paper on reducing the computational complexity of non-binary LDPC decoders using an extended min-sum algorithm. It introduces low-density parity check codes and non-binary LDPC codes. It then describes an extended min-sum decoding algorithm and proposes two modifications to the parity check matrix - a lower diagonal matrix and a doubly diagonal matrix - to reduce complexity while maintaining performance. Simulation results on code lengths of 504 and 648 bits show the doubly diagonal matrix achieves the best bit error rate. Analysis finds the lower diagonal matrix has the lowest computational complexity of the approaches.
The document discusses porting a seismic inversion code to run in parallel using standard message passing libraries. It describes three options considered for distributing the large 3D seismic data across processors: mapping the data to a processor grid, treating it as a sparse matrix problem, or distributing the data as 1D vectors assigned to each processor. The third option was chosen as it best preserved the code structure, had regular dependencies, and simplified communications. The parallel code was implemented using the Distributed Data Library (DDL) for data management and the Message Passing Interface (MPI) for basic point-to-point communication between processors. Initial tests on clusters showed near linear speedup on up to 30 processors.
Lexically constrained decoding for sequence generation using grid beam search (Satoru Katsumata)
This document summarizes a research paper that presents Grid Beam Search (GBS), an algorithm that extends beam search to allow inclusion of pre-specified lexical constraints in the output sequence generation. The researchers demonstrate GBS can provide improvements in translation quality for interactive translation scenarios by incorporating user feedback as constraints. They also show GBS enables significant gains in domain adaptation without retraining by using terminology from the target domain as constraints.
International Journal of Engineering Research and Applications (IJERA) is an open access online peer reviewed international journal that publishes research and review articles in the fields of Computer Science, Neural Networks, Electrical Engineering, Software Engineering, Information Technology, Mechanical Engineering, Chemical Engineering, Plastic Engineering, Food Technology, Textile Engineering, Nano Technology & science, Power Electronics, Electronics & Communication Engineering, Computational mathematics, Image processing, Civil Engineering, Structural Engineering, Environmental Engineering, VLSI Testing & Low Power VLSI Design etc.
High performance nb-ldpc decoder with reduction of message exchange (Ieee Xpert)
High performance nb-ldpc decoder with reduction of message exchange.
Preference of Efficient Architectures for GF(p) Elliptic Curve Crypto Operati... (CSCJournals)
This paper explores architectural possibilities for utilizing more than one multiplier to speed up the computation of GF(p) elliptic curve cryptosystems. The architectures consider projective coordinates to reduce the GF(p) inversion complexity through additional multiplication operations. The study compares the standard projective coordinates (X/Z, Y/Z) with the Jacobian coordinates (X/Z^2, Y/Z^3), exploiting the parallelism of their multiplication operations. We assume 2, 3, 4, and 5 parallel multipliers and accordingly choose the appropriate projective coordinates efficiently. The study shows that the Jacobian coordinates (X/Z^2, Y/Z^3) are preferred when one or two multipliers are used. Whenever 3 or 4 multipliers are available, the standard projective coordinates (X/Z, Y/Z) are favored. We found that designs with 5 multipliers have no benefit over 4 multipliers because of data dependency. These architecture studies are particularly attractive for elliptic curve cryptosystems when hardware area optimization is the key concern.
FPGA Implementation of SubByte & Inverse SubByte for AES Algorithm (ijsrd.com)
The Advanced Encryption Standard was accepted as a Federal Information Processing Standard (FIPS). In traditional look-up table (LUT) approaches, the unbreakable delay is longer than the total delay of the rest of the operations in each round, and the LUT approach consumes a large area. It is more efficient to apply composite field arithmetic in the SubBytes transformation of the AES algorithm: it not only reduces the complexity but also enables deep sub-pipelining so that higher speed can be achieved. Isomorphic mapping can be employed to convert GF(2^8) to GF(((2^2)^2)^2), so that the multiplicative inverse can be easily obtained. The SubBytes and InvSubBytes transformations are merged using composite field arithmetic, which is key to implementing a low-cost, high-throughput AES architecture. Compared to the typical ROM-based lookup table, the presented implementation is both capable of higher speed, since it can be pipelined, and small in terms of area occupancy (137/1290 slices on a Spartan III XCS200-5 FPGA).
Graph Signal Processing for Machine Learning A Review and New Perspectives - ... (lauratoni4)
This document provides an overview of graph signal processing for machine learning. It discusses how networks can be represented as graphs and how graph-structured data is ubiquitous, appearing in applications involving networks of regions, road junctions, individuals, brain regions, and more. It also outlines challenges for using graph signal processing to exploit data structure, improve efficiency and robustness, and enhance model interpretability. The document concludes with a discussion of applications and open challenges.
A Review on Analysis on Codes using Different Algorithms (AM Publications)
Turbo codes are a new class of forward error correcting codes that have been shown to give a performance close to the channel capacity proposed by C. Shannon. A turbo code encoder is formed by the parallel concatenation of two identical recursive convolutional encoders separated by an interleaver. The turbo code decoder utilizes two cascaded decoding blocks, each of which shares the a priori information produced by the other. The decoding scheme has the benefit of working iteratively so that the overall performance can be improved. In this paper, a performance analysis of turbo codes is carried out. Two decoding techniques are used in the performance analysis: the Log maximum a posteriori probability (Log-MAP) algorithm and the soft output Viterbi algorithm (SOVA), which use log-likelihood ratios to produce soft outputs. The effect of using different decoding schemes is studied on both punctured and unpunctured codes. Finally, the performance of the two decoding techniques is compared in terms of bit error rate. Simulations are carried out with the help of MATLAB tools. Results indicate that turbo codes outperform the well-accepted convolutional codes with the same constraints; however, the best results are given by the unpunctured Log-MAP decoding of turbo codes.
Graph based transistor network generation method for supergate design (Ieee Xpert)
Graph based transistor network generation method for supergate design.
A high performance fir filter architecture for fixed and reconfigurable appli... (Ieee Xpert)
A high performance FIR filter architecture for fixed and reconfigurable applications.
Mahmoud Mohamed Hesham Abd Eltawab Aly is seeking a challenging position in international economics and business analysis that allows him to utilize his experience in foreign purchasing, import/export coordination, accounting, and marketing. He has over 5 years of relevant work experience, including his current role as a Foreign Purchasing Specialist where he negotiates with international suppliers. He also has a diploma in Foreign Trade Economics from Helwan University and is proficient in English, Arabic, and Italian.
This document provides an overview of the neurobiology of trauma and its implications for interviewing victims. It discusses how trauma impacts brain functioning and memory formation. Traumatic memories are initially just fragmented sensations that are later integrated into narrative stories by the prefrontal cortex. However, the prefrontal cortex may be impaired during trauma, resulting in fragmented and inconsistent victim accounts that raise suspicion. Understanding these neurobiological effects of trauma can help improve victim interviewing techniques.
Rock and rain is an agency that partners with Target to create brand experiences through strategic events, pop-ups, and tours. They have established a seamless collaboration with Target to provide an insider perspective along with fresh ideas. Some of their projects for Target include transforming perceptions of Target as a beauty destination, introducing new store concepts in an authentic way, launching fashion collections, and driving awareness of Target's grocery offerings through large-scale events. Their goal is to create experiences that excite guests and build loyalty for the Target brand.
The document discusses the origins of containers and mentions fresh lawn, lemonade, lunchtime at the library with news, games and maps. It focuses on containers, fresh food and drink, and activities at the library during lunch.
New nature club of dr bmn college of homescience (amruta Sapre)
The committee organized various activities in the academic year 2014-2015 related to environmental protection and sustainability:
- 12 FY BSc students visited Bandra to understand the concept of citizens reclaiming open spaces on Sundays.
- A lecture on green auditing was organized for staff after discussions with experts.
- FYBCA students surveyed waste workers and mapped biodiversity, presenting their work to an environmental planner.
- Selected students worked on a mobile application project to map local biodiversity like birds and trees.
The committee also organized visits for students to environmental events and festivals to increase awareness.
PCI Compliance Myths, Reality and Solutions for Retail (InDefense Security)
In this presentation, we discuss common misconceptions and myths that many retailers have about their PCI-DSS compliance obligations, and we share available solutions for achieving and maintaining PCI compliance. We also outline many cyber security solutions that address specific objectives within the PCI compliance requirements.
For additional info, visit https://indefensesecurity.com
Essay "Norway: a country with great riches" (raulense16)
The essay describes Norway as a country with great cultural, historical, geographical, and artistic riches. It covers topics such as Norway's geography, society, government, economy, and energy. The essay is written in a continuous, objective style, drawing on a single reliable source of information.
On 3 November 2016, Eurostat published a press release on the unemployment rate in the euro area and the EU28.
In the euro area, the unemployment rate is estimated at 10.0% in September 2016, stable compared with August 2016 and down from September 2015 (10.6%).
Eurostat notes that this is the lowest rate recorded in the euro area since June 2011.
In the EU28, the unemployment rate stands at 8.5% in September 2016. This rate is stable compared with August 2016 and down from September 2015 (9.2%). It is the lowest rate recorded in the EU28 since February 2009.
Germany and the Czech Republic recorded the lowest unemployment rates (4.1% and 4.0%), while Greece and Spain recorded the highest (23.2% and 19.3%).
M. Chandrashekhar is seeking a position as an accountant where he can contribute as an active team member and continue learning. He has over 9 years of experience in accounting roles where he maintained books of accounts, reconciled bank accounts, analyzed expenditures, and finalized annual financial statements for various companies. His academic credentials include an MBA in finance and he is proficient in accounting software like Tally, Focus, Wings2000 and Excel.
People think there is a secret sauce to SEO. There isn't. They hope it will come from the heavens, from an ebook, from Facebook. Some book. Maybe from Twitter?
No, No and No.
A pillar of inbound marketing is content. Here we put it up to you to at least start. Today. Get working on your content and don't stop - EVER.
The Kia New Sportage 2017 is a completely new SUV design. It is a small sport utility vehicle with good fuel economy and performance. It includes trims such as LX, EX, and SX; find out more at a Kia dealership.
The document describes a study of the problems caused by pigeons in Campo Grande-MS between 1995 and 1997. The results showed that the southern district received the most complaints about pigeons, followed by the eastern, northern, and western districts. Municipal schools had higher pigeon infestation than state schools. Few cases of diseases transmitted by pigeons were reported.
The Brazilian ship Paraná, loaded with coffee, was torpedoed by a German submarine off the French coast in 1917, killing three crew members. The Germans also fired on the survivors. Construction of the Eiffel Tower began in 1887 so that it could serve as the entrance to the 1889 Universal Exposition in Paris, and it was inaugurated in March of that year.
A list is a data structure that stores several elements in an ordered fashion. It is composed of linked nodes that contain the data and references to the next node. The main program traverses the list to access each element sequentially.
DogVacay is exploring launching its peer-to-peer dog sitting service in the UK. Market research found that the growing pet industry and humanization of pets presents an opportunity. DogVacay's competitive advantage is its certified sitters, which provides reassurance over competitors. The target market is middle class dog owners and sitters aged 20-59 in London, who value quality pet care services. A marketing mix is recommended focusing on building awareness through PR and digital promotion to drive loyalty within this target segment.
The document discusses evolving strategies in antibiotic discovery and development. It focuses on how chemical modification of natural product scaffolds through semi-synthesis has dominated antibiotic development and led to many successful drugs. However, rising drug resistance requires new innovations, like developing antibiotic adjuvants to preserve existing drugs or expanding chemical diversity through synthetic biology or new techniques to mine antibiotic-producing organisms. Understanding antibiotic chemical properties and harnessing both synthetic chemistry and natural product discovery will be important to address the growing need for new antibiotics.
Experience Report on Functional Test Automation with Selenium (Wagner Silva)
The document describes the implementation of a functional test automation strategy at an IT company to address problems such as the large effort required to run manual tests and the reduced test coverage. The strategy included analysis of the existing test process, definition of an automation approach, tool selection, criteria for choosing test cases, and the development and execution of automated scripts, resulting in reduced testing time and the detection of more defects.
This document contains three parts about a spotted hen and her spotted chicks. The first part presents a tongue twister about the spotted hen and her spotted chicks. The second part asks the reader to complete the tongue twister with missing words. The third part presents other tongue twisters for practice.
International Journal of Engineering Research and Applications (IJERA) is an open access online peer reviewed international journal that publishes research and review articles in the fields of Computer Science, Neural Networks, Electrical Engineering, Software Engineering, Information Technology, Mechanical Engineering, Chemical Engineering, Plastic Engineering, Food Technology, Textile Engineering, Nano Technology & science, Power Electronics, Electronics & Communication Engineering, Computational mathematics, Image processing, Civil Engineering, Structural Engineering, Environmental Engineering, VLSI Testing & Low Power VLSI Design etc.
IJRET : International Journal of Research in Engineering and Technology is an international peer reviewed, online journal published by eSAT Publishing House for the enhancement of research in various disciplines of Engineering and Technology. The aim and scope of the journal is to provide an academic medium and an important reference for the advancement and dissemination of research results that support high-level learning, teaching and research in the fields of Engineering and Technology. We bring together Scientists, Academicians, Field Engineers, Scholars and Students of related fields of Engineering and Technology.
Selection of intra prediction modes for intra frame coding in advanced video ... (eSAT Journals)
Abstract: This paper proposes the selection of intra prediction modes for intra frame coding in the Advanced Video Coding (AVC) standard using Matlab. The proposed algorithm selects prediction modes for intra frame coding. Nine prediction modes are available to predict an intra frame in AVC using intra prediction, but not all of them are required for every application. Intra prediction is the first process of the AVC standard: it predicts a macroblock by referring to its previously coded neighboring macroblocks to reduce spatial redundancy, and applying all nine modes to predict an intra frame increases the computational complexity at the AVC encoder. In the proposed algorithm, all prediction modes (0-8) were applied to predict the intra frame, but only a few modes, such as mode 0, mode 1, mode 2, mode 4, and mode 6, give good PSNR, high compression ratio, and low bit rate. Of these, mode 2 gives the best PSNR, compression ratio, and reduced bit rate, whereas mode 5, mode 7, and mode 8 give lower PSNR, lower compression ratio, and increased bit rate compared to modes 0, 1, 2, 4, and 6. Simulation results are presented using Matlab, including the PSNR, compression ratio, and bit rate achieved for different quantization parameters on the Mother-Daughter and Foreman frames. Keywords: AVC, PSNR, CAVLC, Macroblock, Prediction modes.
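A minimal sketch of this kind of mode decision is given below: it builds three of the nine 4x4 intra predictors (vertical, horizontal, DC) from the neighboring pixels and keeps the one with the smallest SAD; a full encoder evaluates all nine directional modes the same way. It is a generic illustration, not the paper's selection rule.

```python
import numpy as np

def intra_4x4_predictors(top, left):
    """Three of the nine H.264 4x4 intra predictors, built from neighbor pixels."""
    return {
        "mode0_vertical":   np.tile(top, (4, 1)),
        "mode1_horizontal": np.tile(left.reshape(4, 1), (1, 4)),
        "mode2_dc":         np.full((4, 4), (top.sum() + left.sum() + 4) // 8),
    }

def best_mode(block, top, left):
    """Pick the predictor with the smallest sum of absolute differences (SAD)."""
    costs = {m: int(np.abs(block.astype(int) - p).sum())
             for m, p in intra_4x4_predictors(top, left).items()}
    return min(costs, key=costs.get), costs

rng = np.random.default_rng(4)
block = rng.integers(0, 256, (4, 4))
top, left = rng.integers(0, 256, 4), rng.integers(0, 256, 4)
print(best_mode(block, top, left))
```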
DUAL POLYNOMIAL THRESHOLDING FOR TRANSFORM DENOISING IN APPLICATION TO LOCAL ... (ijma)
Thresholding operators have been used successfully for denoising signals, mostly in the wavelet domain. These operators transform a noisy coefficient into a denoised coefficient with a mapping that depends on signal statistics and the value of the noisy coefficient itself. This paper demonstrates that a polynomial threshold mapping can be used for enhanced denoising of Principal Component Analysis (PCA) transform coefficients. In particular, two polynomial threshold operators are used here to map the coefficients obtained with the popular local pixel grouping method (LPG-PCA), which eventually improves the denoising power of LPG-PCA. The method reduces the computational burden of LPG-PCA by eliminating the need for a second iteration in most cases. Quality metrics and visual assessment show the improvement.
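The paper's two polynomial operators are not reproduced here, but the mapping idea, shrinking small (noise-dominated) coefficients toward zero while leaving large ones nearly intact, can be seen by comparing plain soft thresholding with a simple cubic variant (illustrative coefficients only):

```python
import numpy as np

def soft_threshold(c: np.ndarray, t: float) -> np.ndarray:
    """Classic soft threshold: shrink every coefficient toward zero by t."""
    return np.sign(c) * np.maximum(np.abs(c) - t, 0.0)

def polynomial_threshold(c: np.ndarray, t: float) -> np.ndarray:
    """Cubic mapping below the threshold, identity above it (illustrative only)."""
    out = c.copy()
    small = np.abs(c) < t
    out[small] = c[small] ** 3 / t ** 2      # smooth shrinkage of small coefficients
    return out

coeffs = np.array([-3.0, -0.8, -0.2, 0.1, 0.5, 2.5])
print(soft_threshold(coeffs, 1.0))
print(polynomial_threshold(coeffs, 1.0))
```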
The document proposes a new method for frame synchronization in OFDMA mode of wireless networks. The method uses a timing metric based on the preamble structure specified for OFDMA, which consists of real BPSK symbols on every third subcarrier. In an ideal scenario with a DFT size of 2046, the timing metric results in four significant peaks, with the true frame boundary indicated by one of the peaks. Thresholding and detection strategies are discussed to identify the frame boundary in noisy channels. Simulations show the proposed method performs better than existing techniques, especially in fading channels.
Efficient pu mode decision and motion estimation for h.264 avc to hevc transc... (sipij)
H.264/AVC has been widely applied to various applications. However, a new video compression standard, High Efficiency Video Coding (HEVC), was finalized in 2013. In this work, a fast transcoder from H.264/AVC to HEVC is proposed. The proposed algorithm includes a fast prediction unit (PU) decision and fast motion estimation. Thanks to the strong relation between H.264/AVC and HEVC, the modes, residuals, and variance of motion vectors (MVs) extracted from H.264/AVC can be reused to predict the current encoding PU of HEVC. Furthermore, the MVs from H.264/AVC are used to decide the search range of the PU during motion estimation. Simulation results show that the proposed algorithm can save up to 53% of the encoding time while maintaining the rate-distortion (R-D) performance for HEVC.
Estimation of bitlength of transformed quantized residue (IAEME Publication)
This document proposes a method to estimate the bitlength of transformed and quantized residue coefficients and syntax elements for mode decision in H.264 baseline encoding. It aims to reduce the computational complexity of calculating rate-distortion cost (RD-Cost) by estimating bitlengths without fully encoding bitstreams. The key aspects are:
1) It classifies residue coefficients and estimates bitlength for coefficient types like Luma_4x4, Luma DC, Luma AC, Chroma DC, and Chroma AC based on coding tables and context like neighboring coefficients.
2) It estimates bitlengths for syntax elements like macroblock type, prediction modes, and motion vectors based on Exp-Golomb coding tables.
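For the syntax-element part, the bit length of an Exp-Golomb codeword follows directly from the code structure, so it can be computed without emitting the bitstream; a minimal sketch of that calculation (not the paper's full estimator):

```python
def ue_golomb_bitlength(v: int) -> int:
    """Number of bits in the unsigned Exp-Golomb code of v (v >= 0)."""
    return 2 * (v + 1).bit_length() - 1

def se_golomb_bitlength(v: int) -> int:
    """Signed Exp-Golomb length, via the standard signed-to-unsigned mapping."""
    code_num = 2 * abs(v) - (1 if v > 0 else 0)
    return ue_golomb_bitlength(code_num)

for v in range(6):
    print(v, ue_golomb_bitlength(v))   # 1, 3, 3, 5, 5, 5
```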
1) The document proposes using homogeneous motion discovery to generate additional reference frames for 4K video coding. Motion is estimated between reference frames and the current frame to generate affine motion models and associated masks.
2) Experimental results on 3 video sequences show average bit rate savings of 3.78% over HEVC by using the additional reference frames generated from the affine motion models.
3) The approach provides a simpler computation method for high resolution video coding compared to motion hint estimation, which requires super-pixel segmentation that becomes impractical for resolutions like 4K.
Lossless Data Compression Using Rice Algorithm Based On Curve Fitting TechniqueIRJET Journal
This document describes a lossless data compression technique called the Rice algorithm and a modification of the Rice algorithm using curve fitting. The Rice algorithm uses a two-stage approach of preprocessing data to decorrelate it followed by entropy coding to assign variable-length codes. The modified Rice algorithm uses curve fitting to generate a function to predict future data points for improved compression performance compared to the original Rice algorithm. Simulation results show the modified algorithm achieves better compression by selecting an encoding option that uses fewer bits than the original Rice algorithm on a sample data set.
Design and implementation of log domain decoder IJECEIAES
Low-Density-Parity-Check (LDPC) code has become famous in communications systems for error correction, as an advantage of the robust performance in correcting errors and the ability to meet all the requirements of the 5G system. However, the mot challenge faced researchers is the hardware implementation, because of higher complexity and long run-time. In this paper, an efficient and optimum design for log domain decoder has been implemented using Xilinx system generator with FPGA device Kintex7 (XC7K325T-2FFG900C). Results confirm that the proposed decoder gives a Bit Error Rate (BER) very closed to theory calculations which illustrate that this decoder is suitable for next generation demand which needs a high data rate with very low BER.
This document presents an intelligent link adaptation scheme for OFDM systems that adapts coding, modulation, and power allocation to maximize throughput. It uses a fuzzy rule-based system to select the optimal modulation and coding scheme based on channel state information and quality of service requirements. It then uses a differential evolution algorithm to find the optimal power vector to transmit over OFDM subcarriers while satisfying total power and bit error rate constraints. The proposed scheme is shown through simulations to outperform conventional fixed schemes and adaptive schemes that only optimize a subset of parameters. Product codes and QAM are used as the coding and modulation schemes, respectively.
Improving The Performance of Viterbi Decoder using Window System IJECEIAES
An efficient Viterbi decoder is introduced in this paper; it is called Viterbi decoder with window system. The simulation results, over Gaussian channels, are performed from rate 1/2, 1/3 and 2/3 joined to TCM encoder with memory in order of 2, 3. These results show that the proposed scheme outperforms the classical Viterbi by a gain of 1 dB. On the other hand, we propose a function called RSCPOLY2TRELLIS, for recursive systematic convolutional (RSC) encoder which creates the trellis structure of a recursive systematic convolutional encoder from the matrix “H”. Moreover, we present a comparison between the decoding algorithms of the TCM encoder like Viterbi soft and hard, and the variants of the MAP decoder known as BCJR or forward-backward algorithm which is very performant in decoding TCM, but depends on the size of the code, the memory, and the CPU requirements of the application.
This document summarizes a research paper that proposes using parallel concatenated turbo codes in wireless sensor networks in an adaptive way. The key points are:
1) Turbo codes can achieve near-Shannon limit performance but decoding is complex, making them difficult to implement on energy-constrained sensor nodes.
2) The proposed approach shifts the complex turbo decoding to the base station while sensor nodes implement encoding and basic error correction.
3) At sensor nodes, a parallel concatenated convolutional code (PCCC) circuit encodes data and detects/corrects errors in forwarded packets. This improves energy efficiency and reliability over the wireless sensor network.
In this paper, we describe an FPGA H.264/AVC encoder architecture performing at real-time. To reduce the critical path length and to increase throughput, the encoder uses a parallel and pipeline architecture and all modules have been optimized with respect the area cost. Our design is described in VHDL and synthesized to Altera Stratix III FPGA. The throughput of the FPGA architecture reaches a processing rate higher than 177 million of pixels per second at 130 MHz, permitting its use in H.264/AVC standard directed to HDTV.
This document presents a method for parallel CRC computation that improves on previous approaches. It derives a recursive formula from the polynomial generator degree that allows generation of parallel CRC circuits independently of the technology used. The approach generates an F-matrix from the polynomial that is used to design circuits for processing multiple bits in parallel. Simulation results show the parallel CRC circuits can process data faster than serial LFSR approaches, with 64-bit parallel processing achieving the fastest speed. The method provides a customizable solution for building parallel CRC hardware across different polynomial generators and data widths.
A new efficient way based on special stabilizer multiplier permutations to at...IJECEIAES
BCH codes represent an important class of cyclic error-correcting codes; their minimum distances are known only for some cases and remains an open NP-Hard problem in coding theory especially for large lengths. This paper presents an efficient scheme ZSSMP (Zimmermann Special Stabilizer Multiplier Permutation) to find the true value of the minimum distance for many large BCH codes. The proposed method consists in searching a codeword having the minimum weight by Zimmermann algorithm in the sub codes fixed by special stabilizer multiplier permutations. These few sub codes had very small dimensions compared to the dimension of the considered code itself and therefore the search of a codeword of global minimum weight is simplified in terms of run time complexity. ZSSMP is validated on all BCH codes of length 255 for which it gives the exact value of the minimum distance. For BCH codes of length 511, the proposed technique passes considerably the famous known powerful scheme of Canteaut and Chabaud used to attack the public-key cryptosystems based on codes. ZSSMP is very rapid and allows catching the smallest weight codewords in few seconds. By exploiting the efficiency and the quickness of ZSSMP, the true minimum distances and consequently the error correcting capability of all the set of 165 BCH codes of length up to 1023 are determined except the two cases of the BCH(511,148) and BCH(511,259) codes. The comparison of ZSSMP with other powerful methods proves its quality for attacking the hardness of minimum weight search problem at least for the codes studied in this paper.
Report AdvancedCodingFinal - Pietro SantoroPietro Santoro
The document provides a summary of a student's laboratory sessions on advanced wireless communications. It includes:
1) Implementation of a C++ class for a sliding window soft-input soft-output decoder with binary log-likelihood ratios as input and output.
2) Simulation of a binary convolutional coded system using the SISO decoder, 2-PAM modulation over an AWGN channel. Bit error rates are computed at the input and output of the SISO decoder.
3) Comments on the SISO decoder class and simulation results for convolutional codes with rates of 1/2, 1/4, and 1/8, reporting the four bit error rates versus Eb/N0.
An Efficient Interpolation-Based Chase BCH Decoderijsrd.com
This document describes an efficient interpolation-based Chase decoder for Bose-Chaudhuri-Hocquenghem (BCH) codes. BCH codes are error-correcting codes commonly used in applications such as flash memory and digital video broadcasting. The proposed Chase decoder uses an interpolation-based approach inspired by Chase decoders for Reed-Solomon codes but modified to leverage the binary properties of BCH codes. This allows it to correct up to t + η errors, where t is the number of errors the underlying BCH code can correct and η is the number of bits flipped in the Chase algorithm. The decoder consists of syndrome calculation, key equation solving, error location via Chien search, and error correction
Detection and Classification in Hyperspectral Images using Rate Distortion an...Pioneer Natural Resources
This document summarizes an experiment that compares two methods of bit allocation for compressing hyperspectral imagery using JPEG2000: 1) the traditional high bit rate quantizer approach and 2) the rate distortion optimal (RDO) approach. The experiment shows that both methods perform well at relatively low bit rates, achieving over 96% classification accuracy. However, at very low bit rates, the RDO approach outperforms the high bit rate quantizer approach, achieving 90% accuracy at 0.0375 bpppb compared to less than 90% for the high bit rate method. The RDO approach also achieves lower mean squared error than the high bit rate quantizer approach.
The document discusses using network coding to optimize routing schemes for multicasting in mobile ad-hoc networks. It defines the problem and assumptions, such as independent information sources and specifying multicast requirements. Network coding is described as employing coding at nodes rather than just relaying data. Simulations show that calculating the optimal routing scheme is much less complex with network coding compared to conventional routing, and the network coding solution uses close to the minimum possible energy.
The proposed method, appropriately termed the Partial Intra-Prediction process, diminishes the computational complexity while maintaining comparable bit-rates and PSNR. Further, this method also offers flexibility to trade off between quality and computational complexity, as explained in Section 3.
This paper is organized as follows. In Section 2, we present an overview of the
H.264/AVC Intra-Prediction process, followed by a discussion of the several different
approaches for fast Intra-prediction in the available literature, as well as the
performance considerations. In Section 3, we discuss our proposed Partial
Intra-Prediction algorithm, followed by a discussion of the experimental results in
Section 4. Finally, Section 5 contains our concluding remarks.
2. BACKGROUND
2.1. Overview of H.264/AVC 4x4 Intra-Prediction mode decision
The H.264/AVC standard exploits the spatial correlation between adjacent
macroblocks/blocks for Intra prediction as explained below. Intra prediction is based
on the fact that these frames do not depend on earlier or later frames for
reconstruction. However, the previously encoded blocks from within the same frame
may be used as reference for new blocks. The resulting picture is referred to as an
I-picture. Fig. 1 below shows the current block X to be predicted and its spatially adjacent blocks U, V, and W, which are the left, up, and up-right encoded 4x4 blocks respectively.
In H.264/AVC Intra prediction is performed at two levels hierarchically. For
coding the luma component, one 16x16 macro-block may be predicted as a whole
using four Intra_16x16 modes, or each 4x4 sub-block of the 16x16 macro-block may
be predicted individually using a total of nine possible Intra_4x4 modes; prediction at
the level of 4x4 blocks is beneficial when the correlation is more localized. Intra
prediction for the chroma component uses similar techniques as those for luma
Intra_16x16 prediction. The different prediction modes generally correspond to different edge orientations, and the closer the current block X is to a prediction block P, the more likely the current block has an edge oriented in that direction. This gives a smaller residue, which in turn needs fewer bits to code.
Fig. 2 shows a 4x4 block containing 16 pixels labeled a through p. A prediction block P is calculated based on the pixels labeled A-M obtained from the neighboring blocks.
Fig. 1: Adjacent blocks of current 4x4 block
Fig. 2: A 4x4 block and its neighboring pixels
Fig. 3: Directions of the nine 4x4 intra-prediction modes
A prediction mode is a way to generate these 16 predictive pixel
values using some or all of the neighboring pixels in nine different directions as
shown in Fig. 3. Note that in some cases, not all of the samples A-M are available
within the current slice. In order to preserve independent decoding of slices, only
samples within the current slice are used for prediction. DC prediction is modified
depending on which samples among A-M are available; the other modes may only be
used if all of the required prediction samples are available (except that, if E, F, G and
H are not available, their value is copied from sample D).
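The availability rule above can be made concrete with a small sketch. The snippet below is a minimal illustration (not taken from the standard or the JM reference software) of the quoted substitution of sample D for the unavailable up-right samples E-H; the function name and the list layout are hypothetical.

```python
# A minimal sketch of the neighbor-availability rule quoted above: when the
# up-right samples E-H are not available, their values are copied from D.
def pad_up_right(top_row, up_right):
    """top_row = [A, B, C, D]; up_right = [E, F, G, H], or None if unavailable."""
    if up_right is None:
        up_right = [top_row[3]] * 4      # copy D into E, F, G and H
    return top_row + up_right

print(pad_up_right([100, 102, 104, 106], None))
# -> [100, 102, 104, 106, 106, 106, 106, 106]
```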
Fig. 4: Calculation of nine 4x4 intra-prediction modes
Fig. 4 shows how the nine intra prediction modes are designed in a directional
manner. Mode 0 is the vertical prediction mode in which pixels a, e, i, and m are
predicted by A and so on. Mode 1 is the horizontal prediction mode in which pixels a,
b, c, and d are predicted by I and so on. Mode 2 is called DC prediction in which all
pixels (a to p) are predicted by (A+B+C+D+I+J+K+L)/8. For modes 3-8, the
predicted samples are formed from a weighted average of the prediction samples A-M
as shown in the figure. The encoder finally selects the best prediction mode for each block using rate-distortion optimization (RDO), which minimizes the residual between the encoded block X and its prediction P. The residue in every prediction mode is transformed, quantized, and entropy coded, and then entropy decoded, de-quantized, and inverse transformed exactly as in a conventional decoder, and the decoded residue is used to obtain the reconstructed block X'. The sum of absolute differences between X and X' then gives the RD cost. With Intra_4x4 prediction, the prediction error may be substantially smaller due to the small block size, although a much larger number of numerical operations is required because of the larger number of blocks and the possibility of nine different directional predictions [3].
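As an illustration of the three simplest prediction modes described above, the following sketch forms the 4x4 prediction block P for modes 0, 1 and 2 from the neighboring pixels A-D and I-L. It is a minimal Python sketch, not the JM reference implementation; the DC rounding simply follows the /8 in the text, and the function name is hypothetical.

```python
# A minimal sketch of how the three simplest Intra_4x4 prediction blocks are
# formed from the neighboring pixels A-M described above. Only modes 0, 1 and 2
# are shown; the diagonal modes 3-8 use weighted averages of A-M as noted in the text.
import numpy as np

def predict_4x4(mode, top, left):
    """top  = [A, B, C, D] (pixels above the block)
       left = [I, J, K, L] (pixels to the left of the block)"""
    P = np.zeros((4, 4), dtype=np.int32)
    if mode == 0:                        # vertical: each column copies A, B, C or D
        P[:] = np.array(top)
    elif mode == 1:                      # horizontal: each row copies I, J, K or L
        P[:] = np.array(left).reshape(4, 1)
    elif mode == 2:                      # DC: mean of the eight neighbors, per the text
        P[:] = (sum(top) + sum(left)) // 8
    else:
        raise NotImplementedError("modes 3-8 use weighted averages of A-M")
    return P

# Example: a flat top row and a graded left column
print(predict_4x4(0, top=[100, 102, 104, 106], left=[90, 92, 94, 96]))
```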
Since the choice of prediction modes for the chroma components is independent of that for the luma components, H.264/AVC encodes the MB by iterating all the luma intra decisions for each possible chroma intra prediction mode to find the best coding efficiency. Therefore the number of mode combinations for the luma and chroma components in an MB is M8 x (M4 x 16 + M16), where M8, M4, and M16 represent the number of modes for chroma, Intra_4x4, and Intra_16x16 predictions respectively. As a result, for an MB, the encoder has to perform 4 x (9 x 16 + 4) = 592 different RDO calculations before the best RDO mode is determined, which makes the intra-prediction process one of the computational bottlenecks in the H.264/AVC encoder.
2.2. Current Approaches to fast Intra-Prediction
Broadly, the solution to the mentioned problem is attempted in two ways.
I. Reduce candidate modes: This approach tries to restrict the number of possible
candidate modes by extracting the correlation characteristics of the current block with
respect to its neighbor blocks through some efficient and simple computations [4-10].
II. Partial cost function: This approach aims at arriving at the best mode through only partial computation of the complex RD cost function instead of the whole [11-14].
There is yet another possible classification of fast intra-Prediction algorithms
based upon the domain of operation.
2.2.1. Pixel Domain Techniques
Threshold based: In [4] [5], a threshold, based on neighboring costs, is used to
achieve early termination in the cost computation of different modes, which makes it
content dependent. A better attempt is made in [6], which overcomes the disadvantage
of thresholds, though it could bring down the number of candidate modes to only six.
Local edge direction based: In [7] [8], an edge direction histogram is used, which
introduces a lot of additional complexity in the form of pre-calculation cost. [9] [10]
propose a simpler computation of the dominant edge direction within a block.
However, the assumption of a dominant edge direction does not hold in every block. This leaves more candidate modes for further search when no dominant edge is present.
2.2.2. Frequency Domain Techniques
In contrast, frequency domain techniques are based on the inherent capability of
Discrete Cosine Transform (DCT) bases to exploit the directional feature of the input
block naturally. Since the DCT coefficients are calculated anyway, this does not add any additional cost. [11] [12] cater to this argument, though they differ in algorithms. However, a careful analysis and experimentation on DCT bases (basic tile patterns [13]) reveals that the above argument holds only for the first few direct modes like 0, 1, 3 and 4, and not for all directional predictions. It also fails completely if there is no dominant edge, degrading PSNR badly. In [14] a complete transform domain cost function is presented. In [15-17] the correlation between the optimal coding mode decisions of temporally adjacent pictures is exploited to reduce the computational load of the encoding, although the computational saving is not large.
Partial cost computation is effective only in the frequency domain, as the DCT can shift the intensity of a whole block to the first few coefficients. However, in [18], a very simple cost function is used to extract the local dominant edge direction.
3. PROPOSED ALGORITHM
A careful analysis of the intra prediction calculation process reveals that the main reasoning on which most of the above methods are based is somewhat imprecise.
These methods are based on the claim that the dominating direction of a smaller block
is similar to that of the bigger macro-block. Another implicit assumption is that the
directional correlation of each block is consistent with directions of the edges and that
the prediction modes of each block are also correlated with those of neighboring
modes. It is also interesting to note that most of these methods assume that the edge
direction is very uniform. Thus, they pre-calculate the cost function for only a few
modes like 0, 1, 3, and 4 and further evaluate directions like 5, 6, 7 and 8 if there is no
clear distinction in those costs. Consequently, no method has any cost calculation for
mode 2; it is either used all the time or is taken into account when the cost is almost
distributed equally within the first four directions.
The algorithm proposed here overcomes all of the shortcomings of the above
methods. Broadly speaking it is based on the spatial symmetry in the intra prediction
process and uses pixel to pixel correlation as the basis for obtaining a subset of
candidate modes in contrast to local edge correlation used in all the above methods.
This way it doesn’t assume a dominant edge in the block and has a separate and simple
pre-calculation cost for all modes including mode 2. Additionally, this method also
offers flexibility to trade off between quality and computational complexity, whereby we can choose the number of candidate modes based on the PSNR degradation we are willing to accept. Accordingly, this method is termed the Partial Intra-Prediction process.
Fig. 6: 2x2 Sub-blocks
To fully understand the proposed algorithm we first look at the inherent symmetry of
the Intra Prediction process. This symmetry is then utilized in developing low overhead
cost measures which allow us to substantially reduce the range of possible modes for
optimal prediction using Rate Distortion Optimization.
Fig. 5: Inherent Symmetry and Pixel Correlation in different intra-prediction modes
3.1. Inherent Symmetry in Intra-prediction Modes
Let us take a closer look at Fig. 5 which depicts the inherent symmetry in the
various intra-prediction modes. Assume that this figure shows the prediction blocks of
all modes of the current block indexed with respective mode numbers. From this
representation the pixel to pixel correlation can be easily observed in each mode. For
example, in Mode 0- the Vertical prediction mode, due to the symmetry, we can see that
the pixels in each column will have the same predicted pixel value. Similarly, in Mode
1 – the Horizontal mode, the pixels in each row will have the same predicted value.
Subsequently, such symmetry observations can be extended to all the different modes.
Now, consider eight 2x2 sub-blocks in the original 4x4 block X at different positions, as shown in Fig. 6. They are labeled aj, where j indexes the position of the sub-block. Let mij represent the corresponding sub-block in the i-th prediction mode at the j-th position. We use the same notation for the sub-block averages as well. Thus, for Mode 0 with vertical symmetry, we can see that m01 is the same as m03, and m02 is the same as m04. Similar symmetry exists in all modes, as summarized in Table 1 below.
Table 1: Inherent Symmetry in Intra-prediction modes
Mode 0: symmetry m01=m03; m02=m04 (or m08=m06); cost measure = |a1 - m01| + |a2 - m02| + |a3 - m01| + |a4 - m02|
Mode 1: symmetry m11=m12; m13=m14 (or m15=m17); cost measure = |a1 - m11| + |a3 - m12| + |a2 - m11| + |a4 - m12|
Mode 2: symmetry m21=m22=m23=m24 (or m25=m26=m27=m28); cost measure = |a1 - m21| + |a2 - m21| + |a3 - m21| + |a4 - m21|
Mode 3: symmetry m32=m33; cost measure = ( |a2 - m31| + |a3 - m31| ) x 2
Mode 4: symmetry m41=m44; cost measure = ( |a1 - m41| + |a4 - m41| ) x 2
Mode 5: symmetry m51=m56 (or m58=m54); cost measure = |a1 - m51| + |a6 - m51| + |a8 - m52| + |a4 - m52|
Mode 6: symmetry m61=m67 (or m65=m64); cost measure = |a1 - m61| + |a7 - m61| + |a5 - m62| + |a4 - m62|
Mode 7: symmetry m72=m76 (or m78=m73); cost measure = |a2 - m71| + |a6 - m71| + |a8 - m72| + |a3 - m72|
Mode 8: symmetry m85=m82 (or m83=m87); cost measure = |a5 - m81| + |a2 - m81| + |a3 - m82| + |a7 - m82|
3.2. Spatial domain feature
We use a spatial domain feature to filter out unlikely mode candidates. For each prediction mode, we can compute the sum of absolute differences (SAD) between the true and the predicted pixel values as a spatial domain feature.
3.3. Partial Prediction Process
Using the above symmetry in the different prediction modes, from Table 1, we now simplify the intra prediction process as follows. We try to obtain a close approximation to the SAD cost function in each mode (which is just the sum of absolute differences between the 16 pixels in X and P) in terms of those sub-blocks which come under symmetry in the respective prediction modes, as listed in Table 1 above.
In mode 0, the sum of absolute differences between a1 and m01, a2 and m02, a3 and m03, and a4 and m04 gives the cost. However, since m01=m03 and m02=m04 in mode 0 due to the symmetry, we can simply calculate the cost function as |a1 - m01| + |a2 - m02| + |a3 - m01| + |a4 - m02|, where |x| stands for the absolute value of x. Similarly, the cost is calculated in all other modes by exploiting the existing symmetry in the corresponding prediction mode, as listed in Table 1. The pixels which are not covered under symmetry (as in modes 3-8) are taken care of by replicating the pixel differences that are covered under symmetry. The fact that the intensity variations within a block are not rapid from pixel to pixel, and that these neighboring pixels are similarly predicted from the same adjacent pixels, justifies this assumption.
3.4. Pre-Calculation Costs
Note that all of the mij measures can be calculated from 13 pixels A-M from
neighboring blocks using the corresponding pixel values in respective positions in
each prediction block depending on the mode. The cost function is simplified by using
averages of sub-blocks instead of pixels directly. It is further simplified by the
symmetry observed in Table 1, which brings down the number of 'm' measures that are to be pre-computed in each mode. As an example, mode 0 requires only m01 and m02 to be computed, as (A+B)/2 and (C+D)/2 respectively, instead of the four measures m01 to m04. Modes 2-4 require only one such measure to be computed, and the rest of the modes require at most two such measures. So a total of 15 measures are required to be
computed in addition to 8 averages aj in the original block to calculate the cost
function for all modes. This is the total pre-calculation cost for a block. To further cut
down the pre-calculation cost, we avoid divisions by considering only additions of
pixels in sub-blocks rather than averages. This final pre-calculation cost is
insignificant compared to the computational gain achieved as discussed below.
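The pre-calculation described in Sections 3.3 and 3.4 can be sketched as follows for a few modes. Since Fig. 6 is not reproduced here, the 2x2 sub-block layout (a1..a4 taken as the four quadrants of X) and the mode-1 neighbor grouping are assumptions, and only modes 0, 1 and 2 are shown; following Section 3.4, sums of 2x2 sub-blocks are used instead of averages to avoid divisions.

```python
# A minimal sketch of the pre-calculation step for modes 0, 1 and 2 only.
# Assumptions: a1..a4 are the four quadrants of X (Fig. 6 is not shown here),
# and mode 1 groups its measures by rows from I, J and K, L. Both the a and m
# quantities are kept as 2x2 sums, so no divisions are needed.
import numpy as np

def subblock_sums(X):
    """2x2 sub-block sums a1..a4 for the quadrants of the 4x4 block X."""
    return {1: X[0:2, 0:2].sum(), 2: X[0:2, 2:4].sum(),
            3: X[2:4, 0:2].sum(), 4: X[2:4, 2:4].sum()}

def partial_costs(X, top, left):
    """Table 1 cost measures for modes 0, 1 and 2 (top=[A,B,C,D], left=[I,J,K,L])."""
    a = subblock_sums(X)
    A, B, C, D = top
    I, J, K, L = left
    # mode 0 (vertical): m01 ~ 2*(A+B), m02 ~ 2*(C+D) as 2x2 sub-block sums
    m01, m02 = 2 * (A + B), 2 * (C + D)
    cost0 = abs(a[1] - m01) + abs(a[2] - m02) + abs(a[3] - m01) + abs(a[4] - m02)
    # mode 1 (horizontal): assumed grouping, top rows from I, J and bottom rows from K, L
    m11, m13 = 2 * (I + J), 2 * (K + L)
    cost1 = abs(a[1] - m11) + abs(a[2] - m11) + abs(a[3] - m13) + abs(a[4] - m13)
    # mode 2 (DC): one measure, the DC value scaled to a 2x2 sub-block sum
    m21 = (A + B + C + D + I + J + K + L) // 2     # 4 * mean of the 8 neighbors
    cost2 = sum(abs(a[j] - m21) for j in (1, 2, 3, 4))
    return {0: cost0, 1: cost1, 2: cost2}

X = np.arange(16).reshape(4, 4) + 100              # toy 4x4 block
print(partial_costs(X, top=[100, 101, 102, 103], left=[100, 104, 108, 112]))
```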
3.5. Significant and Flexible Configuration: Quality vs. Complexity Tradeoffs
The prediction mode giving the least cost can ideally be chosen as the best mode. However, to pay only a small price for our simplification without much loss in visual quality, we may take the first few least-cost modes in increasing order and obtain the best mode among them using the RDO process. Our experimental results in Section 4 suggest 3 to be the best number of candidate modes (mode_count), giving a good compromise between computational saving and PSNR loss. This parameter mode_count gives us the control to trade off between gain and loss efficiently. We may also apply a threshold to the cost differences from Table 1 to choose the number of candidate modes, which will give a non-integer value for the average mode_count. Even a mode_count of 3 reduces the total RDO calculations to as low as 4 x (3 x 16 + 4) = 208, which is just about 35% of the total computations, i.e., ideally a computational saving of 65%.
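The counting above can be checked with a few lines; the function name below is hypothetical and simply evaluates the combination formula quoted in Section 2.1.

```python
# A small check of the RDO-count arithmetic quoted in the text, with M8 = 4
# chroma modes, M4 Intra_4x4 modes and M16 = 4 Intra_16x16 modes.
def rdo_evaluations(m8=4, m4=9, m16=4):
    return m8 * (m4 * 16 + m16)

full = rdo_evaluations(m4=9)        # 592 for the exhaustive search
reduced = rdo_evaluations(m4=3)     # 208 with mode_count = 3
print(full, reduced, f"{100 * (1 - reduced / full):.0f}% fewer RDO evaluations")  # ~65%
```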
3.6. Criteria for further reducing ‘mode_count’
In [19], the authors present a detailed analysis of mode filtering using SAD. We use two of the reported results to further reduce the mode_count from 3 to 1.
1. If the smallest SAD cost value is lower than 50, we may directly use the least-cost mode with a probability of error lower than 10%, and the cumulative histogram shows that approximately 38% of blocks fall into this event. The error probability is defined as

Prob(e | f(mSAD)) = Prob(mSAD ≠ mRDO | f(mSAD))    (1)

where mSAD and mRDO are the modes selected based on the optimal SAD and RDO criteria respectively, and f(mSAD) represents the SAD value of mode mSAD.
2. Further, we may consider higher modes if the corresponding mode cost is below a certain threshold. This is formulated as below using a parameter T:

Prob(e | T) = Prob(mSAD ≠ mRDO | T), with T = (SAD2 - SAD1) / SAD1    (2)

where SAD1 and SAD2 are the smallest and second smallest SAD cost values. For T larger than 38%, we can skip the next modes with an error probability of less than 10%.
3.7. Summary of the Proposed Algorithm:
Let the input block be X and A to M be the pixels from neighboring blocks
available for prediction.
1. Calculate the sub-block averages aj for the current block X as in Fig. 6.
2. Calculate the averages mij in each mode from the pixels A-M for those sub-blocks covered under symmetry, as listed in Table 1.
3. Calculate the cost measure of each mode as shown in Table 1.
4. Sort the modes and pick the three or fewer candidate modes giving the least cost, in ascending order, for further evaluation (see the sketch below).
a. If the least SAD is less than 50, choose only the least-cost mode; mode_count = 1.
b. Further include the second and third modes only if the corresponding parameters (T) exceed 0.38, as in Section 3.6. This will give a mode_count of 2 or 3 at most.
c. The most_probable_mode may also be added to the candidate modes list if it was not already selected through the pre-calculations.
5. Now do the exact RDO evaluation for these few candidate modes to obtain the
best prediction mode for the macroblock.
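A minimal sketch of step 4 above is given below. The thresholds (a SAD of 50 and T = 0.38) follow the text; the exact definition of T as the relative gap to the smallest cost and the direction of the comparison follow the reconstructed equation (2) in Section 3.6 and should be read as assumptions, as are the function and parameter names.

```python
# A minimal sketch of the candidate-mode selection in step 4 of the summary.
# Assumption: T is the relative gap between a mode's cost and the smallest
# cost, and additional modes are kept only while this gap stays small.
def select_candidates(costs, most_probable_mode=None, sad_threshold=50, t_threshold=0.38):
    """costs: dict {mode: pre-calculated cost measure} for the nine modes."""
    ordered = sorted(costs, key=costs.get)            # modes in ascending cost order
    best = ordered[0]
    candidates = [best]
    if costs[best] >= sad_threshold:                  # step 4(a): otherwise keep only the least-cost mode
        for mode in ordered[1:3]:                     # step 4(b): consider the 2nd and 3rd cheapest modes
            t = (costs[mode] - costs[best]) / max(costs[best], 1)
            if t <= t_threshold:                      # assumed reading of the T criterion
                candidates.append(mode)
    if most_probable_mode is not None and most_probable_mode not in candidates:
        candidates.append(most_probable_mode)         # step 4(c)
    return candidates                                 # these few modes go to the exact RDO step 5

print(select_candidates({0: 120, 1: 130, 2: 45, 3: 300, 4: 210, 5: 400, 6: 260, 7: 380, 8: 390}))
print(select_candidates({0: 120, 1: 130, 2: 145, 3: 300, 4: 210, 5: 400, 6: 260, 7: 380, 8: 390}))
```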
3.8. Advantages of this approach:
1. Significant computational saving (ideally 65%) with minimal PSNR loss.
2. Pre-calculation involves only additions and subtractions, unlike many of the earlier methods which used complex calculations for edge detection.
3. A separate pre-calculation cost is provided for mode 2, which no earlier method has.
4. No assumption of a dominant edge in blocks. Instead of edge correlation, it directly aims at pixel correlation, and hence overcomes many of the previous shortcomings.
5. Flexibility to trade off between computational gain and PSNR loss. In addition, the most probable mode, predicted from the modes of the adjacent already-predicted blocks, may be added to the candidate modes list if it is not already present.
4. EXPERIMENTAL RESULTS
The proposed method is first evaluated in Matlab through computer simulations, in comparison to the reference full-search (FS) H.264/AVC intra prediction with the SAD cost function. Standard grayscale test images such as peppers, lena, goldhill, house, and bird are used for our simulations. Results are shown in Table 4.1 for mode_count exactly set to 3 and for mode_count less than 3. The results show significant improvement compared to the fast three-step method [6] and the DCT coefficients method [12]. Both methods degrade PSNR badly, by almost 0.5 to 1 dB, for little saving compared to SAD. The proposed method achieves a better match to SAD than both of them, with fewer candidate modes and minimal PSNR loss.
Table 4.1: Partial Prediction Approach applied to test images

Image        | FS with SAD: Modes, PSNR (dB) | Proposed (mode_count = 3): Modes, % Match, dB loss | Proposed (mode_count < 3): Modes, % Match, dB loss
peppers.png  | 9, 26.9100 | 3, 91.5527, 0.0241 | 2.6768, 90.6250, 0.1753
house.png    | 9, 29.9157 | 3, 81.4453, 0.0606 | 2.7942, 81.2500, 0.0902
bird.gif     | 9, 30.7278 | 3, 82.0313, 0.0302 | 2.8647, 81.8604, 0.0553
camera.gif   | 9, 24.4249 | 3, 84.4238, 0.0705 | 2.7505, 83.2764, 0.3193
goldhill.gif | 9, 25.1679 | 3, 87.5488, 0.0837 | 2.7292, 86.8164, 0.1291
lena.gif     | 9, 26.2605 | 3, 89.7949, 0.0626 | 2.6523, 89.1846, 0.1160

The proposed algorithm is then implemented in JM10.1 provided by [2]. In our experiments, RD optimization is enabled, the Hadamard transform is used, the initial QP is 28, and the intra period is set to 1. 16x16 intra prediction is disabled. The test sequences include Foreman, Container, Walk, News, and Salesman in QCIF (4:2:0) format and Foreman, Bus, Flower, Mobile, and Waterfall in CIF (4:2:0) format.
The length of each test sequence is 300 frames and all the frames are encoded using
I-frame coding. Tables 4.2 and 4.3 list the simulation results of the proposed algorithm compared to full search in terms of three measures: the PSNR loss in dB (∆dB), and the bit-rate (∆B(%)) and computation time (∆T(%)) changes as percentages. The simulations are run on different sequences under the following four test conditions.
Proposed 1: with mode_count=3
Proposed 2: with mode_count<3, filtering using 4(b) in 3.7.
Proposed 3: with mode_count<3, filtering using 4(a) in 3.7.
Proposed 4: with mode_count<3, filtering using both.
The results for the QCIF sequences are given in Table 4.2 and those for the CIF sequences in Table 4.3 below.
The performance of our proposed algorithm is evident from the presented results. The number of candidate modes is reduced to the fewest compared to many of the fast mode decision methods [4-7]. With the three least-cost modes selected on average, our method achieves about a 50% reduction in complexity with a minimal PSNR loss of 0.06 dB on average. The saving increases to 62% when further mode filtering is done, with at most 0.2 dB loss in PSNR. [4-7] reported savings of less than 35%. Very little computation time is spent in pre-calculations (5-15%), compared to most of the edge direction histogram methods [8-10], which consume a considerable amount of the total computation. Also, the fact that the presence of an edge is not guaranteed in every block explains the significant PSNR loss, above 0.2 dB, in these methods, apart from their slower speedup. The DCT domain techniques [11, 12] also lose significantly on quality for the same reason. The Matlab results measure how well the proposed method matches the FS reference method using the SAD cost function; the match is always above 80%, with an average PSNR loss of about 0.06 dB for mode_count = 3. Overall, the proposed method gives better results than many earlier methods, with significant computational saving, minimal PSNR loss, and a small bit-rate increase. The proposed method also gives complete control to trade off between quality and computational saving.
Table 4.2: Partial Prediction Algorithm in JM 10.1 with QCIF sequences

Sequence  | Proposed 1: ∆T(%), ∆dB, ∆B(%) | Proposed 2: ∆T(%), ∆dB, ∆B(%) | Proposed 3: ∆T(%), ∆dB, ∆B(%) | Proposed 4: ∆T(%), ∆dB, ∆B(%)
foreman   | -49.92, -0.04, +2.18 | -57.78, -0.1, +6.32 | -59.28, -0.14, +8.43 | -62.93, -0.17, +9.72
container | -49.62, -0.06, +3.85 | -56.44, -0.11, +6.63 | -58.84, -0.13, +9.4 | -62.45, -0.16, +9.37
news      | -49.89, -0.08, +4.08 | -56.86, -0.14, +6.73 | -58.4, -0.16, +8.42 | -62.08, -0.2, +9.77
salesman  | -49.84, -0.09, +3.07 | -57.06, -0.14, +6.394 | -58.06, -0.17, +8.64 | -61.2, -0.2, +9.56
walk      | -49.9, -0.07, +3.32 | -56.76, -0.14, +6.078 | -57.14, -0.15, +7.55 | -61.382, -0.19, +8.77

Table 4.3: Partial Prediction Algorithm in JM 10.1 with CIF sequences

Sequence  | Proposed 1 (mode_count = 3): ∆T(%), ∆dB, ∆B(%) | Proposed 4 (mode_count < 3): ∆T(%), ∆dB, ∆B(%)
foreman   | -49.89, -0.04, +4.37 | -62.72, -0.15, +9.26
bus       | -51.23, -0.09, +2.89 | -61.74, -0.18, +7.84
mobile    | -51.34, -0.14, +1.88 | -60.76, -0.22, +4.58
flower    | -50.8, -0.12, +1.91 | -60.04, -0.24, +4.25
waterfall | -49.87, -0.1, +2.34 | -58.7, -0.19, +6.41
5. CONCLUSIONS
In this paper, a fast intra-prediction mode decision method for H.264/AVC is proposed based on pixel-to-pixel correlation and the symmetry inherent in the intra-prediction process, in contrast to existing work which relies on the correlation of local edges to prediction mode directions and assumes that each block carries a dominant edge. The proposed approach thus overcomes the shortcomings of the earlier methods and gives much better quality at given bit-rates, with a PSNR loss as low as 0.05 dB. It also avoids the relatively complex computations required for extracting the dominant edge, as in most earlier methods, and limits the pre-calculations to a set of additions and subtractions. There is a significant overall speed-up (about 50%) while maintaining a similar bit-rate (within about 3.5%) and PSNR (within about 0.1 dB). The results verify the success of the proposed method. Moreover, the approach offers the user flexibility to trade off between quality and computational complexity.
6. REFERENCES
1. Information Technology - Coding of Audio-Visual Objects - Part 10: Advanced Video Coding,
Final Draft International Standard, ISO/IEC FDIS 14496-10, March 2005.
2. JVT Reference Software JM10.1, http://iphome.hhi.de/suehring/tml/
3. Iain E. G. Richardson, H.264 and MPEG-4 Video Compression – Video Coding for
Next-generation Multimedia, John Wiley & Sons, ISBN: 978-0-470-84837-1, 2003.
4. B. Meng, and O.C. Au, “Fast Intra-prediction Mode Selection for 4x4 blocks in H.264,” Proc. of
2003 IEEE Int. Conf. on Acoustics, Speech, and Signal Processing, ICASSP03, April 2003.
5. B. Meng, O.C. Au, C-W. Wong and H-K. Lam, “Efficient Intra-prediction Mode Selection for 4x4
blocks in H.264,” Proc. of IEEE Int. Conf. on Multimedia and Expo, ICME03, vol. 3, July 2003.
6. C-C. Cheng and T-S. Chang, “Fast Three Step Intra Prediction Algorithm for 4x4 blocks in H.264,”
Proc. of IEEE International Symposium on Circuits and Systems ISCAS 2005, May 2005.
7. F. Pan, et al., “Fast Mode Decision for Intra Prediction,” JVT-G013, 7th Meeting, Pattaya II,
Thailand, 7-14 March, 2003.
8. F. Pan, et al., “A Directional Field Based Fast Intra Mode Decision Algorithm for H.264 Video
Coding,” Proc. of 2004 IEEE Int. Conf. Multimedia & Expo, ICME04, 2004.
9. Y-D. Zhang, F. Dai, and S-X. Lin, “Fast 4x4 Intra-prediction Mode Selection for H.264,” Proc. of
2004 IEEE Int. Conf. Multimedia & Expo, ICME04, 2004.
10. J-F. Wang, et al., “A Novel Fast Algorithm for Intra Mode Decision in H.264/AVC Encoders,”
Proc. of 2006 IEEE Int. Symposium on Circuits and Systems, ISCAS 2006, May 2006.
11. T. Tsukuba, et al., “H.264 Fast Intra-prediction Mode Decision Based on Frequency
Characteristic,” Proc. of European Signal Processing Conf. EUSIPCO2005, 2005.
12. T. Hattori and K. Ichige, “Intra-prediction Mode Decision in H.264/AVC using DCT
Coefficients,” Proc. Int. Symposium on Intelligent Signal Processing & Communication Systems,
ISPACS 2006, Dec 2006.
13. B. Shen and I.K. Sethi, “Direct Feature Extraction from Compressed Images,” Proc. IS&T/SPIE
Conf. Storage and Retrieval for Image and Video Databases, Jan 1996.
14. J. Xin, A. Vetro and H. Sun, “Efficient Macroblock Coding-mode Decision for H.264/AVC Video
Coding,” Proc. of the 24th Picture Coding Symposium PCS '04, December 2004.
15. X. Lu, et al., “Fast Mode Decision and Motion Estimation for H.264 with a Focus on
MPEG-2/H.264 Transcoding,” Proc. of 2005 IEEE International Symposium on Circuits and
Systems ISCAS 2005, vol. 2, pp. 1246–1249, Kobe, Japan, May 2005.
16. M-C. Hwang, et al., “Fast Intra Prediction Mode Selection Scheme Using Temporal Correlation in
H.264,” Proc. IEEE TENCON 05, Region 10, Nov 2005.
17. C-C. Lien and C-P. Yu, "A Fast Mode Decision Method for H.264/AVC Using the Spatial-Temporal Prediction Scheme," Proc. of 18th IEEE Int. Conf. on Pattern Recognition ICPR '06, Volume 04, pp. 334-337, August 2006.
18. A. Elyousfi, et al., "Fast Mode Decision Algorithm for Intra prediction in H.264/AVC Video Coding," IJCSNS Int. Journal of Computer Science and Network Security, Vol. 7, No. 1, Jan 2007.
19. C. Kim, H. Shih, and C. J. Kuo, "Fast H.264 Intra prediction Mode selection using Joint spatial and Transform domain features," Elsevier Science, 25 April 2004.