IJERA (International Journal of Engineering Research and Applications) is an international online, ... peer-reviewed journal. For more details or to submit your article, please visit www.ijera.com
Highly Parallel Pipelined VLSI Implementation of Lifting Based 2D Discrete Wa...idescitation
The lifting scheme based Discrete Wavelet
Transform is a powerful tool for image processing
applications. Limited capacity for transmitting and
storing images drives the demand for high speed
implementations of efficient compression techniques. This paper
proposes a highly pipelined and distributed VLSI architecture
for lifting based 2D DWT with lifting coefficients represented
in fixed point [2:14] format. Compared to conventional
architectures [11], [13]-[16], the proposed highly pipelined
architecture significantly increases processing speed.
The design raises the
operating frequency at the expense of more hardware area.
A software model of the proposed design
was first developed using MATLAB®. Corresponding to this
software model, an efficient highly parallel pipelined
architecture was designed in Verilog HDL
and implemented on a Virtex®-6 (XC6VHX380T)
FPGA. The design was also synthesized with the TSMC 0.18 μm
ASIC library using Synopsys Design Compiler. The entire
system is suitable for several real time applications.
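As a rough illustration of the fixed point [2:14] format mentioned above, the sketch below assumes it denotes a 16-bit signed word with 2 integer bits (sign included) and 14 fractional bits; the abstract does not spell this out. The coefficient is the first lifting constant of the widely used CDF 9/7 filter, chosen only as an example because the abstract does not name the filter. All function names are hypothetical.

```python
FRAC_BITS = 14                      # assumed number of fractional bits
SCALE = 1 << FRAC_BITS              # 2**14 = 16384
Q_MIN, Q_MAX = -(1 << 15), (1 << 15) - 1   # 16-bit signed word range


def to_fixed(value):
    """Quantize a real coefficient to the assumed [2:14] format."""
    q = round(value * SCALE)
    if not Q_MIN <= q <= Q_MAX:
        raise OverflowError("coefficient does not fit in [2:14]")
    return q


def from_fixed(q):
    """Convert a [2:14] word back to a float."""
    return q / SCALE


def fixed_mul(sample, coeff_q):
    """Multiply an integer sample by a [2:14] coefficient.

    The product carries 14 fractional bits, so an arithmetic right
    shift (floor) brings it back to integer scale.
    """
    return (sample * coeff_q) >> FRAC_BITS


ALPHA = -1.586134342                # CDF 9/7 lifting constant (illustrative only)
alpha_q = to_fixed(ALPHA)
print(alpha_q, from_fixed(alpha_q), fixed_mul(100, alpha_q))
```

With 14 fractional bits the rounding error of a stored coefficient is at most 2^-15, which is why the quantized value stays close to the real one.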
A Detailed Survey on VLSI Architectures for Lifting based DWT for efficient h...VLSICS Design
Evaluating previous work is an important part of developing new hardware-efficient methods for implementing the DWT through lifting schemes. The aim of this paper is to review VLSI architectures for efficient hardware implementation of wavelet lifting schemes. The inherent in-place computation of the lifting scheme gives it many advantages over the conventional convolution-based DWT. The architectures are classified as parallel filter, row-column, folded, flipping, and recursive structures. The two image scanning methods, line-based and block-based, are described along with their characteristics for a given application. The architectures are analyzed in terms of the hardware and timing complexity involved for a given input image size and required number of decomposition levels. This study is useful for deriving efficient methods that improve the speed and hardware complexity of existing architectures and for designing new hardware implementations of multilevel DWT using lifting schemes.
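The in-place property of lifting can be sketched in a few lines. The example below uses the reversible 5/3 integer filter as one common choice; the survey itself covers many filters and architectures, so the filter, the symmetric boundary handling, and the function names are assumptions made for illustration.

```python
def lift53_forward(x):
    """One level of 5/3 integer lifting, computed in place.

    After the call, odd positions hold detail (high-pass) values and
    even positions hold approximation (low-pass) values.
    The length of x must be even and at least 2.
    """
    n = len(x)
    # Predict step: each odd sample minus the floor-average of its even neighbours.
    for i in range(1, n, 2):
        right = x[i + 1] if i + 1 < n else x[i - 1]   # symmetric extension at the end
        x[i] -= (x[i - 1] + right) >> 1
    # Update step: each even sample plus a rounded quarter of neighbouring details.
    for i in range(0, n, 2):
        left = x[i - 1] if i > 0 else x[i + 1]        # symmetric extension at the start
        x[i] += (left + x[i + 1] + 2) >> 2
    return x


def lift53_inverse(x):
    """Undo lift53_forward by running the steps in reverse with opposite signs."""
    n = len(x)
    for i in range(0, n, 2):
        left = x[i - 1] if i > 0 else x[i + 1]
        x[i] -= (left + x[i + 1] + 2) >> 2
    for i in range(1, n, 2):
        right = x[i + 1] if i + 1 < n else x[i - 1]
        x[i] += (x[i - 1] + right) >> 1
    return x


signal = [3, 7, 1, 8, 2, 9, 4, 6]
coeffs = lift53_forward(list(signal))
print(coeffs, lift53_inverse(list(coeffs)) == signal)
```

No second buffer is needed: each step overwrites samples of one parity using only samples of the other parity, which is also what makes the transform exactly invertible.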
Reconfigurable CORDIC Low-Power Implementation of Complex Signal Processing f...Editor IJMTER
This document describes a proposed low-power CORDIC-based DCT architecture that prioritizes processing of low-frequency DCT coefficients over high-frequency coefficients to reduce power consumption with minimal image quality degradation. It uses a look-ahead CORDIC approach to allow varying the number of CORDIC iterations for different coefficients. Experimental results show the proposed architecture achieves 38.1% area and power savings compared to DA-based DCT, with comparable power to MCM-based DCT but using 100% less area and a minor 0.04dB quality loss.
High Speed and Area Efficient 2D DWT Processor Based Image Compressionsipij
The document describes a proposed high speed and area efficient 2D discrete wavelet transform (DWT) processor design for image compression applications implemented on FPGAs. The design uses a pipelined partially serial architecture to enhance speed while optimally utilizing FPGA resources. Simulation results show the design operating at 231MHz on a Spartan 3 FPGA, a 15% improvement over alternative designs. Resource utilization and speed are improved compared to previous implementations through the optimized DWT processor architecture and FPGA platform choice.
Implementation and Optimization of FDTD Kernels by Using Cache-Aware Time-Ske...Serhan
The document presents a thesis on implementing and optimizing cache-aware time-skewing algorithms for FDTD kernels to reduce cache misses and processor idle time. The main goals were to generate and validate 1D and 2D FDTD codes, analyze data dependencies and loop iterations, find optimal tiling and skewing, and measure improvements in cache profiling and execution time from applying these optimizations. The results demonstrated enhancements over naive FDTD implementations and validated the effectiveness of the proposed cache-aware algorithms and time-skewing techniques.
This document compares different architectures for implementing the discrete cosine transform (DCT) in an image compression system for low power applications. It covers four architectures, among them a baseline 2D DCT architecture, a row/column distributed arithmetic approach, and a fully pipelined architecture, and analyzes their power consumption and speed. The row/column approach provides a 24.4% power savings over the baseline, while the fully pipelined architecture provides 16.4% savings. The fully pipelined architecture also achieves the highest throughput of 4.703 GHz.
This document describes an implementation of fast image convolution using Winograd's minimal filtering algorithm for 3x3 filters. The implementation combines C code with BLAS calls for GEMM. It is optimized for Intel Xeon Phi processors and uses Intel MKL for BLAS calls. Benchmarking shows the implementation achieves 10% greater overall performance than MKL convolution and can be up to 1.5x faster for some layers and up to 4x slower for others, indicating potential for a hybrid approach. High-bandwidth memory on Intel Xeon Phi significantly improves efficiency of fast convolution.
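For readers unfamiliar with Winograd's minimal filtering, the one-dimensional building block F(2,3) is sketched below. It produces two outputs of a 3-tap filter with four multiplications instead of six. This is only the 1D case in plain Python, not the 2D, BLAS-based implementation the summary describes; function names are hypothetical.

```python
def winograd_f23(d, g):
    """Two outputs of a 3-tap filter g over four inputs d, using 4 multiplies."""
    d0, d1, d2, d3 = d
    g0, g1, g2 = g
    # Filter-side terms; in practice these are precomputed once per filter.
    ga = (g0 + g1 + g2) / 2
    gb = (g0 - g1 + g2) / 2
    m1 = (d0 - d2) * g0
    m2 = (d1 + d2) * ga
    m3 = (d2 - d1) * gb
    m4 = (d1 - d3) * g2
    return [m1 + m2 + m3, m2 - m3 - m4]


def direct_f23(d, g):
    """Reference: the same two outputs computed directly with 6 multiplies."""
    return [sum(d[i + k] * g[k] for k in range(3)) for i in range(2)]


d = [1.0, 2.0, 3.0, 4.0]
g = [1.0, 0.0, -1.0]
print(winograd_f23(d, g), direct_f23(d, g))
```

The 2D variant for 3x3 filters nests this transform over rows and columns, which is where the matrix multiplications (GEMM) mentioned in the summary come in.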
IJCER (www.ijceronline.com) International Journal of computational Engineerin...ijceronline
The document discusses efficient VLSI implementations of image encryption using minimal operations. It proposes using discrete cosine transform (DCT) for image compression and encryption simultaneously. For encryption, a linear feedback shift register generates random numbers added to some DCT outputs. The DCT algorithm and arithmetic operators are optimized to reduce operations and increase throughput. Simulation results show encryption in the frequency domain at 656 million samples per second on an 82 MHz clock.
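The encryption idea summarized above (LFSR output added to some DCT outputs) can be sketched as follows. The register width, the tap positions (a standard maximal-length 16-bit polynomial), the modulo-2^16 addition, and the choice of which coefficients to mask are all assumptions for illustration; the summary gives none of these details.

```python
MASK16 = 0xFFFF


def lfsr16_step(state):
    """Advance a 16-bit Fibonacci LFSR (polynomial x^16 + x^14 + x^13 + x^11 + 1)."""
    bit = (state ^ (state >> 2) ^ (state >> 3) ^ (state >> 5)) & 1
    return (state >> 1) | (bit << 15)


def keystream(seed, count):
    """Return `count` successive 16-bit LFSR states; the seed must be nonzero."""
    state = seed & MASK16
    if state == 0:
        raise ValueError("an all-zero LFSR state never changes")
    out = []
    for _ in range(count):
        state = lfsr16_step(state)
        out.append(state)
    return out


def encrypt(coeffs, seed, selected):
    """Add keystream words (mod 2**16) to the coefficients at the selected indices."""
    out = list(coeffs)
    for k, idx in zip(keystream(seed, len(selected)), selected):
        out[idx] = (out[idx] + k) & MASK16
    return out


def decrypt(coeffs, seed, selected):
    """Subtract the same keystream to recover the original 16-bit words."""
    out = list(coeffs)
    for k, idx in zip(keystream(seed, len(selected)), selected):
        out[idx] = (out[idx] - k) & MASK16
    return out


# Hypothetical DCT outputs stored as 16-bit words; only the first three are masked.
block = [1200, 65500, 3, 0, 17, 9, 2, 1]
cipher = encrypt(block, seed=0xACE1, selected=[0, 1, 2])
print(cipher, decrypt(cipher, seed=0xACE1, selected=[0, 1, 2]) == block)
```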
IJCER (www.ijceronline.com) International Journal of computational Engineerin...ijceronline
This document discusses test data compression techniques for system-on-chip designs. It presents the XMatchPro algorithm that combines dictionary-based and bit mask-based compression to significantly reduce testing time and memory requirements. The algorithm was applied to benchmarks and achieved a 92% compression efficiency while improving decompression efficiency by up to 90% compared to other techniques without additional overhead. International lossless compression techniques are also overviewed, including dictionary-based Lempel-Ziv methods that have been implemented in hardware using systolic arrays or content addressable memories.
This document proposes a multi-application multi-step mapping method for mapping multiple applications simultaneously onto a many-core Network-on-Chip (NoC). The method consists of two steps: 1) an application mapping step that finds a region on the NoC for each application using maximal empty rectangle techniques, and 2) a task mapping step that maps the tasks of each application within its region to minimize communication latency and energy consumption. The method aims to optimize the layout of applications and tasks to reduce network latency and energy usage for multi-application mapping on many-core NoCs.
HIGH SPEED REVERSE CONVERTER FOR HIGH DYNAMIC RANGE MODULI SETP singh
In this paper, a new reverse converter architecture for the five-moduli set $\{2^n, 2^{n/2}-1, 2^{n/2}+1, 2^n+1, 2^{2n-1}-1\}$ is presented. The proposed converter is designed as a two-level architecture using the New Chinese Remainder Theorem-I (New CRT-I) and Mixed Radix Conversion (MRC). The proposed architecture achieves a significant improvement in delay compared to state-of-the-art reverse converters.
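To make the task of a reverse converter concrete, the sketch below converts residues back to a binary number using the textbook Chinese Remainder Theorem. It reads the moduli set as 2^n, 2^(n/2)-1, 2^(n/2)+1, 2^n+1 and 2^(2n-1)-1 (superscripts are lost in the listing, so this reading is an assumption) and does not reproduce the paper's New CRT-I/MRC two-level architecture. Function names are hypothetical; Python 3.8 or later is assumed.

```python
from math import prod


def moduli_set(n):
    """Five-moduli set for even n, under the assumed reading of the abstract."""
    if n % 2:
        raise ValueError("n must be even")
    h = n // 2
    return [2 ** n, 2 ** h - 1, 2 ** h + 1, 2 ** n + 1, 2 ** (2 * n - 1) - 1]


def to_residues(x, moduli):
    """Forward conversion: binary number to residue representation."""
    return [x % m for m in moduli]


def reverse_crt(residues, moduli):
    """Reverse conversion with the plain CRT (not New CRT-I or MRC)."""
    big_m = prod(moduli)                      # dynamic range M
    total = 0
    for r, m in zip(residues, moduli):
        m_i = big_m // m
        total += r * m_i * pow(m_i, -1, m)    # pow(..., -1, m) is the modular inverse
    return total % big_m


mods = moduli_set(4)                          # [16, 3, 5, 17, 127]
x = 123456
print(mods, to_residues(x, mods), reverse_crt(to_residues(x, mods), mods))
```

The plain CRT needs a reduction modulo the full dynamic range, which is costly in hardware; avoiding that large modulo is the usual motivation for alternatives such as New CRT-I and MRC.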
This document presents a new algorithm for progressive medical image coding using binary wavelet transforms (BWT). It divides grayscale medical images into binary bit-planes and applies a three-level BWT to each bit-plane. It then encodes each BWT bit-plane using quadtree-based partitioning to exploit the energy concentration in high-frequency subbands. Experiments on ultrasound, MRI and CT images show it provides significant improvements in bitrate for required quality compared to existing progressive image coding methods.
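The first step described above, splitting a grayscale image into binary bit-planes, can be sketched as follows. An 8-bit pixel depth and a flat list of pixels are assumed; the binary wavelet transform and quadtree coding stages are not shown.

```python
def to_bit_planes(pixels, depth=8):
    """Split pixels into `depth` binary planes; plane 0 is the least significant bit."""
    return [[(p >> b) & 1 for p in pixels] for b in range(depth)]


def from_bit_planes(planes):
    """Rebuild the pixel values from their binary planes."""
    count = len(planes[0])
    return [sum(planes[b][i] << b for b in range(len(planes))) for i in range(count)]


pixels = [0, 1, 128, 255, 77]
planes = to_bit_planes(pixels)
print(planes[7], from_bit_planes(planes) == pixels)
```

Sending the most significant planes first is what makes such a scheme progressive: a decoder can show a coarse image before the lower planes arrive.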
This document summarizes a research paper on implementing a discrete cosine transform (DCT) core using distributed arithmetic with an error-compensated adder tree to improve accuracy and throughput. It introduces distributed arithmetic as an alternative to multipliers to reduce hardware costs in DCT designs. Previous work achieved higher speeds by parallelizing shifting and addition but incurred large truncation errors. The proposed error-compensated adder tree operates shifting and addition in parallel while compensating for truncation errors. It allows using 9-bit precision instead of 12-bit to meet accuracy requirements, reducing hardware costs. The design achieves a throughput of 1 billion pixels per second with a gate count of 22,200 gates.
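The basic distributed arithmetic idea, replacing multipliers with a lookup table plus shift-and-add, can be sketched as below. This is plain bit-serial DA with unsigned inputs and integer coefficients, both assumed for simplicity; it does not model the error-compensated adder tree or the truncation behaviour the summary refers to.

```python
def build_lut(coeffs):
    """Table of all partial sums: entry `addr` adds the coefficients whose bit is set in addr."""
    k = len(coeffs)
    return [sum(c for j, c in enumerate(coeffs) if (addr >> j) & 1)
            for addr in range(1 << k)]


def da_inner_product(samples, lut, bits=8):
    """Compute sum(c_j * x_j) for unsigned `bits`-bit samples without multiplications."""
    acc = 0
    for b in range(bits):
        # Gather bit b of every sample into one table address.
        addr = 0
        for j, x in enumerate(samples):
            addr |= ((x >> b) & 1) << j
        acc += lut[addr] << b        # weight the looked-up sum by 2**b
    return acc


coeffs = [3, -1, 4, 2]               # hypothetical integer coefficients
samples = [10, 200, 33, 7]
lut = build_lut(coeffs)
print(da_inner_product(samples, lut), sum(c * x for c, x in zip(coeffs, samples)))
```

The table grows as 2^K for K coefficients, so practical designs split long inner products into several small tables.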
34 8951 suseela g suseela paper8 (edit)newIAESIJEECS
The document proposes a low memory and energy efficient image compression technique called Skipped Zonal based Binary DCT (SZBinDCT) for transmitting images over resource constrained visual sensor networks. SZBinDCT first removes alternating rows and columns of pixels to reduce redundancy. It then applies a low complexity Binary DCT transform to the remaining pixels. Only the top left zone of transformed coefficients containing most energy are considered. These coefficients are quantized and entropy encoded using Golomb Rice coding. Simulation results show the technique compresses images to 0.39 bpp with 27.34 dB PSNR while consuming only 11.352 microjoules of energy per 8x8 block, significantly less than other techniques. The low complexity compression allows extended
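The entropy coding stage named above, Golomb-Rice coding, can be sketched as follows. The unary convention (ones terminated by a zero), the mapping of signed coefficients to non-negative integers, and the parameter k are assumptions; the summary does not state which variants the technique uses.

```python
def map_signed(n):
    """Map a signed integer to a non-negative one: 0, -1, 1, -2, ... -> 0, 1, 2, 3, ..."""
    return 2 * n if n >= 0 else -2 * n - 1


def rice_encode(value, k):
    """Golomb-Rice codeword: quotient in unary, then a k-bit binary remainder."""
    if value < 0 or k < 0:
        raise ValueError("value and k must be non-negative")
    q = value >> k
    r = value & ((1 << k) - 1)
    remainder = format(r, "0{}b".format(k)) if k else ""
    return "1" * q + "0" + remainder


def rice_decode(bits, k):
    """Decode a single codeword produced by rice_encode."""
    q = bits.index("0")                      # number of leading ones
    r = int(bits[q + 1:q + 1 + k], 2) if k else 0
    return (q << k) | r


quantized = [5, -2, 0, 1, -1, 0, 0, 0]       # hypothetical quantized coefficients
codewords = [rice_encode(map_signed(c), 2) for c in quantized]
print(codewords)
```

Small values get short codewords and no code table has to be stored, which suits memory-constrained sensor nodes.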
33 8951 suseela g suseela paper8 (edit)new2IAESIJEECS
Modern manufacturing methods permit the study and prediction of surface roughness because signals are acquired and processed instantaneously. With the availability of better computing facilities and newer algorithms in the machine learning domain, online surface roughness prediction will lead to the manufacture of intelligent machines that alert the operator when the process crosses the specified range of roughness. Prediction of surface roughness by multiple linear regression, regression tree and M5P tree methods using multivariable predictors and a single response dependent variable Ra (surface roughness) is attempted. The vibration signal from the boring operation was acquired for the study, which predicts the surface roughness on the inner face of the workpiece. A machine learning approach was used to extract statistical features, which were analyzed in four different cases to achieve higher predictability, higher accuracy, low computing effort and a reduced root mean square error. One of these cases was carried out after feature reduction using Principal Component Analysis (PCA) to examine the effect of feature reduction.
Image compression using Hybrid wavelet Transform and their Performance Compa...IJMER
Images may be worth a thousand words, but they generally occupy much more space in hard disk, or
bandwidth in a transmission system, than their proverbial counterpart. Compressing an image is significantly
different than compressing raw binary data. Of course, general purpose compression programs can be used to
compress images, but the result is less than optimal. This is because images have certain statistical properties
which can be exploited by encoders specifically designed for them. Also, some of the finer details in the image
can be sacrificed for the sake of saving a little more bandwidth or storage space. Compression is the process of
representing information in a compact form. Compression is a necessary and essential method for creating
image files with manageable and transmittable sizes. The data compression schemes can be divided into
lossless and lossy compression. In lossless compression, the reconstructed image is exactly the same as the original
image. In lossy image compression, a high compression ratio is achieved at the cost of some error in the reconstructed
image. Lossy compression generally provides much higher compression than lossless compression.
Dynamic Texture Coding using Modified Haar Wavelet with CUDAIJERA Editor
A texture is an image containing repeated patterns. There are two types: static and dynamic. A static texture is an image with repeated patterns in the spatial domain. A dynamic texture is a sequence of frames with repetition in both the spatial and temporal domains. This paper introduces a novel method for dynamic texture coding that achieves a higher compression ratio using a 2D modified Haar wavelet transform. Dynamic texture video contains highly redundant parts in the spatial and temporal domains. These redundant parts can be removed to achieve high compression ratios with better visual quality. The modified Haar wavelet is used to exploit spatial and temporal correlations among the pixels. The YCbCr color model is used to exploit the chromatic components, as the human visual system (HVS) is less sensitive to chrominance. To decrease the running time of the algorithm, it is parallelized using CUDA (Compute Unified Device Architecture). A GPU contains many more cores than a CPU, and these are used to reduce the running time of the algorithm.
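The role of the YCbCr color model can be sketched as below. The conversion uses the full-range BT.601 constants found in JFIF, and the chroma planes are then averaged over 2x2 blocks; both the specific constants and the use of chroma subsampling are assumptions, since the abstract only says that chrominance is exploited. The example runs on the CPU and does not show the CUDA parallelization.

```python
def rgb_to_ycbcr(r, g, b):
    """Full-range BT.601 (JFIF-style) RGB to YCbCr for 8-bit values."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return y, cb, cr


def subsample_2x2(plane):
    """Average each 2x2 block of a plane with even width and height."""
    return [[(plane[i][j] + plane[i][j + 1] +
              plane[i + 1][j] + plane[i + 1][j + 1]) / 4
             for j in range(0, len(plane[0]), 2)]
            for i in range(0, len(plane), 2)]


# A tiny hypothetical 2x2 RGB image.
image = [[(255, 0, 0), (250, 5, 5)],
         [(245, 10, 0), (255, 0, 10)]]
cb_plane = [[rgb_to_ycbcr(*px)[1] for px in row] for row in image]
print(subsample_2x2(cb_plane))
```

Keeping luma at full resolution while storing one chroma value per 2x2 block reduces the chroma data to a quarter, a loss the eye tolerates well.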
A PROGRESSIVE MESH METHOD FOR PHYSICAL SIMULATIONS USING LATTICE BOLTZMANN ME...ijdpsjournal
In this paper, a new progressive mesh algorithm is introduced in order to perform fast physical simulations using a lattice Boltzmann method (LBM) on a single-node multi-GPU architecture. The algorithm automatically meshes the simulation domain according to the propagation of fluids. The method can also be useful for several other types of physical simulations. In this paper, we associate this
algorithm with a multiphase and multicomponent lattice Boltzmann model (MPMC–LBM) because it is
able to perform various types of simulations on complex geometries. Combining this algorithm
with the massive parallelism of GPUs [5] yields very good performance in comparison with the
static mesh method used in the literature. Several simulations are shown in order to evaluate the algorithm.
A High Performance Modified SPIHT for Scalable Image CompressionCSCJournals
In this paper, we present a novel extension technique to the Set Partitioning in Hierarchical Trees (SPIHT) based image compression with spatial scalability. The present modification and the preprocessing techniques provide significantly better quality (both subjectively and objectively) reconstruction at the decoder with little additional computational complexity. There are two proposals for this paper. Firstly, we propose a pre-processing scheme, called Zero-Shifting, that brings the spatial values in signed integer range without changing the dynamic ranges, so that the transformed coefficient calculation becomes more consistent. For that reason, we have to modify the initialization step of the SPIHT algorithms. The experiments demonstrate a significant improvement in visual quality and faster encoding and decoding than the original one. Secondly, we incorporate the idea to facilitate resolution scalable decoding (not incorporated in original SPIHT) by rearranging the order of the encoded output bit stream. During the sorting pass of the SPIHT algorithm, we model the transformed coefficient based on the probability of significance, at a fixed threshold of the offspring. Calling it a fixed context model and generating a Huffman code for each context, we achieve comparable compression efficiency to that of arithmetic coder, but with much less computational complexity and processing time. As far as objective quality assessment of the reconstructed image is concerned, we have compared our results with popular Peak Signal to Noise Ratio (PSNR) and with Structural Similarity Index (SSIM). Both these metrics show that our proposed work is an improvement over the original one.
Simple and Fast Implementation of Segmented Matrix Algorithm for Haar DWT on ...IDES Editor
Haar discrete wavelet transform (DWT), the
simplest among all DWTs, has diverse applications in signal
and image processing fields. A traditional approach to 2D
Haar DWT is a 1D row operation followed by a 1D column
operation. In 2002, Chen and Liao presented a fast algorithm
for 2D Haar DWT based on a segmented matrix. However, this
method is impractical for large images because of its high
computational requirements. In this paper, we have
implemented the segmented matrix algorithm on a low cost
NVIDIA GPU to achieve a speedup in computation. The
efficiency of our GPU based implementation is measured and
compared with CPU based algorithms. Our experimental
results show performance improvement over a factor of 28.5
compared with Chen and Liao’s CPU based segmented matrix
algorithm and a factor of 8 compared to MATLAB’s wavelet
function for an image of size 2560×2560.
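The traditional row-then-column approach mentioned in the abstract can be sketched as follows. This is the baseline method, not Chen and Liao's segmented matrix algorithm or the GPU implementation; the unnormalized average/difference form of the Haar filter is an assumption, and other scalings (such as dividing by the square root of 2) are equally common.

```python
def haar_1d(v):
    """One level of 1D Haar: pairwise averages first, then pairwise differences."""
    half = len(v) // 2
    averages = [(v[2 * i] + v[2 * i + 1]) / 2 for i in range(half)]
    details = [(v[2 * i] - v[2 * i + 1]) / 2 for i in range(half)]
    return averages + details


def haar_2d(image):
    """One level of 2D Haar: transform every row, then every column."""
    rows = [haar_1d(row) for row in image]
    cols = [haar_1d(list(col)) for col in zip(*rows)]   # columns of the row result
    return [list(row) for row in zip(*cols)]            # transpose back


image = [[1, 2],
         [3, 4]]
print(haar_2d(image))   # [[2.5, -0.5], [-1.0, 0.0]]
```

The top-left entry is the mean of the block, and the other three entries are the horizontal, vertical, and diagonal details.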
Matlab Implementation of Baseline JPEG Image Compression Using Hardware Optim...inventionjournals
International Journal of Engineering and Science Invention (IJESI) is an international journal intended for professionals and researchers in all fields of computer science and electronics. IJESI publishes research articles and reviews across the whole field of engineering science and technology, including new teaching methods, assessment, validation and the impact of new technologies, and it will continue to provide information on the latest trends and developments in this ever-expanding subject. Papers are selected for publication through double peer review to ensure originality, relevance, and readability. The articles published in the journal can be accessed online.
International Journal of Computational Engineering Research (IJCER) is dedicated to protecting personal information and will make every reasonable effort to handle collected information appropriately. All information collected, as well as related requests, will be handled as carefully and efficiently as possible in accordance with IJCER standards for integrity and objectivity.
1) Itatiaia National Park is Brazil's first national park, founded in 1937 to protect the biodiversity of a region that is home to the Atlantic Forest and high-altitude ecosystems.
2) The park has trails and a visitor center with exhibits on the region's fauna, flora and history, and it also runs educational activities.
3) Schools can book guided visits to learn more about the Atlantic Forest biome through this important conservation unit.
Social media strategy, a key to internationalization #enredate...Soluciona Facil
Social media strategy, a key to internationalization #enredatecs2014 Enredate Castellón 2014 by David Martinez Calduch @davidmacalduch of Soluciona Facil
The document presents information on four Colombian ventures: Eco Steam, Moda Ecológica, AdRiva and Adriana Valderrama's venture. Eco Steam offers eco-friendly cleaning services and was founded by Jorge Martínez and Juan Camilo Agudelo in 2020. Moda Ecológica, created by Olga Lucía García in 2009, designs clothing and accessories from recycled materials. AdRiva makes and sells handcrafted leather accessories and was started by an emp
Disneyland in Anaheim, California, opened in 1955 and consists of seven original themed areas such as Main Street, U.S.A., Adventureland, Frontierland, Fantasyland and Tomorrowland. Areas such as New Orleans Square and Critter Country were added later. The Walt Disney World park in Florida, opened in 1971, has several themed hotels owned by Walt Disney Parks and Resorts located near the four main theme parks.
1. The document presents the results of the CNI-IBOPE survey on government approval and voting intentions for the 2014 presidential elections.
2. Approval of the Dilma government fell from 31% rating it excellent or good in June, while 33% rate it bad or terrible.
3. In unprompted voting intention, Dilma leads with 25%, Aécio has 11% and Eduardo Campos 4%. In the prompted poll, Dilma has 39%, Aécio 21%
IJCER (www.ijceronline.com) International Journal of computational Engineerin...ijceronline
The document discusses efficient VLSI implementations of image encryption using minimal operations. It proposes using discrete cosine transform (DCT) for image compression and encryption simultaneously. For encryption, a linear feedback shift register generates random numbers added to some DCT outputs. The DCT algorithm and arithmetic operators are optimized to reduce operations and increase throughput. Simulation results show encryption in the frequency domain at 656 million samples per second on an 82 MHz clock.
IJCER (www.ijceronline.com) International Journal of computational Engineerin...ijceronline
This document discusses test data compression techniques for system-on-chip designs. It presents the XMatchPro algorithm that combines dictionary-based and bit mask-based compression to significantly reduce testing time and memory requirements. The algorithm was applied to benchmarks and achieved a 92% compression efficiency while improving decompression efficiency by up to 90% compared to other techniques without additional overhead. International lossless compression techniques are also overviewed, including dictionary-based Lempel-Ziv methods that have been implemented in hardware using systolic arrays or content addressable memories.
This document proposes a multi-application multi-step mapping method for mapping multiple applications simultaneously onto a many-core Network-on-Chip (NoC). The method consists of two steps: 1) an application mapping step that finds a region on the NoC for each application using maximal empty rectangle techniques, and 2) a task mapping step that maps the tasks of each application within its region to minimize communication latency and energy consumption. The method aims to optimize the layout of applications and tasks to reduce network latency and energy usage for multi-application mapping on many-core NoCs.
HIGH SPEED REVERSE CONVERTER FOR HIGH DYNAMIC RANGE MODULI SETP singh
In this paper a new reverse converter architecture for the five moduli set {2n, 2n/2-1, 2n/2+1, 2n+1, 22n-1-1} is presented. The proposed converter is designed in two levels architecture by using of New Chinese Reminder Theorem-I (New CRT-I) and Mixed Radix Conversion (MRC). The proposed architecture has achieved significant improvement in terms of delay of the reverse converter compared to state-of-the-art reverse converters.
This document presents a new algorithm for progressive medical image coding using binary wavelet transforms (BWT). It divides grayscale medical images into binary bit-planes and applies a three-level BWT to each bit-plane. It then encodes each BWT bit-plane using quadtree-based partitioning to exploit the energy concentration in high-frequency subbands. Experiments on ultrasound, MRI and CT images show it provides significant improvements in bitrate for required quality compared to existing progressive image coding methods.
This document summarizes a research paper on implementing a discrete cosine transform (DCT) core using distributed arithmetic with an error-compensated adder tree to improve accuracy and throughput. It introduces distributed arithmetic as an alternative to multipliers to reduce hardware costs in DCT designs. Previous work achieved higher speeds by parallelizing shifting and addition but incurred large truncation errors. The proposed error-compensated adder tree operates shifting and addition in parallel while compensating for truncation errors. It allows using 9-bit precision instead of 12-bit to meet accuracy requirements, reducing hardware costs. The design achieves a throughput of 1 billion pixels per second with a gate count of 22,200 gates.
34 8951 suseela g suseela paper8 (edit)newIAESIJEECS
The document proposes a low memory and energy efficient image compression technique called Skipped Zonal based Binary DCT (SZBinDCT) for transmitting images over resource constrained visual sensor networks. SZBinDCT first removes alternating rows and columns of pixels to reduce redundancy. It then applies a low complexity Binary DCT transform to the remaining pixels. Only the top left zone of transformed coefficients containing most energy are considered. These coefficients are quantized and entropy encoded using Golomb Rice coding. Simulation results show the technique compresses images to 0.39 bpp with 27.34 dB PSNR while consuming only 11.352 microjoules of energy per 8x8 block, significantly less than other techniques. The low complexity compression allows extended
33 8951 suseela g suseela paper8 (edit)new2IAESIJEECS
Modern manufacturing methods permit the study and prediction of surface roughness since the acquisition of signals and its processing is made instantaneously. With the availability of better computing facilities and newer algorithms in the machine learning domain, online surface roughness prediction will lead to the manufacture of intelligent machines that alert the operator when the process crosses the specified range of roughness. Prediction of surface roughness by multiple linear regression, regression tree and M5P tree methods using multivariable predictors and a single response dependent variable Ra (surface roughness) is attempted. Vibration signal from the boring operation has been acquired for the study that predicts the surface roughness on the inner face of the workpiece. A machine learning approach was used to extract the statistical features and analyzed by four different cases to achieve higher predictability, higher accuracy, low computing effort and reduction of the root mean square error. One case among them was carried out upon feature reduction using Principle Component Analysis (PCA) to examine the effect of feature reduction.
Image compression using Hybrid wavelet Transform and their Performance Compa...IJMER
Images may be worth a thousand words, but they generally occupy much more space in hard disk, or
bandwidth in a transmission system, than their proverbial counterpart. Compressing an image is significantly
different than compressing raw binary data. Of course, general purpose compression programs can be used to
compress images, but the result is less than optimal. This is because images have certain statistical properties
which can be exploited by encoders specifically designed for them. Also, some of the finer details in the image
can be sacrificed for the sake of saving a little more bandwidth or storage space. Compression is the process of
representing information in a compact form. Compression is a necessary and essential method for creating
image files with manageable and transmittable sizes. The data compression schemes can be divided into
lossless and lossy compression. In lossless compression, the reconstructed image is exactly the same as the original
image. In lossy image compression, a high compression ratio is achieved at the cost of some error in the reconstructed
image. Lossy compression generally provides much higher compression than lossless compression.
Dynamic Texture Coding using Modified Haar Wavelet with CUDAIJERA Editor
A texture is an image with repeating patterns. There are two types, static and dynamic. A static texture is an image with patterns that repeat in the spatial domain. A dynamic texture is a sequence of frames with repetitions in both the spatial and temporal domains. This paper introduces a novel method for dynamic texture coding that achieves a higher compression ratio using a 2D modified Haar wavelet transform. Dynamic texture video contains highly redundant parts in the spatial and temporal domains. These redundant parts can be removed to achieve high compression ratios with better visual quality. The modified Haar wavelet is used to exploit spatial and temporal correlations among the pixels. The YCbCr color model is used to exploit the chromatic components, as the HVS is less sensitive to chrominance. To reduce the running time of the algorithm, it is parallelized using CUDA (Compute Unified Device Architecture). A GPU contains many more cores than a CPU, and these are used to reduce the running time.
A PROGRESSIVE MESH METHOD FOR PHYSICAL SIMULATIONS USING LATTICE BOLTZMANN ME...ijdpsjournal
In this paper, a new progressive mesh algorithm is introduced in order to perform fast physical simulations by the use of a lattice Boltzmann method (LBM) on a single-node multi-GPU architecture. This algorithm is able to mesh automatically the simulation domain according to the propagation of fluids. This method can also be useful in order to perform several types of physical simulations. In this paper, we associate this
algorithm with a multiphase and multicomponent lattice Boltzmann model (MPMC–LBM) because it is
able to perform various types of simulations on complex geometries. The use of this algorithm combined
with the massive parallelism of GPUs [5] yields very good performance in comparison with the
static mesh method used in the literature. Several simulations are shown in order to evaluate the algorithm.
A High Performance Modified SPIHT for Scalable Image CompressionCSCJournals
In this paper, we present a novel extension technique to the Set Partitioning in Hierarchical Trees (SPIHT) based image compression with spatial scalability. The present modification and the preprocessing techniques provide significantly better quality (both subjectively and objectively) reconstruction at the decoder with little additional computational complexity. There are two proposals for this paper. Firstly, we propose a pre-processing scheme, called Zero-Shifting, that brings the spatial values in signed integer range without changing the dynamic ranges, so that the transformed coefficient calculation becomes more consistent. For that reason, we have to modify the initialization step of the SPIHT algorithms. The experiments demonstrate a significant improvement in visual quality and faster encoding and decoding than the original one. Secondly, we incorporate the idea to facilitate resolution scalable decoding (not incorporated in original SPIHT) by rearranging the order of the encoded output bit stream. During the sorting pass of the SPIHT algorithm, we model the transformed coefficient based on the probability of significance, at a fixed threshold of the offspring. Calling it a fixed context model and generating a Huffman code for each context, we achieve comparable compression efficiency to that of arithmetic coder, but with much less computational complexity and processing time. As far as objective quality assessment of the reconstructed image is concerned, we have compared our results with popular Peak Signal to Noise Ratio (PSNR) and with Structural Similarity Index (SSIM). Both these metrics show that our proposed work is an improvement over the original one.
Simple and Fast Implementation of Segmented Matrix Algorithm for Haar DWT on ...IDES Editor
Haar discrete wavelet transform (DWT), the
simplest among all DWTs, has diverse applications in signal
and image processing fields. A traditional approach for 2D
Haar DWT is a 1D row operation followed by a 1D column
operation. In 2002, Chen and Liao presented a fast algorithm
for 2D Haar DWT based on segmented matrix. However, this
method is infeasible for its high computational requirements
for processing large sized images. In this paper, we have
implemented the segmented matrix algorithm on a low cost
NVIDIA’s GPU to achieve speedup in computation. The
efficiency of our GPU based implementation is measured and
compared with CPU based algorithms. Our experimental
results show performance improvement over a factor of 28.5
compared with Chen and Liao’s CPU based segmented matrix
algorithm and a factor of 8 compared to MATLAB’s wavelet
function for an image of size 2560×2560.
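The traditional row-then-column approach mentioned in this abstract can be sketched in a few lines of Python. This is only an illustration of that baseline, not the segmented matrix algorithm of Chen and Liao and not the GPU implementation. The function names `haar_1d` and `haar_2d` are hypothetical, and the averaging (unnormalized) form of the Haar filter is an assumption.

```python
def haar_1d(values):
    """One level of the averaging Haar transform on an even-length list.

    Returns the pairwise averages followed by the pairwise half-differences.
    """
    half = len(values) // 2
    averages = [(values[2 * i] + values[2 * i + 1]) / 2 for i in range(half)]
    details = [(values[2 * i] - values[2 * i + 1]) / 2 for i in range(half)]
    return averages + details


def haar_2d(image):
    """One level of 2D Haar DWT: 1D transform on every row, then on every column."""
    # Row operation: transform each row independently.
    rows = [haar_1d(row) for row in image]
    # Column operation: transpose, transform each column, transpose back.
    cols = [haar_1d(list(col)) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]


if __name__ == "__main__":
    image = [
        [1, 2, 3, 4],
        [5, 6, 7, 8],
        [9, 10, 11, 12],
        [13, 14, 15, 16],
    ]
    for row in haar_2d(image):
        print(row)
```

The top-left quadrant of the result holds the low-pass (approximation) sub-band, and the other three quadrants hold the detail sub-bands.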
Matlab Implementation of Baseline JPEG Image Compression Using Hardware Optim...inventionjournals
International Journal of Engineering and Science Invention (IJESI) is an international journal intended for professionals and researchers in all fields of computer science and electronics. IJESI publishes research articles and reviews across the whole field of Engineering Science and Technology, including new teaching methods, assessment, validation and the impact of new technologies, and it will continue to provide information on the latest trends and developments in this ever-expanding subject. Papers are selected through double peer review to ensure originality, relevance, and readability. The articles published in our journal can be accessed online.
International Journal of Computational Engineering Research (IJCER) is dedicated to protecting personal information and will make every reasonable effort to handle collected information appropriately. All information collected, as well as related requests, will be handled as carefully and efficiently as possible in accordance with IJCER standards for integrity and objectivity.
1) Itatiaia National Park is Brazil's first national park, founded in 1937 to protect the biodiversity of a region that is home to Atlantic Forest and high-altitude ecosystems.
2) The park has trails and a visitor center with exhibits on the region's fauna, flora and history, and it also runs educational activities.
3) Schools can book guided visits to learn more about the Atlantic Forest biome through this important conservation unit.
Social media strategy, a key to internationalization #enredate...Soluciona Facil
Social media strategy, a key to internationalization #enredatecs2014 Enredate Castellón 2014 by David Martinez Calduch @davidmacalduch of Soluciona Facil
The document presents information on four Colombian ventures: Eco Steam, Moda Ecológica, AdRiva and Adriana Valderrama's venture. Eco Steam offers eco-friendly cleaning services and was founded by Jorge Martínez and Juan Camilo Agudelo in 2020. Moda Ecológica, created by Olga Lucía García in 2009, designs clothing and accessories from recycled materials. AdRiva produces and sells handcrafted leather accessories and was started by
Disneyland in Anaheim, California, opened in 1955 and consists of seven original themed areas such as Main Street, U.S.A., Adventureland, Frontierland, Fantasyland and Tomorrowland. Areas such as New Orleans Square and Critter Country were added later. Walt Disney World in Florida, opened in 1971, has several themed hotels owned by Walt Disney Parks and Resorts that are located near the four main theme parks.
1. The document presents the results of the CNI-IBOPE survey on government approval and voting intentions for the 2014 presidential elections.
2. Approval of the Dilma government fell from 31% rating it excellent or good in June, while 33% rate it bad or terrible.
3. In spontaneous voting intention, Dilma leads with 25%, Aécio has 11% and Eduardo Campos 4%. In the prompted poll, Dilma has 39%, Aécio 21%
Class 4 support material for risk analysisUPTM
This document presents a catalog of possible threats to the assets of an information system, including natural disasters, technical failures, human errors and intentional attacks. For each threat it describes the types of assets affected, the security dimensions compromised, such as availability and integrity, and a brief explanation. The document also includes scales for rating the impact of the identified risks.
This document contains a list of 22 names of students in a class. The names are arranged alphabetically, with no further details given about each student. The document begins and ends with markers indicating the start and end of the list of names.
This document describes friendship as one of the most beautiful human relationships, one that can form at any age and between people of different backgrounds. It defines friendship as a relationship of mutual goodwill and trust that requires commitment and duties between friends. Finally, it offers some definitions and sayings about what a true friend means.
The document presents several short poems based on popular fairy tales such as Cinderella, Snow White and The Three Little Pigs. The poems are written in verse and briefly summarize the familiar stories, often with a darker or more comic twist. The document was created as an activity for primary school students to discover stories through poetry.
Book Conferencias David M Calduch de Soluciona Facil 2013-05Soluciona Facil
This document summarizes the professional experience of SolucionaFacil.es as a speaker and trainer on topics related to social networks and digital marketing. It includes details of talks, workshops and training sessions given between 2010 and 2013 on platforms such as LinkedIn and Google+, social networks for businesses, and management tools such as HootSuite. The document also shows positive feedback from attendees on the quality of the information delivered and the speaker's communication skills.
FPGA IMPLEMENTATION OF EFFICIENT VLSI ARCHITECTURE FOR FIXED POINT 1-D DWT US...VLSICS Design
In this paper, a scheme for the design of an area efficient and high speed pipelined VLSI architecture for computing the fixed point 1-D discrete wavelet transform using the lifting scheme is proposed. The main focus of the scheme is to reduce the number and period of clock cycles and to use area efficiently, with little or no overhead on hardware resources. The fixed point representation requires fewer hardware resources than a floating point representation. The pipelined architecture speeds up the clock rate of the DWT, and the reduced bit precision reduces the area required for implementation. The architecture has been coded in Verilog HDL on the Xilinx platform, and the target FPGA device is from the Virtex-II Pro family, the XC2VP7-7 board. The proposed scheme requires the least computing time for fixed point 1-D DWT and achieves the
least area for implementation, compared with other architectures. So this architecture is realizable for real time processing of DWT computation applications.
International Journal of Engineering Research and Applications (IJERA) is an open access online peer reviewed international journal that publishes research and review articles in the fields of Computer Science, Neural Networks, Electrical Engineering, Software Engineering, Information Technology, Mechanical Engineering, Chemical Engineering, Plastic Engineering, Food Technology, Textile Engineering, Nano Technology & science, Power Electronics, Electronics & Communication Engineering, Computational mathematics, Image processing, Civil Engineering, Structural Engineering, Environmental Engineering, VLSI Testing & Low Power VLSI Design etc.
Modified Distributive Arithmetic Based DWT-IDWT Processor Design and FPGA Imp...IOSR Journals
1) The document describes a modified distributive arithmetic based discrete wavelet transform (DWT) processor architecture and its FPGA implementation for image compression.
2) The proposed architecture uses four lookup tables to store pre-computed partial products of filter coefficients, achieving a latency of 44 clock cycles and throughput of 4 clock cycles.
3) A software reference model is developed in Matlab to analyze the performance of various wavelets for image compression using the distributive arithmetic based DWT approach. The input image is resized and decomposed into sub-bands using DWT and reconstructed using IDWT.
With advances in CMOS technology, a lot of research has been done to develop various logic styles that improve the performance of logic circuits. D flip-flops (DFFs) are fundamental building blocks in almost every sequential logic circuit. Hence, the overall performance of a sequential circuit is affected by the performance of its constituent DFFs. In recent years, the focus has been on higher processor clock rates for better performance. To achieve high clock rates, fine-granularity pipelining techniques are used, which implies that there are relatively few levels of logic in each pipeline stage. A major consequence of this design trend is that the pipeline overhead has become more significant. The primary causes of pipeline overhead are the latency of the flip-flop or latch used to design the processor and the clock skew of the system. This calls for incorporating logic functionality within the architecture of the flip-flop. This new family of flip-flops is called Embedded Logic Flip-Flops. In this paper, we review the flip-flop architectures that have been proposed so far. Our aim is a qualitative analysis and comparison of the proposed embedded logic flip-flop designs.
High Speed and Time Efficient 1-D DWT on Xilinx Virtex4 DWT Using 9/7 Filter ...IOSR Journals
This document describes a new efficient distributed arithmetic (NEDA) technique for implementing a high speed 1-D discrete wavelet transform (DWT) using a 9/7 filter on a Xilinx Virtex4 FPGA. The key aspects of the NEDA technique are that it uses adders as the main component and does not require multipliers, subtraction, or ROM. Simulation results show the proposed NEDA architecture requires fewer logic resources and has a shorter maximum path delay compared to existing distributed arithmetic techniques.
Efficient FPGA implementation of high speed digital delay for wideband beamfor...journalBEEI
In this paper, the authors present an FPGA implementation of a digital delay for beamforming applications. The digital delay is based on a Parallel Farrow Filter. Such architecture allows to reach a very high processing rate with wideband signals and it is suitable to be used with Time-Interleaved Analog to Digital Converters (TI-ADC). The proposed delay has been simulated in MATLAB, implemented on FPGA and characterized in terms of amplitude and phase response, maximum clock frequency and area.
This document discusses wavelet transforms and fast wavelet transforms for image compression. It provides background on discrete wavelet transforms (DWT) and fast wavelet transforms. DWT is useful for image compression because it concentrates image energy into low-frequency coefficients. Compression is achieved by quantizing coefficients, prioritizing low-frequency ones. Popular image compression techniques like JPEG2000 use DWT. Fast wavelet transforms like Mallat's algorithm allow faster image analysis than DWT. The document reviews various image compression techniques and their performance in terms of compression ratio and image quality.
Design and Implementation of an Embedded System for Software Defined RadioIJECEIAES
In this paper, the development of high performance software for demanding real-time embedded systems is proposed. This software-based design will enable software engineers and system architects in emerging technology areas like 5G Wireless and Software Defined Networking (SDN) to build their algorithms. An ADSP-21364 floating point SHARC Digital Signal Processor (DSP) running at 333 MHz is adopted as the platform for the embedded system. To evaluate the proposed embedded system, an implementation of frame, symbol and carrier phase synchronization is presented as an application. Its performance is investigated with an online Quadrature Phase Shift Keying (QPSK) receiver. The results show that the designed software is implemented successfully on the SHARC DSP, which can be utilized efficiently for such algorithms. In addition, it is shown that the proposed embedded system is pragmatic and capable of dealing with the memory constraints and critical timing issues caused by the long interleaved coded data used for channel coding.
DIGITAL WAVE SIMULATION OF LOSSY LINES FOR MULTI-GIGABIT APPLICATIONPiero Belforte
Frequency domain Vector Fitting (VF) is a well known technique to generate circuital models of a spatially discretized lossy transmission lines from theoretical formulation of losses. The sub-picosecond time steps required by multi-gigahertz bandwidths and short transmission lines included in the models, determine long Spice simulation times. A 100X speedup can be gained using the Digital Wave Simulator (DWS) instead of Spice. DWS processes the waves of a Digital Network built up connecting together scattering blocks (circuit elements, nodes and S-parameter multi-ports) coming from a Spice-like description. Being a DSP wave processor instead of a classical nodal equations solver, DWS is computationally very fast and numerically stable. Comparisons with commercial simulators like Microcap11 and CST Cable Studio show a good matching of results. A further 10-100X simulation speedup is obtained if Piecewise-Linear Fitting (PWLF) is used to describe the time-domain behaviors of Scattering Parameters. Single or multiple cell Behavioral Time Models (BTM) can be extracted by PWLF from TDR/TDT measurements and processed by DWS fast convolution algorithms. A setup de-embedding can be performed by pwl breakpoints optimization to fit actual measurements. A RG58 coaxial cable is analyzed and its VF-derived eye-diagrams are compared to PWLF measurement-derived results. At multi-gigabit rates significant differences, due to cable physical implementation effects, are observed. The modeling/simulation alternatives (VF/Spice, VF/DWS and PWLF/DWS) are compared together and the advantages of PWLF/DWS in term of simplicity, stability and speed are highlighted.
This document compares different architectures for implementing the discrete cosine transform (DCT) in an image compression system to reduce power consumption. It summarizes four architectures: a baseline 2D DCT architecture, a row/column distributed arithmetic approach, a fully pipelined architecture, and analyzes their speed and power savings. The row/column approach provides a 24.4% power savings compared to the baseline. The fully pipelined architecture provides 16.4% power savings and higher throughput of 4.703 GHz by exploiting pipelining and parallelism.
FPGA based Efficient Interpolator design using DALUT Algorithmcscpconf
The document describes the design and implementation of an efficient interpolator for wireless communication systems using FPGA. It proposes a multiplier-less technique using distributed arithmetic look-up tables (DALUT) that replaces multiply-accumulate operations with LUT accesses. A 66th-order half-band polyphase FIR structure is implemented using the DALUT approach on Spartan-3E and Virtex2Pro FPGAs. Results show the proposed design achieves maximum frequencies of 92.859MHz on Virtex Pro and 61.6MHz on Spartan 3E while consuming fewer resources than a traditional MAC-based design.
Interpolator is an important sampling device used for multirate filtering to
provide signal processing in wireless communication system. There are many
applications in which sampling rate must be changed. Interpolators and decimators are
utilized to increase or decrease the sampling rate. In this paper an efficient method has
been presented to implement high speed and area efficient interpolator for wireless
communication systems. A multiplier less technique is used which substitutes
multiply-and-accumulate operations with look up table (LUT) accesses. Interpolator has been
implemented using Partitioned distributed arithmetic look up table (DALUT)
technique. This technique has been used to take an optimal advantage of embedded
LUTs of the target FPGA. This method is useful to enhance the system performance in
terms of speed and area. The proposed interpolator has been designed using half band
poly phase FIR structure with Matlab, simulated with ISE, synthesized with Xilinx
Synthesis Tools (XST) and implemented on Spartan-3E and Virtex2pro device. The
proposed LUT based multiplier less approach has shown a maximum operating
frequency of 92.859 MHz with Virtex Pro and 61.6 MHz with Spartan 3E by
consuming considerably less resources to provide cost effective solution for wireless
communication systems.
Iaetsd pipelined parallel fft architecture through folding transformationIaetsd Iaetsd
This document presents a new VLSI architecture for a real-time pipeline FFT processor using fused floating point operations. It proposes high radix floating point butterflies implemented with two fused operations: a two-term dot product and add-subtract unit. Both discrete and fused radix processors are compared in terms of area. Higher throughput is achieved using a proposed architecture with conflict-free memory access and a new addressing scheme for radix-16 FFT.
FPGA Implementation of Multiplier-less CDF-5/3 Wavelet Transform for Image Pr...IOSRJVSP
Most digital image processing applications use a domain transformation technique to convert time domain information to a transform domain, which helps to simplify the mathematical modeling. The Discrete Wavelet Transform is one of the best transformation techniques. Its time-frequency resolution makes this transform sensitive to both time and frequency, which gives very good compression and decompression. In this paper, we propose an FPGA implementation of the multiplier-less CDF-5/3 wavelet transform for image processing applications using the System-Generator tool. To maintain low area and high frequency, we use a multiplier-less architecture for the CDF-5/3 DWT in our implementation. The VHDL code for the multiplier-less structure is fed to the System Generator tool using the standard procedure, and the structure is synthesized to get the area and frequency
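To make the idea of a multiplier-less CDF-5/3 transform concrete, here is a minimal Python sketch of the well-known reversible 5/3 lifting steps (the integer form used in JPEG 2000), which need only additions and bit shifts. It is not the VHDL design from the paper summarized above. The function names `cdf53_forward` and `cdf53_inverse`, the single decomposition level, and the symmetric boundary extension are assumptions made for illustration.

```python
def cdf53_forward(x):
    """One level of the integer 5/3 lifting transform on an even-length list of ints.

    Returns (approximation, detail). Only adds and shifts are used.
    """
    n = len(x)
    half = n // 2
    # Predict step: each odd sample minus the floor-average of its even neighbours.
    d = []
    for i in range(half):
        left = x[2 * i]
        # Symmetric extension at the right edge: mirror the last even sample.
        right = x[2 * i + 2] if 2 * i + 2 < n else x[2 * i]
        d.append(x[2 * i + 1] - ((left + right) >> 1))
    # Update step: each even sample plus a rounded quarter of the neighbouring details.
    s = []
    for i in range(half):
        prev = d[i - 1] if i > 0 else d[0]  # symmetric extension at the left edge
        s.append(x[2 * i] + ((prev + d[i] + 2) >> 2))
    return s, d


def cdf53_inverse(s, d):
    """Exactly undo cdf53_forward by running the lifting steps in reverse order."""
    half = len(s)
    n = 2 * half
    x = [0] * n
    # Undo the update step to recover the even samples.
    for i in range(half):
        prev = d[i - 1] if i > 0 else d[0]
        x[2 * i] = s[i] - ((prev + d[i] + 2) >> 2)
    # Undo the predict step to recover the odd samples.
    for i in range(half):
        left = x[2 * i]
        right = x[2 * i + 2] if 2 * i + 2 < n else x[2 * i]
        x[2 * i + 1] = d[i] + ((left + right) >> 1)
    return x


if __name__ == "__main__":
    samples = [10, 12, 14, 13, 9, 7, 8, 8]
    approx, detail = cdf53_forward(samples)
    assert cdf53_inverse(approx, detail) == samples
```

Because the inverse subtracts exactly the quantities the forward pass added, reconstruction is lossless for integer input regardless of how the shifts round.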
The document describes a proposed approach to developing an efficient Fast Fourier Transform (FFT) module using dual edge triggered flip flops and clock gating. FFT is used to modulate data between time and frequency domains in OFDM systems. Traditionally, single edge triggered flip flops are used in FFT architectures. The proposed system realizes FFT using dual edge triggered flip flops, which can capture and propagate data on both clock edges, improving speed. Clock gating is also used to reduce clock power consumption. Simulation results show the proposed structure achieves higher processing speeds compared to traditional single edge triggered flip flop designs.
Efficient Implementation of Low Power 2-D DCT ArchitectureIJMER
International Journal of Modern Engineering Research (IJMER) is Peer reviewed, online Journal. It serves as an international archival forum of scholarly research related to engineering and science education.
A Review: Compensation of Mismatches in Time Interleaved Analog to Digital Co...IJERA Editor
The performance of today's communication systems is highly dependent on the analog-to-digital converters (ADCs) they employ, and high-performance ADCs are needed to give emerging communication technologies more flexibility and precision. In this respect, the time-interleaved operation of an array of ADCs (TI-ADC) can be a reasonable solution. A TI-ADC can increase its throughput by using M channel ADCs, or subconverters, in parallel and sampling the input signal in a time-interleaved manner. However, the performance of a TI-ADC suffers badly from mismatches among the channel ADCs. In this paper we review the advances in the design of low-complexity digital correction structures and algorithms for time-interleaved ADCs over the last five years. We devise a discrete-time model, state the design problem, and finally derive the algorithms and structures. In particular, we discuss efficient algorithms for designing time-varying correction filters as well as iterative structures using polynomial based filters. The compensation structure can therefore be used to compensate time-varying frequency response mismatches in time-interleaved ADCs, as well as to reconstruct uniform samples from nonuniformly sampled signals. We analyze the compensation structure, investigate its performance, and demonstrate application areas of the structure through several examples. Finally, we give an outlook on future research questions.
V Ashok, B.Chinna rao, P.M.Francis / International Journal of Engineering Research and
Applications (IJERA) ISSN: 2248-9622 www.ijera.com
Vol. 2, Issue 5, September-October 2012, pp.1432-1439
A Streamlined Architecture For 3-D Discrete Wavelet Transformation And Inverse Discrete Wavelet Transform
V Ashok, M.Tech (PG student), GITAS
B.Chinna rao, Prof. & Head, Dept. of ECE, GITAS
P.M.Francis, Asst. Prof., Dept. of ECE, GITAS
Abstract
This paper presents an architecture of the lifting based running 3-D discrete wavelet transform (DWT), which is a powerful image and video compression algorithm. The proposed design is one of the first lifting based complete 3-D-DWT architectures without group of pictures restriction. The new computing technique based on analysis of lifting signal flow graph minimizes the storage requirement. This architecture enjoys reduced memory referencing and related low power consumption, low latency, and high throughput compared to those of earlier reported works. Further, the digital data can be retrieved using Inverse Discrete Wavelet Transform (IDWT). The images need to be retrieved without loss of information. The proposed architecture has been successfully implemented on Xilinx Spartan series field-programmable gate array, offering a speed of 40 MHz, making it suitable for real-time compression even with large frame dimensions. Moreover, the architecture is fully scalable beyond the present coherent Daubechies filterbank (9, 7).

Index Terms: Discrete wavelet transform, image compression, lifting, video, IDWT, VLSI architecture.

I. Introduction
STILL IMAGE compression technique based on 2-D discrete wavelet transform (DWT) has already gained superiority over traditional JPEG based on discrete cosine transform and is standardized in forms like JPEG2000 [1]. Quite similarly, the application of its 3-D superset, i.e., 3-D-DWT on video, outperforms the current predictive coding standards, like H.261-3, MPEG1-2,4 by rendering the quality features like better peak signal-to-noise ratio (PSNR), absence of blocky artifacts in low bit rates. Furthermore, it has the added provisions of highly scalable compression, which is mostly coveted in modern communications over heterogeneous channels like the Internet [2]. Successful application of 3-D-DWT has been reported in the literature in emerging fields like medical image compression [3], hyper-spectral and space image compression [4], etc. Software-based approaches are experimented to combat the huge computational complexity and memory requirement associated with 3-D-DWT realization [5], [6]. Though the processor speed of modern computers soars high at the order of GHz, data fetching and communicating with external memories consume several T states, making the computation quite slower at the end. As the speeds of the peripherals are still far behind the modern processors, it causes more problems.

Nowadays, most of the applications require real-time DWT engines with large computing potentiality for which a fast and dedicated very-large-scale integration (VLSI) architecture appears to be the best possible solution. While it ensures high resource utilization, that too in cost effective platforms like field programmable gate array (FPGA), designing such architecture does offer some flexibilities like speeding up the computation by adopting more pipelined structures and parallel processing, possibilities of reduced memory consumptions through better task scheduling or low-power and portability features.

To overcome one of the toughest problems associated with 3-D-DWT architectures, viz., the memory requirement, block based [7], [8] or scan-based architectures [9]–[11] with independent group of pictures (GOP) transform have been reported. However, blocking degrades the PSNR quality while the independent GOPs introduce annoying jerks in video playback due to PSNR drop at transform boundaries [12]. Alternatively, some successful scan-based running transform architectures with convolution filtering have been reported in [13], [14] avoiding these limitations.

This paper fulfills the requirement herewith presenting a scan-based complete 3-D architecture having infinite GOP. Among the transform components involved in three dimensions, the column and temporal directional transforms are characteristically parallel in nature (for a row-wise scan). The novelty of this paper lies in introducing an ingenious analysis of signal flow graph (SFG), which subsequently shows a newer methodology for computing those parallel transform components with reduced storage overhead. Synchronous data flow and memory arrangements in conjunction with decimated addressing schemes are proposed
2. V Ashok, B.Chinna rao, P.M.Francis / International Journal of Engineering Research and
Applications (IJERA) ISSN: 2248-9622 www.ijera.com
Vol. 2, Issue 5, September- October 2012, pp.1432-1439
afterward for incorporating this methodology in hardware. Thus, the designed processor has a minimum memory requirement and a much smaller hardware budget, with twice the throughput and half the computing time, latency, and memory referencing compared to those of [12], [17]. With a single adder in its critical path, the processor achieves a high speed, a result of pipelining and the incorporation of the flipping scheme. Inside the processor, the signals at the boundaries are treated with the mirror extensions proposed in [1].

Section II summarizes the theory of flipping as the latest modification of lifting. The proposed architecture, along with the analyzed SFG, is illustrated in Section III. Section IV discusses the issues related to implementation along with the results obtained after mapping the design onto reconfigurable Xilinx FPGAs. A performance comparison with other related works is also furnished in that section. Finally, the paper is concluded in Section V.

II. Theoretical Framework
As the DWT intrinsically constitutes a pair of filtering operations, a unified representation of the polyphase matrix is introduced as follows [16]:

\[ P(z) = \begin{pmatrix} h_e(z) & g_e(z) \\ h_o(z) & g_o(z) \end{pmatrix} \qquad (1) \]

where h(z) and g(z) stand for the transfer functions of the lowpass and highpass filterbanks, respectively, and the suffixes e and o correspond to even and odd terms, respectively. Thus, the transform is symbolized with the equation

\[ \begin{pmatrix} \lambda(z) & \gamma(z) \end{pmatrix} = \begin{pmatrix} x_e(z) & z^{-1} x_o(z) \end{pmatrix} P(z) \qquad (2) \]

with λ(z) and γ(z) signifying the filtered lowpass and highpass parts of the input x(z).

The lifting scheme [15], [16] factorizes the polyphase representation into a cascade of upper and lower triangular matrices and a scaling matrix, which in turn yield a set of linear algebraic equations in the time domain, opening up the possibility of a pipelined processor. Several other advantages of lifting are mentioned in [16].

For instance, the common Daubechies (9, 7) filterbank can be factorized as

\[ P(z) = \begin{pmatrix} 1 & \alpha(1+z^{-1}) \\ 0 & 1 \end{pmatrix} \begin{pmatrix} 1 & 0 \\ \beta(1+z) & 1 \end{pmatrix} \begin{pmatrix} 1 & \gamma(1+z^{-1}) \\ 0 & 1 \end{pmatrix} \begin{pmatrix} 1 & 0 \\ \delta(1+z) & 1 \end{pmatrix} \begin{pmatrix} \zeta & 0 \\ 0 & 1/\zeta \end{pmatrix} \qquad (3) \]

The related algebraic equations are

(4)

where α = −1.586134342, β = −0.05298011854, γ = 0.8829110762, δ = 0.4435068522, and ζ = 1.149604398 [16], and 0 ≤ i ≤ L − 1, where L is the data length.

As a result, the processing speed increases significantly when the flipped equations are mapped into hardware. Following the modification of the SFG, the final equations for flipping are

(5)

where A = (1/α) = −0.630463, B = (1/16αβ) = 0.743750, C = (1/32βγ) = −0.668067, D = (1/4γδ) = 0.638443, K0 = (64αβγδ) = 2.590697, and K1 = (32αβγ/δ) = 1.929981 (up to six fractional digits), and 0 ≤ i ≤ L − 1, where L is the data length [18].

To handle the truncation of the signals at the boundaries, mirror extension is used by incorporating corresponding changes into (5) at the start and end of frame sequences and at the individual frame boundaries, as well as for the 3-D transforms.

Now, during the computation of 3-D wavelets, the order of the spatial and temporal transform components can be interchanged, and both arrangements conform to the definition of the 3-D-DWT. However, the temporal-first (t + 2-D) transform suffers from certain limitations with
spatial scalability or spatio-temporal decomposition structure [2], which restrict its future extensions. Thus, in the design of the present system, spatial-first (2-D + t) decomposition is chosen, although, if required, the reverse method can be mapped into hardware equally well.

III. Proposed Architecture
A. Working Principle
Fig. 1 presents the proposed scan-based 1-level 3-D wavelet transform architecture with a block-level illustration of the principal functional modules. As the figure shows, the proposed architecture performs the spatial transform first, followed by its temporal counterpart. The following two parts of this section give a detailed view of how the different functional blocks work together to realize those two transform components.

The following sections discuss the detailed design of the principal working modules in the architecture.

Fig. 2. Spatio-temporal wavelet decomposition with proposed architecture. (a) Original frame sequence. (b) After 1-level spatial transform. (c) Lowpass frames after the temporal transform. (d) Highpass frames after the temporal transform.

Fig. 3. (a) Architecture of RPE with (b) illustration of a generic P/U module.

Fig. 4. Two snapshots of RMEM with a model image size of 8 × 8.

B. RPE and the RMEM
Among all the micro-architectures of the different submodules, which transform the input video in three directions, the RPE module is the simplest. As shown in Fig. 3, it is a straightforward implementation of (5) with pipelining applied to speed up the operations. Scanned with a dual clock, the incoming pixels are separated into successive pairs of odd and even ones at the SPLITTER stage and move forward in parallel through the pipeline. The required datapath operations of lifting are performed on these pixels at the consecutive Predict (Pi), Update (Ui), and Shift (Si) stages of the RPE (as depicted in Fig. 3), which finally produces pairs of highpass and lowpass pixels at the ports OUT EVEN and OUT ODD in a streamlined fashion.

These pixels, prior to column processing, are temporarily stored in RMEM, which generates the synchronized dataflow to store the coefficients as well as feed them to the CPE. After processing the initial two rows of a frame, the transformed coefficients completely fill the memory locations, as illustrated in snapshot 1 of Fig. 4. At the very next clock cycle, two new pixels, l(2,0) and h(2,0), arrive from the RPE and are placed at the locations in R1 and R3 (refer to snapshot 2) that have just been vacated as the stored data, namely l(0,0) and l(1,0), are read out at the commencement of column processing. Subsequent locations are similarly refreshed until all the coefficients from row 2 are stored in those two RAMs. Similarly, during processing of the next row, RAMs R2 and R4 undergo a series of memory refreshes as the locations previously containing the h0 and h1 coefficient blocks are assigned to the storage of the coefficients of h3 and l3, available from the RPE. Thus, a periodic pattern can be identified among the refreshed RAM pairs, which is given in tabular form in Fig. 4 against the processed rows. The proposed memory arrangement is free from any scenario in which the RAM resources would be unnecessarily occupied with stale data that are not to be used in future computation.

C. Analysis of SFG to Facilitate Parallel Computation
The problems associated with designing architectures for the column and temporal directional transforms are, however, critical. In a setup where video frames are scanned row-wise and the processed coefficients from the RPE are spaced contiguously in rows, the column processor has to wait for an entire row to get another input sample for processing, and the temporal processor needs to hold back for an entire frame before it can proceed with the next
computation step. Like many other signal processing architectures, the 3-D-DWT processor thus inherently carries a huge memory and latency overhead in its working principle. Clearly, a pipelined design like the RPE does not suit column and temporal processing, and parallel architectures are mostly sought to address this issue. The overall advantage of any DWT processor lies in addressing these performance bottlenecks successfully.

The SFG of lifting, shown in Fig. 5, holds the key to analyzing data dependence inside any DWT processor. Each input and output sample of this SFG denotes a block of data. For row processing, these blocks refer to image pixels adjacently spaced in rows. For the column transform, however, each of these blocks signifies a group of processed l and h band coefficients of size N/2. Similarly, for the temporal transform they relate to 2-D transformed individual frames of $N^2$ pixels. Thus, while the row processor can freely "sweep across" this graph producing a streamlined output, an intelligent column or temporal processor must wait and partially finish the computation with the available inputs before it proceeds to the next step. The key to such parallel processing is to find an optimized basic step of computation which minimizes the latency together with the memory overhead of the overall architecture.

A careful observation of the SFG shown in Fig. 5(a)–(c) reveals that the individual slices are the most distinct representation of the aforesaid basic computation steps. Highlighted in blue, the predict and update calculations inside each such slice can be represented exactly as a function of two input samples from the previous slice (colored in green), one input block from the current slice (shown in red), and the computed predict and update coefficient blocks from the previous slice (highlighted in pink). Since slice 0 holds only input samples, computation should commence with slice 1. Following the sequence in Fig. 5(a)–(c), the computation can then continue for successive slices until the termination of the SFG, which happens at the end of each frame for column processing or at the end of the video sequence for temporal processing. A pair of output sample blocks (highlighted in brown) from each individual slice is the natural outcome of this computation.

Fig. 5. Data-dependence analysis of SFG for parallel computation. (a) SFG during computation of slice 1. (b) SFG during computation of slice 2. (c) SFG during computation of slice 3. (d) Explanation of color code.

The present architecture successfully implements this slice-wise computing strategy, carefully preserving the critical latency and memory requirement with the help of some unique memory arrangement techniques. As explained in Fig. 5(d), the individual groups of memory blocks are handled differently during the column transform of the l and h bands and during the temporal transform. For l band processing, the computation starts with the input coefficient block in red arriving on the fly from the RPE and the green ones being fed from RMEM, whereas during h band processing all three of them are read from RMEM. For the temporal transform, the frame from the current slice is actually stored in SMEM and read back at half rate. The intermediate coefficients in pink are stored in CMEM and TMEM, respectively.

Fig. 6. (a) Snapshot of CPE pipeline and (b) detailed P/U module.

D. CPE
The architecture of the CPE, shown in Fig. 6(a), is quite similar to that of the RPE. Fig. 6(b) presents its internal details with a P/U module. However, the continuity of the RPE pipeline is purposely broken at several places, creating a new set of input and output ports which contribute to the
aforesaid parallel computation. Among all the ports, IN 1-3 and OUT 4-5 are connected to RMEM and SMEM, respectively, and help create a streamlined dataflow from the spatial processor to the temporal processor, as depicted in the main architecture (refer to Fig. 1). The other ports, IN 4-6 and OUT 1-3 from RPE, are used to exchange intermediate coefficients with the six CMEM buffers, which is the key to slice-wise computation.

E. SMEM
Once transformed spatially, the frames are directed to SMEM (refer to Fig. 1), which requires a minimum of two frame buffers for data management. While the first two frames can easily be accommodated in those two frame buffers, complexity arises when the third frame arrives from the SP and the computation is simultaneously started by the TP. In every clock cycle, a pixel pair of frame 2 can be allocated to the vacant memory locations from which the two pixels of frames 0 and 1 have been read out for the computation; however, the temporal processing methodology demands an extra set of read operations to collect the corresponding pixels of frame 2, which act as the third set of inputs in the computation of the first lifting step (refer to slice 1 in Fig. 5).

Fig. 7. Arrangement of SMEM frame buffers.

Importantly, this second set of data cannot be provided online to the TP, as the frames arrive at double rate and the computation needs them at the single clock rate. So, all that is needed is to read them back from memory at half the data rate. Thus, the memory arrangement of Fig. 7 is followed, where port A of the dual-port RAMs is used for reading older frames from memory as well as storing the newer ones in those locations, while port B remains dedicated to the second set of read operations. Thus, the first kind of operations in effect refreshes the memory with the consecutive pairs of frames 0, 1 and 2, 3 and so on, while the second operations are solely responsible for providing the additional pixels of frame 2i during computations involving slice i (i = 1, 2, 3, . . . ).

Fig. 8. Two snapshots of the SMEM.

Fig. 9. Addressing pattern in SMEM. RAMs illustrated for a memory depth of 8.

As depicted in snapshot 1 of Fig. 8, frames 0 and 1, divided into parts L and H (where L and H signify that those pixels emerge from the lowpass and highpass ports of the CPE), initially arrange themselves in the buffers of SMEM. As the computation progresses, the pixels of the two frames are replaced with those of frame 2, and the new frame gets decimated inside the RAMs as its pixels are allocated to the memory locations just released by the older frames. Thus, as the computation of slice 2 is completed, the new frames 2 and 3 are arranged according to the layout described in snapshot 2 of that figure. The order of decimation increases as the computation moves ahead, in a manner very similar to that of fast Fourier transform addressing. The pattern repeats itself after $\log_2(N^2)$ cycles. Fig. 9 helps us
to visualize the addressing pattern for a sample RAM depth of 8.

Fig. 10. Snapshot of the TP.

The dual-port BlockRAMs of the Xilinx FPGA used as the target platform for the proposed architecture provide the facility of "read before write" operations in the same clock cycle at the memory locations. Utilizing this, simultaneous read and write operations are performed at an address in port A through the channels DIA and DOA, as required. The address in port B follows the same practice, only changing at half speed, as two pixels are simultaneously picked up from the RAM pairs in a clock cycle and are then multiplexed to feed INT 2 of the TP at the single clock rate.

F. Latency and Complete Memory Requirement
Following the SFG of lifting (refer to Fig. 5), the minimum number of inputs required for computing a wavelet coefficient is restricted to five. Thus, with the provision of the fifth input to arrive online during the computation, the CPE and TP, respectively, need a minimum wait time of four rows (2N clock cycles) and four frames ($2N^2$ cycles) to produce the first output from the architecture, thereby resulting in a total parametric latency of T clock cycles, where

\[ T = 2N^2 + 2N + \mathrm{Latency}_{RPE} + \mathrm{Latency}_{CPE} + \mathrm{Latency}_{TP}. \qquad (6) \]

The extra terms arise due to the pipelined design of each computing unit. The memory requirement for five frames and ten N/2-length line buffers is $5N^2 + 5N$.

IV. Implementation Results and Discussions
A. Multipliers and Datapath Precisions
After the details of the architecture and the data management principles have been thoroughly worked out, the issues related to mapping the design onto a reconfigurable device are of prime interest. These include the precision of the multipliers in the architecture.

Being irrational numbers, the flipping coefficients corresponding to (4) are not ideally realizable in an architecture with hardware multipliers. Instead, those numbers can be considered only up to a finite precision during design. However, the impact of this limited precision is experienced as lowered PSNR values and a subsequent degradation of the quality of the reproduced video during decompression. Additionally, the precision of the data samples right after each multiplier affects the PSNR in a similar way.

B. Implementation Results
The architecture has been mapped onto a Xilinx programmable device (FPGA), the XC4VFX140 with a speed grade of 12, through the Xilinx ISE 7.1i tool. A uniform wordlength of 17 bits has been maintained throughout the processor to afford sufficient data depth. After pipelining the multipliers, the critical path of the processor consists of a single adder, making it quite fast. A fast counter-based controller was designed which handles all the address generation and other switching operations at the high speed of the main datapath. Such controllers are programmable and can synchronize the control signal generation according to different video frame sizes. So, other than the standard N × N, they can handle the standard quarter common intermediate format, the common intermediate format, or various different aspect ratios. The adders from the library and the device's dual-port block RAMs have been utilized as the principal resources for the designed processor. Simulation is performed with ModelSim XE III 6.0a, which yields a set of end results completely matching the results from MATLAB 7.0.0, where a model of the hardware is created.

The overall design report can be formulated as

Custom frame size                 256 × 256
Group of frames (GOP)             Infinite
Maximum clock frequency           321 MHz
Throughput                        Two results/cycle
Initial latency                   2N^2 + 2Nψ + 47 clock cycles
Number of occupied slices         1776 (2%)
Total number of four-input LUTs   2188 (1%)
Number of block RAMs              350 (63%)

C. Inverse discrete wavelet transform
The inverse DWT (IDWT) is the computational reverse. The lowest lowpass and highpass data streams are up-sampled (i.e., a zero is placed between each data word) and then filtered using filters related to the decomposition filters. The two resulting streams are simply added together to form the lowpass result of the previous level of processing. This can be combined with the highpass result in a similar fashion to produce further levels, the process continuing until the original data stream is reconstructed.
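
The reversibility described for the IDWT can be illustrated in software. What follows is a minimal Python sketch and not the authors' hardware design: it assumes the conventional (unflipped) lifting steps of the Daubechies (9, 7) filterbank with the constants quoted in Section II, a simple mirror extension at the two ends of the signal, and hypothetical function names (dwt97_forward, dwt97_inverse). In lifting form the inverse needs no separate up-sampling filters; it runs the same steps in reverse order with the signs of the coefficients negated, which is equivalent to the up-sample-and-filter description.

```python
# Lifting constants for the Daubechies (9, 7) filterbank, as quoted in Section II.
ALPHA = -1.586134342
BETA = -0.05298011854
GAMMA = 0.8829110762
DELTA = 0.4435068522
ZETA = 1.149604398


def _predict(low, high, coeff):
    """Return high[i] + coeff * (low[i] + low[i + 1]), mirroring at the right edge."""
    n = len(low)
    return [high[i] + coeff * (low[i] + low[min(i + 1, n - 1)]) for i in range(n)]


def _update(low, high, coeff):
    """Return low[i] + coeff * (high[i - 1] + high[i]), mirroring at the left edge."""
    return [low[i] + coeff * (high[max(i - 1, 0)] + high[i]) for i in range(len(low))]


def dwt97_forward(x):
    """One level of 1-D (9, 7) lifting on an even-length sequence (assumed layout)."""
    if len(x) % 2 or len(x) < 2:
        raise ValueError("expected an even-length input with at least 2 samples")
    low, high = list(x[0::2]), list(x[1::2])  # split into even and odd samples
    high = _predict(low, high, ALPHA)         # first predict step
    low = _update(low, high, BETA)            # first update step
    high = _predict(low, high, GAMMA)         # second predict step
    low = _update(low, high, DELTA)           # second update step
    return [ZETA * s for s in low], [d / ZETA for d in high]  # scaling


def dwt97_inverse(low, high):
    """Undo dwt97_forward: same steps, reverse order, negated coefficients."""
    low = [s / ZETA for s in low]
    high = [d * ZETA for d in high]
    low = _update(low, high, -DELTA)
    high = _predict(low, high, -GAMMA)
    low = _update(low, high, -BETA)
    high = _predict(low, high, -ALPHA)
    x = [0.0] * (2 * len(low))
    x[0::2], x[1::2] = low, high              # merge even and odd samples
    return x


if __name__ == "__main__":
    samples = [10.0, 12.0, 14.0, 13.0, 9.0, 7.0, 8.0, 11.0]
    low, high = dwt97_forward(samples)
    restored = dwt97_inverse(low, high)
    # Largest reconstruction error; expected to be at floating-point rounding level.
    print(max(abs(a - b) for a, b in zip(samples, restored)))
```

Because every lifting step only adds a function of the other band, each step can be undone exactly by subtracting the same quantity, whatever boundary extension is used. In a fixed-point datapath such as the 17-bit one reported above, the multiplications are rounded, so this exactness holds only approximately unless the rounding is mirrored in the inverse.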
Fig. Simulation result of IDWT block with final pixel values.

The figure shows the final pixel values of the original image as calculated and given out serially. Thus, the pixel values given at the input of the DWT block are compared and verified against the final output pixel values from the IDWT block.

V. Conclusion
The applications of 3-D wavelet-based coding are opening new vistas in video and other multidimensional signal compression and processing. The prominent need in these diversified application areas is for efficient 3-D-DWT engines with good computing power, which draws attention to dedicated VLSI architectures as the best possible solution. Though research on 2-D-DWT architectures is progressing quite fast, fewer approaches to designing their 3-D counterparts are reported in the literature.

This paper has presented a lifting-based 3-D-DWT architecture with running transform, possibly the first of its kind. The main features of the design are minimized storage requirement and memory referencing, low latency and power consumption, and increased throughput, which become evident when they are compared with those of existing designs. Having a single adder in its critical path, the mapped processor achieves a high speed of 321 MHz, offering large computing potential which opens up new vistas for real-time video processing applications.

Compared to the original 3-D-DWT transform, successful application of motion compensation before the temporal transform has been reported in the literature [2] as a good alternative to predictive coding. It is worth mentioning that the present design is fully scalable to those future modifications and can be accepted as an introductory step toward those future 3-D wavelet computing machines.

References
[1] A. Skodras, C. Christopoulos, and T. Ebrahimi, "The JPEG 2000 still image compression standard," IEEE Signal Process. Mag., vol. 18, no. 5, pp. 36–58, Sep. 2001.
[2] J.-R. Ohm, M. van der Schaar, and J. W. Woods, "Interframe wavelet coding: Motion picture representation for universal scalability," J. Signal Process. Image Commun., vol. 19, no. 9, pp. 877–908, Oct. 2004.
[3] G. Menegaz and J.-P. Thiran, "Lossy to lossless object-based coding of 3-D MRI data," IEEE Trans. Image Process., vol. 11, no. 9, pp. 1053–1061, Sep. 2002.
[4] J. E. Fowler and J. T. Rucker, "3-D wavelet-based compression of hyperspectral imagery," in Hyperspectral Data Exploitation: Theory and Applications, C.-I. Chang, Ed. Hoboken, NJ: Wiley, 2007, ch. 14, pp. 379–407.
[5] L. R. C. Suzuki, J. R. Reid, T. J. Burns, G. B. Lamont, and S. K. Rogers, "Parallel computation of 3-D wavelets," in Proc. Scalable High-Performance Computing Conf., May 1994, pp. 454–461.
[6] E. Moyano, P. Gonzalez, L. Orozco-Barbosa, F. J. Quiles, P. J. Garcia, and A. Garrido, "3-D wavelet compression by message passing on a Myrinet cluster," in Proc. Can. Conf. Electr. Comput. Eng., vol. 2. 2001, pp. 1005–1010.
[7] W. Badawy, G. Zhang, M. Talley, M. Weeks, and M. Bayoumi, "Low power architecture of running 3-D wavelet transform for medical imaging application," in Proc. IEEE Workshop Signal Process. Syst., Taiwan, 1999, pp. 65–74.
[8] G. Bernabé, J. González, J. M. García, and J. Duato, "Memory conscious 3-D wavelet transform," in Proc. 28th Euromicro Conf. Multimedia Telecommun., Dortmund, Germany, Sep. 2002, pp. 108–113.
[9] M. Weeks and M. A. Bayoumi, "Three-dimensional discrete wavelet transform architectures," IEEE Trans. Signal Process., vol. 50, no. 8, pp. 2050–2063, Aug. 2002.
[10] Q. Dai, X. Chen, and C. Lin, "Novel VLSI architecture for multidimensional discrete wavelet transform," IEEE Trans. Circuits Syst. Video Technol., vol. 14, no. 8, pp. 1105–1110, Aug. 2004.
[11] W. Badawy, M. Weeks, G. Zhang, M. Talley, and M. A. Bayoumi, "MRI data compression using a 3-D discrete wavelet
transform," IEEE Eng. Med. Biol. Mag., vol. 21, no. 4, pp. 95–103, Jul.–Aug. 2002.
[12] J. Xu, Z. Xiong, S. Li, and Y.-Q. Zhang, "Memory-constrained 3-D wavelet transform for video coding without boundary effects," IEEE Trans. Circuits Syst. Video Technol., vol. 12, no. 9, pp. 812–818, Sep. 2002.
[13] B. Das and S. Banerjee, "Low power architecture of running 3-D wavelet transform for medical imaging application," in Proc. Eng. Med. Biol. Soc./Biomed. Eng. Soc. Conf., vol. 2. 2002, pp. 1062–1063.
[14] B. Das and S. Banerjee, "Data-folded architecture for running 3-D DWT using 4-tap Daubechies filters," IEE Proc. Circuits Devices Syst., vol. 152, no. 1, pp. 17–24, Feb. 2005.
[15] W. Sweldens, "The lifting scheme: A custom-design construction of biorthogonal wavelets," Appl. Comput. Harmon. Anal., vol. 3, no. 15, pp. 186–200, 1996.
[16] I. Daubechies and W. Sweldens, "Factoring wavelet transforms into lifting steps," J. Fourier Anal. Appl., vol. 4, no. 3, pp. 247–269, 1998.
[17] Z. Taghavi and S. Kasaei, "A memory efficient algorithm for multidimensional wavelet transform based on lifting," in Proc. IEEE Int. Conf. Acoust. Speech Signal Process. (ICASSP), vol. 6. 2003, pp. 401–404.
[18] C.-T. Huang, P.-C. Tseng, and L.-G. Chen, "Flipping structure: An efficient VLSI architecture for lifting-based discrete wavelet transform," IEEE Trans. Signal Process., vol. 52, no. 4, pp. 1080–1090, Apr. 2004.
[19] C. Parisot, M. Antonini, and M. Barlaud, "3-D scan based wavelet transform and quality control for video coding," Eur. Assoc. Signal Process. J. Appl. Signal Process., vol. 2003, pp. 56–65, Jan. 2003.
[20] S. Barua, J. E. Carletta, K. A. Kotteri, and A. E. Bell, "An efficient architecture for lifting-based two-dimensional discrete wavelet transforms," VLSI J. Integration, vol. 38, no. 3, pp. 341–352, Jan. 2005.
[21] G. Kuzmanov, B. Zafarifar, P. Shrestha, and S. Vassiliadis, "Reconfigurable DWT unit based on lifting," in Proc. Program Res. Integr. Syst. Circuits, Veldhoven, The Netherlands, Nov. 2002, pp. 325–333.
[22] I. S. Uzun and A. Amira, "Design and FPGA implementation of nonseparable 2-D biorthogonal wavelet transforms for image/video coding," in Proc. Int. Conf. Image Process. (ICIP), vol. 4. Belfast, U.K., Oct. 2004, pp. 2825–2828.
[23] B. Girod and S. Han, "Optimum update for motion-compensated lifting," IEEE Signal Process. Lett., vol. 12, no. 2, pp. 150–153, Feb. 2005.
[24] A. Secker and D. Taubman, "Motion-compensated highly scalable video compression using an adaptive 3-D wavelet transform based on lifting," in Proc. IEEE Int. Conf. Image Process., Thessaloniki, Greece, Oct. 2001, pp. 1029–1032.
[25] B. Pesquet-Popescu and V. Bottreau, "Three-dimensional lifting schemes for motion compensated video compression," in Proc. IEEE Int. Conf. Acoust. Speech Signal Process., Salt Lake City, UT, May 2001, pp. 1793–1796.
[26] N. Bozinović, J. Konrad, T. André, M. Antonini, and M. Barlaud, "Motion-compensated lifted wavelet video coding: Toward optimal motion/transform configuration," in Proc. 12th Eur. Signal Process. Conf., Vienna, Austria, Sep. 2004, pp. 1975–1978.
[27] L. Luo, S. Li, Z. Zhuang, and Y.-Q. Zhang, "Motion compensated lifting wavelet and its application in video coding," in Proc. IEEE Int. Conf. Multimedia Expo, Tokyo, Japan, 2001, pp. 365–368.
[28] Architecture and Features of a Fully Scalable Motion-Compensated 3-D Subband Codec, document M7977.doc, ISO/IEC/JTC1/SC29/WG11, Mar. 2002.
[29] Improved MC-EZBC with Quarter-Pixel Motion Vectors, document 813 MPEG2002/M8366, ISO/IEC JTC1/SC29/WG11, May 2002.