Authors:
Jeff Plaisance, Indeed
Nathan Kurz, Verse Communications
Daniel Lemire, LICEF, Universite du Québec
Paper accepted to the International Symposium on Web Algorithms (iSWAG), 2015.
Blog post: http://engineering.indeed.com/blog/2015/03/vectorized-vbyte-decoding-high-performance-vector-instructions/
Abstract:
We consider the ubiquitous technique of VByte compression, which represents each integer as a variable length sequence of bytes. The low 7 bits of each byte encode a portion of the integer, and the high bit of each byte is reserved as a continuation flag. This flag is set to 1 for all bytes except the last, and the decoding of each integer is complete when a byte with a high bit of 0 is encountered. VByte decoding can be a performance bottleneck especially when the unpredictable lengths of the encoded integers cause frequent branch mispredictions. Previous attempts to accelerate VByte decoding using SIMD vector instructions have been disappointing, prodding search engines such as Google to use more complicated but faster-to-decode formats for performance-critical code. Our decoder (MASKED VBYTE) is 2 to 4 times faster than a conventional scalar VByte decoder, making the format once again competitive with regard to speed.
Aes cryptography algorithm based on intelligent blum blum-shub prn gs publica...zaidinvisible
This document summarizes a study that proposes enhancing the Advanced Encryption Standard (AES) algorithm by using an intelligent Blum-Blum-Shub (BBS) pseudo-random number generator to generate the initial encryption key. The AES algorithm is described along with its standard steps of sub-bytes, shift rows, mix columns, and add round key. Issues with the security of AES's public key are discussed. The study then introduces BBS and Iterated Local Search (ILS) metaheuristics and describes how combining them can generate strong cryptographic keys. An example is provided to demonstrate encrypting a message with the enhanced AES approach using an intelligent BBS-generated key. The study concludes the method increases encryption efficiency and
International Journal of Engineering Research and Applications (IJERA) is an open access online peer reviewed international journal that publishes research and review articles in the fields of Computer Science, Neural Networks, Electrical Engineering, Software Engineering, Information Technology, Mechanical Engineering, Chemical Engineering, Plastic Engineering, Food Technology, Textile Engineering, Nano Technology & science, Power Electronics, Electronics & Communication Engineering, Computational mathematics, Image processing, Civil Engineering, Structural Engineering, Environmental Engineering, VLSI Testing & Low Power VLSI Design etc.
This document discusses hash and MAC algorithms. It provides details on hash functions, the Secure Hash Algorithm (SHA), and HMAC.
Hash functions take a message and produce a fixed-size hash value. SHA is a secure hash algorithm developed by NIST that produces 160-bit or longer hash values. It involves padding the message, initializing a buffer, processing the message in blocks through a compression function, and outputting the final hash.
HMAC is a MAC algorithm that incorporates a secret key into an existing hash function like MD5 or SHA. It pads and XORs the key, hashes the result with the message, then hashes again with a padded key to produce the MAC value.
Modern Block Cipher- Modern Symmetric-Key CipherMahbubur Rahman
Introduction to Modern Symmetric-Key Ciphers- This lecture will cover only "Modern Block Cipher".
Slide Credit: Maleka Khatun & Mahbubur Rahman
Dept. of CSE, JnU, BD.
A High Throughput CFA AES S-Box with Error Correction CapabilityIOSR Journals
The document describes a proposed method for implementing a fault tolerant Advanced Encryption Standard (AES) using a Hamming error correction code. AES operates by performing rounds of transformations on blocks of data, with the most complex step being the SubBytes transformation which involves calculating multiplicative inverses in GF(28). The proposed method uses composite field arithmetic to more efficiently calculate these inverses. It also applies a (12,8) Hamming error correction code to each byte before and after processing to detect and correct single bit errors caused by radiation events, improving reliability for satellite communications. The parity check bits for the Hamming code are precalculated and stored for the AES S-box lookup tables.
Cyclic redundancy check (CRC) is a technique used to detect errors in digital data during transmission. It works by adding check bits to the message at the transmitter and checking for errors at the receiver. An encoder generates CRC check bits using a generator polynomial and appends them to the message. The receiver decodes the message using the same CRC method and checks if the remainder is zero, indicating no errors. CRC can be implemented using linear feedback shift registers (LFSRs) to generate the check bits.
FPGA Implementation of SubByte & Inverse SubByte for AES Algorithmijsrd.com
Advanced encryption standard was accepted as a Federal Information Processing Standard (FIPS) standard. In traditional look up table (LUT) approaches, the unbreakable delay is longer than the total delay of the rest of operations in each round. LUT approach consumes a large area. It is more efficient to apply composite field arithmetic in the SubBytes transformation of the AES algorithm. It not only reduces the complexity but also enables deep sub pipelining such that higher speed can be achieved. Isomorphic mapping can be employed to convert GF(28) to GF(22)2)2) ,so that multiplicative inverse can be easily obtained. SubBytes and InvSubBytes transformations are merged using composite field arithmetic. It is most important responsible for the implementation of low cost and high throughput AES architecture. As compared to the typical ROM based lookup table, the presented implementation is both capable of higher speeds since it can be pipelined and small in terms of area occupancy (137/1290 slices on a Spartan III XCS200-5FPGA).
Aes cryptography algorithm based on intelligent blum blum-shub prn gs publica...zaidinvisible
This document summarizes a study that proposes enhancing the Advanced Encryption Standard (AES) algorithm by using an intelligent Blum-Blum-Shub (BBS) pseudo-random number generator to generate the initial encryption key. The AES algorithm is described along with its standard steps of sub-bytes, shift rows, mix columns, and add round key. Issues with the security of AES's public key are discussed. The study then introduces BBS and Iterated Local Search (ILS) metaheuristics and describes how combining them can generate strong cryptographic keys. An example is provided to demonstrate encrypting a message with the enhanced AES approach using an intelligent BBS-generated key. The study concludes the method increases encryption efficiency and
International Journal of Engineering Research and Applications (IJERA) is an open access online peer reviewed international journal that publishes research and review articles in the fields of Computer Science, Neural Networks, Electrical Engineering, Software Engineering, Information Technology, Mechanical Engineering, Chemical Engineering, Plastic Engineering, Food Technology, Textile Engineering, Nano Technology & science, Power Electronics, Electronics & Communication Engineering, Computational mathematics, Image processing, Civil Engineering, Structural Engineering, Environmental Engineering, VLSI Testing & Low Power VLSI Design etc.
This document discusses hash and MAC algorithms. It provides details on hash functions, the Secure Hash Algorithm (SHA), and HMAC.
Hash functions take a message and produce a fixed-size hash value. SHA is a secure hash algorithm developed by NIST that produces 160-bit or longer hash values. It involves padding the message, initializing a buffer, processing the message in blocks through a compression function, and outputting the final hash.
HMAC is a MAC algorithm that incorporates a secret key into an existing hash function like MD5 or SHA. It pads and XORs the key, hashes the result with the message, then hashes again with a padded key to produce the MAC value.
Modern Block Cipher- Modern Symmetric-Key CipherMahbubur Rahman
Introduction to Modern Symmetric-Key Ciphers- This lecture will cover only "Modern Block Cipher".
Slide Credit: Maleka Khatun & Mahbubur Rahman
Dept. of CSE, JnU, BD.
A High Throughput CFA AES S-Box with Error Correction CapabilityIOSR Journals
The document describes a proposed method for implementing a fault tolerant Advanced Encryption Standard (AES) using a Hamming error correction code. AES operates by performing rounds of transformations on blocks of data, with the most complex step being the SubBytes transformation which involves calculating multiplicative inverses in GF(28). The proposed method uses composite field arithmetic to more efficiently calculate these inverses. It also applies a (12,8) Hamming error correction code to each byte before and after processing to detect and correct single bit errors caused by radiation events, improving reliability for satellite communications. The parity check bits for the Hamming code are precalculated and stored for the AES S-box lookup tables.
Cyclic redundancy check (CRC) is a technique used to detect errors in digital data during transmission. It works by adding check bits to the message at the transmitter and checking for errors at the receiver. An encoder generates CRC check bits using a generator polynomial and appends them to the message. The receiver decodes the message using the same CRC method and checks if the remainder is zero, indicating no errors. CRC can be implemented using linear feedback shift registers (LFSRs) to generate the check bits.
FPGA Implementation of SubByte & Inverse SubByte for AES Algorithmijsrd.com
Advanced encryption standard was accepted as a Federal Information Processing Standard (FIPS) standard. In traditional look up table (LUT) approaches, the unbreakable delay is longer than the total delay of the rest of operations in each round. LUT approach consumes a large area. It is more efficient to apply composite field arithmetic in the SubBytes transformation of the AES algorithm. It not only reduces the complexity but also enables deep sub pipelining such that higher speed can be achieved. Isomorphic mapping can be employed to convert GF(28) to GF(22)2)2) ,so that multiplicative inverse can be easily obtained. SubBytes and InvSubBytes transformations are merged using composite field arithmetic. It is most important responsible for the implementation of low cost and high throughput AES architecture. As compared to the typical ROM based lookup table, the presented implementation is both capable of higher speeds since it can be pipelined and small in terms of area occupancy (137/1290 slices on a Spartan III XCS200-5FPGA).
Blockchain and Smart Contract SimulationJun Furuse
Blockchain systems require high levels of safety due to financial incentives. Tezos focuses on formal verification methods to mathematically prove software properties. Simulations are also important for evaluating aspects that cannot be fully proven, such as consensus algorithms, smart contract behavior under different conditions, and determining appropriate gas costs for smart contract opcodes based on runtime performance.
Richard Hamming developed Hamming codes in the late 1940s to enable error correction in computing. Hamming codes are perfect 1-error correcting codes that use parity checks to detect and correct single bit errors in binary data. The codes work by encoding k message bits into an n-bit codeword with additional parity check bits such that the minimum distance between any two codewords is 3, allowing correction of single bit errors. Hamming codes see widespread use and can be generalized to non-binary alphabets. Extended Hamming codes provide both single-error correction and double-error detection.
This document discusses turbo codes, which are a type of error correction code built by parallel concatenating two convolutional code blocks. It focuses on investigating the iterative decoding of turbo codes. The bit error rate is calculated over multiple iterations of decoding and plotted against signal to noise ratio. Quadratic permutation polynomial and random interleavers are analyzed. Results show turbo codes with a memory of 3 and 1280 bit interleaver achieve the best performance, reaching a bit error rate of 10-5 at -1.2 dB after 10 iterations of decoding.
The document discusses various techniques for image compression, including lossless and lossy methods. For lossless compression, it describes predictive coding techniques that remove inter-pixel redundancy such as delta modulation. It also covers entropy encoding schemes like Huffman coding and LZW coding. For lossy compression, it discusses the discrete cosine transform used in the JPEG standard, where higher frequency coefficients are quantized more coarsely to remove information. Zig-zag ordering is used before entropy coding the quantized DCT coefficients.
IRJET- Review Paper on Study of Various Interleavers and their SignificanceIRJET Journal
This document reviews various interleavers and their significance in digital communication systems. It discusses how interleavers can be used in turbo encoders and decoders to improve error correction capabilities without reducing bandwidth. The document summarizes different types of interleavers including random, QPP, helical, odd-even, and matrix interleavers. It also discusses turbo encoding and decoding processes and how convolutional codes differ from block codes. Key performance metrics like bit error rate and bit error rate curves are analyzed to evaluate and compare interleaver quality.
The document discusses using network coding to optimize routing schemes for multicasting in mobile ad-hoc networks. It defines the problem and assumptions, such as independent information sources and specifying multicast requirements. Network coding is described as employing coding at nodes rather than just relaying data. Simulations show that calculating the optimal routing scheme is much less complex with network coding compared to conventional routing, and the network coding solution uses close to the minimum possible energy.
International Journal of Engineering Research and Development (IJERD)IJERD Editor
This document presents a methodology for designing low error fixed width adaptive multipliers. It begins by discussing Baugh-Wooley multiplication, which produces a 2n-bit output from n-bit inputs. For digital signal processing applications, only an n-bit output is required. Direct truncation introduces errors. The methodology proposes using a generalized index and binary thresholding to derive an error-compensation bias to reduce truncation errors. It defines different types of binary thresholding and analyzes statistics to determine average bias values. The proposed fixed width multiplier is intended to have better error performance than other existing multiplier structures.
This document describes an implementation of a soft-decision Viterbi decoder for a GSM receiver using AltiVec SIMD instructions on PowerPC processors. It discusses encoding and decoding of data using convolutional codes, the structure of frames, and an algorithm to calculate branch metrics and update path metrics in parallel using AltiVec to determine the most likely decoding of received bits and perform traceback to output the decoded data. Key steps include setting up constants, calculating branch metrics from soft bit values, accumulating these into path metrics to evaluate possible paths, selecting the best path and tracing back states to decode the received bits.
This document contains a Java program that implements the Cyclic Redundancy Check (CRC) algorithm for error detection in data transmission. The CRC algorithm uses polynomial arithmetic to compute a checksum over transmitted data by dividing the data by a fixed generator polynomial. The checksum is appended to the data before transmission. On the receiving end, the data and checksum are divided again by the generator polynomial. If the remainder is non-zero, then an error is detected in the received data. The Java program takes input data and generator polynomial from the user, computes the CRC checksum, transmits the data with checksum, and checks for errors on receiving end.
International Journal of Engineering Research and DevelopmentIJERD Editor
Electrical, Electronics and Computer Engineering,
Information Engineering and Technology,
Mechanical, Industrial and Manufacturing Engineering,
Automation and Mechatronics Engineering,
Material and Chemical Engineering,
Civil and Architecture Engineering,
Biotechnology and Bio Engineering,
Environmental Engineering,
Petroleum and Mining Engineering,
Marine and Agriculture engineering,
Aerospace Engineering.
Data Compression - Text Compression - Run Length EncodingMANISH T I
Run-length encoding (RLE) replaces consecutive repeated characters in data with a single character and count. For example, "aaabbc" would compress to "3a2bc". RLE works best on data with many repetitive characters like spaces. It has limitations for natural language text which contains few repetitions longer than doubles. Variants include digram encoding which compresses common letter pairs, and differencing which encodes differences between successive values like temperatures instead of absolute values.
Nowadays exponential advancement in reversible comp
utation has lead to better fabrication and
integration process. It has become very popular ove
r the last few years since reversible logic circuit
s
dramatically reduce energy loss. It consumes less p
ower by recovering bit loss from its unique input-o
utput
mapping. This paper presents two new gates called
RC-I and RC-II to design an n-bit signed binary
comparator where simulation results show that the p
roposed circuit works correctly and gives significa
ntly
better performance than the existing counterparts.
An algorithm has been presented in this paper for
constructing an optimized reversible n-bit signed c
omparator circuit. Moreover some lower bounds have
been proposed on the quantum cost, the numbers of g
ates used and the number of garbage outputs
generated for designing a low cost reversible sign
ed comparator. The comparative study shows that the
proposed design exhibits superior performance consi
dering all the efficiency parameters of reversible
logic
design which includes number of gates used, quantum
cost, garbage output and constant inputs. This
proposed design has certainly outperformed all the
other existing approaches.
International Journal of Engineering and Science Invention (IJESI) inventionjournals
International Journal of Engineering and Science Invention (IJESI) is an international journal intended for professionals and researchers in all fields of computer science and electronics. IJESI publishes research articles and reviews within the whole field Engineering Science and Technology, new teaching methods, assessment, validation and the impact of new technologies and it will continue to provide information on the latest trends and developments in this ever-expanding subject. The publications of papers are selected through double peer reviewed to ensure originality, relevance, and readability. The articles published in our journal can be accessed online
BCH codes, part of the cyclic codes, are very powerful error correcting codes widely used in the information coding techniques. This presentation explains these codes with an example.
This document provides an introduction to network coding, including:
- Examples of how network coding can increase throughput in butterfly networks and wireless communications
- Theories showing how intermediate nodes can linearly combine information to deliver data at the maximum possible rate
- Benefits of network coding like increased throughput and efficiency, but also challenges like integrating it into existing infrastructure
Design of Reversible Sequential Circuit Using Reversible Logic SynthesisVLSICS Design
Reversible logic is one of the most vital issue at present time and it has different areas for its application, those are low power CMOS, quantum computing, nanotechnology, cryptography, optical computing, DNA computing, digital signal processing (DSP), quantum dot cellular automata, communication, computer graphics. It is not possible to realize quantum computing without implementation of reversible logic. The main purposes of designing reversible logic are to decrease quantum cost, depth of the circuits and the number of garbage outputs. In this paper, we have proposed a new reversible gate. And we have designed RS flip flop and D flip flop by using our proposed gate and Peres gate. The proposed designs are better than the existing proposed ones in terms of number of reversible gates and garbage outputs. So, this realization is more efficient and less costly than other realizations.
Introduction to Convolutional Codes
Convolutional Encoder Structure
Convolutional Encoder Representation(Vector, Polynomial, State Diagram and Trellis Representations )
Maximum Likelihood Decoder
Viterbi Algorithm
MATLAB Simulation
Hard and Soft Decisions
Bit Error Rate Tradeoff
Consumed Time Tradeoff
Analysis of LDPC Codes under Wi-Max IEEE 802.16eIJERD Editor
The LDPC codes have been shown to be by-far the best coding scheme capable for transmitting message over noisy channel. The main aim of this paper is to study the behaviour of LDPC codes on under IEEE 802.16e guidelines. The rate- ½ LDPC codes have been implemented on AWGN channel and the result shows that they can be used on such channels with low BER performance. The BER can be further minimized by increasing the block length.
Design and Implementation A different Architectures of mixcolumn in FPGAVLSICS Design
This paper details Implementation of the Encryption algorithm AES under VHDL language In FPGA by using different architecture of mixcolumn. We then review this research investigates the AES algorithm in FPGA and the Very High Speed Integrated Circuit Hardware Description language (VHDL). Altera Quartus II software is used for simulation and optimization of the synthesizable VHDL code. The set of transformations of both Encryptions and decryption are simulated using an iterative design approach in order to optimize the hardware consumption. Altera Cyclone III Family devices are utilized for hardware evaluation.
The document describes a modified AES key expansion algorithm for image encryption and decryption. It discusses basics of cryptography and image encryption. It introduces AES and describes the standard AES key expansion process. It then presents a modified AES key expansion algorithm tailored for images where the keys are expanded based on image pixel count, Rcon values are derived from the initial key, and the S-box is shifted based on the initial key. It analyzes the proposed algorithm and shows it offers high encryption quality with minimal time compared to previous techniques.
The Journal of MC Square Scientific Research is published by MC Square Publication on the monthly basis. It aims to publish original research papers devoted to wide areas in various disciplines of science and engineering and their applications in industry. This journal is basically devoted to interdisciplinary research in Science, Engineering and Technology, which can improve the technology being used in industry. The real-life problems involve multi-disciplinary knowledge, and thus strong inter-disciplinary approach is the need of the research.
Blockchain and Smart Contract SimulationJun Furuse
Blockchain systems require high levels of safety due to financial incentives. Tezos focuses on formal verification methods to mathematically prove software properties. Simulations are also important for evaluating aspects that cannot be fully proven, such as consensus algorithms, smart contract behavior under different conditions, and determining appropriate gas costs for smart contract opcodes based on runtime performance.
Richard Hamming developed Hamming codes in the late 1940s to enable error correction in computing. Hamming codes are perfect 1-error correcting codes that use parity checks to detect and correct single bit errors in binary data. The codes work by encoding k message bits into an n-bit codeword with additional parity check bits such that the minimum distance between any two codewords is 3, allowing correction of single bit errors. Hamming codes see widespread use and can be generalized to non-binary alphabets. Extended Hamming codes provide both single-error correction and double-error detection.
This document discusses turbo codes, which are a type of error correction code built by parallel concatenating two convolutional code blocks. It focuses on investigating the iterative decoding of turbo codes. The bit error rate is calculated over multiple iterations of decoding and plotted against signal to noise ratio. Quadratic permutation polynomial and random interleavers are analyzed. Results show turbo codes with a memory of 3 and 1280 bit interleaver achieve the best performance, reaching a bit error rate of 10-5 at -1.2 dB after 10 iterations of decoding.
The document discusses various techniques for image compression, including lossless and lossy methods. For lossless compression, it describes predictive coding techniques that remove inter-pixel redundancy such as delta modulation. It also covers entropy encoding schemes like Huffman coding and LZW coding. For lossy compression, it discusses the discrete cosine transform used in the JPEG standard, where higher frequency coefficients are quantized more coarsely to remove information. Zig-zag ordering is used before entropy coding the quantized DCT coefficients.
IRJET- Review Paper on Study of Various Interleavers and their SignificanceIRJET Journal
This document reviews various interleavers and their significance in digital communication systems. It discusses how interleavers can be used in turbo encoders and decoders to improve error correction capabilities without reducing bandwidth. The document summarizes different types of interleavers including random, QPP, helical, odd-even, and matrix interleavers. It also discusses turbo encoding and decoding processes and how convolutional codes differ from block codes. Key performance metrics like bit error rate and bit error rate curves are analyzed to evaluate and compare interleaver quality.
The document discusses using network coding to optimize routing schemes for multicasting in mobile ad-hoc networks. It defines the problem and assumptions, such as independent information sources and specifying multicast requirements. Network coding is described as employing coding at nodes rather than just relaying data. Simulations show that calculating the optimal routing scheme is much less complex with network coding compared to conventional routing, and the network coding solution uses close to the minimum possible energy.
International Journal of Engineering Research and Development (IJERD)IJERD Editor
This document presents a methodology for designing low error fixed width adaptive multipliers. It begins by discussing Baugh-Wooley multiplication, which produces a 2n-bit output from n-bit inputs. For digital signal processing applications, only an n-bit output is required. Direct truncation introduces errors. The methodology proposes using a generalized index and binary thresholding to derive an error-compensation bias to reduce truncation errors. It defines different types of binary thresholding and analyzes statistics to determine average bias values. The proposed fixed width multiplier is intended to have better error performance than other existing multiplier structures.
This document describes an implementation of a soft-decision Viterbi decoder for a GSM receiver using AltiVec SIMD instructions on PowerPC processors. It discusses encoding and decoding of data using convolutional codes, the structure of frames, and an algorithm to calculate branch metrics and update path metrics in parallel using AltiVec to determine the most likely decoding of received bits and perform traceback to output the decoded data. Key steps include setting up constants, calculating branch metrics from soft bit values, accumulating these into path metrics to evaluate possible paths, selecting the best path and tracing back states to decode the received bits.
This document contains a Java program that implements the Cyclic Redundancy Check (CRC) algorithm for error detection in data transmission. The CRC algorithm uses polynomial arithmetic to compute a checksum over transmitted data by dividing the data by a fixed generator polynomial. The checksum is appended to the data before transmission. On the receiving end, the data and checksum are divided again by the generator polynomial. If the remainder is non-zero, then an error is detected in the received data. The Java program takes input data and generator polynomial from the user, computes the CRC checksum, transmits the data with checksum, and checks for errors on receiving end.
International Journal of Engineering Research and DevelopmentIJERD Editor
Electrical, Electronics and Computer Engineering,
Information Engineering and Technology,
Mechanical, Industrial and Manufacturing Engineering,
Automation and Mechatronics Engineering,
Material and Chemical Engineering,
Civil and Architecture Engineering,
Biotechnology and Bio Engineering,
Environmental Engineering,
Petroleum and Mining Engineering,
Marine and Agriculture engineering,
Aerospace Engineering.
Data Compression - Text Compression - Run Length EncodingMANISH T I
Run-length encoding (RLE) replaces consecutive repeated characters in data with a single character and count. For example, "aaabbc" would compress to "3a2bc". RLE works best on data with many repetitive characters like spaces. It has limitations for natural language text which contains few repetitions longer than doubles. Variants include digram encoding which compresses common letter pairs, and differencing which encodes differences between successive values like temperatures instead of absolute values.
Nowadays exponential advancement in reversible comp
utation has lead to better fabrication and
integration process. It has become very popular ove
r the last few years since reversible logic circuit
s
dramatically reduce energy loss. It consumes less p
ower by recovering bit loss from its unique input-o
utput
mapping. This paper presents two new gates called
RC-I and RC-II to design an n-bit signed binary
comparator where simulation results show that the p
roposed circuit works correctly and gives significa
ntly
better performance than the existing counterparts.
An algorithm has been presented in this paper for
constructing an optimized reversible n-bit signed c
omparator circuit. Moreover some lower bounds have
been proposed on the quantum cost, the numbers of g
ates used and the number of garbage outputs
generated for designing a low cost reversible sign
ed comparator. The comparative study shows that the
proposed design exhibits superior performance consi
dering all the efficiency parameters of reversible
logic
design which includes number of gates used, quantum
cost, garbage output and constant inputs. This
proposed design has certainly outperformed all the
other existing approaches.
International Journal of Engineering and Science Invention (IJESI) inventionjournals
International Journal of Engineering and Science Invention (IJESI) is an international journal intended for professionals and researchers in all fields of computer science and electronics. IJESI publishes research articles and reviews within the whole field Engineering Science and Technology, new teaching methods, assessment, validation and the impact of new technologies and it will continue to provide information on the latest trends and developments in this ever-expanding subject. The publications of papers are selected through double peer reviewed to ensure originality, relevance, and readability. The articles published in our journal can be accessed online
BCH codes, part of the cyclic codes, are very powerful error correcting codes widely used in the information coding techniques. This presentation explains these codes with an example.
This document provides an introduction to network coding, including:
- Examples of how network coding can increase throughput in butterfly networks and wireless communications
- Theories showing how intermediate nodes can linearly combine information to deliver data at the maximum possible rate
- Benefits of network coding like increased throughput and efficiency, but also challenges like integrating it into existing infrastructure
Design of Reversible Sequential Circuit Using Reversible Logic SynthesisVLSICS Design
Reversible logic is one of the most vital issue at present time and it has different areas for its application, those are low power CMOS, quantum computing, nanotechnology, cryptography, optical computing, DNA computing, digital signal processing (DSP), quantum dot cellular automata, communication, computer graphics. It is not possible to realize quantum computing without implementation of reversible logic. The main purposes of designing reversible logic are to decrease quantum cost, depth of the circuits and the number of garbage outputs. In this paper, we have proposed a new reversible gate. And we have designed RS flip flop and D flip flop by using our proposed gate and Peres gate. The proposed designs are better than the existing proposed ones in terms of number of reversible gates and garbage outputs. So, this realization is more efficient and less costly than other realizations.
Introduction to Convolutional Codes
Convolutional Encoder Structure
Convolutional Encoder Representation(Vector, Polynomial, State Diagram and Trellis Representations )
Maximum Likelihood Decoder
Viterbi Algorithm
MATLAB Simulation
Hard and Soft Decisions
Bit Error Rate Tradeoff
Consumed Time Tradeoff
Analysis of LDPC Codes under Wi-Max IEEE 802.16eIJERD Editor
The LDPC codes have been shown to be by-far the best coding scheme capable for transmitting message over noisy channel. The main aim of this paper is to study the behaviour of LDPC codes on under IEEE 802.16e guidelines. The rate- ½ LDPC codes have been implemented on AWGN channel and the result shows that they can be used on such channels with low BER performance. The BER can be further minimized by increasing the block length.
Design and Implementation A different Architectures of mixcolumn in FPGAVLSICS Design
This paper details Implementation of the Encryption algorithm AES under VHDL language In FPGA by using different architecture of mixcolumn. We then review this research investigates the AES algorithm in FPGA and the Very High Speed Integrated Circuit Hardware Description language (VHDL). Altera Quartus II software is used for simulation and optimization of the synthesizable VHDL code. The set of transformations of both Encryptions and decryption are simulated using an iterative design approach in order to optimize the hardware consumption. Altera Cyclone III Family devices are utilized for hardware evaluation.
The document describes a modified AES key expansion algorithm for image encryption and decryption. It discusses basics of cryptography and image encryption. It introduces AES and describes the standard AES key expansion process. It then presents a modified AES key expansion algorithm tailored for images where the keys are expanded based on image pixel count, Rcon values are derived from the initial key, and the S-box is shifted based on the initial key. It analyzes the proposed algorithm and shows it offers high encryption quality with minimal time compared to previous techniques.
The Journal of MC Square Scientific Research is published by MC Square Publication on the monthly basis. It aims to publish original research papers devoted to wide areas in various disciplines of science and engineering and their applications in industry. This journal is basically devoted to interdisciplinary research in Science, Engineering and Technology, which can improve the technology being used in industry. The real-life problems involve multi-disciplinary knowledge, and thus strong inter-disciplinary approach is the need of the research.
An Open Discussion of RISC-V BitManip, trends, and comparisons _ ClaireRISC-V International
Join RISC-V BitManip industry leader Claire Xenia Wolf and Dr. James Cuff for an open and lively discussion with an interactive Q&A on RISC-V and BitManip including trends and comparisons with the existing architecture landscape including x86 and ARM and what specifically makes RISC-V unique.
This document discusses modeling digital signal processing (DSP) CPU architectures using MATLAB. It provides an overview of typical DSP design flows and modeling approaches, including behavioral, bit-accurate, time-accurate and pipeline models. It also discusses fixed-point number representation issues in MATLAB and outlines several tutorials that demonstrate modeling DSP units like adders, accumulators and filters using MATLAB.
Lossless compression algorithms compress data without any loss of information, allowing the original data to be perfectly reconstructed from the compressed data. Some common lossless compression algorithms include run length encoding (RLE), Huffman coding, Lempel-Ziv-Welch (LZW), and variable length coding (VLC). RLE replaces repeated characters with a single character and count, Huffman coding assigns variable length binary codes to characters based on their frequency, and LZW builds a dictionary of repeated strings to shorten repetitive sequences during compression. Lossless compression is useful for storage and transmission of files where preserving complete accuracy of the data is important.
This document describes the design and verification of an 8x8 Vedic multiplier using a 90nm CMOS process. It presents the design methodology, including the use of Vedic multiplication algorithms to reduce computational steps compared to traditional methods. Transistor-level schematics for 2x2 and 4x4 multiplier modules are designed in Cadence using a 90nm library. The 4x4 module uses ripple carry adders to sum partial products in parallel. Simulation results verify the transistor-level designs match an ideal multiplier designed in Verilog, demonstrating an efficient digital multiplier based on Vedic mathematics.
Hardware implementation of aes encryption and decryption for low area & power...eSAT Publishing House
IJRET : International Journal of Research in Engineering and Technology is an international peer reviewed, online journal published by eSAT Publishing House for the enhancement of research in various disciplines of Engineering and Technology. The aim and scope of the journal is to provide an academic medium and an important reference for the advancement and dissemination of research results that support high-level learning, teaching and research in the fields of Engineering and Technology. We bring together Scientists, Academician, Field Engineers, Scholars and Students of related fields of Engineering and Technology
Enhancing the matrix transpose operation using intel avx instruction set exte...ijcsit
General-purpose microprocessors are augmented with short-vector instruction extensions in order to
simultaneously process more than one data element using the same operation. This type of parallelism is
known as data-parallel processing. Many scientific, engineering, and signal processing applications can be
formulated as matrix operations. Therefore, accelerating these kernel operations on microprocessors,
which are the building blocks or large high-performance computing systems, will definitely boost the
performance of the aforementioned applications. In this paper, we consider the acceleration of the matrix
transpose operation using the 256-bit Intel advanced vector extension (AVX) instructions. We present a
novel vector-based matrix transpose algorithm and its optimized implementation using AVX instructions.
The experimental results on Intel Core i7 processor demonstrates a 2.83 speedup over the standard
sequential implementation, and a maximum of 1.53 speedup over the GCC library implementation. When
the transpose is combined with matrix addition to compute the matrix update, B + AT, where A and B are
squared matrices, the speedup of our implementation over the sequential algorithm increased to 3.19.
This document provides an overview of the C programming language. It discusses the features of C including that it is simple, versatile, supports separate compilation of functions, and can be used for systems programming. It also covers C data types like integers, floating-point numbers, and how they are represented. Additional topics include variable names, comments, input/output functions like printf() and scanf(), and conventions for writing readable C code. The document serves as an introduction to key concepts in C programming.
This document provides an overview of the C programming language. It discusses various features of C like it being a simple, versatile language that allows for separate compilation of functions. It also covers different data types in C like integer, floating-point, characters, and arrays. Examples are given to illustrate how integers of different sizes like char, short, int, long are represented. The document also discusses floating-point number representation according to the IEEE 754 standard. Finally, it briefly introduces common input/output functions in C like printf() and scanf() and covers coding conventions.
This document provides an overview of the C programming language. It discusses the features of C including that it is simple, versatile, supports separate compilation of functions, and can be used for systems programming. It also covers C data types like integers, floating-point numbers, and how they are represented. Additional topics include variable names, comments, input/output functions like printf() and scanf(), and conventions for writing readable C code. The document serves as an introduction to programming in C.
International Journal of Engineering Research and Applications (IJERA) is an open access online peer reviewed international journal that publishes research and review articles in the fields of Computer Science, Neural Networks, Electrical Engineering, Software Engineering, Information Technology, Mechanical Engineering, Chemical Engineering, Plastic Engineering, Food Technology, Textile Engineering, Nano Technology & science, Power Electronics, Electronics & Communication Engineering, Computational mathematics, Image processing, Civil Engineering, Structural Engineering, Environmental Engineering, VLSI Testing & Low Power VLSI Design etc.
The document discusses various arithmetic instructions in assembly language including INC, DEC, ADD, SUB, MUL, DIV.
It explains the syntax and usage of each instruction, showing how they can increment, decrement, add, subtract, multiply or divide operands in registers or memory. Examples are given to illustrate how to perform basic arithmetic operations like addition and multiplication of numbers entered by the user.
Engineering Fast Indexes for Big-Data Applications: Spark Summit East talk by...Spark Summit
Contemporary computing hardware offers massive new performance opportunities. Yet high-performance programming remains a daunting challenge.
We present some of the lessons learned while designing faster indexes, with a particular emphasis on compressed bitmap indexes. Compressed bitmap indexes accelerate queries in popular systems such as Apache Spark, Git, Elastic, Druid and Apache Kylin.
Welcome to International Journal of Engineering Research and Development (IJERD)IJERD Editor
The document presents a novel approach for forward error correction (FEC) decoding based on the belief propagation (BP) algorithm in LTE and WiMAX systems. Specifically, it proposes representing tail-biting convolutional codes and turbo codes using parity check matrices, which allows both code types to be decoded using a unified BP algorithm. This provides a lower complexity decoding architecture compared to traditional approaches. Simulation results show the BP algorithm achieves near-identical performance to maximum a posteriori (MAP) decoding for turbo codes, while being less complex. Representing codes with parity check matrices thus enables a universal decoder for LTE and WiMAX using a single BP algorithm.
This presentation has information about what do you mean by an algorithm, what is hashing and various hashing algorithms and their applications. Approximate counting Algorithm and their applications, Counting Distinct Elements Algorithm and their applications and Frequency estimation algorithm and their applications . Also, the research papers we referenced.
Development of an Algorithm for 16-Bit WTMIOSR Journals
This document describes the development of an algorithm for a 16-bit Wallace tree multiplier (WTM). It begins with an overview of binary multiplication methods and why the Wallace tree structure is advantageous in reducing propagation delay. The document then discusses improvements made to the basic WTM algorithm, including a new method for generating partial products using fewer logic gates. It presents the design, synthesis and testing of WTM circuits of varying sizes on a Spartan-3E FPGA board. Performance metrics like delay, area, power-delay product and area-delay product are measured and compared to other multipliers. The 16-bit WTM is found to have superior performance to the other multipliers in terms of delay, area and speed.
Weapons of Math Instruction: Evolving from Data0-Driven to Science-Drivenindeedeng
Donal McMahon, Director of Data Science at Indeed, presented how to transition from data-driven to science-driven product development. You’ll make better business decisions. It’s provable!
Alchemy and Science: Choosing Metrics That Workindeedeng
I help people find jobs by offering and hiring them. This document discusses a job opening for a data scientist role at Microsoft where the candidate would be responsible for defining data strategy, analyzing data, sharing insights, and helping achieve business goals by understanding customers through data.
Indeed Engineering and The Lead Developer Present: Tech Leadership and Manage...indeedeng
On March 1 2018, Indeed hosted a series of talks about leadership and management in the tech industry. Lighting talks included Data Scientist Robyn Rap with "Fish a Manager to Teach," Product Manager Michael Magan's "What Your Product Manager Wants from a Tech Lead," and Engineering Manager Paresh Suthar discussed "New Engineering Manager at Indeed? First: Write Some Code."
Ketan Gangatirkar, head of Job Seeker Engineering, provided the keynote "Quantum Leap: From Managing a Team to Leading an Org."
Indeed Engineering and The Lead Developer Present: Tech Leadership and Manage...indeedeng
This document summarizes a presentation given by Michael Magan, a Product Manager at Indeed. The presentation focuses on how to be an effective product manager by motivating teams, building products to last, and simplifying requirements. It provides examples of how Magan motivates his team by clearly defining success metrics and sharing data on product impact. It also emphasizes the importance of identifying high impact features and validating ideas with data before building them. The presentation encourages product managers to simplify requirements by prioritizing work based on its impact and difficulty. It concludes by discussing career paths for product managers such as becoming a director of engineering, chief architect, or CTO.
Improving the development process with metrics driven insights presentationindeedeng
- Jack Humphrey, VP of Engineering at Indeed, discusses how the company uses metrics and data analytics to improve processes and make decisions.
- They capture metrics on everything from code commits to production deployments to better understand workflows.
- As an example, Indeed measured translation verification times in JIRA and found issues were pending for too long, so they implemented a new deployment process that significantly reduced times.
- Metrics must be questioned and used carefully to avoid unintended outcomes, but can effectively support process improvement when paired with learning and discussion.
Data-Driven off a Cliff: Anti-Patterns in Evidence-Based Decision Makingindeedeng
The document discusses common anti-patterns in evidence-based decision making, including being impatient, taking shortcuts in sampling and analysis, focusing on a single metric, and believing too strongly in one's own conclusions. It provides examples of companies making misguided decisions due to these anti-patterns, such as ending A/B tests early, ignoring parts of a sample, overemphasizing short-term metrics, and overrelying on persuasive but incorrect stories. The document advocates being patient, rigorous in sampling and analysis, considering multiple relevant metrics, and acknowledging the potential for fallibility.
Automation and Developer Infrastructure — Empowering Engineers to Move from I...indeedeng
Link to video: https://youtu.be/aHHfq4WK9Jw
At Indeed, we're growing quickly, from our engineer headcount to the number of features we deploy. Over the last three years, we’ve had a 6x increase in engineers, and a 15x increase in number of deploys. We’re currently deploying over 700 new features each week. In this talk, we'll describe the infrastructure built to support, scale and automate our software development and product releases, and how any organization can use these tools and techniques to improve release velocity in the face of rapid growth. Specifically, we will discuss Hobo — an easy, standardized way for developers to run our application stacks in Docker. We’ll also describe Control Tower, which manages software releases by unifying all of the information about application features into a single interface. These tools allow engineers to focus on product development, while moving their work from idea to production as efficiently as possible.
@Indeedeng: RAD - How We Replicate Terabytes of Data Around the World Every Dayindeedeng
Link to video: https://youtu.be/lDXdf5q8Yw8
At Indeed, we use massive amounts of data to build our products and services. At first, we relied on rsync to distribute these data to our servers. This rsync system lasted for ten years before we started to encounter scaling challenges. So we built a new system on top of BitTorrent to improve latency, reliability, and throughput. Today, terabytes of data flow around the world every day between our servers. In this talk, we will describe what we needed, what we created, and the lessons we learned building a system at this scale.
The document discusses job recommendations and similarities between users on a job site. It notes that millions of new jobs are discovered daily and 1.5 million new users visit the site each day. It also explains that the average lifespan of a job posting is about 30 days. The document then describes algorithms used to calculate similarities between users based on jobs viewed, and how job recommendations are personalized based on clustering users with similar interests.
Imhotep is Indeed's open-source analytics platform that allows for easy upload and compression of data as well as fast, interactive querying. It uses an SQL-like query language called IQL to perform aggregate analytics on datasets within specified date ranges and with optional filters, groupings, and metrics. The platform aims to provide an interactive experience where questions can be quickly refined, show ground truth data through a web interface, and instantly share insights through caching.
@IndeedEng: Tokens and Millicents - technical challenges in launching Indeed...indeedeng
This document provides an overview of the technical challenges in launching Indeed's job search platform around the world. It discusses how Indeed handles tokenization and indexing of jobs in different languages, including challenges with Chinese, Japanese, and Korean text. It describes Indeed's approaches to language detection, stemming, and query expansion to improve recall and relevance across many international markets. Key techniques discussed include n-gram tokenization, Unicode blocking, Bayesian classification, term expansion maps separated from indexing, and rule-based stemming. The goal is to make Indeed's search system scalable, generic, and able to support comprehensive use cases for job searching in different languages and regions globally.
[@IndeedEng] Large scale interactive analytics with Imhotepindeedeng
Link to video: https://www.youtube.com/watch?v=IZ-kC6ut1Lg
In a previous talk, we explained how we developed Imhotep, a distributed system for building decision trees for machine learning. We went on to describe how we build large scale interactive analytics tools using the same platform. This has kept our engineering and product organizations focused on key metrics by analyzing test results. It also gives our marketing organization timely and accurate insight into our data - allowing us to identify opportunities, spot trends, and learn about our job seekers. In this talk, Zak Cocos, who leads our Marketing Sciences team, and Product Manager Tom Bergman will discuss and provide examples of the valuable insights that can be gained by using Imhotep with almost any data set.
Video available at: http://youtu.be/y0WC1cxLsfo
At Indeed our applications generate billions of log events each month across our seven data centers worldwide. These events store user and test data that form the foundation for decision making at Indeed. We built a distributed event logging system, called Logrepo, to record, aggregate, and access these logs. In this talk, we'll examine the architecture of Logrepo and how it evolved to scale.
Jeff Chien joined Indeed as a software engineer in 2008. He's worked on jobsearch frontend and backend, advertiser, company data, and apply teams and enjoys building scalable applications.
Jason Koppe is a Systems Administrator who has been with Indeed since late 2008. He's worked on infrastructure automation, monitoring, application resiliency, incident response and capacity planning.
[@IndeedEng] Boxcar: A self-balancing distributed services protocol indeedeng
Video available at: http://www.youtube.com/watch?v=E1ok08TVxDw
Indeed's flagship job search product has evolved over the years to meet new challenges. It began as a single, monolithic web application. This grew larger and increasingly complex as we built new features. To remedy this growing problem, we implemented a service-oriented architecture to improve system availability, scalability, and maintainability. We examined common practices for service-oriented architectures, and we discovered ways to improve on the state of the art. We developed these ideas into a new framework called Boxcar. In this talk, we will discuss the scaling problems we solved, the innovative ideas behind boxcar, and how we built the scalable architecture that we now use throughout our systems.
R.B. Boyer is a Software Engineer who has been with Indeed since late 2007. Over the years he has worked on a variety of projects, including distributed storage, authentication, and service architectures.
[@IndeedEng Talk] Diving deeper into data-driven product designindeedeng
Video available at: http://www.youtube.com/watch?v=i8MGTZ3KWmc
At April’s @IndeedEng Talk we introduced Indeed’s philosophy and practice of A/B testing. In this talk, two Indeed product managers will discuss how we used data-driven opportunity analysis and iterative testing to build two products. From product vision to product success, we’ll describe what we tested, how it performed, and what we learned from it. Product managers, designers, and engineers who want to learn how to prioritize product and feature ideas, iterate through tests, or optimize a funnel will find valuable insights to apply to their own products.
Graham Davis is a Senior Product Manager for Employer Products at Indeed. Prior to Indeed, Graham previously worked for several startups and got an MBA from Harvard Business School.
Donald Wysocki is Product Director for Job Search at Indeed. Prior to Indeed, Donald worked at frog design and Microsoft.
[@IndeedEng] Managing Experiments and Behavior Dynamically with Proctorindeedeng
Video available at: http://youtu.be/Q1T5J0KXUwY
At this very moment, Indeed is running more than one hundred A/B experiments. In previous @IndeedEng talks, we have discussed how we use A/B testing to develop better products.
In this tech talk, software engineer Matt Schemmel and product manager Tom Bergman describe Proctor, the system we developed to define and manage all of these experiments. They explain how we use Proctor to target users using data-driven rules, adjust experiments on-the-fly, and ensure clean results for multi-variate tests. Over time, Proctor has evolved from a system designed for managing experiments to one that manages overall system behavior through dynamic "feature toggle" functionality. Matt and Tom also share lessons we have learned from years of experimenting at web scale.
Matt Schemmel is a Senior Software Engineer working primarily on our Resume products.
Tom Bergman is a Product Manager currently working on our Aggregation systems. He previously helped evolve many of Indeed's data analysis tools, and also helped us launch and grow our sites in Japan, Korea, and China.
[@IndeedEng] Engineering Velocity: Building Great Software Through Fast Itera...indeedeng
Video available: http://www.youtube.com/watch?v=zCy077_dyJo&feature=youtu.be
Since 2005, Indeed has created and cultivated a strong engineering culture with a focus on ownership, real-world impact, and constant incremental delivery. Our experience has demonstrated that rapid iteration is essential to discovering the most valuable functionality for our users. In the next @IndeedEng talk, Dan Heller will share some of the architectural solutions, tools, and processes Indeed has created to support constant incremental delivery of new features and enhancements.
Speaker:
Dan Heller has been working in software development for 13 years including time at Google, IBM, and long-forgotten startups. He has been at Indeed for the last 4 years, helping people get jobs by building products for Indeed’s employers and advertisers.
[@IndeedEng] Redundant Array of Inexpensive Datacentersindeedeng
Video available: http://youtu.be/hOsA5UpPUSU
Learn how Indeed built one of the fastest and most reliable websites in the world. Indeed Operations ensures indeed.com is always available and always fast for the jobseeker. Operations leaders Charles Valentine and Chris Graf will share how we configure and provision multiple datacenters around the world to provide a massively scalable platform for connecting job seekers with jobs. Charles and Chris will detail a simple and inexpensive method to build a platform that provides DNS-based global load balancing and failover, provider portability, and disposable datacenters.
Speakers:
Charles Valentine (VP of Technology Services at Indeed) leads the Operations, IT, and Security teams. Prior to joining Indeed in 2011, Charles served as VP Technology Services at The Knot.
Chris Graf has managed operations at Indeed since 2011. In that time, Indeed's traffic has grown by more than 300%. Prior to Indeed, Chris managed Web operations in the online gaming industry.
[@IndeedEng] Building Indeed Resume Searchindeedeng
Video available: http://youtu.be/qcnP5gQGBaU
Software engineer David Tulig will dive into the architecture of Indeed’s Resume Instant Search and our use of the Google Closure tools. David will explain how we write maintainable, efficient JavaScript components for Resume Instant Search and other Indeed products. He will discuss how we create templates that run on both client and server, providing fast initial page load time and search engine-friendly pages with the responsiveness of client-side rendering.
Speaker:
David Tulig is a software engineer on the Job Search team at Indeed. David has worked on employer, resume, and job search products during his 4 years at Indeed.
Mechatronics is a multidisciplinary field that refers to the skill sets needed in the contemporary, advanced automated manufacturing industry. At the intersection of mechanics, electronics, and computing, mechatronics specialists create simpler, smarter systems. Mechatronics is an essential foundation for the expected growth in automation and manufacturing.
Mechatronics deals with robotics, control systems, and electro-mechanical systems.
Home security is of paramount importance in today's world, where we rely more on technology, home
security is crucial. Using technology to make homes safer and easier to control from anywhere is
important. Home security is important for the occupant’s safety. In this paper, we came up with a low cost,
AI based model home security system. The system has a user-friendly interface, allowing users to start
model training and face detection with simple keyboard commands. Our goal is to introduce an innovative
home security system using facial recognition technology. Unlike traditional systems, this system trains
and saves images of friends and family members. The system scans this folder to recognize familiar faces
and provides real-time monitoring. If an unfamiliar face is detected, it promptly sends an email alert,
ensuring a proactive response to potential security threats.
Accident detection system project report.pdfKamal Acharya
The Rapid growth of technology and infrastructure has made our lives easier. The
advent of technology has also increased the traffic hazards and the road accidents take place
frequently which causes huge loss of life and property because of the poor emergency facilities.
Many lives could have been saved if emergency service could get accident information and
reach in time. Our project will provide an optimum solution to this draw back. A piezo electric
sensor can be used as a crash or rollover detector of the vehicle during and after a crash. With
signals from a piezo electric sensor, a severe accident can be recognized. According to this
project when a vehicle meets with an accident immediately piezo electric sensor will detect the
signal or if a car rolls over. Then with the help of GSM module and GPS module, the location
will be sent to the emergency contact. Then after conforming the location necessary action will
be taken. If the person meets with a small accident or if there is no serious threat to anyone’s
life, then the alert message can be terminated by the driver by a switch provided in order to
avoid wasting the valuable time of the medical rescue team.
Blood finder application project report (1).pdfKamal Acharya
Blood Finder is an emergency time app where a user can search for the blood banks as
well as the registered blood donors around Mumbai. This application also provide an
opportunity for the user of this application to become a registered donor for this user have
to enroll for the donor request from the application itself. If the admin wish to make user
a registered donor, with some of the formalities with the organization it can be done.
Specialization of this application is that the user will not have to register on sign-in for
searching the blood banks and blood donors it can be just done by installing the
application to the mobile.
The purpose of making this application is to save the user’s time for searching blood of
needed blood group during the time of the emergency.
This is an android application developed in Java and XML with the connectivity of
SQLite database. This application will provide most of basic functionality required for an
emergency time application. All the details of Blood banks and Blood donors are stored
in the database i.e. SQLite.
This application allowed the user to get all the information regarding blood banks and
blood donors such as Name, Number, Address, Blood Group, rather than searching it on
the different websites and wasting the precious time. This application is effective and
user friendly.
Null Bangalore | Pentesters Approach to AWS IAMDivyanshu
#Abstract:
- Learn more about the real-world methods for auditing AWS IAM (Identity and Access Management) as a pentester. So let us proceed with a brief discussion of IAM as well as some typical misconfigurations and their potential exploits in order to reinforce the understanding of IAM security best practices.
- Gain actionable insights into AWS IAM policies and roles, using hands on approach.
#Prerequisites:
- Basic understanding of AWS services and architecture
- Familiarity with cloud security concepts
- Experience using the AWS Management Console or AWS CLI.
- For hands on lab create account on [killercoda.com](https://killercoda.com/cloudsecurity-scenario/)
# Scenario Covered:
- Basics of IAM in AWS
- Implementing IAM Policies with Least Privilege to Manage S3 Bucket
- Objective: Create an S3 bucket with least privilege IAM policy and validate access.
- Steps:
- Create S3 bucket.
- Attach least privilege policy to IAM user.
- Validate access.
- Exploiting IAM PassRole Misconfiguration
-Allows a user to pass a specific IAM role to an AWS service (ec2), typically used for service access delegation. Then exploit PassRole Misconfiguration granting unauthorized access to sensitive resources.
- Objective: Demonstrate how a PassRole misconfiguration can grant unauthorized access.
- Steps:
- Allow user to pass IAM role to EC2.
- Exploit misconfiguration for unauthorized access.
- Access sensitive resources.
- Exploiting IAM AssumeRole Misconfiguration with Overly Permissive Role
- An overly permissive IAM role configuration can lead to privilege escalation by creating a role with administrative privileges and allow a user to assume this role.
- Objective: Show how overly permissive IAM roles can lead to privilege escalation.
- Steps:
- Create role with administrative privileges.
- Allow user to assume the role.
- Perform administrative actions.
- Differentiation between PassRole vs AssumeRole
Try at [killercoda.com](https://killercoda.com/cloudsecurity-scenario/)
Software Engineering and Project Management - Software Testing + Agile Method...Prakhyath Rai
Software Testing: A Strategic Approach to Software Testing, Strategic Issues, Test Strategies for Conventional Software, Test Strategies for Object -Oriented Software, Validation Testing, System Testing, The Art of Debugging.
Agile Methodology: Before Agile – Waterfall, Agile Development.
Optimizing Gradle Builds - Gradle DPE Tour Berlin 2024Sinan KOZAK
Sinan from the Delivery Hero mobile infrastructure engineering team shares a deep dive into performance acceleration with Gradle build cache optimizations. Sinan shares their journey into solving complex build-cache problems that affect Gradle builds. By understanding the challenges and solutions found in our journey, we aim to demonstrate the possibilities for faster builds. The case study reveals how overlapping outputs and cache misconfigurations led to significant increases in build times, especially as the project scaled up with numerous modules using Paparazzi tests. The journey from diagnosing to defeating cache issues offers invaluable lessons on maintaining cache integrity without sacrificing functionality.
Prediction of Electrical Energy Efficiency Using Information on Consumer's Ac...PriyankaKilaniya
Energy efficiency has been important since the latter part of the last century. The main object of this survey is to determine the energy efficiency knowledge among consumers. Two separate districts in Bangladesh are selected to conduct the survey on households and showrooms about the energy and seller also. The survey uses the data to find some regression equations from which it is easy to predict energy efficiency knowledge. The data is analyzed and calculated based on five important criteria. The initial target was to find some factors that help predict a person's energy efficiency knowledge. From the survey, it is found that the energy efficiency awareness among the people of our country is very low. Relationships between household energy use behaviors are estimated using a unique dataset of about 40 households and 20 showrooms in Bangladesh's Chapainawabganj and Bagerhat districts. Knowledge of energy consumption and energy efficiency technology options is found to be associated with household use of energy conservation practices. Household characteristics also influence household energy use behavior. Younger household cohorts are more likely to adopt energy-efficient technologies and energy conservation practices and place primary importance on energy saving for environmental reasons. Education also influences attitudes toward energy conservation in Bangladesh. Low-education households indicate they primarily save electricity for the environment while high-education households indicate they are motivated by environmental concerns.
Supermarket Management System Project Report.pdfKamal Acharya
Supermarket management is a stand-alone J2EE using Eclipse Juno program.
This project contains all the necessary required information about maintaining
the supermarket billing system.
The core idea of this project to minimize the paper work and centralize the
data. Here all the communication is taken in secure manner. That is, in this
application the information will be stored in client itself. For further security the
data base is stored in the back-end oracle and so no intruders can access it.
1. 1st
International Symposium on Web AlGorithms • June 2015
Vectorized VByte Decoding
Jeff Plaisance
Indeed
jplaisance@indeed.com
Nathan Kurz
Verse Communications
nate@verse.com
Daniel Lemire
LICEF, Universit´e du Qu´ebec
lemire@gmail.com
Abstract
We consider the ubiquitous technique of VByte compression,
which represents each integer as a variable length sequence of
bytes. The low 7 bits of each byte encode a portion of the inte-
ger, and the high bit of each byte is reserved as a continuation
flag. This flag is set to 1 for all bytes except the last, and the
decoding of each integer is complete when a byte with a high
bit of 0 is encountered. VByte decoding can be a performance
bottleneck especially when the unpredictable lengths of the en-
coded integers cause frequent branch mispredictions. Previous
attempts to accelerate VByte decoding using SIMD vector in-
structions have been disappointing, prodding search engines
such as Google to use more complicated but faster-to-decode
formats for performance-critical code. Our decoder (MASKED
VBYTE) is 2 to 4 times faster than a conventional scalar VByte
decoder, making the format once again competitive with regard
to speed.
I. INTRODUCTION
In many applications, sequences of integers are com-
pressed with VByte to reduce memory usage. For ex-
ample, it is part of the search engine Apache Lucene (un-
der the name vInt). It is used by Google in its Protocol
Buffers interchange format (under the name Varint) and
it is part of the default API in the Go programming lan-
guage. It is also used in databases such as IBM DB2
(under the name Variable Byte) [?].
We can describe the format as follows. Given a non-
negative integer in binary format, and starting from the
least significant bits, we write it out using seven bits in
each byte, with the most significant bit of each byte set
to 0 (for the last byte), or to 1 (in the preceding bytes). In
this manner, integers in [0, 27
) are coded using a single
byte, integers in [27
, 214
) use two bytes and so on. See
Table 1 for examples.
The VByte format is applicable to arbitrary integers
including 32-bit and 64-bit integers. However, we focus
on 32-bit integers for simplicity.
Table 1: VByte form for various powers of two. Within each
word, the most significant bits are presented first. In
the VByte form, the most significant bit of each byte is
in bold.
integer binary form (16 bits) VByte form
1 0000000000000001 00000001
2 0000000000000010 00000010
4 0000000000000100 00000100
128 0000000010000000 10000000, 00000001
256 0000000100000000 10000000, 00000010
512 0000001000000000 10000000, 00000100
16384 0100000000000000 10000000, 10000000, 00000001
32768 1000000000000000 10000000, 10000000, 00000010
Differential coding A common application in infor-
mation retrieval is to compress the list of document iden-
tifiers in an inverted index [?]. In such a case, we would
not code directly the identifiers (x1, x2, . . .), but rather
their successive differences (e.g., x1 − 0, x2 − x1, . . .),
sometimes called deltas or gaps. If the document iden-
tifiers are provided in sorted order, then we might ex-
pect the gaps to be small and thus compressible using
VByte. We refer to this approach as differential coding.
There are several possible approaches to differential cod-
ing. For example, if there are no repeated values, we can
subtract one from each difference (x1 − 0, x2 − x1 −
1, x3 − x2 − 1, . . .) or we can subtract blocks of integers
for greater speed (x1, x2, x3, x4, x5 − x1, x6 − x2, x7 −
x3, x8 − x4, . . .). For simplicity, we only consider gaps
defined as successive differences (x2−x1, . . .). In this in-
stance, we need to compute a prefix sum over the gaps to
recover the original values (i.e., xi = (xi−xi−1)+xi−1).
II. EFFICIENT VBYTE DECODING
One of the benefits of the VByte format is that we can
write an efficient decoder using just a few lines of code
in almost any programming language. A typical de-
coder applies Algorithm 1. In this algorithm, the function
readByte provides byte values in [0, 28
) representing
a number x in the VByte format.
Processing each input byte requires only a few inex-
pensive operations (e.g., two additions, one shift, one
mask). However, each byte also involves a branch. On
a recent Intel processor (e.g., one using the Haswell mi-
croarchitecture), a single mispredicted branch can incur
a cost of 15 cycles or more. When all integers are com-
pressed down to one byte, mispredictions are rare and
the performance is high. However, when both one and
two byte values occur in close proximity the branch may
become less predictable and performance may suffer.
For differential coding, we modify this algorithm so
that it decodes the gaps and computes the prefix sum. It
suffices to keep track of the last value decoded and add it
to the decoded gap.
III. SIMD INSTRUCTIONS
Intel processors provide SIMD instructions operating on
128-bit registers (called XMM registers). These regis-
ters can be considered as vectors of two 64-bit integers,
vector2 of four 32-bit integers, vectors of eight 16-bit in-
tegers or vectors of sixteen 8-bit integers.
We review the main SIMD instructions we require in
Table 2. We can roughly judge the computational cost of
an instruction by its latency and reciprocal throughput.
The latency is the minimum number of cycles required
2. 1st
International Symposium on Web AlGorithms • June 2015
Algorithm 1 Conventional VByte decoder. The con-
tinue instruction returns the execution to the main loop.
The readByte function returns the next available input
byte.
1: y ← empty array of 32-bit integers
2: while input bytes are available do
3: b ← readByte()
4: if b ≤ 128 then append b to y and continue
5: c ← b
6: b ← readByte()
7: if b ≤ 128 then append c + b × 27
to y and con-
tinue
8: c ← (b mod 27
) × 27
9: b ← readByte()
10: if b ≤ 128 then append c + b × 214
to y and con-
tinue
11: c ← (b mod 27
) × 214
12: b ← readByte()
13: if b ≤ 128 then append c ← (b mod 27
) × 221
to
y and continue
14: b ← readByte()
15: append c + (b mod 27
) × 228
to y
16: return y
to execute the instruction. The latency is most important
when subsequent operations have to wait for the instruc-
tion to complete. The reciprocal throughput is the inverse
of the maximum number of instructions that can be exe-
cuted per cycle. For example, a reciprocal throughput of
0.5 means that up to two instructions can be executed per
cycle.
We use the movdqu instruction to load or store a reg-
ister. Loading and storing registers has a relatively high
latency (3 cycles). While we can load two registers per
cycle, we can only store one of them to memory. A typ-
ical SIMD instruction is paddd: it adds two vectors of
four 32-bit integers at once.
Sometimes it is necessary to selectively copy the con-
tent from one XMM register to another while possibly
copying and duplicating components to other locations.
We can do so with the pshufd instruction when consid-
ering the registers as vectors of 32-bit integers, or with
the pshufb instruction when registers is considered
vectors of bytes. These instructions take an input reg-
ister v as well as a control mask m and they output a new
vector (vm0
, vm1
, vm2
, vm3
, . . .) with the added conven-
tion that v−1 ≡ 0. Thus, for example, the pshufd in-
struction can copy one particular value to all positions
(using a mask made of 4 identical values). If we wish
to shift by a number of bytes, it can be more efficient to
use a dedicated instruction (psrldq or pslldq) even
though the pshufb instruction could achieve the same
result. Similarly, we can use the pmovsxbd instruction
to more efficiently unpack the first four bytes as four 32-
bit integers.
We can simultaneously shift right by a given number
of bits all of the components of a vector using the instruc-
tions psrlw (16-bit integers), psrld (32-bit integers)
and psrlq (64-bit integers). There are also correspond-
ing left-shift instructions such as psllq. We can also
compute the bitwise OR and bitwise AND between two
128-bit registers using the por and pand instructions.
There is no instruction to shift a vector of 16-bit in-
tegers by different number of bits (e.g., (v1, v2, . . .) →
(v1 1, v2 2, . . .)) but we can get the equivalent
result by mutiplying integers (e.g., with the pmullw in-
struction). The AVX2 instruction set introduced such
flexible shift instructions (e.g., vpsrlvd), and they are
much faster than a multiplication, but they not applicable
to vectors of 16-bit integers. Intel proposed a new in-
struction set (AVX-512) which contains such an instruc-
tion (vpsrlvw) but it is not yet publicly available.
Our contribution depends crucially on the pmovmskb
instruction. Given a vector of sixteen bytes, it out-
puts a 16-bit value made of the most significant bit of
each of the sixteen input bytes: e.g., given the vector
(128, 128, . . . , 128), pmovmskb would output 0xFFFF.
IV. MASKED VBYTE DECODING
The conventional VByte decoders algorithmically pro-
cess one input byte at a time (see Algorithm 1). To multi-
ply the decoding speed, we want to process larger chunks
of input data at once. Thankfully, commodity Intel and
AMD processors have supported Single instruction, mul-
tiple data (SIMD) instructions since the introduction of
the Pentium 4 in 2001. These instructions can process
several words at once, enabling vectorized algorithms.
Stepanov et al. [?] used SIMD instructions to acceler-
ate the decoding of VByte data (which they call varint-
SU). According to their experimental results, SIMD in-
structions lead to a disappointing speed improvement of
less than 25 %, with no gain at all in some instances. To
get higher speeds (e.g., an increase of 3×), they proposed
instead new formats akin to Google’s Group Varint [?].
For simplicity, we do not consider such “Group” alter-
natives further: once we consider different data format, a
wide range of fast SIMD-based compression schemes be-
come available [?]—some of them faster than Stepanov
et al.’s fastest proposal.
Though they did not provide a detailed description,
Stepanov et al.’s approach resembles ours in spirit. Con-
sider the simplified example from Fig. 1. It illustrates the
main steps:
• From the input bytes, we gather the control bits
(1,0,1,0,0,0 in this case) using the pmovmskb in-
struction.
• From the resulting mask, we look up a control mask
in a table and apply the pshufb instruction to move
the bytes. In our example, the first 5 bytes are left
in place (at positions 1, 2, 3, 4, 5) whereas the 5th
byte is moved to position 7. Other output bytes are
set to zero.
• We can then extract on the first 7 bits of the low
bytes (at positions 1, 3, 5, 7) into a new 8-byte reg-
ister. We can also extract the high bytes (positions
3. 1st
International Symposium on Web AlGorithms • June 2015
Table 2: Relevant SIMD instructions on Haswell Intel processors with latencies and reciprocal throughput in CPU cycles .
instruction description latency rec. through-
put
movdqu store or retrieve a 128-bit register 3 1/0.5
paddd add four pairs of 32-bit integers 1 0.5
pshufd shuffle four 32-bit integers 1 1
pshufb shuffle sixteen bytes 1 1
psrldq shift right by a number of bytes 1 0.5
pslldq shift left by a number of bytes 1 0.5
pmovsxbd unpack the first four bytes into four 32-bit ints. 1 0.5
pmovsxwd unpack the first four 16-bit integers into four 32-bit ints. 1 0.5
psrlw shift right eight 16-bit integers 1 1
psrld shift right four 32-bit integers 1 1
psrlq shift right two 64-bit integers 1 1
psllq shift left two 64-bit integers 1 1
por bitwise OR between two 128-bit registers 1 0.33
pand bitwise AND between two 128-bit registers 1 0.33
pmullw multiply eight 16-bit integers 5 1
pmovmskb create a 16-bit mask from the most significant bits 3 1
2, 4, 6, 8) into another 8-byte register. On this sec-
ond register, we apply a right shift by 1 bit on the
four 16-bit values (using psrlw). Finally, we com-
pute the bitwise OR of these two registers, combin-
ing the results from the low and high bits.
A na¨ıve implementation of this idea could be slow. In-
deed, we face several performance challenges:
• The pmovmskb instruction has a relatively high la-
tency (e.g., 3 cycles on the Haswell microarchitec-
ture).
• The pmovmskb instruction processes 16 bytes at
once, generating a 16-bit result. Yet looking up a
16-bit value in a table would require a 65536-value
table. Such a large table is likely to stress the CPU
cache.
Moreover, we do not know ahead of time where
coded integers begin and end: a typical segment of
16 bytes might contain the end of one compressed
integer, a few compressed integer at the beginning
of another compressed integer.
Our proposed algorithm works on 12 bytes inputs and
12-bit masks. In practice, we load the input bytes in a
128-bit register containing 16 bytes (using movdqu), but
only the first 12 bytes are considered. For the time being,
let us assume that the segment begins with a complete
encoded integer. Moreover, assume that the 12-bit mask
has been precomputed.
In what follows, we use the convention that (· · · )k is
a vector of k-bit integers. Because numbers are stored in
binary notation, we have that
(1, 0, 0, 0)8 = (1, 0)16 = (1)32,
that is, all three vectors represent the same binary data.
• If the mask is 00 · · · 00, then the 12 input bytes rep-
resent 12 integers as is. We can unpack the first
4 bytes to 4 32-bit integers (in a 128-bit register)
with the pmovsxbd instruction. This new regis-
ter can then be stored in the output buffer. We can
then shift the input register by 4 bytes using the
psrldq instruction, and apply to pmovsxbd in-
struction again. Repeating a third time, we have de-
coded all 12 integers. We have consumed 12 input
bytes and written 12 integers.
• Otherwise we use the 12-bit mask to look two 8-bit
values in a table 212
-entries-wide. The first 8-bit
value is an integer between 2 and 12 indicating how
many input bytes we consume. Though it is not im-
mediately useful to know how many bytes are con-
sumed, we use this number of consumed bytes when
loading the next input bytes. The second 8-bit value
is an index i taking integer values in [0, 170). From
this index, we load up one of 170 control masks. We
then proceed according to the value of the index i:
– If i < 64, then the next 6 integers each fit
in at most two bytes (they are less than 214
).
There are exactly 26
= 64 cases correspond-
ing to this scenario. That is, the first integer
can fit in one or two bytes, the second inte-
ger in one or two bytes, and so on, generating
64 distinct cases. For each of the 6 integers
xi, we have the low byte containing the least
significant 7 bits of the integer ai, and option-
ally a high byte containing the next 7 bits bi
(xi = ai + bi27
). The call to pshufb will
permute the bytes such that the low bytes oc-
cupy the positions 1, 3, 5, 7, 9, 11 whereas the
high bytes, when available, occupy the posi-
tions 2, 4, 6, 8, 10, 12. When a high byte is
not available, the byte value zero is used in-
stead.
For example, when all 6 values are in [27
, 214
),
4. 1st
International Symposium on Web AlGorithms • June 2015
10000000
00000001
10000010
00000011
00010000
00100000
Compressed data (6 bytes)
1 0 1 0 0 0
Mask (6 bits)
Extract mask
(pmovmskb)
10000000 00000001
10000010 00000011
00010000 0
00100000 0
Permute
(pshufb)
using mask
and look-up
table
Permuted bytes
low bytes high bytes
10000000 0
10000010 00000001
00010000 0
00100000 0
Decoded values from the permuted bytes by
extracting the 4 low and 4 high bytes with
masks, shifting down by 1 bit the high bytes
and ORing.
Integers 128, 386, 16, 32
Figure 1: Simplified illustration of vectorized VByte decoding from 6 bytes to four 16-bit integers (128, 386, 16, 32).
the permuted bytes are
(a11, b1, a21, b2, a31, b3, . . . , a61, b6)8
when presented as a vector of bytes with the
short-hand notation ai1 ≡ ai + 27
.
From these permuted bytes, we generate two
vectors using bitwise ANDs with fixed masks
(using pand). The first one retains only the
least significant 7 bits of the low bytes: as a
vector of 16-bit integers we have
(a11, b1, a21, b2, a31, b3, . . . , a61, b6)8
AND
(127, 0, 127, 0, 127, 0, . . . , 127, 0)8
= (a1, a2, a3, . . . , a6)16.
The second one retains only the high bytes:
(0, b1, 0, b2, 0, b3, . . . , 0, b6)8.
Considering the latter as a vector of 16-bit in-
tegers, we right shift it by 1 bit (using psrlw)
to get the following vector
(b127
, b227
, b327
, . . . , b627
)16.
We can then combine (with a bitwise OR using
por) this last vector with the vector contain-
ing the least significant 7 bits of the low bytes.
We have effectively decoded the 6 integers as
16-bit integers: we get
(a1 + b127
, a2 + b227
,
a3 + b327
, a4 + b427
,
a5 + b527
, a6 + b627
)16.
We can unpack the first four to 32-bit inte-
gers using an instruction such as pmovsxwd,
we can then shift by 8 bytes (using psrldq)
and apply pmovsxwd once more to decode
the last two integers.
– If 64 ≤ i < 145, the next 4 encoded integers
fit in at most 3 bytes. We can check that there
are 81 = 34
such cases. The processing is then
similar to the previous case except that we
have up to three bytes per integer (low, mid-
dle and high). The permuted version will re-
arrange the input bytes so that the first 3 bytes
contain the low, middle and high bytes of the
first integer, with the convention that a zero
byte is written when there is no correspond-
ing input byte. The next byte always contain
a zero. Then we store the data corresponding
to the next integer in the next 3 bytes. A zero
byte is added. And so on.
This time, we create 3 new vectors using bit-
wise ANDs with appropriate masks: one re-
taining only the least significant 7 bits from
the low bytes, another retaining only the least
significant 7 bits from the middle bytes and an-
other retaining only the high bytes. As vectors
of 32-bit integers, the second vector is right
shifted by 1 bit whereas the third vector is right
shifted by 2 bits (using psrld). The 3 regis-
ters are then combined with a bitwise OR and
written to the output buffer.
– Finally, when 145 ≤ i < 170), we decode
the next 2 integers. Each of these integers can
consume from 1 to 5 input bytes. There are
52
= 25 such cases.
For simplicity of exposition, we only explain
how we decode the first of the two integers us-
ing 8-byte buffers. The integer can be written
as x1 = a1 + b127
+ c1214
+ d1221
+ e1228
where a1, b1, c1, d1 ∈ [0, 27
) and e1 ∈ [0, 24
).
Assuming that x1 ≥ 228
, then the first 5 input
bytes will be (a11, b11, c11, d11, e1)8.
Irrespective of the value of the index i, the first
step is to set the most significant bit of each
input byte to 0 with a bitwise AND. Thus, if
x1 ≥ 228
, we get (a1, b1, c1, d1, e1)8.
We then permute the bytes so that we get the
following 8 bytes:
Y = (b1, c1, d1, e1 + a128
)16.
The last byte is occupied by the value a1
which we can isolate for later use by shifting
right the whole vector by seven bytes (using
5. 1st
International Symposium on Web AlGorithms • June 2015
psrldq):
Y = (a1, 0, 0, 0, 0, 0, 0, 0)8.
Using the pmullw instruction, we multiply
the permuted bytes (Y ) by the vector
(27
, 26
, 25
, 24
)16
to get
(b127
, c126
, d125
, e124
+ (a1212
mod 216
))16.
As a byte vector, this last vector is equivalent
to
X = (b127
mod 28
, b1 ÷ 2,
c126
mod 28
, c1 ÷ 22
,
d125
mod 28
, d1 ÷ 23
,
e126
mod 28
, )8
where we used to indicate an irrelevant byte
value. We can left shift this last result by one
byte (using psllq):
X = (0, b127
mod 28
,
b1 ÷ 2, c126
mod 28
,
c1 ÷ 22
, d125
mod 28
,
d1 ÷ 23
, e126
mod 28
)8
We can combine these two results with the
value a1 isolated earlier (Y ):
Y OR X OR X = (a1 + b127
mod 28
, ,
b1 ÷ 2 + c126
mod 28
, ,
c1 ÷ 22
+ d125
mod 28
, ,
d1 ÷ 23
+ e126
mod 28
, )8
where again we use to indicate irrelevant
byte values. We can permute this last vector
to get
(a1 + b127
mod 28
, b1 ÷ 2 + c126
mod 28
,
c1 ÷ 22
+ d125
mod 28
,
d1 ÷ 23
+ e126
mod 28
, . . .)8
= (a1 + b127
+ c1214
+ d1221
+ e1228
, . . .)32
Thus, we have effectively decoded the integer
x1.
The actual routine works with two integers
(x1, x2). The content of the first one is ini-
tially stored in the first 8 bytes of a 16-byte
vector whereas the remaining 8 bytes are used
for the second integer. Both integers are de-
coded simultaneously.
An important motivation is to amortize the latency of
the pmovmskb instruction as much as possible. First,
we repeatedly call the pmovmskb instruction until we
have processed up to 48 bytes to compute a correspond-
ing 48-bit mask. Then we repeatedly call the decoding
procedure as long as 12 input bits remain out of the pro-
cessed 48 bytes. After reach call to the 12-byte decoding
procedure, we left shift the mask by the number of con-
sumed bits. Recall that we look up the number of con-
sumed bytes at the beginning of the decoding procedure
so this number is readily available and its determination
does not cause any delay. When fewer than 12 valid bits
remain in the mask, we process another block of 48 input
bytes with pmovmskb. To accelerate further this pro-
cess, and if there are enough input bytes, we maintain
two 48-bit masks (representing 96 input bytes): in this
manner, a 48-bit mask is already available while a new
one is being computed. When fewer than 48 input bytes
but more than 16 input bytes remain, we call pmovmskb
as needed to ensure that we have at least a 12-bit mask.
When it is no longer possible, we fall back on conven-
tional VByte decoding.
Differential Coding We described the decoding pro-
cedure without accounting for differential coding. It can
be added without any major algorithmic change. We
just keep track of last 32-bit integer decoded. We might
store it in the last entry of a vector of four 32-bit integers
(henceforth p = (p1, p2, p3, p4)).
We compute the prefix sum before to writing the de-
coded integers. There are two cases to consider. We ei-
ther write four 32-bit integers or 2 32-bit integers (e.g.,
when writing 6 decoded integers, we first write 4, then 2
integers). In both cases, we permute first the entries of p
so that p ← (p4, p4, p4, p4) using the pshufd instruc-
tion.
• If we decoded 4-integers stored in the vector c.
We left shift the content of c by one integer (using
pslldq) so that c ← (0, c1, c2, c3). We add c to c
using paddd so that c ← (c1, c1 + c2, c2 + c3, c3 +
c4). We shift the rest by two integers (using again
pslldq) c ← (0, 0, c1, c1 + c2) and add c + c =
(c1, c1 +c2, c1 +c2 +c3, c1 +c2 +c3 +c4). Finally,
we add p to this last result p ← p + c + c = (p4 +
c1, pp+c1+c2, p+c1+c2+c3, p+c1+c2+c3+c4).
We can write p as the decoded output.
• The process is similar though less efficient if we
only have two decoded gaps. We start from a vec-
tor containing two gaps c ← (c1, c2, , ) where we
indicate irrelevant entries with . We can left shift
by one integer c ← (0, c1, c2, ) and add the result
c + c = (c1, c1 + c2, , ). Using the pshufd in-
struction, we can copy the value of the second com-
ponent to the third and fourth components, generat-
ing (c1, c1 + c2, c1 + c2, c1 + c2). We can then add
p to this result and store the result back into p. The
first two integers can be written out as output.
V. EXPERIMENTS
We implemented our software in C and C++. The bench-
mark program ran on a Linux server with an Intel i7-
6. 1st
International Symposium on Web AlGorithms • June 2015
0
500
1000
1500
2000
2500
3000
8 9 10 11 12 13 14 15
decodingspeed(mis)
bits per integer
Masked VByte
VByte
(a) Absolute speed
2
2.5
3
3.5
4
8 9 10 11 12 13 14 15
relativespeed
bits per integer
Masked VByte vs. VByte
(b) Ratio of speeds
Figure 2: Performance comparison for various sets of posting
lists (ClueWeb)
4770 processor running at 3.4 GHz. This Haswell pro-
cessor has 32 kB of L1 cache and 256 kB of L2 cache per
core with 8 MB of L3 cache. The machine has 32 GB
of RAM (DDR3-1600 with double-channel). We dis-
abled Turbo Boost and set the processor to run at its
highest clock speed. We report wall-clock timings. Our
software is freely available under an open-source license
(http://maskedvbyte.org) and was compiled us-
ing the GNU GCC 4.8 compiler with the -O3 flag.
For our experiments, we used a collection of posting
lists extracted from the ClueWeb09 (Category B) data
set. ClueWeb09 includes 50 million web pages. We have
one posting list for each of the 1 million most frequent
words—after excluding stop words and applying lemma-
tization. Documents were sorted lexicographically based
on their URL prior to attributing document identifiers.
The posting lists are grouped based on length: we store
and process lists of lengths 2K
to 2K+1
− 1 together for
all values of K. Coding and decoding times include dif-
ferential coding. Shorter lists are less compressible than
longer lists since their gaps tend to be larger. Our results
are summarized in Fig. 2. For each group of posting lists
we compute the average bits used per integer after com-
pression: this value ranges from 8 to slightly less than
16. All decoders work on the same compressed data.
When decoding long posting lists to RAM, our speed
is limited by RAM throughput. For this reason, we
decode the compressed data sequentially to buffers fit-
ting in L1 cache (4096 integers). For each group and
each decoder, we compute the average decoding speed
in millions of 32-bit integers per second (mis). For
our MASKED VBYTE decoder, the speeds ranges from
2700 mis for the most compressible lists to 650 mis for
the less compressible ones. The speed of the conven-
tionalVByte decoder ranges from 1100 mis to 300 mis.
For all groups of posting lists in our experiments, the
MASKED VBYTE decoder was at least twice as fast as
the conventional VByte decoder. However, for some
groups, the speedup is between 3× and 4×.
If we fully decode all lists instead of decoding to
a buffer that fits in CPU cache, the performance of
MASKED VBYTE can be reduced by about 15 %. For
example, instead of a maximal speed of 2700 mis,
MASKED VBYTE is limited to 2300 mis.
VI. CONCLUSION
To our knowledge, no existing VByte decoder comes
close to the speed of MASKED VBYTE. Given how
the VByte format is a de facto standard, it suggests that
MASKED VBYTE could help optimize a wide range of
existing software without affecting the data formats.
MASKED VBYTE is in production code at Indeed
as part of the open-source analytics platform Imhotep
(http://indeedeng.github.io/imhotep/).
ACKNOWLEDGMENTS
We thank L. Boystov from CMU for preparing and mak-
ing available the posting list collection.
REFERENCES
[1] Bishwaranjan Bhattacharjee, Lipyeow Lim, Timo-
thy Malkemus, George Mihaila, Kenneth Ross, Sher-
man Lau, Cathy McArthur, Zoltan Toth, and Reza
Sherkat. Efficient index compression in DB2 LUW.
Proc. VLDB Endow., 2(2):1462–1473, August 2009.
[2] Jeffrey Dean. Challenges in building large-scale in-
formation retrieval systems: invited talk. WSDM
’09, pages 1–1, New York, NY, USA, 2009. ACM.
[3] Daniel Lemire and Leonid Boytsov. Decoding bil-
lions of integers per second through vectorization.
Softw. Pract. Exper., 45(1), 2015.
[4] Alexander A. Stepanov, Anil R. Gangolli, Daniel E.
Rose, Ryan J. Ernst, and Paramjit S. Oberoi. SIMD-
based decoding of posting lists. CIKM ’11, pages
317–326, New York, NY, USA, 2011. ACM.
[5] Hugh E. Williams and Justin Zobel. Compressing
integers for fast file access. Comput. J., 42(3):193–
201, 1999.