Sudeepta Mishra gave a technical seminar presentation on data compression techniques, covering lossless methods such as dictionary coders, entropy encoding, and run-length encoding. The presentation focused on the LZW compression algorithm, with worked examples of both compression and decompression. LZW is an adaptive dictionary coding technique that builds its dictionary dynamically and does not require transmitting the dictionary.
The document discusses various data compression techniques, including lossless compression methods like Lempel-Ziv (LZ) and Lempel-Ziv-Welch (LZW) algorithms. LZ algorithms build an adaptive dictionary while encoding to replace repeated patterns with codes. LZW improves on LZ78 by using a dictionary indexed by codes. The encoder outputs codes for strings in the input and adds new strings to the dictionary. The decoder recreates the dictionary to decompress the data. LZW achieves good compression and is used widely in formats like PDF.
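Since several of these summaries walk through the same LZW mechanics, a compact illustration may help. The following is a minimal Python sketch, not any presenter's code: it assumes byte-oriented input with an initial dictionary of the 256 single-byte strings, and shows how encoder and decoder grow the same dictionary so that only codes need to be transmitted.

```python
def lzw_encode(data: bytes) -> list[int]:
    # Start with all single-byte strings; codes 0-255.
    table = {bytes([i]): i for i in range(256)}
    w, out = b"", []
    for byte in data:
        wc = w + bytes([byte])
        if wc in table:
            w = wc                      # keep extending the current match
        else:
            out.append(table[w])
            table[wc] = len(table)      # new string gets the next free code
            w = bytes([byte])
    if w:
        out.append(table[w])
    return out

def lzw_decode(codes: list[int]) -> bytes:
    table = {i: bytes([i]) for i in range(256)}
    w = table[codes[0]]
    out = [w]
    for code in codes[1:]:
        if code in table:
            entry = table[code]
        else:                           # code not yet in table: the cScSc case
            entry = w + w[:1]
        out.append(entry)
        table[len(table)] = w + entry[:1]   # mirror the encoder's new entry
        w = entry
    return b"".join(out)

msg = b"TOBEORNOTTOBEORTOBEORNOT"
assert lzw_decode(lzw_encode(msg)) == msg
```

The decoder's special case handles the one code that can arrive before the corresponding dictionary entry has been reconstructed on the receiving side.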
The document describes the LZW data compression algorithm. It begins with an introduction to data compression and the lossless LZW algorithm. It then explains the basic LZ78 algorithm and how LZW compression works using a dictionary-based approach to encode repeated data sequences. The encoding and decoding processes are described through examples. Applications and implementations of LZW are also discussed.
The document discusses different data compression techniques, focusing on lossless dictionary-based compression algorithms like Lempel-Ziv (LZ). It explains how LZ algorithms like LZW work by building an adaptive dictionary during compression and decompression. The LZW algorithm is described through examples. Advantages of LZW include no need to explicitly transfer the dictionary and fast encoding/decoding through table lookups. Problems like running out of dictionary space are also addressed.
This document discusses text compression algorithms LZW and Flate. It describes LZW's dictionary-based encoding approach and provides examples of encoding and decoding a string. Flate compression is explained as combining LZ77 compression, which finds repeated sequences, and Huffman coding, which assigns variable length codes based on frequency. Flate can choose between no compression, LZ77 then Huffman, or LZ77 and custom Huffman trees. The advantages of LZW include lossless compression and not needing the code table during decompression, while its disadvantage is dictionary size limits. Flate provides adaptive compression and lossless compression but has overhead from generating Huffman trees and complex implementation.
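Flate, as described above, is exposed directly in Python's standard zlib module, which implements the same DEFLATE combination of LZ77 matching and Huffman coding; a quick round trip shows the idea:

```python
import zlib

text = b"to be or not to be, that is the question. " * 50
packed = zlib.compress(text, level=9)    # LZ77 matching + Huffman coding
print(len(text), "->", len(packed))      # repeated phrases compress well
assert zlib.decompress(packed) == text   # lossless round trip
```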
Temporal logic and functional reactive programming - Sergei Winitzki
In my day job, most bugs come from imperatively implemented reactive programs. Temporal logic and FRP are declarative approaches that promise to solve my problems. I will briefly review the motivations behind, and the connections between, temporal logic and FRP. I propose a rather "pedestrian" approach to propositional linear-time temporal logic (LTL), showing how to perform calculations in LTL and how to synthesize programs from LTL formulas. I intend to explain why LTL largely failed to solve the synthesis problem, and how FRP tries to cope.
FRP can be formulated as a λ-calculus with types given by the propositional intuitionistic LTL. I will discuss the limitations of this approach, and outline the features of FRP that are required by typical application programming scenarios. My talk will be largely self-contained and should be understandable to anyone familiar with Curry-Howard and functional programming.
This document discusses minimizing deterministic finite automata (DFA) and provides references on the topic. It begins with an introduction to minimizing DFA and then provides several examples of DFAs with different languages over various alphabets. It concludes by listing references for additional information on minimizing DFA and automata theory.
This document contains lecture notes on regular expressions from a compilers course taught by Rebaz Najeeb at Koya University. It discusses topics like specification of tokens using regular expressions, regular expression operations and examples, and regular definitions. Several examples of regular expressions to match certain string patterns are provided, such as strings of even/odd length, strings starting with a specific alphabet, and patterns involving numbers. Homework involves writing a regular expression for valid email addresses.
This document discusses imperative and object-oriented programming languages. It covers basic concepts like state, variables, expressions, assignments, and control flow in imperative languages. It also discusses procedures and functions, including passing parameters, stack frames, and recursion. Finally, it briefly mentions the differences between call by value and call by reference.
The document provides information about assembly language, including definitions, instruction formats, and the process of assembling, linking and executing an assembly language program. It defines assembly language as a language that uses symbols and letters instead of binary to represent instructions and storage locations. It also describes common instruction types like data transfer, arithmetic, logic and shift instructions. Finally, it outlines the steps to create an assembly program, which includes writing source code, assembling it, linking the object files, and executing the final executable.
Cody Roux - Pure Type Systems - Boston Haskell Meetup - Greg Hale
- Pure type systems (PTS) provide a unified framework for understanding type systems and functional programming languages like Haskell.
- A PTS is defined by a set of sorts, axioms relating sorts, and rules for forming quantified types; this allows modeling features like polymorphism, dependent types, type constructors.
- Examples like the simply typed lambda calculus and System F can be modeled as PTSes. Properties like normalization are important but not always predictable from a PTS definition.
- PTSes can capture features of modern languages like predicative polymorphism, separating type- and term-level data, but consistency questions remain open for some extensions.
This document discusses lexical analysis in compilers. It begins with an outline of the topics to be covered, including lexical analysis, regular expressions, finite state automata, and the process of converting regular expressions to deterministic finite automata (DFAs). It then provides more details on each phase of a compiler and the role of lexical analysis. Key aspects of lexical analysis like tokenizing source code and classifying tokens are explained. The document also covers implementation of regular expressions using non-deterministic finite automata (NFAs) and their conversion to equivalent DFAs using techniques like epsilon-closure and transition tables.
The document discusses register allocation techniques used by compilers to optimize code generation. It describes how register allocation works by constructing a register interference graph and using graph coloring algorithms to assign temporaries to a limited number of machine registers. When graph coloring fails to find a solution, spilling of temporaries is used to reduce interferences and allow coloring. Cache optimization is also briefly covered.
The mixed-signal modelling language VHDL-AMS and its semantics (ICNACSA 1999) - Peter Breuer
Slides for the paper "The mixed-signal modelling language VHDL-AMS and its semantics", given at the 8th International Colloquium NACSA, Plovdiv, Bulgaria, August 1999. A preprint of the paper is available at http://www.academia.edu/2493489/Denotational_semantics_for_core_VHDL-AMS .
Declarative Semantics Definition - Term Rewriting - Guido Wachsmuth
This document discusses term rewriting and its applications in compiler construction. It covers term rewriting systems, rewrite rules that transform terms, and rewrite strategies that control rule application. Examples are provided for desugaring code using rewrite rules and constant folding arithmetic expressions using rewrite rules and strategies. Stratego is presented as a domain-specific language for program transformation based on term rewriting.
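To make the rewrite-rule idea concrete, here is a small sketch in Python rather than Stratego, and purely illustrative: terms are nested tuples, a single rule does constant folding, and a bottom-up strategy controls where the rule is applied.

```python
# Terms as nested tuples: ("Add", l, r), ("Mul", l, r), ("Int", n).
def fold(term):
    """One rewrite rule: Add(Int(a), Int(b)) -> Int(a + b), same for Mul."""
    if term[0] in ("Add", "Mul") and term[1][0] == term[2][0] == "Int":
        op = (lambda a, b: a + b) if term[0] == "Add" else (lambda a, b: a * b)
        return ("Int", op(term[1][1], term[2][1]))
    return term

def bottomup(rule, term):
    """Strategy: rewrite the children first, then the node itself."""
    if term[0] == "Int":
        return rule(term)
    return rule((term[0], bottomup(rule, term[1]), bottomup(rule, term[2])))

expr = ("Add", ("Int", 1), ("Mul", ("Int", 2), ("Int", 3)))
print(bottomup(fold, expr))  # ('Int', 7)
```

Separating the rule from the strategy mirrors the Stratego design described above: the same fold rule could be driven top-down or innermost without being rewritten.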
Introduction - Imperative and Object-Oriented Languages - Guido Wachsmuth
This document provides an overview of imperative and object-oriented languages. It discusses the properties of imperative languages like state, statements, control flow, procedures and types. It then covers object-oriented concepts like objects, messages, classes, inheritance and polymorphism. Examples are given in various languages like C, Java bytecode, x86 assembly to illustrate concepts like variables, expressions, functions and object-oriented features. Finally, it provides an outlook on upcoming lectures covering declarative language definition.
The document summarizes the K-SVD algorithm for designing overcomplete dictionaries for sparse representation. It discusses how K-SVD improves on previous methods by more efficiently solving the sparse coding and dictionary update stages. In sparse coding, it uses OMP to approximate l0-norm minimization. In dictionary update, it simultaneously updates the dictionary and coefficients using SVD to minimize representation error. The algorithm iterates between these two stages. It was shown to learn meaningful bases from natural images and effectively solve problems like image inpainting.
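The SVD-based dictionary-update stage described here can be sketched in a few lines of NumPy. This is an illustrative reconstruction of the published update rule, not the authors' code; it assumes the sparse codes Gamma have already been computed (e.g., by OMP), and all variable names are hypothetical.

```python
import numpy as np

def ksvd_dictionary_update(X, D, Gamma):
    """One K-SVD sweep: update each atom (and its coefficients) via a rank-1 SVD.

    X: (d, n) signals, D: (d, K) dictionary, Gamma: (K, n) sparse codes.
    """
    for k in range(D.shape[1]):
        omega = np.nonzero(Gamma[k])[0]          # signals that use atom k
        if omega.size == 0:
            continue
        # Residual with atom k's own contribution put back, restricted to omega.
        E = X[:, omega] - D @ Gamma[:, omega] + np.outer(D[:, k], Gamma[k, omega])
        U, s, Vt = np.linalg.svd(E, full_matrices=False)
        D[:, k] = U[:, 0]                        # best rank-1 fit: the new atom
        Gamma[k, omega] = s[0] * Vt[0]           # and its updated coefficients
    return D, Gamma
```

Updating atom and coefficients together from the same rank-1 factorization is what distinguishes K-SVD from methods that fix the codes while updating the dictionary.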
The document discusses three methods to optimize DFAs: 1) directly building a DFA from a regular expression, 2) minimizing states, and 3) compacting transition tables. It provides details on constructing a direct DFA from a regular expression by building a syntax tree and calculating first, last, and follow positions. It also describes minimizing states by partitioning states into accepting and non-accepting groups and compacting transition tables by representing them as lists of character-state pairs with a default state.
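The state-minimization step mentioned here, partitioning states into accepting and non-accepting groups and then splitting, can be sketched as follows. This is a plain Moore-style refinement in Python for illustration, not code from the document; it assumes a total transition function.

```python
def minimize(states, alphabet, delta, accepting):
    """Moore-style partition refinement: split groups until stable."""
    parts = [p for p in (set(accepting), set(states) - set(accepting)) if p]
    changed = True
    while changed:
        changed = False
        new_parts = []
        for group in parts:
            # Two states belong together iff each symbol sends them
            # to the same current group.
            def signature(q):
                return tuple(next(i for i, p in enumerate(parts)
                                  if delta[q, a] in p) for a in alphabet)
            buckets = {}
            for q in group:
                buckets.setdefault(signature(q), set()).add(q)
            new_parts.extend(buckets.values())
            if len(buckets) > 1:
                changed = True
        parts = new_parts
    return parts

# DFA over {"a","b"} accepting strings ending in "b"; states merge to 2 groups.
delta = {(0, "a"): 1, (0, "b"): 2, (1, "a"): 1, (1, "b"): 3,
         (2, "a"): 1, (2, "b"): 2, (3, "a"): 1, (3, "b"): 3}
print(minimize([0, 1, 2, 3], ["a", "b"], delta, {2, 3}))  # [{2, 3}, {0, 1}]
```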
This document discusses various strategies for register allocation and assignment in compiler design. It notes that assigning values to specific registers simplifies compiler design but can result in inefficient register usage. Global register allocation aims to assign frequently used values to registers for the duration of a single block. Usage counts provide an estimate of how many loads/stores could be saved by assigning a value to a register. Graph coloring is presented as a technique where an interference graph is constructed and coloring aims to assign registers efficiently despite interference between values.
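As a rough illustration of the graph-coloring approach, here is a hedged Python sketch of Chaitin-style simplify-and-color on an interference graph. It is a toy model, not a production allocator; the spill heuristic and all names are illustrative, and the graph is assumed symmetric.

```python
def color_registers(interference, k):
    """Greedy coloring: repeatedly remove a node with degree < k, then color
    in reverse removal order. interference: temp -> set of interfering temps."""
    graph = {t: set(ns) for t, ns in interference.items()}
    work = {t: set(ns) for t, ns in interference.items()}
    stack, spilled = [], []
    while work:
        node = next((t for t in work if len(work[t]) < k), None)
        if node is None:                                  # no easy node left:
            node = max(work, key=lambda t: len(work[t]))  # pick a spill victim
            spilled.append(node)
        else:
            stack.append(node)
        for nbr in work.pop(node):
            if nbr in work:
                work[nbr].discard(node)
    coloring = {}
    for node in reversed(stack):
        used = {coloring[n] for n in graph[node] if n in coloring}
        coloring[node] = min(c for c in range(k) if c not in used)
    return coloring, spilled

g = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b"}, "d": set()}
print(color_registers(g, 2))  # a triangle forces one spill with 2 registers
```

Every node pushed on the stack had fewer than k live neighbours at removal time, which is why a free color is guaranteed when it is popped.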
INC and DEC Instructions
ADD Instruction
SUB Instruction
NEG Instruction
Implementing Arithmetic Expressions
Flags Affected by Addition and Subtraction
Example Program (AddSub3)
This document discusses context free grammars (CFG). It defines the key components of a CFG including terminals, non-terminals, and productions. Terminals are symbols that cannot be replaced, non-terminals must be replaced, and productions are the grammatical rules. A CFG consists of an alphabet of terminals, non-terminals (including a start symbol S), and a finite set of productions that replace non-terminals with strings of terminals and/or non-terminals. Several examples are provided to illustrate how CFGs can define different context free languages.
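As a quick illustration of these components, a CFG can be represented in Python as a mapping from non-terminals to productions; deriving a string replaces non-terminals until only terminals remain. The grammar below is a made-up example generating a^n b^n:

```python
import random

# Tiny CFG: keys are non-terminals, values are lists of productions.
grammar = {
    "S": [["a", "S", "b"], []],   # S -> a S b | epsilon: the language a^n b^n
}

def derive(symbol="S"):
    """Expand non-terminals until only terminals remain (may recurse deeply)."""
    if symbol not in grammar:
        return symbol                      # terminal: cannot be replaced
    production = random.choice(grammar[symbol])
    return "".join(derive(s) for s in production)

print(derive())  # e.g. "aabb" -- always a^n b^n for some n >= 0
```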
Optimizing Set-Similarity Join and Search with Different Prefix Schemes - HPCC Systems
As part of the 2018 HPCC Systems Summit Community Day event:
Up first, Zhe Yu of NC State University briefly discusses his poster, How to Be Rich: A Study of Monsters and Mice of American Industry.
Following him, Fabian Fier presents his breakout session in the Documentation & Training Track.
Finding duplicate textual content is crucial for many applications, especially plagiarism detection. When dealing with millions of documents, finding duplicate content becomes very time-consuming, so the task needs scalable and efficient data structures and algorithms that solve it in seconds rather than hours. In my talk, I present an optimization of a common filter-and-verification set-similarity join and search approach. Filter-and-verification means that we only consider pairs of objects which share a common word or token in a prefix. Such pairs are potentially similar and are verified in a subsequent step. The candidate set is usually orders of magnitude smaller than the cross product over an input set. We optimized this approach by considering overlaps larger than 1, which reduces the candidate set further and makes the verification faster. On the other hand, this requires larger prefixes, which use more memory. Our experiments using HPCC Systems show that we can usually optimize the runtime by choosing an overlap different from the standard overlap of 1.
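To make the filter step concrete, here is a hedged Python sketch of prefix filtering with a configurable overlap c, under the usual convention that each record's tokens are sorted in one global (rarest-first) order. It illustrates the general idea only, not the HPCC Systems implementation; the Jaccard-style prefix length n - ceil(t*n) + c is assumed.

```python
import math
from collections import defaultdict
from itertools import combinations

def candidates(records, t=0.8, c=1):
    """Prefix filter: keep pairs sharing >= c tokens among their prefixes."""
    index = defaultdict(set)            # token -> ids whose prefix contains it
    for rid, tokens in records.items():
        plen = len(tokens) - math.ceil(t * len(tokens)) + c
        for tok in tokens[:plen]:
            index[tok].add(rid)
    pair_overlap = defaultdict(int)
    for ids in index.values():
        for a, b in combinations(sorted(ids), 2):
            pair_overlap[(a, b)] += 1
    # Only these survivors go into the expensive verification step.
    return {pair for pair, n in pair_overlap.items() if n >= c}

recs = {1: ["x", "a", "b", "c"], 2: ["x", "a", "b", "d"], 3: ["y", "z", "q", "r"]}
print(candidates(recs, t=0.5, c=1))  # {(1, 2)} survives the filter
```

Raising c shrinks the candidate set at the cost of longer prefixes, exactly the trade-off the talk evaluates.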
Fabian Fier is a PhD student in the database research group of Johann-Christoph Freytag. He holds a diploma in computer science from Humboldt-Universität. His research interest is similarity search on web-scale data. He uses techniques from textual similarity joins on Big Data and adapts them to similarity search.
- The lexical analyzer reads the source program character by character to produce tokens. It returns tokens to the parser one by one as requested.
- A token represents a set of strings defined by a pattern and has a type and attribute to uniquely identify a lexeme. Regular expressions are used to specify the patterns for tokens (see the sketch after this list).
- A finite automaton can be used as a lexical analyzer to recognize tokens. Non-deterministic finite automata (NFA) and deterministic finite automata (DFA) are commonly used, with DFAs being more efficient to implement. Regular expressions for tokens are first converted to an NFA and then to a DFA.
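A minimal illustration of token specification with regular expressions, using Python's re module as a stand-in for the generated automaton (named groups give each pattern a token type; the specification and names are made up for this example):

```python
import re

# Token specification: each pattern names a token type, tried in order.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("IDENT",  r"[A-Za-z_]\w*"),
    ("OP",     r"[+\-*/=]"),
    ("SKIP",   r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def tokenize(source):
    # No error handling: characters matching no pattern are silently skipped.
    for m in MASTER.finditer(source):
        if m.lastgroup != "SKIP":
            yield (m.lastgroup, m.group())   # (token type, lexeme)

print(list(tokenize("rate = rate + 60")))
# [('IDENT', 'rate'), ('OP', '='), ('IDENT', 'rate'), ('OP', '+'), ('NUMBER', '60')]
```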
St Petersburg R user group meetup 2, Parallel R - Andrew Bzikadze
This document provides an overview of parallel computing techniques in R using various packages like snow, multicore, and parallel. It begins with motivation for parallelizing R given its limitations of being single-threaded and memory-bound. It then covers the snow package which enables explicit parallelism across computer clusters. The multicore package provides implicit parallelism using forking, but is deprecated. The parallel package acts as a wrapper for snow and multicore. It also discusses load balancing, random number generation, and provides examples of using snow and multicore for parallel k-means clustering and lapply.
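The parallel-map pattern these R packages provide (snow's parLapply, multicore's mclapply) looks like this in Python's standard multiprocessing module, shown here only as an analogue since the slides themselves use R:

```python
from multiprocessing import Pool

def slow_square(x):
    return x * x  # placeholder for per-element work, e.g. one k-means run

if __name__ == "__main__":                       # guard required on some OSes
    with Pool(processes=4) as pool:              # like snow's makeCluster(4)
        print(pool.map(slow_square, range(10)))  # like parLapply / mclapply
```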
Design and Implementation of Area Efficiency AES Algorithm with FPGA and ASIC - paperpublications3
Abstract: A public-domain encryption standard is subject to continuous, vigilant, expert cryptanalysis. AES is a symmetric encryption algorithm processing data in blocks of 128 bits. Under the influence of a key, a 128-bit block is encrypted by transforming it in a unique way into a new block of the same size. The AES Rijndael algorithm is implemented on an FPGA in Verilog and synthesized using Xilinx tools; 128-bit plain text is encrypted with the Rijndael algorithm under a key. This encryption method is versatile and used for military applications. The same key is used for decryption to recover the original 128-bit plain text. For high-speed applications, a non-LUT-based implementation of the AES S-box and inverse S-box is preferred. The physical design of the AES-128 core was developed using Cadence SoC Encounter, and its area, power, and timing were evaluated. The core consumes 10.11 mW of power for a core area of 330100.742 μm².
Keywords: Encryption, Decryption, Rijndael algorithm, FPGA implementation, Physical Design.
Certified bit coded regular expression parsing - rodrigogribeiro
Certified Bit-Coded Regular Expression Parsing
The document describes a certified algorithm for parsing regular expressions using bit-coded parse trees. It defines semantics and parse trees for regular expressions, shows how to relate parse trees to bit-codes, and proves properties about the relation between regular expressions and their bit-coded representations. The algorithm uses derivatives to implement prefix and substring matching and is included in a regular expression search tool developed in Agda. Experimental results are presented along with plans for future work.
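The derivative-based matching the paper builds on can be sketched briefly. The following Python toy follows Brzozowski's definitions: nullable tests whether a regex accepts the empty string, and deriv(r, c) is the language of suffixes after consuming c. It is an illustration only, not the certified Agda development:

```python
# Regexes as tuples: ("nul",) empty set, ("eps",) empty string, ("chr", c),
# ("cat", r, s), ("alt", r, s), ("star", r).
def nullable(r):
    tag = r[0]
    return (tag == "eps" or tag == "star"
            or (tag == "cat" and nullable(r[1]) and nullable(r[2]))
            or (tag == "alt" and (nullable(r[1]) or nullable(r[2]))))

def deriv(r, c):
    """Brzozowski derivative: the language of suffixes after consuming c."""
    tag = r[0]
    if tag in ("nul", "eps"):
        return ("nul",)
    if tag == "chr":
        return ("eps",) if r[1] == c else ("nul",)
    if tag == "cat":
        left = ("cat", deriv(r[1], c), r[2])
        return ("alt", left, deriv(r[2], c)) if nullable(r[1]) else left
    if tag == "alt":
        return ("alt", deriv(r[1], c), deriv(r[2], c))
    return ("cat", deriv(r[1], c), r)            # star: d(r*) = d(r) r*

def matches(r, s):
    for c in s:
        r = deriv(r, c)
    return nullable(r)

ab_star = ("star", ("cat", ("chr", "a"), ("chr", "b")))   # (ab)*
print(matches(ab_star, "abab"), matches(ab_star, "aba"))  # True False
```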
This document discusses lexical analysis using finite automata. It begins by defining regular expressions, finite automata, and their components. It then covers non-deterministic finite automata (NFAs) and deterministic finite automata (DFAs), and how NFAs can recognize the same regular languages as DFAs. The document outlines the process of converting a regular expression to an NFA using Thompson's construction, then converting the NFA to a DFA using subset construction. It also discusses minimizing DFAs using Hopcroft's algorithm. Examples are provided to illustrate each concept.
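The subset construction mentioned here can be sketched compactly: each DFA state is the ε-closure of a set of NFA states. A hedged Python illustration (the NFA encoding is ad hoc, with None standing for ε):

```python
from collections import deque

def subset_construction(nfa, start, accept):
    """nfa maps (state, symbol) -> set of states; symbol None means epsilon."""
    def eclose(states):
        seen, stack = set(states), list(states)
        while stack:
            q = stack.pop()
            for r in nfa.get((q, None), ()):
                if r not in seen:
                    seen.add(r)
                    stack.append(r)
        return frozenset(seen)

    symbols = {a for (_, a) in nfa if a is not None}
    start_set = eclose({start})
    dfa, accepting = {}, set()
    seen, todo = {start_set}, deque([start_set])
    while todo:
        S = todo.popleft()
        if accept & S:
            accepting.add(S)
        for a in symbols:
            move = set()
            for q in S:                      # all NFA moves from S on a
                move |= nfa.get((q, a), set())
            if move:
                T = eclose(move)
                dfa[(S, a)] = T
                if T not in seen:
                    seen.add(T)
                    todo.append(T)
    return dfa, start_set, accepting

# NFA for the regex "ab", built Thompson-style with one epsilon move.
nfa = {(0, "a"): {1}, (1, None): {2}, (2, "b"): {3}}
dfa, q0, acc = subset_construction(nfa, 0, {3})
print(len(dfa), len(acc))  # 2 transitions, 1 accepting DFA state
```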
This document discusses term rewriting and provides examples of how rewrite rules can be used to transform terms. Key points include:
- Rewrite rules define pattern matching and substitution to transform terms from a left-hand side to a right-hand side.
- Examples show desugaring language constructs like if-then statements, constant folding arithmetic expressions, and mapping/zipping lists with strategies as parameters to rules.
- Terms can represent programming language syntax and semantics domains. Signatures define the structure of terms.
- Rewriting systems provide a declarative way to define program transformations and semantic definitions through rewrite rules and strategies.
CS6660 Compiler Design May/June 2017 answer key - appasami
This document contains an exam for a Compiler Design course. It includes 20 short answer questions in Part A and 5 long answer questions in Part B. The long answer questions cover topics like the phases of a compiler, lexical analysis, parser construction, type checking, code optimization, and code generation. Students are instructed to answer all questions and provide detailed explanations for the long answer questions.
This document summarizes Liwei Ren's presentation on differential compression. It begins with Ren's background and introduces differential compression. Ren then presents a mathematical model describing differences between files using edit operations. The document categorizes differential compression based on whether references and targets are in the same/different locations over the same/different times. Finally, it discusses three advanced topics in more depth: comparing local, remote, and in-place differential compression schemes; applying differential compression to executable files; and performing in-place file merging with local differential compression.
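As a toy version of the edit-operation model, a delta can be expressed as COPY ranges into the reference plus literal INSERTs. Python's standard difflib supplies the matching; this sketch illustrates the general idea only, not Ren's scheme:

```python
import difflib

def make_delta(ref, tgt):
    """Encode tgt as COPY ranges from ref plus literal INSERT data."""
    ops = []
    for tag, i1, i2, j1, j2 in difflib.SequenceMatcher(
            a=ref, b=tgt, autojunk=False).get_opcodes():
        if tag == "equal":
            ops.append(("COPY", i1, i2))        # reuse ref[i1:i2]
        elif tag in ("replace", "insert"):
            ops.append(("INSERT", tgt[j1:j2]))  # data present only in target
        # "delete" spans exist only in ref; the target simply skips them
    return ops

def apply_delta(ref, ops):
    return "".join(ref[op[1]:op[2]] if op[0] == "COPY" else op[1]
                   for op in ops)

ref = "the quick brown fox jumps over the lazy dog"
tgt = "the quick red fox jumps high over the lazy dog"
delta = make_delta(ref, tgt)
assert apply_delta(ref, delta) == tgt
print(delta)
```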
The document discusses scalar quantization and the Lloyd-Max algorithm. It provides examples of using the Lloyd-Max algorithm to design scalar quantizers for Gaussian and Laplacian distributed signals. The algorithm works by iteratively calculating decision thresholds and representative levels to minimize mean squared error. At high rates, the distortion-rate function of a Lloyd-Max quantizer is approximated. The document also discusses entropy-constrained scalar quantization and an iterative algorithm to design those quantizers.
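The Lloyd-Max iteration described here alternates two steps: thresholds are set to the midpoints between neighbouring representative levels, and each representative moves to the centroid of its cell. A hedged NumPy sketch on a Gaussian source (illustrative, not taken from the slides):

```python
import numpy as np

def lloyd_max(samples, levels, iters=100):
    """Iterate decision thresholds and representatives to minimize MSE."""
    reps = np.linspace(samples.min(), samples.max(), levels)
    for _ in range(iters):
        thresh = (reps[:-1] + reps[1:]) / 2          # midpoints between reps
        bins = np.digitize(samples, thresh)          # assign samples to cells
        for k in range(levels):                      # centroid of each cell
            cell = samples[bins == k]
            if cell.size:
                reps[k] = cell.mean()
    return reps, thresh

rng = np.random.default_rng(0)
x = rng.normal(size=100_000)                         # unit-variance Gaussian
reps, thresh = lloyd_max(x, levels=4)
print(np.round(reps, 3))  # close to the classic 4-level Gaussian quantizer
```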
This document provides an introduction to data compression. It defines data compression as converting an input data stream into a smaller output stream. Data compression is popular because it allows for more data storage and faster data transfers. The document then discusses key concepts in data compression including lossy vs. lossless compression, adaptive vs. non-adaptive methods, compression performance metrics, and probability models. It also introduces several standard corpora used to test compression algorithms.
The document discusses efficient codebook design for image compression using vector quantization. It introduces data compression techniques, including lossless compression methods like dictionary coders and entropy coding, as well as lossy compression methods like scalar and vector quantization. Vector quantization maps vectors to codewords in a codebook to compress data. The LBG algorithm is described for generating an optimal codebook by iteratively clustering vectors and updating codebook centroids.
The document discusses audio compression techniques. It begins with an introduction to pulse code modulation (PCM) and then describes μ-law and A-law compression standards which compress audio using companding algorithms. It also covers differential PCM and adaptive differential PCM (ADPCM) techniques. The document then discusses the MPEG audio compression standard, including its encoder architecture, three layer standards (Layers I, II, III), and applications. It concludes with a comparison of various MPEG audio compression standards and references.
This document provides an overview of image compression. It discusses what image compression is, why it is needed, common terminology used, entropy, compression system models, and algorithms for image compression including lossless and lossy techniques. Lossless algorithms compress data without any loss of information while lossy algorithms reduce file size by losing some information and quality. Common lossless techniques mentioned are run length encoding and Huffman coding while lossy methods aim to form a close perceptual approximation of the original image.
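The entropy mentioned here sets the lower bound for any lossless code. A first-order estimate is a one-liner; for example, a byte stream dominated by one symbol carries far less than 8 bits of information per symbol:

```python
import math
from collections import Counter

def entropy_bits_per_symbol(data: bytes) -> float:
    """First-order Shannon entropy: a lower bound for lossless coding."""
    counts = Counter(data)
    n = len(data)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

print(entropy_bits_per_symbol(b"aaaaaaab"))  # ~0.54 bits/symbol, far below 8
```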
This document discusses different compression techniques including lossless and lossy compression. Lossless compression recovers the exact original data after compression and is used for databases and documents. Lossy compression results in some loss of accuracy but allows for greater compression and is used for images and audio. Common lossless compression algorithms discussed include run-length encoding, Huffman coding, and arithmetic coding. Lossy compression is used in applications like digital cameras to increase storage capacity with minimal quality degradation.
In computer science and information theory, data compression, source coding,[1] or bit-rate reduction involves encoding information using fewer bits than the original representation.[2] Compression can be either lossy or lossless. Lossless compression reduces bits by identifying and eliminating statistical redundancy. No information is lost in lossless compression.
This document discusses various topics related to data compression including compression techniques, audio compression, video compression, and standards like MPEG and JPEG. It covers lossless versus lossy compression, explaining that lossy compression can achieve much higher levels of compression but results in some loss of quality, while lossless compression maintains the original quality. The advantages of data compression include reducing file sizes, saving storage space and bandwidth.
This document discusses various methods of data compression. It begins by defining compression as reducing the size of data while retaining its meaning. There are two main types of compression: lossless and lossy. Lossless compression allows for perfect reconstruction of the original data by removing redundant data. Common lossless methods include run-length encoding and Huffman coding. Lossy compression is used for images and video, and results in some loss of information. Popular lossy schemes are JPEG, MPEG, and MP3. The document then proceeds to describe several specific compression algorithms and their encoding and decoding processes.
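Run-length encoding, the simplest of the lossless methods listed, fits in a few lines of Python; the classic all-runs example compresses 28 symbols into 4 (symbol, count) pairs:

```python
from itertools import groupby

def rle_encode(s: str):
    """Run-length encoding: collapse each run into a (symbol, count) pair."""
    return [(ch, len(list(run))) for ch, run in groupby(s)]

def rle_decode(pairs):
    return "".join(ch * n for ch, n in pairs)

packed = rle_encode("WWWWWWWWWWWWBWWWWWWWWWWWWBBB")
print(packed)                       # [('W', 12), ('B', 1), ('W', 12), ('B', 3)]
assert rle_decode(packed) == "WWWWWWWWWWWWBWWWWWWWWWWWWBBB"
```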
This document provides an overview of the ECE 551 Digital Design and Synthesis course for Fall 2009 at UW-Madison. It introduces the instructor, Eric Hoffman, and TA Vinod Nalamalapu. It outlines course goals, materials, tools, evaluation criteria and schedule. Key topics covered include Verilog, simulation, synthesis, FPGAs, standard cells and digital design flows. Students are expected to have prior knowledge of digital logic concepts from ECE 352.
Introduction to SeqAn, an Open-source C++ Template Library - Can Ozdoruk
SeqAn (www.seqan.de) is an open-source C++ template library (BSD license) that implements many efficient and generic data structures and algorithms for Next-Generation Sequencing (NGS) analysis. It contains gapped k-mer indices, enhanced suffix arrays (ESA), or an FM-index, as well as algorithms for fast and accurate alignment or read mapping. Based on those data types and fast I/O routines, users can easily develop tools that are extremely efficient and easy to maintain. Besides multi-core, the research team at Freie Universität Berlin has started generic support for accelerators such as NVIDIA GPUs. Go through the slides to learn more. For your own BI development you can try GPUs for free here: www.Nvidia.com/GPUTestDrive
The document discusses the VLSI lab and its goals of designing and simulating CMOS inverter circuits using CAD tools. It describes the necessary hardware, software, and foundry resources needed. The design steps are outlined as schematic creation, layout design, DRC checks, parasitic extraction, and post-layout simulation. A list of experiments is provided focusing on logic gates, flip flops, multiplexers, and sequential circuits. The document also discusses the Microwind tool for circuit layout and simulation and provides tutorials on MOS devices and design rules for the layout process.
The document discusses embedded systems. It defines embedded systems as computing systems that are integrated into larger devices and dedicated to a specific task. Examples include systems found in appliances, vehicles, medical equipment, and many other devices. The document outlines key characteristics of embedded systems like size, cost, power and performance constraints. It also covers embedded system applications, development cycle, challenges, example projects and differences between Arduino and mbed platforms. The future of embedded systems is predicted to include more connectivity through technologies like IoT.
This document provides an overview of logic synthesis and describes the tools used for logic synthesis. It discusses the different design objects like design, cell, port, pin, net, etc. It also provides examples of using the "find" command in Design Compiler to search for different design objects. The document includes an introduction to logic synthesis, the logic synthesis flow and optimization steps, as well as the HDL compiler and Design Compiler tools.
The technical seminar presentation discusses applications of artificial neural networks (ANNs) in microwave engineering. Specifically, it covers how ANNs can be used to model smart antenna systems and the resonant frequency of rectangular patch antennas. It also discusses using ANNs for the demand node concept in radio network analysis and optimization. The presentation provides details on ANN modeling approaches, network parameters, initialization and selection algorithms, adaptation, and optimization techniques.
The document outlines the FPGA design flow process which includes:
1) Design entry using HDL or schematic entry,
2) Synthesis to create a netlist,
3) Implementation including translating, mapping, placing and routing the design on the FPGA, and
4) Configuration and programming the FPGA with the bitstream.
DSL Construction with Ruby - ThoughtWorks Masterclass Series 2009 - Harshal Hayatnagarkar
The Ruby language is an attractive choice for constructing internal domain-specific languages. Living true to Bjarne Stroustrup's quote that "library design is language design, and language design is library design", a good design in Ruby can be turned into a good DSL without much effort.
Silent sound technology allows for communication without vocalizing words by analyzing electrical signals from speech muscles or images of the mouth. It has two methods - electromyography detects electric pulses from speech muscles and image processing analyzes mouth movements. Sensors are attached to the face to capture these signals, which are then converted to speech patterns through a vocoder and compared to a database to determine the intended words. While this technology could help people who cannot speak or allow for private calls, it currently requires many sensors attached to the face and has difficulties with tonal languages.
This is the presentation for my library instruction session for Dr. Thomas Joyce's IMAG 4850 Senior Design class at the College of Engineering and Applied Sciences, at Western Michigan University.
This document describes a minor project report submitted by Deepa Bhatia to Chhattisgarh Swami Vivekanand Technical University for the degree of Bachelor of Engineering in Electronics and Telecommunication Engineering. The project involves the development of a development board containing an 8051 microcontroller and various peripherals. The report covers the basics of soldering, the design of a power supply circuit and an LDR light-sensing circuit, the use of DipTrace software to design the PCB layout, and the components used in the final development board. The goal of the project was not only to design the development board but also to learn skills in PCB design, circuit assembly, and interfacing modules for use in future projects.
M.Tech Thesis on Simulation and Hardware Implementation of NLMS algorithm on ... - Raj Kumar Thenua
This document is a dissertation submitted by Raj Kumar Thenua for the degree of Master of Technology in electronics and communication engineering from Sobhasaria Engineering College in Sikar, India. The dissertation investigates simulation and hardware implementation of the NLMS adaptive filtering algorithm on a TMS320C6713 digital signal processor. It includes an introduction, literature review, overview of adaptive filtering algorithms like LMS, NLMS and RLS, MATLAB simulation of these algorithms, Simulink model design for hardware implementation, real-time implementation on the TMS320C6713 processor and results and discussion.
The document summarizes a PhD research presentation on translating natural language specifications to the Object Constraint Language (OCL). The research aims to make OCL more usable by allowing constraints to be written in English instead of OCL syntax. The researcher proposes a framework that maps English specifications to the Semantic Business Vocabulary and Rules (SBVR) and then generates valid OCL code. No prior work has directly translated natural language to OCL. The research could make OCL more adoptable and useful for modeling systems.
This document outlines a research project in the Department of Computer Science & Engineering. It discusses topics such as an introduction, literature review, research gaps, objectives, a proposed model, experimental setup, results discussion, conclusions, future work, and references. The project is guided by Mr. Ritik Dwivedi and focuses on ABCD(XXXXXXXXX) as its main topic.
CSEE&T 2017 SWEBOK Evolution Panel - View from ISO/IEC/JTC1/SC7/WG20 and SEMAT - Hironori Washizaki
Hironori Washizaki, "CSEE&T 2017 SWEBOK Evolution Panel - View from ISO/IEC/JTC1/SC7/WG20 and SEMAT", 30th IEEE Conference on Software Engineering Education and Training (CSEE&T), Savannah, Georgia, November 7-9, 2017
This course is for someone with no previous experience with VHDL to learn the VHDL language and write code targeting hardware.
The course focuses on the syntax of VHDL and basic circuit designs.
For all course videos and material visit YouTube channel
www.youtube.com/channel/UCcecv3gqLQCRT8MS3_aRn9Q
This document outlines a course on Computer Aided Design (CAD). The course will provide 30 hours of instruction over 15 weeks on using CAD software. Students will learn how to produce drawings for graphics, electrical components, circuit schematics, and wiring diagrams. Upon completing the course, students should be able to describe CAD concepts, use drawing commands proficiently in the software, and produce precise technical drawings faster. Examples of 2D and 3D drawings from mechanical, civil, and electrical engineering are shown. The course will use AutoCAD 2004 software and cover topics like file formats, system requirements, and benefits of CAD over manual drawing. Students will be assessed through practical work, chapter problems, and tests.
A quick overview of the cryptographic algorithms used by TrustNote, including its hash function, signature algorithm, and PoW algorithm.
For detailed technical information, please visit https://github.com/trustnote/document/blob/master/TrustNote-TR-2018-01.md
This document discusses live coding, which involves writing computer code in real time to compose music or graphics. It introduces TOPLAP, an organization promoting live coding, and various live coding environments like SuperCollider and Fluxus. Fluxus is an open source live coding environment that uses the Scheme programming language. It allows for 3D graphics, rapid prototyping, and cross-platform use. The document demonstrates Fluxus with a simple example and discusses Scheme's syntax and features like recursion and list manipulation that make it suitable for live coding.
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf - Paige Cruz
Monitoring and observability aren't traditionally found in software curriculums, and many of us cobble this knowledge together from whatever vendor or ecosystem we were first introduced to and whatever is part of our current company's observability stack.
While the dev and ops silo continues to crumble, many organizations still relegate monitoring and observability to ops, infra, and SRE teams. This is a mistake: achieving a highly observable system requires collaboration up and down the stack.
I, a former op, would like to extend an invitation to all application developers to join the observability party, and I will share the foundational concepts to build on.
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor... - SOFTTECHHUB
The choice of an operating system plays a pivotal role in shaping our computing experience. For decades, Microsoft's Windows has dominated the market, offering a familiar and widely adopted platform for personal and professional use. However, as technological advancements continue to push the boundaries of innovation, alternative operating systems have emerged, challenging the status quo and offering users a fresh perspective on computing.
One such alternative that has garnered significant attention and acclaim is Nitrux Linux 3.5.0, a sleek, powerful, and user-friendly Linux distribution that promises to redefine the way we interact with our devices. With its focus on performance, security, and customization, Nitrux Linux presents a compelling case for those seeking to break free from the constraints of proprietary software and embrace the freedom and flexibility of open-source computing.
HCL Notes and Domino License Cost Reduction in the World of DLAU - panagenda
Webinar Recording: https://www.panagenda.com/webinars/hcl-notes-and-domino-license-cost-reduction-in-the-world-of-dlau/
The introduction of DLAU and the CCB & CCX licensing model caused quite a stir in the HCL community. As a Notes and Domino customer, you may have faced challenges with unexpected user counts and license costs. You probably have questions on how this new licensing approach works and how to benefit from it. Most importantly, you likely have budget constraints and want to save money where possible. Don’t worry, we can help with all of this!
We’ll show you how to fix common misconfigurations that cause higher-than-expected user counts, and how to identify accounts which you can deactivate to save money. There are also frequent patterns that can cause unnecessary cost, like using a person document instead of a mail-in for shared mailboxes. We’ll provide examples and solutions for those as well. And naturally we’ll explain the new licensing model.
Join HCL Ambassador Marc Thomas in this webinar with a special guest appearance from Franz Walder. It will give you the tools and know-how to stay on top of what is going on with Domino licensing. You will be able to lower your cost through an optimized configuration and keep it low going forward.
These topics will be covered:
- Reducing license cost by finding and fixing misconfigurations and superfluous accounts
- How do CCB and CCX licenses really work?
- Understanding the DLAU tool and how to best utilize it
- Tips for common problem areas, like team mailboxes, functional/test users, etc.
- Practical examples and best practices to implement right away
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf - Malak Abu Hammad
Discover how MongoDB Atlas and vector search technology can revolutionize your application's search capabilities. This comprehensive presentation covers:
* What is Vector Search?
* Importance and benefits of vector search
* Practical use cases across various industries
* Step-by-step implementation guide
* Live demos with code snippets
* Enhancing LLM capabilities with vector search
* Best practices and optimization strategies
Perfect for developers, AI enthusiasts, and tech leaders. Learn how to leverage MongoDB Atlas to deliver highly relevant, context-aware search results, transforming your data retrieval process. Stay ahead in tech innovation and maximize the potential of your applications.
Full-RAG: A modern architecture for hyper-personalization - Zilliz
Mike Del Balso, CEO & Co-Founder at Tecton, presents "Full RAG," a novel approach to AI recommendation systems, aiming to push beyond the limitations of traditional models through a deep integration of contextual insights and real-time data, leveraging the Retrieval-Augmented Generation architecture. This talk will outline Full RAG's potential to significantly enhance personalization, address engineering challenges such as data management and model training, and introduce data enrichment with reranking as a key solution. Attendees will gain crucial insights into the importance of hyperpersonalization in AI, the capabilities of Full RAG for advanced personalization, and strategies for managing complex data integrations for deploying cutting-edge AI solutions.
AI 101: An Introduction to the Basics and Impact of Artificial IntelligenceIndexBug
Imagine a world where machines not only perform tasks but also learn, adapt, and make decisions. This is the promise of Artificial Intelligence (AI), a technology that's not just enhancing our lives but revolutionizing entire industries.
Dr. Sean Tan, Head of Data Science, Changi Airport Group
Discover how Changi Airport Group (CAG) leverages graph technologies and generative AI to revolutionize their search capabilities. This session delves into the unique search needs of CAG’s diverse passengers and customers, showcasing how graph data structures enhance the accuracy and relevance of AI-generated search results, mitigating the risk of “hallucinations” and improving the overall customer journey.
Driving Business Innovation: Latest Generative AI Advancements & Success StorySafe Software
Are you ready to revolutionize how you handle data? Join us for a webinar where we’ll bring you up to speed with the latest advancements in Generative AI technology and discover how leveraging FME with tools from giants like Google Gemini, Amazon, and Microsoft OpenAI can supercharge your workflow efficiency.
During the hour, we’ll take you through:
Guest Speaker Segment with Hannah Barrington: Dive into the world of dynamic real estate marketing with Hannah, the Marketing Manager at Workspace Group. Hear firsthand how their team generates engaging descriptions for thousands of office units by integrating diverse data sources—from PDF floorplans to web pages—using FME transformers, like OpenAIVisionConnector and AnthropicVisionConnector. This use case will show you how GenAI can streamline content creation for marketing across the board.
Ollama Use Case: Learn how Scenario Specialist Dmitri Bagh has utilized Ollama within FME to input data, create custom models, and enhance security protocols. This segment will include demos to illustrate the full capabilities of FME in AI-driven processes.
Custom AI Models: Discover how to leverage FME to build personalized AI models using your data. Whether it’s populating a model with local data for added security or integrating public AI tools, find out how FME facilitates a versatile and secure approach to AI.
We’ll wrap up with a live Q&A session where you can engage with our experts on your specific use cases, and learn more about optimizing your data workflows with AI.
This webinar is ideal for professionals seeking to harness the power of AI within their data management systems while ensuring high levels of customization and security. Whether you're a novice or an expert, gain actionable insights and strategies to elevate your data processes. Join us to see how FME and AI can revolutionize how you work with data!
Sudheer Mechineni, Head of Application Frameworks, Standard Chartered Bank
Discover how Standard Chartered Bank harnessed the power of Neo4j to transform complex data access challenges into a dynamic, scalable graph database solution. This keynote will cover their journey from initial adoption to deploying a fully automated, enterprise-grade causal cluster, highlighting key strategies for modelling organisational changes and ensuring robust disaster recovery. Learn how these innovations have not only enhanced Standard Chartered Bank’s data infrastructure but also positioned them as pioneers in the banking sector’s adoption of graph technology.
Climate Impact of Software Testing at Nordic Testing DaysKari Kakkonen
My slides at Nordic Testing Days 6.6.2024
Climate impact / sustainability of software testing discussed on the talk. ICT and testing must carry their part of global responsibility to help with the climat warming. We can minimize the carbon footprint but we can also have a carbon handprint, a positive impact on the climate. Quality characteristics can be added with sustainability, and then measured continuously. Test environments can be used less, and in smaller scale and on demand. Test techniques can be used in optimizing or minimizing number of tests. Test automation can be used to speed up testing.
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...Neo4j
Leonard Jayamohan, Partner & Generative AI Lead, Deloitte
This keynote will reveal how Deloitte leverages Neo4j’s graph power for groundbreaking digital twin solutions, achieving a staggering 100x performance boost. Discover the essential role knowledge graphs play in successful generative AI implementations. Plus, get an exclusive look at an innovative Neo4j + Generative AI solution Deloitte is developing in-house.
Programming Foundation Models with DSPy - Meetup SlidesZilliz
Prompting language models is hard, while programming language models is easy. In this talk, I will discuss the state-of-the-art framework DSPy for programming foundation models with its powerful optimizers and runtime constraint system.
In his public lecture, Christian Timmerer provides insights into the fascinating history of video streaming, starting from its humble beginnings before YouTube to the groundbreaking technologies that now dominate platforms like Netflix and ORF ON. Timmerer also presents provocative contributions of his own that have significantly influenced the industry. He concludes by looking at future challenges and invites the audience to join in a discussion.
“An Outlook of the Ongoing and Future Relationship between Blockchain Technologies and Process-aware Information Systems.” Invited talk at the joint workshop on Blockchain for Information Systems (BC4IS) and Blockchain for Trusted Data Sharing (B4TDS), co-located with with the 36th International Conference on Advanced Information Systems Engineering (CAiSE), 3 June 2024, Limassol, Cyprus.
Infrastructure Challenges in Scaling RAG with Custom AI modelsZilliz
Building Retrieval-Augmented Generation (RAG) systems with open-source and custom AI models is a complex task. This talk explores the challenges in productionizing RAG systems, including retrieval performance, response synthesis, and evaluation. We’ll discuss how to leverage open-source models like text embeddings, language models, and custom fine-tuned models to enhance RAG performance. Additionally, we’ll cover how BentoML can help orchestrate and scale these AI components efficiently, ensuring seamless deployment and management of RAG systems in the cloud.
Removing Uninteresting Bytes in Software FuzzingAftab Hussain
Imagine a world where software fuzzing, the process of mutating bytes in test seeds to uncover hidden and erroneous program behaviors, becomes faster and more effective. A lot depends on the initial seeds, which can significantly dictate the trajectory of a fuzzing campaign, particularly in terms of how long it takes to uncover interesting behaviour in your code. We introduce DIAR, a technique designed to speedup fuzzing campaigns by pinpointing and eliminating those uninteresting bytes in the seeds. Picture this: instead of wasting valuable resources on meaningless mutations in large, bloated seeds, DIAR removes the unnecessary bytes, streamlining the entire process.
In this work, we equipped AFL, a popular fuzzer, with DIAR and examined two critical Linux libraries -- Libxml's xmllint, a tool for parsing xml documents, and Binutil's readelf, an essential debugging and security analysis command-line tool used to display detailed information about ELF (Executable and Linkable Format). Our preliminary results show that AFL+DIAR does not only discover new paths more quickly but also achieves higher coverage overall. This work thus showcases how starting with lean and optimized seeds can lead to faster, more comprehensive fuzzing campaigns -- and DIAR helps you find such seeds.
- These are slides of the talk given at IEEE International Conference on Software Testing Verification and Validation Workshop, ICSTW 2022.
1. A Review of Data Compression Techniques
Presented by Sudeepta Mishra (Roll# CS200117052)
At NIST, Berhampur
Under the guidance of Mr. Rowdra Ghatak
2. Introduction
• Data compression is the process of encoding data so that it takes less storage space or less transmission time than it would if it were not compressed.
• Compression is possible because most real-world data is very redundant.
3. Different Compression Techniques
• There are mainly two types of data compression techniques:
– Lossless compression: useful for compressing spreadsheets, text, and executable programs.
– Lossy compression: used for images, movies, and sounds.
4. Types of Lossless Data Compression
• Dictionary coders.
– Zip (file format).
– Lempel Ziv.
• Entropy encoding.
– Huffman coding (simple entropy coding).
• Run-length encoding.
5. Dictionary-Based Compression
• Dictionary-based algorithms do not encode single symbols as variable-length bit strings; they encode variable-length strings of symbols as single tokens.
• The tokens form an index into a phrase dictionary.
• If the tokens are smaller than the phrases they replace, compression occurs.
6. Types of Dictionary
• Static Dictionary.
• Semi-Adaptive Dictionary.
• Adaptive Dictionary.
– Lempel-Ziv algorithms belong to this category of dictionary coders. The dictionary is built in a single pass, while the data is being encoded at the same time.
– The decoder can build up the dictionary in the same way as the encoder while decompressing the data.
7. Dictionary-Based Compression: Example
• Using an English dictionary, the string:
“A good example of how dictionary based compression works”
• gives: 1/1 822/3 674/4 1343/60 928/75 550/32 173/46 421/2
• Using the dictionary as a lookup table, each word is coded as x/y, where x gives the page number and y gives the number of the word on that page. If the dictionary has 2,200 pages with fewer than 256 entries per page, then x requires 12 bits and y requires 8 bits, i.e., 20 bits per word (2.5 bytes per word). In ASCII the string above occupies 56 bytes, whereas the eight tokens of this encoding require only 20 bytes (8 × 2.5): better than 50% compression.
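The arithmetic is easy to verify. A small Python check follows (an editor's illustration of the slide's numbers, not part of the original presentation):

    pages, words_per_page = 2200, 256
    bits_per_token = pages.bit_length() + (words_per_page - 1).bit_length()
    tokens = 8                                   # codes in the sequence above
    encoded_bytes = tokens * bits_per_token / 8  # 20.0 bytes
    ascii_bytes = len("A good example of how dictionary based compression works")
    print(bits_per_token, encoded_bytes, ascii_bytes)   # 20 20.0 56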
8. Lempel-Ziv
• It is a family of algorithms, stemming from the two algorithms proposed by Jacob Ziv and Abraham Lempel in their landmark papers of 1977 and 1978.
[Family-tree figure, flattened: the LZ77 branch comprises LZR, LZSS, LZH, and LZB; the LZ78 branch comprises LZW, LZC, LZT, LZMW, LZJ, and LZFG.]
9. LZW Algorithm
• An improved version of the LZ78 algorithm.
• Published by Terry Welch in 1984.
• A dictionary that is indexed by “codes” is used. The dictionary is assumed to be initialized with 256 entries (indexed with ASCII codes 0 through 255) representing the ASCII table.
10. The LZW Algorithm (Compression)
W = NIL;
while (there is input) {
    K = next symbol from input;
    if (WK exists in the dictionary) {
        W = WK;
    } else {
        output(index(W));
        add WK to the dictionary;
        W = K;
    }
}
output(index(W));   /* flush the last match at end of input */
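For concreteness, here is a minimal runnable sketch of the same loop in Python. This is an editor's illustration, not code from the presentation; the dictionary is seeded with just a–d and codes 1–4 to match the example on the following slides, whereas a real implementation would seed all 256 ASCII codes:

    def lzw_compress(text, alphabet="abcd"):
        # Seed the dictionary with single-symbol strings: a=1, b=2, c=3, d=4.
        dictionary = {ch: i + 1 for i, ch in enumerate(alphabet)}
        next_code = len(dictionary) + 1
        w, output = "", []
        for k in text:
            wk = w + k
            if wk in dictionary:
                w = wk                        # extend the current match
            else:
                output.append(dictionary[w])  # emit the code for the longest match
                dictionary[wk] = next_code    # learn the new phrase
                next_code += 1
                w = k                         # restart matching from k
        if w:
            output.append(dictionary[w])      # flush the final match at EOF
        return output

    print(lzw_compress("abdabdac"))           # [1, 2, 4, 5, 7, 3]

Running it on the example input abdabdac reproduces the output sequence 1 2 4 5 7 3 derived step by step on the next slides.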
11. The LZW Algorithm (Compression) Flow Chart
[Flow chart of the compression loop: START → W = NIL → is EOF? If yes, output the index of W and STOP. If no, K = next input; is WK found in the dictionary? If yes, W = WK; if no, output the index of W, add WK to the dictionary, and set W = K; repeat.]
12. The LZW Algorithm (Compression) Example
• The input string is: a b d a b d a c
• The initial dictionary contains the symbols a, b, c, d with index values 1, 2, 3, 4 respectively.
• The input string is now read from left to right, starting from a.
Dictionary: a=1, b=2, c=3, d=4
14. The LZW Algorithm (Compression) Example
• K = b.
• WK = ab is not in the dictionary.
• Add WK to the dictionary (ab = 5).
• Output the code for a.
• Set W = b.
Input: a b d a b d a c
Output so far: 1
Dictionary: a=1, b=2, c=3, d=4, ab=5
15. The LZW Algorithm (Compression) Example
• K = d.
• WK = bd is not in the dictionary. Add bd to the dictionary (bd = 6).
• Output the code for b.
• Set W = d.
Input: a b d a b d a c
Output so far: 1 2
Dictionary: a=1, b=2, c=3, d=4, ab=5, bd=6
16. The LZW Algorithm (Compression) Example
• K = a.
• WK = da is not in the dictionary. Add it to the dictionary (da = 7).
• Output the code for d.
• Set W = a.
Input: a b d a b d a c
Output so far: 1 2 4
Dictionary: a=1, b=2, c=3, d=4, ab=5, bd=6, da=7
17. The LZW Algorithm (Compression) Example
• K = b.
• WK = ab is in the dictionary, so set W = ab and continue.
Input: a b d a b d a c
Output so far: 1 2 4
Dictionary: a=1, b=2, c=3, d=4, ab=5, bd=6, da=7
18. The LZW Algorithm (Compression) Example
• K = d.
• WK = abd is not in the dictionary.
• Add WK to the dictionary (abd = 8).
• Output the code for W (ab = 5).
• Set W = d.
Input: a b d a b d a c
Output so far: 1 2 4 5
Dictionary: a=1, b=2, c=3, d=4, ab=5, bd=6, da=7, abd=8
19. The LZW Algorithm (Compression) Example
• K = a.
• WK = da is in the dictionary, so set W = da and continue.
Input: a b d a b d a c
Output so far: 1 2 4 5
Dictionary: a=1, b=2, c=3, d=4, ab=5, bd=6, da=7, abd=8
20. The LZW Algorithm (Compression) Example
• K = c.
• WK = dac is not in the dictionary.
• Add WK to the dictionary (dac = 9).
• Output the code for W (da = 7).
• Set W = c.
• No input is left, so output the code for W (c = 3).
Input: a b d a b d a c
Output so far: 1 2 4 5 7
Dictionary: a=1, b=2, c=3, d=4, ab=5, bd=6, da=7, abd=8, dac=9
21. The LZW Algorithm (Compression) Example
• The final output string is: 1 2 4 5 7 3
• Stop.
Input: a b d a b d a c
Final dictionary: a=1, b=2, c=3, d=4, ab=5, bd=6, da=7, abd=8, dac=9
22. LZW Decompression Algorithm
read a character k;
output k;
w = k;
while (read a character k) {    /* k could be a character or a code */
    entry = dictionary entry for k;
    /* corner case: if k is not yet in the dictionary, entry = w + w[0] */
    output entry;
    add w + entry[0] to the dictionary;
    w = entry;
}
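A runnable Python sketch of the decoder follows (again an editor's illustration, seeded to match the slides' example; it also handles the corner case noted in the pseudocode above, where a code refers to the entry the encoder created only one step earlier):

    def lzw_decompress(codes, alphabet="abcd"):
        # Mirror the encoder's initial dictionary: 1=a, 2=b, 3=c, 4=d.
        dictionary = {i + 1: ch for i, ch in enumerate(alphabet)}
        next_code = len(dictionary) + 1
        w = dictionary[codes[0]]
        output = [w]
        for k in codes[1:]:
            if k in dictionary:
                entry = dictionary[k]
            elif k == next_code:              # code not yet in the table:
                entry = w + w[0]              # it must be w + first char of w
            else:
                raise ValueError("bad compressed code: %d" % k)
            output.append(entry)
            dictionary[next_code] = w + entry[0]  # same addition the encoder made
            next_code += 1
            w = entry
        return "".join(output)

    print(lzw_decompress([1, 2, 4, 5, 7, 3]))  # "abdabdac"

Feeding it the codes 1 2 4 5 7 3 from the compression example recovers abdabdac, rebuilding entries 5–9 along the way exactly as the next slides show.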
23. LZW Decompression Algorithm Flow Chart
[Flow chart of the decompression loop: START → K = input → output K → W = K → is EOF? If yes, STOP. If no, K = next input; ENTRY = dictionary entry for K; output ENTRY; add W + ENTRY[0] to the dictionary; set W = ENTRY; repeat.]
25. The LZW Algorithm (Decompression) Example
• K = 2, so entry = b.
• Output entry.
• Add W + entry[0] to the dictionary (ab = 5).
• W = entry (i.e. b).
Input codes: 1 2 4 5 7 3
Output so far: a b
Dictionary: a=1, b=2, c=3, d=4, ab=5
26. The LZW Algorithm (Decompression) Example
• K = 4, so entry = d.
• Output entry.
• Add W + entry[0] to the dictionary (bd = 6).
• W = entry (i.e. d).
Input codes: 1 2 4 5 7 3
Output so far: a b d
Dictionary: a=1, b=2, c=3, d=4, ab=5, bd=6
27. The LZW Algorithm (Decompression) Example
• K = 5, so entry = ab.
• Output entry.
• Add W + entry[0] to the dictionary (da = 7).
• W = entry (i.e. ab).
Input codes: 1 2 4 5 7 3
Output so far: a b d a b
Dictionary: a=1, b=2, c=3, d=4, ab=5, bd=6, da=7
28. The LZW Algorithm (Decompression) Example
• K = 7, so entry = da.
• Output entry.
• Add W + entry[0] to the dictionary (abd = 8).
• W = entry (i.e. da).
Input codes: 1 2 4 5 7 3
Output so far: a b d a b d a
Dictionary: a=1, b=2, c=3, d=4, ab=5, bd=6, da=7, abd=8
29. The LZW Algorithm (Decompression) Example
• K = 3, so entry = c.
• Output entry.
• Add W + entry[0] to the dictionary (dac = 9).
• W = entry (i.e. c).
Input codes: 1 2 4 5 7 3
Decoded output: a b d a b d a c
Dictionary: a=1, b=2, c=3, d=4, ab=5, bd=6, da=7, abd=8, dac=9
30. Advantages
• Because LZW uses adaptive dictionary coding, there is no need to transfer the dictionary explicitly.
• The dictionary is recreated at the decoder side.
• LZW can be made very fast: it grabs a fixed number of bits from the input, so bit parsing is easy, and the table lookup is a simple index operation.
31. Problems with the encoder
• What if we run out of dictionary space?
– Keep track of unused entries and use LRU (Least Recently Used) replacement.
– Monitor compression performance and flush the dictionary when performance is poor.
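For illustration, a sketch of the flush strategy in Python (the slides only name the idea; the 4096-entry limit and the helper below are the editor's assumptions, 4096 being the usual 12-bit code space of formats such as GIF and Unix compress):

    MAX_CODE = 4096  # assumed 12-bit code space; not specified on the slide

    def add_phrase(dictionary, phrase, next_code, alphabet="abcd"):
        # Add a learned phrase; when the table is full, flush it and reseed
        # with the single-symbol entries. Encoder and decoder must flush at
        # the same point so their dictionaries stay synchronized.
        if next_code <= MAX_CODE:
            dictionary[phrase] = next_code
            return dictionary, next_code + 1
        dictionary = {ch: i + 1 for i, ch in enumerate(alphabet)}
        return dictionary, len(dictionary) + 1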
32. Conclusion
• LZW has opened new dimensions for the development of new compression techniques.
• It has been implemented in well-known compression formats such as Adobe PDF and in many other compression packages.
• In combination with other compression techniques, many further schemes, such as LZMS, have been developed.
33. REFERENCES
[1] http://www.bambooweb.com/articles/d/a/Data_Compression.html
[2] http://tuxtina.de/files/seminar/LempelZivReport.pdf
[3] Bell, T. C., Cleary, J. G., and Witten, I. H. Text Compression. Prentice Hall, Upper Saddle River, NJ, 1990.
[4] http://www.cs.cf.ac.uk/Dave/Multimedia/node214.html
[5] http://download.cdsoft.co.uk/tutorials/rlecompression/Run-Length Encoding (RLE) Tutorial.htm
[6] David Salomon, Data Compression: The Complete Reference, Second Edition. Springer-Verlag, New York, 2001 reprint.
[7] http://www.programmersheaven.com/2/Art_Huffman_p1.htm
[8] http://www.programmersheaven.com/2/Art_Huffman_p2.htm
[9] Khalid Sayood, Introduction to Data Compression, Second Edition, Chapter 5, pp. 137–157, Harcourt India Private Limited.