Design and Implementation of LZW Data Compression Algorithm - ijistjournal
LZW is a dictionary-based, lossless algorithm that has been incorporated into a standard of the Consultative Committee on International Telegraphy and Telephony (CCITT), and it is implemented in this paper. Here, the designed dictionary is based on a content-addressable memory (CAM) array. Furthermore, the code for each character is available in the dictionary and uses fewer bits (5 bits) than its ASCII code. In this paper, the LZW data compression algorithm is implemented as a finite state machine, so that text data can be compressed effectively. Simulation results obtained using Xilinx tools show an improvement in the lossless data compression scheme, reducing storage space to 60.25% and increasing the compression rate by 30.3%.
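As a rough illustration of the dictionary-based scheme this abstract refers to, the following is a minimal software sketch of textbook LZW. It is not the paper's CAM-based, finite-state-machine hardware design: the 256-entry initial table, the unbounded table growth and the function names `lzw_compress` and `lzw_decompress` are assumptions made only for this example.

```python
def lzw_compress(text: str) -> list[int]:
    """Textbook LZW: emit one code per longest already-known prefix."""
    # Assumption: the table is seeded with all 8-bit characters. The paper's
    # CAM dictionary (5-bit character codes) is not reproduced here.
    table = {chr(i): i for i in range(256)}
    next_code = 256
    current = ""
    codes = []
    for ch in text:
        candidate = current + ch
        if candidate in table:
            current = candidate          # keep extending the match
        else:
            codes.append(table[current])  # emit code for the known prefix
            table[candidate] = next_code  # learn the new string
            next_code += 1
            current = ch
    if current:
        codes.append(table[current])
    return codes


def lzw_decompress(codes: list[int]) -> str:
    """Rebuild the text by regrowing the same table on the decoder side."""
    if not codes:
        return ""
    table = {i: chr(i) for i in range(256)}
    next_code = 256
    previous = table[codes[0]]
    out = [previous]
    for code in codes[1:]:
        if code in table:
            entry = table[code]
        elif code == next_code:
            # Special case: the code refers to the string being defined now.
            entry = previous + previous[0]
        else:
            raise ValueError(f"invalid LZW code: {code}")
        out.append(entry)
        table[next_code] = previous + entry[0]
        next_code += 1
        previous = entry
    return "".join(out)


sample = "TOBEORNOTTOBEORTOBEORNOT"
encoded = lzw_compress(sample)
```

For the 24-character sample, `encoded` holds 16 codes, and `lzw_decompress(encoded)` returns the original string. Input is assumed to contain only characters with code points below 256.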
Hardware Implementations of RS Decoding Algorithm for Multi-Gb/s Communicatio... - RSIS International
In this paper, we have designed the VLSI hardware for a novel RS decoding algorithm suitable for multi-Gb/s communication systems. We show that the performance benefit of the algorithm is fully realized when it is implemented in hardware, which avoids the extra processing time of the fetch-decode-execute cycle of traditional microprocessor-based computing systems. The new algorithm, with its lower time complexity, combined with an application-specific hardware implementation, is suitable for high-speed real-time systems with hard timing constraints. The design is implemented as digital hardware using VHDL.
Performance analysis and implementation for nonbinary quasi cyclic ldpc decod... - ijwmn
Non-binary low-density parity check (NB-LDPC) codes are an extension of binary LDPC codes with
significantly better performance. Although various kinds of low-complexity iterative decoding algorithms
have been proposed, VLSI implementation of NB-LDPC decoders remains a big challenge due to their high
complexity and long latency. In this brief, a highly efficient check node processing scheme with greatly
reduced processing delay, including a Min-Max decoding algorithm and a check node unit, is
proposed. Compared with previous works, the latency of the check node unit could be reduced by
somewhat less than 52%. In addition, the efficiency of the presented techniques is demonstrated by a
design for the (620, 310) NBQC-LDPC decoder.
International Journal of Engineering Research and Development - IJERD Editor
Electrical, Electronics and Computer Engineering,
Information Engineering and Technology,
Mechanical, Industrial and Manufacturing Engineering,
Automation and Mechatronics Engineering,
Material and Chemical Engineering,
Civil and Architecture Engineering,
Biotechnology and Bio Engineering,
Environmental Engineering,
Petroleum and Mining Engineering,
Marine and Agriculture Engineering,
Aerospace Engineering.
In this paper, we apply grammar-based pre-processing prior to using the Prediction by Partial Matching
(PPM) compression algorithm. This achieves significantly better compression for different natural
language texts compared to other well-known compression methods. Our method first generates a grammar
based on the most common two-character sequences (bigraphs) or three-character sequences (trigraphs) in
the text being compressed and then substitutes these sequences using the respective non-terminal symbols
defined by the grammar in a pre-processing phase prior to the compression. This leads to significantly
improved results in compression for various natural languages (a 5% improvement for American English,
10% for British English, 29% for Welsh, 10% for Arabic, 3% for Persian and 35% for Chinese). We
describe further improvements using a two-pass scheme where the grammar-based pre-processing is
applied again in a second pass through the text. We then apply the algorithms to the files in the Calgary
Corpus and also achieve significantly improved results in compression, between 11% and 20%, when
compared with other compression algorithms, including a grammar-based approach, the Sequitur
algorithm.
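The pre-processing step described in this abstract can be sketched roughly as follows. This is only an illustration of bigraph substitution, not the authors' implementation: the PPM stage is omitted, and the function names, the `max_rules` limit and the use of Unicode private-use characters as non-terminal symbols are assumptions made for the example.

```python
from collections import Counter


def build_bigraph_grammar(text, max_rules=3, first_symbol=0xE000):
    """Map the most common two-character sequences to fresh symbols."""
    # Count every (overlapping) bigraph in the text.
    counts = Counter(text[i:i + 2] for i in range(len(text) - 1))
    grammar = {}
    for offset, (bigraph, _count) in enumerate(counts.most_common(max_rules)):
        # Assumption: private-use code points never occur in the input text.
        grammar[bigraph] = chr(first_symbol + offset)
    return grammar


def substitute(text, grammar):
    """Replace bigraphs by their non-terminal symbols, scanning left to right."""
    out = []
    i = 0
    while i < len(text):
        pair = text[i:i + 2]
        if pair in grammar:
            out.append(grammar[pair])
            i += 2
        else:
            out.append(text[i])
            i += 1
    return "".join(out)


def expand(encoded, grammar):
    """Invert the substitution so the original text is recovered."""
    inverse = {symbol: pair for pair, symbol in grammar.items()}
    return "".join(inverse.get(ch, ch) for ch in encoded)


text = "the cat and the hat and the bat"
grammar = build_bigraph_grammar(text)
encoded = substitute(text, grammar)
```

Here `encoded` is shorter than `text`, and `expand(encoded, grammar)` restores it exactly. In the scheme the abstract describes, the substituted text would then be passed to the PPM compressor, and the grammar would need to be available to the decoder.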
Sequence Learning
A brief introduction to sequence learning techniques for the temporal classification task. Covers recurrent neural networks, long short-term memory, bidirectional neural networks, connectionist temporal classification, and our experiment on a low-resource language.
A Compression & Encryption Algorithms on DNA Sequences Using R 2 P & Selectiv... - IJMERJOURNAL
ABSTRACT: DNA (deoxyribonucleic acid) sequences range in size from millions to billions of nucleotides and grow two to three times larger annually. Efficient lossless compression techniques and data structures to store, access, securely communicate and search these large datasets are therefore necessary. This compression algorithm for genetic sequences is based on searching for exact repeat, reverse and palindrome (R2P) substrings, substituting them, and creating a library file. Each R2P substring is replaced by a corresponding ASCII character: characters 33 to 33+72 for repeats, 33+73 to 33+73+72 for reverses, and 179 to 179+72 for palindromes. In the selective encryption technique, the data are encrypted in the library file, in the compressed file, or in both, also using ASCII codes, with the online library file acting as a signature. Selective encryption, where part of a message is encrypted while the remainder is left unencrypted, can be a viable proposition for running an encryption system on resource-constrained devices. The algorithm can approach a moderate compression rate and provide strong data security; the running time is a few seconds and the complexity is O(n^2). The compressed data are also compressed again by a renowned compressor to reduce the compression rate and ratio. This technique can approach a compression rate of 2.004871 bits/base.
A practical parser with combined parsing - ijseajournal
This paper introduces a practical solution for dramatically enlarging the capabilities of an established
parser, a task that presents substantial challenges. During the development of new procedures for
SUDAAN®, a commercial statistical software package, we found the existing parser to be inadequate for
new situations. Like many other parsers, the one in use could be characterized as a no-repair, no-guesswork,
and no-backtracking look-ahead left-to-right LALR(1) parser [1, p. 300]. This paper describes
how the parser was enhanced to handle extra syntax for sophisticated mathematical and logical
expressions. The new parser adds a noncanonical parsing technique, along with a Shunting-Yard-style
algorithm and other techniques as a second step after the original canonical LALR [2], resulting in a
powerful and efficient two-level parsing approach. Adding a second step to the successful one-step parser
offered a way to preserve existing, well-tested capabilities while adding capabilities for parsing more
complex syntax.
Principal Type Scheme for Session Types - CSCJournals
Session types model communication between processes as dialogues specified by sequences of types of messages, each of which describes the format and direction of the message. The resulting system imposes a type discipline that guarantees compatibility of interaction patterns between processes of a well-typed program. The system is polymorphic in Curry's style, but no formal treatment of this aspect has been provided yet. In this paper we present a system assigning type schemes to programs and an algorithm for inferring the principal type scheme of any typable program for a significant fragment of the calculus which allows delegation of communication, i.e. transmission of channels. We use classical syntax for variables and channels, i.e. just one sort of names in each case for either bound or free occurrences. We prove soundness and completeness of the algorithm, working on individual terms rather than on alpha-equivalence classes. The algorithm has been implemented in Haskell and partially checked in the proof assistant Agda.
In this paper we analyze the cryptanalysis of the simplified data encryption standard algorithm using metaheuristics
and in particular genetic algorithms. The classic fitness function when using such an algorithm
is to compare n-gram statistics of the decrypted message with those of the target message. We show that
using such a function is irrelevant in case of Genetic Algorithm, simply because there is no correlation
between the distance to the real key (the optimum) and the value of the fitness, in other words, there is no
hidden gradient. In order to emphasize this assumption we experimentally show that a genetic algorithm
performs worse than a random search on the cryptanalysis of the simplified data encryption standard
algorithm.
ARM procedure calling conventions and recursion - Stephan Cadene
◆ A portion of code within a larger program. Often called
a subroutine or procedure in imperative languages like C
methods in OO languages like Java
and functions in functional languages like Haskell
◆ Functions return a value. So some purists would say that a C
function returning void is actually a procedure!
◆ Procedures are necessary for:
reducing duplication of code and enabling re-use
decomposing complex programs into manageable parts
◆ Procedures can call each other and can even call themselves
◆ What happens when we call a procedure?
The caller is suspended; control is handed over to the callee
Callee performs the requested task
Callee returns control to the caller
Introducing Parameter Sensitivity to Dynamic Code-Clone Analysis Methods - Kamiya Toshihiro
Presentation of:
[Position Paper] Toshihiro Kamiya, Introducing Parameter Sensitivity to Dynamic Code-Clone Analysis Methods, Proc. 10th International Workshop on Software Clones (IWSC 2016), pp. 19-20, 2016.
Notice: re-uploaded on March 16, 2016. (Fix "IWSC05's" -> "IWSC15's" on page 5)
PERFORMANCE OF ITERATIVE LDPC-BASED SPACE-TIME TRELLIS CODED MIMO-OFDM SYSTEM... - ijcseit
This paper presents the bit error rate (BER) performance of a low-density parity-check (LDPC) based
space-time trellis coded 2×2 multiple-input multiple-output orthogonal frequency-division multiplexing
(STTC-MIMO-OFDM) system for text message transmission. The system under investigation incorporates
a rate-1/2 LDPC encoding scheme under various digital modulations (BPSK, QPSK and QAM) over
additive white Gaussian noise (AWGN) and fading (Rayleigh and Rician) channels with two transmit
and two receive antennas. At the receiving section of the simulated system, the Minimum Mean-Square-Error
(MMSE) channel equalization technique is implemented to extract the transmitted symbols without
enhancing the noise power level. The effectiveness of the proposed system is analyzed in terms of BER versus
signal-to-noise ratio (SNR). The Matlab-based simulation study shows that the proposed
system performs better with BPSK than with the other digital modulation schemes at relatively low SNRs
under AWGN, Rayleigh and Rician fading channels. The transmitted text message is found to be
retrieved effectively at the receiver when the iterative sum-product LDPC decoding
algorithm is used. It is also anticipated that the performance of the LDPC-based STTC-MIMO-OFDM
system degrades as the noise power increases.
IEEE 2014 NS2 NETWORKING PROJECTS Fast regular expression matching using sma... - IEEEBEBTECHSTUDENTPROJECTS
Deep Packet Inspection with Regular Expression Matching - Editor IJCATR
Deep packet inspection directs, persists, filters and logs IP-based applications and Web services traffic based on content
encapsulated in a packet's header or payload, regardless of the protocol or application type. In content scanning, the packet payload is
compared against a set of patterns specified as regular expressions. With deep packet inspection in place through a single intelligent
network device, companies can boost performance without buying expensive servers or additional security products. They are typically
matched through deterministic finite automata (DFAs), but large rule sets need a memory amount that turns out to be too large for
practical implementation. Many recent works have proposed improvements to address this issue, but they increase the number of
transitions (and then of memory accesses) per character. This paper presents a new representation for DFAs, orthogonal to most of the
previous solutions, called delta finite automata (δFA), which considerably reduces states and transitions while preserving a transition
per character only, thus allowing fast matching. A further optimization exploits Nth order relationships within the DFA by adopting
the concept of temporary transitions.
IEEE 802 refers to a family of IEEE standards
dealing with local area networks and metropolitan area networks.
It is restricted to networks carrying variable-size packets.
The services and protocols specified in IEEE 802 map to the lower two layers of the OSI reference model:
Data link layer
Physical layer
The most widely used standards
802.3 - Ethernet
802.4 - Token Bus
802.5 - Token Ring
Packet Classification using Support Vector Machines with String Kernels - IJERA Editor
Since the inception of the internet, many methods have been devised to keep untrusted and malicious packets away
from a user's system. Traffic / packet classification can be used
as an important tool to detect intrusion in the system. Using machine learning as an efficient statistical
approach for classifying packets is a novel method in practice today. This paper emphasizes using an
advanced string kernel method within a support vector machine to classify packets.
There exists a paper related to a similar problem using machine learning [2]. But the research mentioned in
that paper is not up to date and does not account for modern-day
string kernels that are much more efficient. My work extends that research by introducing different approaches
to classify encrypted / unencrypted traffic / packets.
International Journal of Engineering Research and Applications (IJERA) is an open access online peer reviewed international journal that publishes research and review articles in the fields of Computer Science, Neural Networks, Electrical Engineering, Software Engineering, Information Technology, Mechanical Engineering, Chemical Engineering, Plastic Engineering, Food Technology, Textile Engineering, Nano Technology & science, Power Electronics, Electronics & Communication Engineering, Computational mathematics, Image processing, Civil Engineering, Structural Engineering, Environmental Engineering, VLSI Testing & Low Power VLSI Design etc.
A Survey on DPI Techniques for Regular Expression Detection in Network Intrus... - ijsrd.com
Deep Packet Inspection (DPI) is becoming more widely used in virtually all applications and services that operate with or within a network, such as Intrusion Detection Systems (IDS). DPI analyzes all data present in a packet as it passes an inspection point to determine the transported application and protocol. Deep packet inspection typically uses regular expression matching as a core operator. Regular expressions (RegExes) are used to flexibly represent complex string patterns in many applications, including network intrusion detection and prevention systems (NIDPSs). In DPI, regular expressions represent complex string patterns as attack signatures, and DPI examines whether a packet's payload matches any of a set of predefined regular expressions. Various techniques have been developed for regular expression matching in DPI. In this paper we survey these techniques with a view to further improvement in regular expression detection. We found that it is possible to reduce the RegEx transaction memory required in network intrusion detection. We made this survey with a view to the possible use of DPI techniques in wireless networks.
Congestion Control in Wireless Sensor Networks Using Genetic Algorithm - Editor IJCATR
A sensor network consists of a large number of small nodes that interact strongly with the physical environment, take
environmental data through sensors, and react after processing the information. Wireless network technologies are widely used in most
applications. As wireless sensor networks are very active in information transmission, network congestion cannot be
avoided. New methods are therefore needed to control congestion and use existing resources to better meet traffic
demands. Congestion increases packet loss and the retransmission of dropped packets, and also wastes energy. In this paper, a novel
method is presented for congestion control in wireless sensor networks using a genetic algorithm. The simulation results show that the
proposed method, in comparison with the LEACH algorithm, can significantly improve congestion control at high speeds.
ADAPTIVE AUTOMATA FOR GRAMMAR BASED TEXT COMPRESSION - csandit
The Internet and the ubiquitous presence of computing devices are generating a
continuously growing amount of information. However, the information entropy is not uniform.
This allows the use of data compression algorithms to reduce the demand for more powerful
processors and larger data storage equipment. This paper presents an adaptive rule-driven
device - the adaptive automata - as the device to identify repetitive patterns to be compressed in
a grammar based lossless data compression scheme.
International Journal of Emerging Technology and Advanced Engineering
Website: www.ijetae.com (ISSN 2250-2459, ISO 9001:2008 Certified Journal, Volume 3, Issue 10, October 2013), p. 729
Finite States Optimization Using Pattern Matching Algorithm
Amitesh Bhardwaj¹, Somesh Kumar Dewangan²
¹ M.Tech CSE Dept, DIMAT, Chhattisgarh
² M.Tech Associate Professor, CSE Dept, DIMAT, Chhattisgarh
Abstract: A pattern sequence is an expression, that is, a
statement in a language designed specifically to represent
prescribed targets in the most concise and flexible way, used to
direct the automation of text processing of general text files,
specific textual forms, or random input strings. Regular
expressions (REs) are increasingly popular, but their inherent
complexity limits the total number of REs that can be detected
using a single chip. This limit on the number of REs in turn
limits the scalability of present RE detection systems. Existing
schemes are constrained by the old detection paradigm based on
per-character-state processing and state transition detection.
They concentrate on optimizing the number of states and the
required transitions, but not on optimizing the suboptimal
character-based detection method. The advantages of allowing
out-of-sequence detection, rather than detecting components of an
RE in order of appearance, have not been explored. LaFA
(look-ahead finite automata) needs less memory owing to three
aspects: providing specialized and optimized detection modules,
systematically reordering the RE detection sequence, and sharing
states among automata for different REs.
I. INTRODUCTION
REs provide flexibility in describing complex string patterns
in a large number of applications, ranging from network
intrusion detection and prevention systems (NIDPSs) and
compilers to DNA multiple sequence alignment.
NIDPSs use REs to represent attack signatures or packet
classifiers. An RE detection system for NIDPSs must meet
two requirements in high-speed networks: scalability and
high throughput. A scalable detection system such as
Look-ahead Finite Automata (LaFA) supports the current
and expected future RE sets with low memory
requirements. Detection at line speed is another important
requirement of NIDPSs. Our compact data structure makes
this possible by using very high-speed on-chip memory,
even for large RE sets.
Finite automata (FA) are the de facto tools to address the
RE detection problem. For RE detection on an input, the
FA starts at an initial state. For each character in the
input, the FA makes a transition to the next state, which
is determined by the previous state and the current input
character. If the resulting state is unique, the FA is
termed a deterministic finite automaton (DFA); otherwise, it
is termed a nondeterministic finite automaton (NFA).
These two represent extreme cases. A DFA has
constant time complexity per input character, which makes
it the preferred approach for fast execution on commodity
CPUs. An NFA allows multiple simultaneous state
transitions, which leads to a higher time complexity. It
needs massive parallelism to run fast, making it harder to
implement in software.
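The difference between the two automaton types can be sketched in a few lines of Python. This is a minimal illustration, not part of the LaFA design: the pattern ("ab" anywhere in an input over the alphabet {a, b}) and both transition tables are hypothetical. The DFA performs one table lookup per character, while the NFA must track a set of active states.

```python
from typing import Dict, Set, Tuple

# Hypothetical automata for "the input contains 'ab'" over the alphabet {a, b}.
# DFA: exactly one next state per (state, character) pair.
DFA: Dict[Tuple[int, str], int] = {
    (0, "a"): 1, (0, "b"): 0,
    (1, "a"): 1, (1, "b"): 2,
    (2, "a"): 2, (2, "b"): 2,
}
DFA_ACCEPT = {2}

# NFA: a set of possible next states per (state, character) pair.
NFA: Dict[Tuple[int, str], Set[int]] = {
    (0, "a"): {0, 1}, (0, "b"): {0},
    (1, "b"): {2},
    (2, "a"): {2}, (2, "b"): {2},
}
NFA_ACCEPT = {2}


def run_dfa(text: str) -> bool:
    """One table lookup per input character."""
    state = 0
    for ch in text:
        state = DFA[(state, ch)]
    return state in DFA_ACCEPT


def run_nfa(text: str) -> bool:
    """Several states may be active at once, so each character costs
    one lookup per active state."""
    active = {0}
    for ch in text:
        nxt: Set[int] = set()
        for s in active:
            nxt |= NFA.get((s, ch), set())
        active = nxt
    return bool(active & NFA_ACCEPT)
```

Both functions accept the same inputs; only the per-character work differs, which is the trade-off described above.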
We make three observations about current RE detection
systems that, we believe, limit the scalability of these
systems. First, REs share the same components. Second,
in typical FA approaches, a small state machine is used
to detect a component in an RE, and this state machine is
repeated because the same component may appear
multiple times in different REs. Third, most of
the time, the REs sharing this component cannot appear
at the same time in the input. As a result, the repetition of the
same state machine for various REs introduces
redundancy and limits the scalability of the RE
detection system. Based on the above three observations,
we introduce LaFA, a novel detection method that
resolves scalability issues of the current RE detection
paradigm. The scalable and compact LaFA data structure
requires a small memory footprint and makes it feasible
to implement large RE sets using only very fast, small
on-chip memory.
II. PROBLEM STATEMENT
- An IDS finds intrusions using known attack patterns called signatures.
- Every IDS has a large number of signatures (more than 5000).
- If the pattern matching algorithm is slow, the attack response time of the IDS will be very high.
- Existing efficient algorithms such as Boyer-Moore (BM) and Aho-Corasick (AC) do not improve the throughput of the IDS.
- The proposed system is an implementation of a scalable look-ahead regular expression detection system.
- It works on the basis of a look-ahead finite automata machine.
- It improves the detection speed, that is, the attack response time.
- The proposed system should be capable of processing a larger number of signatures, with a larger number of complex regular expressions, on every packet payload.
- The attack response time should be lower than that of deterministic finite automaton (DFA) pattern matching procedures (Aho-Corasick).
- It should provide pattern matching with assertions (back references, look-ahead, look-back, and conditional sub-patterns).
- It should use less memory (low space complexity).
2. International Journal of Emerging Technology and Advanced Engineering
Website: www.ijetae.com (ISSN 2250-2459, ISO 9001:2008 Certified Journal, Volume 3, Issue 10, October 2013)
III. SYSTEM DEVELOPMENT
1) Buffered Lookup Modules: The buffered lookup
modules are the core detection modules in the LaFA
architecture. The look-ahead operation is realized by
these modules. They use history stored in
buffers to check past activity. Next, we explain the various
detection modules based on the buffered lookup
approach.
2) Timestamp Lookup Module (TLM): The TLM stores each
incoming character together with its time of arrival. This module can
detect non-repetition types of variable strings such as VA,
VB, and VC. The TLM incorporates an input buffer, which
stores characters recently received from a packet in
chronological order. Using this buffer, the TLM can answer
queries such as "Does the character at time t belong to a
character class C?" Let us consider the detection process
of RegEx1 as an example. For the detection of RegEx1, a
lowercase alphabetical character must be detected
between the simple strings "abc" and "op". Let us assume
for the time being that the detection sequence and detection
timing have already been verified by the correlation block. The
last portion that needs to be verified is whether the input
character between "abc" and "op" was a lowercase
alphabetical character or not. In the example, since a
lowercase letter appears in the input at time 4, RegEx1 is
detected.
RegEx1:  abc   [a-z]   op
         S1    V1      S2

Input: a b c x o p
Time:  1 2 3 4 5 6

Input: a b c d e f x o p
Time:  1 2 3 4 5 6 7 8 9
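The timestamp query for RegEx1 can be sketched as follows. This is a minimal illustration under stated assumptions: the class `TimestampLookup` and the function `detect_regex1` are hypothetical names, and a plain string search stands in for the correlation block, whose internals are not described here. Times are 1-based, as in the example.

```python
import string


class TimestampLookup:
    """Input buffer that keeps characters in arrival order (1-based times)."""

    def __init__(self) -> None:
        self._buffer = []

    def push(self, ch: str) -> int:
        """Store a character and return its arrival time."""
        self._buffer.append(ch)
        return len(self._buffer)

    def char_in_class(self, t: int, char_class: set) -> bool:
        """Answer: does the character that arrived at time t belong to char_class?"""
        return 1 <= t <= len(self._buffer) and self._buffer[t - 1] in char_class


def detect_regex1(text: str) -> bool:
    """Detect abc[a-z]op: locate the simple strings first, then look back
    into the buffer to verify the single character between them."""
    tlm = TimestampLookup()
    for ch in text:
        tlm.push(ch)
    lowercase = set(string.ascii_lowercase)
    start = text.find("abc")  # stand-in for the correlation block (assumption)
    while start != -1:
        gap_time = start + 4  # 1-based time of the character right after "abc"
        if text.startswith("op", start + 4) and tlm.char_in_class(gap_time, lowercase):
            return True
        start = text.find("abc", start + 1)
    return False
```

For the first input above, `detect_regex1("abcxop")` checks the character at time 4 against the lowercase class only after "abc" and "op" have been located, which mirrors the look-ahead order of detection.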
In a formal notation based on ASCII values, let
$C_{\mathrm{CLM}} = \{C_0, C_1, \ldots, C_i\}$ ($i$ can be 0 to 255)
be an unordered set. Some components can be empty
($C_x = \emptyset$). Each element $C_x$ stores timestamps
(e.g., $C_{120} = \{t_6\}$ shows that component $C_{120}$
(lowercase "x") appeared at time 6). A query is a statement
of the form "Is $[t_4, t_6] \cap C_{120} = \emptyset$?" that is
checked for its validity. For some RegExes, more than one
timestamp needs to be stored per ASCII character. To
clarify this need, consider the following example:
RegEx6:  abc   [xΛ]{3}   xyz
         S1    V4        S3

Input: a b c x x x x o p
Time:  1 2 3 4 5 6 7 8 9
This example illustrates a situation that requires more
than one timestamp entry. Character "x" appears four
times, at times 4, 5, 6, and 7. The CLM stores the timestamps.
However, timestamps 4-6 are overwritten, and only
timestamp 7 is stored, because only one timestamp entry
is assigned. In this situation, the RegEx element
[x]{3} cannot be verified correctly.
At least two timestamp entries must be reserved in
the CLM for this example RegEx. The simulation result in
Section VII shows that the CLM only needs to store a small
number of timestamps for each of the 256 ASCII
characters, regardless of the input traffic. This concludes the
brief description of the buffered lookup modules; the
in-line lookup modules work differently.
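The effect of the number of timestamp entries can be sketched with a small table of bounded queues, one per ASCII value. This is an illustrative sketch only: `CharacterLookup`, `feed`, and `seen_between` are hypothetical names, and the query is phrased as "is the intersection of the time interval with the stored timestamps non-empty?".

```python
from collections import deque


class CharacterLookup:
    """Keeps the most recent timestamps for each of the 256 ASCII values."""

    def __init__(self, entries_per_char: int = 2) -> None:
        # A bounded deque drops the oldest timestamp when a new one arrives.
        self._stamps = [deque(maxlen=entries_per_char) for _ in range(256)]

    def record(self, ch: str, t: int) -> None:
        self._stamps[ord(ch)].append(t)

    def seen_between(self, ch: str, t_start: int, t_end: int) -> bool:
        """True if a stored timestamp of ch lies in [t_start, t_end]."""
        return any(t_start <= t <= t_end for t in self._stamps[ord(ch)])


def feed(clm: CharacterLookup, text: str) -> None:
    """Record every character with its 1-based arrival time."""
    for t, ch in enumerate(text, start=1):
        clm.record(ch, t)


one_entry = CharacterLookup(entries_per_char=1)
two_entries = CharacterLookup(entries_per_char=2)
feed(one_entry, "abcxxxxop")
feed(two_entries, "abcxxxxop")
```

With one entry per character, only timestamp 7 survives for "x", so a query on the interval [4, 6] wrongly reports that "x" was not seen. With two entries, timestamps 6 and 7 are kept and the same query succeeds.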
In-Line Lookup Modules:
3) Repetition Detection Module (RDM): The RDM is
responsible for detecting repetitions that are not detected
by the CLM (variable strings of type VD, VE, VF, and
VG). More formally, the RDM detects components of the
form base{x,y} by accepting x to y consecutive repetitions of
the base in the input. Here, base can be a
single character, a character class, or a simple string. The
{x,y} shows a range of repetition, where x is the minimum
and y is the maximum number of repetitions. The RDM is the only
in-line detection module. The RDM consists of several
identical submodules, and each submodule can detect
character classes and negated character classes with
repetitions. These submodules operate in an on-demand
manner. Assume a RegEx detection process is in progress
and the next component to be inspected is a
character class or negated character class repetition. The
correlation block sends a request to the RDM for the
detection of this component. The request consists of the
base ID and the minimum and maximum repetition
boundaries, where the base is represented by a pattern ID.
The RDM assigns one of the available submodules for
detecting this component. The assigned submodule then
inspects the corresponding input characters to see
whether they all belong to the base range and whether the
number of repetitions is between the minimum and
maximum repetition boundaries. Once the number of
repetitions reaches the minimum repetition value, the next
simple string is activated. The next simple string
is deactivated when the number of repetitions reaches the
maximum repetition value.
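The behaviour of a single RDM submodule can be sketched as a counter over consecutive base characters. This is a simplified sketch, assuming the base is a character class given as a Python set; the function name `repetition_window` and its return convention (the input positions at which the next simple string is allowed to begin) are illustrative choices, not taken from the design.

```python
from typing import List, Set


def repetition_window(text: str, start: int, base: Set[str],
                      min_rep: int, max_rep: int) -> List[int]:
    """Count consecutive characters from `start` that belong to `base`.

    Returns the 0-based positions at which the next simple string is
    active, i.e. the positions reached after min_rep..max_rep repetitions.
    """
    allowed: List[int] = []
    if min_rep == 0:
        allowed.append(start)  # zero repetitions already satisfy the range
    count = 0
    pos = start
    # Stop at the first non-base character or once max_rep is reached,
    # which deactivates the next simple string.
    while pos < len(text) and text[pos] in base and count < max_rep:
        count += 1
        pos += 1
        if count >= min_rep:
            allowed.append(pos)  # minimum reached: next simple string is active
    return allowed


def matches_after_repetition(text: str, start: int, base: Set[str],
                             min_rep: int, max_rep: int, next_string: str) -> bool:
    """True if next_string begins at one of the allowed positions."""
    return any(text.startswith(next_string, p)
               for p in repetition_window(text, start, base, min_rep, max_rep))
```

For the input "abcxxxop" with base {x} and range {2,4}, the window after "abc" opens after the second "x" and the simple string "op" is found at an allowed position. With range {2,2} the window closes before "op" arrives, so no match is reported.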
IV. OVERVIEW OF SYSTEM ARCHITECTURE
- The packet capturing module receives every packet.
- The payload extraction module extracts the application-layer packet.
- Using the timestamp lookup module (TLM), each incoming character is cross-checked against non-repetition types of variable strings.
- The character lookup module (CLM) is responsible for identifying frequently accessed character strings.
- The repetition detection module (RDM) is responsible for identifying repetitions that are not detected by the CLM.
- The frequently appearing repetition module (FRM) reduces resource usage by creating opportunities for sharing the detection effort of frequent bases.
[Figures: results of the pattern matching algorithms (Boyer-Moore and LADFA)]
V. RELATED WORK
An intrusion detection system (IDS) can be a key
component of security incident response within
organizations. Traditionally, intrusion detection research
has focused on improving the accuracy of IDSs, but
recent work has recognized the need to support the
security practitioners who receive the IDS alarms and
investigate suspected incidents. To examine the
challenges associated with deploying and maintaining an
IDS, we analyzed 9 interviews with IT security
practitioners who have worked with IDSs and performed
participatory observations in an organization deploying a
network IDS.
We had three main research questions: (1) What do
security practitioners expect from an IDS? (2) What
difficulties do they encounter when installing and
configuring an IDS? (3) How can the usability of an
IDS be improved? Our analysis reveals both positive and
negative perceptions that security practitioners have for
IDSs, as well as several issues encountered during the
initial stages of IDS deployment. In particular,
practitioners found it difficult to decide where to place
the IDS and how to best configure it for use within a
distributed environment with multiple stakeholders. We
provide recommendations for tool support to help
mitigate these challenges and reduce the effort of
introducing an IDS within an organization. Exact string
matching was the principal method used in early deep packet
inspection (DPI) systems and has been studied extensively. To keep up
with new and highly sophisticated attacks, software-based
NIDPSs started using RegEx-based signatures to
express these attacks more flexibly. The introduction of
RegEx-based signatures increased the complexity of
DPI, limiting the scalability of software NIDPSs or packet
classifiers at high speeds. Current state-of-the-art
hardware architectures are either space-efficient (NFA)
or high-speed (DFA), but not both. Pure NFA
implementations are too slow in software, but
hardware implementations can be feasible with a high
level of parallelism. One such implementation uses solely
logic gates. Although such schemes achieve high-speed
RegEx detection, the inflexibility of implementing
signatures in logic gates limits the updatability and
scalability of NFA implementations. Mitra et al.
proposed a compiler to automatically convert PCRE
opcodes into VHDL code to generate NFAs on an FPGA. The
memory requirement of a DFA implementation can be very
high for certain types of RegExes. Some approaches aim
to reduce the memory requirement by reducing the
number of states. These approaches replace the DFA for
these problematic RegExes with NFA or other
architectures to minimize memory consumption, while
using DFA to implement the rest of the RegExes. Hybrid
FA keeps the problematic DFA states as NFAs (other
parts use DFA). Other authors propose history-based
finite automata (H-FA), which remember the transition
history to avoid creating unnecessary states, and
history-based counting finite automata (H-cFA), which add
counters to reduce the number of states. The XFA
approach formalizes and generalizes the DFA state
explosion problem and shows a reduction in the number of
states. Another work introduces a DFA-based FPGA
solution in which multiple microcontrollers, each of which
independently computes highly complex DFA operations,
are used to avoid DFA state explosion. A similar
approach compiles REs into simple operation
codes so that multiple, specifically designed micro
engines (microcontrollers) work in parallel.
However, even after replacing the problematic DFA
states with more memory-efficient structures in the above
schemes, the memory consumption of the rest of the DFA
is still significantly high. Other approaches propose to
reduce DFA memory by reducing the number of
transitions. For instance, D²FA merges multiple common
transitions, called default transitions, to reduce the total
number of transitions. However, this approach may
require a large number of transitions for some cases,
leading to an increase in the number of memory accesses
per input byte. In addition, D²FA construction is complex
and requires significant resources. Several researchers
follow up on the D²FA idea. CD²FA and Merge DFA
resolve the shortcomings of D²FA by proposing multiple
state transitions per character. Merge DFA bounds the
worst-case number of transitions in terms of the length
of the input string. Although bounded, Merge DFA still
requires a relatively large number of memory accesses.
CD²FA can achieve one transition per input character.
However, it requires a perfect hash function to do so.
Although these schemes successfully address some issues
in D²FA, they still cannot achieve satisfactory results
for all three design objectives, namely flexibility for
adding new signatures, efficient resource usage, and
high-speed detection. Other authors focus on optimization
at the RE level before the FA is generated. They propose
rule rewriting for particular RE patterns that cause state
explosions. They also suggest grouping (splitting) the
DFA into multiple groups to reduce the number of states.
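The default-transition idea mentioned above for the delayed-input DFA family can be sketched as follows. This is a toy illustration, not the construction used in any of the cited schemes: the three-state automaton (for the string "ab" over the alphabet {a, b, c}) and the names `LABELED`, `DEFAULT`, and `next_state` are hypothetical. Each state stores only the transitions that differ from those of its default state, so a lookup may need several memory accesses.

```python
from typing import Dict, Tuple

# Transitions stored explicitly per state. State 0 keeps all of its
# transitions so that every chain of default transitions terminates there.
LABELED: Dict[int, Dict[str, int]] = {
    0: {"a": 1, "b": 0, "c": 0},
    1: {"b": 2},   # "a" and "c" behave as in state 0
    2: {"b": 0},   # "a" and "c" behave as in state 1
}

# Default transition taken when a state has no labeled transition for a character.
DEFAULT: Dict[int, int] = {1: 0, 2: 1}


def next_state(state: int, ch: str) -> Tuple[int, int]:
    """Return (next state, number of table accesses needed to find it)."""
    accesses = 0
    while True:
        accesses += 1
        if ch in LABELED[state]:
            return LABELED[state][ch], accesses
        state = DEFAULT[state]  # follow the default transition without consuming ch
```

In this toy table, four of the nine transitions of the full DFA are removed, but `next_state(2, "a")` has to follow two default transitions before it finds a labeled one, which illustrates the extra memory accesses per input byte.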
VI. CONCLUSION
We explored LaFA, an on-chip RE detection system
that is highly scalable. The scalability of present schemes
is normally limited by the traditional per-character state
processing and state transition detection paradigm. The
focus of research on existing schemes is on optimizing the
number of states and the required transitions, not on the
suboptimal character-based detection method. The
potential benefits of allowing out-of-sequence
detection, instead of detecting components of an RE in
their order of appearance, have not been explored. We
separate detection operations from state
transitions, creating opportunities to further optimize
traditional FAs. LaFA employs a look-ahead technique
to rearrange the sequence of pattern detections, which
needs fewer operations and can announce a mismatch
before exploring complex patterns. Variable
strings are handled by independent detection modules.
This solution addresses the scalability problem
of traditional FAs and improves memory
efficiency. In comparison with state-of-the-art
RE detection systems, LaFA needs an order of magnitude
less memory.
REFERENCES
[1] A. Bhardwaj and S. K. Dewangan, "Finite states optimization on
look-ahead deterministic finite automation," International Journal
of Computer Trends and Technology (IJCTT), vol. 4, no. 9,
pp. 3151–3156, Sep. 2013.
[2] M. Fisk and G. Varghese, "An analysis of fast string matching
applied to content-based forwarding and intrusion detection,"
Tech. Rep. CS2001-0670 (updated version), 2002.
[3] "Port80," Port80 Software, San Diego, CA [Online]. Available:
http://www.port80software.com/surveys/top1000compression
[4] "Website Optimization, LLC," Website Optimization, LLC, Ann
Arbor, MI [Online]. Available:
http://www.websiteoptimization.com
[5] P. Deutsch, "Gzip file format specification," RFC
1952, May 1996 [Online]. Available:
http://www.ietf.org/rfc/rfc1952.txt
[6] P. Deutsch, "Deflate compressed data format specification," RFC
1951, May 1996 [Online]. Available:
http://www.ietf.org/rfc/rfc1951.txt
[7] J. Ziv and A. Lempel, "A universal algorithm for sequential data
compression," IEEE Trans. Inf. Theory, vol. IT-23, no. 3,
pp. 337–343, May 1977.
[8] D. Huffman, "A method for the construction of
minimum-redundancy codes," Proc. IRE, vol. 40, no. 9,
pp. 1098–1101, Sep. 1952.
[9] "Zlib," [Online]. Available: http://www.zlib.net
[10] A. Aho and M. Corasick, "Efficient string matching: An aid to
bibliographic search," Commun. ACM, vol. 18, pp. 333–340, Jun.
1975.
[11] R. Boyer and J. Moore, "A fast string searching algorithm,"
Commun. ACM, vol. 20, no. 10, pp. 762–772, Oct. 1977.
[12] T. Song, W. Zhang, D. Wang, and Y. Xue, "A memory efficient
multiple pattern matching architecture for network security," in
Proc. IEEE INFOCOM, Apr. 2008, pp. 166–170.
[13] J. van Lunteren, "High-performance pattern-matching for
intrusion detection," in Proc. IEEE INFOCOM, Apr. 2006,
pp. 1–13.
[14] V. Dimopoulos, I. Papaefstathiou, and D. Pnevmatikatos, "A
memory-efficient reconfigurable Aho–Corasick FSM
implementation for intrusion detection systems," in Proc.
IC-SAMOS, Jul. 2007, pp. 186–193.
[15] N. Tuck, T. Sherwood, B. Calder, and G. Varghese,
"Deterministic memory-efficient string matching algorithms for
intrusion detection," in Proc. IEEE INFOCOM, 2004, vol. 4,
pp. 2628–2639.
[16] M. Alicherry, M. Muthuprasanna, and V. Kumar, "High speed
pattern matching for network IDS/IPS," in Proc. IEEE ICNP, 2006,
pp. 187–196.
Authors Profile
First Author: Amitesh Bhardwaj
received his BE (CSE) degree from
Chhattisgarh Swami Vivekanand
Technical University, Raipur, in
2010. He is currently an M.Tech
student in Computer Science
Engineering at Chhattisgarh
Swami Vivekanand Technical
University, Raipur. His research interests are in the areas
of wireless and network security, with a current focus on
secure data services in cloud computing and secure
computation outsourcing.
Second Author: Somesh Kumar
Dewangan received his M.Tech
in Computer Science and
Engineering from RCET Bhilai,
Chhattisgarh Swami
Vivekananda University, Bhilai,
in 2009. Before that, he received the MCA
degree in Computer Application
from MPBO University, Bhopal, India, in 2005. He is
Lecturer, Assistant Professor, Associate Professor, Disha
Institute of Management and Technology, Chhattisgarh
Swami Vivekananda Technical University, Bhilai, India,
in 2005 and 2008 respectively. His research interests
include digital signal processing and image processing,
natural language processing, neural networks, artificial
intelligence, information and network security, mobile
networking and cryptography, and Android-based
applications.