SlideShare a Scribd company logo
Lakshmi Mythri Dasari et al Int. Journal of Engineering Research and Applications www.ijera.com 
ISSN : 2248-9622, Vol. 4, Issue 7( Version 4), July 2014, pp.29-31 
www.ijera.com 29 | P a g e 
Loss less DNA Solidity Using Huffman and Arithmetic Coding Lakshmi Mythri Dasari#1, Ch Srinivasu#2, Pathipati Srihari#3 #1, #2 Department of Electronics and Communication #1,#2Dadi Institute of Engineering and Technology, #3National Institute of Technology Karnataka Abstract DNA Sequences making up any bacterium comprise the blue print of that bacterium so that understanding and analyzing different genes with in sequences has become an exceptionally significant mission. Naturalists are manufacturing huge volumes of DNA Sequences every day that makes genome sequence catalogue emergent exponentially. The data bases such as Gen-bank represents millions of DNA Sequences filling many thousands of gigabytes workstation storing capability. Solidity of Genomic sequences can decrease the storage requirements, and increase the broadcast speed. In this paper we compare two lossless solidity algorithms (Huffman and Arithmetic coding). In Huffman coding, individual bases are coded and assigned a specific binary number. But for Arithmetic coding entire DNA is coded in to a single fraction number and binary word is coded to it. Solidity ratio is compared for both the methods and finally we conclude that arithmetic coding is the best. 
Key words: DNA, Solidity, Huffman coding, Arithmetic Coding, Solidity Ratio. 
I. Introduction 
The solidity of DNA sequences is one of the most challenging tasks in the field of data solidity. Since DNA sequences are the code of life. We expect them to be non-random and to present some regularity. It is natural to try taking advantage of such regularities in order compactly store the huge databases which are routinely handled by molecular biologists. The amount of DNA being extracted from organisms and sequenced is increasing exponentially. This yields two problems a) storage b) comprehension. Despite the prevalence of broad band network connection, there still exists a need for compact representation of data to speed up transmission. DNA is composed only from four chemicals bases: Adenine (A), Thymine (T), Guanine (G), and Cytosine (C). Human DNA consists of about 3 billion bases and more than 99% of those bases are the same in all people, the order of this base determines the information available for building an organism. The solidity of this huge amount of produced DNA sequences is a very important and challenging task. 
II. Proposed Work 
In this paper we compress the DNA sequence by using Huffman and Arithmetic Coding. 
a) Huffman Coding: The Huffman procedure is based on two observations concerning optimum prefix codes. 
1. In an optimum code, symbols that occur more frequently will have shorter code words than symbols that occur less frequently. 
2. In an optimum code, the two symbols that occur least frequently will have same length. 
The method stars by building a list of all alphabet symbols in descending order of their probabilities. It then constructs a tree. The tree is built by going through the following steps. 
1. Combine the two lowest frequencies (probabilities) and continue this procedure. 
2. Assign “0” to higher frequency (prob.) and “1” to lower frequency (prob.) of each pair, or vice versa. 
3. Trace the path for each character frequency from lower to upper point. Recording the ones and zeros along the path. 
4. Assign each character (message) codes sequentially from right to left. 
This is best illustrated by an example. Consider DNA sequences AATAAAATAAAACAAAATTAA AAGC whose length is 25. . Table I. Huffman coding Table 
BASE 
FREQUENCY 
PROBABILITY 
CODE WORD 
A 
18 
0.72 
0 
T 
4 
0.16 
10 
G 
1 
0.04 
110 
C 
2 
0.08 
111 
The number of bits required to represent the sequence is 35 bits. 
RESEARCH ARTICLE OPEN ACCESS
Lakshmi Mythri Dasari et al Int. Journal of Engineering Research and Applications www.ijera.com 
ISSN : 2248-9622, Vol. 4, Issue 7( Version 4), July 2014, pp.29-31 
www.ijera.com 30 | P a g e 
b) Arithmetic Coding: In Huffman code, every 
symbol is assigned a code word. This problem is 
overcome in arithmetic coding by assigning one 
code to the entire input stream, instead of 
assigning codes to the individual symbols. The 
method reads the input stream symbol by symbol 
and appends more bits to the code each time a 
symbol is input and processed. 
The algorithm for the arithmetic coding is given 
below 
1. Define an interval [0,1) which is closed at 0 and 
open at 1. 
2. Input the symbol “S” from the input stream. 
3. Divide the interval into sub intervals proportional 
to symbol probability by using the below 
formulas 
NewHigh  OldLow (OldHighOldLow)*HighRange(x) 
NewLowOldLow(OldHighOldLow)*LowRange(x) 
4. Next subinterval is the main interval. 
5. When all the symbols are coded, the final 
output should be any fractional number that 
uniquely identifies the interval which is known 
as tag. 
This represents the whole symbol sequence. The 
tag is then represented in binary digits. 
Minimum number of bits required to 
represent the tag is given by 
   
 
 
 
 
 
   
 
 
  
 
 
 1 
1 
log 
2 P S 
N 
Where PS is the probability of all symbols. 
Consider the subsequence 
AATAAAATAAAACAAAATTAAAAGC 
Table II. Arithmetic Coding Table 
BAS 
E 
FREQUEN 
CY 
PROBABILI 
TY 
RANGE 
A 18 0.72 [0.28,1) 
T 4 0.16 
[0.12,0.2 
8) 
G 1 0.04 
[0.08,0.1 
2) 
C 2 0.08 [0,0.08) 
The coding values of the sequence are as follows: 
A => [0, 0.08) 
A=> [0.0224, 0.08) 
T => [0.029312, 0.038528) 
A=> [0.03189248, 0.038528) 
A=> [0.033750425, 0.038528) 
A=> [0.035088146, 0.038528) 
A=> [0.036051305, 0.038528) 
T=> [0.036348508, 0.036744779) 
A=> [0.036459464, 0.036744779) 
A=> [0.036539352, 0.036744779) 
A=> [0.036516872, 0.036744779) 
A=> [0.036638286, 0.036744779) 
C=> [0.036638286, 0.036646806) 
A=> [0.036640672, 0.036646806) 
A=> [0.036642389, 0.036646806) 
A=> [0.036643626, 0.036646806) 
A=> [0.036644516, 0.036646806) 
T=> [0.036644791, 0.036645157) 
T=> [0.036644835, 0.036644893) 
A=> [0.036644851, 0.036644893) 
A=> [0.036644863, 0.036644893) 
A=> [0.036644871, 0.036644893) 
A=> [0.036644878, 0.036644893) 
G=> [0.036644879, 0.036644879) 
C=> [0.036644879, 0.036644879) 
0.036644879 
2 
0.036644879 0.036644879 
 
 
Tag  
Minimum number of bits required is 32. 
Binary representation of 0.036644879 is 
00000100101100001100011110000110. 
From the above example, we can observe 
that the number of bits encoded for arithmetic coding 
is less. 
III. Solidity Performance 
Several Quantities are commonly used to 
express the performance of a solidity method. 
1. Solidity Ratio: 
Sizeoftheinputstream 
Sizeoftheoutputstream 
CR  
A value of 0.6 means that the data occupies 60% 
of its original size after solidity. Values greater than 1 
mean an output stream bigger than the input stream 
(negative solidity). The solidity ratio can also be 
called bpb (bit per bit), since it equals the number of 
bits in the compressed stream needed, on average, to 
compress on bit in the input stream. 
IV. Outputs 
Different DNA samples are taken solidity is done 
by two methods and are compared 
Table III. Comparison Results 
SL 
No 
Access 
ion 
Numb 
er 
Origi 
nal 
Size 
(bits) 
After 
Solidity 
Solidity 
Ratio 
Huff 
man 
Arit 
hme 
tic 
Huff 
man 
Arit 
hme 
tic 
1 
AF348 
515.1 
17160 4198 4108 
0.75 
536 
0.76 
0606 
2 
AF007 
546 
17024 4256 4202 0.75 
0.75 
3055 
3 
AF008 
216.1 
5640 1410 1380 0.75 
0.75 
5319 
4 
AF186 
607.1 
10408 2602 2591 0.75 
0.75 
1057
Lakshmi Mythri Dasari et al Int. Journal of Engineering Research and Applications www.ijera.com 
ISSN : 2248-9622, Vol. 4, Issue 7( Version 4), July 2014, pp.29-31 
www.ijera.com 31 | P a g e 
Huffman and Arithmetic coding are applied on DNA sequences. The results shows that solidity is best achieved by Arithmetic coding rather than Huffman coding. Which are developed by MATLAB 8.1.0.604 (R2013a) on a core i5 processor with a 4GB of RAM. 
V. Conclusion 
The need for effective DNA solidity is evident in biological applications where storage and transmission of DNA are involved. Algorithm using the concept of Huffman and Arithmetic coding is proposed to compress DNA sequences. The Solidity ratio of Huffman coding is less, as the algorithm depends on the occurrence one of the elements with a high probability and the result is a short length of bits and this does not occur in DNA sequence. This is the major limitations of using Huffman code to compress DAN sequence. Good Solidity ration can be achieved by using adaptive methods, Dictionary learning method, wavelet methods, soft evolutionary computing methods etc. References 
[1] G. Manzini and M. Rastero “A Simple and Fast DNA Solidity” February 17, 2004. 
[2] H. Afify, M.Islam and M. Abdel wahedL. “DNA LossLess Differential Solidity Algorithm Based on similarity of genomic sequence data base. (IJCSIT) Vol 3, No4, August 2011. 
[3] M. Burrows and D.J. Wheeler, "A Block Sorting Lossless Data Solidity Algorithms", Technical report124, Digital System Research center, 1994. 
[4] P. G. Howard and J. S. Vitter, “Analysis of arithmetic coding for data solidity,” Informat. Process. Manag., vol. 28, no. 6, pp. 749-763, 1992. 
[5] I. H. Witten, R. M. Neal, and J. G. Cleary, “Arithmetic coding for data solidity,” Commun. ACM, vol. 30, no. 6, pp. 520-540, June 1987.

More Related Content

What's hot

Simple regenerating codes: Network Coding for Cloud Storage
Simple regenerating codes: Network Coding for Cloud StorageSimple regenerating codes: Network Coding for Cloud Storage
Simple regenerating codes: Network Coding for Cloud Storage
Kevin Tong
 
Network Coding for Distributed Storage Systems(Group Meeting Talk)
Network Coding for Distributed Storage Systems(Group Meeting Talk)Network Coding for Distributed Storage Systems(Group Meeting Talk)
Network Coding for Distributed Storage Systems(Group Meeting Talk)
Jayant Apte, PhD
 
Features of genetic algorithm for plain text encryption
Features of genetic algorithm for plain text encryption Features of genetic algorithm for plain text encryption
Features of genetic algorithm for plain text encryption
IJECEIAES
 
Performance Analysis of Steepest Descent Decoding Algorithm for LDPC Codes
Performance Analysis of Steepest Descent Decoding Algorithm for LDPC CodesPerformance Analysis of Steepest Descent Decoding Algorithm for LDPC Codes
Performance Analysis of Steepest Descent Decoding Algorithm for LDPC Codes
idescitation
 
Network coding
Network codingNetwork coding
Network coding
Lishi He
 
Hardware implementation of (63, 51) bch encoder and decoder for wban using lf...
Hardware implementation of (63, 51) bch encoder and decoder for wban using lf...Hardware implementation of (63, 51) bch encoder and decoder for wban using lf...
Hardware implementation of (63, 51) bch encoder and decoder for wban using lf...
ijitjournal
 
Chunking in manipuri using crf
Chunking in manipuri using crfChunking in manipuri using crf
Chunking in manipuri using crf
ijnlc
 
27 20 dec16 13794 28120-1-sm(edit)genap
27 20 dec16 13794 28120-1-sm(edit)genap27 20 dec16 13794 28120-1-sm(edit)genap
27 20 dec16 13794 28120-1-sm(edit)genap
nooriasukmaningtyas
 
ENSEMBLE OF BLOWFISH WITH CHAOS BASED S BOX DESIGN FOR TEXT AND IMAGE ENCRYPTION
ENSEMBLE OF BLOWFISH WITH CHAOS BASED S BOX DESIGN FOR TEXT AND IMAGE ENCRYPTIONENSEMBLE OF BLOWFISH WITH CHAOS BASED S BOX DESIGN FOR TEXT AND IMAGE ENCRYPTION
ENSEMBLE OF BLOWFISH WITH CHAOS BASED S BOX DESIGN FOR TEXT AND IMAGE ENCRYPTION
IJNSA Journal
 
Cryptography- "A Black Art"
Cryptography- "A Black Art"Cryptography- "A Black Art"
Cryptography- "A Black Art"
Aditya Raina
 
C04922125
C04922125C04922125
C04922125
IOSR-JEN
 
Combating Bit Losses in Computer Networks using Modified Luby Transform Code
Combating Bit Losses in Computer Networks using Modified Luby Transform CodeCombating Bit Losses in Computer Networks using Modified Luby Transform Code
Combating Bit Losses in Computer Networks using Modified Luby Transform Code
IJMER
 
Comparative Study of Three DNA-based Information Hiding Methods
Comparative Study of Three DNA-based Information Hiding MethodsComparative Study of Three DNA-based Information Hiding Methods
Comparative Study of Three DNA-based Information Hiding Methods
CSCJournals
 
Network Coding
Network CodingNetwork Coding
Network Coding
Kishoj Bajracharya
 
White space steganography on text
White space steganography on textWhite space steganography on text
White space steganography on text
IJCNCJournal
 
Introduction to Network Coding
Introduction to Network CodingIntroduction to Network Coding

What's hot (16)

Simple regenerating codes: Network Coding for Cloud Storage
Simple regenerating codes: Network Coding for Cloud StorageSimple regenerating codes: Network Coding for Cloud Storage
Simple regenerating codes: Network Coding for Cloud Storage
 
Network Coding for Distributed Storage Systems(Group Meeting Talk)
Network Coding for Distributed Storage Systems(Group Meeting Talk)Network Coding for Distributed Storage Systems(Group Meeting Talk)
Network Coding for Distributed Storage Systems(Group Meeting Talk)
 
Features of genetic algorithm for plain text encryption
Features of genetic algorithm for plain text encryption Features of genetic algorithm for plain text encryption
Features of genetic algorithm for plain text encryption
 
Performance Analysis of Steepest Descent Decoding Algorithm for LDPC Codes
Performance Analysis of Steepest Descent Decoding Algorithm for LDPC CodesPerformance Analysis of Steepest Descent Decoding Algorithm for LDPC Codes
Performance Analysis of Steepest Descent Decoding Algorithm for LDPC Codes
 
Network coding
Network codingNetwork coding
Network coding
 
Hardware implementation of (63, 51) bch encoder and decoder for wban using lf...
Hardware implementation of (63, 51) bch encoder and decoder for wban using lf...Hardware implementation of (63, 51) bch encoder and decoder for wban using lf...
Hardware implementation of (63, 51) bch encoder and decoder for wban using lf...
 
Chunking in manipuri using crf
Chunking in manipuri using crfChunking in manipuri using crf
Chunking in manipuri using crf
 
27 20 dec16 13794 28120-1-sm(edit)genap
27 20 dec16 13794 28120-1-sm(edit)genap27 20 dec16 13794 28120-1-sm(edit)genap
27 20 dec16 13794 28120-1-sm(edit)genap
 
ENSEMBLE OF BLOWFISH WITH CHAOS BASED S BOX DESIGN FOR TEXT AND IMAGE ENCRYPTION
ENSEMBLE OF BLOWFISH WITH CHAOS BASED S BOX DESIGN FOR TEXT AND IMAGE ENCRYPTIONENSEMBLE OF BLOWFISH WITH CHAOS BASED S BOX DESIGN FOR TEXT AND IMAGE ENCRYPTION
ENSEMBLE OF BLOWFISH WITH CHAOS BASED S BOX DESIGN FOR TEXT AND IMAGE ENCRYPTION
 
Cryptography- "A Black Art"
Cryptography- "A Black Art"Cryptography- "A Black Art"
Cryptography- "A Black Art"
 
C04922125
C04922125C04922125
C04922125
 
Combating Bit Losses in Computer Networks using Modified Luby Transform Code
Combating Bit Losses in Computer Networks using Modified Luby Transform CodeCombating Bit Losses in Computer Networks using Modified Luby Transform Code
Combating Bit Losses in Computer Networks using Modified Luby Transform Code
 
Comparative Study of Three DNA-based Information Hiding Methods
Comparative Study of Three DNA-based Information Hiding MethodsComparative Study of Three DNA-based Information Hiding Methods
Comparative Study of Three DNA-based Information Hiding Methods
 
Network Coding
Network CodingNetwork Coding
Network Coding
 
White space steganography on text
White space steganography on textWhite space steganography on text
White space steganography on text
 
Introduction to Network Coding
Introduction to Network CodingIntroduction to Network Coding
Introduction to Network Coding
 

Viewers also liked

K43015659
K43015659K43015659
K43015659
IJERA Editor
 
Technological Considerations and Constraints in the Manufacture of High Preci...
Technological Considerations and Constraints in the Manufacture of High Preci...Technological Considerations and Constraints in the Manufacture of High Preci...
Technological Considerations and Constraints in the Manufacture of High Preci...
IJERA Editor
 
Fluid Structure Interaction Based Investigation of Convergent-Divergent Nozzle
Fluid Structure Interaction Based Investigation of Convergent-Divergent NozzleFluid Structure Interaction Based Investigation of Convergent-Divergent Nozzle
Fluid Structure Interaction Based Investigation of Convergent-Divergent Nozzle
IJERA Editor
 
E43012628
E43012628E43012628
E43012628
IJERA Editor
 
B045041114
B045041114B045041114
B045041114
IJERA Editor
 
M046047076
M046047076M046047076
M046047076
IJERA Editor
 
G42044752
G42044752G42044752
G42044752
IJERA Editor
 
Analysis On Classification Techniques In Mammographic Mass Data Set
Analysis On Classification Techniques In Mammographic Mass Data SetAnalysis On Classification Techniques In Mammographic Mass Data Set
Analysis On Classification Techniques In Mammographic Mass Data Set
IJERA Editor
 
B044040916
B044040916B044040916
B044040916
IJERA Editor
 
L045056469
L045056469L045056469
L045056469
IJERA Editor
 
Assessment of the waste water quality parameter of the Chitrakoot Dham, Karwi
Assessment of the waste water quality parameter of the Chitrakoot Dham, KarwiAssessment of the waste water quality parameter of the Chitrakoot Dham, Karwi
Assessment of the waste water quality parameter of the Chitrakoot Dham, Karwi
IJERA Editor
 
Behavior Analysis Of Malicious Web Pages Through Client Honeypot For Detectio...
Behavior Analysis Of Malicious Web Pages Through Client Honeypot For Detectio...Behavior Analysis Of Malicious Web Pages Through Client Honeypot For Detectio...
Behavior Analysis Of Malicious Web Pages Through Client Honeypot For Detectio...
IJERA Editor
 
Analysis on different Data mining Techniques and algorithms used in IOT
Analysis on different Data mining Techniques and algorithms used in IOTAnalysis on different Data mining Techniques and algorithms used in IOT
Analysis on different Data mining Techniques and algorithms used in IOT
IJERA Editor
 
The Performance Evaluation of Concrete Filled Steel Tubular Arch Bridge
The Performance Evaluation of Concrete Filled Steel Tubular Arch BridgeThe Performance Evaluation of Concrete Filled Steel Tubular Arch Bridge
The Performance Evaluation of Concrete Filled Steel Tubular Arch Bridge
IJERA Editor
 
Cx4301574577
Cx4301574577Cx4301574577
Cx4301574577
IJERA Editor
 
Numerical simulation of Pressure Drop through a Compact Helical geometry
Numerical simulation of Pressure Drop through a Compact Helical geometryNumerical simulation of Pressure Drop through a Compact Helical geometry
Numerical simulation of Pressure Drop through a Compact Helical geometry
IJERA Editor
 
E045063032
E045063032E045063032
E045063032
IJERA Editor
 
X04408122125
X04408122125X04408122125
X04408122125
IJERA Editor
 
A Method of Survey on Object-Oriented Shadow Detection & Removal for High Res...
A Method of Survey on Object-Oriented Shadow Detection & Removal for High Res...A Method of Survey on Object-Oriented Shadow Detection & Removal for High Res...
A Method of Survey on Object-Oriented Shadow Detection & Removal for High Res...
IJERA Editor
 
Am04605271282
Am04605271282Am04605271282
Am04605271282
IJERA Editor
 

Viewers also liked (20)

K43015659
K43015659K43015659
K43015659
 
Technological Considerations and Constraints in the Manufacture of High Preci...
Technological Considerations and Constraints in the Manufacture of High Preci...Technological Considerations and Constraints in the Manufacture of High Preci...
Technological Considerations and Constraints in the Manufacture of High Preci...
 
Fluid Structure Interaction Based Investigation of Convergent-Divergent Nozzle
Fluid Structure Interaction Based Investigation of Convergent-Divergent NozzleFluid Structure Interaction Based Investigation of Convergent-Divergent Nozzle
Fluid Structure Interaction Based Investigation of Convergent-Divergent Nozzle
 
E43012628
E43012628E43012628
E43012628
 
B045041114
B045041114B045041114
B045041114
 
M046047076
M046047076M046047076
M046047076
 
G42044752
G42044752G42044752
G42044752
 
Analysis On Classification Techniques In Mammographic Mass Data Set
Analysis On Classification Techniques In Mammographic Mass Data SetAnalysis On Classification Techniques In Mammographic Mass Data Set
Analysis On Classification Techniques In Mammographic Mass Data Set
 
B044040916
B044040916B044040916
B044040916
 
L045056469
L045056469L045056469
L045056469
 
Assessment of the waste water quality parameter of the Chitrakoot Dham, Karwi
Assessment of the waste water quality parameter of the Chitrakoot Dham, KarwiAssessment of the waste water quality parameter of the Chitrakoot Dham, Karwi
Assessment of the waste water quality parameter of the Chitrakoot Dham, Karwi
 
Behavior Analysis Of Malicious Web Pages Through Client Honeypot For Detectio...
Behavior Analysis Of Malicious Web Pages Through Client Honeypot For Detectio...Behavior Analysis Of Malicious Web Pages Through Client Honeypot For Detectio...
Behavior Analysis Of Malicious Web Pages Through Client Honeypot For Detectio...
 
Analysis on different Data mining Techniques and algorithms used in IOT
Analysis on different Data mining Techniques and algorithms used in IOTAnalysis on different Data mining Techniques and algorithms used in IOT
Analysis on different Data mining Techniques and algorithms used in IOT
 
The Performance Evaluation of Concrete Filled Steel Tubular Arch Bridge
The Performance Evaluation of Concrete Filled Steel Tubular Arch BridgeThe Performance Evaluation of Concrete Filled Steel Tubular Arch Bridge
The Performance Evaluation of Concrete Filled Steel Tubular Arch Bridge
 
Cx4301574577
Cx4301574577Cx4301574577
Cx4301574577
 
Numerical simulation of Pressure Drop through a Compact Helical geometry
Numerical simulation of Pressure Drop through a Compact Helical geometryNumerical simulation of Pressure Drop through a Compact Helical geometry
Numerical simulation of Pressure Drop through a Compact Helical geometry
 
E045063032
E045063032E045063032
E045063032
 
X04408122125
X04408122125X04408122125
X04408122125
 
A Method of Survey on Object-Oriented Shadow Detection & Removal for High Res...
A Method of Survey on Object-Oriented Shadow Detection & Removal for High Res...A Method of Survey on Object-Oriented Shadow Detection & Removal for High Res...
A Method of Survey on Object-Oriented Shadow Detection & Removal for High Res...
 
Am04605271282
Am04605271282Am04605271282
Am04605271282
 

Similar to Loss less DNA Solidity Using Huffman and Arithmetic Coding

Design of ternary sequence using msaa
Design of ternary sequence using msaaDesign of ternary sequence using msaa
Design of ternary sequence using msaa
Editor Jacotech
 
Data Encryption Technique Based on DNA Cryptography
Data Encryption Technique Based on DNA CryptographyData Encryption Technique Based on DNA Cryptography
Data Encryption Technique Based on DNA Cryptography
BALAKUMARC1
 
K mer index of dna sequence based on hash
K mer index of dna sequence based on hashK mer index of dna sequence based on hash
K mer index of dna sequence based on hash
ijcsa
 
Performance Efficient DNA Sequence Detectionalgo
Performance Efficient DNA Sequence DetectionalgoPerformance Efficient DNA Sequence Detectionalgo
Performance Efficient DNA Sequence Detectionalgo
Rahul Shirude
 
A Comparative Analysis of Feature Selection Methods for Clustering DNA Sequences
A Comparative Analysis of Feature Selection Methods for Clustering DNA SequencesA Comparative Analysis of Feature Selection Methods for Clustering DNA Sequences
A Comparative Analysis of Feature Selection Methods for Clustering DNA Sequences
CSCJournals
 
Dna data compression algorithms based on redundancy
Dna data compression algorithms based on redundancyDna data compression algorithms based on redundancy
Dna data compression algorithms based on redundancy
ijfcstjournal
 
E031022026
E031022026E031022026
E031022026
ijceronline
 
Stock markets and_human_genomics
Stock markets and_human_genomicsStock markets and_human_genomics
Stock markets and_human_genomics
Shyam Sarkar
 
Design and Analysis of an Improved Nucleotide Sequences Compression Algorithm...
Design and Analysis of an Improved Nucleotide Sequences Compression Algorithm...Design and Analysis of an Improved Nucleotide Sequences Compression Algorithm...
Design and Analysis of an Improved Nucleotide Sequences Compression Algorithm...
IJAAS Team
 
IRJET- Enhanced Security using DNA Cryptography
IRJET- Enhanced Security using DNA CryptographyIRJET- Enhanced Security using DNA Cryptography
IRJET- Enhanced Security using DNA Cryptography
IRJET Journal
 
Comparative analysis of dynamic programming
Comparative analysis of dynamic programmingComparative analysis of dynamic programming
Comparative analysis of dynamic programming
eSAT Publishing House
 
Comparative analysis of dynamic programming algorithms to find similarity in ...
Comparative analysis of dynamic programming algorithms to find similarity in ...Comparative analysis of dynamic programming algorithms to find similarity in ...
Comparative analysis of dynamic programming algorithms to find similarity in ...
eSAT Journals
 
Achieving Portability and Efficiency in a HPC Code Using Standard Message-pas...
Achieving Portability and Efficiency in a HPC Code Using Standard Message-pas...Achieving Portability and Efficiency in a HPC Code Using Standard Message-pas...
Achieving Portability and Efficiency in a HPC Code Using Standard Message-pas...
Derryck Lamptey, MPhil, CISSP
 
Cost Optimized Design Technique for Pseudo-Random Numbers in Cellular Automata
Cost Optimized Design Technique for Pseudo-Random Numbers in Cellular AutomataCost Optimized Design Technique for Pseudo-Random Numbers in Cellular Automata
Cost Optimized Design Technique for Pseudo-Random Numbers in Cellular Automata
ijait
 
J0445255
J0445255J0445255
J0445255
IJERA Editor
 
Random Keying Technique for Security in Wireless Sensor Networks Based on Mem...
Random Keying Technique for Security in Wireless Sensor Networks Based on Mem...Random Keying Technique for Security in Wireless Sensor Networks Based on Mem...
Random Keying Technique for Security in Wireless Sensor Networks Based on Mem...
ijcsta
 
50320130403003 2
50320130403003 250320130403003 2
50320130403003 2
IAEME Publication
 
Improving the data recovery for short length LT codes
Improving the data recovery for short length LT codes Improving the data recovery for short length LT codes
Improving the data recovery for short length LT codes
IJECEIAES
 
Neuro genetic key based recursive modulo 2 substitution using mutated charact...
Neuro genetic key based recursive modulo 2 substitution using mutated charact...Neuro genetic key based recursive modulo 2 substitution using mutated charact...
Neuro genetic key based recursive modulo 2 substitution using mutated charact...
ijcsity
 
A comparative review on symmetric and asymmetric DNA-based cryptography
A comparative review on symmetric and asymmetric DNA-based cryptographyA comparative review on symmetric and asymmetric DNA-based cryptography
A comparative review on symmetric and asymmetric DNA-based cryptography
journalBEEI
 

Similar to Loss less DNA Solidity Using Huffman and Arithmetic Coding (20)

Design of ternary sequence using msaa
Design of ternary sequence using msaaDesign of ternary sequence using msaa
Design of ternary sequence using msaa
 
Data Encryption Technique Based on DNA Cryptography
Data Encryption Technique Based on DNA CryptographyData Encryption Technique Based on DNA Cryptography
Data Encryption Technique Based on DNA Cryptography
 
K mer index of dna sequence based on hash
K mer index of dna sequence based on hashK mer index of dna sequence based on hash
K mer index of dna sequence based on hash
 
Performance Efficient DNA Sequence Detectionalgo
Performance Efficient DNA Sequence DetectionalgoPerformance Efficient DNA Sequence Detectionalgo
Performance Efficient DNA Sequence Detectionalgo
 
A Comparative Analysis of Feature Selection Methods for Clustering DNA Sequences
A Comparative Analysis of Feature Selection Methods for Clustering DNA SequencesA Comparative Analysis of Feature Selection Methods for Clustering DNA Sequences
A Comparative Analysis of Feature Selection Methods for Clustering DNA Sequences
 
Dna data compression algorithms based on redundancy
Dna data compression algorithms based on redundancyDna data compression algorithms based on redundancy
Dna data compression algorithms based on redundancy
 
E031022026
E031022026E031022026
E031022026
 
Stock markets and_human_genomics
Stock markets and_human_genomicsStock markets and_human_genomics
Stock markets and_human_genomics
 
Design and Analysis of an Improved Nucleotide Sequences Compression Algorithm...
Design and Analysis of an Improved Nucleotide Sequences Compression Algorithm...Design and Analysis of an Improved Nucleotide Sequences Compression Algorithm...
Design and Analysis of an Improved Nucleotide Sequences Compression Algorithm...
 
IRJET- Enhanced Security using DNA Cryptography
IRJET- Enhanced Security using DNA CryptographyIRJET- Enhanced Security using DNA Cryptography
IRJET- Enhanced Security using DNA Cryptography
 
Comparative analysis of dynamic programming
Comparative analysis of dynamic programmingComparative analysis of dynamic programming
Comparative analysis of dynamic programming
 
Comparative analysis of dynamic programming algorithms to find similarity in ...
Comparative analysis of dynamic programming algorithms to find similarity in ...Comparative analysis of dynamic programming algorithms to find similarity in ...
Comparative analysis of dynamic programming algorithms to find similarity in ...
 
Achieving Portability and Efficiency in a HPC Code Using Standard Message-pas...
Achieving Portability and Efficiency in a HPC Code Using Standard Message-pas...Achieving Portability and Efficiency in a HPC Code Using Standard Message-pas...
Achieving Portability and Efficiency in a HPC Code Using Standard Message-pas...
 
Cost Optimized Design Technique for Pseudo-Random Numbers in Cellular Automata
Cost Optimized Design Technique for Pseudo-Random Numbers in Cellular AutomataCost Optimized Design Technique for Pseudo-Random Numbers in Cellular Automata
Cost Optimized Design Technique for Pseudo-Random Numbers in Cellular Automata
 
J0445255
J0445255J0445255
J0445255
 
Random Keying Technique for Security in Wireless Sensor Networks Based on Mem...
Random Keying Technique for Security in Wireless Sensor Networks Based on Mem...Random Keying Technique for Security in Wireless Sensor Networks Based on Mem...
Random Keying Technique for Security in Wireless Sensor Networks Based on Mem...
 
50320130403003 2
50320130403003 250320130403003 2
50320130403003 2
 
Improving the data recovery for short length LT codes
Improving the data recovery for short length LT codes Improving the data recovery for short length LT codes
Improving the data recovery for short length LT codes
 
Neuro genetic key based recursive modulo 2 substitution using mutated charact...
Neuro genetic key based recursive modulo 2 substitution using mutated charact...Neuro genetic key based recursive modulo 2 substitution using mutated charact...
Neuro genetic key based recursive modulo 2 substitution using mutated charact...
 
A comparative review on symmetric and asymmetric DNA-based cryptography
A comparative review on symmetric and asymmetric DNA-based cryptographyA comparative review on symmetric and asymmetric DNA-based cryptography
A comparative review on symmetric and asymmetric DNA-based cryptography
 

Recently uploaded

ISPM 15 Heat Treated Wood Stamps and why your shipping must have one
ISPM 15 Heat Treated Wood Stamps and why your shipping must have oneISPM 15 Heat Treated Wood Stamps and why your shipping must have one
ISPM 15 Heat Treated Wood Stamps and why your shipping must have one
Las Vegas Warehouse
 
2008 BUILDING CONSTRUCTION Illustrated - Ching Chapter 02 The Building.pdf
2008 BUILDING CONSTRUCTION Illustrated - Ching Chapter 02 The Building.pdf2008 BUILDING CONSTRUCTION Illustrated - Ching Chapter 02 The Building.pdf
2008 BUILDING CONSTRUCTION Illustrated - Ching Chapter 02 The Building.pdf
Yasser Mahgoub
 
Iron and Steel Technology Roadmap - Towards more sustainable steelmaking.pdf
Iron and Steel Technology Roadmap - Towards more sustainable steelmaking.pdfIron and Steel Technology Roadmap - Towards more sustainable steelmaking.pdf
Iron and Steel Technology Roadmap - Towards more sustainable steelmaking.pdf
RadiNasr
 
Computational Engineering IITH Presentation
Computational Engineering IITH PresentationComputational Engineering IITH Presentation
Computational Engineering IITH Presentation
co23btech11018
 
KuberTENes Birthday Bash Guadalajara - K8sGPT first impressions
KuberTENes Birthday Bash Guadalajara - K8sGPT first impressionsKuberTENes Birthday Bash Guadalajara - K8sGPT first impressions
KuberTENes Birthday Bash Guadalajara - K8sGPT first impressions
Victor Morales
 
官方认证美国密歇根州立大学毕业证学位证书原版一模一样
官方认证美国密歇根州立大学毕业证学位证书原版一模一样官方认证美国密歇根州立大学毕业证学位证书原版一模一样
官方认证美国密歇根州立大学毕业证学位证书原版一模一样
171ticu
 
Electric vehicle and photovoltaic advanced roles in enhancing the financial p...
Electric vehicle and photovoltaic advanced roles in enhancing the financial p...Electric vehicle and photovoltaic advanced roles in enhancing the financial p...
Electric vehicle and photovoltaic advanced roles in enhancing the financial p...
IJECEIAES
 
Understanding Inductive Bias in Machine Learning
Understanding Inductive Bias in Machine LearningUnderstanding Inductive Bias in Machine Learning
Understanding Inductive Bias in Machine Learning
SUTEJAS
 
A SYSTEMATIC RISK ASSESSMENT APPROACH FOR SECURING THE SMART IRRIGATION SYSTEMS
A SYSTEMATIC RISK ASSESSMENT APPROACH FOR SECURING THE SMART IRRIGATION SYSTEMSA SYSTEMATIC RISK ASSESSMENT APPROACH FOR SECURING THE SMART IRRIGATION SYSTEMS
A SYSTEMATIC RISK ASSESSMENT APPROACH FOR SECURING THE SMART IRRIGATION SYSTEMS
IJNSA Journal
 
The Python for beginners. This is an advance computer language.
The Python for beginners. This is an advance computer language.The Python for beginners. This is an advance computer language.
The Python for beginners. This is an advance computer language.
sachin chaurasia
 
DEEP LEARNING FOR SMART GRID INTRUSION DETECTION: A HYBRID CNN-LSTM-BASED MODEL
DEEP LEARNING FOR SMART GRID INTRUSION DETECTION: A HYBRID CNN-LSTM-BASED MODELDEEP LEARNING FOR SMART GRID INTRUSION DETECTION: A HYBRID CNN-LSTM-BASED MODEL
DEEP LEARNING FOR SMART GRID INTRUSION DETECTION: A HYBRID CNN-LSTM-BASED MODEL
gerogepatton
 
Comparative analysis between traditional aquaponics and reconstructed aquapon...
Comparative analysis between traditional aquaponics and reconstructed aquapon...Comparative analysis between traditional aquaponics and reconstructed aquapon...
Comparative analysis between traditional aquaponics and reconstructed aquapon...
bijceesjournal
 
Redefining brain tumor segmentation: a cutting-edge convolutional neural netw...
Redefining brain tumor segmentation: a cutting-edge convolutional neural netw...Redefining brain tumor segmentation: a cutting-edge convolutional neural netw...
Redefining brain tumor segmentation: a cutting-edge convolutional neural netw...
IJECEIAES
 
哪里办理(csu毕业证书)查尔斯特大学毕业证硕士学历原版一模一样
哪里办理(csu毕业证书)查尔斯特大学毕业证硕士学历原版一模一样哪里办理(csu毕业证书)查尔斯特大学毕业证硕士学历原版一模一样
哪里办理(csu毕业证书)查尔斯特大学毕业证硕士学历原版一模一样
insn4465
 
Eric Nizeyimana's document 2006 from gicumbi to ttc nyamata handball play
Eric Nizeyimana's document 2006 from gicumbi to ttc nyamata handball playEric Nizeyimana's document 2006 from gicumbi to ttc nyamata handball play
Eric Nizeyimana's document 2006 from gicumbi to ttc nyamata handball play
enizeyimana36
 
Engineering Drawings Lecture Detail Drawings 2014.pdf
Engineering Drawings Lecture Detail Drawings 2014.pdfEngineering Drawings Lecture Detail Drawings 2014.pdf
Engineering Drawings Lecture Detail Drawings 2014.pdf
abbyasa1014
 
132/33KV substation case study Presentation
132/33KV substation case study Presentation132/33KV substation case study Presentation
132/33KV substation case study Presentation
kandramariana6
 
Optimizing Gradle Builds - Gradle DPE Tour Berlin 2024
Optimizing Gradle Builds - Gradle DPE Tour Berlin 2024Optimizing Gradle Builds - Gradle DPE Tour Berlin 2024
Optimizing Gradle Builds - Gradle DPE Tour Berlin 2024
Sinan KOZAK
 
Harnessing WebAssembly for Real-time Stateless Streaming Pipelines
Harnessing WebAssembly for Real-time Stateless Streaming PipelinesHarnessing WebAssembly for Real-time Stateless Streaming Pipelines
Harnessing WebAssembly for Real-time Stateless Streaming Pipelines
Christina Lin
 
22CYT12-Unit-V-E Waste and its Management.ppt
22CYT12-Unit-V-E Waste and its Management.ppt22CYT12-Unit-V-E Waste and its Management.ppt
22CYT12-Unit-V-E Waste and its Management.ppt
KrishnaveniKrishnara1
 

Recently uploaded (20)

ISPM 15 Heat Treated Wood Stamps and why your shipping must have one
ISPM 15 Heat Treated Wood Stamps and why your shipping must have oneISPM 15 Heat Treated Wood Stamps and why your shipping must have one
ISPM 15 Heat Treated Wood Stamps and why your shipping must have one
 
2008 BUILDING CONSTRUCTION Illustrated - Ching Chapter 02 The Building.pdf
2008 BUILDING CONSTRUCTION Illustrated - Ching Chapter 02 The Building.pdf2008 BUILDING CONSTRUCTION Illustrated - Ching Chapter 02 The Building.pdf
2008 BUILDING CONSTRUCTION Illustrated - Ching Chapter 02 The Building.pdf
 
Iron and Steel Technology Roadmap - Towards more sustainable steelmaking.pdf
Iron and Steel Technology Roadmap - Towards more sustainable steelmaking.pdfIron and Steel Technology Roadmap - Towards more sustainable steelmaking.pdf
Iron and Steel Technology Roadmap - Towards more sustainable steelmaking.pdf
 
Computational Engineering IITH Presentation
Computational Engineering IITH PresentationComputational Engineering IITH Presentation
Computational Engineering IITH Presentation
 
KuberTENes Birthday Bash Guadalajara - K8sGPT first impressions
KuberTENes Birthday Bash Guadalajara - K8sGPT first impressionsKuberTENes Birthday Bash Guadalajara - K8sGPT first impressions
KuberTENes Birthday Bash Guadalajara - K8sGPT first impressions
 
官方认证美国密歇根州立大学毕业证学位证书原版一模一样
官方认证美国密歇根州立大学毕业证学位证书原版一模一样官方认证美国密歇根州立大学毕业证学位证书原版一模一样
官方认证美国密歇根州立大学毕业证学位证书原版一模一样
 
Electric vehicle and photovoltaic advanced roles in enhancing the financial p...
Electric vehicle and photovoltaic advanced roles in enhancing the financial p...Electric vehicle and photovoltaic advanced roles in enhancing the financial p...
Electric vehicle and photovoltaic advanced roles in enhancing the financial p...
 
Understanding Inductive Bias in Machine Learning
Understanding Inductive Bias in Machine LearningUnderstanding Inductive Bias in Machine Learning
Understanding Inductive Bias in Machine Learning
 
A SYSTEMATIC RISK ASSESSMENT APPROACH FOR SECURING THE SMART IRRIGATION SYSTEMS
A SYSTEMATIC RISK ASSESSMENT APPROACH FOR SECURING THE SMART IRRIGATION SYSTEMSA SYSTEMATIC RISK ASSESSMENT APPROACH FOR SECURING THE SMART IRRIGATION SYSTEMS
A SYSTEMATIC RISK ASSESSMENT APPROACH FOR SECURING THE SMART IRRIGATION SYSTEMS
 
The Python for beginners. This is an advance computer language.
The Python for beginners. This is an advance computer language.The Python for beginners. This is an advance computer language.
The Python for beginners. This is an advance computer language.
 
DEEP LEARNING FOR SMART GRID INTRUSION DETECTION: A HYBRID CNN-LSTM-BASED MODEL
DEEP LEARNING FOR SMART GRID INTRUSION DETECTION: A HYBRID CNN-LSTM-BASED MODELDEEP LEARNING FOR SMART GRID INTRUSION DETECTION: A HYBRID CNN-LSTM-BASED MODEL
DEEP LEARNING FOR SMART GRID INTRUSION DETECTION: A HYBRID CNN-LSTM-BASED MODEL
 
Comparative analysis between traditional aquaponics and reconstructed aquapon...
Comparative analysis between traditional aquaponics and reconstructed aquapon...Comparative analysis between traditional aquaponics and reconstructed aquapon...
Comparative analysis between traditional aquaponics and reconstructed aquapon...
 
Redefining brain tumor segmentation: a cutting-edge convolutional neural netw...
Redefining brain tumor segmentation: a cutting-edge convolutional neural netw...Redefining brain tumor segmentation: a cutting-edge convolutional neural netw...
Redefining brain tumor segmentation: a cutting-edge convolutional neural netw...
 
哪里办理(csu毕业证书)查尔斯特大学毕业证硕士学历原版一模一样
哪里办理(csu毕业证书)查尔斯特大学毕业证硕士学历原版一模一样哪里办理(csu毕业证书)查尔斯特大学毕业证硕士学历原版一模一样
哪里办理(csu毕业证书)查尔斯特大学毕业证硕士学历原版一模一样
 
Eric Nizeyimana's document 2006 from gicumbi to ttc nyamata handball play
Eric Nizeyimana's document 2006 from gicumbi to ttc nyamata handball playEric Nizeyimana's document 2006 from gicumbi to ttc nyamata handball play
Eric Nizeyimana's document 2006 from gicumbi to ttc nyamata handball play
 
Engineering Drawings Lecture Detail Drawings 2014.pdf
Engineering Drawings Lecture Detail Drawings 2014.pdfEngineering Drawings Lecture Detail Drawings 2014.pdf
Engineering Drawings Lecture Detail Drawings 2014.pdf
 
132/33KV substation case study Presentation
132/33KV substation case study Presentation132/33KV substation case study Presentation
132/33KV substation case study Presentation
 
Optimizing Gradle Builds - Gradle DPE Tour Berlin 2024
Optimizing Gradle Builds - Gradle DPE Tour Berlin 2024Optimizing Gradle Builds - Gradle DPE Tour Berlin 2024
Optimizing Gradle Builds - Gradle DPE Tour Berlin 2024
 
Harnessing WebAssembly for Real-time Stateless Streaming Pipelines
Harnessing WebAssembly for Real-time Stateless Streaming PipelinesHarnessing WebAssembly for Real-time Stateless Streaming Pipelines
Harnessing WebAssembly for Real-time Stateless Streaming Pipelines
 
22CYT12-Unit-V-E Waste and its Management.ppt
22CYT12-Unit-V-E Waste and its Management.ppt22CYT12-Unit-V-E Waste and its Management.ppt
22CYT12-Unit-V-E Waste and its Management.ppt
 

Loss less DNA Solidity Using Huffman and Arithmetic Coding

  • 1. Lakshmi Mythri Dasari et al Int. Journal of Engineering Research and Applications www.ijera.com ISSN : 2248-9622, Vol. 4, Issue 7( Version 4), July 2014, pp.29-31 www.ijera.com 29 | P a g e Loss less DNA Solidity Using Huffman and Arithmetic Coding Lakshmi Mythri Dasari#1, Ch Srinivasu#2, Pathipati Srihari#3 #1, #2 Department of Electronics and Communication #1,#2Dadi Institute of Engineering and Technology, #3National Institute of Technology Karnataka Abstract DNA Sequences making up any bacterium comprise the blue print of that bacterium so that understanding and analyzing different genes with in sequences has become an exceptionally significant mission. Naturalists are manufacturing huge volumes of DNA Sequences every day that makes genome sequence catalogue emergent exponentially. The data bases such as Gen-bank represents millions of DNA Sequences filling many thousands of gigabytes workstation storing capability. Solidity of Genomic sequences can decrease the storage requirements, and increase the broadcast speed. In this paper we compare two lossless solidity algorithms (Huffman and Arithmetic coding). In Huffman coding, individual bases are coded and assigned a specific binary number. But for Arithmetic coding entire DNA is coded in to a single fraction number and binary word is coded to it. Solidity ratio is compared for both the methods and finally we conclude that arithmetic coding is the best. Key words: DNA, Solidity, Huffman coding, Arithmetic Coding, Solidity Ratio. I. Introduction The solidity of DNA sequences is one of the most challenging tasks in the field of data solidity. Since DNA sequences are the code of life. We expect them to be non-random and to present some regularity. It is natural to try taking advantage of such regularities in order compactly store the huge databases which are routinely handled by molecular biologists. The amount of DNA being extracted from organisms and sequenced is increasing exponentially. This yields two problems a) storage b) comprehension. Despite the prevalence of broad band network connection, there still exists a need for compact representation of data to speed up transmission. DNA is composed only from four chemicals bases: Adenine (A), Thymine (T), Guanine (G), and Cytosine (C). Human DNA consists of about 3 billion bases and more than 99% of those bases are the same in all people, the order of this base determines the information available for building an organism. The solidity of this huge amount of produced DNA sequences is a very important and challenging task. II. Proposed Work In this paper we compress the DNA sequence by using Huffman and Arithmetic Coding. a) Huffman Coding: The Huffman procedure is based on two observations concerning optimum prefix codes. 1. In an optimum code, symbols that occur more frequently will have shorter code words than symbols that occur less frequently. 2. In an optimum code, the two symbols that occur least frequently will have same length. The method stars by building a list of all alphabet symbols in descending order of their probabilities. It then constructs a tree. The tree is built by going through the following steps. 1. Combine the two lowest frequencies (probabilities) and continue this procedure. 2. Assign “0” to higher frequency (prob.) and “1” to lower frequency (prob.) of each pair, or vice versa. 3. Trace the path for each character frequency from lower to upper point. Recording the ones and zeros along the path. 4. Assign each character (message) codes sequentially from right to left. This is best illustrated by an example. Consider DNA sequences AATAAAATAAAACAAAATTAA AAGC whose length is 25. . Table I. Huffman coding Table BASE FREQUENCY PROBABILITY CODE WORD A 18 0.72 0 T 4 0.16 10 G 1 0.04 110 C 2 0.08 111 The number of bits required to represent the sequence is 35 bits. RESEARCH ARTICLE OPEN ACCESS
  • 2. Lakshmi Mythri Dasari et al Int. Journal of Engineering Research and Applications www.ijera.com ISSN : 2248-9622, Vol. 4, Issue 7( Version 4), July 2014, pp.29-31 www.ijera.com 30 | P a g e b) Arithmetic Coding: In Huffman code, every symbol is assigned a code word. This problem is overcome in arithmetic coding by assigning one code to the entire input stream, instead of assigning codes to the individual symbols. The method reads the input stream symbol by symbol and appends more bits to the code each time a symbol is input and processed. The algorithm for the arithmetic coding is given below 1. Define an interval [0,1) which is closed at 0 and open at 1. 2. Input the symbol “S” from the input stream. 3. Divide the interval into sub intervals proportional to symbol probability by using the below formulas NewHigh  OldLow (OldHighOldLow)*HighRange(x) NewLowOldLow(OldHighOldLow)*LowRange(x) 4. Next subinterval is the main interval. 5. When all the symbols are coded, the final output should be any fractional number that uniquely identifies the interval which is known as tag. This represents the whole symbol sequence. The tag is then represented in binary digits. Minimum number of bits required to represent the tag is given by                   1 1 log 2 P S N Where PS is the probability of all symbols. Consider the subsequence AATAAAATAAAACAAAATTAAAAGC Table II. Arithmetic Coding Table BAS E FREQUEN CY PROBABILI TY RANGE A 18 0.72 [0.28,1) T 4 0.16 [0.12,0.2 8) G 1 0.04 [0.08,0.1 2) C 2 0.08 [0,0.08) The coding values of the sequence are as follows: A => [0, 0.08) A=> [0.0224, 0.08) T => [0.029312, 0.038528) A=> [0.03189248, 0.038528) A=> [0.033750425, 0.038528) A=> [0.035088146, 0.038528) A=> [0.036051305, 0.038528) T=> [0.036348508, 0.036744779) A=> [0.036459464, 0.036744779) A=> [0.036539352, 0.036744779) A=> [0.036516872, 0.036744779) A=> [0.036638286, 0.036744779) C=> [0.036638286, 0.036646806) A=> [0.036640672, 0.036646806) A=> [0.036642389, 0.036646806) A=> [0.036643626, 0.036646806) A=> [0.036644516, 0.036646806) T=> [0.036644791, 0.036645157) T=> [0.036644835, 0.036644893) A=> [0.036644851, 0.036644893) A=> [0.036644863, 0.036644893) A=> [0.036644871, 0.036644893) A=> [0.036644878, 0.036644893) G=> [0.036644879, 0.036644879) C=> [0.036644879, 0.036644879) 0.036644879 2 0.036644879 0.036644879   Tag  Minimum number of bits required is 32. Binary representation of 0.036644879 is 00000100101100001100011110000110. From the above example, we can observe that the number of bits encoded for arithmetic coding is less. III. Solidity Performance Several Quantities are commonly used to express the performance of a solidity method. 1. Solidity Ratio: Sizeoftheinputstream Sizeoftheoutputstream CR  A value of 0.6 means that the data occupies 60% of its original size after solidity. Values greater than 1 mean an output stream bigger than the input stream (negative solidity). The solidity ratio can also be called bpb (bit per bit), since it equals the number of bits in the compressed stream needed, on average, to compress on bit in the input stream. IV. Outputs Different DNA samples are taken solidity is done by two methods and are compared Table III. Comparison Results SL No Access ion Numb er Origi nal Size (bits) After Solidity Solidity Ratio Huff man Arit hme tic Huff man Arit hme tic 1 AF348 515.1 17160 4198 4108 0.75 536 0.76 0606 2 AF007 546 17024 4256 4202 0.75 0.75 3055 3 AF008 216.1 5640 1410 1380 0.75 0.75 5319 4 AF186 607.1 10408 2602 2591 0.75 0.75 1057
  • 3. Lakshmi Mythri Dasari et al Int. Journal of Engineering Research and Applications www.ijera.com ISSN : 2248-9622, Vol. 4, Issue 7( Version 4), July 2014, pp.29-31 www.ijera.com 31 | P a g e Huffman and Arithmetic coding are applied on DNA sequences. The results shows that solidity is best achieved by Arithmetic coding rather than Huffman coding. Which are developed by MATLAB 8.1.0.604 (R2013a) on a core i5 processor with a 4GB of RAM. V. Conclusion The need for effective DNA solidity is evident in biological applications where storage and transmission of DNA are involved. Algorithm using the concept of Huffman and Arithmetic coding is proposed to compress DNA sequences. The Solidity ratio of Huffman coding is less, as the algorithm depends on the occurrence one of the elements with a high probability and the result is a short length of bits and this does not occur in DNA sequence. This is the major limitations of using Huffman code to compress DAN sequence. Good Solidity ration can be achieved by using adaptive methods, Dictionary learning method, wavelet methods, soft evolutionary computing methods etc. References [1] G. Manzini and M. Rastero “A Simple and Fast DNA Solidity” February 17, 2004. [2] H. Afify, M.Islam and M. Abdel wahedL. “DNA LossLess Differential Solidity Algorithm Based on similarity of genomic sequence data base. (IJCSIT) Vol 3, No4, August 2011. [3] M. Burrows and D.J. Wheeler, "A Block Sorting Lossless Data Solidity Algorithms", Technical report124, Digital System Research center, 1994. [4] P. G. Howard and J. S. Vitter, “Analysis of arithmetic coding for data solidity,” Informat. Process. Manag., vol. 28, no. 6, pp. 749-763, 1992. [5] I. H. Witten, R. M. Neal, and J. G. Cleary, “Arithmetic coding for data solidity,” Commun. ACM, vol. 30, no. 6, pp. 520-540, June 1987.