International Journal on Computational Science & Applications (IJCSA) Vol.5, No.4, August 2015
DOI:10.5121/ijcsa.2015.5402
K-Mer Index Of DNA Sequence Based On Hash Algorithm
Jinlin Liu¹, Qiang Chen² and Chen Zhang³
¹College of Electronic and Electrical Engineering, Shanghai University of Engineering Science, Shanghai 201620, China.
²College of Electronic and Electrical Engineering, Shanghai University of Engineering Science, Shanghai 201620, China.
³School of Management, Shanghai University of Engineering Science, Shanghai 201620, China.
ABSTRACT
Computing k-mer frequency statistics of biological sequences is an important problem in biological information processing. This paper addresses the problem of building a k-mer index for large-scale DNA sequencing reads within limited memory space and time. The index is built with a hash algorithm that encodes each base numerically, so that statistics for k-mers of a given length can be obtained quickly without searching through all of the indexed sequences. The program also uses a hash table to build the index and the search model, and resolves address conflicts with the chaining ("zipper") method. Time complexity analysis and experimental results show that, compared with traditional indexing methods, the algorithm clearly improves performance and is well suited to cases where the k-mer length varies over a wide range.
KEYWORDS
K-mer index; hash algorithm; DNA detection; index model
1. INTRODUCTION
With the rapid development of DNA sequencing technology in recent years, massive amounts of biological sequence data have been generated, and these data must be analyzed and processed by effective computational means. Among the many biological sequence analysis and processing problems, k-mer analysis is fundamental: a k-mer of a biological sequence is a short subsequence of k consecutive bases. When k is chosen appropriately, the k-mer frequency distribution of a sequence carries essentially the same information as the genome sequence itself. By analyzing the k-mer distribution of DNA sequences and the information carried by different k-mers, we can therefore learn about the base distribution characteristics, functions, structures and evolution of biological sequences.
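To make the notion of a k-mer frequency distribution concrete, here is a minimal Python sketch (not the authors' program). It slides a window of length k along one sequence and reports the count of every one of the 4^k possible k-mers, including those that never occur. The function name `kmer_spectrum` is hypothetical, and the input is assumed to contain only the bases A, C, G and T.

```python
from collections import Counter
from itertools import product
from typing import Dict


def kmer_spectrum(seq: str, k: int) -> Dict[str, int]:
    """Count every length-k window of seq and report all 4**k possible k-mers,
    including those that never occur (count 0), in lexicographic ACGT order."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    return {kmer: counts[kmer] for kmer in map("".join, product("ACGT", repeat=k))}


spectrum = kmer_spectrum("ACGTAC", 2)
print(spectrum["AC"], spectrum["CG"], spectrum["AA"])  # 2 1 0
```

Reporting zero counts gives every sequence a spectrum of the same fixed length 4^k, which is what makes distributions of different sequences directly comparable.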
2. QUESTIONS
This paper aims to solve the problem of k-mer indexing of DNA sequences. For a given K, an index is established over 1 million DNA sequences: the computer reads every K-length substring of each sequence from start to end, then moves on to the next sequence and reads it in the same way, until the positions at which each k-mer appears in the sequences have all been recorded. Because the DNA sequencing fragments form a large data set, we must handle it under limited memory and disk space, and optimize both the space complexity and the computational complexity as far as possible. These are the problems to be solved.
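The reading order described above (every K-length window of one sequence, then the next sequence) can be sketched in Python as a generator. This is an illustrative sketch, not the paper's implementation; the function name `iter_kmers` is hypothetical, and the 1-based row and column labels are chosen to match the A[i][j] notation used in Section 3.

```python
from typing import Iterable, Iterator, Tuple


def iter_kmers(sequences: Iterable[str], k: int) -> Iterator[Tuple[int, int, str]]:
    """Yield (row, column, k-mer) for every length-k window.

    Each sequence is read from its first to its last window before moving on to
    the next sequence. Row and column labels are 1-based, as in A[i][j]."""
    for row, seq in enumerate(sequences, start=1):
        for col in range(1, len(seq) - k + 2):
            yield row, col, seq[col - 1:col - 1 + k]


for row, col, kmer in iter_kmers(["ACGT", "TTGA"], 3):
    print(row, col, kmer)
```

Because it is a generator, `sequences` can be any iterable, for example a file object read line by line, so only one sequence needs to be held in memory at a time, which is in the spirit of the limited-memory constraint.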
Q1.
Build the index for the given K and then search every sequence. A hash algorithm encodes the bases of each sequence; the query fragment of K bases is converted into a decimal number and matched against the 1 million sequences. Finally, the program outputs the row and column of each matching base fragment.
Q2.
After the index is established, a hash table is built in memory; on each traversal, the frequency and the positions of each k-mer are stored in the hash table. In this way, 1 million DNA sequences can be traversed within limited memory space.
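A minimal Python sketch of what the in-memory table for Q2 has to hold: for each k-mer, the list of (row, column) positions where it occurs, from which its frequency follows as the list length. For brevity this uses Python's built-in dictionary rather than the hand-built chained hash table of Section 4.1; the names `build_index` and `Position` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Position = Tuple[int, int]  # (row label, column label), both 1-based


def build_index(sequences: Iterable[str], k: int) -> Dict[str, List[Position]]:
    """Map each k-mer to the list of positions where it occurs.
    The frequency of a k-mer is simply the length of its position list."""
    index: Dict[str, List[Position]] = defaultdict(list)
    for row, seq in enumerate(sequences, start=1):
        for col in range(1, len(seq) - k + 2):
            index[seq[col - 1:col - 1 + k]].append((row, col))
    return dict(index)


idx = build_index(["ACGAC", "GACGA"], 2)
print(idx["AC"], len(idx["GA"]))  # [(1, 1), (1, 4), (2, 2)] 3
```

Storing positions rather than bare counts serves both questions at once: Q1 needs the positions to report rows and columns, and Q2's frequency is recovered from them without a separate counter.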
3. PROBLEM ANALYSIS
3.1. Problem abstraction
First, consider the 1 million gene sequences. Since each gene sequence has length 100, the data are equivalent to a two-dimensional matrix A whose rows are numbered 1 to 1,000,000 and whose columns are numbered 1 to 100. The problem is therefore abstracted as an analysis of the matrix A[i][j], i = 1, 2, ..., 1000000; j = 1, 2, ..., 100.
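For a sense of scale, the number of k-mer windows in A follows directly from the matrix dimensions, since a row of length 100 contains 100 - k + 1 windows of length k (k = 4 is used only as an example):

```latex
\text{windows per row} = 100 - k + 1, \qquad
\text{total windows} = 10^{6} \times (100 - k + 1)
\overset{k=4}{=} 10^{6} \times 97 = 9.7 \times 10^{7}.
```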
3.2. Method solution
The sequences consist of four bases: A, C, G and T. Using the hash algorithm, each base is mapped to a base-4 digit (two binary bits): A=0, C=1, G=2, T=3. A k-mer is thereby turned into a base-4 number, which is converted into a decimal number for matching queries. The hash value is computed as
Hash = value(s_0)×4^(k-1) + value(s_1)×4^(k-2) + ... + value(s_(k-1))×4^0,
i.e. each character contributes value×4^(k-m-1), where value is the numeric code of the character, k is the length of the k-mer, and m ∈ [0, k-1] is the position of the character in the k-mer. For example, with k=4 the k-mer ATCG is converted to
ATCG = 0×4^3 + 3×4^2 + 1×4^1 + 2×4^0 = 54.
Each row of length 100 is thus converted into 100-k+1 decimal numbers. Applying the same procedure to all 1 million rows yields the corresponding decimal numbers, which are stored in the two-dimensional array A[i][j]. When a decimal number is matched, the program converts it back into the base-4 form of length k (such as ATCG in the example) and prints the row and column labels of the base fragment.
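The encoding and its inverse can be written directly from the formula above. The following Python sketch is illustrative; the names `encode`, `decode` and `encode_row` are hypothetical. `encode_row` computes the 100-k+1 values of a row with a rolling update (shift in the new base, drop the oldest one with a modulo), which gives the same values as applying the formula to each window separately but is our own optimization, not something the paper describes.

```python
from typing import List

CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
BASES = "ACGT"  # BASES[CODE[b]] == b


def encode(kmer: str) -> int:
    """Hash = sum over m = 0..k-1 of value(kmer[m]) * 4**(k - m - 1)."""
    k = len(kmer)
    return sum(CODE[base] * 4 ** (k - m - 1) for m, base in enumerate(kmer))


def decode(value: int, k: int) -> str:
    """Turn a decimal hash back into its length-k base-4 form, e.g. 54 -> ATCG."""
    bases = []
    for _ in range(k):
        value, digit = divmod(value, 4)
        bases.append(BASES[digit])
    return "".join(reversed(bases))


def encode_row(seq: str, k: int) -> List[int]:
    """All len(seq) - k + 1 hash values of one row, using a rolling update."""
    values, h = [], 0
    for i, base in enumerate(seq):
        # Shift in the new base; the modulo drops the base that left the window.
        h = (h * 4 + CODE[base]) % (4 ** k)
        if i >= k - 1:
            values.append(h)
    return values


print(encode("ATCG"), decode(54, 4), encode_row("ATCGA", 4))  # 54 ATCG [54, 216]
```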
After the index is established, the division method is used to build a hash table in memory and to determine the hash-table addresses. Every time a k-mer occurs, its row and column labels are stored in the hash table. This improves the efficiency of querying the 1 million DNA sequences within limited memory space.
4. MODEL ESTABLISHMENT AND SOLUTION
A hash algorithm maps a binary value of arbitrary length to a shorter binary value of fixed length; this short binary value is called the hash value.
In this paper, following the principle of the hash algorithm, the four bases A, C, G and T are identified as 0, 1, 2 and 3 respectively. Each k-mer thus becomes a base-4 (quaternary) number, which is then converted into a decimal number. The decimal number of the query bases is matched against the 100-k+1 decimal numbers of each line, starting from the first line; if the base sequences match, the program outputs the row and column labels.
The flow chart is shown below:
4.1. Model two: search model based on hash table
The main requirement of this model is to design a hash function and to build a hash table keyed by k-mer.
There are many methods of constructing a hash function, such as the digit analysis method, the direct addressing method and the random number method; the random number method is usually used when keyword lengths vary. This paper selects the division method. The hash value of each nucleotide sequence is divided by 1000, and the quotient is taken as the hash-table address. All values with the same quotient are put into the same bucket, so within a bucket the remainders differ while the quotient is the same. These address conflicts must therefore be resolved.
The chaining ("zipper") method resolves the conflicts: all nodes whose keywords are synonyms (i.e. share the same address) are linked into the same singly linked list. If the hash table length is m, the table can be defined as an array of m pointers, T[0..m-1]. Every node whose hash address is i is inserted into the singly linked list pointed to by T[i]. The initial value of each component of T is a null pointer. In the chaining method the load factor α can be greater than 1, but α is generally kept below 1.
Hash search: first, with the k-mer as the keyword, the program uses the hash function to calculate the address. It then compares the base arrangement stored in each node of that chain with the searched sequence; if they are the same, all the information stored in the node is output; otherwise the search continues along the chain.
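The chaining scheme and the hash search can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' code: the class and method names are hypothetical, and the address is `encode(kmer) // 1000` (the quotient), following our reading of the bucket description above. The textbook division method instead uses the remainder `key % m`; swapping that into `address` only requires sizing the table as m.

```python
from typing import List, Optional, Tuple

CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def encode(kmer: str) -> int:
    """Base-4 value of a k-mer with A=0, C=1, G=2, T=3 (Section 3.2)."""
    value = 0
    for base in kmer:
        value = value * 4 + CODE[base]
    return value


class Node:
    """A chain node: one distinct k-mer, its positions, and the next node."""

    def __init__(self, kmer: str, next_node: Optional["Node"]) -> None:
        self.kmer = kmer
        self.positions: List[Tuple[int, int]] = []
        self.next = next_node


class ChainedHashTable:
    """T[0..m-1]: an array of singly linked chains, all initially None."""

    def __init__(self, k: int, divisor: int = 1000) -> None:
        self.divisor = divisor
        # Quotient addressing: encode(kmer) // divisor ranges over 0..(4**k - 1) // divisor.
        self.table: List[Optional[Node]] = [None] * ((4 ** k - 1) // divisor + 1)
        self.size = 0  # number of distinct k-mers stored

    def address(self, kmer: str) -> int:
        return encode(kmer) // self.divisor

    def insert(self, kmer: str, row: int, col: int) -> None:
        """Record that kmer occurs at (row, col); add a node if kmer is new."""
        i = self.address(kmer)
        node = self.table[i]
        while node is not None and node.kmer != kmer:
            node = node.next
        if node is None:
            node = Node(kmer, self.table[i])  # prepend to chain T[i]
            self.table[i] = node
            self.size += 1
        node.positions.append((row, col))

    def search(self, kmer: str) -> List[Tuple[int, int]]:
        """Walk chain T[address(kmer)]; return positions, or [] if absent."""
        node = self.table[self.address(kmer)]
        while node is not None:
            if node.kmer == kmer:
                return node.positions
            node = node.next
        return []

    def load_factor(self) -> float:
        """alpha = number of stored keys / table length m."""
        return self.size / len(self.table)


k = 3
table = ChainedHashTable(k)
for row, seq in enumerate(["ACGTAC", "TACGTT"], start=1):
    for col in range(1, len(seq) - k + 2):
        table.insert(seq[col - 1:col - 1 + k], row, col)
print(table.search("ACG"), table.search("AAA"))  # [(1, 1), (2, 2)] []
```

With quotient addressing and a divisor of 1000, every k-mer with k ≤ 4 lands in a single bucket, so the example's load factor is 5.0; this illustrates why the chaining method must tolerate α > 1, and why the choice of divisor matters for keeping chains short.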
4.2. Model three: analysis of the memory space occupied by the hash table
Data definition analysis: the int keyword denotes a 32-bit integer whose range is -2147483648 to +2147483647 (inclusive); each int occupies 4 B. The char type is 8 bits wide and occupies 1 B; as an unsigned char its values range from 0 to 255, which is enough for the column labels 1 to 100.
Overall data analysis:
Row: 1,000,000 rows, stored in an int variable (4 B)
Column: 100 columns, stored in a char variable (1 B)
In theory, each of the 4^k possible k-mers needs one row label (int, 4 B) and one column label (char, 1 B), so the index takes up 5 × 4^k B, which converts to a memory occupancy of 5 × 4^k / (1024 × 1024 × 1024) GB.
The memory space of the index for different values of k is shown in Table 4.1.
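The memory formula can be checked numerically. This short Python sketch (the function name `index_memory_gb` is hypothetical) evaluates 5 × 4^k / 1024^3 for k = 1 to 16 and prints the values rounded to eight decimal places, the precision used in Table 4.1.

```python
BYTES_PER_ENTRY = 4 + 1  # int row label (4 B) + char column label (1 B)


def index_memory_gb(k: int) -> float:
    """Theoretical index size 5 * 4**k bytes, expressed in GB (1 GB = 1024**3 B)."""
    return BYTES_PER_ENTRY * 4 ** k / 1024 ** 3


if __name__ == "__main__":
    for k in range(1, 17):
        print(f"{k:2d}  {index_memory_gb(k):.8f}")
```

Since 4^k = 2^(2k), the size quadruples with each increment of k; from k = 15 onward (5 GB and above) the index may no longer fit comfortably in the memory of an ordinary PC.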
Table 4.1 The memory space of the index
k     Memory space (GB)
1     0.00000002
2     0.00000007
3     0.00000030
4     0.00000119
5     0.00000477
6     0.00001907
7     0.00007629
8     0.00030518
9     0.00122070
10    0.00488281
11    0.01953125
12    0.07812500
13    0.31250000
14    1.25000000
15    5.00000000
16    20.00000000
5. RUN RESULTS
5.1. The interface
Figure 5.1 The interface
5.2. Search interface
Figure 5.2 The search interface
5.3. Generated result file
The generated K_mer.txt file is shown in Figure 5.3.
Figure 5.3 The text file
5.4. Results output interface
Figure 5.4 The output interface
5.5. The complexity of the algorithm
(1) Complexity of establishing the index
Time complexity: O(1) + O(m), where m is the length (depth) of the chain at a conflicting address.
Space complexity: O( )
(2) Complexity of using the index
Time complexity: O(1)
Space complexity: O(1)
6. CONCLUSIONS
To solve the problem of k-mer indexing of DNA, three models are proposed: the hash algorithm index model, the hash table query model, and the memory analysis model of the hash table. The design uses the visual2010 software to run the traversal. The results show that the memory occupancy is small, the traversal efficiency is high and the results are accurate, providing a good solution to the problem of k-mer indexing of DNA.
REFERENCES
[1] Singh, M.; Garg, D., "Choosing Best Hashing Strategies and Hash Functions," Advance Computing
Conference, 2009. IACC 2009. IEEE International , vol., no., pp.50,55, 6-7 March 2009
[2] Rizk G, Lavenier D, Chikhi R. DSK: k-mer counting with very low memory
usage[J].Bioinformatics, 2013, 29(5): 652-653
[3] Deorowicz S, Debudaj-Grabysz A, Grabowski S. Disk-based k-mer counting on a PC[J].BMC
bioinfonnatics, 2013, 14(1): 160.
[4] Roy K S, Bhattacharya D, Schliep A. Turtle: Identifying frequent k-mers with cache-efficient
algorithms[J]. arXiv preprint arXiv:1305.1861,2013.
[5] Chor B, Horn D, Goldman N, et al. Genomic DNA k-mer spectra: models and modalities[J].Genome
Biol, 2009, 10(10): 8108.
[6] Hao B, Lee H C, Zhang S. Fractals related to long DNA sequences and complete
genomes[J].Chaos,Solitions&Fractals,2000,11(6):825-836.
[7] Yang Xu; Lei Ma; Zhaobo Liu; Chao, H.J., "A Multi-dimensional Progressive Perfect Hashing for
High-Speed String Matching," Architectures for Networking and Communications Systems (ANCS),
2011 Seventh ACM/IEEE Symposium on , vol., no., pp.167,177, 3-4 Oct. 2011
[8] Yasuda, K.; Miura, T.; Shioya, I., "Distributed Processes on Tree Hash," Computer Software and
Applications Conference, 2006. COMPSAC '06. 30th Annual International , vol.2, no., pp.10,13, 17-
21 Sept. 2006
[9] Bradford, P.G.; Gavrylyako, O.V., "Hash chains with diminishing ranges for sensors," Parallel
Processing Workshops, 2004. ICPP 2004 Workshops. Proceedings. 2004 International Conference
on , vol., no., pp.77,83, 18-18 Aug. 2004
[10] Jian-Wei Fan; Chao-Wen Chan; Ya-Fen Chang, "A random increasing sequence hash chain and
smart card-based remote user authentication scheme," Information, Communications and Signal
Processing (ICICS) 2013 9th International Conference on , vol., no., pp.1,5, 10-13 Dec. 2013
Authors
Jinlin Liu is currently studying Mechanical and Electronic Engineering at Shanghai University of Engineering Science, China, where he is working towards a Master's degree. His current research interests include FPGAs and embedded system design and development.

 
Advance algorithm hashing lec II
Advance algorithm hashing lec IIAdvance algorithm hashing lec II
Advance algorithm hashing lec IISajid Marwat
 
Digital Watermarking through Embedding of Encrypted and Arithmetically Compre...
Digital Watermarking through Embedding of Encrypted and Arithmetically Compre...Digital Watermarking through Embedding of Encrypted and Arithmetically Compre...
Digital Watermarking through Embedding of Encrypted and Arithmetically Compre...IJNSA Journal
 
Indexing for Large DNA Database sequences
Indexing for Large DNA Database sequencesIndexing for Large DNA Database sequences
Indexing for Large DNA Database sequencesCSCJournals
 

Similar to K mer index of dna sequence based on hash (20)

Text encryption
Text encryptionText encryption
Text encryption
 
Symmetric Key Generation Algorithm in Linear Block Cipher Over LU Decompositi...
Symmetric Key Generation Algorithm in Linear Block Cipher Over LU Decompositi...Symmetric Key Generation Algorithm in Linear Block Cipher Over LU Decompositi...
Symmetric Key Generation Algorithm in Linear Block Cipher Over LU Decompositi...
 
Computational intelligence based simulated annealing guided key generation in...
Computational intelligence based simulated annealing guided key generation in...Computational intelligence based simulated annealing guided key generation in...
Computational intelligence based simulated annealing guided key generation in...
 
A new dna based approach of generating keydependentmixcolumns
A new dna based approach of generating keydependentmixcolumnsA new dna based approach of generating keydependentmixcolumns
A new dna based approach of generating keydependentmixcolumns
 
A design of parity check matrix for short irregular ldpc codes via magic
A design of parity check matrix for short irregular ldpc codes via magicA design of parity check matrix for short irregular ldpc codes via magic
A design of parity check matrix for short irregular ldpc codes via magic
 
Design of ternary sequence using msaa
Design of ternary sequence using msaaDesign of ternary sequence using msaa
Design of ternary sequence using msaa
 
Design and Analysis of an Improved Nucleotide Sequences Compression Algorithm...
Design and Analysis of an Improved Nucleotide Sequences Compression Algorithm...Design and Analysis of an Improved Nucleotide Sequences Compression Algorithm...
Design and Analysis of an Improved Nucleotide Sequences Compression Algorithm...
 
Truncated boolean matrices for dna
Truncated boolean matrices for dnaTruncated boolean matrices for dna
Truncated boolean matrices for dna
 
C6 agramakrishnan1
C6 agramakrishnan1C6 agramakrishnan1
C6 agramakrishnan1
 
Loss less DNA Solidity Using Huffman and Arithmetic Coding
Loss less DNA Solidity Using Huffman and Arithmetic CodingLoss less DNA Solidity Using Huffman and Arithmetic Coding
Loss less DNA Solidity Using Huffman and Arithmetic Coding
 
Welcome to International Journal of Engineering Research and Development (IJERD)
Welcome to International Journal of Engineering Research and Development (IJERD)Welcome to International Journal of Engineering Research and Development (IJERD)
Welcome to International Journal of Engineering Research and Development (IJERD)
 
Combining text and pattern preprocessing in an adaptive dna pattern matcher
Combining text and pattern preprocessing in an adaptive dna pattern matcherCombining text and pattern preprocessing in an adaptive dna pattern matcher
Combining text and pattern preprocessing in an adaptive dna pattern matcher
 
Analytical Study of AES and Proposed Variant with Enhance Block Length and Ke...
Analytical Study of AES and Proposed Variant with Enhance Block Length and Ke...Analytical Study of AES and Proposed Variant with Enhance Block Length and Ke...
Analytical Study of AES and Proposed Variant with Enhance Block Length and Ke...
 
A Novel Design For Generating Dynamic Length Message Digest To Ensure Integri...
A Novel Design For Generating Dynamic Length Message Digest To Ensure Integri...A Novel Design For Generating Dynamic Length Message Digest To Ensure Integri...
A Novel Design For Generating Dynamic Length Message Digest To Ensure Integri...
 
11
1111
11
 
A Cryptographic Hardware Revolution in Communication Systems using Verilog HDL
A Cryptographic Hardware Revolution in Communication Systems using Verilog HDLA Cryptographic Hardware Revolution in Communication Systems using Verilog HDL
A Cryptographic Hardware Revolution in Communication Systems using Verilog HDL
 
I1803014852
I1803014852I1803014852
I1803014852
 
Advance algorithm hashing lec II
Advance algorithm hashing lec IIAdvance algorithm hashing lec II
Advance algorithm hashing lec II
 
Digital Watermarking through Embedding of Encrypted and Arithmetically Compre...
Digital Watermarking through Embedding of Encrypted and Arithmetically Compre...Digital Watermarking through Embedding of Encrypted and Arithmetically Compre...
Digital Watermarking through Embedding of Encrypted and Arithmetically Compre...
 
Indexing for Large DNA Database sequences
Indexing for Large DNA Database sequencesIndexing for Large DNA Database sequences
Indexing for Large DNA Database sequences
 

Recently uploaded

The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxOnBoard
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your Budget
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your BudgetHyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your Budget
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your BudgetEnjoy Anytime
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhisoniya singh
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Alan Dix
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptxLBM Solutions
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 

Recently uploaded (20)

E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptx
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your Budget
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your BudgetHyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your Budget
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your Budget
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptx
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
Vulnerability_Management_GRC_by Sohang Sengupta.pptx
Vulnerability_Management_GRC_by Sohang Sengupta.pptxVulnerability_Management_GRC_by Sohang Sengupta.pptx
Vulnerability_Management_GRC_by Sohang Sengupta.pptx
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 

K mer index of dna sequence based on hash

International Journal on Computational Science & Applications (IJCSA) Vol.5, No.4, August 2015
DOI:10.5121/ijcsa.2015.5402

K-Mer Index Of DNA Sequence Based On Hash Algorithm

Jinlin Liu¹, Qiang Chen² and Chen Zhang³
¹ College of Electronic and Electrical Engineering, Shanghai University of Engineering Science, Shanghai 201620, China.
² College of Electronic and Electrical Engineering, Shanghai University of Engineering Science, Shanghai 201620, China.
³ School of Management, Shanghai University of Engineering Science, Shanghai 201620, China.

ABSTRACT
K-mer frequency statistics of biological sequences are an important problem in biological information processing. This paper addresses the problem of building a k-mer index for large-scale DNA sequencing reads within limited memory and time. A hash algorithm is used to build the index: the index model is built on matching encoded base fragments and quickly obtains statistics for k-mers of a given length, so that searching every sequence in the index is avoided. In addition, the program uses a hash table to build the index and the search model, and resolves address collisions with the zipper (separate chaining) method. Time-complexity analysis and experimental results show that, compared with traditional indexing methods, the algorithm clearly improves performance and is well suited to cases where the k-mer length varies over a wide range.

KEYWORDS
K-mer index; hash algorithm; DNA detection; index model

1. INTRODUCTION
With the rapid development of DNA sequencing technology in recent years, massive amounts of biological sequence data have been generated, and they must be analyzed and processed by efficient computational means. Many biological sequence analysis and processing problems involve k-mers: a k-mer of a DNA sequence is a short subsequence of k consecutive bases.
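As a concrete illustration of this definition, the k-mer frequency distribution of a sequence can be computed by sliding a window of length k along it. The following Python sketch is ours, not the authors' implementation; the function name `kmer_frequencies` is hypothetical:

```python
from collections import Counter


def kmer_frequencies(seq: str, k: int) -> Counter:
    """Count every length-k substring (k-mer) of seq with a sliding window."""
    if k <= 0 or k > len(seq):
        return Counter()
    # A sequence of length n contains n - k + 1 overlapping k-mers.
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


if __name__ == "__main__":
    print(sorted(kmer_frequencies("ATCGATCG", 3).items()))
    # [('ATC', 2), ('CGA', 1), ('GAT', 1), ('TCG', 2)]
```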
When k is chosen appropriately, the k-mer frequency distribution of a sequence contains essentially all the information of the genome and constitutes an equivalent representation of the sequence. Therefore, by analyzing the k-mer distribution of DNA sequences and the information carried by different k-mers, we can learn about the base distribution, function, structure and evolution of biological sequences.

2. QUESTIONS
This paper aims to solve the problem of indexing the k-mers of DNA sequences. For a given k, an index is built over 1 million DNA sequences of length 100 (100 million bases in total). The program reads every length-k fragment of a sequence from start to end, then moves on to the next sequence, until the positions at which each k-mer appears have been recorded. Because DNA sequencing produces fragments in very large volumes, large data sets must be handled with limited memory and disk space, and both the space complexity and the computational complexity should be optimized as far as possible. Two problems therefore have to be solved:
Q1. Build the index for the given k and search every sequence. The bases of each sequence are encoded with the hash algorithm, the input base fragment of length k is converted into a decimal number, and this number is matched against the 1 million sequences. Finally, the program outputs the row and column of each matching base fragment.
Q2. After the index is established, build a hash table in memory and, on each traversal, store the frequency and positions of each k-mer in it, so that the 1 million DNA sequences can be traversed within the limited memory space.

3. PROBLEM ANALYSIS
3.1. Problem abstraction
The data set contains 1 million gene sequences (100 million bases). Because each sequence has length 100, the data is equivalent to a two-dimensional matrix A whose rows are numbered 1 to 1,000,000 and whose columns are numbered 1 to 100. The problem is therefore abstracted as an analysis of the matrix A[i][j], i = 1, 2, ..., 1000000; j = 1, 2, ..., 100.

3.2. Solution method
The sequences contain four kinds of bases: C, A, G and T. The hash algorithm encodes them as the base-4 digits A = 0, C = 1, G = 2, T = 3 (two bits each), so each k-mer becomes a base-4 number, which is converted into a decimal number for matching queries. The hash value of a k-mer s_0 s_1 ... s_{k-1} is

$$\mathrm{Hash}(s) = \sum_{m=0}^{k-1} \mathrm{value}(s_m) \cdot 4^{\,k-m-1}$$

where value(s_m) is the code of the character at position m, k is the length of the k-mer, and m ranges over the positions 0 to k - 1. For example, for k = 4 the sequence ATCG is converted to

$$\mathrm{ATCG} = 0 \cdot 4^{3} + 3 \cdot 4^{2} + 1 \cdot 4^{1} + 2 \cdot 4^{0} = 54$$
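The hash formula can be checked with a short Python sketch. `BASE_CODE` uses the codes A = 0, C = 1, G = 2, T = 3 from this section; the function names are hypothetical, and the Horner-rule variant is our own equivalent formulation rather than something the paper describes:

```python
# Base codes from Section 3.2.
BASE_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def kmer_hash(kmer: str) -> int:
    """Hash(s) = sum over m of value(s_m) * 4^(k-m-1): read s as a base-4 number."""
    k = len(kmer)
    return sum(BASE_CODE[base] * 4 ** (k - m - 1) for m, base in enumerate(kmer))


def kmer_hash_horner(kmer: str) -> int:
    """Same value via Horner's rule: shift left one base-4 digit per base."""
    h = 0
    for base in kmer:
        h = h * 4 + BASE_CODE[base]
    return h


print(kmer_hash("ATCG"), kmer_hash_horner("ATCG"))  # 54 54
```

Horner's rule avoids computing powers of 4 explicitly; both functions return the same integer for any k-mer over A, C, G, T.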
Each row of length 100 contains 100 - k + 1 k-mers, so it can be converted into 100 - k + 1 decimal numbers. Applying the same procedure to all 1 million rows yields the corresponding decimal numbers, which are stored in the two-dimensional array A[i][j]. When a matching decimal number is found, the program converts it back into its base-4 form of length k (such as ATCG in the example above) and prints the row and column labels of the base fragment.
After the index is established, the division method is used to build a hash table in memory and to determine the hash-table addresses. The row and column labels of every occurrence of a k-mer are stored in the hash table. This improves the efficiency of querying the 1 million DNA sequences within limited memory.
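A minimal Python sketch of the per-row conversion, the reverse conversion to bases, and a row-by-row match that reports 1-based row and column labels follows. It assumes the same A = 0, C = 1, G = 2, T = 3 codes; the rolling update (shift in one base, reduce modulo 4^k) and all names are our assumptions, not the authors' code:

```python
from typing import List, Tuple

# Base codes as in Section 3.2; CODE_BASE maps a code back to its base.
BASE_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
CODE_BASE = "ACGT"


def row_kmer_codes(row: str, k: int) -> List[int]:
    """Return the len(row) - k + 1 decimal codes of the k-mers in one row."""
    if k <= 0 or k > len(row):
        return []
    modulus = 4 ** k  # every length-k code lies in [0, 4^k)
    codes: List[int] = []
    code = 0
    for i, base in enumerate(row):
        # Append the new base as the lowest base-4 digit; the modulus
        # drops the base that has just left the window.
        code = (code * 4 + BASE_CODE[base]) % modulus
        if i >= k - 1:
            codes.append(code)
    return codes


def decode_kmer(code: int, k: int) -> str:
    """Convert a decimal code back into its length-k base string."""
    bases = []
    for _ in range(k):
        code, digit = divmod(code, 4)
        bases.append(CODE_BASE[digit])
    return "".join(reversed(bases))


def scan_matches(rows: List[str], query: str) -> List[Tuple[int, int]]:
    """Compare the query code with every k-mer code of every row.

    Returns 1-based (row, column) labels; column is the k-mer's start position.
    """
    k = len(query)
    target = row_kmer_codes(query, k)[0]
    hits = []
    for i, row in enumerate(rows, start=1):
        for j, code in enumerate(row_kmer_codes(row, k), start=1):
            if code == target:
                hits.append((i, j))
    return hits


print(scan_matches(["ATCGATCG", "GGATCC"], "ATC"))  # [(1, 1), (1, 5), (2, 3)]
print(decode_kmer(54, 4))                          # ATCG
```

This scan compares the query against every k-mer of every row, which is the full search that the hash table is intended to avoid.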
4. MODEL ESTABLISHMENT AND SOLUTION
A hash algorithm maps a binary value of arbitrary length to a shorter binary value of fixed length; this short value is called the hash value. Following this principle, the four bases A, C, G and T are identified with 0, 1, 2 and 3, so each k-mer is read as a base-4 (quaternary) number and then converted into a decimal number. The decimal number of the query fragment is matched against the 100 - k + 1 decimal numbers of each row, starting from the first row; whenever the base sequences match, the program outputs the row and column labels. The flow chart is shown below:
4.1. Model two: search model based on hash table
The main requirement of this paper is to design a hash function and build the hash table with the k-mer as the keyword. There are many ways to construct a hash function, such as the digit analysis method, the direct addressing method and the random number method (the last is usually used when keyword lengths differ); this paper selects the division method. The hash value of a nucleotide sequence is divided by 1000 and the remainder is used as the hash-table address. All keys with the same remainder are placed in the same bucket, so the keys within a bucket share the remainder but have different quotients. Address collisions are therefore unavoidable, and they are resolved with the zipper method (separate chaining): all nodes whose keys are synonyms, i.e. hash to the same address, are linked into the same singly linked list. If the hash table length is m, the hash table can be defined as an array of m pointers T[0..m-1], and every node whose hash address is i is inserted into the singly linked list pointed to by T[i]. Every component of T is initialized to a null pointer. With the zipper method the load factor α can be greater than 1, but α is generally kept below 1.
Hash search: the k-mer is used as the keyword, and the hash function computes its address. The program then compares the base sequence of each node in the chain at that address with the searched sequence; if they are the same, all the information stored in that node is output, otherwise the search continues along the chain.
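The division method with separate chaining can be sketched in Python as follows. The table length m = 1000 follows the text; the class and method names, the head-of-chain insertion, and storing a list of (row, column) positions inside each node are assumptions made for illustration:

```python
from typing import List, Optional, Tuple

BASE_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


class Node:
    """One chain node: a k-mer key, its occurrence positions, and the next pointer."""

    def __init__(self, key: int, nxt: Optional["Node"]) -> None:
        self.key = key
        self.positions: List[Tuple[int, int]] = []  # 1-based (row, column)
        self.next = nxt


class ChainedKmerIndex:
    """Division-method hash table with separate chaining (the 'zipper' method)."""

    def __init__(self, k: int, m: int = 1000) -> None:
        self.k = k
        self.m = m
        self.table: List[Optional[Node]] = [None] * m  # T[0..m-1], all null initially

    @staticmethod
    def encode(kmer: str) -> int:
        code = 0
        for base in kmer:
            code = code * 4 + BASE_CODE[base]
        return code

    def _find(self, key: int) -> Optional[Node]:
        node = self.table[key % self.m]  # address = remainder of division by m
        # Keys in one chain share the remainder; compare full keys to tell them apart.
        while node is not None and node.key != key:
            node = node.next
        return node

    def add_row(self, row_label: int, row: str) -> None:
        for j in range(len(row) - self.k + 1):
            key = self.encode(row[j:j + self.k])
            node = self._find(key)
            if node is None:
                addr = key % self.m
                node = Node(key, self.table[addr])  # insert at the head of chain T[addr]
                self.table[addr] = node
            node.positions.append((row_label, j + 1))

    def search(self, kmer: str) -> List[Tuple[int, int]]:
        node = self._find(self.encode(kmer))
        return list(node.positions) if node is not None else []

    def frequency(self, kmer: str) -> int:
        return len(self.search(kmer))


idx = ChainedKmerIndex(k=3)
idx.add_row(1, "ATCGATCG")
idx.add_row(2, "GGATCC")
print(idx.search("ATC"), idx.frequency("TCG"))  # [(1, 1), (1, 5), (2, 3)] 2
```

Because each node stores the full key, a lookup walks only the chain at address key mod m, and the frequency of a k-mer is simply the number of positions stored in its node.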
4.2. Model three: analysis of the memory space occupied by the hash table
Data definition analysis: int denotes a 32-bit integer whose range is -2147483648 to +2147483647 inclusive; each int occupies 4 B. char is an 8-bit type; an unsigned char holds values from 0 to 255, and each char occupies 1 B.
Overall data analysis: the row index (1 to 1,000,000) is defined as an int (4 bytes) and the column index (1 to 100) as a char (1 byte).
The theoretical memory space of each index is 5 × 4^k bytes, which corresponds to one 5-byte (row, column) entry for each of the 4^k possible k-mers; converted to gigabytes:

$$M(k) = 5 \times 4^{k}\ \mathrm{B} = \frac{5 \times 4^{k}}{1024 \times 1024 \times 1024}\ \mathrm{GB}$$

The memory space of the index for different values of k is shown in Table 4.1.

Table 4.1 The memory space
k     Memory space (GB)
1     0.00000002
2     0.00000007
3     0.00000030
4     0.00000119
5     0.00000477
6     0.00001907
7     0.00007629
8     0.00030518
9     0.00122070
10    0.00488281
11    0.01953125
12    0.07812500
13    0.31250000
14    1.25000000
15    5.00000000
16    20.00000000

5. RUN RESULTS
5.1. The interface
Figure 5.1 The interface
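The memory formula of Section 4.2 can be evaluated directly to reproduce Table 4.1. This Python sketch (the function name is hypothetical) assumes 5 bytes per entry, i.e. a 4-byte int row plus a 1-byte char column, and 1 GB = 1024^3 bytes:

```python
def index_memory_gb(k: int, entry_bytes: int = 5) -> float:
    """Theoretical index size entry_bytes * 4^k bytes, expressed in GB (1024^3 B)."""
    return entry_bytes * 4 ** k / 1024 ** 3


# Print one row of Table 4.1 per k, with 8 decimal places as in the table.
for k in range(1, 17):
    print(f"{k:<5} {index_memory_gb(k):.8f}")
```

Each increment of k multiplies the size by 4, which is why the index stays negligible up to about k = 12 but reaches 20 GB at k = 16.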
5.2. Search interface
Figure 5.2 The search interface
5.3. Generated result file
The generated K_mer.txt file is shown in Figure 5.3.
Figure 5.3 The text file
5.4. Results
Figure 5.4 The output interface
5.5. The complexity of the algorithm
(1) Complexity of establishing the index
Time complexity: O(1) + O(m), where m is the length (depth) of the chain when collisions occur.
Space complexity: O( )
(2) Complexity of using the index
Time complexity: O(1)
Space complexity: O(1)

6. CONCLUSIONS
To solve the problem of k-mer indexing of DNA, three models are proposed: the hash algorithm index model, the hash table query model, and the memory analysis model of the hash table. The design was implemented with Visual Studio 2010 and traverses the data to obtain the optimal results; it occupies little memory, traverses efficiently, and produces accurate results. It thus provides a good solution to the problem of k-mer indexing of DNA.
Authors
Jinlin Liu is currently studying Mechanical and Electronic Engineering at Shanghai University of Engineering Science, China, where he is working towards the Master's degree. His current research interests include FPGA and the design and development of embedded systems.