This document presents a novel framework for identifying Short Tandem Repeats (STRs) using parallel string matching. It begins with background on STRs and challenges with existing sequential algorithms. It then describes a two-phase methodology - first applying a basic improved right prefix algorithm sequentially, then applying it in parallel using multi-threading on multicore processors. Results show the basic algorithm outperforms Boyer-Moore, Knuth-Morris-Pratt and brute force algorithms sequentially. When applied in parallel, processing time is reduced from 80ms sequentially to 40ms in parallel on multicore systems. The parallel STR identification framework allows efficient searching of repeats in large genomes.
Performance Efficient DNA Sequence Detection Algorithm (Rahul Shirude)
The document proposes a parallel string matching algorithm using CUDA to perform DNA sequence detection on a GPU. It aims to optimize GPU core utilization and achieve high performance gains over sequential approaches. The proposed algorithm finds correct pattern matches in DNA database sequences. Experimental results show the parallel GPU approach has much higher performance than sequential CPU methods.
This document summarizes a research paper on developing a memory-efficient bit-split based pattern matching algorithm for network intrusion detection systems. The algorithm aims to significantly reduce memory requirements for storing large rulesets by dividing long target patterns into fixed-length subpatterns. It proposes a DFA-based parallel string matching scheme using memory-efficient bit-split finite state machine architectures. Simulation results indicate the approach can effectively reduce total memory needs for parallel string matching engines to inspect network packets against intrusion detection rules in real-time.
International Journal of Engineering Research and Applications (IJERA) is an open access online peer reviewed international journal that publishes research and review articles in the fields of Computer Science, Neural Networks, Electrical Engineering, Software Engineering, Information Technology, Mechanical Engineering, Chemical Engineering, Plastic Engineering, Food Technology, Textile Engineering, Nano Technology & science, Power Electronics, Electronics & Communication Engineering, Computational mathematics, Image processing, Civil Engineering, Structural Engineering, Environmental Engineering, VLSI Testing & Low Power VLSI Design etc.
MotifGP is a motif discovery tool that uses multi-objective evolutionary algorithms to find DNA motifs. It evolves candidate motif solutions as network expressions using strongly typed genetic programming. MotifGP scores candidates based on how well they discriminate between input and control sequences while maintaining coverage of input sequences. When tested on 13 datasets, MotifGP identified known motifs and fully overlapped reference motifs in the databases, outperforming the existing DREME tool. Future work will explore different fitness functions and evolutionary methods to further improve motif discovery.
New Heuristic Model for Optimal CRC Polynomial (IJECEIAES)
Cyclic Redundancy Codes (CRCs) are important for maintaining integrity in data transmissions. CRC performance is mainly determined by the polynomial chosen. Recent increases in data throughput call for a search for optimal polynomials through software or hardware implementations. Most CRC implementations in use offer less-than-optimal performance or are inferior to their newer published counterparts. Classical approaches to determining optimal polynomials involve brute-force search over the population of all possible polynomials of a given degree. This paper evaluates the performance of CRC polynomials generated with genetic algorithms, then compares the resulting polynomials, both with and without encryption headers, against a benchmark polynomial.
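For context, the polynomial division underlying any CRC can be sketched in a few lines. This is a minimal illustrative implementation, not the paper's GA-based search; the polynomial 0x07 used below is just one well-known CRC-8 generator, chosen here only as an example.

```python
def crc_remainder(data: bytes, poly: int, width: int) -> int:
    """Compute a CRC: the remainder of the message (shifted left by
    `width` bits) divided by the generator polynomial over GF(2).
    `poly` is given without its implicit leading x^width term."""
    mask = (1 << width) - 1
    top = 1 << (width - 1)
    crc = 0
    for byte in data:
        crc ^= byte << (width - 8)  # fold in the next byte (assumes width >= 8)
        for _ in range(8):
            crc = ((crc << 1) ^ poly) & mask if crc & top else (crc << 1) & mask
    return crc

# A message followed by its own CRC leaves a zero remainder,
# which is how a receiver verifies integrity:
r = crc_remainder(b"data", 0x07, 8)
assert crc_remainder(b"data" + bytes([r]), 0x07, 8) == 0
```

The choice of `poly` decides which error patterns divide evenly (and thus go undetected), which is exactly why polynomial selection matters.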
Using Cisco Network Components to Improve NIDPS Performance (csandit)
Network Intrusion Detection and Prevention Systems (NIDPSs) are used to detect, prevent and report evidence of attacks and malicious traffic. Our paper presents a study in which we used open-source NIDPS software. We show that NIDPS detection performance can be weak in the face of high-speed, high-load traffic, in terms of missed alerts and missed logs. To counteract this problem, we have proposed and evaluated a solution that utilizes QoS, queues and parallel technologies in a multi-layer Cisco Catalyst switch to increase NIDPS detection performance. Our approach designs a novel QoS architecture to organise and improve the throughput of forwarding-plane traffic in a Layer 3 switch in order to improve NIDPS performance.
Data mining is knowledge discovery in databases; the goal is to extract patterns and knowledge from large amounts of data. An important branch of data mining is text mining, which extracts high-quality information from text, typically through statistical pattern learning. "High quality" in text mining refers to some combination of relevance, novelty and interestingness. Text mining tasks include text categorization, text clustering, entity extraction and sentiment analysis. Natural language processing and analytical methods are highly preferred to turn …
The document proposes BergSump, a new framework for analyzing I/O automata. BergSump aims to confirm that superblocks and flip-flop gates are generally incompatible. It discusses related work on XML, wireless networks, and cryptography. The implementation section outlines version 5.9 of BergSump and plans to release the code under an open source license. The evaluation analyzes BergSump's performance and shows its median complexity is better than prior solutions. The conclusion argues that BergSump can successfully observe many sensor networks at once.
This document presents MLProph, a machine learning-based routing protocol for opportunistic networks. It uses decision trees and neural networks to select the next hop for packet forwarding. Simulation results show that MLProph achieves higher delivery probability and lower packet dropping than the PROPHET+ routing protocol. Future work will involve simulating MLProph using real mobility traces and exploring other machine learning classifiers.
This paper presents machine translation based on machine learning, which learns from a semantically correct corpus. A machine learning process based on a Quantum Neural Network (QNN) is used to recognize corpus patterns in a realistic way. The system translates on the basis of knowledge gained during learning, by being given pairs of sentences from the source and target languages; with the help of this training data it translates the given text. The paper presents a study of a machine translation system which converts a source language to a target language using a quantum neural network. Rather than comparing words semantically, the QNN compares numerical tags, which is faster and more accurate. Here a tagger tags the parts of sentences discretely, which helps to map bilingual sentences.
This paper presents a literature survey of research-oriented developments made to date. Its significance is to provide a deep-rooted understanding of, and knowledge transfer regarding, existing approaches to gene sequencing and alignment using Smith-Waterman algorithms, together with their respective strengths and weaknesses. Before undertaking any quality research it is advisable to conduct a goal-oriented literature survey, which facilitates an in-depth understanding of the research area and allows an objective to be formulated from the gaps between present requirements and existing approaches. Gene sequencing is a predominant problem area in which researchers seek an optimized system model that offers efficient processing without introducing memory or time overheads. This research is oriented towards developing such a system, based on the dynamic programming approach known as the Smith-Waterman algorithm in an enhanced form, combined with other supporting optimization techniques. The paper provides a brief introduction to the research domain, the research gap and motivations, the objectives formulated, and the systems proposed to accomplish them.
High throughput reliable multicast in multi-hop wireless mesh networks (LogicMindtech Nologies)
This document summarizes an analysis of assembly algorithms and an implementation of a De Bruijn graph approach to genome assembly. It discusses how De Bruijn graphs have become a common approach to assembly, breaking reads into k-mers and connecting nodes that overlap by k-1 characters. The document outlines challenges in assembly, including repeats and errors. It also summarizes two efficient data structures for representing De Bruijn graphs and describes implementing these to assemble microbial genomes and compare against the ABySS assembler.
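The k-mer construction behind a De Bruijn graph can be sketched directly; this is a toy illustration of the general idea, not the document's space-efficient data structures.

```python
from collections import defaultdict

def de_bruijn_graph(reads, k):
    """Build a De Bruijn graph: each k-mer becomes a directed edge from
    its (k-1)-length prefix node to its (k-1)-length suffix node."""
    graph = defaultdict(list)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].append(kmer[1:])
    return graph

# "GATTA" with k=3 yields k-mers GAT, ATT, TTA:
g = de_bruijn_graph(["GATTA"], 3)
assert g["GA"] == ["AT"] and g["AT"] == ["TT"] and g["TT"] == ["TA"]
```

An assembler then reconstructs contigs by walking an Eulerian path through these edges, which is where repeats (multiple edges re-entering a node) make assembly hard.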
The document discusses using word sense disambiguation (WSD) in concept identification for ontology construction. It describes implementing an approach that forms concepts from terms that meet certain criteria, such as having an intensional definition and instances. WSD is needed to identify the domain-relevant sense of a term when forming concepts. The Lesk algorithm is discussed as one method for WSD and concept disambiguation; it involves calculating similarity between terms and WordNet senses. Evaluation shows the approach identified domain-specific concepts with reasonable precision and recall compared to other methods. Choosing the best WSD algorithm depends on factors such as the nature of the problem and the performance metrics.
NEW ALGORITHM FOR WIRELESS NETWORK COMMUNICATION SECURITY (ijcisjournal)
This paper evaluates the security of a wireless communication network using fuzzy logic in MATLAB. A new hybrid algorithm is proposed and evaluated. We highlight the valuable assets in the design of a wireless network communication system based on the NS2 network simulator, which is crucial for protecting the security of such systems. Block cipher algorithms are evaluated using fuzzy logic, and a hybrid algorithm is proposed; both algorithms are evaluated in terms of security level. The AND logic operator is used in the modelling rules, and Mamdani-style inference is used for the evaluations.
A Survey on DPI Techniques for Regular Expression Detection in Network Intrus... (ijsrd.com)
This document surveys various techniques for regular expression detection in deep packet inspection for network intrusion detection systems. It discusses how regular expressions are used with techniques like deterministic finite automata to detect attack signatures by matching packet payloads to predefined regular expression patterns. Specifically, it examines techniques like LaFA, TCAM, StriFA, CompactDFA, and DFA/EC that aim to improve regular expression matching performance by reducing memory usage, increasing matching speed, or decreasing the number of required states. The survey aims to understand these techniques and their potential application in improving intrusion detection in wireless networks.
The document discusses algorithms for database searching and sequence alignment. It introduces BLAST and FASTA, two widely used algorithms for database searching. BLAST works by finding short words in sequences that score above a threshold and then extending any alignments found. FASTA uses a "hit and extend" heuristic to find locally similar regions. The document then discusses the statistical models that BLAST uses to calculate expected values and rank matching sequences by significance. It describes how BLAST models alignments as coin tosses to apply the Erdös-Rényi theorem and derive the Karlin-Altschul equation for calculating expected values.
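The Karlin-Altschul calculation mentioned above has a simple closed form. A minimal sketch follows; the K and lambda defaults are illustrative ungapped-scoring parameters commonly quoted for BLAST-style statistics, not values taken from this document.

```python
import math

def expected_hits(m: int, n: int, score: float,
                  K: float = 0.13, lam: float = 0.318) -> float:
    """Karlin-Altschul E-value: the expected number of ungapped local
    alignments scoring >= `score` when searching an m-letter query
    against n database letters. K and lam depend on the scoring system."""
    return K * m * n * math.exp(-lam * score)

# Higher score thresholds are exponentially less likely by chance,
# which is what lets BLAST rank matches by significance:
assert expected_hits(100, 10**6, 60) < expected_hits(100, 10**6, 30)
```

The coin-toss model in the document is what justifies this exponential tail: long runs of "heads" (matching residues) decay geometrically, per the Erdös-Rényi law.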
Communication systems-theory-for-undergraduate-students-using-matlab (SaifAbdulNabi1)
Dr. Chandana K.K. Jayasooriya received degrees from the Technical University of Berlin and Wichita State University. He is currently an Assistant Professor at the University of Pittsburgh at Johnstown teaching electrical engineering. The document discusses using MATLAB to teach communication systems theory to undergraduate students in a more intuitive way compared to traditional derivations-heavy approaches. It provides an example using amplitude modulation and shows how concepts like modulation, filtering, and demodulation can be demonstrated in MATLAB without requiring an advanced mathematical background.
BIDIRECTIONAL LONG SHORT-TERM MEMORY (BILSTM) WITH CONDITIONAL RANDOM FIELDS (... (ijnlc)
This study investigates the effectiveness of knowledge named entity recognition in Online Judges (OJs). OJs lack topic classification and are limited to problem IDs only, so a lot of time is consumed in finding programming problems, and more specifically knowledge entities. A Bidirectional Long Short-Term Memory (BiLSTM) with Conditional Random Fields (CRF) model is applied to recognize the knowledge named entities present in solution reports. For the test run, more than 2000 solution reports were crawled from the Online Judges and processed for model output. The stability of the model is also assessed via the F1 value. The results obtained with the proposed BiLSTM-CRF model are effective (F1: 98.96%) and efficient in lead time.
In a wireless sensor network, messages are exchanged cooperatively between different source and destination pairs using multi-hop packet transmission. Data packets are forwarded from intermediate nodes to the sink node by relaying each packet toward its destination, and every node overhears the transmissions of its close neighbours. To address this we propose a novel approach with efficient routing protocols, namely shortest-path routing and a distributed node-routing algorithm. The proposed work also focuses on Automatic Repeat Request (ARQ) and deterministic network coding, and extends this with an end-to-end message encoding mechanism. To improve node security, pair-wise key generation is used, in which each pair of communicating nodes is assigned a pair key to establish secure end-to-end communication. We analyse both single and multiple nodes and compare basic ARQ and deterministic network coding as transmission strategies.
This document proposes improvements to domain-specific term extraction for ontology construction. It discusses issues with existing term extraction approaches and presents a new method that selects and organizes target and contrastive corpora. Terms are extracted using linguistic rules on part-of-speech tagged text. Statistical distributions are calculated to identify terms based on their frequency across multiple contrastive corpora. The approach achieves better precision in extracting simple and complex terms for computer science and biomedical domains compared to existing methods.
STUDY OF DISTANCE MEASUREMENT TECHNIQUES IN CONTEXT TO PREDICTION MODEL OF WE... (ijscai)
The Internet is a boon of the modern era, as every organization uses it for dissemination of information and e-commerce-related applications. People in organizations sometimes experience delay while accessing the Internet in spite of adequate bandwidth. A prediction model for web caching and prefetching is an ideal solution to this delay problem. The prediction model analyses the history of Internet users from raw server log files, determines future sequences of web objects, and places those objects nearer to the user so that access latency is reduced and the problem of delay is solved. To determine the sequence of future web objects, it is necessary to determine the proximity of one web object to another by identifying a suitable distance metric for web caching and prefetching. This paper studies different distance metric techniques and concludes that bioinformatics-based distance metrics are ideal in the context of web caching and web prefetching.
This document provides an overview of the Python programming language, including its history, key features, and common uses. It discusses how Python is an interpreted, object-oriented language with dynamic typing and automatic memory management. Examples are given of Python's syntax for numbers, strings, modules, data structures like lists and dictionaries, and the interactive shell. Popular applications of Python like web development, science, and games are also mentioned.
This document provides an overview of common algorithms and data structures. It begins by defining an algorithm and discussing algorithm analysis techniques like asymptotic analysis. It then covers basic data structures like arrays, linked lists, stacks, queues, and trees. It explains various sorting algorithms like selection sort, bubble sort, insertion sort, quicksort, and mergesort. It also discusses search techniques like binary search and binary search trees. Other topics covered include priority queues, hashing, graphs, graph traversal, minimum spanning trees, shortest paths, and string searching algorithms.
String searching algorithms: given two strings P and T over the same alphabet Σ, determine whether P occurs as a substring in T (or find the position(s) at which P occurs as a substring in T). The strings P and T are called the pattern and the target, respectively.
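The brute-force reading of this definition can be written directly; positions are 0-based in this sketch.

```python
def naive_search(pattern: str, target: str) -> list:
    """Return every position at which pattern occurs as a substring
    of target, by trying each alignment in turn: O(|T| * |P|)."""
    positions = []
    n, m = len(target), len(pattern)
    for i in range(n - m + 1):
        if target[i:i + m] == pattern:  # compare pattern at offset i
            positions.append(i)
    return positions

assert naive_search("ana", "banana") == [1, 3]
```

The algorithms below (KMP, Boyer-Moore) solve exactly this problem while avoiding the redundant comparisons that make the naive approach quadratic in the worst case.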
The document discusses two string matching algorithms: Knuth-Morris-Pratt (KMP) and Boyer-Moore (BM). KMP runs in linear time but checks every character, while BM can be sub-linear as it does not need to check every character. The document provides pseudocode examples and comparisons of the running times of KMP and BM on different length patterns, finding that BM performs best on longer patterns while KMP has better performance on shorter patterns. Both algorithms have widespread applications in areas like web search engines, spam filters, and natural language processing.
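The KMP behaviour described above (linear time, each text character examined going forward with no backtracking) can be sketched as follows. This is the generic textbook formulation, not the document's pseudocode.

```python
def kmp_search(pattern: str, text: str) -> list:
    """Knuth-Morris-Pratt: O(|text| + |pattern|) matching.
    Assumes a non-empty pattern."""
    m = len(pattern)
    # failure[i]: length of the longest proper prefix of pattern[:i+1]
    # that is also a suffix of it
    failure = [0] * m
    k = 0
    for i in range(1, m):
        while k > 0 and pattern[i] != pattern[k]:
            k = failure[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        failure[i] = k
    # scan the text, reusing the failure function on each mismatch
    matches, k = [], 0
    for i, ch in enumerate(text):
        while k > 0 and ch != pattern[k]:
            k = failure[k - 1]
        if ch == pattern[k]:
            k += 1
        if k == m:
            matches.append(i - m + 1)
            k = failure[k - 1]
    return matches

assert kmp_search("aba", "ababa") == [0, 2]
```

Because the scan never moves `i` backwards, KMP touches every text character, which is exactly why Boyer-Moore's skipping can beat it on long patterns.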
The document discusses the Boyer-Moore string searching algorithm. It works by preprocessing the pattern string and comparing characters from right to left. If a mismatch occurs, it uses two heuristics - bad character and good suffix - to determine the shift amount. The bad character heuristic shifts past mismatching characters, while the good suffix heuristic looks for matching suffixes to allow larger shifts. The algorithm generally gets faster as the pattern length increases, running in sub-linear time on average. It has applications in tasks like virus scanning and database searching that require high-speed string searching.
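The bad-character heuristic alone already gives a working (if not worst-case-optimal) Boyer-Moore; a sketch, with the good-suffix rule omitted for brevity:

```python
def bm_bad_char_search(pattern: str, text: str) -> list:
    """Boyer-Moore with only the bad-character heuristic.
    Compares right to left; on a mismatch, shifts the pattern so the
    mismatched text character lines up with its rightmost occurrence
    in the pattern (or past it entirely if it does not occur)."""
    m, n = len(pattern), len(text)
    last = {ch: i for i, ch in enumerate(pattern)}  # rightmost index of each char
    matches = []
    s = 0
    while s <= n - m:
        j = m - 1
        while j >= 0 and pattern[j] == text[s + j]:
            j -= 1
        if j < 0:
            matches.append(s)
            s += 1
        else:
            s += max(1, j - last.get(text[s + j], -1))
    return matches

assert bm_bad_char_search("abc", "xabcabc") == [1, 4]
```

When the mismatched character does not occur in the pattern at all, the shift is the full distance `j + 1`, which is how the algorithm skips text characters entirely and achieves sub-linear behaviour on long patterns.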
This document discusses different pattern recognition algorithms that could be implemented in real-time data sets. It begins by defining pattern recognition and providing examples. It then discusses why pattern recognition is important and lists several applications. The document goes on to describe three main approaches to pattern recognition - statistical, syntactic, and neural pattern recognition - and provides examples for each. It then provides more detailed descriptions and pseudocode for several specific algorithms, including KMP, Boyer-Moore, Rabin-Karp, naive string matching, and brute-force string matching. It concludes by discussing future work improving algorithm complexity and potential applications in biometric identification.
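Of the string matching algorithms listed, Rabin-Karp is the one based on hashing; a minimal rolling-hash sketch follows (the base and modulus are arbitrary illustrative choices, not values from the document).

```python
def rabin_karp(pattern: str, text: str, base: int = 256,
               mod: int = 1_000_003) -> list:
    """Rabin-Karp: slide a window over the text, comparing rolling
    hashes and verifying character-by-character only on a hash hit."""
    m, n = len(pattern), len(text)
    if m == 0 or m > n:
        return []
    high = pow(base, m - 1, mod)  # weight of the window's leading character
    p_hash = t_hash = 0
    for i in range(m):
        p_hash = (p_hash * base + ord(pattern[i])) % mod
        t_hash = (t_hash * base + ord(text[i])) % mod
    matches = []
    for s in range(n - m + 1):
        if p_hash == t_hash and text[s:s + m] == pattern:
            matches.append(s)  # verified match (guards against collisions)
        if s < n - m:
            # roll the hash: drop text[s], append text[s + m]
            t_hash = ((t_hash - ord(text[s]) * high) * base
                      + ord(text[s + m])) % mod
    return matches

assert rabin_karp("ab", "abab") == [0, 2]
```

The explicit verification step on hash equality is what keeps the algorithm correct despite hash collisions; the rolling update is what keeps each window O(1) instead of O(m).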
The document proposes BergSump, a new framework for analyzing I/O automata. BergSump aims to confirm that superblocks and flip-flop gates are generally incompatible. It discusses related work on XML, wireless networks, and cryptography. The implementation section outlines version 5.9 of BergSump and plans to release the code under an open source license. The evaluation analyzes BergSump's performance and shows its median complexity is better than prior solutions. The conclusion argues that BergSump can successfully observe many sensor networks at once.
This document presents MLProph, a machine learning-based routing protocol for opportunistic networks. It uses decision trees and neural networks to select the next hop for packet forwarding. Simulation results show that MLProph achieves higher delivery probability and lower packet dropping than the PROPHET+ routing protocol. Future work will involve simulating MLProph using real mobility traces and exploring other machine learning classifiers.
This paper presents machine translation based on machine learning, which learns the semantically
correct corpus. The machine learning process based on Quantum Neural Network (QNN) is used to
recognizing the corpus pattern in realistic way. It translates on the basis of knowledge gained during
learning by entering pair of sentences from source to target language. By taking help of this training data
it translates the given text. own text.The paper consist study of a machine translation system which
converts source language to target language using quantum neural network. Rather than comparing
words semantically QNN compares numerical tags which is faster and accurate. In this tagger tags the
part of sentences discretely which helps to map bilingual sentences.
This paper presents a literature survey conducted for research oriented developments made till. The significance of this paper would be to provide a deep rooted understanding and knowledge transfer regarding existing approaches for gene sequencing and alignments using Smith Waterman algorithms and their respective strengths and weaknesses. In order to develop or perform any quality research it is always advised to conduct research goal oriented literature survey that could facilitate an in depth understanding of research work and an objective can be formulated on the basis of gaps existing between present requirements and existing approaches. Gene sequencing problems are one of the predominant issues for researchers to come up with optimized system model that could facilitate optimum processing and efficiency without introducing overheads in terms of memory and time. This research is oriented towards developing such kind of system while taking into consideration of dynamic programming approach called Smith Waterman algorithm in its enhanced form decorated with other supporting and optimized techniques. This paper provides an introduction oriented knowledge transfer so as to provide a brief introduction of research domain, research gap and motivations, objective formulated and proposed systems to accomplish ultimate objectives.
High throughput reliable multicast in multi-hop wireless mesh networksLogicMindtech Nologies
NS2 Projects for M. Tech, NS2 Projects in Vijayanagar, NS2 Projects in Bangalore, M. Tech Projects in Vijayanagar, M. Tech Projects in Bangalore, NS2 IEEE projects in Bangalore, IEEE 2015 NS2 Projects, WSN and MANET Projects, WSN and MANET Projects in Bangalore, WSN and MANET Projects in Vijayangar
This document summarizes an analysis of assembly algorithms and implementation of a De Bruijn graph approach to genome assembly. It discusses how De Bruijn graphs have become a common approach for assembly, representing reads as nodes and connecting nodes based on overlap of k-mers. The document outlines challenges in assembly including repeats and errors. It also summarizes two efficient data structures for representing De Bruijn graphs and describes implementing these to assemble microbial genomes and compare to the ABySS assembler.
The document discusses using word sense disambiguation (WSD) in concept identification for ontology construction. It describes implementing an approach that forms concepts from terms by meeting certain criteria, such as having an intentional definition and instances. WSD is needed to identify the sense of terms related to the domain when forming concepts. The Lesk algorithm is discussed as one method for WSD and concept disambiguation, involving calculating similarity between terms and WordNet senses. Evaluation shows the approach identified domain-specific concepts with reasonable precision and recall compared to other methods. Choosing the best WSD algorithm depends on factors like the problem nature and performance metrics.
NEW ALGORITHM FOR WIRELESS NETWORK COMMUNICATION SECURITYijcisjournal
This paper evaluates the security of wireless communication network based on the fuzzy logic in Mat lab. A new algorithm is proposed and evaluated which is the hybrid algorithm. We highlight the valuable assets in designing of wireless network communication system based on network simulator (NS2), which is crucial to protect security of the systems. Block cipher algorithms are evaluated by using fuzzy logics and a hybrid
algorithm is proposed. Both algorithms are evaluated in term of the security level. Logic (AND) is used in the rules of modelling and Mamdani Style is used for the evaluations
A Survey on DPI Techniques for Regular Expression Detection in Network Intrus...ijsrd.com
This document surveys various techniques for regular expression detection in deep packet inspection for network intrusion detection systems. It discusses how regular expressions are used with techniques like deterministic finite automata to detect attack signatures by matching packet payloads to predefined regular expression patterns. Specifically, it examines techniques like LaFA, TCAM, StriFA, CompactDFA, and DFA/EC that aim to improve regular expression matching performance by reducing memory usage, increasing matching speed, or decreasing the number of required states. The survey aims to understand these techniques and their potential application in improving intrusion detection in wireless networks.
The document discusses algorithms for database searching and sequence alignment. It introduces BLAST and FASTA, two widely used algorithms for database searching. BLAST works by finding short words in sequences that score above a threshold and then extending any alignments found. FASTA uses a "hit and extend" heuristic to find locally similar regions. The document then discusses the statistical models that BLAST uses to calculate expected values and rank matching sequences by significance. It describes how BLAST models alignments as coin tosses to apply the Erdös-Rényi theorem and derive the Karlin-Altschul equation for calculating expected values.
Communication systems-theory-for-undergraduate-students-using-matlabSaifAbdulNabi1
Dr. Chandana K.K. Jayasooriya received degrees from the Technical University of Berlin and Wichita State University. He is currently an Assistant Professor at the University of Pittsburgh at Johnstown teaching electrical engineering. The document discusses using MATLAB to teach communication systems theory to undergraduate students in a more intuitive way compared to traditional derivations-heavy approaches. It provides an example using amplitude modulation and shows how concepts like modulation, filtering, and demodulation can be demonstrated in MATLAB without requiring an advanced mathematical background.
BIDIRECTIONAL LONG SHORT-TERM MEMORY (BILSTM) WITH CONDITIONAL RANDOM FIELDS (... - ijnlc
This study investigates the effectiveness of knowledge named entity recognition in Online Judges (OJs). OJs lack topic classification and are limited to problem IDs only, so considerable time is spent finding programming problems, and more specifically knowledge entities. A Bidirectional Long Short-Term Memory (BiLSTM) with Conditional Random Fields (CRF) model is applied to recognize the knowledge named entities present in solution reports. For the test run, more than 2000 solution reports were crawled from the Online Judges and processed for the model output. The stability of the model is also assessed via its high F1 value. The results obtained through the proposed BiLSTM-CRF model are effective (F1: 98.96%) and efficient in lead time.
In wireless sensor networks, messages are exchanged cooperatively between multiple source and destination pairs in such a way that multi-hop packet transmission is used. Data packets are forwarded from intermediate nodes to the sink node by relaying packets toward the destination nodes, and each node overhears the transmissions of its close neighbors. To avoid this, we propose a novel approach with an efficient routing protocol, i.e., shortest-path routing and a distributed node routing algorithm. The proposed work also concentrates on Automatic Repeat Request (ARQ) and deterministic network coding, and we extend this work with an end-to-end message encoding mechanism. To enhance node security, pairwise key generation is used, in which each pair of communicating nodes is assigned a pair key to establish secure end-to-end communication. We analyze both single and multiple nodes and compare simple ARQ and deterministic network coding as transmission strategies.
This document proposes improvements to domain-specific term extraction for ontology construction. It discusses issues with existing term extraction approaches and presents a new method that selects and organizes target and contrastive corpora. Terms are extracted using linguistic rules on part-of-speech tagged text. Statistical distributions are calculated to identify terms based on their frequency across multiple contrastive corpora. The approach achieves better precision in extracting simple and complex terms for computer science and biomedical domains compared to existing methods.
STUDY OF DISTANCE MEASUREMENT TECHNIQUES IN CONTEXT TO PREDICTION MODEL OF WE... - ijscai
The Internet is a boon of the modern era, as every organization uses it for dissemination of information and e-commerce applications. People in an organization sometimes experience delays while accessing the Internet in spite of adequate bandwidth. A prediction model for web caching and prefetching is an ideal solution to this delay problem. The prediction model analyses the history of Internet users from raw server log files, determines future sequences of web objects, and places those web objects nearer to the user, so that access latency is reduced to some extent and the delay problem is solved. To determine the sequence of future web objects, it is necessary to determine the proximity of one web object to another by identifying a suitable distance metric for web caching and prefetching. This paper studies different distance metric techniques and concludes that bioinformatics-based distance metrics are ideal in the context of web caching and web prefetching.
This document provides an overview of the Python programming language, including its history, key features, and common uses. It discusses how Python is an interpreted, object-oriented language with dynamic typing and automatic memory management. Examples are given of Python's syntax for numbers, strings, modules, data structures like lists and dictionaries, and the interactive shell. Popular applications of Python like web development, science, and games are also mentioned.
This document provides an overview of common algorithms and data structures. It begins by defining an algorithm and discussing algorithm analysis techniques like asymptotic analysis. It then covers basic data structures like arrays, linked lists, stacks, queues, and trees. It explains various sorting algorithms like selection sort, bubble sort, insertion sort, quicksort, and mergesort. It also discusses search techniques like binary search and binary search trees. Other topics covered include priority queues, hashing, graphs, graph traversal, minimum spanning trees, shortest paths, and string searching algorithms.
String searching: given two strings P and T over the same alphabet Σ, determine whether P occurs as a substring of T (or find the position(s) at which P occurs as a substring of T). The strings P and T are called the pattern and the target, respectively.
The document discusses two string matching algorithms: Knuth-Morris-Pratt (KMP) and Boyer-Moore (BM). KMP runs in linear time but checks every character, while BM can be sub-linear as it does not need to check every character. The document provides pseudocode examples and comparisons of the running times of KMP and BM on different length patterns, finding that BM performs best on longer patterns while KMP has better performance on shorter patterns. Both algorithms have widespread applications in areas like web search engines, spam filters, and natural language processing.
The document discusses the Boyer-Moore string searching algorithm. It works by preprocessing the pattern string and comparing characters from right to left. If a mismatch occurs, it uses two heuristics - bad character and good suffix - to determine the shift amount. The bad character heuristic shifts past mismatching characters, while the good suffix heuristic looks for matching suffixes to allow larger shifts. The algorithm generally gets faster as the pattern length increases, running in sub-linear time on average. It has applications in tasks like virus scanning and database searching that require high-speed string searching.
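A minimal sketch of the bad-character heuristic described above, comparing right to left and shifting past mismatches (the good-suffix rule is omitted for brevity; function names are illustrative):

```python
def bad_character_table(pattern):
    # Rightmost occurrence of each character in the pattern.
    return {ch: i for i, ch in enumerate(pattern)}

def bm_search(text, pattern):
    """Boyer-Moore matching using only the bad-character heuristic."""
    last = bad_character_table(pattern)
    m, n = len(pattern), len(text)
    matches, s = [], 0
    while s <= n - m:
        j = m - 1
        while j >= 0 and pattern[j] == text[s + j]:  # compare right to left
            j -= 1
        if j < 0:
            matches.append(s)
            s += 1
        else:
            # Shift so the rightmost occurrence of the mismatched text
            # character lines up under it (always at least one position).
            s += max(1, j - last.get(text[s + j], -1))
    return matches
```

On a mismatch against a character absent from the pattern, the window jumps past it entirely, which is what makes the algorithm sub-linear on average for longer patterns.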
This document discusses different pattern recognition algorithms that could be implemented in real-time data sets. It begins by defining pattern recognition and providing examples. It then discusses why pattern recognition is important and lists several applications. The document goes on to describe three main approaches to pattern recognition - statistical, syntactic, and neural pattern recognition - and provides examples for each. It then provides more detailed descriptions and pseudocode for several specific algorithms, including KMP, Boyer-Moore, Rabin-Karp, naive string matching, and brute-force string matching. It concludes by discussing future work improving algorithm complexity and potential applications in biometric identification.
The document summarizes and provides code examples for four pattern matching algorithms:
1. The brute force algorithm checks each character position in the text to see if the pattern starts there, running in O(mn) time in the worst case.
2. The Boyer-Moore algorithm uses a "bad character" shift and "good suffix" shift to skip over non-matching characters in the text, running faster than brute force.
3. The Knuth-Morris-Pratt algorithm uses a failure function to determine the maximum shift of the pattern on a mismatch, avoiding wasteful comparisons.
4. The failure function allows KMP to skip re-examining portions of the text, much as Boyer-Moore skips characters, running in O(n + m) time overall.
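The failure function and the skipping behaviour described above can be sketched as follows (a standard textbook formulation, not code from the summarized document):

```python
def failure_function(pattern):
    """KMP failure (prefix) function: fail[i] is the length of the
    longest proper prefix of pattern[:i+1] that is also its suffix."""
    fail = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = fail[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        fail[i] = k
    return fail

def kmp_search(text, pattern):
    """Return all start indices where pattern occurs in text, in O(n + m)."""
    fail = failure_function(pattern)
    matches, k = [], 0
    for i, ch in enumerate(text):
        while k > 0 and ch != pattern[k]:
            k = fail[k - 1]          # fall back instead of rescanning text
        if ch == pattern[k]:
            k += 1
        if k == len(pattern):        # full match ending at position i
            matches.append(i - len(pattern) + 1)
            k = fail[k - 1]          # continue searching for overlaps
    return matches
```

Because the text index i never moves backwards, every character of the text is examined at most a constant number of times, giving the linear bound.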
The document describes the Boyer-Moore string search algorithm, which improves on the naive string matching algorithm. It uses two rules - the bad character rule and good suffix rule - to skip unnecessary character comparisons, making string searches more efficient. The bad character rule uses a table to determine how far to shift the pattern when a mismatch occurs, while the good suffix rule allows reusing matches when they are found. Together these rules allow Boyer-Moore to significantly outperform the naive algorithm.
The role of materialized views is becoming vital in today's distributed data warehouses. Materialization is where parts of the data cube are pre-computed. Some real-time distributed architectures maintain materialization transparency, in the sense that users are unaware of the materialization at a node. What they usually follow is a cache maintenance mechanism in which query results are cached. When a query requesting materialization arrives at a distributed node, the node checks its cache and, if the materialization is available, answers the query. If the materialization is not available, the node forwards the query through the network until a node holding the requested materialization is found. This kind of network communication increases the number of query forwardings between nodes. The aim of this paper is to reduce these multiple redirects. We propose a new CB-pattern tree index to identify the exact distributed node where the needed materialization is available.
Genomic repeats detection using Boyer-Moore algorithm on Apache Spark Streaming - TELKOMNIKA JOURNAL
Genomic repeat detection, i.e., pattern searching over strings to find repeated base pairs in deoxyribonucleic acid (DNA) sequences, requires long processing times. This research builds a big-data computational model to look for patterns in strings by modifying and implementing the Boyer-Moore algorithm on Apache Spark Streaming for human DNA sequences from the Ensembl site. Moreover, we perform experiments on cloud computing by varying the specifications of the computer clusters and the human DNA sequence datasets involved. The results show that the proposed computational model on Apache Spark Streaming is faster than standalone computing and multicore parallel computing. Therefore, the main contribution of this research, a computational model that reduces computational costs, has been achieved.
ADAPTIVE AUTOMATA FOR GRAMMAR BASED TEXT COMPRESSION - csandit
The Internet and the ubiquitous presence of computing devices are generating a continuously growing amount of information. However, the entropy of this information is not uniform, which allows data compression algorithms to reduce the demand for more powerful processors and larger data storage equipment. This paper presents an adaptive rule-driven device, the adaptive automaton, as a means to identify repetitive patterns to be compressed in a grammar-based lossless data compression scheme.
IRJET - A Novel Approach for Motif Discovery using Data Mining Algorithm - IRJET Journal
This document presents a novel approach for motif discovery using data mining algorithms. It proposes the MDWB algorithm which uses a word-based approach and MapReduce functions to identify motifs from large ChIP-seq datasets. The algorithm labels control and test datasets to determine thresholds and extracts frequently occurring substrings. MapReduce is used to reduce memory and time costs during the mining stage. Experimental results on ChIP-seq datasets show the MDWB algorithm achieves higher accuracy and runs faster than previous methods for identifying transcription factor binding sites.
Commentz-Walter: Any Better than Aho-Corasick for Peptide Identification? - IJORCS
An algorithm for locating all occurrences of a finite number of keywords in an arbitrary string, also known as multiple string matching, is commonly required in information retrieval (such as sequence analysis, evolutionary biological studies, gene/protein identification, and network intrusion detection) and text editing applications. Although Aho-Corasick has been one of the most commonly used exact multiple string matching algorithms, Commentz-Walter has been introduced as a better alternative in the recent past. The Commentz-Walter algorithm combines ideas from both Aho-Corasick and Boyer-Moore. Large-scale, rapid, and accurate peptide identification is critical in computational proteomics. In this paper, we critically analyze the time complexity of Aho-Corasick and Commentz-Walter for their suitability in large-scale peptide identification. According to the results obtained for our dataset, we conclude that Aho-Corasick performs better than Commentz-Walter, contrary to common belief.
This document summarizes a research paper that proposes a new algorithm called Constraint Based Frequent Motif Mining (CBFMM) to efficiently find frequent patterns in biological sequence data in the presence of variations. CBFMM uses a frequent pattern tree structure to flexibly handle different similarity definitions and can be used for protein and nucleotide pattern mining to identify potential malfunctions and diseases. The paper describes related work on sequence mining algorithms and motif finding approaches. It then presents the CBFMM algorithm and experimental results demonstrating its effectiveness at mining biological domains for cryptic sequence repeats compared to other methods.
An Application of Pattern Matching for Motif Identification - CSCJournals
Pattern matching is one of the central and most widely studied problems in theoretical computer science. Solutions to the problem play an important role in many areas of science and information processing, and its performance has great impact on many applications, including database querying, text processing, and DNA sequence analysis. In general, pattern matching algorithms differ in the shift value, the direction of the sliding window, and the order in which comparisons are made; performance can be enhanced to a great extent by a larger shift value obtained with fewer comparisons. In this paper we propose an algorithm for finding motifs in DNA sequences. The algorithm preprocesses the pattern string (motif) by considering the four consecutive nucleotides of the DNA that immediately follow the aligned pattern window in the event of a mismatch between the pattern (motif) and the DNA sequence. Theoretically, we find that the proposed algorithm works efficiently for motif identification.
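The idea of consulting the characters just past the alignment window on a mismatch resembles the Quick Search (Sunday) algorithm; a one-character simplification of that lookahead (the paper itself uses four consecutive nucleotides) might look like this:

```python
def quick_search(text, pattern):
    """Quick Search (Sunday) matching: on a mismatch, the shift is
    driven by the text character immediately following the current
    window, a one-character simplification of the paper's
    four-nucleotide lookahead idea."""
    m, n = len(pattern), len(text)
    # shift[c] = m - rightmost index of c in pattern; default m + 1
    shift = {ch: m - i for i, ch in enumerate(pattern)}
    matches, s = [], 0
    while s <= n - m:
        if text[s:s + m] == pattern:
            matches.append(s)
        if s + m >= n:
            break
        s += shift.get(text[s + m], m + 1)
    return matches
```

When the following character does not occur in the pattern at all, the window jumps by m + 1 positions, which is what gives large shift values with very few comparisons.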
This document discusses sequence alignment, which involves arranging biological sequences like DNA, RNA, or proteins to identify regions of similarity. It covers the basic concepts of sequence alignment including global versus local alignment and different methods like dot matrix, dynamic programming, and word-based approaches. Dynamic programming is highlighted as the most common algorithm that uses a scoring system to find the optimal alignment between two sequences.
A Literature Review on Plagiarism Detection in Computer Programming Assignments - IRJET Journal
This document summarizes research on plagiarism detection in computer programming assignments. It provides an abstract of the research topic and reviews several existing approaches for detecting plagiarism, including similarity-based, logic-based, and machine learning-based methods. Feature-based, string-matching, and algorithm-based techniques are discussed. The review identifies areas for further development, such as improving accuracy and supporting additional programming languages. The goal is to determine the best algorithm and methodology for developing a system to detect plagiarism in code.
International Journal of Computer Science, Engineering and Information Techno... - IJCSEIT Journal
In the field of proteomics, as more data is added, computational methods need to become more efficient. The parts of molecular sequences that are functionally more important to the molecule are more resistant to change. To ensure the reliability of sequence alignment, comparative approaches are used. The problem of multiple sequence alignment is a proposition about evolutionary history: for each column in the alignment, the explicit homologous correspondence of each individual sequence position is established. The different pairwise sequence alignment methods are elaborated in the present work, but these methods are only suitable for aligning a limited number of sequences of small length. For aligning sequences based on local alignment with consensus sequences, a new method is introduced. Triticum wheat varieties are loaded from the NCBI databank. Phylogenetic trees are constructed for partitions of the dataset, and a single new tree is constructed from the previously generated trees using an advanced pruning technique. Then, closely related sequences are extracted by applying threshold conditions, and by using shift operations in both directions an optimal sequence alignment is obtained.
Sequence alignment involves arranging DNA, RNA, or protein sequences to identify similar regions and discover functional, structural, and evolutionary relationships. It compares a reference sequence to a query sequence. Alignments reveal regions of similarity that unlikely occurred by chance and may indicate common ancestry. Global alignment looks for conserved regions across full sequences while local alignment finds local matches between subsequences. Pairwise alignment involves two sequences while multiple sequence alignment handles three or more. Dynamic programming and word methods are common algorithmic approaches to sequence alignment.
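The dynamic programming approach mentioned above can be sketched with the Needleman-Wunsch global alignment recurrence; the match/mismatch/gap weights below are illustrative, not a standard substitution matrix:

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment score via dynamic programming (Needleman-Wunsch).
    score[i][j] is the best score aligning a[:i] with b[:j]."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):          # align a[:i] against an empty prefix
        score[i][0] = i * gap
    for j in range(1, m + 1):          # align b[:j] against an empty prefix
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = score[i-1][j-1] + (match if a[i-1] == b[j-1] else mismatch)
            score[i][j] = max(diag,                 # match or substitute
                              score[i-1][j] + gap,  # gap in b
                              score[i][j-1] + gap)  # gap in a
    return score[n][m]
```

Local (Smith-Waterman) alignment uses the same table but clamps each cell at zero and takes the maximum over the whole matrix, which is what restricts the result to locally similar subsequences.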
Congestion Control in Wireless Sensor Networks Using Genetic Algorithm - Editor IJCATR
A sensor network consists of a large number of small nodes that interact strongly with the physical environment, take environmental data through sensors, and react after processing the information. Wireless network technologies are widely used in most applications. As wireless sensor networks carry heavy information-transmission activity, network congestion cannot be avoided, so new methods are needed to control congestion and use existing resources to serve traffic demands better. Congestion increases packet loss and retransmission of dropped packets, and also wastes energy. In this paper, a novel method is presented for congestion control in wireless sensor networks using a genetic algorithm. Simulation results show that the proposed method, in comparison with the LEACH algorithm, can significantly improve congestion control at high speeds.
IJRET : International Journal of Research in Engineering and Technology is an international peer reviewed, online journal published by eSAT Publishing House for the enhancement of research in various disciplines of Engineering and Technology. The aim and scope of the journal is to provide an academic medium and an important reference for the advancement and dissemination of research results that support high-level learning, teaching and research in the fields of Engineering and Technology. We bring together Scientists, Academician, Field Engineers, Scholars and Students of related fields of Engineering and Technology.
Comparative analysis of dynamic programming algorithms to find similarity in ... - eSAT Journals
Abstract: Many computational methods exist for finding similarity between gene sequences, but identifying a method that gives optimal similarity is a difficult task. The objective of this project is to find an appropriate method to compute similarity in gene/protein sequences, both within families and across families. Dynamic programming algorithms such as Levenshtein edit distance, Longest Common Subsequence, and Smith-Waterman have been used to find similarities between two sequences, but none of the methods mentioned above have used real benchmark datasets; they have applied dynamic programming only to synthetic data. We propose a new method to compute similarity. The performance of the proposed algorithm is evaluated using a number of datasets from various families, and the similarity value is calculated both within a family and across families. A comparative analysis and the time complexity of the proposed method reveal that the Smith-Waterman approach is appropriate when gene/protein sequences belong to the same family, and Longest Common Subsequence is best suited when the sequences belong to two different families. Keywords: Bioinformatics, Gene, Gene Sequencing, Edit distance, String Similarity.
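Two of the dynamic-programming measures compared in the paper can be sketched as follows (standard textbook formulations, not the paper's own code):

```python
def lcs_length(a, b):
    """Longest Common Subsequence length via dynamic programming."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            # Extend a common subsequence on a match, else carry the best so far.
            dp[i][j] = dp[i-1][j-1] + 1 if x == y else max(dp[i-1][j], dp[i][j-1])
    return dp[-1][-1]

def levenshtein(a, b):
    """Levenshtein edit distance (insert/delete/substitute, unit cost),
    using a rolling row to keep memory at O(len(b))."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cost = 0 if x == y else 1
            cur.append(min(prev[j] + 1,        # delete from a
                           cur[j-1] + 1,       # insert into a
                           prev[j-1] + cost))  # substitute or match
        prev = cur
    return prev[-1]
```

A higher LCS length indicates greater similarity, while a lower edit distance does; Smith-Waterman differs from both by scoring local rather than global correspondence.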
Static Load Balancing of Parallel Mining Efficient Algorithm with PBEC in Fre... - IRJET Journal
This document presents a proposed static load balancing method for parallel mining of frequent sequences. It partitions the set of all frequent sequences into disjoint prefix-based equivalence classes (PBECs). It estimates the relative execution time of each PBEC using the sequential PrefixSpan algorithm on sampling data sets. The PBECs are then assigned to processors based on these estimated execution times to balance the computational load. The goal is to develop an efficient parallel frequent sequence mining algorithm that can scale to large datasets through static load balancing of computations across multiple processors.
Application of support vector machines for prediction of anti hiv activity of... - Alexander Decker
This document describes a study that used support vector machines (SVM) to develop a quantitative structure-activity relationship (QSAR) model to predict the anti-HIV activity of TIBO derivatives. The SVM model achieved high correlation (q2=0.96) and low error (RMSE=0.212), outperforming artificial neural networks and multiple linear regression models developed on the same data set. The results indicate that SVM is a valuable tool for QSAR modeling and predicting anti-HIV activity of chemical compounds.
Data mining project topics for Java and .NET - redpel dot com
This document discusses several papers related to data mining and machine learning techniques. It begins with a brief summary of each paper, discussing the key contributions and findings. The summaries cover topics such as differential privacy-preserving data anonymization, fault detection in power systems using decision trees, temporal pattern searching in event data, high dimensional indexing for similarity search, landmark-based approximate shortest path computation, feature selection for high dimensional data, temporal pattern mining in data streams, data leakage detection, keyword search in spatial databases, analyzing relationships on Wikipedia, improving recommender systems using user-item subgroups, decision trees for uncertain data, and building confidential query services in the cloud using data perturbation.
Bioinformatics may be defined as the field of science in which biology, computer science, and information technology merge to form a single discipline. Its ultimate goal is to enable the discovery of new biological insights, as well as to create a global perspective from which unifying principles in biology can be discerned, by means of bioinformatics tools for storing, retrieving, organizing, and analyzing biological data. Most of these tools possess very distinct features and capabilities, making a direct comparison difficult. In this paper we propose a taxonomy for characterizing bioinformatics tools and briefly survey the major tools in each category. Hopefully this study will help other designers and experienced end users understand the details of particular tool categories and tools, enabling them to make the best choices for their particular research interests.
A Survey of String Matching Algorithms - IJERA Editor
String matching algorithms play an important role in finding the places where one or several strings (patterns) occur within a larger body of text (e.g., a data stream, a sentence, a paragraph, a book, etc.). Their applications cover a wide range, including intrusion detection systems (IDS) in computer networks, bioinformatics, plagiarism detection, information security, pattern recognition, document matching, and text mining. In this paper we present a short survey of well-known, recently updated, and hybrid string matching algorithms. These algorithms can be divided into two major categories, known as exact string matching and approximate string matching. The classification criteria were selected to highlight important features of the matching strategies, in order to identify challenges and vulnerabilities.
Performance Improvement of BLAST with Use of MSA Techniques to Search Ancesto... - journal ijrtem
BLAST is a process in which, instead of comparing the whole query sequence against the database sequences, the query sequence is broken into small words that are then used to align patterns. It uses a heuristic method that makes it faster than the earlier Smith-Waterman algorithm, but because small query words are used for alignment, it may perform poorly on very large databases with complex queries. To remove this drawback, we suggest using MSA tools to filter the database by removing unnecessary sequences from the data. This sorted dataset is then fed to BLAST, which can then identify relationships among the sequences, i.e., HOMOLOGS, ORTHOLOGS, and PARALOGS. The proposed system can further be used to find the relation between two individuals or to create a family tree. Orthology is interesting for a wide range of bioinformatics analyses, including functional annotation, phylogenetic inference, and genome evolution. This system describes and motivates an algorithm for predicting orthologous relationships among complete genomes. The algorithm takes a pairwise approach, requiring neither tree reconstruction nor reconciliation.
Similar to A Novel Framework for Short Tandem Repeats (STRs) Using Parallel String Matching (20)
Presentation of IEEE Slovenia CIS (Computational Intelligence Society) Chapte... - University of Maribor
Slides from talk presenting:
Aleš Zamuda: Presentation of IEEE Slovenia CIS (Computational Intelligence Society) Chapter and Networking.
Presentation at IcETRAN 2024 session:
"Inter-Society Networking Panel GRSS/MTT-S/CIS
Panel Session: Promoting Connection and Cooperation"
IEEE Slovenia GRSS
IEEE Serbia and Montenegro MTT-S
IEEE Slovenia CIS
11TH INTERNATIONAL CONFERENCE ON ELECTRICAL, ELECTRONIC AND COMPUTING ENGINEERING
3-6 June 2024, Niš, Serbia
Optimizing Gradle Builds - Gradle DPE Tour Berlin 2024 - Sinan KOZAK
Sinan from the Delivery Hero mobile infrastructure engineering team shares a deep dive into performance acceleration with Gradle build cache optimizations. Sinan shares their journey into solving complex build-cache problems that affect Gradle builds. By understanding the challenges and solutions found in our journey, we aim to demonstrate the possibilities for faster builds. The case study reveals how overlapping outputs and cache misconfigurations led to significant increases in build times, especially as the project scaled up with numerous modules using Paparazzi tests. The journey from diagnosing to defeating cache issues offers invaluable lessons on maintaining cache integrity without sacrificing functionality.
International Conference on NLP, Artificial Intelligence, Machine Learning an... - gerogepatton
International Conference on NLP, Artificial Intelligence, Machine Learning and Applications (NLAIM 2024) offers a premier global platform for exchanging insights and findings in the theory, methodology, and applications of NLP, Artificial Intelligence, Machine Learning, and their applications. The conference seeks substantial contributions across all key domains of NLP, Artificial Intelligence, Machine Learning, and their practical applications, aiming to foster both theoretical advancements and real-world implementations. With a focus on facilitating collaboration between researchers and practitioners from academia and industry, the conference serves as a nexus for sharing the latest developments in the field.
Using recycled concrete aggregates (RCA) for pavements is crucial to achieving sustainability. Implementing RCA for new pavement can minimize carbon footprint, conserve natural resources, reduce harmful emissions, and lower life cycle costs. Compared to natural aggregate (NA), RCA pavement has been the subject of fewer comprehensive studies and sustainability assessments.
KuberTENes Birthday Bash Guadalajara - K8sGPT first impressionsVictor Morales
K8sGPT is a tool that analyzes and diagnoses Kubernetes clusters. This presentation was used to share the requirements and dependencies to deploy K8sGPT in a local environment.
Electric vehicle and photovoltaic advanced roles in enhancing the financial p... - IJECEIAES
Climate change's impact on the planet has forced the United Nations and governments to promote green energy and electric transportation. Deployments of photovoltaic (PV) and electric vehicle (EV) systems have gained stronger momentum due to their numerous advantages over fossil fuels, advantages that go beyond sustainability to include financial support and stability. The work in this paper introduces a hybrid system between PV and EV to support industrial and commercial plants. The paper covers the theoretical framework of the proposed hybrid system, including the equations required to complete the cost analysis when PV and EV are present. In addition, the proposed design diagram, which sets the priorities and requirements of the system, is presented. The proposed approach allows setups to improve their power stability, especially during power outages. The presented information supports researchers and plant owners in completing the necessary analysis while promoting the deployment of clean energy. The results of a case study representing a dairy farm support the theoretical work and highlight its benefits to existing plants. The short return on investment supports the novelty of the proposed approach for a sustainable electrical system. In addition, the proposed system allows for an isolated power setup without the need for a transmission line, which enhances the safety of the electrical network.
DEEP LEARNING FOR SMART GRID INTRUSION DETECTION: A HYBRID CNN-LSTM-BASED MODEL - gerogepatton
As digital technology becomes more deeply embedded in power systems, protecting the communication
networks of Smart Grids (SG) has emerged as a critical concern. Distributed Network Protocol 3 (DNP3)
represents a multi-tiered application layer protocol extensively utilized in Supervisory Control and Data
Acquisition (SCADA)-based smart grids to facilitate real-time data gathering and control functionalities.
Robust Intrusion Detection Systems (IDS) are necessary for early threat detection and mitigation because
of the interconnection of these networks, which makes them vulnerable to a variety of cyberattacks. To
solve this issue, this paper develops a hybrid Deep Learning (DL) model specifically designed for intrusion
detection in smart grids. The proposed approach is a combination of the Convolutional Neural Network
(CNN) and Long Short-Term Memory (LSTM) algorithms. We employed a recent intrusion detection
dataset (DNP3), which focuses on unauthorized commands and Denial of Service (DoS) cyberattacks, to
train and test our model. The results of our experiments show that our CNN-LSTM method is much better
at finding smart grid intrusions than other deep learning algorithms used for classification. In addition,
our proposed approach improves accuracy, precision, recall, and F1 score, achieving a high detection
accuracy rate of 99.50%.
Embedded machine learning-based road conditions and driving behavior monitoring - IJECEIAES
Car accident rates have increased in recent years, resulting in losses in human lives, properties, and other financial costs. An embedded machine learning-based system is developed to address this critical issue. The system can monitor road conditions, detect driving patterns, and identify aggressive driving behaviors. The system is based on neural networks trained on a comprehensive dataset of driving events, driving styles, and road conditions. The system effectively detects potential risks and helps mitigate the frequency and impact of accidents. The primary goal is to ensure the safety of drivers and vehicles. Collecting data involved gathering information on three key road events: normal street and normal drive, speed bumps, circular yellow speed bumps, and three aggressive driving actions: sudden start, sudden stop, and sudden entry. The gathered data is processed and analyzed using a machine learning system designed for limited power and memory devices. The developed system resulted in 91.9% accuracy, 93.6% precision, and 92% recall. The achieved inference time on an Arduino Nano 33 BLE Sense with a 32-bit CPU running at 64 MHz is 34 ms and requires 2.6 kB peak RAM and 139.9 kB program flash memory, making it suitable for resource-constrained embedded systems.
Batteries -Introduction – Types of Batteries – discharging and charging of battery - characteristics of battery –battery rating- various tests on battery- – Primary battery: silver button cell- Secondary battery :Ni-Cd battery-modern battery: lithium ion battery-maintenance of batteries-choices of batteries for electric vehicle applications.
Fuel Cells: Introduction- importance and classification of fuel cells - description, principle, components, applications of fuel cells: H2-O2 fuel cell, alkaline fuel cell, molten carbonate fuel cell and direct methanol fuel cells.
Redefining brain tumor segmentation: a cutting-edge convolutional neural netw...IJECEIAES
Medical image analysis has witnessed significant advancements with deep learning techniques. In the domain of brain tumor segmentation, the ability to
precisely delineate tumor boundaries from magnetic resonance imaging (MRI)
scans holds profound implications for diagnosis. This study presents an ensemble convolutional neural network (CNN) with transfer learning, integrating
the state-of-the-art Deeplabv3+ architecture with the ResNet18 backbone. The
model is rigorously trained and evaluated, exhibiting remarkable performance
metrics, including an impressive global accuracy of 99.286%, a high-class accuracy of 82.191%, a mean intersection over union (IoU) of 79.900%, a weighted
IoU of 98.620%, and a Boundary F1 (BF) score of 83.303%. Notably, a detailed comparative analysis with existing methods showcases the superiority of
our proposed model. These findings underscore the model’s competence in precise brain tumor localization, underscoring its potential to revolutionize medical
image analysis and enhance healthcare outcomes. This research paves the way
for future exploration and optimization of advanced CNN models in medical
imaging, emphasizing addressing false positives and resource efficiency.
Comparative analysis between traditional aquaponics and reconstructed aquapon...bijceesjournal
The aquaponic system of planting is a method that does not require soil usage. It is a method that only needs water, fish, lava rocks (a substitute for soil), and plants. Aquaponic systems are sustainable and environmentally friendly. Its use not only helps to plant in small spaces but also helps reduce artificial chemical use and minimizes excess water use, as aquaponics consumes 90% less water than soil-based gardening. The study applied a descriptive and experimental design to assess and compare conventional and reconstructed aquaponic methods for reproducing tomatoes. The researchers created an observation checklist to determine the significant factors of the study. The study aims to determine the significant difference between traditional aquaponics and reconstructed aquaponics systems propagating tomatoes in terms of height, weight, girth, and number of fruits. The reconstructed aquaponics system’s higher growth yield results in a much more nourished crop than the traditional aquaponics system. It is superior in its number of fruits, height, weight, and girth measurement. Moreover, the reconstructed aquaponics system is proven to eliminate all the hindrances present in the traditional aquaponics system, which are overcrowding of fish, algae growth, pest problems, contaminated water, and dead fish.
A Novel Framework for Short Tandem Repeats (STRs) Using Parallel String Matching
1. D. Bala MuraliKrishnaInt. Journal of Engineering Research and Applications www.ijera.com
ISSN: 2248-9622, Vol. 5, Issue 9, (Part - 2) September 2015, pp.49-53
www.ijera.com 49|P a g e
A Novel Framework for Short Tandem Repeats (STRs) Using
Parallel String Matching
D. Bala MuraliKrishna, Ch. Someswara Rao.
Department of Computer Science, S.R.K.R Engineering College, Bhimavaram, AP, India.
Associate Professor, Department of Computer Science, S.R.K.R Engineering College, Bhimavaram, AP, India.
Abstract
Short tandem repeats (STRs) have become important molecular markers for a broad range of applications, such as genome mapping and characterization, phenotype mapping, marker-assisted selection of crop plants, and a range of molecular ecology and diversity studies. These repeated DNA sequences are found in both plants and bacteria. Most computer programs that find STRs fail to report the number of occurrences of the repeated pattern and its exact position, and it is difficult to obtain accurate results from larger datasets, so high-performance computing models are needed to extract such repeats. One solution is STR identification using parallel string matching, which gives the number of occurrences, with the corresponding line number and exact position, of each STR in a genome of any length. We implemented parallel string matching using Java multi-threading with multicore processing. For this we implemented a basic algorithm and compared it with previous algorithms, namely the Knuth-Morris-Pratt, Boyer-Moore and brute force string matching algorithms; the results show that our new basic algorithm outperforms the previous algorithms. We then apply this algorithm in parallel string matching using the multi-threading concept to reduce the running time on multicore processors. The test results show that multicore processing is remarkably efficient and powerful compared to lower versions, and that the proposed STR identification using parallel string matching is better than the sequential approaches.
Keywords: computing model, DNA, STR, parallel string matching, multicore processing.
I. Introduction
DNA molecules are subject to a variety of mutational events. One of the less well understood is tandem duplication, in which a stretch of DNA, which we call the pattern, is converted into two or more copies, each following the preceding one in a contiguous fashion. For example, consider the string TCGGCGGCGGA and the pattern CGG, in which a single occurrence of the triplet CGG has been transformed into three identical, adjacent copies. The result of a tandem duplication event is termed a tandem repeat. Over time, individual copies
within a tandem repeat may undergo additional,
uncoordinated mutations so that typically, only
approximate tandem copies are present. Tandem
repeats are presumed to occur frequently in genomic
sequences, comprising perhaps 10% or more of the
human genome [1]. But, accurate characterization of
the properties of tandem repeats has been limited by
the inability to easily detect them. In recent years, the
discovery of the trinucleotide repeat diseases has
piqued interest in tandem repeats.
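The tandem duplication example above (TCGGCGGCGGA with pattern CGG) can be illustrated with a short Java routine; this is our sketch for exposition, not code from the paper:

```java
public class TandemCount {
    // Maximum number of adjacent (tandem) copies of `pattern`
    // starting at any position of `text`.
    static int maxTandemCopies(String text, String pattern) {
        int best = 0;
        for (int i = 0; i < text.length(); i++) {
            int copies = 0;
            int j = i;
            while (text.startsWith(pattern, j)) { // another adjacent copy?
                copies++;
                j += pattern.length();
            }
            best = Math.max(best, copies);
        }
        return best;
    }

    public static void main(String[] args) {
        // The paper's example: CGG occurs as three adjacent copies.
        System.out.println(maxTandemCopies("TCGGCGGCGGA", "CGG")); // prints 3
    }
}
```

Real repeats are often only approximately identical, as noted above, so a practical tool must tolerate mutations; this exact-match sketch captures only the idealized case.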
STRs (short tandem repeats) are genetic loci where one or a few bases are repeated a varying number of times. Such repetitions occur primarily due to slipped-strand mispairing and subsequent error(s) during DNA replication, repair, or recombination [1]. STRs, 2–6 base pairs long, occur frequently and are ubiquitously interspersed in many genomes [2, 3]. The biological importance of STR tracts has been clearly delineated. STR loci show extensive length polymorphism, and hence they are widely used in DNA fingerprinting and diversity studies [4, 5]. They are also considered ideal genetic markers for the construction of high-density linkage maps. In spite of their high significance, a bioinformatics tool for the analysis of these regions is not available.
Available algorithms directly or indirectly detect tandem repeats. However, there are many limitations with these algorithms [6]. The drawbacks are the high computational time they require and their inability to predict the positions of STRs in the genome. Finally, researchers have to understand the classical methods of pattern matching in order to develop new efficient algorithms [7-10]. In this work, we describe a program, STRs using parallel string matching, that uses multi-threading to find STRs on multicore processors. The program locates repeats with
their exact position, line number and number of occurrences, taking a large number of files and directories as input.
A serious comparison between different tools
should, of course, be performed by a third party
against a sufficient number of carefully selected test
problems, carefully tuning many different search
parameters; meanwhile we can say that in the case of
naive runs with default parameters on easy problems,
all the above programs appear to output basically
equivalent results. With the developments of new
string matching techniques, efficiency and speed are
the main factors in deciding among different options
available for each application area [11-13]. However,
a major drawback of all such tools is that, when
running against very long sequences, they produce a
large amount of cumbersome results that require a
painstaking interpretation. In order to provide the
user with some capability of grasping the STRs at
first glance and looking over the drawbacks for large
sequencing files, we decided that the only way to do
this is by introducing the parallel programming concept
into STRs identification [14-17]. And since, as far as
we know, no tool with a parallel programming feature is
as yet available [18, 19], a novel framework for the
STRs using parallel string matching of the results
compared with sequential approach has been devised.
The STRs parallel string matching approach was developed for scanning genomes to find repeats of any length, their exact position on the chromosome and their number of occurrences. It can accept large sequences, and a large number of files and directories, up to GBs of input, can be searched simultaneously on multicore processors. It divides the files and directories among the different cores, runs over these files in parallel, and then produces the output. Thus, the running time of the program is greatly reduced. Its advantages over many other programs developed for STR identification include its ability to do parallel programming using multi-threading with multicore processing, to search for patterns repeated a number of times in large input files, and to give the exact position and number of occurrences of the pattern in the genome.
II. Methodology
Our goal is to efficiently search for STRs of a given pattern in large genomes. The proposed methodology has two phases, a sequential and a parallel approach. We model a basic algorithm which gives better results than previous algorithms like Boyer-Moore, KMP and brute force. In the first phase we apply it in a sequential approach, and in the second phase the basic algorithm is applied to parallel string matching using multi-threading with the multicore processing concept. Finally, we compare Boyer-Moore, KMP and brute force with our basic algorithm in the sequential approach, and then compare the sequential and parallel approaches of the basic algorithm.
The methodology comprises two phases.
Phase 1: STR using the sequential string matching approach
The sequential string matching approach uses the basic algorithm, which is better than the previous algorithms; it processes the input file line by line and runs in a sequential manner to find the repeats. The algorithm works on the Improved Right Prefix concept; the example below demonstrates its working.
Figure 1: Example for Sequential pattern searching
Phase 2: STR using the parallel string matching approach
In the second phase, "Short Tandem Repeats (STRs) Using Parallel String Matching", we apply the above basic algorithm to this concept. The input is first taken as a string or text; the text to be searched is divided into smaller patterns or processes, all of which are passed to the basic algorithm, which produces the number of occurrences of the repeated pattern as the result, as shown in Fig. 2 below.
Figure 2: Parallel approach using Multi-Threading
concept
(Figure 1 example: text G G C C G A G A A G A T A, pattern G A T C)
Because threads can operate independently on different areas of an array in this algorithm, there is a clear performance boost on multicore architectures compared to a sequential algorithm that iterates over each process in the array. It uses the parallel execution of multiple cores at a time, which yields a speedup that scales with the number of cores with minimal effort, because the multi-threading takes care of maximizing parallelism.
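This division of work can be sketched with a fixed thread pool; the snippet below is our illustration of the idea (one task per line, per-line counts summed at the end), not the authors' implementation:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelStrSearch {
    // Occurrences of `pattern` in one line (overlapping matches included).
    static int countInLine(String line, String pattern) {
        int count = 0;
        for (int i = line.indexOf(pattern); i >= 0; i = line.indexOf(pattern, i + 1))
            count++;
        return count;
    }

    // Distribute the lines of the input among worker threads and sum the
    // per-line counts, mirroring the division of files/lines across cores.
    static int parallelCount(List<String> lines, String pattern, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Integer>> futures = new ArrayList<>();
            for (String line : lines)
                futures.add(pool.submit(() -> countInLine(line, pattern)));
            int total = 0;
            for (Future<Integer> f : futures)
                total += f.get();
            return total;
        } catch (InterruptedException | ExecutionException e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<String> genome = List.of("TCGGCGGCGGA", "GGCCGAGAAGATA");
        System.out.println(parallelCount(genome, "CGG", 4)); // prints 3
    }
}
```

Each line is an independent task, so no synchronization beyond the futures is needed; the pool size would be set to the number of available cores in practice.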
The flowchart of the Improved Right Prefix algorithm is shown in Fig. 3. It scans the genome sequence line by line for the given input file and processes each line for the given pattern length; e.g., ATGCATGCATGC is a nucleotide sequence with pattern count = 3. If an STR is found, it identifies the position of the pattern, increments the count value and searches for the next STR; if not, it reads the next line of the sequence. The process continues until all the given patterns are found, and it then reports the number of occurrences of the pattern with line number and position.
Figure 3: Flowchart of Improved Right Prefix
algorithm
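The scan in the flowchart can be sketched as follows; since the paper does not give the Improved Right Prefix code, this illustration uses plain substring search to show the reported line number, position and count:

```java
import java.util.ArrayList;
import java.util.List;

public class LineScan {
    // Report every occurrence of `pattern` as "line L, position P",
    // scanning the sequence line by line as the flowchart describes.
    static List<String> scan(List<String> lines, String pattern) {
        List<String> matches = new ArrayList<>();
        for (int ln = 0; ln < lines.size(); ln++) {
            String line = lines.get(ln);
            for (int i = line.indexOf(pattern); i >= 0; i = line.indexOf(pattern, i + 1))
                matches.add("line " + (ln + 1) + ", position " + i);
        }
        return matches;
    }

    public static void main(String[] args) {
        // The paper's example: ATGCATGCATGC has pattern count = 3 for ATGC.
        List<String> m = scan(List.of("ATGCATGCATGC"), "ATGC");
        System.out.println("occurrences = " + m.size()); // occurrences = 3
        m.forEach(System.out::println);
    }
}
```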
III. Results and Discussion
The sequential approach gives better results with the basic algorithm than with the previous algorithms Boyer-Moore, KMP and brute force, with additional features like the exact position with line number and the number of occurrences. This basic algorithm further reduces the time when applied in parallel string matching using multi-threading with multicore processing; the results are illustrated in the histograms below. The sequential approach took 80 ms, while parallel string matching using the multi-threading approach took 40 ms.
Fig. 4 shows execution time vs. file size for sequential search with the Brute Force and Improved Right Prefix algorithms. This graph shows the performance difference between the two; from it we clearly observe that IRP is better than Brute Force.
Figure 4: Histogram for Brute Force vs IRP (Time vs File size)
Fig. 5 shows execution time vs. file size for sequential search with the KMP and Improved Right Prefix algorithms. This graph shows the performance difference between the two; from it we clearly observe that IRP is better than KMP. Fig. 6 shows execution time vs. file size for the Boyer-Moore and IRP algorithms on sequential search. From the graph we clearly observe that IRP is much better than Boyer-Moore.
Figure 5: Histogram for KMP vs IRP (Time vs File size)
The most crucial stage is achieving a successful new system and giving users confidence that it will work efficiently and effectively; the system will be implemented only after thorough testing and only if it is found to work according to the specification. For testing our proposed system we take a gene sequence dataset consisting of the four nucleotides A, C, G and T (standing for adenine, cytosine, guanine, and thymine, respectively) used to encode DNA.
Figure 6: Histogram for Boyer Moore vs IRP (Time
vs File size)
Therefore, the alphabet is Σ = {A, C, G, T}. The text consists of 750,000 records. Our algorithm was tested on different numbers of cores, such as 2, 4, 8 and 12, as shown in the table in Fig. 8. Here we present the results we developed and observed; our system shows that the parallel approach is much better than the sequential approach on a multicore processor. Fig. 7 shows a histogram of execution time vs. file size for sequential search and parallel search with the Improved Right Prefix. From Figs. 4, 5 and 6 we clearly observe that IRP is better than the other approaches.
Figure 7: Histogram for Sequential vs Parallel
approach
Figure 8: Table of parallel execution times with different cores

Number of cores    Parallel execution time (ms)
2                  11026
4                  8329
8                  4208
12                 2876

IV. Conclusion
In this paper we performed a comparison between the Knuth-Morris-Pratt, Boyer-Moore and brute force string matching algorithms and our Improved Right Prefix algorithm in the sequential approach, based on running time; in our tests with multicore processing we used strings of varying lengths and texts of varying lengths. The test results show that the Improved Right Prefix algorithm is extremely efficient in all cases. We applied this algorithm in parallel string matching using the multi-threading concept to reduce the time by running on multicore processors. We conclude that STR identification using parallel string matching is the most efficient compared to earlier versions. As a future enhancement, we will apply this algorithm to the fork/join concept of Java SE 7 for parallel string matching, thereby finding the most efficient algorithm, which can be used in many fields such as cryptography and molecular biology. Thus the problem of matching becomes easier.
References
[1] Angelika Merkel and Neil Gemmell, "Detecting short tandem repeats from genome data: opening the software black box", Vol. 9, No. 5, pp. 355-366, 2008.
[2] Gary Benson, "Tandem Repeats Finder: a program to analyze DNA sequences", Nucleic Acids Research, Vol. 27, No. 2, pp. 573-580, 1999.
[3] Karen Norrgard (Write Science Right), "Forensics, DNA fingerprinting, and CODIS", Nature Education 1(1):35, 2008.
[4] Kathleen McNamara-Schroeder, Cheryl Olonan, Simon Chu, Maria C. Montoya, Mahta Alviri, Shannon Ginty, and John J. Love, "DNA Fingerprint Analysis of Three Short Tandem Repeat (STR) Loci for Biochemistry and Forensic Science Laboratory Courses", Vol. 34, No. 5, pp. 378-383, 2006.
[5] Koichiro Doi, Taku Monjo, Pham H. Hoang, Jun Yoshimura, Hideaki Yurino, Jun Mitsui, Hiroyuki Ishiura, Yuji Takahashi, Yaeko Ichikawa, Jun Goto, Shoji Tsuji, and Shinichi Morishita, "Rapid detection of expanded short tandem repeats in personal genomics using hybrid sequencing", Bioinformatics, 30(6): 815-822, 2014.
[6] Tamanna Anwar and Asad U. Khan, "SSRscanner: a program for reporting distribution and exact location of simple sequence repeats", 2006.
[7] Jonathan L., "Analysis of Fundamental Exact and Inexact Pattern Matching Algorithms", Technical Document, Stanford University, 2004.
[8] Chinta Someswararao, K. Butchiraju, S. Viswanadha Raju, "Recent Advancements in Parallel Algorithms for String Matching on Computing Models - A Survey and Experimental Results", LNCS, Springer, pp. 270-278, ISBN: 978-3-642-29279-8, 2011.
[9] Chinta Someswararao, K. Butchiraju, S. Viswanadha Raju, "PDM Data Classification from STEP - An Object Oriented String Matching Approach", IEEE Conference on Application of Information and Communication Technologies, pp. 1-9, ISBN: 978-1-61284-831-0, 2011.
[10] Chinta Someswararao, K. Butchiraju, S. Viswanadha Raju, "Recent Advancements in Parallel Algorithms for String Matching - A Survey and Experimental Results", IJAC, Vol. 4, Issue 4, pp. 91-97, 2012.
[11] Simon Y. and Inayatullah M., "Improving Approximate Matching Capabilities for Meta Map Transfer Applications", Proceedings of the Symposium on Principles and Practice of Programming in Java, pp. 143-147, 2004.
[12] Chinta Someswararao, K. Butchiraju, S. Viswanadha Raju, "Parallel Algorithms for String Matching Problem based on Butterfly Model", IJCST, Vol. 3, Issue 3, pp. 41-56, July-Sept, ISSN 2229-4333, 2012.
[13] Chinta Someswararao, K. Butchiraju, S. Viswanadha Raju, "Recent Advancements in String Matching Algorithms - A Survey and Experimental Results", IJCIS, Vol. 6, No. 3, pp. 56-61, 2013.
[14] Geist, A., Beguelin, A., Dongarra, J., Jiang, W., Manchek, R. & Sunderam, V., "PVM: Parallel Virtual Machine - A User's Guide and Tutorial for Networked Parallel Computing", MIT Press, 1994.
[15] Grama, A., Karypis, G., Kumar, V. & Gupta, A., "Introduction to Parallel Computing", Addison Wesley, 2003.
[16] Leow, Y., Ng, C. & Wong, W. F., "Generating hardware from OpenMP programs", International Conference on Field Programmable Technology, pp. 73-80, 2006.
[17] Marr, D. T., Binns, F., Hill, D. L., Hinton, G., Koufaty, D. A., Miller, J. A. & Upton, M., "Hyper-Threading Technology Architecture and Microarchitecture", Intel Technology Journal, pp. 4-15, 2002.
[18] Donald Knuth, James H. Morris, Jr. and Vaughan Pratt, "Fast pattern matching in strings", SIAM Journal on Computing, pp. 323-350, 1977.
[19] Z. Galil, "Optimal parallel algorithms for string matching", in Proc. 16th Annu. ACM Symposium on Theory of Computing, pp. 240-248, 1984.