Approaches to DNA sequence similarity analysis have been based on the representation and frequency of sequence components; however, position within the sequence is also important information, and insufficient information in sequence representations is a major cause of poor similarity results. Based on three classifications of the DNA bases according to their chemical properties, the frequencies and average positions of group mutations are grouped into two twelve-component vectors, and the Euclidean distances among the introduced vectors are applied to compare the coding sequences of the first exon of the beta-globin gene of 11 species.
The proposed method addresses these issues by developing a novel classification algorithm that combines the Gene Expression Graph (GEG) with the Manhattan distance. This method is used to express gene expression data. The Gene Expression Graph provides an optimal view of the relationship between normal and unhealthy genes. The method of using a graph-based representation to express gene expression information was first offered by the authors in [1] and [2]; it permits constructing a classifier based on an association between graphs representing well-known classes and graphs representing the samples to be evaluated. Additionally, Euclidean distance is used to measure the strength of the relationship that exists between the genes.
Pairwise Sequence Alignment between HBV and HCC Using Modified Needleman-Wuns... — TELKOMNIKA JOURNAL
This paper aims to find the similarity of Hepatitis B virus (HBV) and Hepatocellular Carcinoma (HCC) DNA sequences, an important task in bioinformatics. Similarity between aligned sequences indicates that they have similar chemical and physical properties. Mutation of the viral DNA in the X region has a potential role in HCC; it is observed using pairwise sequence alignment of genotype A of HBV. The complexity of DNA sequence alignment using dynamic programming (the Needleman-Wunsch algorithm) is very high. Therefore, the purpose is to modify the Needleman-Wunsch algorithm for optimal global DNA sequence alignment. The main idea is to optimize the matrix-filling and backtracking processes over the DNA components. The method can also align sequence pairs of different lengths.
The research is applied to the DNA sequences of 858 hepatitis B viruses and 12 carcinoma patients, yielding 10,296 sequence pairs. The pairs are aligned globally using the proposed method, achieving a high similarity of 96.547% and validity of 99.854%. Furthermore, the method reduces the complexity of the original Needleman-Wunsch algorithm: computational time by 34.6% and space complexity by 42.52%.
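The paper's modified variant is not reproduced here, but the classic Needleman-Wunsch recurrence it builds on (fill a scoring matrix, then backtrack) can be sketched as follows; the match/mismatch/gap values are illustrative assumptions, not the paper's scoring scheme:

```python
# Minimal sketch of classic Needleman-Wunsch global alignment
# (not the paper's modified version); scoring values are assumed.
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    n, m = len(a), len(b)
    # Fill the scoring matrix F.
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = i * gap
    for j in range(1, m + 1):
        F[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            F[i][j] = max(F[i - 1][j - 1] + s,
                          F[i - 1][j] + gap,
                          F[i][j - 1] + gap)
    # Backtrack to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + \
                (match if a[i - 1] == b[j - 1] else mismatch):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and F[i][j] == F[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return F[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))

score, x, y = needleman_wunsch("GATTACA", "GCATGCU")
```

The quadratic matrix fill is exactly the cost the abstract's modification targets.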
Disintegration of the small world property with increasing diversity of chemi... — N. Sukumar
Authors: Ganesh Prabhu, Sudeepto Bhattacharya, Michael Krein, N. Sukumar (ORCID: 0000-0002-2724-9944). Full paper in J. Math. Chem. 54(10), 1916-1941 (2016).
Particle Swarm Optimization for Gene Cluster Identification — Editor IJCATR
The understanding of gene regulation is the most basic need for the classification of genes within a DNA molecule. These genes are grouped together into clusters, also known as transcription units, for the purpose of constructing and regulating gene expression and protein synthesis. This knowledge further contributes essential information to the process of drug design and to determining the protein functions of newly sequenced genomes. It is possible to use the diverse biological information across multiple genomes as input to the classification problem. The purpose of this work is to show that Particle Swarm Optimization may provide more efficient classification than other algorithms. To validate the approach, the complete E. coli genome is taken as the benchmark genome.
A Study on DNA based Computation and Memory Devices — Editor IJCATR
The present study delineates Deoxyribonucleic Acid (DNA) based computing and storage devices, which have a promising future in the vast era of information technology. The traditional devices mostly in use are made of silicon; they are costly and have physical limitations that cause electron leakage and circuit shorting. There is thus a need for materials capable of fast processing and vast memory storage. DNA, a biomolecule, has all these characteristics and is capable of providing ample storage. In classical computing devices, electronic logic gates are the elements that allow storing and transforming information; designing an appropriate sequence or net of "store" and "transform" operations (in the sense of building a device or writing a program) is equivalent to preparing a computation. In DNA-based computation the situation is analogous. The main difference is the type of computing device, since in this new method of computing DNA molecules are deployed for processing data instead of electronic gates. Moreover, the inherent massive parallelism of DNA computing may lead to methods for solving some intractable computational problems. The aim of this research study is to analyze the logical features and memory formation using DNA biomolecules in order to achieve increased speed, accuracy, and vast storage.
Exploring the Solution Space of Sorting by Reversals: A New Approach — IDES Editor
Analysing genome rearrangements is a problem from the vast domain of comparative genomics and computational biology. Several studies have shown that closely related species have essentially the same set of genes, but their gene orders differ. The differences in gene order are the result of various large-scale evolutionary events, of which reversal is the most common rearrangement event. The problem of finding the shortest sequence of reversals that can transform one genome into another is called the sorting by reversals problem; the length of such a sequence is the reversal distance between the two genomes. In comparative genomics, sorting by reversals algorithms are often used to propose evolutionary scenarios of large-scale genomic mutations between species. Following the first polynomial-time solution of this problem, several improvements have been published on the subject. In 2008, Braga et al. proposed an algorithm to enumerate the traces that sort a signed permutation by reversals. This algorithm has exponential complexity in both time and space. To handle the traces efficiently, Baudet and Dias proposed a depth-first approach in 2010. However, one limitation of their algorithm is that it cannot provide the count of solutions in each trace. In this paper we present an algorithm that lists the normal forms of each trace in a depth-first manner and provides a count of the total number of solutions in the solution space.
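The reversal distance defined above can be made concrete with a brute-force breadth-first search; this sketch works only for tiny unsigned permutations and is in no way the trace-enumeration algorithm of the paper, merely an illustration of the definition:

```python
from collections import deque

# Brute-force sketch: BFS over all segment reversals to find the
# reversal distance between two small (unsigned) gene orders.
def reversal_distance(source, target):
    source, target = tuple(source), tuple(target)
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        perm, d = queue.popleft()
        if perm == target:
            return d
        n = len(perm)
        for i in range(n):
            for j in range(i + 2, n + 1):
                # Reverse the segment perm[i:j].
                nxt = perm[:i] + perm[i:j][::-1] + perm[j:]
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, d + 1))
    return None
```

The search space grows factorially, which is why the polynomial-time solutions and trace enumerations discussed in the abstract matter.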
Xu W., Ozer S., and Gutell R.R. (2009).
Covariant Evolutionary Event Analysis for Base Interaction Prediction Using a Relational Database Management System for RNA.
21st International Conference on Scientific and Statistical Database Management. June 2-4, 2009. Springer-Verlag. pp. 200-216.
Analysis of Genomic and Proteomic Sequence Using FIR Filter — IJMER
Bioinformatics is a field of science that applies techniques from mathematics, informatics, statistics, computer science, artificial intelligence, chemistry, and biochemistry to solve biological problems, usually at the molecular level. Digital Signal Processing (DSP) applications in genomic sequence analysis have received great attention in recent years. DSP principles are used to analyse genomic and proteomic sequences: the DNA sequence is mapped into digital signals in the form of binary indicator sequences, and signal processing techniques such as digital filtering are applied to genomic sequences to identify protein-coding regions. The frequency response of genomic sequences is used to solve many optimization problems in science, medicine, and other applications. The aim of this paper is to describe a method of generating the Finite Impulse Response (FIR) of a genomic sequence. The same DNA sequence is converted into a proteomic sequence using transcription and translation, and a digital filtering technique (an FIR filter) is applied to obtain the frequency response. The frequency response is the same for both the gene and the proteomic sequence.
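The binary indicator mapping mentioned above (commonly called the Voss representation in genomic DSP) and a direct-form FIR filter can be sketched as follows; the moving-average taps are an illustrative assumption, not the paper's filter design:

```python
# One 0/1 indicator sequence per nucleotide (Voss representation);
# FIR filtering then operates on these numeric signals.
def indicator_sequences(dna):
    dna = dna.upper()
    return {base: [1 if ch == base else 0 for ch in dna]
            for base in "ACGT"}

def fir_filter(signal, taps):
    # Direct-form FIR: y[n] = sum_k taps[k] * x[n - k]
    return [sum(t * (signal[n - k] if n - k >= 0 else 0)
                for k, t in enumerate(taps))
            for n in range(len(signal))]

signals = indicator_sequences("ATGGCA")
smoothed = fir_filter(signals["A"], [0.5, 0.5])  # assumed 2-tap average
```

Protein-coding regions are typically sought by examining these filtered signals around the period-3 component.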
A COMPARATIVE ANALYSIS OF PROGRESSIVE MULTIPLE SEQUENCE ALIGNMENT APPROACHES ... — ijcseit
Multiple sequence alignment is increasingly important to bioinformatics, with several applications ranging from phylogenetic analyses to domain identification. There are several ways to perform multiple sequence alignment, an important one of which is the progressive alignment approach studied in this work. Progressive alignment involves three steps: find the distance between each pair of sequences; construct a guide tree based on the distance matrix; finally, based on the guide tree, align the sequences using the concept of aligned profiles. Our contribution is in comparing two main methods of guide tree construction in terms of both efficiency and accuracy of the overall alignment: the UPGMA and Neighbor Join methods. Our experimental results indicate that the Neighbor Join method is both more efficient in terms of performance and more accurate in terms of overall cost minimization.
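Of the two guide-tree methods compared, UPGMA is the simpler to sketch. The following minimal implementation shows the average-linkage merging it relies on; the nested-tuple tree representation and the example distances are our own illustrative choices:

```python
# Minimal UPGMA sketch: repeatedly merge the two closest clusters,
# updating distances as the size-weighted average (average linkage).
def upgma(labels, matrix):
    trees = {i: (name, 1) for i, name in enumerate(labels)}  # id -> (subtree, size)
    dist = {(i, j): matrix[i][j]
            for i in range(len(labels)) for j in range(i + 1, len(labels))}
    nid = len(labels)
    while len(trees) > 1:
        (i, j), _ = min(dist.items(), key=lambda kv: kv[1])
        (ti, si), (tj, sj) = trees.pop(i), trees.pop(j)
        for k in list(trees):
            dik = dist.pop((min(i, k), max(i, k)))
            djk = dist.pop((min(j, k), max(j, k)))
            # New cluster's distance to k: size-weighted mean.
            dist[(min(nid, k), max(nid, k))] = (si * dik + sj * djk) / (si + sj)
        dist = {p: v for p, v in dist.items() if i not in p and j not in p}
        trees[nid] = ((ti, tj), si + sj)
        nid += 1
    (tree, _), = trees.values()
    return tree
```

Given three sequences where A and B are much closer to each other than to C, the guide tree groups A and B first, which is exactly the order in which a progressive aligner would merge profiles.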
Gardner D.P., Xu W., Miranker D.P., Ozer S., Cannone J.J., and Gutell R.R. (2012).
An Accurate Scalable Template-based Alignment Algorithm.
Proceedings of 2012 IEEE International Conference on Bioinformatics and Biomedicine (BIBM 2012), Philadelphia, PA. October 4-7, 2012. IEEE Computer Society, Washington, DC, USA. pp. 237-243.
Gardner D.P., Ren P., Ozer S., and Gutell R.R. (2011).
Statistical Potentials for Hairpin and Internal Loops Improve the Accuracy of the Predicted RNA Structure.
Journal of Molecular Biology, 413(2):473-483, 2011.
The objective of the experiment is for students to become familiar with databases that can be used to investigate gene sequences and to construct cladograms that provide evidence for evolutionary relatedness among species.
In this investigation, students will:
1. Create cladograms that depict evolutionary relationships among organisms.
2. Analyze biological data with online bioinformatics tools.
3. Connect and apply concepts pertaining to genetics and evolution.
DNA Data Compression Algorithms Based on Redundancy — ijfcstjournal
Carl Jung spoke of the "collective unconscious", i.e. we are all connected to each other in some way or another via our DNA. A DNA sequence consists of four bases: a (Adenine), c (Cytosine), g (Guanine), and t (Thymine). Each of these bases can be represented by two bits, since 2^2 = 4, e.g. a–00, c–01, g–11, and t–10, although this assignment is arbitrary. Redundancy within a sequence is therefore likely to exist. That is why, in this paper, we have explored different types of repeats to compress DNA: direct repeats; palindromes (reverse direct repeats); inverted exact repeats (complementary palindromes, i.e. exact reverse complements); inverted approximate repeats (approximate complementary palindromes, i.e. approximate reverse complements); interspersed or dispersed repeats; flanking or terminal repeats; etc. Better compression gives better network speed and saves storage space.
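The 2-bits-per-base encoding described above can be sketched directly; the mapping (a=00, c=01, g=11, t=10) is the one given in the abstract, while the byte packing itself is our own illustrative addition:

```python
# Pack a DNA string at 2 bits per base and unpack it back.
CODE = {"a": 0b00, "c": 0b01, "g": 0b11, "t": 0b10}
INV = {v: k for k, v in CODE.items()}

def pack(seq):
    bits = 0
    for ch in seq.lower():
        bits = (bits << 2) | CODE[ch]
    n = (len(seq) * 2 + 7) // 8  # bytes needed
    return bits.to_bytes(n, "big"), len(seq)

def unpack(data, length):
    bits = int.from_bytes(data, "big")
    return "".join(INV[(bits >> (2 * (length - 1 - i))) & 0b11]
                   for i in range(length))
```

This already quarters the size of an 8-bit-per-character representation; the repeat-based methods in the paper then exploit redundancy on top of it.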
This paper presents a literature survey of research-oriented developments made to date. Its significance is to provide a deep-rooted understanding of, and knowledge transfer regarding, existing approaches to gene sequencing and alignment using the Smith-Waterman algorithm, along with their respective strengths and weaknesses. To develop or perform quality research, it is always advisable to conduct a goal-oriented literature survey that facilitates an in-depth understanding of the research work, so that an objective can be formulated on the basis of the gaps between present requirements and existing approaches. Gene sequencing problems are a predominant issue for researchers seeking an optimized system model that provides optimum processing and efficiency without introducing overheads in terms of memory and time. This research is oriented towards developing such a system based on the dynamic programming approach called the Smith-Waterman algorithm, in an enhanced form augmented with other supporting and optimized techniques. The paper provides an introduction to the research domain, the research gap and motivations, the objectives formulated, and the systems proposed to accomplish those objectives.
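For reference, the basic Smith-Waterman local-alignment recurrence that the surveyed enhancements build on can be sketched as follows (plain dynamic programming with none of the surveyed optimizations; the scoring scheme is an assumption):

```python
# Basic Smith-Waterman: like Needleman-Wunsch but cells are clamped
# at zero, so the best local (not global) alignment score is found.
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    n, m = len(a), len(b)
    H = [[0] * (m + 1) for _ in range(n + 1)]
    best = 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0,
                          H[i - 1][j - 1] + s,
                          H[i - 1][j] + gap,
                          H[i][j - 1] + gap)
            best = max(best, H[i][j])
    return best
```

The O(nm) time and space of this baseline is precisely the overhead the surveyed optimizations aim to reduce.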
ANALYSIS OF LAND SURFACE DEFORMATION GRADIENT BY DINSAR — cscpconf
The progressive development of Synthetic Aperture Radar (SAR) systems has diversified the exploitation of the images generated by these systems in different geoscience applications. Detection and monitoring of surface deformations produced by various phenomena have benefited from this evolution and have been realized with interferometry (InSAR) and differential interferometry (DInSAR) techniques. Nevertheless, spatial and temporal decorrelation of the interferometric pairs used strongly limits the precision of the analysis results of these techniques. In this context, we propose a methodological approach to surface deformation detection and analysis by differential interferograms, to show the limits of this technique according to noise quality and level. The detectability model is generated from the deformation signatures by simulating a linear fault merged into image pairs from the ERS1/ERS2 sensors acquired over a region of the Algerian south.
4D AUTOMATIC LIP-READING FOR SPEAKER'S FACE IDENTIFICATION — cscpconf
A novel trajectory-guided, concatenative approach for synthesizing high-quality video rendered from real image samples is proposed. The automated lip-reading seeks the real image sample sequence, preserved in the library of video data, closest to the HMM-predicted trajectory. The object trajectory is obtained by projecting the face patterns into a KDA feature space. The approach identifies a speaker's face by synthesizing the identity surface of a subject's face from a small sample of patterns that sparsely cover the view sphere. A KDA algorithm is used to discriminate the lip-reading images; the low-dimensional fundamental lip feature vector is then reduced using the 2D-DCT. The dimensionality of the mouth-area set is further reduced with PCA to obtain the eigenlips approach proposed by [33]. The subjective performance results of the cost function under the automatic lip-reading model did not illustrate superior performance of the method.
MOVING FROM WATERFALL TO AGILE PROCESS IN SOFTWARE ENGINEERING CAPSTONE PROJE... — cscpconf
Universities offer a software engineering capstone course to simulate a real-world working environment in which students can work in a team for a fixed period to deliver a quality product. The objective of the paper is to report on our experience in moving from the Waterfall process to the Agile process in conducting the software engineering capstone project. We present the capstone course designs for both Waterfall-driven and Agile-driven methodologies, highlighting the structure, deliverables, and assessment plans. To evaluate the improvement, we conducted a survey across two different sections taught by two different instructors to evaluate students' experience in moving from the traditional Waterfall model to an Agile-like process. Twenty-eight students filled out the survey, which consisted of eight multiple-choice questions and an open-ended question to collect feedback. The survey results show that students were able to attain hands-on experience simulating a real-world working environment. The results also show that the Agile approach helped students achieve an overall better design and avoid mistakes made in the initial design completed in the first phase of the capstone project. In addition, they were able to assess their team capabilities and training needs, and thus learn the required technologies earlier, which is reflected in the final product quality.
PROMOTING STUDENT ENGAGEMENT USING SOCIAL MEDIA TECHNOLOGIES — cscpconf
Using social media in education provides learners with an informal channel for communication. Informal communication tends to remove barriers and hence promotes student engagement. This paper presents our experience in using three different social media technologies in teaching a software project management course. We conducted surveys at the end of every semester to evaluate students' satisfaction and engagement. Results show that using social media enhances students' engagement and satisfaction. However, familiarity with the tool is an important factor for student satisfaction.
A SURVEY ON QUESTION ANSWERING SYSTEMS: THE ADVANCES OF FUZZY LOGIC — cscpconf
Using a computer to answer questions has been a human dream since the beginning of the digital era. Question-answering systems are referred to as intelligent systems that can provide responses to the questions asked by a user based on certain facts or rules stored in a knowledge base, generating answers to questions asked in natural language. One of the first main ideas of fuzzy logic was to work on the problem of computer understanding of natural language. This survey paper therefore provides an overview of what question answering is, its system architecture, and its possible relationship with fuzzy logic, as well as previous related research with respect to the approaches that were followed. At the end, the survey provides an analytical discussion of the proposed QA models, alone or combined with fuzzy logic, and their main contributions and limitations.
DYNAMIC PHONE WARPING – A METHOD TO MEASURE THE DISTANCE BETWEEN PRONUNCIATIONS — cscpconf
Human beings generate different speech waveforms when speaking the same word at different times. Different speakers also have different accents and generate significantly varying speech waveforms for the same word. There is a need to measure the distances between various pronunciations, which facilitates the preparation of pronunciation dictionaries. A new algorithm called Dynamic Phone Warping (DPW) is presented in this paper. It uses dynamic programming for global alignment and shortest-distance measurement. The DPW algorithm can be used to enhance the pronunciation dictionaries of well-known languages such as English, or to build pronunciation dictionaries for lesser-known, sparsely resourced languages. Precision measurement experiments show 88.9% accuracy.
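In the spirit of DPW, a dynamic-programming global alignment between two phone sequences can be sketched as a weighted edit distance; the unit substitution/indel costs and the example phone strings are our assumptions, not the paper's distance measures:

```python
# Levenshtein-style global alignment over phone symbols: the minimal
# total cost of substitutions, insertions, and deletions.
def phone_distance(p, q, sub=1, indel=1):
    n, m = len(p), len(q)
    D = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        D[i][0] = i * indel
    for j in range(m + 1):
        D[0][j] = j * indel
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if p[i - 1] == q[j - 1] else sub
            D[i][j] = min(D[i - 1][j - 1] + cost,
                          D[i - 1][j] + indel,
                          D[i][j - 1] + indel)
    return D[n][m]

# Two hypothetical pronunciations of "tomato" differ in one phone:
d = phone_distance(["t", "ah", "m", "ey", "t", "ow"],
                   ["t", "ah", "m", "aa", "t", "ow"])
```

A real DPW implementation would replace the unit costs with phonetically informed distances between phones.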
INTELLIGENT ELECTRONIC ASSESSMENT FOR SUBJECTIVE EXAMS cscpconf
In education, the use of electronic (E) examination systems is not a novel idea, as Eexamination systems have been used to conduct objective assessments for the last few years. This research deals with randomly designed E-examinations and proposes an E-assessment system that can be used for subjective questions. This system assesses answers to subjective questions by finding a matching ratio for the keywords in instructor and student answers. The matching ratio is achieved based on semantic and document similarity. The assessment system is composed of four modules: preprocessing, keyword expansion, matching, and grading. A survey and case study were used in the research design to validate the proposed system. The examination assessment system will help instructors to save time, costs, and resources, while increasing efficiency and improving the productivity of exam setting and assessments.
TWO DISCRETE BINARY VERSIONS OF AFRICAN BUFFALO OPTIMIZATION METAHEURISTICcscpconf
African Buffalo Optimization (ABO) is one of the most recent swarms intelligence based metaheuristics. ABO algorithm is inspired by the buffalo’s behavior and lifestyle. Unfortunately, the standard ABO algorithm is proposed only for continuous optimization problems. In this paper, the authors propose two discrete binary ABO algorithms to deal with binary optimization problems. In the first version (called SBABO) they use the sigmoid function and probability model to generate binary solutions. In the second version (called LBABO) they use some logical operator to operate the binary solutions. Computational results on two knapsack problems (KP and MKP) instances show the effectiveness of the proposed algorithm and their ability to achieve good and promising solutions.
DETECTION OF ALGORITHMICALLY GENERATED MALICIOUS DOMAINcscpconf
In recent years, many malware writers have relied on Dynamic Domain Name Services (DDNS) to maintain their Command and Control (C&C) network infrastructure to ensure a persistence presence on a compromised host. Amongst the various DDNS techniques, Domain Generation Algorithm (DGA) is often perceived as the most difficult to detect using traditional methods. This paper presents an approach for detecting DGA using frequency analysis of the character distribution and the weighted scores of the domain names. The approach’s feasibility is demonstrated using a range of legitimate domains and a number of malicious algorithmicallygenerated domain names. Findings from this study show that domain names made up of English characters “a-z” achieving a weighted score of < 45 are often associated with DGA. When a weighted score of < 45 is applied to the Alexa one million list of domain names, only 15% of the domain names were treated as non-human generated.
GLOBAL MUSIC ASSET ASSURANCE DIGITAL CURRENCY: A DRM SOLUTION FOR STREAMING C...cscpconf
The amount of piracy in the streaming digital content in general and the music industry in specific is posing a real challenge to digital content owners. This paper presents a DRM solution to monetizing, tracking and controlling online streaming content cross platforms for IP enabled devices. The paper benefits from the current advances in Blockchain and cryptocurrencies. Specifically, the paper presents a Global Music Asset Assurance (GoMAA) digital currency and presents the iMediaStreams Blockchain to enable the secure dissemination and tracking of the streamed content. The proposed solution provides the data owner the ability to control the flow of information even after it has been released by creating a secure, selfinstalled, cross platform reader located on the digital content file header. The proposed system provides the content owners’ options to manage their digital information (audio, video, speech, etc.), including the tracking of the most consumed segments, once it is release. The system benefits from token distribution between the content owner (Music Bands), the content distributer (Online Radio Stations) and the content consumer(Fans) on the system blockchain.
IMPORTANCE OF VERB SUFFIX MAPPING IN DISCOURSE TRANSLATION SYSTEMcscpconf
This paper discusses the importance of verb suffix mapping in Discourse translation system. In
discourse translation, the crucial step is Anaphora resolution and generation. In Anaphora
resolution, cohesion links like pronouns are identified between portions of text. These binders
make the text cohesive by referring to nouns appearing in the previous sentences or nouns
appearing in sentences after them. In Machine Translation systems, to convert the source
language sentences into meaningful target language sentences the verb suffixes should be
changed as per the cohesion links identified. This step of translation process is emphasized in
the present paper. Specifically, the discussion is on how the verbs change according to the
subjects and anaphors. To explain the concept, English is used as the source language (SL) and
an Indian language Telugu is used as Target language (TL)
EXACT SOLUTIONS OF A FAMILY OF HIGHER-DIMENSIONAL SPACE-TIME FRACTIONAL KDV-T...cscpconf
In this paper, based on the definition of conformable fractional derivative, the functional
variable method (FVM) is proposed to seek the exact traveling wave solutions of two higherdimensional
space-time fractional KdV-type equations in mathematical physics, namely the
(3+1)-dimensional space–time fractional Zakharov-Kuznetsov (ZK) equation and the (2+1)-
dimensional space–time fractional Generalized Zakharov-Kuznetsov-Benjamin-Bona-Mahony
(GZK-BBM) equation. Some new solutions are procured and depicted. These solutions, which
contain kink-shaped, singular kink, bell-shaped soliton, singular soliton and periodic wave
solutions, have many potential applications in mathematical physics and engineering. The
simplicity and reliability of the proposed method is verified.
AUTOMATED PENETRATION TESTING: AN OVERVIEWcscpconf
The using of information technology resources is rapidly increasing in organizations,
businesses, and even governments, that led to arise various attacks, and vulnerabilities in the
field. All resources make it a must to do frequently a penetration test (PT) for the environment
and see what can the attacker gain and what is the current environment's vulnerabilities. This
paper reviews some of the automated penetration testing techniques and presents its
enhancement over the traditional manual approaches. To the best of our knowledge, it is the
first research that takes into consideration the concept of penetration testing and the standards
in the area.This research tackles the comparison between the manual and automated
penetration testing, the main tools used in penetration testing. Additionally, compares between
some methodologies used to build an automated penetration testing platform.
CLASSIFICATION OF ALZHEIMER USING fMRI DATA AND BRAIN NETWORKcscpconf
Since the mid of 1990s, functional connectivity study using fMRI (fcMRI) has drawn increasing
attention of neuroscientists and computer scientists, since it opens a new window to explore
functional network of human brain with relatively high resolution. BOLD technique provides
almost accurate state of brain. Past researches prove that neuro diseases damage the brain
network interaction, protein- protein interaction and gene-gene interaction. A number of
neurological research paper also analyse the relationship among damaged part. By
computational method especially machine learning technique we can show such classifications.
In this paper we used OASIS fMRI dataset affected with Alzheimer’s disease and normal
patient’s dataset. After proper processing the fMRI data we use the processed data to form
classifier models using SVM (Support Vector Machine), KNN (K- nearest neighbour) & Naïve
Bayes. We also compare the accuracy of our proposed method with existing methods. In future,
we will other combinations of methods for better accuracy.
VALIDATION METHOD OF FUZZY ASSOCIATION RULES BASED ON FUZZY FORMAL CONCEPT AN...cscpconf
In order to treat and analyze real datasets, fuzzy association rules have been proposed. Several
algorithms have been introduced to extract these rules. However, these algorithms suffer from
the problems of utility, redundancy and large number of extracted fuzzy association rules. The
expert will then be confronted with this huge amount of fuzzy association rules. The task of
validation becomes fastidious. In order to solve these problems, we propose a new validation
method. Our method is based on three steps. (i) We extract a generic base of non redundant
fuzzy association rules by applying EFAR-PN algorithm based on fuzzy formal concept analysis.
(ii) we categorize extracted rules into groups and (iii) we evaluate the relevance of these rules
using structural equation model.
PROBABILITY BASED CLUSTER EXPANSION OVERSAMPLING TECHNIQUE FOR IMBALANCED DATAcscpconf
In many applications of data mining, class imbalance is noticed when examples in one class are
overrepresented. Traditional classifiers result in poor accuracy of the minority class due to the
class imbalance. Further, the presence of within class imbalance where classes are composed of
multiple sub-concepts with different number of examples also affect the performance of
classifier. In this paper, we propose an oversampling technique that handles between class and
within class imbalance simultaneously and also takes into consideration the generalization
ability in data space. The proposed method is based on two steps- performing Model Based
Clustering with respect to classes to identify the sub-concepts; and then computing the
separating hyperplane based on equal posterior probability between the classes. The proposed
method is tested on 10 publicly available data sets and the result shows that the proposed
method is statistically superior to other existing oversampling methods.
CHARACTER AND IMAGE RECOGNITION FOR DATA CATALOGING IN ECOLOGICAL RESEARCHcscpconf
Data collection is an essential, but manpower intensive procedure in ecological research. An
algorithm was developed by the author which incorporated two important computer vision
techniques to automate data cataloging for butterfly measurements. Optical Character
Recognition is used for character recognition and Contour Detection is used for imageprocessing.
Proper pre-processing is first done on the images to improve accuracy. Although
there are limitations to Tesseract’s detection of certain fonts, overall, it can successfully identify
words of basic fonts. Contour detection is an advanced technique that can be utilized to
measure an image. Shapes and mathematical calculations are crucial in determining the precise
location of the points on which to draw the body and forewing lines of the butterfly. Overall,
92% accuracy were achieved by the program for the set of butterflies measured.
SOCIAL MEDIA ANALYTICS FOR SENTIMENT ANALYSIS AND EVENT DETECTION IN SMART CI...cscpconf
Smart cities utilize Internet of Things (IoT) devices and sensors to enhance the quality of the city
services including energy, transportation, health, and much more. They generate massive
volumes of structured and unstructured data on a daily basis. Also, social networks, such as
Twitter, Facebook, and Google+, are becoming a new source of real-time information in smart
cities. Social network users are acting as social sensors. These datasets so large and complex
are difficult to manage with conventional data management tools and methods. To become
valuable, this massive amount of data, known as 'big data,' needs to be processed and
comprehended to hold the promise of supporting a broad range of urban and smart cities
functions, including among others transportation, water, and energy consumption, pollution
surveillance, and smart city governance. In this work, we investigate how social media analytics
help to analyze smart city data collected from various social media sources, such as Twitter and
Facebook, to detect various events taking place in a smart city and identify the importance of
events and concerns of citizens regarding some events. A case scenario analyses the opinions of
users concerning the traffic in three largest cities in the UAE
SOCIAL NETWORK HATE SPEECH DETECTION FOR AMHARIC LANGUAGEcscpconf
The anonymity of social networks makes it attractive for hate speech to mask their criminal
activities online posing a challenge to the world and in particular Ethiopia. With this everincreasing
volume of social media data, hate speech identification becomes a challenge in
aggravating conflict between citizens of nations. The high rate of production, has become
difficult to collect, store and analyze such big data using traditional detection methods. This
paper proposed the application of apache spark in hate speech detection to reduce the
challenges. Authors developed an apache spark based model to classify Amharic Facebook
posts and comments into hate and not hate. Authors employed Random forest and Naïve Bayes
for learning and Word2Vec and TF-IDF for feature selection. Tested by 10-fold crossvalidation,
the model based on word2vec embedding performed best with 79.83%accuracy. The
proposed method achieve a promising result with unique feature of spark for big data.
GENERAL REGRESSION NEURAL NETWORK BASED POS TAGGING FOR NEPALI TEXTcscpconf
This article presents Part of Speech tagging for Nepali text using General Regression Neural
Network (GRNN). The corpus is divided into two parts viz. training and testing. The network is
trained and validated on both training and testing data. It is observed that 96.13% words are
correctly being tagged on training set whereas 74.38% words are tagged correctly on testing
data set using GRNN. The result is compared with the traditional Viterbi algorithm based on
Hidden Markov Model. Viterbi algorithm yields 97.2% and 40% classification accuracies on
training and testing data sets respectively. GRNN based POS Tagger is more consistent than the
traditional Viterbi decoding technique.
Sachpazis:Terzaghi Bearing Capacity Estimation in simple terms with Calculati...Dr.Costas Sachpazis
Terzaghi's soil bearing capacity theory, developed by Karl Terzaghi, is a fundamental principle in geotechnical engineering used to determine the bearing capacity of shallow foundations. This theory provides a method to calculate the ultimate bearing capacity of soil, which is the maximum load per unit area that the soil can support without undergoing shear failure. The Calculation HTML Code included.
Industrial Training at Shahjalal Fertilizer Company Limited (SFCL)MdTanvirMahtab2
This presentation is about the working procedure of Shahjalal Fertilizer Company Limited (SFCL). A Govt. owned Company of Bangladesh Chemical Industries Corporation under Ministry of Industries.
Explore the innovative world of trenchless pipe repair with our comprehensive guide, "The Benefits and Techniques of Trenchless Pipe Repair." This document delves into the modern methods of repairing underground pipes without the need for extensive excavation, highlighting the numerous advantages and the latest techniques used in the industry.
Learn about the cost savings, reduced environmental impact, and minimal disruption associated with trenchless technology. Discover detailed explanations of popular techniques such as pipe bursting, cured-in-place pipe (CIPP) lining, and directional drilling. Understand how these methods can be applied to various types of infrastructure, from residential plumbing to large-scale municipal systems.
Ideal for homeowners, contractors, engineers, and anyone interested in modern plumbing solutions, this guide provides valuable insights into why trenchless pipe repair is becoming the preferred choice for pipe rehabilitation. Stay informed about the latest advancements and best practices in the field.
Cosmetic shop management system project report.pdfKamal Acharya
Buying new cosmetic products is difficult. It can even be scary for those who have sensitive skin and are prone to skin trouble. The information needed to alleviate this problem is on the back of each product, but it's thought to interpret those ingredient lists unless you have a background in chemistry.
Instead of buying and hoping for the best, we can use data science to help us predict which products may be good fits for us. It includes various function programs to do the above mentioned tasks.
Data file handling has been effectively used in the program.
The automated cosmetic shop management system should deal with the automation of general workflow and administration process of the shop. The main processes of the system focus on customer's request where the system is able to search the most appropriate products and deliver it to the customers. It should help the employees to quickly identify the list of cosmetic product that have reached the minimum quantity and also keep a track of expired date for each cosmetic product. It should help the employees to find the rack number in which the product is placed.It is also Faster and more efficient way.
Hybrid optimization of pumped hydro system and solar- Engr. Abdul-Azeez.pdffxintegritypublishin
Advancements in technology unveil a myriad of electrical and electronic breakthroughs geared towards efficiently harnessing limited resources to meet human energy demands. The optimization of hybrid solar PV panels and pumped hydro energy supply systems plays a pivotal role in utilizing natural resources effectively. This initiative not only benefits humanity but also fosters environmental sustainability. The study investigated the design optimization of these hybrid systems, focusing on understanding solar radiation patterns, identifying geographical influences on solar radiation, formulating a mathematical model for system optimization, and determining the optimal configuration of PV panels and pumped hydro storage. Through a comparative analysis approach and eight weeks of data collection, the study addressed key research questions related to solar radiation patterns and optimal system design. The findings highlighted regions with heightened solar radiation levels, showcasing substantial potential for power generation and emphasizing the system's efficiency. Optimizing system design significantly boosted power generation, promoted renewable energy utilization, and enhanced energy storage capacity. The study underscored the benefits of optimizing hybrid solar PV panels and pumped hydro energy supply systems for sustainable energy usage. Optimizing the design of solar PV panels and pumped hydro energy supply systems as examined across diverse climatic conditions in a developing country, not only enhances power generation but also improves the integration of renewable energy sources and boosts energy storage capacities, particularly beneficial for less economically prosperous regions. Additionally, the study provides valuable insights for advancing energy research in economically viable areas. 
Recommendations included conducting site-specific assessments, utilizing advanced modeling tools, implementing regular maintenance protocols, and enhancing communication among system components.
2. 14 Computer Science & Information Technology (CS & IT)
uses the principle of finding the similarity between two sequences statistically. This method matches one sequence of DNA or protein against another by local sequence alignment: it searches for local regions of similarity rather than the best overall match between the two sequences.
In addition to performing alignments, it is very popular due to its availability on the World Wide Web through a large server at the National Center for Biotechnology Information (NCBI) and at many other sites. It has evolved to provide molecular biologists with a set of very powerful search tools that are freely available on many computer platforms.
There are also UCLUST [3], CD-HIT [4] and many more. Alignment obviously consumes time at run time; however, the similarity can be quickly computed with an alignment-free method that converts each DNA sequence into a feature vector in a new space. To generate feature vectors, some algorithms exploit probabilistic models, of which the Markov model [5-6] and SVM-based approaches [7] are widely used in bioinformatics applications.
Another technique uses statistical methods for sequence comparison based on the joint k-tuple content of two sequences, called the k-tuple algorithm, one of the very popular alignment-free methods [8, 9], in which the DNA sequence is divided into windows of length k (words of length k). The feature vector is generated by calculating the frequency of each tuple; the similarity can then be quickly measured by some distance metric between the vectors, such as the KLD [10]. From two given DNA sequences, two frequency vectors of n-words over a sliding window were constructed, and a probabilistic distance between the two sequences was derived using a symmetrized version of the KLD, which directly compares two Markov models built for the two corresponding biological sequences.
On the other hand, these methods cannot completely describe all the information contained in a DNA sequence, since they only contain the word frequency information. Therefore, many modified k-tuple methods have been proposed to capture more information. [11] used both the overlapping structure of words and the word frequency to improve the efficiency of sequence comparison. [12] transformed the DNA sequence into 60-dimension distribution vectors.
In order to help improve DNA sequence analysis methods, graphical representations of DNA sequences in 2D or 3D space [13-14] have been applied by several researchers, but they have some disadvantages, such as loss of information due to crossing and overlapping of the curve representing the DNA with itself [15-16]. To avoid this problem, many new graphical representation methods have recently been invented [17-14].
Other works [18-19] have been based on dinucleotide analysis to reveal the biological information of DNA sequences, using qualitative comparisons based on the three classifications of the four DNA bases A, G, T and C according to their chemical properties.
[20] presented the DNA sequence by a 12-component vector consisting of the twelve frequencies of group mutations, and calculated the similarity between different vectors by the Euclidean distance. [21] converted a DNA sequence into three 2-dimension cumulative ratio curves, the R/Y-ratio curve, the K/M-ratio curve and the W/S-ratio curve; the coordinates of every node on these 2-D cumulative ratio curves have a clear biological implication.
Li and Wang [22] presented a 16-dimension binary vector based also on the groups of nucleotide bases. These methods give encouraging results, but they focus much more on the sequence frequency than on the position for sequence analysis. Dong and Pei [23] argued that the position inside a sequence is important information; therefore, insufficient information in a feature vector is an important reason for poor similarity results.
In this paper, we combine the advantages of other methods with our own proposal. We present each DNA sequence by three symbolic sequences according to the chemical properties of its bases; the group mutations have been grouped into two twelve-component vectors, the first representing the frequency and the second representing the average position. To compare the coding sequences of the first exon of the beta globin gene of 11 different species, we applied the Euclidean distances among the introduced vectors.
2. MODELLING
2.1 Data Set
In our experimentation we have used the DNA sequences from the data set of [23]; the data set contains the first exon of the beta globin genes of 11 different species, listed in Table 1.
Table 1. The first exon of beta globin genes of 11 different species
2.2 Proposed Method
It is difficult to obtain information from the DNA primary sequence directly. Our approach is based on the three classes of DNA bases according to their chemical properties: the purine group R = {A, G} and the pyrimidine group Y = {C, T}; the amino group M = {A, C} and the keto group K = {G, T}; the weak H-bond group W = {A, T} and the strong H-bond group S = {C, G}. These are called the RY, MK and WS classifications, respectively. A primary sequence X = s1 s2 s3 … sn of length n is then presented by three different sequences according to the three classifications:

X(RY)i = R if si ∈ {A, G}, Y if si ∈ {C, T}
X(MK)i = M if si ∈ {A, C}, K if si ∈ {G, T}
X(WS)i = W if si ∈ {A, T}, S if si ∈ {C, G}   (i = 1, …, n)

Each DNA sequence is thus represented by the three symbolic sequences given by the three formulas above.
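As a minimal sketch of this mapping (the lookup tables follow directly from the three group definitions above; the function name is ours, not the paper's):

```python
# Chemical-property classifications of the four DNA bases.
RY = {"A": "R", "G": "R", "C": "Y", "T": "Y"}  # purine / pyrimidine
MK = {"A": "M", "C": "M", "G": "K", "T": "K"}  # amino / keto
WS = {"A": "W", "T": "W", "C": "S", "G": "S"}  # weak / strong H-bond

def symbolic_sequences(dna):
    """Map a DNA string onto its RY, MK and WS symbolic sequences."""
    return tuple("".join(table[base] for base in dna)
                 for table in (RY, MK, WS))

print(symbolic_sequences("ATGGTGCACCTGAC"))
# ('RYRRYRYRYYYRRY', 'MKKKKKMMMMKKMM', 'WWSSWSSWSSWSWS')
```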
2.2.1 Frequency Analyses
In each classification we focus on the group mutation information; for the three symbolic sequences there are twelve group mutations: R→R, R→Y, Y→R, Y→Y, M→M, M→K, K→M, K→K, W→W, S→W, W→S, S→S.
As a first step, we calculated the frequency of each group mutation, defined by the formula used by [19]: for a group mutation uv in a symbolic sequence of length n,

F(uv) = K / (n − 1)

where K is the number of occurrences of the word uv among the n − 1 adjacent pairs.
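A sketch of this frequency computation, assuming the frequency of a pair is its count over the n − 1 adjacent positions (our reading of the formula; names are illustrative):

```python
from itertools import product

def mutation_frequencies(sym, alphabet):
    """Frequency of each group mutation (adjacent pair) in a symbolic
    sequence: occurrences of the pair over the n - 1 adjacent positions."""
    n = len(sym)
    counts = {u + v: 0 for u, v in product(alphabet, repeat=2)}
    for i in range(n - 1):
        counts[sym[i:i + 2]] += 1
    return {pair: c / (n - 1) for pair, c in counts.items()}

print(mutation_frequencies("RYRRYRYRYYYRRY", "RY"))
```

The four frequencies of each classification sum to 1, since every adjacent pair falls into exactly one of its four group mutations.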
Table 2 presents the frequencies of group mutations of the first exon of the β-globin gene of the eleven species, based on the three symbolic sequences of DNA.
Table 2. Frequencies of group mutations of 11 species
2.2.2 Position Analyses
The position inside a sequence is important information; insufficient information in a feature vector is an important reason for poor similarity results.
For instance, two sequences may have the same frequencies of components but two different sequencing directions: if we calculate just the frequency similarity, we find them identical, yet the positions of their components are completely different and there is no biological relationship between them. For this reason, and to improve the effectiveness of similarity study of DNA sequences, which is a main challenge in the field of bioinformatics, we use the concept of the position of the DNA components.
Based on the group mutations presented above, we calculate the average position (average distance) with the following proposed formula:

P(uv) = (1 / (K · n)) Σ_{i=1..K} p_i

where K is the number of occurrences of the word uv, p_i is the position of its i-th occurrence, and n is the length of the DNA sequence; that is, the position of each component is defined as the average position of the word uv divided by the length of the sequence.
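The average-position computation can be sketched as follows (taking the position of a pair to be the 1-based index where it starts is our assumption; the paper only states "average position of the word uv divided by the length n"):

```python
def average_position(sym, pair):
    """Average (1-based) start position of the group mutation `pair`
    in a symbolic sequence, divided by the sequence length n;
    returns 0.0 if the pair never occurs."""
    n = len(sym)
    starts = [i + 1 for i in range(n - 1) if sym[i:i + 2] == pair]
    return sum(starts) / (len(starts) * n) if starts else 0.0

# R->R occurs at positions 3 and 12 in this length-14 sequence.
print(average_position("RYRRYRYRYYYRRY", "RR"))
```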
Table 3 presents the average position of group mutations for the first exon of the β-globin gene of the eleven species, based on the three symbolic sequences of DNA.
Table 3. Average Position of group mutations of 11 species.
2.2.3 Example
For the following sequence

ATGGTGCACCTGAC

we get the three symbolic sequences:

RY: RYRRYRYRYYYRRY
MK: MKKKKKMMMMKKMM
WS: WWSSWSSWSSWSWS

From the three sequences we constructed two twelve-component vectors, the first to calculate the frequency of the group mutations and the second their average position.
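The construction described above can be sketched end-to-end as follows (the ordering of the twelve mutations within each vector and the 1-based position convention are our assumptions):

```python
from itertools import product

GROUPS = (("RY", {"A": "R", "G": "R", "C": "Y", "T": "Y"}),
          ("MK", {"A": "M", "C": "M", "G": "K", "T": "K"}),
          ("WS", {"A": "W", "T": "W", "C": "S", "G": "S"}))

def feature_vectors(dna):
    """Return the two twelve-component vectors for a DNA sequence:
    frequencies and average positions of the 12 group mutations
    (four per classification, taken in RY, MK, WS order)."""
    n = len(dna)
    freq, pos = [], []
    for alphabet, table in GROUPS:
        sym = "".join(table[b] for b in dna)
        for u, v in product(alphabet, repeat=2):
            starts = [i + 1 for i in range(n - 1) if sym[i:i + 2] == u + v]
            freq.append(len(starts) / (n - 1))
            pos.append(sum(starts) / (len(starts) * n) if starts else 0.0)
    return freq, pos

freq, pos = feature_vectors("ATGGTGCACCTGAC")
print(len(freq), len(pos))  # 12 12
```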
2.3 Similarity and Dissimilarity
In order to analyze the similarity and dissimilarity between two DNA sequences, each sequence is represented by the two twelve-component vectors presented above. The similarity between such vectors is calculated by the Euclidean distance between their end points, for both the frequency vectors and the average-position vectors. We then calculated the average distance by the formula

S = (SF + SP) / 2
SP defines the similarity between two sequences related to the position of their components (twelve-component average-position vectors).
SF defines the similarity between two sequences related to the frequency of their components (twelve-component frequency vectors).
A smaller value of the Euclidean distance means a greater similarity between the two DNA sequences.
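A minimal sketch of these distance computations (taking the arithmetic mean of SF and SP as the averaging step, which is our reading of the paper's "average distance"):

```python
import math

def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def average_distance(freq_a, pos_a, freq_b, pos_b):
    """Average of the frequency distance SF and the position distance SP;
    smaller values mean more similar sequences."""
    sf = euclidean(freq_a, freq_b)
    sp = euclidean(pos_a, pos_b)
    return (sf + sp) / 2

print(euclidean([0.0, 0.0], [3.0, 4.0]))  # 5.0
```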
The following tables present the similarity/dissimilarity matrices of the frequencies and average positions of the group mutations of the 11 species, and the average similarity between them.
Table 4. Similarity/dissimilarity matrix relative to the frequency of group mutations for 11 species.
Table 5. Similarity/dissimilarity matrix relative to the average position of group mutations for 11 species.
Table 6. Average similarity/dissimilarity matrix between the frequency and position of group mutations for the 11 gene sequences.
We observe in the similarity results above (Table 6) that there is great similarity between the sequences of human and gorilla, human and chimpanzee, and chimpanzee and gorilla; other similarities appear between mouse and lemur and between mouse and rat, as mouse and rat belong to the same Muridae mammalian family. The same holds for bovine and goat, which belong to the same Bovidae mammalian family.
Opossum and Gallus are both far from the rest of the species, because opossum is the most remote species from the remaining mammals and Gallus is the only non-mammalian animal among all the species of the dataset; the remaining nine species are mammals.
The obtained result is not an accident, but shows the relationship in the evolutionary sense between the eleven species.
The relationship between the 11 species according to our own DNA analysis is presented in the following dendrogram.
Figure 1. The dendrogram of the eleven species.
3. CONCLUSION AND PERSPECTIVE
Similarity analysis of DNA sequences is still an important subject in bioinformatics; here, the similarity between two DNA sequences is defined by the frequency and position of their components. The representation of a DNA sequence by three symbolic sequences is helpful to define all possible mutation groups. We build two twelve-component vectors, the first representing the frequencies of the mutation groups and the second their average positions.
To calculate the similarity and dissimilarity of DNA sequences, Euclidean distances are applied based on the frequency and position of the mutation groups.
The evaluation results on 11 different species coincide with the evolutionary sense. The proposed method has a wide range of applicability for the analysis of biological sequences.
REFERENCES
[1] Altschul S, Gish W, Miller W, Myers E, Lipman D: Basic local alignment search tool. J Mol Biol, 215(3):403-410. doi:10.1016/S0022-2836(05)80360-2 (1990).
[2] Lipman DJ, Pearson WR: Rapid and sensitive protein similarity searches. Science, 227:1435-1441,
(1985).
[3] Edgar RC: Search and clustering orders of magnitude faster than BLAST. Bioinformatics, 26:2460-2461,
(2010).
[4] Li WZ, Godzik A: Cd-hit: a fast program for clustering and comparing large sets of protein or
nucleotide sequences. Bioinformatics, 22:1658-1659, (2006).
[5] Pham TD, Zuegg J: A probabilistic measure for alignment-free sequence comparison. Bioinformatics
, 20:3455-3461, (2004).
[6] Freno A: Selecting features by learning markov blankets. Lect Notes Comput Sci, 4692:69-76,
(2007).
[7] Deshpande M, Karypis G: Evaluation of techniques for classifying biological sequences. Lect Notes
Comput Sci, 2336:417-431, (2002).
[8] Blaisdell BE: A measure of the similarity of sets of sequences not requiring sequence alignment. Proc
Natl Acad Sci U S A, 83(14):5155-5159, (1986).
[9] Vinga S, Almeida J: Alignment-free sequence comparison–a review. Bioinformatics, 19:513-523,
(2003).
[10] Wu TJ, Hsieh YC, and Li LA: Statistical measures of DNA dissimilarity under Markov chain models of base composition. Biometrics, 57:441-448, (2001).
[11] Dai Q, Liu XQ, Yao YH, Zhao FK: Numerical characteristics of word frequencies and their
application to dissimilarity measure for sequence comparison. J Theor Biol , 276:174-180, (2011).
[12] Zhao B, He RL, Yau SS: A new distribution vector and its application in genome clustering. Mol
Phylogenet Evol , 59:438-443, (2011).
[13] Hamori, E., Ruskin, J., Curves, H.: A Novel Method of Representation of Nucleotide Series
Especially Suited for Long DNA Sequences. J. Biol. Chem. 258, 1318–1327 (1983).
[14] Qi, Z., Qi, X.: Novel 2D graphical representation of DNA sequence based on dual nucleotides. Chem.
Phys. Lett. 440, 139–144 (2007).
[15] Gates, M.A.: A Simple way to look at DNA. J. Theor. Biol. 119, 319–328 (1986).
[16] Guo, X.F., Randic, M., Basak, S.C.: A novel 2-D graphical representation of DNA sequences of low
degeneracy. Chem. Phys. Lett. 350, 106–112 (2001).
[17] Randic, M., Vracko, M., Lers, N., Plavsic, D.: Novel 2-D graphical representation of DNA sequences
and their numerical characterization. Chem. Phys. Lett. 368, 1–6 (2003).
[18] Liu, X.Q., Dai, Q., Xiu, Z.L., Wang, T.M.: PNN-curve: A new 2D graphical representation of DNA
sequences and its application. J. Theor. Biol. 243, 555–561 (2006).
[19] Qi, Z., Fan, T.: PN-curve: A 3D graphical representation of DNA sequences and their numerical
characterization. Chem. Phys. Lett. 442, 434–440 (2007).
[20] Shi L, Huang HL: DNA sequences analysis based on classifications of nucleotide bases. Adv Int Soft
Comput, 137:379-384, (2012).
[21] Yu HJ: Similarity analysis of DNA sequences based on three 2-D cumulative ratio curves. Lect Notes
Comput Sci, 6840:462-469, (2012).
[22] Li C, Wang J: Similarity analysis of DNA sequences based on the generalized LZ complexity of (0,1)-
sequences. J Math Chem, 43:26-31, (2008).
[23] Dong GZ, Pei J: Classification, clustering, features and distances of sequence data. Adv Database
Syst, 33:47-65, (2007).
[24] Nandy, A., Harle, M., Basak, S.C.: Mathematical descriptors of DNA sequences development and
applications. ARKIVOC ix, 211–238 (2006).
[25] Zhao, L., et al.: An S-Curve-Based Approach of Identifying Biological Sequences. Acta Biotheoretica
58(1), 1–14 (2009).
[26] Xie, G., Mo, Z.: Three 3D graphical representations of DNA primary sequences based on the
classifications of DNA bases and their applications. Journal of Theoretical Biology 269(1), 123–130
(2011).
[27] Wu TJ, Huang YH, Li LA: Optimal word sizes for dissimilarity measures and estimation of the
degree of dissimilarity between DNA sequences. Bioinformatics, 21(22):4125-4132,
doi:10.1093/bioinformatics/bti658, (2005).
[28] Sierk M, Pearson W: Sensitivity and selectivity in protein structure comparison. Protein Sci,
13:773-785, (2004).
[29] Krasnogor N, Pelta DA: Measuring the similarity of protein structures by means of the universal
similarity metric. Bioinformatics, 20:1015-1021, (2004).
[30] Reinert G, Schbath S, Waterman MS: Probabilistic and statistical properties of words: an
overview. J Comput Biol, 7:1-46, (2000).
[31] Qi D, Wang T: Comparison study on k-word statistical measures for protein: from sequence to
'sequence space'.