Abstract
High-dimensional biological datasets have been growing rapidly in recent years. Extracting knowledge from and analyzing high-dimensional biological data is one of the key challenges, with variety and veracity as its two distinguishing characteristics. The questions that arise are how to perform dimensionality reduction on this heterogeneous data, how to develop a high-performance platform to analyze high-dimensional biological data efficiently, and how to extract useful insights from the data. To address these issues, this paper begins with a brief introduction to the data analytics available for biological data, followed by a discussion of big data analytics and a survey of data reduction methods for biological data. We propose a dense clustering algorithm for standard high-dimensional biological data.
Keywords: Big Data Analytics, Dimensionality Reduction
International Journal of Computer Science, Engineering and Information Techno...IJCSEIT Journal
In the field of proteomics, as more data is added, computational methods need to become more efficient. The parts of molecular sequences that are functionally more important to the molecule are more resistant to change, and comparative approaches are used to ensure the reliability of sequence alignments. A multiple sequence alignment is, in effect, a proposition about evolutionary history: for each column in the alignment, an explicit homologous correspondence between the individual sequence positions is established. Several pair-wise sequence alignment methods are elaborated in the present work, but these methods are suitable only for aligning a limited number of sequences of small length. A new method is therefore introduced for aligning sequences based on local alignment with consensus sequences. Triticum (wheat) varieties are loaded from the NCBI databank, and phylogenetic trees are constructed for partitions of the dataset. A single new tree is then constructed from the previously generated trees using an advanced pruning technique. Finally, closely related sequences are extracted by applying threshold conditions, and an optimal sequence alignment is obtained by using shift operations in both directions.
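The pair-wise local alignment that the abstract builds on can be illustrated with a minimal Smith-Waterman scorer. This is a textbook sketch, not the authors' consensus-based method, and the scoring parameters are illustrative:

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Best local (Smith-Waterman) alignment score between two sequences."""
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]  # DP matrix, first row/col stay 0
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: scores are clamped at 0 so alignments can restart.
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            best = max(best, H[i][j])
    return best
```

For example, `smith_waterman("AACGT", "CGTTT")` finds the shared `CGT` substring and scores it 6; sequences with nothing in common score 0.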
IJERA (International journal of Engineering Research and Applications) is International online, ... peer reviewed journal. For more detail or submit your article, please visit www.ijera.com
Improved wolf algorithm on document images detection using optimum mean techn...journalBEEI
Detecting text in the handwriting of historical documents provides high-level features for the challenging problem of handwriting recognition. Such handwriting often contains noise, faint or incomplete strokes, strokes with gaps, and competing lines when embedded in a table or form, making it unsuitable for local line-following algorithms or the associated binarization schemes. In this paper, a method based on an optimum threshold value, named the Optimum Mean method, is presented. The Wolf method fails to detect thin text in non-uniform input images; the proposed method overcomes this problem by deriving a maximum threshold value from the optimum mean. In the experiments, the proposed method obtained a higher F-measure (74.53) and PSNR (14.77) and the lowest NRM (0.11) compared with the Wolf method. In conclusion, the proposed method is successful and effective in solving the Wolf method's problem, producing a high-quality output image.
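The abstract does not give the Optimum Mean formula, so purely as a hedged illustration of threshold-based binarization, a global mean threshold might look like this (the `k` scaling knob is a hypothetical parameter, not from the paper):

```python
def binarize(gray, k=1.0):
    """Binarize a grayscale image (rows of 0-255 ints) against a
    mean-based global threshold scaled by k (illustrative only)."""
    pixels = [p for row in gray for p in row]
    t = k * sum(pixels) / len(pixels)  # global mean as the threshold
    return [[0 if p <= t else 255 for p in row] for row in gray]
```

Adaptive methods such as Wolf's compute a local threshold per pixel from local mean and standard deviation instead of one global value, which is what lets them cope with non-uniform illumination.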
A Combined Approach for Feature Subset Selection and Size Reduction for High ...IJERA Editor
Selection of relevant features from a given feature set is one of the important issues in the fields of data mining and classification. In general, a dataset may contain many features, but it is not necessary that the whole feature set is important for a particular analysis or decision-making task, because features may share common information or be completely irrelevant to the processing at hand. This generally happens because of improper selection of features during dataset formation, or because of incomplete information about the observed system. In both cases the data will contain features that merely increase the processing burden and may ultimately lead to improper outcomes when used for analysis. Methods are therefore required to detect and remove such features. In this paper we present an efficient approach that not only removes unimportant features but also reduces the overall size of the dataset. The proposed algorithm uses information theory to compute the information gain of each feature and a minimum spanning tree to group similar features; fuzzy c-means clustering is then used to remove similar entries from the dataset. Finally, the algorithm is tested with an SVM classifier on 35 publicly available real-world high-dimensional datasets, and the results show that it not only reduces the feature set and data length but also improves the performance of the classifier.
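The information-gain step of such a pipeline can be sketched with the standard entropy-based computation for discrete features (the MST grouping and fuzzy c-means stages are omitted here):

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels, in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(feature_col, labels):
    """Information gain of one discrete feature column w.r.t. class labels:
    H(labels) minus the feature-value-weighted conditional entropy."""
    n = len(labels)
    groups = {}
    for v, y in zip(feature_col, labels):
        groups.setdefault(v, []).append(y)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional
```

A feature identical to the labels has gain equal to the label entropy, while a constant feature has gain 0, which is the property the algorithm exploits to rank features.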
Text documents clustering using modified multi-verse optimizer (IJECEIAES)
In this study, a multi-verse optimizer (MVO) is utilised for the text document clustering (TDC) problem. TDC is treated as a discrete optimization problem, and an objective function based on the Euclidean distance is applied as the similarity measure. TDC is tackled by dividing the documents into clusters; documents belonging to the same cluster are similar, whereas those belonging to different clusters are dissimilar. MVO, a recent metaheuristic optimization algorithm established for continuous optimization problems, can intelligently navigate different areas of the search space and search deeply in each area using a particular learning mechanism. The proposed algorithm, called MVOTDC, adapts the convergence behaviour of the MVO operators to deal with discrete, rather than continuous, optimization problems. For evaluating MVOTDC, a comprehensive comparative study is conducted on six text document datasets with various numbers of documents and clusters. The quality of the final results is assessed using precision, recall, F-measure, entropy, accuracy, and purity measures. Experimental results reveal that the proposed method performs competitively in comparison with state-of-the-art algorithms. Statistical analysis also shows that MVOTDC produces significant results in comparison with three well-established methods.
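The Euclidean-distance objective that such a search would minimise can be sketched as follows. This is an illustrative reconstruction from the abstract, not the authors' code; `docs` are document term vectors and `assignment` maps each document to one of `k` clusters:

```python
import math

def tdc_objective(docs, assignment, k):
    """Sum of Euclidean distances from each document vector to its
    cluster centroid -- lower is a tighter clustering."""
    clusters = {c: [] for c in range(k)}
    for vec, c in zip(docs, assignment):
        clusters[c].append(vec)
    total = 0.0
    for members in clusters.values():
        if not members:
            continue
        centroid = [sum(xs) / len(members) for xs in zip(*members)]
        total += sum(math.dist(v, centroid) for v in members)
    return total
```

A metaheuristic such as MVO searches over `assignment` vectors, treating this value as the fitness to minimise.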
Textual Data Partitioning with Relationship and Discriminative Analysis (Editor IJMTER)
Data partitioning methods are used to partition data values by similarity, and similarity measures are used to estimate transaction relationships. Hierarchical clustering models produce tree-structured results, whereas partitional clustering produces results in a grid format. Text documents are unstructured data values with high-dimensional attributes. Document clustering groups unlabeled text documents into meaningful clusters. Traditional clustering methods require the cluster count (K) for the document grouping process, and clustering accuracy degrades drastically when an unsuitable cluster count is chosen.
Textual data elements are divided into two types: discriminative words and non-discriminative words. Only discriminative words are useful for grouping documents; the involvement of non-discriminative words confuses the clustering process and leads to poor clustering solutions. A variational inference algorithm is used to infer the document collection structure and the partition of document words at the same time. A Dirichlet Process Mixture (DPM) model is used to partition the documents; the DPM clustering model uses both the data likelihood and the clustering property of the Dirichlet Process (DP). The Dirichlet Process Mixture Model for Feature Partition (DPMFP) is used to discover the latent cluster structure based on the DPM model, and DPMFP clustering is performed without requiring the number of clusters as input.
Document labels are used to guide the discriminative word identification process. Concept relationships are analyzed with ontology support, and a semantic weight model is used for document similarity analysis. The system improves scalability by using labels and concept relations to support the dimensionality reduction process.
WITH SEMANTICS AND HIDDEN MARKOV MODELS TO AN ADAPTIVE LOG FILE PARSER (ijnlc)
We aim to model an adaptive log file parser. As the content of log files often evolves over time, we established a dynamic statistical model which learns and adapts processing and parsing rules. First, we limit the amount of unstructured text by clustering log file lines based on their semantics. Next, we take only the most relevant cluster into account and focus on those frequent patterns which lead to the desired output table, similar to Vaarandi [10]. Furthermore, we transform the found frequent patterns and the output stating the parsed table into a Hidden Markov Model (HMM). We use this HMM as a specific yet flexible representation of a pattern for log file parsing, in order to maintain high-quality output. After training our model on one system type and applying it to a different system with slightly different log file patterns, we achieve an accuracy of over 99.99%.
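The Vaarandi-style frequent-pattern step can be sketched as a per-position token vote over a cluster of log lines. This is an illustrative simplification, and the `support` threshold is an assumed parameter:

```python
from collections import Counter

def frequent_pattern(lines, support=0.6):
    """Template for a cluster of log lines: keep tokens that occupy the
    same position in at least `support` of the lines; wildcard the rest."""
    tokenised = [line.split() for line in lines]
    width = max(len(t) for t in tokenised)
    template = []
    for i in range(width):
        counts = Counter(t[i] for t in tokenised if len(t) > i)
        token, freq = counts.most_common(1)[0]
        template.append(token if freq / len(lines) >= support else "<*>")
    return " ".join(template)
```

The wildcard positions in the template are the variable fields a parser would extract into the output table.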
Influence over the Dimensionality Reduction and Clustering for Air Quality Me...IJAEMSJORNAL
The current trend in industry is to analyze large data sets and apply data mining and machine learning techniques to identify patterns. A key challenge with huge data sets, however, is their high dimensionality. In data analytics applications, large amounts of data can sometimes yield worse performance; moreover, most data mining algorithms operate column-wise, and too many columns restrict performance and slow them down. Dimensionality reduction is therefore an important step in data analysis. It converts high-dimensional data into a much lower dimension such that maximum variance is explained within the first few dimensions. This paper focuses on multivariate statistical and artificial neural network techniques for data reduction; each method has a different rationale for preserving the relationships between input parameters during analysis. Principal Component Analysis, a multivariate technique, and the Self-Organising Map, a neural network technique, are presented. A hierarchical clustering approach is also applied to the reduced data set. A case study of air quality measurement is used to evaluate the performance of the proposed techniques.
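The hierarchical clustering step applied to the reduced data can be sketched with a naive single-linkage agglomeration. This is illustrative only; the abstract does not specify which linkage the authors used:

```python
import math

def single_linkage(points, k):
    """Naive agglomerative clustering: repeatedly merge the two clusters
    whose closest members are nearest, until k clusters remain."""
    clusters = [[p] for p in points]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(math.dist(a, b)
                        for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i].extend(clusters.pop(j))  # merge the closest pair
    return clusters
```

Running this on PCA- or SOM-reduced coordinates rather than the raw columns is what makes the approach tractable for high-dimensional measurements.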
Enhancing the performance of cluster based text summarization using support v...eSAT Journals
Abstract Technology evolves day by day, and this growth largely reflects efforts to reduce human work and make systems as automatic as possible. The same is true of digital information: due to the enormous increase in Internet use, digital information has grown strikingly. It is characterized by information in different forms, the same information in multiple forms, unrelated information, and a great deal of redundant information. Most of the time, what we require is textual information. To find a small piece of information, one may have to go through thousands of documents and read all of the retrieved ones, regardless of whether they contain anything useful; it is very difficult to read every retrieved document and prepare an exact summary in time. Moreover, the retrieved information is often repeated across many documents. This has motivated research in text mining, and text summarization is one of the challenging tasks in that field. Keywords: Entropy, FCM, Purity, SVM
ROBUST TEXT DETECTION AND EXTRACTION IN NATURAL SCENE IMAGES USING CONDITIONA...ijiert bestjournal
In natural scene images, text detection is an important task used in many kinds of content-based image analysis. A Maximally Stable Extremal Region (MSER) based method is used for scene text detection; it includes the stages of character candidate extraction, text candidate construction, text candidate elimination, and text candidate classification. Its main limitation is detecting highly blurred text in low-resolution natural scene images, and the current technique does not address text extraction. In the proposed system, a Conditional Random Field (CRF) model is used to assign each candidate component to one of two classes (text and non-text) by considering both unary component properties and binary contextual component relationships; connected component analysis is used for this purpose. The proposed system also performs text extraction using OCR.
IJRET : International Journal of Research in Engineering and Technology is an international peer reviewed, online journal published by eSAT Publishing House for the enhancement of research in various disciplines of Engineering and Technology. The aim and scope of the journal is to provide an academic medium and an important reference for the advancement and dissemination of research results that support high-level learning, teaching and research in the fields of Engineering and Technology. We bring together Scientists, Academician, Field Engineers, Scholars and Students of related fields of Engineering and Technology
Biclustering using Parallel Fuzzy Approach for Analysis of Microarray Gene Ex...CSCJournals
Biclusters are required for analyzing gene expression patterns: comparing rows of the gene expression matrix analyzes the expression patterns of genes, while comparing columns analyzes the expression profiles of samples. In the process of biclustering, both genes and samples must be clustered. The algorithm presented in this paper is based on a two-way clustering approach in which the genes and samples are clustered using parallel fuzzy C-means clustering over the Message Passing Interface; we call it MFCM. MFCM is applied to cluster genes and samples so as to maximize the membership function values over the data set. It is a parallelized rework of a fuzzy two-way clustering algorithm for microarray gene expression data [9], intended to study the efficiency and parallelization improvement of the algorithm. The algorithm uses a gene entropy measure to filter the clustered data and find biclusters, and it is able to obtain highly correlated biclusters from the gene expression dataset.
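The fuzzy C-means membership computation at the heart of such an algorithm follows the standard update rule u_i = 1 / sum_k (d_i/d_k)^(2/(m-1)). A serial sketch for a single point follows; the MPI parallelisation across genes and samples is omitted:

```python
def fcm_memberships(distances, m=2.0):
    """Fuzzy C-means membership of one point, given its distances to each
    cluster centre (fuzzifier m > 1; m = 2 is the common default)."""
    # A zero distance means the point sits on a centre: crisp assignment.
    if any(d == 0 for d in distances):
        return [1.0 if d == 0 else 0.0 for d in distances]
    exp = 2.0 / (m - 1.0)
    return [1.0 / sum((d_i / d_k) ** exp for d_k in distances)
            for d_i in distances]
```

Memberships always sum to 1, so each gene (or sample) belongs partially to every cluster, which is what the two-way scheme exploits when intersecting gene and sample clusters into biclusters.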
A NOVEL DATA DICTIONARY LEARNING FOR LEAF RECOGNITION (sipij)
Automatic leaf recognition via image processing has become very important for a number of professionals, such as botanical taxonomists, environmental protectors, and foresters. Learning an over-complete leaf dictionary is an essential step in leaf image recognition, but large leaf image dimensions and large numbers of training images stand in the way of building a fast and complete leaf data dictionary. In this work, an efficient approach is applied to construct an over-complete leaf data dictionary from a set of large images based on sparse representation. In the proposed method, a new cropped-contour technique is used to crop the training images. The experiments evaluate the correlation between the sparse representation and the data dictionary, with a focus on computing time.
TUPLE VALUE BASED MULTIPLICATIVE DATA PERTURBATION APPROACH TO PRESERVE PRIVA...IJDKP
Huge volumes of data from domain-specific applications such as medical, financial, library, telephone, shopping, and individual records are generated regularly. Sharing these data has proved beneficial for data mining applications: on one hand, such data is an important asset for business decision making when analyzed; on the other hand, data privacy concerns may prevent data owners from sharing information for data analysis. In order to share data while preserving privacy, the data owner must come up with a solution that achieves the dual goals of privacy preservation and accuracy of the data mining tasks, clustering and classification. An efficient and effective approach is proposed that aims to protect the privacy of sensitive information while obtaining data clusterings with minimum information loss.
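A minimal sketch of multiplicative perturbation, assuming uniform noise factors around 1; the actual tuple-value-based scheme of the paper is not specified in the abstract:

```python
import random

def perturb(rows, noise=0.05, seed=42):
    """Multiplicatively perturb each numeric value by a factor drawn from
    U(1 - noise, 1 + noise). Relative magnitudes, and hence much of the
    cluster structure, are approximately preserved while exact values
    are hidden from the recipient."""
    rng = random.Random(seed)  # fixed seed only for reproducibility here
    return [[v * rng.uniform(1 - noise, 1 + noise) for v in row]
            for row in rows]
```

The tension the abstract describes is visible even in this toy: larger `noise` gives stronger privacy but moves points further, degrading clustering and classification accuracy.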
AUTOMATED INFORMATION RETRIEVAL MODEL USING FP GROWTH BASED FUZZY PARTICLE SW...ijcseit
Mining relevant facts from the web at the time of need is a tenuous task. Research in diverse fields is fine-tuning methodologies toward this goal, extracting the information most relevant to the user's search query. The methodology proposed in this paper seeks to ease search complexity by tackling the severe issues that hinder the performance of the traditional approaches in use. It finds all possible semantically relatable frequent sets with the FP-Growth algorithm, whose output in turn fuels a bio-inspired Fuzzy PSO that finds optimal attractor points around which the web documents are clustered, meeting the requirements of the search query without losing relevance. On the whole, the proposed system optimizes an objective function that minimizes intra-cluster differences and maximizes inter-cluster distances while keeping all possible relationships with the search context intact. The major contribution is that the system finds all possible combinations matching the user's search transaction, thereby making the results more meaningful. These relatable sets form the set of particles for both fuzzy clustering and PSO, remaining unbiased and maintaining innate herd behaviour for any number of new additions. Evaluations reveal that the proposed methodology fares well as an optimized and effective enhancement over the conventional approaches.
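As a toy stand-in for the FP-Growth stage, frequent co-occurring term pairs can be counted directly. This is illustrative only; real FP-Growth builds a compressed prefix tree instead of enumerating pairs:

```python
from itertools import combinations
from collections import Counter

def frequent_pairs(transactions, min_support):
    """Term pairs that co-occur in at least min_support transactions."""
    counts = Counter()
    for t in transactions:
        # Deduplicate and sort so each pair has one canonical form.
        counts.update(combinations(sorted(set(t)), 2))
    return {pair for pair, c in counts.items() if c >= min_support}
```

In the described pipeline, such frequent sets would then seed the particles that the Fuzzy PSO moves toward cluster attractors.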
Abstract Urban watersheds produce an almost instantaneous response to rainfall, resulting in stormwater runoff in excess of the capacity of drainage systems. The excess stormwater must be managed to prevent flooding and erosion of streams, which can be achieved with the help of structural stormwater Best Management Practices (BMPs). Detention ponds are one such BMP commonly found in Austin, TX, USA. The City of Austin developed a plan to mitigate future flooding and erosion events, resulting in the development and integration of stormwater BMP algorithms into the sub-hourly version of the SWAT model. This paper deals with the development of a physically based algorithm for detention ponds. The algorithm was tested using a previously flow-calibrated watershed in the Austin area, and the test results indicate that it functions satisfactorily. The algorithm could be used a) to evaluate the functionality of an individual detention pond, b) to analyze the benefits of such structures at the watershed or higher scales, and c) as a design tool. Keywords: flooding, detention, urban, watershed, BMP, algorithm, stormwater, modeling
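A detention pond's attenuating effect can be illustrated with a simple level-pool mass balance. This is a hedged sketch, not the SWAT algorithm; the linear stage-discharge relation `outflow_coef * storage` is an assumption made for illustration:

```python
def route_pond(inflows, dt, outflow_coef, max_storage):
    """Level-pool routing sketch: per-step storage mass balance with
    outflow proportional to current storage (assumed rating curve)."""
    storage, outflows = 0.0, []
    for q_in in inflows:
        q_out = outflow_coef * storage              # release depends on storage
        storage += (q_in - q_out) * dt              # mass balance
        storage = min(max_storage, max(0.0, storage))  # pond capacity limits
        outflows.append(q_out)
    return outflows
```

With a pulse inflow, the peak outflow is lower and later than the peak inflow, which is exactly the attenuation a detention pond is built to provide.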
Effects of solidification time on mechanical properties and wear behaviour of...eSAT Journals
Abstract An investigation was carried out to understand the effects of the solidification time of cylindrical specimens on the hardness, wear rate, and coefficient of friction of Aluminium alloy LM4. The commercially available LM4 Aluminium alloy was melted in a crucible furnace under an argon atmosphere, and the molten metal was poured into sand moulds with holes of dissimilar sizes. The cast rods were tested for Vickers micro-hardness, wear rate, and coefficient of friction. The solidification time increases from 0.22 to 0.61 seconds as the diameter varies from 20 to 35 mm, and the hardness of the alloy was found to vary with the diameter of the rod. Wear rate is inversely proportional to the hardness of the alloy, decreasing linearly with hardness. The yield strength and tensile strength increase with hardness, whereas the % elongation decreases. Further, the coefficient of friction is independent of hardness and is not affected by sliding time. Keywords: LM4 alloy, coefficient of friction, wear rate, solidification time, sand cast.
Feasibility study of mtbe physical adsorption from polluted water on gac, pac...eSAT Journals
Abstract MTBE, or Methyl Tertiary Butyl Ether, is an organic compound used to increase the octane number of gasoline. At the beginning of the 1980s, with the discovery of the undesirable effects of tetraethyl lead in fuel, MTBE came into worldwide use; gradually, however, its own undesirable environmental effects were revealed. Many technologies exist for removing MTBE from polluted water, and adsorption is the most conventional and economical of them. In this research, experiments were carried out to study the adsorption of MTBE on different solid adsorbents in a batch process. Fixed amounts of adsorbents, including Granular Activated Carbon (GAC), Powdered Activated Carbon (PAC), and Husk Rice Carbon (HRC), were placed in one-litre covered vessels containing water polluted with a known initial MTBE concentration and stirred. By measuring the MTBE concentration in the vessels at different times, the effects of operating parameters such as temperature and pH on adsorption were studied and the optimum conditions determined. The batch experimental results were used to calculate the constant parameters of the Freundlich and Langmuir adsorption isotherm equations for these systems. Keywords: MTBE, Adsorption, Activated Carbon, Husk Rice Carbon
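The two isotherm equations named in the abstract are standard: Langmuir q = q_max * K * C / (1 + K * C) and Freundlich q = K_f * C^(1/n), where q is the adsorbed amount at equilibrium concentration C. In code:

```python
def langmuir(C, q_max, K):
    """Langmuir isotherm: saturates toward q_max as C grows."""
    return q_max * K * C / (1 + K * C)

def freundlich(C, K_f, n):
    """Freundlich isotherm: empirical power law, no saturation plateau."""
    return K_f * C ** (1.0 / n)
```

Fitting q_max, K, K_f, and n to the measured equilibrium data is what yields the constant parameters the abstract refers to.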
Influence over the Dimensionality Reduction and Clustering for Air Quality Me...IJAEMSJORNAL
The current trend in the industry is to analyze large data sets and apply data mining, machine learning techniques to identify a pattern. But the challenges with huge data sets are the high dimensions associated with it. Sometimes in data analytics applications, large amounts of data produce worse performance. Also, most of the data mining algorithms are implemented column wise and too many columns restrict the performance and make it slower. Therefore, dimensionality reduction is an important step in data analysis. Dimensionality reduction is a technique that converts high dimensional data into much lower dimension, such that maximum variance is explained within the first few dimensions. This paper focuses on multivariate statistical and artificial neural networks techniques for data reduction. Each method has a different rationale to preserve the relationship between input parameters during analysis. Principal Component Analysis which is a multivariate technique and Self Organising Map a neural network technique is presented in this paper. Also, a hierarchical clustering approach has been applied to the reduced data set. A case study of Air quality measurement has been considered to evaluate the performance of the proposed techniques.
Enhancing the performance of cluster based text summarization using support v...eSAT Journals
Abstract Technology is evolving day by day and this increase in technology is nothing but is the efforts to reduce human work and to have systems as automatic as possible. Same thing is true in terms of existence of digital information. Due to enormous increase in the use of internet, there is striking increase in the digital information. This digital information is characterized by different form of information, same information in different form, unrelated information and also there is lot of redundant information. Another next important thing to note is that most of the time we require textual information. To search or retrieve small information one has to go through thousands of documents, read all the retrieved documents irrespective whether they contain useful information or no. It becomes very difficult to read all the retrieved documents and prepare exact summary out of it within time. Besides this, many times retrieved information is repeated in almost many documents. This leads to research in the area of text mining. Text summarization is one of the challenging tasks in the field of text mining. Keywords: Entropy, FCM, Purity, SVM
ROBUST TEXT DETECTION AND EXTRACTION IN NATURAL SCENE IMAGES USING CONDITIONA...ijiert bestjournal
In natural scene images, text detection is an important task used in many content-based image analysis applications. A maximally stable extremal region (MSER) based method is used for scene text detection; it includes the stages of character candidate extraction, text candidate construction, text candidate elimination and text candidate classification. The main limitation of this method is detecting highly blurred text in low resolution natural scene images, and the current technology does not focus on any text extraction method. In the proposed system, a Conditional Random Field (CRF) model is used to assign each candidate component to one of two classes (text and non-text) by considering both unary component properties and binary contextual component relationships. For this purpose, connected component analysis is used. The proposed system also performs text extraction using OCR.
IJRET : International Journal of Research in Engineering and Technology is an international peer reviewed, online journal published by eSAT Publishing House for the enhancement of research in various disciplines of Engineering and Technology. The aim and scope of the journal is to provide an academic medium and an important reference for the advancement and dissemination of research results that support high-level learning, teaching and research in the fields of Engineering and Technology. We bring together Scientists, Academician, Field Engineers, Scholars and Students of related fields of Engineering and Technology
Biclustering using Parallel Fuzzy Approach for Analysis of Microarray Gene Ex...CSCJournals
Biclustering is required for analyzing gene expression patterns: genes are analyzed by comparing rows of the expression profiles, and samples by comparing columns of the gene expression matrix. In the process of biclustering we need to cluster both genes and samples. The algorithm presented in this paper is based on a two-way clustering approach in which genes and samples are clustered using parallel fuzzy C-means clustering with the message passing interface; we call it MFCM. MFCM is applied to cluster genes and samples so as to maximize the membership function values over the data set. It is a parallelized rework of a fuzzy two-way clustering algorithm for microarray gene expression data [9], intended to study the efficiency and parallelization improvement of the algorithm. The algorithm uses a gene entropy measure to filter the clustered data and find biclusters, and it is able to obtain highly correlated biclusters of the gene expression dataset.
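The fuzzy C-means update at the heart of such methods can be sketched on toy one-dimensional data (a plain sequential loop, not the parallel MPI version the paper describes; `fcm_1d` is a hypothetical helper): memberships are recomputed from distances to the centres, and centres are recomputed as membership-weighted means.

```python
def fcm_1d(xs, centres, m=2.0, iters=50):
    """Plain fuzzy C-means on 1-D data; returns centres and memberships."""
    k, n = len(centres), len(xs)
    u = []
    for _ in range(iters):
        # membership update: u_ij = 1 / sum_l (d_ij / d_il)^(2/(m-1))
        u = []
        for x in xs:
            d = [abs(x - c) + 1e-12 for c in centres]  # avoid division by zero
            u.append([1.0 / sum((d[j] / d[l]) ** (2.0 / (m - 1.0)) for l in range(k))
                      for j in range(k)])
        # centre update: mean of the data weighted by u^m
        for j in range(k):
            den = sum(u[i][j] ** m for i in range(n))
            centres[j] = sum((u[i][j] ** m) * xs[i] for i in range(n)) / den
    return centres, u

# two obvious groups around 1.0 and 5.0
data = [1.0, 1.1, 0.9, 5.0, 5.2, 4.8]
centres, u = fcm_1d(data, centres=[0.0, 6.0])
```

Each row of `u` sums to one, which is the membership-function property MFCM maximizes; in a gene/sample setting the same update runs over rows and columns of the expression matrix.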
A NOVEL DATA DICTIONARY LEARNING FOR LEAF RECOGNITIONsipij
Automatic leaf recognition via image processing has become very important for a number of professionals, such as botanical taxonomists, environmental protectors, and foresters. Learning an over-complete leaf dictionary is an essential step in leaf image recognition, but large leaf image dimensions and large numbers of training images stand in the way of building a fast and complete leaf data dictionary. In this work an efficient approach is applied to construct an over-complete leaf data dictionary for a set of large images based on sparse representation. In the proposed method, a new cropped-contour method is used to crop the training images. The experiments test the correlation between the sparse representation and the data dictionary, with a focus on computing time.
TUPLE VALUE BASED MULTIPLICATIVE DATA PERTURBATION APPROACH TO PRESERVE PRIVA...IJDKP
Huge volumes of data from domain-specific applications such as medical, financial, library, telephone and shopping records of individuals are regularly generated. Sharing these data has proved beneficial for data mining applications: on one hand, such data are an important asset for business decision making when analyzed; on the other hand, data privacy concerns may prevent data owners from sharing information for data analysis. In order to share data while preserving privacy, the data owner must come up with a solution that achieves the dual goals of privacy preservation and accuracy of the data mining tasks of clustering and classification. An efficient and effective approach is proposed that aims to protect the privacy of sensitive information while obtaining data clustering with minimum information loss.
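A minimal member of this family of techniques (a generic sketch for illustration, not the tuple-value scheme the paper proposes) multiplies each value by random noise centred at 1, hiding exact values while roughly preserving the relative structure that clustering needs:

```python
import random

def perturb(values, sigma=0.05, seed=42):
    """Multiplicative perturbation: scale each value by a random factor
    drawn near 1.0, masking exact values but keeping relative structure."""
    rng = random.Random(seed)  # fixed seed so the masking is reproducible
    return [v * rng.gauss(1.0, sigma) for v in values]

# hypothetical sensitive attribute: two salary clusters
salaries = [30000.0, 32000.0, 31000.0, 90000.0, 95000.0]
masked = perturb(salaries)
```

The masked values no longer reveal the originals, yet the low-salary group still sits well below the high-salary group, so a clustering algorithm run on the perturbed data recovers the same grouping with little information loss.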
AUTOMATED INFORMATION RETRIEVAL MODEL USING FP GROWTH BASED FUZZY PARTICLE SW...ijcseit
Mining relevant facts from the web at the time of need is a tedious task. Research in diverse fields is fine-tuning methodologies toward this goal, extracting the information most relevant to the user's search query. The methodology proposed in this paper finds ways to ease search complexity, tackling the severe issues that hinder the performance of traditional approaches. It uses the FP-Growth algorithm to find all possible semantically relatable frequent sets, whose output in turn fuels a bio-inspired Fuzzy PSO that finds the optimal attractor points around which web documents cluster to meet the requirements of the search query without losing relevance. On the whole, the proposed system optimizes the objective of minimizing intra-cluster differences and maximizing inter-cluster distances while keeping all possible relationships with the search context intact. The major contribution is that the system finds all possible combinations matching the user's search transaction, making the results more meaningful. These relatable sets form the set of particles for both fuzzy clustering and PSO, so the system remains unbiased and preserves its innate behaviour as new additions follow the herd. Evaluations reveal that the proposed methodology fares well as an optimized and effective enhancement over conventional approaches.
Abstract Urban watersheds produce a nearly instantaneous response to rainfall, resulting in stormwater runoff in excess of the capacity of drainage systems. The excess stormwater must be managed to prevent flooding and erosion of streams. Management can be achieved with the help of structural stormwater Best Management Practices (BMPs); detention ponds are one such BMP commonly found in Austin, TX, USA. The City of Austin developed a plan to mitigate future flooding and erosion events, resulting in the development and integration of stormwater BMP algorithms into the sub-hourly version of the SWAT model. This paper deals with the development of a physically based algorithm for detention ponds. The algorithm was tested using a previously flow-calibrated watershed in the Austin area, and the test results indicate that it functions satisfactorily. The algorithm could be used a) to evaluate the functionality of an individual detention pond, b) to analyze the benefits of such structures at watershed or higher scales, and c) as a design tool. Keywords: flooding, detention, urban, watershed, BMP, algorithm, stormwater, modeling
Effects of solidification time on mechanical properties and wear behaviour of...eSAT Journals
Abstract An investigation was carried out to understand the effects of the solidification time of cylindrical specimens on the hardness, wear rate and coefficient of friction of aluminium alloy LM4. Commercially available LM4 aluminium alloy was melted in a crucible furnace under an argon atmosphere, and the molten metal was poured into sand moulds having holes of dissimilar sizes. The cast rods were tested for Vickers micro-hardness, wear rate and coefficient of friction. The solidification time increases from 0.22 to 0.61 seconds as the diameter varies from 20 to 35 mm. It was found that the hardness of the alloy varies with the diameter of the rod, and that wear rate is inversely proportional to hardness, decreasing linearly with it. The yield strength and tensile strength increase with hardness, whereas the % elongation decreases. Further, the coefficient of friction is independent of hardness and is not affected by sliding time. Keywords: LM4 alloy, coefficient of friction, wear rate, solidification time, sand cast
Feasibility study of mtbe physical adsorption from polluted water on gac, pac...eSAT Journals
Abstract MTBE, or Methyl Tertiary Butyl Ether, is an organic compound used to increase the octane number of gasoline. At the beginning of the 1980s, following the discovery of the undesirable effects of tetraethyl lead in fuel, MTBE came into worldwide use, but gradually its own undesirable effects on the environment were revealed. Among the many technologies for MTBE removal from polluted water, adsorption is the most conventional and economical. In this research, experiments were carried out to study the adsorption of MTBE on different solid adsorbents in a batch process. Fixed amounts of adsorbents, including Granular Activated Carbon (GAC), Powdered Activated Carbon (PAC) and Husk Rice Carbon (HRC), were placed in covered one-litre vessels containing water polluted with a known initial MTBE concentration and stirred. By measuring the MTBE concentration in the vessel at different times, the effects of operating parameters such as temperature and pH on adsorption were studied and the optimum conditions determined. The batch experimental results were used to calculate the constant parameters of the Freundlich and Langmuir adsorption isotherm equations for these systems. Keywords: MTBE, Adsorption, Activated Carbon, Husk Rice Carbon
A challenge for security and service level agreement in cloud computing eSAT Journals
Abstract Cloud computing is regarded as one of the most promising fields in computing; it allows users to store their data virtually over the cloud without the need for any fixed infrastructure. It offers users the ability to manipulate their data regardless of geographical position, providing a scalable, feasible and flexible way to connect users to their data. The cloud can be considered the part of the Internet where data are stored, while computing refers to the applications and services it provides. Users are connected to the cloud by Cloud Service Providers through an Internet connection, and both parties agree on a protocol known as a service level agreement for the exchange of information. Cloud computing increases capacity and grants users the capability to perform computation on cloud infrastructure; it enables more convenient communication and instant multi-point collaboration, and provides better utilization of distributed resources over large amounts of data that can be accessed remotely through the internet. Quality of service should be kept in mind while designing the service level agreement. There are many problems related to cloud computing, such as traffic management, security and privacy. This paper surveys the basic working principles of cloud computing and gives an overview of cloud architecture, including the different types and levels of cloud services, with a focus on service level agreements, issues and the security prospects of cloud computing. Keywords: Architecture, Cloud Computing, Cloud Security, Cloud Services, Service Level Agreements
Efficient reconfigurable architecture of baseband demodulator in sdr eSAT Journals
Abstract This paper presents the simulation architecture and performance analysis of a zero-crossing detection (ZCD) technique. A zero-crossing based all-digital baseband demodulation architecture is proposed that supports all modulation schemes, including MSK, PSK, FSK and QAM. The proposed structure has very low area, power and latency and can operate in real time. Moreover, it can switch at run time between multiple modulation schemes such as GMSK (GSM), QPSK (CDMA), GFSK (Bluetooth), 8-PSK (EDGE) and Offset-QPSK (W-CDMA), and the phase resolution of the demodulator scales with performance. In addition, a bit-wise amplitude quantization based quad-decomposition approach is utilized to demodulate higher order M-ary QAM modulations such as 16-QAM and 64-QAM, and is also highly scalable. This demodulator structure provides an energy-efficient and resource-efficient implementation of various wireless standards in the physical layer of an SDR. Keywords: Physical layer, Mobile and Wireless Communication, Software Defined Radio (SDR), Zero Cross Detection (ZCD), Modulation Schemes, Architecture, high level synthesis, FPGA
Automatic test packet generation in network eSAT Journals
Abstract Nowadays networks are widely distributed, so administrators depend on tools such as ping and traceroute to rectify problems in the network. We propose an automated and systematic approach for testing and debugging networks called Automatic Test Packet Generation (ATPG). ATPG first reads the router configurations and generates a device-independent model, which is then used to generate a minimum number of test packets that cover every link and rule in the network. ATPG is capable of detecting both functional and performance problems: test packets are sent at regular intervals, and a dedicated technique is used to localize faults. Keywords: Test Packet Generation Algorithm; Network Troubleshooting; Data Plane Analysis
Studies on seasonal variation of ground water quality using multivariate anal...eSAT Journals
Abstract
In this study the seasonal variability of groundwater quality parameters in Bidar urban and its industrial area is investigated. Three water samples from each of 35 wards were collected and subjected to physico-chemical analysis, with the average of the three samples representing each ward. Analysis was done for the pre-monsoon and post-monsoon seasons of the years 2009, 2010 and 2011. Seventeen physico-chemical parameters, viz. pH, total hardness, calcium, magnesium, chloride, nitrate, sulfate, total dissolved solids, iron, fluoride, sodium, potassium, alkalinity, manganese, zinc, dissolved oxygen and total solids, were analyzed. Factor analysis (FA) was applied to the data set to investigate the origin of the water pollution sources. FA yielded three factors for each season (combined pre-monsoon and combined post-monsoon) with 62.8% and 61.6% total variance respectively; in addition, FA identified anthropogenic factors (industrial, urban sewage and agricultural drainage) and natural factors (topsoil mixing with water and percolation, weathering of ground strata) as latent pollution sources. Hierarchical cluster analysis grouped the 35 sampling stations of Bidar urban into three clusters, i.e., relatively less polluted (LP), moderately polluted (MP) and highly polluted (HP) sites, based on the similarity of water quality characteristics.
Keywords: Bidar urban & its industrial area, Ground water quality, Correlation coefficients, Factor analysis, Cluster analysis
Environmental impact due to the doubling of green house gases through global ...eSAT Journals
Abstract
Anthropogenic carbon is responsible for both global warming and ocean acidification, and efforts are underway to understand the role of the ocean in a high-CO2 world in a global context. Atmospheric carbon dioxide (CO2) concentration has continued to increase and is now almost 100 ppm above its pre-industrial level. Combining our reconstruction with the known history of anthropogenic emissions gives a more precise and detailed view of the terrestrial biosphere sink. The term 'greenhouse effect' refers to the way certain gases trap heat in the atmosphere, much as the glass in a greenhouse prevents rising warm air from escaping. The greenhouse effect is a process whereby energy from the sun readily penetrates the lower atmosphere, reaches the surface of Earth and is converted to heat, but then cannot freely leave the planet. Due to the presence of certain greenhouse gases that trap heat, such as carbon dioxide, methane, water vapour and CFCs, the atmosphere retains the sun's radiation and warms the planet. By increasing the abundance of these gases in the atmosphere, humankind is increasing the overall warming of the Earth's surface and lower atmosphere, a process called "global warming". CO2 is a greenhouse gas and, as the IPCC report shows, its radiative effect is greater than that of all the other anthropogenic greenhouse gases.
A new conceptual algorithm for adaptive route changing in urban environments eSAT Journals
Abstract In this paper an attempt is made to extend a mathematical model of adaptive route changing, previously developed at small scale, to large scale. The evaluation considers 75 or more traffic junctions with a vehicular density of 4 to 6 vehicles per second at each junction. The main objective of this paper is to show how an Adaptive Route Changing (ARC) application responds to the assumed scenario, and to propose an effective software architecture to monitor traffic at all junctions in real time so as to minimise traffic congestion. Keywords: ARC, Intelligent transport system, Traffic Monitoring
Improved spambase dataset prediction using svm rbf kernel with adaptive boost eSAT Journals
Abstract Spam is no longer mere garbage but a risk, as it includes virus attachments and spyware agents that can ruin the recipient's system; there is therefore an emerging need for spam detection. Many spam detection techniques based on machine learning algorithms have been proposed, and as the amount of spam sent with bulk mailing tools has increased tremendously, detection techniques must keep up. In this paper we propose a hybrid classifier, adaptive boosting with a support vector machine RBF kernel, on the Spambase dataset, with features first extracted by principal component analysis. General Terms: Email spam classification. Keywords: Adaboost, classifier, ensemble, machine learning, spam email, SVM
Analysis of problems of biomass grinder integrated with briquetting plant eSAT Journals
Abstract In a briquetting plant, briquettes are produced from small pieces of soybean stalk, pigeon pea straw, lantana stalk and cotton stalk with moisture content below 10%. The biomass grinding machine is an important part of the briquetting plant; its function is to convert aggregate material into small pieces. The biomass grinder consists mainly of hammers, a grate and a shaft. The shaft is an essential component of the grinding machine, holding the hammers, pulley and fan. Failures of the shaft bearing were observed due to non-uniform distribution of stress, so a simulation study of the shaft was undertaken for stress analysis with the FE software ANSYS 12.0.1. Keywords: Biomass Grinder, FEM, Stress-strain, Stepped Shaft
A survey on efficient no reference blur estimation methods eSAT Journals
Abstract Blur estimation in image processing has come to be of great importance in assessing the quality of images. This work presents a survey of no-reference blur estimation methods. A no-reference method is particularly useful when the input image used for blur estimation has no available corresponding reference image. This paper compares the methodologies of four no-reference blur estimation methods. The first method applies a scale-adaptive technique of blur estimation to get better accuracy in the results. The second blur metric finds the energy using second order derivatives of an image computed with a derivative pair of quadrature filters. The third blur metric is based on kurtosis measurement in the discrete dyadic wavelet transform (DDWT) of the images. The fourth method estimates blur as the ratio of the sum of the edge widths of all detected edges to the total number of edges. The results provided are useful for comparing the methods on metrics such as the Spearman correlation coefficient, and are obtained by evaluation on images from the Laboratory for Image and Video Engineering (LIVE) database. The methods are evaluated on images with varying amounts of added noise, and performance is assessed for three categories: Gaussian blur, motion blur and JPEG2000 compressed images. Blur estimation finds application in quality assessment, image fusion and auto-focusing. The sharpness of an image can also be derived from the blur metric, as sharpness is inversely proportional to blur, and sharpness metrics can be combined with other metrics to measure overall image quality. Keywords: Blur estimation, blur, no-reference
Dynamic solar powered robot using dc dc sepic topology eSAT Journals
Abstract This paper presents an approach to maintaining a constant voltage to the charging battery at both low and high light intensity by using the SEPIC topology in a solar powered robot. Two batteries are used, one charging and one discharging: while battery 1 is being charged, the already charged battery 2 runs the robot. Switching is controlled by an ARM processor using a DPDT relay, and voltage and current levels are monitored. Maximum power point tracking of the solar panel is performed by a single-axis tracking system. Keywords: Battery system, Solar MPPT, Robot model, SEPIC topology
A survey on full reference image quality assessment algorithms eSAT Journals
Abstract Image quality measurement is very important for various image processing applications such as recognition, retrieval, classification, compression, restoration and similar fields. Images may contain different types of distortion, such as blur, noise and contrast change, so it is essential to rate image quality appropriately. Traditionally, subjective rating methods were used, in which humans rated image quality; this is a costly process that requires experts. Nowadays many image quality assessment algorithms are available; these objective methods are mainly based on the properties of the human visual system. Most are heuristic and limited to specific application environments, whereas some perform efficiently and have performance comparable to subjective ratings. Objective methods are easy to integrate into image compression techniques and other image processing applications. Based on the availability of the reference image, image quality assessment algorithms are classified into full reference, reduced reference and no reference methods. Full reference algorithms normally adopt a two-stage structure, local quality measurement followed by pooling, whose input is a reference image and a distorted image. This paper presents a survey of existing full reference image quality assessment algorithms, in which a reference image is available for judging the quality of the distorted image. Keywords: Image distortion, Image quality assessment, Human visual system
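For orientation, the simplest full-reference metric is PSNR, which scores a distorted signal against its reference via mean squared error (a generic textbook illustration, not one of the HVS-based algorithms surveyed here; `psnr` is a hypothetical helper):

```python
import math

def psnr(reference, distorted, peak=255.0):
    """Peak signal-to-noise ratio (dB) between two equal-length pixel lists."""
    mse = sum((r - d) ** 2 for r, d in zip(reference, distorted)) / len(reference)
    if mse == 0:
        return float("inf")  # identical images: unbounded quality
    return 10.0 * math.log10(peak * peak / mse)

# every pixel off by one grey level -> MSE of 1
score = psnr([0, 255, 128, 64], [1, 254, 129, 63])
```

More sophisticated full-reference methods replace the pixelwise error with a local quality map followed by a pooling stage, but they follow the same reference-versus-distorted pattern.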
Behaviour of 3 d rc frames with masonry infill under earthquake loads an ana...eSAT Journals
Abstract Moderate and severe earthquakes have struck different places in the world, causing serious damage to reinforced concrete structures. The bond between the structural elements and the masonry in-fills of a building is often affected by earthquakes. Masonry in-fills occupy the voids between the horizontal and vertical resisting elements of the building frame, and an infill wall considerably enhances the strength and rigidity of the structure. It has been observed that frames with in-fills have more strength and rigidity than bare frames. Hence this study examines the behaviour of 3D RC frames with and without masonry in-fills using E-TABS. Parameters such as displacement, lateral load distribution, stiffness and overturning moment of the frames were studied, and it is concluded that the in-fill walls need to be considered in the design phase of the structure.
Keywords: Earthquake load, 3D RC Frame, Masonry In-Fill
Simulation of incremental conductance mppt with direct control method using c...eSAT Journals
Abstract A maximum power point tracker (MPPT) operates a photovoltaic (PV) array as a source of electrical power. Every PV array has an optimum operating point, called the maximum power point, which varies with cell temperature, insolation level and array voltage; the function of the MPPT is to operate the PV array at this point. The design of an MPPT utilizing a Cuk converter topology is proposed. The solar panel voltage and current are continuously monitored by the MPPT, and the duty cycle of the Cuk converter is continuously adjusted to extract maximum power. The design consists of a PV array and a DC-DC Cuk converter, and many tracking algorithms have been proposed for this task. One particular algorithm, the incremental conductance method, although claimed by many in the literature to be inferior to others, continues to be by far the most widely used method in commercial PV MPPTs. The general model was implemented in Matlab; it accepts irradiance and temperature as variable parameters and outputs the I-V and P-V characteristics. Index Terms: PV system; Maximum power point tracking (MPPT); Incremental conductance (IncCond); digital signal processor (DSP)
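The decision rule behind incremental conductance can be sketched in a few lines (a simplified illustration with a hypothetical `inc_cond_step` helper and a toy linear panel model, not the paper's Matlab implementation): at the maximum power point dP/dV = 0, which is equivalent to dI/dV = -I/V, so comparing the measured incremental conductance against -I/V tells the controller which way to move the reference voltage.

```python
def inc_cond_step(v, i, v_prev, i_prev, step=0.1):
    """One incremental-conductance decision: return the next reference voltage.
    At the maximum power point dP/dV = 0, equivalently dI/dV = -I/V."""
    dv, di = v - v_prev, i - i_prev
    if dv == 0:
        if di == 0:
            return v                                 # still at the MPP: hold
        return v + step if di > 0 else v - step      # irradiance changed
    g = di / dv                                      # incremental conductance
    if g == -i / v:
        return v                                     # exactly at the MPP
    return v + step if g > -i / v else v - step      # left / right of the MPP

# toy panel with I = 5 - 0.2*V, so P = 5V - 0.2V^2 peaks at V = 12.5
left = inc_cond_step(10.1, 5 - 0.2 * 10.1, 10.0, 5 - 0.2 * 10.0)   # below the MPP
right = inc_cond_step(15.1, 5 - 0.2 * 15.1, 15.0, 5 - 0.2 * 15.0)  # above the MPP
```

Below the peak the rule raises the voltage, above it the rule lowers it; in the proposed design this decision drives the duty cycle of the Cuk converter instead of a voltage reference directly.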
A study and comparison of olsr, aodv and zrp routing protocols in ad hoc netw...eSAT Journals
Abstract A mobile ad hoc network (MANET) is characterized by multihop wireless connectivity among independent nodes that move dynamically, changing the network connectivity, without any pre-existing infrastructure. A MANET offers [1, 2] the flexibility for a network to form anywhere, at any time, as long as two or more nodes are connected and communicate with each other, either directly when in radio range or via intermediate mobile nodes. Routing is a significant issue and challenge in ad hoc networks, and many routing protocols, such as OLSR, AODV, DSDV, DSR, ZRP, TORA and LAR, have been proposed to improve routing performance and reliability. This paper describes the characteristics of the ad hoc routing protocols OLSR, AODV and ZRP using performance metrics such as packet delivery ratio, end-to-end delay, throughput and jitter as the number of nodes in the network increases. The comparative study shows that OLSR and ZRP perform well in dense networks under low mobility and low traffic, but in high mobility and high traffic environments ZRP outperforms both OLSR and AODV. Keywords: MANET, AODV, OLSR, ZRP, routing
Analysis and design of high rise building frame using staad pro eSAT Journals
Abstract The aim of the present study, "Analysis and design of a high rise building by STAAD Pro 2008", is to define proper techniques for creating geometry, cross sections for columns and beams, specifications and support conditions, and types of loads and load combinations. In this study a 30-storey high rise structure is analyzed for seismic and wind load combinations using STAAD Pro 2008, and a comparison is drawn. Keywords: Analysis, Geometry, Structure, Wind load
Comparative analysis of dynamic programming algorithms to find similarity in ...eSAT Journals
Abstract Many computational methods exist for finding similarity in gene sequences; finding a method that gives optimal similarity is a difficult task. The objective of this work is to find an appropriate method to compute similarity in gene/protein sequences, both within families and across families. Dynamic programming algorithms such as Levenshtein edit distance, Longest Common Subsequence and Smith-Waterman have been used to find similarities between two sequences, but these methods have previously been applied only to synthetic data rather than real benchmark data sets. We propose a new method to compute similarity and evaluate its performance using a number of data sets from various families, calculating the similarity value both within a family and across families. A comparative analysis, including time complexity, reveals that the Smith-Waterman approach is appropriate when the gene/protein sequences belong to the same family, while Longest Common Subsequence is best suited when the sequences belong to two different families. Keywords: Bioinformatics, Gene, Gene Sequencing, Edit distance, String Similarity
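As a sketch of the dynamic-programming idea these methods share, the Levenshtein edit distance between two sequences can be computed with two rolling rows (a standard textbook formulation, not the paper's proposed method):

```python
def levenshtein(a, b):
    """Dynamic-programming edit distance between two sequences."""
    prev = list(range(len(b) + 1))           # distances for the empty prefix of a
    for i, ca in enumerate(a, 1):
        cur = [i]                            # deleting i characters of a
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            cur.append(min(prev[j] + 1,      # deletion
                           cur[j - 1] + 1,   # insertion
                           prev[j - 1] + cost))  # match / substitution
        prev = cur
    return prev[-1]

dist = levenshtein("kitten", "sitting")
```

Smith-Waterman follows the same table-filling recurrence but clamps scores at zero and maximizes a similarity score, which is what makes it a local rather than a global alignment method.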
Scalable and efficient cluster based framework for multidimensional indexing eSAT Journals
Abstract Indexing high dimensional data has utility in many real world applications, dramatically improving the information retrieval process in particular. Existing techniques address the "curse of dimensionality" of high dimensional data sets with the Vector Approximation File (VA-File), which results in sub-optimal performance. Compared with the VA-File, clustering yields a more compact representation of the data set because it exploits inter-dimensional correlations; however, pruning of unwanted clusters is important, and existing pruning techniques based on bounding rectangles and bounding hyperspheres have problems in NN search. To overcome this, Ramaswamy and Rose proposed adaptive cluster distance bounding for high dimensional indexing, which also includes efficient spatial filtering. In this paper we implement this high-dimensional indexing approach and build a prototype application as a proof of concept. Experimental results are encouraging, and the prototype can be used in real time applications. Index Terms: Clustering, high dimensional indexing, similarity measures, multimedia databases
Scaling Down Dimensions and Feature Extraction in Document Repository Classif...ijdmtaiir
In this study, a comprehensive evaluation of two supervised feature selection methods for dimensionality reduction, Latent Semantic Indexing (LSI) and Principal Component Analysis (PCA), is performed. These are gauged against unsupervised techniques such as fuzzy feature clustering using hard fuzzy C-means (FCM). The main objective of the study is to estimate the relative efficiency of the two supervised techniques against the unsupervised fuzzy techniques in reducing the feature space. It is found that clustering using FCM leads to better accuracy in classifying documents than LSI and PCA. The results show that clustering of features improves the accuracy of document classification.
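The LSI step the study evaluates amounts to a truncated SVD of a term-document matrix: documents are projected into a low-dimensional latent space before classification. A minimal sketch follows; the toy count matrix is illustrative and not taken from the paper:

```python
import numpy as np

# toy term-document count matrix: rows = terms, columns = documents
A = np.array([
    [2, 0, 1, 0],
    [1, 1, 0, 0],
    [0, 2, 0, 1],
    [0, 0, 1, 2],
], dtype=float)

def lsi_reduce(A, k):
    """Project documents into a k-dimensional latent space via
    truncated SVD (the core operation of Latent Semantic Indexing)."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    # document coordinates: rows of (S_k V_k^T)^T, one per document
    return (np.diag(s[:k]) @ Vt[:k]).T

docs_2d = lsi_reduce(A, 2)
print(docs_2d.shape)  # (4, 2): each document is now a 2-d vector
```

A classifier or clustering algorithm is then run on the reduced document vectors instead of the full term space.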
Evaluating the efficiency of rule techniques for file classification - eSAT Journals
Abstract Text mining refers to the process of deriving high-quality information from text. Also known as knowledge discovery from text (KDT), it deals with the machine-supported analysis of text and is used in areas such as information retrieval, marketing, information extraction, natural language processing, and document similarity. Document similarity is one of the important techniques in text mining; its first and foremost step is to classify files based on their category. In this research work, various classification rule techniques are used to classify computer files based on their extensions (for example, pdf, doc, ppt, or xls). Several rule-classifier algorithms exist, such as decision table, JRip, Ridor, DTNB, NNge, PART, OneR and ZeroR. Here, three classification algorithms, namely decision table, DTNB and OneR, are used to classify computer files by extension. The results produced by these algorithms are analyzed using classification accuracy and error rate as performance factors. The experimental results show DTNB to be more efficient than the other two techniques. Index Terms: Data mining, Text mining, Classification, Decision table, DTNB, OneR
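The simplest of the three classifiers, OneR, builds its rule from a single attribute; when that attribute is the file extension, the rule is just the majority class seen per extension. A minimal sketch, with illustrative labels that are not taken from the paper:

```python
from collections import Counter, defaultdict

def one_r_train(examples):
    """OneR on a single attribute (the file extension): for each
    extension value, predict the majority class from training."""
    by_ext = defaultdict(Counter)
    for ext, label in examples:
        by_ext[ext][label] += 1
    return {ext: counts.most_common(1)[0][0]
            for ext, counts in by_ext.items()}

train = [("pdf", "document"), ("doc", "document"),
         ("ppt", "presentation"), ("xls", "spreadsheet"),
         ("pdf", "document"), ("ppt", "presentation")]
rule = one_r_train(train)
print(rule["pdf"])  # document
```

Decision table and DTNB generalize this idea to combinations of attributes, which is why they can outperform OneR at the cost of a larger rule set.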
Particle Swarm Optimization based K-Prototype Clustering Algorithm - iosrjce
IOSR Journal of Computer Engineering (IOSR-JCE) is a double blind peer reviewed International Journal that provides rapid publication (within a month) of articles in all areas of computer engineering and its applications. The journal welcomes publication of high quality papers on theoretical developments and practical applications in computer technology. Original research papers, state-of-the-art reviews, and high quality technical notes are invited for publication.
SCAF – AN EFFECTIVE APPROACH TO CLASSIFY SUBSPACE CLUSTERING ALGORITHMS - ijdkp
Subspace clustering discovers the clusters embedded in multiple, overlapping subspaces of high-dimensional data. Many significant subspace clustering algorithms exist, each with different characteristics arising from the techniques, assumptions and heuristics used. A comprehensive classification scheme is needed that considers all such characteristics to divide subspace clustering approaches into families, where algorithms belonging to the same family satisfy common characteristics. Such a categorization will help future developers better understand the quality criteria to use and identify similar algorithms against which to compare their proposed clustering algorithms. In this paper, we first propose the concept of SCAF (Subspace Clustering Algorithms' Family). The characteristics of a SCAF are based on classes such as cluster orientation and overlap of dimensions. As an illustration, we further provide a comprehensive, systematic description and comparison of a few significant algorithms belonging to the "axis-parallel, overlapping, density-based" SCAF.
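A family scheme of this kind can be realized as a record of values on the classification axes: algorithms sharing every value belong to one SCAF. The sketch below uses a handful of well-known subspace clustering algorithms, but the category assignments shown are illustrative, not the paper's classification:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SCAF:
    """A family is identified by the values its members share
    on the classification axes."""
    orientation: str   # e.g. "axis-parallel" or "arbitrarily-oriented"
    overlap: str       # e.g. "overlapping" or "non-overlapping"
    approach: str      # e.g. "density-based", "grid-based"

# illustrative assignments only
algorithms = {
    "SUBCLU": SCAF("axis-parallel", "overlapping", "density-based"),
    "FIRES":  SCAF("axis-parallel", "overlapping", "density-based"),
    "CLIQUE": SCAF("axis-parallel", "overlapping", "grid-based"),
}

def family(target, catalogue):
    """All algorithms falling in the same SCAF as `target`."""
    return sorted(name for name, f in catalogue.items()
                  if f == catalogue[target])

print(family("SUBCLU", algorithms))  # ['FIRES', 'SUBCLU']
```

Frozen dataclasses compare by field values, so two algorithms are in the same family exactly when all their axis values match.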
Comparison Between Clustering Algorithms for Microarray Data Analysis - IOSR Journals
Currently, two techniques are used for large-scale gene-expression profiling: microarray and RNA sequencing (RNA-Seq). This paper studies and compares different clustering algorithms used in microarray data analysis. A microarray is an array of DNA molecules that allows multiple hybridization experiments to be carried out simultaneously and traces the expression levels of thousands of genes. It is a high-throughput technology for gene expression analysis and has become an effective tool for biomedical research. Microarray analysis aims to interpret the data produced from experiments on DNA, RNA, and protein microarrays, which enable researchers to investigate the expression state of a large number of genes. Data clustering is the first and main step in microarray data analysis. The k-means, fuzzy c-means, self-organizing map, and hierarchical clustering algorithms are investigated in this paper and compared on the basis of their clustering models.
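Of the algorithms compared, k-means is the simplest to state: alternate between assigning each profile to its nearest centroid and recomputing centroids as cluster means. A plain Lloyd's-iteration sketch on toy expression profiles (the data values are illustrative):

```python
import random

def dist2(a, b):
    """Squared Euclidean distance between two profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's k-means; each point is one gene's profile.
    Returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # assignment step: nearest centroid for each point
        labels = [min(range(k), key=lambda j: dist2(p, centroids[j]))
                  for p in points]
        # update step: centroid becomes the mean of its members
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:
                centroids[j] = [sum(c) / len(members)
                                for c in zip(*members)]
    return centroids, labels

genes = [[0.0, 0.1], [0.2, 0.0], [5.0, 5.1], [5.2, 5.0]]
_, labels = kmeans(genes, 2)
print(labels)
```

Fuzzy c-means replaces the hard assignment step with fractional memberships, which is the main modeling difference the comparison turns on.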
Clustering of medline documents using semi supervised spectral clustering - eSAT Journals
Abstract We consider local-content (LC) information, global-content (GC) information from PubMed, and MeSH (Medical Subject Headings) terms for the clustering of biomedical documents. The performance of MEDLINE document clustering is improved over previous methods by combining the LC and GC information. We propose a semi-supervised spectral clustering method to overcome the limitations of the representation spaces of earlier methods. Keywords: document clustering, semi-supervised clustering, spectral clustering
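The core spectral step can be sketched with the LC/GC combination reduced to a weighted sum of similarity matrices; the mixing weight alpha and the matrices are illustrative choices, not the paper's exact formulation:

```python
import numpy as np

def spectral_embed(W, k):
    """Rows of the k smallest eigenvectors of the symmetric
    normalized Laplacian of similarity matrix W give the spectral
    coordinates of each document."""
    d = W.sum(axis=1)
    L = np.eye(len(W)) - W / np.sqrt(np.outer(d, d))
    _, vecs = np.linalg.eigh(L)  # eigh sorts eigenvalues ascending
    return vecs[:, :k]

def combined_embedding(W_lc, W_gc, k, alpha=0.5):
    """Blend local-content and global-content similarities before
    the spectral step (alpha is an illustrative mixing weight)."""
    return spectral_embed(alpha * W_lc + (1 - alpha) * W_gc, k)
```

The resulting embedding is then clustered with a standard algorithm such as k-means; documents with similar LC and GC profiles land near each other in the spectral space.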
International Journal of Engineering Research and Applications (IJERA) is an open access online peer reviewed international journal that publishes research and review articles in the fields of Computer Science, Neural Networks, Electrical Engineering, Software Engineering, Information Technology, Mechanical Engineering, Chemical Engineering, Plastic Engineering, Food Technology, Textile Engineering, Nano Technology & science, Power Electronics, Electronics & Communication Engineering, Computational mathematics, Image processing, Civil Engineering, Structural Engineering, Environmental Engineering, VLSI Testing & Low Power VLSI Design etc.
Comparative study of various supervised classification methods for analysing def... - eSAT Publishing House
Data mining is used to manage the huge amount of information stored in data warehouses and databases and to discover the required knowledge. Many data mining techniques have been proposed, for example association rules, decision trees, neural networks, and clustering, and the field has been a point of attention for many years. Among the available data mining strategies, clustering is one of the most effective: it groups a dataset into a number of clusters based on certain predefined guidelines and is relied upon to discover relationships between the distinct characteristics of the data.
In the k-means clustering algorithm, features are selected on the basis of their relevance for predicting the data, and the Euclidean distance between the centroid of a cluster and the data objects outside the cluster is computed when clustering the data points. In this work, the author enhanced the Euclidean distance formula to increase cluster quality.
The problem of accuracy and redundancy of dissimilar points in the clusters remains in the improved k-means, for which a new enhanced approach has been proposed that uses a similarity function to check the similarity level of a point before including it in a cluster.
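The similarity check described can be sketched as a gate on the assignment step: a point joins its nearest cluster only if it is similar enough, and is otherwise treated as an outlier. The inverse-distance similarity and the threshold below are illustrative choices, not the paper's exact function:

```python
def dist(a, b):
    """Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def assign_with_threshold(point, centroids, threshold=0.5):
    """Assign `point` to its nearest centroid only when the
    similarity (here 1 / (1 + distance), an illustrative choice)
    clears the threshold; otherwise flag it as an outlier (-1)."""
    best = min(range(len(centroids)),
               key=lambda j: dist(point, centroids[j]))
    similarity = 1.0 / (1.0 + dist(point, centroids[best]))
    return best if similarity >= threshold else -1

centroids = [(0.0, 0.0), (10.0, 10.0)]
print(assign_with_threshold((0.5, 0.0), centroids))  # 0: close enough
print(assign_with_threshold((5.0, 5.0), centroids))  # -1: too dissimilar
```

Gating the assignment this way keeps dissimilar points from being forced into a cluster, which is the redundancy problem the enhanced approach targets.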
introduce a novel volumization algorithm, which transforms 2D images into 3D volumetric representations.
When combined with 3D convolution and deep curriculum learning optimization (CLO), itsignificantly improves
the immunity of models against localized universal attacks by up to 40%. We evaluate our proposed approach
using contemporary CNN architectures and the modified Canadian Institute for Advanced Research (CIFAR-10
and CIFAR-100) and ImageNet Large Scale Visual Recognition Challenge (ILSVRC12) datasets, showcasing
accuracy improvements over previous techniques. The results indicate that the combination of the volumetric
input and curriculum learning holds significant promise for mitigating adversarial attacks without necessitating
adversary training.
Industrial Training at Shahjalal Fertilizer Company Limited (SFCL)MdTanvirMahtab2
This presentation is about the working procedure of Shahjalal Fertilizer Company Limited (SFCL). A Govt. owned Company of Bangladesh Chemical Industries Corporation under Ministry of Industries.
About
Indigenized remote control interface card suitable for MAFI system CCR equipment. Compatible for IDM8000 CCR. Backplane mounted serial and TCP/Ethernet communication module for CCR remote access. IDM 8000 CCR remote control on serial and TCP protocol.
• Remote control: Parallel or serial interface.
• Compatible with MAFI CCR system.
• Compatible with IDM8000 CCR.
• Compatible with Backplane mount serial communication.
• Compatible with commercial and Defence aviation CCR system.
• Remote control system for accessing CCR and allied system over serial or TCP.
• Indigenized local Support/presence in India.
• Easy in configuration using DIP switches.
Technical Specifications
Indigenized remote control interface card suitable for MAFI system CCR equipment. Compatible for IDM8000 CCR. Backplane mounted serial and TCP/Ethernet communication module for CCR remote access. IDM 8000 CCR remote control on serial and TCP protocol.
Key Features
Indigenized remote control interface card suitable for MAFI system CCR equipment. Compatible for IDM8000 CCR. Backplane mounted serial and TCP/Ethernet communication module for CCR remote access. IDM 8000 CCR remote control on serial and TCP protocol.
• Remote control: Parallel or serial interface
• Compatible with MAFI CCR system
• Copatiable with IDM8000 CCR
• Compatible with Backplane mount serial communication.
• Compatible with commercial and Defence aviation CCR system.
• Remote control system for accessing CCR and allied system over serial or TCP.
• Indigenized local Support/presence in India.
Application
• Remote control: Parallel or serial interface.
• Compatible with MAFI CCR system.
• Compatible with IDM8000 CCR.
• Compatible with Backplane mount serial communication.
• Compatible with commercial and Defence aviation CCR system.
• Remote control system for accessing CCR and allied system over serial or TCP.
• Indigenized local Support/presence in India.
• Easy in configuration using DIP switches.
Water scarcity is the lack of fresh water resources to meet the standard water demand. There are two type of water scarcity. One is physical. The other is economic water scarcity.
Sachpazis:Terzaghi Bearing Capacity Estimation in simple terms with Calculati...Dr.Costas Sachpazis
Terzaghi's soil bearing capacity theory, developed by Karl Terzaghi, is a fundamental principle in geotechnical engineering used to determine the bearing capacity of shallow foundations. This theory provides a method to calculate the ultimate bearing capacity of soil, which is the maximum load per unit area that the soil can support without undergoing shear failure. The Calculation HTML Code included.
CFD Simulation of By-pass Flow in a HRSG module by R&R Consult.pptxR&R Consult
CFD analysis is incredibly effective at solving mysteries and improving the performance of complex systems!
Here's a great example: At a large natural gas-fired power plant, where they use waste heat to generate steam and energy, they were puzzled that their boiler wasn't producing as much steam as expected.
R&R and Tetra Engineering Group Inc. were asked to solve the issue with reduced steam production.
An inspection had shown that a significant amount of hot flue gas was bypassing the boiler tubes, where the heat was supposed to be transferred.
R&R Consult conducted a CFD analysis, which revealed that 6.3% of the flue gas was bypassing the boiler tubes without transferring heat. The analysis also showed that the flue gas was instead being directed along the sides of the boiler and between the modules that were supposed to capture the heat. This was the cause of the reduced performance.
Based on our results, Tetra Engineering installed covering plates to reduce the bypass flow. This improved the boiler's performance and increased electricity production.
It is always satisfying when we can help solve complex challenges like this. Do your systems also need a check-up or optimization? Give us a call!
Work done in cooperation with James Malloy and David Moelling from Tetra Engineering.
More examples of our work https://www.r-r-consult.dk/en/cases-en/
Quality defects in TMT Bars, Possible causes and Potential Solutions.PrashantGoswami42
Maintaining high-quality standards in the production of TMT bars is crucial for ensuring structural integrity in construction. Addressing common defects through careful monitoring, standardized processes, and advanced technology can significantly improve the quality of TMT bars. Continuous training and adherence to quality control measures will also play a pivotal role in minimizing these defects.
Forklift Classes Overview by Intella PartsIntella Parts
Discover the different forklift classes and their specific applications. Learn how to choose the right forklift for your needs to ensure safety, efficiency, and compliance in your operations.
For more technical information, visit our website https://intellaparts.com
Data reduction techniques for high dimensional biological data
IJRET: International Journal of Research in Engineering and Technology eISSN: 2319-1163 | pISSN: 2321-7308
Volume: 05 Issue: 02 | Feb-2016, Available @ http://www.ijret.org
DATA REDUCTION TECHNIQUES FOR HIGH DIMENSIONAL BIOLOGICAL DATA
Shyam Mohan J S (1), P. Shanmugapriya (2), N. Kumaran (3)
(1) Research Scholar, Department of CSE, SCSVMV University, Enathur, Kanchipuram - 631 561
(2) Associate Professor, Department of IT, SCSVMV University, Enathur, Kanchipuram - 631 561
(3) Assistant Professor, Department of IT, SCSVMV University, Enathur, Kanchipuram - 631 561
Abstract
High-dimensional biological datasets have been growing rapidly in recent years. Extracting knowledge from and analyzing high-dimensional biological data is one of the key challenges, in which variety and veracity are the two distinguishing characteristics. The questions that arise are: how to perform dimensionality reduction for this heterogeneous data, how to develop a high-performance platform to analyze high-dimensional biological data efficiently, and how to find useful information in this data. To discuss these issues in depth, this paper begins with a brief introduction to the data analytics available for biological data, followed by a discussion of big data analytics and a survey of data reduction methods for biological data. We propose a dense clustering algorithm for standard high-dimensional biological data.
Keywords: Big Data Analytics, Dimensionality Reduction.
---------------------------------------------------------------------***---------------------------------------------------------------------
1. INTRODUCTION
Data reduction is the part of analysis in which the data is shortened, focused, and organized so that researchers can take effective decisions based on the data pulled out. Dataset reduction can be done in several ways: first, by coding data in an exploratory analysis; second, by partitioning data for theoretical or hypothesis testing; third, by eliminating irrelevant data that is not used for analysis [1]. High-dimensional datasets in biology arise in many ways. With the widespread utilization of DNA microarrays and high-throughput, parallel analysis, gene expression data for thousands of genes can now be obtained easily [2]. One common technique is feature extraction for dimensionality reduction [3]. Datasets that collaborate with each other (also called big data) have a large number of dimensions (attributes) [4, 5]. Each data item represents a point in a multidimensional space. Clustering is an unsupervised learning method, i.e., it requires no supervision and groups similar data points into clusters (dense regions) [6]. As the dimensionality of data increases, it becomes difficult to form clusters. Traditional clustering algorithms like DBSCAN [7] generate clusters in the full-dimensional space by measuring proximity across all dimensions of a dataset, and therefore these classical clustering algorithms are both ineffective and inefficient for high-dimensional datasets [8]. In data mining, data reduction for low-dimensional datasets can be done using principal component analysis [9], singular value decomposition [10], and dynamic tensor analysis [11]; however, these techniques fail for high-dimensional datasets. Generally, high-dimensional data [12] has two main implications: (1) finding the relative contrast between similar and dissimilar points becomes difficult as the dimensionality of the data grows [13]; (2) different datasets must be grouped with each other [14].
2. MOTIVATING EXAMPLES
High-dimensional datasets in biology: In biology, high-dimensional datasets are obtained from microarray chips, which form a matrix [15]. The rows of the matrix contain the expression levels of genes and the columns contain experimental conditions. Rows and columns can be clustered to form meaningful biological inferences, but only a subset of genes is considered in any experiment [16]. Forming clusters for such microarray data is called biclustering [17].
Bioinformatics: In bioinformatics, large genomes are represented using Expressed Sequence Tags (ESTs), which contain portions of genes (mature mRNA) at least 500-800 nucleotides long. Locations of genes can be found by searching [18]. The task of identifying related or different genes or protein domains by sequence search techniques is complex; a highly beneficial kind of search for proteins is remote homology detection [19, 20].
3. BACKGROUND AND LITERATURE WORK
Many data reduction techniques have been proposed in the literature. They can be divided into two categories: global alignment and local alignment methods. Global alignment, or full-sequence matching, is used for global optimization, while local alignment is used to identify similar local regions in sequences that are globally divergent. Table 1 lists the various alignment methods available, with a description of each.
Table 1: Existing Data Reduction Techniques
1. BLAST [21]: Uses heuristic search for computing long biological sequences.
2. BLAST2 [22]: Performs a limited number of insertions and deletions to increase the speed of computation.
3. MegaBLAST [23]: Uses a greedy search algorithm for detecting a variety of sequences.
4. MPBLAST and miBLAST [24]: Provide results for queries executed in parallel.
5. BLAT [25]: Provides indexing and linear search for finding the best match in the index.
6. OASIS [26]: Performs similarity measurement using a local alignment technique.
7. The Shift-Or algorithm [27]: Used for searching large queries by following a bit-manipulation technique.
8. RBSA [28]: Uses an alphabet reduction technique in which 1 represents odd characters and 0 represents even characters.
9. Short-read sequencing methods [29]: Perform near-exact subsequence matching by assumption, e.g., SOAP.
10. q-gram [30]: Follows the pigeon-hole principle, i.e., similar sequences share common subsequences.
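The q-gram idea (item 10 above) can be made concrete with a short sketch: two sequences are compared by counting the length-q substrings they share. This is an illustration of the general principle only, not the cost-based gram selection method of [30]; the function names are ours.

```python
def qgram_profile(seq, q=3):
    """Count every substring of length q occurring in the sequence."""
    profile = {}
    for i in range(len(seq) - q + 1):
        gram = seq[i:i + q]
        profile[gram] = profile.get(gram, 0) + 1
    return profile

def qgram_similarity(a, b, q=3):
    """Number of q-grams the two sequences have in common (with multiplicity)."""
    pa, pb = qgram_profile(a, q), qgram_profile(b, q)
    return sum(min(count, pb.get(gram, 0)) for gram, count in pa.items())

# Similar sequences share many q-grams; one substitution removes at most q of them.
print(qgram_similarity("ACGTACGT", "ACGTTCGT"))  # → 3
```

By the pigeon-hole principle, if two sequences differ by few edits, enough q-grams must survive for a filter based on this count to admit them as candidates.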
For parallel processing analytics, Principal Component Analysis (PCA) is a well-known model for dimensionality reduction that requires a small number of I/O and CPU operations [31, 32, 33]. It is applicable to any dataset with numeric dimensions and is widely used in data mining for dimensionality reduction.
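A minimal sketch of this basic model (assuming NumPy; an illustration of plain PCA, not the parallel summarization method of [31]): center the data, take the eigenvectors of the covariance matrix, and project onto the top-k components.

```python
import numpy as np

def pca_reduce(X, k):
    """Project the rows of X onto the top-k principal components."""
    Xc = X - X.mean(axis=0)                    # center each numeric dimension
    cov = np.cov(Xc, rowvar=False)             # feature covariance matrix
    vals, vecs = np.linalg.eigh(cov)           # eigh returns ascending eigenvalues
    top = vecs[:, np.argsort(vals)[::-1][:k]]  # columns = top-k eigenvectors
    return Xc @ top

X = np.array([[2.0, 0.1], [4.0, 0.2], [6.0, 0.3]])
Z = pca_reduce(X, 1)  # three samples reduced from two dimensions to one
```

For low-dimensional numeric data this works well; as noted above, it is exactly the setting where such classical techniques remain effective.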
EntityRel [34] is an entity-correlation graph (a biomedical graph) constructed from unstructured data by computing the relevance between two heterogeneous entities. It extends the scope of unstructured text data by performing meta-path-based relationship analysis [35] using data mining techniques.
DRESS [36] is a dimensionality reduction technique for efficient sequence search; it is capable of handling large queries and provides accurate results in optimal time. It applies mapping transformations from the original strings to new strings for dimensionality reduction.
IHOSVD [37] is a tensor-based dimensionality reduction method that recursively applies an incremental matrix decomposition algorithm and periodically updates the orthogonal bases to compute the core tensor. In the tensor-based dimensionality reduction approach, the pipeline is organized into modules: data collection, tensorization, dimensionality reduction, analysis, and service. We apply clustering techniques to one of the modules in this model.
4. RELATED WORK
4.1 Data Tensorization Module
In this module, the collected data is not uniform; it is usually unstructured, semi-structured, or structured. In a unified tensor model, this data is represented as various sub-tensors that are finally combined to form a unified heterogeneous tensor. Most unstructured data consists of video and audio; XML documents, ontological data, etc. are semi-structured; and structured data is a combination of numbers and strings [37]. For biological data, we consider semi-structured data in a subspace cluster.
Definition (Clustering in Subspace):
Detecting clusters in the full space or in a subspace of it is clustering in subspace. A point may be a member of multiple clusters, each existing in a different subspace [38].
Problem statement: In subspace clustering, each cluster is associated with its own subspace, i.e., a subset of the dimensions of the full-dimensional space in which the cluster forms.
Preliminary
For any point p, we define center points, nearby points, and outliers, which together determine the clusters:
1. Point p is a center point if at least the minimum number of points lie within distance x of p (its nearby points). No point is reachable from a non-center point.
2. A point q is nearby to (reachable from) p if there exists a path p1, ..., pn with p1 = p and pn = q such that each pi+1 is nearby to pi; i.e., all points on the path must be center points, possibly excluding q itself.
3. Far-away points that cannot be reached from any center point are called outliers.
By these definitions, a cluster consists of center points joined with the non-center points that are nearby to them; hence each cluster contains at least one center point.
XML Example 1 [39]:
<Gene>
  <genes>
    <gene id="361">
      <name>BNGT1</name>
      <organism>Human</organism>
      <annotations>
        <protein_sequence>BEPCKMHCNMDHNKG</protein_sequence>
      </annotations>
    </gene>
    <gene id="362">
      <name>BNGT2</name>
      <organism>Human</organism>
      <family>angiopoietin</family>
      <annotations>
        <protein_sequence>KQWJIHFTLHCILM</protein_sequence>
      </annotations>
    </gene>
  </genes>
  <interactions>
    <sequence_similarity id="10001" gene1="361" gene2="362">
      <score>70</score>
    </sequence_similarity>
  </interactions>
  <relationships/>
</Gene>
Figure 1: Parsed tree for Example 1 (root node with the Organism/Human Being elements and the Gene Id, Attribute Id, and Protein Sequence attributes with their text nodes)
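The semi-structured records of Example 1 can be loaded with a standard XML parser before clustering; a sketch using Python's xml.etree.ElementTree (the element names follow Example 1; the variable names are ours):

```python
import xml.etree.ElementTree as ET

doc = """<Gene><genes>
  <gene id="361"><name>BNGT1</name><organism>Human</organism>
    <annotations><protein_sequence>BEPCKMHCNMDHNKG</protein_sequence></annotations></gene>
  <gene id="362"><name>BNGT2</name><organism>Human</organism><family>angiopoietin</family>
    <annotations><protein_sequence>KQWJIHFTLHCILM</protein_sequence></annotations></gene>
</genes></Gene>"""

root = ET.fromstring(doc)
# Map each gene id to its name and protein sequence for later clustering.
genes = {g.get("id"): (g.findtext("name"),
                       g.find("annotations").findtext("protein_sequence"))
         for g in root.iter("gene")}
print(genes["361"][0])  # → BNGT1
```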
5. METHOD
Without loss of generality, we assume that in step 1 each individual cluster contains one data point. In step 2, two neighboring clusters are merged to form a new cluster, leaving one cluster fewer, and therefore a hierarchical clustering structure is formed. The STREAM algorithm is used for constructing this hierarchical clustering structure [40]. In STREAM, the original data is divided into clusters [41] and then the small clusters are merged with bigger clusters using an agglomerative algorithm.
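The merge step described above (repeatedly fusing the closest clusters until the desired number remains) can be sketched as single-linkage agglomeration in Python. This is a generic illustration of the agglomerative idea on 1-D points, not the STREAM implementation of [40]:

```python
def closest_pair(clusters):
    """Indices of the two clusters whose closest members are nearest."""
    best, pair = float("inf"), (0, 1)
    for i in range(len(clusters)):
        for j in range(i + 1, len(clusters)):
            d = min(abs(a - b) for a in clusters[i] for b in clusters[j])
            if d < best:
                best, pair = d, (i, j)
    return pair

def agglomerate(points, k):
    """Start with one cluster per point; merge until k clusters remain."""
    clusters = [[p] for p in points]
    while len(clusters) > k:
        i, j = closest_pair(clusters)
        clusters[i].extend(clusters.pop(j))
    return clusters

print(agglomerate([1.0, 1.1, 5.0, 5.2, 9.0], 3))  # → [[1.0, 1.1], [5.0, 5.2], [9.0]]
```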
5.1 Clustering Algorithm
Input: dataset D (here of size 30), distance x, minimum points
Output: a set of clusters over D, with the remaining points marked as outliers
1. C = 0
2. For each point P in dataset D
3. If P is visited, proceed to the next point
4. Mark P as visited
5. NearbyPoints1 = ComputeQuery(P, x)
6. If (size of NearbyPoints1) < minimum points
7. Then mark P as an outlier
8. Else move C to the next cluster and expand it as below
Expanding Cluster
Input: cluster C, NearbyPoints1, point P, distance x, minimum points
1. Add P to cluster C
2. For each point P' in NearbyPoints1
3. If P' is not visited Then
4. Mark P' as visited
5. NearbyPoints2 = ComputeQuery(P', x)
6. If (size of NearbyPoints2) >= minimum points Then
7. NearbyPoints1 = NearbyPoints1 joined with NearbyPoints2
8. If P' is not yet a member of any cluster
9. Add P' to cluster C
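The two procedures above follow the density-based pattern of DBSCAN [7]: center points open clusters, nearby points are pulled in, and unreachable points become outliers. A runnable Python sketch on 1-D points (region_query, x, and min_pts are our illustrative names, not the paper's):

```python
def region_query(points, p, x):
    """Indices of all points within distance x of point p (its x-nearby points)."""
    return [i for i, q in enumerate(points) if abs(points[p] - q) <= x]

def dense_cluster(points, x, min_pts):
    """Return a cluster label per point; -1 marks outliers."""
    labels = [None] * len(points)
    cluster = -1
    for p in range(len(points)):
        if labels[p] is not None:          # already visited
            continue
        nearby = region_query(points, p, x)
        if len(nearby) < min_pts:
            labels[p] = -1                 # provisional outlier
            continue
        cluster += 1                       # p is a center point: open a cluster
        labels[p] = cluster
        seeds = list(nearby)
        while seeds:                       # expand the cluster
            q = seeds.pop()
            if labels[q] == -1:
                labels[q] = cluster        # reachable outlier becomes a border point
            if labels[q] is not None:
                continue
            labels[q] = cluster
            if len(region_query(points, q, x)) >= min_pts:
                seeds.extend(region_query(points, q, x))  # q is also a center point
    return labels

pts = [1.0, 1.2, 1.4, 8.0, 8.1, 8.3, 50.0]
print(dense_cluster(pts, x=0.5, min_pts=2))  # → [0, 0, 0, 1, 1, 1, -1]
```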
Figure 2: Clusters for the sample dataset (grouped by Organism/Human Being, Protein Sequence, and (Gene Id, Attribute Id))
Table 2: Keywords and their Description
ProbeID: Unique identifier for each probe sequence
Annotation: miRNAs recognized by the probe
miRNA h: Human
miRNA m: Mouse
miRNA r: Rat
Sequence: Sequence of the probe
6. RESULTS
6.1 Datasets
In this paper, we apply clustering techniques for dimensionality reduction of biological data. The miRNA sequences can be downloaded from [42]; the statistical results for 30 samples of miRNA sequences are shown in Table 3, and these 30 sample datasets were used for evaluation. The Iris dataset [43] can be downloaded from the UCI machine learning repository [44], and the Rat Central Nervous System (CNS) dataset can be downloaded from [45]. The Iris dataset contains 50 samples with four features measured for each sample [46]. The Rat CNS dataset consists of 112 genes over 17 conditions during rat central nervous system development [47]. We apply the clustering algorithm to the miRNA datasets; each dataset provides a probe ID followed by an annotation and the sequence. For the other datasets, we display only the final number of clusters, in Table 4. The final result is three clusters, with their sample datasets shown in Figure 2.
6.2 Standard Biological Datasets
We applied our clustering algorithm to high-dimensional biological data, with approximately 500 samples stored in each dataset [48]. For evaluation, we take the datasets in XML format, which is also the format used in the tensor-based dimensionality reduction technique.
Figure 3: Rapid growth of miRNA Sequences
Table 4: Total Number of Clusters for Standard High-dimensional Biological Datasets
Dataset | Number of Samples Taken | Number of Clusters
Iris dataset | 50 | 2
miRNA dataset | 130 (approx.) | 3
Rat CNS dataset | 112 genes over 17 conditions | 6
Cell phenotype dataset | 444 samples in 41 dimensions with 4 centers | 5
7. CONCLUSION
Forming clusters for large biological datasets is a difficult task that involves many algorithms. Real-time video streams of biological datasets cannot be computed easily, as they contain highly sensitive information. In future work, we hope to compute real-time video analytics for biological datasets.
Table 3: Statistical Results of miRNA Sequences
Protein Coding Sequence ID Annotation Protein Sequence of max. length 30
EamA190 h-miR-10b_rfam7.0 PKKMLECCTRCGEYIAMHSIYWQVLDWHAF
EamA187 hmr-miR-107_rfam7.0 AKKMLCKCTRCMEYIAMKSIYWQVLDWHE
EamA185 hmr-miR-103_rfam7.0 AKOMLECCHRCGEYIAMHSIYWQVLDWHA
EamA181 hmr-let-7f_rfam7.0 PBKMLECCTRIGEYIXMHSIYWZVLDWHAF
EamA179 hmr-let-7d_rfam7.0 MAMLECCTRCGEYIAMHSIYWQVLDWHAG
EamA177 mr-miR-101b_rfam7.0 MAHMLECCTRCGEYIAMHSIYWQVLDWHA
EamA175 hmr-miR-320_rfam7.0 KHGMLECCTRCGEYIAMHSIYWQVLDWHAF
EamA168 hmr-let-7e_rfam7.0 PKKMLECCTRFTHYIAMHSIYWQVLDWHAF
EamA161 hmr-miR-28_rfam7.0 SHYTLECCTRCGEYIAMHSIYWQVLDWHAF
EamA160 hmr-miR-26b_rfam7.0 HJYTREYUTRCGEYIAMHSIYWQVLDWHAF
EamA155 hmr-miR-136_rfam7.0 AKKMLCKCTRCMEYIAMKSIYWQVLDWHEF
EamA283 mr-miR-211_rfam7.0 AKOMLECCHRCGEYIAMHSIYWQVLDWHAF
EamA282 m-miR-199b_rfam7.0 PBKMLECCTRIGEYIXMHSIYWZVLDWHAFE
EamA281 mr-miR-217_rfam7.0 MAMLECCTRCGEYIAMHSIYWQVLDWHAG
EamA280 hmr-miR-30a-3p_rfam7.0 MAHMLECCTRCGEYIAMHSIYWQVLDWHAK
EamA279 hmr-miR-29c_rfam7.0 KHGMLECCTRCGEYIAMHSIYWQVLDWHAF
EamA278 hmr-miR-98_rfam7.0 PKKMLECCTRFTHYIAMHSIYWQVLDWHAF
EamA238 hm-miR-1_rfam7.0 SHYTLECCTRCGEYIAMHSIYWQVLDWHAF
EamA270 hmr-miR-30b_rfam7.0 HJYTREYUTRCGEYIAMHSIYWQVLDWHAF
EamA159 hmr-miR-130a_rfam7.0 MAHMLECCTRCGEYIAMHSIYWQVLDWHAK
EamA163 hmr-miR-142-3p_rfam7.0 KHGMLECCTRCGEYIAMHSIYWQVLDWHAF
EamA171 hmr-miR-137_rfam7.0 PKKMLECCTRFTHYIAMHSIYWQVLDWHAF
EamA306 m-miR-201_rfam7.0 SHYTLECCTRCGEYIAMHSIYWQVLDWHAF
EamA307 m-miR-202_rfam7.0 HJYTREYUTRCGEYIAMHSIYWQVLDWHAF
EamA308 hmr-miR-206_rfam7.0 AKKMLCKCTRCMEYIAMKSIYWQVLDWHEF
EamA309 m-miR-207_rfam7.0 AKOMLECCHRCGEYIAMHSIYWQVLDWHAF
EamA310 hmr-miR-208_rfam7.0 PBKMLECCTRIGEYIXMHSIYWZVLDWHAFE
EamA247 hmr-miR-212_rfam7.0 MAMLECCTRCGEYIAMHSIYWQVLDWHAG
EamA251 hmr-miR-216_rfam7.0 MAHMLECCTRCGEYIAMHSIYWQVLDWHAK
EamA253 hmr-miR-218_rfam7.0 KHGMLECCTRCGEYIAMHSIYWQVLDWHAF
REFERENCES
[1].“Handbook For Team-Based Qualitative Research”,
Greg Guest and Kathleen M. MacQueen, Altamira Press,
United Kingdom, 2008.
[2]. Schena M, Shalon D, Davis RW, Brown PO (1995)
Quantitative monitoring of gene expression patterns with a
complementary DNA microarray. Science 270(5235):467–
470.
[3]. Shendure J, Ji H (2008) Next-generation DNA
sequencing. Nat Biotech.26(10):1135–1145.
[4]. Fan J, Han F, Liu H (2014) Challenges of big data
analysis. National Science Review 1(2):293–314.
[5]. Steinbach M, Ertöz L, Kumar V (2004) The challenges
of clustering high dimensional data. In: New directions in
statistical physics. Springer, Berlin Heidelberg. pp 273–
309.
[6].Aggarwal CC, Reddy CK (2013) Data clustering:
algorithms and applications. Data Mining Knowledge and
Discovery Series 1st. CRC Press.
[7]. Ester M, Kriegel H, Sander J, Xu X (1996) A density-
based algorithm for discovering clusters in large spatial
databases with noise. Int Conf Knowl Discov Data Min
96(34):226–231.
[8]. Zhang T, Ramakrishnan R, Livny M (1996) BIRCH: an
efficient data clustering method for very large databases. In:
Proc. of the ACM SIGMOD international conference on
management of data, vol. 1. ACM Press, USA. pp 103–114.
[9].H. Abdi and L. J. Williams,``Principal component
analysis,'' Wiley Interdiscipl. Rev., Comput. Statist., vol. 2,
no. 4, pp. 433_459,Jul./Aug. 2010.
[10]. M. Brand, ``Incremental singular value decomposition
of uncertain data with missing values,'' in Proc. 7th Eur.
Conf. Comput. Vis. (ECCV), 2002, pp. 707_720.
[11]. J. Sun, D. Tao, and C. Faloutsos, ``Beyond streams and
graphs: Dynamic tensor analysis,'' in Proc. 12th ACM
SIGKDD Int. Conf. Knowl. Discovery Data Mining (KDD),
2006, pp. 374_383.
[12]. Bellman RE (1961) Adaptive control processes: a
guided tour. Princeton University Press, New Jersey
[13]. Beyer K, Goldstein J (1999) When is nearest neighbor
meaningful? Proc 7th Int Conf Database theory. In:
DatabaseTheory –ICDT’99. Lecture Notes in Computer
Science. Springer, Berlin Heidelberg Vol. 1540. pp 217-235.
[14]. Parsons L, Haque E, Liu H (2004) Subspace clustering
for high dimensional data: a review. ACM SIGKDD Explor
News l6(1):90–105.
[15]. Babu MM (2004) Introduction to microarray data
analysis. In: Grant RP (ed). Computational genomics:
Theory and application. Horizon Press, UK. pp 225–249
[16]. Jiang D, Tang C, Zhang A (2004) Cluster analysis for
gene expression data: a survey. IEEE Trans Knowl Data
Eng 16(11):1370–1386
[17].Cheng Y, Church GM (2000) Biclustering of
expression data. Proc Int Conf Intell Syst Mol Biol 8:93–
103
[18].Jongeneel CV (2000) Searching the expressed sequence
tag (est) databases: panning for genes. Bioinformatics 1:76–
92.
[19].Liu B, Wang X, Zou Q, Dong Q, Chen Q (2013)
Protein remote homology detection by combining chous
pseudo amino acid composition and profile-based protein
representation. Mol Inf 32(9–10):775–782.
[20]. Bhadra R, Sandhya S, Abhinandan KR, Chakrabarti S,
Sowdhamini R, Srinivasan N (2006) Cascade psiblast web
server: a remote homology search tool for relating protein
domains. Nucleic Acids Res 34(Web–Server–Issue):143–
146.
[21].Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ
(1990) Basic local alignment search tool. J Mol Biol
215(3):403–410.
[22]. Altschul S, Madden T, Schffer R, Zhang J, Zhang Z,
MillerW, Lipman D (1997) Gapped blast and psi-blast: a
new generation of protein database search programs.
Nucleic Acids Res 25:3389–3402.
[23]. Zhang Z, Schwartz S,Wagner L,MillerW(2000) A
greedy algorithm for aligning dna sequences. J Comput Biol
7:203–214.
[24].Korf I, Gish W (2000) Mpblast : improved blast
performance with multiplexed queries. Bioinformatics
16:1052–1053.
[25].Kent WJ (2002) Resource BLAT-The BLAST-like
alignment tool. Genome Res.
[26].Meek C, Patel JM, Kasetty S (2003) Oasis: an online
and accurate technique for local-alignment searches on
biological sequences. In: Proceedings of very large database
endowment (PVLDB), vol 29, pp.910–921.
[27]. Baeza-Yates R, Gonnet GH (1992) A new approach to
text searching. Commun ACM 35(10):74–82.
[28]. Papapetrou P, Athitsos V, Kollios G, Gunopulos D
(2009) Reference-based alignment in large sequence
databases. Proc Very Large Database Endow (PVLDB)
2(1):205–216.
[29]. Li R, Li Y, Kristiansen K, Wang J (2008c) Soap: short
oligonucleotide alignment program. Bioinformatics
24(5):713–714.
[30].Yang X, Wang B, Li C (2008) Cost-based variable-
length-gram selection for string collections to support
approximate queries efficiently. In: Proceedings of the 2008
ACMSIGMOD international conference on Management of
data, ACM, pp 353–364.
[31]. Carlos Ordonez et al., "PCA for large data sets with parallel data summarization", Distrib Parallel Databases (2014) 32:377–403, Springer, DOI 10.1007/s10619-013-7134-6.
[32].Halko, N.H., Martinsson, P.-G., Shkolnisky, Y., Tygert,
M.: An algorithm for the principal component analysis of
large data sets. SIAM J. Sci. Comput. 33(5), 2580–2594
(2011).
[33].Hastie, T., Tibshirani, R., Friedman, J.H.: The Elements
of Statistical Learning, 1st edn. Springer, New York (2001).
[34]. Ming Ji et al., "Mining strong relevance between heterogeneous entities from unstructured biomedical data", Data Min Knowl Disc (2015) 29:976–998, Springer, DOI 10.1007/s10618-014-0396-4.
[35].SunY,Han J,YanX,Yu PS,Wu T (2011) Pathsim: meta
path-based top-k similarity search in heterogeneous
information networks. PVLDB 4(11):992–1003.
[36]. Alexios Kotsifakos et al., "DRESS: dimensionality reduction for efficient sequence search", Data Min Knowl Disc (2015) 29:1280–1311, Springer, DOI 10.1007/s10618-015-0413-2.
[37]. Liwei Kuang et al., "A Tensor-Based Approach for Big Data Representation and Dimensionality Reduction", IEEE Transactions on Emerging Topics in Computing, Vol. 2, No. 3, September 2014.
[38]. https://en.wikipedia.org/wiki/DBSCAN.
[39].http://authors.phptr.com/graves/book_examples/ch10/e
x10_2.xml.
[40].O’callaghan L, Meyerson A, Motwani R, Mishra N,
Guha S (2002) Streaming-data algorithms for
high-quality clustering. In: ICDE, p 0685
[41]. Karypis G, Han E, Kumar V (1999) Chameleon:
hierarchical clustering using dynamic modeling.
Computer 32:68–75.
[42].http://www.broadinstitute.org/cgi-
bin/cancer/datasets.cgi.
[43].Fisher, R. A. The use of multiple measurements in
taxonomic problems. Annals of Eugenics 7, 179–188
(1936).
[44].Bache,K. & Lichman, M. Uci machine learning
repository, 2013. URL http://archive. ics. uci. edu/ml
(1990).
[45].Wen, X. et al. Large-scale temporal gene expression
mapping of central nervous system development.
Proceedings of the National Academy of Sciences 95, 334–
339 (1998).
[46]. Chenyue W. Hu et al., "Progeny Clustering: A Method to Identify Biological Phenotypes", Scientific Reports, DOI: 10.1038/srep12894.
[47]. Giancarlo, R., Scaturro, D. & Utro, F. Computational
cluster validation for microarray data analysis: experimental
assessment of clest, consensus clustering, figure of merit,
gap statistics and model explorer. BMC Bioinformatics 9,
462 (2008).
[48]. Linxia Wan et al., "Automatically clustering large-scale miRNA sequences: methods and experiments", International Conference on Intelligent Biology and Medicine (ICIBM), Nashville, TN, USA, 22-24 April 2012, http://www.biomedcentral.com/1471-2164/13/S8/S15.