- The document describes a proposed DNA compression algorithm that uses a variable length lookup table (LUT) to improve compression ratios compared to existing algorithms.
- Existing algorithms only consider triplets of DNA bases and do not make use of longer repeating sequences, whereas the proposed algorithm includes a variable length LUT that can store sequences of length greater than 3 bases.
- Test results show the proposed algorithm achieves better compression than an existing 2D coding algorithm, reducing the size of test DNA sequences by up to 3% by exploiting longer repeating patterns.
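The summary gives no construction details for the variable-length LUT, so the following is only an illustrative sketch of the general idea: greedily pick frequent substrings longer than a triplet and replace them with single-byte escape codes. The selection heuristic and table size here are assumptions, not the paper's algorithm.

```python
from collections import Counter

def build_lut(seq, min_len=4, max_len=8, table_size=4):
    """Greedily pick the most frequent substrings longer than a triplet.
    A crude stand-in for the paper's variable-length LUT."""
    counts = Counter()
    for n in range(min_len, max_len + 1):
        for i in range(len(seq) - n + 1):
            counts[seq[i:i + n]] += 1
    # Score by characters saved: (occurrences - 1) * length.
    scored = sorted(counts.items(),
                    key=lambda kv: (kv[1] - 1) * len(kv[0]),
                    reverse=True)
    return [s for s, c in scored[:table_size] if c > 1]

def compress(seq, lut):
    """Replace LUT entries with one-character escape codes."""
    for code, entry in enumerate(lut):
        seq = seq.replace(entry, chr(0x80 + code))
    return seq

def decompress(seq, lut):
    for code, entry in enumerate(lut):
        seq = seq.replace(chr(0x80 + code), entry)
    return seq

dna = "ACGTACGTACGTTTGACGTACGT"
lut = build_lut(dna)
packed = compress(dna, lut)
assert decompress(packed, lut) == dna
assert len(packed) < len(dna)
```

Because LUT entries contain only the bases A, C, G, T and the escape codes lie outside that alphabet, the substitutions are unambiguous and the round trip is exact.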
Diagnosis of health conditions is a very challenging task for every human being, because quality of life is directly related to health. Classification based on data mining is one of its important applications. In this research work, we have used various classification techniques for the classification of thyroid data; CART, the best model, gives the highest accuracy of 99.47%. Feature selection plays a very important role in making a model computationally efficient and increasing its performance. This research work focuses on the Info Gain and Gain Ratio feature selection techniques to remove irrelevant features from the original data set and make the model computationally more efficient. We applied both feature selection techniques to the best model, i.e. CART. The proposed CART-Info Gain and CART-Gain Ratio give 99.47% and 99.20% accuracy with 25 and 3 features respectively.
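The abstract does not reproduce the Info Gain computation it relies on; as a reference, here is the standard entropy-based definition in a minimal sketch. The feature name and toy data below are illustrative and not taken from the thyroid data set.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def info_gain(rows, labels, feature):
    """Information gain of splitting `rows` on a discrete feature:
    H(class) minus the label-weighted entropy of each value's subset."""
    total = entropy(labels)
    n = len(rows)
    remainder = 0.0
    for value in set(r[feature] for r in rows):
        subset = [lab for r, lab in zip(rows, labels) if r[feature] == value]
        remainder += len(subset) / n * entropy(subset)
    return total - remainder

# Illustrative toy data (not the thyroid data set).
rows = [{"tsh": "high"}, {"tsh": "high"}, {"tsh": "low"}, {"tsh": "low"}]
labels = ["sick", "sick", "well", "well"]
print(info_gain(rows, labels, "tsh"))  # perfectly informative split -> 1.0
```

Gain Ratio, the second technique in the abstract, divides this quantity by the split information (the entropy of the feature's own value distribution) to penalize many-valued features.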
Privacy Preserving Clustering on Distorted Data (IOSR Journals)
- The document discusses privacy-preserving clustering on distorted data using singular value decomposition (SVD) and sparsified singular value decomposition (SSVD).
- It applies SVD and SSVD to distort a real-world dataset of 100 terrorists with 42 attributes, generating distorted datasets.
- K-means clustering is then performed on the original and distorted datasets for different numbers of clusters (k). The results show that SSVD more effectively groups the data objects into clusters compared to the original and SVD-distorted datasets, while preserving data privacy as measured by various metrics.
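In symbols, with data matrix A and SVD A = UΣVᵀ, the rank-k distorted release and its sparsified variant can be written as follows. The entry-wise threshold ε is a modeling choice; the summary does not state the exact sparsification rule.

```latex
A = U \Sigma V^{\top}, \qquad
\tilde{A}_k = U_k \Sigma_k V_k^{\top} \quad \text{(SVD distortion)},
\qquad
\bar{A}_k = \bar{U}_k \Sigma_k \bar{V}_k^{\top} \quad \text{(SSVD)},
\quad \text{where }
(\bar{U}_k)_{ij} =
\begin{cases}
(U_k)_{ij}, & \lvert (U_k)_{ij} \rvert \ge \varepsilon,\\
0, & \text{otherwise.}
\end{cases}
```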
This document summarizes a research paper that proposes a new method for identifying tea leaf species using image analysis and clustering. The method involves three main steps: 1) preprocessing tea leaf images through techniques like grayscale conversion, fuzzy denoising using the Dual Tree Discrete Wavelet Transform, and boundary enhancement; 2) extracting morphological and geometrical features from the preprocessed images; 3) clustering the tea leaf images into species with the Fuzzy C-Means algorithm based on the extracted features. The method is tested on 60 tea leaf images belonging to 6 species, and experimental results show it clusters the images by species with high accuracy and in less time than other methods.
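For reference, the Fuzzy C-Means iteration used in the clustering step alternates the standard membership and centroid updates below (m > 1 is the fuzzifier). This is the textbook form of the algorithm, not anything specific to the paper.

```latex
u_{ij} = \left( \sum_{k=1}^{c}
  \left( \frac{\lVert x_j - v_i \rVert}{\lVert x_j - v_k \rVert} \right)^{\frac{2}{m-1}}
\right)^{-1},
\qquad
v_i = \frac{\sum_{j=1}^{n} u_{ij}^{\,m}\, x_j}{\sum_{j=1}^{n} u_{ij}^{\,m}}
```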
Ayurvedic Herb Detection using Image Processing (ijtsrd)
Trees are an inseparable part of our ecosystem, and the shrinking number of herb varieties is a serious problem. To conserve herbs, prompt identification by botanists is a must; a tool is therefore needed that can identify herbs from easily accessible details. There is a growing scientific consensus that plant habitats have been altered and species are dying out at rates never seen before. The biodiversity problem concerns not just the precarious state of herb species but also the dwindling number of specialists who know them. Conservation first requires data about the various plant species, so that they can be recorded, protected and used by the next generation. Plants form the basis of Ayurveda, and in modern medicine they are a great source of income. Owing to deforestation and pollution, many medicinal herbs have almost become extinct, so there is an immediate need to identify and replant them for future generations. Leaf identification by physical inspection often leads to incorrect results. With increasing illegal trade and malpractice in the nascent drug industry on one hand and a lack of professionals on the other, an automated, robust identification and classification mechanism is needed to manage the huge amount of data and to end the malpractice. This paper aims at implementing such a system using image processing, with images of plant leaves as the basis of classification; the system returns the closest match to a query. The proposed algorithm is implemented on 5 different plant species. Dinesh Shitole | Faisal Tamboli | Krishna Motghare | Raj Kumar Raj, "Ayurvedic Herb Detection using Image Processing", published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-3, Issue-4, June 2019, URL: https://www.ijtsrd.com/papers/ijtsrd23605.pdf
Paper URL: https://www.ijtsrd.com/medicine/ayurvedic/23605/ayurvedic-herb-detection-using-image-processing/dinesh-shitole
A NOVEL DATA DICTIONARY LEARNING FOR LEAF RECOGNITION (sipij)
Automatic leaf recognition via image processing has become very important for a number of professionals, such as botanical taxonomists, environmental protectors, and foresters. Learning an over-complete leaf dictionary is an essential step in leaf image recognition, but large leaf image dimensions and large numbers of training images stand in the way of building a fast and complete leaf data dictionary. In this work, an efficient approach is applied to construct an over-complete leaf data dictionary for a set of large images based on sparse representation. In the proposed method, a new cropped-contour technique is used to crop the training images. The experiments are evaluated using the correlation between the sparse representation and the data dictionary, with a focus on computing time.
This document summarizes an article from the International Journal of Computer Engineering and Technology (IJCET) that discusses applications of data mining in medical databases. It begins by noting that large amounts of patient data have been collected in hospital information systems, and data mining techniques can be used to extract valuable hidden information from this data. The document then provides an overview of common data mining methods like neural networks, decision trees, and cluster detection that are applicable to medical data. It also discusses the process of knowledge discovery in databases and some considerations for preprocessing medical data from different sources before performing data mining analysis.
Making effective use of graphics processing units (GPUs) in computations (Oregon State University)
Graphics processing units (GPUs) are specialized computer processors used in computers and video game systems to accelerate the creation and display of images. Due to their inherent parallel structure, they also have great potential to speed up computations in many scientific and engineering applications. GPUs are attractive for their ability to perform a large number of computations in parallel at an attractive price. Many of the world's largest supercomputers use GPUs to achieve their high performance, and personal computers and laptops use them for graphics displays and image processing. This seminar will explore the use of GPUs in general, describe examples of the use of GPUs in computations, and introduce some best practices for GPU computing.
A CONCEPTUAL METADATA FRAMEWORK FOR SPATIAL DATA WAREHOUSE (IJDKP)
Metadata represents information about the data stored in a data warehouse and is a mandatory element for building an efficient one. Metadata helps in data integration, lineage, data quality, and populating transformed data into the warehouse. Spatial data warehouses are based on spatial data, mostly collected from Geographical Information Systems (GIS) and from transactional systems specific to an application or enterprise. Metadata design and deployment is the most critical phase in building a data warehouse, where it is mandatory to bring spatial information and data modeling together. In this paper, we present a holistic metadata framework that drives metadata creation for a spatial data warehouse. Theoretically, the proposed framework improves the efficiency of data access in response to frequent queries on SDWs; in other words, it decreases query response time and ensures that accurate information, including the spatial information, is fetched from the data warehouse.
In today’s competitive world, which thrives on the pursuit of profits through excellence, obtaining greater efficiency through optimum utilization of resources and better decision-making through analytical data mining methods has become the backbone of every industry. This highlights the importance of the powerful tools and concepts of data mining and warehousing, which, when applied effectively, can revolutionize the face of any industry. The role of normalization techniques has become pivotal for identifying patterns and maintaining the consistency of databases.
This document discusses GCUBE indexing, which is a method for indexing and aggregating spatial/continuous values in a data warehouse. The key challenges addressed are defining and aggregating spatial/continuous values, and efficiently representing, indexing, updating and querying data that includes both categorical and continuous dimensions. The proposed GCUBE approach maps multi-dimensional data to a linear ordering using the Hilbert curve, and then constructs an index structure on the ordered data to enable efficient query processing. Empirical results show the GCUBE indexing offers significant performance advantages over alternative approaches.
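GCUBE linearizes multi-dimensional cells with a Hilbert curve. As a simpler stand-in that shows the same idea (mapping a multi-dimensional cell address to a single 1-D key that an ordinary index can be built on), here is a Morton (Z-order) sketch; Z-order is a related space-filling curve with weaker locality than Hilbert, used here purely for illustration.

```python
def morton_key(x, y, bits=16):
    """Interleave the bits of (x, y) into a single 1-D key.
    A Z-order stand-in for GCUBE's Hilbert mapping: both turn a 2-D
    cell address into a linear key that a B-tree can index."""
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (2 * i)       # even bit positions: x
        key |= ((y >> i) & 1) << (2 * i + 1)   # odd bit positions: y
    return key

# Cells that are close in 2-D tend to receive nearby keys, so range
# scans over the linear order touch spatially coherent blocks.
cells = sorted([(0, 0), (1, 0), (0, 1), (1, 1), (2, 0)],
               key=lambda c: morton_key(*c))
print(cells)  # [(0, 0), (1, 0), (0, 1), (1, 1), (2, 0)]
```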
Evaluating the efficiency of rule techniques for file classification (eSAT Journals)
Abstract Text mining refers to the process of deriving high-quality information from text. Also known as knowledge discovery from text (KDT), it deals with the machine-supported analysis of text and is used in areas such as information retrieval, marketing, information extraction, natural language processing, and document similarity. Document similarity is one of the important techniques in text mining, and its first step is to classify files based on their category. In this research work, various classification rule techniques are used to classify computer files based on their extensions (for example, pdf, doc, ppt, xls, and so on). There are several rule-classifier algorithms, such as decision table, JRip, Ridor, DTNB, NNge, PART, OneR and ZeroR. In this work, three of them, namely decision table, DTNB and OneR, are used to classify computer files by extension. The results produced by these algorithms are analyzed using classification accuracy and error rate as performance factors. From the experimental results, DTNB proves to be more efficient than the other two techniques. Index Terms: Data mining, Text mining, Classification, Decision table, DTNB, OneR
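Of the three classifiers compared, OneR is simple enough to sketch in a few lines: it learns one rule per attribute value, mapping each value to its majority class. The extensions and categories below are illustrative, not the paper's data.

```python
from collections import Counter, defaultdict

def one_r(values, labels):
    """Learn a single-attribute rule: map each attribute value to its
    majority class (the heart of the OneR classifier)."""
    by_value = defaultdict(list)
    for v, lab in zip(values, labels):
        by_value[v].append(lab)
    return {v: Counter(labs).most_common(1)[0][0]
            for v, labs in by_value.items()}

# Illustrative training data: extension -> document category.
exts    = ["pdf", "pdf", "doc", "doc", "xls", "ppt", "pdf"]
classes = ["text", "text", "text", "text", "sheet", "slides", "image"]
rule = one_r(exts, classes)
print(rule["pdf"])  # majority class for pdf -> "text"
```

In full OneR, this rule is learned for every attribute and the attribute with the lowest training error is kept; with extension as the only attribute, the single rule above is the whole classifier.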
Information extraction from data is one of the key necessities for data analysis. Unsupervised nature of data leads to complex computational methods for analysis. This paper presents a density based spatial clustering technique integrated with one-class Support Vector Machine (SVM), a machine learning technique for noise reduction, a modified variant of DBSCAN called Noise Reduced DBSCAN (NRDBSCAN). Analysis of DBSCAN exhibits its major requirement of accurate thresholds, absence of which yields suboptimal results. However, identifying accurate threshold settings is unattainable. Noise is one of the major side-effects of the threshold gap. The proposed work reduces noise by integrating a machine learning classifier into the operation structure of DBSCAN. The Experimental results indicate high homogeneity levels in the clustering process.
Design issues and challenges of reliable and secure transmission of medical i... (eSAT Publishing House)
IJRET : International Journal of Research in Engineering and Technology is an international peer reviewed, online journal published by eSAT Publishing House for the enhancement of research in various disciplines of Engineering and Technology. The aim and scope of the journal is to provide an academic medium and an important reference for the advancement and dissemination of research results that support high-level learning, teaching and research in the fields of Engineering and Technology. We bring together Scientists, Academician, Field Engineers, Scholars and Students of related fields of Engineering and Technology
This document discusses using data mining techniques like machine learning to analyze air quality data and generate models for predicting pollution levels. It summarizes applying decision trees and neural networks to data on pollutants and weather in Cambridge, UK. The models showed air temperature is a dominant predictor of ozone levels. While data mining provides an empirical approach and short-term predictions, physical models are still needed to fully understand urban air quality given small-scale variations.
An Effective Tea Leaf Recognition Algorithm for Plant Classification Using Ra... (IJMER)
A leaf is an organ of a vascular plant, as identified in botanical terms, and in particular in plant morphology. Naturally a leaf is a thin, flattened organ borne above ground, used mainly for photosynthesis. Recognition of plants has become an active area of research, as most plant species are at risk of extinction. Many leaves cannot be recognized easily, since some are not flat (e.g. succulent leaves and conifers), some do not grow above ground (e.g. bulb scales), and some do not perform photosynthesis (e.g. cataphylls, spines, and cotyledons). In this paper, we focus mainly on tea leaves, identifying the leaf type to improve tea leaf classification. Tea leaf images are loaded into the system from digital cameras or scanners. The proposed approach processes the loaded image in three phases: preprocessing, feature extraction and classification. In the preprocessing phase, the tea leaf images are denoised by fuzzy denoising using the Dual Tree Discrete Wavelet Transform (DT-DWT) to remove noisy features, with boundary enhancement to recover the leaf shape accurately. In the feature extraction phase, Digital Morphological Features (DMFs) are derived to improve classification accuracy. A Radial Basis Function (RBF) network is used for efficient classification, trained on 60 tea leaves to classify them into 6 types. Experimental results show that the proposed method classifies the tea leaves with more accuracy in less time, thus achieving more accurate retrieval of the leaf type.
Clustering the results of a search helps the user get an overview of the information returned. In this paper, we treat the clustering task as cataloguing the search results; by catalogue we mean a structured label list that helps the user understand the labels and search results. Cluster labelling is crucial, because meaningless or confusing labels may mislead users into checking the wrong clusters for a query and losing extra time. Additionally, labels should accurately reflect the contents of the documents within the cluster. To label clusters effectively, a new cluster labelling method is introduced, with emphasis on producing comprehensible and accurate cluster labels in addition to discovering the document clusters. We also present a new metric for assessing the success of cluster labelling. We adopt a comparative evaluation strategy to derive the relative performance of the proposed method with respect to two prominent search result clustering methods, Suffix Tree Clustering and Lingo, performing the experiments on the publicly available datasets Ambient and ODP-239.
New proximity estimate for incremental update of non uniformly distributed cl... (IJDKP)
Conventional clustering algorithms mine static databases and generate a set of patterns in the form of clusters, but many real-life databases keep growing incrementally. For such dynamic databases, the patterns extracted from the original database become obsolete; conventional clustering algorithms are unsuitable because they cannot modify the clustering results in line with recent updates. In this paper, the author proposes a new incremental clustering algorithm called CFICA (Cluster Feature-based Incremental Clustering Approach for numerical data) to handle numerical data, and suggests a new proximity metric called Inverse Proximity Estimate (IPE), which considers the proximity of a data point to a cluster representative as well as its proximity to the farthest point in its vicinity. CFICA uses the proposed proximity metric to determine the membership of a data point in a cluster.
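The exact IPE formula is not given in the abstract; one literal reading, combining the distance to the cluster representative with the distance to the farthest current member, can be sketched as follows. The additive combination is an assumption for illustration only; the real CFICA rule may differ.

```python
from math import dist  # Euclidean distance (Python 3.8+)

def inverse_proximity_estimate(point, centroid, members):
    """Hedged sketch of IPE: distance to the representative plus distance
    to the farthest point of the cluster. Only these two ingredients are
    stated in the abstract; the combination rule here is assumed."""
    farthest = max(members, key=lambda m: dist(point, m))
    return dist(point, centroid) + dist(point, farthest)

def assign(point, clusters):
    """Put the point into the cluster minimizing the estimate."""
    return min(clusters,
               key=lambda c: inverse_proximity_estimate(
                   point, c["centroid"], c["members"]))

clusters = [
    {"centroid": (0.0, 0.0), "members": [(0.0, 0.0), (1.0, 0.0)]},
    {"centroid": (10.0, 0.0), "members": [(9.0, 0.0), (11.0, 0.0)]},
]
best = assign((1.5, 0.0), clusters)
print(best["centroid"])  # (0.0, 0.0)
```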
This document describes a proposed Optimal Frequent Patterns System (OFPS) that uses a genetic algorithm to discover optimal frequent patterns from transactional databases more efficiently. The OFPS is a three-fold system that first prepares data through cleaning, integration and transformation. It then constructs a Frequent Pattern Tree to discover frequent patterns. Finally, it applies a genetic algorithm to generate optimal frequent patterns, simulating biological evolution to find the best solutions. The proposed system aims to overcome limitations of conventional association rule mining approaches and efficiently discover optimal patterns from large, changing datasets.
This document summarizes computational analysis methods for determining expectation values commonly used in bioinformatics databases. It discusses tools like BLAST, FASTA, and databases like NCBI that allow querying and analyzing sequences. The expectation value provides the probability that a match could occur by chance, with lower values indicating higher quality matches. These tools and databases facilitate customizable extraction of data from sequences to enable further analysis and knowledge discovery in bioinformatics.
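The expectation value described above has the standard Karlin-Altschul form E = K·m·n·e^(-λS), and computing it is a one-liner. The K and λ values below are placeholders for illustration; BLAST derives the real ones from the scoring system in use.

```python
from math import exp

def expect_value(score, m, n, K=0.041, lam=0.267):
    """Karlin-Altschul expectation: the number of alignments scoring at
    least `score` expected by chance between a query of length m and a
    database of total length n. K and lam are illustrative placeholders."""
    return K * m * n * exp(-lam * score)

e_weak   = expect_value(score=30, m=300, n=1_000_000)
e_strong = expect_value(score=80, m=300, n=1_000_000)
# Higher scores give (much) smaller expectation values, i.e. matches
# far less likely to have occurred by chance.
assert e_strong < e_weak
```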
This document presents a method for leaf identification using feature extraction and an artificial neural network. Leaf images are preprocessed, segmented, and features like eccentricity, aspect ratio, area, and perimeter are extracted. These features are used as inputs to train an artificial neural network classifier. The neural network is tested on leaf images and achieves 98.8% accuracy at identifying leaves using a minimum of seven input features. This approach provides an effective and computationally efficient way to identify plant leaves based on images.
Herbal plant recognition using deep convolutional neural network (journalBEEI)
This paper investigates the application of deep convolutional neural network (CNN) for herbal plant recognition through leaf identification. Traditional plant identification is often time-consuming due to varieties as well as similarities possessed within the plant species. This study shows that a deep CNN model can be created and enhanced using multiple parameters to boost recognition accuracy performance. This study also shows the significant effects of the multi-layer model on small sample sizes to achieve reasonable performance. Furthermore, data augmentation provides more significant benefits on the overall performance. Simple augmentations such as resize, flip and rotate will increase accuracy significantly by creating invariance and preventing the model from learning irrelevant features. A new dataset of the leaves of various herbal plants found in Malaysia has been constructed and the experimental results achieved 99% accuracy.
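The simple augmentations credited above (resize, flip, rotate) amount to cheap pixel-index transforms. A minimal sketch on an image stored as nested lists; real pipelines would apply the same operations with an image library, but the logic is identical.

```python
def hflip(image):
    """Horizontal flip of an image stored as rows of pixel values,
    one of the simple augmentations the paper credits with creating
    invariance in the CNN."""
    return [row[::-1] for row in image]

def rot90(image):
    """Rotate the image 90 degrees counter-clockwise."""
    return [list(row) for row in zip(*image)][::-1]

img = [[1, 2],
       [3, 4]]
assert hflip(img) == [[2, 1], [4, 3]]
assert rot90(img) == [[2, 4], [1, 3]]
assert hflip(hflip(img)) == img  # flipping twice is the identity
```

Each transform yields a new labeled training example from an existing one, which is why augmentation helps most on the small sample sizes the paper discusses.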
USING ONTOLOGIES TO IMPROVE DOCUMENT CLASSIFICATION WITH TRANSDUCTIVE SUPPORT... (IJDKP)
Many applications of automatic document classification require learning accurately from little training data. Semi-supervised classification uses both labeled and unlabeled data for training; this technique has been shown to be effective in some cases, but the use of unlabeled data is not always beneficial. On the other hand, the emergence of web technologies has given rise to the collaborative development of ontologies. In this paper, we propose the use of ontologies to improve the accuracy and efficiency of semi-supervised document classification. We use support vector machines, one of the most effective algorithms studied for text; our algorithm enhances the performance of transductive support vector machines through the use of ontologies. We report experimental results applying the algorithm to three different datasets. Our experiments show an accuracy increase of 4% on average, and up to 20%, in comparison with the traditional semi-supervised model.
Welcome to International Journal of Engineering Research and Development (IJERD) (IJERD Editor)
This document summarizes a research paper that uses genetic algorithms for data mining of mass spectrometry data. The paper applies genetic algorithms to the data mining step of the KDD (Knowledge Discovery in Databases) process. Specifically, it uses genetic algorithms to extract optimal features or peaks from mass spectrometry data that can distinguish cancer patients from control patients. The genetic algorithm searches for discriminative features, which are ion intensity levels at specific mass-charge values. Over generations of the genetic algorithm, the paper finds significant patterns of masses that can differentiate between normal and cancer patients.
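The search for discriminative peaks can be sketched as a standard bit-mask genetic algorithm: each individual is a mask over the candidate features, and selection, crossover and mutation evolve the masks. The fitness function, data and parameters below are toy stand-ins; the paper scores masks by how well the selected ion intensities separate cancer from control spectra.

```python
import random

random.seed(0)

N_FEATURES = 20
INFORMATIVE = {2, 5, 11}  # toy "discriminative peaks" the GA should find

def fitness(mask):
    """Toy fitness: reward selecting informative peaks, penalize extras.
    A stand-in for the paper's class-separation score."""
    hits = sum(mask[i] for i in INFORMATIVE)
    extras = sum(mask) - hits
    return 3 * hits - extras

def crossover(a, b):
    cut = random.randrange(1, N_FEATURES)
    return a[:cut] + b[cut:]

def mutate(mask, rate=0.05):
    return [bit ^ (random.random() < rate) for bit in mask]

pop = [[random.randint(0, 1) for _ in range(N_FEATURES)] for _ in range(30)]
for _ in range(40):                       # generations
    pop.sort(key=fitness, reverse=True)
    parents = pop[:10]                    # elitist truncation selection
    pop = parents + [mutate(crossover(random.choice(parents),
                                      random.choice(parents)))
                     for _ in range(20)]

best = max(pop, key=fitness)
print([i for i, bit in enumerate(best) if bit])  # indices of selected peaks
```

Because the ten best masks survive each generation unchanged, the best fitness is non-decreasing, mirroring how the paper's significant mass patterns accumulate over generations.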
Privacy Preservation and Restoration of Data Using Unrealized Data Sets (IJERA Editor)
In today’s world, advances in hardware technology have increased the capability to store and record personal data about consumers and individuals. Data mining extracts knowledge that successfully supports a variety of areas such as marketing, medical diagnosis, weather forecasting and national security. Still, it remains a challenge to extract certain kinds of data without violating the data owners’ privacy, and as data mining becomes more pervasive, such privacy concerns are increasing. This gives birth to a new category of data mining method called privacy preserving data mining (PPDM), whose aim is to protect the sensitive information within large data sets. The privacy preservation of a data set can be expressed in the form of a decision tree. This paper proposes privacy preservation based on data set complement algorithms which store the information of the real data set, so that the private data are safe from unauthorized parties; if some portion of the data is lost, the original data set can be recreated from the unrealized data set and the perturbed data set.
A new revisited compression technique through innovative partition group binaryIAEME Publication
This document discusses a new compression technique called Partition Group Binary Compression (PGBC) for compressing DNA sequences. It begins with background on DNA sequencing and challenges with compressing large DNA databases. It then reviews existing compression algorithms like Huffbit Compress and Genbit Compress that use 2-bit encoding but do not perform well on sequences with few repeats. The proposed PGBC technique aims to achieve better compression ratios than existing methods even for sequences with little repetition. The paper is organized into sections on general compression algorithms, related existing algorithms, a description of how PGBC analysis improves on them, a comparative study on sample sequences, and conclusions.
The classification of different types of tumors is of great importance in cancer diagnosis and its drug discovery. Cancer classification via gene expression data is known to contain the keys for solving the fundamental problems relating to the diagnosis of cancer. The recent advent of DNA microarray technology has made rapid monitoring of thousands of gene expressions possible. With this large quantity of gene expression data, scientists have started to explore the opportunities of classification of cancer using a gene expression dataset. To gain a profound understanding of the classification of cancer, it is necessary to take a closer look at the problem, the proposed solutions, and the related issues altogether. In this research thesis, I present a new way for Leukemia classification using the latest AI technique of Deep learning using Google TensorFlow on gene expression data.
Design and development of learning model for compression and processing of d...IJECEIAES
This document describes a deep learning model for efficient compression and processing of genomic DNA sequence data. It proposes using computational optimization techniques across three stages of genomic data compression: extraction, storage, and retrieval of data. The model uses gene network embedding to combine gene interaction and expression data to learn lower dimensional representations. It predicts gene functions, reconstructs gene ontologies, and predicts gene interactions. Evaluation shows the model achieves an average precision score of 0.820 for gene interaction prediction on test data.
A CONCEPTUAL METADATA FRAMEWORK FOR SPATIAL DATA WAREHOUSEIJDKP
Metadata represents the information about data to be stored in Data Warehouses. It is a mandatory
element for building an efficient Data Warehouse. Metadata helps in data integration,
lineage, data quality and populating transformed data into data warehouse. Spatial data warehouses are
based on spatial data mostly collected from Geographical Information Systems (GIS) and the transactional
systems that are specific to an application or enterprise. Metadata design and deployment is the most
critical phase in building of data warehouse where it is mandatory to bring the spatial information and
data modeling together. In this paper, we present a holistic metadata framework that drives metadata
creation for spatial data warehouse. Theoretically, the proposed metadata framework improves the
efficiency of accessing data in response to frequent queries on SDWs. In other words, the proposed
framework decreases the response time of the query, and accurate information is fetched from the Data
Warehouse, including the spatial information.
In today’s competitive world that thrives on the thirst for profits through excellence, obtaining greater efficiency by optimum utilization of resources and better decision-making through analytical data mining methods has become the backbone of every industry. This highlights the importance of the powerful tools and concepts of Data Mining and Warehousing, which when applied effectively can revolutionize the face of any industry. The role of normalization techniques has become extremely pivotal for identifying patterns and maintaining the consistency of databases.
International Journal of Engineering Research and Applications (IJERA) is a team of researchers, not a publication service or private publication running journals for monetary benefit; we are an association of scientists and academia who focus only on supporting authors who want to publish their work. The articles published in our journal can be accessed online, and all articles are archived for real-time access.
Our journal system primarily aims to bring out the research talent and the work done by scientists, academia, engineers, practitioners, scholars, and post graduate students of engineering and science. This journal aims to cover scientific research in a broader sense, not a niche area of research, facilitating researchers from various verticals to publish their papers. It also aims to provide a platform for researchers to publish in a shorter span of time, enabling them to continue further research. All articles published are freely available to scientific researchers in government agencies, educators and the general public. We are taking serious efforts to promote our journal across the globe in various ways, and we are sure that our journal will act as a scientific platform for all researchers to publish their works online.
This document discusses GCUBE indexing, which is a method for indexing and aggregating spatial/continuous values in a data warehouse. The key challenges addressed are defining and aggregating spatial/continuous values, and efficiently representing, indexing, updating and querying data that includes both categorical and continuous dimensions. The proposed GCUBE approach maps multi-dimensional data to a linear ordering using the Hilbert curve, and then constructs an index structure on the ordered data to enable efficient query processing. Empirical results show the GCUBE indexing offers significant performance advantages over alternative approaches.
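The curve-based linearization at the heart of GCUBE can be illustrated with a simpler space-filling curve. The sketch below uses Morton (Z-order) encoding rather than the Hilbert curve the paper uses; it shows the same idea, mapping nearby multi-dimensional points to nearby positions in a linear order, in a few lines:

```python
def morton_encode(x: int, y: int, bits: int = 16) -> int:
    """Interleave the bits of x and y (x in even positions, y in odd)."""
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (2 * i)
        code |= ((y >> i) & 1) << (2 * i + 1)
    return code

# Sorting 2-D points by their Morton code places spatially close
# points near each other in the linear order, which is what makes
# an index built over the codes effective for spatial queries.
points = [(3, 5), (0, 0), (1, 1), (2, 2)]
points.sort(key=lambda p: morton_encode(*p))
```

A real implementation would use the Hilbert curve, which preserves locality better than Z-order because it avoids the large jumps at quadrant boundaries.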
Evaluating the efficiency of rule techniques for file classificationeSAT Journals
Abstract Text mining refers to the process of deriving high-quality information from text. It is also known as knowledge discovery from text (KDT) and deals with the machine-supported analysis of text. It is used in various areas such as information retrieval, marketing, information extraction, natural language processing, document similarity, and so on. Document similarity is one of the important techniques in text mining. In document similarity, the first and foremost step is to classify the files based on their category. In this research work, various classification rule techniques are used to classify computer files based on their extensions. For example, the extension of a computer file may be pdf, doc, ppt, xls, and so on. There are several algorithms for rule classifiers such as decision table, JRip, Ridor, DTNB, NNge, PART, OneR and ZeroR. In this research work, three classification algorithms, namely decision table, DTNB and OneR classifiers, are used for performing classification of computer files based on their extension. The results produced by these algorithms are analyzed using the performance factors classification accuracy and error rate. From the experimental results, DTNB proves to be more efficient than the other two techniques. Index Terms: Data mining, Text mining, Classification, Decision table, DTNB, OneR
Information extraction from data is one of the key necessities for data analysis. The unsupervised nature of data leads to complex computational methods for analysis. This paper presents a density-based spatial clustering technique integrated with a one-class Support Vector Machine (SVM), a machine learning technique, for noise reduction: a modified variant of DBSCAN called Noise Reduced DBSCAN (NRDBSCAN). Analysis of DBSCAN exhibits its major requirement of accurate thresholds, the absence of which yields suboptimal results. However, identifying accurate threshold settings is unattainable. Noise is one of the major side-effects of the threshold gap. The proposed work reduces noise by integrating a machine learning classifier into the operation structure of DBSCAN. The experimental results indicate high homogeneity levels in the clustering process.
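For readers unfamiliar with the density thresholds the abstract refers to, a minimal pure-Python DBSCAN sketch is shown below; `eps` and `min_pts` are the thresholds in question, and points labelled -1 are the noise that NRDBSCAN's one-class SVM stage (not reproduced here) would re-examine:

```python
import math

def dbscan(points, eps=1.0, min_pts=3):
    """Minimal DBSCAN: returns one label per point; -1 marks noise."""
    def neighbours(i):
        return [j for j, q in enumerate(points)
                if math.dist(points[i], q) <= eps]

    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = -1          # provisional noise
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point, rescued from noise
                continue
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nbrs = neighbours(j)
            if len(nbrs) >= min_pts:  # core point: expand the cluster
                queue.extend(nbrs)
    return labels

# Four dense points form one cluster; the isolated point is noise.
pts = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
labels = dbscan(pts, eps=1.5, min_pts=3)
```

With a poorly chosen `eps` the same data would scatter into noise, which is the sensitivity the paper's classifier stage is designed to mitigate.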
Design issues and challenges of reliable and secure transmission of medical i...eSAT Publishing House
IJRET : International Journal of Research in Engineering and Technology is an international peer reviewed, online journal published by eSAT Publishing House for the enhancement of research in various disciplines of Engineering and Technology. The aim and scope of the journal is to provide an academic medium and an important reference for the advancement and dissemination of research results that support high-level learning, teaching and research in the fields of Engineering and Technology. We bring together Scientists, Academician, Field Engineers, Scholars and Students of related fields of Engineering and Technology
This document discusses using data mining techniques like machine learning to analyze air quality data and generate models for predicting pollution levels. It summarizes applying decision trees and neural networks to data on pollutants and weather in Cambridge, UK. The models showed air temperature is a dominant predictor of ozone levels. While data mining provides an empirical approach and short-term predictions, physical models are still needed to fully understand urban air quality given small-scale variations.
An Effective Tea Leaf Recognition Algorithm for Plant Classification Using Ra...IJMER
A leaf is an organ of a vascular plant, as identified in botanical terms, and in particular in plant morphology. Naturally a leaf is a thin, flattened organ borne above ground, and it is mainly used for photosynthesis. Recognition of plants has become an active area of research as most plant species are at risk of extinction. Most leaves cannot be recognized easily since some are not flat (e.g. succulent leaves and conifers), some do not grow above ground (e.g. bulb scales), and some do not undergo photosynthetic function (e.g. cataphylls, spines, and cotyledons). In this paper, we mainly focus on tea leaves to identify the leaf type for improving tea leaf classification. Tea leaf images are loaded from digital cameras or scanners into the system. This proposed approach consists of three phases, namely preprocessing, feature extraction and classification, to process the loaded image. The tea leaf images can be identified accurately in the preprocessing phase by fuzzy denoising using the Dual Tree Discrete Wavelet Transform (DT-DWT), in order to remove the noisy features, and boundary enhancement to obtain the shape of the leaf accurately. In the feature extraction phase, Digital Morphological Features (DMFs) are derived to improve the classification accuracy. A Radial Basis Function (RBF) network is used for efficient classification. The RBF is trained on 60 tea leaves to classify them into 6 types. Experimental results proved that the proposed method classifies the tea leaves with more accuracy in less time. Thus, the proposed method achieves more accuracy in retrieving the leaf type.
Clustering the results of a search helps the user to overview the information returned. In this paper, we
look upon the clustering task as cataloguing the search results. By catalogue we mean a structured label
list that can help the user to realize the labels and search results. Labelling Cluster is crucial because
meaningless or confusing labels may mislead users to check wrong clusters for the query and lose extra
time. Additionally, labels should reflect the contents of documents within the cluster accurately. To be able
to label clusters effectively, a new cluster labelling method is introduced. More emphasis was given to
produce comprehensible and accurate cluster labels in addition to the discovery of document clusters. We
also present a new metric that is employed to assess the success of cluster labelling. We adopt a comparative
evaluation strategy to derive the relative performance of the proposed method with respect to the two
prominent search result clustering methods: Suffix Tree Clustering and Lingo.
We perform the experiments using the publicly available datasets Ambient and ODP-239.
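The abstract does not spell out the labelling method itself; as a hypothetical baseline of the kind such methods are compared against, a cluster can be labelled by its most frequent non-stopword terms:

```python
from collections import Counter

STOPWORDS = {"the", "a", "of", "and", "to", "in", "for"}

def label_cluster(documents, n_terms=3):
    """Pick the n_terms most frequent non-stopword words as the label."""
    counts = Counter(
        word
        for doc in documents
        for word in doc.lower().split()
        if word not in STOPWORDS
    )
    return [word for word, _ in counts.most_common(n_terms)]

# Two search results grouped into one cluster; "car" dominates.
docs = ["jaguar speed of the car",
        "the car engine and the car wheel"]
label = label_cluster(docs)
```

Frequency-based labels are exactly the kind that can mislead users when a frequent term is not discriminative, which motivates both the proposed labelling method and the new evaluation metric.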
New proximity estimate for incremental update of non uniformly distributed cl...IJDKP
The conventional clustering algorithms mine static databases and generate a set of patterns in the form of
clusters. Many real life databases keep growing incrementally. For such dynamic databases, the patterns
extracted from the original database become obsolete. Thus the conventional clustering algorithms are not
suitable for incremental databases due to lack of capability to modify the clustering results in accordance
with recent updates. In this paper, the author proposes a new incremental clustering algorithm called
CFICA (Cluster Feature-Based Incremental Clustering Approach for numerical data) to handle numerical
data and suggests a new proximity metric called Inverse Proximity Estimate (IPE) which considers the
proximity of a data point to a cluster representative as well as its proximity to a farthest point in its vicinity.
CFICA makes use of the proposed proximity metric to determine the membership of a data point into a
cluster.
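The abstract names the two distances IPE combines without giving the formula; a hedged sketch, assuming a simple additive combination, might look like:

```python
import math

def inverse_proximity_estimate(point, representative, vicinity):
    """Sketch of IPE: combine distance to the cluster representative
    with distance to the farthest point in the point's vicinity.
    The additive combination is an assumption for illustration;
    the paper defines the exact formula."""
    d_rep = math.dist(point, representative)
    d_far = max(math.dist(point, q) for q in vicinity)
    return d_rep + d_far

def assign(point, clusters):
    """Assign the point to the cluster minimising the estimate."""
    return min(clusters,
               key=lambda c: inverse_proximity_estimate(
                   point, c["rep"], c["members"]))

clusters = [{"rep": (0.0, 0.0), "members": [(0, 0), (1, 0)]},
            {"rep": (10.0, 0.0), "members": [(10, 0), (9, 0)]}]
best = assign((2.0, 0.0), clusters)
```

Using the farthest point in the vicinity penalises membership in sprawling clusters, which plain centroid distance cannot do for non-uniformly distributed data.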
This document describes a proposed Optimal Frequent Patterns System (OFPS) that uses a genetic algorithm to discover optimal frequent patterns from transactional databases more efficiently. The OFPS is a three-fold system that first prepares data through cleaning, integration and transformation. It then constructs a Frequent Pattern Tree to discover frequent patterns. Finally, it applies a genetic algorithm to generate optimal frequent patterns, simulating biological evolution to find the best solutions. The proposed system aims to overcome limitations of conventional association rule mining approaches and efficiently discover optimal patterns from large, changing datasets.
This document summarizes computational analysis methods for determining expectation values commonly used in bioinformatics databases. It discusses tools like BLAST, FASTA, and databases like NCBI that allow querying and analyzing sequences. The expectation value provides the probability that a match could occur by chance, with lower values indicating higher quality matches. These tools and databases facilitate customizable extraction of data from sequences to enable further analysis and knowledge discovery in bioinformatics.
This document presents a method for leaf identification using feature extraction and an artificial neural network. Leaf images are preprocessed, segmented, and features like eccentricity, aspect ratio, area, and perimeter are extracted. These features are used as inputs to train an artificial neural network classifier. The neural network is tested on leaf images and achieves 98.8% accuracy at identifying leaves using a minimum of seven input features. This approach provides an effective and computationally efficient way to identify plant leaves based on images.
Herbal plant recognition using deep convolutional neural networkjournalBEEI
This paper investigates the application of deep convolutional neural network (CNN) for herbal plant recognition through leaf identification. Traditional plant identification is often time-consuming due to varieties as well as similarities possessed within the plant species. This study shows that a deep CNN model can be created and enhanced using multiple parameters to boost recognition accuracy performance. This study also shows the significant effects of the multi-layer model on small sample sizes to achieve reasonable performance. Furthermore, data augmentation provides more significant benefits on the overall performance. Simple augmentations such as resize, flip and rotate will increase accuracy significantly by creating invariance and preventing the model from learning irrelevant features. A new dataset of the leaves of various herbal plants found in Malaysia has been constructed and the experimental results achieved 99% accuracy.
USING ONTOLOGIES TO IMPROVE DOCUMENT CLASSIFICATION WITH TRANSDUCTIVE SUPPORT...IJDKP
Many applications of automatic document classification require learning accurately with little training
data. The semi-supervised classification technique uses labeled and unlabeled data for training. This
technique has shown to be effective in some cases; however, the use of unlabeled data is not always
beneficial.
On the other hand, the emergence of web technologies has originated the collaborative development of
ontologies. In this paper, we propose the use of ontologies in order to improve the accuracy and efficiency
of the semi-supervised document classification.
We used support vector machines, which is one of the most effective algorithms that have been studied for
text. Our algorithm enhances the performance of transductive support vector machines through the use of
ontologies. We report experimental results applying our algorithm to three different datasets. Our
experiments show an increment of accuracy of 4% on average and up to 20%, in comparison with the
traditional semi-supervised model.
Welcome to International Journal of Engineering Research and Development (IJERD)IJERD Editor
This document summarizes a research paper that uses genetic algorithms for data mining of mass spectrometry data. The paper applies genetic algorithms to the data mining step of the KDD (Knowledge Discovery in Databases) process. Specifically, it uses genetic algorithms to extract optimal features or peaks from mass spectrometry data that can distinguish cancer patients from control patients. The genetic algorithm searches for discriminative features, which are ion intensity levels at specific mass-charge values. Over generations of the genetic algorithm, the paper finds significant patterns of masses that can differentiate between normal and cancer patients.
Privacy Preservation and Restoration of Data Using Unrealized Data SetsIJERA Editor
In today’s world, advances in hardware technology have increased the capability to store and record personal data about consumers and individuals. Data mining extracts knowledge to successfully support a variety of areas such as marketing, medical diagnosis, weather forecasting, national security, etc. Still, there is a challenge to extract certain kinds of data without violating the data owners’ privacy. As data mining becomes more pervasive, such privacy concerns are increasing. This gives birth to a new category of data mining method called privacy preserving data mining (PPDM) algorithms. The aim of these algorithms is to protect the sensitive information in a large data set. The privacy preservation of a data set can be expressed in the form of a decision tree. This paper proposes a privacy preservation approach based on data set complement algorithms, which store the information of the real data set so that the private data are safe from unauthorized parties; if some portion of the data is lost, the original data set can be recreated from the unrealized data set and the perturbed data set.
A new revisited compression technique through innovative partition group binaryIAEME Publication
This document discusses a new compression technique called Partition Group Binary Compression (PGBC) for compressing DNA sequences. It begins with background on DNA sequencing and challenges with compressing large DNA databases. It then reviews existing compression algorithms like Huffbit Compress and Genbit Compress that use 2-bit encoding but do not perform well on sequences with few repeats. The proposed PGBC technique aims to achieve better compression ratios than existing methods even for sequences with little repetition. The paper is organized into sections on general compression algorithms, related existing algorithms, a description of how PGBC analysis improves on them, a comparative study on sample sequences, and conclusions.
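The 2-bit encoding used by Huffbit Compress and Genbit Compress can be sketched directly (the particular base-to-bits mapping below is an assumption):

```python
BASE2BITS = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}
BITS2BASE = {v: k for k, v in BASE2BITS.items()}

def pack(seq: str) -> bytes:
    """Pack 4 bases per byte: 2 bits each, 4x smaller than ASCII."""
    out = bytearray()
    for i in range(0, len(seq), 4):
        chunk = seq[i:i + 4]
        byte = 0
        for base in chunk:
            byte = (byte << 2) | BASE2BITS[base]
        byte <<= 2 * (4 - len(chunk))   # left-align a partial last group
        out.append(byte)
    return bytes(out)

def unpack(data: bytes, length: int) -> str:
    """Recover the sequence; length trims padding from the last byte."""
    bases = []
    for byte in data:
        bases.extend(BITS2BASE[(byte >> s) & 0b11] for s in (6, 4, 2, 0))
    return "".join(bases)[:length]
```

Because each base occupies two bits instead of an eight-bit ASCII character, the packed form is four times smaller regardless of repeats, which is why such schemes are the baseline that repeat-aware techniques like PGBC aim to beat.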
The classification of different types of tumors is of great importance in cancer diagnosis and its drug discovery. Cancer classification via gene expression data is known to contain the keys for solving the fundamental problems relating to the diagnosis of cancer. The recent advent of DNA microarray technology has made rapid monitoring of thousands of gene expressions possible. With this large quantity of gene expression data, scientists have started to explore the opportunities of classification of cancer using a gene expression dataset. To gain a profound understanding of the classification of cancer, it is necessary to take a closer look at the problem, the proposed solutions, and the related issues altogether. In this research thesis, I present a new way for Leukemia classification using the latest AI technique of Deep learning using Google TensorFlow on gene expression data.
Design and development of learning model for compression and processing of d...IJECEIAES
This document describes a deep learning model for efficient compression and processing of genomic DNA sequence data. It proposes using computational optimization techniques across three stages of genomic data compression: extraction, storage, and retrieval of data. The model uses gene network embedding to combine gene interaction and expression data to learn lower dimensional representations. It predicts gene functions, reconstructs gene ontologies, and predicts gene interactions. Evaluation shows the model achieves an average precision score of 0.820 for gene interaction prediction on test data.
This document discusses a method for splitting large medical data sets based on the normal distribution in a cloud computing environment. The key points are:
- Large medical and e-commerce data sets present challenges for data mining due to their size and generation velocity. Existing splitting methods like UV decomposition do not scale well for very large data sets.
- The proposed method splits large data sets into smaller subsets based on identifying groups of data that approximate a normal distribution. These normal distribution (ND) subsets can then be analyzed individually while still representing the overall data set.
- The ND subsets are well-suited for distributed processing in a cloud computing environment, as each subset can be analyzed locally and in parallel. Experimental results show
IRJET- Gene Mutation Data using Multiplicative Adaptive Algorithm and Gene On...IRJET Journal
This document presents a methodology for analyzing gene mutation data using ontologies and association rule mining. It aims to develop a common knowledge base for genomic and proteomic analysis by integrating multiple data sources. The methodology involves using k-nearest neighbors algorithm to find similar genes, an iterative multiplicative updating algorithm to solve optimization problems, and SNCoNMF to identify co-regulatory modules between genes, microRNAs and transcription factors. The results are represented using a Bayesian rose tree for efficient visualization of associations between genetic components and diseases.
Data reduction techniques to analyze nsl kdd datasetIAEME Publication
The document discusses applying data reduction techniques to the NSL-KDD dataset to analyze network intrusion detection data. It describes how data reduction can minimize data size without losing important information. The document applies several data reduction algorithms to the NSL-KDD dataset and uses the output to train and test two classification algorithms, J48 and Naive Bayes. The results are compared based on accuracy, specificity, and sensitivity to determine which data reduction technique improves classification performance the most. The goal is to find an effective and efficient way to analyze large network intrusion detection datasets using data reduction and machine learning.
Abstract DNA cryptography, the new era of cryptography, has enhanced cryptography in terms of time complexity as well as capacity. It uses DNA strands to hide information. The repeated sequences of DNA make it highly difficult for an unintended authority to get the message. The level of security can be increased by using more than one cover medium. This paper discusses a technique that uses two cover media to hide the message: one is DNA and the other is an image. The message is encrypted and converted to fake DNA, and then this fake DNA is hidden in the image. Keywords: DNA, cryptography, DNA cryptography
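The byte-to-fake-DNA conversion step can be sketched as follows; the base mapping is an assumption, and the paper's encryption and image-embedding stages are not shown:

```python
BASES = "ACGT"

def to_fake_dna(message: bytes) -> str:
    """Encode each byte as 4 bases (2 bits per base, MSB first)."""
    return "".join(
        BASES[(b >> s) & 0b11] for b in message for s in (6, 4, 2, 0))

def from_fake_dna(dna: str) -> bytes:
    """Invert the mapping: every 4 bases reassemble into one byte."""
    vals = [BASES.index(c) for c in dna]
    return bytes(
        (vals[i] << 6) | (vals[i + 1] << 4) | (vals[i + 2] << 2) | vals[i + 3]
        for i in range(0, len(vals), 4))

dna = to_fake_dna(b"hi")
```

In the full scheme the message would be encrypted before this step, and the resulting fake DNA string would then be embedded in the image (for example via least-significant-bit steganography), giving the two layers of cover the abstract describes.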
A study and survey on various progressive duplicate detection mechanismseSAT Journals
Abstract One of the serious problems faced in several applications with personal details management, customer affiliation management, data mining, etc. is duplicate detection. This survey deals with the various duplicate record detection techniques in both small and large datasets. To detect duplicates with less execution time and without disturbing dataset quality, methods like Progressive Blocking and Progressive Neighborhood are used. The Progressive Sorted Neighborhood Method, also called PSNM, is used in this model for detecting duplicates in a parallel approach. The Progressive Blocking algorithm works on large datasets where finding duplication requires immense time. These algorithms are used to enhance the duplicate detection system; using them, efficiency can be doubled over the conventional duplicate detection method.
Comparative analysis of dynamic programming algorithms to find similarity in ...eSAT Journals
Abstract There exist many computational methods for finding similarity in gene sequences; finding a suitable method that gives optimal similarity is a difficult task. The objective of this project is to find an appropriate method to compute similarity in gene/protein sequences, both within families and across families. Many dynamic programming algorithms, like Levenshtein edit distance, Longest Common Subsequence and Smith-Waterman, have used the dynamic programming approach to find similarities between two sequences, but none of the methods mentioned above has used real benchmark data sets; they have only applied dynamic programming algorithms to synthetic data. We propose a new method to compute similarity. The performance of the proposed algorithm is evaluated using a number of data sets from various families, and the similarity value is calculated both within a family and across families. A comparative analysis and the time complexity of the proposed method reveal that the Smith-Waterman approach is appropriate when gene/protein sequences belong to the same family, and Longest Common Subsequence is best suited when sequences belong to two different families. Keywords - Bioinformatics, Gene, Gene Sequencing, Edit distance, String Similarity.
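The two dynamic programming algorithms the comparison singles out can be sketched as their classic recurrences (the Smith-Waterman scoring values below are illustrative choices):

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-1):
    """Local alignment score: favoured for same-family sequences."""
    rows = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    best = 0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            # Local alignment: scores never drop below zero.
            rows[i][j] = max(0, rows[i - 1][j - 1] + s,
                             rows[i - 1][j] + gap, rows[i][j - 1] + gap)
            best = max(best, rows[i][j])
    return best

def lcs_length(a, b):
    """Longest Common Subsequence length: favoured across families."""
    rows = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            rows[i][j] = (rows[i - 1][j - 1] + 1 if a[i - 1] == b[j - 1]
                          else max(rows[i - 1][j], rows[i][j - 1]))
    return rows[-1][-1]
```

Smith-Waterman rewards one strong contiguous region of similarity, which same-family sequences tend to share, while LCS tolerates the scattered matches typical of sequences from different families.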
dFuse: An Optimized Compression Algorithm for DICOM-Format Image ArchiveCSCJournals
Medical images are useful for knowing the details of the human body for health science or remedial reasons. DICOM is structured as a multi-part document in order to facilitate extension of these images. Additionally, DICOM-defined information objects exist not only for images but also for patients, studies, reports, and other data groupings. The greater information detail in DICOM results in large file sizes, and transferring or communicating these files takes a lot of time. To solve this, files can be compressed before transfer. Efficient compression solutions are available, and they are becoming more critical with the recent intensive growth of data and medical imaging. In order to receive the original image at a smaller size, we need an effective compression algorithm. There are different algorithms for compression, such as DCT, Haar and Daubechies, which have their roots in cosine and wavelet transforms. In this paper, we propose a new compression algorithm called “dFuse”. It uses a cosine-based three-dimensional transform to compress DICOM files. We use the following parameters to check the efficiency of the proposed algorithm: i) file size, ii) PSNR, iii) compression percentage and iv) compression ratio. From the experimental results obtained, the proposed algorithm works well for compressing medical images.
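The energy-compaction property that cosine-based codecs such as dFuse exploit can be shown with a one-dimensional DCT-II; the paper's three-dimensional transform and its quantisation step are omitted here:

```python
import math

def dct_ii(signal):
    """Type-II DCT: X[k] = sum_n x[n] * cos(pi/N * (n + 0.5) * k)."""
    n_len = len(signal)
    return [
        sum(x * math.cos(math.pi / n_len * (n + 0.5) * k)
            for n, x in enumerate(signal))
        for k in range(n_len)
    ]

# A smooth (here: constant) block concentrates all energy in X[0];
# the near-zero high-frequency coefficients are what quantisation
# then discards to achieve compression.
coeffs = dct_ii([5.0] * 8)
```

Medical image blocks are rarely perfectly smooth, but the same skew toward low-frequency coefficients holds, which is why cosine transforms compress them well at acceptable PSNR.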
This document discusses and compares five predictive data mining techniques: principal component analysis, correlation coefficient analysis, principal component regression, nonlinear partial least squares, and linear regression. It first provides background on data acquisition, preparation, and preprocessing techniques. It then describes each predictive technique, including how they handle issues like collinearity in datasets. Finally, it discusses how these techniques will be applied to four different datasets and the results compared to determine which technique best predicts the response variable while reducing variables.
TUPLE VALUE BASED MULTIPLICATIVE DATA PERTURBATION APPROACH TO PRESERVE PRIVA...IJDKP
Huge volumes of data from domain-specific applications such as medical, financial, library, telephone and
shopping records, and from individuals, are regularly generated. Sharing of such data has proved to be beneficial
for data mining application. On one hand such data is an important asset to business decision making by
analyzing it. On the other hand data privacy concerns may prevent data owners from sharing information
for data analysis. In order to share data while preserving privacy, data owner must come up with a solution
which achieves the dual goal of privacy preservation as well as an accuracy of data mining task –
clustering and classification. An efficient and effective approach has been proposed that aims to protect
the privacy of sensitive information while obtaining data clustering with minimum information loss.
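A minimal sketch of multiplicative data perturbation, with an assumed Gaussian noise model centred on 1 (the paper's exact tuple-value scheme is not specified in the abstract), looks like:

```python
import random

def perturb(tuples, sigma=0.05, seed=42):
    """Multiply every attribute value by random noise centred on 1.

    A small sigma roughly preserves relative distances, so clustering
    quality survives, while the exact original values stay hidden."""
    rng = random.Random(seed)
    return [[v * rng.gauss(1.0, sigma) for v in row] for row in tuples]

original = [[120.0, 80.0], [130.0, 85.0], [118.0, 79.0]]
distorted = perturb(original)
```

The choice of sigma is the privacy/utility dial: larger values hide more but degrade the accuracy of clustering and classification run on the distorted data.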
Mining of Prevalent Ailments in a Health Database Using Fp-Growth AlgorithmWaqas Tariq
Health databases are characterised by a large number of attributes such as personal biological and diagnosis information, health history, prescription, billing information and so on. The increasing need for providing an enhanced medical system has necessitated adopting an efficient data mining technique for extracting hidden and useful information from health databases. In the past, many data mining algorithms such as Apriori, Eclat and H-Mine have been developed, with deficiencies in the time-space trade-off. In this work, an enhanced FP-growth frequent pattern mining algorithm coined FP-Ail is applied to a students’ health database with a view to providing information about prevalent ailments and suggestions for managing the identified ailments. FP-Ail is tested on a students’ health database of a tertiary institution in Nigeria, and the results obtained could be used by the management of the health centre for enhanced strategic decision making about health care. FP-Ail also provides the possibility to refine the minimum support threshold interactively, and to see the changes instantly.
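FP-growth itself builds a compressed prefix tree; as a naive stand-in that yields the same frequent itemsets on small data, support counting over enumerated candidates can be sketched:

```python
from itertools import combinations

def frequent_itemsets(transactions, min_support):
    """Return every itemset appearing in >= min_support transactions.

    Naive enumeration: FP-growth reaches the same answer without
    materialising every candidate, via a compressed prefix tree."""
    items = sorted({i for t in transactions for i in t})
    result = {}
    for size in range(1, len(items) + 1):
        for cand in combinations(items, size):
            support = sum(1 for t in transactions if set(cand) <= set(t))
            if support >= min_support:
                result[cand] = support
    return result

# Toy health-visit records (hypothetical): ailments per visit.
visits = [{"flu", "cough"}, {"flu", "cough", "fever"},
          {"malaria"}, {"flu", "fever"}]
patterns = frequent_itemsets(visits, min_support=2)
```

Re-running with a different `min_support` is the interactive threshold refinement the abstract mentions; FP-Ail's contribution is making that refinement instant on a real FP-tree rather than by full re-enumeration.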
Data preprocessing for efficient external sorting – eSAT Journals
This document discusses external sorting using data preprocessing. It begins by introducing the problem of huge datasets containing redundancy, noise, and inconsistencies that reduce data quality, and proposes applying data preprocessing methods like cleaning, integration, transformation, and reduction to eliminate these issues before sorting. Preprocessing improves sorting efficiency by reducing the number of passes, inputs/outputs, and runs needed compared to sorting raw data. After reviewing related work on external sorting techniques and their overheads, the document provides pseudocode for preprocessing numeric, alphanumeric, and string data to remove duplication, and applies a B-way external merge sort to the preprocessed data. It concludes that preprocessing datasets prior to external sorting reduces time complexity and I/O costs compared to sorting data with redundancy.
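The pipeline described, clean first and then run a B-way external merge sort, can be simulated in memory as follows; the helper names and run size are illustrative, and real external sorting would write each sorted run to disk before merging.

```python
import heapq

def preprocess(records):
    """Data cleaning before sorting: strip blanks and drop duplicates."""
    return list(dict.fromkeys(r.strip() for r in records if r.strip()))

def external_merge_sort(records, run_size=3):
    """Simulated B-way external merge sort: sort fixed-size runs (on a
    real system each run is written to disk), then merge all runs."""
    runs = [sorted(records[i:i + run_size])
            for i in range(0, len(records), run_size)]
    return list(heapq.merge(*runs))

raw = ["banana", "apple", "banana", "cherry", "", "apple", "date"]
clean = preprocess(raw)                       # fewer records to sort
result = external_merge_sort(clean, run_size=2)
print(result)  # ['apple', 'banana', 'cherry', 'date']
```

Because duplicates are removed before sorting, fewer runs (and therefore fewer merge passes and I/Os) are needed, which is the paper's central claim.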
This document proposes and evaluates a neighborhood system approach for reducing the volume of very large datasets while preserving their intrinsic properties. The approach defines a neighborhood for each record based on attribute similarities within a radius threshold. Two prediction algorithms were used to test whether the reduced datasets maintained the properties of the original data. Results across 10 datasets showed that the approach improved prediction outcomes by increasing correct predictions and decreasing false predictions, demonstrating that it is an effective and robust reduction method.
Clustering technology has been applied in numerous applications. It can enhance the performance of information retrieval systems, and it can group Internet users to help improve the click-through rate of on-line advertising. Over the past few decades, a great many data clustering algorithms have been developed, including K-Means, DBSCAN, Bi-Clustering and Spectral clustering. In recent years, two new data clustering algorithms have been proposed: affinity propagation (AP, 2007) and density peak based clustering (DP, 2014). In this work, we empirically compare the performance of these two latest data clustering algorithms with the state of the art, using 6 external and 2 internal clustering validation metrics. Our experimental results on 16 public datasets show that the two latest clustering algorithms, AP and DP, do not always outperform DBSCAN. Therefore, to find the best clustering algorithm for a specific dataset, all of AP, DP and DBSCAN should be considered. Moreover, we find that the comparison of different clustering algorithms depends closely on the clustering evaluation metrics adopted. For instance, under the Silhouette clustering validation metric, the overall performance of K-Means is as good as that of AP and DP. This work provides an important reference for researchers and engineers who need to select appropriate clustering algorithms for their specific applications.
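The Silhouette validation metric mentioned above can be computed directly from its definition: for each point, s = (b - a) / max(a, b), where a is the mean distance to points in its own cluster and b is the mean distance to the nearest other cluster. A self-contained sketch on a toy dataset (the points and labels are illustrative):

```python
def silhouette(points, labels):
    """Mean silhouette coefficient: s = (b - a) / max(a, b) per point,
    with a = mean distance within its own cluster and b = mean distance
    to the nearest other cluster."""
    def dist(p, q):
        return sum((x - y) ** 2 for x, y in zip(p, q)) ** 0.5

    scores = []
    for i, p in enumerate(points):
        own = [dist(p, q) for j, q in enumerate(points)
               if labels[j] == labels[i] and j != i]
        a = sum(own) / len(own)
        b = min(
            sum(dist(p, q) for j, q in enumerate(points) if labels[j] == c) /
            labels.count(c)
            for c in set(labels) if c != labels[i]
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

# Two well-separated toy clusters: the mean score should be close to 1.
pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
lbl = [0, 0, 0, 1, 1, 1]
print(round(silhouette(pts, lbl), 3))
```

Scores near 1 mean tight, well-separated clusters; scores near 0 or below mean overlapping clusters, which is why the ranking of algorithms can change depending on the validation metric used.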
A Study on DNA based Computation and Memory Devices – Editor IJCATR
The present study delineates Deoxyribonucleic Acid (DNA) based computing and storage devices, which have a promising future in the vast era of information technology. The traditional devices in use are mostly made of silicon; they are costly and face physical limitations that cause electron leakage and circuit shorting. There is therefore a need for materials capable of fast processing and vast memory storage. DNA, a bio-molecule, has these characteristics and is capable of providing ample storage. In classical computing devices, electronic logic gates are the elements that store and transform information; designing an appropriate sequence or network of "store" and "transform" operations (in the sense of building a device or writing a program) is equivalent to preparing a computation. In DNA based computation the situation is analogous: the main difference is the type of computing device, since instead of electronic gates, DNA molecules are deployed to process the data. Moreover, the inherent massive parallelism of DNA computing may lead to methods for solving some intractable computational problems. The aim of this research study is to analyze the logical features and memory formation of DNA bio-molecules in order to achieve greater speed, accuracy and vast storage.
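The storage side of this idea is easy to illustrate: since DNA has four bases, each base can carry two bits, so one byte maps to four bases. The encoding below is a toy sketch of that principle, not a practical DNA storage code (which must also handle synthesis constraints and read errors).

```python
# Toy DNA storage: each base carries two bits (A=00, C=01, G=10, T=11).
B2BASE = {0: "A", 1: "C", 2: "G", 3: "T"}
BASE2B = {b: v for v, b in B2BASE.items()}

def encode(data: bytes) -> str:
    """Encode bytes as a strand of bases, four bases per byte (MSB first)."""
    return "".join(B2BASE[(byte >> shift) & 0b11]
                   for byte in data
                   for shift in (6, 4, 2, 0))

def decode(strand: str) -> bytes:
    """Invert encode(): fold each group of four bases back into a byte."""
    out = bytearray()
    for i in range(0, len(strand), 4):
        byte = 0
        for base in strand[i:i + 4]:
            byte = (byte << 2) | BASE2B[base]
        out.append(byte)
    return bytes(out)

strand = encode(b"DNA")
print(strand)          # CACACATGCAAC
print(decode(strand))  # b'DNA'
```

At this density a single byte needs only four bases, which is the arithmetic behind DNA's "ample storage" claim.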
Similar to Design & Implementation of a DNA Compression Algorithm (20)
TUNNELING IN HIMALAYAS WITH NATM METHOD: A SPECIAL REFERENCES TO SUNGAL TUNNE... – IRJET Journal
1) The document discusses the Sungal Tunnel project in Jammu and Kashmir, India, which is being constructed using the New Austrian Tunneling Method (NATM).
2) NATM involves continuous monitoring during construction to adapt to changing ground conditions, and makes extensive use of shotcrete for temporary tunnel support.
3) The methodology section outlines the systematic geotechnical design process for tunnels according to Austrian guidelines, and describes the various steps of NATM tunnel construction including initial and secondary tunnel support.
STUDY THE EFFECT OF RESPONSE REDUCTION FACTOR ON RC FRAMED STRUCTURE – IRJET Journal
This study examines the effect of response reduction factors (R factors) on reinforced concrete (RC) framed structures through nonlinear dynamic analysis. Three RC frame models with varying heights (4, 8, and 12 stories) were analyzed in ETABS software under different R factors ranging from 1 to 5. The results showed that displacement increased as the R factor decreased, indicating less linear behavior for lower R factors. Drift also decreased proportionally with increasing R factors from 1 to 5. Shear forces in the frames decreased with higher R factors. In general, R factors of 3 to 5 produced more satisfactory performance with less displacement and drift. The displacement variations between different building heights were consistent at different R factors.
A COMPARATIVE ANALYSIS OF RCC ELEMENT OF SLAB WITH STARK STEEL (HYSD STEEL) A... – IRJET Journal
This study compares the use of Stark Steel and TMT Steel as reinforcement materials in a two-way reinforced concrete slab. Mechanical testing is conducted to determine the tensile strength, yield strength, and other properties of each material. A two-way slab design adhering to codes and standards is executed with both materials. The performance is analyzed in terms of deflection, stability under loads, and displacement. Cost analyses accounting for material, durability, maintenance, and life cycle costs are also conducted. The findings provide insights into the economic and structural implications of each material for reinforcement selection and recommendations on the most suitable material based on the analysis.
Effect of Camber and Angles of Attack on Airfoil Characteristics – IRJET Journal
This document discusses a study analyzing the effect of camber, position of camber, and angle of attack on the aerodynamic characteristics of airfoils. Sixteen modified asymmetric NACA airfoils were analyzed using computational fluid dynamics (CFD) by varying the camber, camber position, and angle of attack. The results showed the relationship between these parameters and the lift coefficient, drag coefficient, and lift to drag ratio. This provides insight into how changes in airfoil geometry impact aerodynamic performance.
A Review on the Progress and Challenges of Aluminum-Based Metal Matrix Compos... – IRJET Journal
This document reviews the progress and challenges of aluminum-based metal matrix composites (MMCs), focusing on their fabrication processes and applications. It discusses how various aluminum MMCs have been developed using reinforcements like borides, carbides, oxides, and nitrides to improve mechanical and wear properties. These composites have gained prominence for their lightweight, high-strength and corrosion resistance properties. The document also examines recent advancements in fabrication techniques for aluminum MMCs and their growing applications in industries such as aerospace and automotive. However, it notes that challenges remain around issues like improper mixing of reinforcements and reducing reinforcement agglomeration.
Dynamic Urban Transit Optimization: A Graph Neural Network Approach for Real-... – IRJET Journal
This document discusses research on using graph neural networks (GNNs) for dynamic optimization of public transportation networks in real-time. GNNs represent transit networks as graphs with nodes as stops and edges as connections. The GNN model aims to optimize networks using real-time data on vehicle locations, arrival times, and passenger loads. This helps increase mobility, decrease traffic, and improve efficiency. The system continuously trains and infers to adapt to changing transit conditions, providing decision support tools. While research has focused on performance, more work is needed on security, socio-economic impacts, contextual generalization of models, continuous learning approaches, and effective real-time visualization.
Structural Analysis and Design of Multi-Storey Symmetric and Asymmetric Shape... – IRJET Journal
This document summarizes a research project that aims to compare the structural performance of conventional slab and grid slab systems in multi-story buildings using ETABS software. The study will analyze both symmetric and asymmetric building models under various loading conditions. Parameters like deflections, moments, shears, and stresses will be examined to evaluate the structural effectiveness of each slab type. The results will provide insights into the comparative behavior of conventional and grid slabs to help engineers and architects select appropriate slab systems based on building layouts and design requirements.
A Review of “Seismic Response of RC Structures Having Plan and Vertical Irreg... – IRJET Journal
This document summarizes and reviews a research paper on the seismic response of reinforced concrete (RC) structures with plan and vertical irregularities, with and without infill walls. It discusses how infill walls can improve or reduce the seismic performance of RC buildings, depending on factors like wall layout, height distribution, connection to the frame, and relative stiffness of walls and frames. The reviewed research paper analyzes the behavior of infill walls, effects of vertical irregularities, and seismic performance of high-rise structures under linear static and dynamic analysis. It studies response characteristics like story drift, deflection and shear. The document also surveys similar research investigating the effects of infill walls, soft stories, and plan irregularities.
This document provides a review of machine learning techniques used in Advanced Driver Assistance Systems (ADAS). It begins with an abstract that summarizes key applications of machine learning in ADAS, including object detection, recognition, and decision-making. The introduction discusses the integration of machine learning in ADAS and how it is transforming vehicle safety. The literature review then examines several research papers on topics like lightweight deep learning models for object detection and lane detection models using image processing. It concludes by discussing challenges and opportunities in the field, such as improving algorithm robustness and adaptability.
Long Term Trend Analysis of Precipitation and Temperature for Asosa district,... – IRJET Journal
The document analyzes temperature and precipitation trends in Asosa District, Benishangul Gumuz Region, Ethiopia from 1993 to 2022 based on data from the local meteorological station. The results show:
1) The average maximum and minimum annual temperatures have generally decreased over time, with trend slopes of -0.0341 for maximum temperatures and -0.0152 for minimum temperatures.
2) Mann-Kendall tests found the decreasing temperature trends to be statistically significant for annual maximum temperatures but not for annual minimum temperatures.
3) Annual precipitation in Asosa District showed a statistically significant increasing trend.
The conclusions recommend that development planners account for rising summer precipitation and declining temperatures.
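The Mann-Kendall test used above is based on the S statistic, the sum of sign(x_j - x_i) over all pairs i < j: S > 0 suggests an increasing trend and S < 0 a decreasing one, with significance judged from a variance-normalised Z score (omitted here). The rainfall figures below are hypothetical, not the Asosa data.

```python
def mann_kendall_s(series):
    """Mann-Kendall S statistic: sum of sign(x_j - x_i) over all
    pairs i < j of the time series."""
    s = 0
    for i in range(len(series) - 1):
        for j in range(i + 1, len(series)):
            diff = series[j] - series[i]
            s += (diff > 0) - (diff < 0)   # sign(diff) as -1, 0 or +1
    return s

annual_rainfall = [800, 820, 815, 860, 870, 905]  # hypothetical totals
print(mann_kendall_s(annual_rainfall))  # 13: most pairs are rising
```

A strictly decreasing series of the same length would give S = -10 over its 10 pairs, which is how the test distinguishes the temperature and precipitation trends reported above.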
P.E.B. Framed Structure Design and Analysis Using STAAD Pro – IRJET Journal
This document discusses the design and analysis of pre-engineered building (PEB) framed structures using STAAD Pro software. It provides an overview of PEBs, including that they are designed off-site with building trusses and beams produced in a factory. STAAD Pro is identified as a key tool for modeling, analyzing, and designing PEBs to ensure their performance and safety under various load scenarios. The document outlines modeling structural parts in STAAD Pro, evaluating structural reactions, assigning loads, and following international design codes and standards. In summary, STAAD Pro is used to design and analyze PEB framed structures to ensure safety and code compliance.
A Review on Innovative Fiber Integration for Enhanced Reinforcement of Concre... – IRJET Journal
This document provides a review of research on innovative fiber integration methods for reinforcing concrete structures. It discusses studies that have explored using carbon fiber reinforced polymer (CFRP) composites with recycled plastic aggregates to develop more sustainable strengthening techniques. It also examines using ultra-high performance fiber reinforced concrete to improve shear strength in beams. Additional topics covered include the dynamic responses of FRP-strengthened beams under static and impact loads, and the performance of preloaded CFRP-strengthened fiber reinforced concrete beams. The review highlights the potential of fiber composites to enable more sustainable and resilient construction practices.
Survey Paper on Cloud-Based Secured Healthcare System – IRJET Journal
This document summarizes a survey on securing patient healthcare data in cloud-based systems. It discusses using technologies like facial recognition, smart cards, and cloud computing combined with strong encryption to securely store patient data. The survey found that healthcare professionals believe digitizing patient records and storing them in a centralized cloud system would improve access during emergencies and enable more efficient care compared to paper-based systems. However, ensuring privacy and security of patient data is paramount as healthcare incorporates these digital technologies.
Review on studies and research on widening of existing concrete bridges – IRJET Journal
This document summarizes several studies that have been conducted on widening existing concrete bridges. It describes a study from China that examined load distribution factors for a bridge widened with composite steel-concrete girders. It also outlines challenges and solutions for widening a bridge in the UAE, including replacing bearings and stitching the new and existing structures. Additionally, it discusses two bridge widening projects in New Zealand that involved adding precast beams and stitching to connect structures. Finally, safety measures and challenges for strengthening a historic bridge in Switzerland under live traffic are presented.
React based fullstack edtech web application – IRJET Journal
The document describes the architecture of an educational technology web application built using the MERN stack. It discusses the frontend developed with ReactJS, backend with NodeJS and ExpressJS, and MongoDB database. The frontend provides dynamic user interfaces, while the backend offers APIs for authentication, course management, and other functions. MongoDB enables flexible data storage. The architecture aims to provide a scalable, responsive platform for online learning.
A Comprehensive Review of Integrating IoT and Blockchain Technologies in the ... – IRJET Journal
This paper proposes integrating Internet of Things (IoT) and blockchain technologies to help implement objectives of India's National Education Policy (NEP) in the education sector. The paper discusses how blockchain could be used for secure student data management, credential verification, and decentralized learning platforms. IoT devices could create smart classrooms, automate attendance tracking, and enable real-time monitoring. Blockchain would ensure integrity of exam processes and resource allocation, while smart contracts automate agreements. The paper argues this integration has potential to revolutionize education by making it more secure, transparent and efficient, in alignment with NEP goals. However, challenges like infrastructure needs, data privacy, and collaborative efforts are also discussed.
A REVIEW ON THE PERFORMANCE OF COCONUT FIBRE REINFORCED CONCRETE. – IRJET Journal
This document provides a review of research on the performance of coconut fibre reinforced concrete. It summarizes several studies that tested different volume fractions and lengths of coconut fibres in concrete mixtures with varying compressive strengths. The studies found that coconut fibre improved properties like tensile strength, toughness, crack resistance, and spalling resistance compared to plain concrete. Volume fractions of 2-5% and fibre lengths of 20-50mm produced the best results. The document concludes that using a 4-5% volume fraction of coconut fibres 30-40mm in length with M30-M60 grade concrete would provide benefits based on previous research.
Optimizing Business Management Process Workflows: The Dynamic Influence of Mi... – IRJET Journal
The document discusses optimizing business management processes through automation using Microsoft Power Automate and artificial intelligence. It provides an overview of Power Automate's key components and features for automating workflows across various apps and services. The document then presents several scenarios applying automation solutions to common business processes like data entry, monitoring, HR, finance, customer support, and more. It estimates the potential time and cost savings from implementing automation for each scenario. Finally, the conclusion emphasizes the transformative impact of AI and automation tools on business processes and the need for ongoing optimization.
Multistoried and Multi Bay Steel Building Frame by using Seismic Design – IRJET Journal
The document describes the seismic design of a G+5 steel building frame located in Roorkee, India according to Indian codes IS 1893-2002 and IS 800. The frame was analyzed using the equivalent static load method and response spectrum method, and its response in terms of displacements and shear forces were compared. Based on the analysis, the frame was designed as a seismic-resistant steel structure according to IS 800:2007. The software STAAD Pro was used for the analysis and design.
Cost Optimization of Construction Using Plastic Waste as a Sustainable Constr... – IRJET Journal
This research paper explores using plastic waste as a sustainable and cost-effective construction material. The study focuses on manufacturing pavers and bricks using recycled plastic and partially replacing concrete with plastic alternatives. Initial results found that pavers and bricks made from recycled plastic demonstrate comparable strength and durability to traditional materials while providing environmental and cost benefits. Additionally, preliminary research indicates incorporating plastic waste as a partial concrete replacement significantly reduces construction costs without compromising structural integrity. The outcomes suggest adopting plastic waste in construction can address plastic pollution while optimizing costs, promoting more sustainable building practices.
Software Engineering and Project Management - Introduction, Modeling Concepts... – Prakhyath Rai
Introduction, Modeling Concepts and Class Modeling: What is Object orientation? What is OO development? OO Themes; Evidence for usefulness of OO development; OO modeling history. Modeling
as Design technique: Modeling, abstraction, The Three models. Class Modeling: Object and Class Concept, Link and associations concepts, Generalization and Inheritance, A sample class model, Navigation of class models, and UML diagrams
Building the Analysis Models: Requirement Analysis, Analysis Model Approaches, Data modeling Concepts, Object Oriented Analysis, Scenario-Based Modeling, Flow-Oriented Modeling, class Based Modeling, Creating a Behavioral Model.
The CBC machine is a common diagnostic tool used by doctors to measure a patient's red blood cell count, white blood cell count and platelet count. The machine uses a small sample of the patient's blood, which is then placed into special tubes and analyzed. The results of the analysis are then displayed on a screen for the doctor to review. The CBC machine is an important tool for diagnosing various conditions, such as anemia, infection and leukemia. It can also help to monitor a patient's response to treatment.
Comparative analysis between traditional aquaponics and reconstructed aquapon... – bijceesjournal
The aquaponic system of planting is a method that does not require soil usage. It is a method that only needs water, fish, lava rocks (a substitute for soil), and plants. Aquaponic systems are sustainable and environmentally friendly. Its use not only helps to plant in small spaces but also helps reduce artificial chemical use and minimizes excess water use, as aquaponics consumes 90% less water than soil-based gardening. The study applied a descriptive and experimental design to assess and compare conventional and reconstructed aquaponic methods for reproducing tomatoes. The researchers created an observation checklist to determine the significant factors of the study. The study aims to determine the significant difference between traditional aquaponics and reconstructed aquaponics systems propagating tomatoes in terms of height, weight, girth, and number of fruits. The reconstructed aquaponics system’s higher growth yield results in a much more nourished crop than the traditional aquaponics system. It is superior in its number of fruits, height, weight, and girth measurement. Moreover, the reconstructed aquaponics system is proven to eliminate all the hindrances present in the traditional aquaponics system, which are overcrowding of fish, algae growth, pest problems, contaminated water, and dead fish.
Redefining brain tumor segmentation: a cutting-edge convolutional neural netw... – IJECEIAES
Medical image analysis has witnessed significant advancements with deep learning techniques. In the domain of brain tumor segmentation, the ability to
precisely delineate tumor boundaries from magnetic resonance imaging (MRI)
scans holds profound implications for diagnosis. This study presents an ensemble convolutional neural network (CNN) with transfer learning, integrating
the state-of-the-art Deeplabv3+ architecture with the ResNet18 backbone. The
model is rigorously trained and evaluated, exhibiting remarkable performance
metrics, including an impressive global accuracy of 99.286%, a class accuracy of 82.191%, a mean intersection over union (IoU) of 79.900%, a weighted
IoU of 98.620%, and a Boundary F1 (BF) score of 83.303%. Notably, a detailed comparative analysis with existing methods showcases the superiority of
our proposed model. These findings underscore the model’s competence in precise brain tumor localization, underscoring its potential to revolutionize medical
image analysis and enhance healthcare outcomes. This research paves the way
for future exploration and optimization of advanced CNN models in medical
imaging, emphasizing addressing false positives and resource efficiency.
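The IoU metric reported above has a direct definition: the overlap of predicted and ground-truth masks divided by their union. A minimal sketch on flat binary masks (the example masks are illustrative, not real segmentations):

```python
def iou(pred, truth):
    """Intersection over union of two binary masks (flat 0/1 lists)."""
    inter = sum(p & t for p, t in zip(pred, truth))
    union = sum(p | t for p, t in zip(pred, truth))
    return inter / union if union else 1.0  # two empty masks agree fully

pred  = [1, 1, 0, 0, 1, 0]   # illustrative predicted tumor mask
truth = [1, 0, 0, 1, 1, 0]   # illustrative ground-truth mask
print(iou(pred, truth))  # 0.5: intersection 2 pixels, union 4 pixels
```

Mean IoU averages this score over the classes (tumor and background), and weighted IoU weights each class by its pixel count, which is why the paper's weighted IoU is much higher than its mean IoU on heavily imbalanced brain scans.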
This webinar presentation covers Data Driven Maintenance: traditional maintenance challenges, the right approach to utilizing data, and the benefits of adopting a Data Driven Maintenance strategy. It explores real-world examples, industry best practices, and solutions such as FMECA and the D3M model. The presentation, led by Jules Oudmans, is aimed at asset owners looking to optimize their maintenance processes and leverage digital technologies for improved efficiency and performance.
Advanced control scheme of doubly fed induction generator for wind turbine us... – IJECEIAES
This paper describes a speed control device for generating electrical energy on an electricity network based on the doubly fed induction generator (DFIG) used for wind power conversion systems. At first, a double-fed induction generator model was constructed. A control law is formulated to govern the flow of energy between the stator of a DFIG and the energy network using three types of controllers: proportional integral (PI), sliding mode controller (SMC) and second order sliding mode controller (SOSMC). Their different results in terms of power reference tracking, reaction to unexpected speed fluctuations, sensitivity to perturbations, and resilience against machine parameter alterations are compared. MATLAB/Simulink was used to conduct the simulations for the preceding study. Multiple simulations have shown very satisfying results, and the investigations demonstrate the efficacy and power-enhancing capabilities of the suggested control system.
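Of the three controllers compared, the PI loop is the simplest to sketch. Below, a discrete PI controller drives a generic first-order plant toward a speed reference; the gains, time step, and plant model are illustrative stand-ins, not the paper's DFIG model.

```python
def simulate_pi(ref, kp=2.0, ki=5.0, dt=0.01, steps=1000):
    """Discrete PI speed loop on a first-order plant dy/dt = -y + u.

    Gains and plant are illustrative, not the paper's DFIG dynamics.
    """
    y, integral = 0.0, 0.0
    for _ in range(steps):
        error = ref - y
        integral += error * dt            # integral of the tracking error
        u = kp * error + ki * integral    # PI control law
        y += (-y + u) * dt                # Euler step of the plant
    return y

print(round(simulate_pi(1.0), 3))  # settles at the reference speed
```

The integral term is what drives the steady-state tracking error to zero; sliding mode controllers replace this linear law with a switching law that is less sensitive to the parameter variations the paper studies.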