Data mining is the process of extracting knowledge from large databases. It is an interdisciplinary field that brings together techniques from machine learning, pattern recognition, statistics, databases, and visualization to address the problem of information extraction from large databases. Bioinformatics is the science of organizing and analyzing biological data. Microarray technology helps biologists monitor the expression of thousands of genes in a single experiment on a small chip. A microarray, also called a DNA chip, gene chip, or biochip, is used to analyze gene expression profiles. Fuzzy logic is a multivalued logic that allows intermediate values to be defined between conventional evaluations such as true or false, yes or no, and high or low. In this paper, a type 2 fuzzy logic approach is applied to microarray gene expression data to convert the numerical values into fuzzy terms. After fuzzification, the fuzzy association patterns are discovered. A framework is proposed to cluster microarray gene data based on fuzzy association patterns. The proposed type 2 fuzzy approach is then compared with traditional clustering algorithms.
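The fuzzification step described in this abstract can be illustrated with a small sketch. The abstract does not give the membership functions, so everything here is an assumption: the triangular shapes, the three terms (low, medium, high), the normalised range [0, 1], and the `blur` parameter that widens or narrows each triangle to produce the lower and upper bounds of an interval type 2 membership.

```python
def triangular(x, a, b, c):
    """Triangular membership: 0 outside (a, c), rising linearly to 1 at b."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


# Hypothetical fuzzy terms over a normalised expression range [0, 1].
# Each tuple is (left foot, peak, right foot) of a triangle.
TERMS = {
    "low": (-0.5, 0.0, 0.5),
    "medium": (0.0, 0.5, 1.0),
    "high": (0.5, 1.0, 1.5),
}


def fuzzify(x, blur=0.1):
    """Return {term: (lower, upper)} interval memberships for value x.

    The lower bound uses a triangle narrowed by `blur` on each side and the
    upper bound a triangle widened by `blur`, so lower <= upper everywhere.
    """
    result = {}
    for term, (a, b, c) in TERMS.items():
        lower = triangular(x, a + blur, b, c - blur)
        upper = triangular(x, a - blur, b, c + blur)
        result[term] = (lower, upper)
    return result


def to_term(x, blur=0.1):
    """Pick the term whose membership interval has the largest midpoint."""
    memberships = fuzzify(x, blur)
    return max(memberships, key=lambda t: sum(memberships[t]) / 2)


if __name__ == "__main__":
    for value in (0.1, 0.5, 0.9):
        print(value, to_term(value), fuzzify(value))
```

Association-pattern discovery and clustering would then operate on the resulting terms rather than on the raw numbers; those later steps are not sketched here.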
Data mining with human genetics to enhance gene based algorithm andIAEME Publication
This document summarizes a research study on enhancing DNA database security through data mining techniques. The study aims to understand relationships between DNA sequences and disease risk, and develop improved parental identification algorithms. Data mining is used to analyze large DNA datasets and extract patterns. Security methods like encryption algorithms are also applied to DNA databases to protect the data. The study examines using data mining, visualization, parallel processing and high-performance computing for DNA sequence analysis while enhancing security. The goal is to discover knowledge from DNA databases and securely store citizen identity data to prevent fraud.
Uses of Artificial Intelligence in BioinformaticsPragya Pai
This presentation is about the usage of Artificial Intelligence in Bioinformatics. These slides give the basic knowledge about usage of Artificial Intelligence in Bioinformatics.
IRJET- Crop Leaf Disease Diagnosis using Convolutional Neural NetworkIRJET Journal
This document describes a system for diagnosing crop leaf diseases using convolutional neural networks. The system can identify diseases in five major crops: corn, rice, wheat, sugarcane, and grapes. It uses a MobileNet model and CNN architecture trained on datasets of images of healthy and diseased leaves. The system achieves 97.33% accuracy in diagnosing diseases in grape leaves. It aims to help farmers detect diseases early and determine the appropriate pesticides.
AN APPROACH FOR IRIS PLANT CLASSIFICATION USING NEURAL NETWORKijsc
Classification is a machine learning technique used to predict group membership for data instances. Neural networks are introduced to simplify the classification problem. This paper focuses on IRIS plant classification using a neural network. The problem concerns the identification of IRIS plant species on the basis of plant attribute measurements. Classifying the IRIS data set means discovering patterns in the petal and sepal sizes of the IRIS plant and using those patterns to predict the class of the plant. With such patterns and classification, unknown data can be predicted more precisely in the future. Artificial neural networks have been successfully applied to problems in pattern classification, function approximation, optimization, and associative memories. In this work, multilayer feed-forward networks are trained using the back propagation learning algorithm.
International Journal of Engineering Research and Applications (IJERA) is an open access online peer reviewed international journal that publishes research and review articles in the fields of Computer Science, Neural Networks, Electrical Engineering, Software Engineering, Information Technology, Mechanical Engineering, Chemical Engineering, Plastic Engineering, Food Technology, Textile Engineering, Nano Technology & science, Power Electronics, Electronics & Communication Engineering, Computational mathematics, Image processing, Civil Engineering, Structural Engineering, Environmental Engineering, VLSI Testing & Low Power VLSI Design etc.
The proposed method addresses these issues by developing a novel classification algorithm that combines the Gene Expression Graph (GEG) with Manhattan distance. The method is used to represent gene expression data. The Gene Expression Graph provides an optimal view of the relationship between normal and unhealthy genes. Using a graph-based representation to express gene information was first proposed by the authors in [1] and [2]. It permits constructing a classifier based on an association between graphs representing well-known classes and graphs representing the samples to evaluate. Additionally, Euclidean distance is used to measure the strength of the relationship that exists between the genes.
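The two distance measures named in this abstract can be sketched as follows. This is only an illustration of the distances themselves, applied through a simple nearest-profile rule; it does not build a Gene Expression Graph, and the function names, class labels, and expression values are hypothetical.

```python
import math


def manhattan(u, v):
    """Sum of absolute coordinate differences between two expression vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return sum(abs(a - b) for a, b in zip(u, v))


def euclidean(u, v):
    """Straight-line distance between two expression vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def nearest_class(sample, class_profiles, distance=manhattan):
    """Return the label whose reference profile is closest to the sample.

    A nearest-profile rule assumed for illustration, not the GEG classifier.
    """
    return min(class_profiles, key=lambda label: distance(sample, class_profiles[label]))


if __name__ == "__main__":
    # Made-up reference profiles over two genes.
    profiles = {"normal": [1.0, 1.0], "tumor": [3.0, 0.2]}
    print(nearest_class([1.0, 0.9], profiles))
    print(nearest_class([1.0, 0.9], profiles, distance=euclidean))
```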
1) Systems biology aims to understand biology at the system level rather than just individual components. This requires advanced modeling and data analysis techniques.
2) Challenges in systems biology include understanding complex relationships between components, dynamic behavior over time, and controlling systems with unknown functions.
3) Artificial intelligence can help address these challenges through techniques like machine learning, knowledge representation, and problem solving. It has already been applied to tasks like gene alignment modeling and phylogenetic inference.
Free webinar-introduction to bioinformatics - biologist-1Elia Brodsky
The Omics Logic Introduction to Bioinformatics program is a one-month online training program that provides an introduction to the field of bioinformatics for beginners. The program consists of six sessions taught by an international team of experts, covering topics like genomics, transcriptomics, statistical analysis, machine learning, and a final bioinformatics project. Participants will learn data analysis skills in Python and R and how to extract insights from multi-omics datasets with applications in biomedicine. The goal is to prepare students for data-driven research in life sciences through interactive lessons, coding exercises, and independent projects.
A Reliable Password-based User Authentication Scheme for Web-based Human Geno...Thitichai Sripan
1) The document discusses a password-based user authentication scheme for a Web-based Human Genome Database system.
2) The system integrates DNA sequence alignment and data compression modules to help physicians examine genetic deficiencies and allows secure user access.
3) The proposed password-based scheme aims to address security issues with existing schemes by making the system more resistant to offline password guessing attacks.
SooryaKiran Bioinformatics is a global bioinformatics solutions provider that focuses on customized bioinformatics services and products. It develops algorithms and software for biological sequence analysis, structure prediction, and other areas. Key products include tools for sequence generation, analysis, and homology identification. The company collaborates with research institutions and has provided solutions for SNP analysis, genome analysis, and mitochondrial DNA analysis to clients around the world.
Delineation of techniques to implement on the enhanced proposed model using d...ijdms
In the post-genomic era, with the advent of new technologies, a huge amount of complex molecular data is generated with high throughput. Managing this biological data to discover new knowledge is a challenging task because of the complexity and heterogeneity of the data. Issues such as noisy and incomplete data need to be dealt with. Data mining has been used successfully in the biological domain, but discovering new knowledge from biological data remains a major challenge for data mining techniques. The novelty of the proposed model is its combined use of intelligent techniques to classify protein sequences faster and more efficiently. Using FFT, a fuzzy classifier, a string weighted algorithm, the gram encoding method, a neural network model, and a rough set classifier in a single model, each in an appropriate place, can enhance the quality of the classification system. Thus the primary challenge is to identify and classify large protein sequences in a fast, easy, yet intelligent way that decreases the time complexity and space complexity.
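One of the listed components, gram encoding, can be sketched as counting overlapping n-grams of amino-acid letters. The abstract does not specify the exact encoding, so the choice of n = 2, the normalisation into frequencies, and the example sequence are assumptions made for illustration.

```python
from collections import Counter


def ngram_counts(sequence, n=2):
    """Count overlapping n-grams in a protein sequence (one letter per residue)."""
    sequence = sequence.upper()
    return Counter(sequence[i:i + n] for i in range(len(sequence) - n + 1))


def ngram_frequencies(sequence, n=2):
    """Normalise n-gram counts so that they sum to 1 (empty dict if too short)."""
    counts = ngram_counts(sequence, n)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {gram: count / total for gram, count in counts.items()}


if __name__ == "__main__":
    # Made-up short peptide, used only to show the shape of the output.
    print(ngram_counts("MKVLAAGKV"))
    print(ngram_frequencies("MKVLAAGKV"))
```

A feature vector of this kind could serve as input to a neural network or rough set classifier; how the source model combines it with the other components is not stated in the abstract.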
1) The document describes a feature extraction program developed to analyze gene expression data from the Gene Expression Omnibus (GEO) database.
2) The program was tested on human transcription factor expression data sets and was able to successfully extract gene expression information.
3) Analyzing specific gene sets from GEO files had previously been a labor-intensive task, but the object-oriented program streamlines this process using C and Perl programming languages.
Efficiency of LSB steganography on medical information IJECEIAES
The development of the medical field has led to a shift in communication from paper to digital form. Medical information security has become a great concern as the field moves towards the digital world, with patient information, disease diagnoses, and similar records now being stored in digital images. It is therefore essential to secure patient information and the growing volume of communication between patients, clients, medical practitioners, and sponsors. The core aim of this research is to provide a complete overview of research trends in the LSB steganography technique as applied to securing medical information such as text, image, audio, video, and graphics, and to discuss the efficiency of the LSB technique. The survey findings show that the LSB steganography technique is efficient in securing medical information from intruders.
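The basic LSB (least significant bit) idea surveyed here can be sketched on a plain byte buffer standing in for pixel data. This is a minimal illustration under assumptions: one message bit per cover byte, most significant bit first, and a message length known to the receiver. Real schemes covered by the survey may add encryption or pseudo-random bit placement.

```python
def embed(cover: bytes, message: bytes) -> bytes:
    """Hide message bits in the least significant bit of successive cover bytes."""
    bits = [(byte >> shift) & 1 for byte in message for shift in range(7, -1, -1)]
    if len(bits) > len(cover):
        raise ValueError("cover is too small for this message")
    stego = bytearray(cover)
    for i, bit in enumerate(bits):
        stego[i] = (stego[i] & 0xFE) | bit  # clear the LSB, then set it to the message bit
    return bytes(stego)


def extract(stego: bytes, length: int) -> bytes:
    """Read `length` bytes back from the least significant bits of the stego data."""
    out = bytearray()
    for k in range(length):
        value = 0
        for byte in stego[8 * k:8 * k + 8]:
            value = (value << 1) | (byte & 1)
        out.append(value)
    return bytes(out)


if __name__ == "__main__":
    cover = bytes(range(64))           # stand-in for 64 pixel intensity values
    stego = embed(cover, b"ID:42")     # hypothetical patient identifier
    print(extract(stego, 5))
```

Each cover byte changes by at most 1, which is why the modification is hard to see in an image.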
Computational Biology Methods for Drug Discovery_Phase 1-5_November 2015Mathew Varghese
This document describes a computational biology methods program launched by Thomson Reuters to implement network and pathway analysis approaches for analyzing omics data. The program provides tools through an R package to enable analysis of diverse data sets within biological networks. It focuses on selecting state-of-the-art algorithms and implementing them in a robust, easy-to-use software package. The program aims to maximize the utility of internal and external biological network and pathway databases for drug discovery applications such as target identification and patient stratification.
EG-CompBio presentation about Artificial Intelligence in Bioinformatics covering:
-AI (Types, Development)
-Deep Learning (Architecture)
-Bioinformatics Fields
-Input formats for AI
-AI Challenges in Biology
-Example: (Proteomics, Transcriptomics)
-Metagenomics: @ NU
-Taxonomic Classification
-Phenotype Classification
-How to begin in AI in Bioinformatics
Mastering RNA-Seq (NGS Data Analysis) - A Critical Approach To Transcriptomic...Elia Brodsky
This workshop will address critical issues related to Transcriptomics data:
Processing raw Next Generation Sequencing (NGS) data:
1. Next Generation Sequencing data preprocessing:
Trimming technical sequences
Removing PCR duplicates
2. RNA-seq based quantification of expression levels:
Conventional pipelines (looking at known transcripts)
Identification of novel isoforms
Analysis of Expression Data Using Machine Learning:
3. Unsupervised analysis of expression data:
Principal Component Analysis
Clustering
4. Supervised analysis:
Differential expression analysis
Classification, gene signature construction
5. Gene set enrichment analysis
The workshop will include hands-on exercises utilizing public domain datasets:
breast cancer cell lines transcriptomic profiles (https://genomebiology.biomedcentral.com/articles/10.1186/gb-2013-14-10-r110),
patient-derived xenograft (PDX) mouse model of tumor and stroma transcriptomic profiles (http://www.oncotarget.com/index.php?journal=oncotarget&page=article&op=view&path[]=8014&path[]=23533), and
processed data from The Cancer Genome Atlas samples (https://cancergenome.nih.gov/).
Team: The workshops are designed by the researchers at the Tauber Bioinformatics Research Center at University of Haifa, Israel in collaboration with academic centers across the US. Technical support for the workshops is provided by the Pine Biotech team. https://edu.t-bio.info/a-critical-approach-to-transcriptomic-data-analysis/
A Review of Various Methods Used in the Analysis of Functional Gene Expressio...ijitcs
Sequencing projects arising from high-throughput technologies, including DNA microarrays, have made it possible to measure simultaneously the expression levels of millions of genes in a biological sample, as well as to annotate those genes and identify their role (function). Bioinformatics approaches have consequently been developed to better manage and organize this large amount of information. These approaches provide a representation and a more 'relevant' integration of the data in order to test and validate researchers' hypotheses. In this context, this article describes and discusses some techniques used for the functional analysis of gene expression data.
Protein sequence classification in data mining– a studyZac Darcy
Since computerized applications are used all around the world, a vast amount of data is collected. The important information hidden in this data is attracting researchers from multiple disciplines to develop effective approaches for deriving the hidden knowledge within it. Data mining may be considered to be the process of extracting or mining useful and valuable knowledge from large amounts of data. There are various domains in data mining, such as text mining, image mining, sequential pattern mining, and web mining. Among these, sequence mining is one of the most important research areas; it helps to find the sequential relationships in the data. Sequence mining is applied in a wide range of application areas such as the analysis of customer purchase patterns, web access patterns, weather observations, protein sequencing, and DNA sequencing. In protein and DNA analysis, sequence mining techniques are used for sequence alignment, sequence searching, and sequence classification. In the area of protein sequence analysis, researchers are showing interest in protein sequence classification, which can discover the recurring structures that exist in protein sequences. This paper explains various techniques used by different researchers in classifying proteins and also provides an overview of different protein sequence classification methods.
This slide deck was collected from the seminar "Machine Learning for Data Mining", held at Daffodil International University. The chief guest, Dr. Dewan Md. Farid, a well-known researcher, prepared these slides to introduce data mining and also shared his research experience in an engaging talk. I hope you will enjoy the slides. Details about Dr. Dewan Md. Farid are given at the link below.
https://ai.vub.ac.be/members/dewan-md-farid
Front End Data Cleaning And Transformation In Standard Printed Form Using Neu...ijcsa
Collecting data at the front end and loading it into a database manually is very time consuming and may introduce errors into data sets. Scanning a data document as an image and recognizing the corresponding information in that image can be considered a possible solution to this challenge. This paper presents an automated solution to the problem of data cleansing and recognition of user-written data, transforming it into a standard printed format with the help of artificial neural networks. Three different neural models, namely direct, correlation based, and hierarchical, have been developed to handle this issue. The solution is developed in a very hostile input environment to justify the proposed logic.
Performance Evaluation of Different Data Mining Classification Algorithm and ...IOSR Journals
This document evaluates the performance of different data mining classification algorithms and predictive analysis. It applies algorithms like decision trees, naive Bayes, k-nearest neighbor, neural networks, and support vector machines to datasets like Iris, liver disorder, and E. coli. The results show that neural networks and k-nearest neighbor achieved the best performance on these datasets, with accuracy rates up to 97.33% for Iris classification. Feature selection techniques like removing zero-weighted attributes were also found to improve some algorithm performance. Predictive analysis experiments found that neural networks and k-nearest neighbor were most accurate at predicting new class labels.
ANALYSIS OF PROTEIN MICROARRAY DATA USING DATA MININGijbbjournal
Recent progress in biology, medical science, bioinformatics, and biotechnology has produced tremendous amounts of biodata that demand in-depth analysis. At the same time, recent progress in data mining research has led to the development of numerous efficient and scalable methods for mining interesting patterns in large databases. This paper bridges the two fields, data mining and bioinformatics, for successful mining of biological data. Microarrays constitute a new platform which allows the discovery and characterization of proteins.
Analytical Study of Hexapod miRNAs using Phylogenetic Methodscscpconf
MicroRNAs (miRNAs) are a class of non-coding RNAs that regulate gene expression. Identifying the total number of miRNAs, even in completely sequenced organisms, is still an open problem. However, researchers have been using techniques that can predict a limited number of miRNAs in an organism. In this paper, we use a homology-based approach for comparative analysis of miRNAs of the Hexapoda group. We use Apis mellifera, Bombyx mori, Anopheles gambiae, and Drosophila melanogaster miRNA datasets from the miRBase repository. We perform pairwise as well as multiple alignments of the miRNAs available in the repository to identify and analyse conserved regions among related species. Unfortunately, to the best of our knowledge, the miRNA literature does not provide an in-depth analysis of hexapods. We attempt to derive the commonality among the miRNAs and to identify conserved regions which are still not available in miRNA repositories. The results are a good approximation, with a small number of mismatches. They are encouraging and may facilitate the study of miRNA biogenesis in hexapods.
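The pairwise alignment step mentioned in this abstract can be illustrated with a global alignment score computed by dynamic programming (the Needleman-Wunsch recurrence). The abstract does not say which alignment tool or scoring scheme the authors used, so the match, mismatch, and gap values below are assumptions, and the short RNA strings are invented rather than taken from miRBase.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-1):
    """Best global alignment score of sequences a and b with a linear gap penalty."""
    # prev[j] holds the best score for aligning the processed prefix of a with b[:j].
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # aligning a[:i] against an empty prefix costs i gaps
        for j in range(1, len(b) + 1):
            diagonal = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diagonal, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    # Invented 5-nucleotide fragments differing at one position.
    print(global_alignment_score("UGAGG", "UGAGG"))
    print(global_alignment_score("UGAGG", "UGAUG"))
```

Only two rows of the score table are kept in memory, which is enough when the score alone is needed; recovering the aligned sequences themselves would require the full table or a traceback structure.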
Abstract: DNA cryptography, a new era of cryptography, improves on conventional cryptography in terms of time complexity as well as capacity. It uses DNA strands to hide information. The repeated sequences of DNA make it highly difficult for an unintended party to recover the message. The level of security can be increased by using more than one cover medium. This paper discusses a technique that uses two cover media to hide the message: one is DNA and the other is an image. The message is encrypted and converted to fake DNA, and this fake DNA is then hidden in the image. Keywords: DNA, cryptography, DNA cryptography
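The conversion of a message into a DNA-like strand can be sketched by mapping every two bits to one base. This covers only an encoding step, not the encryption or the image-hiding stage, and the mapping 00 to A, 01 to C, 10 to G, 11 to T is an assumption; the abstract does not state which mapping the authors use.

```python
BASES = "ACGT"  # assumed mapping: 00 -> A, 01 -> C, 10 -> G, 11 -> T


def to_dna(data: bytes) -> str:
    """Encode each byte as four bases, most significant bit pair first."""
    return "".join(BASES[(byte >> shift) & 0b11] for byte in data for shift in (6, 4, 2, 0))


def from_dna(strand: str) -> bytes:
    """Decode a strand produced by to_dna back into bytes."""
    if len(strand) % 4:
        raise ValueError("strand length must be a multiple of 4")
    out = bytearray()
    for i in range(0, len(strand), 4):
        value = 0
        for base in strand[i:i + 4]:
            value = (value << 2) | BASES.index(base)
        out.append(value)
    return bytes(out)


if __name__ == "__main__":
    strand = to_dna(b"A")   # 0x41 = 01 00 00 01
    print(strand)           # CAAC
    print(from_dna(strand))
```

In a scheme like the one described, the input to `to_dna` would be ciphertext rather than plain text, and the resulting strand would then be embedded in the image.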
Machine learning has many applications and opportunities in biology, though also faces challenges. It can be used for tasks like disease detection from medical images. Deep learning models like convolutional neural networks have achieved performance exceeding human experts in detecting pneumonia from chest X-rays. Frameworks like DeepChem apply deep learning to problems in drug discovery, while platforms like Open Targets integrate data on drug targets and their relationships to diseases. Overall, machine learning shows promise for advancing biological research, though developing expertise through learning resources and implementing models to solve real-world problems is important.
This document provides an overview of using soft computing techniques for DNA sequence classification. It discusses DNA and DNA sequencing. It then introduces common soft computing techniques used for classification, including neural networks, fuzzy logic, and genetic algorithms. The document proposes using these soft computing methods for DNA sequence classification and describes related studies. It outlines a methodology using neural networks and genetic algorithms and analyzes the advantages of soft computing for this application. In conclusion, it states that soft computing techniques are well-suited for DNA sequence classification problems.
A Novel Technique for Name Identification from Homeopathy Diagnosis Discussio...home
Named entities are the most informative element of a textual document and identification of the names is very much
important for extracting further information from text. We have developed a conditional random field based system to
identify the named entities from homeopathic diagnosis discussion forum text. We have manually annotated a training
corpus for the task. As manual creation of a sufficiently large annotated corpus is costly and time consuming, we use an
active learning based semi-supervised framework to increase the efficiency of the system with the help of un-annotated
data. Our system achieves the highest f-value of
The document discusses microarray classification using computational intelligent techniques. It describes how microarrays can be used to analyze gene expression levels and classify samples, such as diagnosing tumors. Intelligent techniques like neural networks, genetic algorithms, and fuzzy logic are well-suited for microarray classification due to their ability to handle large and noisy biological data. The objective is to implement these techniques to maximize the accuracy of microarray classifiers for important medical applications.
SEMI SUPERVISED BASED SPATIAL EM FRAMEWORK FOR MICROARRAY ANALYSISIRJET Journal
This document presents a semi-supervised spatial EM framework for microarray analysis to efficiently classify and predict diseases based on gene expression data. It uses a spatial EM algorithm to cluster gene expression data, followed by an SVM classifier to predict diseases and their severity levels. The proposed approach is evaluated based on classification accuracy, computation time, and ability to identify biologically significant genes. Experimental results on disease datasets show improved accuracy compared to other supervised and unsupervised methods. The authors conclude that using the same classifier for gene selection and classification enhances predictive performance, and future work will focus on partitioning genes into clusters correlated with sample categories to further improve accuracy.
BioinformaticsPurpose Bioinformatics is the combination of comp.docxrichardnorman90310
Bioinformatics
Purpose: Bioinformatics is the combination of computer science and biology. It uses various methods of storing and retrieving biological data, each with pros and cons. With it, scientists are able to discover new information on various diseases and their mutations, differentiate one organism from another by analyzing genetic data, study biological development, and help prevent various crimes. It also has disadvantages. The field develops algorithms that help measure sequence similarity.
1. Introduction: Bioinformatics is a field that includes molecular biology, statistics, computer science problems, and complex mathematical problems. It has two stages: deliberately gathering insights from biological data, and building a computational model. It is applied in precision and preventive medicine.
0. Background info on bioinformatics [Comment by R Daniel Creider: A, B, C and D are not a part of the introduction. The outline is not organized correctly.]
0. How to approach bioinformatics?
1. Goals of Bioinformatics
0. Development of efficient algorithms
0. Extension of experimental data by predictions
1. Advantages of bioinformatics
1. World is getting information on new discovery and crimes are prevented
1. Discover new information on various diseases
1. How organisms mutate
1. How it analyses data to differentiate one organism from another
1. Disadvantages of bioinformatics
2. Data manipulation, complexity, lack of well-trained manpower to use the software
2. Misuse of the information
0. Problems behind it
0. Genetic data is not properly analyzed
0. Importance of Bioinformatics
3. Genetic research
0. Genomics and proteomics
1. Solution of the problem
1. Use software wisely
1. Decrease its complexity
1. Future of the bioinformatics
2. Bioinformatics is the present and future of biotechnology
0. Use for research and exchange information for comparison, storage and analysis
BIOINFORMATICS: A Technical Report
Texas A&M University-Commerce
Bishow Kunwar [Comment by R Daniel Creider: Your name comes before the name of the University.]
Abstract
The main aim of bioinformatics is to improve the various methods of storing, retrieving, and organizing biological data by critically evaluating the data. Bioinformatics is proving effective in the field of genetics and genomics, particularly in text mining of biological development. Bioinformatics is an application that mixes two fields (software engineering and science). It is a field that includes different things like sub-atomic science, measurement issues, software engineering issues, and broad, complex arithmetic issues.
Keywords: Bioinformatics, Genetic, Genomic, Biological Development
Introduction:
Bioinformatics is the application which is the combination of two fields (computer science and biology). It is a field that involves multiple things like molecular .
A Reliable Password-based User Authentication Scheme for Web-based Human Geno...Thitichai Sripan
1) The document discusses a password-based user authentication scheme for a Web-based Human Genome Database system.
2) The system integrates DNA sequence alignment and data compression modules to help physicians examine genetic deficiencies and allows secure user access.
3) The proposed password-based scheme aims to address security issues with existing schemes by making the system more resistant to offline password guessing attacks.
SooryaKiran Bioinformatics is a global bioinformatics solutions provider that focuses on customized bioinformatics services and products. It develops algorithms and software for biological sequence analysis, structure prediction, and other areas. Key products include tools for sequence generation, analysis, and homology identification. The company collaborates with research institutions and has provided solutions for SNP analysis, genome analysis, and mitochondrial DNA analysis to clients around the world.
Delineation of techniques to implement on the enhanced proposed model using d...ijdms
In post genomic era with the advent of new technologies a huge amount of complex molecular data are
generated with high throughput. The management of this biological data is definitely a challenging task
due to complexity and heterogeneity of data for discovering new knowledge. Issues like managing noisy
and incomplete data are needed to be dealt with. Use of data mining in biological domain has made its
inventory success. Discovering new knowledge from the biological data is a major challenge in data
mining technique. The novelty of the proposed model is its combined use of intelligent techniques to classify
the protein sequence faster and efficiently. Use of FFT, fuzzy classifier, String weighted algorithm, gram
encoding method, neural network model and rough set classifier in a single model and in an appropriate
place can enhance the quality of the classification system. Thus the primary challenge is to identify and
classify the large protein sequences in a very fast and easy but intellectual way to decrease the time
complexity and space complexity.
1) The document describes a feature extraction program developed to analyze gene expression data from the Gene Expression Omnibus (GEO) database.
2) The program was tested on human transcription factor expression data sets and was able to successfully extract gene expression information.
3) Analyzing specific gene sets from GEO files had previously been a labor-intensive task, but the object-oriented program streamlines this process using C and Perl programming languages.
Efficiency of LSB steganography on medical information IJECEIAES
The development of the medical field has led to a shift from paper-based to digital communication. Medical information security has become a major concern as the field moves toward the digital world, with patient information, disease diagnoses, and similar records stored in digital images. It is therefore essential to secure patient information and the growing volume of communication among patients, clients, medical practitioners, and sponsors. The core aim of this research is to provide a complete picture of the research trends in the LSB steganography technique as applied to securing medical information such as text, image, audio, video, and graphics, and to discuss the efficiency of the LSB technique. The survey findings show that the LSB steganography technique is efficient in securing medical information from intruders.
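As a rough illustration of the least-significant-bit (LSB) idea surveyed above, the following minimal sketch hides a short byte string in the lowest bit of each value of a cover buffer. The function names `embed_lsb` and `extract_lsb`, the stand-in "pixel" buffer, and the sample message are assumptions made for this example; they are not taken from the surveyed paper, and a real system would work on decoded image data and typically encrypt the payload first.

```python
def embed_lsb(cover: bytes, message: bytes) -> bytes:
    """Hide the message bits in the least significant bit of each cover byte."""
    # Flatten the message into bits, most significant bit first.
    bits = [(byte >> shift) & 1 for byte in message for shift in range(7, -1, -1)]
    if len(bits) > len(cover):
        raise ValueError("cover is too small for the message")
    stego = bytearray(cover)
    for i, bit in enumerate(bits):
        # Clear the lowest bit of the cover byte, then set it to the message bit.
        stego[i] = (stego[i] & 0xFE) | bit
    return bytes(stego)


def extract_lsb(stego: bytes, n_bytes: int) -> bytes:
    """Rebuild n_bytes of hidden data from the lowest bit of each stego byte."""
    out = bytearray()
    for i in range(n_bytes):
        value = 0
        for bit_index in range(8):
            value = (value << 1) | (stego[i * 8 + bit_index] & 1)
        out.append(value)
    return bytes(out)


# Hypothetical cover: 64 stand-in pixel intensities, not real image data.
cover = bytes(range(200, 232)) * 2
stego = embed_lsb(cover, b"ID42")
recovered = extract_lsb(stego, 4)
```

Each cover value changes by at most 1, which is why LSB embedding is hard to notice visually; the same property also makes it fragile under lossy recompression.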
Computational Biology Methods for Drug Discovery_Phase 1-5_November 2015Mathew Varghese
This document describes a computational biology methods program launched by Thomson Reuters to implement network and pathway analysis approaches for analyzing omics data. The program provides tools through an R package to enable analysis of diverse data sets within biological networks. It focuses on selecting state-of-the-art algorithms and implementing them in a robust, easy-to-use software package. The program aims to maximize the utility of internal and external biological network and pathway databases for drug discovery applications such as target identification and patient stratification.
EG-CompBio presentation about Artificial Intelligence in Bioinformatics covering:
-AI (Types, Development)
-Deep Learning (Architecture)
-Bioinformatics Fields
-Input formats for AI
-AI Challenges in Biology
-Example: (Proteomics, Transcriptomics)
-Metagenomics: @ NU
-Taxonomic Classification
-Phenotype Classification
-How to begin in AI in Bioinformatics
Mastering RNA-Seq (NGS Data Analysis) - A Critical Approach To Transcriptomic...Elia Brodsky
This workshop will address critical issues related to Transcriptomics data:
Processing raw Next Generation Sequencing (NGS) data:
1. Next Generation Sequencing data preprocessing:
Trimming technical sequences
Removing PCR duplicates
2. RNA-seq based quantification of expression levels:
Conventional pipelines (looking at known transcripts)
Identification of novel isoforms
Analysis of Expression Data Using Machine Learning:
3. Unsupervised analysis of expression data:
Principal Component Analysis
Clustering
4. Supervised analysis:
Differential expression analysis
Classification, gene signature construction
5. Gene set enrichment analysis
The workshop will include hands-on exercises utilizing public domain datasets:
breast cancer cell lines transcriptomic profiles (https://genomebiology.biomedcentral.com/articles/10.1186/gb-2013-14-10-r110),
patient-derived xenograft (PDX) mouse model of tumor and stroma transcriptomic profiles (http://www.oncotarget.com/index.php?journal=oncotarget&page=article&op=view&path[]=8014&path[]=23533), and
processed data from The Cancer Genome Atlas samples (https://cancergenome.nih.gov/).
Team: The workshops are designed by the researchers at the Tauber Bioinformatics Research Center at University of Haifa, Israel in collaboration with academic centers across the US. Technical support for the workshops is provided by the Pine Biotech team. https://edu.t-bio.info/a-critical-approach-to-transcriptomic-data-analysis/
A Review of Various Methods Used in the Analysis of Functional Gene Expressio...ijitcs
High-throughput technologies, including sequencing and DNA microarrays, have made it possible to measure simultaneously the expression levels of millions of genes in a biological sample, and to annotate those genes and identify their roles (functions). Bioinformatics approaches have consequently been developed to better manage and organize this large amount of information. These approaches provide a representation and a more 'relevant' integration of the data in order to test and validate researchers' hypotheses. In this context, this article describes and discusses some techniques used for the functional analysis of gene expression data.
Protein sequence classification in data mining– a studyZac Darcy
Since computerized applications are used all around the world, a vast
amount of data is being collected. The important information hidden in this data is attracting researchers from multiple
disciplines to develop effective approaches for deriving the hidden knowledge within it.
Data mining may be considered to be the process of extracting or mining useful and valuable
knowledge from large amounts of data. There are various domains in data mining, such as text
mining, image mining, sequential pattern mining, web mining, etc. Among these, sequence mining is
one of the most important research areas; it helps find the sequential relationships in the
data. Sequence mining is applied in a wide range of application areas such as the analysis of customer
purchase patterns, web access patterns, weather observations, protein sequencing, DNA sequencing, etc. In
protein and DNA analysis, sequence mining techniques are used for sequence alignment, sequence
searching and sequence classification. In the area of protein sequence analysis, researchers are
showing interest in protein sequence classification, which has the ability to discover the
recurring structures that exist in protein sequences. This paper explains various techniques used by
different researchers in classifying proteins and also provides an overview of different protein sequence
classification methods.
This slide deck was collected from the seminar "Machine Learning for Data Mining", which was held at Daffodil International University. The chief guest was Dr. Dewan Md. Farid, who prepared these slides to explain data mining to the audience. He also shared his research experience in a talk that was engaging and full of surprises. He is a well-known researcher, and I hope you will enjoy the slides. Details about Dr. Dewan Md. Farid are given at the link below.
https://ai.vub.ac.be/members/dewan-md-farid
Front End Data Cleaning And Transformation In Standard Printed Form Using Neu...ijcsa
Manual front-end data collection and loading into a database can introduce errors into data sets and is very time consuming. Scanning a data document as an image and recognizing the corresponding information in that image can be considered a possible solution to this challenge. This paper presents an automated solution to the problem of data cleansing and recognition of user-written data, transforming it into standard printed format with the help of artificial neural networks. Three different neural models, namely direct, correlation based, and hierarchical, have been developed to handle this issue. The solution is developed for a very hostile input environment to justify the proposed logic.
Performance Evaluation of Different Data Mining Classification Algorithm and ...IOSR Journals
This document evaluates the performance of different data mining classification algorithms and predictive analysis. It applies algorithms like decision trees, naive Bayes, k-nearest neighbor, neural networks, and support vector machines to datasets like Iris, liver disorder, and E. coli. The results show that neural networks and k-nearest neighbor achieved the best performance on these datasets, with accuracy rates up to 97.33% for Iris classification. Feature selection techniques like removing zero-weighted attributes were also found to improve some algorithm performance. Predictive analysis experiments found that neural networks and k-nearest neighbor were most accurate at predicting new class labels.
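To make the k-nearest-neighbor step mentioned above concrete, here is a minimal sketch of majority-vote classification on two features. The function name `knn_predict`, the choice of k = 3, and the six training points are assumptions for illustration only: the numbers are made-up values in the style of petal measurements, not rows from the Iris dataset and not the evaluated paper's setup.

```python
from collections import Counter


def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points.

    `train` is a list of (features, label) pairs; distance is squared Euclidean.
    """
    nearest = sorted(
        train,
        key=lambda item: sum((a - b) ** 2 for a, b in zip(item[0], query)),
    )[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical (petal length, petal width) samples, invented for this sketch.
train = [
    ((1.4, 0.2), "setosa"),
    ((1.3, 0.2), "setosa"),
    ((4.7, 1.4), "versicolor"),
    ((4.5, 1.5), "versicolor"),
    ((6.0, 2.5), "virginica"),
    ((5.9, 2.1), "virginica"),
]

small_flower = knn_predict(train, (1.5, 0.3))
medium_flower = knn_predict(train, (4.6, 1.4))
```

With only two training points per class, k = 3 always pulls in one point from another class, so the vote is decided by the two closest neighbors; larger, real datasets would usually call for tuning k.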
ANALYSIS OF PROTEIN MICROARRAY DATA USING DATA MININGijbbjournal
Recent progress in biology, medical science, bioinformatics, and biotechnology has produced
tremendous amounts of biodata that demand in-depth analysis. On the other hand, recent progress in data
mining research has led to the development of numerous efficient and scalable methods for mining
interesting patterns in large databases. This paper bridges the two fields, data mining and bioinformatics,
for successful mining of biological data. Microarrays constitute a new platform which allows the discovery
and characterization of proteins.
Analytical Study of Hexapod miRNAs using Phylogenetic Methodscscpconf
MicroRNAs (miRNAs) are a class of non-coding RNAs that regulate gene expression.
Identification of total number of miRNAs even in completely sequenced organisms is still an
open problem. However, researchers have been using techniques that can predict limited
number of miRNAs in an organism. In this paper, we have used a homology-based approach for
comparative analysis of miRNAs of the Hexapoda group. We have used Apis mellifera, Bombyx
mori, Anopheles gambiae and Drosophila melanogaster miRNA datasets from the miRBase
repository. We have done pair wise as well as multiple alignments for the available miRNAs in
the repository to identify and analyse conserved regions among related species. Unfortunately,
to the best of our knowledge, miRNA related literature does not provide in depth analysis of
hexapods. We have made an attempt to derive the commonality among the miRNAs and to
identify the conserved regions which are still not available in miRNA repositories. The results
are a good approximation with a small number of mismatches. They are encouraging and may facilitate miRNA biogenesis for hexapods.
Abstract: DNA cryptography, the new era of cryptography, enhances cryptography in terms of both time complexity and capacity. It uses DNA strands to hide information. The repeated sequences of DNA make it highly difficult for an unintended party to recover the message. The level of security can be increased by using more than one cover medium. This paper discusses a technique that uses two cover media to hide the message: one is DNA and the other is an image. The message is encrypted and converted to fake DNA, and this fake DNA is then hidden in the image. Keywords: DNA, cryptography, DNA cryptography
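The "converted to DNA" step can be pictured with a minimal sketch that maps every two bits of a byte string to one nucleotide. The mapping 00→A, 01→C, 10→G, 11→T and the names `to_dna` and `from_dna` are assumptions chosen for illustration; the abstract does not state which encoding the paper uses, and the encryption and image-hiding stages are left out here.

```python
BASES = "ACGT"  # assumed mapping: 00 -> A, 01 -> C, 10 -> G, 11 -> T


def to_dna(data: bytes) -> str:
    """Encode each byte as four nucleotides, two bits per base, high bits first."""
    return "".join(
        BASES[(byte >> shift) & 0b11] for byte in data for shift in (6, 4, 2, 0)
    )


def from_dna(strand: str) -> bytes:
    """Decode a strand produced by to_dna back into bytes."""
    out = bytearray()
    for i in range(0, len(strand), 4):
        value = 0
        for base in strand[i:i + 4]:
            value = (value << 2) | BASES.index(base)
        out.append(value)
    return bytes(out)


strand = to_dna(b"Hi")          # eight bases for two bytes
round_trip = from_dna(strand)   # recovers the original bytes
```

Because four bases carry one byte, the strand is four times as long as the message in symbols; in a scheme like the one described, this strand would be produced from the ciphertext rather than from the plain message.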
Machine learning has many applications and opportunities in biology, though also faces challenges. It can be used for tasks like disease detection from medical images. Deep learning models like convolutional neural networks have achieved performance exceeding human experts in detecting pneumonia from chest X-rays. Frameworks like DeepChem apply deep learning to problems in drug discovery, while platforms like Open Targets integrate data on drug targets and their relationships to diseases. Overall, machine learning shows promise for advancing biological research, though developing expertise through learning resources and implementing models to solve real-world problems is important.
This document provides an overview of using soft computing techniques for DNA sequence classification. It discusses DNA and DNA sequencing. It then introduces common soft computing techniques used for classification, including neural networks, fuzzy logic, and genetic algorithms. The document proposes using these soft computing methods for DNA sequence classification and describes related studies. It outlines a methodology using neural networks and genetic algorithms and analyzes the advantages of soft computing for this application. In conclusion, it states that soft computing techniques are well-suited for DNA sequence classification problems.
A Novel Technique for Name Identification from Homeopathy Diagnosis Discussio...home
Named entities are the most informative element of a textual document and identification of the names is very much
important for extracting further information from text. We have developed a conditional random field based system to
identify the named entities from homeopathic diagnosis discussion forum text. We have manually annotated a training
corpus for the task. As manual creation of a sufficiently large annotated corpus is costly and time consuming, we use an
active learning based semi-supervised framework to increase the efficiency of the system with the help of un-annotated
data. Our system achieves the highest f-value of
The document discusses microarray classification using computational intelligent techniques. It describes how microarrays can be used to analyze gene expression levels and classify samples, such as diagnosing tumors. Intelligent techniques like neural networks, genetic algorithms, and fuzzy logic are well-suited for microarray classification due to their ability to handle large and noisy biological data. The objective is to implement these techniques to maximize the accuracy of microarray classifiers for important medical applications.
SEMI SUPERVISED BASED SPATIAL EM FRAMEWORK FOR MICROARRAY ANALYSISIRJET Journal
This document presents a semi-supervised spatial EM framework for microarray analysis to efficiently classify and predict diseases based on gene expression data. It uses a spatial EM algorithm to cluster gene expression data, followed by an SVM classifier to predict diseases and their severity levels. The proposed approach is evaluated based on classification accuracy, computation time, and ability to identify biologically significant genes. Experimental results on disease datasets show improved accuracy compared to other supervised and unsupervised methods. The authors conclude that using the same classifier for gene selection and classification enhances predictive performance, and future work will focus on partitioning genes into clusters correlated with sample categories to further improve accuracy.
BioinformaticsPurpose Bioinformatics is the combination of comp.docxrichardnorman90310
Bioinformatics
Purpose: Bioinformatics combines computer science and biology and uses various methods, each with pros and cons, for storing and retrieving biological data. It lets scientists discover new information about diseases and their mutations, helps differentiate one organism from another by analyzing genetic data and biological development, can help prevent crimes, and develops algorithms that measure sequence similarity.
1. Introduction: Bioinformatics is a field that includes molecular biology, statistics, computer science problems, and complex mathematical problems. It has two stages: deliberately gathering insights from biological data, and building a computational model. It is applied in precision and preventive medicine.
0. Background information on bioinformatics
0. How to approach bioinformatics?
1. Goals of Bioinformatics
0. Development of efficient algorithms
0. Extension of experimental data by predictions
1. Advantages of bioinformatics
1. World is getting information on new discovery and crimes are prevented
1. Discover new information on various diseases
1. How organisms mutate
1. How it analyses data to differentiate one organism from another
1. Disadvantages of bioinformatics
2. Data manipulation, complexity, lack of well-trained manpower to use the software
2. Misuse of the information
0. Problems behind it
0. Genetic data lacks proper analysis
0. Importance of Bioinformatics
3. Genetic research
0. Genomics and proteomics
1. Solution of the problem
1. Use software wisely
1. Decrease its complexity
1. Future of the bioinformatics
2. Bioinformatics is the present and future of biotechnology
0. Use for research and exchange information for comparison, storage and analysis
BIOINFORMATICS: A Technical Report
Texas A&M University-Commerce
Bishow Kunwar
Abstract
The main aim of bioinformatics is to improve the methods for storing, retrieving, and organizing biological data by critically evaluating that data. Bioinformatics is proving effective in genetics and genomics, particularly in text mining related to biological development. It is an application that combines two fields (computer science and biology), and it draws on molecular biology, statistics, computer science problems, and complex mathematics.
Keywords: Bioinformatics, Genetic, Genomic, Biological Development
Introduction:
Bioinformatics is the application which is the combination of two fields (computer science and biology). It is a field that involves multiple things like molecular .
Genetic prediction using Machine Learning Techniques .pptxHabtamuAyenew4
This document discusses using machine learning techniques for genetic prediction. It begins with an introduction to genes and genetics. It then discusses how biotechnology and information technology are integrating, with artificial intelligence finding applications in genetic prediction. Machine learning and different types of machine learning like supervised, unsupervised, and deep learning are explained. Application areas of genetic prediction using machine learning are discussed, along with current trends and challenges. Future work with deep learning algorithms for genomic tasks is mentioned. The document concludes that artificial intelligence is outperforming other methods in clinical diagnostics and more research combining different data sources is still needed.
The classification of different types of tumors is of great importance in cancer diagnosis and its drug discovery. Cancer classification via gene expression data is known to contain the keys for solving the fundamental problems relating to the diagnosis of cancer. The recent advent of DNA microarray technology has made rapid monitoring of thousands of gene expressions possible. With this large quantity of gene expression data, scientists have started to explore the opportunities of classification of cancer using a gene expression dataset. To gain a profound understanding of the classification of cancer, it is necessary to take a closer look at the problem, the proposed solutions, and the related issues altogether. In this research thesis, I present a new way for Leukemia classification using the latest AI technique of Deep learning using Google TensorFlow on gene expression data.
Introduction, history, scope, and applications of bioinformatics; relation to other fields; biological databases; computers and the internet; sequence development; and an introduction to sequence development and alignment.
Welcome to International Journal of Engineering Research and Development (IJERD)IJERD Editor
This document summarizes a research paper that uses genetic algorithms for data mining of mass spectrometry data. The paper applies genetic algorithms to the data mining step of the KDD (Knowledge Discovery in Databases) process. Specifically, it uses genetic algorithms to extract optimal features or peaks from mass spectrometry data that can distinguish cancer patients from control patients. The genetic algorithm searches for discriminative features, which are ion intensity levels at specific mass-charge values. Over generations of the genetic algorithm, the paper finds significant patterns of masses that can differentiate between normal and cancer patients.
Clustering Approaches for Evaluation and Analysis on Formal Gene Expression C...rahulmonikasharma
The enormous generation of biological data and the need to analyze it led to the emergence of the field of bioinformatics. Data mining is used to derive and analyze the data by exploring hidden patterns in it. Although data mining can be applied to biological data such as genomic and proteomic data, gene expression (GE) data is considered here for evaluation. GE data is generated from microarrays such as DNA and oligo microarrays, and the generated data is analyzed through the clustering techniques of data mining. This study implements the basic clustering approach, K-Means, along with other clustering approaches such as hierarchical, SOM, CLICK, and a basic fuzzy-based clustering approach. It then compares these approaches to identify the most effective one for cluster analysis of GE data. The experimental results show that the proposed algorithm achieves higher clustering accuracy and takes less clustering time than existing algorithms.
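The basic K-Means approach named above can be sketched in a few lines: assign each expression profile to its nearest centroid, then move each centroid to the mean of its members, and repeat. Everything specific below is an assumption for illustration: the function name `kmeans`, the naive "first k points" initialization, the fixed iteration count, and the four tiny expression profiles, none of which come from the study.

```python
def kmeans(points, k, iterations=20):
    """Plain k-means on lists of floats; returns (labels, centroids)."""
    # Naive initialization (an assumption of this sketch): the first k points.
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [
            min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
            for p in points
        ]
        # Update step: move each centroid to the mean of its assigned points.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical expression profiles (one row per gene, one column per condition).
profiles = [
    [0.1, 0.2, 0.1],
    [5.0, 5.2, 4.9],
    [0.2, 0.1, 0.3],
    [5.1, 4.8, 5.0],
]
labels, centroids = kmeans(profiles, k=2)
```

On this toy input the low-expression and high-expression genes separate into two clusters. Real gene expression data would normally be normalized first, and the result of k-means depends on initialization, which is one reason comparative studies also look at hierarchical, SOM, and fuzzy methods.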
An Overview on Gene Expression AnalysisIOSR Journals
This document provides an overview of gene expression analysis techniques. It discusses how DNA microarray technology allows measuring the expression of thousands of genes in parallel under different conditions. This helps with disease diagnosis, drug discovery, and other biomedical research. The document then summarizes several data mining and machine learning techniques that are used to analyze gene expression data, including clustering, classification using artificial neural networks, nearest centroid classification, and decision trees. It notes that these techniques help with cancer classification, identifying subtypes, and predicting patient outcomes.
Supervised Multi Attribute Gene Manipulation For Cancerpaperpublications3
Abstract: Data mining, the extraction of hidden predictive information from large databases, is a powerful new technology with great potential to help companies focus on the most important information in their data warehouses. Data mining tools predict future trends and behaviours, allowing businesses to make proactive, knowledge-driven decisions. The automated, prospective analyses offered by data mining move beyond the analyses of past events provided by retrospective tools typical of decision support systems.
They scour databases for hidden patterns, finding predictive information that experts may miss because it lies outside their expectations. Data mining techniques are the result of a long process of research and product development. This evolution began when business data was first stored on computers, continued with improvements in data access, and more recently, generated technologies that allow users to navigate through their data in real time. Data mining takes this evolutionary process beyond retrospective data access and navigation to prospective and proactive information delivery.
This document discusses the field of bioinformatics. It begins by defining bioinformatics as the combination of biology, computer science, and information technology, and explains that it involves applying computational techniques to understand biological data. It distinguishes bioinformatics from computational biology. The document then outlines what tasks bioinformatics can perform, describes the components and levels of organization in bioinformatics, and discusses the main branches of genomics, proteomics, and transcriptomics.
Data mining involves using machine learning and statistical methods to discover patterns in large datasets and is useful in bioinformatics for analyzing biological data. Bioinformatics analyzes data from sequences, molecules, gene expressions, and pathways. Data mining can help understand these rapidly growing biological datasets. Common data mining tools in bioinformatics include BLAST for sequence comparisons, Entrez for integrated database searching, and ORF Finder for identifying open reading frames. Data mining approaches are well-suited to the enormous volumes of data in bioinformatics databases.
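To give a feel for what identifying open reading frames involves, the sketch below scans the three forward reading frames of a DNA string for a start codon (ATG) followed in-frame by a stop codon. This is a simplified illustration and not the algorithm of the ORF Finder tool itself: the function name `find_orfs`, the `min_codons` threshold, the forward-strand-only scan, and the sample sequence are all assumptions of this example.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna: str, min_codons: int = 2):
    """Return (start, end, sequence) for ORFs on the forward strand.

    `start` is the index of the A in ATG, `end` is one past the stop codon,
    and `min_codons` counts the codons before the stop codon.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # open a candidate reading frame
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, dna[start:i + 3]))
                start = None  # close the frame and keep scanning
    return orfs


# Hypothetical sequence, invented for this sketch.
orfs = find_orfs("GGATGAAACCCTAGTT")
```

A fuller implementation would also scan the reverse complement and let the user choose a genetic code and a minimum ORF length.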
This document discusses the origin, history, and applications of bioinformatics. It begins by defining bioinformatics as the use of computation to extract knowledge from biological data through collection, storage, manipulation and modeling of data for analysis and prediction. The history section notes that the term was coined in the 1970s but applications expanded in the 1990s with increased molecular biology data. It then outlines several key applications of bioinformatics like functional genomics, structural genomics, comparative genomics, and medical informatics.
PROTEIN SEQUENCE CLASSIFICATION IN DATA MINING– A STUDYZac Darcy
This document summarizes research on protein sequence classification techniques in data mining. It discusses how protein sequences can be classified based on their amino acid sequences to determine evolutionary relationships and biological functions. Various classification methods are described that extract features from protein sequences and apply techniques like neural networks, rough set theory, and feature hashing to classify sequences into families. Challenges in classifying large amounts of biological data are also discussed. Effective classification of protein sequences is important for tasks like predicting properties of new proteins and understanding protein evolution.
Protein Sequence Classification In Data Mining - A StudyZac Darcy
Since the computerized applications are used all around the world, there occurs the collection of a vast
amount of data. The important information hidden in vast data is attracting the researchers of multiple
disciplines to make study in developing effective approaches to derive the hidden knowledge within them.
Data mining may be considered to be the process of extracting or mining the useful and valuable
knowledge from large amounts of data. There are various domains in data mining, such as text
mining, image mining, sequential pattern mining, web mining, etc. Among these, sequence mining is
one of the most important research areas; it helps find the sequential relationships in the
data. Sequence mining is applied in wide range of application areas such as the analysis of customer
purchase patterns, web access patterns, weather observations, protein sequencing, DNA sequencing, etc. In
protein and DNA analysis, sequence mining techniques are used for sequence alignment, sequence
searching and sequence classification. In the area of protein sequence analysis, the researchers are
showing their interest in the field of protein sequence classification. It has the ability to discover the
recurring structures that exist in the protein sequences. This paper explains various techniques used by
different researchers in classifying the proteins and also provides an overview of different protein sequence
classification methods.
This document provides a summary of a research article that presents a comprehensive study on outlier detection in data mining. It begins with an abstract that outlines the paper's focus on delivering a survey of the literature on outliers and various approaches for their detection. Outliers are defined as data points that diverge partially or totally from the rest of the data set. The document then provides more details on outliers, data mining techniques, and knowledge discovery in data mining. It describes outliers as data objects that cannot be fitted into any cluster and can differ from neighboring data points or the complete data set. The paper aims to present a literature survey on outliers in different types of data sets and methodologies for their detection.
The Survey of Data Mining Applications And Feature Scope IJCSEIT Journal
In this paper we have focused a variety of techniques, approaches and different areas of the research which
are helpful and marked as the important field of data mining Technologies. As we are aware that many MNC’s
and large organizations are operated in different places of the different countries. Each place of operation
may generate large volumes of data. Corporate decision makers require access from all such sources and
take strategic decisions .The data warehouse is used in the significant business value by improving the
effectiveness of managerial decision-making. In an uncertain and highly competitive business
environment, the value of strategic information systems such as these are easily recognized however in
today’s business environment, efficiency or speed is not the only key for competitiveness. This type of huge
amount of data’s are available in the form of tera- to peta-bytes which has drastically changed in the areas
of science and engineering. To analyze, manage and make a decision of such type of huge amount of data
we need techniques called the data mining which will transforming in many fields. This paper imparts more
number of applications of the data mining and also o focuses scope of the data mining which will helpful in
the further research.
Similar to MICROARRAY GENE EXPRESSION ANALYSIS USING TYPE 2 FUZZY LOGIC(MGA-FL) (20)
Accident detection system project report.pdfKamal Acharya
The Rapid growth of technology and infrastructure has made our lives easier. The
advent of technology has also increased the traffic hazards and the road accidents take place
frequently which causes huge loss of life and property because of the poor emergency facilities.
Many lives could have been saved if emergency service could get accident information and
reach in time. Our project will provide an optimum solution to this draw back. A piezo electric
sensor can be used as a crash or rollover detector of the vehicle during and after a crash. With
signals from a piezo electric sensor, a severe accident can be recognized. According to this
project when a vehicle meets with an accident immediately piezo electric sensor will detect the
signal or if a car rolls over. Then with the help of GSM module and GPS module, the location
will be sent to the emergency contact. Then after conforming the location necessary action will
be taken. If the person meets with a small accident or if there is no serious threat to anyone’s
life, then the alert message can be terminated by the driver by a switch provided in order to
avoid wasting the valuable time of the medical rescue team.
Home security is of paramount importance in today's world, where we rely more on technology, home
security is crucial. Using technology to make homes safer and easier to control from anywhere is
important. Home security is important for the occupant’s safety. In this paper, we came up with a low cost,
AI based model home security system. The system has a user-friendly interface, allowing users to start
model training and face detection with simple keyboard commands. Our goal is to introduce an innovative
home security system using facial recognition technology. Unlike traditional systems, this system trains
and saves images of friends and family members. The system scans this folder to recognize familiar faces
and provides real-time monitoring. If an unfamiliar face is detected, it promptly sends an email alert,
ensuring a proactive response to potential security threats.
Digital Twins Computer Networking Paper Presentation.pptxaryanpankaj78
A Digital Twin in computer networking is a virtual representation of a physical network, used to simulate, analyze, and optimize network performance and reliability. It leverages real-time data to enhance network management, predict issues, and improve decision-making processes.
Generative AI Use cases applications solutions and implementation.pdfmahaffeycheryld
Generative AI solutions encompass a range of capabilities from content creation to complex problem-solving across industries. Implementing generative AI involves identifying specific business needs, developing tailored AI models using techniques like GANs and VAEs, and integrating these models into existing workflows. Data quality and continuous model refinement are crucial for effective implementation. Businesses must also consider ethical implications and ensure transparency in AI decision-making. Generative AI's implementation aims to enhance efficiency, creativity, and innovation by leveraging autonomous generation and sophisticated learning algorithms to meet diverse business challenges.
https://www.leewayhertz.com/generative-ai-use-cases-and-applications/
Height and depth gauge linear metrology.pdfq30122000
Height gauges may also be used to measure the height of an object by using the underside of the scriber as the datum. The datum may be permanently fixed or the height gauge may have provision to adjust the scale, this is done by sliding the scale vertically along the body of the height gauge by turning a fine feed screw at the top of the gauge; then with the scriber set to the same level as the base, the scale can be matched to it. This adjustment allows different scribers or probes to be used, as well as adjusting for any errors in a damaged or resharpened probe.
Supermarket Management System Project Report.pdfKamal Acharya
Supermarket management is a stand-alone J2EE using Eclipse Juno program.
This project contains all the necessary required information about maintaining
the supermarket billing system.
The core idea of this project to minimize the paper work and centralize the
data. Here all the communication is taken in secure manner. That is, in this
application the information will be stored in client itself. For further security the
data base is stored in the back-end oracle and so no intruders can access it.
Mechatronics is a multidisciplinary field that refers to the skill sets needed in the contemporary, advanced automated manufacturing industry. At the intersection of mechanics, electronics, and computing, mechatronics specialists create simpler, smarter systems. Mechatronics is an essential foundation for the expected growth in automation and manufacturing.
Mechatronics deals with robotics, control systems, and electro-mechanical systems.
Tools & Techniques for Commissioning and Maintaining PV Systems W-Animations ...Transcat
Join us for this solutions-based webinar on the tools and techniques for commissioning and maintaining PV Systems. In this session, we'll review the process of building and maintaining a solar array, starting with installation and commissioning, then reviewing operations and maintenance of the system. This course will review insulation resistance testing, I-V curve testing, earth-bond continuity, ground resistance testing, performance tests, visual inspections, ground and arc fault testing procedures, and power quality analysis.
Fluke Solar Application Specialist Will White is presenting on this engaging topic:
Will has worked in the renewable energy industry since 2005, first as an installer for a small east coast solar integrator before adding sales, design, and project management to his skillset. In 2022, Will joined Fluke as a solar application specialist, where he supports their renewable energy testing equipment like IV-curve tracers, electrical meters, and thermal imaging cameras. Experienced in wind power, solar thermal, energy storage, and all scales of PV, Will has primarily focused on residential and small commercial systems. He is passionate about implementing high-quality, code-compliant installation techniques.
Discover the latest insights on Data Driven Maintenance with our comprehensive webinar presentation. Learn about traditional maintenance challenges, the right approach to utilizing data, and the benefits of adopting a Data Driven Maintenance strategy. Explore real-world examples, industry best practices, and innovative solutions like FMECA and the D3M model. This presentation, led by expert Jules Oudmans, is essential for asset owners looking to optimize their maintenance processes and leverage digital technologies for improved efficiency and performance. Download now to stay ahead in the evolving maintenance landscape.
Blood finder application project report (1).pdfKamal Acharya
Blood Finder is an emergency time app where a user can search for the blood banks as
well as the registered blood donors around Mumbai. This application also provide an
opportunity for the user of this application to become a registered donor for this user have
to enroll for the donor request from the application itself. If the admin wish to make user
a registered donor, with some of the formalities with the organization it can be done.
Specialization of this application is that the user will not have to register on sign-in for
searching the blood banks and blood donors it can be just done by installing the
application to the mobile.
The purpose of making this application is to save the user’s time for searching blood of
needed blood group during the time of the emergency.
This is an android application developed in Java and XML with the connectivity of
SQLite database. This application will provide most of basic functionality required for an
emergency time application. All the details of Blood banks and Blood donors are stored
in the database i.e. SQLite.
This application allowed the user to get all the information regarding blood banks and
blood donors such as Name, Number, Address, Blood Group, rather than searching it on
the different websites and wasting the precious time. This application is effective and
user friendly.
MICROARRAY GENE EXPRESSION ANALYSIS USING TYPE 2 FUZZY LOGIC(MGA-FL)
1. International Journal of Computer Science, Engineering and Applications (IJCSEA) Vol.2, No.2, April 2012
DOI : 10.5121/ijcsea.2012.2205 53
MICROARRAY GENE EXPRESSION
ANALYSIS USING TYPE 2 FUZZY LOGIC
(MGA-FL)
V. Bhuvaneswari¹ and S. J. Brintha²
¹ Assistant Professor, Department of Computer Application, Bharathiar University, Coimbatore, India
bhuvanes_v@yahoo.com
² M.Phil Research Scholar, Department of Computer Application, Bharathiar University, Coimbatore, India
brinthasj@gmail.com
Abstract
Data mining is the process of extracting knowledge from large databases. It is an
interdisciplinary field that brings together techniques from machine learning, pattern
recognition, statistics, databases, and visualization to address the problem of information extraction from
large databases. Bioinformatics is the science of organizing and analyzing biological data.
Microarray technology helps biologists monitor the expression of thousands of genes in a single
experiment on a small chip. A microarray, also called a DNA chip, gene chip, or biochip, is used to analyze
gene expression profiles. Fuzzy logic is a multivalued logic that allows intermediate
values to be defined between conventional evaluations such as true or false, yes or no, and high or low. In this
paper, a type 2 fuzzy logic approach is applied to microarray gene expression data to convert the numerical
values into fuzzy terms. After fuzzification, the fuzzy association patterns are discovered. A framework is
proposed to cluster microarray gene data based on fuzzy association patterns. The proposed type 2
fuzzy approach is then compared with traditional clustering algorithms.
Keywords
Bioinformatics, Microarray, Fuzzy Logic, Association, Clustering.
1. INTRODUCTION
Data mining is the non-trivial process of identifying valid, novel, potentially useful, and
ultimately understandable patterns in data [2]. Data Mining is an essential step in Knowledge
Discovery in Database (KDD) process and it is defined as a process of information extraction.
Data Mining is concerned with the algorithmic means by which patterns or structures are
enumerated from the data under computational efficiency limitations.
Data mining can be performed on data represented in quantitative, textual, or multimedia forms.
With the widespread use of databases and the explosive growth in their sizes, organizations are faced
with the problem of information overload [3]. The effective utilization of these massive volumes
of data is becoming a major problem for all enterprises. Data mining supports automatic
exploration of data and evaluates the patterns.
Data mining is a set of processes for analyzing and discovering useful, actionable
knowledge buried in large volumes of data. This knowledge discovery involves
finding patterns or behaviors within the data that lead to some profitable business action. The
main aim of data mining is to discover hidden facts in databases. The two primary goals of data
mining are prediction and description. Prediction involves using some variables or fields in the
database to predict unknown or future values of other variables of interest. Description focuses
on finding human-interpretable patterns that describe the data.
Bioinformatics is defined as the application of computer technology to the management of
biological information. Bioinformatics is the science of storing, extracting, organizing, analyzing,
interpreting and utilizing information from biological sequences and molecules. The main goal of
bioinformatics is to enhance the understanding of biological processes [9]. Bioinformatics derives
knowledge from computer analysis of biological data. It is the interdisciplinary field of
developing and utilizing computer databases and algorithms to accelerate and enhance biological
research. Bioinformatics also deals with algorithms, databases and information systems, web
technologies, artificial intelligence and soft computing, information and computation theory,
software engineering, data mining, image processing, etc.
Bioinformatics and data mining together provide exciting and challenging research problems in computer
science. Advances such as genome-sequencing initiatives, microarrays, proteomics, and
functional and structural genomics have pushed the frontiers of human knowledge. Data mining
provides the necessary tools for better understanding of gene expression, drug design, and other
emerging problems in genomics and proteomics. Knowledge Discovery in Database (KDD)
techniques play an important role in the analysis and discovery of sequence, structure and
functional patterns or models from large sequence databases. Data mining approaches are ideally
suited to bioinformatics, since the field is rich in data.
Data are collected from genome analysis, protein analysis, microarray data and probes of gene
function by genetic methods. Maintaining such enormous volumes of data is a tedious process for most
laboratories. The data are mostly kept in spreadsheets for ease of use and are then arranged in an
appropriate database format. A database format is important in applied areas such as medical,
agricultural and pharmaceutical research.
Gene expression is the conversion of the genetic information that is present in a DNA sequence
into a unit of biological function in a living cell. It involves two processes: transcription and
translation. The process of converting a gene into RNA is called transcription.
Transcription is followed by translation of the RNA into protein. There are many
techniques for determining and quantifying gene expression, and many of these techniques
have substantial statistical components. The process of measuring gene expression via
cDNA is called gene expression analysis or gene expression profiling. Gene expression analysis
is used to determine whether a particular gene is expressed or not. In gene expression analysis,
the expression levels of thousands of genes are monitored simultaneously to study the effects of
certain treatments, diseases, and developmental stages on gene expression.
A microarray, also called a DNA chip, is used for analyzing gene expression data. DNA
microarrays are becoming a fundamental tool in genomic research. Microarray methods were
initially developed to study differential gene expression using complex populations of RNA [10].
A microarray is a large collection of spots containing material such as
DNA, protein, or tissue, arranged on an array for easy simultaneous analysis. Each spot of a
microarray contains a unique DNA sequence, protein, or tissue. Microarray analysis is a
technique that is used to determine whether genes are expressed or not and it also supports the
discovery of drug-sensitive genes and the chemical substructures associated with specific genetic
responses.
Microarray analysis is a method for profiling gene and protein expression in cells and tissues. As a new
technology, microarrays have great potential to provide genome-wide patterns of gene expression,
to make accurate medical diagnoses, and to explore the genetic causes underlying diseases. Scientists
use DNA microarrays to measure the expression levels of large numbers of genes
simultaneously or to genotype multiple regions of a genome. Each DNA spot contains picomoles
(10^-12 moles) of a specific DNA sequence. These sequences are known as probes (or reporters).
The probes consist of a short section of a gene or other DNA element and are used to hybridize a
cDNA or cRNA sample called the target. Several microarray gene expression datasets are publicly
available on the Internet. These datasets include a large number of gene expression values and
there is a need to have an accurate method to extract knowledge and useful information from
these microarray gene expression datasets.
Figure 1. Microarray Gene Expression Analysis
Fuzzy logic is a multivalued logic that differs from "crisp logic", in which binary sets have
two-valued logic. Fuzzy logic variables have a truth value that ranges in degree between 0 and 1. Fuzzy
logic is a superset of conventional Boolean logic that has been extended to handle the concept of
partial truth.
A membership function (MF) is a curve that defines how each point in the input space is mapped
to a membership value (or degree of membership) between 0 and 1. Fuzzy Logic consists of Type
1 and Type 2 fuzzy. A type-2 fuzzy set contains the grades of membership that are themselves
fuzzy. Type-1 Fuzzy Logic is unable to handle rule uncertainties. Type-2 Fuzzy Logic can handle
rule uncertainties effectively and efficiently [8]. A Type-2 membership grade can be any subset in
the primary membership. For each primary membership there exists a secondary membership that
defines the possibilities for the primary membership.
In Type-1 fuzzy logic the membership grades are constant (crisp) values. Type-2 fuzzy logic is an extension of Type-1
fuzzy logic in which the fuzzy sets are derived from existing Type-1 fuzzy sets. Type-2 fuzzy logic systems are
again characterized by IF–THEN rules [14]. Type-2 fuzzy logic is computationally intensive because
type reduction is very intensive. Type-2 fuzzy is used for modeling uncertainty and imprecision in
a better way. Type-2 fuzzy sets are called “fuzzy fuzzy” sets because the degree of
membership, which results from Type-1 fuzzy, is itself fuzzy [7]. The gene expression levels are
qualitatively classified into low, medium and high states to a varying degree based on a set of
membership functions.
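The idea of a membership grade that is itself uncertain can be illustrated with an interval Type-2 set, in which each expression value receives a lower and an upper membership grade. The following Python sketch is only an illustration under stated assumptions: the triangular shapes, the breakpoints in TERMS, the spread parameter, and the midpoint rule used to pick a term are all hypothetical choices, not the membership functions used in this paper, and the midpoint rule is a simplification rather than a full type-reduction step.

```python
from typing import Dict, Tuple

# Hypothetical breakpoints (left foot, peak, right foot) for values scaled to [0, 1].
TERMS: Dict[str, Tuple[float, float, float]] = {
    "L": (-0.5, 0.0, 0.5),   # low
    "M": (0.0, 0.5, 1.0),    # medium
    "H": (0.5, 1.0, 1.5),    # high
}


def triangular(x: float, a: float, b: float, c: float) -> float:
    """Type-1 triangular membership with feet a and c and peak b."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


def interval_type2(x: float, a: float, b: float, c: float,
                   spread: float) -> Tuple[float, float]:
    """Return (lower, upper) membership grades for x.

    The upper function widens the triangle's feet by `spread` and the lower
    function narrows them, so the gap between the two models the uncertainty
    about the membership grade itself. Assumes a + spread < b < c - spread.
    """
    upper = triangular(x, a - spread, b, c + spread)
    lower = triangular(x, a + spread, b, c - spread)
    return lower, upper


def fuzzify(x: float, spread: float = 0.1) -> Dict[str, Tuple[float, float]]:
    """Map a numeric expression value to an interval grade per linguistic term."""
    return {term: interval_type2(x, *abc, spread) for term, abc in TERMS.items()}


def linguistic_term(x: float, spread: float = 0.1) -> str:
    """Pick the term whose interval midpoint is largest (a simplification)."""
    grades = fuzzify(x, spread)
    return max(grades, key=lambda term: sum(grades[term]) / 2)


if __name__ == "__main__":
    for value in (0.1, 0.5, 0.95):
        print(value, fuzzify(value), linguistic_term(value))
```

A larger spread widens the band between the lower and upper grades, which is one simple way to express more uncertainty about where the boundaries between low, medium and high lie.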
Gene subset selection is important for both classifying and analyzing microarray gene
expression data. It is a very difficult process because gene expression data contain redundant
information and noise. To solve this problem, fuzzy logic based pre-processing approaches are
used. Fuzzy inference rules are used to transform the gene expression levels of a given dataset
into fuzzy values. Associations (similarity relations) are then applied to these fuzzy values to
define fuzzy association patterns. Each fuzzy equivalence group and association pattern contains
strongly similar genes.
The presence of duplicate information in microarray data makes the classification and clustering tasks
more difficult. A fuzzy logic based approach is used for eliminating the redundancy of
information in microarray data. This technique is easy to understand and can be used for
biological interpretation. The aim of this paper is to propose a data mining technique that applies
fuzzy logic and clusters microarray data to group genes with similar functionalities in order to
analyze the gene expression profiles.
The paper is organized as follows. Section II provides the literature study of the association rule
mining, Type 2 fuzzy logic for microarray. In Section III the MGA-FL methodology for gene
expression analysis is explained. Section IV explores the implemented results for the yeast
dataset. The final section draws the conclusion of the paper, the major strength of this work and
direction for future research.
2. REVIEW OF LITERATURE
Microarray technology is one of the latest advanced technologies in molecular biology. It enables
monitoring the expression of thousands of genes (virtually the entire genome) simultaneously. A
wide range of methods have been proposed for the analysis of gene expression data,
including hierarchical clustering, self-organizing maps, and k-means approaches.
In [23], Zuoliang Chen et al. (2008) proposed an associative classification approach called
Classification with Fuzzy Association Rules (CFAR). Here the fuzzy logic technique is used for
partitioning the domains. The experimental results indicated that CFAR generates better
understandability than the traditional approaches. In [13], Nenad Jukic et al. (2011)
proposed qualified association rule mining. Qualified association rule mining is an extension
of the association rules data mining method that discovers previously unknown correlations
under some circumstances. This method aims at improving the action results. Various
experiments were carried out to illustrate how the qualified rules increase effectiveness
compared with standard association rules.
In [22], Yuchun Tang et al. (2008) proposed a new Fuzzy Granular Support Vector Machine
Recursive Feature Elimination algorithm (FGSVM-RFE). This algorithm eliminates the
irrelevant, redundant, or noisy genes in different granules at different stages and selects highly
informative genes with potentially different biological functions. In [21], Yan-Fei Wang et al.
(2011) proposed a Type-2 fuzzy membership test for disease-associated gene identification on
microarrays to improve traditional fuzzy methods. In [16], Peter J. Woolf et al. (2000) developed
a novel algorithm for analyzing gene expression profiles. This algorithm used fuzzy logic
to transform gene expression values into qualitative descriptors.
In [1], Anirban Mukhopadhyay et al. (2009) proposed a novel multiobjective variable string
fuzzy clustering scheme for clustering microarray gene expression data. The proposed technique
evolves the total number of clusters along with the clustering result. The proposed method is
applied to three publicly available real-life gene expression datasets.
In [4], Dongguang Li (2008) presented how to discover useful
information from DNA microarray expression data collected from mouse experiments for
leukemia research. Many methodologies involving fuzzy sets, neural networks, genetic
algorithms, rough sets, wavelets, and their hybridizations are suggested to provide approximate
solutions. In [18], Qilian Liang et al. (2000) proposed a method for designing interval Type-2
fuzzy logic systems. The concept of upper and lower membership functions is introduced and
defined in that paper.
In [19], Raed I. Hamed et al. (2010) developed a fuzzy reasoning model based on the Fuzzy Petri
Net. In [17], Pradipta Maji (2011) proposed a clustering algorithm called fuzzy rough
supervised attribute clustering (FRSAC). The effectiveness of the FRSAC algorithm is compared
with existing supervised and unsupervised clustering algorithms such as hierarchical clustering, the
k-means algorithm, SOM and PCA.
In [15], Pablo Martín-Muñoz et al. (2010) presented a new algorithm, FuzzyCN2, for extracting
conjunctive fuzzy classification rules. The algorithm introduces the use of linguistic hedges and
produces more compact rules. In [12], Muzeyyen Bulut Ozek et al. (2007)
presented a new Type-2 Fuzzy Logic Toolbox in the MATLAB programming language. The primary
goal is to help the user understand and implement Type-2 fuzzy logic systems easily.
In [14], Nilesh N. Karnik et al. (1999) introduced a Type-2 fuzzy logic system that can
handle rule uncertainties. Type-2 fuzzy logic is applied to time-varying channel equalization, and
it is shown that the Type-2 fuzzy logic system provides better performance than the existing
Type-1 fuzzy and nearest neighbor classifiers.
In [11], Mohammad Mehdi Pourhashem et al. (2010) proposed a new method based on fuzzy
clustering and gene semantic similarity to estimate missing values in microarray data. In the
proposed method, microarray gene data are clustered based on gene semantic similarity and their
gene expression values. In [5], Edmundo Bonilla Huerta et al. (2008) introduced a fuzzy logic
based pre-processing approach composed of two main steps. First, fuzzy inference rules are
used to transform the gene expression levels of a given dataset into fuzzy values. Then a
similarity relation is applied to these fuzzy values to define fuzzy equivalence groups.
In [20], Vincent S. Tseng et al. (2006) proposed two fuzzy data mining approaches for
microarray analysis, the Fuzzy Associative Gene Expression (FAGE) and Ripple Effective
Gene Expression Rule (REGER) algorithms. Both techniques transform the microarray gene data
into fuzzy items. Fuzzy operators and specially designed data structures are then used to
discover the relationships among genes.
3. PROBLEM FORMULATION AND METHODOLOGY
The main objective of this research work is to propose a framework to classify and analyze
Microarray Gene data by using data mining and fuzzy logic. The specific objective of the work is
to cluster the microarray gene data based on fuzzy association patterns and to compare the
proposed work with existing traditional algorithms.
Genome sequencing projects have provided static pictures of the genomes of many organisms.
Fuzzy logic incorporates a simple rule based approach for solving problems rather than
attempting to model a system mathematically. Linguistic variables are the input or output
variables of the system whose values are words or sentences from a natural language, instead of
numerical values. The applied fuzzy logic consists of a set of fuzzy if-then rules that enable
accurate nonlinear classification of input patterns. Fuzzy logic transforms quantitative expression
values into linguistic terms that are able to uncover hidden fuzzy sequential associations between
genes.
The proposed framework, given in Figure 2, is used for analyzing associations in microarray gene
data using fuzzy logic and a clustering approach. The proposed framework consists of three phases.
In the first phase, preprocessing and fuzzification of the microarray data are performed. In the second
phase, the fuzzy association patterns of genes are discovered and the microarray gene data is
grouped according to these association patterns. In the third phase, two types of clustering are performed
and the results are compared with the proposed work to assess accuracy.
Figure 2. Framework (MGA-FL)
3.1 Preprocessing and Fuzzification
The yeast Saccharomyces cerevisiae dataset, downloaded from the National Center for
Biotechnology Information Gene Expression Omnibus (NCBI GEO) website, contains about 6240
genes with their corresponding yeast values (0.04481, 0.725). Microarray gene data contains
noisy and inconsistent data. Preprocessing is the process of removing inconsistent data and
extracting the necessary information. Table 1 shows that the downloaded microarray gene data contains
empty spots. The preprocessing phase, which consists of two main processes, is used for extracting the
needed information from the data.
Table 1. Microarray gene data with empty spots
Gene Id    Value      Raw Log Ratio (635/532)   Diameter   F635 Median   F635 Mean
28080521   (empty)    2.250                     100        74            88
28084699   -0.02025   0.785                     110        152           248
28084700   (empty)    0.875                     100        72            86
28081779   -0.03629   1.008                     110        181           302
In the preprocessing step, the empty spots are first identified with the isempty method and replaced with null values.
The unique elements of the dataset are then obtained with the unique method, and the null
values in the dataset are replaced by the maximum of these unique elements using the max method. Table
2 shows the preprocessed data. After all the null values in the microarray gene data have been replaced, the
preprocessed gene expression values are given as input to the next process, fuzzification.
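The replacement of empty spots described above can be sketched in a few lines of Python. This is a minimal illustration under assumptions, not the implementation used in this work: empty spots are represented here as None, the fill value is computed per column, and the function name fill_empty_spots is hypothetical. The text does not state the scope over which the maximum is taken, so the per-column choice is only one possible reading.

```python
from typing import List, Optional


def fill_empty_spots(column: List[Optional[float]]) -> List[float]:
    """Replace empty spots (None) with the maximum of the unique observed values.

    Mirrors the three steps in the text: detect the empty entries, collect the
    unique elements, and fill with their maximum. The per-column scope is an
    assumption.
    """
    observed = {value for value in column if value is not None}  # unique elements
    if not observed:
        raise ValueError("column contains no observed values to fill from")
    fill_value = max(observed)
    return [fill_value if value is None else value for value in column]


if __name__ == "__main__":
    # Illustrative input: two observed values and two empty spots.
    values = [None, -0.02025, None, -0.03629]
    print(fill_empty_spots(values))
```

Filling with a single constant keeps every gene in the dataset, at the cost of assigning the same value to all empty spots.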
Table 2. Preprocessed Microarray gene data
Gene Id    Value      Raw Log Ratio (635/532)   Diameter   F635 Median   F635 Mean
28080521   2          2.250                     100        74            88
28084699   -0.02025   0.785                     110        152           248
28084700   2          0.875                     100        72            86
Gene expression data is quantitative and contains numeric values. In Type 2 fuzzy logic, ranges are
given for the fuzzy values themselves. The membership function for Type 2 fuzzy is given in Figure 3.
Gene expression levels are qualitatively classified into up regulated (U), down regulated (D), low
(L), medium (M) and high (H) states to a varying degree based on a set of membership functions
in Type 2 fuzzy logic.
Figure 3. Membership Function for Type 2 Fuzzy
The calculated fuzzy numeric values are shown in Table 3. The numeric values are then converted
into fuzzy linguistic terms, as shown in Table 4.
Table 3. Gene data with Fuzzy values
Gene Id    Value      Raw Log Ratio (635/532)   Diameter   F635 Median   F635 Mean
28080521   -0.04578   0.99997                   1          0.991927      0.9920
28084699   0.76562    0.99995                   1          0.97578       0.97504
28084700   0.87553    0.99996                   1          0.966735      0.96640
28081779   0.93850    0.99997                   1          0.969313      0.969085
Table 4. Gene data with Fuzzy Terms
3.2 Fuzzy Association Based Gene Clustering
The numeric quantitative values of the gene data are converted into fuzzy terms using fuzzy logic.
After fuzzification, the fuzzy values are given as input to the next phase, the discovery of gene
associations. In this second phase, the fuzzy association pattern $l_{pq} \leftrightarrow l_{jk}$ is
found by discovering the association between the linguistic terms $l_{pq}$ and $l_{jk}$. Microarray
gene data with three expression states, low (L), average (A) and high (H), contains nine possible
associations. The gene data contains fuzzy association patterns such as
$$L \rightarrow L,\; L \rightarrow H,\; L \rightarrow A,\; A \rightarrow L,\; A \rightarrow A,\; A \rightarrow H,\; H \rightarrow L,\; H \rightarrow A,\; H \rightarrow H$$
The gene positions corresponding to a particular association are grouped as shown in Table 5. For
example, the numbers 51, 69, 3 and 5 are gene position numbers at which associations are present.
Table 5. Association patterns and its Gene numbers
Rule No  Association  Gene numbers
1        L → A        51,69,165,177,281,317,654,655,703
2        L → H        1,2,3,4,5,6,7,8,9,10,11,12,13
3        A → H        2308,4272,4273,4274,4630,5850
4        A → L        17,18,19,20,21,22,23,24,25,26,27,28
5        H → L        5755,5756,5757,5758,5759,5760
6        H → A        284,285,286,287,288,289,290,291
7        L → L        0
8        A → A        0
9        H → H        0
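The grouping of gene positions by association can be sketched as follows. This is an illustration only: it assumes that an association pairs the linguistic terms of two attributes of the same gene, and the function name `group_by_association` and the toy term lists are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def group_by_association(
    antecedent: Sequence[str], consequent: Sequence[str]
) -> Dict[Tuple[str, str], List[int]]:
    """Map each (term, term) pattern to the 1-based gene positions showing it."""
    groups: Dict[Tuple[str, str], List[int]] = defaultdict(list)
    for position, pair in enumerate(zip(antecedent, consequent), start=1):
        groups[pair].append(position)
    return dict(groups)


# Toy linguistic terms for five genes in two attributes (not taken from the dataset).
terms_p = ["L", "L", "A", "H", "L"]
terms_j = ["H", "H", "L", "A", "A"]

groups = group_by_association(terms_p, terms_j)
for (left, right), positions in groups.items():
    print(f"{left} -> {right}: genes {positions}, total {len(positions)}")
```

The length of each list gives the per-association gene count, and patterns that never occur are simply absent from the result.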
The total number of genes for each association is given in Table 6. In Table 6, the count 6512
indicates the total number of genes for the association $L \rightarrow A$, where $L \rightarrow A$
represents the association Low → Average.
Gene Id    Value  Raw Log Ratio (635/532)  Diameter  F635 Median  F635 Mean
28080521   'D'    'U'                      'U'       'U'          'U'
28084699   'H'    'U'                      'U'       'U'          'U'
28084700   'U'    'U'                      'U'       'U'          'U'
28081779   'U'    'U'                      'U'       'U'          'U'
Table 6. Total number of genes for Association patterns
Association Patterns  Total no of genes
L → A                 65
L → H                 6152
H → L                 6
The uncertainty associated with the fuzzy association patterns can be modeled using a confidence
measure, defined as the probability of the pattern, $\Pr(l_{pq} \leftrightarrow l_{jk})$. The weight
of evidence measure $W(l_{pq} \leftrightarrow l_{jk})$ is calculated to handle the uncertainties.
The weights calculated for five clusters are shown in Table 7.
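The text does not spell out how the confidence and the weight of evidence are computed. One common formulation, given here only as an assumption and not as the authors' definition, estimates the confidence as a conditional probability and takes the weight of evidence as a log-likelihood ratio:

```latex
\Pr(l_{pq} \leftrightarrow l_{jk}) \approx \Pr(l_{jk} \mid l_{pq})
  = \frac{\Pr(l_{pq} \wedge l_{jk})}{\Pr(l_{pq})},
\qquad
W(l_{pq} \leftrightarrow l_{jk})
  = \log \frac{\Pr(l_{pq} \mid l_{jk})}{\Pr(l_{pq} \mid \neg l_{jk})}
```

Under this assumed form, a positive weight means that observing $l_{pq}$ is evidence for $l_{jk}$, a negative weight is evidence against it, and a weight of zero means that $l_{pq}$ carries no information about $l_{jk}$.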
Table 7. Calculated Weights for Five Clusters
Associations  Cluster 1  Cluster 2  Cluster 3  Cluster 4  Cluster 5
L → A         6.4013     2.0717     6.2697     2.0719     6.5422
L → H         7.0953     2.0620     6.5270     2.0625     6.5425
A → H         2.4775     2.3393     2.8394     2.3377     6.0861
A → L         2.7859     2.3297     3.0213     2.3286     6.0864
H → L         3.0213     2.3201     3.1758     2.3191     6.0867
H → A         2.4788     2.6912     2.5847     2.6887     5.7730
L → L         2.6477     2.6818     2.7068     2.6793     5.7727
A → A         2.0371     4.5420     2.0625     4.5660     5.3364
H → H         2.0794     6.3173     2.0794     6.5525     5.2742
After the weights are calculated, gene expression data for a set of N' genes is collected from
previously unseen gene expression data. To predict the accuracy, the fuzzy association patterns
previously discovered in each class are searched to see which patterns match the new expression
profile. The weight of evidence $W'(l_{pq} \leftrightarrow l_{jk})$ supporting the assignment of the
new profile to a class is defined as:
$$W'(l_{pq} \leftrightarrow l_{jk}) = W(l_{pq} \leftrightarrow l_{jk}) \cdot \mu_{F_{pq}}$$
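The scaling of a stored weight by a membership degree can be illustrated with a short Python sketch. The two weight rows are copied from Table 7; the membership degrees of the new profile, the summation of the scaled weights per cluster, and the choice of the cluster with the largest total are assumptions made for this example, as is the function name `weighted_evidence`.

```python
from typing import Dict, List, Tuple

# Weights W for two patterns across clusters 1..5, copied from Table 7.
WEIGHTS: Dict[Tuple[str, str], List[float]] = {
    ("L", "A"): [6.4013, 2.0717, 6.2697, 2.0719, 6.5422],
    ("H", "H"): [2.0794, 6.3173, 2.0794, 6.5525, 5.2742],
}


def weighted_evidence(matches: Dict[Tuple[str, str], float]) -> List[float]:
    """Sum W' = W * mu over the matched patterns, separately for each cluster."""
    totals = [0.0] * 5
    for pattern, mu in matches.items():
        for k, w in enumerate(WEIGHTS[pattern]):
            totals[k] += w * mu
    return totals


# Hypothetical membership degrees of a new profile in two matched patterns.
new_profile = {("L", "A"): 0.8, ("H", "H"): 0.5}

totals = weighted_evidence(new_profile)
best_cluster = max(range(5), key=lambda k: totals[k]) + 1  # clusters are 1-based
print(totals, best_cluster)
```

Patterns that match the new profile only weakly contribute little to any cluster, because their weights are scaled down by a small membership degree.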
Finally, the degree of membership of the new gene expression data in each class is calculated. The
accuracy can be predicted by calculating how well the fuzzy association patterns in the new gene
expression data match the patterns in the previous dataset. The occurrences of each association are
grouped, and the data belonging to that association are stored for each cluster. The calculated
fuzzy values are also grouped by cluster. The rules generated for the five clusters are displayed
separately in Table 8, which contains the cluster number and the rule numbers of the association
patterns found for each cluster. Numbers such as 2, 3, 1, 6 and 5 are association rule numbers.
Some clusters contain only a few association rules, which indicates that the other rules are not
present within the cluster.
Table 8. Fuzzy Association Rules for five clusters
The occurrences of the association are grouped for each cluster. Each association rule is
represented by some constant values. The original data and the fuzzy values belonging to the
particular association are grouped in clusters.
3.3 Clustering and Verification Phase
The fuzzy association patterns are discovered and the accuracy is calculated in the second phase.
After pattern discovery, clustering is done in the third phase. Two clustering algorithms are used:
k-means and hierarchical clustering. After clustering, the fuzzy logic technique is applied to the
cluster results. The results of k-means without fuzzy logic are compared with the results of
k-means with fuzzy logic, and the same comparison is made for hierarchical clustering. The
comparison is done for both Type 1 and Type 2 fuzzy.
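For readers unfamiliar with k-means, the following is a minimal sketch of the standard Lloyd iteration on one-dimensional values. It is not the implementation used in this work: the function name `kmeans_1d`, the toy values and the fixed initial centroids are all assumptions chosen to keep the example deterministic.

```python
from typing import List, Sequence, Tuple


def kmeans_1d(
    values: Sequence[float], centroids: Sequence[float], max_iter: int = 100
) -> Tuple[List[int], List[float]]:
    """Alternate nearest-centroid assignment and centroid update until stable."""
    centers = list(centroids)
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: each value goes to its nearest centroid.
        labels = [
            min(range(len(centers)), key=lambda k: abs(v - centers[k]))
            for v in values
        ]
        # Update step: each centroid becomes the mean of its members.
        updated = []
        for k in range(len(centers)):
            members = [v for v, lab in zip(values, labels) if lab == k]
            updated.append(sum(members) / len(members) if members else centers[k])
        if updated == centers:  # no centroid moved, so the clustering is stable
            break
        centers = updated
    return labels, centers


# Toy membership-like values forming two obvious groups (illustrative only).
values = [0.1, 0.15, 0.2, 0.8, 0.85, 0.9]
labels, centers = kmeans_1d(values, centroids=[0.0, 1.0])
print(labels, centers)
```

Hierarchical clustering differs in that it builds a tree of merges instead of refining a fixed number of centroids, so it needs no initial centroids but is more expensive on large gene sets.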
After the comparison, the results are plotted and verified for both types of clustering. The
verification shows that accuracy increases when the fuzzy logic technique is used. The accuracy
values for k-means clustering without fuzzy logic and with Type 1 fuzzy logic are shown in Table 9.
Table 10 shows the corresponding accuracy values for hierarchical clustering.
Cluster no  Association rule patterns
Cluster 1   2,3,4,5,6,7,8,9
Cluster 2   1,7,9
Cluster 3   1,2,3,4,8
Cluster 4   3,9
Cluster 5   1,6,7,8
Table 9. Accuracy Values for Kmeans and Type 1 Fuzzy
Kmeans without fuzzy  Kmeans + Type 1 Fuzzy
50.09                 60.19
58.94                 65.25
14.55                 60.31
50.57                 80.51
44.23                 57.08
Table 10. Accuracy values for Hierarchical and Type 1 Fuzzy
Hierarchical without fuzzy  Hierarchical + Type 1 fuzzy
5.30                        5.60
3.25                        6.79
8.01                        11.19
9.47                        10.93
11.33                       13.07
As with Type 1 fuzzy, the comparison is made for Type 2 fuzzy using the k-means and hierarchical
clustering algorithms. The accuracy values using Type 2 fuzzy and k-means clustering are shown
in Table 11. The accuracy values calculated in the same way using hierarchical clustering are
shown in Table 12.
Table 11. Accuracy Values for Kmeans and Type 2 Fuzzy
Kmeans without fuzzy  Kmeans + Type 2 Fuzzy
47.92                 81.18
52.99                 72.72
55.79                 79.41
51.008                70.59
53.56                 68.07
56.47                 82.25
Table 12. Accuracy values for Hierarchical and Type 2 Fuzzy
The results of clustering without the fuzzy technique are compared with the results of clustering
with it. The proposed Type 2 approach (MGA-FL) is then compared with the existing Type 1
approach. From Tables 9, 10, 11 and 12 it is inferred that the proposed Type 2 approach (MGA-FL)
handles more uncertainty than the existing Type 1 fuzzy approach. The comparison of the accuracy
values shows that Type 2 fuzzy is more accurate than Type 1 fuzzy. The k-means and hierarchical
clustering algorithms are each implemented with and without the fuzzy technique.
4. IMPLEMENTATION RESULTS AND DISCUSSION
The experimental results and a comparative study of the fuzzy techniques and the two clustering
algorithms are presented in this section. The membership functions are represented in the fuzzy
logic toolbox. The membership function for Type 2 fuzzy, shown in Figure 4, is of the trimf
(triangular) function type.
Figure 4. Membership Function for Type 2 fuzzy
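For reference, a triangular membership function of the trimf type is conventionally written with three parameters: a left foot $a$, a peak $b$ and a right foot $c$, with $a < b < c$. The parameter values used in this work are not given in the text, so the symbols below are generic:

```latex
f(x;\, a, b, c) = \max\!\left( \min\!\left( \frac{x - a}{b - a},\; \frac{c - x}{c - b} \right),\; 0 \right)
```

The function is zero outside $(a, c)$, rises linearly from $a$ to a grade of 1 at $b$, and falls linearly back to zero at $c$.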
Hierarchical without fuzzy  Hierarchical + Type 2 fuzzy
5.29                        5.90
5.59                        7.59
3.12                        7.53
7.71                        10.27
8.68                        11.51
9.97                        13.83
Figure 5. Surface Viewer for Type 2 fuzzy
The Surface Viewer in the fuzzy logic toolbox is particularly helpful in cases with two (or more)
inputs and one output. Opening the Surface Viewer displays a three-dimensional curve that
represents the mapping of genes to expression. Figure 5 shows the Surface Viewer for the proposed
work.
The cluster numbers and the gene numbers corresponding to the association patterns are shown in
Table 13.
Table 13. Cluster numbers and Gene numbers for Association Pattern
Association Patterns  Cluster Numbers  Gene Numbers
L → A                 4,3,1            51,69,165,177,281,317,654
L → H                 4,5,3            1,2,3,4,5,6,7,8,9,10,11,12,13
A → H                 5,1,1            2308,4272,4273,4274,4630
A → L                 5,1,4            17,18,19,20,21,22,23,24
H → L                 3,4,1            5755,5756,5757,5758,5760
H → A                 2,5,4            284,285,286,287,288,289,290
The number of clusters and the total number of genes associated with each cluster are shown as a
bar chart in Figure 6. The bar chart shows that the largest number of genes falls under cluster 5
for the fuzzy associations $A \rightarrow H$, $H \rightarrow A$, $H \rightarrow H$ and
$A \rightarrow A$.
Figure 6. Clusters and its total number of Genes
The bar diagram for the different association patterns and the total number of genes is shown in
Figure 7. Based on the analysis of the results in Figure 7, it is inferred that association
patterns such as $A \rightarrow H$, $H \rightarrow A$, $A \rightarrow A$ and $H \rightarrow H$ occur
frequently in most of the clusters.
Figure 7. Gene Cluster for Association Patterns
The clustering algorithm accuracies for Type 1 and Type 2 fuzzy are compared in Table 14.
Table 14. Comparison of accuracy values in Type 1 and Type 2 fuzzy
The results for Type 1 and Type 2 fuzzy are analyzed, compared and discussed in the form of line
charts. The accuracy values for k-means and hierarchical clustering are plotted in Figure 8 and
Figure 9. The analysis of the clustering results shows that accuracy values increase when the
fuzzy logic technique is used, compared with the traditional clustering algorithms. Based on the
implementation results, the accuracy can be increased further by using the proposed Type 2 fuzzy
approach (MGA-FL) instead of the existing Type 1 fuzzy approach.
Figure 8. Accuracy results for Kmeans (line chart; series: Kmeans with Type 2 fuzzy, Kmeans with Type 1 fuzzy, Kmeans without fuzzy logic)
Type 1                  Type 2
Kmeans  Hierarchical    Kmeans  Hierarchical
60.19   5.60            92.27   5.90
65.25   6.37            71.02   7.59
60.31   6.79            81.18   7.53
80.51   11.19           72.72   10.27
57.08   10.93           79.41   11.51
68.93   11.70           70.59   13.83
66.54   13.07           68.07   13.88
77.06   12.15           82.25   14.15
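The Type 1 versus Type 2 comparison can be summarized with simple arithmetic on the values of Table 14. The sketch below is an illustration, not part of the original analysis: the values are copied from the table, and it assumes that each row pairs a Type 1 run with the Type 2 run beside it. The names `TABLE_14`, `column` and `wins` are hypothetical.

```python
from statistics import mean
from typing import List, Tuple

# Rows of Table 14:
# (Type 1 k-means, Type 1 hierarchical, Type 2 k-means, Type 2 hierarchical)
TABLE_14: List[Tuple[float, float, float, float]] = [
    (60.19, 5.60, 92.27, 5.90),
    (65.25, 6.37, 71.02, 7.59),
    (60.31, 6.79, 81.18, 7.53),
    (80.51, 11.19, 72.72, 10.27),
    (57.08, 10.93, 79.41, 11.51),
    (68.93, 11.70, 70.59, 13.83),
    (66.54, 13.07, 68.07, 13.88),
    (77.06, 12.15, 82.25, 14.15),
]


def column(index: int) -> List[float]:
    """Extract one column of Table 14 as a list."""
    return [row[index] for row in TABLE_14]


def wins(type1_index: int, type2_index: int) -> int:
    """Count the rows in which the Type 2 value exceeds the Type 1 value."""
    return sum(1 for row in TABLE_14 if row[type2_index] > row[type1_index])


mean_kmeans_t1, mean_kmeans_t2 = mean(column(0)), mean(column(2))
mean_hier_t1, mean_hier_t2 = mean(column(1)), mean(column(3))
kmeans_wins, hier_wins = wins(0, 2), wins(1, 3)

print(f"k-means: Type 1 mean {mean_kmeans_t1:.2f}, Type 2 mean {mean_kmeans_t2:.2f}")
print(f"hierarchical: Type 1 mean {mean_hier_t1:.2f}, Type 2 mean {mean_hier_t2:.2f}")
print(f"Type 2 higher in {kmeans_wins} k-means rows and {hier_wins} hierarchical rows")
```

Reporting both the column means and the row-wise counts shows whether the advantage of Type 2 is consistent or driven by a few rows.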
Figure 9. Accuracy results for Hierarchical clustering
The technique proposed in this work, which uses Type 2 fuzzy logic to find association patterns
for clustering microarray data, is found to have higher accuracy than Type 1 fuzzy and the
traditional clustering algorithms.
5. CONCLUSION AND FUTURE DIRECTIONS
Fuzzy logic is a superset of conventional (Boolean) logic that has been extended to handle the
concept of partial truth, where the truth value may range between completely true and completely
false. We have proposed a framework (MGA-FL) that applies fuzzy logic to microarray gene data to
find fuzzy association patterns. The association uncertainties are handled by calculating weights,
and the accuracy of the gene clusters is predicted. The fuzzy associations are proposed for both
Type 1 and Type 2 fuzzy. In this study the approach is implemented on a microarray gene dataset,
and clustering is done using fuzzy logic associations. The results presented in Section 4 indicate
that the proposed approach has higher accuracy than the existing clustering algorithms k-means and
hierarchical clustering. We have also compared the proposed work with the existing Type 1 fuzzy
approach and found that the accuracy of the proposed Type 2 fuzzy logic approach is higher. The
proposed approach clusters microarray gene data based on fuzzy association rules. The same
approach can be implemented for classifying microarray gene data and compared with well-known
algorithms such as K-NN and SVM. The proposed work can also be combined with other machine
learning approaches such as neural networks.
6. REFERENCES
[1] AnirbanMukhopadhyay, SanghamitraBandyopadhyayand UjjwalMaulik, “Analysis of Microarray Data
using Multiobjective Variable String Length Genetic Fuzzy Clustering”, 978-1-4244-2959, 2009 IEEE.
[2] Arun K Pujari, “Data Mining Techniques”, Universities press (India) Limited 2001, ISBN-81-7371-3804.
[3] Daniel T Larose, “Introduction to Data Mining”, Discovering Knowledge in Data: An Introduction to Data
Mining, ISBN 0-471-66657-2, 2005.
[4] Dongguang Li, “DNA Microarray Expression Analysis and Data Mining For Blood Cancer”, 2008
International Seminar on Future BioMedical Information Engineering.
[5] Edmundo Bonilla Huerta, Beatrice Duval, and Jin-Kao Hao, “Fuzzy Logic for Elimination of Redundant
Information of Microarray Data”, Vol. 6 No. 2, 2008.
[6] Hellman M., “Fuzzy Logic Introduction”, Info. &Ctl., Vol. 12, 1968, pp. 94-102.
[7] Juan R Castro, Oscar Castillo, Luis G Martinez, “Interval Type-2 Fuzzy Logic Toolbox”, Engineering
Letters, 15:1, EL_15_1_14, 2007.
[8] Luis Tari, ChittaBaral, Seungchan Kim, “Fuzzy C-Means Clustering with Prior Biological Knowledge”,
Journal of Biomedical Informatics, 2008.
[9] Luscombe N.M, Greenbaum D, Gerstein M., “Bio informatics: A Proposed Definition and Overview of the
Field”, IMIA Yearbook of Medical Informatics 2001: Digital Libraries and Medicine,
pp. 83–99.
17. International Journal of Computer Science, Engineering and Applications (IJCSEA) Vol.2, No.2, April 2012
69
[10] Michael F Ochs and Andrew K Godwin, “Microarrays in Cancer: Research and Applications”,
BioTechniques 34:S4-S15 March 2003.
[11] Mohammad Mehdi Pourhashem, ManouchehrKelarestaghi, Mir Mohsen Pedram, “Missing Value
Estimation In Microarray Data Using Fuzzy Clustering And Semantic Similarity”, Global Journal of
Computer Science And Technology, Page 18,Vol.10,October 2010.
[12] MuzeyyenBulutOzek, ZuhtuHakanAkpolat, “A Software Tool: Type-2 Fuzzy Logic Toolbox”, Wiley
Periodicals Inc., 2007.
[13] NenadJukic, SvetlozarNestorov, Miguel Velasco, Jami Eddington, “Uncovering Actionable Knowledge in
Corporate Data with Qualified Association Rules”, International Journal of Business Intelligence Research,
2(2), 1-21, April-June 2011.
[14] Nilesh N Karnik, Jerry M Mendel and Qilian Liang, “Type-2 Fuzzy Logic Systems”, IEEE Transactions on
Fuzzy Systems, Vol. 7, No. 6, December 1999.
[15] Pablo Martin Munoz and Francisco J Moreno Velo, “Fuzzy CN2: An Algorithm for Extracting Fuzzy
Classification Rule Lists”, WCCI 2010 IEEE World Congress on Computational Intelligence, 18-23, July
2010 - CCIB, Barcelona, Spain.
[16] Peter J Woolf and Yixin Wang, “A Fuzzy Logic Approach to Analyzing Gene Expression Data”, Physiol
Genomics3: 9–15, 2000.
[17] PradiptaMaji, “Fuzzy–Rough Supervised Attribute Clustering Algorithm and Classification of Microarray
Data”, IEEE Transactions on Systems, Man and Cybernetics—Part B: Cybernetics, Vol. 41 and No. 1,
February 2011.
[18] Qilian Liang and Jerry M. Mendel, “Interval Type-2 Fuzzy Logic Systems:Theory and Design”, IEEE
Transactions On Fuzzy Systems, Vol. 8, No. 5, October 2000.
[19] Raed I Hamed1, Ahson S.I, Parveen R., “A New Approach for Modelling Gene Regulatory Networks
Using Fuzzy Petri Nets”, Journal of Integrative Bioinformatics, 7(1):113, 2010.
[20] Vincent S Tseng, Yen-Hsu Chen, Chun-Hao Chen, and Shin J.W., “Mining Fuzzy Association Patterns in
Gene Expression Databases”, International Journal of Fuzzy Systems, Vol. 8, No. 2, June 2006 .
[21] Yan-Fei Wang, Zu-Guo Yu, “A Type-2 Fuzzy Method for Identification of Disease-Related Genes on
Microarrays”, International Journal of Bioscience, Biochemistry and Bioinformatics, Vol. 1, No. 1, May
2011.
[22] YuchunTang,Yan-Qing Zhang,, Zhen Huang, Xiaohua Hu, and Yichuan Zhao, “Recursive Fuzzy
Granulation for Gene Subsets Extraction and Cancer Classification”, IEEE Transactions on Information
Technology In Biomedicine, Vol. 12, No. 6, November 2008.
[23] Zuoliang Chen, Guoqing Chen, “Building an Associative Classifier Based on Fuzzy Association Rules”,
International Journal of Computational Intelligence Systems, Vol.1, No. 3 (August, 2008), 262 – 273.
Authors
Ms V Bhuvaneswari received her Bachelor’s Degree (B.Sc.) in Computer technology
from Bharathiar University, India 1997, Master’s Degree (MCA) in Computer
Applications from IGNOU, India and M.Phil in Computer Science in 2003 from
Bharathiar University, India. She has qualified JRF, UGC-NET, for Lectureship in the
year 2003. She is currently pursuing her doctoral research in School of Computer Science
and Engineering at Bharathiar University in the area of Data mining. Her research
interests include Bioinformatics, Soft computing and Databases. She is currently working
as Assistant Professor in the School of Computer Science and Engineering, Bharathiar
University, India. She has credit for her publications in journals, International/ National Conferences.
Ms. S.J.Brintha received her Bachelor’s degree in (B.Sc.) Microbiology from Bharathiar
University, India 2007, Master’s Degree (MCA) in Computer Applications from Anna
University, India 2010 and M.Phil in Computer Science in 2011 from Bharathiar University,
India. She has attended National conferences. She was awarded gold medal for First Rank
Holder in Master of Computer Applications. Her research interests include Data Mining,
Bioinformatics, and networking.