PREDICTING THE CELLULAR
LOCALIZATION SITES OF
PROTEINS USING ARTIFICIAL
NEURAL NETWORKS
Submitted by:
Vaibhav Dhattarwal
08211018
Supervisor and Guide:
Dr Durga Toshniwal
Organization of Presentation
• Introduction
• Problem Statement
• Background
• Stages Proposed
• Algorithm Implementation
• Results & Discussion
• Conclusion & Future Work
• References
Introduction
• If one can deduce the subcellular location of a protein,
one can interpret its function, its part in healthy
processes and in the onset of disease, and its probable
usage as a drug target.
• The subcellular location of a protein can provide
valuable information about the role it plays in
cellular dynamics.
• The intention is to understand proteins' basic or
specific functions with regard to the life of the cell.
Introduction: Protein Structure
Problem Statement
• “Prediction of Cellular Localization sites of proteins
using artificial neural networks”
• This report aims to combine simulated artificial
neural networks with the field of bioinformatics to
predict the localization of proteins in the yeast genome.
• I have introduced a new subcellular prediction
method based on an artificial neural network trained
with the back propagation algorithm.
Background
• Neural Network Definition
• Neural Network Applications
• Neural Network Categorization
• Types of Neural Network
• Perceptrons and Learning Algorithm
• Classification for Yeast protein data set
Background: Neural Network Definition
• A neural network is a system consisting of many
simple processing elements operating in parallel,
whose function is determined by the network structure,
the strengths (weights) of the connections, and the
computation performed at the computing elements/nodes.
• A neural network is a massively parallel distributed
processor that has a strong inherent ability to store
large amounts of experiential knowledge. It has two
features:
▫ Knowledge is acquired through a learning procedure.
▫ Interneuron connection strengths, or weights, are used to
store this knowledge.
• By using neural networks, computer scientists can
investigate the properties of non-symbolic information
processing, and can also learn more about learning
systems in general.
• Statisticians may be able to use neural networks as
flexible nonlinear regression and classification models.
• Engineers can use neural networks for signal
processing and automatic control.
• Cognitive scientists deploy neural networks to describe
models of thinking and consciousness, which is high-level
brain function.
• Neurophysiologists use neural networks to describe and
research medium-level brain function.
Background: Neural Network Applications
Background: Neural Network
Categorization
• Supervised Learning based
▫ Feed Forward Topology based
▫ Feed Back Topology based
▫ Competitive Learning based
• Unsupervised Learning based
▫ Competitive Learning based
▫ Dimension Reduction Process
▫ Auto Associative Memory
Background: Types of Neural Network
Background: Perceptrons and Learning
Algorithm
Background: Perceptrons and Learning
Algorithm
Background: Yeast Cell
Background: Yeast Protein Data Set
• erl : Represents the lumen of the endoplasmic reticulum in the cell. This attribute tells
whether an HDEL pattern, acting as a signal for retention, is present.
• vac : Gives an indication of the amino acid content of vacuolar and extracellular proteins
after a discriminant analysis.
• mit : Gives the composition of the twenty-residue N-terminal region of mitochondrial and
non-mitochondrial proteins after a discriminant analysis.
• nuc : Tells whether nuclear localization patterns are present. It also holds some
information about the frequency of basic residues.
• pox : Gives the composition of the protein sequence after a discriminant analysis, and
also indicates the presence of a short sequence motif.
• mcg : A parameter used in McGeoch's signal sequence detection method; a modified
version is used in this case.
• gvh : Represents a weight-matrix-based procedure used to detect cleavable signal
sequences.
• alm : This final feature identifies membrane-spanning regions over the entire sequence.
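In the UCI yeast data set each record is a whitespace-separated line: a sequence name, the eight attributes above (stored in the order mcg, gvh, alm, mit, erl, pox, vac, nuc), and the class label. A minimal parser sketch (the function name is illustrative, not from the original work):

```python
def parse_yeast_line(line):
    # Record layout: sequence name, then mcg, gvh, alm, mit,
    # erl, pox, vac, nuc, then the localization class label.
    fields = line.split()
    name = fields[0]
    attributes = [float(v) for v in fields[1:9]]
    label = fields[9]
    return name, attributes, label
```

For example, the record "ADT1_YEAST 0.58 0.61 0.47 0.13 0.50 0.00 0.48 0.22 MIT" yields eight numeric inputs for the network and the target class MIT.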
Background: Classification Tree
Proposed Stages Of Work Done
• Stage one: Simulating the network
• Stage two: Implementing the algorithm
• Stage three: Training the Network
• Stage four: Obtaining results and comparing
performance
Stage One: Simulating the network
Stage Two: Implementing the algorithm
Stage Three: Training the network
• The localization site is represented by the class as output.
Here are the various classes:
▫ CYT (cytosolic or cytoskeletal)
▫ NUC (nuclear)
▫ MIT (mitochondrial)
▫ ME3 (membrane protein, no N-terminal signal)
▫ ME2 (membrane protein, uncleaved signal)
▫ ME1 (membrane protein, cleaved signal)
▫ EXC (extracellular)
▫ VAC (vacuolar)
▫ POX (peroxisomal)
▫ ERL (endoplasmic reticulum lumen)
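For training, each of these classes can be encoded as a one-hot target vector, one output unit per class (a common choice; the slide does not state the exact encoding used):

```python
# The ten localization classes listed above, one output unit each.
CLASSES = ["CYT", "NUC", "MIT", "ME3", "ME2",
           "ME1", "EXC", "VAC", "POX", "ERL"]

def one_hot(label):
    # Map a class label to a 10-element 0/1 target vector.
    vec = [0.0] * len(CLASSES)
    vec[CLASSES.index(label)] = 1.0
    return vec
```

The predicted localization site is then taken as the output unit with the largest activation.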
Stage Four: Obtaining Results &
Comparing Performance
• The yeast data set class statistics are mapped as
output.
• The attributes of the data set are mapped to reflect the
variation of output.
• Varying the number of nodes in the hidden layer is
used to evaluate performance.
• Parameters for comparing performance are:
▫ Accuracy on the test set.
▫ Ratio of correctly classified patterns in the training set.
Algorithm Implementation
• Sigmoid Function & Its derivative
• Pseudo Code for a single network layer
• Pseudo Code for all network layers
• Pseudo Code for training patterns
• Pseudo Code for minimizing error
Sigmoid Function & Its derivative
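The slide's figure is not reproduced here; as a sketch, the logistic sigmoid used as the activation function, and its derivative, which back propagation needs for the error gradient:

```python
import math

def sigmoid(x):
    # Logistic activation: squashes any real input into (0, 1).
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_derivative(x):
    # d/dx sigmoid(x) = sigmoid(x) * (1 - sigmoid(x))
    s = sigmoid(x)
    return s * (1.0 - s)
```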
Pseudo Code for a single network layer
• InputLayer2[j] = Wt[0][j]
• for all elements in Layer One [ NumUnitLayer1 ]
• do
▫ Add to InputLayer2[j] the sum over the product
OutputLayer1[i] * Wt[i][j]
• end for
• Compute the sigmoid to get activation output
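As a Python sketch of the pseudo code above, with Wt[0][j] as the bias for layer-two unit j and Wt[i][j] (i >= 1) weighting layer-one unit i; the names mirror the slide and are otherwise assumptions:

```python
import math

def layer_forward(output_layer1, wt):
    # wt has one bias row (index 0) plus one row per layer-one unit;
    # each column j corresponds to one layer-two unit.
    num_units_layer2 = len(wt[0])
    output_layer2 = []
    for j in range(num_units_layer2):
        input_layer2 = wt[0][j]  # bias term
        for i, out in enumerate(output_layer1, start=1):
            input_layer2 += out * wt[i][j]  # weighted sum of inputs
        # Sigmoid gives the unit's activation output.
        output_layer2.append(1.0 / (1.0 + math.exp(-input_layer2)))
    return output_layer2
```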
Pseudo Code for all network layers
• for all elements in hidden layer [ NumUnitHidden ] // computes Hidden Layer PE outputs //
• do
▫ InputHidden[j] = WtInput/Hidden[0][j]
▫ for all elements in input layer [ NumUnitInput ]
▫ do
 Add to InputHidden[j] the sum over OutputInput[i] * WtInput/Hidden [i][j]
▫ end for
▫ Compute sigmoid for output
• end for
• for all elements in output layer [ NumUnitOutput ] // computes Output Layer PE outputs //
• do
▫ InputOutput[k] = WtHidden/Output[0][k]
▫ for all elements in hidden layer [ NumUnitHidden ]
▫ do
 Add to InputOutput [k] sum over OutputHidden[j] * WtHidden/Output [j][k]
▫ end for
▫ Compute sigmoid for output
• end for
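The two loops above, input-to-hidden then hidden-to-output, might look like this in Python (weight layout assumed as in the previous sketch: row 0 of each weight matrix is the bias row):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def forward_pass(inputs, wt_input_hidden, wt_hidden_output):
    # Hidden layer: bias plus weighted sum of the input units.
    hidden = []
    for j in range(len(wt_input_hidden[0])):
        net = wt_input_hidden[0][j]
        for i, x in enumerate(inputs, start=1):
            net += x * wt_input_hidden[i][j]
        hidden.append(sigmoid(net))
    # Output layer: same pattern over the hidden-unit outputs.
    outputs = []
    for k in range(len(wt_hidden_output[0])):
        net = wt_hidden_output[0][k]
        for j, h in enumerate(hidden, start=1):
            net += h * wt_hidden_output[j][k]
        outputs.append(sigmoid(net))
    return hidden, outputs
```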
Design for calculating output
Pseudo Code for training patterns
• Er = 0.0 ;
• for all patterns in the training set
• do // computes for all training patterns(E) //
▫ for all elements in hidden layer [ NumUnitHidden ]
▫ do
 InputHidden[E][j] = WtInput/Hidden[0][j]
 for all elements in input layer [ NumUnitInput ]
 do
 Add to InputHidden[E] [j] the sum over OutputInput[E] [i] * WtInput/Hidden
[i][j];
 end for
 Compute sigmoid for output
▫ end for
Pseudo Code for training patterns
▫ for all elements in output layer [ NumUnitOutput ]
▫ do
 InputOutput[E] [k] = WtHidden/Output[0][k]
 for all elements in hidden layer [ NumUnitHidden ]
 do
 Add to InputOutput [E] [k] sum over OutputHidden[E] [j] *
WtHidden/Output [j][k]
 end for
 Compute sigmoid for output
 Add to Er the sum over the product (1/2) * (Final[E][k] -
Output[E][k]) * (Final[E][k] - Output[E][k])
▫ end for
• end for
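The accumulated error term from the loop above, Er = the sum over patterns E and output units k of (1/2)(Final[E][k] - Output[E][k])^2, as a standalone sketch:

```python
def total_error(final, output):
    # final[e][k]: target for pattern e, output unit k;
    # output[e][k]: the network's actual output for that unit.
    er = 0.0
    for target_row, out_row in zip(final, output):
        for t, o in zip(target_row, out_row):
            er += 0.5 * (t - o) * (t - o)
    return er
```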
Pseudo Code for minimizing error
• for all elements in hidden layer [ NumUnitHidden ]
• do // This loop updates the input to hidden weights //
▫ Set ΔWih [0][j] to the sum: β * ΔH [j] + α * ΔWih [0][j]
▫ Add to WtInput/Hidden [0][j] the change ΔWih [0][j]
▫ for all elements in input layer [ NumUnitInput ]
▫ do
 Set ΔWih [i][j] to the sum: β * OutputInput [p][i] * ΔH [j] + α * ΔWih [i][j]
 Add to WtInput/Hidden [i][j] the change ΔWih [i][j]
▫ end for
• end for
Pseudo Code for minimizing error
• for all elements in output layer [ NumUnitOutput ]
• do // This loop updates the hidden to output weights //
▫ Set ΔWho [0][k] to the sum: β * ΔOutput [k] + α * ΔWho [0][k]
▫ Add to WtHidden/Output [0][k] the change ΔWho [0][k]
▫ for all elements in hidden layer [ NumUnitHidden ]
▫ do
 Set ΔWho [j][k] to the sum: β * OutputHidden [p][j] * ΔOutput [k] + α * ΔWho [j][k]
 Add to WtHidden/Output [j][k] the change ΔWho [j][k]
▫ end for
• end for
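Both update loops follow the same momentum rule, ΔW = β * (local input) * δ + α * ΔW_previous, followed by W += ΔW, where β is the learning rate and α the momentum coefficient. A sketch of one such update (array layout as in the earlier sketches; `p` indexes the current training pattern in the slides):

```python
def update_weights(wt, delta_wt, layer_inputs, deltas, beta, alpha):
    # beta: learning rate; alpha: momentum coefficient.
    # Row 0 of wt holds the biases; rows 1.. pair with layer_inputs.
    for k, d in enumerate(deltas):
        # Bias change: beta * delta[k] plus momentum on the old change.
        delta_wt[0][k] = beta * d + alpha * delta_wt[0][k]
        wt[0][k] += delta_wt[0][k]
        for j, x in enumerate(layer_inputs, start=1):
            # Weight change: beta * input * delta[k] plus momentum.
            delta_wt[j][k] = beta * x * d + alpha * delta_wt[j][k]
            wt[j][k] += delta_wt[j][k]
```

The same function serves both loops: for input-to-hidden weights, `layer_inputs` is the input-layer output and `deltas` is ΔH; for hidden-to-output weights, it is the hidden-layer output and ΔOutput.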
Results & Discussion
• Yeast Data Set Class Statistics
• Yeast Data Set Attributes
• Comparison of Accuracies of various algorithms
• Variation of success rate with number of iterations
• Variation of success rate with number of nodes in
hidden layer
• Variation of accuracy in training with the criteria in
testing
Results: Yeast Data Set class statistics
Results: Yeast Data Set Attributes
Results: Comparison of Accuracies of
various algorithms
Results: Variation of Success Rate with no
of iterations
Results: Variation of Success Rate with no
of nodes in Hidden Layer
Results: Variation of Accuracy in Training
with Criteria in Testing
Conclusion
• The classes CYT, NUC and MIT have the largest
number of instances.
• An interesting observation is that the values of erl and
pox are almost constant throughout the data set,
whereas the rest of the attributes vary continually.
• The algorithm achieves slightly higher accuracy than
the other algorithms compared.
Conclusion
• Also of note is that considerable success is achieved
on the chosen yeast data set, with accuracy reaching
up to 61%.
• After about 100 iterations the success rate remains
more or less constant.
• The success rate reaches a constant value after about
75 nodes in the hidden layer.
• The accuracy rises until we reach the limit to which
the success rate can be set.
Future Work
• Since predicting proteins' cellular localization sites
is a typical classification problem, many other
techniques, such as probabilistic models, Bayesian
networks, and k-nearest neighbours, can be compared
with our technique.
• Thus, an aspect of future work is to examine the
performance of these techniques on this particular
problem.
Key References
• [1]. Horton, P., and Nakai, K., "A Probabilistic Classification System for Predicting the
Cellular Localization Sites of Proteins", Intelligent Systems in Molecular Biology, 109-115, 1996.
• [2]. Nakai, K., and Kanehisa, M., "Expert System for Predicting Protein Localization Sites in
Gram-Negative Bacteria", PROTEINS: Structure, Function, and Genetics, 11:95-110, 1991.
• [3]. Nakai, K., and Kanehisa, M., "A Knowledge Base for Predicting Protein Localization Sites
in Eukaryotic Cells", Genomics, 14:897-911, 1992.
• [4]. Cairns, P., Huyck, C., et al., "A Comparison of Categorization Algorithms for Predicting
the Cellular Localization Sites of Proteins", IEEE Engineering in Medicine and Biology,
pp. 296-300, 2001.
• [5]. Donnes, P., and Hoglund, A., "Predicting protein subcellular localization: Past, present,
and future", Genomics Proteomics Bioinformatics, 2:209-215, 2004.
• [6]. Feng, Z.P., "An overview on predicting the subcellular location of a protein",
In Silico Biology, 2002.
• [7]. Reinhardt, A., and Hubbard, T., "Using neural networks for prediction of the subcellular
location of proteins", Nucleic Acids Res., 26(9):2230-2236, 1998.
THANK YOU

More Related Content

What's hot

Electricity Demand Forecasting Using Fuzzy-Neural Network
Electricity Demand Forecasting Using Fuzzy-Neural NetworkElectricity Demand Forecasting Using Fuzzy-Neural Network
Electricity Demand Forecasting Using Fuzzy-Neural NetworkNaren Chandra Kattla
 
Electricity Demand Forecasting Using ANN
Electricity Demand Forecasting Using ANNElectricity Demand Forecasting Using ANN
Electricity Demand Forecasting Using ANNNaren Chandra Kattla
 
Investigations on Hybrid Learning in ANFIS
Investigations on Hybrid Learning in ANFISInvestigations on Hybrid Learning in ANFIS
Investigations on Hybrid Learning in ANFISIJERA Editor
 
Comparison of Neural Network Training Functions for Hematoma Classification i...
Comparison of Neural Network Training Functions for Hematoma Classification i...Comparison of Neural Network Training Functions for Hematoma Classification i...
Comparison of Neural Network Training Functions for Hematoma Classification i...IOSR Journals
 
Analog VLSI Implementation of Neural Network Architecture for Signal Processing
Analog VLSI Implementation of Neural Network Architecture for Signal ProcessingAnalog VLSI Implementation of Neural Network Architecture for Signal Processing
Analog VLSI Implementation of Neural Network Architecture for Signal ProcessingVLSICS Design
 
Multi-objective Optimization of PID Controller using Pareto-based Surrogate ...
Multi-objective Optimization of PID Controller using  Pareto-based Surrogate ...Multi-objective Optimization of PID Controller using  Pareto-based Surrogate ...
Multi-objective Optimization of PID Controller using Pareto-based Surrogate ...IJECEIAES
 
Evaluation of a hybrid method for constructing multiple SVM kernels
Evaluation of a hybrid method for constructing multiple SVM kernelsEvaluation of a hybrid method for constructing multiple SVM kernels
Evaluation of a hybrid method for constructing multiple SVM kernelsinfopapers
 
Recognition of Epilepsy from Non Seizure Electroencephalogram using combinati...
Recognition of Epilepsy from Non Seizure Electroencephalogram using combinati...Recognition of Epilepsy from Non Seizure Electroencephalogram using combinati...
Recognition of Epilepsy from Non Seizure Electroencephalogram using combinati...Atrija Singh
 
Autotuning of pid controller for robot arm and magnet levitation plant
Autotuning of pid controller for robot arm and magnet levitation plantAutotuning of pid controller for robot arm and magnet levitation plant
Autotuning of pid controller for robot arm and magnet levitation planteSAT Journals
 
Artificial Neural Networks for NIU session 2016 17
Artificial Neural Networks for NIU session 2016 17 Artificial Neural Networks for NIU session 2016 17
Artificial Neural Networks for NIU session 2016 17 Prof. Neeta Awasthy
 
Cell Charge Approximation for Accelerating Molecular Simulation on CUDA-Enabl...
Cell Charge Approximation for Accelerating Molecular Simulation on CUDA-Enabl...Cell Charge Approximation for Accelerating Molecular Simulation on CUDA-Enabl...
Cell Charge Approximation for Accelerating Molecular Simulation on CUDA-Enabl...ijcax
 
Review: “Implementation of Feedforward and Feedback Neural Network for Signal...
Review: “Implementation of Feedforward and Feedback Neural Network for Signal...Review: “Implementation of Feedforward and Feedback Neural Network for Signal...
Review: “Implementation of Feedforward and Feedback Neural Network for Signal...IJERA Editor
 
BACKPROPAGATION LEARNING ALGORITHM BASED ON LEVENBERG MARQUARDT ALGORITHM
BACKPROPAGATION LEARNING ALGORITHM BASED ON LEVENBERG MARQUARDT ALGORITHMBACKPROPAGATION LEARNING ALGORITHM BASED ON LEVENBERG MARQUARDT ALGORITHM
BACKPROPAGATION LEARNING ALGORITHM BASED ON LEVENBERG MARQUARDT ALGORITHMcscpconf
 
A neuro fuzzy decision support system
A neuro fuzzy decision support systemA neuro fuzzy decision support system
A neuro fuzzy decision support systemR A Akerkar
 
A boolean modeling for improving
A boolean modeling for improvingA boolean modeling for improving
A boolean modeling for improvingcsandit
 
DCT AND DFT BASED BIOMETRIC RECOGNITION AND MULTIMODAL BIOMETRIC SECURITY
DCT AND DFT BASED BIOMETRIC RECOGNITION AND MULTIMODAL BIOMETRIC SECURITYDCT AND DFT BASED BIOMETRIC RECOGNITION AND MULTIMODAL BIOMETRIC SECURITY
DCT AND DFT BASED BIOMETRIC RECOGNITION AND MULTIMODAL BIOMETRIC SECURITYIAEME Publication
 

What's hot (19)

Electricity Demand Forecasting Using Fuzzy-Neural Network
Electricity Demand Forecasting Using Fuzzy-Neural NetworkElectricity Demand Forecasting Using Fuzzy-Neural Network
Electricity Demand Forecasting Using Fuzzy-Neural Network
 
Electricity Demand Forecasting Using ANN
Electricity Demand Forecasting Using ANNElectricity Demand Forecasting Using ANN
Electricity Demand Forecasting Using ANN
 
Investigations on Hybrid Learning in ANFIS
Investigations on Hybrid Learning in ANFISInvestigations on Hybrid Learning in ANFIS
Investigations on Hybrid Learning in ANFIS
 
Comparison of Neural Network Training Functions for Hematoma Classification i...
Comparison of Neural Network Training Functions for Hematoma Classification i...Comparison of Neural Network Training Functions for Hematoma Classification i...
Comparison of Neural Network Training Functions for Hematoma Classification i...
 
Analog VLSI Implementation of Neural Network Architecture for Signal Processing
Analog VLSI Implementation of Neural Network Architecture for Signal ProcessingAnalog VLSI Implementation of Neural Network Architecture for Signal Processing
Analog VLSI Implementation of Neural Network Architecture for Signal Processing
 
Multi-objective Optimization of PID Controller using Pareto-based Surrogate ...
Multi-objective Optimization of PID Controller using  Pareto-based Surrogate ...Multi-objective Optimization of PID Controller using  Pareto-based Surrogate ...
Multi-objective Optimization of PID Controller using Pareto-based Surrogate ...
 
Evaluation of a hybrid method for constructing multiple SVM kernels
Evaluation of a hybrid method for constructing multiple SVM kernelsEvaluation of a hybrid method for constructing multiple SVM kernels
Evaluation of a hybrid method for constructing multiple SVM kernels
 
Recognition of Epilepsy from Non Seizure Electroencephalogram using combinati...
Recognition of Epilepsy from Non Seizure Electroencephalogram using combinati...Recognition of Epilepsy from Non Seizure Electroencephalogram using combinati...
Recognition of Epilepsy from Non Seizure Electroencephalogram using combinati...
 
Autotuning of pid controller for robot arm and magnet levitation plant
Autotuning of pid controller for robot arm and magnet levitation plantAutotuning of pid controller for robot arm and magnet levitation plant
Autotuning of pid controller for robot arm and magnet levitation plant
 
Presentation, navid khoob
Presentation, navid khoobPresentation, navid khoob
Presentation, navid khoob
 
Complex system
Complex systemComplex system
Complex system
 
Artificial Neural Networks for NIU session 2016 17
Artificial Neural Networks for NIU session 2016 17 Artificial Neural Networks for NIU session 2016 17
Artificial Neural Networks for NIU session 2016 17
 
01 Introduction to Machine Learning
01 Introduction to Machine Learning01 Introduction to Machine Learning
01 Introduction to Machine Learning
 
Cell Charge Approximation for Accelerating Molecular Simulation on CUDA-Enabl...
Cell Charge Approximation for Accelerating Molecular Simulation on CUDA-Enabl...Cell Charge Approximation for Accelerating Molecular Simulation on CUDA-Enabl...
Cell Charge Approximation for Accelerating Molecular Simulation on CUDA-Enabl...
 
Review: “Implementation of Feedforward and Feedback Neural Network for Signal...
Review: “Implementation of Feedforward and Feedback Neural Network for Signal...Review: “Implementation of Feedforward and Feedback Neural Network for Signal...
Review: “Implementation of Feedforward and Feedback Neural Network for Signal...
 
BACKPROPAGATION LEARNING ALGORITHM BASED ON LEVENBERG MARQUARDT ALGORITHM
BACKPROPAGATION LEARNING ALGORITHM BASED ON LEVENBERG MARQUARDT ALGORITHMBACKPROPAGATION LEARNING ALGORITHM BASED ON LEVENBERG MARQUARDT ALGORITHM
BACKPROPAGATION LEARNING ALGORITHM BASED ON LEVENBERG MARQUARDT ALGORITHM
 
A neuro fuzzy decision support system
A neuro fuzzy decision support systemA neuro fuzzy decision support system
A neuro fuzzy decision support system
 
A boolean modeling for improving
A boolean modeling for improvingA boolean modeling for improving
A boolean modeling for improving
 
DCT AND DFT BASED BIOMETRIC RECOGNITION AND MULTIMODAL BIOMETRIC SECURITY
DCT AND DFT BASED BIOMETRIC RECOGNITION AND MULTIMODAL BIOMETRIC SECURITYDCT AND DFT BASED BIOMETRIC RECOGNITION AND MULTIMODAL BIOMETRIC SECURITY
DCT AND DFT BASED BIOMETRIC RECOGNITION AND MULTIMODAL BIOMETRIC SECURITY
 

Similar to Dissertation Prsentation - Vaibhav

08 neural networks
08 neural networks08 neural networks
08 neural networksankit_ppt
 
IRJET - Implementation of Neural Network on FPGA
IRJET - Implementation of Neural Network on FPGAIRJET - Implementation of Neural Network on FPGA
IRJET - Implementation of Neural Network on FPGAIRJET Journal
 
Facial Emotion Detection on Children's Emotional Face
Facial Emotion Detection on Children's Emotional FaceFacial Emotion Detection on Children's Emotional Face
Facial Emotion Detection on Children's Emotional FaceTakrim Ul Islam Laskar
 
Introduction to Perceptron and Neural Network.pptx
Introduction to Perceptron and Neural Network.pptxIntroduction to Perceptron and Neural Network.pptx
Introduction to Perceptron and Neural Network.pptxPoonam60376
 
Deep learning crash course
Deep learning crash courseDeep learning crash course
Deep learning crash courseVishwas N
 
2. NEURAL NETWORKS USING GENETIC ALGORITHMS.pptx
2. NEURAL NETWORKS USING GENETIC ALGORITHMS.pptx2. NEURAL NETWORKS USING GENETIC ALGORITHMS.pptx
2. NEURAL NETWORKS USING GENETIC ALGORITHMS.pptxssuser67281d
 
Classification and Prediction Based Data Mining Algorithm in Weka Tool
Classification and Prediction Based Data Mining Algorithm in Weka ToolClassification and Prediction Based Data Mining Algorithm in Weka Tool
Classification and Prediction Based Data Mining Algorithm in Weka ToolIRJET Journal
 
Unsupervised Feature Learning
Unsupervised Feature LearningUnsupervised Feature Learning
Unsupervised Feature LearningAmgad Muhammad
 
CCC-Bicluster Analysis for Time Series Gene Expression Data
CCC-Bicluster Analysis for Time Series Gene Expression DataCCC-Bicluster Analysis for Time Series Gene Expression Data
CCC-Bicluster Analysis for Time Series Gene Expression DataIRJET Journal
 
3_Transfer_Learning.pdf
3_Transfer_Learning.pdf3_Transfer_Learning.pdf
3_Transfer_Learning.pdfFEG
 
Table of Contents
Table of ContentsTable of Contents
Table of Contentsbutest
 
Unsupervised Learning Clustering KMean and Hirarchical.pptx
Unsupervised Learning Clustering KMean and Hirarchical.pptxUnsupervised Learning Clustering KMean and Hirarchical.pptx
Unsupervised Learning Clustering KMean and Hirarchical.pptxFaridAliMousa1
 
Neural Networks in Data Mining - “An Overview”
Neural Networks  in Data Mining -   “An Overview”Neural Networks  in Data Mining -   “An Overview”
Neural Networks in Data Mining - “An Overview”Dr.(Mrs).Gethsiyal Augasta
 
Blood Cell Image Classification for Detecting Malaria using CNN
Blood Cell Image Classification for Detecting Malaria using CNNBlood Cell Image Classification for Detecting Malaria using CNN
Blood Cell Image Classification for Detecting Malaria using CNNIRJET Journal
 
Comparative Study of Pre-Trained Neural Network Models in Detection of Glaucoma
Comparative Study of Pre-Trained Neural Network Models in Detection of GlaucomaComparative Study of Pre-Trained Neural Network Models in Detection of Glaucoma
Comparative Study of Pre-Trained Neural Network Models in Detection of GlaucomaIRJET Journal
 
MS SQL SERVER:Microsoft neural network and logistic regression
MS SQL SERVER:Microsoft neural network and logistic regressionMS SQL SERVER:Microsoft neural network and logistic regression
MS SQL SERVER:Microsoft neural network and logistic regressionDataminingTools Inc
 
MS SQL SERVER: Neural network and logistic regression
MS SQL SERVER: Neural network and logistic regressionMS SQL SERVER: Neural network and logistic regression
MS SQL SERVER: Neural network and logistic regressionsqlserver content
 

Similar to Dissertation Prsentation - Vaibhav (20)

08 neural networks
08 neural networks08 neural networks
08 neural networks
 
IRJET - Implementation of Neural Network on FPGA
IRJET - Implementation of Neural Network on FPGAIRJET - Implementation of Neural Network on FPGA
IRJET - Implementation of Neural Network on FPGA
 
Facial Emotion Detection on Children's Emotional Face
Facial Emotion Detection on Children's Emotional FaceFacial Emotion Detection on Children's Emotional Face
Facial Emotion Detection on Children's Emotional Face
 
Introduction to Perceptron and Neural Network.pptx
Introduction to Perceptron and Neural Network.pptxIntroduction to Perceptron and Neural Network.pptx
Introduction to Perceptron and Neural Network.pptx
 
Deep learning crash course
Deep learning crash courseDeep learning crash course
Deep learning crash course
 
2. NEURAL NETWORKS USING GENETIC ALGORITHMS.pptx
2. NEURAL NETWORKS USING GENETIC ALGORITHMS.pptx2. NEURAL NETWORKS USING GENETIC ALGORITHMS.pptx
2. NEURAL NETWORKS USING GENETIC ALGORITHMS.pptx
 
Classification and Prediction Based Data Mining Algorithm in Weka Tool
Classification and Prediction Based Data Mining Algorithm in Weka ToolClassification and Prediction Based Data Mining Algorithm in Weka Tool
Classification and Prediction Based Data Mining Algorithm in Weka Tool
 
Unsupervised Feature Learning
Unsupervised Feature LearningUnsupervised Feature Learning
Unsupervised Feature Learning
 
CCC-Bicluster Analysis for Time Series Gene Expression Data
CCC-Bicluster Analysis for Time Series Gene Expression DataCCC-Bicluster Analysis for Time Series Gene Expression Data
CCC-Bicluster Analysis for Time Series Gene Expression Data
 
3_Transfer_Learning.pdf
3_Transfer_Learning.pdf3_Transfer_Learning.pdf
3_Transfer_Learning.pdf
 
Table of Contents
Table of ContentsTable of Contents
Table of Contents
 
Unsupervised Learning Clustering KMean and Hirarchical.pptx
Unsupervised Learning Clustering KMean and Hirarchical.pptxUnsupervised Learning Clustering KMean and Hirarchical.pptx
Unsupervised Learning Clustering KMean and Hirarchical.pptx
 
Neural Networks in Data Mining - “An Overview”
Neural Networks  in Data Mining -   “An Overview”Neural Networks  in Data Mining -   “An Overview”
Neural Networks in Data Mining - “An Overview”
 
Verilog
VerilogVerilog
Verilog
 
Neural Networks and Applications
Neural Networks and ApplicationsNeural Networks and Applications
Neural Networks and Applications
 
Blood Cell Image Classification for Detecting Malaria using CNN
Blood Cell Image Classification for Detecting Malaria using CNNBlood Cell Image Classification for Detecting Malaria using CNN
Blood Cell Image Classification for Detecting Malaria using CNN
 
Comparative Study of Pre-Trained Neural Network Models in Detection of Glaucoma
Comparative Study of Pre-Trained Neural Network Models in Detection of GlaucomaComparative Study of Pre-Trained Neural Network Models in Detection of Glaucoma
Comparative Study of Pre-Trained Neural Network Models in Detection of Glaucoma
 
Unit ii supervised ii
Unit ii supervised iiUnit ii supervised ii
Unit ii supervised ii
 
MS SQL SERVER:Microsoft neural network and logistic regression
MS SQL SERVER:Microsoft neural network and logistic regressionMS SQL SERVER:Microsoft neural network and logistic regression
MS SQL SERVER:Microsoft neural network and logistic regression
 
MS SQL SERVER: Neural network and logistic regression
MS SQL SERVER: Neural network and logistic regressionMS SQL SERVER: Neural network and logistic regression
MS SQL SERVER: Neural network and logistic regression
 

More from Vaibhav Dhattarwal

More from Vaibhav Dhattarwal (6)

Research Paper - Vaibhav
Research Paper - VaibhavResearch Paper - Vaibhav
Research Paper - Vaibhav
 
Project Report -Vaibhav
Project Report -VaibhavProject Report -Vaibhav
Project Report -Vaibhav
 
Seminar Presentation
Seminar PresentationSeminar Presentation
Seminar Presentation
 
Seminar Report Vaibhav
Seminar Report VaibhavSeminar Report Vaibhav
Seminar Report Vaibhav
 
Internship Project Report - Vaibhav
Internship Project Report - VaibhavInternship Project Report - Vaibhav
Internship Project Report - Vaibhav
 
VaibhavDhattarwal_PGP06.054_CV_v2
VaibhavDhattarwal_PGP06.054_CV_v2VaibhavDhattarwal_PGP06.054_CV_v2
VaibhavDhattarwal_PGP06.054_CV_v2
 

Dissertation Prsentation - Vaibhav

  • 1. PREDICTING THE CELLULAR LOCALIZATION SITES OF PROTEINS USING ARTIFICIAL NEURAL NETWORKS Submitted by: Vaibhav Dhattarwal 08211018 Supervisor and Guide: Dr Durga Toshniwal
  • 2. Organization of Presentation • Introduction • Problem Statement • Background • Stages Proposed • Algorithm Implementation • Results & Discussion • Conclusion & Future Work • References
  • 3. Introduction • If one is able to deduce or figure out the sub cellular location of a protein, we can interpret its function, its part in healthy processes and also in commencement of disease, and it’s probable usage as a drug target. • The sub cellular location of a protein can provide valuable information about the role it has in the cellular dynamics. • The intention is to understand their basic or specific function regards to the life of the cell.
  • 5. Problem Statement • “Prediction of Cellular Localization sites of proteins using artificial neural networks” • This report aims to combine the simulated artificial neural networks and the field of bioinformatics to predict the location of protein in a yeast genome. • I have introduced a new sub cellular prediction method based on a back propagation algorithm implemented artificial neural network.
  • 6. Background • Neural Network Definition • Neural Network Applications • Neural Network Categorization • Types of Neural Network • Perceptrons and Learning Algorithm • Classification for Yeast protein data set
  • 7. Background: Neural Network Definition • A neural network is a system which consists of many simple processing elements that are operating in parallel and their function ascertained by network structure, strengths or weights of connections and the computation done at those computing elements/nodes. • A neural network is a massively parallel distributed processor what holds a strong inherent ability to store large amount of experimental knowledge. It has two features: ▫ Knowledge is acquired through a learning procedure. ▫ Interneuron connection strengths or weights are used to store this knowledge.
  • 8. • Computer scientists can find out properties of non-symbolic information processing by using neural networks, they can also find out more about learning systems in general. • Statisticians might be able to use neural networks as flexible and nonlinear regression, and classification models. • Neural Networks can be used by engineers for signal processing and automatic control. • Cognitive scientists deploy neural networks to describe models of thinking and consciousness, which is basically brain function. • Neurophysiologists use neural networks to describe and research medium-level brain function. Background: Neural Network Applications
  • 10. • Supervised Learning based ▫ Feed Forward Topology based ▫ Feed Back Topology based ▫ Competitive Learning based • Unsupervised Learning based ▫ Competitive Learning based ▫ Dimension Reduction Process ▫ Auto Associative Memory Background: Types of Neural Network
  • 11. Background: Perceptrons and Learning Algorithm
  • 12. Background: Perceptrons and Learning Algorithm
  • 14. Background: Yeast Protein Data Set • erl : It is representative of the lumen in the endoplasmic reticulum in the cell. This attribute tells whether an HDEL pattern as n signal for retention is present or not. • vac : This attribute give an indication of the content of amino acids in vacuolar and extracellular proteins after performing a discriminant analysis. • mit : This attribute gives the composition of N terminal region, which has twenty residue, of mitochondrial as well as non-mitochondrial protein after performing a discriminant analysis. • nuc : This feature tell us about nuclear localization patterns as to whether they are present or not. It also holds some information about the frequency of basic residues. • pox : This attribute provides the composition of the sequence of protein after discriminant analysis on them. Not only this, it also indicates the presence of a short sequence motif. • mcg : This is a parameter used in a signal sequence detection method known as McGeoch. However in this case we are using a modified version of it. • gvh : This attribute represents a weight matrix based procedure and is used to detect signal sequences which are cleavable. • alm : This final feature helps us by performing identification on the entire sequence for membrane spanning regions.
  • 16. Proposed Stages Of Work Done • Stage one: Simulating the network • Stage two: Implementing the algorithm • Stage three: Training the Network • Stage four: Obtaining results and comparing performance
  • 17. Stage One: Simulating the network
  • 18. Stage Two: Implementing the algorithm
  • 19. Stage Three: Training the network • The localization site is represented by the class as output. Here are the various classes: ▫ CYT (cytosolic or cytoskeletal) ▫ NUC (nuclear) ▫ MIT (mitochondrial) ▫ ME3 (membrane protein, no N-terminal signal) ▫ ME2 (membrane protein, uncleaved signal) ▫ ME1 (membrane protein, cleaved signal) ▫ EXC (extracellular) ▫ VAC (vacuolar) ▫ POX (peroxisomal) ▫ ERL (endoplasmic reticulum lumen)
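  For training, each of the ten class labels above has to be mapped to a target vector for the network's output layer. A minimal sketch, assuming a one-hot encoding over the ten classes (the ordering below is an arbitrary assumption):

```python
# One output unit per localization class; the target vector is 1.0 at the
# position of the true class and 0.0 everywhere else.
CLASSES = ["CYT", "NUC", "MIT", "ME3", "ME2", "ME1", "EXC", "VAC", "POX", "ERL"]

def one_hot(label):
    vec = [0.0] * len(CLASSES)
    vec[CLASSES.index(label)] = 1.0
    return vec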
  • 20. Stage Four: Obtaining Results & Comparing Performance • The class statistics of the yeast data set are plotted. • The attributes of the data set are plotted to show how they vary across the set. • Performance is evaluated by varying the number of nodes in the hidden layer. • Parameters for comparing performance are: ▫ Accuracy on the test set. ▫ Ratio of correctly classified instances in the training set.
  • 21. Algorithm Implementation • Sigmoid Function & Its derivative • Pseudo Code for a single network layer • Pseudo Code for all network layers • Pseudo Code for training patterns • Pseudo Code for minimizing error
  • 22. Sigmoid Function & Its derivative
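  The activation function used throughout is the logistic sigmoid, σ(x) = 1/(1 + e^(−x)), whose derivative can be written in terms of its own output as σ'(x) = σ(x)(1 − σ(x)). A minimal sketch:

```python
import math

def sigmoid(x):
    # Logistic activation: sigma(x) = 1 / (1 + e^(-x)), output in (0, 1)
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_deriv(x):
    # Derivative expressed through the output: sigma'(x) = sigma(x) * (1 - sigma(x))
    s = sigmoid(x)
    return s * (1.0 - s)
```

  Expressing the derivative through the output is what makes back propagation cheap: the forward-pass activations can be reused directly in the weight-update step.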
  • 23. Pseudo Code for a single network layer • InputLayer2[j] = Wt[0][j] // bias term // • for all elements i in Layer One [ NumUnitLayer1 ] • do ▫ Add to InputLayer2[j] the product OutputLayer1[i] * Wt[i][j] • end for • Compute the sigmoid of InputLayer2[j] to get the activation output
  • 24. Pseudo Code for all network layers • for all elements in hidden layer [ NumUnitHidden ] // computes Hidden Layer PE outputs // • do ▫ InputHidden[j] = WtInput/Hidden[0][j] ▫ for all elements in input layer [ NumUnitInput ] ▫ do  Add to InputHidden[j] the product OutputInput[i] * WtInput/Hidden [i][j] ▫ end for • Compute sigmoid for output • end for • for all elements in output layer [ NumUnitOutput ] // computes Output Layer PE outputs // • do ▫ InputOutput[k] = WtHidden/Output[0][k] ▫ for all elements in hidden layer [ NumUnitHidden ] ▫ do  Add to InputOutput [k] the product OutputHidden[j] * WtHidden/Output [j][k] ▫ end for • Compute sigmoid for output • end for
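  Both loops above perform the same computation, so they can be sketched as one reusable layer routine applied first to the input-to-hidden weights and then to the hidden-to-output weights. The weight layout (row 0 holding the biases) is an assumption mirroring the Wt[0][j] convention in the pseudocode:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def layer_forward(inputs, weights):
    # weights[0][j] is the bias of unit j; weights[i+1][j] connects input i to unit j
    outputs = []
    for j in range(len(weights[0])):
        net = weights[0][j]
        for i, x in enumerate(inputs):
            net += x * weights[i + 1][j]
        outputs.append(sigmoid(net))
    return outputs

def forward(inputs, w_input_hidden, w_hidden_output):
    # Full forward pass: input layer -> hidden layer -> output layer
    hidden = layer_forward(inputs, w_input_hidden)
    return hidden, layer_forward(hidden, w_hidden_output)
```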
  • 26. Pseudo Code for training patterns • Er = 0.0 ; • for all patterns E in the training set • do // computes for all training patterns (E) // ▫ for all elements in hidden layer [ NumUnitHidden ] ▫ do  InputHidden[E][j] = WtInput/Hidden[0][j]  for all elements in input layer [ NumUnitInput ]  do  Add to InputHidden[E][j] the product OutputInput[E][i] * WtInput/Hidden [i][j]  end for ▫ Compute sigmoid for output ▫ end for
  • 27. Pseudo Code for training patterns ▫ for all elements in output layer [ NumUnitOutput ] ▫ do  InputOutput[E][k] = WtHidden/Output[0][k]  for all elements in hidden layer [ NumUnitHidden ]  do  Add to InputOutput[E][k] the product OutputHidden[E][j] * WtHidden/Output [j][k]  end for ▫ Compute sigmoid for output ▫ Add to Er the term (1/2) * (Final[E][k] - Output[E][k]) * (Final[E][k] - Output[E][k]) ▫ end for • end for
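  The quantity Er accumulated above is the usual sum-of-squares error, E = Σ_p Σ_k ½ (Final[p][k] − Output[p][k])², summed over all training patterns p and output units k. A minimal sketch:

```python
def total_error(targets, outputs):
    # E = sum over patterns p and output units k of (1/2) * (t[p][k] - o[p][k])^2
    err = 0.0
    for t_vec, o_vec in zip(targets, outputs):
        for t, o in zip(t_vec, o_vec):
            err += 0.5 * (t - o) ** 2
    return err
```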
  • 28. Pseudo Code for minimizing error • for all elements in hidden layer [ NumUnitHidden ] • do // This loop updates the input-to-hidden weights // ▫ Set ΔWih [0][j] = β * ΔH [j] + α * ΔWih [0][j] // momentum term // ▫ Add to WtInput/Hidden [0][j] the change ΔWih [0][j] ▫ for all elements in input layer [ NumUnitInput ] ▫ do  Set ΔWih [i][j] = β * OutputInput [p][i] * ΔH [j] + α * ΔWih [i][j]  Add to WtInput/Hidden [i][j] the change ΔWih [i][j] ▫ end for • end for
  • 29. Pseudo Code for minimizing error • for all elements in output layer [ NumUnitOutput ] • do // This loop updates the hidden-to-output weights // ▫ Set ΔWho [0][k] = β * ΔOutput [k] + α * ΔWho [0][k] // momentum term // ▫ Add to WtHidden/Output [0][k] the change ΔWho [0][k] ▫ for all elements in hidden layer [ NumUnitHidden ] ▫ do  Set ΔWho [j][k] = β * OutputHidden [p][j] * ΔOutput [k] + α * ΔWho [j][k]  Add to WtHidden/Output [j][k] the change ΔWho [j][k] ▫ end for • end for
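  Both update loops follow the same rule: the new weight change is β times the unit's error term times its incoming signal, plus α times the previous change, where β is the learning rate and α the momentum coefficient. A minimal sketch for one weight matrix, assuming (as in the pseudocode) that row 0 holds the biases, which see a constant input of 1.0:

```python
def update_weights(weights, delta_w, inputs, deltas, beta=0.5, alpha=0.9):
    # weights[0][j] is the bias of unit j; weights[i+1][j] links input i to unit j.
    # Update rule with momentum: dw_new = beta * delta_j * input_i + alpha * dw_old
    for j, d in enumerate(deltas):
        delta_w[0][j] = beta * d + alpha * delta_w[0][j]  # bias sees input 1.0
        weights[0][j] += delta_w[0][j]
        for i, x in enumerate(inputs):
            delta_w[i + 1][j] = beta * x * d + alpha * delta_w[i + 1][j]
            weights[i + 1][j] += delta_w[i + 1][j]
    return weights
```

  The momentum term α * ΔW carries a fraction of the previous step into the current one, which smooths the descent and helps the network cross flat regions of the error surface.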
  • 30. Results & Discussion • Yeast Data Set Class Statistics • Yeast Data Set Attributes • Comparison of Accuracies of various algorithms • Variation of success rate with number of iterations • Variation of success rate with number of nodes in hidden layer • Variation of accuracy in training with the criteria in testing
  • 31. Results: Yeast Data Set class statistics
  • 32. Results: Yeast Data Set Attributes
  • 33. Results: Comparison of Accuracies of various algorithms
  • 34. Results: Variation of Success Rate with no of iterations
  • 35. Results: Variation of Success Rate with no of nodes in Hidden Layer
  • 36. Results: Variation of Accuracy in Training with Criteria in Testing
  • 37. Conclusion • The classes CYT, NUC and MIT have the largest number of instances. • An interesting observation is that the values of erl and pox remain almost constant across the data set, whereas the remaining attributes vary considerably. • The proposed algorithm achieves slightly higher accuracy than the other algorithms compared.
  • 38. Conclusion • Also of note is that considerable success is achieved on the chosen yeast data set, with accuracy of up to 61%. • After about 100 iterations the success rate remains more or less constant. • The success rate reaches a constant value after about 75 nodes in the hidden layer. • The accuracy rises until we reach the limit to which the success-rate criterion can be set.
  • 39. Future Work • Since the prediction of proteins' cellular localization sites is a typical classification problem, many other techniques, such as probability models, Bayesian networks, and K-nearest neighbours, can be compared with ours. • Thus, an aspect of future work is to examine the performance of these techniques on this particular problem.
  • 40. Key References • [1]. "A Probabilistic Classification System for Predicting the Cellular Localization Sites of Proteins", Paul Horton & Kenta Nakai, Intelligent Systems in Molecular Biology, pp. 109-115, 1996. • [2]. "Expert System for Predicting Protein Localization Sites in Gram-Negative Bacteria", Kenta Nakai & Minoru Kanehisa, PROTEINS: Structure, Function, and Genetics, 11:95-110, 1991. • [3]. "A Knowledge Base for Predicting Protein Localization Sites in Eukaryotic Cells", Kenta Nakai & Minoru Kanehisa, Genomics, 14:897-911, 1992. • [4]. "A Comparison of Categorization Algorithms for Predicting the Cellular Localization Sites of Proteins", P. Cairns, C. Huyck, et al., IEEE Engineering in Medicine and Biology, pp. 296-300, 2001. • [5]. "Predicting Protein Subcellular Localization: Past, Present, and Future", P. Donnes & A. Hoglund, Genomics Proteomics Bioinformatics, 2:209-215, 2004. • [6]. "An Overview on Predicting the Subcellular Location of a Protein", Z.P. Feng, In Silico Biology, 2002. • [7]. "Using Neural Networks for Prediction of the Subcellular Location of Proteins", A. Reinhardt & T. Hubbard, Nucleic Acids Res., 26(9):2230-2236, 1998.