1) The document presents a study that uses deep learning to automatically learn feature representations from protein sequential data and build a better protein secondary structure classification model.
2) The goals are to understand how neural network layers represent hierarchical features of one-dimensional sequential data and induce a protein sequence classifier that outperforms existing methods.
3) The researchers report 62.156% accuracy on protein secondary structure prediction using their deep learning model, which appears to detect common features that distinguish secondary structures.
Using Deep Learning to Automatically Learn Feature Representation and
Build a Better Classification Model on Protein Sequential Data
Son Pham, Brian R. King, PhD
Computer Science Department, Bucknell University, Lewisburg, PA
BACKGROUND
Deep learning has recently become one of the most exciting
directions machine learning has seen in years. It has
achieved remarkable success in image recognition, face
detection and audio extraction. While most deep learning
research focuses on 2D image recognition, very few methods
have investigated its use on strictly 1D sequential data,
such as biological sequences.
OUR GOAL
This study investigates the use of deep learning to:
• Understand how each layer of a neural network helps
represent hierarchical features of one-dimensional
sequential data
• Induce a protein sequence classifier that can
outperform existing methods.
SCRATCH 1-D DATABASE
SCRATCH 1-D is an open-source protein database from the
University of California, Irvine. It contains over 5,700
proteins and their respective secondary structures. Each
amino acid in a protein is encoded as one of 20 letters, and
its secondary structure is encoded as either Coil (C),
α-helix (H) or β-strand (E).
PREPROCESSING
In this problem, we slice each protein into overlapping substrings of length 15 using the sliding-window technique. Each
substring is labeled with the secondary structure of its middle amino acid. We also randomly sample 100,000 substrings of length 6
for feature detection.
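The preprocessing step can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function names are hypothetical, the padding of 7 'O' characters per side (so every residue can be the middle of a 15-length window) is an assumption based on the poster's "Padded O's" figure label, and sampling the length-6 substrings uniformly with replacement is also assumed.

```python
import random

PAD = "O"       # padding symbol (assumed from the "Padded O's" figure label)
WINDOW = 15     # substring length used for classification
SAMPLE_LEN = 6  # substring length used for feature detection


def sliding_windows(sequence, structure, window=WINDOW, pad=PAD):
    """Yield (substring, label) pairs, one per residue (hypothetical helper).

    The sequence is padded on both ends so that every residue sits in the
    middle of some window; the label is the structure of that middle residue.
    """
    if len(sequence) != len(structure):
        raise ValueError("sequence and structure must have equal length")
    half = window // 2
    padded = pad * half + sequence + pad * half
    for i, label in enumerate(structure):
        yield padded[i:i + window], label


def random_samples(sequences, n, length=SAMPLE_LEN, seed=0):
    """Draw n random substrings of the given length, with replacement."""
    rng = random.Random(seed)
    eligible = [s for s in sequences if len(s) >= length]
    samples = []
    for _ in range(n):
        s = rng.choice(eligible)
        start = rng.randrange(len(s) - length + 1)
        samples.append(s[start:start + length])
    return samples


seq = "TIKVLFVDDHEMVRIGIS"
ss = "CEEEEEECCCHHHHHHHH"
pairs = list(sliding_windows(seq, ss))
print(len(pairs))   # 18, one window per residue
print(pairs[0])     # ('OOOOOOOTIKVLFVD', 'C')
```

On the real data, `random_samples` would be called with all SCRATCH 1-D sequences and `n=100_000`.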
RESULTS
We achieved 62.156% accuracy on protein secondary
structure prediction. The model appears to detect
common features that can be used to distinguish
between different secondary structures.
FUTURE WORK
We plan to improve the accuracy of protein secondary
structure prediction and to apply the current deep
learning architecture to predicting protein subcellular
localization.
ACKNOWLEDGEMENT
REFERENCES
http://deeplearning.stanford.edu/
http://scratch.proteomics.ics.uci.edu/
Given an amino acid sequence, our goal is to correctly predict
the secondary structure of as many residues as possible.
Example sequence and its respective structure (C = random coil, H = α-helix, E = β-strand):
Sequence:  TIKVLFVDDHEMVRIGIS…
Structure: CEEEEEECCCHHHHHHHH…
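The poster does not spell out how accuracy is computed. A common per-residue measure for three-state prediction is Q3, the fraction of residues whose predicted state matches the observed one. The sketch below assumes that metric; the predicted string is made up for illustration.

```python
def q3_accuracy(predicted, observed):
    """Fraction of residues whose predicted state (C, H or E) is correct."""
    if len(predicted) != len(observed):
        raise ValueError("strings must have equal length")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return correct / len(observed)


observed = "CEEEEEECCCHHHHHHHH"   # example structure from the poster
predicted = "CCEEEEECCCHHHHHHHC"  # hypothetical prediction, 2 residues wrong
print(f"{q3_accuracy(predicted, observed):.2%}")  # 88.89%
```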
FEATURE DETECTION
To detect meaningful features in protein sequences, we use
a sparse auto-encoder, a neural network that learns common
features from a set of unlabeled data. We feed 100,000
sampled substrings of length 6 into the network to find 40
common features.
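The poster cites the Stanford deep learning tutorial but gives no cost function or hyperparameters. A minimal NumPy sketch of the classic sparse auto-encoder formulation (sigmoid units, squared reconstruction error, weight decay, and a KL-divergence sparsity penalty) follows, assuming each length-6 substring is one-hot encoded over 21 symbols (the 20 amino acids plus the padding 'O', matching the 21 in the poster's layer sizes). The hyperparameters `rho`, `beta`, `lam` and `lr`, plain gradient descent instead of a dedicated optimizer, and the random toy samples are all hypothetical.

```python
import numpy as np

ALPHABET = "ARNDCEQGHILKMFPSTWYVO"  # 20 amino acids + padding "O" (assumed)


def one_hot(substring):
    """Encode a substring as a flattened 21 x len(substring) one-hot vector."""
    x = np.zeros((len(ALPHABET), len(substring)))
    for pos, aa in enumerate(substring):
        x[ALPHABET.index(aa), pos] = 1.0
    return x.ravel()


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def init_params(n_visible=21 * 6, n_hidden=40, seed=0):
    """Small uniform random weights and zero biases."""
    rng = np.random.default_rng(seed)
    r = np.sqrt(6.0 / (n_visible + n_hidden + 1))
    W1 = rng.uniform(-r, r, size=(n_hidden, n_visible))
    W2 = rng.uniform(-r, r, size=(n_visible, n_hidden))
    return W1, np.zeros((n_hidden, 1)), W2, np.zeros((n_visible, 1))


def sparse_autoencoder_step(params, X, lr=0.5, rho=0.05, beta=3.0, lam=1e-4):
    """Return (updated params, cost before the update); X is (n_visible, m)."""
    W1, b1, W2, b2 = params
    m = X.shape[1]
    H = sigmoid(W1 @ X + b1)                  # hidden activations h
    Xhat = sigmoid(W2 @ H + b2)               # reconstruction x_hat
    rho_hat = H.mean(axis=1, keepdims=True)   # mean activation per hidden unit

    reconstruction = 0.5 / m * np.sum((Xhat - X) ** 2)
    decay = 0.5 * lam * (np.sum(W1 ** 2) + np.sum(W2 ** 2))
    kl = np.sum(rho * np.log(rho / rho_hat)
                + (1 - rho) * np.log((1 - rho) / (1 - rho_hat)))
    cost = reconstruction + decay + beta * kl

    # Backpropagate all three terms of the cost
    d3 = (Xhat - X) * Xhat * (1 - Xhat)
    sparsity = beta * (-rho / rho_hat + (1 - rho) / (1 - rho_hat))
    d2 = (W2.T @ d3 + sparsity) * H * (1 - H)
    grads = (d2 @ X.T / m + lam * W1, d2.mean(axis=1, keepdims=True),
             d3 @ H.T / m + lam * W2, d3.mean(axis=1, keepdims=True))
    return tuple(p - lr * g for p, g in zip(params, grads)), cost


# Toy run on random length-6 substrings (stand-ins for the 100,000 samples)
rng = np.random.default_rng(1)
samples = ["".join(rng.choice(list(ALPHABET[:20]), size=6)) for _ in range(500)]
X = np.stack([one_hot(s) for s in samples], axis=1)   # shape (126, 500)
params = init_params()
costs = []
for _ in range(200):
    params, cost = sparse_autoencoder_step(params, X)
    costs.append(cost)
features = params[0].reshape(40, 21, 6)  # each learned feature as a 21 x 6 grid
```

Each of the 40 rows of the trained `W1`, reshaped to 21 × 6, is one learned feature; plotting these grids gives a picture comparable to the poster's feature visualization.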
[Figure: Sparse auto-encoder network architecture (input x, weights W, hidden layer h, reconstruction x̂) and visualization of found features, each drawn as a grid of the 20 amino acids (A, R, N, D, C, E, Q, G, H, I, L, K, M, F, P, S, T, W, Y, V) by 6 positions.]
[Figure: Deep learning architecture. The sequence OOOOOOTIKVLFVDDHEMVRIGISSYLSTQSDIEVVGEGASGKEA… (padded with O's) and its labels CEEEEEECCCHHHHHHHHHHHHHCCCEEEEEEECHHHCC… are cut by a sliding window, with random samples drawn for feature detection. Input layer (21 × 15) → hidden layer (21 × 6 × 40) → convolutional layer (40 × 10) → output layer (40 × 10) → P(y = C | x), P(y = E | x), P(y = H | x) for random coil, β-strand and α-helix.]
DEEP LEARNING ARCHITECTURE
After preprocessing, we feed the
substrings into the deep learning model,
a neural network with multiple layers,
each with a different function:
• The first layer takes the length-15
substring as input.
• The second layer detects common
features in length-6 samples.
• The third layer convolves the input with
the found features to determine where
the features occur.
• The fourth layer uses these feature
activations to classify whether the
structure of the substring's middle
residue is a coil, α-helix or β-strand.
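The architecture figure lists a 21 × 15 input, 40 features of size 21 × 6, and a 40 × 10 convolutional output, which is what a valid (unpadded) convolution gives: 15 − 6 + 1 = 10 positions per feature. Below is a minimal NumPy sketch of the third and fourth layers. It assumes a sigmoid activation on the convolved features and a softmax classifier over the flattened 40 × 10 map; the poster does not state these choices, and the random weights only stand in for trained ones.

```python
import numpy as np

ALPHABET = "ARNDCEQGHILKMFPSTWYVO"  # 20 amino acids + padding "O" (assumed)
CLASSES = "CEH"                     # random coil, beta-strand, alpha-helix


def one_hot(substring):
    """Encode a substring as a 21 x len(substring) one-hot matrix."""
    x = np.zeros((len(ALPHABET), len(substring)))
    for pos, aa in enumerate(substring):
        x[ALPHABET.index(aa), pos] = 1.0
    return x


def convolve_features(x, W, b):
    """Slide each 21 x 6 feature along the 21 x 15 input (valid positions only).

    x: (21, 15), W: (40, 21, 6), b: (40,) -> (40, 10) map, as 15 - 6 + 1 = 10.
    """
    n_feat, _, width = W.shape
    n_pos = x.shape[1] - width + 1
    out = np.empty((n_feat, n_pos))
    for t in range(n_pos):
        patch = x[:, t:t + width]                                  # 21 x 6 window
        out[:, t] = np.tensordot(W, patch, axes=([1, 2], [0, 1])) + b
    return 1.0 / (1.0 + np.exp(-out))                              # sigmoid


def softmax_predict(feature_map, theta):
    """Map the flattened 40 x 10 features to P(y = C|x), P(y = E|x), P(y = H|x)."""
    scores = theta @ feature_map.ravel()
    scores = scores - scores.max()        # shift for numerical stability
    p = np.exp(scores)
    return p / p.sum()


rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(40, 21, 6))        # stand-in for learned features
b = np.zeros(40)
theta = rng.normal(scale=0.01, size=(3, 40 * 10))  # hypothetical softmax weights
x = one_hot("OOOOOOOTIKVLFVD")                     # a padded 15-length window
fmap = convolve_features(x, W, b)
probs = softmax_predict(fmap, theta)
print(fmap.shape)                                  # (40, 10)
print(CLASSES[int(np.argmax(probs))])              # predicted class letter
```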
Confusion matrix (rows: true structure; columns: predicted structure)
      Cpred    Epred    Hpred
C     66.91%   10.10%   22.99%
E     27.20%   43.46%   29.35%
H     21.10%   10.03%   68.87%
The confusion matrix shows that the network still has
trouble detecting β-strands: only 43.46% of β-strand
residues are classified correctly. This may be because
β-strands often form contacts with other strands that lie
well beyond the span of the length-15 substrings we
sampled.
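Reading the confusion matrix with rows as the true class (each row sums to about 100%), the diagonal gives per-class recall. This small sketch, using only the numbers reported on the poster, picks out the weakest class and the unweighted (macro) average recall.

```python
# Row = true class, column = predicted class, values in percent (from the poster).
CLASSES = ["C", "E", "H"]
CONFUSION = [
    [66.91, 10.10, 22.99],   # true C
    [27.20, 43.46, 29.35],   # true E
    [21.10, 10.03, 68.87],   # true H
]


def per_class_recall(matrix, classes):
    """Diagonal entry of each row: share of that class predicted correctly."""
    return {c: row[i] for i, (c, row) in enumerate(zip(classes, matrix))}


recall = per_class_recall(CONFUSION, CLASSES)
weakest = min(recall, key=recall.get)
macro = sum(recall.values()) / len(recall)
print(weakest)          # E
print(round(macro, 2))  # 59.75
```

The macro average (59.75%) differs from the reported 62.156% overall accuracy because overall accuracy weights each class's recall by that class's share of residues, which the poster does not report.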
I would like to thank Professor Brian King for his expert
advice and encouragement throughout this research. This
project would also have been impossible without funding
from the Bucknell University Program for Undergraduate
Research.