SlideShare a Scribd company logo
1 of 7
Download to read offline
Vu Thanh Huy, Nguyen Manh Duy, Tran Tuan Anh and PhamThe Bao*
Faculty of Mathematics and Computer Science, VNUHCM-University of Science, Vietnam
*Corresponding author: PhamThe Bao, Faculty of Mathematics and Computer Science, VNUHCM-University of Science, Vietnam
Submission: November 09, 2017; Published: February 23, 2018
Predicting Protein Transmembrane Regionsby
Using LSTM Model
Introduction
Since about 10%-30% of all proteins contain transmembrane
helices [1], explorations of protein transmembrane structures
are critical for many fields of biology including pharmacy
industry. In contrast to protein secondary structures, determining
transmembrane protein structures requires a time-and-finance-
consuming effort. To get over this problem, machine learning
methods were proposed to capture information and relationship
inside the structures of already explored transmembrane proteins
and then, use that knowledge to predict transmembrane regions of
new proteins without any experiments execution.
Rost et al. [2] first introduced artificial neural network to
predict the transmembrane structure of proteins. The input of the
network was a sliding window of 21 residues with the predicted
residue in the middle of the window. The output layer had two units
which indicated the probability of having one of two characteristics-
transmembrane or not, of the middle residue. Hidden Markov
model was first employed by Krough et al. [3] (TMHMM model) and
the group of Tusnady & Simon [4] (HMMTOP model) to solve the
problem. They assigned residueā€™s characteristics (e.g. helix core,
inside loop, outside loop, helix cap, globular domain,...) to cyclic
states of a hidden Markov model. These states were connected by
transition probabilities which would be learned from a training
dataset. Support vector machine (SVM), which is an algorithm used
to classify patterns into two or more groups, has also been used to
predict transmembrane residues [5]. In this paper, we introduce a
novel method using LSTM model - a recurrent neural network, to
solve the problem.
Materials and Methodology
Dataset
We used a collection of well characterized integral membrane
proteinscollectedbyMolleretal.[6,7].Thedatasetwasactuallybuilt
by unifying, updating and verifying existing datasets. The collection
is categorized into 4 groups A,B,C and D in which group A has the
best quality and D has the least. To obtain a high-quality model,
we only used the first 3 groups A, B, and C which comprise of 177
known structure transmembrane proteins. Besides transmembrane
proteins, we also collected 100 non-transmembrane proteins as
negative samples from the training dataset of Krogh et al. [8]. In
total, we had 277 proteins which comprise of both transmembrane
and non-transmembrane sequences. We divided the dataset into
three parts consisting of 193-42-42 proteins (ration 7-1.5-1.5) as
training, validation and testing set respectively.
Features
Weinvestigatedfourfeaturesinourexperiments.Thesefeatures
are commonly used, easy-to-extract and effective in producing high
accuracy models [1,9]. They are:
Hydrophobicity of amino acids: This index of each amino acid
type is obtained from Kyte and Doolittle hydrophobicity scale [10].
Table 1 shows the hydrophobicity index of each type of amino acid.
The extracted feature is the mean of hydrophobicity of 10 residues
before and after the predicted residue and itself:
10
hydrophobicity 10
hydrophobicity of i residue
21
t
th
i t
tx
+
= āˆ’
=
āˆ‘ 	 (1)
Research Article
1/7Copyright Ā© All rights are reserved by PhamThe Bao.
Volume 1 - Issue - 2
Abstract
Predicting transmembrane regions in proteins using machine learning methods is a classical bioinformatics problem. In this paper, we propose
a novel approach to this problem using the Long Short-Term Memory (LSTM) model-a recurrent neural network. This recurrent model was trained
on an already explored set of proteins to capture the relationships between adjacent amino acids. Then it uses this information to predict whether an
amino acid on a new protein is a transmembrane residue or not. With accuracy up to 92.56%, our experiments show better results than other advanced
approaches. Our second contribution is an analysis of four common, easy-to-extract and effective features of an amino acid used in many machine
learning approaches. They are propensity, hydrophobicity, positive charge and identity feature. We implemented our model with combinations of these
four features to investigate the effect of each feature on our systemā€™s performance. Results of the experiments show that our method is as good as other
state-of-the-art methods and therefore is trustworthy to be used to predict transmembrane regions on structure-unexplored proteins. Our analysis of
the four features also points out efficient combinations of them for solving the problem. We hope this information will help later researches in the field
to choose a useful set of features.
Significances of
Bioengineering & BiosciencesC CRIMSON PUBLISHERS
Wings to the Research
ISSN 2637-8078
Significances Bioeng Biosci Copyright Ā© PhamThe Bao
2/7
How to cite this article: V Thanh H, Nguyen M D, Tran T A, PhamT B. Predicting Protein Transmembrane Regionsby Using LSTM Model. Significances
Bioeng Biosci. 1(2). SBB.000510.2018. DOI: 10.31031/SBB.2018.01.000510
Volume 1 - Issue - 2
Table 1: Amino acid hydrophobicity scale.
Amino acid A R N D C E Q G H I
Hydrophobicity 1.8 -4.5 -3.5 -3.5 2.5 -3.5 -3.5 -0.4 -3.2 4.5
Amino acid L K M F P S T W Y V
Hydrophobicity 3.8 -3.9 1.9 2.8 -1.6 -0.8 -0.7 -0.9 -1.3 4.2
Positive charge of amino acids: The feature has value 1 with
residue K-Lysine and R-Arginine and value 0 with the rest because
K and R are the only two positive charge residues. This feature was
used based on the ā€œPositive Inside Ruleā€: connecting ā€˜loopā€™ regions
on the inside of the membrane have more positive charges than
ā€˜loopā€™ regions on the outside [11]. This information provided us
clues about positions of transmembrane regions on proteins. This
feature is expressed as:
charge 1, if t residue is K or R
0, else
th
tx
ļ£±
= ļ£²
ļ£³
	 (2)
Propensity of amino acids: This index is a statistical result
obtained from the entire SWISS-PROT database and it was used as a
feature in PRED-TMR method [12,13]. Table 2 shows the propensity
index for each type of amino acid. The index was calculated by the
formula:
propensity
TM
t
Fi
x
Fi
= 	 (3)
Table 2: Amino acid propensity value (transmembrane potential).
Amino acid A R N D C E Q G H I
Propensity 1.383 0.124 0.389 0.153 1.202 0.131 0.273 1.158 0.395 2.083
Amino acid L K M F P S T W Y V
Propensity 1.845 0.108 1.502 2.235 0.597 0.806 0.879 1.79 1.075 1.756
Where i is the type of tth
residue; FiTM
and Fi are the frequencies
of the type i residue in transmembrane segments and in the entire
SWISS-PROT database respectively.
Identity of amino acids: This feature is a vector of 20 in
length. Each amino acid is expressed by value 1 at its corresponding
element and 0 at the rests:
ļ»
20 in length
identity
order of
t residue
0 ... 0 1 0 ... 0
th
tx
ļ£« ļ£¶
ļ£¬ ļ£·
= ļ£¬ ļ£·
ļ£¬ ļ£·
ļ£­ ļ£ø
ļ€¶ļ€“ļ€“ļ€“ļ€“ļ€“ļ€·ļ€“ļ€“ļ€“ļ€“ļ€“ļ€ø (4)
The assigned order of amino acid type is shown in Table 3.
Table 3: Assigned order of amino acids.
Amino
acid
A R N D C E Q G H I
Number 1 2 3 4 5 6 7 8 9 10
Amino
acid
L K M F P S T W Y V
Number 11 12 13 14 15 16 17 18 19 20
Methodology
Figure 1: System flow chart.
3/7
How to cite this article: V Thanh H, Nguyen M D, Tran T A, PhamT B. Predicting Protein Transmembrane Regionsby Using LSTM Model. Significances
Bioeng Biosci. 1(2). SBB.000510.2018. DOI: 10.31031/SBB.2018.01.000510
Significances Bioeng Biosci Copyright Ā© PhamThe Bao
Volume 1 - Issue - 2
In this paper, we proposed a new approach using Long Short-
Term Memory (LSTM) [14] model. LSTM is a recurrent neural
network structure that has the ability to learn relationships
between far sequential samples. Although neural network model
has been used before [2], the range of surrounding residues (in a
sliding window) that was analyzed to predict the center residue
was very limited. Particularly, only 10 residues before and after
the predicted residue were included in the window. To increase the
analyzed range of surrounding residues, we have to increase the
size of the sliding window which increases the number of nodes in
the whole neural network. This disadvantage stops our model from
learning relationships between far amino acids. However, with
the recurrent structure of LSTM, we can capture the relationship
between unlimitedly far residues without increasing the structureā€™s
size. With this advantage, we expect our model to give more correct
predictions about the transmembrane structure of protein. We can
use information from nearby residues in both sides of the predicted
residue to give better predictions. Therefore, a bidirectional LSTM
model was used to learn the relationship of one residue with its
nearby residues from both sides. Figure 1 demonstrates the flow
chart of our system.
Model configuration
Figure 2: Our bidirectional LSTM.
Since we wanted to investigate the effect of each feature on the
systemā€™s performance, we implemented our model using different
combinations of features along with different configurations. For
each direction of the bidirectional LSTM, our structure had one
layer with the number of hidden units varying according to the
used combination of features. For each feature combination, we had
tried many values of number of units to find the best configuration.
For all combinations consisting of propensity feature (whose length
of feature vector is 20), 150 hidden LSTM units were used, and
other cases not consisting of this feature, 50 units were used. Each
LSTM unit had a recurrent connection to itself and all the units in
the same layer. We used the full standard configuration of LSTM
which had input, output, forget gates and peephole system. For each
amino acid, results returned from the two-directional LSTMs were
summed up and then propagated to an one-layered feed forward
network. This feed forward network had two units in the output
layer, indicating the probability of the residue to be transmembrane
or not. The learning rate used for all cases was 0.01, and the cost
function was a cross-entropy function. Figure 2 demonstrates the
used bidirectional LSTM. Formulas (5) to (19) describes this model
mathematically.
( )(1) (1) (1) (1) (1)
1t f t f t ff W x U h bĻƒ āˆ’= + + (5)
	 ( )(2) (2) (2) (2) (2)
1t f t f t ff W x U h bĻƒ += + + (6)
	 ( )(1) (1) (1) (1) (1)
1t i t i t ii W x U h bĻƒ āˆ’= + + (7)
	 ( )(2) (2) (2) (2) (2)
1t i t i t ii W x U h bĻƒ += + + (8)
	 ( )(1) (1) (1) (1) (1)
1t o t o t oo W x U h bĻƒ āˆ’= + + (9)
	 ( )(2) (2) (2) (2) (2)
1t o t o t oo W x U h bĻƒ += + + (10)
	 ( )(1) (1) (1) (1) (1)
1tanht g t g t gg W x U h bāˆ’= + + (11)
	 ( )(2) (2) (2) (2) (2)
1tanht g t g t gg W x U h b+= + + (12)
	 (1) (1) (1) (1) (1)
1. .t t t t tc f c i gāˆ’= + (13)
	 (2) (2) (2) (2) (2)
1. .t t t t tc f c i g+= + 	 (14)
	 (1) (1) (1)
.tanht t th o c= 		 (15)
	 (2) (2) (2)
.tanht t th o c= 		 (16)
	
(1) (2)
t t th h h= + 		 (17)
	 ( )softmax .t toutput W h b= + 	 (18)
( ) ( )( )transmembrane | , non-transmembrane |t tP class x P class x= = = 	(19)
	 ( )argmaxt ty output= 	(20)
ā€¢	xt
: input feature of the tth
amino acid.
ā€¢	 (1) (1) (1) (1) (1) (1)
, , , , ,t t t t t tf i o g c h : the values of forget gate, input
gate, output gate, candidate value, cell state and hidden state
Significances Bioeng Biosci Copyright Ā© PhamThe Bao
4/7
How to cite this article: V Thanh H, Nguyen M D, Tran T A, PhamT B. Predicting Protein Transmembrane Regionsby Using LSTM Model. Significances
Bioeng Biosci. 1(2). SBB.000510.2018. DOI: 10.31031/SBB.2018.01.000510
Volume 1 - Issue - 2
respectively of the first LSTM (forward LSTM) at the tth
amino
acid.
ā€¢	
(1) (1) (1) (1) (1) (1) (1) (1) (1) (1) (1) (1)
, , , , , , , , , , ,f f f i i i o o o g g gW U b W U b W U b W U b : weights and bias of
the above mentiond values.
ā€¢	
(2) (2) (2) (2) (2) (2)
, , , , ,t t t t t tf i o g c h :valuesofforgetgate,inputgate,output
gate, candidate value, cell state and hidden state respectively of
the second LSTM (backward LSTM) at the tth amino acid.
ā€¢	
(2) (2) (2) (2) (2) (2) (2) (2) (2) (2) (2) (2)
, , , , , , , , , , ,f f f i i i o o o g g gW U b W U b W U b W U b : weights
and bias of the above mentiond values.
ā€¢	 Ļƒ , tanh, softmax: logistic, hyperbolic tangent and
softmax activation function, respectively.
ā€¢	ht
: sum of outputs from two LSTMs at the tth
amino acid.
ā€¢	 , , tW b output : weights and bias of the dense layer and its
output at the tth
amino acid.
ā€¢	yt
: prediction from our model, the probability indicating
whether the tth
amino acid is transmembrane or not.
Post-processing
Although prediction accuracy returned from the bidirectional
LSTM might have been high, we still aimed to improve by a post-
processing step. Actually, this step might reduce slightly the
prediction accuracy on amino acids (only less than 0.3%). However,
itimprovedaccuracyindetectionoftransmembraneregionsoverall,
which was our main objective. The rules for post-processing were:
a.	 Dividing too long transmembrane-predicted region
(more than 30 residues) into two regions with same lengths by
changing the middle residue into non-transmembrane residue
b.	 Rejecting too short transmembrane region (less
than 5 residues) by changing all of its residues into non-
transmembrane residues
c.	 Connecting two predicted transmembrane regions if
they were divided by one non-transmembrane residue and
sum of the length of two regions was less than 25. We did this
by changing the dividing non-transmembrane residue into a
transmembrane residue.
Results and Discussion
Experiments
In order to investigate the effect of the mentioned features on
the performance of our predictions, we implemented the LSTM
model with different combinations of them. To be more specific,
we implemented the model with each feature separately and
all combinations of two, three and four features. Therefore, the
mathematical formula of a sampleā€™s feature was:
{ }hydrophobicity charge propensity identity
, , ,t t t t tx x x x xāŠ‚ (20)
Because these features have different lengths (1 for
hydrophobicity, positive charge, propensity feature and 20 for
identity feature), so for each combination, we had tried and
chosen the best LSTM configuration (number of hidden units)
as mentioned in the model configuration section. Note that the
predictions returned by the bidirectional LSTM were not post-
processed since we aimed to assess the raw performance of each
feature combination.
To compare the quality of our model with other methods, we
also tested common available transmembrane region predictors:
Rost et al. [2], Krogh et al. [3], Hirokawa et al. [15] on our testing
dataset. Systems of these methods are implemented on web servers
at [16] for PHDhtm, [17] for Hirokawa [15,18] for SOSUI. According
to review articles [1,19], these are considered good predictors
which used a variety of approaches. Indeed, comparison result in
[19] indicated that TMHMM, which employed hidden Markov model
to predict transmembrane region, is so far the best transmembrane
helix predictor. PHDhtm and SOSUI are two methods using
different approaches from us which were vanilla neural network
and hydrophobicity analysis - a non-machine-learning method,
respectively.
In this comparison, our model used the combination of three
features, namely hydrophobicity, propensity, and positive charge:
	 { }hydrophobicity charge propensity
, ,t t t tx x x x= 	 (21)
In this comparison test, we also used the post-processing
step to improve accuracy of our model. All the experiments were
implemented on a Dell PC with configuration: an Intel(R) Core(TM)
i7-6700K CPU, 24GB of RAM and an NVIDIA GeForce GTX 750 Ti
GPU.
Results
The tesing results of LSTM models using different combinations
of features are shown in Table 4-7.
a.	 Each feature: (Table 4)
Table 4: Assigned order of amino acids
Feature
used
Hydrophobicity Identity Propensity
Positive
charge
Num. of
hidden
units
50 150 50 50
TPamino
acid 73.61% 78.81% 77.60% 18.86%
TNamino
acid 96.45% 94.12% 96.10% 94.45%
CPamino
acid 90.78% 90.49% 91.80% 74.63%
TPamino
acid: Percentage of correctly predicted transmembrane residues
(true positives).
TNamino
acid: Percentage of correctly predicted non-transmembrane
residues (true negatives).
CPamino
acid: Percentage of correctly predicted transmembrane and non-
transmembrane residues (correct predictions).
b.	 The test results of our model and comparing models
(PHDhtm, TMHMM and SOSUI) are shown in Table 8.
Discussion
According to Table 4, the features hydrophobicity, propensity,
and identity had already worked well on their own. In particular,
they gained higher than 90% of correct predictions on residues.
5/7
How to cite this article: V Thanh H, Nguyen M D, Tran T A, PhamT B. Predicting Protein Transmembrane Regionsby Using LSTM Model. Significances
Bioeng Biosci. 1(2). SBB.000510.2018. DOI: 10.31031/SBB.2018.01.000510
Significances Bioeng Biosci Copyright Ā© PhamThe Bao
Volume 1 - Issue - 2
Notice that although the identity feature did not rely on any
physical or chemical characteristic of a residue, it still produced
high accuracy predictions. By contrast, the positive charge feature
did not work well. Indeed, it just produced about 75% of correct
predictions.
Combinations of these features increased the accuracy
remarkably as shown in Table 5-7. Figure 3 illustrates the
average accuracies of combinations of one, two, three, and four
features. Combinations of more features tended to produce higher
accuracies. However, not all combinations of more features would
produce higher accuracies, as in the case of the combination of
propensity and positive charge comparing to the combination of
hydrophobicity, propensity, and identity. A reasonable explanation
for this exception is the used LSTM configurations might not have
been the best configuration for the corresponding combinations
of features. When comparing with other state-of-the-art methods,
we chose the set of features: hydrophobicity, propensity, and
positive charge. Since this set of features produced high accuracy -
approximately 92.56%, and used only three features.
Table 5: Experimental results of combinations of two features.
Features
used
Hydrophobicity
+ Identity
Hydrophobicity+
Propensity
Hydrophobicity +
Positive charge
Num. of
hidden units
150 50 50
TPamino
acid 81.33% 79.69% 74.22%
TNamino
acid 95.25% 96.11% 96.57%
CPamino
acid 91.50% 91.93% 91.10%
Features
used
Identity Identity Propensity
+ Propensity
+ Positive
charge
+ Positive charge
Num. of
hidden units
150 150 50
TPamino
acid 80.62% 81.78% 80.48%
TNamino
acid 95.11% 95.26% 96.41%
CPamino
acid 91.58% 91.73% 92.62%
Table 6: Experimental results of combinations of three features.
Features used
Hydrophobicity + Identity
+ Propensity
Hydrophobicity + Identity
+ Positive charge
Identity + Propensity+
Positive charge
Hydrophobicity +
Propensity + Positive
charge
Num. of hidden units 150 150 150 50
TPamino
acid 81.36% 81.46% 82.76% 81.28%
TNamino
acid 95.27% 95.92% 95.01% 96.20%
CPamino
acid 91.43% 92.34% 91.86% 92.56%
Table 7: Experimental results of combinations of four features.
Features used Hydrophobicity + Identity + Propensity + Positive charge
Num. of hidden units 150
TPamino
acid 81.54%
TNamino
acid 95.41%
CPamino
acid 92.29%
Figure 3: Accuracies comparisonbetween combinations of features.
We also observed that the combination of identity, propensity,
and positive charge has the highest true positive prediction
accuracy-TPamino acid (82.76%). Thus if we do not want to
miss any transmembrane residue, this combination would be a
Significances Bioeng Biosci Copyright Ā© PhamThe Bao
6/7
How to cite this article: V Thanh H, Nguyen M D, Tran T A, PhamT B. Predicting Protein Transmembrane Regionsby Using LSTM Model. Significances
Bioeng Biosci. 1(2). SBB.000510.2018. DOI: 10.31031/SBB.2018.01.000510
Volume 1 - Issue - 2
reasonable choice to use. On the other hand, the combination of
hydrophobicity, and positive charge has the highest true negative
prediction accuracy-TNamino acid (96.57%), and therefore should
be used when we want to avoid predicting any non-existing
transmembrane region.
Results in Table 8 indicated that our methodā€™s accuracy is as
highas otherpredictors,and even higherthan someofthem. Indeed,
the bidirectional LSTM method is better than all others in some
aspects: true negatives (TNamino acid) and correct predictions
(CPamino acid) of predicted residues, true positives (TPregion) and
false positives (FPregion) of predicted regions.
Table 8: Experimental results of combinations of four features.
Methods TPamino
acid TNamino
acid CPamino
acid TPregion
FPregion
Bidirectional
LSTM (After post-
processing)
81.33% 96.17% 92.51% 91.89% 3/185
PHDhtm 86.15% 91.34% 90.03% 81.62% 42/185
TMHMM 84.19% 94.20% 91.68% 90.81% 10/185
SOSUI 79.18% 93.25% 89.69% 86.49% 8/185
TPregion
: Percentage of correctly predicted transmembrane regions (true positives). A predicted transmembrane region is considered correct if it
overlaps at least 10 residues with the ground truth transmembrane region.
Conclusion
In this study, we introduced a novel approach to predicting
transmembrane region on proteins using a bidirectional LSTM
model. Experiments showed that our system was better than other
state-of-the-art methods in following aspects: true negatives and
correct predictions of predicted residues, true positives and false
positives of predicted regions. Furthermore, we investigated the
effect of each used feature. Results indicated that models using
one of the features: identity, propensity and hydrophobicity had
already produced high accuracies. When combining these features
with residue charge feature or with each other, the accuracies rose
noticeably.
References
1.	 Chen CP, Rost B (2002) State-of-the-art in membrane protein. Appl
Bioinformatics 1(1): 21-35.
2.	 Rost B, Fariselli P, Casadio R (1996) Topology prediction for helical
transmembrane proteins at 86% accuracy. Protein Sci 5(8): 1704-1718.
3.	 Krogh A, Larsson B, Heijne GV, Sonnhammer ELL (2001) Predicting
transmembrane protein topology with a hidden markov model:
Application to complete genomes. J Mol Biol 305(3): 567-580.
4.	 TusnƔdy GE, Simon I (2001) The HMMTOP transmembrane topology
prediction server. Bioinformatics 17(9): 849-850.
5.	 Yuan Z, Mattick JS, Teasdale RD (2004) SVMtm: support vector machines
to predict transmembrane segments. J Comput Chem 25(5): 632-636.
6.	 Mƶller S, Kriventseva EV, Apweiler R (2000) A collection of well
characterised integral membrane proteins. Bioinformatics 16(12):
1159-1160.
7.	 Mƶller S, Kriventseva EV, Apweiler R (2016) Collection of well
characterized integral membrane proteins.
8.	 Krogh A, Larsson B, Heijne GV, Sonnhammer ELL (2016) Training
dataset of TMHMM.
9.	 Punta M, Forrest LR, Bigelow H, Kernytsky A, Liu J, et al. (2007)
Membrane protein prediction methods. Methods 41(4): 460-474.
10.	Kyte J, Doolittle RF (1982) A simple method for displaying the
hydropathic character of a protein. J Mol Biol 157(1): 105-132.
11.	vonHeijne G, Gavel Y (1988) Topogenic signals in integral membrane
proteins. European Journal of Biochemistry. 174(4): 671-678.
12.	Pasquier C, Promponas VJ, Palaios GA, Hamodrakas JS, Hamodrakas
SJ (1999) A novel method for predicting transmembrane segments in
proteins based on a statistical analysis of the SwissProt database: the
PRED-TMR algorithm. Protein Eng 12(5): 381-385.
13.	(2016) Propensity of amino acid.
14.	Hochreiter S, Schmidhuber J (1997) Long short-term memory. Neural
Computation 9(8): 1735-1780.
15.	Hirokawa T, Boon CS, Mitaku S (1998) SOSUI: classification and
secondary structure. Bioinformatics 14(4): 378-379.
16.	Rost B, Fariselli P, Casadio R (1996) PHDhtm server.
17.	Krogh A, Larsson B, Heijne GV, Sonnhammer ELL (2001) TMHMM server.
18.	Hirokawa T, Boon CS, Mitaku S (1998) SOSUI server.
19.	Mƶller S, Croning MD, Apweiler R (2001) Evaluation of methods for the
prediction of membrane spanning regions. Bioinformatics 17(7): 646-
653.
20.	Chen CP, Rost B (2002) State-of-the-art in membrane protein. Applied
Bioinformatics 1(1): 21-35.
21.	Pasquier C, Hamodrakas SJ (1999) An hierarchical artificial neural
network system for the classification of transmembrane proteins.
Protein Engineering, Design & Selection 12(8): 631-634.
7/7
How to cite this article: V Thanh H, Nguyen M D, Tran T A, PhamT B. Predicting Protein Transmembrane Regionsby Using LSTM Model. Significances
Bioeng Biosci. 1(2). SBB.000510.2018. DOI: 10.31031/SBB.2018.01.000510
Significances Bioeng Biosci Copyright Ā© PhamThe Bao
Volume 1 - Issue - 2
Your subsequent submission with Crimson Publishers
will attain the below benefits
ā€¢	 High-level peer review and editorial services
ā€¢	 Freely accessible online immediately upon publication
ā€¢	 Authors retain the copyright to their work
ā€¢	 Licensing it under a Creative Commons license
ā€¢	 Visibility through different online platforms
ā€¢	 Global attainment for your research
ā€¢	 Article availability in different formats (Pdf, E-pub, Full Text)
ā€¢	 Endless customer service
ā€¢	 Reasonable Membership services
ā€¢	 Reprints availability upon request
ā€¢	 One step article tracking system
For possible submissions Click Here Submit Article
Creative Commons Attribution 4.0
International License

More Related Content

What's hot

Presentation1
Presentation1Presentation1
Presentation1firesea
Ā 
HMMā€™S INTERPOLATION OF PROTIENS FOR PROFILE ANALYSIS
HMMā€™S INTERPOLATION OF PROTIENS FOR PROFILE ANALYSISHMMā€™S INTERPOLATION OF PROTIENS FOR PROFILE ANALYSIS
HMMā€™S INTERPOLATION OF PROTIENS FOR PROFILE ANALYSISijcseit
Ā 
A New Approach of Protein Sequence Compression using Repeat Reduction and ASC...
A New Approach of Protein Sequence Compression using Repeat Reduction and ASC...A New Approach of Protein Sequence Compression using Repeat Reduction and ASC...
A New Approach of Protein Sequence Compression using Repeat Reduction and ASC...IOSR Journals
Ā 
Multi objective approach in predicting
Multi objective approach in predictingMulti objective approach in predicting
Multi objective approach in predictingijaia
Ā 
De novo str_prediction
De novo str_predictionDe novo str_prediction
De novo str_predictionShwetA Kumari
Ā 
Protein threading using context specific alignment potential ismb-2013
Protein threading using context specific alignment potential ismb-2013Protein threading using context specific alignment potential ismb-2013
Protein threading using context specific alignment potential ismb-2013Sheng Wang
Ā 
threading and homology modelling methods
threading and homology modelling methodsthreading and homology modelling methods
threading and homology modelling methodsmohammed muzammil
Ā 
In silico structure prediction
In silico structure predictionIn silico structure prediction
In silico structure predictionSubin E K
Ā 
Protein computational analysis
Protein computational analysisProtein computational analysis
Protein computational analysisKinza Irshad
Ā 
Protein 3 d structure prediction
Protein 3 d structure predictionProtein 3 d structure prediction
Protein 3 d structure predictionSamvartika Majumdar
Ā 
MULISA : A New Strategy for Discovery of Protein Functional Motifs and Residues
MULISA : A New Strategy for Discovery of Protein Functional Motifs and ResiduesMULISA : A New Strategy for Discovery of Protein Functional Motifs and Residues
MULISA : A New Strategy for Discovery of Protein Functional Motifs and Residuescsandit
Ā 
A new algorithm for Predicting Metabolic Pathways
A new algorithm for Predicting Metabolic PathwaysA new algorithm for Predicting Metabolic Pathways
A new algorithm for Predicting Metabolic Pathwaysinventionjournals
Ā 
Review and Comparisons between Multiple Ant Based Routing Algorithms in Mobi...
Review and Comparisons between Multiple Ant Based Routing  Algorithms in Mobi...Review and Comparisons between Multiple Ant Based Routing  Algorithms in Mobi...
Review and Comparisons between Multiple Ant Based Routing Algorithms in Mobi...IJMER
Ā 
Threading modeling methods
Threading modeling methodsThreading modeling methods
Threading modeling methodsratanvishwas
Ā 
IJBB-51-3-188-200
IJBB-51-3-188-200IJBB-51-3-188-200
IJBB-51-3-188-200sankar basu
Ā 
Protein 3D structure and classification database
Protein 3D structure and classification database Protein 3D structure and classification database
Protein 3D structure and classification database nadeem akhter
Ā 

What's hot (20)

Presentation1
Presentation1Presentation1
Presentation1
Ā 
HMMā€™S INTERPOLATION OF PROTIENS FOR PROFILE ANALYSIS
HMMā€™S INTERPOLATION OF PROTIENS FOR PROFILE ANALYSISHMMā€™S INTERPOLATION OF PROTIENS FOR PROFILE ANALYSIS
HMMā€™S INTERPOLATION OF PROTIENS FOR PROFILE ANALYSIS
Ā 
A New Approach of Protein Sequence Compression using Repeat Reduction and ASC...
A New Approach of Protein Sequence Compression using Repeat Reduction and ASC...A New Approach of Protein Sequence Compression using Repeat Reduction and ASC...
A New Approach of Protein Sequence Compression using Repeat Reduction and ASC...
Ā 
Multi objective approach in predicting
Multi objective approach in predictingMulti objective approach in predicting
Multi objective approach in predicting
Ā 
De novo str_prediction
De novo str_predictionDe novo str_prediction
De novo str_prediction
Ā 
Protein threading using context specific alignment potential ismb-2013
Protein threading using context specific alignment potential ismb-2013Protein threading using context specific alignment potential ismb-2013
Protein threading using context specific alignment potential ismb-2013
Ā 
demonstration lecture on Homology modeling
demonstration lecture on Homology modelingdemonstration lecture on Homology modeling
demonstration lecture on Homology modeling
Ā 
threading and homology modelling methods
threading and homology modelling methodsthreading and homology modelling methods
threading and homology modelling methods
Ā 
Homology modelling
Homology modellingHomology modelling
Homology modelling
Ā 
In silico structure prediction
In silico structure predictionIn silico structure prediction
In silico structure prediction
Ā 
Protein computational analysis
Protein computational analysisProtein computational analysis
Protein computational analysis
Ā 
Protein 3 d structure prediction
Protein 3 d structure predictionProtein 3 d structure prediction
Protein 3 d structure prediction
Ā 
Protein modeling
Protein modelingProtein modeling
Protein modeling
Ā 
MULISA : A New Strategy for Discovery of Protein Functional Motifs and Residues
MULISA : A New Strategy for Discovery of Protein Functional Motifs and ResiduesMULISA : A New Strategy for Discovery of Protein Functional Motifs and Residues
MULISA : A New Strategy for Discovery of Protein Functional Motifs and Residues
Ā 
A new algorithm for Predicting Metabolic Pathways
A new algorithm for Predicting Metabolic PathwaysA new algorithm for Predicting Metabolic Pathways
A new algorithm for Predicting Metabolic Pathways
Ā 
Review and Comparisons between Multiple Ant Based Routing Algorithms in Mobi...
Review and Comparisons between Multiple Ant Based Routing  Algorithms in Mobi...Review and Comparisons between Multiple Ant Based Routing  Algorithms in Mobi...
Review and Comparisons between Multiple Ant Based Routing Algorithms in Mobi...
Ā 
Threading modeling methods
Threading modeling methodsThreading modeling methods
Threading modeling methods
Ā 
Sir hussain
Sir hussainSir hussain
Sir hussain
Ā 
IJBB-51-3-188-200
IJBB-51-3-188-200IJBB-51-3-188-200
IJBB-51-3-188-200
Ā 
Protein 3D structure and classification database
Protein 3D structure and classification database Protein 3D structure and classification database
Protein 3D structure and classification database
Ā 

Similar to Crimson Publishers-Predicting Protein Transmembrane Regionsby Using LSTM Model

The Assembly, Structure and Activation of Influenza a M2 Transmembrane Domain...
The Assembly, Structure and Activation of Influenza a M2 Transmembrane Domain...The Assembly, Structure and Activation of Influenza a M2 Transmembrane Domain...
The Assembly, Structure and Activation of Influenza a M2 Transmembrane Domain...Haley D. Norman
Ā 
Pattern recognition system based on support vector machines
Pattern recognition system based on support vector machinesPattern recognition system based on support vector machines
Pattern recognition system based on support vector machinesAlexander Decker
Ā 
A Frequency Domain Approach to Protein Sequence Similarity Analysis and Funct...
A Frequency Domain Approach to Protein Sequence Similarity Analysis and Funct...A Frequency Domain Approach to Protein Sequence Similarity Analysis and Funct...
A Frequency Domain Approach to Protein Sequence Similarity Analysis and Funct...sipij
Ā 
BEL110 presentation
BEL110 presentationBEL110 presentation
BEL110 presentationvariable_orr
Ā 
AMINO ACID INTERACTION NETWORK PREDICTION USING MULTI-OBJECTIVE OPTIMIZATION
AMINO ACID INTERACTION NETWORK PREDICTION USING MULTI-OBJECTIVE OPTIMIZATIONAMINO ACID INTERACTION NETWORK PREDICTION USING MULTI-OBJECTIVE OPTIMIZATION
AMINO ACID INTERACTION NETWORK PREDICTION USING MULTI-OBJECTIVE OPTIMIZATIONcscpconf
Ā 
IRJET - A Framework for Predicting Drug Effectiveness in Human Body
IRJET - A Framework for Predicting Drug Effectiveness in Human BodyIRJET - A Framework for Predicting Drug Effectiveness in Human Body
IRJET - A Framework for Predicting Drug Effectiveness in Human BodyIRJET Journal
Ā 
QSAR Studies of the Inhibitory Activity of a Series of Substituted Indole and...
QSAR Studies of the Inhibitory Activity of a Series of Substituted Indole and...QSAR Studies of the Inhibitory Activity of a Series of Substituted Indole and...
QSAR Studies of the Inhibitory Activity of a Series of Substituted Indole and...inventionjournals
Ā 
Deep Learning Meets Biology: How Does a Protein Helix Know Where to Start and...
Deep Learning Meets Biology: How Does a Protein Helix Know Where to Start and...Deep Learning Meets Biology: How Does a Protein Helix Know Where to Start and...
Deep Learning Meets Biology: How Does a Protein Helix Know Where to Start and...Melissa Moody
Ā 
Gutell 121.bibm12 alignment 06392676
Gutell 121.bibm12 alignment 06392676Gutell 121.bibm12 alignment 06392676
Gutell 121.bibm12 alignment 06392676Robin Gutell
Ā 
Comparative Protein Structure Modeling and itsApplications
Comparative Protein Structure Modeling and itsApplicationsComparative Protein Structure Modeling and itsApplications
Comparative Protein Structure Modeling and itsApplicationsLynellBull52
Ā 
Amino acid interaction network prediction using multi objective optimization
Amino acid interaction network prediction using multi objective optimizationAmino acid interaction network prediction using multi objective optimization
Amino acid interaction network prediction using multi objective optimizationcsandit
Ā 
ON THE PREDICTION ACCURACIES OF THREE MOST KNOWN REGULARIZERS : RIDGE REGRESS...
ON THE PREDICTION ACCURACIES OF THREE MOST KNOWN REGULARIZERS : RIDGE REGRESS...ON THE PREDICTION ACCURACIES OF THREE MOST KNOWN REGULARIZERS : RIDGE REGRESS...
ON THE PREDICTION ACCURACIES OF THREE MOST KNOWN REGULARIZERS : RIDGE REGRESS...ijaia
Ā 
Theoretical evaluation of shotgun proteomic analysis strategies; Peptide obse...
Theoretical evaluation of shotgun proteomic analysis strategies; Peptide obse...Theoretical evaluation of shotgun proteomic analysis strategies; Peptide obse...
Theoretical evaluation of shotgun proteomic analysis strategies; Peptide obse...Keiji Takamoto
Ā 
Application of support vector machines for prediction of anti hiv activity of...
Application of support vector machines for prediction of anti hiv activity of...Application of support vector machines for prediction of anti hiv activity of...
Application of support vector machines for prediction of anti hiv activity of...Alexander Decker
Ā 
Gutell 112.j.phys.chem.b.2010.114.13497
Gutell 112.j.phys.chem.b.2010.114.13497Gutell 112.j.phys.chem.b.2010.114.13497
Gutell 112.j.phys.chem.b.2010.114.13497Robin Gutell
Ā 
Research Inventy : International Journal of Engineering and Science
Research Inventy : International Journal of Engineering and ScienceResearch Inventy : International Journal of Engineering and Science
Research Inventy : International Journal of Engineering and Scienceresearchinventy
Ā 

Similar to Crimson Publishers-Predicting Protein Transmembrane Regionsby Using LSTM Model (20)

NMR Chemical Shift Prediction by Atomic Increment-Based Algorithms
NMR Chemical Shift Prediction by Atomic Increment-Based AlgorithmsNMR Chemical Shift Prediction by Atomic Increment-Based Algorithms
NMR Chemical Shift Prediction by Atomic Increment-Based Algorithms
Ā 
The Assembly, Structure and Activation of Influenza a M2 Transmembrane Domain...
The Assembly, Structure and Activation of Influenza a M2 Transmembrane Domain...The Assembly, Structure and Activation of Influenza a M2 Transmembrane Domain...
The Assembly, Structure and Activation of Influenza a M2 Transmembrane Domain...
Ā 
PAPER-EF_AJ_RB_CS
PAPER-EF_AJ_RB_CSPAPER-EF_AJ_RB_CS
PAPER-EF_AJ_RB_CS
Ā 
Pattern recognition system based on support vector machines
Pattern recognition system based on support vector machinesPattern recognition system based on support vector machines
Pattern recognition system based on support vector machines
Ā 
A Frequency Domain Approach to Protein Sequence Similarity Analysis and Funct...
A Frequency Domain Approach to Protein Sequence Similarity Analysis and Funct...A Frequency Domain Approach to Protein Sequence Similarity Analysis and Funct...
A Frequency Domain Approach to Protein Sequence Similarity Analysis and Funct...
Ā 
BEL110 presentation
BEL110 presentationBEL110 presentation
BEL110 presentation
Ā 
AMINO ACID INTERACTION NETWORK PREDICTION USING MULTI-OBJECTIVE OPTIMIZATION
AMINO ACID INTERACTION NETWORK PREDICTION USING MULTI-OBJECTIVE OPTIMIZATIONAMINO ACID INTERACTION NETWORK PREDICTION USING MULTI-OBJECTIVE OPTIMIZATION
AMINO ACID INTERACTION NETWORK PREDICTION USING MULTI-OBJECTIVE OPTIMIZATION
Ā 
1207.2600
1207.26001207.2600
1207.2600
Ā 
IRJET - A Framework for Predicting Drug Effectiveness in Human Body
IRJET - A Framework for Predicting Drug Effectiveness in Human BodyIRJET - A Framework for Predicting Drug Effectiveness in Human Body
IRJET - A Framework for Predicting Drug Effectiveness in Human Body
Ā 
Towards More Reliable 13C and 1H Chemical Shift Prediction: A Systematic Comp...
Towards More Reliable 13C and 1H Chemical Shift Prediction: A Systematic Comp...Towards More Reliable 13C and 1H Chemical Shift Prediction: A Systematic Comp...
Towards More Reliable 13C and 1H Chemical Shift Prediction: A Systematic Comp...
Ā 
QSAR Studies of the Inhibitory Activity of a Series of Substituted Indole and...
QSAR Studies of the Inhibitory Activity of a Series of Substituted Indole and...QSAR Studies of the Inhibitory Activity of a Series of Substituted Indole and...
QSAR Studies of the Inhibitory Activity of a Series of Substituted Indole and...
Ā 
Deep Learning Meets Biology: How Does a Protein Helix Know Where to Start and...
Deep Learning Meets Biology: How Does a Protein Helix Know Where to Start and...Deep Learning Meets Biology: How Does a Protein Helix Know Where to Start and...
Deep Learning Meets Biology: How Does a Protein Helix Know Where to Start and...
Ā 
Gutell 121.bibm12 alignment 06392676
Gutell 121.bibm12 alignment 06392676Gutell 121.bibm12 alignment 06392676
Gutell 121.bibm12 alignment 06392676
Ā 
Comparative Protein Structure Modeling and itsApplications
Comparative Protein Structure Modeling and itsApplicationsComparative Protein Structure Modeling and itsApplications
Comparative Protein Structure Modeling and itsApplications
Ā 
Amino acid interaction network prediction using multi objective optimization
Amino acid interaction network prediction using multi objective optimizationAmino acid interaction network prediction using multi objective optimization
Amino acid interaction network prediction using multi objective optimization
Ā 
ON THE PREDICTION ACCURACIES OF THREE MOST KNOWN REGULARIZERS : RIDGE REGRESS...
ON THE PREDICTION ACCURACIES OF THREE MOST KNOWN REGULARIZERS : RIDGE REGRESS...ON THE PREDICTION ACCURACIES OF THREE MOST KNOWN REGULARIZERS : RIDGE REGRESS...
ON THE PREDICTION ACCURACIES OF THREE MOST KNOWN REGULARIZERS : RIDGE REGRESS...
Ā 
Theoretical evaluation of shotgun proteomic analysis strategies; Peptide obse...
Theoretical evaluation of shotgun proteomic analysis strategies; Peptide obse...Theoretical evaluation of shotgun proteomic analysis strategies; Peptide obse...
Theoretical evaluation of shotgun proteomic analysis strategies; Peptide obse...
Ā 
Application of support vector machines for prediction of anti hiv activity of...
Application of support vector machines for prediction of anti hiv activity of...Application of support vector machines for prediction of anti hiv activity of...
Application of support vector machines for prediction of anti hiv activity of...
Ā 
Gutell 112.j.phys.chem.b.2010.114.13497
Gutell 112.j.phys.chem.b.2010.114.13497Gutell 112.j.phys.chem.b.2010.114.13497
Gutell 112.j.phys.chem.b.2010.114.13497
Ā 
Research Inventy : International Journal of Engineering and Science
Research Inventy : International Journal of Engineering and ScienceResearch Inventy : International Journal of Engineering and Science
Research Inventy : International Journal of Engineering and Science
Ā 

More from CrimsonPublishers-SBB

Increase in Bone Mass Density and Overall Bone Health Following High-Impact o...
Increase in Bone Mass Density and Overall Bone Health Following High-Impact o...Increase in Bone Mass Density and Overall Bone Health Following High-Impact o...
Increase in Bone Mass Density and Overall Bone Health Following High-Impact o...CrimsonPublishers-SBB
Ā 
Effect of Genetic Drift versus Natural Selection on Clutch Traits in Two Popu...
Effect of Genetic Drift versus Natural Selection on Clutch Traits in Two Popu...Effect of Genetic Drift versus Natural Selection on Clutch Traits in Two Popu...
Effect of Genetic Drift versus Natural Selection on Clutch Traits in Two Popu...CrimsonPublishers-SBB
Ā 
Information about the Effect of Climatic Factors and Soil Moisture Status on ...
Information about the Effect of Climatic Factors and Soil Moisture Status on ...Information about the Effect of Climatic Factors and Soil Moisture Status on ...
Information about the Effect of Climatic Factors and Soil Moisture Status on ...CrimsonPublishers-SBB
Ā 
Dangerous Food Items_Crimson Publishers
Dangerous Food Items_Crimson PublishersDangerous Food Items_Crimson Publishers
Dangerous Food Items_Crimson PublishersCrimsonPublishers-SBB
Ā 
Factors Affecting the Contribution of 1st Year Female Students Tutorial Class...
Factors Affecting the Contribution of 1st Year Female Students Tutorial Class...Factors Affecting the Contribution of 1st Year Female Students Tutorial Class...
Factors Affecting the Contribution of 1st Year Female Students Tutorial Class...CrimsonPublishers-SBB
Ā 
A Biomechanical Paradigm Shift: Part I: Transforming Lower Extremity Biomecha...
A Biomechanical Paradigm Shift: Part I: Transforming Lower Extremity Biomecha...A Biomechanical Paradigm Shift: Part I: Transforming Lower Extremity Biomecha...
A Biomechanical Paradigm Shift: Part I: Transforming Lower Extremity Biomecha...CrimsonPublishers-SBB
Ā 
Crimson Publishers-A Review on Biomedical Waste and its Management
Crimson Publishers-A Review on Biomedical Waste and its ManagementCrimson Publishers-A Review on Biomedical Waste and its Management
Crimson Publishers-A Review on Biomedical Waste and its ManagementCrimsonPublishers-SBB
Ā 
Crimson Publishers_Chemical Composition of Tamarix and Almond Fibers
Crimson Publishers_Chemical Composition of Tamarix and Almond FibersCrimson Publishers_Chemical Composition of Tamarix and Almond Fibers
Crimson Publishers_Chemical Composition of Tamarix and Almond FibersCrimsonPublishers-SBB
Ā 
Crimson Publishers-An Overview of Aniline and Chlorinelike Compounds as Envir...
Crimson Publishers-An Overview of Aniline and Chlorinelike Compounds as Envir...Crimson Publishers-An Overview of Aniline and Chlorinelike Compounds as Envir...
Crimson Publishers-An Overview of Aniline and Chlorinelike Compounds as Envir...CrimsonPublishers-SBB
Ā 
Crimson Publishers_A Secreted Protein in the Wnt Signaling Pathway-R-Spondin3...
Crimson Publishers_A Secreted Protein in the Wnt Signaling Pathway-R-Spondin3...Crimson Publishers_A Secreted Protein in the Wnt Signaling Pathway-R-Spondin3...
Crimson Publishers_A Secreted Protein in the Wnt Signaling Pathway-R-Spondin3...CrimsonPublishers-SBB
Ā 
Crimson Publishers_Oral Contraceptives and Breast Cancer Risk: A Study among ...
Crimson Publishers_Oral Contraceptives and Breast Cancer Risk: A Study among ...Crimson Publishers_Oral Contraceptives and Breast Cancer Risk: A Study among ...
Crimson Publishers_Oral Contraceptives and Breast Cancer Risk: A Study among ...CrimsonPublishers-SBB
Ā 
Crimson Publishers- New Hope for Cancer Immunotherapy: Viral Based Cancer Vac...
Crimson Publishers- New Hope for Cancer Immunotherapy: Viral Based Cancer Vac...Crimson Publishers- New Hope for Cancer Immunotherapy: Viral Based Cancer Vac...
Crimson Publishers- New Hope for Cancer Immunotherapy: Viral Based Cancer Vac...CrimsonPublishers-SBB
Ā 
Crimson Publishers- CPG Methylation in G-Quadruplex and IMotif DNA Structures
Crimson Publishers- CPG Methylation in G-Quadruplex and IMotif DNA StructuresCrimson Publishers- CPG Methylation in G-Quadruplex and IMotif DNA Structures
Crimson Publishers- CPG Methylation in G-Quadruplex and IMotif DNA StructuresCrimsonPublishers-SBB
Ā 
Crimson Publishers-Noncoding RNA Modulation for Cardiovascular Therapeutics
Crimson Publishers-Noncoding RNA Modulation for Cardiovascular Therapeutics Crimson Publishers-Noncoding RNA Modulation for Cardiovascular Therapeutics
Crimson Publishers-Noncoding RNA Modulation for Cardiovascular Therapeutics CrimsonPublishers-SBB
Ā 
Crimson Publishers- The Effect of Medial Hamstring Weakness on Soft Tissue Lo...
Crimson Publishers- The Effect of Medial Hamstring Weakness on Soft Tissue Lo...Crimson Publishers- The Effect of Medial Hamstring Weakness on Soft Tissue Lo...
Crimson Publishers- The Effect of Medial Hamstring Weakness on Soft Tissue Lo...CrimsonPublishers-SBB
Ā 
Crimson Publishers -A Sensor Multiplatform for Non Invasive Diagnosis of Pros...
Crimson Publishers -A Sensor Multiplatform for Non Invasive Diagnosis of Pros...Crimson Publishers -A Sensor Multiplatform for Non Invasive Diagnosis of Pros...
Crimson Publishers -A Sensor Multiplatform for Non Invasive Diagnosis of Pros...CrimsonPublishers-SBB
Ā 
Crimson Publishers-The Bioscience of Merging Traditional Chinese Medicine wit...
Crimson Publishers-The Bioscience of Merging Traditional Chinese Medicine wit...Crimson Publishers-The Bioscience of Merging Traditional Chinese Medicine wit...
Crimson Publishers-The Bioscience of Merging Traditional Chinese Medicine wit...CrimsonPublishers-SBB
Ā 
Crimson Publishers- Create the Individualized Digital Twin for Noninvasive Pr...
Crimson Publishers- Create the Individualized Digital Twin for Noninvasive Pr...Crimson Publishers- Create the Individualized Digital Twin for Noninvasive Pr...
Crimson Publishers- Create the Individualized Digital Twin for Noninvasive Pr...CrimsonPublishers-SBB
Ā 
Crimson Publishers-Rapid Microvalve Actuated Electroosmotic Reagent Delivery ...
Crimson Publishers-Rapid Microvalve Actuated Electroosmotic Reagent Delivery ...Crimson Publishers-Rapid Microvalve Actuated Electroosmotic Reagent Delivery ...
Crimson Publishers-Rapid Microvalve Actuated Electroosmotic Reagent Delivery ...CrimsonPublishers-SBB
Ā 
Crimson Publishers-The Risk Level of Viet Nam Listed Medical and Human Resou...
Crimson Publishers-The Risk Level of Viet Nam Listed Medical and Human  Resou...Crimson Publishers-The Risk Level of Viet Nam Listed Medical and Human  Resou...
Crimson Publishers-The Risk Level of Viet Nam Listed Medical and Human Resou...CrimsonPublishers-SBB
Ā 

More from CrimsonPublishers-SBB (20)

Increase in Bone Mass Density and Overall Bone Health Following High-Impact o...
Increase in Bone Mass Density and Overall Bone Health Following High-Impact o...Increase in Bone Mass Density and Overall Bone Health Following High-Impact o...
Increase in Bone Mass Density and Overall Bone Health Following High-Impact o...
Ā 
Effect of Genetic Drift versus Natural Selection on Clutch Traits in Two Popu...
Effect of Genetic Drift versus Natural Selection on Clutch Traits in Two Popu...Effect of Genetic Drift versus Natural Selection on Clutch Traits in Two Popu...
Effect of Genetic Drift versus Natural Selection on Clutch Traits in Two Popu...
Ā 
Information about the Effect of Climatic Factors and Soil Moisture Status on ...
Information about the Effect of Climatic Factors and Soil Moisture Status on ...Information about the Effect of Climatic Factors and Soil Moisture Status on ...
Information about the Effect of Climatic Factors and Soil Moisture Status on ...
Ā 
Dangerous Food Items_Crimson Publishers
Dangerous Food Items_Crimson PublishersDangerous Food Items_Crimson Publishers
Dangerous Food Items_Crimson Publishers
Ā 
Factors Affecting the Contribution of 1st Year Female Students Tutorial Class...
Factors Affecting the Contribution of 1st Year Female Students Tutorial Class...Factors Affecting the Contribution of 1st Year Female Students Tutorial Class...
Factors Affecting the Contribution of 1st Year Female Students Tutorial Class...
Ā 
A Biomechanical Paradigm Shift: Part I: Transforming Lower Extremity Biomecha...
A Biomechanical Paradigm Shift: Part I: Transforming Lower Extremity Biomecha...A Biomechanical Paradigm Shift: Part I: Transforming Lower Extremity Biomecha...
A Biomechanical Paradigm Shift: Part I: Transforming Lower Extremity Biomecha...
Ā 
Crimson Publishers-A Review on Biomedical Waste and its Management
Crimson Publishers-A Review on Biomedical Waste and its ManagementCrimson Publishers-A Review on Biomedical Waste and its Management
Crimson Publishers-A Review on Biomedical Waste and its Management
Ā 
Crimson Publishers_Chemical Composition of Tamarix and Almond Fibers
Crimson Publishers_Chemical Composition of Tamarix and Almond FibersCrimson Publishers_Chemical Composition of Tamarix and Almond Fibers
Crimson Publishers_Chemical Composition of Tamarix and Almond Fibers
Ā 
Crimson Publishers-An Overview of Aniline and Chlorinelike Compounds as Envir...
Crimson Publishers-An Overview of Aniline and Chlorinelike Compounds as Envir...Crimson Publishers-An Overview of Aniline and Chlorinelike Compounds as Envir...
Crimson Publishers-An Overview of Aniline and Chlorinelike Compounds as Envir...
Ā 
Crimson Publishers_A Secreted Protein in the Wnt Signaling Pathway-R-Spondin3...
Crimson Publishers_A Secreted Protein in the Wnt Signaling Pathway-R-Spondin3...Crimson Publishers_A Secreted Protein in the Wnt Signaling Pathway-R-Spondin3...
Crimson Publishers_A Secreted Protein in the Wnt Signaling Pathway-R-Spondin3...
Ā 
Crimson Publishers_Oral Contraceptives and Breast Cancer Risk: A Study among ...
Crimson Publishers_Oral Contraceptives and Breast Cancer Risk: A Study among ...Crimson Publishers_Oral Contraceptives and Breast Cancer Risk: A Study among ...
Crimson Publishers_Oral Contraceptives and Breast Cancer Risk: A Study among ...
Ā 
Crimson Publishers- New Hope for Cancer Immunotherapy: Viral Based Cancer Vac...
Crimson Publishers- New Hope for Cancer Immunotherapy: Viral Based Cancer Vac...Crimson Publishers- New Hope for Cancer Immunotherapy: Viral Based Cancer Vac...
Crimson Publishers- New Hope for Cancer Immunotherapy: Viral Based Cancer Vac...
Ā 
Crimson Publishers- CPG Methylation in G-Quadruplex and IMotif DNA Structures
Crimson Publishers- CPG Methylation in G-Quadruplex and IMotif DNA StructuresCrimson Publishers- CPG Methylation in G-Quadruplex and IMotif DNA Structures
Crimson Publishers- CPG Methylation in G-Quadruplex and IMotif DNA Structures
Ā 
Crimson Publishers-Noncoding RNA Modulation for Cardiovascular Therapeutics
Crimson Publishers-Noncoding RNA Modulation for Cardiovascular Therapeutics Crimson Publishers-Noncoding RNA Modulation for Cardiovascular Therapeutics
Crimson Publishers-Noncoding RNA Modulation for Cardiovascular Therapeutics
Ā 
Crimson Publishers- The Effect of Medial Hamstring Weakness on Soft Tissue Lo...
Crimson Publishers- The Effect of Medial Hamstring Weakness on Soft Tissue Lo...Crimson Publishers- The Effect of Medial Hamstring Weakness on Soft Tissue Lo...
Crimson Publishers- The Effect of Medial Hamstring Weakness on Soft Tissue Lo...
Ā 
Crimson Publishers -A Sensor Multiplatform for Non Invasive Diagnosis of Pros...
Crimson Publishers -A Sensor Multiplatform for Non Invasive Diagnosis of Pros...Crimson Publishers -A Sensor Multiplatform for Non Invasive Diagnosis of Pros...
Crimson Publishers -A Sensor Multiplatform for Non Invasive Diagnosis of Pros...
Ā 
Crimson Publishers-The Bioscience of Merging Traditional Chinese Medicine wit...
Crimson Publishers-The Bioscience of Merging Traditional Chinese Medicine wit...Crimson Publishers-The Bioscience of Merging Traditional Chinese Medicine wit...
Crimson Publishers-The Bioscience of Merging Traditional Chinese Medicine wit...
Ā 
Crimson Publishers- Create the Individualized Digital Twin for Noninvasive Pr...
Crimson Publishers- Create the Individualized Digital Twin for Noninvasive Pr...Crimson Publishers- Create the Individualized Digital Twin for Noninvasive Pr...
Crimson Publishers- Create the Individualized Digital Twin for Noninvasive Pr...
Ā 
Crimson Publishers-Rapid Microvalve Actuated Electroosmotic Reagent Delivery ...
Crimson Publishers-Rapid Microvalve Actuated Electroosmotic Reagent Delivery ...Crimson Publishers-Rapid Microvalve Actuated Electroosmotic Reagent Delivery ...
Crimson Publishers-Rapid Microvalve Actuated Electroosmotic Reagent Delivery ...
Ā 
Crimson Publishers-The Risk Level of Viet Nam Listed Medical and Human Resou...
Crimson Publishers-The Risk Level of Viet Nam Listed Medical and Human  Resou...Crimson Publishers-The Risk Level of Viet Nam Listed Medical and Human  Resou...
Crimson Publishers-The Risk Level of Viet Nam Listed Medical and Human Resou...
Ā 

Recently uploaded

HARMONY IN THE HUMAN BEING - Unit-II UHV-2
HARMONY IN THE HUMAN BEING - Unit-II UHV-2HARMONY IN THE HUMAN BEING - Unit-II UHV-2
HARMONY IN THE HUMAN BEING - Unit-II UHV-2RajaP95
Ā 
Microscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptxMicroscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptxpurnimasatapathy1234
Ā 
Artificial-Intelligence-in-Electronics (K).pptx
Artificial-Intelligence-in-Electronics (K).pptxArtificial-Intelligence-in-Electronics (K).pptx
Artificial-Intelligence-in-Electronics (K).pptxbritheesh05
Ā 
ZXCTN 5804 / ZTE PTN / ZTE POTN / ZTE 5804 PTN / ZTE POTN 5804 ( 100/200 GE Z...
ZXCTN 5804 / ZTE PTN / ZTE POTN / ZTE 5804 PTN / ZTE POTN 5804 ( 100/200 GE Z...ZXCTN 5804 / ZTE PTN / ZTE POTN / ZTE 5804 PTN / ZTE POTN 5804 ( 100/200 GE Z...
ZXCTN 5804 / ZTE PTN / ZTE POTN / ZTE 5804 PTN / ZTE POTN 5804 ( 100/200 GE Z...ZTE
Ā 
Past, Present and Future of Generative AI
Past, Present and Future of Generative AIPast, Present and Future of Generative AI
Past, Present and Future of Generative AIabhishek36461
Ā 
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur High Profile
Ā 
microprocessor 8085 and its interfacing
microprocessor 8085  and its interfacingmicroprocessor 8085  and its interfacing
microprocessor 8085 and its interfacingjaychoudhary37
Ā 
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdf
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdfCCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdf
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdfAsst.prof M.Gokilavani
Ā 
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...ranjana rawat
Ā 
SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )Tsuyoshi Horigome
Ā 
chaitra-1.pptx fake news detection using machine learning
chaitra-1.pptx  fake news detection using machine learningchaitra-1.pptx  fake news detection using machine learning
chaitra-1.pptx fake news detection using machine learningmisbanausheenparvam
Ā 
Call Girls Delhi {Jodhpur} 9711199012 high profile service
Call Girls Delhi {Jodhpur} 9711199012 high profile serviceCall Girls Delhi {Jodhpur} 9711199012 high profile service
Call Girls Delhi {Jodhpur} 9711199012 high profile servicerehmti665
Ā 
Introduction to Microprocesso programming and interfacing.pptx
Introduction to Microprocesso programming and interfacing.pptxIntroduction to Microprocesso programming and interfacing.pptx
Introduction to Microprocesso programming and interfacing.pptxvipinkmenon1
Ā 
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escortsranjana rawat
Ā 
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130Suhani Kapoor
Ā 
Gurgaon āœ”ļø9711147426āœØCall In girls Gurgaon Sector 51 escort service
Gurgaon āœ”ļø9711147426āœØCall In girls Gurgaon Sector 51 escort serviceGurgaon āœ”ļø9711147426āœØCall In girls Gurgaon Sector 51 escort service
Gurgaon āœ”ļø9711147426āœØCall In girls Gurgaon Sector 51 escort servicejennyeacort
Ā 

Recently uploaded (20)

HARMONY IN THE HUMAN BEING - Unit-II UHV-2
HARMONY IN THE HUMAN BEING - Unit-II UHV-2HARMONY IN THE HUMAN BEING - Unit-II UHV-2
HARMONY IN THE HUMAN BEING - Unit-II UHV-2
Ā 
9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf
9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf
9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf
Ā 
Microscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptxMicroscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptx
Ā 
Artificial-Intelligence-in-Electronics (K).pptx
Artificial-Intelligence-in-Electronics (K).pptxArtificial-Intelligence-in-Electronics (K).pptx
Artificial-Intelligence-in-Electronics (K).pptx
Ā 
ZXCTN 5804 / ZTE PTN / ZTE POTN / ZTE 5804 PTN / ZTE POTN 5804 ( 100/200 GE Z...
ZXCTN 5804 / ZTE PTN / ZTE POTN / ZTE 5804 PTN / ZTE POTN 5804 ( 100/200 GE Z...ZXCTN 5804 / ZTE PTN / ZTE POTN / ZTE 5804 PTN / ZTE POTN 5804 ( 100/200 GE Z...
ZXCTN 5804 / ZTE PTN / ZTE POTN / ZTE 5804 PTN / ZTE POTN 5804 ( 100/200 GE Z...
Ā 
Exploring_Network_Security_with_JA3_by_Rakesh Seal.pptx
Exploring_Network_Security_with_JA3_by_Rakesh Seal.pptxExploring_Network_Security_with_JA3_by_Rakesh Seal.pptx
Exploring_Network_Security_with_JA3_by_Rakesh Seal.pptx
Ā 
Call Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCR
Call Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCRCall Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCR
Call Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCR
Ā 
Past, Present and Future of Generative AI
Past, Present and Future of Generative AIPast, Present and Future of Generative AI
Past, Present and Future of Generative AI
Ā 
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
Ā 
microprocessor 8085 and its interfacing
microprocessor 8085  and its interfacingmicroprocessor 8085  and its interfacing
microprocessor 8085 and its interfacing
Ā 
young call girls in Rajiv ChowkšŸ” 9953056974 šŸ” Delhi escort Service
young call girls in Rajiv ChowkšŸ” 9953056974 šŸ” Delhi escort Serviceyoung call girls in Rajiv ChowkšŸ” 9953056974 šŸ” Delhi escort Service
young call girls in Rajiv ChowkšŸ” 9953056974 šŸ” Delhi escort Service
Ā 
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdf
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdfCCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdf
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdf
Ā 
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
Ā 
SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )
Ā 
chaitra-1.pptx fake news detection using machine learning
chaitra-1.pptx  fake news detection using machine learningchaitra-1.pptx  fake news detection using machine learning
chaitra-1.pptx fake news detection using machine learning
Ā 
Call Girls Delhi {Jodhpur} 9711199012 high profile service
Call Girls Delhi {Jodhpur} 9711199012 high profile serviceCall Girls Delhi {Jodhpur} 9711199012 high profile service
Call Girls Delhi {Jodhpur} 9711199012 high profile service
Ā 
Introduction to Microprocesso programming and interfacing.pptx
Introduction to Microprocesso programming and interfacing.pptxIntroduction to Microprocesso programming and interfacing.pptx
Introduction to Microprocesso programming and interfacing.pptx
Ā 
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
Ā 
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
Ā 
Gurgaon āœ”ļø9711147426āœØCall In girls Gurgaon Sector 51 escort service
Gurgaon āœ”ļø9711147426āœØCall In girls Gurgaon Sector 51 escort serviceGurgaon āœ”ļø9711147426āœØCall In girls Gurgaon Sector 51 escort service
Gurgaon āœ”ļø9711147426āœØCall In girls Gurgaon Sector 51 escort service
Ā 

Crimson Publishers-Predicting Protein Transmembrane Regionsby Using LSTM Model

  • 1. Vu Thanh Huy, Nguyen Manh Duy, Tran Tuan Anh and PhamThe Bao* Faculty of Mathematics and Computer Science, VNUHCM-University of Science, Vietnam *Corresponding author: PhamThe Bao, Faculty of Mathematics and Computer Science, VNUHCM-University of Science, Vietnam Submission: November 09, 2017; Published: February 23, 2018 Predicting Protein Transmembrane Regionsby Using LSTM Model Introduction Since about 10%-30% of all proteins contain transmembrane helices [1], explorations of protein transmembrane structures are critical for many fields of biology including pharmacy industry. In contrast to protein secondary structures, determining transmembrane protein structures requires a time-and-finance- consuming effort. To get over this problem, machine learning methods were proposed to capture information and relationship inside the structures of already explored transmembrane proteins and then, use that knowledge to predict transmembrane regions of new proteins without any experiments execution. Rost et al. [2] first introduced artificial neural network to predict the transmembrane structure of proteins. The input of the network was a sliding window of 21 residues with the predicted residue in the middle of the window. The output layer had two units which indicated the probability of having one of two characteristics- transmembrane or not, of the middle residue. Hidden Markov model was first employed by Krough et al. [3] (TMHMM model) and the group of Tusnady & Simon [4] (HMMTOP model) to solve the problem. They assigned residueā€™s characteristics (e.g. helix core, inside loop, outside loop, helix cap, globular domain,...) to cyclic states of a hidden Markov model. These states were connected by transition probabilities which would be learned from a training dataset. Support vector machine (SVM), which is an algorithm used to classify patterns into two or more groups, has also been used to predict transmembrane residues [5]. In this paper, we introduce a novel method using LSTM model - a recurrent neural network, to solve the problem. Materials and Methodology Dataset We used a collection of well characterized integral membrane proteinscollectedbyMolleretal.[6,7].Thedatasetwasactuallybuilt by unifying, updating and verifying existing datasets. The collection is categorized into 4 groups A,B,C and D in which group A has the best quality and D has the least. To obtain a high-quality model, we only used the first 3 groups A, B, and C which comprise of 177 known structure transmembrane proteins. Besides transmembrane proteins, we also collected 100 non-transmembrane proteins as negative samples from the training dataset of Krogh et al. [8]. In total, we had 277 proteins which comprise of both transmembrane and non-transmembrane sequences. We divided the dataset into three parts consisting of 193-42-42 proteins (ration 7-1.5-1.5) as training, validation and testing set respectively. Features Weinvestigatedfourfeaturesinourexperiments.Thesefeatures are commonly used, easy-to-extract and effective in producing high accuracy models [1,9]. They are: Hydrophobicity of amino acids: This index of each amino acid type is obtained from Kyte and Doolittle hydrophobicity scale [10]. Table 1 shows the hydrophobicity index of each type of amino acid. The extracted feature is the mean of hydrophobicity of 10 residues before and after the predicted residue and itself: 10 hydrophobicity 10 hydrophobicity of i residue 21 t th i t tx + = āˆ’ = āˆ‘ (1) Research Article 1/7Copyright Ā© All rights are reserved by PhamThe Bao. Volume 1 - Issue - 2 Abstract Predicting transmembrane regions in proteins using machine learning methods is a classical bioinformatics problem. In this paper, we propose a novel approach to this problem using the Long Short-Term Memory (LSTM) model-a recurrent neural network. This recurrent model was trained on an already explored set of proteins to capture the relationships between adjacent amino acids. Then it uses this information to predict whether an amino acid on a new protein is a transmembrane residue or not. With accuracy up to 92.56%, our experiments show better results than other advanced approaches. Our second contribution is an analysis of four common, easy-to-extract and effective features of an amino acid used in many machine learning approaches. They are propensity, hydrophobicity, positive charge and identity feature. We implemented our model with combinations of these four features to investigate the effect of each feature on our systemā€™s performance. Results of the experiments show that our method is as good as other state-of-the-art methods and therefore is trustworthy to be used to predict transmembrane regions on structure-unexplored proteins. Our analysis of the four features also points out efficient combinations of them for solving the problem. We hope this information will help later researches in the field to choose a useful set of features. Significances of Bioengineering & BiosciencesC CRIMSON PUBLISHERS Wings to the Research ISSN 2637-8078
  • 2. Significances Bioeng Biosci Copyright Ā© PhamThe Bao 2/7 How to cite this article: V Thanh H, Nguyen M D, Tran T A, PhamT B. Predicting Protein Transmembrane Regionsby Using LSTM Model. Significances Bioeng Biosci. 1(2). SBB.000510.2018. DOI: 10.31031/SBB.2018.01.000510 Volume 1 - Issue - 2 Table 1: Amino acid hydrophobicity scale. Amino acid A R N D C E Q G H I Hydrophobicity 1.8 -4.5 -3.5 -3.5 2.5 -3.5 -3.5 -0.4 -3.2 4.5 Amino acid L K M F P S T W Y V Hydrophobicity 3.8 -3.9 1.9 2.8 -1.6 -0.8 -0.7 -0.9 -1.3 4.2 Positive charge of amino acids: The feature has value 1 with residue K-Lysine and R-Arginine and value 0 with the rest because K and R are the only two positive charge residues. This feature was used based on the ā€œPositive Inside Ruleā€: connecting ā€˜loopā€™ regions on the inside of the membrane have more positive charges than ā€˜loopā€™ regions on the outside [11]. This information provided us clues about positions of transmembrane regions on proteins. This feature is expressed as: charge 1, if t residue is K or R 0, else th tx ļ£± = ļ£² ļ£³ (2) Propensity of amino acids: This index is a statistical result obtained from the entire SWISS-PROT database and it was used as a feature in PRED-TMR method [12,13]. Table 2 shows the propensity index for each type of amino acid. The index was calculated by the formula: propensity TM t Fi x Fi = (3) Table 2: Amino acid propensity value (transmembrane potential). Amino acid A R N D C E Q G H I Propensity 1.383 0.124 0.389 0.153 1.202 0.131 0.273 1.158 0.395 2.083 Amino acid L K M F P S T W Y V Propensity 1.845 0.108 1.502 2.235 0.597 0.806 0.879 1.79 1.075 1.756 Where i is the type of tth residue; FiTM and Fi are the frequencies of the type i residue in transmembrane segments and in the entire SWISS-PROT database respectively. Identity of amino acids: This feature is a vector of 20 in length. Each amino acid is expressed by value 1 at its corresponding element and 0 at the rests: ļ» 20 in length identity order of t residue 0 ... 0 1 0 ... 0 th tx ļ£« ļ£¶ ļ£¬ ļ£· = ļ£¬ ļ£· ļ£¬ ļ£· ļ£­ ļ£ø ļ€¶ļ€“ļ€“ļ€“ļ€“ļ€“ļ€·ļ€“ļ€“ļ€“ļ€“ļ€“ļ€ø (4) The assigned order of amino acid type is shown in Table 3. Table 3: Assigned order of amino acids. Amino acid A R N D C E Q G H I Number 1 2 3 4 5 6 7 8 9 10 Amino acid L K M F P S T W Y V Number 11 12 13 14 15 16 17 18 19 20 Methodology Figure 1: System flow chart.
  • 3. 3/7 How to cite this article: V Thanh H, Nguyen M D, Tran T A, PhamT B. Predicting Protein Transmembrane Regionsby Using LSTM Model. Significances Bioeng Biosci. 1(2). SBB.000510.2018. DOI: 10.31031/SBB.2018.01.000510 Significances Bioeng Biosci Copyright Ā© PhamThe Bao Volume 1 - Issue - 2 In this paper, we proposed a new approach using Long Short- Term Memory (LSTM) [14] model. LSTM is a recurrent neural network structure that has the ability to learn relationships between far sequential samples. Although neural network model has been used before [2], the range of surrounding residues (in a sliding window) that was analyzed to predict the center residue was very limited. Particularly, only 10 residues before and after the predicted residue were included in the window. To increase the analyzed range of surrounding residues, we have to increase the size of the sliding window which increases the number of nodes in the whole neural network. This disadvantage stops our model from learning relationships between far amino acids. However, with the recurrent structure of LSTM, we can capture the relationship between unlimitedly far residues without increasing the structureā€™s size. With this advantage, we expect our model to give more correct predictions about the transmembrane structure of protein. We can use information from nearby residues in both sides of the predicted residue to give better predictions. Therefore, a bidirectional LSTM model was used to learn the relationship of one residue with its nearby residues from both sides. Figure 1 demonstrates the flow chart of our system. Model configuration Figure 2: Our bidirectional LSTM. Since we wanted to investigate the effect of each feature on the systemā€™s performance, we implemented our model using different combinations of features along with different configurations. For each direction of the bidirectional LSTM, our structure had one layer with the number of hidden units varying according to the used combination of features. For each feature combination, we had tried many values of number of units to find the best configuration. For all combinations consisting of propensity feature (whose length of feature vector is 20), 150 hidden LSTM units were used, and other cases not consisting of this feature, 50 units were used. Each LSTM unit had a recurrent connection to itself and all the units in the same layer. We used the full standard configuration of LSTM which had input, output, forget gates and peephole system. For each amino acid, results returned from the two-directional LSTMs were summed up and then propagated to an one-layered feed forward network. This feed forward network had two units in the output layer, indicating the probability of the residue to be transmembrane or not. The learning rate used for all cases was 0.01, and the cost function was a cross-entropy function. Figure 2 demonstrates the used bidirectional LSTM. Formulas (5) to (19) describes this model mathematically. ( )(1) (1) (1) (1) (1) 1t f t f t ff W x U h bĻƒ āˆ’= + + (5) ( )(2) (2) (2) (2) (2) 1t f t f t ff W x U h bĻƒ += + + (6) ( )(1) (1) (1) (1) (1) 1t i t i t ii W x U h bĻƒ āˆ’= + + (7) ( )(2) (2) (2) (2) (2) 1t i t i t ii W x U h bĻƒ += + + (8) ( )(1) (1) (1) (1) (1) 1t o t o t oo W x U h bĻƒ āˆ’= + + (9) ( )(2) (2) (2) (2) (2) 1t o t o t oo W x U h bĻƒ += + + (10) ( )(1) (1) (1) (1) (1) 1tanht g t g t gg W x U h bāˆ’= + + (11) ( )(2) (2) (2) (2) (2) 1tanht g t g t gg W x U h b+= + + (12) (1) (1) (1) (1) (1) 1. .t t t t tc f c i gāˆ’= + (13) (2) (2) (2) (2) (2) 1. .t t t t tc f c i g+= + (14) (1) (1) (1) .tanht t th o c= (15) (2) (2) (2) .tanht t th o c= (16) (1) (2) t t th h h= + (17) ( )softmax .t toutput W h b= + (18) ( ) ( )( )transmembrane | , non-transmembrane |t tP class x P class x= = = (19) ( )argmaxt ty output= (20) ā€¢ xt : input feature of the tth amino acid. ā€¢ (1) (1) (1) (1) (1) (1) , , , , ,t t t t t tf i o g c h : the values of forget gate, input gate, output gate, candidate value, cell state and hidden state
  • 4. Significances Bioeng Biosci Copyright Ā© PhamThe Bao 4/7 How to cite this article: V Thanh H, Nguyen M D, Tran T A, PhamT B. Predicting Protein Transmembrane Regionsby Using LSTM Model. Significances Bioeng Biosci. 1(2). SBB.000510.2018. DOI: 10.31031/SBB.2018.01.000510 Volume 1 - Issue - 2 respectively of the first LSTM (forward LSTM) at the tth amino acid. ā€¢ (1) (1) (1) (1) (1) (1) (1) (1) (1) (1) (1) (1) , , , , , , , , , , ,f f f i i i o o o g g gW U b W U b W U b W U b : weights and bias of the above mentiond values. ā€¢ (2) (2) (2) (2) (2) (2) , , , , ,t t t t t tf i o g c h :valuesofforgetgate,inputgate,output gate, candidate value, cell state and hidden state respectively of the second LSTM (backward LSTM) at the tth amino acid. ā€¢ (2) (2) (2) (2) (2) (2) (2) (2) (2) (2) (2) (2) , , , , , , , , , , ,f f f i i i o o o g g gW U b W U b W U b W U b : weights and bias of the above mentiond values. ā€¢ Ļƒ , tanh, softmax: logistic, hyperbolic tangent and softmax activation function, respectively. ā€¢ ht : sum of outputs from two LSTMs at the tth amino acid. ā€¢ , , tW b output : weights and bias of the dense layer and its output at the tth amino acid. ā€¢ yt : prediction from our model, the probability indicating whether the tth amino acid is transmembrane or not. Post-processing Although prediction accuracy returned from the bidirectional LSTM might have been high, we still aimed to improve by a post- processing step. Actually, this step might reduce slightly the prediction accuracy on amino acids (only less than 0.3%). However, itimprovedaccuracyindetectionoftransmembraneregionsoverall, which was our main objective. The rules for post-processing were: a. Dividing too long transmembrane-predicted region (more than 30 residues) into two regions with same lengths by changing the middle residue into non-transmembrane residue b. Rejecting too short transmembrane region (less than 5 residues) by changing all of its residues into non- transmembrane residues c. Connecting two predicted transmembrane regions if they were divided by one non-transmembrane residue and sum of the length of two regions was less than 25. We did this by changing the dividing non-transmembrane residue into a transmembrane residue. Results and Discussion Experiments In order to investigate the effect of the mentioned features on the performance of our predictions, we implemented the LSTM model with different combinations of them. To be more specific, we implemented the model with each feature separately and all combinations of two, three and four features. Therefore, the mathematical formula of a sampleā€™s feature was: { }hydrophobicity charge propensity identity , , ,t t t t tx x x x xāŠ‚ (20) Because these features have different lengths (1 for hydrophobicity, positive charge, propensity feature and 20 for identity feature), so for each combination, we had tried and chosen the best LSTM configuration (number of hidden units) as mentioned in the model configuration section. Note that the predictions returned by the bidirectional LSTM were not post- processed since we aimed to assess the raw performance of each feature combination. To compare the quality of our model with other methods, we also tested common available transmembrane region predictors: Rost et al. [2], Krogh et al. [3], Hirokawa et al. [15] on our testing dataset. Systems of these methods are implemented on web servers at [16] for PHDhtm, [17] for Hirokawa [15,18] for SOSUI. According to review articles [1,19], these are considered good predictors which used a variety of approaches. Indeed, comparison result in [19] indicated that TMHMM, which employed hidden Markov model to predict transmembrane region, is so far the best transmembrane helix predictor. PHDhtm and SOSUI are two methods using different approaches from us which were vanilla neural network and hydrophobicity analysis - a non-machine-learning method, respectively. In this comparison, our model used the combination of three features, namely hydrophobicity, propensity, and positive charge: { }hydrophobicity charge propensity , ,t t t tx x x x= (21) In this comparison test, we also used the post-processing step to improve accuracy of our model. All the experiments were implemented on a Dell PC with configuration: an Intel(R) Core(TM) i7-6700K CPU, 24GB of RAM and an NVIDIA GeForce GTX 750 Ti GPU. Results The tesing results of LSTM models using different combinations of features are shown in Table 4-7. a. Each feature: (Table 4) Table 4: Assigned order of amino acids Feature used Hydrophobicity Identity Propensity Positive charge Num. of hidden units 50 150 50 50 TPamino acid 73.61% 78.81% 77.60% 18.86% TNamino acid 96.45% 94.12% 96.10% 94.45% CPamino acid 90.78% 90.49% 91.80% 74.63% TPamino acid: Percentage of correctly predicted transmembrane residues (true positives). TNamino acid: Percentage of correctly predicted non-transmembrane residues (true negatives). CPamino acid: Percentage of correctly predicted transmembrane and non- transmembrane residues (correct predictions). b. The test results of our model and comparing models (PHDhtm, TMHMM and SOSUI) are shown in Table 8. Discussion According to Table 4, the features hydrophobicity, propensity, and identity had already worked well on their own. In particular, they gained higher than 90% of correct predictions on residues.
  • 5. 5/7 How to cite this article: V Thanh H, Nguyen M D, Tran T A, PhamT B. Predicting Protein Transmembrane Regionsby Using LSTM Model. Significances Bioeng Biosci. 1(2). SBB.000510.2018. DOI: 10.31031/SBB.2018.01.000510 Significances Bioeng Biosci Copyright Ā© PhamThe Bao Volume 1 - Issue - 2 Notice that although the identity feature did not rely on any physical or chemical characteristic of a residue, it still produced high accuracy predictions. By contrast, the positive charge feature did not work well. Indeed, it just produced about 75% of correct predictions. Combinations of these features increased the accuracy remarkably as shown in Table 5-7. Figure 3 illustrates the average accuracies of combinations of one, two, three, and four features. Combinations of more features tended to produce higher accuracies. However, not all combinations of more features would produce higher accuracies, as in the case of the combination of propensity and positive charge comparing to the combination of hydrophobicity, propensity, and identity. A reasonable explanation for this exception is the used LSTM configurations might not have been the best configuration for the corresponding combinations of features. When comparing with other state-of-the-art methods, we chose the set of features: hydrophobicity, propensity, and positive charge. Since this set of features produced high accuracy - approximately 92.56%, and used only three features. Table 5: Experimental results of combinations of two features. Features used Hydrophobicity + Identity Hydrophobicity+ Propensity Hydrophobicity + Positive charge Num. of hidden units 150 50 50 TPamino acid 81.33% 79.69% 74.22% TNamino acid 95.25% 96.11% 96.57% CPamino acid 91.50% 91.93% 91.10% Features used Identity Identity Propensity + Propensity + Positive charge + Positive charge Num. of hidden units 150 150 50 TPamino acid 80.62% 81.78% 80.48% TNamino acid 95.11% 95.26% 96.41% CPamino acid 91.58% 91.73% 92.62% Table 6: Experimental results of combinations of three features. Features used Hydrophobicity + Identity + Propensity Hydrophobicity + Identity + Positive charge Identity + Propensity+ Positive charge Hydrophobicity + Propensity + Positive charge Num. of hidden units 150 150 150 50 TPamino acid 81.36% 81.46% 82.76% 81.28% TNamino acid 95.27% 95.92% 95.01% 96.20% CPamino acid 91.43% 92.34% 91.86% 92.56% Table 7: Experimental results of combinations of four features. Features used Hydrophobicity + Identity + Propensity + Positive charge Num. of hidden units 150 TPamino acid 81.54% TNamino acid 95.41% CPamino acid 92.29% Figure 3: Accuracies comparisonbetween combinations of features. We also observed that the combination of identity, propensity, and positive charge has the highest true positive prediction accuracy-TPamino acid (82.76%). Thus if we do not want to miss any transmembrane residue, this combination would be a
  • 6. Significances Bioeng Biosci Copyright Ā© PhamThe Bao 6/7 How to cite this article: V Thanh H, Nguyen M D, Tran T A, PhamT B. Predicting Protein Transmembrane Regionsby Using LSTM Model. Significances Bioeng Biosci. 1(2). SBB.000510.2018. DOI: 10.31031/SBB.2018.01.000510 Volume 1 - Issue - 2 reasonable choice to use. On the other hand, the combination of hydrophobicity, and positive charge has the highest true negative prediction accuracy-TNamino acid (96.57%), and therefore should be used when we want to avoid predicting any non-existing transmembrane region. Results in Table 8 indicated that our methodā€™s accuracy is as highas otherpredictors,and even higherthan someofthem. Indeed, the bidirectional LSTM method is better than all others in some aspects: true negatives (TNamino acid) and correct predictions (CPamino acid) of predicted residues, true positives (TPregion) and false positives (FPregion) of predicted regions. Table 8: Experimental results of combinations of four features. Methods TPamino acid TNamino acid CPamino acid TPregion FPregion Bidirectional LSTM (After post- processing) 81.33% 96.17% 92.51% 91.89% 3/185 PHDhtm 86.15% 91.34% 90.03% 81.62% 42/185 TMHMM 84.19% 94.20% 91.68% 90.81% 10/185 SOSUI 79.18% 93.25% 89.69% 86.49% 8/185 TPregion : Percentage of correctly predicted transmembrane regions (true positives). A predicted transmembrane region is considered correct if it overlaps at least 10 residues with the ground truth transmembrane region. Conclusion In this study, we introduced a novel approach to predicting transmembrane region on proteins using a bidirectional LSTM model. Experiments showed that our system was better than other state-of-the-art methods in following aspects: true negatives and correct predictions of predicted residues, true positives and false positives of predicted regions. Furthermore, we investigated the effect of each used feature. Results indicated that models using one of the features: identity, propensity and hydrophobicity had already produced high accuracies. When combining these features with residue charge feature or with each other, the accuracies rose noticeably. References 1. Chen CP, Rost B (2002) State-of-the-art in membrane protein. Appl Bioinformatics 1(1): 21-35. 2. Rost B, Fariselli P, Casadio R (1996) Topology prediction for helical transmembrane proteins at 86% accuracy. Protein Sci 5(8): 1704-1718. 3. Krogh A, Larsson B, Heijne GV, Sonnhammer ELL (2001) Predicting transmembrane protein topology with a hidden markov model: Application to complete genomes. J Mol Biol 305(3): 567-580. 4. TusnĆ”dy GE, Simon I (2001) The HMMTOP transmembrane topology prediction server. Bioinformatics 17(9): 849-850. 5. Yuan Z, Mattick JS, Teasdale RD (2004) SVMtm: support vector machines to predict transmembrane segments. J Comput Chem 25(5): 632-636. 6. Mƶller S, Kriventseva EV, Apweiler R (2000) A collection of well characterised integral membrane proteins. Bioinformatics 16(12): 1159-1160. 7. Mƶller S, Kriventseva EV, Apweiler R (2016) Collection of well characterized integral membrane proteins. 8. Krogh A, Larsson B, Heijne GV, Sonnhammer ELL (2016) Training dataset of TMHMM. 9. Punta M, Forrest LR, Bigelow H, Kernytsky A, Liu J, et al. (2007) Membrane protein prediction methods. Methods 41(4): 460-474. 10. Kyte J, Doolittle RF (1982) A simple method for displaying the hydropathic character of a protein. J Mol Biol 157(1): 105-132. 11. vonHeijne G, Gavel Y (1988) Topogenic signals in integral membrane proteins. European Journal of Biochemistry. 174(4): 671-678. 12. Pasquier C, Promponas VJ, Palaios GA, Hamodrakas JS, Hamodrakas SJ (1999) A novel method for predicting transmembrane segments in proteins based on a statistical analysis of the SwissProt database: the PRED-TMR algorithm. Protein Eng 12(5): 381-385. 13. (2016) Propensity of amino acid. 14. Hochreiter S, Schmidhuber J (1997) Long short-term memory. Neural Computation 9(8): 1735-1780. 15. Hirokawa T, Boon CS, Mitaku S (1998) SOSUI: classification and secondary structure. Bioinformatics 14(4): 378-379. 16. Rost B, Fariselli P, Casadio R (1996) PHDhtm server. 17. Krogh A, Larsson B, Heijne GV, Sonnhammer ELL (2001) TMHMM server. 18. Hirokawa T, Boon CS, Mitaku S (1998) SOSUI server. 19. Mƶller S, Croning MD, Apweiler R (2001) Evaluation of methods for the prediction of membrane spanning regions. Bioinformatics 17(7): 646- 653. 20. Chen CP, Rost B (2002) State-of-the-art in membrane protein. Applied Bioinformatics 1(1): 21-35. 21. Pasquier C, Hamodrakas SJ (1999) An hierarchical artificial neural network system for the classification of transmembrane proteins. Protein Engineering, Design & Selection 12(8): 631-634.
  • 7. 7/7 How to cite this article: V Thanh H, Nguyen M D, Tran T A, PhamT B. Predicting Protein Transmembrane Regionsby Using LSTM Model. Significances Bioeng Biosci. 1(2). SBB.000510.2018. DOI: 10.31031/SBB.2018.01.000510 Significances Bioeng Biosci Copyright Ā© PhamThe Bao Volume 1 - Issue - 2 Your subsequent submission with Crimson Publishers will attain the below benefits ā€¢ High-level peer review and editorial services ā€¢ Freely accessible online immediately upon publication ā€¢ Authors retain the copyright to their work ā€¢ Licensing it under a Creative Commons license ā€¢ Visibility through different online platforms ā€¢ Global attainment for your research ā€¢ Article availability in different formats (Pdf, E-pub, Full Text) ā€¢ Endless customer service ā€¢ Reasonable Membership services ā€¢ Reprints availability upon request ā€¢ One step article tracking system For possible submissions Click Here Submit Article Creative Commons Attribution 4.0 International License