Semi-covariance coefficient analysis of spike proteins from SARS-CoV-2 and other coronaviruses for viral evolution and charge characteristics associated with fatality.
By
Jun Huang, Rebecca Spencer, Wandong Zhang
December 25 2020 – January 25 2021
Data Processing Training for 401 Lab
1. Introduction
401 Lab 2020 Math Training
• Complex modeling has received significant attention in recent
years and is increasingly used to explain statistical phenomena
with increasing and decreasing fluctuations, such as the similarity
or difference of spike protein charge patterns among coronaviruses.
• Unlike existing covariance or correlation coefficient methods, which
are constructed in traditional integer dimensions, this study proposes
a simplified, novel fractional-dimension derivation with an exact
algorithm implemented as an Excel tool.
• It extends covariance with a fractional central moment, resulting in a
complex covariance coefficient that improves on the Pearson
correlation coefficient in the sense that nonlinear relationships can
also be depicted.
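One way to see how a fractional central moment produces a complex coefficient is sketched below. This is our illustrative reading, assuming a moment of order 1/2 applied to the per-residue deviation products; the authors' exact definition may differ. Here x_i and y_i are the charge values at residue i of the two spike proteins being compared, and σ_x, σ_y are their population standard deviations.

$$
p_i = (x_i-\bar{x})(y_i-\bar{y}), \qquad
r = \frac{1}{n\,\sigma_x\sigma_y}\sum_{i=1}^{n} p_i \quad \text{(Pearson)}
$$

$$
\sum_{i=1}^{n} p_i^{1/2} \;=\; \sum_{i:\,p_i>0}\sqrt{p_i} \;+\; \mathrm{i}\sum_{i:\,p_i<0}\sqrt{\lvert p_i\rvert}
\qquad \text{(principal root: } \sqrt{p}=\mathrm{i}\sqrt{\lvert p\rvert} \text{ for } p<0\text{)}
$$

Negative products (residues where the two proteins deviate from their means in opposite directions) become imaginary, so the real part collects the positively correlated residues and the imaginary part the negatively correlated ones, whereas the Pearson sum lets the two cancel.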
Positive or Negative Charge
• The spike protein sequences of coronaviruses were obtained from the
GenBank and GISAID databases. Coronaviruses from pangolin, bat,
canine, porcine (three variants), feline and tiger hosts, together with
SARS-CoV-1, MERS, and SARS-CoV-2 (Wuhan, Beijing, New York,
Germany and UK isolates), were used as representative examples in
this study.
• By examining the values above and below the mean of the positive
and negative charge patterns of the amino acid residues in the spike
proteins, the proposed algorithm provides insight into the nonlinear
evolutionary trends of spike proteins, helping to understand viral gene
sequence evolution and to identify protein characteristics associated
with viral fatality.
• The SARS-CoV-1 spike protein is negatively charged, whereas the
SARS-CoV-2 spike protein is 10 times more positively charged, and
the UK variant carries 20% more positive charge.
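The charge conversion these bullets rely on can be sketched in Python. The residue-to-charge table below is an assumption (Lys and Arg as +1, Asp and Glu as -1, everything else, including His, as 0); the authors' own converter in the GitHub repository may treat histidine or the termini differently. The helper names are hypothetical.

```python
# Hypothetical residue-to-charge table (an assumption, not the authors' converter):
# Lys (K) and Arg (R) count as +1, Asp (D) and Glu (E) as -1, and every other
# residue, including His (H), as 0.
CHARGE = {"K": 1, "R": 1, "D": -1, "E": -1}


def to_charges(sequence: str) -> list[int]:
    """Convert a one-letter amino acid sequence into per-residue charges."""
    return [CHARGE.get(residue, 0) for residue in sequence.upper()]


def net_charge(sequence: str) -> int:
    """Crude net charge: the sum of the per-residue charges."""
    return sum(to_charges(sequence))


def split_by_mean(charges: list[int]) -> tuple[list[int], list[int]]:
    """Return 0-based residue indices whose charge is above / below the mean."""
    mean = sum(charges) / len(charges)
    above = [i for i, c in enumerate(charges) if c > mean]
    below = [i for i, c in enumerate(charges) if c < mean]
    return above, below


toy = "MKRDEAKH"  # toy fragment, not a real spike sequence
print(to_charges(toy))                 # [0, 1, 1, -1, -1, 0, 1, 0]
print(net_charge(toy))                 # 1
print(split_by_mean(to_charges(toy)))  # ([1, 2, 6], [0, 3, 4, 5, 7])
```

Comparing net_charge across full spike sequences is the kind of tally behind the "more positively charged" statements, although the exact numbers depend on the charge table chosen.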
Viability
• The calculation results demonstrate that the complex covariance
coefficient computed by this algorithm can distinguish subtle nonlinear
differences in spike protein charge patterns, relative to the Wuhan
SARS-CoV-2 reference, that the Pearson correlation coefficient may
overlook.
• Our analysis reveals, for each virus, the unique center positions of
the convergent (positively correlated) and divergent (negatively
correlated) domains.
• The convergent (conserved) region may be critical to the evolutionary
stability or viability of the virus, while the divergent region is highly
variable between coronaviruses, suggesting a high frequency of
mutations in this region.
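A minimal sketch of how convergent and divergent domain centers might be located, assuming both charge series are compared residue by residue over their shared length (no alignment) and that a "center" is the mean residue position weighted by the size of each deviation product. Both assumptions, and the function names, are ours rather than the authors'.

```python
def deviation_products(x: list[float], y: list[float]) -> list[float]:
    """Per-residue products (x_i - mean_x) * (y_i - mean_y).

    Assumption: the two charge series are compared position by position over
    their shared length; no sequence alignment is performed.
    """
    n = min(len(x), len(y))
    x, y = x[:n], y[:n]
    mx, my = sum(x) / n, sum(y) / n
    return [(a - mx) * (b - my) for a, b in zip(x, y)]


def weighted_center(products, keep):
    """Mean 1-based residue position of the kept products, weighted by |product|."""
    picked = [(pos, abs(p)) for pos, p in enumerate(products, start=1) if keep(p)]
    total = sum(w for _, w in picked)
    return sum(pos * w for pos, w in picked) / total if total else None


def domain_centers(x, y):
    """Return (convergent_center, divergent_center) for two charge series."""
    p = deviation_products(x, y)
    # Positive products: both series deviate from their means in the same direction.
    # Negative products: they deviate in opposite directions.
    return weighted_center(p, lambda v: v > 0), weighted_center(p, lambda v: v < 0)


reference = [1, 1, -1, -1, 0, 0]  # toy "Wuhan" charges
other = [1, 1, 1, 1, -1, -1]      # toy comparison charges
print(domain_centers(reference, other))  # approximately (1.5, 3.5)
```

If a pair of sequences has no negatively correlated residues at all, the divergent center is reported as None rather than a misleading position.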
Residues
• The analysis shows that the conserved center region of the SARS-CoV-1
spike protein is located around amino acid residue 900; it shifts to
around residue 700 in the MERS spike protein and to around residue
600 in the SARS-CoV-2 spike protein, indicating the evolution of the
coronaviruses.
• Our study also reveals that the distance between the divergent mean
and the maximal divergent point in each virus (MERS > SARS-CoV-1 >
SARS-CoV-2) is proportional to the viral fatality rate.
• This algorithm may help to understand and analyze the
evolving trends and critical characteristics of other coronaviral
proteins and viruses.
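The distance between the divergent mean and the maximal divergent point can be computed in the same spirit. In this hypothetical sketch the divergent mean is the weighted mean position of residues with a negative deviation product, and the maximal divergent point is the residue with the most negative product; the authors' spreadsheet may define both differently.

```python
def divergence_spread(x, y):
    """Return (divergent_mean, max_divergent_point, distance) for two charge series.

    Hypothetical definitions: the divergent mean is the |product|-weighted mean
    1-based position of residues with a negative deviation product, and the
    maximal divergent point is the residue with the most negative product.
    """
    n = min(len(x), len(y))  # assumption: position-by-position comparison
    mx, my = sum(x[:n]) / n, sum(y[:n]) / n
    products = [(a - mx) * (b - my) for a, b in zip(x[:n], y[:n])]
    negative = [(pos, -p) for pos, p in enumerate(products, start=1) if p < 0]
    if not negative:
        return None  # no divergent residues at all
    total = sum(w for _, w in negative)
    divergent_mean = sum(pos * w for pos, w in negative) / total
    max_point = max(negative, key=lambda item: item[1])[0]
    return divergent_mean, max_point, abs(max_point - divergent_mean)


print(divergence_spread([1, -1, -1, 1, 0], [1, 1, 1, -1, 0]))
# divergent mean ~3.31, maximal divergent point 4, distance ~0.69
```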
Number Matters.
2. Materials and Methods
The coronavirus spike protein sequences used in this study were obtained
from the NCBI GenBank and GISAID databases. They include SARS-CoV-2
(isolates from Wuhan, the Beijing Xinfadi wholesale market, Germany,
New York, the UK and a New York zoo tiger), SARS-CoV-1, Middle East
respiratory syndrome coronavirus (MERS), bat coronavirus (RaTG13),
pangolin coronavirus, feline coronavirus, canine coronavirus, and swine
coronaviruses [swine transmissible gastroenteritis virus (Swine-stomach),
swine enteric coronavirus (Swine-Ent), and porcine respiratory
coronavirus (Swine-Res)]. The sequence IDs from the GenBank and
GISAID databases are listed in Table 1.
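Sequences downloaded from GenBank or GISAID are typically distributed as FASTA text. A minimal standard-library parser, assuming one record per ">" header and taking the first token of the header as the record ID, might look like this (the IDs in the example are placeholders, not the entries of Table 1):

```python
def parse_fasta(text: str) -> dict[str, str]:
    """Parse FASTA text into {record_id: sequence}.

    The record ID is taken as the first whitespace-separated token after '>'.
    """
    records: dict[str, str] = {}
    current_id = None
    chunks: list[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if current_id is not None:
                records[current_id] = "".join(chunks)  # close previous record
            header = line[1:].split()
            current_id = header[0] if header else ""
            chunks = []
        else:
            chunks.append(line)  # sequence lines may be wrapped
    if current_id is not None:
        records[current_id] = "".join(chunks)
    return records


print(parse_fasta(">seqA spike\nMKR\nDE\n>seqB\nAKH\n"))
# {'seqA': 'MKRDE', 'seqB': 'AKH'}
```

Biopython's Bio.SeqIO.parse(handle, "fasta") does the same job and handles more edge cases, if an external dependency is acceptable.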
3. Results and Conclusion
• To demonstrate the usefulness of the simplified complex covariance,
we compare the SARS-CoV-2 spike protein sequence with the spike
protein sequences of other coronaviruses.
• Because ordinary Excel cell arithmetic does not handle imaginary
numbers, we simplify the calculation by using an integer power while
keeping the positive and negative covariance contributions separate.
• Because the spike proteins of different coronaviruses carry different
overall charge levels, we normalize each covariance by the standard
deviations of the two sequences, just as the Pearson calculation does.
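The simplified, integer-power calculation described above can be sketched as follows. Each residue's deviation product is assigned to a positive or a negative sum, and both sums are normalized by n times the two population standard deviations, so that r_pos + r_neg reproduces the Pearson coefficient. The residue-by-residue comparison over the shared length and the function name are assumptions; the authors' Excel formulas may differ in detail.

```python
from statistics import pstdev


def semi_covariance_coefficients(x, y):
    """Split the Pearson coefficient into positive and negative parts.

    Returns (r_pos, r_neg): the sums of the positive and of the negative
    deviation products, each divided by n * sigma_x * sigma_y (population
    standard deviations). Their sum equals the Pearson coefficient.
    """
    n = min(len(x), len(y))  # assumption: position-by-position comparison
    x, y = x[:n], y[:n]
    mx, my = sum(x) / n, sum(y) / n
    sx, sy = pstdev(x), pstdev(y)
    if sx == 0 or sy == 0:
        raise ValueError("a constant charge series has no correlation")
    products = [(a - mx) * (b - my) for a, b in zip(x, y)]
    scale = n * sx * sy
    r_pos = sum(p for p in products if p > 0) / scale
    r_neg = sum(p for p in products if p < 0) / scale
    return r_pos, r_neg


r_pos, r_neg = semi_covariance_coefficients([1, 2, 3, 4], [1, 3, 2, 4])
print(round(r_pos, 3), round(r_neg, 3), round(r_pos + r_neg, 3))  # 0.9 -0.1 0.8
```

With this split, a Pearson value near zero can still hide large positive and negative parts, which is the kind of difference the semi-covariance coefficient is meant to expose.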
Figures and Tables
• Figures 1-6 show the semi-covariance coefficients computed by our
algorithm for the Wuhan SARS-CoV-2 spike protein in comparison with
the spike proteins of the other coronaviruses listed in Table 1.
• Figures 7-9 show a combination of linear and nonlinear relationships,
with Wuhan as the baseline.
• Figures 10-15 show nonlinear relationships, with Wuhan as the baseline.
• Figures 16-19 show linear relationships, with Wuhan as the baseline.
• Figure 20 again shows a combination of linear and nonlinear
relationships. A nonlinear relationship here is piecewise linear,
meaning that only parts of the proteins are related.
• The fatality rate caused by each virus is closely related to the
distance between the divergent center (mean) and the maximal
divergent point (Table 2).
Scatter Plot
• A scatter graph (also called a scatter plot, scatter chart or scatter
diagram) is a type of plot or mathematical diagram using Cartesian
coordinates (with 4 quadrants) to display values for two variables for a
set of data.
• The data are displayed as a collection of points. For each point, the
value of one variable (the charge value from the Wuhan sequence)
determines the position on the horizontal axis, and the value of the
other variable (the charge value from another sequence, such as
pangolin) determines the position on the vertical axis.
• A scatter plot can suggest various kinds of correlations between
variables with a certain linear or nonlinear pattern. Correlations may
be positive (rising), negative (falling), or neither (uncorrelated). If the
pattern of dots slopes from lower left to upper right, it indicates a
positive correlation between the variables being studied. If the
pattern of dots slopes from upper left to lower right, it indicates a
negative correlation.
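The rising/falling reading of a scatter plot corresponds to the sign of the covariance. A small hypothetical helper (the tolerance and labels are our choices) shows that a V-shaped, clearly nonlinear pattern still reads as "uncorrelated" to a purely linear measure:

```python
def correlation_direction(xs, ys, tol=1e-12):
    """Classify the linear trend of a scatter plot by the sign of the covariance."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / n
    if cov > tol:
        return "positive (rising)"
    if cov < -tol:
        return "negative (falling)"
    return "neither (uncorrelated)"


print(correlation_direction([1, 2, 3], [2, 4, 6]))   # positive (rising)
print(correlation_direction([1, 2, 3], [3, 2, 1]))   # negative (falling)
print(correlation_direction([-1, 0, 1], [1, 0, 1]))  # neither (uncorrelated)
```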
Linear vs Nonlinear
• If the dots lie one after another along a continuous line, we have a
simple linear relationship. If the dots form a few islands, we have a
nonlinear pattern. If both patterns are present, we have a mix of
linear and nonlinear relationships.
• If the relationship is linear within each island, we call it locally
linear and globally nonlinear, or piecewise linear. It means that only a
particular charged piece of the entire sequence is linearly correlated,
within that piece.
• If we view each island as a super dot and the super dots form a linear
relationship, we call it globally linear and locally nonlinear. It means
that the specially charged pieces of the entire sequence are linearly
correlated with one another. Each piece has its own unique
electro-biological functions.
• The 1st and 3rd quadrants contain pieces where the Wuhan sequence has
the same charge as the pangolin sequence. The 2nd and 4th quadrants
contain pieces where the Wuhan sequence has the opposite charge to the
pangolin sequence.
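Counting points by quadrant gives a quick summary of how many positions share the same charge sign with the Wuhan sequence. This sketch assumes the axes pass through zero, as in the description above, and ignores points lying on an axis; the helper names are hypothetical.

```python
from collections import Counter


def quadrant(x: float, y: float) -> int:
    """Cartesian quadrant (1-4) of a point; 0 if it lies on an axis."""
    if x > 0 and y > 0:
        return 1
    if x < 0 and y > 0:
        return 2
    if x < 0 and y < 0:
        return 3
    if x > 0 and y < 0:
        return 4
    return 0


def same_vs_opposite(wuhan, other):
    """Count points with the same charge sign (Q1+Q3) vs opposite sign (Q2+Q4)."""
    counts = Counter(quadrant(a, b) for a, b in zip(wuhan, other))
    return counts[1] + counts[3], counts[2] + counts[4]


print(same_vs_opposite([1, 2, -1, -1, 1, 0], [1, 3, 1, -2, -1, 5]))  # (3, 2)
```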
Conclusion
• We have analyzed the spike protein charge patterns of coronaviruses
using our semi-covariance (nonlinear) coefficient algorithm and
compared the results with the Pearson (linear) correlation.
• The analysis provides additional performance indices beyond the
Pearson analysis, such as both positively and negatively correlated
centers/regions in the spike proteins.
• The analysis provides an in-depth understanding of nonlinear viral
evolution patterns and identifies protein characteristics associated
with viral fatality.
• The example code is available as an Excel file on GitHub
(https://github.com/steedhuang/covid-19-gene-convertor).
• Our future work will focus on the relationship between positive charge
and infectivity, as the UK variant has 20% more positive charge.
Acknowledgement
The work in Dr. Zhang’s lab is supported by a team
grant on the Rapid Research Response to COVID-19
Outbreak awarded by the Canadian Institutes of Health
Research (CIHR) and by funding from the National
Research Council of Canada.
Thanks go to Lishen Wang from Jiangsu University for
writing Python code to convert sequences into charges.
Thanks also go to Mei Huang from the Ottawa Hospital
COVID-19 patient unit for proofreading and editing the
final version.