SVM-based prioritization of cancer-causing mutations in the centromere protein family

A short introduction to Centrosomal
Variants

Centromere
• The centromere is the part of a chromosome that links sister chromatids.
• During anaphase of mitosis, the paired centromeres of each chromosome begin to move apart as the daughter chromosomes migrate, centromere first, toward opposite ends of the cell.
• It is the most condensed and constricted region of a chromosome.
• It serves as the point of attachment for spindle fibers.
• Deregulation of centromere activity leads to several checkpoint disorders and pathologies.

Mutation-induced centromere dysfunction is linked with several human diseases
• Bardet-Biedl syndrome
• Polycystic kidney disease
• Lissencephaly
• Primordial dwarfism
• Autosomal recessive primary microcephaly
• Cancer

A few important centromere protein families
• CEP family proteins
• CENP family proteins
• MAD family proteins
• hSAS family proteins
• Septin family proteins

CENP-E recruitment and activity are mediated by several other protein complexes

Proteins selected for evaluation
CENPA, CENPB, CENPC, CENPE, CENPF, CENPH, CENPI, CENPJ, CENPK, CENPL,
CENPM, CENPN, CENPO, CENPP, CENPQ, CENPR, CENPS, CENPT, CENPU, CENPV,
CENPW, CENPX, CENPY, CENPZ
A total of 823 structural variants from the CENP protein family were collected for this study.

Machine learning: what is it all about?
1. Computers can process and compile large amounts of data quickly.
2. A learning algorithm can learn patterns from almost any kind of data it is given, but only the patterns that the data actually contain.
3. Training data must not contain erroneous values.
4. To avoid training on spurious data, the entire dataset must be validated and scaled before training begins.
5. There are three main methodologies in machine learning:
a. Supervised learning methods
b. Unsupervised learning methods
c. Reinforcement learning methods
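Step 4 can be illustrated with a minimal min-max scaling sketch. It assumes each variant is already encoded as a list of numeric features (the slides do not specify the features) and maps every feature linearly onto [-1, 1], which is also the default range of LIBSVM's `svm-scale` utility. The ranges are recorded once and then reused, so that validation or test variants are transformed with the same parameters as the training set. The function names and example values are hypothetical.

```python
from typing import List, Tuple

Row = List[float]


def fit_ranges(rows: List[Row]) -> List[Tuple[float, float]]:
    """Record the (min, max) of every feature column in the training rows."""
    n_features = len(rows[0])
    return [
        (min(row[j] for row in rows), max(row[j] for row in rows))
        for j in range(n_features)
    ]


def scale_rows(
    rows: List[Row],
    ranges: List[Tuple[float, float]],
    lower: float = -1.0,
    upper: float = 1.0,
) -> List[Row]:
    """Linearly map each feature from its recorded [min, max] onto [lower, upper]."""
    scaled = []
    for row in rows:
        new_row = []
        for value, (lo, hi) in zip(row, ranges):
            if hi == lo:
                # A constant feature carries no information; map it to the midpoint.
                new_row.append((lower + upper) / 2)
            else:
                new_row.append(lower + (upper - lower) * (value - lo) / (hi - lo))
        scaled.append(new_row)
    return scaled


if __name__ == "__main__":
    # Hypothetical feature vectors (e.g. two numeric predictor scores per variant).
    train = [[0.0, 10.0], [4.0, 30.0], [2.0, 20.0]]
    ranges = fit_ranges(train)
    print(scale_rows(train, ranges))  # [[-1.0, -1.0], [1.0, 1.0], [0.0, 0.0]]
```

Values outside the training range are extrapolated linearly rather than clipped, which keeps the transformation identical for every later dataset.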

• Supervised learning is the machine learning task of inferring a function from supervised (labeled) training data.
• A supervised learning algorithm analyzes the training data and produces an inferred function, which can then be used to map new examples.
• The parallel task in human and animal psychology is often referred to as concept learning.
• A few widely used supervised learning algorithms are:
1. Support vector machines
2. Bayesian statistics
3. Artificial neural networks
4. Random forests
5. Regression analysis

Support Vector Machines
• A support vector machine (SVM) is a set of related supervised learning methods from statistics and computer science that analyze data and recognize patterns; SVMs are used for classification and regression analysis.
• Given a set of training examples, each marked as belonging to one of two categories, an SVM training algorithm builds a model that assigns new examples to one category or the other.
• More formally, a support vector machine constructs a hyperplane or set of hyperplanes in a high- or infinite-dimensional space, which can be used for classification. A good separation is achieved by the hyperplane with the largest distance (margin) to the nearest training points of either class.
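To make the hyperplane idea concrete, here is a minimal sketch of how a linear SVM labels a point once training has produced a weight vector w and a bias b: the class is the sign of w·x + b, and the geometric margin measures how far a correctly classified point lies from the hyperplane. The weights below are hypothetical and picked by hand; a real SVM obtains them by maximizing the margin on the training data.

```python
import math
from typing import Sequence


def decision_value(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    """Signed score w.x + b; its sign tells which side of the hyperplane x lies on."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def classify(w: Sequence[float], b: float, x: Sequence[float]) -> int:
    """Return +1 or -1 according to the side of the hyperplane w.x + b = 0."""
    return 1 if decision_value(w, b, x) >= 0 else -1


def geometric_margin(w: Sequence[float], b: float, x: Sequence[float], y: int) -> float:
    """Distance of x from the hyperplane, positive when x lies on the side of label y."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return y * decision_value(w, b, x) / norm


if __name__ == "__main__":
    w, b = [3.0, 4.0], -5.0  # hypothetical, hand-picked hyperplane
    print(classify(w, b, [3.0, 4.0]))             # 1
    print(classify(w, b, [0.0, 0.0]))             # -1
    print(geometric_margin(w, b, [3.0, 4.0], 1))  # 4.0
```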

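Here, consider $\mathcal{D}$ as the training data, where
$$\mathcal{D} = \{(\mathbf{x}_i, y_i) \mid \mathbf{x}_i \in \mathbb{R}^p,\ y_i \in \{1, -1\}\}, \quad i = 1, \ldots, n.$$
• For training we used the radial basis function (RBF) kernel for greater accuracy:
$$K(\mathbf{x}_i, \mathbf{x}_j) = \exp\left(-\gamma \lVert \mathbf{x}_i - \mathbf{x}_j \rVert^2\right), \quad \gamma > 0.$$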
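The training set and the RBF kernel translate directly into Python. In a kernel SVM the decision function has the form f(x) = sign(Σᵢ αᵢ yᵢ K(xᵢ, x) + b), summed over the support vectors. The sketch below implements K and f; the support vectors, dual coefficients αᵢ, bias b and γ used in the example are hypothetical placeholders for what a trained model would provide.

```python
import math
from typing import List, Sequence, Tuple

# D = {(x_i, y_i)}: each example is a feature vector in R^p with a label in {1, -1}.
Example = Tuple[List[float], int]


def validate_training_set(data: List[Example]) -> None:
    """Check that all vectors have the same length p and all labels are +1 or -1."""
    p = len(data[0][0])
    for x, y in data:
        if len(x) != p or y not in (1, -1):
            raise ValueError(f"invalid example: {(x, y)}")


def rbf_kernel(xi: Sequence[float], xj: Sequence[float], gamma: float) -> float:
    """K(xi, xj) = exp(-gamma * ||xi - xj||^2), with gamma > 0."""
    if gamma <= 0:
        raise ValueError("gamma must be positive")
    squared_distance = sum((a - b) ** 2 for a, b in zip(xi, xj))
    return math.exp(-gamma * squared_distance)


def kernel_decision(
    support: List[Example],
    alphas: Sequence[float],
    b: float,
    x: Sequence[float],
    gamma: float,
) -> int:
    """sign(sum_i alpha_i * y_i * K(x_i, x) + b) over the support vectors."""
    score = sum(a * y * rbf_kernel(sv, x, gamma) for a, (sv, y) in zip(alphas, support)) + b
    return 1 if score >= 0 else -1


if __name__ == "__main__":
    # Hypothetical support vectors, coefficients and parameters.
    support = [([0.0, 0.0], 1), ([2.0, 2.0], -1)]
    validate_training_set(support)
    print(kernel_decision(support, [1.0, 1.0], 0.0, [0.1, 0.1], gamma=0.5))  # 1
    print(kernel_decision(support, [1.0, 1.0], 0.0, [2.0, 2.0], gamma=0.5))  # -1
```

A larger γ makes each kernel bump narrower, so the decision boundary follows the training points more closely; this is why γ is tuned together with C.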

Objective
To identify cancer-associated nsSNPs in the CENP protein family using a support vector machine approach

Methodology
1. Examination of the protocol
2. Application of the protocol to collect datasets for training the machine
3. Design of a support vector machine classifier using a machine learning algorithm
4. Application of the designed classifier to identify cancer-associated mutations in CENP family proteins
5. Study of the dynamic behaviour of cancer-associated structural variants

Examination of the protocol was carried out on the CENPE protein
➔Centromere-associated protein E (CENPE), a 2701-amino-acid protein with a molecular weight of 312 kDa, is highly expressed in mitosis and accumulates in the cell just prior to mitosis.
➔It is required for efficient, stable capture of microtubules at kinetochores.
➔It plays an essential role in integrating the mechanics of microtubule-chromosome interactions with mitotic checkpoint signaling, and has emerged as a novel target for cancer therapy.
➔It contains an ATP-sensitive motor domain at its N-terminus that hydrolyzes ATP to produce directed mechanical force along microtubules.
➔Absence of CENPE reduces tension at bi-oriented chromosomes, resulting in chromosomes misaligned at the metaphase plate and leading to metaphase arrest.
➔CENPE expression was also found to be reduced in human hepatocellular carcinoma (HCC) tissue, and lower CENPE expression was found to induce aneuploidy in LO2 cells.

Prediction of oncogenic mutants in CENPE using SNP prediction tools
• We first collected 100 nsSNPs reported in the CENPE gene from the NCBI dbSNP database.
• SIFT, PolyPhen, PhD-SNP, PMut, CanPredict and Dr. Cancer were used to identify cancer-associated SNPs in this dataset.
• Using these tools, Y63H was predicted to be highly deleterious and cancer-associated.
• To analyse the structural consequences of this mutation, we carried out molecular dynamics simulations of the native and mutant CENPE motor domain on a 5 ns timescale.
• In silico X-ray scattering was computed throughout the simulation to observe changes in ionic density between the native and mutant structures.
• Root-mean-square deviation (RMSD) was then plotted to compare the structural deviation of the native and mutant structures over time.
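The slides do not state how the outputs of the six prediction tools were combined. One simple option is a consensus count: rank each nsSNP by how many tools flag it as deleterious or cancer-associated. The sketch below assumes each tool's output has already been reduced to a True/False call per variant; the variant identifiers, calls and vote cut-off in the example are hypothetical, not results from this study.

```python
from typing import Dict, List, Tuple

# The six tools named on the slide; each tool's raw score is assumed to have
# already been converted to a boolean "deleterious / cancer-associated" call.
TOOLS = ["SIFT", "PolyPhen", "PhD-SNP", "PMut", "CanPredict", "Dr. Cancer"]


def consensus_ranking(calls: Dict[str, Dict[str, bool]]) -> List[Tuple[str, int]]:
    """Rank variants by the number of tools flagging them (ties broken by name)."""
    votes = {
        variant: sum(1 for tool in TOOLS if per_tool.get(tool, False))
        for variant, per_tool in calls.items()
    }
    return sorted(votes.items(), key=lambda item: (-item[1], item[0]))


def prioritized(calls: Dict[str, Dict[str, bool]], min_votes: int) -> List[str]:
    """Variants flagged by at least `min_votes` tools (a hypothetical cut-off)."""
    return [variant for variant, n in consensus_ranking(calls) if n >= min_votes]


if __name__ == "__main__":
    example_calls = {  # hypothetical calls for three made-up variants
        "VAR_1": {tool: True for tool in TOOLS},
        "VAR_2": {"SIFT": True, "PolyPhen": True},
        "VAR_3": {},
    }
    print(consensus_ranking(example_calls))  # [('VAR_1', 6), ('VAR_2', 2), ('VAR_3', 0)]
    print(prioritized(example_calls, min_votes=4))  # ['VAR_1']
```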

Molecular blueprint of structural variation in the CENPE motor domain: inside-body environment
[Figure: native and mutant panels plotted against time (ps); root mean square fluctuation of the native and mutant structures]
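The two trajectory measures referred to here can be sketched in a few lines. RMSD compares each frame with a reference structure, and RMSF measures how much each atom fluctuates around its time-averaged position. This sketch assumes the frames have already been superimposed on the reference (MD analysis tools normally perform this least-squares fit first) and that coordinates are plain (x, y, z) tuples; the toy coordinates are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def rmsd(frame: Sequence[Coord], reference: Sequence[Coord]) -> float:
    """Root-mean-square deviation between two pre-aligned sets of atom positions."""
    if len(frame) != len(reference):
        raise ValueError("frame and reference must contain the same atoms")
    squared = sum(
        (a - b) ** 2 for p, q in zip(frame, reference) for a, b in zip(p, q)
    )
    return math.sqrt(squared / len(reference))


def rmsf(trajectory: List[Sequence[Coord]]) -> List[float]:
    """Per-atom root-mean-square fluctuation about the time-averaged position."""
    n_frames = len(trajectory)
    n_atoms = len(trajectory[0])
    result = []
    for i in range(n_atoms):
        mean = [sum(frame[i][k] for frame in trajectory) / n_frames for k in range(3)]
        mean_sq = sum(
            sum((frame[i][k] - mean[k]) ** 2 for k in range(3)) for frame in trajectory
        ) / n_frames
        result.append(math.sqrt(mean_sq))
    return result


if __name__ == "__main__":
    # Hypothetical two-atom, two-frame toy trajectory (already aligned).
    reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    trajectory = [reference, [(0.0, 0.0, 2.0), (1.0, 0.0, 0.0)]]
    print([rmsd(frame, reference) for frame in trajectory])  # [0.0, 1.4142135623730951]
    print(rmsf(trajectory))  # [1.0, 0.0]
```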

Calculation of the R208K CENPE-ATP association constant
According to Debye-Hückel theory, the reaction rate constant $k$ is taken to be proportional to the electrostatic interaction energy $U$, so that
$$\frac{k_{\text{native}}}{k_{\text{mutant}}} = \frac{U_{\text{native}}}{U_{\text{mutant}}} \quad\Rightarrow\quad \frac{134.6}{k_{\text{mutant}}} = \frac{-13.42}{-3.06} \quad\Rightarrow\quad k_{\text{mutant}} = \frac{134.6 \times 3.06}{13.42} = 30.69$$
CENPE(native) + ATP → CENPE(native)-ATP complex; k = 134.6
CENPE(mutant) + ATP → CENPE(mutant)-ATP complex; k = 30.69
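This arithmetic can be checked with a short function. It encodes only the proportionality stated on the slide (k ∝ U), so k_mutant = k_native × U_mutant / U_native; the energies and the native rate constant are the slide's values and are used without units because none are given.

```python
def scaled_rate_constant(k_native: float, u_native: float, u_mutant: float) -> float:
    """k_mutant = k_native * U_mutant / U_native, assuming k is proportional to U."""
    if u_native == 0:
        raise ValueError("U_native must be non-zero")
    return k_native * u_mutant / u_native


if __name__ == "__main__":
    k_mutant = scaled_rate_constant(k_native=134.6, u_native=-13.42, u_mutant=-3.06)
    print(round(k_mutant, 2))          # 30.69
    print(round(134.6 / k_mutant, 2))  # fold reduction: 4.39
```

Under this assumption the mutant association constant is roughly 4.4-fold lower than the native one.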

[Figure: CENPE-ATP and CENPE-ADP levels versus time (seconds) for native and mutant CENPE]
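Time courses like these could be generated from a simple mass-action model. The sketch below is hypothetical throughout: it assumes the scheme CENPE + ATP → CENPE-ATP (rate constant k_on, taken from the slide's 134.6 and 30.69) followed by CENPE-ATP → CENPE-ADP with a made-up first-order constant k_hyd, equal unit starting concentrations, and forward-Euler integration. None of these modelling choices comes from the slides; they only illustrate how a lower association constant delays CENPE-ATP formation.

```python
from typing import List, Tuple


def simulate_cenpe_atp(
    k_on: float,
    k_hyd: float,
    e0: float = 1.0,
    atp0: float = 1.0,
    dt: float = 1e-4,
    t_end: float = 0.5,
) -> List[Tuple[float, float, float, float]]:
    """Forward-Euler integration of E + ATP -> E.ATP -> E.ADP (hypothetical scheme).

    Returns (time, free E, E.ATP, E.ADP) at t = 0 and after every step.
    """
    e, atp, e_atp, e_adp = e0, atp0, 0.0, 0.0
    history = [(0.0, e, e_atp, e_adp)]
    for step in range(1, int(round(t_end / dt)) + 1):
        binding = k_on * e * atp * dt    # second-order association
        hydrolysis = k_hyd * e_atp * dt  # first-order turnover to E.ADP
        e -= binding
        atp -= binding
        e_atp += binding - hydrolysis
        e_adp += hydrolysis
        history.append((step * dt, e, e_atp, e_adp))
    return history


if __name__ == "__main__":
    K_HYD = 5.0  # hypothetical hydrolysis constant, not from the slides
    native = simulate_cenpe_atp(k_on=134.6, k_hyd=K_HYD)
    mutant = simulate_cenpe_atp(k_on=30.69, k_hyd=K_HYD)
    t, _, native_atp, _ = native[100]
    _, _, mutant_atp, _ = mutant[100]
    print(f"t = {t:.3f}: native E.ATP = {native_atp:.3f}, mutant E.ATP = {mutant_atp:.3f}")
```

Total enzyme (free E + E.ATP + E.ADP) is conserved at every step, which is a quick sanity check on any such integration.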

Tools used to collect SNP training data
1. SIFT, PolyPhen, PhD-SNP, PMut, CanPredict and Dr. Cancer were used to collect the SNP datasets.
2. Cancer variant data were obtained from the SwissVar database.
3. Neutral variants were randomly picked from the Swiss-Prot database.
4. Scaling, training and model generation were carried out using the support vector machine algorithm.
5. An RBF kernel was used to generate the classifier model.
6. Rescaling and cross-validation were repeated with different values of C and γ until the maximum accuracy was obtained.
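Steps 4 to 6 can be sketched with scikit-learn, whose `SVC` class is built on LIBSVM. This is a stand-in under stated assumptions, not the authors' pipeline: the feature matrix is synthetic (from `make_classification`) because the slides do not list the features used; scaling to [-1, 1] sits inside the cross-validation pipeline so each fold is scaled using only its own training portion; and C and γ are searched over exponentially spaced grids with stratified 5-fold cross-validation, keeping the combination with the highest mean accuracy. The grid bounds are a common starting point, not values from the study.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC


def tune_rbf_svm(X, y, n_folds: int = 5, seed: int = 0) -> GridSearchCV:
    """Grid-search C and gamma for an RBF-kernel SVM using stratified k-fold CV."""
    pipeline = make_pipeline(MinMaxScaler(feature_range=(-1, 1)), SVC(kernel="rbf"))
    param_grid = {
        # make_pipeline names the SVC step "svc", hence the "svc__" prefix.
        "svc__C": [2.0 ** k for k in range(-5, 16, 2)],
        "svc__gamma": [2.0 ** k for k in range(-15, 4, 2)],
    }
    cv = StratifiedKFold(n_splits=n_folds, shuffle=True, random_state=seed)
    search = GridSearchCV(pipeline, param_grid, cv=cv, scoring="accuracy")
    search.fit(X, y)  # refits the best (C, gamma) on all data by default
    return search


if __name__ == "__main__":
    # Synthetic stand-in for labelled variants: 1 = cancer-associated, 0 = neutral.
    X, y = make_classification(n_samples=200, n_features=6, n_informative=4, random_state=0)
    search = tune_rbf_svm(X, y)
    print(search.best_params_, round(search.best_score_, 3))
```

Replacing `X` and `y` with real variant feature vectors and cancer/neutral labels would follow the workflow listed above; the fitted `search` object can then classify new variants with `search.predict`.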

Model designed for neutral variants
Model designed for 100 Neutral and Cancer variants


Acknowledgement
J. Febin Prabhudass
Asst. Prof. (Senior)
School of Biosciences and Technology
VIT University
