2. Protein Remote Homology
▰ Protein remote homology detection is the identification
of homologous proteins that are similar in structure and
function but share low sequence identity.
▰ Detecting remote homologous proteins has an important
impact on proteomics and the biomedical sciences, and it is
one of the fundamental techniques for protein structure
and function prediction.
3. Protein Remote Homology
▰ With the development of sequencing techniques, the number
of protein sequences is growing rapidly.
▰ As of June 2016, there were >64 million protein sequences
in the UniProtKB/TrEMBL database, with millions of sequences
added to the database each month.
4. Protein Remote Homology
▰ In contrast, the number of proteins with known structures
grows much more slowly.
▰ As of June 2016, only about 119,000 structures had been
deposited in the Protein Data Bank (PDB), with only thousands
of structures added per year. The gap between known protein
sequences and known structures is therefore large and
widening quickly. Effective, low-cost approaches to narrow
this gap are urgently needed.
5. Protein Remote Homology
▰ Because traditional biological techniques for protein
remote homology detection are expensive and ineffective,
computational approaches offer a low-cost alternative.
9. Approaches: Discriminative methods
▰ Treat protein remote homology detection as a superfamily-level
classification task.
▰ Train a classifier in a supervised manner on both positive and negative
samples, then use it to predict unseen samples.
▰ Can reduce the number of false positives compared with alignment
methods.
▰ The feature vectors of some discriminative methods are themselves
constructed from the output of alignment methods.
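To make the supervised setup concrete, here is a minimal sketch in Python. It is not the method of any particular tool: the 2-mer composition features, the toy sequences, and the function names (`kmer_features`, `train_perceptron`, `predict`) are all assumptions chosen for illustration, and a simple perceptron is swapped in for an SVM so the example needs no external libraries.

```python
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def kmer_features(seq, k=2):
    """Normalized k-mer frequency vector (20**k entries) for one sequence."""
    kmers = ["".join(p) for p in product(AMINO_ACIDS, repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = max(sum(counts[m] for m in kmers), 1)  # avoid division by zero
    return [counts[m] / total for m in kmers]

def train_perceptron(X, y, epochs=20, lr=1.0):
    """Linear classifier trained on labels +1 (positive) and -1 (negative).

    Assumption: a perceptron stands in here for an SVM or Random Forest.
    """
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for x, label in zip(X, y):
            score = sum(wi * xi for wi, xi in zip(w, x)) + b
            if label * score <= 0:  # misclassified: move the boundary
                w = [wi + lr * label * xi for wi, xi in zip(w, x)]
                b += lr * label
    return w, b

def predict(w, b, x):
    """Return +1 if x falls on the positive side of the boundary, else -1."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1

# Hypothetical toy data: positives stand for one superfamily, negatives for others.
positives = ["ALALALAL", "ALALGALAL"]
negatives = ["GSGSGSGS", "GSGSPGSGS"]
X = [kmer_features(s) for s in positives + negatives]
y = [1] * len(positives) + [-1] * len(negatives)
w, b = train_perceptron(X, y)

# Predict unseen samples.
unseen_pos = predict(w, b, kmer_features("ALALAL"))
unseen_neg = predict(w, b, kmer_features("GSGSGS"))
```

In practice the composition features would typically be replaced by richer representations (for example, profile- or alignment-derived vectors, as the last bullet notes), and the perceptron by a kernel SVM or an ensemble method.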
11. Approaches: Discriminative methods
▰ A multiclass classification problem.
▰ The goal is to detect the superfamily of a query protein.
▰ Positive training samples come from proteins outside the target family
but within the same superfamily.
▰ Negative samples are selected from outside the same fold.
▰ Examples: SVM, Random Forest, etc.
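The sample-selection rule above can be sketched as follows. This is an illustrative sketch only: it assumes each protein carries a SCOP-style classification string of the form class.fold.superfamily.family, the protein names and the `select_samples` helper are hypothetical, and treating the target family itself as the held-out positive test set is an assumption about the benchmark setup rather than something the slides state.

```python
from typing import NamedTuple

class Protein(NamedTuple):
    name: str
    sccs: str  # assumed format: class.fold.superfamily.family, e.g. "a.1.1.2"

def level(sccs, depth):
    """Truncate a classification string to its first `depth` components."""
    return ".".join(sccs.split(".")[:depth])

def select_samples(proteins, target_family):
    """Split proteins into positive training, negative, and held-out sets."""
    fold = level(target_family, 2)
    superfamily = level(target_family, 3)
    # Assumption: the target family is held out as the positive test set.
    held_out = [p for p in proteins if p.sccs == target_family]
    # Positives: same superfamily, but outside the target family.
    positives = [p for p in proteins
                 if level(p.sccs, 3) == superfamily and p.sccs != target_family]
    # Negatives: anything outside the target's fold.
    negatives = [p for p in proteins if level(p.sccs, 2) != fold]
    return positives, negatives, held_out

# Hypothetical proteins; names and classifications are made up for illustration.
proteins = [
    Protein("p1", "a.1.1.1"),  # same superfamily, other family -> positive
    Protein("p2", "a.1.1.2"),  # target family -> held out
    Protein("p3", "a.1.2.1"),  # same fold, other superfamily -> unused
    Protein("p4", "b.3.1.1"),  # different fold -> negative
]
positives, negatives, held_out = select_samples(proteins, "a.1.1.2")
```

Note that proteins in the same fold but a different superfamily end up in neither set, which keeps ambiguous cases out of both the positive and the negative training data.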