TM-Coffee : Accurate multiple sequence alignment of transmembrane proteins with PSI-Coffee
Chang, J-M, P Di Tommaso, JF Taly, C Notredame. 2012. Accurate multiple sequence alignment of transmembrane proteins with PSI-Coffee. BMC Bioinformatics 13.


  1. “Homology-enhanced probabilistic consistency” multiple sequence alignment: a case study on transmembrane proteins. Jia-Ming Chang, 2013-July-09. Chang, J-M, P Di Tommaso, JF Taly, C Notredame. 2012. Accurate multiple sequence alignment of transmembrane proteins with PSI-Coffee. BMC Bioinformatics 13.
  2. Transmembrane protein. Membrane proteins are likely to constitute 20-30% of all ORFs contained in genomes. Odorant receptors: Richard Benton, “Eppendorf winner. Evolution and revolution in odor detection,” Science (New York, N.Y.) 326, no. 5951 (October 16, 2009): 382-383.
  3. Transmembrane protein multiple sequence alignment
     • 1994: alignment of transmembrane proteins is first addressed
       – Cserzo M, Bernassau JM, Simon I, Maigret B: New alignment strategy for transmembrane proteins. J Mol Biol 1994, 243(3):388-396.
     • Few multiple sequence alignment tools so far: 3
       – Shafrir Y, Guy HR: STAM: simple transmembrane alignment method. Bioinformatics 2004, 20(5):758-769.
       – Forrest LR, Tang CL, Honig B: On the accuracy of homology modeling and sequence alignment methods applied to membrane proteins. Biophys J 2006, 91(2):508-517.
       – Pirovano W, Feenstra KA, Heringa J: PRALINETM: a strategy for improved multiple alignment of transmembrane proteins. Bioinformatics 2008, 24(4):492-497.
  4. BAliBASE 2.0 reference 7. Pirovano W, Feenstra KA, Heringa J: PRALINETM: a strategy for improved multiple alignment of transmembrane proteins. Bioinformatics 2008, 24(4):492-497.
  5. We need an accurate transmembrane MSA!
  6. Homology-extended. Simossis VA, Kleinjung J, Heringa J: Homology-extended sequence alignment. Nucleic Acids Res 2005, 33(3):816-824.
  7. Homology-extended. Simossis VA, Kleinjung J, Heringa J: Homology-extended sequence alignment. Nucleic Acids Res 2005, 33(3):816-824.
  8. Pair-hidden Markov Model. Do CB, Mahabhashyam MS, Brudno M, Batzoglou S: ProbCons: Probabilistic consistency-based multiple sequence alignment. Genome Res 2005, 15(2):330-340. Emission probabilities, which correspond to traditional substitution scores, are based on the BLOSUM62 matrix.
  9. Probabilistic consistency transformation
  10. Homology-extended probabilistic consistency. The new emission probabilities are:
      $$p'(x_i, y_j) = \sum_{m=1}^{20} \sum_{n=1}^{20} \alpha_m \beta_n \, p(\mathrm{A.A.}_m, \mathrm{A.A.}_n)$$
      where $\alpha_m$ is the frequency with which residue $m$ appears at position $i$, $\beta_n$ is the frequency with which residue $n$ appears at position $j$, and $p(\mathrm{A.A.}_m, \mathrm{A.A.}_n)$ is the original emission probability in ProbCons.
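The profile-weighted emission on this slide can be illustrated with a minimal Python sketch. The two-letter alphabet, the joint probabilities in `PAIR_PROB`, and the function name `profile_emission` are assumptions made for illustration; the sum on the slide runs over all 20 amino acids and uses the ProbCons emission table.

```python
# Hypothetical joint emission probabilities over a toy two-letter alphabet.
# They stand in for the ProbCons emission table and are not real values.
PAIR_PROB = {
    "A": {"A": 0.4, "L": 0.1},
    "L": {"A": 0.1, "L": 0.4},
}


def profile_emission(alpha, beta, pair_prob=PAIR_PROB):
    """Return the sum over m, n of alpha[m] * beta[n] * pair_prob[m][n]."""
    return sum(
        alpha[m] * beta[n] * pair_prob[m][n]
        for m in alpha
        for n in beta
    )


# A profile column that is all "A" against one that is half "A", half "L".
col_i = {"A": 1.0, "L": 0.0}
col_j = {"A": 0.5, "L": 0.5}
mixed = profile_emission(col_i, col_j)

# With one-hot columns the weighted sum reduces to a single-residue emission.
single = profile_emission({"A": 1.0}, {"L": 1.0})
```

When both columns are one-hot, the result is just the table entry for that residue pair, so the profile version behaves as a generalisation of the sequence-only emission.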
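  11. Homology-extended probabilistic consistency
      $$P(x_i \sim y_j \in a^* \mid x, y) \leftarrow \frac{1}{|S|} \sum_{z \in S} \sum_{z_k} \alpha_i \gamma_k \, P(x_i \sim z_k \in a^* \mid x, z) \cdot \beta_j \gamma_k \, P(z_k \sim y_j \in a^* \mid z, y)$$
      where $\alpha_i$, $\beta_j$, and $\gamma_k$ are the profile frequencies.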
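The consistency update on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the profile frequencies are treated as one scalar weight per position (my reading of the slide, not confirmed by it), the posterior matrices are made-up numbers, and the names `relax`, `identity`, and `weights` are hypothetical. With no weights supplied, the function reduces to the plain ProbCons-style average over intermediate sequences.

```python
def identity(n):
    """Posterior matrix of a sequence aligned to itself."""
    return [[1.0 if i == k else 0.0 for k in range(n)] for i in range(n)]


def relax(post, seqs, x, y, weights=None):
    """Re-estimate P(x_i ~ y_j) by averaging over every intermediate z.

    post[(a, b)][i][k] holds P(a_i ~ b_k); weights[a][i] is an assumed
    per-position profile weight (all 1.0 when weights is None).
    """
    n_x, n_y = len(post[(x, y)]), len(post[(x, y)][0])
    out = [[0.0] * n_y for _ in range(n_x)]
    for z in seqs:
        p_xz, p_zy = post[(x, z)], post[(z, y)]
        for i in range(n_x):
            for j in range(n_y):
                for k in range(len(p_zy)):
                    w = 1.0
                    if weights is not None:
                        # alpha_i * gamma_k * beta_j * gamma_k, as on the slide
                        w = weights[x][i] * weights[y][j] * weights[z][k] ** 2
                    out[i][j] += w * p_xz[i][k] * p_zy[k][j]
    return [[v / len(seqs) for v in row] for row in out]


# Toy posteriors for three sequences of length 2 (made-up numbers).
post = {
    ("x", "x"): identity(2),
    ("y", "y"): identity(2),
    ("x", "y"): [[0.8, 0.1], [0.1, 0.7]],
    ("x", "z"): [[0.9, 0.0], [0.0, 0.9]],
    ("z", "y"): [[0.9, 0.0], [0.0, 0.8]],
}
relaxed = relax(post, ["x", "y", "z"], "x", "y")
```

In this toy case the third sequence agrees with the diagonal pairing, so the diagonal entries rise slightly while the off-diagonal entries shrink.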
  12. Homology-extended. Simossis VA, Kleinjung J, Heringa J: Homology-extended sequence alignment. Nucleic Acids Res 2005, 33(3):816-824. Que1: how to build a profile? Que2: how to score profiles?
  13. Que1: how to build a profile?
      • Database size
      • Searching parameters
        – E-value: the most used; anything else?
          1. Matrix file: -M
          2. Filter the query sequence for low-complexity subsequences: -F
          3. Neighborhood word threshold: -f
          4. Truncate the report to a given number of alignments: -b
  14. Word hit & neighborhood
  15. Searching parameters
      • Fast, insensitive search
        – High percent identity
        – blastp -F "m S" -f 999 -M BLOSUM80 -G 9 -E 2 -e 1e-5
      • Slow, sensitive search
        – Increase sensitivity, decrease specificity
        – blastp -F "m S" -f 9 -M BLOSUM45 -e 100 -b 10000 -v 10000
      • Book “BLAST”, pages 146-147
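The `-f` neighborhood word threshold in the two searches above controls how many words close to each query word are allowed to seed a hit. The sketch below illustrates the idea under stated assumptions: a toy four-letter alphabet, a made-up scoring rule (+4 for identity, -1 otherwise) in place of BLOSUM, a word length of 2, and the query word itself always kept as a seed. It is not BLAST code.

```python
from itertools import product

ALPHABET = "ACDE"  # toy alphabet, not the 20 amino acids


def score(a, b):
    """Hypothetical substitution score: +4 for identity, -1 otherwise."""
    return 4 if a == b else -1


def neighborhood(word, threshold):
    """List every word whose score against `word` reaches `threshold`."""
    hits = {word}  # assumption: the exact word always seeds
    for cand in product(ALPHABET, repeat=len(word)):
        total = sum(score(a, b) for a, b in zip(word, cand))
        if total >= threshold:
            hits.add("".join(cand))
    return sorted(hits)


strict = neighborhood("AC", 999)  # very high threshold: exact word only
relaxed_hits = neighborhood("AC", 3)  # also admits single-mismatch words
everything = neighborhood("AC", -2)  # low enough to admit every word
```

A very high threshold leaves only the exact word, which is why such a search is fast but insensitive; lowering it enlarges the seed set and slows the search.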
  16. Different databases. Full databases: UniProt (release 15.15, 2010), NCBI non-redundant (NR), UniRef50, UniRef90, UniRef100. Transmembrane (TM) subsets of UniRef50, UniRef90, UniRef100 and UniProt are selected with keyword:"Transmembrane [KW-0812]".
  17. Database size
      Data set        No. of sequences
      UniRef50-TM               87,989
      UniRef90-TM              263,306
      UniRef100-TM             613,015
      UniProt-TM               818,635
      UniRef50               3,077,464
      UniRef90               6,544,144
      UniRef100              9,865,668
      UniProt               11,009,767
      NCBI NR               10,565,004
      Sources: UniProt (release 15.15, 2010) and NCBI non-redundant (NR); TM subsets selected with keyword:"Transmembrane [KW-0812]".
  18. Performance comparison of different database sizes for BAliBASE2-ref7. UniRef50-TM contains about 100 times fewer sequences than the full UniProt. The level of accuracy is comparable, and even superior, to that achieved with the default PSI-Coffee, while the CPU time requirements are decreased by a factor of 10.
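The "about 100 times fewer sequences" figure can be checked against the counts on the database-size slide. The short sketch below only does that arithmetic; the dictionary and the function name `fold_reduction` are illustrative.

```python
# Sequence counts copied from the database-size slide.
SIZES = {
    "UniRef50-TM": 87_989,
    "UniProt-TM": 818_635,
    "UniRef50": 3_077_464,
    "UniProt": 11_009_767,
}


def fold_reduction(small, large, sizes=SIZES):
    """How many times more sequences `large` holds than `small`."""
    return sizes[large] / sizes[small]


ratio = fold_reduction("UniRef50-TM", "UniProt")
```

The ratio comes out at roughly 125, which matches the order of magnitude quoted on the slide.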
  19. 10% more columns are correctly aligned when compared with PRALINETM. The rows Pairs and Cols denote the numbers of correctly aligned pairs and columns, respectively. The reference alignments contain 3,294,102 pairs and 1,781 columns.
  20. BAliBASE 3.0. The performance figures for the other methods are from Rausch et al. The SP and TC scores of the full-length sequences are evaluated on the core blocks (as given by the XML files).
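The pair and column counts behind the SP (sum-of-pairs) and TC (total-column) scores can be sketched as follows. This is a simplified illustration that scores every reference column rather than only the core blocks, and it is not the BAliBASE scoring program; the function names and the toy alignments are assumptions.

```python
from itertools import combinations


def columns(aln):
    """Convert gapped rows into columns of residue indices (None for gaps)."""
    counters = [0] * len(aln)
    cols = []
    for c in range(len(aln[0])):
        col = []
        for r, row in enumerate(aln):
            if row[c] == "-":
                col.append(None)
            else:
                col.append(counters[r])
                counters[r] += 1
        cols.append(tuple(col))
    return cols


def aligned_pairs(cols):
    """Collect every pair of residues that share a column."""
    out = set()
    for col in cols:
        for a, b in combinations(range(len(col)), 2):
            if col[a] is not None and col[b] is not None:
                out.add((a, col[a], b, col[b]))
    return out


def sp_tc(test, ref):
    """Fraction of reference pairs (SP) and columns (TC) found in `test`."""
    t_cols, r_cols = columns(test), columns(ref)
    r_pairs = aligned_pairs(r_cols)
    sp = len(aligned_pairs(t_cols) & r_pairs) / len(r_pairs)
    tc = len(set(t_cols) & set(r_cols)) / len(r_cols)
    return sp, tc


reference = ["AC-D", "ACGD"]
candidate = ["ACD-", "ACGD"]
sp, tc = sp_tc(candidate, reference)
```

Here the candidate misplaces the final D, so it recovers two of the three reference pairs and two of the four reference columns.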
  21. Que2: how to score profiles? Edgar RC, Sjolander K: A comparison of scoring functions for protein sequence profile alignment. Bioinformatics 2004, 20(8):1301-1308.
  22. • Prediction mode: -template_file PSITM • Output: -output tm_html. This output was obtained on Or94b of D. melanogaster and its orthologs in other Drosophila species. Notably, the predicted topology of the Or94b set is consistent with the conclusion of Benton et al.
  23. Paolo Di Tommaso http://tcoffee.crg.cat/tmcoffee
