SlideShare a Scribd company logo
1 of 34
Protein Sequence Analysis -
Overview
Raja Mazumder
Senior Protein Scientist, PIR
Assistant Professor, Department of Biochemistry and Molecular
Biology
Georgetown University Medical Center
NIH Proteomics Workshop 2004
2
Overview
 Proteomics and protein bioinformatics
(protein sequence analysis)
 Why do protein sequence analysis?
 Searching sequence databases
 Post-processing search results
 Detecting remote homologs
3
Clinical Proteomics
From Petricoin et al., Nature Reviews Drug Discovery (2002) 1, 683-695
From Petricoin et al., Nature Reviews Drug Discovery (2002) 1, 683-695
4
Single protein and shotgun analysis
Adapted from: McDonald et al. 2002. Disease Markers 18 99-105
Protein Bioinformatics
Mixture of proteins
Gel
based
seperation
Single protein analysis
Digestion of
protein mixture
Spot excision
and digestion
LC or
LC/LC separation
Shotgun analysis
Peptides from a
single protein
Peptides from
many proteins
MS analysis
MS/MS analysis
5
Protein Bioinformatics: Protein
sequence analysis
 Helps characterize protein sequences in silico
and allows prediction of protein structure and
function
 Statistically significant BLAST hits usually
signifies sequence homology
 Homologous sequences may or may not have
the same function but would always (very few
exceptions) have the same structural fold
 Protein sequence analysis allows protein
classification
6
Development of protein sequence
databases
 Atlas of protein sequence and structure –
Dayhoff (1966) first sequence database (pre-
bioinformatics). Currently known as Protein
Information Resource (PIR)
 Protein data bank (PDB) – structural database
(1972) remains most widely used database of
structures
 UniProt – The United Protein Databases
(UniProt, 2003) is a central database of protein
sequence and function created by joining the
forces of the SWISS-PROT, TrEMBL and PIR
protein database activities
7
Comparative protein sequence
analysis and evolution
 Patterns of conservation in sequences allows us
to determine which residues are under selective
constraints (are important for protein function)
 Comparative analysis of proteins more sensitive
than comparing DNA
 Homologous proteins have a common ancestor
 Different proteins evolve at different rates
 Protein classification systems based on
evolution: PIRSF and COG
8
PIRSF and large-scale functional
annotation of proteins
 PIRSF structure is in the
form of a network
classification system
based on the evolutionary
relationships of whole
proteins and domains
 As part of the UniProt
project, PIR has
developed this
classification strategy to
assist in the propagation
and standardization of
protein annotation
9
Comparing proteins
 Amino acid sequence of protein generated
from proteomics experiment
 e.g. protein fragment
DTIKDLLPNVCAFPMEKGPCQTYMTRWFFNFETGECELFAYGGCGGNSNNFLRKEKCEKF
CKFT
 Amino-acids of two sequences can be
aligned and we can easily count the
number of identical residues (or use an
index of similarity) to find the % similarity.
 Proteins structures can be compared by
superimposition
10
Protein sequence alignment
 Pairwise alignment
 a b a c d
 a b _ c d
 Multiple sequence alignment usually
provides more information
 a b a c d
 a b _ c d
 x b a c e
 Multiple alignment difficult to do for
distantly related proteins
11
Protein sequence analysis
overview
 Protein databases
 PIR and UniProt
 Searching databases
 Peptide search, BLAST search, Text search
 Information retrieval and analysis
 Protein records at UniProt and PIR
 Multiple sequence alignment
 Secondary structure prediction
 Homology modeling
12
Universal Protein Knowledgebase
(UniProt)
PIR (Protein Information Resource) has recently joined forces with EBI (European
Bioinformatics Institute) and SIB (Swiss Institute of Bioinformatics) to establish the
UniProt
Literature-Based
Annotation
UniProt Archive
UniProt NREF
Swiss-
Prot
PIR-PSD
TrEMBL RefSeq GenBank/
EMBL/DDBJ
EnsEMBL PDB Patent
Data
Other
Data
UniProt Knowledgebase
Classification
Automated Annotation
Clustering at
100, 90, 50%
Literature-Based
Annotation
UniProt Archive
UniProt NREF
Swiss-
Prot
PIR-PSD
TrEMBL RefSeq GenBank/
EMBL/DDBJ
EnsEMBL PDB Patent
Data
Other
Data
UniProt Knowledgebase
Classification
Automated Annotation
Clustering at
100, 90, 50%
http://www.uniprot.org/
13
Peptide Search
14
Query Sequence
 Unknown sequence is Q9I7I7
 BLAST Q9I7I7 against the UniProt
knowledgebase
(http://www.pir.uniprot.org/search/blast.shtml)
 Analyze results
15
BLAST results
16
Text Search
17
Text search results: display
options
Moving Pubmed ID and PDB ID into “Columns in Display”
18
Text search results: add input
box
19
Text Search Result with NULL/NOT
NULL
20
UniProt protein record:
21
SIR2_HUMAN protein record
22
Are Q9I7I7 and SIR2_HUMAN
close homologs?
 Check BLAST results
 Check pairwise alignment
23
Protein structure prediction
 Programs can predict
secondary structure
information with 70%
accuracy
 Homology modeling -
prediction of ‘target
structure from closely
related ‘template’
structure
24
Secondary structure prediction
http://bioinf.cs.ucl.ac.uk/psipred/
25
Secondary structure prediction
results
26
Sir2 Homolog-Nad Complex
27
Homology modeling
http://www.expasy.org/swissmod/SWISS-MODEL.html
28
Homology model of Q9I7I7
Blue - excellent
Green - so so
Red - not good
Yellow - beta sheet
Red - alpha helix
Grey - loop
29
Sequence features:
SIR2_HUMAN
30
Multiple sequence alignment
31
Multiple sequence alignment
 Q9I7I7, Q82QG9, SIR2_HUMAN
32
Sequence features:
CRAA_RABIT
33
Identifying remote homologs
34
Structure guided sequence
alignment

More Related Content

Similar to NIH-mar2604.rm.ppt

Theoretical evaluation of shotgun proteomic analysis strategies; Peptide obse...
Theoretical evaluation of shotgun proteomic analysis strategies; Peptide obse...Theoretical evaluation of shotgun proteomic analysis strategies; Peptide obse...
Theoretical evaluation of shotgun proteomic analysis strategies; Peptide obse...Keiji Takamoto
 
Functional proteomics, and tools
Functional proteomics, and toolsFunctional proteomics, and tools
Functional proteomics, and toolsKAUSHAL SAHU
 
protein databases
 protein databases protein databases
protein databaseswasisyed
 
Multi Target Bioactivity Models in Pipeline Pilot
Multi Target Bioactivity Models in Pipeline PilotMulti Target Bioactivity Models in Pipeline Pilot
Multi Target Bioactivity Models in Pipeline PilotGerard van Westen
 
Protein Sequence Databases
Protein Sequence Databases Protein Sequence Databases
Protein Sequence Databases Hemant Bothe
 
The uni prot knowledgebase
The uni prot knowledgebaseThe uni prot knowledgebase
The uni prot knowledgebaseKew Sama
 
STRUCTURAL GENOMICS, FUNCTIONAL GENOMICS, COMPARATIVE GENOMICS
STRUCTURAL GENOMICS, FUNCTIONAL GENOMICS, COMPARATIVE GENOMICSSTRUCTURAL GENOMICS, FUNCTIONAL GENOMICS, COMPARATIVE GENOMICS
STRUCTURAL GENOMICS, FUNCTIONAL GENOMICS, COMPARATIVE GENOMICSSHEETHUMOLKS
 
Protein databases
Protein databasesProtein databases
Protein databasessarumalay
 
Protein Qualitative Analysis Services
Protein Qualitative Analysis ServicesProtein Qualitative Analysis Services
Protein Qualitative Analysis ServicesCreative Proteomics
 
How bioinformatic and sequencing data might inform the regulatory process - O...
How bioinformatic and sequencing data might inform the regulatory process - O...How bioinformatic and sequencing data might inform the regulatory process - O...
How bioinformatic and sequencing data might inform the regulatory process - O...OECD Environment
 
Proteomics in VSC for crop improvement programme
Proteomics in VSC for crop improvement programmeProteomics in VSC for crop improvement programme
Proteomics in VSC for crop improvement programmeSumanthBT1
 
Informal presentation on bioinformatics
Informal presentation on bioinformaticsInformal presentation on bioinformatics
Informal presentation on bioinformaticsAtai Rabby
 
TheUniProtKBpptx__2022_03_30_13_07_41.pptx
TheUniProtKBpptx__2022_03_30_13_07_41.pptxTheUniProtKBpptx__2022_03_30_13_07_41.pptx
TheUniProtKBpptx__2022_03_30_13_07_41.pptxPRIYANKAZALA9
 
Bioinformatic, and tools by kk sahu
Bioinformatic, and tools by kk sahuBioinformatic, and tools by kk sahu
Bioinformatic, and tools by kk sahuKAUSHAL SAHU
 
Targeted and Discovery Proteomics
Targeted and Discovery ProteomicsTargeted and Discovery Proteomics
Targeted and Discovery ProteomicsCreative Proteomics
 
Research presentation-wd
Research presentation-wdResearch presentation-wd
Research presentation-wdWagied Davids
 
Proteomics of Hormone Fragments 111112
Proteomics of Hormone Fragments 111112Proteomics of Hormone Fragments 111112
Proteomics of Hormone Fragments 111112Roshan Kumar
 

Similar to NIH-mar2604.rm.ppt (20)

Theoretical evaluation of shotgun proteomic analysis strategies; Peptide obse...
Theoretical evaluation of shotgun proteomic analysis strategies; Peptide obse...Theoretical evaluation of shotgun proteomic analysis strategies; Peptide obse...
Theoretical evaluation of shotgun proteomic analysis strategies; Peptide obse...
 
Proteomic databases
Proteomic databasesProteomic databases
Proteomic databases
 
Functional proteomics, and tools
Functional proteomics, and toolsFunctional proteomics, and tools
Functional proteomics, and tools
 
protein databases
 protein databases protein databases
protein databases
 
Multi Target Bioactivity Models in Pipeline Pilot
Multi Target Bioactivity Models in Pipeline PilotMulti Target Bioactivity Models in Pipeline Pilot
Multi Target Bioactivity Models in Pipeline Pilot
 
Protein Databases
Protein DatabasesProtein Databases
Protein Databases
 
Protein Sequence Databases
Protein Sequence Databases Protein Sequence Databases
Protein Sequence Databases
 
The uni prot knowledgebase
The uni prot knowledgebaseThe uni prot knowledgebase
The uni prot knowledgebase
 
STRUCTURAL GENOMICS, FUNCTIONAL GENOMICS, COMPARATIVE GENOMICS
STRUCTURAL GENOMICS, FUNCTIONAL GENOMICS, COMPARATIVE GENOMICSSTRUCTURAL GENOMICS, FUNCTIONAL GENOMICS, COMPARATIVE GENOMICS
STRUCTURAL GENOMICS, FUNCTIONAL GENOMICS, COMPARATIVE GENOMICS
 
Protein databases
Protein databasesProtein databases
Protein databases
 
Protein Qualitative Analysis Services
Protein Qualitative Analysis ServicesProtein Qualitative Analysis Services
Protein Qualitative Analysis Services
 
How bioinformatic and sequencing data might inform the regulatory process - O...
How bioinformatic and sequencing data might inform the regulatory process - O...How bioinformatic and sequencing data might inform the regulatory process - O...
How bioinformatic and sequencing data might inform the regulatory process - O...
 
Proteomics in VSC for crop improvement programme
Proteomics in VSC for crop improvement programmeProteomics in VSC for crop improvement programme
Proteomics in VSC for crop improvement programme
 
Informal presentation on bioinformatics
Informal presentation on bioinformaticsInformal presentation on bioinformatics
Informal presentation on bioinformatics
 
TheUniProtKBpptx__2022_03_30_13_07_41.pptx
TheUniProtKBpptx__2022_03_30_13_07_41.pptxTheUniProtKBpptx__2022_03_30_13_07_41.pptx
TheUniProtKBpptx__2022_03_30_13_07_41.pptx
 
Bioinformatic, and tools by kk sahu
Bioinformatic, and tools by kk sahuBioinformatic, and tools by kk sahu
Bioinformatic, and tools by kk sahu
 
Targeted and Discovery Proteomics
Targeted and Discovery ProteomicsTargeted and Discovery Proteomics
Targeted and Discovery Proteomics
 
PROTEIN DATABASE
PROTEIN DATABASEPROTEIN DATABASE
PROTEIN DATABASE
 
Research presentation-wd
Research presentation-wdResearch presentation-wd
Research presentation-wd
 
Proteomics of Hormone Fragments 111112
Proteomics of Hormone Fragments 111112Proteomics of Hormone Fragments 111112
Proteomics of Hormone Fragments 111112
 

Recently uploaded

Solving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptxSolving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptxOH TEIK BIN
 
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdf
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdfFraming an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdf
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdfUjwalaBharambe
 
Roles & Responsibilities in Pharmacovigilance
Roles & Responsibilities in PharmacovigilanceRoles & Responsibilities in Pharmacovigilance
Roles & Responsibilities in PharmacovigilanceSamikshaHamane
 
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxiammrhaywood
 
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...JhezDiaz1
 
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
EPANDING THE CONTENT OF AN OUTLINE using notes.pptx
EPANDING THE CONTENT OF AN OUTLINE using notes.pptxEPANDING THE CONTENT OF AN OUTLINE using notes.pptx
EPANDING THE CONTENT OF AN OUTLINE using notes.pptxRaymartEstabillo3
 
Computed Fields and api Depends in the Odoo 17
Computed Fields and api Depends in the Odoo 17Computed Fields and api Depends in the Odoo 17
Computed Fields and api Depends in the Odoo 17Celine George
 
MICROBIOLOGY biochemical test detailed.pptx
MICROBIOLOGY biochemical test detailed.pptxMICROBIOLOGY biochemical test detailed.pptx
MICROBIOLOGY biochemical test detailed.pptxabhijeetpadhi001
 
Final demo Grade 9 for demo Plan dessert.pptx
Final demo Grade 9 for demo Plan dessert.pptxFinal demo Grade 9 for demo Plan dessert.pptx
Final demo Grade 9 for demo Plan dessert.pptxAvyJaneVismanos
 
Painted Grey Ware.pptx, PGW Culture of India
Painted Grey Ware.pptx, PGW Culture of IndiaPainted Grey Ware.pptx, PGW Culture of India
Painted Grey Ware.pptx, PGW Culture of IndiaVirag Sontakke
 
Historical philosophical, theoretical, and legal foundations of special and i...
Historical philosophical, theoretical, and legal foundations of special and i...Historical philosophical, theoretical, and legal foundations of special and i...
Historical philosophical, theoretical, and legal foundations of special and i...jaredbarbolino94
 
Crayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon ACrayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon AUnboundStockton
 
MARGINALIZATION (Different learners in Marginalized Group
MARGINALIZATION (Different learners in Marginalized GroupMARGINALIZATION (Different learners in Marginalized Group
MARGINALIZATION (Different learners in Marginalized GroupJonathanParaisoCruz
 
Introduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher EducationIntroduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher Educationpboyjonauth
 
What is Model Inheritance in Odoo 17 ERP
What is Model Inheritance in Odoo 17 ERPWhat is Model Inheritance in Odoo 17 ERP
What is Model Inheritance in Odoo 17 ERPCeline George
 

Recently uploaded (20)

Solving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptxSolving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptx
 
OS-operating systems- ch04 (Threads) ...
OS-operating systems- ch04 (Threads) ...OS-operating systems- ch04 (Threads) ...
OS-operating systems- ch04 (Threads) ...
 
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdf
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdfFraming an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdf
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdf
 
Roles & Responsibilities in Pharmacovigilance
Roles & Responsibilities in PharmacovigilanceRoles & Responsibilities in Pharmacovigilance
Roles & Responsibilities in Pharmacovigilance
 
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
 
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
 
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
 
EPANDING THE CONTENT OF AN OUTLINE using notes.pptx
EPANDING THE CONTENT OF AN OUTLINE using notes.pptxEPANDING THE CONTENT OF AN OUTLINE using notes.pptx
EPANDING THE CONTENT OF AN OUTLINE using notes.pptx
 
Computed Fields and api Depends in the Odoo 17
Computed Fields and api Depends in the Odoo 17Computed Fields and api Depends in the Odoo 17
Computed Fields and api Depends in the Odoo 17
 
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
 
MICROBIOLOGY biochemical test detailed.pptx
MICROBIOLOGY biochemical test detailed.pptxMICROBIOLOGY biochemical test detailed.pptx
MICROBIOLOGY biochemical test detailed.pptx
 
Final demo Grade 9 for demo Plan dessert.pptx
Final demo Grade 9 for demo Plan dessert.pptxFinal demo Grade 9 for demo Plan dessert.pptx
Final demo Grade 9 for demo Plan dessert.pptx
 
Painted Grey Ware.pptx, PGW Culture of India
Painted Grey Ware.pptx, PGW Culture of IndiaPainted Grey Ware.pptx, PGW Culture of India
Painted Grey Ware.pptx, PGW Culture of India
 
Historical philosophical, theoretical, and legal foundations of special and i...
Historical philosophical, theoretical, and legal foundations of special and i...Historical philosophical, theoretical, and legal foundations of special and i...
Historical philosophical, theoretical, and legal foundations of special and i...
 
Crayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon ACrayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon A
 
MARGINALIZATION (Different learners in Marginalized Group
MARGINALIZATION (Different learners in Marginalized GroupMARGINALIZATION (Different learners in Marginalized Group
MARGINALIZATION (Different learners in Marginalized Group
 
Model Call Girl in Bikash Puri Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Bikash Puri  Delhi reach out to us at 🔝9953056974🔝Model Call Girl in Bikash Puri  Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Bikash Puri Delhi reach out to us at 🔝9953056974🔝
 
Introduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher EducationIntroduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher Education
 
What is Model Inheritance in Odoo 17 ERP
What is Model Inheritance in Odoo 17 ERPWhat is Model Inheritance in Odoo 17 ERP
What is Model Inheritance in Odoo 17 ERP
 
TataKelola dan KamSiber Kecerdasan Buatan v022.pdf
TataKelola dan KamSiber Kecerdasan Buatan v022.pdfTataKelola dan KamSiber Kecerdasan Buatan v022.pdf
TataKelola dan KamSiber Kecerdasan Buatan v022.pdf
 

NIH-mar2604.rm.ppt

  • 1. Protein Sequence Analysis - Overview Raja Mazumder Senior Protein Scientist, PIR Assistant Professor, Department of Biochemistry and Molecular Biology Georgetown University Medical Center NIH Proteomics Workshop 2004
  • 2. 2 Overview  Proteomics and protein bioinformatics (protein sequence analysis)  Why do protein sequence analysis?  Searching sequence databases  Post-processing search results  Detecting remote homologs
  • 3. 3 Clinical Proteomics From Petricoin et al., Nature Reviews Drug Discovery (2002) 1, 683-695 From Petricoin et al., Nature Reviews Drug Discovery (2002) 1, 683-695
  • 4. 4 Single protein and shotgun analysis Adapted from: McDonald et al. 2002. Disease Markers 18 99-105 Protein Bioinformatics Mixture of proteins Gel based seperation Single protein analysis Digestion of protein mixture Spot excision and digestion LC or LC/LC separation Shotgun analysis Peptides from a single protein Peptides from many proteins MS analysis MS/MS analysis
  • 5. 5 Protein Bioinformatics: Protein sequence analysis  Helps characterize protein sequences in silico and allows prediction of protein structure and function  Statistically significant BLAST hits usually signifies sequence homology  Homologous sequences may or may not have the same function but would always (very few exceptions) have the same structural fold  Protein sequence analysis allows protein classification
  • 6. 6 Development of protein sequence databases  Atlas of protein sequence and structure – Dayhoff (1966) first sequence database (pre- bioinformatics). Currently known as Protein Information Resource (PIR)  Protein data bank (PDB) – structural database (1972) remains most widely used database of structures  UniProt – The United Protein Databases (UniProt, 2003) is a central database of protein sequence and function created by joining the forces of the SWISS-PROT, TrEMBL and PIR protein database activities
  • 7. 7 Comparative protein sequence analysis and evolution  Patterns of conservation in sequences allows us to determine which residues are under selective constraints (are important for protein function)  Comparative analysis of proteins more sensitive than comparing DNA  Homologous proteins have a common ancestor  Different proteins evolve at different rates  Protein classification systems based on evolution: PIRSF and COG
  • 8. 8 PIRSF and large-scale functional annotation of proteins  PIRSF structure is in the form of a network classification system based on the evolutionary relationships of whole proteins and domains  As part of the UniProt project, PIR has developed this classification strategy to assist in the propagation and standardization of protein annotation
  • 9. 9 Comparing proteins  Amino acid sequence of protein generated from proteomics experiment  e.g. protein fragment DTIKDLLPNVCAFPMEKGPCQTYMTRWFFNFETGECELFAYGGCGGNSNNFLRKEKCEKF CKFT  Amino-acids of two sequences can be aligned and we can easily count the number of identical residues (or use an index of similarity) to find the % similarity.  Proteins structures can be compared by superimposition
  • 10. 10 Protein sequence alignment  Pairwise alignment  a b a c d  a b _ c d  Multiple sequence alignment usually provides more information  a b a c d  a b _ c d  x b a c e  Multiple alignment difficult to do for distantly related proteins
  • 11. 11 Protein sequence analysis overview  Protein databases  PIR and UniProt  Searching databases  Peptide search, BLAST search, Text search  Information retrieval and analysis  Protein records at UniProt and PIR  Multiple sequence alignment  Secondary structure prediction  Homology modeling
  • 12. 12 Universal Protein Knowledgebase (UniProt) PIR (Protein Information Resource) has recently joined forces with EBI (European Bioinformatics Institute) and SIB (Swiss Institute of Bioinformatics) to establish the UniProt Literature-Based Annotation UniProt Archive UniProt NREF Swiss- Prot PIR-PSD TrEMBL RefSeq GenBank/ EMBL/DDBJ EnsEMBL PDB Patent Data Other Data UniProt Knowledgebase Classification Automated Annotation Clustering at 100, 90, 50% Literature-Based Annotation UniProt Archive UniProt NREF Swiss- Prot PIR-PSD TrEMBL RefSeq GenBank/ EMBL/DDBJ EnsEMBL PDB Patent Data Other Data UniProt Knowledgebase Classification Automated Annotation Clustering at 100, 90, 50% http://www.uniprot.org/
  • 14. 14 Query Sequence  Unknown sequence is Q9I7I7  BLAST Q9I7I7 against the UniProt knowledgebase (http://www.pir.uniprot.org/search/blast.shtml)  Analyze results
  • 17. 17 Text search results: display options Moving Pubmed ID and PDB ID into “Columns in Display”
  • 18. 18 Text search results: add input box
  • 19. 19 Text Search Result with NULL/NOT NULL
  • 22. 22 Are Q9I7I7 and SIR2_HUMAN close homologs?  Check BLAST results  Check pairwise alignment
  • 23. 23 Protein structure prediction  Programs can predict secondary structure information with 70% accuracy  Homology modeling - prediction of ‘target structure from closely related ‘template’ structure
  • 28. 28 Homology model of Q9I7I7 Blue - excellent Green - so so Red - not good Yellow - beta sheet Red - alpha helix Grey - loop
  • 31. 31 Multiple sequence alignment  Q9I7I7, Q82QG9, SIR2_HUMAN