SlideShare a Scribd company logo
1 of 88
Download to read offline
PROTEIN PATTERN
DATABASES
PROTEIN SEQUENCES
SUPERFAMILY
FAMILY
DOMAIN
MOTIF
SITE
RESIDUE
BASIC INFORMATION COMES
FROM SEQUENCE
• Multiple alignments of related sequences- can
build up consensus sequences of known families,
domains, motifs or sites.
• Pattern
• Matrix
• Profile
• HMM
COMMON PROTEIN PATTERN
DATABASES
• Prosite patterns
• Prosite profiles
• Pfam
• SMART
• Prints
• TIGRFAMs
• BLOCKS
Alignment databases
• ProDom
• PIR-ALN
• ProtoMap
• Domo
• ProClass
PROSITE Patterns and profiles
• http://www.expasy.ch/prosite/
• Building a pattern:
a.) from literature -test against SP, update if necessary
b.) new patterns:
Start with reviewed protein family, known functional sites:
• enzyme catalytic site,
• attachment site eg heme,
• metal ion binding site
• cysteines for disulphide bonds,
• molecule (GTP) or protein binding site
PROSITE PATTERNS
Alignment
Functionally
important
residues
Find 4-5
conserved
residues
Core pattern
Find only
correct
entries-
leave
Find many
false
positives,
increase
pattern and
re-search
Search SP
Pattern is given as regular expression:
[AC]-x-V-x(4)-{ED}
ala/cys-any-val-any-any-any-any-(any except glu or asp)
PROSITE PROFILES
• Not confined to small regions, cover whole protein or
domain and has more info on allowed aa at each position
• Start with multiple seq alignment -uses a symbol comparison
table to convert residue frequency distributions into weights
• Result- table of position-specific amino acid weights and
gap costs- calculate a similarity score for any alignment
between a profile and a sequence, or parts of a profile and a
sequence
• Tested on SP, refined. Begin as prefiles then integrated
Pfam
•http://www.sanger.ac.uk/Software/Pfam/index.shtml
•Database of HMMs for domains and families
•HMMs are built from HMMER2 (Bayesian statistical models), can use
two modes ls or fs, all domains should be matched with ls
•Use Bits scores, thresholds are chosen manually using E-value from
extreme fit distribution
•Two parts to Pfam:
➩ PfamA -manually curated
➩ PfamB -automatic clustering of rest of SPTR from ProDom using Domainer
•Use -looking at domain structure of SPTR protein or new sequence
Additional features of Pfam
• PfamA has about 65% coverage of SPTR, rest is
covered by PfamB
• Can search directly with DNA -Wise2 package
• Can view taxonomic range of each entry
• Can view proteins with similar domain structure
and view of all family members
• Links to other databases including 3D structure
• Note: No 2 PfamA HMMs should overlap
SMART- Simple Modular Architecture
Research Tool
• http://smart.embl-heidelberg.de/
• Relies on hand curated multiple sequence alignments of
representative family members from PSI-BLAST- builds HMMs-
used to search database for more seq for alignment- iterative
searching until no more homologues detected
• Store Ep (highest per protein E-value of T) and En (lowest per
protein E-value of N) values
• Will predict domain homologue with sequence if
– Ep < E-value <En and E-value <1.0
Additional features of SMART
• Used for identification of genetically mobile domains and
analysis of domain architectures
• Can search for proteins containing specific combinations of
domains in defined taxa
• Can search for proteins with identical domain architecture
• Also has information on intrinsic features like signal
sequences, transmembrane helices, coiled-coil regions and
compositionally biased regions
ProDom
• http://www.toulouse.inra.fr/prodom.html
• Groups all sequences in SPTR into domains ->150 000
families
• Use automatic process to build up domains -DOMAINER
• For expert curated families, use PfamA alignments to build
new ProDom families
• Use diameter (max distance between two domains in family)
and radius of gyration root mean square of distance between
domain and family consensus), both counted in PAM (percent
accepted mutations (no per 100 aa) to measure consistency of
a family, lower these values, more homogeneous family
Building of ProDom families
DB initialisation: SEG, split
modified seq, sort by size
Query DB looking for
internal repeats
Extract shortest
sequence as a query
PSI-BLAST New ProDom
domain family
Extract first repeat -
use as a query
Remove new found
domains, sort by size
Repeated
until database
is empty
PRINTS -Fingerprint DB
• http://www.bioinf.man.ac.uk/dbbrowser/PRINTS/
• Fingerprint- set of motifs used to predict occurrence of similar motifs in
a sequence
• Built by iterative scanning of OWL database
• Multiple sequence alignment- identify conserved motifs- scan database
with each motif- correlate hitlists for each- should have more sequences
now- generate more motifs- repeat until convergence
• Recognition of individual elements in fingerprint is mutually conditional
• True members match all elements in order, subfamily may match part of
fingerprint
BLOCKS
• http://www.blocks.fhcrc.org/
• Multiply aligned ungapped segments corresponding to most
highly conserved regions of proteins- represented in profile
• Built up using PROTOMAT (BLOSUM scoring model),
calibrated against SWISS-PROT, use LAMA to search
blocks against blocks
• Starting sequences from Prosite, PRINTS, Pfam, ProDom
and Domo - total of 2129 families
Building of Blocks
Alignments
from Prosite
Alignments
from ProDom
Alignments
from PfamA
Alignments
from PRINTS
Alignments
from Domo
Build blocks using
PROTOMAT
Search for common
blocks LAMA- remove
Build blocks using
PROTOMAT
Search for common
blocks LAMA- remove
Build blocks using
PROTOMAT
Search for common
blocks LAMA- remove
Blocks
database
annotated
verified
Unverified and
changes
SEARCHING BLOCKS
• Compare a protein or DNA (1-6 frames)
sequence to database of blocks
• Blocks Searcher- used via internet or email:
First position of sequence aligned to first position of first block -
score for that position, score summed over width of alignment,
then block is aligned with next position etc for all blocks in
database- get best alignment score. Search is slow (350 aa/2 min)
• Can search database of PSI-BLAST PSSMs for
each blocks family using IMPALA
TIGRFAMs
• http://www.tigr.org/TIGRFAMs
• Collection of protein families in HMMs built with curated
multiple sequence alignments and with associated functional
information
• Equivalog- homologous proteins conserved with respect to
function since last ancestor (other pattern databases
concentrate on related seq not function)
• > 800 non-overlapping families -can search by text or
sequence
• Has information for automatic annotation of function,
weighted towards microbial genomes
Text search
results
Example entry
Sequence search result
PIR-ALN
• http://www-nbrf.georgetown.edu/pirwww/
search/textpiraln.html
• Database of annotated protein sequence alignments
derived automatically from PIR PSD
• Includes alignments at superfamily (whole sequence),
family (45% identity) and domain (in more than one
superfamily) levels
• 3983 alignments, 1480 superfamilies, 371 domains
• Can search by protein accession number or text
PROTOMAP
• http://www.protomap.cs.huji.ac.il
• Automatic classification of all SWISS-PROT proteins into
groups of related proteins (also including TrEMBL now)
• Based on pairwise similarities
• Has hierarchical organisation for sub- and super-family
distinctions
• 13 354 clusters, 5869 ≥ 2 proteins, 1403 ≥ 10
• Keeps SP annotation eg description, keywords
• Can search with a sequence -classify it into existing clusters
DOMO
• http://www.infobiogen.fr/srs6bin/cgi-bin/wgetz?-
page+LibInfo+-lib+DOMO (SRS)
• Database of gapped multiple sequence alignments from
SWISS-PROT and PIR
• Domain boundaries inferred automatically, rather than
from 3D data
• Has 8877 alignments, 99058 domains, and repeats
• Each entry is one homologous domain, has annotation on
related proteins, functional families, evolutionary tree etc
ProClass
• http://pir.georgetown.edu/gfserver/proclass.html
• Non-redundant protein database organized by
family relationships defined by Prosite patterns
and PIR superfamilies.
• Facilitates protein family information retrieval,
domain and family relationships, and classifies
multi-domain proteins
• Contains 155,868 sequence entries
SBASE (Agricultural Biotechnology Centre)
• http://sbase.abc.hu/main.html
• Protein domain library from clustering of functional and
structural domains
• SBASE entries - grouped by Standard names (SN groups)
that designate various functional and structural domains of
protein sequences- relies on good annotation of domains
• Detects subclasses too
• Can do similarity search with BLAST or PSI-BLAST
Integrating Pattern databases
• MetaFam
• IProClass
• CDD
• InterPro
METAFAM
• http://metafam.ahc.umn.edu/
• Protein family classification built with Blocks+,
DOMO, Pfam, PIR-ALN, PRINTS, Prosite,
ProDom, SBASE, SYSTERS
• Automatically create supersets of overlapping
families using set-theory to compare databases-
reference domains covering total area
• Use non-redundant protein set from SPTR & PIR
IProClass
• http://pir.georgetown.edu/iproclass/
• Integrated database linking ProClass, PIR-ALN,
Prosite, Pfam and Blocks
• Contains >20000 non-redundant SP & PIR
proteins, 28000 superfamilies, 2600 domains,
1300 motifs, 280 PTMs
• Can be searched by text or sequence
CDD Conserved
Domain Database
• http://www.ncbi.nlm.nih.gov:80/Structure/cdd/cdd.shtml
• Database of domains derived from SMART, Pfam and
contributions from NCBI (LOAD)
• Uses reverse position-specific BLAST (matrix)
• Links to proteins in Entrez and 3D structure
• Stand-alone version of RPS-BLAST at:
ftp://ncbi.nlm.nih.gov/toolbox
CDD homepage
CDD Search
result
DART
CDD
example
entry
PIR link
from CDD
INTERPRO
• http://www.ebi.ac.uk/interpro
• Integration of different signature
recognition methods (PROSITE,
PRINTS, PFAM, ProDom and SMART)
InterPro release 3
• Built from PROSITE, PRINTS, Pfam, ProDom, SMART, SWISS-PROT
and TrEMBL
• Contains 3915 entries encoded by 7714 different regular expressions,
profiles, fingerprints, Hidden Markov Models and ProDom domains
• InterPro provides >1 million InterPro matches hits against 532403 SWISS-
PROT + TrEMBL protein sequences (68% coverage)
• Direct access to the underlying Oracle database
• A XML flatfile is available at ftp://ftp.ebi.ac.uk/pub/databases/interpro/
• SRS implementation
• Text- and sequence-based searches
InterProScan
• PROSITE patterns: ppsearch
• PROSITE profiles: pfscan
• PFAM HMMs: hmmpfam
• PRINTS fingerprints: fpscan
• ProDom
• SMART
• eMotif derived PROSITE pattern
• TMHMM
• SignalP
PRINTS detailed results ANX3_MOUSE
Annexin type III
SUMMARY
• Many different protein signature databases from
small patterns to alignments to complex HMMs
• Have different strengths and weaknesses
• Have different database formats
• Therefore: best to combine methods, preferably
in a database with them already merged for
simple analysis with consistent format
Protein Secondary Structure
• CATH (Class, Architecture,Topology, Homology)
http://www.biochem.ucl.ac.uk/dbbrowser/cath/
• SCOP (structural classification of proteins) -hierarchical
database of protein folds
http://scop.mrc-lmb.cam.ac.uk/scop
• FSSP Fold classification using structure-structure
alignment of proteins
http://www2.ebi.ac.uk/fssp/fssp.html
• TOPS Cartoon representation of topology showing
helices and strands http://tops.ebi.ac.uk/tops/

More Related Content

Similar to patterndat.pdf

ECCMID 2015 Meet-The-Expert: Bioinformatics Tools
ECCMID 2015 Meet-The-Expert: Bioinformatics ToolsECCMID 2015 Meet-The-Expert: Bioinformatics Tools
ECCMID 2015 Meet-The-Expert: Bioinformatics ToolsNick Loman
 
2018 02 20_biological_databases_part1_v_upload
2018 02 20_biological_databases_part1_v_upload2018 02 20_biological_databases_part1_v_upload
2018 02 20_biological_databases_part1_v_uploadProf. Wim Van Criekinge
 
Mapping protein to function
Mapping protein to functionMapping protein to function
Mapping protein to functionAbhik Seal
 
Data retreival system
Data retreival systemData retreival system
Data retreival systemShikha Thakur
 
PhyloTastic: names-based phyloinformatic data integration
PhyloTastic: names-based phyloinformatic data integrationPhyloTastic: names-based phyloinformatic data integration
PhyloTastic: names-based phyloinformatic data integrationRutger Vos
 
Performance Improvement of BLAST with Use of MSA Techniques to Search Ancesto...
Performance Improvement of BLAST with Use of MSA Techniques to Search Ancesto...Performance Improvement of BLAST with Use of MSA Techniques to Search Ancesto...
Performance Improvement of BLAST with Use of MSA Techniques to Search Ancesto...journal ijrtem
 
Performance Improvement of BLAST with Use of MSA Techniques to Search Ancesto...
Performance Improvement of BLAST with Use of MSA Techniques to Search Ancesto...Performance Improvement of BLAST with Use of MSA Techniques to Search Ancesto...
Performance Improvement of BLAST with Use of MSA Techniques to Search Ancesto...IJRTEMJOURNAL
 
Seqr - Protein Sequence Search: Presented by Lianyi Han, Medical Science & Co...
Seqr - Protein Sequence Search: Presented by Lianyi Han, Medical Science & Co...Seqr - Protein Sequence Search: Presented by Lianyi Han, Medical Science & Co...
Seqr - Protein Sequence Search: Presented by Lianyi Han, Medical Science & Co...Lucidworks
 
BLAST (Basic local alignment search Tool)
BLAST (Basic local alignment search Tool)BLAST (Basic local alignment search Tool)
BLAST (Basic local alignment search Tool)Ariful Islam Sagar
 
GLBIO/CCBC Metagenomics Workshop
GLBIO/CCBC Metagenomics WorkshopGLBIO/CCBC Metagenomics Workshop
GLBIO/CCBC Metagenomics WorkshopMorgan Langille
 
Lecture 5.pptx
Lecture 5.pptxLecture 5.pptx
Lecture 5.pptxericndunek
 
In silico structure prediction
In silico structure predictionIn silico structure prediction
In silico structure predictionSubin E K
 
Computational biology bls 303
Computational biology bls 303Computational biology bls 303
Computational biology bls 303Bruno Mmassy
 
Hands on training_biological_databases.ppt
Hands on training_biological_databases.pptHands on training_biological_databases.ppt
Hands on training_biological_databases.pptSoumen Barman
 

Similar to patterndat.pdf (20)

ECCMID 2015 Meet-The-Expert: Bioinformatics Tools
ECCMID 2015 Meet-The-Expert: Bioinformatics ToolsECCMID 2015 Meet-The-Expert: Bioinformatics Tools
ECCMID 2015 Meet-The-Expert: Bioinformatics Tools
 
Protein database
Protein databaseProtein database
Protein database
 
2017 biological databases_part1_vupload
2017 biological databases_part1_vupload2017 biological databases_part1_vupload
2017 biological databases_part1_vupload
 
2018 02 20_biological_databases_part1_v_upload
2018 02 20_biological_databases_part1_v_upload2018 02 20_biological_databases_part1_v_upload
2018 02 20_biological_databases_part1_v_upload
 
Mapping protein to function
Mapping protein to functionMapping protein to function
Mapping protein to function
 
Protein databases
Protein databasesProtein databases
Protein databases
 
Data retreival system
Data retreival systemData retreival system
Data retreival system
 
PhyloTastic: names-based phyloinformatic data integration
PhyloTastic: names-based phyloinformatic data integrationPhyloTastic: names-based phyloinformatic data integration
PhyloTastic: names-based phyloinformatic data integration
 
Performance Improvement of BLAST with Use of MSA Techniques to Search Ancesto...
Performance Improvement of BLAST with Use of MSA Techniques to Search Ancesto...Performance Improvement of BLAST with Use of MSA Techniques to Search Ancesto...
Performance Improvement of BLAST with Use of MSA Techniques to Search Ancesto...
 
Performance Improvement of BLAST with Use of MSA Techniques to Search Ancesto...
Performance Improvement of BLAST with Use of MSA Techniques to Search Ancesto...Performance Improvement of BLAST with Use of MSA Techniques to Search Ancesto...
Performance Improvement of BLAST with Use of MSA Techniques to Search Ancesto...
 
Seqr - Protein Sequence Search: Presented by Lianyi Han, Medical Science & Co...
Seqr - Protein Sequence Search: Presented by Lianyi Han, Medical Science & Co...Seqr - Protein Sequence Search: Presented by Lianyi Han, Medical Science & Co...
Seqr - Protein Sequence Search: Presented by Lianyi Han, Medical Science & Co...
 
BLAST (Basic local alignment search Tool)
BLAST (Basic local alignment search Tool)BLAST (Basic local alignment search Tool)
BLAST (Basic local alignment search Tool)
 
GLBIO/CCBC Metagenomics Workshop
GLBIO/CCBC Metagenomics WorkshopGLBIO/CCBC Metagenomics Workshop
GLBIO/CCBC Metagenomics Workshop
 
BLAST
BLASTBLAST
BLAST
 
Lecture 5.pptx
Lecture 5.pptxLecture 5.pptx
Lecture 5.pptx
 
Blasta
BlastaBlasta
Blasta
 
In silico structure prediction
In silico structure predictionIn silico structure prediction
In silico structure prediction
 
Proteins databases
Proteins databasesProteins databases
Proteins databases
 
Computational biology bls 303
Computational biology bls 303Computational biology bls 303
Computational biology bls 303
 
Hands on training_biological_databases.ppt
Hands on training_biological_databases.pptHands on training_biological_databases.ppt
Hands on training_biological_databases.ppt
 

Recently uploaded

BabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxBabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxolyaivanovalion
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxolyaivanovalion
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Researchmichael115558
 
VidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxVidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxolyaivanovalion
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysismanisha194592
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfMarinCaroMartnezBerg
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxolyaivanovalion
 
Vip Model Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
Vip Model  Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...Vip Model  Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
Vip Model Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...shivangimorya083
 
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...amitlee9823
 
Smarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxSmarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxolyaivanovalion
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz1
 
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779Delhi Call girls
 
Determinants of health, dimensions of health, positive health and spectrum of...
Determinants of health, dimensions of health, positive health and spectrum of...Determinants of health, dimensions of health, positive health and spectrum of...
Determinants of health, dimensions of health, positive health and spectrum of...shambhavirathore45
 
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort ServiceBDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort ServiceDelhi Call girls
 
Introduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxIntroduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxfirstjob4
 
Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFxolyaivanovalion
 
Zuja dropshipping via API with DroFx.pptx
Zuja dropshipping via API with DroFx.pptxZuja dropshipping via API with DroFx.pptx
Zuja dropshipping via API with DroFx.pptxolyaivanovalion
 
Data-Analysis for Chicago Crime Data 2023
Data-Analysis for Chicago Crime Data  2023Data-Analysis for Chicago Crime Data  2023
Data-Analysis for Chicago Crime Data 2023ymrp368
 

Recently uploaded (20)

BabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxBabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptx
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptx
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Research
 
Sampling (random) method and Non random.ppt
Sampling (random) method and Non random.pptSampling (random) method and Non random.ppt
Sampling (random) method and Non random.ppt
 
VidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxVidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptx
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysis
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdf
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptx
 
Vip Model Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
Vip Model  Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...Vip Model  Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
Vip Model Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
 
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
 
Smarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxSmarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptx
 
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signals
 
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
 
Determinants of health, dimensions of health, positive health and spectrum of...
Determinants of health, dimensions of health, positive health and spectrum of...Determinants of health, dimensions of health, positive health and spectrum of...
Determinants of health, dimensions of health, positive health and spectrum of...
 
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort ServiceBDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
 
Introduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxIntroduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptx
 
Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFx
 
Zuja dropshipping via API with DroFx.pptx
Zuja dropshipping via API with DroFx.pptxZuja dropshipping via API with DroFx.pptx
Zuja dropshipping via API with DroFx.pptx
 
Data-Analysis for Chicago Crime Data 2023
Data-Analysis for Chicago Crime Data  2023Data-Analysis for Chicago Crime Data  2023
Data-Analysis for Chicago Crime Data 2023
 

patterndat.pdf

  • 3. BASIC INFORMATION COMES FROM SEQUENCE • Multiple alignments of related sequences- can build up consensus sequences of known families, domains, motifs or sites. • Pattern • Matrix • Profile • HMM
  • 4. COMMON PROTEIN PATTERN DATABASES • Prosite patterns • Prosite profiles • Pfam • SMART • Prints • TIGRFAMs • BLOCKS Alignment databases • ProDom • PIR-ALN • ProtoMap • Domo • ProClass
  • 5. PROSITE Patterns and profiles • http://www.expasy.ch/prosite/ • Building a pattern: a.) from literature -test against SP, update if necessary b.) new patterns: Start with reviewed protein family, known functional sites: • enzyme catalytic site, • attachment site eg heme, • metal ion binding site • cysteines for disulphide bonds, • molecule (GTP) or protein binding site
  • 6. PROSITE PATTERNS Alignment Functionally important residues Find 4-5 conserved residues Core pattern Find only correct entries- leave Find many false positives, increase pattern and re-search Search SP Pattern is given as regular expression: [AC]-x-V-x(4)-{ED} ala/cys-any-val-any-any-any-any-(any except glu or asp)
  • 7. PROSITE PROFILES • Not confined to small regions, cover whole protein or domain and has more info on allowed aa at each position • Start with multiple seq alignment -uses a symbol comparison table to convert residue frequency distributions into weights • Result- table of position-specific amino acid weights and gap costs- calculate a similarity score for any alignment between a profile and a sequence, or parts of a profile and a sequence • Tested on SP, refined. Begin as prefiles then integrated
  • 8.
  • 9.
  • 10.
  • 11.
  • 12.
  • 13. Pfam •http://www.sanger.ac.uk/Software/Pfam/index.shtml •Database of HMMs for domains and families •HMMs are built from HMMER2 (Bayesian statistical models), can use two modes ls or fs, all domains should be matched with ls •Use Bits scores, thresholds are chosen manually using E-value from extreme fit distribution •Two parts to Pfam: ➩ PfamA -manually curated ➩ PfamB -automatic clustering of rest of SPTR from ProDom using Domainer •Use -looking at domain structure of SPTR protein or new sequence
  • 14.
  • 15.
  • 16.
  • 17.
  • 18.
  • 19.
  • 20. Additional features of Pfam • PfamA has about 65% coverage of SPTR, rest is covered by PfamB • Can search directly with DNA -Wise2 package • Can view taxonomic range of each entry • Can view proteins with similar domain structure and view of all family members • Links to other databases including 3D structure • Note: No 2 PfamA HMMs should overlap
  • 21. SMART- Simple Modular Architecture Research Tool • http://smart.embl-heidelberg.de/ • Relies on hand curated multiple sequence alignments of representative family members from PSI-BLAST- builds HMMs- used to search database for more seq for alignment- iterative searching until no more homologues detected • Store Ep (highest per protein E-value of T) and En (lowest per protein E-value of N) values • Will predict domain homologue with sequence if – Ep < E-value <En and E-value <1.0
  • 22.
  • 23.
  • 24.
  • 25.
  • 26.
  • 27.
  • 28. Additional features of SMART • Used for identification of genetically mobile domains and analysis of domain architectures • Can search for proteins containing specific combinations of domains in defined taxa • Can search for proteins with identical domain architecture • Also has information on intrinsic features like signal sequences, transmembrane helices, coiled-coil regions and compositionally biased regions
  • 29. ProDom • http://www.toulouse.inra.fr/prodom.html • Groups all sequences in SPTR into domains ->150 000 families • Use automatic process to build up domains -DOMAINER • For expert curated families, use PfamA alignments to build new ProDom families • Use diameter (max distance between two domains in family) and radius of gyration root mean square of distance between domain and family consensus), both counted in PAM (percent accepted mutations (no per 100 aa) to measure consistency of a family, lower these values, more homogeneous family
  • 30. Building of ProDom families DB initialisation: SEG, split modified seq, sort by size Query DB looking for internal repeats Extract shortest sequence as a query PSI-BLAST New ProDom domain family Extract first repeat - use as a query Remove new found domains, sort by size Repeated until database is empty
  • 31.
  • 32.
  • 33.
  • 34.
  • 35.
  • 36.
  • 37. PRINTS -Fingerprint DB • http://www.bioinf.man.ac.uk/dbbrowser/PRINTS/ • Fingerprint- set of motifs used to predict occurrence of similar motifs in a sequence • Built by iterative scanning of OWL database • Multiple sequence alignment- identify conserved motifs- scan database with each motif- correlate hitlists for each- should have more sequences now- generate more motifs- repeat until convergence • Recognition of individual elements in fingerprint is mutually conditional • True members match all elements in order, subfamily may match part of fingerprint
  • 38.
  • 39.
  • 40.
  • 41.
  • 42.
  • 43.
  • 44. BLOCKS • http://www.blocks.fhcrc.org/ • Multiply aligned ungapped segments corresponding to most highly conserved regions of proteins- represented in profile • Built up using PROTOMAT (BLOSUM scoring model), calibrated against SWISS-PROT, use LAMA to search blocks against blocks • Starting sequences from Prosite, PRINTS, Pfam, ProDom and Domo - total of 2129 families
  • 45. Building of Blocks Alignments from Prosite Alignments from ProDom Alignments from PfamA Alignments from PRINTS Alignments from Domo Build blocks using PROTOMAT Search for common blocks LAMA- remove Build blocks using PROTOMAT Search for common blocks LAMA- remove Build blocks using PROTOMAT Search for common blocks LAMA- remove Blocks database annotated verified Unverified and changes
  • 46. SEARCHING BLOCKS • Compare a protein or DNA (1-6 frames) sequence to database of blocks • Blocks Searcher- used via internet or email: First position of sequence aligned to first position of first block - score for that position, score summed over width of alignment, then block is aligned with next position etc for all blocks in database- get best alignment score. Search is slow (350 aa/2 min) • Can search database of PSI-BLAST PSSMs for each blocks family using IMPALA
  • 47.
  • 48.
  • 49.
  • 50.
  • 51.
  • 52. TIGRFAMs • http://www.tigr.org/TIGRFAMs • Collection of protein families in HMMs built with curated multiple sequence alignments and with associated functional information • Equivalog- homologous proteins conserved with respect to function since last ancestor (other pattern databases concentrate on related seq not function) • > 800 non-overlapping families -can search by text or sequence • Has information for automatic annotation of function, weighted towards microbial genomes
  • 56. PIR-ALN • http://www-nbrf.georgetown.edu/pirwww/ search/textpiraln.html • Database of annotated protein sequence alignments derived automatically from PIR PSD • Includes alignments at superfamily (whole sequence), family (45% identity) and domain (in more than one superfamily) levels • 3983 alignments, 1480 superfamilies, 371 domains • Can search by protein accession number or text
  • 57. PROTOMAP • http://www.protomap.cs.huji.ac.il • Automatic classification of all SWISS-PROT proteins into groups of related proteins (also including TrEMBL now) • Based on pairwise similarities • Has hierarchical organisation for sub- and super-family distinctions • 13 354 clusters, 5869 ≥ 2 proteins, 1403 ≥ 10 • Keeps SP annotation eg description, keywords • Can search with a sequence -classify it into existing clusters
  • 58. DOMO • http://www.infobiogen.fr/srs6bin/cgi-bin/wgetz?- page+LibInfo+-lib+DOMO (SRS) • Database of gapped multiple sequence alignments from SWISS-PROT and PIR • Domain boundaries inferred automatically, rather than from 3D data • Has 8877 alignments, 99058 domains, and repeats • Each entry is one homologous domain, has annotation on related proteins, functional families, evolutionary tree etc
  • 59. ProClass • http://pir.georgetown.edu/gfserver/proclass.html • Non-redundant protein database organized by family relationships defined by Prosite patterns and PIR superfamilies. • Facilitates protein family information retrieval, domain and family relationships, and classifies multi-domain proteins • Contains 155,868 sequence entries
  • 60. SBASE (Agricultural Biotechnology Centre) • http://sbase.abc.hu/main.html • Protein domain library from clustering of functional and structural domains • SBASE entries - grouped by Standard names (SN groups) that designate various functional and structural domains of protein sequences- relies on good annotation of domains • Detects subclasses too • Can do similarity search with BLAST or PSI-BLAST
  • 61. Integrating Pattern databases • MetaFam • IProClass • CDD • InterPro
  • 62. METAFAM • http://metafam.ahc.umn.edu/ • Protein family classification built with Blocks+, DOMO, Pfam, PIR-ALN, PRINTS, Prosite, ProDom, SBASE, SYSTERS • Automatically create supersets of overlapping families using set-theory to compare databases- reference domains covering total area • Use non-redundant protein set from SPTR & PIR
  • 63. IProClass • http://pir.georgetown.edu/iproclass/ • Integrated database linking ProClass, PIR-ALN, Prosite, Pfam and Blocks • Contains >20000 non-redundant SP & PIR proteins, 28000 superfamilies, 2600 domains, 1300 motifs, 280 PTMs • Can be searched by text or sequence
  • 64. CDD Conserved Domain Database • http://www.ncbi.nlm.nih.gov:80/Structure/cdd/cdd.shtml • Database of domains derived from SMART, Pfam and contributions from NCBI (LOAD) • Uses reverse position-specific BLAST (matrix) • Links to proteins in Entrez and 3D structure • Stand-alone version of RPS-BLAST at: ftp://ncbi.nlm.nih.gov/toolbox
  • 67. DART
  • 70. INTERPRO • http://www.ebi.ac.uk/interpro • Integration of different signature recognition methods (PROSITE, PRINTS, PFAM, ProDom and SMART)
  • 71. InterPro release 3 • Built from PROSITE, PRINTS, Pfam, ProDom, SMART, SWISS-PROT and TrEMBL • Contains 3915 entries encoded by 7714 different regular expressions, profiles, fingerprints, Hidden Markov Models and ProDom domains • InterPro provides >1 million InterPro matches hits against 532403 SWISS- PROT + TrEMBL protein sequences (68% coverage) • Direct access to the underlying Oracle database • A XML flatfile is available at ftp://ftp.ebi.ac.uk/pub/databases/interpro/ • SRS implementation • Text- and sequence-based searches
  • 72.
  • 73.
  • 74.
  • 75.
  • 76.
  • 77.
  • 78.
  • 79.
  • 80.
  • 81.
  • 82.
  • 83. InterProScan • PROSITE patterns: ppsearch • PROSITE profiles: pfscan • PFAM HMMs: hmmpfam • PRINTS fingerprints: fpscan • ProDom • SMART • eMotif derived PROSITE pattern • TMHMM • SignalP
  • 84.
  • 85. PRINTS detailed results ANX3_MOUSE Annexin type III
  • 86. SUMMARY • Many different protein signature databases from small patterns to alignments to complex HMMs • Have different strengths and weaknesses • Have different database formats • Therefore: best to combine methods, preferably in a database with them already merged for simple analysis with consistent format
  • 87.
  • 88. Protein Secondary Structure • CATH (Class, Architecture,Topology, Homology) http://www.biochem.ucl.ac.uk/dbbrowser/cath/ • SCOP (structural classification of proteins) -hierarchical database of protein folds http://scop.mrc-lmb.cam.ac.uk/scop • FSSP Fold classification using structure-structure alignment of proteins http://www2.ebi.ac.uk/fssp/fssp.html • TOPS Cartoon representation of topology showing helices and strands http://tops.ebi.ac.uk/tops/