LECTURE TOPIC: PROTEIN DATABASES
TOPICS COVERED: UniProtKB/Swiss-Prot/TrEMBL, PIR,
MIPS, PROSITE, PRINTS, BLOCKS,
Pfam, NRDB, OWL, PDB, SCOP,
CATH, NDB, PQS, SYSTERS, Motif
LECTURE BY: Ashok Kumar T
ashok@biogem.org
Computational Terms & Definitions
• Protein Sequence – A string over the 20 standard amino acid characters [A, C, D, E, F, G, H, I, K, L, M, N, P, Q, R, S, T, V, W, Y]
• Protein Structure – 3D atomic co-ordinates [x-axis, y-axis, z-axis]
• Types of Biological Databases – [Flat-file (Raw) Database = Plain text, Relational Database = Tables of records
linked by keys, Object-oriented Database = Objects with attributes and relationships]
• 3D Atom Model – [Sphere = Atom, Cylinder = Bond, Dotted Line = Bond Interaction]
• Sequence Alignment – [Match = Identical Character, Mismatch = Different Character, Gap = No Corresponding
Character, Word = Sub-string, Sequence = Super-string, Score = Rating, Identity = Fraction of aligned positions
with identical characters]
• Motif – Short, conserved sequence associated with a distinct function
• Domain – Evolutionarily conserved sequence region that corresponds to a structurally independent 3D
unit associated with a particular functional role. It is usually much larger than a motif
• Pattern – Sequence motif written in a symbolic notation. Example: N-{P}-[ST]-{P}-A(2,3).
• Regular Expression – Representation format for a sequence motif, which includes positional information
for conserved and partly conserved residues. Similar to a pattern, but derived from a multiple sequence alignment (MSA)
• Profile – Scoring matrix that represents a multiple sequence alignment. It contains probability or
frequency values of residues for each aligned position in the alignment, including gaps
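Two of the definitions above (alignment terms and profile) can be made concrete with a small sketch. The function names are made up for illustration, and the identity denominator used here (aligned, non-gap columns) is one common convention among several, so treat it as an assumption.

```python
from collections import Counter

def alignment_stats(a: str, b: str) -> dict:
    """Count matches, mismatches and gap columns for two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    matches = mismatches = gaps = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            gaps += 1          # one side has no corresponding character
        elif x == y:
            matches += 1       # identical characters
        else:
            mismatches += 1    # different characters
    aligned = matches + mismatches
    # Assumption: identity is reported over non-gap columns only.
    identity = 100.0 * matches / aligned if aligned else 0.0
    return {"matches": matches, "mismatches": mismatches,
            "gaps": gaps, "identity": identity}

def column_profile(msa: list) -> list:
    """Per-column residue frequencies (gaps included) for an MSA."""
    n = len(msa)
    return [{res: count / n for res, count in Counter(col).items()}
            for col in zip(*msa)]

stats = alignment_stats("ACDE-FG", "ACDQWFG")
profile = column_profile(["ACD", "ACE", "A-E", "GCE"])
print(stats)
print(profile)
```

A real profile would usually store log-odds scores with pseudocounts rather than raw frequencies; the raw version is kept here because it maps directly onto the definition.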
UniProtKB/Swiss-Prot/TrEMBL
• Universal Protein Resource (UniProt) is a
comprehensive and non-redundant resource for
protein sequence and annotation data
• The UniProt databases are the UniProt
Knowledgebase (UniProtKB), the UniProt
Reference Clusters (UniRef), and the UniProt
Archive (UniParc)
• UniProt Metagenomic and Environmental
Sequences (UniMES) database is a repository
specifically developed for metagenomic and
environmental data
http://www.uniprot.org/
Background of UniProtKB
• UniProt is a collaboration between the European Bioinformatics Institute (EMBL-EBI),
the Swiss Institute of Bioinformatics (SIB) and the Protein Information Resource (PIR)
• EMBL-EBI and SIB together used to produce Swiss-Prot and TrEMBL, while PIR produced
the Protein Sequence Database (PIR-PSD)
• Translated EMBL Nucleotide Sequence Data Library (TrEMBL) was originally created
because sequence data was being generated at a pace that exceeded Swiss-Prot's ability
to keep up
• PIR maintained the PIR-PSD and related databases, including iProClass, a database of
protein sequences and curated families
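In practice the Swiss-Prot / TrEMBL split is visible in UniProtKB FASTA headers, where the prefix `sp` marks a Swiss-Prot entry and `tr` a TrEMBL entry. The sketch below parses such a header; the function name and the returned field names are illustrative choices, and only the leading part of the header is interpreted.

```python
import re

# >db|accession|entry_name description OS=... OX=... GN=... PE=... SV=...
HEADER = re.compile(r"^>(sp|tr)\|([^|]+)\|(\S+)\s+(.*)$")

def parse_uniprot_header(line: str) -> dict:
    """Split a UniProtKB FASTA header into its leading fields."""
    m = HEADER.match(line.strip())
    if m is None:
        raise ValueError("not a UniProtKB-style FASTA header")
    db, accession, entry_name, rest = m.groups()
    return {
        "section": "Swiss-Prot" if db == "sp" else "TrEMBL",
        "accession": accession,
        "entry_name": entry_name,
        # Everything before the first " OS=" field is the protein name.
        "description": rest.split(" OS=")[0],
    }

header = (">sp|P69905|HBA_HUMAN Hemoglobin subunit alpha "
          "OS=Homo sapiens OX=9606 GN=HBA1 PE=1 SV=2")
entry = parse_uniprot_header(header)
print(entry)
```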
UniProtKB Search Result
NBRF/PIR
The Protein Information Resource (PIR) is an integrated bioinformatics resource for
genomic, proteomic and systems biology research and scientific studies, established by
the National Biomedical Research Foundation (NBRF). PIR offers a wide variety of
resources mainly oriented to assist the propagation and standardization of protein
annotation:
• PRO – Protein Ontology
• iProClass – Integrated protein knowledgebase
• iProLINK – Literature information and knowledgebase
• iPTMnet – Integrated protein post-translational modification resource
• iProXpress – Integrated protein expression analysis system
• RESID Database – Comprehensive collection of annotations and structures for protein
modifications
http://pir.georgetown.edu/
MIPS
• Munich Information Center for Protein Sequences (MIPS) is a research center hosted by
the Institute of Bioinformatics and Systems Biology (IBIS) and is part of the Helmholtz
Research Center for Environmental Health, Germany
• MIPS focuses on the systematic analysis of genome information, including the
development and application of bioinformatics methods in genome annotation, gene
expression analysis and proteomics
• MIPS supports and maintains a set of generic databases as well as the systematic
comparative analysis of microbial, fungal, and plant genomes
• MIPS offers different Databases, Web Services, and Platforms in Genomics, Proteins,
Metabolomics and multi-omics integration, chemical screening, and Disease annotation
HOME PAGE: https://www.helmholtz-muenchen.de/ibis/
PPI: http://mips.helmholtz-muenchen.de/proj/ppi/
PROSITE
• PROSITE is a protein domain database for functional characterization and annotation.
• PROSITE consists of entries describing the protein families, domains and functional sites as
well as amino acid patterns and profiles in them.
• PROSITE is manually curated by a team of the Swiss Institute of Bioinformatics and tightly
integrated into Swiss-Prot protein annotation.
• PROSITE is complemented by ProRule, a collection of rules based on profiles and patterns.
• The rules contain information about biologically meaningful residues, like active sites,
substrate- or co-factor-binding sites, posttranslational modification sites or disulfide bonds,
to help function determination.
http://prosite.expasy.org/
Result of PROSITE for Matching Pattern Hits
PRINTS
• PRINTS database is a collection of protein motif fingerprints
• Fingerprint is a group of conserved motifs used to characterize a protein family
• Motifs do not overlap, but are separated along a sequence, though they may be
contiguous in 3D-space to define molecular binding sites or interaction surfaces
• Fingerprints can encode protein folds and functionalities more flexibly and powerfully
than can single motifs
• PRINTS provides detailed annotation resource for protein families, and a diagnostic
tool for newly determined sequences
• PRINTS is a founding partner of the integrated resource, InterPro, a widely used
database of protein families, domains and functional sites
http://www.bioinf.manchester.ac.uk/dbbrowser/PRINTS/
http://130.88.97.239/PRINTS/
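The fingerprint idea (several non-overlapping motifs that must all appear along one sequence) can be sketched as an ordered search. This is only an illustration: the motifs below are invented regular expressions, whereas PRINTS itself scores motifs derived from alignments, and the function name is hypothetical.

```python
import re
from typing import List, Optional, Tuple

def match_fingerprint(sequence: str,
                      motifs: List[str]) -> Optional[List[Tuple[int, int]]]:
    """Return the (start, end) span of every motif if all of them occur
    in the given order without overlapping, otherwise None."""
    hits = []
    pos = 0
    for motif in motifs:
        # Search only to the right of the previous hit, so hits stay
        # ordered and cannot overlap.
        m = re.compile(motif).search(sequence, pos)
        if m is None:
            return None
        hits.append((m.start(), m.end()))
        pos = m.end()
    return hits

seq = "MKTAYIAKQRGWSLLDEFCHHE"          # made-up sequence
fingerprint = ["A[YF]I", "GW.L", "C..E"]  # made-up motifs
print(match_fingerprint(seq, fingerprint))
```

Taking the leftmost hit for each motif is enough to decide whether an ordered, non-overlapping match exists; a scoring scheme would be needed to rank partial matches.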
BLOCKS
• BLOCKS Database is based on InterPro entries with sequences from Swiss-Prot and
TrEMBL
• Blocks are multiple aligned ungapped segments corresponding to the most highly
conserved regions of proteins
• BLOCKS cross-references to PROSITE and/or PRINTS and/or SMART, and/or Pfam
and/or ProDom entries.
• BLOCKS Database was constructed by the PROTOMAT system using the MOTIF
algorithm
http://blocks.fhcrc.org/
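To illustrate what "ungapped, highly conserved segment" means, the sketch below scans a multiple alignment for runs of gap-free columns whose most common residue reaches a chosen frequency. It is not the PROTOMAT/MOTIF procedure; the thresholds, the function name and the toy alignment are assumptions made for the example.

```python
from collections import Counter

def find_blocks(msa, min_width=3, min_conservation=0.75):
    """Return (start, end) column ranges (end exclusive) of ungapped,
    well-conserved runs in a multiple sequence alignment."""
    blocks = []
    start = None
    columns = list(zip(*msa))
    # A trailing None acts as a sentinel that closes the last run.
    for i, col in enumerate(columns + [None]):
        ok = (
            col is not None
            and "-" not in col
            and Counter(col).most_common(1)[0][1] / len(col) >= min_conservation
        )
        if ok and start is None:
            start = i
        elif not ok and start is not None:
            if i - start >= min_width:
                blocks.append((start, i))
            start = None
    return blocks

toy_msa = [
    "MKVLAG-TPE",
    "MKVLSGATPE",
    "MKVL-GQTPE",
    "MRVLAG-TPD",
]
print(find_blocks(toy_msa))
```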
Pfam
• The Pfam database is a large collection of protein families, each represented by multiple
sequence alignments and hidden Markov models (HMMs).
• Pfam version 31.0 was produced at the EBI using a sequence database called Pfamseq,
which is based on UniProtKB.
• Pfam 31.0 has 16,712 families
• The descriptions of Pfam families are managed by the general public using Wikipedia.
• The Pfam database contains information about protein domains and families.
• Pfam-A is the manually curated portion of the database
• Pfam-B contains a large number of small families derived from clusters produced by an
algorithm called ADDA (Automatic Domain Decomposition Algorithm).
• Pfam-B families are of lower quality, but can be useful when no Pfam-A families are found.
http://pfam.xfam.org/
Classification of Pfam Entries
• Family - A collection of related protein regions
• Domain - A structural unit
• Repeat - A short unit which is unstable in isolation but forms a stable structure when
multiple copies are present
• Motif - A short unit found outside globular domains
• Coiled-Coil - Regions that predominantly contain coiled-coil motifs, i.e., regions that typically
contain alpha-helices that are coiled together in bundles of 2-7.
• Disordered - Regions that are conserved, yet are either shown or predicted to contain
biased sequence composition and/or are intrinsically disordered (non-globular).
• Clans - A collection of families that have arisen from a single evolutionary origin
• Related Pfam entries are grouped together into clans; the relationship may be defined
by similarity of sequence, structure or profile-HMM.
NRDB/NRDB90
• NRDB (Non-Redundant DataBase) is a so-called non-redundant composite of the following
sources: PDB, RefSeq, UniProtKB/Swiss-Prot, DDBJ, EMBL, GenBank, and PIR
• NRDB is similar in content to OWL, but contains non-redundant and more up-to-date
information
• Despite its name, NRDB is non-identical rather than non-redundant, i.e., only identical
sequence copies are removed from the database
• The NRDB algorithm was written by Warren Gish at Washington University and is used to
construct the database called NRDB90
• NRDB90 contains sequences which do not have homologues with sequence identity of 90% or
more
• NRDB is currently maintained by NCBI
http://www.ebi.ac.uk/~holm/nrdb90/ [MOVED]
http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=Protein
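The difference between "non-identical" and a 90% identity cut-off can be shown with a small sketch. The identity measure below is deliberately crude (position-by-position comparison of equal-length sequences, no alignment), the greedy filter is only one possible strategy, and all names and sequences are invented for the example.

```python
def remove_identical(records):
    """Keep the first record for each distinct sequence (non-identical set)."""
    seen = set()
    kept = []
    for name, seq in records:
        if seq not in seen:
            seen.add(seq)
            kept.append((name, seq))
    return kept

def percent_identity(a, b):
    """Crude identity for equal-length sequences; real tools align first."""
    if len(a) != len(b) or not a:
        return 0.0
    return 100.0 * sum(x == y for x, y in zip(a, b)) / len(a)

def filter_by_identity(records, threshold=90.0):
    """Greedy filter: drop a sequence that is at least `threshold` percent
    identical to a sequence that has already been kept."""
    kept = []
    for name, seq in records:
        if all(percent_identity(seq, k) < threshold for _, k in kept):
            kept.append((name, seq))
    return kept

records = [
    ("a", "ACDEFGHIKL"),
    ("b", "ACDEFGHIKL"),   # exact copy of a
    ("c", "ACDEFGHIKV"),   # 90% identical to a
    ("d", "WWWWWWWWWW"),
]
non_identical = remove_identical(records)
below_90 = filter_by_identity(non_identical)
print([n for n, _ in non_identical], [n for n, _ in below_90])
```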
OWL
• OWL is a non-redundant composite of 4 publicly-available primary sources: Swiss-
Prot, PIR, GenBank (translation) and NRL-3D
• Swiss-Prot is the highest priority source, all others being compared against it to
eliminate identical and trivially-different sequences
• The strict redundancy criteria render OWL relatively “small” and hence efficient in
similarity searches
http://www.bioinf.man.ac.uk/dbbrowser/OWL
http://130.88.97.239/OWL/
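The priority rule can be sketched as a merge in which entries from higher-priority sources are kept first and later entries are dropped when they are identical or trivially different. The source only states that Swiss-Prot has the highest priority; the order of the other three sources, the definition of "trivially different" (same length, at most one differing residue) and all names below are assumptions for illustration.

```python
# Assumed priority order; only Swiss-Prot's top rank is given in the slides.
PRIORITY = ["Swiss-Prot", "PIR", "GenBank", "NRL-3D"]

def trivially_different(a, b, max_diffs=1):
    """Hypothetical criterion: same length and at most `max_diffs` changes
    (identical sequences also satisfy it)."""
    return len(a) == len(b) and sum(x != y for x, y in zip(a, b)) <= max_diffs

def build_composite(entries):
    """entries: list of (source, name, sequence) tuples."""
    ordered = sorted(entries, key=lambda e: PRIORITY.index(e[0]))
    kept = []
    for source, name, seq in ordered:
        if not any(trivially_different(seq, k[2]) for k in kept):
            kept.append((source, name, seq))
    return kept

entries = [
    ("GenBank", "g1", "ACDEFGHIKL"),
    ("Swiss-Prot", "s1", "ACDEFGHIKL"),
    ("PIR", "p1", "ACDEFGHIKV"),
    ("NRL-3D", "n1", "MMMMMMMM"),
]
composite = build_composite(entries)
print([(src, name) for src, name, _ in composite])
```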
PDB
• The Protein Data Bank (PDB) archive is the single worldwide repository of information
about the 3D structures of large biological molecules, including proteins and nucleic
acids.
• The PDB was established in 1971 at Brookhaven National Laboratory (BNL) under the
leadership of Walter Hamilton and originally contained 7 structures.
• In 1998, the Research Collaboratory for Structural Bioinformatics (RCSB) became
responsible for the management of the PDB.
• In 2003, the wwPDB was formed to maintain a single PDB archive of macromolecular
structural data that is freely and publicly available to the global community.
• The RCSB PDB supports a website where visitors can perform simple and complex queries
on the data, analyze, and visualize the results.
• Members of wwPDB are RCSB PDB (USA), PDBe (Europe), PDBj (Japan), and the
Biological Magnetic Resonance Data Bank, BMRB (USA).
http://rcsb.org/pdb/
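Structures in the archive are distributed, among other formats, as legacy PDB-format text files in which each ATOM record stores the x, y, z co-ordinates in fixed columns. The sketch below reads those columns; the sample records use invented co-ordinates, and the function names are illustrative.

```python
def parse_atom_line(line: str) -> dict:
    """Extract selected fixed-width fields from a PDB-format ATOM record."""
    return {
        "atom": line[12:16].strip(),     # columns 13-16: atom name
        "residue": line[17:20].strip(),  # columns 18-20: residue name
        "chain": line[21],               # column 22: chain identifier
        "res_seq": int(line[22:26]),     # columns 23-26: residue number
        "x": float(line[30:38]),         # columns 31-38
        "y": float(line[38:46]),         # columns 39-46
        "z": float(line[46:54]),         # columns 47-54
    }

def parse_atoms(text: str) -> list:
    """Parse every ATOM record in a block of PDB-format text."""
    return [parse_atom_line(l) for l in text.splitlines() if l.startswith("ATOM")]

# Made-up records that follow the fixed-column layout.
sample = "\n".join([
    "ATOM      1  N   MET A   1      27.340  24.430   2.614",
    "ATOM      2  CA  MET A   1      26.266  25.413   2.842",
    "HETATM    3  O   HOH A 101      10.000  11.000  12.000",
    "END",
])
atoms = parse_atoms(sample)
print(atoms)
```

For real work a maintained parser is a safer choice, since the newer PDBx/mmCIF format is not column-based.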
SCOP2
• The SCOP (Structural Classification of Proteins) database is a large manual classification of protein
structural domains based on similarities of their structures and amino acid sequences.
• A motivation for this classification is to determine the evolutionary relationship between proteins.
• Proteins with the same shapes but having little sequence or functional similarity are placed in
different “superfamilies”, and are assumed to have only a very distant common ancestor.
• Proteins having the same shape and some similarity of sequence and/or function are placed in
“families”, and are assumed to have a closer common ancestor.
• The original SCOP has been discontinued; its last official version is 1.75. SCOP2 is its
successor.
• SCOP2 offers two different ways for accessing data: SCOP2-browser, and SCOP2-graph.
• SCOP2-browser allows navigation in a traditional way by browsing pages displaying the node
information.
• SCOP2-graph is a graph-based web tool for display and navigation.
• The source of protein structures is the Protein Data Bank.
http://scop2.mrc-lmb.cam.ac.uk/
Classification of SCOP Entries
• The unit of classification of structure in SCOP is the protein domain.
• The levels of SCOP are as follows.
1. Class: Types of folds, e.g., all α, all β, α/β, α+β, α&β, etc.
2. Fold: The different shapes of domains within a class, e.g., 2 helices; antiparallel hairpin,
left-handed twist, etc.
3. Superfamily: The domains in a fold are grouped into superfamilies, which have at least
a distant common ancestor.
4. Family: The domains in a superfamily are grouped into families, which have a recent
common ancestor.
5. Protein domain: The domains in families are grouped into protein domains, which are
essentially the same protein.
6. Species: The domains in “protein domains” are grouped according to species.
7. Domain: It is part of a protein. For simple proteins, it can be the entire protein.
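SCOP labels the first four levels with a dotted identifier of the form class.fold.superfamily.family (for example a.1.1.2). Because the levels are nested, the deepest level two domains share gives a rough idea of how closely they are related. The sketch below assumes identifiers of exactly this four-part form; the function names are made up.

```python
LEVELS = ["class", "fold", "superfamily", "family"]

def parse_sccs(sccs: str) -> dict:
    """Split an identifier such as 'a.1.1.2' into its four levels."""
    parts = sccs.split(".")
    if len(parts) != len(LEVELS):
        raise ValueError("expected class.fold.superfamily.family")
    return dict(zip(LEVELS, parts))

def deepest_shared_level(a: str, b: str):
    """Name of the deepest level two identifiers have in common, or None."""
    shared = None
    for level, x, y in zip(LEVELS, a.split("."), b.split(".")):
        if x != y:
            break          # levels are nested, so stop at the first difference
        shared = level
    return shared

print(parse_sccs("a.1.1.2"))
print(deepest_shared_level("a.1.1.2", "a.1.1.3"))
```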
Hierarchical structure of SCOP
Output of SCOP
CATH
• CATH (Class, Architecture, Topology, and Homologous superfamily) is a semi-automatic,
hierarchical classification of protein domains.
• CATH shares many broad features with its principal rival, SCOP.
• The four main levels of the CATH hierarchy are as follows:
1. Class: the overall secondary-structure content of the domain. e.g., all α, all β, α/β,
α+β, α&β, etc.
2. Architecture: a large-scale grouping of topologies which share particular structural
features
3. Topology: high structural similarity but no evidence of homology. Equivalent to
a fold in SCOP.
4. Homologous superfamily: indicative of a demonstrable evolutionary relationship.
Equivalent to the superfamily level of SCOP.
http://www.cathdb.info/
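CATH superfamilies are identified by a four-number code, one number per level (C.A.T.H), for example 1.10.8.10. The sketch below expands such a code into the cumulative identifier at each level. The class names cover the four main classes only, and the function name is an illustrative choice.

```python
CLASS_NAMES = {
    "1": "Mainly Alpha",
    "2": "Mainly Beta",
    "3": "Alpha Beta",
    "4": "Few Secondary Structures",
}
LEVELS = ["class", "architecture", "topology", "homologous_superfamily"]

def parse_cath_code(code: str) -> dict:
    """Expand 'C.A.T.H' into the cumulative code at each hierarchy level."""
    parts = code.split(".")
    if len(parts) != 4 or not all(p.isdigit() for p in parts):
        raise ValueError("expected four dot-separated numbers")
    result = {level: ".".join(parts[:i + 1]) for i, level in enumerate(LEVELS)}
    result["class_name"] = CLASS_NAMES.get(parts[0], "unknown")
    return result

info = parse_cath_code("1.10.8.10")
print(info)
```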
NDB
• Nucleic Acid Database (NDB) is a repository of 3D nucleic acid structures and their complexes
• Structures available in the NDB include RNA and DNA oligonucleotides with two or more bases
either alone or complexed with proteins or small molecule ligands
• NDB contains both primary and derived information about the structures
  • Primary information includes X-ray crystallography or NMR coordinate data
  • Derived information includes valence geometry, torsion angles and intermolecular contacts
  data
• NDB offers a variety of online and offline tools for analyzing nucleic acid structures. The featured
tools include
  • RNA 3D Motif Atlas, a representative collection of RNA 3D internal and hairpin loop motifs
  • Non-redundant Lists of RNA-containing 3D structures
  • RNA Base Triple Atlas, a collection of motifs consisting of two RNA basepairs
  • WebFR3D, a webserver for symbolic and geometric searching of RNA 3D structures
  • R3D Align, an application for detailed nucleotide to nucleotide alignments of RNA 3D
  structures
http://ndbserver.rutgers.edu
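Derived information such as a torsion angle is computed from the primary co-ordinate data. The sketch below calculates the torsion (dihedral) angle defined by four atom positions using the usual cross-product formulation. The sign of the result depends on the convention chosen, the function name is illustrative, and the example points are not taken from any real structure.

```python
import math

def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def torsion_angle(p1, p2, p3, p4) -> float:
    """Torsion angle in degrees, in the range -180 to 180, about the p2-p3 bond."""
    b1, b2, b3 = _sub(p2, p1), _sub(p3, p2), _sub(p4, p3)
    n1 = _cross(b1, b2)            # normal of the plane p1, p2, p3
    n2 = _cross(b2, b3)            # normal of the plane p2, p3, p4
    b2_len = math.sqrt(_dot(b2, b2))
    b2_unit = tuple(c / b2_len for c in b2)
    m1 = _cross(n1, b2_unit)       # completes an orthogonal frame with n1
    return math.degrees(math.atan2(_dot(m1, n2), _dot(n1, n2)))

p1, p2, p3 = (0.0, 1.0, 0.0), (0.0, 0.0, 0.0), (1.0, 0.0, 0.0)
print(torsion_angle(p1, p2, p3, (1.0, 1.0, 0.0)))    # cis arrangement
print(torsion_angle(p1, p2, p3, (1.0, -1.0, 0.0)))   # trans arrangement
```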
PQS/PDBePISA/PISA
• PISA (Proteins, Interfaces, Structures, and Assemblies), formerly known as PQS (Protein
Quaternary Structure) database, was constructed by EMBL-EBI
• PISA is an interactive tool for the exploration of macromolecular interfaces
• PISA presents results calculated by certain physico-chemical models for PDB and/or uploaded
macromolecular structures
• PISA provides probable quaternary structures (assemblies), their structural and chemical
properties and probable dissociation pattern
http://www.ebi.ac.uk/pdbe/pisa/
SYSTERS
• SYSTERS (SYSTEmatic Re-Searching) is a collection of graph-based algorithms to hierarchically
partition a large set of protein sequences into homologous families and super-families
• SYSTERS is based on an all-against-all database search (using Smith-Waterman comparisons
on a GeneMatcher machine)
• The resulting set of protein families contains four different types of clusters, listed here in order of
decreasing reliability, based on the connectivity within their family distance graph:
  • Perfect Clusters (P): all sequences are connected to all other sequences in the cluster
  • Single Sequence Cluster (S): a special case of perfect cluster
  • Nested Clusters (N): at least one sequence is connected to all other sequences in the cluster
  • Overlapping Clusters (O): no sequence is connected to all other sequences in the cluster
http://systers.molgen.mpg.de/ [DISCONTINUED]
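The four cluster types follow directly from how many sequences are connected to every other member, which can be sketched as a degree check on the cluster graph. The graph representation (a set of unordered pairs) and the function name are assumptions for the example, not the SYSTERS data model.

```python
def classify_cluster(members, edges):
    """Classify a cluster as P, S, N or O.

    members: list of sequence identifiers
    edges:   set of frozenset({a, b}) pairs that are connected
    """
    if len(members) == 1:
        return "S"                     # single sequence cluster
    full = len(members) - 1            # degree of a fully connected member
    degrees = [
        sum(1 for other in members
            if other != m and frozenset((m, other)) in edges)
        for m in members
    ]
    if all(d == full for d in degrees):
        return "P"                     # everyone connected to everyone
    if any(d == full for d in degrees):
        return "N"                     # at least one hub sequence
    return "O"                         # no sequence reaches all others

def pairs(*links):
    return {frozenset(link) for link in links}

print(classify_cluster(["a", "b", "c"], pairs("ab", "bc", "ac")))
print(classify_cluster(["a", "b", "c"], pairs("ab", "ac")))
print(classify_cluster(["a", "b", "c", "d"], pairs("ab", "bc", "cd")))
```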
Motif
• Motif is a search service provided by GenomeNet to search with a protein query sequence
against Motif Libraries
• Supports several motif databases such as Prosite, BLOCKS, ProDom, Pfam, and PRINTS
• Allows you to search protein sequence libraries with your patterns
• Each residue must be separated with - (minus sign)
• x represents any amino acid
• [DE] means either D or E
• {FWY} means any amino acid except for F, W and Y
• A(2,3) means that A appears 2 to 3 times consecutively
• The pattern string must be terminated with . (period)
For example, C-x-{C}-[DN]-x(2)-C-x(5)-C-C.
• Generates a profile from a set of multiple aligned sequences using PFMake or HMMBuild
http://www.genome.jp/tools/motif/
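The pattern rules listed above map almost one-to-one onto regular-expression syntax, so a pattern can be translated and tested locally. The sketch below handles only the constructs named in the list (separator, x, [..], {..}, repetition counts, final period); the function name is made up, and the test sequence is invented.

```python
import re

# One pattern element: x, a residue, [..] or {..}, with optional (n) or (n,m).
ELEMENT = re.compile(
    r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?"
)

def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style pattern into a Python regular expression."""
    pattern = pattern.strip()
    if pattern.endswith("."):
        pattern = pattern[:-1]                 # drop the terminating period
    parts = []
    for element in pattern.split("-"):
        m = ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        core, low, high = m.groups()
        if core == "x":
            core = "."                         # any amino acid
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"     # any amino acid except these
        if low is not None:
            core += "{" + low + ("," + high if high else "") + "}"
        parts.append(core)
    return "".join(parts)

regex = prosite_to_regex("C-x-{C}-[DN]-x(2)-C-x(5)-C-C.")
print(regex)
hit = re.search(prosite_to_regex("N-{P}-[ST]-{P}."), "MKNGSAL")
print(hit.group(0) if hit else None)
```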
Multi-source connectivity as the driver of solar wind variability in the heli...
 
4. An Overview of Sugarcane White Leaf Disease in Vietnam.pdf
4. An Overview of Sugarcane White Leaf Disease in Vietnam.pdf4. An Overview of Sugarcane White Leaf Disease in Vietnam.pdf
4. An Overview of Sugarcane White Leaf Disease in Vietnam.pdf
 
Structures and textures of metamorphic rocks
Structures and textures of metamorphic rocksStructures and textures of metamorphic rocks
Structures and textures of metamorphic rocks
 
Lab report on liquid viscosity of glycerin
Lab report on liquid viscosity of glycerinLab report on liquid viscosity of glycerin
Lab report on liquid viscosity of glycerin
 
Astronomy Update- Curiosity’s exploration of Mars _ Local Briefs _ leadertele...
Astronomy Update- Curiosity’s exploration of Mars _ Local Briefs _ leadertele...Astronomy Update- Curiosity’s exploration of Mars _ Local Briefs _ leadertele...
Astronomy Update- Curiosity’s exploration of Mars _ Local Briefs _ leadertele...
 
insect taxonomy importance systematics and classification
insect taxonomy importance systematics and classificationinsect taxonomy importance systematics and classification
insect taxonomy importance systematics and classification
 
SCHIZOPHRENIA Disorder/ Brain Disorder.pdf
SCHIZOPHRENIA Disorder/ Brain Disorder.pdfSCHIZOPHRENIA Disorder/ Brain Disorder.pdf
SCHIZOPHRENIA Disorder/ Brain Disorder.pdf
 
Deep Behavioral Phenotyping in Systems Neuroscience for Functional Atlasing a...
Deep Behavioral Phenotyping in Systems Neuroscience for Functional Atlasing a...Deep Behavioral Phenotyping in Systems Neuroscience for Functional Atlasing a...
Deep Behavioral Phenotyping in Systems Neuroscience for Functional Atlasing a...
 
Hemostasis_importance& clinical significance.pptx
Hemostasis_importance& clinical significance.pptxHemostasis_importance& clinical significance.pptx
Hemostasis_importance& clinical significance.pptx
 
THE IMPORTANCE OF MARTIAN ATMOSPHERE SAMPLE RETURN.
THE IMPORTANCE OF MARTIAN ATMOSPHERE SAMPLE RETURN.THE IMPORTANCE OF MARTIAN ATMOSPHERE SAMPLE RETURN.
THE IMPORTANCE OF MARTIAN ATMOSPHERE SAMPLE RETURN.
 

Protein Databases

  • 1. LECTURE TOPIC: PROTEIN DATABASES TOPICS COVERED: UniProtKB/Swiss-Prot/TrEMBL, PIR, MIPS, PROSITE, PRINTS, BLOCKS, Pfam, NRDB, OWL, PDB, SCOP, CATH, NDB, PQS, SYSTERS, Motif LECTURE BY: Ashok Kumar T ashok@biogem.org
  • 2. Computational Terms & Definitions • Protein Sequence – A string over the 20 standard amino acid characters [A, C, D, E, F, G, H, I, K, L, M, N, P, Q, R, S, T, V, W, Y] • Protein Structure – 3D atomic coordinates [x, y, z] • Types of Biological Databases – [Raw (flat-file) database = plain text, Relational database = tables linked by shared keys, Object-oriented database = records stored as objects] • 3D Atom Model – [Sphere = Atom, Cylinder = Bond, Dotted Line = Bond Interaction] • Sequence Alignment – [Match = identical character, Mismatch = different character, Gap = no corresponding character, Word = sub-string, Sequence = super-string, Score = rating of the alignment, Identity = share of aligned positions with identical residues] • Motif – Short, conserved sequence associated with a distinct function • Domain – Evolutionarily conserved sequence region that corresponds to a structurally independent 3D unit associated with a particular functional role. It is usually much larger than a motif • Pattern – Sequence written with symbols that form an expression. Example: N-{P}-[ST]-{P}-A(2,3). • Regular Expression – Representation format for a sequence motif, which includes positional information for conserved and partly conserved residues. Similar to a pattern, but derived from a multiple sequence alignment (MSA) • Profile – Scoring matrix that represents a multiple sequence alignment. It contains probability or frequency values of residues for each aligned position in the alignment, including gaps
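The alignment and profile terms above can be illustrated with a minimal Python sketch. It assumes that gaps are written as "-", that identity is taken over the columns where both sequences have a residue (other denominators are also in use), and that a profile is reduced to plain per-column frequencies. The function names and the toy sequences are illustrative only.

```python
from collections import Counter


def alignment_stats(seq_a, seq_b):
    """Count matches, mismatches and gap columns in a pairwise alignment."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    matches = mismatches = gaps = 0
    for a, b in zip(seq_a, seq_b):
        if a == "-" or b == "-":
            gaps += 1
        elif a == b:
            matches += 1
        else:
            mismatches += 1
    aligned = matches + mismatches
    # Assumption: identity is measured over residue-residue columns only.
    identity = 100.0 * matches / aligned if aligned else 0.0
    return {"matches": matches, "mismatches": mismatches,
            "gaps": gaps, "identity_pct": identity}


def column_profile(msa):
    """Per-column residue frequencies (gaps included) of a multiple alignment."""
    n = len(msa)
    return [{res: count / n for res, count in Counter(column).items()}
            for column in zip(*msa)]


stats = alignment_stats("ACD-EF", "ACDGEY")
profile = column_profile(["ACD", "ACE", "A-E", "GCE"])
print(stats)
print(profile[0])
```

Real profiles usually add pseudocounts and log-odds scoring on top of the raw frequencies; the sketch stops at the frequency table.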
  • 3. UniProtKB/Swiss-Prot/TrEMBL • Universal Protein Resource (UniProt) is a comprehensive and non-redundant resource for protein sequence and annotation data • The UniProt databases are the UniProt Knowledgebase (UniProtKB), the UniProt Reference Clusters (UniRef), and the UniProt Archive (UniParc) • UniProt Metagenomic and Environmental Sequences (UniMES) database is a repository specifically developed for metagenomic and environmental data http://www.uniprot.org/
  • 4. Background of UniProtKB • UniProt is a collaboration between the European Bioinformatics Institute (EMBL-EBI), the Swiss Institute of Bioinformatics (SIB) and the Protein Information Resource (PIR) • EMBL-EBI and SIB together used to produce Swiss-Prot and TrEMBL, while PIR produced the Protein Sequence Database (PIR-PSD) • Translated EMBL Nucleotide Sequence Data Library (TrEMBL) was originally created because sequence data was being generated at a pace that exceeded Swiss-Prot's ability to keep up • PIR maintained the PIR-PSD and related databases, including iProClass, a database of protein sequences and curated families
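The split between Swiss-Prot and TrEMBL is visible in UniProtKB FASTA headers, where the prefix "sp" marks a Swiss-Prot entry and "tr" a TrEMBL entry. The sketch below parses such a header; the function name and the returned field names are hypothetical, and the regular expression only covers the leading fields (database, accession, entry name, protein name, OS, OX).

```python
import re

# Leading fields of a UniProtKB FASTA header:
# >db|Accession|EntryName ProteinName OS=Organism OX=TaxonomyId ...
HEADER = re.compile(r"^>(sp|tr)\|([^|]+)\|(\S+)\s+(.*?)\s+OS=(.+?)\s+OX=(\d+)")


def parse_uniprot_header(line):
    """Split a UniProtKB FASTA header into a small dictionary."""
    match = HEADER.match(line)
    if match is None:
        raise ValueError("not a UniProtKB FASTA header")
    db, accession, entry_name, protein, organism, taxid = match.groups()
    return {
        "section": "Swiss-Prot" if db == "sp" else "TrEMBL",
        "accession": accession,
        "entry_name": entry_name,
        "protein_name": protein,
        "organism": organism,
        "taxonomy_id": int(taxid),
    }


example = (">sp|P69905|HBA_HUMAN Hemoglobin subunit alpha "
           "OS=Homo sapiens OX=9606 GN=HBA1 PE=1 SV=2")
record = parse_uniprot_header(example)
print(record["section"], record["accession"], record["organism"])
```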
  • 7. NBRF/PIR The Protein Information Resource (PIR) is an integrated bioinformatics resource for genomic, proteomic and systems biology research, established by the National Biomedical Research Foundation (NBRF). PIR offers a wide variety of resources, mainly intended to support the propagation and standardization of protein annotation: • PRO – Protein Ontology • iProClass – Integrated protein knowledgebase • iProLINK – Integrated protein literature, information and knowledge • iPTMnet – Integrated protein post-translational modification resource • iProXpress – Integrated protein expression analysis system • RESID Database – Comprehensive collection of annotations and structures for protein modifications http://pir.georgetown.edu/
  • 10. MIPS • The Munich Information Center for Protein Sequences (MIPS) is a research center hosted by the Institute of Bioinformatics and Systems Biology (IBIS), which is part of the Helmholtz Research Center for Environmental Health, Germany • MIPS focuses on the systematic analysis of genome information, including the development and application of bioinformatics methods in genome annotation, gene expression analysis and proteomics • MIPS supports and maintains a set of generic databases as well as the systematic comparative analysis of microbial, fungal, and plant genomes • MIPS offers databases, web services, and platforms for genomics, proteins, metabolomics and multi-omics integration, chemical screening, and disease annotation HOME PAGE: https://www.helmholtz-muenchen.de/ibis/ PPI: http://mips.helmholtz-muenchen.de/proj/ppi/
  • 14. PROSITE • PROSITE is a protein domain database for functional characterization and annotation. • PROSITE consists of entries describing protein families, domains and functional sites, as well as the amino acid patterns and profiles that identify them. • PROSITE is manually curated by a team at the Swiss Institute of Bioinformatics and tightly integrated into Swiss-Prot protein annotation. • PROSITE is complemented by ProRule, a collection of rules based on profiles and patterns. • The rules contain information about biologically meaningful residues, such as active sites, substrate- or co-factor-binding sites, post-translational modification sites or disulfide bonds, to help determine function. http://prosite.expasy.org/
  • 17. Result of PROSITE for Matching Pattern Hits
  • 18. PRINTS • The PRINTS database is a collection of protein motif fingerprints • A fingerprint is a group of conserved motifs used to characterize a protein family • Motifs do not overlap, but are separated along a sequence, though they may be contiguous in 3D-space to define molecular binding sites or interaction surfaces • Fingerprints can encode protein folds and functionalities more flexibly and powerfully than single motifs can • PRINTS provides a detailed annotation resource for protein families, and a diagnostic tool for newly determined sequences • PRINTS is a founding partner of the integrated resource InterPro, a widely used database of protein families, domains and functional sites http://www.bioinf.manchester.ac.uk/dbbrowser/PRINTS/ http://130.88.97.239/PRINTS/
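The idea of a fingerprint as an ordered set of non-overlapping motifs can be sketched as follows. This is a simplification, not the PRINTS scoring method: the motifs here are hypothetical regular expressions, whereas a real fingerprint search scores each motif against the sequence rather than requiring an exact match.

```python
import re


def match_fingerprint(sequence, motifs):
    """Find motifs in the given order without overlap.

    Returns one start position per motif, or None where a motif is missing.
    """
    positions = []
    cursor = 0  # the next motif must start at or after the end of the previous hit
    for motif in motifs:
        hit = re.compile(motif).search(sequence, cursor)
        if hit is None:
            positions.append(None)
        else:
            positions.append(hit.start())
            cursor = hit.end()
    return positions


# Hypothetical sequence and three-motif fingerprint, for illustration only.
sequence = "MKTGAVLLDEWHRRCPYFNNST"
full_hit = match_fingerprint(sequence, ["GAV", "D[EQ]W", "CP[YF]"])
partial_hit = match_fingerprint(sequence, ["GAV", "WWW", "CP[YF]"])
print(full_hit, partial_hit)
```

A sequence that matches only some of the motifs (a partial hit) can still be a family member, which is one reason a group of motifs is more flexible than a single one.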
  • 23. BLOCKS • The BLOCKS Database is based on InterPro entries with sequences from Swiss-Prot and TrEMBL • Blocks are multiply aligned, ungapped segments corresponding to the most highly conserved regions of proteins • BLOCKS entries cross-reference PROSITE, PRINTS, SMART, Pfam and ProDom entries where available • The BLOCKS Database was constructed by the PROTOMAT system using the MOTIF algorithm http://blocks.fhcrc.org/
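The notion of an ungapped aligned segment can be shown with a short sketch that scans a multiple alignment for runs of columns in which no sequence has a gap. It checks only for gaps, not for conservation, so it is not the PROTOMAT/MOTIF procedure; the function name, the `min_width` parameter and the toy alignment are assumptions.

```python
def ungapped_blocks(msa, min_width=3):
    """Return (start, end) column ranges, end exclusive, free of gaps in every row."""
    blocks = []
    start = None
    width = len(msa[0])
    # Iterate one column past the end so that a run touching the last column is closed.
    for col in range(width + 1):
        gap_free = col < width and all(seq[col] != "-" for seq in msa)
        if gap_free and start is None:
            start = col
        elif not gap_free and start is not None:
            if col - start >= min_width:
                blocks.append((start, col))
            start = None
    return blocks


# Toy alignment, for illustration only.
msa = ["MKV-LAGD--TRE",
       "MKVALAGDQ-TRE",
       "MRV-LSGD-KTKE"]
blocks = ungapped_blocks(msa)
print(blocks)
print([row[4:8] for row in msa])
```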
  • 27. Pfam • The Pfam database is a large collection of protein families, each represented by multiple sequence alignments and hidden Markov models (HMMs). • Pfam version 31.0 was produced at the EBI using a sequence database called Pfamseq, which is based on UniProtKB. • Pfam 31.0 has 16,712 families. • The descriptions of Pfam families are managed by the general public using Wikipedia. • The Pfam database contains information about protein domains and families. • Pfam-A is the manually curated portion of the database. • Pfam-B contains a large number of small families that are generated automatically from clusters produced by an algorithm called ADDA. • Pfam-B families are of lower quality, but can be useful when no Pfam-A families are found. http://pfam.xfam.org/
  • 28. Classification of Pfam Entries • Family - A collection of related protein regions • Domain - A structural unit • Repeat - A short unit which is unstable in isolation but forms a stable structure when multiple copies are present • Motif - A short unit found outside globular domains • Coiled-Coil - Regions that predominantly contain coiled-coil motifs, in which alpha-helices are typically coiled together in bundles of 2-7 • Disordered - Regions that are conserved, yet are either shown or predicted to have biased sequence composition and/or to be intrinsically disordered (non-globular) • Clan - A collection of families that have arisen from a single evolutionary origin • Related Pfam entries are grouped together into clans; the relationship may be defined by similarity of sequence, structure or profile-HMM.
  • 32. NRDB/NRDB90 • NRDB (Non-Redundant DataBase) is a so-called non-redundant composite of the following sources: PDB, RefSeq, UniProtKB/Swiss-Prot, DDBJ, EMBL, GenBank, and PIR • NRDB is similar in content to OWL, but contains more up-to-date information • Strictly speaking, NRDB is non-identical rather than non-redundant, i.e., only identical sequence copies are removed from the database • The NRDB algorithm was written by Warren Gish at Washington University and is used in constructing the derived database NRDB90 • NRDB90 retains only representative sequences, so that no retained sequence has a homologue in the set with sequence identity of 90% or more • NRDB is currently maintained by NCBI http://www.ebi.ac.uk/~holm/nrdb90/ [MOVED] http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=Protein
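The difference between "non-identical" and "non-redundant" can be made concrete with a small sketch that merges only exact sequence copies and keeps every source identifier. This is an illustration of the principle, not the NRDB program itself; the identifiers and sequences are hypothetical.

```python
def merge_identical(records):
    """Collapse records whose sequences are identical, keeping all identifiers."""
    merged = {}
    for identifier, sequence in records:
        # Only exact (case-insensitive) copies share a key.
        merged.setdefault(sequence.upper(), []).append(identifier)
    return [(ids, seq) for seq, ids in merged.items()]


# Hypothetical entries from three source databases.
records = [
    ("swissprot:X1", "MKTAYIAK"),
    ("pir:Y2", "MKTAYIAK"),     # exact copy of the first entry: merged
    ("genbank:Z3", "MKTAYIAR"), # one residue differs: kept as a separate entry
]
nonidentical = merge_identical(records)
print(nonidentical)
```

The third entry survives although it is almost the same as the first, which is why a further step with an identity threshold (such as the 90% used for NRDB90) is needed to remove near-duplicates.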
  • 35. OWL • OWL is a non-redundant composite of 4 publicly available primary sources: Swiss-Prot, PIR, GenBank (translation) and NRL-3D • Swiss-Prot is the highest priority source, all others being compared against it to eliminate identical and trivially different sequences • The strict redundancy criteria render OWL relatively “small” and hence efficient in similarity searches http://www.bioinf.man.ac.uk/dbbrowser/OWL http://130.88.97.239/OWL/
  • 39. PDB • The Protein Data Bank (PDB) archive is the single worldwide repository of information about the 3D structures of large biological molecules, including proteins and nucleic acids. • The PDB was established in 1971 at Brookhaven National Laboratory (BNL) under the leadership of Walter Hamilton and originally contained 7 structures. • In 1998, the Research Collaboratory for Structural Bioinformatics (RCSB) became responsible for the management of the PDB. • In 2003, the wwPDB was formed to maintain a single PDB archive of macromolecular structural data that is freely and publicly available to the global community. • The RCSB PDB supports a website where visitors can perform simple and complex queries on the data, and analyze and visualize the results. • Members of the wwPDB are RCSB PDB (USA), PDBe (Europe), PDBj (Japan), and the Biological Magnetic Resonance Data Bank, BMRB (USA). http://rcsb.org/pdb/
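Since a PDB entry is essentially a list of atomic coordinates, a short sketch can show how x, y and z are read from one ATOM record of the legacy fixed-column PDB text format. The column positions follow that format; the function name and the sample record (including its coordinate values) are illustrative and not taken from a real entry.

```python
def parse_atom_line(line):
    """Extract atom name, residue, chain and coordinates from an ATOM/HETATM record."""
    if not line.startswith(("ATOM", "HETATM")):
        return None
    return {
        "atom": line[12:16].strip(),        # columns 13-16
        "residue": line[17:20].strip(),     # columns 18-20
        "chain": line[21],                  # column 22
        "residue_number": int(line[22:26]), # columns 23-26
        "xyz": (float(line[30:38]),         # columns 31-38
                float(line[38:46]),         # columns 39-46
                float(line[46:54])),        # columns 47-54
    }


# Illustrative record; the spacing matters because the format is column-based.
sample = "ATOM      1  CA  MET A   1      27.340  24.430   2.614  1.00  9.67           C"
atom = parse_atom_line(sample)
print(atom)
```

For real work a maintained parser is usually preferable, and newer entries are distributed primarily in the PDBx/mmCIF format, which this sketch does not cover.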
  • 42. SCOP2 • The SCOP (Structural Classification of Proteins) database is a large manual classification of protein structural domains based on similarities of their structures and amino acid sequences. • A motivation for this classification is to determine the evolutionary relationship between proteins. • Proteins with the same shapes but having little sequence or functional similarity are placed in different “superfamilies”, and are assumed to have only a very distant common ancestor. • Proteins having the same shape and some similarity of sequence and/or function are placed in “families”, and are assumed to have a closer common ancestor. • The original SCOP has been discontinued; its last official version is 1.75. SCOP2 is its successor. • SCOP2 offers two different ways of accessing data: SCOP2-browser and SCOP2-graph. • SCOP2-browser allows navigation in a traditional way by browsing pages displaying the node information. • SCOP2-graph is a graph-based web tool for display and navigation. • The source of protein structures is the Protein Data Bank. http://scop2.mrc-lmb.cam.ac.uk/
  • 43. Classification of SCOP Entries • The unit of classification of structure in SCOP is the protein domain. • The levels of SCOP are as follows. 1. Class: Types of folds, e.g., all α, all β, α/β, α+β, etc. 2. Fold: The different shapes of domains within a class, e.g., 2 helices; antiparallel hairpin, left-handed twist, etc. 3. Superfamily: The domains in a fold are grouped into superfamilies, which have at least a distant common ancestor. 4. Family: The domains in a superfamily are grouped into families, which have a more recent common ancestor. 5. Protein domain: The domains in families are grouped into protein domains, which are essentially the same protein. 6. Species: The domains in “protein domains” are grouped according to species. 7. Domain: A part of a protein. For simple proteins, it can be the entire protein.
  • 47. CATH • CATH (Class, Architecture, Topology, and Homologous superfamily) is a semi-automatic, hierarchical classification of protein domains. • CATH shares many broad features with its principal rival, SCOP. • The four main levels of the CATH hierarchy are as follows: 1. Class: the overall secondary-structure content of the domain, e.g., mainly α, mainly β, mixed α-β, few secondary structures. 2. Architecture: a large-scale grouping of topologies which share the overall arrangement of secondary structures, regardless of how they are connected. 3. Topology: high structural similarity, including connectivity, but no evidence of homology. Equivalent to a fold in SCOP. 4. Homologous superfamily: indicative of a demonstrable evolutionary relationship. Equivalent to the superfamily level of SCOP. http://www.cathdb.info/
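The four levels map naturally onto the dotted numeric codes that CATH assigns to its nodes, for example 3.40.50.300 (Class 3, Architecture 40, Topology 50, Homologous superfamily 300). The sketch below splits such a code into its levels; the dotted code is not described in the slides and is brought in here as background, and the type, field and function names are hypothetical.

```python
from typing import NamedTuple, Optional


class CathNode(NamedTuple):
    cath_class: int
    architecture: Optional[int]
    topology: Optional[int]
    superfamily: Optional[int]


# Names of the main CATH classes by class number.
CLASS_NAMES = {
    1: "Mainly Alpha",
    2: "Mainly Beta",
    3: "Alpha Beta",
    4: "Few Secondary Structures",
}


def parse_cath_code(code):
    """Split a dotted CATH code into its hierarchy levels; missing levels become None."""
    parts = [int(part) for part in code.split(".")]
    if not 1 <= len(parts) <= 4:
        raise ValueError("a CATH code has between one and four levels")
    padded = parts + [None] * (4 - len(parts))
    return CathNode(*padded)


node = parse_cath_code("3.40.50.300")
print(node, CLASS_NAMES[node.cath_class])
```

A shorter code such as "3.40" names an internal node of the hierarchy (here an architecture), which the sketch represents by leaving the deeper levels as None.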
  • 52. NDB • The Nucleic Acid Database (NDB) is a repository of 3D nucleic acid structures and their complexes • Structures available in the NDB include RNA and DNA oligonucleotides with two or more bases, either alone or complexed with proteins or small-molecule ligands • NDB contains both primary information (X-ray crystallography or NMR coordinate data) and derived information (valence geometry, torsion angles and intermolecular contacts) about the structures • NDB offers a variety of online and offline tools for analyzing nucleic acid structures. The featured tools include: RNA 3D Motif Atlas, a representative collection of RNA 3D internal and hairpin loop motifs; Non-redundant Lists of RNA-containing 3D structures; RNA Base Triple Atlas, a collection of motifs consisting of two RNA basepairs; WebFR3D, a webserver for symbolic and geometric searching of RNA 3D structures; and R3D Align, an application for detailed nucleotide-to-nucleotide alignments of RNA 3D structures http://ndbserver.rutgers.edu
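As an example of how derived information follows from primary coordinate data, the sketch below computes a torsion (dihedral) angle from four atom positions using the usual atan2 formulation. It is a generic geometric calculation, not NDB's own software; the function names and the test points are illustrative.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def torsion_angle(p1, p2, p3, p4):
    """Torsion angle in degrees about the p2-p3 bond, in the range -180 to 180."""
    b1, b2, b3 = _sub(p2, p1), _sub(p3, p2), _sub(p4, p3)
    n1 = _cross(b1, b2)  # normal of the plane through p1, p2, p3
    n2 = _cross(b2, b3)  # normal of the plane through p2, p3, p4
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


# Four points with the central bond along the z axis.
cis = torsion_angle((1, 0, 0), (0, 0, 0), (0, 0, 1), (1, 0, 1))
quarter = torsion_angle((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1))
trans = torsion_angle((1, 0, 0), (0, 0, 0), (0, 0, 1), (-1, 0, 1))
print(cis, quarter, trans)
```

Applied to consecutive backbone atoms of a nucleotide, the same function would give the backbone torsions that a structure database tabulates.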
  • 55. PQS/PDBePISA/PISA • PISA (Proteins, Interfaces, Structures, and Assemblies), the successor to the PQS (Protein Quaternary Structure) database, was constructed by EMBL-EBI • PISA is an interactive tool for the exploration of macromolecular interfaces • PISA presents results calculated by physico-chemical models for PDB entries and/or uploaded macromolecular structures • PISA provides probable quaternary structures (assemblies), their structural and chemical properties, and their probable dissociation patterns http://www.ebi.ac.uk/pdbe/pisa/
  • 60. SYSTERS • SYSTERS (SYSTEmatic Re-Searching) is a collection of graph-based algorithms to hierarchically partition a large set of protein sequences into homologous families and superfamilies • SYSTERS is based on an all-against-all database search (using Smith-Waterman comparisons on a GeneMatcher machine) • The resulting set of protein families contains four different types of clusters, based on the connectivity within their family distance graph and listed with decreasing reliability: Perfect Clusters (P), in which all sequences are connected to all other sequences in the cluster; Single Sequence Clusters (S), a special case of a perfect cluster; Nested Clusters (N), in which at least one sequence is connected to all other sequences in the cluster; and Overlapping Clusters (O), in which no sequence is connected to all other sequences in the cluster http://systers.molgen.mpg.de/ [DISCONTINUED]
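The four cluster types can be expressed as a simple test on the connectivity of a cluster's graph. The sketch below applies the definitions literally to a set of members and a list of undirected edges; it does not reproduce the SYSTERS clustering itself, and the function name and example graphs are hypothetical.

```python
def classify_cluster(members, edges):
    """Label a cluster P, S, N or O from the connectivity among its members."""
    members = set(members)
    if len(members) == 1:
        return "S"  # single sequence: special case of a perfect cluster
    neighbours = {m: set() for m in members}
    for a, b in edges:
        if a in members and b in members and a != b:
            neighbours[a].add(b)
            neighbours[b].add(a)
    full = len(members) - 1  # degree of a sequence connected to all others
    degrees = [len(neighbours[m]) for m in members]
    if all(d == full for d in degrees):
        return "P"  # every sequence is connected to every other
    if any(d == full for d in degrees):
        return "N"  # at least one sequence is connected to all others
    return "O"      # no sequence is connected to all others


triangle = classify_cluster("abc", [("a", "b"), ("b", "c"), ("a", "c")])
star = classify_cluster("abc", [("a", "b"), ("a", "c")])
chain = classify_cluster("abcd", [("a", "b"), ("b", "c"), ("c", "d")])
single = classify_cluster("a", [])
print(triangle, star, chain, single)
```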
  • 64. Motif • Motif is a search service provided by GenomeNet for searching with a protein query sequence against motif libraries • Supports several motif databases such as PROSITE, BLOCKS, ProDom, Pfam, and PRINTS • Allows you to search protein sequence libraries with your own patterns, written as follows: • Each residue must be separated with - (minus sign) • x represents any amino acid • [DE] means either D or E • {FWY} means any amino acid except F, W and Y • A(2,3) means that A appears 2 to 3 times consecutively • The pattern string must be terminated with . (period) For example, C-x-{C}-[DN]-x(2)-C-x(5)-C-C. • Generates a profile from a set of multiple aligned sequences using PFMake or HMMBuild http://www.genome.jp/tools/motif/
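The pattern rules listed above translate almost one-to-one into a regular expression, which the following sketch demonstrates. It handles only the elements named on the slide (single residues, x, [..], {..} and repeat counts) and ignores other PROSITE features such as the < and > anchors; the function name and the test sequence are hypothetical.

```python
import re

# One pattern element: x, a residue, [allowed] or {excluded}, with an optional (n) or (n,m).
ELEMENT = re.compile(r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern):
    """Convert a PROSITE-style pattern such as C-x-{C}-[DN]-x(2)-C. into a regex string."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        match = ELEMENT.fullmatch(element)
        if match is None:
            raise ValueError("unsupported pattern element: " + element)
        core, low, high = match.groups()
        if core == "x":
            regex = "."                        # any amino acid
        elif core.startswith("{"):
            regex = "[^" + core[1:-1] + "]"    # any amino acid except those listed
        else:
            regex = core                       # a residue or an [allowed] set
        if low is not None:
            regex += "{" + low + ("," + high if high else "") + "}"
        parts.append(regex)
    return "".join(parts)


regex = prosite_to_regex("C-x-{C}-[DN]-x(2)-C-x(5)-C-C.")
hit = re.search(regex, "AACGADKKCAAAAACCA")
print(regex, hit.start(), hit.group())
```

Because "." in a regular expression matches any character, the query sequence should be cleaned to amino acid letters before searching.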