Protein Sequence, Structure, and Functional Databases: UniProtKB, Swiss-Prot, TrEMBL, PIR, MIPS, PROSITE, PRINTS, BLOCKS, Pfam, NRDB, OWL, PDB, SCOP, CATH, NDB, PQS, SYSTERS, and Motif. Presented at UGC Sponsored National Workshop on Bioinformatics and Sequence Analysis conducted by Nesamony Memorial Christian College, Marthandam on 9th and 10th October, 2017 by Prof. T. Ashok Kumar
2. Computational Terms & Definitions
Protein Sequence – String of the 20 standard amino acid one-letter codes [A, C, D, E, F, G, H, I, K, L, M, N, P, Q, R, S, T, V, W, Y]
Protein Structure – 3D arrangement of atoms, given as atomic coordinates [x-axis, y-axis, z-axis]
Types of Biological Databases – [Raw Database = Plain text, Object-oriented Database = Table (Records), Relational Database = Table of tables]
3D Atom Model – [Sphere = Atom, Cylinder = Bond, Dotted Line = Non-covalent interaction (e.g., hydrogen bond)]
Sequence Alignment – [Match = Identical character, Mismatch = Different character, Gap = No corresponding character (insertion or deletion), Word = Sub-string, Sequence = Super-string, Score = Numerical rating of the alignment, Identity = Fraction of aligned positions with identical residues]
Motif – Short, conserved sequence associated with a distinct function
Domain – Evolutionarily conserved sequence region that corresponds to a structurally independent 3D unit associated with a particular functional role. It is usually much larger than a motif
Pattern – Symbolic representation of a sequence motif as an expression. Example: N-{P}-[ST]-{P}-A(2,3).
Regular Expression – Representation format for a sequence motif, which includes positional information for conserved and partly conserved residues. Similar to Pattern, but applied to an MSA
Profile – Scoring matrix that represents a multiple sequence alignment. It contains probability or frequency values of residues for each aligned position in the alignment, including gaps
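The alignment terms and the profile definition above can be made concrete with a short Python sketch. It assumes two sequences that are already aligned (gaps written as "-"), reports identity as identical positions divided by alignment length (one common convention among several), and builds a simple per-column frequency profile; the function names are illustrative only.

```python
from collections import Counter


def alignment_stats(a: str, b: str) -> dict:
    """Count matches, mismatches and gaps in two pre-aligned sequences ('-' marks a gap)."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    matches = mismatches = gaps = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            gaps += 1
        elif x == y:
            matches += 1
        else:
            mismatches += 1
    # Identity = identical positions / alignment length (other conventions exist)
    return {"matches": matches, "mismatches": mismatches,
            "gaps": gaps, "identity": matches / len(a)}


def frequency_profile(msa: list[str]) -> list[dict]:
    """Per-column residue frequencies (gaps included) for equal-length aligned rows."""
    width = len(msa[0])
    if any(len(row) != width for row in msa):
        raise ValueError("all rows of the MSA must have equal length")
    profile = []
    for col in range(width):
        counts = Counter(row[col] for row in msa)
        profile.append({res: n / len(msa) for res, n in counts.items()})
    return profile


print(alignment_stats("ACDE-G", "ACEEHG"))
print(frequency_profile(["ACD", "AC-", "GCD"]))
```

A real profile (as used by PROSITE or Pfam) would add sequence weighting and convert frequencies to log-odds scores; raw frequencies keep the idea visible.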
3. UniProtKB/Swiss-Prot/TrEMBL
Universal Protein Resource (UniProt) is a comprehensive and non-redundant resource for protein sequence and annotation data
The UniProt databases are the UniProt Knowledgebase (UniProtKB), the UniProt Reference Clusters (UniRef), and the UniProt Archive (UniParc). UniProtKB consists of two sections: the manually annotated and reviewed Swiss-Prot, and the computationally annotated, unreviewed TrEMBL
UniProt Metagenomic and Environmental Sequences (UniMES) database is a repository specifically developed for metagenomic and environmental data
http://www.uniprot.org/
4. Background of UniProtKB
• UniProt is a collaboration between the European Bioinformatics Institute (EMBL-EBI), the Swiss Institute of Bioinformatics (SIB) and the Protein Information Resource (PIR)
• EMBL-EBI and SIB together used to produce Swiss-Prot and TrEMBL, while PIR produced the Protein Sequence Database (PIR-PSD)
• Translated EMBL Nucleotide Sequence Data Library (TrEMBL) was originally created because sequence data was being generated at a pace that exceeded Swiss-Prot's ability to keep up
• PIR maintained the PIR-PSD and related databases, including iProClass, a database of protein sequences and curated families
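In UniProtKB FASTA files, the database prefix of each header (sp for Swiss-Prot, tr for TrEMBL) records which section an entry belongs to. A minimal Python sketch that parses such a header, assuming the layout UniProt documents for its FASTA headers, `>db|Accession|EntryName ProteinName OS=... OX=... [GN=...] PE=... SV=...`; the HBA_HUMAN header is written in that style for illustration, and the regular expressions are this sketch's own, not an official parser.

```python
import re

# Assumed layout: >db|Accession|EntryName ProteinName OS=.. OX=.. [GN=..] PE=.. SV=..
HEADER_RE = re.compile(r"^>(sp|tr)\|([^|]+)\|(\S+)\s+(.*?)\s+OS=")
# Each KEY=value runs until the next " KEY=" or the end of the line
FIELD_RE = re.compile(r"\b(OS|OX|GN|PE|SV)=(.*?)(?=\s+(?:OS|OX|GN|PE|SV)=|$)")


def parse_uniprot_header(header: str) -> dict:
    m = HEADER_RE.match(header)
    if m is None:
        raise ValueError("not a UniProtKB FASTA header")
    db, accession, entry_name, protein_name = m.groups()
    record = {
        # sp = reviewed Swiss-Prot section, tr = unreviewed TrEMBL section
        "section": "Swiss-Prot" if db == "sp" else "TrEMBL",
        "accession": accession,
        "entry_name": entry_name,
        "protein_name": protein_name,
    }
    record.update(FIELD_RE.findall(header))  # adds OS, OX, GN (if present), PE, SV
    return record


print(parse_uniprot_header(
    ">sp|P69905|HBA_HUMAN Hemoglobin subunit alpha OS=Homo sapiens OX=9606 GN=HBA1 PE=1 SV=2"))
```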
7. NBRF/PIR
The Protein Information Resource (PIR) is an integrated bioinformatics resource for genomic, proteomic and systems biology research and scientific studies, established in 1984 by the National Biomedical Research Foundation (NBRF). PIR offers a wide variety of resources mainly oriented to assist the propagation and standardization of protein annotation:
PRO – Protein Ontology
iProClass – Integrated protein knowledgebase
iProLINK – Literature information and knowledgebase
iPTMnet – Integrated protein post-translational modification resource
iProXpress – Integrated protein expression analysis system
RESID Database – Comprehensive collection of annotations and structures for protein modifications
http://pir.georgetown.edu/
10. MIPS
• Munich Information Center for Protein Sequences (MIPS) is a research center hosted by the Institute of Bioinformatics and Systems Biology (IBIS), which is part of the Helmholtz Research Center for Environmental Health, Germany
• MIPS focuses on the systematic analysis of genome information, including the development and application of bioinformatics methods in genome annotation, gene expression analysis and proteomics
• MIPS supports and maintains a set of generic databases as well as the systematic comparative analysis of microbial, fungal, and plant genomes
• MIPS offers different databases, web services, and platforms in genomics, proteins, metabolomics and multi-omics integration, chemical screening, and disease annotation
HOME PAGE: https://www.helmholtz-muenchen.de/ibis/
PPI: http://mips.helmholtz-muenchen.de/proj/ppi/
14. PROSITE
• PROSITE is a protein domain database for functional characterization and annotation.
• PROSITE consists of entries describing protein families, domains and functional sites, as well as the amino acid patterns and profiles that identify them.
• PROSITE is manually curated by a team at the Swiss Institute of Bioinformatics and is tightly integrated into Swiss-Prot protein annotation.
• PROSITE is complemented by ProRule, a collection of rules based on profiles and patterns.
• The rules contain information about biologically meaningful residues, such as active sites, substrate- or co-factor-binding sites, post-translational modification sites or disulfide bonds, to help determine function.
http://prosite.expasy.org/
18. PRINTS
• PRINTS is a database of protein motif fingerprints
• A fingerprint is a group of conserved motifs used to characterize a protein family
• The motifs do not overlap but are separated along the sequence, although they may be contiguous in 3D space, where together they define molecular binding sites or interaction surfaces
• Fingerprints can encode protein folds and functionalities more flexibly and powerfully than single motifs can
• PRINTS provides a detailed annotation resource for protein families and a diagnostic tool for newly determined sequences
• PRINTS is a founding partner of the integrated resource InterPro, a widely used database of protein families, domains and functional sites
http://www.bioinf.manchester.ac.uk/dbbrowser/PRINTS/
http://130.88.97.239/PRINTS/
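The idea of a fingerprint, several motifs that must occur in order and without overlapping, can be sketched in Python as follows. This is a deliberate simplification: each motif is written as a regular expression and matched exactly, whereas PRINTS motifs are ungapped alignments scored against the query; the three motif strings and the query are hypothetical.

```python
import re


def fingerprint_hits(sequence: str, motifs: list[str]) -> list[tuple[int, int]]:
    """Place each motif (a regex) in order, left to right, without overlap.

    Returns the (start, end) spans found; stops at the first motif that cannot
    be placed after the previous one. Greedy leftmost placement is a simplification.
    """
    spans = []
    pos = 0
    for motif in motifs:
        m = re.compile(motif).search(sequence, pos)
        if m is None:
            break
        spans.append((m.start(), m.end()))
        pos = m.end()  # the next motif must start after this one ends
    return spans


# Hypothetical three-motif fingerprint and query sequence
fingerprint = ["GK[ST]", "DE.H", "SAT"]
query = "MAGKSTLLDEAHQQSAT"
hits = fingerprint_hits(query, fingerprint)
print(f"{len(hits)}/{len(fingerprint)} motifs matched in order: {hits}")
```

Reporting how many motifs of the fingerprint matched, rather than a yes/no answer, mirrors the point above that a fingerprint is more flexible than a single motif: a partial match can still be informative.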
23. BLOCKS
• BLOCKS Database is based on InterPro entries with sequences from Swiss-Prot and TrEMBL
• Blocks are multiple aligned ungapped segments corresponding to the most highly conserved regions of proteins
• BLOCKS entries cross-reference PROSITE, PRINTS, SMART, Pfam and/or ProDom entries.
• BLOCKS Database was constructed by the PROTOMAT system using the MOTIF algorithm
http://blocks.fhcrc.org/
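Because blocks are ungapped, a block can be turned into a position-specific scoring matrix (PSSM) and slid along a query without any gap handling. The Python sketch below assumes a uniform background frequency of 1/20 and a pseudocount of 1, both simplifications (real BLOCKS searching applies its own sequence weighting and background model); the block rows and the query are hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def block_to_pssm(block: list[str], pseudocount: float = 1.0) -> list[dict]:
    """Log-odds scores (bits) per column of an ungapped block, uniform background assumed."""
    width = len(block[0])
    if any(len(row) != width for row in block):
        raise ValueError("block rows must be ungapped and of equal length")
    background = 1 / len(AMINO_ACIDS)
    total = len(block) + pseudocount * len(AMINO_ACIDS)
    pssm = []
    for col in range(width):
        counts = Counter(row[col] for row in block)
        pssm.append({aa: math.log2((counts[aa] + pseudocount) / total / background)
                     for aa in AMINO_ACIDS})
    return pssm


def best_window(sequence: str, pssm: list[dict]) -> tuple[int, float]:
    """Score every ungapped window of the query; return (start, score) of the best one."""
    width = len(pssm)
    best_start, best_score = -1, float("-inf")
    for start in range(len(sequence) - width + 1):
        # Residues outside the 20 standard ones (e.g. X) contribute 0
        score = sum(pssm[i].get(sequence[start + i], 0.0) for i in range(width))
        if score > best_score:
            best_start, best_score = start, score
    return best_start, best_score


pssm = block_to_pssm(["GKST", "GKSS", "GRST"])  # hypothetical block
print(best_window("AAAGKSTAAA", pssm))
```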
27. Pfam
• The Pfam database is a large collection of protein families, each represented by multiple sequence alignments and hidden Markov models (HMMs).
• Pfam version 31.0 was produced at the EBI using a sequence database called Pfamseq, which is based on UniProtKB.
• Pfam 31.0 has 16,712 families
• The descriptions of Pfam families are maintained by the community through Wikipedia.
• The Pfam database contains information about protein domains and families.
• Pfam-A is the manually curated portion of the database
• Pfam-B contains a large number of small, automatically generated families derived from clusters produced by the ADDA algorithm.
• Pfam-B families can be useful when no Pfam-A families are found, although they are of lower quality.
http://pfam.xfam.org/
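Pfam scores a sequence against each family's HMM. The core computation, summing the probability of a sequence over all hidden-state paths, is the forward algorithm. The Python sketch below applies it to a toy two-state HMM with made-up probabilities; it is not Pfam's profile-HMM architecture (match, insert and delete states for every alignment column), which tools such as HMMER implement with considerably more machinery.

```python
from itertools import product

# Hypothetical toy model: "D" = inside a domain, "B" = background;
# emitted symbols: "h" = hydrophobic residue, "p" = polar residue.
STATES = ("D", "B")
START = {"D": 0.3, "B": 0.7}
TRANS = {"D": {"D": 0.9, "B": 0.1}, "B": {"D": 0.2, "B": 0.8}}
EMIT = {"D": {"h": 0.7, "p": 0.3}, "B": {"h": 0.4, "p": 0.6}}


def forward_probability(seq: str) -> float:
    """P(seq | model), summed over every hidden-state path (forward algorithm)."""
    # alpha[s] = P(symbols seen so far, current state = s)
    alpha = {s: START[s] * EMIT[s][seq[0]] for s in STATES}
    for symbol in seq[1:]:
        alpha = {s: sum(alpha[r] * TRANS[r][s] for r in STATES) * EMIT[s][symbol]
                 for s in STATES}
    return sum(alpha.values())


# Sanity check: probabilities of all length-3 sequences sum to 1 (model has no end state)
print(sum(forward_probability("".join(s)) for s in product("hp", repeat=3)))
```

Summing over paths, instead of enumerating them, keeps the cost linear in sequence length; the same dynamic-programming idea underlies profile-HMM scoring.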
28. Classification of Pfam Entries
• Family - A collection of related protein regions
• Domain - A structural unit
• Repeat - A short unit which is unstable in isolation but forms a stable structure when multiple copies are present
• Motif - A short unit found outside globular domains
• Coiled-Coil - Regions that predominantly contain coiled-coil motifs, i.e., alpha-helices coiled together in bundles of 2-7.
• Disordered - Regions that are conserved, yet are either shown or predicted to contain biased sequence composition and/or to be intrinsically disordered (non-globular).
• Clans - A collection of families that have arisen from a single evolutionary origin
• Related Pfam entries are grouped together into clans; the relationship may be defined by similarity of sequence, structure or profile-HMM.
32. NRDB/NRDB90
• NRDB (Non-Redundant DataBase) is a so-called non-redundant composite of the following sources: PDB, RefSeq, UniProtKB/Swiss-Prot, DDBJ, EMBL, GenBank, and PIR
• NRDB is similar in content to OWL, but more up-to-date
• NRDB is not strictly non-redundant but non-identical, i.e., only identical copies of a sequence are removed from the database
• The nrdb program was written by Warren Gish at Washington University
• NRDB90 is a reduced database in which no two sequences share 90% or more sequence identity
• NRDB is currently maintained by NCBI
http://www.ebi.ac.uk/~holm/nrdb90/ [MOVED]
http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=Protein
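The difference between "non-identical" (NRDB) and "no homologue at 90% identity or more" (NRDB90) can be illustrated with a short Python sketch. Identity is approximated here by longest-common-subsequence length divided by the shorter sequence length, a crude stand-in for a real alignment, and the greedy longest-first filter is an assumption, not the published NRDB90 procedure; the source IDs are hypothetical.

```python
def collapse_identical(records: list[tuple[str, str]]) -> dict[str, list[str]]:
    """NRDB-style merge: one entry per distinct sequence, keeping every source ID."""
    merged: dict[str, list[str]] = {}
    for source_id, seq in records:
        merged.setdefault(seq, []).append(source_id)
    return merged


def lcs_identity(a: str, b: str) -> float:
    """Crude identity: longest common subsequence / shorter length (stand-in for an alignment)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, start=1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1] / min(len(a), len(b))


def filter_identity(seqs: list[str], threshold: float = 0.9) -> list[str]:
    """Greedy NRDB90-like filter: keep a sequence only if below `threshold` identity to all kept ones."""
    kept: list[str] = []
    for s in sorted(seqs, key=len, reverse=True):  # longest first (an assumption)
        if all(lcs_identity(s, k) < threshold for k in kept):
            kept.append(s)
    return kept


records = [("sp|P1", "MKV"), ("gb|X1", "MKV"), ("pir|Y1", "MKL")]  # hypothetical IDs
print(collapse_identical(records))
print(filter_identity(["ACDEFGHIKL", "ACDEFGHIKM", "WWWWWWWWWW"]))
```

Note how "MKV" and "MKL" both survive the identical-copy merge even though they differ at a single position; only a similarity threshold, as in NRDB90, removes such near-duplicates.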
35. OWL
• OWL is a non-redundant composite of 4 publicly available primary sources: Swiss-Prot, PIR, GenBank (translation) and NRL-3D
• Swiss-Prot is the highest-priority source; all others are compared against it to eliminate identical and trivially different sequences
• The strict redundancy criteria render OWL relatively “small” and hence efficient in similarity searches
http://www.bioinf.man.ac.uk/dbbrowser/OWL
http://130.88.97.239/OWL/
39. PDB
• The Protein Data Bank (PDB) archive is the single worldwide repository of information about the 3D structures of large biological molecules, including proteins and nucleic acids.
• The PDB was established in 1971 at Brookhaven National Laboratory (BNL) under the leadership of Walter Hamilton and originally contained 7 structures.
• In 1998, the Research Collaboratory for Structural Bioinformatics (RCSB) became responsible for the management of the PDB.
• In 2003, the Worldwide PDB (wwPDB) was formed to maintain a single PDB archive of macromolecular structural data that is freely and publicly available to the global community.
• The RCSB PDB supports a website where visitors can perform simple and complex queries on the data, and analyze and visualize the results.
• Members of the wwPDB are RCSB PDB (USA), PDBe (Europe), PDBj (Japan), and the Biological Magnetic Resonance Data Bank, BMRB (USA).
http://rcsb.org/pdb/
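The legacy PDB file format stores each atom as a fixed-column ATOM or HETATM record. A minimal Python sketch that reads residue fields and x, y, z coordinates from those columns (following the column layout of the PDB format specification) and measures an interatomic distance; it ignores alternate locations, insertion codes and multiple models, does not read the PDBx/mmCIF format, and the two sample records are synthetic.

```python
import math


def parse_atoms(pdb_text: str) -> list[dict]:
    """Extract ATOM/HETATM records using the fixed PDB columns (0-based slices)."""
    atoms = []
    for line in pdb_text.splitlines():
        if line.startswith(("ATOM  ", "HETATM")):
            atoms.append({
                "name": line[12:16].strip(),      # columns 13-16: atom name
                "res_name": line[17:20].strip(),  # columns 18-20: residue name
                "chain": line[21],                # column 22: chain identifier
                "res_seq": int(line[22:26]),      # columns 23-26: residue number
                "xyz": (float(line[30:38]),       # columns 31-38: x
                        float(line[38:46]),       # columns 39-46: y
                        float(line[46:54])),      # columns 47-54: z
            })
    return atoms


def distance(atom1: dict, atom2: dict) -> float:
    """Euclidean distance (in the file's units, ångströms) between two parsed atoms."""
    return math.dist(atom1["xyz"], atom2["xyz"])


sample = (
    "ATOM      1  N   MET A   1       0.000   0.000   0.000  1.00  0.00           N\n"
    "ATOM      2  CA  MET A   1       1.458   0.000   0.000  1.00  0.00           C\n"
)
atoms = parse_atoms(sample)
print(atoms[1]["name"], atoms[1]["res_name"], distance(atoms[0], atoms[1]))
```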
42. SCOP2
• The SCOP (Structural Classification of Proteins) database is a large manual classification of protein structural domains based on similarities of their structures and amino acid sequences.
• A motivation for this classification is to determine the evolutionary relationship between proteins.
• Proteins with the same shapes but having little sequence or functional similarity are placed in different “superfamilies”, and are assumed to have only a very distant common ancestor.
• Proteins having the same shape and some similarity of sequence and/or function are placed in “families”, and are assumed to have a closer common ancestor.
• The original SCOP has been discontinued; its last official release is SCOP 1.75. SCOP2 is its successor.
• SCOP2 offers two different ways of accessing data: SCOP2-browser and SCOP2-graph.
• SCOP2-browser allows navigation in a traditional way by browsing pages displaying the node information.
• SCOP2-graph is a graph-based web tool for display and navigation.
• The source of protein structures is the Protein Data Bank.
http://scop2.mrc-lmb.cam.ac.uk/
43. Classification of SCOP Entries
• The unit of classification of structure in SCOP is the protein domain.
• The levels of SCOP are as follows.
1. Class: Types of folds, e.g., all α, all β, α/β, α+β, α&β, etc.
2. Fold: The different shapes of domains within a class, e.g., 2 helices; antiparallel hairpin, left-handed twist, etc.
3. Superfamily: The domains in a fold are grouped into superfamilies, which have at least a distant common ancestor.
4. Family: The domains in a superfamily are grouped into families, which have a recent common ancestor.
5. Protein domain: The domains in families are grouped into protein domains, which are essentially the same protein.
6. Species: The domains in “protein domains” are grouped according to species.
7. Domain: A part of a protein. For simple proteins, it can be the entire protein.
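SCOP labels the first four levels of this hierarchy with a concise classification string (sccs) of the form class.fold.superfamily.family, for example a.1.1.2. The Python sketch below finds the deepest level two domains share, which indicates how closely related SCOP considers them; the class-letter table covers only the main classes a to g, and the example strings are illustrative.

```python
SCOP_CLASSES = {
    "a": "all alpha", "b": "all beta", "c": "alpha/beta", "d": "alpha+beta",
    "e": "multi-domain", "f": "membrane and cell surface", "g": "small proteins",
}
LEVELS = ("class", "fold", "superfamily", "family")


def parse_sccs(sccs: str) -> dict:
    """Split e.g. 'a.1.1.2' into named levels."""
    parts = sccs.split(".")
    if len(parts) != 4:
        raise ValueError(f"expected class.fold.superfamily.family, got {sccs!r}")
    return dict(zip(LEVELS, parts))


def deepest_shared_level(sccs1: str, sccs2: str):
    """Deepest level two domains have in common; levels must agree from the top down."""
    a, b = parse_sccs(sccs1), parse_sccs(sccs2)
    shared = None
    for level in LEVELS:
        if a[level] != b[level]:
            break  # e.g. fold "1" in class a is unrelated to fold "1" in class b
        shared = level
    return shared


print(SCOP_CLASSES[parse_sccs("a.1.1.2")["class"]])
print(deepest_shared_level("a.1.1.2", "a.1.1.1"))
```

Sharing a superfamily but not a family corresponds to the "at least a distant common ancestor" case in level 3 above.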
47. CATH
• CATH (Class, Architecture, Topology, and Homologous superfamily) is a semi-automatic, hierarchical classification of protein domains.
• CATH shares many broad features with its principal counterpart, SCOP.
• The four main levels of the CATH hierarchy are as follows:
1. Class: the overall secondary-structure content of the domain, i.e., mainly α, mainly β, mixed α–β, or few secondary structures.
2. Architecture: the overall arrangement of secondary-structure elements in 3D space, regardless of how they are connected.
3. Topology: domains that share both the arrangement and the connectivity of their secondary-structure elements, with high structural similarity but no evidence of homology. Equivalent to a fold in SCOP.
4. Homologous superfamily: indicative of a demonstrable evolutionary relationship. Equivalent to the superfamily level of SCOP.
http://www.cathdb.info/
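CATH node identifiers follow these four levels as dot-separated numbers such as 3.40.50.300, where each prefix names the enclosing node. A Python sketch that expands an identifier into its lineage; the class table lists CATH's four main classes, and the example identifier is used only for illustration.

```python
CATH_CLASSES = {
    "1": "Mainly Alpha", "2": "Mainly Beta",
    "3": "Alpha Beta", "4": "Few Secondary Structures",
}
LEVELS = ("Class", "Architecture", "Topology", "Homologous superfamily")


def cath_lineage(code: str) -> list[tuple[str, str]]:
    """Expand e.g. '3.40.50.300' into (level, node id) pairs from Class downwards."""
    parts = code.split(".")
    if not 1 <= len(parts) <= len(LEVELS) or not all(p.isdigit() for p in parts):
        raise ValueError(f"not a CATH node identifier: {code!r}")
    # Each level's node id is the prefix of the code up to that level
    return [(level, ".".join(parts[: i + 1])) for i, level in enumerate(LEVELS[: len(parts)])]


for level, node in cath_lineage("3.40.50.300"):
    print(level, node)
print(CATH_CLASSES["3.40.50.300".split(".")[0]])
```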
52. NDB
Nucleic Acid Database (NDB) is a repository of 3D nucleic acid structures and their complexes
Structures available in the NDB include RNA and DNA oligonucleotides with two or more bases, either alone or complexed with proteins or small-molecule ligands
NDB contains both primary and derived information about the structures
• Primary information includes X-ray crystallography or NMR coordinate data
• Derived information includes valence geometry, torsion angles and intermolecular contact data
NDB offers a variety of online and offline tools for analyzing nucleic acid structures. The featured tools include:
• RNA 3D Motif Atlas, a representative collection of RNA 3D internal and hairpin loop motifs
• Non-redundant Lists of RNA-containing 3D structures
• RNA Base Triple Atlas, a collection of motifs consisting of two RNA basepairs
• WebFR3D, a webserver for symbolic and geometric searching of RNA 3D structures
• R3D Align, an application for detailed nucleotide-to-nucleotide alignments of RNA 3D structures
http://ndbserver.rutgers.edu
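Torsion angles, one kind of derived information listed above, are computed from four consecutive atom positions. A Python sketch of the standard dihedral-angle formula with the IUPAC sign convention (degrees, between -180 and 180); for nucleic acids the same function would give the backbone angles α to ζ or the glycosidic angle χ when given the appropriate four atoms. The coordinates in the example are synthetic.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def torsion(p0, p1, p2, p3) -> float:
    """Dihedral angle p0-p1-p2-p3 in degrees (IUPAC convention: clockwise is positive)."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)  # normals of the two planes
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


# Synthetic coordinates: cis (0 degrees) and trans (180 degrees) arrangements
print(torsion((1, 0, 0), (0, 0, 0), (0, 1, 0), (1, 1, 0)))
print(torsion((1, 0, 0), (0, 0, 0), (0, 1, 0), (-1, 1, 0)))
```

Using atan2 rather than an arccosine of the normals' dot product keeps the sign of the angle, which distinguishes, for example, gauche+ from gauche- conformations.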
55. PQS/PDBePISA/PISA
PISA (Proteins, Interfaces, Structures, and Assemblies), the successor to the PQS (Protein Quaternary Structure) database, was developed at EMBL-EBI
PISA is an interactive tool for the exploration of macromolecular interfaces
PISA presents results calculated with physico-chemical models for PDB entries and/or uploaded macromolecular structures
PISA provides probable quaternary structures (assemblies), their structural and chemical properties, and probable dissociation patterns
http://www.ebi.ac.uk/pdbe/pisa/
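PISA's assessments rest on physico-chemical models (buried surface area, solvation and interaction energies). As a much simpler first approximation, interface residues are often taken to be those with any atom within a distance cutoff of the partner chain. The Python sketch below does only that; the 4.0 Å default cutoff and the atom dictionaries (chain, res_seq, xyz) are assumptions for illustration, not PISA's method.

```python
import math
from itertools import product


def interface_residues(atoms: list[dict], chain_a: str, chain_b: str,
                       cutoff: float = 4.0) -> tuple[set, set]:
    """Residue numbers in each chain with at least one atom within `cutoff` Å of the other chain."""
    atoms_a = [at for at in atoms if at["chain"] == chain_a]
    atoms_b = [at for at in atoms if at["chain"] == chain_b]
    res_a, res_b = set(), set()
    for x, y in product(atoms_a, atoms_b):  # brute force: fine for small examples
        if math.dist(x["xyz"], y["xyz"]) <= cutoff:
            res_a.add(x["res_seq"])
            res_b.add(y["res_seq"])
    return res_a, res_b


# Hypothetical four-atom, two-chain example
atoms = [
    {"chain": "A", "res_seq": 10, "xyz": (0.0, 0.0, 0.0)},
    {"chain": "A", "res_seq": 11, "xyz": (10.0, 0.0, 0.0)},
    {"chain": "B", "res_seq": 5, "xyz": (0.0, 3.5, 0.0)},
    {"chain": "B", "res_seq": 6, "xyz": (20.0, 0.0, 0.0)},
]
print(interface_residues(atoms, "A", "B"))
```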
60. SYSTERS
SYSTERS (SYSTEmatic Re-Searching) is a collection of graph-based algorithms to hierarchically partition a large set of protein sequences into homologous families and super-families
SYSTERS is based on an all-against-all database search (using Smith-Waterman comparisons on a GeneMatcher machine)
The resulting set of protein families contains four different types of clusters, based on the connectivity within their family distance graph, in order of decreasing reliability:
Perfect Clusters (P): all sequences are connected to all other sequences in the cluster
Single Sequence Cluster (S): a special case of perfect cluster
Nested Clusters (N): at least one sequence is connected to all other sequences in the cluster
Overlapping Clusters (O): no sequence is connected to all other sequences in the cluster
http://systers.molgen.mpg.de/ [DISCONTINUED]
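The four cluster types above can be decided directly from the connectivity of a cluster's graph. A Python sketch, assuming the graph is given as a list of undirected edges between sequence IDs (in SYSTERS itself these links come from the all-against-all search results); the IDs in the example are hypothetical.

```python
def classify_cluster(nodes, edges) -> str:
    """Label a cluster S, P, N or O from its connectivity, following the definitions above."""
    nodes = set(nodes)
    if len(nodes) == 1:
        return "S"  # single sequence cluster, a special case of a perfect cluster
    neighbours = {n: set() for n in nodes}
    for u, v in edges:  # edges are treated as undirected (an assumption)
        if u in nodes and v in nodes and u != v:
            neighbours[u].add(v)
            neighbours[v].add(u)
    connected_to_all = [n for n in nodes if neighbours[n] == nodes - {n}]
    if len(connected_to_all) == len(nodes):
        return "P"  # every sequence is connected to every other one
    if connected_to_all:
        return "N"  # at least one sequence is connected to all others
    return "O"      # no sequence is connected to all others


print(classify_cluster(["a", "b", "c"], [("a", "b"), ("a", "c")]))
```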
64. Motif
• Motif is a search service provided by GenomeNet for searching a protein query sequence against motif libraries
• Supports several motif databases, such as PROSITE, BLOCKS, ProDom, Pfam, and PRINTS
• Allows you to search protein sequence libraries with your patterns
• Each element of a pattern must be separated by - (minus sign)
• x represents any amino acids
• [DE] means either D or E
• {FWY} means any amino acids except for F, W and Y
• A(2,3) means that A appears 2 to 3 times consecutively
• The pattern string must be terminated with . (period)
For example, C-x-{C}-[DN]-x(2)-C-x(5)-C-C.
• Can also generate a profile from a set of multiple aligned sequences using PFMake or HMMBuild
http://www.genome.jp/tools/motif/
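The pattern rules above map almost one-to-one onto regular expressions. A minimal Python sketch of that translation, covering only the listed elements (x, [..], {..}, repeat counts, the terminating period) plus the PROSITE anchors < and >; it illustrates the syntax and is not GenomeNet's implementation. Note that re.finditer reports non-overlapping matches only.

```python
import re

# One pattern element: optional '<', a residue, 'x', [..] or {..}, optional (n) or (n,m), optional '>'
ELEMENT = re.compile(r"(<?)(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?(>?)")


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style pattern such as 'C-x-{C}-[DN]-x(2)-C.' into a Python regex."""
    regex = []
    for element in pattern.strip().rstrip(".").split("-"):
        m = ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        n_term, residue, low, high, c_term = m.groups()
        if residue == "x":
            token = "."                          # any amino acid
        elif residue.startswith("{"):
            token = "[^" + residue[1:-1] + "]"   # any amino acid except those listed
        else:
            token = residue                      # single residue or [..] class: same in regex
        if low is not None:
            token += "{" + low + ("," + high if high else "") + "}"
        regex.append(("^" if n_term else "") + token + ("$" if c_term else ""))
    return "".join(regex)


print(prosite_to_regex("C-x-{C}-[DN]-x(2)-C-x(5)-C-C."))
glyco = re.compile(prosite_to_regex("N-{P}-[ST]-{P}."))  # N-glycosylation site pattern
print(glyco.search("AANGSAA"))
```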