3. What we expect from a database..!!
• Sequence, functional, structural information,
related bibliography
• Well Structured and Indexed
• Well cross-referenced (with other databases)
• Periodically updated
• Tools for analysis and visualization
8. Standard contents of a sequence database
• Sequences
• Accession number
• References
• Taxonomic data
• Annotation/curation
• Keywords
• Cross-references
• Documentation
9. NCBI
• Very comprehensive biological database
• GENBANK: The nucleotide sequence database
• Provides 42 different resource
• Provides a simple and easy to use web
interface
http://www.ncbi.nlm.nih.gov/
10. • Sequence submission: done using Bankit or
Sequin
• Search Engine for data retrieval: Entrez
• Retrieves information across all the resources
under NCBI
Example: PubMed, taxonomy, SNP, PubChem
etc.
11. Tools for analysis
• BLAST
• Primer-BLAST
• B-Link
• ORF finder
• Genome workbench
16. Pfam
• Proteins contain conserved regions
• Based on the conserved regions, proteins are
classified into families
• Provides links to external databases like PDB,
SCOP, CATH etc.
17. Pfam: Features
• Sequence search
• View Pfam family
• View a clan
• View a sequence
• View a structure
• Keyword search
18. Gene Indices
• Project aimed at indexing genes and their
variants in the various genome sequences.
• Creating a catalogue of genes in a wide range
of organisms
• Reduce redundancy
21. • PDB – Protein Data Bank
• CATH
• SCOP – Structural Classification of Proteins
22. wwPDB
• Contains information about experimentally
determined structures of proteins, nucleic
acids, and complex assemblies
• RCSB-PDB, PDBe, PDBj, BMRB – repositories of
protein structure data
• Files in PDB, mmCIF, PDBML/XML formats
23. • Advanced search – provides comprehensive
information about a protein.
• Sequence info, domain info, sequence
similarity, literature, apart from the details of
the structure.
• Cross referenced to SCOP and CATH
24. CATH
• Classification of proteins based on domain
structures
• Each protein chopped into individual domains
and assigned into homologous superfamilies.
• Hierarchial domain classification of PDB
entries.
25. CATH hierarchy
• Class – derived from secondary structure content is assigned
automatically
• Architecture – describes gross orientation of secondary
structures, independent of connectivity
•Topology – clusters structures according to their
topological connections and numbers of secondary
structures
•Homologous superfamily – this level groups
together protein domains which are thought to
share a common ancestor and can therefore be
described as homologous
26. SCOP
• Description of structural and evolutionary
relationships between all the proteins with
known structures
• Uses the PDB entries
• Search using keywords or PDB identifiers