Biological Databases
SMT. P.SANGEETHA
LECTURER IN BIOTECHNOLOGY
KVRGCW(A), KURNOOL
A biological database is a large, organized body of persistent data, usually
associated with computerized software designed to update, query, and retrieve
components of the data stored within the system.
The chief objective of the development of a database is to organize data in a set of
structured records to enable easy retrieval of information.
Examples: a few popular databases are GenBank from NCBI (National Center for
Biotechnology Information), SWISS-PROT from the Swiss Institute
of Bioinformatics and PIR from the Protein Information Resource.
Importance of Databases
1. Databases act as a storehouse of information.
2. Databases store and organize data so that information can be retrieved easily
using a variety of search criteria.
3. They facilitate the discovery of new biological insights from raw data.
4. Secondary databases have become the molecular biologist's reference
library over the past decade or so, providing a wealth of information on just
about any gene or gene product that has been investigated by the research
community.
5. They allow many users to access the same data entries at the same time.
6. They allow the data to be indexed.
7. They help to reduce redundancy in the data.
Types of Biological Databases
1. Based on content of biological data
2. Based on the nature of data.
1. Based on content of biological data
1. Primary databases
2. Secondary databases
1. Primary databases
• Primary databases are also called archival databases.
• They are populated with experimentally derived data such as nucleotide
sequences, protein sequences or macromolecular structures.
• Experimental results are submitted directly into the database by researchers, and
the data are essentially archival in nature.
• Once given a database accession number, the data in primary databases are never
changed: they form part of the scientific record.
Examples
GenBank and DDBJ (nucleotide sequence)
Protein Data Bank (PDB; coordinates of three-dimensional macromolecular
structures)
2. Secondary databases
Secondary databases comprise data derived from the results of analysing primary
data.
Secondary databases often draw upon information from numerous sources,
including other databases (primary and secondary), controlled vocabularies and
the scientific literature.
They are highly curated, often using a complex combination of computational
algorithms and manual analysis and interpretation to derive new knowledge from
the public record of science.
Examples
InterPro (protein families, motifs and domains)
UniProt Knowledgebase (sequence and functional information on proteins)
Ensembl (variation, function, regulation and more layered onto whole
genome sequences)
2. Based on the nature of data
1. Structural database
2. Sequence database
i. Protein sequence databases
ii. Nucleic Acid sequence databases
1. Structural databases
Structural databases contain three-dimensional structural information on biological
macromolecules, derived mainly from the analysis of X-ray diffraction data (and, for some
entries, NMR data).
Examples: PDB, CATH and SCOP
PDB (Protein Data Bank)
www.rcsb.org/pdb/
• The PDB was established in 1971 at Brookhaven National Laboratory on Long Island, New
York, USA.
• In 1999, its management moved to the Research Collaboratory for
Structural Bioinformatics (RCSB), a joint organisation including Rutgers University
and the San Diego Supercomputer Center.
PDB entries contain the atomic coordinates and some structural parameters
associated with the atoms or computed from the structures (e.g. secondary structure).
• PDB entries contain some annotation, but it is not as comprehensive
as that in SWISS-PROT.
• There are no legal restrictions on the use of the data in the PDB.
• The Protein Data Bank is an archive of experimentally determined three-dimensional
(3D) structures of biological macromolecules, serving a global
community of researchers, educators and students.
• The archive contains atomic coordinates, bibliographic citations, primary
and secondary structure information, as well as crystallographic structure
factors and NMR (Nuclear Magnetic Resonance) experimental data.
• The PDB is the main primary database for 3D structures of biological
macromolecules determined by X-ray crystallography and NMR.
Structural biologists usually deposit their structures in the PDB upon
publication, and some scientific journals require deposition before accepting a
paper.
• It also accepts the experimental data used to determine the structures (X-ray
crystallography and NMR). Theoretical and homology models were accepted in the past
but have not been accepted since 2006.
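Because PDB entries store atomic coordinates as fixed-width text records, they are easy to read programmatically. The sketch below is a minimal Python illustration. It assumes the legacy fixed-column PDB format for ATOM/HETATM records, in which the x, y and z coordinates occupy columns 31–54. The `Atom` class and `parse_atom_records` function are hypothetical names. Real work would normally use an established parser and the newer PDBx/mmCIF format.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Atom:
    serial: int
    name: str
    res_name: str
    chain_id: str
    res_seq: int
    x: float
    y: float
    z: float


def parse_atom_records(pdb_text: str) -> List[Atom]:
    """Read ATOM/HETATM lines from legacy PDB-format text.

    The format defines 1-based column ranges; the slices below are the
    equivalent 0-based Python ranges.
    """
    atoms = []
    for line in pdb_text.splitlines():
        if line[0:6].strip() not in ("ATOM", "HETATM"):
            continue  # skip HEADER, HELIX, TER, END and other record types
        atoms.append(Atom(
            serial=int(line[6:11]),        # columns 7-11
            name=line[12:16].strip(),      # columns 13-16
            res_name=line[17:20].strip(),  # columns 18-20
            chain_id=line[21].strip(),     # column 22
            res_seq=int(line[22:26]),      # columns 23-26
            x=float(line[30:38]),          # columns 31-38
            y=float(line[38:46]),          # columns 39-46
            z=float(line[46:54]),          # columns 47-54
        ))
    return atoms


example = "ATOM      1  N   MET A   1      38.198  19.582  28.365  1.00 26.60           N"
atoms = parse_atom_records(example)
print(atoms[0].res_name, atoms[0].x)  # prints: MET 38.198
```

Slicing by column rather than splitting on whitespace matters here. In the PDB format, adjacent fields such as large coordinates can run together without a separating space.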
2. Sequence databases
A sequence database is a type of biological database composed of a large
collection of nucleic acid, protein or other polymer sequences stored on
computers. These include:
I. Nucleotide databases
II. Protein databases
NCBI (National Center for Biotechnology Information)
www.ncbi.nlm.nih.gov
• NCBI is a publicly available resource on the web. It was established in November
1988 at the National Library of Medicine (NLM) in the United States.
• The NLM was chosen because it had experience in creating and
maintaining biomedical databases and, as part of the National Institutes of
Health (NIH), it could establish a research program in computational
molecular biology.
• The mission of NCBI is to develop new information technologies to aid in the understanding of
the fundamental molecular and genetic processes that control health and disease.
• More specifically, NCBI has been charged with creating automated systems for storing
and analysing knowledge about molecular biology, biochemistry and genetics; facilitating
the use of such databases and software by the research and medical community;
coordinating efforts to gather biotechnology information both nationally and internationally;
and performing research into advanced methods of computer-based information processing
for analysing the structure and function of biologically important molecules.
NCBI maintains several databases and resources. They are as
follows:
• Literature databases
• Entrez databases
• Nucleotide databases
• Genome-specific resources
• Tools for data mining
• Tools for sequence analysis
• Tools for 3D structure display and similarity searching
• Maps
• Resource statistics
• Collaborative Cancer Research
• FTP (File Transfer Protocol)
I. Nucleotide databases
The Nucleotide database is a collection of sequences from several sources, including
GenBank, RefSeq, etc.
1. Primary databases of nucleotide sequences
These are the chief databases that store raw nucleic acid sequences and make them available to
the public and researchers. They are referred to as primary nucleotide sequence databases
because they are the repository of all publicly submitted nucleic acid sequences.
Examples: GenBank, DDBJ, EMBL
1. EMBL (European Molecular Biology Laboratory)
www.ebi.ac.uk
• EMBL is the nucleotide sequence database of the EBI (European Bioinformatics
Institute); it is now part of the European Nucleotide Archive (ENA).
• The EBI manages databases of biological data, including nucleic acid sequences,
protein sequences and macromolecular structures.
• The EBI is a pioneer of novel bioinformatics research and development.
• The EBI is a centre for research and services in bioinformatics.
• The mission of the EBI is to ensure that the growing body of information from
molecular biology and genome research is placed in the public domain and is
freely accessible.
• The database is produced in collaboration with DDBJ and GenBank.
• Information can be retrieved from EMBL using SRS (Sequence Retrieval
System); this links the principal DNA and protein sequence databases with
motif, structure, mapping and other specialist databases.
• SRS is one of the most powerful data browsing and retrieval tools available. SRS
provides rapid, user-friendly access to the large volumes of diverse and
heterogeneous life-science data stored in more than 400 internal and public-domain
databases.
• It can be used to browse the various biological sequence and literature databases.
• The EBI provides access to many tools for browsing and retrieving biological
sequence and literature data.
2. DDBJ (DNA Data Bank of Japan)
www.ddbj.nig.ac.jp
• DDBJ began in 1986 as a collaboration with EMBL and GenBank. The database
is produced, maintained and distributed at the National Institute of Genetics.
• Sequences may be submitted to it from all corners of the world by means of a web-based
data submission tool.
• The web is also used to provide standard search tools such as FASTA and BLAST.
• DDBJ is the only DNA data bank in Japan that is officially certified to collect DNA
sequences from researchers and to issue internationally recognised accession numbers to
data submitters.
• DDBJ is one of the three international DNA databases, together with the EBI (responsible
for the EMBL database) and NCBI (responsible for the GenBank database).
• Consequently, DDBJ collaborates with the other two data banks by exchanging
data and information over the Internet and by holding two meetings: the International DNA
DataBank Advisory Meeting (IAM) and the International DNA DataBanks Collaborative
Meeting (ICM).
3. GenBank
• GenBank, the DNA database from NCBI, incorporates sequences from publicly
available sources.
• Information can be retrieved from GenBank using the Entrez integrated
retrieval system; this combines data from the principal DNA and protein sequence
databases with information from genome maps and protein structures.
• Additional information on sequences can be accessed via MEDLINE,
which provides abstracts of the original published articles.
• GenBank may be searched with a user's query sequence through
NCBI's web interface to the BLAST suite of programs.
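Entrez retrieval can also be scripted. The sketch below assumes NCBI's E-utilities web service, specifically the `efetch.fcgi` endpoint with its `db`, `id`, `rettype` and `retmode` parameters. It only builds the request URL and does not contact the server. `ACCESSION_PLACEHOLDER` is a hypothetical stand-in for a real GenBank accession, and `efetch_url` is a hypothetical helper name.

```python
from urllib.parse import urlencode

EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def efetch_url(db: str, ids: list, rettype: str = "gb", retmode: str = "text") -> str:
    """Build an E-utilities efetch URL for one or more record identifiers."""
    params = {"db": db, "id": ",".join(ids), "rettype": rettype, "retmode": retmode}
    return EUTILS_BASE + "efetch.fcgi?" + urlencode(params)


# "ACCESSION_PLACEHOLDER" is hypothetical; substitute a real GenBank accession.
url = efetch_url("nucleotide", ["ACCESSION_PLACEHOLDER"])
print(url)
# Downloading the GenBank flat file requires network access, e.g.:
# from urllib.request import urlopen
# text = urlopen(url).read().decode()
```

Keeping URL construction separate from the network call makes the request easy to inspect and test. NCBI's usage guidelines (request rate, contact details) should be checked before running such scripts at scale.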
A GenBank release includes the sequence files, indices created on various database
fields, and information derived from the database (e.g. GenPept, a database of
translated coding sequences in FASTA format). The most commonly used is the
sequence entry file, which contains the sequence itself and descriptive
information relating to it.
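GenPept and many other derived files are distributed in FASTA format. In this format each record is a header line beginning with `>`, followed by one or more sequence lines. A minimal Python reader might look like the one below. It assumes well-formed input and uses the whole header text (without the `>`) as the record key. `parse_fasta` is a hypothetical helper, not part of any NCBI tool, and the sample records are invented.

```python
def parse_fasta(text: str) -> dict:
    """Return {header: sequence} for every record in FASTA-formatted text."""
    records = {}
    header = None
    chunks = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                records[header] = "".join(chunks)
            header, chunks = line[1:].strip(), []
        elif header is not None:
            chunks.append(line)  # a sequence may wrap over several lines
    if header is not None:
        records[header] = "".join(chunks)
    return records


sample = """>seq1 hypothetical protein
MKTAYIAKQR
QISFVKSHFS
>seq2 another hypothetical protein
MSEQ
"""
print(parse_fasta(sample))
```

This sketch keeps only the last record when two headers are identical. A stricter reader might raise an error on duplicate headers instead.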
• A GenBank entry consists of keywords, relevant associated subkeywords
and an optional Feature Table.
• The entry continues with a BASE COUNT record, which details the
frequency of occurrence of the different base types in the sequence; the sequence
itself follows, and the end of the entry is indicated by a // terminator.
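The entry structure described above can be illustrated with a small Python sketch. It splits a flat file into entries at the `//` terminator, then recomputes the base counts from the sequence block. It assumes that the sequence lines follow an `ORIGIN` keyword line, as in GenBank flat files. The sample entry is invented, and `split_entries`, `origin_sequence` and `base_count` are hypothetical helper names.

```python
from collections import Counter


def split_entries(flatfile: str) -> list:
    """Split a multi-entry flat file at each '//' terminator line."""
    entries, current = [], []
    for line in flatfile.splitlines():
        if line.strip() == "//":
            entries.append("\n".join(current))
            current = []
        else:
            current.append(line)
    return entries


def origin_sequence(entry: str) -> str:
    """Collect the sequence lines that follow the ORIGIN keyword."""
    seq, in_origin = [], False
    for line in entry.splitlines():
        if line.startswith("ORIGIN"):
            in_origin = True
        elif in_origin:
            # Each line starts with a position number; keep only the bases.
            seq.append("".join(ch for ch in line if ch.isalpha()))
    return "".join(seq).lower()


def base_count(seq: str) -> dict:
    """Frequency of each base type, like the BASE COUNT record."""
    counts = Counter(seq)
    return {base: counts.get(base, 0) for base in "acgt"}


sample = """LOCUS       HYPO0001                  12 bp    DNA     linear   SYN
DEFINITION  Hypothetical example entry.
ORIGIN
        1 atgcatgcaa tt
//
"""
entry = split_entries(sample)[0]
print(base_count(origin_sequence(entry)))  # prints: {'a': 4, 'c': 2, 'g': 2, 't': 4}
```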
2. Secondary databases of nucleotide sequences
Many secondary databases are simply subcollections of sequences culled from one
or other of the primary databases, such as GenBank or EMBL.
1. Omniome database
2. FlyBase
3. ACeDB
1. Omniome database:
• The Omniome database is a comprehensive microbial resource maintained by TIGR (The Institute for
Genomic Research).
• It holds not only the sequence and annotation of each of the completed genomes,
but also associated information about the organisms (such as taxon and Gram-stain
pattern), the structure and composition of their DNA molecules and many
other attributes of the protein sequences predicted from the DNA sequences.
2. FlyBase:
FlyBase is a database of Drosophila genes and genomes. A consortium sequenced the entire
genome of the fruit fly D. melanogaster to a high degree of completeness and quality.
3. ACeDB:
It is a repository not only of the sequence but also of the genetic map and
phenotypic information about the nematode worm C. elegans.
II. PROTEIN DATABASES:
A protein database is one or more datasets about proteins' amino acid
sequences, conformation, structure and features such as active sites.
1. Primary databases of proteins:
The primary databases hold protein sequences that are either determined
experimentally or inferred from the conceptual translation of nucleotide sequences.
1. PIR (Protein Information Resource)
www.pir.georgetown.edu
• The Protein Sequence Database was developed at the National Biomedical
Research Foundation (NBRF) in the US.
• It is maintained in collaboration with the Martinsried Institute for Protein Sequences
(MIPS) and the Japan International Protein Information Database (JIPID).
• PIR was developed by Margaret Dayhoff as a collection of sequences for
investigating evolutionary relationships among proteins.
The PIR database is split into four distinct sections, PIR1 to PIR4, which
differ in the quality of data and the level of annotation provided.
PIR1 – contains fully classified and annotated entries
PIR2 – includes preliminary entries, which have not been thoroughly
reviewed and may contain redundancy
PIR3 – contains unverified entries, which have not been reviewed
PIR4 – entries fall into four categories:
1. Conceptual translations of artefactual sequences.
2. Conceptual translations of sequences that are not transcribed or translated.
3. Protein sequences or conceptual translations that are genetically engineered.
4. Sequences that are not genetically encoded and not produced on ribosomes.
One can search for entries or perform sequence similarity searches at the PIR site. The database
can be downloaded as a set of files.
2. SWISS-PROT
www.expasy.ch/sprot/
• SWISS-PROT, a protein sequence database established in 1986, was
produced collaboratively by the Department of Medical Biochemistry at the
University of Geneva and the EMBL; after 1994, the collaboration moved to
EMBL's UK outstation, the EBI.
• In 1998, the collaboration moved to the Swiss Institute of
Bioinformatics (SIB). Hence, the database is now maintained collaboratively
by the SIB and the EBI/EMBL.
• SWISS-PROT is a protein sequence database that strives to provide a high
level of annotation (such as the description of the function of a protein, its
domain structure, post-translational modifications and variants), a minimal
level of redundancy and a high level of integration with other databases.
• In 1996, a computer-annotated supplement to SWISS-PROT was created,
termed TrEMBL.
In SWISS-PROT, as in many sequence databases, two classes of data can be
distinguished:
1. Core data: Core data consists of:
1. Sequence data
2. Citation information (bibliographic references)
3. Taxonomic data (description of the biological source of the protein)
2. Annotation:
1. Function of the protein
2. Post-translational modifications
3. Domains and sites
4. Secondary structure
5. Quaternary structure
6. Similarities to other proteins
7. Diseases associated with deficiencies in the protein
8. Sequence conflicts and variants
Sequence Entry File
• Each line is flagged with a two-letter code, which helps to present the
information in a structured way.
• Entries begin with the identification (ID) line and end with a // terminator.
• ID codes can sometimes change, so an additional identifier, an accession
number (AC), is also provided, which ought to remain static between
database releases.
• Next, the DT lines give the date the sequence was entered into the database
and details of when it was last modified.
• The following lines give the gene name (GN), the organism species (OS)
and the organism classification (OC) within the biological kingdoms.
• CC (comment) lines describe the function of the protein, post-translational
modifications, similarities and tissue specificity.
• Database cross-reference (DR) lines follow the comment field. These provide links
to other biomolecular databases.
• The DR lines are followed by the keyword (KW) lines and then a number of FT
lines.
• FT (Feature Table) lines highlight regions of interest in the
sequence, including secondary structure, ligand-binding sites and post-translational
modifications.
• The final section of the entry contains the sequence (SQ) itself. The entry
ends with a // terminator.
SWISS-PROT has become the most widely used protein sequence database in the world.
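Every line of a SWISS-PROT entry starts with its two-letter code, so an entry can be grouped by line type with very little code. The Python sketch below assumes the usual flat-file layout:

- the line code sits in columns 1–2 and the data starts at column 6;
- the sequence lines after the SQ line begin with blanks;
- the entry ends with a `//` terminator.

The sample entry, its ID and its accession are placeholders, and `parse_swissprot_entry` is a hypothetical name. Real projects would typically rely on an established parser.

```python
from collections import defaultdict


def parse_swissprot_entry(text: str) -> dict:
    """Group one flat-file entry by two-letter line code and join the sequence."""
    fields = defaultdict(list)
    sequence_parts = []
    in_sequence = False
    for line in text.splitlines():
        if line.startswith("//"):
            break  # end-of-entry terminator
        code = line[:2]
        if code == "SQ":
            in_sequence = True
            fields["SQ"].append(line[5:].strip())
        elif in_sequence and code == "  ":
            # Sequence lines are indented and split into blocks of residues.
            sequence_parts.append(line.replace(" ", ""))
        else:
            fields[code].append(line[5:].strip())
    entry = dict(fields)
    entry["sequence"] = "".join(sequence_parts)
    return entry


sample = "\n".join([
    "ID   HYPO_HUMAN              Reviewed;          20 AA.",
    "AC   XXXXXX;",
    "OS   Homo sapiens (Human).",
    "KW   Hypothetical protein.",
    "SQ   SEQUENCE   20 AA;",
    "     MKTAYIAKQR QISFVKSHFS",
    "//",
])
entry = parse_swissprot_entry(sample)
print(entry["AC"], entry["sequence"])
```

Collecting each code into a list keeps repeated line types (several DR, CC or FT lines) together without needing to know every code in advance.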
3. PubMed
PubMed is a free resource supporting the search and retrieval of biomedical and
life sciences literature, with the aim of improving health, both globally and
personally.
1. The PubMed database contains more than 33 million citations and abstracts of
biomedical literature.
2. It does not include full-text journal articles; however, links to the full text are
often present when available from other sources, such as the publisher's website
or PubMed Central (PMC).
3. It has been available to the public online since 1996.
4. PubMed was developed and is maintained by the National Center for
Biotechnology Information (NCBI) at the U.S. National Library of Medicine
(NLM), located at the National Institutes of Health (NIH).
5. Citations in PubMed primarily stem from the biomedicine and health fields and
related disciplines such as life sciences, behavioural sciences, chemical sciences and
bioengineering.
PubMed facilitates searching across several NLM literature resources:
1. MEDLINE
2. PubMed Central (PMC)
3. Bookshelf
1. MEDLINE
MEDLINE is the largest component of PubMed and consists primarily of
citations from journals selected for MEDLINE; these articles are indexed with MeSH
(Medical Subject Headings) and curated with funding, genetic, chemical and
other metadata.
2. PubMed Central (PMC)
Citations for PubMed Central (PMC) articles make up the second largest
component of PubMed.
PMC is a full-text archive that includes articles from journals reviewed and
selected by NLM for archiving (current and historical), as well as individual
articles collected for archiving in compliance with funder policies.
3. Bookshelf
The final component of PubMed is citations for books and some individual
chapters available on Bookshelf.
Bookshelf is a full-text archive of books, reports, databases and other
documents related to biomedical, health and life sciences.
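PubMed searches can also be run programmatically. The function below is a sketch that builds a request URL for NCBI's E-utilities `esearch.fcgi` endpoint, using the parameters `db`, `term` and `retmax`. This endpoint returns the PubMed IDs that match a query. The function does not contact the server. The query string is only an example, and `pubmed_search_url` is a hypothetical helper name.

```python
from urllib.parse import urlencode

EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def pubmed_search_url(term: str, retmax: int = 20) -> str:
    """Build an esearch URL that asks PubMed for up to `retmax` matching IDs."""
    params = {"db": "pubmed", "term": term, "retmax": retmax}
    return EUTILS_BASE + "esearch.fcgi?" + urlencode(params)


url = pubmed_search_url("biological databases[Title]", retmax=5)
print(url)
# Fetching the result requires network access, e.g.:
# from urllib.request import urlopen
# xml_text = urlopen(url).read().decode()
```

`urlencode` takes care of escaping spaces and the square brackets used by PubMed field tags, so the search term can be written as it would be typed into the PubMed search box.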
2. Secondary databases of proteins
The secondary databases are so termed because they contain the results of analysis of the sequences held in
primary databases.
1. PROSITE:
• Some databases collect together patterns found in protein sequences rather than the complete
sequences.
• PROSITE is one such pattern database.
• The protein motifs and patterns are encoded as regular expressions.
The information corresponding to each entry in PROSITE is of two forms: the patterns and the related
descriptive text.
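PROSITE patterns are essentially regular expressions written in their own notation, so they can be translated into an ordinary regex. The Python sketch below handles the common elements:

- `x` for any residue;
- `[...]` for allowed residues and `{...}` for forbidden residues;
- repeat counts such as `x(2,4)`;
- `<` and `>` terminal anchors;
- `-` separators and the closing `.`.

It is a simplified illustration rather than a complete implementation of the PROSITE syntax, and `prosite_to_regex` is a hypothetical name. The example is the N-glycosylation site pattern `N-{P}-[ST]-{P}`.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Convert a (simple) PROSITE pattern into a Python regular expression."""
    pieces = []
    for element in pattern.rstrip(".").split("-"):
        prefix = "^" if element.startswith("<") else ""  # N-terminal anchor
        suffix = "$" if element.endswith(">") else ""    # C-terminal anchor
        element = element.strip("<>")
        # Separate the residue specification from an optional repeat count.
        match = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        residue, low, high = match.groups()
        if residue == "x":
            regex = "."                          # any amino acid
        elif residue.startswith("{"):
            regex = "[^" + residue[1:-1] + "]"   # any residue except those listed
        else:
            regex = residue                      # single residue or [ABC] class
        if low:
            regex += "{" + low + ("," + high if high else "") + "}"
        pieces.append(prefix + regex + suffix)
    return "".join(pieces)


glyco = prosite_to_regex("N-{P}-[ST]-{P}.")
print(glyco)                                # prints: N[^P][ST][^P]
print(re.search(glyco, "AANGSAA").start())  # prints: 2
```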
2. PRINTS:
In the PRINTS database, protein sequence patterns are stored as “fingerprints”. The information
includes:
1. The first section contains cross-links to other databases that have more information about the
characterised family.
2. The second section provides a table showing how many of the motifs that make up the fingerprint occur
in how many of the sequences of that family.
3. The last section of the entry contains the actual fingerprints, which are stored as multiple aligned sets of
sequences; the alignments are made without gaps.
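The second section of a PRINTS entry is a table of how many motifs occur in how many sequences. A toy Python sketch can mimic that table. It makes a strong simplifying assumption: a motif counts as present if any of its ungapped aligned segments appears verbatim in a sequence. Real fingerprint searches instead score each sequence's similarity to every motif. The motifs and sequences are invented, and the function names are hypothetical.

```python
from collections import Counter


def motifs_found(sequence: str, fingerprint: list) -> int:
    """Count how many motifs of the fingerprint occur in the sequence.

    Each motif is a list of ungapped, aligned segments; a motif is treated
    as present if any segment occurs exactly (a simplification).
    """
    return sum(any(segment in sequence for segment in motif) for motif in fingerprint)


def fingerprint_table(sequences: list, fingerprint: list) -> Counter:
    """Map 'number of motifs matched' to 'number of sequences'."""
    return Counter(motifs_found(seq, fingerprint) for seq in sequences)


# Hypothetical three-motif fingerprint and family members.
fingerprint = [["GDSG", "GESG"], ["HPLD"], ["WVLT"]]
family = ["AAGDSGAAHPLDWVLT", "GESGHPLD", "KKKK"]
for k, n in sorted(fingerprint_table(family, fingerprint).items(), reverse=True):
    print(f"{n} sequence(s) match {k} of {len(fingerprint)} motifs")
```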
3. Pfam:
Pfam contains profiles built using Hidden Markov Models (HMMs).
An HMM models the pattern as a series of match, insert and delete states (substitutions
are handled by the emission scores of the match states), with scores assigned for moving
from one state to another.
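The toy profile HMM below makes the idea of state-to-state scores concrete. Between a begin state (B) and an end state (E) it has two match states (M1, M2), one insert state (I1) and one delete state (D1). The probabilities are invented. The function simply adds the log-probabilities of the transitions and emissions along one given path. This is not how Pfam or HMMER score sequences: they use log-odds scores against a background model and combine scores over many possible paths.

```python
import math

# Toy transition probabilities between states (B = begin, E = end).
TRANSITIONS = {
    ("B", "M1"): 0.9, ("B", "D1"): 0.1,
    ("M1", "M2"): 0.8, ("M1", "I1"): 0.2,
    ("I1", "M2"): 1.0,
    ("D1", "M2"): 1.0,
    ("M2", "E"): 1.0,
}
# Toy emission probabilities; delete states emit nothing.
EMISSIONS = {
    "M1": {"C": 0.5, "S": 0.5},
    "I1": {"A": 0.25, "G": 0.25, "S": 0.25, "T": 0.25},
    "M2": {"H": 1.0},
}


def path_log_score(path: list, residues: str) -> float:
    """Log-probability of emitting `residues` along `path` (B and E implied)."""
    states = ["B"] + path + ["E"]
    remaining = iter(residues)
    score = 0.0
    for prev, cur in zip(states, states[1:]):
        score += math.log(TRANSITIONS[(prev, cur)])
        if cur[0] in "MI":  # match and insert states emit one residue
            score += math.log(EMISSIONS[cur][next(remaining)])
    return score


print(math.exp(path_log_score(["M1", "M2"], "CH")))        # about 0.36
print(math.exp(path_log_score(["M1", "I1", "M2"], "CAH")))  # about 0.0225
print(math.exp(path_log_score(["D1", "M2"], "H")))          # about 0.1
```

Working in log space turns the product of many small probabilities into a sum, which avoids numerical underflow on long sequences.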
4. TrEMBL:
• TrEMBL (Translated EMBL) was created in 1996 as a computer-annotated
supplement to SWISS-PROT.
• It contains translations of all the coding sequences (CDS) in EMBL that are not yet
integrated into SWISS-PROT.
• TrEMBL was designed to address the need for a well-structured, SWISS-PROT-like
resource that would allow very rapid access to sequence data from the genome
projects.
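TrEMBL entries come from conceptual translation of coding sequences. The Python sketch below illustrates that step under three assumptions: the standard genetic code (NCBI translation table 1), a CDS that is in frame from its first base, and translation stopping at the first stop codon. It ignores alternative genetic codes and alternative start codons, which real annotation pipelines handle. `translate_cds` is a hypothetical name.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate_cds(cds: str) -> str:
    """Translate an in-frame coding sequence, stopping at the first stop codon."""
    cds = cds.upper().replace("U", "T")  # accept RNA as well as DNA
    protein = []
    for i in range(0, len(cds) - 2, 3):
        aa = CODON_TABLE.get(cds[i:i + 3], "X")  # X for ambiguous codons
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


print(translate_cds("ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG"))  # prints: MAIVMGR
```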
THANK YOU
Call Girls In Nihal Vihar Delhi ❤️8860477959 Looking Escorts In 24/7 Delhi NCRCall Girls In Nihal Vihar Delhi ❤️8860477959 Looking Escorts In 24/7 Delhi NCR
Call Girls In Nihal Vihar Delhi ❤️8860477959 Looking Escorts In 24/7 Delhi NCR
 
Topic 9- General Principles of International Law.pptx
Topic 9- General Principles of International Law.pptxTopic 9- General Principles of International Law.pptx
Topic 9- General Principles of International Law.pptx
 
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝
 
Transposable elements in prokaryotes.ppt
Transposable elements in prokaryotes.pptTransposable elements in prokaryotes.ppt
Transposable elements in prokaryotes.ppt
 
Scheme-of-Work-Science-Stage-4 cambridge science.docx
Scheme-of-Work-Science-Stage-4 cambridge science.docxScheme-of-Work-Science-Stage-4 cambridge science.docx
Scheme-of-Work-Science-Stage-4 cambridge science.docx
 
Analytical Profile of Coleus Forskohlii | Forskolin .pptx
Analytical Profile of Coleus Forskohlii | Forskolin .pptxAnalytical Profile of Coleus Forskohlii | Forskolin .pptx
Analytical Profile of Coleus Forskohlii | Forskolin .pptx
 
Volatile Oils Pharmacognosy And Phytochemistry -I
Volatile Oils Pharmacognosy And Phytochemistry -IVolatile Oils Pharmacognosy And Phytochemistry -I
Volatile Oils Pharmacognosy And Phytochemistry -I
 
Grafana in space: Monitoring Japan's SLIM moon lander in real time
Grafana in space: Monitoring Japan's SLIM moon lander  in real timeGrafana in space: Monitoring Japan's SLIM moon lander  in real time
Grafana in space: Monitoring Japan's SLIM moon lander in real time
 
OECD bibliometric indicators: Selected highlights, April 2024
OECD bibliometric indicators: Selected highlights, April 2024OECD bibliometric indicators: Selected highlights, April 2024
OECD bibliometric indicators: Selected highlights, April 2024
 
Neurodevelopmental disorders according to the dsm 5 tr
Neurodevelopmental disorders according to the dsm 5 trNeurodevelopmental disorders according to the dsm 5 tr
Neurodevelopmental disorders according to the dsm 5 tr
 
Davis plaque method.pptx recombinant DNA technology
Davis plaque method.pptx recombinant DNA technologyDavis plaque method.pptx recombinant DNA technology
Davis plaque method.pptx recombinant DNA technology
 
Sulphur & Phosphrus Cycle PowerPoint Presentation (2) [Autosaved]-3-1.pptx
Sulphur & Phosphrus Cycle PowerPoint Presentation (2) [Autosaved]-3-1.pptxSulphur & Phosphrus Cycle PowerPoint Presentation (2) [Autosaved]-3-1.pptx
Sulphur & Phosphrus Cycle PowerPoint Presentation (2) [Autosaved]-3-1.pptx
 
FREE NURSING BUNDLE FOR NURSES.PDF by na
FREE NURSING BUNDLE FOR NURSES.PDF by naFREE NURSING BUNDLE FOR NURSES.PDF by na
FREE NURSING BUNDLE FOR NURSES.PDF by na
 
(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)
(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)
(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)
 
Bentham & Hooker's Classification. along with the merits and demerits of the ...
Bentham & Hooker's Classification. along with the merits and demerits of the ...Bentham & Hooker's Classification. along with the merits and demerits of the ...
Bentham & Hooker's Classification. along with the merits and demerits of the ...
 

Biological databases.pptx

  • 1. Biological Databases. Smt. P. Sangeetha, Lecturer in Biotechnology, KVRGCW(A), Kurnool
  • 2. Biological Databases: A biological database is a large, organized body of persistent data, usually associated with computerized software designed to update, query and retrieve components of the data stored within the system. The chief objective of developing a database is to organize data as a set of structured records so that information can be retrieved easily. Examples: popular databases include GenBank from NCBI (National Center for Biotechnology Information), SWISS-PROT from the Swiss Institute of Bioinformatics and PIR from the Protein Information Resource.
  • 3. Importance of Databases: 1. Databases act as a storehouse of information. 2. Databases store and organize data so that information can be retrieved easily using a variety of search criteria. 3. They facilitate the discovery of new biological insights from raw data.
  • 4. Importance of Databases: 4. Secondary databases have become the molecular biologist's reference library over the past decade or so, providing a wealth of information on almost any gene or gene product that has been investigated by the research community. 5. They allow many users to access the same data entries at the same time. 6. They allow data to be indexed. 7. They help to reduce data redundancy.
  • 5. Types of Biological Databases: 1. Based on the content of biological data 2. Based on the nature of the data
  • 6. 1. Based on the content of biological data: 1. Primary databases 2. Secondary databases
  • 7. 1. Primary databases: Primary databases are also called archival databases. They are populated with experimentally derived data such as nucleotide sequences, protein sequences or macromolecular structures. Researchers submit experimental results directly into the database, and the data are essentially archival in nature. Once a database accession number has been assigned, the data in a primary database are never changed: they form part of the scientific record.
  • 8. 1. Primary databases: examples include GenBank and DDBJ (nucleotide sequences) and the Protein Data Bank (PDB; coordinates of three-dimensional macromolecular structures).
  • 9. 2. Secondary databases: Secondary databases comprise data derived from the analysis of primary data. They often draw upon information from numerous sources, including other databases (primary and secondary), controlled vocabularies and the scientific literature. They are highly curated, often using a complex combination of computational algorithms and manual analysis and interpretation to derive new knowledge from the public record of science.
  • 10. 2. Secondary databases: examples include InterPro (protein families, motifs and domains), the UniProt Knowledgebase (sequence and functional information on proteins) and Ensembl (variation, function, regulation and more, layered onto whole-genome sequences).
  • 11. 2. Based on the nature of the data: 1. Structural databases 2. Sequence databases: i. Protein sequence databases ii. Nucleic acid sequence databases
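  • 12. 1. Structural databases: Structural databases hold three-dimensional structural information on biological macromolecules, derived mainly from the analysis of X-ray diffraction and NMR data. Examples: PDB (the archive of the structures themselves) and CATH and SCOP (which classify those structures).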
  • 13. PDB (Protein Data Bank) www.rcsb.org/pdb/ The PDB was established in 1971 at Brookhaven National Laboratory on Long Island, New York, USA. In 1999, its management moved to the Research Collaboratory for Structural Bioinformatics (RCSB), a joint organisation that includes Rutgers University and the San Diego Supercomputer Center. PDB entries contain the atomic coordinates and some structural parameters associated with the atoms or computed from the structures (such as secondary structure).
  • 14. PDB (Protein Data Bank): PDB entries contain some annotation, but it is less comprehensive than the annotation in SWISS-PROT. There are no legal restrictions on the use of PDB data. The Protein Data Bank is an archive of experimentally determined three-dimensional (3D) structures of biological macromolecules, serving a global community of researchers, educators and students.
  • 15. PDB (Protein Data Bank): The archive contains atomic coordinates, bibliographic citations, primary and secondary structure information, crystallographic structure factors and NMR (nuclear magnetic resonance) experimental data. The PDB is the main primary database for 3D structures of biological macromolecules determined by X-ray crystallography and NMR.
  • 16. PDB (Protein Data Bank): Structural biologists usually deposit their structures in the PDB on publication, and many scientific journals require deposition before accepting a paper. The PDB also accepts the experimental data used to determine the structures (X-ray structure factors and NMR data). Theoretical (homology) models were accepted in the past, but the PDB stopped accepting them in 2006.
  • 17. 2. Sequence databases: A sequence database is a type of biological database that stores a large collection of nucleic acid, protein or other polymer sequences in computer-readable form. These include: I. Nucleotide databases II. Protein databases
  • 18. NCBI (National Center for Biotechnology Information) www.ncbi.nlm.nih.gov NCBI provides publicly available databases and tools on the web. It was established in November 1988 at the National Library of Medicine (NLM) in the United States. The NLM was chosen because it had experience in creating and maintaining biomedical databases and, as part of the National Institutes of Health (NIH), it could establish a research program in computational molecular biology.
  • 19. NCBI (National Center for Biotechnology Information): The mission of NCBI is to develop new information technologies that aid understanding of the fundamental molecular and genetic processes that control health and disease. More specifically, NCBI has been charged with: creating automated systems for storing and analysing knowledge about molecular biology, biochemistry and genetics; facilitating the use of such databases and software by the research and medical community; coordinating efforts to gather biotechnology information both nationally and internationally; and performing research into advanced methods of computer-based information processing for analysing the structure and function of biologically important molecules.
  • 20. NCBI maintains several databases and resources, including: literature databases; Entrez databases; nucleotide databases; genome-specific resources; tools for data mining.
  • 21. NCBI maintains several databases and resources (continued): tools for sequence analysis; tools for 3D structure display and similarity searching; maps; resource statistics; collaborative cancer research; FTP (File Transfer Protocol).
  • 22. 1. Nucleotide databases: The nucleotide database is a collection of sequences from several sources, including GenBank and RefSeq. I. Primary databases of nucleotide sequences: these are the main databases that store raw nucleic acid sequences and make them available to the public and to researchers. They are called primary nucleotide sequence databases because they are the original repositories of submitted nucleic acid sequences. Examples: GenBank, DDBJ and EMBL.
  • 23. 1. EMBL (European Molecular Biology Laboratory) www.ebi.ac.uk EMBL is the nucleotide sequence database from the EBI (European Bioinformatics Institute). The EBI manages databases of biological data, including nucleic acid sequences, protein sequences and macromolecular structures. It is a centre for research and services in bioinformatics and a pioneer of new bioinformatics research and development.
  • 24. 1. EMBL (European Molecular Biology Laboratory): The mission of the EBI is to ensure that the growing body of information from molecular biology and genome research is placed in the public domain and is freely accessible. The database is produced in collaboration with DDBJ and GenBank. Information can be retrieved from EMBL using SRS (Sequence Retrieval System), which links the principal DNA and protein sequence databases with motif, structure, mapping and other specialist databases.
  • 25. 1. EMBL (European Molecular Biology Laboratory): SRS is one of the most powerful data browsing and retrieval tools available. It provides rapid, user-friendly access to the large volumes of diverse and heterogeneous life science data stored in more than 400 internal and public-domain databases, and it can be used to browse the various biological sequence and literature databases. The EBI provides access to many tools for browsing and retrieving biological sequence and literature data.
  • 26. 2. DDBJ (DNA Data Bank of Japan) www.ddbj.nig.ac.jp DDBJ began in 1986 as a collaboration with EMBL and GenBank. The database is produced, maintained and distributed at the National Institute of Genetics. Sequences can be submitted to it from anywhere in the world using a web-based data submission tool. The web is also used to provide standard search tools such as FASTA and BLAST.
  • 27. 2. DDBJ (DNA Data Bank of Japan): DDBJ is the only DNA data bank in Japan that is officially certified to collect DNA sequences from researchers and to issue internationally recognised accession numbers to data submitters. DDBJ is one of the three international DNA databases, together with the EBI (responsible for the EMBL database) and NCBI (responsible for GenBank). DDBJ therefore collaborates with the other two data banks by exchanging data and information over the Internet and by holding two meetings: the International DNA DataBank Advisory Meeting (IAM) and the International DNA DataBanks Collaborative Meeting (ICM).
  • 28. 3. GenBank  GenBank, the DNA database from NCBI incorporates sequences from publicly available sources.  Information can be retrieved from GenBank using the Entrez Integrated Retrieval system; this combines data from the principal DNA and protein sequence databases with the information from genome maps and protein structures.  Additional information on sequences can be accessed via MEDLINE facility which provides abstracts from the original published articles.
  • 29. 3. GenBank  GenBank may be searched with the user query sequence by means of NCBI’s web interface to the BLAST suite of programs. A GenBank includes the sequence files, indices created on various database fields and information derived from database(Ex.Gen Pept, a database of translated coding sequences in FastA format). Most commonly used is the sequence entry file, which contains the sequence itself and descriptive information relating to it.
  • 30. 3. GenBank  A GenBank entry consists of keywords, relevant associated sub key words, and an optional Feature Table, it end is indicated by a // terminator.  The entry continues with BASE COUNT record which details the frequency of occurrence of the different base types in the sequence.
  • 31. 2. Secondary databases of nucleotide sequences: Many secondary databases are simply sub-collections of sequences culled from one or other of the primary databases, such as GenBank or EMBL. Examples: 1. Omniome database 2. FlyBase 3. ACeDB
  • 32. 2. Secondary databases of nucleotide sequences: 1. Omniome database: a comprehensive microbial resource maintained by TIGR (The Institute for Genomic Research). It holds not only the sequence and annotation of each completed genome, but also associated information about the organisms (such as taxon and Gram-stain pattern), the structure and composition of their DNA molecules, and many other attributes of the protein sequences predicted from the DNA sequences.
  • 33. 2. Secondary databases of nucleotide sequences: 2. FlyBase: a database for the fruit fly Drosophila melanogaster, whose entire genome was sequenced by a consortium to a high degree of completeness and quality. 3. ACeDB: a repository not only of the sequence but also of the genetic map and phenotypic information for the nematode worm Caenorhabditis elegans.
  • 34. II. Protein databases: A protein database is one or more datasets describing proteins' amino acid sequences, conformations, structures and features such as active sites. 1. Primary databases of proteins: the primary databases hold protein sequences that are either determined experimentally or inferred from the conceptual translation of nucleotide sequences.
  • 35. 1. PIR (Protein Information Resource) www.pir.georgetown.edu The Protein Sequence Database was developed at the National Biomedical Research Foundation (NBRF) in the US. It is produced in collaboration with the Martinsried Institute for Protein Sequences (MIPS) and the Japan International Protein Information Database (JIPID). PIR originated with Margaret Dayhoff's collection of sequences for investigating evolutionary relationships among proteins.
  • 36. 1. PIR (Protein Information Resource): The PIR database is split into four distinct sections, PIR1 to PIR4, which differ in the quality of the data and the level of annotation provided. PIR1 contains fully classified and annotated entries. PIR2 includes preliminary entries, which have not been thoroughly reviewed and may contain redundancy. PIR3 contains unverified entries, which have not been reviewed.
  • 37. 1. PIR (Protein Information Resource): PIR4 entries fall into four categories: 1. conceptual translations of artefactual sequences; 2. conceptual translations of sequences that are not transcribed or translated; 3. protein sequences or conceptual translations that are genetically engineered; 4. sequences that are not genetically encoded and not produced on ribosomes. Users can search for entries or perform sequence similarity searches at the PIR site, and the database can be downloaded as a set of files.
  • 38. 2. SWISS-PROT www.expasy.ch/sprot/ SWISS-PROT is a protein sequence database established in 1986. It was produced collaboratively by the Department of Medical Biochemistry at the University of Geneva and EMBL; after 1994, the collaboration moved to the EBI, EMBL's UK outstation. In 1998, the collaboration moved to the Swiss Institute of Bioinformatics (SIB), so the database is now maintained collaboratively by the SIB and EBI/EMBL.
  • 39. 2. SWISS-PROT: SWISS-PROT is a protein sequence database that strives to provide a high level of annotation (such as a description of the protein's function, its domain structure, post-translational modifications and variants), a minimal level of redundancy, and a high level of integration with other databases. In 1996, a computer-annotated supplement to SWISS-PROT, termed TrEMBL, was created.
  • 40. 2. SWISS-PROT: In SWISS-PROT, as in many sequence databases, two classes of data can be distinguished: 1. Core data, which consist of: 1. sequence data; 2. citation information (bibliographic references); 3. taxonomic data (a description of the biological source of the protein).
  • 41. 2. SWISS-PROT: 2. Annotation: 1. function of the protein; 2. post-translational modifications; 3. domains and sites; 4. secondary structure
  • 42. 2. SWISS-PROT: 2. Annotation (continued): 5. quaternary structure; 6. similarities to other proteins; 7. diseases associated with deficiencies in the protein; 8. sequence conflicts and variants
  • 43. 2. SWISS-PROT Sequence Entry File: Each line is flagged with a two-letter code, which helps to present the information in a structured way. Entries begin with the identification (ID) line and end with a // terminator. ID codes can sometimes change, so an additional identifier, the accession number (AC), is also provided; it ought to remain static between database releases.
  • 44. 2. SWISS-PROT Sequence Entry File: Next, the DT lines give the date on which the sequence was entered into the database and details of when it was last modified. The following lines give the gene name (GN), the organism species (OS) and the organism classification (OC), i.e. the taxonomic lineage of the source organism.
  • 45. 2. SWISS-PROT Sequence Entry File: Comment (CC) lines describe the function of the protein, post-translational modifications, similarities and tissue specificity. Database cross-reference (DR) lines follow the comment field and provide links to other biomolecular databases. The DR lines are followed by keyword (KW) lines and then by a number of feature table (FT) lines.
  • 46. 2. SWISS-PROT Sequence Entry File: The FT (feature table) lines highlight regions of interest in the sequence, including secondary structure, ligand-binding sites and post-translational modifications. The final section of the entry contains the sequence (SQ) itself, and the entry ends with a // terminator. SWISS-PROT has become the most widely used protein sequence database in the world.
  • 47. 3. PubMed: PubMed is a free resource supporting the search and retrieval of biomedical and life sciences literature, with the aim of improving health both globally and personally. 1. The PubMed database contains more than 33 million citations and abstracts of biomedical literature. 2. It does not include full-text journal articles; however, links to the full text are often present when it is available from other sources, such as the publisher's website or PubMed Central (PMC).
  • 48. 3. PubMed: 3. It has been available to the public online since 1996. 4. PubMed was developed and is maintained by the National Center for Biotechnology Information (NCBI) at the U.S. National Library of Medicine (NLM), part of the National Institutes of Health (NIH). 5. Citations in PubMed come primarily from the biomedicine and health fields and from related disciplines such as the life sciences, behavioural sciences, chemical sciences and bioengineering.
  • 49. 3. PubMed: PubMed facilitates searching across several NLM literature resources: 1. MEDLINE 2. PubMed Central (PMC) 3. Bookshelf. 1. MEDLINE: MEDLINE is the largest component of PubMed and consists primarily of citations from journals selected for MEDLINE; these articles are indexed with MeSH (Medical Subject Headings) and curated with funding, genetic, chemical and other metadata.
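Because MEDLINE citations are indexed with MeSH, a PubMed search can target a MeSH heading with the `[MeSH Terms]` field tag and combine it with free-text words limited by `[Title/Abstract]`. The helper below is a hypothetical sketch that only assembles such a Boolean query string. The example heading is illustrative.

```python
from typing import Sequence


def pubmed_query(mesh_terms: Sequence[str], title_abstract_words: Sequence[str] = ()) -> str:
    """Join MeSH headings and free-text words into one PubMed Boolean (AND) query string."""
    parts = [f'"{term}"[MeSH Terms]' for term in mesh_terms]
    parts += [f"{word}[Title/Abstract]" for word in title_abstract_words]
    return " AND ".join(parts)


print(pubmed_query(["Databases, Nucleic Acid"], ["GenBank"]))
```

The resulting string could be pasted into the PubMed search box or sent as the `term` of an E-utilities ESearch request with `db=pubmed`.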
  • 50. 3. PubMed: 2. PubMed Central (PMC): Citations for PubMed Central (PMC) articles make up the second-largest component of PubMed. PMC is a full-text archive that includes articles from journals reviewed and selected by NLM for archiving (current and historical), as well as individual articles collected for archiving in compliance with funder policies.
  • 51. 3. PubMed: 3. Bookshelf: The final component of PubMed is citations for books and some individual chapters available on Bookshelf. Bookshelf is a full-text archive of books, reports, databases and other documents related to the biomedical, health and life sciences.
  • 52. 2. Secondary databases of proteins: These databases are termed secondary because they contain the results of analysing the sequences held in primary databases. 1. PROSITE: some databases collect patterns found in protein sequences rather than the complete sequences, and PROSITE is one such pattern database. Protein motifs and patterns are encoded as regular expressions. Each PROSITE entry provides information in two forms: the pattern and related descriptive text.
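Because PROSITE patterns are regular expressions in a compact notation, they translate directly into standard regex syntax. The converter below is a simplified sketch, and `prosite_to_regex` is a hypothetical name. It handles `x` (any residue), `[...]` (allowed residues), `{...}` (forbidden residues), `(n)` and `(n,m)` repeats, and a leading `<` or trailing `>` anchor. It does not handle `<` or `>` written inside square brackets.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Convert a PROSITE-style pattern such as 'C-x(2,4)-C.' into a Python regex."""
    pattern = pattern.strip().rstrip(".")
    regex = []
    for element in pattern.split("-"):
        prefix = suffix = ""
        if element.startswith("<"):
            prefix, element = "^", element[1:]  # pattern anchored at the N-terminus
        if element.endswith(">"):
            suffix, element = "$", element[:-1]  # pattern anchored at the C-terminus
        # Separate the residue part from an optional repeat count such as (3) or (2,4).
        match = re.fullmatch(r"([^()]+)(?:\((\d+)(?:,(\d+))?\))?", element)
        if match is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        residues, low, high = match.groups()
        if residues == "x":
            core = "."
        elif residues.startswith("{"):
            core = "[^" + residues[1:-1] + "]"
        else:
            core = residues  # a single residue or an [..] set is already valid regex
        if low and high:
            core += "{" + low + "," + high + "}"
        elif low:
            core += "{" + low + "}"
        regex.append(prefix + core + suffix)
    return "".join(regex)


# A C2H2 zinc-finger-style pattern used as an example.
print(prosite_to_regex("C-x(2,4)-C-x(3)-[LIVMFYWC]-x(8)-H-x(3,5)-H."))
```

The converted string can be passed to `re.search` to scan a protein sequence for the motif.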
  • 53. 2. Secondary databases of proteins: 2. PRINTS: In the PRINTS database, protein sequence patterns are stored as “fingerprints”. Each entry contains the following information: 1. The first section contains cross-links to other databases that hold more information about the characterised family. 2. The second section provides a table showing how many of the motifs that make up the fingerprint occur in how many of the sequences of that family. 3. The last section contains the actual fingerprints, which are stored as multiple aligned sets of sequences; the alignments are made without gaps.
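The fact that fingerprint motifs are ungapped alignments means every aligned row has the same length, so each column can be summarised independently. The sketch below is a deliberately simplified illustration and not the scoring PRINTS actually uses. It builds per-column residue counts from one motif alignment and slides a window along a query sequence. The function names, the scoring rule and the toy motif are all hypothetical.

```python
from collections import Counter


def column_profiles(aligned_motif: list[str]) -> list[Counter]:
    """Per-column residue counts for an ungapped motif alignment."""
    length = len(aligned_motif[0])
    if any(len(row) != length for row in aligned_motif):
        raise ValueError("fingerprint motifs are ungapped, so every row must have the same length")
    return [Counter(row[i] for row in aligned_motif) for i in range(length)]


def best_window(sequence: str, aligned_motif: list[str]) -> tuple[int, float]:
    """Return (start, score) of the window that best matches the motif columns.

    Score = mean over columns of the fraction of aligned rows that share the window's residue.
    """
    profiles = column_profiles(aligned_motif)
    n_rows, width = len(aligned_motif), len(profiles)
    best = (-1, 0.0)
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(p[res] / n_rows for p, res in zip(profiles, window)) / width
        if score > best[1]:
            best = (start, score)
    return best


print(best_window("AAGKSTAA", ["GKST", "GKSS", "GKTT"]))
```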
  • 54. 2. Secondary databases of proteins: 3. Pfam: Pfam contains profiles built using hidden Markov models (HMMs). An HMM models the pattern as a series of match, insert and delete states, with scores assigned for the alignment to move from one state to another; substitutions are handled by the residue probabilities of the match states.
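The idea of "scores assigned for the alignment to move from one state to another" can be made concrete with a toy model. The sketch scores a path of match (M), insert (I) and delete (D) states by summing log transition probabilities. The probabilities are invented for illustration. Real Pfam models have position-specific values estimated from a seed alignment, and they also use emission scores, which are omitted here. As in common profile-HMM designs, this toy model does not allow I to D or D to I transitions.

```python
import math

# Hypothetical transition probabilities between the state types of a toy profile HMM.
TRANSITIONS = {
    ("M", "M"): 0.90, ("M", "I"): 0.05, ("M", "D"): 0.05,
    ("I", "M"): 0.60, ("I", "I"): 0.40,
    ("D", "M"): 0.70, ("D", "D"): 0.30,
}


def path_log_score(states: str) -> float:
    """Sum of natural-log transition probabilities along a state path such as 'MMIMDM'."""
    total = 0.0
    for prev, nxt in zip(states, states[1:]):
        prob = TRANSITIONS.get((prev, nxt))
        if prob is None:
            return float("-inf")  # transition not allowed in this toy model (e.g. I -> D)
        total += math.log(prob)
    return total


for path in ("MMMM", "MMIM", "MDDM"):
    print(path, round(path_log_score(path), 3))
```

A path that stays in match states scores highest here, which mirrors how insertions and deletions are penalised relative to a well-conserved alignment.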
  • 55. 2. Secondary databases of proteins: 4. TrEMBL: TrEMBL (Translated EMBL) was created in 1996 as a computer-annotated supplement to SWISS-PROT. It contains translations of the coding sequences (CDS) in EMBL that have not yet been integrated into SWISS-PROT. TrEMBL was designed to meet the need for a well-structured, SWISS-PROT-like resource that would allow very rapid access to sequence data from the genome projects.
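TrEMBL entries are conceptual translations of coding sequences. The sketch below shows that step using the standard genetic code (NCBI translation table 1). Codons are enumerated in T, C, A, G order, so the 64-letter amino-acid string lines up with them. `translate_cds` is a hypothetical helper. It stops at the first stop codon and writes `X` for codons with ambiguous bases.

```python
BASES = "TCAG"
# Standard genetic code (NCBI table 1); codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate_cds(cds: str) -> str:
    """Conceptually translate a coding sequence, stopping at the first stop codon."""
    cds = cds.upper().replace("U", "T")
    if len(cds) % 3 != 0:
        raise ValueError("CDS length is not a multiple of 3")
    protein = []
    for pos in range(0, len(cds), 3):
        aa = CODON_TABLE.get(cds[pos:pos + 3], "X")  # 'X' for codons containing e.g. N
        if aa == "*":
            break  # stop codon ends the conceptual translation
        protein.append(aa)
    return "".join(protein)


print(translate_cds("ATGGCTTGGTAA"))
```

Real pipelines also honour alternative genetic codes and the CDS coordinates given in the EMBL feature table. This sketch assumes the input is already an in-frame CDS.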