Database
 A computerized archive used to store and organize data in such
a way that information can be retrieved easily.
 A database is a repository of information with a specific
structure that enables data to be entered and extracted.
 In general, this structure consists of files or tables,
 each containing numerous records and fields.
Database (continued)
 A database system (DBS) is an integrated collection of related files
together with the details of their definition, interpretation,
manipulation and maintenance.
 A database system is built around its data and is run by
software called a DBMS (Database
Management System).
 A database system protects the data from unauthorized access.
 A database management system (DBMS) is a collection of programs
that enables users to create and maintain a database.
Database management systems
 Database management systems provide several functions in
addition to simple file management:
 control security
 maintain data integrity
 provide for backup and recovery
 control redundancy
 allow data independence
 provide a non-procedural query language
 perform automatic query optimization
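Two of the functions listed above, maintaining data integrity and providing a non-procedural query language, can be illustrated with a minimal sketch that uses Python's built-in sqlite3 module. The table name `sequences`, its columns and the sample rows are hypothetical and chosen only for illustration.

```python
import sqlite3

# In-memory database; nothing is written to disk.
conn = sqlite3.connect(":memory:")

# The DBMS, not the application, enforces these integrity rules.
conn.execute(
    """
    CREATE TABLE sequences (
        accession TEXT PRIMARY KEY,           -- no duplicate accessions
        organism  TEXT NOT NULL,              -- organism must be given
        length    INTEGER CHECK (length > 0)  -- length must be positive
    )
    """
)

def try_insert(row):
    """Return True if the DBMS accepts the row, False if it rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO sequences VALUES (?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False

first_ok = try_insert(("X00001", "Homo sapiens", 1200))
duplicate_ok = try_insert(("X00001", "Mus musculus", 900))  # duplicate key
negative_ok = try_insert(("X00002", "Mus musculus", -5))    # fails CHECK
valid_ok = try_insert(("X00003", "Mus musculus", 900))

# Non-procedural query: we state WHAT we want, not HOW to find it.
rows = conn.execute(
    "SELECT accession FROM sequences WHERE length < 1000 ORDER BY accession"
).fetchall()
print(first_ok, duplicate_ok, negative_ok, valid_ok, rows)
```

The two rejected rows never reach the table, and the query names only the desired result; how the rows are located is left to the DBMS.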
Organisation
 Databases can be organised as:
 flat files
 relational databases
Flat-file databases
 The simplest form of database,
 in which collections of data, such as nucleotide and amino
acid sequences, are stored in a single large text file.
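As a rough illustration of a flat-file database, the sketch below reads sequences from one text file by scanning it from top to bottom. The FASTA-like layout (a header line starting with ">" followed by sequence lines) and the two records are assumptions made for this example; the slides do not name a specific file format.

```python
import io

# A tiny flat file in a FASTA-like layout (assumed here for illustration).
FLAT_FILE = """\
>seq1 hypothetical nucleotide record
ATGGCGTACG
TTAGC
>seq2 hypothetical protein record
MKTAYIAKQR
"""

def read_flat_file(handle):
    """Scan the file line by line and return {identifier: sequence}."""
    records = {}
    current_id = None
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # First word after ">" is used as the record identifier.
            current_id = line[1:].split()[0]
            records[current_id] = ""
        elif current_id is not None:
            # Sequence lines are concatenated onto the current record.
            records[current_id] += line
    return records

records = read_flat_file(io.StringIO(FLAT_FILE))
print(records["seq1"])
```

Because there is no index, finding one record means reading the file from the start, which is one reason flat files become awkward as the data grow.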
Relational databases
 A relational database treats all of its data as a collection of
relations.
 It stores the data in a number of
tables.
 Each table consists of records and fields (rows and
columns).
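The idea of tables made of records and fields, related to one another, can be sketched with Python's built-in sqlite3 module. The two tables (`organism`, `sequence`), their columns and all values are hypothetical examples, not taken from any real database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE organism (
        organism_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL
    );
    CREATE TABLE sequence (
        accession   TEXT PRIMARY KEY,
        organism_id INTEGER NOT NULL REFERENCES organism(organism_id),
        length      INTEGER NOT NULL
    );
    INSERT INTO organism VALUES (1, 'Escherichia coli'), (2, 'Homo sapiens');
    INSERT INTO sequence VALUES
        ('A0001', 1, 850),
        ('A0002', 2, 1430),
        ('A0003', 2, 610);
    """
)

# Each row is a record and each column is a field.
# The join relates the two tables through organism_id.
query = """
    SELECT o.name, COUNT(*) AS n_sequences, SUM(s.length) AS total_length
    FROM sequence AS s
    JOIN organism AS o ON o.organism_id = s.organism_id
    GROUP BY o.name
    ORDER BY o.name
"""
summary = conn.execute(query).fetchall()
print(summary)
```

Storing the organism name once and referring to it by key is what keeps the same fact from being repeated in every sequence record.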
Types of Database
 Databases can be classified into three
categories on the basis of the information
stored:
 primary, secondary and
composite databases.
 Primary databases contain data that are
derived experimentally.
 They usually store information related to
the sequences or structures of biological
components.
 They can be further divided into protein and
nucleotide databases.
Primary Database
 These databases contain the raw sequence data (nucleic acid or protein)
that are produced and submitted by researchers worldwide.
 NCBI (National Center for Biotechnology Information)
 GenBank
 DDBJ (DNA Data Bank of Japan)
 SWISS-PROT
 PIR (Protein Information Resource)
 PDB (Protein Data Bank)
 TrEMBL (Translated EMBL Nucleotide Sequence Database)
 Protein databases: PIR, MIPS, SWISS-PROT, TrEMBL
Secondary Databases
 contain information derived from primary databases.
 store information such as conserved sequences, active
site residues, and signature sequences. Classifications derived from
Protein Data Bank structures are also stored in secondary databases.
Examples include:
 Class, Architecture, Topology, Homologous superfamily (CATH),
 Kyoto Encyclopedia of Genes and Genomes (KEGG),
 Protein Families (Pfam)
 and Structural Classification of Proteins (SCOP)
Composite Databases
 are collections of several primary database resources.
 provide users with various tools and software for data analysis.
 NCBI, as a composite database, stores a large number of
nucleotide and protein sequences on its servers and therefore suffers from
high redundancy in the deposited data.
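The redundancy problem mentioned above can be illustrated with a small sketch that merges entries from several sources and keeps each distinct sequence only once. The function name `merge_non_redundant`, the identifiers and the sequences are hypothetical, and treating entries as redundant only when their sequences are identical is a simplifying assumption.

```python
def merge_non_redundant(*sources):
    """Merge {identifier: sequence} mappings, keeping one entry per sequence.

    Two entries count as redundant here only if their sequences are
    identical after upper-casing; real resources use richer criteria.
    """
    merged = {}
    seen = set()
    for source in sources:
        for identifier, sequence in source.items():
            key = sequence.upper()
            if key in seen:
                continue  # same sequence already taken from an earlier source
            seen.add(key)
            merged[identifier] = sequence
    return merged

# Hypothetical entries from two primary sources.
source_a = {"a1": "ATGGCC", "a2": "TTGACA"}
source_b = {"b1": "atggcc", "b2": "GGGTTT"}

combined = merge_non_redundant(source_a, source_b)
print(sorted(combined))
```

Entry `b1` is dropped because its sequence matches `a1`, so the combined set holds three entries instead of four.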
Biological databases
 Biological databases can be broadly classified into
 sequence databases,
 structure databases,
 and pathway databases.
 Sequence databases cover both nucleic acid sequences
and protein sequences, whereas structure databases cover
mainly proteins.
Sequence databases
 Nucleotide and protein sequence databases are among the most
widely used and best established biological
databases.
 They serve as repositories for wet-lab results and as the primary source
of experimental results.
 The major public nucleotide data banks of this type are
 GenBank in the USA,
 EMBL (European Molecular Biology Laboratory) in Europe
 and DDBJ (DNA Data Bank of Japan) in Japan
Sequence databases (continued)
 Protein databases include
 ExPASy
 UniProt
 PIR
 PDB
 Swiss-Prot
 TrEMBL
NATIONAL CENTER FOR BIOTECHNOLOGY
INFORMATION (NCBI)
 Established in 1988
 Part of the National Library of Medicine (NLM) at the National Institutes of
Health (NIH)
 Provides access to a large amount of biomedical and genomic
information (www.ncbi.nlm.nih.gov/home/about/mission.shtml).
 It maintains a large number of databases and bioinformatics
tools as well as services.
 One of its most popular databases is GenBank.
Mission or role
 To develop novel techniques and methodologies for dealing
with huge and complex data
 and to provide better access to analytical and computational
tools.
 To maintain biological databases, whether primary or
secondary,
 including GenBank.
 NCBI provides data retrieval systems such as Entrez.
 It provides computational resources for the analysis of GenBank
data and other biological data.
Resources
 The resources available on this site can be divided
into two major categories:
 1) databases
 2) tools
 The major databases maintained at NCBI are
 GenBank and PubMed (bibliographic database for biomedical literature).
 Other databases include
 Gene,
 Genome,
 Epigenomics,
 Gene Expression,
 RefSeq,
 Structure,
 Database of Short Genetic Variation (dbSNP),
 Taxonomy, etc.
TOOLS at NCBI
 NCBI also provides a variety of tools for database searching.
 Entrez is the search engine of NCBI.
 Other tools include
 Genomes Browser,
 BLAST,
 CDTree,
 Genetic Codes,
 Open Reading Frame Finder (ORF Finder),
 SNP Database Specialized Search Tools.
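Entrez can also be queried from a program through NCBI's E-utilities web service. The slides describe Entrez only as a search engine, so this programmatic route is an addition for illustration. The sketch below only builds the request URLs and does not contact the server; the helper names `esearch_url` and `efetch_url`, the search term and the accession are choices made for this example.

```python
from urllib.parse import urlencode

# Base address of the NCBI E-utilities service.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"

def esearch_url(db, term, retmax=20):
    """URL that searches an Entrez database and returns matching record IDs."""
    params = {"db": db, "term": term, "retmax": retmax}
    return f"{EUTILS_BASE}/esearch.fcgi?{urlencode(params)}"

def efetch_url(db, record_id, rettype="gb", retmode="text"):
    """URL that downloads one record, by default as a GenBank flat file."""
    params = {"db": db, "id": record_id, "rettype": rettype, "retmode": retmode}
    return f"{EUTILS_BASE}/efetch.fcgi?{urlencode(params)}"

# Example inputs: the nucleotide database, an organism query, one accession.
search = esearch_url("nuccore", "Saccharomyces cerevisiae[Organism]")
fetch = efetch_url("nuccore", "U49845")
print(search)
print(fetch)
```

Opening either URL in a browser or with an HTTP client would perform the actual search or download; that step is left out here so the sketch runs without network access.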
GenBank
 GenBank (Genetic Sequence Databank)
 GenBank® is the genetic sequence database at the National Center for
Biotechnology Information (NCBI).
 It was established in 1982 and is now maintained by NCBI.
 It contains publicly available nucleotide sequences.
 DNA sequences can be submitted to GenBank using several different
methods:
 BankIt: a web-based form for submitting a small number of
sequences
 Sequin: more appropriate for complicated submissions containing
many sequences
Structure of GenBank
 A nucleotide sequence record in this database
includes the following fields:
 1. Locus: A title given by GenBank itself to name the
sequence entry. It includes the following:
 a. Locus Name: Similar to the accession number for the sequence.
 b. Sequence Length: The number of bases in the sequence.
 c. Molecule Type: The type of nucleic acid sequenced.
The various types are mRNA (which is presented as cDNA), rRNA,
snRNA, and DNA.
 d. GB Division: The class of the data according to the
classification criteria of GenBank.
 e. Modification Date: The date on which the record was last modified.
 2. Definition: The name of the nucleotide sequence.
 3. Accession: This covers the accession number,
accession version, and GI number.
 The accession number is the unique identifier associated with each
nucleotide sequence present in the database.
 4. Version: An identification number assigned
to a single, specific sequence in the
database, in the format “accession.version”.
 5. GI: Also a sequence identification
number. Whenever a sequence is changed,
the version number is increased and a new
GI is assigned.
 6. Keywords: Defined words that
were used to index the entries.
 7. Source: The organism from which the sequence
was obtained.
 8. Organism: The scientific name
(usually genus and species) and
phylogenetic lineage.
 9. Reference: Citations of
publications by the sequence authors,
including the journal in which the
sequence was reported.
 10. Features: Information derived
from the sequence, such as
 biological source,
 exon,
 intron,
 promoters,
 CDS,
 alternate splice.
 Base Count and Origin: separate fields that follow the features.
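The Locus and Version fields described above can be picked apart with a few lines of code. This is a minimal sketch under stated assumptions: the record header is hypothetical (the accession `XX000001` is made up), it includes a topology word (`linear`) that the slides do not mention, and fields are split on whitespace rather than by the fixed column positions of the real format.

```python
# Hypothetical record header, written only to illustrate the field layout.
HEADER = """\
LOCUS       XX000001     1200 bp    DNA     linear   PLN 01-JAN-2000
DEFINITION  Hypothetical example sequence.
ACCESSION   XX000001
VERSION     XX000001.2
KEYWORDS    .
SOURCE      Saccharomyces cerevisiae
"""

def parse_header(text):
    """Map each top-level keyword (LOCUS, VERSION, ...) to its raw value."""
    fields = {}
    for line in text.splitlines():
        if not line or line[0].isspace():
            continue  # skip blank and indented continuation lines
        keyword, _, value = line.partition(" ")
        fields[keyword] = value.strip()
    return fields

def parse_locus(value):
    """Split the LOCUS value into the sub-fields named in the list above."""
    parts = value.split()
    return {
        "name": parts[0],
        "length": int(parts[1]),        # parts[2] is the unit, "bp"
        "molecule_type": parts[3],
        "division": parts[-2],
        "modification_date": parts[-1],
    }

def parse_version(value):
    """Split 'accession.version' into its two parts."""
    accession, _, version = value.split()[0].partition(".")
    return accession, int(version)

fields = parse_header(HEADER)
locus = parse_locus(fields["LOCUS"])
accession, version = parse_version(fields["VERSION"])
print(locus, accession, version)
```

For the sample header this yields a length of 1200, molecule type DNA, division PLN, and version 2 of accession XX000001.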
European Molecular Biology Laboratory
(EMBL)
 The EMBL Nucleotide Sequence Database is maintained by the European
Bioinformatics Institute (EBI), UK.
 EMBL was founded in 1974.
 EBI develops and maintains a large number of databases, and
scientists can access the data free of charge.
 This database serves as the primary source of nucleotide
sequences for Europe.
 Nucleotide sequence data generated by
large-scale genome-sequencing projects, as well as data
from the European Patent Office, can be submitted to it.
EMBL (continued)
 Data collection is done in collaboration with GenBank
(USA) and the DNA Data Bank of Japan (DDBJ).
 Other genomic databases held at EBI are
 Ensembl (a database of genome annotation)
 Genome Reviews.
 Daily releases of the database contain new
submissions and updated sequence data,
 while the entire database is released every 3 months.
DDBJ
 The DNA Data Bank of Japan (DDBJ) is a biological database
that collects DNA sequences submitted by researchers.
 It is run by the National Institute of Genetics, Japan.
DDBJ Flat File Format
 Data submitted to DDBJ are managed and retrieved
according to the DDBJ format (flat file).
 The flat file includes the sequence together with information on
who submitted the data, references, source organisms,
and features.
Ensembl Genome Database
 Ensembl is one of several well-known genome browsers for the
retrieval of genomic information from many organisms,
including humans, plants, bacteria and animals.
 It was created and is maintained by the EBI and the Sanger Centre (UK).
Databases for green plants
 There are three different comparative genomic databases
for green plants, namely
 GreenPhylDB,
 PLAZA,
 Phytozome.
 These databases aim to support genomic
studies related to plant evolution and
 to provide comparative data on genomes and gene
families, along with tools for their analysis.
Databases for green plants (continued)
 They provide information on
 the genomic context of plant genes,
 gene homologues and paralogues,
 RNA transcripts of the given genes,
 peptide sequences, and
 functions of gene families.
 They allow access to the complete genome sequences available in the
databases.
Protein Databases
Swiss-Prot
 Swiss-Prot is a protein sequence and knowledge database.
 It is well known for its high-quality annotation, use of
standardized nomenclature, and links to specialized databases.
 Its repository contains the amino acid sequence, the protein
name and description, taxonomic data, and citation information.
Pfam
 A database of protein families, Pfam contains annotations as
well as multiple sequence alignments generated using hidden
Markov models.
Protein Databases (continued)
 TrEMBL: The European Bioinformatics Institute, in collaboration with
Swiss-Prot, introduced another database, TrEMBL (translation of the EMBL
nucleotide sequence database).
 This database consists of computer-annotated entries obtained by
translating all coding sequences in the nucleotide databases.
 PIR: The Protein Information Resource (PIR) is an integrated public
bioinformatics resource that supports genomic and proteomic
research and scientific studies.
 The PIR serves the scientific community through online access and by
performing offline sequence identification services for researchers.
 It is a database of freely accessible protein sequences that contains
high-quality data and functional information for the proteins.
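The step behind TrEMBL entries, translating a coding sequence into a protein sequence, can be sketched as follows. This is only an illustration using the standard genetic code; the function name `translate` and the example sequence are hypothetical, and the slides do not describe how the real translation pipeline selects coding regions or genetic code tables.

```python
from itertools import product

# Standard genetic code, written in the conventional T, C, A, G codon order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def translate(cds):
    """Translate a coding sequence codon by codon, stopping at a stop codon."""
    cds = cds.upper().replace("U", "T")  # accept RNA input as well
    protein = []
    for start in range(0, len(cds) - 2, 3):
        amino_acid = CODON_TABLE.get(cds[start:start + 3], "X")  # X = unknown
        if amino_acid == "*":
            break  # stop codon ends the protein
        protein.append(amino_acid)
    return "".join(protein)

# Made-up coding sequence: ATG GCC AAA TGA
protein = translate("ATGGCCAAATGA")
print(protein)
```

Entries produced this way are computer-annotated, which is why they are kept apart from manually curated Swiss-Prot records.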
Structure databases
There are many structure databases, including the following.
Protein Data Bank (PDB)
 Important in solving real problems in molecular biology
 Established in 1971 at Brookhaven National
Laboratory (BNL)
 It contains structural information on macromolecules
determined by X-ray crystallography and NMR methods
 PDB is maintained by the Research Collaboratory for
Structural Bioinformatics (RCSB).
Structure databases (continued)
 PROSITE is a database of protein domains and families.
 PROSITE contains biologically significant sites, patterns
and profiles that help to reliably identify the known
protein family to which a new sequence belongs.
 CATH: The CATH database (Class, Architecture, Topology,
Homologous superfamily) is a hierarchical classification of
protein domain structures, which clusters proteins at four
major structural levels.
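The four-level CATH hierarchy can be represented as a small data structure. The sketch below assumes that a classification is written as four dot-separated numbers, one per level; this notation and the two example codes are assumptions for illustration and are not given in the slides. The names `CathCode`, `parse_cath` and `shared_levels` are hypothetical.

```python
from typing import NamedTuple

class CathCode(NamedTuple):
    """One number per level of the hierarchy, from broadest to narrowest."""
    class_: int
    architecture: int
    topology: int
    homologous_superfamily: int

def parse_cath(code):
    """Turn a dotted code such as '3.40.50.300' into a CathCode."""
    parts = code.split(".")
    if len(parts) != 4:
        raise ValueError(f"expected four levels, got {len(parts)}: {code!r}")
    return CathCode(*(int(part) for part in parts))

def shared_levels(a, b):
    """Count how many leading levels two classifications have in common."""
    count = 0
    for level_a, level_b in zip(a, b):
        if level_a != level_b:
            break
        count += 1
    return count

first = parse_cath("3.40.50.300")
second = parse_cath("3.40.50.720")
common = shared_levels(first, second)
print(first, second, common)
```

Two domains that share the first three levels have the same class, architecture and topology but fall into different homologous superfamilies.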
Pathway databases
 A pathway database describes
biochemical pathways, reactions, and enzymes.
 Some examples of pathway databases are
 KEGG (Kyoto Encyclopedia of Genes and Genomes),
 BRENDA,
 BioCyc.
Pathway databases (continued)
 KEGG: The Kyoto Encyclopedia of Genes and Genomes (KEGG) is the
primary resource for the Japanese GenomeNet service.
 It is a collection of online databases dealing with genomes, enzymatic
pathways, and biological chemicals.
 KEGG contains three databases: PATHWAY, GENES, and LIGAND.
 The PATHWAY database stores computerized knowledge on molecular
interaction networks.
 The GENES database contains data concerning sequences of genes and
proteins generated by the genome projects.
 The LIGAND database holds information about the chemical compounds and
chemical reactions that are relevant to cellular processes.
 BioCyc: The BioCyc Database Collection is a compilation of
pathway and genome information for different organisms.
 It includes two other databases:
 EcoCyc, which describes Escherichia coli K-12;
 MetaCyc, which describes pathways for more than 300
organisms.

Database in bioinformatics

  • 1. Database: A computerized archive used to store and organize data so that information can be retrieved easily. A database is a repository of information with a specific structure that enables the entry and extraction of data. In general, this structure consists of files or tables, each containing numerous records and fields.
  • 2. Conti.. A database system (DBS) is an integrated collection of related files, along with details about their definition, interpretation, manipulation and maintenance. A database system is built around the data and is run by software called a database management system (DBMS). A database system protects the data from unauthorized access. A DBMS is a collection of programs that enables users to create and maintain a database.
  • 3. Database management systems: Database management systems provide several functions in addition to simple file management: controlling security, maintaining data integrity, providing backup and recovery, controlling redundancy, allowing data independence, providing a non-procedural query language, and performing automatic query optimization.
  • 4. Organisation: flat files and relational databases. Flat-file databases: the simplest form of a database, where collections of data, such as nucleotide and amino acid sequences, are stored as either a large single text file
  • 6. Conti.. A relational database treats all of its data as a collection of relations. It stores the data within a number of tables. Each table consists of records and fields (rows and columns).
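The idea of records and fields spread over several related tables can be sketched with Python's built-in sqlite3 module. This is only an illustration: the table names, column names and rows below are hypothetical and do not come from any of the databases discussed in the slides.

```python
import sqlite3

# In-memory database; every table, column and row here is a made-up example.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE organism ("
    " organism_id INTEGER PRIMARY KEY,"
    " name TEXT NOT NULL)"
)
conn.execute(
    "CREATE TABLE sequence ("
    " seq_id INTEGER PRIMARY KEY,"
    " organism_id INTEGER NOT NULL REFERENCES organism(organism_id),"
    " molecule TEXT NOT NULL,"
    " residues TEXT NOT NULL)"
)

# Each row is a record; each column is a field.
conn.execute("INSERT INTO organism VALUES (1, 'Escherichia coli')")
conn.executemany(
    "INSERT INTO sequence VALUES (?, ?, ?, ?)",
    [(1, 1, "DNA", "ATGGCC"), (2, 1, "protein", "MA")],
)

# A join follows the relation between the two tables.
rows = conn.execute(
    "SELECT o.name, s.molecule, LENGTH(s.residues)"
    " FROM sequence AS s"
    " JOIN organism AS o ON o.organism_id = s.organism_id"
    " ORDER BY s.seq_id"
).fetchall()

for name, molecule, length in rows:
    print(name, molecule, length)
```

Keeping the organism name in its own table means it is stored once and referenced by key, which is one way a relational design helps control redundancy.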
  • 7. Types of Database: Databases can be classified into three categories on the basis of the information stored: primary, secondary and composite databases. Primary databases contain data that is derived experimentally. They usually store information related to the sequences or structures of biological components, and can be further divided into protein or nucleotide databases.
  • 8. Primary Database: These databases contain the raw nucleic acid sequence data that are produced and submitted by researchers worldwide. Examples: NCBI (National Center for Biotechnology Information), GenBank, DDBJ (DNA Data Bank of Japan), SWISS-PROT (Swiss-Prot), PIR (Protein Information Resource), PDB (Protein Data Bank), and TrEMBL (translated EMBL nucleotide sequence database). Protein databases: PIR, MIPS, SWISS-PROT, TrEMBL.
  • 10. Secondary Databases: These contain information derived from primary databases and store information such as conserved sequences, active site residues, and signature sequences. Data derived from the Protein Data Bank is stored in secondary databases. Examples include Class Architecture Topology Homology (CATH), Kyoto Encyclopedia of Genes and Genomes (KEGG), Protein Families (Pfam), and Structural Classification of Proteins (SCOP).
  • 11. Composite Databases: These are collections of several primary database resources and provide users with various tools and software for data analysis. NCBI, being a composite database, stores a large number of nucleotide and protein sequences on its servers and therefore suffers from high redundancy in the deposited data.
  • 12. Biological databases: Biological databases can be broadly classified into sequence databases, structure databases, and pathway databases. Sequence databases apply to both nucleic acid sequences and protein sequences, whereas structure databases apply only to proteins.
  • 13. Sequence databases: Nucleotide and protein sequence databases are the most widely used and among the best-established biological databases. They serve as repositories for wet-lab results and as the primary source for experimental results. The major public data banks of this type are GenBank in the USA, EMBL (European Molecular Biology Laboratory) in Europe, and DDBJ (DNA Data Bank of Japan) in Japan.
  • 14. Conti…. Protein databases include ExPASy, UniProt, PIR, PDB, Swiss-Prot, and TrEMBL.
  • 15. NATIONAL CENTER FOR BIOTECHNOLOGY INFORMATION (NCBI): Established in 1988 as part of the National Library of Medicine at the National Institutes of Health (NIH). It provides access to a large amount of biomedical and genomic information (www.ncbi.nlm.nih.gov/home/about/mission.shtml). It maintains a large number of databases and bioinformatics tools as well as services. One of its most popular databases is GenBank.
  • 16. Conti… Mission or role: The aim is to find novel techniques and methodologies for dealing with huge and complex data, and to provide better accessibility to analytical and computational tools. NCBI maintains biological databases, whether primary or secondary, including GenBank. It provides data retrieval systems such as Entrez, and computational resources for the analysis of GenBank data and other biological data.
  • 17. Conti… Resources: The resources present on this site can be divided into two major categories: 1) databases and 2) tools.
  • 18. The major databases maintained at NCBI are GenBank and PubMed (a bibliographic database for biomedical literature). Other databases include Gene, Genome, Epigenomics, Gene Expression, RefSeq, Structure, Database of Short Genetic Variation (dbSNP), Taxonomy, etc.
  • 19. TOOLS at NCBI: NCBI also provides a variety of tools for database searching. Entrez is the search engine of NCBI. Other tools include Genomes Browser, BLAST, CDTree, Genetic Codes, Open Reading Frame Finder (ORF Finder), and SNP Database Specialized Search Tools.
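Entrez searches can also be issued programmatically. The sketch below only builds a search URL and does not contact any server. It assumes NCBI's public E-utilities `esearch` endpoint with its `db`, `term` and `retmax` parameters; the helper name `build_esearch_url` and the example query are illustrative choices, not something taken from the slides.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Assumed base URL of the NCBI E-utilities search service.
EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(db: str, term: str, retmax: int = 5) -> str:
    """Return an Entrez search URL for the given database and query term."""
    query = urlencode({"db": db, "term": term, "retmax": retmax})
    return f"{EUTILS_ESEARCH}?{query}"


# Hypothetical query: nucleotide records with "insulin" in the title.
url = build_esearch_url("nucleotide", "insulin[Title]", retmax=3)
print(url)

# Decoding the query string recovers the original parameters.
params = parse_qs(urlsplit(url).query)
print(params["term"][0])
```

Fetching the URL (for example with `urllib.request.urlopen`) would return the matching record identifiers; that step is left out here so the sketch runs without network access.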
  • 20. GenBank (Genetic Sequence Databank): GenBank® is the genetic sequence database at the National Center for Biotechnology Information (NCBI). It was established in 1982 and is now maintained by NCBI. It contains publicly available nucleotide sequences. DNA sequences can be submitted to GenBank using several different methods. BankIt: a web-based form for submitting a small number of sequences. Sequin: more appropriate for complicated submissions containing many sequences.
  • 21. Structure of GenBank: The structure of a nucleotide sequence file in this database includes the following. 1. Locus: a title given by GenBank itself to name the sequence entry. It includes the following: a. Locus Name: similar to the accession number for the sequence. b. Sequence Length: the number of bases in the sequence.
  • 22. Conti…. c. Molecule Type: identifies the type of nucleic acid sequenced. The various types are mRNA (which is present as cDNA), rRNA, snRNA, and DNA. d. GB Division: specifies the class of the data according to GenBank's classification criteria. e. Modification Date: the date on which the record was last modified.
  • 23. 2. Definition: the name of the nucleotide sequence. 3. Accession: covers the accession number, accession version, and GI number. The accession number is the unique identifier associated with each nucleotide sequence in the database. 4. VERSION: the identification number assigned to a single, specific sequence in the database, in the format “accession.version”. 5. GI: also a sequence identification number. Whenever a sequence is changed, the version number is increased and a new GI is assigned.
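The “accession.version” format can be made concrete with a small parser. This is a minimal sketch: the identifier `AB000001.2` is a made-up example rather than a reference to a real record, and the names `VersionedAccession`, `parse_version` and `is_update_of` are hypothetical helpers.

```python
from typing import NamedTuple


class VersionedAccession(NamedTuple):
    accession: str  # stable identifier of the record
    version: int    # increases whenever the sequence changes


def parse_version(identifier: str) -> VersionedAccession:
    """Split an 'accession.version' string into its two parts."""
    accession, sep, version = identifier.strip().rpartition(".")
    if not sep or not accession or not version.isdigit():
        raise ValueError(f"not in accession.version format: {identifier!r}")
    return VersionedAccession(accession, int(version))


def is_update_of(newer: VersionedAccession, older: VersionedAccession) -> bool:
    """True if both refer to the same record and 'newer' has a higher version."""
    return newer.accession == older.accession and newer.version > older.version


first = parse_version("AB000001.1")   # made-up identifier
second = parse_version("AB000001.2")  # same record after a sequence change
print(second.accession, second.version)
print(is_update_of(second, first))
```

Splitting on the last dot keeps the accession intact even if it contains other punctuation, and comparing versions as integers avoids ordering "10" before "2".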
  • 24. 6. Keyword: defined words that are used to index the entries. 7. Source: describes the organism from which the sequences were obtained. 8. Organism: the scientific name (usually genus and species) and phylogenetic lineage. 9. REFERENCE: citations of publications by the sequence authors, including the journal in which the sequence was reported.
  • 25. 10. Features: information derived from the sequence, such as biological source, exon, intron, promoters, CDS, alternate splice, Base Count, and Origin.
  • 26. European Molecular Biology Laboratory (EMBL): The EMBL Nucleotide Sequence Database is maintained by the European Bioinformatics Institute (EBI), UK. EMBL itself was founded in 1974. It develops and maintains a large number of databases, and scientists can access the data free of cost. This database serves as the primary source of nucleotide sequences for Europe. Nucleotide sequence data generated by large-scale genome-sequencing projects, and those available from the European Patent Office, can be submitted to this database.
  • 27. Conti… Data collection is done in collaboration with GenBank (USA) and the DNA Data Bank of Japan (DDBJ). The other genomic databases held at EBI are Ensembl (a database of genome annotation) and Genome Reviews. Daily releases of the database contain new submissions and updated sequence data, while the entire database is released every 3 months.
  • 28. DDBJ: The DNA Data Bank of Japan is a biological database that collects DNA sequences submitted by researchers. It is run by the National Institute of Genetics, Japan. DDBJ Flat File Format: Data submitted to DDBJ is managed and retrieved according to the DDBJ (flat-file) format. The flat file includes the sequence, information about who submitted the data, references, source organisms, feature information, etc.
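Reading such a flat file can be sketched as follows. The record below is invented and heavily simplified: it assumes each field starts with an upper-case keyword in the first column, continuation lines are indented, and `//` ends the record. Real DDBJ and GenBank files carry more fields and stricter column rules, so this is an illustration rather than a full parser.

```python
from typing import Dict, Optional

# Made-up, simplified record used only for illustration.
SAMPLE_RECORD = """\
LOCUS       DEMO0001
DEFINITION  Hypothetical example record.
ACCESSION   DEMO0001
SOURCE      Escherichia coli
ORIGIN
        1 atggccattg taatgggccg
//
"""


def parse_flat_record(text: str) -> Dict[str, str]:
    """Collect keyword-led fields; indented lines continue the previous field."""
    fields: Dict[str, str] = {}
    current: Optional[str] = None
    for line in text.splitlines():
        if line.startswith("//"):   # end-of-record marker
            break
        if not line.strip():        # skip blank lines
            continue
        if not line[0].isspace():   # a new field starts in the first column
            keyword, _, value = line.partition(" ")
            current = keyword
            fields[current] = value.strip()
        elif current is not None:   # continuation of the current field
            fields[current] = (fields[current] + " " + line.strip()).strip()
    return fields


def extract_sequence(origin_block: str) -> str:
    """Drop position numbers and spaces, keeping only the residue letters."""
    return "".join(ch for ch in origin_block if ch.isalpha()).upper()


record = parse_flat_record(SAMPLE_RECORD)
sequence = extract_sequence(record["ORIGIN"])
print(record["SOURCE"])
print(sequence, len(sequence))
```

Because everything lives in one text file, retrieving a single field means scanning the record line by line, which is the main practical difference from the table lookups of a relational database.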
  • 29. Ensembl Genome Database: Ensembl is one of several well-known genome browsers for retrieving genomic information from several organisms, including humans, plants, bacteria and animals. It was created and is maintained by the EBI and the Sanger Centre (UK).
  • 30. Databases for green plants: There are three comparative genomic databases for green plants, namely GreenPhylDB, Plaza, and Phytozome. These databases aim to support genomics studies related to plant evolution and to provide comparative data on genomes and gene families, along with the tools for their analysis.
  • 31. Conti….. It provides information on the genomic context of plant genes, gene homologues and paralogues, RNA transcripts from the given genes, peptide sequences, and functions of gene families. It allows access to the complete genome sequences available in the database.
  • 32. Protein Databases. Swiss-Prot: Swiss-Prot is a protein sequence and knowledge database. It is well known for its high quality of annotation, use of standardized nomenclature, and links to specialized databases. Its repository contains the amino acid sequence, the protein name and description, taxonomic data, and citation information. Pfam: A database of protein families, Pfam contains annotations as well as multiple sequence alignments generated using hidden Markov models.
  • 33. Conti… TrEMBL: The European Bioinformatics Institute, collaborating with Swiss-Prot, introduced another database, TrEMBL (translation of the EMBL nucleotide sequence database). This database consists of computer-annotated entries obtained from the translation of all coding sequences in the nucleotide databases. PIR: The Protein Information Resource (PIR) is an integrated public bioinformatics resource that supports genomic and proteomic research and scientific studies. The PIR serves the scientific community by providing on-line access and by performing off-line sequence identification services for researchers. It is a database of freely accessible protein sequences that contains high-quality data and functional information for the proteins.
  • 34. Structure databases: There are many structural databases, including the Protein Data Bank (PDB), which is important in solving real problems in molecular biology. The PDB was established in 1971 at Brookhaven National Laboratory (BNL). It contains structural information on macromolecules determined by X-ray crystallography and NMR methods. The PDB is maintained by the Research Collaboratory for Structural Bioinformatics (RCSB).
  • 35. Conti… PROSITE: a database of protein domains and families. PROSITE contains biologically significant sites, patterns and profiles that help to reliably identify which known protein family a new sequence belongs to. CATH: The CATH database (Class, Architecture, Topology, Homologous superfamily) is a hierarchical classification of protein domain structures, which clusters proteins at four major structural levels.
  • 36. Pathway databases: A pathway database (DB) is a DB that describes biochemical pathways, reactions, and enzymes. Some examples of pathway databases are KEGG (the Kyoto Encyclopedia of Genes and Genomes), BRENDA, and BioCyc.
  • 37. Conti… KEGG: The Kyoto Encyclopedia of Genes and Genomes (KEGG) is the primary resource of the Japanese GenomeNet service. It is a collection of online databases dealing with genomes, enzymatic pathways, and biological chemicals. KEGG contains three databases: PATHWAY, GENES, and LIGAND. The PATHWAY database stores computerized knowledge on molecular interaction networks. The GENES database contains data concerning sequences of genes and proteins generated by the genome projects. The LIGAND database holds information about the chemical compounds and chemical reactions that are relevant to cellular processes.
  • 38. BioCyc: The BioCyc Database Collection is a compilation of pathway and genome information for different organisms. It includes two other databases: EcoCyc, which describes Escherichia coli K-12, and MetaCyc, which describes pathways for more than 300 organisms.