10/11/2017
Biological Databases
Dr. Ayaz Ahmad
Biological databases
1. Biological information and databases
– Overview and definition, types of biological databases
2. Popular databases, records, data format
– GenBank, Swiss-Prot, OMIM, PDB, KEGG, BIND, Pfam, PROSITE, PubMed
3. Accessing biological databases, retrieval systems
– Entrez, SRS
4. Searching biological databases
– Data quality, coverage, redundancy, errors
Textbook:
– T.K. Attwood and D.J. Parry-Smith, Introduction to Bioinformatics.
Biological databases: chapters 3 and 4
Biological Information
Nucleic acids:
• DNA sequence, genes, gene products (proteins), mutation,
gene coding, distribution patterns, motifs
• Genomics: genome, gene structure and expression, genetic
map, genetic disorder
• RNA sequence, secondary structure, 3D structure,
interactions
Proteins:
• Protein sequence, corresponding gene, secondary structure,
3D structure, function, motifs, homology, interactions
• Proteomics: expression profiles, proteins in disease processes, etc.
• Ligands and drugs (inhibitors, activators, substrates,
metabolites)
Biological Information
Pathways:
• Molecular networks, biological chain events,
regulation, feedback, kinetic data
Function:
• Binding sites, interactions, molecular action
(binding, chemical reaction, etc.)
• Biological effect (signaling, transport, feedback,
regulation, modification, etc.)
• Functional relationship, protein families, motifs, and
homologs
WHAT IS A DATABASE?
• Structured collection of information.
• Consists of basic units called records or entries.
• Each record consists of fields, which hold pre-defined
data related to the record.
• For example, a protein database would have protein
entries as records and protein properties as fields (e.g.,
name of protein, length, amino-acid sequence)
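To make the record/field idea concrete, here is a minimal Python sketch of a tiny protein "database": each record is one entry, and its attributes are the fields. The identifiers, names, and sequences are hypothetical placeholders, and a Python dict stands in for a real database engine.

```python
from dataclasses import dataclass


@dataclass
class ProteinRecord:
    """One record (entry); each attribute is a field."""
    name: str
    sequence: str  # amino-acid sequence in one-letter codes

    @property
    def length(self) -> int:
        # Derived from the sequence, so the two fields can never disagree.
        return len(self.sequence)


# Hypothetical records keyed by a unique identifier (accession-like key).
database = {
    "P00001": ProteinRecord(name="Example protein A", sequence="MKTAYIAKQR"),
    "P00002": ProteinRecord(name="Example protein B", sequence="MSLLTEVETP"),
}


def find_by_name(db, text):
    """Return the identifiers of records whose name field contains `text` (case-insensitive)."""
    return [acc for acc, rec in db.items() if text.lower() in rec.name.lower()]
```

For example, `find_by_name(database, "protein b")` returns `["P00002"]`, and `database["P00001"].length` is 10.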
THE ‘PERFECT’ DATABASE
• Comprehensive, but easy to search.
• Annotated, but not “too annotated”.
• A simple, easy to understand structure.
• Cross-referenced.
• Minimum redundancy.
• Easy retrieval of data.
Biological databases
Purpose
1. To disseminate biological data and information
2. To provide biological data in computer-readable form
3. To allow analysis of biological data
TYPES OF MOLECULAR DATABASES
• Primary Databases
– Original submissions by experimentalists
– Content controlled by the submitter
• Examples: GenBank, Trace, SRA, SNP, GEO
• Derived Databases
– Derived from primary data
– Content controlled by third party (e.g. NCBI)
• Examples: NCBI Protein, RefSeq, TPA, RefSNP, GEO datasets, UniGene, HomoloGene, Structure, Conserved Domain
PRIMARY VS. DERIVED SEQUENCE DATABASES
[Figure: Sequencing centers and labs submit sequences (e.g., TATAGCCG) to GenBank, the primary database, which is updated ONLY by submitters. Algorithms and curators build the derived databases UniGene, RefSeq, and genome assemblies from these data; the derived databases are updated continually by NCBI.]
Types of Biological Databases
• Sequence databases
• Structural databases
• Bibliographic databases
• Clinical databases
• Integrated databases
“Ten Important Bioinformatics Databases”
• GenBank (www.ncbi.nlm.nih.gov): nucleotide sequences
• Ensembl (www.ensembl.org): human/mouse genome (and others)
• PubMed (www.ncbi.nlm.nih.gov): literature references
• NR (www.ncbi.nlm.nih.gov): protein sequences
• SWISS-PROT (www.expasy.ch): protein sequences
• InterPro (www.ebi.ac.uk): protein domains
• OMIM (www.ncbi.nlm.nih.gov): genetic diseases
• Enzymes (www.chem.qmul.ac.uk): enzymes
• PDB (www.rcsb.org/pdb/): protein structures
• KEGG (www.genome.ad.jp): metabolic pathways
Source: Bioinformatics for Dummies
GenBank
http://www.ncbi.nih.gov/Genbank/
GenBank database
(http://www.ncbi.nih.gov/Genbank/)
– Contains publicly available DNA sequences from more than
100,000 organisms.
– Also contains derived protein sequences, and annotations
describing biological, structural, and other relevant features.
– Accessible through Entrez, NCBI’s integrated retrieval system
– Sequence similarity search tools: BLAST
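Programmatic access to Entrez goes through NCBI's E-utilities. The sketch below builds an EFetch request that asks the nucleotide database (`db=nuccore`) for a record as a GenBank flat file (`rettype=gb`, `retmode=text`). The accession used in the example is a hypothetical placeholder; check the E-utilities documentation for current parameters and request-rate limits before running the download function at scale.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def efetch_url(accession, db="nuccore", rettype="gb", retmode="text"):
    """Build an EFetch URL that returns one record as a GenBank flat file."""
    params = {"db": db, "id": accession, "rettype": rettype, "retmode": retmode}
    return EUTILS_BASE + "efetch.fcgi?" + urlencode(params)


def fetch_genbank(accession):
    """Download the flat file (requires network access; respect NCBI usage limits)."""
    with urlopen(efetch_url(accession)) as response:
        return response.read().decode("utf-8")
```

`efetch_url("ABC123")` (hypothetical accession) produces `https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nuccore&id=ABC123&rettype=gb&retmode=text`; only `fetch_genbank` touches the network.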
GenBank
• Annotated collection of all publicly
available nucleotide sequences and their
protein translations.
• Receives sequences produced in
laboratories throughout the world from
more than 100,000 distinct organisms.
• Grows exponentially, doubling every 10
months
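Taking the slide's 10-month doubling time at face value (a historical estimate, not a current figure), the growth factor after t months is 2^(t/10), and the time to grow by a factor f is 10 × log2(f) months. A short Python sketch of that arithmetic:

```python
import math

DOUBLING_TIME_MONTHS = 10.0  # figure quoted on the slide; treat as a historical estimate


def growth_factor(months, doubling_time=DOUBLING_TIME_MONTHS):
    """How many times larger the database is after `months` of exponential growth."""
    return 2 ** (months / doubling_time)


def months_to_grow(factor, doubling_time=DOUBLING_TIME_MONTHS):
    """Months needed to grow by `factor` (the inverse of growth_factor)."""
    return doubling_time * math.log2(factor)
```

Under this assumption, five years (60 months) means six doublings, a 64-fold increase, and a 1,000-fold increase takes roughly 100 months.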
GENBANK - PRIMARY SEQUENCE DB
http://www.ncbi.nlm.nih.gov/genbank/
• Nucleotide-only sequence database
• Archival in nature
– Historical
– Reflective of submitter point of view
– Redundant
• Data
– Direct submissions
– Batch submissions
– FTP accounts (genome data)
GenBank
• Data are shared nightly among three collaborating databases:
– GenBank at NCBI
– DNA Data Bank of Japan (DDBJ)
– EMBL at EBI
The International Nucleotide Sequence Database Collaboration (INSDC)
(Source: NCBI)
GenBank Release 220
June 2017
• Full release every two months
• Incremental and cumulative updates daily
• Available only over the Internet:
ftp://ftp.ncbi.nih.gov/genbank/
GenBank Record
➢ Header
information that applies to the whole record
➢ Features
annotations on the record
➢ Sequence
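The three parts of a flat file can be separated by their section keywords: the header runs from LOCUS up to FEATURES, the feature table runs up to ORIGIN, and the sequence runs up to the `//` terminator. The Python sketch below assumes a single, well-formed record; `TOY_RECORD` is a hypothetical, heavily trimmed example, and for real files a dedicated parser is the safer choice.

```python
TOY_RECORD = """\
LOCUS       TOY0001                   20 bp    DNA     linear   SYN 01-JAN-2000
DEFINITION  Hypothetical toy record for illustration.
FEATURES             Location/Qualifiers
     source          1..20
ORIGIN
        1 tatagccgta gctccgatac
//
"""


def split_genbank(flat_text):
    """Split one GenBank flat-file record into (header, features, sequence)."""
    header, features, seq_lines = [], [], []
    section = "header"
    for line in flat_text.splitlines():
        if line.startswith("FEATURES"):
            section = "features"      # the FEATURES line itself belongs to the feature table
        elif line.startswith("ORIGIN"):
            section = "sequence"
            continue                  # ORIGIN only marks the start of the sequence
        elif line.startswith("//"):
            break                     # end of record
        if section == "header":
            header.append(line)
        elif section == "features":
            features.append(line)
        else:
            seq_lines.append(line)
    # Sequence lines look like "        1 tatagccgta ...": keep letters, drop positions and spaces.
    sequence = "".join(ch for line in seq_lines for ch in line if ch.isalpha()).upper()
    return "\n".join(header), "\n".join(features), sequence
```

Applied to `TOY_RECORD`, the function returns a two-line header, a two-line feature table, and the 20-base sequence `TATAGCCGTAGCTCCGATAC`.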
GenBank Record: Header
[Figure: annotated header of a GenBank record. The LOCUS line gives the locus name, sequence length, molecule type, GenBank division, and modification date; the header also gives the accession number and version number.]
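The LOCUS line packs several header fields into one line. A minimal Python sketch that splits it on whitespace, assuming the common nucleotide layout of eight tokens (LOCUS, name, length, `bp`, molecule type, topology, division, date); protein (GenPept) records use `aa` and a different layout and are rejected here. The accession and version come from the separate ACCESSION and VERSION lines and are not parsed by this sketch. `TOY0001` is a hypothetical locus name.

```python
from typing import NamedTuple


class LocusLine(NamedTuple):
    name: str
    length_bp: int
    molecule_type: str
    topology: str
    division: str
    modification_date: str


def parse_locus(line):
    """Parse a nucleotide LOCUS line, assuming the 8-token layout described above."""
    fields = line.split()
    if len(fields) != 8 or fields[0] != "LOCUS" or fields[3] != "bp":
        raise ValueError("not a nucleotide LOCUS line in the expected layout")
    return LocusLine(
        name=fields[1],
        length_bp=int(fields[2]),
        molecule_type=fields[4],
        topology=fields[5],
        division=fields[6],
        modification_date=fields[7],
    )
```

For example, `parse_locus("LOCUS       TOY0001   20 bp    DNA     linear   SYN 01-JAN-2000")` yields the name `TOY0001`, length 20, molecule type `DNA`, division `SYN`, and modification date `01-JAN-2000`.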
GenBank Record: Features
[Figure: feature table of a GenBank record, with a link to the sequence.]
GenBank Record: Sequence
Direct Submission
• A typical GenBank submission consists of
a single, contiguous stretch of DNA or
RNA sequence (contigs) with annotations
(metadata).
• If part of a nucleotide sequence encodes a
protein, a conceptual translation, called a
CDS (coding sequence) is annotated.
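A conceptual translation reads the CDS codon by codon and maps each codon to an amino acid. The Python sketch below uses only the standard genetic code (NCBI translation table 1); real CDS features may specify another table via `/transl_table` or a reading-frame offset via `/codon_start`, and alternative start codons are rendered as M in annotations. This sketch handles none of those cases.

```python
BASES = "TCAG"
# Standard genetic code (NCBI table 1), codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate_cds(cds, to_stop=True):
    """Translate a coding sequence in frame 1; '*' marks a stop codon."""
    seq = cds.upper().replace("U", "T")  # accept RNA as well as DNA
    protein = []
    for pos in range(0, len(seq) - 2, 3):  # trailing partial codons are ignored
        aa = CODON_TABLE.get(seq[pos:pos + 3], "X")  # X for ambiguous codons (e.g. containing N)
        if aa == "*" and to_stop:
            break
        protein.append(aa)
    return "".join(protein)
```

`translate_cds("ATGGCCTAA")` gives `MA`, while `translate_cds("ATGGCCTAA", to_stop=False)` keeps the stop as `MA*`.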
High-Throughput Genomic
Sequence (HTGS)
• HTGS entries are submitted in bulk by
genome centers, processed by an
automated system, and then released to
GenBank.
• Currently, more than 30 genome centers
are submitting data for a number of
organisms, including human, mouse, rat,
rice, and Plasmodium falciparum.
Whole Genome Shotgun
Sequences (WGS)
• Shotgun sequence reads are assembled into contigs,
submitted, and updated as the sequencing project
progresses and new assemblies are computed.
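To illustrate what "assembling reads into contigs" means, here is a deliberately simplified greedy overlap merge in Python: repeatedly join the two fragments with the longest suffix/prefix overlap until no overlap of at least `min_len` remains. This is a teaching sketch, not how production WGS assemblers work (they use overlap or de Bruijn graphs and must handle sequencing errors, repeats, reverse-strand reads, and reads contained in other reads, none of which this code addresses). The reads in the example are hypothetical.

```python
def overlap(a, b, min_len):
    """Length of the longest suffix of `a` equal to a prefix of `b` (at least min_len), else 0."""
    start = 0
    while True:
        start = a.find(b[:min_len], start)  # candidate positions where b's prefix occurs in a
        if start == -1:
            return 0
        if b.startswith(a[start:]):         # the whole suffix of a from here must match b
            return len(a) - start
        start += 1


def greedy_assemble(reads, min_len=3):
    """Merge reads greedily by largest overlap; returns the resulting contigs."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicate reads, keep order
    while True:
        best_len, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    olen = overlap(a, b, min_len)
                    if olen > best_len:
                        best_len, best_i, best_j = olen, i, j
        if best_len == 0:
            return contigs  # no remaining overlaps: each fragment is a contig
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)
```

With the hypothetical reads `["ATGCGTAC", "GTACGTTA", "CGTTAGC"]`, the second and third reads merge first (overlap 5), and the result then merges with the first read, giving the single contig `ATGCGTACGTTAGC`.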
Submission Tools
• BankIt: Web-based form for submission of
a small number of sequences with minimal
annotation to GenBank.
• Sequin: More appropriate for complicated
submissions containing a significant
amount of annotation or many sequences.
Sequence Data Flow and
Processing
• Within 48 hours of direct submission with BankIt or Sequin,
the database staff reviews the submission to determine
whether it meets the minimal criteria and then assigns an
Accession number.
– All sequences must be > 50 bp in length and be sequenced by,
or on behalf of, the group submitting the sequence.
– GenBank will not accept sequences constructed in silico
– GenBank will not accept noncontiguous sequences containing
internal, unsequenced spacers.
– GenBank will not accept sequences for which there is no
physical counterpart, such as those derived from a mix of
genomic DNA and mRNA.
– Submissions are checked to determine whether they are new or
updates.
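Only some of these criteria can be checked from the sequence itself; whether a sequence was built in silico, has a physical counterpart, or updates an existing entry requires review of the submission. The Python sketch below screens the mechanical parts: the > 50 bp length rule from the slide, an allowed alphabet (IUPAC nucleotide codes, an assumption), and a long internal run of N, used here as a hypothetical heuristic for an unsequenced spacer. `screen_submission` and its `spacer_run` threshold are illustrative names, not GenBank's actual validation code.

```python
import re

MIN_LENGTH_BP = 50  # from the slide: sequences must be > 50 bp
VALID_BASES = set("ACGTRYSWKMBDHVN")  # IUPAC nucleotide codes (assumption)


def screen_submission(seq, spacer_run=10):
    """Return a list of problems found; an empty list means the basic checks passed."""
    problems = []
    s = seq.upper()
    if len(s) <= MIN_LENGTH_BP:
        problems.append(f"too short: {len(s)} bp (must be > {MIN_LENGTH_BP} bp)")
    bad = sorted(set(s) - VALID_BASES)
    if bad:
        problems.append("invalid characters: " + "".join(bad))
    # Heuristic (assumption): a long run of N flanked by real bases may be an unsequenced spacer.
    if re.search(f"[^N]N{{{spacer_run},}}[^N]", s):
        problems.append(f"internal run of >= {spacer_run} N (possible unsequenced spacer)")
    return problems
```

An 80 bp sequence of plain A/C/G/T passes with an empty list; a 20 bp sequence is flagged as too short.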
Sequence Data Flow and
Processing
• Indexing:
– Biological validity: Translation, organism lineage, BLAST
searches
– Vector contamination: Is there any vector DNA present in the
sequence?
– Publication status: If published, citation is included in annotation
and linked to Entrez
– Formatting and spelling
• Sequences are sent to submitter for final review before
release into the public database.
• Sequences must become publicly available once the
accession number or the sequence has been published.
• GenBank annotation staff process about 1,900 submissions per month, comprising about 20,000 sequences.
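The vector-contamination question ("Is there any vector DNA present in the sequence?") can be illustrated with exact k-mer matching. NCBI's actual screen (VecScreen) uses BLAST against the UniVec vector database; the Python sketch below is a simplified stand-in that only finds exact shared 12-mers on either strand. `HYPOTHETICAL_VECTOR` is an invented fragment, not a real vector sequence.

```python
HYPOTHETICAL_VECTOR = "GATCCGGCATGCAGCTGGATCCGCGC"  # invented fragment for illustration

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def kmers(seq, k):
    """Set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def vector_hits(query, vector, k=12):
    """Start positions in query of k-mers that also occur in the vector (either strand)."""
    query, vector = query.upper(), vector.upper()
    vec = kmers(vector, k) | kmers(reverse_complement(vector), k)
    return [i for i in range(len(query) - k + 1) if query[i:i + k] in vec]


def contaminated_fraction(query, vector, k=12):
    """Fraction of query bases covered by at least one vector-matching k-mer."""
    if not query:
        return 0.0
    covered = set()
    for i in vector_hits(query, vector, k):
        covered.update(range(i, i + k))
    return len(covered) / len(query)
```

A query that ends in the vector fragment (in either orientation) reports most of its length as covered, while a query sharing no 12-mer with the vector reports 0.0.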
Essential Bioinformatics and Biocomputing (LSM2104), NUS
DNA databases
• An example from GenBank: the flat file
– Human alpha-lactalbumin gene
This protein is a complex of two proteins, A and B. In the absence of the B protein, the enzyme catalyzes the transfer of galactose from UDP-galactose to N-acetylglucosamine (cf. EC 2.4.1.90).
A GenBank entry – HEADER
GenBank Entry – Links provided in the Header
• MapViewer – find the gene's position on the chromosome
• Related Sequences – other entries related to this gene (or sequence)
• OMIM – link to the catalog of human genes and genetic disorders
• Protein – retrieve the protein record from GenPept
• Medline and PubMed – literature abstracts related to this gene
• Taxonomy – classification of organisms
• UniGene – unified gene data
• UniSTS – unified sequence tagged sites, marker and mapping data
• LinkOut – links to publishers, aggregators, libraries, biological databases, sequence centers, and other Web resources
• RefSeq – reference sequence standards
Note: These links are representative. Other links may also be found in GenBank
entries.
GenBank entry - FEATURES
GenBank - SEQUENCE

More Related Content

What's hot

Ncbi
NcbiNcbi
Presentation on Biological database By Elufer Akram @ University Of Science ...
Presentation on Biological database  By Elufer Akram @ University Of Science ...Presentation on Biological database  By Elufer Akram @ University Of Science ...
Presentation on Biological database By Elufer Akram @ University Of Science ...
Elufer Akram
 
Multiple sequence alignment
Multiple sequence alignmentMultiple sequence alignment
Multiple sequence alignment
Subhranil Bhattacharjee
 
Protein data bank
Protein data bankProtein data bank
Protein data bank
Yogesh Joshi
 
The ensembl database
The ensembl databaseThe ensembl database
The ensembl database
Ashfaq Ahmad
 
Tools and database of NCBI
Tools and database of NCBITools and database of NCBI
Tools and database of NCBI
Santosh Kumar Sahoo
 
Gen bank databases
Gen bank databasesGen bank databases
Gen bank databases
Hafiz Muhammad Zeeshan Raza
 
EMBL- European Molecular Biology Laboratory
EMBL- European Molecular Biology LaboratoryEMBL- European Molecular Biology Laboratory
Proteomic databases
Proteomic databasesProteomic databases
NCBI National Center for Biotechnology Information
NCBI National Center for Biotechnology InformationNCBI National Center for Biotechnology Information
NCBI National Center for Biotechnology Information
Thapar Institute of Engineering & Technology, Patiala, Punjab, India
 
Bioinformatics on internet
Bioinformatics on internetBioinformatics on internet
Bioinformatics on internet
Bahauddin Zakariya University lahore
 
Ddbj
DdbjDdbj
Protein information resource (PIR)
Protein information resource (PIR)Protein information resource (PIR)
Protein information resource (PIR)
ShivaniShewale2
 
Rasmol
RasmolRasmol
Prosite
PrositeProsite
UniProt
UniProtUniProt
UniProt
AmnaA7
 
Protein databases
Protein databasesProtein databases
Protein databasessarumalay
 
EMBL
EMBLEMBL

What's hot (20)

Ncbi
NcbiNcbi
Ncbi
 
Presentation on Biological database By Elufer Akram @ University Of Science ...
Presentation on Biological database  By Elufer Akram @ University Of Science ...Presentation on Biological database  By Elufer Akram @ University Of Science ...
Presentation on Biological database By Elufer Akram @ University Of Science ...
 
Multiple sequence alignment
Multiple sequence alignmentMultiple sequence alignment
Multiple sequence alignment
 
Homology
HomologyHomology
Homology
 
NCBI
NCBINCBI
NCBI
 
Protein data bank
Protein data bankProtein data bank
Protein data bank
 
The ensembl database
The ensembl databaseThe ensembl database
The ensembl database
 
Tools and database of NCBI
Tools and database of NCBITools and database of NCBI
Tools and database of NCBI
 
Gen bank databases
Gen bank databasesGen bank databases
Gen bank databases
 
EMBL- European Molecular Biology Laboratory
EMBL- European Molecular Biology LaboratoryEMBL- European Molecular Biology Laboratory
EMBL- European Molecular Biology Laboratory
 
Proteomic databases
Proteomic databasesProteomic databases
Proteomic databases
 
NCBI National Center for Biotechnology Information
NCBI National Center for Biotechnology InformationNCBI National Center for Biotechnology Information
NCBI National Center for Biotechnology Information
 
Bioinformatics on internet
Bioinformatics on internetBioinformatics on internet
Bioinformatics on internet
 
Ddbj
DdbjDdbj
Ddbj
 
Protein information resource (PIR)
Protein information resource (PIR)Protein information resource (PIR)
Protein information resource (PIR)
 
Rasmol
RasmolRasmol
Rasmol
 
Prosite
PrositeProsite
Prosite
 
UniProt
UniProtUniProt
UniProt
 
Protein databases
Protein databasesProtein databases
Protein databases
 
EMBL
EMBLEMBL
EMBL
 

Similar to Biological databases

Kim Pruitt trainingbiocuration2015
Kim Pruitt trainingbiocuration2015Kim Pruitt trainingbiocuration2015
Kim Pruitt trainingbiocuration2015
Kim D. Pruitt
 
bioinfomatics
bioinfomaticsbioinfomatics
bioinfomatics
nguyenpg
 
Genome resource databases in horticutural crops
Genome resource databases in horticutural cropsGenome resource databases in horticutural crops
Genome resource databases in horticutural cropsPulipati Gangadhara Rao
 
BIOLOGICAL SEQUENCE DATABASES
BIOLOGICAL SEQUENCE DATABASES BIOLOGICAL SEQUENCE DATABASES
BIOLOGICAL SEQUENCE DATABASES nadeem akhter
 
Proteins databases
Proteins databasesProteins databases
Proteins databases
Hafiz Muhammad Zeeshan Raza
 
Storing and Accessing Information. Databases and Queries (UEB-UAT Bioinformat...
Storing and Accessing Information. Databases and Queries (UEB-UAT Bioinformat...Storing and Accessing Information. Databases and Queries (UEB-UAT Bioinformat...
Storing and Accessing Information. Databases and Queries (UEB-UAT Bioinformat...
VHIR Vall d’Hebron Institut de Recerca
 
Informal presentation on bioinformatics
Informal presentation on bioinformaticsInformal presentation on bioinformatics
Informal presentation on bioinformaticsAtai Rabby
 
Role of bioinformatics in life sciences research
Role of bioinformatics in life sciences researchRole of bioinformatics in life sciences research
Role of bioinformatics in life sciences research
Anshika Bansal
 
Biological Database (1)pptxpdfpdfpdf.pdf
Biological Database (1)pptxpdfpdfpdf.pdfBiological Database (1)pptxpdfpdfpdf.pdf
Biological Database (1)pptxpdfpdfpdf.pdf
BioinformaticsCentre
 
Bioinformatics مي.pdf
Bioinformatics  مي.pdfBioinformatics  مي.pdf
Bioinformatics مي.pdf
nedalalazzwy
 
DNA Sequence Data in Big Data Perspective
DNA Sequence Data in Big Data PerspectiveDNA Sequence Data in Big Data Perspective
DNA Sequence Data in Big Data PerspectivePalaniappan SP
 
DATABASES...............................pptx
DATABASES...............................pptxDATABASES...............................pptx
DATABASES...............................pptx
Cherry
 
Exploring DNA/RNA-Seq Analysis Results with Golden Helix GenomeBrowse and SVS
Exploring DNA/RNA-Seq Analysis Results with Golden Helix GenomeBrowse and SVSExploring DNA/RNA-Seq Analysis Results with Golden Helix GenomeBrowse and SVS
Exploring DNA/RNA-Seq Analysis Results with Golden Helix GenomeBrowse and SVS
Golden Helix Inc
 
02. Biological sequence databases.pptx
02. Biological sequence databases.pptx02. Biological sequence databases.pptx
02. Biological sequence databases.pptx
HussainTaqi1
 
Databases_L2.pptx
Databases_L2.pptxDatabases_L2.pptx
Databases_L2.pptx
kigaruantony
 
Data Base in Bioinformatics.ppt
Data Base in Bioinformatics.pptData Base in Bioinformatics.ppt
Data Base in Bioinformatics.ppt
Bangaluru
 
biological databases.pptx
biological databases.pptxbiological databases.pptx
biological databases.pptx
science lover
 
Ncbi basic intro_v_pitt_kent_osu
Ncbi basic intro_v_pitt_kent_osuNcbi basic intro_v_pitt_kent_osu
Ncbi basic intro_v_pitt_kent_osu
Ben Busby
 
Genomic Big Data Management, Integration and Mining - Emanuel Weitschek
Genomic Big Data Management, Integration and Mining - Emanuel WeitschekGenomic Big Data Management, Integration and Mining - Emanuel Weitschek
Genomic Big Data Management, Integration and Mining - Emanuel Weitschek
Data Driven Innovation
 

Similar to Biological databases (20)

Kim Pruitt trainingbiocuration2015
Kim Pruitt trainingbiocuration2015Kim Pruitt trainingbiocuration2015
Kim Pruitt trainingbiocuration2015
 
Intro to databases
Intro to databasesIntro to databases
Intro to databases
 
bioinfomatics
bioinfomaticsbioinfomatics
bioinfomatics
 
Genome resource databases in horticutural crops
Genome resource databases in horticutural cropsGenome resource databases in horticutural crops
Genome resource databases in horticutural crops
 
BIOLOGICAL SEQUENCE DATABASES
BIOLOGICAL SEQUENCE DATABASES BIOLOGICAL SEQUENCE DATABASES
BIOLOGICAL SEQUENCE DATABASES
 
Proteins databases
Proteins databasesProteins databases
Proteins databases
 
Storing and Accessing Information. Databases and Queries (UEB-UAT Bioinformat...
Storing and Accessing Information. Databases and Queries (UEB-UAT Bioinformat...Storing and Accessing Information. Databases and Queries (UEB-UAT Bioinformat...
Storing and Accessing Information. Databases and Queries (UEB-UAT Bioinformat...
 
Informal presentation on bioinformatics
Informal presentation on bioinformaticsInformal presentation on bioinformatics
Informal presentation on bioinformatics
 
Role of bioinformatics in life sciences research
Role of bioinformatics in life sciences researchRole of bioinformatics in life sciences research
Role of bioinformatics in life sciences research
 
Biological Database (1)pptxpdfpdfpdf.pdf
Biological Database (1)pptxpdfpdfpdf.pdfBiological Database (1)pptxpdfpdfpdf.pdf
Biological Database (1)pptxpdfpdfpdf.pdf
 
Bioinformatics مي.pdf
Bioinformatics  مي.pdfBioinformatics  مي.pdf
Bioinformatics مي.pdf
 
DNA Sequence Data in Big Data Perspective
DNA Sequence Data in Big Data PerspectiveDNA Sequence Data in Big Data Perspective
DNA Sequence Data in Big Data Perspective
 
DATABASES...............................pptx
DATABASES...............................pptxDATABASES...............................pptx
DATABASES...............................pptx
 
Exploring DNA/RNA-Seq Analysis Results with Golden Helix GenomeBrowse and SVS
Exploring DNA/RNA-Seq Analysis Results with Golden Helix GenomeBrowse and SVSExploring DNA/RNA-Seq Analysis Results with Golden Helix GenomeBrowse and SVS
Exploring DNA/RNA-Seq Analysis Results with Golden Helix GenomeBrowse and SVS
 
02. Biological sequence databases.pptx
02. Biological sequence databases.pptx02. Biological sequence databases.pptx
02. Biological sequence databases.pptx
 
Databases_L2.pptx
Databases_L2.pptxDatabases_L2.pptx
Databases_L2.pptx
 
Data Base in Bioinformatics.ppt
Data Base in Bioinformatics.pptData Base in Bioinformatics.ppt
Data Base in Bioinformatics.ppt
 
biological databases.pptx
biological databases.pptxbiological databases.pptx
biological databases.pptx
 
Ncbi basic intro_v_pitt_kent_osu
Ncbi basic intro_v_pitt_kent_osuNcbi basic intro_v_pitt_kent_osu
Ncbi basic intro_v_pitt_kent_osu
 
Genomic Big Data Management, Integration and Mining - Emanuel Weitschek
Genomic Big Data Management, Integration and Mining - Emanuel WeitschekGenomic Big Data Management, Integration and Mining - Emanuel Weitschek
Genomic Big Data Management, Integration and Mining - Emanuel Weitschek
 

More from Ashfaq Ahmad

10000 plus English Vocabulary
10000 plus English Vocabulary10000 plus English Vocabulary
10000 plus English Vocabulary
Ashfaq Ahmad
 
Personality and psychographics
Personality and psychographicsPersonality and psychographics
Personality and psychographics
Ashfaq Ahmad
 
Affinity chromatography
Affinity chromatographyAffinity chromatography
Affinity chromatography
Ashfaq Ahmad
 
Basics of spectroscopy
Basics of spectroscopyBasics of spectroscopy
Basics of spectroscopy
Ashfaq Ahmad
 
Spectroscopy basics
Spectroscopy basicsSpectroscopy basics
Spectroscopy basics
Ashfaq Ahmad
 
High performance liquid chromatography
High performance liquid chromatographyHigh performance liquid chromatography
High performance liquid chromatography
Ashfaq Ahmad
 
Affinity chromatography and gel filteration
Affinity chromatography and gel filterationAffinity chromatography and gel filteration
Affinity chromatography and gel filteration
Ashfaq Ahmad
 
Rflp presentation
Rflp presentationRflp presentation
Rflp presentation
Ashfaq Ahmad
 
Lecture 11 and 12 microbial_sem_6 (1)
Lecture 11 and 12 microbial_sem_6 (1)Lecture 11 and 12 microbial_sem_6 (1)
Lecture 11 and 12 microbial_sem_6 (1)
Ashfaq Ahmad
 
Lecture 9 and 10 microbial_sem_6
Lecture 9 and 10 microbial_sem_6Lecture 9 and 10 microbial_sem_6
Lecture 9 and 10 microbial_sem_6
Ashfaq Ahmad
 
Lecture 7 and 8 microbial_sem_6_20180307
Lecture 7 and 8 microbial_sem_6_20180307Lecture 7 and 8 microbial_sem_6_20180307
Lecture 7 and 8 microbial_sem_6_20180307
Ashfaq Ahmad
 
Lecture 5 and 6 microbial_sem_6_20180307
Lecture 5 and 6 microbial_sem_6_20180307Lecture 5 and 6 microbial_sem_6_20180307
Lecture 5 and 6 microbial_sem_6_20180307
Ashfaq Ahmad
 
Chromatography basics
Chromatography basicsChromatography basics
Chromatography basics
Ashfaq Ahmad
 
Research methodology notes
Research methodology notesResearch methodology notes
Research methodology notes
Ashfaq Ahmad
 
Lecture 2 microbial_sem_6_20180220
Lecture 2 microbial_sem_6_20180220Lecture 2 microbial_sem_6_20180220
Lecture 2 microbial_sem_6_20180220
Ashfaq Ahmad
 
Lecture 1 microbial_sem_6_20170213
Lecture 1 microbial_sem_6_20170213Lecture 1 microbial_sem_6_20170213
Lecture 1 microbial_sem_6_20170213
Ashfaq Ahmad
 
Western blotting
Western blottingWestern blotting
Western blotting
Ashfaq Ahmad
 
Structural genomics
Structural genomicsStructural genomics
Structural genomics
Ashfaq Ahmad
 
Structural genomics
Structural genomicsStructural genomics
Structural genomics
Ashfaq Ahmad
 
Snp and its role in diseases
Snp and its role in diseasesSnp and its role in diseases
Snp and its role in diseases
Ashfaq Ahmad
 

More from Ashfaq Ahmad (20)

10000 plus English Vocabulary
10000 plus English Vocabulary10000 plus English Vocabulary
10000 plus English Vocabulary
 
Personality and psychographics
Personality and psychographicsPersonality and psychographics
Personality and psychographics
 
Affinity chromatography
Affinity chromatographyAffinity chromatography
Affinity chromatography
 
Basics of spectroscopy
Basics of spectroscopyBasics of spectroscopy
Basics of spectroscopy
 
Spectroscopy basics
Spectroscopy basicsSpectroscopy basics
Spectroscopy basics
 
High performance liquid chromatography
High performance liquid chromatographyHigh performance liquid chromatography
High performance liquid chromatography
 
Affinity chromatography and gel filteration
Affinity chromatography and gel filterationAffinity chromatography and gel filteration
Affinity chromatography and gel filteration
 
Rflp presentation
Rflp presentationRflp presentation
Rflp presentation
 
Lecture 11 and 12 microbial_sem_6 (1)
Lecture 11 and 12 microbial_sem_6 (1)Lecture 11 and 12 microbial_sem_6 (1)
Lecture 11 and 12 microbial_sem_6 (1)
 
Lecture 9 and 10 microbial_sem_6
Lecture 9 and 10 microbial_sem_6Lecture 9 and 10 microbial_sem_6
Lecture 9 and 10 microbial_sem_6
 
Lecture 7 and 8 microbial_sem_6_20180307
Lecture 7 and 8 microbial_sem_6_20180307Lecture 7 and 8 microbial_sem_6_20180307
Lecture 7 and 8 microbial_sem_6_20180307
 
Lecture 5 and 6 microbial_sem_6_20180307
Lecture 5 and 6 microbial_sem_6_20180307Lecture 5 and 6 microbial_sem_6_20180307
Lecture 5 and 6 microbial_sem_6_20180307
 
Chromatography basics
Chromatography basicsChromatography basics
Chromatography basics
 
Research methodology notes
Research methodology notesResearch methodology notes
Research methodology notes
 
Lecture 2 microbial_sem_6_20180220
Lecture 2 microbial_sem_6_20180220Lecture 2 microbial_sem_6_20180220
Lecture 2 microbial_sem_6_20180220
 
Lecture 1 microbial_sem_6_20170213
Lecture 1 microbial_sem_6_20170213Lecture 1 microbial_sem_6_20170213
Lecture 1 microbial_sem_6_20170213
 
Western blotting
Western blottingWestern blotting
Western blotting
 
Structural genomics
Structural genomicsStructural genomics
Structural genomics
 
Structural genomics
Structural genomicsStructural genomics
Structural genomics
 
Snp and its role in diseases
Snp and its role in diseasesSnp and its role in diseases
Snp and its role in diseases
 

Recently uploaded

Honest Reviews of Tim Han LMA Course Program.pptx
Honest Reviews of Tim Han LMA Course Program.pptxHonest Reviews of Tim Han LMA Course Program.pptx
Honest Reviews of Tim Han LMA Course Program.pptx
timhan337
 
The Roman Empire A Historical Colossus.pdf
The Roman Empire A Historical Colossus.pdfThe Roman Empire A Historical Colossus.pdf
The Roman Empire A Historical Colossus.pdf
kaushalkr1407
 
Acetabularia Information For Class 9 .docx
Acetabularia Information For Class 9  .docxAcetabularia Information For Class 9  .docx
Acetabularia Information For Class 9 .docx
vaibhavrinwa19
 
Polish students' mobility in the Czech Republic
Polish students' mobility in the Czech RepublicPolish students' mobility in the Czech Republic
Polish students' mobility in the Czech Republic
Anna Sz.
 
Mule 4.6 & Java 17 Upgrade | MuleSoft Mysore Meetup #46
Mule 4.6 & Java 17 Upgrade | MuleSoft Mysore Meetup #46Mule 4.6 & Java 17 Upgrade | MuleSoft Mysore Meetup #46
Mule 4.6 & Java 17 Upgrade | MuleSoft Mysore Meetup #46
MysoreMuleSoftMeetup
 
Introduction to AI for Nonprofits with Tapp Network
Introduction to AI for Nonprofits with Tapp NetworkIntroduction to AI for Nonprofits with Tapp Network
Introduction to AI for Nonprofits with Tapp Network
TechSoup
 
Phrasal Verbs.XXXXXXXXXXXXXXXXXXXXXXXXXX
Phrasal Verbs.XXXXXXXXXXXXXXXXXXXXXXXXXXPhrasal Verbs.XXXXXXXXXXXXXXXXXXXXXXXXXX
Phrasal Verbs.XXXXXXXXXXXXXXXXXXXXXXXXXX
MIRIAMSALINAS13
 
June 3, 2024 Anti-Semitism Letter Sent to MIT President Kornbluth and MIT Cor...
June 3, 2024 Anti-Semitism Letter Sent to MIT President Kornbluth and MIT Cor...June 3, 2024 Anti-Semitism Letter Sent to MIT President Kornbluth and MIT Cor...
June 3, 2024 Anti-Semitism Letter Sent to MIT President Kornbluth and MIT Cor...
Levi Shapiro
 
Thesis Statement for students diagnonsed withADHD.ppt
Thesis Statement for students diagnonsed withADHD.pptThesis Statement for students diagnonsed withADHD.ppt
Thesis Statement for students diagnonsed withADHD.ppt
EverAndrsGuerraGuerr
 
The Accursed House by Émile Gaboriau.pptx
The Accursed House by Émile Gaboriau.pptxThe Accursed House by Émile Gaboriau.pptx
The Accursed House by Émile Gaboriau.pptx
DhatriParmar
 
Lapbook sobre os Regimes Totalitários.pdf
Lapbook sobre os Regimes Totalitários.pdfLapbook sobre os Regimes Totalitários.pdf
Lapbook sobre os Regimes Totalitários.pdf
Jean Carlos Nunes Paixão
 
How libraries can support authors with open access requirements for UKRI fund...
How libraries can support authors with open access requirements for UKRI fund...How libraries can support authors with open access requirements for UKRI fund...
How libraries can support authors with open access requirements for UKRI fund...
Jisc
 
Francesca Gottschalk - How can education support child empowerment.pptx
Francesca Gottschalk - How can education support child empowerment.pptxFrancesca Gottschalk - How can education support child empowerment.pptx
Francesca Gottschalk - How can education support child empowerment.pptx
EduSkills OECD
 
Overview on Edible Vaccine: Pros & Cons with Mechanism
Overview on Edible Vaccine: Pros & Cons with MechanismOverview on Edible Vaccine: Pros & Cons with Mechanism
Overview on Edible Vaccine: Pros & Cons with Mechanism
DeeptiGupta154
 
The basics of sentences session 5pptx.pptx
The basics of sentences session 5pptx.pptxThe basics of sentences session 5pptx.pptx
The basics of sentences session 5pptx.pptx
heathfieldcps1
 
The Challenger.pdf DNHS Official Publication
The Challenger.pdf DNHS Official PublicationThe Challenger.pdf DNHS Official Publication
The Challenger.pdf DNHS Official Publication
Delapenabediema
 
Adversarial Attention Modeling for Multi-dimensional Emotion Regression.pdf
Adversarial Attention Modeling for Multi-dimensional Emotion Regression.pdfAdversarial Attention Modeling for Multi-dimensional Emotion Regression.pdf
Adversarial Attention Modeling for Multi-dimensional Emotion Regression.pdf
Po-Chuan Chen
 
Synthetic Fiber Construction in lab .pptx
Synthetic Fiber Construction in lab .pptxSynthetic Fiber Construction in lab .pptx
Synthetic Fiber Construction in lab .pptx
Pavel ( NSTU)
 
How to Make a Field invisible in Odoo 17
How to Make a Field invisible in Odoo 17How to Make a Field invisible in Odoo 17
How to Make a Field invisible in Odoo 17
Celine George
 
CLASS 11 CBSE B.St Project AIDS TO TRADE - INSURANCE
CLASS 11 CBSE B.St Project AIDS TO TRADE - INSURANCECLASS 11 CBSE B.St Project AIDS TO TRADE - INSURANCE
CLASS 11 CBSE B.St Project AIDS TO TRADE - INSURANCE
BhavyaRajput3
 

Recently uploaded (20)

Honest Reviews of Tim Han LMA Course Program.pptx
Honest Reviews of Tim Han LMA Course Program.pptxHonest Reviews of Tim Han LMA Course Program.pptx
Honest Reviews of Tim Han LMA Course Program.pptx
 
The Roman Empire A Historical Colossus.pdf
The Roman Empire A Historical Colossus.pdfThe Roman Empire A Historical Colossus.pdf
The Roman Empire A Historical Colossus.pdf
 
Acetabularia Information For Class 9 .docx
Acetabularia Information For Class 9  .docxAcetabularia Information For Class 9  .docx
Acetabularia Information For Class 9 .docx
 
Polish students' mobility in the Czech Republic
Polish students' mobility in the Czech RepublicPolish students' mobility in the Czech Republic
Polish students' mobility in the Czech Republic
 
Mule 4.6 & Java 17 Upgrade | MuleSoft Mysore Meetup #46
Mule 4.6 & Java 17 Upgrade | MuleSoft Mysore Meetup #46Mule 4.6 & Java 17 Upgrade | MuleSoft Mysore Meetup #46
Mule 4.6 & Java 17 Upgrade | MuleSoft Mysore Meetup #46
 
Introduction to AI for Nonprofits with Tapp Network
Introduction to AI for Nonprofits with Tapp NetworkIntroduction to AI for Nonprofits with Tapp Network
Introduction to AI for Nonprofits with Tapp Network
 

Biological databases

  • 1. Biological Databases
Dr. Ayaz Ahmad, 10/11/2017
Biological databases
1. Biological information and databases
– Overview and definition, types of biological databases
2. Popular databases, records, data format
– GenBank, SwissProt, OMIM, PDB, KEGG, BIND, Pfam, PROSITE, PubMed
3. Accessing biological databases, retrieval systems
– Entrez, SRS
4. Searching biological databases
– Data quality, coverage, redundancy, errors
Textbook: T.K. Attwood and D.J. Parry-Smith, Introduction to Bioinformatics (biological databases: chapters 3 and 4).
  • 2. Biological Information
Nucleic acids:
• DNA sequence, genes, gene products (proteins), mutation, gene coding, distribution patterns, motifs
• Genomics: genome, gene structure and expression, genetic map, genetic disorder
• RNA sequence, secondary structure, 3D structure, interactions
Proteins:
• Protein sequence, corresponding gene, secondary structure, 3D structure, function, motifs, homology, interactions
• Proteomics: expression profile, proteins in disease processes, etc.
• Ligands and drugs (inhibitors, activators, substrates, metabolites)
Biological Information (continued)
Pathways:
• Molecular networks, biological chain events, regulation, feedback, kinetic data
Function:
• Binding sites, interactions, molecular action (binding, chemical reaction, etc.)
• Biological effect (signaling, transport, feedback, regulation, modification, etc.)
• Functional relationship, protein families, motifs, and homologs
  • 3. WHAT IS A DATABASE?
• A structured collection of information.
• Consists of basic units called records or entries.
• Each record consists of fields, which hold predefined data related to the record.
• For example, a protein database would have protein entries as records and protein properties as fields (e.g., protein name, length, amino-acid sequence).
THE ‘PERFECT’ DATABASE
• Comprehensive, but easy to search.
• Annotated, but not “too annotated”.
• A simple, easy-to-understand structure.
• Cross-referenced.
• Minimum redundancy.
• Easy retrieval of data.
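To make the record/field idea concrete, here is a minimal Python sketch of a toy protein database: each record is a dataclass whose attributes are the fields, and the database is a dictionary keyed by a record identifier. The class name `ProteinRecord`, the identifiers (`REC001` and so on), and the sample sequences are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProteinRecord:
    """One record (entry); each attribute is a predefined field."""
    accession: str  # hypothetical identifier field
    name: str
    sequence: str   # amino-acid sequence, one-letter codes

    @property
    def length(self) -> int:
        # Derived field: computed from the sequence so the two cannot disagree
        return len(self.sequence)


def build_database(records):
    """Index records by accession so each entry can be retrieved directly."""
    return {rec.accession: rec for rec in records}


def search_by_name(db, term):
    """Return the accessions whose name contains the term (case-insensitive)."""
    term = term.lower()
    return sorted(acc for acc, rec in db.items() if term in rec.name.lower())


db = build_database([
    ProteinRecord("REC001", "Example kinase A", "MKTAYIAKQR"),
    ProteinRecord("REC002", "Example kinase B", "MSDNE"),
    ProteinRecord("REC003", "Example transporter", "MGLLW"),
])
print(db["REC002"].length)            # 5
print(search_by_name(db, "kinase"))   # ['REC001', 'REC002']
```

Computing the length from the sequence, instead of storing it as an independent value, is one simple way to keep a record internally consistent, in the spirit of the "minimum redundancy" goal listed above.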
  • 4. Biological databases
Purpose
1. To disseminate biological data and information
2. To provide biological data in computer-readable form
3. To allow analysis of biological data
TYPES OF MOLECULAR DATABASES
• Primary databases
– Original submissions by experimentalists
– Content controlled by the submitter
– Examples: GenBank, Trace, SRA, SNP, GEO
• Derived databases
– Derived from primary data
– Content controlled by a third party (e.g., NCBI)
– Examples: NCBI Protein, RefSeq, TPA, RefSNP, GEO DataSets, UniGene, HomoloGene, Structure, Conserved Domain
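The primary/derived distinction can be illustrated with a toy example: a derived, non-redundant view is computed from primary submissions by merging entries whose sequences are identical, while the submitted entries themselves are left untouched. This is a minimal sketch assuming exact-match merging only; the accessions and sequences are hypothetical, and real derived databases (RefSeq, UniGene, etc.) rely on far more elaborate curation and clustering.

```python
from collections import defaultdict


def derive_nonredundant(primary):
    """Group primary entries (accession -> sequence) by identical sequence.

    Returns a dict mapping each distinct sequence to the sorted list of
    accessions that submitted it. The input mapping is not modified,
    mirroring the rule that primary content is controlled by its submitters.
    """
    groups = defaultdict(list)
    for accession, sequence in primary.items():
        groups[sequence.upper()].append(accession)
    return {seq: sorted(accs) for seq, accs in groups.items()}


primary = {
    "SUB1": "ATGGCCAAA",
    "SUB2": "atggccaaa",  # the same sequence submitted by another lab
    "SUB3": "ATGTTTGGG",
}
derived = derive_nonredundant(primary)
print(len(primary), "primary entries ->", len(derived), "derived entries")
```

Here the three submissions collapse into two derived entries, while the primary dictionary still holds all three records exactly as submitted.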
  • 5. PRIMARY VS. DERIVED SEQUENCE DATABASES
[Figure: sequencing centers and labs submit sequences to GenBank, which is updated only by submitters; derived resources built by algorithms and curators (UniGene, RefSeq, genome assemblies) are updated continually by NCBI.]
[Figure: types of biological databases: sequence, structural, bibliographic, clinical and integrated databases.]
  • 6. “Ten Important Bioinformatics Databases”
• GenBank (www.ncbi.nlm.nih.gov): nucleotide sequences
• Ensembl (www.ensembl.org): human/mouse genome (and others)
• PubMed (www.ncbi.nlm.nih.gov): literature references
• NR (www.ncbi.nlm.nih.gov): protein sequences
• SWISS-PROT (www.expasy.ch): protein sequences
• InterPro (www.ebi.ac.uk): protein domains
• OMIM (www.ncbi.nlm.nih.gov): genetic diseases
• Enzymes (www.chem.qmul.ac.uk): enzymes
• PDB (www.rcsb.org/pdb/): protein structures
• KEGG (www.genome.ad.jp): metabolic pathways
Source: Bioinformatics for Dummies
GenBank
http://www.ncbi.nih.gov/Genbank/
  • 7. GenBank database (http://www.ncbi.nih.gov/Genbank/)
– Contains publicly available DNA sequences from more than 100,000 organisms.
– Also contains derived protein sequences, and annotations describing biological, structural, and other relevant features.
– Accessible through Entrez, NCBI’s integrated retrieval system.
– Sequence similarity search tools: BLAST
GenBank
• Annotated collection of all publicly available nucleotide sequences and their protein translations.
• Receives sequences produced in laboratories throughout the world from more than 100,000 distinct organisms.
• Grows exponentially, doubling every 10 months.
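As a sketch of programmatic access through Entrez, the function below only builds an E-utilities EFetch URL that requests a nucleotide record in GenBank flat-file format; it does not contact NCBI. The `db`, `id`, `rettype` and `retmode` parameters are standard EFetch parameters, while the function name is our own and the accession in the example is only a placeholder.

```python
from urllib.parse import urlencode

EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def genbank_efetch_url(accession: str) -> str:
    """Build an EFetch URL that returns a nucleotide record as a GenBank flat file."""
    params = {
        "db": "nuccore",   # Entrez nucleotide database
        "id": accession,   # accession or UID of the record
        "rettype": "gb",   # GenBank flat-file format
        "retmode": "text",
    }
    return EUTILS_BASE + "efetch.fcgi?" + urlencode(params)


print(genbank_efetch_url("NM_000546"))
```

Opening the resulting URL (for example with `urllib.request.urlopen`) would return the flat file; NCBI limits request rates, so batch scripts should pause between calls.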
  • 8. GENBANK - PRIMARY SEQUENCE DB
http://www.ncbi.nlm.nih.gov/genbank/
• Nucleotide-only sequence database
• Archival in nature
– Historical
– Reflective of submitter point of view
– Redundant
• Data
– Direct submissions
– Batch submissions
– FTP accounts (genome data)
GenBank
• Data shared nightly among three collaborating databases:
– GenBank at NCBI
– DNA Data Bank of Japan (DDBJ)
– EMBL at EBI
  • 9. The International Sequence Database Collaboration
Source: NCBI GenBank Release 220, June 2017
• Full release every two months
• Incremental and cumulative updates daily
• Available only through the Internet: ftp://ftp.ncbi.nih.gov/genbank/
  • 10. GenBank Record
➢ Header: information that applies to the whole record
➢ Features: annotations on the record
➢ Sequence
GenBank Record: Header
[Figure: annotated record header, labeling Locus Name, Sequence Length, Molecule Type, GenBank Division, Modification Date, Accession Number and Version Number.]
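Several of the header fields labeled above sit on the first (LOCUS) line of a GenBank flat file; the accession and version numbers are on separate ACCESSION and VERSION lines and are not handled here. The sketch below splits a LOCUS line into its fields, assuming the whitespace-separated layout (name, length, unit, molecule type, optional topology, division, date). The function name and the example line are made up for illustration; a full parser such as Biopython's `SeqIO` handles many more edge cases.

```python
def parse_locus_line(line: str) -> dict:
    """Split a GenBank LOCUS line into its header fields (simplified)."""
    tokens = line.split()
    if not tokens or tokens[0] != "LOCUS":
        raise ValueError("not a LOCUS line")
    name, length, unit = tokens[1], tokens[2], tokens[3]
    if unit not in ("bp", "aa"):
        raise ValueError(f"unexpected length unit: {unit!r}")
    rest = tokens[4:]
    topology = None
    # Topology (linear/circular) may be absent, e.g. in older records
    if len(rest) == 4:
        molecule, topology, division, date = rest
    elif len(rest) == 3:
        molecule, division, date = rest
    else:
        raise ValueError("unexpected number of LOCUS fields")
    return {
        "locus_name": name,
        "sequence_length": int(length),
        "molecule_type": molecule,
        "topology": topology,
        "division": division,
        "modification_date": date,
    }


# Hypothetical header line, for illustration only
example = "LOCUS       EXAMPLE01               1200 bp    mRNA    linear   PRI 15-JAN-2017"
print(parse_locus_line(example))
```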
  • 11. GenBank Record: Features and Sequence
[Figure: the FEATURES section of a GenBank record, with a link to the sequence, followed by the sequence section.]
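In the flat file, the sequence section starts with an ORIGIN line and ends with `//`; each line carries a position number followed by blocks of ten bases. A minimal sketch for recovering the plain sequence, assuming that layout (the function name and the sample sequence are made up):

```python
def parse_origin(lines):
    """Collect sequence letters between the ORIGIN line and the '//' terminator."""
    in_sequence = False
    chunks = []
    for line in lines:
        if line.startswith("ORIGIN"):
            in_sequence = True
            continue
        if line.startswith("//"):
            break
        if in_sequence:
            # Drop the leading position number and the spaces between 10-base blocks
            chunks.append("".join(ch for ch in line if ch.isalpha()))
    return "".join(chunks).upper()


record_tail = """ORIGIN
        1 atggccaaag tgctgaccgg cttaaccgga tccgatcgat gcatgcatgc aattccggaa
       61 ttgcatcg
//""".splitlines()
seq = parse_origin(record_tail)
print(len(seq), seq[:10])  # 68 ATGGCCAAAG
```

Lines before ORIGIN (header and FEATURES) are skipped, so the function can be given the lines of a whole record.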
  • 12. Direct Submission
• A typical GenBank submission consists of a single, contiguous stretch of DNA or RNA sequence (a contig) with annotations (metadata).
• If part of a nucleotide sequence encodes a protein, that region is annotated as a CDS (coding sequence) feature, together with its conceptual translation.
High-Throughput Genomic Sequence (HTGS)
• HTGS entries are submitted in bulk by genome centers, processed by an automated system, and then released to GenBank.
• Currently, more than 30 genome centers are submitting data for a number of organisms, including human, mouse, rat, rice, and Plasmodium falciparum.
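For simple cases, the conceptual translation attached to a CDS can be reproduced in a few lines. This sketch assumes the standard genetic code (NCBI translation table 1), a CDS that starts at the first base of the given string, and translation that stops at the first stop codon; it ignores alternative genetic codes, codon_start offsets and spliced (joined) locations. The function name is our own.

```python
from itertools import product

# Standard genetic code (NCBI table 1), codons enumerated in TCAG order
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate_cds(cds: str) -> str:
    """Translate a coding sequence codon by codon, stopping at the first stop codon."""
    cds = cds.upper().replace("U", "T")  # accept RNA as well as DNA
    protein = []
    for i in range(0, len(cds) - 2, 3):  # a trailing partial codon is ignored
        aa = CODON_TABLE.get(cds[i:i + 3], "X")  # 'X' for ambiguous codons
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


print(translate_cds("ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG"))  # MAIVMGR
```

As in GenBank's /translation qualifier, the terminal stop codon is not written into the protein string.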
  • 13. Whole Genome Shotgun Sequences (WGS)
• Shotgun sequence reads are assembled into contigs, submitted, and updated as the sequencing project progresses and new assemblies are computed.
Submission Tools
• BankIt: web-based form for submitting a small number of sequences with minimal annotation to GenBank.
• Sequin: more appropriate for complicated submissions containing a significant amount of annotation or many sequences.
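To illustrate what assembling reads into contigs means, here is a toy greedy assembler: it repeatedly merges the pair of reads with the longest suffix/prefix overlap until no overlap of at least `min_len` bases remains. This is a teaching sketch with made-up, error-free reads and hypothetical function names; real WGS assemblers use graph-based methods and must cope with sequencing errors, repeats and reverse complements.

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of a that equals a prefix of b, or 0 if < min_len."""
    start = 0
    while True:
        start = a.find(b[:min_len], start)  # candidate start of the overlap in a
        if start == -1:
            return 0
        if b.startswith(a[start:]):
            return len(a) - start  # earliest match gives the longest overlap
        start += 1


def greedy_assemble(reads, min_len=3):
    """Merge the best-overlapping pair until no overlap >= min_len is left."""
    reads = list(reads)
    while len(reads) > 1:
        best_len, best_pair = 0, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    olen = overlap(a, b, min_len)
                    if olen > best_len:
                        best_len, best_pair = olen, (i, j)
        if best_pair is None:
            break  # remaining reads stay as separate contigs
        i, j = best_pair
        merged = reads[i] + reads[j][best_len:]
        reads = [r for k, r in enumerate(reads) if k not in (i, j)] + [merged]
    return reads


reads = ["ATTAGACCTG", "CCTGCCGGAA", "AGACCTGCCG", "GCCGGAATAC"]
print(greedy_assemble(reads))  # ['ATTAGACCTGCCGGAATAC']
```

Reads with no sufficient overlap are returned as separate contigs, which is why a WGS submission may be updated as more reads arrive and the assembly is recomputed.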
  • 14. Sequence Data Flow and Processing
• Within 48 hours of a direct submission with BankIt or Sequin, the database staff review the submission to determine whether it meets the minimal criteria, and then assign an accession number.
– All sequences must be longer than 50 bp and must be sequenced by, or on behalf of, the group submitting them.
– GenBank will not accept sequences constructed in silico.
– GenBank will not accept noncontiguous sequences containing internal, unsequenced spacers.
– GenBank will not accept sequences for which there is no physical counterpart, such as those derived from a mix of genomic DNA and mRNA.
– Submissions are checked to determine whether they are new or updates.
Sequence Data Flow and Processing (continued)
• Indexing:
– Biological validity: translation, organism lineage, BLAST searches
– Vector contamination: is there any vector DNA present in the sequence?
– Publication status: if published, the citation is included in the annotation and linked to Entrez
– Formatting and spelling
• Sequences are sent to the submitter for final review before release into the public database.
• Sequences must become publicly available once the accession number or the sequence has been published.
• GenBank annotation staff process about 1,900 submissions per month, or about 20,000 sequences.
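A couple of the minimal criteria above are mechanical enough to pre-check locally before submitting. The hypothetical helper below flags sequences that are not longer than 50 bp or that contain characters outside the IUPAC nucleotide alphabet. It is only a sanity check under those assumptions; criteria such as "no in silico constructs", vector screening and biological validation are handled during GenBank's own review and are not modeled here.

```python
# IUPAC nucleotide codes (bases plus ambiguity codes)
IUPAC_NUCLEOTIDES = set("ACGTURYSWKMBDHVN")
MIN_LENGTH_BP = 50  # sequences must be longer than this


def precheck_sequence(seq: str) -> list:
    """Return a list of problems found; an empty list means these checks passed."""
    problems = []
    cleaned = "".join(seq.split()).upper()  # ignore whitespace and case
    if len(cleaned) <= MIN_LENGTH_BP:
        problems.append(f"length {len(cleaned)} bp is not > {MIN_LENGTH_BP} bp")
    invalid = sorted(set(cleaned) - IUPAC_NUCLEOTIDES)
    if invalid:
        problems.append("invalid characters: " + ", ".join(invalid))
    return problems


print(precheck_sequence("ACGT" * 10))        # too short (40 bp)
print(precheck_sequence("ACGT" * 20))        # [] (80 bp, valid characters)
print(precheck_sequence("ACGT" * 20 + "?"))  # flags the '?' character
```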
  • 15. DNA databases
Source: Essential Bioinformatics and Biocomputing (LSM2104), NUS
• An example from GenBank (flat file): human alpha-lactalbumin gene
This protein is a complex of 2 proteins, A and B. In the absence of the B protein, the enzyme catalyzes the transfer of galactose from UDP-galactose to N-acetylglucosamine (cf. EC 2.4.1.90).
A GenBank entry – HEADER
  • 16. GenBank Entry – Links provided in the Header
• MapViewer – find the gene position on the chromosome
• Related Sequences – other entries related to this gene (or sequence)
• OMIM – link to catalog of human genes and genetic disorders
• Protein – retrieve protein record from GenPept
• Medline and PubMed – literature abstracts related to this gene
• Taxonomy – classification of organisms
• UniGene – unified gene data
• UniSTS – unified sequence tagged sites, marker and mapping data
• LinkOut – links to publishers, aggregators, libraries, biological databases, sequence centers, and other Web resources
• RefSeq – reference sequence standards
Note: These links are representative. Other links may also be found in GenBank entries.
GenBank entry - FEATURES