SlideShare a Scribd company logo
INTRODUCTION TO
BIOLOGICAL DATABASES
 Able to recognize various data formats, and their primary
use
 Understand and utilize all types of sequence identifiers
 Understand various feature present in the GenBank flat
files and other Databases
2
Objectives
1. Database type: Primary vs. Secondary
2. Sequence Databases: GenBank, GCG and ACDEB
3. Structure Databases: PDB, MMDB
3. Data Formats
4. Other “databases” and “datasets”
3Outline
 Structured collection of information of related data elements
 Consists of basic units called records or entries.
 Each record consists of fields, which hold pre-defined
data related to the record.
What are databases ??
Tables (entities)
•basic elements of information to track, e.g., gene, organism,
sequence, citation
Columns (fields)
•attributes of tables, e.g. for citation table, title, journal, volume,
author
Rows (records)
•actual data
•whereas fields describe what data is stored, the rows of a table are
where the actual data is stored
Databases
6
Information system
Query system
Storage System
Data
GenBank flat file
PDB file
Interaction Record
Title of a book
Book
Structural organization of data base
Characteristics of perfect database
 Comprehensive, but easy to search.
 Annotated, but not “too annotated”.
 A simple, easy to understand structure.
 Cross-referenced.
 Minimum redundancy.
 Easy retrieval of data.
 Primary Databases
 Original submissions by experimentalists
 Content controlled by the submitter
 Examples: GenBank, EMBL, DDBJ etc
 Derivative Databases
 Derived from primary data
 Content controlled by third party (NCBI)
 Examples: NCBI Protein, Refseq, TPA, RefSNP, GEO datasets,
UniGene, Homologene, Structure, Conserved Domain
Divisions of biological databases
GenBankGenBank
SequencingSequencing
CentersCenters
GA
GAGA
ATT
ATT
C
CGAGA
ATT
ATT
C
C
AT
GAGA
ATT
C
C GAGA
ATT
C
C
TTGACA
ATTGACTA
ACGTGC
TTGACA
CGTGA
ATTGACTA
TATAGCCG
ACGTGC
ACGTGC
ACGTGC
TTGACA
TTGACA
CGTGA
CGTGA
CGTGA
ATTGACTA
ATTGACTA
ATTGACTA
ATTGACTA
TATAGCCG
TATAGCCGTATAGCCG
TATAGCCGTATAGCCG TATAGCCGTATAGCCG TATAGCCG
CATT
GAGA
ATT
C
C GAGA
ATT
C
C LabsLabs
AlgorithmsAlgorithms
UniGene
CuratorsCurators
RefSeq
Genome
Assembly
TATAGCCG
AGCTCCGATA
CCGATGACAA
Updated
continually
by NCBI
Updated
continually
by NCBI
Updated ONLY
by submitters
Primary vs. Secondary sequence databases
“Ten Important Bioinformatics Databases”
GenBank www.ncbi.nlm.nih.gov nucleotide sequences
Ensembl www.ensembl.org human/mouse genome
(and others)
PubMed www.ncbi.nlm.nih.gov literature references
NR www.ncbi.nlm.nih.gov protein sequences
SWISS-PROT www.expasy.ch protein sequences
InterPro www.ebi.ac.uk protein domains
OMIM www.ncbi.nlm.nih.gov genetic diseases
Enzymes www.chem.qmul.ac.uk enzymes
PDB www.rcsb.org/pdb/ protein structures
KEGG www.genome.ad.jp metabolic pathways
Source: Bioinformatics for Dummies
The National Center for
Biotechnology Information
Created in 1988 as a part of the
National Library of Medicine at NIH
– Establish public databases
– Research in computational biology
– Develop software tools for sequence analysis
– Collection and distribution of biomedical information
Bethesda,MD
NCBI (National Center for Biotechnology
Information)
• over 30 databases including
GenBank, PubMed, OMIM, and
GEO
• Access all NCBI resources via
Entrez
(www.ncbi.nlm.nih.gov/Entrez/)
 Entrez integrated molecular and literature databases
 Free public access to biomedical literature
 PubMed free Medline (3 million searches per day)
 PubMed Central full text online access
 Sequence database
Services provided by NCBI
ENTREZ: A DISCOVERY SYSTEM
Gene
Taxonomy
PubMed
abstracts
Nucleotide
sequences
Protein
sequences
3-D
Structure
3 -D
Structure
Word weight
VAST
BLASTBLAST
Phylogeny
Hard Link
Neighbors
Related Sequences
Neighbors
Related Sequences
BLink
Domains
Neighbors
Related Structures
Pre-computed and pre-compiled data.
•A potential “gold mine” of undiscovered
relationships.
•Used less than expected.
Pre-computed and pre-compiled data.
•A potential “gold mine” of undiscovered
relationships.
•Used less than expected.
 Primary
 GenBank: NCBI’s primary sequence database
 Trace Archive: reads from capillary sequencers
 Sequence Read Archive: next generation data
 Derivative
 GenPept (GenBank translations)
 Outside Protein (UniProt—Swiss-Prot, PDB)
 NCBI Reference Sequences (RefSeq)
Sequence Database at NCBI
 Nucleotide only sequence database
 Archival in nature
 Historical
 Reflective of submitter point of view (subjective)
 Redundant
 Data
 Direct submissions (traditional records)
 Batch submissions
 FTP accounts (genome data)
Genbank: Primary Sequence Database
 Three collaborating databases for Nucleotides
1. GenBank
2. DNA Database of Japan (DDBJ)
3. European Molecular Biology Laboratory (EMBL) Database
Genbank: Primary Sequence Database
www.ncbi.nlm.nih.gov/GenBank
In GenBank, records are grouped for various reasons: understand
this is key to using and fully taking advantage of this database.
26
Guiding principle of GenBank
 You need identifiers which are stable through time
 Need identifiers which will always refer to specific sequences
 Need these identifiers to track history of sequence updates
 Also need feature and annotation identifiers
Identifier
LOCUS, Accession, NID and protein_id27
LOCUS: Unique string of 10 letters and numbers in
the database. Not maintained amongst databases,
and is therefore a poor sequence identifier.
ACCESSION: A unique identifier to that record, citable
entity; does not change when record is updated. A good
record identifier, ideal for citation in publication.
VERSION: : New system where the accession and version play the
same function as the accession and gi number.
Nucleotide gi: Geninfo identifier (gi), a unique integer
which will change every time the sequence changes.
PID: Protein Identifier: g, e or d prefix to gi number.
Can have one or two on one CDS.
Protein gi: Geninfo identifier (gi), a unique integer which
will change every time the sequence changes.
protein_id: Identifier which has the same
structure and function as the nucleotide Accession.version
numbers, but slightlt different format.
LOCUS, Accession, gi and PID28
Accession.version
LOCUS HSU40282 1789 bp mRNA PRI 21-MAY-1998
DEFINITION Homo sapiens integrin-linked kinase (ILK) mRNA, complete cds.
ACCESSION U40282
VERSION U40282.1 GI:3150001
CDS 157..1515
/gene="ILK"
/note="protein serine/threonine kinase"
/codon_start=1
/product="integrin-linked kinase"
/protein_id="AAC16892.1"
/db_xref="PID:g3150002"
/db_xref="GI:3150002"
LOCUS: HSU40282
ACCESSION: U40282
VERSION: U40282.1
GI: 3150001
PID: g3150002
Protein gi: 3150002
protein_id: AAC16892.1 Protein_idprotein gi
ACCESSION
LOCUS
PIDgi
Traditional GenBank Record
ACCESSION U07418
VERSION U07418.1 GI:466461
ACCESSION U07418
VERSION U07418.1 GI:466461
Accession
•Stable
•Reportable
•Universal
Accession
•Stable
•Reportable
•Universal
Version
Tracks changes in sequence
Version
Tracks changes in sequence
GI number
NCBI internal use
GI number
NCBI internal use
well annotatedwell annotated
the sequence is the datathe sequence is the data
GCG Package (Genetics
computer group)
What is GCG (Genetics
Computer Group)
 An integrated package of over 130 programs (the GCG
Wisconsin Package).
 For extensive analyses of nucleic acid and protein
sequences.
 Associated with most major public nucleic acid and
protein databases.
 Works on UNIX OS and Windows.
Why use GCG
 Removes the need for the constant
collection of new software by end users.
 Removes the need to learn new interface as
new software is released.
 Provides a flow of analyses within a single
interface.
 Unix environment allows users to automate
complex, repetitive tasks.
 Allows users to use multiple processors to
accelerate their jobs.
 Supports almost all public databases that
can be updated daily. Fast local search.
Interfaces
 Command Line: Running programs from UNIX
system prompt.
 SeqLab: Graphic User’s Interface, requiring an X
windows display.
 SeqWeb: to a core set of sequence analysis
program.
Limitations with GCG
 The GUI interface does not give the users the full
access to the power of the command line, nor to
the complete set of programs.
 Many programs place a limit of the maximum size
of the sequences that they can handle (350 Kb).
This limitation will be removed in version 11.
Databases GCG Supports
 Nucleic acid databases
 GenBank
 EMBL (abridged)
 Protein databases
 NRL_3D
 UniProt (SWISS-PROT, PIR, TrEMBL)
 PROSITE, Pfam,
Restriction Enzymes (REBASE)
Typical program
Result from MAP analysis
Structural Data Bases
Protein Data Bank (PDB)
Protein Data Bank (PDB)
total
yearly
Protein Data Bank (PDB)
PDB
 HEADER
 COMPND
 SOURCE
 AUTHOR
 DATE
 JRNL
 REMARK
 SECRES
 ATOM COORDINATES
42
HEADER LEUCINE ZIPPER 15-JUL-93 1DGC 1DGC 2
COMPND GCN4 LEUCINE ZIPPER COMPLEXED WITH SPECIFIC 1DGC 3
COMPND 2 ATF/CREB SITE DNA 1DGC 4
SOURCE GCN4: YEAST (SACCHAROMYCES CEREVISIAE); DNA: SYNTHETIC 1DGC 5
AUTHOR T.J.RICHMOND 1DGC 6
REVDAT 1 22-JUN-94 1DGC 0 1DGC 7
JRNL AUTH P.KONIG,T.J.RICHMOND 1DGC 8
JRNL TITL THE X-RAY STRUCTURE OF THE GCN4-BZIP BOUND TO 1DGC 9
JRNL TITL 2 ATF/CREB SITE DNA SHOWS THE COMPLEX DEPENDS ON DNA 1DGC 10
JRNL TITL 3 FLEXIBILITY 1DGC 11
JRNL REF J.MOL.BIOL. V. 233 139 1993 1DGC 12
JRNL REFN ASTM JMOBAK UK ISSN 0022-2836 0070 1DGC 13
REMARK 1 1DGC 14
REMARK 2 1DGC 15
REMARK 2 RESOLUTION. 3.0 ANGSTROMS. 1DGC 16
REMARK 3 1DGC 17
REMARK 3 REFINEMENT. 1DGC 18
REMARK 3 PROGRAM X-PLOR 1DGC 19
REMARK 3 AUTHORS BRUNGER 1DGC 20
REMARK 3 R VALUE 0.216 1DGC 21
REMARK 3 RMSD BOND DISTANCES 0.020 ANGSTROMS 1DGC 22
REMARK 3 RMSD BOND ANGLES 3.86 DEGREES 1DGC 23
REMARK 3 1DGC 24
REMARK 3 NUMBER OF REFLECTIONS 3296 1DGC 25
REMARK 3 RESOLUTION RANGE 10.0 - 3.0 ANGSTROMS 1DGC 26
REMARK 3 DATA CUTOFF 3.0 SIGMA(F) 1DGC 27
REMARK 3 PERCENT COMPLETION 98.2 1DGC 28
REMARK 3 1DGC 29
REMARK 3 NUMBER OF PROTEIN ATOMS 456 1DGC 30
REMARK 3 NUMBER OF NUCLEIC ACID ATOMS 386 1DGC 31
REMARK 4 1DGC 32
REMARK 4 GCN4: TRANSCRIPTIONAL ACTIVATOR OF GENES ENCODING FOR AMINO 1DGC 33
REMARK 4 ACID BIOSYNTHETIC ENZYMES. 1DGC 34
REMARK 5 1DGC 35
REMARK 5 AMINO ACIDS NUMBERING (RESIDUE NUMBER) CORRESPONDS TO THE 1DGC 36
REMARK 5 281 AMINO ACIDS OF INTACT GCN4. 1DGC 37
REMARK 6 1DGC 38
REMARK 6 BZIP SEQUENCE 220 - 281 USED FOR CRYSTALLIZATION. 1DGC 39
REMARK 7 1DGC 40
REMARK 7 MODEL FROM AMINO ACIDS 227 - 281 SINCE AMINO ACIDS 220 - 1DGC 41
REMARK 7 226 ARE NOT WELL ORDERED. 1DGC 42
REMARK 8 1DGC 43
REMARK 8 RESIDUE NUMBERING OF NUCLEOTIDES: 1DGC 44
REMARK 8 5' T G G A G A T G A C G T C A T C T C C 1DGC 45
REMARK 8 -10 -9 -8 -7 -6 -5 -4 -3 -2 -1 1 2 3 4 5 6 7 8 9 1DGC 46
REMARK 9 1DGC 47
REMARK 9 THE ASYMMETRIC UNIT CONTAINS ONE HALF OF PROTEIN/DNA 1DGC 48
REMARK 9 COMPLEX PER ASYMMETRIC UNIT. 1DGC 49
REMARK 10 1DGC 50
REMARK 10 MOLECULAR DYAD AXIS OF PROTEIN DIMER AND PALINDROMIC HALF 1DGC 51
REMARK 10 SITES OF THE DNA COINCIDES WITH CRYSTALLOGRAPHIC TWO-FOLD 1DGC 52
REMARK 10 AXIS. THE FULL PROTEIN/DNA COMPLEX CAN BE OBTAINED BY 1DGC 53
REMARK 10 APPLYING THE FOLLOWING TRANSFORMATION MATRIX AND 1DGC 54
REMARK 10 TRANSLATION VECTOR TO THE COORDINATES X Y Z: 1DGC 55
REMARK 10 1DGC 56
REMARK 10 0 -1 0 X 117.32 X SYMM 1DGC 57
REMARK 10 -1 0 0 Y + 117.32 = Y SYMM 1DGC 58
REMARK 10 0 0 -1 Z 43.33 Z SYMM 1DGC 59
SEQRES 1 A 62 ILE VAL PRO GLU SER SER ASP PRO ALA ALA LEU LYS ARG 1DGC 60
SEQRES 2 A 62 ALA ARG ASN THR GLU ALA ALA ARG ARG SER ARG ALA ARG 1DGC 61
SEQRES 3 A 62 LYS LEU GLN ARG MET LYS GLN LEU GLU ASP LYS VAL GLU 1DGC 62
SEQRES 4 A 62 GLU LEU LEU SER LYS ASN TYR HIS LEU GLU ASN GLU VAL 1DGC 63
SEQRES 5 A 62 ALA ARG LEU LYS LYS LEU VAL GLY GLU ARG 1DGC 64
SEQRES 1 B 19 T G G A G A T G A C G T C 1DGC 65
SEQRES 2 B 19 A T C T C C 1DGC 66
HELIX 1 A ALA A 228 LYS A 276 1 1DGC 67
CRYST1 58.660 58.660 86.660 90.00 90.00 90.00 P 41 21 2 8 1DGC 68
ORIGX1 1.000000 0.000000 0.000000 0.00000 1DGC 69
ORIGX2 0.000000 1.000000 0.000000 0.00000 1DGC 70
ORIGX3 0.000000 0.000000 1.000000 0.00000 1DGC 71
SCALE1 0.017047 0.000000 0.000000 0.00000 1DGC 72
SCALE2 0.000000 0.017047 0.000000 0.00000 1DGC 73
SCALE3 0.000000 0.000000 0.011539 0.00000 1DGC 74
ATOM 1 N PRO A 227 35.313 108.011 15.140 1.00 38.94 1DGC 75
ATOM 2 CA PRO A 227 34.172 107.658 15.972 1.00 39.82 1DGC 76
ATOM 842 C5 C B 9 57.692 100.286 22.744 1.00 29.82 1DGC 916
ATOM 843 C6 C B 9 58.128 100.193 21.465 1.00 30.63 1DGC 917
TER 844 C B 9 1DGC 918
MASTER 46 0 0 1 0 0 0 6 842 2 0 7 1DGC 919
END 1DGC 920

More Related Content

What's hot

DNA data bank of japan (DDBJ)
DNA data bank of japan (DDBJ)DNA data bank of japan (DDBJ)
DNA data bank of japan (DDBJ)
ZoufishanY
 
Protein data bank
Protein data bankProtein data bank
Protein data bank
Alichy Sowmya
 
Major databases in bioinformatics
Major databases in bioinformaticsMajor databases in bioinformatics
Major databases in bioinformatics
Vidya Kalaivani Rajkumar
 
Gen bank databases
Gen bank databasesGen bank databases
Gen bank databases
Hafiz Muhammad Zeeshan Raza
 
ENTREZ.ppt
ENTREZ.pptENTREZ.ppt
ENTREZ.ppt
kishoreGupta17
 
Rasmol
RasmolRasmol
Proteins databases
Proteins databasesProteins databases
Proteins databases
Hafiz Muhammad Zeeshan Raza
 
Bioinformatics data mining
Bioinformatics data miningBioinformatics data mining
Bioinformatics data mining
Sangeeta Das
 
NCBI National Center for Biotechnology Information
NCBI National Center for Biotechnology InformationNCBI National Center for Biotechnology Information
NCBI National Center for Biotechnology Information
Thapar Institute of Engineering & Technology, Patiala, Punjab, India
 
Nucleic Acid Sequence databases
Nucleic Acid Sequence databasesNucleic Acid Sequence databases
Nucleic Acid Sequence databases
Pranavathiyani G
 
Swiss prot database
Swiss prot databaseSwiss prot database
Swiss prot database
sagrika chugh
 
Primary, secondary, tertiary biological database
Primary, secondary, tertiary biological databasePrimary, secondary, tertiary biological database
Primary, secondary, tertiary biological database
KAUSHAL SAHU
 
Molecular modeling database
Molecular modeling database Molecular modeling database
Molecular modeling database
Jayati Shrivastava
 
Clustal W - Multiple Sequence alignment
Clustal W - Multiple Sequence alignment   Clustal W - Multiple Sequence alignment
Clustal W - Multiple Sequence alignment
The Oxford College Engineering
 
EMBL- European Molecular Biology Laboratory
EMBL- European Molecular Biology LaboratoryEMBL- European Molecular Biology Laboratory
Gen bank
Gen bankGen bank
BLAST
BLASTBLAST
Cath
CathCath
Cath
Ramya S
 
EMBL
EMBLEMBL

What's hot (20)

DNA data bank of japan (DDBJ)
DNA data bank of japan (DDBJ)DNA data bank of japan (DDBJ)
DNA data bank of japan (DDBJ)
 
Protein data bank
Protein data bankProtein data bank
Protein data bank
 
Major databases in bioinformatics
Major databases in bioinformaticsMajor databases in bioinformatics
Major databases in bioinformatics
 
Gen bank databases
Gen bank databasesGen bank databases
Gen bank databases
 
ENTREZ.ppt
ENTREZ.pptENTREZ.ppt
ENTREZ.ppt
 
Rasmol
RasmolRasmol
Rasmol
 
Proteins databases
Proteins databasesProteins databases
Proteins databases
 
Bioinformatics data mining
Bioinformatics data miningBioinformatics data mining
Bioinformatics data mining
 
NCBI National Center for Biotechnology Information
NCBI National Center for Biotechnology InformationNCBI National Center for Biotechnology Information
NCBI National Center for Biotechnology Information
 
Nucleic Acid Sequence databases
Nucleic Acid Sequence databasesNucleic Acid Sequence databases
Nucleic Acid Sequence databases
 
Swiss prot database
Swiss prot databaseSwiss prot database
Swiss prot database
 
NCBI
NCBINCBI
NCBI
 
Primary, secondary, tertiary biological database
Primary, secondary, tertiary biological databasePrimary, secondary, tertiary biological database
Primary, secondary, tertiary biological database
 
Molecular modeling database
Molecular modeling database Molecular modeling database
Molecular modeling database
 
Clustal W - Multiple Sequence alignment
Clustal W - Multiple Sequence alignment   Clustal W - Multiple Sequence alignment
Clustal W - Multiple Sequence alignment
 
EMBL- European Molecular Biology Laboratory
EMBL- European Molecular Biology LaboratoryEMBL- European Molecular Biology Laboratory
EMBL- European Molecular Biology Laboratory
 
Gen bank
Gen bankGen bank
Gen bank
 
BLAST
BLASTBLAST
BLAST
 
Cath
CathCath
Cath
 
EMBL
EMBLEMBL
EMBL
 

Similar to Intro to databases

Databases_CSS2.pptx
Databases_CSS2.pptxDatabases_CSS2.pptx
Databases_CSS2.pptx
Silpa87
 
BIOLOGICAL SEQUENCE DATABASES
BIOLOGICAL SEQUENCE DATABASES BIOLOGICAL SEQUENCE DATABASES
BIOLOGICAL SEQUENCE DATABASES nadeem akhter
 
Database in bioinformatics
Database in bioinformaticsDatabase in bioinformatics
Database in bioinformatics
VinaKhan1
 
Introduction to databases.pptx
Introduction to databases.pptxIntroduction to databases.pptx
Introduction to databases.pptx
sworna kumari chithiraivelu
 
DATABASES...............................pptx
DATABASES...............................pptxDATABASES...............................pptx
DATABASES...............................pptx
Cherry
 
02. Biological sequence databases.pptx
02. Biological sequence databases.pptx02. Biological sequence databases.pptx
02. Biological sequence databases.pptx
HussainTaqi1
 
Biological databases
Biological databasesBiological databases
Biological databases
Sarfaraz Nasri
 
Role of bioinformatics in life sciences research
Role of bioinformatics in life sciences researchRole of bioinformatics in life sciences research
Role of bioinformatics in life sciences research
Anshika Bansal
 
Presentation on Biological database By Elufer Akram @ University Of Science ...
Presentation on Biological database  By Elufer Akram @ University Of Science ...Presentation on Biological database  By Elufer Akram @ University Of Science ...
Presentation on Biological database By Elufer Akram @ University Of Science ...
Elufer Akram
 
Ncbi
NcbiNcbi
BITS: Overview of important biological databases beyond sequences
BITS: Overview of important biological databases beyond sequencesBITS: Overview of important biological databases beyond sequences
BITS: Overview of important biological databases beyond sequences
BITS
 
Introduction to Bioinformatics and DatabasesDay1.ppt
Introduction to Bioinformatics and DatabasesDay1.pptIntroduction to Bioinformatics and DatabasesDay1.ppt
Introduction to Bioinformatics and DatabasesDay1.ppt
khadijarafiq2012
 
Genomic Databases-.pptx
Genomic Databases-.pptxGenomic Databases-.pptx
Genomic Databases-.pptx
jyosthsnakattula
 
Biological Database (1)pptxpdfpdfpdf.pdf
Biological Database (1)pptxpdfpdfpdf.pdfBiological Database (1)pptxpdfpdfpdf.pdf
Biological Database (1)pptxpdfpdfpdf.pdf
BioinformaticsCentre
 
Informal presentation on bioinformatics
Informal presentation on bioinformaticsInformal presentation on bioinformatics
Informal presentation on bioinformaticsAtai Rabby
 
Bioinformatic, and tools by kk sahu
Bioinformatic, and tools by kk sahuBioinformatic, and tools by kk sahu
Bioinformatic, and tools by kk sahu
KAUSHAL SAHU
 
Data Retrieval Systems
Data Retrieval SystemsData Retrieval Systems
Data Retrieval Systems
Saramita De Chakravarti
 
Biological databases
Biological databasesBiological databases
Biological databases
Ashfaq Ahmad
 
Bioinformatics introduction
Bioinformatics introductionBioinformatics introduction
Bioinformatics introduction
DrGopaSarma
 
Understanding Genome
Understanding Genome Understanding Genome
Understanding Genome
Rajendra K Labala
 

Similar to Intro to databases (20)

Databases_CSS2.pptx
Databases_CSS2.pptxDatabases_CSS2.pptx
Databases_CSS2.pptx
 
BIOLOGICAL SEQUENCE DATABASES
BIOLOGICAL SEQUENCE DATABASES BIOLOGICAL SEQUENCE DATABASES
BIOLOGICAL SEQUENCE DATABASES
 
Database in bioinformatics
Database in bioinformaticsDatabase in bioinformatics
Database in bioinformatics
 
Introduction to databases.pptx
Introduction to databases.pptxIntroduction to databases.pptx
Introduction to databases.pptx
 
DATABASES...............................pptx
DATABASES...............................pptxDATABASES...............................pptx
DATABASES...............................pptx
 
02. Biological sequence databases.pptx
02. Biological sequence databases.pptx02. Biological sequence databases.pptx
02. Biological sequence databases.pptx
 
Biological databases
Biological databasesBiological databases
Biological databases
 
Role of bioinformatics in life sciences research
Role of bioinformatics in life sciences researchRole of bioinformatics in life sciences research
Role of bioinformatics in life sciences research
 
Presentation on Biological database By Elufer Akram @ University Of Science ...
Presentation on Biological database  By Elufer Akram @ University Of Science ...Presentation on Biological database  By Elufer Akram @ University Of Science ...
Presentation on Biological database By Elufer Akram @ University Of Science ...
 
Ncbi
NcbiNcbi
Ncbi
 
BITS: Overview of important biological databases beyond sequences
BITS: Overview of important biological databases beyond sequencesBITS: Overview of important biological databases beyond sequences
BITS: Overview of important biological databases beyond sequences
 
Introduction to Bioinformatics and DatabasesDay1.ppt
Introduction to Bioinformatics and DatabasesDay1.pptIntroduction to Bioinformatics and DatabasesDay1.ppt
Introduction to Bioinformatics and DatabasesDay1.ppt
 
Genomic Databases-.pptx
Genomic Databases-.pptxGenomic Databases-.pptx
Genomic Databases-.pptx
 
Biological Database (1)pptxpdfpdfpdf.pdf
Biological Database (1)pptxpdfpdfpdf.pdfBiological Database (1)pptxpdfpdfpdf.pdf
Biological Database (1)pptxpdfpdfpdf.pdf
 
Informal presentation on bioinformatics
Informal presentation on bioinformaticsInformal presentation on bioinformatics
Informal presentation on bioinformatics
 
Bioinformatic, and tools by kk sahu
Bioinformatic, and tools by kk sahuBioinformatic, and tools by kk sahu
Bioinformatic, and tools by kk sahu
 
Data Retrieval Systems
Data Retrieval SystemsData Retrieval Systems
Data Retrieval Systems
 
Biological databases
Biological databasesBiological databases
Biological databases
 
Bioinformatics introduction
Bioinformatics introductionBioinformatics introduction
Bioinformatics introduction
 
Understanding Genome
Understanding Genome Understanding Genome
Understanding Genome
 

More from bhargvi sharma

Nucleic acid database
Nucleic acid database Nucleic acid database
Nucleic acid database bhargvi sharma
 

More from bhargvi sharma (6)

Nucleic acid database
Nucleic acid database Nucleic acid database
Nucleic acid database
 
Microarray technology
Microarray technologyMicroarray technology
Microarray technology
 
Chernobyl Disaster
Chernobyl DisasterChernobyl Disaster
Chernobyl Disaster
 
Downstream processing
Downstream processingDownstream processing
Downstream processing
 
Cheese making
Cheese makingCheese making
Cheese making
 
Antibiotics
AntibioticsAntibiotics
Antibiotics
 

Recently uploaded

Palestine last event orientationfvgnh .pptx
Palestine last event orientationfvgnh .pptxPalestine last event orientationfvgnh .pptx
Palestine last event orientationfvgnh .pptx
RaedMohamed3
 
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
siemaillard
 
The Roman Empire A Historical Colossus.pdf
The Roman Empire A Historical Colossus.pdfThe Roman Empire A Historical Colossus.pdf
The Roman Empire A Historical Colossus.pdf
kaushalkr1407
 
1.4 modern child centered education - mahatma gandhi-2.pptx
1.4 modern child centered education - mahatma gandhi-2.pptx1.4 modern child centered education - mahatma gandhi-2.pptx
1.4 modern child centered education - mahatma gandhi-2.pptx
JosvitaDsouza2
 
The Art Pastor's Guide to Sabbath | Steve Thomason
The Art Pastor's Guide to Sabbath | Steve ThomasonThe Art Pastor's Guide to Sabbath | Steve Thomason
The Art Pastor's Guide to Sabbath | Steve Thomason
Steve Thomason
 
Supporting (UKRI) OA monographs at Salford.pptx
Supporting (UKRI) OA monographs at Salford.pptxSupporting (UKRI) OA monographs at Salford.pptx
Supporting (UKRI) OA monographs at Salford.pptx
Jisc
 
How to Create Map Views in the Odoo 17 ERP
How to Create Map Views in the Odoo 17 ERPHow to Create Map Views in the Odoo 17 ERP
How to Create Map Views in the Odoo 17 ERP
Celine George
 
How to Make a Field invisible in Odoo 17
How to Make a Field invisible in Odoo 17How to Make a Field invisible in Odoo 17
How to Make a Field invisible in Odoo 17
Celine George
 
Chapter 3 - Islamic Banking Products and Services.pptx
Chapter 3 - Islamic Banking Products and Services.pptxChapter 3 - Islamic Banking Products and Services.pptx
Chapter 3 - Islamic Banking Products and Services.pptx
Mohd Adib Abd Muin, Senior Lecturer at Universiti Utara Malaysia
 
MARUTI SUZUKI- A Successful Joint Venture in India.pptx
MARUTI SUZUKI- A Successful Joint Venture in India.pptxMARUTI SUZUKI- A Successful Joint Venture in India.pptx
MARUTI SUZUKI- A Successful Joint Venture in India.pptx
bennyroshan06
 
Operation Blue Star - Saka Neela Tara
Operation Blue Star   -  Saka Neela TaraOperation Blue Star   -  Saka Neela Tara
Operation Blue Star - Saka Neela Tara
Balvir Singh
 
PART A. Introduction to Costumer Service
PART A. Introduction to Costumer ServicePART A. Introduction to Costumer Service
PART A. Introduction to Costumer Service
PedroFerreira53928
 
Unit 2- Research Aptitude (UGC NET Paper I).pdf
Unit 2- Research Aptitude (UGC NET Paper I).pdfUnit 2- Research Aptitude (UGC NET Paper I).pdf
Unit 2- Research Aptitude (UGC NET Paper I).pdf
Thiyagu K
 
Additional Benefits for Employee Website.pdf
Additional Benefits for Employee Website.pdfAdditional Benefits for Employee Website.pdf
Additional Benefits for Employee Website.pdf
joachimlavalley1
 
The geography of Taylor Swift - some ideas
The geography of Taylor Swift - some ideasThe geography of Taylor Swift - some ideas
The geography of Taylor Swift - some ideas
GeoBlogs
 
Digital Tools and AI for Teaching Learning and Research
Digital Tools and AI for Teaching Learning and ResearchDigital Tools and AI for Teaching Learning and Research
Digital Tools and AI for Teaching Learning and Research
Vikramjit Singh
 
How libraries can support authors with open access requirements for UKRI fund...
How libraries can support authors with open access requirements for UKRI fund...How libraries can support authors with open access requirements for UKRI fund...
How libraries can support authors with open access requirements for UKRI fund...
Jisc
 
Home assignment II on Spectroscopy 2024 Answers.pdf
Home assignment II on Spectroscopy 2024 Answers.pdfHome assignment II on Spectroscopy 2024 Answers.pdf
Home assignment II on Spectroscopy 2024 Answers.pdf
Tamralipta Mahavidyalaya
 
Template Jadual Bertugas Kelas (Boleh Edit)
Template Jadual Bertugas Kelas (Boleh Edit)Template Jadual Bertugas Kelas (Boleh Edit)
Template Jadual Bertugas Kelas (Boleh Edit)
rosedainty
 
Phrasal Verbs.XXXXXXXXXXXXXXXXXXXXXXXXXX
Phrasal Verbs.XXXXXXXXXXXXXXXXXXXXXXXXXXPhrasal Verbs.XXXXXXXXXXXXXXXXXXXXXXXXXX
Phrasal Verbs.XXXXXXXXXXXXXXXXXXXXXXXXXX
MIRIAMSALINAS13
 

Recently uploaded (20)

Palestine last event orientationfvgnh .pptx
Palestine last event orientationfvgnh .pptxPalestine last event orientationfvgnh .pptx
Palestine last event orientationfvgnh .pptx
 
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
 
The Roman Empire A Historical Colossus.pdf
The Roman Empire A Historical Colossus.pdfThe Roman Empire A Historical Colossus.pdf
The Roman Empire A Historical Colossus.pdf
 
1.4 modern child centered education - mahatma gandhi-2.pptx
1.4 modern child centered education - mahatma gandhi-2.pptx1.4 modern child centered education - mahatma gandhi-2.pptx
1.4 modern child centered education - mahatma gandhi-2.pptx
 
The Art Pastor's Guide to Sabbath | Steve Thomason
The Art Pastor's Guide to Sabbath | Steve ThomasonThe Art Pastor's Guide to Sabbath | Steve Thomason
The Art Pastor's Guide to Sabbath | Steve Thomason
 
Supporting (UKRI) OA monographs at Salford.pptx
Supporting (UKRI) OA monographs at Salford.pptxSupporting (UKRI) OA monographs at Salford.pptx
Supporting (UKRI) OA monographs at Salford.pptx
 
How to Create Map Views in the Odoo 17 ERP
How to Create Map Views in the Odoo 17 ERPHow to Create Map Views in the Odoo 17 ERP
How to Create Map Views in the Odoo 17 ERP
 
How to Make a Field invisible in Odoo 17
How to Make a Field invisible in Odoo 17How to Make a Field invisible in Odoo 17
How to Make a Field invisible in Odoo 17
 
Chapter 3 - Islamic Banking Products and Services.pptx
Chapter 3 - Islamic Banking Products and Services.pptxChapter 3 - Islamic Banking Products and Services.pptx
Chapter 3 - Islamic Banking Products and Services.pptx
 
MARUTI SUZUKI- A Successful Joint Venture in India.pptx
MARUTI SUZUKI- A Successful Joint Venture in India.pptxMARUTI SUZUKI- A Successful Joint Venture in India.pptx
MARUTI SUZUKI- A Successful Joint Venture in India.pptx
 
Operation Blue Star - Saka Neela Tara
Operation Blue Star   -  Saka Neela TaraOperation Blue Star   -  Saka Neela Tara
Operation Blue Star - Saka Neela Tara
 
PART A. Introduction to Costumer Service
PART A. Introduction to Costumer ServicePART A. Introduction to Costumer Service
PART A. Introduction to Costumer Service
 
Unit 2- Research Aptitude (UGC NET Paper I).pdf
Unit 2- Research Aptitude (UGC NET Paper I).pdfUnit 2- Research Aptitude (UGC NET Paper I).pdf
Unit 2- Research Aptitude (UGC NET Paper I).pdf
 
Additional Benefits for Employee Website.pdf
Additional Benefits for Employee Website.pdfAdditional Benefits for Employee Website.pdf
Additional Benefits for Employee Website.pdf
 
The geography of Taylor Swift - some ideas
The geography of Taylor Swift - some ideasThe geography of Taylor Swift - some ideas
The geography of Taylor Swift - some ideas
 
Digital Tools and AI for Teaching Learning and Research
Digital Tools and AI for Teaching Learning and ResearchDigital Tools and AI for Teaching Learning and Research
Digital Tools and AI for Teaching Learning and Research
 
How libraries can support authors with open access requirements for UKRI fund...
How libraries can support authors with open access requirements for UKRI fund...How libraries can support authors with open access requirements for UKRI fund...
How libraries can support authors with open access requirements for UKRI fund...
 
Home assignment II on Spectroscopy 2024 Answers.pdf
Home assignment II on Spectroscopy 2024 Answers.pdfHome assignment II on Spectroscopy 2024 Answers.pdf
Home assignment II on Spectroscopy 2024 Answers.pdf
 
Template Jadual Bertugas Kelas (Boleh Edit)
Template Jadual Bertugas Kelas (Boleh Edit)Template Jadual Bertugas Kelas (Boleh Edit)
Template Jadual Bertugas Kelas (Boleh Edit)
 
Phrasal Verbs.XXXXXXXXXXXXXXXXXXXXXXXXXX
Phrasal Verbs.XXXXXXXXXXXXXXXXXXXXXXXXXXPhrasal Verbs.XXXXXXXXXXXXXXXXXXXXXXXXXX
Phrasal Verbs.XXXXXXXXXXXXXXXXXXXXXXXXXX
 

Intro to databases

  • 2.  Able to recognize various data formats, and their primary use  Understand and utilize all types of sequence identifiers  Understand various feature present in the GenBank flat files and other Databases 2 Objectives
  • 3. 1. Database type: Primary vs. Secondary 2. Sequence Databases: GenBank, GCG and ACDEB 3. Structure Databases: PDB, MMDB 3. Data Formats 4. Other “databases” and “datasets” 3Outline
  • 4.  Structured collection of information of related data elements  Consists of basic units called records or entries.  Each record consists of fields, which hold pre-defined data related to the record. What are databases ??
  • 5. Tables (entities) •basic elements of information to track, e.g., gene, organism, sequence, citation Columns (fields) •attributes of tables, e.g. for citation table, title, journal, volume, author Rows (records) •actual data •whereas fields describe what data is stored, the rows of a table are where the actual data is stored Databases
  • 6. 6 Information system Query system Storage System Data GenBank flat file PDB file Interaction Record Title of a book Book Structural organization of data base
  • 7. Characteristics of perfect database  Comprehensive, but easy to search.  Annotated, but not “too annotated”.  A simple, easy to understand structure.  Cross-referenced.  Minimum redundancy.  Easy retrieval of data.
  • 8.  Primary Databases  Original submissions by experimentalists  Content controlled by the submitter  Examples: GenBank, EMBL, DDBJ etc  Derivative Databases  Derived from primary data  Content controlled by third party (NCBI)  Examples: NCBI Protein, Refseq, TPA, RefSNP, GEO datasets, UniGene, Homologene, Structure, Conserved Domain Divisions of biological databases
  • 9. GenBankGenBank SequencingSequencing CentersCenters GA GAGA ATT ATT C CGAGA ATT ATT C C AT GAGA ATT C C GAGA ATT C C TTGACA ATTGACTA ACGTGC TTGACA CGTGA ATTGACTA TATAGCCG ACGTGC ACGTGC ACGTGC TTGACA TTGACA CGTGA CGTGA CGTGA ATTGACTA ATTGACTA ATTGACTA ATTGACTA TATAGCCG TATAGCCGTATAGCCG TATAGCCGTATAGCCG TATAGCCGTATAGCCG TATAGCCG CATT GAGA ATT C C GAGA ATT C C LabsLabs AlgorithmsAlgorithms UniGene CuratorsCurators RefSeq Genome Assembly TATAGCCG AGCTCCGATA CCGATGACAA Updated continually by NCBI Updated continually by NCBI Updated ONLY by submitters Primary vs. Secondary sequence databases
  • 10. “Ten Important Bioinformatics Databases” GenBank www.ncbi.nlm.nih.gov nucleotide sequences Ensembl www.ensembl.org human/mouse genome (and others) PubMed www.ncbi.nlm.nih.gov literature references NR www.ncbi.nlm.nih.gov protein sequences SWISS-PROT www.expasy.ch protein sequences InterPro www.ebi.ac.uk protein domains OMIM www.ncbi.nlm.nih.gov genetic diseases Enzymes www.chem.qmul.ac.uk enzymes PDB www.rcsb.org/pdb/ protein structures KEGG www.genome.ad.jp metabolic pathways Source: Bioinformatics for Dummies
  • 11. The National Center for Biotechnology Information Created in 1988 as a part of the National Library of Medicine at NIH – Establish public databases – Research in computational biology – Develop software tools for sequence analysis – Collection and distribution of biomedical information Bethesda,MD
  • 12. NCBI (National Center for Biotechnology Information) • over 30 databases including GenBank, PubMed, OMIM, and GEO • Access all NCBI resources via Entrez (www.ncbi.nlm.nih.gov/Entrez/)
  • 13.  Entrez integrated molecular and literature databases  Free public access to biomedical literature  PubMed free Medline (3 million searches per day)  PubMed Central full text online access  Sequence database Services provided by NCBI
  • 14. ENTREZ: A DISCOVERY SYSTEM Gene Taxonomy PubMed abstracts Nucleotide sequences Protein sequences 3-D Structure 3 -D Structure Word weight VAST BLASTBLAST Phylogeny Hard Link Neighbors Related Sequences Neighbors Related Sequences BLink Domains Neighbors Related Structures Pre-computed and pre-compiled data. •A potential “gold mine” of undiscovered relationships. •Used less than expected. Pre-computed and pre-compiled data. •A potential “gold mine” of undiscovered relationships. •Used less than expected.
  • 15.
  • 16.
  • 17.
  • 18.
  • 19.
  • 20.
  • 21.
  • 22.  Primary  GenBank: NCBI’s primary sequence database  Trace Archive: reads from capillary sequencers  Sequence Read Archive: next generation data  Derivative  GenPept (GenBank translations)  Outside Protein (UniProt—Swiss-Prot, PDB)  NCBI Reference Sequences (RefSeq) Sequence Database at NCBI
  • 23.  Nucleotide only sequence database  Archival in nature  Historical  Reflective of submitter point of view (subjective)  Redundant  Data  Direct submissions (traditional records)  Batch submissions  FTP accounts (genome data) Genbank: Primary Sequence Database
  • 24.  Three collaborating databases for Nucleotides 1. GenBank 2. DNA Database of Japan (DDBJ) 3. European Molecular Biology Laboratory (EMBL) Database Genbank: Primary Sequence Database
  • 26. In GenBank, records are grouped for various reasons: understand this is key to using and fully taking advantage of this database. 26 Guiding principle of GenBank  You need identifiers which are stable through time  Need identifiers which will always refer to specific sequences  Need these identifiers to track history of sequence updates  Also need feature and annotation identifiers Identifier
  • 27. LOCUS, Accession, NID and protein_id27 LOCUS: Unique string of 10 letters and numbers in the database. Not maintained amongst databases, and is therefore a poor sequence identifier. ACCESSION: A unique identifier to that record, citable entity; does not change when record is updated. A good record identifier, ideal for citation in publication. VERSION: : New system where the accession and version play the same function as the accession and gi number. Nucleotide gi: Geninfo identifier (gi), a unique integer which will change every time the sequence changes. PID: Protein Identifier: g, e or d prefix to gi number. Can have one or two on one CDS. Protein gi: Geninfo identifier (gi), a unique integer which will change every time the sequence changes. protein_id: Identifier which has the same structure and function as the nucleotide Accession.version numbers, but slightlt different format.
  • 28. LOCUS, Accession, gi and PID28 Accession.version LOCUS HSU40282 1789 bp mRNA PRI 21-MAY-1998 DEFINITION Homo sapiens integrin-linked kinase (ILK) mRNA, complete cds. ACCESSION U40282 VERSION U40282.1 GI:3150001 CDS 157..1515 /gene="ILK" /note="protein serine/threonine kinase" /codon_start=1 /product="integrin-linked kinase" /protein_id="AAC16892.1" /db_xref="PID:g3150002" /db_xref="GI:3150002" LOCUS: HSU40282 ACCESSION: U40282 VERSION: U40282.1 GI: 3150001 PID: g3150002 Protein gi: 3150002 protein_id: AAC16892.1 Protein_idprotein gi ACCESSION LOCUS PIDgi
  • 29. Traditional GenBank Record ACCESSION U07418 VERSION U07418.1 GI:466461 ACCESSION U07418 VERSION U07418.1 GI:466461 Accession •Stable •Reportable •Universal Accession •Stable •Reportable •Universal Version Tracks changes in sequence Version Tracks changes in sequence GI number NCBI internal use GI number NCBI internal use well annotatedwell annotated the sequence is the datathe sequence is the data
  • 31. What is GCG (Genetics Computer Group)  An integrated package of over 130 programs (the GCG Wisconsin Package).  For extensive analyses of nucleic acid and protein sequences.  Associated with most major public nucleic acid and protein databases.  Works on UNIX OS and Windows.
  • 32. Why use GCG  Removes the need for the constant collection of new software by end users.  Removes the need to learn new interface as new software is released.  Provides a flow of analyses within a single interface.  Unix environment allows users to automate complex, repetitive tasks.  Allows users to use multiple processors to accelerate their jobs.  Supports almost all public databases that can be updated daily. Fast local search.
  • 33. Interfaces  Command Line: Running programs from UNIX system prompt.  SeqLab: Graphic User’s Interface, requiring an X windows display.  SeqWeb: to a core set of sequence analysis program.
  • 34. Limitations with GCG  The GUI interface does not give the users the full access to the power of the command line, nor to the complete set of programs.  Many programs place a limit of the maximum size of the sequences that they can handle (350 Kb). This limitation will be removed in version 11.
  • 35. Databases GCG Supports  Nucleic acid databases  GenBank  EMBL (abridged)  Protein databases  NRL_3D  UniProt (SWISS-PROT, PIR, TrEMBL)  PROSITE, Pfam, Restriction Enzymes (REBASE)
  • 37. Result from MAP analysis
  • 40. Protein Data Bank (PDB) total yearly
  • 42. PDB  HEADER  COMPND  SOURCE  AUTHOR  DATE  JRNL  REMARK  SECRES  ATOM COORDINATES 42 HEADER LEUCINE ZIPPER 15-JUL-93 1DGC 1DGC 2 COMPND GCN4 LEUCINE ZIPPER COMPLEXED WITH SPECIFIC 1DGC 3 COMPND 2 ATF/CREB SITE DNA 1DGC 4 SOURCE GCN4: YEAST (SACCHAROMYCES CEREVISIAE); DNA: SYNTHETIC 1DGC 5 AUTHOR T.J.RICHMOND 1DGC 6 REVDAT 1 22-JUN-94 1DGC 0 1DGC 7 JRNL AUTH P.KONIG,T.J.RICHMOND 1DGC 8 JRNL TITL THE X-RAY STRUCTURE OF THE GCN4-BZIP BOUND TO 1DGC 9 JRNL TITL 2 ATF/CREB SITE DNA SHOWS THE COMPLEX DEPENDS ON DNA 1DGC 10 JRNL TITL 3 FLEXIBILITY 1DGC 11 JRNL REF J.MOL.BIOL. V. 233 139 1993 1DGC 12 JRNL REFN ASTM JMOBAK UK ISSN 0022-2836 0070 1DGC 13 REMARK 1 1DGC 14 REMARK 2 1DGC 15 REMARK 2 RESOLUTION. 3.0 ANGSTROMS. 1DGC 16 REMARK 3 1DGC 17 REMARK 3 REFINEMENT. 1DGC 18 REMARK 3 PROGRAM X-PLOR 1DGC 19 REMARK 3 AUTHORS BRUNGER 1DGC 20 REMARK 3 R VALUE 0.216 1DGC 21 REMARK 3 RMSD BOND DISTANCES 0.020 ANGSTROMS 1DGC 22 REMARK 3 RMSD BOND ANGLES 3.86 DEGREES 1DGC 23 REMARK 3 1DGC 24 REMARK 3 NUMBER OF REFLECTIONS 3296 1DGC 25 REMARK 3 RESOLUTION RANGE 10.0 - 3.0 ANGSTROMS 1DGC 26 REMARK 3 DATA CUTOFF 3.0 SIGMA(F) 1DGC 27 REMARK 3 PERCENT COMPLETION 98.2 1DGC 28 REMARK 3 1DGC 29 REMARK 3 NUMBER OF PROTEIN ATOMS 456 1DGC 30 REMARK 3 NUMBER OF NUCLEIC ACID ATOMS 386 1DGC 31 REMARK 4 1DGC 32 REMARK 4 GCN4: TRANSCRIPTIONAL ACTIVATOR OF GENES ENCODING FOR AMINO 1DGC 33 REMARK 4 ACID BIOSYNTHETIC ENZYMES. 1DGC 34 REMARK 5 1DGC 35 REMARK 5 AMINO ACIDS NUMBERING (RESIDUE NUMBER) CORRESPONDS TO THE 1DGC 36 REMARK 5 281 AMINO ACIDS OF INTACT GCN4. 1DGC 37 REMARK 6 1DGC 38 REMARK 6 BZIP SEQUENCE 220 - 281 USED FOR CRYSTALLIZATION. 1DGC 39 REMARK 7 1DGC 40 REMARK 7 MODEL FROM AMINO ACIDS 227 - 281 SINCE AMINO ACIDS 220 - 1DGC 41 REMARK 7 226 ARE NOT WELL ORDERED. 1DGC 42 REMARK 8 1DGC 43 REMARK 8 RESIDUE NUMBERING OF NUCLEOTIDES: 1DGC 44 REMARK 8 5' T G G A G A T G A C G T C A T C T C C 1DGC 45 REMARK 8 -10 -9 -8 -7 -6 -5 -4 -3 -2 -1 1 2 3 4 5 6 7 8 9 1DGC 46 REMARK 9 1DGC 47 REMARK 9 THE ASYMMETRIC UNIT CONTAINS ONE HALF OF PROTEIN/DNA 1DGC 48 REMARK 9 COMPLEX PER ASYMMETRIC UNIT. 1DGC 49 REMARK 10 1DGC 50 REMARK 10 MOLECULAR DYAD AXIS OF PROTEIN DIMER AND PALINDROMIC HALF 1DGC 51 REMARK 10 SITES OF THE DNA COINCIDES WITH CRYSTALLOGRAPHIC TWO-FOLD 1DGC 52 REMARK 10 AXIS. THE FULL PROTEIN/DNA COMPLEX CAN BE OBTAINED BY 1DGC 53 REMARK 10 APPLYING THE FOLLOWING TRANSFORMATION MATRIX AND 1DGC 54 REMARK 10 TRANSLATION VECTOR TO THE COORDINATES X Y Z: 1DGC 55 REMARK 10 1DGC 56 REMARK 10 0 -1 0 X 117.32 X SYMM 1DGC 57 REMARK 10 -1 0 0 Y + 117.32 = Y SYMM 1DGC 58 REMARK 10 0 0 -1 Z 43.33 Z SYMM 1DGC 59 SEQRES 1 A 62 ILE VAL PRO GLU SER SER ASP PRO ALA ALA LEU LYS ARG 1DGC 60 SEQRES 2 A 62 ALA ARG ASN THR GLU ALA ALA ARG ARG SER ARG ALA ARG 1DGC 61 SEQRES 3 A 62 LYS LEU GLN ARG MET LYS GLN LEU GLU ASP LYS VAL GLU 1DGC 62 SEQRES 4 A 62 GLU LEU LEU SER LYS ASN TYR HIS LEU GLU ASN GLU VAL 1DGC 63 SEQRES 5 A 62 ALA ARG LEU LYS LYS LEU VAL GLY GLU ARG 1DGC 64 SEQRES 1 B 19 T G G A G A T G A C G T C 1DGC 65 SEQRES 2 B 19 A T C T C C 1DGC 66 HELIX 1 A ALA A 228 LYS A 276 1 1DGC 67 CRYST1 58.660 58.660 86.660 90.00 90.00 90.00 P 41 21 2 8 1DGC 68 ORIGX1 1.000000 0.000000 0.000000 0.00000 1DGC 69 ORIGX2 0.000000 1.000000 0.000000 0.00000 1DGC 70 ORIGX3 0.000000 0.000000 1.000000 0.00000 1DGC 71 SCALE1 0.017047 0.000000 0.000000 0.00000 1DGC 72 SCALE2 0.000000 0.017047 0.000000 0.00000 1DGC 73 SCALE3 0.000000 0.000000 0.011539 0.00000 1DGC 74 ATOM 1 N PRO A 227 35.313 108.011 15.140 1.00 38.94 1DGC 75 ATOM 2 CA PRO A 227 34.172 107.658 15.972 1.00 39.82 1DGC 76 ATOM 842 C5 C B 9 57.692 100.286 22.744 1.00 29.82 1DGC 916 ATOM 843 C6 C B 9 58.128 100.193 21.465 1.00 30.63 1DGC 917 TER 844 C B 9 1DGC 918 MASTER 46 0 0 1 0 0 0 6 842 2 0 7 1DGC 919 END 1DGC 920

Editor's Notes

  1. Francis Ouellette August 3rd, 1999 Lecture 2.0
  2. Primary databases serve as a repository of experimentalist sequences (GenBank). Derivative databases are sources of edited/curated sequences (RefSeq…reference sequences, UniGene...genes compared to genetic loci on genomes)
  3. ~11,000 sequences are submitted per day.
  4. Francis Ouellette August 3rd, 1999 Lecture 2.0
  5. Francis Ouellette August 3rd, 1999 Lecture 2.0
  6. Francis Ouellette August 3rd, 1999 Lecture 2.0