Introduction to Bioinformatics: Part 3

Bioinformatics
Types (levels) of Biological Databases
Primary (hierarchical): raw data. Examples: EMBL, DDBJ, GenBank
Secondary (object-oriented): human-curated knowledge bases. Examples: SwissProt, UniProt, RefSeq, SEED
Tertiary (specialized, relational). Examples: FlyBase, WormBase
International databases (main sources)
America
Europe
Japan
The Central Paradigm of Bioinformatics
Genetic Information -> Molecular Structure -> Biochemical Function -> Symptoms
America: National Institutes of Health (NIH), National Center for Biotechnology Information (NCBI). Database: GenBank. Retrieval: Entrez Molecular Sequence Database System
Japan: National Institute of Genetics (NIG), Center for Information Biology (CIB). Database: The DNA Data Bank of Japan (DDBJ). Retrieval: getentry
Europe: The European Molecular Biology Laboratory (EMBL)
Computer scientists view DNA as nothing more than a string of letters, like any other 'text.'
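The view of DNA as a string of letters can be made concrete with ordinary string operations. The sketch below is only an illustration: the function names and the example sequence are made up for this note and do not come from any particular library.

```python
# Treat a DNA sequence as an ordinary Python string.
# The translation table maps each base to its complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def gc_content(seq: str) -> float:
    """Return the fraction of bases that are G or C (0.0 for an empty string)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def reverse_complement(seq: str) -> str:
    """Complement every base, then reverse the string."""
    return seq.upper().translate(COMPLEMENT)[::-1]


if __name__ == "__main__":
    example = "ATGCGTAC"  # made-up sequence, for illustration only
    print(gc_content(example))
    print(reverse_complement(example))
```

Nothing here is specific to biology: counting, mapping, and reversing characters are the same operations one would apply to any other text.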
Biological sequence databases
Nucleotide databases: GenBank of NCBI, EMBL, DDBJ
Genome databases: NCBI genomes, Ensembl Genome Browser, UCSC Genome Bioinformatics Site
Protein databases: SwissProt
The European Bioinformatics Institute (EBI): Sequence Retrieval System (SRS)
This resource organizes information on genomes, including
sequences, maps, chromosomes, assemblies, and annotations.
GenBank is a biological database that collects DNA sequences.
It accepts all submitted sequences.
It does not check whether a submitted sequence is accurate.
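GenBank records are usually retrieved through Entrez. As a rough sketch, the snippet below only builds a request URL for the NCBI E-utilities `efetch` endpoint and does not contact the network. The helper name `efetch_fasta_url` is hypothetical, and the accession is an illustrative choice; the parameter values should be checked against the current E-utilities documentation before real use.

```python
from urllib.parse import urlencode

# Base address of the NCBI E-utilities efetch service.
EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_fasta_url(accession: str) -> str:
    """Build (but do not send) a request for one nucleotide record in FASTA format.

    Hypothetical helper for illustration; 'nuccore' is the nucleotide database.
    """
    params = {
        "db": "nuccore",
        "id": accession,
        "rettype": "fasta",
        "retmode": "text",
    }
    return EFETCH + "?" + urlencode(params)


if __name__ == "__main__":
    # Illustrative accession only; substitute the record of interest.
    print(efetch_fasta_url("NM_000518"))
```

Because submitted records are not checked for accuracy, whatever is downloaded this way should still be inspected before analysis.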
Notes
Annotations of most genes and genomes are done by the submitting researchers and, except for
specific 'Reference Sequences' (RefSeqs), are not curated by GenBank experts.
As a result, the structure of the same gene reported by different labs can be different!
All genomes are full of SNPs and other polymorphisms.
Gene finding algorithms are not perfect, especially when it comes to predicting splice sites.
Alternative splicing can lead to different transcripts (different versions) of the same gene.
For these reasons, there is often no single correct sequence for a given gene.
PRINTS, PROSITE, Profiles, BLOCKS
The amount of computational processing work, however, varies greatly.
Some databases are simple archives of translated sequence data from identified open reading frames in DNA.
Others provide additional annotation and information related to higher
levels of structure and function.
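To show what "translated sequence data from identified open reading frames" means in practice, here is a minimal sketch that finds the first ATG-to-stop reading frame on the forward strand and translates it with the standard genetic code. The function name and the test sequence are invented for illustration, and real ORF finders also scan the reverse strand and apply length thresholds, which this sketch does not.

```python
from itertools import product

# Standard genetic code in the conventional compact layout:
# codons enumerated with bases in the order T, C, A, G; '*' marks a stop codon.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)}


def first_orf_protein(dna: str) -> str:
    """Translate the first ATG...stop open reading frame on the forward strand.

    Returns an empty string when no complete ORF is found.
    Codons containing unexpected letters are translated as 'X'.
    """
    dna = dna.upper()
    start = dna.find("ATG")
    while start != -1:
        protein = []
        for i in range(start, len(dna) - 2, 3):
            aa = CODON_TABLE.get(dna[i:i + 3], "X")
            if aa == "*":
                return "".join(protein)
            protein.append(aa)
        # No stop codon in this frame; try the next ATG.
        start = dna.find("ATG", start + 1)
    return ""


if __name__ == "__main__":
    print(first_orf_protein("CCATGGCTTAAGG"))  # made-up sequence
```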
The National Microbial Pathogen Data Resource (NMPDR):
a genomics platform based on subsystem annotation [PATRIC]
It can be easy to treat computational experiments as a big black box (the terminal).
Never run a computational experiment
without visually inspecting the data.
The computer might not be seeing the data you think it is.
It might not be producing the data you think it should.
Solutions when the analysis is too large to inspect in full:
Use only the first small fragment of your real data.
Generate a small, synthetic input specific to each step.
Use a random subset of the available data.
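The three strategies above can be sketched in a few lines. This is one possible approach, not a prescribed workflow: all function names are hypothetical, and the random subset uses reservoir sampling so that it also works on streams too large to hold in memory.

```python
import random
from itertools import islice


def head(records, n):
    """Strategy 1: keep only the first n records of the real data."""
    return list(islice(records, n))


def synthetic_reads(n, length, seed=0):
    """Strategy 2: generate n small random DNA strings as synthetic input."""
    rng = random.Random(seed)
    return ["".join(rng.choice("ACGT") for _ in range(length)) for _ in range(n)]


def random_subset(records, k, seed=0):
    """Strategy 3: reservoir sampling, k records drawn uniformly from a stream."""
    rng = random.Random(seed)
    sample = []
    for i, record in enumerate(records):
        if i < k:
            sample.append(record)
        else:
            # Replace an existing element with probability k / (i + 1).
            j = rng.randint(0, i)
            if j < k:
                sample[j] = record
    return sample


if __name__ == "__main__":
    data = range(1000)  # stand-in for a large list of records
    print(head(data, 5))
    print(synthetic_reads(3, 10))
    print(random_subset(data, 5))
```

Whichever subset is used, print it and look at it before running the full analysis.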
Sequence Analysis
Sequencing
Sequence Assembly
Alignment
Searching (in Databases)
Protein Sequencing Methods
Edman Degradation
Mass Spectrometry
Homology: qualitative, based on common ancestry
Similarity: quantitative
A sequence cannot be "more homologous" or have "high" or "low" homology.
A sequence can be a closely related homolog or a distant homolog.
Sequences can be highly similar but not homologous.
Proteins can be homologous despite low "primary" sequence similarity.
Proteins are modular.
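Because similarity is quantitative, it can be computed directly, whereas homology is a yes-or-no conclusion that has to be inferred. The sketch below computes percent identity for two sequences that are assumed to be already aligned (equal length, with "-" for gaps). The function name and the example sequences are illustrative, and other definitions of percent identity exist.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent of aligned columns in which both sequences carry the same residue.

    Assumes a and b are already aligned (same length, '-' marks a gap).
    Columns where both sequences have a gap are ignored.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" and y == "-":
            continue
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


if __name__ == "__main__":
    # Made-up aligned sequences that differ at one of eight positions.
    print(percent_identity("ACGTACGT", "ACGTTCGT"))
```

A high value from a function like this says nothing by itself about common ancestry; it is a measurement, not an evolutionary conclusion.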
APPLICATIONS (Uses)
The comparison of sequences in order to find similarity,
often to infer whether they are related (homologous).
Identification of intrinsic features of the sequence, such as
active sites,
post-translational modification sites,
gene structures,
distributions of introns and exons.
Identification of sequence differences that serve as genetic markers, such as
point mutations,
single nucleotide polymorphisms (SNPs).
Identification of molecular structure from sequence alone.
Genetic diseases.
Revealing the evolution and genetic diversity of sequences and organisms.
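Identifying sequence differences such as point mutations can be illustrated with a position-by-position comparison. This is a simplified sketch that assumes two sequences of equal length with no insertions or deletions; real variant calling works on alignments and quality scores. The function name and sequences are made up.

```python
def point_differences(reference: str, sample: str):
    """List (position, reference_base, sample_base) for every mismatch.

    Positions are 1-based. Assumes the sequences have equal length
    and differ only by substitutions.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must have the same length")
    return [
        (i + 1, r, s)
        for i, (r, s) in enumerate(zip(reference.upper(), sample.upper()))
        if r != s
    ]


if __name__ == "__main__":
    # Made-up sequences with substitutions at positions 5 and 8.
    for pos, ref_base, alt_base in point_differences("ACGTACGT", "ACGTTCGA"):
        print(pos, ref_base, alt_base)
```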
Including: sequencing, quality control, assembly, annotation, and comparison
DNA fragment sequencing
Overlapping windows matching
Local DNA Reassembly
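The idea of matching overlapping windows to reassemble fragments can be sketched as follows: find the longest suffix of one fragment that equals a prefix of the next, then join them on that overlap. This is a toy illustration with invented names and sequences; real assemblers tolerate sequencing errors and handle many fragments at once.

```python
def overlap_length(left: str, right: str, min_overlap: int = 3) -> int:
    """Length of the longest suffix of left that equals a prefix of right.

    Returns 0 if no overlap of at least min_overlap bases exists.
    """
    longest_possible = min(len(left), len(right))
    for n in range(longest_possible, min_overlap - 1, -1):
        if left.endswith(right[:n]):
            return n
    return 0


def merge_fragments(left: str, right: str, min_overlap: int = 3):
    """Join two fragments on their overlap, or return None if they do not overlap."""
    n = overlap_length(left, right, min_overlap)
    if n == 0:
        return None
    return left + right[n:]


if __name__ == "__main__":
    # Made-up fragments sharing the four bases "GTAC".
    print(merge_fragments("ATGCGTAC", "GTACCTGA"))
```

The `min_overlap` parameter is an assumption of this sketch: very short overlaps occur by chance, so a threshold keeps unrelated fragments from being joined.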
Similar sequences are usually, but not always, homologous, and not all homologs are similar.
Search sequence database for sequence-comparison analysis
Types
Orthology
Ex: Bovine ribonuclease (digestive enzyme) and human ribonuclease (digestive enzyme)
Paralogy
Ex: Human ribonuclease (digestive enzyme) and human angiogenin (stimulates blood vessel growth)
The plant Flu regulatory protein is present both in
Arabidopsis (a multicellular higher plant) and
Chlamydomonas (a single-cell green alga).
The complex Chlamydomonas version can fully substitute for the much simpler
Arabidopsis protein if it is transferred from the alga to the plant genome by means of
molecular cloning.
Two organisms that are very closely related are likely to display very
similar DNA sequences between two orthologs.
Conversely, an organism that is further removed evolutionarily from another organism is
likely to display a greater divergence in the sequence of the orthologs being studied.
Ex: The hemoglobin gene of humans and the myoglobin gene of chimpanzees are paralogs.
