Overview of Bioinformatics
- The application of computer science to molecular
biology, in particular to the study of macromolecules
such as proteins and nucleic acids
- Bioinformatics is an interdisciplinary research area
at the interface between computer science and
biological science
Synonyms: Molecular Bioinformatics,
Computational Biology, Biocomputing
Bioinformatics
What is bioinformatics?
Definition: Application of computational and analysis tools to the
capture and interpretation of biological data
Computational Biology is sometimes considered to be
synonymous with Bioinformatics
More commonly, Bioinformatics and Computational Biology are regarded
as overlapping terms, as might be represented by a Venn diagram
What does that mean?
Mathematics
IT/Engineering
Statistics
Processor development
Network traffic improvement
Storage solutions
Artificial Intelligence
Pattern recognition
Text mining
Image processing
Simulation
3D structure visualisation
Surface modelling
Ontologies
Databases
Sequence alignment
Comparative genomics
Drug design
Protein-protein interactions
Gene finding
Protein folding
Homology searching
Evolutionary modelling
Gene expression analysis
Non-coding RNA
GWAS
Annotation
Epidemiology
Personalised medicine
Biological networks
Bioinformatics Topics
Informatics Biology
Operating Systems
Windows, Macintosh, Linux
All OS options are conceptually identical …
each enables control over files, folders, and programs
Linux command line! … often the only option for
compute-intensive software
Programming
Sufficient skill to effect basic management of
large datasets is important
Sufficient skill to construct simple customized pipelines
Statistics
A basic understanding of Statistics is just as vital when
designing an experiment
Interpreting large datasets demands a
working familiarity with a quality statistical package
Bioinformatics software commonly employs statistics to
select the most probable answer from a set of many possible
answers to a given question
Data Generation
Experimental Data types include:
Sequences - Typically Next-Generation DNA Sequencing (NGS)
3D Protein Structures - X-ray crystallography or Nuclear
magnetic resonance spectroscopy (NMR)
Gene Expression Data - Microarrays
Data Analysis
The Alignment of Pairs of Homologous DNA/Protein sequences
Fundamental to most forms of DNA/Protein Sequence analysis
Searching for Homologous Sequences in a Sequence Database
Database searching is the most common Bioinformatics
process by far
Database searching is pairwise comparison repeated many times
The result is a list of matches, ordered by how improbable each
match is to occur just by chance
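The idea that database searching is pairwise comparison repeated many times can be sketched in a few lines of Python. This is a minimal illustration, not how production search tools work: it scores each database entry with a simple Needleman-Wunsch global alignment and ranks hits by raw score, whereas real tools rank by a statistical measure of how unlikely each match is by chance. The scoring values, the function names and the toy database are all assumptions made for the example.

```python
from typing import Dict, List, Tuple


def global_alignment_score(a: str, b: str, match: int = 1,
                           mismatch: int = -1, gap: int = -2) -> int:
    """Score of the best global (Needleman-Wunsch) alignment of a and b."""
    # prev[j] = best score for aligning the processed prefix of a with b[:j]
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # aligning a[:i] against an empty prefix of b
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def search_database(query: str, database: Dict[str, str]) -> List[Tuple[str, int]]:
    """Compare the query with every entry and return hits, best score first."""
    hits = [(name, global_alignment_score(query, seq))
            for name, seq in database.items()]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Toy database with hypothetical entry names (not real accessions).
toy_db = {
    "seq_unrelated": "CCCGGGC",
    "seq_one_mismatch": "GATTTCA",
    "seq_identical": "GATTACA",
}

for name, score in search_database("GATTACA", toy_db):
    print(name, score)
```

The search function is only a loop around the pairwise comparison, which is why the speed of the pairwise step dominates the cost of searching a large database.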
Data/Information Storage/Access
Raw experimental data can then be annotated in the light of
the analysis results
Data + Annotation = Information
Information can now be stored in Databases that allow
users easy and unrestricted access
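One way to picture "Data + Annotation = Information" is as a record that carries the raw sequence together with its annotations. The sketch below is only an illustration: the class name, field names and example values are hypothetical and do not follow the schema of any particular database.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class SequenceRecord:
    """Raw data (the sequence) plus free-form annotations about it."""
    identifier: str
    sequence: str
    annotations: Dict[str, str] = field(default_factory=dict)

    def is_annotated(self) -> bool:
        # A record becomes "information" once it carries at least one annotation.
        return bool(self.annotations)


# Hypothetical identifier and annotation values, for illustration only.
record = SequenceRecord("example_0001", "ATGAAAATCGTATACTGG")
print(record.is_annotated())

record.annotations["molecule_type"] = "DNA"
record.annotations["note"] = "hypothetical coding fragment"
print(record.is_annotated())
```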
Primary DNA Sequence Databases
Original submissions by experimentalists; content is controlled by the
submitter
EMBL, NCBI-GenBank, DDBJ
Primary Protein Sequence Databases
PIR, Swiss-Prot, TrEMBL
Genome Databases store entire genome sequence(s) AND their
interpretation
Protein Structure Databases
PDB, PDBj, CATH, SCOP
Gene Ontology Database
The Gene Ontology (GO) database provides a hierarchy of formally agreed terms
to describe gene products accurately and unambiguously
Searching with these terms radically improves the efficacy of
annotation searching
A simplistic ordering for the Bioinformatics Topics
Goal
- Ultimate goal: to better understand the functions of the cell at
the molecular level
- Bioinformatics research on raw sequences and structures
can generate new insights and provide a “global”
perspective of the cell
- Cell functions can be better understood by analyzing
sequence data, because the flow of genetic information is
dictated by the “central dogma” of biology
- Cellular functions are mainly performed by proteins, whose
capabilities are determined by their sequences
- Therefore, solving functional problems using
sequence and sometimes structural approaches has
proved to be a fruitful endeavor
Scope
- Bioinformatics consists of two subfields
i) Development of computational tools and databases
ii) Application of these tools and databases in generating
biological knowledge to better understand living systems
- Tool development includes writing software for sequence,
structural, and functional analysis, and constructing and curating
biological databases
- Sequence analysis includes sequence alignment, sequence
database searching, motif and pattern discovery, gene and
promoter finding, reconstruction of evolutionary relationships,
and genome assembly and comparison
- The three aspects of bioinformatics analysis (sequence,
structural, and functional) are not isolated but often interact
to produce integrated results
Why is bioinformatics needed?
• Small- and large-scale biological analyses
• New laboratory technologies
• Move away from single gene to whole genome
• Genome sequencing
• Collection and storage of biological information
• Manipulation of biological information
• Computers are capable of both, and are cheap
Problems and Challenges
• We may know the sequence of every possible
transcript but still not understand the functions of
these transcripts and their corresponding
proteins!
• How do we make sense of all the gene and
protein data in order to assign functions to
these genes and proteins and to understand
biological processes at the molecular level?
gggtctctcttgttagaccagatctgagcctgggagctctctggctaactagggaacccactgcttaagcctcaataaagcttgccttgagtgcttcaagtagtgtgtgcccgtctgttgtgtgactctgatagctagagatcccttcagaccaaatttagtcagtgtgaaaa
atctctagcagtggcgcctgaacagggacttgaaagcgaaagagaaaccagagaagctctctcgacgcaggactcggcttgctgaagcgcgcacggcaagaggcgaggggacggcgactggtgagtacgccaaaattttgactagcggaggctagaaggagagagatgggtgc
gagagcgtcgatattaagcgggggaggattagatagatgggaaaaaattcggttaaggccagggggaaagaaaaaatatagattaaaacatttagtatgggcaagcagggagctagaacgattcgcagtcaatcctggcctattagaaacatcagaaggttgtagacaaatac
tgggacaactacaaccagcccttcagacaggatcagaagaacttagatcattatataatacagtagcaaccctctattgtgtgcatcaaaagatagatgtaaaagacaccaaggaagctttagataagatagaggaagagcaaaacaaaagtaagaaaaaagcacagcaagca
gcagctgacacaggaaatagcagccaggtcagccaaaattaccccatagtgcagaacatccaggggcaaatggtacatcaggccatatcacctagaactttaaatgcatgggtaaaagtagtagaagagaaggctttcagcccagaagtaatacccatgttttcagcattatc
agaaggagccaccccacaagatttaaacaccatgctaaacacagtggggggacatcaagcagccatgcaaatgttaaaagagaccatcaatgaggaagctgcagaatgggatagattgcatccagtgcatgcagggcctcatccaccaggccagatgagagaaccaaggggaa
gtgacatagcaggaactactagtacccttcaggaacaaatagcatggatgacaaataatccacctatcccagtaggagaaatctataagagatggataatcctgggattaaataaaatagtaaggatgtatagccctaccagcattctggacataaaacaaggaccaaaggaa
ccctttagagactatgtagaccggttctataagactctaagagccgagcaagcttcacaggaggtaaaaaattggatgacagaaaccttgttggtccaaaatgcgaacccagattgtaagactattttaaaagcattgggaccagcagctacactagaagaaatgatgacagc
atgtcagggagtgggaggacccggccataaagcaagagttttggcagaagcaatgagccaagtaacaaattcagctaccataatgatgcagaaaggcaattttaggaaccaaagaaaaattgttaagtgtttcaattgtggcaaagaagggcacatagccaaaaattgcaggg
cccctaggaaaaggggctgttggaaatgtggaaaggagggacaccaaatgaaagattgtactgagagacaggctaattttttagggaaaatctggccttcccacaggggaaggccagggaattttcctcagaacagactagagccaacagccccaccagccccaccagaagag
agcttcaggtttggggaagagacaacaactccctctcagaagcaggagctgatagacaaggaactgtatccttcagcttccctcaaatcactctttggcaacgaccccttgtcacaataaagataggggggcaactaaaggaagctctattagatacaggagcagatgataca
gtattagaagaaataaatttgccaggaagatggaaaccaaaaatgatagggggaattggaggttttatcaaagtaagacagtatgatcaaatactcgtagaaatctgtggacataaagctataggtacagtattagtaggacctacacctgtcaacataattggaagaaatct
gttgactcagattggttgcactttaaattttcccattagtcctattgaaactgtaccagtaaaattaaagccaggaatggatggcccaaaagttaaacaatggccattgacagaagaaaaaataaaagcattagtagaaatctgtacagaaatggaaaaggaaggaaaaattt
caaaaatcgggcctgaaaatccatataatactccagtatttgccataaagaaaaaagacagtactaaatggagaaaattagtagatttcagagaacttaataagaaaactcaagacttctgggaagttcaattaggaataccacatcccgcagggttaaaaaagaaaaaatca
gtaacagtactggatgtgggtgatgcatatttttcagttcccttagataaagaattcaggaagtacactgcatttaccatacctagtataaacaatgagacaccagggattagatatcagtacaatgtgcttccacagggatggaaaggatcaccagcaatattccaaagcag
catgacaaaaatcttagagccttttagaaaacaaaatccagacatagttatctatcaatacatggacgatttgtatgtaggatctgacttagaaatagggcagcatagaacaaaaatagaggaactgagacaacatctgttgaagtggggatttaccacaccagacaaaaaac
atcagaaagaacctccattcctttggatgggttatgaactccatcctgataaatggacagtacagcctatagtgctgccagaaaaggacagctggactgtcaatgacatacagaagttagtgggaaaattgaattgggcaagtcagatttacccagggattaaagtaaagcaa
ttatgtagactccttaggggaaccaaggcactaacagaagtaataccactaacaaaagaagcagagctagaactggcagaaaacagggaaattctaaaagaaccagtacatggagtgtattatgacccatcaaaagacttaatagcggaaatacagaagcaggggcaaggtca
atggacatatcaaatttatcaagagccatttaaaaatctgaaaacaggaaaatatgcaagaatgaggggtgcccacactaatgatgtaaaacaattaacagaggcagtgcaaaaaataaccacagaaagcatagtaatatggggaaagactcctaaatttaaactacccatac
aaaaagaaacatgggaaacatggtggacagagtattggcaagccacctggattcctgagtgggagtttgtcaatacccctcccttagtaaaattatggtaccagttagagaaagaacccataataggagcagaaactttctatgtagatggggcagctaacagggagactaaa
ttaggaaaagcaggatatgttactaacaaagggagacaaaaagttgtctccataactgacacaacaaatcagaagactgagttacaagcaattcttctagcattacaggattctggattagaagtaaacatagtaacagactcacaatatgcattaggaatcattcaagcaca
accagataaaagtgaatcagagatagtcagtcaaataatagagcagttaataaaaaaagaaaaggtctacctgacatgggtaccagcgcacaaaggaattggaggaaatgaacaagtagataaattagtcagtactggaatcaggaaagtactctttttagatggaatagata
aagcccaagaagaacatgaaaaatatcacagtaattggagggcaatggctagtgattttaacctgccacctgtggtagcaaaagagatagtagccagctgtgataaatgtcagctaaaaggagaagccatgcatggacaagtagactgtagtccaggaatatggcaactagat
tgtacacatttagaaggaaaaattatcctggtagcagttcatgtagccagtggatatatagaagcagaagttattccagcagaaacagggcaggaaacagcatactttctcttaaaattagcaggaagatggccagtaaaaacagtacatacagacaatggcagcaatttcac
cagtactacagttaaggccgcctgttggtgggcaggaatcaagcaggaatttggcattccctacaatccccaaagtcaaggagtagtagaatctataaataaagaattaaagaaagttataggacagataagagatcaggctgaacatcttaagacagcagtacaaatggcag
tattcatccacaattttaaaagaaaaggggggattggggggtacagtgcaggggaaagaatagtagacataatagcaacagacatacaaactaaagaactacaaaaacaaattacaaaaattcaaaattttcgggtttattacagggacagcagagatccactttggaaagga
ccagcaaagcttctctggaaaggtgaaggggcagtagtaatacaagataatagtgacataaaagtagtgccaagaagaaaagcaaagatcattagggattatggaaaacagatggcaggtgatgattgtgtggcaagtagacaggatgaggattagaacatggaaaagtttag
taaaacaccatatgtatgtttcaaggaaagctaagggatggttttatagacatcactatgaaagtactcatccgagaataagttcagaagtacacatcccactagggaatgcaaaattggtaataacaacatattggggtctacatacaggagaaagagactggcatttgggt
caaggagtctccatagaattgaggaaaaggagatatagcacacaattagaccctaacctagcagaccaactaattcatctgcattactttgattgtttttcagaatctgctataagaaatgccatattaggacatatagttagccctaggtgtgaatatcaagcaggacataa
caaggtaggatctctacagtacttggcactaacagcattagtaagaccaagaaaaaagataaagccacctttgcctagtgttacaaaactgacagaggatagatggaacaagccccagaagaccaagggccacaaagggaaccatacaatgaatggacactagaacttttaga
ggagctcaagaatgaagctgttagacattttcctaggatatggctccatagcttagggcaacatatctatgaaacttatggagatacttgggcaggagtggaagccataataagaattctgcaacaactgctgtttattcatttcagaattgggtgtcaacatagcagaatag
acattcttcgacgaaggagagcaagaaatggagccagtagatcctagactagagccctggaagcatccaggaagtcagcctaggactgcttgtaccaattgctattgtaaaaagtgttgctttcattgccaagtttgtttcataacaaaaggcttaggcatctcctatggcag
gaagaagcggagacagcgacgaagagctcctcaagacagtcagactcatcaagtttctctatcaaagcagtaagtagtacatgtaatgcaatctttacaaatattagcagtagtagcattagtagtagcagcaataatagcaatagttgtgtggtccatagtattcatagaat
ataggaaaataagaagacaaaacaaaatagaaaggttgattgatagaataatagaaagagcagaagacagtggcaatgagagtgacggagatcaggaagaattatcagcacttgtggaaatggggcacgatgctccttgggatgttaatgatctgtaaagctgcagaaaattt
gtgggtcacagtttattatggggtacctgtgtggaaagaagcaaccaccactctattttgtgcctcagatgctaaagcgtatgatacagaggtacataatgtttgggccacacatgcctgtgtacccacagaccccaacccacaagaagtagaactgaagaatgtgacagaaa
attttaacatgtggaaaaataacatggtagaccaaatgcatgaggatataattagtttatgggatcaaagcctaaagccatgtgtaaaattaaccccactctgtgttactttaaattgcactgattatgggaatgatactaacaccaataatagtagtgctactaaccccact
agtagtagcgggggaatggaggggagaggagaaataaaaaattgctctttcaatatcaccagaagcataagagataaagtgaagaaagaatatgcacttttttatagtcttgatgtaataccaataaaagatgataatactagctataggttgagaagttgtaacacctcagt
cattacacaggcctgtccaaaggtatcctttgaaccaattcccatacattattgtgccccggctggttttgcgattctaaagtgtaatgataaaaagttcaatggaaaaggaccatgtacaaatgtcagcacagtacaatgtacacatggaattaggccagtagtatcaactc
aactgctgttaaatggcagtctagcagaagaagaggtagtaattagatcagacaatttctcggacaatgctaaagtcataatagtacatctgaatgaatctgtagaaattaattgtacaagactcaacaacattacaaggagaagtatacatgtaggacatgtaggaccaggc
agagcaatttatacaacaggaataataggaaaaataagacaagcacattgtaacattagtagagcaaaatggaataacactttaaaacagatagttacaaaattaagagaacaatttaagaataaaacaatagtctttaatcaatcctcaggaggggacccagaaattgtaat
gcacagttttaattgtggaggggaatttttctactgtaattcaacacaactgtttaacagtacttggaatggtactgcatggtcaaataacactgaaggaaatgaaaatgacacaatcacactcccatgcagaataaaacaaattataaacatgtggcaggaagtaggaaaag
caatgtatgcacctcccatcagaggacaaattagatgttcatcaaatattacagggctgatattaacaagagatggtggtattaaccagaccaacaccaccgagattttcaggcctggaggaggagatatgaaggacaattggagaagtgaattatataaatataaagtagta
aaaattgaaccattaggagtagcacccaccaaggcaaagagaagagtggtgcaaagagaaaaaagagcagtgggaataataggagctatgctccttgggttcttgggagcagcaggaagcactatgggcgcagcgtcaatgacgctgacggtacaggccagacaattattgtc
tggtatagtgcaacagcagaacaatttgctgagggctattgaggcgcaacagcatctgttgcacctcacagtctggggcatcaagcagctccaagcaagagtcctggctgtggaaagatacctaagggatcaacagctcctggggttttggggttgctctggaaaactcattt
gcaccactgctgtgccttggaatactagttggagtaataaatctctgagtcagatttgggataacatgacctggatgcagtgggaaagggaaattgataattacacaagcttaatatacaacttaattgaagaatcgcaaaaccaacaagaaaagaatgaacaagagttattg
gaattagataactgggcaagtttgtggaattggtttagcataacaaattggctgtggtatataaaaatattcataatgatagtaggaggcttggtaggtttaagaatagtttttactgtactttctatagtaaatagagttaggcagggatactcaccattgtcgtttcagac
gcgcctcccagccaggaggggacccgacaggcccgaaggaatcgaagaagaaggtggagagagagacagagacagatccggtcaattagtggatggattcttagcaattatctgggtcgacctgcggagcctgtgcctcttcagctaccaccgcttgagagacttactcttga
ttgtaacgaggattgtggaacttctgggacgcagggggtgggaagccctcaaatattggtggaatctcctacaatattggattcaggaactaaagaatagtgctgttagcttgctcaacgccacagccatagcagtagctgagggaactgatagggttatagaagtattacaa
agagcttgtagagctattctccacatacctagaagaataagacagggcttagaaagggctttgcaataagatgggtggtaagtggtcaaaaagtagtaaaattggatggcctactgtaagggaaagaatgagaagagctgagccagcagcagatggggtgggagcagtatctc
gagacctggaaaaacatggagcaatcacaagtagtaatacagcaactaacaatgctgattgtgcctggctagaagcacaagaggaggaggaggtgggttttccagtcagacctcaggtacctttaagaccaatgacttacaagggagcgttagatcttagccactttttaaaa
gaaaaggggggactggaagggctaatttggtcccagaaaagacaagacatccttgatttgtgggtccaccacacacaaggctacttccctgattggcagaactacacaccagggccagggatcagatatccactgacctttggttggtgcttcaagctagtaccagttgagcc
agagaaggtagaagaggccaatgaaggagagaacaacagattgttacaccctgtgagcctgcatgggatggaggacccggagaaagaagtgttagtatggaggtttgacagccgcctagtactccgtcacatggcccgagagctgcatccggagtactacaaggactgctgac
actgagctttctacaagggactttccgctggggactttccagggaggcgtggcctgggcgggactggggagtggcgagccctcagatgctgcatataagcagctgctttttgcctgtactgggtctctcttgttagaccagatctgagcctgggagctctctggctaactagg
gaacccactgcttaagcctcaataaagcttgccttgagtgcttca
DNA sequences are meaningless!
Challenges
• Databases and data resources
– Because we need to store and retrieve lots of data
• Search and analysis tools
– Because we need to infer function by comparison
• Interfaces and visualisation tools
– Because we need to look at lots of data
From gene to protein and its function(s)
> DNA sequence
AATTCATGAAAATCGTATACTGGTCTGGTACCGGCAACAC
TGAGAAAATGGCAGAGCTCATCGCTAAAGGTATCATCGAA
TCTGGTAAAGACGTCAACACCATCAACGTGTCTGACGTTA
ACATCGATGAACTGCTGAACGAAGATATCCTGATCCTGGG
TTGCTCTGCCATGGGCGATGAAGTTCTCGAGGAAAGCGAA
TTTGAACCGTTCATCGAAGAGATCTCTACCAAAATCTCTG
GTAAGAAGGTTGCGCTGTTCGGTTCTTACGGTTGGGGCGA
CGGTAAGTGGATGCGTGACTTCGAAGAACGTATGAACGGC
TACGGTTGCGTTGTTGTTGAGACCCCGCTGATCGTTCAGA
ACGAGCCGGACGAAGCTGAGCAGGACTGCATCGAATTTGG
TAAGAAGATCGCGAACATCTAGTAGA
Gene
> Protein sequence
MKIVYWSGTGNTEKMAELIAKGIIESGKDVNTINVS
DVNIDELLNEDILILGCSAMGDEVLEESEFEPFIEEIS
TKISGKKVALFGSYGWGDGKWMRDFEERMNGYG
CVVVETPLIVQNEPDEAEQDCIEFGKKIANI
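The step from the DNA sequence to the protein sequence shown above can be reproduced with a short script. This is a minimal sketch under stated assumptions: it uses the standard genetic code, reads only the forward strand, starts at the first ATG and stops at the first in-frame stop codon. The function name is made up for the example, and only the first line of the slide's DNA sequence is used as input.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order for each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate_from_first_start(dna: str) -> str:
    """Translate from the first ATG up to (not including) the first stop codon."""
    dna = dna.upper()
    start = dna.find("ATG")
    if start == -1:
        return ""  # no start codon found
    protein = []
    # Step through complete codons only; a trailing partial codon is ignored.
    for i in range(start, len(dna) - 2, 3):
        amino_acid = CODON_TABLE.get(dna[i:i + 3], "X")  # X = unknown codon
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


# First line of the DNA sequence from the slide.
fragment = "AATTCATGAAAATCGTATACTGGTCTGGTACCGGCAACAC"
print(translate_from_first_start(fragment))  # MKIVYWSGTGN
```

The output matches the first residues of the protein sequence on the slide. Translating is the easy part; deciding what the resulting protein does is the harder question raised next.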
Function
What is the function of these structures?
What is the function of this sequence?
What is the function of this motif?
– the fold provides a scaffold, which can be decorated
in different ways by different sequences to confer
different functions
– knowing the fold & function allows us to rationalise
how the structure effects its function at the molecular
level
Goals of Functional Genomics
Tools currently available for genomics and
functional genomics studies
• Standard molecular biology and protein analysis
techniques, e.g. hybridization, 2D gel
electrophoresis, SAGE, etc.
• Advanced technologies, e.g. microarrays, GeneChips,
proteomics, etc.
• Bioinformatics: gene annotation, gene and genome
analysis, data mining, etc.
Molecular Biology
• Central Dogma of Molecular Biology:
– molecules and processes.
• Molecular biology studies:
–structure of macromolecules (DNA, RNA and protein)
–flow and expression of genetic information.
–metabolic steps that mediate the flow of information
from the genome to the phenotype of the organism
[Figure: flow of genetic information. DNA (5’ to 3’) is transcribed into mRNA; after splicing, the mRNA is translated into a polypeptide; the polypeptide folds into a protein; transport/localization, oligomerization and post-translational modification then lead to function]
We need Bioinformatics at all levels
At Genome Level: Genome projects need to store and organize DNA sequences
At Transcription Level: How do we find protein-coding regions, introns and exons in genomic DNA sequences?
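A first, very rough answer to the question of finding protein-coding regions is to scan for open reading frames (ORFs). The sketch below is a simplification under stated assumptions: it looks only at the three forward reading frames, treats ATG as the only start codon, and knows nothing about introns, so it does not handle spliced eukaryotic genes. Real gene finders use statistical models and both strands. The function name and the `min_codons` parameter are hypothetical.

```python
from typing import List, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna: str, min_codons: int = 3) -> List[Tuple[int, int]]:
    """Return (start, end) positions of forward-strand ORFs.

    start is the index of the A in ATG; end is one past the stop codon.
    min_codons counts the codons before the stop codon, ATG included.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None  # index of the currently open ATG, if any
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None  # look for the next ORF in this frame
    return sorted(orfs)


example = "CCATGAAATTTGGGTAGCC"
for begin, end in find_orfs(example):
    print(begin, end, example[begin:end])
```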
At Transcription Level: Under which conditions is a certain gene transcribed?
At Translation Level: What do we know about a specific protein?
At Translation Level: How can we compare protein sequences?
At Structure Level: Can we predict protein structures?
Example
Leveraging Genomic Data
Novel Diagnostics
Microchips & Microarrays - DNA
Gene Expression - RNA
Proteomics - Protein
Understanding Metabolism
Understanding Disease
Inherited Diseases - OMIM
Infectious Diseases
Pathogenic Bacteria
Viruses
Novel Therapeutics
Drug Target Discovery
Rational Drug Design
Molecular Docking
Gene Therapy
Stem Cell Therapy
Impact of Genomics on Medicine
I. Diagnostics
• Genomics: Identifying all known human genes
• Functional Genomics: Functional analysis of genes
– In what tissues are they important?
– When in development are the genes used?
– How are they regulated?
• Novel diagnostics
– Linking genes to diseases and to traits
– Predisposition to diseases
– Expression of genes and disease
• Personal Genomics
– Understanding the link between genomics and environment
– Increased vigilance and taking action to prevent disease
• Improving health care
Impact of Genomics on Medicine
II. Therapeutics
• Novel Drug Development
– Identifying novel drug targets
– Validating drug targets
– Predicting toxicity and adverse reactions
– Improving clinical trials and testing
• Gene therapy
– Replacing the gene rather than the gene product
• Stem cell therapies
– Replacing the entire cell type or tissue to cure a disease
• Pharmacogenomics
– Personalized medicine
– Adjusting drugs, doses and delivery to suit patients
– Maximizing efficacy and minimizing side effects
– Identifying the genetics of adverse reactions
– Identifying patients who respond optimally
Application of bioinformatics
• To clinical problems
– Understanding disease
– Treatment and management
– Development of medicines
– Tailoring treatment
Applications of Bioinformatics
[Figure: application areas arranged around a core of data analysis, algorithms, visualization, statistics, etc.; chemical structure drawings omitted]
- Molecular Interactions
- Structure Prediction
- Search for new drugs
- DNA chips
- Biochemical Networks
- Genetic Variations
- Optimizing therapies
Sequence Analysis
Genomes
Proteins
d1dhfa_ LNCIVAVSQNMGIGKNGDLPWPPLRNEFRYFQRMTTTSSVEGKQ-NLVIMGKKTWFSI
d8dfr__ LNSIVAVCQNMGIGKDGNLPWPPLRNEYKYFQRMTSTSHVEGKQ-NAVIMGKKTWFSI
d4dfra_ ISLIAALAVDRVIGMENAMPWN-LPADLAWFKRNTL--------NKPVIMGRHTWESI
d3dfr__ TAFLWAQDRDGLIGKDGHLPWH-LPDDLHYFRAQTV--------GKIMVVGRRTYESF

d1dhfa_ LNCIVAVSQNMGIGKNGDLPWPPLRNEFRYFQRMTTTSSVEGKQ-NLVIMGKKTWFSI
d8dfr__ LNSIVAVCQNMGIGKDGNLPWPPLRNEYKYFQRMTSTSHVEGKQ-NAVIMGKKTWFSI
d4dfra_ ISLIAALAVDRVIGMENAMPW-NLPADLAWFKRNTLD--------KPVIMGRHTWESI
d3dfr__ TAFLWAQDRNGLIGKDGHLPW-HLPDDLHYFRAQTVG--------KIMVVGRRTYESF
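A simple way to quantify how similar two aligned protein rows are is percent identity: the share of aligned positions where both rows carry the same residue. The sketch below is an illustration under stated assumptions: the function name is made up, columns containing a gap in either row are skipped (other conventions exist), and the input is the first 44-residue segment of the d1dhfa_ and d8dfr__ rows from the alignment above.

```python
def percent_identity(row_a: str, row_b: str) -> float:
    """Percent of gap-free aligned columns in which the two residues match."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    compared = 0
    matches = 0
    for residue_a, residue_b in zip(row_a, row_b):
        if residue_a == "-" or residue_b == "-":
            continue  # skip columns with a gap in either row
        compared += 1
        if residue_a == residue_b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


# First segment of two rows from the alignment on the slide.
d1dhfa = "LNCIVAVSQNMGIGKNGDLPWPPLRNEFRYFQRMTTTSSVEGKQ"
d8dfr = "LNSIVAVCQNMGIGKDGNLPWPPLRNEYKYFQRMTSTSHVEGKQ"

print(round(percent_identity(d1dhfa, d8dfr), 1))
```

Percent identity depends on the alignment it is computed from, so the two alternative alignments of the same four sequences can give slightly different values for the rows whose gaps moved.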
caaaaatagggttaatatgaatctcgatctccattttgttcatcgtattcaacaacaagcc
aaaactcgtacaaatatgaccgcacttcgctataaagaacacggcttgtggcgagatatct
cttggaaaaactttcaagagcaactcaatcaactttctcgagcattgcttgctcacaatat
tgacgtacaagataaaatcgccatttttgcccataatatggaacgttgggttgttcatgaa
actttcggtatcaaagatggtttaatgaccactgttcacgcaacgactacaatcgttgaca
ttgcgaccttacaaattcgagcaatcacagtgcctatttacgcaaccaatacagcccagca
agcagaatttatcctaaatcacgccgatgtaaaaattctcttcgtcggcgatcaagagcaa
tacgatcaaacattggaaattgctcatcattgtccaaaattacaaaaaattgtagcaatga
aatccaccattcaattacaacaagatcctctttcttgcacttgg
