This document provides an introduction and primer to key concepts in bioinformatics. It discusses DNA structure and genes, how bioinformatics uses computer science to solve biological problems like genome sequencing, and the central dogma of DNA transcription and translation into mRNA and protein. It then outlines three tasks - converting a DNA sequence to mRNA, evaluating a sequence for single nucleotide polymorphisms related to sickle cell disease, and using a restriction endonuclease to identify tandem repeats related to Huntington's disease risk.
2. Key Theory - What is DNA?
DNA acts as the genetic code for the majority of living organisms
Double helical structure in the nucleus
Composed of four nucleotide bases (Adenine, Thymine, Guanine, Cytosine)
Complementary base pairing (A-T, G-C)
Sections of DNA are called genes (variations are called alleles)
Two strands run in opposite directions (Antiparallel)
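Complementary base pairing and the antiparallel arrangement of the two strands can be sketched in a few lines of Python. This is a minimal illustration only; the function name `reverse_complement` and the example sequences are assumptions made for this sketch, not part of the original material.

```python
# Watson-Crick pairing: A pairs with T, G pairs with C.
COMPLEMENT = str.maketrans("ATGC", "TACG")

def reverse_complement(strand: str) -> str:
    """Return the opposite strand, written 5' to 3'."""
    # Complement every base, then reverse the result because the
    # two strands run in opposite directions (antiparallel).
    return strand.upper().translate(COMPLEMENT)[::-1]

print(reverse_complement("ATGC"))  # GCAT
```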
3. Introduction to Bioinformatics
Modern biology and medicine rely on genetics (and therefore on bioinformatics)
Uses computer science [and sometimes statistics] to solve biological problems
The Human Genome Project sequenced the human genome between 1990 and 2003 (although work on it is technically still going)
The reference genome is an averaged, representative collection of all bases in human DNA
Helps diagnose genetic diseases and identify why they exist and how to fix them
FASTA format is used to denote nucleotide sequences
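A FASTA file stores each sequence as a header line starting with `>` followed by one or more lines of sequence. The sketch below reads that layout into a dictionary; `parse_fasta` and the example record are hypothetical names and made-up data used only for illustration.

```python
def parse_fasta(text: str) -> dict[str, str]:
    """Map each header (without the leading '>') to its full sequence."""
    records: dict[str, str] = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header = line[1:]
            records[header] = ""
        elif header is not None:
            # Sequence lines may be wrapped, so join them together.
            records[header] += line.upper()
    return records

# Made-up record for illustration only.
example = """>hypothetical_record_1 made-up example
ATGGTGCAC
CTGACT
"""
print(parse_fasta(example))
```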
4. Key Theory - The Central Dogma
Three consecutive bases in DNA are called a codon, which codes for a particular amino acid (20 different kinds)
The genetic code is universal, degenerate and non-overlapping
The enzyme RNA polymerase reads the template strand (3’ to 5’)
It creates single-stranded messenger RNA (mRNA) to carry this message to the ribosomes; the mRNA matches the coding strand (5’ to 3’)
RNA is single-stranded and has uracil instead of thymine
Transfer RNA (tRNA) brings amino acids to the ribosome to build a protein
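Reading an mRNA three bases at a time and looking each codon up in the standard genetic code can be sketched as follows. The names `CODON_TABLE` and `translate` are illustrative assumptions, the table is built from the usual compact UCAG ordering, and the sketch assumes translation starts at the first base and stops at the first stop codon.

```python
from itertools import product

BASES = "UCAG"
# One-letter amino acid codes for all 64 codons in UCAG order; '*' is a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def translate(mrna: str) -> str:
    """Translate an mRNA string codon by codon until a stop codon."""
    protein = []
    # Non-overlapping: step through the sequence three bases at a time.
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)

print(translate("AUGGUGCACUGA"))  # MVH
```

The table also shows the degeneracy of the code: both `CAA` and `CAG` map to `Q` (glutamine).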
5. Task 1 – Convert DNA to mRNA (transcription)
Scenario: a patient is suspected of having sickle cell disease due to symptoms of episodes of pain and frequent fatigue
Sickle cell disease causes red blood cells (RBCs) to deform into rigid, sickle (crescent) shapes
Sufferers are prone to anaemia due to insufficient oxygen-carrying capacity
It is caused by a change to a single nucleotide in the HBB gene on chromosome 11
Result should be “AUUGCCGUCUGAAGAGGACUCCUCAGUCUACGUGGU”
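One possible way to approach this task in Python is sketched below. The task's input sequence is not reproduced here, so the example uses a short made-up sequence; the function names are hypothetical, and which function applies depends on whether the supplied DNA is the coding strand or the template strand.

```python
# DNA base -> complementary RNA base (A pairs with U in RNA).
DNA_TO_RNA_COMPLEMENT = str.maketrans("ATGC", "UACG")

def transcribe_coding(coding_strand: str) -> str:
    """mRNA from the coding strand (5' to 3'): swap T for U."""
    return coding_strand.upper().replace("T", "U")

def transcribe_template(template_3_to_5: str) -> str:
    """mRNA from the template strand written 3' to 5': complement each base."""
    return template_3_to_5.upper().translate(DNA_TO_RNA_COMPLEMENT)

# Made-up sequences for illustration only.
print(transcribe_coding("ATGC"))    # AUGC
print(transcribe_template("TACG"))  # AUGC
```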
6. Key Theory - Basis of Genetic Mutations
Mutations are changes in the base sequence; single-base changes are known as single nucleotide polymorphisms (SNPs)
Caused by natural replication errors or mutagenic agents (radiation and chemicals)
Indels – insertion or deletion of bases
Substitutions – one base is exchanged for another (e.g. A to T)
Others include changes in chromosome number (e.g. non-disjunction in Down’s syndrome), translocation and inversion
Can have beneficial effects (increasing diversity) or detrimental ones (genetic disease)
7. Task 2 – Evaluate for Single Nucleotide Polymorphism
Single nucleotide polymorphisms can be detected by comparing the patient's sequence with the reference sequence
Only those who are homozygous will display sickle cell symptoms – why?
The sickle cell allele is evolutionarily advantageous against malaria
Found predominantly in people of African, Arabian and Indian origin
What type of SNP have you detected?
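Comparing a sample against a reference, position by position, can be sketched as below. This assumes the two sequences are already aligned and of equal length (so it reports substitutions, not indels); `find_snps` and both sequences are made up for illustration and are not the task's data.

```python
def find_snps(reference: str, sample: str) -> list[tuple[int, str, str]]:
    """Return (1-based position, reference base, sample base) for each mismatch."""
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned and of equal length")
    return [
        (position, ref_base, sample_base)
        for position, (ref_base, sample_base) in enumerate(zip(reference, sample), start=1)
        if ref_base != sample_base
    ]

# Made-up sequences for illustration only.
print(find_snps("ACGTACGT", "ACGTTCGT"))  # [(5, 'A', 'T')]
```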
8. Key Theory – Restriction Endonucleases
Enzymes that cut DNA sequences at specific palindromic recognition sites
Make incisions in the sugar-phosphate backbone of each strand separately
Found in bacteria and archaea, where they act as an immune system against viruses
Can leave “blunt ends” (e.g. SmaI) or “sticky ends” (e.g. EcoRI)
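The idea of a palindromic recognition site, and of cutting a sequence wherever that site occurs, can be sketched as follows. The recognition sites used are the well-known ones for EcoRI (GAATTC, cut after the G) and SmaI (CCCGGG, cut in the middle); the function names and test sequences are assumptions for illustration, and the sketch models the top strand only.

```python
COMPLEMENT = str.maketrans("ATGC", "TACG")

def is_palindromic(site: str) -> bool:
    """A site is palindromic if it equals its own reverse complement."""
    return site == site.translate(COMPLEMENT)[::-1]

def digest(sequence: str, site: str, cut_offset: int) -> list[str]:
    """Split the top strand at every occurrence of `site`.

    `cut_offset` is the number of bases into the site where the cut falls.
    """
    fragments, start = [], 0
    position = sequence.find(site)
    while position != -1:
        fragments.append(sequence[start:position + cut_offset])
        start = position + cut_offset
        position = sequence.find(site, position + 1)
    fragments.append(sequence[start:])
    return fragments

print(is_palindromic("GAATTC"))            # True
print(digest("AAGAATTCGG", "GAATTC", 1))   # ['AAG', 'AATTCGG']
print(digest("TTCCCGGGAA", "CCCGGG", 3))   # ['TTCCC', 'GGGAA']
```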
9. Task 3 – Identifying Tandem Repeats
Scenario – a patient has a relative who was recently diagnosed with Huntington’s disease; we need to identify whether the patient is at risk
Causes movement disorders such as involuntary jerking motions, and cognitive disorders such as difficulty processing words and new information
A defective gene with excessive tandem repeats codes for a faulty protein which gradually degenerates the brain
The HD gene has a section of trinucleotide repeats (CAG) coding for a polyglutamine tract; above 36 repeats means Huntington’s will probably manifest
Uses the EcoP15I restriction endonuclease, which recognises CAGCAG and cuts a short, fixed distance downstream of that site
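Counting the longest uninterrupted run of CAG repeats can be sketched directly with a regular expression, without simulating the enzyme. This is one possible approach rather than the task's prescribed method; `longest_cag_run`, `REPEAT_THRESHOLD` and the test sequence are made up for illustration, and the threshold of 36 is taken from the slide above.

```python
import re

REPEAT_THRESHOLD = 36  # from the slide: above 36 repeats suggests disease

def longest_cag_run(sequence: str) -> int:
    """Return the number of repeats in the longest uninterrupted CAG run."""
    runs = re.findall(r"(?:CAG)+", sequence.upper())
    return max((len(run) // 3 for run in runs), default=0)

# Made-up sequence: 40 CAG repeats, then CAA, then one more CAG.
sample = "TT" + "CAG" * 40 + "CAACAG"
repeats = longest_cag_run(sample)
print(repeats, repeats > REPEAT_THRESHOLD)  # 40 True
```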