Genome Sequencing - Ahmadrezarafati 1395-01-30

Genome Sequencing
A Seminar in MIS
by Ahmadreza Rafati Roudsari
1395/01/30
DNA (Deoxyribonucleic acid)
RNA (Ribonucleic acid)

A Brief History
Gregor Mendel is usually considered to be the founder of modern genetics, based on the experiments he conducted from 1856 to 1863.
Marshall Nirenberg is best known for “breaking the genetic code” in 1961, an achievement that won him the Nobel Prize.
Figures: Gregor Mendel; Marshall Nirenberg, c. 1968; the completed chart of the genetic code.
To read the chart, select one letter each from the left column (first base), the top row (second base), and the right column (third base), such as U-C-A. This combination represents an mRNA codon. Draw imaginary horizontal and vertical lines to connect the letters. They intersect at the amino acid for which the codon codes. For example, UCA is the codon for serine.

Scientific Instruments
• Spectrophotometer
• Electrophoresis instrument
• French press
• Centrifuge
• Multi-plater

Glossary
Amino acid
Essential: Histidine, Isoleucine, Leucine, Lysine, Methionine, Phenylalanine, Threonine, Tryptophan, Valine
Nonessential: Alanine, Arginine*, Asparagine, Aspartic acid, Cysteine*, Glutamic acid, Glutamine*, Glycine, Proline*, Selenocysteine*, Serine*, Tyrosine*
(* essential only in certain cases)
Base
A nucleotide base (guanine, adenine, cytosine, or thymine) is one of the building blocks of DNA, along with phosphates and sugar.

Glossary
Codon
A codon is a triplet of consecutive bases that is read during protein synthesis.
Each codon carries the code for a specific amino acid (or for a stop signal).
“Central Dogma”
Francis Crick's “central dogma” of molecular biology,
put simply, is:
DNA makes RNA makes protein.
DNA and RNA
DNA, deoxyribonucleic acid, and RNA, ribonucleic acid, are molecules that hold the
genetic information of each cell.
Escherichia coli
Escherichia coli (/ˌɛʃᵻˈrɪkiə ˈkoʊlaɪ/; also known as E. coli) is a Gram-negative, facultatively anaerobic, rod-shaped bacterium of the genus Escherichia that is commonly found in the lower intestine of warm-blooded organisms (endotherms).
Genome
In modern molecular biology and genetics, the genome is the genetic material of an organism. It consists of DNA (or RNA in RNA viruses). The genome includes both the genes and the non-protein-coding information of the DNA/RNA.
Protein
Proteins (/ˈproʊˌtiːnz/ or /ˈproʊti.ᵻnz/) are large biomolecules, or macromolecules, consisting of one or more long chains of amino acid residues.
Ribosome
The ribosome (/ˈraɪbəˌsoʊm, -boʊ-/) is a complex molecular machine, found within all living cells, that serves as the site of biological protein synthesis (translation). Ribosomes link amino acids together in the order specified by messenger RNA (mRNA) molecules.

Glossary
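Standard genetic code
1st base | 2nd base U | 2nd base C | 2nd base A | 2nd base G | 3rd base
U | UUU Phe/F | UCU Ser/S | UAU Tyr/Y | UGU Cys/C | U
U | UUC Phe/F | UCC Ser/S | UAC Tyr/Y | UGC Cys/C | C
U | UUA Leu/L | UCA Ser/S | UAA Stop (Ochre) | UGA Stop (Opal) | A
U | UUG Leu/L | UCG Ser/S | UAG Stop (Amber) | UGG Trp/W | G
C | CUU Leu/L | CCU Pro/P | CAU His/H | CGU Arg/R | U
C | CUC Leu/L | CCC Pro/P | CAC His/H | CGC Arg/R | C
C | CUA Leu/L | CCA Pro/P | CAA Gln/Q | CGA Arg/R | A
C | CUG Leu/L | CCG Pro/P | CAG Gln/Q | CGG Arg/R | G
A | AUU Ile/I | ACU Thr/T | AAU Asn/N | AGU Ser/S | U
A | AUC Ile/I | ACC Thr/T | AAC Asn/N | AGC Ser/S | C
A | AUA Ile/I | ACA Thr/T | AAA Lys/K | AGA Arg/R | A
A | AUG Met/M (start) | ACG Thr/T | AAG Lys/K | AGG Arg/R | G
G | GUU Val/V | GCU Ala/A | GAU Asp/D | GGU Gly/G | U
G | GUC Val/V | GCC Ala/A | GAC Asp/D | GGC Gly/G | C
G | GUA Val/V | GCA Ala/A | GAA Glu/E | GGA Gly/G | A
G | GUG Val/V | GCG Ala/A | GAG Glu/E | GGG Gly/G | G
Abbreviations: Phe/F Phenylalanine; Leu/L Leucine; Ile/I Isoleucine; Met/M Methionine; Val/V Valine; Ser/S Serine; Pro/P Proline; Thr/T Threonine; Ala/A Alanine; Tyr/Y Tyrosine; His/H Histidine; Gln/Q Glutamine; Asn/N Asparagine; Lys/K Lysine; Asp/D Aspartic acid; Glu/E Glutamic acid; Cys/C Cysteine; Trp/W Tryptophan; Arg/R Arginine; Gly/G Glycine.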
In RNA, thymine (T) is replaced by uracil (U),
and the deoxyribose is substituted by ribose.
Ribonucleic acid (RNA) is a polymeric molecule implicated in various biological
roles in coding, decoding, regulation, and expression of genes.
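
The replacement of thymine by uracil can be illustrated with a minimal Python sketch. The function name `transcribe` and the example sequence are illustrative assumptions, and the sketch assumes the input is the coding (sense) strand, so the mRNA is simply that strand with every T written as U.

```python
def transcribe(coding_strand: str) -> str:
    """Return the mRNA for a DNA coding (sense) strand by replacing T with U."""
    dna = coding_strand.upper()
    invalid = set(dna) - set("ACGT")
    if invalid:
        # Reject anything that is not one of the four DNA bases.
        raise ValueError(f"unexpected DNA letters: {sorted(invalid)}")
    return dna.replace("T", "U")


print(transcribe("ATGTCATAA"))  # AUGUCAUAA
```

Starting from the template strand instead would also require taking the reverse complement, which this sketch deliberately leaves out.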

Glossary
Messenger RNA (mRNA) is a large family of RNA molecules that convey genetic information from DNA to
the ribosome, where they specify the amino acid sequence of the protein products of gene expression.

The genetic code has 7 main characteristics:
1. It is made up of codons, which are triplets of bases. Each codon specifies one amino acid (or a stop signal).
2. The codons do not overlap; that is, the sequence GCCCAC contains two triplets, “GCC” and “CAC”; overlapping three-letter sequences such as “CCC” are not counted.
3. The code includes punctuation in the form of three “stop” codons that do not code for an amino acid: UAA, UAG, and UGA.
4. The genetic code is known as a “degenerate” code. This means that each amino acid is specified by between one and six codons. (There are only 20 amino acids but 64 possible codon triplets.)
5. To read each gene and glean the information needed to form proteins, cells begin at a fixed starting point on the mRNA strand. The initiation codon is AUG (methionine).
6. The mRNA strand is read from the 5' end to the 3' end.
7. If there are mutations or errors in the DNA, the message may be changed, resulting in incorrect protein formation. (1)
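
Several of these characteristics can be checked with a short Python sketch: non-overlapping triplets, the three stop codons, degeneracy, and reading from the AUG start codon. The names `CODON_TABLE`, `codons`, and `translate` and the example mRNA are illustrative assumptions. The table is the standard genetic code written with one-letter amino acid codes, with `*` marking a stop codon.

```python
from collections import Counter
from itertools import product

BASES = "UCAG"
# One-letter amino acid codes in the order UUU, UUC, UUA, UUG, UCU, ... GGG.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def codons(seq: str) -> list[str]:
    """Split a sequence into consecutive, non-overlapping triplets."""
    return [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]


def translate(mrna: str) -> str:
    """Translate 5'->3' from the first AUG to the first in-frame stop codon."""
    mrna = mrna.upper()
    start = mrna.find("AUG")
    if start == -1:
        return ""  # no initiation codon found
    protein = []
    for codon in codons(mrna[start:]):
        amino = CODON_TABLE[codon]  # raises KeyError on letters other than U, C, A, G
        if amino == "*":
            break
        protein.append(amino)
    return "".join(protein)


print(codons("GCCCAC"))                  # ['GCC', 'CAC']
print(translate("GGAUGUCAGCCCACUAAGG"))  # MSAH

# Degeneracy: 64 codons, 20 amino acids, each specified by 1 to 6 codons.
per_amino = Counter(a for a in CODON_TABLE.values() if a != "*")
print(len(CODON_TABLE), len(per_amino), min(per_amino.values()), max(per_amino.values()))
# 64 20 1 6
```

In the example, translation starts at the AUG at the third position, reads UCA (serine), GCC (alanine), and CAC (histidine), and stops at UAA.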

People with public genome sequences
Among the first nearly complete human genomes sequenced was J. Craig Venter's (7.5-fold average coverage), in 2007. Venter is an American biotechnologist, biochemist, geneticist, and entrepreneur.
Other people with public genome sequences:
• James Watson
• a Han Chinese
• a Yoruban from Nigeria
• a female leukemia patient
• Seong-Jin Kim
• Steve Jobs (sequenced at a cost of $100,000)

Sequence Analysis
• In bioinformatics, sequence analysis is the process of subjecting
a DNA, RNA or peptide sequence to any of a wide range of analytical
methods to understand its features, function, structure, or evolution.
• In chemistry, sequence analysis comprises techniques used to determine the
sequence of a polymer formed of several monomers. In molecular
biology and genetics, the same process is called simply "sequencing".
• In marketing, sequence analysis is often used in analytical customer
relationship management applications, such as NPTB models (Next Product to
Buy).
• In sociology, sequence methods are increasingly used to study life-course and
career trajectories, patterns of organizational and national development,
conversation and interaction structure, and the problem of work/family
synchrony. (2)

GenBank
• The GenBank sequence database is
an open access, annotated
collection of all publicly
available nucleotide sequences and
their protein translations.
ftp://ftp.ncbi.nih.gov/ (3)

DNA Patterns
DNA patterns are graphs of DNA or RNA sequences. Various functional structures such as promoters
and genes, or larger structures like bacterial or viral genomes, can be analyzed using DNA patterns.
Method
The technique was described in 2012 by Paul Gagniuc and Constantin Ionescu-Tirgoviste.
They adapted algorithms from cryptography and optical character recognition to make their graphs.
To graph a DNA pattern, two values are calculated from a sliding window that is "circulated" over the DNA sequence: the kappa index of coincidence and the total percentage of cytosine plus guanine, (C + G)%.
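
The sliding-window idea can be sketched in Python as follows. This is a rough illustration under stated assumptions, not the published method. The function names, window size, and step are hypothetical. The index of coincidence is the ordinary frequency-based one, used here as a stand-in because the exact kappa variant used by the authors is not given in these slides. The window also moves linearly and does not wrap around the sequence.

```python
from collections import Counter


def cg_percent(window: str) -> float:
    """Percentage of C and G letters in the window."""
    return 100.0 * sum(window.count(b) for b in "CG") / len(window)


def window_ic(window: str) -> float:
    """Unnormalized IC: chance that two letters drawn without replacement match."""
    n = len(window)
    if n < 2:
        return 0.0
    counts = Counter(window)
    return sum(k * (k - 1) for k in counts.values()) / (n * (n - 1))


def dna_pattern(seq: str, window: int = 30, step: int = 1) -> list[tuple[float, float]]:
    """Return one ((C+G)%, IC) point per window position along the sequence."""
    seq = seq.upper()
    return [
        (cg_percent(seq[i:i + window]), window_ic(seq[i:i + window]))
        for i in range(0, len(seq) - window + 1, step)
    ]


for point in dna_pattern("ACGTACGTGG", window=4, step=2):
    print(point)
# (50.0, 0.0)
# (50.0, 0.0)
# (50.0, 0.0)
# (75.0, 0.5)
```

Plotting the first value of each point against the second gives a scatter of the kind the slide describes as a DNA pattern.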

Index of Coincidence (IC)
In cryptography, coincidence counting is the technique (invented by William F.
Friedman) of putting two texts side-by-side and counting the number of times
that identical letters appear in the same position in both texts. This count, either
as a ratio of the total or normalized by dividing by the expected count for a
random source model, is known as the index of coincidence, or IC for short.
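For a single text, the index of coincidence can be written out letter by letter:
\[ \mathrm{IC} = c \times \left( \frac{n_a}{N} \cdot \frac{n_a - 1}{N - 1} + \frac{n_b}{N} \cdot \frac{n_b - 1}{N - 1} + \dots + \frac{n_z}{N} \cdot \frac{n_z - 1}{N - 1} \right) \]
where c is the normalizing coefficient (26 for English), n_a is the number of times the letter "a" appears in the text, and N is the length of the text.
Equivalently, for an alphabet of c letters:
\[ \mathrm{IC} = \frac{\sum_{i=1}^{c} n_i (n_i - 1)}{N (N - 1) / c} \]
where N is the length of the text and n_1 through n_c are the frequencies (as integers) of the c letters of the alphabet (c = 26 for monocase English). The sum of the n_i is necessarily N.
The products n(n − 1) count the number of combinations of n elements taken two at a time, with each pair counted twice; the same factor of 2 appears in the numerator and the denominator and cancels.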
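
Both ideas on this slide can be sketched in Python: counting coincidences between two aligned texts, and computing the normalized index of coincidence of a single text from its letter frequencies. The function names and sample strings are illustrative assumptions. The parameter `c` is the normalizing coefficient defined above, so c = 4 would be a natural choice for a DNA alphabet.

```python
from collections import Counter


def coincidence_count(text_a: str, text_b: str) -> int:
    """Count positions where two texts placed side by side hold the same letter."""
    return sum(a == b for a, b in zip(text_a, text_b))


def index_of_coincidence(text: str, c: int = 26) -> float:
    """Normalized IC: c * sum(n_i * (n_i - 1)) / (N * (N - 1))."""
    letters = [ch for ch in text.upper() if ch.isalpha()]  # ignore spaces, digits
    n_total = len(letters)
    if n_total < 2:
        raise ValueError("need at least two letters")
    counts = Counter(letters)
    return c * sum(n * (n - 1) for n in counts.values()) / (n_total * (n_total - 1))


print(coincidence_count("GATTACA", "GACTATA"))  # 5
print(index_of_coincidence("AAAA", c=4))        # 4.0
print(index_of_coincidence("ACGT", c=4))        # 0.0
```

With this normalization a uniformly random source gives a value near 1, while repetitive text gives larger values, up to c for a text made of a single repeated letter.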

Gene Promoter
In genetics, a promoter is a region
of DNA that initiates transcription of a
particular gene. Promoters are located
near the transcription start sites of genes,
on the same strand and upstream on the
DNA (towards the 5' region of the sense
strand). Promoters can be about 100–1000
base pairs long.

http://www.ncbi.nlm.nih.gov/
• The National Center for Biotechnology Information advances science
and health by providing access to biomedical and genomic information.

Bibliography
• (1) https://history.nih.gov/exhibits/nirenberg/HS5_cracked.htm
• (2) https://en.wikipedia.org/wiki/Sequence_analysis
• (3) https://en.wikipedia.org/wiki/GenBank
• (4) https://en.wikipedia.org/wiki/DNA_Patterns
• (5) https://en.wikipedia.org/wiki/Transfer_RNA
• (6) https://en.wikipedia.org/wiki/Genetic_code
• (7) https://en.wikipedia.org/wiki/Index_of_coincidence
• (8) PromKappa V3.0 Java (uses the DNA pattern method)
• Thanks to https://xbioinformatics.wordpress.com/tag/promkappa/ for software.

Seminar in Management of Information Systems
