Genome Sequencing
A Seminar in MIS
by Ahmadreza Rafati Roudsari
1395/01/30
DNA (Deoxyribonucleic acid)
RNA (Ribonucleic acid)
A brief History…
Marshall Nirenberg is best
known for “breaking the genetic
code” in 1961, an achievement
that won him the Nobel Prize.
Marshall Nirenberg, c. 1968.
Gregor Mendel is usually
considered to be the founder
of modern genetics (with his
1856-1869 experiments).
Gregor Mendel.
The completed chart of the genetic code
To read the code, select a
letter from the left column,
the top row, and the right
column, in that order, such as
U-C-A. This combination
represents an mRNA codon. Draw
imaginary horizontal and
vertical lines to connect the
letters. They intersect at the
amino acid for which they
code. For example, UCA is the
code for serine.
Scientific Instruments
Spectrophotometer
Electrophoresis instrument
French press
Centrifuge
Multi-plater
Glossary
Amino acid
Essential: Histidine, Isoleucine, Leucine, Lysine, Methionine, Phenylalanine, Threonine, Tryptophan, Valine
Nonessential: Alanine, Arginine*, Asparagine, Aspartic acid, Cysteine*, Glutamic acid, Glutamine*, Glycine, Proline*, Selenocysteine*, Serine*, Tyrosine*
(* essential only in certain cases)
Base
A nucleotide base (guanine, adenine, cytosine, and thymine) is one of the building blocks of DNA, along with phosphates and sugar.
Codon
A codon is a series of three bases (a triplet) that is read during protein synthesis
to specify an amino acid. Each codon carries the code for a specific amino
acid or for a stop signal.
“Central Dogma”
Francis Crick's “central dogma” of molecular biology,
put simply, is:
DNA makes RNA makes protein.
DNA and RNA
DNA, deoxyribonucleic acid, and RNA, ribonucleic acid, are molecules that hold the
genetic information of each cell.
Escherichia coli
Escherichia coli (also known as E. coli) is a Gram-negative, facultatively anaerobic, rod-shaped
bacterium of the genus Escherichia that is commonly found in the lower intestine of warm-blooded
organisms (endotherms).
Genome
In modern molecular biology and genetics, the genome is the genetic material of an organism. It consists
of DNA (or RNA in RNA viruses). The genome includes both the genes and the non-protein-coding
information of the DNA/RNA.
Protein
Proteins are large biomolecules, or macromolecules, consisting of one or more long chains
of amino acid residues.
Ribosome
The ribosome is a complex molecular machine found within all living cells that serves as the site of
biological protein synthesis (translation). Ribosomes link amino acids together in the order specified
by messenger RNA (mRNA) molecules.
Standard genetic code

1st    2nd base                                                      3rd
base   U            C            A                 G                 base
U      UUU Phe/F    UCU Ser/S    UAU Tyr/Y         UGU Cys/C         U
       UUC Phe/F    UCC Ser/S    UAC Tyr/Y         UGC Cys/C         C
       UUA Leu/L    UCA Ser/S    UAA Stop (Ochre)  UGA Stop (Opal)   A
       UUG Leu/L    UCG Ser/S    UAG Stop (Amber)  UGG Trp/W         G
C      CUU Leu/L    CCU Pro/P    CAU His/H         CGU Arg/R         U
       CUC Leu/L    CCC Pro/P    CAC His/H         CGC Arg/R         C
       CUA Leu/L    CCA Pro/P    CAA Gln/Q         CGA Arg/R         A
       CUG Leu/L    CCG Pro/P    CAG Gln/Q         CGG Arg/R         G
A      AUU Ile/I    ACU Thr/T    AAU Asn/N         AGU Ser/S         U
       AUC Ile/I    ACC Thr/T    AAC Asn/N         AGC Ser/S         C
       AUA Ile/I    ACA Thr/T    AAA Lys/K         AGA Arg/R         A
       AUG Met/M    ACG Thr/T    AAG Lys/K         AGG Arg/R         G
G      GUU Val/V    GCU Ala/A    GAU Asp/D         GGU Gly/G         U
       GUC Val/V    GCC Ala/A    GAC Asp/D         GGC Gly/G         C
       GUA Val/V    GCA Ala/A    GAA Glu/E         GGA Gly/G         A
       GUG Val/V    GCG Ala/A    GAG Glu/E         GGG Gly/G         G

AUG (Met/M) also serves as the initiation codon.
Key: Ala/A Alanine, Arg/R Arginine, Asn/N Asparagine, Asp/D Aspartic acid, Cys/C Cysteine,
Gln/Q Glutamine, Glu/E Glutamic acid, Gly/G Glycine, His/H Histidine, Ile/I Isoleucine,
Leu/L Leucine, Lys/K Lysine, Met/M Methionine, Phe/F Phenylalanine, Pro/P Proline,
Ser/S Serine, Thr/T Threonine, Trp/W Tryptophan, Tyr/Y Tyrosine, Val/V Valine.
In RNA, thymine (T) is replaced by uracil (U),
and the deoxyribose is substituted by ribose.
Ribonucleic acid (RNA) is a polymeric molecule implicated in various biological
roles in coding, decoding, regulation, and expression of genes.
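The base substitution described here can be sketched in a few lines of Python. This is only an illustration, assuming the input is the DNA coding (sense) strand written 5' to 3', so that the mRNA has the same sequence with T replaced by U. The function name `transcribe` is hypothetical and not taken from the slides.

```python
def transcribe(coding_strand: str) -> str:
    """Return the mRNA sequence for a DNA coding (sense) strand.

    Assumption: the input is the coding strand, so transcription
    amounts to replacing thymine (T) with uracil (U).
    """
    dna = coding_strand.upper()
    unexpected = set(dna) - set("ACGT")
    if unexpected:
        raise ValueError(f"unexpected symbols in DNA: {sorted(unexpected)}")
    return dna.replace("T", "U")


print(transcribe("ATGTCATAA"))  # AUGUCAUAA
```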
Messenger RNA (mRNA) is a large family of RNA molecules that convey genetic information from DNA to
the ribosome, where they specify the amino acid sequence of the protein products of gene expression.
The genetic code has 7 main characteristics:
1. It is made up of codons, which are triplets of bases. Each codon specifies one amino acid or a stop signal.
2. The codons do not overlap; that is, the sequence GCCCAC is read as two triplets, “GCC” and “CAC”,
and overlapping three-letter sequences such as “CCC” are not counted.
3. The code includes punctuation in the form of three “stop” codons that do not code for an amino
acid: UAA, UAG, and UGA.
4. The genetic code is known as a “degenerate” code. This means that each amino acid is encoded by
between one and six codons. (There are only 20 amino acids and 64 possible codon triplets.)
5. To read each gene and glean the necessary information to form proteins, cells begin at a fixed and
particular starting point on the mRNA strand. The initiation codon is AUG (methionine).
6. The mRNA strand is read from the 5' to the 3' end.
7. If there are mutations or errors in the DNA, the message may be changed, resulting in incorrect
protein formation. (1)
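Several of these characteristics can be illustrated with a short Python sketch. It builds the 64-codon standard table, reads an mRNA string 5' to 3' in non-overlapping triplets starting at the first AUG, and stops at UAA, UAG, or UGA. The names `CODON_TABLE`, `translate`, and `codons_per_amino_acid` are hypothetical, and the sketch assumes an upper-case sequence containing only the letters U, C, A, and G.

```python
from collections import Counter
from itertools import product

BASES = "UCAG"
# One-letter amino acid codes for all 64 codons, ordered by first, second,
# and third base in U, C, A, G order; "*" marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(mrna: str) -> str:
    """Translate from the first AUG up to (not including) the first stop codon."""
    start = mrna.find("AUG")
    if start == -1:
        return ""  # no initiation codon found
    protein = []
    # Step by 3 so that codons never overlap; ignore an incomplete last codon.
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


def codons_per_amino_acid() -> Counter:
    """Count how many codons encode each amino acid (stop codons excluded)."""
    return Counter(aa for aa in AMINO_ACIDS if aa != "*")


print(translate("GGAUGUCAGCCCACUAAGG"))  # MSAH
degeneracy = codons_per_amino_acid()
print(len(degeneracy), min(degeneracy.values()), max(degeneracy.values()))  # 20 1 6
```

In the example, GCCCAC is read as GCC (alanine) and CAC (histidine), and the count confirms that the 20 amino acids are each encoded by between one and six codons.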
People with public genome sequences
The first nearly complete human genomes to be sequenced included that of J. Craig Venter
(an American biotechnologist, biochemist, geneticist, and entrepreneur; 7.5-fold average
coverage) in 2007.
Others with public genome sequences:
• James Watson
• a Han Chinese
• a Yoruban from Nigeria
• a female leukemia patient
• Seong-Jin Kim
• Steve Jobs (for the cost of $100,000)
Sequence Analysis
• In bioinformatics, sequence analysis is the process of subjecting
a DNA, RNA or peptide sequence to any of a wide range of analytical
methods to understand its features, function, structure, or evolution.
• In chemistry, sequence analysis comprises techniques used to determine the
sequence of a polymer formed of several monomers. In molecular
biology and genetics, the same process is called simply "sequencing".
• In marketing, sequence analysis is often used in analytical customer
relationship management applications, such as NPTB models (Next Product to
Buy).
• In sociology, sequence methods are increasingly used to study life-course and
career trajectories, patterns of organizational and national development,
conversation and interaction structure, and the problem of work/family
synchrony. (2)
GenBank
• The GenBank sequence database is
an open access, annotated
collection of all publicly
available nucleotide sequences and
their protein translations.
ftp://ftp.ncbi.nih.gov/ (3)
DNA Patterns
DNA patterns are graphs of DNA or RNA sequences. Various functional structures such as promoters
and genes, or larger structures like bacterial or viral genomes, can be analyzed using DNA patterns.
Method
The technique was described in 2012 by Paul Gagniuc and Constantin Ionescu-Tirgoviste.
They adapted algorithms from cryptography and optical character recognition to make their graphs.
To graph a DNA pattern, two values are calculated from a sliding window that is moved
("circulated") over the DNA sequence: the kappa index of coincidence and the total percentage of
cytosine plus guanine, (C + G)%.
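The sliding-window idea can be sketched in Python as follows. This is a rough illustration only, not the published method: it assumes a fixed window length and step, and it uses the letter-frequency form of the index of coincidence with a four-letter alphabet in place of the exact kappa variant from the 2012 paper. The names `cg_percent`, `window_ic`, and `dna_pattern`, and the default window of 30 bases, are hypothetical.

```python
from collections import Counter


def cg_percent(window: str) -> float:
    """Percentage of cytosine plus guanine in the window."""
    return 100.0 * (window.count("C") + window.count("G")) / len(window)


def window_ic(window: str, alphabet_size: int = 4) -> float:
    """Letter-frequency index of coincidence (assumed stand-in for kappa IC)."""
    n = len(window)
    if n < 2:
        return 0.0
    counts = Counter(window)
    same_letter_pairs = sum(k * (k - 1) for k in counts.values())
    return alphabet_size * same_letter_pairs / (n * (n - 1))


def dna_pattern(sequence: str, window: int = 30, step: int = 1):
    """Return one (C+G %, IC) point per window position along the sequence."""
    seq = sequence.upper()
    points = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        points.append((cg_percent(chunk), window_ic(chunk)))
    return points


# Each point could then be plotted with (C+G)% on one axis and IC on the other.
print(dna_pattern("ACGTACGT", window=4, step=2))
```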
Index of Coincidence (IC)
In cryptography, coincidence counting is the technique (invented by William F.
Friedman) of putting two texts side-by-side and counting the number of times
that identical letters appear in the same position in both texts. This count, either
as a ratio of the total or normalized by dividing by the expected count for a
random source model, is known as the index of coincidence, or IC for short.
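The index of coincidence can be written as

\[ \mathrm{IC} = c \left( \frac{n_a}{N} \cdot \frac{n_a - 1}{N - 1} + \frac{n_b}{N} \cdot \frac{n_b - 1}{N - 1} + \dots + \frac{n_z}{N} \cdot \frac{n_z - 1}{N - 1} \right) \]

where c is the normalizing coefficient (26 for English), n_a is the number of times the
letter "a" appears in the text, and N is the length of the text.
For a general alphabet the same quantity is

\[ \mathrm{IC} = \frac{\sum_{i=1}^{c} n_i (n_i - 1)}{N (N - 1) / c} \]

where N is the length of the text and n_1 through n_c are the frequencies (as integers) of the c letters of the
alphabet (c = 26 for monocase English). The sum of the n_i is necessarily N.
The products n(n − 1) count the number of combinations of n elements taken two at a time. Each pair is
counted twice, but the factor of 2 appears in both the numerator and the denominator and cancels.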
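Both ideas can be sketched in Python: counting coincidences between two texts placed side by side, and computing the index of coincidence of a single text as c times the sum of n_i(n_i − 1), divided by N(N − 1). The function names are hypothetical, and the sketch assumes that only alphabetic characters count as letters and that case is ignored.

```python
from collections import Counter


def coincidence_ratio(text_a: str, text_b: str) -> float:
    """Fraction of positions at which two aligned texts hold the same letter."""
    pairs = list(zip(text_a, text_b))  # compares up to the shorter length
    if not pairs:
        return 0.0
    return sum(a == b for a, b in pairs) / len(pairs)


def index_of_coincidence(text: str, c: int = 26) -> float:
    """Normalized IC: c * sum(n_i * (n_i - 1)) / (N * (N - 1))."""
    letters = [ch for ch in text.upper() if ch.isalpha()]
    n = len(letters)
    if n < 2:
        raise ValueError("need at least two letters")
    counts = Counter(letters)
    return c * sum(k * (k - 1) for k in counts.values()) / (n * (n - 1))


print(coincidence_ratio("GATTACA", "GACTATA"))  # 5 matches out of 7 positions
print(index_of_coincidence("AABB", c=2))        # 2 * 4 / 12
```

With this normalization, a value near 1 corresponds to letters drawn uniformly at random from the alphabet, and larger values indicate a more uneven letter distribution.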
Gene Promoter
In genetics, a promoter is a region
of DNA that initiates transcription of a
particular gene. Promoters are located
near the transcription start sites of genes,
on the same strand and upstream on the
DNA (towards the 5' region of the sense
strand). Promoters can be about 100–1000 base pairs long.
http://www.ncbi.nlm.nih.gov/
• The National Center for Biotechnology Information advances science
and health by providing access to biomedical and genomic information.
Bibliography
• (1) https://history.nih.gov/exhibits/nirenberg/HS5_cracked.htm
• (2) https://en.wikipedia.org/wiki/Sequence_analysis
• (3) https://en.wikipedia.org/wiki/GenBank
• (4) https://en.wikipedia.org/wiki/DNA_Patterns
• (5) https://en.wikipedia.org/wiki/Transfer_RNA
• (6) https://en.wikipedia.org/wiki/Genetic_code
• (7) https://en.wikipedia.org/wiki/Index_of_coincidence
• (8) PromKappa V3.0 Java (uses the DNA pattern method)
• Thanks to https://xbioinformatics.wordpress.com/tag/promkappa/ for software.
Seminar in Management of Information Systems