Genome Annotation
MICROBIO 590B Bioinformatics Lab: Bacterial Genomics
Professor Kristen DeAngelis
UMass Amherst
Fall 2022
1
Lecture Learning Goals
• Describe how genes are identified.
• Distinguish between an open reading frame, a genome feature, a
gene, and a protein coding region.
• Explain how genomes are annotated and the kinds of databases that
are used to classify genes.
• List the genes involved in cellular metabolism, for both energy
generation (catabolism) and cell growth (anabolism).
• Explain the idea behind metabolic models, and describe one
application.
2
Annotation of an Open Reading Frame
• …
3
Open Reading Frames
• Some ORFs are located on one strand and others on the other strand, facing in the
opposite orientation. The strands are designated as + or – and ORFs are
diagrammed as located on the top (+) or bottom (-) strand template. The diagram
below shows that most ORFs are in the same orientation for S-TIM5
bacteriophage.
• For the ORFs located above the line, ‘upstream’, where the promoter is located
(the 5’ end of the ORF), is to the left.
• Open reading frames (ORFs) are sections of the genome that are flanked by start
and stop codons, and thus can be readily identified with computer algorithms.
Algorithms identify ORFs that may or may not be used by the cell to produce a
protein (a protein-coding ORF is termed a CDS, or coding sequence).
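The ORF definition above can be sketched in a few lines of code. This is only a minimal illustration, assuming ATG as the sole start codon and the three standard stop codons; real gene callers weigh additional evidence. The names `find_orfs` and `reverse_complement`, the `min_codons` cutoff, and the toy sequence are hypothetical choices made for this example.

```python
STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A/C/G/T only)."""
    return seq.translate(COMPLEMENT)[::-1]

def find_orfs(seq, min_codons=3):
    """Return a list of (strand, start, end, orf_sequence) tuples.

    An ORF here is an ATG followed in-frame by the first stop codon.
    min_codons counts the start and stop codons too (assumed cutoff).
    start/end are 0-based, end-exclusive positions on the strand that
    was scanned; "-" coordinates are not converted back to the + strand.
    """
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # first start codon opens a candidate ORF
                elif start is not None and codon in STOPS:
                    if (i + 3 - start) // 3 >= min_codons:
                        orfs.append((strand, start, i + 3, s[start:i + 3]))
                    start = None  # look for the next ORF in this frame
    return orfs

# Toy sequence with one short ORF on the + strand
print(find_orfs("CCATGAAATAGGG"))
```

Scanning the reverse complement is what finds ORFs on the bottom (-) strand, which face in the opposite orientation.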
4
Open Reading Frames
5
Sabehi, Shaulov, Silver, Yanai, Harel, and Lindell, PNAS. 2012
Origin of replication
• Models for bacterial (A) and eukaryotic
(B) DNA replication initiation.
• A) Circular bacterial chromosomes contain a
cis-acting element, the replicator, that is
located at or near replication origins.
• B) Linear eukaryotic chromosomes contain
many replication origins.
• Most bacterial chromosomes are
circular and contain a single origin of
chromosomal replication (oriC).
• Origins in bacteria contain three
functional elements that control origin
activity:
• conserved DNA repeats that are specifically
recognized by DnaA (called DnaA-boxes)
• an AT-rich DNA unwinding element (DUE)
• and binding sites for proteins that help
regulate replication initiation
6
Ribosomal operons tend to locate near the origin
of replication
• rRNA is the ribosomal RNA, a major constituent of the ribosome, accounting for about 2/3
of its mass
• A large number of ribosomes is required for growing cells
• Fast-growing cells have many copies of the ribosomal operon
7
http://book.bionumbers.org/how-many-ribosomal-rna-gene-copies-are-in-the-genome/
GC skew
• The leading (single) strand tends to
have more Gs than Cs, though the
numbers of the two bases are the same
when you examine all base pairs
(double stranded).
• The difference is referred to as GC
skew, which can be examined to
locate the origin of replication.
• When the G content exceeds the C
content, this is considered a positive
skew and indicates a leading strand.
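A short sketch of how GC skew might be computed along a sequence. It assumes the commonly used windowed definition (G − C) / (G + C), which the slide does not spell out, and it uses the running (cumulative) sum of skew, whose minimum is often taken as a rough indicator of the origin. Function names, the window size, and the toy sequence are hypothetical.

```python
def gc_skew(window):
    """(G - C) / (G + C) for one window; 0.0 if the window has no G or C."""
    g = window.count("G")
    c = window.count("C")
    return (g - c) / (g + c) if (g + c) else 0.0

def windowed_skew(seq, size):
    """GC skew for consecutive, non-overlapping windows of `size` bases."""
    return [gc_skew(seq[i:i + size]) for i in range(0, len(seq) - size + 1, size)]

def cumulative_min_window(skews):
    """Index of the window at which the running sum of skew is lowest."""
    total, best_total, best_index = 0.0, float("inf"), -1
    for i, s in enumerate(skews):
        total += s
        if total < best_total:
            best_total, best_index = total, i
    return best_index

# Toy sequence: G-rich, then C-rich, then G-rich again
toy = "GGGG" + "CCCC" + "CCCC" + "GGGG"
skews = windowed_skew(toy, 4)
print(skews, cumulative_min_window(skews))
```

In the toy sequence the skew switches from negative to positive after window 2, so the switch point sits at the boundary between windows 2 and 3.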
8
Billings et al., Standards in Genomic Sciences 2015
Key elements of genome annotation
1. The program scans through the sequence to identify rRNA and tRNA genes.
• rRNA = ribosomal RNA genes; structural RNAs that form the ribosome together with ribosomal proteins
• tRNA = transfer RNA genes; tRNAs match amino acids to mRNA codons as the protein grows
2. The program predicts protein-coding regions (candidate Open Reading
Frames, or ORFs)
3. The program looks for other elements of interest (phages, CRISPR arrays, etc)
4. Compare the sequence of a feature (any of items 1-3) to a reference database
of sequences with known functions. If the sequence looks similar to what has
already been annotated in the database (hopefully based on experimental
evidence), then it assigns the same function to this sequence - whether or not
that is actually what it does! But it's the best we can do.
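Step 4 can be illustrated with a deliberately simplified sketch. It assumes the query and reference sequences are already aligned to the same length and scores them by percent identity; real pipelines use alignment or profile searches instead. The function names, the 50% cutoff, and the toy sequences and functions are all hypothetical.

```python
def percent_identity(a, b):
    """Percent of matching positions between two pre-aligned, equal-length strings."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be aligned to the same non-zero length")
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return 100.0 * matches / len(a)

def transfer_annotation(query, reference_db, min_identity=50.0):
    """Return (function, identity) of the best reference hit.

    If no reference passes the (assumed) identity cutoff, the query is
    labeled 'hypothetical protein' with identity 0.0.
    """
    best_function, best_identity = "hypothetical protein", 0.0
    for ref_seq, function in reference_db.items():
        identity = percent_identity(query, ref_seq)
        if identity >= min_identity and identity > best_identity:
            best_function, best_identity = function, identity
    return best_function, best_identity

# Toy reference database: sequence -> annotated function (made up)
toy_db = {"MKTAYIAK": "toy kinase", "MSSLLPWQ": "toy transporter"}
print(transfer_annotation("MKTAYIAR", toy_db))
print(transfer_annotation("AAAAAAAA", toy_db))
```

The function is copied from the best hit without any experiment on the query itself, which is exactly the caveat in step 4.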
9
Ribosomes and non-coding RNA
• Ribosomal RNA genes are mostly encoded in operons
• Ribosome structure requires 3 types of structural RNA molecules: 5S, 16S and
23S rRNAs
• Ribosomes also require proteins; these are also good phylogenetic markers
• Unlinked rRNA genes are widespread among bacteria and archaea
10
Brewer et al., ISMEJ 2019
Annotate Genomes with Prokka
• Number of genes predicted
• aka total CDS
• aka total coding sequences
• Number of protein coding genes
• Number of genes with non-hypothetical
function
• Number of genes with EC number
• Total tRNAs
• Total rRNAs
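The summary statistics listed above can be tallied from a GFF3 annotation file, where the third tab-separated column gives the feature type. The sketch below is an illustration under assumptions: the sample lines are made up, and the attribute keys `product` and `eC_number` are assumed to be how the file records function and EC number, so check them against your own output.

```python
from collections import Counter

def summarize_gff(lines):
    """Count feature types and simple CDS properties from GFF3 text lines."""
    types = Counter()
    non_hypothetical = with_ec = 0
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment/directive lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 9:
            continue  # not a 9-column feature line
        feature_type, attributes = fields[2], fields[8]
        types[feature_type] += 1
        if feature_type == "CDS":
            attrs = dict(kv.split("=", 1) for kv in attributes.split(";") if "=" in kv)
            if attrs.get("product", "hypothetical protein") != "hypothetical protein":
                non_hypothetical += 1
            if "eC_number" in attrs:  # assumed attribute key
                with_ec += 1
    return {"CDS": types["CDS"], "tRNA": types["tRNA"], "rRNA": types["rRNA"],
            "non_hypothetical_CDS": non_hypothetical, "CDS_with_EC": with_ec}

# Hypothetical sample lines, not real annotation output
SAMPLE_GFF = [
    "##gff-version 3",
    "contig1\ttoy\tCDS\t1\t300\t.\t+\t0\tID=g1;product=hypothetical protein",
    "contig1\ttoy\tCDS\t400\t900\t.\t-\t0\tID=g2;eC_number=3.2.1.21;product=Beta-glucosidase",
    "contig1\ttoy\ttRNA\t1000\t1075\t.\t+\t.\tID=g3;product=tRNA-Ala",
    "contig1\ttoy\trRNA\t1200\t2700\t.\t+\t.\tID=g4;product=16S ribosomal RNA",
]
print(summarize_gff(SAMPLE_GFF))
```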
11
Seemann, Bioinformatics 2014
How many ORFs are annotated?
• Up to half of all ORFs have no known homologs… !
• Orphan genes, or ORFans … usually considered unique to a very narrow taxon,
generally a species
• Orphans are a subset of taxonomically-restricted genes (TRGs), which are
unique to a specific taxonomic level (e.g. plant-specific)
• Non-homology based methods based on the context and the interactions of a
protein may help identify missing metabolic activities and functional
annotation
• Why?
• Some are sequencing errors
• Some may be derived from horizontal gene transfer, duplication and
divergence, or de novo origination
• Some could be non-coding RNAs
12
Pseudogenes
• Pseudogenes are nonfunctional segments of DNA that resemble
functional genes
• Most bacterial pseudogenes are found in non-free-living organisms,
like symbionts or obligate intracellular parasites
• These will (generally) not be included in genome annotations
13
Categorizing protein coding genes
• Many organizational schemes categorize protein coding genes
• Which one you choose depends on which are available and on your goals
• Common options include:
• Enzyme (enzyme nomenclature) and EC numbers,
• FIGfams (functional homologs, part of SEED subsystems),
• Pfam and TIGRfam (curated protein families),
• COG (curated clusters of orthologous groups of proteins),
• KO (KEGG Orthology), KEGG (metabolic pathways and reactions),
• InterPro (protein families and domains),
• GO (gene ontologies),
• LIGAND (compounds), and
• MetaCyc (metabolic pathways)
14
https://img.jgi.doe.gov/datasource.html
Categorizing protein coding genes: EC number
• EC number stands for Enzyme Commission number
• EC numbers are assigned by the Nomenclature Committee of the
International Union of Biochemistry and Molecular Biology
15
Categorizing protein coding genes: EC number
• EC numbers have four positions which describe exactly what kind of
reaction the enzyme catalyzes
• An example is beta-glucosidase, the terminal enzyme in the
depolymerization of cellulose to sugars
16
EC 3.2.1.21
• EC 3: general type of reaction catalyzed by the enzyme; the EC 3 group is
hydrolases
• EC 3.2: subclass of the top-level group; the EC 3.2 group is glycosylases
• EC 3.2.1: sub-subclass of the top-level group; the EC 3.2.1 group is
glycosidases, i.e. enzymes hydrolysing O- and S-glycosyl compounds
• EC 3.2.1.21: serial number of the enzyme in its sub-subclass; β-glucosidase,
hydrolysis of terminal, non-reducing β-D-glucosyl residues with release of
β-D-glucose
https://www.qmul.ac.uk/sbcs/iubmb/enzyme/EC3/2/1/21.html
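Because the four positions of an EC number are hierarchical, the number can be split into its levels programmatically. The sketch below is a minimal example; the function name `parse_ec` and the returned field names are hypothetical, and the class names are the seven top-level EC classes.

```python
EC_CLASSES = {
    "1": "Oxidoreductases", "2": "Transferases", "3": "Hydrolases",
    "4": "Lyases", "5": "Isomerases", "6": "Ligases", "7": "Translocases",
}

def parse_ec(ec):
    """Split an EC number such as 'EC 3.2.1.21' into its four levels."""
    text = ec.strip()
    if text.upper().startswith("EC"):
        text = text[2:].strip()  # drop the optional 'EC' prefix
    parts = text.split(".")
    if len(parts) != 4:
        raise ValueError("expected four dot-separated positions: %r" % ec)
    return {
        "class": parts[0],
        "class_name": EC_CLASSES.get(parts[0], "unknown"),
        "subclass": ".".join(parts[:2]),
        "sub_subclass": ".".join(parts[:3]),
        "serial": parts[3],
    }

print(parse_ec("EC 3.2.1.21"))
```

Grouping genes by `subclass` or `sub_subclass` instead of the full number is one way to compare genomes at a coarser functional level.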
Categorizing protein coding genes: FIGfams
• The original SEED Project was started in 2003 by the Fellowship for Interpretation
of Genomes (FIG) as an open source effort
• Annotation is done by the
curation of subsystems across
many genomes, not on a
gene-by-gene basis
• From the curated subsystems we
extract a set of freely available
protein families (FIGfams)
• These FIGfams form the core
component of the RAST server
(RAST = Rapid Annotation using
Subsystems Technology)
17
https://www.theseed.org/wiki/Home_of_the_SEED
Categorizing protein coding genes: FIGfams
18
Meyer et al., Nucleic Acids Research 2009
Categorizing protein coding genes: FIGfams
• Each FIGfam is a set of proteins that are believed to be isofunctional
homologs
• they all are believed to implement the same function,
• and they are believed to derive from a common ancestor because they
appear to be similar
19
Categorizing protein coding genes: Pfam
• The Pfam database is a large collection of protein families, each
represented by multiple sequence alignments and hidden Markov
models (HMMs).
• Pfam 34.0 (March 2021, 19179 entries)
• The general purpose of the Pfam database is to provide a complete
and accurate classification of protein families and domains
20
http://pfam.xfam.org; Mistry et al., Nucleic Acids Research, 2020
Categorizing protein coding genes: Pfam
• A protein may match multiple Pfam entries, since Pfam characterizes individual domains
21
Mistry et al., Nucleic Acids Research, 2020
• Pfam newly revised the entries
that cover the SARS-CoV-2
proteome, with new entries for
regions not previously covered by Pfam.
• The structure of NSP15 from
Kim et al. shows the three new
Pfam domains,
• (i) CoV_NSP15_N Coronavirus
replicase domain in red,
• (ii) CoV_NSP15_M Coronavirus
replicase NSP15 domain in blue,
• (iii) CoV_NSP15_C Coronavirus
replicase NSP15, uridylate-specific
endoribonuclease in green.
Categorizing protein coding genes: COGs
• Clusters of Orthologous Genes
(COGs)
• A relatively small collection of fewer
than 5000 clusters of orthologous
proteins (COGs), consisting of the
products of the most widespread
bacterial and archaeal genes
22
https://www.ncbi.nlm.nih.gov/research/COG
Categorizing protein coding genes: COGs
23
Shields et al., mSphere 2018
• An example of how COGs are used in analyzing change in relative
abundance of protein coding genes across treatments
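A comparison of this kind comes down to converting gene counts per COG category into relative abundances and differencing them between treatments. The sketch below assumes hypothetical counts; the category letters follow the COG functional categories (C, energy production and conversion; G, carbohydrate transport and metabolism), and the function names are made up for illustration.

```python
def relative_abundance(counts):
    """Convert raw gene counts per category into fractions that sum to 1."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no genes counted")
    return {category: n / total for category, n in counts.items()}

def change_between(control, treated):
    """Difference in relative abundance (treated minus control) per category."""
    a, b = relative_abundance(control), relative_abundance(treated)
    return {c: b.get(c, 0.0) - a.get(c, 0.0) for c in sorted(set(a) | set(b))}

# Hypothetical gene counts per COG category in two treatments
control = {"C": 50, "G": 50}
treated = {"C": 25, "G": 75}
print(change_between(control, treated))
```

Using relative rather than raw counts keeps the comparison fair when the treatments differ in total number of annotated genes.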
Categorizing protein coding genes: KEGG and KO
• KEGG: Kyoto Encyclopedia of Genes and Genomes
• KEGG is a database resource for understanding high-level functions
and utilities of the biological system, such as the cell, the organism
and the ecosystem, from molecular-level information, especially
large-scale molecular datasets generated by genome sequencing and
other high-throughput experimental technologies.
24
https://www.genome.jp/kegg/
Categorizing protein coding genes: KEGG and KO
• …
25
https://www.genome.jp/kegg/
Categorizing protein coding genes: KEGG and KO
• KEGG consists of
eighteen original
databases in four
categories
26
Kanehisa et al., Nucleic Acids Research 2020
Categorizing protein coding genes: KEGG and KO
27
Categorizing protein coding genes: KEGG and KO
28
• Circles represent metabolites
• Lines represent enzymes that
make biochemical
transformations
Categorizing protein coding genes: KEGG and KO
29
Categorizing protein coding genes: KEGG and KO
• The KO (KEGG Orthology) database is a database of molecular
functions represented in terms of functional orthologs.
• A functional ortholog is manually defined in the context of KEGG molecular
networks, namely, KEGG pathway maps, BRITE hierarchies and KEGG
modules.
• Each node of the network, such as a box in the KEGG pathway map, is given a
KO identifier (called K number) as a functional ortholog defined from
experimentally characterized genes and proteins in specific organisms, which
are then used to assign orthologous genes in other organisms based on
sequence similarity.
• The granularity of "function" is context-dependent, and the resulting KO
grouping may correspond to a group of highly similar sequences within a
limited organism group or it may be a more divergent group.
30
Categorizing protein coding genes: KEGG and KO
• The KO (KEGG Orthology) database
• KEGG pathway maps are drawn based on experimental evidence in
specific organisms but they are designed to be applicable to other
organisms as well, because different organisms, such as human and
mouse, often share identical pathways consisting of functionally
identical genes, called orthologous genes or orthologs
31
Metabolism
• All chemical reactions inside a cell
• Metabolic pathways are the stepwise reactions that generate energy
by breaking down larger molecules (catabolism) or that are
biosynthetic and require energy (anabolism)
32
https://openstax.org/books/microbiology/pages/8-1-energy-matter-and-enzymes
Metabolism
• The energy currency
of cells includes ATP
and the electron carriers
NAD+, NADP+, and FAD
• Exergonic reactions
are coupled to
endergonic reactions
to make the
combinations
favorable
33
https://openstax.org/books/microbiology/pages/8-1-energy-matter-and-enzymes
Catabolism of carbohydrates: glycolysis
• the most common pathway for the metabolism of glucose
• Produces energy, reduced electron carriers, and precursor molecules
for anabolism
• Can be coupled to aerobic or anaerobic growth
• Glycolysis
• Embden-Meyerhof-Parnas (EMP) pathway, aka “glycolysis”
• Entner-Doudoroff pathway is an alternative glycolysis
• Pentose-phosphate pathway processes five-carbon sugars
34
Glycolysis, the “upper” half
• 2 ATPs are used to
phosphorylate
glucose, which is
then split into two
3-carbon molecules
35
https://openstax.org/books/microbiology/pages/c-metabolic-pathways
Glycolysis, the “lower” half
• Oxidation and further phosphorylation
require NAD+ (reduced to NADH), producing
4 ATPs per glucose
• Net 2 ATP per glucose
36
https://openstax.org/books/microbiology/pages/c-metabolic-pathways
Substrate-level phosphorylation
• Shown: one of the two enzymatic reactions in the energy payoff phase of
glycolysis that generate ATP by substrate-level phosphorylation
37
https://openstax.org/books/microbiology/pages/8-2-catabolism-of-carbohydrates
Entner-Doudoroff pathway
• To catabolize glucose to
pyruvate, ED uses the
unique enzymes
• 6-phosphogluconate
dehydratase
(EC 4.2.1.12) and
• 2-keto-3-deoxy-6-phosphogluconate
(KDPG) aldolase
(EC 4.1.2.14)
38
https://openstax.org/books/microbiology/pages/c-metabolic-pathways
Entner-Doudoroff pathway
• EMP glycolysis generates
net 2 ATP per glucose
• ED glycolysis only generates
one ATP per glucose
39
Flamholz et al., PNAS 2013
Entner-Doudoroff pathway
• EMP glycolysis generates
net 2 ATP per glucose
• ED glycolysis only generates
one ATP per glucose
• Why?
40
Flamholz et al., PNAS 2013
Entner-Doudoroff pathway
• “ED pathway is expected to
require several-fold less
enzymatic protein to
achieve the same glucose
conversion rate as the EMP
pathway”
41
Flamholz et al., PNAS 2013
Entner-Doudoroff pathway
• “energy-deprived anaerobes
overwhelmingly rely upon
the higher ATP yield of the
EMP pathway, whereas the
ED pathway is common
among facultative
anaerobes and even more
common among aerobes”
42
Flamholz et al., PNAS 2013
Pentose-Phosphate pathway
• aka phosphogluconate pathway and the hexose monophosphate shunt
• Parallels glycolysis; from glucose, it generates NADPH and 5C sugars, including
ribose 5-phosphate, a precursor for the synthesis of nucleotides
43
https://openstax.org/books/microbiology/pages/c-metabolic-pathways
The Transition Reaction
• Glycolysis produces pyruvate, which can be further oxidized to
generate more energy
• For this to happen, pyruvate must be decarboxylated (below, left)
• The resulting two-carbon acetyl group is carried by coenzyme A (“CoA”, below, right) as acetyl-CoA
44
https://openstax.org/books/microbiology/pages/c-metabolic-pathways
Tricarboxylic Acid (TCA) Cycle
• Closed-loop pathway of 8 steps that captures the 2C acetyl group of
acetyl-CoA, producing 2 CO2, 1 ATP, 3 NADH and 1 FADH2 per turn
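The per-turn yields can be combined with the earlier steps into a per-glucose tally. The sketch below takes the glycolysis and TCA numbers from the slides and assumes the standard textbook values for what the slides leave out: 2 NADH from glycolysis, and 1 NADH plus 1 CO2 per pyruvate in the transition reaction. The dictionary and function names are hypothetical.

```python
from collections import Counter

GLYCOLYSIS = Counter({"ATP": 2, "NADH": 2})      # net, per glucose (EMP)
TRANSITION = Counter({"NADH": 1, "CO2": 1})      # per pyruvate
TCA_TURN = Counter({"CO2": 2, "ATP": 1, "NADH": 3, "FADH2": 1})  # per acetyl-CoA

def scale(yields, n):
    """Multiply every yield by n (e.g. two pyruvates per glucose)."""
    return Counter({k: v * n for k, v in yields.items()})

def per_glucose():
    """Total substrate-level ATP, reduced carriers and CO2 per glucose."""
    pyruvates = 2  # glycolysis splits one glucose into two pyruvates
    return GLYCOLYSIS + scale(TRANSITION, pyruvates) + scale(TCA_TURN, pyruvates)

print(dict(per_glucose()))
```

The six CO2 account for all six carbons of glucose. Most of the energy is still held in NADH and FADH2 at this point and is only converted to ATP during respiration.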
45
https://openstax.org/books/microbiology/pages/8-2-catabolism-of-carbohydrates
TCA cycle intersects anabolism and catabolism
• As well as generating energy,
intermediate compounds are
precursors for biosynthesis of
• amino acids,
• chlorophylls,
• fatty acids, and
• nucleotides
• TCA cycle is anabolic and
catabolic
46
https://openstax.org/books/microbiology/pages/8-2-catabolism-of-carbohydrates
TCA cycle
47
Respiration
• Most cellular ATP is
generated by oxidative
phosphorylation
• As opposed to substrate-
level phosphorylation
• In oxidative
phosphorylation, ATP is
formed from the transfer
of electrons from NADH
or FADH2 to O2 by a
series of electron
carriers
• How much ATP depends
on the terminal electron
acceptor
• More ATP from O2 than
from NO3-, SO42-, Fe3+,
CO2, other inorganics
48
https://openstax.org/books/microbiology/pages/8-3-cellular-respiration
Electron Transport Chain
• A series of electron
carriers and ion pumps
embedded in the cell
membrane that pump
protons (H+) across a
membrane
• Proton motive force is
generated by expelling
protons outside of the cell
• Protons then tend to flow back
across the membrane, but
must go through the ATP
synthase, which drives
ATP production
49
https://openstax.org/books/microbiology/pages/c-metabolic-pathways
Carbohydrate Active Enzymes (CAZy)
http://www.cazy.org
Modules that catalyze the breakdown, biosynthesis or modification of
carbohydrates and glycoconjugates:
• Glycoside Hydrolases (GHs): hydrolysis and/or rearrangement of glycosidic bonds
• GlycosylTransferases (GTs): formation of glycosidic bonds
• Polysaccharide Lyases (PLs): non-hydrolytic cleavage of glycosidic bonds
• Carbohydrate Esterases (CEs): hydrolysis of carbohydrate esters
• Auxiliary Activities (AAs): redox enzymes that act in conjunction with CAZymes
Associated Modules currently covered
• Carbohydrate-Binding Modules (CBMs): adhesion to carbohydrates
50
Metabolic Modeling
• Combination of genome
sequence with physiology to
predict growth
• Mathematical network
model that represents the
systems biology of metabolic
pathways within an organism
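One way to picture the "mathematical network model" is as a table of reaction stoichiometries plus a flux (rate) for each reaction. The sketch below assumes a constraint-based formulation, which the slide does not name, and checks the steady-state condition that every internal metabolite is produced as fast as it is consumed. The three-reaction network, flux values, and function names are hypothetical.

```python
# Toy network: reaction -> {metabolite: stoichiometric coefficient}
# Negative coefficients are consumed, positive ones are produced.
REACTIONS = {
    "uptake": {"glucose": +1},                      # environment -> glucose
    "glycolysis": {"glucose": -1, "pyruvate": +2},  # glucose -> 2 pyruvate
    "growth": {"pyruvate": -1},                     # pyruvate -> biomass (leaves system)
}

def net_production(reactions, fluxes):
    """Net rate of change of each metabolite for the given reaction fluxes."""
    net = {}
    for name, stoich in reactions.items():
        for metabolite, coeff in stoich.items():
            net[metabolite] = net.get(metabolite, 0.0) + coeff * fluxes.get(name, 0.0)
    return net

def is_steady_state(reactions, fluxes, tol=1e-9):
    """True if no metabolite accumulates or is depleted."""
    return all(abs(v) <= tol for v in net_production(reactions, fluxes).values())

balanced = {"uptake": 10.0, "glycolysis": 10.0, "growth": 20.0}
unbalanced = {"uptake": 10.0, "glycolysis": 10.0, "growth": 15.0}
print(is_steady_state(REACTIONS, balanced), is_steady_state(REACTIONS, unbalanced))
```

A genome-scale model applies the same balance to thousands of reactions inferred from the annotation, then searches the balanced flux distributions for the one that maximizes an objective such as growth.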
51
Sertbas & Ulgen, Front. C.D.B, 2020
Metabolic models help predict pathogenesis
• …
52
Sertbas & Ulgen, Front. C.D.B, 2020
Metabolic models to identify novel antimicrobial
drug targets and develop new antibiotics
53
https://doi.org/10.1038/s41429-020-00366-2
Metabolic models improve food fermentation
• Lactic acid bacteria like Lactococcus
lactis make lactic acid from sugars in
foods like cheese, yogurt, wine, salami,
and sauerkraut
• They also make therapeutic proteins &
flavor ingredients
• By targeting the lac operon (below),
genetic engineers can tune metabolic
pathways and products (left)
54
https://doi.org/10.1016/j.tibtech.2003.11.011
Lecture Learning Goals
• Describe how genes are identified.
• Distinguish between an open reading frame, a genome feature, a
gene, and a protein coding region.
• Explain how genomes are annotated and the kinds of databases that
are used to classify genes.
• List the genes involved in cellular metabolism, for both energy
generation (catabolism) and cell growth (anabolism).
• Explain the idea behind metabolic models, and describe one
application.
55

ESA/ACT Science Coffee: Diego Blas - Gravitational wave detection with orbita...
Advanced-Concepts-Team
 

Recently uploaded (20)

Clinical periodontology and implant dentistry 2003.pdf
Clinical periodontology and implant dentistry 2003.pdfClinical periodontology and implant dentistry 2003.pdf
Clinical periodontology and implant dentistry 2003.pdf
 
gastroretentive drug delivery system-PPT.pptx
gastroretentive drug delivery system-PPT.pptxgastroretentive drug delivery system-PPT.pptx
gastroretentive drug delivery system-PPT.pptx
 
Sexuality - Issues, Attitude and Behaviour - Applied Social Psychology - Psyc...
Sexuality - Issues, Attitude and Behaviour - Applied Social Psychology - Psyc...Sexuality - Issues, Attitude and Behaviour - Applied Social Psychology - Psyc...
Sexuality - Issues, Attitude and Behaviour - Applied Social Psychology - Psyc...
 
SDSS1335+0728: The awakening of a ∼ 106M⊙ black hole⋆
SDSS1335+0728: The awakening of a ∼ 106M⊙ black hole⋆SDSS1335+0728: The awakening of a ∼ 106M⊙ black hole⋆
SDSS1335+0728: The awakening of a ∼ 106M⊙ black hole⋆
 
Farming systems analysis: what have we learnt?.pptx
Farming systems analysis: what have we learnt?.pptxFarming systems analysis: what have we learnt?.pptx
Farming systems analysis: what have we learnt?.pptx
 
fermented food science of sauerkraut.pptx
fermented food science of sauerkraut.pptxfermented food science of sauerkraut.pptx
fermented food science of sauerkraut.pptx
 
23PH301 - Optics - Optical Lenses.pptx
23PH301 - Optics  -  Optical Lenses.pptx23PH301 - Optics  -  Optical Lenses.pptx
23PH301 - Optics - Optical Lenses.pptx
 
Introduction_Ch_01_Biotech Biotechnology course .pptx
Introduction_Ch_01_Biotech Biotechnology course .pptxIntroduction_Ch_01_Biotech Biotechnology course .pptx
Introduction_Ch_01_Biotech Biotechnology course .pptx
 
Summary Of transcription and Translation.pdf
Summary Of transcription and Translation.pdfSummary Of transcription and Translation.pdf
Summary Of transcription and Translation.pdf
 
Randomised Optimisation Algorithms in DAPHNE
Randomised Optimisation Algorithms in DAPHNERandomised Optimisation Algorithms in DAPHNE
Randomised Optimisation Algorithms in DAPHNE
 
Candidate young stellar objects in the S-cluster: Kinematic analysis of a sub...
Candidate young stellar objects in the S-cluster: Kinematic analysis of a sub...Candidate young stellar objects in the S-cluster: Kinematic analysis of a sub...
Candidate young stellar objects in the S-cluster: Kinematic analysis of a sub...
 
在线办理(salfor毕业证书)索尔福德大学毕业证毕业完成信一模一样
在线办理(salfor毕业证书)索尔福德大学毕业证毕业完成信一模一样在线办理(salfor毕业证书)索尔福德大学毕业证毕业完成信一模一样
在线办理(salfor毕业证书)索尔福德大学毕业证毕业完成信一模一样
 
Male reproduction physiology by Suyash Garg .pptx
Male reproduction physiology by Suyash Garg .pptxMale reproduction physiology by Suyash Garg .pptx
Male reproduction physiology by Suyash Garg .pptx
 
Microbiology of Central Nervous System INFECTIONS.pdf
Microbiology of Central Nervous System INFECTIONS.pdfMicrobiology of Central Nervous System INFECTIONS.pdf
Microbiology of Central Nervous System INFECTIONS.pdf
 
Signatures of wave erosion in Titan’s coasts
Signatures of wave erosion in Titan’s coastsSignatures of wave erosion in Titan’s coasts
Signatures of wave erosion in Titan’s coasts
 
11.1 Role of physical biological in deterioration of grains.pdf
11.1 Role of physical biological in deterioration of grains.pdf11.1 Role of physical biological in deterioration of grains.pdf
11.1 Role of physical biological in deterioration of grains.pdf
 
AJAY KUMAR NIET GreNo Guava Project File.pdf
AJAY KUMAR NIET GreNo Guava Project File.pdfAJAY KUMAR NIET GreNo Guava Project File.pdf
AJAY KUMAR NIET GreNo Guava Project File.pdf
 
TOPIC OF DISCUSSION: CENTRIFUGATION SLIDESHARE.pptx
TOPIC OF DISCUSSION: CENTRIFUGATION SLIDESHARE.pptxTOPIC OF DISCUSSION: CENTRIFUGATION SLIDESHARE.pptx
TOPIC OF DISCUSSION: CENTRIFUGATION SLIDESHARE.pptx
 
(June 12, 2024) Webinar: Development of PET theranostics targeting the molecu...
(June 12, 2024) Webinar: Development of PET theranostics targeting the molecu...(June 12, 2024) Webinar: Development of PET theranostics targeting the molecu...
(June 12, 2024) Webinar: Development of PET theranostics targeting the molecu...
 
ESA/ACT Science Coffee: Diego Blas - Gravitational wave detection with orbita...
ESA/ACT Science Coffee: Diego Blas - Gravitational wave detection with orbita...ESA/ACT Science Coffee: Diego Blas - Gravitational wave detection with orbita...
ESA/ACT Science Coffee: Diego Blas - Gravitational wave detection with orbita...
 

08_Annotation_2022.pdf

  • 1. Genome Annotation MICROBIO 590B Bioinformatics Lab: Bacterial Genomics Professor Kristen DeAngelis UMass Amherst Fall 2022 1
  • 2. Lecture Learning Goals • Describe how genes are identified. • Distinguish between an open reading frame, a genome feature, a gene, and a protein coding region. • Explain how genomes are annotated and the kinds of databases that are used to classify genes. • List the genes involved in cellular metabolism, for both energy generation (catabolism) and cell growth (anabolism). • Explain the idea behind metabolic models, and describe one application. 2
  • 3. Annotation of an Open Reading Frame • … 3
  • 4. Open Reading Frames • Some ORFs are located on one strand and others on the opposite strand, facing in the opposite orientation. The strands are designated as + or – and ORFs are diagrammed as located on the top (+) or bottom (-) strand template. The diagram below shows that most ORFs are in the same orientation for S-TIM5 bacteriophage. • For the ORFs located above the line, ‘upstream’, where the promoter is located (the 5’ end of the ORF), is to the left. • Open reading frames (ORFs) are sections of the genome that are flanked by start and stop codons, and thus can be readily identified with computer algorithms. Algorithms identify ORFs that may or may not be used by the cell to produce a protein (termed CDS, coding sequence). 4
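The idea that ORFs "can be readily identified with computer algorithms" can be sketched in a few lines. This is a minimal illustration, not how any particular gene caller works: it assumes ATG as the only start codon, the three standard stop codons, and a hypothetical `min_codons` length cutoff; real tools also model alternative starts, codon usage, and ribosome binding sites. Coordinates are reported on the + strand, 0-based and end-exclusive.

```python
# Minimal six-frame ORF scan (illustrative sketch; names and cutoffs are assumptions).
START = "ATG"
STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def orfs_in_strand(seq, min_codons):
    """Yield (start, end) for each ATG...stop stretch in the three frames of seq."""
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == START:
                start = i  # first in-frame start codon opens the ORF
            elif start is not None and codon in STOPS:
                if (i + 3 - start) // 3 >= min_codons:
                    yield start, i + 3  # stop codon included in the span
                start = None


def find_orfs(seq, min_codons=3):
    """Return (strand, start, end) tuples in + strand coordinates."""
    seq = seq.upper()
    n = len(seq)
    found = [("+", s, e) for s, e in orfs_in_strand(seq, min_codons)]
    for s, e in orfs_in_strand(reverse_complement(seq), min_codons):
        found.append(("-", n - e, n - s))  # map back to + strand coordinates
    return found


plus_example = find_orfs("CCATGAAATAGGG")
minus_example = find_orfs("CCCTATTTCATGG")
```

Every hit is only a candidate: as the slide notes, an ORF may or may not be used by the cell to produce a protein.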
  • 5. Open Reading Frames 5 Sabehi, Shaulov, Silver, Yanai, Harel, and Lindell, PNAS. 2012
  • 6. Origin of replication • Models for bacterial (A) and eukaryotic (B) DNA replication initiation. • A) Circular bacterial chromosomes contain a cis-acting element, the replicator, that is located at or near replication origins. • B) Linear eukaryotic chromosomes contain many replication origins. • Most bacterial chromosomes are circular and contain a single origin of chromosomal replication (oriC). • Origins in bacteria contain three functional elements that control origin activity: • conserved DNA repeats that are specifically recognized by DnaA (called DnaA-boxes) • an AT-rich DNA unwinding element (DUE) • and binding sites for proteins that help regulate replication initiation 6
  • 7. Ribosomal operons tend to locate near the origin of replication • rRNA is the ribosomal RNA, a major constituent of the ribosome, accounting for about 2/3 of its mass • A large number of ribosomes is required for growing cells • Fast-growing cells have many copies of the ribosomal operon 7 http://book.bionumbers.org/how-many-ribosomal-rna-gene-copies-are-in-the-genome/
  • 8. GC skew • The leading (single) strand tends to have more Gs than Cs, though the numbers of G and C are equal when you count across both strands (all base pairs). • The difference is referred to as GC skew, which can be examined to locate the origin of replication. • When the G content exceeds the C content, this is considered a positive skew and indicates a leading strand. 8 Billings et al., Standards in Genomic Sciences 2015
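GC skew is easy to compute from a single strand. The sketch below assumes the common definition (G - C) / (G + C) per window and the common convention that the minimum of the cumulative skew lies near the origin; neither the window size nor the function names come from the lecture, and the toy sequence is made up purely to show the sign change.

```python
# Windowed and cumulative GC skew (illustrative sketch; parameters are assumptions).
def gc_skew(window):
    """Return (G - C) / (G + C) for one window, or 0.0 if it has no G or C."""
    g = window.count("G")
    c = window.count("C")
    return 0.0 if g + c == 0 else (g - c) / (g + c)


def windowed_skew(seq, window=1000):
    """Split seq into consecutive windows and return the skew of each."""
    seq = seq.upper()
    return [gc_skew(seq[i:i + window]) for i in range(0, len(seq), window)]


def cumulative(values):
    """Return the running sum of a list of per-window skews."""
    total = 0.0
    out = []
    for v in values:
        total += v
        out.append(total)
    return out


# Toy sequence: a C-rich half followed by a G-rich half.
toy = "C" * 20 + "G" * 20
skews = windowed_skew(toy, window=10)
running = cumulative(skews)
switch_window = running.index(min(running))  # window where the skew stops falling
```

On a real genome the per-window values are noisy, which is why the cumulative curve is usually the one inspected.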
  • 9. key elements to genome annotation 1. The program scans through the sequence to identify rRNA and tRNA genes. • rRNA = ribosomal RNA genes, structural RNA in the ribosome with ribosomal proteins • tRNA = transfer RNA genes, connects the amino acid to the mRNA for growing proteins 2. The program predicts gene-encoding regions (also known as Open Reading Frames, or ORFs) 3. The program looks for other elements of interest (phages, CRISPR arrays, etc) 4. Compare the sequence of a feature (any of items 1-3) to a reference database of sequences with known functions. If the sequence looks similar to what has already been annotated in the database (hopefully based on experimental evidence), then it assigns the same function to this sequence - whether or not that is actually what it does! But it's the best we can do. 9
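Step 4 above, assigning a function by similarity to an already-annotated sequence, can be sketched as follows. This is a deliberately crude stand-in: it assumes the query and reference proteins are already aligned and of equal length, and it uses a hypothetical 80% identity cutoff. Real pipelines use alignment or profile searches (for example BLAST or HMMs) rather than position-by-position comparison, and the reference entries here are invented.

```python
# Toy annotation transfer by percent identity (illustrative sketch only).
def percent_identity(a, b):
    """Percent of identical positions between two equal-length, non-empty strings."""
    if not a or len(a) != len(b):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def transfer_annotation(query, reference, min_identity=80.0):
    """Return (function, identity) of the best reference hit above the cutoff."""
    best_function, best_identity = "hypothetical protein", 0.0
    for ref_seq, function in reference.items():
        if len(ref_seq) != len(query):
            continue  # this sketch skips anything it cannot compare directly
        identity = percent_identity(query, ref_seq)
        if identity >= min_identity and identity > best_identity:
            best_function, best_identity = function, identity
    return best_function, best_identity


# Hypothetical reference "database": sequence -> annotated function.
reference_db = {
    "MKTAYIAKQR": "toy kinase",
    "MSLLTEVETY": "toy matrix protein",
}
hit = transfer_annotation("MKTAYIAKQK", reference_db)
no_hit = transfer_annotation("AAAAAAAAAA", reference_db)
```

The fallback label mirrors the caveat on the slide: a transferred function is a hypothesis inherited from the database, and a sequence with no good match stays "hypothetical".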
  • 10. Ribosomes and non-coding RNA • Ribosomes are mostly coded in operons • Ribosome structure requires 3 types of structural RNA molecules: 5S, 16S and 23S rRNAs • Ribosomes also require proteins; these are also good phylogenetic markers • Unlinked rRNA genes are widespread among bacteria and archaea 10 Brewer et al., ISMEJ 2019
  • 11. Annotate Genomes with Prokka • Number of genes predicted • aka total CDS • aka total coding sequences • Number of protein coding genes • Number of genes with non-hypothetical function • Number of genes with EC number • Total tRNAs • Total rRNAs 11 Seemann, Bioinformatics 2014
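The summary numbers listed above can be tallied from a GFF3 annotation file, where the third tab-separated column is the feature type and the ninth holds attributes. The sketch below is an assumption-laden illustration: the toy records are invented, and the attribute keys it looks for (`product=hypothetical protein`, `eC_number=`) should be checked against the actual output of the annotation tool being used.

```python
# Tally feature types from GFF3 text (illustrative sketch; toy records are invented).
from collections import Counter


def summarize_gff(gff_text):
    """Count features by type, plus CDS with a named product or an EC number."""
    counts = Counter()
    for line in gff_text.splitlines():
        if line.startswith("##FASTA"):
            break  # any embedded sequence section follows this marker
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 9:
            continue  # skip malformed lines
        feature_type, attributes = cols[2], cols[8]
        counts[feature_type] += 1
        if feature_type == "CDS":
            if "product=hypothetical protein" not in attributes:
                counts["CDS_non_hypothetical"] += 1
            if "eC_number=" in attributes:
                counts["CDS_with_EC"] += 1
    return counts


def row(feature_type, attributes):
    """Build one toy GFF3 line with placeholder coordinates."""
    return "\t".join(["contig_1", "toy", feature_type, "1", "300", ".", "+", "0", attributes])


toy_gff = "\n".join([
    "##gff-version 3",
    row("CDS", "ID=g1;product=hypothetical protein"),
    row("CDS", "ID=g2;eC_number=3.2.1.21;product=beta-glucosidase"),
    row("tRNA", "ID=g3;product=tRNA-Ala"),
    row("rRNA", "ID=g4;product=16S ribosomal RNA"),
])
counts = summarize_gff(toy_gff)
```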
  • 12. How many ORFs are annotated? • Up to half of all ORFs have no known homologs! • Orphan genes, or ORFans, are usually considered unique to a very narrow taxon, generally a species • Orphans are a subset of taxonomically-restricted genes (TRGs), which are unique to a specific taxonomic level (e.g. plant-specific) • Non-homology based methods based on the context and the interactions of a protein may help identify missing metabolic activities and functional annotation • Why? • Some are sequencing errors • Some may be derived from horizontal gene transfer, duplication and divergence, or de novo origination • Some could be non-coding RNAs 12
  • 13. Pseudogenes • Pseudogenes are nonfunctional segments of DNA that resemble functional genes • Most bacterial pseudogenes are found in non-free-living organisms, like symbionts or obligate intracellular parasites • These will (generally) not be included in genome annotations 13
  • 14. Categorizing protein coding genes • Many organizational schemes categorize protein coding genes • Which one you choose depends upon which are available and your goals • Common options include: • Enzyme (enzyme nomenclature) and EC numbers, • FIGfams (functional homologs, part of SEED subsystems), • Pfam and TIGRfam (curated protein families), • COG (curated clusters of orthologous groups of proteins), • KO (KEGG Orthology), KEGG (metabolic pathways and reactions), • InterPro (protein families and domains), • GO (gene ontologies), • LIGAND (compounds), and • MetaCyc (metabolic pathways) 14 https://img.jgi.doe.gov/datasource.html
  • 15. Categorizing protein coding genes: EC number • EC number stands for Enzyme Commission number • EC numbers are assigned by the Nomenclature Committee of the International Union of Biochemistry and Molecular Biology 15
  • 16. Categorizing protein coding genes: EC number • EC numbers have four positions which describe exactly what kind of reaction the enzyme catalyzes • An example is beta-glucosidase (EC 3.2.1.21), the terminal enzyme in the depolymerization of cellulose to sugars • First position: general type of reaction catalyzed by the enzyme; the EC 3 group is hydrolases • Second position: subclass of the top-level group; the EC 3.2 group is glycosylases • Third position: sub-subclass of the top-level group; the EC 3.2.1 group is glycosidases, i.e. enzymes hydrolysing O- and S-glycosyl compounds • Fourth position: serial number of the enzyme in its sub-subclass; EC 3.2.1.21 is β-glucosidase, hydrolysis of terminal, non-reducing β-D-glucosyl residues with release of β-D-glucose 16 https://www.qmul.ac.uk/sbcs/iubmb/enzyme/EC3/2/1/21.html
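The four-position hierarchy can be unpacked programmatically. The sketch below is a small helper with hypothetical names; the top-level class names are the seven IUBMB enzyme classes, and a "-" in any position (as in partial annotations such as 3.2.1.-) is treated as unassigned.

```python
# Split an EC number into its four hierarchical levels (illustrative sketch).
EC_CLASSES = {
    1: "oxidoreductases",
    2: "transferases",
    3: "hydrolases",
    4: "lyases",
    5: "isomerases",
    6: "ligases",
    7: "translocases",
}


def parse_ec(ec):
    """Return the class name and the nested prefixes of an EC number string."""
    text = ec.strip()
    if text.upper().startswith("EC"):
        text = text[2:].strip()
    parts = text.split(".")
    if len(parts) != 4:
        raise ValueError("an EC number has exactly four positions")
    top = int(parts[0])
    return {
        "class": EC_CLASSES[top],
        "subclass": ".".join(parts[:2]),
        "sub_subclass": ".".join(parts[:3]),
        # a "-" serial number means the enzyme is only partially classified
        "serial": None if parts[3] == "-" else int(parts[3]),
    }


beta_glucosidase = parse_ec("EC 3.2.1.21")
partial = parse_ec("3.2.1.-")
```

Grouping annotated genes by `subclass` or `sub_subclass` is one simple way to compare enzyme repertoires between genomes at a coarser level than individual enzymes.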
  • 17. Categorizing protein coding genes: FIGfams • The original SEED Project was started in 2003 by the Fellowship for Interpretation of Genomes (FIG) as an open source effort • Annotation is done by the curation of subsystems across many genomes, not on a gene-by-gene basis • From the curated subsystems we extract a set of freely available protein families (FIGfams) • These FIGfams form the core component of the RAST server (RAST = Rapid Annotation using Subsystems Technology) 17 https://www.theseed.org/wiki/Home_of_the_SEED
  • 18. Categorizing protein coding genes: FIGfams 18 Meyer et al., Nucleic Acids Research 2009
  • 19. Categorizing protein coding genes: FIGfams • Each FIGfam is a set of proteins that are believed to be isofunctional homologs • they all are believed to implement the same function, • and they are believed to derive from a common ancestor because they appear to be similar 19
  • 20. Categorizing protein coding genes: pfams • The Pfam database is a large collection of protein families, each represented by multiple sequence alignments and hidden Markov models (HMMs). • Pfam 34.0 (March 2021, 19179 entries) • The general purpose of the Pfam database is to provide a complete and accurate classification of protein families and domains 20 http://pfam.xfam.org; Mistry et al., Nucleic Acids Research, 2020
  • 21. Categorizing protein coding genes: pfams • Proteins may have multiple pfams, since it is domains that are characterized 21 Mistry et al., Nucleic Acids Research, 2020 • The Pfam entries that cover the SARS-CoV-2 proteome were newly revised, with new entries for regions not previously covered by Pfam. • The structure of NSP15 from Kim et al. shows the three new Pfam domains, • (i) CoV_NSP15_N Coronavirus replicase domain in red, • (ii) CoV_NSP15_M Coronavirus replicase NSP15 domain in blue, • (iii) CoV_NSP15_C Coronavirus replicase NSP15, uridylate-specific endoribonuclease in green.
  • 22. Categorizing protein coding genes: COGs • Clusters of Orthologous Genes (COGs) • relatively small collection of fewer than 5000 clusters of orthologous proteins (COGs) consists of the products of the most widespread bacterial and archaeal genes 22 https://www.ncbi.nlm.nih.gov/research/COG
  • 23. Categorizing protein coding genes: COGs 23 Shields et al., mSphere 2018 • An example of how COGs are used in analyzing change in relative abundance of protein coding genes across treatments
  • 24. Categorizing protein coding genes: KEGG and KO • KEGG: Kyoto Encyclopedia of Genes and Genomes • KEGG is a database resource for understanding high-level functions and utilities of the biological system, such as the cell, the organism and the ecosystem, from molecular-level information, especially large-scale molecular datasets generated by genome sequencing and other high-throughput experimental technologies. 24 https://www.genome.jp/kegg/
  • 25. Categorizing protein coding genes: KEGG and KO • … 25 https://www.genome.jp/kegg/
  • 26. Categorizing protein coding genes: KEGG and KO • KEGG consists of eighteen original databases in four categories 26 Kanehisa et al., Nucleic Acids Research 2020
  • 27. Categorizing protein coding genes: KEGG and KO 27
  • 28. Categorizing protein coding genes: KEGG and KO 28 • Circles represent metabolites • Lines represent enzymes that make biochemical transformations
  • 29. Categorizing protein coding genes: KEGG and KO 29
  • 30. Categorizing protein coding genes: KEGG and KO • The KO (KEGG Orthology) database is a database of molecular functions represented in terms of functional orthologs. • A functional ortholog is manually defined in the context of KEGG molecular networks, namely, KEGG pathway maps, BRITE hierarchies and KEGG modules. • Each node of the network, such as a box in the KEGG pathway map, is given a KO identifier (called K number) as a functional ortholog defined from experimentally characterized genes and proteins in specific organisms, which are then used to assign orthologous genes in other organisms based on sequence similarity. • The granularity of "function" is context-dependent, and the resulting KO grouping may correspond to a group of highly similar sequences within a limited organism group or it may be a more divergent group. 30
  • 31. Categorizing protein coding genes: KEGG and KO • The KO (KEGG Orthology) database • KEGG pathway maps are drawn based on experimental evidence in specific organisms but they are designed to be applicable to other organisms as well, because different organisms, such as human and mouse, often share identical pathways consisting of functionally identical genes, called orthologous genes or orthologs 31
  • 32. Metabolism • All chemical reactions inside a cell • Metabolic pathways are the stepwise reactions that generate energy by breaking down larger molecules (catabolism) or that are biosynthetic and require energy (anabolism) 32 https://openstax.org/books/microbiology/pages/8-1-energy-matter-and-enzymes
  • 33. Metabolism • The energy currencies of cells include ATP, NAD+, NADP+, and FAD • Exergonic reactions are coupled to endergonic reactions to make the combinations favorable 33 https://openstax.org/books/microbiology/pages/8-1-energy-matter-and-enzymes
  • 34. Catabolism of carbohydrates: glycolysis • The most common pathway for the metabolism of glucose • Produces energy, reduced electron carriers, and precursor molecules for anabolism • Can be coupled to aerobic or anaerobic growth • Glycolysis • Embden-Meyerhof-Parnas pathway, aka “glycolysis” • Entner-Doudoroff pathway is an alternative glycolysis • Pentose-phosphate pathway processes five-carbon sugars 34
  • 35. Glycolysis, the “upper” half • 2 ATPs are used to phosphorylate glucose, which is then split into two 3-carbon molecules 35 https://openstax.org/books/microbiology/pages/c-metabolic-pathways
  • 36. Glycolysis, the “lower” half • Further phosphorylation requires NAD+, producing 4 ATPs per glucose • Net 2 ATP per glucose 36 https://openstax.org/books/microbiology/pages/c-metabolic-pathways
  • 37. Substrate-level phosphorylation • One of two enzymatic reactions in the energy payoff phase of glycolysis generates ATP 37 https://openstax.org/books/microbiology/pages/8-2-catabolism-of-carbohydrates
  • 38. Entner-Doudoroff pathway • To catabolize glucose to pyruvate, ED uses two unique enzymes: • 6-phosphogluconate dehydratase (EC 4.2.1.12) and • 2-keto-3-deoxy-6-phosphogluconate (KDPG) aldolase (EC 4.1.2.14) 38 https://openstax.org/books/microbiology/pages/c-metabolic-pathways
  • 39. Entner-Doudoroff pathway • EMP glycolysis generates net 2 ATP per glucose • ED glycolysis only generates one ATP per glucose 39 Flamholz et al., PNAS 2013
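The net yields quoted above are simple bookkeeping: ATP produced minus ATP invested per glucose. The EMP figures (2 invested, 4 produced) are stated on the glycolysis slides; the ED figures used here (1 invested, 2 produced) are standard textbook values assumed for illustration rather than taken from the lecture.

```python
# ATP bookkeeping per glucose (illustrative sketch; ED inputs are an assumption).
PATHWAYS = {
    "EMP": {"atp_invested": 2, "atp_produced": 4},
    "ED": {"atp_invested": 1, "atp_produced": 2},
}


def net_atp(pathway):
    """Return ATP produced minus ATP invested for one glucose."""
    entry = PATHWAYS[pathway]
    return entry["atp_produced"] - entry["atp_invested"]


yields = {name: net_atp(name) for name in PATHWAYS}
```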
  • 40. Entner-Doudoroff pathway • EMP glycolysis generates net 2 ATP per glucose • ED glycolysis only generates one ATP per glucose • Why? 40 Flamholz et al., PNAS 2013
  • 41. Entner-Doudoroff pathway • “ED pathway is expected to require several-fold less enzymatic protein to achieve the same glucose conversion rate as the EMP pathway” 41 Flamholz et al., PNAS 2013
  • 42. Entner-Doudoroff pathway • “energy-deprived anaerobes overwhelmingly rely upon the higher ATP yield of the EMP pathway, whereas the ED pathway is common among facultative anaerobes and even more common among aerobes” 42 Flamholz et al., PNAS 2013
  • 43. Pentose-Phosphate pathway • aka phosphogluconate pathway and the hexose monophosphate shunt • Parallels glycolysis, generates NADPH and 5C sugars as well as ribose 5-phosphate, a precursor for the synthesis of nucleotides from glucose 43 https://openstax.org/books/microbiology/pages/c-metabolic-pathways
  • 44. The Transition Reaction • Glycolysis produces pyruvate, which can be further oxidized to generate more energy • For this to happen, pyruvate must be decarboxylated (below, left) • The resulting acetyl group is carried by coenzyme A (“CoA”, below, right) 44 https://openstax.org/books/microbiology/pages/c-metabolic-pathways
  • 45. Tricarboxylic Acid (TCA) Cycle • Closed loop pathway in 8 steps that captures the 2C acetyl group of acetyl-CoA, producing 2 CO2, 1 ATP, 3 NADH and 1 FADH2 45 https://openstax.org/books/microbiology/pages/8-2-catabolism-of-carbohydrates
  • 46. TCA cycle intersects anabolism and catabolism • As well as generating energy, intermediate compounds are precursors for biosynthesis of • amino acids, • chlorophylls, • fatty acids, and • nucleotides • TCA cycle is anabolic and catabolic 46 https://openstax.org/books/microbiology/pages/8-2-catabolism-of-carbohydrates
  • 48. Respiration • Most cellular ATP is generated by oxidative phosphorylation • As opposed to substrate-level phosphorylation • In oxidative phosphorylation, ATP is formed from the transfer of electrons from NADH or FADH2 to O2 by a series of electron carriers • How much ATP depends on the terminal electron acceptor • More ATP from O2 than from NO3-, SO42-, Fe3+, CO2, other inorganics 48 https://openstax.org/books/microbiology/pages/8-3-cellular-respiration
  • 49. Electron Transport Chain • A series of electron carriers and ion pumps embedded in the cell membrane that pump protons (H+) across a membrane • Proton motive force is generated by expelling protons outside of the cell • Protons then want to flow across the membrane, but must go through the ATP synthase, which drives ATP production 49 https://openstax.org/books/microbiology/pages/c-metabolic-pathways
  • 50. Carbohydrate Active Enzymes (CAZy) http://www.cazy.org Modules that catalyze the breakdown, biosynthesis or modification of carbohydrates and glycoconjugates : • Glycoside Hydrolases (GHs) : hydrolysis and/or rearrangement of glycosidic bonds • GlycosylTransferases (GTs) : formation of glycosidic bonds • Polysaccharide Lyases (PLs) : non-hydrolytic cleavage of glycosidic bonds • Carbohydrate Esterases (CEs) : hydrolysis of carbohydrate esters • Auxiliary Activities (AAs) : redox enzymes that act in conjunction with CAZymes. Associated Modules currently covered • Carbohydrate-Binding Modules (CBMs) : adhesion to carbohydrates 50
  • 51. Metabolic Modeling • Combination of genome sequence with physiology to predict growth • Mathematical network model that represents the systems biology of metabolic pathways within an organism 51 Sertbas & Ulgen, Front. C.D.B, 2020
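One way to make the "mathematical network model" concrete is the stoichiometric matrix used in constraint-based modeling: rows are metabolites, columns are reactions, and a flux vector is at steady state when no internal metabolite accumulates. The three-reaction network below is entirely hypothetical and far smaller than a genome-scale model; it only shows the mass-balance check, not an optimization such as flux balance analysis.

```python
# Steady-state mass balance on a toy network (illustrative sketch; network is invented).
METABOLITES = ["A", "B"]
REACTIONS = ["uptake", "convert", "export"]

# Rows follow METABOLITES, columns follow REACTIONS.
# uptake: -> A, convert: A -> B, export: B ->
S = [
    [1, -1, 0],
    [0, 1, -1],
]


def net_production(stoich, fluxes):
    """Return the net production rate of each metabolite (S times v)."""
    return [sum(coef * v for coef, v in zip(row, fluxes)) for row in stoich]


def is_steady_state(stoich, fluxes, tol=1e-9):
    """True when no metabolite accumulates or is depleted."""
    return all(abs(rate) <= tol for rate in net_production(stoich, fluxes))


balanced = [10.0, 10.0, 10.0]   # everything taken up is converted and exported
unbalanced = [10.0, 5.0, 5.0]   # A is taken up faster than it is converted
balanced_ok = is_steady_state(S, balanced)
unbalanced_rates = net_production(S, unbalanced)
```

In a real model the columns would come from the annotated genes (for example via EC numbers or KO identifiers), which is how genome annotation feeds into growth prediction.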
  • 52. Metabolic models help predict pathogenesis • … 52 Sertbas & Ulgen, Front. C.D.B, 2020
  • 53. Metabolic models to identify novel antimicrobial drug targets and develop new antibiotics 53 https://doi.org/10.1038/s41429-020-00366-2
  • 54. Metabolic models improve food fermentation • Lactic acid bacteria like Lactococcus lactis make lactic acid from sugars in foods like cheese, yogurt, wine, salami, and sauerkraut • They also make therapeutic proteins & flavor ingredients • By targeting the lac operon (below), genetic engineers can tune metabolic pathways and products (left) 54 https://doi.org/10.1016/j.tibtech.2003.11.011
  • 55. Lecture Learning Goals • Describe how genes are identified. • Distinguish between an open reading frame, a genome feature, a gene, and a protein coding region. • Explain how genomes are annotated and the kinds of databases that are used to classify genes. • List the genes involved in cellular metabolism, for both energy generation (catabolism) and cell growth (anabolism). • Explain the idea behind metabolic models, and describe one application. 55