SlideShare a Scribd company logo
GENE315: Genome size and complexity
Paul Gardner
March 14, 2023
Introduction
▶ I started at Otago in 2018.
▶ Taught bioinformatics, genomics & biostatistics for 6 years at
the University of Canterbury.
▶ Worked for ∼ 10 years in Europe. I’ve lived in Bielefeld,
Copenhagen and Cambridge. Taught courses in Denmark,
Cambridge, Poland, Portugal, Australia, USA, and NZ.
▶ Originally from Whāngārā, married with three children.
Overview of lectures 03-08
▶ 3. Genome size and complexity.
▶ 4. Comparative genomics & genome annotation.
▶ 5. Comparative genomics & sequence alignment.
▶ 6. Comparative genomics & homology search.
▶ 7. Measuring genome function.
Objectives for lecture 03
▶ An appreciation for:
▶ “Junk DNA” & the C-value paradox.
▶ The ENCODE project.
▶ Critical appraisals of genome function.
What is a gene?
NIH:
Wikipedia:
Ohno S (1972) So Much “Junk DNA” in our Genome. Brookhaven Symposium on Biology. – available on
Blackboard.
Summary of Susumu Ohno’s main points about junk DNA
▶ The human genome is roughly 3 × 109 bases, ≈ 750 times E.
coli.
▶ Assuming the same gene density as E. coli, humans will have
≈ 3 million genes.
▶ HOWEVER, if gene density is preserved, then salamanders
have 10 times the number of genes that we do.
▶ There may be a strict upper limit to the number of gene loci.
▶ Therefore, only a fraction of human DNA is “genic”.
▶ Argues that based upon mutation rates, that the number of
genes must be less than 1
mutation rate to avoid meltdown.
▶ and the number of genes is ≈ 30, 000, therefore about 6% of
our DNA is genic.1
▶ Speculates about some hypothetical functions of non-coding
DNA, e.g. chromosome structure, promoter/operators,
inhibiting non-sense/frameshift mutations, ...
▶ Speculates that our genome is littered with the “fossil”
remains of extinct genes.
Vertebrate genomes & junk DNA
▶ Vary in length by two orders of magnitude, e.g. bird genomes
are ≈ 1Gb, while salamander genomes are ≈ 32Gb (giant
lungfish genome is ≈ 43Gb).
▶ Variation largely driven by decaying remnants of transposons.
▶ ≈ 300Mb of sequence is conserved across the vertebrates.
▶ Randomly generated sequences when inserted into genomes
are also transcribed, promoters are very degenerate.
▶ I.e. many different sequences can be bound by a transcription
factor
▶ Which suggests the number of functional elements should not
scale with genome length.
Ohno S (1972) So much’junk’DNA in our genome. In Evolution of Genetic Systems, Brookhaven Symp. Biol.
Mammalian genome variation & junk DNA
▶ But salamanders are special... (if the genome sizes were
longer in the birds than in the salamanders we could
rationalise how complex birds are).
▶ Let’s look at just the mammalian extremes of genome size
then... the red vizcacha rat, the mountain vizcacha rat and a
bat!
Which of these mammals is more complex?
Southern bent-wing bat
Miniopterus orianae bassanii
~1.6 Gb genome
Mountain viscacha rat
Octomys mimax
~3.2 Gb genome
Plains viscacha rat
Tympanoctomys barrerae
~6.2 Gb genome
Evans et al. (2017) Evolution of the Largest Mammalian Genome. GBE.
30 years later we forgot about Ohno’s line of reasoning...
Editorial (2000) The nature of the number. Nature genetics.
Willyard (2018) New human gene tally reignites debate. Nature.
Hatje et al. (2019) The Protein-Coding Human Genome: Annotating High-Hanging Fruits. BioEssays.
Genome sizes & number of genes
1
3
5
7
9
11
13
15
17
19
21
23
25
27
29
31
33
A
Nucleus
Cytoplasm
ER
Golgi
Lysosome
Mitochondria
B
Interactions
Genome length
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
Paramecium
Arabidopsis
Homo
Gemmata
Escherichia
Saccharomyces
Lokiarchaeum
Mycoplasma
Nanoarchaeum
Escherichia phage
104
105
106
107
108
109
1010
101
102
103
104
105
●
Animal
Fungi
Plant
Protists
Archaea
Bacteria
Phage
Transect
Number
of
protein
coding
genes
▶ Prediction: “In fact, there seems to be a strict upper limit for
the number of genes loci which we can afford to keep in our
genome” (Ohno, 1972)
▶ Eukaryotic genomes can vary widely in size, with little change
in the number of protein-coding genes. Gardner, Gumy & Fineran (2019)
Unpublished.
Related to junk DNA, is the “C-value paradox”
▶ C-values (a.k.a. genome size) vary enormously between
closely related species.
▶ Vertebrates: birds (1 Gb), bats (1.6 Gb), human (3 Gb),
tuatara (5 Gb), salamander (32 Gb), lungfish (130 Gb).
▶ Allium (onions): 7 Mb to 32 Mb.
▶ There is little relationship between the “perceived
complexity” (or the corresponding number of genes) of a
species and the size of a genome.
▶ Given that C-values are relatively constant within
species/cells, yet bears no relationship to the complexity, or
presumed number of genes, then this somewhat of a paradox.
https://en.wikipedia.org/wiki/C-value#C-value paradox
Eddy, SR (2012) The C-value paradox, junk DNA and ENCODE. Current Biology.
Palazzo & Gregory et al. (2014) The Case for Junk DNA PLoS Genetics.
Drivers of genome size variation
C-value variation is largely driven by transposable elements and
other repetitive DNA elements
https://sandwalk.blogspot.com/2018/03/whats-in-your-genome-pie-chart.html
http://www.dfam.org/entry/DF0000001/hits
Protein functions, 2023 – still lots of unknowns
cytoskeletal protein 3% (627)
transporter 5% (1031)
scaffold/adaptor protein 4% (746)
protein−binding activity modulator 4% (793)
RNA metabolism protein 4% (827)
gene−specific transcriptional regulator 7% (1516)
defense/immunity protein 3% (664)
metabolite interconversion enzyme 9% (1939)
protein modifying enzyme 8% (1622)
transmembrane signal receptor 6% (1151)
Unclassified 33% (6695)
Other 14% (2980)
Unknown 0% (0)
20,851 Human Proteome Functions 2023
Gardner (2023)
Data from Panther: http://www.pantherdb.org
The ENCODE Project 2012
▶ 443 authors, from genome/analysis centres around the world
▶ Aim to map functional elements across the human genome
▶ Generated LOTS of important data that we are still using:
▶ Transcription: RNA-seq, CAGE (Cap Analysis Gene
Expression), RNA-PET (paired end tag)
▶ Translation: mass spectrometry
▶ Transcription-factor-binding sites: ChIP-seq and DNase-seq
▶ Chromatin structure: DNase-seq, FAIRE-seq, histone
ChIP-seq and MNase-seq
▶ DNA methylation sites: RRBS assay
▶ Largely used 3 cell lines: erythroleukaemia K562;
B-lymphoblastoid GM12878 & H1 embryonic stem cells
The ENCODE Project Consortium (2012) An integrated encyclopedia of DNA elements in the human genome.
Nature.
ENCODE 2012
http://www.nature.com/encode/
ENCODE 2012: LOTS of cool results (30 papers)
E.g. transcription is correlated (r = 0.9) with transcription factor
binding (left) and cancer associated SNPs are linked to chromatin
structure, which may influence gene expression (right).
ENCODE 2012
▶ Acknowledged that “Comparative genomic studies suggest
that 3% to 8% of bases are under purifying (negative)
selection”
▶ Yet conclude “The vast majority (80.4%) of the human
genome participates in at least one biochemical RNA- and/or
chromatin-associated event in at least one cell type.”
▶ “Many non-coding variants in individual genome sequences lie
in ENCODE-annotated functional regions; this number is at
least as large as those that lie in protein-coding genes.”
▶ “Single nucleotide polymorphisms (SNPs) associated with
disease by GWAS are enriched within non-coding functional
elements”
The ENCODE press lead to some “interesting” headlines:
And there was a predictable scientific backlash:
▶ Eddy, SR (2012) The C-value paradox, junk DNA and
ENCODE. Current Biology.
▶ Q&A style discussion of C-values, junk DNA and ENCODE
▶ Eddy, SR (2013) The ENCODE project: Missteps
overshadowing a success. Current Biology.
▶ “all reproducible biochemical events were claimed to be
‘critical’ and ‘needed’.”
▶ “Far from disproving junk DNA, ENCODE’s operationalized
definition of function included junk DNA”.
▶ “genomes contain a lot of DNA that is not critically important
for host functions, much of which arises from mobile element
replication”.
▶ Sean proposes the “The Random Genome Project: the
missing negative control”.
▶ “Biology is noisy” therefore an assessment of noise in
transcription, translation & DNA binding would allow an
assessment of how much of the ENCODE results are noise,
and how much is genuinely functional.
▶ Recent progress: Deciphering eukaryotic gene-regulatory logic
with 100 million random promoters
And further responses
▶ Doolittle WF (2013) Is junk DNA bunk? A critique of
ENCODE. PNAS.
▶ If the number of functional elements were to rise significantly
with C-value then
▶ (i) organisms with larger genomes are more complex
phenotypically
▶ NB. Organism/phenotypic complexity is generally not well
defined.
▶ (ii) ENCODE’s definition of a functional element identifies
many sites that would not be considered functional or
phenotype-determining by standard uses in biology
▶ (iii) the same phenotypic functions are often determined in a
more diffuse fashion in larger-genomed organisms
▶ Graur D et al. (2013) On the immortality of television
sets:“function” in the human genome according to the
evolution-free gospel of ENCODE. Genome biology and
evolution.
▶ “a very forcefully worded critique” – W. Ford Doolittle
▶ “angry, dogmatic, scattershot, sometimes inaccurate” – Sean
Eddy
Note: Transcription & translation are likely to be noisy
processes
Genome
Functional Junk
Untranscribed Transcribed Transcribed Untranscribed
Untranslated Translated Translated Untranslated
Transcriptome
Proteome
Promoters/Transcription factor binding motifs are
degenerate
▶ “Degenerate” motifs refers to the fact that many different
sequences can promote transcription
▶ This implies there is lots of potential for random, noisy
transcription
http://jaspar.genereg.net/
How would you determine if a genomic region is
“functional”?
▶ Lecture #7
Humans are not the most (or least) complex species!
Big brains, hands, sure. But we can’t breath underwater water,
regrow limbs, don’t have tentacles, can’t fly unassisted, can’t
photosynthesise, melt under mild radiation, turn inside out in a
vacuum, get cancers - circulatory disease - infectious disease, do
stupid things, get too hot and too cold, ...
Humans are weak and puny!
Great Chain of Being
https://en.wikipedia.org/wiki/Category:Obsolete biological theories
The main points
▶ C-values (genome sizes) vary dramatically, even between
closely related species.
▶ The “C-value paradox” is that genome size is not proportional
to the perceived organism “complexity”.
▶ Many “functional” non-coding elements exist in the genome
(.e.g ncRNAs, promoters, etc.)
▶ The majority of the eukaryotic genomes is littered with
“Junk” DNA.
▶ these are largely pseudogenised transposable elements and
endogenous retroviruses
▶ NB. junk ̸= in-active AND non-coding ̸= junk
▶ ENCODE is a valuable resource for identifying “biochemical”
activity in cell lines.
▶ Some of the ENCODE conclusions and press are controversial.
▶ Reports of the death of junk DNA are greatly exaggerated.
Self-evaluation exercises
▶ Outline the C-value paradox, include in your answer a
definition of junk DNA.
▶ What are the main sources of junk DNA?
▶ Describe the ENCODE project and main conclusions.
▶ Outline an approach to distinguish between functional and
noisy cellular processes.
▶ The below figures have been used to associate proportions of
noncoding DNA with “complexity”. Using your knowledge of
the C-value paradox, write a critique of this result.
Mattick (2004) The hidden genetic program of complex organisms. Scientific American.
Taft et al. (2007) The relationship between non-protein-coding DNA and eukaryotic complexity. BioEssays.
Self-evaluation exercises
▶ Use chatGPT to generate an essay on “genome function and
junk DNA”. Critique the essay based upon the references
presented in this lecture.
AI generated art by https://creator.nightcafe.studio/studio, “Junk DNA”, Preset: Surreal
Suggested reading
▶ Important: Ohno S (1972) So Much “Junk DNA” in our
Genome. Brookhaven Symposium on Biology.
▶ Important: Eddy, SR (2013) The ENCODE project: Missteps
overshadowing a success. Current Biology.
▶ Extra: Palazzo & Gregory (2014) The Case for Junk DNA.
PLOS Genetics.
Questions relating to my lectures can be asked & viewed
here:
https://docs.google.com/document/d/1PQd dp7C 0cXA8SwUv-
qrkTOj8c8fUAt-U Z5dg2yc8/edit?usp=sharing
The End
Karitane, 23 Oct 2020.

More Related Content

What's hot

MOLECULAR DOCKING PRINCIPAL OF DRUG DISCOVERY
MOLECULAR DOCKING PRINCIPAL OF DRUG DISCOVERYMOLECULAR DOCKING PRINCIPAL OF DRUG DISCOVERY
MOLECULAR DOCKING PRINCIPAL OF DRUG DISCOVERY
Rajratan Thorat
 
Genetic variation in G protein coupled receptors
Genetic variation in G protein coupled receptorsGenetic variation in G protein coupled receptors
Genetic variation in G protein coupled receptors
SachinGulia12
 
Gene mapping and its sequence
Gene mapping and its sequenceGene mapping and its sequence
Gene mapping and its sequence
Dr.SIBI P ITTIYAVIRAH
 
Nuclear Receptors
Nuclear ReceptorsNuclear Receptors
3d qsar
3d qsar3d qsar
3d qsar
Mahendra G S
 
Genetic variation and its role in health and pharmacology
Genetic variation and its role in health and pharmacologyGenetic variation and its role in health and pharmacology
Genetic variation and its role in health and pharmacology
Gagandeep Jaiswal
 
Genetic mapping and sequencing
Genetic mapping and sequencingGenetic mapping and sequencing
Genetic mapping and sequencing
Aamna Tabassum
 
Pharmacogenomics
PharmacogenomicsPharmacogenomics
Pharmacogenomics
manisha pahal
 
Functional genomics,Pharmaco genomics, and Meta genomics.
Functional genomics,Pharmaco genomics, and Meta genomics.Functional genomics,Pharmaco genomics, and Meta genomics.
Functional genomics,Pharmaco genomics, and Meta genomics.
sangeeta jadav
 
Antisense technologies and antisense oligonucleotides
Antisense technologies and antisense oligonucleotidesAntisense technologies and antisense oligonucleotides
Antisense technologies and antisense oligonucleotides
RangnathChikane
 
Pharmacogenetics and Pharmacogenomics
Pharmacogenetics and PharmacogenomicsPharmacogenetics and Pharmacogenomics
Pharmacogenetics and Pharmacogenomics
Raul Soto
 
Genetic Variations in Drug Transporters
Genetic Variations in Drug TransportersGenetic Variations in Drug Transporters
Drug Receptor Interactions
Drug Receptor InteractionsDrug Receptor Interactions
Drug Receptor Interactions
Shruthi Rammohan
 
Genetic variation and its role in health pharmacology
Genetic variation and its role in health pharmacologyGenetic variation and its role in health pharmacology
Genetic variation and its role in health pharmacology
Deepak Kumar
 
Computer Aided Molecular Modeling
Computer Aided Molecular ModelingComputer Aided Molecular Modeling
Computer Aided Molecular Modeling
pkchoudhury
 
G protein coupled receptor
G protein coupled receptorG protein coupled receptor
G protein coupled receptor
Fardan Qadeer
 
Snps is pharmagenomic studeis
Snps is pharmagenomic studeisSnps is pharmagenomic studeis
Snps is pharmagenomic studeis
Rajveer Singh
 
G protein coupled receptor(gpcr)
G protein coupled receptor(gpcr)G protein coupled receptor(gpcr)
G protein coupled receptor(gpcr)
BHARAT KUMAR
 
3 d qsar approaches structure
3 d qsar approaches structure3 d qsar approaches structure
3 d qsar approaches structure
ROHIT PAL
 
Pathway analysis 2012
Pathway analysis 2012Pathway analysis 2012
Pathway analysis 2012
Stephen Turner
 

What's hot (20)

MOLECULAR DOCKING PRINCIPAL OF DRUG DISCOVERY
MOLECULAR DOCKING PRINCIPAL OF DRUG DISCOVERYMOLECULAR DOCKING PRINCIPAL OF DRUG DISCOVERY
MOLECULAR DOCKING PRINCIPAL OF DRUG DISCOVERY
 
Genetic variation in G protein coupled receptors
Genetic variation in G protein coupled receptorsGenetic variation in G protein coupled receptors
Genetic variation in G protein coupled receptors
 
Gene mapping and its sequence
Gene mapping and its sequenceGene mapping and its sequence
Gene mapping and its sequence
 
Nuclear Receptors
Nuclear ReceptorsNuclear Receptors
Nuclear Receptors
 
3d qsar
3d qsar3d qsar
3d qsar
 
Genetic variation and its role in health and pharmacology
Genetic variation and its role in health and pharmacologyGenetic variation and its role in health and pharmacology
Genetic variation and its role in health and pharmacology
 
Genetic mapping and sequencing
Genetic mapping and sequencingGenetic mapping and sequencing
Genetic mapping and sequencing
 
Pharmacogenomics
PharmacogenomicsPharmacogenomics
Pharmacogenomics
 
Functional genomics,Pharmaco genomics, and Meta genomics.
Functional genomics,Pharmaco genomics, and Meta genomics.Functional genomics,Pharmaco genomics, and Meta genomics.
Functional genomics,Pharmaco genomics, and Meta genomics.
 
Antisense technologies and antisense oligonucleotides
Antisense technologies and antisense oligonucleotidesAntisense technologies and antisense oligonucleotides
Antisense technologies and antisense oligonucleotides
 
Pharmacogenetics and Pharmacogenomics
Pharmacogenetics and PharmacogenomicsPharmacogenetics and Pharmacogenomics
Pharmacogenetics and Pharmacogenomics
 
Genetic Variations in Drug Transporters
Genetic Variations in Drug TransportersGenetic Variations in Drug Transporters
Genetic Variations in Drug Transporters
 
Drug Receptor Interactions
Drug Receptor InteractionsDrug Receptor Interactions
Drug Receptor Interactions
 
Genetic variation and its role in health pharmacology
Genetic variation and its role in health pharmacologyGenetic variation and its role in health pharmacology
Genetic variation and its role in health pharmacology
 
Computer Aided Molecular Modeling
Computer Aided Molecular ModelingComputer Aided Molecular Modeling
Computer Aided Molecular Modeling
 
G protein coupled receptor
G protein coupled receptorG protein coupled receptor
G protein coupled receptor
 
Snps is pharmagenomic studeis
Snps is pharmagenomic studeisSnps is pharmagenomic studeis
Snps is pharmagenomic studeis
 
G protein coupled receptor(gpcr)
G protein coupled receptor(gpcr)G protein coupled receptor(gpcr)
G protein coupled receptor(gpcr)
 
3 d qsar approaches structure
3 d qsar approaches structure3 d qsar approaches structure
3 d qsar approaches structure
 
Pathway analysis 2012
Pathway analysis 2012Pathway analysis 2012
Pathway analysis 2012
 

Similar to ppgardner-lecture03-genomesize-complexity.pdf

VARSHA KOSHLE PAPER 2 C-VALUE PARADOX PRESENTATION MSC 2ND SEM BOTANY.pptx
VARSHA KOSHLE PAPER 2 C-VALUE PARADOX  PRESENTATION MSC 2ND SEM BOTANY.pptxVARSHA KOSHLE PAPER 2 C-VALUE PARADOX  PRESENTATION MSC 2ND SEM BOTANY.pptx
VARSHA KOSHLE PAPER 2 C-VALUE PARADOX PRESENTATION MSC 2ND SEM BOTANY.pptx
alishadewangan1
 
ppgardner-lecture07-genome-function.pdf
ppgardner-lecture07-genome-function.pdfppgardner-lecture07-genome-function.pdf
ppgardner-lecture07-genome-function.pdf
Paul Gardner
 
New generation Sequencing
New generation Sequencing New generation Sequencing
New generation Sequencing
Vijay Raj Yanamala
 
Genome project.pdf
Genome project.pdfGenome project.pdf
Genome project.pdf
ManchikantiDivya
 
Human genome project(ibri)
Human genome project(ibri)Human genome project(ibri)
Human genome project(ibri)
ajay vishwakrma
 
Genomics
GenomicsGenomics
20150115_JQO_NYAPopulationGenomics
20150115_JQO_NYAPopulationGenomics20150115_JQO_NYAPopulationGenomics
20150115_JQO_NYAPopulationGenomics
Javier Quílez Oliete
 
Introduction to Apollo: A webinar for the i5K Research Community
Introduction to Apollo: A webinar for the i5K Research CommunityIntroduction to Apollo: A webinar for the i5K Research Community
Introduction to Apollo: A webinar for the i5K Research Community
Monica Munoz-Torres
 
2014 villefranche
2014 villefranche2014 villefranche
2014 villefranche
c.titus.brown
 
Comparative genomics
Comparative genomicsComparative genomics
Comparative genomics
Amol Kunde
 
Investigation of phylogenic relationships of shrew populations using genetic...
Investigation of phylogenic relationships  of shrew populations using genetic...Investigation of phylogenic relationships  of shrew populations using genetic...
Investigation of phylogenic relationships of shrew populations using genetic...
Juan Barrera
 
Investigation of phylogenic relationships of shrew populations using genetic...
Investigation of phylogenic relationships  of shrew populations using genetic...Investigation of phylogenic relationships  of shrew populations using genetic...
Investigation of phylogenic relationships of shrew populations using genetic...
Juan Barrera
 
genetics lab poster SRC
genetics lab poster SRCgenetics lab poster SRC
genetics lab poster SRC
Juan Barrera
 
2014 naples
2014 naples2014 naples
2014 naples
c.titus.brown
 
Marzillier_09052014.pdf
Marzillier_09052014.pdfMarzillier_09052014.pdf
Marzillier_09052014.pdf
7006ASWATHIRR
 
Human genome project by kk sahu
Human genome project by kk sahuHuman genome project by kk sahu
Human genome project by kk sahu
KAUSHAL SAHU
 
Cot curve analysis for gene and genome complexity
Cot curve analysis for gene and genome complexityCot curve analysis for gene and genome complexity
Cot curve analysis for gene and genome complexity
Dr. GURPREET SINGH
 
Apollo - A webinar for the Phascolarctos cinereus research community
Apollo - A webinar for the Phascolarctos cinereus research communityApollo - A webinar for the Phascolarctos cinereus research community
Apollo - A webinar for the Phascolarctos cinereus research community
Monica Munoz-Torres
 
Dan Graur - Can the human genome be 100% functional?
Dan Graur - Can the human genome be 100% functional?Dan Graur - Can the human genome be 100% functional?
Dan Graur - Can the human genome be 100% functional?
Andrei Afanasiev
 
DNA Methylation & C Value.pdf
DNA Methylation & C Value.pdfDNA Methylation & C Value.pdf
DNA Methylation & C Value.pdf
soniaangeline
 

Similar to ppgardner-lecture03-genomesize-complexity.pdf (20)

VARSHA KOSHLE PAPER 2 C-VALUE PARADOX PRESENTATION MSC 2ND SEM BOTANY.pptx
VARSHA KOSHLE PAPER 2 C-VALUE PARADOX  PRESENTATION MSC 2ND SEM BOTANY.pptxVARSHA KOSHLE PAPER 2 C-VALUE PARADOX  PRESENTATION MSC 2ND SEM BOTANY.pptx
VARSHA KOSHLE PAPER 2 C-VALUE PARADOX PRESENTATION MSC 2ND SEM BOTANY.pptx
 
ppgardner-lecture07-genome-function.pdf
ppgardner-lecture07-genome-function.pdfppgardner-lecture07-genome-function.pdf
ppgardner-lecture07-genome-function.pdf
 
New generation Sequencing
New generation Sequencing New generation Sequencing
New generation Sequencing
 
Genome project.pdf
Genome project.pdfGenome project.pdf
Genome project.pdf
 
Human genome project(ibri)
Human genome project(ibri)Human genome project(ibri)
Human genome project(ibri)
 
Genomics
GenomicsGenomics
Genomics
 
20150115_JQO_NYAPopulationGenomics
20150115_JQO_NYAPopulationGenomics20150115_JQO_NYAPopulationGenomics
20150115_JQO_NYAPopulationGenomics
 
Introduction to Apollo: A webinar for the i5K Research Community
Introduction to Apollo: A webinar for the i5K Research CommunityIntroduction to Apollo: A webinar for the i5K Research Community
Introduction to Apollo: A webinar for the i5K Research Community
 
2014 villefranche
2014 villefranche2014 villefranche
2014 villefranche
 
Comparative genomics
Comparative genomicsComparative genomics
Comparative genomics
 
Investigation of phylogenic relationships of shrew populations using genetic...
Investigation of phylogenic relationships  of shrew populations using genetic...Investigation of phylogenic relationships  of shrew populations using genetic...
Investigation of phylogenic relationships of shrew populations using genetic...
 
Investigation of phylogenic relationships of shrew populations using genetic...
Investigation of phylogenic relationships  of shrew populations using genetic...Investigation of phylogenic relationships  of shrew populations using genetic...
Investigation of phylogenic relationships of shrew populations using genetic...
 
genetics lab poster SRC
genetics lab poster SRCgenetics lab poster SRC
genetics lab poster SRC
 
2014 naples
2014 naples2014 naples
2014 naples
 
Marzillier_09052014.pdf
Marzillier_09052014.pdfMarzillier_09052014.pdf
Marzillier_09052014.pdf
 
Human genome project by kk sahu
Human genome project by kk sahuHuman genome project by kk sahu
Human genome project by kk sahu
 
Cot curve analysis for gene and genome complexity
Cot curve analysis for gene and genome complexityCot curve analysis for gene and genome complexity
Cot curve analysis for gene and genome complexity
 
Apollo - A webinar for the Phascolarctos cinereus research community
Apollo - A webinar for the Phascolarctos cinereus research communityApollo - A webinar for the Phascolarctos cinereus research community
Apollo - A webinar for the Phascolarctos cinereus research community
 
Dan Graur - Can the human genome be 100% functional?
Dan Graur - Can the human genome be 100% functional?Dan Graur - Can the human genome be 100% functional?
Dan Graur - Can the human genome be 100% functional?
 
DNA Methylation & C Value.pdf
DNA Methylation & C Value.pdfDNA Methylation & C Value.pdf
DNA Methylation & C Value.pdf
 

More from Paul Gardner

ppgardner-lecture06-homologysearch.pdf
ppgardner-lecture06-homologysearch.pdfppgardner-lecture06-homologysearch.pdf
ppgardner-lecture06-homologysearch.pdf
Paul Gardner
 
ppgardner-lecture05-alignment-comparativegenomics.pdf
ppgardner-lecture05-alignment-comparativegenomics.pdfppgardner-lecture05-alignment-comparativegenomics.pdf
ppgardner-lecture05-alignment-comparativegenomics.pdf
Paul Gardner
 
ppgardner-lecture04-annotation-comparativegenomics.pdf
ppgardner-lecture04-annotation-comparativegenomics.pdfppgardner-lecture04-annotation-comparativegenomics.pdf
ppgardner-lecture04-annotation-comparativegenomics.pdf
Paul Gardner
 
Does RNA avoidance dictate protein expression level?
Does RNA avoidance dictate protein expression level?Does RNA avoidance dictate protein expression level?
Does RNA avoidance dictate protein expression level?
Paul Gardner
 
Machine learning methods
Machine learning methodsMachine learning methods
Machine learning methods
Paul Gardner
 
Clustering
ClusteringClustering
Clustering
Paul Gardner
 
Monte Carlo methods
Monte Carlo methodsMonte Carlo methods
Monte Carlo methods
Paul Gardner
 
The jackknife and bootstrap
The jackknife and bootstrapThe jackknife and bootstrap
The jackknife and bootstrap
Paul Gardner
 
Contingency tables
Contingency tablesContingency tables
Contingency tables
Paul Gardner
 
Regression (II)
Regression (II)Regression (II)
Regression (II)
Paul Gardner
 
Regression (I)
Regression (I)Regression (I)
Regression (I)
Paul Gardner
 
Analysis of covariation and correlation
Analysis of covariation and correlationAnalysis of covariation and correlation
Analysis of covariation and correlation
Paul Gardner
 
Analysis of two samples
Analysis of two samplesAnalysis of two samples
Analysis of two samples
Paul Gardner
 
Analysis of single samples
Analysis of single samplesAnalysis of single samples
Analysis of single samples
Paul Gardner
 
Centrality and spread
Centrality and spreadCentrality and spread
Centrality and spread
Paul Gardner
 
Fundamentals of statistical analysis
Fundamentals of statistical analysisFundamentals of statistical analysis
Fundamentals of statistical analysis
Paul Gardner
 
Random RNA interactions control protein expression in prokaryotes
Random RNA interactions control protein expression in prokaryotesRandom RNA interactions control protein expression in prokaryotes
Random RNA interactions control protein expression in prokaryotes
Paul Gardner
 
Avoidance of stochastic RNA interactions can be harnessed to control protein ...
Avoidance of stochastic RNA interactions can be harnessed to control protein ...Avoidance of stochastic RNA interactions can be harnessed to control protein ...
Avoidance of stochastic RNA interactions can be harnessed to control protein ...
Paul Gardner
 
A meta-analysis of computational biology benchmarks reveals predictors of pro...
A meta-analysis of computational biology benchmarks reveals predictors of pro...A meta-analysis of computational biology benchmarks reveals predictors of pro...
A meta-analysis of computational biology benchmarks reveals predictors of pro...
Paul Gardner
 
01 nc rna-intro
01 nc rna-intro01 nc rna-intro
01 nc rna-intro
Paul Gardner
 

More from Paul Gardner (20)

ppgardner-lecture06-homologysearch.pdf
ppgardner-lecture06-homologysearch.pdfppgardner-lecture06-homologysearch.pdf
ppgardner-lecture06-homologysearch.pdf
 
ppgardner-lecture05-alignment-comparativegenomics.pdf
ppgardner-lecture05-alignment-comparativegenomics.pdfppgardner-lecture05-alignment-comparativegenomics.pdf
ppgardner-lecture05-alignment-comparativegenomics.pdf
 
ppgardner-lecture04-annotation-comparativegenomics.pdf
ppgardner-lecture04-annotation-comparativegenomics.pdfppgardner-lecture04-annotation-comparativegenomics.pdf
ppgardner-lecture04-annotation-comparativegenomics.pdf
 
Does RNA avoidance dictate protein expression level?
Does RNA avoidance dictate protein expression level?Does RNA avoidance dictate protein expression level?
Does RNA avoidance dictate protein expression level?
 
Machine learning methods
Machine learning methodsMachine learning methods
Machine learning methods
 
Clustering
ClusteringClustering
Clustering
 
Monte Carlo methods
Monte Carlo methodsMonte Carlo methods
Monte Carlo methods
 
The jackknife and bootstrap
The jackknife and bootstrapThe jackknife and bootstrap
The jackknife and bootstrap
 
Contingency tables
Contingency tablesContingency tables
Contingency tables
 
Regression (II)
Regression (II)Regression (II)
Regression (II)
 
Regression (I)
Regression (I)Regression (I)
Regression (I)
 
Analysis of covariation and correlation
Analysis of covariation and correlationAnalysis of covariation and correlation
Analysis of covariation and correlation
 
Analysis of two samples
Analysis of two samplesAnalysis of two samples
Analysis of two samples
 
Analysis of single samples
Analysis of single samplesAnalysis of single samples
Analysis of single samples
 
Centrality and spread
Centrality and spreadCentrality and spread
Centrality and spread
 
Fundamentals of statistical analysis
Fundamentals of statistical analysisFundamentals of statistical analysis
Fundamentals of statistical analysis
 
Random RNA interactions control protein expression in prokaryotes
Random RNA interactions control protein expression in prokaryotesRandom RNA interactions control protein expression in prokaryotes
Random RNA interactions control protein expression in prokaryotes
 
Avoidance of stochastic RNA interactions can be harnessed to control protein ...
Avoidance of stochastic RNA interactions can be harnessed to control protein ...Avoidance of stochastic RNA interactions can be harnessed to control protein ...
Avoidance of stochastic RNA interactions can be harnessed to control protein ...
 
A meta-analysis of computational biology benchmarks reveals predictors of pro...
A meta-analysis of computational biology benchmarks reveals predictors of pro...A meta-analysis of computational biology benchmarks reveals predictors of pro...
A meta-analysis of computational biology benchmarks reveals predictors of pro...
 
01 nc rna-intro
01 nc rna-intro01 nc rna-intro
01 nc rna-intro
 

Recently uploaded

Methods of grain storage Structures in India.pdf
Methods of grain storage Structures in India.pdfMethods of grain storage Structures in India.pdf
Methods of grain storage Structures in India.pdf
PirithiRaju
 
Anti-Universe And Emergent Gravity and the Dark Universe
Anti-Universe And Emergent Gravity and the Dark UniverseAnti-Universe And Emergent Gravity and the Dark Universe
Anti-Universe And Emergent Gravity and the Dark Universe
Sérgio Sacani
 
快速办理(UAM毕业证书)马德里自治大学毕业证学位证一模一样
快速办理(UAM毕业证书)马德里自治大学毕业证学位证一模一样快速办理(UAM毕业证书)马德里自治大学毕业证学位证一模一样
快速办理(UAM毕业证书)马德里自治大学毕业证学位证一模一样
hozt8xgk
 
(June 12, 2024) Webinar: Development of PET theranostics targeting the molecu...
(June 12, 2024) Webinar: Development of PET theranostics targeting the molecu...(June 12, 2024) Webinar: Development of PET theranostics targeting the molecu...
(June 12, 2024) Webinar: Development of PET theranostics targeting the molecu...
Scintica Instrumentation
 
11.1 Role of physical biological in deterioration of grains.pdf
11.1 Role of physical biological in deterioration of grains.pdf11.1 Role of physical biological in deterioration of grains.pdf
11.1 Role of physical biological in deterioration of grains.pdf
PirithiRaju
 
HUMAN EYE By-R.M Class 10 phy best digital notes.pdf
HUMAN EYE By-R.M Class 10 phy best digital notes.pdfHUMAN EYE By-R.M Class 10 phy best digital notes.pdf
HUMAN EYE By-R.M Class 10 phy best digital notes.pdf
Ritik83251
 
Flow chart.pdf LIFE SCIENCES CSIR UGC NET CONTENT
Flow chart.pdf  LIFE SCIENCES CSIR UGC NET CONTENTFlow chart.pdf  LIFE SCIENCES CSIR UGC NET CONTENT
Flow chart.pdf LIFE SCIENCES CSIR UGC NET CONTENT
savindersingh16
 
gastroretentive drug delivery system-PPT.pptx
gastroretentive drug delivery system-PPT.pptxgastroretentive drug delivery system-PPT.pptx
gastroretentive drug delivery system-PPT.pptx
Shekar Boddu
 
fermented food science of sauerkraut.pptx
fermented food science of sauerkraut.pptxfermented food science of sauerkraut.pptx
fermented food science of sauerkraut.pptx
ananya23nair
 
Immunotherapy presentation from clinical immunology
Immunotherapy presentation from clinical immunologyImmunotherapy presentation from clinical immunology
Immunotherapy presentation from clinical immunology
VetriVel359477
 
AJAY KUMAR NIET GreNo Guava Project File.pdf
AJAY KUMAR NIET GreNo Guava Project File.pdfAJAY KUMAR NIET GreNo Guava Project File.pdf
AJAY KUMAR NIET GreNo Guava Project File.pdf
AJAY KUMAR
 
Holsinger, Bruce W. - Music, body and desire in medieval culture [2001].pdf
Holsinger, Bruce W. - Music, body and desire in medieval culture [2001].pdfHolsinger, Bruce W. - Music, body and desire in medieval culture [2001].pdf
Holsinger, Bruce W. - Music, body and desire in medieval culture [2001].pdf
frank0071
 
SDSS1335+0728: The awakening of a ∼ 106M⊙ black hole⋆
SDSS1335+0728: The awakening of a ∼ 106M⊙ black hole⋆SDSS1335+0728: The awakening of a ∼ 106M⊙ black hole⋆
SDSS1335+0728: The awakening of a ∼ 106M⊙ black hole⋆
Sérgio Sacani
 
Reaching the age of Adolescence- Class 8
Reaching the age of Adolescence- Class 8Reaching the age of Adolescence- Class 8
Reaching the age of Adolescence- Class 8
abhinayakamasamudram
 
Pests of Storage_Identification_Dr.UPR.pdf
Pests of Storage_Identification_Dr.UPR.pdfPests of Storage_Identification_Dr.UPR.pdf
Pests of Storage_Identification_Dr.UPR.pdf
PirithiRaju
 
Gadgets for management of stored product pests_Dr.UPR.pdf
Gadgets for management of stored product pests_Dr.UPR.pdfGadgets for management of stored product pests_Dr.UPR.pdf
Gadgets for management of stored product pests_Dr.UPR.pdf
PirithiRaju
 
cathode ray oscilloscope and its applications
cathode ray oscilloscope and its applicationscathode ray oscilloscope and its applications
cathode ray oscilloscope and its applications
sandertein
 
MICROBIAL INTERACTION PPT/ MICROBIAL INTERACTION AND THEIR TYPES // PLANT MIC...
MICROBIAL INTERACTION PPT/ MICROBIAL INTERACTION AND THEIR TYPES // PLANT MIC...MICROBIAL INTERACTION PPT/ MICROBIAL INTERACTION AND THEIR TYPES // PLANT MIC...
MICROBIAL INTERACTION PPT/ MICROBIAL INTERACTION AND THEIR TYPES // PLANT MIC...
ABHISHEK SONI NIMT INSTITUTE OF MEDICAL AND PARAMEDCIAL SCIENCES , GOVT PG COLLEGE NOIDA
 
Signatures of wave erosion in Titan’s coasts
Signatures of wave erosion in Titan’s coastsSignatures of wave erosion in Titan’s coasts
Signatures of wave erosion in Titan’s coasts
Sérgio Sacani
 
Mechanisms and Applications of Antiviral Neutralizing Antibodies - Creative B...
Mechanisms and Applications of Antiviral Neutralizing Antibodies - Creative B...Mechanisms and Applications of Antiviral Neutralizing Antibodies - Creative B...
Mechanisms and Applications of Antiviral Neutralizing Antibodies - Creative B...
Creative-Biolabs
 

Recently uploaded (20)

Methods of grain storage Structures in India.pdf
Methods of grain storage Structures in India.pdfMethods of grain storage Structures in India.pdf
Methods of grain storage Structures in India.pdf
 
Anti-Universe And Emergent Gravity and the Dark Universe
Anti-Universe And Emergent Gravity and the Dark UniverseAnti-Universe And Emergent Gravity and the Dark Universe
Anti-Universe And Emergent Gravity and the Dark Universe
 
快速办理(UAM毕业证书)马德里自治大学毕业证学位证一模一样
快速办理(UAM毕业证书)马德里自治大学毕业证学位证一模一样快速办理(UAM毕业证书)马德里自治大学毕业证学位证一模一样
快速办理(UAM毕业证书)马德里自治大学毕业证学位证一模一样
 
(June 12, 2024) Webinar: Development of PET theranostics targeting the molecu...
(June 12, 2024) Webinar: Development of PET theranostics targeting the molecu...(June 12, 2024) Webinar: Development of PET theranostics targeting the molecu...
(June 12, 2024) Webinar: Development of PET theranostics targeting the molecu...
 
11.1 Role of physical biological in deterioration of grains.pdf
11.1 Role of physical biological in deterioration of grains.pdf11.1 Role of physical biological in deterioration of grains.pdf
11.1 Role of physical biological in deterioration of grains.pdf
 
HUMAN EYE By-R.M Class 10 phy best digital notes.pdf
HUMAN EYE By-R.M Class 10 phy best digital notes.pdfHUMAN EYE By-R.M Class 10 phy best digital notes.pdf
HUMAN EYE By-R.M Class 10 phy best digital notes.pdf
 
Flow chart.pdf LIFE SCIENCES CSIR UGC NET CONTENT
Flow chart.pdf  LIFE SCIENCES CSIR UGC NET CONTENTFlow chart.pdf  LIFE SCIENCES CSIR UGC NET CONTENT
Flow chart.pdf LIFE SCIENCES CSIR UGC NET CONTENT
 
gastroretentive drug delivery system-PPT.pptx
gastroretentive drug delivery system-PPT.pptxgastroretentive drug delivery system-PPT.pptx
gastroretentive drug delivery system-PPT.pptx
 
fermented food science of sauerkraut.pptx
fermented food science of sauerkraut.pptxfermented food science of sauerkraut.pptx
fermented food science of sauerkraut.pptx
 
Immunotherapy presentation from clinical immunology
Immunotherapy presentation from clinical immunologyImmunotherapy presentation from clinical immunology
Immunotherapy presentation from clinical immunology
 
AJAY KUMAR NIET GreNo Guava Project File.pdf
AJAY KUMAR NIET GreNo Guava Project File.pdfAJAY KUMAR NIET GreNo Guava Project File.pdf
AJAY KUMAR NIET GreNo Guava Project File.pdf
 
Holsinger, Bruce W. - Music, body and desire in medieval culture [2001].pdf
Holsinger, Bruce W. - Music, body and desire in medieval culture [2001].pdfHolsinger, Bruce W. - Music, body and desire in medieval culture [2001].pdf
Holsinger, Bruce W. - Music, body and desire in medieval culture [2001].pdf
 
SDSS1335+0728: The awakening of a ∼ 106M⊙ black hole⋆
SDSS1335+0728: The awakening of a ∼ 106M⊙ black hole⋆SDSS1335+0728: The awakening of a ∼ 106M⊙ black hole⋆
SDSS1335+0728: The awakening of a ∼ 106M⊙ black hole⋆
 
Reaching the age of Adolescence- Class 8
Reaching the age of Adolescence- Class 8Reaching the age of Adolescence- Class 8
Reaching the age of Adolescence- Class 8
 
Pests of Storage_Identification_Dr.UPR.pdf
Pests of Storage_Identification_Dr.UPR.pdfPests of Storage_Identification_Dr.UPR.pdf
Pests of Storage_Identification_Dr.UPR.pdf
 
Gadgets for management of stored product pests_Dr.UPR.pdf
Gadgets for management of stored product pests_Dr.UPR.pdfGadgets for management of stored product pests_Dr.UPR.pdf
Gadgets for management of stored product pests_Dr.UPR.pdf
 
cathode ray oscilloscope and its applications
cathode ray oscilloscope and its applicationscathode ray oscilloscope and its applications
cathode ray oscilloscope and its applications
 
MICROBIAL INTERACTION PPT/ MICROBIAL INTERACTION AND THEIR TYPES // PLANT MIC...
MICROBIAL INTERACTION PPT/ MICROBIAL INTERACTION AND THEIR TYPES // PLANT MIC...MICROBIAL INTERACTION PPT/ MICROBIAL INTERACTION AND THEIR TYPES // PLANT MIC...
MICROBIAL INTERACTION PPT/ MICROBIAL INTERACTION AND THEIR TYPES // PLANT MIC...
 
Signatures of wave erosion in Titan’s coasts
Signatures of wave erosion in Titan’s coastsSignatures of wave erosion in Titan’s coasts
Signatures of wave erosion in Titan’s coasts
 
Mechanisms and Applications of Antiviral Neutralizing Antibodies - Creative B...
Mechanisms and Applications of Antiviral Neutralizing Antibodies - Creative B...Mechanisms and Applications of Antiviral Neutralizing Antibodies - Creative B...
Mechanisms and Applications of Antiviral Neutralizing Antibodies - Creative B...
 

ppgardner-lecture03-genomesize-complexity.pdf

  • 1. GENE315: Genome size and complexity Paul Gardner March 14, 2023
  • 2. Introduction ▶ I started at Otago in 2018. ▶ Taught bioinformatics, genomics & biostatistics for 6 years at the University of Canterbury. ▶ Worked for ∼ 10 years in Europe. I’ve lived in Bielefeld, Copenhagen and Cambridge. Taught courses in Denmark, Cambridge, Poland, Portugal, Australia, USA, and NZ. ▶ Originally from Whāngārā, married with three children.
  • 3. Overview of lectures 03-08 ▶ 3. Genome size and complexity. ▶ 4. Comparative genomics & genome annotation. ▶ 5. Comparative genomics & sequence alignment. ▶ 6. Comparative genomics & homology search. ▶ 7. Measuring genome function.
  • 4. Objectives for lecture 03 ▶ An appreciation for: ▶ “Junk DNA” & the C-value paradox. ▶ The ENCODE project. ▶ Critical appraisals of genome function.
  • 5. What is a gene? NIH: Wikipedia:
  • 6. Ohno S (1972) So Much “Junk DNA” in our Genome. Brookhaven Symposium on Biology. – available on Blackboard.
  • 7. Summary of Susumu Ohno’s main points about junk DNA ▶ The human genome is roughly 3 × 109 bases, ≈ 750 times E. coli. ▶ Assuming the same gene density as E. coli, humans will have ≈ 3 million genes. ▶ HOWEVER, if gene density is preserved, then salamanders have 10 times the number of genes that we do. ▶ There may be a strict upper limit to the number of gene loci. ▶ Therefore, only a fraction of human DNA is “genic”. ▶ Argues that based upon mutation rates, that the number of genes must be less than 1 mutation rate to avoid meltdown. ▶ and the number of genes is ≈ 30, 000, therefore about 6% of our DNA is genic.1 ▶ Speculates about some hypothetical functions of non-coding DNA, e.g. chromosome structure, promoter/operators, inhibiting non-sense/frameshift mutations, ... ▶ Speculates that our genome is littered with the “fossil” remains of extinct genes.
  • 8. Vertebrate genomes & junk DNA ▶ Vary in length by two orders of magnitude, e.g. bird genomes are ≈ 1Gb, while salamander genomes are ≈ 32Gb (giant lungfish genome is ≈ 43Gb). ▶ Variation largely driven by decaying remnants of transposons. ▶ ≈ 300Mb of sequence is conserved across the vertebrates. ▶ Randomly generated sequences when inserted into genomes are also transcribed, promoters are very degenerate. ▶ I.e. many different sequences can be bound by a transcription factor ▶ Which suggests the number of functional elements should not scale with genome length. Ohno S (1972) So much’junk’DNA in our genome. In Evolution of Genetic Systems, Brookhaven Symp. Biol.
  • 9. Mammalian genome variation & junk DNA ▶ But salamanders are special... (if the genome sizes were longer in the birds than in the salamanders we could rationalise how complex birds are). ▶ Let’s look at just the mammalian extremes of genome size then... the red vizcacha rat, the mountain vizcacha rat and a bat! Which of these mammals is more complex? Southern bent-wing bat Miniopterus orianae bassanii ~1.6 Gb genome Mountain viscacha rat Octomys mimax ~3.2 Gb genome Plains viscacha rat Tympanoctomys barrerae ~6.2 Gb genome Evans et al. (2017) Evolution of the Largest Mammalian Genome. GBE.
  • 10. 30 years later we forgot about Ohno’s line of reasoning... Editorial (2000) The nature of the number. Nature genetics. Willyard (2018) New human gene tally reignites debate. Nature. Hatje et al. (2019) The Protein-Coding Human Genome: Annotating High-Hanging Fruits. BioEssays.
  • 11. Genome sizes & number of genes 1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 A Nucleus Cytoplasm ER Golgi Lysosome Mitochondria B Interactions Genome length ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● Paramecium Arabidopsis Homo Gemmata Escherichia Saccharomyces Lokiarchaeum Mycoplasma Nanoarchaeum Escherichia phage 104 105 106 107 108 109 1010 101 102 103 104 105 ● Animal Fungi Plant Protists Archaea Bacteria Phage Transect Number of protein coding genes ▶ Prediction: “In fact, there seems to be a strict upper limit for the number of genes loci which we can afford to keep in our genome” (Ohno, 1972) ▶ Eukaryotic genomes can vary widely in size, with little change in the number of protein-coding genes. Gardner, Gumy & Fineran (2019) Unpublished.
  • 12. Related to junk DNA, is the “C-value paradox” ▶ C-values (a.k.a. genome size) vary enormously between closely related species. ▶ Vertebrates: birds (1 Gb), bats (1.6 Gb), human (3 Gb), tuatara (5 Gb), salamander (32 Gb), lungfish (130 Gb). ▶ Allium (onions): 7 Mb to 32 Mb. ▶ There is little relationship between the “perceived complexity” (or the corresponding number of genes) of a species and the size of a genome. ▶ Given that C-values are relatively constant within species/cells, yet bears no relationship to the complexity, or presumed number of genes, then this somewhat of a paradox. https://en.wikipedia.org/wiki/C-value#C-value paradox Eddy, SR (2012) The C-value paradox, junk DNA and ENCODE. Current Biology. Palazzo & Gregory et al. (2014) The Case for Junk DNA PLoS Genetics.
  • 13. Drivers of genome size variation C-value variation is largely driven by transposable elements and other repetitive DNA elements https://sandwalk.blogspot.com/2018/03/whats-in-your-genome-pie-chart.html http://www.dfam.org/entry/DF0000001/hits
  • 14. Protein functions, 2023 – still lots of unknowns cytoskeletal protein 3% (627) transporter 5% (1031) scaffold/adaptor protein 4% (746) protein−binding activity modulator 4% (793) RNA metabolism protein 4% (827) gene−specific transcriptional regulator 7% (1516) defense/immunity protein 3% (664) metabolite interconversion enzyme 9% (1939) protein modifying enzyme 8% (1622) transmembrane signal receptor 6% (1151) Unclassified 33% (6695) Other 14% (2980) Unknown 0% (0) 20,851 Human Proteome Functions 2023 Gardner (2023) Data from Panther: http://www.pantherdb.org
  • 15. The ENCODE Project 2012 ▶ 443 authors, from genome/analysis centres around the world ▶ Aim to map functional elements across the human genome ▶ Generated LOTS of important data that we are still using: ▶ Transcription: RNA-seq, CAGE (Cap Analysis Gene Expression), RNA-PET (paired end tag) ▶ Translation: mass spectrometry ▶ Transcription-factor-binding sites: ChIP-seq and DNase-seq ▶ Chromatin structure: DNase-seq, FAIRE-seq, histone ChIP-seq and MNase-seq ▶ DNA methylation sites: RRBS assay ▶ Largely used 3 cell lines: erythroleukaemia K562; B-lymphoblastoid GM12878 & H1 embryonic stem cells The ENCODE Project Consortium (2012) An integrated encyclopedia of DNA elements in the human genome. Nature.
  • 17. ENCODE 2012: LOTS of cool results (30 papers) E.g. transcription is correlated (r = 0.9) with transcription factor binding (left) and cancer associated SNPs are linked to chromatin structure, which may influence gene expression (right).
  • 18. ENCODE 2012 ▶ Acknowledged that “Comparative genomic studies suggest that 3% to 8% of bases are under purifying (negative) selection” ▶ Yet conclude “The vast majority (80.4%) of the human genome participates in at least one biochemical RNA- and/or chromatin-associated event in at least one cell type.” ▶ “Many non-coding variants in individual genome sequences lie in ENCODE-annotated functional regions; this number is at least as large as those that lie in protein-coding genes.” ▶ “Single nucleotide polymorphisms (SNPs) associated with disease by GWAS are enriched within non-coding functional elements”
  • 19. The ENCODE press lead to some “interesting” headlines:
  • 20. And there was a predictable scientific backlash: ▶ Eddy, SR (2012) The C-value paradox, junk DNA and ENCODE. Current Biology. ▶ Q&A style discussion of C-values, junk DNA and ENCODE ▶ Eddy, SR (2013) The ENCODE project: Missteps overshadowing a success. Current Biology. ▶ “all reproducible biochemical events were claimed to be ‘critical’ and ‘needed’.” ▶ “Far from disproving junk DNA, ENCODE’s operationalized definition of function included junk DNA”. ▶ “genomes contain a lot of DNA that is not critically important for host functions, much of which arises from mobile element replication”. ▶ Sean proposes the “The Random Genome Project: the missing negative control”. ▶ “Biology is noisy” therefore an assessment of noise in transcription, translation & DNA binding would allow an assessment of how much of the ENCODE results are noise, and how much is genuinely functional. ▶ Recent progress: Deciphering eukaryotic gene-regulatory logic with 100 million random promoters
  • 21. And further responses ▶ Doolittle WF (2013) Is junk DNA bunk? A critique of ENCODE. PNAS. ▶ If the number of functional elements were to rise significantly with C-value then ▶ (i) organisms with larger genomes are more complex phenotypically ▶ NB. Organism/phenotypic complexity is generally not well defined. ▶ (ii) ENCODE’s definition of a functional element identifies many sites that would not be considered functional or phenotype-determining by standard uses in biology ▶ (iii) the same phenotypic functions are often determined in a more diffuse fashion in larger-genomed organisms ▶ Graur D et al. (2013) On the immortality of television sets:“function” in the human genome according to the evolution-free gospel of ENCODE. Genome biology and evolution. ▶ “a very forcefully worded critique” – W. Ford Doolittle ▶ “angry, dogmatic, scattershot, sometimes inaccurate” – Sean Eddy
  • 22. Note: Transcription & translation are likely to be noisy processes Genome Functional Junk Untranscribed Transcribed Transcribed Untranscribed Untranslated Translated Translated Untranslated Transcriptome Proteome
  • 23. Promoters/Transcription factor binding motifs are degenerate ▶ “Degenerate” motifs refers to the fact that many different sequences can promote transcription ▶ This implies there is lots of potential for random, noisy transcription http://jaspar.genereg.net/
  • 24. How would you determine if a genomic region is “functional”? ▶ Lecture #7
  • 25. Humans are not the most (or least) complex species! Big brains, hands, sure. But we can’t breath underwater water, regrow limbs, don’t have tentacles, can’t fly unassisted, can’t photosynthesise, melt under mild radiation, turn inside out in a vacuum, get cancers - circulatory disease - infectious disease, do stupid things, get too hot and too cold, ... Humans are weak and puny! Great Chain of Being https://en.wikipedia.org/wiki/Category:Obsolete biological theories
  • 26. The main points ▶ C-values (genome sizes) vary dramatically, even between closely related species. ▶ The “C-value paradox” is that genome size is not proportional to the perceived organism “complexity”. ▶ Many “functional” non-coding elements exist in the genome (.e.g ncRNAs, promoters, etc.) ▶ The majority of the eukaryotic genomes is littered with “Junk” DNA. ▶ these are largely pseudogenised transposable elements and endogenous retroviruses ▶ NB. junk ̸= in-active AND non-coding ̸= junk ▶ ENCODE is a valuable resource for identifying “biochemical” activity in cell lines. ▶ Some of the ENCODE conclusions and press are controversial. ▶ Reports of the death of junk DNA are greatly exaggerated.
  • 27. Self-evaluation exercises ▶ Outline the C-value paradox, include in your answer a definition of junk DNA. ▶ What are the main sources of junk DNA? ▶ Describe the ENCODE project and main conclusions. ▶ Outline an approach to distinguish between functional and noisy cellular processes. ▶ The below figures have been used to associate proportions of noncoding DNA with “complexity”. Using your knowledge of the C-value paradox, write a critique of this result. Mattick (2004) The hidden genetic program of complex organisms. Scientific American. Taft et al. (2007) The relationship between non-protein-coding DNA and eukaryotic complexity. BioEssays.
  • 28. Self-evaluation exercises ▶ Use chatGPT to generate an essay on “genome function and junk DNA”. Critique the essay based upon the references presented in this lecture. AI generated art by https://creator.nightcafe.studio/studio, “Junk DNA”, Preset: Surreal
  • 29. Suggested reading ▶ Important: Ohno S (1972) So Much “Junk DNA” in our Genome. Brookhaven Symposium on Biology. ▶ Important: Eddy, SR (2013) The ENCODE project: Missteps overshadowing a success. Current Biology. ▶ Extra: Palazzo & Gregory (2014) The Case for Junk DNA. PLOS Genetics.
  • 30. Questions relating to my lectures can be asked & viewed here: https://docs.google.com/document/d/1PQd dp7C 0cXA8SwUv- qrkTOj8c8fUAt-U Z5dg2yc8/edit?usp=sharing
  • 31.
  • 32. The End Karitane, 23 Oct 2020.