SlideShare a Scribd company logo
GENE315: Measuring genome function
Paul Gardner
March 20, 2023
Objectives for lecture 07
▶ An understanding of:
▶ How “function” in a genomic context can be defined. This is a
surprisingly philosophical problem.
▶ Some strategies for determining if a genomic region is likely to
be functional or not.
OK Go (2010) This Too Shall Pass.
Rube Goldberg machine
What are the “functional” bits in a genome?
HG38 chr11::62,851,834-62,855,980.
A famous region...
What is “functional”?
▶ ENCODE: “biochemically functional” i.e. anything with a
reproducible biochemical activity is functional.
▶ Biological philosophy:
▶ “That A implies B does not entail that B must imply A.”
▶ I.e. Protein & ncRNA genes are transcribed, does not imply
that all transcribed elements are genes (or functional).
▶ “Is the mere fact that X causes Y enough to say that Y is X’s
proper function?”
▶ Doolittle & Brunet describe three posibilities based upon
Darwinian evolution:
▶ Selected effect: a trait that was under positive selection in
previous generations (now maintained by negative selection).
▶ Constructive neutral evolution: negative selection of traits
that were never under positive selection.
▶ Exaptation: traits shaped by natural selection for some use
other than their current one.
▶ I.e. evidence of evolutionary selection is required before
claiming a function.
Doolittle & Brunet (2017) On causal roles and selected effects: our genome is mostly junk. BMC Biology.
What is your theoretical model of the cell?
Image sources: Wikimedia Commons & Prof. David S. Goodsell
What is “functional”?
Doolittle & Brunet (2017) On causal roles and selected effects: our genome is mostly junk. BMC Biology.
What is “functional”? – the Gardner simplification
EVIDENCE FOR MAINTENANCE BY
NEGATIVE SELECTION, OR
CURRENTLY UNDER POSITIVE
SELECTION
Based on Doolittle & Brunet (2017) On causal roles and selected effects: our genome is mostly junk. BMC
Biology.
Some useful definitions
▶ Negative or purifying selection: the selective removal of alleles
that are deleterious to fitness.
▶ Positive selection: Positive selection is the process by which
alleles that increase organismal fitness also increase in
frequency in a population.
Booker et al. (2017) Detecting positive selection in the genome. BMC Biology.
▶ Legend has it that the charismatic inventor of the
“Kahungunu Wave” left many descendants (Ngā Tukemata o
Kahungunu)
Some useful definitions
▶ Constructive neutral evolution: how complex biological
systems might arrive via a series of neutral events.
▶ Almost synonymous with “Negative Selection”, as used by
Doolittle.
▶ Exaptation: a trait (or gene) evolved to serve one particular
function, but subsequently used to serve another.
Lukes̆ et al. (2011) How a Neutral Evolutionary Ratchet Can Build Cellular Complexity. IUBMB Life.
A great example of Exaptation
▶ Xist (X-inactive specific transcript) is a long non-coding RNA
required for X-inactivation in placental mammals
▶ Xist derived/exapted from a protein-coding pseudogene
(Lnx3)
▶ Mutually exclusive phylogenetic pattern with Lnx3, sequence
& synteny preserved and out-of-frame mutations in the
mammals...
Duret et al (2006) The Xist RNA gene evolved in eutherians by pseudogenization of a protein-coding gene.
Science.
Doolittle & Brunet quotes
▶ On accumulating transposable elements: “There is no
selective advantage to individuals within a species for doing
this, and evolution has no foresight [15]!”
▶ “A function concept embedded in one definitional framework
(SE) was (supposedly) refuted by empirical data based on
quite another (CR).”
▶ “it is the irrelevance of the majority of TEs (at least half of
our own DNA) to fitness at the organismal level that means
that “junk” is likely always to be a reasonable way to refer to
it”
▶ Final sentence: “If lungfishes have junk, or at the least
extraordinarily weakly or diffusely functional DNA in their
genomes, why is it that we think we do not?”
Doolittle & Brunet (2017) On causal roles and selected effects: our genome is mostly junk. BMC Biology.
What is “functional”? – a more holistic point of view.
▶ Argue for using combinations of biochemical, evolutionary,
and genetic evidence to elucidate genome function in human
biology and disease.1
Kellis et al. (2014) Defining functional DNA elements in the human genome. PNAS.
1. there are many different levels of evolutionary evidence!
Our attempt...
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
1kGP SNP Count
1kGP Average MAF
gnomAD Average MAF
gnomAD SNP Count
Covariance Min E−value
Secondary Structure MFE
Interaction Energy Avg
Interaction Energy Min
gnomAD SNP Density
Accessibility
Repeat Distance Sum
1kGP SNP Density
Neutral Predictor
Repeat Distance Min
Genomic copies (E−val<0.01)
RNAalifold Score
Fickett score
GERP Score Avg
GC%
Primary Cell RPKM
Covariance Max
RNAcode Coding Potential
Tissue RPKM
Primary Cell MRD
PhastCons Avg
GERP Score Max
Tissue MRD
PhyloP Avg
PhyloP Max
PhastCons Max
0 50 100 150
Mean Decrease in Protein−coding
Gini Coefficient over 100 runs
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
1kGP SNP Count
1kGP Average MAF
Fickett score
gnomAD Average MAF
gnomAD SNP Count
RNAcode Coding Potential
gnomAD SNP Density
Neutral Predictor
RNAalifold Score
Genomic copies (E−val<0.01)
Covariance Max
1kGP SNP Density
Covariance Min E−value
GC%
Repeat Distance Min
Repeat Distance Sum
Interaction Energy Min
PhyloP Max
GERP Score Max
GERP Score Avg
Secondary Structure MFE
Accessibility
Interaction Energy Avg
PhastCons Max
PhyloP Avg
PhastCons Avg
Tissue MRD
Tissue RPKM
Primary Cell RPKM
Primary Cell MRD
0 50 100
Mean Decrease in Short ncRNA
Gini Coefficient over 100 runs
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
1kGP SNP Count
Genomic copies (E−val<0.01)
gnomAD SNP Count
PhastCons Max
RNAalifold Score
Secondary Structure MFE
Accessibility
Interaction Energy Avg
1kGP SNP Density
gnomAD Average MAF
PhyloP Avg
1kGP Average MAF
Interaction Energy Min
PhastCons Avg
Covariance Min E−value
Repeat Distance Min
Neutral Predictor
RNAcode Coding Potential
Repeat Distance Sum
gnomAD SNP Density
Fickett score
PhyloP Max
GERP Score Avg
GERP Score Max
Covariance Max
Primary Cell RPKM
GC%
Tissue RPKM
Tissue MRD
Primary Cell MRD
20 40 60
Mean Decrease in lncRNA
Gini Coefficient over 100 runs
Feature type
●
●
●
●
●
●
Intrinsic sequence
Sequence conservation
Transcriptome expression
Genomic repeat association
Protein/RNA specific features
Population variation
Proteins Short ncRNAs Long ncRNAs
Cooper & Gardner (2021) Features of Functional Human Genes. bioRxiv.
Random sequence experiments identify lots of transcription
(& translation)
Recall Sean Eddy’s “Random Chromosome Project”?
▶ Neme et al (2017) “Random sequences are an abundant
source of bioactive RNAs or peptides”. Nature Ecology &
Evolution.
▶ ... “by expressing clones with random sequences in E. coli” ...
“Contrary to expectations, we find that random sequences
with bioactivity are not rare.”
▶ de Boer et al (2019) “Deciphering eukaryotic gene-regulatory
logic with 100 million random promoters”. Nature
Biotechnology.
▶ ... “we measure the expression output of > 100 million
synthetic yeast promoter sequences that are fully random” ...
“ it is often tacitly assumed that functional TFBSs are rare” ...
“TFBSs are common in random DNA” ... “Abundant weak
regulatory interactions explain most of expression level”
NB. expression from random sequence is also reproducible!
Based upon the Doolittle & Brunet classification...
▶ Are the following elements likely to be a “Selected
Effect”, “Constructive Neutral Evolution”, “Exapted” or
no beneficial effect?
Doolittle & Brunet (2017) On causal roles and selected effects: our genome is mostly junk. BMC Biology.
Complete the following table in small groups...
Ultra-conserved elements (UCRs)
▶ Highly conserved sequence (in mammals)
Bejerano et al. (2004) Ultraconserved Elements in the Human Genome. Science.
Snetkova et al. (2022) Perfect and imperfect views of ultraconserved sequences. Nature Reviews Genetics.
UCNEbase: ultra-conserved non-coding elements database
Human accelerated regions (HARs)
Positive selection since humans diverged from the human and
chimp ancestor.
▶ Whalen & Pollard (2022) Enhancer Function and Evolutionary
Roles of Human Accelerated Regions. Annual Review of
Genetics.
Example: HAR1
▶ Compare genome sequences, find otherwise conserved regions
that are evolving rapidly in humans HAR1 in particular,
appears to be involved in cortical development
▶ Pollard et al. (2006) Forces Shaping the Fastest Evolving
Regions in the Human Genome. PLOS Genetics.
▶ Pollard et al. (2006) An RNA gene expressed during cortical
development evolved rapidly in humans. Nature.
Scale
chr20:
GWAS Catalog
Chimp
Gorilla
Orangutan
Gibbon
Rhesus
Crab-eating_macaque
Baboon
Green_monkey
Marmoset
Bushbaby
Chinese_tree_shrew
Squirrel
Lesser_Egyptian_jerboa
Prairie_vole
Chinese_hamster
Mouse
Rat
Naked_mole-rat
Guinea_pig
Chinchilla
Brush-tailed_rat
Rabbit
Pika
Pig
Alpaca
Bactrian_camel
Killer_whale
Tibetan_antelope
Cow
Sheep
Domestic_goat
White_rhinoceros
Cat
Dog
Ferret_
Panda
Pacific_walrus
Weddell_seal
Black_flying-fox
Megabat
David’s_myotis_(bat)
Little_brown_bat
Big_brown_bat
Hedgehog
Shrew
Elephant
Cape_elephant_shrew
Manatee
Cape_golden_mole
Tenrec
Armadillo
Opossum
Wallaby
Chicken
Common dbSNP(153)
SINE
LINE
LTR
DNA
Simple
Low Complexity
Satellite
RNA
Other
Unknown
100 bases hg38
63,102,050 63,102,100 63,102,150 63,102,200 63,102,250 63,102,300 63,102,350 63,102,400 63,102,450
Your Sequence from Blat Search
GENCODE V39 (4 items filtered out)
Basic Gene Annotation Set from GENCODE Version 39 (Ensembl 105)
Pseudogene Annotation Set from GENCODE Version 39 (Ensembl 105)
NHGRI-EBI Catalog of Published Genome-Wide Association Studies
100 vertebrates Basewise Conservation by PhyloP
Multiz Alignments of 100 Vertebrates
Short Genetic Variants from dbSNP release 153
Repeating Elements by RepeatMasker
HAR1B
HAR1B
HAR1B
ENSG00000274915
HAR1A
HAR1B
HAR1B
HAR1B
ENSG00000274915
HAR1A
Wikipedia article: Human accelerated regions
HAR1 in the UCSC genome browser.
7SL RNA
▶ Also called SRP RNA, part of the signal recognition particle,
involved in protein transport
▶ Universally conserved, orthologues are found in archaea,
bacteria and eukaryotes.
Scale
chr14:
GWAS Catalog
Chimp
Gorilla
Orangutan
Gibbon
Rhesus
Crab-eating_macaque
Baboon
Green_monkey
Marmoset
Bushbaby
Chinese_tree_shrew
Squirrel
Lesser_Egyptian_jerboa
Prairie_vole
Chinese_hamster
Mouse
Rat
Naked_mole-rat
Guinea_pig
Chinchilla
Brush-tailed_rat
Rabbit
Pika
Pig
Alpaca
Bactrian_camel
Killer_whale
Tibetan_antelope
Cow
Sheep
Domestic_goat
White_rhinoceros
Cat
Dog
Ferret_
Panda
Pacific_walrus
Weddell_seal
Black_flying-fox
Megabat
David’s_myotis_(bat)
Little_brown_bat
Big_brown_bat
Hedgehog
Shrew
Elephant
Cape_elephant_shrew
Manatee
Cape_golden_mole
Tenrec
Armadillo
Opossum
Wallaby
Chicken
Common dbSNP(153)
SINE
LINE
LTR
DNA
Simple
Low Complexity
Satellite
RNA
Other
Unknown
200 bases hg38
49,586,450 49,586,500 49,586,550 49,586,600 49,586,650 49,586,700 49,586,750 49,586,800 49,586,850 49,586,900 49,586,950 49,587,000 49,587,050
Basic Gene Annotation Set from GENCODE Version 39 (Ensembl 105)
Pseudogene Annotation Set from GENCODE Version 39 (Ensembl 105)
NHGRI-EBI Catalog of Published Genome-Wide Association Studies
Vertebrate Multiz Alignment & Conservation (100 Species)
Multiz Alignments of 100 Vertebrates
Short Genetic Variants from dbSNP release 153
Repeating Elements by RepeatMasker
RPS29
RPS29
RN7SL1
Cons 100 Verts
Wikipedia article: Signal recognition particle
7SL in the UCSC genome browser.
Alu elements (I)
▶ Transposable element, derived from 7SL RNA. Over one
million copies in the human genome
▶ Breaks a LOT of bioinformatics tools, frequently masked from
genome sequences
Wikipedia article: Alu element
Dfam entry for the AluY family
Alu elements (II)
▶ Is the Alu element on human chromosome 14, between
31,474,114 and 31,473,798 likely to be functional?
Scale
chr14:
GWAS Catalog
Chimp
Gorilla
Orangutan
Gibbon
Rhesus
Crab-eating_macaque
Baboon
Green_monkey
Marmoset
Bushbaby
Chinese_tree_shrew
Squirrel
Lesser_Egyptian_jerboa
Prairie_vole
Chinese_hamster
Mouse
Rat
Naked_mole-rat
Guinea_pig
Chinchilla
Brush-tailed_rat
Rabbit
Pika
Pig
Alpaca
Bactrian_camel
Killer_whale
Tibetan_antelope
Cow
Sheep
Domestic_goat
White_rhinoceros
Cat
Dog
Ferret_
Panda
Pacific_walrus
Weddell_seal
Black_flying-fox
Megabat
David’s_myotis_(bat)
Little_brown_bat
Big_brown_bat
Hedgehog
Shrew
Elephant
Cape_elephant_shrew
Manatee
Cape_golden_mole
Tenrec
Armadillo
Opossum
Wallaby
Chicken
Common dbSNP(153)
SINE
LINE
LTR
DNA
Simple
Low Complexity
Satellite
RNA
Other
Unknown
100 bases hg38
31,473,750 31,473,800 31,473,850 31,473,900 31,473,950 31,474,000 31,474,050 31,474,100 31,474,150
Your Sequence from Blat Search
GENCODE V39
Basic Gene Annotation Set from GENCODE Version 39 (Ensembl 105)
Pseudogene Annotation Set from GENCODE Version 39 (Ensembl 105)
NHGRI-EBI Catalog of Published Genome-Wide Association Studies
100 vertebrates Basewise Conservation by PhyloP
Multiz Alignments of 100 Vertebrates
Short Genetic Variants from dbSNP release 153
Repeating Elements by RepeatMasker
An Alu element in the UCSC genome browser.
Mills et al. (2007) Which transposable elements are active in the human genome?
The main points
▶ “Function” measured in a variety of ways:
▶ Phenotypic: “biochemical” activity
▶ Evolutionary: evidence of positive/negative selection
▶ Genetic: knockout or mutation and compensation/complement
▶ A Darwinian definition of “function”: phenotype/activity
AND evolutionary evidence (SE, CNE or Exaptation)
▶ Evolutionary evidence can be varied: usually based on
sequence conservation or reduced SNPs
▶ Some suggest overlapping evidence of biochemical activity &
evolutionary conservation (i.e. regions under negative
selection).
Self-evaluation exercises
▶ Compare and contrast the ENCODE definition of function to
the Darwinian definition proposed by Doolittle & Brunet
(2017).
▶ Outline a method for determining if a predicted mouse gene is
functional. What are potential pitfalls for the approach?
▶ Where do human-accelarated regions, ultraconserved regions,
Xist, BC1 RNA and Junk DNA lie in Doolittle & Brunet
(2017) classification of functional DNA?
▶ Justify your answers!
Further reading
▶ Doolittle & Brunet (2017) On causal roles and selected
effects: our genome is mostly junk. BMC Biology.
▶ Kellis et al. (2014) Defining functional DNA elements in the
human genome. PNAS.
▶ Neme et al (2017) Random sequences are an abundant source
of bioactive RNAs or peptides. Nature Ecology & Evolution.
Post questions on the FAQ GoogleDoc:
https://docs.google.com/document/d/1PQd dp7C 0cXA8SwUv-
qrkTOj8c8fUAt-U Z5dg2yc8/edit?usp=sharing
Genetics
Mātai Ira
otago.ac.nz/genetics
Genetics
Mātai Ira
Social Mixer
WHEN
Wednesday 29th March 6.00pm to 8pm
WHERE
BIG13, ground floor Biochemistry
Genetics
Mātai Ira
otago.ac.nz/genetics
PIZZA&SOFTDRINKS
PROVIDED!
Near Rakiriri, Otago Harbour, 7 May 2020.

More Related Content

Similar to ppgardner-lecture07-genome-function.pdf

2015 03 13_puurs_v_public
2015 03 13_puurs_v_public2015 03 13_puurs_v_public
2015 03 13_puurs_v_public
Prof. Wim Van Criekinge
 
In-class introduction to basic Punnett square set-up and problem s.docx
In-class introduction to basic Punnett square set-up and problem s.docxIn-class introduction to basic Punnett square set-up and problem s.docx
In-class introduction to basic Punnett square set-up and problem s.docx
bradburgess22840
 
BolingerJustin - Honors Thesis
BolingerJustin - Honors ThesisBolingerJustin - Honors Thesis
BolingerJustin - Honors Thesis
Justin P. Bolinger
 
Genetics research for society and global understanding - Myles Axton
Genetics research for society and global understanding - Myles AxtonGenetics research for society and global understanding - Myles Axton
Genetics research for society and global understanding - Myles Axton
Human Variome Project
 
Biotechnology- Principles and processes investigatory project.
Biotechnology- Principles and processes investigatory project.Biotechnology- Principles and processes investigatory project.
Biotechnology- Principles and processes investigatory project.
Nishant Upadhyay
 
Functional genomics, a conceptual approach
Functional genomics, a conceptual approachFunctional genomics, a conceptual approach
Functional genomics, a conceptual approach
KAUSHAL SAHU
 
Bioinformatics workshop presentation
Bioinformatics   workshop presentationBioinformatics   workshop presentation
Bioinformatics workshop presentation
SKUAST-Kashmir
 
Genetic regulation of human brain aging
Genetic regulation of human brain agingGenetic regulation of human brain aging
Genetic regulation of human brain aging
Alzforum
 
Systems biology in polypharmacology: explaining and predicting drug secondary...
Systems biology in polypharmacology: explaining and predicting drug secondary...Systems biology in polypharmacology: explaining and predicting drug secondary...
Systems biology in polypharmacology: explaining and predicting drug secondary...
Andrei KUCHARAVY
 
PopaPhDMain
PopaPhDMainPopaPhDMain
PopaPhDMain
Alexandra POPA
 
Next Generation Sequencing and its Applications in Medical Research - Frances...
Next Generation Sequencing and its Applications in Medical Research - Frances...Next Generation Sequencing and its Applications in Medical Research - Frances...
Next Generation Sequencing and its Applications in Medical Research - Frances...
Sri Ambati
 
Exploring the role of DNA methylation as a source of phenotypic variation in ...
Exploring the role of DNA methylation as a source of phenotypic variation in ...Exploring the role of DNA methylation as a source of phenotypic variation in ...
Exploring the role of DNA methylation as a source of phenotypic variation in ...
mgavery
 
Why Life is Difficult, and What We MIght Do About It
Why Life is Difficult, and What We MIght Do About ItWhy Life is Difficult, and What We MIght Do About It
Why Life is Difficult, and What We MIght Do About It
Anita de Waard
 
Epigenetics of the Developing BrainFrances A. Cham pagne .docx
Epigenetics of the Developing BrainFrances A. Cham pagne .docxEpigenetics of the Developing BrainFrances A. Cham pagne .docx
Epigenetics of the Developing BrainFrances A. Cham pagne .docx
SALU18
 
generic optimization techniques lecture slides
generic optimization techniques  lecture slidesgeneric optimization techniques  lecture slides
generic optimization techniques lecture slides
SardarHamidullah
 
20150115_JQO_NYAPopulationGenomics
20150115_JQO_NYAPopulationGenomics20150115_JQO_NYAPopulationGenomics
20150115_JQO_NYAPopulationGenomics
Javier Quílez Oliete
 
Network biology: Large-scale integration of data and text
Network biology: Large-scale integration of data and textNetwork biology: Large-scale integration of data and text
Network biology: Large-scale integration of data and text
Lars Juhl Jensen
 
2014 villefranche
2014 villefranche2014 villefranche
2014 villefranche
c.titus.brown
 
Network biology - Large-scale integration of data and text
Network biology - Large-scale integration of data and textNetwork biology - Large-scale integration of data and text
Network biology - Large-scale integration of data and text
Lars Juhl Jensen
 
Genetics pp
Genetics ppGenetics pp
Genetics pp
abonica
 

Similar to ppgardner-lecture07-genome-function.pdf (20)

2015 03 13_puurs_v_public
2015 03 13_puurs_v_public2015 03 13_puurs_v_public
2015 03 13_puurs_v_public
 
In-class introduction to basic Punnett square set-up and problem s.docx
In-class introduction to basic Punnett square set-up and problem s.docxIn-class introduction to basic Punnett square set-up and problem s.docx
In-class introduction to basic Punnett square set-up and problem s.docx
 
BolingerJustin - Honors Thesis
BolingerJustin - Honors ThesisBolingerJustin - Honors Thesis
BolingerJustin - Honors Thesis
 
Genetics research for society and global understanding - Myles Axton
Genetics research for society and global understanding - Myles AxtonGenetics research for society and global understanding - Myles Axton
Genetics research for society and global understanding - Myles Axton
 
Biotechnology- Principles and processes investigatory project.
Biotechnology- Principles and processes investigatory project.Biotechnology- Principles and processes investigatory project.
Biotechnology- Principles and processes investigatory project.
 
Functional genomics, a conceptual approach
Functional genomics, a conceptual approachFunctional genomics, a conceptual approach
Functional genomics, a conceptual approach
 
Bioinformatics workshop presentation
Bioinformatics   workshop presentationBioinformatics   workshop presentation
Bioinformatics workshop presentation
 
Genetic regulation of human brain aging
Genetic regulation of human brain agingGenetic regulation of human brain aging
Genetic regulation of human brain aging
 
Systems biology in polypharmacology: explaining and predicting drug secondary...
Systems biology in polypharmacology: explaining and predicting drug secondary...Systems biology in polypharmacology: explaining and predicting drug secondary...
Systems biology in polypharmacology: explaining and predicting drug secondary...
 
PopaPhDMain
PopaPhDMainPopaPhDMain
PopaPhDMain
 
Next Generation Sequencing and its Applications in Medical Research - Frances...
Next Generation Sequencing and its Applications in Medical Research - Frances...Next Generation Sequencing and its Applications in Medical Research - Frances...
Next Generation Sequencing and its Applications in Medical Research - Frances...
 
Exploring the role of DNA methylation as a source of phenotypic variation in ...
Exploring the role of DNA methylation as a source of phenotypic variation in ...Exploring the role of DNA methylation as a source of phenotypic variation in ...
Exploring the role of DNA methylation as a source of phenotypic variation in ...
 
Why Life is Difficult, and What We MIght Do About It
Why Life is Difficult, and What We MIght Do About ItWhy Life is Difficult, and What We MIght Do About It
Why Life is Difficult, and What We MIght Do About It
 
Epigenetics of the Developing BrainFrances A. Cham pagne .docx
Epigenetics of the Developing BrainFrances A. Cham pagne .docxEpigenetics of the Developing BrainFrances A. Cham pagne .docx
Epigenetics of the Developing BrainFrances A. Cham pagne .docx
 
generic optimization techniques lecture slides
generic optimization techniques  lecture slidesgeneric optimization techniques  lecture slides
generic optimization techniques lecture slides
 
20150115_JQO_NYAPopulationGenomics
20150115_JQO_NYAPopulationGenomics20150115_JQO_NYAPopulationGenomics
20150115_JQO_NYAPopulationGenomics
 
Network biology: Large-scale integration of data and text
Network biology: Large-scale integration of data and textNetwork biology: Large-scale integration of data and text
Network biology: Large-scale integration of data and text
 
2014 villefranche
2014 villefranche2014 villefranche
2014 villefranche
 
Network biology - Large-scale integration of data and text
Network biology - Large-scale integration of data and textNetwork biology - Large-scale integration of data and text
Network biology - Large-scale integration of data and text
 
Genetics pp
Genetics ppGenetics pp
Genetics pp
 

More from Paul Gardner

ppgardner-lecture06-homologysearch.pdf
ppgardner-lecture06-homologysearch.pdfppgardner-lecture06-homologysearch.pdf
ppgardner-lecture06-homologysearch.pdf
Paul Gardner
 
ppgardner-lecture05-alignment-comparativegenomics.pdf
ppgardner-lecture05-alignment-comparativegenomics.pdfppgardner-lecture05-alignment-comparativegenomics.pdf
ppgardner-lecture05-alignment-comparativegenomics.pdf
Paul Gardner
 
ppgardner-lecture04-annotation-comparativegenomics.pdf
ppgardner-lecture04-annotation-comparativegenomics.pdfppgardner-lecture04-annotation-comparativegenomics.pdf
ppgardner-lecture04-annotation-comparativegenomics.pdf
Paul Gardner
 
Does RNA avoidance dictate protein expression level?
Does RNA avoidance dictate protein expression level?Does RNA avoidance dictate protein expression level?
Does RNA avoidance dictate protein expression level?
Paul Gardner
 
Machine learning methods
Machine learning methodsMachine learning methods
Machine learning methods
Paul Gardner
 
Clustering
ClusteringClustering
Clustering
Paul Gardner
 
Monte Carlo methods
Monte Carlo methodsMonte Carlo methods
Monte Carlo methods
Paul Gardner
 
The jackknife and bootstrap
The jackknife and bootstrapThe jackknife and bootstrap
The jackknife and bootstrap
Paul Gardner
 
Contingency tables
Contingency tablesContingency tables
Contingency tables
Paul Gardner
 
Regression (II)
Regression (II)Regression (II)
Regression (II)
Paul Gardner
 
Regression (I)
Regression (I)Regression (I)
Regression (I)
Paul Gardner
 
Analysis of covariation and correlation
Analysis of covariation and correlationAnalysis of covariation and correlation
Analysis of covariation and correlation
Paul Gardner
 
Analysis of two samples
Analysis of two samplesAnalysis of two samples
Analysis of two samples
Paul Gardner
 
Analysis of single samples
Analysis of single samplesAnalysis of single samples
Analysis of single samples
Paul Gardner
 
Centrality and spread
Centrality and spreadCentrality and spread
Centrality and spread
Paul Gardner
 
Fundamentals of statistical analysis
Fundamentals of statistical analysisFundamentals of statistical analysis
Fundamentals of statistical analysis
Paul Gardner
 
Random RNA interactions control protein expression in prokaryotes
Random RNA interactions control protein expression in prokaryotesRandom RNA interactions control protein expression in prokaryotes
Random RNA interactions control protein expression in prokaryotes
Paul Gardner
 
Avoidance of stochastic RNA interactions can be harnessed to control protein ...
Avoidance of stochastic RNA interactions can be harnessed to control protein ...Avoidance of stochastic RNA interactions can be harnessed to control protein ...
Avoidance of stochastic RNA interactions can be harnessed to control protein ...
Paul Gardner
 
A meta-analysis of computational biology benchmarks reveals predictors of pro...
A meta-analysis of computational biology benchmarks reveals predictors of pro...A meta-analysis of computational biology benchmarks reveals predictors of pro...
A meta-analysis of computational biology benchmarks reveals predictors of pro...
Paul Gardner
 
01 nc rna-intro
01 nc rna-intro01 nc rna-intro
01 nc rna-intro
Paul Gardner
 

More from Paul Gardner (20)

ppgardner-lecture06-homologysearch.pdf
ppgardner-lecture06-homologysearch.pdfppgardner-lecture06-homologysearch.pdf
ppgardner-lecture06-homologysearch.pdf
 
ppgardner-lecture05-alignment-comparativegenomics.pdf
ppgardner-lecture05-alignment-comparativegenomics.pdfppgardner-lecture05-alignment-comparativegenomics.pdf
ppgardner-lecture05-alignment-comparativegenomics.pdf
 
ppgardner-lecture04-annotation-comparativegenomics.pdf
ppgardner-lecture04-annotation-comparativegenomics.pdfppgardner-lecture04-annotation-comparativegenomics.pdf
ppgardner-lecture04-annotation-comparativegenomics.pdf
 
Does RNA avoidance dictate protein expression level?
Does RNA avoidance dictate protein expression level?Does RNA avoidance dictate protein expression level?
Does RNA avoidance dictate protein expression level?
 
Machine learning methods
Machine learning methodsMachine learning methods
Machine learning methods
 
Clustering
ClusteringClustering
Clustering
 
Monte Carlo methods
Monte Carlo methodsMonte Carlo methods
Monte Carlo methods
 
The jackknife and bootstrap
The jackknife and bootstrapThe jackknife and bootstrap
The jackknife and bootstrap
 
Contingency tables
Contingency tablesContingency tables
Contingency tables
 
Regression (II)
Regression (II)Regression (II)
Regression (II)
 
Regression (I)
Regression (I)Regression (I)
Regression (I)
 
Analysis of covariation and correlation
Analysis of covariation and correlationAnalysis of covariation and correlation
Analysis of covariation and correlation
 
Analysis of two samples
Analysis of two samplesAnalysis of two samples
Analysis of two samples
 
Analysis of single samples
Analysis of single samplesAnalysis of single samples
Analysis of single samples
 
Centrality and spread
Centrality and spreadCentrality and spread
Centrality and spread
 
Fundamentals of statistical analysis
Fundamentals of statistical analysisFundamentals of statistical analysis
Fundamentals of statistical analysis
 
Random RNA interactions control protein expression in prokaryotes
Random RNA interactions control protein expression in prokaryotesRandom RNA interactions control protein expression in prokaryotes
Random RNA interactions control protein expression in prokaryotes
 
Avoidance of stochastic RNA interactions can be harnessed to control protein ...
Avoidance of stochastic RNA interactions can be harnessed to control protein ...Avoidance of stochastic RNA interactions can be harnessed to control protein ...
Avoidance of stochastic RNA interactions can be harnessed to control protein ...
 
A meta-analysis of computational biology benchmarks reveals predictors of pro...
A meta-analysis of computational biology benchmarks reveals predictors of pro...A meta-analysis of computational biology benchmarks reveals predictors of pro...
A meta-analysis of computational biology benchmarks reveals predictors of pro...
 
01 nc rna-intro
01 nc rna-intro01 nc rna-intro
01 nc rna-intro
 

Recently uploaded

Anti-Universe And Emergent Gravity and the Dark Universe
Anti-Universe And Emergent Gravity and the Dark UniverseAnti-Universe And Emergent Gravity and the Dark Universe
Anti-Universe And Emergent Gravity and the Dark Universe
Sérgio Sacani
 
Candidate young stellar objects in the S-cluster: Kinematic analysis of a sub...
Candidate young stellar objects in the S-cluster: Kinematic analysis of a sub...Candidate young stellar objects in the S-cluster: Kinematic analysis of a sub...
Candidate young stellar objects in the S-cluster: Kinematic analysis of a sub...
Sérgio Sacani
 
seed production, Nursery & Gardening.pdf
seed production, Nursery & Gardening.pdfseed production, Nursery & Gardening.pdf
seed production, Nursery & Gardening.pdf
Nistarini College, Purulia (W.B) India
 
23PH301 - Optics - Unit 1 - Optical Lenses
23PH301 - Optics  -  Unit 1 - Optical Lenses23PH301 - Optics  -  Unit 1 - Optical Lenses
23PH301 - Optics - Unit 1 - Optical Lenses
RDhivya6
 
SDSS1335+0728: The awakening of a ∼ 106M⊙ black hole⋆
SDSS1335+0728: The awakening of a ∼ 106M⊙ black hole⋆SDSS1335+0728: The awakening of a ∼ 106M⊙ black hole⋆
SDSS1335+0728: The awakening of a ∼ 106M⊙ black hole⋆
Sérgio Sacani
 
gastroretentive drug delivery system-PPT.pptx
gastroretentive drug delivery system-PPT.pptxgastroretentive drug delivery system-PPT.pptx
gastroretentive drug delivery system-PPT.pptx
Shekar Boddu
 
11.1 Role of physical biological in deterioration of grains.pdf
11.1 Role of physical biological in deterioration of grains.pdf11.1 Role of physical biological in deterioration of grains.pdf
11.1 Role of physical biological in deterioration of grains.pdf
PirithiRaju
 
Holsinger, Bruce W. - Music, body and desire in medieval culture [2001].pdf
Holsinger, Bruce W. - Music, body and desire in medieval culture [2001].pdfHolsinger, Bruce W. - Music, body and desire in medieval culture [2001].pdf
Holsinger, Bruce W. - Music, body and desire in medieval culture [2001].pdf
frank0071
 
MICROBIAL INTERACTION PPT/ MICROBIAL INTERACTION AND THEIR TYPES // PLANT MIC...
MICROBIAL INTERACTION PPT/ MICROBIAL INTERACTION AND THEIR TYPES // PLANT MIC...MICROBIAL INTERACTION PPT/ MICROBIAL INTERACTION AND THEIR TYPES // PLANT MIC...
MICROBIAL INTERACTION PPT/ MICROBIAL INTERACTION AND THEIR TYPES // PLANT MIC...
ABHISHEK SONI NIMT INSTITUTE OF MEDICAL AND PARAMEDCIAL SCIENCES , GOVT PG COLLEGE NOIDA
 
Evidence of Jet Activity from the Secondary Black Hole in the OJ 287 Binary S...
Evidence of Jet Activity from the Secondary Black Hole in the OJ 287 Binary S...Evidence of Jet Activity from the Secondary Black Hole in the OJ 287 Binary S...
Evidence of Jet Activity from the Secondary Black Hole in the OJ 287 Binary S...
Sérgio Sacani
 
Clinical periodontology and implant dentistry 2003.pdf
Clinical periodontology and implant dentistry 2003.pdfClinical periodontology and implant dentistry 2003.pdf
Clinical periodontology and implant dentistry 2003.pdf
RAYMUNDONAVARROCORON
 
IMPORTANCE OF ALGAE AND ITS BENIFITS.pptx
IMPORTANCE OF ALGAE  AND ITS BENIFITS.pptxIMPORTANCE OF ALGAE  AND ITS BENIFITS.pptx
IMPORTANCE OF ALGAE AND ITS BENIFITS.pptx
OmAle5
 
Post translation modification by Suyash Garg
Post translation modification by Suyash GargPost translation modification by Suyash Garg
Post translation modification by Suyash Garg
suyashempire
 
Embracing Deep Variability For Reproducibility and Replicability
Embracing Deep Variability For Reproducibility and ReplicabilityEmbracing Deep Variability For Reproducibility and Replicability
Embracing Deep Variability For Reproducibility and Replicability
University of Rennes, INSA Rennes, Inria/IRISA, CNRS
 
快速办理(UAM毕业证书)马德里自治大学毕业证学位证一模一样
快速办理(UAM毕业证书)马德里自治大学毕业证学位证一模一样快速办理(UAM毕业证书)马德里自治大学毕业证学位证一模一样
快速办理(UAM毕业证书)马德里自治大学毕业证学位证一模一样
hozt8xgk
 
HUMAN EYE By-R.M Class 10 phy best digital notes.pdf
HUMAN EYE By-R.M Class 10 phy best digital notes.pdfHUMAN EYE By-R.M Class 10 phy best digital notes.pdf
HUMAN EYE By-R.M Class 10 phy best digital notes.pdf
Ritik83251
 
AJAY KUMAR NIET GreNo Guava Project File.pdf
AJAY KUMAR NIET GreNo Guava Project File.pdfAJAY KUMAR NIET GreNo Guava Project File.pdf
AJAY KUMAR NIET GreNo Guava Project File.pdf
AJAY KUMAR
 
Gadgets for management of stored product pests_Dr.UPR.pdf
Gadgets for management of stored product pests_Dr.UPR.pdfGadgets for management of stored product pests_Dr.UPR.pdf
Gadgets for management of stored product pests_Dr.UPR.pdf
PirithiRaju
 
BIOTRANSFORMATION MECHANISM FOR OF STEROID
BIOTRANSFORMATION MECHANISM FOR OF STEROIDBIOTRANSFORMATION MECHANISM FOR OF STEROID
BIOTRANSFORMATION MECHANISM FOR OF STEROID
ShibsekharRoy1
 
Juaristi, Jon. - El canon espanol. El legado de la cultura española a la civi...
Juaristi, Jon. - El canon espanol. El legado de la cultura española a la civi...Juaristi, Jon. - El canon espanol. El legado de la cultura española a la civi...
Juaristi, Jon. - El canon espanol. El legado de la cultura española a la civi...
frank0071
 

Recently uploaded (20)

Anti-Universe And Emergent Gravity and the Dark Universe
Anti-Universe And Emergent Gravity and the Dark UniverseAnti-Universe And Emergent Gravity and the Dark Universe
Anti-Universe And Emergent Gravity and the Dark Universe
 
Candidate young stellar objects in the S-cluster: Kinematic analysis of a sub...
Candidate young stellar objects in the S-cluster: Kinematic analysis of a sub...Candidate young stellar objects in the S-cluster: Kinematic analysis of a sub...
Candidate young stellar objects in the S-cluster: Kinematic analysis of a sub...
 
seed production, Nursery & Gardening.pdf
seed production, Nursery & Gardening.pdfseed production, Nursery & Gardening.pdf
seed production, Nursery & Gardening.pdf
 
23PH301 - Optics - Unit 1 - Optical Lenses
23PH301 - Optics  -  Unit 1 - Optical Lenses23PH301 - Optics  -  Unit 1 - Optical Lenses
23PH301 - Optics - Unit 1 - Optical Lenses
 
SDSS1335+0728: The awakening of a ∼ 106M⊙ black hole⋆
SDSS1335+0728: The awakening of a ∼ 106M⊙ black hole⋆SDSS1335+0728: The awakening of a ∼ 106M⊙ black hole⋆
SDSS1335+0728: The awakening of a ∼ 106M⊙ black hole⋆
 
gastroretentive drug delivery system-PPT.pptx
gastroretentive drug delivery system-PPT.pptxgastroretentive drug delivery system-PPT.pptx
gastroretentive drug delivery system-PPT.pptx
 
11.1 Role of physical biological in deterioration of grains.pdf
11.1 Role of physical biological in deterioration of grains.pdf11.1 Role of physical biological in deterioration of grains.pdf
11.1 Role of physical biological in deterioration of grains.pdf
 
Holsinger, Bruce W. - Music, body and desire in medieval culture [2001].pdf
Holsinger, Bruce W. - Music, body and desire in medieval culture [2001].pdfHolsinger, Bruce W. - Music, body and desire in medieval culture [2001].pdf
Holsinger, Bruce W. - Music, body and desire in medieval culture [2001].pdf
 
MICROBIAL INTERACTION PPT/ MICROBIAL INTERACTION AND THEIR TYPES // PLANT MIC...
MICROBIAL INTERACTION PPT/ MICROBIAL INTERACTION AND THEIR TYPES // PLANT MIC...MICROBIAL INTERACTION PPT/ MICROBIAL INTERACTION AND THEIR TYPES // PLANT MIC...
MICROBIAL INTERACTION PPT/ MICROBIAL INTERACTION AND THEIR TYPES // PLANT MIC...
 
Evidence of Jet Activity from the Secondary Black Hole in the OJ 287 Binary S...
Evidence of Jet Activity from the Secondary Black Hole in the OJ 287 Binary S...Evidence of Jet Activity from the Secondary Black Hole in the OJ 287 Binary S...
Evidence of Jet Activity from the Secondary Black Hole in the OJ 287 Binary S...
 
Clinical periodontology and implant dentistry 2003.pdf
Clinical periodontology and implant dentistry 2003.pdfClinical periodontology and implant dentistry 2003.pdf
Clinical periodontology and implant dentistry 2003.pdf
 
IMPORTANCE OF ALGAE AND ITS BENIFITS.pptx
IMPORTANCE OF ALGAE  AND ITS BENIFITS.pptxIMPORTANCE OF ALGAE  AND ITS BENIFITS.pptx
IMPORTANCE OF ALGAE AND ITS BENIFITS.pptx
 
Post translation modification by Suyash Garg
Post translation modification by Suyash GargPost translation modification by Suyash Garg
Post translation modification by Suyash Garg
 
Embracing Deep Variability For Reproducibility and Replicability
Embracing Deep Variability For Reproducibility and ReplicabilityEmbracing Deep Variability For Reproducibility and Replicability
Embracing Deep Variability For Reproducibility and Replicability
 
快速办理(UAM毕业证书)马德里自治大学毕业证学位证一模一样
快速办理(UAM毕业证书)马德里自治大学毕业证学位证一模一样快速办理(UAM毕业证书)马德里自治大学毕业证学位证一模一样
快速办理(UAM毕业证书)马德里自治大学毕业证学位证一模一样
 
HUMAN EYE By-R.M Class 10 phy best digital notes.pdf
HUMAN EYE By-R.M Class 10 phy best digital notes.pdfHUMAN EYE By-R.M Class 10 phy best digital notes.pdf
HUMAN EYE By-R.M Class 10 phy best digital notes.pdf
 
AJAY KUMAR NIET GreNo Guava Project File.pdf
AJAY KUMAR NIET GreNo Guava Project File.pdfAJAY KUMAR NIET GreNo Guava Project File.pdf
AJAY KUMAR NIET GreNo Guava Project File.pdf
 
Gadgets for management of stored product pests_Dr.UPR.pdf
Gadgets for management of stored product pests_Dr.UPR.pdfGadgets for management of stored product pests_Dr.UPR.pdf
Gadgets for management of stored product pests_Dr.UPR.pdf
 
BIOTRANSFORMATION MECHANISM FOR OF STEROID
BIOTRANSFORMATION MECHANISM FOR OF STEROIDBIOTRANSFORMATION MECHANISM FOR OF STEROID
BIOTRANSFORMATION MECHANISM FOR OF STEROID
 
Juaristi, Jon. - El canon espanol. El legado de la cultura española a la civi...
Juaristi, Jon. - El canon espanol. El legado de la cultura española a la civi...Juaristi, Jon. - El canon espanol. El legado de la cultura española a la civi...
Juaristi, Jon. - El canon espanol. El legado de la cultura española a la civi...
 

ppgardner-lecture07-genome-function.pdf

  • 1. GENE315: Measuring genome function Paul Gardner March 20, 2023
  • 2. Objectives for lecture 07 ▶ An understanding of: ▶ How “function” in a genomic context can be defined. This is a surprisingly philosophical problem. ▶ Some strategies for determining if a genomic region is likely to be functional or not. OK Go (2010) This Too Shall Pass. Rube Goldberg machine
  • 3. What are the “functional” bits in a genome? HG38 chr11::62,851,834-62,855,980. A famous region...
  • 4. What is “functional”? ▶ ENCODE: “biochemically functional” i.e. anything with a reproducible biochemical activity is functional. ▶ Biological philosophy: ▶ “That A implies B does not entail that B must imply A.” ▶ I.e. Protein & ncRNA genes are transcribed, does not imply that all transcribed elements are genes (or functional). ▶ “Is the mere fact that X causes Y enough to say that Y is X’s proper function?” ▶ Doolittle & Brunet describe three posibilities based upon Darwinian evolution: ▶ Selected effect: a trait that was under positive selection in previous generations (now maintained by negative selection). ▶ Constructive neutral evolution: negative selection of traits that were never under positive selection. ▶ Exaptation: traits shaped by natural selection for some use other than their current one. ▶ I.e. evidence of evolutionary selection is required before claiming a function. Doolittle & Brunet (2017) On causal roles and selected effects: our genome is mostly junk. BMC Biology.
  • 5. What is your theoretical model of the cell? Image sources: Wikimedia Commons & Prof. David S. Goodsell
  • 6. What is “functional”? Doolittle & Brunet (2017) On causal roles and selected effects: our genome is mostly junk. BMC Biology.
  • 7. What is “functional”? – the Gardner simplification EVIDENCE FOR MAINTENANCE BY NEGATIVE SELECTION, OR CURRENTLY UNDER POSITIVE SELECTION Based on Doolittle & Brunet (2017) On causal roles and selected effects: our genome is mostly junk. BMC Biology.
  • 8. Some useful definitions ▶ Negative or purifying selection: the selective removal of alleles that are deleterious to fitness. ▶ Positive selection: Positive selection is the process by which alleles that increase organismal fitness also increase in frequency in a population. Booker et al. (2017) Detecting positive selection in the genome. BMC Biology.
  • 9. ▶ Legend has it that the charismatic inventor of the “Kahungunu Wave” left many descendants (Ngā Tukemata o Kahungunu)
  • 10. Some useful definitions ▶ Constructive neutral evolution: how complex biological systems might arrive via a series of neutral events. ▶ Almost synonymous with “Negative Selection”, as used by Doolittle. ▶ Exaptation: a trait (or gene) evolved to serve one particular function, but subsequently used to serve another. Lukes̆ et al. (2011) How a Neutral Evolutionary Ratchet Can Build Cellular Complexity. IUBMB Life.
  • 11. A great example of Exaptation ▶ Xist (X-inactive specific transcript) is a long non-coding RNA required for X-inactivation in placental mammals ▶ Xist derived/exapted from a protein-coding pseudogene (Lnx3) ▶ Mutually exclusive phylogenetic pattern with Lnx3, sequence & synteny preserved and out-of-frame mutations in the mammals... Duret et al (2006) The Xist RNA gene evolved in eutherians by pseudogenization of a protein-coding gene. Science.
  • 12. Doolittle & Brunet quotes ▶ On accumulating transposable elements: “There is no selective advantage to individuals within a species for doing this, and evolution has no foresight [15]!” ▶ “A function concept embedded in one definitional framework (SE) was (supposedly) refuted by empirical data based on quite another (CR).” ▶ “it is the irrelevance of the majority of TEs (at least half of our own DNA) to fitness at the organismal level that means that “junk” is likely always to be a reasonable way to refer to it” ▶ Final sentence: “If lungfishes have junk, or at the least extraordinarily weakly or diffusely functional DNA in their genomes, why is it that we think we do not?” Doolittle & Brunet (2017) On causal roles and selected effects: our genome is mostly junk. BMC Biology.
  • 13. What is “functional”? – a more holistic point of view. ▶ Argue for using combinations of biochemical, evolutionary, and genetic evidence to elucidate genome function in human biology and disease.1 Kellis et al. (2014) Defining functional DNA elements in the human genome. PNAS. 1. there are many different levels of evolutionary evidence!
  • 14. Our attempt... ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 1kGP SNP Count 1kGP Average MAF gnomAD Average MAF gnomAD SNP Count Covariance Min E−value Secondary Structure MFE Interaction Energy Avg Interaction Energy Min gnomAD SNP Density Accessibility Repeat Distance Sum 1kGP SNP Density Neutral Predictor Repeat Distance Min Genomic copies (E−val<0.01) RNAalifold Score Fickett score GERP Score Avg GC% Primary Cell RPKM Covariance Max RNAcode Coding Potential Tissue RPKM Primary Cell MRD PhastCons Avg GERP Score Max Tissue MRD PhyloP Avg PhyloP Max PhastCons Max 0 50 100 150 Mean Decrease in Protein−coding Gini Coefficient over 100 runs ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 1kGP SNP Count 1kGP Average MAF Fickett score gnomAD Average MAF gnomAD SNP Count RNAcode Coding Potential gnomAD SNP Density Neutral Predictor RNAalifold Score Genomic copies (E−val<0.01) Covariance Max 1kGP SNP Density Covariance Min E−value GC% Repeat Distance Min Repeat Distance Sum Interaction Energy Min PhyloP Max GERP Score Max GERP Score Avg Secondary Structure MFE Accessibility Interaction Energy Avg PhastCons Max PhyloP Avg PhastCons Avg Tissue MRD Tissue RPKM Primary Cell RPKM Primary Cell MRD 0 50 100 Mean Decrease in Short ncRNA Gini Coefficient over 100 runs ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 1kGP SNP Count Genomic copies (E−val<0.01) gnomAD SNP Count PhastCons Max RNAalifold Score Secondary Structure MFE Accessibility Interaction Energy Avg 1kGP SNP Density gnomAD Average MAF PhyloP Avg 1kGP Average MAF Interaction Energy Min PhastCons Avg Covariance Min E−value Repeat Distance Min Neutral Predictor RNAcode Coding Potential Repeat Distance Sum gnomAD SNP Density Fickett score PhyloP Max GERP Score Avg GERP Score Max Covariance Max Primary Cell RPKM GC% Tissue RPKM Tissue MRD Primary Cell MRD 20 40 60 Mean Decrease in lncRNA Gini Coefficient over 100 runs Feature type ● ● ● ● ● ● Intrinsic sequence Sequence conservation Transcriptome expression Genomic repeat association Protein/RNA specific features Population variation Proteins Short ncRNAs Long ncRNAs Cooper & Gardner (2021) Features of Functional Human Genes. bioRxiv.
  • 15. Random sequence experiments identify lots of transcription (& translation) Recall Sean Eddy’s “Random Chromosome Project”? ▶ Neme et al (2017) “Random sequences are an abundant source of bioactive RNAs or peptides”. Nature Ecology & Evolution. ▶ ... “by expressing clones with random sequences in E. coli” ... “Contrary to expectations, we find that random sequences with bioactivity are not rare.” ▶ de Boer et al (2019) “Deciphering eukaryotic gene-regulatory logic with 100 million random promoters”. Nature Biotechnology. ▶ ... “we measure the expression output of > 100 million synthetic yeast promoter sequences that are fully random” ... “ it is often tacitly assumed that functional TFBSs are rare” ... “TFBSs are common in random DNA” ... “Abundant weak regulatory interactions explain most of expression level” NB. expression from random sequence is also reproducible!
  • 16. Based upon the Doolittle & Brunet classification... ▶ Are the following elements likely to be a “Selected Effect”, “Constructive Neutral Evolution”, “Exapted” or no beneficial effect? Doolittle & Brunet (2017) On causal roles and selected effects: our genome is mostly junk. BMC Biology.
  • 17. Complete the following table in small groups...
  • 18. Ultra-conserved elements (UCRs) ▶ Highly conserved sequence (in mammals) Bejerano et al. (2004) Ultraconserved Elements in the Human Genome. Science. Snetkova et al. (2022) Perfect and imperfect views of ultraconserved sequences. Nature Reviews Genetics. UCNEbase: ultra-conserved non-coding elements database
  • 19. Human accelerated regions (HARs) Positive selection since humans diverged from the human and chimp ancestor. ▶ Whalen & Pollard (2022) Enhancer Function and Evolutionary Roles of Human Accelerated Regions. Annual Review of Genetics.
  • 20. Example: HAR1 ▶ Compare genome sequences, find otherwise conserved regions that are evolving rapidly in humans HAR1 in particular, appears to be involved in cortical development ▶ Pollard et al. (2006) Forces Shaping the Fastest Evolving Regions in the Human Genome. PLOS Genetics. ▶ Pollard et al. (2006) An RNA gene expressed during cortical development evolved rapidly in humans. Nature. Scale chr20: GWAS Catalog Chimp Gorilla Orangutan Gibbon Rhesus Crab-eating_macaque Baboon Green_monkey Marmoset Bushbaby Chinese_tree_shrew Squirrel Lesser_Egyptian_jerboa Prairie_vole Chinese_hamster Mouse Rat Naked_mole-rat Guinea_pig Chinchilla Brush-tailed_rat Rabbit Pika Pig Alpaca Bactrian_camel Killer_whale Tibetan_antelope Cow Sheep Domestic_goat White_rhinoceros Cat Dog Ferret_ Panda Pacific_walrus Weddell_seal Black_flying-fox Megabat David’s_myotis_(bat) Little_brown_bat Big_brown_bat Hedgehog Shrew Elephant Cape_elephant_shrew Manatee Cape_golden_mole Tenrec Armadillo Opossum Wallaby Chicken Common dbSNP(153) SINE LINE LTR DNA Simple Low Complexity Satellite RNA Other Unknown 100 bases hg38 63,102,050 63,102,100 63,102,150 63,102,200 63,102,250 63,102,300 63,102,350 63,102,400 63,102,450 Your Sequence from Blat Search GENCODE V39 (4 items filtered out) Basic Gene Annotation Set from GENCODE Version 39 (Ensembl 105) Pseudogene Annotation Set from GENCODE Version 39 (Ensembl 105) NHGRI-EBI Catalog of Published Genome-Wide Association Studies 100 vertebrates Basewise Conservation by PhyloP Multiz Alignments of 100 Vertebrates Short Genetic Variants from dbSNP release 153 Repeating Elements by RepeatMasker HAR1B HAR1B HAR1B ENSG00000274915 HAR1A HAR1B HAR1B HAR1B ENSG00000274915 HAR1A Wikipedia article: Human accelerated regions HAR1 in the UCSC genome browser.
  • 21. 7SL RNA ▶ Also called SRP RNA, part of the signal recognition particle, involved in protein transport ▶ Universally conserved, orthologues are found in archaea, bacteria and eukaryotes. Scale chr14: GWAS Catalog Chimp Gorilla Orangutan Gibbon Rhesus Crab-eating_macaque Baboon Green_monkey Marmoset Bushbaby Chinese_tree_shrew Squirrel Lesser_Egyptian_jerboa Prairie_vole Chinese_hamster Mouse Rat Naked_mole-rat Guinea_pig Chinchilla Brush-tailed_rat Rabbit Pika Pig Alpaca Bactrian_camel Killer_whale Tibetan_antelope Cow Sheep Domestic_goat White_rhinoceros Cat Dog Ferret_ Panda Pacific_walrus Weddell_seal Black_flying-fox Megabat David’s_myotis_(bat) Little_brown_bat Big_brown_bat Hedgehog Shrew Elephant Cape_elephant_shrew Manatee Cape_golden_mole Tenrec Armadillo Opossum Wallaby Chicken Common dbSNP(153) SINE LINE LTR DNA Simple Low Complexity Satellite RNA Other Unknown 200 bases hg38 49,586,450 49,586,500 49,586,550 49,586,600 49,586,650 49,586,700 49,586,750 49,586,800 49,586,850 49,586,900 49,586,950 49,587,000 49,587,050 Basic Gene Annotation Set from GENCODE Version 39 (Ensembl 105) Pseudogene Annotation Set from GENCODE Version 39 (Ensembl 105) NHGRI-EBI Catalog of Published Genome-Wide Association Studies Vertebrate Multiz Alignment & Conservation (100 Species) Multiz Alignments of 100 Vertebrates Short Genetic Variants from dbSNP release 153 Repeating Elements by RepeatMasker RPS29 RPS29 RN7SL1 Cons 100 Verts Wikipedia article: Signal recognition particle 7SL in the UCSC genome browser.
  • 22. Alu elements (I) ▶ Transposable element, derived from 7SL RNA. Over one million copies in the human genome ▶ Breaks a LOT of bioinformatics tools, frequently masked from genome sequences Wikipedia article: Alu element Dfam entry for the AluY family
  • 23. Alu elements (II) ▶ Is the Alu element on human chromosome 14, between 31,474,114 and 31,473,798 likely to be functional? Scale chr14: GWAS Catalog Chimp Gorilla Orangutan Gibbon Rhesus Crab-eating_macaque Baboon Green_monkey Marmoset Bushbaby Chinese_tree_shrew Squirrel Lesser_Egyptian_jerboa Prairie_vole Chinese_hamster Mouse Rat Naked_mole-rat Guinea_pig Chinchilla Brush-tailed_rat Rabbit Pika Pig Alpaca Bactrian_camel Killer_whale Tibetan_antelope Cow Sheep Domestic_goat White_rhinoceros Cat Dog Ferret_ Panda Pacific_walrus Weddell_seal Black_flying-fox Megabat David’s_myotis_(bat) Little_brown_bat Big_brown_bat Hedgehog Shrew Elephant Cape_elephant_shrew Manatee Cape_golden_mole Tenrec Armadillo Opossum Wallaby Chicken Common dbSNP(153) SINE LINE LTR DNA Simple Low Complexity Satellite RNA Other Unknown 100 bases hg38 31,473,750 31,473,800 31,473,850 31,473,900 31,473,950 31,474,000 31,474,050 31,474,100 31,474,150 Your Sequence from Blat Search GENCODE V39 Basic Gene Annotation Set from GENCODE Version 39 (Ensembl 105) Pseudogene Annotation Set from GENCODE Version 39 (Ensembl 105) NHGRI-EBI Catalog of Published Genome-Wide Association Studies 100 vertebrates Basewise Conservation by PhyloP Multiz Alignments of 100 Vertebrates Short Genetic Variants from dbSNP release 153 Repeating Elements by RepeatMasker An Alu element in the UCSC genome browser. Mills et al. (2007) Which transposable elements are active in the human genome?
  • 24. The main points ▶ “Function” measured in a variety of ways: ▶ Phenotypic: “biochemical” activity ▶ Evolutionary: evidence of positive/negative selection ▶ Genetic: knockout or mutation and compensation/complement ▶ A Darwinian definition of “function”: phenotype/activity AND evolutionary evidence (SE, CNE or Exaptation) ▶ Evolutionary evidence can be varied: usually based on sequence conservation or reduced SNPs ▶ Some suggest overlapping evidence of biochemical activity & evolutionary conservation (i.e. regions under negative selection).
  • 25. Self-evaluation exercises ▶ Compare and contrast the ENCODE definition of function to the Darwinian definition proposed by Doolittle & Brunet (2017). ▶ Outline a method for determining if a predicted mouse gene is functional. What are potential pitfalls for the approach? ▶ Where do human-accelarated regions, ultraconserved regions, Xist, BC1 RNA and Junk DNA lie in Doolittle & Brunet (2017) classification of functional DNA? ▶ Justify your answers!
  • 26. Further reading ▶ Doolittle & Brunet (2017) On causal roles and selected effects: our genome is mostly junk. BMC Biology. ▶ Kellis et al. (2014) Defining functional DNA elements in the human genome. PNAS. ▶ Neme et al (2017) Random sequences are an abundant source of bioactive RNAs or peptides. Nature Ecology & Evolution. Post questions on the FAQ GoogleDoc: https://docs.google.com/document/d/1PQd dp7C 0cXA8SwUv- qrkTOj8c8fUAt-U Z5dg2yc8/edit?usp=sharing
  • 27. Genetics Mātai Ira otago.ac.nz/genetics Genetics Mātai Ira Social Mixer WHEN Wednesday 29th March 6.00pm to 8pm WHERE BIG13, ground floor Biochemistry Genetics Mātai Ira otago.ac.nz/genetics PIZZA&SOFTDRINKS PROVIDED!
  • 28. Near Rakiriri, Otago Harbour, 7 May 2020.