Random RNA interactions control protein expression in
prokaryotes
Paul Gardner
University of Canterbury
Christchurch
New Zealand
Feel free to share what you hear
These slides are available at: http://www.slideshare.net/ppgardne/presentations
The hard work of Sinan Umu, Ant Poole & Ren Dobson
mRNA levels are imperfectly correlated with protein levels
Lu et al. (2007) Nature biotechnology.
Determinants of protein concentration
Protein concentration depends on mRNA concentration, translation and
degradation rates
[Figure: DNA [D] → RNA [R] → Protein [P], with rate constants k_transcription, k_translation, k_mRNA degradation and k_protein degradation.]
de Sousa Abreu, Penalva, Marcotte & Vogel (2009) Global signatures of protein and mRNA expression levels. Molecular
BioSystems.
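The dependence of protein concentration on the four rate constants can be sketched numerically. The sketch below assumes simple first-order kinetics (d[R]/dt = k_transcription·[D] − k_mRNA degradation·[R], d[P]/dt = k_translation·[R] − k_protein degradation·[P]); this model form, the function name and the example values are illustrative assumptions, not taken from the slides.

```python
def steady_state(k_transcription, k_translation, k_mrna_deg, k_protein_deg, dna=1.0):
    """Steady-state [R] and [P] under an assumed first-order model.

    Setting d[R]/dt = 0 and d[P]/dt = 0 gives:
      [R] = k_transcription * [D] / k_mrna_deg
      [P] = k_translation * [R] / k_protein_deg
    """
    mrna = k_transcription * dna / k_mrna_deg
    protein = k_translation * mrna / k_protein_deg
    return mrna, protein


if __name__ == "__main__":
    # Hypothetical rate constants, in arbitrary units.
    mrna, protein = steady_state(2.0, 10.0, 0.5, 0.1)
    print(f"[R] = {mrna:.1f}, [P] = {protein:.1f}, [P]/[R] = {protein / mrna:.1f}")
```

Under this assumed model the [P]/[R] ratio depends only on k_translation / k_protein degradation, so two genes with equal mRNA levels can still differ widely in protein level.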
Two general models describe variation in translation rate
1. Codon usage (Ikemura, 1981)
Figure from: Tuller & Zur (2015) Nucl. Acids Res.
Two general models describe variation in translation rate
2. mRNA structure (Pelletier & Sonenberg, 1987)
Figure from: Tuller & Zur (2015) Nucl. Acids Res.
We think we have a third general model...
http://dx.doi.org/10.7554/eLife.13479
http://dx.doi.org/10.7554/eLife.20686
Non-coding RNAs are abundant
[Figure: log10(mean read depth), scale 0 to 5, for core ncRNA genes and core protein-coding genes.]
Lindgreen, Umu et al. (2014) PLOS Computational Biology.
Bacterial non-coding RNA function
[Figure: four mechanisms of bacterial sRNA function: sequestration of the ribosome binding site, induction of mRNA decay, anti-antisense mechanism, and selective mRNA stabilisation. Labels: Hfq, sRNA, ribosome, AUG, SD, RNase E recruitment. SD = Shine-Dalgarno sequence.]
Figure by Bethany Jose
Checking for mRNA:ncRNA interactions
Looking for regulatory interactions: these are specific and few in
number, whereas off-target interactions are non-specific and numerous
Compare 5′ ends of CDS & ncRNAs
Looking for a bump on the left...
[Figure: density of binding energy (kcal/mol); x-axis −15 to 0, y-axis 0.00 to 0.25.]
Checking for mRNA:ncRNA interactions
[Figure: density of binding energy (kcal/mol), x-axis −15 to 0, for native and shuffled interactions.]
Shuffled (P = 7.69 × 10^−52)
Checking negative controls!
[Figure: density of binding energy (kcal/mol), x-axis −15 to 0, for native interactions and each negative control.]
Shuffled (P = 7.69 × 10^−52)
Different phylum (P = 0)
Downstream (P = 2.66 × 10^−124)
Rev. complement (P = 6.51 × 10^−57)
Intergenic (P = 6.16 × 10^−93)
Do ubiquitous and abundant RNAs influence translation?
Given that ncRNAs are among the most abundant RNAs in the cell
([ncRNA] >> [mRNA])
AND that RNAs frequently hybridise
THEN maybe stochastic interactions with mRNAs inhibit translation
Corley & Laederach (2016) Bioinformatics: Selecting against accidental RNA interactions. eLife.
How can this hypothesis be tested?
We predict that:
1. There is selection against mRNA:ncRNA interactions
2. That stochastic mRNA:ncRNA interactions influence [protein]:[mRNA]
ratios
For consistency: focus on 6 ncRNA families & 114 mRNAs/proteins
that are highly conserved & expressed; And first 21 nts of CDS.
Tested 1,582 bacterial & 118 archaeal genomes
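The native-versus-shuffled comparison used as a control in this talk can be sketched as follows. The slides report binding energies in kcal/mol; the scoring function here is only a toy stand-in (the longest ungapped run of complementary bases), and all function names are hypothetical. A real analysis would substitute a proper RNA-RNA interaction energy calculation.

```python
import random

# Watson-Crick and G-U wobble pairs.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def toy_pairing_score(mrna, ncrna):
    """Toy proxy for binding strength: longest ungapped antiparallel
    run of complementary bases. NOT a free-energy calculation."""
    rev = ncrna[::-1]  # antiparallel orientation
    best = 0
    for offset in range(-len(rev) + 1, len(mrna)):
        run = 0
        for i in range(len(mrna)):
            j = i - offset
            if 0 <= j < len(rev) and (mrna[i], rev[j]) in PAIRS:
                run += 1
                best = max(best, run)
            else:
                run = 0
    return best


def shuffled(seq, rng):
    """Mononucleotide shuffle: same composition, random order."""
    bases = list(seq)
    rng.shuffle(bases)
    return "".join(bases)


def native_vs_shuffled(mrna, ncrnas, n_shuffles=100, seed=0):
    """Return (native scores, shuffled-control scores) for one mRNA region."""
    rng = random.Random(seed)
    native = [toy_pairing_score(mrna, nc) for nc in ncrnas]
    control = [
        toy_pairing_score(shuffled(mrna, rng), nc)
        for nc in ncrnas
        for _ in range(n_shuffles)
    ]
    return native, control
```

If interactions are selected against, native scores should be shifted towards weaker binding than the shuffled controls; the two distributions would then be compared with a suitable two-sample test.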
Are mRNA:ncRNA interactions selected against?
[Figure: density difference (−0.010 to 0.015) against binding energy (kcal/mol, −15 to 0), annotated "Native interactions", "Shuffled interactions" and "More stable interactions"; inset bar chart of −log10 P (0 to 40) for Act, Bac, Chl, Cya, Fir, Pro, Spi and Arc.]
Actinobacteria (n = 163): P = 9.8 × 10^−69
Bacteroidetes (n = 60): P = 8.7 × 10^−148
Chlamydiae (n = 38): P = 1.4 × 10^−193
Cyanobacteria (n = 40): P = 3.8 × 10^−11
Firmicutes (n = 378): P = 0
Proteobacteria (n = 756): P = 0
Spirochaetes (n = 38): P = 1.6 × 10^−98
Archaea (n = 118): P = 4.2 × 10^−177
Background (n = 100)
Do mRNA:ncRNA interactions influence protein
expression?
[Figure: log10(fluorescence) against avoidance (kcal/mol); Rs = 0.65.]
Expression data from: Kudla et al. (2009) Science.
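The Rs values quoted in this talk are Spearman rank correlations. As a reminder of what that statistic measures, here is a minimal pure-Python sketch: rank both variables (ties share the mean rank), then take the Pearson correlation of the ranks. The function names are illustrative; in practice a statistics library would normally be used.

```python
def ranks(values):
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the block of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))
```

Because only ranks are used, the statistic is insensitive to the log transformation of fluorescence and to any other monotonic rescaling of either axis.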
Do mRNA:ncRNA interactions influence protein
expression?
Testing the relationship between protein abundance estimates and
avoidance, mRNA secondary structure, codon usage and mRNA
abundance
GFP datasets:
E. coli (n=52) GFP/qPCR
E. coli (n=154) GFP/Northern
E. coli (n=14,234) mCherry/RNAseq
Mass-Spec datasets:
E. coli (n=389) MS/microarray
E. coli (n=3,301) MS/microarray
P. aeruginosa (n=5,479) MS/microarray
P. aeruginosa (n=1,148) MS/microarray
[Figure: correlation coefficient (−0.2 to 0.6) for each dataset, for avoidance, secondary structure, codon usage and [mRNA]; * P < 0.05.]
Testing the extremes of expression
[Figure A: histogram of log10([Protein]/[mRNA]) for E. coli genes (n = 389); x-axis 0.1 to 4.8, frequency 0 to 120, with low expression (n=10) and high expression (n=10) groups marked.]
[Figure B: Z-scores (−2 to 2) for avoidance, codon usage and secondary structure, with a null, in the low expression (n=10) and high expression (n=10) groups; asterisks mark two comparisons.]
Designing mRNAs
239 aa GFP can be encoded by 7.62 × 10^111 synonymous mRNAs
Extremes of avoidance have a stronger effect than codon usage or
secondary structure
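The size of the synonymous design space comes from multiplying the number of codons available for each amino acid. A minimal sketch of that arithmetic, assuming the standard genetic code, is given below. The function names are illustrative, and the exact figure quoted on the slide depends on the GFP sequence and on how start and stop codons are counted, neither of which is specified here.

```python
import math

# Number of codons per amino acid in the standard genetic code.
DEGENERACY = {
    "A": 4, "R": 6, "N": 2, "D": 2, "C": 2, "Q": 2, "E": 2, "G": 4,
    "H": 2, "I": 3, "L": 6, "K": 2, "M": 1, "F": 2, "P": 4, "S": 6,
    "T": 4, "W": 1, "Y": 2, "V": 4,
}


def count_synonymous_mrnas(protein):
    """Number of distinct coding sequences for a protein (stop codon excluded)."""
    return math.prod(DEGENERACY[aa] for aa in protein)


def log10_synonymous_mrnas(protein):
    """log10 of the count, convenient for long proteins."""
    return sum(math.log10(DEGENERACY[aa]) for aa in protein)


if __name__ == "__main__":
    peptide = "MSKGEELFTG"  # hypothetical short peptide, for illustration only
    print(count_synonymous_mrnas(peptide))
    print(round(log10_synonymous_mrnas(peptide), 2))
```

The count grows exponentially with protein length, which is why candidate mRNAs have to be sampled or optimised rather than enumerated.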
[Figure: log10(fluorescence, 4.2 to 4.7) against CAI (Rs = 0.29), folding energy in kcal/mol (Rs = 0.34) and binding energy in kcal/mol (Rs = 0.56); legend: hi, low; Avoid, Fold, Codon, Optimal.]
Avoidance in 3D on the ribosome
Protein binds to regions with low avoidance (green) while exposed
regions are high avoidance (blue): P = 9.3 × 10^−15, Fisher's exact test
Further Work
Testing adaptation with experimental evolution experiments
Do mRNA:ncRNA interactions influence eukaryotic gene expression?
Number of possible interactions increases quadratically with number of
genes. May require spatial & temporal separation of genes
Does avoidance drive compartmentalisation and increases in nucleotide
binding proteins?
Do mRNA:ncRNA interactions influence viral infection, hybridisation,
HGT & transformation experiments?
Are protein, DNA and protein:nucleotide interactions also avoided?
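The quadratic growth in possible interactions is simple counting, sketched below. The function names are illustrative, and treating every pair of RNAs as a potential interaction is a simplifying assumption.

```python
def possible_pairs(n_rnas):
    """Unordered pairs among n RNAs: n(n - 1)/2."""
    return n_rnas * (n_rnas - 1) // 2


def possible_mrna_ncrna_pairs(n_mrnas, n_ncrnas):
    """mRNA:ncRNA pairs only: one of each kind per pair."""
    return n_mrnas * n_ncrnas


if __name__ == "__main__":
    for n in (2_000, 20_000):
        print(n, possible_pairs(n))
    # The 114 mRNAs and 6 ncRNA families used in this study:
    print(possible_mrna_ncrna_pairs(114, 6))
```

A tenfold increase in gene number gives roughly a hundredfold increase in possible pairs, which is the motivation for asking whether larger genomes need spatial and temporal separation.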
And now for something completely different...
Bioinformaticians are horrible!
Bioinformaticians are bad, impatient & intolerant
Build a phylogenetic tree: which of the 172 methods do you use?
Bayesian: MBIORE, ANC-GENE, BAli-Phy, BAMBE, BayesPhylogenies, BEAST, BEST, Bio++, bms_runner, burntrees, Cadence, Crux, IMa2, Mesquite, MrBayes, MrBayesPlugin, MrBayes-tree-scanners, Multidivtime, p4, SIMMAP, PAL, tracer, PAML, Vanilla, PHASE, PHYLLAB, PhyloBayes
Parsimony: ARB, Bionumerics, BIRCH, Bosque, BPAnalysis, CAFCA, CRANN, DAMBE, EMBOSS, TNT, FootPrinter, Freqpars, Gambit, GAPars, GelCompar-II, GeneTree, gmaes, Hennig86, IDEA, LVB, MALIGN, MEGA, Mesquite, Murka, Network, NimbleTree, NONA, Notung, Parsimov, PAST, PAUP*, PAUPRat, PaupUp, phangorn, PHYLIP, PhyloNet, Phylo_win, POY, PRAP, PSODA, RA, SeaView, SeqState, Simplot, sog, TCS
Maximum Likelihood: ALIFRITZ, aLRT, ARB, Bio++, Bionumerics, BIRCH, BootPHYML, Bosque, CodeAxe, CoMET, Concaterpillar, CONSEL, Crux, DAMBE, DART, Darwin, dnarates, DPRML, DT-ModSel, EMBOSS, EREM, fastDNAml, fastDNAmlRev, FASTML, FastTree, GARLI, GZ-Gamma, HY-PHY, IQPNNI, Kakusan4, Leaphy, Mac5, McRate, Mesquite, MetaPIGA, MixtureTree, Modelfit, ModelGenerator, MOLPHY, MrAIC, MrModeltest, MrMTgui, MultiPhyl, NEPAL, NHML, nhPhyML, NimbleTree, p4, PAL, PAML, PARAT, PARBOOT, PASSML, PAUP*, PAUPRat, PaupUp, phangorn, PHYLLAB, PhyloCoCo, Phylo_win, PHYML, PhyML-Multi, PhyNav, PHYSIG, PLATO, Porn*, PRAP, PROCOV, ProtTest, PTP, r8s-bootstrap, Rate4Site, Rate-evolution, RAxML, raxmlGUI, RevDNArates, rRNA-phylogeny, SeaView, Segminator, SEMPHY, SeqPup, SeqState, SIMMAP, Simplot, SLR, Spectronet, Spectrum, SplitsTree, SSA, TipDate, Treefinder, TREE-PUZZLE, Vanilla
How can we choose software?
Which methods do you use?
Approach software like a scientist
Are any good controls available?
Positive: databases, publications,
simulation, ...
Negative: randomized, select
relevant negative data, ...
Some common accuracy metrics:
Sensitivity (true positive rate)
Specificity (true negative rate)
Matthews correlation coefficient
Area under an ROC curve
[Figure: ROC curves (true positive rate against false positive rate, both 0.0 to 1.0) for DBS (Pfam), DBS (Treefam), DBS (Custom), PROVEAN, Polyphen-2, SIFT, FATHMM (weighted) and FATHMM (unweighted).]
Wheeler et al. (2016) A profile-based method for
identifying functional divergence of orthologous genes
in bacterial genomes. Bioinformatics.
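The accuracy metrics listed on this slide can all be computed from positive and negative control sets. The sketch below uses the standard definitions; the function names and example counts are illustrative, and the AUC is computed by the pairwise-comparison (rank) formulation rather than by integrating a curve.

```python
import math


def sensitivity(tp, fn):
    """True positive rate: fraction of positives that are recovered."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """True negative rate: fraction of negatives that are rejected."""
    return tn / (tn + fp)


def matthews_cc(tp, tn, fp, fn):
    """Matthews correlation coefficient; returns 0.0 when undefined."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def roc_auc(pos_scores, neg_scores):
    """Area under the ROC curve: probability that a random positive
    scores higher than a random negative (ties count as half)."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


if __name__ == "__main__":
    # Hypothetical counts from a benchmark with 10 positives and 10 negatives.
    print(sensitivity(tp=8, fn=2), specificity(tn=9, fp=1))
    print(round(matthews_cc(tp=8, tn=9, fp=1, fn=2), 3))
```

Sensitivity and specificity depend on a score threshold, whereas the AUC summarises performance across all thresholds, which is why ROC curves are useful when comparing tools with different default cut-offs.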
Benchmarks are useful, and fun...
Is there really a relationship between software speed &
accuracy?
Can we run a meta-analysis of bioinformatic benchmarks?
If speed isn’t related to accuracy, then what is?
Some possibilities:
Software age
Journal “impact” (IF & GoogleScholar H5)
Number of citations
Corresponding author’s H-index & M-index
After some literature mining...
found 43 matching articles.
102 benchmarks
Accuracy & speed ranks for 243 bioinformatic software tools
Manually extracted IF, H, age, ...
65 journals (Bioinformatics, NAR, Genome Research, ...)
151 author GoogleScholar profiles
abyss antepiseeker apg barry bellerophontes bfast bismark biss boost bowtie bowtie2 bowtiestar bratbw bsmap
bsmooth bsseeker buckycon buckymrbayes buckymrbayesspa buckypop buckyraxml builder bwa bwasw caml camp carma
ce celera clark clc clustalomega clustalw comus coprarna coral cosine crisp cro cromwell cufflinks cwt dali
de dexseq dialign dialign22 dialignt dialigntx diffsplice diginormvelvet dima djigsaw downhillsimplex dsgseq
ebi echo edenanonstrict edenastrict edit epimode ericscript erpin fa fasta fasttree fisherexacttest
fusioncatcher fusionmap gassst gatk genometa gojobori goldman gossamer gottcha greedyft gsnap heidge hitec
hmmer hshrec idbaud igtpduplossft inchworm infernal intarna jaffa kalign kbsps kraken kthse leidnl limpic
lmat lms lofreq lsqman mafft mafftfftns mafftfftns2 mafftlinsi mapsplice maq mats megan metaphlan metaphyler
methylkit methylsig mgrast minia mira mirdeep mireap mirena mirexpress mlclustalw mlclustalwquicktree mlmafft
mlmafftparttree mlmuscle mlopal mlprankgt modellerv mosaik motu mpest mpjclustalw mpsclustalw mrfast mrpml
mrpmp mrsfast msinspect multalin muscle musclemaxiters mzmine nbc ncbiblast nest newbler nfuse novoalign
oases onecodex openms pairfold paralign pass perm phylonetft phylopythias phymmbl piler poa poy poystar
pragcz probalign probcons probtree process pso pt qiime qsra quake raiphy ravenna raxml raxmllimited
rdiffparam repeatfinder repeatgluer repeatscout reptile rmap rnacofold rnaduplex rnahybrid rnaplex rnaup
rsearch rsmatch sam sate scro scwrl scwrlcons segemehl segmodencad seqgsea seqman seqmap sga sharcgs shrimp
simulatedannealing sl smalt snap snpruler snver soap soap2 soapdenovo soapec soapstar spades sparse
sparseassembler spcomp specarray spt srmapper ssaha ssake ssap ssearch ssm sst st starbeast strcutal
swissmodel taipan targetrna targetrna2 taxatortk tcoffee team tmap tophatfusion transabyss trinity upmes
varscan vcake velvet wmrpmp woodhams wublast xalign xcmswithcorrection xcmswithoutretentiontime zema
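Because each benchmark compares a different number of tools with a different metric, results have to be put on a common scale before they can be pooled. One way to do this, sketched below, is to convert each benchmark to ranks scaled to the range 0 (best) to 1 (worst) and average per tool. The scaling, the function names and the tool names in the example are assumptions made for illustration; the slides only state that accuracy and speed ranks were collected.

```python
from collections import defaultdict


def normalised_ranks(scores, higher_is_better=True):
    """Map each tool to a rank scaled to [0, 1]; tied tools share the mean."""
    tools = sorted(scores, key=lambda t: scores[t], reverse=higher_is_better)
    n = len(tools)
    out = {}
    i = 0
    while i < n:
        j = i
        # Extend j over the block of tools tied with the tool at position i.
        while j + 1 < n and scores[tools[j + 1]] == scores[tools[i]]:
            j += 1
        mean_pos = (i + j) / 2  # zero-based position shared by the tied block
        for tool in tools[i:j + 1]:
            out[tool] = mean_pos / (n - 1) if n > 1 else 0.0
        i = j + 1
    return out


def mean_normalised_rank(benchmarks, higher_is_better=True):
    """Average each tool's normalised rank over the benchmarks it appears in."""
    collected = defaultdict(list)
    for scores in benchmarks:
        for tool, rank in normalised_ranks(scores, higher_is_better).items():
            collected[tool].append(rank)
    return {tool: sum(r) / len(r) for tool, r in collected.items()}


if __name__ == "__main__":
    # Two hypothetical accuracy benchmarks with made-up tools and scores.
    benchmarks = [
        {"tool_a": 0.9, "tool_b": 0.5},
        {"tool_a": 0.1, "tool_b": 0.2, "tool_c": 0.3},
    ]
    print(mean_normalised_rank(benchmarks))
```

The same function can be applied to run times by passing higher_is_better=False, giving a speed rank on the same scale.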
Nothing is correlated with accuracy!
[Figure A: matrix of pairwise Spearman's rho (scale −1 to 1) among Rel. age, Year, Accuracy, Speed, JH5, JIF, Cites, Rel. cites, H-index and M-index, with some cells marked X; bar chart of correlates with accuracy rank (Spearman's rho, −0.2 to 0.2).]
[Figure B: speed against accuracy, with Z-scores (−3 to 3) and frequency histograms.]
Conclusions
Speed is NOT reflective of accuracy
Neither are author/journal reputation, software age or number of citations
The only reasonable way to select software is by benchmarking
Publication bias is influencing software accuracy
It doesn’t matter how famous you are, you can still write great software!
Thanks!
Avoidance: Sinan Umu, Anthony Poole & Renwick Dobson
Meta-benchmark: James Paterson, Fatemeh Ashari Ghomi, Sinan Umu,
Stephanie McGimpsey, Aleksandra Pawlik
Umu, Poole, Dobson & Gardner (2016) Avoidance of stochastic RNA interactions can be harnessed to control protein expression
levels in bacteria and archaea. eLife.
Gardner et al. (2017) A meta-analysis of bioinformatics software benchmarks reveals that publication-bias influences software
accuracy. In preparation.

Random RNA interactions control protein expression in prokaryotes

  • 1.
    Random RNA interactionscontrol protein expression in prokaryotes Paul Gardner University of Canterbury Christchurch New Zealand
  • 2.
    Feel free toshare what you hear These slides are available at: http://www.slideshare.net/ppgardne/presentations
  • 3.
    The hard workof Sinan Umu, Ant Poole & Ren Dobson
  • 4.
    mRNA levels areimperfectly correlated with protein levels Lu et al. (2007) Nature biotechnology.
  • 5.
    Determinants of proteinconcentration Protein concentration depends on mRNA concentration, translation and degradation rates DNA [D] RNA [R] Protein [P] ktranscription ktranslation kmRNA degradation kprotein degradation 0 1 A T GGC TA A GGGGCA A T C T T TA C A A G AT CC G T T C C T G A AC G C AC T G C G T C G G G A A C G T G T T C CAGTTTCTATTTATT T G G T G A A T G GTATTA A G C T GC AA G G G C AA A T C G A G T C T TT T G A T C AG T T C G T G A T C C T G T T G A A A A A C A C G G T C A GC C A G A T G G T TT A C A A GC A C G C G A T T T C T A C T G T T G T C C CG T CT C G C C C G G T T T C T C AT CA CA GTAA CAACGCCG GT GGC G G T A C C A G C A G T A A C T A C C A T C A TGGTAGCAGCG C G C A G A A T AC T T CC G C G C A ACAGG A C A G C G A A GAAACCG A A TAA de Sousa Abreu, Penalva, Marcotte & Vogel (2009) Global signatures of protein and mRNA expression levels. Molecular BioSystems.
  • 6.
    Two general modelsdescribe variation in translation rate 1. Codon usage (Ikemura, 1981) Figure from: Tuller & Zur (2015) Nucl. Acids Res.
  • 7.
    Two general modelsdescribe variation in translation rate 2. mRNA structure (Pelletier & Sonenberg, 1987) Figure from: Tuller & Zur (2015) Nucl. Acids Res.
  • 8.
    We think wehave a third general model... http://dx.doi.org/10.7554/eLife.13479 http://dx.doi.org/10.7554/eLife.20686
  • 9.
    Non-coding RNAs areabundant q q q q q q q q 012345 log10(MeanReadDepth) Core ncRNA genes Core protein coding genes Lindgreen, Umu et al. (2014) PLOS Computational Biology.
  • 10.
    Bacterial non-coding RNAfunction Hfq AUG SD X Ribosome sRNA AUG RNase E recruitment AUG SD Ribosome Anti-antisense mechanism Selective mRNA stabilisation AUG RNase E Shine-Dalgarno sequence Sequestration of ribosome binding site Induction of mRNA decay SD = Figure by Bethany Jose
  • 11.
    Checking for mRNA:ncRNAinteractions Looking for regulatory interactions which are specific and small in number, off-targets are non-specific and large in number Compare 5 ends of CDS & ncRNAs Looking for a bump on the left... −15 −10 −5 0 0.000.050.100.150.200.25 Binding Energy (kcal/mol) Density
  • 12.
    Checking for mRNA:ncRNAinteractions −15 −10 −5 0 0.000.050.100.150.200.25 Binding Energy (kcal/mol) Native Shuffled (P = 7.69−52 )
  • 13.
    Checking negative controls! −15−10 −5 0 0.000.050.100.150.200.25 Binding Energy (kcal/mol) Native Shuffled (P = 7.69−52 ) Different phylum (P = 0 ) Downstream (P = 2.66−124 ) Rev. complement (P = 6.51−57 ) Intergenic (P = 6.16−93 )
  • 14.
    Do ubiquitous andabundant RNAs influence translation? Given that ncRNAs are among the most abundant RNAs in the cell ([ncRNA] >> [mRNA]) AND that RNAs frequently hybridise THEN maybe stochastic interactions with mRNAs inhibit translation Corley & Laederach (2016) Bioinformatics: Selecting against accidental RNA interactions. eLife.
  • 15.
    How can thishypothesis be tested? We predict that: 1. There is selection against mRNA:ncRNA interactions 2. That stochastic mRNA:ncRNA interactions influence [protein]:[mRNA] ratios For consistency: focus on 6 ncRNA families & 114 mRNAs/proteins that are highly conserved & expressed; And first 21 nts of CDS. Tested 1,582 bacterial & 118 archaeal genomes
  • 16.
    Are mRNA:ncRNA interactionsselected against? −15 −10 −5 0 −0.010−0.0050.0000.0050.0100.015 Binding Energy (kcal/mol) DensityDifference Actinobacteria (n:163) P = 9.8x10−69 Bacteroidetes (n:60) P = 8.7x10−148 Chlamydiae (n:38) P = 1.4x10−193 Cyanobacteria (n:40) P = 3.8x10−11 Firmicutes (n:378) P = 0 Proteobacteria (n:756) P = 0 Spirochaetes (n:38) P = 1.6x10−98 Archaea (n:118) P = 4.2x10−177 Background (n:100) More stable interactions NativeinteractionsShuffledinteractions Act Bac Chl Cya Fir Pro Spi Arc 010203040 −log10P
  • 17.
    Do mRNA:ncRNA interactionsinfluence protein expression? ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ●● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● 2.02.53.03.54.0 −300 −250 −200 −150 Rs=0.65 log10(fluorescence) Avoidance (kcal/mol) Expression data from: Kudla et al. (2009) Science.
  • 18.
    Do mRNA:ncRNA interactionsinfluence protein expression? Testing the relationship between protein abundance estimates and avoidance, mRNA secondary structure, codon usage and mRNA abundance GFP datasets Mass-Spec datasets E.coli (n=52) GFP/qPCR E.coli (n=154) GFP/Northern E.coli (n=14,234) mCherry/RNAseq E.coli (n=389) MS/microarray E.coli (n=3,301) MS/microarray P.aeruginosa (n=5,479) MS/microarray P.aeruginosa (n=1,148) MS/microarray * * * * * * * * * * * * * * * * * * * * * * * * *P < 0.05 0.0 0.60.2 0.4-0.2 Correlation Coefficient Avoidance Secondary Structure Codon [mRNA]
  • 19.
    Testing the extremesof expression 0.1 0.5 0.8 1.2 1.6 1.9 2.3 2.6 3 3.3 3.7 4.1 4.4 4.8 Freq 0 20 40 60 80 100 120 A log10([Protein]/[mRNA]) Frequency low expression (n=10) high expression (n=10) B Avoidance Codon Sec.Str. Null Sec.Str. Codon Avoidance −2 −1 0 1 2 * * Zscore low expression (n=10) high expression (n=10) E. coli genes (n = 389)
  • 20.
    Designing mRNAs 239aa GFPcan be encoded by 7.62x10111 synonymous mRNAs Extremes of avoidance have a stronger effect than codon usage or secondary structure ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● 4.24.34.44.54.64.7 0.60 0.65 0.70 0.75 0.80 0.85 CAI log10(fluorescence) Rs=0.29 ● ●● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 4.24.34.44.54.64.7 −15 −10 −5 0 Folding Energy (kcal/mol) Rs=0.34 ● ●● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● 4.24.34.44.54.64.7 −350 −300 −250 −200 −150 −100 Binding Energy (kcal/mol) Rs=0.56 hi low ● ● ● ● ● ● Avoid Fold Codon Optimal●
  • 21.
    Avoidance in 3Don the ribosome Protein binds to regions with low avoidance (green) while exposed regions are high avoidance (blue): P = 9.3x10−15, Fishers exact test
  • 22.
    Further Work Further work: Testingadaptation with experimental evolution experiments Do mRNA:ncRNA interactions influence eukaryotic gene expression? Number of possible interactions increases quadratically with number of genes. May require spatial & temporal separation of genes Does avoidance drive compartmentalisation and increases in nucleotide binding proteins? Do mRNA:ncRNA interactions influence viral infection, hybridisation, HGT & transformation expts? Are protein, DNA and protein:nucleotide interactions also avoided?
  • 23.
    And now forsomething completely different...
  • 24.
    Bioinformaticians are horrible! Bioinformaticiansare bad, impatient & intolerant Build a phylogenetic tree: which of the 172 methods do you use? MBIORE ANC-GENE BAli-Phy BAMBE BayesPhylogenies BEAST BEST Bio++ bms_runner burntrees Cadence Crux IMa2 Mesquite MrBayes MrBayesPlugin MrBayes-tree-scanners Multidivtime p4 SIMMAP PAL tracer PAML Vanilla PHASE PHYLLAB PhyloBayes ARB Bionumerics BIRCH Bosque BPAnalysis CAFCA CRANN DAMBE EMBOSS TNT FootPrinter Freqpars Gambit GAPars GelCompar-II GeneTree gmaes Hennig86 IDEA LVB MALIGN MEGA Mesquite Murka Network NimbleTree NONA Notung Parsimov PAST PAUP* PAUPRat PaupUp phangorn PHYLIP PhyloNet Phylo_win POY PRAP PSODA RA SeaView SeqState Simplot sog TCS Parsimony Maximum Likelihood Bayesian ALIFRITZ aLRT ARB Bio++ Bionumerics BIRCH BootPHYML Bosque CodeAxe CoMET Concaterpillar CONSEL Crux DAMBE DART Darwin dnarates DPRML DT-ModSel EMBOSS EREM fastDNAml fastDNAmlRev FASTML FastTree GARLI GZ-Gamma HY-PHY IQPNNI Kakusan4 Leaphy Mac5 McRate Mesquite MetaPIGA MixtureTree Modelfit ModelGenerator MOLPHY MrAIC MrModeltest MrMTgui MultiPhyl NEPAL NHML nhPhyML NimbleTree p4 PAL PAML PARAT PARBOOT PASSML PAUP* PAUPRat PaupUp phangorn PHYLLAB PhyloCoCo Phylo_win PHYML PhyML-Multi PhyNav PHYSIG PLATO Porn* PRAP PROCOV ProtTest PTP r8s-bootstrap Rate4Site Rate-evolution RAxML raxmlGUI RevDNArates rRNA-phylogeny SeaView Segminator SEMPHY SeqPup SeqState SIMMAP Simplot SLR Spectronet Spectrum SplitsTree SSA TipDate Treefinder TREE-PUZZLE Vanilla
  • 25.
    How can wechoose software? Which methods do you use?
  • 26.
    Approach software likea scientist Are any good controls available? Positive: databases, publications, simulation, ... Negative: randomized, select relevant negative data, ... Some common accuracy metrics: Sensitivity (true positive rate) Specificity (true negative rate) Mathew’s correlation coefficients Area under an ROC curve False positive rate Truepositiverate 0.0 0.2 0.4 0.6 0.8 1.0 0.00.20.40.60.81.0 DBS, Pfam DBS, Treefam DBS, Custom PROVEAN Polyphen−2 SIFT FATHMM, weighted FATHMM, unweighted Wheeler et al. (2016) A profile-based method for identifying functional divergence of orthologous genes in bacterial genomes. Bioinformatics.
  • 27.
  • 28.
    Is there reallya relationship between software speed & accuracy? Can we run a meta-analysis of bioinformatic benchmarks? If speed isn’t related to accuracy, then what is? Some possibilities: Software age Journal “impact” (IF & GoogleScholar H5) Number of citations Corresponding author’s H-index & M-index
  • 29.
    After some literaturemining... found 43 matching articles. 102 benchmarks Accuracy & speed ranks for 243 bioinformatic software tools Manually extracted IF, H, age, ... 65 journals (Bioinformatics, NAR, Genome Research, ...) 151 author GoogleScholar profiles abyss antepiseeker apg barry bellerophontes bfast bismark biss boost bowtie bowtie2 bowtiestar bratbw bsmap bsmooth bsseeker buckycon buckymrbayes buckymrbayesspa buckypop buckyraxml builder bwa bwasw caml camp carma ce celera clark clc clustalomega clustalw comus coprarna coral cosine crisp cro cromwell cufflinks cwt dali de dexseq dialign dialign22 dialignt dialigntx diffsplice diginormvelvet dima djigsaw downhillsimplex dsgseq ebi echo edenanonstrict edenastrict edit epimode ericscript erpin fa fasta fasttree fisherexacttest fusioncatcher fusionmap gassst gatk genometa gojobori goldman gossamer gottcha greedyft gsnap heidge hitec hmmer hshrec idbaud igtpduplossft inchworm infernal intarna jaffa kalign kbsps kraken kthse leidnl limpic lmat lms lofreq lsqman mafft mafftfftns mafftfftns2 mafftlinsi mapsplice maq mats megan metaphlan metaphyler methylkit methylsig mgrast minia mira mirdeep mireap mirena mirexpress mlclustalw mlclustalwquicktree mlmafft mlmafftparttree mlmuscle mlopal mlprankgt modellerv mosaik motu mpest mpjclustalw mpsclustalw mrfast mrpml mrpmp mrsfast msinspect multalin muscle musclemaxiters mzmine nbc ncbiblast nest newbler nfuse novoalign oases onecodex openms pairfold paralign pass perm phylonetft phylopythias phymmbl piler poa poy poystar pragcz probalign probcons probtree process pso pt qiime qsra quake raiphy ravenna raxml raxmllimited rdiffparam repeatfinder repeatgluer repeatscout reptile rmap rnacofold rnaduplex rnahybrid rnaplex rnaup rsearch rsmatch sam sate scro scwrl scwrlcons segemehl segmodencad seqgsea seqman seqmap sga sharcgs shrimp simulatedannealing sl smalt snap snpruler snver soap soap2 soapdenovo soapec soapstar spades sparse sparseassembler spcomp specarray spt 
srmapper ssaha ssake ssap ssearch ssm sst st starbeast strcutal swissmodel taipan targetrna targetrna2 taxatortk tcoffee team tmap tophatfusion transabyss trinity upmes varscan vcake velvet wmrpmp woodhams wublast xalign xcmswithcorrection xcmswithoutretentiontime zema
  • 30.
    Nothing is correlated with accuracy! [Figure, panels A and B: correlates with accuracy rank, Spearman's rho (axis −0.2 to 0.2), and a Spearman's rho scale from −1 to 1. Variables: Rel. age, Year, Accuracy, Speed, JH5, JIF, Cites, Rel. cites, H-index, M-index.]
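The statistic named on this slide, Spearman's rho, is the Pearson correlation of the ranks of two variables. A minimal standard-library sketch follows; the input numbers are made up for illustration and are not taken from the benchmark data.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to cover the run of values tied with the one at position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks of x and y."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("rho is undefined when one variable is constant")
    return cov / sqrt(var_x * var_y)


# Hypothetical accuracy and speed ranks for five tools.
accuracy_rank = [1, 2, 3, 4, 5]
speed_rank = [3, 5, 1, 4, 2]
rho = spearman_rho(accuracy_rank, speed_rank)
```

A value of rho near zero, as reported on the slide for every variable tested, means that knowing a tool's rank on that variable says little about its accuracy rank.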
  • 31.
    [Figure: Z-scores for speed versus accuracy, with frequency histograms (Freq.).]
  • 32.
    Conclusions: Speed is NOT reflective of accuracy. Neither is author/journal reputation, software age or number of citations. The only reasonable way to select software is by benchmarking. Publication bias is influencing software accuracy. It doesn’t matter how famous you are, you can still write great software!
  • 33.
    Thanks! Avoidance: Sinan Umu, Anthony Poole & Renwick Dobson. Meta-benchmark: James Paterson, Fatemeh Ashari Ghomi, Sinan Umu, Stephanie McGimpsey, Aleksandra Pawlik. Umu, Poole, Dobson & Gardner (2016) Avoidance of stochastic RNA interactions can be harnessed to control protein expression levels in bacteria and archaea. eLife. Gardner et al. (2017) A meta-analysis of bioinformatics software benchmarks reveals that publication-bias influences software accuracy. In preparation. These slides are available at: http://www.slideshare.net/ppgardne/presentations