SlideShare a Scribd company logo
1 of 38
Fall 2015
Kurt Wollenberg, PhD
Phylogenetics Specialist
Bioinformatics and Computational Biology Branch (BCBB)
Phylogenetics and Sequence Analysis
Lecture 6: Selection Analysis Using
HyPhy
10 ω ∞
Course Organization
• Building a clean sequence
• Collecting homologs
• Aligning your sequences
• Building trees
• Further Analysis
Lecture Organization
• What is selection?
• What does selection look like?
• HyPhy: using maximum likelihood to
quantify selection
• HyPhy input
• Interpreting your output
What is selection?
• Mutation – the source of all variation
• Substitution – changing one nucleotide to
another
• Synonymous substitution
• Non-synonymous substitution
• Conservative substitution
Synonymous/non-synonymous
Conservative/non-conservative
Adapted from Miller and Levine
Substitution
...GGG TGT TGT...
Time
...GGC AGT TGG...
Translation Translation
...G-S-W... ...G-C-C...
conservative
What is selection?
• Differential survival of genotypes
• Positive selection - diversifying
• Maintain variation
• Variation can also be maintained by
genetic drift
• Negative selection – purifying
• Remove variation
Why is selection important?
• Pathogens
 Surviving the immune system
• Patients
 Surviving infection
• Identification of the molecular basis for
survival
What is selection analysis?
• Determining the dN/dS ratio
 Inference of ancestral states
 Determination of the effect of inferred
nucleotide changes on amino acids
• Is dN/dS significantly different from 1?
What is selection analysis?
• Positive (Diversifying) Selection
dN/dS > 1
• Negative (Purifying) Selection
dN/dS < 1
Two types of selection analysis
• Selection at sites
→ Are there significant levels of non-synonymous
substitution at specific codons?
→ Averaged across lineages.
• Selection along lineages
→ Are there significant levels of non-synonymous
substitution in specific groups?
→ Averaged across sequences.
Two types of selection analysis
5 100 I
MC1_01B4fs TGCACTAATA ATCTGATT-- -------AAT ATCACTGAGA ATACTAATAA
MC1_01A10 TGCACT---A ATCTGACAAA GGCTATTAAG ACCAATGGGA ATGCTAATAA
MC1_01C1 TGCACTAATA ATCTGATT-- -------AAT ATCACTGAGA ATACTAATAA
MC1_01A20 TGCACTAATA ATCTGACAAA GGCTAGTAAT GCCACTGAGA AGGCTAATAA
MC1_01TA1 TGCACTAATA ATCTGATT-- -------AAT ATCACTGAGA ATACTAATAA
TACCATTACT AAT------- --ACCACTAT TAATAGCGGG GGAGAAATAA
TACCAGTACT AATGCCACTG AGAGTACCAT TACTAATACC ACTGAAATAA
TACCATTACT AAT------- --ACCACTAT TAATAGCGGG GGAGAAATAA
TACCATTACT AAT------- --ACCACTAT TGATAGCGGG GGAGAAATAA
TACCATTACT AAT------- --ACCACTAT TGATAGCGGG GGAGAAATAA
Selection at sites (codons)
Selection
along
lineages
MC1_01B4fs
MC1_01A10
MC1_01C1
MC1_01A20
MC1_01TA1
Calculating dN/dS
• Estimate transition/transversion rate ratio
→ Transition: A G C T
→ Transversion: purine purimidine
→ Based on four-fold degenerate sites at third codon positions and
nondegenerate sites.
• Count synonymous and nonsynonymous substitutions
for each codon site or along each lineage.
→ Use ML estimates of ancestral codon states.
• Correct counts for multiple hits.
• Fitting the data to a distribution of dN/dS ratios.
→ Calculate the probability of dN/dS falling into a particular class.
A G
C T
purines
pyrimidines
HyPhy: the software
• Input data format
→ Sequences must be aligned.
→ Fasta, PHYLIP, or Nexus format.
→ Include the tree after the alignment.
• Program options
→ Analysis scripts.
http://datamonkey.org
Standard Analyses
HyPhy: the software
Positive Selection Analyses
HyPhy: the software
• REL – relative effects model – very similar to PAML
codeml F61 analysis
• 2pFEL –2-parameter fixed-effects model
• MEME – mixed-effects model
The Algorithms
HyPhy: the software
• Analyzes levels of positive selection at individual
sites while allowing level of positive selection to vary
from branch to branch.
• Other models (FEL, REL, etc.) assume level of
positive selection to be uniform across all branches.
• Includes site-to-site variation in synonymous
substitution rate, just like FEL.
The MEME Algorithm
HyPhy: the software
Mixed Effects Model of Evolution
HyPhy: Input file formats
>MC1_01B4fs
TGCACTAATAATCTGATT---------AATATCACTGAGAATACTAATAATACCATTACT
>MC1_01A10
TGCACT---AATCTGACAAAGGCTATTAAGACCAATGGGAATGCTAATAATACCAGTACT
>MC1_01C1
TGCACTAATAATCTGATT---------AATATCACTGAGAATACTAATAATACCATTACT
>MC1_01A20
TGCACTAATAATCTGACAAAGGCTAGTAATGCCACTGAGAAGGCTAATAATACCATTACT
>MC1_01TA1
TGCACTAATAATCTGATT---------AATATCACTGAGAATACTAATAATACCATTACT
(((MC1_01B4fs,MC1_01A10),MC1_01C1),(MC1_01A20,MC1_01TA1));
Fasta
Make sure
these
match!
No stop codons!
HyPhy: Input file formats
>MC1_01B4fs
>MC1_01A10
>MC1_01C1
>MC1_01A20
>MC1_01TA1
TGCACTAATAATCTGATT---------AATATCACTGAGAATACTAATAATACCATTACT
TGCACT---AATCTGACAAAGGCTATTAAGACCAATGGGAATGCTAATAATACCAGTACT
TGCACTAATAATCTGATT---------AATATCACTGAGAATACTAATAATACCATTACT
TGCACTAATAATCTGACAAAGGCTAGTAATGCCACTGAGAAGGCTAATAATACCATTACT
TGCACTAATAATCTGATT---------AATATCACTGAGAATACTAATAATACCATTACT
AAT---------ACCACTATTAATAGCGGGGGAGAAATAA
AATGCCACTGAGAGTACCATTACTAATACCACTGAAATAA
AAT---------ACCACTATTAATAGCGGGGGAGAAATAA
AAT---------ACCACTATTGATAGCGGGGGAGAAATAA
AAT---------ACCACTATTGATAGCGGGGGAGAAATAA
(((MC1_01B4fs,MC1_01A10),MC1_01C1),(MC1_01A20,MC1_01TA1));
Fasta - interleaved
Make sure
these
match!
No stop codons!
HyPhy: Input file formats
5 100 I
MC1_01B4fs TGCACTAATA ATCTGATT-- -------AAT ATCACTGAGA ATACTAATAA
MC1_01A10 TGCACT---A ATCTGACAAA GGCTATTAAG ACCAATGGGA ATGCTAATAA
MC1_01C1 TGCACTAATA ATCTGATT-- -------AAT ATCACTGAGA ATACTAATAA
MC1_01A20 TGCACTAATA ATCTGACAAA GGCTAGTAAT GCCACTGAGA AGGCTAATAA
MC1_01TA1 TGCACTAATA ATCTGATT-- -------AAT ATCACTGAGA ATACTAATAA
TACCATTACT AAT------- --ACCACTAT TAATAGCGGG GGAGAAATAA
TACCAGTACT AATGCCACTG AGAGTACCAT TACTAATACC ACTGAAATAA
TACCATTACT AAT------- --ACCACTAT TAATAGCGGG GGAGAAATAA
TACCATTACT AAT------- --ACCACTAT TGATAGCGGG GGAGAAATAA
TACCATTACT AAT------- --ACCACTAT TGATAGCGGG GGAGAAATAA
1
(((MC1_01B4fs,MC1_01A10),MC1_01C1),(MC1_01A20,MC1_01TA1));
PHLYIP
Make sure
these
match!
No stop codons!
HyPhy: Input file formats
#NEXUS
Begin data;
Dimensions ntax=5 nchar=100;
Format datatype=DNA gap=- interleave;
MATRIX
MC1_01B4fs TGCACTAATA ATCTGATT-- -------AAT ATCACTGAGA ATACTAATAA
MC1_01A10 TGCACT---A ATCTGACAAA GGCTATTAAG ACCAATGGGA ATGCTAATAA
MC1_01C1 TGCACTAATA ATCTGATT-- -------AAT ATCACTGAGA ATACTAATAA
MC1_01A20 TGCACTAATA ATCTGACAAA GGCTAGTAAT GCCACTGAGA AGGCTAATAA
MC1_01TA1 TGCACTAATA ATCTGATT-- -------AAT ATCACTGAGA ATACTAATAA
TACCATTACT AAT------- --ACCACTAT TAATAGCGGG GGAGAAATAA
TACCAGTACT AATGCCACTG AGAGTACCAT TACTAATACC ACTGAAATAA
TACCATTACT AAT------- --ACCACTAT TAATAGCGGG GGAGAAATAA
TACCATTACT AAT------- --ACCACTAT TGATAGCGGG GGAGAAATAA
TACCATTACT AAT------- --ACCACTAT TGATAGCGGG GGAGAAATAA
;
End;
Begin trees;
TREE
tree=(((MC1_01B4fs,MC1_01A10),MC1_01C1),(MC1_01A20,MC1_01TA1));
End;
Nexus
Make sure
these
match!
No stop codons!
HyPhy: Input file formats
5 100 I
MC1_01B4fs TGCACTAATA ATCTGATT-- -------AAT ATCACTGAGA ATACTAATAA
MC1_01A10 TGCACT---A ATCTGACAAA GGCTATTAAG ACCAATGGGA ATGCTAATAA
MC1_01C1 TGCACTAATA ATCTGATT-- -------AAT ATCACTGAGA ATACTAATAA
MC1_01A20 TGCACTAATA ATCTGACAAA GGCTAGTAAT GCCACTGAGA AGGCTAATAA
MC1_01TA1 TGCACTAATA ATCTGATT-- -------AAT ATCACTGAGA ATACTAATAA
TACCATTACT AAT------- --ACCACTAT TAATAGCGGG GGAGAAATAA
TACCAGTACT AATGCCACTG AGAGTACCAT TACTAATACC ACTGAAATAA
TACCATTACT AAT------- --ACCACTAT TAATAGCGGG GGAGAAATAA
TACCATTACT AAT------- --ACCACTAT TGATAGCGGG GGAGAAATAA
TACCATTACT AAT------- --ACCACTAT TGATAGCGGG GGAGAAATAA
Sequence data
GAPS!
• Gaps must be in multiples of three and preserve the reading frame!
• ML models cannot analyze gapped codon sites – they will be ignored for the
analysis.
HyPhy: data input
Codon frequency models
0: 1/61 – All codons equally probable
1: F1x4 - Codon frequencies based on overall nucleotide
frequencies (fA, fC, fG, fT)
2: F3x4 - Codon frequencies based on nucleotide frequencies at
each codon site (fA1, fC1, fG1, fT1, fA2, fC2, fG2, fT2, fA3, fC3, fG3, fT3)
3: F61 - Codon frequencies based on frequency of each codon in
the data (faaa, faac, faag, faat, faca, facc, facg, fact, faga, fagc, fagg, fagt, ...)
F3x4 is used in the MEME analysis
HyPhy: data output
HyPhy: data output
HyPhy: data output
Codon alpha beta1 p1 beta2 p2 LRT p-value q-value Log(L)
125 0.0000 0.0000 0.7253 13.6010 0.2747 8.0176 0.0081 0.3258 -30.6957
153 0.9139 0.0000 0.8927 49.5335 0.1073 8.7797 0.0055 0.2580 -22.6063
163 0.6646 0.0000 0.8478 20.0270 0.1522 5.4309 0.0304 0.8559 -25.8135
188 0.0000 0.0000 0.9673 1883.3800 0.0327 9.7907 0.0033 0.1853 -12.7060
211 0.7245 0.0000 0.9660 4586.8100 0.0340 11.9000 0.0011 0.1062 -22.5928
218 0.8903 0.0000 0.9662 304.8150 0.0338 13.3192 0.0006 0.1555 -17.2640
234 0.0000 0.0000 0.8160 263.5910 0.1840 10.1891 0.0027 0.1893 -33.7409
237 0.0969 0.0969 0.7191 17.6831 0.2809 12.3769 0.0009 0.1252 -35.2664
239 0.6719 0.0000 0.8254 11.6736 0.1746 5.9541 0.0232 0.7269 -22.8152
264 0.0000 0.0000 0.9370 8.8091 0.0630 6.6746 0.0160 0.5655 -11.4430
HyPhy: further analyses
Selection Analysis
Site vs. Lineage Selection
Generally, if significant positive (or
negative) selection occurs at only a few
sites, averaging over the length of the
sequences will result in no significant
positive (or negative) selection being
detected.
Significance of Results
Selection analysis only determines if there
is a significant excess or lack of non-
synonymous substitution. The relative level
of physiochemical similarity among the two
amino acids is beyond the capabilities of
selection analysis software.
Selection Analysis
References
Fundamentals of Molecular Evolution. 2nd edition. Grauer and Li.
2000
A good introduction to the major concepts in molecular evolution.
The Phylogenetics Handbook. P. Lemey, editor. 2005.
Chapter 14: Theory of and practice of diversifying selection
analysis.
MEME reference:
Murrell B, et al. (2012) Detecting individual sites subject to
episodic diversifying selection. PLoS Genetics, 8(7):e1002764.
Seminar Follow-Up Site
 For access to past recordings, handouts, slides visit this site from the
NIH network:
http://collab.niaid.nih.gov/sites/research/SIG/Bioinformatics/
34
1. Select a
Subject Matter
View:
• Seminar Details
• Handout and
Reference Docs
• Relevant Links
• Seminar
Recording Links
2. Select a
Topic
Recommended Browsers:
• IE for Windows,
• Safari for Mac (Firefox on a
Mac is incompatible with
NIH Authentication
technology)
Login
• If prompted to log in use
“NIH” in front of your
username
35
Retrieving Slides/Handouts
This lecture
series
36
Retrieving Slides/Handouts
This lecture
These slides
37
Questions?
38
Next
Molecular Evolutionary Analysis of
Pathogens Using BEAST
Thursday, 10 December at 1300

More Related Content

What's hot

MICROSATELITE Markers for LIVESTOCK Genetic DIVERSITY ANALYSES
MICROSATELITE Markers for LIVESTOCK Genetic DIVERSITY ANALYSESMICROSATELITE Markers for LIVESTOCK Genetic DIVERSITY ANALYSES
MICROSATELITE Markers for LIVESTOCK Genetic DIVERSITY ANALYSESKaran Veer Singh
 
Alpha fold 2
Alpha fold 2Alpha fold 2
Alpha fold 2Vishwas N
 
Runs of Homozygosity presentation
Runs of Homozygosity presentationRuns of Homozygosity presentation
Runs of Homozygosity presentationHadeel Abu Jamous
 
Statistical analysis of genetics parameter
Statistical analysis of genetics parameterStatistical analysis of genetics parameter
Statistical analysis of genetics parameterPawan Nagar
 
Genome in a Bottle- reference materials to benchmark challenging variants and...
Genome in a Bottle- reference materials to benchmark challenging variants and...Genome in a Bottle- reference materials to benchmark challenging variants and...
Genome in a Bottle- reference materials to benchmark challenging variants and...GenomeInABottle
 
NGS technologies - platforms and applications
NGS technologies - platforms and applicationsNGS technologies - platforms and applications
NGS technologies - platforms and applicationsAGRF_Ltd
 
DNA BarcodING IN ANIMALS
DNA BarcodING IN ANIMALS DNA BarcodING IN ANIMALS
DNA BarcodING IN ANIMALS Gull Fatima
 
Variant (SNP) calling - an introduction (with a worked example, using FreeBay...
Variant (SNP) calling - an introduction (with a worked example, using FreeBay...Variant (SNP) calling - an introduction (with a worked example, using FreeBay...
Variant (SNP) calling - an introduction (with a worked example, using FreeBay...Manikhandan Mudaliar
 
Genome wide association studies seminar
Genome wide association studies seminarGenome wide association studies seminar
Genome wide association studies seminarVarsha Gayatonde
 
Multiple alignment
Multiple alignmentMultiple alignment
Multiple alignmentavrilcoghlan
 
Telomere-to-telomere assembly of a complete human X chromosome
Telomere-to-telomere assembly of a complete human X chromosomeTelomere-to-telomere assembly of a complete human X chromosome
Telomere-to-telomere assembly of a complete human X chromosomeAdam Phillippy
 
FindMod
FindModFindMod
FindModSobia
 

What's hot (20)

MICROSATELITE Markers for LIVESTOCK Genetic DIVERSITY ANALYSES
MICROSATELITE Markers for LIVESTOCK Genetic DIVERSITY ANALYSESMICROSATELITE Markers for LIVESTOCK Genetic DIVERSITY ANALYSES
MICROSATELITE Markers for LIVESTOCK Genetic DIVERSITY ANALYSES
 
Alpha fold 2
Alpha fold 2Alpha fold 2
Alpha fold 2
 
Runs of Homozygosity presentation
Runs of Homozygosity presentationRuns of Homozygosity presentation
Runs of Homozygosity presentation
 
Topological associated domains- Hi-C
Topological associated domains- Hi-CTopological associated domains- Hi-C
Topological associated domains- Hi-C
 
Statistical analysis of genetics parameter
Statistical analysis of genetics parameterStatistical analysis of genetics parameter
Statistical analysis of genetics parameter
 
Genome in a Bottle- reference materials to benchmark challenging variants and...
Genome in a Bottle- reference materials to benchmark challenging variants and...Genome in a Bottle- reference materials to benchmark challenging variants and...
Genome in a Bottle- reference materials to benchmark challenging variants and...
 
NGS technologies - platforms and applications
NGS technologies - platforms and applicationsNGS technologies - platforms and applications
NGS technologies - platforms and applications
 
DNA BarcodING IN ANIMALS
DNA BarcodING IN ANIMALS DNA BarcodING IN ANIMALS
DNA BarcodING IN ANIMALS
 
Biological networks - building and visualizing
Biological networks - building and visualizingBiological networks - building and visualizing
Biological networks - building and visualizing
 
Phylogenetics: Tree building
Phylogenetics: Tree buildingPhylogenetics: Tree building
Phylogenetics: Tree building
 
Variant (SNP) calling - an introduction (with a worked example, using FreeBay...
Variant (SNP) calling - an introduction (with a worked example, using FreeBay...Variant (SNP) calling - an introduction (with a worked example, using FreeBay...
Variant (SNP) calling - an introduction (with a worked example, using FreeBay...
 
Genome wide association studies seminar
Genome wide association studies seminarGenome wide association studies seminar
Genome wide association studies seminar
 
Parsimony methods
Parsimony methodsParsimony methods
Parsimony methods
 
Exome sequence analysis
Exome sequence analysisExome sequence analysis
Exome sequence analysis
 
Microarray Analysis
Microarray AnalysisMicroarray Analysis
Microarray Analysis
 
BEAST: Species phylogeny and phylogeographic analysis
BEAST: Species phylogeny and phylogeographic analysisBEAST: Species phylogeny and phylogeographic analysis
BEAST: Species phylogeny and phylogeographic analysis
 
Multiple alignment
Multiple alignmentMultiple alignment
Multiple alignment
 
Phylogenetic analysis
Phylogenetic analysisPhylogenetic analysis
Phylogenetic analysis
 
Telomere-to-telomere assembly of a complete human X chromosome
Telomere-to-telomere assembly of a complete human X chromosomeTelomere-to-telomere assembly of a complete human X chromosome
Telomere-to-telomere assembly of a complete human X chromosome
 
FindMod
FindModFindMod
FindMod
 

Viewers also liked

Bottlenecks -- some ramblings and a bit of data from maize PAGXXII
Bottlenecks -- some ramblings and a bit of data from maize PAGXXIIBottlenecks -- some ramblings and a bit of data from maize PAGXXII
Bottlenecks -- some ramblings and a bit of data from maize PAGXXIIjrossibarra
 
Estrategia pedagógica para fortalecer la lectura
Estrategia pedagógica para fortalecer la lecturaEstrategia pedagógica para fortalecer la lectura
Estrategia pedagógica para fortalecer la lecturaJhon Becerra
 
Evolutionary Genetics of Complex Genome
Evolutionary Genetics of Complex GenomeEvolutionary Genetics of Complex Genome
Evolutionary Genetics of Complex Genomejrossibarra
 
1ª lista de exercício de administração financeira monitores leony e michelly
1ª lista de exercício de administração financeira   monitores leony e michelly1ª lista de exercício de administração financeira   monitores leony e michelly
1ª lista de exercício de administração financeira monitores leony e michellyFelipe Pontes
 
Exercício derivativos
Exercício derivativos Exercício derivativos
Exercício derivativos Felipe Pontes
 
Lmcp 1532 pembangunan bandar mapan
Lmcp 1532 pembangunan bandar mapanLmcp 1532 pembangunan bandar mapan
Lmcp 1532 pembangunan bandar mapanCHEW leeyee
 

Viewers also liked (20)

Phylogenetics: Making publication-quality tree figures
Phylogenetics: Making publication-quality tree figuresPhylogenetics: Making publication-quality tree figures
Phylogenetics: Making publication-quality tree figures
 
Tree building 2
Tree building 2Tree building 2
Tree building 2
 
BEAST: Time-stamped data and population dynamics
BEAST: Time-stamped data and population dynamicsBEAST: Time-stamped data and population dynamics
BEAST: Time-stamped data and population dynamics
 
Pathogen phylogenetics using BEAST
Pathogen phylogenetics using BEASTPathogen phylogenetics using BEAST
Pathogen phylogenetics using BEAST
 
3D graphics using VMD
3D graphics using VMD3D graphics using VMD
3D graphics using VMD
 
Sequence assembly
Sequence assemblySequence assembly
Sequence assembly
 
BLAST and sequence alignment
BLAST and sequence alignmentBLAST and sequence alignment
BLAST and sequence alignment
 
Bottlenecks -- some ramblings and a bit of data from maize PAGXXII
Bottlenecks -- some ramblings and a bit of data from maize PAGXXIIBottlenecks -- some ramblings and a bit of data from maize PAGXXII
Bottlenecks -- some ramblings and a bit of data from maize PAGXXII
 
Estrategia pedagógica para fortalecer la lectura
Estrategia pedagógica para fortalecer la lecturaEstrategia pedagógica para fortalecer la lectura
Estrategia pedagógica para fortalecer la lectura
 
Evolutionary Genetics of Complex Genome
Evolutionary Genetics of Complex GenomeEvolutionary Genetics of Complex Genome
Evolutionary Genetics of Complex Genome
 
Cocopala khanif
Cocopala khanifCocopala khanif
Cocopala khanif
 
Cuestionario
Cuestionario Cuestionario
Cuestionario
 
1ª lista de exercício de administração financeira monitores leony e michelly
1ª lista de exercício de administração financeira   monitores leony e michelly1ª lista de exercício de administração financeira   monitores leony e michelly
1ª lista de exercício de administração financeira monitores leony e michelly
 
Langebio 2015
Langebio 2015Langebio 2015
Langebio 2015
 
Introduction to Git and GitHub
Introduction to Git and GitHubIntroduction to Git and GitHub
Introduction to Git and GitHub
 
Exercício derivativos
Exercício derivativos Exercício derivativos
Exercício derivativos
 
PORTFOLIO (FEBRERO 2017)
PORTFOLIO (FEBRERO 2017)PORTFOLIO (FEBRERO 2017)
PORTFOLIO (FEBRERO 2017)
 
Pronoun personal
Pronoun personalPronoun personal
Pronoun personal
 
Lmcp 1532
Lmcp 1532Lmcp 1532
Lmcp 1532
 
Lmcp 1532 pembangunan bandar mapan
Lmcp 1532 pembangunan bandar mapanLmcp 1532 pembangunan bandar mapan
Lmcp 1532 pembangunan bandar mapan
 

Similar to Selection analysis using HyPhy

Snippy - Rapid bacterial variant calling - UK - tue 5 may 2015
Snippy - Rapid bacterial variant calling - UK - tue 5 may 2015Snippy - Rapid bacterial variant calling - UK - tue 5 may 2015
Snippy - Rapid bacterial variant calling - UK - tue 5 may 2015Torsten Seemann
 
Kitzmiller Openhelisphereproject Bosc2008
Kitzmiller Openhelisphereproject Bosc2008Kitzmiller Openhelisphereproject Bosc2008
Kitzmiller Openhelisphereproject Bosc2008bosc_2008
 
The Paternal Tree of Humanity
The Paternal Tree of HumanityThe Paternal Tree of Humanity
The Paternal Tree of HumanityFamily Tree DNA
 
SNC-LAVALIN SHOWCASES SIMULATION-BASED PRODUCTION SCHEDULING EXPERTISE AT 201...
SNC-LAVALIN SHOWCASES SIMULATION-BASED PRODUCTION SCHEDULING EXPERTISE AT 201...SNC-LAVALIN SHOWCASES SIMULATION-BASED PRODUCTION SCHEDULING EXPERTISE AT 201...
SNC-LAVALIN SHOWCASES SIMULATION-BASED PRODUCTION SCHEDULING EXPERTISE AT 201...Vincent Béchard, MASc.
 
Generating haplotype phased reference genomes for the dikaryotic wheat strip...
Generating haplotype phased reference genomes  for the dikaryotic wheat strip...Generating haplotype phased reference genomes  for the dikaryotic wheat strip...
Generating haplotype phased reference genomes for the dikaryotic wheat strip...Benjamin Schwessinger
 
Applied Econometrics assignment3
Applied Econometrics assignment3Applied Econometrics assignment3
Applied Econometrics assignment3Chenguang Li
 
Single elctron transisto PHASE 2.pptx
Single elctron transisto PHASE 2.pptxSingle elctron transisto PHASE 2.pptx
Single elctron transisto PHASE 2.pptxssuser1580e5
 
Bioc4010 sample questions
Bioc4010 sample questionsBioc4010 sample questions
Bioc4010 sample questionsDan Gaston
 
Simple Microarray Analysis Using R
Simple Microarray Analysis Using RSimple Microarray Analysis Using R
Simple Microarray Analysis Using RAhmed Moustafa
 
atmel-2549-8-bit-avr-microcontroller-atmega640-1280-1281-2560-2561_datasheet.pdf
atmel-2549-8-bit-avr-microcontroller-atmega640-1280-1281-2560-2561_datasheet.pdfatmel-2549-8-bit-avr-microcontroller-atmega640-1280-1281-2560-2561_datasheet.pdf
atmel-2549-8-bit-avr-microcontroller-atmega640-1280-1281-2560-2561_datasheet.pdfVeswin Electronics
 
Course on parsing methods for biologists with a focus on ChIP-seq data
Course on parsing methods for biologists with a focus on ChIP-seq dataCourse on parsing methods for biologists with a focus on ChIP-seq data
Course on parsing methods for biologists with a focus on ChIP-seq dataLuca Cozzuto
 
Fpga(field programmable gate array)
Fpga(field programmable gate array) Fpga(field programmable gate array)
Fpga(field programmable gate array) Iffat Anjum
 
Langmead bosc2010 cloud-genomics
Langmead bosc2010 cloud-genomicsLangmead bosc2010 cloud-genomics
Langmead bosc2010 cloud-genomicsBOSC 2010
 
Activity Recognition Through Complex Event Processing: First Findings
Activity Recognition Through Complex Event Processing: First Findings Activity Recognition Through Complex Event Processing: First Findings
Activity Recognition Through Complex Event Processing: First Findings Sylvain Hallé
 

Similar to Selection analysis using HyPhy (20)

In silico analysis for unknown data
In silico analysis for unknown dataIn silico analysis for unknown data
In silico analysis for unknown data
 
Snippy - Rapid bacterial variant calling - UK - tue 5 may 2015
Snippy - Rapid bacterial variant calling - UK - tue 5 may 2015Snippy - Rapid bacterial variant calling - UK - tue 5 may 2015
Snippy - Rapid bacterial variant calling - UK - tue 5 may 2015
 
Kitzmiller Openhelisphereproject Bosc2008
Kitzmiller Openhelisphereproject Bosc2008Kitzmiller Openhelisphereproject Bosc2008
Kitzmiller Openhelisphereproject Bosc2008
 
Ashg grc workshop2015_tg
Ashg grc workshop2015_tgAshg grc workshop2015_tg
Ashg grc workshop2015_tg
 
The Paternal Tree of Humanity
The Paternal Tree of HumanityThe Paternal Tree of Humanity
The Paternal Tree of Humanity
 
SNC-LAVALIN SHOWCASES SIMULATION-BASED PRODUCTION SCHEDULING EXPERTISE AT 201...
SNC-LAVALIN SHOWCASES SIMULATION-BASED PRODUCTION SCHEDULING EXPERTISE AT 201...SNC-LAVALIN SHOWCASES SIMULATION-BASED PRODUCTION SCHEDULING EXPERTISE AT 201...
SNC-LAVALIN SHOWCASES SIMULATION-BASED PRODUCTION SCHEDULING EXPERTISE AT 201...
 
Generating haplotype phased reference genomes for the dikaryotic wheat strip...
Generating haplotype phased reference genomes  for the dikaryotic wheat strip...Generating haplotype phased reference genomes  for the dikaryotic wheat strip...
Generating haplotype phased reference genomes for the dikaryotic wheat strip...
 
Applied Econometrics assignment3
Applied Econometrics assignment3Applied Econometrics assignment3
Applied Econometrics assignment3
 
Manual licor 6200 condensado
Manual licor 6200 condensadoManual licor 6200 condensado
Manual licor 6200 condensado
 
Single elctron transisto PHASE 2.pptx
Single elctron transisto PHASE 2.pptxSingle elctron transisto PHASE 2.pptx
Single elctron transisto PHASE 2.pptx
 
Bioc4010 sample questions
Bioc4010 sample questionsBioc4010 sample questions
Bioc4010 sample questions
 
Simple Microarray Analysis Using R
Simple Microarray Analysis Using RSimple Microarray Analysis Using R
Simple Microarray Analysis Using R
 
atmel-2549-8-bit-avr-microcontroller-atmega640-1280-1281-2560-2561_datasheet.pdf
atmel-2549-8-bit-avr-microcontroller-atmega640-1280-1281-2560-2561_datasheet.pdfatmel-2549-8-bit-avr-microcontroller-atmega640-1280-1281-2560-2561_datasheet.pdf
atmel-2549-8-bit-avr-microcontroller-atmega640-1280-1281-2560-2561_datasheet.pdf
 
Paloma Pérez-Enfermedades raras de la piel
Paloma Pérez-Enfermedades raras de la pielPaloma Pérez-Enfermedades raras de la piel
Paloma Pérez-Enfermedades raras de la piel
 
Course on parsing methods for biologists with a focus on ChIP-seq data
Course on parsing methods for biologists with a focus on ChIP-seq dataCourse on parsing methods for biologists with a focus on ChIP-seq data
Course on parsing methods for biologists with a focus on ChIP-seq data
 
Fpga(field programmable gate array)
Fpga(field programmable gate array) Fpga(field programmable gate array)
Fpga(field programmable gate array)
 
Langmead bosc2010 cloud-genomics
Langmead bosc2010 cloud-genomicsLangmead bosc2010 cloud-genomics
Langmead bosc2010 cloud-genomics
 
Activity Recognition Through Complex Event Processing: First Findings
Activity Recognition Through Complex Event Processing: First Findings Activity Recognition Through Complex Event Processing: First Findings
Activity Recognition Through Complex Event Processing: First Findings
 
Chromatography: Meeting the Challenges of EU regulations with up-to-date Conf...
Chromatography: Meeting the Challenges of EU regulations with up-to-date Conf...Chromatography: Meeting the Challenges of EU regulations with up-to-date Conf...
Chromatography: Meeting the Challenges of EU regulations with up-to-date Conf...
 
Advanced Computational Drug Design
Advanced Computational Drug DesignAdvanced Computational Drug Design
Advanced Computational Drug Design
 

More from Bioinformatics and Computational Biosciences Branch

More from Bioinformatics and Computational Biosciences Branch (20)

Hong_Celine_ES_workshop.pptx
Hong_Celine_ES_workshop.pptxHong_Celine_ES_workshop.pptx
Hong_Celine_ES_workshop.pptx
 
Virus Sequence Alignment and Phylogenetic Analysis 2019
Virus Sequence Alignment and Phylogenetic Analysis 2019Virus Sequence Alignment and Phylogenetic Analysis 2019
Virus Sequence Alignment and Phylogenetic Analysis 2019
 
Nephele 2.0: How to get the most out of your Nephele results
Nephele 2.0: How to get the most out of your Nephele resultsNephele 2.0: How to get the most out of your Nephele results
Nephele 2.0: How to get the most out of your Nephele results
 
Introduction to METAGENOTE
Introduction to METAGENOTE Introduction to METAGENOTE
Introduction to METAGENOTE
 
Intro to homology modeling
Intro to homology modelingIntro to homology modeling
Intro to homology modeling
 
Protein fold recognition and ab_initio modeling
Protein fold recognition and ab_initio modelingProtein fold recognition and ab_initio modeling
Protein fold recognition and ab_initio modeling
 
Homology modeling: Modeller
Homology modeling: ModellerHomology modeling: Modeller
Homology modeling: Modeller
 
Protein docking
Protein dockingProtein docking
Protein docking
 
Protein function prediction
Protein function predictionProtein function prediction
Protein function prediction
 
Protein structure prediction with a focus on Rosetta
Protein structure prediction with a focus on RosettaProtein structure prediction with a focus on Rosetta
Protein structure prediction with a focus on Rosetta
 
Biological networks
Biological networksBiological networks
Biological networks
 
UNIX Basics and Cluster Computing
UNIX Basics and Cluster ComputingUNIX Basics and Cluster Computing
UNIX Basics and Cluster Computing
 
Statistical applications in GraphPad Prism
Statistical applications in GraphPad PrismStatistical applications in GraphPad Prism
Statistical applications in GraphPad Prism
 
Intro to JMP for statistics
Intro to JMP for statisticsIntro to JMP for statistics
Intro to JMP for statistics
 
Categorical models
Categorical modelsCategorical models
Categorical models
 
Better graphics in R
Better graphics in RBetter graphics in R
Better graphics in R
 
Automating biostatistics workflows using R-based webtools
Automating biostatistics workflows using R-based webtoolsAutomating biostatistics workflows using R-based webtools
Automating biostatistics workflows using R-based webtools
 
Overview of statistical tests: Data handling and data quality (Part II)
Overview of statistical tests: Data handling and data quality (Part II)Overview of statistical tests: Data handling and data quality (Part II)
Overview of statistical tests: Data handling and data quality (Part II)
 
Overview of statistics: Statistical testing (Part I)
Overview of statistics: Statistical testing (Part I)Overview of statistics: Statistical testing (Part I)
Overview of statistics: Statistical testing (Part I)
 
GraphPad Prism: Curve fitting
GraphPad Prism: Curve fittingGraphPad Prism: Curve fitting
GraphPad Prism: Curve fitting
 

Recently uploaded

9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 60009654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000Sapana Sha
 
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...Sérgio Sacani
 
Hubble Asteroid Hunter III. Physical properties of newly found asteroids
Hubble Asteroid Hunter III. Physical properties of newly found asteroidsHubble Asteroid Hunter III. Physical properties of newly found asteroids
Hubble Asteroid Hunter III. Physical properties of newly found asteroidsSérgio Sacani
 
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 bAsymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 bSérgio Sacani
 
Hire 💕 9907093804 Hooghly Call Girls Service Call Girls Agency
Hire 💕 9907093804 Hooghly Call Girls Service Call Girls AgencyHire 💕 9907093804 Hooghly Call Girls Service Call Girls Agency
Hire 💕 9907093804 Hooghly Call Girls Service Call Girls AgencySheetal Arora
 
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...anilsa9823
 
Disentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOSTDisentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOSTSérgio Sacani
 
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdfPests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdfPirithiRaju
 
Pests of mustard_Identification_Management_Dr.UPR.pdf
Pests of mustard_Identification_Management_Dr.UPR.pdfPests of mustard_Identification_Management_Dr.UPR.pdf
Pests of mustard_Identification_Management_Dr.UPR.pdfPirithiRaju
 
GBSN - Microbiology (Unit 1)
GBSN - Microbiology (Unit 1)GBSN - Microbiology (Unit 1)
GBSN - Microbiology (Unit 1)Areesha Ahmad
 
Forensic Biology & Its biological significance.pdf
Forensic Biology & Its biological significance.pdfForensic Biology & Its biological significance.pdf
Forensic Biology & Its biological significance.pdfrohankumarsinghrore1
 
Botany 4th semester file By Sumit Kumar yadav.pdf
Botany 4th semester file By Sumit Kumar yadav.pdfBotany 4th semester file By Sumit Kumar yadav.pdf
Botany 4th semester file By Sumit Kumar yadav.pdfSumit Kumar yadav
 
Biological Classification BioHack (3).pdf
Biological Classification BioHack (3).pdfBiological Classification BioHack (3).pdf
Biological Classification BioHack (3).pdfmuntazimhurra
 
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.Nitya salvi
 
Raman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral Analysis
Raman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral AnalysisRaman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral Analysis
Raman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral AnalysisDiwakar Mishra
 
Botany 4th semester series (krishna).pdf
Botany 4th semester series (krishna).pdfBotany 4th semester series (krishna).pdf
Botany 4th semester series (krishna).pdfSumit Kumar yadav
 
Physiochemical properties of nanomaterials and its nanotoxicity.pptx
Physiochemical properties of nanomaterials and its nanotoxicity.pptxPhysiochemical properties of nanomaterials and its nanotoxicity.pptx
Physiochemical properties of nanomaterials and its nanotoxicity.pptxAArockiyaNisha
 
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCRStunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCRDelhi Call girls
 
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...Sérgio Sacani
 
VIRUSES structure and classification ppt by Dr.Prince C P
VIRUSES structure and classification ppt by Dr.Prince C PVIRUSES structure and classification ppt by Dr.Prince C P
VIRUSES structure and classification ppt by Dr.Prince C PPRINCE C P
 

Recently uploaded (20)

9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 60009654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
 
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
 
Hubble Asteroid Hunter III. Physical properties of newly found asteroids
Hubble Asteroid Hunter III. Physical properties of newly found asteroidsHubble Asteroid Hunter III. Physical properties of newly found asteroids
Hubble Asteroid Hunter III. Physical properties of newly found asteroids
 
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 bAsymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
 
Hire 💕 9907093804 Hooghly Call Girls Service Call Girls Agency
Hire 💕 9907093804 Hooghly Call Girls Service Call Girls AgencyHire 💕 9907093804 Hooghly Call Girls Service Call Girls Agency
Hire 💕 9907093804 Hooghly Call Girls Service Call Girls Agency
 
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
 
Disentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOSTDisentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOST
 
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdfPests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
 
Pests of mustard_Identification_Management_Dr.UPR.pdf
Pests of mustard_Identification_Management_Dr.UPR.pdfPests of mustard_Identification_Management_Dr.UPR.pdf
Pests of mustard_Identification_Management_Dr.UPR.pdf
 
GBSN - Microbiology (Unit 1)
GBSN - Microbiology (Unit 1)GBSN - Microbiology (Unit 1)
GBSN - Microbiology (Unit 1)
 
Forensic Biology & Its biological significance.pdf
Forensic Biology & Its biological significance.pdfForensic Biology & Its biological significance.pdf
Forensic Biology & Its biological significance.pdf
 
Botany 4th semester file By Sumit Kumar yadav.pdf
Botany 4th semester file By Sumit Kumar yadav.pdfBotany 4th semester file By Sumit Kumar yadav.pdf
Botany 4th semester file By Sumit Kumar yadav.pdf
 
Biological Classification BioHack (3).pdf
Biological Classification BioHack (3).pdfBiological Classification BioHack (3).pdf
Biological Classification BioHack (3).pdf
 
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
 
Raman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral Analysis
Raman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral AnalysisRaman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral Analysis
Raman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral Analysis
 
Botany 4th semester series (krishna).pdf
Botany 4th semester series (krishna).pdfBotany 4th semester series (krishna).pdf
Botany 4th semester series (krishna).pdf
 
Physiochemical properties of nanomaterials and its nanotoxicity.pptx
Physiochemical properties of nanomaterials and its nanotoxicity.pptxPhysiochemical properties of nanomaterials and its nanotoxicity.pptx
Physiochemical properties of nanomaterials and its nanotoxicity.pptx
 
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCRStunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
 
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
 
VIRUSES structure and classification ppt by Dr.Prince C P
VIRUSES structure and classification ppt by Dr.Prince C PVIRUSES structure and classification ppt by Dr.Prince C P
VIRUSES structure and classification ppt by Dr.Prince C P
 

Selection analysis using HyPhy

  • 1. Fall 2015 Kurt Wollenberg, PhD Phylogenetics Specialist Bioinformatics and Computational Biology Branch (BCBB) Phylogenetics and Sequence Analysis Lecture 6: Selection Analysis Using HyPhy
  • 3. Course Organization • Building a clean sequence • Collecting homologs • Aligning your sequences • Building trees • Further Analysis
  • 4. Lecture Organization • What is selection? • What does selection look like? • HyPhy: using maximum likelihood to quantify selection • HyPhy input • Interpreting your output
  • 5. What is selection? • Mutation – the source of all variation • Substitution – changing one nucleotide to another • Synonymous substitution • Non-synonymous substitution • Conservative substitution
  • 7. Substitution ...GGG TGT TGT... Time ...GGC AGT TGG... Translation Translation ...G-S-W... ...G-C-C... conservative
  • 8. What is selection? • Differential survival of genotypes • Positive selection - diversifying • Maintain variation • Variation can also be maintained by genetic drift • Negative selection – purifying • Remove variation
  • 9. Why is selection important? • Pathogens  Surviving the immune system • Patients  Surviving infection • Identification of the molecular basis for survival
  • 10. What is selection analysis? • Determining the dN/dS ratio  Inference of ancestral states  Determination of the effect of inferred nucleotide changes on amino acids • Is dN/dS significantly different from 1?
  • 11. What is selection analysis? • Positive (Diversifying) Selection dN/dS > 1 • Negative (Purifying) Selection dN/dS < 1
  • 12. Two types of selection analysis • Selection at sites → Are there significant levels of non-synonymous substitution at specific codons? → Averaged across lineages. • Selection along lineages → Are there significant levels of non-synonymous substitution in specific groups? → Averaged across sequences.
  • 13. Two types of selection analysis 5 100 I MC1_01B4fs TGCACTAATA ATCTGATT-- -------AAT ATCACTGAGA ATACTAATAA MC1_01A10 TGCACT---A ATCTGACAAA GGCTATTAAG ACCAATGGGA ATGCTAATAA MC1_01C1 TGCACTAATA ATCTGATT-- -------AAT ATCACTGAGA ATACTAATAA MC1_01A20 TGCACTAATA ATCTGACAAA GGCTAGTAAT GCCACTGAGA AGGCTAATAA MC1_01TA1 TGCACTAATA ATCTGATT-- -------AAT ATCACTGAGA ATACTAATAA TACCATTACT AAT------- --ACCACTAT TAATAGCGGG GGAGAAATAA TACCAGTACT AATGCCACTG AGAGTACCAT TACTAATACC ACTGAAATAA TACCATTACT AAT------- --ACCACTAT TAATAGCGGG GGAGAAATAA TACCATTACT AAT------- --ACCACTAT TGATAGCGGG GGAGAAATAA TACCATTACT AAT------- --ACCACTAT TGATAGCGGG GGAGAAATAA Selection at sites (codons) Selection along lineages MC1_01B4fs MC1_01A10 MC1_01C1 MC1_01A20 MC1_01TA1
  • 14. Calculating dN/dS • Estimate transition/transversion rate ratio → Transition: A G C T → Transversion: purine purimidine → Based on four-fold degenerate sites at third codon positions and nondegenerate sites. • Count synonymous and nonsynonymous substitutions for each codon site or along each lineage. → Use ML estimates of ancestral codon states. • Correct counts for multiple hits. • Fitting the data to a distribution of dN/dS ratios. → Calculate the probability of dN/dS falling into a particular class. A G C T purines pyrimidines
  • 15. HyPhy: the software • Input data format → Sequences must be aligned. → Fasta, PHYLIP, or Nexus format. → Include the tree after the alignment. • Program options → Analysis scripts. http://datamonkey.org
  • 18. • REL – relative effects model – very similar to PAML codeml F61 analysis • 2pFEL –2-parameter fixed-effects model • MEME – mixed-effects model The Algorithms HyPhy: the software
  • 19. • Analyzes levels of positive selection at individual sites while allowing level of positive selection to vary from branch to branch. • Other models (FEL, REL, etc.) assume level of positive selection to be uniform across all branches. • Includes site-to-site variation in synonymous substitution rate, just like FEL. The MEME Algorithm HyPhy: the software Mixed Effects Model of Evolution
  • 20. HyPhy: Input file formats >MC1_01B4fs TGCACTAATAATCTGATT---------AATATCACTGAGAATACTAATAATACCATTACT >MC1_01A10 TGCACT---AATCTGACAAAGGCTATTAAGACCAATGGGAATGCTAATAATACCAGTACT >MC1_01C1 TGCACTAATAATCTGATT---------AATATCACTGAGAATACTAATAATACCATTACT >MC1_01A20 TGCACTAATAATCTGACAAAGGCTAGTAATGCCACTGAGAAGGCTAATAATACCATTACT >MC1_01TA1 TGCACTAATAATCTGATT---------AATATCACTGAGAATACTAATAATACCATTACT (((MC1_01B4fs,MC1_01A10),MC1_01C1),(MC1_01A20,MC1_01TA1)); Fasta Make sure these match! No stop codons!
  • 21. HyPhy: Input file formats >MC1_01B4fs >MC1_01A10 >MC1_01C1 >MC1_01A20 >MC1_01TA1 TGCACTAATAATCTGATT---------AATATCACTGAGAATACTAATAATACCATTACT TGCACT---AATCTGACAAAGGCTATTAAGACCAATGGGAATGCTAATAATACCAGTACT TGCACTAATAATCTGATT---------AATATCACTGAGAATACTAATAATACCATTACT TGCACTAATAATCTGACAAAGGCTAGTAATGCCACTGAGAAGGCTAATAATACCATTACT TGCACTAATAATCTGATT---------AATATCACTGAGAATACTAATAATACCATTACT AAT---------ACCACTATTAATAGCGGGGGAGAAATAA AATGCCACTGAGAGTACCATTACTAATACCACTGAAATAA AAT---------ACCACTATTAATAGCGGGGGAGAAATAA AAT---------ACCACTATTGATAGCGGGGGAGAAATAA AAT---------ACCACTATTGATAGCGGGGGAGAAATAA (((MC1_01B4fs,MC1_01A10),MC1_01C1),(MC1_01A20,MC1_01TA1)); Fasta - interleaved Make sure these match! No stop codons!
  • 22. HyPhy: Input file formats 5 100 I MC1_01B4fs TGCACTAATA ATCTGATT-- -------AAT ATCACTGAGA ATACTAATAA MC1_01A10 TGCACT---A ATCTGACAAA GGCTATTAAG ACCAATGGGA ATGCTAATAA MC1_01C1 TGCACTAATA ATCTGATT-- -------AAT ATCACTGAGA ATACTAATAA MC1_01A20 TGCACTAATA ATCTGACAAA GGCTAGTAAT GCCACTGAGA AGGCTAATAA MC1_01TA1 TGCACTAATA ATCTGATT-- -------AAT ATCACTGAGA ATACTAATAA TACCATTACT AAT------- --ACCACTAT TAATAGCGGG GGAGAAATAA TACCAGTACT AATGCCACTG AGAGTACCAT TACTAATACC ACTGAAATAA TACCATTACT AAT------- --ACCACTAT TAATAGCGGG GGAGAAATAA TACCATTACT AAT------- --ACCACTAT TGATAGCGGG GGAGAAATAA TACCATTACT AAT------- --ACCACTAT TGATAGCGGG GGAGAAATAA 1 (((MC1_01B4fs,MC1_01A10),MC1_01C1),(MC1_01A20,MC1_01TA1)); PHLYIP Make sure these match! No stop codons!
  • 23. HyPhy: Input file formats #NEXUS Begin data; Dimensions ntax=5 nchar=100; Format datatype=DNA gap=- interleave; MATRIX MC1_01B4fs TGCACTAATA ATCTGATT-- -------AAT ATCACTGAGA ATACTAATAA MC1_01A10 TGCACT---A ATCTGACAAA GGCTATTAAG ACCAATGGGA ATGCTAATAA MC1_01C1 TGCACTAATA ATCTGATT-- -------AAT ATCACTGAGA ATACTAATAA MC1_01A20 TGCACTAATA ATCTGACAAA GGCTAGTAAT GCCACTGAGA AGGCTAATAA MC1_01TA1 TGCACTAATA ATCTGATT-- -------AAT ATCACTGAGA ATACTAATAA TACCATTACT AAT------- --ACCACTAT TAATAGCGGG GGAGAAATAA TACCAGTACT AATGCCACTG AGAGTACCAT TACTAATACC ACTGAAATAA TACCATTACT AAT------- --ACCACTAT TAATAGCGGG GGAGAAATAA TACCATTACT AAT------- --ACCACTAT TGATAGCGGG GGAGAAATAA TACCATTACT AAT------- --ACCACTAT TGATAGCGGG GGAGAAATAA ; End; Begin trees; TREE tree=(((MC1_01B4fs,MC1_01A10),MC1_01C1),(MC1_01A20,MC1_01TA1)); End; Nexus Make sure these match! No stop codons!
  • 24. HyPhy: Input file formats 5 100 I MC1_01B4fs TGCACTAATA ATCTGATT-- -------AAT ATCACTGAGA ATACTAATAA MC1_01A10 TGCACT---A ATCTGACAAA GGCTATTAAG ACCAATGGGA ATGCTAATAA MC1_01C1 TGCACTAATA ATCTGATT-- -------AAT ATCACTGAGA ATACTAATAA MC1_01A20 TGCACTAATA ATCTGACAAA GGCTAGTAAT GCCACTGAGA AGGCTAATAA MC1_01TA1 TGCACTAATA ATCTGATT-- -------AAT ATCACTGAGA ATACTAATAA TACCATTACT AAT------- --ACCACTAT TAATAGCGGG GGAGAAATAA TACCAGTACT AATGCCACTG AGAGTACCAT TACTAATACC ACTGAAATAA TACCATTACT AAT------- --ACCACTAT TAATAGCGGG GGAGAAATAA TACCATTACT AAT------- --ACCACTAT TGATAGCGGG GGAGAAATAA TACCATTACT AAT------- --ACCACTAT TGATAGCGGG GGAGAAATAA Sequence data GAPS! • Gaps must be in multiples of three and preserve the reading frame! • ML models cannot analyze gapped codon sites – they will be ignored for the analysis.
  • 26. Codon frequency models 0: 1/61 – All codons equally probable 1: F1x4 - Codon frequencies based on overall nucleotide frequencies (fA, fC, fG, fT) 2: F3x4 - Codon frequencies based on nucleotide frequencies at each codon site (fA1, fC1, fG1, fT1, fA2, fC2, fG2, fT2, fA3, fC3, fG3, fT3) 3: F61 - Codon frequencies based on frequency of each codon in the data (faaa, faac, faag, faat, faca, facc, facg, fact, faga, fagc, fagg, fagt, ...) F3x4 is used in the MEME analysis
  • 29. HyPhy: data output Codon alpha beta1 p1 beta2 p2 LRT p-value q-value Log(L) 125 0.0000 0.0000 0.7253 13.6010 0.2747 8.0176 0.0081 0.3258 -30.6957 153 0.9139 0.0000 0.8927 49.5335 0.1073 8.7797 0.0055 0.2580 -22.6063 163 0.6646 0.0000 0.8478 20.0270 0.1522 5.4309 0.0304 0.8559 -25.8135 188 0.0000 0.0000 0.9673 1883.3800 0.0327 9.7907 0.0033 0.1853 -12.7060 211 0.7245 0.0000 0.9660 4586.8100 0.0340 11.9000 0.0011 0.1062 -22.5928 218 0.8903 0.0000 0.9662 304.8150 0.0338 13.3192 0.0006 0.1555 -17.2640 234 0.0000 0.0000 0.8160 263.5910 0.1840 10.1891 0.0027 0.1893 -33.7409 237 0.0969 0.0969 0.7191 17.6831 0.2809 12.3769 0.0009 0.1252 -35.2664 239 0.6719 0.0000 0.8254 11.6736 0.1746 5.9541 0.0232 0.7269 -22.8152 264 0.0000 0.0000 0.9370 8.8091 0.0630 6.6746 0.0160 0.5655 -11.4430
  • 31. Selection Analysis Site vs. Lineage Selection Generally, if significant positive (or negative) selection occurs at only a few sites, averaging over the length of the sequences will result in no significant positive (or negative) selection being detected.
  • 32. Significance of Results Selection analysis only determines if there is a significant excess or lack of non- synonymous substitution. The relative level of physiochemical similarity among the two amino acids is beyond the capabilities of selection analysis software. Selection Analysis
  • 33. References Fundamentals of Molecular Evolution. 2nd edition. Grauer and Li. 2000 A good introduction to the major concepts in molecular evolution. The Phylogenetics Handbook. P. Lemey, editor. 2005. Chapter 14: Theory of and practice of diversifying selection analysis. MEME reference: Murrell B, et al. (2012) Detecting individual sites subject to episodic diversifying selection. PLoS Genetics, 8(7):e1002764.
  • 34. Seminar Follow-Up Site  For access to past recordings, handouts, slides visit this site from the NIH network: http://collab.niaid.nih.gov/sites/research/SIG/Bioinformatics/ 34 1. Select a Subject Matter View: • Seminar Details • Handout and Reference Docs • Relevant Links • Seminar Recording Links 2. Select a Topic Recommended Browsers: • IE for Windows, • Safari for Mac (Firefox on a Mac is incompatible with NIH Authentication technology) Login • If prompted to log in use “NIH” in front of your username
  • 38. 38 Next Molecular Evolutionary Analysis of Pathogens Using BEAST Thursday, 10 December at 1300