A simple method for incorporating sequence information into directed evolution experiments
Kyle L. Jensen*, Hal Alper*, Curt Fischer, Gregory Stephanopoulos
Department of Chemical Engineering, Massachusetts Institute of Technology
[Figure: sequence, phenotype]
When screening throughput is limited, linking sequence to phenotype can help direct downstream searches
(no selectable trait)
Here, a PLtet promoter was mutated to create a library of promoter variants
Alper, H., C. Fischer, E. Nevoigt, and G. Stephanopoulos. 2005. Tuning genetic control through promoter engineering. Proc. Natl. Acad. Sci. USA 102:12678-83.
69 promoter variants were created using error-prone PCR
The 69 promoter variants spanned an 800-fold range of activity
- How different are the underlying, mutagenized sequences?
- What, on a sequence level, causes the variation?
[Figure: log relative fluorescence vs. mutant number, showing the 800-fold range; mutants divided into top 50% and bottom 50% classes]
Each of the 69 mutants had a unique sequence and incorporated multiple transition SNPs
[Figure: mutations across the promoter region, position [nt] vs. mutant number; log relative fluorescence vs. mutant number]
The effects of individual mutations were “masked” by the presence of other mutations
If a mutation occurs more frequently in one class, is it necessarily correlated with that class?
Is the ratio of top/bottom important?
What is the statistical significance of a mutation that is distributed between the two classes?
Some mutations have obvious effects... most do not
[Figure: position [nt] vs. mutant number, with the class distribution at each position]
Each individual position can be evaluated using a simple binomial distribution
- Same as: what's the probability of getting heads in 14 of 20 coin tosses?
- P-value: 14 or more heads out of 20
- Assuming the positions are independent
[Figure: class distribution vs. position [nt]]
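The coin-toss analogy can be checked numerically. The following is a minimal sketch, assuming a fair-coin null (probability 0.5 of a mutant falling in either class, as with a top 50% / bottom 50% split) and a one-sided tail. The function name `binomial_tail` is illustrative and does not come from the presentation.

```python
from math import comb

def binomial_tail(k, n, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p): chance of k or more 'heads' in n tosses."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# The slide's example: 14 or more heads out of 20 tosses of a fair coin.
p_value = binomial_tail(14, 20)
print(f"{p_value:.4f}")  # about 0.0577
```

Read this way, a position mutated in 20 mutants, 14 of which fall in one class, would have a one-sided p-value of roughly 0.058 under the stated assumptions.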
A similar analysis over the promoter region revealed 7 positions significantly correlated with activity
[Figure: class distribution vs. position [nt]]
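Scanning every position in this way might look like the sketch below. It assumes a 50/50 null for the two classes, a one-sided tail on whichever class holds the majority, and a cutoff of 0.05; the cutoff, the function names, and the toy counts are all hypothetical and are not data from the presentation.

```python
from math import comb

def binomial_tail(k, n, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

def significant_positions(class_counts, alpha=0.05):
    """Return {position: p_value} for positions whose majority-class count
    is unlikely under a 50/50 null (one-sided tail on the majority class)."""
    hits = {}
    for position, (top, bottom) in class_counts.items():
        n = top + bottom
        p_value = binomial_tail(max(top, bottom), n)
        if p_value < alpha:
            hits[position] = p_value
    return hits

# Hypothetical toy counts, NOT from the presentation:
# position -> (mutants in top 50%, mutants in bottom 50%)
toy_counts = {12: (1, 11), 35: (9, 2), 60: (4, 5)}
print(significant_positions(toy_counts))
```

Each position is treated independently here, matching the independence assumption stated on the binomial slide.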
[Figure: class distribution vs. position [nt]; position [nt] vs. mutant number; log relative fluorescence vs. mutant number]
A similar analysis can be applied to an arbitrary number of mutants and phenotypic classes
[Figure: mutants 1, 2, ..., M sorted into phenotypic classes 1, 2, ..., Y; the mutants with mutations at “position 35” are highlighted]
The generalized probability of the phenotype distribution can be used to find mutation-phenotype correlations
Probability of a particular color (class) distribution vector
Significance of a correlation between mutations at “position 35” and the green phenotypic class
Prior probability of
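The equations on this slide did not survive in the transcript. As a rough illustration only, the sketch below uses standard multinomial statistics: the probability of a particular vector of class counts, and a one-sided significance for a single class, using the fact that the count in one multinomial class is binomially distributed. The function names, the equal class priors, and the example counts are assumptions, not the presenters' formulas or data.

```python
from math import comb, factorial, prod

def multinomial_pmf(counts, priors):
    """Probability of observing exactly `counts` mutants in each phenotypic
    class, given the prior probability `priors` of each class."""
    n = sum(counts)
    coefficient = factorial(n)
    for c in counts:
        coefficient //= factorial(c)  # exact: each partial quotient is an integer
    return coefficient * prod(p**c for p, c in zip(priors, counts))

def class_enrichment_p_value(counts, priors, class_index):
    """P(count in one class >= observed). The marginal count of a single
    multinomial class is Binomial(n, prior of that class)."""
    n, k, p = sum(counts), counts[class_index], priors[class_index]
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# Hypothetical example: 10 mutants carry a mutation at one position and are
# spread over three phenotypic classes assumed to be equally likely a priori.
counts = [7, 2, 1]
priors = [1 / 3, 1 / 3, 1 / 3]
print(multinomial_pmf(counts, priors))
print(class_enrichment_p_value(counts, priors, class_index=0))
```

With two classes and priors of 0.5 each, both functions reduce to the binomial coin-toss case.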
In our case, we tested 8 locations, spanning a range of functions and confidences
[Figure: class distribution vs. position [nt]]
7 of the 8 single-position mutants agreed with the predicted function
Rationally designed promoters with combinations of mutations showed predicted activity but also signs of site interaction
In summary, this simple method, based on multinomial statistics, can be used to link sequence variations to particular phenotypes