2018-05-24 Research update on Armadillo Repeat Proteins: Evolution and Design potential

Presented to the ZHAW Anisimova Group on 2018-05-24

Published in: Science

  1. ARMADILLO PROTEIN EVOLUTION & BINDING PREDICTION
     Spencer Bliven, Anisimova Journal Club, 2018-05-24
  2. ARMADILLO REPEAT PROTEINS
  3. WHY ARMADILLOS? BIOLOGICAL INTEREST
     • Roles in localization, protein transport, and more
       • β-catenin: cell adhesion, development
       • α-importin: nuclear localization
       • APC: tumor suppressor gene, linked to colorectal cancer
     • Probably homologous to the HEAT repeat family (slightly different structure)
       • huntingtin: Huntington's disease, nerve signaling, protein transport
       • β-importin: nuclear localization
       • phosphatases/kinases
     • Ancient family (primarily eukaryotic, but predates metazoans)
  4. WHY ARMADILLOS? PROTEIN-PROTEIN BINDING
     • Bind peptides, so could be used like antibodies or DARPins (therapeutics, biotech, assays, etc.)
     • Bind extended chains
       • Target disordered regions and termini
       • Linear epitope, so much easier to design
     • Modular binding
     Figure: structure [5AEI]
  5. ARMADILLO EVOLUTION
     Image: Armadillidium vulgare
  6. TANDEM REPEAT EVOLUTION
     • Duplications and fusions within a gene lead to tandem repeats
     • Speciation and gene duplication lead to orthologs and paralogs
     • The pattern of repeats tells us the sequence of evolutionary events
  7. HEAT & ARM
     Andrade MA, Petosa C, O'Donoghue SI, Müller CW, Bork P. Comparison of ARM and HEAT protein repeats. J Mol Biol. Academic Press; 2001 May 25;309(1):1–18.
  8. ARM FAMILY
     Figure: ARM-family phylogeny (Gul, I. S., Hulpiau, P., Saeys, Y., & van Roy, F. (2017). Cellular and Molecular Life Sciences, 74(3), 525–541). Clade labels: δ-catenins & ARM formins; β-catenin (not with δ-catenins); β-importin; HEAT (outgroup); catenin beta-like; α-importin
  9. LIMITATIONS OF PRIOR STUDIES
     • Don't model repeat evolution
       • Either use full-length sequences (no support for copy-number variation) or single repeats (inconsistent boundaries; repeats segregate differently between species)
       • No reconciliation between gene tree and repeat tree
     • Older papers use limited species and sequences
     • Inconsistent inclusion of HEAT repeats
     MY APPROACH
     • Detect repeats with TRAL (cpHMM)
     • Alignment and tree inference with ProGraphML+TR
     • Joint gene-tree and repeat-tree inference (future work)
  10. TRAL
      • Tandem Repeat Annotation Library
      • Circularly permuted hidden Markov model (cpHMM) for tandem repeat alignment
      • Integrates repeat detection software
      • Important for expanding analysis beyond the ArmRP family
      Schaper et al. (2015). TRAL: tandem repeat annotation library. Bioinformatics, 31(18), 3051–3053.
      Schaper E, Gascuel O, Anisimova M. Deep conservation of human protein tandem repeats within the eukaryotes. Mol Biol Evol. 2014 May;31(5):1132–48.
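
Why circular permutation matters: a tandem array has no privileged start, so parsing the same periodic region from a different offset yields a rotated repeat unit. The toy sketch below (plain Python; an illustration of the concept only, not TRAL's cpHMM or its API; `rotations` and `units_from_offsets` are hypothetical helper names) shows that every offset gives a rotation of the same unit, which is the boundary ambiguity a circularly permuted model is designed to absorb.

```python
def rotations(unit: str) -> list[str]:
    """All circular permutations (rotations) of a repeat unit."""
    return [unit[i:] + unit[:i] for i in range(len(unit))]


def units_from_offsets(region: str, period: int) -> list[str]:
    """Repeat unit obtained by starting the parse at each offset 0..period-1.

    For a perfectly periodic region, each choice is a rotation of the first
    unit, so where one repeat "begins" is an arbitrary choice.
    """
    return [region[i:i + period] for i in range(period)]


region = "ABCDABCDABCDAB"  # toy periodic sequence with period 4 (hypothetical)
units = units_from_offsets(region, 4)
print(units)                       # ['ABCD', 'BCDA', 'CDAB', 'DABC']
print(units == rotations("ABCD"))  # True
```

Real repeats are imperfect (substitutions, indels), which is why a probabilistic model rather than exact string matching is needed.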
  11. DETECTED REPEATS BY SPECIES (GUL HMM)
      Species                     ArmRP proteins
      Macrostomum lignano         170
      Echinostoma caproni         163
      Lingula anatina             125
      human                       107
      zebrafish                   107
      scaled quail                100
      tropical clawed frog         95
      owl limpet                   93
      starlet sea anemone          93
      Florida lancelet             90
      Japanese sea cucumber        84
      Schistocephalus solidus      84
      Octopus bimaculoides         82
      Biomphalaria glabrata        82
      purple sea urchin            81
      platypus                     75
      green sea turtle             75
      Stylophora pistillata        75
      wild Bactrian camel          72
      Amphimedon queenslandica     68
      Figure: histogram of ArmRP proteins per species (x: number of proteins; y: number of species), 94 species
  12. PROGRAPHML+TR
      Szalkowski AM, Anisimova M. Graph-based modeling of tandem repeats improves global multiple sequence alignment. Nucleic Acids Res. 2013 Sep;41(17):e162.
  13. OUTLOOK: EVOLUTION
      • Improve Arm profiles based on structural searches
        • MMTF-pySpark for rapid structural searches
      • Finish phylogenetic reconstruction with ProGraphML+TR on diverse species
      • Joint gene-repeat reconstruction
        • Analogous to joint species-gene tree inference (e.g. Szöllősi et al., 2015)
  14. ARM BINDING
  15–17. MOTIVATION
      Antibodies
      • Nature's solution to binding molecules
      • Used in diagnostics, therapy, labelling, biochemistry research
      • $105 billion industry (2016)
      • 3D epitope
      • Produced in vivo in animals (polyclonal), then optimized biochemically (monoclonal)
      DARPins
      • Designed Ankyrin Repeat Proteins
      • Developed by Andreas Plückthun, UZH
      • Commercialized by Molecular Partners AG ($571 million market cap)
      • Similar uses to antibodies
      • 3D epitope
      • Produced in vitro from a randomized library
      dArmRP
      • Designed Armadillo Repeat Proteins
      • Bind extended peptides (tails, disordered regions, denatured proteins)
      • 1D epitope
      • Rationally designed in silico?
  18. ARM STRUCTURE & CONSERVATION
      Figure: conservation alignment (Gul 2017, Fig. 1B) and the structure of one repeat from the designed ARM YIIIM5AII (Hansen…Plückthun, 2016) [5AEI], colored and labeled as in the alignment; helices H1, H2, H3 and the hydrophobic core are marked
  19. BINDING HINTS FROM dArmRP ((KR)n BINDING)
      Figure: conservation alignment (Gul 2017, Fig. 1B) and the structure of one repeat from the designed ARM YIIIM5AII (Hansen…Plückthun, 2016) [5AEI], colored and labeled as in the alignment; helices H1, H2, H3 and the hydrophobic core are marked
      • Nonspecific binding
      • Mutants available for 7 residues in the Arg pocket
      • Lys pocket has only one specific interaction
  20. BINDING MODULARITY
      • For dArmRP, binding energy scales linearly with the number of repeats and is additive over single-residue mutations
        • Predictable binding energies
        • Single-residue resolution
      Figure panels: K→A, R→A, 2K→2A, 2R→2A
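
The modularity claim can be written out explicitly. Assuming the standard relation between binding free energy and dissociation constant, ΔG = RT ln K_D (not stated on the slide), energies that add per repeat or per mutation become log10 K_D values that add as well. Here c_0 and c_1 are hypothetical fit constants, and "WT" denotes the unmutated reference complex:

```latex
\begin{aligned}
\Delta G &= RT \ln K_D = (RT \ln 10)\,\log_{10} K_D \\
\log_{10} K_D(n) &\approx c_0 + c_1\, n \qquad (n = \text{number of repeats}) \\
\Delta\Delta G_{K\to A} &= (RT \ln 10)\,\bigl[\log_{10} K_D^{K\to A} - \log_{10} K_D^{\mathrm{WT}}\bigr] \\
\Delta\Delta G_{2K\to 2A} \approx 2\,\Delta\Delta G_{K\to A}
&\iff \log_{10} K_D^{2K\to 2A} - \log_{10} K_D^{\mathrm{WT}} \approx 2\bigl[\log_{10} K_D^{K\to A} - \log_{10} K_D^{\mathrm{WT}}\bigr]
\end{aligned}
```

Comparing the single (K→A, R→A) and double (2K→2A, 2R→2A) mutant panels against this relation is a direct check of additivity.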
  21. KERNEL MODEL
      • Regression problem: predict binding affinity from the sequence at 7 positions
      • Extract 5 features per position based on amino acid properties (Atchley 2005)
      • Kernel ridge regression (linear regression with various kernels): $\widehat{\log_{10} Y} = K (K + \lambda I)^{-1} \log_{10} Y$
        • Linear kernel: $k(a, b) = a^{T} b$
        • Gaussian kernel: $k(a, b) = \exp\left(-\sigma \lVert a - b \rVert^{2}\right)$
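
A minimal NumPy sketch of the kernel model above, for orientation only. Assumptions: each of the 7 positions is encoded by 5 numeric descriptors, giving a 35-dimensional vector; the `DESCRIPTORS` table is a hypothetical placeholder filled with random numbers, not the published Atchley factors; `encode`, `fit` and `predict` are illustrative names, not the original code. Training-set predictions follow the formula K(K + λI)^{-1} y with y = log10 Y, computed with a linear solve rather than an explicit inverse.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
N_POSITIONS = 7   # variable residues used as model input (from the slide)
N_FACTORS = 5     # descriptors per residue, as with the Atchley factors

# HYPOTHETICAL placeholder: random numbers, NOT the published Atchley factors.
_rng = np.random.default_rng(0)
DESCRIPTORS = {aa: _rng.normal(size=N_FACTORS) for aa in AMINO_ACIDS}


def encode(seq: str) -> np.ndarray:
    """Concatenate per-residue descriptors into one 7 x 5 = 35-dim vector."""
    if len(seq) != N_POSITIONS:
        raise ValueError(f"expected {N_POSITIONS} residues, got {len(seq)}")
    return np.concatenate([DESCRIPTORS[aa] for aa in seq])


def linear_kernel(A: np.ndarray, B: np.ndarray) -> np.ndarray:
    """k(a, b) = a^T b for every row a of A and row b of B."""
    return A @ B.T


def gaussian_kernel(A: np.ndarray, B: np.ndarray, sigma: float) -> np.ndarray:
    """k(a, b) = exp(-sigma * ||a - b||^2) for every pair of rows."""
    sq = (A**2).sum(axis=1)[:, None] + (B**2).sum(axis=1)[None, :] - 2 * A @ B.T
    return np.exp(-sigma * np.maximum(sq, 0.0))  # clip tiny negative rounding


def fit(K: np.ndarray, y: np.ndarray, lam: float) -> np.ndarray:
    """Dual weights alpha = (K + lam*I)^{-1} y, via a linear solve."""
    return np.linalg.solve(K + lam * np.eye(len(y)), y)


def predict(K_new: np.ndarray, alpha: np.ndarray) -> np.ndarray:
    """Rows of K_new are k(x_new, X_train); on the training set this is K (K + lam*I)^{-1} y."""
    return K_new @ alpha


if __name__ == "__main__":
    # Synthetic demo only: random variant sequences and random targets, no real data.
    seqs = ["".join(_rng.choice(list(AMINO_ACIDS), size=N_POSITIONS)) for _ in range(10)]
    X = np.stack([encode(s) for s in seqs])
    y = _rng.normal(size=len(seqs))
    K = linear_kernel(X, X)
    alpha = fit(K, y, lam=1e-3)
    print("max training residual:", np.abs(predict(K, alpha) - y).max())
```

Working in the dual (one weight per training example) is what lets the linear and Gaussian kernels share the same `fit` and `predict` code.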
  22. RESULTS
      • Trained on 138 data points from the Plückthun group
        • Essentially all "positive" binding cases
      • Leave-one-out cross-validation for error estimation
      • Linear kernel: 0.42 standard error (log10 M units)
      • Gaussian kernel: 1.42, but numerically unstable
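
Leave-one-out error for this model does not require 138 separate refits. For kernel ridge regression the exact LOO residual is (y_i − ŷ_i)/(1 − H_ii), where H = K(K + λI)^{-1} is the hat matrix. A minimal NumPy sketch follows; it assumes "standard error" means the root-mean-square LOO prediction error, and the function names are hypothetical.

```python
import numpy as np


def hat_matrix(K: np.ndarray, lam: float) -> np.ndarray:
    """H = K (K + lam*I)^{-1}; K and K + lam*I commute, so the solve gives the same matrix."""
    return np.linalg.solve(K + lam * np.eye(K.shape[0]), K)


def loo_predictions(K: np.ndarray, y: np.ndarray, lam: float) -> np.ndarray:
    """Exact leave-one-out predictions without refitting.

    Uses y_i - yhat_(-i) = (y_i - yhat_i) / (1 - H_ii).
    """
    H = hat_matrix(K, lam)
    residual = y - H @ y
    return y - residual / (1.0 - np.diag(H))


def loo_summary(K: np.ndarray, y: np.ndarray, lam: float) -> tuple[float, float]:
    """(RMS LOO error, Pearson R between LOO predictions and measurements)."""
    pred = loo_predictions(K, y, lam)
    rmse = float(np.sqrt(np.mean((y - pred) ** 2)))
    r = float(np.corrcoef(y, pred)[0, 1])
    return rmse, r
```

The shortcut is exact for a fixed λ, so it also makes a grid search over λ cheap.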
  23. LINEAR KERNEL
      λ = 0.001; standard error 0.42 (log10 M units); R = 0.90
      Figure: scatter plot of predicted vs. measured binding (log10)
  24. GAUSSIAN KERNEL
      λ = 10^-4, σ = 10^8; standard error 1.42 (log10 M units); R = 0.13
      Figure: scatter plot of predicted vs. measured binding (log10)
  25. GAUSSIAN KERNEL
      • Numerically unstable implementation (the hat matrix is near-singular)
      • No renormalization currently
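
One standard remedy (a suggestion, not the group's implementation) is to avoid inverting K + λI at all. Diagonalize the symmetric kernel matrix, K = V diag(d) V^T, and form H = V diag(d/(d + λ)) V^T. Small negative eigenvalues are rounding noise for a positive semidefinite kernel, and clipping them keeps every factor d/(d + λ) in [0, 1) for λ > 0. The function names below are hypothetical.

```python
import numpy as np


def hat_matrix_eigh(K: np.ndarray, lam: float) -> np.ndarray:
    """H = K (K + lam*I)^{-1} = V diag(d / (d + lam)) V^T for symmetric K = V diag(d) V^T."""
    K = 0.5 * (K + K.T)              # remove rounding asymmetry before eigh
    d, V = np.linalg.eigh(K)
    d = np.clip(d, 0.0, None)        # a valid kernel matrix is PSD: negatives are noise
    shrink = d / (d + lam)           # each factor lies in [0, 1) when lam > 0
    return (V * shrink) @ V.T        # V diag(shrink) V^T


def condition_number(K: np.ndarray, lam: float) -> float:
    """Condition number of K + lam*I; very large values flag an ill-conditioned solve."""
    return float(np.linalg.cond(K + lam * np.eye(K.shape[0])))
```

In a grid search over λ, the eigendecomposition could be computed once and reused for every λ; the sketch recomputes it per call for brevity.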
  26. OUTLOOK: BINDING
      • Switch from regression to classification
      • Additional training data from collaborators
      • In particular, need non-binding examples
      • More sophisticated classifiers
      • Numerically stable implementation
      • Better kernels?
      • Proactively suggest informative instances for our collaborators to measure
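
For the last point, one common heuristic is uncertainty sampling. This is an assumption here, not a plan stated in the talk. Kernel ridge regression gives the same predictions as the posterior mean of a Gaussian process with noise variance λ. The GP posterior variance k(x, x) − k_x^T (K + λI)^{-1} k_x can therefore rank unmeasured variants, and the most uncertain ones are proposed for measurement. Below is a NumPy sketch with hypothetical names, using the Gaussian kernel from the kernel-model slide:

```python
import numpy as np


def gaussian_kernel(A: np.ndarray, B: np.ndarray, sigma: float) -> np.ndarray:
    """k(a, b) = exp(-sigma * ||a - b||^2) for every pair of rows."""
    sq = (A**2).sum(axis=1)[:, None] + (B**2).sum(axis=1)[None, :] - 2 * A @ B.T
    return np.exp(-sigma * np.maximum(sq, 0.0))


def posterior_variance(X_train, X_cand, lam, sigma):
    """GP posterior variance at candidate points (noise variance lam, unit prior scale)."""
    K = gaussian_kernel(X_train, X_train, sigma)
    K_c = gaussian_kernel(X_cand, X_train, sigma)                      # (n_cand, n_train)
    solved = np.linalg.solve(K + lam * np.eye(len(X_train)), K_c.T)   # (n_train, n_cand)
    prior = np.ones(len(X_cand))                                       # k(x, x) = 1 here
    return prior - np.einsum("ij,ji->i", K_c, solved)                  # k_x^T (K+lam I)^-1 k_x


def suggest(X_train, X_cand, n, lam=1e-2, sigma=1.0):
    """Indices of the n candidates with the largest posterior variance."""
    var = posterior_variance(X_train, X_cand, lam, sigma)
    return np.argsort(var)[::-1][:n]
```

Pure variance ranking ignores the predicted affinity. If the goal is finding binders rather than improving the model, a criterion that mixes mean and variance may be preferable.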
  27. THANKS!
      • ACGTeam: Maria Anisimova, Manuel Gil, Victor Garcia, Lorenzo Gatti, Max Maiolo, Simone Ulzega, Erich Zbinden
      • Matteo Delucci & Lina Naef (ACLS masters) – TRAL
      • Elke Schaper – TRAL
      • Somayeh Danafar – Kernel methods
      • Andreas Plückthun, Patrick Ernst, Yvonne Stark (UZH) – Binding data
      But wait, there's more! MMTF format coming next…
  28. MMTF DEMO?
  29. MMTF
      • Compression: data normalization, vectorization, run-length encoding, delta encoding
      • Optional lossy/coarse representation
      • Now has widespread software support (Biopython, BioJava, most molecular viewers, etc.)
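
Two of the compression steps listed above are easy to illustrate on integer arrays. This minimal Python sketch assumes values have already been converted to integers (MMTF, for example, stores coordinates as integers scaled by 1000). It shows the ideas only and is not the MMTF binary layout or any library's API; the function names are hypothetical. Combining the two steps (delta first, then run-length) shrinks consecutive numbering such as residue IDs to a handful of integers:

```python
def delta_encode(values: list[int]) -> list[int]:
    """First value, then the difference between each pair of neighbours."""
    if not values:
        return []
    return [values[0]] + [b - a for a, b in zip(values, values[1:])]


def delta_decode(deltas: list[int]) -> list[int]:
    """Running sum restores the original values."""
    out, total = [], 0
    for d in deltas:
        total += d
        out.append(total)
    return out


def run_length_encode(values: list[int]) -> list[int]:
    """Flat [value, count, value, count, ...] list."""
    out: list[int] = []
    for v in values:
        if out and out[-2] == v:
            out[-1] += 1          # extend the current run
        else:
            out.extend([v, 1])    # start a new run
    return out


def run_length_decode(pairs: list[int]) -> list[int]:
    out: list[int] = []
    for v, n in zip(pairs[::2], pairs[1::2]):
        out.extend([v] * n)
    return out


residue_ids = [1, 2, 3, 4, 5, 10, 11, 12]   # toy numbering with a gap
encoded = run_length_encode(delta_encode(residue_ids))
print(encoded)   # [1, 5, 5, 1, 1, 2]
assert delta_decode(run_length_decode(encoded)) == residue_ids
```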
  30.
      • MMTF format: http://mmtf.rcsb.org/
      • MMTF-Spark library:
        • Java: https://github.com/sbl-sdsc/mmtf-spark/
        • Python: https://github.com/sbl-sdsc/mmtf-pyspark/
        • Fast, parallelized whole-PDB analysis
  31. CE-SYMM OPEN REPEAT DETECTION
      Figure: β-catenin [1I7X]
  32. This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.
