Structural Understanding of
Intrinsically Disordered Proteins
Joe Passman
April 26, 2013
6/24/2013
While I Have Your Attention…

Overview

Rise, Prevalence, and
Possible Roles of Disorder in
Proteins
Background
Structure / Function Paradigm
[Diagram: in the nucleus, DNA -> transcription -> RNA -> splicing -> mRNA -> export -> mRNA -> translation -> peptide -> folding -> 3D protein -> output; one step is marked "???"]
Disorder Becomes Apparent
• Early discovery
  • Bovine serum albumin binding sites (Karush, 1950)
• Later…
  • Rapid rise of genomic data (~1990)
  • Predictors of Naturally Disordered Regions (PONDRs)
  • Early proton NMR experiments (Daniels et al, 1978)
[Figure: number of publications per year, 1978 to 2013; y-axis from 0 to 800]
Disorder, Disorder, (Most) Everywhere!

Hosoda et al, 2011
Why Did We Miss It?
• Unobserved
  • Bias of experiment
  • Access to genomic data limited before ~1990
  • Crystal structure relatively uninformative
• Ignored
  • Crystal structure artifacts dismissed
  • Disorder thought to be an artifact
Dyson & Wright, 2005
What is Disorder in Proteins?
• Definition:
  • A protein that does not adopt a well-defined native structure when isolated in solution under near-physiological conditions (Eliezer, 2009)
• 2 types
  • Denatured state ensembles (DSEs)
  • Intrinsically disordered proteins (IDPs)
    • Vast and malleable configurational ensembles (CEs)
    • Charged
• What can impact disorder?
Why are IDPs Interesting?
Diverse Roles!
• Regulatory
  • Homeostasis of signaling pathways
  • Translation/Transcription
• Structural
  • Flexible linkers
AND… they can kill you.
• Disease states
  • Cancer (lack of cell cycle regulation)
  • Brain (amyloid plaque formation)
Lee et al, 2003
Proposed Mechanisms
• Regulation
  • Folding upon binding
  • Highly specific / low affinity binding
  • Multiple interaction sites
• Aggregation
Dyson & Wright, 2005; Schärpf et al, 2001
Background
Structure
• 107 residues (1799 atoms)
• Positively charged side chains
• Proportion of arginine to lysine: 22%
Structure     Position   Length (residues)
Helix         4-10       7
Helix         12-20      9
Helix         23-25      3
Beta Strand   26-29      4
Visual legend: Beta Sheet, Alpha Helix
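Composition figures like those on this slide can be computed directly from a one-letter amino acid sequence. The sketch below is only an illustration: the sequence is a made-up placeholder (not the actual N protein sequence), the function name is hypothetical, and it reports the fraction of residues that are arginine or lysine, which is only one possible reading of the 22% figure above.

```python
from collections import Counter


def basic_residue_stats(sequence):
    """Count Arg (R) and Lys (K) in a one-letter amino acid sequence."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    counts = Counter(seq)
    n_arg, n_lys = counts["R"], counts["K"]
    n_acidic = counts["D"] + counts["E"]
    return {
        "length": len(seq),
        "arg": n_arg,
        "lys": n_lys,
        # Fraction of all residues that are Arg or Lys.
        "basic_fraction": (n_arg + n_lys) / len(seq),
        # Crude net charge: +1 per R/K, -1 per D/E.
        # Ignores His, the termini, and pH (assumption).
        "net_charge": n_arg + n_lys - n_acidic,
    }


# Hypothetical 14-residue sequence, for illustration only.
EXAMPLE_SEQ = "MKRAEDKRGSLRKA"
stats = basic_residue_stats(EXAMPLE_SEQ)
print(stats)
```

For a real protein, the same function could be applied to the sequence taken from a database entry; the crude net charge is only a first indication of how strongly the chain might interact with the negatively charged RNA backbone.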
Function
• Transient complex
• Interacts with
  • RNA
  • RNA polymerase
• End result
  • Prevent termination of transcription
Goldenberg, 2012
More on Interaction with RNA
Schärpf et al, 2001
• Regulatory function
• Multiple interaction partners
• Extensively unfolded in isolation
• Flexible structure
Background
Questions, hypothesis, and
goals
What Are We Trying to Address?

Goldenberg, 2011
Hypothesis/Expected Outcomes

Goals
• Provide a set of atomistic properties
• Quantify correspondence with macroscopic ensemble-averaged experimental data
• Develop reference point for crowding studies
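One atomistic property that connects simulated conformations to ensemble-averaged measurements is the radius of gyration. The sketch below is a minimal illustration rather than the method used in this work: the function names and the toy two-atom "conformations" are hypothetical, and it reports the root-mean-square R_g over the ensemble, on the assumption that this is the quantity to compare with a scattering-derived R_g.

```python
import math


def radius_of_gyration(coords, masses=None):
    """R_g of one conformation; coords is a list of (x, y, z) tuples."""
    n = len(coords)
    if masses is None:
        masses = [1.0] * n  # equal weights if no masses are given
    total = sum(masses)
    # Mass-weighted centre of the conformation.
    com = [sum(m * c[i] for m, c in zip(masses, coords)) / total
           for i in range(3)]
    # Mass-weighted mean squared distance from that centre.
    sq = sum(m * sum((c[i] - com[i]) ** 2 for i in range(3))
             for m, c in zip(masses, coords))
    return math.sqrt(sq / total)


def ensemble_rms_rg(ensemble, masses=None):
    """Root-mean-square R_g over a list of conformations."""
    rg_squared = [radius_of_gyration(c, masses) ** 2 for c in ensemble]
    return math.sqrt(sum(rg_squared) / len(rg_squared))


# Toy ensemble: two two-atom conformations (illustrative values only).
compact = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
extended = [(0.0, 0.0, 0.0), (4.0, 0.0, 0.0)]
rms_rg = ensemble_rms_rg([compact, extended])
print(round(rms_rg, 4))
```

In practice the ensemble would be a set of simulation snapshots, and the resulting value would be set against the experimental estimate to quantify the correspondence named in the goals.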
Context: Alzheimer’s Disease
NIH / National Institute on Aging

A Basic Understanding of Intrinsically Disordered Proteins

Editor's Notes

  • #3 Disorder is prevalent in the eukaryotic proteome, and its functions include participation in critical cellular control mechanisms (binding for termination activation / suppression), use as flexible linkers, channel transport, etc. The importance of the λ N protein is that it provides a route for the suppression of the Rho protein during RNA polymerization through highly specific, low-affinity interactions. Importance of macromolecules in (thermo)dynamic aspects of folding.
  • #4 Drive Home Points of Talk: Disorder is prevalent in the eukaryotic proteome, and its functions include participation in critical cellular control mechanisms (binding for termination activation / suppression), use as flexible linkers, channel transport, etc. The importance of the λ N protein is that it provides a route for the suppression of the Rho protein during RNA polymerization through highly specific, low-affinity interactions. Importance of macromolecules in (thermo)dynamic aspects of folding.
  • #6 Standard model: Transcription of DNA -> Splicing -> Movement through nuclear membrane -> Translation at the ribosome -> Folding -> Function. Is this model always true? Massive 3-D structure-function relationship studies have been conducted examining a quantitative relationship (Hvidsten et al, 2009). BUT: They concluded a positive correlation between disorder and some functions, such as dimerization, cation channel activity, and transcription coactivator
  • #7 Early discovery: Karush found a Gaussian-type description of binding site free energies for serum albumin. Ignored: Perhaps one reason was a belief that disorder is an artifact because protease digestion would eliminate such proteins in vivo. Later: Other indicators of disorder in the proteome. A series of PONDRs have been developed that take amino acid sequence inputs and give disorder tendency outputs (6, 9, 10, 13, 14). The various PONDRs are distinguished by training sets, data representations for their inputs, and machine learning models for their development. PONDRs may underestimate disorder in vivo due to the presence of the regions involved in ligand binding or crystal contacts. 11% of a dataset containing more than 17 000 disordered residues from over 140 proteins were falsely predicted to be in correspondingly long ordered regions. Daniels et al performed an NMR study that found that the major part of the membrane network is the interaction of the chromogranin protein, shown to have approximately a random coil conformation, with the adenosine triphosphate and the adrenaline. This suggests that the purpose of the organized loose structure would appear to be to lower osmotic pressure without hindering rapid release of the vesicle contents on breaking the membrane. NMR spectroscopy is effective in identifying local structure.
  • #8 If disorder was a snake, it would have bitten us. Eukaryotes are hypothesized to exhibit more disorder due to its strong correlation with regulatory and signaling functions. Some exceptions to complexity of organism -> more disorder: single-celled eukaryotes with host-changing lifestyle. Typically > 30% of eukaryotic proteins have disordered regions of length ≥ 50 consecutive residues (Dunker, 2001). Picture: Hosoda et al developed a Dichotomic System to Divide Proteins into Structural/un-structural Regions (DICHOT) to classify the entire protein sequence into structural domains (SDs) and intrinsically disordered (ID) regions (Fukuchi, 2009). The new dichotomic system first identifies domains of known structures, followed by assignment of structural domains and ID regions with a combination of pre-existing tools and a newly developed program based on sequence divergence, taking un-aligned regions into consideration. Application of this system to human TFs (401 proteins) showed that 38% of the residues were in SDs, while 62% were in ID regions. The abundance of ID regions makes a sharp contrast to TFs of Escherichia coli (229 proteins), in which only 5% fell in ID regions.
  • #9 No observation. Bias of experiment: The process of making a homogenate of plant / animal tissue releases proteases. Under these conditions, unfolded proteins are more sensitive to proteolytic activity. Genomic mapping: Sequence to structure mapping is difficult unless the sequence has a high level of homology to sequences with known structure. Even then… not all mRNAs are actively translated at any particular time, and IDPs are transient. Crystal structure uninformative: Only indicates presence of disorder through absence of electron density. May have bound ligand -> leads to induced order. Missing loops in the structure: gaps may be alerting us to the presence of intrinsically disordered loops. Such gaps are the basis for the DISOPRED2 disorder prediction server. Ignored. Crystal structure artifacts: These "gaps" in the model are often thought to be artifacts of inadvertent disorder in the crystal. Artifact: because protease digestion would eliminate such proteins in vivo.
  • #10 DSEs – normally fold. IDPs – normally do not fold. What can impact disorder? Changes in solution conditions.
  • #12 Regulation: Folding upon binding supported by chemical shifts in NMR spectra, especially for
  • #14 Numerous specific and nonspecific contacts between the positively charged side chains of these amino acids and the negatively charged phosphodiester backbone of the RNA are possible, giving a likely explanation for complex aggregation at relatively low concentrations of 300 µM.
  • #15 Termination mechanism unknown
  • #16 N-terminal segment binds to RNA. C-terminal segment interacts with RNAP.
  • #19 The figure above is a cartoon representation of one of the unfolded proteins we study, the N protein of bacteriophage lambda (blue), in a solution containing about 130 mg/mL of a small globular protein (orange). The blue sphere has the same volume as the unfolded protein. We are using small-angle neutron scattering (SANS) to study the behavior of unfolded proteins under conditions like those depicted in the figure, and we use computer simulations to test various ideas about unfolded proteins. One of the important results to emerge from these studies is the finding that crowding causes surprisingly little effect on the average dimensions of the N protein, and likely other unfolded proteins.
  • #20 R_g from SAXS data
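A common way to estimate R_g from small-angle scattering data is the Guinier approximation, I(q) ≈ I(0) exp(-q² R_g² / 3), which holds only at small q. The sketch below is an illustration under stated assumptions, not the analysis used for this talk: the function name is hypothetical, the data are synthetic and noise-free, and the choice of q-range (here q·R_g below about 1) is an assumption that matters in practice, especially for unfolded chains.

```python
import math


def guinier_fit(q, intensity):
    """Least-squares line through ln I versus q^2; returns (R_g, I(0))."""
    x = [qi ** 2 for qi in q]
    y = [math.log(i) for i in intensity]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    if slope >= 0:
        raise ValueError("non-negative slope: data are not Guinier-like")
    # Guinier: ln I = ln I(0) - (R_g^2 / 3) q^2, so slope = -R_g^2 / 3.
    return math.sqrt(-3.0 * slope), math.exp(intercept)


# Synthetic, noise-free data (illustrative values, not measurements).
TRUE_RG = 30.0   # in Angstrom, assumed for the example
TRUE_I0 = 100.0  # arbitrary units
q_values = [0.005 + 0.0025 * k for k in range(11)]  # up to 0.03 1/Angstrom
i_values = [TRUE_I0 * math.exp(-(qv * TRUE_RG) ** 2 / 3.0) for qv in q_values]

rg_fit, i0_fit = guinier_fit(q_values, i_values)
print(round(rg_fit, 3), round(i0_fit, 3))
```

With real data the fit would be restricted to the low-q points and checked for linearity in the ln I versus q² plot before the resulting R_g is compared with simulation.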