Mol. Cells 2020; 43(11): 899-908
Published online November 26, 2020
https://doi.org/10.14348/molcells.2020.0186
© The Korean Society for Molecular and Cellular Biology
Correspondence to: junseockkoh@snu.ac.kr
Intrinsically disordered proteins or regions (IDPs or IDRs) are widespread in the eukaryotic proteome. Although lacking stable three-dimensional structures in the free forms, IDRs perform critical functions in various cellular processes. Accordingly, mutations and altered expression of IDRs are associated with many pathological conditions. Hence, it is of great importance to understand at the molecular level how IDRs interact with their binding partners. In particular, discovering the unique interaction features of IDRs originating from their dynamic nature may reveal uncharted regulatory mechanisms of specific biological processes. Here we discuss the mechanisms of the macromolecular interactions mediated by IDRs and present the relevant cellular processes including transcription, cell cycle progression, signaling, and nucleocytoplasmic transport. Of special interest is the multivalent binding nature of IDRs driving assembly of multicomponent macromolecular complexes. Integrating the previous theoretical and experimental investigations, we suggest that such IDR-driven multiprotein complexes can function as versatile allosteric switches to process diverse cellular signals. Finally, we discuss the future challenges and potential medical applications of the IDR research.
Keywords: allostery, coupled folding and binding, dynamic binding, intrinsically disordered proteins or regions, macromolecular complex, multivalent binding
In the classical structure-function paradigm, a protein must fold into a well-defined three-dimensional structure in order to carry out its function. The landmark achievements underpinning this paradigm were the determination of the first three-dimensional protein structures (myoglobin and hemoglobin) by Kendrew (Kendrew et al., 1958) and Perutz (Perutz et al., 1960). In parallel, the elegant biochemical experiments by Anfinsen and colleagues further demonstrated that the native three-dimensional structure of a protein is determined by its amino acid sequence (Anfinsen et al., 1961). For more than half a century, the structure-function paradigm has been one of the most fundamental frameworks for understanding complex biological processes at the molecular level. However, in the mid-1990s, with advances in bioinformatics, it was realized that a significant portion of the proteomes from various species contains proteins, or regions within a protein, whose amino acid compositions are distinct from those of ordered globular proteins (Romero et al., 1998; Wootton, 1994). Concurrently, nuclear magnetic resonance (NMR) experiments identified regulatory proteins that are disordered but fully functional under physiological conditions (Daughdrill et al., 1997; Kriwacki et al., 1996). Following these counterintuitive discoveries, numerous biophysical and bioinformatic investigations have accumulated substantial evidence demonstrating the prevalence and the biological significance of disordered proteins, collectively termed intrinsically disordered proteins (IDPs) or regions in a protein (IDRs) (Dunker et al., 2002; Uversky, 2002; van der Lee et al., 2014; Wright and Dyson, 1999).
About a third of the eukaryotic proteome contains IDRs of 30 or more residues in length (Oates et al., 2013; Ward et al., 2004) that are particularly abundant in the proteomes associated with dynamic cellular processes including gene expression and signaling as well as with cancer and neurodegenerative diseases (Babu et al., 2011; Iakoucheva et al., 2002; Liu et al., 2006; Uversky et al., 2008; Xie et al., 2007).
In order to fully appreciate the functional role of IDRs in such diverse biological processes, it is necessary to understand how IDRs interact with other macromolecules. Of particular interest are the unique interaction features of IDRs that arise from conformational disorder and potentially provide functional advantages in specific cellular processes. In this review, we briefly survey the sequence and conformational features of IDRs and then discuss the mechanisms of IDR-mediated macromolecular interactions that are distinct from those of ordered proteins. To expedite the discussion, the mechanisms are classified into three broad categories: i) coupled folding and binding, ii) dynamic binding, and iii) multivalent binding (Fig. 1). We also present exemplary cellular processes that are exquisitely regulated by these interaction principles. In particular, we focus on the third principle, which enables IDRs to drive assembly of multicomponent macromolecular complexes that potentially function as manifold allosteric switches to process and integrate diverse cellular signals. The more comprehensive physical and biological attributes of IDRs, including multisite posttranslational modifications and liquid-liquid phase separation, have been discussed in many excellent reviews (Csizmok et al., 2016; Uversky, 2018; van der Lee et al., 2014; Wright and Dyson, 2015; Wu and Fuxreiter, 2016).
The amino acid sequences of IDRs exhibit compositional biases distinct from those of ordered proteins (Radivojac et al., 2007; Uversky et al., 2000; Weathers et al., 2004). In general, IDR sequences are enriched in polar and charged amino acids and contain bulky hydrophobic amino acids in low abundance. These sequence characteristics directly dictate the conformational features of IDRs. One of the major driving forces in protein folding is the hydrophobic effect, which leads to compaction of a polypeptide chain into a stable tertiary structure that buries hydrophobic amino acids from the aqueous solvent (Chothia, 1974; Spolar and Record, 1994). The hydrophobic effect and other noncovalent interactions cooperatively stabilize the folded form, minimizing the free energy of the polypeptide, a process best represented by funnel-shaped energy landscapes (Dill and Chan, 1997; Onuchic et al., 1995). In contrast, folding of an IDR into a unique and stable tertiary structure is unfavorable due to the low abundance of hydrophobic amino acids. Instead, IDRs interact with the solvent through hydrophilic residues and dynamically interconvert among multiple conformational states (Csizmok et al., 2016; Dyson and Wright, 2005; Mukhopadhyay et al., 2007). Therefore, the typical energy landscape of an IDR is flat with multiple local minima (heterogeneity) that are separated by low activation energy barriers (fast interconversion) (Fisher and Stultz, 2011; Papoian, 2008; Wei et al., 2016).
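The contrast between a funnel-shaped and a flat energy landscape can be illustrated with a simple Boltzmann calculation. The following is a minimal sketch, not a model of any particular protein: the four-state landscapes and their free energy values are hypothetical numbers chosen only to show how a single deep minimum concentrates the population in one state, whereas several shallow minima spread it across many states.

```python
import math

R = 8.314e-3  # gas constant in kJ/(mol*K)


def boltzmann_populations(free_energies_kj, temperature_k=298.0):
    """Return equilibrium populations of states with the given free energies."""
    rt = R * temperature_k
    lowest = min(free_energies_kj)
    # Shift by the lowest energy so the largest weight is exactly 1.
    weights = [math.exp(-(g - lowest) / rt) for g in free_energies_kj]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical free energies (kJ/mol), for illustration only.
funnel = [0.0, 20.0, 22.0, 25.0]  # one deep minimum, as in a folded protein
flat = [0.0, 1.0, 1.5, 2.0]       # several shallow minima, as in an IDR

funnel_pop = boltzmann_populations(funnel)
flat_pop = boltzmann_populations(flat)

print("funnel:", [round(p, 3) for p in funnel_pop])
print("flat:  ", [round(p, 3) for p in flat_pop])
```

With these assumed values, the funnel-like landscape places nearly all molecules in the lowest-energy state, while no single state dominates the flat landscape, which is the heterogeneity described above.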
The majority of IDRs utilize linear peptide motifs to interact with their target macromolecules. These linear motifs are categorized into two broad groups, namely short linear motifs (SLiMs) and molecular recognition features (MoRFs) (Davey et al., 2012; Mohan et al., 2006; Oldfield et al., 2005; Tompa et al., 2014). While SLiMs contain 3-10 amino acids, of which 3-4 residues directly determine the binding specificity, MoRFs are longer peptide motifs with 10-70 amphipathic amino acids. All MoRFs and selected SLiMs undergo disorder-to-order transitions upon binding to their targets, a process termed coupled folding and binding (Dyson and Wright, 2002; Spolar and Record, 1994). The ordered structures adopted by IDRs include α-helices, β-strands, and rigid loops. The coupled folding and binding mechanism has been extensively investigated for the interactions of the transcriptional co-activator CBP (CREB-binding protein) with its various binding partners (Dyson and Wright, 2016). A representative example is the binding of the phosphorylated KID (kinase-inducible domain) of CREB (cAMP-response element-binding protein) to KIX (KID interacting domain) of CBP. While pKID is disordered in the free form, it folds into two helices upon binding to KIX (Fig. 1) (Radhakrishnan et al., 1997; Sugase et al., 2007).
A suggested functional role of coupled folding and binding is to achieve the structural complementarity or specificity of the binding interface formed between an IDR and its target protein or DNA (Spolar and Record, 1994; Wright and Dyson, 1999; 2009). From the thermodynamic point of view, folding of a disordered motif into a well-defined structure is entropically unfavorable and must reduce the overall binding affinity. Such an entropic cost has been predicted to fine-tune the affinity between strong and weak binding in order to achieve both the binding specificity and reversibility required for robust and rapid signaling processes in the cell (Dyson and Wright, 2005; Fuxreiter et al., 2004; Zhou, 2012).
Some IDRs transiently adopt secondary structures in the free states that closely resemble the conformations of the target-bound states (Tompa, 2002). These observations invoked a conformational selection mechanism in which the preformed structures are the binding-competent states and are therefore selected from among the other disordered states by the binding partners (Fuxreiter et al., 2004). The physiological significance of the preformed structures of IDRs was suggested by an investigation of the interaction between the E3 ubiquitin ligase Mdm2 and the N-terminal transactivation domain (NTAD) of the p53 tumor suppressor protein (Borcherds et al., 2014). In the free form, NTAD is mostly disordered with low helical content, but it folds into an amphipathic helix upon binding Mdm2. Mutating one of the conserved flanking prolines (Pro27) to alanine increased the population of the helical form and the affinity of NTAD for Mdm2 by an order of magnitude, likely due to the reduction in the entropic cost of folding. However, the increased affinity altered the stability of p53, impaired target gene expression, and ultimately caused failure to induce cell cycle arrest upon DNA damage. Such a dramatic effect strongly suggests that the abundance of the preformed structure is optimized in the conformational ensemble of NTAD for signaling fidelity. However, mutational or protein engineering data can be ambiguous to interpret, in part because these molecular interventions can affect not only the equilibrium for the formation of the preformed structure but also the binding equilibrium itself. Thus, in order to properly assess the functionality of the preformed structures, it is critical to combine spectroscopic and thermodynamic approaches to dissect the observed effects into these two (or more) contributions.
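The link between the population of a preformed structure and the observed affinity can be made explicit under a deliberately simplified assumption: pure conformational selection, in which only the preformed helix binds and the intrinsic affinity of that helix for the target is unchanged by the mutation. In that limit, the apparent dissociation constant equals the intrinsic dissociation constant divided by the helical fraction of the free IDR. The sketch below uses hypothetical numbers (an intrinsic Kd of 10 nM and helical fractions of 5% and 50%); they are not taken from the studies cited above.

```python
def apparent_kd(intrinsic_kd, helical_fraction):
    """Apparent Kd when only the preformed helix is binding competent.

    Assumes pure conformational selection: Kd_app = Kd_intrinsic / f_helix,
    where f_helix is the fraction of free IDR in the helical state.
    """
    if not 0.0 < helical_fraction <= 1.0:
        raise ValueError("helical_fraction must be in the interval (0, 1]")
    return intrinsic_kd / helical_fraction


# Hypothetical values for illustration only (Kd in nM).
intrinsic_kd_nm = 10.0
kd_low_helicity = apparent_kd(intrinsic_kd_nm, 0.05)   # mostly disordered
kd_high_helicity = apparent_kd(intrinsic_kd_nm, 0.50)  # helix-stabilized variant

print(f"Kd at 5% helix:  {kd_low_helicity:.0f} nM")
print(f"Kd at 50% helix: {kd_high_helicity:.0f} nM")
print(f"fold change in affinity: {kd_low_helicity / kd_high_helicity:.1f}")
```

In this idealized picture, a tenfold increase in helical population yields a tenfold gain in affinity. A measured change that departs from this proportionality would point to an additional effect on the binding equilibrium itself, which is why the two contributions need to be separated experimentally.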
Because of the weak and reversible nature of the target binding, IDRs often retain their dynamic properties even in the target-bound states, forming an ensemble of conformationally heterogeneous complexes (fuzzy complex) (Baker et al., 2007; Fuxreiter, 2018; Mittag et al., 2010; Tompa and Fuxreiter, 2008). A direct visual demonstration of the fuzzy complex was provided by the NMR investigation of the interaction between the transcription activator GCN4 and one of the multiple subunits of the Mediator complex (transcription co-activator) (Brzovic et al., 2011; Tuttle et al., 2018). The central activation domain (cAD) of GCN4, comprising 35 amino acids, is disordered in the free form, but residues 117 to 124 fold into an amphipathic α-helix upon binding to the first activator binding domain (ABD1) of Med15. Remarkably, the folded helical region adopts multiple orientations on the ABD1 surface and forms a fuzzy complex (Fig. 1). The fuzziness arises from the surprisingly simple binding interface in which a few key hydrophobic residues of cAD are inserted into the shallow hydrophobic clefts on the ABD1 surface. The hydrophobic residues of cAD dynamically sample the multiple clefts on ABD1, leading to the multiple orientations of the helical segment. A subsequent study has shown that peptides mimicking the cAD of GCN4 but with increased fuzziness bind ABD1 more tightly than the wild-type cAD (Warfield et al., 2014). Moreover, these peptides enhanced the transcriptional activities of the Med15-dependent genes, underscoring the physiological significance of the fuzzy complexes.
Conformational fluctuations of IDRs in the target-bound states allow limited regions to be transiently exposed to the solvent and to make alternative interactions with other proteins, as demonstrated in the extensive structural and biophysical investigations of the interaction between p27 and Cdk2/cyclin A (Fig. 2) (Galea et al., 2008; Grimmler et al., 2007; Lacy et al., 2004; Tsytlonok et al., 2019). The disordered KID (kinase inhibitory domain, residues 29-90) of p27 uses three subdomains, designated D1, D2, and 3₁₀, to bind the Cdk2/cyclin A complex and inhibit the catalytic activity of Cdk2. In particular, the D2 and 3₁₀ subdomains adopt a β-hairpin (and an intermolecular β-sheet) and a 3₁₀ helix, respectively, in the bound state, in which a tyrosine residue (Y88) in the 3₁₀ helix is inserted into the active site of Cdk2. However, persistent flexibility of the D2 and 3₁₀ subdomains allows these regions to dynamically sample the bound and solvent-exposed states. In particular, Y88 is transiently exposed during this “breathing” motion for phosphorylation by the non-receptor tyrosine kinase BCR-ABL. The phosphorylation unfolds and completely ejects the 3₁₀ subdomain from the active site of Cdk2, which partially restores the catalytic activity of Cdk2. Subsequently, a threonine residue (T187) located near the C-terminus of p27 is phosphorylated by Cdk2 via a pseudounimolecular mechanism. In this phosphorylation step, the extremely flexible nature of the entire C-terminal IDR of p27 permits close proximity between T187 and the Cdk2 active site. In turn, p27 with the phosphorylated T187 can be polyubiquitinated by the E3 ligase SCF(Skp2) and selectively degraded by the 26S proteasome. Finally, the fully active Cdk2/cyclin A complex drives progression from G1 to S phase of the cell division cycle.
In summary, p27 exploits its intrinsic disorder and large-scale dynamic motions in order to exquisitely regulate the catalytic activity of Cdk2/cyclin A through the multistep post-translational modification cascade.
IDRs driving the formation of higher-order assemblies have attracted much attention over the last decade (Fung et al., 2018; Fuxreiter et al., 2014; Wu and Fuxreiter, 2016). These IDRs present multiple MoRFs or SLiMs to make multivalent interactions with many binding partners (Fig. 1) (Cumberworth et al., 2013; Wright and Dyson, 2015). The multiple binding sites can be identical, promoting cooperative and high-affinity binding of an IDR to multiple copies of a target protein (Praefcke et al., 2004). Alternatively, multiple distinct MoRFs or SLiMs embedded in an IDR recruit diverse binding partners, enabling the IDR to function as an interaction hub (Cortese et al., 2008; Dunker et al., 2005; Haynes et al., 2006; Hegyi et al., 2007; van der Lee et al., 2014; Wright and Dyson, 2015). The molecular basis for the scaffolding property of IDRs was deduced to be their disordered and extended conformations, which expose a greater surface area for potential interactions than ordered proteins of the same size (Dosztanyi et al., 2006; Gunasekaran et al., 2003). In addition, IDRs utilize linear motifs comprising relatively small numbers of amino acids for target binding (Tompa et al., 2014), whereas ordered proteins arrange numerous amino acids, often distant from each other in the linear sequence, into complex three-dimensional binding sites. Therefore, it should be more efficient to incorporate multiple target binding sites into an IDR than into an ordered protein (Cumberworth et al., 2013).
Axin is a representative hub protein interacting with various proteins in the Wnt, JNK, TGF-β, and p53 signaling pathways (Cortese et al., 2008). In the Wnt signaling pathway, β-catenin, casein kinase Iα, and glycogen synthase kinase 3β (GSK-3β) bind the distinct motifs present in the central disordered region (residues ~200 to ~800) of axin to assemble the β-catenin destruction complex (Cortese et al., 2008; Xue et al., 2013). The assembly has been predicted to increase the local concentrations of these axin binding proteins and consequently facilitate the interactions among them (i.e., reduction of dimensionality) for the efficient phosphorylation and degradation of β-catenin (Noutsou et al., 2011; Xue et al., 2013). Of note, the binding of GSK-3β was observed to inhibit the interaction of the JNK signaling pathway protein MEKK1 with axin (Zhang et al., 2001), which suggests a negative coupling mechanism to prevent crosstalk among different signaling pathways. Since the binding sites of these two proteins on axin do not overlap, the observed inhibitory effect is not competitive. Furthermore, it was suggested that the facilitated interactions among the proteins in the β-catenin destruction complex may be driven not only by the aforementioned reduction of dimensionality but also by conformational changes of axin (Cortese et al., 2008). Collectively, it is plausible to hypothesize that allosteric coupling exists among the target binding sites of an IDR hub for sophisticated regulation of the binding affinities and catalytic activities of hub-bound proteins.
In the structure-function paradigm, allostery has been conceived as communication between two distinct ligand binding sites on a macromolecule connected by a well-defined structured pathway (Changeux, 2013; Monod et al., 1965). However, a theoretical framework has been proposed to show the feasibility of allosteric coupling between two sites within an IDR (Hilser and Thompson, 2007; Motlagh et al., 2014). This framework, termed the ensemble allosteric model (EAM), elegantly demonstrates that coupling between two sites can be achieved by an intricate balance among the intrinsic stabilities of the two sites, their affinities for the respective ligands, and the interaction energy between the two sites. Furthermore, the EAM predicted allosteric coupling to be optimal when one or both of the ligand binding sites are disordered in the free forms.
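The balance of energies described by the EAM can be sketched with a four-state partition function. The code below is a minimal illustration in the spirit of the two-site model, not the published parameterization: the state definitions, the sign conventions, and all numerical values are assumptions made here for clarity. Each site is either folded or unfolded; `g1` and `g2` are folding free energies (negative means a stable fold), `g_int` is an interaction energy present only when both sites are folded, and `ligand_term` stands for the product of the association constant and the concentration of a ligand that binds only the folded form of site 1.

```python
import math

RT = 8.314e-3 * 298.0  # kJ/mol at 298 K


def prob_site2_folded(g1, g2, g_int, ligand_term=0.0):
    """Probability that site 2 is folded in a four-state, two-site ensemble.

    Statistical weights are relative to the state with both sites unfolded.
    The ligand binds only states in which site 1 is folded, so it multiplies
    their weights by (1 + ligand_term).
    """
    bind = 1.0 + ligand_term
    w_ff = math.exp(-(g1 + g2 + g_int) / RT) * bind  # both sites folded
    w_fu = math.exp(-g1 / RT) * bind                 # only site 1 folded
    w_uf = math.exp(-g2 / RT)                        # only site 2 folded
    w_uu = 1.0                                       # both sites unfolded
    z = w_ff + w_fu + w_uf + w_uu
    return (w_ff + w_uf) / z


def coupling_response(g1, g2, g_int, ligand_term):
    """Change in the folded population of site 2 when site 1 binds ligand."""
    bound = prob_site2_folded(g1, g2, g_int, ligand_term)
    free = prob_site2_folded(g1, g2, g_int, 0.0)
    return bound - free


# Hypothetical energies in kJ/mol, for illustration only.
positive = coupling_response(4.0, 4.0, -6.0, 100.0)       # favorable interaction
negative = coupling_response(4.0, 4.0, 6.0, 100.0)        # unfavorable interaction
uncoupled = coupling_response(4.0, 4.0, 0.0, 100.0)       # no interaction
ordered = coupling_response(-20.0, -20.0, -6.0, 100.0)    # both sites stably folded

print(f"positive coupling: {positive:+.3f}")
print(f"negative coupling: {negative:+.3f}")
print(f"no interaction:    {uncoupled:+.3f}")
print(f"ordered sites:     {ordered:+.6f}")
```

Under these assumptions, the sign of the interaction energy sets the sign of the coupling, and the response vanishes when the sites do not interact. The response is also far larger when both sites are marginally disordered than when both are stably folded, because a folded site has almost no unfolded population left to shift, which is consistent with the EAM prediction stated above.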
Experimental demonstrations of IDR-mediated allosteric coupling are beginning to emerge from a handful of systems: the interaction between the glucocorticoid receptor and its target DNA (Li et al., 2017); the transcriptional regulation of the phd/doc toxin-antitoxin operon of bacteriophage P1 (Garcia-Pino et al., 2010); the interactions between the adenoviral oncoprotein E1A and host regulatory proteins (Fig. 3A) (Ferreon et al., 2013); and the interactions of nuclear pore proteins with transport factors (Fig. 3B) (Blus et al., 2019; Koh and Blobel, 2015). E1A is an intrinsically disordered hub protein that uses multiple promiscuous binding motifs to interact with numerous host proteins (Pelka et al., 2008). For instance, E1A uses its N-terminal region and two conserved regions (termed CR1 and CR2) to bind the TAZ2 (transcriptional adaptor zinc-finger 2) domain of CBP and the pocket domain of pRb (retinoblastoma protein) in order to reprogram the cell cycle and transcriptional regulation (Ferrari et al., 2008; Horwitz et al., 2008). For the E1A variant containing the N-terminal region and CR1, the binding sites of the two host proteins are distinct and positively coupled through disorder-to-order transitions to promote ternary complex formation (Ferreon et al., 2013). However, truncation of the N-terminal region of E1A drives a striking transition in allostery from positive to negative coupling, which favors the formation of the binary complexes (Fig. 3A). The positive allosteric coupling was suggested to be critical for cooperatively recruiting the two host proteins and facilitating the acetylation of pRb by the HAT (histone acetyltransferase) domain of CBP. In turn, the acetylation triggers degradation of pRb, causing uncontrolled cell cycle progression to S phase and proliferation of infected cells.
Conversely, the negative coupling implies that the concentration and the overall activity of the ternary complex can be fine-tuned by the availability of the N-terminal region of E1A. The in vivo availability of the N-terminal region can be regulated by binding of other cellular proteins to this region. Therefore, in order to corroborate the physiological significance of the truncation-driven allosteric switch, it is important to test whether negative allostery can be induced by occupation of the N-terminal region of intact E1A by other proteins.
Recent thermodynamic and structural investigations of the interactions of nuclear pore proteins (nucleoporins) with transport factors (karyopherins) have demonstrated that interactions of karyopherins with IDRs of nucleoporins allosterically modulate the interaction networks among nucleoporins (Blus et al., 2019; Koh and Blobel, 2015). In particular, Nup53 utilizes its N- and C-terminal IDRs to interact with other nucleoporins (Nic96 and Nup157) and assemble into the core of the nuclear pore complex (NPC). The C-terminal IDR of Nup53 also interacts with a karyopherin (Kap121), and the karyopherin binding allosterically destabilizes the nucleoporin interactions at both the N- and C-terminal IDRs (Blus et al., 2019). The flexibility of the NPC core induced by the allosteric destabilization may be required to accommodate potential conformational changes in the central channel of the NPC during various transport events (Blus et al., 2019; Koh and Blobel, 2015).
These pioneering discoveries suggest a novel feature of allostery in multiprotein complexes assembled from IDRs (Fig. 4). Because of the reversible target binding and conformational fluctuation of IDRs, an IDR-based macromolecular complex exists as an ensemble of various states with similar free energy levels and low energy barriers among them (Fig. 4A). Each state has a unique conformation/function determined by the affinities of the bound protein subunits and cooperativities among them. At the same time, IDRs present multiple regulatory sites to recognize diverse external signals (protein binding, PTMs) that allosterically shift the ensemble toward distinct conformational and functional states (Figs. 4B-4D). Because of the preconditioned heterogeneity and low energy barriers within the ensemble, the multifaceted allosteric modulation of the macromolecular complex is thermodynamically favorable and kinetically fast, particularly as compared to allosteric coupling in ordered proteins (Fig. 4E). In short, allostery mediated by IDRs confers versatility and agility on the macromolecular complexes that must promptly process and adapt to the various external stimuli.
Over the last two decades, our perception of IDRs has evolved dramatically from nonfunctional terminal tails or passive domain tethers to critical regulatory components in dynamic cellular processes. Despite this paradigm shift, we are far from understanding the staggering complexity of how IDRs interact with various targets and thereby regulate biological processes, especially in the context of large macromolecular complexes such as the NPC and the Mediator complex. As discussed earlier (Fig. 4), IDRs may function not only as architectural scaffolds but also as manifold allosteric switches in these large complexes. In order to test this model, it is essential to purify intact proteins that contain both IDRs and ordered domains, which is a nontrivial technical problem. A theoretical challenge is to develop analytical tools for quantifying the stoichiometries, binding constants, and cooperativities of non-1:1, multivalent interactions among IDRs and target proteins. In parallel, an integrative structural approach utilizing X-ray crystallography (target-bound motif structure), NMR (conformational ensemble), cryo-EM (global architecture), and low-resolution methods (e.g., light scattering, fluorescence resonance energy transfer [FRET], cross-linking) is required to explore changes in the conformational and functional states of the macromolecular complexes driven by IDR-mediated allostery. The combined effort will eventually contribute significantly to addressing long-standing biological questions, for example, how the Mediator complex, enriched in IDRs, interacts with multiple transcription factors of enhanceosomes and accordingly modulates its conformation to recognize and activate specific promoters (Malik and Roeder, 2010).
Given the various regulatory roles of IDRs, it is not surprising that mutations affecting the properties of IDRs are associated with many pathological conditions including cancer and neurodegenerative diseases (Uversky et al., 2008). Notably, a significant portion of dosage-sensitive genes, which are harmful to cells when overexpressed, was found to encode IDRs (Vavouri et al., 2009). This dosage sensitivity is suggested to originate from the promiscuous binding property of IDRs. Overexpression of a promiscuous IDR involved in a specific signaling pathway would result in off-target interactions, disrupting the balance among signaling pathways. Hence, IDRs are attractive targets in treating devastating human diseases (Cheng et al., 2006). Indeed, recent efforts to optimize binding affinity and specificity have yielded small molecules that target IDRs and inhibit their interactions with binding partners or substrates (Cheng et al., 2006; Joshi and Vendruscolo, 2015; Metallo, 2010). Representative examples include the oncogenic transcription factor c-Myc (Follis et al., 2008; Hammoudeh et al., 2009; Harvey et al., 2012), the nuclear protein 1 (Neira et al., 2017), and the protein tyrosine phosphatase 1B (PTP1B) (Krishnan et al., 2014). In particular, an allosteric site in the C-terminal IDR of PTP1B was targeted by a natural product to lock the protein in an inactive form (Krishnan et al., 2014). In conclusion, IDRs present unparalleled variety and complexity in their interactions with macromolecules. Although extremely challenging, deconvoluting these complicated interaction features into quantitative molecular terms will culminate in the discovery of novel regulatory mechanisms of fundamental biological processes. In turn, such discoveries will serve as a firm foundation for future medical applications.
We thank the Koh lab members for the constructive comments on the manuscript. This work was supported by Samsung Science & Technology Foundation and Research (SSTF-BA1802-09), POSCO Chung-am junior faculty fellowship, and Creative-Pioneering Researchers Program through Seoul National University.
The authors have no potential conflicts of interest to disclose.
Published online November 30, 2020 https://doi.org/10.14348/molcells.2020.0186
Sunghyun Hong1,2, Sangmin Choi1,2, Ryeonghyeon Kim1,2, and Junseock Koh1,*
1School of Biological Sciences, Seoul National University, Seoul 08826, Korea
2These authors contributed equally to this work.
Intrinsically disordered proteins or regions (IDPs or IDRs) are widespread in the eukaryotic proteome. Although lacking stable three-dimensional structures in the free forms, IDRs perform critical functions in various cellular processes. Accordingly, mutations and altered expression of IDRs are associated with many pathological conditions. Hence, it is of great importance to understand at the molecular level how IDRs interact with their binding partners. In particular, discovering the unique interaction features of IDRs originating from their dynamic nature may reveal uncharted regulatory mechanisms of specific biological processes. Here we discuss the mechanisms of the macromolecular interactions mediated by IDRs and present the relevant cellular processes including transcription, cell cycle progression, signaling, and nucleocytoplasmic transport. Of special interest is the multivalent binding nature of IDRs driving assembly of multicomponent macromolecular complexes. Integrating the previous theoretical and experimental investigations, we suggest that such IDR-driven multiprotein complexes can function as versatile allosteric switches to process diverse cellular signals. Finally, we discuss the future challenges and potential medical applications of the IDR research.
Keywords: allostery, coupled folding and binding, dynamic binding, intrinsically disordered proteins or regions, macromolecular complex, multivalent binding
In the classical structure-function paradigm, a protein must fold into a well-defined three-dimensional structure in order to carry out its function. The landmark achievements underpinning this paradigm were the determination of the first three-dimensional protein structures (myoglobin and hemoglobin) by Kendrew (Kendrew et al., 1958) and Perutz (Perutz et al., 1960). In parallel, the elegant biochemical experiments by Anfinsen and colleagues further demonstrated that the native three dimensional structure of a protein is determined by its amino acid sequence (Anfinsen et al., 1961). For more than half a century, the structure-function paradigm has been one of the most fundamental frameworks in understanding complex biological processes at the molecular level. However, in the mid-1990s, with advances in bioinformatics, it was realized that a significant portion of the proteomes from various species contains proteins or regions in a protein with the amino acid contents distinct from those of ordered globular proteins (Romero et al., 1998; Wootton, 1994). Concurrently, nuclear magnetic resonance (NMR) experiments observed regulatory proteins that are disordered but fully functional under physiological conditions (Daughdrill et al., 1997; Kriwacki et al., 1996). Following these counterintuitive discoveries, numerous biophysical and bioinformatic investigations have accumulated a substantial amount of evidence demonstrating the prevalence and the biological significance of disordered proteins, collectively termed intrinsically disordered proteins (IDPs) or regions in a protein (IDRs) (Dunker et al., 2002; Uversky, 2002; van der Lee et al., 2014; Wright and Dyson, 1999). 
About a third of the eukaryotic proteome contains IDRs of 30 or more residues in length (Oates et al., 2013; Ward et al., 2004) that are particularly abundant in the proteomes associated with dynamic cellular processes including gene expression and signaling as well as with cancer and neurodegenerative diseases (Babu et al., 2011; Iakoucheva et al., 2002; Liu et al., 2006; Uversky et al., 2008; Xie et al., 2007).
In order to fully appreciate the functional role of IDRs in such diverse biological processes, it is required to understand how IDRs interact with other macromolecules. Of particular interest is the unique interaction features of IDRs that arise from conformational disorder and potentially provide functional advantages in specific cellular processes. In this review, we briefly overview the sequence and conformational features of IDRs and then discuss the mechanisms of the IDR-mediated macromolecular interactions distinct from those of ordered proteins. To expedite the discussion, the mechanisms are classified into three broad categories: i) coupled folding and binding, ii) dynamic binding, and iii) multivalent binding (Fig. 1). We also present the exemplary cellular processes that are exquisitely regulated by these interaction principles. In particular, we focus on the third principle that enables IDRs to drive assembly of multicomponent macromolecular complexes that potentially function as manifold allosteric switches to process and integrate diverse cellular signals. The more comprehensive physical and biological attributes of IDRs including multisite posttranslational modifications and liquid-liquid phase separations have been discussed in many excellent reviews (Csizmok et al., 2016; Uversky, 2018; van der Lee et al., 2014; Wright and Dyson, 2015; Wu and Fuxreiter, 2016).
The amino acid sequences of IDRs exhibit compositional biases distinct from those of ordered proteins (Radivojac et al., 2007; Uversky et al., 2000; Weathers et al., 2004). In general, while the IDR sequences are enriched in polar and charged amino acids, they contain bulky hydrophobic amino acids in low abundance. These sequence characteristics directly dictate the conformational features of IDRs. One of the major driving forces in protein folding is the hydrophobic effect leading to compaction of a polypeptide chain into a stable tertiary structure that buries hydrophobic amino acids from the aqueous solvent (Chothia, 1974; Spolar and Record, 1994). The hydrophobic effect and other noncovalent interactions cooperatively stabilize the folded form, minimizing the free energy of the polypeptide best represented as funnel-shaped energy landscapes (Dill and Chan, 1997; Onuchic et al., 1995). In contrast, folding of an IDR into a unique and stable tertiary structure is unfavorable due to the low abundance of hydrophobic amino acids. Instead, IDRs interact with the solvent through hydrophilic residues and dynamically interconvert among multiple conformational states (Csizmok et al., 2016; Dyson and Wright, 2005; Mukhopadhyay et al., 2007). Therefore, the typical energy landscape of an IDR is flat with multiple local minima (heterogeneity) that are separated by low activation energy barriers (fast interconversion) (Fisher and Stultz, 2011; Papoian, 2008; Wei et al., 2016).
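The contrast between a funnel-shaped and a flat energy landscape can be made concrete with Boltzmann populations. The following is a minimal sketch, not taken from the cited studies: the state energies, the function name `boltzmann_populations`, and the choice of five states are all hypothetical and serve only to illustrate how a deep single minimum concentrates the population while several shallow minima spread it.

```python
import math

RT = 0.593  # kcal/mol near 298 K

def boltzmann_populations(energies):
    """Return equilibrium populations for states with the given free energies (kcal/mol)."""
    weights = [math.exp(-e / RT) for e in energies]
    partition = sum(weights)
    return [w / partition for w in weights]

# Hypothetical energies chosen for illustration only (not measured values).
funnel_like = [0.0, 4.0, 4.5, 5.0, 5.5]  # one deep minimum, as for an ordered protein
flat_like = [0.0, 0.2, 0.3, 0.1, 0.4]    # several shallow minima, as for an IDR

funnel_pop = boltzmann_populations(funnel_like)
flat_pop = boltzmann_populations(flat_like)

print("funnel-like:", [round(p, 3) for p in funnel_pop])
print("flat-like:  ", [round(p, 3) for p in flat_pop])
```

In this toy setting nearly all of the funnel-like ensemble sits in one state, whereas no single state dominates the flat-like ensemble, which is the sense of heterogeneity used above. Barrier heights, and hence interconversion rates, are not modeled here.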
The majority of IDRs utilize linear peptide motifs to interact with their target macromolecules. These linear motifs are categorized into two broad groups, namely short linear motifs (SLiMs) and molecular recognition features (MoRFs) (Davey et al., 2012; Mohan et al., 2006; Oldfield et al., 2005; Tompa et al., 2014). While SLiMs contain 3-10 amino acids among which 3-4 residues are directly involved in the binding specificity, MoRFs are longer peptide motifs with 10-70 amphipathic amino acids. All MoRFs and selected SLiMs undergo disorder-to-order transitions upon binding to their targets, a process termed coupled folding and binding (Dyson and Wright, 2002; Spolar and Record, 1994). The type of the ordered structures adopted by IDRs includes α-helices, β-strands, and rigid loops. The coupled folding and binding mechanism has been extensively investigated for the interactions of the transcriptional co-activator CBP (CREB-binding protein) with its various binding partners (Dyson and Wright, 2016). A representative example is the binding of the phosphorylated KID (kinase-inducible domain) of CREB (cAMP-response element-binding protein) to KIX (KID interacting domain) of CBP. While pKID is disordered in the free form, it folds into two helices upon binding to KIX (Fig. 1) (Radhakrishnan et al., 1997; Sugase et al., 2007).
A suggested functional role of coupled folding and binding is to achieve the structural complementarity or specificity of the binding interface formed between an IDR and its target protein or DNA (Spolar and Record, 1994; Wright and Dyson, 1999; 2009). From the thermodynamic point of view, folding of a disordered motif into a well-defined structure is entropically unfavorable and must reduce the overall binding affinity. Such an entropic cost has been predicted to fine-tune the affinity between strong and weak binding in order to achieve both the binding specificity and reversibility required for robust and rapid signaling processes in the cell (Dyson and Wright, 2005; Fuxreiter et al., 2004; Zhou, 2012).
Some IDRs transiently adopt secondary structures in the free states that closely resemble the conformations of the target-bound states (Tompa, 2002). These observations invoked a conformational selection mechanism in which the preformed structures are the binding competent states and therefore selected out of other disordered states by the binding partners (Fuxreiter et al., 2004). A physiological significance of the preformed structures of IDRs was suggested from the investigation of the interaction between the E3 ubiquitin ligase Mdm2 and the N-terminal transactivation domain (NTAD) of the p53 tumor suppressor protein (Borcherds et al., 2014). In the free form, NTAD is mostly disordered with the low helical content, but folds into an amphipathic helix upon binding Mdm2. Mutating one of the conserved flanking prolines (Pro27) into alanine increased the population of the helical form and the affinity of NTAD for Mdm2 by an order of magnitude, likely due to the reduction in the entropic cost for folding. However, the increased affinity altered the stability of p53, impaired target gene expression, and ultimately caused failure to induce cell cycle arrest upon DNA damage. Such a dramatic effect strongly suggests that the abundance of the preformed structure is optimized in the conformational ensemble of NTAD for signaling fidelity. However, it may be ambiguous in some instances to interpret mutational or protein engineering data in part because these molecular interventions can affect not only the equilibrium for the formation of the preformed structure but also the binding equilibrium itself. Thus, in order to properly assess the functionality of the preformed structures, it is critical to combine spectroscopic and thermodynamic approaches to dissect the observed effects into these two (or more) contributions.
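The link between the population of a preformed structure and the apparent affinity can be sketched with a two-step conformational selection scheme, in which a disordered state U is in equilibrium with a helical state H and only H binds the target. Under the assumption that the intrinsic dissociation constant of H is unchanged, the apparent dissociation constant is the intrinsic one divided by the helical fraction. The helical fractions below are hypothetical round numbers, not the values measured for the p53 NTAD, and the function names are illustrative.

```python
import math

RT = 0.593  # kcal/mol near 298 K

def apparent_kd(kd_intrinsic, helix_fraction):
    """Apparent Kd when only the preformed helical state binds (conformational selection)."""
    return kd_intrinsic / helix_fraction

def folding_penalty(helix_fraction):
    """Free energy cost (kcal/mol) of restricting the ensemble to the helical state."""
    return -RT * math.log(helix_fraction)

kd_int = 1.0            # intrinsic Kd in arbitrary units (assumed constant)
f_wild_type = 0.05      # hypothetical helical fraction before mutation
f_mutant = 0.50         # hypothetical helical fraction after mutation

fold_change = apparent_kd(kd_int, f_wild_type) / apparent_kd(kd_int, f_mutant)
penalty_saved = folding_penalty(f_wild_type) - folding_penalty(f_mutant)

print(f"affinity gain: {fold_change:.1f}-fold")
print(f"folding penalty reduced by {penalty_saved:.2f} kcal/mol")
```

In this sketch a tenfold rise in helical population yields a tenfold gain in apparent affinity, equivalent to about 1.4 kcal/mol. The assumption of a constant intrinsic Kd is exactly what a mutation may violate, which is why the two contributions need to be separated experimentally.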
Because of the weak and reversible nature of the target binding, IDRs often retain the dynamic property even in the target-bound states, forming an ensemble of conformationally heterogeneous complexes (fuzzy complex) (Baker et al., 2007; Fuxreiter, 2018; Mittag et al., 2010; Tompa and Fuxreiter, 2008). A direct visual demonstration of the fuzzy complex was provided in the NMR investigation of the interaction between the transcription activator GCN4 and one of the multiple subunits of the Mediator complex (transcription co-activator) (Brzovic et al., 2011; Tuttle et al., 2018). The central activation domain (cAD) of GCN4 comprising 35 amino acids is disordered in the free form, but folds into an amphipathic α-helix in the residues 117 to 124 upon binding to the first activator binding domain (ABD1) of Med15. Remarkably, the folded helical region adopts multiple orientations on the ABD1 surface and forms a fuzzy complex (Fig. 1). The fuzziness arises from the surprisingly simple binding interface in which a few key hydrophobic residues of cAD are inserted into the shallow hydrophobic clefts on the ABD1 surface. The hydrophobic residues of cAD dynamically sample the multiple clefts on ABD1, leading to the multiple orientations of the helical segment. A subsequent study has shown that the peptides mimicking cAD of GCN4 but with the increased fuzziness bind ABD1 tighter than the wild type cAD (Warfield et al., 2014). Remarkably, these peptides enhanced the transcriptional activities of the Med15-dependent genes, underscoring the physiological significance of the fuzzy complexes.
Conformational fluctuations of IDRs in the target-bound states allow their limited regions to be transiently exposed to the solvent and make alternative interactions with other proteins as demonstrated in the extensive structural and biophysical investigations of the interaction between p27 and Cdk2/cyclin A (Fig. 2) (Galea et al., 2008; Grimmler et al., 2007; Lacy et al., 2004; Tsytlonok et al., 2019). The disordered KID (kinase inhibitory domain, residues 29-90) of p27 uses three subdomains, designated D1, D2, and 3₁₀, to bind the Cdk2/cyclin A complex and inhibit the catalytic activity of Cdk2. In particular, the D2 and 3₁₀ subdomains adopt a beta hairpin (and an intermolecular beta-sheet) and a 3₁₀ helix, respectively, in the bound state in which a tyrosine residue (Y88) in the 3₁₀ helix is inserted into the active site of Cdk2. However, persistent flexibility of the D2 and 3₁₀ subdomains allows these regions to dynamically sample the bound and solvent-exposed states. In particular, Y88 is transiently exposed during this “breathing” motion for phosphorylation by the non-receptor tyrosine kinase BCR-ABL. The phosphorylation unfolds and completely ejects the 3₁₀ subdomain from the active site of Cdk2, which partially restores the catalytic activity of Cdk2. Subsequently, a threonine residue (T187) located near the C-terminus of p27 is phosphorylated by Cdk2 via a pseudounimolecular mechanism. In this phosphorylation step, the extremely flexible nature of the entire C-terminal IDR of p27 permits close proximity between the residue T187 and the Cdk2 active site. In turn, p27 with the phosphorylated T187 can be polyubiquitinated by the E3 ligase SCF-Skp2 and selectively degraded by the 26S proteasome. Finally, the fully active Cdk2/cyclin A complex drives progression from G1 to S phase of the cell division cycle.
In summary, p27 exploits its intrinsic disorder and large-scale dynamic motions in order to exquisitely regulate the catalytic activity of Cdk2/cyclin A through the multistep post-translational modification cascade.
IDRs driving the formation of higher-order assemblies have caught much attention for the last decade (Fung et al., 2018; Fuxreiter et al., 2014; Wu and Fuxreiter, 2016). These IDRs present multiple MoRFs or SLiMs to make multivalent interactions with many binding partners (Fig. 1) (Cumberworth et al., 2013; Wright and Dyson, 2015). Multiple binding sites can be identical, promoting cooperative and high-affinity binding of an IDR to multiple copies of a target protein (Praefcke et al., 2004). Otherwise, multiple distinct MoRFs or SLiMs embedded in an IDR recruit diverse binding partners, enabling the IDR to function as an interaction hub (Cortese et al., 2008; Dunker et al., 2005; Haynes et al., 2006; Hegyi et al., 2007; van der Lee et al., 2014; Wright and Dyson, 2015). A molecular basis for the scaffolding property of IDRs was deduced to be their disordered and extended conformations that expose greater surface area available for potential interactions as compared to the ordered proteins of the same size (Dosztanyi et al., 2006; Gunasekaran et al., 2003). In addition, IDRs utilize linear motifs comprising relatively small numbers of amino acids for target binding (Tompa et al., 2014) while ordered proteins arrange numerous amino acids, often distant from each other on a linear sequence, into complex three-dimensional binding sites. Therefore, it should be more efficient to incorporate multiple target binding sites into an IDR than into an ordered protein (Cumberworth et al., 2013).
Axin is a representative hub protein interacting with various proteins in the Wnt, JNK, TGF-β, and p53 signaling pathways (Cortese et al., 2008). In the Wnt signaling pathway, β-catenin, casein kinase Iα, and glycogen synthase kinase 3β (GSK-3β) bind the distinct motifs present in the central disordered region (residues ~200 to ~800) of axin to assemble the β-catenin destruction complex (Cortese et al., 2008; Xue et al., 2013). The assembly has been predicted to increase the local concentrations of these axin binding proteins and consequently facilitate the interactions among them (i.e., reduction of dimensionality) for the efficient phosphorylation and degradation of β-catenin (Noutsou et al., 2011; Xue et al., 2013). Of note, the binding of GSK-3β was observed to inhibit the interaction of the JNK signaling pathway protein MEKK1 with axin (Zhang et al., 2001), which suggests a negative coupling mechanism to prevent crosstalk among different signaling pathways. Since the binding sites of these two proteins on axin are non-overlapping, the observed inhibitory effect is not competitive. Furthermore, it was suggested that the facilitated interactions among the proteins in the β-catenin destruction complex may be driven not only by the aforementioned reduction of dimensionality but also by conformational changes of axin (Cortese et al., 2008). Collectively, it is plausible to hypothesize that allosteric coupling exists among target binding sites of an IDR hub for sophisticated regulation of the binding affinities and catalytic activities of hub-bound proteins.
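The local-concentration argument made above for the destruction complex can be given a rough order of magnitude. The sketch below assumes, purely for illustration, that two proteins bound to the same disordered scaffold are confined to a sphere of a given radius, and converts one molecule in that volume into a molar concentration. The 10 nm radius is a hypothetical value, not a measured dimension of axin, and the function name is illustrative.

```python
import math

AVOGADRO = 6.02214076e23  # molecules per mole

def effective_concentration_molar(radius_nm):
    """Molar concentration of a single molecule confined to a sphere of the given radius."""
    radius_dm = radius_nm * 1e-8                       # 1 nm = 1e-8 dm
    volume_l = (4.0 / 3.0) * math.pi * radius_dm ** 3  # 1 dm^3 = 1 L
    return 1.0 / (AVOGADRO * volume_l)

# Hypothetical confinement radius for two scaffold-bound partners.
c_eff = effective_concentration_molar(10.0)
print(f"effective concentration: {c_eff * 1e3:.2f} mM")
```

Under this assumption the effective concentration is on the order of 0.4 mM, and it scales with the inverse cube of the radius, so halving the radius raises it eightfold. Chain stiffness, excluded volume, and orientation effects are ignored.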
In the structure-function paradigm, allostery has been conceived as a communication between two distinct ligand binding sites on a macromolecule connected by a well-defined structured pathway (Changeux, 2013; Monod et al., 1965). However, a theoretical framework has been proposed to show the feasibility of allosteric coupling between two sites within an IDR (Hilser and Thompson, 2007; Motlagh et al., 2014). This framework, termed ensemble allosteric model (EAM), elegantly demonstrates that coupling between two sites can be achieved by an intricate balance among the intrinsic stabilities of the two sites, their affinities for the respective ligands, and the interaction energy between the two sites. Furthermore, the EAM predicted allosteric coupling to be optimal when one or both of the ligand binding sites are disordered in the free forms.
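The qualitative behavior described for the EAM can be reproduced with a toy two-site ensemble. The sketch below is a simplified model in the spirit of the EAM, not the published parameterization: each site is either folded (binding-competent) or unfolded, a folded site carries its own stability, an interaction energy applies only when both sites are folded, and a ligand binds only to folded site 1. All energies, the `ligand_term` value (association constant times ligand concentration), and the function names are assumptions made for illustration.

```python
import math

RT = 0.593  # kcal/mol near 298 K

def prob_site2_folded(dg_fold1, dg_fold2, dg_int, ligand_term=0.0):
    """Probability that site 2 is folded; the doubly unfolded state is the reference (weight 1)."""
    w1 = math.exp(-dg_fold1 / RT) * (1.0 + ligand_term)  # site 1 folded, free or ligand-bound
    w2 = math.exp(-dg_fold2 / RT)                        # site 2 folded
    w_int = math.exp(-dg_int / RT)                       # applies only when both sites are folded
    both = w1 * w2 * w_int
    partition = 1.0 + w1 + w2 + both
    return (w2 + both) / partition

def coupling_response(dg_fold1, dg_fold2, dg_int, ligand_term=100.0):
    """Change in the folded population of site 2 caused by ligand binding at site 1."""
    with_ligand = prob_site2_folded(dg_fold1, dg_fold2, dg_int, ligand_term)
    without_ligand = prob_site2_folded(dg_fold1, dg_fold2, dg_int, 0.0)
    return with_ligand - without_ligand

# Hypothetical energies in kcal/mol; positive dg_fold means the site is mostly disordered.
disordered_positive = coupling_response(1.0, 1.0, -2.0)  # favorable site-site interaction
disordered_negative = coupling_response(1.0, 1.0, 2.0)   # unfavorable site-site interaction
ordered = coupling_response(-5.0, -5.0, -2.0)            # both sites stably folded
uncoupled = coupling_response(1.0, 1.0, 0.0)             # no interaction energy

print(f"disordered, favorable interaction:   {disordered_positive:+.3f}")
print(f"disordered, unfavorable interaction: {disordered_negative:+.3f}")
print(f"ordered sites:                       {ordered:+.6f}")
print(f"no interaction energy:               {uncoupled:+.3f}")
```

With these illustrative numbers, ligand binding at site 1 shifts the population of site 2 substantially only when the sites are marginally stable, the sign of the shift follows the sign of the interaction energy, and stably folded sites show almost no response because their populations are already saturated.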
Experimental demonstrations of IDR-mediated allosteric coupling are beginning to emerge from a handful of systems: the interaction between glucocorticoid receptor and its target DNA (Li et al., 2017); the transcriptional regulation of the phd/doc toxin-antitoxin operon of bacteriophage P1 (Garcia-Pino et al., 2010); the interactions between the adenoviral oncoprotein E1A and host regulatory proteins (Fig. 3A) (Ferreon et al., 2013); and the interactions of nuclear pore proteins with transport factors (Fig. 3B) (Blus et al., 2019; Koh and Blobel, 2015). E1A is an intrinsically disordered hub protein that uses multiple promiscuous binding motifs to interact with numerous host proteins (Pelka et al., 2008). For instance, E1A uses its N-terminal region and two conserved regions (termed CR1 and CR2) to bind the TAZ2 (transcriptional adaptor zinc-finger 2) domain of CBP and the pocket domain of pRb (retinoblastoma protein) in order to reprogram the cell cycle and transcriptional regulation (Ferrari et al., 2008; Horwitz et al., 2008). For the E1A variant containing the N-terminal region and CR1, the binding sites of the two host proteins are distinct and positively coupled through disorder-to-order transitions to promote the ternary complex formation (Ferreon et al., 2013). However, truncation of the N-terminal region of E1A drives a striking transition in allostery from positive to negative coupling, which favors the formation of the binary complexes (Fig. 3A). The positive allosteric coupling was suggested to be critical for cooperatively recruiting the two host proteins and facilitating the acetylation of pRb by the HAT (histone acetyltransferase) domain of CBP. In turn, the acetylation triggers degradation of pRb, causing uncontrolled cell cycle progression to S phase and proliferation of infected cells.
Conversely, the negative coupling implies that the concentration and the overall activity of the ternary complex can be fine-tuned by the availability of the N-terminal region of E1A. The in vivo availability of the N-terminal region can be regulated by binding of other cellular proteins to this region. Therefore, in order to corroborate the physiological significance of the truncation-driven allosteric switch, it is important to test whether negative allostery can be induced by occupation of the N-terminal region of intact E1A by other proteins.
Recent thermodynamic and structural investigations of the interactions of nuclear pore proteins (nucleoporins) with transport factors (karyopherins) have demonstrated that interactions of karyopherins with IDRs of nucleoporins allosterically modulate the interaction networks among nucleoporins (Blus et al., 2019; Koh and Blobel, 2015). In particular, Nup53 utilizes its N and C-terminal IDRs to interact with other nucleoporins (Nic96 and Nup157) and assemble into the core of the nuclear pore complex (NPC). The C-terminal IDR of Nup53 interacts with a karyopherin (Kap121) as well, and the karyopherin binding allosterically destabilizes the nucleoporin interactions at both the N and C-terminal IDRs (Blus et al., 2019). The flexibility of the NPC core induced by the allosteric destabilization may be required to accommodate potential conformational changes in the central channel of the NPC during various transport events (Blus et al., 2019; Koh and Blobel, 2015).
These pioneering discoveries suggest a novel feature of allostery in multiprotein complexes assembled from IDRs (Fig. 4). Because of the reversible target binding and conformational fluctuation of IDRs, an IDR-based macromolecular complex exists as an ensemble of various states with similar free energy levels and low energy barriers among them (Fig. 4A). Each state has a unique conformation/function determined by the affinities of the bound protein subunits and cooperativities among them. At the same time, IDRs present multiple regulatory sites to recognize diverse external signals (protein binding, PTMs) that allosterically shift the ensemble toward distinct conformational and functional states (Figs. 4B-4D). Because of the preconditioned heterogeneity and low energy barriers within the ensemble, the multifaceted allosteric modulation of the macromolecular complex is thermodynamically favorable and kinetically fast, particularly as compared to allosteric coupling in ordered proteins (Fig. 4E). In short, allostery mediated by IDRs confers versatility and agility on the macromolecular complexes that must promptly process and adapt to the various external stimuli.
Over the last two decades, our perception of IDRs has dramatically evolved from nonfunctional terminal tails or passive domain tethers to critical regulatory components in dynamic cellular processes. Despite the paradigm shift, our understanding of the staggering complexity of how IDRs interact with various targets and thereby regulate biological processes is far from complete, especially in the context of large macromolecular complexes such as the NPC and the Mediator complex. As discussed earlier (Fig. 4), IDRs may function not only as architectural scaffolds but also as manifold allosteric switches in these large complexes. In order to test this model, it is essential to purify intact proteins that contain both IDRs and ordered domains, which is a nontrivial technical problem. A theoretical challenge is to develop analytical tools quantifying the stoichiometries, binding constants, and cooperativities of non-1:1, multivalent interactions among IDRs and target proteins. In parallel, an integrative structural approach utilizing X-ray crystallography (target-bound motif structure), NMR (conformational ensemble), cryo-EM (global architecture), and low-resolution methods (e.g., light scattering, fluorescence resonance energy transfer [FRET], cross-linking) is required to explore changes in the conformational and functional states of the macromolecular complexes driven by IDR-mediated allostery. The combined effort will eventually make significant contributions to addressing long-standing biological questions, for example, how the Mediator complex, enriched in IDRs, interacts with multiple transcription factors of enhanceosomes and accordingly modulates its conformation to recognize and activate specific promoters (Malik and Roeder, 2010).
Given the various regulatory roles of IDRs, it is not surprising that mutations affecting the properties of IDRs are associated with many pathological conditions including cancer and neurodegenerative diseases (Uversky et al., 2008). Notably, a significant portion of dosage-sensitive genes, harmful to cells when overexpressed, was found to encode IDRs (Vavouri et al., 2009). This dosage-sensitivity is suggested to originate from the promiscuous binding property of IDRs. Overexpression of a promiscuous IDR involved in a specific signaling pathway would result in off-target interactions, disrupting a balance among signaling pathways. Hence, IDRs are attractive targets in treating devastating human diseases (Cheng et al., 2006). Indeed, recent efforts to optimize binding affinity and specificity have successfully yielded small molecules that target IDRs and inhibit their interactions with binding partners or substrates (Cheng et al., 2006; Joshi and Vendruscolo, 2015; Metallo, 2010). The representative examples include the oncogenic transcription factor c-Myc (Follis et al., 2008; Hammoudeh et al., 2009; Harvey et al., 2012), the nuclear protein 1 (Neira et al., 2017), and the protein tyrosine phosphatase 1B (PTP1B) (Krishnan et al., 2014). In particular, an allosteric site in the C-terminal IDR of PTP1B was targeted by a natural product to lock the protein in an inactive form (Krishnan et al., 2014). In conclusion, IDRs present unparalleled variety and complexity in their interactions with macromolecules. Although extremely challenging, deconvoluting the complicated interaction features into quantitative molecular terms will culminate in discovering novel regulatory mechanisms of fundamental biological processes. In turn, such discoveries will serve as a firm foundation for future medical applications.
We thank the Koh lab members for the constructive comments on the manuscript. This work was supported by Samsung Science & Technology Foundation and Research (SSTF-BA1802-09), POSCO Chung-am junior faculty fellowship, and Creative-Pioneering Researchers Program through Seoul National University.
The authors have no potential conflicts of interest to disclose.