Mol. Cells 2018; 41(10): 889-899
Published online October 10, 2018
https://doi.org/10.14348/molcells.2018.0192
© The Korean Society for Molecular and Cellular Biology
Correspondence: khhan600@kribb.re.kr
Intrinsically disordered proteins (IDPs) are highly unorthodox proteins that do not form three-dimensional structures under physiological conditions. The discovery of IDPs has destroyed the classical structure-function paradigm in protein science, 3-D structure = function, because IDPs even without well-folded 3-D structures are still capable of performing important biological functions and furthermore are associated with fatal diseases such as cancers, neurodegenerative diseases and viral pandemics. Pre-structured motifs (PreSMos) refer to transient local secondary structural elements present in the target-unbound state of IDPs. During the last two decades PreSMos have been steadily acknowledged as the critical determinants for target binding in dozens of IDPs. To date, the PreSMo concept provides the most convincing structural rationale explaining the IDP-target binding behavior at an atomic resolution. Here we present a brief developmental history of PreSMos and describe their common characteristics. We also provide a list of newly discovered PreSMos along with their functional relevance.
Keywords: IDPs, IDR (Intrinsically Disordered Region), NMR, IUPs (Intrinsically Unfolded Proteins), PreSMos (Pre-Structured Motifs)
The central dogma in protein science, established over the last half-century, states that “a well-folded 3-D structure is a prerequisite for protein function”. The 3-D structure in this statement refers to the one observed under near-physiological conditions (i.e., ~pH 7, ambient temperature, and aqueous buffer). Intrinsically unstructured/unfolded proteins (IUPs), now more commonly known as intrinsically disordered proteins (IDPs) (Dunker et al., 2013), are very peculiar proteins that do not form well-folded 3-D structures even under non-denaturing conditions. Naturally, IDPs are of great importance from a protein folding perspective. More intriguing are the observations that IDPs are functional or active without 3-D structures, for example, being involved in transcription (Lee et al., 2000; Sherr, 2004; Kim et al., 2017a; 2017b), translation (Fletcher and Wagner, 1998; Kim et al., 2015), cell cycle regulation (Pavletich, 1999), chaperoning (Hong et al., 2005), and membrane-binding (Atwal et al., 2007; Eliezer et al., 2001). The discovery of so many such highly unorthodox proteins, as much as half of the entire human proteome (Dunker et al., 2000), has strongly suggested that the classical structure-function relationship of proteins needs to be reexamined. Clearly, the golden paradigm in structural biology, 3-D structure = protein function, is no longer valid. Several reviews dealing with general aspects of IDPs are available for further reading (Chavali et al., 2017; Dunker et al., 2013; Lee et al., 2012; Uversky and Dunker, 2010; Uversky, 2015).
Our keen interest in IDPs stems not only from basic science but also from the fact that these proteins are involved in many fatal diseases. For example, ~80% of human cancers are associated with IDPs (Galea et al., 2008) such as eIF4E-binding proteins (4EBPs) (Fletcher and Wagner, 1998; Kim et al., 2015), Bcl-XL (Xu et al., 2009), human glucocorticoid receptors (Kim et al., 2017b), E7 (Lee et al., 2016), hypoxia inducible factors (Semenza, 2003; Kim et al., 2009a) and p53, all of which are so-called “hybrid-type” IDPs where intrinsically disordered regions (IDRs) coexist with globular domains (Lee et al., 2000; Wells et al., 2008). The causative agents of mad cow disease, or Creutzfeldt-Jakob disease (CJD) in humans, are prions, which are also IDPs in which a C-terminal globular domain coexists with a long IDR at the N-terminus encompassing ~120 amino acid residues (James et al., 1997; Liu et al., 1999). Alpha-synuclein (Eliezer et al., 2001) and tau (Bibow et al., 2011; Künze et al., 2012), implicated in Parkinson’s disease (PD) and Alzheimer’s disease (AD), respectively, are also IDPs. Furthermore, several viruses, including the well-known AIDS-causing HIV-1, produce IDPs (Chi et al., 2007; Feuerstein et al., 2012; Kim et al., 2009b; Lee et al., 2016; Liang et al., 2007; Reingewertz et al., 2009; To et al., 2016). Clearly, there is an immediate and strong need to acquire thorough knowledge not only of the normal functionality of IDPs but also of their pathological connection to the above diseases, since it has become apparent that the classical globular-protein-based approach is unlikely to provide sufficient information for developing effective weaponry against IDP-associated diseases.
The most obvious characteristic of IDPs is that they do not possess spatially-disposed active pockets, which raises a simple but profound question: how can these long malleable stretches of amino acids (sometimes hundreds of residues) bind their targets? Targets of IDPs are not just proteins, but can be nucleic acids (Thapar et al., 2004; To et al., 2016; Wells et al., 2008), lipids, metals, and small molecules (Follis et al., 2008; Metallo, 2010). Efforts were made recently to classify IDPs into several subfamilies (van der Lee et al., 2014). While intuitive, such a classification fails to provide detailed insights into how all these different subfamilies bind their targets. The well-cited expression “coupled folding and binding” (Dyson and Wright, 2002) is a useful term, but only insofar as it depicts the rather easily predictable topological change that IDPs need to experience upon binding to their partners. This generic description therefore fails to provide any atomistic details associated with IDP-target binding that, if available, would be highly valuable for IDP-based drug design. As the axiom “
Direct and quantitative evidence that some sort of a secondary structural element, e.g., helix, is needed for transcriptional activity came from an NMR study on p53 TAD (Lee et al., 2000). The 73-residue long p53 TAD in its unbound form was found “unstructured” in a tertiary sense, yet contained a
While the conceptual development of PreSMos has been somewhat delayed by previous misconceptions that IDPs were completely unstructured, the presence of local residual secondary structures in isolated IDPs has been increasingly detected by many NMR investigations, including a few critical NMR reports published at the turn of the century. The first key report found that p53 TAD has local structural elements (a helix and two turns) in the unbound state, as described above (Lee et al., 2000). The second report, by Ramelot et al., demonstrated that the cytoplasmic tail of the amyloid precursor protein forms a transient structure and that such a pre-ordered structure is important for its binding to cytosolic factors (Ramelot et al., 2000). Sayers et al. also reported that structural preordering important for target binding was detected in the N-terminal region of ribosomal protein S4 (Sayers et al., 2000). Zhao et al. reported local structural elements in the overall loosely folded Sml1 (Zhao et al., 2000). Zitzewitz et al. published an article in 2000 entitled “Preformed secondary structure drives the association reaction of GCN4-p1, a model coiled-coil system” (Zitzewitz et al., 2000). Another report by Bienkiewicz et al. described the functional consequences of pre-organized helical structure in the intrinsically disordered cell-cycle inhibitor p27 (Kip1) (Bienkiewicz et al., 2002). All these early NMR studies contributed to the foundation of the PreSMo concept, the idea that IDPs are not completely unstructured, but mostly unstructured (MU), and contain PreSMos. Following these NMR reports, bioinformatics studies proposed similar concepts a few years later, such as PSE (Pre-formed Structural Element) (Fuxreiter et al., 2004), MoRF (Molecular Recognition Feature) (Mohan et al., 2006; Oldfield et al., 2005), or primary contact sites.
All these results, NMR experimental or predicted, point in unison to the idea that IDPs possess local secondary structural elements that are “hot spots” for target-binding.
In 2012 we published the first comprehensive review on PreSMos (Lee et al., 2012) because no explicit articles on the subject were available, despite the fact that PreSMos (whatever they may be called) had been recognized for more than a decade as very important (perhaps the most significant) features explaining IDP-target binding on a per-residue basis. Several additional pieces of evidence have recently been published, demonstrating the functional significance of PreSMos (Kim et al., 2017b; Iešmantavičius et al., 2014; Mohan et al., 2014; Salamanova et al., 2018). In the first review, we presented 27 IDPs/IDRs containing PreSMos, which constituted ~56% of all IDPs characterized by then. Most critically, we introduced the term pre-structured motifs (PreSMos) in order to unambiguously point out the importance of the pre-structured nature of target-binding segments in free IDPs and to provide a convenient term that can replace various names such as “transient, nascent, residual, minimally-structured, non-negligible, pre-existing, pre-formed, or pre-ordered secondary structures”. These terms were used mainly by NMR structural biologists who did not hasten to generalize the concept with a particular name, realizing that PreSMos had only been observed in a handful of IDPs until 2005. This review is a follow-up to our 2012 review. Because we have found 20 more PreSMos since our first review, here we provide an updated list of PreSMos and a brief description of their functional significance; however, we acknowledge that the list may still be incomplete. In addition, we describe differences between the PreSMos that are detected experimentally and the terms derived from bioinformatics predictions. With this review we now have 47 IDPs/IDRs containing PreSMos, strongly suggesting that PreSMos are general
The definition of a PreSMo was given in our 2012 review (Lee et al., 2012); PreSMos are NMR-detected transient secondary structural elements within long (minimally 40 residues) and functionally-active IDRs of IDPs. We underline the fact that PreSMos are the
Table 1 is an updated list of PreSMos found in 47 IDPs/IDRs. The total number of IDPs studied in detail by NMR (with the exception of C-XPC, studied by SAXS) is 70, even though the number of reports exceeds 70 because some IDPs were investigated more than once. Notably, several IDPs (4EBP1, HIV-1 Tat, VP16 TAD, securin, and p21Waf1/Cip1/Sdi1) that were originally reported as completely unstructured (CU) types with no PreSMo turned out to be MU types in later studies. For convenience, we added the 20 newly-identified PreSMos (starting from Myb25) at the end of Table 1, including a few PreSMos that were actually reported before 2012 but were not included in our 2012 review. Although the number of investigated IDPs is small compared to the possible number of IDPs/IDRs predicted by bioinformatics (thousands or more), it is sufficient to provide an overview of PreSMos. In 2012, the number of IDPs/IDRs with PreSMos was 27 (out of 48 studied); it is now 48 out of 70, so the proportion of MU type IDPs/IDRs increased from 56% to 69%. The proportion is likely to increase if more IDPs/IDRs are characterized. One immediate feature noted in Table 1 is that in most cases we essentially study IDRs rather than IDPs (only 15 are IDPs), although we speak of IDPs. Note that all IDPs/IDRs in Table 1 are composed of more than 40 residues except for Myb25/Myb32. IDPs by definition consist of a minimum of 40 residues and are distinct from the short flexible linkers and loops typically composed of fewer than 20 residues. The other feature shown in Table 1 is that most PreSMos are helices, even though some are turns, β-strands and poly-proline type II helices. A high percentage of helices is also noted in MoRFs, where α-MoRFs are the majority (Mohan et al., 2006; Oldfield et al., 2005).
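The proportions quoted above follow directly from the counts given in this section. A minimal Python check is sketched below; the function name is hypothetical and is used only for illustration.

```python
def percent_with_presmos(n_with_presmos: int, n_studied: int) -> int:
    """Percentage of studied IDPs/IDRs that contain PreSMos, rounded to an integer."""
    if n_studied <= 0:
        raise ValueError("n_studied must be positive")
    return round(100 * n_with_presmos / n_studied)


# Counts as stated in the text: 27 of 48 in the 2012 review, 48 of 70 now.
print(percent_with_presmos(27, 48))
print(percent_with_presmos(48, 70))
```

With the counts stated in the text, the two calls reproduce the quoted 56% and 69%.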
NMR is the main tool that enables quantitative definition of a PreSMo (Chi et al., 2007; Eliezer et al., 2001; Kim et al., 2009a; 2009b; 2015; 2017b; Lee et al., 2000; 2012; 2016; Liu et al., 1999; Xu et al., 2009). The beauty of the NMR technique is that the presence of a PreSMo is reflected in several independent NMR parameters. In the early days, one needed to provide all of these NMR parameters (chemical shifts, inter-proton NOEs, J-couplings, T1 and T2 relaxation times, heteronuclear NOEs, temperature coefficients of backbone amide protons, etc.) to prove the existence of a PreSMo (Lee et al., 2000), whereas in recent years it is usually sufficient to provide just SSP (secondary structure propensity) scores (Marsh et al., 2006), as the concept of PreSMos has become more and more widely accepted. The SSP scores derived from CSIs (chemical shift indices) reveal an actual percentage value of a PreSMo population, whereas CSIs can only indicate whether or not a PreSMo is present. A very important feature of a PreSMo is that it is never 100% populated. On average, PreSMos are ~30% pre-populated, i.e., transient (Lee et al., 2012). This transient nature of PreSMos is probably the main reason why several NMR investigators failed to detect them in the early days (Fletcher and Wagner, 1998; O’Hare and Williams, 1992; Radhakrishnan et al., 1997).
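As a rough illustration of how a transient population might be read from per-residue SSP scores, the Python sketch below locates contiguous stretches of positive scores and reports their mean as a percentage. It relies on the SSP convention that a score of +1 corresponds to fully formed α-structure and −1 to fully formed β-structure (Marsh et al., 2006). The score values, the 0.1 threshold, the minimum segment length, and the function name are all hypothetical choices made for this example and are not taken from any of the cited studies.

```python
from typing import List, Sequence, Tuple


def find_helical_segments(
    ssp: Sequence[float],
    first_residue: int = 1,
    threshold: float = 0.1,   # assumed cutoff, not a published value
    min_length: int = 4,      # assumed minimum run length
) -> List[Tuple[int, int, int]]:
    """Return (start, end, mean_percent) for runs of SSP scores above threshold."""
    segments = []
    run_start = None
    # A sentinel below the threshold closes a run that reaches the last residue.
    for i, score in enumerate(list(ssp) + [threshold - 1.0]):
        if score > threshold:
            if run_start is None:
                run_start = i
        elif run_start is not None:
            length = i - run_start
            if length >= min_length:
                mean = sum(ssp[run_start:i]) / length
                segments.append(
                    (first_residue + run_start, first_residue + i - 1, round(100 * mean))
                )
            run_start = None
    return segments


# Hypothetical SSP scores for residues 10-19 of an imaginary IDR.
example_ssp = [0.02, 0.05, 0.25, 0.30, 0.35, 0.30, 0.03, -0.05, 0.15, 0.02]
print(find_helical_segments(example_ssp, first_residue=10))
```

For these made-up scores the sketch reports a single segment, residues 12 to 15, with a mean helical propensity of about 30%; the isolated positive score at residue 18 is discarded because the run is shorter than the assumed minimum length.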
The most common bioinformatics term used interchangeably with PreSMos is MoRFs (Mohan et al., 2006). For example, the mdm2-binding helix PreSMo detected by NMR in free p53 TAD is reported as an α-MoRF, a MoRF seen as an alpha helix in the target-bound state (Oldfield et al., 2005). Although a few more MoRFs (out of more than a hundred) overlap with PreSMos, fundamental differences exist between MoRFs and PreSMos. By definition, MoRFs were identified in the X-ray structures of complexes between target proteins and short fragments of IDPs/IDRs that were predicted to be disordered by bioinformatics disorder prediction algorithms. The concept of the MoRF implicitly acknowledges the idea that the structured, bound conformation is induced only upon target binding, which is based on the early-day idea that IDPs have no pre-structured secondary structures. On the other hand, the definition of a PreSMo is not associated with the target-bound structure at all. In this regard, stating that a MoRF is found by NMR experiments is inaccurate (Bourhis et al., 2004) since one cannot tell if a MoRF would exist within an isolated IDP. One has to obtain a complex structure between a target and a PreSMo/MoRF in order to conclude that the putative MoRF (which is actually a PreSMo) is indeed a MoRF. Thus, a helix PreSMo may become an α-MoRF, but the opposite may not necessarily be true. With PreSMos we get a realistic percentage of pre-structuredness, whereas MoRFs do not provide such information. The term PreSMo was introduced as late as 2012, but we underline that the PreSMos mentioned here refer to all the pre-existing or pre-formed residual secondary structures detected by NMR years before the term MoRF was introduced. It will be interesting to see how many MoRFs indeed coincide with PreSMos. One has to use a MoRF fragment, or preferably a longer IDR that encompasses such a MoRF fragment, to answer this question.
An active pocket is a property of a globular protein that exists before binding to its target. In this regard, PreSMos qualify as the “active sites”, albeit not pockets, of IDPs since they are present before target binding. The same cannot be said for MoRFs. In Fig. 1, we show a conceptual scheme depicting what we have just described.
As is evident from Table 1, PreSMos are the target-binding hot spots already present in free IDPs/IDRs; they are primed in a conformation similar to the target-bound conformation. Such pre-structuring is certainly advantageous for avoiding an entropic penalty that has to be paid when malleable IDPs/IDRs bind globular targets. Recent mutation studies demonstrated that the degree of pre-population of PreSMos is subtly controlled for efficient target binding (Borcherds et al., 2014; Iešmantavičius et al., 2014; Kim et al., 2017b; Salamanova et al., 2018). In many globular proteins, a single mutation in the active site completely nullifies protein function by disabling the binding of ligands. PreSMos are often found in tandem within sufficiently long transcription factor IDPs/IDRs, separated by ~30 residues (Chi et al., 2005). One PreSMo may be a high-affinity binding site for a target whereas the other is a low-affinity site for the same target. A synergistic effect of multiple PreSMos for efficient target binding has been discussed previously (Lee et al., 2000).
Since it was believed that any secondary structure in IDPs should be induced only upon target binding, many implicitly concluded that IDPs would lie totally outside the classical structure-function paradigm, not obeying the rules established by structural biology such as shape complementarity. However, PreSMos reveal that IDPs abide by shape complementarity extremely well when binding to targets (see Fig. 3 in Lee et al., 2012). In other words, when the secondary structural aspects of IDP-target binding are considered, IDPs are not unorthodox at all. The genuine novelty of IDPs is only the absence of 3-D structures, not the absence of secondary structures. Structure (or PreSMos) does dictate function in the case of IDPs.
The NMR spectral quality of hybrid-type IDPs is often not good enough for a full resonance assignment since a globular domain and an IDR tumble on different time scales. Consequently, a reductionist approach of using an IDR instead of a whole IDP is often necessary. One precaution when using such an approach is that one should use a sufficiently long region, not a short fragment, since PreSMos may exist outside the region covered by a short peptide (Botuyan et al., 1997; Uesugi et al., 1997). A longer IDR often contains a more populated PreSMo due to a tertiary effect that stabilizes the transient secondary structures, as was demonstrated in the case of p53 TAD and its short helical peptide (Botuyan et al., 1997; Lee et al., 2000). Another case demonstrating the significance of using a fragment of appropriate length is Myb25/Myb32 (Table 1; Arai et al., 2015). The populations of a helix PreSMo in Myb25 and in Myb32 are ~30% and ~70%, respectively, demonstrating that having just 7 more residues in Myb32 drastically increases the PreSMo population by ~40 percentage points. Using bioinformatics disorder prediction programs may keep one from choosing an inappropriate IDR for NMR experiments. The inappropriate choice of an IDR for NMR investigation might be another reason why some NMR studies failed to detect PreSMos.
Because IDPs are a relatively new field, several new (sometimes rather vague) terms and expressions were introduced in order to describe novel concepts or phenomena associated with IDPs (van der Lee et al., 2014). Aside from bioinformatics terms (PSEs, MoRFs), numerous other expressions with basically the same meaning as PreSMos were proposed, such as “only partly structured” (Zor et al., 2002), “small islands of secondary structures” (Laptenko and Prives, 2006), “weakly structured” (Chumakov, 2007), “limited structure” (Lavery and McEwan, 2008), “minimal ordering of short linear motifs” (Mittag et al., 2008), “residual secondary structural elements” (Kim et al., 2009b), “transient order” (Feuerstein et al., 2012), “transiently ordered regions”, “localized structurally ordered regions” (Zheng et al., 2012), and “dynamic local structure” (Lum et al., 2012), just to name a few.
Such a proliferation of terms is not unique to PreSMos. For example, it took more than a decade for the IDP research community to arrive at a more or less consensus term for IDPs in 2013 (Dunker et al., 2013). Yet overly creative names not precisely in line with the classical concepts and terms in structural biology or protein science created a certain degree of confusion, which led to a situation where the importance of IDPs was not duly appreciated for some time (Uversky and Dunker, 2010). Here, we again present the easy-to-use term PreSMos to designate what has been described by several generic names, realizing that the existence and functional significance of PreSMos will be appreciated more and more (now in ~70% of IDPs). Most importantly, the statement that
Approximately 20 years have passed since IDPs emerged in the protein science and structural biology communities. With more than 5,000 papers on the subject, no one would deny that IDPs have brought a critical paradigm shift to protein research, undoubtedly requiring that biochemistry textbooks be revised to include IDPs. Owing to an early-day misconception, there has been a tendency to put excessive emphasis on the disordered nature per se of IDPs, with subsequent attempts to relate disorder itself to function. For example, some reports on PreSMos were interpreted simply as evidence for disorder itself rather than as evidence for the existence of PreSMos (Cheng et al., 2006; Midic et al., 2009; Radivojac et al., 2007). It is important for the protein science community to learn a non-traditional view of proteins and their structures in two respects. First, it is now a well-known fact that long regions (40 residues and up) of proteins can be intrinsically disordered beyond the level of short disordered loops (Dunker et al., 2000). Proteins exist as dynamic conformational ensembles, not as the snapshot entities that the PDB structures (both X-ray and NMR) have depicted for a long time. Second, in the absence of a well-defined 3-D structure, the minimal residual secondary structures embedded in the flexible long IDR play key roles in target binding and govern the function of IDPs. Even in globular proteins, an important role of tertiary structure is to place the interacting (or active) secondary structures in a proper orientation relative to target proteins.
A discussion of PreSMos naturally brings us to the question of whether the mechanism of IDP-target binding follows IF (induced fit) or CS (conformational selection). In the case of KID-KIX binding, IF was shown to be dominant (Sugase et al., 2007), whereas in the N-tail of viral nucleoproteins CS appeared prevalent (Jensen et al., 2008). In recent years, it has come to be believed that these two mechanisms work in concert: CS at the start of binding and IF at the final stage of binding (tightening). The existence of PreSMos is not in itself evidence for CS, and one needs to use a kinetics approach in order to determine if faster binding (increased kon) can be achieved with more pre-structuring of the PreSMo segments. Future work employing PreSMo mutants should provide a more concrete answer on this aspect. No matter whether PreSMos are pre-structured or not, i.e., even if a PreSMo may become unstructured and re-structured for binding as one may envision in the IF model (To et al., 2016), it still does not change the fact that the fragment forming a PreSMo per se is important for target binding.
It is possible that PreSMos are also important for aggregation via oligomerization (Atwal et al., 2007; Eliezer et al., 2001). Both oligomerization and IDP-target binding are protein-protein interactions; the former is homogeneous IDP-IDP self-binding while the latter is heterogeneous binding. Even though the PreSMo concept is broadly (~70%) applicable, we do not expect it to be applicable to all IDPs since there are IDPs/IDRs that are composed of simple dipeptide repeats (Lee et al., 2016). The PreSMo concept is also unlikely to be applicable to highly charged polyvalent IDPs which maintain unfolded topology even after target binding (Borgia et al., 2018). Due to strong attractive electrostatic interactions, these IDPs have a very high affinity (pM) towards each other, unlike MU-type IDPs that bind their targets via PreSMos typically with μM affinities. However, it is noteworthy that even polyglutamine and polyproline were shown to form α-helical and PPII helix type secondary structures, respectively (Mukrasch et al., 2009; Newcombe et al., 2018). Recent reports showed that IDP studies may lead to the development of new pharmaceuticals. For example, some PreSMo-antagonists against target proteins could serve as anti-cancer compounds (Kim et al., 2017a) and certain small molecule inhibitors can directly inhibit IDPs themselves (Follis et al., 2008; Metallo, 2010).
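To put the pM and μM affinities mentioned above on a common energy scale, the standard thermodynamic relation ΔG° = RT ln(Kd) can be applied. The Python sketch below is an illustration only: the temperature of 298.15 K, the 1 M standard state, the representative Kd values of exactly 1 pM and 1 μM, and the function name are assumptions made for this example, not measurements from the cited studies.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def binding_free_energy(kd_molar: float, temperature_k: float = 298.15) -> float:
    """Standard binding free energy in kcal/mol from a dissociation constant in M."""
    if kd_molar <= 0:
        raise ValueError("Kd must be positive")
    # Delta G = RT ln(Kd), with Kd taken relative to a 1 M standard state.
    return R_KCAL * temperature_k * math.log(kd_molar)


dg_micromolar = binding_free_energy(1e-6)   # assumed representative uM binder
dg_picomolar = binding_free_energy(1e-12)   # assumed representative pM binder
print(f"Kd = 1 uM: {dg_micromolar:.1f} kcal/mol")
print(f"Kd = 1 pM: {dg_picomolar:.1f} kcal/mol")
print(f"difference: {dg_micromolar - dg_picomolar:.1f} kcal/mol")
```

Under these assumptions, the million-fold difference in Kd between a pM and a μM interaction corresponds to roughly 8 kcal/mol of additional binding free energy at room temperature.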
Table 1. A list of MU-type IDPs/IDRs containing PreSMos
| Name | Number of residues | P/R^b | Location of PreSMo residues^c | Population^d (%) | Role/Binding | References |
|---|---|---|---|---|---|---|
| FlgM | 97 | P | 60–73 | 50±10 | σ28 | Daughdrill et al., 1997 |
|  |  |  | 83–90 | 50±10 |  |  |
|  |  |  | 42–50 | 20 |  |  |
| KID | 60 | R | 119–129 | >50 | KIX | Radhakrishnan et al., 1998 |
|  |  |  | 134–143 | ~10 |  | Hua et al., 1998 |
| GBD/CRIB in WASP W7 (201–268) | 68 | R | 252–264 | ~14 | Cdc42/Rac | Rudolph et al., 1998 |
| HIV-1 Nef (2–57) | 56 | R | 14–22: helix I | 18 |  | Geyer et al., 1999 |
|  |  |  | 35–41: helix II (Hα only) |  |  |  |
| Synaptobrevin-2 | 96 | R | 78–91 | 45 | core complex forming | Hazzard et al., 1999 |
| APPC (649–695) | 47 | R | 20–23 | 30 | X11 | Ramelot et al., 2000 |
|  |  |  | 27–35 | 20 |  |  |
|  |  |  | 37–45 (Hα only) | 30 |  |  |
| p53 TAD | 73 | R | 18–26: helix | 20 | Mdm2 | Lee et al., 2000 |
|  |  |  | 40–44: turn I | 5 | RPA, TFEII |  |
|  |  |  | 48–53: turn II | 15 |  |  |
| RPS4 | 200 | P | 12–15 | 8 | rRNA, ribosomal proteins | Sayers et al., 2000 |
|  |  |  | 30–33: β? | 23 |  |  |
| α-Synuclein | 140 | P | 18–31 | ~10 | amyloid-forming | Eliezer et al., 2001 |
|  |  |  |  |  |  | Murrali et al., 2018 |
| N-term. Tmod 1 | 92 | R | 24–35 | NA | tropomyosin | Greenfield et al., 2005 |
| VP16 TAD (412–490) | 79 | R | 443–447 | 25 | hTAFII31 PC4 | Jonker et al., 2005 |
|  |  |  | 469–483 | 15 |  |  |
| VP16 TAD (412–490) | 79 | R | 424–433/442–446, 465–467/472–479 (Hα only) | 60/40 | hTAFII31 PC4 | Kim et al., 2009 |
|  |  |  |  | 10/20 |  |  |
| Dynein interm. chain (198–237) | 40 | R | 223–228 | NA | light chains | Benison et al., 2006 |
|  |  |  |  |  |  | Benison et al., 2007 |
| γ-Synuclein | 127 | P | 49–99 | ~15 |  | Marsh et al., 2006 |
| HMGA1 | 107 | P | 3–9 | 8 | 20 different proteins | Buchko et al., 2007 |
|  |  |  | 64–67 |  |  |  |
| CFTR (654–838) | 185 | R | 654–668, 759–764, 766–776, 801–817 | >5 | interaction between R region and NT-binding domain 1 | Baker et al., 2007 |
|  |  |  |  | >5 |  |  |
|  |  |  | 744–753 | >5 |  |  |
| NS5A-D2 (HCV) (250–342) | 93 | R | L48-V57 | 20 | - | Liang et al., 2007 |
|  |  |  | L86-E96 (Hα only) | 25 |  |  |
| preS1 of HBV | 119 | R | 32–36, 41–45 | ~10 | hepatocyte receptor-binding | Chi et al., 2007 |
|  |  |  | 11–18, 22–25, 37–40, 46–50 (Hα only) | ~10 |  |  |
|  |  |  |  | ~10 |  |  |
| β-synuclein | 134 | P | NA | ~20 | - | Sung et al., 2007 |
| Securin | 202 | P | 150–159: helix | 45 | - | Csizmok et al., 2008 |
|  |  |  | 113–127 (β) | 15 |  |  |
|  |  |  | 174–178 | 20 |  |  |
| C-XPC^e (815–940) | 126 | R | 818–843: helix | ~30 | Centri2 | Miron et al., 2008 |
|  |  |  | 847–860: helix | ~30 | TFIIH |  |
|  |  |  | 891–901: helix | NA |  |  |
|  |  |  | 908–915: helix | NA |  |  |
|  |  |  | 923–930: helix | NA |  |  |
| MSP2 | 237 | P | 14–21 | 35 | - | Zhang et al., 2008 |
|  |  |  | 140–150 | 35 |  |  |
|  |  |  | 197–211 | 20 |  |  |
| DARPP-32 | 118 | R | 22–29 | 50 | PP1 | Dancheck et al., 2008 |
|  |  |  | 103–114 | 25 |  |  |
| I-2 (9–164) | 156 | R | 36–42 | 30 | PP1 | Dancheck et al., 2008 |
|  |  |  | 96–106 | 48 (70) |  |  |
|  |  |  | 127–154 | 67 (90) |  |  |
|  |  |  | 132–138 | >98 |  |  |
| ENSA | 121 | P | 32–36 | 40 | - | Boettcher et al., 2008 |
|  |  |  | 48–50 | 10 |  | Boettcher et al., 2007 |
|  |  |  | 65–70 | 30 |  |  |
| ODD/HIF-1α (404–477) | 74 | R | 438–440 | ~10 | - | Kim et al., 2009 |
|  |  |  | 467–477 |  |  |  |
| Sml1 (1–104) | 104 | P | 4–14: helix | ~20 | RNR binding | Zhao et al., 2000 |
|  |  |  | 61–80: helix | ~70 | Dimer forming |  |
| Myb25 (291–315) | 25 | R | 295–309: helix | 25~30 | KIX | Zor et al., 2002 |
| N tail, Measles virus nucleoprotein (401–525) | 125 | R | 488–499: helix | NA | phosphoprotein P | Bourhis et al., 2004 |
| dSLBP (17–108) | 92 | R | 28–45: helix | NA | mRNA stem-loop | Thapar et al., 2004 |
|  |  |  | 50–57: helix |  |  |  |
|  |  |  | 66–75: helix |  |  |  |
|  |  |  | 91–96: helix |  |  |  |
| Tβ-4 (1–43) | 43 | P | 5–16: helix | NA | Ca ATP | Domanski et al., 2004 |
|  |  |  |  |  | G-actin |  |
| N tail, Sendai Virus nucleoprotein (443–524) | 82 | R | 479–484 | 36 | phosphoprotein P | Jensen et al., 2008 |
|  |  |  | 476–488 | 38 |  |  |
|  |  |  | 478–492 | 11 |  |  |
| Sic1 (1–90) | 90 | R | 20–30 | 20 | Cdc4 | Mittag et al., 2008 |
|  |  |  | 63–68 |  |  |  |
| c-Myc (1–88) | 88 | R | 26–34: helix | 40 | Bin-SH3 domain | Andresen et al., 2012 |
|  |  |  | 47–52: helix | 25 | 24–31 (TRRAP binding) |  |
|  |  |  | 20–23: β-turn |  |  |  |
| ExsE (1–88) | 88 | P | 42–51: helix | NA | ExsC | Zheng et al., 2012 |
|  |  |  | 61–65: helix |  |  |  |
| NS5A, HCV (33–447) | 415 | R | 401–412: helix | NA | Bin1-SH3 | Braeuning, 2013 |
|  |  |  | 427–445: helix |  |  |  |
| NS5A, HCV (191–369) | 179 | R | 205–221: helix I | 38 | Bin1-SH3 | Feuerstein et al., 2012 |
|  |  |  | 251–266: helix II | 38 |  | Solyom et al., 2015 |
|  |  |  | 292–306: helix III | 51 |  |  |
| 4EBP2 (1–120) | 120 | P | 1–5 | 15~37 | eIF4E | Lukhele et al., 2013 |
|  |  |  | 33–37 |  |  |  |
|  |  |  | 50–64 |  |  |  |
|  |  |  | 86–89 |  |  |  |
|  |  |  | 96–105 |  |  |  |
| E7, HPV (1–40) | 40 | R | 8–13: helix | NA | E2 | Noval et al., 2013 |
|  |  |  | 17–29: helix |  |  |  |
|  |  |  | 33–38: PPII |  |  |  |
| 4EBP1 (49–118) | 70 | R | 56–63: helix | 20 | eIF4E | Kim et al., 2015 |
| Myb32 (284–315) | 32 | R | 290–310: helix | ~70 | KIX | Arai et al., 2015 |
| E7, HPV (1–46) | 46 | R | 7–14: helix | 10 | E2 | Lee et al., 2016 |
|  |  |  | 20–26: helix | 20 |  |  |
| CBP-ID4 (1851–2057) | 207 | R | 1852–1875: helix | ~60 | - | Piai et al., 2016 |
|  |  |  | 1951–1978: helix |  |  |  |
| HIV-1 Tat (1–121)^a | 121 | P | 27–32: helix | ~20 | Fab’ | To et al., 2016 |
|  |  |  | 41–59: helix | ~30 | P-TEFb |  |
|  |  |  | 70–81: β sheet | ~25 | TAR-cyclin T1 |  |
|  |  |  | 93–99: β sheet | ~25 |  |  |
|  |  |  | 105–112: β sheet | ~10 |  |  |
| SUSP4 (201–300) | 100 | R | 263–291: helix | ~30 | mdm2 | Kim et al., 2017 |
|  |  |  | 265–270: helix | ~10 |  |  |
|  |  |  | 281–291: helix |  |  |  |
| hGRtau1c (181–244) | 64 | R | 185–202: helix | 20~30 | TAZ2 | Kim et al., 2017 |
|  |  |  | 206–225: helix |  |  |  |
|  |  |  | 232–244: helix |  |  |  |
| Huntingtin Httex1 25Q (1–95) | 95 | P | 18–42: helix | NA | Cytotoxic | Newcombe et al., 2018 |
|  |  |  |  |  | Membrane binding |  |
|  |  |  |  |  | Aggregation |  |
^a The numbering includes a 20-residue N-terminal tag.
^b An IDP (P) versus an IDR (R).
^c Residue numbers are taken from the original report.
^d Populations of PreSMos are read from the mid-point of the SSP scores that are calculated from chemical shifts in BMRB or literature. Shown in bold are the populations described in the original report. When the populations described in the original report without SSP scores differed significantly from the calculated SSP scores, the SSP scores are provided in parentheses.
NA = not available.
Published online October 31, 2018 https://doi.org/10.14348/molcells.2018.0192
Do-Hyoung Kim, and Kyou-Hoon Han*
Genome Editing Research Center, Korea Research Institute of Bioscience and Biotechnology, Daejeon, 34141, Korea
The central dogma in protein science, established over the last half-century, states that “a well-folded 3-D structure is a prerequisite for protein function”. The 3-D structure in this statement refers to the one that is observed under near-physiological conditions (i.e., ~pH 7, ambient temperature, and aqueous buffer). Intrinsically unstructured/unfolded proteins (IUPs), now more commonly known as intrinsically disordered proteins (IDPs) (Dunker et al., 2013), are very peculiar proteins that do not form well-folded 3-D structures even under non-denaturing conditions. Naturally, IDPs are of great importance from a protein folding perspective. More intriguing are the observations that IDPs are functional or active without 3-D structures, for example, being involved in transcription (Lee et al., 2000; Sherr, 2004; Kim et al., 2017a; 2017b), translation (Fletcher and Wagner, 1998; Kim et al., 2015), cell cycle regulation (Pavletich, 1999), chaperoning (Hong et al., 2005), and membrane-binding (Atwal et al., 2007; Eliezer et al., 2001). The discovery that such highly unorthodox proteins are numerous, accounting for as much as half of the entire human proteome (Dunker et al., 2000), has strongly suggested that the classical structure-function relationship of proteins needs to be reexamined. Clearly, the golden paradigm in structural biology, 3-D structure = protein function, is no longer valid. Several reviews dealing with general aspects of IDPs are available for further reading (Chavali et al., 2017; Dunker et al., 2013; Lee et al., 2012; Uversky and Dunker, 2010; Uversky, 2015).
Our keen interest in IDPs stems not only from basic science but also from the fact that these proteins are involved in many fatal diseases. For example, ~80% of human cancers are associated with IDPs (Galea et al., 2008) such as eIF4E-binding proteins (4EBPs) (Fletcher and Wagner, 1998; Kim et al., 2015), Bcl-XL (Xu et al., 2009), human glucocorticoid receptors (Kim et al., 2017b), E7 (Lee et al., 2016), hypoxia inducible factors (Semenza, 2003; Kim et al., 2009a) and p53, all of which are so-called “hybrid-type” IDPs where intrinsically disordered regions (IDRs) coexist with globular domains (Lee et al., 2000; Wells et al., 2008). The causative agents of mad cow disease, or Creutzfeldt-Jakob disease (CJD) in humans, are prions, which are also IDPs in which a C-terminal globular domain coexists with a long IDR at the N-terminus encompassing ~120 amino acid residues (James et al., 1997; Liu et al., 1999). Alpha-synuclein (Eliezer et al., 2001) and tau (Bibow et al., 2011; Künze et al., 2012), implicated in Parkinson’s disease (PD) and Alzheimer’s disease (AD), respectively, are also IDPs. Furthermore, several viral strains including the well-known AIDS-causing HIV-1 produce IDPs (Chi et al., 2007; Feuerstein et al., 2012; Kim et al., 2009b; Lee et al., 2016; Liang et al., 2007; Reingewertz et al., 2009; To et al., 2016). Clearly, there is an immediate and strong need to acquire thorough knowledge not only of the normal functionality of IDPs but also of their pathologic connection to the above diseases, since it has become apparent that the classical approach based on globular proteins is unlikely to provide sufficient information for developing effective weaponry against IDP-associated diseases.
The most obvious characteristic of IDPs is that they do not possess spatially-disposed active pockets, a fact that brings us to a simple but profound question: how can these long malleable stretches of amino acids (sometimes hundreds of residues) bind their targets? Targets of IDPs are not just proteins, but can be nucleic acids (Thapar et al., 2004; To et al., 2016; Wells et al., 2008), lipids, metals, and small molecules (Follis et al., 2008; Metallo, 2010). Efforts were made recently to classify IDPs into several subfamilies (van der Lee et al., 2014). While intuitive, such a classification fails to provide detailed insights into how all these different subfamilies bind their targets. The well-cited expression “coupled folding and binding” (Dyson and Wright, 2002) is a useful term, but only insofar as it depicts the rather easily predictable topological change that IDPs need to experience upon binding to their partners. This generic description therefore fails to provide any atomistic details associated with IDP-target binding that, if available, would be highly valuable for IDP-based drug design. As the axiom “
Direct and quantitative evidence that some sort of a secondary structural element, e.g., helix, is needed for transcriptional activity came from an NMR study on p53 TAD (Lee et al., 2000). The 73-residue long p53 TAD in its unbound form was found “unstructured” in a tertiary sense, yet contained a
While the conceptual development of PreSMos has been somewhat delayed due to previous misconceptions that IDPs were completely unstructured, the presence of local residual secondary structures in isolated IDPs has been increasingly detected by many NMR investigations, including a few critical NMR reports published at the turn of the century. The first key report found that p53 TAD has local structural elements (a helix and two turns) in the unbound state, as described above (Lee et al., 2000). The second report, by Ramelot et al., demonstrated that the cytoplasmic tail of the amyloid precursor protein forms a transient structure and that such a pre-ordered structure is important for its binding to cytosolic factors (Ramelot et al., 2000). Sayers et al. also reported that structural preordering important for target binding was detected in the N-terminal region of ribosomal protein S4 (Sayers et al., 2000). Zhao et al. reported local structural elements in the overall loosely folded Sml1 (Zhao et al., 2000). Zitzewitz et al. published an article in 2000 entitled “Preformed secondary structure drives the association reaction of GCN4-p1, a model coiled-coil system” (Zitzewitz et al., 2000). Another report by Bienkiewicz et al. described the functional consequences of pre-organized helical structure in the intrinsically disordered cell-cycle inhibitor p27 (Kip1) (Bienkiewicz et al., 2002). All these early NMR studies contributed to the foundation of the PreSMo concept, the idea that IDPs are not completely unstructured (CU), but mostly unstructured (MU), and contain PreSMos. Following these NMR reports, bioinformatics studies proposed similar concepts a few years later, such as PSE (Pre-formed Structural Element) (Fuxreiter et al., 2004), MoRF (Molecular Recognition Feature) (Mohan et al., 2006; Oldfield et al., 2005), or primary contact sites.
All these results, NMR experimental or predicted, point in unison to the idea that IDPs possess local secondary structural elements that are “hot spots” for target-binding.
In 2012 we published the first comprehensive review on PreSMos (Lee et al., 2012) because no explicit articles on the subject were available, despite the fact that PreSMos (whatever they may be called) have been recognized for more than a decade as very important (perhaps the most significant) features explaining IDP-target binding on a per-residue basis. Several additional pieces of evidence have recently been published, demonstrating the functional significance of PreSMos (Kim et al., 2017b; Iešmantavičius et al., 2014; Mohan et al., 2014; Salamanova et al., 2018). In the first review, we presented 27 IDPs/IDRs containing PreSMos, which constituted ~56% of all IDPs characterized by then. Most critically, we introduced the term pre-structured motifs (PreSMos) in order to point out unambiguously the importance of the pre-structured nature of target-binding segments in free IDPs and to provide a convenient term that can replace the various names then in use: “transient, nascent, residual, minimally-structured, non-negligible, pre-existing, pre-formed, or pre-ordered secondary structures”. These terms were used mainly by NMR structural biologists who did not hasten to generalize the concept under a particular name, realizing that PreSMos had only been observed in a handful of IDPs until 2005. This review is a follow-up to our 2012 review. Because we have found 20 more PreSMos since our first review, here we provide an updated list of PreSMos and a brief description of their functional significance; however, we acknowledge that the list may still be incomplete. In addition, we describe differences between the PreSMos that are detected experimentally and the terms derived from bioinformatics predictions. With this review we now have 47 IDPs/IDRs containing PreSMos, strongly suggesting that PreSMos are general
The definition of a PreSMo was given in our 2012 review (Lee et al., 2012); PreSMos are NMR-detected transient secondary structural elements within long (minimally 40 residues) and functionally-active IDRs of IDPs. We underline the fact that PreSMos are the
Table 1 is an updated list of PreSMos found in 47 IDPs/IDRs. The total number of IDPs studied in detail by NMR (with the exception of C-XPC, studied by SAXS) is 70, even though the number of reports exceeds 70 because some IDPs were investigated more than once. Notably, several IDPs (4EBP1, HIV-1 Tat, VP16 TAD, securin, and p21Waf1/Cip1/Sdi1) that were originally reported as CU types with no PreSMo turned out to be MU types in later studies. For convenience, we added the 20 newly-identified PreSMos (starting from Myb25) at the end of Table 1, including a few PreSMos that were actually reported before 2012 but were not included in our 2012 review. Although the number of investigated IDPs is small compared to the possible number of IDPs/IDRs predicted by bioinformatics (thousands or more), it is sufficient to provide an overview of PreSMos. In 2012, the number of IDPs/IDRs with PreSMos was 27 (out of 48 studied); it is now 48 out of 70, so the proportion of MU type IDPs/IDRs increased from 56% to 69%. The proportion is likely to increase if more IDPs/IDRs are characterized. One immediate feature noted in Table 1 is that in most cases we essentially study IDRs rather than IDPs (only 15 are IDPs), although we speak of IDPs. Note that all IDPs/IDRs in Table 1 are composed of at least 40 residues except for Myb25/Myb32. IDPs by definition consist of a minimum of 40 residues and are distinct from the short flexible linkers and loops typically composed of fewer than 20 residues. The other feature shown in Table 1 is that most PreSMos are helices, even though some are turns, β-strands and poly-proline type II helices. A high percentage of helices is also noted in MoRFs, where α-MoRFs are the majority (Mohan et al., 2006; Oldfield et al., 2005).
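The proportions quoted above follow directly from the stated counts. The short Python check below recomputes them; the function name is ours and is introduced only for illustration.

```python
def mu_fraction(with_presmo: int, studied: int) -> float:
    """Return the percentage of studied IDPs/IDRs that contain PreSMos."""
    if studied <= 0:
        raise ValueError("studied must be positive")
    return 100.0 * with_presmo / studied


# Counts as stated in the text: the 2012 review versus this review.
print(round(mu_fraction(27, 48)))  # 56
print(round(mu_fraction(48, 70)))  # 69
```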
NMR is the main tool that enables quantitative definition of a PreSMo (Chi et al., 2007; Eliezer et al., 2001; Kim et al., 2009a; 2009b; 2015; 2017b; Lee et al., 2000; 2012; 2016; Liu et al., 1999; Xu et al., 2009). The beauty of the NMR technique is that the presence of a PreSMo is reflected in several independent NMR parameters. In the early days, one needed to provide all of these NMR parameters (chemical shifts, inter-proton NOEs, J-couplings, T1 and T2 relaxation times, heteronuclear NOEs, temperature coefficients of backbone amide protons, etc.) to prove the existence of a PreSMo (Lee et al., 2000), whereas in recent years it is usually sufficient to provide just SSP (secondary structure propensity) scores (Marsh et al., 2006), as the concept of PreSMos has become more and more widely accepted. The SSP scores derived from CSIs (chemical shift indices) reveal an actual percentage value of a PreSMo population, whereas CSIs can only indicate whether or not a PreSMo is present. A very important feature of a PreSMo is that it is never 100% populated. On average, PreSMos are ~30% pre-populated, i.e., transient (Lee et al., 2012). This transient nature of PreSMos is probably the main reason why several NMR investigators failed to detect them in the early days (Fletcher and Wagner, 1998; O’Hare and Williams, 1992; Radhakrishnan et al., 1997).
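As a rough illustration of how a population might be read off a per-residue SSP profile, the Python sketch below locates contiguous stretches of positive (helical) SSP scores and reports the score at the mid-point residue as a percentage, following the convention stated in footnote d of Table 1. It does not reimplement the SSP program of Marsh et al. (2006): the scores are assumed to be already computed, the threshold and minimum segment length are hypothetical choices of ours, and the example profile is made up.

```python
from typing import List, Tuple


def find_helical_presmos(
    ssp: List[float],
    first_residue: int = 1,
    threshold: float = 0.1,  # hypothetical cutoff, not taken from the source
    min_length: int = 4,     # hypothetical minimum segment length
) -> List[Tuple[int, int, float]]:
    """Return (start, end, population_percent) for each contiguous stretch
    whose SSP score stays at or above `threshold`.

    The population is the SSP score at the mid-point residue times 100.
    """
    segments = []
    start = None
    # A sentinel below any threshold closes a run that reaches the last residue.
    for i, score in enumerate(ssp + [float("-inf")]):
        if score >= threshold and start is None:
            start = i
        elif score < threshold and start is not None:
            end = i - 1
            if end - start + 1 >= min_length:
                mid = (start + end) // 2
                segments.append(
                    (first_residue + start, first_residue + end, 100.0 * ssp[mid])
                )
            start = None
    return segments


# Made-up SSP profile for a 12-residue stretch starting at residue 15.
scores = [0.0, 0.02, 0.15, 0.2, 0.25, 0.3, 0.25, 0.2, 0.12, 0.03, 0.0, -0.05]
print(find_helical_presmos(scores, first_residue=15))
```

With this made-up profile the sketch reports a single segment spanning residues 17 to 23 with a mid-point population of about 30%. Real profiles are noisier, so the cutoff and the minimum length would need to be chosen with care.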
The most common bioinformatics term used interchangeably with PreSMos is MoRFs (Mohan et al., 2006). For example, the mdm2-binding helix PreSMo detected by NMR in free p53 TAD is reported as an α-MoRF, a MoRF seen as an alpha helix in the target-bound state (Oldfield et al., 2005). Although there are a few more (out of more than a hundred) MoRFs that overlap with PreSMos, fundamental differences exist between MoRFs and PreSMos. By definition, MoRFs were identified in the X-ray structures of complexes between target proteins and short fragments of IDPs/IDRs that were predicted to be disordered by bioinformatics disorder prediction algorithms. The concept of the MoRF implicitly acknowledges the idea that the structured, bound conformation is induced only upon target binding, which is based on the early-day idea that IDPs have no pre-structured secondary structures. On the other hand, the definition of a PreSMo is not associated with the target-bound structure at all. In this regard, stating that a MoRF is found by NMR experiments is inaccurate (Bourhis et al., 2004) since one cannot tell if a MoRF would exist within an isolated IDP. One has to obtain a complex structure between a target and a PreSMo/MoRF in order to conclude that the putative MoRF (which is actually a PreSMo) is indeed a MoRF. Thus, a helix PreSMo may become an α-MoRF, but the opposite may not necessarily be true. With PreSMos we get a realistic percentage of pre-structuredness, whereas MoRFs do not provide such information. The term PreSMo was introduced as late as 2012, but we underline that the PreSMos mentioned here refer to all the pre-existing or pre-formed residual secondary structures detected by NMR years before the term MoRF was introduced. It will be interesting to see how many MoRFs indeed coincide with PreSMos. One has to use a MoRF fragment, or preferably a longer IDR that encompasses such a MoRF fragment, to answer this question.
An active pocket is a property of a globular protein that exists before binding to its target. In this regard, PreSMos qualify as the “active sites”, albeit not pockets, of IDPs since they are present before target binding. The same cannot be said for MoRFs. In Fig. 1, we show a conceptual scheme depicting what we have just described.
As is evident from Table 1, the PreSMos are the target-binding hot spots already present in free IDPs/IDRs; PreSMos are primed in a conformation similar to the target-bound conformation. Such pre-structuring is certainly advantageous for reducing the entropic penalty that has to be paid when malleable IDPs/IDRs bind globular targets. Recent mutation studies demonstrated that the degree of pre-population of PreSMos is subtly controlled for efficient target binding (Borcherds et al., 2014; Iešmantavičius et al., 2014; Kim et al., 2017b; Salamanova et al., 2018). In many globular proteins a single mutation in the active site completely nullifies protein function by disabling the binding of ligands. PreSMos are often found in tandem, separated by ~30 residues, within sufficiently long transcription factor IDPs/IDRs (Chi et al., 2005). One PreSMo may be a high-affinity binding site for a target whereas the other is a low-affinity site for the same target. A synergistic effect of multiple PreSMos for efficient target binding has been discussed previously (Lee et al., 2000).
Since it was believed that any secondary structure in IDPs should be induced only upon target binding, many implicitly concluded that IDPs would lie totally outside of the classical structure-function paradigm, not obeying the rules established by structural biology such as shape complementarity. However, PreSMos reveal to us that IDPs abide by shape complementarity extremely well when binding to targets (see Fig. 3 in Lee et al., 2012). In other words, when the secondary structural aspects of IDP-target binding are considered, IDPs are not unorthodox at all. The genuine novelty of IDPs is only the absence of 3-D structures, not the absence of secondary structures. Structure (or PreSMos) does dictate function in the case of IDPs.
The NMR spectral quality of hybrid-type IDPs is often not good enough for a full resonance assignment since a globular domain and an IDR tumble on different time scales. Consequently, a reductionist approach of using an IDR instead of a whole IDP is often necessary. One precaution when using such an approach is that one should use a sufficiently long region, not a short fragment, since PreSMos may exist outside the region covered by a short peptide (Botuyan et al., 1997; Uesugi et al., 1997). A longer IDR often contains a more populated PreSMo due to a tertiary effect that stabilizes the transient secondary structures, as was demonstrated in the case of p53 TAD and its short helical peptide (Botuyan et al., 1997; Lee et al., 2000). Another case demonstrating the significance of using a fragment of appropriate length is Myb25/Myb32 (Table 1; Arai et al., 2015). The populations of a helix PreSMo in Myb25 and in Myb32 are ~30% and ~70%, respectively, demonstrating that having just 7 more residues in Myb32 drastically increases the PreSMo population by ~40 percentage points. Using bioinformatics disorder prediction programs may keep one from choosing an inappropriate IDR for NMR experiments. The inappropriate choice of an IDR for NMR investigation might be another reason why some NMR studies failed to detect PreSMos.
Because IDPs are a relatively new field, several new (sometimes rather vague) terms and expressions were introduced in order to describe novel concepts or phenomena associated with IDPs (van der Lee et al., 2014). Aside from bioinformatics terms (PSEs, MoRFs), numerous other expressions with basically the same meaning as PreSMos were proposed, such as “only partly structured” (Zor et al., 2002), “small islands of secondary structures” (Laptenko and Prives, 2006), “weakly structured” (Chumakov, 2007), “limited structure” (Lavery and McEwan, 2008), “minimal ordering of short linear motifs” (Mittag et al., 2008), “residual secondary structural elements” (Kim et al., 2009b), “transient order” (Feuerstein et al., 2012), “transiently ordered regions”, “localized structurally ordered regions” (Zheng et al., 2012), and “dynamic local structure” (Lum et al., 2012), just to name a few.
A proliferation of competing terms is not unique to PreSMos. For example, it took more than a decade for the IDP research community to arrive at a more or less consensus term for IDPs in 2013 (Dunker et al., 2013). Yet overly creative names not precisely in line with the classical concepts and terms in structural biology or protein science created a certain degree of confusion, which led to a situation where the importance of IDPs was not duly appreciated for some time (Uversky and Dunker, 2010). Here, we present again the easy-to-use term PreSMos to designate what has been described by several generic names, realizing that the existence and functional significance of PreSMos will be appreciated more and more (now in ~70% of IDPs). Most importantly, the statement that
Approximately 20 years have passed since IDPs emerged in the protein science and structural biology communities. With roughly 5,000 papers or more on the subject, no one would deny that IDPs have brought a critical paradigm shift to protein research, undoubtedly requiring that biochemistry textbooks be revised to include IDPs. Owing to an early-day misconception, there has been a tendency to put excessive emphasis on the disordered nature per se of IDPs, with subsequent attempts to relate disorder itself to function. For example, some reports on PreSMos were interpreted simply as evidence for disorder itself rather than as evidence for the existence of PreSMos (Cheng et al., 2006; Midic et al., 2009; Radivojac et al., 2007). It is important for the protein science community to learn a non-traditional view of proteins and their structures in two respects. First, it is now a well-known fact that long regions (40 residues and up) of proteins can be intrinsically disordered beyond the level of short disordered loops (Dunker et al., 2000). Proteins exist as dynamic conformational ensembles, not as the snapshot entities that the PDB structures (both X-ray and NMR) have depicted for a long time. Second, in the absence of a well-defined 3-D structure, the minimal residual secondary structures embedded in the flexible long IDR play key roles in target binding and govern the function of IDPs. Even in globular proteins, an important role of tertiary structure is to place the interacting (or active) secondary structures in a proper orientation relative to target proteins.
A discussion of PreSMos naturally brings us to the question of whether the mechanism of IDP-target binding follows IF (induced fit) or CS (conformational selection). In the case of KID-KIX binding, IF (Sugase et al., 2007) was shown to be dominant, whereas in the N-tail of viral nucleoproteins CS appeared prevalent (Jensen et al., 2008). In recent years, it has come to be believed that these two mechanisms work in concert: CS at the start of binding and IF at the final stage of binding (tightening). The existence of PreSMos is not in itself evidence for CS, and one needs to use a kinetics approach in order to determine if faster binding (increased kon) can be achieved with more pre-structuring of the PreSMo segments. Future work employing PreSMo mutants should provide a more concrete answer on this aspect. No matter whether PreSMos are pre-structured or not, i.e., even if a PreSMo may become unstructured and re-structured for binding as one may envision in the IF model (To et al., 2016), it still does not change the fact that the fragment forming a PreSMo per se is important for target binding.
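To make the kinetic argument concrete, the following Python sketch evaluates the simplest textbook limit of conformational selection, in which the pre-structured and unstructured states interconvert much faster than binding and only the pre-structured state is binding-competent. Under those assumptions, which are ours and not a claim about any IDP in Table 1, the apparent association rate constant is the intrinsic rate constant scaled by the pre-populated fraction. The intrinsic rate constant used here is a hypothetical value chosen only for illustration.

```python
def apparent_kon(kon_intrinsic: float, presmo_fraction: float) -> float:
    """Apparent association rate constant (M^-1 s^-1) in the rapid
    pre-equilibrium limit of conformational selection."""
    if not 0.0 <= presmo_fraction <= 1.0:
        raise ValueError("presmo_fraction must lie between 0 and 1")
    return kon_intrinsic * presmo_fraction


KON_INTRINSIC = 1.0e7  # hypothetical intrinsic rate constant, M^-1 s^-1

for fraction in (0.1, 0.3, 0.7):
    kon_app = apparent_kon(KON_INTRINSIC, fraction)
    print(f"{fraction:.0%} pre-populated -> k_on,app = {kon_app:.1e} M^-1 s^-1")
```

In this limit, a mutant that raises the PreSMo population from 30% to 70% would be expected to bind roughly 2.3-fold faster. A measured kon that does not track the population in this way would argue against pure conformational selection, which is why a kinetic measurement on PreSMo mutants is informative.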
It is possible that PreSMos are also important for aggregation via oligomerization (Atwal et al., 2007; Eliezer et al., 2001). Both oligomerization and IDP-target binding are protein-protein interactions; the former is homogeneous IDP-IDP self-binding while the latter is heterogeneous binding. Even though the PreSMo concept is broadly (~70%) applicable, we do not expect it to be applicable to all IDPs since there are IDPs/IDRs that are composed of simple dipeptide repeats (Lee et al., 2016). The PreSMo concept is also unlikely to be applicable to highly charged polyvalent IDPs which maintain an unfolded topology even after target binding (Borgia et al., 2018). Due to strong attractive electrostatic interactions these IDPs have a very high affinity (pM) towards each other, unlike MU-type IDPs that bind their targets via PreSMos, typically with μM affinities. However, it is noteworthy that even polyglutamine and polyproline were shown to form α-helical and PPII helix type secondary structures, respectively (Mukrasch et al., 2009; Newcombe et al., 2018). Recent reports showed that IDP studies may lead to the development of new pharmaceuticals. For example, some PreSMo-antagonists against target proteins could serve as anti-cancer compounds (Kim et al., 2017a), and certain small molecule inhibitors can directly inhibit IDPs themselves (Follis et al., 2008; Metallo, 2010).
Table 1. A list of MU-type IDPs/IDRs containing PreSMos.
| Name | Number of residues | P/R^b | Location of PreSMo residues^c | Population^d (%) | Role/Binding | References |
|---|---|---|---|---|---|---|
| FlgM | 97 | P | 60–73 | 50±10 | σ28 | Daughdrill et al., 1997 |
| | | | 83–90 | 50±10 | | |
| | | | 42–50 | 20 | | |
| KID | 60 | R | 119–129 | >50 | KIX | Radhakrishnan et al., 1998 |
| | | | 134–143 | ~10 | | Hua et al., 1998 |
| GBD/CRIB in WASP W7 | 68 | R | 252–264 | ~14 | Cdc42/Rac | Rudolph et al., 1998 |
| | (201–268) | | | | | |
| HIV-1 Nef | 56 | R | 14–22: helix I | 18 | | Geyer et al., 1999 |
| | (2–57) | | 35–41: helix II (Hα only) | | | |
| Synaptobrevin-2 | 96 | R | 78–91 | 45 | core complex forming | Hazzard et al., 1999 |
| APPC | 47 | R | 20–23 | 30 | X11 | Ramelot et al., 2000 |
| | (649–695) | | 27–35 | 20 | | |
| | | | 37–45 (Hα only) | 30 | | |
| p53 TAD | 73 | R | 18–26: helix | 20 | Mdm2 | Lee et al., 2000 |
| | | | 40–44: turn I | 5 | RPA, TFEII | |
| | | | 48–53: turn II | 15 | | |
| RPS4 | 200 | P | 12–15 | 8 | rRNA, ribosomal proteins | Sayers et al., 2000 |
| | | | 30–33: β? | 23 | | |
| α-Synuclein | 140 | P | 18–31 | ~10 | amyloid-forming | Eliezer et al., 2001 |
| | | | | | | Murrali et al., 2018 |
| N-term. Tmod 1 | 92 | R | 24–35 | NA | tropomyosin | Greenfield et al., 2005 |
| VP16 TAD | 79 | R | 443–447 | 25 | hTAFII31, PC4 | Jonker et al., 2005 |
| | (412–490) | | 469–483 | 15 | | |
| VP16 TAD | 79 | R | 424–433/442–446, 465–467/472–479 (Hα only) | 60/40 | hTAFII31, PC4 | Kim et al., 2009 |
| | (412–490) | | | 10/20 | | |
| Dynein interm. chain | 40 | R | 223–228 | NA | light chains | Benison et al., 2006 |
| | (198–237) | | | | | Benison et al., 2007 |
| γ-Synuclein | 127 | P | 49–99 | ~15 | | Marsh et al., 2006 |
| HMGA1 | 107 | P | 3–9 | 8 | 20 different proteins | Buchko et al., 2007 |
| | | | 64–67 | | | |
| CFTR | 185 | R | | | interaction between R region and NT-binding domain 1 | Baker et al., 2007 |
| | (654–838) | | 654–668, 759–764, 766–776, 801–817 | >5 | | |
| | | | | >5 | | |
| | | | 744–753 | >5 | | |
| NS5A-D2 (HCV) | 93 | R | L48–V57 | 20 | - | Liang et al., 2007 |
| | (250–342) | | L86–E96 (Hα only) | 25 | | |
| preS1 of HBV | 119 | R | 32–36, 41–45 | ~10 | hepatocyte receptor-binding | Chi et al., 2007 |
| | | | 11–18, 22–25, 37–40, 46–50 (Hα only) | ~10 | | |
| | | | | ~10 | | |
| β-Synuclein | 134 | P | NA | ~20 | - | Sung et al., 2007 |
| Securin | 202 | P | 150–159: helix | 45 | - | Csizmok et al., 2008 |
| | | | 113–127 (β) | 15 | | |
| | | | 174–178 | 20 | | |
| C-XPC^e | 126 | R | 818–843: helix | ~30 | Centri2 | Miron et al., 2008 |
| | (815–940) | | 847–860: helix | ~30 | TFIIH | |
| | | | 891–901: helix | NA | | |
| | | | 908–915: helix | NA | | |
| | | | 923–930: helix | NA | | |
| MSP2 | 237 | P | 14–21 | 35 | - | Zhang et al., 2008 |
| | | | 140–150 | 35 | | |
| | | | 197–211 | 20 | | |
| DARPP-32 | 118 | R | 22–29 | 50 | PP1 | Dancheck et al., 2008 |
| | | | 103–114 | 25 | | |
| I-2 | 156 | R | 36–42 | 30 | PP1 | Dancheck et al., 2008 |
| | (9–164) | | 96–106 | 48 (70) | | |
| | | | 127–154 | 67 (90) | | |
| | | | 132–138 | >98 | | |
| ENSA | 121 | P | 32–36 | 40 | - | Boettcher et al., 2008 |
| | | | 48–50 | 10 | | Boettcher et al., 2007 |
| | | | 65–70 | 30 | | |
| ODD/HIF-1α | 74 | R | 438–440 | ~10 | - | Kim et al., 2009 |
| | (404–477) | | 467–477 | | | |
| Sml1 | 104 | P | 4–14: helix | ~20 | RNR binding | Zhao et al., 2000 |
| | (1–104) | | 61–80: helix | ~70 | Dimer forming | |
| Myb25 | 25 | R | 295–309: helix | 25~30 | KIX | Zor et al., 2002 |
| | (291–315) | | | | | |
| N tail | 125 | R | 488–499: helix | NA | phosphoprotein P | Bourhis et al., 2004 |
| Measles virus nucleoprotein | (401–525) | | | | | |
| dSLBP | 92 | R | 28–45: helix | NA | mRNA | Thapar et al., 2004 |
| | (17–108) | | 50–57: helix | | stem-loop | |
| | | | 66–75: helix | | | |
| | | | 91–96: helix | | | |
| Tβ-4 | 43 | P | 5–16: helix | NA | Ca ATP | Domanski et al., 2004 |
| | (1–43) | | | | G-actin | |
| N tail | 82 | R | 479–484 | 36 | phosphoprotein P | Jensen et al., 2008 |
| Sendai Virus nucleoprotein | (443–524) | | 476–488 | 38 | | |
| | | | 478–492 | 11 | | |
| Sic1 | 90 | R | 20–30 | 20 | Cdc4 | Mittag et al., 2008 |
| | (1–90) | | 63–68 | | | |
| c-Myc | 88 | R | 26–34: helix | 40 | Bin-SH3 domain | Andresen et al., 2012 |
| | (1–88) | | 47–52: helix | 25 | 24–31 (TRRAP binding) | |
| | | | 20–23: β-turn | | | |
| ExsE | 88 | P | 42–51: helix | NA | ExsC | Zheng et al., 2012 |
| | (1–88) | | 61–65: helix | | | |
| NS5A | 415 | R | 401–412: helix | NA | Bin1-SH3 | Braeuning, 2013 |
| HCV | (33–447) | | 427–445: helix | | | |
| NS5A | 179 | R | 205–221: helix I | 38 | Bin1-SH3 | Feuerstein et al., 2012 |
| HCV | (191–369) | | 251–266: helix II | 38 | | Solyom et al., 2015 |
| | | | 292–306: helix III | 51 | | |
| 4EBP2 | 120 | P | 1–5 | 15~37 | eIF4E | Lukhele et al., 2013 |
| | (1–120) | | 33–37 | | | |
| | | | 50–64 | | | |
| | | | 86–89 | | | |
| | | | 96–105 | | | |
| E7 | 40 | R | 8–13: helix | NA | E2 | Noval et al., 2013 |
| HPV | (1–40) | | 17–29: helix | | | |
| | | | 33–38: PPII | | | |
| 4EBP1 | 70 | R | 56–63: helix | 20 | eIF4E | Kim et al., 2015 |
| | (49–118) | | | | | |
| Myb32 | 32 | R | 290–310: helix | ~70 | KIX | Arai et al., 2015 |
| | (284–315) | | | | | |
| E7 | 46 | R | 7–14: helix | 10 | E2 | Lee et al., 2016 |
| HPV | (1–46) | | 20–26: helix | 20 | | |
| CBP-ID4 | 207 | R | 1852–1875: helix | ~60 | - | Piai et al., 2016 |
| | (1851–2057) | | 1951–1978: helix | | | |
| HIV-1 Tat | 121 | P | 27–32: helix | ~20 | Fab’ | To et al., 2016 |
| | (1–121)^a | | 41–59: helix | ~30 | P-TEFb | |
| | | | 70–81: β sheet | ~25 | TAR-cyclin T1 | |
| | | | 93–99: β sheet | ~25 | | |
| | | | 105–112: β sheet | ~10 | | |
| SUSP4 | 100 | R | 263–291: helix | ~30 | mdm2 | Kim et al., 2017 |
| | (201–300) | | 265–270: helix | ~10 | | |
| | | | 281–291: helix | | | |
| hGRtau1c | 64 | R | 185–202: helix | 20~30 | TAZ2 | Kim et al., 2017 |
| | (181–244) | | 206–225: helix | | | |
| | | | 232–244: helix | | | |
| Huntingtin Httex1 25Q | 95 | P | 18–42: helix | NA | Cytotoxic | Newcombe et al., 2018 |
| | (1–95) | | | | Membrane binding | |
| | | | | | Aggregation | |
^a The numbering includes a 20-residue N-terminal tag.
^b An IDP (P) versus an IDR (R).
^c Residue numbers are taken from the original report.
^d Populations of PreSMos are read from the mid-point of the SSP scores that are calculated from chemical shifts in BMRB or the literature. Shown in bold are the populations described in the original report. When the populations described in the original report without SSP scores differed significantly from the calculated SSP scores, the SSP scores are provided in parentheses.
NA = not available.