Minireview

Mol. Cells 2022; 45(7): 444-453

Published online June 27, 2022

https://doi.org/10.14348/molcells.2022.0035

© The Korean Society for Molecular and Cellular Biology

Quantitative Frameworks for Multivalent Macromolecular Interactions in Biological Linear Lattice Systems

Jaejun Choi1,2, Ryeonghyeon Kim1,2, and Junseock Koh1,*

1School of Biological Sciences, Seoul National University, Seoul 08826, Korea, 2These authors contributed equally to this work.

Correspondence to: junseockkoh@snu.ac.kr

Received: March 6, 2022; Revised: March 27, 2022; Accepted: March 28, 2022

This is an open-access article distributed under the terms of the Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc-sa/3.0/.

Multivalent macromolecular interactions underlie dynamic regulation of diverse biological processes in ever-changing cellular states. These interactions often involve binding of multiple proteins to a linear lattice, such as an intrinsically disordered protein or chromosomal DNA, that presents many repeating recognition motifs. Quantitative understanding of such multivalent interactions on a linear lattice is crucial for exploring their unique regulatory potential in cellular processes. In this review, the distinctive molecular features of the linear lattice system are first discussed with a particular focus on the overlapping nature of potential protein binding sites within a lattice. Then, we introduce two general quantitative frameworks, the combinatorial and conditional probability models, which deal with the overlap problem and relate the binding parameters to the experimentally measurable properties of linear lattice-protein interactions. Finally, we present two specific examples in which the quantitative models have been applied and further extended to provide biological insights into specific cellular processes. In the first case, the conditional probability model was extended to highlight the significant impact of nonspecific binding of transcription factors to the chromosomal DNA on gene-specific transcriptional activities. The second case presents the recently developed combinatorial models to unravel the complex organization of target protein binding sites within an intrinsically disordered region (IDR) of a nucleoporin. In particular, these models have suggested a unique function of IDRs as a molecular switch coupling distinct cellular processes. The quantitative models reviewed here are expected to be advanced further for the dissection and functional study of more complex systems including phase-separated biomolecular condensates.

Recent advances in cutting-edge biotechnologies have provided opportunities to observe unprecedented molecular details of various biological processes (Ha et al., 2022; Mahamid et al., 2016; Oikonomou and Jensen, 2017; Sigal et al., 2018). Interpretation of such observations requires quantitative models dissecting the underlying macromolecular interactions. In turn, the quantitative information allows further understanding and prediction of spatiotemporal regulation of specific cellular processes in dynamically changing environments. The complexity of macromolecular interactions ranges from simple 1:1 binding to formation of phase-separated condensates with multivalent binding among two or more components (Banani et al., 2017; Lyon et al., 2021; Shin and Brangwynne, 2017). In contrast to 1:1 binding, multivalent interactions are difficult to describe with the simple mass action law and are instead modeled with more sophisticated frameworks that account for the presence of various molecular states (Bujalowski, 2006; Freire et al., 2009; Wyman and Gill, 1990). Furthermore, these quantitative models are often formulated with large numbers of parameters, and exemplary cases in which the parameters have been determined with suitable in vitro model systems and methods are exceedingly rare.

A linear or one-dimensional lattice is a relatively tractable multivalent system found in numerous cellular processes. Linear lattices present multiple binding motifs or domains to interact with diverse proteins or multiple copies of identical proteins (Fig. 1) (Cortese et al., 2008; Dunker et al., 2005; Fung et al., 2018). For instance, in many signaling pathways, scaffold proteins such as axin, BRCA1, and Ste5 recruit various target proteins via specific binding sites (Choi et al., 1994; Mark et al., 2005; Wodarz and Nusse, 1998). These scaffold-driven higher-order assemblies are predicted to colocalize and increase the local concentrations of the target proteins and thereby facilitate their interactions for efficient integration and propagation of diverse signals in the cell (Fig. 1A) (Noutsou et al., 2011; Xue et al., 2013). Another example is the intrinsically disordered regions (IDRs) of some nucleoporins (Nups) present in the nuclear pore complex (NPC) (Fig. 1B) (Frey and Gorlich, 2007; Radu et al., 1995). The Nup IDRs mediate massive yet selective molecular transport between the nucleus and cytoplasm through specific interactions with karyopherin (Kap) proteins carrying macromolecular cargos (Koh and Blobel, 2015; Schoch et al., 2012). These interactions are achieved by multiple interspersed phenylalanine-glycine (FG) motifs on an IDR capturing several Kap molecules (Bayliss et al., 2000).

Finally, nucleic acids are the most prominent linear lattice systems in the cell. In particular, the chromosomal DNA presents an enormous number of repeating phosphate groups along its backbone, creating electrostatic potentials for nonspecific protein-DNA interactions (Fig. 1C) (Berg et al., 1981; Stracy et al., 2021). This polyelectrolyte effect is a major driving force (Lohman et al., 1980; Record et al., 1976), particularly at low salt concentrations, for formation of nucleosomes (Shrader and Crothers, 1989; Widom, 1999) as well as for binding of chromatin architectural proteins such as HMG (high mobility group)-box proteins with little specificity for DNA base sequences (Dragan et al., 2004). Even specific DNA binding proteins typically engage their cationic amino acid side chains to neutralize DNA phosphate charges (Jen-Jacobson et al., 2000; Privalov et al., 2011). Thus, these proteins are expected to interact with nonspecific sites, which are present in overwhelming excess over specific sites in the chromosomal context. In addition, as the copy numbers of many transcription factors (TFs) are considered greater than those of their corresponding specific binding sites on DNA, the majority of these factors may exist in vivo in nonspecifically bound states (Bintu et al., 2005; Kao-Huang et al., 1977). The physiological impact of the nonspecific protein-DNA interaction is substantial as demonstrated in the classical study by the von Hippel group (von Hippel et al., 1974) as well as in the recent seminal work by the Phillips group (Brewster et al., 2014). Both groups used Escherichia coli lac repressor as a model system to investigate the interplay among the copy numbers of TFs and their binding sites on DNA, the specificity ratio, and the inducer binding affinity in bacterial gene expression.
The quantitative models proposed in these studies accurately described and predicted the expression profiles of the genes under the repressor regulation by incorporating nonspecific protein-DNA interactions as a “sink” for RNA polymerase and lac repressor.

Taken together, numerous protein-protein and protein-nucleic acid interactions can be perceived as multivalent interactions mediated by linear lattices. Thus, quantitative models for linear lattice systems are indispensable in understanding a broad range of biological processes and may be further extended to dissect more complex systems including phase-separated biomolecular condensates. In this review, we go over two general mathematical frameworks, the combinatorial and conditional probability models, for quantitative description of linear lattices. Prior to the detailed derivation of these models, the molecular features of multivalent interactions on a linear lattice are discussed qualitatively in light of how they differ fundamentally from 1:1 binding or discrete-site systems. The derivation is supplemented in Supplementary Information with detailed mathematical procedures that were omitted from the original articles but are not immediately evident. Finally, two practical examples are discussed in which the models have been further extended and applied to highlight their physiological significance. For the alternative methods of sequence generating functions and transfer matrices, which handle multiple binding modes, heterogeneous lattices, and lattice conformational changes, readers are referred to the original and case studies (Bujalowski et al., 1989; Lifson, 1964; Schellman, 1974; Teif, 2007).

It is straightforward to derive the quantitative models for the linear lattices that utilize discrete regions or domains to bind multiple distinct target proteins with the interaction stoichiometry of 1:1 for each target. In the absence of cooperativity among bound targets, the binding of each target can be handled, independent of binding of other targets, by the simple mass action law yielding a quadratic equation as a function of total concentrations of the lattice and the corresponding target. An advanced model has been derived by constructing a partition function for a linear lattice with cooperativities among bound targets (Cho et al., 2021).
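The quadratic equation mentioned above can be written out explicitly. The following is a sketch for a single discrete site, where the complex concentration [PL] and the use of K for the site binding constant are notational assumptions introduced here for illustration:

```latex
\begin{align}
K &= \frac{[PL]}{[P][L]}, \qquad [P]_{tot} = [P] + [PL], \qquad [L]_{tot} = [L] + [PL] \\
0 &= [PL]^2 - \left([P]_{tot} + [L]_{tot} + \frac{1}{K}\right)[PL] + [P]_{tot}[L]_{tot} \\
[PL] &= \frac{1}{2}\left\{\left([P]_{tot} + [L]_{tot} + \frac{1}{K}\right)
        - \sqrt{\left([P]_{tot} + [L]_{tot} + \frac{1}{K}\right)^2 - 4[P]_{tot}[L]_{tot}}\right\}
\end{align}
```

Only the root with the minus sign is physically meaningful, because [PL] cannot exceed either total concentration.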

Complexity arises when a target protein occupies two or more binding motifs on a linear lattice. We consider a linear lattice with a total of M motifs and a target protein occluding n consecutive motifs (Fig. 2) (Epstein, 1978; McGhee and von Hippel, 1974). The binding motif can be any repeating unit including a base-pair or phosphate on DNA and a short peptide motif or a PTM (post-translational modification) moiety on an intrinsically disordered protein (IDP). As DNA or proteins have particular directions in denoting their motifs (5’ to 3’ end or N to C-terminus), target proteins are assumed to be polar as well in recognizing the motifs. It is further assumed that there is no partial binding in which a target protein occludes fewer than n motifs. Then, the target binding stoichiometry (N) is the greatest integer less than or equal to M/n (N = [M/n]). A fundamental nature of the linear lattice system becomes evident when a target protein binds to a naked lattice (left panel in Fig. 2A). Because the target protein occupies n consecutive motifs, any motif except the rightmost n – 1 positions can be a starting point for target binding. Thus, potential target binding sites overlap, and the number of such overlapping sites equals M – n + 1, which exceeds the stoichiometry [M/n] whenever n ≥ 2 and M > n. In contrast, for a conventional system in which a target protein binds discrete and isolated sites (right panel in Fig. 2A), the number of binding sites is simply equal to the stoichiometry N = [M/n].

As the linear lattice subsequently binds more target proteins, its overlapping nature generates additional features further deviating from the discrete-site system. The number of potential binding sites eliminated upon binding of a protein depends on where the protein binds on the lattice. When a protein binds to a gap exactly n motifs long between prebound proteins on the lattice, only one potential site is removed. Instead, if a gap is at least 3n – 2 motifs long, binding of a protein to this region can eliminate as many as 2n – 1 sites. For instance, binding of a protein with the site size of n = 3 to the three leftmost motifs on a linear lattice with a total of nine motifs eliminates three potential binding sites (the second figure in the left panel of Fig. 2B). Alternatively, if the protein occupies the three motifs at the center of the lattice, five potential binding sites are eliminated (the third figure in the left panel of Fig. 2B). However, in the discrete-site system, protein binding invariably eliminates only one potential binding site (right panel in Fig. 2B). Finally, it is difficult to completely saturate the linear lattice since the overlapping protein binding increasingly accumulates gaps of fewer than n motifs that are futile for binding. This point is explicitly illustrated in Fig. 2C (left panel), which lists all possible configurations of the linear lattice with [M/n] – 1 proteins bound. Many of them are futile configurations in which the n free (unoccupied) motifs are scattered over the lattice, so that the bound proteins must rearrange to create a site with n consecutive motifs for binding of the last protein. Such a rearrangement or reduction in the number of lattice configurations corresponds to a loss of mixing entropy, culminating in apparent negative cooperativity among bound proteins. In contrast, the number of available binding sites is independent of the configuration of bound proteins in the discrete-site system (right panel in Fig. 2C).
In summary, because of the overlapping nature of multivalent linear lattice-target interactions, a linear lattice initially presents binding sites greater than the stoichiometry and thereby enhances protein binding as compared to a discrete-site system. However, with density of bound proteins increased, the effect of the overlapping binding is reversed, attenuating saturation of the linear lattice.
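The site counting described above can be checked by direct enumeration. The following is a minimal sketch for the nine-motif lattice and n = 3 discussed in the text; the helper names `free_sites` and `bind` are illustrative and do not come from the original studies.

```python
def free_sites(occupied, n):
    """Start positions of every stretch of n consecutive free motifs."""
    M = len(occupied)
    return [s for s in range(M - n + 1) if not any(occupied[s:s + n])]


def bind(occupied, start, n):
    """Return a copy of the lattice with a protein covering motifs start..start+n-1."""
    if any(occupied[start:start + n]):
        raise ValueError("site overlaps a bound protein")
    new = list(occupied)
    new[start:start + n] = [True] * n
    return new


M, n = 9, 3
naked = [False] * M
n_naked = len(free_sites(naked, n))        # M - n + 1 overlapping sites
left = bind(naked, 0, n)                   # protein on the three leftmost motifs
center = bind(naked, 3, n)                 # protein on the three central motifs
lost_left = n_naked - len(free_sites(left, n))
lost_center = n_naked - len(free_sites(center, n))

print(n_naked, M // n)          # 7 potential sites versus a stoichiometry of 3
print(lost_left, lost_center)   # 3 and 5 sites eliminated, respectively
```

The central placement removes 2n – 1 = 5 sites because the nine-motif lattice is longer than 3n – 2 = 7 motifs, whereas the end placement removes only n = 3.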

The following sections review the quantitative models that address the overlap problem of the linear lattice and yield mathematical formulations relating the binding parameters to experimentally measurable properties of the lattice-target interactions. A core element of each model is the computation of the number of possible configurations for a given density of bound proteins on a lattice.

A complete set of parameters for description of linear lattice-protein interactions consists of the binding stoichiometry (N), binding constant (K), and cooperativity (ω) among bound proteins. As discussed above, the binding stoichiometry (N) is determined by the total number of motifs on a lattice (M) and the number occupied by a target protein (n, termed site size) (N = [M/n]). The binding constant (K) corresponds to the affinity between a protein and a site n motifs long. Cooperativity can arise from pairwise interactions between any two proteins bound to a linear lattice. Although there are in principle $_iC_2 = i(i-1)/2$ pairs on a lattice with i (≥2) proteins bound, the models discussed in this review formulate cooperativity only for the interaction between nearest neighbors (i.e., a pair of contiguously bound proteins without any intervening free motifs). Thus, the cooperativity parameter (ω) is equivalent to an equilibrium constant for formation of a direct “contact point” between a pair of bound proteins. Then, under these definitions, a linear lattice presents three distinct types of protein binding sites (Fig. 3A): 1) an isolated site with the binding constant K; 2) a singly contiguous site with the binding constant Kω; 3) a doubly contiguous site with the binding constant $K\omega^2$. If ω > 1 (or 0 < ω < 1), the nearest neighbor interaction is favorable (or unfavorable) and the protein binding is positively (or negatively) cooperative. For ω = 1, bound proteins are independent of each other and the binding is noncooperative.

A fundamental relationship between the binding parameters and experimental variables can be derived by constructing a partition function for a linear lattice (Freire et al., 2009; Wyman and Gill, 1990). The partition function is a sum of relative probabilities or statistical weights of all possible protein-bound states of a linear lattice with a free lattice assigned as a reference state of unit relative probability (i.e., statistical weight = 1). Then, the statistical weight of a lattice with i proteins bound and j contact points among them is given by $(K[P])^i\omega^j$ where [P] is the free protein concentration. However, in order to account for the presence of multiple configurations for a given set of (i, j), the statistical weight must be multiplied by the degeneracy term $P_M(i, j)$, the number of distinct ways to distribute i proteins with j contact points on a lattice with M motifs. Then, the partition function (Z) is given by the following equation:

$$Z = \sum_{i=0}^{N} \sum_{j=0}^{i-1} P_M(i,j)\,(K[P])^{i}\,\omega^{j} \tag{1}$$

The average number of proteins bound per lattice (or binding density, ν), which is a principal quantity to be measured in all binding experiments, can be formulated from the partition function:

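$$\nu = \frac{\partial \ln Z}{\partial \ln [P]} = \left[\sum_{i=0}^{N} \sum_{j=0}^{i-1} i\,P_M(i,j)\,(K[P])^{i}\,\omega^{j}\right] \times \left[\sum_{i=0}^{N} \sum_{j=0}^{i-1} P_M(i,j)\,(K[P])^{i}\,\omega^{j}\right]^{-1} \tag{2}$$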

Likewise, the average number of contact points per lattice can be calculated from a partial derivative of the partition function:

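$$\bar{j} = \frac{\partial \ln Z}{\partial \ln \omega} = \left[\sum_{i=0}^{N} \sum_{j=0}^{i-1} j\,P_M(i,j)\,(K[P])^{i}\,\omega^{j}\right] \times \left[\sum_{i=0}^{N} \sum_{j=0}^{i-1} P_M(i,j)\,(K[P])^{i}\,\omega^{j}\right]^{-1} \tag{3}$$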

The final task in constructing the partition function is to derive the expression for $P_M(i, j)$. Here we follow the original combinatorial derivation of $P_M(i, j)$ (Epstein, 1978), highlighting the concept behind the mathematical procedures. A linear lattice with i proteins bound and j contact points may be dissected into two physical elements. The first element is a “run” defined as a distinct cluster of contiguously bound proteins, and the number of runs is i – j (Fig. 3B). Because there is at least one free motif between runs, each of the i – j – 1 leftmost runs must carry one attached free motif on its right side. The second element is the remaining free motifs, of which there are M – ni – (i – j – 1) unattached free motifs (≡ $N_u$). Then, the number (≡ $N_c$) of ways of mixing these two elements to create the distinct lattice configurations equals the number of ways of distributing i – j runs (accompanied by the i – j – 1 attached free motifs) and $N_u$ unattached free motifs into $N_u$ + i – j slots (Fig. 3C):

$$N_c = \frac{(N_u + i - j)!}{N_u!\,(i-j)!} \tag{4}$$

In this expression, all runs have been treated as identical elements, regardless of the actual number of bound proteins in each run. Therefore, in order to complete the derivation of $P_M(i, j)$, the function $N_c$ must be multiplied by the number (≡ $N_p$) of distinct ways to distribute i proteins into i – j runs:

$$N_p = \frac{(i-1)!}{j!\,(i-j-1)!} \tag{5}$$

The expression for $N_p$ is mathematically equivalent to the number of ordered partitions (compositions) of the integer i into i – j positive integers. Finally, $P_M(i, j)$ is derived as the following equation:

$$P_M(i,j) = N_c N_p = \frac{(M - ni + 1)!\,(i-1)!}{(M - ni - i + j + 1)!\,(i-j)!\,j!\,(i-j-1)!} \tag{6}$$

For noncooperative binding (ω = 1), the number of contact points j becomes irrelevant and $P_M(i, j)$ reduces to $P_M(i)$, the number of ways of mixing i proteins and M – ni free motifs to build distinct lattice configurations:

$$P_M(i) = \frac{(M - ni + i)!}{(M - ni)!\,i!} \tag{7}$$
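The degeneracy terms can be verified against a brute-force enumeration for small lattices. The sketch below is illustrative only: the function names are hypothetical, and the naked lattice (i = 0) is assigned a degeneracy of 1 by assumption, since the factorial expression for the cooperative degeneracy is undefined at i = 0.

```python
from itertools import combinations
from math import comb


def degeneracy_ij(M, n, i, j):
    """Configurations with i proteins and j nearest-neighbor contacts (Nc * Np)."""
    if i == 0:
        return 1 if j == 0 else 0          # assumption: naked lattice counts once
    if j < 0 or j > i - 1:
        return 0
    n_u = M - n * i - (i - j - 1)          # unattached free motifs
    if n_u < 0:
        return 0
    n_c = comb(n_u + i - j, i - j)         # ways to mix runs and free motifs
    n_p = comb(i - 1, j)                   # ways to split i proteins into i - j runs
    return n_c * n_p


def degeneracy_i(M, n, i):
    """Configurations with i proteins, ignoring contacts (noncooperative case)."""
    return comb(M - n * i + i, i) if M - n * i >= 0 else 0


def brute_force(M, n, i):
    """Enumerate all non-overlapping placements and tally them by contact number."""
    counts = {}
    for starts in combinations(range(M - n + 1), i):
        gaps = [b - a for a, b in zip(starts, starts[1:])]
        if any(g < n for g in gaps):       # proteins would overlap
            continue
        j = sum(g == n for g in gaps)      # contiguous neighbors
        counts[j] = counts.get(j, 0) + 1
    return counts


M, n, i = 9, 3, 2
print(brute_force(M, n, i))                              # {1: 4, 0: 6}
print([degeneracy_ij(M, n, i, j) for j in range(i)])     # [6, 4]
print(degeneracy_i(M, n, i))                             # 10
```

Summing the cooperative degeneracy over j recovers the noncooperative count, which is the consistency the reduction at ω = 1 relies on.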

Then, the partition function for noncooperative binding can be written in a simplified form:

$$Z = \sum_{i=0}^{N} P_M(i)\,(K[P])^{i} \tag{8}$$

In practice, the total lattice and protein concentrations ([L]tot and [P]tot), rather than the free protein concentration ([P]), are known experimental variables. The total concentrations are related to each other and other binding parameters through a simple mass balance equation:

$$[P]_{tot} = [P] + \nu\,[L]_{tot} \tag{9}$$

For a given set of binding parameters and reactant concentrations, this mass balance equation can be solved for [P] by numerical procedures such as the Newton-Raphson or bisection method (Hamming, 1986). In turn, this solution allows calculation of the relative probabilities of all lattice configurations and of the ensemble-averaged quantities given by Eqs. 2 and 3. Thus, the combinatorial method is straightforward and intuitive in constructing a partition function, which describes the distribution among the various protein-bound states of a linear lattice as a function of lattice and protein concentrations. However, this method is difficult to apply to a very long linear lattice (i.e., M >> n) because the number of possible lattice configurations may become too large and potentially cause an overflow problem in computation.
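As an illustration of the numerical step, the sketch below solves the mass balance equation by bisection for the noncooperative case (ω = 1), using the simplified partition function. The function names and all parameter values are hypothetical choices made for this example; the cooperative case would substitute the full statistical weights with contact points.

```python
from math import comb


def binding_density(p_free, K, M, n):
    """Average number of proteins bound per lattice for omega = 1."""
    z = 0.0
    weighted = 0.0
    for i in range(M // n + 1):
        w = comb(M - n * i + i, i) * (K * p_free) ** i   # degeneracy * (K[P])^i
        z += w
        weighted += i * w
    return weighted / z


def solve_free_protein(p_tot, l_tot, K, M, n, iterations=200):
    """Solve [P]tot = [P] + nu([P]) * [L]tot for the free concentration [P]."""
    lo, hi = 0.0, p_tot            # the free concentration cannot exceed the total
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        # The left-hand side increases monotonically with [P], so bisection applies.
        if mid + binding_density(mid, K, M, n) * l_tot > p_tot:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


# Hypothetical inputs: K in M^-1, concentrations in M.
K, M, n = 1.0e6, 9, 3
p_tot, l_tot = 2.0e-6, 1.0e-6
p_free = solve_free_protein(p_tot, l_tot, K, M, n)
nu = binding_density(p_free, K, M, n)
print(p_free, nu)
```

Bisection is used here because it cannot diverge; Newton-Raphson would converge faster but needs the derivative of the binding density. For very long lattices, summing logarithms of the weights instead of the weights themselves is one way to postpone the overflow problem noted above.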

Several quantitative frameworks have been proposed to treat an “infinitely” long linear lattice (M >> n), which is particularly relevant for proteins nonspecifically binding the chromosomal DNA. Among these frameworks, we review the conditional probability model originally presented in the seminal work by McGhee and von Hippel (1974). In this model, the conditional probabilities have been formulated for the particular states (free or bound) of two consecutive motifs on a linear lattice. For instance, the conditional probability $ff$ (or $fb_1$) is defined as, given a randomly chosen free motif, the probability of the subsequent righthand side motif being free (or bound by the left end of a protein) (Supplementary Fig. S1). In addition, the conditional probability $b_nf$ (or $b_nb_1$) is defined as, given a motif bound by the right end of a protein, the probability of the subsequent motif being free (or bound by the left end of a protein) (Supplementary Fig. S1). The conditional probabilities were then used to derive an expression for the average number of free binding sites per lattice at a given binding density. This elegant approach yielded a modified form of the Scatchard equation:

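$$\frac{\nu}{[P]} = K \cdot (\text{average number of free binding sites per lattice}) \tag{10a}$$

$$\frac{\theta}{[P]} = K(1-n\theta)\left[\frac{(2\omega-1)(1-n\theta)+\theta-R}{2(\omega-1)(1-n\theta)}\right]^{n-1}\left[\frac{1-(n+1)\theta+R}{2(1-n\theta)}\right]^{2} \tag{10b}$$

$$R = \sqrt{\left[1-(n+1)\theta\right]^{2}+4\omega\theta(1-n\theta)}$$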

where θ corresponds to the average number of proteins bound per motif (i.e., θ = ν/M).

Referring to Supplementary Information for the detailed mathematical procedures of the derivation, we focus on a few intuitive limiting cases leading to the interpretations of this equation consistent with the molecular features of the linear lattice system (McGhee and von Hippel, 1974).

1) In the case of ω = 1 (noncooperative binding), by using L’Hospital’s rule, the equation can be reduced to the following (see Supplementary Information for detailed mathematical procedures):

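$$\lim_{\omega \to 1} K(1-n\theta)\left[\frac{(2\omega-1)(1-n\theta)+\theta-R}{2(\omega-1)(1-n\theta)}\right]^{n-1}\left[\frac{1-(n+1)\theta+R}{2(1-n\theta)}\right]^{2} = K(1-n\theta)\left[\frac{1-n\theta}{1-(n-1)\theta}\right]^{n-1} \tag{11}$$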

Note that, for n = 1 (no overlap between bound proteins), the equation further reduces to the original Scatchard equation θ/[P] = K (1 – θ) in which the term (1 – θ) simply represents the fraction of free motifs. Because the square-bracketed term in Eq. 11 is always less than unity for n ≥ 2 and θ > 0, the fraction of free motifs competent for binding is smaller than the total fraction of free motifs 1 – nθ. Therefore, this result quantitatively supports that, even without genuine interactions among bound proteins (i.e., ω = 1), apparent negative cooperativity arises from the overlap among potential binding sites and the consequent futile gaps shorter than n motifs.

2) In the case of ω = 0 (infinite negative-cooperativity), Eq. 10b reduces to the following expression:

$$\frac{\theta}{[P]} = K\left[1-(n+1)\theta\right]\left[\frac{1-(n+1)\theta}{1-n\theta}\right]^{n} \tag{12}$$

This reduced form simply corresponds to Eq. 11 with n replaced by n + 1. The increased binding site size demonstrates that, if the interaction between bound proteins is extremely unfavorable, there is effectively no contact point between any adjacently bound proteins. Instead, they are separated by a persistent free motif. This result clearly demonstrates the fundamental relationship between binding site size and cooperativity.
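Both limiting cases can be checked numerically. The following sketch evaluates Eq. 10b directly and compares it with the closed forms of the ω → 1 and ω = 0 limits; the function name `theta_over_p` and the test values of θ, K, and n are assumptions made for illustration.

```python
from math import sqrt


def theta_over_p(theta, K, n, omega):
    """theta/[P] from Eq. 10b; valid for 0 <= theta < 1/n."""
    free = 1.0 - n * theta
    if abs(omega - 1.0) < 1e-9:
        # Eq. 10b is 0/0 at omega = 1, so use the noncooperative limit (Eq. 11).
        return K * free * (free / (1.0 - (n - 1) * theta)) ** (n - 1)
    R = sqrt((1.0 - (n + 1) * theta) ** 2 + 4.0 * omega * theta * free)
    ff = ((2.0 * omega - 1.0) * free + theta - R) / (2.0 * (omega - 1.0) * free)
    C = (1.0 - (n + 1) * theta + R) / (2.0 * free)
    return K * free * ff ** (n - 1) * C ** 2


theta, K, n = 0.05, 1.0e5, 3     # hypothetical values

# Case 1: omega slightly above 1 approaches the noncooperative closed form.
near_one = theta_over_p(theta, K, n, 1.0 + 1e-6)
exact_one = theta_over_p(theta, K, n, 1.0)

# Case 2: omega = 0 equals the noncooperative form with site size n + 1 (Eq. 12).
at_zero = theta_over_p(theta, K, n, 0.0)
shifted = theta_over_p(theta, K, n + 1, 1.0)

print(near_one / exact_one, at_zero / shifted)   # both ratios are close to 1
```

The explicit branch at ω = 1 is a practical choice: evaluating Eq. 10b very close to ω = 1 subtracts nearly equal numbers, so the closed form is the safer route there.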

3) Further insight can be provided at the molecular level from the partial derivatives of Eqs. 10b and 11 with respect to θ at the limiting condition of θ → 0 (see Supplementary Information for detailed mathematical procedures):

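$$\left.\frac{\partial\left(\theta/[P]\right)}{\partial\theta}\right|_{\theta=0} = \left.\frac{\partial\left(\nu/[P]\right)}{\partial\nu}\right|_{\nu=0} = K(2\omega-2n-1) \tag{13}$$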

Based on Eq. 10a, the partial derivative can be interpreted as a net change in the average numbers of all three types (Fig. 3A) of binding sites, weighted by their corresponding binding constants, upon binding of one protein to a naked (ν = 0) lattice. As illustrated in Fig. 2B, the binding of a protein to a sufficiently long region eliminates a total of 2n – 1 potential binding sites. In addition, the binding converts the two adjacent isolated binding sites into two singly contiguous binding sites (2∙Kω). Hence, a total of (2n – 1) + 2 isolated binding sites has been eliminated (– (2n + 1)∙K). Likewise, the partial derivative of Eq. 11 at θ → 0 is given by:

$$\left.\frac{\partial\left(\theta/[P]\right)}{\partial\theta}\right|_{\theta=0} = -K(2n-1) \tag{14}$$

Therefore, in the noncooperative case, the binding of one ligand to a naked lattice simply eliminates 2n – 1 potential binding sites.
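The noncooperative slope can be obtained in a few lines from Eq. 11. The following is a sketch of that step using logarithmic differentiation, included here for convenience rather than reproduced from the Supplementary Information:

```latex
\begin{align}
\frac{\theta}{[P]} &= K\,(1-n\theta)^{n}\,\left[1-(n-1)\theta\right]^{-(n-1)} \\
\frac{\partial}{\partial\theta}\ln\frac{\theta}{[P]}
  &= -\frac{n^{2}}{1-n\theta} + \frac{(n-1)^{2}}{1-(n-1)\theta} \\
\left.\frac{\partial\left(\theta/[P]\right)}{\partial\theta}\right|_{\theta=0}
  &= K\left[-n^{2} + (n-1)^{2}\right] = -K(2n-1)
\end{align}
```

The last line uses the fact that θ/[P] = K at θ = 0, and it agrees with setting ω = 1 in the cooperative result K(2ω – 2n – 1).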

Taken together, although the conditional probability method is based on a different conceptual framework from the combinatorial approach, the final formulation provides intuitive interpretations fully consistent with the molecular features of the linear lattice systems. In practice, Eq. 10b is rearranged and incorporated into a mass balance equation relating the binding parameters to the total concentrations of lattice motif and protein ([M]tot and [P]tot):

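$$[P] = \frac{\theta}{K(1-n\theta)\,(ff)^{n-1}\,C^{2}} \tag{15a}$$

$$ff = \frac{(2\omega-1)(1-n\theta)+\theta-R}{2(\omega-1)(1-n\theta)} \tag{15b}$$

$$C = \frac{1-(n+1)\theta+R}{2(1-n\theta)} \tag{15c}$$

$$[P]_{tot} = [P] + \theta\,[M]_{tot} \tag{15d}$$

$$[P]_{tot} = \frac{\theta}{K(1-n\theta)\,(ff)^{n-1}\,C^{2}} + \theta\,[M]_{tot} \tag{15e}$$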

Eq. 15e can be numerically solved for θ at given values of [M]tot and [P]tot. When interactions of proteins with short linear lattices (e.g., DNA oligomers) are analyzed, the equation can be partially corrected for the assumption of infinite lattice length by applying an “end effect” constant, (M – n + 1) / M, to the term $(ff)^{n-1}$ (Tsodikov et al., 2001).

Competition among multiple binding modes in protein-nucleic acid interactions

Spatiotemporal regulation of transcription is achieved by interactions between TFs and their specific binding sites on DNA. Because of the enormous number of nonspecific sites on the chromosomal DNA, binding of TFs to these regions must be taken into account to accurately predict the occupancy of the specific sites and thereby the transcription profiles of the corresponding genes (Brewster et al., 2014; von Hippel et al., 1974). In order to recapitulate the essential features of the competition between specific and nonspecific DNA binding, the conditional probability model was extended and applied to a hypothetical two-component (TF and infinitely long DNA with a few embedded specific sites) system. While the 1:1 interaction between TF and a specific site is fully described by the binding constant Ksp, the nonspecific binding is characterized by the binding site size n (in base-pairs), the binding constant Kns, and the cooperativity parameter ω. Then, combining Eq. 10b with the mass-action law for the 1:1 specific binding, the TF concentrations of free, specifically bound, and nonspecifically bound forms ([TF], [TF]sp,b, and [TF]ns,b) can be derived as the following equations:

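$$[TF] = \frac{\theta}{K_{ns}(1-n\theta)\,(ff)^{n-1}\,C^{2}} \tag{16a}$$

$$[TF]_{sp,b} = \frac{K_{sp}\,[D_{sp}]_{tot}\,[TF]}{1+K_{sp}[TF]} \tag{16b}$$

$$[TF]_{ns,b} = \theta\,[M]_{tot} \tag{16c}$$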

where [Dsp]tot and [M]tot are the total concentrations of the specific site and the nonspecific binding motif (base-pair), respectively. Substituting Eq. 16a for [TF] in Eq. 16b, the mass balance equation for the total TF concentration ([TF]tot = [TF] + [TF]sp,b + [TF]ns,b) can be numerically solved for θ. The final outcome of the calculation is the fractional occupancy of the specific site (Ysp = [TF]sp,b/[Dsp]tot) as a function of total concentration ratio between TF and the specific site ([TF]tot/[Dsp]tot ranging from 0 to 10) (upper panels in Figs. 4A and 4B). In the calculation, the ratio Ksp/Kns (termed specificity ratio) (Fig. 4A) or the total nonspecific motif concentrations (Fig. 4B) was varied over orders of magnitude while the nonspecific binding site size and cooperativity were fixed at the constant values for simplicity (n = 10, ω = 1).
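The calculation described above can be sketched for the noncooperative case used in the simulations (ω = 1), where the free TF concentration follows from the noncooperative limit of Eq. 10b. All numerical values below (binding constants and concentrations) are hypothetical and are not the values used to generate Fig. 4; the function names are likewise illustrative.

```python
def free_tf(theta, K_ns, n):
    """Free TF concentration at nonspecific binding density theta (omega = 1)."""
    free = 1.0 - n * theta
    return theta / (K_ns * free * (free / (1.0 - (n - 1) * theta)) ** (n - 1))


def specific_occupancy(tf_tot, d_sp, m_tot, K_sp, K_ns, n=10, iterations=200):
    """Fractional occupancy of the specific site, Ysp = [TF]sp,b / [Dsp]tot."""
    lo, hi = 0.0, (1.0 / n) * (1.0 - 1e-12)    # theta must stay below 1/n
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        tf = free_tf(mid, K_ns, n)
        bound_sp = K_sp * d_sp * tf / (1.0 + K_sp * tf)   # specific complex
        total = tf + bound_sp + mid * m_tot               # TF mass balance
        if total > tf_tot:                                # total rises with theta
            hi = mid
        else:
            lo = mid
    tf = free_tf(0.5 * (lo + hi), K_ns, n)
    return K_sp * tf / (1.0 + K_sp * tf)


# Hypothetical parameters: specificity ratio Ksp/Kns = 1e5, concentrations in M.
K_sp, K_ns = 1.0e9, 1.0e4
d_sp = 1.0e-9
for m_tot in (0.0, 1.0e-4, 1.0e-3):
    curve = [specific_occupancy(r * d_sp, d_sp, m_tot, K_sp, K_ns)
             for r in (0.5, 1.0, 2.0, 5.0, 10.0)]
    print(m_tot, [round(y, 3) for y in curve])
```

With these assumed values, raising the nonspecific motif concentration lowers the occupancy of the specific site at every [TF]tot/[Dsp]tot ratio, which is the qualitative trend the text attributes to competition from nonspecific binding.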

At a given specificity ratio and a total motif concentration, as the concentration ratio [TF]tot/[Dsp]tot is increased, the fractional occupancy of the specific site by TF monotonically increases with an apparent hyperbolic feature (upper panels in Figs. 4A and 4B). However, the underlying distribution of TF exhibits a dynamic shift from specifically to nonspecifically bound states (bottom panels in Figs. 4A and 4B). For higher specificity ratio or lower nonspecific motif concentrations, the specific complex is predominant in the regime [TF]tot/[Dsp]tot < 1, leading to a steep rise in occupancy of the specific site. Consequently, the transition to the nonspecifically bound state is achieved at higher concentration ratio. Therefore, under these conditions, a relatively small amount of TF is required to saturate the specific site and thereby fully activate transcription. Conversely, for lower specificity ratio or higher nonspecific motif concentrations, the nonspecific binding significantly competes with the specific binding even at low [TF]tot/[Dsp]tot (bottom panels in Figs. 4A and 4B), attenuating saturation of the specific site (upper panels in Figs. 4A and 4B). These simulations suggest that, since protein-DNA interactions are generally sensitive to many cellular conditions such as salt concentration and osmotic stress, changes in these variables potentially fine-tune the specificity ratio of TFs and thereby the corresponding transcription levels. Furthermore, a change in chromosome packing may indirectly affect the TF-specific site interaction by altering the nonspecific site concentrations. Taken together, nonspecific protein-DNA interactions, via change in either specificity ratio or abundance of nonspecific sites, can modulate the occupancies of specific TF binding sites and consequently reprogram the gene-specific transcriptional activities.

Competitions between specific and nonspecific binding or among multiple nonspecific binding modes have been observed in numerous in vitro protein-DNA interactions as well (Bujalowski et al., 1988; Rajendran et al., 1998). Even studies using short oligonucleotides have shown similar competitions due to significantly low specificity ratios (Holbrook et al., 2001; Koh et al., 2008). In order to accurately determine a specific binding constant, the linear lattice models must be applied or further advanced to tease apart the contributions from multiple binding modes to the observed binding signal (Tsodikov et al., 2001).

Competition among distinct target proteins for binding to an intrinsically disordered protein

IDPs often utilize short peptide motifs to recruit multiple distinct targets or multiple copies of an identical target (Cumberworth et al., 2013; Hong et al., 2020; Wright and Dyson, 2015). These IDPs are collectively termed hubs and are involved in signal transduction and macromolecular transport. A representative example is Nup153, a subunit of the NPC that contains a long C-terminal IDR (~600 amino acids in length) (Krull et al., 2004). The IDR presents multiple FG-motifs to interact with Kaps carrying macromolecular cargos into and out of the nucleus. Multiple hydrophobic pockets on the Kap surface are the primary binding sites for the FG-motifs (Bayliss et al., 2000).

A recent thermodynamic study developed an advanced combinatorial model to demonstrate that the Nup153 IDR comprises a high-affinity 1:1 binding site and a series of low-affinity sites for binding of multiple Kaps (Fig. 4C) (Cho et al., 2021). Calorimetric data collected at various protein concentrations and IDR lengths were analyzed to further show that the overlapping binding of Kaps to the low-affinity sites results in apparent negative cooperativity. Because the Nup153 IDR potentially interacts with nuclear proteins involved in transcription and chromatin organization (Kadota et al., 2020; Kasper et al., 1999), the study constructed composite combinatorial models to test how the multivalent Kap binding would be affected by competitive binding of nuclear proteins (Fig. 4C). Remarkably, the simulations revealed that the Kap occupancy of the low-affinity region can be fine-tuned by changing the location of the competitor binding site (Fig. 4C). This delicate modulation arises from the molecular feature of the overlapping binding: the number of potential Kap binding sites eliminated by the competition is determined by the position of the competitor binding site (Fig. 2B). Therefore, assuming that the Kap occupancy is a proxy for the transport activity of the NPC, it is conceivable that the Nup153 IDR functions as a molecular switch coupling specific nuclear processes to distinct transport states. For instance, a strong promoter may be coupled to the NPC activity in such a way that specific TFs or co-activators associated with the strong promoter target a location in the Nup153 IDR that considerably reduces the Kap occupancy (Fig. 4D). As a consequence of the reduced general transport activity mediated by Kaps, a large amount of mRNA transcribed from the strong promoter may be efficiently exported through the NPC (Fig. 4D). Although awaiting experimental validation, the coupling mechanism built upon multivalent, overlapping IDP-target interactions may contribute to the functional versatility of the IDP hubs in dynamic cellular processes. This exemplary study demonstrates that the original combinatorial model can be readily expanded by simple mathematical operations to account for additional complexities in linear lattice-protein interactions, including heterogeneous binding sites.
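The position dependence of the Kap occupancy can be illustrated numerically. The following is a minimal sketch, not the published model of Cho et al. (2021). It assumes a noncooperative (ω = 1) low-affinity region of M motifs, a Kap site size n, and a competitor that occludes c consecutive motifs beginning at motif `start`. The function names and all parameter values are hypothetical. It further assumes that the high-affinity factor (1 + Ks[P]) multiplies both the competitor-free and the competitor-bound terms, so that it cancels in the occupancy of the low-affinity region. The sketch uses the standard count C(m − (n − 1)i, i) for the number of ways to place i non-overlapping proteins of size n on m motifs.

```python
from math import comb


def lattice_sums(m, n, x):
    """Return (Z, S) for a noncooperative lattice of m motifs.

    Z is the partition function and S the sum of i * weight over all
    states, where i is the number of bound proteins and x = K[P].
    """
    Z = 0.0
    S = 0.0
    for i in range(m // n + 1):
        # ways to place i non-overlapping proteins of size n on m motifs
        w = comb(m - (n - 1) * i, i) * x ** i
        Z += w
        S += i * w
    return Z, S


def kap_occupancy(M, n, x, y, start, c):
    """Mean number of Kaps bound to the low-affinity region (hypothetical model).

    x = K[P] for Kap, y = Kc[C] for the competitor. The competitor occludes
    motifs start .. start + c - 1 and splits the region into two subregions.
    """
    Z0, S0 = lattice_sums(M, n, x)               # competitor absent
    ZL, SL = lattice_sums(start, n, x)           # left subregion
    ZR, SR = lattice_sums(M - start - c, n, x)   # right subregion
    Zc = y * ZL * ZR                             # competitor-bound weight
    Sc = y * (SL * ZR + ZL * SR)
    return (S0 + Sc) / (Z0 + Zc)


if __name__ == "__main__":
    M, n, c = 9, 3, 3      # illustrative sizes only
    x, y = 10.0, 1.0e6     # illustrative K[P] and Kc[C]
    for start in range(M - c + 1):
        print(start, round(kap_occupancy(M, n, x, y, start, c), 3))
```

With these illustrative numbers, the occupancy under a nearly saturating competitor is about 1.70 Kaps when the competitor sits at the left end (start = 0) but about 0.97 when it is shifted by a single motif (start = 1), because the isolated motif on its left can no longer form part of a Kap binding site.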

Linear lattice systems and their multivalent interactions with target proteins often regulate dynamic cellular processes. Because the target binding sites on a linear lattice overlap, quantitative understanding of such interactions requires a framework fundamentally different from that for simple 1:1 binding or discrete-site systems. In this review, we discussed the two prevalent approaches for analyzing linear lattice systems, namely the combinatorial and conditional probability models. Constructing the lattice partition functions with the combinatorial approach is straightforward and readily expandable in data analysis and predictions, as illustrated by the Nup153 IDR–Kap interaction. On the other hand, the conditional probability model provides invaluable physical insights consistent with the molecular features of the multivalent linear lattice–target interactions. Furthermore, this method is suitable for simulating in vivo nucleic acid systems of apparently infinite lattice length. These frameworks may serve as a cornerstone for developing sophisticated models to analyze more complex cellular processes, including competition among multiple DNA binding proteins on nucleosomal DNA (Segal and Widom, 2009) as well as formation of phase-separated condensates involving multiple components (Lyon et al., 2021).

Fig. 1. Schematic illustration of the representative linear lattice systems in cellular processes. (A) Scaffold proteins recruiting diverse binding partners in signal transduction. (B) IDRs in the NPC binding multiple Kaps in nucleocytoplasmic transport. (C) Nonspecific sites on the chromosomal DNA for transcription factor binding.
Fig. 2. Molecular features of multivalent interactions on a linear lattice (M = 9) where a protein occupies any n (= 3) consecutive motifs. (A) The number of potential overlapping binding sites on a naked lattice (left panel) is greater as compared to a discrete-site system (right panel) with the same stoichiometry (N). (B) The number of potential binding sites eliminated upon binding of a protein depends on where the protein occupies on the lattice (left panel). In contrast, binding of a protein to the discrete-site system invariably eliminates only one potential binding site (right panel). (C) Possible configurations of the linear lattice with two proteins bound (left panel). Many configurations are futile for the last protein binding, resulting in apparent negative cooperativity among bound proteins. In contrast, all corresponding configurations in the discrete-site system are competent for binding (right panel).
Fig. 3. Calculation of the number of distinct lattice configurations with i proteins bound and j contact points. (A) Three distinct types of protein binding sites on a linear lattice and the definitions of K and ω. (B) Dissection of a linear lattice into two distinct physical elements, runs and unattached free motifs. The i − j − 1 leftmost runs are attached at their righthand end with a free motif (termed attached free motif). (C) Creation of the distinct lattice configurations by combining the two elements.
Fig. 4. Application and extension of the quantitative models for linear lattice systems. (A and B) Effects of nonspecific protein-DNA interactions on transcription. Upper panels: Using an extended conditional probability model (Eq. 16), the fractional occupancy of specific DNA sites (Y_sp = [TF]_sp,b/[D_sp]_tot) for binding of a hypothetical TF was calculated as a function of the molar ratio [TF]_tot/[D_sp]_tot for various sets of interaction parameters. Bottom panels: The corresponding fractional distributions of TF between specifically (solid curves) and nonspecifically (dashed curves) bound states were calculated. In these calculations, the value of K_ns (A) or the concentration of nonspecific motifs ([M]_tot) (B) was varied with the fixed values of K_sp = 1 × 10^12 M^−1, n = 10 bp, and ω = 1 ([M]_tot = 5 mM in (A); K_ns = 1 × 10^5 M^−1 in (B)). (C) Quantitative model for assembly of the Nup153 IDR hub with multiple interaction partners and competitors (adapted from Cho et al., 2021). The Nup153 IDR presents a high-affinity 1:1 Kap binding site (purple) and a series of low-affinity sites for overlapping binding of multiple Kaps. Kap occupies multiple dipeptide (FG) motifs (pink vertical bars). Using advanced combinatorial models, fine-tuning of the Kap occupancy of the Nup153 IDR was predicted as a function of the location of the competitor binding site. In the partition function Z, Z_0 corresponds to the partition function of the Nup153 IDR in the absence of competition; K_c[C] represents the competitor binding; the terms in the brackets are the partition functions for the two subregions of the low-affinity sites separated by the competitor binding; (1 + K_s[P]) represents the 1:1 interaction of Kap with the high-affinity site. (D) On the basis of the multivalent, overlapping IDR-Kap interaction, the Nup153 IDR is proposed to function as a molecular switch to couple nucleocytoplasmic transport to transcription.
  1. Banani S.F., Lee H.O., Hyman A.A., and Rosen M.K. (2017). Biomolecular condensates: organizers of cellular biochemistry. Nat. Rev. Mol. Cell Biol. 18, 285-298.
  2. Bayliss R., Littlewood T., and Stewart M. (2000). Structural basis for the interaction between FxFG nucleoporin repeats and importin-beta in nuclear trafficking. Cell 102, 99-108.
  3. Berg O.G., Winter R.B., and von Hippel P.H. (1981). Diffusion-driven mechanisms of protein translocation on nucleic acids. 1. Models and theory. Biochemistry 20, 6929-6948.
  4. Bintu L., Buchler N.E., Garcia H.G., Gerland U., Hwa T., Kondev J., Kuhlman T., and Phillips R. (2005). Transcriptional regulation by the numbers: applications. Curr. Opin. Genet. Dev. 15, 125-135.
  5. Brewster R.C., Weinert F.M., Garcia H.G., Song D., Rydenfelt M., and Phillips R. (2014). The transcription factor titration effect dictates level of gene expression. Cell 156, 1312-1323.
  6. Bujalowski W. (2006). Thermodynamic and kinetic methods of analyses of protein-nucleic acid interactions. From simpler to more complex systems. Chem. Rev. 106, 556-606.
  7. Bujalowski W., Lohman T.M., and Anderson C.F. (1989). On the cooperative binding of large ligands to a one-dimensional homogeneous lattice: the generalized three-state lattice model. Biopolymers 28, 1637-1643.
  8. Bujalowski W., Overman L.B., and Lohman T.M. (1988). Binding mode transitions of Escherichia coli single strand binding protein-single-stranded DNA complexes. Cation, anion, pH, and binding density effects. J. Biol. Chem. 263, 4629-4640.
  9. Cho B., Choi J., Kim R., Yun J.N., Choi Y., Lee H.H., and Koh J. (2021). Thermodynamic models for assembly of intrinsically disordered protein hubs with multiple interaction partners. J. Am. Chem. Soc. 143, 12509-12523.
  10. Choi K.Y., Satterberg B., Lyons D.M., and Elion E.A. (1994). Ste5 tethers multiple protein kinases in the MAP kinase cascade required for mating in S. cerevisiae. Cell 78, 499-512.
  11. Cortese M.S., Uversky V.N., and Dunker A.K. (2008). Intrinsic disorder in scaffold proteins: getting more from less. Prog. Biophys. Mol. Biol. 98, 85-106.
  12. Cumberworth A., Lamour G., Babu M.M., and Gsponer J. (2013). Promiscuity as a functional trait: intrinsically disordered regions as central players of interactomes. Biochem. J. 454, 361-369.
  13. Dragan A.I., Read C.M., Makeyeva E.N., Milgotina E.I., Churchill M.E., Crane-Robinson C., and Privalov P.L. (2004). DNA binding and bending by HMG boxes: energetic determinants of specificity. J. Mol. Biol. 343, 371-393.
  14. Dunker A.K., Cortese M.S., Romero P., Iakoucheva L.M., and Uversky V.N. (2005). Flexible nets. The roles of intrinsic disorder in protein interaction networks. FEBS J. 272, 5129-5148.
  15. Epstein I.R. (1978). Cooperative and non-cooperative binding of large ligands to a finite one-dimensional lattice. A model for ligand-oligonucleotide interactions. Biophys. Chem. 8, 327-339.
  16. Freire E., Schon A., and Velazquez-Campoy A. (2009). Isothermal titration calorimetry: general formalism using binding polynomials. Methods Enzymol. 455, 127-155.
  17. Frey S. and Gorlich D. (2007). A saturated FG-repeat hydrogel can reproduce the permeability properties of nuclear pore complexes. Cell 130, 512-523.
  18. Fung H.Y.J., Birol M., and Rhoades E. (2018). IDPs in macromolecular complexes: the roles of multivalent interactions in diverse assemblies. Curr. Opin. Struct. Biol. 49, 36-43.
  19. Ha T., Kaiser C., Myong S., Wu B., and Xiao J. (2022). Next generation single-molecule techniques: imaging, labeling, and manipulation in vitro and in cellulo. Mol. Cell 82, 304-314.
  20. Hamming R.W. .
  21. Holbrook J.A., Tsodikov O.V., Saecker R.M., and Record M.T. Jr. (2001). Specific and non-specific interactions of integration host factor with DNA: thermodynamic evidence for disruption of multiple IHF surface salt-bridges coupled to DNA binding. J. Mol. Biol. 310, 379-401.
  22. Hong S., Choi S., Kim R., and Koh J. (2020). Mechanisms of macromolecular interactions mediated by protein intrinsic disorder. Mol. Cells 43, 899-908.
  23. Jen-Jacobson L., Engler L.E., and Jacobson L.A. (2000). Structural and thermodynamic strategies for site-specific DNA binding proteins. Structure 8, 1015-1023.
  24. Kadota S., Ou J., Shi Y., Lee J.T., Sun J., and Yildirim E. (2020). Nucleoporin 153 links nuclear pore complex to chromatin architecture by mediating CTCF and cohesin binding. Nat. Commun. 11, 2606.
  25. Kao-Huang Y., Revzin A., Butler A.P., O'Conner P., Noble D.W., and von Hippel P.H. (1977). Nonspecific DNA binding of genome-regulating proteins as a biological control mechanism: measurement of DNA-bound Escherichia coli lac repressor in vivo. Proc. Natl. Acad. Sci. U. S. A. 74, 4228-4232.
  26. Kasper L.H., Brindle P.K., Schnabel C.A., Pritchard C.E., Cleary M.L., and van Deursen J.M. (1999). CREB binding protein interacts with nucleoporin-specific FG repeats that activate transcription and mediate NUP98-HOXA9 oncogenicity. Mol. Cell. Biol. 19, 764-776.
  27. Koh J. and Blobel G. (2015). Allosteric regulation in gating the central channel of the nuclear pore complex. Cell 161, 1361-1373.
  28. Koh J., Saecker R.M., and Record M.T. Jr. (2008). DNA binding mode transitions of Escherichia coli HU(alphabeta): evidence for formation of a bent DNA--protein complex on intact, linear duplex DNA. J. Mol. Biol. 383, 324-346.
  29. Krull S., Thyberg J., Bjorkroth B., Rackwitz H.R., and Cordes V.C. (2004). Nucleoporins as components of the nuclear pore complex core structure and Tpr as the architectural element of the nuclear basket. Mol. Biol. Cell 15, 4261-4277.
  30. Lifson S. (1964). Partition functions of linear-chain molecules. J. Chem. Phys. 40, 3705-3710.
  31. Lohman T.M., deHaseth P.L., and Record M.T. Jr. (1980). Pentalysine-deoxyribonucleic acid interactions: a model for the general effects of ion concentrations on the interactions of proteins with nucleic acids. Biochemistry 19, 3522-3530.
  32. Lyon A.S., Peeples W.B., and Rosen M.K. (2021). A framework for understanding the functions of biomolecular condensates across scales. Nat. Rev. Mol. Cell Biol. 22, 215-235.
  33. Mahamid J., Pfeffer S., Schaffer M., Villa E., Danev R., Cuellar L.K., Forster F., Hyman A.A., Plitzko J.M., and Baumeister W. (2016). Visualizing the molecular sociology at the HeLa cell nuclear periphery. Science 351, 969-972.
  34. Mark W.Y., Liao J.C., Lu Y., Ayed A., Laister R., Szymczyna B., Chakrabartty A., and Arrowsmith C.H. (2005). Characterization of segments from the central region of BRCA1: an intrinsically disordered scaffold for multiple protein-protein and protein-DNA interactions? J. Mol. Biol. 345, 275-287.
  35. McGhee J.D. and von Hippel P.H. (1974). Theoretical aspects of DNA-protein interactions: co-operative and non-co-operative binding of large ligands to a one-dimensional homogeneous lattice. J. Mol. Biol. 86, 469-489.
  36. Noutsou M., Duarte A.M., Anvarian Z., Didenko T., Minde D.P., Kuper I., de Ridder I., Oikonomou C., Friedler A., and Boelens R., et al. (2011). Critical scaffolding regions of the tumor suppressor Axin1 are natively unfolded. J. Mol. Biol. 405, 773-786.
  37. Oikonomou C.M. and Jensen G.J. (2017). Cellular electron cryotomography: toward structural biology in situ. Annu. Rev. Biochem. 86, 873-896.
  38. Privalov P.L., Dragan A.I., and Crane-Robinson C. (2011). Interpreting protein/DNA interactions: distinguishing specific from non-specific and electrostatic from non-electrostatic components. Nucleic Acids Res. 39, 2483-2491.
  39. Radu A., Moore M.S., and Blobel G. (1995). The peptide repeat domain of nucleoporin Nup98 functions as a docking site in transport across the nuclear pore complex. Cell 81, 215-222.
  40. Rajendran S., Jezewska M.J., and Bujalowski W. (1998). Human DNA polymerase beta recognizes single-stranded DNA using two different binding modes. J. Biol. Chem. 273, 31021-31031.
  41. Record M.T. Jr., Lohman M.L., and De Haseth P. (1976). Ion effects on ligand-nucleic acid interactions. J. Mol. Biol. 107, 145-158.
  42. Schellman J.A. (1974). Cooperative multisite binding to DNA. Isr. J. Chem. 12, 219-238.
  43. Schoch R.L., Kapinos L.E., and Lim R.Y. (2012). Nuclear transport receptor binding avidity triggers a self-healing collapse transition in FG-nucleoporin molecular brushes. Proc. Natl. Acad. Sci. U. S. A. 109, 16911-16916.
  44. Segal E. and Widom J. (2009). From DNA sequence to transcriptional behaviour: a quantitative approach. Nat. Rev. Genet. 10, 443-456.
  45. Shin Y. and Brangwynne C.P. (2017). Liquid phase condensation in cell physiology and disease. Science 357, eaaf4382.
  46. Shrader T.E. and Crothers D.M. (1989). Artificial nucleosome positioning sequences. Proc. Natl. Acad. Sci. U. S. A. 86, 7418-7422.
  47. Sigal Y.M., Zhou R., and Zhuang X. (2018). Visualizing and discovering cellular structures with super-resolution microscopy. Science 361, 880-887.
  48. Stracy M., Schweizer J., Sherratt D.J., Kapanidis A.N., Uphoff S., and Lesterlin C. (2021). Transient non-specific DNA binding dominates the target search of bacterial DNA-binding proteins. Mol. Cell 81, 1499-1514.e6.
  49. Teif V.B. (2007). General transfer matrix formalism to calculate DNA-protein-drug binding in gene regulation: application to OR operator of phage lambda. Nucleic Acids Res. 35, e80.
  50. Tsodikov O.V., Holbrook J.A., Shkel I.A., and Record M.T. Jr. (2001). Analytic binding isotherms describing competitive interactions of a protein ligand with specific and nonspecific sites on the same DNA oligomer. Biophys. J. 81, 1960-1969.
  51. von Hippel P.H., Revzin A., Gross C.A., and Wang A.C. (1974). Non-specific DNA binding of genome regulating proteins as a biological control mechanism: I. The lac operon: equilibrium aspects. Proc. Natl. Acad. Sci. U. S. A. 71, 4808-4812.
  52. Widom J. (1999). Equilibrium and dynamic nucleosome stability. Methods Mol. Biol. 119, 61-77.
  53. Wodarz A. and Nusse R. (1998). Mechanisms of Wnt signaling in development. Annu. Rev. Cell Dev. Biol. 14, 59-88.
  54. Wright P.E. and Dyson H.J. (2015). Intrinsically disordered proteins in cellular signalling and regulation. Nat. Rev. Mol. Cell Biol. 16, 18-29.
  55. Wyman J. and Gill S.J. (1990). Binding and Linkage: Functional Chemistry of Biological Macromolecules (Mill Valley, CA: University Science Books).
  56. Xue B., Romero P.R., Noutsou M., Maurice M.M., Rudiger S.G., William A.M. Jr., Mizianty M.J., Kurgan L., Uversky V.N., and Dunker A.K. (2013). Stochastic machines as a colocalization mechanism for scaffold protein function. FEBS Lett. 587, 1587-1591.


Published online July 31, 2022 https://doi.org/10.14348/molcells.2022.0035


Abstract

Multivalent macromolecular interactions underlie dynamic regulation of diverse biological processes in ever-changing cellular states. These interactions often involve binding of multiple proteins to a linear lattice including intrinsically disordered proteins and the chromosomal DNA with many repeating recognition motifs. Quantitative understanding of such multivalent interactions on a linear lattice is crucial for exploring their unique regulatory potentials in the cellular processes. In this review, the distinctive molecular features of the linear lattice system are first discussed with a particular focus on the overlapping nature of potential protein binding sites within a lattice. Then, we introduce two general quantitative frameworks, combinatorial and conditional probability models, dealing with the overlap problem and relating the binding parameters to the experimentally measurable properties of the linear lattice-protein interactions. To this end, we present two specific examples where the quantitative models have been applied and further extended to provide biological insights into specific cellular processes. In the first case, the conditional probability model was extended to highlight the significant impact of nonspecific binding of transcription factors to the chromosomal DNA on gene-specific transcriptional activities. The second case presents the recently developed combinatorial models to unravel the complex organization of target protein binding sites within an intrinsically disordered region (IDR) of a nucleoporin. In particular, these models have suggested a unique function of IDRs as a molecular switch coupling distinct cellular processes. The quantitative models reviewed here are envisioned to further advance for dissection and functional studies of more complex systems including phase-separated biomolecular condensates.

Keywords: biological linear lattice, combinatorial model, conditional probability model, multivalent binding, overlapping binding site

INTRODUCTION

Recent advances in cutting-edge biotechnologies have provided opportunities to observe unprecedented molecular details of various biological processes (Ha et al., 2022; Mahamid et al., 2016; Oikonomou and Jensen, 2017; Sigal et al., 2018). Interpretation of such observations requires quantitative models dissecting the underlying macromolecular interactions. In turn, the quantitative information allows further understanding and prediction of spatiotemporal regulation of specific cellular processes in dynamically changing environments. The complexity of macromolecular interactions ranges from simple 1:1 binding to formation of phase-separated condensates with multivalent binding among two or more components (Banani et al., 2017; Lyon et al., 2021; Shin and Brangwynne, 2017). In contrast to 1:1 binding, multivalent interactions are difficult to describe with the simple mass action law and are instead modeled with more sophisticated frameworks that account for the presence of various molecular states (Bujalowski, 2006; Freire et al., 2009; Wyman and Gill, 1990). Furthermore, the quantitative models are often formulated with large numbers of parameters, and exemplary cases in which these parameters have been determined with suitable in vitro model systems and methods are exceedingly rare.

A linear or one-dimensional lattice is a relatively tractable multivalent system found in numerous cellular processes. Linear lattices present multiple binding motifs or domains to interact with diverse proteins or multiple copies of identical proteins (Fig. 1) (Cortese et al., 2008; Dunker et al., 2005; Fung et al., 2018). For instance, in many signaling pathways, scaffold proteins such as axin, BRCA1, and Ste5 recruit various target proteins via specific binding sites (Choi et al., 1994; Mark et al., 2005; Wodarz and Nusse, 1998). These scaffold-driven higher-order assemblies are predicted to colocalize and increase the local concentrations of the target proteins and thereby facilitate their interactions for efficient integration and propagation of diverse signals in the cell (Fig. 1A) (Noutsou et al., 2011; Xue et al., 2013). Another example is the intrinsically disordered regions (IDRs) of some nucleoporins (Nups) present in the nuclear pore complex (NPC) (Fig. 1B) (Frey and Gorlich, 2007; Radu et al., 1995). The Nup IDRs mediate massive yet selective molecular transport between the nucleus and cytoplasm through specific interactions with karyopherin (Kap) proteins carrying macromolecular cargos (Koh and Blobel, 2015; Schoch et al., 2012). These interactions are achieved by multiple interspersed phenylalanine-glycine (FG) motifs on an IDR capturing several Kap molecules (Bayliss et al., 2000).

Finally, nucleic acids are the most prominent linear lattice systems in the cell. In particular, the chromosomal DNA presents an enormous number of repeating phosphate groups along its backbone, creating electrostatic potentials for nonspecific protein-DNA interactions (Fig. 1C) (Berg et al., 1981; Stracy et al., 2021). Such a polyelectrolyte effect is a major driving force (Lohman et al., 1980; Record et al., 1976), particularly at low salt concentrations, for formation of nucleosomes (Shrader and Crothers, 1989; Widom, 1999) as well as for binding of chromatin architectural proteins such as HMG (high mobility group)-box proteins with little specificity for DNA base sequences (Dragan et al., 2004). Even specific DNA binding proteins typically engage their cationic amino acid side chains to neutralize DNA phosphate charges (Jen-Jacobson et al., 2000; Privalov et al., 2011). Thus, these proteins are expected to interact with nonspecific sites that are present in overwhelming excess over specific sites in the chromosomal context. In addition, as the copy numbers of many transcription factors (TFs) are considered greater than those of their corresponding specific binding sites on DNA, the majority of these factors may exist in vivo in nonspecifically bound states (Bintu et al., 2005; Kao-Huang et al., 1977). The physiological impact of the nonspecific protein-DNA interaction is substantial as demonstrated in the classical study by the von Hippel group (von Hippel et al., 1974) as well as in the recent seminal work by the Phillips group (Brewster et al., 2014). Both groups used Escherichia coli lac repressor as a model system to investigate the interplay among the copy numbers of TFs and their binding sites on DNA, the specificity ratio, and the inducer binding affinity in bacterial gene expression. The quantitative models proposed in these studies accurately described and predicted the expression profiles of the genes under the repressor regulation by incorporating nonspecific protein-DNA interactions as a “sink” for RNA polymerase and lac repressor.

Taken together, numerous protein-protein and protein-nucleic acid interactions can be perceived as multivalent interactions mediated by linear lattices. Thus, quantitative models for linear lattice systems are indispensable in understanding a broad range of biological processes and may be further extended to dissect more complex systems including phase-separated biomolecular condensates. In this review, we examine two general mathematical frameworks, the combinatorial and conditional probability models, for quantitative description of linear lattices. Prior to the detailed derivation of these models, the molecular features of multivalent interactions on a linear lattice are discussed qualitatively in light of how they fundamentally differ from 1:1 binding or discrete-site systems. The derivation is supplemented in Supplementary Information with detailed mathematical procedures that were omitted from, and are not immediately evident in, the original articles. Finally, two practical examples are discussed in which the models have been further extended and applied to highlight their physiological significance. For the alternative methods of sequence generating functions and transfer matrices, readers are referred to the original and case studies that handle multiple binding modes, heterogeneous lattices, and lattice conformational changes (Bujalowski et al., 1989; Lifson, 1964; Schellman, 1974; Teif, 2007).

MOLECULAR FEATURES OF MULTIVALENT INTERACTIONS ON LINEAR LATTICES

It is straightforward to derive the quantitative models for the linear lattices that utilize discrete regions or domains to bind multiple distinct target proteins with the interaction stoichiometry of 1:1 for each target. In the absence of cooperativity among bound targets, the binding of each target can be handled, independent of binding of other targets, by the simple mass action law yielding a quadratic equation as a function of total concentrations of the lattice and the corresponding target. An advanced model has been derived by constructing a partition function for a linear lattice with cooperativities among bound targets (Cho et al., 2021).
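As a brief illustration of the quadratic equation mentioned for the noncooperative discrete-site case, the sketch below solves the mass action law K = [PL]/([P][L]) for the concentration of the 1:1 complex, given the total concentrations of one lattice site and its target. The function name and the numerical values are hypothetical and serve only as an example.

```python
import math


def bound_complex_1to1(K, site_total, target_total):
    """Concentration of the 1:1 complex from the mass action law.

    Solves K * (target_total - x) * (site_total - x) = x for x and
    returns the physically meaningful (smaller) root.
    K is an association constant in M^-1; concentrations are in M.
    """
    b = site_total + target_total + 1.0 / K
    return (b - math.sqrt(b * b - 4.0 * site_total * target_total)) / 2.0


if __name__ == "__main__":
    K = 1.0e6            # illustrative association constant (M^-1)
    site_total = 1.0e-6  # illustrative total site concentration (M)
    for target_total in (0.5e-6, 1.0e-6, 2.0e-6, 5.0e-6):
        x = bound_complex_1to1(K, site_total, target_total)
        print(target_total, x / site_total)  # fractional saturation of the site
```

Because each discrete site is treated independently in the absence of cooperativity, the same function can be applied to every target separately with its own binding constant.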

Complexity arises when a target protein occupies two or more binding motifs on a linear lattice. We consider a linear lattice with a total of M motifs and a target protein occluding n consecutive motifs (Fig. 2) (Epstein, 1978; McGhee and von Hippel, 1974). The binding motif can be any repeating unit including a base-pair or phosphate on DNA and a short peptide motif or a PTM (post-translational modification) moiety on an intrinsically disordered protein (IDP). As DNA and proteins have particular directions in denoting their motifs (5′ to 3′ end or N- to C-terminus), target proteins are assumed to be polar as well in recognizing the motifs. It is further assumed that there is no partial binding where a target protein occludes fewer than n motifs. Then, the target binding stoichiometry (N) is the greatest integer less than or equal to M/n (N = [M/n]). A fundamental nature of the linear lattice system becomes evident when a target protein binds to a naked lattice (left panel in Fig. 2A). Because the target protein occupies n consecutive motifs, any motif except the rightmost n − 1 positions can be a starting point for target binding. Thus, potential target binding sites overlap and the number of such overlapping sites equals M − n + 1, which exceeds the stoichiometry [M/n] for M > n > 1. In contrast, for a conventional system in which a target protein binds discrete and isolated sites (right panel in Fig. 2A), the number of binding sites is simply equal to the stoichiometry N = [M/n].

As the linear lattice subsequently binds more target proteins, its overlapping nature generates additional features further deviating from the discrete-site system. The number of potential binding sites eliminated upon binding of a protein depends on where the protein binds on the lattice. When a protein binds to a gap exactly n motifs long between prebound proteins on the lattice, only one potential site is removed. Instead, if a gap is longer than 3n − 2 motifs, binding of a protein to this region can eliminate as many as 2n − 1 sites. For instance, binding of a protein with the site size of n = 3 to the three leftmost motifs on a linear lattice with a total of nine motifs eliminates three potential binding sites (the second figure in the left panel of Fig. 2B). Alternatively, if the protein occupies the three motifs at the center of the lattice, five potential binding sites are eliminated (the third figure in the left panel of Fig. 2B). However, in the discrete-site system, protein binding invariably eliminates only one potential binding site (right panel in Fig. 2B). Finally, it is difficult to completely saturate the linear lattice since the overlapping protein binding increasingly accumulates gaps with fewer than n motifs that are futile for binding. This point is explicitly illustrated in Fig. 2C (left panel), which lists all possible configurations of the linear lattice with [M/n] − 1 proteins bound. Among them, many are futile configurations in which the n free (unoccupied) motifs are scattered over the lattice, and the bound proteins must rearrange to create a site with n consecutive motifs for binding of the last protein. Such a rearrangement or reduction in the number of lattice configurations corresponds to a loss of mixing entropy, culminating in apparent negative cooperativity among bound proteins. In contrast, the number of available binding sites is independent of the configuration of bound proteins in the discrete-site system (right panel in Fig. 2C). In summary, because of the overlapping nature of multivalent linear lattice-target interactions, a naked linear lattice initially presents more potential binding sites than the stoichiometry and thereby enhances protein binding as compared to a discrete-site system. However, as the density of bound proteins increases, the effect of the overlapping binding is reversed, attenuating saturation of the linear lattice.
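The counting arguments in this section can be checked by direct enumeration. The sketch below is a minimal illustration for the lattice of Fig. 2 (M = 9, n = 3). The function names and the boolean-list representation of the lattice are choices made here for illustration only.

```python
from itertools import combinations


def free_sites(occupied, n):
    """Start positions of every stretch of n consecutive free motifs.

    occupied is a list of booleans, one per motif (True = occluded).
    """
    M = len(occupied)
    return [s for s in range(M - n + 1) if not any(occupied[s:s + n])]


def bind(occupied, start, n):
    """Return a new lattice with a protein occluding motifs start..start+n-1."""
    if any(occupied[start:start + n]):
        raise ValueError("site overlaps a bound protein")
    new = list(occupied)
    new[start:start + n] = [True] * n
    return new


def configurations(M, n, i):
    """All placements of i non-overlapping proteins, as tuples of start positions."""
    return [starts for starts in combinations(range(M - n + 1), i)
            if all(b - a >= n for a, b in zip(starts, starts[1:]))]


if __name__ == "__main__":
    M, n = 9, 3
    naked = [False] * M
    total = len(free_sites(naked, n))          # M - n + 1 overlapping sites
    print("sites on naked lattice:", total, "stoichiometry:", M // n)

    for start in (0, 3):                       # left end vs. center of the lattice
        remaining = len(free_sites(bind(naked, start, n), n))
        print("start", start, "eliminates", total - remaining, "sites")

    # Configurations with [M/n] - 1 proteins bound that still accept one more
    two_bound = configurations(M, n, M // n - 1)
    competent = []
    for starts in two_bound:
        lattice = naked
        for s in starts:
            lattice = bind(lattice, s, n)
        if free_sites(lattice, n):
            competent.append(starts)
    print(len(two_bound), "configurations,", len(competent), "competent for binding")
```

For this lattice the enumeration gives seven overlapping sites against a stoichiometry of three, three or five sites eliminated by binding at the left end or the center, and only three of the ten doubly bound configurations that can accept the last protein.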

The following sections review the quantitative models that address the overlap problem of the linear lattice to yield mathematical formulations relating the binding parameters to experimentally measurable properties of the lattice-target interactions. A core element of each model is the computation of the number of possible configurations for a given density of bound proteins on a lattice.

QUANTITATIVE FRAMEWORK FOR LINEAR LATTICE-PROTEIN INTERACTIONS: COMBINATORIAL MODEL

A complete set of parameters for description of linear lattice-protein interactions consists of the binding stoichiometry (N), binding constant (K), and cooperativity (ω) among bound proteins. As discussed above, the binding stoichiometry (N) is determined by the numbers of all motifs on a lattice (M) and those occupied by a target protein (n, termed site size) (N = [M/n]). The binding constant (K) corresponds to the affinity between a protein and a site n motifs long. Cooperativity can arise from pairwise interactions between any two proteins bound to a linear lattice. Although there are in principle C(i, 2) = i(i − 1)/2 pairs on a lattice with i (≥ 2) proteins bound, the models discussed in this review formulate cooperativity only for the interaction between nearest neighbors (i.e., a pair of contiguously bound proteins without any intervening free motifs). Thus, the cooperativity parameter (ω) is equivalent to an equilibrium constant for formation of a direct “contact point” between a pair of bound proteins. Then, under these definitions, a linear lattice presents three distinct types of protein binding sites (Fig. 3A): 1) an isolated site with the binding constant K; 2) a singly contiguous site with the binding constant Kω; 3) a doubly contiguous site with the binding constant Kω². If ω > 1 (or 0 < ω < 1), the nearest neighbor interaction is favorable (or unfavorable) and the protein binding is positively (or negatively) cooperative. For ω = 1, bound proteins are independent of each other and the binding is noncooperative.

A fundamental relationship between the binding parameters and experimental variables can be derived by constructing a partition function for a linear lattice (Freire et al., 2009; Wyman and Gill, 1990). The partition function is a sum of relative probabilities or statistical weights of all possible protein-bound states of a linear lattice, with the free lattice assigned as a reference state of unit relative probability (i.e., statistical weight = 1). The statistical weight of a lattice with i proteins bound and j contact points among them is then given by (K[P])^i ω^j, where [P] is the free protein concentration. However, in order to account for the presence of multiple configurations for a given set of (i, j), the statistical weight must be multiplied by the degeneracy term P_M(i, j), the number of distinct ways to distribute i proteins with j contact points on a lattice with M motifs. The partition function (Z) is then given by the following equation:

$$Z = \sum_{i=0}^{N} \sum_{j=0}^{i-1} P_M(i,j)\,(K[P])^{i}\,\omega^{j} \tag{1}$$

The average number of proteins bound per lattice (or binding density, ν), which is a principal quantity to be measured in all binding experiments, can be formulated from the partition function:

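$$\nu = \frac{\partial \ln Z}{\partial \ln [P]} = \sum_{i=0}^{N} \sum_{j=0}^{i-1} i\,P_M(i,j)\,(K[P])^{i}\,\omega^{j} \times \left[\sum_{i=0}^{N} \sum_{j=0}^{i-1} P_M(i,j)\,(K[P])^{i}\,\omega^{j}\right]^{-1} \tag{2}$$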

Likewise, the average number of contact points per lattice can be calculated from a partial derivative of the partition function:

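$$\bar{j} = \frac{\partial \ln Z}{\partial \ln \omega} = \sum_{i=0}^{N} \sum_{j=0}^{i-1} j\,P_M(i,j)\,(K[P])^{i}\,\omega^{j} \times \left[\sum_{i=0}^{N} \sum_{j=0}^{i-1} P_M(i,j)\,(K[P])^{i}\,\omega^{j}\right]^{-1} \tag{3}$$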

The final task in constructing the partition function is to derive the expression for P_M(i, j). Here we follow the original combinatorial derivation of P_M(i, j) (Epstein, 1978), highlighting the concept behind the mathematical procedures. A linear lattice with i proteins bound and j contact points may be dissected into two physical elements. The first element is a “run”, defined as a distinct cluster of contiguously bound proteins; because each contact point joins two proteins into the same run, the number of runs is i − j (Fig. 3B). Because there is at least one free motif between runs, each of the i − j − 1 leftmost runs is assigned one free motif attached to its right side. The second element is the remaining free motifs, of which there are M − ni − (i − j − 1) unattached free motifs (≡ N_u). Then, the number (≡ N_c) of ways of mixing these two elements to create the distinct lattice configurations equals the number of ways of distributing i − j runs (accompanied by the i − j − 1 attached free motifs) and N_u unattached free motifs into N_u + i − j slots (Fig. 3C):

$$N_c = \frac{(N_u + i - j)!}{N_u!\,(i-j)!} \tag{4}$$

In this expression, all runs have been treated as identical elements, regardless of the actual number of bound proteins in each run. Therefore, in order to complete the derivation of P_M(i, j), N_c must be multiplied by the number (≡ N_p) of distinct ways to distribute i proteins into i − j runs:

$$N_p = \frac{(i-1)!}{j!\,(i-j-1)!} \tag{5}$$

N_p equals the number of compositions (ordered partitions) of the integer i into i − j positive integers, since the runs occupy distinguishable positions along the lattice. Finally, P_M(i, j) is derived as the following equation:

$$P_M(i,j) = N_c N_p = \frac{(M - ni + 1)!\,(i-1)!}{(M - ni - i + j + 1)!\,(i-j)!\,j!\,(i-j-1)!} \tag{6}$$
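The counting argument above can be checked numerically. The following is a minimal sketch, not part of the original derivation: the function names `degeneracy` and `brute_force` are hypothetical, the closed form is written with binomial coefficients (N_c as a choice of i − j slots out of M − ni + 1, N_p as a choice of j out of i − 1), and the free lattice (i = 0) is assigned a count of 1 by convention. The brute-force routine simply enumerates all non-overlapping placements on a small lattice and counts contact points.

```python
from itertools import combinations
from math import comb


def degeneracy(M, n, i, j):
    """Closed-form P_M(i, j): ways to place i proteins of site size n
    with j contact points on a lattice of M motifs."""
    if i == 0:
        return 1 if j == 0 else 0      # free lattice: one configuration (convention)
    if j < 0 or j > i - 1 or M - n * i < 0:
        return 0
    n_c = comb(M - n * i + 1, i - j)   # arrange i - j runs among the free motifs
    n_p = comb(i - 1, j)               # split i proteins into i - j ordered runs
    return n_c * n_p


def brute_force(M, n, i, j):
    """Enumerate start positions directly (only feasible for small lattices)."""
    count = 0
    for starts in combinations(range(M - n + 1), i):
        gaps = [b - a for a, b in zip(starts, starts[1:])]
        no_overlap = all(g >= n for g in gaps)
        contacts = sum(g == n for g in gaps)   # adjacent proteins with no free motif between
        if no_overlap and contacts == j:
            count += 1
    return count


if __name__ == "__main__":
    M, n = 9, 3                        # the lattice used in Fig. 2
    for i in range(M // n + 1):
        for j in range(max(i, 1)):
            print(i, j, degeneracy(M, n, i, j), brute_force(M, n, i, j))
```

The enumeration grows combinatorially with M, so it serves only as a sanity check for short lattices; the closed form is what would be used inside a partition function.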

For noncooperative binding (ω = 1), the number of contact points j becomes irrelevant and P_M(i, j) reduces to P_M(i), the number of ways of mixing i proteins and M − ni free motifs to build distinct lattice configurations:

$$P_M(i) = \frac{(M - ni + i)!}{(M - ni)!\,i!} \tag{7}$$

Then, the partition function for noncooperative binding can be written in a simplified form:

$$Z = \sum_{i=0}^{N} P_M(i)\,(K[P])^{i} \tag{8}$$
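As an illustration of the noncooperative case, the sketch below evaluates the simplified partition function and the resulting binding density. The function names are hypothetical and the parameter values in the example are arbitrary. The helper `summed_over_contacts` adds the cooperative degeneracies over all j, which should recover P_M(i) (an instance of the Vandermonde identity), and for n = 1 the binding density should reduce to the familiar independent-site result M·K[P]/(1 + K[P]).

```python
from math import comb


def degeneracy_noncoop(M, n, i):
    """P_M(i): ways to mix i proteins and M - n*i free motifs."""
    free = M - n * i
    return comb(free + i, i) if free >= 0 else 0


def summed_over_contacts(M, n, i):
    """Sum of the cooperative degeneracy P_M(i, j) over all j, for i >= 1."""
    free = M - n * i
    if free < 0:
        return 0
    return sum(comb(free + 1, i - j) * comb(i - 1, j) for j in range(i))


def binding_density_noncoop(M, n, K, P):
    """Average number of proteins bound per lattice for omega = 1."""
    weights = [degeneracy_noncoop(M, n, i) * (K * P) ** i for i in range(M // n + 1)]
    Z = sum(weights)                   # the i = 0 term is the free lattice, weight 1
    return sum(i * w for i, w in enumerate(weights)) / Z


if __name__ == "__main__":
    # Arbitrary illustrative values: K in M^-1, P in M.
    for P in (1e-8, 1e-7, 1e-6, 1e-5):
        print(P, binding_density_noncoop(9, 3, 1e6, P))
```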

In practice, the total lattice and protein concentrations ([L]tot and [P]tot), rather than the free protein concentration ([P]), are known experimental variables. The total concentrations are related to each other and other binding parameters through a simple mass balance equation:

$$[P]_{tot} = [P] + \nu\,[L]_{tot} \tag{9}$$

For a given set of binding parameters and reactant concentrations, this mass balance equation can be solved for [P] by numerical procedures such as the Newton-Raphson or the bisection method (Hamming, 1986). In turn, this solution allows calculation of the relative probabilities of all lattice configurations and the ensemble-averaged quantities including Eqs. 2 and 3. Thus, the combinatorial method is straightforward and intuitive in constructing a partition function that describes the distribution among the various protein-bound states of a linear lattice as a function of lattice and protein concentrations. However, this method is difficult to apply to a very long linear lattice (i.e., M >> n) because the number of possible lattice configurations may become too large and potentially cause an overflow problem in computation.
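The numerical procedure described above can be sketched as follows, assuming the bisection method and the closed-form degeneracy term derived earlier. All function names and the example concentrations are hypothetical. Bisection is applicable here because the binding density increases monotonically with [P], so the mass balance has a single root between 0 and [P]tot.

```python
from math import comb


def degeneracy(M, n, i, j):
    """P_M(i, j) in binomial form; the free lattice counts as one configuration."""
    if i == 0:
        return 1 if j == 0 else 0
    if j < 0 or j > i - 1 or M - n * i < 0:
        return 0
    return comb(M - n * i + 1, i - j) * comb(i - 1, j)


def lattice_averages(M, n, K, omega, P):
    """Return (Z, nu, j_bar) at free protein concentration P."""
    Z, sum_i, sum_j = 1.0, 0.0, 0.0    # start from the free lattice (weight 1)
    for i in range(1, M // n + 1):
        for j in range(i):
            w = degeneracy(M, n, i, j) * (K * P) ** i * omega ** j
            Z += w
            sum_i += i * w
            sum_j += j * w
    return Z, sum_i / Z, sum_j / Z


def solve_free_protein(P_tot, L_tot, M, n, K, omega, iterations=200):
    """Solve P + nu(P) * L_tot = P_tot for the free protein concentration P."""
    lo, hi = 0.0, P_tot
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        nu = lattice_averages(M, n, K, omega, mid)[1]
        if mid + nu * L_tot > P_tot:   # too much protein accounted for
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    # Arbitrary illustrative values (concentrations in M, K in M^-1).
    P_tot, L_tot = 5e-6, 1e-6
    P = solve_free_protein(P_tot, L_tot, M=9, n=3, K=1e6, omega=10.0)
    Z, nu, j_bar = lattice_averages(9, 3, 1e6, 10.0, P)
    print(P, nu, j_bar)
```

For long lattices the floating-point weights in this direct summation can overflow, which is the practical limitation noted above.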

QUANTITATIVE FRAMEWORK FOR LINEAR LATTICE-PROTEIN INTERACTIONS: CONDITIONAL PROBABILITY MODEL

Several quantitative frameworks have been proposed to treat an “infinitely” long linear lattice (M >> n), which is particularly relevant for proteins binding nonspecifically to the chromosomal DNA. Among these frameworks, we review the conditional probability model originally presented in the seminal work by McGhee and von Hippel (1974). In this model, conditional probabilities are formulated for the particular states (free or bound) of two consecutive motifs on a linear lattice. For instance, the conditional probability ff (or fb_1) is defined as the probability that, given a randomly chosen free motif, the adjacent motif on its right-hand side is free (or bound by the left end of a protein) (Supplementary Fig. S1). In addition, the conditional probability b_n f (or b_n b_1) is defined as the probability that, given a motif bound by the right end of a protein, the subsequent motif is free (or bound by the left end of a protein) (Supplementary Fig. S1). The conditional probabilities were then used to derive an expression for the average number of free binding sites per lattice at a given binding density. This elegant approach yielded a modified form of the Scatchard equation:

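$$\frac{\nu}{[P]} = K \cdot (\text{average number of free binding sites per lattice}) \tag{10a}$$

$$\frac{\theta}{[P]} = K(1-n\theta)\left[\frac{(2\omega-1)(1-n\theta)+\theta-R}{2(\omega-1)(1-n\theta)}\right]^{n-1}\left[\frac{1-(n+1)\theta+R}{2(1-n\theta)}\right]^{2} \tag{10b}$$

$$R = \sqrt{\left[1-(n+1)\theta\right]^{2}+4\omega\theta(1-n\theta)} \tag{10c}$$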

where θ corresponds to the average number of proteins bound per motif (i.e., θ = ν/M).

The detailed mathematical procedures of the derivation are given in the Supplementary Information; here we focus on a few intuitive limiting cases that lead to interpretations of this equation consistent with the molecular features of the linear lattice system (McGhee and von Hippel, 1974).

1) In the case of ω = 1 (noncooperative binding), by using L’Hospital’s rule, the equation can be reduced to the following (see Supplementary Information for detailed mathematical procedures):

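$$\lim_{\omega \to 1} K(1-n\theta)\left[\frac{(2\omega-1)(1-n\theta)+\theta-R}{2(\omega-1)(1-n\theta)}\right]^{n-1}\left[\frac{1-(n+1)\theta+R}{2(1-n\theta)}\right]^{2} = K(1-n\theta)\left[\frac{1-n\theta}{1-(n-1)\theta}\right]^{n-1} \tag{11}$$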

Note that, for n = 1 (no overlap between potential binding sites), the equation further reduces to the original Scatchard equation θ/[P] = K(1 − θ), in which the term (1 − θ) simply represents the fraction of free motifs. Because the square-bracketed term in Eq. 11 is always less than unity for n ≥ 2 and θ > 0, the fraction of free motifs competent for binding is smaller than the total fraction of free motifs, 1 − nθ. Therefore, this result quantitatively supports that, even without genuine interactions among bound proteins (i.e., ω = 1), apparent negative cooperativity arises from the overlap among potential binding sites and the consequent futile gaps shorter than n motifs.

2) In the case of ω = 0 (infinite negative-cooperativity), Eq. 10b reduces to the following expression:

$$\frac{\theta}{[P]} = K\left[1-(n+1)\theta\right]\left[\frac{1-(n+1)\theta}{1-n\theta}\right]^{n} \tag{12}$$

This reduced form corresponds to Eq. 11 with n replaced by n + 1. The increased apparent binding site size demonstrates that, if the interaction between bound proteins is extremely unfavorable, no contact point forms between adjacently bound proteins. Instead, they are separated by at least one persistent free motif. This result demonstrates the fundamental relationship between binding site size and cooperativity.

3) Further insight can be provided at the molecular level from the partial derivatives of Eqs. 10b and 11 with respect to θ at the limiting condition of θ → 0 (see Supplementary Information for detailed mathematical procedures):

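$$\left.\frac{\partial(\theta/[P])}{\partial\theta}\right|_{\theta=0} = \left.\frac{\partial(\nu/[P])}{\partial\nu}\right|_{\nu=0} = K(2\omega - 2n - 1) \tag{13}$$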

Based on Eq. 10a, the partial derivative can be interpreted as a net change in the average numbers of all three types (Fig. 3A) of binding sites, weighted by their corresponding binding constants, upon binding of one protein to a naked (ν = 0) lattice. As illustrated in Fig. 2B, the binding of a protein to a sufficiently long region eliminates a total of 2n – 1 potential binding sites. In addition, the binding converts the two adjacent isolated binding sites into two singly contiguous binding sites (2∙Kω). Hence, a total of (2n – 1) + 2 isolated binding sites has been eliminated (– (2n + 1)∙K). Likewise, the partial derivative of Eq. 11 at θ → 0 is given by:

$$\left.\frac{\partial(\theta/[P])}{\partial\theta}\right|_{\theta=0} = -K(2n-1) \tag{14}$$

Therefore, in the noncooperative case, the binding of one protein to a naked lattice simply eliminates 2n − 1 potential binding sites.
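The three limiting cases above can be verified numerically. The sketch below is an illustration under stated assumptions rather than the authors' code: `mvh_ratio` and `initial_slope` are hypothetical names, the exact case ω = 1 is handled with the noncooperative closed form because the general expression becomes 0/0 there, and the initial slope is estimated by a simple forward difference.

```python
from math import sqrt


def mvh_ratio(theta, K, n, omega):
    """Return theta/[P] at binding density theta (proteins per motif, theta < 1/n)."""
    free = 1.0 - n * theta                       # fraction of free motifs
    if omega == 1.0:
        # Noncooperative closed form; the general bracket is 0/0 at omega = 1.
        return K * free * (free / (1.0 - (n - 1) * theta)) ** (n - 1)
    R = sqrt((1.0 - (n + 1) * theta) ** 2 + 4.0 * omega * theta * free)
    first = ((2.0 * omega - 1.0) * free + theta - R) / (2.0 * (omega - 1.0) * free)
    second = (1.0 - (n + 1) * theta + R) / (2.0 * free)
    return K * free * first ** (n - 1) * second ** 2


def initial_slope(K, n, omega, h=1e-6):
    """Forward-difference estimate of d(theta/[P])/d(theta) at theta = 0."""
    return (mvh_ratio(h, K, n, omega) - mvh_ratio(0.0, K, n, omega)) / h


if __name__ == "__main__":
    K, n, theta = 2.0, 3, 0.1                    # arbitrary illustrative values
    # Case 1: omega -> 1 approaches the noncooperative closed form.
    print(mvh_ratio(theta, K, n, 1.0 + 1e-7), mvh_ratio(theta, K, n, 1.0))
    # Case 2: omega = 0 behaves like noncooperative binding with site size n + 1.
    print(mvh_ratio(theta, K, n, 0.0), mvh_ratio(theta, K, n + 1, 1.0))
    # Case 3: initial slope compared with K(2*omega - 2n - 1).
    for omega in (1.0, 5.0):
        print(initial_slope(K, n, omega), K * (2 * omega - 2 * n - 1))
```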

Taken together, although the conditional probability method is based on a conceptual framework different from that of the combinatorial approach, the final formulation provides intuitive interpretations fully consistent with the molecular features of the linear lattice systems. In practice, Eq. 10b is rearranged and incorporated into a mass balance equation relating the binding parameters to the total concentrations of lattice motif and protein ([M]tot and [P]tot):

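$$[P] = \frac{\theta}{K(1-n\theta)\,(ff)^{n-1}\,C^{2}} \tag{15a}$$

$$ff = \frac{(2\omega-1)(1-n\theta)+\theta-R}{2(\omega-1)(1-n\theta)} \tag{15b}$$

$$C = \frac{1-(n+1)\theta+R}{2(1-n\theta)} \tag{15c}$$

$$[P]_{tot} = [P] + \theta\,[M]_{tot} \tag{15d}$$

$$[P]_{tot} = \frac{\theta}{K(1-n\theta)\,(ff)^{n-1}\,C^{2}} + \theta\,[M]_{tot} \tag{15e}$$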

Eq. 15e can be solved numerically for θ at given values of [M]tot and [P]tot. When interactions of proteins with short linear lattices (e.g., DNA oligomers) are analyzed, the equation can be partially corrected for the assumption of infinite lattice length by applying an “end effect” factor, (M − n + 1)/M, to the term (ff)^(n−1) (Tsodikov et al., 2001).
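One way to carry out this numerical solution is sketched below, assuming bisection on θ over the interval between 0 and the smaller of 1/n and [P]tot/[M]tot. The function names and example values are hypothetical, and the end-effect correction for short lattices is not included. The sketch relies on the free protein concentration increasing monotonically with θ, so that the mass balance has a single root in this interval.

```python
from math import sqrt


def mvh_ratio(theta, K, n, omega):
    """theta/[P] from the conditional probability model; 0 if the lattice is full."""
    free = 1.0 - n * theta
    if free <= 0.0:
        return 0.0
    if omega == 1.0:
        return K * free * (free / (1.0 - (n - 1) * theta)) ** (n - 1)
    R = sqrt((1.0 - (n + 1) * theta) ** 2 + 4.0 * omega * theta * free)
    ff = ((2.0 * omega - 1.0) * free + theta - R) / (2.0 * (omega - 1.0) * free)
    C = (1.0 - (n + 1) * theta + R) / (2.0 * free)
    return K * free * ff ** (n - 1) * C ** 2


def free_protein(theta, K, n, omega):
    """Free protein concentration [P] corresponding to binding density theta."""
    ratio = mvh_ratio(theta, K, n, omega)
    return theta / ratio if ratio > 0.0 else float("inf")


def solve_theta(P_tot, M_tot, K, n, omega, iterations=200):
    """Solve [P](theta) + theta * M_tot = P_tot for theta by bisection."""
    lo, hi = 0.0, min(1.0 / n, P_tot / M_tot)
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if free_protein(mid, K, n, omega) + mid * M_tot > P_tot:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    # Arbitrary illustrative values (concentrations in M, K in M^-1).
    for omega in (0.5, 1.0, 50.0):
        theta = solve_theta(P_tot=1e-6, M_tot=1e-3, K=1e5, n=10, omega=omega)
        print(omega, theta, free_protein(theta, 1e5, 10, omega))
```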

APPLICATION AND EXTENSION OF THE QUANTITATIVE MODELS

Competition among multiple binding modes in protein-nucleic acid interactions

Spatiotemporal regulation of transcription is achieved by interactions between TFs and their specific binding sites on DNA. Because of the enormous number of nonspecific sites on the chromosomal DNA, binding of TFs to these regions must be taken into account to accurately predict the occupancy of the specific sites and thereby the transcription profiles of the corresponding genes (Brewster et al., 2014; von Hippel et al., 1974). In order to recapitulate the essential features of the competition between specific and nonspecific DNA binding, the conditional probability model was extended and applied to a hypothetical two-component system (TF and infinitely long DNA with a few embedded specific sites). While the 1:1 interaction between TF and a specific site is fully described by the binding constant Ksp, the nonspecific binding is characterized by the binding site size n (in base-pairs), the binding constant Kns, and the cooperativity parameter ω. Then, combining Eq. 10b with the mass-action law for the 1:1 specific binding, the concentrations of the free, specifically bound, and nonspecifically bound forms of TF ([TF], [TF]sp,b, and [TF]ns,b) can be derived as the following equations:

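$$[TF] = \frac{\theta}{K_{ns}(1-n\theta)\,(ff)^{n-1}\,C^{2}} \tag{16a}$$

$$[TF]_{sp,b} = \frac{K_{sp}\,[D_{sp}]_{tot}\,[TF]}{1+K_{sp}[TF]} \tag{16b}$$

$$[TF]_{ns,b} = \theta\,[M]_{tot} \tag{16c}$$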

where [Dsp]tot and [M]tot are the total concentrations of the specific site and the nonspecific binding motif (base-pair), respectively. Substituting Eq. 16a for [TF] in Eq. 16b, the mass balance equation for the total TF concentration ([TF]tot = [TF] + [TF]sp,b + [TF]ns,b) can be numerically solved for θ. The final outcome of the calculation is the fractional occupancy of the specific site (Ysp = [TF]sp,b/[Dsp]tot) as a function of total concentration ratio between TF and the specific site ([TF]tot/[Dsp]tot ranging from 0 to 10) (upper panels in Figs. 4A and 4B). In the calculation, the ratio Ksp/Kns (termed specificity ratio) (Fig. 4A) or the total nonspecific motif concentrations (Fig. 4B) was varied over orders of magnitude while the nonspecific binding site size and cooperativity were fixed at the constant values for simplicity (n = 10, ω = 1).
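The calculation described above can be sketched as follows. This is an illustrative reimplementation under stated assumptions, not the code used to generate Fig. 4: the function names are hypothetical, the root is found by bisection on θ, and the total specific-site concentration of 1 nM is an assumed value chosen only for the example (the values of Ksp, Kns, [M]tot, n, and ω follow those quoted for Fig. 4).

```python
from math import sqrt


def mvh_ratio(theta, K, n, omega):
    """theta/[TF] for nonspecific binding; 0 if the lattice is full."""
    free = 1.0 - n * theta
    if free <= 0.0:
        return 0.0
    if omega == 1.0:
        return K * free * (free / (1.0 - (n - 1) * theta)) ** (n - 1)
    R = sqrt((1.0 - (n + 1) * theta) ** 2 + 4.0 * omega * theta * free)
    ff = ((2.0 * omega - 1.0) * free + theta - R) / (2.0 * (omega - 1.0) * free)
    C = (1.0 - (n + 1) * theta + R) / (2.0 * free)
    return K * free * ff ** (n - 1) * C ** 2


def tf_distribution(TF_tot, D_tot, M_tot, K_sp, K_ns, n=10, omega=1.0, iterations=200):
    """Return ([TF], [TF]sp,b, [TF]ns,b, Y_sp) satisfying the TF mass balance."""

    def parts(theta):
        ratio = mvh_ratio(theta, K_ns, n, omega)
        tf_free = theta / ratio if ratio > 0.0 else float("inf")
        x = K_sp * tf_free
        y_sp = 1.0 if x == float("inf") else x / (1.0 + x)   # specific-site occupancy
        return tf_free, y_sp * D_tot, theta * M_tot, y_sp

    lo, hi = 0.0, min(1.0 / n, TF_tot / M_tot)
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        tf_free, tf_sp, tf_ns, _ = parts(mid)
        if tf_free + tf_sp + tf_ns > TF_tot:
            hi = mid
        else:
            lo = mid
    return parts(0.5 * (lo + hi))


if __name__ == "__main__":
    D_tot = 1e-9                       # assumed specific-site concentration (M)
    for ratio in (0.5, 1.0, 2.0, 5.0, 10.0):
        result = tf_distribution(ratio * D_tot, D_tot, M_tot=5e-3, K_sp=1e12, K_ns=1e5)
        print(ratio, result[3])        # [TF]tot/[Dsp]tot and Y_sp
```

Repeating the loop with a different `K_ns` or `M_tot` gives a qualitative picture of how the specificity ratio and the abundance of nonspecific motifs shift the occupancy curve.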

At a given specificity ratio and a total motif concentration, as the concentration ratio [TF]tot/[Dsp]tot is increased, the fractional occupancy of the specific site by TF monotonically increases with an apparent hyperbolic feature (upper panels in Figs. 4A and 4B). However, the underlying distribution of TF exhibits a dynamic shift from specifically to nonspecifically bound states (bottom panels in Figs. 4A and 4B). For higher specificity ratio or lower nonspecific motif concentrations, the specific complex is predominant in the regime [TF]tot/[Dsp]tot < 1, leading to a steep rise in occupancy of the specific site. Consequently, the transition to the nonspecifically bound state is achieved at higher concentration ratio. Therefore, under these conditions, a relatively small amount of TF is required to saturate the specific site and thereby fully activate transcription. Conversely, for lower specificity ratio or higher nonspecific motif concentrations, the nonspecific binding significantly competes with the specific binding even at low [TF]tot/[Dsp]tot (bottom panels in Figs. 4A and 4B), attenuating saturation of the specific site (upper panels in Figs. 4A and 4B). These simulations suggest that, since protein-DNA interactions are generally sensitive to many cellular conditions such as salt concentration and osmotic stress, changes in these variables potentially fine-tune the specificity ratio of TFs and thereby the corresponding transcription levels. Furthermore, a change in chromosome packing may indirectly affect the TF-specific site interaction by altering the nonspecific site concentrations. Taken together, nonspecific protein-DNA interactions, via change in either specificity ratio or abundance of nonspecific sites, can modulate the occupancies of specific TF binding sites and consequently reprogram the gene-specific transcriptional activities.

Competitions between specific and nonspecific binding or among multiple nonspecific binding modes have been observed in numerous in vitro protein-DNA interactions as well (Bujalowski et al., 1988; Rajendran et al., 1998). Even studies using short oligonucleotides have shown similar competitions due to significantly low specificity ratios (Holbrook et al., 2001; Koh et al., 2008). In order to accurately determine a specific binding constant, the linear lattice models must be applied or further advanced to tease apart the contributions from multiple binding modes to the observed binding signal (Tsodikov et al., 2001).

Competition among distinct target proteins for binding to an intrinsically disordered protein

IDPs often utilize short peptide motifs to recruit multiple distinct targets or multiple copies of an identical target (Cumberworth et al., 2013; Hong et al., 2020; Wright and Dyson, 2015). These IDPs are collectively termed hubs and are involved in signal transduction and macromolecular transport. A representative example is Nup153, a subunit of the NPC that contains a long C-terminal IDR (~600 amino acids in length) (Krull et al., 2004). The IDR presents multiple FG-motifs to interact with Kaps carrying macromolecular cargos into and out of the nucleus. Multiple hydrophobic pockets on the Kap surface are the primary binding sites for the FG-motifs (Bayliss et al., 2000).

A recent thermodynamic study has developed an advanced combinatorial model to demonstrate that the Nup153 IDR comprises a high-affinity 1:1 binding site and a series of low-affinity sites for binding of multiple Kaps (Fig. 4C) (Cho et al., 2021). Calorimetric data of various protein concentrations and IDR lengths were scrutinized to further show that the overlapping binding of Kaps to the low-affinity sites results in apparent negative cooperativity. Because the Nup153 IDR potentially interacts with nuclear proteins involved in transcription and chromatin organization (Kadota et al., 2020; Kasper et al., 1999), this study has constructed composite combinatorial models to test how the multivalent Kap binding would be affected by competitive binding of nuclear proteins (Fig. 4C). Remarkably, the simulation has revealed that the Kap occupancy of the low-affinity region can be fine-tuned by changing the location of the competitor binding site (Fig. 4C). This delicate modulation arises from the molecular feature of the overlapping binding: The number of potential Kap binding sites eliminated by the competition is determined by the position of the competitor binding site (Fig. 2B). Therefore, assuming that the Kap occupancy is a proxy for the transport activity of the NPC, it is conceivable that the Nup153 IDR functions as a molecular switch coupling specific nuclear processes to distinct transport states. For instance, a strong promoter may be coupled to the NPC activity in such a way that specific TFs or co-activators associated with the strong promoter target a location in the Nup153 IDR that considerably reduces the Kap occupancy (Fig. 4D). As a consequence of the reduced general transport activity mediated by Kaps, a large amount of mRNA transcribed from the strong promoter may be efficiently exported through the NPC (Fig. 4D). 
Although awaiting experimental validation, the coupling mechanism built upon multivalent, overlapping IDP-target interactions may contribute to the functional versatility of the IDP hubs in dynamic cellular processes. This exemplary study demonstrates that the original combinatorial model can be readily expanded by simple mathematical operations to account for additional complexities in linear lattice-protein interactions including heterogeneous binding sites.

CONCLUSION

Linear lattice systems and their multivalent interactions with target proteins often regulate dynamic cellular processes. Because of the overlapping target binding sites on a linear lattice, quantitative understanding of such interactions requires a fundamentally different framework as compared to simple 1:1 binding or discrete-site systems. In this review, we discussed the two prevalent approaches in unraveling the linear lattice systems, namely combinatorial and conditional probability models. Constructing the lattice partition functions from the combinatorial approach is straightforward and readily expandable in data analysis and predictions as illustrated in the Nup153 IDR–Kap interaction. On the other hand, the conditional probability model provides invaluable physical insights consistent with the molecular features of the multivalent linear lattice–target interactions. Furthermore, this method is suitable in simulating in vivo nucleic acid systems of apparent infinite lattice length. These frameworks may serve as a cornerstone to develop sophisticated models to analyze more complex cellular processes including competition among multiple DNA binding proteins on nucleosomal DNA (Segal and Widom, 2009) as well as formation of phase-separated condensates involving multiple components (Lyon et al., 2021).

ACKNOWLEDGMENTS

This work was supported by Samsung Science & Technology Foundation and Research (SSTF-BA1802-09) and the National Research Foundation (2019R1C1C1011640).

AUTHOR CONTRIBUTIONS

J.C. and J.K. analyzed the data. J.C., R.K., and J.K. wrote the manuscript.

CONFLICT OF INTEREST

The authors have no potential conflicts of interest to disclose.

Figure 1. Schematic illustration of the representative linear lattice systems in cellular processes. (A) Scaffold proteins recruiting diverse binding partners in signal transduction. (B) IDRs in the NPC binding multiple Kaps in nucleocytoplasmic transport. (C) Nonspecific sites on the chromosomal DNA for transcription factor binding.

Figure 2. Molecular features of multivalent interactions on a linear lattice (M = 9) where a protein occupies any n (= 3) consecutive motifs. (A) The number of potential overlapping binding sites on a naked lattice (left panel) is greater than that in a discrete-site system (right panel) with the same stoichiometry (N). (B) The number of potential binding sites eliminated upon binding of a protein depends on where the protein binds on the lattice (left panel). In contrast, binding of a protein to the discrete-site system invariably eliminates only one potential binding site (right panel). (C) Possible configurations of the linear lattice with two proteins bound (left panel). Many configurations are futile for the last protein binding, resulting in apparent negative cooperativity among bound proteins. In contrast, all corresponding configurations in the discrete-site system are competent for binding (right panel).

Figure 3. Calculation of the number of distinct lattice configurations with i proteins bound and j contact points. (A) Three distinct types of protein binding sites on a linear lattice and the definitions of K and ω. (B) Dissection of a linear lattice into two distinct physical elements, runs and unattached free motifs. The i − j − 1 leftmost runs are attached at their right-hand end with a free motif (termed attached free motif). (C) Creation of the distinct lattice configurations by combining the two elements.

Figure 4. Application and extension of the quantitative models for linear lattice systems. (A and B) Effects of nonspecific protein-DNA interactions on transcription. Upper panels: Using an extended conditional probability model (Eq. 16), the fractional occupancy of specific DNA sites (Ysp = [TF]sp,b/[Dsp]tot) for binding of a hypothetical TF was calculated as a function of molar ratio [TF]tot/[Dsp]tot for various sets of interaction parameters. Bottom panels: The corresponding fractional distribution of TF between specifically (solid curves) and nonspecifically (dashed curves) bound states was calculated. In these calculations, the value of Kns (A) or the concentration of nonspecific motifs ([M]tot) (B) was varied with the fixed values of Ksp = 1 × 10¹² M⁻¹, n = 10 bp, and ω = 1 ([M]tot = 5 mM in (A); Kns = 1 × 10⁵ M⁻¹ in (B)). (C) Quantitative model for assembly of the Nup153 IDR hub with multiple interaction partners and competitors (adapted from Cho et al., 2021). The Nup153 IDR presents a high-affinity 1:1 Kap binding site (purple) and a series of low-affinity sites for overlapping binding of multiple Kaps. Kap occupies multiple dipeptide (FG) motifs (pink vertical bars). Using advanced combinatorial models, fine-tuning of the Kap occupancy of the Nup153 IDR was predicted as a function of the location of the competitor binding site. In the partition function Z, Z0 corresponds to the partition function of the Nup153 IDR in the absence of competition; Kc[C] represents the competitor binding; the terms in the brackets are the partition functions for two subregions of the low-affinity sites separated by the competitor binding; (1 + Ks[P]) represents the 1:1 interaction of Kap with the high-affinity site. (D) On the basis of the multivalent, overlapping IDR-Kap interaction, the Nup153 IDR is proposed to function as a molecular switch to couple nucleocytoplasmic transport to transcription.

References

  1. Banani S.F., Lee H.O., Hyman A.A., and Rosen M.K. (2017). Biomolecular condensates: organizers of cellular biochemistry. Nat. Rev. Mol. Cell Biol. 18, 285-298.
  2. Bayliss R., Littlewood T., and Stewart M. (2000). Structural basis for the interaction between FxFG nucleoporin repeats and importin-beta in nuclear trafficking. Cell 102, 99-108.
  3. Berg O.G., Winter R.B., and von Hippel P.H. (1981). Diffusion-driven mechanisms of protein translocation on nucleic acids. 1. Models and theory. Biochemistry 20, 6929-6948.
  4. Bintu L., Buchler N.E., Garcia H.G., Gerland U., Hwa T., Kondev J., Kuhlman T., and Phillips R. (2005). Transcriptional regulation by the numbers: applications. Curr. Opin. Genet. Dev. 15, 125-135.
  5. Brewster R.C., Weinert F.M., Garcia H.G., Song D., Rydenfelt M., and Phillips R. (2014). The transcription factor titration effect dictates level of gene expression. Cell 156, 1312-1323.
  6. Bujalowski W. (2006). Thermodynamic and kinetic methods of analyses of protein-nucleic acid interactions. From simpler to more complex systems. Chem. Rev. 106, 556-606.
  7. Bujalowski W., Lohman T.M., and Anderson C.F. (1989). On the cooperative binding of large ligands to a one-dimensional homogeneous lattice: the generalized three-state lattice model. Biopolymers 28, 1637-1643.
  8. Bujalowski W., Overman L.B., and Lohman T.M. (1988). Binding mode transitions of Escherichia coli single strand binding protein-single-stranded DNA complexes. Cation, anion, pH, and binding density effects. J. Biol. Chem. 263, 4629-4640.
  9. Cho B., Choi J., Kim R., Yun J.N., Choi Y., Lee H.H., and Koh J. (2021). Thermodynamic models for assembly of intrinsically disordered protein hubs with multiple interaction partners. J. Am. Chem. Soc. 143, 12509-12523.
  10. Choi K.Y., Satterberg B., Lyons D.M., and Elion E.A. (1994). Ste5 tethers multiple protein kinases in the MAP kinase cascade required for mating in S. cerevisiae. Cell 78, 499-512.
  11. Cortese M.S., Uversky V.N., and Dunker A.K. (2008). Intrinsic disorder in scaffold proteins: getting more from less. Prog. Biophys. Mol. Biol. 98, 85-106.
  12. Cumberworth A., Lamour G., Babu M.M., and Gsponer J. (2013). Promiscuity as a functional trait: intrinsically disordered regions as central players of interactomes. Biochem. J. 454, 361-369.
  13. Dragan A.I., Read C.M., Makeyeva E.N., Milgotina E.I., Churchill M.E., Crane-Robinson C., and Privalov P.L. (2004). DNA binding and bending by HMG boxes: energetic determinants of specificity. J. Mol. Biol. 343, 371-393.
  14. Dunker A.K., Cortese M.S., Romero P., Iakoucheva L.M., and Uversky V.N. (2005). Flexible nets. The roles of intrinsic disorder in protein interaction networks. FEBS J. 272, 5129-5148.
  15. Epstein I.R. (1978). Cooperative and non-cooperative binding of large ligands to a finite one-dimensional lattice. A model for ligand-oligonucleotide interactions. Biophys. Chem. 8, 327-339.
  16. Freire E., Schon A., and Velazquez-Campoy A. (2009). Isothermal titration calorimetry: general formalism using binding polynomials. Methods Enzymol. 455, 127-155.
  17. Frey S. and Gorlich D. (2007). A saturated FG-repeat hydrogel can reproduce the permeability properties of nuclear pore complexes. Cell 130, 512-523.
  18. Fung H.Y.J., Birol M., and Rhoades E. (2018). IDPs in macromolecular complexes: the roles of multivalent interactions in diverse assemblies. Curr. Opin. Struct. Biol. 49, 36-43.
  19. Ha T., Kaiser C., Myong S., Wu B., and Xiao J. (2022). Next generation single-molecule techniques: imaging, labeling, and manipulation in vitro and in cellulo. Mol. Cell 82, 304-314.
  20. Hamming R.W. .
  21. Holbrook J.A., Tsodikov O.V., Saecker R.M., and Record M.T. Jr. (2001). Specific and non-specific interactions of integration host factor with DNA: thermodynamic evidence for disruption of multiple IHF surface salt-bridges coupled to DNA binding. J. Mol. Biol. 310, 379-401.
  22. Hong S., Choi S., Kim R., and Koh J. (2020). Mechanisms of macromolecular interactions mediated by protein intrinsic disorder. Mol. Cells 43, 899-908.
  23. Jen-Jacobson L., Engler L.E., and Jacobson L.A. (2000). Structural and thermodynamic strategies for site-specific DNA binding proteins. Structure 8, 1015-1023.
  24. Kadota S., Ou J., Shi Y., Lee J.T., Sun J., and Yildirim E. (2020). Nucleoporin 153 links nuclear pore complex to chromatin architecture by mediating CTCF and cohesin binding. Nat. Commun. 11, 2606.
  25. Kao-Huang Y., Revzin A., Butler A.P., O'Conner P., Noble D.W., and von Hippel P.H. (1977). Nonspecific DNA binding of genome-regulating proteins as a biological control mechanism: measurement of DNA-bound Escherichia coli lac repressor in vivo. Proc. Natl. Acad. Sci. U. S. A. 74, 4228-4232.
  26. Kasper L.H., Brindle P.K., Schnabel C.A., Pritchard C.E., Cleary M.L., and van Deursen J.M. (1999). CREB binding protein interacts with nucleoporin-specific FG repeats that activate transcription and mediate NUP98-HOXA9 oncogenicity. Mol. Cell. Biol. 19, 764-776.
  27. Koh J. and Blobel G. (2015). Allosteric regulation in gating the central channel of the nuclear pore complex. Cell 161, 1361-1373.
  28. Koh J., Saecker R.M., and Record M.T. Jr. (2008). DNA binding mode transitions of Escherichia coli HU(alphabeta): evidence for formation of a bent DNA--protein complex on intact, linear duplex DNA. J. Mol. Biol. 383, 324-346.
  29. Krull S., Thyberg J., Bjorkroth B., Rackwitz H.R., and Cordes V.C. (2004). Nucleoporins as components of the nuclear pore complex core structure and Tpr as the architectural element of the nuclear basket. Mol. Biol. Cell 15, 4261-4277.
  30. Lifson S. (1964). Partition functions of linear-chain molecules. J. Chem. Phys. 40, 3705-3710.
  31. Lohman T.M., deHaseth P.L., and Record M.T. Jr. (1980). Pentalysine-deoxyribonucleic acid interactions: a model for the general effects of ion concentrations on the interactions of proteins with nucleic acids. Biochemistry 19, 3522-3530.
  32. Lyon A.S., Peeples W.B., and Rosen M.K. (2021). A framework for understanding the functions of biomolecular condensates across scales. Nat. Rev. Mol. Cell Biol. 22, 215-235.
  33. Mahamid J., Pfeffer S., Schaffer M., Villa E., Danev R., Cuellar L.K., Forster F., Hyman A.A., Plitzko J.M., and Baumeister W. (2016). Visualizing the molecular sociology at the HeLa cell nuclear periphery. Science 351, 969-972.
  34. Mark W.Y., Liao J.C., Lu Y., Ayed A., Laister R., Szymczyna B., Chakrabartty A., and Arrowsmith C.H. (2005). Characterization of segments from the central region of BRCA1: an intrinsically disordered scaffold for multiple protein-protein and protein-DNA interactions? J. Mol. Biol. 345, 275-287.
  35. McGhee J.D. and von Hippel P.H. (1974). Theoretical aspects of DNA-protein interactions: co-operative and non-co-operative binding of large ligands to a one-dimensional homogeneous lattice. J. Mol. Biol. 86, 469-489.
  36. Noutsou M., Duarte A.M., Anvarian Z., Didenko T., Minde D.P., Kuper I., de Ridder I., Oikonomou C., Friedler A., and Boelens R., et al. (2011). Critical scaffolding regions of the tumor suppressor Axin1 are natively unfolded. J. Mol. Biol. 405, 773-786.
  37. Oikonomou C.M. and Jensen G.J. (2017). Cellular electron cryotomography: toward structural biology in situ. Annu. Rev. Biochem. 86, 873-896.
  38. Privalov P.L., Dragan A.I., and Crane-Robinson C. (2011). Interpreting protein/DNA interactions: distinguishing specific from non-specific and electrostatic from non-electrostatic components. Nucleic Acids Res. 39, 2483-2491.
  39. Radu A., Moore M.S., and Blobel G. (1995). The peptide repeat domain of nucleoporin Nup98 functions as a docking site in transport across the nuclear pore complex. Cell 81, 215-222.
  40. Rajendran S., Jezewska M.J., and Bujalowski W. (1998). Human DNA polymerase beta recognizes single-stranded DNA using two different binding modes. J. Biol. Chem. 273, 31021-31031.
  41. Record M.T. Jr., Lohman M.L., and De Haseth P. (1976). Ion effects on ligand-nucleic acid interactions. J. Mol. Biol. 107, 145-158.
  42. Schellman J.A. (1974). Cooperative multisite binding to DNA. Isr. J. Chem. 12, 219-238.
  43. Schoch R.L., Kapinos L.E., and Lim R.Y. (2012). Nuclear transport receptor binding avidity triggers a self-healing collapse transition in FG-nucleoporin molecular brushes. Proc. Natl. Acad. Sci. U. S. A. 109, 16911-16916.
  44. Segal E. and Widom J. (2009). From DNA sequence to transcriptional behaviour: a quantitative approach. Nat. Rev. Genet. 10, 443-456.
  45. Shin Y. and Brangwynne C.P. (2017). Liquid phase condensation in cell physiology and disease. Science 357, eaaf4382.
  46. Shrader T.E. and Crothers D.M. (1989). Artificial nucleosome positioning sequences. Proc. Natl. Acad. Sci. U. S. A. 86, 7418-7422.
  47. Sigal Y.M., Zhou R., and Zhuang X. (2018). Visualizing and discovering cellular structures with super-resolution microscopy. Science 361, 880-887.
  48. Stracy M., Schweizer J., Sherratt D.J., Kapanidis A.N., Uphoff S., and Lesterlin C. (2021). Transient non-specific DNA binding dominates the target search of bacterial DNA-binding proteins. Mol. Cell 81, 1499-1514.e6.
  49. Teif V.B. (2007). General transfer matrix formalism to calculate DNA-protein-drug binding in gene regulation: application to OR operator of phage lambda. Nucleic Acids Res. 35, e80.
  50. Tsodikov O.V., Holbrook J.A., Shkel I.A., and Record M.T. Jr. (2001). Analytic binding isotherms describing competitive interactions of a protein ligand with specific and nonspecific sites on the same DNA oligomer. Biophys. J. 81, 1960-1969.
  51. von Hippel P.H., Revzin A., Gross C.A., and Wang A.C. (1974). Non-specific DNA binding of genome regulating proteins as a biological control mechanism: I. The lac operon: equilibrium aspects. Proc. Natl. Acad. Sci. U. S. A. 71, 4808-4812.
  52. Widom J. (1999). Equilibrium and dynamic nucleosome stability. Methods Mol. Biol. 119, 61-77.
  53. Wodarz A. and Nusse R. (1998). Mechanisms of Wnt signaling in development. Annu. Rev. Cell Dev. Biol. 14, 59-88.
  54. Wright P.E. and Dyson H.J. (2015). Intrinsically disordered proteins in cellular signalling and regulation. Nat. Rev. Mol. Cell Biol. 16, 18-29.
  55. Wyman J. and Gill S.J. (1990). Binding and Linkage: Functional Chemistry of Biological Macromolecules (Mill Valley, CA: University Science Books).
  56. Xue B., Romero P.R., Noutsou M., Maurice M.M., Rudiger S.G., William A.M. Jr., Mizianty M.J., Kurgan L., Uversky V.N., and Dunker A.K. (2013). Stochastic machines as a colocalization mechanism for scaffold protein function. FEBS Lett. 587, 1587-1591.