Mol. Cells 2023; 46(1): 21-32
Published online January 4, 2023
https://doi.org/10.14348/molcells.2023.2157
© The Korean Society for Molecular and Cellular Biology
Correspondence to: baek@snu.ac.kr
This is an open-access article distributed under the terms of the Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc-sa/3.0/.
MicroRNAs (miRNAs) play cardinal roles in regulating biological pathways and processes, resulting in significant physiological effects. To understand the complex regulatory network of miRNAs, previous studies have utilized massive-scale datasets of miRNA targeting and attempted to computationally predict the functional targets of miRNAs. Many miRNA target prediction tools have been developed and are widely used by scientists from various fields of biology and medicine. Most of these tools consider seed pairing between miRNAs and their mRNA targets and additionally consider other determinants to improve prediction accuracy. However, these tools exhibit limited prediction accuracy and high false positive rates. The utilization of additional determinants, such as RNA modifications and RNA-binding protein binding sites, may further improve miRNA target prediction. In this review, we discuss the determinants of functional miRNA targeting that are currently used in miRNA target prediction and the potentially predictive but unappreciated determinants that may improve prediction accuracy.
Keywords: bioinformatics, microRNA, microRNA target prediction, microRNA targeting, microRNA targeting determinants
MicroRNAs (miRNAs) regulate a broad range of biological processes and physiological pathways, including tumor suppression and progression (Gregory et al., 2008; Jang et al., 2020; O'Donnell et al., 2005; Peng and Croce, 2016), immune cell development and function (Fontana et al., 2007; Han et al., 2020b; Mehta and Baltimore, 2016; Muljo et al., 2005; O'Connell et al., 2010; Rodriguez et al., 2007; Thai et al., 2007), cardiovascular diseases (Care et al., 2007; Harris et al., 2008), neural development and function (Brennan and Henshall, 2020; Ghosh et al., 2014; Giraldez et al., 2005; Krichevsky et al., 2006; Tan et al., 2013), early embryonic development (Bernstein et al., 2003; Choi et al., 2007; Wienholds et al., 2005), and cytoskeletal dynamics (Fededa et al., 2016; Wu et al., 2014; Xin et al., 2009). Since miRNAs modulate gene expression by binding to their targets, it is crucial to systematically identify and evaluate all the functional targets of miRNAs to better understand complex miRNA regulatory networks (Bracken et al., 2016; Plaisier et al., 2012; Pu et al., 2019).
The generation of a comprehensive list of miRNA targets is challenging because miRNA targets are abundant and widespread across the transcriptome (Friedman et al., 2009; Lim et al., 2005; Selbach et al., 2008). The classical strategy for studying miRNA targeting involves the identification of potential targets using experimental methods, including microarray (Lim et al., 2005), RIP (RNA immunoprecipitation) (Keene et al., 2006), cross-linking immunoprecipitation (CLIP) (Chi et al., 2009), miRNA pull-down (Hassan et al., 2013; Orom and Lund, 2010), and luciferase reporter assays (Ghanbarian et al., 2022; Thomson et al., 2011; Tuschl et al., 1999). The generation of massive amounts of data through these experiments is often limited by a prohibitively large amount of labor, time, and cost. Alternatively, functional miRNA targeting can be identified using computational tools that are much faster and more accessible than conventional experimental methods.
Recently, many computational tools for miRNA targeting prediction have been developed (Nachtigall and Bovolenta, 2022). However, the predictive power of these tools requires further improvement. For instance, even with the state-of-the-art prediction tool, the predicted repressive effects show only a weak correlation (r² ≤ 0.2) with the experimental data (Agarwal et al., 2015; McGeary et al., 2019). Here, we review the determinants of functional miRNA targeting, both those currently used in computational miRNA target prediction tools and those that have not yet been used. The incorporation of previously unappreciated determinants may improve the performance of miRNA target prediction tools.
Mature miRNAs form a protein-RNA complex, termed the RNA-induced silencing complex (RISC), with the Argonaute (AGO) protein (Gregory et al., 2005; Hammond et al., 2000). The current RISC model suggests that four nucleotides (nucleotides 2-5 of guide RNA; g2-5) at the 5′ end of the miRNA are exposed before binding to the target (Schirle et al., 2014). The pairing between this region of a miRNA and its target mRNA induces a conformational change in AGO and exposes additional nucleotides (g6-8 and g13-16) that allow stable pairing between RISC and its target. The sequence of 6 or 7 nucleotides (g2-7 or g2-8) at the 5′ end of the miRNAs, called the seed region, primarily determines the target specificity of the miRNAs (Bartel, 2009). The nucleotides exposed at the 3′ ends of miRNAs may pair with the target and supplement seed pairing (Grimson et al., 2007).
The primary determinant of miRNA targeting efficacy is pairing in the seed region. Mismatches in the seed region reduce the potency of the target site. The miRNA target sites are classified into 'canonical' sites, which contain a perfect hexamer match with the seed (g2-7), and 'noncanonical' sites, which match imperfectly with the seed (Bartel, 2009) (Fig. 1). The canonical miRNA target sites are further divided into four types according to two criteria: a t8 match with the g8 nucleotide and the presence of adenine at position t1. The 8-mer sites with both a t8 match and a t1A match are the most effective, followed by the 7-mer m8, 7-mer A1, and 6-mer sites (Baek et al., 2008; Bartel, 2009; Lewis et al., 2005). Noncanonical sites generally confer weaker repressive effects than canonical sites, and a subset of the noncanonical sites are only effective in favorable contexts, hence the term 'context-dependent noncanonical site types (CDNSTs)' (Kim et al., 2016). Currently, offset 7-mer (g3-8 match), offset 6-mer (g3-7 match), 6-mer A1 (g2-6 match with t1A), and four other CDNSTs have been identified as functional noncanonical sites through large-scale bioinformatics analysis (Kim et al., 2016). Many widely used miRNA target prediction tools, such as TargetScan (Agarwal et al., 2015), do not fully account for noncanonical sites.
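The four canonical site types described above can be illustrated with a short sketch. The function name `canonical_sites` and the 0-based indexing are illustrative assumptions and do not reproduce the code of any published tool; both sequences are assumed to be written 5′ to 3′, and noncanonical sites are not handled.

```python
# Minimal sketch (not TargetScan code): label canonical seed-match sites in a 3'UTR.
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna):
    """Reverse complement of an RNA string (Watson-Crick pairs only)."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def canonical_sites(mirna, utr):
    """Return (index, site_type) for every g2-7 hexamer match in the UTR.

    index is the 0-based UTR position of the nucleotide paired with g7.
    """
    mirna = mirna.upper().replace("T", "U")
    utr = utr.upper().replace("T", "U")
    core = reverse_complement(mirna[1:7])  # target bases t7..t2, pairing with g7..g2
    t8 = COMPLEMENT[mirna[7]]              # target base that would pair with g8
    sites = []
    start = utr.find(core)
    while start != -1:
        end = start + 6
        has_m8 = start > 0 and utr[start - 1] == t8   # t8 lies 5' of the hexamer
        has_a1 = end < len(utr) and utr[end] == "A"   # t1 lies 3' of the hexamer
        if has_m8 and has_a1:
            kind = "8mer"
        elif has_m8:
            kind = "7mer-m8"
        elif has_a1:
            kind = "7mer-A1"
        else:
            kind = "6mer"
        sites.append((start, kind))
        start = utr.find(core, start + 1)
    return sites


# let-7a-5p against a toy UTR holding one 8-mer and one 6-mer site
LET7A = "UGAGGUAGUAGGUUGUAUAGUU"
print(canonical_sites(LET7A, "GGGCUACCUCAGGGUACCUCGG"))
# [(4, '8mer'), (14, '6mer')]
```

Scanning for the g2-7 hexamer first and then checking the two flanking positions mirrors the two criteria in the text (t8 match and t1A), so each hit receives exactly one of the four labels.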
Each miRNA target prediction tool utilizes a different set of determinants (Table 1). Earlier tools, including PITA (Kertesz et al., 2007), MicroTar (Thadani and Tammi, 2006), RNAhybrid (Kruger and Rehmsmeier, 2006), and PicTar (Krek et al., 2005) rely only on a few determinants, whereas more recent tools, such as TargetScan (Agarwal et al., 2015; McGeary et al., 2019), MIRZA-G (Gumienny and Zavolan, 2015), DIANA-microT-CDS (Reczko et al., 2012), and miRanda-mirSVR (Betel et al., 2010) utilize a larger number of determinants. Most of these tools use seed pairing, thermodynamic stability, evolutionary conservation, and the structural accessibility of target sites. Of the currently available miRNA targeting prediction tools, TargetScan is considered to be the most accurate tool that utilizes the largest number of determinants in the model. Here, we summarize the determinants of miRNA targeting used in these tools (Fig. 2).
Local AU content around the target site is one of the most prominent determinants of miRNA targeting. Effective and conserved target sites are associated with higher local AU content but not global AU content (Grimson et al., 2007; Nielsen et al., 2007). Although the mechanism by which local AU content affects miRNA targeting remains unclear, many target prediction tools, including TargetScan7, miRanda-mirSVR, and DIANA-microT-CDS, utilize this feature. Two hypotheses that are not mutually exclusive have been proposed to explain the correlation between stronger repression and higher local AU content. The first hypothesis is that RISC interacts with AREs (AU-rich elements), and the repressive efficacy of the nearby target is enhanced by this interaction (Jing et al., 2005; Nielsen et al., 2007; Vasudevan and Steitz, 2007). An alternative hypothesis is that high AU content inhibits the formation of secondary structures and increases the accessibility of the target, leading to stronger repression (Grimson et al., 2007; Nielsen et al., 2007).
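As a rough illustration of this determinant, local AU content can be computed as the fraction of A and U nucleotides flanking a site. The 30-nt flank and the unweighted average below are simplifying assumptions chosen for illustration; published tools may weight positions by their distance from the site.

```python
# Minimal sketch: unweighted AU fraction in the flanks of a target site.
def local_au_content(utr, site_start, site_end, flank=30):
    """Fraction of A/U among up to `flank` nt on each side of utr[site_start:site_end]."""
    utr = utr.upper().replace("T", "U")
    upstream = utr[max(0, site_start - flank):site_start]
    downstream = utr[site_end:site_end + flank]
    window = upstream + downstream
    if not window:
        return 0.0
    return sum(base in "AU" for base in window) / len(window)


# Site GGGGGG flanked by AAAA (upstream) and CCUU (downstream)
print(local_au_content("AAAAGGGGGGCCUU", 4, 10, flank=4))  # 0.75
```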
The structural accessibility of the target site, mainly determined by the secondary or tertiary structure of the 3′UTR, positively correlates with targeting efficacy (Kertesz et al., 2007). Higher site accessibility increases the chance of RISC binding to its target, leading to stronger repression. The structural accessibility of the target site can be measured as ΔG or as ΔΔG, the difference in ΔG between the folded and open 3′UTRs. A previous study demonstrated that ΔΔG had a stronger correlation with the repressive effect than ΔG of the folded 3′UTR (Kertesz et al., 2007). Another study argued that structural accessibility, measured by ΔG, lost its correlation with repression when local AU was controlled (Grimson et al., 2007). A more recent study showed that when structural accessibility was measured as the log unpaired probability using RNAplfold (Bernhart et al., 2006), the correlation was present even when other confounding factors, including local AU, were controlled (Agarwal et al., 2015).
The thermodynamic pairing stability between a miRNA and its target is a determinant widely used for miRNA targeting prediction. There are two approaches to calculating pairing stability: (1) using the whole miRNA sequence and (2) using only the seed region of the miRNA, which is termed seed pairing stability. Many miRNA target prediction tools, including DIANA-microT-CDS, MIRZA-G, PITA, PicTar, RNAhybrid, and MicroTar, rely heavily on the predicted thermodynamic pairing stability between the entire mature miRNA and target. However, TargetScan7 utilizes seed pairing stability as a feature (Agarwal et al., 2015). Seed pairing stability is negatively correlated with the abundance of the target site in the 3′UTR of mRNAs (Garcia et al., 2011). Targets with a higher abundance are associated with weaker repressive effects, which may be due to competition between miRNA targets (Arvey et al., 2010). Despite the correlation between target abundance and seed pairing stability, both determinants affect miRNA targeting efficacy independently and globally (Garcia et al., 2011).
Recently, RNA Bind-n-Seq (RBNS), a technique that measures the affinity between a protein and random RNA sequences by pull-down of RNA-binding protein (RBP) with bound RNA followed by high-throughput sequencing (Lambert et al., 2014), has been employed to measure the binding affinity between miRNA-loaded AGO and 12-nt RNA sequences (McGeary et al., 2019). The AGO-RBNS data have been used to develop a biochemical model of miRNA targeting and are included in TargetScan8 (McGeary et al., 2019).
When RISC opens up after the initial pairing between the miRNA and the target RNA, two regions of the miRNA become exposed: nucleotides 2-8 and 13-16 (Schirle et al., 2014). While miRNA targeting is primarily determined by the former, the latter region also affects the repressive effect of miRNAs (Friedman et al., 2009; Grimson et al., 2007). The pairing of the 3′ regions of the miRNA, including nucleotides 13-16, may supplement the canonical seed match or compensate for the noncanonical seed match (Bartel, 2009). About 5% of the conserved target sites have supplementary 3′ pairings, and less than 2% have compensatory 3′ pairings (Friedman et al., 2009). A recent study (McGeary et al., 2022) reported that each miRNA has different preferences for an optimal 3′ pairing, that some miRNAs, such as let-7, can have two 3′ binding modes with different offsets, and that nucleotides outside of positions 13-16 could have a significant impact on the 3′ binding affinities.
Although the majority of functional miRNA target sites are located in the 3′UTRs, targets in the ORFs may have weak repressive effects. ORF sites can strengthen the repressive effect of 3′UTR sites in a synergistic or additive manner (Fang and Rajewsky, 2011). The results of DIANA-microT-CDS suggest that the incorporation of ORF sites as a determinant may aid miRNA targeting prediction (Reczko et al., 2012).
Target sites closer to either end of the 3′UTR are associated with higher repressive efficacy (Gaidatzis et al., 2007; Grimson et al., 2007; Majoros and Ohler, 2007). There are two hypotheses to explain the correlation between repression and the minimum distance to the 3′UTR. The first hypothesis is that the target sites near either end of the 3′UTR could be closer to the translation or RNA-processing complexes (Gaidatzis et al., 2007; Grimson et al., 2007). At the 3′ end, mRNA looping may bring the site closer to the protein complex (Grimson et al., 2007). Another possibility is that the sites near the ends have higher structural accessibility because ribosomes and poly(A)-binding proteins inhibit secondary structure formation (Grimson et al., 2007; Majoros and Ohler, 2007). This hypothesis is congruent with a recent study that showed that RBPs that bind near the target site increase targeting efficacy by opening the RNA secondary structure (Kim et al., 2021). While sites closer to the ends of 3′UTRs confer higher targeting efficacies, studies have also reported that conserved miRNA target sites are relatively scarce in the 15- to 20-nt region after the stop codon, likely due to interference by the translation machinery (Gaidatzis et al., 2007; Grimson et al., 2007; Majoros and Ohler, 2007).
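The positional features discussed here are simple to derive once site coordinates are known. The sketch below is illustrative only: the function name, the dictionary keys, and the 15-nt `stop_proximal` cutoff (taken from the lower bound of the 15- to 20-nt region mentioned above) are assumptions, and coordinates are 0-based offsets from the first nucleotide after the stop codon.

```python
# Minimal sketch: distance-based features for a site at utr[site_start:site_end].
def utr_position_features(utr_length, site_start, site_end, stop_proximal=15):
    """Return the minimum distance to either 3'UTR end and a stop-codon proximity flag."""
    if not 0 <= site_start < site_end <= utr_length:
        raise ValueError("site must lie within the 3'UTR")
    dist_to_stop = site_start               # nt between the stop codon and the site
    dist_to_polya = utr_length - site_end   # nt between the site and the 3' end
    return {
        "min_dist": min(dist_to_stop, dist_to_polya),
        "near_stop_codon": dist_to_stop < stop_proximal,
    }


print(utr_position_features(1000, 900, 908))
# {'min_dist': 92, 'near_stop_codon': False}
```

Keeping the stop-codon flag separate from the minimum distance reflects the two opposing observations in the text: proximity to either end tends to help, except immediately downstream of the stop codon.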
The role of 3′UTR length as a determinant of functional miRNA targeting has been suggested by previous studies (Hausser et al., 2009; Hong et al., 2009; Sandberg et al., 2008; Stark et al., 2005), but the validity and direction of the correlation between 3′UTR length and targeting efficacy remain open to debate (Wen et al., 2011). Nevertheless, regression analyses from recent studies have shown that the 3′UTR length has a significant negative correlation with targeting efficacy (Agarwal et al., 2015; Betel et al., 2010). Three hypotheses have been proposed to explain this correlation. First, a previous study showed a correlation between shorter 3′UTRs and higher structural accessibility through an association analysis, which could improve miRNA targeting (Hong et al., 2009). Second, shorter 3′UTRs have higher target site densities (Hong et al., 2009), which may increase the chance of cooperative action among adjacent target sites (Briskin et al., 2020; Grimson et al., 2007; Saetrom et al., 2007). Third, the higher number of endogenous miRNA target sites in longer 3′UTRs can induce stronger local competition between target sites, leading to derepression and decreased proficiency of target sites for exogenous miRNA (Kim et al., 2014).
Functionally important regulatory elements are likely to be conserved during evolution, and the same principle applies to miRNA target sites. Because miRNA target sites are frequently under negative selection pressure, more than 60% of human protein-coding genes contain conserved target sites (Friedman et al., 2009). Notably, the degree of target site conservation is positively correlated with a repressive effect (Friedman et al., 2009; Nielsen et al., 2007). Most widely used miRNA prediction tools, including TargetScan, miRanda-mirSVR, MIRZA-G, PicTar, and DIANA-microT-CDS, employ evolutionary conservation as a determinant to predict miRNA targeting.
Fifty-four percent of human genes have multiple polyadenylation (poly(A)) sites, and 51% of human poly(A) sites have heterogeneous cleavage sites (Tian et al., 2005), resulting in transcript isoforms with varying 3′UTR lengths. Such isoforms can confound the 3′UTR-related determinants of miRNA targeting, specifically the distance to the 3′UTR ends and the 3′UTR length (Agarwal et al., 2015). The repressive effect of a site can be reduced if the target site is present in only some of the isoforms; one study showed that isoforms containing the target sites are more strongly repressed by miRNAs than those without them (Legendre et al., 2006). Moreover, the difference in miRNA targeting between cell types has been attributed to the 3′UTR isoform composition (Nam et al., 2014). To account for the 3′UTR isoforms, poly(A)-position profiling by sequencing (3P-seq) (Jan et al., 2011) can be used. TargetScan incorporated 3P-seq data by calculating the affected isoform ratio (AIR), defined as the fraction of the isoforms that contain a specific target site (Agarwal et al., 2015; Nam et al., 2014).
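The AIR definition above can be sketched as an abundance-weighted fraction. The input format, a list of `(utr_end, abundance)` pairs such as might be derived from 3P-seq read counts, is an assumption made for illustration and is not the TargetScan implementation.

```python
# Minimal sketch: fraction of 3'UTR isoforms (weighted by abundance) that contain a site.
def affected_isoform_ratio(site_end, isoforms):
    """isoforms: list of (utr_end, abundance); a site is contained if utr_end >= site_end."""
    total = sum(abundance for _, abundance in isoforms)
    if total == 0:
        return 0.0
    containing = sum(abundance for utr_end, abundance in isoforms if utr_end >= site_end)
    return containing / total


# Three isoforms ending at 500, 1200, and 2500 nt; the site ends at position 800,
# so only the two longer isoforms (70 of 100 reads) include it.
print(affected_isoform_ratio(800, [(500, 30), (1200, 50), (2500, 20)]))  # 0.7
```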
Although a large number of determinants of miRNA targeting have been incorporated into computational models, additional novel determinants have recently been suggested. In this section, we review these determinants of miRNA targeting that could potentially improve the accuracy of miRNA target prediction tools (Fig. 3).
RNAs can be chemically modified in more than 100 different ways, and these RNA modifications (RMs) play crucial roles in the regulation of both coding and noncoding RNAs (Roundtree et al., 2017). It is widely accepted that RMs, such as N6-methyladenosine (m6A), of miRNA precursors are essential for proper processing and maturation (Alarcon et al., 2015b). Recent reports have suggested that various RMs in miRNAs alter their targeting properties and may have significant physiological effects.
One of the first RMs reported to affect miRNA targeting is adenosine-to-inosine (A-to-I) editing. A-to-I editing is catalyzed by the adenosine deaminase acting on RNA (ADAR) family, both in the nucleus and cytoplasm (Bass and Weintraub, 1988; Wagner et al., 1989). Primary miRNAs (pri-miRNAs) can undergo A-to-I editing via ADARs (Blow et al., 2006; Luciano et al., 2004). While A-to-I editing of pri-miRNAs can interfere with miRNA processing (Yang et al., 2006), some edited nucleotides are retained in mature miRNAs and can also affect miRNA targeting (Kawahara et al., 2007). The effect of A-to-I editing on miRNA targeting was first shown for pri-miR-376a, in which single A-to-I editing in the middle of the seed drastically redirected its target specificity (Kawahara et al., 2007). Only 2 of the 78 predicted targets were retained after A-to-I editing of miR-376a. One of the redirected targets of the edited miR-376a is PRPS1, which encodes an enzyme in the uric acid synthesis pathway. Increased levels of uric acid in the brain cortex of ADAR2 knockout mice demonstrated the biological significance of A-to-I editing of miR-376a. In addition, A-to-I editing of the seed sequence of pri-miR-589-3p by ADAR2 has been reported to redirect its targeting, resulting in the inhibition of glioblastoma progression, in part due to the suppression of ADAM12 expression (Cesarini et al., 2018). The redirection of miRNA targeting by A-to-I editing has been reported for at least 14 different miRNAs (Nishikura, 2016). Although A-to-I editing is considered similar to A-to-G substitution because inosine preferentially pairs with cytidine, one study showed that the effect of A-to-I editing of the miRNA seed region on miRNA targeting is different from the effect of A-to-G substitutions (Kume et al., 2014).
In addition to A-to-I editing, three other classes of RMs present in mature RNAs have been found to affect miRNA targeting. m6A methylation is one of the most prevalent types of RM (Dominissini et al., 2012). m6A modifications are made by writer proteins, including the methyltransferase-like (METTL) family of proteins (Liu et al., 2014), and recognized by reader proteins, including the YTH domain-containing family of proteins (Liao et al., 2018). Although the role of m6A in miRNA biogenesis is well known (Alarcon et al., 2015a; 2015b), its role in miRNA targeting has not been systematically evaluated. A recent study found that the m6A modification of mature miR-200c-3p diminishes its repressive function. Structural prediction of RISC showed that a single m6A modification of the seed of let-7a-5p is sufficient to globally modify the 3-D structure of RISC (Konno et al., 2019).
When guanine is oxidized to 8-oxoguanine (o8G), it can pair with either adenine or cytosine (Michaels et al., 1992). Recent studies have shown that o8G modification of miRNA seeds can redirect its targets, which may result in significant physiological outcomes. For miR-1, when activated adrenergic receptors induce o8G modification in the seed, its targeting is globally redirected (Seok et al., 2020). While miR-1 is known to cause atrophy (Li et al., 2010), the 7o8G miR-1 induces cardiac hypertrophy. It was also reported that the o8G modification of miR-184 redirects the miRNA to Bcl-xL and Bcl-w, key anti-apoptotic genes that are not targeted by unoxidized miR-184 (Wang et al., 2015). Oxidized miR-184 promotes apoptosis and myocardial infarction, highlighting the clinical significance of o8G present in mature miRNAs.
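To make the redirection concrete, the sketch below enumerates the 6-mer target sequences that could pair with a seed (g2-7) when selected guanines are oxidized. It assumes that "7o8G" denotes oxidation at guide position 7 and treats o8G as pairing equally well with C or A, which is a deliberate simplification of the chemistry; the function name and interface are hypothetical.

```python
# Minimal sketch: candidate 6-mer target sites for a seed carrying o8G modifications.
from itertools import product

PAIRS = {"A": "U", "U": "A", "G": "C", "C": "G"}


def seed_match_sites(mirna, oxidized_positions=()):
    """Return 5'->3' target 6-mers pairing with g2-7; o8G positions pair with C or A."""
    seed = mirna.upper().replace("T", "U")[1:7]  # guide nucleotides g2..g7
    options = []
    # Walk the seed from g7 down to g2 so the target is emitted 5'->3'.
    for position, base in zip(range(7, 1, -1), reversed(seed)):
        if position in oxidized_positions:
            if base != "G":
                raise ValueError(f"g{position} is {base}, not G")
            options.append(("C", "A"))
        else:
            options.append((PAIRS[base],))
    return ["".join(choice) for choice in product(*options)]


MIR1 = "UGGAAUGUAAAGAAGUAUGUAU"  # miR-1-3p
print(seed_match_sites(MIR1))                          # ['CAUUCC']
print(seed_match_sites(MIR1, oxidized_positions={7}))  # ['CAUUCC', 'AAUUCC']
```

The additional site in the second output is the one created by o8G:A pairing, which is how a single oxidized base can expose a set of targets that the unmodified miRNA does not recognize.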
Another abundant type of RM is 5-methylcytosine (m5C), installed by enzymes in the NOL1/NOP2/SUN domain (NSUN) family and DNA methyltransferase 2 (DNMT2) (Bohnsack et al., 2019). A recent study showed that the m5C modification of mature miR-181a-5p abolished its tumor suppressor function through the derepression of Bcl-2-like protein 11 (BIM), a key protein in the initiation of apoptosis (Cheray et al., 2020). In glioblastoma patients, m5C of miR-181a-5p was associated with a worse prognosis, emphasizing the potential physiological importance of m5C modification of mature miRNAs. The study also found that the DNMT3A/AGO4 complex was responsible for m5C modification of miRNAs.
RMs at miRNA target sites and adjacent regions affect miRNA targeting. The binding of the m6A reader protein IGF2BP1 near the miRNA target site in the SRF transcript hinders the binding of miR-2 and miR-125 in an m6A-dependent manner (Muller et al., 2019). In contrast, the binding of another m6A reader protein, IGF2BP2, to the target site of miR-133a increases the repressive efficacy of the miRNA by physically interacting with AGO (Qian et al., 2021). m5C is also suggested to play a role in miRNA targeting based on the enrichment of its putative sites in miRNA target sites (Squires et al., 2012). A-to-I editing in the target sites of miR-30b-3p and miR-573 in the 3′UTR of the ARHGAP26 transcript blocks miRNAs from pairing to the target (Wang et al., 2013).
It is widely accepted that miRNA target sites with higher structural accessibility elicit stronger miRNA-mediated repression (Agarwal et al., 2015). RBPs can open the secondary structures of the 3′UTRs and hence make the target site more accessible for RISC. This phenomenon has been observed for PCBP2 (Lin et al., 2016) and Pumilio (Kedde et al., 2010), inducing a local structural change in the target 3′UTR and thus enhancing miRNA targeting. This hypothesis is supported by a recent study that showed that the number of RBP-binding sites and the proximity of RBP-binding sites to miRNA target sites are positively correlated with increased repression (Kim et al., 2021). Therefore, the locations of the RBP-binding sites in the 3′UTR, which can be accurately determined by enhanced CLIP (eCLIP) (Van Nostrand et al., 2016), could improve miRNA targeting prediction.
Individual RBPs can affect miRNA targeting via mechanisms other than increasing structural accessibility. PTB, an RBP that binds to the polypyrimidine tract, can compete with RISC to bind to miRNA target sites and inhibit miRNA targeting (Xue et al., 2013). HuR, an RBP known to regulate mRNA stability, competes with miR-125b when its binding site is adjacent to its miRNA target site (Ahuja et al., 2016). A recent study showed that UPF1, an essential RBP for NMD (nonsense-mediated decay), directly interacts with Ago2 and enhances target repression through CCR4-NOT-mediated deadenylation, which is another mechanism of miRNA targeting regulation by RBPs (Park et al., 2019). Therefore, utilizing the precise binding site location for each RBP species would allow for accurate miRNA targeting prediction.
In general, stronger binding between a miRNA and its target leads to better repression. However, when a perfect or near-perfect match occurs in both the seed region and the 3′ regions of the miRNA, the miRNA may be subject to degradation, a phenomenon known as target RNA-directed miRNA degradation (TDMD) (Ameres et al., 2010). TDMD was initially studied using artificial exogenous RNAs (Ameres et al., 2010) and viral RNAs (Cazalla et al., 2010). Some viral RNA transcripts, including those of HVS (herpesvirus saimiri) (Cazalla et al., 2010), MCMV (murine cytomegalovirus) (Libri et al., 2012), and HCMV (human cytomegalovirus) (Lee et al., 2013), utilize TDMD as an evolutionary strategy for immune evasion. However, the physiological role of TDMD in animal cells has been debated (Fuchs Wightman et al., 2018).
Recently, the physiological role of TDMD induced by endogenous miRNA targets has been suggested in three important RNA examples: Cyrano, Nrep, and Serpine1. Cyrano is a long non-coding RNA (lncRNA) that is broadly conserved in vertebrates and plays a crucial role in early embryonic development (Ulitsky et al., 2011). Cyrano pairs perfectly with miR-7 both at the seed and 3′ regions, inducing effective degradation of miR-7 (Kleaveland et al., 2018). Degradation of miR-7 by Cyrano leads to the derepression and accumulation of the circular RNA (circRNA) Cdr1as, which is associated with neuronal development, neurodegenerative diseases, and cancer metastasis (Memczak et al., 2013). Further studies found that TDMD by Cyrano is mediated by ZSWIM8 polyubiquitin ligase, which induces degradation of AGO by polyubiquitination and exposes the miRNA to destruction (Han et al., 2020a; Shi et al., 2020). Serpine1 plays an important role in the cell cycle re-entry of quiescent cells (Iyer et al., 1999). The Serpine1 transcript induces TDMD of miR-30b/c during cell cycle re-entry through miRNA tailing, and the degradation of miR-30b/c is crucial for accelerating the G1/S transition during cell cycle re-entry (Ghini et al., 2018). The Nrep transcript (known as libra in zebrafish) is expressed in vertebrate brains and induces TDMD of miR-29b by 3′ trimming but not tailing. Scrambling the Nrep transcript to inhibit TDMD results in an increase in miR-29b expression in the cerebellum, which in turn leads to behavioral deficits in mice (Bitetti et al., 2018). These examples demonstrate the significant physiological effects of TDMD on endogenous transcripts in mammals.
It is widely recognized that the abundance of miRNA target sites is negatively correlated with targeting efficacy (Arvey et al., 2010; Garcia et al., 2011). This correlation can be explained by the competitive endogenous RNA (ceRNA) hypothesis that transcriptome-wide competition occurs among target sites, including those in the mRNAs, circRNAs, pseudogenes, and lncRNAs (Salmena et al., 2011; Thomson and Dinger, 2016). Experimental evidence indicates that some transcripts, called miRNA sponges, can effectively sequester miRNAs. For instance, the PTEN pseudogene (
To effectively incorporate previously unappreciated determinants into computational miRNA target prediction tools, high-dimensional input data should be utilized (Fig. 4). For instance, RBP-binding site information generated by eCLIP-seq (Van Nostrand et al., 2016) can be represented as a multidimensional score matrix. To account for the RMs in the miRNA and 3′UTR, a large-scale dataset such as m6A-seq (Dominissini et al., 2012) should be utilized. An alternative approach that maps RMs directly from Oxford Nanopore sequencing has been suggested and can be used for miRNA target prediction (Leger et al., 2021). Secondary structures of the 3′UTRs can also be incorporated into the model using selective 2′-hydroxyl acylation analyzed by primer extension (SHAPE)-seq (Lucks et al., 2011) and dimethyl sulfate (DMS)-seq (Rouskin et al., 2014). By combining all these datasets, a high-dimensional input dataset can be prepared for functional miRNA target prediction.
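One way to picture such a high-dimensional input is a per-nucleotide feature matrix in which sequence and experimental tracks are stacked as channels. The layout below (four one-hot sequence channels, one RBP-binding channel, one m6A channel, and one structure-probing reactivity channel) is purely an illustrative assumption about how the datasets named above might be combined, not a description of any existing tool.

```python
# Minimal sketch: stack sequence and experimental tracks into a per-nucleotide matrix.
def build_feature_matrix(utr, rbp_sites, m6a_positions, reactivity):
    """Return one row per nucleotide: [A, C, G, U, rbp_bound, m6a, reactivity].

    rbp_sites:     list of half-open (start, end) intervals, e.g. from eCLIP peaks
    m6a_positions: iterable of 0-based modified positions, e.g. from m6A-seq
    reactivity:    per-nucleotide values, e.g. from SHAPE-seq or DMS-seq
    """
    utr = utr.upper().replace("T", "U")
    if len(reactivity) != len(utr):
        raise ValueError("reactivity must have one value per nucleotide")
    rbp_track = [0.0] * len(utr)
    for start, end in rbp_sites:
        for i in range(max(0, start), min(end, len(utr))):
            rbp_track[i] = 1.0
    m6a = set(m6a_positions)
    rows = []
    for i, base in enumerate(utr):
        one_hot = [1.0 if base == b else 0.0 for b in "ACGU"]
        rows.append(one_hot + [rbp_track[i], 1.0 if i in m6a else 0.0, float(reactivity[i])])
    return rows


matrix = build_feature_matrix("ACGU", [(1, 3)], [0], [0.1, 0.2, 0.3, 0.4])
print(len(matrix), len(matrix[0]))  # 4 7
```

A matrix of this shape (sequence length by number of channels) is the kind of input that convolutional or attention-based models typically consume, so further determinants could be added as extra columns.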
Deep learning approaches can be used to integrate high-dimensional datasets for accurate miRNA target prediction. Recently, deep learning models, such as CNNs (convolutional neural networks) (Townshend et al., 2021; Zeng et al., 2016), RNNs (recurrent neural networks) (Quang and Xie, 2016; Tasdelen and Sen, 2021), and transformers (Ji et al., 2021; Wang et al., 2021) that utilize nucleic acid sequences and multi-omics data, have generated promising results. Indeed, multiple efforts have been made to employ deep learning for miRNA target prediction (McGeary et al., 2019; Min et al., 2022; Talukder et al., 2022). However, their overall predictive accuracies have been limited, perhaps because these models do not fully integrate the aforementioned determinants of miRNA targeting into the deep learning model. Therefore, a deep learning model that integrates high-dimensional data, accounting for a more comprehensive list of determinants, including those that have not been used previously, may yield significantly improved predictions of functional miRNA targeting.
miRNAs are indispensable regulators of gene expression that affect various biological processes, pathways, and diseases by forming complex regulatory networks. Currently, a large number of miRNA target prediction tools have been developed and are widely used. Most of these tools use seed pairing with other well-known determinants of miRNA targeting, including local AU content, structural accessibility, seed-pairing stability, target abundance, and evolutionary conservation. Each of these determinants is significantly correlated with miRNA targeting efficacy and helps to accurately predict functional miRNA targeting. Tools that incorporate a larger number of determinants, such as TargetScan, produce more accurate predictions than those that do not. However, all currently available computational tools exhibit limited prediction accuracy.
Incorporation of unappreciated miRNA targeting determinants such as RMs in mature miRNAs, RMs in miRNA target mRNAs, target RNA-directed miRNA degradation, ceRNAs, and RBP-binding sites may further improve the accuracy of the aforementioned computational tools. For most of these determinants, their regulatory impacts on miRNA targeting have been validated in various studies, yet some determinants, such as RMs, should be assessed more systematically. To integrate these unappreciated determinants into the prediction model, high-dimensional multiomics datasets should be utilized. Improving computational tools for functional miRNA target prediction would help to accurately decipher the complex regulatory network of miRNAs.
This study was supported by the National Research Foundation of Korea (NRF), which is funded by the Ministry of Science and ICT, Republic of Korea (NRF-2014M3C9A3063541, NRF-2019M3E5D3073104, NRF-2020R1A2C3007032, NRF-2020R1A5A1018081, and NRF-2022M3A9I2082294), the Korea Health Industry Development Institute (KHIDI), which is funded by the Ministry of Health and Welfare, Republic of Korea (HI15C3224), and the Korea National Institute of Health (KNIH) which is funded by the Korea Disease Control and Prevention Agency (KDCA) (2022-ER1605-00).
H.H. wrote the original draft. H.H., H.R.C., and D.B. reviewed and revised the manuscript.
The authors have no potential conflicts of interest to disclose.
Table 1. Representative computational tools for miRNA target prediction and the determinants they use
Model | Seed | TPS | EC | SA | Dist. | AU | Len. | 3Sup. | TA | ORFS |
---|---|---|---|---|---|---|---|---|---|---|
TargetScan7 | O | SPS | O | O | O | O | O | O | O | 8m |
miRanda-mirSVR | O | X | O | O | O | O | O | O | X | X |
DIANA-microT-CDS | O | O | O | O | O | O | X | X | X | O |
MIRZA-G | O | O | O | O | O | X | X | X | X | X |
PITA | Opt. | O | X | O | X | X | X | X | X | X |
PicTar | O | O | O | X | X | X | X | X | X | X |
RNAhybrid | Opt. | O | X | X | X | X | X | X | X | X |
MicroTar | O | O | X | X | X | X | X | X | X | X |
Seed, seed match or site type; TPS, thermodynamic pairing stability; EC, evolutionary conservation; SA, structural accessibility; Dist., distance to 3′UTR ends or relative position of the target sites in the 3′UTR; AU, AU or GC content; Len., length of transcript or UTR; 3Sup., 3′ supplementary pairing; TA, target abundance; ORFS, ORF or CDS sites; Opt., optional; SPS, seed pairing stability; 8m, number of 8-mer sites in the ORF.
Published online January 31, 2023 https://doi.org/10.14348/molcells.2023.2157
Hyeonseo Hwang, Hee Ryung Chang, and Daehyun Baek*
School of Biological Sciences, Seoul National University, Seoul 08826, Korea
Correspondence to: baek@snu.ac.kr
The generation of a comprehensive list of miRNA targets is challenging because miRNA targets are abundant and widespread across the transcriptome (Friedman et al., 2009; Lim et al., 2005; Selbach et al., 2008). The classical strategy for studying miRNA targeting involves the identification of potential targets using experimental methods, including microarrays (Lim et al., 2005), RNA immunoprecipitation (RIP) (Keene et al., 2006), cross-linking immunoprecipitation (CLIP) (Chi et al., 2009), miRNA pull-down (Hassan et al., 2013; Orom and Lund, 2010), and luciferase reporter assays (Ghanbarian et al., 2022; Thomson et al., 2011; Tuschl et al., 1999). However, generating massive amounts of data through these experiments is often limited by the prohibitively large amount of labor, time, and cost required. Alternatively, functional miRNA targeting can be identified using computational tools that are much faster and more accessible than conventional experimental methods.
Recently, many computational tools for miRNA targeting prediction have been developed (Nachtigall and Bovolenta, 2022). However, the predictive power of these tools requires further improvement. For instance, even with the state-of-the-art prediction tool, the predicted repressive effects show only a weak correlation (r² ≤ 0.2) with the experimental data (Agarwal et al., 2015; McGeary et al., 2019). Here, we review the determinants of functional miRNA targeting that are currently used in computational miRNA target prediction tools, as well as those that have not yet been used. The incorporation of previously unappreciated determinants may improve the performance of miRNA target prediction tools.
Mature miRNAs form a protein-RNA complex, termed the RNA-induced silencing complex (RISC), with the Argonaute (AGO) protein (Gregory et al., 2005; Hammond et al., 2000). The current RISC model suggests that four nucleotides (nucleotides 2-5 of guide RNA; g2-5) at the 5′ end of the miRNA are exposed before binding to the target (Schirle et al., 2014). The pairing between this region of a miRNA and its target mRNA induces a conformational change in AGO and exposes additional nucleotides (g6-8 and 13-16) that allow stable pairing between RISC and its target. The sequence of 6 or 7 nucleotides (g2-7 or g2-8) at the 5′ end of the miRNAs, called the seed region, primarily determines the target specificity of the miRNAs (Bartel, 2009). The nucleotides exposed at the 3′ ends of miRNAs may pair with the target and supplement seed pairing (Grimson et al., 2007).
The primary determinant of miRNA targeting efficacy is pairing in the seed region. Mismatches in the seed region reduce the potency of the target site. The miRNA target sites are classified into 'canonical' sites, which contain a perfect hexamer match with the seed (g2-7), and 'noncanonical' sites, which match imperfectly with the seed (Bartel, 2009) (Fig. 1). The canonical miRNA target sites are further divided into four types according to two criteria: a t8 match with the g8 nucleotide and the presence of adenine at position t1. The 8-mer sites with both a t8 match and a t1A match are the most effective, followed by the 7-mer m8, 7-mer A1, and 6-mer sites (Baek et al., 2008; Bartel, 2009; Lewis et al., 2005). Noncanonical sites generally confer weaker repressive effects than canonical sites, and a subset of the noncanonical sites are only effective in favorable contexts, hence the term 'context-dependent noncanonical site types (CDNSTs)' (Kim et al., 2016). Currently, offset 7-mer (g3-8 match), offset 6-mer (g3-7 match), 6-mer A1 (g2-6 match with t1A), and four other CDNSTs have been identified as functional noncanonical sites through large-scale bioinformatics analysis (Kim et al., 2016). Many widely used miRNA target prediction tools, such as TargetScan (Agarwal et al., 2015), do not fully account for noncanonical sites.
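The canonical site hierarchy described above can be illustrated with a minimal Python sketch. The function names and the example sequences are assumptions made for illustration; the sketch only handles the four canonical site types and is not the scanning procedure of any published tool.

```python
# Minimal sketch: classify canonical miRNA target sites in a 3'UTR.
# Function names are hypothetical; sequences are written 5'->3'.
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna):
    """Return the reverse complement of an RNA string."""
    return rna.translate(COMPLEMENT)[::-1]


def classify_canonical_sites(mirna, utr):
    """Return (core_start, site_type) for every canonical site in `utr`.

    The 6-nt core pairs with miRNA nucleotides g2-7. A site is upgraded
    when the nucleotide 5' of the core pairs with g8 (t8 match) and/or
    the nucleotide 3' of the core is an adenine (t1A).
    """
    mirna = mirna.upper().replace("T", "U")
    utr = utr.upper().replace("T", "U")
    if len(mirna) < 8:
        raise ValueError("miRNA must contain at least 8 nucleotides")
    core = reverse_complement(mirna[1:7])    # pairs with g2-7
    t8_base = reverse_complement(mirna[7])   # pairs with g8
    sites = []
    start = utr.find(core)
    while start != -1:
        end = start + len(core)
        t8_match = start >= 1 and utr[start - 1] == t8_base
        t1_a = end < len(utr) and utr[end] == "A"
        if t8_match and t1_a:
            site_type = "8mer"
        elif t8_match:
            site_type = "7mer-m8"
        elif t1_a:
            site_type = "7mer-A1"
        else:
            site_type = "6mer"
        sites.append((start, site_type))
        start = utr.find(core, start + 1)
    return sites


# Example with a let-7-like guide sequence (assumed here for illustration).
LET7 = "UGAGGUAGUAGGUUGUAUAGUU"
print(classify_canonical_sites(LET7, "GGGCUACCUCAGGG"))  # [(4, '8mer')]
```

Reported positions are 0-based offsets of the 6-nt core within the 3′UTR. Noncanonical sites such as the CDNSTs would require additional patterns that are not covered here.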
Each miRNA target prediction tool utilizes a different set of determinants (Table 1). Earlier tools, including PITA (Kertesz et al., 2007), MicroTar (Thadani and Tammi, 2006), RNAhybrid (Kruger and Rehmsmeier, 2006), and PicTar (Krek et al., 2005) rely only on a few determinants, whereas more recent tools, such as TargetScan (Agarwal et al., 2015; McGeary et al., 2019), MIRZA-G (Gumienny and Zavolan, 2015), DIANA-microT-CDS (Reczko et al., 2012), and miRanda-mirSVR (Betel et al., 2010) utilize a larger number of determinants. Most of these tools use seed pairing, thermodynamic stability, evolutionary conservation, and the structural accessibility of target sites. Of the currently available miRNA targeting prediction tools, TargetScan is considered to be the most accurate tool that utilizes the largest number of determinants in the model. Here, we summarize the determinants of miRNA targeting used in these tools (Fig. 2).
Local AU content around the target site is one of the most prominent determinants of miRNA targeting. Effective and conserved target sites are associated with higher local AU content but not global AU content (Grimson et al., 2007; Nielsen et al., 2007). Although the mechanism by which local AU content affects miRNA targeting remains unclear, many target prediction tools, including TargetScan7, miRanda-mirSVR, and DIANA-microT-CDS, utilize this feature. Two hypotheses that are not mutually exclusive have been proposed to explain the correlation between stronger repression and higher local AU content. The first hypothesis is that RISC interacts with AREs (AU-rich elements), and the repressive efficacy of the nearby target is enhanced by this interaction (Jing et al., 2005; Nielsen et al., 2007; Vasudevan and Steitz, 2007). An alternative hypothesis is that high AU content inhibits the formation of secondary structures and increases the accessibility of the target, leading to stronger repression (Grimson et al., 2007; Nielsen et al., 2007).
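As a rough illustration of how local AU content could be turned into a feature, the following Python sketch computes the unweighted fraction of A and U nucleotides flanking a site. The 30-nt flank size and the absence of distance weighting are assumptions made here for simplicity; published tools may use different windows and weighting schemes.

```python
# Minimal sketch: unweighted local AU content around a target site.
# The flank size (30 nt) is an illustrative assumption.
def local_au_content(utr, site_start, site_end, flank=30):
    """Fraction of A/U among the nucleotides flanking a site.

    `site_start` and `site_end` are 0-based, end-exclusive coordinates
    of the site within the 3'UTR; the site itself is excluded.
    """
    utr = utr.upper().replace("T", "U")
    upstream = utr[max(0, site_start - flank):site_start]
    downstream = utr[site_end:site_end + flank]
    window = upstream + downstream
    if not window:
        return 0.0
    return sum(base in "AU" for base in window) / len(window)
```

For example, a site at coordinates 4-10 of `"AAAAGGGGGGCCUU"` with `flank=4` has flanks `AAAA` and `CCUU`, giving a local AU content of 0.75.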
The structural accessibility of the target site, mainly determined by the secondary or tertiary structure of the 3′UTR, positively correlates with targeting efficacy (Kertesz et al., 2007). Higher site accessibility increases the chance of RISC binding to its target, leading to stronger repression. The structural accessibility of the target site can be measured as ΔG or ΔΔG, which is the difference in ΔG between the folded and open 3′UTRs. A previous study demonstrated that ΔΔG had a stronger correlation with the repressive effect than ΔG of the folded 3′UTR (Kertesz et al., 2007). Another study argued that structural accessibility, measured by ΔG, lost its correlation with repression when local AU content was controlled for (Grimson et al., 2007). A more recent study showed that when structural accessibility was measured as the log of the unpaired probability computed with RNAplfold (Bernhart et al., 2006), the correlation was present even when other confounding factors, including local AU content, were controlled for (Agarwal et al., 2015).
The thermodynamic pairing stability between a miRNA and its target is a determinant widely used for miRNA targeting prediction. There are two approaches to calculating pairing stability: (1) using the whole miRNA sequence and (2) using only the seed region of the miRNA, which is termed seed pairing stability. Many miRNA target prediction tools, including DIANA-microT-CDS, MIRZA-G, PITA, PicTar, RNAhybrid, and MicroTar, rely heavily on the predicted thermodynamic pairing stability between the entire mature miRNA and target. However, TargetScan7 utilizes seed pairing stability as a feature (Agarwal et al., 2015). Seed pairing stability is negatively correlated with the abundance of the target site in the 3′UTR of mRNAs (Garcia et al., 2011). Targets with a higher abundance are associated with weaker repressive effects, which may be due to competition between miRNA targets (Arvey et al., 2010). Despite the correlation between target abundance and seed pairing stability, both determinants affect miRNA targeting efficacy independently and globally (Garcia et al., 2011).
Recently, RNA Bind-n-Seq (RBNS), a technique that measures the affinity between a protein and random RNA sequences by pull-down of RNA-binding protein (RBP) with bound RNA followed by high-throughput sequencing (Lambert et al., 2014), has been employed to measure the binding affinity between miRNA-loaded AGO and 12-nt RNA sequences (McGeary et al., 2019). The AGO-RBNS data have been used to develop a biochemical model of miRNA targeting and are included in TargetScan8 (McGeary et al., 2019).
After the initial pairing between the miRNA and the target RNA, RISC changes conformation and two regions of the miRNA become exposed: nucleotides 2-8 and 13-16 (Schirle et al., 2014). While miRNA targeting is primarily determined by the former, the latter region also affects the repressive effect of miRNAs (Friedman et al., 2009; Grimson et al., 2007). The pairing of the 3′ regions of the miRNA, including nucleotides 13-16, may supplement the canonical seed match or compensate for the noncanonical seed match (Bartel, 2009). About 5% of the conserved target sites have supplementary 3′ pairings, and less than 2% have compensatory 3′ pairings (Friedman et al., 2009). A recent study (McGeary et al., 2022) reported that each miRNA has different preferences for an optimal 3′ pairing. Some miRNAs, such as let-7, can have two 3′ binding modes with different offsets, and nucleotides outside of positions 13-16 could have a significant impact on the 3′ binding affinities.
Although the majority of functional miRNA target sites are located in the 3′UTRs, target sites in the ORFs may have weak repressive effects. ORF sites can strengthen the repressive effect of 3′UTR sites in a synergistic or additive manner (Fang and Rajewsky, 2011). The results of DIANA-microT-CDS suggest that the incorporation of ORF sites as a determinant may aid miRNA targeting prediction (Reczko et al., 2012).
Target sites closer to either end of the 3′UTR are associated with higher repressive efficacy (Gaidatzis et al., 2007; Grimson et al., 2007; Majoros and Ohler, 2007). There are two hypotheses to explain the correlation between repression and the minimum distance to the 3′UTR ends. The first hypothesis is that the target sites near either end of the 3′UTR could be closer to the translation or RNA-processing complexes (Gaidatzis et al., 2007; Grimson et al., 2007). At the 3′ end, mRNA looping may bring the site closer to the protein complex (Grimson et al., 2007). Another possibility is that the sites near the ends have higher structural accessibility because ribosomes and poly(A)-binding proteins inhibit secondary structure formation (Grimson et al., 2007; Majoros and Ohler, 2007). This hypothesis is congruent with a recent study showing that RBPs that bind near the target site increase targeting efficacy by opening the RNA secondary structure (Kim et al., 2021). While sites closer to the ends of 3′UTRs confer higher targeting efficacies, studies have also reported that conserved miRNA target sites are relatively scarce in the 15- to 20-nt region after the stop codon, likely due to interference by the translation machinery (Gaidatzis et al., 2007; Grimson et al., 2007; Majoros and Ohler, 2007).
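The positional determinant discussed above can be sketched as a simple feature calculation. In the Python sketch below, the function name, the returned keys, and the 15-nt stop-codon-proximal cutoff are assumptions chosen for illustration (the text reports a 15- to 20-nt region), not parameters taken from a specific tool.

```python
# Minimal sketch: positional features of a target site within a 3'UTR.
# The 15-nt stop-codon-proximal window is an illustrative assumption.
def utr_position_features(utr_length, site_start, site_end,
                          stop_proximal_window=15):
    """Return the minimum distance to either 3'UTR end and a flag for
    sites lying within the region immediately after the stop codon.

    Coordinates are 0-based and end-exclusive, relative to the first
    nucleotide of the 3'UTR.
    """
    if not 0 <= site_start < site_end <= utr_length:
        raise ValueError("site coordinates must lie within the 3'UTR")
    dist_to_stop = site_start                 # nt between stop codon and site
    dist_to_polya = utr_length - site_end     # nt between site and 3' end
    return {
        "min_dist": min(dist_to_stop, dist_to_polya),
        "near_stop_codon": dist_to_stop < stop_proximal_window,
    }
```

A site that is close to the stop codon scores well on minimum distance but is flagged, reflecting the observation that sites immediately after the stop codon tend to be depleted.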
The role of 3′UTR length as a determinant of functional miRNA targeting has been suggested by previous studies (Hausser et al., 2009; Hong et al., 2009; Sandberg et al., 2008; Stark et al., 2005), but the validity and direction of the correlation between 3′UTR length and targeting efficacy remain open to debate (Wen et al., 2011). Nevertheless, regression analyses from recent studies have shown that the 3′UTR length has a significant negative correlation with targeting efficacy (Agarwal et al., 2015; Betel et al., 2010). Three hypotheses have been proposed to explain this correlation. First, a previous study showed a correlation between shorter 3′UTRs and higher structural accessibility through an association analysis, which could improve miRNA targeting (Hong et al., 2009). Second, shorter 3′UTRs have higher target site densities (Hong et al., 2009), which may increase the chance of cooperative action among adjacent target sites (Briskin et al., 2020; Grimson et al., 2007; Saetrom et al., 2007). Third, the higher number of endogenous miRNA target sites in longer 3′UTRs can induce stronger local competition between target sites, leading to derepression and decreased proficiency of target sites for exogenous miRNA (Kim et al., 2014).
Functionally important regulatory elements are likely to be conserved during evolution, and the same principle applies to miRNA target sites. Because miRNA target sites are frequently under negative selection pressure, more than 60% of human protein-coding genes contain conserved target sites (Friedman et al., 2009). Notably, the degree of target site conservation is positively correlated with a repressive effect (Friedman et al., 2009; Nielsen et al., 2007). Most widely used miRNA prediction tools, including TargetScan, miRanda-mirSVR, MIRZA-G, PicTar, and DIANA-microT-CDS, employ evolutionary conservation as a determinant to predict miRNA targeting.
Fifty-four percent of human genes have multiple polyadenylation (poly(A)) sites, and 51% of human poly(A) sites have heterogeneous cleavage sites (Tian et al., 2005), resulting in transcript isoforms with varying 3′UTR lengths. Such isoforms can confound the 3′UTR-related determinants of miRNA targeting, specifically the distance to the 3′UTR ends and the 3′UTR length (Agarwal et al., 2015). The repressive effect of a site can be reduced if the target site is present in only some of the isoforms, as indicated by a study showing that the isoforms containing the target sites are more strongly repressed by miRNAs than those without the target sites (Legendre et al., 2006). Moreover, the difference in miRNA targeting between cell types has been attributed to the 3′UTR isoform composition (Nam et al., 2014). To account for the 3′UTR isoforms, poly(A)-position profiling by sequencing (3P-seq) (Jan et al., 2011) can be used. TargetScan incorporated 3P-seq data by calculating the affected isoform ratio (AIR), which is defined as the fraction of the isoforms that contain a specific target site (Agarwal et al., 2015; Nam et al., 2014).
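The isoform fraction described above can be sketched as follows. This Python example assumes that each 3′UTR isoform is summarized by its end position and an abundance estimate (for instance, a poly(A)-site read count); weighting by abundance and the input format are assumptions made for illustration rather than a description of the TargetScan implementation.

```python
# Minimal sketch: fraction of 3'UTR isoforms that contain a target site.
# Input format and abundance weighting are illustrative assumptions.
def affected_isoform_ratio(site_end, isoforms):
    """Return the abundance-weighted fraction of isoforms containing a site.

    `site_end` is the end-exclusive coordinate of the site relative to the
    start of the 3'UTR. `isoforms` is a list of (utr_end, abundance) pairs,
    where `utr_end` is the 3'UTR length of that isoform.
    """
    total = sum(abundance for _, abundance in isoforms)
    if total == 0:
        return 0.0
    affected = sum(abundance for utr_end, abundance in isoforms
                   if utr_end >= site_end)
    return affected / total


# A short isoform (30% of transcripts) lacks a distal site at position 1000.
isoforms = [(500, 30), (1500, 70)]
print(affected_isoform_ratio(1000, isoforms))  # 0.7
```

A site located upstream of every poly(A) site receives a ratio of 1.0, whereas a distal site present only in the longest isoform is down-weighted accordingly.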
Although a large number of determinants of miRNA targeting have been incorporated into computational models, additional novel determinants have recently been suggested. In this section, we review these determinants of miRNA targeting that could potentially improve the accuracy of miRNA target prediction tools (Fig. 3).
RNAs can be chemically modified in more than 100 different ways, and these RNA modifications (RMs) play crucial roles in the regulation of both coding and noncoding RNAs (Roundtree et al., 2017). It is widely accepted that RMs of miRNA precursors, such as N6-methyladenosine (m6A), are essential for proper processing and maturation (Alarcon et al., 2015b). Recent reports have suggested that various RMs in miRNAs alter their targeting properties and may have significant physiological effects.
One of the first RMs reported to affect miRNA targeting is adenosine-to-inosine (A-to-I) editing. A-to-I editing is catalyzed by the adenosine deaminase acting on RNA (ADAR) family of enzymes, both in the nucleus and cytoplasm (Bass and Weintraub, 1988; Wagner et al., 1989). Primary miRNAs (pri-miRNAs) can undergo A-to-I editing via ADARs (Blow et al., 2006; Luciano et al., 2004). While A-to-I editing of pri-miRNAs can interfere with miRNA processing (Yang et al., 2006), some edited nucleotides are retained in mature miRNAs and can also affect miRNA targeting (Kawahara et al., 2007). The effect of A-to-I editing on miRNA targeting was first shown for pri-miR-376a, in which a single A-to-I edit in the middle of the seed drastically redirected its target specificity (Kawahara et al., 2007). Only 2 of the 78 predicted targets were retained after A-to-I editing of miR-376a. One of the redirected targets of the edited miR-376a is PRPS1, which encodes an enzyme in the uric acid synthesis pathway. Increased levels of uric acid in the brain cortex of ADAR2 knockout mice demonstrated the biological significance of A-to-I editing of miR-376a. In addition, A-to-I editing of the seed sequence of pri-miR-589-3p by ADAR2 has been reported to redirect its targeting, resulting in the inhibition of glioblastoma progression, in part due to the suppression of ADAM12 expression (Cesarini et al., 2018). The redirection of miRNA targeting by A-to-I editing has been reported for at least 14 different miRNAs (Nishikura, 2016). Although A-to-I editing is often considered equivalent to an A-to-G substitution because inosine preferentially pairs with cytidine, one study showed that the effect of A-to-I editing of the miRNA seed region on miRNA targeting differs from the effect of A-to-G substitutions (Kume et al., 2014).
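To illustrate why a single edit in the seed can redirect targeting, the following Python sketch derives the 6-nt seed match before and after an A-to-I edit. The miRNA sequence is a hypothetical toy sequence, and treating inosine as pairing strictly with cytidine is a simplifying assumption; as noted above, edited seeds do not behave exactly like A-to-G substituted seeds.

```python
# Minimal sketch: effect of a seed A-to-I edit on the predicted seed match.
# The miRNA sequence below is a hypothetical toy example.
PAIRS_WITH = {"A": "U", "C": "G", "G": "C", "U": "A",
              "I": "C"}  # simplification: inosine pairs with cytidine


def seed_match(mirna):
    """Return the 5'->3' target sequence pairing with miRNA positions g2-7."""
    seed = mirna[1:7]
    return "".join(PAIRS_WITH[base] for base in reversed(seed))


def edit_a_to_i(mirna, position):
    """Replace the adenosine at a 1-based miRNA position with inosine."""
    index = position - 1
    if mirna[index] != "A":
        raise ValueError("position %d is not an adenosine" % position)
    return mirna[:index] + "I" + mirna[index + 1:]


toy_mirna = "UCAAUGCAGGAUUCGUAGCUAA"
edited = edit_a_to_i(toy_mirna, 3)
print(seed_match(toy_mirna))  # GCAUUG
print(seed_match(edited))     # GCAUCG
```

Because the two seed matches differ by one nucleotide, a scan for perfect 6-mer matches would return largely different sets of candidate sites for the unedited and edited miRNA.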
In addition to A-to-I editing, three other classes of RMs present in mature miRNAs have been found to affect miRNA targeting. m6A methylation is one of the most prevalent types of RM (Dominissini et al., 2012). m6A modifications are installed by writer proteins, including the methyltransferase-like (METTL) family of proteins (Liu et al., 2014), and recognized by reader proteins, including the YTH domain-containing family of proteins (Liao et al., 2018). Although the role of m6A in miRNA biogenesis is well known (Alarcon et al., 2015a; 2015b), its role in miRNA targeting has not been systematically evaluated. A recent study found that the m6A modification of mature miR-200c-3p diminishes its repressive function. Structural prediction of RISC showed that a single m6A modification of the seed of let-7a-5p is sufficient to globally modify the 3-D structure of RISC (Konno et al., 2019).
When guanine is oxidized to 8-oxoguanine (o8G), it can pair with either adenine or cytosine (Michaels et al., 1992). Recent studies have shown that o8G modification of a miRNA seed can redirect its targets, which may result in significant physiological outcomes. For miR-1, when activated adrenergic receptors induce o8G modification in the seed, its targeting is globally redirected (Seok et al., 2020). While unmodified miR-1 is known to cause atrophy (Li et al., 2010), miR-1 carrying o8G at position 7 (7o8G-miR-1) induces cardiac hypertrophy. It was also reported that the o8G modification of miR-184 redirects the miRNA to Bcl-xL and Bcl-w, key anti-apoptotic genes that are not targeted by unoxidized miR-184 (Wang et al., 2015). Oxidized miR-184 promotes apoptosis and myocardial infarction, highlighting the clinical significance of o8G present in mature miRNAs.
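Because o8G can pair with either cytosine or adenine, an oxidized seed is compatible with more than one target sequence. The Python sketch below enumerates the possible 7-nt seed matches (g2-8) for a guide strand in which one guanine is oxidized. The miR-1 sequence is assumed here for illustration, and the pairing table is a simplification that ignores wobble pairs and any differences in pairing strength.

```python
# Minimal sketch: enumerate seed matches compatible with an o8G-modified seed.
# The miR-1 sequence is assumed for illustration; pairing rules are simplified.
from itertools import product

PARTNERS = {"A": ["U"], "C": ["G"], "G": ["C"], "U": ["A"],
            "o8G": ["C", "A"]}  # o8G pairs with cytosine or adenine


def seed_sites(mirna_bases, start=2, end=8):
    """Return all 5'->3' target sequences pairing with positions start..end.

    `mirna_bases` is a list of nucleotide tokens so that modified bases
    such as "o8G" can be represented as single positions.
    """
    seed = mirna_bases[start - 1:end]
    options = [PARTNERS[base] for base in reversed(seed)]
    return ["".join(combo) for combo in product(*options)]


mir1 = list("UGGAAUGUAAAGAAGUAUGUAU")
oxidized = list(mir1)
oxidized[6] = "o8G"          # oxidize the guanine at position 7

print(seed_sites(mir1))      # ['ACAUUCC']
print(seed_sites(oxidized))  # ['ACAUUCC', 'AAAUUCC']
```

In this simplified view, the oxidized guide retains the original seed match and gains a second one, which is one way a prediction tool could represent target redirection by o8G.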
Another abundant type of RM is 5-methylcytosine (m5C), installed by enzymes in the NOL1/NOP2/SUN domain (NSUN) family and DNA methyltransferase 2 (DNMT2) (Bohnsack et al., 2019). A recent study showed that the m5C modification of mature miR-181a-5p abolished its tumor suppressor function through the derepression of Bcl-2-like protein 11 (BIM), a key protein in the initiation of apoptosis (Cheray et al., 2020). In glioblastoma patients, m5C of miR-181a-5p was associated with a worse prognosis, emphasizing the potential physiological importance of m5C modification of mature miRNAs. The study also found that the DNMT3A/AGO4 complex was responsible for m5C modification of miRNAs.
RMs at miRNA target sites and adjacent regions affect miRNA targeting. The binding of the m6A reader protein IGF2BP1 near the miRNA target site in the SRF transcript hinders the binding of miR-2 and miR-125 in an m6A-dependent manner (Muller et al., 2019). In contrast, the binding of another m6A reader protein, IGF2BP2, to the target site of miR-133a increases the repressive efficacy of the miRNA by physically interacting with AGO (Qian et al., 2021). m5C is also suggested to play a role in miRNA targeting based on the enrichment of its putative sites in miRNA target sites (Squires et al., 2012). A-to-I editing in the target sites of miR-30b-3p and miR-573 in the 3′UTR of the ARHGAP26 transcript blocks miRNAs from pairing to the target (Wang et al., 2013).
It is widely accepted that miRNA target sites with higher structural accessibility elicit stronger miRNA-mediated repression (Agarwal et al., 2015). RBPs can open the secondary structures of the 3′UTRs and hence make the target site more accessible for RISC. This phenomenon has been observed for PCBP2 (Lin et al., 2016) and Pumilio (Kedde et al., 2010), inducing a local structural change in the target 3′UTR and thus enhancing miRNA targeting. This hypothesis is supported by a recent study that showed that the number of RBP-binding sites and the proximity of RBP-binding sites to miRNA target sites are positively correlated with increased repression (Kim et al., 2021). Therefore, the locations of the RBP-binding sites in the 3′UTR, which can be accurately determined by enhanced CLIP (eCLIP) (Van Nostrand et al., 2016), could improve miRNA targeting prediction.
Individual RBPs can affect miRNA targeting via mechanisms other than increasing structural accessibility. PTB, an RBP that binds to the polypyrimidine tract, can compete with RISC to bind to miRNA target sites and inhibit miRNA targeting (Xue et al., 2013). HuR, an RBP known to regulate mRNA stability, competes with miR-125b when its binding site is adjacent to the miRNA target site (Ahuja et al., 2016). A recent study showed that UPF1, an RBP essential for nonsense-mediated decay (NMD), directly interacts with Ago2 and enhances target repression through CCR4-NOT-mediated deadenylation, which is another mechanism of miRNA targeting regulation by RBPs (Park et al., 2019). Therefore, utilizing the precise binding site location for each RBP species would allow for more accurate miRNA targeting prediction.
In general, stronger binding between a miRNA and its target leads to better repression. However, when a perfect or near-perfect match occurs in both the seed region and the 3′ regions of the miRNA, the miRNA may be subject to degradation, a phenomenon known as target RNA-directed miRNA degradation (TDMD) (Ameres et al., 2010). TDMD was initially studied using artificial exogenous RNAs (Ameres et al., 2010) and viral RNAs (Cazalla et al., 2010). Some viral RNA transcripts, including those of herpesvirus saimiri (HVS) (Cazalla et al., 2010), murine cytomegalovirus (MCMV) (Libri et al., 2012), and human cytomegalovirus (HCMV) (Lee et al., 2013), utilize TDMD as an evolutionary strategy for immune evasion. However, the physiological role of TDMD in animal cells has been debated (Fuchs Wightman et al., 2018).
Recently, a physiological role of TDMD induced by endogenous miRNA targets has been suggested for three transcripts: Cyrano, Nrep, and Serpine1. Cyrano is a long non-coding RNA (lncRNA) that is broadly conserved in vertebrates and plays a crucial role in early embryonic development (Ulitsky et al., 2011). Cyrano pairs perfectly with miR-7 both at the seed and 3′ regions, inducing effective degradation of miR-7 (Kleaveland et al., 2018). Degradation of miR-7 by Cyrano leads to the derepression and accumulation of the circular RNA (circRNA) Cdr1as, which is associated with neuronal development, neurodegenerative diseases, and cancer metastasis (Memczak et al., 2013). Further studies found that TDMD by Cyrano is mediated by ZSWIM8 polyubiquitin ligase, which induces degradation of AGO by polyubiquitination and exposes the miRNA to destruction (Han et al., 2020a; Shi et al., 2020). Serpine1 plays an important role in the cell cycle re-entry of quiescent cells (Iyer et al., 1999). The Serpine1 transcript induces TDMD of miR-30b/c during cell cycle re-entry through miRNA tailing, and the degradation of miR-30b/c is crucial for accelerating the G1/S transition during cell cycle re-entry (Ghini et al., 2018). The Nrep transcript (known as libra in zebrafish) is expressed in vertebrate brains and induces TDMD of miR-29b by 3′ trimming but not tailing. Scrambling the Nrep transcript to inhibit TDMD results in an increase in miR-29b expression in the cerebellum, which in turn leads to behavioral deficits in mice (Bitetti et al., 2018). These examples demonstrate the significant physiological effects of TDMD induced by endogenous transcripts in vertebrates.
It is widely recognized that the abundance of miRNA target sites is negatively correlated with targeting efficacy (Arvey et al., 2010; Garcia et al., 2011). This correlation can be explained by the competitive endogenous RNA (ceRNA) hypothesis that transcriptome-wide competition occurs among target sites, including those in the mRNAs, circRNAs, pseudogenes, and lncRNAs (Salmena et al., 2011; Thomson and Dinger, 2016). Experimental evidence indicates that some transcripts, called miRNA sponges, can effectively sequester miRNAs. For instance, the PTEN pseudogene (
To effectively incorporate previously unappreciated determinants into computational miRNA target prediction tools, high-dimensional input data should be utilized (Fig. 4). For instance, RBP-binding site information generated by eCLIP-seq (Van Nostrand et al., 2016) can be represented as a multidimensional score matrix. To account for the RMs in the miRNA and 3′UTR, large-scale datasets such as m6A-seq (Dominissini et al., 2012) should be utilized. An alternative approach that maps RMs directly from Oxford Nanopore sequencing has been suggested and can be used for miRNA target prediction (Leger et al., 2021). Secondary structures of the 3′UTRs can also be incorporated into the model using selective 2′-hydroxyl acylation analyzed by primer extension (SHAPE)-seq (Lucks et al., 2011) and dimethyl sulfate (DMS)-seq (Rouskin et al., 2014). By combining all these datasets, a high-dimensional input dataset can be prepared for functional miRNA target prediction.
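One way to picture such a high-dimensional input is as a per-nucleotide feature matrix in which each row is a 3′UTR position and each column is one data track. The Python sketch below is only an illustration under stated assumptions: the function name, the input formats (RBP-binding intervals, a set of m6A positions, and a list of SHAPE reactivities), and the binary encoding of binding sites are hypothetical choices, not the format of any existing tool or dataset.

```python
# Minimal sketch: assemble per-nucleotide features for a 3'UTR.
# All input formats and names are illustrative assumptions.
def one_hot(base):
    """One-hot encode a nucleotide in the order A, C, G, U."""
    return [1.0 if base == b else 0.0 for b in "ACGU"]


def build_feature_matrix(utr, rbp_tracks, m6a_sites, shape_reactivity):
    """Return (matrix, column_names) with one row per nucleotide.

    rbp_tracks: dict mapping an RBP name to a list of (start, end)
        binding intervals (0-based, end-exclusive), e.g. from eCLIP peaks.
    m6a_sites: set of 0-based modified positions, e.g. from m6A-seq.
    shape_reactivity: list of per-nucleotide reactivities, e.g. SHAPE-seq.
    """
    utr = utr.upper().replace("T", "U")
    if len(shape_reactivity) != len(utr):
        raise ValueError("one reactivity value is required per nucleotide")
    rbp_names = sorted(rbp_tracks)
    matrix = []
    for i, base in enumerate(utr):
        row = one_hot(base)
        for name in rbp_names:
            bound = any(start <= i < end for start, end in rbp_tracks[name])
            row.append(1.0 if bound else 0.0)
        row.append(1.0 if i in m6a_sites else 0.0)
        row.append(float(shape_reactivity[i]))
        matrix.append(row)
    columns = ["A", "C", "G", "U"] + rbp_names + ["m6A", "SHAPE"]
    return matrix, columns
```

A matrix of this shape (sequence length by number of tracks) could then be passed to a downstream model; continuous binding scores could replace the binary RBP columns if such scores are available.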
Deep learning approaches can be used to integrate high-dimensional datasets for accurate miRNA target prediction. Recently, deep learning models, such as CNNs (convolutional neural networks) (Townshend et al., 2021; Zeng et al., 2016), RNNs (recurrent neural networks) (Quang and Xie, 2016; Tasdelen and Sen, 2021), and transformers (Ji et al., 2021; Wang et al., 2021) that utilize nucleic acid sequences and multi-omics data, have generated promising results. Indeed, multiple efforts have been made to employ deep learning for miRNA target prediction (McGeary et al., 2019; Min et al., 2022; Talukder et al., 2022). However, their overall predictive accuracies have been limited, perhaps because these models do not fully integrate the aforementioned determinants of miRNA targeting into the deep learning model. Therefore, a deep learning model that integrates high-dimensional data, accounting for a more comprehensive list of determinants, including those that have not been used previously, may yield significantly improved predictions of functional miRNA targeting.
miRNAs are indispensable regulators of gene expression that affect various biological processes, pathways, and diseases by forming complex regulatory networks. Currently, a large number of miRNA target prediction tools have been developed and are widely used. Most of these tools use seed pairing with other well-known determinants of miRNA targeting, including local AU content, structural accessibility, seed-pairing stability, target abundance, and evolutionary conservation. Each of these determinants is significantly correlated with miRNA targeting efficacy and helps to accurately predict functional miRNA targeting. Tools that incorporate a larger number of determinants, such as TargetScan, produce more accurate predictions than those that do not. However, all currently available computational tools exhibit limited prediction accuracy.
Incorporation of unappreciated miRNA targeting determinants such as RMs in mature miRNAs, RMs in miRNA target mRNAs, target RNA-directed miRNA degradation, ceRNAs, and RBP-binding sites may further improve the accuracy of the computational tools aforementioned. For most of these determinants, their regulatory impacts on miRNA targeting have been validated in various studies, yet some determinants, such as RMs, should be assessed more systematically. To integrate these unappreciated determinants into the prediction model, high-dimensional multiomics datasets should be utilized, and improving computational tools for functional miRNA prediction would help to accurately decipher the complex regulatory network of miRNAs.
This study was supported by the National Research Foundation of Korea (NRF), which is funded by the Ministry of Science and ICT, Republic of Korea (NRF-2014M3C9A3063541, NRF-2019M3E5D3073104, NRF-2020R1A2C3007032, NRF-2020R1A5A1018081, and NRF-2022M3A9I2082294), the Korea Health Industry Development Institute (KHIDI), which is funded by the Ministry of Health and Welfare, Republic of Korea (HI15C3224), and the Korea National Institute of Health (KNIH) which is funded by the Korea Disease Control and Prevention Agency (KDCA) (2022-ER1605-00).
H.H. wrote the original draft. H.H., H.R.C., and D.B. reviewed and revised the manuscript.
The authors have no potential conflicts of interest to disclose.
A table of representative computational tools for miRNA target prediction and the determinants they use
Model | Seed | TPS | EC | SA | Dist. | AU | Len. | 3Sup. | TA | ORFS |
---|---|---|---|---|---|---|---|---|---|---|
TargetScan7 | O | SPS | O | O | O | O | O | O | O | 8m |
miRanda-mirSVR | O | X | O | O | O | O | O | O | X | X |
DIANA-microT-CDS | O | O | O | O | O | O | X | X | X | O |
MIRZA-G | O | O | O | O | O | X | X | X | X | X |
PITA | Opt. | O | X | O | X | X | X | X | X | X |
PicTar | O | O | O | X | X | X | X | X | X | X |
RNAhybrid | Opt. | O | X | X | X | X | X | X | X | X |
MicroTar | O | O | X | X | X | X | X | X | X | X |
Seed, seed match or site type; TPS, thermodynamic pairing stability; EC, evolutionary conservation; SA, structural accessibility; Dist., distance to 3′UTR ends or relative position of the target sites in the 3′UTR; AU, AU or GC content; Len., length of transcript or UTR; 3Sup., 3′ supplementary pairing; TA, target abundance; ORFS, ORF or CDS sites; Opt., optional; SPS, seed pairing stability; 8m, number of 8-mer sites in the ORF.
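The table can also be read as data. The sketch below transcribes it and tallies how many determinants each tool uses. It assumes that every entry other than "X" (including "Opt.", "SPS", and "8m") counts as a determinant in use, which is a simplification made here for illustration. The tally reflects feature coverage only and says nothing on its own about prediction accuracy.

```python
# Column order follows the table header.
DETERMINANTS = ["Seed", "TPS", "EC", "SA", "Dist.", "AU", "Len.", "3Sup.", "TA", "ORFS"]

# Rows transcribed from the table; "X" marks a determinant that is not used.
TOOLS = {
    "TargetScan7":      ["O", "SPS", "O", "O", "O", "O", "O", "O", "O", "8m"],
    "miRanda-mirSVR":   ["O", "X", "O", "O", "O", "O", "O", "O", "X", "X"],
    "DIANA-microT-CDS": ["O", "O", "O", "O", "O", "O", "X", "X", "X", "O"],
    "MIRZA-G":          ["O", "O", "O", "O", "O", "X", "X", "X", "X", "X"],
    "PITA":             ["Opt.", "O", "X", "O", "X", "X", "X", "X", "X", "X"],
    "PicTar":           ["O", "O", "O", "X", "X", "X", "X", "X", "X", "X"],
    "RNAhybrid":        ["Opt.", "O", "X", "X", "X", "X", "X", "X", "X", "X"],
    "MicroTar":         ["O", "O", "X", "X", "X", "X", "X", "X", "X", "X"],
}


def used_determinants(row: list[str]) -> list[str]:
    """Names of the determinants a tool uses (assumption: any non-"X" entry)."""
    return [name for name, cell in zip(DETERMINANTS, row) if cell != "X"]


# Number of determinants per tool, then a ranking from most to fewest.
counts = {tool: len(used_determinants(row)) for tool, row in TOOLS.items()}
ranking = sorted(counts.items(), key=lambda item: -item[1])

for tool, n in ranking:
    print(f"{tool}: {n} of {len(DETERMINANTS)} determinants")
```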