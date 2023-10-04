Most protein-coding genes in higher eukaryotes are subject to alternative processing yielding multiple mRNA isoforms, thereby diversifying the functional transcriptome (Tian & Manley, 2017; Ule & Blencowe, 2019). The ability to generate complex alternative transcripts from individual genes is critical to the appropriate specification, differentiation and function of distinct cell types, and homeostatic responses to an array of perturbations (Vuong et al, 2016a; Olthof et al, 2022; Wright et al, 2022). Consequently, alternative mRNA processing is deregulated under diverse pathogenic conditions, and defects in some individual isoforms can cause disease (Bonnal et al, 2020; Rogalska et al, 2022). For these reasons, it is important to appreciate the breadth of alternative transcript isoforms across time and space, understand mechanisms by which specific alternative mRNA isoform choices are executed in the correct cell states and conditions, and elucidate biological consequences of failure to generate appropriate programs of alternative isoforms.

The many classes of alternative mRNA processing include usage of different promoters, inclusion of distinct internal exons, and deployment of alternative last exons (ALEs) and/or 3′ UTRs (Fig 1A). Conceptually, these phenomena invoke choices of basal promoters, of splice sites, and of polyadenylation sites. Different machineries are involved in identifying each of these alternative sites, and moreover, individual genes can be subject to alternative processing at multiple locations to generate combinatorial complexity. With these complexities in mind, it is notable that we lack full mechanistic understanding of several established, critical regulators of isoform diversity. We are also far from knowing the biological importance of many such programs, which is ultimately critical to decipher their contributions to human disease.

Figure 1. Fundamental mechanisms for mRNA processing: splicing and polyadenylation. (A) Primary mRNA transcripts bear a series of sequence motifs that direct splicing (left side, sequences in the vicinity of exon–intron boundaries) and cleavage and polyadenylation (right side, sequences in the vicinity of 3′ termini). (B) Core features of splicing. This process is defined by cis-acting sequences at the 5′ exon–intron junction (GU, often within the AG|GURAGU context), the branchpoint (YUNAY), and the 3′ intron–exon junction (AG, often within the YAG|GU context). The spliceosome contains multiple ribonucleoprotein subcomplexes that mediate different aspects of intron excision and exon ligation. Splicing is a dynamic stepwise process, and for simplicity, the recruitment and ejection of splicing subcomplexes is not fully detailed here. Central players include the U1 snRNP that recognizes the 5′ intron boundary and the U2 snRNP that recognizes the 3′ intron boundary. The 5′ splice site basepairs with U1 snRNA, whereas the 3′ intron region is initially bound by SF1 (branchpoint), U2AF65 (polypyrimidine tract), and U2AF35 (3′ splice site); this transitions to basepairing of U2 snRNA around the branchpoint. The two fundamental steps of splicing are ligation of the branchpoint A to the 5′ splice site G, followed by ligation of the 3′ end of the upstream exon to the 5′ end of the downstream exon, joining the exons. This simultaneously liberates an intron lariat, which is then debranched and degraded. (C) Core features of 3′ end formation. The multisubunit cleavage and polyadenylation complex recognizes the presumptive 3′ end via sequence motifs, including the polyadenylation sequence (typically AAUAAA), which is often flanked by other upstream and downstream sequences. Following cleavage of the primary mRNA transcript by the CPSF73 endonuclease, the 3′ end is extended by poly-A polymerase to ensure a stable terminus that is protected by poly-A–binding protein.

In this review, we will focus on alternative splicing (AS) of internal and 3′ terminal exons and on alternative polyadenylation (APA) to generate distinct 3′ UTRs. We note that others have extensively reviewed general mechanisms and regulation of alternative splicing (Ule & Blencowe, 2019; Bonnal et al, 2020) and APA (Tian & Manley, 2017; Mitschka & Mayr, 2022). We direct the reader to such recent reviews for comprehensive background on these topics. Among the broad literature on these processing programs, we pay particular attention to exemplary cell-specific RNA-binding proteins (RBPs) that instruct isoform programs (Darnell, 2013). As for setting, we will focus on the nervous system, whose utilization of both cell-specific splicing and alternative 3′ UTR programs is particularly widespread. The diversity of neural transcriptomes is critical for the development, function, and maintenance of these unusual cells, and also exposes vulnerabilities of neurons when these mRNA-processing mechanisms go awry. Although we review classic literature and general studies on isoform generation, we will emphasize the latest mechanistic findings, technical innovations, and biological impacts of neural-specific AS and neural APA.

Fundamentals of 3′ end formation

Another fundamental aspect of eukaryotic mRNAs is that they bear terminal features that are not specified by the genome, that is, a 5′ cap and a 3′ polyadenylate (pA) tail. These untemplated features protect mRNAs from degradation by a variety of professional exoribonucleases that destroy uncapped and/or untailed mRNAs. On the 3′ end, the cleavage and polyadenylation (CPA) machinery first identifies appropriate cleavage locations within primary transcripts, thereby separating the nascent RNA from RNA polymerase II, and subsequently adds the pA tail (Tian & Manley, 2017). These complex reactions are specified by primary sequence motifs (Fig 1C). The most critical of these is the polyadenylation sequence (PAS), typically AAUAAA and certain variants, located ∼20–30 nts upstream of the site of CPA (i.e., the pA site). The central factors in 3′ mRNA cleavage reside in the CPSF complex, which includes the CPSF30/WDR33 heterodimer that directly recognizes the PAS (Schonemann et al, 2014), and the CPSF73 endonuclease that cleaves nascent mRNA (Dominski et al, 2005; Mandel et al, 2006). As with splicing, a puzzle exists as to how specificity of 3′ mRNA cleavage is achieved. The known PAS signals are themselves insufficient to explain accurate cleavage only at the 3′ ends of transcripts because 3′ UTRs and especially introns are enriched in AU sequences. Thus, numerous PAS-like sequences must seemingly be ignored by the CPA machinery to generate full-length mRNA. This specificity is explained in part by additional motif information located in the vicinity of bona fide 3′ cleavage sites, including upstream UGUA (recognized by the CFIm complex) and downstream U/GU-rich sequences (recognized by the CstF complex) (Fig 1C). However, such motifs are not required to process many mRNAs, and thus have modulatory but not absolute roles in recognizing 3′ cleavage sites.
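The motif logic described above can be illustrated with a minimal Python sketch. It is not a published tool or the method of any cited study: the function name `find_pas_candidates`, the single AUUAAA variant, the 50-nt windows, and the crude regular expression standing in for a U/GU-rich element are all assumptions chosen for illustration. For simplicity, the windows are measured from the hexamer itself rather than from the cleavage site.

```python
import re

# Canonical PAS hexamer plus one common variant. The variant list is an
# illustrative assumption, not an exhaustive catalog.
PAS_HEXAMERS = ("AAUAAA", "AUUAAA")


def find_pas_candidates(rna, upstream_window=50, downstream_window=50):
    """Report every PAS-like hexamer in an RNA (or DNA) string.

    Each record notes whether a UGUA motif lies in the upstream window and
    whether a crude U/GU-rich proxy lies in the downstream window. Window
    sizes are hypothetical defaults, measured from the hexamer.
    """
    rna = rna.upper().replace("T", "U")
    # A lookahead is used so that overlapping hexamer matches are all reported.
    pattern = re.compile("(?=(" + "|".join(PAS_HEXAMERS) + "))")
    records = []
    for match in pattern.finditer(rna):
        start = match.start()
        end = start + 6
        upstream = rna[max(0, start - upstream_window):start]
        downstream = rna[end:end + downstream_window]
        records.append({
            "position": start,
            "hexamer": match.group(1),
            "has_upstream_UGUA": "UGUA" in upstream,
            # Crude proxy for a U/GU-rich element: two or more GU repeats,
            # or a run of at least four U residues.
            "has_downstream_GU_rich": re.search(r"(?:GU){2,}|U{4,}", downstream) is not None,
        })
    return records
```

Run on a long intron, a scan of this kind would typically return many hexamer matches, only some of which carry flanking motifs. This is the specificity puzzle described above; the sketch makes no attempt to predict which sites are actually used.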

ALE splicing and relation to intronic polyadenylation (IPA)

Because introns can be extremely long, they contain not only fortuitous splice site matches but also cryptic polyadenylation signals. Inappropriate action of the CPA machinery within an intron will create an alternative gene product with a 3′ UTR distinct from that of the downstream gene model, and with a high likelihood of encoding a truncated protein that may also bear a foreign C-terminus. One strategy to prevent this is “telescripting,” whereby U1 snRNP protects elongating Pol II from premature CPA (Berg et al, 2012; Almada et al, 2013). In particular, the need for U1-mediated suppression of early termination is especially overt within long introns (Oh et al, 2017), which preferentially exist in neural-expressed genes. This is also applicable to non-neuronal settings, as mammalian first introns are typically longer than downstream introns and require U1 telescripting (Kainov & Makeyev, 2020). Although U1 is best known for its role in splicing, it exists in a distinct complex with CPA factors to suppress cryptic PAS usage (So et al, 2019). More generally at terminal exons, U1 snRNP regulates 3′-end polyadenylation and gene expression via binding of its subunit U1-70K to the carboxy-terminal end of PAP (Gunderson et al, 1998; Fortes et al, 2003; Abad et al, 2008). A single classification of IPA belies the complexity of functional outcomes for these alternative transcripts: IPA events can either generate 3′ terminal extensions of an existing exon or enable splicing into a distinct 3′ terminal exon (Fig 2B). In any case, IPA events are not formally intronic with respect to their cognate alternative transcripts because CPA by definition generates the terminus of the respective exon (although they may be considered intronic with respect to a different gene model). In some cases, fortuitous IPA events will generate a truncated transcript containing arbitrary sequence (Fig 2B). A typical outcome would be for such aberrant transcripts to be removed by quality control pathways, such as nonsense-mediated decay (NMD). However, if production of the aberrant isoform is favored because of sequence or genetic variation, this can substantially impede the production of the intended full-length product. Conversely, IPA can also yield stable 3′ isoforms, which bear an internal ALE compared with isoforms carrying downstream 3′ sequence (Fig 2B). Although ALE isoforms can be classified within the rubric of alternative splicing, it is important to bear in mind that they can also reflect alternative CPA choice. In subsequent sections, we will discuss ALE splicing in the context of APA.

Challenges for understanding alternative splicing and polyadenylation programs

Numerous laboratories are dedicated to unraveling the molecular strategies and mechanics of splicing and 3′ end formation, even in their constitutive forms. However, the complexity of alternative mRNA splicing and 3′ end formation events raises even more challenges. There is already seemingly not enough primary sequence information to distinguish “intended” processing events from the forest of “illegitimate” matches to splicing and polyadenylation signals. Given this, how can alternative splicing (AS, Fig 2A) and APA (Fig 2B) site usage be controlled appropriately across the genome, in cell type- and condition-specific manners? It is instructive to frame the breadth of these questions. If there were only a few AS and APA events, it might suffice to invoke specific regulatory phenomena at those loci. However, the advent of widespread genomic profiling has led to the realization that higher eukaryotes make extraordinarily broad use of both AS and APA. Indeed, data available 15 years ago already revealed that the vast majority (>90%) of mammalian multi-exon genes undergo alternative splicing (Pan et al, 2008; Wang et al, 2008). This proportion continues to increase with ever deeper transcriptome profiling, and the events comprise numerous subclasses (Fig 2A). Likewise, most genes yield multiple 3′ isoforms in diverse metazoan species (Sandberg et al, 2008; Derti et al, 2012; Smibert et al, 2012). Again, these comprise numerous subclasses of APA isoforms (Fig 2B). Moreover, the collection of alternative isoforms from an individual gene can include many combinations of alternative internal exons and alternative 3′ terminal exons. These facts make it even more perplexing how such programs of isoform generation can be both accurate and alternative to diversify the transcriptome.