In 1977, researchers were surprised to learn that the protein-coding sequence of messenger RNA doesn't arise from a continuous section of DNA.
Instead, work that earned Phil Sharp and Richard Roberts a Nobel in 1993 found that the as-transcribed pre-mRNA includes sections called introns that are then cut out of the sequence while the remaining exons are spliced back together (the words can apply to either DNA or RNA).
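For readers who think in code, the cut-and-rejoin step can be pictured as a simple string operation. This is only a toy sketch: the sequence, the exon coordinates, and the function name `splice` are made up for illustration, and the real process is carried out by molecular machinery rather than index lookups.

```python
def splice(pre_mrna, exons):
    """Join the exon segments of a pre-mRNA, discarding the introns between them.

    exons: list of (start, end) half-open index pairs, in 5'-to-3' order.
    """
    return "".join(pre_mrna[start:end] for start, end in exons)


# Toy transcript (invented sequence): uppercase = exon, lowercase = intron.
pre_mrna = "AUGGCC" + "guaaguuuucag" + "GAUUAA"
exons = [(0, 6), (18, 24)]  # positions of the two uppercase stretches

mature = splice(pre_mrna, exons)
print(mature)  # AUGGCCGAUUAA
```

The lowercase intron simply disappears from the output, and the two exons end up adjacent even though they were separated in the original transcript.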
The final protein-coding section is flanked on both ends, called 5' and 3', by untranslated regions (UTRs). These noncoding regions are also transcribed from the DNA and, strictly speaking, are parts of exons, although "exon" is often used loosely to mean only the coding sequence. Their sequence still matters: out in the cell, the 3' UTR is a favorite target for complementary microRNAs that affect the stability or translation of the messenger RNA.
Additional processing steps in the nucleus give the RNA a trademark chemical cap at its 5' end and a tail of repeated adenosines at its 3' end. Both the cap and the polyadenylated tail are important for the later translation of the mature mRNA at ribosomes, once it has been exported from the nucleus.
A further wrinkle was the realization that the splicing can happen in different ways, as illustrated in the figure (from Wikipedia), which connects by blue lines the pieces that can be neighbors in the final RNA. This alternative splicing significantly increases the number of protein products available from a given stretch of DNA. Most human genes are spliced in more than one arrangement, and each resulting product is called an isoform.
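To get a feel for how quickly the options multiply, here is a minimal sketch that enumerates the arrangements produced by one simple mode of alternative splicing, skipping optional exons. The exon names and the function `exon_skipping_isoforms` are hypothetical, and the model is deliberately crude: real genes also use other modes, and not every combination actually occurs in cells.

```python
from itertools import combinations


def exon_skipping_isoforms(exons, optional):
    """Return every exon arrangement in which some subset of the optional
    exons is skipped. The order of the remaining exons is preserved."""
    isoforms = []
    for k in range(len(optional) + 1):
        for skipped in combinations(optional, k):
            isoforms.append([e for e in exons if e not in skipped])
    return isoforms


exons = ["E1", "E2", "E3", "E4"]  # hypothetical gene with four exons
isoforms = exon_skipping_isoforms(exons, optional=["E2", "E3"])

for iso in isoforms:
    print("-".join(iso))
```

With two optional exons this toy gene yields four arrangements; under the same assumption, n optional exons would allow up to 2^n.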
The splicing is done by a large complex of RNA and proteins called the spliceosome. The choice of isoform depends in part on special RNA sequences, within either an exon or an intron, that bind proteins that promote or inhibit splicing at a particular point. This binding is sensitive to sequence changes that don't change the coded amino acid and are therefore called "synonymous." Because of splicing, these changes aren't always truly synonymous: they can change the final protein.
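A small sketch can make this "synonymous but not silent" idea concrete. The sequences below are invented, the codon table covers only the codons used, and `ENHANCER_MOTIF` is a hypothetical stand-in for a sequence that a splicing-regulating protein might bind; none of this models how the binding really works.

```python
# Minimal codon table covering only the codons used in this example.
CODON_TABLE = {"AUG": "Met", "GAA": "Glu", "GAG": "Glu", "CUG": "Leu"}

# Hypothetical regulatory motif, chosen for illustration only.
ENHANCER_MOTIF = "GAAGAA"


def translate(rna):
    """Translate an RNA string codon by codon using CODON_TABLE."""
    return [CODON_TABLE[rna[i:i + 3]] for i in range(0, len(rna), 3)]


original = "AUGGAAGAACUG"
mutated = "AUGGAGGAACUG"  # one A -> G change in the second codon

same_protein = translate(original) == translate(mutated)
motif_before = ENHANCER_MOTIF in original
motif_after = ENHANCER_MOTIF in mutated

print(same_protein, motif_before, motif_after)  # True True False
```

Read codon by codon, the two sequences spell the same amino acids, yet the single-letter change removes the motif. If that motif helped mark an exon for inclusion, the change could still alter which isoform is made.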
In addition, the relative amounts of the alternatively spliced isoforms can change, for example, during an organism's development and across different tissues, notably the brain. The regulation of this process provides yet another tool for controlling gene expression, but scientists are still clarifying what determines the splice configuration.
To get a global view of alternative splicing, Chris Burge of MIT, speaking at a conference that I covered last year, described a technique called mRNA-seq that preferentially sequences short RNA segments that contain the polyadenylated tail and are therefore proper mRNA candidates for later translation. (This eliminates the confusing background of transcribed RNA that is useless or acts in other ways.) Using this technique, he and his colleagues identified more than 10,000 sequences that coded for multiple isoforms. Of these, Burge estimated that more than two-thirds were present in different amounts in different tissues, so the different forms seem likely to be doing important things.
Burge also found a surprising connection between the processes that add the polyadenylated tail and those that do splicing. The former process occurs at the end of transcription, but it looks as if the RNA is already being handed off to the splicing machinery before transcription is finished.
Although it is still poorly understood, alternative splicing is an important and widespread mechanism for regulating genes, as well as for getting multiple proteins out of a single region of DNA.