Do DNA sequences evolve more slowly if they play important biological roles?
For many genomics researchers, the answer is so self-evidently "yes" that the question is hardly worth asking. Indeed, they often regard sequence conservation between species, or the evolutionary constraint that it implies, as a clear indication of biological function.
And sometimes it is a good indication. But in general, as I described in my story this summer in Science, this connection is only weakly supported by experiments, such as the exhaustive exploration of 1% of the human genome in the pilot phase of the ENCODE project. That work found that roughly 40% of constrained sequences had no obvious biochemical function, while only half of biochemically active sequences seemed to be constrained.
One reason for this is that many important functions, such as markers for alternative splicing, 3D folding of transcribed RNA, or DNA structure that affects binding by proteins, may have an ambiguous signature in the DNA base sequence.
But another reason (there's always more than one!) is that evolutionary pressure depends on biological context.
Several of my sources for the Science story emphasized that redundancy can obscure the importance of a particular region of DNA. For example, deleting one region may not kill an animal if another region does the same job. By the same token, redundant sequences may be less visible to evolution, and therefore freer to change over time. Biologists know many cases of important new functions that have evolved from a duplicate copy of a gene.
But redundancy, in which two sequences play interchangeable roles, is only one of many ways that genetic regions affect each other, and a very simple one at that. As systems biologists have been revealing, the full set of interactions between different molecular species forms a rich, complex network, affectionately known as "the hairball."
For some biologists, the importance of context to evolution is obvious. When I spoke on this subject on Tuesday at The Stowers Institute, for example, Rong Li pointed to the work of Harvard systems biologist Mark Kirschner. Kirschner, notably in the 2005 book The Plausibility of Life that he coauthored with John Gerhart, describes biological systems as comprising rigid core components controlled by flexible regulatory linkages.
The conserved core processes generally consist of many complex, precisely interacting pieces. They may be physical structures, like ribosome components, or systems of interactions like signaling pathways. Their structure and relationships are so finely tuned that any change is likely to disrupt their function, so their evolution will be highly constrained.
In contrast, the flexible regulatory processes that link these core components can fine-tune the timing, location, or degree of activity of the conserved core processes. Such changes are at the heart of much biological innovation. For example, the core components that lead to the segmented body plan of insects are broadly similar, at a genetic level, to those that govern our own development. Our obvious differences arise from the way these components are arranged in time and space during development.
The weak predictive power of conservation is particularly relevant as researchers comb the non-protein-coding 98.5% of the genome for new functions. Many of these non-coding DNA sequences are regulatory, so they may evolve faster. Indeed, in deep sequencing studies he described at a symposium at the New York Academy of Sciences, Mike Snyder of Yale University observed a rapid loss of similarity in regulatory RNA between even closely related species (nonmembers can get to my write-up by following the "Go Deep" link at the NYAS section of my website).
Quantifying how evolutionary pressures depend on the way genes interact is likely to keep theorists busy for years to come. But it is clear that the significance of evolutionary constraint in a DNA sequence--or its absence--depends very much on where it fits in the larger biological picture.