The release of the draft map of "the" human genome sequence in 2000 raised hopes that the genetic sources of human variability, and especially disease, would soon be identified. But it has become increasingly clear that the sequence overlooks a major source--perhaps the major source--of genetic variation.
In the years after the sequence was mapped, the International HapMap Project worked to identify some ten million common alterations of individual bases throughout the genome. These "single-nucleotide polymorphisms," or SNPs, constitute a molecular fingerprint or genotype, and companies now offer microarrays to test subsets of them. Researchers look for correlations of disease with particular variants to locate nearby genes that may cause disease. With a few exceptions, though, this process has been rather slow, and the genes it finds explain only part of the genetic contribution to disease.
One reason for this--although not the only one--is that these single-base differences don't capture the important genetic differences that arise when large segments of the genome are missing, duplicated, or reversed. Researchers estimate that these "structural variants" affect many more bases than the individual base changes. The importance of these changes was recognized early on in cancer, where they arise from the disruption of the usual quality-control mechanisms of DNA replication, but the past few years have shown that their influence is much more widespread.
The changes were previously invisible because the usual method of sequencing first chops up the DNA into many smaller pieces, whose base sequence is easier to determine. The different sections are then compared in software to see how they match up. With enough overlap and duplication, researchers can make a reasonable guess for the original long sequence. But this method breaks down in regions where sequences occur more than once, because there are many ways to match things up.
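To make the problem with repeated sequences concrete, here is a toy sketch in Python. Everything in it is invented for illustration: the sequences, the read lengths, and the `reads` helper are assumptions, and treating reads as a set of distinct strings ignores how many times each read turns up, which real experiments can also measure. It compares two made-up stretches of DNA that differ only in carrying two or three copies of the same eight-base unit.

```python
def reads(genome, k):
    """Return the set of distinct length-k substrings ("reads") of genome."""
    return {genome[i:i + k] for i in range(len(genome) - k + 1)}

# Toy sequences, invented for illustration only.
LEFT, UNIT, RIGHT = "GGATCC", "ACGTTGCA", "TTAGGC"
two_copies = LEFT + UNIT * 2 + RIGHT
three_copies = LEFT + UNIT * 3 + RIGHT

# Reads shorter than the repeated unit: both versions yield the same pieces.
short_same = reads(two_copies, 6) == reads(three_copies, 6)

# Reads long enough to span both copies plus a little of each flank:
# now the two versions yield different pieces.
long_same = reads(two_copies, 20) == reads(three_copies, 20)

print(short_same, long_same)  # prints: True False
```

With six-base reads, the two-copy and three-copy versions produce exactly the same collection of distinct pieces, so matching up overlaps alone cannot say how many copies were there. Only reads long enough to cross the whole repeated region and reach the unique sequence on both sides tell the two apart.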
In recent years, experimenters have devised several techniques for finding copy-number variations arising from insertions or deletions, as well as inverted sections. For example, Mike Snyder's group at Yale developed a method that I covered for the New York Academy of Sciences (if you're not a member, look at "Go Deep" in the NYAS section of the "Clips" tab at my website). Most of the regions are 3,000-1,000 base pairs in length.
These changes contribute to many diseases. Last year, for example, an international consortium found that structural variants play a role in schizophrenia. But instead of fingering a few key suspects, the results pointed to hundreds of copy-number variations, each one of which has only a small effect. Interestingly, some of the same genetic regions seem to be involved in other mental illnesses.
Like genetic studies that use SNP genotypes, these results highlight the complex nature of many diseases, and the many distinct disruptions that can cause them. Treating these diseases may require a better understanding of the complete networks of interactions that underlie them. At the same time, different diseases seem to have important elements in common, and perhaps should be thought of as members of disease families.
The next few years should see a dramatic increase in the understanding of structural variants in human differences and disease as well as in human evolution.