The regulation of genes, which renders only some of them active in a particular cell, is critical for controlling life processes. But for many years it was tedious to measure which genes were active. That all changed, beginning about 15 years ago, with the microarray revolution, which lets researchers simultaneously monitor the activity of thousands of genes or more.
The technique uses methods similar to those in microelectronics to create a huge, two-dimensional array of spots, each containing a short segment of DNA. Typically, this array is bathed with a sample containing a mixture of DNA or RNA molecules, which bind to any complementary sequences in the array. These sample molecules were duplicated from the original DNA or RNA to be tested, and in the process they were labeled with a fluorescent dye. The pattern of glowing spots shows which sequences are present in the sample, and the brightness of each spot indicates how much of that sequence is present.
The best-known use of these tools is to measure levels of messenger RNA (mRNA) that have been transcribed from known genes. Since mRNA is the template for making proteins, this level gives an indication of how active the gene is, and is sometimes referred to as "gene expression." However, this terminology is oversimplified, because many effects change the amounts and activity of the final protein, as well as the amount of mRNA in the cell.
The cleanest experiments compare the mRNA amounts under two conditions, for example in normal cells and after they are exposed to some perturbation, such as a drug. If the mRNA levels increase or decrease, the corresponding gene is said to be "upregulated" or "downregulated," respectively. Frequently the result for a single perturbation is presented as a column of red or green spots, each representing one gene. The overall pattern shows a fingerprint of expression changes corresponding to various changes in the cell. For example, different types of stress tend to activate the same sets of genes and to de-activate others.
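One common way to turn such a two-condition comparison into "upregulated" and "downregulated" calls is the log2 fold change, the base-2 logarithm of the ratio of the two signals. The sketch below is only an illustration: the function name, the gene names, the intensity values, the pseudocount, and the twofold cutoff are all assumptions chosen for the example, not part of any particular array platform or analysis package.

```python
import math


def classify_expression(control, treated, threshold=1.0, pseudocount=1.0):
    """Label each gene by its log2 fold change between two conditions.

    control, treated: dicts mapping gene name -> measured signal.
    threshold: minimum absolute log2 fold change to call a change
               (1.0 means a twofold change; an assumed cutoff).
    pseudocount: small value added to both signals so that a gene
                 with zero signal does not cause a division by zero.
    """
    calls = {}
    for gene, base in control.items():
        ratio = (treated[gene] + pseudocount) / (base + pseudocount)
        log2_fc = math.log2(ratio)
        if log2_fc >= threshold:
            calls[gene] = "upregulated"
        elif log2_fc <= -threshold:
            calls[gene] = "downregulated"
        else:
            calls[gene] = "unchanged"
    return calls


# Hypothetical spot intensities for three made-up genes.
control = {"geneA": 99.0, "geneB": 399.0, "geneC": 200.0}
treated = {"geneA": 399.0, "geneB": 99.0, "geneC": 210.0}

calls = classify_expression(control, treated)
for gene, call in calls.items():
    print(gene, call)
```

Real analyses also normalize the arrays against each other and test for statistical significance across replicates; the fixed cutoff here stands in for all of that.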
The technology is expensive, costing hundreds of dollars for a single commercial array, which can make it hard for less well-funded labs to afford. On the other hand, researchers are expected to deposit their array data into publicly accessible databases that anyone can analyze. Naturally these databases are huge, so exploring them is a bioinformatics challenge that has spawned numerous software tools.
Array technology is most useful for identifying well-known sequences, such as those derived from protein-coding genes. Indeed, arrays are available for the complete genomes of many model organisms such as E. coli, Drosophila, and mice, as well as humans. For exploring the large non-coding parts of the genome or the metagenomics of entire ecosystems, however, the need to pre-specify the sequences to be matched is a limitation.
For these and other reasons, researchers are increasingly turning to sequence-based technology, which simultaneously determines the base sequence of millions of short segments of DNA or RNA. Software then looks for matches between each sample and known genome sequences. This technology isn't cheap either, and the millions of sequences are an even larger bioinformatics challenge than the array data.
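The matching step can be pictured with a toy example. The sketch below looks for exact occurrences of a short read in a reference sequence, checking both the read itself and its reverse complement, since a read may come from either DNA strand. The function names and the sequences are invented for illustration, and exact string search is a deliberate simplification: production tools use indexed, error-tolerant alignment to cope with millions of reads and with sequencing errors.

```python
# Translation table that maps each base to its complementary base.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the sequence of the opposite DNA strand, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def find_read(reference, read):
    """Return (position, strand) for every exact match of read.

    Positions are 0-based offsets into the reference. Strand "+"
    means the read matches as given; "-" means its reverse
    complement matches.
    """
    hits = []
    for strand, query in (("+", read), ("-", reverse_complement(read))):
        start = reference.find(query)
        while start != -1:
            hits.append((start, strand))
            start = reference.find(query, start + 1)
    return hits


# A made-up 15-base reference and two made-up reads.
reference = "ATGGCGTACGTTAGC"
forward_hits = find_read(reference, "GGCGTA")
reverse_hits = find_read(reference, "GCTAAC")
print(forward_hits, reverse_hits)
```

Counting how many reads land on each gene then gives a measure of abundance, playing the role that spot brightness plays on an array.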