Many of the molecular transformations in cells take place inside complexes, each containing many protein molecules and often other molecules such as RNA.
Determining which molecules are in each complex is a critical experimental challenge for unraveling their function.
Ideally, biologists would identify not just the components but how they fit together at the atomic level, for example by using X-ray crystallography. The Nobel Prize-winning analysis of the ribosome showed that such detailed structural information also reveals how the pieces of a molecular machine interact to carry out its biochemical task.
But growing and analyzing crystals can take years of effort. In many cases researchers are happy just to know which molecules are in which complexes. As a first step, biologists have developed several clever techniques to survey thousands of proteins to see which pairs interact, and to confirm whether those interactions really happen in cells.
Identifying additional protein members of complexes requires chemical analysis, such as chromatography and increasingly powerful mass spectrometry. In contrast, to explore how DNA and RNA act in complexes, researchers can take advantage of the sequence information available for humans and most lab organisms.
To find out which DNA regions are bound by a particular transcription factor protein, for example, biologists use Chromatin ImmunoPrecipitation, or ChIP. Bound proteins from a batch of cells are chemically locked to the DNA with a cross-linker such as formaldehyde.
The technique also requires an antibody that binds only to the target protein (and, through it, to the cross-linked DNA), and that is at some stage tethered to a particle. After the DNA is broken into fragments, the particle precipitates to the bottom of the solution, carrying ("pulling down") its bound molecules, which are then separated and analyzed.
A related technique, co-immunoprecipitation, identifies proteins bound to an antibody-targeted partner. Ideally, these methods identify only components that were already bound just before the cells were broken open to begin the experiment, rather than every possible binding partner, so that they flag only biologically relevant pairings.
For DNA, the state of the art until recently was "ChIP-chip," which uses microarrays to try to match the pulled-down DNA to one of perhaps a million complementary test fragments on an analysis "chip." The advent of high-throughput sequencing has allowed "ChIP-seq," in which the sequence of the bound DNA is read directly and then matched by software against the known genome sequence to find where it came from. This was the method used recently to find enhancer sequences by their association with a known enhancer-complex protein.
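To make the matching step concrete, here is a toy sketch in Python. It assumes short, error-free reads of a fixed length k and a tiny made-up "genome," and it finds only exact matches by indexing every k-base substring of the genome. Because a sequenced fragment can come from either DNA strand, it also looks up the reverse complement of each read. Real ChIP-seq pipelines use dedicated alignment software that tolerates sequencing errors and scales to billions of bases; the function names here are illustrative only.

```python
from collections import defaultdict


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string (A<->T, C<->G)."""
    pairs = {"A": "T", "T": "A", "C": "G", "G": "C"}
    return "".join(pairs[base] for base in reversed(seq))


def build_index(genome: str, k: int) -> dict[str, list[int]]:
    """Map every length-k substring of the genome to its start positions."""
    index: dict[str, list[int]] = defaultdict(list)
    for start in range(len(genome) - k + 1):
        index[genome[start:start + k]].append(start)
    return dict(index)


def map_read(read: str, index: dict[str, list[int]]) -> list[tuple[int, str]]:
    """Return (position, strand) for each exact match of the read.

    Assumption: every read is exactly k bases long, matching the index.
    A "+" hit matches the genome as written; a "-" hit matches the
    opposite strand (found via the read's reverse complement).
    """
    hits = [(pos, "+") for pos in index.get(read, [])]
    hits += [(pos, "-") for pos in index.get(reverse_complement(read), [])]
    return hits


if __name__ == "__main__":
    # Made-up 22-base "genome" and 6-base reads, for illustration only.
    genome = "ACGTTGCAAGGCTTAACCGGAT"
    index = build_index(genome, k=6)
    print(map_read("CAAGGC", index))  # [(6, '+')]
    print(map_read("CCGGTT", index))  # [(14, '-')]
    print(map_read("GGGGGG", index))  # []
```

A hash index like this trades memory for speed: each lookup is a single dictionary access, but storing every k-mer of a real genome would be far too large, which is one reason production aligners use compressed index structures instead.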
A similar method can identify the RNA targets of RNA-binding proteins, as discussed by Scott Tenenbaum of the University at Albany at a 2007 meeting that I covered for the New York Academy of Sciences (my report is available through the "Going for the Code" link on my website's NYAS page).
Once the fragments are identified, researchers can try to pin down the sequence features that make them likely to be bound by a particular protein. When this succeeds, they can predict other targets of that protein using only computer analysis of sequence information. These bioinformatics techniques are a critical time saver, because the experiments show that each protein can bind to many different molecules in the cell.
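One common way to capture such sequence features is a position weight matrix: for each position in a set of aligned binding sites, it scores how much more often each base appears than expected by chance, and it can then be slid along new sequence to flag likely binding sites. The Python sketch below is one illustrative version, not the method used in any particular study. It assumes equal background frequencies for the four bases, a pseudocount of 1, and made-up example sites; the threshold is an arbitrary choice.

```python
import math

BASES = "ACGT"
PWM = list[dict[str, float]]  # one {base: log-odds score} dict per position


def build_pwm(sites: list[str], pseudocount: float = 1.0) -> PWM:
    """Build a log-odds position weight matrix from equal-length aligned sites.

    Assumes a uniform 25% background per base; the pseudocount keeps a base
    never seen at a position from scoring minus infinity.
    """
    pwm: PWM = []
    for i in range(len(sites[0])):
        counts = {b: pseudocount for b in BASES}
        for site in sites:
            counts[site[i]] += 1
        total = sum(counts.values())
        pwm.append({b: math.log2((counts[b] / total) / 0.25) for b in BASES})
    return pwm


def score(window: str, pwm: PWM) -> float:
    """Sum the per-position log-odds scores for one window of sequence."""
    return sum(column[base] for column, base in zip(pwm, window))


def scan(sequence: str, pwm: PWM, threshold: float) -> list[tuple[int, float]]:
    """Return (start, score) for every window scoring at or above threshold."""
    width = len(pwm)
    hits = []
    for start in range(len(sequence) - width + 1):
        s = score(sequence[start:start + width], pwm)
        if s >= threshold:
            hits.append((start, s))
    return hits


if __name__ == "__main__":
    # Hypothetical bound sites, as if recovered by a pull-down experiment.
    sites = ["TGAC", "TGAC", "TGAG", "TGAC"]
    pwm = build_pwm(sites)
    hits = scan("CCTGACGGTTGAGCC", pwm, threshold=3.0)
    print([(start, round(s, 2)) for start, s in hits])  # [(2, 4.97), (9, 3.97)]
```

The scan finds both the exact consensus (position 2) and a close variant (position 9), which illustrates why such models can suggest new candidate targets, though any predicted site would still need experimental confirmation.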
Experiments like these are revealing many of the intricate details of cellular regulation, but also how much more there is to learn.