Tuesday, December 29, 2009

Salting out and in

My latest story for Physical Review Focus concerns calculations of the tendency of various ions dissolved in water to accumulate at its surface.

This is a really old problem, discussed by some of the giants of physical chemistry. In the 1930s, for example, later Nobel winner Lars Onsager and others suggested that the termination of the electrical polarization at the surface of the water would give rise to an "image charge"--a surface charge of the same sign as the ion that creates an electric field just like that of a point charge at the mirror-image location on the other side of the interface. The repulsion from this image charge, they suggested, would keep ions away from the surface.
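
To put a rough number on that image charge: for a point charge q in a medium of relative permittivity eps1, sitting next to a flat boundary with a medium of permittivity eps2, the textbook electrostatics result is an image charge q' = q (eps1 - eps2) / (eps1 + eps2). The sketch below only evaluates that ratio. The function name and the round value of 80 for water are my own assumptions for illustration, not numbers taken from the papers.

```python
def image_charge_fraction(eps_ion_side: float, eps_other_side: float) -> float:
    """Ratio q_image / q for a point charge near a flat dielectric boundary."""
    return (eps_ion_side - eps_other_side) / (eps_ion_side + eps_other_side)


EPS_WATER = 80.0  # assumed approximate relative permittivity of liquid water
EPS_AIR = 1.0     # relative permittivity of air, to a good approximation

# Ion dissolved in water, close to the water/air surface.
fraction = image_charge_fraction(EPS_WATER, EPS_AIR)
print(f"image charge = {fraction:+.3f} x ion charge")
```

With these assumed values the image carries almost the full charge of the ion and has the same sign, which is why the classical picture predicts that ions are pushed away from the surface.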

People have apparently suspected for decades that things can't be that simple, because different ions alter the surface tension to different degrees, indicating that they are changing the energy of the surface, presumably by being part of it. But only in the past decade or so have new experiments and simulations shown that some simple negative ions like halides can be stable at the surface. Such ions at the surface of atmospheric droplets could be important catalysts, for example for breaking down ozone.

The two closely related Physical Review Letters that motivated the Focus story attribute the attractiveness of the surface position of a large negative ion to its internal polarizability. The internal rearrangement of charge, they say, allows the ion to retain much of the electrostatic attraction to nearby water molecules without creating a big hole in the water. However, I talked to another researcher who attributes the stabilization of the surface ion to a distortion it induces in the shape of the nearby surface. These both seem like potentially important effects, and both may play a role in the ultimate understanding.

The difference between the two could be important, though, for a related and even older phenomenon: the effect of various added salts on dissolved proteins. In 1888, Hofmeister ranked a series of ions in terms of their effectiveness in precipitating the proteins, and the order of the series mirrors that which was later found for the effects of ions on surface tension.

"Salting out" occurs when an added salt reduces the solubility of a protein, presumably by tying up water molecules and raising the protein's effective concentration. This effect has been used for decades to create the protein crystals needed for structural studies like x-ray crystallography.

In contrast, "salting in" makes the protein more soluble, but may denature it. Salts that have this effect may alter the repulsion between water and the hydrophobic regions of the protein. This repulsion is critical for maintaining the shape of proteins that naturally occur in the bulk of the cell, since that shape generally presents hydrophilic regions to the solution and shelters hydrophobic regions inside. (Proteins that naturally occur in membranes, by contrast, generally expose a hydrophobic stripe where they are embedded in the non-aqueous center of the membrane sheet.)

The polarizability of ions at the protein-water interface could have an important effect on this repulsion. In contrast, since the water-protein interface is entirely within the liquid, changing the shape of the interface wouldn't seem to be an option.

It is true that many proteins take on their final shapes only in the presence of "chaperone" proteins, which can also help fix them up if they become denatured. Nonetheless, any insight into the interactions between water and proteins could be very important to understanding why they fold the way they do, and how circumstances might change that folding.

Monday, December 14, 2009

Rules to Design By

Once something gets really too complicated, it's almost certain to fail. So how can computer chips, with their billions of components, work at all?

We know lots of other complicated systems, like the world economy or our own bodies. And we know those systems fail, often dramatically or tragically.

Of course, computers fail, too, as you know if you've seen a "blue screen of death" recently. But although it won't make you feel any better, those crashes almost always arise from problems with the software, not the hardware it runs on.

So how do engineers ensure that integrated circuits, diced by the score from semiconductor wafers, have a very good chance of working?

Design rules.

Simply put, design rules are a contract between the engineers designing the process for making chips and the engineers designing circuits to put on them. The process engineers guarantee that, if the circuit designers follow these rules in the geometry of their circuits, the chips will work (most of the time).

You may have heard "minimum design rule" used as a shorthand to describe a particular "generation" of computer chips, such as the "32nm" technology recently introduced by Intel. But that shorthand is somewhat misleading.

For one thing, the true gate length of the transistors--which is critical to their speed and power--is generally about half the generation name. In addition, the "coded" gate length--the length in computer-aided-design files--is not usually the smallest design rule. And this is just one of hundreds of rules that are required to define a technology.

Rather than dive into the details of transistor geometry, consider a simpler design rule: the minimum width of a metal wire connecting the transistors. Together with the spacing between the wires, this dimension determines how tightly the wiring can be packed, which for some circuits determines how many transistors can be used in a parcel of semiconductor real estate.

The minimum safe design width of a wire depends on how fine it can be made and still assure that it will conduct electricity. This has to be guaranteed even under variations over time of the process used to make it, as well as the variation in that process across a, say, 12-inch diameter wafer.

To test what number is safe, the process engineers will make a whole series of test patterns, each consisting of very long wires with various design widths. After measuring hundreds of these test structures, they have a good idea what they can reliably make.

In developing the process technology, they have hundreds of test structures, each aimed at testing one or more design rules. The structures are automatically measured on different positions on different wafers made in different processing runs. Only then will the engineers have the confidence to guarantee that any circuit that follows those rules will work.

After a long process, a set of design rules will be given to designers to use for their circuit layouts. None of this would work without computers to check whether a particular chip layout meets the rules, since the job is beyond human capacity. Therefore a key feature of the design rules is that they can be embodied in an efficient algorithm.
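
As a toy illustration of what such a check looks like, here is a minimal sketch that tests two rules, minimum wire width and minimum wire spacing, on a handful of rectangles. Everything specific in it is hypothetical: the 50 nm and 60 nm limits, the Rect type, and the layout are made up for the example, and the all-pairs loop is the simplest possible approach rather than what a production checker does.

```python
from dataclasses import dataclass
from itertools import combinations
from math import hypot

MIN_WIDTH = 50  # hypothetical minimum wire width, in nm
MIN_SPACE = 60  # hypothetical minimum wire-to-wire spacing, in nm


@dataclass(frozen=True)
class Rect:
    """Axis-aligned metal shape; coordinates in nm, with x0 < x1 and y0 < y1."""
    name: str
    x0: int
    y0: int
    x1: int
    y1: int


def width(r: Rect) -> int:
    """The narrow dimension of the shape."""
    return min(r.x1 - r.x0, r.y1 - r.y0)


def gap(a: Rect, b: Rect) -> float:
    """Shortest distance between two shapes (0 if they touch or overlap)."""
    dx = max(b.x0 - a.x1, a.x0 - b.x1, 0)
    dy = max(b.y0 - a.y1, a.y0 - b.y1, 0)
    return hypot(dx, dy)


def check(rects: list[Rect]) -> list[str]:
    """Return a list of human-readable rule violations."""
    violations = []
    for r in rects:
        if width(r) < MIN_WIDTH:
            violations.append(f"{r.name}: width {width(r)} < {MIN_WIDTH}")
    # Assumption: shapes that touch or overlap (gap == 0) are one connected wire.
    for a, b in combinations(rects, 2):
        g = gap(a, b)
        if 0 < g < MIN_SPACE:
            violations.append(f"{a.name}/{b.name}: spacing {g:.0f} < {MIN_SPACE}")
    return violations


layout = [
    Rect("w1", 0, 0, 1000, 50),
    Rect("w2", 0, 110, 1000, 160),  # 60 nm above w1: legal
    Rect("w3", 0, 200, 1000, 240),  # 40 nm wide and only 40 nm above w2
]
violations = check(layout)
for v in violations:
    print(v)
```

Each test looks only at one shape or one pair of neighbors, so the checks are purely local. Comparing every pair costs time proportional to the square of the number of shapes, so a real tool would need some kind of spatial index to cope with billions of them.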

The design-rule paradigm has been extraordinarily successful. But its success depends on a characteristic of the failures it is intended to prevent: they are all dependent on the local properties of the circuit. Some of the more complex rules involve quantities like the area of the metal "antenna" that is connected to a particular device at some point during processing. And frequently the engineers will play it safe by crafting the rules to cover the worst possible situation. But if the rules are chosen and followed properly, there is no chance for a combination of small choices that satisfy the rules to join together to cause a problem in the larger circuit. That's what makes a chip with a billion transistors possible.


Friday, December 11, 2009

Chromatin Compartments

The fractal packing of some DNA is just one of the interesting results from the recent Science paper by Lieberman-Aiden and colleagues. Of greater practical importance is the ability of their experimental technique to assign each region of DNA to one of two compartments.

The fact that some DNA regions, called heterochromatin, are packed more densely than other regions, called euchromatin, was discovered 80 years ago, by observing darker and lighter regions of stained nuclei under the optical microscope. Researchers have since learned that the heterochromatin is more densely packed, and that the genes it contains are transcriptionally silent. Heterochromatin also tends to segregate to the periphery of the nucleus, but to avoid the nuclear pores through which gene products are exported.

The Science authors did not mention this well-known classification. However, when they measured which regions of the genome were close together in the clumped DNA, they found that they could divide the mappable regions of the genome into two distinct "compartments." Regions from compartment A were more likely to lie close to other regions from compartment A, and similarly for compartment B. Importantly, they could make this assignment even for regions on different chromosomes, suggesting that the compartments represent regions of the nucleus in which segments of different chromosomes mingle.
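
The post above does not say how the split into two compartments is computed, so the following is only a sketch of one generic way to do it, under my own assumptions: correlate the contact profiles of every pair of regions, then take the sign of the leading eigenvector of that correlation matrix (found here by plain power iteration). The contact counts are invented for the example, and the labels "A" and "B" are arbitrary.

```python
from math import sqrt


def pearson(u, v):
    """Pearson correlation between two equal-length contact profiles."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = sqrt(sum((a - mu) ** 2 for a in u))
    sv = sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv)


def leading_eigenvector(m, steps=200):
    """Power iteration; starts from a unit vector on region 0."""
    n = len(m)
    v = [1.0] + [0.0] * (n - 1)
    for _ in range(steps):
        w = [sum(m[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


def compartments(contacts):
    """Label each region by the sign of the leading eigenvector entry."""
    n = len(contacts)
    corr = [[pearson(contacts[i], contacts[j]) for j in range(n)]
            for i in range(n)]
    vec = leading_eigenvector(corr)
    # Label names are arbitrary; region 0 ends up defining compartment "A".
    return ["A" if x >= 0 else "B" for x in vec]


# Invented contact counts: regions 0-2 mostly touch each other, as do 3-5.
contacts = [
    [9, 8, 7, 1, 2, 1],
    [8, 9, 8, 2, 1, 1],
    [7, 8, 9, 1, 1, 2],
    [1, 2, 1, 9, 7, 8],
    [2, 1, 1, 7, 9, 8],
    [1, 1, 2, 8, 8, 9],
]
labels = compartments(contacts)
print(labels)
```

Nothing in this procedure cares whether two regions sit on the same chromosome, which fits the observation that the assignment works across chromosomes.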

The researchers also found that regions in compartment B were much more likely to be in close contact, so they designated that compartment "closed," and the other one "open." But Erez Lieberman-Aiden told me that "it seemed best to use terminology attached to things that we can probe and which clearly correspond to our compartments." Indeed, the regions they call "open" correspond well to the regions that are accessible to DNA-digesting enzymes, but do not correspond to the light and dark bands that appear on the chromosomes during cell division.

Although the relationship to microscopically-observed partitioning may need clarification, the ability to globally map closed and open regions of the genome could be a very powerful tool. Looking at different cell types, for example, could reveal overall "signatures" in the chromosome arrangements. Such cell-type-specific patterns are already known to exist in the arrangement of histone modifications, which affect the nucleosome arrangement.

In addition, the chromatin structure enters into regulation of individual genes. Enhancer elements in the DNA sequence, for example, can affect the expression of quite distant genes, while an intervening insulator region can block that effect. Models of these influences generally involve large loops of DNA, but some also include the notion of a densely-packed and transcriptionally silent "scaffold" region that is reminiscent of the closed compartment. Determining which sections of the sequence are in the closed or open arrangements, especially in cells with different types of activity, could add some much-needed experimental visibility into the regulatory activity of these critically important elements.

[For physicist readers: as I was wrapping up this entry, the latest Physics Today arrived with a news story on this subject.]

Thursday, December 10, 2009

Fractal DNA

Packing meters of DNA into a nucleus with a diameter a million times smaller is quite a challenge. Wrapping the DNA around nucleosomes, and arranging these nucleosomes into 30nm fibers, both help, but these structures must themselves be packed densely. Beautiful new research, reported in Science in October, supports a 20-year-old idea that some DNA is arranged in an exotic knot-free fractal structure that is particularly easy to unpack.

Alexander Grosberg, now at New York University, predicted (1M pdf) in 1988 that a polymer would initially collapse into a "crumpled globule," in which nearby segments of the chain would be closer to each other than they would be in the final, equilibrium globule. Creating the equilibrium structure requires "reptation," in which the polymer chain threads its way through its own loops, forming knots. This gets very slow for a long chain like DNA. Grosberg also applied (1M pdf) these ideas to DNA, and explored whether fractal patterns in the sequence could stabilize it. But experimental evidence was limited.

Now Erez Lieberman-Aiden and his coworkers at MIT and Harvard have devised a clever way to probe the large-scale folding structure of DNA, and found strong support for this picture.

The experiment is similar to chromatin immunoprecipitation techniques that look for DNA regions that are paired to target proteins by crosslinking and precipitating the pairs and then sequencing the DNA. In this case, however, the researchers crosslink nearby sections of the collapsed DNA to each other. To sequence both sections of DNA, they first splice the ends of the pairs to each other to form a loop, and then break them apart at a different position in the loop. The result is a set of sequence pairs that were physically adjacent in the cell; their positions along the DNA are found by matching them to the known genome.

The researchers found that the number of neighboring sequences decreases as a power law of their sequence separation, with an exponent very close to -1, for sequence distances in the range of 0.5 - 7 million bases. This is precisely the expected exponent for the crumpled--or fractal--globule. This structure is reminiscent of the space-filling Peano curve with its folds, folds of folds, and folds of folds of folds forming a hierarchy. In contrast, the equilibrium globule has an exponent of -3/2.
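
For readers who want to see what "an exponent very close to -1" means in practice, here is a minimal sketch of the usual estimate: fit a straight line to the logarithm of the contact frequency against the logarithm of the sequence separation, and read off the slope. The data below are synthetic, generated from exact power laws over the quoted range; they are not the measurements from the paper, and the function name is my own.

```python
from math import log


def loglog_slope(xs, ys):
    """Least-squares slope of log(y) versus log(x), i.e. the power-law exponent."""
    lx = [log(x) for x in xs]
    ly = [log(y) for y in ys]
    n = len(lx)
    mx, my = sum(lx) / n, sum(ly) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    sxx = sum((a - mx) ** 2 for a in lx)
    return sxy / sxx


# Sequence separations spanning the quoted range, in bases.
separations = [0.5e6, 1e6, 2e6, 4e6, 7e6]

# Synthetic contact frequencies (arbitrary units) for the two models.
fractal_globule = [s ** -1.0 for s in separations]
equilibrium_globule = [s ** -1.5 for s in separations]

fractal_exponent = loglog_slope(separations, fractal_globule)
equilibrium_exponent = loglog_slope(separations, equilibrium_globule)
print(f"fractal globule:     {fractal_exponent:.2f}")
print(f"equilibrium globule: {equilibrium_exponent:.2f}")
```

Note that the quoted range, 0.5 to 7 million bases, spans only a little more than one order of magnitude, so a fit like this has limited power to tell nearby exponents apart when the data are noisy.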

As a rule, I don't put a lot of stock in claims that a structure is fractal simply by seeing a power law, or a straight line on a double-logarithmic plot, unless the data cover at least a couple of orders of magnitude. After all, a true fractal is self-similar, meaning that the picture looks exactly the same at low resolution as at high resolution, and in many cases there's no reason to think that fine structure resembles the coarse structure at all.

But when there's a good theoretical argument for similar behavior at different scales, I relax my standards of evidence a bit. For example, there's a good argument that the random walk of a diffusing molecule into neighboring volumes looks similar, whatever the size of the volume you consider--this is a known fractal. The standard polymer model is just a self-avoiding random walk, which adds the constraint that two parts of the chain can't occupy the same space. The DNA data are different in detail, but the mathematical motivation is similar.

At the conference I covered last week in Cambridge, MA, Lieberman-Aiden noted that the fractal structure has precisely the features you would want for a DNA library: it is compact, organized, and accessible. The densely packed structure keeps nearby sequence regions close in space, and parts of it can easily be unfolded to allow the transcription machinery to get access to it. Co-author Maxim Imakaev has verified all of these features with simulations of the collapsing DNA.

These experiments and simulations are fantastic, and the fractal globule structure makes a lot of sense. But this dense structure makes it all the more amazing what must happen when cells divide, making a complete copy of each segment of DNA (except the telomeres), and ensuring that the epigenetic markers on the DNA and histones of one copy are replicated on the other. It's still an awesome process.

Monday, December 7, 2009

Short RNAs to the Rescue

Ever since scientists realized, just over a decade ago, that exposing cells to short snippets of RNA could affect the activity of matching genes, they have dreamed of harnessing this RNA interference, or RNAi, to fight diseases. In the past week, two groups have announced progress toward that goal, treating chimpanzees with hepatitis C and mice with lung cancer.

RNAi, which rapidly earned a 2006 Nobel Prize, is just one facet of the many ways in which short RNAs regulate gene activity. Researchers have since found numerous types of naturally occurring short RNA that play important roles in development, stem cells, cancer, and other biological processes. These RNA-based mechanisms could seriously revise the emerging understanding of how cellular processes are controlled.

Over the same period, manipulating genetic activity with short RNAs has become an essential tool in biology labs. Cells process various forms of short RNA, such as short-hairpin RNA (shRNA) and small interfering RNA (siRNA), into RNA-protein complexes that reduce (usually) how much protein is made from a messenger RNA that includes a complementary (or nearly complementary) sequence.
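
The phrase "complementary (or nearly complementary)" can be made concrete with a small sketch: a short RNA pairs with the stretch of messenger RNA that spells out its reverse complement, so finding candidate targets amounts to a string search that tolerates a few mismatches. The sequences below are invented, and the real targeting rules in cells are more subtle than simple mismatch counting, so treat this only as an illustration of base pairing.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    """Sequence that would base-pair with `rna`, read in the same 5'-to-3' sense."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def target_sites(mrna: str, guide: str, max_mismatches: int = 0):
    """Return (start, mismatches) for each mRNA window the guide could pair with."""
    site = reverse_complement(guide)
    hits = []
    for start in range(len(mrna) - len(site) + 1):
        window = mrna[start:start + len(site)]
        mismatches = sum(1 for a, b in zip(window, site) if a != b)
        if mismatches <= max_mismatches:
            hits.append((start, mismatches))
    return hits


guide = "UAGCUUAUC"                      # hypothetical short RNA
mrna = "AUGGCCGAUAAGCUAUUUGAUAAGGUACC"   # hypothetical messenger RNA

exact = target_sites(mrna, guide)
near = target_sites(mrna, guide, max_mismatches=1)
print(exact)
print(near)
```

Allowing even one mismatch turns up a second site in this made-up message, which hints at why short RNAs can affect genes other than the intended one.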

This technique gives researchers a quick way to learn about what a particular gene does, at least in culture dishes, sidestepping the laborious creation and breeding of genetically-modified critters. (Or if they do put in the time, they can insert genes that allow them to controllably trigger RNAi to knock down a gene only in particular cells or after it has completed an indispensable task in helping an organism to grow.)

But affecting genetic regulation in patients faces the challenges of "delivery" that are well-known in the pharmaceutical industry: To have a beneficial effect, the short RNA must survive in the body, get inside the right cells in large quantities, and not cause too many other effects in other cells. The New York Academy of Sciences has a regular series on the challenges of using RNA for treatment, and I covered one very interesting meeting in 2008.

Molecular survival is the first challenge. Researchers have developed various chemical modifications that help RNA (or a lookalike) withstand assaults by enzymes that degrade rogue nucleic acids. Santaris, for example, which helped in the hepatitis project, has developed proprietary modifications it calls "locked nucleic acids," or LNA. Other researchers and companies are exploring similar techniques.

Getting the protected RNA to the right tissue is another challenge. Foreign chemicals are naturally cycled to the liver for processing, so it's fairly easy to target this organ. For this reason, the hepatitis results don't really prove that the technique is useful for other tissues. The Santaris release also neglects to mention any publication associated with the research.

The mouse lung cancer result appears in Oncogene. The lead Yale researcher, Frank Slack, regularly studies short RNAs in the worm C. elegans, as I described in a recent report from the New York Academy of Sciences. In this work, he teamed with Mirna Therapeutics, which aims to use the short-RNA-delivery vehicle to replace naturally occurring microRNA that are depleted in cancer, like the let-7 they used for this study. The mouse cancers did not disappear, but they regressed to about a third of their previous size, according to the release. Mirna says that since they are replacing natural microRNAs, their technique shouldn't induce many side effects in other tissues.

A further risk for small-RNA delivery is immune responses. The field of gene therapy is only now recovering from the 1999 death of Jesse Gelsinger in what looks like a massive immune response to the virus used to insert new genes in his cells. Although the short-RNA response will be different, some cellular systems are primed to respond to the foreign nucleic acids brought in by viruses.

It's likely that there will be many twists and turns along the way, and I haven't solicited expert opinions on these studies, but they seem to be intriguing steps toward the goal of using RNA not just to study biology, but to change people's lives.

Wednesday, December 2, 2009

Massachusetts Dreaming

Today I'm taking Amtrak to Cambridge--our fair city--MA, for an exciting back-to-back-to-back trio of conferences at the MIT/Harvard Broad (rhymes with "road") Center.

Two of the conferences are described as satellites to RECOMB (Research in Computational Molecular Biology), even though that meeting was in Tucson in May. One of these is on regulatory genomics and the other on systems biology. The third is the fourth meeting of the DREAM assessment of methods for modeling biological networks, a series I've covered since its organizational meeting at the New York Academy of Sciences in 2006.

There's a lot in common between these conferences, so it's not always easy to notice the boundaries. The most tightly focused is DREAM--Dialog on Reverse-Engineering Assessment and Methods. The goal is simple to state: what are the best ways to construct networks that mimic real biological networks, and how much confidence should we have in the results. In practice, things are not so straightforward, and border on the philosophical question of how to distinguish models and "reality." The core activity of DREAM is a competition to build networks based on diverse challenges.

The Regulatory Genomics meeting covers detailed mechanisms of gene regulation, often focusing on more formal and algorithmic aspects than would be expected in a pure biology meeting. The Systems Biology meeting addresses techniques, usually based on high-throughput experimental tools, for attacking large networks head on, rather than taking the more traditional pathway-by-pathway approach.

I'll be writing synopses of the invited talks and the DREAM challenges for an eBriefing at NYAS, but I'll be free to relax and enjoy the contributed talks and posters. This promises to be a rich and exhausting five days.

Tuesday, December 1, 2009

Packing DNA Beads

The dense packing of DNA in the nucleus of eukaryotes strongly affects how genes within it are expressed, with some regions much more accessible to the transcription machinery than others. At the shortest scales, the accessibility of the DNA double helix is reduced where it is wound around groups of eight histone proteins to form nucleosomes, and the precise position of the nucleosomes in the sequence affects which genes are active.

At a slightly larger scale, the nucleosomes are rather closely packed along the DNA. They can remain floppy, like beads on a string, or they can fold into rods of densely packed beads, which further reduces the accessibility of their DNA. Other proteins in the nucleus, notably the histone H1, help to bind together this dense packing. These rods can pack further, with the help of other proteins.

The histone proteins that form the core of the nucleosome, two copies each of H2A, H2B, H3, and H4, have stray "tails" extending from the core. Small chemical changes at particular positions along these tails can have surprisingly large influence on the expression of the associated DNA. For example, the modification H3K27me3 (three methyl groups attached to the lysine at position 27 on the tail of histone H3) represses expression, while acetylation of the same amino acid, H3K27ac, activates expression. A more substantial modification, in which histone H2A is replaced by a variant called H2A.Z, also modifies expression.
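
The shorthand packs several facts into a few characters, so here is a small sketch that unpacks it. It handles only the two kinds of mark mentioned above (methylation and acetylation) on lysine; the regular expression, the HistoneMark type, and the convention that a missing count means a single group are my own simplifications, not an official grammar for the nomenclature.

```python
import re
from typing import NamedTuple

MARKS = {"me": "methyl", "ac": "acetyl"}  # only the marks discussed here
RESIDUES = {"K": "lysine"}                # one-letter amino acid codes


class HistoneMark(NamedTuple):
    histone: str   # e.g. "H3"
    residue: str   # amino acid that carries the mark
    position: int  # position along the histone tail
    mark: str      # chemical group attached
    count: int     # how many copies of the group


# histone name, residue letter, position, mark type, optional count
PATTERN = re.compile(r"^(H\d[A-Z]?)([A-Z])(\d+)(me|ac)(\d?)$")


def parse_mark(label: str) -> HistoneMark:
    """Split a label such as 'H3K27me3' into its parts."""
    match = PATTERN.match(label)
    if match is None:
        raise ValueError(f"unrecognized histone mark: {label!r}")
    histone, residue, position, mark, count = match.groups()
    return HistoneMark(
        histone=histone,
        residue=RESIDUES.get(residue, residue),
        position=int(position),
        mark=MARKS[mark],
        count=int(count) if count else 1,  # assumed: no digit means one group
    )


repressive = parse_mark("H3K27me3")
activating = parse_mark("H3K27ac")
print(repressive)
print(activating)
```

The two example labels differ only in the attached group, yet by the account above they have opposite effects on expression.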

The detailed mechanisms by which the modifications affect expression, such as changing the wrapping of nucleosomes, the packing of nucleosomes, or recruiting of other proteins in the nucleus, are areas of active research.

Since there are dozens of possible histone tail modifications, there are vast numbers of possible combinations of modifications. Some researchers have proposed that these combinations could each prescribe different expression patterns, for example during development. However, the evidence for a combinatorial "histone code" analogous to the three-base codons of the genetic code remains weak.

Nonetheless, proteins that can modify the tails, either adding or removing a chemical group, can have lasting effects on the activity of the underlying genes. The sirtuin proteins that are candidates for longevity-extending drugs, for example, are best known for their role as histone deacetylases.

Some histone modifications can be passed down through cell division or reproduction, so they qualify as epigenetic changes. In contrast to the natural replication of the mirror-image DNA sequence, replicating histone modifications requires a much more complicated process.

Changes in the pattern of histone modifications are found in many basic biological processes, including development, stem-cell maintenance, and cancer. Particular modification patterns have been used to find specific functional sequences within the DNA, such as transcription start sites and enhancers. For these reasons, the ENCODE project mapped modifications as part of their survey of a select part of the human genome for intense study.

Understanding the mechanisms and roles of DNA organization and how it is changed will be essential to a complete picture of gene regulation.