Thursday, December 10, 2009

Fractal DNA

Packing meters of DNA into a nucleus with a diameter a million times smaller is quite a challenge. Wrapping the DNA around nucleosomes, and arranging these nucleosomes into 30nm fibers, both help, but these structures must themselves be packed densely. Beautiful new research, reported in Science in October, supports a 20-year-old idea that some DNA is arranged in an exotic knot-free fractal structure that is particularly easy to unpack.

Alexander Grosberg, now at New York University, predicted (1M pdf) in 1988 that a polymer would initially collapse into a "crumpled globule," in which nearby segments of the chain would be closer to each other than they would be in the final, equilibrium globule. Creating the equilibrium structure requires "reptation," in which the polymer chain threads its way through its own loops, forming knots. This gets very slow for a long chain like DNA. Grosberg also applied (1M pdf) these ideas to DNA, and explored whether fractal patterns in the sequence could stabilize it. But experimental evidence was limited.

Now Erez Lieberman-Aiden and his coworkers at MIT and Harvard have devised a clever way to probe the large-scale folding structure of DNA, and found strong support for this picture.

The experiment is similar to chromatin immunoprecipitation techniques that look for DNA regions that are paired to target proteins by crosslinking and precipitating the pairs and then sequencing the DNA. In this case, however, the researchers crosslink nearby sections of the collapsed DNA to each other. To sequence both sections of DNA, they first splice the ends of the pairs to each other to form a loop, and then break them apart at a different position in the loop. The result is a set of sequence pairs that were physically adjacent in the cell; their positions along the DNA are found by matching them to the known genome.

The researchers found that the number of neighboring sequences decreases as a power law of their sequence separation, with an exponent very close to -1, for sequence distances in the range of 0.5 - 7 million bases. This is precisely the expected exponent for the crumpled--or fractal--globule. This structure is reminiscent of the space-filling Peano curve with its folds, folds of folds, and folds of folds of folds forming a hierarchy. In contrast, the equilibrium globule has an exponent of -3/2.
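To make the comparison of exponents concrete, here is a minimal sketch of how one might estimate such an exponent: fit a straight line to the logarithm of the contact count against the logarithm of the sequence separation. The data below are synthetic, generated only to illustrate the two slopes, and the function name fit_power_law_exponent is hypothetical; this is not the analysis pipeline used in the paper.

```python
import math

def fit_power_law_exponent(separations, contact_counts):
    """Least-squares slope of log(count) versus log(separation)."""
    xs = [math.log(s) for s in separations]
    ys = [math.log(c) for c in contact_counts]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    return covariance / variance

# Synthetic separations spanning roughly 0.5 to 7 million bases (illustration only).
separations = [0.5e6 * 1.3 ** k for k in range(11)]

# Idealized contact counts for the two models; the prefactors are arbitrary.
fractal_counts = [1e9 * s ** -1.0 for s in separations]
equilibrium_counts = [1e12 * s ** -1.5 for s in separations]

fractal_slope = fit_power_law_exponent(separations, fractal_counts)
equilibrium_slope = fit_power_law_exponent(separations, equilibrium_counts)

print(f"fractal globule exponent:     {fractal_slope:.2f}")
print(f"equilibrium globule exponent: {equilibrium_slope:.2f}")
```

With real data the points scatter around the line, and the fitted range covers only a little more than one order of magnitude, which is why the theoretical argument matters as much as the fit.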

As a rule, I don't put a lot of stock in claims that a structure is fractal simply by seeing a power law, or a straight line on a double-logarithmic plot, unless the data cover at least a couple of orders of magnitude. After all, a true fractal is self-similar, meaning that the picture looks exactly the same at low resolution as at high resolution, and in many cases there's no reason to think that fine structure resembles the coarse structure at all.

But when there's a good theoretical argument for similar behavior at different scales, I relax my standards of evidence a bit. For example, there's a good argument that the rate at which the random walk of a diffusing molecule moves into neighboring volumes looks similar, whatever the size of the volume you consider--this is a known fractal. The standard polymer model is just a self-avoiding random walk, which adds the constraint that two parts of the chain can't occupy the same space. The DNA data are different in detail, but the mathematical motivation is similar.

At the conference I covered last week in Cambridge, MA, Lieberman-Aiden noted that the fractal structure has precisely the features you would want for a DNA library: it is compact, organized, and accessible. The densely packed structure keeps nearby sequence regions close in space, and parts of it can easily be unfolded to allow the transcription machinery to get access to it. Co-author Maxim Imakaev has verified all of these features with simulations of the collapsing DNA.

These experiments and simulations are fantastic, and the fractal globule structure makes a lot of sense. But this dense structure makes it all the more amazing what must happen when cells divide, making a complete copy of each segment of DNA (except the telomeres), and ensuring that the epigenetic markers on the DNA and histones of one copy are replicated on the other. It's still an awesome process.

Monday, December 7, 2009

Short RNAs to the Rescue

Ever since scientists realized, just over a decade ago, that exposing cells to short snippets of RNA could affect the activity of matching genes, they have dreamed of harnessing this RNA interference, or RNAi, to fight diseases. In the past week, two groups have announced progress toward that goal, treating chimpanzees with hepatitis C and mice with lung cancer.

RNAi, which rapidly earned a 2006 Nobel Prize, is just one facet of the many ways in which short RNAs regulate gene activity. Researchers have since found numerous types of naturally occurring short RNA that play important roles in development, stem cells, cancer, and other biological processes. These RNA-based mechanisms could seriously revise the emerging understanding of how cellular processes are controlled.

Over the same period, manipulating genetic activity with short RNAs has become an essential tool in biology labs. Cells process various forms of short RNA, such as short-hairpin RNA (shRNA) and small interfering RNA (siRNA), into RNA-protein complexes that (usually) reduce how much protein is made from a messenger RNA that includes a complementary (or nearly complementary) sequence.

This technique gives researchers a quick way to learn about what a particular gene does, at least in culture dishes, sidestepping the laborious creation and breeding of genetically-modified critters. (Or if they do put in the time, they can insert genes that allow them to controllably trigger RNAi to knock down a gene only in particular cells or after it has completed an indispensable task in helping an organism to grow.)

But affecting genetic regulation in patients faces the challenges of "delivery" that are well-known in the pharmaceutical industry: To have a beneficial effect, the short RNA must survive in the body, get inside the right cells in large quantities, and not cause too many other effects in other cells. The New York Academy of Sciences has a regular series on the challenges of using RNA for treatment, and I covered one very interesting meeting in 2008.

Molecular survival is the first challenge. Researchers have developed various chemical modifications that help RNA (or lookalikes) withstand assaults by enzymes that degrade rogue nucleic acids. Santaris, for example, which helped in the hepatitis project, has developed proprietary modifications it calls "locked nucleic acids," or LNA. Other researchers and companies are exploring similar techniques.

Getting the protected RNA to the right tissue is another challenge. Foreign chemicals are naturally cycled to the liver for processing, so it's fairly easy to target this organ. For this reason, the hepatitis results don't really prove that the technique is useful for other tissues. The Santaris release also neglects to mention any publication associated with the research.

The mouse lung cancer result appears in Oncogene. The lead Yale researcher, Frank Slack, regularly studies short RNAs in the worm C. elegans, as I described in a recent report from the New York Academy of Sciences. In this work, he teamed with Mirna Therapeutics, which aims to use the short-RNA-delivery vehicle to replace naturally occurring microRNAs that are depleted in cancer, like the let-7 they used for this study. The mouse cancers did not disappear, but they regressed to about a third of their previous size, according to the release. Mirna says that since they are replacing natural microRNAs, their technique shouldn't induce many side effects in other tissues.

A further risk for small-RNA delivery is immune responses. The field of gene therapy is only now recovering from the 1999 death of Jesse Gelsinger in what looks like a massive immune response to the virus used to insert new genes in his cells. Although the short-RNA response will be different, some cellular systems are primed to respond to the foreign nucleic acids brought in by viruses.

It's likely that there will be many twists and turns along the way, and I haven't solicited expert opinions on these studies, but they seem to be intriguing steps toward the goal of using RNA not just to study biology, but to change people's lives.

Wednesday, December 2, 2009

Massachusetts Dreaming

Today I'm taking Amtrak to Cambridge--our fair city--MA, for an exciting back-to-back-to-back trio of conferences at the MIT/Harvard Broad (rhymes with "road") Center.

Two of the conferences are described as satellites to RECOMB (Research in Computational Molecular Biology), even though that meeting was in Tucson in May. One of these is on regulatory genomics and the other on systems biology. The third is the fourth meeting of the DREAM assessment of methods for modeling biological networks, a series I've covered since its organizational meeting at the New York Academy of Sciences in 2006.

There's a lot in common between these conferences, so it's not always easy to notice the boundaries. The most tightly focused is DREAM--Dialog on Reverse-Engineering Assessment and Methods. The goal is simple to state: what are the best ways to construct networks that mimic real biological networks, and how much confidence should we have in the results? In practice, things are not so straightforward, and border on the philosophical question of how to distinguish models and "reality." The core activity of DREAM is a competition to build networks based on diverse challenges.

The Regulatory Genomics meeting covers detailed mechanisms of gene regulation, often focusing on more formal and algorithmic aspects than would be expected in a pure biology meeting. The Systems Biology meeting addresses techniques, usually based on high-throughput experimental tools, for attacking large networks head on, rather than taking the more traditional pathway-by-pathway approach.

I'll be writing synopses of the invited talks and the DREAM challenges for an eBriefing at NYAS, but I'll be free to relax and enjoy the contributed talks and posters. This promises to be a rich and exhausting five days.

Tuesday, December 1, 2009

Packing DNA Beads

The dense packing of DNA in the nucleus of eukaryotes strongly affects how genes within it are expressed, with some regions much more accessible to the transcription machinery than others. At the shortest scales, the accessibility of the DNA double helix is reduced where it is wound around groups of eight histone proteins to form nucleosomes, and the precise position of the nucleosomes in the sequence affects which genes are active.

At a slightly larger scale, the nucleosomes are rather closely packed along the DNA. They can remain floppy, like beads on a string, or they can fold into rods of densely packed beads, which further reduces the accessibility of their DNA. Other proteins in the nucleus, notably the histone H1, help to bind together this dense packing. These rods can pack further, with the help of other proteins.

The histone proteins that form the core of the nucleosome, two copies each of H2A, H2B, H3, and H4, have stray "tails" extending from the core. Small chemical changes at particular positions along these tails can have surprisingly large influence on the expression of the associated DNA. For example, the modification H3K27me3 (three methyl groups attached to the lysine at position 27 on the tail of histone H3) represses expression, while acetylation of the same amino acid, H3K27ac, activates expression. A more substantial modification, in which histone H2A is replaced by a variant called H2A.Z, also modifies expression.

The detailed mechanisms by which the modifications affect expression, such as changing the wrapping of nucleosomes, the packing of nucleosomes, or recruiting of other proteins in the nucleus, are areas of active research.

Since there are dozens of possible histone tail modifications, there are vast numbers of possible combinations of modifications. Some researchers have proposed that these combinations could each prescribe different expression patterns, for example during development. However, the evidence for a combinatorial "histone code" analogous to the three-base codons of the genetic code remains weak.

Nonetheless, proteins that can modify the tails, either adding or removing a chemical group, can have lasting effects on the activity of the underlying genes. The sirtuin proteins that are candidates for longevity-extending drugs, for example, are best known for their role as histone deacetylases.

Some histone modifications can be passed down through cell division or reproduction, so they qualify as epigenetic changes. In contrast to the natural replication of the mirror-image DNA sequence, replicating histone modifications requires a much more complicated process.

Changes in the pattern of histone modifications are found in many basic biological processes, including development, stem-cell maintenance, and cancer. Particular modification patterns have been used to find specific functional sequences within the DNA, such as transcription start sites and enhancers. For these reasons, the ENCODE project mapped modifications as part of its intensive survey of a select part of the human genome.

Understanding the mechanisms and roles of DNA organization and how it is changed will be essential to a complete picture of gene regulation.


 

Monday, November 30, 2009

The Honest Broker

In case you hadn't noticed, discussion of global warming has become somewhat polarized. Amid accusations, on the one hand, that industry-financed non-experts deliberately sow confusion, and on the other that a leftist cabal exaggerates the risks and threatens our economy, Roger A. Pielke, Jr. is something of an anomaly.

A professor of environmental studies at the University of Colorado, Pielke is an expert who endorses the broad consensus that humans are causing dangerous changes. But he also criticizes scientists like those on the Intergovernmental Panel on Climate Change for stifling legitimate dissent in the service of narrow policy options. In his 2007 book, The Honest Broker: Making sense of science in policy and politics, Pielke touches on climate change only tangentially as he outlines how scientists can more constructively contribute to contentious policy decisions.

Reading the title, I thought at first that I understood what Pielke meant by an "Honest Broker." As an undergraduate thirty years ago I dabbled in the still-young academic field of Science, Technology, and Society. Books like Advice and Dissent: Scientists in the political arena, by Joel Primack and Frank von Hippel, illustrated how scientists who step outside their specialized knowledge to advocate particular policies risk both their own credibility and that of science. To preserve the authority of expertise, scientists should be careful and clear when they speak outside of their specialty.

But the intervening years, Pielke says, have shown that the whole notion that science provides objective information that is then handed over to inform policy makers, the so-called linear model, is naïve and unrealistic. Only rarely, when people share goals and the relation between causes and effects is simple, can scientists meaningfully contribute by sticking to their fields of expertise as a "Pure Scientist" or by providing focused answers to policy questions as a "Science Arbiter."

More frequently, people do not share goals and the causal relationships are more complicated. Scientists who wish to contribute to these policy debates are naturally pulled into the role of "Issue Advocate," marshalling the science in support of a narrowed range of politically-supported options. Although this is a useful role, Pielke warns, scientists often drift into it unwittingly. As they deny any political influence on their scientific judgments, these "stealth issue advocates" can damage the authority of science even as they obscure the true nature of the political decision.

To address this problem, Pielke pleads for more scientists to act as "Honest Brokers of Policy Alternatives," to give his complete description. Such scientists, presumably as part of multi-disciplinary committees like the now-defunct Congressional Office of Technology Assessment, would act to expand the available policy alternatives rather than restrict them. Unlike the science arbiter, Pielke's honest broker recognizes an inseparability of policy issues from the corresponding scientific issues, but nonetheless provides a palette of options that are grounded in evidence.

In my technology research, I've had my own complaints about the analogous linear model. I've found that pure research often leads to more research, rather than to the promised applied research and products that make everyone's lives better. But Pielke's criticism of the linear model is more fundamental. He correctly notes that in many complex situations, scientific knowledge does not, on its own, determine a policy outcome. But he then seems to conclude that there is no legitimate role for objectively valid science that can narrow policy options.

In discussing Bjørn Lomborg's The Skeptical Environmentalist, for example, Pielke says "Followers of the linear model would likely argue that it really does matter for policy whether or not the information presented in TSE is 'junk' or 'sound' science." He then shows that for some criticisms of the book, the validity of the science was irrelevant to policy. But many of the standard talking points raised by global-warming skeptics are well within the bounds of science, so clarifying them is a useful narrowing of options, even if it doesn't lead to a single, unanimously correct policy.

Nonetheless, Pielke's short, readable book provides a helpful guide for what we can hope for in policy debates involving science, and how scientists can most productively contribute. What we can't hope for is a single, science-endorsed answer to complex issues that trade off competing interests and conflicting values. For that, we have politics.

Wednesday, November 25, 2009

Green Computing

Supercomputers run the vast simulations that help us to better predict climate change--but they also contribute to it through their energy consumption.

The lifetime cost of powering supercomputers and data centers is now surpassing the cost of buying the machines in the first place. Computers, small, medium, and large, have become a significant fraction of energy consumption in developed countries. And although the authors of Superfreakonomics may not understand it, the carbon dioxide emitted in supplying this energy will absorb, during its time in the atmosphere, some 100,000 times more heat energy than that.

To draw attention to this issue, for the last two years researchers at Virginia Tech have been reordering the Top500 list of the fastest supercomputers, ranking them according to their energy efficiency in the Top Green500 list. I have a wee news story out today on the newest list, released last Thursday, on the web site of the Communications of the Association for Computing Machinery. The top-ranked systems are from the QPACE project in Germany, and are designed for quantum chromodynamics calculations.

Calculating efficiency isn't as straightforward as it sounds. The most obvious metric is the number of operations you get for a certain amount of energy. This is essentially what Green500 measures in its MFLOPS/W, since MFLOPS is millions of floating-point operations per second and watts is joules per second.
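The unit arithmetic is easy to check with a few lines of Python. This is only a sketch: the machine's speed and power draw below are invented round numbers rather than figures from the Green500 list, and the function names are hypothetical.

```python
def mflops_per_watt(flops_per_second, watts):
    """Efficiency in MFLOPS/W: millions of operations per second, per watt."""
    return flops_per_second / 1e6 / watts

def joules_per_operation(efficiency_mflops_per_watt):
    """Since a watt is a joule per second, the seconds cancel:
    1 MFLOPS/W is one million operations per joule."""
    operations_per_joule = efficiency_mflops_per_watt * 1e6
    return 1.0 / operations_per_joule

# Hypothetical machine: 50 teraflops sustained while drawing 100 kilowatts.
efficiency = mflops_per_watt(flops_per_second=50e12, watts=100e3)
energy_per_op = joules_per_operation(efficiency)

print(f"{efficiency:.0f} MFLOPS/W")
print(f"{energy_per_op:.1e} joules per operation")
```

For these made-up numbers the machine delivers 500 MFLOPS/W, which is the same as spending 2 nanojoules on each floating-point operation.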

As a rule, however, this metric favors smaller systems. It also favors slower operation, which is not what people want from their supercomputers. Some of the performance lost by running slowly can be recovered by doing many operations in parallel, but this requires more hardware. For these reasons, the most efficient systems aren't supercomputers at all. The Green500 list works because it only includes the powerhouse machines from the Top500 list, which puts a floor on how slowly the competing machines can go.

Over the years, researchers have explored a family of other metrics, where the energy per operation is multiplied by some power of the delay per operation: ED^n. But although these measures may approximately capture the real tradeoffs that systems designers make, none has the compelling simplicity of the MFLOPS/W metric. This measure also leverages the fact that supercomputer makers already measure the computational power to get on the Top500 list, so all they need to do extra is measure the electrical power in a prescribed way.
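A small sketch shows how the choice of exponent in this energy-delay family changes which design looks best. The two designs and their numbers are invented for illustration, and the names energy_delay_metric and best_design are hypothetical. Lower values are better, and an exponent of zero reduces to plain energy per operation, the inverse of the MFLOPS/W ranking.

```python
def energy_delay_metric(energy_per_op, delay_per_op, n):
    """Energy per operation times the n-th power of delay per operation."""
    return energy_per_op * delay_per_op ** n

# Two made-up designs: (joules per operation, seconds per operation).
designs = {
    "slow": (1e-9, 4e-9),  # frugal but sluggish
    "fast": (2e-9, 1e-9),  # twice the energy, four times the speed
}

def best_design(n):
    """Name of the design with the lowest metric for exponent n."""
    return min(designs, key=lambda name: energy_delay_metric(*designs[name], n))

for n in (0, 1, 2):
    print(f"n = {n}: best design is {best_design(n)}")
```

With these numbers the slow design wins on energy alone, but the fast one wins as soon as delay is weighted at all, which is the kind of tradeoff the simpler metric hides.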

These systems derive much of their energy efficiency from the processor chips they use. The top systems in the current list all use a special version of IBM's Cell processor, for example. I worked on power reduction in integrated circuits more than a decade ago--an eternity in an industry governed by Moore's Law--and some of my work appeared in a talk at the 1995 International Electron Devices Meeting. I also served for several years on the organizing committee of the International Symposium on Low Power Electronics and Design, but I'm sure the issues have advanced a lot since then.

In addition to the chips, the overall system design makes a big difference. The QPACE machine, for example, serves up about half again as many MFLOPS/W as its closest competitor by using novel water-cooling techniques and fine-tuning the processor voltages, among other things. These improvements aren't driven just by ecological awareness, but by economics.

There's still lots of room for improvement in the energy efficiency of computers. I expect that the techniques developed for these Cadillac systems will end up helping much more common servers to do their job with less energy.

Tuesday, November 24, 2009

Happy Anniversary

One hundred fifty years ago today, the first edition of Charles Darwin's masterpiece On the Origin of Species by Means of Natural Selection, or the Preservation of Favoured Races in the Struggle for Life was published.

I picked up a copy of the book a few years ago for about $10, when the American Museum of Natural History in Manhattan had a Darwin exhibit. The most memorable display for me was the handwritten notebook entry where he first speculated about the tree-like connectivity between different species. I was humbled to be within a few feet of this tangible record of his world-changing inspiration, written with his own pen.

The book itself was very readable, intended as it was for an audience far beyond specialists. Starting with what would then have been familiar techniques of plant and animal breeding--"artificial selection"--Darwin proposes conceptually extending that process to nature. Combining natural variation with its heritability, a Malthusian appreciation of the struggle to survive, and an awareness of the immensity of geologic time, this extension seems eminently reasonable.

And yet there are challenges. Rather than bluster through them, Darwin addresses them head on, conveying an honesty and open-mindedness that is bracingly refreshing in our argumentative times.

He confronted head-on, for example, the intellectual challenges of the intricate structure of the eye, fully admitting that the theory demanded that at every step of evolution there be some function for the intermediate forms. Even today, intelligent design proponents profess to be flummoxed by the very challenges that Darwin faced--and faced down.

Darwin also clearly described the tradeoffs needed for the evolution of traits like altruism, avoiding the temptation to invoke the "good of the species." To persist, such traits must provide an advantage to the group that exceeds the cost to individuals. This clear statement of the constraints of group selection needs wider appreciation today.

In these and many other areas, Darwin anticipated and addressed the confusing aspects of his explanation for evolution. And he did it all without even the benefit of Mendel's laws of genetics.

Time and again in the intervening decades, newly uncovered evidence from biology and paleontology has reinforced the essential correctness of Darwin's framework. The laws of genetics and of their DNA mechanism, the fossil record of transitional forms, and mathematical models have all confirmed and clarified the power of undirected selection of random variation for driving innovative new possibilities.

There are caveats, of course. Non-inherited mechanisms of genetic transfer change the story in important ways, especially near the single-celled trunk of the tree of life. Such revisions are hardly surprising after 150 years of scientific advancement.

What is humbling is the persistent soundness of the essence of Darwin's vision, and of this amazing book.