Wednesday, September 30, 2009

Never Fold Alone

Predicting the structure of a protein--the three dimensional pattern that a particular sequence of amino acids folds into to become biologically active--is a perennial challenge of biology.

Researchers have long recognized major drivers of the final shape, such as exposing hydrophilic amino acids to the aqueous environment, while keeping hydrophobic amino acids tucked away safely inside or in regions that will lie inside of membranes. Chemists back to Linus Pauling have also recognized recurring structural motifs such as alpha helices and beta sheets that allow somewhat regular packing. But even with these constraints, a chain of hundreds of amino acids can arrange itself in an astronomical number of ways. Exploring these configurations one by one would take virtually forever, so how do real proteins find the few configurations that will let them do their biological job?

One answer is that they don't always succeed. Generally tens of percent of the molecules get mangled along the way and have to be disposed of. But this just reduces the astronomical challenge by a small factor.

Another important fact is that proteins don't fold in a vacuum--or even in a water environment. Even as it is being translated from messenger RNA by the ribosome, a growing polypeptide is joined by proteins called chaperones. These key proteins help to ensure that the new chain folds properly, and also keep it from aggregating with others (which is another way to tuck away hydrophobic amino acids).

These molecular chaperones are the best-known members of the family of heat-shock proteins (denoted hsp), which are produced in large quantities by cells that have been stressed. Heat, for example, tends to disrupt protein folding, and the chaperones can help put misfolded proteins back together again. In addition to the small hsp70 chaperone that binds to the growing protein, another protein called hsp60 forms a kind of dressing room where the still-folding protein can assemble itself in privacy.

The activity of these chaperones is driven by ATP, the cell's energy currency. Here is a movie of both processes. I'm afraid it didn't help me much, though.

The important point is that protein folding in a cell, like the processing of DNA and RNA, involves the close coordination of other biological macromolecules. This may be part of the reason that, although researchers have made a lot of progress in structure prediction from sequence, in part by drawing analogies with similar sequences in proteins with known structure, they still struggle with completely novel sequences.

Folding is only one step in the processing of proteins. They also may be acetylated or phosphorylated, crosslinked with sulfur, and combined with metals like iron, zinc, or manganese. They will be decorated with sugars that can, for example, serve as address labels for their final destinations. Those proteins headed for membranes will not be sent out into the cell to fend for themselves, but will be bound to membrane and handed off directly. Much of this activity happens in the endoplasmic reticulum, where proteins that have been mangled are identified and recycled.

Even after processing, many proteins will be further modified chemically, for example by adding or removing phosphate groups to modify their activity. Moreover, many proteins do their work as part of complexes with other proteins, either in pairs or other small groups or in larger complexes that may include RNA.

Biology (the reality, as well as the science) is a team sport.

Tuesday, September 29, 2009


I've gotten many good insights from Chris Mooney. In a 2004 story in the Columbia Journalism Review called Blinded by Science, for example (oddly unlinkable, but posted here), he criticized the journalistic tradition of "balance," as it applied to climate change. He explained that although including diverse points of view gives an impression of objectivity, this habit was giving undeserved credibility to the rare deniers of the consensus on climate. In the intervening years, journalists have become more aware of this problem and more frank in distinguishing the mainstream from the fringe (supported by the increasingly dire predictions of the mainstream view).

In one small section of their recent book, Unscientific America: How Scientific Illiteracy Threatens Our Future, Chris and his coblogger at The Intersection, Sheril Kirshenbaum, expand on this and other ways that journalistic traditions obscure scientific realities. Chief among the disconnects is the news focus on, well, news: what's happening now that we didn't know yesterday? Such event-driven coverage serves poorly many ongoing trends in science (as well as in other areas) that develop continuously or incrementally. The need for a "hook" drives reporters to focus on specific articles in the big journals, rather than the accumulating evidence that they are merely an example of.

Journalists are also prone to framing stories around human elements, especially conflict. There are good reasons for this: people read these stories. But the focus on personalities or revolutions often distracts from the real issues. Biobloggers Larry Moran and T. Ryan Gregory, for example, routinely complain about the misleading narrative that "scientists used to think most of the genome was 'junk,' but now they've realized it's good for something." (Scientists have long known that much of it was good for something. Much of it is still junk.)

These differences--driven largely by the business of journalism--are important. Scientists who can't follow Mooney and Kirshenbaum's dictum to transform into public communicators would do well to appreciate what happens to their message when it leaves their hands.

Nonetheless, as someone who has morphed from one to the other, I think the similarities between scientists and journalists are greater than the differences. At a fundamental level, both are professionals dedicated to uncovering reality, wherever it lies. Both groups rely on evidence, and treat personal opinions and popular fads with suspicion, as much as they can recognize them. In each profession, there is a strong social obligation that transcends any loyalty to one's employer or even to one's own prejudices. It is an obligation, as best one can, to speak the truth.

Monday, September 28, 2009

We Did All We Could

As you read this on your computer screen, it's easy to take for granted the billions of transistors--driving the screen, running the programs, storing the data, and bringing it to you over the internet--that make it all possible.

This embarrassment of transistors is affordable because they're made in a parallel process that produces vast numbers of similar devices at once, combined into integrated circuits (ICs). Making sure that they each behave the way they're supposed to demands extraordinarily clean and reproducible manufacturing processes. In fact, after inventing the transistor, Bell Labs was late to the IC party because they didn't think anybody could get them all to work at once.

Later, Bell Labs' parent, AT&T, did get good at ICs. Towards the end of my time in semiconductor device research at Bell Labs I worked with the excellent developers of the upcoming integrated circuit generations for what was then AT&T Microelectronics, who had moved to Orlando, Florida.

One benefit of visiting Orlando and learning about their challenges was that I managed to design some test structures that they included on the photomasks they used to develop their process. It took some convincing for them to give up even a tiny piece (about 0.002 square centimeters!) of their very precious real estate. They also needed to be sure that my devices wouldn't flake off and mess up other structures that they needed to do their real work.

Months later, it was a real rush to get the first silicon wafers with my devices on them.

First, the structures looked exactly like what I designed. Instead of looking at multicolored rectangles in a CAD program on a computer screen, though, I was looking at multicolored rectangles in a microscope: real semiconductor devices.

Second, there were lots of them. Even though the entire array of test structures was over a square centimeter in area, there were dozens of repetitions on each eight-inch-diameter silicon wafer.

Third, they were all the same. They didn't just look the same: on the unfortunate occasions when I blew one up with too much voltage, I learned that its copies elsewhere on the wafer showed very much the same electrical behavior.

I also made friends with people who did testing, robotically stepping across the wafer to measure each repetition. So for simple measurements, after a lot of up-front planning, I could sit back and let the data roll in. Whenever development ran a 25-wafer lot through the several hundred steps it took to get finished ICs, they also made me hundreds of test structures, and measured them, too.

Compared to what I was used to in the physics labs, where you might work weeks to get a sample or two, this was heaven.

With lots of people helping out, we also did something more challenging, which was to explore new ways to process the wafers. For example, my research colleague Joze Bevk devised a scheme to improve the addition of electrical dopants into the narrow poly-crystalline-silicon ribs that formed the gates of the transistors. Our development colleagues helped track the wafers through step after step of the modified process.

One day, when Joze and I were visiting Orlando, our colleague Steve Kuehne approached us. In the manner of a surgeon telling waiting relatives "I'm sorry. We did all we could," Steve gave us the bad news: "The gates are falling off." Joze and I were very disappointed at this failure, since from Steve's grave expression it was clear that the result was a disaster.

Over the next hour or so, as we discussed what sort of stresses in the materials might cause these terrible problems, an interesting fact emerged. Out of many millions of gates on the test wafer, perhaps 20 had fallen off! Only the high-throughput measurement tools in the development line, which scan the entire wafer looking for anomalies, could even detect them. This is what Steve meant when he said the gates were falling off. For him, a process with even that many broken devices was a non-starter.

I don't doubt that the developers could have devised modifications of the process that reduced the problem, if it had seemed worthwhile--or if they had invented it themselves. Nonetheless, it was a powerful reminder of the degree of reproducibility that IC manufacturing demands.

When I see a news story about some new technique that's going to change the way ICs are made (like this one or this one--not to pick on IBM), I remember how few failures are deadly. If you can see variation in a handful of devices, then someone is going to have to do an awful lot of work before they can be made by the billions.

Now go back to taking them for granted.

Friday, September 25, 2009

The Map and the Territory

"The map is not the territory." --Alfred Korzybski

"I confused things with their names: that is belief." --Jean-Paul Sartre

"Ceci n'est pas une pipe." ("This is not a pipe.") --René Magritte

In fields ranging from economics to climate to biology, scientists build representations of collections of interacting entities. Everyone knows that the real systems have so many moving parts, influencing each other in poorly known ways, that any representation or model will be flawed. But even though they understand the limitations, experts routinely talk about these systems using words that come from the models, rather than from reality. Climate scientists talk of the "troposphere," economists talk of "recessions," and biologists talk of "pathways." Such concepts help us organize our thinking, but they are not the same as the real thing.

Sometimes the difference between the "map" and the "territory" is manageable. Roads and rivers are not lines on a piece of paper, but they clearly exist. Similarly, the frictionless pulleys and massless ropes of introductory physics have a simplified but clear relationship to their real-world counterparts (at least after you've spent a semester learning the rules). Still, it's easy to get sucked into thinking of these well-behaved theoretical entities as the essence, the Platonic ideal, even as one learns to decorate them with friction and mass and other real-world "corrections."

For many interesting and important problems, however, the conceptual distance between the idealizations and the boots-on-the-ground reality is much larger. You might think that experts would recognize the cartoonish nature of their models and treat them as crude guides or approximations, rather than fundamental principles partially obscured by noisy details. Judging from the never-ending debates in economics, however, the more obscure the reality, the more compelling the abstractions become.

Even in less contentious fields, experts can mistake the models for reality. For example, the fascinating field of systems biology aspires to map networks containing hundreds or thousands of molecules using high-throughput experiments like microarrays and computer analysis. Although one might like to describe all these interactions using coupled partial differential equations, researchers would often be happy simply to list which molecules interact. This information is often represented as a graph--sometimes called a "hairball"--which represents each molecule as a dot or node, and interactions as a line or edge connecting them.
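The node-and-edge picture is easy to make concrete. Here is a minimal sketch in Python, using made-up molecule names (not any real pathway) to build the adjacency structure that such a "hairball" graph encodes:

```python
# Molecules as nodes, interactions as undirected edges.
# The gene names below are hypothetical placeholders.
from collections import defaultdict

interactions = [
    ("geneA", "geneB"),
    ("geneA", "geneC"),
    ("geneB", "geneD"),
    ("geneC", "geneD"),
]

# Adjacency list: each node maps to the set of its interaction partners.
graph = defaultdict(set)
for a, b in interactions:
    graph[a].add(b)
    graph[b].add(a)

# A node's "degree" counts its partners; in a real hairball, a few
# highly connected hubs typically dominate.
degrees = {node: len(neighbors) for node, neighbors in graph.items()}
print(degrees)
```

Note that this listing says only *which* molecules interact, not how strongly or in which direction--exactly the simplification the text describes.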

Finding such graphs or networks is a major goal of systems biology. In principle, an exhaustive map is more useful than the traditional painstaking focus on particular pathways, which are presumably a small piece of the entire network. But to yield benefits, researchers need to understand how "accurate" the models are.

A few years ago, a group of systems biologists decided the time was ripe to critically evaluate this accuracy. They established the "Dialogue on Reverse Engineering Assessment and Methods," or DREAM, to compare different ways of "inferring" biological networks from experiments. (I covered the organizational meeting, as well as meetings in 2006, 2007, and 2008, under the auspices of the New York Academy of Sciences. A fourth meeting, which like the third will be held in conjunction with the RECOMB satellite meetings on Systems Biology and Regulatory Genomics, is scheduled for December in Cambridge, Massachusetts.) These meetings, including competitions to "reverse engineer" some known networks, have been very productive.

Nonetheless, one thing the DREAM meetings made clear is that "inferring" or "reverse engineering" the "real" networks is simply not a realistic goal. Once the networks get reasonably complicated, it's essentially impossible to take enough measurements to clearly define the network. The ambiguity even applies to networks that actually have been engineered, that is, created by people on computers. The "inferred" networks are a useful computational device, but they are not "the" network. And they never will be.

For these reasons, many researchers think the only proper way to assess the results is by comparing to experiments. If the models are good, they should not only match observed data, but should extrapolate to accurately predict what happens in a novel situation, such as the response to a new drug. Interestingly, the most recent DREAM challenges included tasks of this type. Disappointingly, however, the methods that best predicted the novel responses simply generalized from other responses: they did not include any network representation at all!

It seems reasonable to expect that a model that tries to mimic the internal network, even if it is flawed, would better predict truly novel situations. But it's hard to know what it will take for the system to hit a tipping point where it does something completely different, which was never observed before or included in the modeling. Often, we won't recognize the limitations of our complex models--in biology, climate, or economics--until they break.

Wednesday, September 23, 2009


When it comes to inheritance, there's no beating the DNA sequence for storing and passing on complex information. But other, "epigenetic" mechanisms also bequeath information to subsequent cells or offspring, sometimes in response to environmental changes.

In principle, the word "epigenetics" could apply to any inheritance outside of the genetic sequence. For example, when a cell divides, its contents are divided among the daughter cells. Any transcription factors or other chemicals that alter gene expression are therefore passed on independently of the DNA (along with the mitochondria, which have their own DNA). In recent years, however, "epigenetics" has come to be used mainly to describe two types of chemical changes directly associated with DNA in the nucleus, other than its sequence.

These changes modify how active various genes are in a particular cell. They are particularly important for enforcing the "no turning back" feature of differentiation from versatile stem cells to specialized cells, helping to shut off cellular programs that were active in the early embryo. Moreover, epigenetic changes are passed on during cell division, so that the differentiated cells and all cells made from them lose their ability to become other types of cell. It should not be surprising that many cancers subvert this epigenetic programming, re-activating embryonic programs that help them survive and spread. Researchers have identified many epigenetic modifications in cancer cells.

Epigenetic changes can also pass between generations. Biologists have long known of cases of "imprinting," in which the mother's or the father's DNA is inactive in the offspring. Even in people, researchers have found that food shortages in Holland at the end of World War II resulted in changes in the metabolism of children conceived during that period. Such effects are unusual, but profound.

This sounds disturbingly like inheritance of acquired characteristics, as in Kipling's "Just-So Stories." This concept, often misleadingly associated with early 1800's evolution pioneer Jean-Baptiste Lamarck, was supplanted by Darwin's notion of natural selection of random variations. But persistently activating or suppressing pre-existing genes for a few generations, even in response to environmental pressures, is not the same thing as creating novel properties. Some scientists, notably Eva Jablonka of Tel Aviv University, maintain that epigenetic effects can be permanently enshrined in the sequence, but that remains a minority view. Equating epigenetics with Lamarckism is misleading, despite having a grain of truth.

The two best known epigenetic mechanisms are chemical changes that alter the transcription of DNA. One mechanism modifies the DNA itself, while the other modifies the packaging of the DNA in the nucleus.

(Image source: NIH)

In DNA methylation, methyl (-CH3) groups are chemically bonded to a base in the DNA sequence, usually a cytosine (C) next to a guanine (G), together called CpG. The presence of the methyl group suppresses transcription of the DNA sequence that contains it. In addition, the cell contains enzymes that recognize methylation of one chain of DNA and methylate the other chain, helping to propagate the information.
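The CpG convention is simple enough to illustrate directly. This toy function scans a DNA string for cytosine-guanine dinucleotides, the positions where methylation typically occurs; the sequence is an arbitrary example, not real genomic data:

```python
# Locate CpG dinucleotides: a C immediately followed by a G
# along one strand of the sequence.
def cpg_sites(seq):
    """Return the 0-based positions of each CpG dinucleotide."""
    return [i for i in range(len(seq) - 1) if seq[i:i + 2] == "CG"]

seq = "ATCGGCGTACGA"  # arbitrary example sequence
print(cpg_sites(seq))  # -> [2, 5, 9]
```

Real genomes add a wrinkle this sketch ignores: CpG sites cluster into "CpG islands" near many gene promoters, and it is the methylation state of those clusters that matters most.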

The second mechanism affects the packing of the DNA into the compact structure known as chromatin. The paired DNA chains wrap tightly around a cluster of proteins called histones to form a nucleosome. Nucleosomes strung along the DNA chain themselves pack into compact arrangements that make it hard for the transcription machinery to get at them.

The details of this process are only partially understood. One thing that is known is that free "tails" of the histone proteins straggle out of the nucleosomes, and that chemical modifications of these tails modify transcription. The modifications include single or multiple methylation or acetylation (adding -COCH3) of particular amino-acid positions in the tail, as well as binding of other factors. The details matter: particular modifications either increase or decrease transcription.

In recent years researchers have developed techniques for mapping both DNA methylation and chromatin modification over large regions of the genome. Using these techniques and others, biologists are beginning to understand when and where these epigenetic modifications occur in normal and diseased cells, how nutrition and other environmental influences change them, and how specific modifications are actively regulated to modulate gene expression.

Tuesday, September 22, 2009

Forbidden Questions

To navigate the quantum world, you have to know what questions not to ask.

In the everyday world, we get along fine assuming that a baseball, for example, had a certain momentum even before we whacked it and felt the effects. But at the quantum level, an observable effect like the push on a bat does not give us permission to regard the ball's earlier momentum as having been a "real" quantity, independent of the swinging bat. Talking about such unobserved properties is a recipe for trouble.

This takes a lot of getting used to.

I ran head on into this problem in my latest story for Physical Review Focus. I first titled the story "How Long is a Photon?" and described the experiments as measuring the "duration of individual photons." That description was wrong, and came from asking forbidden questions.

Optics experts often measure the duration of pulses that are only a few femtoseconds (10⁻¹⁵ seconds) long. This is much too fast for direct electronic measurements, so they do it by making two similar pulses and measuring whether they overlap. Delaying one of the pulses by more than their length stops them from overlapping. Actually the researchers repeat the experiment with millions of pairs of pulses, each with a particular delay, to build up a picture of how the overlap varies with delay. For pulses consisting of many photons, it is natural to regard the overlap time as reflecting the length of the underlying pulses.
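The overlap-versus-delay idea can be sketched numerically. This toy calculation uses two identical Gaussian pulses, one delayed relative to the other, and computes their overlap integral at each delay; the pulse shape and width are assumptions for illustration, not a model of any particular experiment:

```python
# Overlap of a Gaussian pulse with a delayed copy of itself,
# computed by simple numerical integration.
import math

def gaussian(t, width):
    return math.exp(-(t / width) ** 2)

def overlap(delay, width, dt=0.01, span=10.0):
    # Riemann-sum approximation of the integral of
    # pulse(t) * pulse(t - delay) over t.
    n = int(2 * span / dt)
    return sum(gaussian(-span + i * dt, width) *
               gaussian(-span + i * dt - delay, width) * dt
               for i in range(n))

width = 1.0  # "pulse length" in arbitrary time units
zero_delay = overlap(0.0, width)
for d in (0.0, 1.0, 3.0):
    # The relative overlap falls toward zero once the delay
    # exceeds the pulse width.
    print(f"delay {d}: relative overlap {overlap(d, width) / zero_delay:.3f}")
```

For Gaussians the relative overlap works out to exp(-d²/2w²), so sweeping the delay and watching the overlap collapse directly reveals the width w--which is exactly why, for many-photon pulses, the overlap time reads as a pulse length.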

The new experiments look a lot like this. But the difference is critical.

Kevin O'Donnell, at CICESE in Baja California, built on earlier experiments from the Weizmann Institute in Israel. Instead of pairs of pulses, however, these groups measure pairs of photons. They get the pairs by shining a steady green laser into a special crystal, which splits about one green photon in ten million into two infrared photons. Because these two photons are created as a pair in a single quantum-mechanical process, they are "entangled": properties deduced from measurements on one will always be related to properties deduced from measurements on the other, even if the measurements are done far apart.

The nature of this connection is one of the central oddities of quantum mechanics. In fact, we could save a lot of trouble by not talking about the individual photons at all, because in a profound sense they do not exist as separate entities, even after "they" move away from each other. But our language makes it hard to talk about a pair without thinking of it as a pair of something.

As in the experiments on pulses, O'Donnell delays one photon with respect to the other and measures their overlap. (I can say that without saying photon, but it gets a lot more complicated.) But as he explained to me, it is not meaningful to relate this overlap to the "length" of the photons. Instead, the result of the overlap experiment at different delays is a property of the combined state of the two photons. The final story, "The Overlap of Two Photons," takes pains to describe that correctly, at the cost of clunkier language and probably losing some readers.

As another example, a researcher who measures the energy of one photon in a pair can be assured that the energy of the other will be just right, so that their combined energy equals that of the original green photon. But that doesn't mean that the photon "had" that energy before the measurement was made. More complex experiments, in fact, show that the unmolested photon does not have any particular energy.
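The energy constraint on the pair follows from E = hc/λ: the two photon energies must sum to the pump photon's energy, so the wavelengths satisfy 1/λ_pump = 1/λ₁ + 1/λ₂. A quick sketch of that arithmetic, using hypothetical round-number wavelengths rather than the experiment's actual values:

```python
# Given the pump wavelength and one measured photon's wavelength,
# energy conservation fixes the partner's wavelength:
#   1/pump = 1/signal + 1/idler   (since E = hc/lambda).
# The wavelengths below are illustrative round numbers.
def idler_wavelength(pump_nm, signal_nm):
    """Wavelength (nm) of the partner photon required by energy conservation."""
    return 1.0 / (1.0 / pump_nm - 1.0 / signal_nm)

# A 532 nm green pump photon splitting so that one infrared photon
# is measured at 1000 nm:
partner = idler_wavelength(532.0, 1000.0)
print(f"partner photon near {partner:.1f} nm")  # about 1136.8 nm
```

The point of the paragraph above still stands, though: this relation constrains what a measurement on the partner *will yield*, not what energy the unmeasured photon "had" beforehand.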

In the current experiment, you don't go too far wrong by imagining (incorrectly) that it measures the length of a photon. But this "bad habit," as described by David Mermin in Physics Today (subscribers only, or you can google the title), of conferring reality on properties that aren't or can't be measured, is the root of much confusion. More importantly, thinking (and talking) precisely about what actually exists is key to understanding the nature of the quantum world we inhabit.



Monday, September 21, 2009

Photo Finish for the Netflix Prize

It's not every day you see seven computer scientists grinning awkwardly behind one of those goofy six-foot checks they give out to lottery winners.

That was the scene in Manhattan's Four Seasons Hotel this morning as Netflix announced the winners of their $1 million competition to improve the system that underlies the movie recommendations that they make to their customers. I wrote about these "recommender systems" and the Netflix Prize in the August issue of Communications of the Association for Computing Machinery. Just as that article was going to press, someone beat the 10%-improvement threshold for bringing home the award.

But only today did Netflix announce that the winning team was BellKor's Pragmatic Chaos, a longtime leader, some of whose members appeared in my story. Although this team was the first to break the barrier in late June, other teams had subsequently passed them, and they submitted their winning entry only 20 minutes before the deadline (30 days after the barrier breaking). In fact, another submission matched their 10.06% improvement--but because The Ensemble team submitted their entry ten minutes later, they spent the presentation clapping politely from the audience.
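The contest's headline percentage is a reduction in root-mean-squared error (RMSE) of predicted star ratings, relative to Netflix's own Cinematch baseline. A minimal sketch of the arithmetic, using RMSE values close to the published baseline and winning scores (treat the exact figures as illustrative):

```python
# RMSE of star-rating predictions, and the percentage improvement
# over a baseline RMSE, as used to score the Netflix Prize.
import math

def rmse(predicted, actual):
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual))
                     / len(actual))

def improvement(baseline_rmse, new_rmse):
    """Percent reduction in RMSE relative to the baseline."""
    return 100.0 * (baseline_rmse - new_rmse) / baseline_rmse

# Values close to the reported test-set numbers: Cinematch near 0.95,
# the winning blend near 0.86.
print(f"{improvement(0.9525, 0.8567):.2f}% improvement")
```

Squeezing out that last fraction of a percent is what drove teams to blend hundreds of algorithms--tiny RMSE gains become very expensive near the threshold.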

Most researchers will say the prize money is only a part of the excitement of this competition--and in any case the winning members from AT&T Labs will be handing their winnings over to their corporate sponsor. A major draw for researchers was access to Netflix's enormous database of real-world data. The company also maintained an academic flavor by requiring that winners publish their findings in the open literature, and by maintaining a discussion board where competitors discussed results and strategy.

We don't know how many other companies have taken advantage of these open results, but Netflix certainly has. Netflix's Chief Product Officer Neil Hunt said the company has already incorporated the two or three most effective algorithms from the interim "progress prizes." Moreover, "we've measured a retention improvement" among customers, Hunt said. The company is still evaluating which of the several hundred algorithms that were blended together to win the prize will be incorporated into future recommendations, since the results need to be generated very rapidly. "We still have to assess the complexity of introducing additional algorithms," Hunt said.

At the ceremony, the company didn't talk much about the other features, beyond predicting "star" rankings, that define good recommendation systems. As discussed in my CACM story, these include aspects of the user interface, such as the way users are encouraged to enter data and the way the results are presented. In addition, a good recommendation needs to go beyond predictable satisfaction to include serendipitous choices that a customer would not find on their own.

Rather than take on these more psychological challenges, the "Second Netflix Prize" will address the more algorithmic challenge of making predictions for customers who haven't ranked many movies, for example those who just signed up or who don't feel like providing ratings. To augment this "sparse" data, Netflix will provide competitors with various other tidbits of data, including demographic information like zip code and data about prior movie orders. But not, Hunt hastened to add, names or credit-card numbers.

As my earlier story discussed, such "implicit" user information is of growing importance for recommender systems. For one thing, it's harder to distort this kind of input by pumping up certain products with fake ratings. In addition, although Netflix can easily cajole customers to take the time to enter ratings, many commercial sites are more limited and have only implicit data to work with.

The new prize doesn't set any explicit performance goals. Instead, Netflix plans to award $500,000 each to the best performers as of April 2010 and April 2011. But most of the winners today weren't sure they were going to sign on to the new challenge. They were too tired.

Saturday, September 19, 2009

Truth or Beauty?

How do scientists screw up in reaching the public? That's the theme of the new book, Don't Be Such a Scientist: Talking Substance in an Age of Style (which came out right away in paperback). Read it.

The author, Randy Olson, knows both sides. He resigned a tenured professorship at the University of New Hampshire in the early 90s to start over as a filmmaker, most notably making Flock of Dodos and Sizzle. But the impact of these movies, Olson says, comes from bypassing the "head"--the cerebral target of most documentaries--and aiming instead for lower organs: the heart, the gut, and perhaps even the naughty bits. Instead of being "about" intelligent design and global warming, respectively, they illuminate these topics obliquely through a more human story line.

Olson doesn't dismiss the power of the science, and even admits that he's known among friends for his negative, skeptical, and even boring demeanor. But reaching a wider audience needs a more visceral appeal. Olson's self-deprecating humor makes it easy for nerds like me to recognize how our analytical habits of thought and speech can turn people off. But in the end he appeals to scientists to heed their own "voice," while becoming more "bilingual" by learning additional ways to first engage people in order to inform them.

Chris Mooney and Sheril Kirshenbaum's Unscientific America: How Scientific Illiteracy Threatens Our Future addressed some of the same issues of communicating science. For example, both books extol Carl Sagan as a master communicator who suffered for it professionally. But Unscientific America's more academic approach ultimately left me disappointed, with its bland recommendation that more scientists should reach out to the public, and be rewarded for it. Don't Be Such a Scientist gives a much more satisfying vision of how to get there, on a personal level. But it also shows what makes it hard.

One of the big challenges is the conflict between "accuracy" and "boredom," which Olson likens to the tradeoff between false positives and false negatives in a classification task. For example, you can't catch every instance of disease without mistakenly diagnosing some healthy people. By temperament and training, scientists regard accuracy as paramount, even at the cost of some boredom. Effective communication, he says, requires a different balance, which scientists will have to learn to live with.
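Olson's analogy can be made concrete with a toy classifier: a single diagnostic threshold on a score, where lowering the threshold catches more true cases at the cost of more false alarms. The scores below are invented for illustration:

```python
# A one-threshold classifier: anyone scoring at or above the
# threshold is flagged as "sick". Scores are made-up numbers.
sick_scores = [0.9, 0.8, 0.6, 0.4]      # people who really are sick
healthy_scores = [0.5, 0.3, 0.2, 0.1]   # people who are healthy

def counts(threshold):
    false_negatives = sum(s < threshold for s in sick_scores)
    false_positives = sum(s >= threshold for s in healthy_scores)
    return false_negatives, false_positives

for t in (0.7, 0.5, 0.3):
    fn, fp = counts(t)
    print(f"threshold {t}: {fn} missed cases, {fp} false alarms")
```

No single threshold eliminates both error types at once; only a better test (better-separated scores) can improve both, which is the crux of the argument that follows.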

No doubt this tradeoff exists, and it's important to recognize it. But this part of the book struck me as a bit of a copout, because--with effort--you can change the terms of the tradeoff. For example, an improved medical test can significantly reduce both the false positives and false negatives. The idea that compelling communication requires sacrificing accuracy is a red herring, one that fueled a huge amount of discussion in the blogosphere a year or two ago around the issue of "framing" (which I will not get into here).

It would be more useful to address the specific issues of how accuracy and boredom conflict in particular cases, and tricks to sidestep each conflict. Olson doesn't get into this level of detail, taking it for granted that a movie about global warming, for example, must make factual errors if it is to reach a wide audience.

I'm sure the challenge in movie-based storytelling is much harder than in what I do, which is writing for an audience that's already somewhat scientifically engaged. Still, much of my writing time is aimed at changing the terms of the accuracy/readability tradeoff. It takes a lot of work, but it's a major part of the art of scientific communication.

Of course, there's still a limit to how far you can take this, and in the end there will still be a tradeoff. Olson is right that sacrificing absolute accuracy sometimes makes it possible to communicate a larger truth, and do it in a way that people engage with and remember.

Friday, September 18, 2009

Visualizing Orbitals

When I was first learning about science, everyone was confident about the existence of atoms but no one ever expected to "see" them. That all changed in the 1980s with the invention of scanning-probe microscopies at IBM Zurich, first scanning tunneling microscopy, then atomic force microscopy and others.

But even in those earlier days, textbooks showed a few pictures of atoms in real space. Those pictures came from a field emission microscope, in which the strong electric field at an ultra-sharp metallic tip rips electrons out of the atoms. Because the field lines diverge rapidly from the tip, the pattern of electrons from different atoms spreads out rapidly until an enlarged version of the atomic arrangement can be directly visualized on a phosphor screen.
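The enormous magnification comes from simple geometry. A back-of-envelope sketch (my own assumed numbers, not from the post): the image is magnified by roughly the tip-to-screen distance divided by the tip radius, reduced by a compression factor often quoted as about 1.5 because the field lines are not perfectly radial.

```python
# Rough field-emission-microscope magnification estimate.
# All three numbers below are illustrative assumptions.

tip_radius = 50e-9        # assumed tip radius: 50 nm
screen_distance = 0.05    # assumed tip-to-screen distance: 5 cm
compression = 1.5         # commonly quoted image-compression factor

# Magnification ~ distance / (compression * radius)
magnification = screen_distance / (compression * tip_radius)
print(f"magnification is roughly {magnification:.1e}")
```

With these numbers the magnification comes out near a million, which is why atomic spacings of a few tenths of a nanometer become visible spots on a centimeter-scale phosphor screen.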

Now Ukrainian researchers have adapted this venerable technique in an upcoming paper in Physical Review B to look at the different arrangements of electrons within a single atom. Yes, that looks like s and p orbitals. Yet another thing I never thought I'd see! What a world, what a world.

Ban the Authors!

Good story on ghostwriting today at the New York Times, "Medical Editors Push for Ghostwriting Crackdown."

In an interview last month, Dr. Cynthia E. Dunbar, the editor in chief of Blood, said that, in the future, the journal would consider a ban of several years for authors caught lying about ghostwriting, in addition to retracting their ghosted articles.

Why consider? Do it.

Thursday, September 17, 2009

Pathways to Disease

Most common diseases, including the big killers like heart disease, are "complex": they can't be blamed on single causes like a particular gene. Instead, they result from a complicated interaction of factors that may include lifestyle, environmental exposures, or infection, as well as genetic effects. Moreover, large-scale surveys of genetic influences have confirmed that, in many cases, lots of different genes contribute to disease, each in a small way.

These generalizations also apply to cancer. Cancer differs from the other diseases because most of the genetic changes in cancer cells aren't present in the rest of the patient's cells. Instead, mutations, copy number variations, and large-scale chromosome anomalies accumulate as the disease progresses. These alterations are often abetted by early disruptions of the usual mechanisms for maintaining genome quality during cell division and for executing damaged cells. In spite of these differences, the first major results last fall from The Cancer Genome Atlas comparing the genetics of glioblastomas (deadly and virtually untreatable brain cancers) found no specific mutation was present in all of the tumors. The huge team of researchers did a comprehensive analysis including gene expression, copy number changes and epigenetic changes. But although some changes happened rather frequently, there was no single "smoking gun."

Nonetheless, these studies, in both cancer and other diseases, find clear patterns among the genes whose activity is altered in one way or another. When researchers put the changes in the context of the complex network of molecular interactions in the cell, most of the changes cluster along clear "pathways." As Todd Golub told a meeting I covered last year, just after the glioblastoma results were published: "What was gratifying about this was that this was not just a sprinkling of mutations randomly across the genome, which were difficult to decipher in the context of any kind of mechanistic understanding, but rather these were falling together in a set of pathways that were increasingly well understood in cancer."

I regard the word "pathway" as a bit of a misnomer, since it suggests a linear sequence in which each molecule affects the next one in a chain. In the early days, that was about all that experiments could get at, but researchers have long recognized that networks are messier than this. For example, there may be multiple, parallel influences of one molecule on another, and there are almost always feedback paths in which the final outcome comes back to modify the early steps.

Nonetheless, although they are complex and interconnected, these pathways give researchers a useful shorthand for navigating the rich networks of interactions and for communicating with others. In fact, many researchers specialize in particular pathways, getting to know each molecular member "personally," as well as the effects they have on one another.

Results like the glioblastoma study also show that the pathway level may be a more useful level of "granularity" for thinking about disease than the individual molecules are. Focusing on pathways (or "modules," or "motifs," or whatever) gives us simple-minded humans a better intuitive understanding of a disease, which is important. Moreover, in treatment, researchers can be led astray by focusing on molecular-level changes such as individual genetic variants, since these are not the same for everyone. Targeting specific pathways, for example with combination therapies that attack several "nodes" of the network at once, may prove to be more effective against diverse groups of patients.

But the most important benefit of isolating pathways may be that many of them are shared by different diseases, which is leading to new insights into the relationships between diseases.

Wednesday, September 16, 2009

The Healing of America

In honor of the constructive and collegial discussion of health-care reform going on in our nation's capital, I've just finished reading Washington Post correspondent T.R. Reid's new book, The Healing of America: A Global Quest for Better, Cheaper, and Fairer Health Care. I first heard about it in a great interview on NPR's Fresh Air.

This highly readable book illustrates with brutal clarity how out of step the U.S. is with other advanced nations. At the same time, Reid shows that we have several proven ways to simultaneously improve the accessibility of health care and reduce its cost--if we are willing to look outside our borders for guidance.

Reid divides the world's health systems into four types, noting that different groups in the U.S. already experience each one:

  • The systems that most resemble the widely reviled "socialized medicine" follow the "Beveridge model" of the National Health Service in the U.K.: the government runs both delivery and payment. The Veterans Administration in the U.S. is similar.
  • In Canada, a government-run single payer (actually one for each province) pays private practitioners. The U.S. Medicare system follows this model.
  • The "Bismarck model" used in Germany and many other European countries, as well as Japan, requires everyone to get insurance from mostly private insurers, generally with partial payment from employers; most providers are also private. This is similar to the coverage many employed U.S. citizens get, except that the insurers are non-profit, are required to take everyone, and the fees for treatment are generally negotiated at the national level.
  • The "out-of-pocket" model is common in developing countries, where people get whatever care they can afford--and many get nothing. Millions of Americans get the same treatment.

Reid doesn't dismiss the downsides to the different approaches--restricted options in the U.K., long waits for elective procedures in Canada, and merely middle-class pay for doctors in most countries. But at the same time, he notes that patients in these other countries are often completely free to choose their doctors--in contrast with the restrictive insurance-company networks in this country. And all of these countries have significantly lower costs, often half of per-capita costs in the U.S., partly because they spend much less on paperwork.

All the rich countries of the world have opted for universal coverage--except the U.S. For Reid, this moral question should be addressed first: "Is access to health care a basic right?" Or is it acceptable that tens of thousands of Americans die each year for lack of insurance? He thinks that trying to sell reform on cost alone, as the Clintons did in 1994, is misguided.

Nonetheless, Reid clearly expects that bringing everyone into a single plan will provide the joint sense of purpose and the negotiating leverage to reduce costs. Like waiting in line at the grocery store, it's a lot easier to accept limits if everyone is treated equally. Currently, even though the U.S. spends more than anyone on health care, it falls far short on measures such as life expectancy or infant mortality. We're not getting what we pay for.

Tuesday, September 15, 2009

Targeting Cancer

If personalized medicine ever becomes widespread--and I hope it does--it will probably start with cancers.

In fact, it already has. More than ten years ago, in 1998, the FDA approved the Genentech monoclonal antibody Herceptin (trastuzumab) as part of treatment for metastatic breast cancer--but only for patients who overexpress the membrane receptor ErbB-2 (also called HER2). For these patients, the extra copies of ErbB-2 generate signals that make the cancer spread more aggressively. The antibody binds to the receptor and diminishes this effect. But Herceptin was shown to be effective only in people who, according to laboratory tests, have an excess of the receptor. The approval was conditional on positive test results.

Cancers ought to be the best case for personalized medicine because treatment decisions are made by experts in the disease, based on medical tests and observations. These experts recognize that different tumors respond differently, and they are accustomed to adjusting treatment accordingly. In contrast, for many other diseases, such as mental illnesses, doctors often depend on more subjective symptoms, and patients are susceptible to the default "one size fits all" advertising of pharmaceutical companies. Cancer treatment is still the province of experts.

But it's important to ask whether those experts are doing what they need to do to get the drug to the people who will benefit, and not to the people who will not. A new article in Cancer addresses this question, and the answers are troubling.

The main complaint of the article is that there's not enough data to know. Kathryn Phillips, of the Center for Translational and Policy Research on Personalized Medicine at UCSF, and her colleagues find that in many cases there is no documentation that patients are receiving the right tests to guide their treatment. They also cite other results that

  • Perhaps two thirds of patients who could get the test to see if Herceptin would be appropriate may not get it (at least it's not recorded). By implication, many patients aren't getting a treatment that might help them.
  • A fifth of patients who do get the drug have no record of having gotten the test. This means that patients may be taking a drug, and suffering its cost and side effects, without any evidence that it will help them.
  • A fifth of the test results may be incorrect.

The argument for approving the drug was that it would make treatment cheaper and more effective. That only makes sense if the tests are given, are accurate, and are used to guide treatment. The success of personalized medicine depends on new, reliable procedures for ensuring that treatment is coupled with validated tests. If it can't be done with cancer treatment, it's hard to believe that it's a realistic goal for other diseases.

By the way, the researchers get funding for their research (said to be unrestricted) from the foundations of major health insurance companies. I'm not sure what to make of that.

Monday, September 14, 2009

The First Transistor

Over the weekend I helped guide some residents of Berkeley Heights, New Jersey (my home town), through the museum at Bell Labs (which is also in the town, although its postal address is Murray Hill). The free, publicly accessible lobby exhibit is an amazing testament to the many contributions that Bell Labs has made over the decades.

My personal favorite display is a replica of the original transistor, which was created in this very building (in the center of this Google Maps picture) in late 1947. It is amazing to think that there are billions of these electrical switches in the laptop that I'm typing this on, which cost me a few hundred dollars.

What was funny about showing this replica to non-experts, though, is that they naturally perceive it the way they do the other devices on display, as the carefully crafted product of finely-honed technology. In fact, the thing is a complete kluge.

It's easy to miss the scientific centerpiece of the whole apparatus: it's the silvery-gray slab sitting on top of the larger copper-colored slab. This is the semiconductor, in this case a piece of very pure germanium, which you can only make out in the picture because of a reflection from its somewhat ragged edge. Semiconductors, with their ability to morph from a metal-like conductor to an insulator and back again, are the materials that make the entire electronics industry possible.

The other key ingredient that you can't see is at the bottom of the clear triangle. That triangle itself is just a piece of plastic or something. The trick is that the experimenters wrapped some gold foil along its edge (probably more sloppily than in this replica). At the tip of the triangle, they then sliced the foil with a razor blade to form a very narrow gap, probably tens of microns in width, between two remaining pieces of foil. They connected the two pieces to separate parts of their electric circuit with the thin coiled wires you can see.

They then smushed the tip of the triangle, with its two almost-touching pieces of gold foil, into the surface of the semiconductor. To hold it in place, they fashioned a spring from the stiff wire, maybe a paper clip, that you see in the replica. The final step is to connect their circuit to the copper-colored block, and thus to the semiconductor. So most of what you see in the picture is just there in a supporting role. The two pieces of foil, each touching the semiconductor, and the tiny gap between them, are where all the action is.

For electricity to get from one piece of foil to the other, it has to pass through the semiconductor. Applying an electrical signal to the base that supports the semiconductor changes its properties, changing the amount of current that flows. The result is that a small signal on the base turns into a big change in current, so the signal is amplified. Even then, the current has trouble making it very far, which is why the gap has to be small. One of the first things the researchers did was to connect it to a speaker (like those in the telephones of their employer, Ma Bell) and verify that, yes, it sounded louder.

This rickety contraption is a classic experiment in progress. In fact, I'm confident that, at first, the real thing looked a lot messier than this replica. The scientists were throwing stuff together to see whether they could see the transistor action. If this hadn't worked they'd have thrown something else together, maybe with a narrower gap or a different metal, or washed the semiconductor differently.

To me, the ugliness of this device is its beauty. It's a snapshot of a discovery in progress.


Recent history note: The replica in the picture, unlike the one in the museum, was made for a 50th anniversary celebration held at the Murray Hill facility in 1997. Lucent Technologies' Microelectronics Group, the name on the plaque, included both the integrated-circuit business and the optoelectronic-device businesses (not including optical fibers). Three years later, Lucent decided to spin that group off as Agere Systems, along with the people at Bell Labs (like me) whose work related to those businesses. (In a separate decision, they sold their optical fiber group and the associated Bell Labs members.) Within another few years, the optoelectronics business had been sold to Triquint, and much of it later became Cyoptics, while the IC business had been acquired by LSI. Most of the people from Bell Labs (like me) had already left. So it goes.

Saturday, September 12, 2009

E pluribus unum

A few diseases can be traced to specific genetic variants. The nerve degeneration of Huntington's Disease, for example, arises exclusively from alterations of either copy of a gene on chromosome 4. This gene specifies a protein that is now called huntingtin. Such diseases are referred to as Mendelian, since they follow the simple rules of inheritance that Gregor Mendel observed in his pea plants.

For most diseases, though, it has proved difficult to find individual genes that explain much of the risk. Instead, the growing evidence from large-scale studies is that many variants contribute, each contributing only weakly. Even then, the genes alone do not condemn a person to the disease, which may also depend on microbes or non-living elements of the environment or on lifestyle. These "complex" diseases include all of the biggies, like heart disease and stroke, cancers, and many mental illnesses.

In some ways, the failure of the "one-gene/one-disorder" hypothesis shouldn't be too surprising. After all, a gene that reliably causes a fatal disease should have been largely weeded out by natural selection. Huntington's disease avoids this fate because it usually appears late in life, often after people have already had children (including, fortunately for us, Arlo Guthrie). Sickle-cell anemia persists because people with a single variant gene are resistant to malaria, although two copies cause the disease.

Nonetheless, lots of other diseases have an important genetic component, which can be determined by comparing the disease rate for close relatives. For example, if pairs of "identical" twins are more likely to both get a disease than are fraternal twins, the difference presumably arises because they share their entire genome, rather than only half.
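The twin logic can be put in numbers. A rough sketch with illustrative correlations (my own assumed values, not from the post), using Falconer's classic estimate: heritability is roughly twice the difference between identical-twin and fraternal-twin correlations, since identical twins share all their segregating genes and fraternal twins on average half.

```python
# Back-of-envelope heritability from twin correlations (Falconer's
# formula). The correlations below are invented for illustration.

r_mz = 0.60  # assumed trait correlation in identical (monozygotic) twins
r_dz = 0.35  # assumed trait correlation in fraternal (dizygotic) twins

# MZ twins share 100% of segregating genes, DZ twins ~50%, so the
# genetic contribution accounts for the gap twice over.
heritability = 2 * (r_mz - r_dz)
print(f"estimated heritability: {heritability:.2f}")  # 0.50 here
```

This is a crude estimate that assumes, among other things, that identical and fraternal twins experience equally similar environments, but it conveys why the MZ/DZ gap is informative.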

For simple Mendelian diseases, researchers have extended this approach to locate where the disease gene resides in the chromosomes. This "linkage" analysis looks at which close relatives inherited a disease, and what known chromosome features they also inherited. This technique was applied in the 1980s to locate the Huntington's gene by testing dozens of residents of a Venezuelan village that had unusually many cases.

But human populations aren't particularly well suited for linkage studies. People don't have a lot of children, and they resist attempts at controlled breeding. As a result, it's hard to see weak genetic effects.

To get more subjects, researchers use association studies, which compare the genetics of unrelated individuals. Historically, you really had to know where to look to make association studies work. But in the past few years researchers have done dozens of "genome-wide association studies," or GWAS, that look without prejudice across the entire human genome.

These studies are tricky. For one thing, since they monitor perhaps a million genetic markers at once, the chances are good that a marker will correlate with the disease by dumb (bad) luck. In individual experiments, researchers traditionally ignore a result if the probability of it arising by chance isn't less than 5% (P<0.05). For testing a million markers, they might need to ignore a result unless the effect is so strong that the probability that it arose by chance is less than perhaps 5×10⁻⁸. To get such a convincing effect requires a lot of human subjects, generally hundreds or thousands. Even so, GWAS results often fail to recur when someone else tries the experiment.
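The genome-wide threshold is just the familiar 5% cutoff spread across all the markers. The arithmetic, in a Bonferroni-style correction, divides the usual significance level by the number of tests, so the chance that *any* marker clears the bar by luck stays near the original 5%:

```python
# Why GWAS uses a threshold near 5e-8: a Bonferroni-style correction
# for testing a million markers at once.

alpha = 0.05            # conventional per-experiment significance level
n_markers = 1_000_000   # typical number of markers on a genotyping array

per_marker_threshold = alpha / n_markers
print(f"per-marker threshold: {per_marker_threshold:.1e}")  # 5.0e-08
```

With a million independent shots at P<0.05, you'd expect roughly 50,000 spurious hits; demanding P<5×10⁻⁸ for each marker brings the expected number of flukes back down to about 0.05.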

Nonetheless, some genome-wide studies, like two for Alzheimer's I wrote about recently, have uncovered genes repeatedly associated with disease. In addition to variations of the DNA sequence, these studies often include structural variants such as copy-number variations, as well as "epigenetic" tags that change the expression of particular DNA regions. In spite of finding some likely genetic suspects, though, the total effect of all of the known variants is generally less than the known genetic component of these complex diseases. Researchers are actively debating the causes of this discrepancy; probably part of it comes because there are other contributions that are too weak to be seen in these studies.

Because complex diseases depend on the small contributions of many genetic variants, as well as the environment, buying your personal genome often won't tell you much definitive. But by studying these variants, and the way their effects interact in cells, researchers are learning a great deal about the nature of the diseases, including potential strategies for treating them.

Friday, September 11, 2009

Medical "Ghostwriting" Update

The practice of "ghostwriting" can range from unacknowledged editorial assistance to getting someone else to sign on as author of a paper that you wrote.

Reports from the Sixth International Congress of Peer Review and Biomedical Publication, this week in Vancouver, confirm that the practice is widespread, but don't clarify where it mostly falls on this spectrum:

  • In a survey, 7.8% of respondents admit that, on their articles in major medical journals, people who could have been listed as co-authors were not.
  • Looking at the metadata in Word files reveals hidden contributors in many manuscripts.

I hope that the journals, and the academic community, can figure out how to clamp down on this practice.

Added 9/13:

At the Knight Tracker, Paul Raeburn commented on the coverage of this conference. In particular, he notes that several stories touted the dangers of pharma ghostwriting, when the survey mentioned above does not actually reveal the nature of the unattributed authors.

Thursday, September 10, 2009

The Firehose

When I arrived as an undergraduate at MIT, the orientation material likened an education there to drinking water from a firehose. I get the same sense of frustrated exhilaration when I go to the amazingly useful web site of the National Center for Biotechnology Information. It's just plain humbling, how much there is to know.

The original raison d'être of NCBI, from its founding in 1988, was the maintenance of centralized databases of the staggering amount of genetic and other biological information, as well as tools for navigating through it. The center also funds extramural work to improve the acquisition, navigation and analysis of this and other data. Want to see the genetic markers in a particular section of human chromosome 13, together with nearby genes, hyperlinked to annotations of their function? It's all there for your perusal. And more. And more.

NCBI also hosts PubMed, a very useful hyperlinked database of journal articles, centered on biology and medicine, but also including other journals to some degree. It features really smart keyword searching, and includes links to online copies that are sometimes free. In fact, new rules also require, in principle, that a copy of any publication of research funded by the National Institutes of Health be deposited or freely linked at PubMed within a year of its publication, although I'm not sure they have the resources to enforce that requirement.

But one very cool resource, for those of us who don't have access to university libraries, is the bookshelf. I like Wikipedia a lot (although the memristor entry looks like it was written by HP), and I refer to it for background information in my stories, cautiously. But there's nothing like going straight to Alberts' Molecular Biology of the Cell, Stryer's Biochemistry, or dozens of other texts, for truly authoritative information (at least at the time it was written), systematically and comprehensively presented.

Take a sip!

Wednesday, September 9, 2009

Structural Variants

The release of the draft map of "the" human genome sequence in 2000 raised hopes that the genetic sources of human variability, and especially disease, would soon be identified. But it has become increasingly clear that the sequence overlooks a major source--perhaps the major source--of genetic variation.

In the years after the sequence was mapped, the International HapMap Project worked to identify some ten million common alterations of individual bases throughout the genome. These "single-nucleotide polymorphisms," or SNPs, constitute a molecular fingerprint or genotype, and companies now offer microarrays to test subsets of them. Researchers look for correlations of disease with particular variants to locate nearby genes that may cause disease. With a few exceptions, though, this process has been rather slow, and the genes it finds explain only part of the genetic contribution to disease.

One reason for this--although not the only one--is that the sequence differences don't reflect important genetic differences that arise when large segments of the gene are missing, duplicated, or reversed. Researchers estimate that these "structural variants" affect many more bases than the individual base changes. The importance of these changes was recognized early on in cancer, where they arise from the disruption of the usual quality-control mechanisms of DNA replication, but the past few years have shown that their influence is much more widespread.

The changes were previously invisible because the usual method of sequencing first chops up the DNA into many smaller pieces, whose base sequence is easier to determine. The different sections are then compared in software to see how they match up. With enough overlap and duplication, researchers can make a reasonable guess for the original long sequence. But this method breaks down in regions where sequences occur more than once, because there are many ways to match things up.
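The matching-up step can be sketched in a few lines. A toy version (invented fragments, nothing like a real assembler): fragments are joined where the end of one overlaps the start of another, and a repeated stretch lets the same fragment attach in more than one place, so several reconstructions are consistent with the same data.

```python
# Toy shotgun-assembly sketch: merge two fragments at their longest
# suffix/prefix overlap. Real assemblers are far more sophisticated,
# but the repeat problem shows up even at this scale.

def merge(a, b, min_overlap=3):
    """Join b onto a at the longest end overlap of at least min_overlap."""
    for k in range(min(len(a), len(b)), min_overlap - 1, -1):
        if a.endswith(b[:k]):
            return a + b[k:]
    return None  # no sufficient overlap

# Unique overlap: only one reconstruction is possible.
print(merge("GATTACA", "ACAGGT"))   # GATTACAGGT

# A repeated "TTT" lets the same fragment extend two different contigs,
# so the fragments alone can't say which join reflects the real genome.
print(merge("AGTTT", "TTTCA"))      # AGTTTCA
print(merge("CGTTT", "TTTCA"))      # CGTTTCA
```

When a sequence longer than the fragments occurs in several places, every copy offers a valid join, which is exactly why this method breaks down in repetitive regions.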

In recent years, experimenters have devised several techniques for finding copy-number variations arising from insertions or deletions, as well as inverted sections. For example, Mike Snyder's group at Yale developed a method that I covered for the New York Academy of Sciences (if you're not a member, look at "Go Deep" in the NYAS section of the "Clips" tab at my website). Most of the regions are 1,000-3,000 base pairs in length.

These changes contribute to many diseases. Last year, for example, an international consortium found that structural variants play a role in schizophrenia. But instead of fingering a few key suspects, the results pointed to hundreds of copy-number variations, each one of which has only a small effect. Interestingly, some of the same genetic regions seem to be involved in other mental illnesses.

Like genetic studies that use SNP genotypes, these results highlight the complex nature of many diseases, and the many distinct disruptions that can cause them. Treating these diseases may require a better understanding of the complete networks of interactions that underlie them. At the same time, different diseases seem to have important elements in common, and perhaps should be thought of as members of disease families.

The next few years should see a dramatic increase in the understanding of structural variants in human differences and disease as well as in human evolution.

Tuesday, September 8, 2009


One frustrating aspect of science writing is that there's little market for stories that are skeptical about a new advance, but there's a big market for stories that run with the hype.

One widely covered "beyond Moore's law" story last year was the announcement from HP Labs that they had discovered the "fourth basic circuit element," the "memristor," which "could transform computing." As I've complained before, lots of researchers think they can solve industry's problems even though they don't know what they are, but in this case I think the strategy is more deliberate.

HP, in fact, has a history of visionary claims. In 2005, for example, they announced the "crossbar latch": "Who Needs Transistors? HP Scientists Create New Computing Breakthrough at Molecular Scale." They described dense arrays of devices whose resistance depends on previous voltages. But their claims, such as their ability to "restore" signal levels, were quite misleading, since they depend on surrounding their structure with normal transistors. The devices themselves, although densely packed, were slow and passive and certainly not a threat to transistors.

In spite of this history, most news coverage of the memristor repeated the framing in the HP press release: this was the "fourth element" that researchers had sought for decades. U.C. Berkeley electrical engineering professor Leon Chua authoritatively endorsed this view.

Unfortunately, it was Chua who had made the original claim about the memristor, in a 1971 paper that was cited only about 20 times over the next 37 years. In reality, no researchers were beating the bushes looking for this device; most had never heard of it. Meanwhile many experimenters, including those at HP, had made devices that had the special properties of memristors, but didn't see a need for that language. Now the crossbar latch appears to contain memristors.

So what elite group is the memristor supposed to be the fourth member of? The other three members are the resistor, capacitor, and inductor. These are the passive, linear, two-terminal circuit elements found in every textbook. Passive means they don't provide power, and two-terminal means they just have two wires coming out. Linear means that output is proportional to input, but for reasons I'll explain below that word was left out of the HP descriptions.

In his 1971 paper, Chua argued that these elements relate either the current or its integral over time (the charge) to the voltage or its integral (which he somewhat surprisingly calls the magnetic flux). The resistance relates current to voltage. The capacitance relates the charge to the voltage. The inductance relates the current to the flux. And one thing is missing….
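Spelled out in symbols (my transcription, using standard textbook notation), the bookkeeping looks like this: charge and flux are defined as the time integrals of current and voltage, and each element pairs off two of the four quantities.

```latex
% Chua's four pairings, with q and \varphi the integrals of i and v:
\begin{aligned}
  q(t) &= \int i\,dt, &\qquad \varphi(t) &= \int v\,dt,\\[4pt]
  \text{resistor:}\quad v &= R\,i, &\qquad
  \text{capacitor:}\quad q &= C\,v,\\
  \text{inductor:}\quad \varphi &= L\,i, &\qquad
  \text{``memristor'':}\quad \varphi &= M\,q.
\end{aligned}
```

Laid out this way, the fourth slot, relating flux to charge, does look conspicuously empty, which is the symmetry argument the whole "fourth element" story rests on.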

This unusual description of these familiar devices is a little unnerving. But as Chua modestly explains to Information Week:

"Electronic theorists have been using the wrong pair of variables all these years -- voltage and charge. The missing part of electronic theory was that the fundamental pair of variables is flux and charge," said Chua. "The situation is analogous to what is called 'Aristotle's Law of Motion,' which was wrong, because he said that force must be proportional to velocity. That misled people for 2000 years until Newton came along and pointed out that Aristotle was using the wrong variables. Newton said that force is proportional to acceleration -- the change in velocity. This is exactly the situation with electronic circuit theory today. All electronic text books have been teaching using the wrong variables -- voltage and charge--explaining away inaccuracies as anomalies. What they should have been teaching is the relationship between changes in voltage, or flux, and charge."

What seems to be missing is something that relates charge (the integral of the current) to flux (the integral of the voltage). He postulated, and HP says they found, this missing element, called the memristor.

But remember that omitted word "linear"? The device that is in the same club as the resistor, capacitor, and inductor would have, like them, a proportionality between the quantities it relates. This simple linear relationship is what makes these devices fundamental. And if the integral of the current is proportional to the integral of the voltage, then their derivatives, the current and voltage, are also proportional. In other words, in this linear case--the only case for which a memristor can legitimately join the ranks of the other three devices--"the memristor reduces to a linear time-invariant resistor," in Chua's own words.
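The one-line check behind this point (my notation): differentiate the memristor's defining relation with M held constant, and Ohm's law falls out.

```latex
% If M is constant, the linear flux--charge relation is just a resistor:
\varphi = M\,q
\;\Longrightarrow\;
\frac{d\varphi}{dt} = M\,\frac{dq}{dt}
\;\Longrightarrow\;
v = M\,i .
```

So the only memristor that sits in the same linear club as R, C, and L is indistinguishable from a resistor; anything with memory requires M to depend on the history, which takes it out of the club.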

So the memristor only exists, as a fundamental circuit element, when it is just a resistor.

Of course, there are nonlinear generalizations of all of these devices, and things get complicated really fast. And real devices aren't purely one device or another. Capacitors have some series resistance and some leakage resistance, for example.

Still, maybe thinking in terms of a general, nonlinear "memristivity" makes some observations easier to understand. Even if it doesn't, people may discover some cool stuff by exploring this area. And to their credit, HP researchers seem to be seriously pursuing this, as exemplified by this September 1 Nano Letter. But there's an awful lot of noise and hype that makes it hard to get a bead on the real issues. Certainly the story is much more complex than, say, finding a missing element predicted to be at a particular spot in the periodic table. It's more like deciding that Pluto wasn't really a planet after all--and we know how complicated that story gets.

Once a story like this gets out there, though, it's very hard to sell a follow-up that puts the genie back in the bottle, or even adds some perspective--unless the original is so widespread that virtually everybody has heard of it. (I've tried.)

After all, why would you want to put a lot of effort into understanding something that wasn't worth learning about in the first place?

I'll bet you haven't even read this whole post.

Monday, September 7, 2009

Alzheimer's and Inflammation

Two online letters, just out in Nature Genetics (here and here), found three genes that had a statistically significant correlation with late-onset Alzheimer's disease. For the past 16 years, only one gene, APOE, had been connected with this common form of the disease, explaining about half of its genetic heritability. In contrast, the rare, early-onset form has a more classic "Mendelian" genetic pattern, in which, if you have a mutation in one of three genes, you have a high probability of getting the disease.

The new results have the common disappointments of the last few years of genome-wide association studies, or GWAS, of complex diseases: (1) The effects of any particular genetic variant are weak, so they can only be seen by studying thousands of subjects. (2) Because half a million candidate mutations are tested simultaneously, it's hard to assess the significance of something that looks like an association. It's easy to pick up false positives by chance alone, so researchers need to apply big corrections. (3) Different studies identify different variants. In this case, the studies agree on a gene called CLU, but each of them also finds another gene that the other study doesn't. (4) The cumulative effect of all the variants found is not enough to explain the observed heritability of the disease. It seems that there must be many other, unidentified contributors, each having only a small effect.
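Point (2) is simple arithmetic. A quick sketch of the multiple-testing problem, using the "half a million" figure above (the numbers are illustrative, not taken from the papers themselves):

```python
n_tests = 500_000  # candidate variants tested simultaneously
alpha = 0.05       # conventional per-test significance level

# If every variant were judged at the usual 5% level, chance alone
# would produce an enormous number of false "associations":
expected_false_positives = n_tests * alpha  # roughly 25,000 spurious hits

# The simplest (Bonferroni) correction divides alpha by the number of
# tests, so a variant must reach a far smaller p-value to count:
corrected_threshold = alpha / n_tests  # roughly 1e-7

print(expected_false_positives, corrected_threshold)
```

This is why even a genuine association with a weak effect needs thousands of subjects before it clears the corrected threshold.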

Confirming this weakness, co-author Michael Owen, of Cardiff University in Wales, noted in a supplementary statement on the Nature Genetics website that "the current genes on their own are not strong predictors of risk and are not suitable for risk testing." I'll have a lot more to say about GWAS and disease in future posts.

But although the genes aren't very useful for predicting risk, they do give clues about the biological mechanisms of the disease. Most previous discussions of Alzheimer's, including these two papers, concern two types of protein deposits in the brain: "plaques" of β-amyloid protein and "tangles" of tau protein. Both deposits are often seen in the brains of Alzheimer's patients after they die. Clusterin, the protein coded by CLU, may help clean up the plaques.

But in her supplementary statement, Julie Williams, also of Cardiff, noted that "clusterin has a role in dampening down inflammation in the brain. Up until now increased inflammation seen in the brains of Alzheimer's sufferers had been viewed as a secondary effect of disease. Our results suggest the possibility that inflammation may be primary to disease development."

This reminded me of Paul Ewald's talk at a January 2007 symposium at Hunter College, "Evolution, Health, and Disease," which I covered on behalf of the New York Academy of Sciences. Ewald, of the University of Louisville, noted that inflammation of arterial plaques is a common feature of the atherosclerosis that often leads to heart disease. (A test for inflammation using C-reactive protein (CRP) is often used to predict heart-attack risk.) But he also noted that the troublesome ε4 variant of the APOE gene "is the major risk factor, not only for atherosclerosis and stroke, but also for sporadic Alzheimer's and multiple sclerosis," even though the fat transport that influences atherosclerosis involves completely different chemistry from the formation of protein plaques in Alzheimer's or the destruction of myelin sheaths in multiple sclerosis. "The idea that ε4 would be bad in all of these different ways," Ewald said, "is really stretching it."

Instead, Ewald suspects that the common element in these various diseases is infection, perhaps by Chlamydia pneumoniae. I imagine that his view remains on the fringe, and perhaps it will stay there. But 25 years ago, the idea that many ulcers are caused by a bacterium was also a fringe idea. Barry Marshall and Robin Warren won the 2005 Nobel Prize in Physiology or Medicine for tracing ulcers to Helicobacter pylori. Maybe in 25 years we will find it natural to associate Alzheimer's with infection, too.

Saturday, September 5, 2009


There's a lot of buzz about two reports in Science (here and here) on monopoles in magnetic solids. (The best story I saw is by Adrian Cho in ScienceNOW, the online news section of Science.) Much of the excitement comes because some candidates for a grand unified theory predict such particles. But although the results are fun, to my mind the enthusiasm is misplaced.

Eugenie Samuel Reich (author of Plastic Fantastic) got a jump on the coverage in her May story in New Scientist, which also illustrates the problem. The magazine cover screams: "THE MYSTERIOUS MONOPOLE: Predicted by theory; Hunted for decades; FOUND AT LAST." Her closing quote is more accurate: "These might not be exactly the monopoles that Dirac dreamed of, but that doesn't mean they're not remarkable." In what sense, then, was the predicted and hunted particle "found"?

I need to be careful here: last October, I wrote in New Scientist about nonabelian anyons, predicted particles for which outwardly identical arrangements can be distinct quantum states. Researchers are hopeful that these states could stably store quantum information for use in quantum computing. An earlier draft of that story, which I rather liked, began: "When it comes to unveiling fundamental particles, the Large Hadron Collider gets all the press. But in a few, much smaller labs around the world, some physicists suspect they've already found particles as exotic as any the new accelerator is likely to create."

So why do I think finding nonabelian anyons in a solid is more profound and fundamental than finding monopoles in a solid? To answer that question, I need to distinguish three levels of description of a complex system: how the particles interact, how they arrange themselves, and how they deviate from that arrangement. The descriptions are interconnected, but sometimes, one level is more interesting than another.

To be concrete, let's think about atoms arranged on a square lattice, each with a magnetic moment that acts like a tiny bar magnet able to point in different directions. As it happens, this is somewhat similar to the spin system where the monopoles were seen.

To mathematically describe the interactions, physicists use an expression--unhelpfully called the Hamiltonian--for the total energy of any particular arrangement. The spins, for example, might favor arrangements with neighbors pointing in opposite directions. If they point in the same direction, it takes extra energy. The Hamiltonian just adds up the energy of every pair of neighbors. Fundamental theories, like proposed grand unified theories, aim at this basic level of the Hamiltonian, which for the universe as a whole is unknown. Experiments only probe how particles arrange themselves in response.

The arrangement that gives the lowest total energy for a particular Hamiltonian is the ground state. Sometimes it's obvious what this will be. On a square lattice, for example, spins forming a checkerboard pattern will always have neighbors pointing in opposite directions. On a triangular lattice, though, some neighbors must point the same way. In other cases, physicists don't know what the ground state is, even when they know the Hamiltonian. This is the situation for the two-dimensional electron systems, where nonabelian anyons may exist: the possible arrangements are very complicated and physicists aren't sure what the lowest-energy state is. (It might also depend on neglected details in the Hamiltonian.)
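The square-lattice claim is easy to verify numerically with a toy antiferromagnetic Ising Hamiltonian. In this sketch, the lattice size, coupling constant, and periodic boundaries are my illustrative choices, not details of the actual experiments:

```python
import itertools

N = 4    # lattice size (even, so the checkerboard wraps around cleanly)
J = 1.0  # positive J makes aligned neighbors cost energy

def energy(spins):
    """Sum J * s_i * s_j over nearest-neighbor pairs, periodic boundaries."""
    E = 0.0
    for x, y in itertools.product(range(N), repeat=2):
        E += J * spins[x][y] * spins[(x + 1) % N][y]  # right neighbor
        E += J * spins[x][y] * spins[x][(y + 1) % N]  # lower neighbor
    return E

checkerboard = [[1 if (x + y) % 2 == 0 else -1 for y in range(N)] for x in range(N)]
aligned = [[1] * N for _ in range(N)]

# Every one of the 2*N*N bonds is anti-aligned in the checkerboard, so it
# reaches the lowest possible energy; the aligned state pays the full penalty.
print(energy(checkerboard), energy(aligned))
```

On a triangular lattice the same Hamiltonian has no such perfect arrangement, since some neighbor must always point the same way--which is exactly why the ground state there is harder to pin down.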

At temperatures above absolute zero, a system doesn't have to be in its ground state. Instead, it will include excitations away from the ground state (which necessarily require energy). For our spin system, for example, one of the spins on the checkerboard could be flipped so that it points the same direction as its neighbors. Although some specialized experiments can measure the ground state directly, many observable properties depend on these excitations. The excitations that can occur depend very much on the underlying ground state, although it's not always easy to tell what the most important excitations are going to be.

So where do monopoles fit in? In the recent experiments, spins in a crystal arrange so that their excitations act like these isolated magnetic poles. This is certainly unusual and interesting. But there is no reason to think that the Hamiltonian that describes the spin has any relationship to the one that describes the fundamental particles of the universe, any more than two equations, both solved by x=2, are necessarily related to each other. People have looked for monopoles in exotic places like cosmic rays because seeing them would give a profound clue about the Hamiltonian describing elementary particles. The monopoles seen here give no such clue.

Nonabelian anyons, by contrast, are inherently interesting particles. They may or may not be the excitations for certain two-dimensional electron systems studied in labs. We do know they can only exist in an effectively two-dimensional world, so they're not likely to turn up in the three-dimensional world of the Large Hadron Collider. But their quantum memory, which may even be useful, is a surprising property not shared by any known particle. In this case, the excitations (which reflect an unusual ground state) are the interesting thing, not the Hamiltonian.

To be fair, the new monopoles also have unique properties, not shared by any known particle. At that level, they are quite interesting in their own right. They just don't tell us much about grand unified field theories.

Friday, September 4, 2009

The Harder They Come

Over at Physical Review Focus, my latest story concerns a simple model that can deal with materials whose hardness varies with direction.

One important message, which gets diluted by the required pegging of the story to a current Physical Review article, is that there are some new ultrahard materials called transition-metal diborides. One of these, ReB2, is said to be hard enough to scratch diamond. I'm a little confused about this claim, because measurements of Vickers microhardness, made with a tiny diamond indenter, give numbers a bit lower than diamond's. (Other researchers complained about the claims; Science subscribers only.) But it's certainly competitive, and apparently easier to make than competitors like boron nitride, which needs high temperatures and pressures.

It's important to be careful talking about "hardness," which is one of those dual-use words whose lay meaning is imprecise. Scientifically, hardness refers to permanent, "plastic" deformation, as contrasted with stiffness, which is characterized by the bulk modulus or compressibility and refers to reversible, elastic deformation. Water is incompressible but not at all hard, for example. In a crystalline solid, plastic deformation requires that planes of atoms slide by one another, breaking bonds as they do so. In normal materials, this process is sensitive to things like grain structure and defects, which is what smiths change when they "temper" steel. In principle, though, microhardness is an intrinsic, unchangeable property of a particular crystal.

The author of the paper, Antonín Šimůnek (want to know how to pronounce Czech?), has developed a model for hardness that simply calculates the bond strengths for the various bonds in the material and combines them into an overall hardness. Apparently there's been a flurry of similar papers over the last few years, which is kind of surprising since hardness has been important for a long time. In fact, the ingredients of the models seem almost primitive in these days of massive computer simulations. Šimůnek told me that he recently realized that his overtly classical view of the forces between atoms is similar to work published in 1939 by Richard Feynman, then at MIT. In the current paper, he distinguishes between bonds aligned with and perpendicular to the force to calculate a direction-dependent hardness, which apparently has never been done before.

As an undergraduate at MIT, a hotbed of early metallurgy research, I heard an amusing story about calculations of hardness in metals. The story goes that the first calculations for a single crystal predicted a strength much greater than observed. At that point someone realized that, instead of a plane of atoms sliding all at once, the same overall motion could come from the progressive motion of a line defect, called a dislocation, moving one atomic row at a time in zipper fashion, which of course takes much less energy. Sadly, the calculations then predicted much too low a strength. The researchers then realized that the motion of dislocations could be arrested if they were pinned by crystalline defects. The predicted strength was then in line with measurements.

Wisely, they stopped improving the models.

Thursday, September 3, 2009

Soft Machines

Biological macromolecules, such as DNA, RNA and protein, spend much of their time in the company of other macromolecules, rather than floating freely on their own. The resulting molecular behavior in a cell resembles the choreographed handoff of a product between machines in an assembly line, rather than the indiscriminate reactions of molecules in a chemistry-lab beaker. The intriguing twist is that the "machines" are themselves collections of biological macromolecules that can also come and go.

Specialized cellular structures, known as organelles, are basic to any introduction to cell biology. But the usual suspects, such as the nucleus, mitochondria, chloroplasts, the endoplasmic reticulum and Golgi apparatus, and various vacuoles, are all delineated by membranes. The lipid molecules of the membrane are also free to come and go, of course, but this construction makes it easy to define the organelles.

Other structures, often not called organelles because they lack a membrane, are just as important. The best known examples are ribosomes, which are large enough (about 0.02 microns) to have been noticed in electron micrographs some fifty years ago. They are complexes of specialized RNA with specific proteins, together known as ribonucleoprotein, or RNP. The ribosomes translate the sequence of bases in a messenger RNA strand into a corresponding amino-acid sequence in a growing protein.

Another structure, found in the nucleus, is the spliceosome. This critical RNP complex processes the pre-mRNA that is directly transcribed from the DNA, cutting out some sections and splicing the rest back together to form a proper protein-coding sequence. The spliceosome frequently splices in different sections of the pre-mRNA to specify different variants of the protein, depending on cellular conditions. Once the strand is equipped with a cap and a tail on opposite ends, it is a messenger RNA ready for export from the nucleus.

Back outside the nucleus, the messenger RNA may encounter two other types of RNP structures, known as processing bodies and stress granules (the latter appear only in stressed cells). Messenger RNA can be temporarily stored in either of these complexes, delaying its translation into protein by the ribosomes. Processing bodies can also permanently degrade the RNA, preventing it from being translated. In ways that are still being explored, this degradation is associated with RNA interference, in which short regulatory RNAs work with proteins to target messenger RNA strands that have a mostly complementary sequence.

Experiments show that molecules are constantly entering and leaving at least some of these complexes, unimpeded by any membrane, so there may be no clear line between them and the transient association of a few macromolecules. These dynamic associations are similar to the shifting alliances of politicians in Congress, some formal and restrictive like a political party, some more like an informal hallway conversation, but each contributing to the political process. Molecular complexes in the cell, though not always recognized, are equally critical to its function.

Pharma in the News

Two big stories this week about how the pharmaceutical industry gets doctors to prescribe its drugs:

  • Some of the marketing plans of Forest Laboratories for their antidepressant Lexapro were made public. Their previous product, Celexa, which was approaching the end of its patent protection, contains a mixture of two mirror-image versions of the same molecule, while Lexapro contains only one. Generating a market for the newer, more expensive replacement, when it is so similar, takes a full-court "marketing" press. (See also this view from a pharmaceutical industry researcher at In the Pipeline.)
  • For a record $2.3 billion, Pfizer settled charges about its marketing practices. There have been several previous fines of many hundreds of millions of dollars, but apparently the company judged the profit potential to be worth the risk.
  • In a more encouraging report, the FDA says that 80% of a sample of postmarketing studies are proceeding on schedule. Since these studies include many more patients than those used for initial approval of drugs, they can uncover rare problems that were not statistically apparent in the original studies. Pharmaceutical companies have been criticized for dragging their feet on these studies, since they have little incentive to uncover new problems.

Wednesday, September 2, 2009

Biology Is Not Chemical Engineering

Those of us who were trained in physics or chemistry tend to think of a cell as a bag, mostly water, with some active molecules floating around randomly in it. This is really misleading.

Chemical engineers, for example, often envision a "well stirred reactor." (Physicists do too, the difference being that they often do it without realizing it.) In this model, every molecule is equally likely to be anywhere in the reactor, which we take to be a cell. Maybe we have one pot for the inside of the nucleus and one for the cytoplasm, but we imagine them individually to be well stirred. This makes mathematical modeling really straightforward. For example, the rate at which two types of molecule react is proportional to the product of their concentrations in the cell, times the likelihood that they will react if they bump into each other. From here it's easy to write a set of simple differential equations describing how the number of each type of molecule changes over time. Easy, but wrong--or at least oversimplified.
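For concreteness, here is what that well-stirred model looks like for a single reaction A + B → C, integrated with a crude Euler step (the rate constant and starting concentrations are arbitrary illustrative values):

```python
k = 0.5                  # rate constant (arbitrary units)
a, b, c = 1.0, 0.8, 0.0  # concentrations of A, B, C at time zero
dt = 0.001               # time step

for _ in range(10_000):  # integrate out to t = 10
    rate = k * a * b     # mass action: rate ~ product of concentrations
    a -= rate * dt       # A is consumed...
    b -= rate * dt       # ...as is B...
    c += rate * dt       # ...while C accumulates
print(a, b, c)
```

The well-stirred assumption is what lets a single number stand in for each species; the rest of this post is about why that number can badly misrepresent a real cell.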

Consider messenger RNA, which is an inverted copy of the genetic information in DNA. In eukaryotes like us, after the RNA is transcribed in the nucleus, it moves to a ribosome out in the cytoplasm of the cell, where it is translated into protein. I used to imagine this just like a well stirred reactor: the freshly minted RNA diffuses randomly around the nucleus, some of it leaks out through holes in the nuclear membrane into the cytoplasm, and there it randomly bumps into a ribosome to begin translation.

A big problem with this image is that in reality the RNA is virtually never alone. Everywhere it goes, it is accompanied by an entourage of proteins that guide it to its next press conference, keep away the paparazzi, and so forth. Nobody gets in without an appointment.

Even as the RNA is being transcribed, proteins modify it with tags that identify it as a protein-coding RNA, rather than, for example, an RNA involved in regulating gene expression. It is quickly shepherded to the splicing apparatus, where more proteins remove some sections and splice the others together to form mature messenger RNA. Special bodyguard proteins then accompany it through the security gate at the nuclear pore. Although its name suggests a simple hole in the nuclear membrane, the nuclear pore doesn't let any molecules in or out without one of these bodyguards. The RNA may then be escorted directly to a ribosome to begin translation. Alternatively, it may be shunted to a holding area within the cell to await further instructions. If its sequence matches one of the regulatory RNAs, which has its own protein entourage, the enforcer from the regulatory entourage executes the messenger RNA, cutting it to pieces before any translation. So it goes.

Proteins also spend much of their time in direct contact with other proteins, beginning even during their translation from RNA. These transient associations aren't easy to measure, but biologists are continually improving their tools for marking individual proteins, imaging their locations, and recording who they associate with. These tools reveal that the metaphor of the cell as a machine--imagine RNA processing as a production line--is much more potent than the metaphor of the well-stirred reactor.