Wednesday, November 4, 2009

The Permanent Record

The history of science is documented in carefully crafted publications, which contain data the authors have selected and analyzed to best support their claims. But how should day-to-day scientific research be documented?

Once upon a time, measurements were taken by an experimenter reading an instrument, and transcribing the result by hand into a lab notebook. Written in pen, numbered sequentially, and dated, this notebook provided a time-stamped record of the raw data of the research.

Those days are gone. Modern instruments take massive amounts of data under computer control and often save it in files with inscrutable names that can later be modified at will, leaving no trace.

When I served on a committee investigating possible fraud by Hendrik Schön in 2002, we found numerous discrepancies in his published data. The committee hoped that we might resolve the problems by examining Schön's raw data. We were sorely disappointed.

We requested supporting information for only six of the eventual 25 papers under investigation. Schön (who was still working at Bell Labs) gave us a two-inch-thick stack of printouts. Essentially none of them met the traditional notion of an archival record of the raw data with documented provenance. All were processed data--and some of these were clearly manipulated. He said that storage limitations on the computer forced him to delete some of the data.

Schön also gave us a CD-ROM or two full of "raw" data files. Many of these proved to be files produced by the plotting program, Origin--also not raw data. These files were at least time-stamped, with dates corresponding to the original acquisition of the data. However, they had a curious property: although the dates on the files varied over a couple of years, the creation times on the files showed a steady progression, one every few minutes, even as the dates changed. Real raw data could have saved Schön, but what he produced only made things worse.

Bill Neaves, the Chief Executive Officer at the Stowers Institute for Medical Research in Kansas City, understands the need for an archival record, from investigating scientific misconduct cases in his former position at the Southwestern Medical Center in Dallas. When he was helping to set up the Stowers Institute almost a decade ago, he says, he decided to take advantage of the lack of institutional history to require that researchers maintain lab notebooks. Moreover, every week the notebooks are scanned and stored in an unalterable, time-stamped format.

Would this procedure have stopped Schön? Maybe not. During the investigation he showed that he could fabricate "raw" data as well as he could publishable data. But it would have slowed him down a lot, and it would have ensured that he knew that people were watching. And if he had been honest, the archive would have absolved him, as Neaves says the Stowers system has already done for one falsely accused researcher.

But these days, the written record is only a shadow of the activities of science. I'd also like to see all data-acquisition software unalterably configured to create read-only files with clear time stamps. With the low price of digital memory today, there is no excuse for not archiving all measurements.
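As a rough illustration of what such acquisition software might do, here is a minimal Python sketch. It is only a sketch under stated assumptions: the function name `archive_measurement`, the `.dat` extension, and the `MANIFEST.txt` file are hypothetical choices made for this example, not part of any real instrument's software. Note also that ordinary file permissions are not truly unalterable, since the file's owner can change them back; a real archive would need write-once storage or an independent copy of the checksums.

```python
import hashlib
import os
import stat
from datetime import datetime, timezone
from pathlib import Path


def archive_measurement(data: bytes, directory: Path, label: str) -> tuple[Path, str]:
    """Write data once to a time-stamped, read-only file; return (path, sha256).

    Hypothetical helper for illustration only.
    """
    directory.mkdir(parents=True, exist_ok=True)
    # Put a UTC time stamp in the file name so the name records acquisition time.
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%S.%fZ")
    path = directory / f"{label}_{stamp}.dat"
    # Mode "xb" raises FileExistsError instead of overwriting an existing file.
    with open(path, "xb") as f:
        f.write(data)
    # Clear every write-permission bit (owner, group, and others).
    os.chmod(path, stat.S_IRUSR | stat.S_IRGRP | stat.S_IROTH)
    # Record a checksum so that later changes to the file can be detected.
    digest = hashlib.sha256(data).hexdigest()
    with open(directory / "MANIFEST.txt", "a", encoding="utf-8") as manifest:
        manifest.write(f"{digest}  {path.name}\n")
    return path, digest
```

Calling `archive_measurement(b"raw bytes", Path("archive"), "run1")` would write one read-only file and append one line to the manifest. Copying the manifest somewhere the experimenter cannot edit is what would make later tampering detectable, which is the point of the exercise.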

Unfortunately, even this is not enough. Many of the great biology frauds have involved researchers altering samples to get the expected readout--for example, spiking them with radioactive compounds that will show up at the right place on a gel. Without mechanisms to track laboratory materials and their manipulation, and connect them to the eventual measurements, there will always be room for chicanery.

But the Stowers efforts are a step in the right direction. They send a strong message to researchers that the integrity of the scientific process is paramount, and that ensuring integrity will not be left to chance or to the reputation of any individual.



  1. So, in addition to being overworked and having little job security, scientists should be under constant automatic suspicion of fraud? That'll make it an attractive career option.

    Twenty years of that, and scientists will be careful mediocrities who will follow all the rules, meet on committees to set new rules, set compliance targets, and get prizes for the best organized lab notebooks. However, I doubt that they'd have the time or inclination to actually learn anything new about nature.

    I see this kind of approach as an attempt to impose quality from the outside, rather than doing it right in the first place. The right strategy is to reduce the incentive for fraud, rather than trying to prosecute afterwards.

    Basically, if you have people who are only honest because they are scared, you won't get any good science. You need people who are honest because they really care what the answer is, and they really want to find out. People who take pride in their work.

    The problem in science (and it's not a big one) is not fraud, but normal human incompetence and laziness. Sometimes, this is combined with a convenient inability to notice problems with one's experiment/analysis. But only rarely does it escalate to intentional fudging of data.

  2. I know this is an old post, but that is a ridiculous argument. There is no reason why anyone should be scared if they are not manipulating data. In fact, keeping all raw data protects you as a scientist from accusations of fraud.