Midgaard: Beautiful Data

Wednesday, February 3, 2010

Beautiful Data

There's an old joke where a scientist presents "representative data," when everyone knows it's really their best data. Like many jokes, there's a large measure of truth in it. (And like many science jokes, it's not actually funny unless you're a scientist.)

Audiences, whether in person or in print, like a good story. And it's best when data tell a story on their own, without further words of explanation. Scientists can handle tables of numbers better than most people, but, even for them, pictures tell the most compelling stories.

In fact, many experienced researchers begin a new manuscript by deciding which figures to include. This is partly because figures take up a lot of space in journals, but also because many readers go straight from the title to the figures, bypassing even the short abstract. If the pictures don't tell a good story, readers may just move on.

This is what it means when scientists say data are "beautiful": not that they have some intrinsic aesthetic appeal, but that they tell a good story about what's happening. Ideally, the story is compelling because the experiments have been done very well. But that's not always the reason.

Some of the most beautiful data I ever saw, in this sense, were presented at Bell Labs by Hendrik Schön in early 2002, at a seminar honoring Bob Willett and the other winners of that year's Buckley Prize. In contrast to Willett's painstaking work over many years elucidating the properties of the even-denominator fractional quantum Hall effect, Schön presented one slide after another demonstrating a wide variety of phenomena in high-mobility organic semiconductors.

The problem was that the story the beautiful data were telling was a lie. Eugenie Reich expertly described Schön's rise and fall in her 2009 book Plastic Fantastic. I later served on the committee that concluded he had committed scientific misconduct.

But honest scientists must also be careful when they choose data that support the story they want to tell. There is an intrinsic conflict of interest between telling the most compelling story and facing honestly what the data are saying. Decisions about which data to omit, and how to process the rest, must be handled with great care.

As a rule, scientists overestimate their objectivity in selecting and processing data. As individuals, they are more swayed by expectations than they would like to admit. Collaborators can help keep each other honest, but only to a degree.

One thing that keeps science on track in these situations is that other people may make the same sort of measurements. Often different experimenters have a different idea of what's right, and the back-and-forth helps the field as a whole converge toward the truth.

But what happens when a whole field expects the same thing? There's a real danger that the usual checks and balances of scientific competition will break down, and all the slop and subjectivity of experiments will be enlisted in service of the common expectation.

Just because data are beautiful (that is, they tell a good story) doesn't mean the story is right.
