Friday, September 25, 2009

The Map and the Territory

The map is not the territory. --Alfred Korzybski

I confused things with their names: that is belief. --Jean-Paul Sartre

Ceci n'est pas une pipe. ("This is not a pipe.") --René Magritte

In fields ranging from economics to climate to biology, scientists build representations of collections of interacting entities. Everyone knows that the real systems have so many moving parts, influencing each other in poorly known ways, that any representation or model will be flawed. But even though they understand the limitations, experts routinely talk about these systems using words that come from the models, rather than from reality. Climate scientists talk of the "troposphere," economists talk of "recessions," and biologists talk of "pathways." Such concepts help us organize our thinking, but they are not the same as the real thing.

Sometimes the difference between the "map" and the "territory" is manageable. Roads and rivers are not lines on a piece of paper, but they clearly exist. Similarly, the frictionless pulleys and massless ropes of introductory physics have a simplified but clear relationship to their real-world counterparts (at least after you've spent a semester learning the rules). Still, it's easy to get sucked into thinking of these well-behaved theoretical entities as the essence, the Platonic ideal, even as one learns to decorate them with friction and mass and other real-world "corrections."

For many interesting and important problems, however, the conceptual distance between the idealizations and the boots-on-the-ground reality is much larger. You might think that experts would recognize the cartoonish nature of their models and treat them as crude guides or approximations, rather than fundamental principles partially obscured by noisy details. Judging from the never-ending debates in economics, however, the more obscure the reality, the more compelling the abstractions become.

Even in less contentious fields, experts can mistake the models for reality. For example, the fascinating field of systems biology aspires to map networks containing hundreds or thousands of molecules, using high-throughput experiments like microarrays together with computer analysis. Although one might like to describe all these interactions with coupled differential equations, researchers would often be happy simply to list which molecules interact. This information is often represented as a graph--sometimes called a "hairball"--in which each molecule is a dot or node, and each interaction is a line or edge connecting two nodes.
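To make the "hairball" concrete, here is a minimal sketch in Python of such a graph, stored as an adjacency list. The molecule names and interactions are invented purely for illustration:

    # A toy interaction network: each molecule maps to the set of
    # molecules it interacts with. Every undirected edge is stored twice.
    network = {
        "geneA": {"geneB", "geneC"},
        "geneB": {"geneA", "geneD"},
        "geneC": {"geneA"},
        "geneD": {"geneB"},
    }

    n_nodes = len(network)
    n_edges = sum(len(nbrs) for nbrs in network.values()) // 2
    print(f"{n_nodes} nodes, {n_edges} edges")  # 4 nodes, 3 edges

Real hairballs differ only in scale: thousands of nodes and tens of thousands of edges, which is precisely why they get drawn as tangled balls rather than tidy diagrams.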

Finding such graphs or networks is a major goal of systems biology. In principle, an exhaustive map is more useful than the traditional painstaking focus on particular pathways, which presumably make up only a small piece of the entire network. But to yield benefits, researchers need to understand how "accurate" these maps are.

A few years ago, a group of systems biologists decided the time was ripe to critically evaluate this accuracy. They established the "Dialogue on Reverse Engineering Assessment and Methods," or DREAM, to compare different ways of "inferring" biological networks from experiments. (I covered the organizational meeting, as well as meetings in 2006, 2007, and 2008, under the auspices of the New York Academy of Sciences. A fourth meeting, which like the third will be held in conjunction with the RECOMB satellite meetings on Systems Biology and Regulatory Genomics, is scheduled for December in Cambridge, Massachusetts.) These meetings, including competitions to "reverse engineer" some known networks, have been very productive.

Nonetheless, one thing the DREAM meetings made clear is that "inferring" or "reverse engineering" the "real" networks is simply not a realistic goal. Once the networks get reasonably complicated, it's essentially impossible to take enough measurements to clearly define the network. The ambiguity even applies to networks that actually have been engineered, that is, created by people on computers. The "inferred" networks are a useful computational device, but they are not "the" network. And they never will be.
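A toy example shows why the ambiguity is unavoidable. Suppose the true network is a simple chain in which molecule A activates B and B activates C, with no direct link from A to C. A naive method that draws an edge wherever two measured profiles are strongly correlated will connect A to C anyway. The sketch below (Python with numpy; the linear dynamics and noise levels are made up for illustration) demonstrates the spurious edge:

    import numpy as np

    rng = np.random.default_rng(0)
    n = 200  # number of measurements per molecule

    # True network is a chain: A activates B, and B activates C.
    # There is NO direct A-C interaction.
    a = rng.normal(size=n)
    b = 0.9 * a + 0.3 * rng.normal(size=n)
    c = 0.9 * b + 0.3 * rng.normal(size=n)

    profiles = {"A": a, "B": b, "C": c}
    names = list(profiles)

    # Naive "inference": declare an edge wherever |correlation| > 0.5.
    for i, x in enumerate(names):
        for y in names[i + 1:]:
            r = np.corrcoef(profiles[x], profiles[y])[0, 1]
            verdict = "edge" if abs(r) > 0.5 else "no edge"
            print(f"{x}-{y}: r = {r:.2f} -> {verdict}")

The indirect A-C correlation comes out nearly as strong as the direct links, so the data alone cannot distinguish the chain from a triangle. More sophisticated methods shrink this kind of ambiguity, but with enough molecules and limited measurements they cannot eliminate it.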

For these reasons, many researchers think the only proper way to assess the results is to compare predictions with experiments. If the models are good, they should not only match observed data but also extrapolate accurately to novel situations, such as the response to a new drug. Interestingly, the most recent DREAM challenges included tasks of this type. Disappointingly, however, the methods that best predicted the novel responses simply generalized from other responses: they did not include any network representation at all!
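To see how a method can predict without any network, consider the following sketch (my own illustration, not any actual DREAM entry): given the measured responses to known drugs, predict the missing readout for a novel drug by borrowing from the most similar known response profile.

    import numpy as np

    # Rows: known perturbations (e.g., drugs); columns: measured readouts
    # (e.g., expression changes). All values are invented for illustration.
    known_responses = np.array([
        [1.2, 0.1, -0.4, 0.8],
        [1.0, 0.2, -0.5, 0.9],
        [-0.3, 1.1, 0.7, -0.2],
    ])

    # For a novel drug, the first three readouts are measured and the
    # fourth is unknown.
    novel_measured = np.array([1.15, 0.1, -0.4])

    # Nearest-neighbor generalization: find the known drug with the most
    # similar measured readouts and borrow its fourth readout as the
    # prediction. No network appears anywhere in the calculation.
    distances = np.linalg.norm(known_responses[:, :3] - novel_measured, axis=1)
    nearest = int(np.argmin(distances))
    print(f"nearest known drug: {nearest}, "
          f"predicted readout: {known_responses[nearest, 3]}")

A method like this treats the cell as a black box: it interpolates among observed behaviors rather than modeling the machinery that produces them, which is what makes the success of this style of method in the challenges so thought-provoking.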

It seems reasonable to expect that a model that tries to mimic the internal network, even if it is flawed, would do better at predicting truly novel situations. But it's hard to know when a system will hit a tipping point and do something completely different, something that was never observed before or included in the model. Often, we won't recognize the limitations of our complex models--in biology, climate, or economics--until they break.
