Using the Black Box

I was first introduced to the concept of black box thinking through the CrossFit community. In the CF system, the black box is a method of cutting through the confusing and unclear (and often completely absent) data in the world of training, nutrition, and health in order to find the best practices for reaching your personal fitness goals.

Simply put, the idea of the black box says that you have a system whose workings are complex, invisible, or unknown. How do you figure out its mechanism? You don’t. You simply supply an input, then observe the output. Do something and see what happens. Do you like what happens? Keep doing it. Don’t like it? Try something else.

You don’t know how or why, but you do know what, and that means you can keep doing it. You can correlate input with output (fairly reliably, if the result is repeatable), but you can’t “look inside the box” and work out the causal model. So the black box is great when you’re interested in what does what, but poor when you want to know how it does it.

CrossFitters love this because in the world of health nowadays it’s not only possible but quite easy to find experts, anecdotes, and research results that seem to support absolutely any imaginable point of view or practice. When you’re adrift in this ocean, trying to decide what to do with yourself (What do I eat? How do I exercise? Does this supplement work?), there is only one 100% surefire way to find certainty, and that’s to try stuff personally and see what happens. Does creatine help you? Well, the studies say that for most people it does xyz, but even with rock-solid data, that doesn’t help you much, because you’re an individual, and statistical results describe averages across groups, not what will happen to any one person. So you just stuff some creatine down your throat for a few weeks, then take stock and see what happened. Do you lift more? Run faster? Recover better? Hell, is your coat glossier, do your farts smell sweeter? Whatever; if you like what’s happening, keep doing it. If you don’t like it or don’t notice anything, quit.

What if it’s a placebo? Or what if some confounding factor, rather than the creatine, is responsible for the results? Who cares? “Keep doing what you’re doing” does not reveal these things, but it doesn’t rely on them either. If you like what’s happening, keep doing it. If I can get 10% better performance from a placebo, shit, I’ll snort that stuff year round.

So the black box is really great for this kind of trial-and-error self-experimentation, and to some extent it’s something all athletes do; the early stages of training are as much about learning which methods work for you as they are about any physical adaptation. However, problems can arise when people try to apply the black box to kinds of analysis it doesn’t fit well, and since CrossFitters have a habit of doing this, I want to talk about some of the things the black box is good for, and others where it’s not so effective.

The creatine scenario I mentioned is exactly where the black box shines. You have a clear input (you’re taking the stuff or you’re not) and a clear output (you improve or you don’t). The input is easy to modify, since the stuff is cheap and you just swill it down with water; and the downside is small, since either you improve or nothing much happens. In this case, therefore, the black box is king and other sources of data, such as controlled studies, are much less useful; they can “suggest” that trying the stuff is worth your time, but that’s about it.

But consider a different set of circumstances. What if rather than the rapid results (or rapid failure) of creatine, it instead took forty years for results to manifest in the majority of users? This would be a bad black box, because you’d have to use it for most of your life before you learned whether it was a good idea or not; there is such a long “lag” between input and output that the test is hardly worth doing.

What if my interest isn’t just in whether the drug will help me, but in whether it will help other people (whether I can recommend it)? The black box would tell me next to nothing here. What it gives me is what science calls an anecdote, a single piece of data gathered non-rigorously. Because I only know what happened to me (one person), my results are very unreliable; for all I know, I’m an incredible fluke and nobody else in the world would get the same results. Worse yet, because my “experiment” was done without any controls, I don’t really know what caused the results. Maybe the creatine would have worked, but because I happened to go on vacation in the second week, my output tanked and I saw nothing come from it. Or maybe it didn’t work after all, but I changed my diet in the meantime, which produced better performance. Since I haven’t controlled any of these factors, I can’t even say that “the drug did xyz to me”; maybe it did, but maybe it didn’t. All I can infer is a correlation, not causation, and a very weak one at that.

So can I tell my buddy to try this stuff because it works? No. The black box can’t tell me that. What if fifty of us all tried it, and it worked for most of us? Well, now we have more anecdotes, which is at least better than just one, but they still lack controls, so it’s not much better; if there were some systematic confounder (some other factor that most of us unknowingly added to the system and that actually produced our results), then we’d still know nothing about the drug itself. What if we all tried to control the other aspects of our lives to prevent confounders? Now we’re getting somewhere, and this is basically what a scientific study consists of. But that’s a far cry from the personal black box we started with.

What if we wanted to know whether a similar drug, not creatine but a compound based on it (call it Fakeatine), was worth trying? Would our creatine black box tell us this? No. All we know is whether creatine helped us, not how it did it; we have no data on its mechanism, even if we pull some creative theories out of our ass. We therefore can’t begin to speculate on whether Fakeatine would do something similar, better, or worse, since we don’t know what effect the differences between Fakeatine and creatine might have on creatine’s (unknown) biological mechanism. All we can do is start a new Fakeatine black box . . . but hopefully the problem with this is clear: if you don’t know causal mechanisms, and hence can’t extrapolate predictions, then you can never advance your knowledge beyond what you have personally tried. You can never say “this would probably be a bad idea,” because you haven’t tried it yet, so you don’t know. For that matter, you can’t even really say “creatine worked for me yesterday, so it will probably work for me today.” How do you know? Maybe its mechanism is one that self-destructs after fifteen days of use. In short, you lose the predictive power of science, which is really the whole point of the stuff, since it’s what tells us that poison is bad (odds are it will kill you) and brushing your teeth is good (odds are it will reduce cavities).

All right, now suppose I wanted to try another supplement, a drug called Poisontine. This stuff has about a 50% chance of permanently increasing my strength, but a 50% chance of killing me instantly; the difference comes down to a certain gene found in half the population. Is the black box a good way to find the effects of this supplement on me? I hope it’s obvious that it’s not; while I could certainly find out whether it worked, it would be useless information, because once I’ve taken it, I’m either dead or permanently enhanced, and in neither case is there any opportunity (in the first case) or reason (in the second case) for continuing or terminating the “experiment.” It’s not a useful black box if it only lasts for one “round”; it’s just a gamble. Whether or not you would take that gamble is a different matter, but it doesn’t fall under the bailiwick of the black box.

What if the drug didn’t kill me, but had a chance of permanently crippling me, maybe making my middle fingers fall off? This would also be a poor candidate for the black box, because while it would give us data, the data is too expensive. In order to learn what we want to know, we have to take a 50% chance of serious harm, and that’s simply not worth it; at the very least, if we could acquire our data another way (such as a test for the dangerous gene), it would be a much better method even if it weren’t quite as certain in results.

By now, a pattern should be emerging. The effectiveness of the black box is tied to the immediacy and personalization of its results, as well as the ability to easily and safely manipulate the system; the more we try to generalize the results, or the more difficult or dangerous it becomes to change inputs and examine outputs, the more the black box fails.

In sum: the black box is excellent, perhaps unmatched, for determining optimal inputs in systems where the only thing we care about is whether we “like” or “don’t like” the results. However, it tells us nothing that we can generalize, so it’s next to useless for “learning” anything or for suggesting new avenues to pursue (it can only test the ones we’ve already decided to try); it’s a poor tool when some potential outputs are harmful, unclear, or delayed; and it’s completely pointless when the system isn’t a lasting one whose inputs we can keep manipulating and whose outputs we can keep reaping.

So it’s great for figuring out what foods you like. But it’s really bad for weighing in on global warming. Make sense?