  “Half a second longer and I would’ve had to [call him in the grasp],” says Carey. “If I stayed in my original position, I would have whistled it. Fortunately, I was mobile enough to see that he wasn’t completely in the grasp. Yeah, I had a sense of ‘Oh boy, I hope I made the right call.’ And I think I did.… I’m glad I didn’t blow it dead. I’d make the same call again, whether it was the last [drive] of the Super Bowl or the first play of the preseason.”

  Others aren’t so sure. Reconsidering the play a year later, Tony Dungy, the former Indianapolis Colts coach and now an NBC commentator, remarked: “It should’ve been a sack. And, I’d never noticed this before, but if you watch Mike Carey, he almost blows the whistle.… With the game on the line, Mike gives the QB a chance to make a play in a Super Bowl.… I think in a regular season game he probably makes the call.”* In other words, at least according to Dungy, the most famous play in Super Bowl history might never have happened if the official had followed the rule book to the letter and made the call he would have made during the regular season.

  It might have been a correct call. It might have been an incorrect call. But was it the wrong call? It sure didn’t come off that way. Carey was not chided for “situational ethics” or “selective officiating” or “swallowing the whistle.” Quite the contrary. He was widely hailed for his restraint, so much so that he was given a grade of A+ by his superiors. In the aftermath of the game, he appeared on talk shows and was even permitted by the NFL to grant interviews—including one to us as well as one to Playboy—about the play, a rarity for officials in most major sports leagues. It’s hard to recall the NFL reacting more favorably to a single piece of officiating.

  If this is surprising, it shouldn’t be. It conforms to a sort of default mode of human behavior. People view acts of omission—the absence of an act—as far less intrusive or harmful than acts of commission—the committing of an act—even if the outcomes are the same or worse. Psychologists call this omission bias, and it expresses itself in a broad range of contexts.

  In a well-known psychological experiment, the subjects were posed the following question: Imagine there have been several epidemics of a certain kind of flu that everyone contracts and that can be fatal to children under three years of age. About 10 out of every 10,000 children with this flu will die from it. A vaccine for the flu, which eliminates the chance of getting it, causes death in 5 out of every 10,000 children. Would you vaccinate your child?

  On its face, it seems an easy call, right? You’d choose to vaccinate, because not vaccinating carries twice the mortality rate of the vaccine. However, most parents in the survey opted not to vaccinate their children. Why? Because the vaccine itself caused 5 deaths per 10,000; never mind that without it, their children faced twice the risk of death from the flu. Those who would not permit vaccinations indicated that they would “feel responsible if anything happened because of [the] vaccine.” The same parents tended to dismiss the notion that they would “feel responsible if anything had happened because I failed to vaccinate.” In other words, many parents felt more responsible for a bad outcome if it followed their own actions than if it simply resulted from lack of action.
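
  To make the arithmetic concrete, here is a minimal sketch in Python of the expected-deaths comparison implied by the survey’s stated rates (the cohort size is just a convenient scale):

```python
# Expected deaths per 10,000 children, using the survey's stated rates.
FLU_MORTALITY = 10 / 10_000      # 10 in 10,000 die of the flu if unvaccinated
VACCINE_MORTALITY = 5 / 10_000   # 5 in 10,000 die from the vaccine itself

cohort = 10_000
deaths_without_vaccine = FLU_MORTALITY * cohort    # 10 expected deaths
deaths_with_vaccine = VACCINE_MORTALITY * cohort   # 5 expected deaths

print(f"No vaccine: {deaths_without_vaccine:.0f} expected deaths per {cohort:,}")
print(f"Vaccine:    {deaths_with_vaccine:.0f} expected deaths per {cohort:,}")
# Vaccinating halves the expected mortality; most surveyed parents still declined.
```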

  In other studies, subjects consistently view various actions taken as less moral than actions not taken—even when the results are the same or worse. Subjects, for instance, were asked to assess the following situation: John, a tennis player, has to face a tough opponent tomorrow in a decisive match. John knows his opponent is allergic to a particular food. In the first scenario, John recommends the food containing the allergen to hurt his unknowing opponent’s performance. In the second, the opponent mistakenly orders the allergenic food, and John, knowing his opponent might get sick, says nothing. A majority of people judged that John’s action of recommending the allergenic food was far more immoral than John’s inaction of not informing the opponent of the allergenic substance. But are they really different?

  Think about how we act in our daily lives. Most of us probably would contend that telling a direct lie is worse than withholding the truth. Missing the opportunity to pick the right spouse is bad but not nearly as bad as actively choosing the wrong one. Declining to eat healthy food may be a poor choice; eating junk food is worse. You might feel a small stab of regret over not raising your hand in class to give the correct answer, but raise your hand and provide the wrong answer and you feel much worse.

  Psychologists have found that people view inaction as less causal, less blameworthy, and less harmful than action even when the outcomes are the same or worse. Doctors subscribe to this philosophy. The first principle imparted to all medical students is “Do no harm.” It’s not, pointedly, “Do some good.” Our legal system draws a similar distinction, seldom assigning an affirmative duty to rescue. Submerge someone in water and you’re in trouble. Stand idly by while someone flails in the pool before drowning and—unless you’re the lifeguard or a doctor—you won’t be charged with failing to rescue that person.

  In business, we see the same omission bias. When is a stockbroker in bigger trouble? When she neglects to buy a winning stock and, say, misses getting in on the Google IPO? Or when she invests in a dog, buying shares of Lehman Brothers with your retirement nest egg? Ask hedge fund managers and, at least in private, they’ll confess that losing a client’s money on a wrong pick gets them fired far more easily than missing out on the year’s big winner. And they act accordingly.

  In most large companies, managers are obsessed with avoiding actual errors rather than with missing opportunities. Errors of commission are often attributed to an individual, and responsibility is assigned. People rarely are held accountable for failing to act, though those errors can be just as costly. As Jeff Bezos, the founder of Amazon, put it during a 2009 management conference: “People overfocus on errors of commission. Companies overemphasize how expensive failure’s going to be. Failure’s not that expensive.… The big cost that most companies incur is much harder to notice, and those are errors of omission.”

  This same thinking extends to sports officials. When referees are trained and evaluated in the NBA, they are told that there are four basic kinds of calls: correct calls, incorrect calls, correct noncalls, and incorrect noncalls. The goal, of course, is to be correct on every call and noncall. But if you make a call, you’d better be right. “It’s late in the game and, let’s say, there’s goaltending and you miss it. That’s an incorrect noncall and that’s bad,” says Gary Benson, an NBA ref for 17 years. “But let’s say it’s late in the game and you call goaltending on a play and the replay shows it was an incorrect call. That’s when you’re in a really deep mess.”*

  Especially during crucial intervals, officials often take pains not to insinuate themselves into the game. In the NBA, there’s an unwritten directive: “When the game steps up, you step down.” “As much as possible, you gotta let the players determine who wins and loses,” says Ted Bernhardt, another longtime NBA ref. “It’s one of the first things you learn on the job. The fans didn’t come to see you. They came to see the athletes.”

  It’s a noble objective, but it expresses an unmistakable bias, and one could argue that it is worse than the normal, random mistakes officials make during a game. Random referee errors, though annoying, can’t be predicted and tend to balance out over time, not favoring one team over the other. With random errors, the system can’t be gamed. A systematic bias is different, conferring a clear advantage (or disadvantage) on one type of player or team over another and enabling us—to say nothing of savvy teams, players, coaches, executives, and, yes, gamblers—to predict who will benefit from the officiating in which circumstances. As fans, sure, we want games to be officiated accurately, but what we should really want is for games to be officiated without bias. Yet that’s not the case.

  Start with baseball. In 2007, Major League Baseball’s website, mlb.com, installed cameras in ballparks to track the location of every pitch, accurate to within a centimeter, so that fans could follow games on their handhelds, pitch by pitch. The data—called Pitch f/x—track not only the location but also the speed, movement, and type of pitch. We used the data, containing nearly 2 million pitches and 1.15 million called pitches, for a different purpose: to evaluate the accuracy of umpires. First, the data reveal that umpires are staggeringly accurate. On average, umpires make erroneous calls only 14.4 percent of the time. That’s impressive, especially considering that the average pitch starts out at 92 mph, crosses the plate at more than 85 mph, and usually has been garnished with all sorts of spin and movement.

  But those numbers change dramatically depending on the situation. Suppose a batter is facing a two-strike count; one more called strike and he’s out. Looking at all called pitches in baseball over the last three years that are actually within the strike zone on two-strike counts (and removing full counts where there are two strikes and three balls on the batter), we observed that umpires make the correct call only 61 percent of the time. That is, umpires erroneously call these pitches balls 39 percent of the time. So on a two-strike count, umpires have more than twice their normal error rate—and in the batters’ favor.

  What about the reverse situation, when the batter has a three-ball count and the next pitch could result in a walk? Omission bias suggests that umpires will be more reluctant to call the fourth ball, which would give the batter first base. Looking at all pitches that are actually outside the strike zone, the normal error rate for an umpire is 12.2 percent. However, when there are three balls on the batter (excluding full counts), the umpire will erroneously call strikes on the same pitches 20 percent of the time.

  In other words, rather than issue a walk or strikeout, umpires seem to want to prolong the at-bat and let the players determine the outcome. They do this even if it means making an incorrect call—or, at the very least, refraining from making a call they would make under less pressured circumstances.
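
  For readers curious how such conditional error rates are tallied, here is a minimal sketch, assuming a hypothetical pitch-level table with columns in_zone (the pitch’s true location per the tracking data), called_strike (the umpire’s call), balls, and strikes; the file and column names are ours, not Pitch f/x’s:

```python
import pandas as pd

# Hypothetical pitch-level data; file and column names are illustrative.
# in_zone: pitch truly within the rule-book zone (boolean, from tracking data)
# called_strike: the umpire called a strike (boolean)
pitches = pd.read_csv("called_pitches.csv")

def error_rate(df: pd.DataFrame) -> float:
    """Share of called pitches where the call contradicts the true location."""
    return (df["in_zone"] != df["called_strike"]).mean()

# Baseline: error rate over all called pitches.
print("overall:", error_rate(pitches))

# Two-strike counts, full counts excluded: how often true strikes become balls.
two_strike = pitches[(pitches["strikes"] == 2) & (pitches["balls"] < 3)]
print("true strikes, two-strike counts:",
      error_rate(two_strike[two_strike["in_zone"]]))

# Three-ball counts, full counts excluded: how often true balls become strikes.
three_ball = pitches[(pitches["balls"] == 3) & (pitches["strikes"] < 2)]
print("true balls, three-ball counts:",
      error_rate(three_ball[~three_ball["in_zone"]]))
```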

  The graph on this page plots the actual strike zone according to MLB rules, represented by the box outlined in black. Taking all called pitches, we plot the “empirical” strike zone based on calls the umpire is actually making in two-strike and three-ball counts. Using the Pitch f/x data, we track the location of every called pitch and define any pitch that is called a strike more than half the time to be within the empirical strike zone. The strike zone for two-strike counts is represented by the dashed lines, and for three-ball counts it is represented by the darker solid area.
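
  Here is a sketch of how that empirical zone can be constructed from the called-pitch data, assuming hypothetical columns px and pz for the pitch’s horizontal and vertical plate-crossing location in inches (again, our names, not Pitch f/x’s):

```python
import pandas as pd

CELL = 0.5  # half-inch grid cells, matching the graphs' tick marks

def empirical_zone(pitches: pd.DataFrame) -> pd.DataFrame:
    """Grid cells in which more than half of the called pitches are strikes."""
    df = pitches.copy()
    df["x_bin"] = (df["px"] // CELL) * CELL   # snap plate-crossing x to grid
    df["z_bin"] = (df["pz"] // CELL) * CELL   # snap height to grid
    strike_rate = df.groupby(["x_bin", "z_bin"])["called_strike"].mean()
    return strike_rate[strike_rate > 0.5].reset_index()

def zone_area(zone_cells: pd.DataFrame) -> float:
    """Area of the empirical zone in square inches: cell count times cell area."""
    return len(zone_cells) * CELL * CELL
```

  Running empirical_zone separately on the two-strike and three-ball subsets and differencing the zone_area results is how a gap like the 93 square inches cited below would be measured.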

  The graph shows that the umpire’s strike zone shrinks considerably when there are two strikes on the batter. Many pitches that are technically within the strike zone are not called strikes when doing so would result in a called third strike. Conversely, the umpire’s strike zone expands significantly when there are three balls on the batter, going so far as to include pitches several inches outside the strike zone. To give a sense of the difference, the strike zone on three-ball counts is 93 square inches larger than the strike zone on two-strike counts.*

  ACTUAL STRIKE ZONE FOR THREE-BALL VERSUS TWO-STRIKE COUNTS

  Box represents the rules-mandated strike zone. Tick marks represent a half inch.

  The omission bias should be strongest when making the right call would have a big influence on the game but missing the call would not. (Call what should be a ball a strike on a 3–0 pitch and, big deal, the count is only 3–1.) Keeping that in mind, look at the next graph. The strike zone is smallest when there are two strikes and no balls (count is 0–2) and largest when there are three balls and no strikes (count is 3–0).

  ACTUAL STRIKE ZONE FOR 0–2 AND 3–0 COUNTS

  Box represents the rules-mandated strike zone. Tick marks represent a half inch.

  The strike zone on 3–0 pitches is 188 square inches larger than it is on 0–2 counts. That’s an astonishing difference, and it can’t be explained by random error.

  We also can look at the specific location of pitches. Even obvious pitches—those in the dead center of the plate or waaay outside the strike zone, which umpires rarely miss—are called differently depending on the count. The umpire will make a bad call to prolong the at-bat even when the pitch is obvious. So what happens with the less obvious pitches? On the most ambiguous pitches, those just on or off the corners of the strike zone that are not clearly balls or strikes, umpires have the most discretion. And here, not surprisingly, omission bias is at its most extreme. The table below shows how ball-strike calls vary considerably depending on the situation.

  PERCENTAGE OF CORRECT CALLS OF MLB HOME PLATE UMPIRES BY SITUATION

  A shrewd batter armed with this information could—and should—use it to his advantage. Facing an 0–2 count and knowing that the chances of a pitch being called a strike are much lower, he would be smart to be conservative in his decision to swing. Conversely, on a 3–0 count, the umpire is much more likely to call a strike, so the batter may be better off swinging more freely.

  From Little League all the way up to the Major Leagues, managers, coaches, and hitting experts all encourage players to “take the pitch” on 3–0. The thinking, presumably, is that the batter is so close to a walk, why blow it? But considering the home plate umpire’s omission bias, statistics suggest that batters might be better off swinging, because they’re probably conceding a strike otherwise. And typically, a pitcher facing a 3–0 count conservatively throws a fastball down the middle of the plate to avoid a walk. (Of course, if the pitcher also knows these numbers, he might throw a more aggressive pitch instead.)
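
  As a rough illustration of that advice, here is a toy expected-value sketch. The called-strike probabilities echo the outside-the-zone error rates quoted earlier (12.2 percent normally, roughly 20 percent on three-ball counts); the walk and strike values are pure placeholders, not estimates from our data:

```python
# Toy model: expected value of taking a borderline 3-0 pitch off the plate.
# walk_value and strike_value are illustrative placeholders, not estimates.

def take_value(p_called_strike: float,
               walk_value: float = 1.0,
               strike_value: float = 0.4) -> float:
    """Expected value of not swinging, given the chance of a called strike."""
    return (1 - p_called_strike) * walk_value + p_called_strike * strike_value

# Umpire calling by the book: a pitch off the plate is rarely a strike.
print(take_value(p_called_strike=0.122))
# On 3-0 the zone expands, so the "automatic take" quietly gives back value.
print(take_value(p_called_strike=0.20))
```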

  There are other indications that umpires don’t want to insert themselves into the game. For as long as sports have existed, fans have accused officials of favoring star players, giving them the benefit of the doubt on close calls. As it turns out, there is validity to the charges of a star system. Star players are treated differently by the officials, but not necessarily because officials want to coddle and protect the best (and most marketable) athletes. It happens because the officials don’t want to influence the game.

  If Albert Pujols, the St. Louis Cardinals’ slugger—for our money, the best hitter in baseball today—is up to bat, an umpire calling him out on a third strike is likely to get an earful from the crowd. Fans want to see stars in action; they certainly don’t want the officials to determine a star’s influence on the game. Almost by definition, stars have an outsized impact on the game, so umpires are more reluctant to make decisions against them than, say, against unknown rookies. Sure enough, we find that on two-strike counts, star hitters—identified by their all-star status, career hitting statistics, awards, and career and current salaries—are much less likely to get a called third strike than are nonstar hitters for the exact same pitch location. This is consistent with omission bias and also with simple star favoritism.

  But here’s where our findings get really interesting. On three-ball counts, star hitters are less likely to get a called ball, controlling again for pitch location. In other words, umpires—already reluctant to walk players—are even more reluctant to walk star hitters. This is the opposite of what you would expect if umps were simply favoring star athletes, but it is consistent with trying not to influence the game. The result of both effects is that umpires prolong the at-bats of star hitters—they are more reluctant to call a third strike but also more reluctant to call the fourth ball. In effect, the strike zone for star hitters shrinks when they have two strikes on them but expands when they have three balls in the count. Umpires want star hitters in particular to determine their own fate and as a result give them more chances to swing at the ball.

  As fans, we want that, too. Even if you root for the St. Louis Cardinals, you’d probably rather see Pujols hit the ball than walk. As an opposing fan, you’d like him to strike out, but isn’t it sweeter when he swings and misses than when he takes a called third strike that might be ambiguous? We essentially want the umpire taken out of the play. Fans convey a clear message—Let Pujols and the other team’s ace duel it out—and umpires appear to be obliging.

  The umpire’s omission bias affects star pitchers in a similar way. Aces are given slightly bigger strike zones, particularly on three-ball counts, consistent with a reluctance to end an outing and thereby influence the game. The more walks a pitcher issues, the more likely he is to be replaced, and that obviously has a sizable impact on the game and the fans.

  In the NBA, home to many referee conspiracy theories, skeptical fans (and Dallas Mavericks owner Mark Cuban) have long asserted the existence of a “star system.” The contention is that there is one set of rules for LeBron James, Kobe Bryant, and their ilk and a separate set for players on the order of Chris Duhon, Martell Webster, and Malik Allen. But confirming that star players receive deferential treatment from the refs is difficult, at least empirically. Stars have the ball more often, especially in a tight game as time winds down, and so looking at the number of fouls or turnovers on star versus nonstar athletes isn’t a fair comparison. Unlike in baseball, where we have the Pitch f/x data, we can’t actually tell whether a foul or violation should have been called. Did Michael Jordan push off against Bryon Russell before hitting the game-winning shot in the 1998 NBA finals? That’s a judgment call, not one that current technology can answer precisely and decisively.

  The closest thing to a fair comparison between stars and nonstars we’ve found is what happens when two players go after a loose ball. A loose ball is a ball that is in play but is not in the possession of either team (think of a ball rolling along the floor or one high in the air). Typically, there is a mad scramble between two (or more) opposing players that often results in the referee calling a foul. We examined all loose ball situations involving a star and a nonstar player and analyzed how likely it is that a foul will be called on either one.* Of the loose ball fouls called in those scrambles, about 57.4 percent are assessed against the nonstar and only 42.6 percent against the star. If the star player is in foul trouble—three or more fouls in the first half, four or more fouls in the second half—his share of the fouls drops further, to 26.9 percent, versus 73.1 percent for the nonstar. But what if the nonstar player is in foul trouble and the star isn’t? It evens out, tilting slightly against the star player, who receives the foul 50.5 percent of the time, whereas his foul-ridden counterpart receives it 49.5 percent of the time. These results are consistent with the omission bias and the officials’ reluctance to affect the outcome. Fouling out a player has a big impact on the game, and fouling out a star has an even bigger impact. Much like the called balls and strikes for star players in MLB, it is omission bias, not star favoritism, that drives this trend. Star players aren’t necessarily being given better calls, just calls that keep them in the game longer.
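
  A sketch of how those loose-ball splits could be tabulated, assuming a hypothetical table with one row per star-versus-nonstar scramble that drew a foul call, with columns foul_on_star, star_in_foul_trouble, and nonstar_in_foul_trouble (our labels, not a league data feed’s):

```python
import pandas as pd

# Hypothetical data: one row per star-vs-nonstar loose-ball scramble that
# drew a foul call. All file and column names are illustrative.
fouls = pd.read_csv("loose_ball_fouls.csv")

def star_share(df: pd.DataFrame) -> float:
    """Share of called loose-ball fouls assessed against the star player."""
    return df["foul_on_star"].mean()

print("all scrambles:", star_share(fouls))                # ~42.6% in the text
star_trouble = fouls[fouls["star_in_foul_trouble"]]
print("star in foul trouble:", star_share(star_trouble))  # ~26.9%
nonstar_trouble = fouls[fouls["nonstar_in_foul_trouble"]
                        & ~fouls["star_in_foul_trouble"]]
print("nonstar in foul trouble:",
      star_share(nonstar_trouble))                        # ~50.5%
```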