My name is Jonah B. Gelbach, and I am an Assistant Professor
of Economics at the University of Maryland, College Park. If you're
looking for my usual web page, it may be found at
http://glue.umd.edu/~gelbach/usual-index.html. If you are
looking for a set of links I posted about the election, you can find
those at http://glue.umd.edu/~gelbach/whowon/links.html.
In this note, I address several issues:
I wish to note that I have benefited from discussions about
this issue with Rob Shimer (whose correspondence and web site at
http://www.princeton.edu/~shimer/election.html caused me
to begin thinking about this issue systematically), Scott Wallsten,
Rhiannon Patterson, and Seth Sanders. I hope in this note to provide a
primitive, behaviorally-based argument regarding the Buchanan-Palm
Beach controversy. Other analyses of which I am aware have not done
so, relying on ad hoc (though widely used) statistical
techniques. Ironically, in my view, the closest to being right have
been the most statistically simple methodologies.
Comments or questions are welcome, at gelbach@glue.umd.edu.
Let's begin by assuming that everyone in a county has identical preferences over candidates. This is of course an unreasonable assumption, and I will relax it below. But it will simplify the initial discussion. For purposes of this discussion, I will take the primitive to be pk, the probability that a given voter votes for some candidate k. It is straightforward to show that this primitive can itself be derived from a more primitive description of (rational, consistent) preferences regarding a person's choice of candidate for whom to vote,1 and I will not do so here.
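Let's do the simple thing first. Aggregating over all residents of County c, we may write the expected total number of votes for candidate k as
$$E[Y_{kc} \mid N_c] = p_k N_c, \qquad (1)$$
where $Y_{kc}$ is the number of votes for candidate k in County c and $N_c$ is the total number of residents (i.e. voters) in County c.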
There is an important thing to note straight away: the relationship between the conditional mean total number of votes for candidate k and county size is precisely linear, with an intercept of 0. The intuition is very simple. If there were no voters, candidate k would get no votes, which yields the conclusion regarding the intercept. Secondly, if we know the expected number of votes for candidate k in one group of people, and we are handed another group of the same exact size and preferences, we simply have to believe that the expected number of votes for candidate k will double. To suggest otherwise is to violate the laws of conditional expectation.
Plots of the number of votes for a particular candidate against the number of voters in the county are hence exactly the right functional form to consider. To consider realized votes rather than their conditional expectation, note that any random variable may be written as the sum of its conditional expectation and a residual whose unconditional mean is 0:
$$Y_{kc} = p_k N_c + u_{kc}. \qquad (2)$$
By solving for $N_c$ for two candidates, j and k, we see that the relationship between observed votes for the two candidates is
$$Y_{jc} = \frac{p_j}{p_k} Y_{kc} + u_{jkc}, \qquad (3)$$
where
$$u_{jkc} = u_{jc} - \frac{p_j}{p_k} u_{kc}$$
has mean 0 because each of the residuals $u_{kc}$ and $u_{jc}$ does. This
latter fact might seem trivial, but it is the key to why
specifications involving logs are simply wrong. To see this, take
logs of both sides:
$$\log Y_{jc} = \log\left(\frac{p_j}{p_k} Y_{kc} + u_{jkc}\right), \qquad (4)$$
which certainly does not yield a linear relationship between the logs. While others have suggested to me that linear plots are ``meaningless'' because of ``noise near the origin'' from small counties, the truth is precisely the opposite. Log-log specifications introduce correlated error as well as noise. A plot of the log of one candidate's vote total against the log of another's is simply meaningless, and no conclusion can be drawn from the observation that a particular county does not appear to be an outlier in such a graph.
By contrast, a plot of vote shares for one candidate against vote shares for another is meaningful, as can be seen by using the relationship in equation (3):
$$\frac{Y_{jc}}{N_c} = \frac{p_j}{p_k} \frac{Y_{kc}}{N_c} + \frac{u_{jkc}}{N_c}. \qquad (5)$$
It is instructive to note that this equation explains the by-now well-known fact that plotting Buchanan shares against either Bush or Gore shares makes Palm Beach look like not such an outlier after all. Why is this? Because Palm Beach is a large County, and the dispersion of the residual in the shares plots is necessarily reduced for large counties. This is perhaps ironic, but it is precisely the analysis of shares plots that introduces confusion related to County size.
Perhaps a useful way to elaborate this point is as follows. Suppose we accept for the sake of argument that the Palm Beach vote is, in fact, systematically biased by the ballot (or anything else). Now imagine that we double the size of Palm Beach, without fixing the source of the bias. Then the expected Buchanan vote there will be just as large in percentage terms as is currently observed, but the Palm Beach residual in the shares plot will have one-half the variance it does now. [Note: I previously wrote ``one-fourth'', rather than one-half. The reason for the correction is that the variance of $u_{jkc}$ may be shown to be proportional to $N_c$, a fact I'll prove later. Since $V[u_{jkc}/N_c \mid N_c] = (1/N_c)^2 V[u_{jkc} \mid N_c]$, the variance of the share residual is proportional to $1/N_c$, so doubling $N_c$ cuts it in half.]
Simply put, smaller counties have larger variance in their observed vote shares than do larger counties. That means that plots of the shares will be very likely to turn up other counties that appear to be just as, or more, outlying than is Palm Beach - precisely because Palm Beach is a large County! The point is that unlikely events (i.e. an unusually high share of votes at the county level for one candidate) become even less likely as sample size increases.
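As a rough numerical illustration of the county-size effect, the sketch below computes the standard deviation of a single candidate's vote share under a plain binomial model, where the share variance is p(1-p)/N. This is a simplification of the residual discussed above, not the exact quantity. The county sizes 5,000 and 50,000 are hypothetical; the proportion and the Palm Beach voter count are the figures quoted later in this note.

```python
import math

def share_variance(p, n):
    """Variance of the vote share Y/n when Y ~ Binomial(n, p)."""
    return p * (1.0 - p) / n

# Statewide Buchanan proportion and Palm Beach voter count quoted in this note.
P_BUCHANAN = 17_358 / 5_934_277
N_PALM_BEACH = 431_621

# 5_000 and 50_000 are hypothetical small-county sizes, for comparison only.
for n in (5_000, 50_000, N_PALM_BEACH, 2 * N_PALM_BEACH):
    sd = math.sqrt(share_variance(P_BUCHANAN, n))
    print(f"N = {n:>9,d}   sd of share = {sd:.6f}")

# Doubling the county size halves the variance of the share.
ratio = share_variance(P_BUCHANAN, 2 * N_PALM_BEACH) / share_variance(P_BUCHANAN, N_PALM_BEACH)
print("variance ratio after doubling:", ratio)
```

Under these assumptions the share dispersion shrinks steadily with county size, which is why a large county can look unremarkable in a shares plot even when its vote count is far from expectation.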
Moreover, the question we really care about is how many Buchanan votes were likely erroneous. But plots of shares against shares don't tell us anything about that. Instead, they tell us how far the Buchanan share in Palm Beach deviated from what one would expect. But again, since Palm Beach is a large County, only a very small share deviation will be necessary to yield a large deviation in the number of votes. This is why analyses by some others have found a large outlying effect in terms of the number of Buchanan votes in Palm Beach, even when the authors assume erroneous nonlinear relationships: these authors are careful to transform their predicted values back into numbers of votes, rather than continuing to focus on obfuscating transformations.
What have we found so far, ignoring heterogeneity? Here is a brief summary:
Now let's consider what can be estimated. It is tempting, given the linear relationship between the conditional mean number of votes for a candidate and county size in equation 1, to simply run a linear regression. Before we do that, we must remember that there is no guarantee that the residuals $u_{kc}$ have mean 0 conditional on $N_c$ - we only know that they have unconditional mean 0. Using the binomial distribution, the probability that we will observe $Y_{kc}$ votes for candidate k in County c, given the county's population $N_c$ and the probability $p_{kc}$ of a vote for candidate k, may be written
$$\Pr(Y_{kc} \mid N_c, p_{kc}) = \binom{N_c}{Y_{kc}} p_{kc}^{Y_{kc}} (1 - p_{kc})^{N_c - Y_{kc}}. \qquad (6)$$
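It follows that the density function for the residual $u_{kc} = Y_{kc} - p_{kc} N_c$ is:
$$f(u_{kc} \mid N_c, p_{kc}) = \binom{N_c}{u_{kc} + p_{kc} N_c} p_{kc}^{u_{kc} + p_{kc} N_c} (1 - p_{kc})^{N_c - u_{kc} - p_{kc} N_c}.$$
The conditional expectation of $u_{kc}$ is
$E[u_{kc} \mid N_c] = \sum_{y=0}^{N_c} (y - p_{kc} N_c) \binom{N_c}{y} p_{kc}^{y} (1 - p_{kc})^{N_c - y}$.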
I have not tried
to show, nor would I expect to succeed in showing, that this
expectation is zero for all Nc. However, in such large
populations, the normal distribution is likely to be a fine
approximation to the binomial, so it may not be so unreasonable to run
linear regressions.
Nonetheless, maximum likelihood estimation in this case is
straightforward. Rather than extend this long note further, I will
simply remind readers that the maximum likelihood estimate of a
binomial probability is simply the population proportion. In this
case, that means that our estimate of $p_k$ is simply
$\hat{p}_k = Y_k / N$,
where $Y_k$ is the total number of votes in the
population for candidate k and $N$ is the total number of voters
in the population. 2
This approach points to another advantage relative to using
cross-County data: rather than having only 67 observations (the number
of Florida counties), we can take advantage of the nearly 6 million
votes cast in Florida.
For the moment, continue to assume there is no heterogeneity in voter preferences. Under this assumption, we can estimate the single unknown parameter that determines Buchanan votes by taking the ratio of total Buchanan votes in Florida - 17,358 - to total votes in Florida - 5,934,277. This yields an estimate of pBuchanan = .00292504. That is, for every 10,000 voters in Florida, we expect between 29 and 30 to vote for Pat Buchanan. Hence the expected number of votes for Pat Buchanan in a county with 431,621 voters (the count in Palm Beach) is 1262.5, just over 37% of what Buchanan actually got. This sort of figure has already been discussed publicly and is not news.
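The arithmetic in this paragraph can be reproduced in a few lines. The sketch below uses only the four vote counts quoted above; the variable names are illustrative.

```python
# Vote counts quoted in the text.
BUCHANAN_FLORIDA = 17_358
TOTAL_FLORIDA = 5_934_277
PALM_BEACH_VOTERS = 431_621
PALM_BEACH_BUCHANAN = 3_407

# Homogeneous-voter estimate: the statewide Buchanan proportion.
p_hat = BUCHANAN_FLORIDA / TOTAL_FLORIDA

# Expected Buchanan votes in a county the size of Palm Beach.
expected_pb = p_hat * PALM_BEACH_VOTERS

# Expected votes as a fraction of the votes actually recorded.
fraction_of_actual = expected_pb / PALM_BEACH_BUCHANAN

print(f"p_hat              = {p_hat:.8f}")
print(f"per 10,000 voters  = {10_000 * p_hat:.2f}")
print(f"expected in PB     = {expected_pb:.1f}")
print(f"share of actual    = {fraction_of_actual:.1%}")
```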
Some observers have suggested that there is a great deal of uncertainty in predicting Buchanan votes under the null hypothesis that nothing funny went on in Palm Beach. It is thus important to note that the estimated standard error associated with the estimate reported above is just .00002217, approximately two orders of magnitude smaller than the estimate itself. This result is of course driven by the very large number of observations - i.e., votes. The point here is that any suggestion of uncertainty about the likelihood that a randomly chosen voter would have voted for Pat Buchanan is suspect. The fact that there are only 67 counties concerns only (statistically inappropriate) estimation methods and is totally irrelevant to this point.
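The standard error can be checked with the usual formula for a binomial proportion, sqrt(p(1-p)/N). That this is the formula behind the reported figure is an assumption, though it does reproduce the number in the text.

```python
import math

# Statewide counts quoted in the text.
BUCHANAN_FLORIDA = 17_358
TOTAL_FLORIDA = 5_934_277

p_hat = BUCHANAN_FLORIDA / TOTAL_FLORIDA

# Standard error of a binomial proportion estimated from N independent votes.
std_err = math.sqrt(p_hat * (1.0 - p_hat) / TOTAL_FLORIDA)

print(f"p_hat          = {p_hat:.8f}")
print(f"standard error = {std_err:.8f}")
print(f"p_hat / s.e.   = {p_hat / std_err:.1f}")
```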
We are now in a position to begin asking whether, maintaining our assumption of voter homogeneity, the total of 3,407 votes attributed to Pat Buchanan in Palm Beach County is statistically plausible. The key mathematical function for this analysis is the binomial distribution function laid out in equation 6 above. This is obviously a highly nonlinear function, so perhaps the best way to discuss whether or not the Palm Beach Buchanan vote of 3,407 is statistically unreasonable is to simply graph the probability that Buchanan would get at least a certain number of votes in a county the size of Palm Beach. Here's a graph that does that (page down if you are reading the HTML version):
It is quite evident from this picture that there is - literally - 0 probability that Buchanan would get over 3000 votes. In fact, the same could be said for the possibility that Buchanan would get over 1500 votes. The small range of uncertainty around the expected value of 1,262.5 votes is due to the fact that there is only an extremely small amount of uncertainty regarding the number of Buchanan votes in such a large County, given the small likelihood that Floridians vote for Buchanan.
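The tail probabilities behind this picture can be approximated numerically. The sketch below sums the binomial probabilities in log space, because the probability of 3,407 or more votes is too small for ordinary floating point to represent. The helper names are illustrative, and the proportion is the statewide estimate from the homogeneous-voter calculation above.

```python
import math

def log_binom_pmf(n, p, y):
    """Natural log of the Binomial(n, p) probability of exactly y successes."""
    return (math.lgamma(n + 1) - math.lgamma(y + 1) - math.lgamma(n - y + 1)
            + y * math.log(p) + (n - y) * math.log1p(-p))

def log_upper_tail(n, p, k):
    """Natural log of P(Y >= k) for Y ~ Binomial(n, p), via log-sum-exp."""
    logs = [log_binom_pmf(n, p, y) for y in range(k, n + 1)]
    peak = max(logs)
    return peak + math.log(sum(math.exp(v - peak) for v in logs))

N_PALM_BEACH = 431_621
P_HAT = 17_358 / 5_934_277  # statewide Buchanan proportion

for k in (1_500, 3_407):
    log_tail = log_upper_tail(N_PALM_BEACH, P_HAT, k)
    print(f"log P(Y >= {k}) = {log_tail:.1f}   (log10: {log_tail / math.log(10):.1f})")
```

Exponentiating the second value directly would underflow to zero, which is consistent with the reading of the graph as showing essentially no probability mass above 1,500 votes.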
It is now time to incorporate heterogeneity in voter preferences. The most direct way to do this is to imagine that there are T types of voters. The probability that a voter of type t will vote for candidate k may be written $p_{tk}$. The rest of the analysis from above carries through directly. In particular, we have:
$$E[Y_{kc} \mid N_{1c}, \ldots, N_{Tc}] = \sum_{t=1}^{T} p_{tk} N_{tc},$$
where Ntc is the number of voters of type t in County c. Note that the conditional mean relationship remains linear, with the intercept remaining 0. Moreover, we have
where parameters in bold are vectors with dimension T. Hence the true relationship between candidates' conditional mean votes is again exactly linear.
Linear regression estimates still suffer from the possibility that the residual is not conditionally mean 0. They now have the further problem that there may be omitted ``types''. I'll discuss this issue in a moment.
For each type t, one would again estimate the probability of voting for candidate k using the observed population proportion among voters of type t. What would we have to know in order to redo the homogeneous-voter analysis above while taking into account voter heterogeneity? We would have to know the number of types in the state, as well as the number of votes for Buchanan among voters of each type. Having estimated the relevant proportions, we would turn to Palm Beach County. We would take the number of voters of each type and calculate the probability that Buchanan got at least y votes in every logically possible combination among these voters. Maintaining the assumption of cross-voter independence, this probability is simply the summation over all combinations of the product of the probability that each component of the combination occurs.
For example, let's assume there are only Democrats, Republicans, and Independents. Then the probability that Buchanan would get exactly $(y_D, y_R, y_I)$ votes in Palm Beach County would be (note that I have dropped the k and c subscripts):
$$\Pr(y_D, y_R, y_I) = \prod_{t \in \{D, R, I\}} \binom{N_t}{y_t} p_t^{y_t} (1 - p_t)^{N_t - y_t}.$$
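A minimal sketch of this calculation, assuming independent voters and a separate binomial for each type: the joint probability of a particular split is the product of the per-type binomial probabilities, and the distribution of the total is their convolution. The group sizes and probabilities below are hypothetical toy values chosen only to keep the example small. They are not estimates for Palm Beach County.

```python
import math

def binom_pmf(n, p):
    """List of P(Y = y) for y = 0..n, with Y ~ Binomial(n, p)."""
    return [math.comb(n, y) * p**y * (1.0 - p)**(n - y) for y in range(n + 1)]

def joint_prob(groups, ys):
    """Probability of exactly ys[t] votes from each type t, types independent."""
    prob = 1.0
    for (n, p), y in zip(groups, ys):
        prob *= binom_pmf(n, p)[y]
    return prob

def convolve(a, b):
    """Distribution of the sum of two independent counts with pmfs a and b."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, z in enumerate(b):
            out[i + j] += x * z
    return out

def total_pmf(groups):
    """Distribution of total votes summed over all voter types."""
    pmf = [1.0]
    for n, p in groups:
        pmf = convolve(pmf, binom_pmf(n, p))
    return pmf

def prob_at_least(groups, y):
    """P(total votes >= y)."""
    return sum(total_pmf(groups)[y:])

# Hypothetical (N_t, p_t) pairs for Democrats, Republicans, Independents.
toy_groups = [(40, 0.001), (30, 0.01), (20, 0.005)]
print("P(exactly (0, 1, 0)) =", joint_prob(toy_groups, (0, 1, 0)))
print("P(total >= 2)        =", prob_at_least(toy_groups, 2))
```

The direct convolution is quadratic in the group sizes, so it is only practical for toy inputs. A county-sized calculation would need log-space sums or an approximation.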
Even in Florida, we don't observe a particular individual's
vote choice, so the data needed to estimate the probability vector
are not available. We might be able to do so using county
or, better yet, precinct level data on party registration, although
this would require certain assumptions about ``defections'' by party
members. Moreover, there will always be some other ``type'' that we
have failed to measure. So the question is not so much whether we can
estimate the true values of the Buchanan probability vector.
Rather, it seems to me that the question is whether the estimate
obtained by assuming homogeneity is reasonable, or, better yet,
methodologically conservative.
There are uncertainties due to the extreme nonlinearity of the underlying distribution of Buchanan votes. However, I am willing to risk making the following mathematical conjecture: when a county's demographic composition is such that people in the county are less likely to vote for Buchanan, we will overestimate the probability of high numbers of Buchanan votes when we assume homogeneity across all counties. 3 The remaining question is thus whether people in Palm Beach County are more or less likely to vote for Pat Buchanan than a randomly selected Floridian.
This issue has garnered some public attention since the Bush campaign suggested that Palm Beach County is a ``Buchanan stronghold''. There is now ample evidence to demonstrate that this claim is very unlikely to be true. 4 In fact, the heterogeneity that is most likely to exist in Palm Beach County seems very likely to strengthen the conclusion drawn from figure 1. Palm Beach is a large County, with a heavily Democratic population, very little registration for the Reform party, and many elderly Jews. It is thus difficult to imagine that treating Palm Beach symmetrically with randomly chosen Florida voters does anything but bias upwards the probability of observing a large Buchanan vote. Similar arguments are made at http://www.econ.jhu.edu/people/ccarroll/HowManyBadBuchananBallots.html.
Please send comments or questions to me at gelbach@glue.umd.edu.