My name is Jonah B. Gelbach, and I am an Assistant Professor of Economics at the University of Maryland, College Park. If you're looking for my usual web page, it may be found at http://glue.umd.edu/~gelbach/usual-index.html. If you are looking for a set of links I posted about the election, you can find those at http://glue.umd.edu/~gelbach/whowon/links.html.

In this note, I address several issues:

  1. What is the correct functional form relating county size and candidates' number or share of votes, and how should one estimate the relevant parameters?
  2. What is the correct functional form relating various candidates' number or share of votes?
  3. Why does the plot of shares against shares seem to make the Palm Beach outlier effect go away?
  4. Is the total of 3,407 votes attributed to Pat Buchanan in Palm Beach County statistically reasonable, or is Buchanan's Palm Beach total an extreme outlier?

I wish to note that I have benefited from discussions about this issue with Rob Shimer (whose correspondence and web site at http://www.princeton.edu/~shimer/election.html caused me to begin thinking about this issue systematically), Scott Wallsten, Rhiannon Patterson, and Seth Sanders. I hope in this note to provide a primitive, behaviorally based argument regarding the Buchanan-Palm Beach controversy. Other analyses of which I am aware have not done so, relying instead on ad hoc (though widely used) statistical techniques. Ironically, in my view, the analyses that have come closest to being right are the ones using the simplest statistical methods.

Comments or questions are welcome at gelbach@glue.umd.edu.

Let's begin by assuming that everyone in a county has identical preferences over candidates. This is of course an unreasonable assumption, and I will relax it below, but it simplifies the initial discussion. For purposes of this discussion, I will take the primitive to be $p_k$, the probability that a given voter votes for some candidate $k$. It is straightforward to show that this primitive can itself be derived from a more primitive description of (rational, consistent) preferences over the candidate for whom a person votes,[1] but I will not do so here.

Let's do the simple thing first. Aggregating over all residents of County $c$, we may write the expected total number of votes for candidate $k$ as


$$E[Y_{kc} \mid N_c, p_k] = \sum_{i=1}^{N_c} p_k = p_k N_c, \tag{1}$$

where $N_c$ is the total number of residents (i.e., voters) in County $c$.

There is an important thing to note straight away: the relationship between the conditional mean total number of votes for candidate $k$ and county size is precisely linear, with an intercept of 0. The intuition is very simple. First, if there were no voters, candidate $k$ would get no votes, which yields the conclusion regarding the intercept. Second, suppose we know the expected number of votes for candidate $k$ in one group of people, and we are handed another group of exactly the same size and preferences. Then the expected number of votes for candidate $k$ in the combined group must be exactly double the expected number in the first group. To suggest otherwise is to violate the laws of conditional expectation.

Plots of the number of votes for a particular candidate against the number of voters in the county therefore have exactly the right functional form. To consider realized votes rather than their conditional expectation, note that any random variable may be written as the sum of its conditional expectation and a residual whose unconditional mean is 0:


$$Y_{kc} = p_k N_c + u_{kc} \tag{2}$$
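
A small simulation can make equation (1) concrete. It assumes, purely for illustration, a hypothetical vote probability of 0.01 and 500 simulated counties per size; each voter independently votes for candidate $k$ with that probability. The average vote should sit near $p_k N_c$, so a county with no voters yields no votes and doubling the number of voters roughly doubles the average.

```python
import random


def simulate_votes(n_voters: int, p: float, rng: random.Random) -> int:
    """One draw of Y_kc: each of n_voters independently votes for k with probability p."""
    return sum(1 for _ in range(n_voters) if rng.random() < p)


def average_votes(n_voters: int, p: float, reps: int, seed: int) -> float:
    """Monte Carlo estimate of E[Y_kc | N_c, p_k] from `reps` simulated counties."""
    rng = random.Random(seed)
    return sum(simulate_votes(n_voters, p, rng) for _ in range(reps)) / reps


P_K = 0.01   # hypothetical probability of a vote for candidate k
REPS = 500   # hypothetical number of simulated counties per county size

if __name__ == "__main__":
    for n in (0, 1_000, 2_000):
        avg = average_votes(n, P_K, REPS, seed=n)
        # Equation (1) predicts p_k * N_c; the average residual u_kc should be near 0.
        print(n, avg, P_K * n, avg - P_K * n)
```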

Solving equation (2) for $N_c$ for two candidates, $j$ and $k$, and substituting, we see that the relationship between observed votes for the two candidates is


$$Y_{kc} = \frac{p_k}{p_j} Y_{jc} + u_{jkc}, \tag{3}$$

where $u_{jkc} = u_{kc} - \frac{p_k}{p_j} u_{jc}$ has mean 0 because each of the residuals $u_{kc}$ and $u_{jc}$ does. This latter fact might seem trivial, but it is the key to why specifications involving logs are simply wrong. To see this, take logs of both sides:


$$\ln Y_{kc} = \ln\left(\frac{p_k}{p_j} Y_{jc} + u_{jkc}\right), \tag{4}$$

which certainly does not yield a linear relationship between the logs: because the log of a sum is not the sum of the logs, the residual $u_{jkc}$ cannot be split off from $Y_{jc}$ on the right-hand side. While others have suggested to me that linear plots are "meaningless" because of "noise near the origin" from small counties, the truth is precisely the opposite. Log-log specifications introduce error that is correlated with the regressor, as well as noise. A plot of the log of one candidate's vote total against the log of another's is simply meaningless, and no conclusion can be drawn from the observation that a particular county does not appear to be an outlier in such a graph.

By contrast, a plot of vote shares for one candidate against vote shares for another is meaningful, as can be seen by dividing both sides of equation (3) by $N_c$:


$$\frac{Y_{kc}}{N_c} = \frac{p_k}{p_j}\,\frac{Y_{jc}}{N_c} + \frac{u_{jkc}}{N_c} \tag{5}$$

It is instructive to note that this equation explains the by-now well-known fact that plotting Buchanan shares against either Bush or Gore shares makes Palm Beach look much less like an outlier. Why is this? Because Palm Beach is a large County, and the dispersion of the share residual $u_{jkc}/N_c$ is necessarily reduced for large counties. This is perhaps ironic, but it is precisely the analysis of shares plots that introduces confusion related to County size.

Perhaps a useful way to elaborate this point is as follows. Suppose we accept for the sake of argument that Palm Beach is, in fact, systematically biased by the ballot (or anything else). Now imagine that we double the size of Palm Beach, without fixing the source of the bias. Then the expected Buchanan vote there will be just as large in percentage terms as is currently observed, but the Palm Beach residual in the shares plot will have one-half the variance it does now. [Note: I previously wrote "one-fourth" rather than one-half. The reason for the correction is that the variance of $u_{jkc}$ may be shown to be proportional to $N_c$; write $V[u_{jkc} \mid N_c] = \sigma^2_{jk} N_c$. Then $V[u_{jkc}/N_c \mid N_c] = (1/N_c)^2\, V[u_{jkc} \mid N_c] = \sigma^2_{jk}/N_c$, so doubling $N_c$ halves the variance of the share residual, from $\sigma^2_{jk}/N_c$ to $\sigma^2_{jk}/(2N_c)$.]
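
A short derivation shows where the proportionality to $N_c$ can come from. It assumes, for illustration, that each voter independently casts exactly one vote, so that a county's candidate totals are multinomial with $V[Y_{kc} \mid N_c] = N_c p_k(1-p_k)$ and $\operatorname{Cov}[Y_{kc}, Y_{jc} \mid N_c] = -N_c p_k p_j$. Since $u_{kc}$ and $u_{jc}$ differ from $Y_{kc}$ and $Y_{jc}$ only by constants given $N_c$:

$$\begin{aligned}
V[u_{jkc} \mid N_c] &= V[Y_{kc} \mid N_c] + \frac{p_k^2}{p_j^2}\, V[Y_{jc} \mid N_c] - 2\,\frac{p_k}{p_j}\operatorname{Cov}[Y_{kc}, Y_{jc} \mid N_c] \\
&= N_c p_k (1 - p_k) + \frac{p_k^2}{p_j}\, N_c (1 - p_j) + 2 N_c p_k^2 \\
&= N_c\, p_k \left(1 + \frac{p_k}{p_j}\right).
\end{aligned}$$

Dividing by $N_c^2$ gives $V[u_{jkc}/N_c \mid N_c] = p_k(1 + p_k/p_j)/N_c$, which halves when $N_c$ doubles.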

Simply put, smaller counties have larger variance in their observed vote shares than do larger counties. That means that plots of the shares will be very likely to turn up other counties that appear to be just as outlying as Palm Beach, or more so, precisely because Palm Beach is a large County! The point is that unlikely events (e.g., an unusually high county-level vote share for one candidate) become even less likely as sample size increases.

Moreover, the question we really care about is how many Buchanan votes were likely erroneous. But plots of shares against shares don't tell us anything about that. Instead, they tell us how far the Buchanan share in Palm Beach deviated from what one would expect. But again, since Palm Beach is a large County, only a very small share deviation will be necessary to yield a large deviation in the number of votes. This is why analyses by some others have found a large outlying effect in terms of the number of Buchanan votes in Palm Beach, even when the authors assume erroneous nonlinear relationships: these authors are careful to transform their predicted values back into numbers of votes, rather than continuing to focus on obfuscating transformations.
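
As a rough worked example using figures that appear later in this note (431,621 voters in Palm Beach, 3,407 recorded Buchanan votes, and a homogeneous-voter expectation of about 1,262.5 votes), the entire vote gap corresponds to a share deviation of only about half a percentage point:

$$\frac{3407 - 1262.5}{431{,}621} = \frac{2144.5}{431{,}621} \approx 0.0050$$

In a shares plot, a half-percentage-point gap is easy to overlook; in vote counts it is more than 2,000 ballots.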

What have we found so far, ignoring heterogeneity? Here is a brief summary:

  1. It is possible to derive the true statistical relationship between votes for a particular candidate and either county size or votes for other candidates. When we do derive this relationship, rather than simply assuming one that has no basis in statistical theory, we find that the true relationship between either realized or conditional mean votes for any candidate and those of any other is precisely linear.

  2. Plots of nonlinear transformations (like log-levels or log-shares) of candidates' votes are simply jumbles of noise. One should not expect to learn anything from them.

  3. Plots of candidate shares against each other are statistically meaningful, in that the linear relationship between conditional means carries over to conditional shares.

  4. However, plotting shares of candidates' votes against each other reduces the likelihood that a large county will seem like an outlier, because it compresses the dispersion of the residuals: the variance of the share residual falls in inverse proportion to county size (equivalently, the coefficient of variation of a county's vote share falls with the square root of county size).

Now let's consider what can be estimated. It is tempting, given the linear relationship between the conditional mean number of votes for a candidate and county size in equation (1), to simply run a linear regression. Before we do that, we should look at how the residuals $u_{kc}$ behave conditional on $N_c$; so far we have only used the fact that they have unconditional mean 0. Using the binomial distribution, the probability that we will observe $Y_{kc}$ votes for candidate $k$ in County $c$, given the county's population $N_c$ and the probability $p_k$ of a vote for candidate $k$, may be written


$$\Pr[Y_{kc} = y \mid p_k, N_c] = \frac{N_c!}{y!\,(N_c - y)!}\, p_k^{\,y} (1 - p_k)^{N_c - y} \tag{6}$$

It follows that the probability mass function of the residual $u_{kc}$ is


$$\Pr[u_{kc} = u \mid p_k, N_c] = \frac{N_c!}{(p_k N_c + u)!\,\bigl(N_c - (p_k N_c + u)\bigr)!}\, p_k^{\,p_k N_c + u} (1 - p_k)^{N_c - (p_k N_c + u)} \tag{7}$$

The conditional expectation of $u_{kc}$ is $\sum_u u \Pr[u_{kc} = u \mid p_k, N_c]$. Because the binomial mean is exactly $p_k N_c$, this expectation is zero for every $N_c$. What does depend on $N_c$ is the conditional variance, $V[u_{kc} \mid N_c] = p_k (1 - p_k) N_c$, so the residuals are heteroskedastic and unweighted least squares is not efficient. Still, in such large populations the normal distribution is likely to be a fine approximation to the binomial, so it may not be so unreasonable to run linear regressions.
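
A quick check of the normal approximation compares equation (6) with a normal density having the same mean $N_c p_k$ and variance $N_c p_k (1 - p_k)$. It uses the Palm Beach voter count and the statewide Buchanan probability that appear later in this note; the three vote totals checked are arbitrary choices near the mean.

```python
import math


def binom_pmf(y: int, n: int, p: float) -> float:
    """Equation (6), computed via log-gamma so that n! never has to be formed."""
    log_pmf = (math.lgamma(n + 1) - math.lgamma(y + 1) - math.lgamma(n - y + 1)
               + y * math.log(p) + (n - y) * math.log1p(-p))
    return math.exp(log_pmf)


def normal_density(y: float, n: int, p: float) -> float:
    """Normal density with the binomial's mean n*p and variance n*p*(1-p)."""
    mean, var = n * p, n * p * (1 - p)
    return math.exp(-(y - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


N_C = 431_621      # Palm Beach voters
P_K = 0.00292504   # statewide Buchanan probability

for y in (1_200, 1_262, 1_300):
    exact, approx = binom_pmf(y, N_C, P_K), normal_density(y, N_C, P_K)
    print(f"y={y}: binomial={exact:.6g}, normal={approx:.6g}, ratio={exact / approx:.4f}")
```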

Nonetheless, maximum likelihood estimation in this case is straightforward. Rather than extend this long note further, I will simply remind readers that the maximum likelihood estimate of a binomial probability is the observed proportion. In this case, that means that our estimate of $p_k$ is $\hat{p}_k = Y_k/N$, where $Y_k$ is the total number of votes in the population for candidate $k$ and $N$ is the total number of voters in the population.[2] This approach points to another advantage relative to using cross-County data: rather than having only 67 observations (the number of Florida counties), we can take advantage of the nearly 6 million votes cast in Florida.
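
A minimal sketch with three hypothetical counties contrasts the binomial MLE (total votes over total voters) with the slope from a least-squares regression of votes on county size with the intercept forced to zero, $\sum_c N_c Y_{kc} / \sum_c N_c^2$. The county sizes and vote counts are made up for illustration.

```python
def mle_probability(votes: list[int], voters: list[int]) -> float:
    """Binomial MLE of a common vote probability: pooled votes over pooled voters."""
    return sum(votes) / sum(voters)


def ols_through_origin(votes: list[int], voters: list[int]) -> float:
    """Least-squares slope of votes on voters with no intercept: sum(N*Y) / sum(N^2)."""
    return sum(n * y for n, y in zip(voters, votes)) / sum(n * n for n in voters)


voters = [1_000, 5_000, 20_000]  # hypothetical county sizes N_c
votes = [5, 12, 70]              # hypothetical vote counts Y_kc

print(mle_probability(votes, voters))     # 87 / 26,000
print(ols_through_origin(votes, voters))  # 1,465,000 / 426,000,000
```

Both are weighted averages of the county ratios $Y_{kc}/N_c$, but the MLE weights each ratio by $N_c$ while the zero-intercept regression weights it by $N_c^2$, so the two coincide only when every county has the same ratio.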

For the moment, continue to assume there is no heterogeneity in voter preferences. Under this assumption, we can estimate the single unknown parameter that determines Buchanan votes by taking the ratio of total Buchanan votes in Florida (17,358) to total votes in Florida (5,934,277). This yields an estimate of $\hat{p}_{\text{Buchanan}} = .00292504$. That is, for every 10,000 voters in Florida, we expect between 29 and 30 to vote for Pat Buchanan. Hence the expected number of votes for Pat Buchanan in a county with 431,621 voters (the count in Palm Beach) is 1,262.5, just over 37% of the 3,407 votes Buchanan actually got there. This sort of figure has already been discussed publicly and is not news.

Some observers have suggested that there is a great deal of uncertainty in predicting Buchanan votes under the null hypothesis that nothing funny went on in Palm Beach. It is thus important to note that the estimated standard error of the estimate reported above is just .00002217, roughly two orders of magnitude smaller than the estimate itself. This result is of course driven by the very large number of observations, i.e., votes. The point here is that any suggestion of substantial uncertainty about the probability that a randomly chosen voter would have voted for Pat Buchanan is suspect. The fact that there are only 67 counties matters only for (statistically inappropriate) county-level estimation methods and is totally irrelevant to this point.
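
The arithmetic in the last two paragraphs can be reproduced in a few lines. The standard error uses the usual binomial formula $\sqrt{\hat{p}(1 - \hat{p})/N}$, which reproduces the reported .00002217.

```python
import math

BUCHANAN_FLORIDA = 17_358     # Buchanan votes statewide
TOTAL_FLORIDA = 5_934_277     # all votes statewide
PALM_BEACH_VOTERS = 431_621
PALM_BEACH_BUCHANAN = 3_407

p_hat = BUCHANAN_FLORIDA / TOTAL_FLORIDA               # MLE: pooled proportion
se = math.sqrt(p_hat * (1 - p_hat) / TOTAL_FLORIDA)   # binomial standard error
expected_pb = p_hat * PALM_BEACH_VOTERS               # expected Palm Beach votes under homogeneity
share_of_actual = expected_pb / PALM_BEACH_BUCHANAN   # expected votes as a fraction of actual

print(f"p_hat = {p_hat:.8f}, se = {se:.8f}")
print(f"expected Palm Beach votes = {expected_pb:.1f} ({share_of_actual:.1%} of actual)")
```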

We are now in a position to ask whether, maintaining our assumption of voter homogeneity, the total of 3,407 votes attributed to Pat Buchanan in Palm Beach County is statistically plausible. The key mathematical function for this analysis is the binomial distribution laid out in equation (6) above. This is a highly nonlinear function, so perhaps the best way to discuss whether the Palm Beach Buchanan vote of 3,407 is statistically unreasonable is to graph the probability that Buchanan would get at least a given number of votes in a county the size of Palm Beach. Figure 1 does that:

[Figure 1: Probability that Buchanan's vote is at least as large as "votes" in a Florida county of size 431,621, given a probability of a Buchanan vote equal to .00292504.]

It is quite evident from this picture that the probability that Buchanan would get over 3,000 votes is effectively 0, far too small to register on the graph. In fact, the same could be said for the possibility that Buchanan would get over 1,500 votes. The range of plausible outcomes around the expected value of 1,262.5 votes is narrow because, in such a large County and given the small likelihood that Floridians vote for Buchanan, the binomial standard deviation is only $\sqrt{431{,}621 \times .00292504 \times (1 - .00292504)} \approx 35$ votes.
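
The probabilities plotted in Figure 1 can be computed directly from equation (6). The sketch below sums the binomial upper tail in log space, because the probability of 3,407 or more votes is far smaller than the smallest positive double-precision number and would otherwise round to zero; it returns $\log_{10}$ of the tail probability. Brute-force summation over every possible vote total is a simple choice here, not an efficient one.

```python
import math

N_PALM_BEACH = 431_621    # voters in Palm Beach
P_BUCHANAN = 0.00292504   # statewide estimate of the Buchanan vote probability


def log_binom_pmf(k: int, n: int, p: float) -> float:
    """Natural log of equation (6): Pr[Y = k] for Y ~ Binomial(n, p)."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log1p(-p))


def log10_upper_tail(y: int, n: int, p: float) -> float:
    """log10 of Pr[Y >= y], using log-sum-exp so that tiny tails do not underflow."""
    y = max(y, 0)
    if y > n:
        return -math.inf
    logs = [log_binom_pmf(k, n, p) for k in range(y, n + 1)]
    peak = max(logs)
    total = math.fsum(math.exp(v - peak) for v in logs)
    return (peak + math.log(total)) / math.log(10)


for votes in (1_262, 1_500, 3_000, 3_407):
    print(votes, log10_upper_tail(votes, N_PALM_BEACH, P_BUCHANAN))
```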

It is now time to incorporate heterogeneity in voter preferences. The most direct way to do this is to imagine that there are $T$ types of voters. The probability that a voter of type $t$ will vote for candidate $k$ may be written $p_{tk}$. The rest of the analysis from above carries through directly. In particular, we have:


$$E[Y_{kc} \mid \mathbf{N}_c, \mathbf{p}_k] = \sum_{t=1}^{T} \sum_{i=1}^{N_{tc}} p_{tk} = \sum_{t=1}^{T} p_{tk} N_{tc}, \tag{8}$$

where $N_{tc}$ is the number of voters of type $t$ in County $c$. Note that the conditional mean relationship remains linear, with the intercept remaining 0. Moreover, we have


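$$E[Y_{kc} \mid \mathbf{N}_c, \mathbf{p}_k] = \sum_{t=1}^{T} \frac{p_{tk}}{p_{tj}}\, E[Y_{tjc} \mid N_{tc}, p_{tj}], \tag{9}$$

where parameters in bold are vectors with dimension $T$, and $Y_{tjc}$ denotes the votes for candidate $j$ cast by type-$t$ voters in County $c$, so that $E[Y_{tjc} \mid N_{tc}, p_{tj}] = p_{tj} N_{tc}$. Hence the true relationship between candidates' conditional mean votes is again exactly linear.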

Linear regression estimates now face the further problem that there may be omitted "types": if the mix of types varies across counties, a regression of $Y_{kc}$ on $N_c$ alone leaves the type composition in the residual, which then need not have conditional mean 0. I'll discuss this issue in a moment.

For each type $t$, one would again estimate the probability of voting for candidate $k$ using the observed proportion among voters of type $t$. What would we have to know in order to redo the homogeneous-voter analysis above while taking voter heterogeneity into account? We would have to know the number of types in the state, as well as the number of votes for Buchanan among voters of each type. Having estimated the relevant proportions, we would turn to Palm Beach County. We would take the number of voters of each type and calculate the probability that Buchanan got at least $y$ votes, considering every logically possible way of splitting those votes among the types. Maintaining the assumption of cross-voter independence, this probability is simply the sum, over all such combinations, of the product of the type-specific probabilities of each component of the combination.

For example, let's assume there are only Democrats, Republicans, and Independents. Then the probability that Buchanan would get exactly $(y_D, y_R, y_I)$ votes from the three types in Palm Beach County would be (note that I have dropped the $k$ and $c$ subscripts):


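$$\Pr[Y_D = y_D, Y_R = y_R, Y_I = y_I \mid \mathbf{p}, \mathbf{N}] = \frac{N_D!}{y_D!\,(N_D - y_D)!}\, p_D^{\,y_D} (1 - p_D)^{N_D - y_D} \times \frac{N_R!}{y_R!\,(N_R - y_R)!}\, p_R^{\,y_R} (1 - p_R)^{N_R - y_R} \times \frac{N_I!}{y_I!\,(N_I - y_I)!}\, p_I^{\,y_I} (1 - p_I)^{N_I - y_I} \tag{10}$$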
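
Summing equation (10) over every combination with $y_D + y_R + y_I \ge y$ is the same as convolving the three type-specific binomial distributions. A minimal sketch builds the distribution of total Buchanan votes that way; the type sizes and probabilities are hypothetical and are not estimates for Palm Beach.

```python
import math


def binom_pmf_vector(n: int, p: float) -> list[float]:
    """[Pr[Y = 0], ..., Pr[Y = n]] for Y ~ Binomial(n, p), via log-gamma."""
    return [math.exp(math.lgamma(n + 1) - math.lgamma(y + 1) - math.lgamma(n - y + 1)
                     + y * math.log(p) + (n - y) * math.log1p(-p))
            for y in range(n + 1)]


def convolve(a: list[float], b: list[float]) -> list[float]:
    """Distribution of the sum of two independent counts with distributions a and b."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def total_vote_pmf(sizes: list[int], probs: list[float]) -> list[float]:
    """Pr[total votes = y] when type t has sizes[t] voters, each voting with probability probs[t]."""
    dist = [1.0]  # no voters yet: zero votes with certainty
    for n_t, p_t in zip(sizes, probs):
        dist = convolve(dist, binom_pmf_vector(n_t, p_t))
    return dist


def prob_at_least(y: int, sizes: list[int], probs: list[float]) -> float:
    """Pr[total votes >= y] under cross-voter independence."""
    return math.fsum(total_vote_pmf(sizes, probs)[max(y, 0):])


# Hypothetical Democrats, Republicans, and Independents (sizes and probabilities are made up).
SIZES = [600, 300, 100]
PROBS = [0.001, 0.004, 0.01]
print(prob_at_least(5, SIZES, PROBS))
```

With the probabilities set equal across types, the result collapses to the single binomial of equation (6). Direct convolution is quadratic in the group sizes, so for groups as large as Palm Beach's one would want to truncate each distribution to the range where it has non-negligible mass.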

Even in Florida, we don't observe a particular individual's vote choice, so the data needed to estimate the probability vector $\mathbf{p}$ are not available. We might be able to estimate it using county or, better yet, precinct-level data on party registration, although this would require certain assumptions about "defections" by party members. Moreover, there will always be some other "type" that we have failed to measure. So the question is not so much whether we can estimate the true values of the Buchanan probability vector. Rather, it seems to me that the question is whether the estimate obtained by assuming homogeneity is reasonable or, better yet, methodologically conservative.

There are uncertainties here, due to the extreme nonlinearity of the underlying distribution of Buchanan votes. However, I am willing to risk the following mathematical conjecture: when a county's demographic composition is such that people in the county are less likely to vote for Buchanan, we will overestimate the probability of high numbers of Buchanan votes when we assume homogeneity across all counties.[3] The remaining question is thus whether people in Palm Beach County are more or less likely to vote for Pat Buchanan than a randomly selected Floridian.

This issue has garnered some public attention since the Bush campaign suggested that Palm Beach County is a "Buchanan stronghold". There is now ample evidence that this claim is very unlikely to be true.[4] In fact, the heterogeneity most likely to exist in Palm Beach County seems likely to strengthen the conclusion drawn from Figure 1. Palm Beach is a large County, with a heavily Democratic population, very little registration for the Reform party, and many elderly Jews. It is thus difficult to imagine that treating Palm Beach symmetrically with randomly chosen Florida voters does anything but bias upward the probability of observing a large Buchanan vote. Similar arguments are made at http://www.econ.jhu.edu/people/ccarroll/HowManyBadBuchananBallots.html.

Please send comments or questions to me at gelbach@glue.umd.edu.


Footnotes:

[1] I realize that there is a serious debate among social scientists about the rationality of voting, but that is an orthogonal issue. Also, there is no cause for concern about the independence of irrelevant alternatives (IIA) or related, generalized assumptions in the nested logit model, because I will focus on "types" of voters within which there is no heterogeneity of preferences. There is thus no issue of assuming anything about cross-equation error terms.

[2] Observant readers will note that this fraction is different from the linear regression coefficient one gets using OLS, even if the constant is constrained to be zero. The MLE is the ratio of total (equivalently, average) votes for candidate $k$ to total (average) votes by county, $\sum_c Y_{kc} / \sum_c N_c$, whereas the zero-intercept OLS coefficient is $\sum_c N_c Y_{kc} / \sum_c N_c^2$, which gives larger counties more weight. The two will differ depending on the distribution of county sizes.

[3] This conjecture can really only be examined using simulation methods, and I can't spend too much more time on this!

[4] For a discussion of the "stronghold" hypothesis, see http://www.econ.jhu.edu/people/ccarroll/HowManyBadBuchananBallots.html as well as http://glue.umd.edu/~gelbach/whowon/stronghold.html.


File translated from TEX by TTH, version 2.80.
On 5 Dec 2000, 11:19.