## Comparing 2020 Election Models

Written August 21, 2023

Election models generally have three major components:

1. A prediction of each state’s winning margin (e.g. (D) +3 percentage points),
2. An estimate of each state’s prediction error, and
3. An estimate of the relationship between state errors.
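These three components plug naturally into a Monte Carlo simulation. Below is a minimal sketch with three stand-in battleground states; the margins, error sizes, and correlation are made-up illustrative numbers, not values from any of the models discussed here:

```python
import numpy as np

# Hypothetical inputs for a three-state illustration (not real model numbers):
# component 1 -- predicted Democratic margin in percentage points,
# component 2 -- each state's error standard deviation,
# component 3 -- a common correlation between state errors.
states = ["PA", "WI", "AZ"]
electors = np.array([20, 10, 11])   # electoral votes per state (2020 allocation)
margin = np.array([3.0, 2.0, 1.0])  # predicted (D) margin, pct points
sigma = np.array([4.0, 4.0, 4.5])   # per-state error std dev
rho = 0.6                           # assumed correlation between state errors

# Build the covariance matrix from the std devs and the common correlation.
corr = np.full((3, 3), rho)
np.fill_diagonal(corr, 1.0)
cov = corr * np.outer(sigma, sigma)

rng = np.random.default_rng(0)
sims = rng.multivariate_normal(margin, cov, size=100_000)

# A state goes (D) when its simulated margin is positive; the candidate
# needs a majority of the 41 electors in this toy map.
dem_ev = (sims > 0) @ electors
p_win = np.mean(dem_ev > electors.sum() / 2)
```

Summing simulated electors per draw, rather than treating states independently, is what lets component 3 shape the final distribution of outcomes.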

Step 1 is something many forecasters can manage fairly well: we have access to high-quality polls, election history data, and more. Getting the margins roughly right, however, doesn’t translate into high confidence about the outcome, because many critical battleground states end up with extremely narrow margins.

The leading election models distinguish themselves on steps 2 and 3. Poll results come with margins of error. Some of that error stems from sample size, but some comes from how representative the sample is, and the latter can produce broad errors that are correlated across multiple states. This phenomenon was prominently demonstrated in the 2016 election.
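A quick simulation shows why that kind of correlated error matters. The numbers below are purely illustrative: three stand-in battlegrounds each polling at (D) +2, with the same per-state error size under both error structures:

```python
import numpy as np

rng = np.random.default_rng(1)
margin, sigma, n = 2.0, 3.0, 200_000

# Independent errors: each state's polling miss is its own draw.
indep = margin + sigma * rng.standard_normal((n, 3))

# Correlated errors: a shared national miss plus smaller state-level noise,
# chosen so each state's total variance still equals sigma**2
# (0.8**2 + 0.6**2 == 1.0).
shared = rng.standard_normal((n, 1)) * (0.8 * sigma)
local = rng.standard_normal((n, 3)) * (0.6 * sigma)
corr = margin + shared + local

# Probability that (R) sweeps all three states -- the kind of broad,
# same-direction miss seen in 2016.
p_sweep_indep = np.mean((indep < 0).all(axis=1))
p_sweep_corr = np.mean((corr < 0).all(axis=1))
```

Each state looks identical in isolation, but the shared national term makes a simultaneous three-state miss far more likely.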

So let’s view the full range of results simulated by each model and consider how reasonable they are. Here’s the Datacracy prediction:

And a couple of competing models: FiveThirtyEight (left) and The Economist (right).

Let's begin with the obvious: the Datacracy model exhibited less certainty about a Biden victory compared to the others. However, this isn't inherently intriguing since we have only one election to evaluate. Instead, let's contrast the range provided by each outlet with our own.

Here is FiveThirtyEight’s range:

Notice the large bump in their range: that’s where Biden wins more than 400 electors. For Biden to have done that, he would’ve needed to win states like Tennessee, Alabama, South Carolina, or Mississippi. This is where the model said the results were **MOST LIKELY** to land.

How likely does that seem? To me, it is a pretty fringe prediction. But it was likely necessary for their model to generate predictions where Trump wins over 300 electors: I assume they infuse strong error correlation across many states in order to capture a broad range of potential results. The Datacracy model does something similar, but without overstating Biden’s chances of a landslide. Trump can still achieve a solid victory in our simulations while Biden’s likelihood of winning states like Mississippi is nearly half of what FiveThirtyEight projected. That seems more in line with intuition about which states were truly in play in 2020.
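The tension described above can be sketched numerically. The toy map below (five made-up states; not FiveThirtyEight’s or Datacracy’s actual specification) uses a single national error term shared across states: widening that term to allow big Trump upsets also inflates the probability that a safe (R) +14 state flips blue.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 300_000
margins = np.array([8.0, 5.0, 2.0, -1.0, -14.0])  # (D) margin, pct points
electors = np.array([20, 16, 11, 29, 6])           # toy elector counts
state_noise = 3.0                                  # idiosyncratic state error

def tails(national_sigma):
    # One shared national error draw per simulation, plus state-level noise.
    nat = rng.standard_normal((n, 1)) * national_sigma
    sims = margins + nat + rng.standard_normal((n, 5)) * state_noise
    ev = (sims > 0) @ electors
    p_trump = np.mean(ev < electors.sum() / 2)  # Trump wins the toy map
    p_safe_flip = np.mean(sims[:, 4] > 0)       # Biden flips the (R)+14 state
    return p_trump, p_safe_flip

modest = tails(3.0)  # modest national error
wide = tails(8.0)    # wide national error
```

Because the national term is symmetric, the same widening that makes a Trump upset plausible also fattens the Biden-landslide tail, which is the trade-off a model has to manage.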

That said, the FiveThirtyEight model assigned a reasonable likelihood to the region around the 306 electoral votes Biden actually earned. This is a good result. In my opinion, too many other outcomes were also given reasonable likelihoods, but at least the answer wasn’t missed.

Now The Economist’s range:

These results face the converse issue. The actual outcome (306 electoral votes for Biden) is portrayed as improbable, and only a handful of potential outcomes dominate their simulation. That observation, combined with the low probability assigned to a Trump victory, suggests their model does not adequately account for error correlation. In other words, it relies too heavily on point predictions driven by the polls.

In summary, I’d argue the DatacracyNow model strikes a happy medium between its (friendly) competitors: its range of potential outcomes assigns only marginal probabilities to seemingly rare events while attributing significant probability to substantial, correlated polling errors across battleground states.