Blog Post 10: Post-Election Reflection

John Kulow

2024/11/18

Introduction

It has now been about two weeks since the 2024 election, and, although some states like California and Alaska are still tabulating the final votes, we now have an all-but-final sense of where the national and state-level popular votes have landed. That also means that we can see how accurate, or rather how inaccurate, my predictive models were. Thus, I will begin by recapping my models and my predictions for the national popular vote, state-level results, and the electoral college, before assessing the accuracy of my models, proposing hypotheses for their inaccuracies, and then suggesting ways that I could have built them better.

Recap of My Models and Predictions

For my predictions, I created three OLS models: one for the two-party national popular vote, one for the two-party state-level popular vote for states with public polling, and one for the two-party state-level popular vote for states without public polling.

My national popular vote model included variables for the October polling average, the September polling average, the (election year) Q2 national GDP growth, and a variable for incumbency. Below is what my model predicted for Harris' two-party national vote share in the 2024 election:

| Harris Vote Share | Lower Bound | Upper Bound |
|---|---|---|
| 51.82711 | 46.55055 | 57.10367 |

As can be seen, my model incorrectly predicted a Harris national popular vote victory, with the lower bound of the 95% confidence interval corresponding to a Trump victory (53.45% of the two-party vote for Trump) and the upper bound corresponding to a Harris landslide with a 57.1% share of the vote.
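For the curious, below is a minimal sketch of how an OLS model of this form can be fit and used for prediction in Python with statsmodels; the file name, column names, and 2024 input values are hypothetical stand-ins rather than my actual code or data.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical historical dataset of past elections; column names are
# stand-ins for my actual variables
nat = pd.read_csv("national_data.csv")

# OLS on the Democratic two-party share with October/September polling
# averages, Q2 GDP growth, and incumbency, mirroring the model described above
nat_model = smf.ols(
    "dem_two_party_share ~ poll_avg_oct + poll_avg_sep + gdp_growth_q2 + incumbent",
    data=nat,
).fit()

# Illustrative 2024 inputs (placeholder values, not my real data)
new_2024 = pd.DataFrame({
    "poll_avg_oct": [48.0],
    "poll_avg_sep": [48.5],
    "gdp_growth_q2": [3.0],
    "incumbent": [0],
})

# Point prediction plus a 95% interval
pred = nat_model.get_prediction(new_2024).summary_frame(alpha=0.05)
print(pred[["mean", "obs_ci_lower", "obs_ci_upper"]])
```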

State-Level Vote and Electoral College

In terms of my state-level predictions, for the 23 states and two relevant congressional districts for which we had publicly available polling, I used an OLS model with variables for October and September polling averages, one- and two-cycle vote share lag, incumbent status, and the October national polling average. This model yielded the following predictions for these 25 states/districts:

| State | Harris Vote Share | Lower Bound | Upper Bound |
|---|---|---|---|
| Arizona | 50.17068 | 44.85019 | 55.49117 |
| California | 65.17024 | 59.78080 | 70.55968 |
| Florida | 48.36540 | 43.03339 | 53.69742 |
| Georgia | 50.42399 | 45.10149 | 55.74650 |
| Indiana | 41.34649 | 36.02409 | 46.66890 |
| Maine CD-2 | 46.58067 | 41.26204 | 51.89930 |
| Maryland | 68.04048 | 62.66586 | 73.41510 |
| Massachusetts | 66.76200 | 61.39923 | 72.12476 |
| Michigan | 51.44157 | 46.11853 | 56.76461 |
| Minnesota | 53.89702 | 48.57408 | 59.21996 |
| Missouri | 43.16617 | 37.83319 | 48.49915 |
| Montana | 41.09607 | 35.77414 | 46.41800 |
| Nebraska | 40.09414 | 34.76453 | 45.42374 |
| Nebraska CD-2 | 54.57686 | 49.24191 | 59.91181 |
| Nevada | 51.36874 | 46.04218 | 56.69530 |
| New Hampshire | 54.67997 | 49.34864 | 60.01130 |
| New Mexico | 54.62405 | 49.29581 | 59.95228 |
| New York | 59.86333 | 54.49855 | 65.22812 |
| North Carolina | 50.31757 | 44.98784 | 55.64730 |
| Ohio | 46.02362 | 40.70049 | 51.34675 |
| Pennsylvania | 51.23316 | 45.90608 | 56.56025 |
| Texas | 47.12235 | 41.80247 | 52.44224 |
| Virginia | 54.64381 | 49.32428 | 59.96335 |
| Washington | 60.57076 | 55.22182 | 65.91970 |
| Wisconsin | 51.20178 | 45.86424 | 56.53932 |

As can be seen, this model predicted a Harris victory in all seven key battleground states, with single-digit-margin losses for her in Florida, Texas, Ohio, and Maine's 2nd Congressional District. The upper bounds of this model had Harris also winning those four, while the lower bounds had Trump sweeping all seven key swing states in addition to Minnesota, Nebraska's 2nd Congressional District, New Hampshire, New Mexico, and Virginia.
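Below is a rough sketch of how a polled-state model of this shape could be fit; as with the national sketch, the file and column names (state_data.csv, state_poll_oct, etc.) are hypothetical stand-ins, and the intervals shown are OLS prediction intervals, which is one way to produce ranges of the width seen above.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical stacked panel of state-year observations
states = pd.read_csv("state_data.csv")

# Polled-state model: state and national polling averages, lagged vote
# shares, and incumbency, as described above
state_model = smf.ols(
    "dem_two_party_share ~ state_poll_oct + state_poll_sep"
    " + vote_share_lag1 + vote_share_lag2 + incumbent + nat_poll_oct",
    data=states[states["year"] < 2024],
).fit()

# 95% intervals for each 2024 state/district with public polling
new_rows = states[states["year"] == 2024]
preds = state_model.get_prediction(new_rows).summary_frame(alpha=0.05)
```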

For the remaining states for which there was no public polling, I used a similar OLS model that removed the state-level polling averages but added back in the September national polling average that the national OLS model used. I then combined my state-level predictions from these two models to produce my final electoral college forecast.

This would have yielded a 319-219 electoral college victory for Harris. Clearly, however, this did not quite happen.
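As a toy illustration of the winner-take-all arithmetic behind that tally, the snippet below awards each state's electors to whichever candidate is predicted above 50% of the two-party vote; the three rows use my predicted shares from the table above alongside those states' real electoral vote counts.

```python
import pandas as pd

# Made-up subset of the combined predictions: predicted Harris two-party
# share by state, plus each state's electoral votes
preds_all = pd.DataFrame({
    "state": ["Pennsylvania", "Ohio", "Nevada"],
    "pred_share": [51.23, 46.02, 51.37],
    "ev": [19, 17, 6],
})

# Winner-take-all: Harris gets a state's electors if her predicted share > 50
harris_ev = preds_all.loc[preds_all["pred_share"] > 50, "ev"].sum()
trump_ev = preds_all["ev"].sum() - harris_ev
print(harris_ev, trump_ev)  # applied to all 50 states + DC + CDs → 319-219
```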

Assessing My Accuracy

As of today, the New York Times has Harris at 73,846,289 votes nationwide and Trump at 76,488,195, meaning that Harris is (currently) taking about 49.12% of the two-party national popular vote. When looking at how this compares to my national model…

| Pred. Harris % | Real Harris % | Lower Bound | Upper Bound |
|---|---|---|---|
| 51.83 | 49.12 | 46.55 | 57.1 |

… we can see that, although my model predicted a 51.83% Harris victory, overestimating her actual share by 2.71 points, the real result still falls squarely within my 95% confidence interval, sitting 2.57 points above my lower bound for her vote share.
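The two-party arithmetic behind these figures is easy to reproduce from the raw totals:

```python
# Raw NYT totals as of this writing
harris_votes = 73_846_289
trump_votes = 76_488_195

harris_two_party = 100 * harris_votes / (harris_votes + trump_votes)
print(round(harris_two_party, 2))              # 49.12
print(round(51.82711 - harris_two_party, 2))   # my model's overestimate: 2.71
```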

State-Level Vote and Electoral College

In terms of my state-level predictions, let's first look at my OLS model for states/districts with public polling. Below is the table for these 23 states and two congressional districts, showing how much Harris underperformed my model in each (quantified in the "Error" column).

| State | Pred. Harris % | Real Harris % | Error | Lower Bound | Upper Bound |
|---|---|---|---|---|---|
| Arizona | 50.17068 | 47.15253 | -3.0181573 | 44.85019 | 55.49117 |
| California | 65.17024 | 60.72681 | -4.4434346 | 59.78080 | 70.55968 |
| Florida | 48.36540 | 43.37995 | -4.9854536 | 43.03339 | 53.69742 |
| Georgia | 50.42399 | 48.87845 | -1.5455431 | 45.10149 | 55.74650 |
| Indiana | 41.34649 | 40.34257 | -1.0039217 | 36.02409 | 46.66890 |
| Maine CD-2 | 46.58067 | 45.37900 | -1.2016698 | 41.26204 | 51.89930 |
| Maryland | 68.04048 | 63.31036 | -4.7301230 | 62.66586 | 73.41510 |
| Massachusetts | 66.76200 | 62.66252 | -4.0994776 | 61.39923 | 72.12476 |
| Michigan | 51.44157 | 49.30156 | -2.1400148 | 46.11853 | 56.76461 |
| Minnesota | 53.89702 | 52.16725 | -1.7297758 | 48.57408 | 59.21996 |
| Missouri | 43.16617 | 40.64426 | -2.5219108 | 37.83319 | 48.49915 |
| Montana | 41.09607 | 39.63712 | -1.4589518 | 35.77414 | 46.41800 |
| Nebraska | 40.09414 | 39.38059 | -0.7135438 | 34.76453 | 45.42374 |
| Nebraska CD-2 | 54.57686 | 52.12900 | -2.4478593 | 49.24191 | 59.91181 |
| Nevada | 51.36874 | 48.37829 | -2.9904538 | 46.04218 | 56.69530 |
| New Hampshire | 54.67997 | 51.40987 | -3.2701012 | 49.34864 | 60.01130 |
| New Mexico | 54.62405 | 53.00043 | -1.6236184 | 49.29581 | 59.95228 |
| New York | 59.86333 | 55.82881 | -4.0345193 | 54.49855 | 65.22812 |
| North Carolina | 50.31757 | 48.29889 | -2.0186761 | 44.98784 | 55.64730 |
| Ohio | 46.02362 | 44.27299 | -1.7506345 | 40.70049 | 51.34675 |
| Pennsylvania | 51.23316 | 48.98235 | -2.2508164 | 45.90608 | 56.56025 |
| Texas | 47.12235 | 42.98466 | -4.1376950 | 41.80247 | 52.44224 |
| Virginia | 54.64381 | 52.85012 | -1.7936960 | 49.32428 | 59.96335 |
| Washington | 60.57076 | 59.85109 | -0.7196666 | 55.22182 | 65.91970 |
| Wisconsin | 51.20178 | 49.53156 | -1.6702249 | 45.86424 | 56.53932 |

In all of the above states/districts, Harris did worse than my model predicted, with larger, more diverse states such as Florida, California, Texas, and New York seeing the biggest misses. That said, even my most inaccurate predictions still fell within the 95% confidence intervals, although some (California, Florida, and Maryland in particular) came within a point of falling below their lower bounds.

My second state-level model fared slightly better. Below is a map of how my combined predictions fared against reality, with blue-shaded states representing those in which Harris did better than my models predicted and red-shaded states being those in which she did worse.

In terms of how this translates to who actually won each state, below are the maps of the predicted and actual state-level victors (with Alaska, Hawaii, DC, and the congressional districts in Nebraska and Maine not shown, though none of these differed between my predictions and the results):

To quantify how far off my predictions were, below are the bias, root mean squared error (RMSE), and mean absolute error (MAE) of my combined predictions:

| Bias | RMSE | MAE |
|---|---|---|
| -1.1 | 2.09 | 1.66 |

Although less than ideal, these values are not terribly far from the final results, especially given that this race ended up being less close than 2020 (although a 1-2 point error still would have been enough to swing the election). This said, my two state-level models differed in how accurate they were.
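For reference, here is how these three metrics are computed, with error defined as actual minus predicted so that a negative bias means overestimating Harris; this is a generic sketch rather than my exact code.

```python
import numpy as np

def score(pred, actual):
    """Return (bias, RMSE, MAE) of predicted vs. actual Harris shares."""
    err = np.asarray(actual) - np.asarray(pred)  # negative = overestimated Harris
    bias = err.mean()
    rmse = np.sqrt((err ** 2).mean())
    mae = np.abs(err).mean()
    return bias, rmse, mae

# Toy check on two rows from the table above (Arizona, Florida)
print(score([50.17068, 48.36540], [47.15253, 43.37995]))
```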

My model for states with public polling fared notably worse, with higher bias, RMSE, and MAE values than my overall state-level predictions:

| Bias | RMSE | MAE |
|---|---|---|
| -2.49 | 2.79 | 2.49 |

Meanwhile, the subset of states for which I used the secondary model had almost no bias and notably lower RMSE and MAE values:

| Bias | RMSE | MAE |
|---|---|---|
| 0.02 | 1.24 | 1 |

I will explore potential reasons for this discrepancy below, but my preliminary belief is that some of the most surprising electoral shifts of the night happened in large, diverse states like California, New York, Texas, and Florida, all of which are important enough (electorally or otherwise) to have public polling available, whereas smaller, less diverse, electorally safe states, such as those in the Great Plains and upper New England, did not have polling.

Potential Explanations for Inaccuracies

Hypothesis 1: Incumbency

To begin, I likely did not adequately consider how closely Harris would be tied to Biden in the eyes of the electorate. With the vast majority of Americans believing the country is headed in the "wrong direction" and President Biden's approval rating stubbornly abysmal for so long, I should have better accounted for how poorly the electorate might view the current administration, particularly as Harris repeatedly either failed or refused to distance herself from her former running mate. In fact, a few weeks ago I strongly considered adding to my model a variable for the incumbent president's (election year) June approval rating. I decided against it based on my prior beliefs that Biden's low approval was driven by concerns about his age, which would not carry over to a 2024 race with Harris as the Democratic nominee, and by Democrats who would vote blue come November regardless. Clearly, this was not the case, and I should not have so quickly dismissed this variable and the notable rightward shift its inclusion would have caused in my predictions. This is just one way I could have adjusted my incumbency variable, though, and I am sure there were ways to make it more nuanced, especially given how complicated the "incumbency" question was this year.

Hypothesis 2: Economics

Americans clearly are still feeling (or at least believe they are feeling) the effects of inflation, and exit polls show that the economy was a top issue and that voters who cited it as their top issue swung towards Trump. This indicates that GDP growth, while perhaps a useful factor, did not capture the full picture of how Americans were viewing the economy. For example, I have seen numerous reports of swing-state exit polls in which voters said decisively that their local economy is doing better than four years ago but that the national economy is doing worse, leading some to deem the current economic outlook a "vibecession," driven by perceptions rather than reality. For these reasons, I could have added variables for inflation, real disposable income (RDI) growth, or consumer sentiment. At one point I toyed with adding RDI growth, but I left it out because GDP growth had historically been the more accurate indicator. That logic was inconsistent, however: I still included less historically accurate variables such as September polling averages, since historical accuracy alone does not determine whether a variable adds useful nuance to a model.

Hypothesis 3: Demographics

While much more relevant to my state-level models, demographic electoral trends clearly played a crucial role in this election, with minority voters, especially Hispanic, Asian, and Native American voters, swinging strongly towards the GOP. This has been a common explanation in the media for why states like New York, New Jersey, Illinois, California, Texas, and Florida all swung hard to the right. These trends were partially caught ahead of time by polling crosstabs, and to correct for them I could have added variables to my state-level models adjusting my polling averages for the swings among different demographic groups, as sketched below.
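As a purely illustrative sketch of what such an adjustment could look like, the snippet below shifts a state's polling average by assumed group-level swings weighted by the state's demographic composition; every number in it is invented for illustration, not an estimate.

```python
# Assumed group-level swings towards Trump, in percentage points
# (invented values for illustration only)
group_swing = {"hispanic": -8.0, "asian": -5.0, "white": -1.0}

def adjusted_poll(poll_avg, group_shares):
    """Shift a state's polling average by composition-weighted group swings.

    group_shares: each group's (assumed) share of the state's electorate.
    """
    shift = sum(group_shares[g] * group_swing[g] for g in group_swing)
    return poll_avg + shift

# e.g., a hypothetical state polling at 50% that is 30% Hispanic,
# 5% Asian, and 55% white
print(adjusted_poll(50.0, {"hispanic": 0.30, "asian": 0.05, "white": 0.55}))  # 46.8
```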

How I Might Have Changed My Model

While I have proposed a number of potential changes to my model above, I would have prioritized three: adding variables for RDI growth and consumer sentiment to better capture economic perceptions, adding some demographic-based weighting mechanism to my state-level models, and adding back the aforementioned June approval rating variable that I almost included in prior blog posts. While my model likely still would have missed in some ways, with the benefit of hindsight I can see why models overestimated both Harris and voters' perceptions of the economy, and why my state-level model failed to adequately predict the differing trends in some states and regions compared to others.
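Concretely, the revised national specification might have looked something like the sketch below, reusing the hypothetical names from my earlier sketch, with rdi_growth, consumer_sentiment, and june_approval as stand-in column names.

```python
import pandas as pd
import statsmodels.formula.api as smf

nat = pd.read_csv("national_data.csv")  # same hypothetical dataset as before

# Revised specification with RDI growth, consumer sentiment, and June
# approval added alongside the original variables
nat_model_v2 = smf.ols(
    "dem_two_party_share ~ poll_avg_oct + poll_avg_sep + gdp_growth_q2"
    " + rdi_growth + consumer_sentiment + june_approval + incumbent",
    data=nat,
).fit()
print(nat_model_v2.summary())
```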

Altogether, I am fairly proud of how my model held up. While I did incorrectly predict the final result, with one or two minor adjustments the narrow victory for Harris that I predicted in the swing states easily could have shifted to being a narrow victory for Trump instead. I am also very pleased with how surprisingly well my supplementary state-level model held up, albeit with the aforementioned caveat, and I do feel that through these models and predictions I have learned more about America’s electorate and what truly seems to matter in elections here.