Will the polls be wrong again this cycle?
It’s the question I probably get most, for obvious reasons. Unfortunately, it’s not an easy one to answer, and one reason might surprise you: Pollsters still don’t know exactly why the polls underestimated Donald J. Trump four years ago.
As a post-election report by professional pollsters put it: “Identifying conclusively why polls overstated the Democratic-Republican margin relative to the certified vote appears to be impossible with the available data.”
The exact explanation matters. Under some theories, polls may be much better in 2024; under others, pollsters are still vulnerable to another misfire.
In the absence of a clear answer, most theories center on “nonresponse bias,” in which Mr. Trump’s supporters were less likely to respond to surveys than demographically similar Biden voters. This is reasonable enough, but the details are murky — and again, they matter. In particular, they need to explain why the polls have sometimes been accurate during the Trump era.
It’s easy to forget, but the polls haven’t always been terrible since Mr. Trump came down the escalator. For all the problems with state polling in 2016, the high-quality national polls were excellent, and almost all high-quality polls excelled in the 2022 midterm elections. This variation in results requires pollsters and analysts to build a theory that fits the shifting error. It requires something much more nuanced than “Trump supporters don’t respond.”
Pollsters and analysts have studied the last eight years very closely (and they have made substantial changes, which we’ll explore tomorrow). Although they have countless hypotheses, I’d broadly say there are two not-entirely-mutually-exclusive theories for the polling misfires in 2016 and 2020. Depending on which you find more compelling, you’d have a different guess about how vulnerable the polls are to a misfire in November.
The unified theory
Let’s call the first approach the unified theory. It tries to explain, in one swoop, why the polls and Democrats do well in midterms, while the polls and Democrats do poorly in presidential elections.
This theory holds that pollsters simply can’t reach enough of the least politically engaged voters — and these voters overwhelmingly back Mr. Trump. The polls can do fine in midterm elections, when only highly engaged (and now relatively Democratic-leaning) voters cast ballots, but they underestimate Mr. Trump in presidential elections.
If you’re a liberal reader of this newsletter, this theory may send a shiver down your spine. All cycle, we’ve noted Mr. Trump’s strength among less engaged voters. We’ve agonized over the challenges in polling. Recently, we’ve observed that the Times/Siena poll shows strange similarities to the midterm election. The unified theory stitches all of this together into one potential nightmare for Democrats, where all the subtle patterns in the Times/Siena data add up to a harbinger of yet another polling misfire — and another Trump presidency.
A version of this theory is popular among the most renowned pollsters and data scientists, and there’s a lot of evidence to support it. Pollsters have known for decades that the least engaged, least political voters are least likely to respond to surveys. This may even sound obvious: A political junkie would naturally be more excited to take a poll than someone without any interest in politics.
We see this in our data, too: In a typical Times/Siena poll, people who have previously voted in a primary are about twice as likely to respond to a survey as people who haven’t. Worse, the people with no voting history who do take polls clearly aren’t representative of nonvoters as a whole. The previous nonvoters who take polls are much more likely to vote than otherwise similar registrants who decline to respond to surveys. (We know this because, once the election is over, we’re able to see which people we’ve polled actually voted.)
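To make the mechanics concrete, here is a minimal sketch — using entirely hypothetical numbers, not actual Times/Siena figures — of how engagement-based nonresponse can tilt a poll even when the sample looks demographically sound:

```python
# Toy illustration of engagement-based nonresponse bias.
# All numbers are hypothetical; they are not Times/Siena estimates.

# Imagine an electorate evenly split between highly engaged voters (who favor
# Harris by 10 points) and less engaged voters (who favor Trump by 10 points),
# so the race is exactly tied overall.
electorate = {
    "engaged":   {"share": 0.50, "harris": 0.55, "trump": 0.45},
    "unengaged": {"share": 0.50, "harris": 0.45, "trump": 0.55},
}

# Now suppose engaged voters are twice as likely to answer the phone.
response_rate = {"engaged": 0.02, "unengaged": 0.01}

def polled_margin(groups, rates):
    """Harris-minus-Trump margin among the people who actually respond."""
    total = sum(g["share"] * rates[name] for name, g in groups.items())
    margin = 0.0
    for name, g in groups.items():
        respondent_weight = g["share"] * rates[name] / total
        margin += respondent_weight * (g["harris"] - g["trump"])
    return margin

print("True margin:   +0.0 points")
print(f"Polled margin: {polled_margin(electorate, response_rate) * 100:+.1f} points")
# Output: the poll shows Harris ahead by about 3.3 points in a tied race,
# because engaged voters make up two-thirds of respondents instead of half.
```

If engagement itself is the variable pollsters are missing, weighting on age, race, education or region won’t claw that error back — which is the crux of the unified theory.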
There’s also considerable evidence that less engaged voters are likelier to back Mr. Trump, especially after accounting for their demographic characteristics. The Times/Siena data has supported this proposition all cycle — including once Kamala Harris became the nominee, if to a lesser degree.
Put it together, and pollsters might be stuck: As hard as they may try, they will never properly represent politically disengaged voters, and they will therefore never show enough support for Mr. Trump — at least when those disengaged voters decide to vote, as they tend to do in a presidential election.
The patchwork theory (but ultimately the pandemic)
The second theory is not so unified. Let’s call it the patchwork theory, though ultimately a lot of eggs wind up in the “it was the pandemic” basket.
In this tale, the polling errors in 2016 and 2020 may look similar, but they were actually very different. For one, the “gold standard” national polls were pretty good in 2016 but terrible in 2020. This suggests there were distinct challenges in each election, like undecided Republican voters who disliked Mr. Trump in 2016, the failure of state pollsters to weight by education, and ultimately the pandemic.
In most cases, these challenges have either faded since 2016 and 2020 or been fixed by pollsters. This theory doesn’t necessarily dispute that there’s a challenge reaching less engaged Trump voters, but perhaps that’s only one of many problems. With those other problems gone, the polls might be set up for a much more accurate 2024 cycle.
The patchwork theory obviously lacks the sweeping coherence of the unified theory, but in many respects there’s more evidence for it, starting with the 2016 election:
Education. In 2016, most state polls — including polls by the campaigns — were not weighted by self-reported education, and consequently had far too many college graduates. (Before 2016, whether someone had a college degree was not a meaningful predictor of whether they would vote Democratic or Republican; that’s no longer the case.) By our estimates at the time, weighting by education shifted polls by an average of four percentage points toward Mr. Trump; a rough sketch of that arithmetic appears below. This helped explain the accuracy of the traditional national surveys, which were almost always weighted by education.
Late deciders. The 2016 election also featured an unusual number of undecided voters and voters backing third-party candidates. These voters were disproportionately Republican, and they appeared to break toward Mr. Trump over the last few days, based on the exit polls and post-election studies that recontacted previously polled undecided voters. Here again, the national polls had the edge: There were many national polls taken over the final few days of the race, and they were both accurate and showed Mr. Trump gaining. Conversely, there were few or no high-quality state polls taken over the last five days of the race. Every major final national poll, for instance, was fielded entirely after the end of the last high-quality education-weighted poll in Wisconsin — a Marquette poll fielded from Oct. 26 to Oct. 31.
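To illustrate the education point above, here is a minimal sketch with made-up numbers (not drawn from any specific 2016 poll) showing how a sample with too many college graduates inflates the Democratic margin, and how weighting to the true education mix pulls it back:

```python
# Made-up numbers, not any specific 2016 poll, illustrating education weighting.
# The raw sample is 55% college graduates; the true electorate is only 40%.
sample = {
    "college":    {"raw_share": 0.55, "true_share": 0.40, "dem_margin": 0.15},
    "no_college": {"raw_share": 0.45, "true_share": 0.60, "dem_margin": -0.10},
}

def topline_margin(groups, share_key):
    """Democratic margin implied by a given set of group shares."""
    return sum(g[share_key] * g["dem_margin"] for g in groups.values())

unweighted = topline_margin(sample, "raw_share")   # the poll as collected
weighted = topline_margin(sample, "true_share")    # weighted to the electorate

print(f"Unweighted margin:         D {unweighted * 100:+.1f}")
print(f"Education-weighted margin: D {weighted * 100:+.1f}")
# The raw sample shows the Democrat up about 3.8 points; weighting to the true
# education mix makes the race roughly even -- a shift of nearly four points
# toward the Republican, in line with the estimate described above.
```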
These factors gave pollsters cause for optimism heading into 2020. In theory, weighting by education alone would fix many state polls, while late shifts wouldn’t be so much of a problem with fewer undecided voters. Put them together, and pollsters entered 2020 confident there wouldn’t be another polling misfire.
This confidence turned out to be misplaced. In 2020, the polls were even further off than they had been in 2016. This was especially true for the national pollsters who seemed to nail the 2016 election. The state polls fared no better than they did in 2016, and they often erred in the same places — like Wisconsin, where polls underestimated Mr. Trump by about eight points. They erred even though nearly all the state pollsters weighted by education, and even though the number of conflicted, undecided voters had plunged. So what happened?
The unified theory holds that the same underlying problem was behind the polling misfires in 2016 and 2020, helping explain the similar geographic distribution of error across the Northern battlegrounds in both elections. In this view, this nonresponse problem became even more severe as turnout rose to record modern levels and drew even more disengaged Trump voters into the electorate. The higher turnout in 2020 effectively canceled out the gains pollsters made by weighting on education.
When it comes to 2020, the patchwork theory ultimately leans a lot on one explanation: the pandemic, an extraordinary event that affected every American. Importantly, the pandemic changed people’s behavior: Many stopped going outside and started working from home. All of this had a clear effect on surveys. Response rates went up, and surveys became cheaper and seemingly more demographically representative — so much so that pollsters were rejoicing.
The pollsters may have been right to celebrate in April, but by the fall the pandemic was becoming partisan. Democrats tended to be masking and staying home — and they were more likely to respond to polls. The (overly simplistic but plausible) story here is that Democrats were free and available to take surveys all day, lonely and grateful to speak with a human, enraged by Mr. Trump and the pandemic, all while Republicans tended to be out living their lives.
There’s a mix of evidence to support a major but hard-to-quantify role for the pandemic in the 2020 survey error.
Response rates. As mentioned earlier, there’s clear evidence that response rates increased during the pandemic. Democrats also responded at significantly higher rates than Republicans in 2020 — about 20 percent higher in Times/Siena data — although we can’t be sure the pandemic was the reason for the gap. (A rough sketch of how a gap that size translates into polling error follows this list.)
Timing. Before the pandemic, the polls showed a highly competitive race. The Times/Siena polls in November 2019, for instance, were extremely close to the final result, as were the polling averages by state in early 2020. Of course, it’s possible the polls were always wrong and Mr. Trump actually held a significant lead until his response to the pandemic put President Biden over the top. But the pattern is nonetheless what one might have expected if the pandemic had induced a problem in the polls.
Landlines. Landline telephone polling, which might be expected to be most affected by people staying at home, had extraordinary survey error. Remember the infamous ABC/Washington Post poll that found Mr. Biden up 17 points in Wisconsin? The landline respondents backed Mr. Biden by 30 points.
In Times/Siena polling, registered Democrats were twice as likely to respond as Republicans on a landline, but there was little to no gap among those reached via cellphone (the majority of our respondents).
Geography. There was a relationship between coronavirus prevalence in the run-up to the election and the polling error. Wisconsin was the epicenter for cases heading into Election Day, along with other states across the Northern tier. At the time, this was considered the likely explanation for why Mr. Biden was surging in Wisconsin down the stretch — that infamous ABC News poll contended: “Covid Surge Hurts Trump in WI.” In retrospect, the surge might have been merely helping Mr. Biden in the polls.
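As a back-of-the-envelope companion to the response-rate point above, here is a minimal sketch — again with hypothetical figures rather than exact Times/Siena numbers — of how a 20 percent partisan gap in response rates translates into polling error:

```python
# Hypothetical figures, not exact Times/Siena numbers.
# A state that is truly split 50-50, but Democrats respond at a rate
# 20 percent higher than Republicans.
parties = {
    "dem": {"share": 0.50, "response_rate": 0.012},
    "rep": {"share": 0.50, "response_rate": 0.010},
}

respondents = {p: v["share"] * v["response_rate"] for p, v in parties.items()}
dem_share_of_sample = respondents["dem"] / sum(respondents.values())

# In a pure two-party race, the polled margin is twice the sample share minus one.
polled_margin = 2 * dem_share_of_sample - 1
print(f"Democratic share of respondents: {dem_share_of_sample:.1%}")
print(f"Polled margin in a tied race:    D {polled_margin * 100:+.1f} points")
# A 20 percent response-rate gap alone yields roughly a 9-point Democratic lead
# in a dead-even race, before any weighting. In this toy case, weighting on
# party registration would fix it entirely; the harder question is whether the
# real-world gap follows variables pollsters can actually weight on.
```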
Which theory is right?
We’ve looked at two basic ways of understanding polling error in 2016 and 2020, but just about every analyst would probably position themselves somewhere on a spectrum between them.
After all, almost all of the theories for error described so far have merit. They’re rooted in reasonable theories of survey response. They’re backed by evidence. They would help explain why the polls were worse in 2020 than in 2016, and why the polls were better in 2022 and 2018 than in either presidential election.
Unfortunately, there’s almost no way to untangle the relative merit of the various theories.
The best-case scenario for pollsters would go something like this: Pollsters fixed the problems that hurt them in 2016, and the pandemic hurt them in 2020 but it’s over now. Thus, the polls will finally have a good cycle in 2024 — possibly even if they make no meaningful methodological changes whatsoever. Maybe the polls could even underestimate Democrats now, if there’s some new source of error working to Mr. Trump’s advantage. The polls underestimated Democrats back in 2012, after all.
On that point, it’s noteworthy that many of the worst pollsters of 2020 seem to be producing far better results for Mr. Trump in 2024, even when they’re making few or no methodological changes. The Quinnipiac poll is perhaps the best example. It’s the last of the traditional telephone political polls still using random-digit dialing, and it’s shown far better results for Mr. Trump than four years ago. It’s not easy to explain how the Quinnipiac poll can show Mr. Trump ahead in Wisconsin and Michigan under the most extreme version of the unified theory, unless Mr. Trump is on track for a landslide.
Similarly, it’s hard to argue that the survey respondents who yielded a Biden +17 result in Wisconsin were representative of the midterm electorate — the implication of the unified theory. Those respondents probably wouldn’t have said they favored the Republican Ron Johnson in the race for U.S. Senate if they had been asked two years later. Clearly, something got at least a bit easier for pollsters, allowing them to get the midterms right.
While it’s reasonable to say things might be better for pollsters, the worst-case scenario remains: There is no reason to assume pollsters can reach the least politically interested voters in sufficient numbers, and there is plenty of reason to think those voters will back Mr. Trump in November. If this challenge remains just as great, the polls might miss badly yet again. They could even fare worse if Mr. Trump does better among less engaged voters than he did four years ago, as polls have shown all cycle.
In that event, pollsters will hope that the changes they’ve made since 2020 will help address the problems that come next. Tomorrow, we’ll take a close look at those changes — and whether they give pollsters a chance to avoid another 2020-like polling error.