Pollsters had many reasons to be proud after the 2024 election, but there were still plenty of warning signs.
Most obviously, the polls underestimated President Trump’s support yet again, even if not by as much as in 2016 or 2020. Worse, the so-called “gold standard” polls often underestimated him most of all.
You might remember, for instance, that the well-regarded Selzer poll of Iowa found Kamala Harris leading by three percentage points a few days before the election; Mr. Trump went on to win Iowa by 13 points. Less well known: The American National Election Study, the crown jewel of U.S. election social science research, found Ms. Harris ahead by eight points among registered voters — despite an eight-figure budget, $100 survey incentives, door-to-door interviews and a response rate dwarfing what public polls can realistically accomplish.
If even this extraordinary effort wasn’t enough, any attempt to improve the polls would have to focus on “weighting,” the statistical technique used to ensure that various demographic groups are each correctly represented in a survey. For instance, if a poll’s sample has too few voters without a college degree, a pollster will give more weight to respondents without a degree to ensure that they represent the right share of the population.
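To make the education example concrete, here is a minimal sketch of that kind of weighting in Python. The respondents and the 40/60 population split are invented for illustration and are not Times/Siena figures; the function names are hypothetical.

```python
from collections import Counter

# Hypothetical sample: (education group, candidate choice) per respondent.
sample = [
    ("degree", "Harris"), ("degree", "Harris"), ("degree", "Trump"),
    ("degree", "Harris"), ("degree", "Trump"), ("degree", "Harris"),
    ("no_degree", "Trump"), ("no_degree", "Trump"),
    ("no_degree", "Harris"), ("no_degree", "Trump"),
]

# Assumed population shares (illustrative only).
population_share = {"degree": 0.4, "no_degree": 0.6}


def education_weights(sample, population_share):
    """Weight each respondent by population share / sample share of their group."""
    counts = Counter(edu for edu, _ in sample)
    n = len(sample)
    return [population_share[edu] / (counts[edu] / n) for edu, _ in sample]


def weighted_share(sample, weights, candidate):
    """Share of the weighted sample that supports the given candidate."""
    total = sum(weights)
    return sum(w for (_, c), w in zip(sample, weights) if c == candidate) / total


weights = education_weights(sample, population_share)
unweighted = weighted_share(sample, [1.0] * len(sample), "Harris")
weighted = weighted_share(sample, weights, "Harris")
print(f"Harris share unweighted: {unweighted:.3f}, weighted: {weighted:.3f}")
```

In this toy sample, voters without a degree are underrepresented (40 percent of respondents versus an assumed 60 percent of the population), so each of them counts for more than one respondent and each degree holder counts for less.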
In the next New York Times/Siena poll, we’ll introduce the most significant methodological changes to our survey since it began a decade ago. All of the changes involve weighting, with the goal of making the survey more deeply representative of the population.
When we tried out these changes on Times/Siena polls from previous elections, they yielded modest but not miraculous improvements. Our polls from 2020, for instance, would have still overestimated Joe Biden by a wide margin (he won the election, but our polls had him winning by much more). Still, modest gains can be meaningful. If the changes had been in place in 2024, when the polls didn’t have a bad year, the poll’s underestimate of Mr. Trump — nearly three points — would have been cut nearly in half.
The changes are significant, but they’re a natural extension of the longstanding philosophy of the Times/Siena poll. It tries to marry the “gold standard” survey sampling of traditional pollsters with the more sophisticated weighting and modeling techniques used in the world of campaigns and analytics. As a consequence, the poll has always been weighted across more categories than other public polls. In the last election, every Times/Siena poll was weighted by at least 10 categories where applicable: race, age, education, gender, party registration, region, history of voting, method of voting in 2020, marital status and home ownership.
But as impressive as that list may seem, there’s actually a lot of data we don’t use in weighting. We know whether respondents are donors to a campaign, whether their neighborhoods are blue or red, whether they’re licensed doctors or lawyers, whether their spouses are Democrats or Republicans and more, based on voter file records. Nonetheless, we use none of this information in the typical Times/Siena poll: It’s hard or even impossible to weight a poll on more than a dozen or so categories, at least at typical sample sizes.
Even beyond the limited number of groups used in weighting, polls are also limited in their ability to weight on the intersections — or interactions — between different groups. Our polls, for instance, represent the right number of Democrats and the right number of white voters, but not necessarily the right number of white Democrats. While it’s possible to weight by interactions — we currently do so with race by education and age by gender — it’s challenging or impossible to do at scale.
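The white-Democrats problem can be illustrated with raking (iterative proportional fitting), one common way of weighting to group totals. This is a sketch under assumptions: the article does not say which procedure stands behind "traditional" weighting here, and all counts and shares below are made up.

```python
# Hypothetical sample counts by (race, party) cell.
sample_counts = {
    ("white", "Dem"): 40, ("white", "Rep"): 20,
    ("nonwhite", "Dem"): 20, ("nonwhite", "Rep"): 20,
}

# Assumed true population shares for each cell (illustrative only).
population_joint = {
    ("white", "Dem"): 0.25, ("white", "Rep"): 0.40,
    ("nonwhite", "Dem"): 0.25, ("nonwhite", "Rep"): 0.10,
}

# The pollster weights only to the one-way totals implied by the population.
race_target = {"white": 0.65, "nonwhite": 0.35}
party_target = {"Dem": 0.50, "Rep": 0.50}


def rake(cell_counts, race_target, party_target, iterations=200):
    """Alternately rescale cells so race totals, then party totals, hit targets."""
    total = sum(cell_counts.values())
    w = {cell: n / total for cell, n in cell_counts.items()}
    for _ in range(iterations):
        for race, target in race_target.items():
            current = sum(v for (r, _), v in w.items() if r == race)
            for cell in w:
                if cell[0] == race:
                    w[cell] *= target / current
        for party, target in party_target.items():
            current = sum(v for (_, p), v in w.items() if p == party)
            for cell in w:
                if cell[1] == party:
                    w[cell] *= target / current
    return w


raked = rake(sample_counts, race_target, party_target)
white_total = raked[("white", "Dem")] + raked[("white", "Rep")]
dem_total = raked[("white", "Dem")] + raked[("nonwhite", "Dem")]
print(f"white: {white_total:.3f}, Democrats: {dem_total:.3f}")
print(f"white Democrats: {raked[('white', 'Dem')]:.3f} "
      f"(population: {population_joint[('white', 'Dem')]:.3f})")
```

Both one-way totals come out right, yet the weighted share of white Democrats still differs from the assumed population share, because raking never sees the combination of the two categories.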
The new weighting changes are intended to advance the Times/Siena poll on both fronts. Together, they offer a framework for leveraging the rich data available on the voter file, while weighting the poll to represent the population more thoroughly than merely ensuring that different groups make up their proper share of it.
More details follow if you’re interested, but be warned: A 10 out of 10 “wonk warning” is in effect for the remainder of this newsletter.
Support score
The first change is that we’re weighting on a new variable: what we call a support score, which is an estimate, for each registered voter, of the likelihood that the voter supported Mr. Trump or Ms. Harris in the last election.
If you’re a longtime reader, you’ve probably read analyses based on these scores, even if we didn’t describe them as such. When we wrote last year that “Ms. Harris would have won only 72 percent of the registered Democrats who stayed home, according to estimates based on New York Times/Siena College data,” it was based on finding that nonvoting registered Democrats had an average Harris support score of 72.
These support scores aren’t magic. If you’re a white independent in a swing precinct, the odds are that we don’t have any extraordinary insight into your political preferences. But the scores can assimilate all the information that we have on voters. The estimates are based on how hundreds or even thousands of traits predict whether someone supported Mr. Trump or Ms. Harris in Times/Siena polling. These traits range from obviously political factors (party, turnout, campaign contributions) to whether the registrant is named Colton and has a record of owning a Ford F-350 (in which case they might well be more likely to be a Trump voter, even if they’re a white independent in a swing precinct).
This is where the support score can be useful for polling. While a poll can’t weight on dozens of variables, the support score lets us pile a lot of information into a single measure. We can make sure the poll doesn’t overrepresent or underrepresent the “100 percent Harris” Democrats, even if we can never make sure the poll has exactly the right number of Prius drivers, Democratic campaign contributors, people with Democratic spouses, and so on.
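As a rough illustration of piling many traits into one measure, the sketch below scores each voter with a tiny logistic model and then weights on score bands. Everything specific here is an assumption made for the example: the article does not describe the model's form, and the trait names, coefficients, band cutoffs and voters are all invented.

```python
import math
from collections import Counter

# Hypothetical coefficients; a real model would be fit to survey responses
# and would use far more traits than these four.
COEFS = {"registered_dem": 2.0, "registered_rep": -2.0,
         "dem_donor": 1.5, "blue_precinct": 0.8}


def support_score(traits):
    """Return a 0-100 Harris support score from a collection of trait names."""
    z = sum(COEFS[t] for t in traits)
    return 100 / (1 + math.exp(-z))


def score_band(score):
    """Collapse the score into three illustrative bands."""
    if score < 33:
        return "lean Trump"
    if score < 67:
        return "toss-up"
    return "lean Harris"


def band_weights(sample, population):
    """Weight per band = population share of the band / sample share of the band."""
    s = Counter(score_band(support_score(t)) for t in sample)
    p = Counter(score_band(support_score(t)) for t in population)
    return {b: (p[b] / len(population)) / (s[b] / len(sample)) for b in s}


# Hypothetical voter file (population) and poll respondents (sample).
population = ([("registered_dem", "dem_donor")] * 4
              + [("registered_rep",)] * 4 + [()] * 2)
sample = ([("registered_dem", "dem_donor")] * 6
          + [("registered_rep",)] * 2 + [()] * 2)

weights = band_weights(sample, population)
print(weights)
```

Here the toy poll has too many high-score respondents, so that band is weighted down and the low-score band is weighted up, without the poll ever weighting on donor status or precinct directly.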
It’s important to note that the support score doesn’t obviate the need to weight on more traditional categories. It still matters that the poll represents the population as much as possible. A pre-election poll in 2024 that hypothetically looked perfect based on the support score might have still fared disastrously if it underrepresented the relatively young, disaffected and nonwhite voters who swung to Mr. Trump.
Energy balancing
There’s a second change: We’re weighting the survey using a novel statistical technique called energy balancing.
This is a very different way to weight a survey. If the traditional survey weighting techniques think in terms of groups and ask, “Does this poll have the right number of white voters,” energy balancing thinks in terms of individuals: “How many people in the population are like this survey respondent?”
At risk of oversimplifying, energy balancing compares each survey respondent with every member of the population. This comparison is based on the characteristics used in weighting: A white and Democratic survey respondent would be considered identical to a white and Democratic member of the population. (Notice that this naturally incorporates the intersection of categories; the usual weighting techniques generally ensure that each group is properly represented on its own, but not each combination of characteristics.) Energy balancing then finds the weights that make the weighted sample as similar as possible to the population across all of those characteristics at once.
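One way to put a number on "as similar as possible" is the energy distance between the weighted sample and the population, which is zero when the two distributions match. The sketch below is illustrative only: it assumes Euclidean distance on 0/1 indicators (the article does not say which distance is used), and the population and both sets of weights are invented.

```python
import math


def distance(a, b):
    """Euclidean distance between two voters' characteristic vectors (an assumption)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def energy_distance(sample, weights, population):
    """Energy distance between the weighted sample and an equally weighted population."""
    total = sum(weights)
    w = [x / total for x in weights]
    p = 1 / len(population)
    cross = sum(wi * p * distance(s, y)
                for s, wi in zip(sample, w) for y in population)
    within_sample = sum(wi * wk * distance(si, sk)
                        for si, wi in zip(sample, w)
                        for sk, wk in zip(sample, w))
    within_pop = sum(p * p * distance(a, b)
                     for a in population for b in population)
    return 2 * cross - within_sample - within_pop


# Characteristics are (is_white, is_democrat). Hypothetical population of 20.
population = [(1, 1)] * 5 + [(1, 0)] * 8 + [(0, 1)] * 5 + [(0, 0)] * 2

# One respondent of each type, under two candidate sets of weights.
sample = [(1, 1), (1, 0), (0, 1), (0, 0)]
joint_weights = [5, 8, 5, 2]                      # matches every combination
margin_only_weights = [0.364, 0.286, 0.136, 0.214]  # matches one-way totals only

print(energy_distance(sample, joint_weights, population))
print(energy_distance(sample, margin_only_weights, population))
```

Both sets of weights produce 65 percent white voters and 50 percent Democrats, but only the first reproduces the share of white Democrats, and only the first drives the energy distance to essentially zero.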
What does this mean in practice for a political poll? In short, it attempts to obtain a much deeper match between the sample and the population, including correcting imbalances that would be extremely difficult to identify otherwise.
Let’s take an example: Imagine that a poll was skewed by a dozen young and less educated rural white voters who, it turns out, were progressives living in heavily Democratic precincts. Maybe they’re students at Oberlin College in Ohio and they don’t yet have their college degrees.
With traditional weighting techniques, these respondents might not get down-weighted at all. It would come down to whether their demographic groups — young, rural, no degree — were overrepresented in the sample. Even if they were, it probably wouldn’t be by much. It’s just as likely that they would actually receive more weight, as voters without a college degree are usually underrepresented and up-weighted.
With the new weighting procedure, this problem would be handled very differently. For one, the support score would rate them as highly likely to be Harris supporters, as their college town is exceptionally Democratic.
For another, energy balancing would compare these respondents to each member of the population and note that these voters are pretty atypical. There aren’t really any independent rural white voters without degrees who are modeled as highly likely to support Ms. Harris. As such, energy balancing would conclude that the poll would be more representative if they were substantially down-weighted — perhaps even to the point of elimination.
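A toy version of the Oberlin scenario shows the down-weighting at work. This is a sketch, not the Times/Siena procedure: it assumes Euclidean distance, describes each voter by just two invented characteristics (a rural flag and a support score scaled from 0 to 1), and minimizes the energy distance with a simple multiplicative update in place of whatever solver is used in practice.

```python
import math


def distance(a, b):
    """Euclidean distance between two characteristic vectors (an assumption)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def energy_balance(sample, population, steps=2000, rate=0.5):
    """Find weights (summing to 1) that shrink the energy distance to the population.

    Uses exponentiated-gradient updates, a simple stand-in for a real solver.
    """
    n = len(sample)
    # Average distance from each respondent to the population.
    a = [sum(distance(s, y) for y in population) / len(population)
         for s in sample]
    # Pairwise distances among respondents.
    d = [[distance(si, sk) for sk in sample] for si in sample]
    w = [1 / n] * n
    for _ in range(steps):
        # Gradient of the energy distance with respect to each weight.
        grad = [2 * a[i] - 2 * sum(d[i][k] * w[k] for k in range(n))
                for i in range(n)]
        w = [wi * math.exp(-rate * g) for wi, g in zip(w, grad)]
        total = sum(w)
        w = [wi / total for wi in w]
    return w


# Characteristics: (is_rural, Harris support score from 0 to 1). All hypothetical.
typical_rural = (1, 0.2)
typical_urban = (0, 0.9)
oberlin_like = (1, 0.9)   # rural on paper, but modeled as strongly pro-Harris

population = [typical_rural] * 10 + [typical_urban] * 10
sample = [typical_rural] * 2 + [typical_urban] * 2 + [oberlin_like] * 2

weights = energy_balance(sample, population)
for person, weight in zip(sample, weights):
    print(person, round(weight, 4))
```

Because nobody in this invented population resembles the Oberlin-like respondents, their weights should shrink toward zero while the typical rural and urban respondents each end up with roughly half of the total weight.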
Again, none of this is a magical solution. In back testing, the improvements might be worth only a percentage point. The Oberlin student scenario, while fun, must not be that common. This suggests that much of what’s wrong with polling — especially regarding 2020 — runs even deeper than what these sophisticated methods can address given the data at hand.
But while the gains might be modest, energy balancing did nonetheless help reduce the underestimation of Mr. Trump when it was tested on prior Times/Siena polls. And one secondary advantage of this framework is that it offers room for growth. To this point, we haven’t had a reason to perfect our support scores for a purpose like this, and there are opportunities to do so. And after a decade without always having obvious paths forward, it’s nice to be assured of chances to improve.
There will be more details on this approach in the methodology statement that accompanies the results of our Times/Siena polls this week.
The post The Big Changes Coming to the Times/Siena Poll appeared first on New York Times.