The last election was another bad one for pollsters. The overall poll error in the Conservative lead was over 4pc, which incorrectly suggested that the Conservatives would win a sizeable majority. That was the second time in a row that the polls got the election result wrong.
But there was one notable exception. Late in the campaign, YouGov pioneered a new method of analysing polls which gave a very different result. The new method gave a final campaign prediction of 304 seats for the Conservatives. This was pretty close to the actual result of 318 seats, and correctly predicted that the Conservatives would not have a majority. YouGov called their method "multi-level regression and post-stratification", which baffled many people.
But what actually is this method? And does it really work, or were the YouGov results just right by chance? The answers are that the method is similar to those successfully employed in other contexts and that it is well suited to analysing polling data. It works pretty well in practice, not just for the most recent election but also for earlier British elections. Here are the details.
You can also learn how Electoral Calculus can run regression analysis on your own polling. It can give you greater insight without breaking your budget.
Electoral Calculus has built its own version of the political regression methodology. This is very similar in spirit to the YouGov approach, though some of the technical details may be different, since YouGov have not published a detailed technical description. The basic idea is relatively new to political science, but is already well established in the fields of mathematical statistics and technology. Political scientists use the tongue-wrenching name of "multi-level regression and post-stratification", but the technology companies prefer the more inviting terms of "machine learning" and "big data". Unassuming mathematicians just call it "regression". Whatever it is called, it is the same basic thing. Let's call it regression for now.
The regression works by taking a set of people and polling them to ask not just about their voting intention but also about other facts about themselves. These facts, which are known as "predictors", can include demographic characteristics such as age, gender, location, and education as well as political characteristics like their votes in previous elections and the political alignment of their constituency. So far, this is just like conventional polling.
But the next stage is different. Now we "regress" people's voting intention against their predictor variables. In other words, we estimate the statistical relationship between the various predictor variables and someone's voting intention. For example, we might find that younger people are more likely to vote Labour and Green and that older people are more likely to vote Conservative and UKIP. Or that someone in the East Midlands is more likely to vote Conservative than a similar person in London. These links are usually not surprising, but the important thing is that we can quantify each relationship in numerical terms.
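To make the idea of "quantifying each relationship" concrete, here is a minimal sketch of the regression step. It is not the YouGov or Electoral Calculus model: it assumes an invented toy poll, only two predictors (age and whether the respondent lives in London), a two-way outcome (Conservative or not), and a plain single-level logistic regression fitted by gradient ascent. All names and figures in it are hypothetical.

```python
import math

# Invented toy poll, not real responses:
# (age in decades, lives in London: 1/0, intends to vote Conservative: 1/0)
RESPONDENTS = [
    (2.0, 1, 0), (2.5, 1, 0), (3.0, 0, 1), (3.5, 1, 0),
    (4.0, 0, 1), (4.5, 1, 0), (5.0, 0, 1), (5.5, 1, 1),
    (6.0, 0, 1), (6.5, 0, 1), (7.0, 1, 1), (7.5, 0, 1),
    (2.2, 0, 0), (3.2, 0, 0), (4.8, 1, 0), (6.8, 1, 1),
]


def sigmoid(z):
    """Map a linear score to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(rows, steps=5000, rate=0.05):
    """Fit an intercept plus one weight per predictor by gradient ascent
    on the average log-likelihood."""
    n_weights = len(rows[0])  # one per predictor, plus an intercept
    weights = [0.0] * n_weights
    for _ in range(steps):
        gradient = [0.0] * n_weights
        for *predictors, vote in rows:
            features = [1.0, *predictors]
            p = sigmoid(sum(w * f for w, f in zip(weights, features)))
            for j, f in enumerate(features):
                gradient[j] += (vote - p) * f
        weights = [w + rate * g / len(rows) for w, g in zip(weights, gradient)]
    return weights


def predict(weights, age_decades, london):
    """Estimated probability that such a person intends to vote Conservative."""
    features = [1.0, age_decades, london]
    return sigmoid(sum(w * f for w, f in zip(weights, features)))


weights = fit_logistic(RESPONDENTS)
intercept, age_effect, london_effect = weights
print(f"age effect per decade: {age_effect:+.2f}, London effect: {london_effect:+.2f}")
print(f"P(Conservative | age 70, outside London) = {predict(weights, 7.0, 0):.2f}")
print(f"P(Conservative | age 25, London)         = {predict(weights, 2.5, 1):.2f}")
```

The fitted weights are the numerical version of statements such as "older people are more likely to vote Conservative". A real model would use many more predictors, several parties, and a multi-level structure.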
Finally, the relationships estimated by the regression are applied to everyone in each constituency to estimate the individual probability that a particular person supports each party. This gives an idea of how many people in each constituency support each party, from which we can work out which party is likely to win that seat.
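This final step can be sketched as follows. The example groups people into demographic "cells" rather than treating them one by one, and every figure in it is invented: the support probabilities stand in for the output of a regression, and the two seats and their elector counts are hypothetical. It illustrates the idea and is not the Electoral Calculus implementation.

```python
# Hypothetical support probabilities per (age band, education) cell,
# of the kind a regression step might produce.
CELL_SUPPORT = {
    ("18-34", "degree"):    {"CON": 0.20, "LAB": 0.60, "LIB": 0.20},
    ("18-34", "no degree"): {"CON": 0.30, "LAB": 0.55, "LIB": 0.15},
    ("35-64", "degree"):    {"CON": 0.40, "LAB": 0.40, "LIB": 0.20},
    ("35-64", "no degree"): {"CON": 0.50, "LAB": 0.40, "LIB": 0.10},
    ("65+", "degree"):      {"CON": 0.55, "LAB": 0.25, "LIB": 0.20},
    ("65+", "no degree"):   {"CON": 0.65, "LAB": 0.25, "LIB": 0.10},
}

# Hypothetical census-style counts of electors per cell in two made-up seats.
CONSTITUENCIES = {
    "Seat A": {
        ("18-34", "degree"): 20000, ("18-34", "no degree"): 15000,
        ("35-64", "degree"): 15000, ("35-64", "no degree"): 10000,
        ("65+", "degree"): 3000,    ("65+", "no degree"): 7000,
    },
    "Seat B": {
        ("18-34", "degree"): 5000,  ("18-34", "no degree"): 8000,
        ("35-64", "degree"): 10000, ("35-64", "no degree"): 20000,
        ("65+", "degree"): 7000,    ("65+", "no degree"): 20000,
    },
}


def poststratify(cell_counts, cell_support):
    """Weight each cell's support by its size to get seat-level vote shares."""
    expected_votes = {}
    for cell, count in cell_counts.items():
        for party, probability in cell_support[cell].items():
            expected_votes[party] = expected_votes.get(party, 0.0) + count * probability
    electorate = sum(cell_counts.values())
    return {party: votes / electorate for party, votes in expected_votes.items()}


def predicted_winner(shares):
    """The party with the largest estimated share wins the seat."""
    return max(shares, key=shares.get)


results = {seat: poststratify(cells, CELL_SUPPORT) for seat, cells in CONSTITUENCIES.items()}
for seat, shares in results.items():
    print(seat, {party: round(share, 3) for party, share in shares.items()},
          "->", predicted_winner(shares))
```

The same support probabilities give different winners in the two seats purely because their populations differ, which is how the method can produce seat-level estimates from a national poll.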
To test this method, Electoral Calculus took a large set of polls conducted during the general election campaign. The polls were fairly standard and typical of the campaign. Using the "classic" poll analysis, and focusing on the final campaign polls, they showed a strong Conservative lead over Labour, which would give an (inaccurate) prediction of the Conservatives on 370 seats.
But using the new regression analysis, the result was very different. The final campaign polls, combined with the regression method, predicted the Conservatives on 321 seats and Labour on 250 seats. This compares well to the actual result of 318 for the Conservatives and 262 for Labour. It is quite similar to the final YouGov regression prediction and correctly predicts a minority Conservative government.
The table above shows that both the regression approaches are much more accurate than the "classic" polling analysis. Both YouGov and Electoral Calculus regression methods produce good predictions which were consistent with the actual election result.
It is important to stress that the "Classic Polling" and the "Elect Calc Regression" columns were based on the same polling data. Both of these predictions used the same raw poll data, with identical respondents and responses. Only the analyses of the poll data were different. This suggests that the raw poll data is quite adequate and that this part of market research is working well.
If we now trust this new regression method, it can show us something else. Since we have polls throughout the last month of campaigning, we can estimate what the election result would have been if the election had taken place earlier.
|Poll period|CON|LAB|LIB|Regression-Predicted Outcome|
|---|---|---|---|---|
|Late April to early May 2017|391|189|4|Conservative majority of 132|
|Middle two weeks of May 2017|362|211|7|Conservative majority of 74|
|Last week of May 2017|310|264|5|Conservative minority|
|First week of June 2017|321|250|9|Conservative minority|
|General Election 8 June 2017|318|262|12|Conservative minority|
This table shows the predicted result, using the regression method, at four different points in time; the final row gives the actual election result for comparison. The first period runs from the start of the campaign in late April to the first week of May. If the election had been held around that time, then Theresa May would have enjoyed a landslide victory with a three-figure majority. The next period covers the following two weeks, with all polls taken before the Manchester Arena bombing on 22 May. For this period as well, the Conservatives would have won with a comfortable majority.
After Manchester, the position changes dramatically. Polls from the last week of May show that the Conservatives had already lost their majority and that Labour had gained around fifty seats. The final week of the campaign was relatively stable with no major moves.
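The majority figures in the table follow from simple seat arithmetic. As a small illustrative sketch, assuming a House of Commons of 650 seats and ignoring complications such as the Speaker and MPs who do not take their seats, a party's majority is its seat count minus the combined seats of everyone else. The function name here is hypothetical.

```python
TOTAL_SEATS = 650  # seats in the House of Commons


def describe_outcome(party, seats):
    """Describe the result for the largest party given its seat count."""
    # Majority = own seats minus all other seats = 2 * seats - total.
    majority = 2 * seats - TOTAL_SEATS
    if majority > 0:
        return f"{party} majority of {majority}"
    return f"{party} minority"


for seats in (391, 362, 310, 321, 318):
    print(seats, "->", describe_outcome("Conservative", seats))
```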
This appears to be evidence that the Conservatives under Theresa May lost the election that was theirs to win. It was not wrong to call the election in April and the public was initially prepared to return the Conservatives with a large majority. But a series of campaign errors led to a marked decline in Conservative support from mid-May onwards. These errors are now widely known and include the poor manifesto, an over-focus on the PM herself, and an inability to articulate any optimistic purpose for the election. Only the Cambridge debate, which Corbyn attended but May didn't, seems to have made no difference as it came on 31 May, when the tide had already turned against the Conservatives.
The results from 2017 are impressive, but they do not prove that the regression technique will always work. It is necessary to look at how it would have performed in other elections to get a rounded view. To do this, Electoral Calculus looked at polls from the general elections in 2015 and 2010.
This was complicated by the limited availability of raw polling data, and also because that data did not include all the useful predictors which were used in 2017. So the predictions are expected to be cruder and less accurate, especially for the Lib Dems. However, the results are still interesting.
In 2015, we remember that the pollsters had another very bad experience. The campaign polls suggested a Conservative lead of only 2pc and that the Conservatives and Labour would win a similar number of seats. A Labour/SNP alliance looked like a distinct possibility. In fact, the Conservative lead over Labour was over 6pc and the Conservatives had an overall majority of 12 seats. How does the regression method compare with that?
The overall prediction quality is much better than the classic polling methodology. The regression method correctly predicts that the Conservatives will win many more seats than Labour and has the SNP correct. The Lib Dems are probably too low because of the absence of some Lib Dem predictors from the raw poll data. The regression method didn't actually predict an overall Conservative majority, but neither did the BBC/ITV exit poll on election night.
In 2015, the regression method performed much better than classic polling methodology and would have correctly given the Conservatives as the largest party and likely to be in government.
The 2010 election saw Gordon Brown leading Labour and David Cameron leading the Conservatives, while Nick Clegg had a good campaign for the Liberal Democrats. But raw polling availability in 2010 was even more limited. Conventional polling worked well that year, and the regression method was fairly similar in accuracy.
The results show that the regression method performs acceptably. It is more accurate than classic polling for Labour and the Liberal Democrats, but less accurate for the Conservatives. And it mis-predicts that the Conservatives would get a small absolute majority. But the margin of error is higher than normal, due to the poor quality and quantity of poll data, and the results are broadly accurate in correctly predicting that the Conservatives would be in government.
Having created the Electoral Calculus version of the regression model, we can see three interesting things:
Try the regression method for yourself by asking it to Guess My Vote.