Elections

Forecast error: The UK general election

Pollsters expected a hung parliament, but UK voters instead returned a small Conservative majority. Timothy Martyn Hill reviews the predictions and the errors that were made.

© 2015 The Royal Statistical Society
On the morning of 7 May 2015, the day of the UK general election, it looked like the country was heading for a hung parliament. The two largest parties, the Conservatives and Labour, were neck and neck in the polls, and several long days of backroom deal-making looked likely in order to hash together another coalition government.

By 10 p.m. that day, everything had changed. The exit poll funded by UK broadcasters was predicting that the Conservatives would be the largest party. By the morning of 8 May, it was confirmed: the Conservatives would form a majority government with David Cameron as prime minister. Including the House of Commons speaker, the Conservatives (Con) had won 331 seats. Labour (Lab) returned only 232 MPs. The Liberal Democrats (Lib), who had been coalition partners in the last government, saw their number of seats cut to just 8. All other parties accounted for the remaining 79 seats, including the 56 won by the Scottish National Party.

This surprise result has been chalked up to a failure of the pollsters to accurately gauge voting intentions. But the pollsters were not alone: all other sources of prediction missed the mark. Just how inaccurate were they? This article sets out to answer that question by analysing the performance of pollsters, seat and vote modellers, and betting firms from 2010 all the way up to election day 2015.

Measuring error

We measure the inaccuracy of a prediction by calculating the average difference between the predicted party levels and the actual result. We call this inaccuracy the mean absolute error (MAE). A perfectly accurate prediction will have an MAE of 0%, while a wholly inaccurate prediction will have an MAE of 100%. Calculating the MAE for one prediction is simple; an example is given in Table 1. But between 2010 and 2015 there were hundreds of academic predictions, thousands of polls and at least 3000 betting odds. We calculated the MAE for them all.
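In symbols (the notation here is ours, matching the verbal definition above): for a prediction covering n parties, with predicted proportion p_i and actual result r_i for party i,

```latex
\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| p_i - r_i \right|
```

expressed as a percentage of the vote (or seat) share.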
Tracking prediction performance over time has an inherent problem: predictors change constantly, both in their results and in their methods.

Modellers are inconsistent: they refine their models or adopt new ones as time progresses, further checks are made, or party fortunes fluctuate, and a model at polling day 2015 will not have been the same as it was in 2010. Pollsters exhibit the same behaviour, changing their methodology as the election approaches or to cope with changes in the voting landscape, such as the rise of the UK Independence Party (UKIP).

Predictions, meanwhile, are incompatible: they may be made for three, four, five or more parties, and their groupings may not match each other.

Predictions are also ephemeral: new predictions may replace previous ones, or are published on dynamic websites that differ from day to day and leave no recoverable trace of their previous incarnations.

Finally, predictions are imprecise: the move to mobile computing has seen the rise of graphical output, with data displayed as graphs instead of data points, making it hard to recover the underlying numbers.

As for the organisations making the predictions, they are abundant: during the 2010–2015 parliament one could choose from at least 13 pollsters, at least 24 bookies, at least 10 academic models, at least one bookie modeller and at least one City analyst. The predictors are also asynchronous: some pollsters publish daily, others publish weekly or monthly, betting odds can change at any time, and so on.

Meeting the challenges

We cannot compensate for changes in prediction methodology. But we can deal with the other issues as follows:

Incompatible.
To enable comparisons, we will express all predictions in a four-party-forced format: predictions will be expressed as Con/Lab/Lib/Other. The four-party-forced format also compensates for the tendency of MAE to increase as the number of parties predicted increases.

Ephemeral. Overwritten, dynamic and replaced predictions can be reconstructed by trawling through the Twitter feeds of their authors, or retrieved using archive.org. But Twitter feeds may not include the whole prediction, and archive.org does not archive every change. Consequently, we will not consider predictors that cannot be reasonably reconstructed for at least a year prior to the election.

Imprecise. Predictions expressed graphically without an associated set of numbers will be ignored.

Asynchronous. Instead of collating all predictions, for each predictor we will take the latest prediction on each Friday.

Abundant. For each category of predictor we will limit our selection to those predictors that can be reliably reconstructed (to a maximum of five per category, chosen according to size and age of dataset) and that fit the four-party-forced format referenced above.

So, how did our predictors behave over the course of the last parliament?

Opinion polls

The television series The West Wing had an episode called "Lies, Damn Lies and Statistics" which dealt with a tense three-day wait for a poll. These days things have changed, and polls are published far more often. By 7 May 2015 we had captured approximately 2000 distinct polls over the 2010–2015 parliament. We chose as our sources YouGov, Populus, Opinium, ComRes and ICM: the five most frequent pollsters extending over a year prior to polling day. Each poll from those pollsters was converted to four numbers via four-party-forced. Those four numbers were compared to the election result in Great Britain (i.e., excluding Northern Ireland) and the MAE for each poll was calculated as in Table 1.
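The four-party-forced conversion described above can be sketched as follows. This is our own illustrative implementation, not the authors' code; the function name, input format and example figures are assumptions.

```python
# A minimal sketch of the "four-party-forced" conversion: any party other
# than Con/Lab/Lib is folded into "Other". Function name, input format and
# the example numbers are illustrative, not taken from the article's data.

def four_party_forced(prediction):
    """Collapse a {party: share} prediction into Con/Lab/Lib/Other."""
    main = ("Con", "Lab", "Lib")
    forced = {p: prediction.get(p, 0.0) for p in main}
    forced["Other"] = sum(v for p, v in prediction.items() if p not in main)
    return forced

# e.g. a five-party prediction (made-up numbers, not a real poll)
poll = {"Con": 0.34, "Lab": 0.33, "Lib": 0.08, "UKIP": 0.14, "Green": 0.11}
print(four_party_forced(poll))
```

Here UKIP and the Greens are folded into "Other", giving a four-number prediction that can be compared directly with any other predictor's output.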
Table 1. Example calculation for mean absolute error

Party | Prediction % | Proportion | 2015 GB result % | Proportion | Absolute error
Con   | 40% | 0.4 | 37.8% | 0.378 | 0.022
Lab   | 30% | 0.3 | 31.2% | 0.312 | 0.012
Lib   | 20% | 0.2 | 8.1%  | 0.081 | 0.119
Other | 10% | 0.1 | 22.9% | 0.229 | 0.129

Total absolute error: 0.022 + 0.012 + 0.119 + 0.129 = 0.282
Mean absolute error: 0.282 / 4 = 0.0705 (i.e., 7.05%)

That calculation was repeated for every poll from each of the chosen pollsters. The resulting MAEs are given in Figure 1.

The MAE starts high and descends over time. This is consistent with the theory of swingback, which describes the propensity of voters to veer away from a government at the outset of a parliament, only to swing back at its end. Swingback makes it difficult to distinguish between a poll that accurately measures a varying intention and a poll that inaccurately measures a fixed one. As Figure 1 shows, there is no obvious best or worst poll, which supports the concept of herding, described by Nate Silver, editor-in-chief of FiveThirtyEight, as the tendency of polling firms to produce results that closely match one another, especially toward the end of a campaign.

Modellers

In the world of pollsters, monthly or even daily publishing is unremarkable, but it is rare among modellers: academics and other people who build predictive models and attempt to turn voting intention data into parliamentary seat projections. Our need for regular predictions covering at least a year prior to the election left us with just three modellers to choose from: Fisher of Oxford/Elections Etc, Baxter of Electoral Calculus, and Ford/Jennings/Pickup/Wlezien of Polling Observatory. Unlike the pollsters, prior predictions are not always available on modeller websites, so some data had to be reconstructed from Twitter, archive.org, other websites, and by contacting the original authors. As with the pollsters, each prediction from the above three modellers was converted to four numbers via four-party-forced.
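The Table 1 arithmetic is easy to reproduce. This snippet uses the table's own numbers: a hypothetical 40/30/20/10 prediction compared against the 2015 Great Britain vote shares.

```python
# Reproducing the Table 1 example: MAE between a hypothetical 40/30/20/10
# prediction and the 2015 Great Britain vote shares.

def mean_absolute_error(predicted, actual):
    """Average absolute difference between two equal-length share vectors."""
    assert len(predicted) == len(actual)
    return sum(abs(p, ) if False else abs(p - a) for p, a in zip(predicted, actual)) / len(predicted)

prediction = [0.40, 0.30, 0.20, 0.10]      # Con, Lab, Lib, Other
result = [0.378, 0.312, 0.081, 0.229]      # 2015 GB result

mae = mean_absolute_error(prediction, result)
print(round(mae, 4))  # 0.0705, i.e. 7.05%
```

The same function applies unchanged to every poll, model prediction or reconstructed betting book once it has been put into the four-party-forced format.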
Those four numbers were compared to the election result (votes or seats as appropriate), and the MAE for that prediction was calculated. The resultant MAEs are given in Figure 2 and, as can be seen, the vote modellers (v) perform noticeably better than the seat modellers (s), although both exhibit herding.

Figure 1. MAEs for ComRes, ICM, Opinium, Populus and YouGov polls since 14 May 2010. X-axis, Fridays; Y-axis, MAE between latest poll that Friday and final result. The solid line represents the average.

Betting odds

Unlike modellers and pollsters, bookies publish odds from moment to moment. The oddschecker.com website captures single odds but, for technical reasons, cannot capture all of them. Nevertheless, by 7 May 2015 it had captured over 3000 single odds on any one party winning an overall majority. The five most captured bookies extending over a year prior to polling day were Betfair Exchange, SpreadEx, Bet365, William Hill and Ladbrokes, and those were the ones we initially chose. But the SpreadEx data were difficult to handle,
so we substituted Coral for SpreadEx and continued.

Figure 2. MAEs for Fisher, Baxter and Polling Observatory predictions since 14 May 2010. X-axis, Fridays; Y-axis, MAE between latest prediction that Friday and final result. The heavy red line is the average for seats, the heavy blue line the average for votes.

Betting odds are expressed differently from polls and modelling predictions, but any given odds can be converted to an implied probability: for example, odds of 99/1 imply a probability of 1/(1+99) = 1/100 = 0.01. Note, however, that implied probability is not the same as actual probability. The implied probabilities for the full book (the odds offered by a bookie on all outcomes of a given event) will not add up to 1; they will exceed it. This excess, or overround, represents the bookie's profit margin. The size of the overround differs from bet to bet and from bookie to bookie. So if you do not know the full book, you can calculate the implied probability but not the actual probability.

To fairly calculate the MAE of a bookie we must know the full book, so we reconstructed it from the captured single odds. To do this we assume that a single odds captured from a bookie in the past is still available unless replaced by a new one, and we further assume that the first published odds for a party is always captured.

Each reconstructed full book on an overall majority from those bookies was converted to four numbers via four-party-forced and the overround was removed. Those four numbers were compared to the election result and the MAE for that full book was calculated. An example calculation is shown in Table 2.

Table 2. Example calculation for converting betting odds into probabilities, before calculating the mean absolute error (Bet365 full book, 15 March 2015)

Party | Odds of overall majority | Implied probability | Actual probability (overround removed) | 2015 result | Absolute error
No overall majority | 1/5   | 0.833 | 0.770 | 0 | 0.770
Con                 | 9/2   | 0.182 | 0.168 | 1 | 0.832
Lab                 | 16/1  | 0.059 | 0.055 | 0 | 0.055
Any other party     | 125/1 | 0.008 | 0.007 | 0 | 0.007

Total absolute error: 0.770 + 0.832 + 0.055 + 0.007 = 1.664
Mean absolute error: 1.664 / 4 = 0.416

That calculation was repeated for every reconstructed full book for each of the chosen bookies. The resultant MAEs are given in Figure 3. The average MAE for overall majority starts high, remains high, and gets worse. All bookies exhibited herding. Time constraints prevented examination of betting odds for parties winning most seats, which may have performed better than the odds on an overall majority.

And the winner is...

For those who sat up to watch the UK general election results pour in, it goes without saying that the exit poll, jointly funded by the UK's major broadcasters, was the most reliable forecast of the past five years. At 10 p.m. on election night, it predicted 316 seats for the Conservatives, 239 for Labour, 10 for the Liberal Democrats, and 85 for all other parties: an MAE of 1.15%. However, an exit poll has limited predictive value. It is published only after polls have closed and mere hours before the true vote is known.
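The Table 2 conversion, from fractional odds to implied probabilities and then to actual probabilities with the overround removed, can be sketched as follows. One assumption on our part: the overround is removed by scaling all implied probabilities proportionally so that they sum to 1, which matches the figures in Table 2.

```python
# Reproducing Table 2: Bet365's full book on an overall majority, 15 March 2015.
# Fractional odds a/b imply probability b/(a + b). The implied probabilities
# exceed 1 by the overround, which we remove by proportional scaling
# (our assumption; it matches the Table 2 figures).
from fractions import Fraction

book = {                                   # outcome: fractional odds
    "No overall majority": Fraction(1, 5),
    "Con": Fraction(9, 2),
    "Lab": Fraction(16, 1),
    "Any other party": Fraction(125, 1),
}
outcome = {"No overall majority": 0, "Con": 1, "Lab": 0, "Any other party": 0}

implied = {k: float(1 / (1 + odds)) for k, odds in book.items()}
overround = sum(implied.values())          # about 1.082: the bookie's margin
actual_prob = {k: v / overround for k, v in implied.items()}

mae = sum(abs(actual_prob[k] - outcome[k]) for k in book) / len(book)
print(round(mae, 3))  # 0.416, as in Table 2
```

Note that because the Conservatives won a majority, the 0.832 error on the Con line dominates: the bookies' reconstructed books priced a Conservative majority at roughly a one-in-six chance.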
Figure 3. MAEs for Betfair Ex, Coral, Bet365, William Hill and Ladbrokes overall majority odds since 14 May 2010. X-axis, Fridays; Y-axis, MAE between result and latest data that Friday. The solid line is the average.

For those looking for an earlier read on the way the political winds are blowing, where should they turn for the most reliable predictions? Table 3 compares the MAE of the four categories of prediction at different stages of the pre-election period, up to 12 months out.

What constitutes a successful prediction? At the 1992 election, pollsters predicted a hung parliament, only for voters to elect a Conservative majority. The inaccuracy of the polls then was 2.75%, and that result was labelled a debacle, so an MAE of 2.75% is clearly considered unacceptable. If we take a successful prediction to be one with an MAE of less than 2%, then only the vote modellers hit that mark, and then only two months before voting took place. So in terms of a long-range forecast, there are no real winners.

There is a question that needs answering here: is it wise to survey voting intentions five years before an election? Even three months out, the polls in our sample had an MAE above the unacceptable 1992 figure. If we were to extend our performance analysis to include all pollsters, not just those in our sample, we would see that their final predictions for the 2015 election had an MAE of 2.25%. This, as in 1992, has also been labelled a debacle, and to add to the situation, the modellers have been quick to point out that any inaccuracy in their model predictions can be attributed to the inaccuracy of the polls.
And poll results also influence betting odds. Polls are a snapshot in time, not a prediction. Nevertheless, there was something off with the picture they were presenting on election day. Announcing an enquiry on 8 May, the British Polling Council said: "The final opinion polls were clearly not as accurate as we would like, and the fact that all the pollsters underestimated the Conservative lead over Labour suggests that the methods that were used should be subject to careful, independent investigation." There is hope that any problems will be fixed in time for the 2020 election. But it may simply be too early to tell.

Table 3. MAE for prediction categories at different points in the pre-election period

Months before election                 | 12         | 9             | 6               | 3               | 2            | 1
Date (Friday)                          | 9 May 2014 | 8 August 2014 | 7 November 2014 | 6 February 2015 | 6 March 2015 | 3 April 2015
Seat modellers                         | 7.54%      | 7.69%         | 6.04%           | 5.23%           | 5.27%        | 3.85%
Vote modellers                         | 3.20%      | 2.44%         | 2.30%           | 2.00%           | 1.88%        | 1.65%
Full books of odds of overall majority | 38.57%     | 37.11%        | 41.03%          | 41.48%          | 42.18%       | 42.39%
Polls                                  | 3.99%      | 3.40%         | 3.53%           | 3.66%           | 2.13%        | 2.11%

Timothy Martyn Hill is a statistician who used to work for the Office for National Statistics and now works in the private sector. A version of this article with inline citations and appendices is available online at significancemagazine.com/election2015
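As a closing check, the Table 3 figures can be run against the article's 2% "successful prediction" threshold. The data below are transcribed from Table 3; the category labels are shortened for brevity.

```python
# Which prediction categories ever beat the 2% MAE threshold, and how far out?
# MAEs transcribed from Table 3; columns are 12, 9, 6, 3, 2 and 1 months
# before the election.

months = [12, 9, 6, 3, 2, 1]
table3 = {
    "Seat modellers": [7.54, 7.69, 6.04, 5.23, 5.27, 3.85],
    "Vote modellers": [3.20, 2.44, 2.30, 2.00, 1.88, 1.65],
    "Overall-majority odds": [38.57, 37.11, 41.03, 41.48, 42.18, 42.39],
    "Polls": [3.99, 3.40, 3.53, 3.66, 2.13, 2.11],
}

for category, maes in table3.items():
    hits = [m for m, mae in zip(months, maes) if mae < 2.0]
    if hits:
        print(f"{category}: below 2% from {max(hits)} month(s) out")
    else:
        print(f"{category}: never below 2%")
```

Only the vote modellers ever clear the bar, and only from two months out: the same conclusion the article draws.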