Forecasting the 2016 EU Referendum with Big Data: Remain to win, in spite of Cameron

Ronald MacDonald, University of Glasgow and Xuxin Mao, UCL

This report summarises predictions about the outcome of the upcoming EU referendum using what we call the Big Data framework. The approach supplements daily polling data with information about the relevant online searches voters make in the run-up to an election or referendum. Our starting point is that what people search for online may be a better indication of voting intention than what they actually tell pollsters, or may at least add value to that stated intention. Evidence from our two previous studies has borne this out. The underlying statistical model we use for computational purposes has the added advantage that it can pick up momentum effects in the data and momentum reversals; it can also identify potential factors that influence tactical voting.

Key findings
- Momentum for a Remain victory began before the murder of Jo Cox and UKIP's controversial immigration poster.
- Voter decision making is not being determined by issues such as security, sovereignty or the cost of EU membership.
- The Leave campaign benefits - and only benefits - from the immigration issue.
- Interventions by David Cameron are having a negative effect on the Remain campaign and a positive impact on Leave.

The predictions contained here are based on text mining Google searches up to 18th June 2016, a period that includes the tragic death of Jo Cox and the controversial UKIP immigration poster. For the polling data we were able to use daily voting intention information between 15th April, when the campaign officially started, and 18th June. The polling information came from ORB, Survation and YouGov, the only three mainstream pollsters that provide regular poll updates.
The momentum is now clearly in favour of Remain - and did not originate with the Jo Cox tragedy or UKIP poster

In Figure 1, below, we portray voting intention data based on YouGov, Survation and ORB polls. After the campaign officially started on 15th April, Remain enjoyed a comfortable lead of 4-7 points until late May 2016, when momentum towards Leave kicked in. But this momentum has stalled since 12th June and indeed has now reversed. Together with our finding that fewer undecided voters are swinging towards Leave, this means the momentum is now clearly in favour of Remain. Our statistical analysis detects a shift in favour of Remain prior to 16th June and so, contrary to opinions expressed by some pollsters and journalists, the momentum change does not originate with either the Jo Cox tragedy or the Nigel Farage poster row.
Figure 1: Voting intention data based on YouGov, Survation and ORB polls (series shown: Remain, Leave, Undecided). Note: The units are in percent.

The first part of the TRUST approach relies on text mining a very large database of print newspapers, along with their web-based counterparts, using sophisticated algorithms to represent the topics that will motivate voters (this is discussed in some detail in our previous work - see panel, below). The results are summarised in Table 1 for various periods of the campaign, with the key themes summarised in the last row.

Table 1: Text-mined topics on the EU referendum during the official campaign period

| Period | Text-mined topics |
|---|---|
| 15 April-14 May 2016 | economy, market, trade, Cameron, Osborne, Obama |
| 15-21 May 2016 | market, business, economic price, bill, Cameron, Johnson |
| 22-28 May 2016 | market, trade, economy, treasury, claim, immigration, Cameron, Johnson |
| 29 May-4 June 2016 | trade, market, economy, immigration, Cameron, Labour |
| 5-11 June 2016 | market, trade, economy, immigration, Cameron, Johnson, Labour |
| 12-18 June 2016 | trade, work, market, bank, price, Pound, business, Cameron, Labour |
| Key themes | UK economy, EU trade, Single Market, EU immigration, David Cameron, Boris Johnson, Labour Party |

There are several noteworthy points here. Firstly, keywords and names such as Corbyn, UKIP/Farage or SNP rarely show up in our algorithmic searches of the newspapers during the EU campaign period. This suggests that the EU referendum is more of an internal Conservative matter, since the key names Cameron and Johnson constantly come up as motivational keywords. Secondly, economic issues (trade, economy, the Single Market, etc.) dominate the referendum themes. Thirdly, immigration only emerges as an issue from 22 May to 11 June, the same period when the Leave side was generating momentum in the polls and Remain was trailing. Fourthly, issues such as security, the constitution, sovereignty, the NHS, the cost of EU membership and the potential issue of an EU army, claimed by many to be important, are not directly related to people's decision making. Despite the coverage given to the killing of Jo Cox and the controversy aroused by the UKIP poster of a queue of refugees, launched by Nigel Farage on 16th June, there was no evidence in our data of these being motivational factors, although we cannot comment on whether these factors affected intentions after 18th June.

Issues such as security, sovereignty, the NHS and the cost of EU membership are not directly related to voters' decision making

Using the text-mined topics noted in Table 1, we then construct Big Data volume indicators based on the key EU themes. Combining these with daily voting intention information, we conduct a statistical analysis that is used to predict the outcome of the referendum. Before turning to the prediction, however, it is worth noting some of the determinants of voting intention that our statistical analysis identifies. These are summarised in Table 2 (the entries are the statistically significant coefficients, or weights, on the relevant term; so, for example, the issue of the UK economy has a statistically significant positive effect of 0.01 per cent on the Remain vote).

From Table 2, the first noteworthy finding is a bigger swing tendency towards Remain than Leave: when the share of undecided voters falls by one per cent, there is a 0.5% increase in the Remain vote and a smaller 0.43% increase in the Leave vote. This is perhaps due to a status quo bias, caused by people's dislike of change and uncertainty.
Second, while general economy-related arguments help the Remain side and show no effect on Leave, potential voters do not appear to be interested in the specifics of this in terms of EU trade issues or indeed the Single Market. Third, Table 2 shows that the Leave camp benefits, and only benefits, from immigration issues, while the Remain camp did badly throughout the campaign on this topic. Fourth, it is striking to find that Boris Johnson does not appear with statistical significance on either side of the debate, while David Cameron's interventions appear to have a negative effect on the Remain vote and a positive impact on Leave. In terms of other political figures, Jeremy Corbyn has been accused of a lacklustre performance throughout the referendum campaign and that is borne out here, as the Labour Party does not have any significant loadings in the statistical model.

What is the Big Data framework?
- The Big Data Topic Retrieved, Uncovered and Structurally Tested (TRUST) framework was previously used for predicting the outcomes of the Scottish referendum and the 2015 General Election
- Supplements daily polling data with information on relevant online searches by voters
- Uses a statistical model that identifies momentum effects and reversals, and potential factors that influence tactical voting

David Cameron's interventions appear to have a negative effect on the Remain vote and a positive impact on Leave

Table 2: Factors influencing Remain and Leave voters, 15 April-18 June 2016

| Factor | Remain Voting Intention | Leave Voting Intention |
|---|---|---|
| Undecided Voter | -0.50 | -0.43 |
| UK Economy | 0.01 | |
| EU Trade | | |
| Single Market | | |
| EU Immigration | -0.06 | 0.07 |
| David Cameron | -0.07 | 0.05 |
| Boris Johnson | | |
| Labour Party | | |

Finally, we use our statistical model to calculate the predicted outcomes for the referendum, reported in Table 3. They show that Remain will have a clear win in the referendum, with a mean poll of 48% against Leave's 44%. Allowing for our calculated swing ratios, noted in the second row of the table, confirms that even taking account of undecided voters Leave cannot win the referendum, as shown in the Final Rate Range and Final Mean Rate rows.

Table 3: Projecting referendum voting results

| | Remain | Leave |
|---|---|---|
| Mean Voting Intention Rate | 48.4% | 45% |
| Swing Votes Range | 0-3.4% | 0-2.9% |
| Final Rate Range | 50.1%-53.6% | 46.4%-49.7% |
| Final Mean Rate | 51.9% | 48.1% |
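
As a rough reading aid for Tables 2 and 3, the arithmetic behind the swing coefficients and the final rates can be sketched in a few lines of Python. This is a minimal sketch, not the authors' model: the helper names `swing_gain` and `midpoint` are hypothetical, the numbers are copied from the tables, and the sketch assumes (the report does not say so explicitly) that the undecided-voter coefficients can be applied linearly. It also only observes that the midpoints of the published Final Rate Range appear to agree with the Final Mean Rate to within rounding.

```python
# Illustrative sketch only: values are copied from Tables 2 and 3 of the
# report; the helper names are hypothetical and not the authors' code.

# Table 2: coefficients on the undecided-voter share. The negative sign means
# a fall in the undecided share goes with a rise in that side's intention.
UNDECIDED_COEF = {"Remain": -0.50, "Leave": -0.43}

# Table 3: published Final Rate Range in percent, as (low, high).
FINAL_RATE_RANGE = {"Remain": (50.1, 53.6), "Leave": (46.4, 49.7)}


def swing_gain(side: str, undecided_drop: float) -> float:
    """Gain for `side` when the undecided share falls by `undecided_drop`.

    Assumes the Table 2 coefficient applies linearly (an assumption).
    """
    return -UNDECIDED_COEF[side] * undecided_drop


def midpoint(low: float, high: float) -> float:
    """Midpoint of a (low, high) range."""
    return (low + high) / 2


if __name__ == "__main__":
    for side in ("Remain", "Leave"):
        low, high = FINAL_RATE_RANGE[side]
        print(
            f"{side}: gain per 1-point fall in undecided = "
            f"{swing_gain(side, 1.0):.2f}; "
            f"midpoint of final range = {midpoint(low, high):.2f}"
        )
```

Running the script prints, for each side, the gain implied by a one-point fall in the undecided share and the midpoint of the published range, which can then be compared by eye with the Final Mean Rate row of Table 3. It does not attempt to reproduce the underlying statistical model or the Swing Votes Range itself.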