The Choice is Yours: Comparing Alternative Likely Voter Models within Probability and Non-Probability Samples

By Robert Benford, Randall K. Thomas, Jennifer Agiesta, Emily Swanson

Likely voter models often improve election predictions for both voter turnout and vote choice. Successful modeling typically combines several measures to estimate registered voters, voter turnout, and vote outcome. A number of factors have made likely voter modeling more difficult, including the broader use of early voting and changes in sampling and data collection mode. In October 2013 the AP-GfK Poll moved from dual-frame RDD telephone surveys to an online protocol using KnowledgePanel, the largest U.S. probability-based online panel, enabling rapid and detailed national polling. Though KnowledgePanel can be used for national projections, a key interest is prediction of voting outcomes by state, where KnowledgePanel can fall short. As such, GfK and The Associated Press (AP) have examined how larger, demographically balanced nonprobability (opt-in) samples could supplement probability-based (KnowledgePanel) samples through a calibration methodology. To study this, we selected two states with diverse populations: one in the Midwest often favoring Democrats (Illinois) and one in the South often favoring Republicans (Georgia). Each state had both Senatorial and Gubernatorial races on the ballot. In each state, two parallel surveys with about 800 KnowledgePanel and 1,600 opt-in respondents were administered immediately prior to the elections. Respondents in each sample were randomly assigned to one of two alternative sets of likely voter items: either the AP-GfK Poll's standard likely voter set or an alternate set driven mainly by stated intention to vote. We report estimates of registered voters, turnout, and election results by likely voter model; how each model can be optimized; and comparisons of estimates from the KnowledgePanel and opt-in samples separately.
While both models predicted well, the revised model used fewer variables. In addition, calibrating opt-in samples to probability samples can improve the utility of opt-in polling samples.

Introduction

As with the entire market and survey research industry, polling faces challenges that continue to erode the fully probabilistic, high-response-rate methods that have historically produced quality estimates with calculable precision. Probability-based samples of all varieties carry unknown levels of imperfection due to coverage error and nonresponse error, with response rates now often in the low teens or even single digits. Attempts to overcome these sources of potential error come at high cost and require extensive effort, and they rarely eradicate the errors. Online samples are one cost-effective alternative, and the cost-quality tradeoff is a main reason survey and market researchers have experimented with their use. Essentially, online samples come in two varieties: opt-in or probability-based. Opt-in samples can further be thought of as community-based (mostly panels) or intercept approaches (mostly river), with a great deal of variation in these approaches by sample provider. GfK uses both types of samples depending on a project's budget and fitness for use. When an opt-in sample is selected as the best match for a survey, GfK uses routing technology provided by Fulcrum to manage a large number of opt-in providers, on the theory that more is better and robustness can overcome many issues. GfK's KnowledgePanel is a probability-based online sample with over 50,000 members and is primarily used when fitness for use mandates it. However, probability-based samples are by nature more expensive to recruit, empanel, and maintain, leading to frames
of higher cost and of a more modest size nationally. At times, combinations of these sample types are indicated because low-incidence populations or other constraints, such as geography, make it unfeasible to use one type of sample on its own. GfK's national polling for the Associated Press is conducted using KnowledgePanel. The AP's survey standards allow publication of online polls only if they are conducted using panels selected with probability-based methods. Regardless of online sample source, surveys attempting to represent narrower geographic areas, such as statewide surveys, can limit the amount of sample available for analysis. For example, a client such as the AP might want to conduct a political survey in a specific state to assess the horse race in a statewide election or tell the story of political issues in that state. Often, subgroup analyses by party, sex, or race are important, so sample sizes must be greater than a single online source can provide. KnowledgePanel covers every state in the U.S., but proportionally leaves some states with less than desirable case counts for surveys with a smaller geographic coverage area. Opt-in samples can be a cost-effective way to supplement KnowledgePanel, particularly when they align well through weighting or calibration techniques. This leads to the question of quality by sample source and of how the two can work together to produce quality survey estimates.

Methodology

To address these questions, GfK, in coordination with the Associated Press, carried out two surveys, one in Georgia and one in Illinois, in which essentially the entire KnowledgePanel sample in each state was used. These samples were supplemented with opt-in panel sources via the Fulcrum router, managed by GfK.
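The general calibration idea at work here, adjusting opt-in respondents so their weighted margins match targets taken from the probability-based sample, can be sketched as raking (iterative proportional fitting). The variables and target margins below are hypothetical placeholders, not the actual study targets, and GfK's production procedure is more involved:

```python
def rake(rows, targets, max_iter=100, tol=1e-6):
    """Iterative proportional fitting: adjust unit weights until the
    weighted margin of each variable matches its target proportions.

    rows:    list of dicts, e.g. {'sex': 'F', 'early_adopter': 'yes'}
    targets: {variable: {category: target_proportion}}; the proportions
             within each variable sum to 1.
    """
    weights = [1.0] * len(rows)
    for _ in range(max_iter):
        max_shift = 0.0
        for var, margin in targets.items():
            total = sum(weights)
            # Multiplicative factor per category: target share / current share
            factors = {}
            for category, target_share in margin.items():
                current = sum(w for w, r in zip(weights, rows)
                              if r[var] == category) / total
                factors[category] = target_share / current if current > 0 else 1.0
            for i, r in enumerate(rows):
                weights[i] *= factors[r[var]]
            max_shift = max(max_shift, *(abs(f - 1.0) for f in factors.values()))
        if max_shift < tol:   # all margins already on target
            break
    return weights
```

In practice the targets for a calibrated opt-in sample would be the weighted KnowledgePanel distributions of the demographic and attitudinal variables, rather than fixed population benchmarks.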
To mimic the population in each state, an interlocking quota design matched survey respondents to the state distribution of sex by age (18-29, 30-49, 50-64, 65+) by race (Black/AA, All Other) by educational attainment (Some college or less, College grad or higher), for a 32-cell design.1 These demographics were also recorded for KnowledgePanel respondents in each state. In addition, opt-in respondents were asked five early adopter questions that are already asked of and available for the KnowledgePanel sample. Three weights were computed: adjusting only KnowledgePanel, adjusting only opt-in, and calibrating opt-in to KnowledgePanel via demographics and the early adopter questions. This research also assessed two different likely voter models in each state. Cases from each sample source were randomly assigned to either the standard likely voter model or a more direct stated-intention-to-vote method. The stated intention model is based on registered voters and includes those who have already voted or say they will definitely vote, as well as those who say they probably will vote and report always or nearly always voting in elections. The stated intention model is based on three survey questions. The standard model is also based on registered voters and uses a complex set of definitions that includes past vote frequency, past voting behavior, whether they have already voted, likelihood to vote, interest in news about the election, and knowing where to vote. This model requires eight survey questions and four different patterns of survey answers to define a likely voter, and it is very similar to what others in the polling sector use. Within each sample type, sample was randomly assigned to a likely voter model. Sample sizes

1 State benchmarks are from the American Community Survey three-year averages, 2011-2013.
for each model and sample by state are shown in Table 1. To control for consistency between these two models, and specific to this research, the weighting described above was completed within model. Prior to analysis, weighted data were compared to assess the outcome of the random assignment and to ensure that important covariates of election outcomes, such as party identification, were equitable. In Illinois, the demographically weighted outcomes were not equitable between models on party identification. It is not GfK's or the AP's standard practice to include party identification in weighting, given the known variability of this variable. To make the models equitable, the initial weighted estimate of party identification was used as an additional weighting variable within each model.

Table 1: Sample Sizes by State, Sample Source, and Model

        KnowledgePanel        Opt-In
      Standard   Intent   Standard   Intent
GA         333      321        800      759
IL         494      523        875      877

Results

For analytic purposes there are two states, each with two models, each comprising three types of sample: KnowledgePanel only, opt-in only, and both combined through calibration. Statistical significance is determined at the 95% confidence level using a t-test of proportions and effective sample sizes to account for variability due to weights. It should also be noted that while estimates are tested against parameters, it is also meaningful to assess the absolute differences in estimates by sample type and model. Throughout the findings there are essentially thirty-two estimates across the two types of sample. Each is discussed and then summarized.

Registered Voters

As a prerequisite to voting in most U.S. states, including both Georgia and Illinois, one must be registered to vote, which makes voter registration the root of most likely voter models. Table 2 shows estimates for each sample type. KnowledgePanel estimates of registered voters across models and states are always within statistical tolerance. Opt-in estimates are the most distant from the actual percentages of registered voters.
Interestingly, in Illinois, the calibrated estimate is closer to the actual share of registered voters than KnowledgePanel.

Table 2: Registered to Vote

               Actual   KnowledgePanel   Opt-in   Calibrated
GA Reg Voter     77.0             77.0   81.8**       80.2**
IL Reg Voter     83.3             81.3   85.5**       84.0

** Estimate significantly different from the parameter at 95% confidence.

Turnout

The essence of any likely voter model is to predict the population that will actually cast votes on Election Day. Turnout, operationalized as those registered voters modeled as likely to vote, is
an estimate that is nearly always overstated by likely voter models. All estimates are statistically significantly higher than actual turnout among registered voters (Table 3). This could be because those who participate in political surveys are more likely to be interested in politics to begin with, because of overstatement of vote intention, or because of some combination of the two. KnowledgePanel was closest to actual turnout in three out of four cases, followed by the calibrated estimates, then opt-in. In one case, the Georgia standard model, the calibrated and opt-in samples were closer than KnowledgePanel. Overall, the standard model overstated turnout less than the stated intention model, owing to the additional detail it asks for to winnow down the likely voter pool. However, a closer turnout estimate does not guarantee that the right mix of voters who turn out is predicted.

Table 3: Turnout

                      Standard Model                Stated Intent Model
            Actual      KP   Opt-in  Calibrated      KP   Opt-in  Calibrated
GA Turnout    50.0  68.1**   64.3**      64.3**  73.9**   78.1**      74.9**
IL Turnout    49.2  64.0**   69.2**      66.7**  68.9**   77.2**      74.7**

KP = KnowledgePanel. ** Estimate significantly different from the parameter at 95% confidence.

Election Results

With the exception of the Illinois Governor results in the standard model, KnowledgePanel was always directionally correct in estimating the elections tested in the surveys and never significantly different from the actual results for each candidate (Table 4). The calibrated results performed similarly, but missed the Illinois Governor's race in both models. Opt-in sample missed the Illinois Governor's race in both models as well as the Georgia Senate race in the standard likely voter model.
Table 4: Election Results

                      Standard Model                Stated Intent Model
            Actual      KP   Opt-in  Calibrated      KP   Opt-in  Calibrated
IL Senate
  Durbin      53.5    52.2     56.4        55.4    53.5     53.9        54.6
  Oberweis    42.7    45.6     38.5        39.8    42.6     39.2        38.8
IL Governor
  Quinn       46.3    48.7     49.7        49.4    44.1     48.7        47.6
  Rauner      50.3    48.1   44.5**      45.8**    50.2   45.9**        47.0
GA Senate
  Nunn        45.2    45.2     46.5        45.5    42.2     44.3        43.8
  Perdue      52.9    52.0   45.5**        47.2    49.7     51.0        50.2
GA Governor
  Carter      44.9    38.9     44.4        42.3    41.5     42.9        41.9
  Deal        52.8    54.4     47.8        49.8    49.8     50.9        50.1

KP = KnowledgePanel. Democratic candidate always shown first; third-party candidates not shown. ** Estimate significantly different from the parameter at 95% confidence.
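The significance flags in Tables 2 through 5 reflect tests of each weighted estimate against the known parameter, with the effective sample size compensating for the extra variance introduced by weighting. A minimal sketch, assuming a Kish effective sample size and a normal approximation to the t-test of proportions described above; the function names and inputs are illustrative, not the study's actual code:

```python
import math

def kish_effective_n(weights):
    """Kish effective sample size: (sum of weights)^2 / (sum of squared weights)."""
    return sum(weights) ** 2 / sum(w * w for w in weights)

def differs_from_parameter(p_hat, p0, weights, crit=1.96):
    """Two-sided test of a weighted proportion p_hat against a known
    parameter p0 at 95% confidence (crit ~ 1.96 for a large effective n)."""
    n_eff = kish_effective_n(weights)
    se = math.sqrt(p0 * (1.0 - p0) / n_eff)
    return abs(p_hat - p0) / se > crit
```

With equal weights the effective n equals the nominal n; the more the weights vary, the smaller the effective n, and the wider the tolerance around the parameter before an estimate is flagged.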
Table 5 shows the predicted margin of victory as the percentage for the Democratic candidate minus the percentage for the Republican candidate; that is, a positive number is a margin in favor of the Democrat, and a negative number a margin in favor of the Republican. This margin is often critical to calling a race or predicting a winner based on survey estimates. Again, with the exception of the Illinois Governor's race in the standard likely voter model, which would have been deemed too close to call, surveys drawn from KnowledgePanel would likely have resulted in directionally correct race calls. The calibrated sample was wrong in both models for the Illinois Governor's race and, in the Georgia Senate race under the standard model, too close to call but directionally correct. Opt-in sample estimates were wrong in both models for the Illinois Governor's race and wrong for the Georgia Senate race in the standard model.

Table 5: Dem-Rep Margin

                      Standard Model                Stated Intent Model
            Actual       KP   Opt-in  Calibrated      KP   Opt-in  Calibrated
IL Senate     10.8      6.6   17.9**      15.6**    10.9   14.7**      15.7**
IL Governor   -4.0      0.6    5.2**       3.6**    -6.1    2.8**       0.5**
GA Senate     -7.7     -6.8    1.0**      -1.6**    -7.5     -6.6        -6.4
GA Governor   -7.9  -15.5**     -3.4      -7.4**    -8.3     -8.0        -8.2

KP = KnowledgePanel. ** Estimate significantly different from the parameter at 95% confidence.

To assess the surveys' ability to generate a sample whose demographic traits match those of the overall electorate, we compared the survey results by model with the National Election Pool exit poll estimates of sex, race, Hispanic origin, age, and education level. We were also able to examine the actual share of the electorate by gender and race based on figures released by the Georgia Secretary of State. Tables 6 and 7 show weighted demographics among likely voters in each model in each state, broken down by sample source.
Table 6: Georgia Demographic Comparison

                     KP Only           KP + Opt-in
                Standard  Stated   Standard  Stated   Exit poll   Secretary of State
Men                   44      49         47      48          48                   45
Women                 56      51         53      52          52                   55
18-29                  6      15         13      15          10                   NA
30-44                 35      35         35      36          27                   NA
45-59                 35      28         30      28          34                   NA
60+                   25      22         22      21          29                   NA
White alone           58      66         64      68          65                   64
Black alone           33      28         30      27          29                   29
Hispanic origin        8       2          6       4           4                    1
HS or less            39      39         34      38          18                   NA
Some college          32      32         34      32          28                   NA
College grad          29      29         32      31          54                   NA

Table 7: Illinois Demographic Comparison

                     KP Only           KP + Opt-in
                Standard  Stated   Standard  Stated   Exit poll
Men                   48      43         49      47          50
Women                 52      57         51      53          50
18-29                  8      13         13      14          11
30-44                 27      32         32      32          23
45-59                 36      31         31      31          37
60+                   28      23         25      23          29
White alone           75      76         76      77          75
Black alone           19      13         16      14          16
Hispanic origin        4       6         10       9           6
HS or less            31      36         33      32          19
Some college          33      31         31      33          30
College grad          37      34         36      35          51

Likely Voter Models

Estimates of candidate vote percentage show very few significant differences by sample within model. From the perspective of the margin of victory, there are more significant differences, but a good deal of directional consistency; that is, in a majority of cases, the call would have been correct. Significance aside, comparing the models across estimates by sample, the stated intention model was closer to the actual results than the standard model 70% of the time. This suggests that the stated intention model (fewer questions) may work well as a substitute for the standard model (more questions).

Conclusions

It seems clear that probabilistic online samples such as KnowledgePanel are the better choice when budget and the number of panelists available make that choice feasible. When geographic or other constraints limit sample availability, supplementing these samples with online opt-in samples can work well in estimating election outcomes. However, several details are important in doing so. Bayesian statisticians argue that knowledge of the posterior distribution can help align samples so that they are unbiased. This is not dissimilar to weighting, but it extends beyond geodemographics. Care needs to be taken when opt-in samples are designed not only to mimic
the geodemographics but also to attend to other dimensions in aligning samples to these posteriors. In this research, attitudes toward early adoption were used, and this steered the opt-in samples toward accuracy. This suggests that as research practices continue to change and evolve, standard or typical weighting practices will need to become more creative, aggressive, and at times heroic in nature. These efforts will more likely than not come at the expense of greater variability due to weighting, but will be deemed necessary for precision in population estimates. Last, for the likely voter models tested here, results are inconclusive based on statistical significance; that is, there is no clear statistical winner. Even though the standard model gets closer to turnout among registered voters, the stated intention model performs equally well when election outcomes are estimated. Thus, given this outcome, one may opt to save questionnaire space and take the more direct stated intention approach. That approach, coupled with appropriate weighting, can produce the reliable estimates necessary, even when opt-in samples are used alone or calibrated. The choice is yours.
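As a closing illustration, the stated intention screen described in the Methodology reduces to a few lines of classification logic. The response codings below are hypothetical stand-ins for the actual questionnaire wording:

```python
def stated_intent_likely_voter(registered, already_voted, intent, past_vote):
    """Classify a respondent under the stated intention model described above.

    Hypothetical codings (the real instrument's wording differs):
      intent:    'definitely', 'probably', 'might', 'will_not'
      past_vote: 'always', 'nearly_always', 'part_of_time', 'seldom', 'never'
    """
    if not registered:
        return False          # the model is built on registered voters
    if already_voted or intent == 'definitely':
        return True           # ballots already cast, or firm intenders
    # 'Probably' voters qualify only with a consistent voting history
    return intent == 'probably' and past_vote in ('always', 'nearly_always')
```

Three inputs beyond registration, three questions on the instrument: the brevity is exactly the questionnaire-space saving the conclusion weighs against the standard model's eight-question battery.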