VoteCastr methodology


Introduction

Going into Election Day, we will have a fairly good idea of which candidate would win each state if everyone voted. However, not everyone votes. The levels of enthusiasm among the candidates' respective bases and the strength of their turnout operations will have a huge impact on the final outcome.

Traditionally, turnout rates are unknown until after the polls close. This year, however, VoteCastr will be tracking turnout in battleground states throughout Election Day. This information, combined with microtargeting models and early vote reports, will allow us to make predictions about who is winning or losing the turnout battle. (Terms in italics are defined in the terminology appendix.)

The process can be thought of as similar to solving a Sudoku or a crossword puzzle. Each piece of the puzzle that we fill in helps us fill in others:

- Turnout observations during the day allow us to predict end-of-day turnout.
- Predicted end-of-day turnout in precincts with turnout observations allows us to calculate the average turnout rate in different categories and subcategories of precincts.
- The average turnout rates in each of the categories and subcategories of precincts allow us to calculate the extrapolated end-of-day turnout in precincts where we do not have turnout observations.
- The extrapolated end-of-day turnout, combined with candidate support microtargeting models, allows us to calculate each precinct's extrapolated vote for Clinton, Trump, Johnson, and Stein, based on that turnout level.
- The extrapolated votes will be aggregated to the state level to determine which candidate has the advantage in terms of turnout.
Methodology Stages

- Microtargeting models
- Validating the microtargeting models
- Early vote tracking
- Precinct turnout reports
- Reasons why a field worker might not be able to get turnout reports from their assigned precinct
- Projected end-of-day turnout
- Expected vote
- Projected end-of-day turnout as a percent of expected vote
- Category and subcategory average turnout as a percent of expected
- Extrapolated end-of-day turnout
- Expected Clinton, Trump, Stein, and Johnson vote based on extrapolated turnout
- Summary statistics

Microtargeting models

Microtargeting works by taking information known about a sample of voters, combining it with demographics and commercial marketing data, and using that information to build statistical or machine learning models that then predict that information about every other voter. In this case, the information we have about a sample of voters will come from telephone surveys and past turnout information.

In the telephone surveys, we will ask a random sample of 10,000 voters how likely it is that they will vote and whom they are supporting for president and for U.S. Senate. The bulk of the surveys will be done through automated calls to landlines, but a smaller sample of live calls will be made to cellphones in order to reach the necessary number of younger, cellphone-only voters. We will use these survey responses to build models that predict how voters who were not called would have answered the survey had we been able to reach them.

There are a large number of algorithms that we typically use for modeling projects. In this case, we expect to use a blend of penalized logistic regression and random forests. These models will then be scored on the full voter file. That means that every voter will get a number between 0 and 100 giving the percent likelihood that he or she will support each of the top four candidates.

The process for modeling turnout is similar, although in addition to self-reported turnout likelihood from the phone surveys, we will also use past turnout history from the 2012 election. We use 2012 because that was the last presidential election, and turnout is traditionally higher in presidential years than in off-year elections like 2014. These vote-history-based models give us predictions of how likely it is that someone would have voted in 2012 had they been eligible. Using these models, we will have predicted turnout likelihood for all voters, even those who had not yet turned 18 in 2012 or who had not moved into the state until after that election.
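As a rough illustration of what a "blend" of two models producing a 0 to 100 score could look like, the sketch below averages two predicted probabilities for one voter and one candidate. The function name, the equal default weighting, and the rounding are assumptions made for this example; the text does not say how the two models are actually combined.

```python
def blend_scores(logit_prob, forest_prob, logit_weight=0.5):
    """Blend two model probabilities (each 0-1) into a 0-100 score.

    logit_prob   -- probability from a penalized logistic regression (assumed input)
    forest_prob  -- probability from a random forest (assumed input)
    logit_weight -- hypothetical blending weight; 0.5 means a simple average
    """
    if not 0.0 <= logit_weight <= 1.0:
        raise ValueError("logit_weight must be between 0 and 1")
    for p in (logit_prob, forest_prob):
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must be between 0 and 1")
    blended = logit_weight * logit_prob + (1.0 - logit_weight) * forest_prob
    # Scores are expressed as a percent likelihood, as in the text.
    return round(100.0 * blended, 1)


# One voter's hypothetical Clinton-support probabilities from each model.
clinton_score = blend_scores(0.6, 0.8)
```

Scoring the full voter file would amount to applying such a function to every voter for each of the four candidates and for turnout.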
Validating the microtargeting models

Both the candidate support models and the turnout models are built using two-thirds of the available survey or vote history data. The other one-third, selected at random, is used as a test set and is not used in the construction of the models. Once the models have been built and scored, the model predictions are compared to the actual test-set survey responses and vote history to make sure that the models accurately predict the behavior of voters whose responses were not used to build them.

We use a large number of model validation metrics, including the F1 score and area under the ROC curve. As a more easily understood metric, we will also calculate the expected margin of error for the average-sized precinct in each state. To do this, we will randomly select samples of test-set IDs equal to the number of voters in the average-sized precinct, then compare the actual test-set responses to the candidate support model predictions for each of these groups. We will run 1,000 simulations per state to calculate these margins of error.

Early vote tracking

Local election officials collect and report information about who has voted early. This information is compiled by the voter-file vendor L2 and provided to us the weekend before the election. We will know who has already voted, and from the microtargeting models we will know for whom they most likely voted. Because we know who has voted early, we will remove those voters from the pool of potential Election Day voters in each precinct.
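The precinct-sized margin-of-error simulation described under "Validating the microtargeting models" can be sketched roughly as follows. The data layout (pairs of actual response and model score), the function name, and the choice of the 95th percentile of absolute error as the "margin of error" are all assumptions for illustration; the text specifies only the sample size and the 1,000 simulations.

```python
import random


def precinct_margin_of_error(test_set, precinct_size, n_sims=1000, seed=0):
    """Simulate precinct-sized draws from the test set.

    test_set      -- list of (actual, score) pairs, where actual is 1 if the
                     respondent supports the candidate (else 0) and score is
                     the model's 0-100 support score (assumed layout)
    precinct_size -- number of voters in the average-sized precinct
    Returns the 95th percentile of |actual % - predicted %| across draws
    (an assumed definition of the margin of error).
    """
    rng = random.Random(seed)
    abs_errors = []
    for _ in range(n_sims):
        sample = rng.sample(test_set, precinct_size)  # without replacement
        actual_pct = 100.0 * sum(actual for actual, _ in sample) / precinct_size
        predicted_pct = sum(score for _, score in sample) / precinct_size
        abs_errors.append(abs(actual_pct - predicted_pct))
    abs_errors.sort()
    index = min(len(abs_errors) - 1, int(0.95 * len(abs_errors)))
    return abs_errors[index]


# Hypothetical test set: 300 supporters scored 70, 300 non-supporters scored 30.
toy_test_set = [(1, 70.0)] * 300 + [(0, 30.0)] * 300
toy_margin = precinct_margin_of_error(toy_test_set, precinct_size=200)
```

A smaller returned value means the summed scores track the actual precinct-level support more closely for a precinct of that size.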

Precinct turnout reports

Field workers are assigned specific precincts to monitor. These precincts are selected so as to give us coverage of base Clinton and Trump areas, key demographic groups, and major geographies in each state. If for some reason a field worker is not able to get turnout reports for their assigned precinct, they will have been given backup precincts with similar demographics. (See below for possible reasons why a field worker might not be able to obtain reports from their assigned precinct.)

Field workers collect total turnout frequently throughout the day and report those numbers via an automated touch-tone reporting system. That information is then fed in real time to the VoteCastr team.

Reasons why a field worker might not be able to get turnout reports from their assigned precinct

There are a number of reasons why reliable reports might not be available from a specific precinct.

- Poll workers refuse to allow field workers access. In theory, field workers have the right to observe polling locations, but rather than spend time on legal challenges, a field worker who is barred from a polling location will be directed to move on to their backup precinct.
- Poll workers are too busy to provide turnout numbers. Poll workers may be willing to have our field workers at their polling location, but if lines of voters are long, they might not be willing to assist with reporting the number of votes cast. In these cases, field workers will be instructed to first attempt to get a voter to report his or her voter number, and failing that, to move on to a backup precinct.
- Precinct has multiple lines, each with its own number range. A precinct might have one line for voters with last names A through M using voter numbers 1 through 1,000, and a second line for voters N through Z using numbers 1,001 through 2,000. In this case, getting the voter number from a voter in either line will give a misleading turnout number.
If the poll workers are helpful and competent, they may be able to determine the actual number of votes cast, but if not, the field worker will be instructed to move on to a backup precinct.

Projected end-of-day turnout

When we get turnout observations, they will include a time stamp telling us when the observation was collected. For each precinct in a state, we will know the poll opening and closing times. This will allow us to calculate the percentage of the voting day that has elapsed at the time of the observation. We will use this percentage to calculate the projected end-of-day turnout. For example, if a turnout observation was collected 10 percent of the way into the voting day, then we would multiply that observation by 10 to calculate the projected end-of-day turnout.

NOTE: The rate of voting is not actually steady throughout the day. Generally speaking, there are surges of turnout before 9 a.m., over the lunch hour, and after 5 p.m. However, those patterns are less distinct in areas with large numbers of unemployed voters, students, retirees, or people working nontraditional hours. There is not enough historical data to definitively define the turnout pattern in each precinct, and making assumptions, such as saying that Clinton-supporting student areas or Trump-supporting areas with large numbers of unemployed white voters will turn out in a certain way, risks skewing the results. So, in the interest of transparency, we are treating all precincts alike. This will probably lead to apparently high turnout first thing in the morning that then begins to taper off as more post-9 a.m. reports come in.

Expected vote

In each precinct, we will have calculated an expected vote. This is based on the turnout scores and early vote numbers. Voters who have voted early will be suppressed from the voter file for each precinct, then the turnout scores divided by 100 will be summed. The turnout scores will have been adjusted to match the expected statewide turnout. Expected statewide turnout is calculated based on 2012 turnout adjusted for population growth.

Projected end-of-day turnout as a percent of expected vote

Once we have calculated the projected end-of-day vote, we will be able to calculate the projected end-of-day turnout as a percentage of the precinct's expected vote in each precinct with turnout observations.

Defining precinct categories and subcategories

Each precinct in a state will be defined as belonging to a broad category, and then to a subcategory.

Categories:

- Base Clinton: Clinton expected to get 60%+ of the 2-way vote (Clinton 2-way >= 60)
- Lean Clinton: Clinton expected to get between 55% and 60% of the 2-way vote (Clinton 2-way >= 55 and < 60)
- Swing: Clinton expected to get between 45% and 55% of the 2-way vote (Clinton 2-way >= 45 and < 55)
- Lean Trump: Trump expected to get between 55% and 60% of the 2-way vote (Trump 2-way >= 55 and < 60)
- Base Trump: Trump expected to get 60%+ of the 2-way vote (Trump 2-way >= 60)

Subcategories:

Each category is subdivided based on race. The racial categories are:

- Majority African American
- Majority Hispanic
- Majority White
- Mixed-race (no one group > 50 percent)
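The calculations in the preceding sections (time-based projection, expected vote, turnout as a percent of expected, and precinct categories) can be sketched as below. All function names and data layouts are assumptions made for illustration. The sketch omits the adjustment of turnout scores to the expected statewide turnout, and it resolves the one overlapping boundary in the category definitions (Clinton at exactly 45 percent) in favor of Lean Trump, which is also an assumption.

```python
from datetime import datetime


def projected_end_of_day(observed, observed_at, polls_open, polls_close):
    """Scale an observation by the fraction of the voting day elapsed."""
    elapsed = (observed_at - polls_open).total_seconds()
    total = (polls_close - polls_open).total_seconds()
    if elapsed <= 0 or total <= 0:
        raise ValueError("observation must fall after polls open")
    fraction = min(elapsed / total, 1.0)
    return observed / fraction


def expected_vote(turnout_scores, early_voter_ids):
    """Sum turnout scores / 100 over voters who have not voted early.

    turnout_scores -- {voter_id: 0-100 turnout score} (assumed layout)
    """
    remaining = (score for voter_id, score in turnout_scores.items()
                 if voter_id not in early_voter_ids)
    return sum(remaining) / 100.0


def pct_of_expected(projected, expected):
    """Projected end-of-day turnout as a percent of the expected vote."""
    return 100.0 * projected / expected


def precinct_category(clinton_two_way_pct):
    """Map Clinton's expected share of the 2-way vote to a category."""
    trump_two_way_pct = 100.0 - clinton_two_way_pct
    if clinton_two_way_pct >= 60:
        return "Base Clinton"
    if clinton_two_way_pct >= 55:
        return "Lean Clinton"
    if trump_two_way_pct >= 60:
        return "Base Trump"
    if trump_two_way_pct >= 55:  # includes Clinton at exactly 45 (assumption)
        return "Lean Trump"
    return "Swing"


def racial_type(shares):
    """shares -- {group name: percent of the precinct} (assumed layout)."""
    for group in ("African American", "Hispanic", "White"):
        if shares.get(group, 0.0) > 50:
            return "Majority " + group
    return "Mixed-race"


# Hypothetical precinct: polls open 7 a.m. to 7 p.m., 50 votes seen at 8:12 a.m.
day = dict(year=2016, month=11, day=8)
projection = projected_end_of_day(
    50,
    datetime(hour=8, minute=12, **day),
    datetime(hour=7, **day),
    datetime(hour=19, **day),
)
```

In this hypothetical case the observation comes 10 percent of the way into a 12-hour voting day, so the projection is roughly ten times the observed count, matching the example in the text.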

There are 20 possible subcategories (5 categories times 4 racial types), beginning with:

- Base Clinton, majority African American
- Base Clinton, majority Hispanic
- Base Clinton, majority White
- Base Clinton, mixed-race
- Lean Clinton, majority African American

Not all subcategories will exist in every state. For example, it is extremely unlikely that there will be any base Trump, majority African American precincts.

Category and subcategory average turnout as a percent of expected

Whenever a precinct's turnout as a percent of expected is updated, we will calculate the average turnout as a percent of expected for all precincts in that category and for all precincts in the subcategory. We will also record the total number of precincts reporting turnout for each category and subcategory.

Extrapolated end-of-day turnout

For precincts where we do not have turnout observations, we will use the average from other precincts in the same category or subcategory to fill in the extrapolated end-of-day turnout. Each precinct belongs to one subcategory and one category. We will attempt to calculate its extrapolated end-of-day turnout first using the subcategory, if five or more precincts in that subcategory have reported turnout. Otherwise, we will use the average from the precinct category. Once we have calculated the extrapolated end-of-day turnout as a percent of expected for each precinct, we will multiply that number, divided by 100, by the precinct's expected vote in order to calculate the extrapolated end-of-day turnout.

Expected Clinton, Trump, Stein, and Johnson vote based on extrapolated turnout

We will have microtargeting models giving the likelihood of each voter in each precinct supporting Clinton, Trump, Johnson, or Stein. If all voters in a precinct turned out, the vote for each of those candidates would be the sum of their candidate support scores divided by 100. However, we know that not all voters are going to vote, and furthermore that not all voters are equally likely to turn out.
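Stepping back to the rule given under "Extrapolated end-of-day turnout" (use the subcategory average when five or more of its precincts have reported, otherwise the category average), a minimal sketch might look like this. The dictionary keys, function names, and the choice to return None when no precinct in the category has reported are assumptions for illustration.

```python
from collections import defaultdict

MIN_SUBCATEGORY_REPORTS = 5  # threshold stated in the text


def group_averages(reports, key):
    """Return {group: (mean pct_of_expected, number of reporting precincts)}."""
    totals = defaultdict(lambda: [0.0, 0])
    for report in reports:
        totals[report[key]][0] += report["pct_of_expected"]
        totals[report[key]][1] += 1
    return {group: (total / count, count)
            for group, (total, count) in totals.items()}


def extrapolate_turnout(precinct, reports):
    """Extrapolated end-of-day turnout for a precinct with no observations.

    precinct -- dict with "category", "subcategory", "expected_vote"
    reports  -- dicts with "category", "subcategory", "pct_of_expected"
                for precincts that do have observations (assumed layout)
    """
    by_subcategory = group_averages(reports, "subcategory")
    by_category = group_averages(reports, "category")
    sub = by_subcategory.get(precinct["subcategory"])
    if sub is not None and sub[1] >= MIN_SUBCATEGORY_REPORTS:
        average_pct = sub[0]
    elif precinct["category"] in by_category:
        average_pct = by_category[precinct["category"]][0]
    else:
        return None  # nothing reported yet in this category (assumed handling)
    return precinct["expected_vote"] * average_pct / 100.0


# Hypothetical reports: five Swing/White precincts and one Swing/Hispanic precinct.
sample_reports = (
    [{"category": "Swing", "subcategory": "Swing, majority White",
      "pct_of_expected": 110.0}] * 5
    + [{"category": "Swing", "subcategory": "Swing, majority Hispanic",
        "pct_of_expected": 80.0}]
)
```

With these hypothetical reports, a Swing, majority White precinct would use its subcategory average, while a Swing, majority Hispanic precinct would fall back to the overall Swing average because only one precinct in its subcategory has reported.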
In each precinct we can rank the voters from most to least likely to vote. The candidate support scores for the most likely voter will likely be different from those for the second most likely voter, and so on down through the least likely voter. Prior to the election, we will calculate the 2-way Clinton vs. Trump support scores at each turnout level, from one vote through every voter in the precinct turning out. Once that is done, we will run a linear regression model predicting Clinton and Trump support based on the percent of registered voters turning out. The formula for each precinct will be saved and used on Election Day to calculate the expected Clinton, Trump, Johnson, and Stein vote in each precinct based on the number of votes expected to be cast by the end of the day. We will not be adjusting the Johnson and Stein numbers, as they will be too small to yield a reliable model based on different turnout levels. Instead, we will use the precinct aggregate microtargeting scores for them.

Summary statistics

As each precinct's extrapolated end-of-day turnout and extrapolated candidate support are calculated, we will be able to sum the candidate vote statewide to get an overall sense of which candidates are over- or under-performing in terms of turnout. We will also be able to provide statistics for turnout as a percent of expected for each of the precinct categories and subcategories.

Appendix 1: Terminology

There are a lot of terms that are used interchangeably in the news. Projected, modeled, and estimated are all often used to mean something that is believed but not known with certainty. In the context of VoteCastr, these terms have distinct, specific meanings. These and other key terms are defined below.

Observed: An actual report. Observed turnout is a turnout report called in from the field.

Projected: A calculation based on time of day. If a precinct has observed turnout at a specific time during the day, we can use that to calculate the projected end-of-day vote.

Extrapolated: In a precinct that does not have observed turnout, we extrapolate turnout based on how other precincts in the same category or subcategory are turning out.

Modeled: A predictive model (likely logistic regression and/or random forests) that predicts a behavior at an individual level. For this project we will model likelihood of supporting each of the top four candidates, and likelihood of voting.

Microtargeting: Modeled support or turnout.

Score: The output of a predictive model. A score gives the percent likelihood of an individual taking an action or holding an opinion. The turnout score gives the percent likelihood that the individual in question will vote. A candidate support score gives the percent likelihood that the individual will support that candidate, if he or she votes.

Precinct: The smallest unit of political geography. In some states, the words ward and precinct are used interchangeably. In other states, precincts are a subdivision of wards.

VTD: Voting Tabulation District. This is the unit of geography at which votes are counted. Often, this is the same as a precinct, but in many cases, multiple precincts will vote at the same location, and their votes are not separated by precinct. In this case, the VTD will be the combination of those precincts. Note: Every 10 years, the census collects VTD definitions from the states. Many states will report their precinct lines, even if the individual precincts always vote in combination with others. The fact that a VTD is defined in a certain way by the census does not necessarily mean that that is the actual configuration that will be used for tabulating results. For clarity, we use the term Census VTD to refer to the official census geography.

Polling location: Where people go to vote on Election Day. A polling location may be the same as a precinct, or it may include multiple precincts. If a polling location includes multiple precincts and their votes are counted together, then the polling location and VTD are the same. Some polling locations will have separate lines and ballot boxes or voting machines for each precinct voting at that polling location, and those votes are counted and reported separately. In these cases, the VTD and the polling location are not the same.

VAP: Voting Age Population. From the census. Number of people age 18-plus.

CVAP: Citizen Voting Age Population. From the census. Number of U.S. citizens age 18-plus.
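As a closing illustration of the step "Expected Clinton, Trump, Stein, and Johnson vote based on extrapolated turnout", the sketch below ranks a precinct's voters by turnout score, computes the average Clinton 2-way support at each cumulative turnout level, and fits an ordinary least-squares line of support against percent turnout. The data layout, function names, use of a cumulative mean as the "support at each turnout level", and the clamping of predictions to 0 through 100 are assumptions; the text does not give the exact regression specification.

```python
def fit_support_curve(voters):
    """Fit support = intercept + slope * turnout_pct for one precinct.

    voters -- list of (turnout_score, clinton_two_way_score), both 0-100
              (assumed layout)
    Returns (slope, intercept).
    """
    if not voters:
        raise ValueError("precinct has no voters")
    # Most likely voters first, as described in the text.
    ranked = sorted(voters, key=lambda voter: voter[0], reverse=True)
    n = len(ranked)
    turnout_pcts, support_levels = [], []
    running_support = 0.0
    for k, (_, support) in enumerate(ranked, start=1):
        running_support += support
        turnout_pcts.append(100.0 * k / n)           # percent turning out
        support_levels.append(running_support / k)   # mean support of top k
    mean_x = sum(turnout_pcts) / n
    mean_y = sum(support_levels) / n
    sxx = sum((x - mean_x) ** 2 for x in turnout_pcts)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(turnout_pcts, support_levels))
    slope = sxy / sxx if sxx else 0.0
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predicted_two_way(slope, intercept, turnout_pct):
    """Return (Clinton %, Trump %) of the 2-way vote at a turnout level."""
    clinton = min(100.0, max(0.0, intercept + slope * turnout_pct))
    return clinton, 100.0 - clinton


# Hypothetical two-voter precinct: the likelier voter leans more toward Clinton.
slope, intercept = fit_support_curve([(90.0, 80.0), (50.0, 40.0)])
clinton_pct, trump_pct = predicted_two_way(slope, intercept, 75.0)
```

On Election Day, the saved slope and intercept for each precinct would be evaluated at that precinct's extrapolated turnout, and the resulting shares applied to the votes expected to be cast; summing across precincts gives the statewide totals described under "Summary statistics".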