The Optimal Weighting of Pre-Election Polling Data

Brigham Young University
BYU ScholarsArchive
All Theses and Dissertations
2008-04-23

The Optimal Weighting of Pre-Election Polling Data
Gregory K. Johnson
Brigham Young University - Provo

Follow this and additional works at: https://scholarsarchive.byu.edu/etd
Part of the Statistics and Probability Commons

BYU ScholarsArchive Citation
Johnson, Gregory K., "The Optimal Weighting of Pre-Election Polling Data" (2008). All Theses and Dissertations. 1377. https://scholarsarchive.byu.edu/etd/1377

This Selected Project is brought to you for free and open access by BYU ScholarsArchive. It has been accepted for inclusion in All Theses and Dissertations by an authorized administrator of BYU ScholarsArchive. For more information, please contact scholarsarchive@byu.edu.

THE OPTIMAL WEIGHTING OF PRE-ELECTION POLLING DATA
by Gregory K. Johnson

A selected project submitted to the faculty of Brigham Young University in partial fulfillment of the requirements for the degree of Master of Science

Department of Statistics
Brigham Young University
August 2008

BRIGHAM YOUNG UNIVERSITY
GRADUATE COMMITTEE APPROVAL
of a project submitted by Gregory K. Johnson

This selected project has been read by each member of the following graduate committee and by majority vote has been found to be satisfactory.

Date    William F. Christensen, Chair
Date    Scott D. Grimshaw
Date    C. Shane Reese

BRIGHAM YOUNG UNIVERSITY

As chair of the candidate's graduate committee, I have read the selected project of Gregory K. Johnson in its final form and have found that (1) its format, citations, and bibliographical style are consistent and acceptable and fulfill university and department style requirements; (2) its illustrative materials including figures, tables, and charts are in place; and (3) the final manuscript is satisfactory to the graduate committee and is ready for submission to the university library.

Date    William F. Christensen, Chair, Graduate Committee

Accepted for the Department
Date    Scott D. Grimshaw, Graduate Coordinator

Accepted for the College
Date    Thomas W. Sederberg, Associate Dean, College of Physical and Mathematical Sciences

ABSTRACT

THE OPTIMAL WEIGHTING OF PRE-ELECTION POLLING DATA
Gregory K. Johnson
Department of Statistics
Master of Science

Pre-election polls are used to test the political landscape and predict election results. The relative weights for the state-level data from the 2006 U.S. senatorial races are considered based on the date on which the polls were conducted. Long- and short-memory weight functions are developed to specify the relative value of historical polling data. An optimal weight function is estimated by minimizing the discrepancy function between estimates from weighted polls and the election outcomes.

ACKNOWLEDGEMENTS

I wish to expressly thank Dr. William Christensen for his efforts, and ongoing interest in my behalf. The same is true for the faculty in the BYU Statistics Department. As a group, they have provided me truly exceptional support, assistance, and consideration in obtaining my master's degree. I remain indebted to them. I would also like to thank my dear wife Karen who suffers me so gladly. I love her.

CONTENTS

CHAPTER
1 Introduction
2 Literature Review
3 Methods and Analysis
3.1 Optimal Weight Function for a Specific State
3.2 Overall Optimal Weight Function
4 Conclusion
BIBLIOGRAPHY

FIGURES

Figure
3.1 Illustration of weighting function used by Christensen and Florence (2008) when weighting polls
3.2 Long-memory weight function from equation (2) in black and short-memory weight function from equation (3) in red
3.3 Values of $D_2(h, f)$ for combinations of half-life ($h$) in $H = \{1, 2, \ldots, 50\}$ with floor ($f$) in $F = \{0.0001, 0.01, 0.02, \ldots, 0.50\}$

1. INTRODUCTION

For decades, politicians and the like have used polling data to predict election results. As campaign budgets have soared, so have the frequency and sophistication of pre-election polls. Consequently, accurately analyzing and interpreting the results of these polls has become both more important and more difficult. As an illustration of the difficulty of properly interpreting poll results, Carl Bialik reported in the Wall Street Journal (2008) that some pollsters have merely averaged the results of polling data. The aim of this averaging is to offset conflicting results, to control for competing interests, and to achieve a more accurate synopsis of the political landscape. Mr. Bialik observes:

"Among the pitfalls: Polls have different sample sizes, yet in the composite, those with more respondents are weighted the same. They are fielded at different times, some before respondents have absorbed the results from other states' primaries. They cover different populations, especially during primaries when turnout is traditionally lower. It's expensive to reach the target number of likely voters, so some pollsters apply looser screens. Also, pollsters apply different weights to adjust for voters they've missed. And wording of questions can differ, which makes it especially tricky to count undecided voters. Even identifying these differences isn't easy, as some of the included polls aren't adequately footnoted."

The statistical issues associated with simple averaging of polls are clearly laid out in Mr. Bialik's comments. To be precise, the statistical problem with simple averaging of polls is that it fails to account for

1. different sample sizes among polls,
2. different populations of interest (as evidenced by poll time and sampling frame),
3. different sampling weights provided by the polling organization, and
4. different question wording, which induces a bias.

Poll result synthesis based on simple averaging is therefore an inadequate approach. This is not to say that there is little value in polling data; rather the opposite: all polling results are of value, and it is simply a question of the degree to which each poll is considered valuable. A more appropriate analysis would seek to weight different polls, accounting for some of the effects mentioned above. The purpose of this study is to determine optimal weight functions for polling results which minimize the discrepancy between polls and the actual election results. To this end, we explore different options for weights using state-level pre-election polling data from the 2006 senatorial races and their subsequent results (31 states qualified for our study). The relative value of the polls was based on the dates on which they were conducted. Two weight functions were considered, short memory and long memory, both providing more weight to the more recent polls.

The remainder of this project explores the construction of the optimal weight functions. More specifically, Chapter 2 contains a review of literature related to weighting function construction and poll synthesis. Chapter 3 presents the short- and long-memory weight functions developed and the results of these two different weight functions on the data from the 2006 senatorial races. In Chapter 4 the relative merits of the two different approaches are discussed and conclusions are drawn.
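The first pitfall in the list above, unequal sample sizes, can be made concrete with a toy calculation. The sketch below is only an illustration: the two polls and their counts are hypothetical (they are not taken from the 2006 data), and the function names are ours. It contrasts a simple average of poll proportions with an estimate that pools respondents, so that larger polls count for more.

```python
def simple_average(polls):
    """Unweighted mean of per-poll proportions; every poll counts equally."""
    return sum(r / n for r, n in polls) / len(polls)


def pooled_proportion(polls):
    """Proportion after pooling all respondents; larger polls count for more."""
    total_r = sum(r for r, _ in polls)
    total_n = sum(n for _, n in polls)
    return total_r / total_n


# Hypothetical polls as (Republican count, sample size) pairs.
polls = [(60, 100), (480, 1000)]

print(simple_average(polls))     # average of 0.60 and 0.48
print(pooled_proportion(polls))  # 540 respondents out of 1100
```

With these invented counts the small poll pulls the simple average above one half, while the pooled estimate stays below it, which is the kind of distortion the weighting schemes of Chapter 3 are meant to limit.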

2. LITERATURE REVIEW

For decades politicians, campaigns, political pundits, reporters, and academics have used polling data to predict election outcomes. As congressional campaigns become more sophisticated at measuring voter opinions and local media have more complete coverage of state races, there has been an increase in the frequency, sophistication, and publication of pre-election polls. Consequently, accurately analyzing and interpreting the results of these polls has become increasingly more important and more difficult.

There are many concerns with embracing a single poll to predict an election outcome. First, polls are, at best, snapshots of voter opinion. They are frequently reported weekly for national races or monthly for statewide contests and are often released on weekends to coincide with the Sunday political news cycle or leading into upcoming primary votes. Weekly sampling is based on the assumption that voter opinions change often, which is possible when significant events occur or political stumbles from the candidate or campaign change the message. However, most elections are stable with a constancy to message and strategy, and it seems reasonable to combine the information in a thoughtful way.

One approach is to create a regression model to predict the national popular vote. Polling data can be incorporated as an explanatory variable, although how to use past polls instead of simply the most recent poll requires more attention. Some of the explanatory variables describe the nature of the campaign and the amount of underlying partisan support. Examples include an indicator variable for incumbency, the current president's approval rating, number of party delegates, strength of third-party challengers, and measurements of the national economy. One of the challenges is identifying explanatory variables that measure voter interest. Examples include historical voter turnout, degree of partisanship, and satisfaction with education, defense, and other issues. To demonstrate this approach, consider Campbell (1992), where 16 explanatory variables were used to predict presidential outcome by September of an election year. These models can be modified to apply to state races. Even these models rely on polling data. Brown and Chappell (1999) found that poll data dominates the optimal forecast when compared to models with only explanatory variables based on historical election fundamentals.

Other models attempt to define the characteristics of likely voters. One of the flaws in polls is that people are usually surveyed by phone. While it is possible to ask if a person plans to vote, the answer is considered biased since most people in a survey believe it reflects poorly on their citizenship to admit to not voting. Some pollsters ask a series of screening questions to identify likely voters. Some are direct questions, such as "are you a registered voter" or "who did you vote for in the last congressional race." Some are indirect questions, which ask about things you would need to know if you had voted, such as "how long did you wait in line to vote last time" or "what time of day did you last vote." Another approach is to develop profiles of likely voters and weight the sample data to reflect the population. For example, pollsters will develop demographic groups or partisan groups and estimate the expected voter turnout.

However, these models do not address the issue of principal interest, namely who will actually win the presidency. While the popular vote and the electoral vote often agree, Al Gore takes little comfort from winning the popular vote in the 2000 U.S. presidential election. Although national popular opinion during U.S. presidential races is most commonly measured and discussed in the media, the U.S. presidential election is based on the electoral college, in which each state has a number of electors equal to the number of its U.S. senators and representatives. Additionally, the District of Columbia acts as a state with a number of electors proportional to its population, but not exceeding the number of electors assigned to the least populous state. The people in each state vote for the state-level electors who then vote for a presidential candidate, with most states using a winner-take-all policy for casting votes in the electoral college. Thus, although much of the media attention during election years focuses on polls tracking popular support for the major candidates, the complicated role played by the electoral college in this multistage election process must be accounted for in order to address the issue of winning the presidency.

Bialik (2008) reports that some pollsters have merely averaged the results of polling data in an attempt to offset conflicting results and to achieve a more accurate synopsis of the political landscape. Current practice is described by Mark Blumenthal, a former Democratic pollster and co-founder of Pollster.com, as "not optimal, but let's hope that by combining them we're getting some better version of the truth" (qtd. in Bialik 2008). This naive approach to combining polls from different days ignores different sample sizes. Sophisticated polling asks questions and applies sample weights that allow survey respondents' opinions to be portrayed as those of likely voters. Different pollsters use different filtering questions to identify partisan voters. Bialik notes that identifying and counting undecided voters is particularly challenging. Unfortunately, the details of a pollster's sample and operating procedure are often not adequately disclosed. Without technical descriptions it is difficult to provide a thoughtful method to combine information from different political polls. Simply averaging polls is also a poor choice. This is not to say that there is little value in polling data; rather the opposite: all polling results are of value, and it is simply a question of how much.

Christensen and Florence (2008) describe a simulation-based approach (either frequentist or Bayesian) to answering election outcome questions that rely on combining polls. Historically, one of the main challenges associated with forecasting election outcomes has been the lack of state-level pre-election poll data (Cohen 1998), but opinion polls are now easily accessible on the Internet. For example, in 2004, state-level poll data for all 50 states and the District of Columbia were available from several web pages such as the LA Times website (where most of the data for these analyses were obtained). Although pre-election polling data are inevitably flawed, they can still provide much insight about national and regional trends. Political scientists who study elections have noted that presidential pre-election polling data may not be useful until at least early September, after the two parties' national conventions. The analyses of the 2004 presidential election discussed there use state-level opinion poll updates at 12 different times beginning 12 October 2004 and ending 2 November 2004, the day before the election. During the 22-day window with poll results, some states had no new updates while others had as many as ten. With the beginning 12 October 2004 data, polls are assumed to be taken on that day even though some polls may have been older. Multiday polls were treated as if the data were gathered on the day the poll was reported.

3. METHODS AND ANALYSIS

In this chapter, we consider the optimization of poll weights for prediction of a single-stage election outcome. That is, we consider a data scenario similar to that observed in Christensen and Florence (2008). We are interested in predicting the actual percentage of a population voting for a specific candidate. Specifically, we consider each state separately and determine what weighting scheme within a class of weights will yield the best estimate of the actual percentage voting for a candidate. For this exploration, we use the election poll data obtained prior to the November 2006 U.S. Senate races. Prior to this election, the Republican Party was in the majority in the U.S. Senate but several GOP senators were defending hotly contested seats.

For most states, the polling data used in these analyses were gathered on 1 August 2006 or later. The exceptions were Massachusetts, Mississippi, and Wyoming, where the only polls available before the last week of the campaign were conducted prior to August 1. For each date between 1 August 2006 and the election on November 7, Christensen and Florence (2008) predicted the outcomes of each race and also predicted the likelihood of a change in majority party. For each prediction, the election polling data was formulated in one of three ways: (1) using only the latest poll, (2) combining all of the responses from all previous polls, and (3) weighting the responses from previous polls, with decreasing weights for older polls. They consider two different weighting functions: one that gives the estimator a long memory of the past polls and one that gives a short memory. The general form of the weight function is

$$ w(t; h, f) = \max\left\{ 1 - \frac{t}{2h},\; f \right\}, \tag{1} $$

where $t$ is the number of days since the poll was carried out, $h$ is the half-life of the function, and $f$ is its floor. To define the facets of this function, consider the long-memory weight function defined by Christensen and Florence (2008) as follows:

$$ w(t; 35, 0.2) = \begin{cases} 1 - \dfrac{t}{70}, & t \le 56, \\ 0.2, & t > 56. \end{cases} \tag{2} $$

This weight function is illustrated in Figure 3.1. Note that the weight function implies that a respondent to a poll that is 35 days old will have a weight equal to half that of a respondent in a poll released today. Thus, we reference the slope of this line with the function's half-life of 35 days. The interpretation of the weight function for $t \le 56$ is that polling data has decreasing utility as it ages. The other parameter governing this class of weights is the minimum or floor weight. In the sample weight function given above, the floor is equal to 0.2. That is, at 56 days old, a poll's respondents will have a weight of 0.2, and that weight will decrease no more as the poll continues to age. The interpretation of the weight function for $t > 56$ is that the utility of polling data always retains some minimal level of value, regardless of age.

Figure 3.1. Illustration of weighting function used by Christensen and Florence (2008) when weighting polls.

Christensen and Florence (2008) also use a short-memory weight function defined by

$$ w(t; 7, 0.05) = \begin{cases} 1 - \dfrac{t}{14}, & t \le 13, \\ 0.05, & t > 13. \end{cases} \tag{3} $$

The half-life for the short-memory function indicates that a poll has lost half of its utility by the time it is one week old. The floor value of 0.05 is also much smaller than in the long-memory weight function, indicating that in every respect, the estimator using this weight will draw only minimally on older polling data. Figure 3.2 compares the nature of the long- and short-memory weight functions.

Figure 3.2. Long-memory weight function from equation (2) in black and short-memory weight function from equation (3) in red.

Note that for the predictions made in Christensen and Florence (2008) between August 1 and 6 November 2006, we cannot evaluate the accuracy of our state-by-state or overall predictions because there is no ground truth against which we can compare. However, the prediction on our final day can be compared to the actual results on 7 November 2006. That is, we cannot evaluate the optimality of our weights for August data when predicting voter behavior on September 1, but we can evaluate different weighting schemes for August-through-November data when predicting election results for November 7.

In this section, we consider the class of weights illustrated in equation (1) and identify the optimal weight function for each of the 31 senate races we were tracking. Additionally, we are interested in recommending one all-purpose weight function that can be used for future poll tracking of this nature. It may not seem optimal to use an all-purpose weight function when tracking a race for which we can obtain state-specific optimized weights. For example, we can obtain an optimized weight function for Tennessee based on the 2006 Senate data and then use that specific weight for the 2008 Senate race in Tennessee. However, it is also plausible that the 31 estimated optimal weight functions obtained from each of the Senate data sets in 2006 represent a distribution of estimates for some universal weight function. Under this assumption, we can simultaneously use the 31 data sets from 2006 to posit a function that is best for future use in an overall sense.

3.1 Optimal Weight Function for a Specific State

Consider the $m_i$ pre-election polls for a given state $i$, with poll ages $\{t_{i1}, \ldots, t_{im_i}\}$, poll sample sizes $\{n_{i1}, \ldots, n_{im_i}\}$, and Republican preference counts $\{r_{i1}, \ldots, r_{im_i}\}$. Because we do not want our calculations influenced by potential voters who are undecided or voting for third-party candidates, our sample size for these calculations is actually the sum of the Democratic preference count and the Republican preference count (i.e., $n_{ij} = d_{ij} + r_{ij}$). We consider all possible combinations of the half-life ($h$) in $H = \{1, 2, \ldots, 50\}$ with floor ($f$) in $F = \{0.0001, 0.01, 0.02, \ldots, 0.50\}$. For each pair $(h, f)$, we calculate the estimate of the proportion voting Republican in state $i$ (among all persons voting Republican or Democrat) with

$$ \hat{\pi}_i(h, f) = \frac{\sum_{j=1}^{m_i} w(t_{ij}; h, f)\, r_{ij}}{\sum_{j=1}^{m_i} w(t_{ij}; h, f)\, n_{ij}}. \tag{4} $$

We consider the optimal weight function for state $i$ to be $w(t; h_i^o, f_i^o)$, where

$$ (h_i^o, f_i^o) = \arg\min_{h \in H,\, f \in F} \left| \hat{\pi}_i(h, f) - \pi_i \right| \tag{5} $$

and $\pi_i$ is the actual proportion voting for the Republican candidate in state $i$ (among all persons voting for either the Republican or the Democrat).

3.2 Overall Optimal Weight Function

Our task is then to choose an overall weight function that in some sense best predicts the vector of Republican preference proportions for all 31 states, $\pi = (\pi_1, \ldots, \pi_{31})$. A simple rule for choosing the optimal values of $h^o$ and $f^o$ in the overall weight function $w(t; h^o, f^o)$ is to minimize the discrepancy function

$$ D_1(h, f) = \frac{1}{31} \sum_{i=1}^{31} \left| \hat{\pi}_i(h, f) - \pi_i \right| \tag{6} $$

so that

$$ (h^o, f^o) = \arg\min_{h \in H,\, f \in F} D_1(h, f). \tag{7} $$

The problem with the rule in (7) is that it penalizes estimation errors equally across states. So, if we estimate a state to yield 70% Republican vote instead of an actual value of 75%, this has the same impact on the discrepancy measure as if we estimate a state to yield 47% Republican vote instead of an actual value of 52%, even though only the second error changes the predicted winner. In order to give greater weight to the close races, we could weight each term in the sum found in (6) using some measure of tightness. In this study, we use the number of published polls for a state ($m_i$) as a measure of a race's tightness to obtain the discrepancy function

$$ D_2(h, f) = \frac{1}{31} \sum_{i=1}^{31} \left( \left| \hat{\pi}_i(h, f) - \pi_i \right| m_i \right). \tag{8} $$

Then, our optimal function is defined using

$$ (h^o, f^o) = \arg\min_{h \in H,\, f \in F} D_2(h, f). \tag{9} $$

Thus, states generating the most election coverage by pollsters (e.g., battleground states) will have the largest influence in selecting the optimal weight function. Alternatively, one could weight by the closeness of $\hat{\pi}_i$ to 0.50 in the discrepancy function, as in

$$ D_3(h, f) = \frac{1}{31} \sum_{i=1}^{31} \frac{\left| \hat{\pi}_i(h, f) - \pi_i \right|}{\left| \hat{\pi}_i(h, f) - 0.50 \right|} $$

or

$$ D_4(h, f) = \frac{1}{31} \sum_{i=1}^{31} \left( \left| \hat{\pi}_i(h, f) - \pi_i \right| \left[ 0.5 - \left| \hat{\pi}_i(h, f) - 0.50 \right| \right] \right). $$

Figure 3.3 gives a plot showing the values of $D_2(h, f)$ for all possible combinations of the half-life ($h$) in $H = \{1, 2, \ldots, 50\}$ with floor ($f$) in $F = \{0.0001, 0.01, 0.02, \ldots, 0.50\}$. Note that the overall optimal weight function is $w(t; h^o, f^o) = w(t; 20, 0.0001)$. That is, the weight function that minimizes $D_2(h, f)$ in equation (8) is one that gives polls a half-life of 20 days and a floor value of essentially zero. (We do not set the value of the floor at zero because there are some states for which the most recent poll may be older than $2h$, in which case every weight, and hence the denominator of equation (4), would be zero.) The optimal weight function for each of the ten closest senate races is also denoted on the plot. Note that 6 of the 10 closest states (and 15 out of 31 states in total) use virtually no weight for polls older than $2h$ (i.e., $f_i^o = 0.0001$). We recommend the weight function $w(t; h^o, f^o) = w(t; 20, 0.0001)$ for general use in future work predicting election outcomes from pre-election polls.
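The search described in this chapter can be sketched in a few lines of Python. This is a minimal illustration rather than the code used for the study: the function names and the data layout (each poll as a tuple of age in days, Republican count, and two-party sample size) are our own assumptions, the weight is written as the larger of the linear decay and the floor so that it reproduces the piecewise forms in equations (2) and (3), and the discrepancy divides by the number of states supplied rather than the fixed 31.

```python
from itertools import product


def weight(t, h, f):
    """Weight class of equation (1): linear decay with half-life h, bounded below by floor f."""
    return max(1.0 - t / (2.0 * h), f)


def estimate(polls, h, f):
    """Equation (4): weighted Republican share of two-party respondents.

    polls is a list of (t, r, n) tuples: poll age in days, Republican
    count, and two-party sample size (Democratic + Republican counts).
    """
    num = sum(weight(t, h, f) * r for t, r, _ in polls)
    den = sum(weight(t, h, f) * n for t, _, n in polls)
    return num / den


HALF_LIVES = range(1, 51)                            # H = {1, 2, ..., 50}
FLOORS = [0.0001] + [k / 100 for k in range(1, 51)]  # F = {0.0001, 0.01, ..., 0.50}


def best_for_state(polls, actual):
    """Equation (5): the (h, f) pair with the smallest absolute error for one state."""
    return min(product(HALF_LIVES, FLOORS),
               key=lambda hf: abs(estimate(polls, *hf) - actual))


def d2(states, h, f):
    """Equation (8): absolute errors weighted by each state's poll count m_i.

    states is a list of (polls, actual) pairs; the average is taken over
    however many states are supplied (31 in the study).
    """
    total = sum(abs(estimate(polls, h, f) - actual) * len(polls)
                for polls, actual in states)
    return total / len(states)


def best_overall(states):
    """Equation (9): exhaustive search of the (h, f) grid for the minimum of D2."""
    return min(product(HALF_LIVES, FLOORS), key=lambda hf: d2(states, *hf))
```

Because the grid holds only 50 half-lives and 51 floors, an exhaustive search is cheap and no numerical optimizer is needed. When several pairs tie, `min` returns the first one in grid order, a tie-breaking choice made here that the study does not specify.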

Figure 3.3. Values of $D_2(h, f)$ for combinations of half-life ($h$, horizontal axis, 0 to 50) in $H = \{1, 2, \ldots, 50\}$ with floor ($f$, vertical axis, 0.0 to 0.5) in $F = \{0.0001, 0.01, 0.02, \ldots, 0.50\}$. The red dot indicates the minimum value of $D_2(h, f)$, with $(h^o, f^o) = (20, 0.0001)$. Optimal values for the weight functions associated with the ten closest senate races are denoted with state abbreviations (MD, TN, NV, MO, NJ, VA, MT, OH, RI, AZ).

4. CONCLUSION

In summary, the purpose of this study is to provide better means by which polling data may be utilized. Polls are increasingly expensive and increasingly relied upon. Simply averaging the polls does not account for differences in sample size, populations of interest, pollsters, question wording, and so forth, and can therefore skew interpretations. A total of 2,550 weight functions are considered, each having a piecewise linear form. The overall optimal weight function for these data is determined based on the notion that the specific function for each state is a random realization from an overall distribution with common average shape. With this assumption, it is determined that a poll has a half-life of 20 days and a floor value of essentially zero, meaning that a poll loses half of its value within 20 days and has essentially no value after 40 days. Our approach for choosing the optimal weight function gives a much larger influence to the states with the most published polls, which we use as a proxy for the closest races. If one is interested in giving an equal influence to all races, a different optimal weight function might be determined.
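As a quick arithmetic check on the figures quoted in this chapter, the sketch below counts the grid of candidate weight functions and evaluates the recommended function at a few poll ages. It assumes the weight class of equation (1) in the form that matches the piecewise definitions in equations (2) and (3), that is, linear decay bounded below by the floor; the variable names are ours.

```python
def weight(t, h, f):
    """Linear decay with half-life h, never falling below floor f."""
    return max(1.0 - t / (2.0 * h), f)


half_lives = range(1, 51)                            # 50 candidate half-lives
floors = [0.0001] + [k / 100 for k in range(1, 51)]  # 51 candidate floors
n_functions = len(half_lives) * len(floors)          # 50 * 51 = 2550

# Recommended function w(t; 20, 0.0001) at several poll ages (in days).
recommended = {age: weight(age, 20, 0.0001) for age in (0, 10, 20, 40, 60)}
print(n_functions)
print(recommended)  # {0: 1.0, 10: 0.75, 20: 0.5, 40: 0.0001, 60: 0.0001}
```

Under this weight a 20-day-old poll counts half as much as one released today, and a poll 40 or more days old sits at the floor.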

BIBLIOGRAPHY

Bialik, C.N., "Election Handicappers are Using Risky Tool: Mixed Poll Averages," The Numbers Guy, Wall Street Journal, sec. B1, February 15, 2008.

Brown, L.B., and Chappell, H.W. Jr. (1999), "Forecasting Presidential Elections using History and Polls," International Journal of Forecasting, 15, 127-135.

Campbell, J.E. (1992), "Forecasting the Presidential Vote in the States," American Journal of Political Science, 36, 386-407.

Christensen, W.F., and Florence, L.W. (2008), "Predicting Presidential and Other Multistage Election Outcomes Using State-Level Pre-Election Polls," The American Statistician, 62, 1-10.

Cohen, J.E. (1998), "State-Level Public Opinion Polls as Predictors of Presidential Election Results: The 1996 Race," American Politics Quarterly, 26, 139-159.