Tag Archives: elections

why is it so easy to forecast the Presidential election and so hard to forecast the NCAA basketball tournament?

This blog post is inspired by my disappointing NCAA March Madness bracket. I used math modeling to fill my bracket, and I am currently in the 51st percentile on ESPN. On the upside, all of my Final Four picks are still active so I have a chance to win my pool. I am worried that my bracket has caused me to lose all credibility with those who are skeptical of the value of math modeling. After all, guessing can lead to a better bracket. Isn’t Nate Silver a wizard? How come his bracket isn’t crushing the competition? Here, I will make the case that a so-so bracket is not evidence that the math models are bad. To do so, I will discuss why it is so easy to forecast the Presidential election and so hard to forecast the NCAA basketball tournament.

Many models for the Presidential election and the basketball tournament are similar in that they use various inputs to predict the probability of an outcome. I have discussed several models for forecasting the Presidential election [Link] and the basketball tournament [Link].

All models that didn’t solely rely on economic indicators chose Obama to be the favorite, and nearly all predicted 48+ of the states correctly. In other words, even a somewhat simplistic model to forecast the Presidential election could predict the correct outcome 96% of the time. I’m not saying that the forecasting models out there were simplistic – but simply going with poll averages gave good estimates of the election outcomes.

The basketball tournament is another matter. Nate Silver has blogged about models that predict tournament games using similar math. Here, we can only predict the correct winner 71-73% of the time [Link]:

Since 2003, the team ranked higher in the A.P. preseason poll (excluding cases where neither team received at least 5 votes) has won 72 percent of tournament games. That’s exactly the same number, 72 percent, as the fraction of games won by the better seed. And it’s a little better than the 71 percent won by teams with the superior Ratings Percentage Index, the statistical formula that the seeding committee prefers. (More sophisticated statistical ratings, like Ken Pomeroy’s, do only a little better, with a 73 percent success rate.)

To do well in your bracket, you would need to make small marginal improvements over the naive model of always picking the better seed (72% success rate). Here, a 96% success rate would be unrealistic; an improved model that gets 75% of the games correct would give you a big advantage. The big advantage here means that if you used your improved method in 1000 tournaments, it would do better on average than the naive method. In any particular tournament, the improved method may still lead to a poor bracket. It’s a small sample.
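To make the small-sample point concrete, here is a minimal simulation sketch. It assumes every game is an independent coin flip with the stated success rate and that a bracket has 63 games; both are simplifications of mine (real brackets have cumulative errors), and the function name is made up for illustration.

```python
import random


def bracket_head_to_head(p_naive=0.72, p_improved=0.75, games=63,
                         tournaments=10_000, seed=0):
    """Estimate how often each picker gets more games right in ONE tournament.

    Assumption (not from any real model): each game is an independent
    Bernoulli trial with the picker's overall success rate.
    """
    rng = random.Random(seed)
    improved_better = 0
    naive_better = 0
    for _ in range(tournaments):
        naive = sum(rng.random() < p_naive for _ in range(games))
        improved = sum(rng.random() < p_improved for _ in range(games))
        if improved > naive:
            improved_better += 1
        elif naive > improved:
            naive_better += 1
    # Ties are the remainder, so the two fractions do not sum to 1.
    return improved_better / tournaments, naive_better / tournaments


if __name__ == "__main__":
    better, worse = bracket_head_to_head()
    print(f"improved picker ahead: {better:.0%}, naive picker ahead: {worse:.0%}")
```

Under these assumptions the better picker comes out ahead more often than not, but the naive picker still wins outright in a sizable share of individual tournaments.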

The idea here is similar to batting averages in baseball. It is not really possible to notice the difference between a 0.250 batter and a 0.300 batter in a single game or even across the games in a single week. The 0.250 hitter may even have a better batting average in any given week of games. Over the course of the season of 162 games, the differences are quite noticeable when looking at the batters’ batting average. The NCAA does not have the advantage of averaging performance over a large number of games — we are asked to predict a small set of outcomes in a single tournament where things will not have a chance to average out (it’s The Law of Small Numbers).

It’s worth noting that actual brackets get fewer than 72% of the games correct because errors are cumulative. If you put Gonzaga in the Elite Eight and they are defeated in the (now) third round and do not make it to the Sweet Sixteen, then one wrong game prediction leads to two wrong games in the bracket.

It’s also worth noting that some games are easier to predict than others. In the (now) second round (what most of us think of as  the first round), no 1 seed has ever lost to a 16 seed, and 2 seeds have only rarely lost to 15 seeds (it’s happened 7 times). Likewise, some states are easy to predict in Presidential elections (e.g., California and Oklahoma). The difference is that there are few easy to predict games in the tournament whereas there are many easy to predict states in a Presidential election. Politico lists 9 swing states for the 2012 election. That is, one could predict the outcome in 82% of the states with a high degree of confidence by using common sense. In contrast, one can confidently predict ~12% of  tournament games in the round of 64 teams using common sense (based on four of the games corresponding to 1 seeds). Therefore, I would argue that there is more parity in college basketball than there is in politics.

How is your bracket doing?

the exit polling supply chain

A WSJ Washington Wire blog post describes the Presidential election exit polling supply chain in New York in the immediate aftermath of Hurricane Sandy. The post highlights the polling firm Edison Research, based in New Jersey. Edison provided the questionnaires used by pollsters who would collect information about the ballots cast. As you might recall, New Jersey and New York were severely damaged by the hurricane.


One of the logistical challenges was in printing and delivering the questionnaires used by pollsters around the country. The questionnaires need to be timely, so they are usually shipped one week before the election. Sandy was on track to strike 8 days before the election, so a rush order was placed with the printer. Two thirds of the questionnaires were mailed before Sandy struck and Edison’s election office lost power along with the rest of New Jersey. The rest of the questionnaires were stored for two days until they had to be shipped. Edison printed the mailing labels from their main office, and then UPS shipped the 400 packages to pollsters via Newark Airport. While Edison had redundancy in their system (e.g., the mailing labels could be printed in another facility and a redundant system alerted employees of the change), it only worked because not all of their offices lost power.

Mail Delivery

While Edison relied on UPS to deliver the mail, it is worth noting that USPS mail service continued as normal except for one day during Hurricane Sandy (HT to @EllieAsksWhy).


Edison relied on having employees implement Plan B. With the gas shortage, it was difficult for employees to get to work when they needed to save gas for other car trips. Organizing car pools was more difficult than normal, since employees could not rely on communicating by email or cell phone.


As I mentioned in an earlier post, there were few/no vacancies at hotels that had power, which provided challenges for Edison employees who wanted to work out of a hotel (most offices and homes were without power) or pollsters who needed to travel to different cities to perform exit polling.  I’m not sure how these issues were resolved.

Local transportation to the polls

The NYC public transportation was up and running on election day, so the pollsters could make it there for the big day. The subway reopened with limited runs the Thursday before Election Day and was running as usual on Election Day.

What if Hurricane Sandy came later?

Edison Research managed, but having an 8 day head start was helpful for successfully completing a contingency plan. If the hurricane had hit 5 or fewer days before the election, the questionnaires would have already been printed and mailed. However, there may have been more challenges with getting pollsters to the polling locations in New York City and other locations (the subway may still have been closed on Election Day).


queuing on election day

This is another blog post about voting. This one focuses on the actual act of voting in all its queuing glory.

Queue basics: the voters are customers who enter the system. The system here is a voting area for a precinct. The voters wait in a queue to cast their votes in voting booths (the servers). The customer arrival rates depend on the time of day, and hence, the system is not stationary.
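Here is a minimal sketch of that queue, to show why the non-stationary arrivals matter. Everything numeric is hypothetical (5 booths, 4-minute average voting time, a 13-hour day, rush-hour peaks of 2 voters per minute), and the function names are mine. Arrivals are sampled by thinning a Poisson process, and voters are served first come, first served.

```python
import heapq
import random


def arrival_times(rate, max_rate, horizon, rng):
    """Sample a time-varying Poisson arrival process on [0, horizon) by thinning."""
    t, times = 0.0, []
    while True:
        t += rng.expovariate(max_rate)          # candidate arrival at the peak rate
        if t >= horizon:
            return times
        if rng.random() < rate(t) / max_rate:   # keep with probability rate(t)/max_rate
            times.append(t)


def mean_wait(rate, max_rate, booths=5, mean_service=4.0, horizon=780.0, seed=0):
    """Average minutes a voter waits for a free booth (first come, first served)."""
    rng = random.Random(seed)
    free_at = [0.0] * booths                    # min-heap: when each booth frees up
    waits = []
    for arrival in arrival_times(rate, max_rate, horizon, rng):
        start = max(arrival, heapq.heappop(free_at))
        waits.append(start - arrival)
        heapq.heappush(free_at, start + rng.expovariate(1.0 / mean_service))
    return sum(waits) / len(waits)


def peaked(t):
    """Hypothetical rush hours: 2 voters/min in the first and last two hours."""
    return 2.0 if t < 120 or t >= 660 else 0.5


def flat(t):
    """Same expected number of voters (750) spread evenly over the day."""
    return 750.0 / 780.0


if __name__ == "__main__":
    print("peaked arrivals:", round(mean_wait(peaked, 2.0), 1), "min")
    print("flat arrivals:  ", round(mean_wait(flat, 1.0), 1), "min")
```

Both scenarios have the same expected turnout, so any gap in average waiting time comes only from when the voters show up.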

Let’s look at different ways to look at voting from a queuing perspective.

Let’s start with this article in the Economist that argues that bad weather favors Romney. Here, they focus on how weather affects the voter arrival rates:

To be brutal, a certain amount of bad weather on election day helps conservatives in every democracy. In crude terms, car-driving conservative retirees still turn out in driving rain, when bus-taking lower-income workers just back from a night shift are more likely to give rain-soaked polls a miss. School closures are a particular problem for low-income families or single mothers scrambling to find childcare.

Thus, bad weather may decrease the arrival rate of liberal voters more than of conservative voters.

Ultimately, many people are going to vote. Long lines were a problem in 2004 and 2008, and a few balked at waiting in line. Many places (such as my state of Virginia) offered early voting via absentee ballots to voters in 2008, since the turnout was unprecedented. Waiting in line to vote leads to questions about voting machine allocations and the time it takes to vote.

Muer Yang, Ted Allen, and Michael Fry wrote a paper that focuses on the number of servers and the service times. [Link to press release] They examine how to assign voting machines to precincts to equalize the waiting time across precincts, so that some precincts are not plagued with long waits while others have short ones. They do so by noting that voting is not stationary and by including other realistic voting complications:

“[The election board's] assumptions of those problems are not even close to the real world,” he added, “because [the election board's traditional] model assumes a stationary voter arrival — that voters arrive at the voting station at the same rate, which is not true. We use simulation models to consider realistic complications, including variables such as voter arrival time, voter turnout, length of time needed to finish a ballot, peak voting times and machine failures.”

The paper hasn’t been published yet, so I don’t know all of the details. To satiate your desire for mathematical details, you can read this paper from the Winter Simulation Conference by Muer Yang, Michael Fry, and David Kelton. They examine how to allocate voting machines (servers) to voting precincts in an equitable manner. There are many ways to evaluate equity. Here, the authors use the average absolute differences of expected waiting times among precincts as a proxy for “equity.” They provide a heuristic that uses a factorial experimental design and show that this heuristic outperforms the “utilization-equalization” method. The “utilization-equalization” method is another proxy for voter equity that “equalizes the utilization of voting machines rather than equalizing waiting times of voters. Moreover, the utilization rate is obtained by traditional queueing theory, which assumes stationary arrivals and steady-state operating conditions.”
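One plausible reading of that equity proxy is the mean absolute difference in expected waiting time over all pairs of precincts. The sketch below implements that reading only; it is my interpretation, not necessarily the exact formula in the paper, and the waiting times are made-up numbers.

```python
from itertools import combinations


def inequity(expected_waits):
    """Mean absolute difference in expected wait over all pairs of precincts.

    Assumed interpretation of "average absolute differences of expected
    waiting times among precincts"; lower means more equitable.
    """
    pairs = list(combinations(expected_waits, 2))
    return sum(abs(a - b) for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    # Hypothetical expected waits (minutes) under two machine allocations.
    print(inequity([5, 10, 45]))    # one precinct bears most of the waiting
    print(inequity([18, 20, 22]))   # similar total waiting, spread evenly
```

A score of zero means every precinct has the same expected wait, even if that wait is long, which is why equity is usually paired with some measure of total waiting.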

Initially, I thought that so many people voting early via absentee ballot or just early voting would mean fewer long lines in a queue. This is not necessarily the case. Early voting in South Florida and Ohio has been plagued with long lines (up to six hours). Hopefully, this means that fewer people will be in line on Election Day. I haven’t heard yet if state budget cuts will lead to poorly staffed voting precincts, which will in turn lead to long lines on Election Day even if the turnout isn’t record-setting.

All those voter ID laws were supposed to cut down on voter fraud. In a queuing context, that means that the new laws would slightly reduce the voter arrival rates. Carl Bialik wrote a nice article in the WSJ [Link] about voter fraud and whether voter ID laws would make much of a difference. The short answer is that they don’t. Fraud is hard to detect, and when it has been detected, it has most often occurred with absentee ballots (no ID needed to vote absentee) and during voter registration.

How long did you wait in line to vote?

drive carefully: you are 18% more likely to die in a fatal car crash on Presidential election days

I’ve reported this before, but it’s worth revisiting before a Presidential election: you are 18% (+/- 8%) more likely to die in a fatal car crash on a Presidential election day [Link to JAMA article]. Here’s a snippet from the paper:

The results of US presidential elections have large effects on public health by their influence on health policy, the economy, and diverse political decisions. We hypothesized that mobilizing approximately 50% to 55% of the population, along with US reliance on motor vehicle travel, might result in an increased number of fatal motor vehicle crashes during US presidential elections… We analyzed national data from the Fatality Analysis Reporting System of fatal crashes in the United States from 1975 to 2006. We included all presidential elections since database inception (from Jimmy Carter in 1976 through George W. Bush in 2004) during the hours of polling (defined as 8:00AM to 7:59 PM local time). For each election, we also identified the same hours on the Tuesdays immediately before and immediately after as control days for the number of individuals in fatal crashes at the time, as described previously. Confidence intervals (CIs) for comparing death counts on election days and control days were calculated by binomial tests.

This yielded a relative risk of 1.18 on election days (95% CI, 1.10-1.26; P < .001), equivalent to an absolute increase of 189 individuals over the study interval (95% CI, 104-280). The net increase in risk was about 24 individuals per election and was fairly stable across decades of time (see Figure below). The increase in relative risk extended to pedestrians and persisted across different ages, sexes, locations, polling hours, and whether a Democrat or Republican was elected. No difference in risk was observed in separate sensitivity analyses of individuals involved in fatal crashes during the same hours comparing the Monday before the election with control Mondays (relative risk, 0.97, 95% CI, 0.89-1.06) or comparing the Wednesday after the election with control Wednesdays (relative risk, 1.03; 95% CI, 0.95-1.12).

Figure. Individuals in Fatal Crashes During Presidential Elections
Data are counts of individuals in the crashes (in which not all persons necessarily died) during polling hours (8:00 AM to 7:59 PM, except where noted with alternative hours). Because 2 control days are available for each election day, expected deaths were calculated as total control deaths divided by 2. CI indicates confidence interval.

election day forecasts: what do Presidential election forecasting models use to make predictions

Many critics of Nate Silver and other modelers have popped up out of the woodwork lately. I’m sure small improvements can be made to his models and to others. But at the end of the day, I prefer math models to educated guesses. All models are an abstraction of reality. They may still be useful (and I believe that many of the forecasting models are–our world runs on models).

But let’s be clear: all rely on proxies and/or data that is available. Both proxies and data introduce sources of errors into the predictions.

Earlier, I blogged about different pieces of information used by different Presidential election forecasting models [Link]. There, I compared the 13 Keys to the White House model with Nate Silver’s national model (which he is no longer using this close to the election) and Ezra Klein’s national model. They all use economic data as part of their forecasts.

Now that we’re getting closer to the election, many models use state polls for forecasts. Now, the national polls are no longer relevant because the election hinges on a few, key states. This implies that state-level forecasts are key, but that’s not necessarily the case. Fewer and fewer people respond to polls. In 1997, 90% of voters were contacted for the polls and 36% responded. In 2012, 36% were contacted and 9% responded [Link]. That’s right, 9%!! It’s getting hard to make the case that polls matter. Some no longer rely on opinion polls.

Here’s a list of forecasting models with a description of the information they use. This list isn’t comprehensive–I’m doing the best I can with shrouded information surrounding some of these models.

13 Keys to the White House

Nate Silver: fivethirtyeight

  • state polls, economic indicators, historic information about poll biases, post-convention bounce correction factors, etc.

Election Analytics (at U of Illinois)

  • state election polls (read more in my previous post here)

U of Colorado model

  • state level economic indicators…and that’s it

Rothschild-Wolfers model

  • polls about who voters think will win rather than opinion polls that are traditionally used. This isn’t a forecasting model per se since they focus on forecasting state outcomes, but it’s in the same vein

Princeton Election Consortium

  • state polls (accounts for poll biases in a simple manner).


  • state polls (I’m not sure if they account for poll biases), economic indicators (the president’s net approval-disapproval rating in June of the election year; the percent change in GDP from Q1 to Q2 of the election year; and whether the incumbent party has held the presidency for two or more terms) to contribute to a Bayesian “prior” for how the race will unfold. The methodology will be published in the Journal of the American Statistical Association (forthcoming).


election day blog post: moving from polls to forecasts

I will have a series of posts on elections and voting leading up to Election Day (hint: check back over the weekend!). This blog post is about the nuts and bolts of forecasting using polls and about interpreting the results.

Many of the polls are a dead heat. Why doesn’t this mean that Obama has a 50-50 chance of being reelected?

This excellent piece by Simply Statistics explains why a candidate’s small lead (say, 0.5%) in the popular vote could translate into a favorite when it comes to winning the election (say, 68%). Moreover, this small lead could translate into many electoral votes. The post was written about Nate Silver’s model on fivethirtyeight, and it is aimed at those that are less statistically literate. It gives the rest of us a good way to explain how forecasting models work, and why a candidate could get so many electoral votes in a close election:

Let’s pretend, just to make the example really simple, that if Obama gets greater than 50% of the vote, he will win the election… [W]e want to know what is the “percent chance” Obama will win, taking into account what we know. So let’s run a bunch of “simulated elections” where on average Obama gets 50.5% of the vote, but there is variability because we don’t have the exact number. Since we have a bunch of polls and we averaged them, we can get an estimate for how variable the 50.5% number is… We can run 1,000 simulated elections… When I run the code, I get an Obama win 68% of the time (Obama gets greater than 50% of the vote). But if you run it again that number will vary a little, since we simulated elections. The interesting thing is that even though we only estimate that Obama leads by about 0.5%, he wins 68% of the simulated elections. The reason is that we are pretty confident in that number, with our standard deviation being so low (1%). But that doesn’t mean that Obama will win 68% of the vote in any of the elections!
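The simulated elections in that quote can be sketched in a few lines. This is my own minimal version, not the Simply Statistics code: it assumes the vote share is normally distributed with mean 50.5% and standard deviation 1%, and it compares the simulated win share with the exact normal probability.

```python
import random
from statistics import NormalDist


def simulated_win_share(mean=50.5, sd=1.0, n_sims=1000, seed=1):
    """Fraction of simulated elections in which the vote share exceeds 50%."""
    rng = random.Random(seed)
    wins = sum(rng.gauss(mean, sd) > 50.0 for _ in range(n_sims))
    return wins / n_sims


def exact_win_probability(mean=50.5, sd=1.0):
    """P(vote share > 50%) under the same normal assumption."""
    return 1.0 - NormalDist(mean, sd).cdf(50.0)


if __name__ == "__main__":
    print(f"simulated: {simulated_win_share():.1%}")
    print(f"exact:     {exact_win_probability():.1%}")
```

With only 1,000 simulated elections the simulated figure wobbles by a point or two from run to run, which is why the quoted 68% and the exact value of about 69% agree.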

This is another way to explain forecasting models from an article on Politico (the Simply Statistics blog post criticizes this article, but I like the quote from Nate Silver):

Silver cautions against confusing prediction with prophecy. “If the Giants lead the Redskins 24-21 in the fourth quarter, it’s a close game that either team could win. But it’s also not a “toss-up”: The Giants are favored. It’s the same principle here: Obama is ahead in the polling averages in states like Ohio that would suffice for him to win the Electoral College. Hence, he’s the favorite,” Silver said.

Nuts and bolts of moving from polling data to forecasting models

This Huffington Post article by Simon Jackman of Stanford describes the process of moving from poll averages to forecasts. The entire article is worth a careful read; it’s written for a more technical reader. He steps through the importance of polling biases, meaning that the polls collectively under- or over-estimate Obama support. He accounts for this by looking at data about historic bias, while weighting recent elections more heavily than distant ones. This makes sense, because recent elections with polls via land line, text, etc., likely offer more insight. This gives him a probability distribution for the polling bias instead of a fixed value of 0.

 I adopt an approach that concedes that we simply don’t know with certainty what the error of the poll average will be; I use a (heavy-tailed) probability distribution to characterize my uncertainty over the error of the poll average. Analysis of the temporally discounted, historical data supplies information as to the shape and location of that distribution. … Our problem is that we don’t know what type of election we’ve got (at least not yet). We should buy some insurance when we move from poll averaging to talking about state-level predictions, with some uncertainty coming into our predictions via uncertainty over the bias of the poll average we might encounter this cycle.

We see the effect of this in the table below, which shows that, based on the poll averages alone, Obama’s chance of winning different states varies from 11% to 97%. When the bias for the polling average is taken into account, Obama’s chances move closer to a 50-50 outcome. On a national level, this means that Obama’s chance of winning the election (“Electoral Vote”) drops to 71%, from an overly optimistic 99% when the bias is assumed to be zero.
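Here is a rough sketch of why extra uncertainty about the bias pulls a state’s win probability toward 50-50. It is not Jackman’s model: the 2-point lead, the 1-point sampling error, and the Student-t bias with 4 degrees of freedom and a scale of 2 points are all hypothetical choices of mine.

```python
import random
from statistics import NormalDist


def win_prob_no_bias(lead, sd):
    """P(true lead > 0) when the poll average has normal sampling error only."""
    return NormalDist(0.0, sd).cdf(lead)


def student_t(rng, df):
    """One Student-t draw: standard normal over sqrt(chi-square / df)."""
    z = rng.gauss(0.0, 1.0)
    chi2 = rng.gammavariate(df / 2.0, 2.0)   # chi-square with df degrees of freedom
    return z / (chi2 / df) ** 0.5


def win_prob_with_bias(lead, sd, bias_scale=2.0, df=4, n_sims=20_000, seed=0):
    """Same question, adding a heavy-tailed (hypothetical) poll-average bias."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(n_sims):
        bias = bias_scale * student_t(rng, df)
        wins += lead + bias + rng.gauss(0.0, sd) > 0
    return wins / n_sims


if __name__ == "__main__":
    for lead in (2.0, -2.0):
        print(lead, round(win_prob_no_bias(lead, 1.0), 3),
              round(win_prob_with_bias(lead, 1.0), 3))
```

A candidate who looks nearly certain to win or lose a state under the zero-bias assumption ends up with a less extreme probability once the bias is treated as unknown.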

The probability that Obama will win each state when assuming a zero bias (left: “poll average”) vs. a probability distribution for the bias (right: “extra uncertainty”)

 Another way to look at this

Andrew Gelman of the Statistical Modeling, Causal Inference, and Social Science blog writes a nice post in the NY Times about how the race is really too close to call. Nate Silver’s model, for example, shows that Obama has a 72% chance of winning. This means that we shouldn’t be too surprised if either Obama or Romney wins.

Let’s dig and see what this means. If we ran the election 100 times, Silver was saying that Obama would win 72 of them — but we’ll only be running it once. Silver was predicting an approximate 50.3 percent of the two-party vote share for Obama, but shifts of as large as 1 percent of the vote could happen at any time. (Ultimately, of course, we care about the Electoral College, not the popular vote. But a lot of research on polls and elections has shown that opinion swings and vote swings tend to be national.)

The online betting service Intrade gives Obama a 62 percent chance of winning, and I respect this number too, as it reflects the opinions of people who are willing to put money on the line. The difference between 63 percent and 75 percent may sound like a lot, but it corresponds to something like a difference of half a percentage point in Obama’s forecast vote share. Put differently, a change in 0.5 percent in the forecast of Obama’s vote share corresponds to a change in a bit more than 10 percent in his probability of winning. Either way, the uncertainty is larger than the best guess at the vote margin.

Let me be clear: I’m not averse to making a strong prediction, when this is warranted by the data. For example, in February 2010, I wrote that “the Democrats are gonna get hammered” in the upcoming congressional elections, as indeed they were. My statement was based on [models that suggested] that the Republicans would win by 8 percentage points (54 percent to 46 percent). That’s the basis of an unambiguous forecast. 50.5 percent to 49.5 percent? Not so much. The voters are decided; small events could swing the election one way or another, and until we actually count the votes, we won’t know how far off the polls are. Over the past couple of weeks, each new poll has provided lots of excitement (thanks, Gallup) but essentially zero information.

Not all election forecasting models rely on opinion polls. That will be the subject of another blog post. Stay tuned.

how much is my vote worth to Obama and Romney?

I live in Virginia, a hotly contested swing state. I don’t watch much TV, but when I do, there are back-to-back election commercials. I usually find 3 robo-call messages on my answering machine when I return home from work. My vote is valuable — and I know it.

As an undecided voter, I wondered how much Obama and Romney paid for my vote. Here’s the formula I used:

Amount spent on my vote = [Amount spent in RVA] / [Number of undecided voters in RVA].

Here RVA = Richmond, VA. The idea here is that the ads are not aired for the benefit of those who have already decided–they are for my “benefit.” To estimate both the numerator and the denominator, I use the following two formulas. The first assumes that advertising money is spent proportionally to the voting age population in different regions in Virginia.

Amount spent in RVA = [Amount spent in Virginia] * [Potential RVA voters] / [Potential VA voters].


Number of undecided voters in RVA = [Potential RVA voters]*[Fraction who will vote] * [Fraction of undecided voters].

Here are the parameters I used:

First, I looked at how much they spent on advertising in Virginia: $96M [link].

Now, I look at the total number of the voting age population in Virginia: 6,242,000 [link]

Number of these potential voters in the Richmond area: 940,000 [link]

Proportion of voting age population who will vote: 60-67% [link]

The proportion of undecided voters in Virginia: 4.7% (as of 10/10) to 9.2% (on 8/1) [link]

When I put this all together, I find that my vote is worth $250 – $545.

The $250 figure assumes that the voter turnout is historically high (67% from the 2008 election) and that there are many undecided voters (they peaked at 9.2% on 8/1, about the time when the campaign ads started in full force). The $545 figure assumes a more reasonable voter turnout (60% from the 2004 election) and that there are fewer undecided voters (there are currently 4.7% undecideds). A more realistic scenario would be to include the 60% voter turnout with 6.0% undecided voters (the value before the conventions), yielding $427 for my vote. I’d rather pocket the cash.
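For anyone who wants to check the arithmetic, here is a short sketch of the calculation using the parameters listed above. The function and argument names are mine.

```python
def spend_per_undecided_vote(state_spend, state_voters, region_voters,
                             turnout, undecided):
    """Dollars spent per undecided voter who actually votes in the region."""
    # Spending is assumed proportional to the voting-age population.
    region_spend = state_spend * region_voters / state_voters
    undecided_voters = region_voters * turnout * undecided
    return region_spend / undecided_voters


if __name__ == "__main__":
    va_spend, va_voters, rva_voters = 96e6, 6_242_000, 940_000
    scenarios = [
        ("high turnout, many undecided", 0.67, 0.092),
        ("lower turnout, few undecided", 0.60, 0.047),
        ("lower turnout, 6.0% undecided", 0.60, 0.060),
    ]
    for label, turnout, undecided in scenarios:
        dollars = spend_per_undecided_vote(va_spend, va_voters, rva_voters,
                                           turnout, undecided)
        print(f"{label}: ${dollars:.0f}")
```

Because spending is assumed proportional to population, the Richmond population cancels out, and the answer depends only on the statewide spending, the statewide voting-age population, the turnout, and the undecided share.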

11/5 UPDATE: $131M was spent on ads in Virginia, which means that $582 was spent for my vote with a range of $340-$744. This only counts ads, which means that this grossly underestimates how much was spent on me. If I scale this by the ratio of the total amount spent ($1.6B) to the total amount spent on ads ($733M), then $1272 was spent on my vote with a range of $743 – $1624. This seems like too much. After the election excitement dies down, I’m ready to discuss campaign reform.

forecasting the Presidential election using regression, simulation, or dynamic programming

Almost a year ago, I wrote a post entitled “13 reasons why Obama will be reelected in one year.” This post uses Lichtman’s model for predicting the Presidential election way ahead of time using 13 equally weighted “keys” – macro-level predictors. Now that we are closer to the election, Lichtman’s method offers less insight, since it ignores the specific candidates (well, except for their charisma), the polls, and the specific outcomes from each state. At this point in the election cycle, knowing which way Florida, for example, will fall is important for understanding who will win. Thus, we need to look at specific state outcomes, since the next President needs to be the one who gets at least 270 electoral votes, not the one who wins the popular vote.

With less than two months until the election, it’s worth discussing two models for forecasting the election:

  1. Nate Silver’s model on fivethirtyeight
  2. Sheldon Jacobson’s model (Election analytics)

In this post, I am going to compare the models and their insights.

Nate Silver [website link]:

Nate Silver’s model develops predictions for each state based on polling data. He adjusts for different state polls applying a “regression analysis that compares the results of different polling firms’ surveys in the same state.” The model then adjusts for “universal factors” such as the economy and state-specific issues, although Silver’s discussion was a bit sketchy here–it appears to be a constructed scale that is used in a regression model. It appears that Silver is using logistic regression based on some of his other models. Here is a brief description of what goes into his models:

The model creates an economic index by combining seven frequently updated economic indicators. These factors include the four major economic components that economists often use to date recessions: job growth (as measured by nonfarm payrolls), personal income, industrial production, and consumption. The fifth factor is inflation, as measured by changes in the Consumer Price Index. The sixth and seventh factors are more forward looking: the change in the S&P 500 stock market index, and the consensus forecast of gross domestic product growth over the next two economic quarters, as taken from the median of The Wall Street Journal’s monthly forecasting panel.

Nate Silver’s methodology is here and here. It is worth noting that Silver’s forecasts are for election day.

Sheldon Jacobson and co-authors [website link]

This model also develops predictions for each state based on polling data. Here, Jacobson and his collaborators use Bayesian estimators to estimate the outcomes for each state. A state’s voting history is used for its prior. State polling data (from Real Clear Politics) is used to estimate the posterior. In each poll, there are undecided voters. Five scenarios are used to allocate the undecided voters, ranging from a neutral outcome to strong Republican or Democratic showings. Dynamic programming is used to compute the probability that each candidate would win under the five scenarios for allocating undecided votes. It is worth noting that Jacobson’s method indicates the outcome of the Presidential election if it were held now; it doesn’t make adjustments for forecasting into the future.
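To illustrate how dynamic programming can turn state-level win probabilities into a national win probability, here is a minimal sketch. It assumes the states are independent and uses a made-up three-state example; I am not claiming this is the exact recursion in the Jacobson et al. paper.

```python
def electoral_vote_distribution(states):
    """Return dist where dist[v] = P(candidate wins exactly v electoral votes).

    states: list of (electoral_votes, win_probability) pairs.
    Assumption: state outcomes are independent of one another.
    """
    total = sum(votes for votes, _ in states)
    dist = [0.0] * (total + 1)
    dist[0] = 1.0                       # before any state is counted
    for votes, p in states:
        new = [0.0] * (total + 1)
        for v, prob in enumerate(dist):
            if prob == 0.0:
                continue
            new[v] += prob * (1.0 - p)  # candidate loses this state
            new[v + votes] += prob * p  # candidate wins this state
        dist = new
    return dist


def win_probability(states, needed):
    """P(candidate reaches at least `needed` electoral votes)."""
    return sum(electoral_vote_distribution(states)[needed:])


if __name__ == "__main__":
    toy_states = [(3, 0.9), (4, 0.5), (5, 0.2)]   # hypothetical numbers
    print(win_probability(toy_states, needed=7))
```

The appeal of this approach is that it gives the exact distribution of electoral votes in one pass over the states, with no simulation noise. For the real map, `needed` would be 270 out of 538.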

The Jacobson et al. methodology is outlined here and the longer paper is here.

Comparison and contrast:

One of the main differences is that Silver relies on regression whereas Jacobson uses Bayesian estimators. Silver uses polling data as well as external variables (see above) as variables within his model whereas Jacobson relies on polling data and the allocation of undecided voters.

Once models exist for state results, they have to be combined to predict the election outcome. Here, Silver relies on simulation whereas Jacobson relies on dynamic programming. Silver’s simulations appear to simulate his regression models and potentially exogenous factors. Both the simulation and dynamic programming approaches model inter-state interactions that do not appear to be independent.
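A simulation can capture states that move together by adding a shared national swing to every state in each simulated election. The sketch below shows that idea only. It is not Silver’s model, and the poll leads and standard deviations are hypothetical.

```python
import random


def simulate_election(state_leads, needed, national_sd=2.0, state_sd=3.0,
                      n_sims=5000, seed=0):
    """Fraction of simulated elections the candidate wins.

    state_leads: list of (electoral_votes, poll_lead_in_points) pairs.
    Each simulation draws one national swing shared by every state, plus
    independent state-level noise, so state outcomes are correlated.
    """
    rng = random.Random(seed)
    wins = 0
    for _ in range(n_sims):
        swing = rng.gauss(0.0, national_sd)          # shared by all states
        votes_won = sum(
            votes for votes, lead in state_leads
            if lead + swing + rng.gauss(0.0, state_sd) > 0
        )
        wins += votes_won >= needed
    return wins / n_sims


if __name__ == "__main__":
    toy_states = [(3, 4.0), (4, 1.0), (5, -2.0)]     # hypothetical leads
    print(simulate_election(toy_states, needed=7))
```

Setting `national_sd` to zero makes the states independent again, which is a quick way to see how much the shared swing changes the answer.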

Another difference is that Silver forecasts the vote on Election Day whereas Jacobson predicts the outcome if the race were held today (although Silver also provides a “now”-cast). To do so, Silver adjusts for post-convention bounces and for the conservative sway that occurs right before the election:

The model is designed such that this economic gravitational pull becomes less as the election draws nearer — until on Election Day itself, the forecast is based solely on the polls and no longer looks at economic factors.

This is interesting, because it implies that Silver double counts the economy (the economy influences voters who are captured by the polls). I’m not suggesting that this is a bad idea, since I blogged about how all forecasting models stress the importance of the economy in Presidential elections. It is worth noting that Silver’s “now”-cast is close to Jacobson’s prediction (98% vs. 100% as of 10/1).
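The fading “gravitational pull” in the quote above can be pictured as a weighted average whose economic weight shrinks to zero on Election Day. The sketch below uses a linear decay, a 200-day horizon, and a maximum weight of 0.5; all three are assumptions of mine, since the quote does not give the actual schedule.

```python
def economic_weight(days_until_election, horizon_days=200, max_weight=0.5):
    """Weight on the economic index; reaches 0 on Election Day.

    ASSUMPTION: linear decay over a hypothetical horizon. The real
    weighting schedule is not described in the quoted passage.
    """
    days = min(max(days_until_election, 0), horizon_days)
    return max_weight * days / horizon_days


def blended_margin(poll_margin, economic_margin, days_until_election):
    """Blend the poll-based and economy-based margins (in percentage points)."""
    w = economic_weight(days_until_election)
    return (1.0 - w) * poll_margin + w * economic_margin


# Made-up margins: polls say +4, the economic index says +1.
far_out = blended_margin(4.0, 1.0, days_until_election=200)
election_day = blended_margin(4.0, 1.0, days_until_election=0)
```

With these made-up inputs the forecast sits halfway between the two signals 200 days out and equals the poll margin on Election Day.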

Silver makes several adjustments to his model rather than relying solely on poll data. The economic index mentioned earlier is one of these adjustments. Another is the correction for the post-convention bounces (both of which have been weighted out by now). While Silver appears to do this well, the underlying assumption is that what worked in the past is relevant for the election today. This is probably a good assumption as long as we don’t go too far in the past. This election seems to have a few “firsts,” which suggests that the distant past may not be the best guide. For example, the economy has been terrible: this is the first time that the incumbent appears to be heading toward reelection under this condition.

Both models rely on good polls for predicting voter turnout. The polls in recent months have been conducted on a “likely voter” basis. From what I’ve read, identifying likely voters is the hardest part of making a prediction. The intuition is that it’s easy to conduct a poll, but it’s harder to predict how the responses will translate into votes. Silver explains why this issue is important in response to a CNN poll:

Among registered voters, [Mr. Obama] led Mitt Romney by nine percentage points, with 52 percent of the vote to Mr. Romney’s 43 percent. However, Mr. Obama led by just two percentage points, 49 to 47, when CNN applied its likely voter screen to the survey.

Thus, the race is a lot closer when looking at likely voters. Polling is a complex science, but the experts suggest that the race is closer than the polls indicate.

Jacobson’s model overwhelmingly predicts that Obama will be reelected, which is in stark contrast to other models that gave Romney a 20-30% chance of winning as of 9/16 and give him a ~15% chance of winning today (10/1). Jacobson’s model predicted an Obama landslide in 2008, which occurred. The landslide this time around seems to be due to a larger number of “safe” votes for Obama in “blue” states (see the image below). Romney has to win many battleground states to win the election. The probability of Romney winning nearly all of the battleground states he needs is ~0% (according to Jacobson as of 9/30). This is quite a bold prediction, but it appears to rely on the state polls being accurately calibrated for voter turnout. To address this, Jacobson uses his five scenarios, which suggest that even with a strong conservative showing, Romney has little chance of winning. Silver and Intrade predict a somewhat closer race, but Obama is still the clear favorite (e.g., Intrade shows that Romney has a 24.1% chance of winning as of 10/1).

Additional reading:

Special thanks to the two political junkies who gave me feedback on a draft of this blog post: Matt Saltzman and my husband Court.

Sheldon Jacobson’s election analytics predictions as of 9/16

It’s the economy, stupid: the pieces of information you need to know to forecast the Presidential election

Ezra Klein from the Washington Post created a forecasting model for the upcoming Presidential election.

The final model uses just three pieces of information that have been found to be particularly predictive: economic growth in the year of the election, as measured by the change in gross domestic product during the first three quarters; the president’s approval rating in June; and whether one of the candidates is the incumbent.

That may seem a bit thin. But it calls 12 of the past 16 elections right. The average error in its prediction of the two-party vote share is less than three percentage points.

It is interesting that there are only three parameters in the model. I should note that highly qualified academic experts informed this model, so this model is defensible (maybe I give my fellow academics too much credit (-:  ).

Contrast these three parameters with Nate Silver’s model (from the NY Times): Presidential approval ratings, GDP growth rate in the fourth year of the incumbent’s term, and the ideology score of the challenger.

The Keys to the White House model by Allan Lichtman and Vladimir Keilis-Borok has 13 parameters that are all equally weighted. I wouldn’t necessarily conclude that more parameters means better accuracy. The equal weights may be a big limitation, since one could argue that not all of the 13 parameters carry the same weight in the minds of the voters. This may necessitate the use of additional parameters to accurately predict the race.
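Equal weighting means the Keys model boils down to counting. Here is a small sketch of that idea. The key names follow the 13 parameters of the model; the decision threshold (the incumbent party is predicted to lose when six or more keys are false) is the rule as I understand it from Lichtman's published description rather than something stated in this post, and the example values are purely hypothetical.

```python
KEYS = [
    "party mandate", "contest", "incumbency", "third party",
    "short term economy", "long term economy", "policy change",
    "social unrest", "scandal", "foreign/military failure",
    "foreign/military success", "incumbent charisma", "challenger charisma",
]


def predict_incumbent_party(key_values, max_false=5):
    """Return (incumbent_party_wins, number_of_false_keys).

    key_values maps each key name to True (the statement favors the
    incumbent party) or False. Every key counts the same.
    ASSUMPTION: max_false=5 encodes the "six or more false keys" rule,
    which is not spelled out in this post.
    """
    missing = [key for key in KEYS if key not in key_values]
    if missing:
        raise ValueError(f"missing keys: {missing}")
    false_count = sum(1 for key in KEYS if not key_values[key])
    return false_count <= max_false, false_count


# Hypothetical assessment (not a judgment about any real election):
# every key true except the two economy keys and incumbent charisma.
example = {key: True for key in KEYS}
for key in ("short term economy", "long term economy", "incumbent charisma"):
    example[key] = False
example_result = predict_incumbent_party(example)
```

A weighted version would replace the simple count with a weighted sum, which is exactly where the question of how much each key matters to voters would come in.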

In summary, let’s look at a table of the parameters used in three models to predict the Presidential election. It’s amazing how little these models have in common and they are all “good” (i.e., they have predicted past election outcomes well). The one overlapping piece is the short-term change in the economy: if the GDP has improved in the recent past (i.e., this year!), then the President will be re-elected. This gives credence to the catch-phrase, “It’s the economy, stupid.”

| Keys to the White House (equal weights) | Nate Silver (unequal weights) | Ezra Klein (unequal weights) |
| --- | --- | --- |
| Party Mandate: After the midterm elections, the incumbent party holds more seats in the US House of Representatives than after the previous midterm elections | - | - |
| Contest: There is no serious contest for the incumbent party nomination | - | - |
| Incumbency: The incumbent party candidate is the sitting president | - | Incumbency |
| Third party: There is no significant third party or independent campaign | - | - |
| Short term economy: The economy is not in recession during the election campaign | Economic Growth: G.D.P. growth during the election year itself | Economic Growth: change in gross domestic product during the first three quarters of the election year |
| Long term economy: Real per capita economic growth during the term equals or exceeds mean growth during the previous two terms | - | - |
| Policy change: The incumbent administration effects major changes in national policy | - | - |
| Social unrest: There is no sustained social unrest during the term | - | - |
| Scandal: The incumbent administration is untainted by major scandal | - | - |
| Foreign/military failure: The incumbent administration suffers no major failure in foreign or military affairs | - | - |
| Foreign/military success: The incumbent administration achieves a major success in foreign or military affairs | - | - |
| Incumbent charisma: The incumbent party candidate is charismatic or a national hero | - | - |
| Challenger charisma: The challenging party candidate is not charismatic or a national hero | - | - |
| - | Presidential approval ratings | Presidential approval ratings |

Nate Silver’s discussion of the limitations of his model (or any model, really) in forecasting the Presidential election is worth posting here:

By design, [my forecast] is not an exceptionally precise forecast. There are all types of factors that the model does not explicitly consider, among them the possibility of third-party candidates or differences between the popular vote and the Electoral College. Moreover, voter perceptions about the economy, or the ideological positioning of the candidates, may differ in practice from what the objective data say about them. Rather than pretending to have all the answers, the model knows how much it doesn’t know and allows for a reasonably wide range of possible outcomes.

Who will be the Republican nominee?

The race for the Republican Presidential nomination has changed so much in the past week that it is hard to keep up. I enjoy reading Nate Silver’s NY Times blog when I have a chance. A week ago (Jan 16) he wrote a post entitled “National Polls Suggest Romney is Overwhelming Favorite for GOP Nomination,” where he noted that Romney had a 19-point lead in the polls. He wrote:

Just how safe is a 19-point lead at this point in the campaign? Based on historical precedent, it is enough to all but assure that Mr. Romney will be the Republican nominee.

Silver compared the average size of the lead following the New Hampshire primary across the past 20+ years of Presidential campaigns. He sorted the results in decreasing order of the “Size of Lead” the top candidate had in the polls. The image below is from Silver’s blog, and it suggests that Romney has this race all but wrapped up.

It looks almost impossible for Romney to blow it. I stopped following the election news until Gingrich surged ahead and the recount in Iowa led to Santorum winning the caucus.

A mere week later, it looks like Romney’s campaign is in serious trouble. Today (Jan 23), Silver wrote a post entitled “Some Signs GOP Establishment Backing Romney is Tenuous.” His forecasting model for the Florida primary on January 31 now predicts that Newt Gingrich has an 81% chance of winning. This is largely because Silver weighs “momentum” in his model, which Gingrich has in spades.

Two months ago, I blogged about how Obama will win the election next year. I was only half-serious about my prediction. Although the model seems to work, it is based on historical trends that may not sway voters today. Plus, I had no idea who the Republican nominee would be. Despite my prediction, I certainly envisioned a tight race that Obama could lose. Not so much these days.

A lot has changed in the past week (and certainly in the past two months!)

My question is, what models are useful for making predictions in the Republican race? Will the issue of “electability” ever become important to primary voters?


