Punk Rock Operations Research

~*~ peace, love, and operations research ~*~

Posts Tagged ‘march madness’

why is it so easy to forecast the Presidential election and so hard to forecast the NCAA basketball tournament?

Posted by Laura McLay on March 26, 2013

This blog post is inspired by my disappointing NCAA March Madness bracket. I used math modeling to fill out my bracket, and I am currently in the 51st percentile on ESPN. On the upside, all of my Final Four picks are still active so I have a chance to win my pool. I am worried that my bracket has caused me to lose all credibility with those who are skeptical of the value of math modeling. After all, guessing can lead to a better bracket. Isn’t Nate Silver a wizard? How come his bracket isn’t crushing the competition? Here, I will make the case that a so-so bracket is not evidence that the math models are bad. To do so, I will discuss why it is so easy to forecast the Presidential election and so hard to forecast the NCAA basketball tournament.

Many models for the Presidential election and the basketball tournament are similar in that they use various inputs to predict the probability of an outcome. I have discussed several models for forecasting the Presidential election [Link] and the basketball tournament [Link].

All models that didn’t solely rely on economic indicators chose Obama to be the favorite, and nearly all predicted 48+ of the states correctly. In other words, even a somewhat simplistic model to forecast the Presidential election could predict the correct outcome 96% of the time. I’m not saying that the forecasting models out there were simplistic – but simply going with poll averages gave good estimates of the election outcomes.

The basketball tournament is another matter. Nate Silver has blogged about predicting tournament games using similar math models. Here, we can only predict the correct winner 71-73% of the time [Link]:

Since 2003, the team ranked higher in the A.P. preseason poll (excluding cases where neither team received at least 5 votes) has won 72 percent of tournament games. That’s exactly the same number, 72 percent, as the fraction of games won by the better seed. And it’s a little better than the 71 percent won by teams with the superior Ratings Percentage Index, the statistical formula that the seeding committee prefers. (More sophisticated statistical ratings, like Ken Pomeroy’s, do only a little better, with a 73 percent success rate.)

To do well in your bracket, you would need to make small marginal improvements over the naive model of always picking the better seed (72% success rate). A 96% success rate would be unrealistic here; an improved model that gets 75% of the games correct would give you a big advantage. A big advantage here means that if you used your improved method in 1000 tournaments, it would do better on average than the naive method. In any particular tournament, the improved method may still lead to a poor bracket. It’s a small sample.
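The 72% versus 75% comparison can be made concrete with a back-of-the-envelope calculation. The sketch below assumes, unrealistically, that the two methods’ picks are independent Bernoulli trials with success rates 0.72 and 0.75 over a 63-game tournament, and uses a normal approximation to estimate how often the better method picks more games correctly in one tournament versus totaled over 1000 tournaments. The function names are hypothetical.

```python
import math


def normal_cdf(z: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def prob_better_model_wins(p_naive: float, p_better: float,
                           games: int, tournaments: int = 1) -> float:
    """Normal approximation to P(better model gets strictly more picks right),
    totaled over `tournaments` tournaments of `games` games each.
    Assumes each model's picks are independent Bernoulli trials."""
    n = games * tournaments
    mean = n * (p_better - p_naive)
    var = n * (p_better * (1 - p_better) + p_naive * (1 - p_naive))
    # Continuity correction: "strictly more correct" means a difference >= 1.
    return 1.0 - normal_cdf((0.5 - mean) / math.sqrt(var))


one = prob_better_model_wins(0.72, 0.75, games=63)
many = prob_better_model_wins(0.72, 0.75, games=63, tournaments=1000)
print(f"one tournament: {one:.2f}")
print(f"1000 tournaments: {many:.4f}")
```

Under these assumptions the better method comes out ahead in only about 6 of 10 single tournaments, but over 1000 tournaments its edge is essentially certain. In practice the two methods agree on most picks, so their results are correlated and the independence assumption is only a rough stand-in.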

The idea here is similar to batting averages in baseball. It is not really possible to notice the difference between a 0.250 batter and a 0.300 batter in a single game or even across the games in a single week. The 0.250 hitter may even have a better batting average in any given week. Over a 162-game season, however, the difference shows up clearly in the batters’ averages. The NCAA tournament does not have the advantage of averaging performance over a large number of games: we are asked to predict a small set of outcomes in a single tournament where things will not have a chance to average out (it’s The Law of Small Numbers).

It’s worth noting that actual brackets get fewer than 72% of the games correct because errors compound. If you put Gonzaga in the Elite Eight and they lose in the (now) third round, never reaching the Sweet Sixteen, then one wrong game prediction leads to two wrong games in the bracket.
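A rough way to see why bracket accuracy falls below the per-game rate: assume (a simplification, not a published model) that every team you advance wins each of its games independently with probability p. Your pick for a round-r game is then right only if that team wins all r of its games, which happens with probability p^r. The sketch uses p = 0.72 from the seed-based rate quoted above and ignores play-in games.

```python
def expected_correct_picks(p: float, rounds: int = 6) -> list[float]:
    """Expected number of correct picks in each round of a 64-team bracket.

    Round r has 64 >> r games (32, 16, 8, 4, 2, 1). Under the simplifying
    assumption that each advanced team wins every game with probability p,
    independently, a round-r pick is correct with probability p**r.
    """
    return [(64 >> r) * p**r for r in range(1, rounds + 1)]


per_round = expected_correct_picks(0.72)
total = sum(per_round)
print([round(x, 2) for x in per_round])
print(f"{total:.1f} of 63 picks correct ({total / 63:.0%})")
```

With p = 0.72 this gives roughly 36 of 63 picks, or about 57%, even though each individual game is called correctly 72% of the time.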

It’s also worth noting that some games are easier to predict than others. In the (now) second round (what most of us think of as the first round), no 1 seed has ever lost to a 16 seed, and 2 seeds have only rarely lost to 15 seeds (it’s happened 7 times). Likewise, some states are easy to predict in Presidential elections (e.g., California and Oklahoma). The difference is that there are few easy-to-predict games in the tournament, whereas there are many easy-to-predict states in a Presidential election. Politico lists 9 swing states for the 2012 election. That is, one could predict the outcome in 82% of the states (41 of 50) with a high degree of confidence by using common sense. In contrast, one can confidently predict only ~12% of the games in the round of 64 (the four games involving 1 seeds, or 4 of 32) using common sense. Therefore, I would argue that there is more parity in college basketball than there is in politics.

How is your bracket doing?

Posted in Uncategorized | Tagged: , , | 10 Comments »

methodologies used to predict the outcome of the basketball tournament

Posted by Laura McLay on March 21, 2013

My last post was about how to choose a winning bracket in the NCAA men’s basketball tournament. I linked to several tools for predicting which team is likely to win a game. These tools

  1. provide a rank ordering of the teams from best to worst,
  2. compute the odds of which team would win in a matchup based on their tournament seed, or
  3. provide odds of a team making it to different levels of the tournament based on specific matchups.

I linked to the methodologies used by these tools in my last post but didn’t get into the details. Here, I am going to discuss the methodologies in more detail, focusing on tools that predict how far teams will advance based on specific matchups (#3 above).

Wayne Winston noted in Mathletics that matchups are not transitive. That is, if team A is favored to beat team B and team B is favored to beat team C, this does not imply that team A is favored to beat team C. Thus, the team rankings (#1 above) are not a perfect tool for predicting specific matchups. He uses “power ratings” to compute how many points better one team is than the other (a point spread), taking home-court advantage and other factors into account. He then converts the point spread to a probability of winning using historical game outcomes (basically, a normal distribution with a history-derived standard deviation) or simulates the games to compute the odds of winning.
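The spread-to-probability step can be sketched in a few lines. This assumes, as described above, that the actual margin of victory is normally distributed around the point spread; the standard deviation of 11 points and the helper names are hypothetical placeholders, not Winston’s published values, and the sketch does not cover his simulation approach.

```python
import math


def win_probability(point_spread: float, sigma: float = 11.0) -> float:
    """P(team A wins) if A's margin of victory ~ Normal(point_spread, sigma).

    P(margin > 0) = Phi(point_spread / sigma), computed with math.erf.
    """
    return 0.5 * (1.0 + math.erf(point_spread / (sigma * math.sqrt(2.0))))


def matchup_probability(rating_a: float, rating_b: float,
                        site_edge_a: float = 0.0, sigma: float = 11.0) -> float:
    """Hypothetical helper: power-rating difference plus any home/site edge for A."""
    return win_probability(rating_a - rating_b + site_edge_a, sigma)


print(round(matchup_probability(12.0, 5.0), 3))  # a 7-point favorite
```

A 7-point favorite comes out at roughly 0.74 under these assumptions. Because the probability here depends only on the rating difference, this particular sketch is transitive; any non-transitivity has to come from inputs beyond a single rating per team.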

Nate Silver’s model is interesting in that it takes many inputs, including the ranking tool outcomes from #1 above. His model blends four ranking models to take a more pluralistic view of who might win. I think this is a strength because it uses the wisdom of crowds (a small crowd in this case). Each of the four tools contributes 1/6 of the total power rating (a margin of victory). Seed number and whether the team was ranked in preseason polls each contribute another 1/6 of the power rating. He then makes adjustments for the geography of the game and for player injuries and absences. He doesn’t describe how he computes his forecast probabilities in detail, but I suspect that his approach is similar to Wayne Winston’s. A team’s power rating is also adjusted in each round based on the outcomes of previous rounds to account for potential errors in the power rating, another strength of the model.
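The equal-weight blend described above is simple to write down. In this sketch each of the six components is assumed to already be expressed on the same margin-of-victory scale (how Silver converts seeds and preseason rankings to points is not described here), and the site and injury adjustments are hypothetical additive terms.

```python
from dataclasses import dataclass


@dataclass
class TeamInputs:
    # All components are assumed to be pre-converted to points of margin
    # versus an average tournament team (hypothetical scale).
    computer_ratings: tuple[float, float, float, float]  # the four ranking systems
    seed_rating: float
    preseason_rating: float


def power_rating(team: TeamInputs, site_adj: float = 0.0,
                 injury_adj: float = 0.0) -> float:
    """Equal 1/6 weights on six components, then additive adjustments."""
    components = [*team.computer_ratings, team.seed_rating, team.preseason_rating]
    blended = sum(components) / len(components)  # len == 6, so each weight is 1/6
    return blended + site_adj + injury_adj


team = TeamInputs(computer_ratings=(14.0, 12.5, 13.0, 15.5),
                  seed_rating=12.0, preseason_rating=11.0)
print(power_rating(team, site_adj=1.0, injury_adj=-2.0))
```

The difference between two teams’ blended ratings then plays the role of a point spread for a specific matchup.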

Finally, Luke Winn and John Ezekowitz’s model doesn’t use power ratings [methodology here]; it instead applies survival analysis to predict when a team may drop out of the tournament. This model computes hazard rates for each team based on the team’s RPI and Ken Pomeroy’s ranking. They also consider

  1. consistency,
  2. tournament experience,
  3. out-degree network centrality that captures the number of games played and won against other NCAA tournament teams (see picture below), and
  4. the negative interaction of the Experience and Out-Degree Centrality variables.

Cox proportional hazards regression was then used to re-rank the teams.
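For readers unfamiliar with the Cox model: the hazard for a team with covariates x is h(t | x) = h0(t) exp(β·x). The partial likelihood leaves the baseline hazard h0 unspecified, so teams can be ranked by their risk score exp(β·x) alone, and if a baseline survival curve is available, S(t | x) = S0(t)^exp(β·x). The sketch below uses the covariates listed above, but every coefficient and the baseline survival curve are made-up placeholders, not Winn and Ezekowitz’s estimates; x = 0 represents a reference team.

```python
import math

# Hypothetical coefficients; a negative value lowers the hazard of elimination.
BETA = {
    "rating": -0.08,       # e.g., a Pomeroy-style rating margin
    "consistency": -0.30,
    "experience": -0.20,
    "out_degree": -0.25,
    "exp_x_outdeg": 0.10,  # interaction of experience and out-degree centrality
}

# Hypothetical baseline survival S0(r): P(reference team alive after round r).
BASELINE_SURVIVAL = [0.5, 0.25, 0.125, 0.0625, 0.03125, 0.015625]


def risk_score(x: dict[str, float]) -> float:
    """exp(beta . x), with the interaction term built from its two parents."""
    features = dict(x, exp_x_outdeg=x["experience"] * x["out_degree"])
    return math.exp(sum(BETA[k] * features[k] for k in BETA))


def survival_after_round(x: dict[str, float], r: int) -> float:
    """S(r | x) = S0(r) ** exp(beta . x) under proportional hazards."""
    return BASELINE_SURVIVAL[r - 1] ** risk_score(x)


def rerank(teams: dict[str, dict[str, float]]) -> list[str]:
    """Lowest hazard (smallest risk score) first."""
    return sorted(teams, key=lambda name: risk_score(teams[name]))


teams = {
    "A": {"rating": 20.0, "consistency": 1.0, "experience": 1.0, "out_degree": 1.0},
    "B": {"rating": 0.0, "consistency": 0.0, "experience": 0.0, "out_degree": 0.0},
}
print(rerank(teams), round(survival_after_round(teams["A"], 4), 3))
```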

Other recommended reading on March Madness and analytical methods:

Posted in Uncategorized | Tagged: , | 1 Comment »

how to pick a winning bracket using analytics

Posted by Laura McLay on March 17, 2013

I’ve written about the NCAA basketball tournament many times before; click on my March Madness tag for my past posts. This time, I am going to summarize the different ways to select the winning teams in your bracket. There are many ways to choose a bracket, but using math models and analytics techniques seems to work best. Plus, it’s more fun than just guessing.

As I see it, there are two ways to pick a bracket: you can look at the team matchups and choose or you can look at the seed number and choose. I lean toward team level matchups when creating my own brackets, but I use the seed numbers, too. Some seed matchups (7/10 seeds for example) have historically high rates of producing upsets.

There are several tools that rank teams, and these ranking tools provide a way to see which team is “better” – the one with the higher ranking. One traditional tool is the RPI (access the RPI rankings here), but it’s not considered to be very good.
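The RPI itself is a simple weighted average: 25% a team’s winning percentage (WP), 50% its opponents’ winning percentage (OWP), and 25% its opponents’ opponents’ winning percentage (OOWP), with OWP following the usual convention of excluding games against the team being rated. The sketch below computes it from a toy list of (winner, loser) games. It ignores home and road adjustments and other refinements and is meant only to show the structure of the formula; the function names and schedule are hypothetical.

```python
from typing import Optional

Game = tuple[str, str]  # (winner, loser)


def win_pct(team: str, games: list[Game], exclude: Optional[str] = None) -> float:
    """Winning percentage, optionally ignoring games against `exclude`."""
    wins = losses = 0
    for winner, loser in games:
        if exclude is not None and exclude in (winner, loser):
            continue
        if winner == team:
            wins += 1
        elif loser == team:
            losses += 1
    played = wins + losses
    return wins / played if played else 0.0


def opponents(team: str, games: list[Game]) -> list[str]:
    """One entry per game played, so repeat opponents count repeatedly."""
    return [loser if winner == team else winner
            for winner, loser in games if team in (winner, loser)]


def owp(team: str, games: list[Game]) -> float:
    opps = opponents(team, games)
    return sum(win_pct(o, games, exclude=team) for o in opps) / len(opps)


def oowp(team: str, games: list[Game]) -> float:
    opps = opponents(team, games)
    return sum(owp(o, games) for o in opps) / len(opps)


def rpi(team: str, games: list[Game]) -> float:
    return 0.25 * win_pct(team, games) + 0.50 * owp(team, games) + 0.25 * oowp(team, games)


games = [("A", "B"), ("A", "C"), ("B", "C")]
for name in "ABC":
    print(name, rpi(name, games))
```

Half the weight sits on opponents’ records, which is one reason the RPI is often criticized: it measures schedule strength heavily but ignores margins of victory.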

There are a number of more sophisticated ranking tools that use math modeling.

These ranking tools are great and do well at predicting individual games, and they do extremely well on average. This means that these methods would do the best when averaged over, say, 1000 basketball tournaments. We don’t have 1000 tournaments – we have just one. Keep that in mind. These rankings also do not necessarily give insight into a matchup in a specific game.

Two models consider individual matchups when computing how far each team will go in the tournament.

The other way to pick winners is to look at the seeds in the matchups. This is useful when a weak team from a major conference plays a top mid-major team. See this Business Week article on Sheldon Jacobson’s advice for picking a good bracket. There is one tool developed by Jacobson and his collaborators that focuses on seeds:

Here is one last thing to keep in mind:

  • Preseason rankings matter: teams that are in the top 25 before the season starts are likely to go further in the tournament than their seeds suggest, and likewise, top 25 teams at the end of the season who were unranked at the beginning of the season are likely to go home early.

There are other articles out there on how to pick a winning bracket. Here is what I recommend reading:

Good luck with your bracket!

Posted in Uncategorized | Tagged: | 4 Comments »

hints for picking a winning bracket

Posted by Laura McLay on March 12, 2012

I have several blog posts about how to pick a winning bracket from previous years. Check out the March Madness tag to find these blog posts.  Some of the most useful posts for creating a bracket are:

Guidance for this year’s bracket can be found here:  The LRMC rankings are here, Wayne Winston’s tournament odds are here, and Sheldon Jacobson’s guide to finding the right mix of seeds is here.

Posted in Uncategorized | Tagged: | Leave a Comment »

How likely is VCU’s run to the Final Four? A VCU professor and sports nerd reflects on the likelihood of her school’s path to the Final Four

Posted by Laura McLay on March 30, 2011

I am thrilled that VCU made the Final Four this year.  My school’s team had an unlikely path to the Final Four, so unlikely that only 2 of 5.9 million ESPN brackets correctly picked all Final Four teams.  Sports nerds unanimously agree that VCU’s run has indeed been unlikely.

  • Nate Silver at the NY Times tweeted that “VCU reaching Final 4 may be least likely event in the history of the NCAAs. Penn in ’79 is close. So is Villanova winning it all in ’85.” He also wrote an excellent article summarizing the numbers, which show that VCU was indeed less likely to make the Final Four than the other 11 seeds in the tournament.
  • Before the tournament began, Wayne Winston gave VCU a 1-in-1000 chance of reaching the Final Four (using the Sagarin ratings) and a 1-in-5000 chance of winning the entire tournament.
  • Andy Glockner at Sports Illustrated summarizes a few stats about how likely VCU’s run has been.  According to Ken Pomeroy, VCU had a 1-in-3333 chance of making it to the Final Four and a 1-in-203,187 shot to win the title, one of the worst odds of the teams in the field.
  • Slate’s Hang Up and Listen sports podcast discusses the Final Four odds and statistics. This enjoyable podcast sheds light on quite a few quantitative factors that relate to the tournament and provides several good links for further reading.
  • This Final Four has the highest seed total ever, and it is the first time since seeding began in 1979 that no 1 or 2 seeds are in the Final Four.

VCU making the Final Four is not “proof” that we should throw out the expert advice from sports nerds because anything can happen. While anything can indeed happen, not every outcome is equally likely. Most outcomes are so unlikely that we will never see them in our lifetimes (a Final Four made up of all four 16 seeds would occur once every eight hundred trillion years on average). It’s like monkeys randomly typing away: given enough time, they will rewrite Shakespeare, but don’t expect to see it happen any time soon. However, even though most of the potential tournament outcomes have an infinitesimally small chance of occurring, when you add them all up, it is almost certain that a few unlikely things will occur (which is why we always see a few upsets in the first two rounds).
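The “add them all up” point can be checked with a short calculation. The sketch below computes the exact probability that at least k of several independent events occur (a Poisson binomial distribution) by dynamic programming. The upset probabilities are hypothetical round numbers for the 1-vs-16 through 5-vs-12 first-round games in each of four regions, not historical rates, and the games are assumed independent.

```python
def prob_at_least(k: int, probs: list[float]) -> float:
    """P(at least k of the independent events with these probabilities occur)."""
    dist = [1.0]  # dist[j] = P(exactly j events among those processed so far)
    for p in probs:
        nxt = [0.0] * (len(dist) + 1)
        for j, q in enumerate(dist):
            nxt[j] += q * (1 - p)   # this event does not occur
            nxt[j + 1] += q * p     # this event occurs
        dist = nxt
    return sum(dist[k:])


# Hypothetical upset chances for the 1v16, 2v15, 3v14, 4v13 and 5v12 games.
per_region = [0.01, 0.06, 0.15, 0.20, 0.33]
upset_probs = per_region * 4  # four regions

for k in (1, 2, 3):
    print(k, round(prob_at_least(k, upset_probs), 3))
```

No single one of these upsets is more likely than not, yet with these numbers at least one of them happens with probability of about 0.97.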

On the contrary, the excellent analyses from sports nerds produce the predictions that work best on average. That is, averaged over a large number of tournaments, their predictions will yield the best results (meaning that brackets produced using advice from the experts who have crunched the numbers will win the office pool most frequently). That is because the numbers point to the outcomes that are most likely to occur. There is an enormous number of potential outcomes in the tournament (2^67 ≈ 1.5 × 10^20, which is way, way more than the number of stars in the Milky Way!), and it helps to have some quantitative advice to prune most of the unlikely outcomes, nearly all of which are even less likely to happen than VCU making the Final Four! Even the most likely outcomes rarely occur: we would expect a Final Four made up of all four 1 seeds only about once every 39 years (it has happened once). The problem is that things don’t average out in a single year; we have just one tournament this year. In any single year, something unlikely, like VCU reaching the Final Four, has a chance of occurring (albeit a small one).

Now that VCU is in the Final Four, what are their odds of winning the tournament? VCU may have initially had an infinitesimal 1-in-203,187 chance of winning the tournament, but given that they have made it to the Final Four, their odds of winning it all are far better (they’ve completed the hard part of being one of four teams left). Wayne Winston estimates that they have a 0.11 chance (roughly 1 in 9) of winning the tournament. Using past tournament outcomes, Sheldon Jacobson has shown that seeds don’t matter after the Sweet Sixteen round, which means that VCU essentially has a 1-in-4 chance of winning the tournament. The truth is likely somewhere in between, which means that VCU has an excellent chance of being the national champion. Let’s go Rams!
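The jump from a 1-in-203,187 title chance to a much larger one is conditional probability: because winning the title requires reaching the Final Four, P(title | Final Four) = P(title) / P(Final Four). The sketch applies this to the pre-tournament figures quoted earlier in this post. Those pre-tournament models do not update VCU’s strength after its run or account for its actual Final Four opponents, which is one reason the updated estimates in this paragraph differ.

```python
from fractions import Fraction


def title_given_final_four(p_final_four: Fraction, p_title: Fraction) -> Fraction:
    """P(title | Final Four) = P(title and Final Four) / P(Final Four),
    and P(title and Final Four) = P(title) since every champion reached the Final Four."""
    return p_title / p_final_four


winston = title_given_final_four(Fraction(1, 1000), Fraction(1, 5000))
pomeroy = title_given_final_four(Fraction(1, 3333), Fraction(1, 203187))
print(winston, float(winston))  # Winston's pre-tournament (Sagarin-based) odds
print(pomeroy, float(pomeroy))  # Ken Pomeroy's pre-tournament odds
```

Winston’s pre-tournament numbers imply a conditional chance of 1 in 5, while Pomeroy’s imply only about 1.6%, so the answer depends heavily on how strong each model thought VCU was to begin with.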

Related links:

Posted in Uncategorized | Tagged: , | 1 Comment »

bracket tip of the day: pay attention to preseason rankings

Posted by Laura McLay on March 16, 2011

I missed Nate Silver’s NY Times blog post last week about the history of the NCAA basketball tournament based on preseason rankings (instead of merely seeds).  The teams that were not ranked in the AP preseason poll at the beginning of the season tend to underperform in the tournament when compared to other teams with the same seed.

[T]he preseason poll is essentially a prediction of how the teams are likely to perform. The writers who vote in the poll presumably consider things like coaching, the quality of talent on the roster, and how the team has performed in recent seasons. Although we all like to make fun of sportswriters, these predictions are actually pretty decent. Since 2003, the team ranked higher in the A.P. preseason poll (excluding cases where neither team received at least 5 votes) has won 72 percent of tournament games. That’s exactly the same number, 72 percent, as the fraction of games won by the better seed. And it’s a little better than the 71 percent won by teams with the superior Ratings Percentage Index, the statistical formula that the seeding committee prefers. (More sophisticated statistical ratings, like Ken Pomeroy’s, do only a little better, with a 73 percent success rate.)

When I teach multiobjective decision analysis, I mention how cognitive biases lead us to be overconfident about our initial information. Nate Silver’s example, however, suggests the opposite: we tend to underweight the original predictions in favor of metrics available at the end of the season (win-loss records, RPI, various team rankings, etc.). It’s a nice counterexample showing that bias is a two-way street.

As far as your bracket is concerned, Nate Silver’s blog post suggests that teams like Notre Dame, who was unranked when the season began, are unlikely to get as far in the tournament as their seed might suggest.

Related posts:

Posted in Uncategorized | Tagged: , | Leave a Comment »

bracketology links for team rankings

Posted by Laura McLay on March 14, 2011

Here are a few links about the NCAA basketball tournament.  If you find any good OR/MS bracketology articles, please post them in the comments.  Every year, I try to blog through the tournament, but seeing as today is my due date, I probably won’t be in any condition to blog in the near-future.  I’m tucking a copy of the bracket into my hospital bag in case I have the energy to casually follow the tournament (although I will have more important things on my mind this year!).

Here are three different lists that rank the teams in the tournament using various OR and statistical methods:

A pdf of the bracket is pretty handy.  Also, check out my post from yesterday for more bracketology information.

Posted in Uncategorized | Tagged: , | Leave a Comment »

Bracket Odds for March Madness: A tool for picking a winning bracket

Posted by Laura McLay on March 13, 2011

Will a one seed win the tournament? How many 4-16 seeds will be in the Final Four? Bracket Odds, a probabilistic analysis tool by Sheldon Jacobson at the University of Illinois, provides the answers. It is one of a series of tools that more quantitative sports fans can use to pick better brackets.

Rather than making a prediction for a specific matchup (e.g., Duke vs. VCU), Bracket Odds makes seed-based predictions that are probabilistic, not absolute.  The recommendations are based on analyzing patterns from the past tournaments and prior seed matchups in each round of the tournament using a truncated geometric distribution.

Sheldon Jacobson recommends picking Final Four teams from among the 1, 2, and 3 seeds, since these combinations result in the most likely outcomes. Here is his reasoning:

[T]he probability of the Final Four comprising the four top-seeded teams is 0.026, or once every 39 years. Meanwhile, the probability of a Final Four of all No. 16 seeds – the lowest-seeded teams in the tournament – is so small that it has a frequency of happening once every eight hundred trillion years.
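To see how a truncated geometric distribution can produce numbers like these, here is a toy version: assume each region’s Final Four representative has seed k with probability proportional to p(1 - p)^(k - 1) for k = 1 to 16, independently across regions. The parameter p = 0.4 is hand-picked for illustration; Jacobson’s Bracket Odds model is fit to historical seed matchups round by round and may be structured quite differently.

```python
def truncated_geometric(p: float, n_seeds: int = 16) -> list[float]:
    """P(seed = k) proportional to p * (1 - p)**(k - 1), truncated to k = 1..n_seeds."""
    weights = [p * (1 - p) ** (k - 1) for k in range(1, n_seeds + 1)]
    total = sum(weights)
    return [w / total for w in weights]


P_HYPOTHETICAL = 0.4  # hand-picked for illustration, not a fitted value
seed_dist = truncated_geometric(P_HYPOTHETICAL)

# Regions are assumed independent, so a specific Final Four's probability is a product.
p_all_ones = seed_dist[0] ** 4
p_all_sixteens = seed_dist[15] ** 4
print(f"four 1 seeds: {p_all_ones:.3f} (once every {1 / p_all_ones:.0f} years)")
print(f"four 16 seeds: once every {1 / p_all_sixteens:.1e} years")
```

With p = 0.4 this toy model lands close to both figures in the quote (about 0.026, or once every 39 years, for four 1 seeds, and on the order of eight hundred trillion years for four 16 seeds), which shows how quickly a geometric tail shrinks.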

Sheldon Jacobson also writes about March Madness math in an article in the latest issue of OR/MS Today (available to INFORMS members). He gives a few hints about how to fill out a winning bracket:

In its most basic form, the game of basketball can be described as a sequence of dependent (Bernoulli) trials with well-defined outcomes. The sum of the resulting outcomes produces a final score. A superbly talented team will consistently defeat a much weaker opponent, even if the talented team plays very poorly and their weaker adversary plays well. This is why a No. 16 seed has never (so far) beaten a No. 1 seed in the first round of the tournament.

Everyone loves upsets, which occur with great regularity and predictability every year, in the first two rounds of the tournament. On average, more than four teams seeded No. 11 to 15 win a first round game; five such upsets occurred in 2010, the same number seen in both 2008 and 2009. On average, more than three teams seeded No. 7 to 14 reach the Sweet Sixteen; four such teams were so fortunate in 2010. In fact, it is rare not to see a team seeded No. 11 or lower in the Sweet Sixteen; this has only happened four times since 1985.
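Jacobson’s description of a game as a sequence of trials can be illustrated with a deliberately crude model. The sketch treats each of a fixed number of possessions as an independent Bernoulli trial that scores 2 points with probability q (Jacobson describes the trials as dependent, so independence is a simplifying assumption here) and computes exactly how often the weaker team outscores the stronger one. The possession count and scoring rates are hypothetical.

```python
from math import comb


def binom_pmf(k: int, n: int, p: float) -> float:
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def upset_probability(possessions: int, q_strong: float, q_weak: float) -> float:
    """Exact P(weak team scores strictly more) when each side gets `possessions`
    independent 2-point attempts; comparing makes is equivalent to comparing points."""
    n = possessions
    strong = [binom_pmf(k, n, q_strong) for k in range(n + 1)]
    weak = [binom_pmf(k, n, q_weak) for k in range(n + 1)]
    total, strong_below = 0.0, 0.0  # strong_below = P(strong makes < k)
    for k in range(n + 1):
        total += weak[k] * strong_below
        strong_below += strong[k]
    return total


typical = upset_probability(70, q_strong=0.55, q_weak=0.35)
off_night = upset_probability(70, q_strong=0.50, q_weak=0.40)  # favorite cold, underdog hot
print(round(typical, 4), round(off_night, 4))
```

Even when the favorite plays poorly and the underdog plays well, the favorite still wins most of the time in this toy model, although the upset chance grows sharply.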

Joel Sokol provides team rankings using the LRMC method, which I have found to be useful for predicting the outcome of a game based on the teams rather than the seeds.  It has performed well in the past, and I’ve found that it does well with predicting upsets in the early rounds.
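LRMC stands for Logistic Regression/Markov Chain. The sketch below is only in the spirit of that method: each game moves a random “voter” between the two teams according to a logistic function of the home team’s margin, and teams are ranked by the chain’s stationary distribution. The logistic parameters `a` and `b` and the toy schedule are hypothetical; the real LRMC fits its logistic model to historical data, and its details may differ.

```python
import math


def markov_ratings(teams: list[str], games: list[tuple[str, str, float]],
                   a: float = 0.1, b: float = -0.3,
                   iters: int = 1000) -> dict[str, float]:
    """games: (home, away, home_margin). Returns stationary probabilities."""
    idx = {t: i for i, t in enumerate(teams)}
    n = len(teams)
    trans = [[0.0] * n for _ in range(n)]
    played = [0] * n
    for home, away, margin in games:
        h, w = idx[home], idx[away]
        r = 1.0 / (1.0 + math.exp(-(a * margin + b)))  # hypothetical P(home team is better)
        # A voter at either team ends up at the home team with probability r.
        trans[h][h] += r
        trans[h][w] += 1.0 - r
        trans[w][h] += r
        trans[w][w] += 1.0 - r
        played[h] += 1
        played[w] += 1
    # Average over games played so each row sums to 1 (every team must play).
    trans = [[x / played[i] for x in row] for i, row in enumerate(trans)]
    pi = [1.0 / n] * n
    for _ in range(iters):  # power iteration: pi <- pi * P
        pi = [sum(pi[i] * trans[i][j] for i in range(n)) for j in range(n)]
    return dict(zip(teams, pi))


games = [("A", "B", 10.0), ("C", "A", -15.0), ("B", "C", 5.0)]
ratings = markov_ratings(["A", "B", "C"], games)
print(sorted(ratings, key=ratings.get, reverse=True))
```

Because a road win by a wide margin moves more probability than a narrow home win, a method like this can rank teams on how they won, not just whether they won.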

Other links:

Related posts:

Good luck with your bracket this year!

Posted in Uncategorized | Tagged: , | Leave a Comment »

NCAA roundup

Posted by Laura McLay on March 17, 2010

A few articles about the NCAA tournament using math were in the news.

DePaul math professor Jeffrey Bergen uses straightforward combinatorics to illustrate how hard it is to fill out a correct bracket. You are less likely to pick every winner in a bracket at random (ignoring seeds) than to win the lottery. This is because there are 63 games in the tournament (with two potential winners in the first round and 65 potential winners in the final round), whereas there are only 6 numbers in the lottery (with about 40 numbers to choose from, with replacement). Of course, you increase your odds of correctly predicting all the tournament games by taking the seeds into account, but it’s still tough. The winning brackets in online contests with millions of entries typically do not predict all games correctly.
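The comparison is easy to reproduce. With 63 games and two possible winners each, there are 2^63 possible brackets. The lottery count depends on the rules, so the sketch shows both 40^6 (six numbers from 40, with replacement, as described above) and C(40, 6) (the more common draw without replacement, order ignored). The 40-number lottery is the hypothetical one from the paragraph, not any specific real lottery.

```python
from math import comb

brackets = 2 ** 63                         # 63 games, 2 possible winners each
lottery_with_replacement = 40 ** 6         # 6 numbers from 40, repeats allowed
lottery_without_replacement = comb(40, 6)  # 6 distinct numbers, order ignored

print(f"{brackets:,} possible brackets")
print(f"{lottery_with_replacement:,} lottery tickets (with replacement)")
print(f"{lottery_without_replacement:,} lottery tickets (without replacement)")
print(f"a random bracket is about {brackets / lottery_with_replacement:.1e} times less likely")
```

Either way, a perfect random bracket is billions of times less likely than a winning lottery ticket.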

An article in the business section of CBS News summarizes some hints that rely on mathematical tools (rather than listening to the talking heads). It suggests using online tools, including the OR model LRMC (developed by Joel Sokol and others). The article also suggests playing the odds in the first round and choosing all #1 seeds to advance. It also suggests playing some mind games if you are filling out an office pool, since you can increase your odds of winning by making different, but not unlikely, choices. It recommends choosing the third or fourth overall pick as the champion rather than the first or second overall pick that most of your rivals are choosing. Finally, it advises guarding against availability bias by not favoring teams simply because they have played against the hometown favorite.
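The contrarian-champion advice can be illustrated with a crude model, assuming the pool is decided entirely by the champion pick and that if your champion wins you split first place with every rival who picked the same team. Your expected share is then roughly P(champion wins) / (1 + expected number of rivals with the same pick). The probabilities, pick shares, and pool size below are hypothetical.

```python
def pool_win_share(p_champion: float, rival_pick_share: float, n_rivals: int) -> float:
    """Approximate expected share of first place from the champion pick alone.

    Uses 1 / (1 + expected co-winners), a rough stand-in for averaging over
    how many rivals actually made the same pick."""
    expected_co_winners = rival_pick_share * n_rivals
    return p_champion / (1.0 + expected_co_winners)


N_RIVALS = 19  # a hypothetical 20-person office pool
favorite = pool_win_share(0.25, rival_pick_share=0.40, n_rivals=N_RIVALS)
contrarian = pool_win_share(0.15, rival_pick_share=0.05, n_rivals=N_RIVALS)
print(round(favorite, 3), round(contrarian, 3))
```

Under these made-up numbers the less popular pick more than doubles the expected share, which is the logic behind the advice; with real probabilities and pick rates the answer could go either way.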

ESPN maintains a list of Giant Killers for predicting upsets, which is mainly useful in the early rounds of the tournament. A giant killer is a “team that beats a tournament opponent seeded at least five spots higher in any round”. ESPN has a methodology behind its approach. They have

zeroed in on team stats that correlate strongly with upset wins and losses in past tournaments. We’ve conducted multiple regression analyses, which essentially is a way to tell how strongly each member of a group of inputs (those stats) affects an output (giant-killing success or failure). Statistically, [Giant Killers] have:
• Low turnover rates and high rates of generating opponent turnovers.
• High offensive-rebound percentages.
• High 3-point scoring as a proportion of all points scored.

Links:

Posted in Uncategorized | Tagged: , | 4 Comments »

Predictalot for bracket success

Posted by Laura McLay on March 16, 2010

Yahoo! Labs has unveiled Predictalot, which combines user rules that don’t explicitly rely on math to predict tournament winners, although there is a lot of complexity in the game. Predictalot is a #P-hard game that uses combinatorial prediction market methodologies to combine human input and high-performance computing for making better tournament predictions.

Predictalot limits users to a fairly restricted set of rules. The rules focus on aggregate or simple outcomes that are easy to count (like the sum of the seeds in a given round, rather than the mix of individual seeds). This isn’t too bad for beta version 1.0, but I still found it frustrating. For example, when predicting which seed range advances to what round, I am unable to create a rule that indicates that a single five seed or worse will make it to the Final Four. I can only create a rule about all Final Four team seeds. I also wanted to create a rule about how many Final Four teams would be from the Big Ten conference. The only two conference rules allowed are (1) predicting the winner and (2) predicting whether a conference will have more or fewer wins than another conference.

It is also not clear whether “better than a 4 seed” means strictly better than or better than or equal to. Precision is important to me. For example, I was unable to create a rule that a one seed would win the tournament (see image below). This is something that can easily be fixed.

Still, Predictalot looks pretty good for a beta version, and it will be interesting to see how it works, both in terms of predicting a winner and harnessing the power of social networking.

Links:

Posted in Uncategorized | Tagged: , | 2 Comments »

 