# Monthly Archives: March 2011

## How likely is VCU’s run to the Final Four? A VCU professor and sports nerd reflects on the likelihood of her school’s path to the Final Four

I am thrilled that VCU made the Final Four this year.  My school’s team had an unlikely path to the Final Four, so unlikely that only 2 of 5.9 million ESPN brackets correctly picked all Final Four teams.  Sports nerds unanimously agree that VCU’s run has indeed been unlikely.

• Nate Silver at the NY Times tweeted that “VCU reaching Final 4 may be least likely event in the history of the NCAAs. Penn in ’79 is close. So is Villanova winning it all in ’85.” He wrote an excellent article showing that, by the numbers, VCU was indeed less likely to make the Final Four than the other 11 seeds in the tournament.
• Before the tournament began, Wayne Winston gave VCU a 1-in-1000 chance of reaching the Final Four (using the Sagarin ratings) and a 1-in-5000 chance of winning the entire tournament.
• Andy Glockner at Sports Illustrated summarizes a few stats about how likely VCU’s run has been. According to Ken Pomeroy, VCU had a 1-in-3333 chance of making it to the Final Four and a 1-in-203,187 shot to win the title, among the worst odds of any team in the field.
• Slate’s Hang Up and Listen sports podcast discusses the Final Four odds and statistics. This enjoyable podcast sheds light on quite a few quantitative factors that relate to the tournament and provides several good links for further reading.
• This Final Four has the highest seed total ever, and it is the first time since 1979 that no 1 or 2 seeds are in the Final Four.

VCU making the Final Four is not “proof” that we should throw out the expert advice from sports nerds since anything can happen. While anything can indeed happen, the outcomes are not equally likely. Most outcomes are so unlikely that we will not see them in our lifetimes (a Final Four made up of all four 16 seeds would occur once every eight hundred trillion years on average). It’s like monkeys randomly typing away. Given enough time, they will rewrite Shakespeare, but don’t expect to see it happen any time soon. However, even though most of the potential tournament outcomes have an infinitesimally small chance of occurring, there are so many of them that it is almost certain that a few unlikely things will occur (which is why we always see a few upsets in the first two rounds).
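The "add them all up" argument can be sketched in a few lines of Python. This is only an illustration: the function name `prob_at_least_one` is made up for this example, the games are assumed to be independent, and the 15% upset chance for each of 16 longshot games is a hypothetical number, not an estimate from any of the sources above.

```python
import math

def prob_at_least_one(probabilities):
    """Probability that at least one of several independent events occurs."""
    # The chance that none occur is the product of the individual "miss" chances.
    none_occur = math.prod(1.0 - p for p in probabilities)
    return 1.0 - none_occur

# Hypothetical: 16 early-round games, each with a 15% chance of an upset.
upset_chances = [0.15] * 16
print(f"Chance of at least one upset: {prob_at_least_one(upset_chances):.2f}")
```

Each individual upset is unlikely, but under these assumptions the chance of seeing at least one of them is well above 90%.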

On the contrary, the excellent analyses from sports nerds produce the predictions that work best on average. That is, averaged over a large number of tournaments, their predictions will yield the best results (meaning that brackets produced using advice from the experts who have crunched the numbers will win the office pool most frequently). That is because the numbers point to the outcomes that are most likely to occur. There is an enormous number of potential outcomes in the tournament (2^67 ≈ 1.5 x 10^20, which is way, way more than the number of stars in the Milky Way!), and it helps to have some quantitative advice to prune most of the unlikely outcomes, nearly all of which are even less likely to happen than VCU making the Final Four! Even the most likely outcomes rarely occur: we would expect all four one seeds to comprise the Final Four, for example, about once every 39 years (it has happened once). The problem is that things don’t average out in a single year, since we have just one tournament this year. In any single year, something unlikely, like VCU reaching the Final Four, has a chance of occurring (albeit a small one).
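The two numbers in this paragraph are easy to check. The sketch below uses helper names (`bracket_outcomes`, `years_between`) that are made up for this example. It assumes a 68-team single-elimination field, which means 67 games, and takes the 0.026 probability of an all-one-seed Final Four from the Bracket Odds quote further down this page.

```python
def bracket_outcomes(num_teams):
    """Number of distinct ways to fill out a single-elimination bracket."""
    games = num_teams - 1  # every game eliminates exactly one team
    return 2 ** games      # each game has two possible winners

def years_between(prob_per_year):
    """Average number of years between occurrences of an annual event."""
    return 1.0 / prob_per_year

print(f"{bracket_outcomes(68):.2e} possible brackets")
print(f"{years_between(0.026):.1f} years between all-one-seed Final Fours")
```

Because 0.026 is itself a rounded figure, the second result lands between 38 and 39 years, in line with the "once every 39 years" quoted above.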

Now that VCU is in the Final Four, what are their odds of winning the tournament? VCU may have initially had an infinitesimal 1-in-203,187 chance of winning the tournament, but given that they have made it to the Final Four, their chance of winning it all is no longer remote (they’ve completed the hard part of being one of four teams left). Wayne Winston estimates that they have a 0.11 chance (roughly a 1-in-9 chance) of winning the tournament. Using past tournament outcomes, Sheldon Jacobson has shown that seeds don’t matter after the Sweet Sixteen round, which means that VCU essentially has a 1-in-4 chance of winning the tournament. The truth is likely somewhere in between, which means that VCU has an excellent chance of being the national champion. Let’s go Rams!

## what operations research has taught me about having a baby

After having my third baby last week, I am writing a post on how OR has helped me navigate pregnancy, labor, and birth. Here are a few ways that OR has enhanced my experiences with having babies.

A solid OR background gave me the necessary knowledge about statistics to wade through all of the pregnancy advice out there and keep my perspective.  Much of the advice is silly and not evidence-based (like keeping your heart rate < 140 bpm, not lying on your back in a low-risk pregnancy, omitting all caffeine, avoiding certain foods, etc.).  Yet some of the best pregnancy advice (like eating your vegetables and staying active) is rarely verbalized.

A solid OR background also means that my educational background alone makes me a likely candidate to avoid most pregnancy complications (education of the mother is almost always a statistically significant factor that is negatively correlated with most adverse pregnancy outcomes).

I know that pregnancy statistics based on aggregate data are almost useless now that I am on my third pregnancy. It’s all about correlation at this point.

After having two labors shorter than average, my midwife told me not to necessarily expect an even shorter labor. After some discussion, I realized she was explaining that anecdotally, she has observed that most women experience regression to the mean after very short labors. That makes sense. If I had a good draw the last time, the odds are that I won’t get as good a draw this time. In general, buying into the Flaw of Averages is a bad idea for preparing for labor. Few labors are “average.” However, I don’t have a good approach for managing the anxiety that comes along with having to prepare for a myriad of labor realizations. (I ended up having my shortest labor this time.)

I have found Bayes’ rule to be extremely helpful for managing prenatal anxiety. Many screening tests are performed during pregnancy. The one that perhaps causes the most anxiety in expectant moms is the quad screening test, which attempts to screen for Down syndrome and spina bifida. Given that the test comes back positive, for example, a baby has about a 3% chance of having Down syndrome (although I recently learned that these odds vary according to the age of the mother). My quad screen tests came back negative for all three pregnancies, but I was prepared not to panic just in case. A similar screening test is done for gestational diabetes. I had a false positive once and took the dreaded three-hour glucose test. That was no fun, but again, I knew not to worry.
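Here is a minimal sketch of the Bayes’ rule calculation behind that kind of reasoning. The function name and all three inputs (a 1-in-700 prior, 80% sensitivity, and a 5% false positive rate) are hypothetical values chosen only for illustration. They are not clinical figures and not the numbers behind the 3% quoted above.

```python
def posterior_given_positive(prior, sensitivity, false_positive_rate):
    """P(condition | positive test) via Bayes' rule."""
    true_positives = prior * sensitivity
    false_positives = (1.0 - prior) * false_positive_rate
    return true_positives / (true_positives + false_positives)

# Hypothetical inputs, for illustration only.
ppv = posterior_given_positive(prior=1 / 700, sensitivity=0.80,
                               false_positive_rate=0.05)
print(f"Chance the condition is present given a positive screen: {ppv:.1%}")
```

With a rare condition, false positives from the large unaffected group swamp the true positives, so the posterior stays at a few percent even after a positive screen.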

My last two deliveries took place at the tail end of a brief surge in births at the hospital (births should be a Poisson process; see my post on the exponential distribution). This meant that all of the recovery rooms in the hospital were occupied by the time I needed one (!). Luckily, I was able to get into a room both times, although this time, I am indebted to two kind nurses who pulled a few strings for me. My hospital stay illustrates that hospitals still need lots of OR for planning hospital beds. (My hospital stay also suggests that OR could be used to schedule meal deliveries, schedule infant inspections, and organize hospital discharges.)
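As a rough sketch of why surges happen even when the average load looks manageable, the snippet below computes the chance that a Poisson number of births exceeds the number of rooms. The function name, the rate of 8 births per period, and the 12 rooms are all hypothetical, and the model ignores length of stay, so it is far simpler than a real bed-planning model.

```python
import math

def poisson_tail(rate, capacity):
    """P(N > capacity) when N follows a Poisson distribution with the given rate."""
    cdf = sum(math.exp(-rate) * rate ** k / math.factorial(k)
              for k in range(capacity + 1))
    return 1.0 - cdf

# Hypothetical: an average of 8 births per period and 12 recovery rooms.
p_overflow = poisson_tail(rate=8.0, capacity=12)
print(f"Chance that births exceed the rooms available: {p_overflow:.1%}")
```

Even with 50% more rooms than the average demand in this toy setup, the rooms run out in a noticeable fraction of periods.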

With regard to hospital discharges, the bottleneck in the process is waiting for someone with a wheelchair to take me and baby to the car after discharge.  This was also true three years ago after my last birth.  I would have thought that the bottleneck would be scheduling the “important” stuff, like the pediatrician’s checkup of the baby and the nurse’s checkup of me.  Despite being tired after having a baby, I couldn’t help but start to model the hospital system and mentally note where they need to make improvements.

I have avoided amniocentesis for all of my pregnancies, but if I were offered amniocentesis, I would use a decision tree with my personalized economic model to make the decision.

And most importantly, decision analysis methods confirm that it was a good idea for me to be fruitful and multiply.

## Punk Rock OR has a baby!

Last week, I welcomed my third child into this world.  Both baby and I are doing well.  I enjoy blogging way too much to take a leave of absence from it, but my blogging frequency in the coming months may be even more erratic than before the baby was born.

In the meantime, enjoy doing good OR!

## bracket tip of the day: pay attention to preseason rankings

I missed Nate Silver’s NY Times blog post last week about the history of the NCAA basketball tournament based on preseason rankings (instead of merely seeds).  The teams that were not ranked in the AP preseason poll at the beginning of the season tend to underperform in the tournament when compared to other teams with the same seed.

[T]he preseason poll is essentially a prediction of how the teams are likely to perform. The writers who vote in the poll presumably consider things like coaching, the quality of talent on the roster, and how the team has performed in recent seasons. Although we all like to make fun of sportswriters, these predictions are actually pretty decent. Since 2003, the team ranked higher in the A.P. preseason poll (excluding cases where neither team received at least 5 votes) has won 72 percent of tournament games. That’s exactly the same number, 72 percent, as the fraction of games won by the better seed. And it’s a little better than the 71 percent won by teams with the superior Ratings Percentage Index, the statistical formula that the seeding committee prefers. (More sophisticated statistical ratings, like Ken Pomeroy’s, do only a little better, with a 73 percent success rate.)

When I teach multiobjective decision analysis, I mention how cognitive biases indicate that we tend to be overconfident about our initial information. Nate Silver’s example, however, suggests the opposite: we tend to underestimate the original predictions in favor of metrics available at the end of the season (win-loss records, RPI, various team rankings, etc.). It’s a nice counterexample for showing that bias is a two-way street.

As far as your bracket is concerned, Nate Silver’s blog post suggests that teams like Notre Dame, which was unranked when the season began, are unlikely to get as far in the tournament as their seed might suggest.

## on sharing data

The National Science Foundation (NSF) has started to require a data management plan for each new proposal. The data management plan will require investigators to make their data (including figures, tables, and code) available to encourage collaboration. An excellent idea! They specifically mention that investigators are required to document their data, not just make it available, so that others can use it, thus creating new opportunities for scientific research. NSF’s data management plan is similar to NIH’s public access plan, which requires that publications from NIH-funded research be made publicly available through PubMed Central within twelve months of publication.

The research world is moving toward a place where investigators are required to share data and code.  I once wrote about my insecurities surrounding sharing my code.  While I don’t have insecurities about sharing my data (except trying to find extra time to document data better), I do need to think about creating a system for posting my research materials.  I’m not sure what the solution should look like.

What is the best place online for sharing research materials?  How should code be stored and formatted?  Tables, figures, data, and code have different formats.  It’s best if they are all stored (or accessed) in the same location.

My university does not have the best tools for sharing data (at least not that I know of). Just updating my web site is a pain. I use my university’s BlackBoard page for my research group, which contains my code, papers, references, slides, and whatever else I get tired of emailing students. However, my BlackBoard site is a closed system that does not allow guests, even at my university, to have access, so it cannot be used to share data with the public.

Dropbox may be a good place to store many documents, provided that a separate page links to all of the stored data, although I am loath to use my precious Dropbox space for storing data. Slideshare and Scribd are good places for sharing slides and technical reports, respectively. Code can be zipped and uploaded elsewhere. Having to store each type of file in a different account on a different site would not exactly facilitate sharing information with others (and it would be no fun for me to keep track of all the different logins and passwords). However, I could create a Google Sites page to manage the information so that it can be accessed from a single page.

How do you share your data?  How do you find time to document your data?

## bracketology links for team rankings

Here are a few links about the NCAA basketball tournament. If you find any good OR/MS bracketology articles, please post them in the comments. Every year, I try to blog through the tournament, but seeing as today is my due date, I probably won’t be in any condition to blog in the near future. I’m tucking a copy of the bracket into my hospital bag in case I have the energy to casually follow the tournament (although I will have more important things on my mind this year!).

Here are three different lists that rank the teams in the tournament using various OR and statistical methods:

A pdf of the bracket is pretty handy.  Also, check out my post from yesterday for more bracketology information.

## Bracket Odds for March Madness: A tool for picking a winning bracket

Will a one seed win the tournament? How many 4-16 seeds will be in the Final Four? Bracket Odds, a probabilistic analysis tool by Sheldon Jacobson at the University of Illinois, provides the answers. It is one of a series of tools that can be used by the more quantitative sports fans for picking better brackets.

Rather than making a prediction for a specific matchup (e.g., Duke vs. VCU), Bracket Odds makes seed-based predictions that are probabilistic, not absolute.  The recommendations are based on analyzing patterns from the past tournaments and prior seed matchups in each round of the tournament using a truncated geometric distribution.
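To give a feel for what a truncated geometric distribution looks like, here is a small sketch over seeds 1 through 16. It is not Bracket Odds’ actual model or parameters: the function name is made up, and the value p = 0.4 is a hypothetical choice used only to show the shape of the distribution.

```python
def truncated_geometric_pmf(p, n):
    """PMF of a geometric distribution restricted to 1..n and renormalized."""
    weights = [p * (1.0 - p) ** (k - 1) for k in range(1, n + 1)]
    total = sum(weights)  # mass of the untruncated geometric on 1..n
    return [w / total for w in weights]

# Hypothetical parameter: p = 0.4, truncated to the 16 possible seeds.
pmf = truncated_geometric_pmf(p=0.4, n=16)
for seed, prob in enumerate(pmf[:4], start=1):
    print(f"seed {seed}: {prob:.3f}")
```

The probabilities fall off by a constant factor from one seed to the next, and the truncation guarantees that they sum to one over the 16 seeds.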

Sheldon Jacobson recommends picking Final Four teams with seeds that are some combination of 1, 2, and 3, since they result in the most likely outcomes. Here is his reasoning:

[T]he probability of the Final Four comprising the four top-seeded teams is 0.026, or once every 39 years. Meanwhile, the probability of a Final Four of all No. 16 seeds – the lowest-seeded teams in the tournament – is so small that it has a frequency of happening once every eight hundred trillion years.

Sheldon Jacobson also writes about March Madness Math in the latest OR/MS article (for INFORMS members).  He gives a few hints about how to fill out a winning bracket:

In its most basic form, the game of basketball can be described as a sequence of dependent (Bernoulli) trials with well-defined outcomes. The sum of the resulting outcomes produces a final score. A superbly talented team will consistently defeat a much weaker opponent, even if the talented team plays very poorly and their weaker adversary plays well. This is why a No. 16 seed has never (so far) beaten a No. 1 seed in the first round of the tournament.

Everyone loves upsets, which occur with great regularity and predictability every year, in the first two rounds of the tournament. On average, more than four teams seeded No. 11 to 15 win a first round game; five such upsets occurred in 2010, the same number seen in both 2008 and 2009. On average, more than three teams seeded No. 7 to 14 reach the Sweet Sixteen; four such teams were so fortunate in 2010. In fact, it is rare not to see a team seeded No. 11 or lower in the Sweet Sixteen; this has only happened four times since 1985.

Joel Sokol provides team rankings using the LRMC method, which I have found to be useful for predicting the outcome of a game based on the teams rather than the seeds.  It has performed well in the past, and I’ve found that it does well with predicting upsets in the early rounds.

Good luck with your bracket this year!