Monthly Archives: March 2011

How likely is VCU’s run to the Final Four? A VCU professor and sports nerd reflects on the likelihood of her school’s path to the Final Four

I am thrilled that VCU made the Final Four this year.  My school’s team had an unlikely path to the Final Four, so unlikely that only 2 of 5.9 million ESPN brackets correctly picked all Final Four teams.  Sports nerds unanimously agree that VCU’s run has indeed been unlikely.

  • Nate Silver at the NY Times tweeted that “VCU reaching Final 4 may be least likely event in the history of the NCAAs. Penn in ’79 is close. So is Villanova winning it all in ’85.”  He also wrote an excellent article showing that, by the numbers, VCU was indeed less likely to make the Final Four than the other 11 seeds in the tournament.
  • Before the tournament began, Wayne Winston gave VCU a 1-in-1000 chance of reaching the Final Four (using the Sagarin ratings) and a 1-in-5000 chance of winning the entire tournament.
  • Andy Glockner at Sports Illustrated summarizes a few stats about how unlikely VCU’s run has been.  According to Ken Pomeroy, VCU had a 1-in-3333 chance of making it to the Final Four and a 1-in-203,187 shot to win the title, among the worst odds in the field.
  • Slate’s Hang Up and Listen sports podcast discusses the Final Four odds and statistics.  This enjoyable podcast sheds light on quite a few quantitative factors related to the tournament and provides several good links for further reading.
  • This Final Four has the highest seed total ever, and it is the first time since 1979 that no 1 or 2 seeds are in the Final Four.

VCU making the Final Four is not “proof” that we should throw out the expert advice from sports nerds because anything can happen.  While anything can indeed happen, each outcome is not equally likely.  Most outcomes are so unlikely that we will not see them in our lifetimes (a Final Four composed of all four 16 seeds, for example, would occur once every eight hundred trillion years on average).  It’s like monkeys randomly typing away: given enough time, they will rewrite Shakespeare, but don’t expect to see it happen any time soon. However, even though most potential tournament outcomes have an infinitesimally small chance of occurring, when you add them all up, it is almost certain that a few unlikely things will occur (which is why we always see a few upsets in the first two rounds).
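The "add them all up" point is quick to verify. A minimal sketch, with an assumed (illustrative, not historical) 15% chance that the underdog wins each of 16 first-round games, independently:

```python
# Each individual upset is unlikely, but at least one upset is nearly certain.
# The 0.15 upset probability and independence are simplifying assumptions.
p_upset = 0.15
n_games = 16

p_no_upsets = (1 - p_upset) ** n_games
p_at_least_one = 1 - p_no_upsets
print(f"P(at least one first-round upset) = {p_at_least_one:.3f}")
```

Even with each upset being an 85-15 long shot, the chance of seeing at least one is over 90%.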

On the contrary, the excellent analyses from sports nerds produce the predictions that work best on average.  That is, averaged over a large number of tournaments, their predictions will yield the best results (meaning that brackets produced using advice from the experts who have crunched the numbers will win the office pool most frequently).  That is because the numbers point to the outcomes that are most likely to occur.  There are an enormous number of potential outcomes in the tournament (2^67 ~= 1.5 x 10^20, which is way, way more than the number of stars in the Milky Way!), and it helps to have some quantitative advice to prune most of the unlikely outcomes, nearly all of which are even less likely to happen than VCU making the Final Four!  Even the most likely outcomes rarely occur: for example, we would expect a Final Four of all four one seeds only about once every 39 years (it has happened once).  The problem is, things don’t average out in a single year–we have just one tournament this year.  In any single year, something unlikely–like VCU reaching the Final Four–has a chance of occurring (albeit a small one).
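The arithmetic behind those two figures is easy to check: a 68-team field means 67 games with two possible outcomes each, and an outcome with per-tournament probability p recurs about once every 1/p tournaments.

```python
# Number of distinct brackets: 67 games, 2 outcomes each.
n_brackets = 2 ** 67
print(f"{n_brackets:.2e} possible brackets")  # about 1.5 x 10^20

# Expected recurrence of an all-1-seed Final Four, using the 0.026
# per-tournament probability cited elsewhere on this blog.
p_all_one_seeds = 0.026
print(f"about once every {1 / p_all_one_seeds:.1f} years")  # ~39 years
```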

Now that VCU is in the Final Four, what are their odds of winning the tournament? VCU may have initially had an infinitesimal 1-in-203,187 chance of winning the tournament, but given that they have made it to the Final Four, winning it all is no longer a long shot (they’ve completed the hard part of being one of four teams left).  Wayne Winston estimates that they have a 0.11 chance (roughly a 1-in-9 chance) of winning the tournament.  Using past tournament outcomes, Sheldon Jacobson has shown that seeds don’t matter after the Sweet Sixteen round, which means that VCU essentially has a 1-in-4 chance of winning the tournament.  The truth is likely somewhere in between, which means that VCU has an excellent chance of being the national champion. Let’s go Rams!
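It's worth noting that Pomeroy's two pre-tournament numbers already imply a conditional win probability: since winning the title requires reaching the Final Four, P(win | Final Four) = P(win) / P(Final Four). A quick check:

```python
# Conditional probability implied by Ken Pomeroy's pre-tournament odds.
p_win = 1 / 203187         # pre-tournament P(VCU wins the title)
p_final_four = 1 / 3333    # pre-tournament P(VCU reaches the Final Four)

# Winning implies reaching the Final Four, so the joint probability is p_win.
p_win_given_ff = p_win / p_final_four
print(f"P(win | Final Four) = {p_win_given_ff:.4f}")  # about 0.016
```

That implied 1.6% is well below Winston's 0.11 and the seed-blind 1-in-4, because those updated estimates also account for how well VCU has actually been playing, not just its pre-tournament rating.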


what operations research has taught me about having a baby

After having my third baby last week, I am writing a post on how OR has helped me navigate pregnancy, labor, and birth.  Here are a few ways that OR has enhanced my experiences with having babies.

A solid OR background gave me the necessary knowledge about statistics to wade through all of the pregnancy advice out there and keep my perspective.  Much of the advice is silly and not evidence-based (like keeping your heart rate < 140 bpm, not lying on your back in a low-risk pregnancy, omitting all caffeine, avoiding certain foods, etc.).  Yet some of the best pregnancy advice (like eating your vegetables and staying active) is rarely verbalized.

A solid OR background also means that my educational background alone makes me a likely candidate to avoid most pregnancy complications (education of the mother is almost always a statistically significant factor that is negatively correlated with most adverse pregnancy outcomes).

I know that pregnancy statistics based on aggregate data are almost useless by my third pregnancy. It’s all about correlation at this point.

After having two labors shorter than average, my midwife told me not to necessarily expect an even shorter labor.  After some discussion, I realized she was explaining that anecdotally, she has observed that most women experience regression to the mean after very short labors.  That makes sense.  If I had a good draw the last time, the odds are, I won’t get as good of a draw this time.  In general, buying into the Flaw of Averages is a bad idea for preparing for labor.  Few labors are “average.” However, I don’t have a good approach for managing the anxiety that comes along with having to prepare for a myriad of labor realizations. (I ended up having my shortest labor this time).
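My midwife's intuition is exactly what a simulation shows: if successive draws are independent, conditioning on an unusually short first draw tells you nothing, and the next draw is expected to land back near the population mean. A toy sketch (made-up distribution parameters, not medical data):

```python
import random

random.seed(1)

# Toy model: "labor durations" (hours) drawn i.i.d. from a lognormal
# distribution with assumed, illustrative parameters.
durations = [random.lognormvariate(2.0, 0.5) for _ in range(100_000)]
mean_all = sum(durations) / len(durations)

# Pair up consecutive draws and condition on the first being very short.
pairs = list(zip(durations[::2], durations[1::2]))
after_short = [second for first, second in pairs if first < 5.0]
mean_after_short = sum(after_short) / len(after_short)

# Because the draws are independent, the mean after a short first draw
# matches the overall mean: regression to the mean, not a trend.
print(mean_all, mean_after_short)
```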

I have found Bayes’ rule to be extremely helpful for managing prenatal anxiety.  Many screening tests are performed during pregnancy.  The one that perhaps causes the most anxiety in expectant moms is the quad screening test, which screens for Down syndrome and spina bifida.  Given that the test comes back positive, for example, a baby has about a 3% chance of having Down syndrome (although I recently learned that these odds vary according to the age of the mother).  My quad screen tests came back negative for all three pregnancies, but I was prepared not to panic just in case.  A similar screening test is done for gestational diabetes.  I had a false positive once and took the dreaded three-hour glucose test.  That was no fun, but again, I knew not to worry.
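The small positive-predictive value is a direct consequence of Bayes' rule applied to a rare condition. A sketch with illustrative, assumed numbers (not clinical guidance):

```python
# Bayes' rule for a screening test. All three inputs are assumed,
# round-number illustrations, not clinical figures.
prevalence = 1 / 700        # P(condition)
sensitivity = 0.81          # P(test positive | condition)
false_positive_rate = 0.05  # P(test positive | no condition)

# Total probability of a positive test.
p_positive = (sensitivity * prevalence
              + false_positive_rate * (1 - prevalence))

# Posterior probability of the condition given a positive test.
p_condition_given_positive = sensitivity * prevalence / p_positive
print(f"P(condition | positive) = {p_condition_given_positive:.1%}")
```

Even with a sensitive test, a positive result on a rare condition mostly reflects false positives, which is why the posterior stays in the low single digits.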

My last two deliveries took place at the tail end of a brief surge in births at the hospital (births should be a Poisson process–see my post on the exponential distribution).  This meant that all of the recovery rooms in the hospital were occupied by the time I needed one (!).  Luckily, I was able to get into a room both times, although this time, I am indebted to two kind nurses who pulled a few strings for me.  My hospital stay illustrates that hospitals still need lots of OR for planning hospital beds.  (My hospital stay also suggests that OR could be used to schedule meal deliveries, schedule infant inspections, and organize hospital discharges.)
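A Poisson arrival process makes it easy to see how often a surge outruns capacity. This sketch uses assumed numbers (8 births per day, 12 recovery rooms) and is a deliberate simplification: real bed planning would use a queueing model, since stays span multiple days.

```python
import math

def poisson_sf(lam, c):
    """P(N > c) when N ~ Poisson(lam): chance demand exceeds capacity c."""
    return 1 - sum(math.exp(-lam) * lam ** k / math.factorial(k)
                   for k in range(c + 1))

# Assumed, illustrative numbers.
births_per_day, rooms = 8, 12
p_overflow = poisson_sf(births_per_day, rooms)
print(f"P(more than {rooms} births in a day) = {p_overflow:.3f}")
```

Even with 50% more rooms than the average daily demand, the occasional surge still overflows capacity on a few percent of days, which is roughly what I experienced.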

With regard to hospital discharges, the bottleneck in the process is waiting for someone with a wheelchair to take me and baby to the car after discharge.  This was also true three years ago after my last birth.  I would have thought that the bottleneck would be scheduling the “important” stuff, like the pediatrician’s checkup of the baby and the nurse’s checkup of me.  Despite being tired after having a baby, I couldn’t help but start to model the hospital system and mentally note where they need to make improvements.

I have avoided amniocentesis for all of my pregnancies, but if I was offered amniocentesis, I would use a decision tree with my personalized economic model to make the decision.

And most importantly, decision analysis methods confirm that it was a good idea for me to be fruitful and multiply.

    Punk Rock OR has a baby!

    Last week, I welcomed my third child into this world.  Both baby and I are doing well.  I enjoy blogging way too much to take a leave of absence from it, but my blogging frequency in the coming months may be even more erratic than before the baby was born.

    In the mean time, enjoy doing good OR!

    bracket tip of the day: pay attention to preseason rankings

    I missed Nate Silver’s NY Times blog post last week about the history of the NCAA basketball tournament based on preseason rankings (instead of merely seeds).  The teams that were not ranked in the AP preseason poll at the beginning of the season tend to underperform in the tournament when compared to other teams with the same seed.

    [T]he preseason poll is essentially a prediction of how the teams are likely to perform. The writers who vote in the poll presumably consider things like coaching, the quality of talent on the roster, and how the team has performed in recent seasons. Although we all like to make fun of sportswriters, these predictions are actually pretty decent. Since 2003, the team ranked higher in the A.P. preseason poll (excluding cases where neither team received at least 5 votes) has won 72 percent of tournament games. That’s exactly the same number, 72 percent, as the fraction of games won by the better seed. And it’s a little better than the 71 percent won by teams with the superior Ratings Percentage Index, the statistical formula that the seeding committee prefers. (More sophisticated statistical ratings, like Ken Pomeroy’s, do only a little better, with a 73 percent success rate.)

    When I teach multiobjective decision analysis, I mention how cognitive biases indicate that we tend to be overconfident about our initial information.  Nate Silver’s example, however, suggests the opposite: we tend to underweight the original predictions in favor of metrics available at the end of the season (win-loss records, RPI, various team rankings, etc.).  It’s a nice counterexample for showing that bias is a two-way street.

    As far as your bracket is concerned, Nate Silver’s blog post suggests that teams like Notre Dame, who was unranked when the season began, are unlikely to get as far in the tournament as their seed might suggest.


    on sharing data

    The National Science Foundation (NSF) started to require a data management plan for each new proposal.  The data management plan will require investigators to make their data (including figures, tables, and code) available to encourage collaboration.  An excellent idea! They specifically mention that investigators are required to document their data–not just make it available–so that others can use it, thus creating new opportunities for scientific research.  NSF’s data management plan is similar to NIH’s public access plan, which requires that publications from NIH-funded research be made publicly available through PubMed Central within twelve months of publication.

    The research world is moving toward a place where investigators are required to share data and code.  I once wrote about my insecurities surrounding sharing my code.  While I don’t have insecurities about sharing my data (except trying to find extra time to document data better), I do need to think about creating a system for posting my research materials.  I’m not sure what the solution should look like.

    What is the best place online for sharing research materials?  How should code be stored and formatted?  Tables, figures, data, and code have different formats.  It’s best if they are all stored (or accessed) in the same location.

    My university does not have the best tools for sharing data (at least not that I know of).  Just updating my web site is a pain.  I use my university’s BlackBoard page for my research group, which contains my code, papers, references, slides, and whatever else I get tired of emailing students.  However, my BlackBoard site is a closed system that does not allow access even to guests at my university, so it cannot be used to share data with the public.

    Dropbox may be a good place to store many documents, given that a separate page links to all of the stored data, although I am loath to use my precious Dropbox space for storing data.  Slideshare and Scribd are good places for sharing slides and technical reports, respectively.  Code can be zipped and uploaded elsewhere.  But having to store each type of file in a different account on a different site would not exactly facilitate sharing information with others (and keeping track of all the different logins and passwords would be no fun for me).  I could, however, create a Google Sites page to manage the information so that it can be accessed from a single page.

    How do you share your data?  How do you find time to document your data?


    bracketology links for team rankings

    Here are a few links about the NCAA basketball tournament.  If you find any good OR/MS bracketology articles, please post them in the comments.  Every year, I try to blog through the tournament, but seeing as today is my due date, I probably won’t be in any condition to blog in the near future.  I’m tucking a copy of the bracket into my hospital bag in case I have the energy to casually follow the tournament (although I will have more important things on my mind this year!).

    Here are three different lists that rank the teams in the tournament using various OR and statistical methods:

    A pdf of the bracket is pretty handy.  Also, check out my post from yesterday for more bracketology information.

    Bracket Odds for March Madness: A tool for picking a winning bracket

    Will a one seed win the tournament?  How many 4-16 seeds will be in the Final Four?  Bracket Odds, a probabilistic analysis tool by Sheldon Jacobson at the University of Illinois, provides the answers.  It is one of a series of tools that more quantitative sports fans can use for picking better brackets.

    Rather than making a prediction for a specific matchup (e.g., Duke vs. VCU), Bracket Odds makes seed-based predictions that are probabilistic, not absolute.  The recommendations are based on analyzing patterns from the past tournaments and prior seed matchups in each round of the tournament using a truncated geometric distribution.

    Sheldon Jacobson recommends picking Final Four teams with a combination of 1, 2, and 3 seeds, since they result in the most likely outcomes.  Here is his reasoning:

    [T]he probability of the Final Four comprising the four top-seeded teams is 0.026, or once every 39 years. Meanwhile, the probability of a Final Four of all No. 16 seeds – the lowest-seeded teams in the tournament – is so small that it has a frequency of happening once every eight hundred trillion years.
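A rough sanity check of the truncated geometric model against that 0.026 figure. The ratio parameter q = 0.6 is my own assumption (not necessarily Bracket Odds' fitted value), and independence across the four Final Four slots is also assumed:

```python
# Truncated geometric seed distribution: P(seed s) proportional to q^s
# for s = 1..16. The parameter q = 0.6 is an assumed, illustrative value.
q = 0.6
weights = [q ** s for s in range(1, 17)]
total = sum(weights)
p_seed = [w / total for w in weights]

p_one_seed = p_seed[0]
# Assuming the four Final Four slots are independent draws.
p_all_ones = p_one_seed ** 4
print(f"P(seed 1) = {p_one_seed:.3f}, P(all four 1-seeds) = {p_all_ones:.4f}")
```

With q = 0.6 a single Final Four slot is a 1-seed about 40% of the time, and all four 1-seeds appear with probability about 0.026, matching the once-every-39-years figure quoted above.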

    Sheldon Jacobson also writes about March Madness Math in the latest OR/MS Today article (for INFORMS members).  He gives a few hints about how to fill out a winning bracket:

    In its most basic form, the game of basketball can be described as a sequence of dependent (Bernoulli) trials with well-defined outcomes. The sum of the resulting outcomes produces a final score. A superbly talented team will consistently defeat a much weaker opponent, even if the talented team plays very poorly and their weaker adversary plays well. This is why a No. 16 seed has never (so far) beaten a No. 1 seed in the first round of the tournament.

    Everyone loves upsets, which occur with great regularity and predictability every year, in the first two rounds of the tournament. On average, more than four teams seeded No. 11 to 15 win a first round game; five such upsets occurred in 2010, the same number seen in both 2008 and 2009. On average, more than three teams seeded No. 7 to 14 reach the Sweet Sixteen; four such teams were so fortunate in 2010. In fact, it is rare not to see a team seeded No. 11 or lower in the Sweet Sixteen; this has only happened four times since 1985.
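The Bernoulli-trials view of a game is easy to simulate. A toy sketch with assumed per-possession scoring probabilities (0.55 for the strong team, 0.40 for the weak one), illustrating why a badly mismatched game is nearly a lock even with random variation:

```python
import random

random.seed(7)

def simulate_game(p_strong, p_weak, possessions=70):
    """Model a game as independent Bernoulli scoring trials per possession."""
    strong = sum(random.random() < p_strong for _ in range(possessions))
    weak = sum(random.random() < p_weak for _ in range(possessions))
    return strong > weak  # ties count as a loss for the strong team

# Assumed, illustrative scoring probabilities.
n_games = 10_000
wins = sum(simulate_game(0.55, 0.40) for _ in range(n_games))
print(f"strong team wins {wins / n_games:.1%} of simulated games")
```

With a large talent gap the strong team wins the overwhelming majority of games; shrink the gap between the two probabilities and the win rate falls toward a coin flip, which is where early-round upsets come from.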

    Joel Sokol provides team rankings using the LRMC method, which I have found to be useful for predicting the outcome of a game based on the teams rather than the seeds.  It has performed well in the past, and I’ve found that it does well with predicting upsets in the early rounds.



    Good luck with your bracket this year!

    land O links

    I haven’t had time to read the news much in the last few weeks, but I found a few good links to share this week.

    • Worrying is good for your health. A massive longitudinal study on longevity finds some counter-intuitive (causal?) relationships between what we do and how long we live.
    • Can cities be described by a set of equations? The answer to this ridiculous-sounding question is actually yes. This Wired article qualitatively describes what these equations mean.  Both the good stuff in cities (productivity, income, intelligence, innovation) and the bad stuff in cities (violent crime, drug consumption, disease outbreaks, shoplifting) scale superlinearly at a rate of ~1.15.  The NY Times magazine also featured an article about this story.
    • Fellow operations researcher Sanjay Saigal is a guest blogger at the Atlantic this week. His first post was about operations research (woo hoo!).  I enjoyed his post on Indian English.  Check this page for Sanjay’s latest posts.
    • The illustrated guide to the PhD is an amusing visual comic about how getting a PhD makes a small dent in human knowledge.
    • This visual made my day:  If Monet painted Darth Vader (Thanks John D. Cook for the link!)

    on sharing code

    I was recently asked for some of my code that I used in a paper.  First of all, I should firmly state that people should share code. Openly sharing ideas is, after all, the hallmark of academic research.

    I am not often asked for code, and I had a few reactions to a recent request, which was made by a student.  My first reaction was that of a professor: should I give the student a fish or teach the student how to fish?  When I am asked for code, the request is usually extremely short with no context given, meaning that it is hard for me to gauge how hard the student had tried to get the program to work.  Was there a typo in the paper that is causing a problem? Where are they stuck?  What types of programming errors are they getting?  Are they simply being lazy?  I just don’t know.

    This particular request involved about 20 lines of code.  Most of my projects are longer, usually comprising hundreds or thousands of lines of code (I don’t really count, but they can get hairy).  I found myself wondering whether the requester–a graduate student–had really thought about how to run the code or was just emailing me.  I exchanged a few emails with the requester before sending my code to make sure that I wasn’t doing someone’s homework for them.

    Another recent request asked for code in a paper that I did the computational work for when I was in graduate school.  I looked at the pseudo-code in the paper draft and wrote the code.  It took me a day or so to get my code working, but it wasn’t particularly painful.  In this case, I was confident that the paper was clear and unambiguous about how to write the code.

    My second concern about sharing code is–and I’m being honest here–my code is one giant hack.  I am not a software programmer.  I know what good, elegant code looks like, and it’s not mine.  I often have to cobble together multiple programs to solve a problem from beginning to end.  I often write a script to run many copies of the same program with different inputs.  I always write code for analyzing the solutions and creating figures.  Over the years, I have gotten good at making my code readable to me, so that I can come back to it after months or years and figure out what I did.  But that’s not the same thing as being readable to someone else.  This is a long way of saying that I’m a little embarrassed about sharing my code with others.  Maybe I’m just prudent and am being too hard on myself.  But I am married to a software programmer, so I am very aware of how high the bar really is for “good” code.

    Having someone look at my code is like inviting someone into my house without straightening up first. It’s one thing to show my messy code to a collaborator, but it’s another thing to show my messy code to a stranger.  Sharing papers and tech reports is different–they are polished, so they are OK to share.  This can be somewhat addressed by commenting code better.  I always start off commenting code well, but during the fog of debugging, my code usually gets a little out of control, and it’s hard to rein in after a while.  (I’ve seen other people’s code.  I have some good programming habits–my code could be much, much worse.)

    However, as I am learning in my discrete optimization course this semester, even simple programming assignments such as implementing the Secretary Problem Markov decision process model can be incredibly difficult for PhD students.  They can benefit from looking at my code.  My homework solution code isn’t as wild and unruly as my research code.  I’m getting used to sharing my code for the homework solutions.

    On a related note, this post by Panos Ipeirotis reflects on how to make code more robust to changes, since old code often does not run if it relies on old libraries.  Dr. Ipeirotis is a computer scientist, and it sounds like he writes more elegant code than I do.  I’m still at square one, meaning that I try to make my code readable to someone else.

    How self-conscious are you about sharing your code?


    the first supercomputer was powered by women

    I stumbled across an article on ENIAC, the world’s first supercomputer, which was built by the Army and unveiled in 1946.  It summarizes twelve factoids about ENIAC. I found these two the most interesting:

    5. The original programmers of ENIAC computer were women. The most famous of the group was Jean Jennings Bartik (originally Betty Jennings). The other five women were Kay McNulty, Betty Snyder, Marlyn Wescoff, Fran Bilas, and Ruth Lichterman. All six have been inducted into the Women in Technology International Hall of Fame. When the U.S. Army introduced the ENIAC to the public, it introduced the inventors (Dr. John Mauchly and J. Presper Eckert), but it never introduced the female programmers.

    6. Jean Bartik went on to become an editor for Auerbach Publishers, and eventually worked for Data Decisions, which was funded by Ziff-Davis Publishing. She has a museum in her name at Northwest Missouri State University in Maryville, Missouri.

    Kudos to the women of ENIAC and other women of supercomputing fame!

    The women of ENIAC

