tips for filling out a statistically sound bracket

Go Badgers!!

Here are a few things I do to fill out my bracket using analytics.

1. Let’s start with what not to do. I usually don’t put a whole lot of weight on a team’s record because strength of schedule matters. Likewise, I don’t put a whole lot of weight on bad ranking tools like RPI that do not do a good job of taking strength of schedule into account.

2. Instead of records, use sophisticated ranking tools. The seeding committee uses some of these ranking tools to select the seeds, so the seeds themselves reflect strength of schedule and implicitly rank teams. Here are a few ranking tools that use math modeling.

I like the LRMC (logistic regression Markov chain) method from some of my colleagues at Georgia Tech. Again: RPI bad, LRMC good.

3. Survival analysis quantifies how far each team is likely to make it in the tournament. This doesn’t give you insight into team-to-team matchups per se, but you can think of the probability that, say, Wisconsin makes it to the Final Four as a kind of average across the different teams it might play during the tournament.

4. Look at the seeds. Only once did all four 1-seeds make the Final Four. It’s a tough road. Seeds matter a lot in the rounds of 64 and 32, not so much after that point. There will be upsets. Some seed match ups produce more upsets than others. The 7-10 and 5-12 match ups are usually good to keep an eye on.

5. Don’t ignore preseason rankings. The preseason rankings are educated guesses on who the best teams are before any games have been played. It may seem silly to consider preseason rankings at the end of the season after all games have been played (when we have much better information!) but the preseason rankings seem to reflect some of the intangibles that predict success in the tournament (a team’s raw talent or athleticism).

6. Math models are very useful, but they have their limits. Math models implicitly assume that the past is good for predicting the future. This is not usually a good assumption when a team has had any major changes, like injuries or suspensions. You can check out crowdsourcing data (who picked who in a matchup), expert opinion, and things like injury reports to make the final decision.
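The "average across opponents" idea in tip 3 can be sketched for a four-team mini bracket. This is only an illustration: the ratings are made up, and the logistic `win_prob` function is a hypothetical stand-in, not LRMC or any published ranking model.

```python
import math

def win_prob(rating_a, rating_b, scale=10.0):
    """Hypothetical logistic model: P(a beats b) from the rating gap."""
    return 1.0 / (1.0 + math.exp(-(rating_a - rating_b) / scale))

def advance_probs(ratings):
    """Return, per round, the probability that each team wins that round.

    `ratings` is listed in bracket order; its length must be a power of two.
    """
    n = len(ratings)
    alive = [1.0] * n      # probability each team is still in the field
    rounds = []
    block = 1              # size of each team's sub-bracket so far
    while block < n:
        nxt = [0.0] * n
        for i in range(n):
            # This round's opponent comes out of the neighboring sub-bracket.
            start = ((i // block) ^ 1) * block
            # Average the win probability over every team that might emerge.
            p = sum(alive[j] * win_prob(ratings[i], ratings[j])
                    for j in range(start, start + block))
            nxt[i] = alive[i] * p
        alive = nxt
        rounds.append(alive)
        block *= 2
    return rounds

# Made-up ratings in bracket order: team 0 plays team 1, team 2 plays team 3.
rounds = advance_probs([90.0, 70.0, 80.0, 75.0])
for r, probs in enumerate(rounds, start=1):
    print(f"round {r}:", [round(p, 3) for p in probs])
```

The last row is each team's chance of winning the whole mini bracket. A team's number depends on everyone it might meet, weighted by how likely each of those meetings is.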

For more reading:

operations research improves school choice in Boston

Many cities allow families to choose elementary schools to address growing inequities in school instruction and performance. School choice lets families give a rank ordering of their preferred schools, and a lottery ultimately assigns students to schools. The result is that many students have to travel a long way to school, bus schedules get crazy, and students on the same block do not know each other because they go to different schools.

Peng Shi at MIT (with advisor Itai Ashlagi) won the 2013 Doing Good with Good OR Award held by INFORMS with his project entitled “Guiding school choice reform through novel applications of operations research” that addressed and improved Boston’s school choice model.  I am pleased to find that his paper based on this project is in press at Interfaces (see below).

The schools in Boston were divided into three zones, and every family could choose among the schools in their zone. Each zone was huge, so on any given block, students might be attending a dozen different schools. See the graphic in a Boston Globe report for more about the problem.

Peng balanced equity against the social problems introduced by the choice model by proposing a limited choice model. His plan was to let every family choose among just a few schools: the best and the closest. Families could choose from the 2 closest in the top 25% of schools, the 4 closest in the top 50% of schools, the 6 closest in the top 75% of schools, the 3 closest “capacity schools,” and any school within a mile. There was generally a lot of overlap between these sets, so families had about 8 choices in total (a lot fewer than in the original school choice model!). This gave all families good choices while managing some of the unintended consequences of a school choice system (busing and transportation, distant schools, neighbors who didn’t know each other).

The model itself was not obvious: there is no “textbook” way to model choice. Peng visited with the school board and iteratively adapted and changed his model to address concerns within the community.  This resulted in the model becoming simpler and more transparent to parents (most parents don’t know about linear programming!). The new model pairs above average schools with below average schools in a capacity-weighted way to make school pairs have comparable average qualities. This lets families choose from school partners, the closest four schools, and schools within a mile.

The school board voted to adopt his plan. Peng worked with the school district to come up with important outcomes to evaluate. The model itself uses linear programming to “ration” school seats probabilistically among students by minimizing the expected distance subject to constraints. To parameterize the model, he used a multinomial logit model to fit the data (with validation). He also ran a simulation with Gale-Shapley’s deferred acceptance algorithm as a proof of concept to ensure that the model would work.
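For readers who haven't seen deferred acceptance before, here is a minimal sketch of the student-proposing version with school capacities. It is a toy illustration, not Peng's simulation: the students, schools, priorities, and capacities are all made up, and I assume every school ranks every student (in a lottery-based system, that ranking would come from lottery numbers).

```python
def deferred_acceptance(student_prefs, school_priority, capacity):
    """Student-proposing deferred acceptance with school capacities.

    student_prefs:   {student: [schools, most preferred first]}
    school_priority: {school: [students, highest priority first]}
    capacity:        {school: number of seats}
    Returns {student: school, or None if unassigned}.
    """
    # rank[school][student] = position in that school's priority order
    rank = {s: {stu: k for k, stu in enumerate(order)}
            for s, order in school_priority.items()}
    next_choice = {stu: 0 for stu in student_prefs}
    held = {s: [] for s in school_priority}   # tentative admits per school
    free = list(student_prefs)

    while free:
        stu = free.pop()
        prefs = student_prefs[stu]
        if next_choice[stu] >= len(prefs):
            continue                          # list exhausted: stays unassigned
        school = prefs[next_choice[stu]]
        next_choice[stu] += 1
        held[school].append(stu)
        if len(held[school]) > capacity[school]:
            # Over capacity: reject the lowest-priority student being held.
            held[school].sort(key=lambda x: rank[school][x])
            free.append(held[school].pop())

    match = {stu: None for stu in student_prefs}
    for school, students in held.items():
        for stu in students:
            match[stu] = school
    return match

# Hypothetical example: school X has one seat, school Y has two.
match = deferred_acceptance(
    student_prefs={"a": ["X", "Y"], "b": ["X", "Y"], "c": ["Y", "X"]},
    school_priority={"X": ["b", "a", "c"], "Y": ["a", "b", "c"]},
    capacity={"X": 1, "Y": 2},
)
print(match)
```

Acceptances are only tentative until the very end, which is why a student can be bumped from a seat by a later applicant with higher priority.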

See Peng Shi’s web site for more information. Some of his documentation is here.

I’ve been on the INFORMS Doing Good with Good OR Award committee for the past three years. This award honors and celebrates student research with societal impact. I love this committee – I get to learn about how students are making the world a better place through optimization. And these projects really do make a difference: all applications must submit a letter from the sponsor attesting to improvements. Submissions are due near the end of the semester (hint hint!)


Guiding School-Choice Reform through Novel Applications of Operations Research by Peng Shi
Interfaces articles in advance

land O links

Here are a few links for your weekend reading:

  1. I’ve used the Monty Hall Problem in class. I didn’t realize agreeing on the correct solution was so controversial.
  2. Leonard Nimoy’s portrayal of Spock “in many ways co-created, helped define geek/nerd personality and interests for millions of future geeks” So true.
  3. Sports analytics is great and all that, but there is a serious lack of data in women’s sports.
  4. Five classic Atari games that totally stump Google’s artificial intelligence algorithm (like Asteroids!)
  5. The math behind getting all that damn snow off your street.
  6. Han Solo shot first: the surprising significance of past tense, present tense, and Wikipedia.
  7. A sobering Los Angeles Times article about why women are leaving the tech industry in droves (hint: it’s a hostile work environment)

be a satisficer or use the secretary problem to find love!

I was surprised to read that a study recommends using a satisficing strategy to find a mate [Link to article, Michigan State press release]

The researchers studied the evolution of risk aversion and discovered that it is in human nature to go for the safe bet when the stakes are high, for instance when deciding whether or not to mate. This human nature is traced back to the earliest period of human evolution.

The study is computational (i.e., not a psychological study) and involves simulation that mimics the degree and range of risk-taking in human behavior. It suggests that we are satisficers, not optimizers* and that something like the secretary problem is not suitable for finding a mate/spouse [read Mike Trick’s counterpoint here].

In the secretary problem, the goal is to maximize the probability of hiring the “best” secretary. The optimal strategy is to interview up to n secretaries, let the first few go, and then select the next secretary who is the “best” so far (totally how dating works, right?!). The secretary problem focuses on finding the best possible solution (only the best will do), but the optimal strategy may lead to a suboptimal secretary (good but not the best) or no secretary at all. In fact, the optimal strategy leaves a pretty big chance of finding no secretary, which will occur if the “best” overall is one of those candidates that we let go at the beginning. I have a figure below of the probability that a candidate is hired at all (it’s the blue line), and it’s way less than 1.0.


The probability of hiring a secretary/finding a spouse and the probability of hiring the *best* secretary/choosing the right spouse as a function of the number of candidates (n) simulated over 10,000 replications
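A simulation along these lines can be sketched in a few lines of Python. This is a reconstruction under assumptions, not the code behind the figure: I assume the classic cutoff of round(n/e) candidates to let go, and the function name, seed, and default replication count are my own choices.

```python
import math
import random

def simulate_secretary(n, reps=10_000, seed=0):
    """Estimate P(hire anyone) and P(hire the best) under the classic rule.

    Rule: pass on the first r = round(n / e) candidates, then hire the first
    candidate who is better than everyone seen so far.
    """
    rng = random.Random(seed)
    r = round(n / math.e)
    hired = best = 0
    for _ in range(reps):
        ranks = list(range(n))          # higher number = better candidate
        rng.shuffle(ranks)
        benchmark = max(ranks[:r], default=-1)
        for quality in ranks[r:]:
            if quality > benchmark:
                hired += 1
                best += quality == n - 1   # True counts as 1
                break
    return hired / reps, best / reps

p_hire, p_best = simulate_secretary(100)
print(f"P(hire anyone) ~ {p_hire:.3f}, P(hire the best) ~ {p_best:.3f}")
```

With this rule, nobody is hired exactly when the best candidate falls among the first r, so the chance of hiring anyone is about 1 - r/n (roughly 63% for large n), while the chance of hiring the very best is close to 1/e (roughly 37%).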

Do I think this satisficing advice is really worth taking? Well, I have three daughters and I’ll encourage them to optimize instead of satisfice.

Here are my other Valentine’s Day, secretary problem, and love posts from my blog archive:

Valentine’s day posts from the #orms blogosphere:

* Yes, I know that satisficing and optimizing are really the same thing, but finding-the-absolute-best and not-walking-away-empty-handed are really two different objective functions.

Do you optimize or satisfice? What other OR models can be compared to finding a partner for life aside from the secretary problem?

snowblowing is NP-complete

The recent winter storm left a lot of snow on my driveway. A lot. My driveway is the perfect place for huge snowdrifts to form. A tweet of my shoveling resulted in the discovery of The Snowblower Problem by Esther M. Arkin, Michael A. Bender, Joseph S. B. Mitchell, and Valentin Polishchuk (HT @fbahr)

The Snowblower Problem (SBP) answers the following question:

How does one optimally use a snowblower to clear a given polygonal region?

The snowblower problem is like the Traveling Salesman Problem (TSP): the objective is to find the shortest tour that removes all the snow from a domain (the polygonal region). The difference between TSP and SBP is that the snow is displaced into a nearby region in the SBP and that if the snow is piled too high, then the snowblower cannot clear the snow. SBP is NP-complete.

The three model variants considered in the paper differ in how and where you can throw the snow: (1) the default model where snow can be thrown in any direction, (2) the adjustable throw direction (left, right, or center) and (3) left throw only. Changing the snow throw direction is cumbersome, so the fixed direction model variant (the left throw only) has more practical value.

Theorem: The SBP is NP-complete, both in the default model and in the adjustable throw model, for inputs that are polygonal domains with holes.


From The Snowblower Problem by Esther M. Arkin, Michael A. Bender, Joseph S. B. Mitchell, and Valentin Polishchuk.

The SBP is similar in spirit to various zombie research models: it’s a silly problem context that has real applications. The applications for SBP are in milling and lawn-mowing. And you guessed it: lawn mowing is also NP-complete.

The paper goes on to present various approximation algorithms for SBP. The algorithms used decompose the snowy region into Voronoi cells and then clear the domain cell-by-cell. It is difficult to succinctly summarize the results here without introducing a bunch of mathematical notation so I’ll refer you directly to the paper for mathematical details.  The conclusion notes that the approximation ratios are likely not the best possible, so there are opportunities for follow on work.

What is your snow removing algorithm and how close to optimality is it?


Update: this is my favorite comment about this post on twitter. I also shovel my snow the old fashioned way and agree!


it’s possible that we have both record levels of immunization & record levels of vulnerability to infectious disease: why social networks matter

My recent blog post on eradicating polio through vaccination ends with this:

Part of the reason why vaccination is challenging is because social networks play a critical role in disease transmission. Even if enough people have been vaccinated in aggregate to obtain herd immunity in theory, it may not be enough if there are hot spots of unvaccinated children who can cause outbreaks. There are hot spots in some areas in California and other states that have generous exemption policies.

I want to elaborate. It’s possible that we have both:

  1. record levels of immunization, and
  2. record levels of vulnerability to highly infectious diseases.

The CDC estimates that 95% of kindergarteners have had MMR and DTaP immunization and 93% have had varicella (chicken pox) immunization. [Link] And yet there are outbreaks! The CDC’s immunization goal is 95%. Given that some kids truly should not be immunized (like those with specific allergies or those who have had serious reactions to other vaccines), there isn’t much room to allow for parents to choose to opt out while maintaining herd immunity. In fact, just a few people opting out has been linked to several disease outbreaks [Links here and here]. This is especially critical for newborns, who cannot be immunized for anything except Hepatitis B, and babies under a year old who cannot get the MMR vaccine for the measles. There is a measles outbreak among babies who cannot be vaccinated against the measles yet in a Chicagoland day care. In other words: please vaccinate your children if you can.

<side note>The rotavirus vaccine was off the market from 1999 to 2006, so my oldest daughter wasn’t vaccinated. She came down with a bad case of rotavirus at age 3.5 when my second daughter was 3 months old. Luckily, my second daughter had received her first rotavirus immunization 2 weeks prior and didn’t get sick.</side note>

Despite having record levels of immunization, we’ve seen a lot of cases of pertussis, measles, and other infectious diseases. It’s all about social networks!

Let’s return to herd immunity. Estimates from Wikipedia indicate that most diseases require an immunization rate of 85%-94% (the herd immunity threshold) in a “well mixed” population to achieve herd immunity. In other words, this might be an optimistically low threshold, since it kind of ignores social networks. An article in The Atlantic reports a 92% herd immunity threshold for most diseases and a 95% herd immunity threshold for highly infectious diseases like the measles (they cite a World Health Organization document). A population being well-mixed is a big assumption. Kids within the same school may travel in different social circles that are more homogeneous than the school as a whole. That’s important. A social circle that has a low level of immunization may introduce risk to the kids in the community even if the school as a whole is above the herd immunity threshold. That’s what happened in the Chicagoland day care with the unvaccinated babies who hung out together every day. But more generally, people who don’t vaccinate tend to have friends who also don’t vaccinate.

The Guardian has a nice simulation about herd immunity and social networks. They consider a few different communities with different vaccination rates (from 10% to 99.7%) with vaccinated, unvaccinated (susceptible) and vaccinated but susceptible individuals (the CDC estimates that MMR only “takes” in 93%-97% of those vaccinated). A few random individuals then come in contact with the measles. The red individuals represent infections. There are measles outbreaks even with a 90% vaccination rate.


The Guardian simulation results for exposing a community to measles. Sometimes a vulnerable community is OK (see the 83.8% vax rate here) but in general there is an outbreak.

We need vaccines because they protect against highly contagious diseases. Epidemiologists use the “basic reproduction number” (R0) to estimate how infectious a disease is, where R0 is the average number of people someone with the disease is expected to infect. The smaller R0 is, the less a disease tends to spread. Exponential growth happens when R0 > 1, but there is exponential growth (flu R0=2.5) and then there is exponential growth (measles R0 = 16!!). While R0 can be lowered by actions like good hygiene, some diseases are inherently more contagious than others. We can’t get the measles to spread as “slowly” as seasonal flu no matter how much we encourage people to wash their hands and use hand sanitizer. This is why we need vaccines as well as a higher herd immunity threshold for diseases like measles than we do for the flu. When that one person who hasn’t been vaccinated infects 12-18 other people, we have an epidemic on our hands. From the Wall Street Journal:

“Imagine if you had a reproduction number of 15 for measles with everybody susceptible,” said Derek A.T. Cummings, a professor of epidemiology at Johns Hopkins Bloomberg School of Public Health. “If you go in and vaccinate half the people, the expected reproduction number goes down to 7.5.”
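The arithmetic in that quote, and the thresholds mentioned earlier, can be checked with a short sketch. It uses the standard textbook approximations for a well-mixed population (threshold = 1 - 1/R0, and an effective reproduction number that shrinks with the immune share). The function names and the efficacy parameter are my own illustration, not taken from the Wall Street Journal article or the CDC, and the sketch ignores the social network effects discussed above.

```python
def herd_immunity_threshold(r0):
    """Immune share needed in a well-mixed population to push R below 1."""
    return 1.0 - 1.0 / r0

def effective_r(r0, coverage, efficacy=1.0):
    """Expected secondary infections when a fraction `coverage` is vaccinated
    and the vaccine "takes" in a fraction `efficacy` of those vaccinated."""
    return r0 * (1.0 - coverage * efficacy)

# R0 values quoted in the post.
for name, r0 in [("seasonal flu", 2.5), ("measles", 16.0)]:
    print(f"{name}: R0 = {r0}, threshold = {herd_immunity_threshold(r0):.1%}")

# The example from the quote: R0 of 15 with half the people vaccinated.
print(effective_r(15, 0.5))

# Assumed scenario: measles, 95% coverage, vaccine takes in 95% of recipients.
print(round(effective_r(16, 0.95, 0.95), 2))
```

Under these assumptions the last number is still above 1, which is one way to see why there is so little room for opting out with a disease as contagious as measles.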

The Guardian provides a nice figure of deadliness (the y-axis) vs. the basic reproduction numbers (the x-axis) of various diseases. That cluster of diseases on the right hand side is composed of highly infectious diseases. Measles, rotavirus, and whooping cough are highly infectious in ways that the seasonal flu just isn’t. In fact, the CDC recommends vaccines for that entire cluster of highly infectious diseases over there on the right except for malaria (which isn’t a huge problem in the US) because they are really that bad.


Deadliness vs. Basic Reproduction Number.

I get the impression we keep having these debates without agreeing on what the problem is and what its consequences are. It’s reasonable to say that most people in my generation have no idea what a massive infectious disease outbreak looks like. It’s not like the seasonal flu, where you know just a few people who succumb to the flu every year but you’re usually OK. With these highly contagious diseases, outbreaks may be rare but then BOOM, everyone you know is sick. Chicken pox (R0 = 8.5) may be an exception. The chicken pox epidemic in my kindergarten class was the only time I experienced a massive infectious disease outbreak (class attendance dwindled to 3-5 students for a few days).

<side note>I may have had the mildest case of the chicken pox ever recorded. I had just a few poxes/blisters and itched for maybe an hour. But I bear a scar on my face from one of the few poxes I had(!) </side note>

I’ll stop here for now. Let me know your thoughts on vaccination, social networks, herd immunity, and disease outbreaks.


eradicating polio through vaccination and with analytics

The most recent issue of Interfaces (Jan-Feb 2015, 45(1)) has an article about eradicating polio published by Kimberly M. Thompson, Radboud J. Duintjer Tebbens, Mark A. Pallansch, Steven G.F. Wassilak, and Stephen L. Cochi from Kid Risk, Inc., and the U.S. Centers for Disease Control and Prevention (CDC). This paper develops and applies a few analytics models to inform policy questions regarding the eradication of polioviruses (polio) [Link to paper].

The article is timely given that vaccination is in the news again. At least this time, the news is fueled by outrage over GOP Presidential contenders Chris Christie and Rand Paul’s belief that parents should have the choice to vaccinate their children [example here].

Polio has essentially been eradicated in the United States, but polio has not been eradicated in the developing world. The Global Polio Eradication Initiative (GPEI) helped to reduce the number of paralytic polio cases from 350,000 in 1988 to 2,000 in 2001. This enormous reduction has mainly been achieved through vaccination. There are two types of vaccines: the live oral vaccine and the inactivated vaccine (IPV). Those who have been vaccinated have lifelong protection but can participate in polio transmission.

The paper summarizes a research collaboration that occurred over a decade and was driven by three questions asked by global policy leaders:

  • What vaccine (if any) should countries use after wild polioviruses (WPV) eradication, considering both health and economic outcomes?
  • What risks will need to be managed to achieve and maintain a world free of polio?
  • At the time of the 1988 commitment to polio eradication, most countries expected to stop polio vaccinations after WPV eradication, as had occurred for smallpox. Would world health leaders still want to do so after the successful eradication of WPVs?

The paper is written at a fairly high level, since it summarizes about a decade of research that has been published in several papers. They ended up using quite a few methodologies to answer quite a few questions, not just about routine immunization. Here is a snippet from the abstract (emphasis mine):

Over the last decade, the collaboration innovatively combined numerous operations research and management science tools, including simulation, decision and risk analysis, system dynamics, and optimization to help policy makers understand and quantify the implications of their choices. These integrated modeling efforts helped motivate faster responses to polio outbreaks, leading to a global resolution and significantly reduced response time and outbreak sizes. Insights from the models also underpinned a 192-country resolution to coordinate global cessation of the use of one of the two vaccines after wild poliovirus eradication (i.e., allowing continued use of the other vaccine as desired). Finally, the model results helped us to make the economic case for a continued commitment to polio eradication by quantifying the value of prevention and showing the health and economic outcomes associated with the alternatives. The work helped to raise the billions of dollars needed to support polio eradication.

The following figure from the paper summarizes some of the problems addressed by the research team. The problems involved everything from stockpiling vaccines, to administering vaccines for routine immunization and to containing outbreaks:


A decision tree showing the possible options for preventing and containing polio. From “Polio Eradicators Use Integrated Analytical Models to Make Better Decisions” by Kimberly M. Thompson, Radboud J. Duintjer Tebbens, Mark A. Pallansch, Steven G.F. Wassilak, Stephen L. Cochi in Interfaces

I wanted to include one of the research figures used in the paper that helped guide policy and obtain funding. The figure (see below) is pretty interesting. It shows the costs, in terms of dollars ($) and paralytic polio cases, associated with two strategies over a 20 year horizon: (1) intense vaccination until eradication or (2) intense vaccination but only until it’s “cost effective” (routine immunization). The simulation results show that the cumulative costs (in dollars or lives affected) are much, much lower over a 20 year time horizon if they adopt the vaccination until eradication strategy. This helped to make a big splash. From the paper:

In a press release related to this analysis, Dr. Tachi Yamada, then president of the Bill & Melinda Gates Foundation’s Global Health Programs stated: “This study presents a clear case for fully and immediately funding global polio eradication, and ensuring that children everywhere, rich and poor, are protected from this devastating disease.” In 2011, Bill and Melinda Gates made polio eradication the highest priority for their foundation.


In full disclosure, I’m a big fan of immunization. All of my children are fully vaccinated. My grandmother was born in 1906 and used to tell stories about relatives, so many of whom ultimately died of infectious diseases (Grandma lived until she was ~102!). I’m glad my kids don’t have to worry about getting many of these diseases. I’m also proud to contribute to herd immunity.  We come in contact with people who have compromised immune systems or could not get immunized, and I’m glad we’re playing our part in keeping everyone else healthy. Part of the reason why vaccination is challenging is because social networks play a critical role in disease transmission. Even if enough people have been vaccinated in aggregate to obtain herd immunity in theory, it may not be enough if there are hot spots of unvaccinated children who can cause outbreaks. There are hot spots in some areas in California and other states that have generous exemption policies.




