introducing Badger Bracketology, a tool for forecasting the NCAA football playoff

Today I am introducing Badger Bracketology:

I have long been interested in football analytics, and I enjoy crunching numbers while watching the games. This year is the first season for the NCAA football playoff, where four teams will play to determine the National Champion. It’s a small bracket, but it’s a start in the right direction.

The first step to becoming the national champion is to make the playoff. To do so, a team must be one of the top four ranked teams at the end of the season. A selection committee manually ranks the teams, drawing on a slew of information and other rankings to make its decisions.

I wanted to see if I could forecast the playoff ahead of time by simulating the rest of the season rather than waiting until all of the season’s games have been played. Plus, it’s a fun project that I can share with the undergraduate simulation course that I teach in the spring.

Here is how my simulation model works. The most critical part is the ranking method, which uses completed game results to rate and then rank the teams so that I can forecast who the top 4 teams will be at the end of the season. I need to do this solely using math (no humans in the loop!) in each of 10,000 replications. I start with the outcomes of the games played so far, using at least 8 weeks of data. These outcomes produce a rating for each team, which I then rank. The ranking methodology uses a connectivity matrix based on Google’s PageRank algorithm (similar to a Markov chain). So far, I’ve considered three variants of this model that take various bits of information into account, like who a team beats, who it loses to, and the additional value provided by home wins. I used data from the 2012 and 2013 seasons to tune the parameters needed for the models.
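
As a rough illustration of the idea (not the exact model or its tuned parameters; the team names and game results below are made up), a PageRank-style rating can be computed by having each team “vote” for the teams that beat it and taking the stationary distribution of the resulting chain:

```python
# A minimal sketch of a PageRank-style team rating. Hypothetical data;
# the actual model's variants and tuned parameters are not shown here.
# Each team "votes" for the teams that beat it; ratings are the
# stationary distribution of the resulting Markov chain.

def pagerank_ratings(teams, games, damping=0.85, iters=100):
    """games: list of (winner, loser) pairs."""
    n = len(teams)
    idx = {t: i for i, t in enumerate(teams)}
    # beaten_by[i] = indices of the teams that beat team i
    beaten_by = {i: [] for i in range(n)}
    for winner, loser in games:
        beaten_by[idx[loser]].append(idx[winner])
    rating = [1.0 / n] * n
    for _ in range(iters):
        new = [(1 - damping) / n] * n
        for i in range(n):
            winners = beaten_by[i]
            if winners:
                share = damping * rating[i] / len(winners)
                for w in winners:
                    new[w] += share
            else:
                # undefeated so far: spread the vote evenly (dangling node)
                for j in range(n):
                    new[j] += damping * rating[i] / n
        rating = new
    return sorted(zip(teams, rating), key=lambda p: -p[1])

teams = ["A", "B", "C", "D"]
games = [("A", "B"), ("A", "C"), ("B", "C"), ("D", "A"), ("B", "D")]
for team, r in pagerank_ratings(teams, games):
    print(f"{team}: {r:.3f}")
```

Because the ratings form a probability distribution, they sum to one, and a winless team like C ends up with only the baseline (1 - damping)/n mass.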

The ratings, along with the impact of home field advantage, are then used to determine a win probability for each game. From previous years, we found that the home team won 56.9% of games later in the season (week 9 or later), which translates to an extra boost in win probability of ~6.9% for home teams. This is important since there are home/away games as well as games at neutral sites, and we need to take this into account. The simulation selects winners in the next week of games by essentially flipping a biased coin with these win probabilities. The teams are then re-ranked after each week of simulated game outcomes. This is repeated until we get to the end of the season. Finally, I identify and simulate the conference championship games (these are the only games not scheduled in advance), which yields the final ranking. Go here for more details.
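
Here is a minimal sketch of the biased-coin step. The ratings-ratio form of the win probability is an assumption for illustration (the model’s actual functional form isn’t spelled out here); only the ~6.9% home boost comes from the numbers above.

```python
import random

HOME_BOOST = 0.069  # home teams won 56.9% of late-season games

def win_probability(rating_home, rating_away, neutral_site=False):
    # Assumed functional form for illustration: the home team's share
    # of the two ratings, plus a flat bump for home field advantage.
    base = rating_home / (rating_home + rating_away)
    if not neutral_site:
        base = min(1.0, base + HOME_BOOST)
    return base

def simulate_game(rating_home, rating_away, neutral_site=False, rng=random):
    """Flip a biased coin: True if the home team wins."""
    return rng.random() < win_probability(rating_home, rating_away, neutral_site)

random.seed(42)
p = win_probability(0.12, 0.12)  # evenly matched teams, home game
print(f"Home win probability: {p:.3f}")  # prints 0.569 (= 0.5 + 0.069)
wins = sum(simulate_game(0.12, 0.12) for _ in range(10_000))
print(f"Simulated home win rate: {wins / 10_000:.3f}")
```

Repeating this coin flip for every remaining game, re-ranking, and moving to the next week is one replication; the full forecast averages over many replications.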

There are many methods for predicting the outcome of a game in advance. Most of the sophisticated methods use additional information that we could not expect to obtain weeks ahead of time (like the point spread, point outcomes, yards allowed, etc.). Additionally, some of the methods simply return win probabilities and cannot be used to identify the top four teams at the end of the season. My method is simple, but it gives us everything we need without being so complex that I would be suspicious of overfitting. The college football season is pretty short, so our matrix is really sparse. At present, 8 weeks of football have been played, but many teams have played just 6-7 games. Additional information could be used to make better predictions, and I hope to further refine and improve the model in coming years. Suggestions for improving the model will be well-received.

Our results for our first week of predictions are here. Check back each week for more predictions.

Badger Bracketology:

Our twitter handle is: @badgerbrackets

Your thoughts and feedback are welcome!

Additional reading:


what I learned from preparing for a semi-plenary talk

I recently blogged about a semi-plenary talk I gave at the German OR Society Conference. This post is about the process of preparing for that presentation.

First I thought about the story I wanted to tell. I’ve given a lot of research talks before, and I understand the general plot of a research talk, but a semi-plenary is not a regular research talk, and I wasn’t initially sure how to tell a story in this new way. I asked a wise colleague for advice, and it was excellent:

  1. Think about your favorite plenary talks. Model your talk after that (including the amount of math to include in the talk).
  2. Think of the talk as a series of 30 second elevator talks. Let those messages structure your story.
  3. Your audience will want to feel that they’ve learned something. What are the takeaways?

I found that creating an initial set of slides wasn’t so bad once I decided on the story I wanted to tell. I have given so many talks before that I had a huge set of slides to pull from. But I had too many slides to fit into the time slot, and editing and pruning them was pure torture.

A few months ago, I read a post by an academic blogger who had recently given a plenary talk. I can’t find the post now but I remember that it took about 40 hours to create a one hour talk. This reminded me of an earlier post on teaching MOOCs (How college is like choosing between going to the movies and Netflix), where an enormous amount of time goes into a single lecture.

Here is why it took so long. Every time I removed a slide or combined a few slides into one, it affected the narrative in a major way. In a regular research talk, I find it easy to pick a few details to leave out. Not the case this time. Rather than condense the story, I eventually left some topics out altogether or turned the insights from a paper into a couple of bullet points on a slide. Finding the right balance of detail and insight was a constant challenge.

I ended up having almost no math in my talk. I decided that insights were more important than going through technical details.

I recreated almost all of the visuals in my slides from previous talks. It’s not that my visuals were total crap; it’s just that there was too much detail and notation in the figures I had made for research talks. I didn’t want confusing visuals getting in the way of the story. Sometimes I added a picture to illustrate a technical idea or insight rather than launching into a long narrative to explain a simple point. Here is an example of a new visual explaining the concept of ambulance response times and coverage:

Example of a conceptual slide I used in my talk.

Other times I just needed to make a simpler version of a figure or table that allowed me to look at a single curve or to compare two things, instead of a busier figure that works in a regular research talk. At one point, I turned a figure with four subfigures into a single figure by omitting the other three subfigures. I make nearly all of my figures in Matlab and save my code so that I can easily recreate figures for presentations or paper revisions. Remaking one figure wasn’t too taxing, but remaking a lot of figures took some time.

Finally, I learned so much about my research by giving this talk. The end of my talk answered two questions:

  1. Where is emergency medical service research in OR going?
  2. Where does emergency medical service research in OR need to go?

I think about high-level issues all the time (after all, I frequently write proposals!). But this was different: I was talking about where this entire line of research is going, not just mine. While making slides to answer the question “Where does emergency medical service research in OR need to go?”, I learned that my research had already made progress in the right direction. But not all of my ideas are in line with where this line of research needs to go, and it was worthwhile to realign my priorities.


Related posts:

  1. Do you have a 30 second elevator talk about your research?
  2. The most important 30 seconds of your dissertation defense


underpowered statistical tests and the myth of the myth of the hot hand

In grad school, I learned about the hot hand fallacy in basketball. The so-called “hot hand” belongs to a player whose scoring success probability is temporarily increased and who therefore should shoot the ball more often (in the basketball context). I thought the myth of the hot hand was an amazing result: there is no such thing as a hot hand in sports; it’s just that humans are not good at evaluating streaks of successes (hot hands) or failures (slumps).

Flash forward several years. I read a headline about how hand sanitizer doesn’t “work” in terms of preventing illness. I looked at the abstract and read off the numbers. The group that used hand sanitizer (in addition to hand washing) got sick 15-20% less than the control group that only washed hands. The 15-20% difference wasn’t statistically significant, so it was impossible to conclude that hand sanitizing helped, but it represented a lot of illnesses averted. I wondered if this difference would have been statistically significant if the number of participants had been just a bit larger.

It turns out that I was onto something.

The hot hand fallacy is like the hand sanitizer study: the study design was underpowered, meaning that there was no way to reject the null hypothesis and draw the “correct” conclusion about whether the hot hand effect or the hand sanitizer effect is real. In the case of the hand sanitizer, the number of participants needed to be large enough to detect a 15-20% improvement in the number of illnesses acquired. Undergraduates estimate required sample sizes in probability and statistics courses, but researchers sometimes forget to design experiments in a way that can detect real differences.
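
To see how sample size drives power, here is a back-of-the-envelope calculation using the normal approximation for a two-proportion z-test. The 30% baseline illness rate is a made-up number in the spirit of the study, not taken from the paper:

```python
import math

def normal_cdf(x):
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def power_two_proportions(p_control, p_treatment, n_per_group, alpha=0.05):
    """Approximate power of a two-sided two-proportion z-test
    (normal approximation; illustrative, not the study's actual design)."""
    p_bar = (p_control + p_treatment) / 2
    se0 = math.sqrt(2 * p_bar * (1 - p_bar) / n_per_group)   # SE under the null
    se1 = math.sqrt(p_control * (1 - p_control) / n_per_group
                    + p_treatment * (1 - p_treatment) / n_per_group)
    z_alpha = 1.96  # two-sided 5% test
    diff = abs(p_control - p_treatment)
    return normal_cdf((diff - z_alpha * se0) / se1)

# Suppose 30% of the control group gets sick and sanitizer cuts that
# by 17.5% (to ~24.75%): hypothetical rates, chosen for illustration.
for n in (100, 500, 1000, 2000):
    print(f"n = {n:>4} per group: power = {power_two_proportions(0.30, 0.2475, n):.2f}")
```

With these made-up rates, power climbs from roughly 10% with 100 participants per group to well above the conventional 80% threshold with 2,000 per group, which is exactly the “just a bit larger” effect I wondered about.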

My UW-Madison colleague Jordan Ellenberg has a fantastic article about the myth of the myth of the hot hand on Deadspin. He has more in his book How Not to Be Wrong, which I highly recommend. He introduced me to a research paper by Kevin Korb and Michael Stillwell that applied the statistical tests used to test for the hot hand effect to simulated data that did indeed have a hot hand. The “hot” data alternated between streaks with success probabilities of 50% and 90%. They demonstrated that the serial correlation and runs tests used in the early “hot hand fallacy” papers were unable to identify a real hot hand; these tests were underpowered and could not reject the null hypothesis when it was indeed false. This is poor test design. If you want to answer a question using any kind of statistical test, it’s important to collect enough data and use the right tools so you can find the signal in the noise (if there is one) and reject the null hypothesis if it is false.
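
Here is a small simulation in the spirit of Korb and Stillwell’s experiment. The alternating 50%/90% success probabilities match their setup, but the sequence length and block size are my choices for illustration:

```python
import random

def runs_test_z(seq):
    """Wald-Wolfowitz runs test statistic for a 0/1 sequence."""
    n1, n0 = seq.count(1), seq.count(0)
    n = n1 + n0
    if n1 == 0 or n0 == 0:
        return 0.0  # degenerate sequence: no evidence either way
    runs = 1 + sum(seq[i] != seq[i - 1] for i in range(1, n))
    mean = 2 * n1 * n0 / n + 1
    var = 2 * n1 * n0 * (2 * n1 * n0 - n) / (n * n * (n - 1))
    return (runs - mean) / var ** 0.5

def hot_sequence(blocks=4, block_len=10, rng=random):
    """Alternating streaks with success probabilities 50% and 90%,
    mimicking Korb and Stillwell's simulated 'hot' shooter.
    (Block count and length are illustrative choices, not the paper's.)"""
    seq = []
    for b in range(blocks):
        p = 0.5 if b % 2 == 0 else 0.9
        seq.extend(1 if rng.random() < p else 0 for _ in range(block_len))
    return seq

random.seed(0)
trials = 2000
detections = sum(abs(runs_test_z(hot_sequence())) > 1.96 for _ in range(trials))
print(f"Runs test detected the (real) hot hand in {detections / trials:.0%} of trials")
```

Even though every simulated shooter genuinely runs hot, the runs test on game-length samples rejects the null far less often than the conventional 80% power target, which is the underpowered-test problem in miniature.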

I learned that there appears to be no hot hand in sports where a defense can easily adapt to put greater pressure on the “hot” player, like basketball and football. The player may be hot, but it doesn’t show up in the statistics simply because the hot player is, say, double-teamed. The hot hand is more apparent and measurable in sports where defenses are not flexible enough to put more pressure on the hot player, like baseball and volleyball.



land O links

Here are a few links for your enjoyment:

  1. Does a five year old need to learn how to code?
  2. A mathematician uses statistics to predict the next Game of Thrones death.
  3. Why academics stink at writing.
  4. An operations researcher argues that airports should screen for Ebola the same way they screen for terrorists (nice job Sheldon Jacobson!). He was also interviewed on MSNBC.
  5. How diversity makes us smarter
  6. Article on why women should learn to love criticism. HT @katemath. “76 percent of the negative feedback given to women included some kind of personality criticism, such as comments that the woman was ‘abrasive,’ ‘judgmental’ or ‘strident.’ Only 2 percent of men’s critical reviews included negative personality comments.” Discuss.
  7. Are you a satisficer or a maximizer?

in defense of model complexity

Recently I wrote a post in defense of model simplicity. I liked a lot of things about that post, but it wasn’t the entire picture. Much of what we do in operations research deals with solving complex problems, and often we can’t settle for anything simple. Simple models can be incredibly useful, but they are generally useful when we are looking at a piece of a system without so many moving parts. Do we make a credit card offer to person X? Yes or no? An educated guess will suffice. A model (simple or complicated) that replaces that educated guess can be a big improvement. But the decision context is inherently simple: we need a model that tells us yes or no.

Operations research does well when we need an answer to a complex problem with many interconnected parts. Case in point: it’s hard to find even a feasible solution in many optimization models (capacitated facility location, scheduling models, the vehicle routing problem with time windows, etc.). Finding a feasible solution to a yes-or-no problem, by contrast, is trivial.

The last two semesters, I’ve team taught an introductory course for engineering freshmen on engineering grand challenges. Nearly all Wisconsin engineering students are admitted to the College of Engineering without a major (they are in a “general engineering” curriculum for the first year), although this is starting to change this year. One of the goals of this course is to introduce the students to different majors. I am in charge of Industrial and Systems Engineering. I have to talk about operations research, manufacturing, and human factors (confession: I really struggle with human factors). I’ve gotten better at telling 18 year olds about why industrial engineering is so cool. I’ve found that using a few examples is the best way to make this point.

(1) My favorite example is explaining why Major League Baseball scheduling is so hard (thanks again Mike Trick! Read more here and here.) This example is intuitive to so many students because they understand the many constraints:

  • 30 teams with 162 games each
  • half home games, half away games
  • each team must play each other a given number of times
  • a team cannot play too many away games in a row
  • travel distances matter: a team can’t fly across the country all the time
  • television revenue makes some schedules more attractive than others
  • teams play each other, so you can’t fix a part of the schedule in isolation: everything affects everything else
  • you finally get a schedule you like, and then the Pope asks to visit New York and needs to be in a baseball stadium the day a game is scheduled, forcing you to reschedule the season.
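
Even the innocuous-sounding requirement that each team plays every other team takes an algorithm to satisfy. Here is a toy kernel (the classic circle method for a single round robin), before any of the home/away, travel, or television constraints above are layered on:

```python
def round_robin(teams):
    """Circle method: each team plays every other team exactly once.
    A toy kernel; the real MLB problem layers home/away balance,
    travel, and TV constraints on top of structures like this."""
    teams = list(teams)
    if len(teams) % 2:
        teams.append(None)  # odd team count: someone gets a bye each round
    n = len(teams)
    rounds = []
    for _ in range(n - 1):
        pairs = [(teams[i], teams[n - 1 - i]) for i in range(n // 2)
                 if teams[i] is not None and teams[n - 1 - i] is not None]
        rounds.append(pairs)
        # rotate every team except the first around the "circle"
        teams = [teams[0]] + [teams[-1]] + teams[1:-1]
    return rounds

schedule = round_robin(["Cubs", "Brewers", "Cardinals", "Pirates"])
for week, games in enumerate(schedule, 1):
    print(f"Week {week}: {games}")
```

This only handles “everyone plays everyone once.” Adding home/away balance, travel limits, and revenue preferences turns the problem into the integer program that makes MLB scheduling genuinely hard.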

(2) Scheduling people for work shifts is another problem that needs complex models. Decisions are discrete (you work the 8AM shift or you don’t), and there are many constraints, such as weekly hour limits, block schedule structures, limits on consecutive shifts, and union rules. Paul Rubin has a great blog post on scheduling instability:

The multiperiod nature of scheduling models tends to make them a bit chewy, especially when you need to coordinate schedules of multiple individuals (or machines, or venues) across blocks of time while dealing with multiple constraints and possibly multiple conflicting criteria. Not all complex models are optimization models.

(3) A while back, I blogged about an article in Nautilus about optimization and trucking [Link and Link], all about the difficulty of coming up with useful models for effectively delivering goods by truck. For the models to be useful, they need to account for many rules (like breaks) and the human behavior of the truck drivers. The resulting models are pretty complex.

There were union rules, there was industry practice. Tractors can be stored anywhere, humans like to go home at night. “I said we’re going to need a file with 2,000 rules. Trucks are simple; drivers are complicated.” At UPS, a program could come up with a great route, but if it violated, say, the Teamsters Union rules, it was worthless. For instance, time windows need to be built in for driver’s breaks and lunches.

(4) Thus far, I’ve only used scheduling and routing examples with a lot of interconnected parts. But those aren’t the only models that require complexity. My interest in public sector operations research has led me to appreciate so-called “wicked problems” (as opposed to “tame” problems). Wicked problems often involve the soft side of operations research and are themselves a defense of model complexity. Due to the social component, there are many stakeholders with contradictory needs, and a wicked problem quickly unravels into the other social issues it is connected to, and so on. Russell Ackoff summed this up nicely:

“Every problem interacts with other problems and is therefore part of a set of interrelated problems, a system of problems…. I choose to call such a system a mess.”

I recommend C. West Churchman’s guest editorial in Management Science in 1967, where the term “wicked problems” was coined [pdf: Wicked Problems Churchman 1967] and this nice article on “wicked” problems by John Mingers in OR/MS Today.

Do you have a favorite complex model for a wicked problem?


Related reading:

a journey to the German OR Society Conference

Earlier in September, I gave a semi-plenary at the 2014 German OR Conference in Aachen, Germany. It was a wonderful conference and experience that will inspire at least another blog post or two. The German OR Society and Marco Lübbecke were wonderful hosts and conference organizers. There were more than 850 attendees, 500 talks, and an impressive group of plenary and semi-plenary talks.

Earlier I blogged about Mike Trick’s plenary talk on Major League Baseball scheduling and analytics that opened up the conference. I’m finally getting around to blogging about my talk on emergency medical services. For another take, see Mike Trick’s blog post about my talk. I learned a lot by giving the talk and talking to German researchers. Emergency medical services are operated in different ways in different parts of the world. It was refreshing to talk to other researchers who are looking at healthcare delivery issues from a different perspective than we have in the United States. It was also fun to catch up with two of my favorite bloggers (Mike Trick and Marc-Andre Carle) at social events and meet some Punk Rock OR readers from across the pond.

I posted the slides to my talk below.

I took a few pictures from the conference and from Aachen that capture some of the highlights of the trip.


At the reception with Mike Trick and Marc-Andre Carle.


At the reception.


At the conference.


There were pretzels at almost every meal and snack break.


A statue in a square in Aachen.


A square in Aachen.


What looks like a panther statue.


I ran into Belgium and the Netherlands and found the Dreiländerpunkt [three-country point].


The conference bags were pretty snazzy.


A snapshot of me blogging about Mike Trick’s keynote as taken by my laptop. I didn’t realize how serious I look–blogging is a lot of fun, I swear!

The German OR society has a great mascot: a GORilla!

land O links

Here are a few links for your weekend reading.

  1. Wealthy Los Angeles K-12 school vaccination rates are as low as Sudan’s. We have a paradox: higher overall vaccination rates yet higher vulnerability, because clusters within social networks choose not to vaccinate.
  2. A few OR/Stat bloggers have written about FiveThirtyEight and data journalism recently. I like Michael Lopez’s (@StatsByLopez) blog post on where FiveThirtyEight stands after six months and Nathan Brixius’s take on FiveThirtyEight’s burrito challenge.
  3. The sad, gradual decline of the fade-out in popular music.
  4. Athene Donald on imposter syndrome and everyone feeling like an imposter sometime.
  5. I want to try code reviews in lab meetings.


