# Tag Archives: sports

## Should a football team run or pass? A game theory and linear programming approach

Last week I visited Oberlin College to deliver the Fuzzy Vance Lecture in Mathematics (see post here). In addition, I gave two lectures to Bob Bosch’s undergraduate optimization course. My post about my lecture on ambulance location models is here.

My second lecture was about how to solve two-player zero-sum games using linear programming, with a sports analytics application: should a football team run or pass? The purpose of the lecture was to introduce zero-sum games (a new topic for most students) and to show how to solve zero-sum games with two decision-makers using linear programming.
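The LP formulation can be sketched in a few lines. Here is a minimal example with a made-up payoff matrix of expected yards (not data from the lecture): the offense picks run or pass, the defense anticipates run or pass, and the offense solves for the mixed strategy that maximizes the game value.

```python
# Solve a two-player zero-sum game (offense vs. defense) with linear
# programming. The payoff matrix is hypothetical: expected yards gained
# when the offense runs/passes against a defense expecting run/pass.
import numpy as np
from scipy.optimize import linprog

A = np.array([[3.0, 8.0],   # offense runs:   defense expects run, pass
              [9.0, 4.0]])  # offense passes: defense expects run, pass

m, n = A.shape
# Variables: x_1..x_m (offense's mixed strategy) and v (game value).
# Maximize v  <=>  minimize -v.
c = np.zeros(m + 1)
c[-1] = -1.0
# For each defensive strategy j: v - sum_i x_i * A[i, j] <= 0
A_ub = np.hstack([-A.T, np.ones((n, 1))])
b_ub = np.zeros(n)
# The strategy probabilities must sum to 1.
A_eq = np.hstack([np.ones((1, m)), np.zeros((1, 1))])
b_eq = np.array([1.0])
bounds = [(0, 1)] * m + [(None, None)]  # v is free

res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
x, v = res.x[:m], res.x[-1]
print(f"run {x[0]:.2f}, pass {x[1]:.2f}, value {v:.2f} yards")
```

For this toy matrix the offense should mix run and pass 50/50 and expects 6 yards per play; any predictable deviation lets the defense do better.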

This lecture tied into my Badger Bracketology work, but since I do not use optimization in my college football playoff forecasting model, I selected another football application.

## the NFL football draft and the knapsack problem

In this week’s Advanced Football Analytics podcast, Brian Burke talked about the knapsack problem and the NFL draft [Link]. I enjoyed it. Brian has a blog post explaining the concept of the knapsack problem as it relates to the NFL draft here. The idea is that the draft is a capital budgeting problem for each team: the team’s salary cap space is the knapsack budget, the potential players are the items, the players’ salaries against the cap are the item weights, and the players’ values (hard to estimate!) are the item rewards. Additional constraints are needed to ensure that all the positions are covered; otherwise the optimal solution returned might be a team with only quarterbacks and running backs. Brian talks a bit about analytics and estimating value. I’ll let you listen to the podcast to get all the details.
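The draft analogy maps onto the classic 0/1 knapsack. A toy sketch with made-up players, cap hits, and values (not Brian's numbers), solved by dynamic programming:

```python
# A toy 0/1 knapsack for the draft analogy: cap space is the budget,
# cap hits are the item weights, estimated player value is the reward.
# Players and all numbers are made up for illustration.
def knapsack(players, budget):
    """players: list of (name, cap_hit, value); budget: integer cap space."""
    # dp[b] = (best_value, chosen_names) using at most b units of cap space
    dp = [(0, [])] * (budget + 1)
    for name, cost, value in players:
        new_dp = dp[:]
        for b in range(cost, budget + 1):
            candidate = dp[b - cost][0] + value
            if candidate > new_dp[b][0]:
                new_dp[b] = (candidate, dp[b - cost][1] + [name])
        dp = new_dp
    return dp[budget]

players = [("QB A", 8, 30), ("RB B", 5, 14), ("WR C", 4, 16), ("OT D", 6, 18)]
best_value, roster = knapsack(players, 15)
print(best_value, roster)
```

A real draft model would add side constraints (cover every position, limit picks per round), which pushes you from this simple DP to an integer program, exactly the extension Brian describes.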

During the podcast, Brian gave OR a shout out and added a side note about how knapsack problems are useful for a bunch of real applications and can be very difficult to solve in the real world (thanks!). I appreciated this aside, since sometimes cute applications of OR on small problem instances give the impression that our tools are trivial and silly. The reality is that optimization algorithms are incredibly powerful and have allowed us to solve incredibly difficult optimization problems.

Optimization has gotten sub-optimal coverage in the press lately. My Wisconsin colleagues Michael Ferris and Stephen Wright wrote a defense of optimization in response to an obnoxious anti-optimization article in the New York Times Magazine (“A sucker is optimized every minute.” Really?). Bill Cook, Nathan Brixius, and JF Puget wrote nice blog posts in response to coverage of a TSP road trip application that failed to touch on the bigger picture (TSP is useful for routing and gene sequencing, not just planning imaginary road trips!!). I didn’t write my own defense of optimization since Bill, Nathan, and JF did such a good job, but needless to say, I am with them (and with optimization) all the way. It’s frustrating when our field misses opportunities to market what we do.

If you enjoy podcasts, football, and analytics, I recommend the Advanced Football Analytics podcast that featured Virgil Carter, who published his groundbreaking football analytics research in Operations Research [Link].


## Some thoughts on the College Football Playoff

After a fun year of Badger Bracketology, I wanted to reflect upon the college football playoff.

Nate Silver discusses the playoff in an article on FiveThirtyEight, and he touches on the two most salient issues:

• False negatives: leaving teams with a credible case for the national championship out of the playoff.
• False positives: “undeserving” teams in the playoff.

As the number of teams in the playoff increases, the number of false negatives decreases (good – this allows us to have a chance of selecting the “right” national champion) and the number of false positives increases (bad).

One of my concerns with the old Bowl Championship Series (BCS) system, with its single national championship game, was that exactly two teams were invited to that game. This was a critical assumption in the old system that was rarely discussed: there were rarely exactly two “deserving” teams. Usually, “deserving” is equated with being undefeated and in a major conference. Out of 16 BCS tournaments, this situation occurred only four times (25% of championship games), leading to controversy in the remaining 75%. That is not a good batting average, with most of the 12 controversial years having too many false negatives and no false positives.

The new College Football Playoff (CFP) system has a new assumption: the number of “deserving” teams does not exceed four teams.

If you look at the BCS years, we see that this assumption was never violated: there were never more than four undefeated teams in major conferences, nor a controversy surrounding more than three potential “deserving” teams. Controversy typically surrounded the third team that was left out, a team that would now be invited to the playoff. At face value, the four-team playoff seems about right.

But given the title of Nate Silver’s article (“Expand The College Football Playoff”) and the excited discussion of an eight-team playoff in 2008 after a controversial national championship game, I can safely say that most people want more than four teams in the playoff. TCU’s dominance in a bowl game supports these arguments. The fact that we’ve had one controversial seeding in one CFP is a sign that maybe four isn’t the right playoff size. What is the upper bound on the number of deserving teams?

Answering this question is tricky, because there is a relationship between the number of teams in the playoff and our definition of “deserving.” There will always be teams on the bubble, but as the playoff becomes larger, this becomes less of an issue. Thoughts on this topic are welcome in blog comments.

It’s worth mentioning the impact on academics and injuries. As a professor of operations research, I believe that every decision requires balancing tradeoffs. The tradeoffs in the college football playoff should not only be about false positives, false negatives, fan enjoyment, and ad revenue. Maybe this is trivial: it’s an extra game for a mere eight teams. But I will be disappointed if the impact on the student-athletes and their families, including academics and injuries, is not part of the conversation.

## introducing Badger Bracketology, a tool for forecasting the NCAA football playoff

Today I am introducing Badger Bracketology:
http://bracketology.engr.wisc.edu/

I have long been interested in football analytics, and I enjoy crunching numbers while watching the games. This year is the first season for the NCAA football playoff, where four teams will play to determine the national champion. It’s a small bracket, but it’s a step in the right direction.

The first step to becoming the national champion is to make the playoff. To do so, a team must be one of the top four ranked teams at the end of the season. A selection committee manually ranks the teams, drawing on a slew of information and other rankings to make its decisions.

I wanted to see if I could forecast the playoff ahead of time by simulating the rest of the season rather than waiting until all of the season’s games have been played. Plus, it’s a fun project that I can share with the undergraduate simulation course that I teach in the spring.

Here is how my simulation model works. The most critical part is the ranking method, which uses completed game results to rate and then rank the teams so that I can forecast who the top 4 teams will be at the end of the season. I need to do this solely using math (no humans in the loop!) in each of 10,000 replications. I start with the outcomes of the games played so far, using at least 8 weeks of data. These outcomes produce a rating for each team, which I then rank. The ranking methodology uses a connectivity matrix based on Google’s PageRank algorithm (similar to a Markov chain). So far, I’ve considered three variants of this model that take various bits of information into account, like who a team beats, who it loses to, and the additional value provided by home wins. I used data from the 2012 and 2013 seasons to tune the parameters needed for the models.
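To give a flavor of the PageRank-style idea, here is my own simplified sketch (not the actual Badger Bracketology specification; teams and results are made up): each team distributes its “vote” to the teams that beat it, and power iteration yields a stationary rating.

```python
# A simplified PageRank-style rating from game results (illustrative only):
# losers "link" to winners, rows are normalized, and we power-iterate
# a damped chain to a stationary rating vector.
import numpy as np

teams = ["A", "B", "C", "D"]
games = [("A", "B"), ("A", "C"), ("B", "C"), ("D", "A"), ("B", "D")]  # (winner, loser)

n = len(teams)
idx = {t: i for i, t in enumerate(teams)}
M = np.zeros((n, n))
for w, l in games:
    M[idx[l], idx[w]] += 1.0   # the loser points to the winner

# Row-normalize; an unbeaten team keeps its own vote (self-loop).
for i in range(n):
    s = M[i].sum()
    M[i] = M[i] / s if s > 0 else np.eye(n)[i]

d = 0.85                        # damping factor, as in PageRank
P = d * M + (1 - d) / n         # every entry positive -> unique limit
r = np.ones(n) / n
for _ in range(200):            # power iteration
    r = r @ P
ranking = [teams[i] for i in np.argsort(-r)]
print(ranking, np.round(r, 3))
```

The real model differs in what counts as a vote (wins, losses, home-win bonuses, tuned parameters), but the mechanics of rating-then-ranking are the same.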

The ratings, along with the impact of home field advantage, are then used to determine a win probability for each game. From previous years, we found that the home team won 56.9% of games later in the season (week 9 or later), which corresponds to an extra boost in win probability of about 6.9% for home teams. This is important since there are home/away games as well as games on neutral sites, and we need to take this into account. The simulation selects winners in the next week of games by essentially flipping a biased coin. Then, the teams are re-ranked after each week of simulated game outcomes. This is repeated until we get to the end of the season. Finally, I identify and simulate the conference championship games (these are the only games not scheduled in advance), and we end up with a final ranking. Go here for more details.
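The coin-flip step can be sketched in a few lines. This is a minimal illustration with made-up teams, ratings, and a stand-in win-probability rule (the actual model derives win probabilities from its ratings differently); only the 6.9% home boost and the 10,000-replication count come from the post.

```python
# A minimal sketch of the simulation's coin-flip step: convert ratings
# to a win probability, add a home-field boost, and flip a biased coin
# per game. Teams, ratings, and the probability rule are hypothetical.
import random

def win_prob(rating_home, rating_away, home_boost=0.069):
    """Stand-in rule: ratings ratio plus a ~6.9% home-team boost."""
    base = rating_home / (rating_home + rating_away)
    return min(1.0, base + home_boost)

def simulate_week(games, ratings, rng):
    """games: list of (home, away) pairs; returns the simulated winners."""
    winners = []
    for home, away in games:
        p = win_prob(ratings[home], ratings[away])
        winners.append(home if rng.random() < p else away)
    return winners

rng = random.Random(42)
ratings = {"Wisconsin": 0.30, "Ohio State": 0.35, "Iowa": 0.20, "Nebraska": 0.15}
week = [("Wisconsin", "Iowa"), ("Ohio State", "Nebraska")]
tally = {t: 0 for t in ratings}
for _ in range(10_000):            # 10,000 replications, as in the post
    for w in simulate_week(week, ratings, rng):
        tally[w] += 1
print(tally)
```

In the full model this step repeats week by week, with teams re-rated and re-ranked after each simulated week.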

There are many methods for predicting the outcome of a game in advance. Most of the sophisticated methods use additional information that we could not expect to obtain weeks ahead of time (like the point spread, point outcomes, yards allowed, etc.). Additionally, some of the methods simply return win probabilities and cannot be used to identify the top four teams at the end of the season. My method is simple, but it gives us everything we need without being so complex that I would be suspicious of overfitting. The college football season is pretty short, so our matrix is really sparse. At present, teams have played 8 weeks of football in sum, but many teams have played just 6-7 games. Additional information could be used to help make better predictions, and I hope to further refine and improve the model in coming years. Suggestions for improving the model will be well-received.

Our results for our first week of predictions are here. Check back each week for more predictions.

Your thoughts and feedback are welcome!

## underpowered statistical tests and the myth of the myth of the hot hand

In grad school, I learned about the hot hand fallacy in basketball. The so-called “hot hand” belongs to a player whose scoring success probability is temporarily increased and who therefore should shoot the ball more often (in the basketball context). I thought the myth of the hot hand effect was an amazing result: there is no such thing as a hot hand in sports; it’s just that humans are not good at evaluating streaks of successes (hot hands) or failures (slumps).

Flash forward years later. I read a headline about how hand sanitizer doesn’t “work” in terms of preventing illness. I looked at the abstract and read off the numbers. The group that used hand sanitizer (in addition to hand washing) got sick 15-20% less than the control group that only washed hands. The 15-20% difference wasn’t statistically significant so it was impossible to conclude that hand sanitizing helped, but it represented a lot of illnesses averted. I wondered if this difference would have been statistically significant if the number of participants was just a bit larger.

It turns out that I was onto something.

The hot hand fallacy is like the hand sanitizer study: the study design was underpowered, meaning that there was no way to reject the null hypothesis and draw the “correct” conclusion about whether the hot hand effect or the hand sanitizer effect is real. In the case of the hand sanitizer, the number of participants needed to be large enough to detect a 15-20% improvement in the number of illnesses acquired. Undergraduates estimate required sample sizes in probability and statistics courses, but researchers sometimes forget to design an experiment in a way that can detect real differences.
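The sample-size calculation is the standard two-proportion one. A back-of-the-envelope sketch with illustrative illness rates (20% vs. 16%, not the actual study's numbers), asking how many participants per group are needed for 80% power at the 5% level:

```python
# Required sample size per group to detect a difference between two
# proportions with a two-sided z-test. The illness rates are illustrative,
# not the actual hand sanitizer study's.
from math import ceil, sqrt
from scipy.stats import norm

def n_per_group(p1, p2, alpha=0.05, power=0.80):
    z_a = norm.ppf(1 - alpha / 2)   # critical value for the test
    z_b = norm.ppf(power)           # quantile for the desired power
    p_bar = (p1 + p2) / 2
    num = (z_a * sqrt(2 * p_bar * (1 - p_bar))
           + z_b * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return ceil(num / (p1 - p2) ** 2)

print(n_per_group(0.20, 0.16))   # roughly 1,450 per group for a small effect
print(n_per_group(0.20, 0.10))   # a bigger effect needs far fewer participants
```

The punchline: a 20% relative reduction in illness is a small effect, and detecting it takes on the order of a thousand participants per arm; a study with a few hundred is underpowered by design.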

My UW-Madison colleague Jordan Ellenberg wrote a fantastic article about the myth of the myth of the hot hand on Deadspin. He has more in his book How Not to Be Wrong, which I highly recommend. He introduced me to a research paper by Kevin Korb and Michael Stillwell that applied the statistical tests used to test for the hot hand effect to simulated data that did indeed have a hot hand. The “hot” data alternated between streaks with success probabilities of 50% and 90%. They demonstrated that the serial correlation and runs tests used in the early “hot hand fallacy” paper were unable to identify a real hot hand; the tests were underpowered and unable to reject the null hypothesis when it was indeed false. This is poor test design. If you want to answer a question using any kind of statistical test, it’s important to collect enough data and use the right tools so you can find the signal in the noise (if there is one) and reject the null hypothesis if it is false.
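A small experiment in the spirit of Korb and Stillwell makes the point concrete. The sketch below is my own (streak length, sequence length, and test details are my choices, not theirs): generate truly “hot” shooting data that alternates between 50% and 90% streaks, then measure how often a lag-1 serial-correlation test actually detects it.

```python
# Empirical power of a serial-correlation test on data with a REAL hot
# hand (alternating 50%/90% streaks). Parameters are my own choices,
# in the spirit of Korb and Stillwell's simulation.
import random

def hot_sequence(n_shots=100, streak_len=10, rng=random):
    """Shots alternate between cold (p=0.5) and hot (p=0.9) streaks."""
    p, shots = 0.5, []
    for i in range(n_shots):
        if i % streak_len == 0:
            p = 0.9 if p == 0.5 else 0.5
        shots.append(1 if rng.random() < p else 0)
    return shots

def lag1_corr(xs):
    """Lag-1 serial correlation of a 0/1 sequence."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    if var == 0:
        return 0.0
    cov = sum((xs[i] - mean) * (xs[i + 1] - mean) for i in range(n - 1)) / (n - 1)
    return cov / var

rng = random.Random(0)
trials = 2000
# One-sided z-test at the 5% level: reject when corr * sqrt(n) > 1.645.
detected = sum(lag1_corr(hot_sequence(rng=rng)) * 100 ** 0.5 > 1.645
               for _ in range(trials))
print(f"power ~= {detected / trials:.0%}")  # well below the usual 80% target
```

Even though every simulated shooter genuinely runs hot, the test misses the effect in a large fraction of sequences: with 100 shots the hot hand is real but the test is underpowered.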

I learned that there appears to be no hot hand in sports where a defense can easily adapt to put greater defensive pressure on the “hot” player, like basketball and football. So the player may be hot but it doesn’t show up in the statistics only because the hot player is, say, double teamed. The hot hand is more apparent and measurable in sports where defenses are not flexible enough to put more pressure on the hot player, like in baseball and volleyball.

## Major League Baseball scheduling at the German OR Society Conference

Mike Trick talked about his experience setting the Major League Baseball (MLB) schedule at the 2014 German OR Conference in Aachen, Germany. Mike’s plenary talk had two major themes:
1. Getting the job with the MLB
2. Keeping the job with the MLB

The getting the job section summarized advances in computing power and integer programming solvers that have made solving large-scale integer programming (IP) models a reality. Mike talked about how he used to generate cuts for his models, but now the solvers (like CPLEX or Gurobi) add a lot of the cuts automatically as part of pre-processing. Over time, Mike’s approach has become popping his models into CPLEX and then figuring out what the solver is doing so he can exploit the tools that already exist.

Side note: I am amazed at how good the integer programming solvers have become. I recently worked on a variation to the set covering model for which a greedy approximation algorithm exists. The time complexity of the greedy algorithm isn’t great in theory. In practice, the greedy algorithm is slower than the solver (Gurobi, I think) and doesn’t guarantee optimality. I can’t believe we’ve come this far.
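For reference, the textbook greedy heuristic for set cover looks like this (my own sketch with a toy instance, not the specific variant from that project): repeatedly pick the set covering the most still-uncovered elements. It carries a logarithmic approximation guarantee but, as noted above, no certificate of optimality.

```python
# The classic greedy heuristic for set cover: at each step, choose the
# set that covers the most still-uncovered elements. Toy instance below.
def greedy_set_cover(universe, sets):
    """universe: set of elements; sets: dict mapping name -> set of elements."""
    uncovered, chosen = set(universe), []
    while uncovered:
        best = max(sets, key=lambda s: len(sets[s] & uncovered))
        if not sets[best] & uncovered:
            raise ValueError("universe cannot be covered")
        chosen.append(best)
        uncovered -= sets[best]
    return chosen

universe = {1, 2, 3, 4, 5}
sets = {"S1": {1, 2, 3}, "S2": {2, 4}, "S3": {3, 4}, "S4": {4, 5}}
chosen_sets = greedy_set_cover(universe, sets)
print(chosen_sets)
```

On this tiny instance greedy happens to find an optimal cover; on larger instances a modern IP solver often finds and proves the optimum faster than you would expect, which was the surprise above.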

Mike also stressed the importance of finding better ways to formulate the problem to create a better structure for the IP solver. Better formulations can be more complicated and less intuitive, but they can lead to markedly better linear programming bounds. Mike achieved this by replacing his model whose binary variables correspond to team-to-team games (does team i play team j on day t?) with another model whose variables correspond to series (a series is usually 3 games played between two teams on consecutive days). Good bounds from the linear programming relaxations help the IP solver find an optimal solution much more quickly. Another innovation focused on improving the schedule by “throwing away” much of the schedule (usually about a month) after making needed changes and re-solving. Again, this is something that is possible due to advances in computing.

The keeping-the-job section addressed business analytics and its role in optimization. Mike defined business analytics as using data to make better decisions, something that OR has always done. What is new is using the power of data analytics and predictive modeling to guide prescriptive integer programming models in a meaningful way. The old way was to use point estimates in integer programming models; the new way uses richer information (such as the output of a logistic regression) to guide optimization models. The application Mike used was estimating the value of scheduling home games at different times (day vs. night) and on different days of the week. When embedded in the optimization modeling framework, the end result was that creating a schedule using business analytics could add about $50M in revenue for MLB.

Mike summed up his talk by discussing how educating the marketing folks is part of the job now. Marketing likes to measure “success” as the number of games that sell out. Operations researchers recognize that sold-out games are lost revenue, so the goal has become to schedule games so that they almost sell out, and to make sure that marketing understands this approach.

Related post:

the craft of scheduling Major League Baseball games

## Markov chains for ranking sports teams

My favorite talk at ISERC 2014 (the IIE conference) was “A new approach to ranking using dual-level decisions” by Baback Vaziri, Yuehwern Yih, Mark Lehto, and Tom Morin (Purdue University) [Link]. They used a Markov chain to rank Big Ten football teams in their ability to recruit prospective players. Players would accept one of several offers. The team that got the player was the “winner” and the other teams were losers.  We end up with a matrix P where element (i,j) in P is the number of times team j beats team i.

The Markov chain is then normalized so that each row sums to 1 and solved for the limiting distribution. The probability of being in state j in the limit is interpreted as the proportion of time that team j is the best. Therefore, the limiting distribution can be used to rank teams from best to worst.
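This method is easy to sketch. Below is a toy example with a made-up win matrix (not their Big Ten recruiting data): element (i, j) counts the times team j beat team i, rows are normalized, and the limiting distribution gives the ranking.

```python
# Markov chain ranking as described: beats[i][j] = number of times team j
# beat team i. Normalize rows to get a transition matrix, then take the
# limiting distribution as each team's share of time as "the best."
# The win counts below are made up for illustration.
import numpy as np

teams = ["Michigan", "Wisconsin", "Indiana", "Purdue"]
beats = np.array([
    [0, 1, 0, 0],   # Michigan's losses: once to Wisconsin
    [2, 0, 1, 0],   # Wisconsin's losses: twice to Michigan, once to Indiana
    [2, 1, 0, 1],   # Indiana's losses
    [2, 2, 1, 0],   # Purdue's losses
], dtype=float)

P = beats / beats.sum(axis=1, keepdims=True)  # row-stochastic transition matrix
pi = np.ones(len(teams)) / len(teams)
for _ in range(500):                          # power iteration to the limit
    pi = pi @ P
ranking = [teams[i] for i in np.argsort(-pi)]
print(ranking, np.round(pi, 3))
```

Note how the toy data echoes the post's point about quality wins: Wisconsin edges out Michigan here because the chain rewards whom you beat, not just how often you win.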

They found that using this method with 2001–2012 data, Wisconsin was ranked fourth, much higher than the experts ranked it, which helps explain why the Badgers have been to 12 bowl games in a row. Illinois (my alma mater) was ranked second to last, only above lowly Indiana.

I applied this method to regular season 2014 Big Ten basketball wins and ended up with the following ranking. I also include the official ranking based on win-loss record for comparison. We see large discrepancies for only two teams: Michigan State (which is over-ranked according to its win-loss record) and Indiana (which is under-ranked according to its win-loss record). The Markov chain method ranks these two teams differently because Indiana had high-quality wins despite not winning frequently, and because Michigan State lost to a few bad teams when it was down a few players due to injuries.

| Ranking | MC Ranking | W-L Record Ranking |
|---------|------------|--------------------|
| 1 | Michigan | Michigan |
| 2 | Wisconsin | Wisconsin |
| 3 | Indiana | Michigan State |
| 4 | Iowa | Nebraska |
| 5 | Nebraska | Ohio State |
| 6 | Ohio State | Iowa |
| 7 | Michigan State | Minnesota |
| 8 | Minnesota | Illinois |
| 9 | Illinois | Indiana |
| 10 | Penn State | Penn State |
| 11 | Northwestern | Northwestern |
| 12 | Purdue | Purdue |

More sophisticated methods build on this idea. Paul Kvam and Joel Sokol estimate the conditional probabilities in the transition probability matrix of the logistic regression Markov chain (LRMC) model using logistic regression [Paper link here]. The logistic regression yields an estimate of the probability that a team with a margin of victory of x points at home is better than its opponent, and thus looks at margin of victory, not just wins and losses.
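The logistic regression ingredient can be sketched on synthetic data. Everything below is made up for illustration (a latent quality gap drives both the margin and the label; the fit uses plain gradient descent to stay dependency-free), not Kvam and Sokol's actual data or coefficients.

```python
# A sketch of the LRMC ingredient: fit a logistic curve mapping home
# margin of victory to P(home team is the better team). Synthetic data:
# a latent quality gap drives both the observed margin and the label.
import numpy as np

rng = np.random.default_rng(1)
gap = rng.normal(0, 8, 2000)              # latent quality difference
margin = gap + rng.normal(3, 10, 2000)    # observed home margin (noise + home edge)
better = (gap > 0).astype(float)          # label: home team truly better

# Logistic regression P(better | margin) = sigmoid(b0 + b1 * margin),
# fit by gradient descent on the mean log-loss.
b0, b1 = 0.0, 0.0
for _ in range(5000):
    p = 1.0 / (1.0 + np.exp(-(b0 + b1 * margin)))
    b0 -= 0.1 * np.mean(p - better)
    b1 -= 0.01 * np.mean((p - better) * margin)

for x in (-10, 0, 10, 20):
    p = 1.0 / (1.0 + np.exp(-(b0 + b1 * x)))
    print(f"margin {x:+d}: P(home better) ~= {p:.2f}")
```

The fitted curve is exactly what LRMC plugs into its transition matrix: bigger home margins map to higher probabilities that the home team is genuinely better, rather than treating every win identically.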