# Tag Archives: sports

## Markov chains for ranking sports teams

My favorite talk at ISERC 2014 (the IIE conference) was “A new approach to ranking using dual-level decisions” by Baback Vaziri, Yuehwern Yih, Mark Lehto, and Tom Morin (Purdue University) [Link]. They used a Markov chain to rank Big Ten football teams by their ability to recruit prospective players. Each player accepts one of several offers; the team that lands the player is the “winner” and the other teams are the “losers.” We end up with a matrix P whose element (i,j) is the number of times team j beat team i.

The matrix is then normalized so that each row sums to 1, giving a Markov chain transition matrix, which is solved for its limiting distribution. The limiting probability of being in state j is interpreted as the proportion of time that team j is the best. Therefore, the limiting distribution can be used to rank teams from best to worst.
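As a sketch of the method, here is a minimal power-iteration implementation. The four team names and head-to-head counts below are invented for illustration; they are not the paper's data.

```python
import numpy as np

# Hypothetical head-to-head counts: element (i, j) is the number of
# times team j beat team i (the "loser points to winner" convention).
teams = ["A", "B", "C", "D"]
P = np.array([
    [0, 3, 1, 1],
    [1, 0, 1, 0],
    [4, 3, 0, 1],
    [5, 4, 2, 0],
], dtype=float)

# Normalize each row to sum to 1, giving a transition probability matrix.
P = P / P.sum(axis=1, keepdims=True)

# Power iteration: the limiting distribution pi satisfies pi = pi @ P.
pi = np.full(len(teams), 1.0 / len(teams))
for _ in range(1000):
    pi = pi @ P

# Rank teams from best (highest limiting probability) to worst.
ranking = sorted(zip(teams, pi), key=lambda t: -t[1])
```

Because the chain moves from losers to winners, it spends more of its time on teams that beat strong opponents, not merely on teams with many wins.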

They found that using this method with 2001–2012 data, Wisconsin was ranked fourth, which was much higher than the experts ranked it and helps explain why they have been to 12 bowl games in a row. Illinois (my alma mater) was ranked second to last, only above lowly Indiana.

I applied this method to regular season 2014 Big Ten basketball wins and ended up with the following ranking, shown next to the official ranking based on win-loss record for comparison. We see large discrepancies for only two teams: Michigan State (which is over-ranked by its win-loss record) and Indiana (which is under-ranked by its win-loss record). The Markov chain method ranks these two teams differently because Indiana had high-quality wins despite not winning frequently, and because Michigan State lost to a few bad teams when it was down a few players due to injuries.

| Rank | MC Ranking | W-L Record Ranking |
|------|------------|--------------------|
| 1 | Michigan | Michigan |
| 2 | Wisconsin | Wisconsin |
| 3 | Indiana | Michigan State |
| 4 | Iowa | Nebraska |
| 5 | Nebraska | Ohio State |
| 6 | Ohio St | Iowa |
| 7 | Michigan St | Minnesota |
| 8 | Minnesota | Illinois |
| 9 | Illinois | Indiana |
| 10 | Penn St | Penn State |
| 11 | Northwestern | Northwestern |
| 12 | Purdue | Purdue |

More sophisticated methods build on this idea. Paul Kvam and Joel Sokol estimate the conditional probabilities in the transition probability matrix of their logistic regression Markov chain (LRMC) model using logistic regression [Paper link here]. The logistic regression yields an estimate for the probability that a team with a margin of victory of x points at home is better than its opponent, and thus uses margin of victory, not just wins and losses.
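A sketch of the LRMC idea: the transition probabilities come from a fitted logistic curve in the home margin of victory. The coefficients below are illustrative placeholders, not Kvam and Sokol's fitted values.

```python
import math

def p_home_better(margin, a=-0.5, b=0.05):
    """Estimated probability that the home team is better than its
    opponent, given that it won at home by `margin` points (negative
    margin = home loss).  Logistic form as in LRMC; the coefficients
    a and b are made up for illustration."""
    return 1.0 / (1.0 + math.exp(-(a + b * margin)))
```

These estimated probabilities, rather than raw win counts, then populate the Markov chain's transition matrix, so a narrow road loss is treated very differently from a blowout.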

## subjective scoring in Olympic sports drives me a little crazy

The Olympics are beginning. When I think of the Olympic sports, I think of a lot of sports that are scored subjectively. Not so much stronger, faster, and more goals; more panels of judges picking winners amid controversy. I prefer number crunching and objective scoring. A New York Times article by John Branch [Link] overviews the changes to the winter Olympic sports over the last two decades. In summary, the new sports are mostly those with subjective scoring (halfpipe, snowboard cross).

A good run early in the contest might receive an 80. A slightly better run might earn an 83. A brilliant run, one that seems unbeatable, might score 95. All of the others are slotted around them. It can frustrate athletes, who ask why their second-place score was 10 points below that of the winner. They struggle to understand that the value means nothing; what matters is how it ranks.

I’ve noticed this, too, and it’s frustrating. Some sports like figure skating and gymnastics have well-established rubrics for scoring, but they are not perfect. On the positive side, the judges do a fairly good job of recognizing the best performances.

Does subjective scoring bother you?

~

Look for more Olympics posts from me in the next couple of weeks.

I’ve been blogging for almost 7 years, so I have a few old posts about the Olympics. Here are a few that I recommend reading:

## will the New York Times Fourth Down Robot change football?

The New York Times runs a Twitter account for a “Fourth Down Bot” (@NYT4thDownBot) that analyzes every 4th down call in NFL games. The bot gives advice and sometimes a short report summarizing the probability of success associated with each of the choices:

The bot has a lot of personality!

Brian Burke provides the methodology, which is here. The recommendations are based on which action (going for it, punting, or attempting a field goal) yields the most expected points. In the last 10 minutes of a game, the bot instead recommends the action that yields the highest win probability. These criteria are not equivalent: going for it may maximize your expected points, but if time is running out and you are down by two, it might be better to go for a field goal than to try for a touchdown.

The bot is useful because there is such a huge difference between the best strategy and what coaches actually do. The picture below illustrates the difference. There are a number of explanations for it. One is that fans and owners only remember the times going for it doesn’t work: following the optimal policy may maximize the number of wins on average, but losing a single game could mean losing your job. When the objective is to keep your job rather than to win games, everyone gets used to more conservative, suboptimal play calling.

Fourth Down Bot’s recommendations as compared to what most coaches do.

The Fourth Down Bot is so high profile that it has really raised awareness of this issue, possibly to the point that it may change how the game is played. If the fans know that it is better to go for it on fourth down, and if the coaches and owners read the scathing fourth down reports questioning their decision-making, then maybe it will become unacceptable for coaches to cling to sub-optimal policies. Maybe I’m too optimistic about the Fourth Down Bot’s chances of improving scientific literacy to the point where the game changes. It’s possible that coaches and owners will be dismissive of math models and the nerds who make them, but I hope the Fourth Down Bot chips away at our society’s distrust of math.

It’s worth noting that the Fourth Down Bot is genderless and does not have a race. Until I blogged about the bot, all of the sports nerds and number crunchers I’d read and blogged about were men. I can’t be the only woman interested in these issues. Please introduce me to other women and minority sports nerds – I am more than willing to promote sports number crunchers from underrepresented groups.

Has the Fourth Down Bot changed the way you think about football? Do you think the Fourth Down Bot has the potential to change the game?

## decision quality and baseball strategy

Miss baseball? Love operations research and analytics? Watch Eric Bickel’s 46-minute webinar called “Play Ball! Decision Quality and Baseball Strategy” here:

## before Sabermetrics, there was football analytics

I enjoyed a recent Advanced NFL Stats podcast interview with Virgil Carter [Link], a former Chicago Bears quarterback who is considered the “father of football analytics.” During his time in the NFL, Carter enrolled in Northwestern University’s MBA program and started working on a football project that was eventually published in Operations Research in 1971 (before Bill James of baseball analytics and Sabermetrics fame!). Carter even taught statistics and mathematics at Xavier University while playing for the Cincinnati Bengals.

The paper in Operations Research was co-written with Robert Machol and entitled “Operations Research on Football.” The paper estimates the expected value of having a First-and-10 at different yard lines on the field (see my related post here). Slate has a nice article about Virgil Carter [Link] outlining the work that went into estimating the value associated with field position:

Carter acquired the play-by-play logs for the first half of the 1969 NFL season and started the long slog of entering data: 53 variables per play, 8,373 plays. After five or six months, Carter had produced 8,373 punch cards. By today’s computing standards, Carter’s data set was minuscule and his hardware archaic. To run the numbers, he reserved time on Northwestern’s IBM 360 mainframe. Processing a half-season query would take 15 or 20 minutes—something today’s desktop computers could do in nanoseconds. In one research project, Carter started with the subset of 2,852 first-down plays. For each play, he determined which team scored next and how many points they scored. By averaging the results, he was able to learn the “expected value” of having the ball at different spots on the field.

They found that close to a team’s own end zone (almost 100 yards from scoring a touchdown), a team’s expected points were negative, meaning that turnovers from fumbles and interceptions that let an opponent score an easy touchdown outweighed the team’s own ability to move down the field and score. The paper discusses issues beyond expected values, such as Type I and Type II errors in using timeouts: a timeout called for clock management affects each team’s remaining possessions, and there are risks to using too much or too little time. The rules of football were quite different 40-something years ago. For example, an incomplete pass in the end zone required the ball to be brought out to the 20 yard line (instead of a mere loss of down with no change in field position).

Listen to the podcast here.

Read my posts on football analytics here.

## the craft of major league baseball scheduling – a journey from 1982 until now

Grantland and ESPN have a short video [12:25] on the couple who created the major league baseball schedules in the pre-Mike Trick era (1982-2004). The husband-and-wife team of Henry and Holly Stephenson used scheduling algorithms to set about 80% of the schedule. They found that their algorithm could not come up with the entire schedule because the list of scheduling requirements led to infeasibility:

“It couldn’t do the whole schedule. That was where the big companies were falling apart. We analyzed the old schedules and found that none of them met the written requirements that the league gave to us. It turns out it was impossible to meet all of the requirements. So the secret was to really know how to break the rules.”

Watch the video here. The end of the video acknowledges how scheduling has evolved such that entire schedules can now be computer generated using combinatorial optimization software (the Stephensons even mention having to compete with a scheduling team from CMU). The video uses baseball scheduling as an avenue to illustrate how decision making and optimization have evolved over the past 30 years. I would highly recommend the video to operations research and optimization students.

## why the Bears should have gone for it on fourth and inches

In last night’s Bears/Packers game, Coach Marc Trestman (of the Bears) decided to go for it on 4th and inches at the Bears’ 32 yard line in the fourth quarter, with 7:50 left and the Bears up by 4. Normally, teams punt in this situation, which reflects the hyper-conservative decision-making approach adopted by most football coaches. The Bears got the first down, and the ensuing drive led to a field goal, putting the Bears up by 7 with 0:50 left in the game.

In hindsight, it was obviously a great call. But decisions aren’t made with hindsight – both good and bad outcomes are possible with different likelihoods.

An article by Chris Chase at USA Today [Link] argued that going for it on 4th down was a bad decision because the bad outcomes outweighed the good. There isn’t much analytical reasoning in the article. I prefer to base decisions on number crunching rather than feel and intuition, so here is my attempt to argue that going for it on 4th down was a good decision.

### The basic idea of football decision-making

There are a number of models that estimate the expected number of points a team would get based on their position on the field. To determine the best decision, you can:

1. look at the set of possible outcomes associated with each decision,
2. find the probability and expected number of points associated with each of these outcomes,
3. compute the expected value associated with each decision, and
4. choose the decision with the most expected points.

Let’s say going for it on 4th down has success probability p. Historical data suggests that p=0.8 or so. If unsuccessful, the Packers would take the ball over on the Bears’ 32 yard line with a conditional expected value of about -3.265 points. This value is negative because we are taking the Bears’ point of view. If successful, the Bears would be around their own 35 yard line with a conditional expected value of 0.839. When considering both outcomes (success and failure), we can compute the expected value associated with going for it on fourth down: 0.839p – 3.265(1 – p).

Let’s look at the alternative: punting. The average punt nets a team about 39 yards. This would put the ball on the Packers’ 29 yard line with an associated expected number of points of -0.51. However, this isn’t the right way to approach the problem. Since the expected number of points associated with a yard line is non-linear, we can’t average the field position first and then look up the expected number of points. Instead, we should consider several outcomes associated with field positions: Let’s assume that the Packers will get the ball back on their own 15, 25, 35, and 45 yard lines with probabilities 0.28, 0.25, 0.25, and 0.22 and with expected points 0.64, -0.24, -0.92, and -1.54, respectively. This averages out to the ball on the Packers’ 29 yard line with -0.45 points (on average).

Now we can compare the options of going for it (left hand side) and punting (right hand side):
$0.839 p - 3.265 (1-p) \ge -0.45$
Solving this inequality tells us that the Bears should go for it on fourth down if they have a success probability of at least 68.6%.

These values are from Wayne Winston’s book Mathletics.
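The arithmetic above is easy to check with a few lines of code. The expected-point values are the inputs quoted from Mathletics, not outputs of the script.

```python
# Expected points (Bears' point of view) for each fourth-down outcome.
ep_success = 0.839     # Bears convert, first down near their own 35
ep_failure = -3.265    # Packers take over on the Bears' 32

# Punt: average over the field-position OUTCOMES, not over the average
# field position, because expected points are non-linear in yard line.
punt_probs  = [0.28, 0.25, 0.25, 0.22]   # Packers' own 15, 25, 35, 45
punt_points = [0.64, -0.24, -0.92, -1.54]
ep_punt = sum(p * v for p, v in zip(punt_probs, punt_points))  # about -0.45

# Break-even success probability: solve
#   ep_success * p + ep_failure * (1 - p) = ep_punt  for p.
p_break_even = (ep_punt - ep_failure) / (ep_success - ep_failure)
print(round(p_break_even, 3))  # prints 0.686
```

Any estimate of the conversion probability above 68.6% favors going for it; the historical p of roughly 0.8 clears that bar comfortably.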

### But time was running out!

The method I outlined above tends to work really well except that it ignores the actual point differential between the teams (which is often important, e.g., when deciding to go for one or two after a touchdown), the amount of time left on the clock, and the number of timeouts. It’s worth doing a different analysis during extreme situations. With 7:50 left on the clock, the situation wasn’t too extreme, but the Packers’ 3 remaining timeouts and 4 point score differential are worth discussing. Going for it on 4th down allowed the Bears to score a field goal and eat up an additional seven minutes off the clock, which was almost the perfect outcome. Let’s consider a range of outcomes.

Very close to the end of the game, it’s best to evaluate decisions based on the probability of winning instead of the expected number of points. Note that you find the probability of winning as the expected value of an indicator variable, so it uses the same method with different numbers. Making this distinction is important, since if you are down by 4 points, going for a field goal may maximize your average points but would guarantee that you’d lose the game.

One way to address these issues is to look at how many possessions the Packers would have if the Bears punt or go for it on fourth down. Let’s say that the Packers would get one possession if the Bears go for it and convert; they would need to score a touchdown on that single possession to win. Let’s say that the Packers would get two possessions if the Bears punt. The Packers could then win by scoring two field goals or one touchdown, unless the Bears score on their possession in between the Packers’ possessions. If the Bears score an additional field goal, that would put the Bears up 7, and the Packers would need at least one touchdown to tie (assuming a PAT) and an additional score of any kind to win. If the Bears score an additional touchdown, that would put the Bears up 10-12, and the Packers would need two touchdowns to win, or could possibly tie or win with a field goal and a touchdown (assuming a PAT or successful 2-point conversion). The combinations and sequences of events need to be evaluated and measured.

Without crunching numbers, we can see that punting would likely increase the Packers’ chance of winning because it would give them 2 chances to score (unless the Packers’ defense is so poor that they think the Bears would be almost certain to score again given another chance).

This is just one idea for analyzing the decision of whether to go for it on fourth down. Certainly, more details can be taken into account so long as there is data to support the modeling approach.

Brian Burke blogged about this as I was finishing up my post [Link]. He used win probability instead of the expected number of points (the approach I recommend above but don’t calculate). This yielded a break-even success probability of 71% for the Bears, which is close to what I found. In any case, this more or less supports the decision to go for it on fourth and inches (although punting would also be reasonable here, since the probability of converting is only slightly higher than the threshold), but maybe this analysis wouldn’t have supported the decision if it were fourth and 1.

### More on fourth down decision-making:

What sports play have you over analyzed?

## methodologies used to predict the outcome of the basketball tournament

My last post was about how to choose a winning bracket in the NCAA men’s basketball tournament. I linked to several tools for predicting which team is likely to win a game. These tools

1. provide a rank ordering of the teams from best to worst,
2. compute the odds of which team would win in a matchup based on their tournament seed, or
3. provide odds of a team making it to different levels of the tournament based on specific matchups.

I linked to the methodologies used by these tools in my last post but didn’t get into the details. Here, I am going to discuss the methodologies in more detail. I am going to focus on tools that predict the outcome of specific tournaments (#3 above).

Wayne Winston noted in Mathletics that there is no transitivity in matchups. That is, if team A is favored to beat team B and team B is favored to beat team C, this does not imply that team A is favored to beat team C. Thus, the team rankings (#1 above) are not a perfect tool for predicting specific matchups. He uses “power ratings” to compute how many points one team is better than the other (a point spread), which takes home field advantage and other factors into account. He then converts the point spread to a probability of winning using historical game outcomes (basically, a normal distribution with a history-derived standard deviation) or simulates the games to compute the odds of winning.
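The spread-to-probability conversion can be sketched by modeling the actual game margin as normally distributed around the power-rating spread. The 10-point standard deviation below is an assumed, ballpark value, not Winston's fitted number.

```python
import math

def win_probability(point_spread, sigma=10.0):
    """Probability the favored team wins, modeling the realized margin
    as Normal(point_spread, sigma).  We want P(margin > 0), which is
    the standard normal CDF evaluated at point_spread / sigma."""
    return 0.5 * (1.0 + math.erf(point_spread / (sigma * math.sqrt(2.0))))
```

For example, a team rated 10 points better than its opponent wins about 84% of the time under this assumed sigma, while a pick'em game (spread of 0) is a coin flip.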

Nate Silver’s model is interesting in that it takes many inputs, including the ranking tool outcomes from #1 above. His model blends four ranking models to take a more pluralistic view of who might win. I think this is a strength because it uses the wisdom of crowds (a small crowd in this case). Each of the four tools contributes 1/6 of the total power rating (a margin of victory). Seed number and whether the team was ranked in preseason polls each contribute another 1/6 of the power rating. He then makes adjustments for the geography of the game and player injuries and absences. He doesn’t describe his forecast probabilities in detail, but I suspect that his approach is similar to Wayne Winston’s. A team’s power rating is adjusted in each round based on the outcomes of previous rounds to account for potential errors in the power rating, another strength of the model.

Finally, Luke Winn and John Ezekowitz’s model doesn’t use power ratings [methodology here] – it instead applies survival analysis to predict when a team may drop out of the tournament. The model computes hazard rates for each team based on the team’s RPI and Ken Pomeroy’s ranking. They also consider

1. consistency,
2. tournament experience,
3. out-degree network centrality that captures the number of games played and won against other NCAA tournament teams (see picture below), and
4. the negative interaction of the Experience and Out-Degree Centrality variables.

A Cox proportional hazards regression was used to re-rank the teams.

## Superbowl reading for number crunchers

Here are a few links to posts and articles about the Superbowl that will appeal to number crunchers:

Nate Silver argues that defense wins championships. Many math models show that offense is more instrumental in winning games than defense is, but defense may be better for winning titles. Silver looks at the top 20 defenses and offenses to have played in the Super Bowl according to the simple rating system at pro-football-reference.com. He finds that the top defensive teams won 14 of their 20 Super Bowls, whereas the top offensive teams won 10 of their 20.

Nate Cohn at the New Republic writes about how football is ripe for reaping the benefits from advanced statistics.

Josh Laurito has a nice post on TV ratings (as measured by Nielsen) for major league sports championships. The Super Bowl is the only championship whose ratings have been increasing over the past decade or so (shown below). The Superbowl with the highest ratings ever was the 1986 Superbowl featuring the 1985 Bears (this is probably the closest I’ll get to proving that the 1985 Bears were the best team ever).

Superbowl Nielsen Ratings

I’ve written about football in several posts. One analyzes the Patriots’ decision to let the Giants score a touchdown in last year’s Superbowl using a decision tree.

I also have three presentations on football decision making.

The third uses game theory to find the best mix of run and pass plays.

## game theory and college football

60 Minutes had a nice piece on college football on Sunday with correspondent Armen Keteyian (Link). The story examined the popularity and skyrocketing costs of college football programs. The most interesting part of the story was its application of game theory. To stay competitive, a team must recruit a good coach and the best players. Of course, a team’s competitors are going to be doing the same. This leads to a type of Prisoner’s Dilemma where a team can choose to “keep costs down” or “escalate.” If a team keeps costs down while its opponents escalate, it has a terrible record, its alums are not happy, and its alums are not generous with donations. This leads to a college football arms race:

[Michigan athletic director] Dave Brandon: You’ve got 125 of these programs. Out of 125, 22 of them were cash flow even or cash flow positive. Now, thankfully, we’re one of those. What that means is you’ve got a model that’s not sustainable in most cases. You just don’t have enough revenues to support the costs. And the costs continue to go up.

Why? A big reason is universities are in the midst of a sports building binge. Cal Berkeley, for example, renovated its stadium to the tune of \$321 million. The list is endless. Michigan’s athletic department floated \$226 million in bonds to upgrade the Big House.

[60 Minutes correspondent] Armen Keteyian: What are you chasing?

Dave Brandon: We want to win championships.

Armen Keteyian: And you’re going to get a big payout?

Dave Brandon: We’re going to have excited fans, we’re going to fill stadiums, we’re going to be on TV. We’re going to accomplish all of the goals that we need to accomplish to keep this department moving ahead.

Armen Keteyian: And that’s where the phrase “arms race” comes up?

Dave Brandon: If you don’t keep pace, if you don’t stay competitive, you’re going to have a problem.

Inside a recently built indoor practice facility that many an NFL team would envy, we spoke to Michigan’s head coach Brady Hoke.

Armen Keteyian: Can you recruit a top player without facilities like this?

[Michigan's head coach] Brady Hoke: You know, it matters. I– I’d be sitting here lying if I didn’t think it mattered. I think the other part of it though– the people have to matter too.

The program every school has been chasing is Alabama. The Crimson Tide have rolled to two national titles in the last three years. The architect of that success is Nick Saban, as innovative a coach as there is in the game. And the leader of another escalating trend in college football: skyrocketing coaching salaries. Saban is paid over \$5 million a year, more than Alabama’s chancellor.

Armen Keteyian: Are you worth it?

Nick Saban: Probably not. Probably not.

Universities engage in other arms races. The move toward the university as Club Med is an example. The universities with the best dorms and exercise facilities recruit the best scholars. A university without an artificial rock climbing wall, water slides, and a spa (sadly) cannot hope to be competitive.
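The Prisoner's Dilemma structure described above can be sketched as a two-by-two game. The payoffs below are invented for illustration; the point is only that escalating is a dominant strategy even though both programs would prefer the outcome where neither escalates.

```python
# Toy payoff matrix for the recruiting arms race.  Rows are our
# program's choice, columns the rival's; entries are (our payoff,
# rival's payoff) on a made-up scale.
payoffs = {
    ("keep costs down", "keep costs down"): (3, 3),  # both save money
    ("keep costs down", "escalate"):        (0, 4),  # we lose the recruits
    ("escalate",        "keep costs down"): (4, 0),  # we get the recruits
    ("escalate",        "escalate"):        (1, 1),  # arms race: both bleed cash
}

def best_response(rival_choice):
    """Our payoff-maximizing choice, given the rival's choice."""
    return max(["keep costs down", "escalate"],
               key=lambda ours: payoffs[(ours, rival_choice)][0])
```

Whatever the rival does, escalating pays more, so both programs escalate and land in the (1, 1) outcome that is worse for both than mutual restraint — the unsustainable model Dave Brandon describes.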

Is there a way universities can deescalate?