Miss baseball? Love operations research and analytics? Watch Eric Bickel’s 46-minute webinar called “Play Ball! Decision Quality and Baseball Strategy” here:
Posted by Laura McLay on December 5, 2013
Kaggle is sponsoring a non-denominational Christmas optimization contest to help Santa solve a 3-dimensional bin packing problem [Link].
This year we’re attacking the classic bin packing problem with a twist: cram all the gifts into his sleigh, but in the order they need to be delivered! MathWorks is sponsoring the $10,000 in prizes: $4,000 goes to the holder of the top spot, and another $4,000 goes to the highest-ranked team using 100% MATLAB/MathWorks tools. The final $2,000 is the Rudolph Prize: Rudolph, won’t you lead-our-board tonight? No word yet on whether the prize pool will include an optimally-stuffed reindeer.
The problem seeks an optimal packing of Christmas presents in a single 3-dimensional bin. Packages have a size in each of the three dimensions, and they can be rotated to fit into the bin. The objective is to minimize the height of the highest-placed present in the bin. The contest ends at 11:59 pm UTC on Sunday, 26 January 2014.
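To make the objective concrete, here is a minimal sketch of how a packing in delivery order might be scored. The shelf-style placement rule and the 1000 x 1000 sleigh base are my own simplifying assumptions for illustration; this is not the contest's actual evaluation code.

```python
# A deliberately naive sketch: presents are (dx, dy, dz) extents placed in
# delivery order along "shelves" on an assumed 1000-unit-wide sleigh base.
SLEIGH_X = 1000

def shelf_pack_height(presents):
    """Place presents in delivery order, left to right along a shelf;
    start a new shelf (layer) whenever the next present does not fit.
    Returns the height of the highest-placed present (the quantity to minimize).
    The y dimension and rotations are ignored -- this only illustrates
    how a packing in delivery order gets scored."""
    x_cursor = 0
    shelf_base = 0      # z-coordinate of the current shelf's floor
    shelf_height = 0    # tallest present placed on the current shelf
    for (dx, dy, dz) in presents:
        if x_cursor + dx > SLEIGH_X:   # shelf full: move up to a new layer
            shelf_base += shelf_height
            x_cursor, shelf_height = 0, 0
        x_cursor += dx
        shelf_height = max(shelf_height, dz)
    return shelf_base + shelf_height

presents = [(100, 100, 50), (950, 200, 300), (400, 400, 100)]
print(shelf_pack_height(presents))   # -> 450
```

A serious entry would pack presents side by side in both base dimensions and search over rotations and layouts, but even this toy version shows why ordering constraints make the packing harder.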
Posted by Laura McLay on November 15, 2013
Last week I blogged about the husband and wife team that created Major League Baseball schedules for more than two decades [Link]. I discovered another operations research collaboration between a husband and wife team.
Math professor Sommer Gentry and her surgeon husband Dorry Segev discuss how to match kidney donors with those in need of a transplant using networks and integer programming. Their collaboration is featured in the documentary “The Right Match” (below).
In the documentary, they mention how administrators in a single hospital could match up the pairs locally when there were just a few patients. Integer programming models were needed when considering patients across multiple hospitals, where there are hundreds of patients in need of a transplant. Jump ahead to about seven minutes in to see their discussion of the network structure of the problem and its similarity to maximum cardinality matching.
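For readers who want to play with the matching view they describe, here is a minimal sketch using the networkx library on a made-up compatibility graph; the pairs and edges below are hypothetical and are not from their data.

```python
import networkx as nx

# Nodes are incompatible patient-donor pairs; an edge joins two pairs whose
# donors are each compatible with the other pair's patient (a two-way swap).
G = nx.Graph()
G.add_edges_from([
    ("pair_A", "pair_B"),
    ("pair_B", "pair_C"),
    ("pair_C", "pair_D"),
    ("pair_A", "pair_D"),
    ("pair_D", "pair_E"),
])

# A maximum cardinality matching: the largest set of simultaneous two-way swaps.
matching = nx.max_weight_matching(G, maxcardinality=True)
print(matching)   # e.g. {('pair_B', 'pair_A'), ('pair_D', 'pair_C')} -> 4 transplants
```

The integer programming models mentioned in the documentary generalize this idea to longer swap cycles and chains across many hospitals.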
This is a nice video that would be suitable for undergraduate and graduate students studying optimization. It might be particularly motivating for undergraduates who have learned about less useful applications like the diet problem and optimal mix problems in a linear programming course.
Watch the video here:
Visit their web site: http://www.optimizedmatch.com/
See some of the press their research has received here.
For further reading, I recommend Hari Balasubramanian’s blog post about their work here.
Posted by Laura McLay on November 8, 2013
I enjoyed a recent Advanced NFL Stats podcast interview with Virgil Carter [Link], a former Chicago Bears quarterback who is considered to be the “father of football analytics.” During his time in the NFL, Carter enrolled in Northwestern University’s MBA program, and he started to work on a football project that was eventually published in Operations Research in 1971 (before Bill James of baseball analytics and Sabermetrics fame!). Carter even taught statistics and mathematics at Xavier University while on the Cincinnati Bengals.
The paper in Operations Research was co-written with Robert Machol and entitled “Operations Research on Football.” The paper estimates the expected value of having a First-and-10 at different yard lines on the field (see my related post here). Slate has a nice article about Virgil Carter [Link] outlining the work that went into estimating the value associated with field position:
Carter acquired the play-by-play logs for the first half of the 1969 NFL season and started the long slog of entering data: 53 variables per play, 8,373 plays. After five or six months, Carter had produced 8,373 punch cards. By today’s computing standards, Carter’s data set was minuscule and his hardware archaic. To run the numbers, he reserved time on Northwestern’s IBM 360 mainframe. Processing a half-season query would take 15 or 20 minutes—something today’s desktop computers could do in nanoseconds. In one research project, Carter started with the subset of 2,852 first-down plays. For each play, he determined which team scored next and how many points they scored. By averaging the results, he was able to learn the “expected value” of having the ball at different spots on the field.
They found that close to a team’s own end zone (almost 100 yards from scoring a touchdown), a team’s expected points were negative, meaning that turnovers from fumbles and interceptions that let an opponent score an easy touchdown outweighed the team’s own ability to move down the field and score. The paper discusses issues other than expected values, such as Type I and Type II errors in the use of timeouts: a timeout used for time management has implications for each team’s remaining possessions, and a team can use too much or too little time. The rules of football were quite different 40-something years ago. For example, an incomplete pass in the end zone required the ball to be brought out to the 20 yard line (instead of a mere loss of down with no change in field position).
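Here is a rough sketch of the averaging step described in the Slate excerpt, assuming you already have first-down plays tagged with field position and the points from the next score (positive if the offense scored next, negative if the opponent did). The data below is made up for illustration.

```python
from collections import defaultdict

def expected_points_by_yardline(plays):
    """plays: iterable of (yards_from_own_goal, next_score_points) tuples,
    where next_score_points is positive if the offense scored next and
    negative if the opponent did. Returns the average points per yard line."""
    totals, counts = defaultdict(float), defaultdict(int)
    for yardline, points in plays:
        totals[yardline] += points
        counts[yardline] += 1
    return {y: totals[y] / counts[y] for y in totals}

# Toy data: three first downs at a team's own 5-yard line, one at midfield.
sample = [(5, -7), (5, 3), (5, -3), (50, 7)]
print(expected_points_by_yardline(sample))   # {5: -2.33..., 50: 7.0}
```

With a full half-season of first-down plays, the same averaging gives the expected-point curve by field position that Carter and Machol reported.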
Listen to the podcast here.
Read my posts on football analytics here.
Posted by Laura McLay on November 6, 2013
Grantland and ESPN have a short video [12:25] on the couple who created the major league baseball schedules in the pre-Mike Trick era (1982-2004). The husband-and-wife team of Henry and Holly Stephenson used scheduling algorithms to set about 80% of the schedule. They found that their algorithm could not produce the entire schedule because the list of scheduling requirements led to infeasibility:
“It couldn’t do the whole schedule. That was where the big companies were falling apart. We analyzed the old schedules and found that none of them met the written requirements that the league gave to us. It turns out it was impossible to meet all of the requirements. So the secret was to really know how to break the rules.”
Watch the video here. The end of the video acknowledges how scheduling has evolved such that entire schedules can now be generated with combinatorial optimization software (the Stephensons even mention having to compete with a scheduling team from CMU). The video uses baseball scheduling as an avenue to illustrate how decision making and optimization have evolved in the past 30 years. I would highly recommend the video to operations research and optimization students.
Posted by Laura McLay on November 5, 2013
In last night’s Bears/Packers game, Coach Marc Trestman (of the Bears) decided to go for it on 4th and inches at the Bears’ 32 yard line in the fourth quarter, with 7:50 left and the Bears up 4. Normally, teams decide to punt in this situation, which reflects the hyper-conservative decision-making approach adopted by most football coaches. The Bears got the first down, and the ensuing drive led to a field goal, putting the Bears up by 7 with 0:50 left in the game.
In hindsight, it was obviously a great call. But decisions aren’t made with hindsight – both good and bad outcomes are possible with different likelihoods.
An article by Chris Chase at USA Today [Link] argued that going for it on 4th down was a bad decision because the bad outweighed the good. There isn’t much analytical reasoning in the article. I prefer to base decisions on number crunching rather than feeling and intuition, so here is my attempt to argue that going for it on 4th down was a good decision.
There are a number of models that estimate the expected number of points a team would get based on its position on the field. To determine the best decision, you can compute the expected number of points associated with each option (going for it and punting) and choose the option with the larger expected value.
Let’s say going for it on 4th down has success probability p. Historical data suggests that p = 0.8 or so. If unsuccessful, the Packers would take the ball over on the Bears’ 32 yard line with a conditional expected value of about -3.265 points. This value is negative because we are taking the Bears’ point of view. If successful, the Bears would be around their own 35 yard line with a conditional expected value of 0.839. Considering both outcomes (success and failure), we get an expected value associated with going for it on fourth down: 0.839p - 3.265(1 - p).
Let’s look at the alternative: punting. The average punt nets a team about 39 yards, which would put the ball on the Packers’ 29 yard line with an associated expected number of points of -0.51. However, this isn’t the right way to approach the problem. Since the expected number of points associated with a yard line is non-linear, we can’t average the field position first and then look up the expected number of points. Instead, we should consider several possible field positions. Let’s assume that the Packers will get the ball back on their own 15, 25, 35, and 45 yard lines with probabilities 0.28, 0.25, 0.25, and 0.22 and with expected points 0.64, -0.24, -0.92, and -1.54, respectively. These outcomes average out to the ball on the Packers’ 29 yard line with -0.45 expected points.
Now we can compare the options of going for it (left hand side) and punting (right hand side):

0.839p - 3.265(1 - p) ≥ -0.45

Solving this inequality tells us that the Bears should go for it on fourth down if they have a success probability of at least p = 2.815/4.104 ≈ 0.686, or 68.6%.
These values are from Wayne Winston’s book Mathletics.
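As a sanity check on the arithmetic, here is a small sketch using those expected-point values; it reproduces the -0.45 expected points from punting and the 68.6% break-even probability.

```python
# Expected points (Bears' perspective) if the Bears go for it:
EV_SUCCESS = 0.839    # Bears keep the ball near their own 35
EV_FAILURE = -3.265   # Packers take over at the Bears' 32

def ev_go_for_it(p):
    """Expected points from going for it with conversion probability p."""
    return EV_SUCCESS * p + EV_FAILURE * (1 - p)

# Expected points if the Bears punt: weight the Packers' likely starting
# field positions rather than averaging the field position first.
punt_outcomes = [(0.28, 0.64), (0.25, -0.24), (0.25, -0.92), (0.22, -1.54)]
ev_punt = sum(prob * pts for prob, pts in punt_outcomes)

# Break-even conversion probability: solve ev_go_for_it(p) = ev_punt.
p_breakeven = (ev_punt - EV_FAILURE) / (EV_SUCCESS - EV_FAILURE)
print(round(ev_punt, 3), round(p_breakeven, 3))   # -0.45  0.686
```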
The method I outlined above tends to work really well except that it ignores the actual point differential between the teams (which is often important, e.g., when deciding to go for one or two after a touchdown), the amount of time left on the clock, and the number of timeouts. It’s worth doing a different analysis during extreme situations. With 7:50 left on the clock, the situation wasn’t too extreme, but the Packers’ 3 remaining timeouts and 4 point score differential are worth discussing. Going for it on 4th down allowed the Bears to score a field goal and eat up an additional seven minutes off the clock, which was almost the perfect outcome. Let’s consider a range of outcomes.
Very close to the end of the game, it’s best to evaluate decisions based on the probability of winning instead of the expected number of points. Note that you find the probability of winning as the expected value of an indicator variable, so it uses the same method with different numbers. Making this distinction is important, since if you are down by 4 points, going for a field goal may maximize your average points but would guarantee that you’d lose the game.
One way to address these issues is to look at how many possessions the Packers would have if the Bears punt or go for it on fourth down. Let’s say that the Packers would get one possession if the Bears go for it and convert. They would need to score a touchdown on that single possession to win. Let’s say that the Packers would get two possessions if the Bears punt. The Packers could win by scoring two field goals or one touchdown, unless the Bears score on their possession in between the Packers’ possessions. If the Bears score an additional field goal, that would put the Bears up 7, and the Packers would need at least one touchdown to tie (assuming a PAT) and an additional score of any kind to win. If the Bears score an additional touchdown, that would put the Bears up 10-12, and the Packers would need two touchdowns to win, or could possibly tie or win with a field goal and a touchdown (assuming a PAT or 2-point conversion was successful). The combinations and sequences of events need to be evaluated and weighed.
Without crunching numbers, we can see that punting would likely increase the Packers’ chance of winning because it would give them two chances to score (unless the Packers’ defense is so poor that the Bears would be almost certain to score again given another possession).
This is just one idea for analyzing the decision of whether to go for it on fourth down. Certainly, more details can be taken into account, so long as there is data to support the modeling approach.
Brian Burke blogged about this as I was finishing up my post [Link]. He used win probability instead of the expected number of points (the approach I recommend above but did not calculate). This yielded a break-even success probability of 71% for the Bears, which is close to what I found. In any case, the analysis more or less supports the decision to go for it on fourth and inches (although punting would also have been reasonable, since the conversion probability is only slightly higher than the threshold), but it might not have supported going for it on fourth and 1.
What sports play have you over analyzed?
Posted by Laura McLay on October 31, 2013
Here is a roundup of Halloween-related reading I’ve been doing.
Please share any OR/MS related Halloween stories that you’ve come across.
Posted by Laura McLay on October 24, 2013
Based on the curiosity over my cheese-related exam question on twitter, I have decided to post the midterm for my graduate level course on stochastic processes. My favorite questions are #1 and #2. I should note that #1 was inspired by actual leftover cheese that is packaged and sold at a discount at the Babcock Dairy Store on campus (picture below). If there is enough extra leftover cheese, it is poured into a bag, leading to cheese that has layers like an onion. It is apparently not cost-effective to repress the leftover cheese into a smooth brick. As someone who didn’t grow up wearing a foam cheese hat, I find that cheese production, quality control, and inventory are the right avenue for me to learn about cheese.
“Students are confused by my cheese quality control exam question. Shouldn’t Wisconsin students know a lot more about cheese?” — Laura McLay (@lauramclay), October 22, 2013
1. The dairy plant in Babcock Hall makes one batch of cheese six days per week. The amount of cheese (in pounds) left over after each batch is distributed according to an exponential distribution with parameter 1. Cheese production on each day is independent. The Babcock Dairy store will package and sell any leftover cheese in a batch (i.e., a day) that is more than 1 pound—a “cheese factory second.” Let Ei be the event that there is more than 1 pound of cheese available on day i, with i = 1, 2, 3, 4, 5, 6.
Therefore, Ei can be treated as a binary (indicator) random variable:
Ei = 1 if there are cheese factory seconds for the type of cheese produced on day i and 0 otherwise (i.e., ignore how much cheese is leftover – we are instead interested in the binary outcome of whether cheese is leftover).
E = the total number of types of cheese factory seconds across the 6 day week.
a) Define E in terms of Ei.
b) What is the sample space for E?
c) What is the probability that there is at least a pound of leftover cheese on day 1?
d) What is the probability that four or more days in the week produce cheese factory seconds?
2. The Lightsaber Manufacturing Company (LMC) operates their manufacturing plant in a galaxy far, far away. They need to decide how many lightsabers to stock for the next Jedi Convention, where they will sell lightsabers to Jedi apprentices. Due to the expense associated with interstellar travel, LMC will discard any unsold lightsabers after the convention. Lightsabers cost 413 galactic credits to produce and are sold for 795 galactic credits (the unit of currency in the Empire). If the demand for lightsabers follows an exponential distribution with an average of 75 lightsabers, how many lightsabers should the LMC bring to the Jedi Convention to maximize its expected profit?
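For readers working through the exam at home, here is one way to sanity-check an answer to the lightsaber question numerically. The critical-ratio formula and the brute-force search are standard newsvendor tools and are not part of the exam itself.

```python
import math

PRICE, COST, MEAN_DEMAND = 795, 413, 75

# Critical-ratio solution for a newsvendor with exponential demand:
# choose Q* so that F(Q*) = (PRICE - COST) / PRICE, with F(q) = 1 - exp(-q / MEAN_DEMAND).
ratio = (PRICE - COST) / PRICE
q_star = -MEAN_DEMAND * math.log(1 - ratio)
print(round(q_star, 1))   # about 49.1 lightsabers

def expected_profit(q):
    """Expected profit when stocking q, using E[min(D, q)] for exponential demand D."""
    expected_sales = MEAN_DEMAND * (1 - math.exp(-q / MEAN_DEMAND))
    return PRICE * expected_sales - COST * q

# Brute-force check over integer stock levels; it should agree (about 49).
best_q = max(range(1, 301), key=expected_profit)
print(best_q, round(expected_profit(best_q), 1))
```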
3. A student has a hard time figuring out how to get started on homework for Stochastic Modeling Techniques. The student randomly selects one of 3 potential places to start a homework problem with equal probability. The first approach is not fruitful; the student will return to the starting point after 1 hour of work. The second approach will return the student to the starting point after 3 hours. The third will lead to the solution in 15 minutes (1/4 hour). The student is confused, so he/she always chooses from all 3 available approaches each time. What is the expected amount of time it takes this diligent student to solve the homework problem?
4. A mysterious illness called “badgerpox” has affected the local badger population near Madison. The exposure level (X) largely determines whether a badger contracts the disease (D). The probability distribution for the exposure level and the conditional probability of disease given the exposure level are given in the table below.
Exposure level (Xi) | P(Xi) | P(D | Xi)

Find the following:
(a) The conditional distribution P(Xi | D) for each value of Xi
(b) The probability that a badger contracts the disease P(D)
(c) The expected exposure level for badgers that have contracted the disease.
5. A student likes to come to ISyE 624 on time, which is possible as long as the student can travel from left to right in the diagram below. There are two paths to class, and the student can pass through a path if and only if all components along that path are open. Due to construction, component i is open with probability pi, i = 1, 2, 3, 4, and the components are independent. If there is no path to class, the student will arrive late, and the professor will be sad. What is the probability the student gets to class on time?
6. The Wisconsin-Minnesota football rivalry dates back to 1890. The teams play once per year for the trophy of “Paul Bunyan’s Axe,” which replaced the first trophy (the “Slab of Bacon,” 1930-1943)*. The teams are unevenly matched, with Wisconsin winning 16 of the last 20 games. Let’s say that Wisconsin wins each game independently with probability 0.8. The teams play next on 11/23/13.
(a) What is the expected number of games/years until Wisconsin loses next?
(b) What is the expected number of games/years until Wisconsin loses 2 games in a row?
(c) What is the probability that Wisconsin loses for the 3rd time in the 5th game in the series?
* I didn’t make that up!
Posted by Laura McLay on October 23, 2013
As a new faculty member, I often meet staff from offices around the university, which is a fun part of starting a new position. I’ve been very impressed with the efficiency with which the University of Wisconsin “introduces” faculty to offices on campus whose services they may need.
This week, I met with a staff member at the Wisconsin Alumni Research Foundation (WARF), which handles patenting and licensing on campus. I was given a lab notebook after the staff member introduced me to possible services.
I have been a longtime advocate of lab notebooks (see my post here). It has been a long time since I received a lab notebook, and this one is a big step up from the composition notebooks that I normally use. I took a picture of the notebook below, along with its instructions for using a lab notebook (it is always good to review). This one even has lines for experiment witnesses.
I still use an old school lab notebook, though I don’t make optimal use of it: good ideas that start in my notebook are then revised and edited in many electronic records, since revising a model is easier to do in a LaTeX document and revising code is easiest to do in the code itself. I slowly and steadily fill my notebooks.
I am not sure how to best make use of lab notebooks for maintaining records given the need to collaborate with others (both in the same place and around the world) and to work asynchronously with collaborators. Old school lab notebooks may not be ideal for these collaborations, where shared files must document the research contributions. Proposals that later become reports and then publications basically replace a lab notebook, but this works best for ideas that are mature. I have found that documents in Dropbox are good for maintaining a record of work, since Dropbox keeps old time-stamped versions of documents.
In general, I don’t have all the answers about the perfect way to maintain an electronic lab notebook with collaborators. What do you use?
Posted by Laura McLay on October 16, 2013
The PBS show NOVA covered the traveling salesman problem [Link]. Here is a short video from the program. A traveling salesman problem can have trillions and trillions of possible routes, more than there are stars in the galaxy, yet algorithms can search through those options and find near-optimal solutions relatively quickly. I am having trouble embedding the video, so please visit the NOVA video here:
This episode will air TONIGHT on PBS at 9pm EST.
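As a small illustration of the “good solutions quickly” point, here is a sketch of the nearest-neighbor heuristic, one of the simplest TSP heuristics (not necessarily one featured in the program), run on random cities.

```python
import math
import random

def tour_length(points, order):
    """Total length of the closed tour visiting points in the given order."""
    return sum(math.dist(points[order[i]], points[order[(i + 1) % len(order)]])
               for i in range(len(order)))

def nearest_neighbor_tour(points):
    """Greedy heuristic: always travel to the closest unvisited city next."""
    unvisited = set(range(1, len(points)))
    tour = [0]
    while unvisited:
        last = points[tour[-1]]
        nxt = min(unvisited, key=lambda j: math.dist(last, points[j]))
        tour.append(nxt)
        unvisited.remove(nxt)
    return tour

random.seed(1)
cities = [(random.random(), random.random()) for _ in range(200)]
tour = nearest_neighbor_tour(cities)
print(round(tour_length(cities, tour), 2))   # a feasible tour, found almost instantly
```

Checking all 200! orderings would be hopeless, but a greedy pass like this (followed by local improvements such as 2-opt) gets within striking distance of optimal in a fraction of a second.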