Major League Baseball scheduling at the German OR Society Conference

Mike Trick talked about his experience setting the Major League Baseball (MLB) schedule at the 2014 German OR Conference in Aachen, Germany. Mike’s plenary talk had two major themes:
1. Getting the job with the MLB
2. Keeping the job with the MLB

The “getting the job” section summarized advances in computing power and integer programming solvers that have made solving large-scale integer programming (IP) models a reality. Mike talked about how he used to generate cuts for his models, but now the solvers (like CPLEX or Gurobi) add many of the cuts automatically as part of pre-processing. Over time, Mike’s approach has become popping his models into CPLEX and then figuring out what the solver is doing so he can exploit the tools that already exist.

Side note: I am amazed at how good the integer programming solvers have become. I recently worked on a variation to the set covering model for which a greedy approximation algorithm exists. The time complexity of the greedy algorithm isn’t great in theory. In practice, the greedy algorithm is slower than the solver (Gurobi, I think) and doesn’t guarantee optimality. I can’t believe we’ve come this far.
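For readers who have not seen it, here is a minimal sketch of the textbook greedy heuristic for the classic set covering problem (not the variation I worked on). The function name and the toy data are made up for illustration. The instance is chosen so that greedy picks three sets even though two would do.

```python
def greedy_set_cover(universe, subsets):
    """Pick subsets greedily until every element of `universe` is covered.

    `subsets` maps a (hypothetical) subset name to the set of elements it covers.
    Returns the chosen names in the order they were picked.
    """
    uncovered = set(universe)
    chosen = []
    while uncovered:
        # Choose the subset that covers the most still-uncovered elements.
        best = max(subsets, key=lambda name: len(subsets[name] & uncovered))
        gained = subsets[best] & uncovered
        if not gained:
            raise ValueError("universe cannot be covered by the given subsets")
        chosen.append(best)
        uncovered -= gained
    return chosen


# Toy instance (made-up data): B and C together cover everything,
# but greedy grabs the largest set A first and then needs both B and C.
universe = {1, 2, 3, 4, 5, 6}
subsets = {
    "A": {1, 2, 3, 4},
    "B": {1, 2, 5},
    "C": {3, 4, 6},
}

cover = greedy_set_cover(universe, subsets)
print(cover, "uses", len(cover), "sets; the optimal cover uses 2")
```

An IP solver would return the two-set cover here and prove that it is optimal, which is exactly the guarantee the greedy heuristic lacks.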

Mike also stressed the importance of finding better ways to formulate the problem to create a better structure for the IP solver. Better formulations can be more complicated and less intuitive, but they can lead to markedly better linear programming bounds. Mike achieved this by replacing a model whose binary variables correspond to team-to-team games (does team i play team j on day t?) with another model whose variables correspond to series (a series is usually 3 games played between two teams on consecutive days). Good bounds from the linear programming relaxations help the IP solver find an optimal solution much more quickly. Another innovation focused on improving the schedule by “throwing away” much of the schedule (usually about a month) after making needed changes and then re-solving. Again, this is something that is possible due to advances in computing.
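To make the two kinds of variables concrete, here is one way they might be written down. This is only a sketch in my own notation, not Mike’s actual formulation: the symbols x, y, and the series slot index s are assumptions.

```latex
% Game-based variables (assumed notation): one binary variable per matchup and day
x_{ijt} \in \{0,1\}, \qquad x_{ijt} = 1 \iff \text{team } i \text{ hosts team } j \text{ on day } t

% Series-based variables (assumed notation): one binary variable per matchup and series slot
y_{ijs} \in \{0,1\}, \qquad y_{ijs} = 1 \iff \text{team } i \text{ hosts team } j \text{ in the series occupying slot } s

% Example constraint in the series-based model: each team is in at most one series per slot
\sum_{j \neq i} \left( y_{ijs} + y_{jis} \right) \le 1 \qquad \forall\, i,\ s
```

In a game-based model, extra linking constraints would be needed to force games between the same two teams onto consecutive days. In a series-based model, that grouping is built into the variable itself.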

The “keeping the job” section addressed business analytics and its role in optimization. Mike defined business analytics as using data to make better decisions, something that OR has always done. What is new is using the power of data analytics and predictive modeling to guide prescriptive integer programming models in a meaningful way. The old way was to use point estimates in integer programming models; the new way uses more information (such as the output of a logistic regression) to guide optimization models. The application Mike used was estimating the value of scheduling home games at different times (day vs. night) and on different days of the week. When embedded in the optimization modeling framework, the end result was that creating a schedule using business analytics could add about $50M to MLB in revenue.

Mike summed up his talk by talking about how educating the marketing folks is part of the job now. Marketing likes to measure “success” as the number of games that sell out. Operations researchers recognize that sold-out games are lost revenue, so the goal has become to schedule games such that games are almost sold out, and to make sure that marketing understands this approach.

Related post:

the craft of scheduling Major League Baseball games


WORMS Childcare Travel Fund application for the INFORMS Annual Meeting

I am the Past-President of the Forum for Women in OR/MS (WORMS). My last initiative as President was to get the ball rolling on a travel fund for students and junior faculty traveling to the INFORMS Annual Meeting with babies and young children (Fact: I did this once!). The travel fund would help pay for childcare costs at the conference. A student member made the suggestion at the business meeting as a way to support (both literally and figuratively) young women in our field, and I thought it was something we should already have been doing.

Current WORMS President Susan Martonosi (at Harvey Mudd College) agreed that this was a great initiative and ran with the idea when she became President in January. She has established the program this year and seeks to make a permanent endowment for the travel fund. Susan did an amazing job! Susan writes:

On behalf of the WORMS officers, I am happy to announce that we have created a new WORMS Childcare Travel Fund, which will partially reimburse recipients for up to $200 in costs associated with care for children age 12 years or younger traveling with a parent who is participating in the INFORMS Annual Meeting. Approximately five grants will be awarded each year, with the exact number depending on available budget. Please read below for eligibility and application procedures. We are also hoping to establish an endowment that will permanently support this travel fund. If your organization is interested in contributing to this endowment, please let me know.

I am really excited that this travel fund has become a reality. I can’t thank Susan Martonosi enough. The award details are below. Please send this to anyone who could benefit, including dads.


WORMS Childcare Travel Fund [Link]

Eligibility:

Recipients must be current members of the Forum on Women in OR/MS (WORMS) who are officially participating in the INFORMS Annual Meeting, including giving an oral or poster presentation, acting as session chair or participating in the Combined Colloquia. Eligibility is limited to current students, post-doctoral associates or early career professionals (e.g. junior faculty or professionals in their first five years of post-graduate employment) who are the parent or guardian of a child age 12 years or younger who is also traveling to the conference. Up to $200 in expenses will be reimbursed upon submittal of receipts to INFORMS staff to cover costs such as travel (transportation, accommodation or meals) for the child, travel (transportation, accommodation or meals) for the child’s caregiver, or payment of childcare services or suitable day-camp program in the conference environs. At most one grant will be awarded per family, and priority will be given to those who have never received the award in prior years.

Application Procedures:

Applications are due September 5, 2014. Applicants should email Courtney Biefeld (courtney.biefeld at informs.org) with the following information:

  1. Subject line should be WORMS Childcare Travel Fund
  2. Statement of need (not to exceed 200 words):
  3. Why is attendance at this conference important to you?
  4. Are you giving an oral or poster presentation, acting as session chair or participating in the Combined Colloquia? (Please provide the title of the presentation, session or Colloquium you are participating in and your role.)
  5. Why do you need childcare?
  6. Why are you requesting travel support?
  7. What childcare do you plan to arrange?
  8. Estimated childcare budget, including travel costs for child and/or caregiver and childcare expenses.
  9. Statement verifying applicant’s eligibility from applicant’s department chair or supervisor.

The President, President-Elect and Past-President of WORMS will appoint the selection committee from amongst the officers of WORMS. The selection committee will assess the applications based on their stated need for travel support. Recipients will be notified by the selection committee no later than September 15, 2014. Recipients should submit all receipts for reimbursement to Courtney Biefeld no later than 30 days following conclusion of the conference.

Notes:

Childcare must be arranged by the parents. INFORMS and WORMS are not responsible for identifying caregivers and are not liable for any damages.


academic blogs: a labor of love

I recently discovered an article about academics who blog from Tim Hitchcock (a humanities professor). The title really caught my eye: “Twitter and blogs are not add-ons to academic research, but a simple reflection of the passion that underpins it.” Yes! We don’t have to create and maintain blogs; we do so because we love our disciplines and we love to share our passion:

The best (and most successful) academics are the ones who are so caught up in the importance of their work, so caught up with their simple passion for a subject, that they publicise it with every breadth. Twitter and blogs, and embarrassingly enthusiastic drunken conversations at parties, are not add-ons to academic research, but a simple reflection of the passion that underpins it.

I like this summary of how blogging goes hand in hand with other academic activities, contributing to them rather than detracting from them:

The most impressive thing about these blogs (and the academic careers that generate them), is that there is no waste – what starts as a blog, ends as an academic output, and an output with a ready-made audience, eager to cite it. For myself the point is that these scholars don’t waste text, and neither do I. If I give a talk, I turn it into a blog. Not everything is blogged, but the vast majority of the public presentations I make as part of my job, will be.  And while many of these texts will never contribute to an academic article, about half of them do. As a result blogging has become part of my own contribution to what I think of as an academic public sphere. It becomes a way of thinking in public and revising ones work, to make it better, in public. And knowing that there is an audience (whatever its size), changes how one does it – forcing you to think a little harder about the reader, and to think a little harder about the standards of record keeping and attribution that underpin your research.

In fact, this article was cannibalized from one of Hitchcock’s blog posts (on his blog called “Historyonics”) that summarized a message from a talk he gave with the provocative title “Doing it in public: Impact, blogging, social media and the academy” [Link].

The message in these articles resonates with me. Blogging is a labor of love, and this is one of the main reasons to blog. Maintaining a blog is a lot of work, and that isn’t possible without passion. I definitely agree that blogging isn’t wasted time, but to be honest, it took me a while to become more efficient with blogging.

I wrote an article in the IFORS newsletter that summarizes my thoughts on academic blogging. Here are my final thoughts from that article, where my passion for academic OR blogging hopefully shines through.

Blogging has been a very rewarding journey. While our fame (notoriety?) has passed—ABC News named Bloggers the 2004 People of the Year—blogging is still relevant and important. Blogs continue to be relevant despite being somewhat displaced by the massive rise of microblogging. Blogging provides content that cannot be conveyed in a 140-character tweet or short FaceBook post. Certainly YouTube videos, podcasts, and slidecasts also provide content that rival those in a blog post. However, it is simple to embed youtube videos in a blog post while the reverse is not. Blogs continue to be the best medium for a non-journalist to convey information in different formats accessible in the same place. I have been on several scientific blogging and social networking panels, and they have all confirmed the importance of blogs over other social networking tools.

 

People stumble across OR blogs for many reasons, and often they stick around. Reaching out to these readers is a tremendous opportunity to improve scientific literacy in the general public. I am often disheartened by the state of scientific literacy in the US, where a recent op-ed in the New York Times argued for universities to abolish the algebra requirement for incoming students and where politicians often cite federal grants for conducting basic scientific research as a symptom of government waste. We need to continue to make operations research known to those who can benefit from the use of advanced analytics for making better decisions. OR blogging is important for making the case to increase competence in mathematics, as it is important for letting people know about OR.

HT Arthur Charpentier (@freakonometrics).

 


the 30 most important seconds of your thesis defense

I’m on a lot of dissertation committees. While most of the committees are for students in my department, many are not in my area of operations research. I’m surprised at how hard it can be to follow along to the bigger picture and/or to the technical details. Even when I completely understand the technical details, I usually do not know enough about the specific research niche to characterize the dissertation’s contribution or novelty.

I tell students that the most important part of their thesis or dissertation defense is the 30 seconds when they summarize the key contributions of their research at the beginning of the defense. I’ve been to defenses and proposal defenses where this has been unclear, and confusion follows. A lot of confusion.

The 30 second elevator speech is an important skill, because academics (and non-academics too) spend a lot of time trying to sell their ideas (literally!) to people with technical expertise in another field. The 30 second elevator speech is a necessary but not sufficient first step to communicating with others, and a thesis or dissertation is a great place to get started with this.

Additionally, all committee members want to understand what a student’s research is trying to accomplish and how it will fit into the literature. We need help to get there. Not all committee members seek to understand all the technical ideas, especially if they are outside your area, but we all want the Big Picture. Admittedly, guiding your committee through the Big Picture will take more than 30 seconds, but doing so will lead to fewer questions later on.

A good thesis offense starts by hitting your committee with a 30 second elevator speech, not a sword. Thesis defense comic courtesy of xkcd.



in defense of model simplicity

Recently, I found a few interesting articles/posts that all defend model simplicity.

An interview with Gregory Matthews and Michael Lopez about their winning entry in Kaggle’s NCAA tournament challenge “ML mania” suggests that it’s better to have a simple model with the right data than a complex model with the wrong data. This is my favorite quote from the interview:

John Foreman has a nice blog post defending simple models here. He argues for sometimes replacing a machine learning model for clustering with an IF statement or two. He links to a published paper entitled “Very simple classification rules perform well on most commonly used datasets” by Robert Holte in Machine Learning that demonstrates his point. You can watch John talk about modeling in his very informative and enjoyable hour-long seminar here.
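To show just how simple such a rule can be, here is a minimal sketch of a one-rule classifier in the spirit of Holte’s paper: for each attribute, predict the majority class for each attribute value, then keep the single attribute with the fewest training errors. It assumes categorical attributes only (handling of continuous attributes is omitted), and the function name and toy data are made up.

```python
from collections import Counter, defaultdict


def fit_one_rule(rows, labels):
    """Return (attribute_index, value -> class rule, training_errors)
    for the single attribute that misclassifies the fewest training rows."""
    best = None
    for attr in range(len(rows[0])):
        # Count class labels for each value of this attribute.
        counts = defaultdict(Counter)
        for row, label in zip(rows, labels):
            counts[row[attr]][label] += 1
        # The rule predicts the majority class for each attribute value.
        rule = {value: c.most_common(1)[0][0] for value, c in counts.items()}
        errors = sum(rule[row[attr]] != label for row, label in zip(rows, labels))
        if best is None or errors < best[2]:
            best = (attr, rule, errors)
    return best


# Made-up data: (outlook, windy) -> decision
rows = [
    ("sunny", "no"), ("sunny", "yes"),
    ("rainy", "no"), ("rainy", "yes"),
    ("overcast", "no"), ("overcast", "yes"),
]
labels = ["play", "play", "stay", "stay", "play", "play"]

attr, rule, errors = fit_one_rule(rows, labels)
print(f"use attribute {attr}: {rule} ({errors} training errors)")
```

The fitted “model” is a lookup table on one attribute, which is essentially the IF statement or two that John describes.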

A paper called “The Bias Bias” by Henry Brighton and Gerd Gigerenzer examines our tendency to build overly-complex models. Do complex problems require complex solutions? Not always. Here is the abstract.

In marketing and finance, surprisingly simple models sometimes predict more accurately than more complex, sophisticated models. Why? Here, we address the question of when and why simple models succeed — or fail — by framing the forecasting problem in terms of the bias-variance dilemma. Controllable error in forecasting consists of two components, the “bias” and the “variance”. We argue that the benefits of simplicity are often overlooked by researchers because of a pervasive “bias bias”: The importance of the bias component of prediction error is inflated, and the variance component of prediction error, which reflects an oversensitivity of a model to different samples from the same population, is neglected. Using the study of cognitive heuristics, we discuss how individuals and organizations can reduce variance by ignoring weights, attributes, and dependencies between attributes, and thus make better decisions. We argue that bias and variance provide a more insightful perspective on the benefits of simplicity than common intuitions that typically appeal to Occam’s razor.
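For reference, the two components in the abstract come from the standard decomposition of expected squared prediction error. The notation below is my own assumption, not the paper’s: the data follow y = f(x) + ε with zero-mean noise of variance σ², and the fitted model \hat{f}_D depends on the training sample D.

```latex
\mathbb{E}_{D,\varepsilon}\!\left[ \big( y - \hat{f}_D(x) \big)^2 \right]
  = \underbrace{\big( \mathbb{E}_D[\hat{f}_D(x)] - f(x) \big)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}_D\!\left[ \big( \hat{f}_D(x) - \mathbb{E}_D[\hat{f}_D(x)] \big)^2 \right]}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{irreducible noise}}
```

The first two terms are the “controllable” error. A simpler model typically raises the bias term but lowers the variance term, so it predicts better whenever the drop in variance outweighs the rise in bias.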

What about discrete optimization models? 

All of these links address data science problems, like classifying data or building a predictive model. Operations research models are often trying to solve complicated problems with a lot of constraints and requirements. They have a lot of pieces that need to play nicely together. But even then, it’s often incredibly useful to ask the right question and then answer it using a simple model.

I have one example that makes a great case for simple models. Armann Ingolfsson examined the impact of model simplifications in models used to locate ambulances in a recent paper (see citation below). Location problems like this one almost always use a coverage objective function, where locations are covered if an ambulance can respond to the location in a fixed amount of time (e.g., 9 minutes). The question is how to represent the coverage function and how to aggregate the locations, two sources of model error. The coverage objective function can either reflect deterministic or probabilistic travel times. Deterministic travel times lead to binary objective function coefficients (an ambulance covers a location or it doesn’t) whereas probabilistic travel times lead to real-valued objective coefficients that are a little “smoother” with respect to distances between stations and locations (an ambulance can reach 75% of calls at this location in 9 minutes).
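Here is a small sketch of the difference between the two kinds of coefficients. It is not the paper’s method: the travel times are made up, the probabilistic coefficient is simply the fraction of sampled travel times within the threshold, and the deterministic coefficient thresholds the median travel time (one of several possible percentiles).

```python
import statistics

THRESHOLD_MIN = 9.0  # response-time standard used as the example in the text


def probabilistic_coverage(travel_times, threshold=THRESHOLD_MIN):
    """Fraction of sampled travel times that meet the threshold."""
    return sum(t <= threshold for t in travel_times) / len(travel_times)


def deterministic_coverage(travel_times, threshold=THRESHOLD_MIN):
    """1 if the median travel time meets the threshold, otherwise 0."""
    return 1 if statistics.median(travel_times) <= threshold else 0


# Made-up travel times (minutes) from one station to two demand locations.
near = [6.5, 7.0, 8.0, 8.5, 9.5, 7.5, 8.8, 10.0]
far = [8.9, 9.2, 9.4, 10.1, 8.7, 9.6, 11.0, 9.1]

for name, times in [("near", near), ("far", far)]:
    print(name, deterministic_coverage(times), probabilistic_coverage(times))
```

With these numbers the binary coefficients are 1 and 0, while the probabilistic coefficients are 0.75 and 0.25. The binary version treats the far location as completely uncovered even though a quarter of its calls would be reached in time.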

This paper examined which is worse: (a) a simple model with highly aggregated locations but realistic (probabilistic) travel times or (b) a more complex model with finely granulated locations but less realistic (deterministic) travel times.

It turns out that the simple but realistic model (choice (a)) is better by a long shot. Here is a figure from the paper that reflects the coverage loss (model error) from different models. The x-axis reflects aggregation, and the y-axis reflects coverage loss (model error, more is bad). The different curves reflect different models. The blue line is the model with probabilistic travel times; the rest have deterministic travel times with the binary value determined by different percentiles.

[Figure 4 from the paper: relative coverage loss versus aggregation level for the five models]

From the paper: “Figure 4 shows how relative coverage loss varies with aggregation level (on a log scale) for the five models, for a scenario with a budget for five stations, using network distances, and actual demand. This figure illustrates our two main findings: (1) If one uses the probabilistic model (THE BLUE LINE), then the aggregation error is negligible, even for extreme levels of aggregation and (2) all of the deterministic models (ALL OTHER LINES) result in large coverage losses that decrease inconsistently, if at all, when the level of aggregation is reduced.”

From the conclusion:

In this paper, we demonstrated that the use of coverage probabilities rather than deterministic coverage thresholds reduces the deleterious effects of demand point aggregation on solution quality for ambulance station site selection optimization models. We find that for the probabilistic version of the optimization model, the effects of demand-point aggregation are minimal, even for high levels of spatial aggregation.

Citation:

Holmes, G., A. Ingolfsson, R. Patterson, E. Rolland. 2014. Model specification and data aggregation for emergency services facility location.  [Supplement] [Submitted, last revision March 2014.]

 

What is your favorite simple model?



it’s still safe to fly

Despite terrifying headlines like “2014 could be worst year for plane crash deaths in almost a decade,” it’s quite safe to fly. Operations research has played a role in demonstrating aviation safety over the years. Professor Arnie Barnett at MIT is a leading authority in aviation safety, and he has published several papers on this topic (see references below for four of them). He was recently on Voice of America in a 22-minute segment discussing aviation safety [Link here, HT @Supernetworks]. According to Barnett, flying in the first world is 100 times safer now than in the 1950s. Terrorism may be more of a threat to first world air safety than accidents. Most of Barnett’s papers focus on the safety associated with US domestic trunklines; however, some of his work has noted improvements in international safety.

The developing world is not quite as safe. However, Barnett nicely discusses benefits as well as costs. He points out that many things are not as safe in the developing world (drinking water, medical care, etc.) and that we should look at the entire safety of the trip and weigh that with the potential benefits of travel when making travel decisions. Likewise, there are potential solutions for improving air safety that may be too costly. Given limited budgets for things like (say) security, it generally makes sense to spend the budget on things that have the most impact. Barnett references RAND’s MANPADS study [Link] that concluded that “given the enormous cost of installing anti-missile systems compared with other homeland security measures, researchers suggest that officials explore less costly approaches in the near term while launching efforts to improve and demonstrate the reliability of the systems.”

This week, Arnie Barnett was also on More or Less on BBC Radio [Link].

Have the recent air events changed your willingness to fly domestically or internationally?

 

ON THE LINE: How Safe Are Our Skies ?


Barnett, A., Abraham, M., & Schimmel, V. (1979). Airline safety: Some empirical findings. Management Science, 25(11), 1045-1056.

Barnett, A., & Higgins, M. K. (1989). Airline safety: The last decade. Management Science, 35(1), 1-21.

Barnett, A. (2000). Free-flight and en route air safety: A first-order analysis. Operations Research, 48(6), 833-845.

Czerwinski, D., & Barnett, A. (2006). Airlines as baseball players: Another approach for evaluating an equal-safety hypothesis. Management Science, 52(9), 1291-1300.

Air fatalities per year


land O links

Here are a few links for your holiday weekend reading:

  1. How to make mass transit sustainable once and for all by @trnsprttnst
  2. Why commute times don’t change much even as a city grows by @e_jaffe
  3. Blogging: is it good or bad for journal readership? The Incidental Economist weighs in.
  4. Harvard Business Review: Instinct can beat analytical thinking
  5. The hot hand fallacy: why we persist in seeing streaks
  6. The myth of the hot hand fallacy by @JSEllenberg
  7. Sports teams are immersed in “big data”
  8. Speaking of big data, an entire tumblr is devoted to cheesy pictures of Big Data (HT @mlesz1)

This is what Big Data looks like. Maybe.

