Posted by Laura McLay on January 29, 2012
I loved Tallys Yunes’ challenge to come up with OR memes to bring awareness about O.R. to the masses. Mine is below.
Please make an OR meme and send it to Tallys and me.
Posted by Laura McLay on January 27, 2012
In my previous post, I tried to unravel life expectancy curves. The comments on this post were fantastic (thank you, readers!). They were so good that I decided to share some of the readers’ information and reply to a request.
First, I was asked if the mortality rates follow a “bath tub” shape. If you have taken a course on reliability, you have seen hazard rates. Many processes and widgets have a “bath tub” curve, meaning that there is some break-in failure (this is what a warranty is for), there is an extended period of time with a low incidence of failure during a widget’s useful life, and then there is wear-out failure. People are like widgets in this regard. Below are the CDC’s recent mortality estimates for men and women as a function of age. Due to low infant mortality rates, there isn’t much of a tub there, but mortality does decrease for the first 10 years of life for girls and boys (using reliability terms, this is break-in failure). After the age of 10, the mortality curve for boys dramatically rises and diverges from the curve for girls.
Second, the link between women’s life expectancy and childbirth is quite real. The figure below from the Red Blog (courtesy of Hans Rosling) captures international life expectancy rates as a function of the number of children a country has, on average. Michael also points out that the growing life expectancy disparity between men and women reflects this: “As family sizes grow, life expectancies drop. It seems to me that the widening gender gap from 1920 onwards tends to support your notions about childbirth reducing female life expectancy.” Hans Rosling talks about this figure in the must-see video at the bottom of this post.
David Smith found an entire article about life expectancies in England in 1550-1800. Life expectancies were between 35 and 40 years. The figure below is not broken down by gender, but it is indeed fascinating. The article itself discusses childbirth quite a bit, although not so much the relationship between childbirth and life expectancy. The authors note that lower life expectancies were caused by poverty and lack of nutrition, which in turn encouraged people to have fewer children.
The following video of Hans Rosling talking about life expectancy over time is a real treat.
Posted by Laura McLay on January 26, 2012
I was poking around for some health data for a project I am working on and came across an interesting life expectancy table from the CDC that reports the life expectancy from birth based on birth year back to 1900. Below, I show a plot of life expectancies as a function of birth year according to gender. I found a few things surprising:
(1) Women have been outliving men for a long time. I thought this was a relatively new phenomenon. It isn’t. Women had a better life expectancy than men at all ages as far back as 1850 (and perhaps longer–I don’t have the data). This is shocking. I thought the risk of childbirth and unsanitary conditions during childbirth would have significantly shortened women’s lives up until the early 1920s. I guess I was wrong.
(2) That blip in life expectancy occurred in 1918. My educated guess is that Spanish influenza caused the blip, since the flu disproportionately affects infants. This is incredibly sad.
(3) There are many blips before about 1945, and life expectancies have looked smoother ever since. I would guess that the widespread use of childhood immunizations has greatly reduced the outbreaks of disease that periodically occur. These diseases are often fatal for infants. The first flu vaccine was introduced in 1945. Many others were introduced near that time (see this timeline and this CDC timeline).
This figure is life expectancy at birth. The infant mortality rate plummeted over the course of the 20th century, meaning that much of the improvement in life expectancy is merely caused by a drop in infant mortality. Below, I plotted the life expectancy at age 5 according to the year of birth.
(1) Here, you can see how the life expectancy is much higher at age 5 than at birth for those born near 1900, even when not accounting for the first five years of life that went by. This underscores the seriousness of infant mortality at the time. There are few differences between the life expectancy at birth and at age 5 for those born after the year 2000, and in general, the life expectancy curves are flatter than those at birth.
The smoothness may be due to each of the points being a moving average rather than “proof” that the lack of smoothness in the first graph was caused by childhood diseases (which is what I suspect).
(2) The disparity between women and men has increased over time. It looks like the disparity has been reduced somewhat in the past 20 years, but the difference between men and women now is much greater than it was in 1900. Married men live longer. Could the life expectancy rates be less disparate if more people were married? I don’t have data to answer that question.
To plot how much of the increase in life expectancy is caused by improvements in infant mortality, below I plotted the difference in the life expectancy at age 5 and at birth. I normalized these life expectancies by five years to compare apples to apples. Boys and girls born in 1900 and who survived infancy would live 12.9 and 12.5 years longer than their life expectancies at birth, respectively. Interestingly, boys have been disproportionately affected by infant mortality over the years by a small margin, and the figure below reflects this. If you know why, please leave a comment.
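The normalization described above can be sketched in a few lines of Python. This is only a sketch under one assumption: that the CDC table reports life expectancy at age 5 as remaining years of life, so five years are added back before comparing with life expectancy at birth. The function name and the input values are hypothetical placeholders, not figures from the CDC table.

```python
def gain_from_surviving_infancy(e0, e5, age=5):
    """Extra expected years of life, relative to the expectation at birth,
    for someone who has already reached `age`.

    e0: life expectancy at birth (years)
    e5: remaining life expectancy at `age` (years) -- assumed convention
    """
    expected_age_at_death = age + e5  # add back the years already lived
    return expected_age_at_death - e0


# Placeholder inputs for illustration only (not CDC data).
example_gain = gain_from_surviving_infancy(e0=47.0, e5=55.0)
print(f"Gain from surviving to age 5: {example_gain:.1f} years")
```

If the table instead reports expected age at death for five-year-olds, the `age +` term would be dropped.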
Posted by Laura McLay on January 23, 2012
The race for the Republican Presidential nomination has changed so much in the past week that it is hard to keep up. I enjoy reading Nate Silver’s NY Times blog when I have a chance. A week ago (Jan 16) he wrote a post entitled “National Polls Suggest Romney is Overwhelming Favorite for GOP Nomination,” where he noted that Romney had a 19 point lead in the polls. He wrote:
Just how safe is a 19-point lead at this point in the campaign? Based on historical precedent, it is enough to all but assure that Mr. Romney will be the Republican nominee.
Silver compared the average size of the lead following the New Hampshire primary across the past 20+ years of Presidential campaigns. He sorted the results according to decreasing “Size of Lead” the top candidate had in the polls. The image below is from Silver’s blog, where it suggests that Romney has this race all but wrapped up.
It looks almost impossible for Romney to blow it. I stopped following the election news until Gingrich surged ahead and the recount in Iowa led to Santorum winning the caucus.
A mere week later, it looks like Romney’s campaign is in serious trouble. Today (Jan 23), Silver wrote a post entitled “Some Signs GOP Establishment Backing Romney is Tenuous.” His forecasting model for the Florida primary on January 31 now predicts that Newt Gingrich has an 81% chance of winning. This is largely because Silver weighs “momentum” in his model, which Gingrich has in spades.
Two months ago, I blogged about how Obama will win the election next year. I was only half-serious about my prediction. Although the model seems to work, it is based on historical trends that may not sway voters today. Plus, I had no idea who the Republican nominee would be. Despite my prediction, I certainly envisioned a tight race that Obama could lose. Not so much these days.
A lot has changed in the past week (and certainly in the past two months!).
My question is, what models are useful for making predictions in the Republican race? Will the issue of “electability” ever become important to primary voters?
Posted by Laura McLay on January 20, 2012
My last post discussed how one might estimate how many state license plates one would expect to see on a road trip. I made a spreadsheet to compute the probability of seeing each state license plate.
The distance between state capitals was found here. The number of licensed drivers per state is here. I estimated the probability of seeing a license plate from state A in state B using this formula:
P = exp(-K * (Distance from A to B in miles) / # of licensed drivers)
with K = 7000 – 2000*Summer01 – 1000*ExpensiveGas01. Summer01 is 1 if it is summer break and 0 otherwise. ExpensiveGas01 is 1 if gas is “expensive” and AAA predicts that road trips will be down, and 0 otherwise. I didn’t have time to properly identify a meaningful formula or calibrate the parameters. Suggestions here are welcome!
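The formula is easy to try outside of a spreadsheet. Below is a minimal Python sketch of it. The function name and the example inputs (a 500-mile distance and 5 million licensed drivers) are made up for illustration and are not taken from the spreadsheet.

```python
import math


def plate_probability(distance_miles, licensed_drivers,
                      summer=False, expensive_gas=False):
    """Probability of spotting a plate from state A while in state B,
    using P = exp(-K * distance / drivers) with the post's K."""
    k = 7000 - 2000 * int(summer) - 1000 * int(expensive_gas)
    return math.exp(-k * distance_miles / licensed_drivers)


# Hypothetical inputs: capitals 500 miles apart, 5 million licensed drivers.
p_winter = plate_probability(500, 5_000_000)
p_summer = plate_probability(500, 5_000_000, summer=True)
print(f"winter: {p_winter:.3f}, summer: {p_summer:.3f}")
```

With these inputs the summer probability comes out higher than the winter one, which matches the intent of the Summer01 term: a smaller K means plates from far away are easier to spot.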
The results make me conclude that the first assumption is probably not true: the probabilities do depend on how long we are in a state. When driving to Vermont, we went through many (8) little states. When driving to Chicago, we went through fewer (5) states but were in each state for longer. Moreover, many of the Midwest states are not “destination” states. Take Indiana for instance. I love Hoosiers as much as the next person, but Indiana truly is the “Crossroads of America”–it’s a state that many people from other states drive through. It’s a better place to spot license plates than, say, Delaware. I didn’t take that into account.
Below is a detailed review of our winter trip numbers. It indicates the predicted probability of seeing each state license plate and whether we actually saw it. An asterisk (*) indicates that the model is “off”–that is, we either (1) did not see a state with a probability greater than 0.5 or (2) saw a state with a probability of 0.5 or lower.
A copy of my spreadsheet is here if you want to see how I computed the numbers.
| State | Cumulative probability of seeing each state | States we saw |
| District of Columbia | 1 | Yes |
Posted by Laura McLay on January 19, 2012
My family took a lot of road trips when I grew up. To combat boredom, we tried to see how many state license plates we would see on our trip. On a trip to see Mount Rushmore, we found almost all of the states.
As an adult and geek, the license plate game has (subtly?) changed. Now, I combat boredom by talking with my husband about how to come up with a probability distribution for how many state license plates we would expect to see on a road trip from point A to point B.
We took two road trips this year: one from Richmond, VA to Chicago, IL over the summer, the second from Richmond, VA to Burlington, VT over the winter break. We saw ~35 states in our first trip and ~25 states in our second trip. My husband and I immediately noticed that we accrued license plates at a slower rate on our winter trip, which we suspect was from fewer people making road trips over the winter as compared to summer.
We wondered if one could estimate how many license plates you would expect to see in a road trip based on
The state that you are in determines how likely you are to see other state license plates based on their relative distances as well as the number of licensed drivers in other states.
We simplified the problem to avoid looking at how long you drove through a state as well as interstate connectivity issues. That is, there is no difference between driving through West Virginia on I-70 and driving through Pennsylvania on I-80. Additionally, if you are on I-80 in Illinois, you are connected to neighbor states Iowa and Indiana but not neighbor states Missouri and Wisconsin, and therefore, one might expect to see Iowa and Indiana plates. We ignored this and just noted that you would be in Illinois, which gives the likelihood of seeing license plates from other states regardless of “route distance.”
My next post summarizes the model, the assumptions, and the results.
Have you tallied license plates on road trips? What do you think are the salient aspects of this problem to include in a probability model?
Posted by Laura McLay on January 18, 2012
It’s been quiet in the operations research blogosphere today. It seems that many OR bloggers are taking the day off due to SOPA and PIPA (Go here to read about how SOPA works) or perhaps just due to a busy semester. Mike Trick is taking the day off from blogging and tweeting.
I didn’t realize I was supposed to take the day off from tweeting until I already tweeted. To be respectful of the blackout that I am supposed to be observing, I decided to postpone a new blog post until tomorrow. After being prodded by a few of my tweeps, I decided to blog about the SOPA blackout. I am not alone. FemaleScienceProfessor also blogged about SOPA and PIPA today.
Please let me know why you decided to black out today (or not) and what media you are refraining from using for the day.
If you want to wait until tomorrow to leave a comment, I’ll respect that.
Posted by Laura McLay on January 17, 2012
Last month, I had the pleasure of meeting Yakov Ben-Haim and talking with him at length about info-gap decision theory. He used an example of squirrels foraging for nuts to illustrate the types of problems for which info-gap decision theory models are useful.
A squirrel needs calories to survive, and nuts provide the perfect source of calories. The squirrel has a decision to make: where should the squirrel go to forage for nuts? Different foraging locations have different potentials for nut payoffs. They also have risks (not enough food). Foraging in a new location may carry highly uncertain risks that are impossible for the squirrel to estimate (being hit by a car, eaten by a wolf, etc.)
The squirrel has two options: the squirrel can hunt in the usual area where he can obtain n nuts with certainty or he can try a new location where he has a probability P of obtaining N nuts (with N > n) and a probability (1-P) of obtaining zero nuts. Let’s say that N and P are wild guesses.
Let’s say that the squirrel is an optimizer and decides to build a decision tree to maximize the number of nuts he can collect. Using basic decision analysis, he deduces that he should choose the new location if PN>n.
If the squirrel needs to collect n nuts to survive, then maximizing is nuts (pun intended. Sorry!) Staying with the status quo guarantees survival, even if P and N are large. The payoff for the new location may be greater, but there is a 1-P chance that the squirrel would starve. The traditional decision tree is not robust to the squirrel’s desire to survive (neither is darting in front of cars on the highway, but I digress).
On the other hand, if the squirrel needs to collect N nuts to survive, then staying with the status quo guarantees the squirrel’s demise. The new location is worth a look no matter how risky.
In both of these scenarios, the squirrel isn’t really maximizing the subjective expected nuts that he can collect–he really wants to maximize the probability of meeting his nut threshold (the one that guarantees survival). This is a satisficing strategy (although not dissimilar from an optimizing strategy with a moving threshold). The satisficing strategy is a better bet for the squirrel than the optimization strategy in this decision context. The squirrel doesn’t always need to know the exact probabilistic information to make a good decision, as illustrated above. In fact, he can have absolutely no idea what N and P would be to find an effective nut foraging strategy–even when there is severe uncertainty.
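The contrast between the two strategies can be sketched in a few lines of Python. This is a toy illustration, not Ben-Haim’s info-gap model: the function names and the example values of n, N, and P are hypothetical, and ties are broken in favor of the usual location.

```python
def expected_nuts_choice(n, N, P):
    """Decision-tree rule: pick the new location when P*N > n."""
    return "new" if P * N > n else "usual"


def survival_probability(option, threshold, n, N, P):
    """Probability that `option` yields at least `threshold` nuts."""
    if option == "usual":
        return 1.0 if n >= threshold else 0.0
    # New location: N nuts with probability P, zero nuts otherwise.
    if threshold <= 0:
        return 1.0
    return P if N >= threshold else 0.0


def satisficing_choice(threshold, n, N, P):
    """Pick the option that maximizes the chance of meeting the threshold."""
    options = ("usual", "new")  # max() keeps the first option on ties
    return max(options,
               key=lambda o: survival_probability(o, threshold, n, N, P))


n, N, P = 10, 100, 0.5  # hypothetical values
print(expected_nuts_choice(n, N, P))              # optimizer's pick
print(satisficing_choice(threshold=n, n=n, N=N, P=P))  # needs n to survive
print(satisficing_choice(threshold=N, n=n, N=N, P=P))  # needs N to survive
```

When the threshold is n, the satisficing choice is the usual location for any P below 1. When the threshold is N, it is the new location for any P above 0. Neither answer requires a good estimate of P or N.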
The idea of a squirrel building a decision tree is, of course, ludicrous. But it makes the point that we should rethink our traditional optimization models to make sure they fit the real decision criteria at hand. Info-gap decision theory thus focuses on satisfying a given acceptable level of what is traditionally considered the objective function value and instead optimizing robustness. It also has philosophical implications for how one views certainty.
I’ve been looking more closely at robustness lately. I won’t abandon my optimization models, but I will acknowledge that including robustness in certain scenarios leads to decisions that more accurately reflect the criteria at hand and decisions that could be counter-intuitive.
Yakov Ben-Haim can explain this much better than I can, so I’ll refer you to his blog about info-gap decision theory and his article about foragers in the American Naturalist if you want to learn more.
Posted by Laura McLay on January 12, 2012
A Chicago area man won the lottery for the second time. The Chicago Tribune reports:
Scott Anetsberger duplicated his $1 million win of nine years ago in the same instant Merry Millionaire game, lottery spokesman Mike Lang said.
Despite long odds, Anetsberger isn’t the first two-time $1 million instant winner. Kimberly Pleticha of Villa Park won $1 million twice in the instant Cash Jackpot game–the first time in August 2010 and the second only six months later in February.
Lottery officials could not instantly compute the odds against multiple winners, but did note there have been a dozen or more two-time Little Lotto winners over the years.
What would the odds of winning the lottery twice be? Well, it depends on how frequently one plays the lottery.
Winning the Illinois Lottery requires picking six correct numbers, where the numbers range from 1 to 52. The odds of getting all six numbers correct are 1 in 20,358,520. It costs $0.50 to play the lottery, and there are three lotteries per week. Assuming that each lottery is independent (a reasonable assumption), one would have to play the lottery 20,358,520 times, on average, to win (using the geometric distribution). If one plays the lottery three times per week, then it would take 130,500 years to win the lottery once at a cost of more than $10M.
Winning the lottery twice can be modeled as a negative binomial random variable. Assuming that our lottery winner plays the lottery three times per week before and after winning the lottery, then it takes ~261,000 years, on average, to win twice.
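These numbers can be reproduced with a short Python sketch. It assumes one ticket per drawing, three drawings per week for 52 weeks a year, and independent drawings, as in the text. The variable names are my own.

```python
import math

GAMES_PER_YEAR = 3 * 52   # three drawings per week
TICKET_COST = 0.50        # dollars per ticket

combos = math.comb(52, 6)  # ways to choose 6 numbers out of 52

# Geometric distribution: mean number of plays until the first win is 1/p.
expected_plays_one_win = combos
# Negative binomial with r = 2 wins: mean number of plays is r/p.
expected_plays_two_wins = 2 * combos

years_one_win = expected_plays_one_win / GAMES_PER_YEAR
years_two_wins = expected_plays_two_wins / GAMES_PER_YEAR
cost_one_win = expected_plays_one_win * TICKET_COST

print(f"Odds: 1 in {combos:,}")
print(f"Years to win once:  {years_one_win:,.0f}")
print(f"Years to win twice: {years_two_wins:,.0f}")
print(f"Expected spend before first win: ${cost_one_win:,.0f}")
```

The results agree with the rounded figures quoted in the text: about 130,500 years for one win, about 261,000 years for two, and a little over $10M in tickets.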
Since it is only newsworthy to report additional wins by those who have already won the lottery, we are really only interested in the odds that a lottery winner would win the lottery again. This is a different question. Assuming that our lottery winner continues to play the lottery three times per week, the odds of winning again are the same as the odds of someone else winning the lottery for the first time: 1 in 20,358,520 per lottery. That is, it would take our lottery winner an additional 130,500 years, on average, to win the lottery again.
If someone plays the lottery more than three times per week, then the odds of winning go up.
Of course, many people play the lottery, so the odds that someone wins the lottery twice over their lifetime are much, much higher. I tell my students every semester, “Someone will win the lottery. Just not you.” If 130,500 people buy one lottery ticket per game, then there would be a two-time winner every 2 years, on average.
Little Lotto involves picking five correct numbers, where the numbers range from 1 to 39. It is easier to win, but it has a lower payout. The odds of winning are 1 in 575,757, which means that one is 35 times as likely to win the Little Lotto as the regular lottery. It would take 3691 years to win Little Lotto once (by playing three times per week) and 7382 years to win it twice.
Given that there have been 12 two-time winners in Little Lotto in its 23 years of existence, there is approximately one two-time winner every two years. Given my assumptions, this would suggest that ~3691 people buy a Little Lotto ticket every time. That seems a bit low to me. But I have a head cold and maybe it has temporarily impaired my mathematical abilities.
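The Little Lotto figures, including the back-of-envelope player count, can be checked with a short Python sketch. It follows the same simplification as the text: one player’s expected wait for two wins is divided by the observed gap between two-time winners. That is a rough linear scaling rather than an exact calculation, and the variable names are hypothetical.

```python
import math

GAMES_PER_YEAR = 3 * 52  # assumes three drawings per week, as in the text

little_lotto_combos = math.comb(39, 5)  # ways to choose 5 numbers out of 39
regular_combos = math.comb(52, 6)       # ways to choose 6 numbers out of 52

# How much more likely a Little Lotto win is than a regular lottery win.
ratio = regular_combos / little_lotto_combos

years_one_win = little_lotto_combos / GAMES_PER_YEAR
years_two_wins = 2 * little_lotto_combos / GAMES_PER_YEAR

# The text rounds 23 years / 12 two-time winners to one every two years.
YEARS_BETWEEN_TWO_TIME_WINNERS = 2
implied_players = years_two_wins / YEARS_BETWEEN_TWO_TIME_WINNERS

print(f"Odds: 1 in {little_lotto_combos:,} (about {ratio:.0f}x easier)")
print(f"Years to win once:  {years_one_win:,.0f}")
print(f"Years to win twice: {years_two_wins:,.1f}")
print(f"Implied regular players: {implied_players:,.0f}")
```

Using 23/12 years between two-time winners instead of the rounded value of 2 would raise the implied player count slightly.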
A seven-time lottery winner’s advice for winning the lottery is to invest more (not less!) of one’s money into buying lottery tickets, as long as one can afford it. He also recommends treating the lottery as a job: the lottery is a skill, and one can improve at it after investing a lot of time. While skill plays a role in playing the lottery (identifying which numbers to pick and identifying which games have the best payoff), I’m pretty sure that this is bad advice. The expected payoff for the lottery is negative, meaning that on average, you will come out behind. The variance in earnings is large, meaning that even over many attempts, it is possible to come out ahead. But given that one comes out ahead, it would be foolish to attribute one’s success to skill. But maybe I’m missing something.
For the record, I do not recommend gambling or routinely playing the lottery.
For more, read Mike Trick’s post on conditional probabilities and March Madness odds.
Posted by Laura McLay on January 11, 2012
I am due for another teaching with technology post. This post is on slideshare, a social networking tool for uploading and sharing your presentations. There is a lot to like about slideshare and not much to dislike. Here is my list of likes:
I haven’t used slideshare frequently for my teaching in the classroom. In the future, I may require students to maintain presentations for a course on slideshare.
I do use slideshare for teaching outside of the classroom. I shared three presentations that I gave in seminars to a broad audience (on finances, technical writing, and applying to graduate school). I use slideshare to share these slides with students who did not attend the seminar but who may be interested. People seem to find the presentations using google.
I put several of my INFORMS talks on slideshare (see these slides about blogging for operations research) and shared them on twitter using the conference hashtag (#informs2011 this year). That way, other people attending the conference could easily find my slides. I noticed a sharp increase in my presentation feed hits after sharing via twitter.
My slideshare presentations are here.
Related posts on teaching with technology: