Thinking about getting a PhD? Here are some good resources.

Part of my job is to help students figure out if grad school is for them. Over the years, I’ve accumulated a few great resources for students thinking about a PhD. Here are some of my favorites.

Here is a link to the first of a series of ten blog posts by Jean Yang, who decided to leave Google to attend grad school in CS at MIT.

Philip Guo released an e-book on the PhD experience called The PhD Grind and has a 45-minute lecture on why to consider a PhD in CS. He has a bunch of other good posts about academic life as both a student and an assistant professor. He also has a post specifically on applying to grad school.

Tim Hopper has a great series about whether one should consider a PhD in a technical field. I like this series because a broad range of people participated and not everyone encourages a PhD.

Chris Chambers’s blog post called “Tough Love: an insensitive guide to thriving in your PhD” has a lot of frank advice for determining whether a PhD is for you.

Below is a SlideShare presentation I put together for the VCU math club a few years ago, since several seniors were planning to apply to graduate programs.

This is a woefully incomplete list, but I don’t want to delay posting it any longer. What other useful resources are out there?


how to seat guests at a wedding

How to optimally seat people at a wedding.

SAS started an operations research blog [Link]. Matthew Galati’s first entry is on how to optimally seat people at a wedding given assignment preferences. He provides a model that maximizes the total happiness of his guests. His blog post has code, data, and a picture of a quirky family member or two. It’s a great post worth checking out.

I’ve written about optimization for weddings before. I blogged about a paper by Meghan L. Bellows and J. D. Luc Peterson entitled “Finding an Optimal Seating Chart,” published in the Annals of Improbable Research, that shows how to use integer programming to optimally seat guests at a wedding [blog post & paper]. The reader comments on that post are really interesting: many people have used a similar modeling approach.
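For readers who want to experiment, here is a minimal sketch of that kind of integer program in Python using the PuLP library. The guests, tables, and happiness scores are made-up placeholders, and the formulation is a generic linearization of pairwise seating preferences, not the exact Bellows and Peterson model:

```python
# A minimal seating-assignment IP sketch with hypothetical guests and scores.
from itertools import combinations
from pulp import LpBinary, LpMaximize, LpProblem, LpVariable, lpSum

guests = ["Alice", "Bob", "Carol", "Dan", "Erin"]
tables = [0, 1]
capacity = 3
# Hypothetical happiness score earned when a pair sits at the same table.
happiness = {pair: 1 for pair in combinations(guests, 2)}
happiness[("Alice", "Bob")] = 5    # a couple: seat them together
happiness[("Carol", "Dan")] = -4   # feuding relatives: keep them apart

model = LpProblem("wedding_seating", LpMaximize)
# x[g, t] = 1 if guest g sits at table t.
x = {(g, t): LpVariable(f"x_{g}_{t}", cat=LpBinary)
     for g in guests for t in tables}
# y[g, h, t] = 1 if g and h both sit at table t (linearizes x[g,t]*x[h,t]).
y = {(g, h, t): LpVariable(f"y_{g}_{h}_{t}", cat=LpBinary)
     for (g, h) in happiness for t in tables}

# Objective: maximize total happiness over co-seated pairs.
model += lpSum(happiness[g, h] * y[g, h, t]
               for (g, h) in happiness for t in tables)

for g in guests:                       # each guest sits at exactly one table
    model += lpSum(x[g, t] for t in tables) == 1
for t in tables:                       # respect table capacities
    model += lpSum(x[g, t] for g in guests) <= capacity
for (g, h) in happiness:               # tie y to the x variables exactly
    for t in tables:
        model += y[g, h, t] <= x[g, t]
        model += y[g, h, t] <= x[h, t]
        model += y[g, h, t] >= x[g, t] + x[h, t] - 1

model.solve()
for t in tables:
    print(f"Table {t}:", [g for g in guests if x[g, t].value() == 1])
```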

Geoffrey De Smet provided a link to a wedding planner on GitHub.



how to forecast an election using simulation: a case study for teaching operations research

After extensively blogging about the 2012 Presidential election and the analytical models used to forecast it (go here for links to some of these old posts), I decided to create a case study on Presidential election forecasting using polling data. This blog post describes that case study. I originally developed it for an undergraduate course on math modeling that used the Palisade Decision Tools suite (@RISK). I retooled the spreadsheet for my undergraduate course in simulation in Spring 2014 so that it does not rely on @RISK. All materials are available in the Files section below.

The basic idea is that there are a number of mathematical models for predicting who will win the Presidential election. The most accurate (and most popular) models use simulation to forecast state-level outcomes based on state polls. The most sophisticated models, like Nate Silver’s 538 model, incorporate things such as poll biases, economic data, and momentum. I wanted to incorporate poll biases.

For this case study, we will look at state-level poll data from the 2012 Presidential election. The spreadsheet contains realistic polling data from before the election. Simulation is a useful tool for translating the uncertainty in the polls into potential election outcomes. There are 538 electoral votes: whoever gets 270 or more votes wins.

Assumptions:

  1. Everyone votes for one of two candidates (i.e., no third party candidates – every vote that is not for Obama is for Romney).
  2. The proportion of votes that go to a candidate is normally distributed according to a known mean and standard deviation in every state. We will track Obama’s proportion of the votes since he was the incumbent in 2012.
  3. Whoever gets more than 50% of the votes in a state wins all of the state’s electoral votes. [Note: most but not all states do this].
  4. The votes cast in each state are independent, i.e., the outcome in one state does not affect the outcomes in another.
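Under these assumptions, each state’s outcome boils down to a single normal draw followed by a winner-take-all tally. Here is a minimal sketch in Python; the Wisconsin mean matches the 52% figure below, but the other poll statistics are hypothetical placeholders rather than the real 2012 data, and a full run would include all states:

```python
import random

# Hypothetical poll statistics: state -> (mean Obama share, std dev, electoral votes).
# The real case study uses every state; three are shown here for illustration.
polls = {
    "WI": (0.52, 0.02, 10),
    "FL": (0.498, 0.02, 29),
    "OH": (0.51, 0.02, 18),
}

def simulate_once(polls):
    """One replication: draw each state's Obama vote share and tally his electoral votes."""
    obama_ev = 0
    for state, (mean, sd, ev) in polls.items():
        share = random.gauss(mean, sd)  # assumption 2: normally distributed vote share
        if share > 0.5:                 # assumption 3: winner takes all of the state's EVs
            obama_ev += ev
    return obama_ev
```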

There is some concern that the polls are biased in four of the key swing states (Florida, Pennsylvania, Virginia, Wisconsin). A bias means that the poll average for Obama is too high. Let’s consider biases of 0%, 0.5%, 1%, 1.5%, and 2%, with all four states affected by the same bias level at the same time. For example, the mean for Wisconsin is 52%; depending on the amount of bias, the adjusted mean would range from 50% to 52%. Side note: Obama was such an overwhelming favorite that it only makes sense to look at biases that work in his favor.

It is very difficult to find polls that are unbiased. Nate Silver of FiveThirtyEight wrote about this issue in “Registered voter polls will (usually) overrate Democrats”: http://fivethirtyeight.com/features/registered-voter-polls-will-usually-overrate-democrats/

Inputs:

  1. The poll statistics of the mean and standard deviation for each state.
  2. The number of electoral votes for each state.

Outputs:

  1. The total number of electoral votes for Obama.
  2. An indicator variable to capture whether Obama won the election.

Tasks:

(1) Using the spreadsheet, simulate the proportion of votes in each state that go to Obama for each of the five bias scenarios. Run 200 replications for each scenario. For each replication, determine the number of electoral votes in each state that go to Obama and Romney and who won.

(2) Paste the model outputs (the average and standard deviation of the number of electoral votes for Obama and the probability that Obama wins) for each of the five bias scenarios into a table.

(3) What is the probability of a tie (exactly 269 votes)?
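For anyone prototyping these tasks outside of a spreadsheet, here is one way to sketch the scenario loop in Python, reusing simulate_once and polls from the sketch above (with a full 50-state polls dictionary, the 270-vote threshold becomes meaningful; the numbers printed are illustrative only):

```python
SWING_STATES = {"FL", "PA", "VA", "WI"}

def run_scenario(polls, bias, n_reps=200):
    """Simulate n_reps elections with swing-state poll means shifted down by `bias`."""
    biased = {s: ((m - bias) if s in SWING_STATES else m, sd, ev)
              for s, (m, sd, ev) in polls.items()}
    results = [simulate_once(biased) for _ in range(n_reps)]
    avg_ev = sum(results) / n_reps
    p_win = sum(ev >= 270 for ev in results) / n_reps  # 270 of 538 EVs wins
    p_tie = sum(ev == 269 for ev in results) / n_reps  # task 3: exact 269-269 tie
    return avg_ev, p_win, p_tie

for bias in [0.0, 0.005, 0.01, 0.015, 0.02]:
    avg_ev, p_win, p_tie = run_scenario(polls, bias)
    print(f"bias={bias:.1%}: avg EV={avg_ev:.1f}  P(win)={p_win:.3f}  P(tie)={p_tie:.3f}")
```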

Modeling questions to think about:

  1. Obama took 332 electoral votes compared to Romney’s 206. Do you think that this outcome was well-characterized in the model or was it an unexpected outcome?
  2. Look at the frequency plot of the number of electoral votes for Obama (choose any of the simulations). Why do some electoral vote totals, like 307, 313, and 332, occur more frequently than others?
  3. Why do you think a tiny bias in just four states would disproportionately affect the election outcomes?
  4. How do you think the simplifying assumptions affected the model outputs?
  5. No model is perfect, but an imperfect model can still be useful. Do you think this simulation model was useful?

RESULTS

I don’t give the results to my students ahead of time, but here is a figure of the results using @RISK. The students can see how small changes in poll bias can drastically affect the outcomes: with no bias, Obama has a 98.3% chance of winning, and with a 2% bias in a mere four swing states, his chances go down to 79.3%.


@RISK output for the election model. The histogram shows the distribution of electoral votes for the unbiased results. The table below tabulates the results for different levels of bias.

Files.

Here are the instructions, the Excel spreadsheet for Monte Carlo simulation, and the Excel spreadsheet that can be used with @RISK.



election analytics roundup

Here are a few election-related links:

I’ve blogged about elections a lot before. Here are some of my favorites:


thoughts on a PhD development course, part 1

I am teaching a 1-credit-hour PhD development course for industrial and systems engineering students at the University of Wisconsin-Madison. I am teaching the course with librarian Ryan Schryver, who is using it to replace office hours that students never came to; he found that students were not asking the questions that they needed to ask. Additionally, the department has a goal of exposing students to research and people across the department, but we have found that our students work in their labs with few interactions with students in operations research, manufacturing, human factors, and healthcare (our four department areas).

This course will fill these gaps. Our syllabus includes a variety of topics for students in their first 2-3 years, from understanding department and university policies to choosing a dissertation topic, technical writing, graphics, research organization, good programming habits, and prelims. We have a bunch of guest speakers in addition to me, and I love hearing my colleagues’ take on things.

Student feedback thus far has been fantastic. I have urged the students to take ownership, and they seem interested in getting what they need from this course, not just in attending to get credit.

My favorite day thus far has been the PhD student panel, where PhD students fielded questions from other PhD students. The questions came from all over the place, and the panelists were quite honest about the process and the occasional struggles. Ryan and I knew it was successful when we had those unfiltered moments. Here are a few tweets from the panel.

Our guest speaker from the writing center had some really good advice about the writing process:

Ryan talked about copyright and fair use, and he filled his talk with many pieces of useful information:

I’ll post a recap of this course at the end of the semester. In the meantime, please share advice and observations from similar courses you have taken so I can revise and improve the experience.



introducing Badger Bracketology, a tool for forecasting the NCAA football playoff

Today I am introducing Badger Bracketology:
http://bracketology.engr.wisc.edu/

I have long been interested in football analytics, and I enjoy crunching numbers while watching the games. This year is the first season of the NCAA football playoff, in which four teams will play to determine the national champion. It’s a small bracket, but it’s a step in the right direction.

The first step to becoming the national champion is to make the playoff. To do so, a team must be one of the top four ranked teams at the end of the season. A selection committee manually ranks the teams, and its members are given a slew of information and other rankings to inform their decisions.

I wanted to see if I could forecast the playoff ahead of time by simulating the rest of the season rather than waiting until all of the season’s games have been played. Plus, it’s a fun project that I can share with my undergraduate simulation course that I teach in the spring.

Here is how my simulation model works. The most critical part is the ranking method, which uses completed game results to rate and then rank the teams so that I can forecast who the top four teams will be at the end of the season. I need to do this solely using math (no humans in the loop!) in each of 10,000 replications. I start with the outcomes of the games played so far, using at least 8 weeks of data, and compute a rating for each team that I then rank. The ranking methodology uses a connectivity matrix based on Google’s PageRank algorithm (similar to a Markov chain). So far, I’ve considered three variants of this model that take various bits of information into account, like who a team beats, who it loses to, and the additional value provided by home wins. I used data from the 2012 and 2013 seasons to tune the parameters needed for the models.
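The tuned models on the site have more moving parts, but the PageRank flavor is easy to illustrate: treat each loss as the loser casting a “vote” for the winner, build a connectivity matrix, and take the stationary distribution as the ratings. This sketch is a generic illustration under those assumptions, with made-up game results, not the actual Badger Bracketology model:

```python
import numpy as np

# Hypothetical game results: (winner, loser) pairs from the season so far.
games = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"), ("A", "D")]
teams = sorted({t for g in games for t in g})
idx = {t: i for i, t in enumerate(teams)}
n = len(teams)

# Connectivity matrix: each loss casts a "vote" from the loser to the winner.
votes = np.zeros((n, n))
for winner, loser in games:
    votes[idx[loser], idx[winner]] += 1.0

# Row-normalize into a transition matrix, with a damping factor as in PageRank
# so the sparse, short-season matrix still yields a well-defined chain.
damping = 0.85
row_sums = votes.sum(axis=1, keepdims=True)
P = np.divide(votes, row_sums, out=np.full((n, n), 1.0 / n), where=row_sums > 0)
P = damping * P + (1 - damping) / n

# Power iteration for the stationary distribution = team ratings.
r = np.full(n, 1.0 / n)
for _ in range(100):
    r = r @ P

ranking = sorted(zip(teams, r), key=lambda pair: -pair[1])
print(ranking)
```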

The ratings, along with the impact of home-field advantage, are then used to determine a win probability for each game. From previous years, we found that the home team won 56.9% of games later in the season (week 9 or later), which translates into an extra boost in win probability of ~6.9% for home teams. This is important since there are home/away games as well as games on neutral sites, and we need to take this into account. The simulation selects winners in the next week of games by essentially flipping a biased coin. The teams are then re-ranked after each week of simulated game outcomes, and this is repeated until the end of the regular season. Finally, I identify and simulate the conference championship games (the only games not scheduled in advance), which yields the final ranking. Go here for more details.
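The post doesn’t spell out the win-probability formula, so the sketch below assumes a simple rating-ratio model with the ~6.9% home boost added on (clipped so probabilities stay valid); the weekly simulation then flips the biased coin for each game, and in a full replication the results feed back into the ratings before the next week is simulated:

```python
import random

HOME_BOOST = 0.069  # extra home-team win probability (from the 56.9% figure above)

def win_probability(rating_home, rating_away, neutral_site=False):
    """Assumed model: share of the two teams' ratings, plus a home-field boost."""
    p = rating_home / (rating_home + rating_away)
    if not neutral_site:
        p += HOME_BOOST
    return min(max(p, 0.01), 0.99)  # keep the probability valid

def simulate_week(schedule, ratings):
    """Flip a biased coin for each (home, away, neutral) game; return (winner, loser) pairs."""
    results = []
    for home, away, neutral in schedule:
        p = win_probability(ratings[home], ratings[away], neutral)
        if random.random() < p:
            results.append((home, away))  # home team wins
        else:
            results.append((away, home))
    return results

# In a full replication, the simulated results are appended to the season's games,
# the PageRank-style ratings are recomputed, and the next week is simulated,
# repeating through the conference championships to produce a final ranking.
```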

There are many methods for predicting the outcome of a game in advance. Most of the sophisticated methods use additional information that we could not expect to obtain weeks ahead of time (like the point spread, point outcomes, yards allowed, etc.). Additionally, some of the methods simply return win probabilities and cannot be used to identify the top four teams at the end of the season. My method is simple, but it gives us everything we need without being so complex that I would be suspicious of overfitting. The college football season is pretty short, so our matrix is really sparse. At present, teams have played 8 weeks of football in sum, but many teams have played just 6-7 games. Additional information could be used to help make better predictions, and I hope to further refine and improve the model in coming years. Suggestions for improving the model will be well-received.

Our results for our first week of predictions are here. Check back each week for more predictions.

Badger Bracketology: http://bracketology.engr.wisc.edu/

Our Twitter handle is @badgerbrackets.

Your thoughts and feedback are welcome!




what I learned from preparing for a semi-plenary talk

I recently blogged about a semi-plenary talk I gave at the German OR Society Conference. This post is about the process of preparing for that presentation.

First I thought about the story I wanted to tell. I’ve given a lot of research talks before and understand their general plot, but a semi-plenary is not a regular research talk, and I wasn’t initially sure how to tell the story in a new way. I asked a wise colleague, whose advice was excellent:

  1. Think about your favorite plenary talks. Model your talk after them (including the amount of math to include in the talk).
  2. Think of the talk as a series of 30 second elevator talks. Let those messages structure your story.
  3. Your audience will want to feel that they’ve learned something. What are the takeaways?

I found that creating an initial set of slides wasn’t so bad once I decided on the story I wanted to tell. I have given so many talks before that I had a huge set of slides to pull from. I had too many slides to fit into the time slot, and editing and pruning them was pure torture.

A few months ago, I read a post by an academic blogger who had recently given a plenary talk. I can’t find the post now, but I remember that it took the author about 40 hours to create a one-hour talk. This reminded me of an earlier post on teaching MOOCs (How college is like choosing between going to the movies and Netflix), where an enormous amount of time goes into a single lecture.

Here is why it took so long. Every time I removed a slide or combined a few slides into one, it affected the narrative in a major way. In a regular research talk, I find it easy to pick a few details to leave out. Not the case this time. Rather than condense the story, I eventually left some topics out altogether or turned the insights from a paper into a couple of bullet points on a slide. Finding the right balance of detail and insight was a constant challenge.

I ended up having almost no math in my talk. I decided that insights were more important than going through technical details.

I recreated almost all of the visuals from my slides in previous talks. It’s not that my visuals were total crap; it’s just that there was too much detail and notation in the figures I had made for research talks, and I didn’t want confusing visuals getting in the way of the story. Sometimes I added a picture to illustrate a technical idea or insight rather than launching into a long narrative to explain a simple point. Here is an example of a new visual explaining the concept of ambulance response times and coverage:

Example of a conceptual slide I used in my talk.


Other times I just needed a simpler version of a figure or table that let me look at a single curve or compare two things, instead of the busier figure that works in a regular research talk. At one point, I reduced a figure with four subfigures to a single figure by keeping one subfigure and omitting the other three. I make nearly all of my figures with Matlab and save my code so that I can easily recreate figures for presentations or paper revisions. Remaking any one figure wasn’t too taxing, but remaking a lot of figures took some time.

Finally, I learned so much about my research from giving this talk. The end of my talk answered two questions:

  1. Where is emergency medical service research in OR going?
  2. Where does emergency medical service research in OR need to go?

I think about high-level issues all the time (after all, I frequently write proposals!). But this was different: I was talking about where this entire line of research is going, not just my own. While answering the question “Where does emergency medical service research in OR need to go?” as I made my slides, I learned that my research had already made progress in the right direction. Still, not all of my ideas are in line with where this line of research needs to go, and it was worthwhile to realign my priorities.

Related posts:

  1. Do you have a 30 second elevator talk about your research?
  2. The most important 30 seconds of your dissertation defense