In some Olympic sports, such as diving, the athlete receives scores based on several trials. In diving, each trial is a separate dive. A diver’s score is the sum of the different dive scores in different trials. What is the best way to maximize the chance of getting a medal?

Women’s springboard diving rules (link):

- Women must complete five dives.
- There is no limit on the total degree of difficulty for these dives.
- At least one dive during the contest must come from each of five different categories – forward, back, reverse, inward, and twisting.
- No dive can be repeated in a list of dives.
- Divers must select dives ahead of time and cannot change the order

Men’s platform diving rules (link):

- Men must complete six dives.
- There is no limit on total degree of difficulty for these dives.
- For the men, at least one dive during the contest must come from each of six different categories – forward, back, reverse, inward, twisting and armstand.
- No category can be repeated in a list of dives.
- All dives must be competed from the 10-meter platform.
- Divers must select dives ahead of time and cannot change the order

Let’s make three assumptions:

- Divers can select their dives and their order
*on the fly*(clearly not true, but makes it more interesting). - A diver’s performance on each dive is independent of his/her performance in other dives.
- From experience, divers know the distribution of points based on each of their dives.

**Model 1: Divers know the “threshold” for medaling ahead of time (dubious!)**

Let’s solve this problem using a Markov decision process, where the stage here is the dive (1-5 for women or 1-6 for men).

Let

Vt(S(t)) = value of being in state S(t), where S(t) = the total number of points.

and let

M = point threshold for medaling.

The rewards are Rt(S(t-1),P) = 1 if the diver moves from state S(t-1)<M to S(t)>=M (i.e., the diver moves into medal contention if the P points from the dive at time t-1 moves them above threshold M). All other rewards are zero. The diver wants to maximize the probability of medaling, which is equivalent to maximizing the total value (V1(0)) or the expected value of the 0-1 indicator variable indicating whether the diver medals or not.

Let the set of dives in each category be captured by D1, D2,…,D5 for women or D1, D2,…,D6 for men. For each d in Di, there are known probability distributions for the point totals P.

Now the Bellman equations are

Vt(S(t)) = max_{d in Dt} E{Rt(S(t),P) + V[t+1](S(t)+P)}

Here, the expectation is taken over the points distribution for dive d in Dt. The probability of medaling is found by V1(0), the value before dive 1 starting with 0 points. The boundary conditions are V[T+1](S(T+1)) = 0 for all values of S(T+1) since rewards are accumulated once.

The optimal policy indicates what dive should be chosen in each trial (MDP stage) based on the total number of points that have been accumulated thus far. If a diver is successful with tough dives early on, the diver can choose easier dives later on (and vice versa).

**Model 2: Divers do not know the “threshold” for medaling ahead of time so they maximize the total number of points**

The model dynamics here are identical to those of the model above except for the rewards. The random rewards Rt(S(t-1),P) = P(d,t) if the diver gets P points for their dive d. The rest is the same yielding Bellman equations of

Vt(S(t)) = max_{d in Dt} E{

P(d,t)+ V[t+1](S(t)+P(d))}

They look the same as above, but the rewards in boldface are different. The expected number of points is captured by V1(0) and the boundary conditions are V[T+1](S(T+1)) = 0 for all values of S(T+1) as before.

Here, the diver doesn’t really need to solve an MDP. He or she can simply select the dive that yields the most points (on average) in each category, since the choices will not depend on the number of points accumulated thus far. Let **EP(d,t)** denote the average points from dive d. Then, the policies depend on the stage t rather than the full state variable S(t), yielding Bellman equations of

V[t] = max_{d in Dt} (EP(d,t) + V[t+1]).

The expected number of points is captured by V(1) and the boundary condition is V[T+1] = 0. Note that we lose the expectation here. The optimal solution isn’t rocket science: it is to select the dive with the largest EP(d,t) for each t.

**Why these models are different**

On face value, it may not be obvious why these models could yield different solutions. They are both used to identify dives that yield many points. The second model maximizes the *expected number of points* whereas the first model maximizes the *point total distribution above a threshold* (which can be thought of as moving as much of the tail of the distribution past a fixed threshold M). The first model could lead to “riskier” dive strategies for a diver who is not a favorite to win: the diver has a chance of being on the podium but could also go down in flames. For the gold medal favorite, the first strategy might lead to a conservative strategy that weeds out all of the dives that have a chance of a disastrous result. The second model leads to more conservative dive selections that yield more points (on average) but that might yield the most points but would almost certainly not lead to a medal.

**Lift the first assumption: divers must make their selections ahead of time**

If we lift the first assumption, the choice of dives cannot depend on the point total thus far to identify the best set of dives regardless of what happens with the earlier dives.

The answer to the second model is still obvious. The optimal solution is never change dives on the fly, even when one has that choice.

The first model, however, can be examined by looking at the joint probability distribution in the points that would be expected.

Let the decision variables be captured by

x(d,t) = 1 if dive d is selected in stage t and 0 otherwise.

Let P(d,t)

Then our stochastic optimization model is

max Pr( P(1,1)x(1,1) + P(1,2)x(1,2)+…+P(|D1|,1)x(|D1|,1)+…+P(|D5|,5)(|D5|,5) > M )

subject to x(1,t)+x(2,t)+…+x(|Dt|,t) = 1 for all t=1,2,3,4,5

x(d,t) \in {0,1} for t=1,2,…,|Dt|, t=1,2,3,4,5.

I’ll stop here because I’ve been up late watching the Olympics. I’ll leave the solution as an exercise for the reader. Leave feedback and corrections in the comments.

If you’ve ever dove from 10m, you rule. I jumped from 7m a few times and am unwilling to go any higher.