Many of you have seen The Birthday Problem: Given a group of n people, what is the probability that at least two of them share a birthday?
Here, we are only concerned with the day and month of birth (not the year). The solution assumes that a person is equally likely to be born on any of the 365 days in the year, thus ignoring leap years.
Let P(n) = the probability that at least two people share a birthday in a group of n people and let Q(n) = the probability that everyone has a unique birthday. There are 365^n ways for n people to be born on any of the 365 days, and 365*364*…*(365-n+1) of those give everyone a different birthday. Then
P(n) = 1 - Q(n) = 1 - (365*364*…*(365-n+1))/365^n.
Values of P(n) for selected n:
P(2) = 0.0027
P(5) = 0.0271
P(10) = 0.1169
P(20) = 0.4114
P(30) = 0.7063
P(40) = 0.8912
P(50) = 0.9704
P(60) = 0.9941, so in a room with 60 people you are almost certain to have at least two people who share a birthday!
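If you want to reproduce these numbers, here is a minimal Python sketch of the formula above. The function name `p_shared` is my own choice for illustration, and the code assumes the same 365 equally likely days.

```python
def p_shared(n, days=365):
    """Probability that at least two of n people share a birthday,
    assuming every one of `days` days is equally likely."""
    q = 1.0  # running value of Q(n): probability that all birthdays are distinct
    for k in range(n):
        q *= (days - k) / days  # person k+1 must avoid the k days already taken
    return 1.0 - q

for n in (2, 5, 10, 20, 23, 30, 40, 50, 60):
    print(f"P({n}) = {p_shared(n):.4f}")
```

Multiplying the ratios one at a time avoids forming 365^n directly, which gets very large very quickly.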
The key assumption is that all birth dates are equally likely. This NPR article shows that humans have a “mating season” that makes July – September birthdays more likely. I posted the image below.
This will, of course, change our answer above. The probabilities depend on who is in the room. Have you simulated the Birthday Problem with an unequal birthday distribution? If so, please shed light on realistic numbers for P(n).
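As a starting point, here is a rough Monte Carlo sketch in Python that accepts any birthday distribution. To be clear about assumptions: the `skewed` weights are invented purely for illustration (a 10% bump for days that roughly cover July through September) and are not taken from real birth data, and the names `simulate_p`, `uniform`, and `skewed` are my own.

```python
import random
from itertools import accumulate

def simulate_p(n, weights, trials=20_000, seed=0):
    """Estimate P(n) by simulation, drawing day d with probability
    proportional to weights[d]."""
    rng = random.Random(seed)
    days = range(len(weights))
    cum = list(accumulate(weights))  # cumulative weights, computed once
    hits = 0
    for _ in range(trials):
        birthdays = rng.choices(days, cum_weights=cum, k=n)
        if len(set(birthdays)) < n:  # a repeated day means a shared birthday
            hits += 1
    return hits / trials

uniform = [1.0] * 365
# ASSUMPTION: invented weights, not real birth data.
# Days 181-272 (roughly July-September) are made 10% more likely.
skewed = [1.1 if 181 <= d <= 272 else 1.0 for d in range(365)]

for n in (10, 23, 40):
    print(n, simulate_p(n, uniform), simulate_p(n, skewed))
```

With 20,000 trials the estimates carry sampling noise of a few tenths of a percentage point, so small gaps between the two columns should not be over-read. Swapping in actual daily birth frequencies for `skewed` would give realistic numbers.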
On a side note, the image below suggests that babies are induced on December 27-30 for a tax break. I’m not sure how I feel about that.
May 16th, 2012 at 2:25 pm
That’s really interesting to see. I’d like to see the distribution over the years as well, for example before inductions became as common and before more advanced birth control. With Will I was “encouraged” to consider induction so he wouldn’t be born while the doctors were on their Thanksgiving holidays, but he came on his own a week before.
May 16th, 2012 at 3:28 pm
This heat map is based on rankings (http://www.nytimes.com/2006/12/19/business/20leonhardt-table.html), not the actual frequencies, which would make for a nice sim.
May 16th, 2012 at 4:01 pm
Apparently, Dec 25th is the least common birthdate in the U.S., followed by Jan 1st. (Ok, actually Feb 29th is the least common one.) – http://www.nytimes.com/2006/12/19/business/20leonhardt-table.html?_r=1
But: http://minnesota.publicradio.org/display/web/2009/12/29/january-1-birthdays/ (not to mention: it’s probably the most common birthdate used in online service registrations.)
May 16th, 2012 at 4:08 pm
Using the frequency data referenced here (http://www.panix.com/~murphy/bday.html), I found no significant difference from the theoretical value (assuming uniformity) for P(23) = 0.507. I just presented this as a teaser on Mon night to kick off the summer term of an MBA course; I’ll hit them up with this “update” in an hour.
May 16th, 2012 at 4:14 pm
It is an undergraduate exercise to show that if one date has probability 1/365 + delta and another 1/365 - delta, with all other dates having probabilities 1/365, then the probability that there is at least one match in a group of n is greater than if all probabilities are equal. By extension, if the probabilities are unequal, then the probability of a match in a group of n is greater than if all probabilities are equal. The extreme case is, of course, everyone being born on the same date!
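The two-date claim can be checked numerically. The idea: Q(n) is a sum over sets of n distinct dates, the sets containing exactly one of the two shifted dates cancel out in total, and the sets containing both see the product of those two probabilities drop from 1/365^2 to 1/365^2 - delta^2. The Python sketch below is a numerical check under stated assumptions, not a proof: `p_match` is a hypothetical helper name, and `delta = 0.001` is an arbitrary illustrative value.

```python
from math import factorial

def p_match(n, probs):
    """Exact probability of at least one shared birthday among n people,
    where probs[d] is the probability of being born on day d."""
    # e[k] = elementary symmetric polynomial of degree k in the probabilities seen so far
    e = [1.0] + [0.0] * n
    for p in probs:
        for k in range(n, 0, -1):  # go downward so each p is used at most once per term
            e[k] += p * e[k - 1]
    # n! * e[n] is the probability that all n birthdays are distinct
    return 1.0 - factorial(n) * e[n]

uniform = [1 / 365] * 365
delta = 0.001  # ASSUMPTION: arbitrary illustrative value; must stay below 1/365
perturbed = uniform.copy()
perturbed[0] += delta
perturbed[1] -= delta

for n in (10, 23, 40):
    print(n, p_match(n, uniform), p_match(n, perturbed))
```

For each n printed, the second probability should come out slightly larger than the first.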
May 16th, 2012 at 4:54 pm
Matforddavid – proof or it didn’t happen
May 16th, 2012 at 8:58 pm
Ran a quick and dirty Monte Carlo simulation in MATLAB. Here’s what I got.
Here’s the code I ran. Not the best, but it’ll do:
May 17th, 2012 at 6:56 am
Could babies be induced to avoid being born in the “wrong” Chinese year? I am told some animals are good to be born under and some, like the pig, are bad.
Is there a bump caused by Valentine’s Day? Would you expect the Super Bowl to cause an increase or a drop?
There is a chapter in Wiseman’s book Quirkology where he talks about the parents of churchmen faking their birthday to be on December 25th.
May 17th, 2012 at 4:42 pm
Regarding your conjecture about tax breaks, I’d be more inclined to suspect it’s a backlog of planned C-sections due to doctors taking off the preceding few days (very low frequency compared to neighboring points, no doubt a holiday “seasonal” effect).
May 17th, 2012 at 5:09 pm
“The probabilities depend on who is in the room.” One other issue is that some groups are more likely to share birthdays than average. Professional sportspeople tend to be born at a time of year that makes them old for their age group in junior games, so they are at the older end of age 10 in the under-11s. This means they are usually born in the first three months of the year.
Kary Mullis, in his support for astrology, says “A recent scientific study of the distribution of medical students in birth months discovered that a lot of medical students were born in late June.” http://www.crawfordperspectives.com/documents/IAMACAPRICORN_000.pdf
I don’t know the paper, but it implies that there could be many professions that cluster by birth date.
June 7th, 2012 at 7:00 am
A “follow-up” on your post [by David Smith of Revolution Analytics]:
http://blog.revolutionanalytics.com/2012/06/simulating-the-birthday-problem-with-data-derived-probabilities.html
June 9th, 2012 at 3:29 pm
I coincidentally did a simulation the other day. The answer is essentially unchanged, but for medium groups (10-50 people) reality seems to be very slightly favored, to the tune of 0.15% more likely to find a match. My simulation: The CDC data includes birth day for 1969 to 1988.
I actually did a very similar simulation a few days ago using that full data and found that they were nearly identical. Slightly more likely in reality than the simulation, but only 0.14% more likely at n=23 (and n=23 is still the minimum group size necessary for >= 50%).
Post: http://chmullig.com/2012/06/births-by-day-of-year