diff options
author | Søren Sandmann Pedersen <ssp@redhat.com> | 2013-02-24 15:26:32 -0500 |
---|---|---|
committer | Søren Sandmann Pedersen <ssp@redhat.com> | 2013-02-24 15:26:32 -0500 |
commit | 999709deb90102991222863677625cd885a5c182 (patch) | |
tree | 4c7becada456c800b6f857907d47c1401a178ec9 |
checkin
-rw-r--r-- | celebrities | 75 | ||||
-rw-r--r-- | clusters.c | 52 |
2 files changed, 127 insertions, 0 deletions
diff --git a/celebrities b/celebrities new file mode 100644 index 0000000..7827706 --- /dev/null +++ b/celebrities @@ -0,0 +1,75 @@ +Celebrities die 2.7218 at a time + +The claim that celebrities die in threes is usually dismissed as the +result of the human propensity to see patterns where there are +none. But celebrities don't die at regularly spaced intervals +either. It would be very weird if celebrities predictably dies on the +1st of every month. And once you deviate from a regularly spaced +pattern, some amount of clustering is inevitable. Can we make this +more precise? + +Rather than trying to define exactly what constitutes a celebrity, I +will simply assume that they die at a fixed rate and that they do so +independently of each other (the day music died notwithstanding). In +other words, it is a Poisson process with intensity \(\lambda\) where +\(\lambda\) is the number of deaths that occur in some fixed time +period. In a Poisson process the time between events is exponentially +distributed with parameter \(\lambda\). The average waiting time is +\(1/\lambda\). + +As an example, suppose we define celebrityhood in such a way that +twelve celebrities die each year. Then \lambda is 12/year, and the +average time between two deaths will be 1/(12/year) = 1/12th year, or +1 month. + +What does it mean for celebrities to die \(n\) at a time? We will +simply say that two celebrities die together if the period between +them is shorter than expected. If the celebrity death rate is 12/year, +then two celebrities died together if their deaths were less than a +month apart. Similarly, we will say that three celebrities died +together if both the period between death 1 and death 2, and between +death 2 and death 3 was shorter than a month. In general, \(k\) +celebrities died together if the \(k - 1\) periods between them were +*all* shorter than expected. + +Here is a diagram of 10 years worth of randomly generated deaths with +12 deaths per year and clusters highlighted: + + [ picture of clusters ] + +Average cluster size + +Suppose a celebrity has just died after a longer than average +wait. This death will start a new cluster, and we want to figure out +what the size of it is. We can model the cluster size as a +stochastical variable \(C\) and figure out its distribution. + +The cluster size will be 1 when the waiting time for the next death is +larger than or equal to the average. Plugging this into the cumulative +distribution function for the exponential distribution, we get: + + P(C = 1) = P(W > 1/lambda) = 1 - (1 - e^-lambda * (1/lambda)) = e^-1 = 0.3679 + +The probability that the death will be part a cluster of size 2 is the +probability that the next waiting time is shorter than average and the +next one after that is longer: + + P(C = 2) = P(W <= 1/lambda) * P(W > 1/lambda) = (1 - e^-1) * e^-1 = 0.2325 + +For size three, it's the probability that the next two waiting times +are shorter and the third one longer: + + P(C = 3) = P(W <= 1/lambda)^2 * P(W > 1/lambda) = (1 - e^-1)^2 * e^-1 = 0.1470 + +In general, the probability that a celebrity death will be part of +cluster of size \(k\) is: + + P(C = k) = P(W <= 1/lambda)^(k - 1) * P(W > 1/lambda) = (1 - e^-1)^(k-1)*e^-1 + +So what's the average size of a Celebrity Death Cluster? The expected +value of \(C\) is given by: + + \E[C] = \sum_{k=1}^\infty k * P(C = k) = 1/e * \sum_{k=1}^\infty k * (1 - 1/e)^(k - 1) * e^-1 + +It's not terribly hard to show that this infinite series has sum +\(e\), so on average, celebrities die 2.7218 at a time. diff --git a/clusters.c b/clusters.c new file mode 100644 index 0000000..f919db8 --- /dev/null +++ b/clusters.c @@ -0,0 +1,52 @@ +#include <stdlib.h> +#include <math.h> +#include <stdio.h> + +static double +gen_exp (double lambda) +{ + double u = drand48 (); + + return log (1 - u) / (-lambda); +} + +int +main () +{ + int cluster = 1; + int i; + double avg_cluster = 0.0; + double avg_x; + int n_k = 0; + int smaller = 0; + int n_clusters = 0; + +#define N 187250000.0 +#define LAMBDA 2.0 +#define K 1 + + for (i = 0; i < N; ++i) + { + double f; + f = gen_exp (LAMBDA); + if (f <= (1/(LAMBDA))) + { + cluster++; + smaller++; + } + else + { + avg_cluster += cluster + 1; + if (cluster == K) + n_k++; + cluster = 0; + n_clusters++; + } + avg_x += f; + } + + printf ("Probability of %d: %f\n", K, n_k / N); + printf ("Average cluster size: %f\n", avg_cluster / ((double)n_clusters)); + printf ("Average value: %f\n", avg_x / N); + printf ("Probability of smaller: %f\n", smaller / N); +} |