summaryrefslogtreecommitdiff
diff options
context:
space:
mode:
authorSøren Sandmann Pedersen <ssp@redhat.com>2013-02-24 15:26:32 -0500
committerSøren Sandmann Pedersen <ssp@redhat.com>2013-02-24 15:26:32 -0500
commit999709deb90102991222863677625cd885a5c182 (patch)
tree4c7becada456c800b6f857907d47c1401a178ec9
checkin
-rw-r--r--celebrities75
-rw-r--r--clusters.c52
2 files changed, 127 insertions, 0 deletions
diff --git a/celebrities b/celebrities
new file mode 100644
index 0000000..7827706
--- /dev/null
+++ b/celebrities
@@ -0,0 +1,75 @@
+Celebrities die 2.7218 at a time
+
+The claim that celebrities die in threes is usually dismissed as the
+result of the human propensity to see patterns where there are
+none. But celebrities don't die at regularly spaced intervals
+either. It would be very weird if celebrities predictably dies on the
+1st of every month. And once you deviate from a regularly spaced
+pattern, some amount of clustering is inevitable. Can we make this
+more precise?
+
+Rather than trying to define exactly what constitutes a celebrity, I
+will simply assume that they die at a fixed rate and that they do so
+independently of each other (the day music died notwithstanding). In
+other words, it is a Poisson process with intensity \(\lambda\) where
+\(\lambda\) is the number of deaths that occur in some fixed time
+period. In a Poisson process the time between events is exponentially
+distributed with parameter \(\lambda\). The average waiting time is
+\(1/\lambda\).
+
+As an example, suppose we define celebrityhood in such a way that
+twelve celebrities die each year. Then \lambda is 12/year, and the
+average time between two deaths will be 1/(12/year) = 1/12th year, or
+1 month.
+
+What does it mean for celebrities to die \(n\) at a time? We will
+simply say that two celebrities die together if the period between
+them is shorter than expected. If the celebrity death rate is 12/year,
+then two celebrities died together if their deaths were less than a
+month apart. Similarly, we will say that three celebrities died
+together if both the period between death 1 and death 2, and between
+death 2 and death 3 was shorter than a month. In general, \(k\)
+celebrities died together if the \(k - 1\) periods between them were
+*all* shorter than expected.
+
+Here is a diagram of 10 years worth of randomly generated deaths with
+12 deaths per year and clusters highlighted:
+
+ [ picture of clusters ]
+
+Average cluster size
+
+Suppose a celebrity has just died after a longer than average
+wait. This death will start a new cluster, and we want to figure out
+what the size of it is. We can model the cluster size as a
+stochastical variable \(C\) and figure out its distribution.
+
+The cluster size will be 1 when the waiting time for the next death is
+larger than or equal to the average. Plugging this into the cumulative
+distribution function for the exponential distribution, we get:
+
+ P(C = 1) = P(W > 1/lambda) = 1 - (1 - e^-lambda * (1/lambda)) = e^-1 = 0.3679
+
+The probability that the death will be part a cluster of size 2 is the
+probability that the next waiting time is shorter than average and the
+next one after that is longer:
+
+ P(C = 2) = P(W <= 1/lambda) * P(W > 1/lambda) = (1 - e^-1) * e^-1 = 0.2325
+
+For size three, it's the probability that the next two waiting times
+are shorter and the third one longer:
+
+ P(C = 3) = P(W <= 1/lambda)^2 * P(W > 1/lambda) = (1 - e^-1)^2 * e^-1 = 0.1470
+
+In general, the probability that a celebrity death will be part of
+cluster of size \(k\) is:
+
+ P(C = k) = P(W <= 1/lambda)^(k - 1) * P(W > 1/lambda) = (1 - e^-1)^(k-1)*e^-1
+
+So what's the average size of a Celebrity Death Cluster? The expected
+value of \(C\) is given by:
+
+ \E[C] = \sum_{k=1}^\infty k * P(C = k) = 1/e * \sum_{k=1}^\infty k * (1 - 1/e)^(k - 1) * e^-1
+
+It's not terribly hard to show that this infinite series has sum
+\(e\), so on average, celebrities die 2.7218 at a time.
diff --git a/clusters.c b/clusters.c
new file mode 100644
index 0000000..f919db8
--- /dev/null
+++ b/clusters.c
@@ -0,0 +1,52 @@
+#include <stdlib.h>
+#include <math.h>
+#include <stdio.h>
+
+static double
+gen_exp (double lambda)
+{
+ double u = drand48 ();
+
+ return log (1 - u) / (-lambda);
+}
+
+int
+main ()
+{
+ int cluster = 1;
+ int i;
+ double avg_cluster = 0.0;
+ double avg_x;
+ int n_k = 0;
+ int smaller = 0;
+ int n_clusters = 0;
+
+#define N 187250000.0
+#define LAMBDA 2.0
+#define K 1
+
+ for (i = 0; i < N; ++i)
+ {
+ double f;
+ f = gen_exp (LAMBDA);
+ if (f <= (1/(LAMBDA)))
+ {
+ cluster++;
+ smaller++;
+ }
+ else
+ {
+ avg_cluster += cluster + 1;
+ if (cluster == K)
+ n_k++;
+ cluster = 0;
+ n_clusters++;
+ }
+ avg_x += f;
+ }
+
+ printf ("Probability of %d: %f\n", K, n_k / N);
+ printf ("Average cluster size: %f\n", avg_cluster / ((double)n_clusters));
+ printf ("Average value: %f\n", avg_x / N);
+ printf ("Probability of smaller: %f\n", smaller / N);
+}