radixheap

author: Søren Sandmann Pedersen <ssp@l3000.localdomain> 2012-03-09 10:41:14 -0500
committer: Søren Sandmann Pedersen <ssp@l3000.localdomain> 2012-03-09 10:41:14 -0500
commit: a701c9f3f45c416a310350d68f21a61a4a782788 (patch)
tree: bb43bc833c9bda95d106c5f8d0f6a23ae1d09046
1 files changed, 72 insertions, 0 deletions
diff --git a/radixheap b/radixheap
new file mode 100644
index 0000000..e1d4596
--- /dev/null
+++ b/radixheap
@@ -0,0 +1,72 @@
+The radix heap
+
+The binary heap is a well-known data structure that allows a priority
+queue to be implemented in time O(log n). It's quite efficient because
+it can be done as an array. However, if you accept a few restricitons,
+it's possible to do better from a caching point of view. The
+restrictions that you must accept are (a) that all the keys in the
+heap are integers and (b) that you can never insert a new item that is
+smaller than all the other items in the heap.
+
+These restrictions are not that severe. The resulting heap still works
+in many algorithms that use heaps as a subroutine: Dijkstra's
+shortest-path algorithm, Prim's minimum spanning tree algorithm,
+various sweepline algorithms in computation geometry.
+
+Here is how it works. We'll assume that the keys are 32 bit
+integers. The radix heap has 33 buckets of items:
+
+
+   0   1   2   3   4   5   6   7   8                                  31  32
+  [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] ... [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ]
+       x
+       x
+       x
+
+each one containing a list of keys. We also maintain one global value:
+last_deleted, which is initially MIN_INT and otherwise contains the
+last value deleted from the queue.
+
+The invariant is as follows:
+
+    - bucket k contains items where bit k - 1 is different from
+      last_deleted, and where bits k to 31 are equal to last_deleted.
+
+That means the items in bucket 0 are equal to last_deleted. Items in
+bucket 10 are different in the 9th bit, and possibly in bit 0 to
+8. And items in bucket 32 are different in bit 31 and possibly any
+other bit.
+
+When a new item is inserted, it has to be added to the correct
+bucket. How can we compute the bucket number? We have to find the
+highest bit where the two numbers differ. This is easily done by
+XOR'ing them together, and then finding the highest set bit. Adding
+one gives the bucket number. Inserting a new item is as simple as
+adding it to the bucket.
+
+Extracting the minimum involves first finding the minimal item by
+walking the lowest-numbered non-empty bucket and finding the minimal
+item in that bucket. Then that item is deleted and last_deleted is
+updated.  Then the bucket is walked again and each item is moved to a
+new bucket according to the new last_deleted.
+
+We have to be sure this doesn't violate the invariant. First note that
+all the items in the bucket that is being redistributed will satisfy
+the invariant because they are simply being reinserted. And a bucket
+with a higher number c will still be different from the new
+last_deleted in bit c - 1. If it weren't, the new last_deleted would
+itself have belonged in bucket c. And there are no non-empty buckets
+with a lower number than the one that is being redistributed.
+
+So 
+
+items are different from the previous last_deleted at position k -
+1. Since a bit can only be 0 or 1, 
+
+the removed item has the same bit
+
+One reason the radix heap is more efficient than the binary heap is
+that it has better caching behavior when implemented properly. Various
+descriptions of this data structure recommend implementing the buckets
+with doubly linked lists, but it's much better to implement them as
+arrays.
author	Søren Sandmann Pedersen <ssp@l3000.localdomain>	2012-03-09 10:41:14 -0500
committer	Søren Sandmann Pedersen <ssp@l3000.localdomain>	2012-03-09 10:41:14 -0500
commit	a701c9f3f45c416a310350d68f21a61a4a782788 (patch)
tree	bb43bc833c9bda95d106c5f8d0f6a23ae1d09046