diff options
author | Søren Sandmann Pedersen <ssp@l3000.localdomain> | 2012-03-09 10:41:14 -0500 |
---|---|---|
committer | Søren Sandmann Pedersen <ssp@l3000.localdomain> | 2012-03-09 10:41:14 -0500 |
commit | a701c9f3f45c416a310350d68f21a61a4a782788 (patch) | |
tree | bb43bc833c9bda95d106c5f8d0f6a23ae1d09046 |
radixheap
-rw-r--r-- | radixheap | 72 |
1 files changed, 72 insertions, 0 deletions
diff --git a/radixheap b/radixheap new file mode 100644 index 0000000..e1d4596 --- /dev/null +++ b/radixheap @@ -0,0 +1,72 @@ +The radix heap + +The binary heap is a well-known data structure that allows a priority +queue to be implemented in time O(log n). It's quite efficient because +it can be done as an array. However, if you accept a few restricitons, +it's possible to do better from a caching point of view. The +restrictions that you must accept are (a) that all the keys in the +heap are integers and (b) that you can never insert a new item that is +smaller than all the other items in the heap. + +These restrictions are not that severe. The resulting heap still works +in many algorithms that use heaps as a subroutine: Dijkstra's +shortest-path algorithm, Prim's minimum spanning tree algorithm, +various sweepline algorithms in computation geometry. + +Here is how it works. We'll assume that the keys are 32 bit +integers. The radix heap has 33 buckets of items: + + + 0 1 2 3 4 5 6 7 8 31 32 + [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] ... [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] + x + x + x + +each one containing a list of keys. We also maintain one global value: +last_deleted, which is initially MIN_INT and otherwise contains the +last value deleted from the queue. + +The invariant is as follows: + + - bucket k contains items where bit k - 1 is different from + last_deleted, and where bits k to 31 are equal to last_deleted. + +That means the items in bucket 0 are equal to last_deleted. Items in +bucket 10 are different in the 9th bit, and possibly in bit 0 to +8. And items in bucket 32 are different in bit 31 and possibly any +other bit. + +When a new item is inserted, it has to be added to the correct +bucket. How can we compute the bucket number? We have to find the +highest bit where the two numbers differ. This is easily done by +XOR'ing them together, and then finding the highest set bit. Adding +one gives the bucket number. Inserting a new item is as simple as +adding it to the bucket. + +Extracting the minimum involves first finding the minimal item by +walking the lowest-numbered non-empty bucket and finding the minimal +item in that bucket. Then that item is deleted and last_deleted is +updated. Then the bucket is walked again and each item is moved to a +new bucket according to the new last_deleted. + +We have to be sure this doesn't violate the invariant. First note that +all the items in the bucket that is being redistributed will satisfy +the invariant because they are simply being reinserted. And a bucket +with a higher number c will still be different from the new +last_deleted in bit c - 1. If it weren't, the new last_deleted would +itself have belonged in bucket c. And there are no non-empty buckets +with a lower number than the one that is being redistributed. + +So + +items are different from the previous last_deleted at position k - +1. Since a bit can only be 0 or 1, + +the removed item has the same bit + +One reason the radix heap is more efficient than the binary heap is +that it has better caching behavior when implemented properly. Various +descriptions of this data structure recommend implementing the buckets +with doubly linked lists, but it's much better to implement them as +arrays. |