The Radix Heap

The binary heap is a well-known data structure that implements a priority queue with O(log n) operations. It is quite efficient because it can be stored in an array. However, if you accept a few restrictions, it is possible to do better from a caching point of view. The restrictions you must accept are (a) that all the keys in the heap are integers and (b) that you never insert an item that is smaller than the most recently extracted item. These restrictions are not that severe. The resulting radix heap still works in many algorithms that use heaps as a subroutine: Dijkstra's shortest-path algorithm, Prim's minimum spanning tree algorithm, and various sweepline algorithms in computational geometry.

Here is how it works. We'll assume that the keys are 32-bit integers. The radix heap has 33 buckets of items:

      0     1     2     3     4     5     6     7     8           31    32
    [   ] [   ] [   ] [   ] [   ] [   ] [   ] [   ] [   ]  ...  [   ] [   ]

each one containing a list of keys. We also maintain one global value, last_deleted, which is initially MIN_INT and otherwise contains the last value deleted from the queue. The invariant is as follows:

    Bucket k contains items where bit k - 1 is the highest bit that is
    different from last_deleted. The items in bucket 0 are equal to
    last_deleted.

That means that for an item in bucket 10, bit 9 is different from last_deleted, bits 0 to 8 are possibly different, and bits 10 to 31 are equal. Items stored in bucket 32 are different in bit 31 and possibly in any other bit.

Operations

When a new item is inserted, it has to be added to the correct bucket. How can we compute the bucket number? We have to find the highest bit where the new item differs from last_deleted. This is easily done by XOR'ing them together and then finding the highest set bit. Adding one gives the bucket number. Inserting clearly preserves the invariant, because last_deleted didn't change and the new item will be in the correct bucket.

Extracting the minimum involves first finding the minimal item, by walking the lowest-numbered non-empty bucket and locating the smallest item in it. That item is then deleted and last_deleted is updated. Then the bucket is walked again and each remaining item is reinserted into a new bucket according to the new last_deleted.

We have to be sure this doesn't violate the invariant. First note that all the items in the bucket being redistributed will satisfy the invariant, because they are simply being reinserted. Second, an item in a bucket with a higher number c will still differ from the new last_deleted in bit c - 1: if it didn't, the new last_deleted would itself have belonged in bucket c. Finally, there are no non-empty buckets with a lower number than the one being redistributed.
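To make the operations concrete, here is a minimal sketch in C. It is an illustration of the scheme above, not a tuned implementation: it assumes unsigned 32-bit keys, so last_deleted starts at 0 rather than MIN_INT (the signed variant is analogous); it uses the GCC/Clang builtin __builtin_clz to find the highest set bit (the bsr instruction mentioned below); and the names radix_heap_t, bucket_push and bucket_index are invented for this sketch.

    #include <assert.h>
    #include <stdint.h>
    #include <stdlib.h>

    #define NBUCKETS 33

    /* One bucket: a dynamically growing array of keys. */
    typedef struct {
        uint32_t *data;
        size_t len, cap;
    } bucket_t;

    typedef struct {
        bucket_t buckets[NBUCKETS];
        uint32_t last_deleted;    /* 0 initially, since keys are unsigned */
        size_t count;
    } radix_heap_t;

    static void bucket_push(bucket_t *b, uint32_t key)
    {
        if (b->len == b->cap) {
            b->cap = b->cap ? 2 * b->cap : 16;
            b->data = realloc(b->data, b->cap * sizeof *b->data);
            assert(b->data);
        }
        b->data[b->len++] = key;
    }

    /* Highest bit where key differs from last_deleted, plus one;
     * bucket 0 if key is equal to last_deleted. */
    static int bucket_index(const radix_heap_t *h, uint32_t key)
    {
        uint32_t diff = key ^ h->last_deleted;
        return diff ? 32 - __builtin_clz(diff) : 0;
    }

    void radix_heap_insert(radix_heap_t *h, uint32_t key)
    {
        assert(key >= h->last_deleted);   /* restriction (b) */
        bucket_push(&h->buckets[bucket_index(h, key)], key);
        h->count++;
    }

    uint32_t radix_heap_extract_min(radix_heap_t *h)
    {
        assert(h->count > 0);
        h->count--;

        int k = 0;
        while (h->buckets[k].len == 0)    /* lowest non-empty bucket */
            k++;
        bucket_t *b = &h->buckets[k];

        if (k == 0)                       /* bucket 0: all equal last_deleted */
            return b->data[--b->len];

        size_t min_i = 0;                 /* find the minimal item */
        for (size_t i = 1; i < b->len; i++)
            if (b->data[i] < b->data[min_i])
                min_i = i;

        uint32_t min = b->data[min_i];
        b->data[min_i] = b->data[--b->len];   /* delete it */
        h->last_deleted = min;

        /* Redistribute: every remaining item now agrees with the new
         * last_deleted in bit k - 1 and above, so each one moves to a
         * strictly lower-numbered bucket. */
        size_t n = b->len;
        b->len = 0;
        for (size_t i = 0; i < n; i++)
            bucket_push(&h->buckets[bucket_index(h, b->data[i])], b->data[i]);

        return min;
    }

Note that the buckets here are plain dynamically growing arrays; why that matters is the subject of the caching section below.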
Performance

The time complexity of insertion is O(1), since all that is required is appending the new item to a list. (We also have to determine the highest set bit, but that can be done in constant time with an instruction such as bsr.)

The performance of extraction is dominated by the redistribution of items. When a bucket is redistributed, it ends up empty. To see why, remember that all the items in bucket k are different from last_deleted in bit k - 1. Because the new last_deleted comes from bucket k, the items are now all *equal* to last_deleted in bit k - 1. Hence they will all be redistributed to lower-numbered buckets.

Now consider the life cycle of a single element. In the worst case it starts out being added to bucket 32. Each redistribution then moves the element to a lower-numbered bucket, and when it reaches bucket 0, it is next in line for extraction. Since a redistribution takes constant time per element distributed, the total amount of redistribution a single element experiences is proportional to the number of bits in the keys. In other words, in the worst case the time complexity of extraction is O(d), where d is the number of bits in the integer keys. In practice the complexity will often be better, since most items will not move through all the buckets.

Caching performance

Some descriptions of the radix heap recommend implementing the buckets as doubly linked lists. That is a big mistake, because the cache locality will be terrible. Instead, implement them as dynamically growing arrays. If you do that, the tops of the buckets will tend to be hot in the cache, which means the per-item time to redistribute a bucket will be more like O(1/B), where B is the number of integers in a cache line. The amortized complexity of extraction will then be closer to O(d/B) than O(d).

In a regular binary heap, both insertion and extraction require O(log n) swaps, and each swap (except those very close to the top of the heap) will cause a cache miss. In other words, if d = Theta(log n), extraction from a radix heap will cause O(log n / B) cache misses, whereas extraction from a binary heap will cause O(log n).
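As a small usage example, here is a test driver that can be appended to the sketch above; the values are made up, and the comments show the expected output. It also illustrates restriction (b): inserting 9 after extracting 7 is fine even though 9 is smaller than items still in the heap; only inserting below last_deleted is forbidden.

    #include <stdio.h>

    int main(void)
    {
        radix_heap_t h = { 0 };   /* empty buckets, last_deleted = 0 */

        radix_heap_insert(&h, 13);
        radix_heap_insert(&h, 7);
        radix_heap_insert(&h, 42);

        printf("%u\n", radix_heap_extract_min(&h));   /* prints 7 */

        /* Allowed: 9 >= last_deleted (7), even though 13 and 42
         * are still in the heap. */
        radix_heap_insert(&h, 9);

        printf("%u\n", radix_heap_extract_min(&h));   /* prints 9 */
        printf("%u\n", radix_heap_extract_min(&h));   /* prints 13 */
        printf("%u\n", radix_heap_extract_min(&h));   /* prints 42 */

        return 0;
    }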