International Journal of Computer Mathematics
Vol. 00, No. 00, Month 200x, 1–12

RESEARCH ARTICLE

An in-place heapsort algorithm requiring n log n + n log* n − 0.546871n comparisons

Md. Mahbubul Hasan(a)†, Md. Shahjalal(b)‡ and M. Kaykobad(a)§

(a) CSE Department, Bangladesh University of Engineering and Technology; (b) Therap (BD) Ltd., Banani, Dhaka 1213

(Received 00 Month 200x; in final form 00 Month 200x)

In this paper we present an interesting modification to the heap structure that yields a better comparison-based in-place heapsort algorithm. The number of comparisons is shown to be bounded by n log n + n log* n − 0.546871n, which is 0.853129n + n log* n away from the theoretical lower bound of n log n − 1.44n.

Keywords: Algorithms; Data structure; Heapsort.

1. Introduction

The heapsort algorithm is one of the best sorting algorithms and was introduced by Williams in 1964. It achieves both worst-case and average-case time complexity of O(n log n). Lower bound theory asserts that any comparison-based sorting algorithm requires log(n!) comparisons, which is approximately n log n − 1.44n. There are many promising variants of the heapsort algorithm, such as MDR-Heapsort [10, 12], Generalized Heapsort [7, 11], Weak Heapsort [4], Bottom-Up Heapsort [15], Ultimate Heapsort [8] and Heapsort in 3-heaps [9]. Carlsson's variant of heapsort [1] does not need extra space and requires n log n + n log log n + 1.82n comparisons. Dutton showed that sorting using a Weak Heap requires at most n log n + 0.086013n comparisons, but it uses n extra bits [4]. In [5], Edelkamp and Stiegeler modified Dutton's weak heap and designed several heapsort algorithms requiring n log n − 0.9n comparisons but using an array of size n to store indices. Xiao Dong Wang and Ying Jie Wu [14] gave another algorithm bounding the number of comparisons to n log n − 0.788928n, but again using an additional array of size n to store indices. Bottom-Up Heapsort [15] requires 1.5n log n comparisons in the worst case.
The Ultimate Heapsort [8] is another variant of the heapsort algorithm, which requires fewer than n log n + 32n + O(log n) comparisons to sort n elements in place. In [6], Gonnet and Munro gave an algorithm to construct a heap of n elements that takes 1.625n comparisons in the worst case. In the same paper they gave another algorithm, subsequently improved by Carlsson [2], for extracting the maximum element in around log n + log* n + O(1) comparisons, where log* denotes the iterated logarithm. The algorithm presented by McDiarmid and Reed [10] requires 1.5212n comparisons on average for heap construction. But both of these algorithms require extra

† Email: firstname.lastname@example.org
‡ Email: email@example.com
§ Email: firstname.lastname@example.org

ISSN: 0020-7160 print / ISSN 1029-0265 online
© 200x Taylor & Francis
DOI: 10.1080/0020716YYxxxxxxxx
http://www.informaworld.com
space. Carlsson and Chen gave two algorithms for heap construction in [3]. The first achieves 1.625n + O(log² n) comparisons in the worst case and 1.5642n + O(log² n) on average, but requires extra space. The second is an in-place algorithm which requires 1.528n + O(log² n) comparisons on average and 2n comparisons in the worst case.

In this paper we introduce a new data structure that reduces the number of comparisons by pairing the elements to be sorted, and that does not use any extra space. Carlsson's variant of heapsort has been applied to this data structure, which restricts the number of comparisons to n log n + n log log n + 0.40821n. The data structure can be applied to other heap-related algorithms as well. For example, in this paper we show how the memory requirement of Dutton's Weak Heapsort algorithm can be reduced using this data structure. Moreover, the algorithms of Gonnet and Munro [6] or Carlsson [2], when applied to this data structure, yield better performance than the corresponding original algorithms, restricting the number of comparisons to n log n + n log* n − 0.546871n. While Ultimate Heapsort [8] is asymptotically faster than the proposed algorithm, ours will outperform it, since the linear term in Ultimate Heapsort (32n) is larger than n log* n for all practical purposes (even for very large values of n). A preliminary version of this paper appeared in WALCOM 2007 [13].

In Section 2 we present the modified data structure for heapsort. Section 3 contains the analysis, followed by a theoretical comparison between Carlsson's variant and the proposed algorithm in Section 4. In Section 5 we show how Gonnet and Munro's or Carlsson's deletion algorithm yields better performance on the proposed data structure, followed by a generalization of the data structure in Section 6. In Section 7 we apply our pairing trick to the Weak Heap. The practical performance of the proposed algorithm is presented in Section 8.

2. A modified data structure for heapsort

Independent pairwise comparisons reduce uncertainty the most. This fact encourages the creation of heaps with a group of ordered elements in each node. In the following we present our data structure with two elements in each node; in a later section we analyze the performance of a more generalized data structure to find the best number of elements to store in a node.

2.1 Preprocessing

For n given elements we make ⌊n/2⌋ pairs. If there is an odd number of elements, a dummy element can be suitably added. For simplicity, from now on we call the first element of a pair the Primary element and the second one the Secondary element. For each pair we compare the two elements and arrange them in non-increasing order; that is, after rearranging, the Primary element is greater than or equal to the Secondary element of the pair. Figure 1 shows some possible pairings.

2.2 Heap Construction Phase

Now we construct a (max-)heap according to the Primary elements of the pairs. The heap is constructed in bottom-up fashion: we start from the last pair of the heap and try to push it down. To push a pair down, we find the path of elder sons and then do a binary search on that path. The implementation is similar to
that of Carlsson's variant [1].

Figure 1. In a) the Primary element is greater than the Secondary element, so we do not swap. In b) the Primary element is smaller than the Secondary element, so we swap the elements.

Figure 2. A (max-)heap constructed based on the Primary elements of the pairs.

2.3 Sorting Phase

The construction phase yields a heap where each Primary element is greater than or equal to the Primary elements of its children pairs. As the Primary elements are always greater than or equal to the corresponding Secondary elements, the Primary element of the root is the largest element of the heap.

In each iteration of the sorting phase we extract the Primary element of the root and adjust the heap, decreasing the number of elements in the heap by one. Suppose 1 is the root and 2 is the last pair (with a possibly empty Secondary element). Let 1P and 1S be the Primary and Secondary elements of the root; 2P and 2S are defined similarly. If 1 and 2 are the same pair, we simply swap the Primary and Secondary elements (1P and 1S). If they are not the same pair, we remove the Primary element of the root (1P), which is the largest element in the heap, and place it in temp (a temporary variable to store an element). Then there are four possible cases:

Case 1. 2S is empty (odd number of elements in the heap) and 1S ≥ 2P: we place 1S at 1P, 2P at 1S and temp at 2P. (Figure 3)
Case 2. 2S is empty (odd number of elements in the heap) and 1S < 2P: we place 2P at 1P, 1S remains in place, and temp at 2P. (Figure 4)
Case 3. 2S is not empty (even number of elements in the heap) and 1S ≥ 2P: we place 1S at 1P, 2P at 1S, 2S at 2P and temp at 2S. (Figure 5)
Case 4. 2S is not empty (even number of elements in the heap) and 1S < 2P: we place 2P at 1P, 1S remains in place, 2S at 2P and temp at 2S. (Figure 6)

Figure 3. Case 1. 2S is empty and 1S ≥ 2P. So a) 1P → temp, b) 1S → 1P, c) 2P → 1S and d) temp → 2P.

Figure 4. Case 2. 2S is empty and 1S < 2P. So a) 1P → temp, b) 2P → 1P and c) temp → 2P.

Figure 5. Case 3. 2S is not empty and 1S ≥ 2P. So a) 1P → temp, b) 1S → 1P, c) 2P → 1S, d) 2S → 2P and e) temp → 2S.

All of these cases require a single comparison. After adjusting the order of the pair we fix the heap by pushing down the root node in the same way as in the heap construction phase. We also remove the last element and mark its position empty.
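The pairing preprocessing and the four extraction cases can be sketched in Python as follows. This is an illustrative model, not the authors' implementation: the function names (make_pairs, sift_down, extract_max, paired_heapsort) are ours, and it uses an ordinary top-down sift-down on the Primary elements instead of the binary-search push-down of Carlsson's variant, so its comparison count is not the one analyzed in Section 3 — only the case analysis at the root matches the description above.

```python
def make_pairs(a):
    """Preprocessing (2.1): group elements into (Primary, Secondary)
    pairs with Primary >= Secondary; an odd leftover gets Secondary=None."""
    pairs = []
    for i in range(0, len(a) - 1, 2):
        x, y = a[i], a[i + 1]
        pairs.append((x, y) if x >= y else (y, x))
    if len(a) % 2:
        pairs.append((a[-1], None))
    return pairs

def sift_down(h, i):
    """Restore the max-heap property on Primary elements below node i,
    swapping whole pairs (so each pair stays internally ordered)."""
    n = len(h)
    while True:
        l, r = 2 * i + 1, 2 * i + 2
        big = i
        if l < n and h[l][0] > h[big][0]:
            big = l
        if r < n and h[r][0] > h[big][0]:
            big = r
        if big == i:
            return
        h[i], h[big] = h[big], h[i]
        i = big

def extract_max(h):
    """One step of the sorting phase (2.3): remove and return the overall
    maximum, refilling the root pair by the four-case analysis."""
    p1, s1 = h[0]
    if len(h) == 1:
        # Root and last pair coincide: promote the Secondary element.
        if s1 is None:
            h.pop()
        else:
            h[0] = (s1, None)
        return p1
    p2, s2 = h[-1]
    if s2 is None:                 # odd number of elements: Cases 1 and 2
        h.pop()
        if s1 >= p2:
            h[0] = (s1, p2)        # Case 1
        else:
            h[0] = (p2, s1)        # Case 2
    else:                          # even number of elements: Cases 3 and 4
        if s1 >= p2:
            h[0] = (s1, p2)        # Case 3
        else:
            h[0] = (p2, s1)        # Case 4
        h[-1] = (s2, None)
    sift_down(h, 0)                # push the adjusted root pair down
    return p1

def paired_heapsort(a):
    h = make_pairs(a)
    # Bottom-up construction (2.2) on the Primary elements.
    for i in range(len(h) // 2 - 1, -1, -1):
        sift_down(h, i)
    return [extract_max(h) for _ in range(len(a))]
```

Running paired_heapsort on a list performs one deletemax per element and returns the elements in non-increasing order.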
Figure 6. Case 4. 2S is not empty and 1S < 2P. So a) 1P → temp, b) 2P → 1P, c) 2S → 2P and d) temp → 2S.

3. Analysis

For simplicity of analysis, let us assume that the heap in our new data structure is full. That is,

  n/2 = Σ_{i=0}^{h} 2^i = 2^{h+1} − 1, ∴ h = log(n/2 + 1) − 1.

However, had it not been a full heap, the formula would have been h = ⌈log(n/2 + 1)⌉ − 1.

3.1 Preprocessing

We require ⌊n/2⌋ comparisons to pair the n elements, with a possible leftover element.

3.2 Heap Construction Phase

At every level i (0 ≤ i < h) we have 2^i pairs. To find the path of elder sons we need h − i comparisons, and for inserting the pair into that path we need a binary search on a path of length h − i. So the number of comparisons required for heap construction is

  Σ_{i=0}^{h−1} 2^i ((h − i) + ⌊log(h − i)⌋ + 1)
  = Σ_{i=1}^{h} 2^{h−i} (i + ⌊log i⌋ + 1)
  = Σ_{i=1}^{h} i·2^{h−i} + Σ_{i=1}^{h} 2^{h−i} ⌊log i⌋ + Σ_{i=1}^{h} 2^{h−i}
  = (2^{h+1} − 2 − h) + 2^h Σ_{i=1}^{h} ⌊log i⌋/2^i + (2^h − 1)
  ≤ 2^{h+1} − 2 − h + 0.632843 · 2^h + 2^h − 1
  = 0.90821n − log n + O(1).

Here,

  Σ_{i=1}^{h} ⌊log i⌋/2^i < Σ_{i=1}^{∞} ⌊log i⌋/2^i
  = Σ_{i=1}^{50} ⌊log i⌋/2^i + Σ_{i=51}^{∞} ⌊log i⌋/2^i
  < 0.632843.

The integral and the summand are calculated using www.wolframalpha.com. Note that if there are x elements, we require ⌊log x⌋ + 1 comparisons for binary search in the worst case.

3.3 Sorting Phase

In the sorting phase, a path of elder sons of length i (1 ≤ i ≤ h) occurs 2^{i+1} times, because there are 2^{i+1} numbers at the ith level. For every such path we require i comparisons to determine the path of elder sons and ⌊log i⌋ + 1 comparisons to insert a pair into the path. So for pushing down, we require
  Σ_{i=1}^{h} 2^{i+1} (i + ⌊log i⌋ + 1)
  = Σ_{i=1}^{h} 2^{i+1} (i + 1) + Σ_{i=1}^{h} 2^{i+1} ⌊log i⌋
  ≤ 2^{h+2} h + ⌊log h⌋ (2^{h+2} − 4)
  ≤ 2(n/2 + 1) h + (n − 2) ⌊log log(n/2 + 1)⌋
  ≤ (n + 2)(log(n/2 + 1) − 1) + (n − 2) ⌊log log n⌋
  = (n + 2) log(n/2 + 1) − n − 2 + (n − 2) ⌊log log n⌋
  = (n + 2) log n − 2n + (n − 2) ⌊log log n⌋ + O(1).

We also require n − 2 extra comparisons for the case adjustments. So the total number of comparisons is

  n/2 + 0.90821n − log n + n − 2 + (n + 2) log n − 2n + (n − 2)⌊log log n⌋ + O(1)
  = n log n + n log log n + 0.40821n + O(log n).

4. Theoretical comparison between Carlsson's variant and the proposed algorithm

Table 1 shows the results of a comparison between the two algorithms. We analyze the worst cases here.

Table 1. Theoretical comparison between Carlsson's algorithm and the proposed algorithm

Number of elements   CH¹       NH²       CS³        NS⁴        DH = CH − NH   DS = CS − NS   Total (DH + DS)
7                    8         9         20         17         −1             3              2
8                    13        10        25         22         3              3              6
9                    13        10        30         27         3              3              6
10                   15        11        35         32         4              3              7
14                   21        15        55         52         6              3              9
15                   21        20        60         58         1              2              3
16                   28        21        67         64         7              3              10
100                  178       137       761        746        41             15             56
1000                 1809      1401      11713      11446      408            267            675
10000                18159     14076     153357     153094     4083           263            4346
100000               181635    140814    1903137    1868412    40821          34725          75546
1000000              1816414   1408203   22885636   22819844   408211         65792          474003

¹ Comparisons required by Carlsson's heap construction in the worst case.
² Comparisons required by the proposed heap construction in the worst case.
³ Comparisons required by Carlsson's sorting phase in the worst case.
⁴ Comparisons required by the proposed sorting phase in the worst case.

In Section 3 we considered our heap to be full, and the number of comparisons
required for the sorting phase of our proposed algorithm was

  n − 2 + Σ_{i=1}^{h} 2^{i+1} (i + ⌊log i⌋ + 1).

However, for our experimental data we considered arbitrary heaps. Let the height of the heap be h. That means we have a full heap of height h − 1 and t = n − 2(2^h − 1) extra elements in the last level. For the full heap of height h − 1 we apply the formula of Section 3.3. For the last level we have t paths of elder sons of length h, and we need h comparisons to determine each path and ⌊log h⌋ + 1 comparisons to insert a pair into it. So the number of comparisons for an n-element heap is

  n − 2 + Σ_{i=1}^{h−1} 2^{i+1} (i + ⌊log i⌋ + 1) + (n − 2(2^h − 1)) (h + ⌊log h⌋ + 1).

Similarly, the number of comparisons for Carlsson's algorithm in the sorting phase is

  Σ_{i=1}^{h−1} 2^i (i + ⌊log i⌋ + 1) + (n − (2^h − 1)) (h + ⌊log h⌋ + 1).

We computed the number of comparisons required for the heap construction phase of Carlsson's variant and of the proposed algorithm by Algorithm 1 and Algorithm 2 respectively.

Algorithm 1 The number of comparisons required for the heap construction phase of Carlsson's variant
Require: n ≥ 1
 1: h ← ⌊log n⌋
 2: CH ← 0
 3: active ← n − 2^h + 1
 4: for i = 1 to h do
 5:   current ← 2^{h−i}
 6:   active ← ⌈active/2⌉
 7:   passive ← current − active
 8:   CH ← CH + active · (i + ⌊log i⌋ + 1)
 9:   CH ← CH + passive · ((i − 1) + ⌊log(i − 1)⌋ + 1)
10: end for

5. Further improvement

Using Gonnet and Munro's or Carlsson's [2, 6] algorithm we can further improve the heap construction and sorting phases. To replace the maximum in a heap, log n + log* n = h + log* h + 1 comparisons are necessary and sufficient, where h = log n. So the number of comparisons required for construction in our proposed data structure is
Algorithm 2 The number of comparisons required for the heap construction phase of the proposed algorithm
Require: n ≥ 1
 1: h ← ⌊log ⌈n/2⌉⌋
 2: NH ← ⌊n/2⌋
 3: active ← ⌈n/2⌉ − 2^h + 1
 4: for i = 1 to h do
 5:   current ← 2^{h−i}
 6:   active ← ⌈active/2⌉
 7:   passive ← current − active
 8:   NH ← NH + active · (i + ⌊log i⌋ + 1)
 9:   NH ← NH + passive · ((i − 1) + ⌊log(i − 1)⌋ + 1)
10: end for

  n/2 + Σ_{i=1}^{h} 2^{h−i} (i + log* i + 1)
  = n/2 + Σ_{i=1}^{h} 2^{h−i} (i + 1) + Σ_{i=1}^{h} 2^{h−i} log* i
  = n/2 + (3·2^h − h) + 2^h Σ_{i=1}^{h} (log* i)/2^i + O(1)
  ≤ n/2 + (3·2^h − h) + 0.812515 · 2^h + O(1)
  = n/2 + 3.812515 · 2^h − h + O(1)
  = 1.453129n − log n + O(1).

Here,

  Σ_{i=1}^{h} (log* i)/2^i < Σ_{i=1}^{∞} (log* i)/2^i
  = Σ_{i=1}^{50} (log* i)/2^i + Σ_{i=51}^{∞} (log* i)/2^i
  < Σ_{i=1}^{50} (log* i)/2^i + Σ_{i=51}^{∞} ⌊log i⌋/2^i
  < 0.812515.

The integral and the summand are calculated using www.wolframalpha.com and a simple C program respectively.
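The constant above can be checked numerically. The short script below — a stand-in for the "simple C program" the authors mention, with the iterated logarithm computed directly — evaluates the partial sum Σ_{i=1}^{50} (log* i)/2^i, which already agrees with the bound 0.812515 to six decimal places (the tail beyond i = 50 is negligible).

```python
from math import log2

def log_star(x):
    """Iterated base-2 logarithm: the number of times log2 must be
    applied before the value drops to 1 or below."""
    k = 0
    while x > 1:
        x = log2(x)
        k += 1
    return k

# Partial sum of log*(i) / 2^i for i = 1..50, as in the derivation above.
partial = sum(log_star(i) / 2 ** i for i in range(1, 51))
print(round(partial, 6))  # → 0.812515
```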
Similarly, the number of comparisons required for sorting is

  n − 2 + Σ_{i=1}^{h} 2^{i+1} (i + log* i + 1)
  = n − 2 + Σ_{i=1}^{h} 2^{i+1} (i + 1) + Σ_{i=1}^{h} 2^{i+1} log* i
  ≤ n − 2 + 2^{h+2} h + log* h · (2^{h+2} − 4)
  ≤ n − 2 + (n + 2)(log n − 2) + (log* n − 1)(n − 2)
  = (n + 2) log n + (n − 2) log* n − 2n + O(1)
  = n log n + n log* n − 2n + O(log n).

So the total number of comparisons is

  n log n + n log* n − 0.546871n + O(log n).

6. Generalization of the data structure

Let us now generalize the data structure and keep c elements per node. Then there will be n/c nodes, and if the height of the heap is h then 2^{h+1} − 1 = n/c. We sort the elements at each node and construct the heap as described above. Sorting the elements at a node by the traditional in-place heapsort algorithm requires about 2c log c comparisons. So, to create the heap, the required number of comparisons is

  2n log c + Σ_{i=1}^{h} 2^{h−i} (i + log* i + 1).

In the sorting phase, when only c elements are left in the heap, no further comparisons are needed. So we need to adjust the root n − c times. For each adjustment we take the largest element of the last node and insert it into the root node, which then holds c − 1 elements. Since we require ⌊log i⌋ + 1 comparisons to insert an element into a sorted list of i elements, the required number of comparisons is

  (n − c)(⌊log(c − 1)⌋ + 1) + Σ_{i=1}^{h} c·2^i (i + log* i + 1).

Asymptotic analysis shows that the optimal value of c is O(log n). However, experimental results do not always support this theoretical finding. One reason is that the cost function contains integral functions that are not easy to analyze by differential calculus.

7. Pairing trick in the Weak Heapsort algorithm

As before, we pair up the numbers and construct a Weak Heap [4] according to the Primary elements of these pairs. Each pair acts as a node of the Weak Heap, and each node has an extra bit just as in a normal Weak Heap. We
keep an extra variable, say temp, which is initially empty. After finding the pair with the largest Primary element in the heap (we call this node the head) we check temp. If temp is empty, the Primary element of the head is the next largest number, and the Secondary element of the head is assigned to temp. If temp is not empty and temp is larger than the Primary element of the head, the next two largest numbers are the number in temp and the Primary element of the head, and the Secondary element is assigned to temp. Finally, if temp is not empty and temp is smaller than the Primary element of the head, the next largest number is the Primary element of the head, and we make a pair by combining the number in temp and the Secondary element of the head. Each of these scenarios requires at most two extra comparisons. So the total number of comparisons is

  n/2 + n/2 − 1 + 2(n/2 · (k − 1) − 2^{k−1} − n/2 + 2) + n
  = n − 1 + nk − n − 2^k − n + 4 + n
  = nk − 2^k + 3,

where k = ⌈log n⌉. This is the same expression as the number of comparisons of the Weak Heapsort algorithm [4]. But here we require only n/2 bits to sort n numbers in (n − 1) log n + 0.086013n comparisons.

We can apply this trick similarly to the Weak Heapsort algorithm modified by Edelkamp and Stiegeler [5]. It gives a sorting algorithm with n log n − 0.9n comparisons, requiring n/2 bits and an array of size n to store indices.

8. Experimental results

Table 2 presents the practical performance of the three algorithms. The algorithms were run on the same random sets of floating-point data several times, and the average times are reported. In the implementation we used bitwise operations instead of ordinary arithmetic operations where possible, and we avoided recomputation of common subexpressions by storing them in temporary variables where applicable. These two optimizations give better runtimes.

Table 2.
Experimental results. Runtime in milliseconds.

Number of elements   The proposed algorithm   Carlsson's variant   Dutton's Weak Heapsort
10000                0                        0                    0
50000                31                       31                   78
100000               47                       47                   156
500000               313                      375                  969
1000000              580                      907                  2234
5000000              3340                     5601                 15554
10000000             7078                     12385                35297
30000000             22765                    49938                138937

9. Conclusion

We have presented a new approach to reduce the number of comparisons for in-place heapsort. The main idea of the paper is to consider groups of elements at each node of a heap. This reduces the height of the tree, and hence results in a slight reduction in the cost of each deletemax operation, despite some additional comparisons needed to ensure that every node (except possibly the last) has a sorted
group of elements. The gain comes from the reduction in the lower-order terms of the cost of the deletemax operations, so overall there is a reduction in the number of comparisons in the lower-order (more precisely, the linear) term for heapsort. However, the optimum number of elements to be stored in a node of our proposed data structure remains inconclusive and requires further investigation.

Acknowledgement: The authors profusely thank the anonymous referees for suggesting significant improvements and for pointing out errors and inaccuracies in an earlier version of the manuscript.

References

[1] S. Carlsson, A variant of heapsort with almost optimal number of comparisons, Inf. Process. Lett. 24 (1987), pp. 247–250.
[2] S. Carlsson, An optimal algorithm for deleting the root of a heap, Inf. Process. Lett. 37 (1991), pp. 117–120.
[3] S. Carlsson and J. Chen, Heap construction: optimal in both worst and average cases?, in ISAAC, Lecture Notes in Computer Science, vol. 1004, Springer, 1995, pp. 254–263.
[4] R.D. Dutton, Weak-heap sort, BIT 33 (1993), pp. 372–381.
[5] S. Edelkamp and P. Stiegeler, Implementing heapsort with (n log n − 0.9n) and quicksort with (n log n + 0.2n) comparisons, ACM Journal of Experimental Algorithmics 7 (2002), p. 5.
[6] G.H. Gonnet and J.I. Munro, Heaps on heaps, SIAM J. Comput. 15 (1986), pp. 964–971.
[7] T.M. Islam and M. Kaykobad, Worst-case analysis of generalized heapsort algorithm revisited, Int. J. Comput. Math. 83 (2006), pp. 59–67.
[8] J. Katajainen, The Ultimate Heapsort, in CATS, 1998, pp. 87–96.
[9] M. Kaykobad, M.M. Islam, M.E. Amyeen, and M.M. Murshed, 3 is a more promising algorithmic parameter than 2, Comput. Math. Appl. 36 (1998), pp. 19–24.
[10] C. McDiarmid and B.A. Reed, Building heaps fast, J. Algorithms 10 (1989), pp. 352–365.
[11] A. Paulik, Worst-case analysis of a generalized heapsort algorithm, Inf. Process. Lett. 36 (1990), pp. 159–165.
[12] S.K.N. Rezaul Alam Chowdhury and M. Kaykobad, A simplified complexity analysis of McDiarmid and Reed's variant of bottom-up heapsort algorithm, Int. J. Comput. Math. 73 (2000), pp. 293–297.
[13] M. Shahjalal and M. Kaykobad, A new data structure for heapsort with improved number of comparisons, in WALCOM, 2007, pp. 88–96.
[14] X.D. Wang and Y.J. Wu, An improved heapsort algorithm with n log n − 0.788928n comparisons in the worst case, J. Comput. Sci. Technol. 22 (2007), pp. 898–903.
[15] I. Wegener, Bottom-up-heapsort, a new variant of heapsort, beating, on an average, quicksort (if n is not very small), Theor. Comput. Sci. 118 (1993), pp. 81–98.