That’s why I wrote an unoptimized version of mergesort and ran it on a generated array of 500k elements. I waited… and waited… and at last, after about 15 minutes, the array was sorted. It was quite disappointing. I decided it was so slow because of the recursion inside it; I’ve had bad experience with recursive DFS when manipulating the DOM tree. The result for the default sort was about 0.5 seconds. For that test I used Node.js (which uses Google’s V8 engine). After merge sort I implemented heapsort, another solid algorithm with O(n log n) complexity. I thought it would be faster in Node.js because, in my opinion, the recursion was what made the mergesort so slow. I generated another array with 500k elements using a simple Perl script. The result was very interesting… In the chart below you can see mergesort compared to heapsort for an array with 50k elements.
The X-axis shows the attempt number; the Y-axis shows the time required for the algorithm to finish (in seconds).
Here is the implementation of the Mergesort:
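The exact original listing may differ in details; a plain recursive version along these lines matches the description (unoptimized, allocating new arrays on every merge):

```javascript
// Plain recursive mergesort: split, sort each half, merge.
// merge() allocates a fresh result array on every call, which is one
// of the reasons this naive version is slow on large inputs.
function mergeSort(arr) {
  if (arr.length <= 1) {
    return arr;
  }
  var middle = Math.floor(arr.length / 2);
  var left = mergeSort(arr.slice(0, middle));
  var right = mergeSort(arr.slice(middle));
  return merge(left, right);
}

function merge(left, right) {
  var result = [];
  var i = 0;
  var j = 0;
  // Take the smaller head element until one half is exhausted.
  while (i < left.length && j < right.length) {
    if (left[i] <= right[j]) {
      result.push(left[i++]);
    } else {
      result.push(right[j++]);
    }
  }
  // Append whatever remains in either half.
  while (i < left.length) result.push(left[i++]);
  while (j < right.length) result.push(right[j++]);
  return result;
}
```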
And the heapsort:
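A reconstruction in the spirit of the original: an in-place max-heap sort wrapped in a closure, so that only the sorting function is exposed and the helper stays private:

```javascript
// Heapsort hidden behind a closure: siftDown() is private, and only
// the sort function itself is part of the public API.
var heapSort = (function () {

  // Restore the max-heap property for the subtree rooted at i,
  // considering only the first `size` elements of the array.
  function siftDown(arr, i, size) {
    while (true) {
      var left = 2 * i + 1;
      var right = 2 * i + 2;
      var largest = i;
      if (left < size && arr[left] > arr[largest]) largest = left;
      if (right < size && arr[right] > arr[largest]) largest = right;
      if (largest === i) return;
      var tmp = arr[i];
      arr[i] = arr[largest];
      arr[largest] = tmp;
      i = largest;
    }
  }

  // Public API: sorts the array in place and returns it.
  return function (arr) {
    // Build a max-heap bottom-up.
    for (var i = Math.floor(arr.length / 2) - 1; i >= 0; i -= 1) {
      siftDown(arr, i, arr.length);
    }
    // Repeatedly move the current maximum to the end and shrink the heap.
    for (var end = arr.length - 1; end > 0; end -= 1) {
      var tmp = arr[0];
      arr[0] = arr[end];
      arr[end] = tmp;
      siftDown(arr, 0, end);
    }
    return arr;
  };
}());
```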
If you find any mistakes in any of the implementations, I’ll be glad to hear about them and fix them. In the script above I use a simple closure to hide the methods which are not part of the public API. After I ran the test between merge and heap sorts, I noticed there’s quite a big difference… So I pitted heapsort against the default sort to see how much faster the native sort would be:
As you can see, the result is quite unexpected… I checked my algorithm many times because I thought it wasn’t correct… In more than half of the cases (and as you can see, there are 50 tests) heapsort is faster than the default sort. The default sort is a kind of optimized quicksort mixed with insertion sort for small arrays. The exact implementation of the V8 sort can be found somewhere here.
Here is the Perl script which I used to generate the test data:
If you find any issue in the script, please let me know and I’ll fix it as soon as possible. For the test cases I tried both approaches – a different array for each algorithm in each test, and the same array for all algorithms in each test. I didn’t find any difference between the two alternatives, which is why I chose to generate a single array per test (for faster testing).
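The timing harness I assume for these tests is simple: each algorithm gets its own copy of the same generated array, and wall-clock time is measured around the call (this is a sketch of the methodology, not the original test code):

```javascript
// Time a sorting function on a private copy of the shared test data,
// so every algorithm in a test run sees identical input.
function timeSort(sortFn, data) {
  var copy = data.slice();               // identical input for every algorithm
  var start = Date.now();
  sortFn(copy);
  return (Date.now() - start) / 1000;    // seconds, as plotted in the charts
}

// Example: time the default sort (with a numeric comparator) on random data.
var data = [];
for (var i = 0; i < 100000; i += 1) {
  data.push(Math.floor(Math.random() * 1000000));
}
var elapsed = timeSort(function (a) {
  a.sort(function (x, y) { return x - y; });
}, data);
console.log('default sort: ' + elapsed + ' s');
```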
Let’s start… In the charts below there are statistics for Selection sort, Insertion sort, Bubble sort, Heapsort, Mergesort and the Default sort (Quick/Insertion).
This first chart shows the statistics for 100 elements. In this case there’s almost no difference; the lines are very intertwined, and all the sorts show roughly the same level of performance.
In the chart above the leaders are almost clear. The default sort performs about the same as the mergesort, and the heapsort is the fastest. But let’s increase the array… Let’s try with 250k elements:
From the chart it’s easy to see that the default sort is slower than the merge and heap sorts… It’s very unusual – if you’ve ever tried to beat the default sort in Java or the STL, you know how hard that is.
If we increase the array to more than 300k elements, mergesort’s performance becomes very bad (more than 10 minutes), so I’ll just exclude mergesort from the next tests.
Let me include one more algorithm implementation: quicksort. Its implementation is straight out of a textbook, nothing special:
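A textbook version along these lines – in-place, recursive, with a Lomuto-style partition around the last element (the original listing may differ in pivot choice):

```javascript
// Textbook in-place quicksort: partition around the last element,
// then recurse into the two sides.
function quickSort(arr, left, right) {
  if (left === undefined) { left = 0; }
  if (right === undefined) { right = arr.length - 1; }
  if (left < right) {
    var p = partition(arr, left, right);
    quickSort(arr, left, p - 1);
    quickSort(arr, p + 1, right);
  }
  return arr;
}

// Move everything <= pivot to the left, return the pivot's final index.
function partition(arr, left, right) {
  var pivot = arr[right];
  var i = left - 1;
  for (var j = left; j < right; j += 1) {
    if (arr[j] <= pivot) {
      i += 1;
      var tmp = arr[i]; arr[i] = arr[j]; arr[j] = tmp;
    }
  }
  var tmp2 = arr[i + 1]; arr[i + 1] = arr[right]; arr[right] = tmp2;
  return i + 1;
}
```

Note that with a fixed last-element pivot this version degrades to O(n²) on already-sorted input; for the random test arrays here that isn’t an issue.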
I’ll start the test again. The array size will be 500k. The competitors will be quicksort, heapsort and the default sort:
Here is the last statistic with V8 v3.10.8. The array size this time will be 2 million.
The default sort is more than 5 times slower… The next chart shows statistics for the V8 engine used by node v0.8.12.
In the chart above the custom quicksort implementation is again the fastest. The gap is smaller because of the poorer performance of the custom quicksort and a faster default sort… If you’re interested in tests with other CPU types, send me a message and I’ll post them; the difference is not impressive. On a Core2Duo, heapsort’s performance is worse than the default sort’s, but quicksort is again almost 1.5 seconds faster when sorting an array with 2 million items (node v0.8.14, under Windows 7).
The lesson learned is that you should not use the default sort in Google’s V8 engine for large arrays and repeated sorts. Why would that be a drawback? Imagine you have a high-traffic website that receives a large amount of data every 10 seconds, which you need to sort and process in some way. Let’s say the data contains 2 million records. If you use the default sort you’ll lose 5 seconds on each sort, so over two minutes you’ll lose 1 minute on sorting alone. If you use your custom quicksort implementation, each sort costs about 1 second, so over two minutes you’re going to lose only 12 seconds. For small amounts of data it doesn’t really matter which sorting algorithm you use… I’d even say the default will be more readable and error resistant.
All files used for the test (except the generated ones) can be found here.