Array.sort() should not be used to shuffle an array

When a developer searches the web for an algorithm to shuffle an array in JavaScript or ActionScript, he will surely find a way to achieve that using the Array.sort() method (see references : JavaScript, ActionScript 3). While this method is intended to sort an array, it can also be used to randomize it (!). In general, here and there, the following algorithm comes forth :
/**
 * Add a randomize method to the Array object prototype
 * Usage :
 *  var tmpArray = ["a", "b", "c", "d", "e"];
 *  tmpArray.randomize();
 */

Array.prototype.randomize = function (){
    this.sort(
        function(a,b) {
            return Math.round( Math.random() * 2 ) - 1;
        }
    );
};
The principle of this implementation is simple. It takes advantage of the comparison Function that can be passed to the Array.sort() method. This function defines the sort order. It will be passed two objects (from the array) a and b, compare them and return -1, 0 or 1 whether a is less, equal or greater than b. The idea, to shuffle an array, is to randomly output -1, 0 or 1. This is the purpose of the expression Math.round( Math.random() * 2 ) – 1. This idea is bad.

Shuffling an array with Array.sort() results in… a not so shuffled array

Implementing a shuffle function on arrays, this way, is bad. At first glance it seems to produce well randomized arrays, but the random distribution is not uniform. I will illustrate this in a minute. Before that, give a look at the table below. It’s interactive. When you click start, it will run the following process :
  1. Create an array with ten letters in alphabetical orders from A to J
    var simpleArray = ["A","B","C","D","E","F","G","H","I","J"];
  2. Copy this array in a second array
  3. Run a shuffle algorithm against the second array
  4. Look where the letters are in the second array and record their position in the table columns
  5. Return to the step 2 and start the process again until a certain number of iterations has been reached
This repetitive process is pretty simple. It is not intended to give a mathematical evidence of the problem but to illustrate the consequences of a badly implemented shuffle algorithm. Note that the copy at step 2 is always made from the original array (not from the last shuffled array). In this first example and to make sure you understand how this table work, step 3 is bypassed. Thus the copied array is always exactly the same as the original one. Click on Start button will launch the process (The number of iterations can be changed).
Normally, at the end of the process, you should see a table full of zeros with only the diagonal (top-left / bottom-right) filled with the number of iterations. It means that, at every steps, letters in the second array are at the same position than in the first one. This is what was expected because there is no shuffle. Let’s do the same thing but now step 3 uses the shuffle algorithm we have seen before.
The result is interesting, isn’t it ? The position of letters has well been set randomly over iterations. Almost every cells in the table are filled with numbers. Let’s give a closer look. Actually, the nearer a cell is from the diagonal (top-left / bottom-right) the higher its number. Moreover there are still zeros in some cells far from this diagonal. Obviously, it means letters are more likely to be near their original position, after a shuffle.

How to better shuffle an array with Array.sort() ?

We said the comparison function returned randomly either -1, 0 or 1. What exactly happens in each case ? Let’s quote the Mozilla developer Center for the Array.sort() comparison function description
If compareFunction is supplied, the array elements are sorted according to the return value of the compare function. If a and b are two elements being compared, then:
  • If compareFunction(a, b) is less than 0, sort a to a lower index than b.
  • If compareFunction(a, b) returns 0, leave a and b unchanged with respect to each other, but sorted with respect to all different elements. Note: the ECMAscript standard does not guarantee this behaviour, and thus not all browsers (e.g. Mozilla versions dating back to at least 2003) respect this.
  • If compareFunction(a, b) is greater than 0, sort b to a lower index than a.
The Array.sort() method uses a Bubble Sort algorithm. Roughly, it steps through array elements and compare each one with its next adjacent element. It swaps the elements when the comparison function return 1. What does it do when the comparison function return 0 or -1 ? Actually nothing and here is the issue ! As the comparison function randomly generate -1, 0 or 1, each of those numbers has the same chance to arise. Unfortunately a letter can change its position only if 1 arises. It means that a letter has twice the chance not to move than to move ! So to resolve the issue the comparison function must not generate 3 numbers randomly but only two : 0 or 1. Thus a letter has the same chance to move than to stay in place. Below is the new implementation (Note that it is simpler than the previous one).
/**
 * Add a randomize method to the Array object prototype
 * Usage :
 *  var tmpArray = ["a", "b", "c", "d", "e"];
 *  tmpArray.randomize2();
 */

Array.prototype.randomize2 = function (){
    this.sort(
        function(a,b) {
            return  Math.round( Math.random() );;
        }
    );
};
The following table shows the result of this new implementation :
This result seems much more random than the previous one. There are much less zeros filling the table, however we can still see a kind of diagonal pattern where higher numbers seem much more on the diagonal. It is less obvious than before but still there. The reason for this can be found in the Bubble Sort algorithm and its implementation. Think about if A < B and B < C then not need to compare A and C.Thus, some comparisons are not done and optimization in code results in a greater efficiency to sort but an issue when shuffle. Update : it seems there are differences between browsers. Notably, the method explained here is more accurate in Firefox but less in IE. Both gives bad uniform random, though. Not tested in other browsers

Shuffle arrays the good way

Fortunately it is not required to use Array.sort() to shuffle an array. Implement a shuffle function using Fisher-Yates algorithm is better and still easy. Below is the code implementing a shuffle function using the Fisher-Yates algorithm. This algorithm provides uniform random.
/*
 * Add a shuffle function to Array object prototype
 * Usage :
 *  var tmpArray = ["a", "b", "c", "d", "e"];
 *  tmpArray.shuffle();
 */

Array.prototype.shuffle = function (){
    var i = this.length, j, temp;
    if ( i == 0 ) return;
    while ( --i ) {
        j = Math.floor( Math.random() * ( i + 1 ) );
        temp = this[i];
        this[i] = this[j];
        this[j] = temp;
    }
};
To prove the superiority of Fisher-Yates algorithm in Array.sort() implementation play with the table below.
With these results, it is obvious that this algorithm is far more efficient in distributing uniformly the letters in the array over iterations than Array.sort() style algorithms. Distribution is much more the same across cells.

Tags: , , , , ,

This entry was posted on Monday, November 16th, 2009 at 2:48 pm and is filed under ActionScript, JavaScript. You can follow any comments to this entry through the RSS 2.0 feed. Both comments and pings are currently closed.

9 comments

  1. Comment 372

    georges

    Stéphane, on http://opentochoice.org you posted about your page on Array.sort().
    It’s not clear to me why.
    Are you implying that Mozilla is doing something unfair with it? and how would they benefit from it???

    • Comment 373

      Not at all. The European Commission forces Microsoft to show Windows users a screen asking to choose fro a browser if they still use the one by default. The web site http://opentochoice.org is the Mozilla website to help people make a decision. The window which will be displayed to Windows users to ask them to make their choice is a web page developed by and hosted at Microsoft’s : http://www.browserchoice.eu (Not to be confused with the previous one). The Microsoft web page at http://www.browserchoice.eu should display browsers in random order. This is where they use a bad implementation of a random Algorithm. I don’t think this is for anybody a way to benefit to something. It’s just the result of the web spreading some bad habits and developers to appropriate them. Even those at Microsoft’s.

  2. Comment 364

    Simon

    Perhaps I’m missing something, but this seems an awfully clumsy way of doing things – using a deliberately broken comparison function to make the sort do something other than sort.

    If you have access to a sort function and a random generator, why not just assign a large random integer to each item in the set, then sort the resulting pairs by the numeric order?

    • Comment 365

      It’s always a matter of efficiency. The comparison function is not “deliberately broken”. It is built to be efficient at what it is intended to, ie sort things. Using it to shuffle things is, at a first glance, a clever way to use its built-in speediness but it turns that it’s a bad idea.

      • Comment 366

        Simon

        You misunderstand – the user-provided comparison function *is* deliberately broken, returning a random value in order to break the sort algorithm, trying to force it to un-sort the array instead. Even without the statistical demonstration of why it’s invalid, this approach is screaming out ‘bad idea’ to me…

  3. Comment 131

    David McFarlane

    Very nice explanation and demos! (Chanced upon this while looking for info on shuffling arrays without consecutive repeats.)