Me@0: From: Me@0: Subject: Bitonic Sort Me@0: Date: Mon, 4 Jun 2007 10:57:06 -0700 Me@0: MIME-Version: 1.0 Me@0: Content-Type: text/html; Me@0: charset="Windows-1252" Me@0: Content-Transfer-Encoding: quoted-printable Me@0: Content-Location: http://www.tools-of-computing.com/tc/CS/Sorts/bitonic_sort.htm Me@0: X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.2962 Me@0: Me@0: Me@0: Bitonic Sort Me@0: Me@0: Me@0: Me@0:

Bitonic Sort

Me@0:

Abstract

Me@0:
Me@0:
Me@0:
Me@0:
Me@0:

Continuing a tutorial on sorting algorithms, = Me@0: this page=20 Me@0: animates bitonic = Me@0: sort.

Me@0:

Author
Thomas W. Christopher

Me@0:

Bitonic sort is a sorting algorithm designed specially for parallel=20 Me@0: machines.

Me@0:

A sorted sequence is a monotonically non-decreasing (or = Me@0: non-increasing)=20 Me@0: sequence. A bitonic sequence is composed of two subsequences, one = Me@0: monotonically=20 Me@0: non-decreasing and the other monotonically non-increasing. A "V" and an = Me@0: A-frame=20 Me@0: are examples of bitonic sequences.

Me@0:

I get tired of saying "non-decreasing" and "non-increasing." They are = Me@0: clunky=20 Me@0: and throw an extra negative into sentences. I will use "ascending" to = Me@0: mean=20 Me@0: "non-decreasing" and "descending" to mean "non-increasing."

Me@0:

Moreover, any rotation of a bitonic sequence is a bitonic sequence, = Me@0: or if you=20 Me@0: prefer, one of the subsequences can wrap around the end of the bitonic=20 Me@0: sequence.

Me@0:

Of course, a sorted sequence is itself a bitonic sequence: one of the = Me@0: Me@0: subsequences is empty.

Me@0:

Now we come to a strange property of bitoinic sequences, the property = Me@0: that is=20 Me@0: uses in bitonic sort:

Me@0:
Me@0:

Suppose you have a bitonic sequence of length 2n, that is, elements = Me@0: in=20 Me@0: positions [0,2n). You can easily divide it into two halves, [0,n) and = Me@0: [n,2n),=20 Me@0: such that Me@0:

Me@0:

What is this easy method? Simply compare elements in the = Me@0: corresponding=20 Me@0: positions in the two halves and exchange them if they are out of = Me@0: order.

Me@0:
Me@0:

    for (i=3D0;i<n;i++)=20 Me@0: {
        if = Me@0: (get(i)>get(i+n))=20 Me@0: exchange(i,i+n);
    }

Me@0:

This is sometimes called a bitonic merge.

Me@0:

Why does it work? I'm not sure I could do a concise and clear enough = Me@0: proof to=20 Me@0: fit in with these pages. Let's just leave it without a proof, but = Me@0: true.

Me@0:

So here's how we do a bitonic sort:=20 Me@0:

Me@0:

Me@0: Bitonic Sort Bitonic Sort This first version actually = Me@0: uses=20 Me@0: recursion. It uses methods sortup, sortdown, mergeup, and mergedown, to = Me@0: sort=20 Me@0: into ascending order or descending order and to recursively merge into = Me@0: ascending=20 Me@0: or descending order.

Me@0:

Method void sortup(int m, int n) will sort the n elements in = Me@0: the=20 Me@0: range [m,m+n) into ascending order. It uses method void mergeup(int = Me@0: m, int=20 Me@0: n) to merge the n elements in the subsequence [m,m+n) into = Me@0: ascending order.=20 Me@0: Similarly for void sortdown(int m, int n) and void = Me@0: mergedown(int m,=20 Me@0: int n).

Me@0:

The overall sort is performed by a call:

Me@0:
Me@0:

    sortup(0,N);

Me@0:

Both sortup and sortdown recursively sort each half to produce an = Me@0: A-frame=20 Me@0: shape and then recursively merge that into an ascending or descending=20 Me@0: sequence.

Me@0:
Me@0:

void sortup(int m, int n) {//from m to m+n
    if = Me@0: (n=3D=3D1)=20 Me@0: return;
    sortup(m,n/2);
   =20 Me@0: sortdown(m+n/2,n/2);
    = Me@0: mergeup(m,n/2);
}
void=20 Me@0: sortdown(int m, int n) {//from m to m+n
    if = Me@0: (n=3D=3D1)=20 Me@0: return;
    sortup(m,n/2);
   =20 Me@0: sortdown(m+n/2,n/2);
   =20 Me@0: mergedown(m,n/2);
}

Me@0:

Methods mergeup and mergedown are fairly straightfoward. They compare = Me@0: Me@0: elements in the two halves, exchange them if they are out of order, and=20 Me@0: recursively merge the two halves. Call mergeup( m,  n) sorts into = Me@0: ascending=20 Me@0: order the 2*n elements in the range [m,2n).

Me@0:
Me@0:

void mergeup(int m, int n) {
    if (n=3D=3D0)=20 Me@0: return;
    int i;
    for=20 Me@0: (i=3D0;i<n;i++) {
        if=20 Me@0: (get(m+i)>get(m+i+n)) exchange(m+i,m+i+n);
   =20 Me@0: }
    mergeup(m,n/2);
   =20 Me@0: mergeup(m+n,n/2);
}
void mergedown(int m, int n) = Me@0: {
   =20 Me@0: if (n=3D=3D0) return;
    int = Me@0: i;
    for=20 Me@0: (i=3D0;i<n;i++) {
        if=20 Me@0: (get(m+i)<get(m+i+n)) exchange(m+i,m+i+n);
   =20 Me@0: }
    mergedown(m,n/2);
   =20 Me@0: mergedown(m+n,n/2);
}

Me@0:

 

Me@0:

Me@0: Bitonic Sort2 Bitonic Sort2 Bitonic sort is perfect for=20 Me@0: parallelization. The recursive calls of merge can be done in parallel. = Me@0: The loops=20 Me@0: in the merges, comparing and conditionally exchanging elements (m+i) and = Me@0: (m+i+n)=20 Me@0: can also be run in parallel.

Me@0:

However, getting it to run efficiently in parallel requires turning = Me@0: the=20 Me@0: algorithm upside down. The algorithm is expressed in terms of recursive = Me@0: calls.=20 Me@0: What we need is an algorithm in terms of each individual element.

Me@0:

Let's examine what the elements are doing. We will use an = Me@0: eight-element array=20 Me@0: and identify the elements by their positions in binary.

Me@0:
Me@0:
Me@0: Me@0: Me@0: Me@0: Me@0: Me@0: Me@0: Me@0: Me@0: Me@0: Me@0: Me@0: Me@0: Me@0: Me@0: Me@0: Me@0: Me@0: Me@0: Me@0: Me@0: Me@0:
index01234567
in binary00000001001000110100010101100111
Me@0:

The bits in the addresses are numbered from 0 on the right: the = Me@0: rightmost bit=20 Me@0: is bit 0, the bit to its left is bit 1, etc. Bit j contributes = Me@0: 2j to=20 Me@0: the value of the binary number.

Me@0:

Here is the sequence of operations that can be performed in parallel: = Me@0: Me@0:

    Me@0:
  1. All elements are sorted single element subsequences.=20 Me@0:
  2. Pairs of elements are sorted into ascending or descending = Me@0: subsequences: Me@0:
    Me@0: Me@0: Me@0: Me@0: Me@0: Me@0: Me@0: Me@0: Me@0: Me@0: Me@0: Me@0: Me@0: Me@0:
    [0000,0001]ascending
    [0010,0011]descending
    [0100,0101]ascending
    [0110,0111]descending
Me@0:
Me@0:
Me@0:
    Me@0:
  1. These pairs are compared and conditionally exchanged:
Me@0:
Me@0:

First, corresponding to the top level merge call:

Me@0:
Me@0:
Me@0: Me@0: Me@0: Me@0: Me@0: Me@0: Me@0: Me@0: Me@0: Me@0: Me@0: Me@0: Me@0: Me@0:
0000<->0010ascending
0001<->0011ascending
0100<->0110descending
0101<->0111descending
Me@0:
Me@0:

Second, corresponding to the recursive merge = Me@0: calls:

Me@0:
Me@0:
Me@0: Me@0: Me@0: Me@0: Me@0: Me@0: Me@0: Me@0: Me@0: Me@0: Me@0: Me@0: Me@0: Me@0:
0000<->0001ascending
0010<->0011ascending
0100<->0101descending
0110<->0111descending
Me@0:
Me@0:

In both cases, bit 2 determines whether the elements are sorted = Me@0: ascending=20 Me@0: (bit 2 equals 0) or descending (=3D1).

Me@0:

In the first level recursive call of mergeup or mergedown, the = Me@0: elements=20 Me@0: compared have positions that differ in bit 1. In the second level = Me@0: calls, the=20 Me@0: elements compared differ in bit 0.

Me@0:
    Me@0:
  1. The final series of merges that puts the array in order = Me@0: corresponds to=20 Me@0: three levels of calls to mergeup.=20 Me@0:

    First level call of mergeup:

    Me@0:
    Me@0: Me@0: Me@0: Me@0: Me@0: Me@0: Me@0: Me@0: Me@0: Me@0: Me@0: Me@0: Me@0: Me@0:
    0000<->0100ascending
    0001<->0101ascending
    0010<->0110ascending
    0011<->0111ascending
Me@0:
Me@0:

Two second level recursive calls of mergeup:

Me@0:
Me@0:
Me@0: Me@0: Me@0: Me@0: Me@0: Me@0: Me@0: Me@0: Me@0: Me@0: Me@0: Me@0: Me@0: Me@0:
0000<->0010ascending
0001<->0011ascending
0100<->0110ascending
0101<->0111ascending
Me@0:
Me@0:

Four third level recursive calls of mergeup:

Me@0:
Me@0:
Me@0: Me@0: Me@0: Me@0: Me@0: Me@0: Me@0: Me@0: Me@0: Me@0: Me@0: Me@0: Me@0: Me@0:
0000<->0001ascending
0010<->0011ascending
0100<->0101ascending
0110<->0111ascending
Me@0:
Me@0:

In all cases, bit 3 determines that the elements are sorted = Me@0: ascending (bit=20 Me@0: 3 equals 0).

Me@0:

In the first level recursive call of mergeup or mergedown, the = Me@0: elements=20 Me@0: compared have positions that differ in bit 2. In the second level = Me@0: calls, the=20 Me@0: elements compared differ in bit 1. In the third level calls, the = Me@0: elements=20 Me@0: compared differ in bit 0.

Me@0:

So here's the code:

Me@0:
    int i,j,k;
Me@0:     for (k=3D2;k<=3DN;k=3D2*k) {
Me@0:       for (j=3Dk>>1;j>0;j=3Dj>>1) {
Me@0:         for (i=3D0;i<N;i++) {
Me@0:           int ixj=3Di^j;
Me@0:           if ((ixj)>i) {
Me@0:             if ((i&k)=3D=3D0 && get(i)>get(ixj)) =
Me@0: exchange(i,ixj);
Me@0:             if ((i&k)!=3D0 && get(i)<get(ixj)) =
Me@0: exchange(i,ixj);
Me@0:           }
Me@0:         }
Me@0:       }
Me@0:     }
Me@0:

In this code, k selects the bit position that determines whether = Me@0: the pairs=20 Me@0: of elements are to be exchanged into ascending or descending order. = Me@0: Variable j=20 Me@0: corresponds to the distance apart the elements are that are to be = Me@0: compared and=20 Me@0: conditionally exchanged.   Variable i goes through all the = Me@0: elements; ixj=20 Me@0: is the element that is the pair of element i (the exclusive-or of i = Me@0: and j, the=20 Me@0: element whose position differs only in bit position (log2 = Me@0: j)). We=20 Me@0: only compare elements i and ixj if i<ixj. This avoids comparing = Me@0: them=20 Me@0: twice.

Me@0:

Me@0: Bitonic Sort3 Bitonic Sort3. The third = Me@0: version=20 Me@0: of bitonic sort is the same as the second except that the array is shown = Me@0: after=20 Me@0: each iteration of the for i loop. This gives the appearance of = Me@0: a=20 Me@0: parallel algorithm.

Me@0:

In this version, the delay between each showing of the array is set = Me@0: to five=20 Me@0: times the delay following exchanges of single elements in the other = Me@0: versions.=20 Me@0: There are two reasons: first, it simulates the longer time required to = Me@0: exchange=20 Me@0: elements between nodes of a distributed memory parallel computer, and = Me@0: second, it=20 Me@0: slows down the simulation enough that you will be able to make sense of = Me@0: it.

Me@0:

 

Me@0:

 

Me@0:

Parallel bitonic sort.  Here is a version of = Me@0: bitonic=20 Me@0: sort that uses the Tools of Computing thread package. A version of the = Me@0: first=20 Me@0: bitonic sort algorithm builds a task graph to do the sorting. The tasks = Me@0: are=20 Me@0: placed in a RunQueue when they become runnable. Tasks become runnable = Me@0: when all=20 Me@0: tasks in a predecessor set have terminated. There are a limited number = Me@0: of=20 Me@0: Threads (4) running tasks from the queue. When tasks complete, they = Me@0: signal a=20 Me@0: TerminationGroup. When all tasks in a TerminationGroup have signaled = Me@0: their=20 Me@0: completion, the tasks waiting for the termination of that group are made = Me@0: Me@0: runnable.

Me@0:

Me@0: ParBitonicSort

Me@0:

Back to=20 Me@0: beginning of the tutorial