

FAQ on Sorting, version 1.1

 

1. A primer on the essentials: what does the O(n) notation mean? Why is the logarithm base not written in O(log n)?


A: Kantor Ilia - Tomas Niemann
Estimating execution time. The O() notation.

Different approaches can be used to estimate the performance of algorithms. The most naive is simply to run each algorithm on several inputs and compare the execution times. Another is to estimate the execution time analytically. For example, we can state that the time of a sequential search through n elements is O(n) (read as "big O of n").

The notation O() does not denote the exact execution time, only an upper bound on it, up to a constant factor. When we say, for example, that an algorithm requires O(n^2) time, we mean that its execution time grows no faster than the square of the number of elements. To get a feel for what this means, look at the table below, which shows the growth rates of several functions.

              n   log n       n*log n                    n^2
              1       0             0                      1
             16       4            64                    256
            256       8         2,048                 65,536
          4,096      12        49,152             16,777,216
         65,536      16     1,048,576          4,294,967,296
      1,048,576      20    20,971,520      1,099,511,627,776
     16,777,216      24   402,653,184    281,474,976,710,656

If the numbers in the table are taken as microseconds, then for a task with 1,048,576 elements an algorithm with O(log n) running time needs about 20 microseconds, while an algorithm with O(n^2) running time needs more than 12 days.

If two algorithms are both, say, O(n log n), this does not mean at all that they are equally efficient.

The O notation ignores constant factors, so the first one may be, say, 1000 times faster than the second. It only means that the running times of both grow roughly like the function n log n.

The bound is determined by the operation whose count grows fastest.

That is, if some operation in a program, say multiplication, is executed O(n) times, and another, say addition, is executed O(n^2) times, the overall complexity is O(n^2): as n grows, the additions are eventually executed so much more often than the rare multiplications that they dominate the running time, even if each multiplication is slower by some constant factor.
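A minimal C sketch of this counting argument, assuming a made-up loop nest in which a multiplication runs n times and an addition runs n*n times (the struct OpCount and the function countOps are hypothetical, introduced only to count operations):

```c
/* Counts how often each operation would run in a made-up loop nest:
 * one "multiplication" per outer step, one "addition" per inner step. */
typedef struct {
    unsigned long mults; /* O(n) operations */
    unsigned long adds;  /* O(n^2) operations */
} OpCount;

OpCount countOps(unsigned long n) {
    OpCount c = {0, 0};
    for (unsigned long i = 0; i < n; i++) {
        c.mults++;                      /* executed n times in total */
        for (unsigned long j = 0; j < n; j++)
            c.adds++;                   /* executed n*n times in total */
    }
    return c;
}
```

For n = 1000 it reports 1,000 multiplications and 1,000,000 additions; even if a multiplication cost as much as ten additions, the additions would still account for about 99% of the work, which is why only the O(n^2) term is kept.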

Why the logarithm base is not written.

The reason is quite simple. Suppose we have O(log_2 n). But log_2 n = log_3 n / log_3 2, and log_3 2, like any constant, is ignored by the asymptotic O() notation. Hence O(log_2 n) = O(log_3 n). The same argument works for any other base, so there is no point in writing the base.

2. What are the most efficient sorting methods today?

A: Kantor Ilia

Quicksort, distribution sorting, and quicksort with composite keys, which is too complex for this FAQ.

Heapsort (pyramid sorting) is also very useful in practical tasks that reuse parts of the sorting machinery.

3. Sorting by straight insertion.

A: Kantor Ilia - Niklaus Wirth - Tomas Niemann ;-))

Algorithm

The elements are conceptually divided into a ready (sorted) sequence a1 ... a(i-1) and an input sequence ai ... an. At each step, starting with i = 2 and increasing i by 1, we take the i-th element of the input sequence and insert it at the proper place in the ready sequence.

             Example:

  Initial keys  44 | 55 12 42 94 18 06 67
         i = 2  44 55 | 12 42 94 18 06 67
         i = 3  12 44 55 | 42 94 18 06 67
         i = 4  12 42 44 55 | 94 18 06 67
         i = 5  12 42 44 55 94 | 18 06 67
         i = 6  12 18 42 44 55 94 | 06 67
         i = 7  06 12 18 42 44 55 94 | 67
         i = 8  06 12 18 42 44 55 67 94 |

When searching for the proper place, it is convenient to "sift" x: compare it with the next element aj to its left and either insert x there, or move aj one position to the right and continue to the left.

Sifting can end under two conditions:

1. An element aj is found whose key is not greater than the key of x.

2. The left end of the ready sequence is reached.

The method's strengths are stability, ease of implementation on linked lists and, most importantly, natural behavior: an array that is already partially sorted is finished by it much faster than by many "advanced" methods.
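To see the natural behavior in numbers, here is a sketch of straight insertion on an int array, instrumented with counters. The function name insertionSortCounted and the two counters are hypothetical additions for illustration; the sifting loop follows the two stopping conditions above:

```c
#include <stddef.h>

/* Straight insertion sort; *cmps counts key comparisons and *shifts
 * counts elements moved one place to the right. */
void insertionSortCounted(int *a, size_t n, unsigned long *cmps, unsigned long *shifts) {
    *cmps = 0;
    *shifts = 0;
    for (size_t i = 1; i < n; i++) {
        int x = a[i];          /* element taken from the input sequence */
        size_t j = i;
        while (j > 0) {        /* condition 2: left end of the ready sequence */
            (*cmps)++;
            if (a[j - 1] <= x) /* condition 1: key not greater than x (keeps it stable) */
                break;
            a[j] = a[j - 1];   /* move the larger key one place right */
            (*shifts)++;
            j--;
        }
        a[j] = x;              /* insert x at its place */
    }
}
```

On already sorted input of n elements it makes n - 1 comparisons and no shifts; on reversed input every element has to travel to the front, giving n(n - 1)/2 shifts.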

Analysis

        Number of comparisons
                             Minimum: n - 1
                             Average: (n^2 + n - 2) / 4
                             Maximum: (n^2 + n) / 2 - 1
        Number of moves
                             Minimum: 2 * (n - 1)

                             Average: (n^2 + 9 * n - 10) / 4

                             Maximum: (n^2 + 3 * n - 4) / 2

        Example in C (Tomas Niemann).
--------------------------------------------------------------------
typedef int item;      /* Type of the elements being sorted */
typedef int tblIndex;  /* Type of array indices (must be signed) */

#define compGT(a, b) ((a) > (b))  /* comparison function */

void insertSort(item *a, tblIndex lb, tblIndex ub) {
    item t;
    tblIndex i, j;

    /***********************
     * Sort a[lb..ub]      *
     ***********************/
    for (i = lb + 1; i <= ub; i++) {
        t = a[i];

        /* Shift elements down (to the right) */
        /* until the insertion point is found. */
        for (j = i - 1; j >= lb && compGT(a[j], t); j--)
            a[j + 1] = a[j];

        /* insert */
        a[j + 1] = t;
    }
}

4. Description and source code of QuickSort.

The basic algorithm

Pick an arbitrary element x and scan the array from left to right until we find an element ai greater than x, and then from right to left until we find an element aj smaller than x. Swap them and continue scanning and swapping until the two scans meet somewhere in the middle of the array.

As a result, the array is divided into two parts: the left one with keys not greater than x, and the right one with keys not less than x.

This step is called partitioning; x is the pivot.

The same procedure is then applied recursively to each of the resulting parts.

The result is a very efficient sort.

 Example of recursive QuickSort 
------------------------------------
typedef int item;      /* Type of the elements being sorted */
typedef int tblIndex;  /* Type of array indices (must be signed) */

#define compGT(a, b) ((a) > (b))

/* insertSort() is the insertion sort from the previous section */
void insertSort(item *a, tblIndex lb, tblIndex ub);

tblIndex partition(item *a, tblIndex lb, tblIndex ub) {
    item t, pivot;
    tblIndex i, j, p;

    /**********************************
     *  Partition the array a[lb..ub] *
     **********************************/

    /* select the pivot (the middle element) */
    p = lb + ((ub - lb) >> 1);
    pivot = a[p];
    a[p] = a[lb];

    /* partition a[lb+1..ub] around the pivot */
    i = lb + 1;
    j = ub;
    while (1) {
        while (i < j && compGT(pivot, a[i])) i++;
        while (j >= i && compGT(a[j], pivot)) j--;
        if (i >= j) break;
        t = a[i];
        a[i] = a[j];
        a[j] = t;
        j--; i++;
    }

    /* the pivot belongs in a[j] */
    a[lb] = a[j];
    a[j] = pivot;

    return j;
}

void quickSort(item *a, tblIndex lb, tblIndex ub) {
    tblIndex m;

    /**************************
     *  Sort a[lb..ub]        *
     **************************/

    while (lb < ub) {

        /* use insertion sort for small arrays */
        if (ub - lb <= 12) {
            insertSort(a, lb, ub);
            return;
        }

        /* partition into two segments */
        m = partition(a, lb, ub);

        /* reduce memory requirements: */
        /* sort the smaller segment first (recursively), */
        /* then loop on the larger one */
        if (m - lb <= ub - m) {
            quickSort(a, lb, m - 1);
            lb = m + 1;
        } else {
            quickSort(a, m + 1, ub);
            ub = m - 1;
        }
    }
}

Improvements

In practice, several improvements can be made to increase speed (though not the asymptotic complexity):

1. The element in the middle is chosen as the pivot for partition. This choice improves the expected running time if the array is already partially ordered. The worst case for this implementation arises when each call to partition happens to pick the maximum or minimum element as the pivot.

1'. Alternatively, the pivot can be chosen as the median of the first, middle and last elements. Maxim Razin: the number of comparisons then decreases by a factor of about 7/6.
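A sketch of the median-of-three choice in C, assuming int elements and the same lb/ub conventions as the listings in this section; medianOfThree is a hypothetical helper, not part of Niemann's code:

```c
typedef int item;      /* assumption: same element type as in the listings */
typedef int tblIndex;  /* assumption: same index type as in the listings */

/* Return the index (lb, the middle, or ub) of the median of
 * a[lb], a[mid] and a[ub]. */
tblIndex medianOfThree(const item *a, tblIndex lb, tblIndex ub) {
    tblIndex m = lb + ((ub - lb) >> 1);
    if (a[lb] < a[m]) {
        if (a[m] < a[ub]) return m;          /* a[lb] < a[m] < a[ub] */
        return (a[lb] < a[ub]) ? ub : lb;    /* a[m] is the largest */
    } else {
        if (a[lb] < a[ub]) return lb;        /* a[m] <= a[lb] < a[ub] */
        return (a[m] < a[ub]) ? ub : m;      /* a[lb] is the largest */
    }
}
```

In the recursive partition function above, it could replace the line p = lb + ((ub - lb) >> 1); with p = medianOfThree(a, lb, ub);, leaving the rest of partition unchanged, since partition works with any pivot index in [lb, ub].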

2. Insertion sort is called for short arrays. Because of recursion and other overhead, quicksort is not so quick on short arrays. Therefore, if a segment has about a dozen elements or fewer (ub - lb <= 12 in the code above), insertion sort is called instead. The threshold value is not critical; it depends strongly on the quality of the generated code.

3. If the last statement of a function is a call to that same function, this is called tail recursion. It makes sense to replace it with iteration; the stack is then used more efficiently.

4. After partitioning, the smaller segment is sorted first. This also makes better use of the stack, since short segments are sorted quickly and need a shallower stack. The memory requirement drops from O(n) to O(log n).

The following example, an implementation of the standard C library function qsort, uses many of these improvements.

Standard implementation: iterative QuickSort
------------------------------------------------

#include <limits.h>   /* CHAR_BIT */
#include <stddef.h>   /* size_t */
#include <string.h>   /* memcpy */

#define MAXSTACK (sizeof(size_t) * CHAR_BIT)

static void exchange(void *a, void *b, size_t size) {
    unsigned char *pa = (unsigned char *) a;
    unsigned char *pb = (unsigned char *) b;
    size_t i;

    /******************
     *  Exchange a, b *
     ******************/

    /* swap int-sized chunks first (memcpy avoids misaligned access) */
    for (i = sizeof(int); i <= size; i += sizeof(int)) {
        int t;
        memcpy(&t, pa, sizeof(int));
        memcpy(pa, pb, sizeof(int));
        memcpy(pb, &t, sizeof(int));
        pa += sizeof(int);
        pb += sizeof(int);
    }
    /* then swap the remaining bytes */
    for (i = i - sizeof(int) + 1; i <= size; i++) {
        unsigned char t = *pa;
        *pa++ = *pb;
        *pb++ = t;
    }
}

void qsort(void *base, size_t nmemb, size_t size,
        int (*compar)(const void *, const void *)) {
    void *lbStack[MAXSTACK], *ubStack[MAXSTACK];
    int sp;
    size_t offset;

    if (nmemb < 2) return;   /* nothing to sort */
    lbStack[0] = (char *) base;
    ubStack[0] = (char *) base + (nmemb - 1) * size;
    for (sp = 0; sp >= 0; sp--) {
        char *lb, *ub, *m;
        char *P, *i, *j;

        lb = lbStack[sp];
        ub = ubStack[sp];

        while (lb < ub) {

            /* select the pivot and swap it with the first element */
            offset = (ub - lb) >> 1;
            P = lb + offset - offset % size;
            exchange(lb, P, size);

            /* partition into two segments */
            i = lb + size;
            j = ub;
            while (1) {
                while (i < j && compar(lb, i) > 0) i += size;
                while (j >= i && compar(j, lb) > 0) j -= size;
                if (i >= j) break;
                exchange(i, j, size);
                j -= size;
                i += size;
            }

            /* the pivot belongs at j */
            exchange(lb, j, size);
            m = j;

            /* keep processing the smaller segment; push the larger one onto the stack */
            if (m - lb <= ub - m) {
                if (m + size < ub) {
                    lbStack[sp] = m + size;
                    ubStack[sp++] = ub;
                }
                ub = m - size;
            } else {
                if (m - size > lb) {
                    lbStack[sp] = lb;
                    ubStack[sp++] = m - size;
                }
                lb = m + size;
            }
        }
    }
}

5. Description and source code of HeapSort (pyramid sorting).

A: Kantor Ilia - Niklaus Wirth

A heap (pyramid) is a sequence of elements

  h_l, h_(l+1), ..., h_r
     such that
  h_i <= h_(2i)
  h_i <= h_(2i+1)
  for every i = l, ..., r/2.
  Geometric interpretation of the heap:

                         h1
                  /             \
               h2                 h3
            /      \           /      \
          h4        h5       h6        h7
         /  \      /  \     /  \      /  \
        h8   h9  h10  h11 h12  h13  h14  h15

      For the sequence 06 42 12 55 94 18 44:

                         06
                  /             \
               42                 12
            /      \           /      \
          55        94       18        44

Adding an element to a ready heap

1. The new element x is placed at the top of the tree.

2. Look at the left and right children of x and choose the smaller one.

3. If this element is smaller than x, swap them and go to step 2. Otherwise the procedure ends.

Phase 1: building the heap

Let the array h1 ... hn be given. Clearly, the elements h(n/2+1) ... hn already form the "bottom row" of the heap, since there are no indices i, j in this range with j = 2*i (or j = 2*i + 1). That is, no ordering is required here.

At each step, a new element is added on the left and "sifted" into place. Here is an illustration of the process for a heap of 8 elements (the part to the right of // already forms a heap):

     44 55 12 42//94 18 06 67 
     44 55 12//42 94 18 06 67 
     44 55//06 42 94 18 12 67 
     44//42 06 55 94 18 12 67 
     //06 42 12 55 94 18 44 67 

Phase 2: sorting

To sort the elements, we perform a series of sifting steps; after each step, the next element is taken from the top of the heap. At each step we take the last element x of the heap, put the top element of the heap in its place, and sift x down to its proper place. This takes n - 1 steps. An example:

      06 42 12 55 94 18 44 67// 
      12 42 18 55 94 67 44//06 
      18 42 44 55 94 67//12 06 
      42 55 44 67 94//18 12 06 
      44 55 94 67//42 18 12 06 
      55 67 94//44 42 18 12 06 
      67 94//55 44 42 18 12 06 
      94//67 55 44 42 18 12 06 
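The two phases traced above can be reproduced with a short C sketch. It is a translation under stated assumptions, not Wirth's code: it uses 0-based indices (the children of h[i] are h[2i+1] and h[2i+2]) and a min-heap, exactly as in the illustrations, so the result comes out in descending order. The function names are hypothetical:

```c
#include <stddef.h>

/* Sift h[l] down within the heap h[l..r] (r is the last index): keep moving
 * the smaller child up until x is not greater than both children. */
static void sift(int *h, size_t l, size_t r) {
    size_t i = l, j = 2 * l + 1;
    int x = h[i];
    while (j <= r) {
        if (j < r && h[j + 1] < h[j]) j++;   /* pick the smaller child */
        if (x <= h[j]) break;                /* x is in its place */
        h[i] = h[j];
        i = j;
        j = 2 * i + 1;
    }
    h[i] = x;
}

/* Phase 1: h[n/2..n-1] are leaves already; sift the others, right to left. */
void buildHeap(int *h, size_t n) {
    size_t l = n / 2;
    while (l > 0) {
        l--;
        sift(h, l, n - 1);
    }
}

/* Phase 2: swap the top with the last heap element, shrink the heap, sift.
 * With a min-heap the array ends up in descending order. */
void heapSortDesc(int *h, size_t n) {
    if (n < 2) return;
    buildHeap(h, n);
    for (size_t r = n - 1; r > 0; r--) {
        int t = h[0];
        h[0] = h[r];
        h[r] = t;
        sift(h, 0, r - 1);
    }
}
```

Running buildHeap on 44 55 12 42 94 18 06 67 gives 06 42 12 55 94 18 44 67, and heapSortDesc then yields 94 67 55 44 42 18 12 06, matching both traces.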

So we have obtained the sorted sequence, only in reverse order. By reversing the comparisons, we obtain the Heapsort procedure in Pascal given below, which sorts in ascending order.

A nice property of this method is that the average number of moves is about (n log n)/2, and deviations from this value are relatively small.

{Sort the global array a[1..n] of type 'item' by the field 'key' (max-heap)}

procedure Heapsort;
 var l, r: index; x: item;

procedure sift;
 label 13;
 var i, j: index;
begin i := l; j := 2*i; x := a[i];
 while j <= r do
 begin if j < r then
  if a[j].key < a[j+1].key then j := j+1;
  if x.key >= a[j].key then goto 13;
  a[i] := a[j]; i := j; j := 2*i
 end;
 13: a[i] := x
 end;

 begin l := (n div 2) + 1; r := n;
  while l > 1 do
   begin l := l - 1; sift
   end;

  while r > 1 do
   begin x := a[1]; a[1] := a[r]; a[r] := x;
     r := r - 1; sift
 end
end {heapsort}

6. Characteristics of QuickSort, HeapSort and InsertSort. Which is better?

A: Kantor Ilia
Straight insertion.

Overall speed is O(n^2), but because the method is so simple, it is the fastest for small arrays (12-20 elements).

Natural behavior. New elements are easy to add. Because of these properties, it is extremely well suited to lists.

The sort requires no additional memory.

Quicksort.

Overall speed is O(n log n). The O(n^2) case is theoretically possible, but extremely improbable (probability on the order of 1 / n^(log n)).

In general, it is the fastest comparison sort for randomly ordered arrays of more than 50 elements.

Unnatural behavior: an almost sorted array takes as long to sort as a completely unordered one.

The iterative variant needs O(log n) memory, the recursive one up to O(n).

Heapsort.

About 1.5 times slower than quicksort. There is no degenerate worst case: it is always O(n log n). In practice, its building blocks are often used in related tasks.

Unnatural behavior.

The main advantage: the sort requires no additional memory.

7. Which sort is the fastest?

A: Kantor Ilia

Let's settle this once and for all. There are two kinds of sorting:

 Distribution sorting and its variants  |  Comparison sorting
                                        |
 Based on the fact that the number of   |  Uses only the ability to compare
 possible key values is finite.         |  keys directly, and their ordering.
                                        |  Quicksort, Heapsort...

For comparison sorts, it was proved long ago that no method can be faster than O(n log n) in the worst case.

For distribution sorts, the bound is O(n). It is impossible to sort faster than that.
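The O(n log n) bound for comparison sorts follows from a standard counting argument (sketched here; it is not spelled out in the original FAQ). A comparison sort must be able to produce each of the n! orderings of its input, and h yes/no comparisons can distinguish at most 2^h cases, so

```latex
2^{h} \ge n! \quad\Longrightarrow\quad
h \ge \log_2 n! \ge \log_2 \left(\frac{n}{2}\right)^{n/2} = \frac{n}{2}\log_2\frac{n}{2}
```

which grows as n log n. A distribution sort is not bound by this argument because it never compares keys; it uses their digits directly as array indices.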

 So, the fastest sorts are:
    Quicksort (comparison sort),
    Radix sort (distribution sort),
           and their young hybrid:
    Multiway Quicksort (MQS), a quicksort for composite keys,
        for example, for strings.

In general, you have to look at the data and the resources available, but in typical cases the methods listed above are the fastest solutions.

8. What is byte, digital, radix or distribution sorting?

A: Kantor Ilia

Unfortunately or fortunately, the unit of information in modern computers can take only 2 values: 1 and 0.

So any computer data can also take only a finite number of values, since it consists of a finite number of bits ;-)).

Suppose each key consists of at most k bytes (although something other than a byte could just as well be taken as the sorting element). k must be known in advance, before sorting.

Let m be the radix of the data, i.e. the number of possible values of one element of a key.

If we sort words, the sorting element is a letter and m = 33 (the size of the Russian alphabet). If the longest word has 10 letters, k = 10.

Normally we will sort data by keys of k bytes, with m = 256.


Let source be an array of n elements, one byte each.

As an example, you can write the array source = {7, 9, 8, 5, 4, 7, 7} on a sheet of paper and carry out all the operations by hand, taking m = 10 (keys 0 to 9).

I. Build the distribution table. It has m (= 256) entries and is filled as follows:

for (i = 0; i < 256; i++) distr[i] = 0;
for (i = 0; i < n; i++) distr[source[i]] = distr[source[i]] + 1;

For our example we get distr = {0, 0, 0, 0, 1, 1, 0, 3, 1, 1}; that is, the i-th element of distr[] is the number of keys with value i.

     Now fill the index table:

int index[256];
index[0] = 0;
for (i = 1; i < 256; i++) index[i] = index[i-1] + distr[i-1];

index[i] now holds the number of elements that precede key i in the sorted array, i.e. the position at which the first element with key i will be placed.

For example, index[8] = 5: the sorted array begins 4, 5, 7, 7, 7, 8, so the 8 goes to position 5 (counting from 0).

     Finally, fill the newly created array sorted of size n:

for (i = 0; i < n; i++)
     {
      sorted[index[source[i]]] = source[i];
      // advance index[] past the element just placed, so that
      // equal keys are placed one after another (this keeps the sort stable):
      index[source[i]] = index[source[i]] + 1;
     }
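Put together, the three fragments above form one function. A minimal sketch, assuming one-byte keys stored as unsigned char; the name countingSortBytes is hypothetical:

```c
#include <stddef.h>

/* Distribution (counting) sort of one-byte keys from source[] into sorted[]. */
void countingSortBytes(const unsigned char *source, unsigned char *sorted, size_t n) {
    size_t distr[256] = {0};
    size_t index[256];
    size_t i;

    /* distribution table: distr[v] = number of keys equal to v */
    for (i = 0; i < n; i++) distr[source[i]]++;

    /* index table: index[v] = position of the first key equal to v */
    index[0] = 0;
    for (i = 1; i < 256; i++) index[i] = index[i - 1] + distr[i - 1];

    /* place every element; advancing index[] keeps equal keys in input order */
    for (i = 0; i < n; i++) sorted[index[source[i]]++] = source[i];
}
```

For source = {7, 9, 8, 5, 4, 7, 7} it produces 4 5 7 7 7 8 9, using three linear passes and one pass over the 256 table entries.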

     So we have learned to sort bytes in O(n) time. From bytes to strings
and numbers is just one step. Suppose each number consists of k bytes.

  Let's work in the decimal system and sort ordinary numbers
 (m = 10).

      First we sort by the lowest digit, then by the next one, and once more:

           Unsorted   By units   By tens   By hundreds
              523        523       523         088
              153        153       235         153
              088        554       153         235
              554        235       554         523
              235        088       088         554

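And so we have sorted in O(k*n) steps. Each pass must be stable, so that the order established by the lower digits is preserved; the distribution pass above is. If the number of possible distinct keys does not greatly exceed their total number, digit-by-digit sorting turns out to be much faster than even quicksort!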
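The decimal example above can be sketched in C as follows, assuming non-negative numbers with at most k decimal digits; digitPass and lsdRadixSort are hypothetical names. Each pass is the same stable distribution step as for bytes, with m = 10:

```c
#include <stddef.h>

/* One stable distribution pass: sorts in[0..n-1] into out[] by the decimal
 * digit (x / divisor) % 10, where divisor is 1, 10, 100, ... */
void digitPass(const unsigned *in, unsigned *out, size_t n, unsigned divisor) {
    size_t count[10] = {0}, index[10];
    size_t i;
    for (i = 0; i < n; i++) count[(in[i] / divisor) % 10]++;
    index[0] = 0;
    for (i = 1; i < 10; i++) index[i] = index[i - 1] + count[i - 1];
    for (i = 0; i < n; i++) out[index[(in[i] / divisor) % 10]++] = in[i];
}

/* LSD radix sort of numbers with at most k decimal digits: k passes,
 * lowest digit first. tmp must hold n elements; the result ends up in a. */
void lsdRadixSort(unsigned *a, unsigned *tmp, size_t n, int k) {
    unsigned divisor = 1;
    for (int d = 0; d < k; d++) {
        digitPass(a, tmp, n, divisor);
        for (size_t i = 0; i < n; i++) a[i] = tmp[i];  /* copy back for simplicity */
        divisor *= 10;
    }
}
```

One call to digitPass with divisor 1 on 523 153 088 554 235 gives the second column of the table (523 153 554 235 088); three passes give the sorted column.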

An implementation in C++ for long int keys. I did not write it myself; I found it on the Internet.

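 #include <iostream>
 #include <cstdlib>
 #include <cstring>

 // One distribution pass: stably copies source[0..N-1] into dest,
 // ordered by byte number `byte` (0 = least significant) of each key.
 void radix (int byte, long N, long *source, long *dest)
 {
  long count [256];
  long index [256];
  std::memset (count, 0, sizeof (count));
  for (long i = 0; i < N; i++) count [(source [i] >> (byte * 8)) & 0xff]++;
  index [0] = 0;
  for (int i = 1; i < 256; i++) index [i] = index [i-1] + count [i-1];
  for (long i = 0; i < N; i++)
   dest [index [(source [i] >> (byte * 8)) & 0xff]++] = source [i];
 }

 // Four passes, least significant byte first. Assumes non-negative keys
 // that fit in 32 bits; after the fourth pass the result is back in source.
 void radixsort (long *source, long *temp, long N)
 {
  radix (0, N, source, temp);
  radix (1, N, temp, source);
  radix (2, N, source, temp);
  radix (3, N, temp, source);
 }

 // Fill data with non-negative 31-bit pseudo-random values
 // (RAND_MAX may be as small as 32767, so two calls are combined).
 void make_random (long *data, long N)
 {
  for (long i = 0; i < N; i++)
   data [i] = ((long) (std::rand () & 0x7fff) << 16) | (std::rand () & 0xffff);
 }

 long data [10000];
 long temp [10000];

 int main ()
 {
  make_random (data, 10000);
  radixsort (data, temp, 10000);
  for (int i = 0; i < 100; i++) std::cout << data [i] << '\n';
  return 0;
 }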

9. Which is faster: distribution sort or QuickSort?

A: Kantor Ilia

When the number of possible distinct keys does not greatly exceed their total number, distribution sort is faster.

When there are many distinct keys and the array is fairly small, quicksort is faster.

10. I have a big file. How do I sort it?

A: Kantor Ilia - Tomas Niemann

Polyphase sorting

This kind of sorting belongs to the so-called merge sorts. Merging is the process of combining several ordered sequences (runs) into one.

 Example for 3 runs merged onto a 4th:

   Run 1   Run 2   Run 3   Output
   3 7 9   2 4 6   1 5 8
   3 7 9   2 4 6   5 8     1
   3 7 9   4 6     5 8     1 2
   7 9     4 6     5 8     1 2 3
   7 9     6       5 8     1 2 3 4
   7 9     6       8       1 2 3 4 5
   7 9             8       1 2 3 4 5 6
   9               8       1 2 3 4 5 6 7
   9                       1 2 3 4 5 6 7 8
                           1 2 3 4 5 6 7 8 9

Each time we take the smallest of the first elements of the runs.

Thus each merge operation requires n element moves, where n is the total number of elements in the runs.
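A minimal sketch of merging k sorted runs in memory, assuming int keys and runs given as arrays; mergeRuns is a hypothetical name, and a linear scan over the run heads stands in for reading from tapes:

```c
#include <stddef.h>

/* Merge k sorted runs into out[]: each step takes the smallest of the runs'
 * current first elements. runs[r] points to run r, len[r] is its length,
 * pos[] is scratch space for k read positions. Returns elements written. */
size_t mergeRuns(const int **runs, const size_t *len, size_t *pos, size_t k, int *out) {
    size_t written = 0;
    for (size_t r = 0; r < k; r++) pos[r] = 0;
    for (;;) {
        size_t best = k;                        /* k means "no run has elements left" */
        for (size_t r = 0; r < k; r++)
            if (pos[r] < len[r] &&
                (best == k || runs[r][pos[r]] < runs[best][pos[best]]))
                best = r;
        if (best == k) break;                   /* all runs are exhausted */
        out[written++] = runs[best][pos[best]++];   /* exactly one move per element */
    }
    return written;
}
```

With the runs 3 7 9, 2 4 6 and 1 5 8 from the example it writes 1 2 3 4 5 6 7 8 9, moving each of the 9 elements once.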

Suppose N tapes are available: N - 1 input tapes and one empty tape. We merge runs from the input tapes onto the output tape until one of the input tapes becomes empty. That tape then becomes the new output tape, and the former output tape becomes an input.

Here is an example of sorting with six tapes holding 65 runs in total. The tapes are denoted f1 ... f6; the table shows the number of runs on each tape after each pass.

    Tape   f1   f2   f3   f4   f5   f6

           16   15   14   12    8    0
            8    7    6    4    0    8
            4    3    2    0    4    4
            2    1    0    2    2    2
            1    0    1    1    1    1
            0    1    0    0    0    0

At every moment, the remaining tapes are merged onto the empty one, so the number of passes required is approximately log_N n. In this example, the initial distribution of runs was chosen artificially. For an ideal merge, the initial numbers of runs must be the sums of n - 1, n - 2, ..., 1 consecutive Fibonacci numbers of order n - 2, where n here denotes the number of tapes.
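The ideal distributions can also be generated level by level. The sketch below assumes the usual recurrence for polyphase merging (as given by Wirth): start with one run on each input tape, and at each new level set a'[i] = a[0] + a[i+1], with the last input tape getting a[0]. idealDistribution is a hypothetical name:

```c
#include <stddef.h>

/* Ideal number of initial runs on each of the tapes-1 input tapes at merge
 * level `level` (>= 1); a[] must hold tapes-1 entries. */
void idealDistribution(size_t tapes, int level, unsigned long *a) {
    size_t n = tapes - 1;                      /* number of input tapes */
    for (size_t i = 0; i < n; i++) a[i] = 1;   /* level 1: one run per tape */
    for (int l = 1; l < level; l++) {
        unsigned long a0 = a[0];
        /* a[i+1] is still the old value when a[i] is updated */
        for (size_t i = 0; i + 1 < n; i++) a[i] = a0 + a[i + 1];
        a[n - 1] = a0;
    }
}
```

For six tapes, level 5 gives 16 15 14 12 8, the first row of the table (65 runs); level 4 gives 8 8 7 6 4, which is the second row of the table up to the rotation of the tapes.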

      Fibonacci numbers of order p are defined as follows:
 f_{i+1}(p) = f_i(p) + f_{i-1}(p) + ... + f_{i-p}(p)   for i >= p,
 f_p(p)     = 1,
 f_i(p)     = 0                                        for 0 <= i < p.

Clearly, the ordinary Fibonacci numbers are those of order 1.
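The definition translates directly into C. A sketch, assuming n < 64 so that a fixed-size array suffices (fibOrder is a hypothetical name):

```c
/* n-th Fibonacci number of order p, computed from the definition:
 * f_i = 0 for 0 <= i < p, f_p = 1, f_i = f_{i-1} + ... + f_{i-1-p} for i > p. */
unsigned long fibOrder(int p, int n) {
    unsigned long f[64];
    if (p < 0 || n < 0 || n >= 64) return 0;   /* outside this sketch's range */
    for (int i = 0; i <= n; i++) {
        if (i < p) {
            f[i] = 0;
        } else if (i == p) {
            f[i] = 1;
        } else {
            f[i] = 0;
            for (int k = i - 1 - p; k <= i - 1; k++)   /* the previous p+1 terms */
                f[i] += f[k];
        }
    }
    return f[n];
}
```

fibOrder(1, 10) is 55, an ordinary Fibonacci number. For p = 4, the values f_5 ... f_9 are 1, 2, 4, 8, 16, and 15 = 8 + 4 + 2 + 1 is the sum of four consecutive ones, as in column f2 of the six-tape table.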

Therefore, if the actual number of runs is not ideal, we assume the existence of dummy runs, so that the dummy and real runs together add up to an ideal number.

Initially, all the data is on one tape. The tape is read, and runs (segments) are distributed onto the other tapes available in the system. Once the initial runs have been created, they are merged as described above. One method for creating the initial runs is to read a portion of the records into memory, sort them, and write the result to a tape. Replacement selection lets us obtain longer runs. This algorithm works with a buffer allocated in main memory. First we fill the buffer. Then we repeat the following steps until the input is exhausted:

  • Select the record with the smallest key whose value is >= the key of the last record written.
  • If all keys in the buffer are less than the last key written, we have reached the end of the run. Select the record with the smallest key as the first element of the next run.
  • Write the selected record.
  • Replace the selected and written record with a new record from the input file.

The following table illustrates replacement selection on a very small file.

The beginning of the file is on the right. To keep the example simple, the buffer holds only 2 records; in real tasks, of course, it holds thousands. We load the buffer in step B and write the record with the smallest key (6) in step C. It is replaced with the next record from the input file (key 8). In step D we write the record with the smallest key >= 6, which is key 7, and replace it with the next input record, key 4. The process continues until step F, where the last key written is 8 and all keys in the buffer are less than 8. At this point we finish the current run and start forming the next one.

                      Step   Input         Buffer   Output
                       A     5-3-4-8-6-7
                       B     5-3-4-8       6-7
                       C     5-3-4         8-7      6
                       D     5-3           8-4      7-6
                       E     5             3-4      8-7-6
                       F                   5-4      3 | 8-7-6
                       G                   5        4-3 | 8-7-6
                       H                            5-4-3 | 8-7-6
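The table can be reproduced with a minimal sketch that uses a linear scan of the buffer rather than the selection tree of the implementation below. Assumptions: int keys, a buffer of BUFSZ = 2 records, and each buffered record tagged with the run it belongs to; replacementSelection and BUFSZ are hypothetical names:

```c
#define BUFSZ 2   /* hypothetical buffer capacity, as in the example */

/* Replacement selection over in[0..n-1] (keys in reading order).
 * out[] receives the keys as written, runOf[] the run number of each.
 * Returns the number of keys written. */
int replacementSelection(const int *in, int n, int *out, int *runOf) {
    int key[BUFSZ], run[BUFSZ];
    int used = 0, next = 0, written = 0;

    /* fill the buffer; these records all belong to run 0 */
    while (used < BUFSZ && next < n) {
        key[used] = in[next++];
        run[used] = 0;
        used++;
    }

    while (used > 0) {
        /* smallest key of the lowest run in the buffer; when no record of the
         * current run is left, this automatically starts the next run */
        int best = 0;
        for (int k = 1; k < used; k++)
            if (run[k] < run[best] || (run[k] == run[best] && key[k] < key[best]))
                best = k;

        int curRun = run[best], lastKey = key[best];
        out[written] = lastKey;             /* write the selected record */
        runOf[written] = curRun;
        written++;

        if (next < n) {
            /* replace it with the next input record; that record can join the
             * current run only if its key is >= the last key written */
            key[best] = in[next++];
            run[best] = (key[best] >= lastKey) ? curRun : curRun + 1;
        } else {
            /* input exhausted: fill the freed slot with the last buffered record */
            used--;
            key[best] = key[used];
            run[best] = run[used];
        }
    }
    return written;
}
```

Fed the keys in reading order 7, 6, 8, 4, 3, 5 (the beginning of the file is on the right), it writes 6 7 8 as run 0 and 3 4 5 as run 1, matching steps C to H.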

Note that we keep records in the buffer until it is time to write them to the output file. If the input is random, the average run length is about twice the buffer size. However, if the data is already somewhat ordered, runs can be very long. This is why this method is generally more efficient than intermediate, partial sorts.

When reading the next record from the input file, we look for the smallest key that is >= the last key written. We could, of course, simply scan the records in the buffer. However, if there are thousands of them, the search time may become unacceptably long. If we use binary trees at this stage, only about log n comparisons are needed.

Implementation

In the ANSI C implementation of external sorting, the function makeRuns calls readRec to read the next record. readRec uses replacement selection (with a binary tree) to obtain the required record, and makeRuns distributes the records according to the Fibonacci series. If the number of runs is not a Fibonacci number, empty (dummy) runs are added at the beginning of each file. Then the function mergeSort, which performs a polyphase merge of the runs, is called.

/* external sort */

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* name template for temporary files (8.3 format) */
#define FNAME "_sort%03d.dat" 
#define LNAME 13 

/* comparison operators */ 
#define compLT(x, y) ((x) < (y))
#define compGT(x, y) ((x) > (y))

/* define the records to be sorted */
#define LRECL 100 
typedef int keyType; 
typedef struct recTypeTag { 
    keyType key;/* sort key */ 
    #if LRECL 
        char data [LRECL-sizeof (keyType)];/* remaining fields */ 
    #endif 
} recType; 

typedef enum {false, true} bool; 

typedef struct tmpFileTag { 
    FILE *fp;/* file pointer */ 
    char name [LNAME];/* file name */ 
    recType rec;/* last record read */ 
    int dummy;/* number of dummy runs */ 
    bool eof;/* end-of-file flag */ 
    bool eor;/* end-of-run flag */ 
    bool valid;/* true if rec is valid */ 
    int fib;/* ideal Fibonacci number */ 
} tmpFileType; 

static tmpFileType ** file;/* array of temporary file info */ 
static int nTmpFiles;/* number of temporary files */ 
static char *ifName;/* input file name */ 
static char *ofName;/* output file name */ 

static int level;/* merge level */ 
static int nNodes;/* number of nodes in the selection tree */ 

void deleteTmpFiles (void) { 
    int i; 

    /* delete merge files and free resources */ 
    if (file) { 
        for (i = 0; i <nTmpFiles; i ++) { 
            if (file [i]) { 
                if (file [i]-> fp) fclose (file [i]-> fp); 
                if (*file [i]-> name) remove (file [i]-> name); 
                free (file [i]); 
            }
        }
        free (file); 
    }
}

void termTmpFiles (int rc) { 

    /* clean up files */ 
    remove (ofName); 
    if (rc == 0) { 
        int fileT; 

        /* file [T] contains results */ 
        fileT = nTmpFiles - 1; 
        fclose (file [fileT]-> fp); file [fileT]-> fp = NULL; 
        if (rename (file [fileT]-> name, ofName)) { 
            perror ("io1"); 
            deleteTmpFiles (); 
            exit (1); 
        }
        *file [fileT]-> name = 0; 
    }
    deleteTmpFiles (); 
}

void cleanExit (int rc) { 

    /* clean up temporary files and exit */ 
    termTmpFiles (rc); 
    exit (rc); 
}

void *safeMalloc (size_t size) { 
    void *p; 

    /* allocate zero-initialized memory, exiting on failure */ 
    if ((p = calloc (1, size)) == NULL) { 
        printf ("error: malloc failed, size = %lu\n", (unsigned long) size); 
        cleanExit (1); 
    }
    return p; 
}

void initTmpFiles (void) { 
    int i; 
    tmpFileType *fileInfo; 

    /* initialize merge files */ 
    if (nTmpFiles <3) nTmpFiles = 3; 
    file = safeMalloc (nTmpFiles * sizeof (tmpFileType *)); 
    fileInfo = safeMalloc (nTmpFiles * sizeof (tmpFileType)); 
    for (i = 0; i <nTmpFiles; i ++) { 
        file [i] = fileInfo + i; 
        sprintf (file [i]-> name, FNAME, i); 
        if ((file [i]-> fp = fopen (file [i]-> name, "w+b")) == NULL) { 
            perror ("io2"); 
            cleanExit (1); 
        }
    }
}

recType *readRec (void) { 

    typedef struct iNodeTag {/* internal node */ 
        struct iNodeTag *parent;/* parent of internal node */ 
        struct eNodeTag *loser;/* external loser */ 
    } iNodeType; 

    typedef struct eNodeTag {/* external node */ 
        struct iNodeTag *parent;/* parent of external node */ 
        recType rec;/* input record */ 
        int run;/* run number */ 
        bool valid;/* input record is valid */ 
    } eNodeType; 

    typedef struct nodeTag { 
        iNodeType i;/* internal node */ 
        eNodeType e;/* external node */ 
    } nodeType; 

    static nodeType *node;/* array of selection-tree nodes */ 
    static eNodeType *win;/* new winner */ 
    static FILE *ifp;/* input file */ 
    static bool eof;/* true at end of input file */ 
    static int maxRun;/* maximum run number */ 
    static int curRun;/* current run number */ 
    iNodeType *p;/* pointer to internal nodes */ 
    static bool lastKeyValid;/* true if lastKey is valid */ 
    static keyType lastKey;/* last key written */ 

    /* read the next record using replacement selection */ 

    /* check for the first call */ 
    if (node == NULL) { 
        int i; 

        if (nNodes <2) nNodes = 2; 
        node = safeMalloc (nNodes * sizeof (nodeType)); 
        for (i = 0; i <nNodes; i ++) { 
            node [i].i.loser = &node [i].e; 
            node [i].i.parent = &node [i/2].i; 
            node [i].e.parent = &node [(nNodes + i)/2].i; 
            node [i].e.run = 0; 
            node [i].e.valid = false; 
        }
        win = &node [0].e; 
        lastKeyValid = false; 

        if ((ifp = fopen (ifName, "rb")) == NULL) { 
            printf ("error: file %s, unable to open\n", ifName); 
            cleanExit (1); 
        }
    }

    while (1) { 

        /* replace the previous winner with a new record */ 
        if (! eof) { 
            if (fread (&win->rec, sizeof (recType), 1, ifp) == 1) { 
                if ((! lastKeyValid || compLT (win-> rec.key, lastKey)) 
                && (++ win-> run> maxRun)) 
                    maxRun = win-> run; 
                win-> valid = true; 
            } else if (feof (ifp)) { 
                fclose (ifp); 
                eof = true; 
                win-> valid = false; 
                win-> run = maxRun + 1; 
            } else { 
                perror ("io4"); 
                cleanExit (1); 
            }
        } else { 
            win-> valid = false; 
            win-> run = maxRun + 1; 
        }

        /* replay the matches between the winner and the losers on its path to the root */ 
        p = win-> parent; 
        do { 
            bool swap; 
            swap = false; 
            if (p-> loser-> run < win-> run) { 
                swap = true; 
            } else if (p-> loser-> run == win-> run) { 
                if (p-> loser-> valid && win-> valid) { 
                    if (compLT (p-> loser-> rec.key, win-> rec.key)) 
                        swap = true; 
                } else { 
                    swap = true; 
                }
            }
            if (swap) { 
                /* p should be the winner */ 
                eNodeType *t; 

                t = p-> loser; 
                p-> loser = win; 
                win = t; 
            }
            p = p-> parent; 
        } while (p != &node [0].i); 

        /* end of run? */
        if (win->run != curRun) {
            /* here win->run == curRun + 1 */
            if (win->run > maxRun) {
                /* end of output */
                free (node);
                return NULL;
            }
            curRun = win->run;
        }

        /* output the top of the tree */
        if (win->run) {
            lastKey = win->rec.key;
            lastKeyValid = true;
            return &win->rec;
        }
    }
}

void makeRuns (void) { 
    recType *win;       /* the winner */
    int fileT;          /* last file */
    int fileP;          /* next-to-last file */
    int j;              /* selects file [j] */


    /* Make the initial runs using replacement selection.
     * The runs are spread over the files according to
     * a Fibonacci distribution.
     */

    /* initialize the file structures */
    fileT = nTmpFiles - 1;
    fileP = fileT - 1;
    for (j = 0; j < fileT; j++) {
        file [j]->fib = 1;
        file [j]->dummy = 1;
    }
    file [fileT]->fib = 0;
    file [fileT]->dummy = 0;

    level = 1; 
    j = 0; 


    win = readRec (); 
    while (win) { 
        bool anyrun; 

        anyrun = false; 
        for (j = 0; win && j <= fileP; j++) {
            bool run;

            run = false;
            if (file [j]->valid) {
                if (!compLT (win->key, file [j]->rec.key)) {
                    /* append to the existing run */
                    run = true;
                } else if (file [j]->dummy) {
                    /* start a new run in place of one dummy run */
                    file [j]->dummy--;
                    run = true;
                }
            } else {
                /* first run in this file */
                file [j]->dummy--;
                run = true;
            }

            if (run) {
                anyrun = true;

                /* write out the whole run */
                while (1) {
                    if (fwrite (win, sizeof (recType), 1, file [j]->fp) != 1) {
                        perror ("io3");
                        cleanExit (1);
                    }
                    file [j]->rec.key = win->key;
                    file [j]->valid = true;
                    if ((win = readRec ()) == NULL) break;
                    if (compLT (win->key, file [j]->rec.key)) break;
                }
            }
        }

        /* no file can take another run: go up one level */
        if (!anyrun) {
            int t;
            level++;
            t = file [0]->fib;
            for (j = 0; j <= fileP; j++) {
                file [j]->dummy = t + file [j+1]->fib - file [j]->fib;
                file [j]->fib = t + file [j+1]->fib;
            }
        }
    }
}

void rewindFile (int j) { 
    /* rewind file [j] and read its first record */
    file [j]->eor = false;
    file [j]->eof = false;
    rewind (file [j]->fp);
    if (fread (&file [j]->rec, sizeof (recType), 1, file [j]->fp) != 1) {
        if (feof (file [j]->fp)) {
            file [j]->eor = true;
            file [j]->eof = true;
        } else { 
            perror ("io5"); 
            cleanExit (1); 
        }
    }
}

void mergeSort (void) { 
    int fileT; 
    int fileP; 
    int j; 
    tmpFileType *tfile; 

    /* polyphase merge sort */

    fileT = nTmpFiles - 1;
    fileP = fileT - 1;

    /* prime the input files: read the first record of each */
    for (j = 0; j < fileT; j++) {
        rewindFile (j);
    }

    /* each pass through this loop merges one run */
    while (level) { 
        while (1) { 
            bool allDummies; 
            bool anyRuns; 

            /* scan for runs */
            allDummies = true;
            anyRuns = false;
            for (j = 0; j <= fileP; j++) {
                if (!file [j]->dummy) {
                    allDummies = false;
                    if (!file [j]->eof) anyRuns = true;
                }
            }

            if (anyRuns) { 
                int k; 
                keyType lastKey; 

                /* merge one run from file [0]..file [fileP] into file [fileT] */

                while (1) {
                    /* each pass through this loop writes 1 record to file [fileT] */

                    /* find the smallest key */
                    k = -1;
                    for (j = 0; j <= fileP; j++) {
                        if (file [j]->eor) continue;
                        if (file [j]->dummy) continue;
                        if (k < 0 ||
                        (k != j && compGT (file [k]->rec.key, file [j]->rec.key)))
                            k = j;
                    }
                    if (k < 0) break;

                    /* write record [k] to file [fileT] */
                    if (fwrite (&file [k]->rec, sizeof (recType), 1,
                            file [fileT]->fp) != 1) {
                        perror ("io6");
                        cleanExit (1);
                    }

                    /* replace record [k] */
                    lastKey = file [k]->rec.key;
                    if (fread (&file [k]->rec, sizeof (recType), 1,
                            file [k]->fp) == 1) {
                        /* check for the end of the run on file [k] */
                        if (compLT (file [k]->rec.key, lastKey))
                            file [k]->eor = true;
                    } else if (feof (file [k]->fp)) {
                        file [k]->eof = true;
                        file [k]->eor = true;
                    } else {
                        perror ("io7");
                        cleanExit (1);
                    }
                }

                /* adjust the dummy-run counts */
                for (j = 0; j <= fileP; j++) {
                    if (file [j]->dummy) file [j]->dummy--;
                    if (!file [j]->eof) file [j]->eor = false;
                }

            } else if (allDummies) { 
                for (j = 0; j <= fileP; j++)
                    file [j]->dummy--;
                file [fileT]->dummy++;
            }

            /* end of run */
            if (file [fileP]->eof && !file [fileP]->dummy) {
                /* completed a Fibonacci level */
                level--;
                if (!level) {
                    /* done: file [fileT] contains the sorted data */
                    return;
                }

                /* fileP is exhausted: reopen it as the new output file */
                fclose (file [fileP]->fp);
                if ((file [fileP]->fp = fopen (file [fileP]->name, "w+b"))
                        == NULL) {
                    perror ("io8");
                    cleanExit (1);
                }
                file [fileP]->eof = false;
                file [fileP]->eor = false;

                rewindFile (fileT);

                /* f [0], f [1]..., f [fileT] <- f [fileT], f [0]..., f [T-1] */
                tfile = file [fileT];
                memmove (file + 1, file, fileT * sizeof (tmpFileType *));
                file [0] = tfile;

                /* start new runs */
                for (j = 0; j <= fileP; j++)
                    if (!file [j]->eof) file [j]->eor = false;
            }
        }

    }
}


void extSort (void) { 
    initTmpFiles (); 
    makeRuns (); 
    mergeSort (); 
    termTmpFiles (0); 
}

int main (int argc, char *argv []) { 

    /* command line:
     *
     *   ext ifName ofName nTmpFiles nNodes
     *
     *   ext in.dat out.dat 5 2000
     *       reads in.dat, sorts it using 5 temporary files and 2000 nodes,
     *       and writes the result to out.dat
     */
    if (argc != 5) {
        printf ("%s ifName ofName nTmpFiles nNodes\n", argv [0]); 
        cleanExit (1); 
    }

    ifName = argv [1]; 
    ofName = argv [2]; 
    nTmpFiles = atoi (argv [3]); 
    nNodes = atoi (argv [4]); 

    printf ("extSort: nFiles = %d, nNodes = %d, lrecl = %d\n",
        nTmpFiles, nNodes, (int) sizeof (recType));

    extSort (); 

    return 0; 
}
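The run-numbering rule in readRec is easier to follow without the loser-tree pointers. The following is a minimal in-memory sketch, not part of the program above: it assumes integer keys and replaces the tournament tree with a binary min-heap ordered first by run number and then by key, the same order in which readRec picks its winner. The names replacementSelection, Slot, slotLess, siftDown and MAX_MEM are hypothetical, and m plays the role of nNodes.

```c
#include <assert.h>

#define MAX_MEM 64    /* hypothetical limit on records held in memory */

/* One record in memory: its key and the run it has been assigned to. */
typedef struct {
    int run;
    int key;
} Slot;

/* Order used to pick the winner: lower run first, then smaller key. */
static int slotLess (Slot a, Slot b)
{
    return a.run < b.run || (a.run == b.run && a.key < b.key);
}

/* Restore the min-heap property below position i. */
static void siftDown (Slot h[], int n, int i)
{
    for (;;) {
        int l = 2 * i + 1, r = l + 1, s = i;
        if (l < n && slotLess (h[l], h[s])) s = l;
        if (r < n && slotLess (h[r], h[s])) s = r;
        if (s == i) return;
        Slot t = h[i]; h[i] = h[s]; h[s] = t;
        i = s;
    }
}

/* Replacement selection over in[0..n-1] with room for m records.
 * Writes the records in output order to out[] and their run numbers
 * (1, 2, ...) to run[]; returns the number of runs produced. */
int replacementSelection (const int in[], int n, int m, int out[], int run[])
{
    Slot heap[MAX_MEM];
    int size = 0, next = 0, nOut = 0, i;

    assert (m > 0 && m <= MAX_MEM);

    /* fill memory with the first m records, all assigned to run 1 */
    while (size < m && next < n) {
        heap[size].run = 1;
        heap[size].key = in[next++];
        size++;
    }
    for (i = size / 2 - 1; i >= 0; i--)
        siftDown (heap, size, i);

    while (size > 0) {
        Slot top = heap[0];              /* current winner */
        out[nOut] = top.key;
        run[nOut] = top.run;
        nOut++;

        if (next < n) {
            /* replace the winner with the next input record; a key smaller
             * than the one just written cannot extend this run */
            int key = in[next++];
            heap[0].key = key;
            heap[0].run = key < top.key ? top.run + 1 : top.run;
        } else {
            /* input exhausted: move the last record to the root */
            heap[0] = heap[--size];
        }
        siftDown (heap, size, 0);
    }
    return nOut > 0 ? run[nOut - 1] : 0;
}
```

With the input 5 1 9 3 7 2 8 4 6 and room for three records, the sketch writes 1 3 5 7 8 9 as run 1 and 2 4 6 as run 2, so the first run is twice as long as the memory that produced it. The heap is used here only because it is short to write; the loser tree in readRec does the same job by comparing the new record with one stored loser per level on its way to the root.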
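The level-up step at the end of makeRuns computes, for each level, how many runs each input file should hold (fib) and how many of them are still missing and must be counted as dummy runs (dummy). The sketch below is not part of the program above; it repeats only the fib recurrence so that the target distribution can be checked for a given number of temporary files. The names fibDistribution and MAX_FILES are hypothetical.

```c
#include <assert.h>

#define MAX_FILES 16    /* hypothetical upper limit on temporary files */

/* Hypothetical helper, not part of ext.c: fills fib[0..nFiles-1] with the
 * number of runs each file should hold at the given level of a polyphase
 * merge and returns the total. fib[0..nFiles-2] are the input files;
 * fib[nFiles-1] is the output file and always holds 0 runs, as in makeRuns. */
int fibDistribution (int nFiles, int level, int fib[])
{
    int fileT = nFiles - 1;    /* index of the output file */
    int total = 0;
    int j, lev;

    assert (nFiles >= 3 && nFiles <= MAX_FILES && level >= 1);

    /* level 1: one run on every input file, as makeRuns initializes it */
    for (j = 0; j < fileT; j++)
        fib[j] = 1;
    fib[fileT] = 0;

    /* each new level: fib[j] = old fib[0] + old fib[j+1]; the loop runs
     * upwards, so fib[j+1] still holds its old value when it is read */
    for (lev = 1; lev < level; lev++) {
        int t = fib[0];
        for (j = 0; j < fileT; j++)
            fib[j] = t + fib[j + 1];
    }

    for (j = 0; j < fileT; j++)
        total += fib[j];
    return total;
}
```

For example, with nTmpFiles = 4 (three input files and one output file) the totals for levels 1 to 5 are 3, 5, 9, 17 and 31 runs, and at level 3 the input files should hold 4, 3 and 2 runs. With 3 files the totals are the Fibonacci numbers 2, 3, 5, 8, ..., which is where the distribution gets its name. When the input runs out between two of these totals, makeRuns leaves the difference in the dummy counters.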