Quicksort and Binary Search algorithms in C++



This post is about one of the best pieces of coursework I did during my computer engineering course. I'm really proud of it. It was a cornerstone of my programming career and helped me choose the area I would focus my efforts on from then on. I was in the 5th of 10 terms (more precisely, in the 3rd of the course's 5 years). The discipline was Programming Languages.

This subject is fantastic and is used extensively throughout the computer science field.

First I'll give a short description of the Quicksort and Binary Search algorithms, and then I'll present the work that my dear brother in faith Wellington Magalhães Leite and I did.

Quicksort

Quicksort is a well-known sorting algorithm developed by C. A. R. Hoare. In practice it is typically significantly faster than other O(n log n) sorting algorithms, because its inner loop can be implemented efficiently on most architectures.
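To make the idea concrete, here is a minimal sketch of quicksort on a vector of ints. It is not the MSDN code presented later in this post: it assumes the Lomuto partition scheme with the last element as pivot, and the names QuickSortInts and LomutoPartition are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Lomuto partition: uses v[hi] as the pivot, moves every element smaller
// than the pivot to the left part, then places the pivot between the two
// parts and returns its final index.
std::ptrdiff_t LomutoPartition(std::vector<int>& v, std::ptrdiff_t lo, std::ptrdiff_t hi)
{
    const int pivot = v[hi];
    std::ptrdiff_t store = lo;  // next slot for an element smaller than the pivot
    for (std::ptrdiff_t i = lo; i < hi; ++i)
    {
        if (v[i] < pivot)
        {
            std::swap(v[i], v[store]);
            ++store;
        }
    }
    std::swap(v[store], v[hi]);  // the pivot lands in its sorted position
    return store;
}

// Sorts v[lo..hi] in place: partition, then sort each side recursively.
void QuickSortInts(std::vector<int>& v, std::ptrdiff_t lo, std::ptrdiff_t hi)
{
    if (lo >= hi)
        return;  // zero or one element: already sorted
    assert(lo >= 0 && hi < static_cast<std::ptrdiff_t>(v.size()));  // bounds check in debug builds
    const std::ptrdiff_t p = LomutoPartition(v, lo, hi);
    QuickSortInts(v, lo, p - 1);
    QuickSortInts(v, p + 1, hi);
}
```

For example, QuickSortInts(v, 0, static_cast<std::ptrdiff_t>(v.size()) - 1) sorts the whole vector v. Lomuto's scheme is easy to follow, though it typically performs more swaps than a Hoare-style partition.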

Binary Search

A binary search algorithm (or binary chop) is a technique for finding a particular value in a sorted list. It selects the middle element of the current range, compares it with the target value, and determines whether the selected value is greater than, less than, or equal to the target. If the guess is too high, the middle element becomes the new upper bound of the range; if it is too low, it becomes the new lower bound. Repeating this step halves the range each time, until the target value is found or the range is empty.
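The halving strategy just described can be sketched as an iterative loop over a sorted std::vector<int>. This is an illustration under that assumption (ints rather than the text lines our program uses), and BinarySearchIterative is a hypothetical name; it returns the index of the target, or -1 if the target is absent.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Iterative binary search over a vector sorted in ascending order.
// Returns the index of target, or -1 if it is not present.
std::ptrdiff_t BinarySearchIterative(const std::vector<int>& sorted, int target)
{
    std::ptrdiff_t lo = 0;
    std::ptrdiff_t hi = static_cast<std::ptrdiff_t>(sorted.size()) - 1;
    while (lo <= hi)
    {
        // Middle of the current range; lo + (hi - lo) / 2 avoids overflow
        const std::ptrdiff_t mid = lo + (hi - lo) / 2;
        if (sorted[mid] == target)
            return mid;
        if (sorted[mid] > target)
            hi = mid - 1;  // guess too high: discard the upper half
        else
            lo = mid + 1;  // guess too low: discard the lower half
    }
    return -1;  // the range is empty: target not found
}
```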

Our paper

Our paper is entitled Quicksort and Binary Search Algorithms. You can get a copy at the end of this post.

Without further ado, here is its abstract:

Sorting and searching algorithms are a core part of computer science. They are used throughout programming work whenever you need to sort a set of data or search for a specific record (key) in it.

Quicksort is one of the fastest sorting algorithms (hence "quick") and is widely used on large data sets, where it performs really well.

Binary search is one of the fastest searching algorithms and is applied to sorted data. It halves the search space in each iteration, hence its name (binary).

In this paper we present the intrinsic nature of each algorithm as well as a working implementation of both in the C++ programming language.

Keywords: quicksort, binary search, sorting algorithms, searching algorithms, c++ programming language

CONTENTS
1 INTRODUCTION
  1.1 Objective
  1.2 Definition
      1.2.1 Sorting algorithms
      1.2.2 Searching algorithms
2 DEVELOPMENT
  2.1 Quicksort algorithm (swapping and partitioning)
      2.1.1 Detailed steps
  2.2 Binary search algorithm
  2.3 Studying the efficiency of the methods
      2.3.1 The big O notation
      2.3.2 Quicksort efficiency
            2.3.2.1 The best and the worst case
            2.3.2.2 Comparison with other sorting algorithms
      2.3.3 Binary search efficiency
3 APPLICATION
  3.1 Quicksort implementation
  3.2 Binary search implementation
4 CONCLUSION
5 REFERENCES

Words of wisdom

As I mentioned, during the 5th term of the computer engineering course our teacher Marcus Vinicius Carvalho Guelpeli selected some sorting and searching algorithms to assign to the class as coursework.

The coursework was to be done in pairs, and each pair had to select a sorting algorithm and a searching algorithm and write a paper about them. We selected quicksort and binary search. The teacher warned us that these weren't the easy ones. I just thought: that's what I want. I don't want the easy ones. Why? Because if you only take the easy problems, you'll never understand something that demands a deeper approach, and every time you're given a difficult task you'll tend to refuse it. What's the best thing to do? Just accept the challenge and go for it. Chances are you'll succeed. That's just what happened with us.

We didn't just write about quicksort and binary search; we implemented them and presented them to the class in a PowerPoint presentation. The teacher liked it so much that we got the highest possible grade! :-) How did we feel in the end? Amazing: the work had been done, and we had learned a lot from it. That's what a college, or even an online computer science degree program, is supposed to do: give you the subjects and motivate you, teaching the basics so that you can dig into the more difficult aspects of the subject on your own.

So what's next? Well, I'll explain how we implemented the quicksort and the binary search algorithms.

Quicksort and Binary search algorithms implementation in C++

One thing I've always heard about is code reuse: you look for something already implemented so that you don't have to reinvent the wheel. Of course, you'll complement or even adapt the available code to your situation. That's what we did. I found some quicksort code at MSDN and implemented the binary search myself. Unfortunately, that MSDN page isn't available anymore; it's been three years since I visited it.
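Today the C++ standard library itself covers this particular case: std::sort and std::binary_search in <algorithm> sort a range and test whether a value is in it. The sketch below is not what we used back then; SortAndFind is a hypothetical helper, and it assumes the lines are held in a std::vector<std::string>.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Sorts the lines in place and reports whether key is among them.
// std::string's operator< compares character by character, like strcmp,
// so the resulting order matches the one our char** version produces.
bool SortAndFind(std::vector<std::string>& lines, const std::string& key)
{
    std::sort(lines.begin(), lines.end());                       // O(n log n) comparisons
    return std::binary_search(lines.begin(), lines.end(), key);  // O(log n) comparisons
}
```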

I wanted a way to measure the elapsed time so that we could compare the efficiency of both methods when fed different input data sets. In this case the input is simply a .txt file with one number per line, but it could hold any data you want. For each test case we passed a text file with a different quantity of random numbers. For example, in one test case we passed a file named 2500.txt, which contains 2500 random numbers; in another we passed 7500.txt, and so on. Doing so, we could compare how well the algorithms performed.

To generate the random numbers we used an Excel spreadsheet with the formula =RAND()*1000000. For each data set we generated new numbers and copied and pasted them into the text files that are the input for our program. During coursework like this one you learn something everywhere, even a new Excel formula. It's really good. ;-)
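The same kind of input file can also be generated from C++ instead of Excel. Here is a minimal sketch, assuming the C++11 <random> header; WriteRandomNumbers is a hypothetical function, and its seed parameter makes each data set reproducible.

```cpp
#include <cassert>
#include <fstream>
#include <iomanip>
#include <random>
#include <string>

// Writes `count` pseudo-random numbers in the range of =RAND()*1000000,
// one per line, to the file at `path`. Returns false on an I/O failure.
bool WriteRandomNumbers(const std::string& path, int count, unsigned seed)
{
    std::ofstream out(path);
    if (!out)
        return false;

    std::mt19937 engine(seed);                                    // same seed, same data set
    std::uniform_real_distribution<double> dist(0.0, 1000000.0);  // same range as RAND()*1000000
    out << std::fixed << std::setprecision(6);                    // plain decimals, no exponents
    for (int i = 0; i < count; ++i)
        out << dist(engine) << '\n';
    return !out.fail();
}
```

For example, WriteRandomNumbers("2500.txt", 2500, 1u) would produce a file comparable to the one used in our tests.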

Again, I searched for a timing class that I could reuse with the code, and sure enough I found one. In the end I didn't use it directly, but it taught me how to measure time in C++. It's amazing how fast you can implement something: much of what you need in programming has already been implemented. You just have to search for it, as I think you're doing right now! You searched for the subject of this post, and here you are looking at an implementation. Try to learn from it; don't just copy the entire work and think that you understand it. Try to understand what the code is doing, and dive into the theory, because it explains the inner essence.
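Our program measures time with clock() from <ctime>. Since C++11 the <chrono> header offers another option; the sketch below is an alternative rather than what we used, and ElapsedMicroseconds is a hypothetical helper. std::chrono::steady_clock is monotonic, although its actual resolution is implementation-defined.

```cpp
#include <cassert>
#include <chrono>
#include <functional>

// Runs `work` once and returns the elapsed wall-clock time in microseconds,
// measured with std::chrono::steady_clock.
long long ElapsedMicroseconds(const std::function<void()>& work)
{
    const auto start = std::chrono::steady_clock::now();
    work();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::microseconds>(stop - start).count();
}
```

Wrapping only the sort or search call in the lambda keeps file I/O out of the measurement, just as the clock() calls in our program do.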

The code that follows is well commented, which is something every developer should do. We worked on this code three years ago, and today it's difficult to remember every step I took; the comments helped me remember almost everything.

Below is the quicksort method we borrowed from MSDN (adapted to fit our case). Note the use of the Partition method, which is explained in the accompanying paper:

// QuickSort implementation
void QuickSort(char** szArray, int nLower, int nUpper)
{
 // Check for non-base case
 if(nLower < nUpper)
 {
   // Split and sort partitions
   int nSplit = Partition(szArray, nLower, nUpper);
   QuickSort(szArray, nLower, nSplit - 1);
   QuickSort(szArray, nSplit + 1, nUpper);
 }
}

// QuickSort partition implementation
int Partition (char** szArray, int nLower, int nUpper)
{
 // Pivot with first element
 int nLeft = nLower + 1;
 char* szPivot = szArray[nLower];
 int nRight = nUpper;

 // Partition array elements
 char* szSwap;
 while(nLeft <= nRight)
 {
   // Find item out of place
   while(nLeft <= nRight && strcmp (szArray[nLeft], szPivot) <= 0)
     nLeft = nLeft + 1;
   while (nLeft <= nRight && strcmp (szArray[nRight], szPivot) > 0)
     nRight = nRight - 1;

   // Swap values if necessary
   if(nLeft < nRight)
   {
     szSwap = szArray[nLeft];
     szArray[nLeft] = szArray[nRight];
     szArray[nRight] = szSwap;
     nLeft = nLeft + 1;
     nRight = nRight - 1;
   }
 }

 // Move pivot element
 szSwap = szArray[nLower];
 szArray[nLower] = szArray[nRight];
 szArray[nRight] = szSwap;
 return nRight;
}
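Partition above, like BinarySearch, compares lines with strcmp, so the numbers in the input files are ordered as text rather than by value: "10" sorts before "9" because '1' < '9'. Sort and search use the same ordering, so lookups still work, but the output file is not in numeric order, and a key must be typed exactly as it appears in the file. If numeric order were wanted, a comparison like the hypothetical CompareAsNumbers below could replace strcmp in both places; this sketch assumes every line holds a number that strtod can parse.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>

// Compares two lines by numeric value instead of character by character.
// Returns a negative value, zero or a positive value, like strcmp.
int CompareAsNumbers(const char* szA, const char* szB)
{
    const double a = std::strtod(szA, nullptr);
    const double b = std::strtod(szB, nullptr);
    if (a < b)
        return -1;
    if (a > b)
        return 1;
    return 0;
}
```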

Now here is the binary search method we implemented:

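// Recursive binary search over the sorted range szArray[nLower..nUpper].
// Returns 1 if key is found and 0 otherwise.
int BinarySearch(char** szArray, char key[], int nLower, int nUpper)
{
 // Termination case: the range is empty, so the key is not present
 if(nLower > nUpper)
   return 0;

 // Middle index; written this way so nLower + nUpper cannot overflow
 int middle = nLower + (nUpper - nLower) / 2;
 int nCompare = strcmp(szArray[middle], key);

 if(nCompare == 0)
   return 1;

 if(nCompare > 0)
   // Middle element is greater than the key: search left
   return BinarySearch(szArray, key, nLower, middle - 1);

 // Middle element is less than the key: search right
 return BinarySearch(szArray, key, middle + 1, nUpper);
}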

Next come the function prototypes and the main entry point, which calls a menu. Depending on the user's input, the menu calls the quicksort and binary search methods:

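// Standard headers used by the program
#include <ctype.h>  // islower, toupper
#include <stdio.h>  // printf, fopen, fgets, fprintf, fclose
#include <stdlib.h> // system
#include <string.h> // strcmp, strcpy, strlen, strcspn, memcpy
#include <time.h>   // clock, clock_t, CLOCKS_PER_SEC
#include <conio.h>  // getche (Windows console input, non-standard)

// Function prototypes
void Menu(void);
void QuickSort(char** szArray, int nLower, int nUpper);
int Partition(char** szArray, int nLower, int nUpper);
int BinarySearch(char** szArray, char key[], int nLower, int nUpper);

// Application entry point (main must return int in standard C++)
int main()
{
 char op;

 do
 {
   Menu();
   printf("\n\nDo you want a new QuickSort? Y/N");
   op = getche();
   if(islower(op))
     op = toupper(op);
 }
 while(op == 'Y');

 return 0;
}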

// Reads a line from standard input into szBuffer and removes the trailing
// newline. fgets is used because, unlike gets, it never writes past the buffer.
static void ReadLine(char* szBuffer, int nBufferSize)
{
 if(fgets(szBuffer, nBufferSize, stdin) == NULL)
 {
   // End of input or read error: return an empty string
   szBuffer[0] = '\0';
   return;
 }
 szBuffer[strcspn(szBuffer, "\n")] = '\0';
}

void Menu(void)
{
 // Clear screen
 system("CLS");

 // Control execution time
 clock_t initial, final;

 // Print startup banner
 printf("\nQuickSort C++ Sample Application\n");
 printf("Copyright (c)2001-2002 Microsoft Corporation. All rights reserved.\n\n");
 printf("MSDN ACADEMIC ALLIANCE [http://www.msdnaa.net/]\n\n");
 printf("BinarySearch C++ Sample Application\n");
 printf("Copyright (c)2005 Leniel Braz de Oliveira Macaferi & Wellington Magalhaes Leite.\n");
 printf("UBM COMPUTER ENGINEERING - 5TH SEMESTER [http://www.ubm.br/]\n\n");

 // Describe program function
 printf("This program example demonstrates the QuickSort and BinarySearch algorithms by\n");
 printf("reading an input file, sorting its contents, writing them to a new file and\n");
 printf("searching on them.\n\n");

 // Prompt user for filenames
 char szSrcFile[1024], szDestFile[1024];
 printf("Source: ");
 ReadLine(szSrcFile, sizeof(szSrcFile));
 printf("Output: ");
 ReadLine(szDestFile, sizeof(szDestFile));

 // Open the source file; give up on this run if it cannot be read
 FILE* pStream = fopen(szSrcFile, "rt");
 if(pStream == NULL)
 {
   printf("\nCould not open the source file \"%s\".\n", szSrcFile);
   return;
 }

 // Read contents of source file, one line per array element
 const long nGrow = 8;
 long nAlloc = nGrow;
 long nSize = 0;
 char** szContents = new char* [nAlloc];
 char szSrcLine[1024];

 while(fgets(szSrcLine, sizeof(szSrcLine), pStream))
 {
   // Trim the trailing newline character
   szSrcLine[strcspn(szSrcLine, "\n")] = '\0';

   // Append a copy of the line to the array
   szContents[nSize] = new char [strlen(szSrcLine) + 1];
   strcpy(szContents[nSize], szSrcLine);
   nSize = nSize + 1;

   if(nSize % nGrow == 0)
   {
     // The array is full: grow it by nGrow slots
     char** szPrev = szContents;
     nAlloc += nGrow;
     szContents = new char* [nAlloc];
     memcpy(szContents, szPrev, nSize * sizeof(char*));
     delete[] szPrev; // memory from new[] must be released with delete[]
   }
 }
 fclose(pStream);

 initial = clock();

 // Pass to QuickSort function
 QuickSort(szContents, 0, nSize - 1);

 final = clock();

 // Write sorted lines
 pStream = fopen(szDestFile, "wt");
 if(pStream != NULL)
 {
   for(long nIndex = 0; nIndex < nSize; nIndex++)
   {
     // Write line to output file
     fprintf(pStream, "%s\n", szContents[nIndex]);
   }
   fclose(pStream);

   // Report program success
   printf("\nThe sorted lines have been written to the output file.\n\n");
 }
 else
 {
   printf("\nCould not open the output file \"%s\".\n\n", szDestFile);
 }

 // QuickSort execution time in seconds
 double duration = (double)(final - initial) / CLOCKS_PER_SEC;

 // \xE6 prints the micro sign (µ) in the console's code page 437
 printf("The QuickSort execution time was: %2.9lf s = %.0lf ms = %.0lf \xE6s\n\n", duration, duration * 1000, duration * 1000000);

 char op = '\0';

 do
 {
   printf("Do you want a BinarySearch to locate a specific key? Y/N");

   op = getche();
   if(islower(op))
     op = toupper(op);
   if(op == 'Y')
   {
     printf("\n\nType the key you want to search for: ");
     char key[1024];
     ReadLine(key, sizeof(key));

     // Time only the search itself
     initial = clock();
     int nFound = BinarySearch(szContents, key, 0, nSize - 1);
     final = clock();

     duration = (double)(final - initial) / CLOCKS_PER_SEC;

     printf("%s", nFound ? "\nKey found!\n\n" : "\nKey not found!\n\n");
     printf("The BinarySearch execution time was: %2.9lf s = %.0lf ms = %.0lf \xE6s\n\n", duration, duration * 1000, duration * 1000000);
   }
   else
   {
     // Deallocate entire array: each line, then the array of pointers
     for(long nIndex = 0; nIndex < nSize; nIndex++)
       delete[] szContents[nIndex];

     delete[] szContents;
     szContents = NULL;
   }
 }
 while(op == 'Y');
}
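Much of Menu is manual memory management: a char* array grown by nGrow slots with new[], memcpy and delete[]. In modern C++ a std::vector<std::string> filled with std::getline does the same job and frees itself. The sketch below assumes that style; ReadAllLines is a hypothetical helper that accepts any std::istream, such as an std::ifstream opened on the source file or an std::istringstream.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Reads every line of `in` into a vector, dropping the newline and any
// trailing '\r' left by Windows line endings. The vector grows as needed
// and releases its memory automatically when it goes out of scope.
std::vector<std::string> ReadAllLines(std::istream& in)
{
    std::vector<std::string> lines;
    std::string line;
    while (std::getline(in, line))
    {
        if (!line.empty() && line.back() == '\r')
            line.pop_back();
        lines.push_back(line);
    }
    return lines;
}
```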

Visual Studio C++ Console Application

You can get the project files at:
http://leniel.googlepages.com/QuicksortBinarySearchCPlusPlus.zip

Random number generator

You can get the spreadsheet responsible for this task at:
http://leniel.googlepages.com/QuicksortBinarySearchRandomNumGen.xls

How to use it?

To use the program:
  1. Enter the name of a file that contains unsorted data;
  2. You can use the sample files included in the .ZIP package, such as 1000.txt and 2500.txt;
  3. At the "Source" prompt, type: 1000.txt;
  4. At the "Output" prompt, type a name for the file that will receive the sorted data, e.g. sorted.txt;
  5. After the sorting process, choose whether to run a binary search. If yes, provide a value to search for. If not, choose whether to run a new quicksort.

Postscript:
- To generate random numbers, use the Random numbers generator.xls file;
- The file QuicksortBinarySearch.cpp contains the source code. It can be used freely, provided you credit the authors.

Efficiency comparison

For the sake of comparison, I've run some test cases with different input files. See the results in the table that follows:

Quicksort and Binary search performance

n      | File name  | File size (bytes) | Quicksort (ms) | Binary search (ms)
10000  | 10000.txt  | 122,880           | 16             | 0
25000  | 25000.txt  | 200,704           | 78             | 0
50000  | 50000.txt  | 401,408           | 219            | 0
75000  | 75000.txt  | 602,112           | 360            | 0
100000 | 100000.txt | 802,816           | 516            | 0

Note that when you run the program, quicksort appears to take longer than the milliseconds shown in the table above, but it doesn't. The program also has to read the input file and write the sorted data back to the output file; the clock() calls wrap only the QuickSort call, so the table reports the sorting time alone.
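All of these timings come from files of random numbers. Partition always takes szArray[nLower] as its pivot, so on an input file that is already sorted each partition splits off a single element: quicksort then performs O(n²) comparisons and recurses about n levels deep, which could exhaust the default stack for a file the size of 100000.txt. A common mitigation, sketched here as an assumption rather than part of our program, is median-of-three pivot selection; MedianOfThreeToFront is a hypothetical helper meant to be called at the start of Partition, before the pivot is read.

```cpp
#include <cassert>
#include <cstring>
#include <utility>

// Orders szArray[nLower], szArray[nMiddle] and szArray[nUpper] with strcmp,
// then swaps the median of the three into szArray[nLower], the position
// Partition reads its pivot from. Intended for nLower < nUpper.
void MedianOfThreeToFront(char** szArray, int nLower, int nUpper)
{
    const int nMiddle = nLower + (nUpper - nLower) / 2;

    // After these three steps: low <= middle <= upper
    if (std::strcmp(szArray[nMiddle], szArray[nLower]) < 0)
        std::swap(szArray[nMiddle], szArray[nLower]);
    if (std::strcmp(szArray[nUpper], szArray[nLower]) < 0)
        std::swap(szArray[nUpper], szArray[nLower]);
    if (std::strcmp(szArray[nUpper], szArray[nMiddle]) < 0)
        std::swap(szArray[nUpper], szArray[nMiddle]);

    // The median now sits at nMiddle; move it to the front as the pivot
    std::swap(szArray[nLower], szArray[nMiddle]);
}
```

On sorted input the median of three is the middle element, so each partition splits the range roughly in half instead of peeling off one element.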

For the binary search, I entered keys located at the beginning, in the middle and at the end of the sorted file. The time did not change: the binary search found every key faster than clock() could measure, so the reported time was 0 µs. I ran the tests on an AMD Athlon XP 2400 with 512 MB of RAM.
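The zero timings are consistent with how few steps binary search needs. Each probe of the middle element leaves at most half of the remaining range (rounded down), so a sorted array of n elements needs at most floor(log2 n) + 1 probes. The hypothetical helper below counts them by repeated halving:

```cpp
#include <cassert>

// Worst-case number of middle elements a binary search examines in a
// sorted array of n elements: each probe leaves at most n / 2 elements
// (integer division) in the larger remaining half, until none are left.
int WorstCaseProbes(long long n)
{
    assert(n >= 0);
    int probes = 0;
    while (n > 0)
    {
        ++probes;
        n /= 2;
    }
    return probes;
}
```

For the largest test file, WorstCaseProbes(100000) returns 17, so the search examines at most 17 lines out of 100000.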

See a screenshot of the last test case:

[Screenshot: output of the 100000.txt test case]

The paper

You can get a copy of the paper in the .PDF format at:
http://leniel.googlepages.com/QuicksortAndBinarySearchAlgorithms.pdf