Tech

Guides
 

Write a common sorting algorithm

By Tim Chapman, Special to ZDNet Asia
Thursday, April 10, 2008 12:20 PM
A sorting algorithm routine helps users to sort a set of data, typically stored in an array, a set or other data structure.

You are probably familiar with the built-in sorting abilities of SQL Server for columnar data, but what if you need to sort character strings stored in SQL Server fields?

Even though this may not be a common need for most users, this issue may come up from time to time.

I'll demonstrate how to write a common sorting algorithm using SQL Server TSQL code.

Sorting algorithms
A sorting algorithm is a routine that sorts a set of data, typically stored in an array, a set or other data structure.

There are many types of sorting algorithms--some of which are very fast, sophisticated and complex, and some are pretty basic.

It's best to write the more complex algorithms in iterative languages, such as C++, where it is easy to define data structures to hold the data, and the constructs to sort are readily available.

You can write the more simple algorithms in TSQL, although some iteration is required.

In this example, I will implement a variation of the Bubble Sort sorting algorithm, which is one of the easiest sorting algorithms to implement because of its "brute force" nature. It loops through the characters, checking each one against the one next to it until it has determined that the strings (or numbers) are in sorted order.

Because of how it's implemented, Bubble Sort is one of the worst performing sorting algorithms, and it is only used for sorting small sets of data.

My example will contain small sets, so this algorithm will be fine, although I wouldn't run it on a field in a table with millions of records.

In "algorithm terms" Bubble Sort is O(n2), which means the maximum number of operations to sort the set is the number of items in the set squared. For my example, this number will be a little bit worse because of the manner in which I have to do some of the swapping in TSQL.

If this algorithm was implemented in C++, you could write it in such a way as to performing at O(n2).

The example
Before I get to the sorting function, I need to write a helper function to assist me with the sorting. This function will do the work of swapping two characters based on specific positions in the character string.

If I have the string called elephant, and I need to swap the second and fourth characters of the string, my swap function will produce epelhant. It may not make much sense to do this right now, but it is vital to the way in which our Bubble Sort algorithm works. (The example in this article applies to SQL Server 2000 and later.)

CREATE FUNCTION dbo.udf_Swap
 (
     @fullstring VARCHAR(1000),
     @charlocation1 TINYINT,
     @charlocation2 TINYINT
 )
 RETURNS VARCHAR(1000)
 AS
 BEGIN
         DECLARE @returnval varchar(1000)         

         DECLARE @begin VARCHAR(1000), @middle VARCHAR(1000), 
         @end VARCHAR(1000)
         DECLARE @firstchar CHAR(1), @secondchar CHAR(1), @len INT         

         SET @fullstring = LTRIM(RTRIM(@fullstring))
         SET @len = LEN(@fullstring)         

     IF @charlocation1 > @len OR @charlocation2 > @len
         SET @returnval = @fullstring
         ELSE
         BEGIN
                SET @firstchar = SUBSTRING(@fullstring, 
                @charlocation1, 1)
                SET @secondchar = SUBSTRING(@fullstring, 
                @charlocation2, 1)
                SET @begin = LEFT(@fullstring, 
                (@charlocation1-1))
                SET @middle = SUBSTRING(@fullstring, 
                @charlocation1+1, (@charlocation2-@charlocation1)-1)
                SET @end = SUBSTRING(@fullstring, 
                @charlocation2+1, @len)
                SET @returnval = @begin + @secondchar + @middle + 
                @firstchar + @end
         END         

     RETURN(@returnval)
 END

Now that I have my helper function, I can write the sorting function. This function, udf_SortString, accepts a string of characters and returns them in sorted order. If there are numbers, they will be returned first, followed by any alpha characters. If the string has one or more words, the words will be sorted individually.

There are two loops in this function that do the sorting work. The first loop's job is to loop until characters have been swapped. Once a set of characters have been swapped, it is time to start at the beginning and do another scan. This loop will end once no characters have been swapped, and the string is in sorted order.

The second loop examines each character in the string and compares it to the next character in the string. If the characters are in sorted order, it moves to the next character. If the characters are not sorted, it uses the function I created earlier and swaps them. Then the loop starts over, and the characters are examined again and again until no more swapping occurs. Once a pass is made and there is no swapping, the string is in sorted order.

CREATE FUNCTION udf_SortString
 (
     @string VARCHAR(1000)
 )
 RETURNS VARCHAR(1000)
 AS
 BEGIN
     DECLARE @len TINYINT
     DECLARE @i TINYINT
     DECLARE @currentchar CHAR(1)
     DECLARE @swapped BIT
     DECLARE @begin BIT
     DECLARE @nextchar CHAR(1)         

     SET @begin = 1
     SET @len = LEN(@string)
     SET @i = 1         

     WHILE @begin = 1 OR @swapped = 1
     BEGIN
         SET @swapped = 0
         SET @i = 1
         SET @begin = 0         

         WHILE @i <= @len
         BEGIN         

             SET @currentchar = SUBSTRING(@string, @i, 1)
             SET @nextchar = SUBSTRING(@string, @i + 1, 1)
             IF @currentchar > @nextchar AND (@nextchar > '')
             BEGIN
                 SET @string = dbo.udf_swap(@string, @i, @i + 1)
                 SET @swapped = 1
             END
             SET @i = @i + 1         

         END
     END
     RETURN(@string)
 END
 GO

All I need to do is call my function to sort a string:

SELECT dbo.udf_SortString('asdofimasdfasdf23rasdf')

Sort it out
Most database users don't think about how much work goes into sorting data and searching for data because the database provides the functionality. By understanding how this works, it gives me a greater appreciation of the functionality.

While you may not apply the information you learned in this example for most users, I wanted to show how you can write your own string sorting functions in TSQL if the need arises. Also, I wanted to provide an overview. Hope you learned about sorting algorithms if you weren't already aware of them.

Tim Chapman, an SQL Server database administrator and consultant who works for a bank, has more than eight years of IT experience, and he is a Microsoft certified Database Developer and Administrator. If you would like to contact him, please e-mail him at chapman.tim@gmail.com.



WORTHWHILE?

0

0 votes
Blog

Talkback 0 comments

There are currently no comments for this post.


Guest user

Guest user

Level: 
Joined: —
Already a member? Log in »



 

Loading...

Whitepapers/Case Studies

Downloads

Database News



Tech Jobs Now!