:::: MENU ::::

Friday, May 1, 2009

I'm editing in a link to Adam Machanic's blog on this. In the comments on this topic here you will see there are imperfections found in my methods. Reading Adam's blog shows this in more detail.
http://sqlblog.com/blogs/adam_machanic/archive/2009/04/26/faster-more-scalable-sqlclr-string-splitting.aspx

Thanks Adam!

I wrote this short CLR split function awhile back based on a few other articles I read when 2005 was released. I decided to play with it today and see if I could put it with the SQL split solutions.

Let's get the basics out of the way on SQL CLR. SQL CLR is only good once it's in memory. The CLR function split basically won over the T-SQL split functions after it was cached. This is a critical variable to consider when thinking CLR vs. T-SQL options on coding. If you are doing heavy manipulation of data and heavy math, CLR will typically help you, but you should be very careful with CLR and memory management. You can run your server resources out and literally stop functionality. I highly recommend reading MrDenny's blog on CLR here. Denny touches on important topics on when to use CLR and why you shouldn't. After that, look into how SQL Server 32bit, 32bit with AWE and 64bit versions handle memory. Each handles memory differently. AWE enalbed instances will probably be the one that will cause you more headaches then the rest. I had severe memory issues a few months ago on a production database server that forced restarts nightly until I fixed the problem. I analyzed the problem and it came to be several factors that caused it and SQL CLR memory was one of those factors. Here is my chance to thank mrdenny and ptheriault again for the assisatnce on that strange problem.

I went out and google'd "fast split function t-sql". Found a few and tested them against the CLR split method. I found a dozen or so split functions that looked good. I still went with a numbers table one after testing them out next to each other. Here is one of the functions I used. If you have a better one, post it in the comments and I can edit the post.

tsqlLine number On/Off | Show/Hide | Select all

  1. ALTER FUNCTION [dbo].[Split] (
  2. @List VARCHAR(7998), --The delimited list
  3. @Del CHAR(1) = ',' --The delimiter
  4. )
  5. RETURNS @T TABLE (Item VARCHAR(7998))
  6. AS
  7. BEGIN
  8. DECLARE @WrappedList VARCHAR(8000), @MaxItems INT
  9. SELECT @WrappedList = @Del + @List + @Del, @MaxItems = LEN(@List)
  10.  
  11. INSERT INTO @T (Item)
  12. SELECT SUBSTRING(@WrappedList, Number + 1, CHARINDEX(@Del, @WrappedList, Number + 1) - Number - 1)
  13. FROM dbo.Numbers n
  14. WHERE n.Number <= LEN(@WrappedList) - 1
  15. AND SUBSTRING(@WrappedList, n.Number, 1) = @Del
  16.  
  17. RETURN
  18. END

Code is hidden, SHOW

Here is my CLR split

csharpLine number On/Off | Show/Hide | Select all

  1. using System;
  2. using System.Data;
  3. using System.Collections;
  4. using System.Data.SqlClient;
  5. using System.Data.SqlTypes;
  6. using Microsoft.SqlServer.Server;
  7.  
  8. public partial class UserDefinedFunctions
  9. {
  10.     [SqlFunction(Name = "CLR_Split",
  11.     FillRowMethodName = "FillRow",
  12.     TableDefinition = "id nvarchar(10)")]
  13.  
  14.     public static IEnumerable SqlArray(SqlString str, SqlChars delimiter)
  15.     {
  16.         if (delimiter.Length == 0)
  17.             return new string[1] { str.Value };
  18.         return str.Value.Split(delimiter[0]);
  19.     }
  20.  
  21.     public static void FillRow(object row, out SqlString str)
  22.     {
  23.         str = new SqlString((string)row);
  24.     }
  25. };

Code is hidden, SHOW

I loaded a text file with a huge amount of delimited data to really get a gauge on time this would take. The string is basically, "data%data%data%data%data" and on. Around 600 indexes. I restarted my local instance of SQL Server 2005 that I did these on to ensure you can see CLR before cache and after.

More