SQL Server Regular Expressions – Replacements

2011-05-25 Pavel Pawlowski CLR, SQL Server CLR, RegEx, SQL Server

In my earlier post SQL Server 2005 and SQL Server 2008 Regular Expressions and pattern matching I was focusing on the Regular Expressions matching on the SQL Server using a CLR.

This post was focused only on the matches itself, but didn’t cover another great feature of the Regular Expressions and this feature is Replacements.

When using the replacements then first the regular expression is executed on the source string and then on one, several or all matches a replacement is applied and the result of such replacement is returned.

Here are the CLR functions which handle the replacements.

using System;
using System.Collections.Generic;
using System.Text;
using System.Data.SqlTypes;
using Microsoft.SqlServer.Server;
using System.Text.RegularExpressions;
using System.Collections;

/// <summary>
/// Regular expressions for SQL
/// </summary>
public class SQLRegEx
{
    #region Internal structures
    private struct RegExRowReplace
    {
        /// <summary>
        /// Replacement table row
        /// </summary>
        /// <param name="matchID">ID of the match. ID = 0 represents a complete source string</param>
        /// <param name="match">
        /// String representing match. For ID=0 is NULL as it is equal to original string. Null is returnet to limit transferring the whore source string
        /// </param>
        /// <param name="result">Result of the replacement for the matchID</param>
        public RegExRowReplace(int matchID, SqlChars match, SqlChars result)
        {
            MatchID = matchID;
            Match = match;
            Result = result;
        }

        public int MatchID;
        public SqlChars Match;
        public SqlChars Result;
    }

    #endregion

    #region RegEx functions

    /// <summary>
    /// Applies Regular Expression on the Source string, takes apropriate match and aplies e replace on it.
    /// </summary>
    /// <param name="sourceString">Source string on which the regular expression should be applied</param>
    /// <param name="pattern">Regular Expression pattern</param>
    /// <param name="replacement">Replacement to be used on match</param>
    /// <param name="matchId">ID of the Match to be returned. 0 replaces all matches in the sourceString</param>
    /// <returns>Value of the Group from within a Match</returns>
    [SqlFunction(IsDeterministic = true)]
    public static SqlChars RegExReplace(SqlString sourceString, SqlString pattern, SqlString replacement, int matchId)
    {
        Match m = null;
        Regex r = new Regex(pattern.Value, RegexOptions.Compiled);

        if (matchId == 0)
        {
            return new SqlChars(r.Replace(sourceString.Value, replacement.Value));
        }
        if (matchId == 1)
        {
            m = r.Match(sourceString.Value);
        }
        else if (matchId > 1)
        {
            MatchCollection mc = r.Matches(sourceString.Value);
            m = mc != null && mc.Count > matchId - 1 ? mc[matchId - 1] : null;
        }

        return m != null ? new SqlChars(m.Result(replacement.Value)) : SqlChars.Null;
    }


    /// <summary>
    /// Applies Regular Expression o the Source strings and return all matches and final results after replacement
    /// </summary>
    /// <param name="sourceString">Source string on which the regular expression should be applied</param>
    /// <param name="pattern">Regular Expression pattern</param>
    /// <param name="replacement">Replacement to be used on matches</param>
    /// <returns>IEnumerable</returns>
    [SqlFunction(FillRowMethodName = "FillRegExRowReplace")]
    public static IEnumerable RegExMatchesReplace(SqlString sourceString, SqlString pattern, SqlString replacement)
    {
        Regex r = new Regex(pattern.Value, RegexOptions.Compiled);
        int matchId = 0;

        foreach (Match m in r.Matches(sourceString.Value))
        {
            yield return new RegExRowReplace(++matchId, new SqlChars(m.Value), new SqlChars(m.Result(replacement.Value)));
        }
    }

    #endregion

    #region FillRow methods

    public static void FillRegExRowReplace(object obj, out int matchID, out SqlChars match, out SqlChars result)
    {
        RegExRowReplace row = (RegExRowReplace)obj;
        matchID = row.MatchID;
        match = row.Match;
        result = row.Result;
    }

    #endregion
}

RegExReplace

This is a scalar CLR function which handles a Regular Expression replacement. It allows to replace a single match from within a string or even replace all matches inside the string if matchID = 0 is passed as an argument.

RegExMatchesReplace

In contrast to RegExReplace the RegExMatchesReplace is a CLR table valued function, which returns all the matches as separated rows and also replacements. It returns a table with three columns MatchID which represents ID of the match from within the original string, Match which represents the particular RegEx match from within the string and Result, which is result of the replacement done on the Match.

So once the class is compiled into Assembly, we can register the functions inside SQL Server database.

CREATE ASSEMBLY [SQLRegEx]
AUTHORIZATION [dbo]
FROM 'C:CLRSQLRegEx.dll'
WITH PERMISSION_SET = SAFE

CREATE FUNCTION [dbo].[fn_RegExReplace](
    @sourceString [nvarchar](max),        --Source string to apply RegEx
    @pattern [nvarchar](4000),            --RegEx pattern to apply on the source string
    @replacement [nvarchar](400),        --Replacement pattern to apply on match
    @matchID [int] = 1                    --ID of the match to return. @matchID = 1 is the first match. IF @matchID=0 then all the patterns in the @sourceString are replaced
)        
RETURNS [nvarchar](4000)
WITH EXECUTE AS CALLER
AS 
EXTERNAL NAME [SQLRegEx].[SQLRegEx].[RegExReplace]
GO


CREATE FUNCTION [dbo].[fn_RegExMatchesReplace](
    @sourceString [nvarchar](max),  --Source string to apply RegEx
    @pattern [nvarchar](4000),        --RegEx to apply on the source string
    @replacement [nvarchar](4000)    --Replacement pattern to apply on matches
)        
RETURNS  TABLE (
    [matchId] [int] NULL,                --ID of returnet match
    [match] [nvarchar](4000) NULL,        --value of match
    [result] [nvarchar](4000) NULL        --Replacement result
) WITH EXECUTE AS CALLER
AS 
EXTERNAL NAME [SQLRegEx].[SQLRegEx].[RegExMatchesReplace]
GO

Testing the Replace method

Here are some sample queries so you can imagine how the functions work.

Replace year in all Dates with year 2011

WITH StringDates AS (
    SELECT '2010/09/3' AS StringDate UNION ALL
    SELECT '2010/07/09' UNION ALL
    SELECT '2009/1/15'
)
SELECT
    StringDate,
    dbo.fn_RegExReplace(StringDate, '(d{4})/(d{1,2})/(d{1,2})', '2011/$2/$3', 0)
FROM StringDates

Remove the values from the Key=Value pairs and leave only the Key values separated by semicolons

SELECT dbo.fn_RegExReplace(     'Key1=Val1;Key2=Val2;Key3=Val3;Key4=Val4',     's*?(w+?)s*?=s*?(.*?)(;|$)',     '$1$3',     0)

Get list of all dates from within a string and replace the year of each date by 2011

SELECT
*
FROM dbo.fn_RegExMatchesReplace(
    '2010/09/3;2010/07/09;2009/1/15', 
    '(d{4})/(d{1,2})/(d{1,2})', 
    '2011/$2/$3'
    )

From the examples we can see, that CLR give us a real power to using Regular Expressions on SQL Server. It allows us to make some complex transformation which could not be possible to do using a pure T-SQL and in case of transformation which can be done by pure T-SQL, mostly the CLR RegEx solution is much faster and straightforward.

Fastest CSV strings splitting using CLR (T-SQL vs. CLR revisited)

2010-10-06 Pavel Pawlowski CLR, T-SQL CLR, RegEx, SQL Server, String splitting, T-SQL

In one my previous blog post SQL Server – String splitting (T-SQL vs. CLR) I was comparing speed of T-SQL solution for string splitting vs. CLR RegEx solution. Although the CLR RegEx is fast enough, it isn’t the fastest solution for simple CSV string splitting. Also I will again compare it to the T-SQL solution.

In the mentioned post the T-SQL solution wasn’t usable for larger amount of CSV values, but after some investigations by Jeff Moden in post http://www.sqlservercentral.com/Forums/FindPost997236.aspx found a mistake we both made in the T-SQL Testing, and therefore I will post here also updated comparison to the T-SQL version

Fastest CLR version

Probably fastest CLR version for splitting sting is a CLR table-valued function which processes the whole string on character level and on a delimiter sends the results.

Here is one possible solution.

public class StringSplit
{
    private struct StrRow
    {
        public StrRow(int rowId, SqlChars value)
        {
            RowId = rowId;
            Value = value;
        }

        public int RowId;
        public SqlChars Value;

    }

    [SqlFunction(FillRowMethodName = "FillSplitString3")]
    public static IEnumerable SplitString3(SqlString sourceString, string delimiter, int maxLen)
    {
        char[] buffer = new char[maxLen];
        char delim = delimiter[0];
        int rowNumber = 0;
        int chars = 0;
        char[] finalString;

        foreach (char chr in sourceString.Value)
        {
            if (chr == delim)
            {
                finalString = new char[chars];
                Array.Copy(buffer, finalString, chars);
                yield return new StrRow(++rowNumber, new SqlChars(finalString));
                chars = 0;
            }
            else
            {
                buffer[chars++] = chr;
            }
        }
        if (chars > 0)
        {
            finalString = new char[chars];
            Array.Copy(buffer, finalString, chars);
            yield return new StrRow(++rowNumber, new SqlChars(finalString));
        }

    }

    [SqlFunction(FillRowMethodName = "FillSplitString3")]
    public static IEnumerable SplitString4(SqlString sourceString, string delimiter)
    {
        StringBuilder sb = new StringBuilder();
        char delim = delimiter[0];
        int rowNumber = 0;
        foreach (char chr in sourceString.Value)
        {
            if (chr == delim)
            {
                yield return new StrRow(++rowNumber, new SqlChars(sb.ToString()));
                sb = new StringBuilder(sb.Capacity);
            }
            else
            {
                sb.Append(chr);
            }
        }
        if (sb.Length > 0)
        {
            yield return new StrRow(++rowNumber, new SqlChars(sb.ToString()));
        }

    }

    public static void FillSplitString3(object obj, out int rowId, out SqlChars value)
    {
        StrRow r = (StrRow)obj;
        rowId = r.RowId;
        value = r.Value;
    }
}

CREATE FUNCTION dbo.fn_SplitString3(
  @sourceString nvarchar(max),
  @delimiter nchar(1),
  @maxLen int
)
RETURNS  TABLE (
    RowID int NULL,
    Value nvarchar(10) NULL
) WITH EXECUTE AS CALLER
AS
EXTERNAL NAME SQLRegEx.StringSplit.SplitString3
GO

This function takes three parameters. First the source string to be split, delimiter and maxLen, which is maximum length for an item in the CSV List. It is used to allocate buffer. And e.g.. for integer values it will be 10 as positive integer will have maximum of 10 digits. It is possible to write this function also without this parameter, but I’ve added it because of speed, as it doesn’t require buffer reallocations.

I will compare the speed also to the CLR RegEx version. I will use the function mentioned in my previous post.

For CLR RegEx we will use a simple Regular expression ”d+” as it is enough for the integer values delimited by commas.

As T-SQL candidate for speed comparison I will use the latest optimized version of Tally table splitting by Jeff Moden.

CREATE FUNCTION dbo.Split8KTallyM (
    @Parameter VARCHAR(8000),
    @Delimiter VARCHAR(1)
)
RETURNS @Result TABLE (ItemNumber INT, ItemValue INT) AS
  BEGIN
 INSERT INTO @Result
        (ItemNumber, ItemValue)
 SELECT CAST(ROW_NUMBER() OVER (ORDER BY N) AS INT) AS ItemNumber,
        SUBSTRING(@Parameter,N,CHARINDEX(@Delimiter,@Parameter+@Delimiter,N)-N) AS ItemValue
   FROM dbo.Tally
  WHERE N BETWEEN 1 AND LEN(@Parameter)+1
    AND SUBSTRING(@Delimiter+@Parameter,N,1) = @Delimiter; --Notice how we find the comma
 RETURN
    END;
GO

Test data preparation

I will use as test data the same tables as in previous tests. We will use table with 10 000 rows and each will be with different length of CSV string (16 items, 100 items and 1333 items). The table definition will be only modified and the string will not be stored as nvarchar(max) but as varchar(max). The nvarchar in previous test totally degraded the T-SQL solution so it was not usable for 1333 item in SCV string.

SELECT TOP 11000
    IDENTITY(INT, 1, 1) AS N
INTO dbo.Tally
FROM sys.all_objects o1, sys.all_objects
GO

--Add Clustered Index on Tally table
ALTER TABLE dbo.Tally
    ADD CONSTRAINT PK_Tally PRIMARY KEY CLUSTERED (N) WITH FILLFACTOR = 100
GO

--Create and populate CsvTest table (doesn't matter whether the table has Clustered index or it is simply heap)
SELECT TOP (10000) --Controls the number of rows in the test table
    ISNULL(ROW_NUMBER() OVER (ORDER BY(SELECT NULL)),0) AS RowNum,
    (
        SELECT CAST(STUFF( --=== STUFF get's rid of the leading comma
                ( --=== This builds CSV row with a leading comma
                SELECT TOP (16) --Controls the number of CSV elements in each row
                    ','+CAST(ABS(CHECKSUM(NEWID()))%100000 AS VARCHAR(10))
                FROM dbo.Tally t3      --Classic cross join pseudo-cursor
                CROSS JOIN dbo.Tally t4 --can produce row sets up 121 million.
                WHERE t1.N <> t3.N --Without this line, all rows would be the same
                FOR XML PATH('')
                )
                ,1,1,'') AS VARCHAR(8000))
                ) AS CsvParameter
INTO CsvTest
FROM dbo.Tally t1        --Classic cross join pseudo-cursor
CROSS JOIN dbo.Tally t2;  --can produce row sets up 121 million.
GO

SELECT TOP (10000) --Controls the number of rows in the test table
    ISNULL(ROW_NUMBER() OVER (ORDER BY(SELECT NULL)),0) AS RowNum,
    (
        SELECT CAST(STUFF( --=== STUFF get's rid of the leading comma
                ( --=== This builds CSV row with a leading comma
                SELECT TOP (100) --Controls the number of CSV elements in each row
                    ','+CAST(ABS(CHECKSUM(NEWID()))%100000 AS VARCHAR(10))
                FROM dbo.Tally t3      --Classic cross join pseudo-cursor
                CROSS JOIN dbo.Tally t4 --can produce row sets up 121 million.
                WHERE t1.N <> t3.N --Without this line, all rows would be the same
                FOR XML PATH('')
                )
                ,1,1,'') AS VARCHAR(8000))
                ) AS CsvParameter
INTO CsvTest2
FROM dbo.Tally t1        --Classic cross join pseudo-cursor
CROSS JOIN dbo.Tally t2;  --can produce row sets up 121 million.
GO

SELECT TOP (10000) --Controls the number of rows in the test table
    ISNULL(ROW_NUMBER() OVER (ORDER BY(SELECT NULL)),0) AS RowNum,
    (
        SELECT CAST(STUFF( --=== STUFF get's rid of the leading comma
                ( --=== This builds CSV row with a leading comma
                SELECT TOP (1333) --Controls the number of CSV elements in each row
                    ','+CAST(ABS(CHECKSUM(NEWID()))%100000 AS VARCHAR(10))
                FROM dbo.Tally t3      --Classic cross join pseudo-cursor
                CROSS JOIN dbo.Tally t4 --can produce row sets up 121 million.
                WHERE t1.N <> t3.N --Without this line, all rows would be the same
                FOR XML PATH('')
                )
                ,1,1,'') AS VARCHAR(8000))
                ) AS CsvParameter
INTO CsvTest3
FROM dbo.Tally t1        --Classic cross join pseudo-cursor
CROSS JOIN dbo.Tally t2;  --can produce row sets up 121 million.
GO

Speed comparison

Here is a script I will use to compare the speed:

--================= 16 items ==========
GO
--CLR fn_SplitString3
DECLARE @RowNum INT, @ItemNumber INT, @ItemValue INT;
SELECT
    @RowNum = D.RowNum,
    @ItemNumber = V.RowID,
    @ItemValue = V.Value
FROM dbo.CsvTest D
CROSS APPLY dbo.fn_SplitString3(D.CsvParameter, ',', 10) V
GO
--CLR RegEx
DECLARE @RowNum INT, @ItemNumber INT, @ItemValue INT;
SELECT
    @RowNum = D.RowNum,
    @ItemNumber = V.RowID,
    @ItemValue = V.Value
FROM dbo.CsvTest D
CROSS APPLY dbo.fn_RegExMatches2(D.CsvParameter, 'd+') V
GO
--T-SQL Split8KTallyM
DECLARE @RowNum INT, @ItemNumber INT, @ItemValue INT;
SELECT
    @RowNum = D.RowNum,
    @ItemNumber = V.ItemNumber,
    @ItemValue = V.ItemValue
FROM dbo.CsvTest D
CROSS APPLY dbo.Split8KTallyM(D.CsvParameter, ',') V
GO
--================= 100 items ==========
GO
--CLR fn_SplitString3
DECLARE @RowNum INT, @ItemNumber INT, @ItemValue INT;
SELECT
    @RowNum = D.RowNum,
    @ItemNumber = V.RowID,
    @ItemValue = V.Value
FROM dbo.CsvTest2 D
CROSS APPLY dbo.fn_SplitString3(D.CsvParameter, ',', 10) V
GO
--CLR RegEx
DECLARE @RowNum INT, @ItemNumber INT, @ItemValue INT;
SELECT
    @RowNum = D.RowNum,
    @ItemNumber = V.RowID,
    @ItemValue = V.Value
FROM dbo.CsvTest2 D
CROSS APPLY dbo.fn_RegExMatches2(D.CsvParameter, 'd+') V
GO
--T-SQL Split8KTallyM
DECLARE @RowNum INT, @ItemNumber INT, @ItemValue INT;
SELECT
    @RowNum = D.RowNum,
    @ItemNumber = V.ItemNumber,
    @ItemValue = V.ItemValue
FROM dbo.CsvTest2 D
CROSS APPLY dbo.Split8KTallyM(D.CsvParameter, ',') V
GO
--================= 1333 items ==========
GO
--CLR fn_SplitString3
DECLARE @RowNum INT, @ItemNumber INT, @ItemValue INT;
SELECT
    @RowNum = D.RowNum,
    @ItemNumber = V.RowID,
    @ItemValue = V.Value
FROM dbo.CsvTest3 D
CROSS APPLY dbo.fn_SplitString3(D.CsvParameter, ',', 10) V
GO
--CLR RegEx
DECLARE @RowNum INT, @ItemNumber INT, @ItemValue INT;
SELECT
    @RowNum = D.RowNum,
    @ItemNumber = V.RowID,
    @ItemValue = V.Value
FROM dbo.CsvTest3 D
CROSS APPLY dbo.fn_RegExMatches2(D.CsvParameter, 'd+') V
GO
--T-SQL Split8KTallyM
DECLARE @RowNum INT, @ItemNumber INT, @ItemValue INT;
SELECT
    @RowNum = D.RowNum,
    @ItemNumber = V.ItemNumber,
    @ItemValue = V.ItemValue
FROM dbo.CsvTest3 D
CROSS APPLY dbo.Split8KTallyM(D.CsvParameter, ',') V
GO

And here are the results from profiler:

Results of comparison and conclusion

As we can see in the output from profiles, the new fn_SplitString3 function is unbeatable in all scenarios. While the T-SQL took 3.5 seconds for 16 items, the new CLR split function takes only 253 milliseconds. As mentioned in previous post, the CLR RegEx benefits at higher items count over 100. And in higher counts beats the T-SQL Solutions. The new fn_SplitString even on 1333 items count took only 8.2 sec.

SQL Server 2005 and SQL Server 2008 Regular Expressions and pattern matching

2010-09-12 Pavel Pawlowski CLR, SQL Server CLR, RegEx, SQL Server

To write this blog post I was inspired by a question on Ask SQL Server Central.com about finding a substring between two characters.

Some times it is necessary to process input string and retrieve some parts of them. For this purposes we can write some more complex T-SQL constructs or use the CLR functionality of SQL Server 2005 and SQL Server 2008 and write some functions for processing Regular Expressions.

Generally all the string manipulations (especially on very large string values) are much better performing using a CLR scalar and table-valued functions than their equivalent using pure T-SQL code. Also using regular expression you do not need to write specialized functions for particular processing as you simply write a regular expression and use it in the RegEx function.

For general info about writing a CLR functions read MSDN.

More info on regular expressions you can find on Regular-Expressions.info and you can find inspiration also in the Regular Expression Library.

Implementation

For such use you can find a below SQLRegEx class which contains two functions – RegExMatch with is a scalar CLR function to return particular group from within particular match and CLR table-valued function RegExMatches which return all matches and groups.

Below is a code snippet you can use to create an assembly.

public class SQLRegEx
{
    private class RegExRow
    {
        /// <summary>
        /// Private class for passing matches of the RegExMatches to the FillRow method
        /// </summary>
        /// <param name=”rowId”>ID of the Row</param>
        /// <param name=”matchId”>ID of the Match</param>
        /// <param name=”groupID”>ID of the Group within the Match</param>
        /// <param name=”value”>Value of the particular group</param>
        public RegExRow(int rowId, int matchId, int groupID, string value)
        {
            RowId = rowId;
            MatchId = matchId;
            GroupID = groupID;
            Value = value;
        }

        public int RowId;
        public int MatchId;
        public int GroupID;
        public string Value;
    }

    /// <summary>
    /// Applies Regular Expression on the Source string and returns value of particular group from withing a specified match
    /// </summary>
    /// <param name=”sourceString”>Source string on which the regular expression should be applied</param>
    /// <param name=”pattern”>Regular Expression pattern</param>
    /// <param name=”matchId”>ID of the Match to be returned 1 inex-based</param>
    /// <param name=”groupId”>ID of the group from within a match to return. GroupID 0 returns complete match</param>
    /// <returns>Value of the Group from within a Match</returns>
    [SqlFunction(IsDeterministic=true)]
    public static SqlChars RegExMatch(string sourceString, string pattern, int matchId, int groupId)
    {
        Match m = null;
        Regex r = new Regex(pattern, RegexOptions.Compiled);

        if (matchId == 1)
        {
            m = r.Match(sourceString);
        }
        else if (matchId > 1)
        {
            MatchCollection mc = r.Matches(sourceString);
            m = mc != null && mc.Count > matchId – 1 ? mc[matchId - 1] : null;
        }

        return m != null && m.Groups.Count > groupId ? new SqlChars(m.Groups[groupId].Value) : SqlChars.Null;
    }

    /// <summary>
    /// Applies Regular Expression o the Source strings and return all matches and groups
    /// </summary>
    /// <param name=”sourceString”>Source string on which the regular expression should be applied</param>
    /// <param name=”pattern”>Regular Expression pattern</param>
    /// <returns>Returns list of RegExRows representing the group value</returns>
    [SqlFunction(FillRowMethodName = "FillRegExRow")]
    public static IEnumerable RegExMatches(string sourceString, string pattern)
    {
        Regex r = new Regex(pattern, RegexOptions.Compiled);
        int rowId = 0;
        int matchId = 0;
        foreach (Match m in r.Matches(sourceString))
        {
            matchId++;
            for (int i = 0; i < m.Groups.Count; i++)
            {
                yield return new RegExRow(++rowId, matchId, i, m.Groups[i].Value);
            }
        }
    }

    /// <summary>
    /// FillRow method to populate the output table
    /// </summary>
    /// <param name=”obj”>RegExRow passed as object</param>
    /// <param name=”rowId”>ID or the returned row</param>
    /// <param name=”matchId”>ID of returned Match</param>
    /// <param name=”groupID”>ID of group in the Match</param>
    /// <param name=”value”>Value of the Group</param>
    public static void FillRegExRow(Object obj, out int rowId, out int matchId, out int groupID, out SqlChars value)
    {
        RegExRow r = (RegExRow)obj;
        rowId = r.RowId;
        matchId = r.MatchId;
        groupID = r.GroupID;
        value = new SqlChars(r.Value);
    }
}

RegExMatch function

This is a scalar function which takes three arguments. First is the source string to be parsed using regular expressions. Second parameter is regular expression itself, third is a matchId (counted from 1 and which represents a match to be returned) to be returned if there could be more matches and fourth is a Group ID to be returned from within the regular expression match. When there is no group or you want to return simply the complete Match, pass 0.

RegExMatches function

This is CLR table function and it returns all matches including each group in each match. The complete Match has groupID equal to 0 in the output. Input parameters are source string to be parsed using the regular expression a second is the regular expression pattern.

Once we have a compiled assembly, we can register the assembly and function in database and start using them for various Regular Expressions processing.

CREATE ASSEMBLY [SQLRegEx]
AUTHORIZATION [dbo]
FROM 'C:CLRSQLRegEx.dll'
WITH PERMISSION_SET = SAFE
GO

CREATE FUNCTION [dbo].[fn_RegExMatch](
    @sourceString nvarchar(max), --Source string to be processed by regular expression
    @pattern nvarchar(4000), --Regular expression (pattern) to be applied on the source string
    @matchId int, --ID of the match to be returned (if there are more matches, you can return a concrete one). Numbering starts from 1
    @groupID int --ID of the group in the match to be returned. GroupID = 0 represents a complete match.
)
RETURNS nvarchar(4000)
WITH EXECUTE AS CALLER
AS
EXTERNAL NAME [SQLRegEx].[SQLRegEx].[RegExMatch]
GO

CREATE FUNCTION [dbo].[fn_RegExMatches](
    @sourceString nvarchar(max), --Source string to be processed by regular expression
    @pattern nvarchar(4000) --Regular expression (pattern) to be applied on the source string
)
RETURNS TABLE (
    [rowId] int, --RowId each row as it`s ID
    [matchId] int, --ID of particular match (starts from 1)
    [groupId] int, --ID of particular group in RegEx match (GroupID = 0) represents a complete match
    [value] nvarchar(4000) --value of the group
) WITH EXECUTE AS CALLER
EXTERNAL NAME [SQLRegEx].[SQLRegEx].[RegExMatches]
GO

RegExMatches expamples

Eg. Split semicolon separated list of Key=Value pairs. Pair has the same matchId and Key has groupID = 1 and value has groupID=2. Without the WHERE Condition also the GroupID 0 will be returned which contains each pair.

Split Key=Value pairs

SELECT
*
FROM dbo.fn_RegExMatches('Key1=Val1;Key2=Val2;Key3=Val3;Key4=Val4', 's*(w+)s*=s*?(.*?)(?=;|z)')
WHERE groupId > 0

rowId       matchId     groupId     value
———– ———– ———– ——
2           1           1           Key1
3           1           2           Val1
5           2           1           Key2
6           2           2           Val2
8           3           1           Key3
9           3           2           Val3
11          4           1           Key4

Splitting comma separated list

SELECT
    *
FROM dbo.fn_RegExMatches('1,2,3,4,5,6,7,8', '(d+?)(?:,|z)')
WHERE groupId = 1

--or even this and doesn’t matter what separator is used
SELECT
    *
FROM dbo.fn_RegExMatches('1,2,3,4,5;6,7,8', 'w+')

List of words in sentence

SELECT
    *
FROM dbo.fn_RegExMatches('This is a sample sentence to Split.', 'w+')

RegExMatch examples

Value of the Key=Value pair

SELECT dbo.fn_RegExMatch('Key = Value' ,'s*w+s*=s*(.*)', 1, 1)

Fourth word in the string

SELECT dbo.fn_RegExMatch('This is a sample sentence to Split.', 'w+', 4, 0)

Sixth number in list

SELECT dbo.fn_RegExMatch('1,2,3,4,5;6,7,8', 'w+', 6, 0)

Email address validation

--validating email address
DECLARE @emailAddress nvarchar(20)
SET @emailAddress = 'name@domain.com'
SELECT
    CASE
        WHEN
            @emailAddress =
            dbo.fn_RegExMatch(@emailAddress, '([a-zA-Z0-9_-.]+)@(([[0-9]{1,3}.[0-9]{1,3}.[0-9]{1,3}.)|(([a-zA-Z0-9-]+.)+))([a-zA-Z]{2,4}|[0-9]{1,3})', 1, 0)
        THEN 1
        ELSE 0
    END AS ValidEmail

--lists all invalid email
WITH emails AS (
    SELECT 'name@domain.com' AS email UNION ALL
    SELECT 'name@subdomain.domain.com' UNION All
    SELECT 'name@this.email.is.wrong' UNION ALL
    SELECT 'another.wrong@@domain.com'
)
SELECT
    *
FROM emails
WHERE email <> dbo.fn_RegExMatch(email, '([a-zA-Z0-9_-.]+)@(([[0-9]{1,3}.[0-9]{1,3}.[0-9]{1,3}.)|(([a-zA-Z0-9-]+.)+))([a-zA-Z]{2,4}|[0-9]{1,3})', 1, 0)

Get day from string date

WITH StringDates AS (
    SELECT '2010/09/3' AS StringDate UNION ALL
    SELECT '2010/07/09' UNION ALL
    SELECT '2009/1/15'
)
SELECT
    StringDate,
    dbo.fn_RegExMatch(StringDate, '(d{4})/(d{1,2})/(d{1,2})', 1, 3) AS DateDay
FROM StringDates

Summary

Hope, this article will help you using regular expressions in SQL Server and you will be able to simplify and speed-up your queries.

You may be also interested in my next post SQL Server Regular Expressions – Replacements which describes possibilities of string replacements using regular expressions.

Pavel's SQL Pages

SQL Server & other stuff related

RegEx

SQL Server Regular Expressions – Replacements

RegExReplace

RegExMatchesReplace

Testing the Replace method

Replace year in all Dates with year 2011

Remove the values from the Key=Value pairs and leave only the Key values separated by semicolons

Get list of all dates from within a string and replace the year of each date by 2011

Fastest CSV strings splitting using CLR (T-SQL vs. CLR revisited)

Fastest CLR version

Test data preparation

Speed comparison

Results of comparison and conclusion

SQL Server 2005 and SQL Server 2008 Regular Expressions and pattern matching

Implementation

RegExMatch function

RegExMatches function

RegExMatches expamples

Split Key=Value pairs

Splitting comma separated list

List of words in sentence

RegExMatch examples

Value of the Key=Value pair

Fourth word in the string

Sixth number in list

Email address validation

Get day from string date

Summary

RegExReplace

RegExMatchesReplace

Testing the Replace method

Replace year in all Dates with year 2011

Remove the values from the Key=Value pairs and leave only the Key values separated by semicolons

Get list of all dates from within a string and replace the year of each date by 2011

Share this:

Fastest CLR version

Test data preparation

Speed comparison

Results of comparison and conclusion

Share this:

Implementation

RegExMatch function

RegExMatches function

RegExMatches expamples

Split Key=Value pairs

Splitting comma separated list

List of words in sentence

RegExMatch examples

Value of the Key=Value pair

Fourth word in the string

Sixth number in list

Email address validation

Get day from string date

Summary

Share this: