SQL Server Regular Expressions – Replacements

In my earlier post SQL Server 2005 and SQL Server 2008 Regular Expressions and pattern matching I was focusing on the Regular Expressions matching on the SQL Server using a CLR.

This post was focused only on the matches itself, but didn’t cover another great feature of the Regular Expressions and this feature is Replacements.

When using the replacements then first the regular expression is executed on the source string and then on one, several or all matches a replacement is applied and the result of such replacement is returned.

Here are the CLR functions which handle the replacements.

using System;
using System.Collections.Generic;
using System.Text;
using System.Data.SqlTypes;
using Microsoft.SqlServer.Server;
using System.Text.RegularExpressions;
using System.Collections;

/// <summary>
/// Regular expressions for SQL
/// </summary>
public class SQLRegEx
{
    #region Internal structures
    private struct RegExRowReplace
    {
        /// <summary>
        /// Replacement table row
        /// </summary>
        /// <param name="matchID">ID of the match. ID = 0 represents a complete source string</param>
        /// <param name="match">
        /// String representing match. For ID=0 is NULL as it is equal to original string. Null is returnet to limit transferring the whore source string
        /// </param>
        /// <param name="result">Result of the replacement for the matchID</param>
        public RegExRowReplace(int matchID, SqlChars match, SqlChars result)
        {
            MatchID = matchID;
            Match = match;
            Result = result;
        }

        public int MatchID;
        public SqlChars Match;
        public SqlChars Result;
    }

    #endregion

    #region RegEx functions

    /// <summary>
    /// Applies Regular Expression on the Source string, takes apropriate match and aplies e replace on it.
    /// </summary>
    /// <param name="sourceString">Source string on which the regular expression should be applied</param>
    /// <param name="pattern">Regular Expression pattern</param>
    /// <param name="replacement">Replacement to be used on match</param>
    /// <param name="matchId">ID of the Match to be returned. 0 replaces all matches in the sourceString</param>
    /// <returns>Value of the Group from within a Match</returns>
    [SqlFunction(IsDeterministic = true)]
    public static SqlChars RegExReplace(SqlString sourceString, SqlString pattern, SqlString replacement, int matchId)
    {
        Match m = null;
        Regex r = new Regex(pattern.Value, RegexOptions.Compiled);

        if (matchId == 0)
        {
            return new SqlChars(r.Replace(sourceString.Value, replacement.Value));
        }
        if (matchId == 1)
        {
            m = r.Match(sourceString.Value);
        }
        else if (matchId > 1)
        {
            MatchCollection mc = r.Matches(sourceString.Value);
            m = mc != null && mc.Count > matchId - 1 ? mc[matchId - 1] : null;
        }

        return m != null ? new SqlChars(m.Result(replacement.Value)) : SqlChars.Null;
    }


    /// <summary>
    /// Applies Regular Expression o the Source strings and return all matches and final results after replacement
    /// </summary>
    /// <param name="sourceString">Source string on which the regular expression should be applied</param>
    /// <param name="pattern">Regular Expression pattern</param>
    /// <param name="replacement">Replacement to be used on matches</param>
    /// <returns>IEnumerable</returns>
    [SqlFunction(FillRowMethodName = "FillRegExRowReplace")]
    public static IEnumerable RegExMatchesReplace(SqlString sourceString, SqlString pattern, SqlString replacement)
    {
        Regex r = new Regex(pattern.Value, RegexOptions.Compiled);
        int matchId = 0;

        foreach (Match m in r.Matches(sourceString.Value))
        {
            yield return new RegExRowReplace(++matchId, new SqlChars(m.Value), new SqlChars(m.Result(replacement.Value)));
        }
    }

    #endregion

    #region FillRow methods

    public static void FillRegExRowReplace(object obj, out int matchID, out SqlChars match, out SqlChars result)
    {
        RegExRowReplace row = (RegExRowReplace)obj;
        matchID = row.MatchID;
        match = row.Match;
        result = row.Result;
    }

    #endregion
}

RegExReplace

This is a scalar CLR function which handles a Regular Expression replacement. It allows to replace a single match from within a string or even replace all matches inside the string if matchID = 0 is passed as an argument.

RegExMatchesReplace

In contrast to RegExReplace the RegExMatchesReplace is a CLR table valued function, which returns all the matches as separated rows and also replacements. It returns a table with three columns MatchID which represents ID of the match from within the original string, Match which represents the particular RegEx match from within the string and Result, which is result of the replacement done on the Match.

So once the class is compiled into Assembly, we can register the functions inside SQL Server database.

CREATE ASSEMBLY [SQLRegEx]
AUTHORIZATION [dbo]
FROM 'C:CLRSQLRegEx.dll'
WITH PERMISSION_SET = SAFE

CREATE FUNCTION [dbo].[fn_RegExReplace](
    @sourceString [nvarchar](max),        --Source string to apply RegEx
    @pattern [nvarchar](4000),            --RegEx pattern to apply on the source string
    @replacement [nvarchar](400),        --Replacement pattern to apply on match
    @matchID [int] = 1                    --ID of the match to return. @matchID = 1 is the first match. IF @matchID=0 then all the patterns in the @sourceString are replaced
)        
RETURNS [nvarchar](4000)
WITH EXECUTE AS CALLER
AS 
EXTERNAL NAME [SQLRegEx].[SQLRegEx].[RegExReplace]
GO


CREATE FUNCTION [dbo].[fn_RegExMatchesReplace](
    @sourceString [nvarchar](max),  --Source string to apply RegEx
    @pattern [nvarchar](4000),        --RegEx to apply on the source string
    @replacement [nvarchar](4000)    --Replacement pattern to apply on matches
)        
RETURNS  TABLE (
    [matchId] [int] NULL,                --ID of returnet match
    [match] [nvarchar](4000) NULL,        --value of match
    [result] [nvarchar](4000) NULL        --Replacement result
) WITH EXECUTE AS CALLER
AS 
EXTERNAL NAME [SQLRegEx].[SQLRegEx].[RegExMatchesReplace]
GO

Testing the Replace method

Here are some sample queries so you can imagine how the functions work.

Replace year in all Dates with year 2011
WITH StringDates AS (
    SELECT '2010/09/3' AS StringDate UNION ALL
    SELECT '2010/07/09' UNION ALL
    SELECT '2009/1/15'
)
SELECT
    StringDate,
    dbo.fn_RegExReplace(StringDate, '(d{4})/(d{1,2})/(d{1,2})', '2011/$2/$3', 0)
FROM StringDates
Remove the values from the Key=Value pairs and leave only the Key values separated by semicolons
SELECT dbo.fn_RegExReplace(     'Key1=Val1;Key2=Val2;Key3=Val3;Key4=Val4',     's*?(w+?)s*?=s*?(.*?)(;|$)',     '$1$3',     0)
Get list of all dates from within a string and replace the year of each date by 2011
SELECT
*
FROM dbo.fn_RegExMatchesReplace(
    '2010/09/3;2010/07/09;2009/1/15', 
    '(d{4})/(d{1,2})/(d{1,2})', 
    '2011/$2/$3'
    )

From the examples we can see, that CLR give us a real power to using Regular Expressions on SQL Server. It allows us to make some complex transformation which could not be possible to do using a pure T-SQL and in case of transformation which can be done by pure T-SQL, mostly the CLR RegEx solution is much faster and straightforward.