Second Update to Querying Active Directory on SQL Server using CLR

Finally I had a time to make a second update to my post Querying Active Directory on SQL Server using CLR. This update should finally resolve an issue when you need to return large numbers of properties from Active Directory.

The solution is very simple by adding a possibility to specify the columns lengths (maximum length) for the returned properties. All the structures in the code left the same as they were originally. The only change is, that the list of properties is no longer passed as a comma separated list and now semicolon is used to delimit the properties. Comma can be used to provide a column length. If the length is not provided then the default of 4000 is used as it was before.

So don’t hesitate and check the updated article.

I’ve also provided a link to an updated Visual C# project with all the codes.

SQL Server Regular Expressions – Replacements

In my earlier post SQL Server 2005 and SQL Server 2008 Regular Expressions and pattern matching I was focusing on the Regular Expressions matching on the SQL Server using a CLR.

This post was focused only on the matches itself, but didn’t cover another great feature of the Regular Expressions and this feature is Replacements.

When using the replacements then first the regular expression is executed on the source string and then on one, several or all matches a replacement is applied and the result of such replacement is returned.

Here are the CLR functions which handle the replacements.

using System;
using System.Collections.Generic;
using System.Text;
using System.Data.SqlTypes;
using Microsoft.SqlServer.Server;
using System.Text.RegularExpressions;
using System.Collections;

/// <summary>
/// Regular expressions for SQL
/// </summary>
public class SQLRegEx
{
    #region Internal structures
    private struct RegExRowReplace
    {
        /// <summary>
        /// Replacement table row
        /// </summary>
        /// <param name="matchID">ID of the match. ID = 0 represents a complete source string</param>
        /// <param name="match">
        /// String representing match. For ID=0 is NULL as it is equal to original string. Null is returnet to limit transferring the whore source string
        /// </param>
        /// <param name="result">Result of the replacement for the matchID</param>
        public RegExRowReplace(int matchID, SqlChars match, SqlChars result)
        {
            MatchID = matchID;
            Match = match;
            Result = result;
        }

        public int MatchID;
        public SqlChars Match;
        public SqlChars Result;
    }

    #endregion

    #region RegEx functions

    /// <summary>
    /// Applies Regular Expression on the Source string, takes apropriate match and aplies e replace on it.
    /// </summary>
    /// <param name="sourceString">Source string on which the regular expression should be applied</param>
    /// <param name="pattern">Regular Expression pattern</param>
    /// <param name="replacement">Replacement to be used on match</param>
    /// <param name="matchId">ID of the Match to be returned. 0 replaces all matches in the sourceString</param>
    /// <returns>Value of the Group from within a Match</returns>
    [SqlFunction(IsDeterministic = true)]
    public static SqlChars RegExReplace(SqlString sourceString, SqlString pattern, SqlString replacement, int matchId)
    {
        Match m = null;
        Regex r = new Regex(pattern.Value, RegexOptions.Compiled);

        if (matchId == 0)
        {
            return new SqlChars(r.Replace(sourceString.Value, replacement.Value));
        }
        if (matchId == 1)
        {
            m = r.Match(sourceString.Value);
        }
        else if (matchId > 1)
        {
            MatchCollection mc = r.Matches(sourceString.Value);
            m = mc != null && mc.Count > matchId - 1 ? mc[matchId - 1] : null;
        }

        return m != null ? new SqlChars(m.Result(replacement.Value)) : SqlChars.Null;
    }


    /// <summary>
    /// Applies Regular Expression o the Source strings and return all matches and final results after replacement
    /// </summary>
    /// <param name="sourceString">Source string on which the regular expression should be applied</param>
    /// <param name="pattern">Regular Expression pattern</param>
    /// <param name="replacement">Replacement to be used on matches</param>
    /// <returns>IEnumerable</returns>
    [SqlFunction(FillRowMethodName = "FillRegExRowReplace")]
    public static IEnumerable RegExMatchesReplace(SqlString sourceString, SqlString pattern, SqlString replacement)
    {
        Regex r = new Regex(pattern.Value, RegexOptions.Compiled);
        int matchId = 0;

        foreach (Match m in r.Matches(sourceString.Value))
        {
            yield return new RegExRowReplace(++matchId, new SqlChars(m.Value), new SqlChars(m.Result(replacement.Value)));
        }
    }

    #endregion

    #region FillRow methods

    public static void FillRegExRowReplace(object obj, out int matchID, out SqlChars match, out SqlChars result)
    {
        RegExRowReplace row = (RegExRowReplace)obj;
        matchID = row.MatchID;
        match = row.Match;
        result = row.Result;
    }

    #endregion
}

RegExReplace

This is a scalar CLR function which handles a Regular Expression replacement. It allows to replace a single match from within a string or even replace all matches inside the string if matchID = 0 is passed as an argument.

RegExMatchesReplace

In contrast to RegExReplace the RegExMatchesReplace is a CLR table valued function, which returns all the matches as separated rows and also replacements. It returns a table with three columns MatchID which represents ID of the match from within the original string, Match which represents the particular RegEx match from within the string and Result, which is result of the replacement done on the Match.

So once the class is compiled into Assembly, we can register the functions inside SQL Server database.

CREATE ASSEMBLY [SQLRegEx]
AUTHORIZATION [dbo]
FROM 'C:CLRSQLRegEx.dll'
WITH PERMISSION_SET = SAFE

CREATE FUNCTION [dbo].[fn_RegExReplace](
    @sourceString [nvarchar](max),        --Source string to apply RegEx
    @pattern [nvarchar](4000),            --RegEx pattern to apply on the source string
    @replacement [nvarchar](400),        --Replacement pattern to apply on match
    @matchID [int] = 1                    --ID of the match to return. @matchID = 1 is the first match. IF @matchID=0 then all the patterns in the @sourceString are replaced
)        
RETURNS [nvarchar](4000)
WITH EXECUTE AS CALLER
AS 
EXTERNAL NAME [SQLRegEx].[SQLRegEx].[RegExReplace]
GO


CREATE FUNCTION [dbo].[fn_RegExMatchesReplace](
    @sourceString [nvarchar](max),  --Source string to apply RegEx
    @pattern [nvarchar](4000),        --RegEx to apply on the source string
    @replacement [nvarchar](4000)    --Replacement pattern to apply on matches
)        
RETURNS  TABLE (
    [matchId] [int] NULL,                --ID of returnet match
    [match] [nvarchar](4000) NULL,        --value of match
    [result] [nvarchar](4000) NULL        --Replacement result
) WITH EXECUTE AS CALLER
AS 
EXTERNAL NAME [SQLRegEx].[SQLRegEx].[RegExMatchesReplace]
GO

Testing the Replace method

Here are some sample queries so you can imagine how the functions work.

Replace year in all Dates with year 2011
WITH StringDates AS (
    SELECT '2010/09/3' AS StringDate UNION ALL
    SELECT '2010/07/09' UNION ALL
    SELECT '2009/1/15'
)
SELECT
    StringDate,
    dbo.fn_RegExReplace(StringDate, '(d{4})/(d{1,2})/(d{1,2})', '2011/$2/$3', 0)
FROM StringDates
Remove the values from the Key=Value pairs and leave only the Key values separated by semicolons
SELECT dbo.fn_RegExReplace(     'Key1=Val1;Key2=Val2;Key3=Val3;Key4=Val4',     's*?(w+?)s*?=s*?(.*?)(;|$)',     '$1$3',     0)
Get list of all dates from within a string and replace the year of each date by 2011
SELECT
*
FROM dbo.fn_RegExMatchesReplace(
    '2010/09/3;2010/07/09;2009/1/15', 
    '(d{4})/(d{1,2})/(d{1,2})', 
    '2011/$2/$3'
    )

From the examples we can see, that CLR give us a real power to using Regular Expressions on SQL Server. It allows us to make some complex transformation which could not be possible to do using a pure T-SQL and in case of transformation which can be done by pure T-SQL, mostly the CLR RegEx solution is much faster and straightforward.

Update to Querying Active Directory on SQL Server using CLR

I’ve just posted an update to my article Querying Active Directory on SQL Server using CLR. I’ve added another 2 parameters to the samples which one of them allows control the Page Size limit for paged AD Search and the second one allows limit the number of rows returned to the client. This can be handful when the AD query can return more results than we need. In such case it’s easy to limit the number of rows.

I’ve updated the article after the comment from Chris, when he received an insufficient memory error when he wanted to return a higher number of fields from AD.

In this case the lowering the page size from 1000 to lower values should avoid this kind of problems.

Querying Active Directory on SQL Server using CLR

In my previous article Querying Active Directory on SQL Server using T-SQL I was showing the possibilities of querying the Active Directory using T-SQL, specifically using linked server and OPENQUERY statement or without linked server using the OPENROWSET statement.

This is an updated article and includes modification coming upon some of the comments to resolve some issues with large number of returned AD properties.

There ware mentioned some limitations of querying the AD using T-SQL and CLR will help us to bypass those limitations. Especially the limit of 1000 records returned as we can use pagination in the CLR code effectively.

For the purpose I’m creating a CLR Stored Procedure which will take several argument and return a result set. The reason I’m going to use a CLR Stored Procedure is, that stored procedure will allow me to return dynamic result set. It means I can specify properties of AD to return and those will be returned as columns of the result set. In case we go through the CLR Table Valued function, we had to create a separate function for each properties combination we would like to return.

So here is the CLR class for querying AD.

public class ActiveDirectory
{
    /// <summary>
    /// Queries Active directory according provided parameters
    /// Current user credentials are used for authentication
    /// </summary>
    /// <param name="adRoot">AD Root for querying AD</param>
    /// <param name="filter">Filter to be used for querying</param>
    /// <param name="searchScope">Scope to be used for queryingg</param>
    /// <param name="propertiesToLoad">List of properties to return</param>
    /// <param name="pageSize">Represents a PageSise for the paged search of AD</param>
    /// <param name="rowsLimit">Rrepresent limit for numbers of rows returned. NULL or value less than 1 represents unlimited</param>
    [Microsoft.SqlServer.Server.SqlProcedure]
    public static void QueryAD(string adRoot, string filter, string propertiesToLoad, string searchScope, int pageSize, SqlInt32 rowsLimit)
    {
        SearchAD(null, null, null, adRoot, filter, searchScope, propertiesToLoad, pageSize, rowsLimit);
    }

    /// <summary>
    /// Queries Active directory according provided parameters
    /// </summary>
    /// <param name="userName">UserName to be used to authenticate AD</param>
    /// <param name="password">Password to be used to authenticate to AD</param>
    /// <param name="adRoot">AD Root for querying AD</param>
    /// <param name="filter">Filter to be used for querying</param>
    /// <param name="searchScope">Scope to be used for queryingg</param>
    /// <param name="propertiesToLoad">List of properties to return</param>
    /// <param name="pageSize">Represents a PageSise for the paged search of AD</param>
    /// <param name="rowsLimit">Rrepresent limit for numbers of rows returned. NULL or value less than 1 represents unlimited</param>
    [Microsoft.SqlServer.Server.SqlProcedure]
    public static void QueryADUName(string userName, string password, string adRoot, string filter, string propertiesToLoad, string searchScope, int pageSize, SqlInt32 rowsLimit)
    {
        SearchAD(userName, password, null, adRoot, filter, searchScope, propertiesToLoad, pageSize, rowsLimit);
    }

    /// <summary>
    /// Queries Active directory according provided parameters
    /// </summary>
    /// <param name="userName">UserName to be used to authenticate AD</param>
    /// <param name="password">Password to be used to authenticate to AD</param>
    /// <param name="authType">Authentication type to be used to authenticate to AD</param>
    /// <param name="adRoot">AD Root for querying AD</param>
    /// <param name="filter">Filter to be used for querying</param>
    /// <param name="searchScope">Scope to be used for queryingg</param>
    /// <param name="propertiesToLoad">List of properties to return</param>
    /// <param name="pageSize">Represents a PageSise for the paged search of AD</param>
    /// <param name="rowsLimit">Rrepresent limit for numbers of rows returned. NULL or value less than 1 represents unlimited</param>
    [Microsoft.SqlServer.Server.SqlProcedure]
    public static void QueryADAuth(string userName, string password, string authType, string adRoot, string filter, string propertiesToLoad, string searchScope, int pageSize, SqlInt32 rowsLimit)
    {
        SearchAD(userName, password, authType, adRoot, filter, searchScope, propertiesToLoad, pageSize, rowsLimit);
    }

    private static DirectoryEntry GetRootEntry(string adRoot, string userName, string password, string authType)
    {
        if (userName == null)
            return new DirectoryEntry(adRoot);
        else if (authType == null)
            return new DirectoryEntry(adRoot, userName, password);
        else
        {
            AuthenticationTypes at;
            if (TryParseEnum<AuthenticationTypes>(authType, true, out at))
                return new DirectoryEntry(adRoot, userName, password, at);
            else
                throw new System.InvalidCastException(string.Format("authType must be one of '{0}'", GetEnumNames<AuthenticationTypes>()));
        }
    }

    private static string GetEnumNames<T>()
    {
        string[] names = Enum.GetNames(typeof(T));
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < names.Length; i++)
        {
            if (i > 0)
                sb.Append(',');
            sb.Append(names[i]);
        }
        return sb.ToString();
    }

    private static bool TryParseEnum<T>(string value, bool ignoreCase, out T outEnum)
    {
        try
        {
            outEnum = (T)Enum.Parse(typeof(T), value, ignoreCase);
            return true;
        }
        catch
        {
            outEnum = (T)Enum.GetValues(typeof(T)).GetValue(0);
            return false;
        }
    }

    /// <summary>
    /// Searches Active Directory according provided parameters
    /// </summary>
    /// <param name="userName">UserName to be used to authenticate AD</param>
    /// <param name="password">Password to be used to authenticate to AD</param>
    /// <param name="authType">Authentication type to be used to authenticate to AD</param>
    /// <param name="adRoot">AD Root for querying AD</param>
    /// <param name="filter">Filter to be used for querying</param>
    /// <param name="searchScope">Scope to be used for queryingg</param>
    /// <param name="propertiesToLoad">List of properties to return</param>
    /// <param name="pageSize">Represents a PageSise for the paged search of AD</param>
    /// <param name="rowsLimit">Rrepresent limit for numbers of rows returned. NULL or value less than 1 represents unlimited</param>
    private static void SearchAD(string userName, string password, string authType, string adRoot, string filter, string searchScope, string propertiesToLoad, int pageSize, SqlInt32 rowsLimit)
    {
        string[] properties = propertiesToLoad.Split(new char[] { ',' }, StringSplitOptions.RemoveEmptyEntries);
        SqlMetaData[] recordMetaData = new SqlMetaData[properties.Length];
        SearchScope scope;

        int limit = rowsLimit.IsNull ? 0 : rowsLimit.Value;
        int rowsCount = 0;

        if (rowsLimit > 0 && pageSize > limit)
            pageSize = limit;

        if (!TryParseEnum<SearchScope>(searchScope, true, out scope))
            throw new System.InvalidCastException(string.Format("searchScope must be one of '{0}'", GetEnumNames<SearchScope>()));

        //Trim properties and prepare result set metadata, also process specified lengths
        for (int i = 0; i < properties.Length; i++)
        {
            string[] propDetails = properties[i].Split(new char[] { ',' }, StringSplitOptions.RemoveEmptyEntries);
            string propName = propDetails[0].Trim();
            int len;

            if (propDetails.Length < 2 || !int.TryParse(propDetails[1], out len) || len < 1 || len > 4000)
                len = 4000;

            recordMetaData[i] = new SqlMetaData(properties[i], System.Data.SqlDbType.NVarChar, len);
            properties[i] = propName;
        }

        //Get Root Directory Entry
        using (DirectoryEntry rootEntry = GetRootEntry(adRoot, userName, password, authType))
        {
            //Create a directory searcher with aproperiate filter, properties and search scope
            using (DirectorySearcher ds = new DirectorySearcher(rootEntry, filter, properties, scope))
            {
                ds.PageSize = pageSize; //Set Page Size - without this we will not do a paged search and we will be limiited to 1000 results

                //find all object from the rood, according the filter and search scope
                using (SearchResultCollection results = ds.FindAll())
                {
                    SqlDataRecord record = new SqlDataRecord(recordMetaData);
                    //Start pushing of records to client
                    SqlContext.Pipe.SendResultsStart(record);

                    foreach (SearchResult result in results)
                    {
                        record = new SqlDataRecord(recordMetaData);

                        for (int i = 0; i < properties.Length; i++)
                        {
                            ResultPropertyValueCollection props = result.Properties[properties[i]];

                            if (props.Count == 1)           //if property collection contains single vallue, set the record field to that value
                                record.SetSqlString(i, props[0].ToString());
                            else if (props.Count == 0)      //if property collection doesn't contain any value, set record field to NULL
                                record.SetSqlString(i, SqlString.Null);
                            else                            //In case of multiple value, separate the values by commas
                            {
                                StringBuilder sb = new StringBuilder();
                                bool firstItem = true;
                                foreach (object prop in props)
                                {
                                    if (!firstItem)
                                        sb.Append(',');
                                    else
                                        firstItem = false;

                                    sb.Append(prop.ToString());
                                }

                                record.SetSqlString(i, sb.ToString());
                            }
                        }

                        //send record to client
                        SqlContext.Pipe.SendResultsRow(record);

                        //if rowsLimit was reached, break the loop
                        if (++rowsCount == rowsLimit)
                            break;
                    }

                    //stop sending records to client
                    SqlContext.Pipe.SendResultsEnd();
                }
            }
        }
    }
}

The class contains three public methods (CLR Stored Procedures) QueryAD, QueryADUName and QueryADAuth. The first one will query AD using current user credentials and default authentication method, second will query AD using provided user credentials and default authentication method and in the third one we can specify also authentication method.

Those methods call a private method SearchAD, which takes care about the AD Searching and return the returns the result to client.

After the comment from Chris, I’ve updated my sample codes to have also the optional parameter pageSize which allow reduce the size of a Page for Paged Search used for querying the AD and avoid insufficient memory problems when querying higher amount of AD attributes.

I’ve also added a parameter rowsLimit which allows limit the maximum number of rows returned, which can be useful especially when querying very large AD. There is added a break into the loop which iterates the results from AD when the number of precessed imtes reach the limit.

UPDATE:

Finally I’ve also updated the private static void SearchAD method so now you pass the properties not as a comma separated list but a semicolon separated list. You can use a comma to specify the return length of each property. If the length is not specified, then the method will use a default 4000 characters length. This update should finally solve the issues with large number of properties returned when previously all were returned as nvarchar(4000)

The updated part of the code is the for loop which is processing properties and generating result set metadata, starting with the comment //Trim properties and prepare result set metadata, also process specified lengths

END OF UPDATE:

To be able to compile the code for example using Visual C# Express it is necessary to add reference to the System.DirectoryServices assembly in the project.

Once we compile the code and create say ADServices.dll assembly, we can register that assembly and CLR Stored procedures in our database.

Because the ActiveDirectory class is using System.DirectoryServices assembly, we will have to use UNSAFE PERMISSION_SET for our assembly and so the database using that assembly has to be TRUSTWORTHY.

ALTER DATABASE TestDB3 SET TRUSTWORTHY ON WITH ROLLBACK IMMEDIATE;

As we are using the System.DirectoryServices assembly, we have to register it in our database prior registering our assembly, otherwise we will not be able to register it.

--Register the NetFramework System.DirectoryServices assembly
CREATE ASSEMBLY [System.DirectoryServices]
AUTHORIZATION [dbo]
FROM 'C:WindowsMicrosoft.NETFrameworkv2.0.50727System.DirectoryServices.dll'
WITH PERMISSION_SET = UNSAFE
GO
--Register our assembly with ActiveDirectory CLR Stored Procedures
CREATE ASSEMBLY [ADServices]
AUTHORIZATION [dbo]
FROM 'C:CLRADServices.dll'
WITH PERMISSION_SET = UNSAFE
GO

Once we have successfully registered those assemblies, we can create the CLR Stored procedures in our DB.

--==========================================================================
-- @adRoot - root of searching eg. 'LDAP://OU=Sales,DC=Fabricam,DC=com'
-- @filter -  filter to be used for searching eg. '(&(objectCategory=group)'
-- @propertiesToLoad - list of properties to be retrieved eg. 'cn,50;ADsPath'
--                   - properties are separated by a semicolon and
--                   - and comma can be used to specify return length of the property
-- @searchScope - scope to be used for searching: {Base,OneLevel,Subtree}
-- @pageSize - specifies the PageSize for paged search - default is 1000
--               it is possible to lower the value if there is a problem 
--             retrieving such amount of records at once.
-- @rowsLimit - represents maximum number of rows returned.
--              NULL or value less than 1 represents unlimited
--==========================================================================
CREATE PROCEDURE usp_QueryAD
    @adRoot nvarchar(255),
    @filter nvarchar(255),
    @propertiesToLoad nvarchar(255),
    @searchScope nvarchar(8),
    @pageSize int = 1000,
    @rowsLimit int = 0
AS
EXTERNAL NAME [ADServices].[ActiveDirectory].[QueryAD]
GO
--==========================================================================
-- @userName - username to be used for authentificaiton to AD
-- @password - password to be used for authentification to AD
-- @adRoot - root of searching eg. 'LDAP://OU=Sales,DC=Fabricam,DC=com'
-- @filter -  filter to be used for searching eg. '(&(objectCategory=group)'
-- @propertiesToLoad - list of properties to be retrieved eg. 'cn,50;ADsPath'
--                   - properties are separated by a semicolon and
--                   - and comma can be used to specify return length of the property
-- @searchScope - scope to be used for searching: {Base,OneLevel,Subtree}
-- @pageSize - specifies the PageSize for paged search - default is 1000
--               it is possible to lower the value if there is a problem 
--             retrieving such amount of records at once.
-- @rowsLimit - represents maximum number of rows returned.
--              NULL or value less than 1 represents unlimited
--==========================================================================
CREATE PROCEDURE usp_QueryADUname
    @userName nvarchar(255),
    @password nvarchar(255),
    @adRoot nvarchar(255),
    @filter nvarchar(255),
    @propertiesToLoad nvarchar(255),
    @searchScope nvarchar(8),
    @pageSize int = 1000,
    @rowsLimit int = 0
AS
EXTERNAL NAME [ADServices].[ActiveDirectory].[QueryADUName]
GO
--==========================================================================
-- @userName - username to be used for authentificaiton to AD
-- @password - password to be used for authentification to AD
-- @authType - Authentification Type to be used for AD Authentification
--             {None,Secure,Encryption,SecureSocketsLayer,ReadonlyServer,Anonymous,
--             FastBind,Signing,Sealing,Delegation,ServerBind}
-- @adRoot - root of searching eg. 'LDAP://OU=Sales,DC=Fabricam,DC=com'
-- @filter -  filter to be used for searching eg. '(&(objectCategory=group)'
-- @propertiesToLoad - list of properties to be retrieved eg. 'cn,50;ADsPath'
--                   - properties are separated by a semicolon and
--                   - and comma can be used to specify return length of the property
-- @searchScope - scope to be used for searching: {Base,OneLevel,Subtree}
-- @pageSize - specifies the PageSize for paged search - default is 1000
--               it is possible to lower the value if there is a problem 
--             retrieving such amount of records at once.
-- @rowsLimit - represents maximum number of rows returned.
--              NULL or value less than 1 represents unlimited
--==========================================================================
CREATE PROCEDURE usp_QueryADAuth
    @userName nvarchar(255),
    @password nvarchar(255),
    @authType nvarchar(20),
    @adRoot nvarchar(255),
    @filter nvarchar(255),
    @propertiesToLoad nvarchar(255),
    @searchScope nvarchar(8),
    @pageSize int = 1000,
    @rowsLimit int = 0
AS
EXTERNAL NAME [ADServices].[ActiveDirectory].[QueryADAuth]
GO

After successful creation of the stored procedures we  can start querying the AD: The filter parameter is a LDAP filter according the Search Filter Syntax.

--Query AD Groups
--Result set will contain cn - common name, ADsPath
EXECUTE [usp_QueryAD] 
   @adRoot = 'LDAP://DC=Fabricam,DC=com'
  ,@filter = '(&(objectCategory=group))'
  ,@propertiesToLoad = 'cn,50;ADsPath'
  ,@searchScope = 'subtree'
GO

--Query users of Sales organization unit
--Result set will containt sn - surname, cn - common name and ADsPath
EXECUTE [usp_QueryAD] 
   @adRoot = 'LDAP://OU=Sales,DC=Fabricam,DC=com'
  ,@filter = '(&(objectCategory=user)(objectClass=user))'
  ,@propertiesToLoad = 'sn,cn,ADsPath'
  ,@searchScope = 'subtree'
GO

--Query al users belonging to Group "Sales Representatives" and providing user name and password 
--for AD Authetication including authetication type
--Result set will containt sn - surname, cn - common name and ADsPath
EXECUTE [TestDB3].[dbo].[usp_QueryADAuth] 
   @userName = 'FabricamADUserName'
  ,@password = 'ADuserNamePassword'
  ,@authType = 'secure'
  ,@adRoot = 'LDAP://DC=Fabricam,DC=com'
  ,@filter = '(&(objectCategory=user)(objectClass=user)(memberOf=CN=Sales Representatives,OU=Security Groups,OU=Groups,DC=Fabricam,DC=com))'
  ,@propertiesToLoad = 'sn,100;cn,200;ADsPath,1000'
  ,@searchScope = 'subtree'
GO

As we use a stored procedure for querying AD, then we cannot work directly with the result further (OK.. On Denali it will be possible thanks to the EXECUTE WITH RESULT SETS. On SQL Server 2005 and 2008 we could store the results e.g. to table variable to temp table and then work with the results as normally. From the CLR code we can see, that the result set contains all the AD properties we have passed as parameter and the order is exactly the same as in the input parameter.  The data type of each returned column is nvarchar with length which was specified in the properties list. If no length was specified or the length was less than one or grater than 4000 then the return type is nvarchar(4000).

--Query All users from AD, store them in a table variable and then
--select all users with their cn (common name) starting by "C"
DECLARE @result TABLE (
    sn nvarchar(100),
    cn nvarchar(200),
    ADsPath nvarchar(1000)
)

INSERT INTO @result
EXECUTE [TestDB3].[dbo].[usp_QueryADAuth] 
   @userName = 'FabricamADUserName'
  ,@password = 'ADuserNamePassword'
  ,@authType = 'secure'
  ,@adRoot = 'LDAP://DC=Fabricam,DC=com'
  ,@filter = '(&(objectCategory=user)(objectClass=user))'
  ,@propertiesToLoad = 'sn,100;cn,200;ADsPath,1000'
  ,@searchScope = 'subtree'

SELECT 
*
FROM @result
WHERE cn LIKE 'C%'

From the examples above we can see, that once we create CLR stored procedures for querying the AD, the queries to AD are quite easy. Comparing the the T-SQL and Linked Server or OPENROWSET solution we have much greater possibilities and what is most important, we are not limited to 1000 results from our query, so we can easily query all the object in AD.

If you are interested, you can download sample ADServices Visual C# 2010 Express project. It contains all the latest updates I have mentioned.

Here you can download an updated sample PPSqlDirectoryServices Visual C# 2010 project with all the lasted updates implemented.

Querying Active Directory on SQL Server using T-SQL

You may come to a situation when you need to retrieve list of users, groups or other information from Windows Active Directory (AD) or another LDAP (Lightweight Directory Access Protocol) from within SQL Server.  You may want to store the information from AD in SQL Server tables for later use, or for example determine list of users belonging to particular group etc..

There are basically 3 ways of accessing the Active Directory on SQL Server.

  1. Use Linked Server
  2. User OPENROWSET statement
  3. Using CLR

This article will focus on Querying the Active Directory using the first 2 options. The CLR way is describe in my next article Querying Active Directory on SQL Server using CLR.

Access Active Directory using Linked Server

First I will focus on the Linked Server way to access the AD. So the first step is to create a Linked Server. Tis can be done using the sp_addlinkedserver system stored procedure or using Management Studio and we will use the OLE DB Provider for Microsoft Directory Services (ADSDSOObject).

Creating the Linked Server using the sp_addlinkedserver system stored procedure

To get the linked server working we need to add the linked server itself and also provide credentials to be used by linked server. Here we have two possibilities – use the login’s current security context – this mean use credentials of currently authenticated user (will not work for SQL Server authentication) or provide some AD account credentials to access the linked server.

EXEC sp_addlinkedserver @server = N'ADSI', @srvproduct=N'Active Directory Service Interfaces', @provider=N'ADSDSOObject', @datasrc=N'adsdatasource'

Prior we can use the linked server we have to add a login to that linked server which will be used to authenticate against AD. There are three possibilities:

--Use current user security context (Linked server will use security context of currently autheticated user
EXEC sp_addlinkedsrvlogin @rmtsrvname=N'ADSI',@useself=N'True',@locallogin=NULL,@rmtuser=NULL,@rmtpassword=NULL

--Use one remote account for all users (Linked server will be available to all users) and will use 'DOMAINUserAccount' for authetication to AD
EXEC sp_addlinkedsrvlogin @rmtsrvname=N'ADSI',@useself=N'False',@locallogin=NULL,@rmtuser=N'DOMAINUserAccount',@rmtpassword=N'DomainAccountPassword'

--The Linked Server will be available to SQL Server Login 'SQL Server Login Name' and will use the 'DomainUserAccount' for authetication to AD
EXEC sp_addlinkedsrvlogin @rmtsrvname=N'ADSI',@useself=N'False',@locallogin=N'SQL Server Login Name',@rmtuser=N'DOMAINUserAccunt',@rmtpassword=N'DomainAccountPassword'
Creating the Linked Server using Management Studio

Create Linked Server Step 1Create Linked Server Step 2Create Linked Server Step 3

Querying AD using the Linked Server

Once we have created the linked server we can start querying the AD and we have two possibilities to construct the query against AD (LDAP).

  1. Use SQL Dialect (MSDN)
  2. Use LDAP Dialect (MSDN)

I will show some sample queries using both dialect and details you can find on the above links.

SQL Dialect

SQL Dialect uses the constructs of SQL language and we can write queries against the AD

--Select all groups
SELECT
*
FROM OPENQUERY(ADSI,'SELECT cn, ADsPath FROM ''LDAP://DC=Fabricam,DC=com'' WHERE objectCategory=''group''')

--Select all users
SELECT
*
FROM OPENQUERY(ADSI,'SELECT cn, ADsPath FROM ''LDAP://OU=Sales,DC=Fabricam,DC=com'' WHERE bjectCategory=''person'' AND objectClass=''user''')

Be aware, the the LDAP is case sensitive, so you have to write LDAP:// and not ldap:// otherwise you will not receive any results. Also remember that the LDAP path needs to be enclosed in single quotes and therefore when it is specified in the OPENQUERY statement, it needs to be double single quoted as in example.

Also remember, that LDAP query returns only first 1000 records matching the specified conditions. Once the limit is met, you will receive and error message. You can limit the number of records using the TOP clause to not receive the error. To receive more than 1000 records is possible by using pagination, but it’s quite problematic in T-SQL and I will show this method in the upcoming article which will focus on getting AD data using CLR.

LDAP Dialect

As mentioned above, the other option to make query against the AD is using the LDAP dialect and details about the dialect you can find on the link above. Details how you write the condition in LDAP dialect you can find on MSDN Search Filter Syntax. Using the LDAP dialect we have a greater possibilities in writing conditions, which allow us easily retrieve members of AD Groups etc..

--Select all groups
SELECT
*
FROM OPENQUERY(ADSI,'<LDAP://DC=Fabricam,DC=com>;(&(objectCategory=group));cn,ADsPath;subtree')

--Select all users
SELECT
*
FROM OPENQUERY(ADSI,'<LDAP://OU=Sales,DC=Fabricam,DC=com>;(&(objectCategory=person)(objectClass=user));cn,ADsPath;subtree')

--Select all users belonging to group 'MyGroup'
SELECT
*
FROM OPENQUERY(ADSI,'<LDAP://DC=Fabricam,DC=com>;(&(objectCategory=user)(objectClass=user)(memberOf=MyGroups,OU=Security Groups,OU=Groups,DC=Fabricam,DC=com));cn,ADsPath;subtree')

Access Active Directory using OPENROWSET statement

The method of accessing Active Directory using the OPENROWSET statement is similar to the Linked Server solution. The difference is, that we do not have to create the linked server, but instead of this we provide all the information related to driver, login etc. directly as parameters of the OPENROWSET statement.

Again we have the possibilities to use SQL Dialect or LDAP dialect whatever suits your needs better. Here are some Examples.

--Select all Groups which name starts with "D" (SQL Dialect)
SELECT
*
FROM OPENROWSET('ADSDSOObject', '','SELECT cn, ADsPath FROM ''LDAP://DC=Fabricam,DC=com'' WHERE objectCategory=''group'' AND cn=''D*''')


--Select all users (LDAP Dialect) providing also credentials and flags
SELECT
*
FROM OPENROWSET('ADSDSOObject', 'User ID=UserName;Password=UserPassword;ADSI Flag=0x11;','<LDAP://OU=Sales,DC=Fabricam,DC=com>;(&(objectCategory=user)(objectClass=user));cn,ADsPath;subtree')

In the second example with LDAP Dialect I’ve used also an “ADSI Flag” option. This option specifies binding authentication option and can be a combination of values from the ADS_AUTHENTICATION_ENUM.

Conclusion

The ADSI interface provides us an easy and simple way how to query Active Directory from SQL Server directly sing T-SQL commands. The biggest limitation of that approach is, that it cannot return more than 1000 records in a single batch. This is possible using a Paged Search, but unfortunately this is not available in the T-SQL approach. Although we are limited to 1000 records per batch, we can partially avoid this limitation by adding additional conditions to the query e.g. retrieving all users in batches by the first letter of Common Name etc.. The real power in querying AD brings CLR solution described in Querying Active Directory on SQL Server using CLR.

Reading SharePoint List and Survey data using Integration Services package

Sometimes you can come to situation when you need to read data from SharePoint and store them in database on SQL Server.

If you need to read a SharePoint List, then you can use SharePoint List Source and Destination components from CodePlex. There is a quite good presentation how to use it on MSDN site Extracting and Loading SharePoint Data in SQL Server Integration Services

These components work very well for the SharePoint List. Unfortunately you will have no success using them for reading data from the SharePoint Survey, although the structure of the data are nearly the same as for list. Simply these components doesn’t support Survey data. Also it can happen, that you cannot install any third party component you your server and therefore you cannot use the SharePoint List Source and Destination components.

In that case you can use a script which will extract the XML data out of the SharePoint and you have two possibilities how to handle the XML.

  1. You can save the XML file to some temporary location and then process the XML file using the XML Source in the Data Flow task to read data from it.
  2. You can process the XML inside the Script Component in the Data Flow task and direct records to corresponding output.

To retrieve the data you can use a below class snippet (C#), which you put into your Script Component or Scrip Task.

internal class SharePointDownloader
{
    public static XmlDocument GetSharePointListXml(string listUrl)
    {
        ServicePointManager.ServerCertificateValidationCallback = new RemoteCertificateValidationCallback(CertificateOverride.RemoteCertificateValidationCallback);
        WebResponse response;
        HttpWebRequest request = (HttpWebRequest)System.Net.HttpWebRequest.Create(listUrl);
        CookieContainer cookieContainer = new CookieContainer();

        request.CookieContainer = cookieContainer;
        request.Credentials = System.Net.CredentialCache.DefaultNetworkCredentials;

        response = request.GetResponse();
        string html;
        using (StreamReader sr = new StreamReader(response.GetResponseStream()))
        {
            html = sr.ReadToEnd();
            sr.Close();
        }

        XmlDocument doc = new XmlDocument();
        doc.LoadXml(html);

        XmlDocument destXml = new XmlDocument();
        XmlDeclaration dec = destXml.CreateXmlDeclaration("1.0", null, null);
        destXml.AppendChild(dec);
        XmlNode data = destXml.CreateElement("Data");
        destXml.AppendChild(data);

        foreach (XmlNode node in doc.GetElementsByTagName("z:row"))
        {
            XmlNode row = destXml.CreateElement("row");
            foreach (XmlAttribute attr in node.Attributes)
            {
                XmlAttribute da = destXml.CreateAttribute(attr.Name);
                da.Value = attr.Value;
                row.Attributes.Append(da);
            }
            data.AppendChild(row);
        }

        return destXml;
    }

    internal class CertificateOverride
    {
        public static bool RemoteCertificateValidationCallback(object sender, X509Certificate certificate, X509Chain chaing, SslPolicyErrors sslPolicyErrors)
        {
            return true;
        }
    }
}

As mentioned above put this class as part of the Script Component or Script Task and use the static method SharePointDownloader.GetSharePointListXml method to retrieve the XML data.

The function takes as argument an URL which points to the list/survey you need to download. You can retrieve this link from the .igy file which you get, when you select Export To Spreadsheet action in the SharePoint. Instead opening the file in excel you choose to save it and retrieve the URL from this file.

It is necessary to mention the assignation of the ServerCertificateValidationCallback property of the ServicePointManager class to a newly created internal class, which simply returns true. This is necessary to avoid problems with certificates.

The code snippet also uses the DefaultNetworkCredentials which represents credentials of the account under which the SSIS package will be executed. You can eventually provide other credentials if necessary.

One very important thing related to this snipped is, that in case of downloading Survey data, the account under which the code will be executed needs admin rights to the Survey in SharePoint, otherwise it will have no access to the complete list of responses in the survey.

So finally a complete sample of a Script Task could look similar to below code:

[System.AddIn.AddIn("ScriptMain", Version = "1.0", Publisher = "", Description = "")]
public partial class ScriptMain : Microsoft.SqlServer.Dts.Tasks.ScriptTask.VSTARTScriptObjectModelBase
{

    #region VSTA generated code
    enum ScriptResults
    {
        Success = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Success,
        Failure = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Failure
    };

    #endregion

    public void Main()
    {

        string url = "source data URL retrieved from the .igy file of the list/survey";
        string destFile = "c:tempfile.xml";
        try
        {
            XmlDocument data = SharePointDownloader.GetSharePointListXml(url);
            data.Save(destFile);
        }
        catch
        {
            Dts.TaskResult = (int)ScriptResults.Failure;
            return;
        }
        Dts.TaskResult = (int)ScriptResults.Success;
    }
}

internal class SharePointDownloader
{
    public static XmlDocument GetSharePointListXml(string listUrl)
    {
        ServicePointManager.ServerCertificateValidationCallback = new RemoteCertificateValidationCallback(CertificateOverride.RemoteCertificateValidationCallback);
        WebResponse response;
        HttpWebRequest request = (HttpWebRequest)System.Net.HttpWebRequest.Create(listUrl);
        CookieContainer cookieContainer = new CookieContainer();

        request.CookieContainer = cookieContainer;
        request.Credentials = System.Net.CredentialCache.DefaultNetworkCredentials;

        response = request.GetResponse();
        string html;
        using (StreamReader sr = new StreamReader(response.GetResponseStream()))
        {
            html = sr.ReadToEnd();
            sr.Close();
        }

        XmlDocument doc = new XmlDocument();
        doc.LoadXml(html);

        XmlDocument destXml = new XmlDocument();
        XmlDeclaration dec = destXml.CreateXmlDeclaration("1.0", null, null);
        destXml.AppendChild(dec);
        XmlNode data = destXml.CreateElement("Data");
        destXml.AppendChild(data);

        foreach (XmlNode node in doc.GetElementsByTagName("z:row"))
        {
            XmlNode row = destXml.CreateElement("row");
            foreach (XmlAttribute attr in node.Attributes)
            {
                XmlAttribute da = destXml.CreateAttribute(attr.Name);
                da.Value = attr.Value;
                row.Attributes.Append(da);
            }
            data.AppendChild(row);
        }

        return destXml;
    }

    internal class CertificateOverride
    {
        public static bool RemoteCertificateValidationCallback(object sender, X509Certificate certificate, X509Chain chaing, SslPolicyErrors sslPolicyErrors)
        {
            return true;
        }
    }
}

Once you save the data to a temporary destination, you can connect to the xml file using the XML Source in the Data Flow Task. What you will need is to generate the XSD schema from the XML source and keep that XSD schema for future use as in case of any other XML file which needs to be imported by SSIS.

If you would like to process the XML in the Script component, again you simply put the code inside the Script component, retrieve the XML and then go through the XML programmatically and direct the data to appropriate output of the script component.

Of course you can make some fine tuning for the script component to avoid double processing of the file use the core of the GetSharePointListXML and redirect the data directly to and output of the script component when processing the nodes in that part.

Querying agent job status, executing and waiting for job completion from within T-SQL

In this article I would like to describe how easily you can query an Agent Job status from within T-SQL command and also how it is possible to execute an agent job and wait for it’s completion.

It can happen, that you do not have access to the Job Activity Monitor and would like to check the status of particular job or simply you would like to query the status to know whether the job finished or not.

For this purposes you can crate a very simple function in the [msdb].

USE [msdb]
GO
-- ============================================= 
-- Author:      Pavel Pawlowski 
-- Create date: 2009/08/14 
-- Description: Gets state of particular Job 
-- 
-- -2 = Job was not Found 
-- -1 = Job is Disabled 
--  0 = Failed 
--  1 = Succeeded 
--  2 = Retry 
--  3 = Canceled 
--  4 = In progress 
--  5 = Disabled 
--  6 = Idle
-- ============================================= 
CREATE FUNCTION [dbo].[fn_GetJobStatus] ( 
    @pJobName varchar(100) 
) 
RETURNS int 
AS 
BEGIN 
    DECLARE @status int     

    SELECT 
        @status = CASE 
            WHEN O.enabled = 0 THEN -1
            WHEN OA.run_requested_date IS NULL THEN 6
            ELSE ISNULL(JH.RUN_STATUS, 4)
        END        
    FROM MSDB.DBO.SYSJOBS O 
    INNER JOIN MSDB.DBO.SYSJOBACTIVITY OA ON (O.job_id = OA.job_id) 
    INNER JOIN (SELECT MAX(SESSION_ID) AS SESSION_ID FROM MSDB.DBO.SYSSESSIONS ) AS S ON (OA.session_ID = S.SESSION_ID) 
    LEFT JOIN MSDB.DBO.SYSJOBHISTORY JH ON (OA.job_history_id = JH.instance_id) 
    WHERE O.name = @pJobName 

    RETURN ISNULL(@status, -2) 
END 
GO

This function queries several tables from [msdb] to retrieve the status and takes a job name as parameter. Return codes are described in the T-SQL above and codes 2 and 4 are representing a job, which is currently running. Where 2 represents job, which is running in retry mode after previous failure.

This function we can use for writing a procedure, which will allow us to wait for it’s completion.

USE [msdb]
GO
-- ============================================= 
-- Author:      Pavel Pawlowski 
-- Create date: 2009/08/14 
-- Description: Waits for a Job Completion 
--
-- @pStatusRequestFrequence defines delay between requesting status of the Job in seconds. 
--
--Returns values: 
-- -2 = Job was not Found 
-- -1 = Job is Disabled 
--  0 = Failed 
--  1 = Succeeded 
--  2 = Retry 
--  3 = Canceled 
--  4 = In progress 
--  5 = Disabled
--  6 = Idle
-- ============================================= 
CREATE PROCEDURE [dbo].[usp_WaitForJob]  
    @pJobName varchar(100),  
    @pStatusRequestFrequency tinyint = 5,
    @pWaitInitialIdle bit = 1
AS 
BEGIN 
    SET NOCOUNT ON; 
    DECLARE @hours int 
    DECLARE @mins int 
    DECLARE @delay varchar(8) 
    DECLARE @status int 

    SET @hours = ROUND(@pStatusRequestFrequency / 3600, 0, 1) 
    SET @pStatusRequestFrequency = @pStatusRequestFrequency - (@hours * 3600) 
    SET @mins = ROUND(@pStatusRequestFrequency / 60, 0, 1) 
    SET @pStatusRequestFrequency = @pStatusRequestFrequency - (@mins * 60) 

    SET @delay = RIGHT('00' + CONVERT(varchar(2), @hours), 2) + ':' + RIGHT('00' + CONVERT(varchar(2), @mins), 2) + ':' + RIGHT('00' + CONVERT(varchar(2), @pStatusRequestFrequency), 2)

    SET @status = 4 
    WHILE (@status IN (2, 4)) 
    BEGIN 
        SET @status = dbo.fn_GetJobStatus(@pJobName) 
        IF (@status IN (2, 4) OR (@status = 6 AND @pWaitInitialIdle = 1) )
        BEGIN
            WAITFOR DELAY @delay
            SET @pWaitInitialIdle = 0
        END
    END 

    RETURN @status 
END

The procedure executes periodically the above function to get execution status of the job. and when it detects that it is not running, it returns the status of the job. The procedure takes again a job name as parameter and additional two parameters. One to specify the delays between the check for the execution status (by default 5) and an argument specifying whether wait for completion even querying for the first time and the first return value was “Idle”. It can happen, when you start the job and immediately query status, that the state is not reflected in the [msdb] system tables.

Finally at the end we can create a stored procedure for executing the job itself. It will use the previous stored procedure for waiting and the function for querying the status prior execution of the job.

-- ============================================= 
-- Author:      Pavel Pawlowski 
-- Create date: 2009/08/14 
-- Description: Runs Particular Job 
-- Start Options: 
--  0 = if job is already running, do not run job and finish 
--  1 = if job is already running, wait for completion and then run it again 

-- Return Options: 
--  0 = Start Job and wait for job completion 
--  1 = Start Job and return from procedure 

--Returns values: 
-- -2 = Job was not Found 
-- -1 = Job is Disabled 
--  0 = Failed 
--  1 = Succeeded 
--  2 = Retry 
--  3 = Canceled 
--  4 = In progress 
--  5 = Disabled
--  6 = Idle

-- @pStatusRequestFrequence defines delay between requesting status of the Job in seconds. 
-- ============================================= 
CREATE PROCEDURE [dbo].[usp_RunJob]  
    @pJobName varchar(100),  
    @pStartOption int = 0, 
    @pReturnOption int = 0, 
    @pStatusRequestFrequency tinyint = 5
AS 
BEGIN 
    SET NOCOUNT ON; 
    DECLARE @status int 

    SET @status = dbo.fn_GetJobStatus(@pJobName) 

    IF (@status IN (2, 4) AND @pStartOption = 1) 
        EXEC @status = dbo.usp_WaitForJob @pJobName, @pStatusRequestFrequency 

    IF (@status IN (0, 1, 3, 6)) 
    BEGIN 
        EXEC MSDB.dbo.sp_start_job @pJobName 

        IF (@pReturnOption <> 1) 
            EXEC @status = dbo.usp_WaitForJob @pJobName, @pStatusRequestFrequency 
        ELSE 
            SET @status = dbo.fn_GetJobStatus(@pJobName) 
    END 

    RETURN @status 
END

This procedure again takes several parameter including the job name to be executed. It has also option whether to wait for the job completion or not and how to handle situation if the job is already running when this procedure is executed.

You can use this mechanism also e.g. for executing Integration Services Package from within T-SQL. You only need to create a job for that package, which will have no schedule. Then using the usp_RunJob you start the job, which takes care about execution of the package itself.

Cloning user rights in database

update: Check new post SQL Server – Cloning User Rights – updated sp_CloneRights on GitHub

Some times it could happen, that you need to create a new database user, which will have exactly the same right as another existing user.

In ideal scenario, you will have all the necessary rights assigned to a database roles, and then when you create the new user, you simply add the user to appropriate roles to grant all the necessary rights.

This is ideal scenario, which is not always met, especially when you have to manage the server after somebody else, who didn’t used roles for granting rights.

In such scenario a below system stored stored procedure can be very handful.

USE [master]
GO
--============================================
-- Author:      Pavel Pawlowski
-- Created:     2010/04/16
-- Description: Copies rights of old user to new user
--==================================================
CREATE PROCEDURE sp_CloneRights (
    @oldUser sysname, --Old user from which to copy right
    @newUser sysname, --New user to which copy rights
    @printOnly bit = 1, --When 1 then only script is printed on screen, when 0 then also script is executed, when NULL, script is only executed and not printed
    @NewLoginName sysname = NULL --When a NewLogin name is provided also a creation of user is part of the final script
)
AS
BEGIN
    SET NOCOUNT ON

    CREATE TABLE #output (
        command nvarchar(4000)
    )

    DECLARE
        @command nvarchar(4000),
        @sql nvarchar(max),
        @dbName nvarchar(128),
        @msg nvarchar(max)

    SELECT
        @sql = N'',
        @dbName = QUOTENAME(DB_NAME())

    IF (NOT EXISTS(SELECT 1 FROM sys.database_principals where name = @oldUser))
    BEGIN
        SET @msg = 'Source user ' + QUOTENAME(@oldUser) + ' doesn''t exists in database ' + @dbName
        RAISERROR(@msg, 11,1)
        RETURN
    END    

    INSERT INTO #output(command)
    SELECT '--Database Context' AS command UNION ALL
    SELECT    'USE' + SPACE(1) + @dbName UNION ALL
    SELECT 'SET XACT_ABORT ON'

    IF (ISNULL(@NewLoginName, '') <> '')
    BEGIN
        SET @sql = N'USE ' + @dbName + N';
        IF NOT EXISTS (SELECT 1 FROM sys.database_principals WHERE name = @newUser)
        BEGIN
            INSERT INTO #output(command)
            SELECT ''--Create user'' AS command

            INSERT INTO #output(command)
            SELECT
                ''CREATE USER '' + QUOTENAME(@NewUser) + '' FOR LOGIN '' + QUOTENAME(@NewLoginName) +
                    CASE WHEN ISNULL(default_schema_name, '''') <> '''' THEN '' WITH DEFAULT_SCHEMA = '' + QUOTENAME(dp.default_schema_name)
                        ELSE ''''
                    END AS Command
            FROM sys.database_principals dp
            INNER JOIN sys.server_principals sp ON dp.sid = sp.sid
            WHERE dp.name = @OldUser
        END'

        EXEC sp_executesql @sql, N'@OldUser sysname, @NewUser sysname, @NewLoginName sysname', @OldUser = @OldUser, @NewUser = @NewUser, @NewLoginName=@NewLoginName
    END

    INSERT INTO #output(command)
    SELECT    '--Cloning permissions from' + SPACE(1) + QUOTENAME(@OldUser) + SPACE(1) + 'to' + SPACE(1) + QUOTENAME(@NewUser)

    INSERT INTO #output(command)
    SELECT '--Role Memberships' AS command

    SET @sql = N'USE ' + @dbName + N';
    INSERT INTO #output(command)
    SELECT ''EXEC sp_addrolemember @rolename =''
        + SPACE(1) + QUOTENAME(USER_NAME(rm.role_principal_id), '''''''') + '', @membername ='' + SPACE(1) + QUOTENAME(@NewUser, '''''''') AS command
    FROM    sys.database_role_members AS rm
    WHERE    USER_NAME(rm.member_principal_id) = @OldUser
    ORDER BY rm.role_principal_id ASC'

    EXEC sp_executesql @sql, N'@OldUser sysname, @NewUser sysname', @OldUser = @OldUser, @NewUser = @NewUser

    INSERT INTO #output(command)
    SELECT '--Object Level Permissions'

    SET @sql = N'USE ' + @dbName + N';
    INSERT INTO #output(command)
    SELECT    CASE WHEN perm.state <> ''W'' THEN perm.state_desc ELSE ''GRANT'' END
        + SPACE(1) + perm.permission_name + SPACE(1) + ''ON '' + QUOTENAME(SCHEMA_NAME(obj.schema_id)) + ''.'' + QUOTENAME(obj.name)
        + CASE WHEN cl.column_id IS NULL THEN SPACE(0) ELSE ''('' + QUOTENAME(cl.name) + '')'' END
        + SPACE(1) + ''TO'' + SPACE(1) + QUOTENAME(@NewUser) COLLATE database_default
        + CASE WHEN perm.state <> ''W'' THEN SPACE(0) ELSE SPACE(1) + ''WITH GRANT OPTION'' END
    FROM    sys.database_permissions AS perm
        INNER JOIN
        sys.objects AS obj
        ON perm.major_id = obj.[object_id]
        INNER JOIN
        sys.database_principals AS usr
        ON perm.grantee_principal_id = usr.principal_id
        LEFT JOIN
        sys.columns AS cl
        ON cl.column_id = perm.minor_id AND cl.[object_id] = perm.major_id
    WHERE    usr.name = @OldUser
    ORDER BY perm.permission_name ASC, perm.state_desc ASC'

    EXEC sp_executesql @sql, N'@OldUser sysname, @NewUser sysname', @OldUser = @OldUser, @NewUser = @NewUser

    INSERT INTO #output(command)
    SELECT N'--Database Level Permissions'

    SET @sql = N'USE ' + @dbName + N';
    INSERT INTO #output(command)
    SELECT    CASE WHEN perm.state <> ''W'' THEN perm.state_desc ELSE ''GRANT'' END
        + SPACE(1) + perm.permission_name + SPACE(1)
        + SPACE(1) + ''TO'' + SPACE(1) + QUOTENAME(@NewUser) COLLATE database_default
        + CASE WHEN perm.state <> ''W'' THEN SPACE(0) ELSE SPACE(1) + ''WITH GRANT OPTION'' END
    FROM    sys.database_permissions AS perm
        INNER JOIN
        sys.database_principals AS usr
        ON perm.grantee_principal_id = usr.principal_id
    WHERE    usr.name = @OldUser
    AND    perm.major_id = 0
    ORDER BY perm.permission_name ASC, perm.state_desc ASC'

    EXEC sp_executesql @sql, N'@OldUser sysname, @NewUser sysname', @OldUser = @OldUser, @NewUser = @NewUser

    DECLARE cr CURSOR FOR
        SELECT command FROM #output

    OPEN cr

    FETCH NEXT FROM cr INTO @command

    SET @sql = ''

    WHILE @@FETCH_STATUS = 0
    BEGIN
        IF (@printOnly IS NOT NULL)
            PRINT @command

        SET @sql = @sql + @command + CHAR(13) + CHAR(10)
        FETCH NEXT FROM cr INTO @command
    END

    CLOSE cr
    DEALLOCATE cr

    IF (@printOnly IS NULL OR @printOnly = 0)
        EXEC (@sql)

    DROP TABLE #output
END
GO
EXECUTE sp_ms_marksystemobject 'dbo.sp_CloneRights'
GO

The stored procedure allows copying all the objects and database rights from the old user to a new one. It also clones roles membership for the user.

If the @NewLoginName is specified then then it also creates the @newUser in the database for the login specified and then copies the rights.

@printOnly specifies whether the script should be printed, executed automatically or both printed and executed automatically.

As the system is marked as system, it executes in the context of the database in which is executed.

It also allows copying rights among database roles. If you specify as @oldUser a database or application role name, then the rights of that role will be copied to the @newUser. Again the @newUser can be a user name or database/application role name.

For example if you have a Integration services installed and invoke a below script

USE [msdb]
GO
EXEC sp_CloneRights 'db_ssisadmin', 'NewUser'

you will receive a below script for assigning rights.

--Database Context
USE [msdb]
SET XACT_ABORT ON
--Cloning permissions from [db_ssisadmin] to [NewUser]
--Role Memberships
--Object Level Permissions
GRANT DELETE ON [dbo].[sysssislog] TO [NewUser]
GRANT EXECUTE ON [dbo].[sp_get_dtsversion] TO [NewUser]
GRANT EXECUTE ON [dbo].[sp_make_dtspackagename] TO [NewUser]
GRANT EXECUTE ON [dbo].[sp_add_dtspackage] TO [NewUser]
GRANT EXECUTE ON [dbo].[sp_drop_dtspackage] TO [NewUser]
GRANT EXECUTE ON [dbo].[sp_reassign_dtspackageowner] TO [NewUser]
GRANT EXECUTE ON [dbo].[sp_get_dtspackage] TO [NewUser]
GRANT EXECUTE ON [dbo].[sp_ssis_addlogentry] TO [NewUser]
GRANT EXECUTE ON [dbo].[sp_ssis_listpackages] TO [NewUser]
GRANT EXECUTE ON [dbo].[sp_enum_dtspackages] TO [NewUser]
GRANT EXECUTE ON [dbo].[sp_ssis_listfolders] TO [NewUser]
GRANT EXECUTE ON [dbo].[sp_ssis_deletepackage] TO [NewUser]
GRANT EXECUTE ON [dbo].[sp_ssis_deletefolder] TO [NewUser]
GRANT EXECUTE ON [dbo].[sp_ssis_getpackage] TO [NewUser]
GRANT EXECUTE ON [dbo].[sp_ssis_getfolder] TO [NewUser]
GRANT EXECUTE ON [dbo].[sp_ssis_putpackage] TO [NewUser]
GRANT EXECUTE ON [dbo].[sp_ssis_checkexists] TO [NewUser]
GRANT EXECUTE ON [dbo].[sp_ssis_addfolder] TO [NewUser]
GRANT EXECUTE ON [dbo].[sp_ssis_renamefolder] TO [NewUser]
GRANT EXECUTE ON [dbo].[sp_ssis_setpackageroles] TO [NewUser]
GRANT EXECUTE ON [dbo].[sp_ssis_getpackageroles] TO [NewUser]
GRANT EXECUTE ON [dbo].[sp_log_dtspackage_begin] TO [NewUser]
GRANT EXECUTE ON [dbo].[sp_log_dtspackage_end] TO [NewUser]
GRANT EXECUTE ON [dbo].[sp_log_dtsstep_begin] TO [NewUser]
GRANT EXECUTE ON [dbo].[sp_log_dtsstep_end] TO [NewUser]
GRANT EXECUTE ON [dbo].[sp_log_dtstask] TO [NewUser]
GRANT EXECUTE ON [dbo].[sp_enum_dtspackagelog] TO [NewUser]
GRANT EXECUTE ON [dbo].[sp_enum_dtssteplog] TO [NewUser]
GRANT EXECUTE ON [dbo].[sp_enum_dtstasklog] TO [NewUser]
GRANT EXECUTE ON [dbo].[sp_dump_dtslog_all] TO [NewUser]
GRANT EXECUTE ON [dbo].[sp_dump_dtspackagelog] TO [NewUser]
GRANT EXECUTE ON [dbo].[sp_dump_dtssteplog] TO [NewUser]
GRANT EXECUTE ON [dbo].[sp_dump_dtstasklog] TO [NewUser]
GRANT INSERT ON [dbo].[sysssislog] TO [NewUser]
GRANT REFERENCES ON [dbo].[sysssislog] TO [NewUser]
GRANT SELECT ON [dbo].[sysssislog] TO [NewUser]
GRANT UPDATE ON [dbo].[sysssislog] TO [NewUser]
--Database Level Permissions

Hope you will find this script useful and hope it will save you a lot of work when cloning rights.

It saved me several times, when I come to an existing database with several hundreds of tables with rights assigned on the object level and I had to introduce a new user with exactly the same rights as an existing one.

One thing needs to be mentioned for end. The procedure clones only object and database right. It doesn’t clone right for system objects, assemblies etc., but you can easily extend the procedure to cover also this this.

This procedure is inspired by  a script I found in past somewhere on internet.

Automating CRUD procedures generation using T-SQL

CRUD procedures represent a good way how to secure the database. When using CRUD procedures, no user has direct access to the database tables and instead of this is forced to use parameterized procedures for all data manipulations called CRUD procedures. CRUD is acronym for Create Read Update Delete.

CRUD procedures have a lot of advantages from which I can name several:

  • You do not need to grant rights directly to tables and in this way you hide the physical tables structure to users.
  • You can encapsulate some logic into the procedures and then when something needs to be changed, you simply modify a procedure and do not need to rebuild the application.
  • You hide the SQL logic encapsulated in stored procedures from end user.
  • You protect your SQL server against SQL injection as all data using the CRUD procedures has to be passed as parameters and in this way you drastically decrease a chance for the SQL injection.

The main disadvantage of CRUD procedures is the overhead which SQL developer has to do as for each table or group of several tables it is necessary separate stored procedures for the data manipulation.

There are several tools and plugins into visual studio, which simplifies creation of the CRUD procedures, but in this article I would like to show a possible way how to automate creation of the CRUD procedures using pure T-SQL.

The automation will be handled by a custom system stored procedure, which generate all the necessary code or directly create all the CRUD procedures for a database table.

This article describes how to crate a system stored procedure sp_makeCRUD, which will take 4 parameters @objectName, @executionMode, @dropExistingProcedures and @outputIdentityCalcField. The @objectName represents a table for which the CRUD procedures will be generated, @executionMode will represents execution mode of the stored procedure and will control whether the code of CRUD procedures will be only printed on scree or whether the generated code will be automatically executed and the CRUD procedures will created. @dropExistingProcedures parameter will specify whether include DROP statement for all procedures to drop eventual existing procedures and @outputIdentityCalcField will control, whether Identity and Calculated fields will be automatically returned as record set for ISERT and UPDATE operations.

CRUD procedures construction

To create a CRUD procedures we need to know all the fields in the table, whether the fields are Identity or calculated fields and also we need to know what fields the primary key is composed from. Also table name and it’s schema will be needed as user can specify the object name without schema or with schema. The CRUD procedures will be created in the same schema as the original table. To retrieve such information various system tables will be used.

Retrieving Object ID, Table Name, Schema Name and DB Name

To retrieve Object ID (ID of the table) and DB Name we will use the OBJECT_ID() and DB_NAME() functions.

SELECT
    @objID = OBJECT_ID(@objectName),
    @dbName = DB_NAME()

Table name and schema name, could be retrieved using system tables sys.objects and sys.schemas.

SELECT
    @schemaName = s.name,
    @tableName = o.name
FROM sys.objects o
INNER JOIN sys.schemas s ON o.schema_id = s.schema_id
WHERE o.object_id = @objId AND o.type = 'U'

Retrieving Table Field and Primary Key

After the object ID is retrieved, it is necessary to retrieve all the table fields, their data types and information whether the field is an IDENTITY field or whether it is Calculated Field. This information is necessary for generating INSERT and UPDATE statements as such fields are not updatable. Also it is necessary to identify fields which are part of the primary key. This fields are necessary for the UPDATE, DELETE and also for SELECT One statements. Fields will be temporarily stored in table variables as they will be accessed several times to construct different fields combinations for each statement and also for creation of procedure parameters.

The retrieval of all fields is quite easy as only access to sys.columns table for field names and sys.types for type names is necessary. The field type is written to the table variable as SQL statement which will be used when generating the procedure parameters.

DECLARE @allFields TABLE (        --Table variable for storing all the table fields
    name sysname,                --field name
    isIdentity bit,                --specifies whether field is INDENTITY
    isCalculated bit,            --specifies whether filed is Calculated field
    fieldType sysname            --Specified data type of the field
)

--Get all table fields and store them in the @allFields table variable for construction of CRUD procedures
INSERT INTO @allFields (
    name,
    isIdentity,
    isCalculated,
    fieldType
)
SELECT
    c.name,
    is_identity,
    is_computed,
    CASE
        WHEN t.name IN (N'char', N'nchar', N'varchar', N'nvarchar') THEN QUOTENAME(t.name) + N'(' + CASE WHEN c.max_length = -1 THEN N'max' ELSE CAST(c.max_length AS sysname) END + N')'
        WHEN t.name IN (N'decimal', N'numeric') THEN QUOTENAME(t.name) + N'(' + CAST(c.precision AS sysname) + N', ' + CAST(c.scale AS sysname) + N')'
        ELSE QUOTENAME(t.name)
    END
FROM sys.columns c
INNER JOIN sys.types t ON c.user_type_id = t.user_type_id
WHERE object_id = @objId

The query to retrieve the fields form which the Primary Key is composed is more complex and to retrieve all the information we need it is necessary to access several system tables. As the Primary Key is enforced by a unique index, the field names can be found using the ID of the columns in unique index by which the primary key is enforced.

DECLARE @pkFields TABLE (        --Table variable for storing fields which are part of primary key
    name sysname,                --field Name
    fieldType sysname            --Specified data type of the field
)

--Get list of Primary Key Fields and store them in @pkFields table variable for construction of CRUD procedures
INSERT INTO @pkFields(
    name,
    fieldType
)
SELECT
    c.name,
    CASE
        WHEN t.name IN (N'char', N'nchar', N'varchar', N'nvarchar') THEN QUOTENAME(t.name) + N'(' + CASE WHEN c.max_length = -1 THEN N'max' ELSE CAST(c.max_length AS sysname) END + N')'
        WHEN t.name IN (N'decimal', N'numeric') THEN QUOTENAME(t.name) + N'(' + CAST(c.precision AS sysname) + N', ' + CAST(c.scale AS sysname) + N')'
        ELSE QUOTENAME(t.name)
    END
FROM sys.key_constraints kc
INNER JOIN sys.indexes i ON kc.parent_object_id = i.object_id AND kc.unique_index_id = i.index_id
INNER JOIN sys.index_columns ic ON i.object_id = ic.object_id AND i.index_id = ic.index_id
INNER JOIN sys.columns c ON ic.object_id = c.object_id AND ic.column_id = c.column_id
INNER JOIN sys.types t ON c.user_type_id = t.user_type_id
WHERE kc.parent_object_id = @objId and kc.type = 'PK'

Generating Columns and Variables Lists

Once the list of all columns is retrieved, the lists of different columns combinations could be generated. For this purpose a SELECT statement with FOR XML PATH clause will be used as this clause greatly simplifies a CSV list generation on SQL 2005 and above. Below is a code for the All Columns

SELECT @allColumns =
    STUFF(REPLACE((SELECT
                        N'            ' + @crlf + N'            ,' + QUOTENAME(c.name)
                    FROM @allFields c
                    FOR XML PATH(N'')
                    ),
            @crlfXML, @crlf),
        1, 27, N'             ')

The spaces are included in the generated string for later final query formatting as it allows the query to be printed in readable form and also when a stored procedure will be generated, it can be later easily modified as the query will contain correct line breaks and indentation. The REPLACE function is used to replace the XML representation of CRLF back to real CRLF characters as the FOR XML PATH in reality produces XML output and translates all the CRLF characters to their XML compliant equivalents.

Generating code of CRUD Procedures

Once all the the combination of columns are generated, the variables with such fields and parameters list can be used to generate the final CRUD procedure. Below is a snippet representing a SELECT procedure.

SET @sql = N'-- =======================================================
-- Author:        ' + QUOTENAME(SUSER_SNAME()) + N'
-- Create date: ' + CONVERT(nvarchar(10), GETDATE(), 111) + N'
-- Description:    Selects records from table ' + QUOTENAME(@schemaName) + '.' + QUOTENAME(@tableName) + N'
-- =======================================================
CREATE PROCEDURE ' + QUOTENAME(@schemaName) + N'.[' + @procPrefix + @tableName + @suffixSelect + N']
' + @selectParams + N'
AS
BEGIN
    SET NOCOUNT ON;

    IF (' + @selectAllIfNullCondition + N') THEN
    BEGIN
        SELECT
' + @allColumns + N'
        FROM ' + QUOTENAME(@schemaName) + N'.' + QUOTENAME(@tableName) + N'
    END
    ELSE
    BEGIN
        SELECT
' + @allColumns + N'
        FROM ' + QUOTENAME(@schemaName) + N'.' + QUOTENAME(@tableName) + N'
        WHERE
' + @selectCondition + N'
    END
END
GO

';

sp_makeCRUD Final Form

Once all the parts of the procedure are prepared it’s possible to construct a final form of the sp_makeCRUD procedure. Below is a complete code for the sp_makeCRUD procedure.

USE [master]
GO
--==========================================================
-- Author: (c) 2011 Pavel Pawlowski
-- Description: Generates CRUD procedures for a table
--
--@objectName = table name with or without schema for which the CRUD procedures should be generated
--
--@executionMode
--  1 = Print the script Only
--  2 = Output the script as recordset using SELECT for longer procedures which is not possible output using PRINT
--  4 = Execute and Commit
--  8 = Execute and Rollback - testing mode
--
--@dropExistingProcedures = 1 | 0
-- specifies whether generate DROP commands for existing objects
--
--@outputIndentityCalcFields = 1 | 0
-- specifies whether Identity and Calculated fields should be OUTPUTed in INSERT and UPDATE
--==========================================================
CREATE PROCEDURE [dbo].[sp_makeCRUD]
    @objectName sysname,
    @executionMode tinyint = 1,
    @dropExistingProcedures bit = 1,
    @outputIndentityCalcFields bit = 1
AS
BEGIN
    SET NOCOUNT ON;
    SET XACT_ABORT ON;

    --variables declaration
    DECLARE
        @objId int,                                 --ID of the Table
        @schemaName sysname,                        --schema Name of the Table
        @tableName sysname,                         --TableName
        @dbName sysname,                            --Database name in which we are creating the procedures
        @crlfXML nchar(7),                          --XML representation of the CLRF
        @crlf nchar(2),                             --CLRF characters
        @procPrefix sysname,                        --CRUD procedures Prefix
        @sql nvarchar(max),                         --SQL code for particular steps
        @msg nvarchar(max),                         --A message
        @suffixSelect sysname,                      --Suffix for SELECT procedure
        @suffixUpdate sysname,                      --Suffix for UPDATE procedure
        @suffixDelete sysname,                      --Suffix for Delete procedure
        @suffixInsert sysname,                      --Suffix for INSERT procedure
        @selectParams nvarchar(max),                --Parameters for SELECT procedure
        @allColumns nvarchar(max),                  --List of All columns in a table for SELECT statement
        @selectAllIfNullCondition nvarchar(max),    --Condition for checking if all parameters in SELECT procedure are NULL
        @selectCondition nvarchar(max),             --SELECT statement condition
        @updateParams nvarchar(max),                --Parameters for UPDATE procedure
        @updateColumns nvarchar(max),               --List of columns for UPDATE statement
        @updateDeleteCondition nvarchar(max),       --Condition for UPDATE and DELETE statement
        @updateOutputCols nvarchar(max),            --List of UPDATE statement output columns to output calculated columns
        @deleteParams nvarchar(max),                --Parameters for DELETE procedure
        @insertParams nvarchar(max),                --Parameters for INSERT procedure
        @insertColumns nvarchar(max),               --List of COLUMNS for INSERT statement
        @insertOutputCols nvarchar(max),            --List of INSERT statement ouptup columns to output IDENTITY and calculated fields
        @insertParamNames nvarchar(max),            --List of parameter names in Insert procedure
        @isTooLongForPrint bit                      --Sores info whether some of the procs is too long for PRINT

    --Declaration of fields Table Variables
    DECLARE @pkFields TABLE (        --Table variable for storing fields which are part of primary key
        name sysname,                --field Name
        fieldType sysname            --Specified data type of the field
    )
    DECLARE @allFields TABLE (
        name sysname,                --field name
        isIdentity bit,              --specifies whether field is INDENTITY
        isCalculated bit,            --specifies whether filed is Calculated field
        fieldType sysname            --Specified data type of the field
    )

    --Table variable for storing scripts for execution
    DECLARE @scripts TABLE (
        id int NOT NULL IDENTITY,
        script nvarchar(max)
    )

    --Check if an execution mode is selected
    IF ((@executionMode & 7) = 0)
    BEGIN
        SET @msg = N'You have to select at at leas one possible execution Mode (@executionMode)
    1 = Print the script Only
    2 = Output the script as SELECT resordset for longer procedures which is not possible output using PRINT
    4 = Execute and Commit
    8 = Execute and Rollback - testing mode

You can also combine the Print and Execute Modes, but you cannot combine both execution modes'
        RAISERROR (@msg, 11, 1)
        RETURN
    END

    IF ((@executionMode & 6) = 6)
    BEGIN
        SET @msg = N'You cannot specify Execute and Commit with Execute and Rollback Together'
        RAISERROR (@msg, 11, 1)
        RETURN
    END

    --populate parameters and constants
    SELECT
        @objID = OBJECT_ID(@objectName),
        @dbName = DB_NAME(),
        @crlfXML = N'
' + NCHAR(10),    --XML Representation of the CR+LF delimiter as we use FOR XML PATH ant this translates the original CR+LF to XML Equivalent. We need it to change it back
        @crlf = NCHAR(13) + NCHAR(10),        --CR+LF delimiter used in script
        @procPrefix = 'usp_CRUD_',            --Specifies prefix to be added to all CRUD procedures
        @suffixSelect = '_S',                --Specifies suffix to be added to the Select Procedure
        @suffixUpdate = '_U',                --Specifies suffix to be added to the Update Procedure
        @suffixDelete = '_D',                --Specifies suffix to be added to the Delete Procedure
        @suffixInsert = '_I'                --Specifies suffix to be added to the Inser Procedure

    --Check whether object exists
    IF @objId IS NULL
    BEGIN
        SET @msg = N'Object "' + @objectName + '" doesnt'' exist in database ' + QUOTENAME(@dbName)
        RAISERROR (@msg, 11, 1)
        RETURN
    END

    --Populate table name and schema name
    SELECT
        @schemaName = s.name,
        @tableName = o.name
    FROM sys.objects o
    INNER JOIN sys.schemas s ON o.schema_id = s.schema_id
    WHERE o.object_id = @objId AND o.type = 'U'

    --check whether object is table
    IF (@tableName IS NULL)
    BEGIN
        SET @msg = N'Object "' + @objectName + '" is not User Table. Creating CRUD procedures is possible only on User Tables.'
        RAISERROR (@msg, 11, 1)
        RETURN
    END

    --Get all table fields and store them in the @allFields table variable for construction of CRUD procedures
    INSERT INTO @allFields (
        name,
        isIdentity,
        isCalculated,
        fieldType
    )
    SELECT
        c.name,
        is_identity,
        is_computed,
        CASE
            WHEN t.name IN (N'char', N'nchar', N'varchar', N'nvarchar') THEN QUOTENAME(t.name) + N'(' + CASE WHEN c.max_length = -1 THEN N'max' ELSE CAST(c.max_length AS sysname) END + N')'
            WHEN t.name IN (N'decimal', N'numeric') THEN QUOTENAME(t.name) + N'(' + CAST(c.precision AS sysname) + N', ' + CAST(c.scale AS sysname) + N')'
            ELSE QUOTENAME(t.name)
        END
    FROM sys.columns c
    INNER JOIN sys.types t ON c.user_type_id = t.user_type_id
    WHERE object_id = @objId

    --Get list of Primary Key Fields and store them in @pkFields table variable for construction of CRUD procedures
    INSERT INTO @pkFields(
        name,
        fieldType
    )
    SELECT
        c.name,
        CASE
            WHEN t.name IN (N'char', N'nchar', N'varchar', N'nvarchar') THEN QUOTENAME(t.name) + N'(' + CASE WHEN c.max_length = -1 THEN N'max' ELSE CAST(c.max_length AS sysname) END + N')'
            WHEN t.name IN (N'decimal', N'numeric') THEN QUOTENAME(t.name) + N'(' + CAST(c.precision AS sysname) + N', ' + CAST(c.scale AS sysname) + N')'
            ELSE QUOTENAME(t.name)
        END
    FROM sys.key_constraints kc
    INNER JOIN sys.indexes i ON kc.parent_object_id = i.object_id AND kc.unique_index_id = i.index_id
    INNER JOIN sys.index_columns ic ON i.object_id = ic.object_id AND i.index_id = ic.index_id
    INNER JOIN sys.columns c ON ic.object_id = c.object_id AND ic.column_id = c.column_id
    INNER JOIN sys.types t ON c.user_type_id = t.user_type_id
    WHERE kc.parent_object_id = @objId and kc.type = 'PK'

    --Check Whether there is primary Key the CRUD works only if there is primary key in the table
    IF (NOT EXISTS(SELECT 1 FROM @pkFields))
    BEGIN
        SET @msg = N'Table "' + @objectName + '" does not have a Primary Key. There must exists a primary key prior generating CRUD procedures.'
        RAISERROR (@msg, 11, 1)
    END

    --list of output columns for INSERT statement (ouptup of Identity and calculated fields)
    SELECT @insertOutputCols =
        STUFF(REPLACE((SELECT
                            N'        ' + @crlf + N'        ,inserted.' + QUOTENAME(c.name)
                        FROM @allFields c
                        WHERE isIdentity = 1 OR isCalculated = 1
                        FOR XML PATH(N'')
                        ),
                @crlfXML, @crlf),
            1, 19, N'         ')

    --list of output columns for UPDATE statement (Calculated fields only)
    SELECT @updateOutputCols =
        STUFF(REPLACE((SELECT
                            N'        ' + @crlf + N'        ,inserted.' + QUOTENAME(c.name)
                        FROM @allFields c
                        WHERE isCalculated = 1
                        FOR XML PATH(N'')
                        ),
                @crlfXML, @crlf),
            1, 19, N'         ')        

    --ist of all columns used in the SELECT Statement
    SELECT @allColumns =
        STUFF(REPLACE((SELECT
                            N'            ' + @crlf + N'            ,' + QUOTENAME(c.name)
                        FROM @allFields c
                        FOR XML PATH(N'')
                        ),
                @crlfXML, @crlf),
            1, 27, N'             ')

    --list of columns for UPDATE statement including the equal sign and variable (all columns except indentity and calculated ones)
    SELECT @updateColumns =
        STUFF(REPLACE((SELECT
                            N'        ' + @crlf + N'        ,' + QUOTENAME(c.name) + N' = @' + c.name
                        FROM @allFields c
                        WHERE isIdentity = 0 AND isCalculated = 0
                        FOR XML PATH(N'')
                        ),
                @crlfXML, @crlf),
            1, 19, N'         ')

    --list of columns for INSERT statement (all columns except identity and calculated ones)
    SELECT @insertColumns =
        STUFF(REPLACE((SELECT
                            N'        ' + @crlf + N'        ,' + QUOTENAME(c.name)
                        FROM @allFields c
                        WHERE isIdentity = 0 AND isCalculated = 0
                        FOR XML PATH(N'')
                        ),
                @crlfXML, @crlf),
            1, 19, N'         ')

    --condition for UPDATE and DELETE statement
    SET @updateDeleteCondition =
            STUFF(REPLACE((SELECT
                            N'        ' + @crlf + N'        AND' + @crlf + '        ' + QUOTENAME(c.name) + N' = @' + c.name
                        FROM @pkFields c
                        FOR XML PATH(N'')), @crlfXML, @crlf), 1, 23, N'')

    --IF condition for SELECT statement if all params will be NULL to do not use condition to receive better plans
    SET @selectAllIfNullCondition =
                STUFF((SELECT
                            N' AND @' + c.name + N' IS NULL'
                        FROM @pkFields c
                        FOR XML PATH(N'')), 1, 5, N'')

    --Select condition (for SELECT ONE)
    SET @selectCondition =
        STUFF(REPLACE((SELECT
                            N'            ' + @crlf + N'            AND' + @crlf + N'            (@' + c.name + N' IS NULL OR ' + QUOTENAME(c.name) + N' = @' + c.name + N')'
                        FROM @pkFields c
                        FOR XML PATH(N'')), @crlfXML, @crlf), 1, 31, N'')

    --parameters list for SELECT CRUD procedure
    SELECT @selectParams =
        STUFF(REPLACE((SELECT
                            N',    ' + @crlf + N'    @' + c.name + N' ' + c.fieldType + N' = NULL'
                        FROM @pkFields c
                        FOR XML PATH(N'')
                        ),
                @crlfXML, @crlf),
            1, 11, N'    ')

    --parameters list for DELETE CRUD procedure
    SELECT @deleteParams =
        STUFF(REPLACE((SELECT
                            N',    ' + @crlf + N'    @' + c.name + N' ' + c.fieldType
                        FROM @pkFields c
                        FOR XML PATH(N'')
                        ),
                @crlfXML, @crlf),
            1, 11, N'    ')

    --parameters list for UPDATE CRUD procedure
    SELECT @updateParams =
        STUFF(REPLACE((SELECT
                            N',    ' + @crlf + N'    @' + c.name + N' ' + c.fieldType
                        FROM @allFields c
                        LEFT JOIN @pkFields pk ON c.name = pk.name
                        WHERE (c.isIdentity = 0 AND c.isCalculated = 0) OR pk.name IS NOT NULL
                        FOR XML PATH(N'')
                        ),
                @crlfXML, @crlf),
            1, 11, N'    ')

    --parameters list for INSERT CRUD procedure
    SELECT @insertParams =
        STUFF(REPLACE((SELECT
                            N',    ' + @crlf + N'    @' + c.name + N' ' + c.fieldType
                        FROM @allFields c
                        LEFT JOIN @pkFields pk ON c.name = pk.name
                        WHERE c.isIdentity = 0 AND c.isCalculated = 0
                        FOR XML PATH(N'')
                        ),
                @crlfXML, @crlf),
            1, 11, N'    ')

    --parameter names list for INSERT command in the INSERT CRUD procedure
    SELECT @insertParamNames =
        STUFF(REPLACE((SELECT
                            N'        ' + @crlf + N'        ,@' + c.name
                        FROM @allFields c
                        LEFT JOIN @pkFields pk ON c.name = pk.name
                        WHERE c.isIdentity = 0 AND c.isCalculated = 0
                        FOR XML PATH(N'')
                        ),
                @crlfXML, @crlf),
            1, 19, N'         ')

--USE DB
SET @sql = N'USE ' + QUOTENAME(@dbName) + N'
'
INSERT INTO @scripts(script) VALUES(@sql)

--SELECT PROCEDURE
IF (@dropExistingProcedures = 1)
BEGIN
SET @sql = N'--Drop existing SELECT CRUD Procedure
IF (EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(''' + QUOTENAME(@schemaName) + N'.[' + @procPrefix + @tableName + @suffixSelect + N']'') AND type = ''P''))
    DROP PROCEDURE ' + QUOTENAME(@schemaName) + N'.[' + @procPrefix + @tableName + @suffixSelect +N']
'
INSERT INTO @scripts(script) VALUES(@sql)
END

SET @sql = N'-- =======================================================
-- Author:      ' + QUOTENAME(SUSER_SNAME()) + N'
-- Create date: ' + CONVERT(nvarchar(10), GETDATE(), 111) + N'
-- Description: Selects records from table ' + QUOTENAME(@schemaName) + '.' + QUOTENAME(@tableName) + N'
-- =======================================================
CREATE PROCEDURE ' + QUOTENAME(@schemaName) + N'.[' + @procPrefix + @tableName + @suffixSelect + N']
' + @selectParams + N'
AS
BEGIN
    SET NOCOUNT ON;

    IF (' + @selectAllIfNullCondition + N')
    BEGIN
        SELECT
' + @allColumns + N'
        FROM ' + QUOTENAME(@schemaName) + N'.' + QUOTENAME(@tableName) + N'
    END
    ELSE
    BEGIN
        SELECT
' + @allColumns + N'
        FROM ' + QUOTENAME(@schemaName) + N'.' + QUOTENAME(@tableName) + N'
        WHERE
' + @selectCondition + N'
    END
END
';
INSERT INTO @scripts(script) VALUES(@sql)

--UPDATE PROCEDURE
IF (@dropExistingProcedures = 1)
BEGIN
SET @sql = N'--Drop existing UPDATE CRUD Procedure
IF (EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(''' + QUOTENAME(@schemaName) + N'.[' + @procPrefix + @tableName + @suffixUpdate + N']'') AND type = ''P''))
    DROP PROCEDURE ' + QUOTENAME(@schemaName) + N'.[' + @procPrefix + @tableName + @suffixUpdate + N']
'
INSERT INTO @scripts(script) VALUES(@sql)
END

SET @sql = N'-- =======================================================
-- Author:      ' + QUOTENAME(SUSER_SNAME()) + N'
-- Create date: ' + CONVERT(nvarchar(10), GETDATE(), 111) + N'
-- Description: Updates record in table ' + QUOTENAME(@schemaName) + '.' + QUOTENAME(@tableName) + N'
-- =======================================================
CREATE PROCEDURE ' + QUOTENAME(@schemaName) + N'.[' + @procPrefix + @tableName + @suffixUpdate + N']
' + @updateParams + N'
AS
BEGIN
    SET NOCOUNT ON;

    UPDATE ' + QUOTENAME(@schemaName) + N'.' + QUOTENAME(@tableName) + N' SET
' + @updateColumns + CASE WHEN @outputIndentityCalcFields = 1 AND @updateOutputCols <> N'' AND @updateOutputCols IS NOT NULL THEN N'
    OUTPUT
' + @updateOutputCols
    ELSE N'' END + N'
    WHERE
' + @updateDeleteCondition + N'
END
';
INSERT INTO @scripts(script) VALUES(@sql)

--DELETE PROCEDURE
IF (@dropExistingProcedures = 1)
BEGIN
SET @sql = N'--Drop existing DELETE CRUD Procedure
IF (EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(''' + QUOTENAME(@schemaName) + N'.[' + @procPrefix + @tableName + @suffixDelete + N']'') AND type = ''P''))
    DROP PROCEDURE ' + QUOTENAME(@schemaName) + N'.[' + @procPrefix + @tableName +  @suffixDelete + N']
'
INSERT INTO @scripts(script) VALUES(@sql)
END

SET @sql = N'-- =======================================================
-- Author:      ' + QUOTENAME(SUSER_SNAME()) + N'
-- Create date: ' + CONVERT(nvarchar(10), GETDATE(), 111) + N'
-- Description: Deletes recors from table ' + QUOTENAME(@schemaName) + '.' + QUOTENAME(@tableName) + N'
-- =======================================================
CREATE PROCEDURE ' + QUOTENAME(@schemaName) + N'.[' + @procPrefix + @tableName + @suffixDelete + N']
' + @deleteParams + N'
AS
BEGIN
    SET NOCOUNT ON;

    DELETE FROM ' + QUOTENAME(@schemaName) + N'.' + QUOTENAME(@tableName) + N'
    WHERE
' + @updateDeleteCondition + N'
END
';
INSERT INTO @scripts(script) VALUES(@sql)

--INSERT PROCEDURE
IF (@dropExistingProcedures = 1)
BEGIN
SET @sql = N'--Drop existing INSERT CRUD Procedure
IF (EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(''' + QUOTENAME(@schemaName) + N'.[' + @procPrefix + @tableName + @suffixInsert + N']'') AND type = ''P''))
    DROP PROCEDURE ' + QUOTENAME(@schemaName) + N'.[' + @procPrefix + @tableName + @suffixInsert + N']
'
INSERT INTO @scripts(script) VALUES(@sql)
END

SET @sql = N'-- =======================================================
-- Author:      ' + QUOTENAME(SUSER_SNAME()) + N'
-- Create date: ' + CONVERT(nvarchar(10), GETDATE(), 111) + N'
-- Description: Deletes recors from table ' + QUOTENAME(@schemaName) + '.' + QUOTENAME(@tableName) + N'
-- =======================================================
CREATE PROCEDURE ' + QUOTENAME(@schemaName) + N'.[' + @procPrefix + @tableName + @suffixInsert + N']
' + @insertParams + N'
AS
BEGIN
    SET NOCOUNT ON;

    INSERT INTO ' + QUOTENAME(@schemaName) + N'.' + QUOTENAME(@tableName) + N' (
' + @insertColumns + N'
    )' + CASE WHEN @outputIndentityCalcFields = 1 AND @insertOutputCols <> N'' AND @insertOutputCols IS NOT NULL THEN N'
    OUTPUT
' + @insertOutputCols
    ELSE N'' END + N'
    SELECT
' + @insertParamNames + N'
END
';
INSERT INTO @scripts(script) VALUES(@sql)

DECLARE cr CURSOR FAST_FORWARD FOR
SELECT script FROM @scripts ORDER BY id

--if EXECUTION mode contains 2 we should output the code using SELECT
--Script generate using the SELECT can be saved by right lick on the result and
--select Save Result AS and storing it as CSV
IF ((@executionMode & 2) = 2 OR (@isTooLongForPrint = 1 AND (@executionMode & 1) = 1))
BEGIN
    SELECT script + N'GO' FROM @scripts
END

SET @isTooLongForPrint = ISNULL((SELECT 1 FROM @scripts WHERE LEN(script) > 4000), 0)

--if Execution mode contains 1 we should PRINT the statements
IF ((@executionMode & 1) = 1 AND @isTooLongForPrint = 0)
BEGIN
    OPEN cr
    FETCH NEXT FROM cr INTO @sql
    WHILE (@@FETCH_STATUS = 0)
    BEGIN
        PRINT @sql
        PRINT 'GO'
        FETCH NEXT FROM cr INTO @sql
    END
    CLOSE cr
END

--Execute the statement if it should be executed
IF (( @executionMode & 4 ) = 4 OR ( @executionMode & 8 ) = 8 )
BEGIN

    OPEN cr

    BEGIN TRY
        BEGIN TRANSACTION

        FETCH NEXT FROM cr INTO @sql
        WHILE (@@FETCH_STATUS = 0)
        BEGIN
            EXEC (@sql)
            FETCH NEXT FROM cr INTO @sql
        END

        IF ((@executionMode & 4) = 4)
        BEGIN
            IF (@@TRANCOUNT > 0)
                COMMIT TRANSACTION
        END
        ELSE
        BEGIN
            IF (@@TRANCOUNT > 0)
                ROLLBACK TRANSACTION
        END
    END TRY
    BEGIN CATCH
        IF (@@TRANCOUNT > 0)
            ROLLBACK TRANSACTION
    END CATCH
END

--if cursor is open, close it
IF (cursor_status('global', 'cr') = 1)
    CLOSE cr

DEALLOCATE cr
END
GO

EXECUTE sp_ms_marksystemobject 'dbo.sp_makeCRUD'
Go

The each of the SQL statement in the procedure is first stored in a table variable called @scripts for later printing and/or execution. At the end of the procedure the code is printed using a PRINT statement or using a SELECT statement which is automatically used in case the code for some CRUD procedure is longer than 4000 characters and therefore not possible to PRINT.

If Execute and Commit or Execute And Rollback was selected then the code for the CRUD procedures is also executed. The Execute and Rollback is useful for testing, that the code for the procedures is correct and working.

Procedure is created in the [master] database so it is accessible from all databases and we do not need to create it in every database.

To be fully functional, it is necessary to mark the procedure as system procedure using below command.

EXECUTE sp_ms_marksystemobject 'dbo.sp_makeCRUD'

Marking the procedure as system object is necessary as if we not mark the object, then the procedure will run in context of the database in which it was created – in this case [system] and will fail as there are not the tables for which we want to generate the CRUD procedures.

Once the procedure is marked as system object, then it executes in the context of the current database, even it is located in the [master] database.

After we create  the proc, we can test it on a small example:

USE [tempdb]
GO
CREATE TABLE dbo.TestTable (
    ID int NOT NULL IDENTITY(1,1) PRIMARY KEY,
    Name varchar(10),
    Price money,
    Quantity decimal(5,2),
    TotalPrice AS (Price * Quantity)
)
GO
sp_makeCRUD 'dbo.TestTable'

This will produce below code for the CRUD procedures

USE [tempdb]
GO
--Drop existing SELECT CRUD Procedure
IF (EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID('[dbo].[usp_CRUD_TestTable_S]') AND type = 'P'))
    DROP PROCEDURE [dbo].[usp_CRUD_TestTable_S]
GO
-- =======================================================
-- Author:      [PAVEL-PCPavel]
-- Create date: 2011/02/10
-- Description: Selects records from table [dbo].[TestTable]
-- =======================================================
CREATE PROCEDURE [dbo].[usp_CRUD_TestTable_S]
    @ID [int] = NULL
AS
BEGIN
    SET NOCOUNT ON;

    IF (@ID IS NULL)
    BEGIN
        SELECT
             [ID]
            ,[Name]
            ,[Price]
            ,[Quantity]
            ,[TotalPrice]
        FROM [dbo].[TestTable]
    END
    ELSE
    BEGIN
        SELECT
             [ID]
            ,[Name]
            ,[Price]
            ,[Quantity]
            ,[TotalPrice]
        FROM [dbo].[TestTable]
        WHERE
            (@ID IS NULL OR [ID] = @ID)
    END
END
GO
--Drop existing UPDATE CRUD Procedure
IF (EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID('[dbo].[usp_CRUD_TestTable_U]') AND type = 'P'))
    DROP PROCEDURE [dbo].[usp_CRUD_TestTable_U]
GO
-- =======================================================
-- Author:      [PAVEL-PCPavel]
-- Create date: 2011/02/10
-- Description: Updates record in table [dbo].[TestTable]
-- =======================================================
CREATE PROCEDURE [dbo].[usp_CRUD_TestTable_U]
    @ID [int],
    @Name [varchar](10),
    @Price [money],
    @Quantity [decimal](5, 2)
AS
BEGIN
    SET NOCOUNT ON;

    UPDATE [dbo].[TestTable] SET
         [Name] = @Name
        ,[Price] = @Price
        ,[Quantity] = @Quantity
    OUTPUT
         inserted.[TotalPrice]
    WHERE
        [ID] = @ID
END
GO
--Drop existing DELETE CRUD Procedure
IF (EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID('[dbo].[usp_CRUD_TestTable_D]') AND type = 'P'))
    DROP PROCEDURE [dbo].[usp_CRUD_TestTable_D]
GO
-- =======================================================
-- Author:      [PAVEL-PCPavel]
-- Create date: 2011/02/10
-- Description: Deletes recors from table [dbo].[TestTable]
-- =======================================================
CREATE PROCEDURE [dbo].[usp_CRUD_TestTable_D]
    @ID [int]
AS
BEGIN
    SET NOCOUNT ON;

    DELETE FROM [dbo].[TestTable]
    WHERE
        [ID] = @ID
END
GO
--Drop existing INSERT CRUD Procedure
IF (EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID('[dbo].[usp_CRUD_TestTable_I]') AND type = 'P'))
    DROP PROCEDURE [dbo].[usp_CRUD_TestTable_I]
GO
-- =======================================================
-- Author:      [PAVEL-PCPavel]
-- Create date: 2011/02/10
-- Description: Deletes recors from table [dbo].[TestTable]
-- =======================================================
CREATE PROCEDURE [dbo].[usp_CRUD_TestTable_I]
    @Name [varchar](10),
    @Price [money],
    @Quantity [decimal](5, 2)
AS
BEGIN
    SET NOCOUNT ON;

    INSERT INTO [dbo].[TestTable] (
         [Name]
        ,[Price]
        ,[Quantity]
    )
    OUTPUT
         inserted.[ID]
        ,inserted.[TotalPrice]
    SELECT
         @Name
        ,@Price
        ,@Quantity
END
GO

Selecting random records from a table

Some time you may come to a situation, when you have a table and need to get only a few random records from it. You can come with a question how to achieve this on SQL Server.

You may come with incorporating the RAND() function, but as I mentioned in my previous post Random numbers in query, the RAND() function is a constant function executed once per whole query and does not it’s work.

One of the solution could be using the TABLESAMPLE clause. All the samples here use the AdventureWorks2008R2 sample database.

SELECT
    *
FROM Sales.SalesOrderDetail
TABLESAMPLE (100 ROWS);

As this for some tables could work, probably it will not give you exact results you probably expect as the current implementation of the TABLESAMPLE clause build for purpose of statistics retrieving.

Selecting random rows using NEWID()

As working workaround for the purpose of random records selection we can use again the NEWID() function mentioned in my previous post. Each call to the function creates new unique identifier. There are several possibilities to return random rows using the function.

  1. Select an approximate percentage sample of random rows
  2. Select exact number of random rows
  3. Select an exact percentage sample of random rows
  4. Select random number of random rows

Selecting approximate percentage sample of random rows

To select an approximate percentage sample of random rows from a table you can you a below query. The query will select around 10 % of records from a table.

SELECT
    *
FROM Sales.SalesOrderDetail
WHERE 0.1 >= CAST(CHECKSUM(NEWID(), SalesOrderID) & 0x7fffffff AS float)
    / CAST (0x7fffffff AS int)

Each execution of the above query returns a different count of records, which will be close to the 10 percent of total records in the table.

To select a different percentage simply change the 0.1 to and appropriate percentage. In the condition there is added a SalesOrderID field in the calculation to force generation of the NEWID() for each row, otherwise the NEWID() would be generated only once as a constant in this query. The 0x7fffffff represents maximum positive integer value, so when using it in bit AND operation it limits the original number to that value.

The above query produces following plan:

Approximate percentage sample of random rows query plan

And trace from the profiler:

Approximate percentage sample of random rows profiler trace

The table Sales.SalesOrderDetail has 121317 rows and from the profiler results we can see, that the query returned exactly 10 % of records (without 1.7). With multiple runs the row count produced will be different, but very close to 10 %.

Selecting exact number of random rows

Selecting exact number of random rows is very easy, as it is only necessary to sort the query by the NEWID() result and limit the number of returned rows. For returning 100 random rows it’s possible to use below query:

SELECT TOP 100
    *
FROM Sales.SalesOrderDetail
ORDER BY NEWID()

In this example it is not necessary to include any random number generation and checksum as the NEWID() is in the ORDER BY clause and therefore it will be evaluated once for each row so the SQL Server can sort the record on it’s value.

Here is a plan for the query:

Exact number of random rows Query Plan

And trace from profiler:

Exact number of random rows Profiler Trace

As we can see, even the query looks much simpler than the previous for approximate percentage, this query produces much more complicated parallel plan and from the profiler we can see, that even returning les rows, it produces more reads and has much higher CPU cost. This is caused by the ORDER BY operator as the database engine has to sort all the records prior returning the TOP 100 rows. The duration of the first query is longer, but this is caused by returning 12 times more rows to client.

On small tables this doesn’t matter, but on large one it can has high impact. To solve the issue, we can use the approximate count query and limit the records returned by this query. Only we need to choose appropriate count records in the WHERE clause. If we know, that in the table is more than 120 000 records, we can first limit the results to about 0.15 % so we will be sure, the base query returns more than 100 rows and then simply limit them to 100.

SELECT TOP 100
    *
FROM Sales.SalesOrderDetail
WHERE 0.0015 >= CAST(CHECKSUM(NEWID(), SalesOrderID) & 0x7fffffff AS float)
    / CAST (0x7fffffff AS int)

This query produces below plan and trace from profiler.

Query Plan for Exact number of random rows with random generator in where condition

Profiler Trace for Exact number of random rows with random generator in where condition

As we can see, the query plan is again much simpler and the Reads count is more than half less then previous and the CPU cost is much lower and duration is a little bit lower, but generally equal because of returning rows to client.

Selecting exact percentage sample of random rows

To select an exact percentage sample of random rows it’s very simple and consist of simple modification of the TOP clause to a percentage value. So to select 10 percent of random records we can use below query.

SELECT TOP 10 PERCENT
    *
FROM Sales.SalesOrderDetail
ORDER BY NEWID()

Query produces below and again more complex plan then the solution with WHERE clause. It’s even more complicated than the one for the TOP without PERCENT as a table spool operator is added to the plan.

Query Plan for exact percentage sample of random rows

Profiler Trace for exact percentage sample of random rows

From the profiles we again see, tan it produces enormous amount of reads comparing the to the approximate percentage sample even the amount of returned records is similar. The duration and CPU cost is incomparable and even writes are produced because of the writes to Table Spool.

To optimize this, we cannot use the approximate percentage count query and limit it by percent, as the TOP 10 PERCENT clause will return only 10 percent of records from the one limited by the WHERE clause.

We can optimize the reads, execution time and resources cost by below query.

SELECT TOP ((SELECT COUNT(*) FROM Sales.SalesOrderDetail) / 10)
    *
FROM Sales.SalesOrderDetail
WHERE 0.11 >= CAST(CHECKSUM(NEWID(), SalesOrderID) & 0x7fffffff AS float)
    / CAST (0x7fffffff AS int)

The (SELECT COUNT(*) FROM Sales.SalesOrderDetail) / 10 query in TOP clause calculates 10 percent of records and whole query produces below plan and profiler trace.

Query Plan for exact percentage sample of random rows with random generator in WHERE clause

Profiler Trace for exact percentage sample of random rows with random generator in WHERE clause

Even the plan may look more complex than the previous one, from the profiler trace we can see, that for the same Row Count this plan produces 256 time less reads, no writes and 3 time lower duration and 6.5 time less CPU cost.

Selecting random number of random records

To select a random number of random records we can again use the above query and slightly modify the top clause. This modification consist of adding a random generator into the TOP clause. To select random number of records (or percent) in particular interval – in this example 10-20 records or 10 – 20 percent a below queries can be used:

--Select 10 - 20 random records
SELECT TOP (ABS(CHECKSUM(NEWID())) % 10 + 10)
    *
FROM Sales.SalesOrderDetail
ORDER BY NEWID()

--Select 10 - 20 random records
SELECT TOP (ABS(CHECKSUM(NEWID())) % 10 + 10)
    *
FROM Sales.SalesOrderDetail
WHERE 0.0022 >= CAST(CHECKSUM(NEWID(), SalesOrderID) & 0x7fffffff AS float)
    / CAST (0x7fffffff AS int)

Profiler Trace for Random number of random records

The plans for the queries are the same as the plan mentioned above and from the profiler trace we can again see, the incomparable lower costs of the second query compared to the first one.

--Select 10 - 20 % of random records
SELECT TOP (ABS(CHECKSUM(NEWID())) % 10 + 10) PERCENT
    *
FROM Sales.SalesOrderDetail
ORDER BY NEWID()

--Select 10 - 20 % of random records
SELECT TOP ((SELECT COUNT(*) FROM Sales.SalesOrderDetail) * (ABS(CHECKSUM(NEWID())) % 10 + 10) / 100)
    *
FROM Sales.SalesOrderDetail
WHERE 0.21 >= CAST(CHECKSUM(NEWID(), SalesOrderID) & 0x7fffffff AS float)
    / CAST (0x7fffffff AS int)

Profiler Trace for Random percentage of random records

Again the plan are the same as the plan for each version showed above and again from the profiler trace we can again see, that the performance of the second query is incomparable better, than the first one.

Complete comparison

To make a complete comparison of performance of all the queries mentioned in this article we can use below query batch. All the queries are written as sub queries of SELECT COUNT(*) to avoid wrong duration calculation which is affected by transferring the records from server to client.

--Approximate 10 percent =======================
SELECT COUNT(*) FROM (
    SELECT
        *
    FROM Sales.SalesOrderDetail
    WHERE 0.1 >= CAST(CHECKSUM(NEWID(), SalesOrderID) & 0x7fffffff AS float)
        / CAST (0x7fffffff AS int)
) q
GO
--TOP 100, ORDER BY =============================
SELECT COUNT(*) FROM (
    SELECT TOP 100
        *
    FROM Sales.SalesOrderDetail
    ORDER BY NEWID()
) q
GO
--TOP 100, Approximate =============================
SELECT COUNT(*) FROM (
    SELECT TOP 100
        *
    FROM Sales.SalesOrderDetail
    WHERE 0.0015 >= CAST(CHECKSUM(NEWID(), SalesOrderID) & 0x7fffffff AS float)
        / CAST (0x7fffffff AS int)
) q
GO
--TOP 10 PERCENT, ORDER BY =============================
SELECT COUNT(*) FROM (
    SELECT TOP 10 PERCENT
        *
    FROM Sales.SalesOrderDetail
    ORDER BY NEWID()
) q
GO
--TOP 10 PERCENT, Approximate =============================
SELECT COUNT(*) FROM (
    SELECT TOP ((SELECT COUNT(*) FROM Sales.SalesOrderDetail) / 10)
        *
    FROM Sales.SalesOrderDetail
    WHERE 0.11 >= CAST(CHECKSUM(NEWID(), SalesOrderID) & 0x7fffffff AS float)
        / CAST (0x7fffffff AS int)
) q
GO
--Random 10 - 20 Records, ORDER BY  =============================
SELECT COUNT(*) FROM (
    SELECT TOP (ABS(CHECKSUM(NEWID())) % 10 + 10)
        *
    FROM Sales.SalesOrderDetail
    ORDER BY NEWID()
) q
GO
--Random 10 - 20 Records, Approximate =============================
SELECT COUNT(*) FROM (
    SELECT TOP (ABS(CHECKSUM(NEWID())) % 10 + 10)
        *
    FROM Sales.SalesOrderDetail
    WHERE 0.0022 >= CAST(CHECKSUM(NEWID(), SalesOrderID) & 0x7fffffff AS float)
        / CAST (0x7fffffff AS int)
) q
GO
--Random 10 - 20 PERCENT, ORDER BY =============================
SELECT COUNT(*) FROM (
    SELECT TOP (ABS(CHECKSUM(NEWID())) % 10 + 10) PERCENT
        *
    FROM Sales.SalesOrderDetail
    ORDER BY NEWID()
) q
GO
--Random 10 - 20 PERCENT, Approximate =============================
SELECT COUNT(*) FROM (
    SELECT TOP ((SELECT COUNT(*) FROM Sales.SalesOrderDetail) * (ABS(CHECKSUM(NEWID())) % 10 + 10) / 100)
        *
    FROM Sales.SalesOrderDetail
    WHERE 0.21 >= CAST(CHECKSUM(NEWID(), SalesOrderID) & 0x7fffffff AS float)
        / CAST (0x7fffffff AS int)
) q
GO

And profiler output for above queries.

Queries comparison profiler trace output

Conclusion

As we can see from the queries, their query plans and profiler traces, we have several possibilities to select random records from table. Some possibilities have a very simpler query, but produces more complex plan and have a higher cost and can be suitable for relatively small tables.

Other versions producing the same results are more complex to write the query itself, but finally the query produces a much simpler plan and have much better performance comparing to the simple looking one. Such queries are then suitable even fro very large tables to produce results in acceptable time with low performance impact on the server.