SSIS 2008 Columns To XML Data Flow Transformation

This article presents a custom component that transforms columns inside an SSIS Data Flow into XML.

If you write ETL processes in SSIS, you will often need to log errors or other problems that occur during the ETL process. A typical example is a data conversion failure. In that situation you usually want to know exactly which record caused the error, and to log such conversion errors into a dedicated error table.

The problem appears when you have many tables and want to see which data was being converted when a particular error occurred. Because SSIS requires sources and destinations to be strongly mapped, and your sources have different structures, you end up with many different error tables, generally one per source structure at which an error can occur.

This situation can be easily solved by a custom Data Flow component that converts all source columns into XML. You then need only one error table containing, for example, Error Code, Error Column, Description and an XML field holding all the data of the row that caused the error, encoded as XML. This greatly simplifies error logging and also error log monitoring, as you need to monitor only a single table.

An SSIS Data Flow using the custom Columns To XML transformation could look like the one below.

Columns To XML Data Flow

ColumnsToXML Properties

ColumnsToXML Input Columns

You can provide the SourceID and SourceName properties, which then become part of the produced XML so you can easily identify the source. In the Input Columns you select the columns you want written to the XML. You can also specify an OutputAlias; the column name written into the XML will then be the one specified in the OutputAlias property.

The log written using the component and the XML it produces look like the ones below.

ColumnsToXML Log

<row sourceID="1" sourceName="Test Data Conversion">
  <Column name="Name" id="235" lineageId="72">Joseph</Column>
  <Column name="ValueInt" id="236" lineageId="75">2</Column>
  <Column name="ValueDate" id="237" lineageId="78">2012/01/35</Column>
</row>

Using the .value() method of the xml data type, we can easily query the necessary information from the log table for any source. For our sample scenario we want to query the Name, ValueInt and ValueDate columns stored in the XML. This can be achieved with the following SQL query:

SELECT
    [RowID]
    ,[ErrorCode]
    ,[ErrorColumn]
    ,[ColumnsXMLData].value('data((/row/Column[@name="Name"])[1])', 'varchar(128)') AS Name
    ,[ColumnsXMLData].value('data((/row/Column[@name="ValueInt"])[1])', 'varchar(10)') AS ValueInt
    ,[ColumnsXMLData].value('data((/row/Column[@name="ValueDate"])[1])', 'varchar(10)') AS ValueDate
FROM [TestDB].[dbo].[ETLErrorLog]
WHERE [ColumnsXMLData].value('data((/row/@sourceID)[1])', 'int') = 1
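If you also want to read the logged XML outside SQL Server, for example in a small .NET monitoring tool, the same extraction as the .value() calls above can be sketched with LINQ to XML. This is a minimal illustration written as a modern C# console snippet (top-level statements); the GetColumnValue helper is hypothetical and not part of the component.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// The logged XML from the example above.
string logged = @"<row sourceID=""1"" sourceName=""Test Data Conversion"">
  <Column name=""Name"" id=""235"" lineageId=""72"">Joseph</Column>
  <Column name=""ValueInt"" id=""236"" lineageId=""75"">2</Column>
  <Column name=""ValueDate"" id=""237"" lineageId=""78"">2012/01/35</Column>
</row>";

// Hypothetical helper, the LINQ to XML counterpart of
// data((/row/Column[@name="..."])[1]): text of the first Column with that name, or null.
string GetColumnValue(XElement row, string name) =>
    row.Elements("Column")
       .Where(c => (string)c.Attribute("name") == name)
       .Select(c => c.Value)
       .FirstOrDefault();

XElement parsedRow = XElement.Parse(logged);
int sourceId = (int)parsedRow.Attribute("sourceID");

// Values stay strings, like the varchar columns returned by the SQL query.
Console.WriteLine($"{sourceId}: {GetColumnValue(parsedRow, "Name")} / {GetColumnValue(parsedRow, "ValueInt")} / {GetColumnValue(parsedRow, "ValueDate")}");
```

Keeping the values as strings matters here: the row was logged precisely because a value such as 2012/01/35 could not be converted.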

As you can see from the samples above, you can easily create a single error table for any ETL logging and greatly simplify the logging, as you do not need to create error tables with separate structures for each source.

ColumnsToXML Implementation

The ColumnsToXmlTransform is a custom implementation of the PipelineComponent class. It is developed in C# for .NET Framework 3.5 and uses LINQ to XML to create the XML output.

The component processes selected input columns, generates the XML and stores it in the output column called ColumnsXMLData.

Apart from the ColumnsXMLData output column, the component exposes two custom properties (SourceID and SourceName) that let you identify the source of a row once it is written as XML into the database, and that simplify querying for data coming from particular sources, as shown above.
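The component's source is not reproduced here, but the XML shape shown in the log example can be produced with LINQ to XML roughly as follows. This is a sketch under assumptions, written as a modern C# console snippet: the BuildRowXml helper and the tuple describing an input column are hypothetical, and producing an empty Column element for a NULL value is an assumption rather than the component's documented behavior.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper: builds one <row> element from the selected input columns of a single row.
// Each tuple stands for one input column: its name, optional OutputAlias, IDs and current value.
XElement BuildRowXml(int sourceId, string sourceName,
    IEnumerable<(string Name, string OutputAlias, int Id, int LineageId, object Value)> columns) =>
    new XElement("row",
        new XAttribute("sourceID", sourceId),
        new XAttribute("sourceName", sourceName),
        columns.Select(c => new XElement("Column",
            // The OutputAlias, when set, replaces the column name in the XML.
            new XAttribute("name", string.IsNullOrEmpty(c.OutputAlias) ? c.Name : c.OutputAlias),
            new XAttribute("id", c.Id),
            new XAttribute("lineageId", c.LineageId),
            // Assumption: a NULL value yields an empty Column element.
            c.Value)));

var sampleColumns = new List<(string Name, string OutputAlias, int Id, int LineageId, object Value)>
{
    ("Name", null, 235, 72, "Joseph"),
    ("ValueInt", null, 236, 75, 2),
    ("ValueDate", null, 237, 78, "2012/01/35"),
};

XElement row = BuildRowXml(1, "Test Data Conversion", sampleColumns);
Console.WriteLine(row);
```

Printing the element gives the same structure as the sample log entry; in the real component the tuple values would come from the pipeline buffer rather than a hard-coded list.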

Once the component is built, it must be installed into the Global Assembly Cache and copied to the PipelineComponents folder in the SQL Server Integration Services installation path on every machine on which SSIS packages using this component will run.

To install it into the GAC you can use gacutil.exe, which is part of the Windows SDK.

The component is now publicly available on GitHub, together with other components, as projects for SSIS 2008 through SSIS 2016.

If you run the development environment with elevated Administrator privileges, the project's Post Build events install the assembly into the GAC (they assume you have Windows SDK 7.0 installed; if you have a newer version, modify the path to gacutil.exe accordingly) and also copy the assembly into the PipelineComponents folder of the default SQL Server Integration Services installation. If your installation directory differs, modify the paths in the Post Build events accordingly.

Once you have compiled the code, installed it in the GAC and copied it to the PipelineComponents folder, you can add the component to the Data Flow items palette. Right-click the palette and select Choose Items.

Choose Data Flow Transformation Items Pop-up

Choose SSIS Data Flow Items

Once you select the component, it appears in the toolbox and you can start using it.

Columns To XML On Palette

When I have more time, I will try to create an installer for the component. In the meantime, any comments are welcome.


MCTS: Microsoft SQL Server 2008, Business Intelligence Development and Maintenance

For the last few months I was quite busy at work and didn’t have much time to write blog posts. I was also preparing for a Microsoft Certification exam.

Finally, last Wednesday I successfully passed Exam 70-448: TS: Microsoft SQL Server 2008, Business Intelligence Development and Maintenance and became an MCTS.

So I hope I will now have more free time and be able to publish some of the posts I have in my queue.

Next, I would like to start preparing for the 70-452 certification exam to become an MCITP.

Second Update to Querying Active Directory on SQL Server using CLR

Finally I had time to make a second update to my post Querying Active Directory on SQL Server using CLR. This update should finally resolve the issue of returning a large number of properties from Active Directory.

The solution is very simple: it adds the possibility to specify the column length (maximum length) for each returned property. All the structures in the code remain as they were originally. The only change is that the list of properties is no longer passed as a comma-separated list; a semicolon now delimits the properties, and a comma can be used to provide a column length. If the length is not provided, the default of 4000 is used, as before.
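The parsing rule can be tried outside SQL Server with a small sketch, written as a modern C# console snippet. ParsePropertyList is a hypothetical helper that follows the same rule as the updated loop in SearchAD: split on semicolons, read an optional length after a comma, and fall back to 4000 when the length is missing or outside 1 to 4000.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper mirroring the rule described above:
// 'name[,length];name[,length];...' -> (name, length), default length 4000.
List<(string Name, int Length)> ParsePropertyList(string propertiesToLoad)
{
    var result = new List<(string Name, int Length)>();
    foreach (string item in propertiesToLoad.Split(new[] { ';' }, StringSplitOptions.RemoveEmptyEntries))
    {
        string[] parts = item.Split(new[] { ',' }, StringSplitOptions.RemoveEmptyEntries);
        if (parts.Length == 0)
            continue; // an item consisting only of commas carries no property name

        string name = parts[0].Trim();
        int length;
        // Missing, non-numeric or out-of-range lengths fall back to nvarchar(4000).
        if (parts.Length < 2 || !int.TryParse(parts[1].Trim(), out length) || length < 1 || length > 4000)
            length = 4000;
        result.Add((name, length));
    }
    return result;
}

foreach (var p in ParsePropertyList("cn,50;ADsPath"))
    Console.WriteLine($"{p.Name} nvarchar({p.Length})");
```

For 'cn,50;ADsPath' this yields cn as nvarchar(50) and ADsPath as nvarchar(4000).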

So don’t hesitate and check the updated article.

I’ve also provided a link to an updated Visual C# project with all the code.

SQL Server Regular Expressions – Replacements

In my earlier post SQL Server 2005 and SQL Server 2008 Regular Expressions and pattern matching I focused on Regular Expression matching on SQL Server using CLR.

That post covered only the matches themselves and didn’t cover another great feature of Regular Expressions: replacements.

When using replacements, the regular expression is first executed on the source string, then a replacement is applied to one, several or all of the matches, and the result of that replacement is returned.
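In .NET terms, replacing all matches corresponds to Regex.Replace, replacing only the first N matches to the Regex.Replace overload that takes a count, and expanding the replacement for one particular match to Match.Result, which the CLR functions below rely on. A minimal sketch as a modern C# console snippet, using the date pattern from the sample queries in this post:

```csharp
using System;
using System.Text.RegularExpressions;

var dateRegex = new Regex(@"(\d{4})/(\d{1,2})/(\d{1,2})");
string source = "2010/09/3;2010/07/09;2009/1/15";
string replacement = "2011/$2/$3"; // $2 and $3 refer to the month and day groups

// All matches replaced; the text between matches is kept.
string all = dateRegex.Replace(source, replacement);

// Only the first two matches replaced (Regex.Replace overload with a count).
string firstTwo = dateRegex.Replace(source, replacement, 2);

// Replacement expanded for one match only: Match.Result returns just the
// expanded text for that match, not the whole source string.
string secondOnly = dateRegex.Matches(source)[1].Result(replacement);

Console.WriteLine(all);        // 2011/09/3;2011/07/09;2011/1/15
Console.WriteLine(firstTwo);   // 2011/09/3;2011/07/09;2009/1/15
Console.WriteLine(secondOnly); // 2011/07/09
```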

Here are the CLR functions which handle the replacements.

using System;
using System.Collections.Generic;
using System.Text;
using System.Data.SqlTypes;
using Microsoft.SqlServer.Server;
using System.Text.RegularExpressions;
using System.Collections;

/// <summary>
/// Regular expressions for SQL
/// </summary>
public class SQLRegEx
{
    #region Internal structures
    private struct RegExRowReplace
    {
        /// <summary>
        /// Replacement table row
        /// </summary>
        /// <param name="matchID">ID of the match. ID = 0 represents the complete source string</param>
        /// <param name="match">
        /// String representing the match. For ID = 0 it is NULL, as it equals the original string; NULL is returned to avoid transferring the whole source string
        /// </param>
        /// <param name="result">Result of the replacement for the matchID</param>
        public RegExRowReplace(int matchID, SqlChars match, SqlChars result)
        {
            MatchID = matchID;
            Match = match;
            Result = result;
        }

        public int MatchID;
        public SqlChars Match;
        public SqlChars Result;
    }

    #endregion

    #region RegEx functions

    /// <summary>
    /// Applies a Regular Expression to the source string, takes the appropriate match and applies the replacement to it.
    /// </summary>
    /// <param name="sourceString">Source string on which the regular expression should be applied</param>
    /// <param name="pattern">Regular Expression pattern</param>
    /// <param name="replacement">Replacement to be used on match</param>
    /// <param name="matchId">ID of the Match to be returned. 0 replaces all matches in the sourceString</param>
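    /// <returns>Replacement result for the requested match, the whole string with all matches replaced when matchId = 0, or NULL if the match does not exist</returns>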
    [SqlFunction(IsDeterministic = true)]
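    public static SqlChars RegExReplace(SqlString sourceString, SqlString pattern, SqlString replacement, int matchId)
    {
        //NULL input produces NULL output instead of an SqlNullValueException
        if (sourceString.IsNull || pattern.IsNull || replacement.IsNull)
            return SqlChars.Null;

        Match m = null;
        Regex r = new Regex(pattern.Value, RegexOptions.Compiled);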

        if (matchId == 0)
        {
            return new SqlChars(r.Replace(sourceString.Value, replacement.Value));
        }
        if (matchId == 1)
        {
            m = r.Match(sourceString.Value);
        }
        else if (matchId > 1)
        {
            MatchCollection mc = r.Matches(sourceString.Value);
            m = mc != null && mc.Count > matchId - 1 ? mc[matchId - 1] : null;
        }

        return m != null ? new SqlChars(m.Result(replacement.Value)) : SqlChars.Null;
    }


    /// <summary>
    /// Applies a Regular Expression to the source string and returns all matches together with the results of the replacement
    /// </summary>
    /// <param name="sourceString">Source string on which the regular expression should be applied</param>
    /// <param name="pattern">Regular Expression pattern</param>
    /// <param name="replacement">Replacement to be used on matches</param>
    /// <returns>IEnumerable</returns>
    [SqlFunction(FillRowMethodName = "FillRegExRowReplace")]
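    public static IEnumerable RegExMatchesReplace(SqlString sourceString, SqlString pattern, SqlString replacement)
    {
        //NULL input produces an empty result set
        if (sourceString.IsNull || pattern.IsNull || replacement.IsNull)
            yield break;

        Regex r = new Regex(pattern.Value, RegexOptions.Compiled);
        int matchId = 0;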

        foreach (Match m in r.Matches(sourceString.Value))
        {
            yield return new RegExRowReplace(++matchId, new SqlChars(m.Value), new SqlChars(m.Result(replacement.Value)));
        }
    }

    #endregion

    #region FillRow methods

    public static void FillRegExRowReplace(object obj, out int matchID, out SqlChars match, out SqlChars result)
    {
        RegExRowReplace row = (RegExRowReplace)obj;
        matchID = row.MatchID;
        match = row.Match;
        result = row.Result;
    }

    #endregion
}

RegExReplace

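This is a scalar CLR function that handles a Regular Expression replacement. For matchID = 1 or higher it returns only the replacement result for that particular match (not the whole source string), or NULL if such a match does not exist. If matchID = 0 is passed, all matches in the string are replaced and the whole modified string is returned.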

RegExMatchesReplace

In contrast to RegExReplace, RegExMatchesReplace is a CLR table-valued function that returns each match as a separate row together with its replacement. It returns a table with three columns: MatchID, the ID of the match within the original string; Match, the particular RegEx match from the string; and Result, the result of the replacement applied to that match.

Once the class is compiled into an assembly, we can register the functions inside a SQL Server database.

CREATE ASSEMBLY [SQLRegEx]
AUTHORIZATION [dbo]
FROM 'C:\CLR\SQLRegEx.dll'
WITH PERMISSION_SET = SAFE
GO

CREATE FUNCTION [dbo].[fn_RegExReplace](
    @sourceString [nvarchar](max),        --Source string to apply RegEx
    @pattern [nvarchar](4000),            --RegEx pattern to apply on the source string
    @replacement [nvarchar](4000),       --Replacement pattern to apply on match
    @matchID [int] = 1                    --ID of the match to return. @matchID = 1 is the first match. IF @matchID=0 then all the patterns in the @sourceString are replaced
)        
RETURNS [nvarchar](4000)
WITH EXECUTE AS CALLER
AS 
EXTERNAL NAME [SQLRegEx].[SQLRegEx].[RegExReplace]
GO


CREATE FUNCTION [dbo].[fn_RegExMatchesReplace](
    @sourceString [nvarchar](max),  --Source string to apply RegEx
    @pattern [nvarchar](4000),        --RegEx to apply on the source string
    @replacement [nvarchar](4000)    --Replacement pattern to apply on matches
)        
RETURNS  TABLE (
    [matchId] [int] NULL,                --ID of returned match
    [match] [nvarchar](4000) NULL,        --value of match
    [result] [nvarchar](4000) NULL        --Replacement result
) WITH EXECUTE AS CALLER
AS 
EXTERNAL NAME [SQLRegEx].[SQLRegEx].[RegExMatchesReplace]
GO

Testing the Replace method

Here are some sample queries that show how the functions work.

Replace year in all Dates with year 2011
WITH StringDates AS (
    SELECT '2010/09/3' AS StringDate UNION ALL
    SELECT '2010/07/09' UNION ALL
    SELECT '2009/1/15'
)
SELECT
    StringDate,
    dbo.fn_RegExReplace(StringDate, '(\d{4})/(\d{1,2})/(\d{1,2})', '2011/$2/$3', 0)
FROM StringDates
Remove the values from the Key=Value pairs and leave only the Key values separated by semicolons
SELECT dbo.fn_RegExReplace(
    'Key1=Val1;Key2=Val2;Key3=Val3;Key4=Val4',
    '\s*?(\w+?)\s*?=\s*?(.*?)(;|$)',
    '$1$3',
    0)
Get list of all dates from within a string and replace the year of each date by 2011
SELECT
*
FROM dbo.fn_RegExMatchesReplace(
    '2010/09/3;2010/07/09;2009/1/15', 
    '(\d{4})/(\d{1,2})/(\d{1,2})', 
    '2011/$2/$3'
    )
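Before deploying the assembly, it can be handy to check patterns in plain .NET. The sketch below, a modern C# console snippet, uses a hypothetical, simplified mirror of RegExMatchesReplace with plain strings instead of SqlChars, runs it on the third sample, and then runs the Key=Value sample with matchID = 0 semantics (all matches replaced).

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical plain-.NET mirror of RegExMatchesReplace: one (MatchId, MatchText, Result) row per match.
IEnumerable<(int MatchId, string MatchText, string Result)> MatchesReplace(string source, string pattern, string replacement)
{
    int matchId = 0;
    foreach (Match m in new Regex(pattern).Matches(source))
        yield return (++matchId, m.Value, m.Result(replacement));
}

foreach (var r in MatchesReplace("2010/09/3;2010/07/09;2009/1/15", @"(\d{4})/(\d{1,2})/(\d{1,2})", "2011/$2/$3"))
    Console.WriteLine($"{r.MatchId}\t{r.MatchText}\t{r.Result}");

// Key=Value sample with matchID = 0 semantics: every match replaced.
string keysOnly = Regex.Replace("Key1=Val1;Key2=Val2;Key3=Val3;Key4=Val4",
    @"\s*?(\w+?)\s*?=\s*?(.*?)(;|$)", "$1$3");
Console.WriteLine(keysOnly); // Key1;Key2;Key3;Key4
```

The three printed rows correspond to the matchId, match and result columns returned by dbo.fn_RegExMatchesReplace for the same input.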

From the examples we can see that CLR gives us real power for using Regular Expressions on SQL Server. It allows complex transformations that would not be possible in pure T-SQL, and even for transformations that can be done in pure T-SQL, the CLR RegEx solution is often faster and more straightforward.

Update to Querying Active Directory on SQL Server using CLR

I’ve just posted an update to my article Querying Active Directory on SQL Server using CLR. I’ve added two more parameters to the samples: one controls the page size for the paged AD search, and the other limits the number of rows returned to the client. This can be handy when an AD query can return more results than we need; in such a case it’s easy to limit the number of rows.

I updated the article after a comment from Chris, who received an insufficient memory error when he tried to return a larger number of fields from AD.

In this case, lowering the page size from 1000 to a smaller value should avoid this kind of problem.

Querying Active Directory on SQL Server using CLR

In my previous article Querying Active Directory on SQL Server using T-SQL I showed the possibilities of querying Active Directory using T-SQL, specifically using a linked server and the OPENQUERY statement, or without a linked server using the OPENROWSET statement.

This is an updated article that includes modifications prompted by some of the comments, to resolve issues with a large number of returned AD properties.

That article mentioned some limitations of querying AD using T-SQL, and CLR helps us bypass them, especially the limit of 1000 returned records, as we can use paging effectively in the CLR code.

For this purpose I’m creating a CLR stored procedure that takes several arguments and returns a result set. The reason for using a CLR stored procedure is that a stored procedure can return a dynamic result set: I can specify which AD properties to return, and they will be returned as columns of the result set. With a CLR table-valued function, we would have to create a separate function for each combination of properties we want to return.

So here is the CLR class for querying AD.

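using System;
using System.Data.SqlTypes;
using System.DirectoryServices;
using System.Text;
using Microsoft.SqlServer.Server;

public class ActiveDirectory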
{
    /// <summary>
    /// Queries Active directory according provided parameters
    /// Current user credentials are used for authentication
    /// </summary>
    /// <param name="adRoot">AD Root for querying AD</param>
    /// <param name="filter">Filter to be used for querying</param>
    /// <param name="searchScope">Scope to be used for querying</param>
    /// <param name="propertiesToLoad">Semicolon-separated list of properties to return; a comma can follow a name to specify its length</param>
    /// <param name="pageSize">Represents a PageSize for the paged search of AD</param>
    /// <param name="rowsLimit">Represents a limit for the number of rows returned. NULL or a value less than 1 represents unlimited</param>
    [Microsoft.SqlServer.Server.SqlProcedure]
    public static void QueryAD(string adRoot, string filter, string propertiesToLoad, string searchScope, int pageSize, SqlInt32 rowsLimit)
    {
        SearchAD(null, null, null, adRoot, filter, searchScope, propertiesToLoad, pageSize, rowsLimit);
    }

    /// <summary>
    /// Queries Active directory according provided parameters
    /// </summary>
    /// <param name="userName">UserName to be used to authenticate AD</param>
    /// <param name="password">Password to be used to authenticate to AD</param>
    /// <param name="adRoot">AD Root for querying AD</param>
    /// <param name="filter">Filter to be used for querying</param>
    /// <param name="searchScope">Scope to be used for querying</param>
    /// <param name="propertiesToLoad">Semicolon-separated list of properties to return; a comma can follow a name to specify its length</param>
    /// <param name="pageSize">Represents a PageSize for the paged search of AD</param>
    /// <param name="rowsLimit">Represents a limit for the number of rows returned. NULL or a value less than 1 represents unlimited</param>
    [Microsoft.SqlServer.Server.SqlProcedure]
    public static void QueryADUName(string userName, string password, string adRoot, string filter, string propertiesToLoad, string searchScope, int pageSize, SqlInt32 rowsLimit)
    {
        SearchAD(userName, password, null, adRoot, filter, searchScope, propertiesToLoad, pageSize, rowsLimit);
    }

    /// <summary>
    /// Queries Active directory according provided parameters
    /// </summary>
    /// <param name="userName">UserName to be used to authenticate AD</param>
    /// <param name="password">Password to be used to authenticate to AD</param>
    /// <param name="authType">Authentication type to be used to authenticate to AD</param>
    /// <param name="adRoot">AD Root for querying AD</param>
    /// <param name="filter">Filter to be used for querying</param>
    /// <param name="searchScope">Scope to be used for querying</param>
    /// <param name="propertiesToLoad">Semicolon-separated list of properties to return; a comma can follow a name to specify its length</param>
    /// <param name="pageSize">Represents a PageSize for the paged search of AD</param>
    /// <param name="rowsLimit">Represents a limit for the number of rows returned. NULL or a value less than 1 represents unlimited</param>
    [Microsoft.SqlServer.Server.SqlProcedure]
    public static void QueryADAuth(string userName, string password, string authType, string adRoot, string filter, string propertiesToLoad, string searchScope, int pageSize, SqlInt32 rowsLimit)
    {
        SearchAD(userName, password, authType, adRoot, filter, searchScope, propertiesToLoad, pageSize, rowsLimit);
    }

    private static DirectoryEntry GetRootEntry(string adRoot, string userName, string password, string authType)
    {
        if (userName == null)
            return new DirectoryEntry(adRoot);
        else if (authType == null)
            return new DirectoryEntry(adRoot, userName, password);
        else
        {
            AuthenticationTypes at;
            if (TryParseEnum<AuthenticationTypes>(authType, true, out at))
                return new DirectoryEntry(adRoot, userName, password, at);
            else
                throw new System.InvalidCastException(string.Format("authType must be one of '{0}'", GetEnumNames<AuthenticationTypes>()));
        }
    }

    private static string GetEnumNames<T>()
    {
        string[] names = Enum.GetNames(typeof(T));
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < names.Length; i++)
        {
            if (i > 0)
                sb.Append(',');
            sb.Append(names[i]);
        }
        return sb.ToString();
    }

    private static bool TryParseEnum<T>(string value, bool ignoreCase, out T outEnum)
    {
        try
        {
            outEnum = (T)Enum.Parse(typeof(T), value, ignoreCase);
            return true;
        }
        catch
        {
            outEnum = (T)Enum.GetValues(typeof(T)).GetValue(0);
            return false;
        }
    }

    /// <summary>
    /// Searches Active Directory according provided parameters
    /// </summary>
    /// <param name="userName">UserName to be used to authenticate AD</param>
    /// <param name="password">Password to be used to authenticate to AD</param>
    /// <param name="authType">Authentication type to be used to authenticate to AD</param>
    /// <param name="adRoot">AD Root for querying AD</param>
    /// <param name="filter">Filter to be used for querying</param>
    /// <param name="searchScope">Scope to be used for querying</param>
    /// <param name="propertiesToLoad">Semicolon-separated list of properties to return; a comma can follow a name to specify its length</param>
    /// <param name="pageSize">Represents a PageSize for the paged search of AD</param>
    /// <param name="rowsLimit">Represents a limit for the number of rows returned. NULL or a value less than 1 represents unlimited</param>
    private static void SearchAD(string userName, string password, string authType, string adRoot, string filter, string searchScope, string propertiesToLoad, int pageSize, SqlInt32 rowsLimit)
    {
        //properties are separated by semicolons; a comma inside an item specifies its length
        string[] properties = propertiesToLoad.Split(new char[] { ';' }, StringSplitOptions.RemoveEmptyEntries);
        SqlMetaData[] recordMetaData = new SqlMetaData[properties.Length];
        SearchScope scope;

        int limit = rowsLimit.IsNull ? 0 : rowsLimit.Value;
        int rowsCount = 0;

        if (limit > 0 && pageSize > limit)
            pageSize = limit;

        if (!TryParseEnum<SearchScope>(searchScope, true, out scope))
            throw new System.InvalidCastException(string.Format("searchScope must be one of '{0}'", GetEnumNames<SearchScope>()));

        //Trim properties and prepare result set metadata, also process specified lengths
        for (int i = 0; i < properties.Length; i++)
        {
            string[] propDetails = properties[i].Split(new char[] { ',' }, StringSplitOptions.RemoveEmptyEntries);
            string propName = propDetails[0].Trim();
            int len;

            if (propDetails.Length < 2 || !int.TryParse(propDetails[1], out len) || len < 1 || len > 4000)
                len = 4000;

            recordMetaData[i] = new SqlMetaData(propName, System.Data.SqlDbType.NVarChar, len);
            properties[i] = propName;
        }

        //Get Root Directory Entry
        using (DirectoryEntry rootEntry = GetRootEntry(adRoot, userName, password, authType))
        {
            //Create a directory searcher with the appropriate filter, properties and search scope
            using (DirectorySearcher ds = new DirectorySearcher(rootEntry, filter, properties, scope))
            {
                ds.PageSize = pageSize; //Set Page Size - without this we will not do a paged search and we will be limited to 1000 results

                //find all objects from the root, according to the filter and search scope
                using (SearchResultCollection results = ds.FindAll())
                {
                    SqlDataRecord record = new SqlDataRecord(recordMetaData);
                    //Start pushing of records to client
                    SqlContext.Pipe.SendResultsStart(record);

                    foreach (SearchResult result in results)
                    {
                        record = new SqlDataRecord(recordMetaData);

                        for (int i = 0; i < properties.Length; i++)
                        {
                            ResultPropertyValueCollection props = result.Properties[properties[i]];

                            if (props.Count == 1)           //if property collection contains a single value, set the record field to that value
                                record.SetSqlString(i, props[0].ToString());
                            else if (props.Count == 0)      //if property collection doesn't contain any value, set record field to NULL
                                record.SetSqlString(i, SqlString.Null);
                            else                            //In case of multiple value, separate the values by commas
                            {
                                StringBuilder sb = new StringBuilder();
                                bool firstItem = true;
                                foreach (object prop in props)
                                {
                                    if (!firstItem)
                                        sb.Append(',');
                                    else
                                        firstItem = false;

                                    sb.Append(prop.ToString());
                                }

                                record.SetSqlString(i, sb.ToString());
                            }
                        }

                        //send record to client
                        SqlContext.Pipe.SendResultsRow(record);

                        //if rowsLimit was reached, break the loop (a limit less than 1 means unlimited)
                        if (limit > 0 && ++rowsCount >= limit)
                            break;
                    }

                    //stop sending records to client
                    SqlContext.Pipe.SendResultsEnd();
                }
            }
        }
    }
}

The class contains three public methods (CLR stored procedures): QueryAD, QueryADUName and QueryADAuth. The first queries AD using the current user’s credentials and the default authentication method, the second uses the provided user credentials and the default authentication method, and the third also lets us specify the authentication method.

These methods call the private method SearchAD, which takes care of the AD search and returns the results to the client.

After the comment from Chris, I updated my sample code to include the optional parameter pageSize, which allows reducing the page size of the paged search used to query AD and avoids insufficient memory problems when querying a larger number of AD attributes.

I’ve also added a parameter rowsLimit, which limits the maximum number of rows returned; this can be useful especially when querying a very large AD. A break in the loop that iterates the AD results stops processing when the number of processed items reaches the limit.

UPDATE:

Finally, I’ve also updated the private static void SearchAD method, so the properties are now passed not as a comma-separated list but as a semicolon-separated list. You can use a comma to specify the return length of each property; if the length is not specified, the method uses the default length of 4000 characters. This update should finally solve the issues with a large number of returned properties, which were previously all returned as nvarchar(4000).

The updated part of the code is the for loop that processes the properties and generates the result set metadata, starting with the comment //Trim properties and prepare result set metadata, also process specified lengths

END OF UPDATE:

To compile the code, for example with Visual C# Express, you need to add a reference to the System.DirectoryServices assembly to the project.

Once we compile the code into an assembly, say ADServices.dll, we can register that assembly and the CLR stored procedures in our database.

Because the ActiveDirectory class uses the System.DirectoryServices assembly, we have to use the UNSAFE PERMISSION_SET for our assembly, so the database using that assembly has to be TRUSTWORTHY.

ALTER DATABASE TestDB3 SET TRUSTWORTHY ON WITH ROLLBACK IMMEDIATE;

As we are using the System.DirectoryServices assembly, we have to register it in our database before registering our own assembly; otherwise the registration will fail.

--Register the NetFramework System.DirectoryServices assembly
CREATE ASSEMBLY [System.DirectoryServices]
AUTHORIZATION [dbo]
FROM 'C:\Windows\Microsoft.NET\Framework\v2.0.50727\System.DirectoryServices.dll'
WITH PERMISSION_SET = UNSAFE
GO
--Register our assembly with ActiveDirectory CLR Stored Procedures
CREATE ASSEMBLY [ADServices]
AUTHORIZATION [dbo]
FROM 'C:\CLR\ADServices.dll'
WITH PERMISSION_SET = UNSAFE
GO

Once we have successfully registered those assemblies, we can create the CLR Stored procedures in our DB.

--==========================================================================
-- @adRoot - root of searching eg. 'LDAP://OU=Sales,DC=Fabricam,DC=com'
-- @filter -  filter to be used for searching eg. '(&(objectCategory=group))'
-- @propertiesToLoad - list of properties to be retrieved eg. 'cn,50;ADsPath'
--                   - properties are separated by a semicolon and
--                   - a comma can be used to specify the return length of a property
-- @searchScope - scope to be used for searching: {Base,OneLevel,Subtree}
-- @pageSize - specifies the PageSize for paged search - default is 1000
--               it is possible to lower the value if there is a problem 
--             retrieving such amount of records at once.
-- @rowsLimit - represents maximum number of rows returned.
--              NULL or value less than 1 represents unlimited
--==========================================================================
CREATE PROCEDURE usp_QueryAD
    @adRoot nvarchar(255),
    @filter nvarchar(255),
    @propertiesToLoad nvarchar(255),
    @searchScope nvarchar(8),
    @pageSize int = 1000,
    @rowsLimit int = 0
AS
EXTERNAL NAME [ADServices].[ActiveDirectory].[QueryAD]
GO
--==========================================================================
-- @userName - username to be used for authentication to AD
-- @password - password to be used for authentication to AD
-- @adRoot - root of searching eg. 'LDAP://OU=Sales,DC=Fabricam,DC=com'
-- @filter -  filter to be used for searching eg. '(&(objectCategory=group))'
-- @propertiesToLoad - list of properties to be retrieved eg. 'cn,50;ADsPath'
--                   - properties are separated by a semicolon and
--                   - a comma can be used to specify the return length of a property
-- @searchScope - scope to be used for searching: {Base,OneLevel,Subtree}
-- @pageSize - specifies the PageSize for paged search - default is 1000
--               it is possible to lower the value if there is a problem 
--             retrieving such amount of records at once.
-- @rowsLimit - represents maximum number of rows returned.
--              NULL or value less than 1 represents unlimited
--==========================================================================
CREATE PROCEDURE usp_QueryADUname
    @userName nvarchar(255),
    @password nvarchar(255),
    @adRoot nvarchar(255),
    @filter nvarchar(255),
    @propertiesToLoad nvarchar(255),
    @searchScope nvarchar(8),
    @pageSize int = 1000,
    @rowsLimit int = 0
AS
EXTERNAL NAME [ADServices].[ActiveDirectory].[QueryADUName]
GO
--==========================================================================
-- @userName - username to be used for authentication to AD
-- @password - password to be used for authentication to AD
-- @authType - authentication type to be used for AD authentication
--             {None,Secure,Encryption,SecureSocketsLayer,ReadonlyServer,Anonymous,
--             FastBind,Signing,Sealing,Delegation,ServerBind}
-- @adRoot - root of searching eg. 'LDAP://OU=Sales,DC=Fabricam,DC=com'
-- @filter -  filter to be used for searching eg. '(&(objectCategory=group))'
-- @propertiesToLoad - list of properties to be retrieved eg. 'cn,50;ADsPath'
--                   - properties are separated by a semicolon and
--                   - a comma can be used to specify the return length of a property
-- @searchScope - scope to be used for searching: {Base,OneLevel,Subtree}
-- @pageSize - specifies the PageSize for paged search - default is 1000
--               it is possible to lower the value if there is a problem 
--             retrieving such amount of records at once.
-- @rowsLimit - represents maximum number of rows returned.
--              NULL or value less than 1 represents unlimited
--==========================================================================
CREATE PROCEDURE usp_QueryADAuth
    @userName nvarchar(255),
    @password nvarchar(255),
    @authType nvarchar(20),
    @adRoot nvarchar(255),
    @filter nvarchar(255),
    @propertiesToLoad nvarchar(255),
    @searchScope nvarchar(8),
    @pageSize int = 1000,
    @rowsLimit int = 0
AS
EXTERNAL NAME [ADServices].[ActiveDirectory].[QueryADAuth]
GO

Once the stored procedures have been created, we can start querying AD. The @filter parameter is an LDAP filter written according to the Search Filter Syntax.
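When a filter value comes from user input, for example a group name containing parentheses, it has to be escaped so it cannot break the structure of the filter. The following Python snippet is only a minimal sketch of composing such an @filter string and is not part of the author's CLR solution. It applies the RFC 4515 escape sequences (`\5c`, `\2a`, `\28`, `\29`, `\00`), which the Search Filter Syntax also uses. The helper names `escape_filter_value` and `and_filter` are hypothetical.

```python
from typing import Tuple

# Hypothetical helpers: build an LDAP AND filter with escaped literal values.
# Each special character is replaced by a backslash plus its hex code (RFC 4515).
_FILTER_ESCAPES = {
    "\\": r"\5c",
    "*": r"\2a",
    "(": r"\28",
    ")": r"\29",
    "\0": r"\00",
}


def escape_filter_value(value: str) -> str:
    """Escape a literal value so it cannot change the structure of a filter."""
    return "".join(_FILTER_ESCAPES.get(ch, ch) for ch in value)


def and_filter(*conditions: Tuple[str, str]) -> str:
    """Combine (attribute, value) pairs into an LDAP AND filter: (&(a=b)(c=d))."""
    parts = "".join(f"({attr}={escape_filter_value(value)})" for attr, value in conditions)
    return f"(&{parts})"


if __name__ == "__main__":
    print(and_filter(("objectCategory", "user"), ("objectClass", "user")))
    # prints (&(objectCategory=user)(objectClass=user))
    print(escape_filter_value("Sales (EMEA)*"))
    # prints Sales \28EMEA\29\2a
```

Because every value is escaped, a `*` passed through `and_filter` is matched literally. If you want a wildcard search such as `cn=C*`, write that part of the filter by hand.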

--Query AD Groups
--Result set will contain cn - common name, ADsPath
EXECUTE [usp_QueryAD] 
   @adRoot = 'LDAP://DC=Fabricam,DC=com'
  ,@filter = '(&(objectCategory=group))'
  ,@propertiesToLoad = 'cn,50;ADsPath'
  ,@searchScope = 'subtree'
GO

--Query users of Sales organization unit
--Result set will contain sn - surname, cn - common name and ADsPath
EXECUTE [usp_QueryAD] 
   @adRoot = 'LDAP://OU=Sales,DC=Fabricam,DC=com'
  ,@filter = '(&(objectCategory=user)(objectClass=user))'
  ,@propertiesToLoad = 'sn,cn,ADsPath'
  ,@searchScope = 'subtree'
GO

--Query all users belonging to the group "Sales Representatives", providing a user name and password
--for AD authentication, including the authentication type
--Result set will contain sn - surname, cn - common name and ADsPath
EXECUTE [TestDB3].[dbo].[usp_QueryADAuth] 
   @userName = 'FabricamADUserName'
  ,@password = 'ADuserNamePassword'
  ,@authType = 'secure'
  ,@adRoot = 'LDAP://DC=Fabricam,DC=com'
  ,@filter = '(&(objectCategory=user)(objectClass=user)(memberOf=CN=Sales Representatives,OU=Security Groups,OU=Groups,DC=Fabricam,DC=com))'
  ,@propertiesToLoad = 'sn,100;cn,200;ADsPath,1000'
  ,@searchScope = 'subtree'
GO

Because we use a stored procedure to query AD, we cannot work with its result directly in further queries (on Denali this will be possible thanks to EXECUTE ... WITH RESULT SETS). On SQL Server 2005 and 2008 we can store the results, e.g. in a table variable or a temp table, and then work with them as usual.

From the CLR code we can see that the result set contains all the AD properties passed in the @propertiesToLoad parameter, in exactly the same order as in that parameter. The data type of each returned column is nvarchar, with the length specified in the properties list. If no length was specified, or the length was less than 1 or greater than 4000, the return type is nvarchar(4000).
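The typing rules above determine the columns of the table variable or temp table that receives the results. The Python snippet below is a minimal sketch that applies those stated rules to a @propertiesToLoad string and prints a matching DECLARE ... TABLE statement. It models the documented behaviour and does not reproduce the CLR code. Two behaviours are assumptions: empty entries are skipped, and lengths are plain integers. The function names are hypothetical.

```python
from typing import List, Tuple

MAX_NVARCHAR = 4000  # lengths missing or outside 1..4000 become nvarchar(4000)


def parse_properties(properties_to_load: str) -> List[Tuple[str, int]]:
    """Split e.g. 'cn,50;ADsPath' into [('cn', 50), ('ADsPath', 4000)]."""
    columns = []
    for item in properties_to_load.split(";"):
        if not item.strip():
            continue  # assumption: empty entries (e.g. a trailing ';') are ignored
        name, _, length_text = item.partition(",")
        # assumption: a length, when present, is a plain integer
        length = int(length_text) if length_text.strip() else 0
        if length < 1 or length > MAX_NVARCHAR:
            length = MAX_NVARCHAR
        columns.append((name.strip(), length))
    return columns


def table_variable(variable: str, properties_to_load: str) -> str:
    """Build a DECLARE ... TABLE statement matching the procedure's result set."""
    cols = ",\n".join(
        f"    {name} nvarchar({length})"
        for name, length in parse_properties(properties_to_load)
    )
    return f"DECLARE {variable} TABLE (\n{cols}\n)"


if __name__ == "__main__":
    print(table_variable("@result", "sn,100;cn,200;ADsPath,1000"))
```

Called with 'sn,100;cn,200;ADsPath,1000', it produces the same column list as the table variable in the example that follows.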

--Query All users from AD, store them in a table variable and then
--select all users whose cn (common name) starts with "C"
DECLARE @result TABLE (
    sn nvarchar(100),
    cn nvarchar(200),
    ADsPath nvarchar(1000)
)

INSERT INTO @result
EXECUTE [TestDB3].[dbo].[usp_QueryADAuth] 
   @userName = 'FabricamADUserName'
  ,@password = 'ADuserNamePassword'
  ,@authType = 'secure'
  ,@adRoot = 'LDAP://DC=Fabricam,DC=com'
  ,@filter = '(&(objectCategory=user)(objectClass=user))'
  ,@propertiesToLoad = 'sn,100;cn,200;ADsPath,1000'
  ,@searchScope = 'subtree'

SELECT 
*
FROM @result
WHERE cn LIKE 'C%'

From the examples above we can see that, once we have created CLR stored procedures for querying AD, the queries themselves are quite easy. Compared to the T-SQL solutions using a Linked Server or OPENROWSET, we have much greater possibilities and, most importantly, we are not limited to 1000 results per query, so we can easily query all the objects in AD.

If you are interested, you can download the sample ADServices Visual C# 2010 Express project. It contains all the latest updates I have mentioned.

Here you can download an updated sample PPSqlDirectoryServices Visual C# 2010 project with all the latest updates implemented.

Querying Active Directory on SQL Server using T-SQL

You may come across a situation where you need to retrieve a list of users, groups or other information from Windows Active Directory (AD) or another LDAP (Lightweight Directory Access Protocol) directory from within SQL Server. You may want to store the information from AD in SQL Server tables for later use or, for example, determine the list of users belonging to a particular group.

There are basically three ways of accessing Active Directory from SQL Server:

  1. Use a Linked Server
  2. Use the OPENROWSET statement
  3. Use CLR

This article focuses on querying Active Directory using the first two options. The CLR approach is described in my next article, Querying Active Directory on SQL Server using CLR.

Access Active Directory using Linked Server

First, I will focus on using a Linked Server to access AD. The first step is to create the Linked Server. This can be done using the sp_addlinkedserver system stored procedure or using Management Studio. In both cases we use the OLE DB Provider for Microsoft Directory Services (ADSDSOObject).

Creating the Linked Server using the sp_addlinkedserver system stored procedure

To get the linked server working, we need to add the linked server itself and also provide the credentials it will use. There are two possibilities. The first is to use the login's current security context, which means using the credentials of the currently authenticated user (this does not work with SQL Server authentication). The second is to provide the credentials of an AD account for the linked server to use.

EXEC sp_addlinkedserver @server = N'ADSI', @srvproduct=N'Active Directory Service Interfaces', @provider=N'ADSDSOObject', @datasrc=N'adsdatasource'

Before we can use the linked server, we have to add a login that will be used to authenticate against AD. There are three possibilities:

--Use the current user's security context (the linked server will use the security context of the currently authenticated user)
EXEC sp_addlinkedsrvlogin @rmtsrvname=N'ADSI',@useself=N'True',@locallogin=NULL,@rmtuser=NULL,@rmtpassword=NULL

--Use one remote account for all users (the linked server will be available to all users and will use 'DOMAIN\UserAccount' for authentication to AD)
EXEC sp_addlinkedsrvlogin @rmtsrvname=N'ADSI',@useself=N'False',@locallogin=NULL,@rmtuser=N'DOMAIN\UserAccount',@rmtpassword=N'DomainAccountPassword'

--The linked server will be available to the SQL Server login 'SQL Server Login Name' and will use 'DOMAIN\UserAccount' for authentication to AD
EXEC sp_addlinkedsrvlogin @rmtsrvname=N'ADSI',@useself=N'False',@locallogin=N'SQL Server Login Name',@rmtuser=N'DOMAIN\UserAccount',@rmtpassword=N'DomainAccountPassword'

Creating the Linked Server using Management Studio

Create Linked Server: Step 1, Step 2, Step 3 (screenshots)

Querying AD using the Linked Server

Once we have created the linked server we can start querying the AD and we have two possibilities to construct the query against AD (LDAP).

  1. Use SQL Dialect (MSDN)
  2. Use LDAP Dialect (MSDN)

I will show some sample queries using both dialects. Details can be found at the links above.

SQL Dialect

The SQL dialect uses constructs of the SQL language, so we can write queries against AD in the familiar SELECT ... FROM ... WHERE form:

--Select all groups
SELECT
*
FROM OPENQUERY(ADSI,'SELECT cn, ADsPath FROM ''LDAP://DC=Fabricam,DC=com'' WHERE objectCategory=''group''')

--Select all users
SELECT
*
FROM OPENQUERY(ADSI,'SELECT cn, ADsPath FROM ''LDAP://OU=Sales,DC=Fabricam,DC=com'' WHERE objectCategory=''person'' AND objectClass=''user''')

Be aware that the LDAP moniker is case sensitive. You have to write LDAP:// and not ldap://, otherwise you will not receive any results. Also remember that the LDAP path needs to be enclosed in single quotes. When the path is specified inside the OPENQUERY statement, each of those single quotes needs to be doubled, as in the example.
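Both rules can be applied mechanically when the query text is generated. The following is a minimal Python sketch, not part of the original article. It wraps an ADSI SQL-dialect query in OPENQUERY, doubles every single quote, and rejects a moniker that is not written as upper-case LDAP://. The function names are hypothetical, and the linked server name ADSI is the one created above.

```python
import re
from typing import Iterable


def sql_dialect(attributes: Iterable[str], ldap_path: str, where: str) -> str:
    """ADSI SQL dialect query; the ADsPath is enclosed in single quotes."""
    return f"SELECT {', '.join(attributes)} FROM '{ldap_path}' WHERE {where}"


def check_moniker(query: str) -> None:
    """Reject any LDAP moniker not written exactly as upper-case LDAP://."""
    for match in re.finditer(r"ldap://", query, flags=re.IGNORECASE):
        if match.group(0) != "LDAP://":
            raise ValueError(f"moniker must be 'LDAP://', found {match.group(0)!r}")


def openquery(linked_server: str, inner_query: str) -> str:
    """Embed a query in OPENQUERY, doubling every single quote inside it."""
    check_moniker(inner_query)
    escaped = inner_query.replace("'", "''")
    return f"SELECT * FROM OPENQUERY({linked_server},'{escaped}')"


if __name__ == "__main__":
    inner = sql_dialect(["cn", "ADsPath"], "LDAP://DC=Fabricam,DC=com", "objectCategory='group'")
    print(openquery("ADSI", inner))
```

The printed statement matches the "Select all groups" query above, apart from line breaks.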

Also remember that an LDAP query returns only the first 1000 records matching the specified conditions. Once that limit is reached, you will receive an error message. You can use the TOP clause to limit the number of records and avoid the error. Retrieving more than 1000 records is possible using pagination, but that is quite problematic in T-SQL. I will show this method in the upcoming article, which focuses on getting AD data using CLR.

LDAP Dialect

As mentioned above, the other option for querying AD is the LDAP dialect. Details about the dialect can be found at the link above, and details on writing conditions in it can be found in the MSDN Search Filter Syntax. The LDAP dialect gives us greater possibilities when writing conditions; for example, it lets us easily retrieve the members of AD groups.

--Select all groups
SELECT
*
FROM OPENQUERY(ADSI,'<LDAP://DC=Fabricam,DC=com>;(&(objectCategory=group));cn,ADsPath;subtree')

--Select all users
SELECT
*
FROM OPENQUERY(ADSI,'<LDAP://OU=Sales,DC=Fabricam,DC=com>;(&(objectCategory=person)(objectClass=user));cn,ADsPath;subtree')

--Select all users belonging to group 'MyGroup'
SELECT
*
FROM OPENQUERY(ADSI,'<LDAP://DC=Fabricam,DC=com>;(&(objectCategory=user)(objectClass=user)(memberOf=CN=MyGroup,OU=Security Groups,OU=Groups,DC=Fabricam,DC=com));cn,ADsPath;subtree')
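Each LDAP-dialect command in these queries has four semicolon-separated parts: <ADsPath>;filter;attributes;scope. The Python snippet below is a minimal sketch, not the author's code, that assembles such a command. It rejects an unknown scope and a filter with unbalanced parentheses, since a missing closing parenthesis is easy to overlook. The names `ldap_dialect` and `is_balanced` are hypothetical. The scope check accepts base, onelevel and subtree.

```python
from typing import Iterable

SCOPES = {"base", "onelevel", "subtree"}


def is_balanced(ldap_filter: str) -> bool:
    """True if every '(' in the filter has a matching ')'."""
    depth = 0
    for ch in ldap_filter:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False
    return depth == 0


def ldap_dialect(ldap_path: str, ldap_filter: str,
                 attributes: Iterable[str], scope: str = "subtree") -> str:
    """Compose '<ADsPath>;filter;attributes;scope' for OPENQUERY/OPENROWSET."""
    if scope.lower() not in SCOPES:
        raise ValueError(f"unknown scope {scope!r}")
    if not is_balanced(ldap_filter):
        raise ValueError(f"unbalanced parentheses in filter {ldap_filter!r}")
    return f"<{ldap_path}>;{ldap_filter};{','.join(attributes)};{scope}"


if __name__ == "__main__":
    print(ldap_dialect("LDAP://DC=Fabricam,DC=com", "(&(objectCategory=group))", ["cn", "ADsPath"]))
```

Note that memberOf is compared against a full distinguished name, which is why the group in the last query is written as CN=MyGroup,OU=...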

Access Active Directory using OPENROWSET statement

Accessing Active Directory with the OPENROWSET statement is similar to the Linked Server solution. The difference is that we do not have to create a linked server. Instead, we provide all the information about the provider, login etc. directly as parameters of the OPENROWSET statement.

Again, we can use the SQL dialect or the LDAP dialect, whichever suits your needs better. Here are some examples.

--Select all groups whose name starts with "D" (SQL Dialect)
SELECT
*
FROM OPENROWSET('ADSDSOObject', '','SELECT cn, ADsPath FROM ''LDAP://DC=Fabricam,DC=com'' WHERE objectCategory=''group'' AND cn=''D*''')


--Select all users (LDAP Dialect) providing also credentials and flags
SELECT
*
FROM OPENROWSET('ADSDSOObject', 'User ID=UserName;Password=UserPassword;ADSI Flag=0x11;','<LDAP://OU=Sales,DC=Fabricam,DC=com>;(&(objectCategory=user)(objectClass=user));cn,ADsPath;subtree')

In the second example, which uses the LDAP dialect, I have also used the “ADSI Flag” option. This option specifies the binding authentication options and can be a combination of values from ADS_AUTHENTICATION_ENUM.
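Because the flag is a bit mask, a value such as 0x11 is the sum of individual ADS_AUTHENTICATION_ENUM members. The Python snippet below is a minimal sketch that decodes such a value into member names and builds the provider string in the format used above. It contains only a subset of the enumeration (ADS_USE_SSL shares 0x2 with ADS_USE_ENCRYPTION and is left out). The helper names are hypothetical, and the snippet says nothing about which combination suits your environment.

```python
from enum import IntFlag
from typing import List


class AdsAuth(IntFlag):
    """Subset of ADS_AUTHENTICATION_ENUM (ADS_USE_SSL shares 0x2 and is omitted)."""
    ADS_SECURE_AUTHENTICATION = 0x1
    ADS_USE_ENCRYPTION = 0x2
    ADS_READONLY_SERVER = 0x4
    ADS_NO_AUTHENTICATION = 0x10
    ADS_FAST_BIND = 0x20
    ADS_USE_SIGNING = 0x40
    ADS_USE_SEALING = 0x80
    ADS_USE_DELEGATION = 0x100
    ADS_SERVER_BIND = 0x200


def decode_adsi_flag(flag: int) -> List[str]:
    """Names of the known flags set in an 'ADSI Flag' value."""
    return [member.name for member in AdsAuth if member & flag]


def provider_string(user: str, password: str, flags: int) -> str:
    """Build an OPENROWSET provider string like the one in the example above."""
    return f"User ID={user};Password={password};ADSI Flag={int(flags):#x};"


if __name__ == "__main__":
    print(decode_adsi_flag(0x11))
    # prints ['ADS_SECURE_AUTHENTICATION', 'ADS_NO_AUTHENTICATION']
```

The same values appear, under different names, in the @authType list of the CLR procedure above (Secure = 0x1, Anonymous = 0x10 and so on).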

Conclusion

The ADSI interface provides an easy and simple way to query Active Directory directly from SQL Server using T-SQL commands. The biggest limitation of this approach is that it cannot return more than 1000 records in a single batch. Retrieving more is possible with a paged search, but unfortunately that is not available in the T-SQL approach. We can partially work around the limit by adding conditions to the query, e.g. retrieving all users in batches by the first letter of their common name. The real power in querying AD comes with the CLR solution described in Querying Active Directory on SQL Server using CLR.
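The batching workaround can be generated rather than written by hand. The Python snippet below is a minimal sketch that produces one LDAP-dialect OPENQUERY statement per initial letter of cn. The outputs could then be run one by one or combined with UNION ALL. It assumes the ADSI linked server created earlier, and the function name is hypothetical. Users whose cn starts with a digit, punctuation or a non-ASCII letter are not covered. A letter that still matches more than 1000 users would need finer prefixes, such as Aa* and Ab*.

```python
import string
from typing import List


def batched_user_queries(ldap_path: str, linked_server: str = "ADSI") -> List[str]:
    """One OPENQUERY per initial A-Z letter of cn, so each batch stays smaller."""
    queries = []
    for letter in string.ascii_uppercase:
        command = (
            f"<{ldap_path}>;"
            f"(&(objectCategory=person)(objectClass=user)(cn={letter}*));"
            "cn,ADsPath;subtree"
        )
        escaped = command.replace("'", "''")  # double quotes for OPENQUERY
        queries.append(f"SELECT * FROM OPENQUERY({linked_server},'{escaped}')")
    return queries


if __name__ == "__main__":
    print(" UNION ALL\n".join(batched_user_queries("LDAP://OU=Sales,DC=Fabricam,DC=com")))
```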