SSIS 2008 Columns To XML Data Flow Transformation

This article represents a custom components for transformation of columns inside SSIS Data Flow into XML.

If you are writing ETL processes using SSIS, you many times comes to situation, when you need log errors or other problems which can occur during ETL process. An example can be a failure of data conversion during the ETL process. In that situation you mostly would like to know that record exactly caused that error and log such conversion errors into particular error table.

Problem occurs when you have a lot of tables and want to see what data were converted and caused that particular error. As SSIS requires that the sources and destinations needs to be strongly mapped and you have different sources with different structures, this leads to a situation where you have a lot of different error tables – generally one per the source structure at which an error can occur.

This situation can be easily solved by a custom Data Flow component which will convert all source columns into an XML. In that situation you can only have one Error Table, which will contain e.g. Error Code, Error Column, Description and an XML Field in which will be all the data of particular row which caused error encoded in XML Form. This rapidly simplifies error logging and also error log monitoring as you need to monitor only a single table.

So the SSIS Data Flow with the Custom Columns to XML Transformation could look like the one below.

Columns To Xml Data FLow

ColumnsToXML PropertiesColumnsToXML Input Columns

You can provide a SourceID and SourceName properties which are then part of the produced XML so you can easily identify the source. In the Input Columns you select columns you want to be written to the XML. You can also specify an OutpuAlias and the the column names written into the XML will be the one specified in the OutputAlias property.

The Log written using that component and the XML produced will look like the ones below.

ColumnsToXML Log

<row sourceID="1" sourceName="Test Data Conversion">
  <Column name="Name" id="235" lineageId="72">Joseph</Column>
  <Column name="ValueInt" id="236" lineageId="75">2</Column>
  <Column name="ValueDate" id="237" lineageId="78">2012/01/35</Column>
</row>

Using the .value method of XML data type we can easily querying necessary information from the log table for whatever source in the table. So for our sample scenario we would like to query the Name, ValueInt and ValueDate columns stored as XML. This can be easily achieved by below SQL query:

SELECT
    [RowID]
    ,[ErrorCode]
    ,[ErrorColumn]
    ,[ColumnsXMLData].value('data((/row/Column[@name="Name"])[1])', 'varchar(128)') AS Name
    ,[ColumnsXMLData].value('data((/row/Column[@name="ValueInt"])[1])', 'varchar(10)') AS ValueInt
    ,[ColumnsXMLData].value('data((/row/Column[@name="ValueDate"])[1])', 'varchar(10)') AS ValueDate
FROM [TestDB].[dbo].[ETLErrorLog]
WHERE [ColumnsXMLData].value('data((/row/@sourceID)[1])', 'int') = 1

As you can see from the samples above, you can create easily create single Error table for whatever ETL logging and simplify the logging rapidly as you#160; do not need to create the error tables with separate structures for each source.

ColumnsToXML Implementation

The ColumnsToXmlTransform is a custom implementation of the PipelineComponent. It is developed in C# for .NET framework 3.5 and uses Linq to XML for creating the XML output.

The component processes selected input columns, generates the XML and stores it in the output column called ColumnsXMLData.

Except the ColumsXMLDataSourceID and SourceName) to allow you to identify the source row source once it is written as XML into the database and simplifies querying for data coming from particular sources as showed above.

Once the component is built, it is necessary to install it into the Global Assembly Cache and to the PipelineComponents folder in the SQL Server Integration services installation path on the machine on which the SSIS using this component will be running.

To install it into a GAC you can use the gacutil.exe which is part of the Windows SDK.

Component is now available publicly with other components on GitLab as projects for SSIS 2008 – SSIS 2016.

If you run the development environment as with elevated Administrator privileges, the project contains Post Build events, which will install the assembly into the GAC (it assumes you have Windows SDK 7.0 installed – if you have newer version, modify the path to the gacutil.exe accordingly) and also copies the assembly into the codePipelineComponents/code folder of the default SQL Server Integration services installation. If you installation directory differ, please modify the paths in Post Build events accordingly.

Once you have compiled the code, installed in GAC and copied to the PipelineComponents folder, you can include the component to the data flow items palette. Right-click on the palette and select Choose Items

Choose Data Flow Transformation Items Pop-upChoose SSIS Data Flow Items

Once you select the component, it will appear on the toolbox and you can start using it.

Columns To XML On Palette

Once I will have more time, I will try to create an installer for the component. Anyway, any comments are welcome.

Advertisements

13 thoughts on “SSIS 2008 Columns To XML Data Flow Transformation

  1. Hi Pavel, I know it’s 2 years ago but I have a question. I’m trying to get this to work in SQL2012/VS2012 but whatever i try to do, the input.InputColumnCollection does not contain any incoming columns. Online documentation supports your code so I guess I’m doing something wrong. Any hint?

    Thanks in advance.
    Hans

  2. Hi Hans

    Did you manage to get this working with SQL2012/VS2012?

    I get the following errors linking in the SQL2012 DLLs, I’m assuming this component can’t be run in 64 bit?

    Warning 1 There was a mismatch between the processor architecture of the project being built “MSIL” and the processor architecture of the reference “Microsoft.SQLServer.DTSRuntimeWrap, Version=11.0.0.0, Culture=neutral, PublicKeyToken=89845dcd8080cc91, processorArchitecture=x86”, “x86”. This mismatch may cause runtime failures. Please consider changing the targeted processor architecture of your project through the Configuration Manager so as to align the processor architectures between your project and references, or take a dependency on references with a processor architecture that matches the targeted processor architecture of your project. PP.SSIS.DataFlow

    When I do compile it against x86, the component fails to load the InputColumnCollection and fails with

    TITLE: Microsoft Visual Studio
    ——————————

    Value of ‘null’ is not valid for ‘stream’.

    ——————————
    BUTTONS:

    OK
    ——————————

    Cheers
    Nick

    • It works without issues with SQL Server 2012 and 2014 with small modifications.

      When I will have a little more time, I will post solutions for both SQL 2012 and SQL 2014. Probably during weekend or start of next week.

      Anyway, you are buliding the solution, you have to build it against Any CPU. As the VS is x86 SSIS can run in both x86 and x64.

      • Thanks Pavel.

        I have been able to get it working under VS 2012. It seems there is a problem with VS2013.

        The component installs correctly and works fine under VS 2012, however fails to display in the toolbox under VS2013.

        Cheers
        Nick

  3. Could anyone provide the sample project for this as the URL in the post is no longert valid!?

    Thanks and regards,
    Matt.

        • Hre yo ucan find my complete solution for DataFlow components including ColumnsToXml, HashColumns etc. Those are quite enhanced by GUI etc.

          It contains versions for 2008, 2012 and 2014.
          PP.SSIS.DataFlow

          • Hi Pavel,

            I know this was a year ago now but this link is dead, i am too looking for 2012 version if possible. This looks great and is exactly what I need. I have no idea why this isn’t something they include as standard

            • Hi,

              I made it available on GitLab: PP.SSIS

              Project contains laso some other components including GUI for parameterization.

              It covers SSIS 2008-SSIS 2016

            • A great many thanks Pavel for making public on GitLab. Best component around for converting error rows to XML.

              Timely 2016 update – Columns to XML confirmed working in VS 2015 SSDT and on a SSIS 2016 RC3 deployment πŸ™‚

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s