This article represents a custom components for transformation of columns inside SSIS Data Flow into XML.
If you are writing ETL processes using SSIS, you many times comes to situation, when you need log errors or other problems which can occur during ETL process. An example can be a failure of data conversion during the ETL process. In that situation you mostly would like to know that record exactly caused that error and log such conversion errors into particular error table.
Problem occurs when you have a lot of tables and want to see what data were converted and caused that particular error. As SSIS requires that the sources and destinations needs to be strongly mapped and you have different sources with different structures, this leads to a situation where you have a lot of different error tables – generally one per the source structure at which an error can occur.
This situation can be easily solved by a custom Data Flow component which will convert all source columns into an XML. In that situation you can only have one Error Table, which will contain e.g. Error Code, Error Column, Description and an XML Field in which will be all the data of particular row which caused error encoded in XML Form. This rapidly simplifies error logging and also error log monitoring as you need to monitor only a single table.
So the SSIS Data Flow with the Custom Columns to XML Transformation could look like the one below.
You can provide a SourceID
and SourceName
properties which are then part of the produced XML so you can easily identify the source. In the Input Columns you select columns you want to be written to the XML. You can also specify an OutpuAlias
and the the column names written into the XML will be the one specified in the OutputAlias
property.
The Log written using that component and the XML produced will look like the ones below.
<row sourceID="1" sourceName="Test Data Conversion"> <Column name="Name" id="235" lineageId="72">Joseph</Column> <Column name="ValueInt" id="236" lineageId="75">2</Column> <Column name="ValueDate" id="237" lineageId="78">2012/01/35</Column> </row>
Using the .value
method of XML data type we can easily querying necessary information from the log table for whatever source in the table. So for our sample scenario we would like to query the Name
, ValueInt
and ValueDate
columns stored as XML. This can be easily achieved by below SQL query:
SELECT [RowID] ,[ErrorCode] ,[ErrorColumn] ,[ColumnsXMLData].value('data((/row/Column[@name="Name"])[1])', 'varchar(128)') AS Name ,[ColumnsXMLData].value('data((/row/Column[@name="ValueInt"])[1])', 'varchar(10)') AS ValueInt ,[ColumnsXMLData].value('data((/row/Column[@name="ValueDate"])[1])', 'varchar(10)') AS ValueDate FROM [TestDB].[dbo].[ETLErrorLog] WHERE [ColumnsXMLData].value('data((/row/@sourceID)[1])', 'int') = 1
As you can see from the samples above, you can create easily create single Error table for whatever ETL logging and simplify the logging rapidly as you#160; do not need to create the error tables with separate structures for each source.
ColumnsToXML Implementation
The ColumnsToXmlTransform
is a custom implementation of the PipelineComponent
. It is developed in C# for .NET framework 3.5 and uses Linq to XML for creating the XML output.
The component processes selected input columns, generates the XML and stores it in the output column called ColumnsXMLData
.
Except the ColumsXMLDataSourceID
and SourceName
) to allow you to identify the source row source once it is written as XML into the database and simplifies querying for data coming from particular sources as showed above.
Once the component is built, it is necessary to install it into the Global Assembly Cache and to the PipelineComponents
folder in the SQL Server Integration services installation path on the machine on which the SSIS using this component will be running.
To install it into a GAC you can use the gacutil.exe
which is part of the Windows SDK.
Component is now available publicly with other components on GitHub as projects for SSIS 2008 – SSIS 2016.
If you run the development environment as with elevated Administrator privileges, the project contains Post Build events, which will install the assembly into the GAC (it assumes you have Windows SDK 7.0 installed – if you have newer version, modify the path to the gacutil.exe accordingly) and also copies the assembly into the codePipelineComponents/code folder of the default SQL Server Integration services installation. If you installation directory differ, please modify the paths in Post Build events accordingly.
Once you have compiled the code, installed in GAC and copied to the PipelineComponents
folder, you can include the component to the data flow items palette. Right-click on the palette and select Choose Items
Once you select the component, it will appear on the toolbox and you can start using it.
Once I will have more time, I will try to create an installer for the component. Anyway, any comments are welcome.