Reading SharePoint Surveys and Lists using a SQL Server Integration Services Package – Updated

On www.mssqltips.com (Reading SharePoint Surveys and Lists using a SQL Server Integration Services SSIS Package) you can find an updated version of my original post Reading SharePoint List and Survey data using Integration Services package.

The updated version on mssqltips describes an easier way of reading the Surveys and Lists data with simplified use of URL protocol an more elegant way of final data extraction using the T-SQL XML methods.

Post also contains complete demo solution.

Advertisements

SSIS 2008 Columns To XML Data Flow Transformation

This article represents a custom components for transformation of columns inside SSIS Data Flow into XML.

If you are writing ETL processes using SSIS, you many times comes to situation, when you need log errors or other problems which can occur during ETL process. An example can be a failure of data conversion during the ETL process. In that situation you mostly would like to know that record exactly caused that error and log such conversion errors into particular error table.

Problem occurs when you have a lot of tables and want to see what data were converted and caused that particular error. As SSIS requires that the sources and destinations needs to be strongly mapped and you have different sources with different structures, this leads to a situation where you have a lot of different error tables – generally one per the source structure at which an error can occur.

This situation can be easily solved by a custom Data Flow component which will convert all source columns into an XML. In that situation you can only have one Error Table, which will contain e.g. Error Code, Error Column, Description and an XML Field in which will be all the data of particular row which caused error encoded in XML Form. This rapidly simplifies error logging and also error log monitoring as you need to monitor only a single table.

So the SSIS Data Flow with the Custom Columns to XML Transformation could look like the one below.

Columns To Xml Data FLow

ColumnsToXML PropertiesColumnsToXML Input Columns

You can provide a SourceID and SourceName properties which are then part of the produced XML so you can easily identify the source. In the Input Columns you select columns you want to be written to the XML. You can also specify an OutpuAlias and the the column names written into the XML will be the one specified in the OutputAlias property.

The Log written using that component and the XML produced will look like the ones below.

ColumnsToXML Log

<row sourceID="1" sourceName="Test Data Conversion">
  <Column name="Name" id="235" lineageId="72">Joseph</Column>
  <Column name="ValueInt" id="236" lineageId="75">2</Column>
  <Column name="ValueDate" id="237" lineageId="78">2012/01/35</Column>
</row>

Using the .value method of XML data type we can easily querying necessary information from the log table for whatever source in the table. So for our sample scenario we would like to query the Name, ValueInt and ValueDate columns stored as XML. This can be easily achieved by below SQL query:

SELECT
    [RowID]
    ,[ErrorCode]
    ,[ErrorColumn]
    ,[ColumnsXMLData].value('data((/row/Column[@name="Name"])[1])', 'varchar(128)') AS Name
    ,[ColumnsXMLData].value('data((/row/Column[@name="ValueInt"])[1])', 'varchar(10)') AS ValueInt
    ,[ColumnsXMLData].value('data((/row/Column[@name="ValueDate"])[1])', 'varchar(10)') AS ValueDate
FROM [TestDB].[dbo].[ETLErrorLog]
WHERE [ColumnsXMLData].value('data((/row/@sourceID)[1])', 'int') = 1

As you can see from the samples above, you can create easily create single Error table for whatever ETL logging and simplify the logging rapidly as you#160; do not need to create the error tables with separate structures for each source.

ColumnsToXML Implementation

The ColumnsToXmlTransform is a custom implementation of the PipelineComponent. It is developed in C# for .NET framework 3.5 and uses Linq to XML for creating the XML output.

The component processes selected input columns, generates the XML and stores it in the output column called ColumnsXMLData.

Except the ColumsXMLDataSourceID and SourceName) to allow you to identify the source row source once it is written as XML into the database and simplifies querying for data coming from particular sources as showed above.

Once the component is built, it is necessary to install it into the Global Assembly Cache and to the PipelineComponents folder in the SQL Server Integration services installation path on the machine on which the SSIS using this component will be running.

To install it into a GAC you can use the gacutil.exe which is part of the Windows SDK.

Component is now available publicly with other components on GitHub as projects for SSIS 2008 – SSIS 2016.

If you run the development environment as with elevated Administrator privileges, the project contains Post Build events, which will install the assembly into the GAC (it assumes you have Windows SDK 7.0 installed – if you have newer version, modify the path to the gacutil.exe accordingly) and also copies the assembly into the codePipelineComponents/code folder of the default SQL Server Integration services installation. If you installation directory differ, please modify the paths in Post Build events accordingly.

Once you have compiled the code, installed in GAC and copied to the PipelineComponents folder, you can include the component to the data flow items palette. Right-click on the palette and select Choose Items

Choose Data Flow Transformation Items Pop-upChoose SSIS Data Flow Items

Once you select the component, it will appear on the toolbox and you can start using it.

Columns To XML On Palette

Once I will have more time, I will try to create an installer for the component. Anyway, any comments are welcome.