Converting AS400 EBCDIC 870 into Win1250 code page on SQL Server 2005

I have to read data from AS400 machine using an ODBC driver under SQL Server 2005.  The data on the AS400 machine are encoded using EBCDIC 870 – Multilingual/ROECE (Latin-2) code page and contains Czech diacritics.

Unfortunately,SQL Server 2005 have no native support for EBCDIC 870 code page and when you read the data, you receive wrong characters.

There are some EBCDIC character sets supported by SQL server, but not the 870 one.

select 
    * 
from fn_helpcollations()

I was googling and binging for some time, but didn’t find any suitable solution, if you need to select live data from the AS400 machine.

Finally I was looking deeper into to the problem and found an acceptable work around.

The ODBC driver and AS400 support HEX() function, which returns sequence of hexadecimal codes representing the value.

SELECT 
    * 
FROM OPENQUERY(AS400, 'SELECT HEX(FieldName) AS FieldName FROM AS400Table')
Once we have an hexadecimal sequence, we can handle it on the SQL Server side in a way we want.

You can crate an CLR function which will allow you to convert the source CBCDIC 870 code page to the destination code page of you choice.

[SqlFunction()]
public static SqlString fnEBCDIC870ToSql(SqlString hexBytes, int destinationCodePage)
{
    if (hexBytes.IsNull) return hexBytes;
    string str = hexBytes.ToString();
    byte[] buffer = new byte[str.Length / 2];

    Encoding destinationEncoding = Encoding.GetEncoding(destinationCodePage);
    for (int i = 0; i < buffer.Length; i++)
    {
        buffer[i] = byte.Parse(str.Substring(i * 2, 2), NumberStyles.HexNumber);
    } 

    buffer = Encoding.Convert(Encoding.GetEncoding(870), destinationEncoding, buffer);
    str = destinationEncoding.GetString(buffer);
    return new SqlString(str);
}

If you cannot use the CLR function on the SQL server or Database, you can simply create an Conversion table and conversion scalar function for this conversion.

for conversion to code page 1250:

CREATE FUNCTION [dbo].[fnEBCDIC870ToWin1250] ( 
    @pHex varchar(8000) 
) 
RETURNS varchar(4000) 
AS 
BEGIN 
    DECLARE @result varchar(4000) 

    SET @result = '';

    WITH Tally AS (
        SELECT
            ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) As RowNo
        FROM sys.all_columns
    )
    SELECT
        @result = @result + ISNULL(C.Character, '?') 
    FROM Tally
    LEFT JOIN CodePage870 C ON UPPER(SUBSTRING(@pHex, (RowNo - 1) * 2 + 1, 2)) = C.HexByte
    WHERE RowNo <= LEN(@pHex) / 2
    ORDER BY RowNo

    RETURN @result
END

The function uses a conversion table CodePage870.

CREATE TABLE CodePage870 (
    HexByte char(2) NOT NULL,
    Character char(1) NOT NULL,
    CONSTRAINT PK_CodePage870 PRIMARY KEY (HexByte)
)
INSERT CodePage870 (
    HexByte,
    Character
)
SELECT '40' As HexByte, ' ' As Character UNION ALL
SELECT '41' As HexByte, ' ' As Character UNION ALL
SELECT '42' As HexByte, 'â' As Character UNION ALL
SELECT '43' As HexByte, 'ä' As Character UNION ALL
SELECT '44' As HexByte, 'ţ' As Character UNION ALL
SELECT '45' As HexByte, 'á' As Character UNION ALL
SELECT '46' As HexByte, 'ă' As Character UNION ALL
SELECT '47' As HexByte, 'č' As Character UNION ALL
SELECT '48' As HexByte, 'ç' As Character UNION ALL
SELECT '49' As HexByte, 'ć' As Character UNION ALL
SELECT '4A' As HexByte, '[' As Character UNION ALL
SELECT '4B' As HexByte, '.' As Character UNION ALL
SELECT '4C' As HexByte, '<' As Character UNION ALL
SELECT '4D' As HexByte, '(' As Character UNION ALL
SELECT '4E' As HexByte, '+' As Character UNION ALL
SELECT '4F' As HexByte, '!' As Character UNION ALL
SELECT '50' As HexByte, '&' As Character UNION ALL
SELECT '51' As HexByte, 'é' As Character UNION ALL
SELECT '52' As HexByte, 'ę' As Character UNION ALL
SELECT '53' As HexByte, 'ë' As Character UNION ALL
SELECT '54' As HexByte, 'ů' As Character UNION ALL
SELECT '55' As HexByte, 'í' As Character UNION ALL
SELECT '56' As HexByte, 'î' As Character UNION ALL
SELECT '57' As HexByte, 'ľ' As Character UNION ALL
SELECT '58' As HexByte, 'ĺ' As Character UNION ALL
SELECT '59' As HexByte, 'ß' As Character UNION ALL
SELECT '5A' As HexByte, ']' As Character UNION ALL
SELECT '5B' As HexByte, '$' As Character UNION ALL
SELECT '5C' As HexByte, '*' As Character UNION ALL
SELECT '5D' As HexByte, ')' As Character UNION ALL
SELECT '5E' As HexByte, ';' As Character UNION ALL
SELECT '5F' As HexByte, '^' As Character UNION ALL
SELECT '60' As HexByte, '-' As Character UNION ALL
SELECT '61' As HexByte, '/' As Character UNION ALL
SELECT '62' As HexByte, 'Â' As Character UNION ALL
SELECT '63' As HexByte, 'Ä' As Character UNION ALL
SELECT '64' As HexByte, '˝' As Character UNION ALL
SELECT '65' As HexByte, 'Á' As Character UNION ALL
SELECT '66' As HexByte, 'Ă' As Character UNION ALL
SELECT '67' As HexByte, 'Č' As Character UNION ALL
SELECT '68' As HexByte, 'Ç' As Character UNION ALL
SELECT '69' As HexByte, 'Ć' As Character UNION ALL
SELECT '6A' As HexByte, '|' As Character UNION ALL
SELECT '6B' As HexByte, ',' As Character UNION ALL
SELECT '6C' As HexByte, '%' As Character UNION ALL
SELECT '6D' As HexByte, '_' As Character UNION ALL
SELECT '6E' As HexByte, '>' As Character UNION ALL
SELECT '6F' As HexByte, '?' As Character UNION ALL
SELECT '70' As HexByte, 'ˇ' As Character UNION ALL
SELECT '71' As HexByte, 'É' As Character UNION ALL
SELECT '72' As HexByte, 'Ę' As Character UNION ALL
SELECT '73' As HexByte, 'Ë' As Character UNION ALL
SELECT '74' As HexByte, 'Ů' As Character UNION ALL
SELECT '75' As HexByte, 'Í' As Character UNION ALL
SELECT '76' As HexByte, 'Î' As Character UNION ALL
SELECT '77' As HexByte, 'Ľ' As Character UNION ALL
SELECT '78' As HexByte, 'Ĺ' As Character UNION ALL
SELECT '79' As HexByte, '`' As Character UNION ALL
SELECT '7A' As HexByte, ':' As Character UNION ALL
SELECT '7B' As HexByte, '#' As Character UNION ALL
SELECT '7C' As HexByte, '@' As Character UNION ALL
SELECT '7D' As HexByte, '''' As Character UNION ALL
SELECT '7E' As HexByte, '=' As Character UNION ALL
SELECT '7F' As HexByte, '"' As Character UNION ALL
SELECT '80' As HexByte, '˘' As Character UNION ALL
SELECT '81' As HexByte, 'a' As Character UNION ALL
SELECT '82' As HexByte, 'b' As Character UNION ALL
SELECT '83' As HexByte, 'c' As Character UNION ALL
SELECT '84' As HexByte, 'd' As Character UNION ALL
SELECT '85' As HexByte, 'e' As Character UNION ALL
SELECT '86' As HexByte, 'f' As Character UNION ALL
SELECT '87' As HexByte, 'g' As Character UNION ALL
SELECT '88' As HexByte, 'h' As Character UNION ALL
SELECT '89' As HexByte, 'i' As Character UNION ALL
SELECT '8A' As HexByte, 'ś' As Character UNION ALL
SELECT '8B' As HexByte, 'ň' As Character UNION ALL
SELECT '8C' As HexByte, 'đ' As Character UNION ALL
SELECT '8D' As HexByte, 'ý' As Character UNION ALL
SELECT '8E' As HexByte, 'ř' As Character UNION ALL
SELECT '8F' As HexByte, 'ş' As Character UNION ALL
SELECT '90' As HexByte, '°' As Character UNION ALL
SELECT '91' As HexByte, 'j' As Character UNION ALL
SELECT '92' As HexByte, 'k' As Character UNION ALL
SELECT '93' As HexByte, 'l' As Character UNION ALL
SELECT '94' As HexByte, 'm' As Character UNION ALL
SELECT '95' As HexByte, 'n' As Character UNION ALL
SELECT '96' As HexByte, 'o' As Character UNION ALL
SELECT '97' As HexByte, 'p' As Character UNION ALL
SELECT '98' As HexByte, 'q' As Character UNION ALL
SELECT '99' As HexByte, 'r' As Character UNION ALL
SELECT '9A' As HexByte, 'ł' As Character UNION ALL
SELECT '9B' As HexByte, 'ń' As Character UNION ALL
SELECT '9C' As HexByte, 'š' As Character UNION ALL
SELECT '9D' As HexByte, '¸' As Character UNION ALL
SELECT '9E' As HexByte, '˛' As Character UNION ALL
SELECT '9F' As HexByte, '¤' As Character UNION ALL
SELECT 'A0' As HexByte, 'ą' As Character UNION ALL
SELECT 'A1' As HexByte, '~' As Character UNION ALL
SELECT 'A2' As HexByte, 's' As Character UNION ALL
SELECT 'A3' As HexByte, 't' As Character UNION ALL
SELECT 'A4' As HexByte, 'u' As Character UNION ALL
SELECT 'A5' As HexByte, 'v' As Character UNION ALL
SELECT 'A6' As HexByte, 'w' As Character UNION ALL
SELECT 'A7' As HexByte, 'x' As Character UNION ALL
SELECT 'A8' As HexByte, 'y' As Character UNION ALL
SELECT 'A9' As HexByte, 'z' As Character UNION ALL
SELECT 'AA' As HexByte, 'Ś' As Character UNION ALL
SELECT 'AB' As HexByte, 'Ň' As Character UNION ALL
SELECT 'AC' As HexByte, 'Đ' As Character UNION ALL
SELECT 'AD' As HexByte, 'Ý' As Character UNION ALL
SELECT 'AE' As HexByte, 'Ř' As Character UNION ALL
SELECT 'AF' As HexByte, 'Ş' As Character UNION ALL
SELECT 'B0' As HexByte, '˙' As Character UNION ALL
SELECT 'B1' As HexByte, 'Ą' As Character UNION ALL
SELECT 'B2' As HexByte, 'ż' As Character UNION ALL
SELECT 'B3' As HexByte, 'Ţ' As Character UNION ALL
SELECT 'B4' As HexByte, 'Ż' As Character UNION ALL
SELECT 'B5' As HexByte, '§' As Character UNION ALL
SELECT 'B6' As HexByte, 'ž' As Character UNION ALL
SELECT 'B7' As HexByte, 'ź' As Character UNION ALL
SELECT 'B8' As HexByte, 'Ž' As Character UNION ALL
SELECT 'B9' As HexByte, 'Ź' As Character UNION ALL
SELECT 'BA' As HexByte, 'Ł' As Character UNION ALL
SELECT 'BB' As HexByte, 'Ń' As Character UNION ALL
SELECT 'BC' As HexByte, 'Š' As Character UNION ALL
SELECT 'BD' As HexByte, '¨' As Character UNION ALL
SELECT 'BE' As HexByte, '´' As Character UNION ALL
SELECT 'BF' As HexByte, '×' As Character UNION ALL
SELECT 'C0' As HexByte, '{' As Character UNION ALL
SELECT 'C1' As HexByte, 'A' As Character UNION ALL
SELECT 'C2' As HexByte, 'B' As Character UNION ALL
SELECT 'C3' As HexByte, 'C' As Character UNION ALL
SELECT 'C4' As HexByte, 'D' As Character UNION ALL
SELECT 'C5' As HexByte, 'E' As Character UNION ALL
SELECT 'C6' As HexByte, 'F' As Character UNION ALL
SELECT 'C7' As HexByte, 'G' As Character UNION ALL
SELECT 'C8' As HexByte, 'H' As Character UNION ALL
SELECT 'C9' As HexByte, 'I' As Character UNION ALL
SELECT 'CA' As HexByte, '­' As Character UNION ALL
SELECT 'CB' As HexByte, 'ô' As Character UNION ALL
SELECT 'CC' As HexByte, 'ö' As Character UNION ALL
SELECT 'CD' As HexByte, 'ŕ' As Character UNION ALL
SELECT 'CE' As HexByte, 'ó' As Character UNION ALL
SELECT 'CF' As HexByte, 'ő' As Character UNION ALL
SELECT 'D0' As HexByte, '}' As Character UNION ALL
SELECT 'D1' As HexByte, 'J' As Character UNION ALL
SELECT 'D2' As HexByte, 'K' As Character UNION ALL
SELECT 'D3' As HexByte, 'L' As Character UNION ALL
SELECT 'D4' As HexByte, 'M' As Character UNION ALL
SELECT 'D5' As HexByte, 'N' As Character UNION ALL
SELECT 'D6' As HexByte, 'O' As Character UNION ALL
SELECT 'D7' As HexByte, 'P' As Character UNION ALL
SELECT 'D8' As HexByte, 'Q' As Character UNION ALL
SELECT 'D9' As HexByte, 'R' As Character UNION ALL
SELECT 'DA' As HexByte, 'Ě' As Character UNION ALL
SELECT 'DB' As HexByte, 'ű' As Character UNION ALL
SELECT 'DC' As HexByte, 'ü' As Character UNION ALL
SELECT 'DD' As HexByte, 'ť' As Character UNION ALL
SELECT 'DE' As HexByte, 'ú' As Character UNION ALL
SELECT 'DF' As HexByte, 'ě' As Character UNION ALL
SELECT 'E0' As HexByte, '' As Character UNION ALL
SELECT 'E1' As HexByte, '÷' As Character UNION ALL
SELECT 'E2' As HexByte, 'S' As Character UNION ALL
SELECT 'E3' As HexByte, 'T' As Character UNION ALL
SELECT 'E4' As HexByte, 'U' As Character UNION ALL
SELECT 'E5' As HexByte, 'V' As Character UNION ALL
SELECT 'E6' As HexByte, 'W' As Character UNION ALL
SELECT 'E7' As HexByte, 'X' As Character UNION ALL
SELECT 'E8' As HexByte, 'Y' As Character UNION ALL
SELECT 'E9' As HexByte, 'Z' As Character UNION ALL
SELECT 'EA' As HexByte, 'ď' As Character UNION ALL
SELECT 'EB' As HexByte, 'Ô' As Character UNION ALL
SELECT 'EC' As HexByte, 'Ö' As Character UNION ALL
SELECT 'ED' As HexByte, 'Ŕ' As Character UNION ALL
SELECT 'EE' As HexByte, 'Ó' As Character UNION ALL
SELECT 'EF' As HexByte, 'Ő' As Character UNION ALL
SELECT 'F0' As HexByte, '0' As Character UNION ALL
SELECT 'F1' As HexByte, '1' As Character UNION ALL
SELECT 'F2' As HexByte, '2' As Character UNION ALL
SELECT 'F3' As HexByte, '3' As Character UNION ALL
SELECT 'F4' As HexByte, '4' As Character UNION ALL
SELECT 'F5' As HexByte, '5' As Character UNION ALL
SELECT 'F6' As HexByte, '6' As Character UNION ALL
SELECT 'F7' As HexByte, '7' As Character UNION ALL
SELECT 'F8' As HexByte, '8' As Character UNION ALL
SELECT 'F9' As HexByte, '9' As Character UNION ALL
SELECT 'FA' As HexByte, 'Ď' As Character UNION ALL
SELECT 'FB' As HexByte, 'Ű' As Character UNION ALL
SELECT 'FC' As HexByte, 'Ü' As Character UNION ALL
SELECT 'FD' As HexByte, 'Ť' As Character UNION ALL
SELECT 'FE' As HexByte, 'Ú' As Character UNION ALL
SELECT 'FF' As HexByte, '?' As Character

The Script was generated by simple C# method.

public static void CreateEBCDICScript(string destinationFileName, int destinationCodePage)
{
    StringBuilder sb = new StringBuilder();
    byte[] buffer = new byte[1];
    Encoding ebcdic = Encoding.GetEncoding(870);
    Encoding destinationEncoding = Encoding.GetEncoding(destinationCodePage);    

    for (int i = 64; i < 256; i++)
    {
        buffer[0] = (byte)i;
        buffer = Encoding.Convert(ebcdic, destinationEncoding, buffer);
        string target = destinationEncoding.GetString(buffer);
        sb.AppendFormat("SELECT '{0:X}' AS HexByte, '{1}' AS Character UNION ALL", i, target.Equals("'") ? "''" : target);
        sb.AppendLine();
    }    

    File.WriteAllText(destinationFileName, sb.ToString();
}
You can use it to generate script for whatever encoding you want.

Finally as you have the conversion functions in database you can use below selects to receive correctly encoded strings from within the AS400 machine.

--CLR Version 
SELECT 
    fnECDIC870ToSql(FieldName, 1250) As FieldName 
FROM OPENQUERY(AS400, 'SELECT HEX(FieldName) AS FieldName FROM AS400Table')

--T-SQL Version 
SELECT 
    fnECDIC870ToWin1250(FieldName) As FieldName 
FROM OPENQUERY(AS400, 'SELECT HEX(FieldName) AS FieldName FROM AS400Table')
Advertisements