Converting AS400 EBCDIC 870 into Win1250 code page on SQL Server 2005

I have to read data from AS400 machine using an ODBC driver under SQL Server 2005.  The data on the AS400 machine are encoded using EBCDIC 870 – Multilingual/ROECE (Latin-2) code page and contains Czech diacritics.

Unfortunately,SQL Server 2005 have no native support for EBCDIC 870 code page and when you read the data, you receive wrong characters.

There are some EBCDIC character sets supported by SQL server, but not the 870 one.

select 
    * 
from fn_helpcollations()

I was googling and binging for some time, but didn’t find any suitable solution, if you need to select live data from the AS400 machine.

Finally I was looking deeper into to the problem and found an acceptable work around.

The ODBC driver and AS400 support HEX() function, which returns sequence of hexadecimal codes representing the value.

SELECT 
    * 
FROM OPENQUERY(AS400, 'SELECT HEX(FieldName) AS FieldName FROM AS400Table')
Once we have an hexadecimal sequence, we can handle it on the SQL Server side in a way we want.

You can crate an CLR function which will allow you to convert the source CBCDIC 870 code page to the destination code page of you choice.

[SqlFunction()]
public static SqlString fnEBCDIC870ToSql(SqlString hexBytes, int destinationCodePage)
{
    if (hexBytes.IsNull) return hexBytes;
    string str = hexBytes.ToString();
    byte[] buffer = new byte[str.Length / 2];

    Encoding destinationEncoding = Encoding.GetEncoding(destinationCodePage);
    for (int i = 0; i < buffer.Length; i++)
    {
        buffer[i] = byte.Parse(str.Substring(i * 2, 2), NumberStyles.HexNumber);
    } 

    buffer = Encoding.Convert(Encoding.GetEncoding(870), destinationEncoding, buffer);
    str = destinationEncoding.GetString(buffer);
    return new SqlString(str);
}

If you cannot use the CLR function on the SQL server or Database, you can simply create an Conversion table and conversion scalar function for this conversion.

for conversion to code page 1250:

CREATE FUNCTION [dbo].[fnEBCDIC870ToWin1250] ( 
    @pHex varchar(8000) 
) 
RETURNS varchar(4000) 
AS 
BEGIN 
    DECLARE @result varchar(4000) 

    SET @result = '';

    WITH Tally AS (
        SELECT
            ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) As RowNo
        FROM sys.all_columns
    )
    SELECT
        @result = @result + ISNULL(C.Character, '?') 
    FROM Tally
    LEFT JOIN CodePage870 C ON UPPER(SUBSTRING(@pHex, (RowNo - 1) * 2 + 1, 2)) = C.HexByte
    WHERE RowNo <= LEN(@pHex) / 2
    ORDER BY RowNo

    RETURN @result
END

The function uses a conversion table CodePage870.

CREATE TABLE CodePage870 (
    HexByte char(2) NOT NULL,
    Character char(1) NOT NULL,
    CONSTRAINT PK_CodePage870 PRIMARY KEY (HexByte)
)
INSERT CodePage870 (
    HexByte,
    Character
)
SELECT '40' As HexByte, ' ' As Character UNION ALL
SELECT '41' As HexByte, ' ' As Character UNION ALL
SELECT '42' As HexByte, 'â' As Character UNION ALL
SELECT '43' As HexByte, 'ä' As Character UNION ALL
SELECT '44' As HexByte, 'ţ' As Character UNION ALL
SELECT '45' As HexByte, 'á' As Character UNION ALL
SELECT '46' As HexByte, 'ă' As Character UNION ALL
SELECT '47' As HexByte, 'č' As Character UNION ALL
SELECT '48' As HexByte, 'ç' As Character UNION ALL
SELECT '49' As HexByte, 'ć' As Character UNION ALL
SELECT '4A' As HexByte, '[' As Character UNION ALL
SELECT '4B' As HexByte, '.' As Character UNION ALL
SELECT '4C' As HexByte, '<' As Character UNION ALL
SELECT '4D' As HexByte, '(' As Character UNION ALL
SELECT '4E' As HexByte, '+' As Character UNION ALL
SELECT '4F' As HexByte, '!' As Character UNION ALL
SELECT '50' As HexByte, '&' As Character UNION ALL
SELECT '51' As HexByte, 'é' As Character UNION ALL
SELECT '52' As HexByte, 'ę' As Character UNION ALL
SELECT '53' As HexByte, 'ë' As Character UNION ALL
SELECT '54' As HexByte, 'ů' As Character UNION ALL
SELECT '55' As HexByte, 'í' As Character UNION ALL
SELECT '56' As HexByte, 'î' As Character UNION ALL
SELECT '57' As HexByte, 'ľ' As Character UNION ALL
SELECT '58' As HexByte, 'ĺ' As Character UNION ALL
SELECT '59' As HexByte, 'ß' As Character UNION ALL
SELECT '5A' As HexByte, ']' As Character UNION ALL
SELECT '5B' As HexByte, '$' As Character UNION ALL
SELECT '5C' As HexByte, '*' As Character UNION ALL
SELECT '5D' As HexByte, ')' As Character UNION ALL
SELECT '5E' As HexByte, ';' As Character UNION ALL
SELECT '5F' As HexByte, '^' As Character UNION ALL
SELECT '60' As HexByte, '-' As Character UNION ALL
SELECT '61' As HexByte, '/' As Character UNION ALL
SELECT '62' As HexByte, 'Â' As Character UNION ALL
SELECT '63' As HexByte, 'Ä' As Character UNION ALL
SELECT '64' As HexByte, '˝' As Character UNION ALL
SELECT '65' As HexByte, 'Á' As Character UNION ALL
SELECT '66' As HexByte, 'Ă' As Character UNION ALL
SELECT '67' As HexByte, 'Č' As Character UNION ALL
SELECT '68' As HexByte, 'Ç' As Character UNION ALL
SELECT '69' As HexByte, 'Ć' As Character UNION ALL
SELECT '6A' As HexByte, '|' As Character UNION ALL
SELECT '6B' As HexByte, ',' As Character UNION ALL
SELECT '6C' As HexByte, '%' As Character UNION ALL
SELECT '6D' As HexByte, '_' As Character UNION ALL
SELECT '6E' As HexByte, '>' As Character UNION ALL
SELECT '6F' As HexByte, '?' As Character UNION ALL
SELECT '70' As HexByte, 'ˇ' As Character UNION ALL
SELECT '71' As HexByte, 'É' As Character UNION ALL
SELECT '72' As HexByte, 'Ę' As Character UNION ALL
SELECT '73' As HexByte, 'Ë' As Character UNION ALL
SELECT '74' As HexByte, 'Ů' As Character UNION ALL
SELECT '75' As HexByte, 'Í' As Character UNION ALL
SELECT '76' As HexByte, 'Î' As Character UNION ALL
SELECT '77' As HexByte, 'Ľ' As Character UNION ALL
SELECT '78' As HexByte, 'Ĺ' As Character UNION ALL
SELECT '79' As HexByte, '`' As Character UNION ALL
SELECT '7A' As HexByte, ':' As Character UNION ALL
SELECT '7B' As HexByte, '#' As Character UNION ALL
SELECT '7C' As HexByte, '@' As Character UNION ALL
SELECT '7D' As HexByte, '''' As Character UNION ALL
SELECT '7E' As HexByte, '=' As Character UNION ALL
SELECT '7F' As HexByte, '"' As Character UNION ALL
SELECT '80' As HexByte, '˘' As Character UNION ALL
SELECT '81' As HexByte, 'a' As Character UNION ALL
SELECT '82' As HexByte, 'b' As Character UNION ALL
SELECT '83' As HexByte, 'c' As Character UNION ALL
SELECT '84' As HexByte, 'd' As Character UNION ALL
SELECT '85' As HexByte, 'e' As Character UNION ALL
SELECT '86' As HexByte, 'f' As Character UNION ALL
SELECT '87' As HexByte, 'g' As Character UNION ALL
SELECT '88' As HexByte, 'h' As Character UNION ALL
SELECT '89' As HexByte, 'i' As Character UNION ALL
SELECT '8A' As HexByte, 'ś' As Character UNION ALL
SELECT '8B' As HexByte, 'ň' As Character UNION ALL
SELECT '8C' As HexByte, 'đ' As Character UNION ALL
SELECT '8D' As HexByte, 'ý' As Character UNION ALL
SELECT '8E' As HexByte, 'ř' As Character UNION ALL
SELECT '8F' As HexByte, 'ş' As Character UNION ALL
SELECT '90' As HexByte, '°' As Character UNION ALL
SELECT '91' As HexByte, 'j' As Character UNION ALL
SELECT '92' As HexByte, 'k' As Character UNION ALL
SELECT '93' As HexByte, 'l' As Character UNION ALL
SELECT '94' As HexByte, 'm' As Character UNION ALL
SELECT '95' As HexByte, 'n' As Character UNION ALL
SELECT '96' As HexByte, 'o' As Character UNION ALL
SELECT '97' As HexByte, 'p' As Character UNION ALL
SELECT '98' As HexByte, 'q' As Character UNION ALL
SELECT '99' As HexByte, 'r' As Character UNION ALL
SELECT '9A' As HexByte, 'ł' As Character UNION ALL
SELECT '9B' As HexByte, 'ń' As Character UNION ALL
SELECT '9C' As HexByte, 'š' As Character UNION ALL
SELECT '9D' As HexByte, '¸' As Character UNION ALL
SELECT '9E' As HexByte, '˛' As Character UNION ALL
SELECT '9F' As HexByte, '¤' As Character UNION ALL
SELECT 'A0' As HexByte, 'ą' As Character UNION ALL
SELECT 'A1' As HexByte, '~' As Character UNION ALL
SELECT 'A2' As HexByte, 's' As Character UNION ALL
SELECT 'A3' As HexByte, 't' As Character UNION ALL
SELECT 'A4' As HexByte, 'u' As Character UNION ALL
SELECT 'A5' As HexByte, 'v' As Character UNION ALL
SELECT 'A6' As HexByte, 'w' As Character UNION ALL
SELECT 'A7' As HexByte, 'x' As Character UNION ALL
SELECT 'A8' As HexByte, 'y' As Character UNION ALL
SELECT 'A9' As HexByte, 'z' As Character UNION ALL
SELECT 'AA' As HexByte, 'Ś' As Character UNION ALL
SELECT 'AB' As HexByte, 'Ň' As Character UNION ALL
SELECT 'AC' As HexByte, 'Đ' As Character UNION ALL
SELECT 'AD' As HexByte, 'Ý' As Character UNION ALL
SELECT 'AE' As HexByte, 'Ř' As Character UNION ALL
SELECT 'AF' As HexByte, 'Ş' As Character UNION ALL
SELECT 'B0' As HexByte, '˙' As Character UNION ALL
SELECT 'B1' As HexByte, 'Ą' As Character UNION ALL
SELECT 'B2' As HexByte, 'ż' As Character UNION ALL
SELECT 'B3' As HexByte, 'Ţ' As Character UNION ALL
SELECT 'B4' As HexByte, 'Ż' As Character UNION ALL
SELECT 'B5' As HexByte, '§' As Character UNION ALL
SELECT 'B6' As HexByte, 'ž' As Character UNION ALL
SELECT 'B7' As HexByte, 'ź' As Character UNION ALL
SELECT 'B8' As HexByte, 'Ž' As Character UNION ALL
SELECT 'B9' As HexByte, 'Ź' As Character UNION ALL
SELECT 'BA' As HexByte, 'Ł' As Character UNION ALL
SELECT 'BB' As HexByte, 'Ń' As Character UNION ALL
SELECT 'BC' As HexByte, 'Š' As Character UNION ALL
SELECT 'BD' As HexByte, '¨' As Character UNION ALL
SELECT 'BE' As HexByte, '´' As Character UNION ALL
SELECT 'BF' As HexByte, '×' As Character UNION ALL
SELECT 'C0' As HexByte, '{' As Character UNION ALL
SELECT 'C1' As HexByte, 'A' As Character UNION ALL
SELECT 'C2' As HexByte, 'B' As Character UNION ALL
SELECT 'C3' As HexByte, 'C' As Character UNION ALL
SELECT 'C4' As HexByte, 'D' As Character UNION ALL
SELECT 'C5' As HexByte, 'E' As Character UNION ALL
SELECT 'C6' As HexByte, 'F' As Character UNION ALL
SELECT 'C7' As HexByte, 'G' As Character UNION ALL
SELECT 'C8' As HexByte, 'H' As Character UNION ALL
SELECT 'C9' As HexByte, 'I' As Character UNION ALL
SELECT 'CA' As HexByte, '­' As Character UNION ALL
SELECT 'CB' As HexByte, 'ô' As Character UNION ALL
SELECT 'CC' As HexByte, 'ö' As Character UNION ALL
SELECT 'CD' As HexByte, 'ŕ' As Character UNION ALL
SELECT 'CE' As HexByte, 'ó' As Character UNION ALL
SELECT 'CF' As HexByte, 'ő' As Character UNION ALL
SELECT 'D0' As HexByte, '}' As Character UNION ALL
SELECT 'D1' As HexByte, 'J' As Character UNION ALL
SELECT 'D2' As HexByte, 'K' As Character UNION ALL
SELECT 'D3' As HexByte, 'L' As Character UNION ALL
SELECT 'D4' As HexByte, 'M' As Character UNION ALL
SELECT 'D5' As HexByte, 'N' As Character UNION ALL
SELECT 'D6' As HexByte, 'O' As Character UNION ALL
SELECT 'D7' As HexByte, 'P' As Character UNION ALL
SELECT 'D8' As HexByte, 'Q' As Character UNION ALL
SELECT 'D9' As HexByte, 'R' As Character UNION ALL
SELECT 'DA' As HexByte, 'Ě' As Character UNION ALL
SELECT 'DB' As HexByte, 'ű' As Character UNION ALL
SELECT 'DC' As HexByte, 'ü' As Character UNION ALL
SELECT 'DD' As HexByte, 'ť' As Character UNION ALL
SELECT 'DE' As HexByte, 'ú' As Character UNION ALL
SELECT 'DF' As HexByte, 'ě' As Character UNION ALL
SELECT 'E0' As HexByte, '' As Character UNION ALL
SELECT 'E1' As HexByte, '÷' As Character UNION ALL
SELECT 'E2' As HexByte, 'S' As Character UNION ALL
SELECT 'E3' As HexByte, 'T' As Character UNION ALL
SELECT 'E4' As HexByte, 'U' As Character UNION ALL
SELECT 'E5' As HexByte, 'V' As Character UNION ALL
SELECT 'E6' As HexByte, 'W' As Character UNION ALL
SELECT 'E7' As HexByte, 'X' As Character UNION ALL
SELECT 'E8' As HexByte, 'Y' As Character UNION ALL
SELECT 'E9' As HexByte, 'Z' As Character UNION ALL
SELECT 'EA' As HexByte, 'ď' As Character UNION ALL
SELECT 'EB' As HexByte, 'Ô' As Character UNION ALL
SELECT 'EC' As HexByte, 'Ö' As Character UNION ALL
SELECT 'ED' As HexByte, 'Ŕ' As Character UNION ALL
SELECT 'EE' As HexByte, 'Ó' As Character UNION ALL
SELECT 'EF' As HexByte, 'Ő' As Character UNION ALL
SELECT 'F0' As HexByte, '0' As Character UNION ALL
SELECT 'F1' As HexByte, '1' As Character UNION ALL
SELECT 'F2' As HexByte, '2' As Character UNION ALL
SELECT 'F3' As HexByte, '3' As Character UNION ALL
SELECT 'F4' As HexByte, '4' As Character UNION ALL
SELECT 'F5' As HexByte, '5' As Character UNION ALL
SELECT 'F6' As HexByte, '6' As Character UNION ALL
SELECT 'F7' As HexByte, '7' As Character UNION ALL
SELECT 'F8' As HexByte, '8' As Character UNION ALL
SELECT 'F9' As HexByte, '9' As Character UNION ALL
SELECT 'FA' As HexByte, 'Ď' As Character UNION ALL
SELECT 'FB' As HexByte, 'Ű' As Character UNION ALL
SELECT 'FC' As HexByte, 'Ü' As Character UNION ALL
SELECT 'FD' As HexByte, 'Ť' As Character UNION ALL
SELECT 'FE' As HexByte, 'Ú' As Character UNION ALL
SELECT 'FF' As HexByte, '?' As Character

The Script was generated by simple C# method.

public static void CreateEBCDICScript(string destinationFileName, int destinationCodePage)
{
    StringBuilder sb = new StringBuilder();
    byte[] buffer = new byte[1];
    Encoding ebcdic = Encoding.GetEncoding(870);
    Encoding destinationEncoding = Encoding.GetEncoding(destinationCodePage);    

    for (int i = 64; i < 256; i++)
    {
        buffer[0] = (byte)i;
        buffer = Encoding.Convert(ebcdic, destinationEncoding, buffer);
        string target = destinationEncoding.GetString(buffer);
        sb.AppendFormat("SELECT '{0:X}' AS HexByte, '{1}' AS Character UNION ALL", i, target.Equals("'") ? "''" : target);
        sb.AppendLine();
    }    

    File.WriteAllText(destinationFileName, sb.ToString();
}
You can use it to generate script for whatever encoding you want.

Finally as you have the conversion functions in database you can use below selects to receive correctly encoded strings from within the AS400 machine.

--CLR Version 
SELECT 
    fnECDIC870ToSql(FieldName, 1250) As FieldName 
FROM OPENQUERY(AS400, 'SELECT HEX(FieldName) AS FieldName FROM AS400Table')

--T-SQL Version 
SELECT 
    fnECDIC870ToWin1250(FieldName) As FieldName 
FROM OPENQUERY(AS400, 'SELECT HEX(FieldName) AS FieldName FROM AS400Table')
Advertisements

6 thoughts on “Converting AS400 EBCDIC 870 into Win1250 code page on SQL Server 2005

  1. Hmmm, I have done data transfers to MS SQL from a 400 before and never ran into this problem using the built in ODBC driver. Perhaps it has to do with the foreign language, unless you are referring to packed fields?

  2. Yeah, the problem is with the foreign language. I had the problem with Czech characters with diacritics. Also when need to solve the problem I found that also people had the problems with Polish.

    Unless you have characters with diacritics in the DB, you do not need to take care about this as ASCII characters are handled correctly, but the one with diacritics not.

  3. Thanks !!! Really works.

    only in the function [fnEBCDIC870ToWin1250] line 20

    WHERE RowNo < LEN(@pHex) / 2

    change for:

    WHERE RowNo <= LEN(@pHex) / 2

      • Thanks for the code.I used the SQL program.

        It works. The only thing I found is, when I got the data into hex, I have 0x pre-pended to the hex string. The routine converts that and adding ‘?’ as a default at the beginning. So, I checked the string for the ‘0x’ in the beginning, and if it is, removed that and it works perfectly.

        Also, when you are selecting sequence, you can limit the sequence by using top 8000 if you want to go only up to 8000. (i changed the sizes to 2050 and 1024 since I do not need so much…This does not affect much. But i think when you are using this 1000s of times, it adds up…

        Just my 2C..

      • Hi, I rewrite the function to takeaway the tables and sequences. It simplified a lot. Here is the code…

        CREATE FUNCTION [dbo].[fnebcidictoascii] (
        @pHex varchar(2050)
        )
        RETURNS varchar(1025)
        AS
        BEGIN
        DECLARE @char_idx varchar(256) =
        ‘????????????????’ + — 0-0F
        ‘????????????????’ + — 10 – 1F
        ‘????????????????’ + — 20 – 2F
        ‘????????????????’ + — 30 – 3F
        ‘ âäţáăčçć[.?’ + — 60 – 6F
        ‘ˇÉĘËŮÍÎĽĹ`:#@”=”‘ + –70 – 7F
        ‘˘abcdefghiśňđýřş’ + — 80 – 8F
        ‘°jklmnopqrłńš¸˛¤’ + — 90 – 9F
        ‘ą~stuvwxyzŚŇĐÝŘŞ’ + — A0 – AF
        ‘˙ĄżŢŻ§žźŽŹŁŃŠ¨´×’ + — B0 – BF
        ‘{ABCDEFGHI­ôöŕóő’ + — C0 – CF
        ‘}JKLMNOPQRĚűüťúě’ + — D0 – DF
        ‘?÷STUVWXYZďÔÖŔÓŐ’ + — E0 – EF
        ‘0123456789ĎŰÜŤÚ?’, — F0 – FF

        @result varchar(1025),
        @i int,
        @len int

        set @i = 3
        set @len = len(@pHex)
        set @result = ”

        while @i < @len
        begin
        set @result = @result + substring(@char_idx, convert(int, convert(binary(1), '0X'+SUBSTRING(@pHex, @i, 2), 1) ) + 1 ,1)
        set @i = @i + 2
        end
        RETURN @result
        END
        GO

        all it does is setup the 256 characters string that contains the ASCII characters in place of EDSIDIC equivalent position. for each of the HEX value, it converts into integer and selects the ASCCII character from the string from that position. This code is much faster. This can be ported to any of the database or other language.

        I thought it would help.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s