Snowflake Recursive CTE NULL Handling bug

Recently I have discovered a bug in Snowlake Core Cloud Sevices query parser, which causes wrong NULL handling in recursive CTE queriers.

Snowflake is already aware about that and confirmed that bug, but currently there is no ETA for this bug resolution. Problem should be addressed in some of future releases.

The problem is, that in case you have an NULL constant in the anchor part of resursive CTE, then such field when used in an expression is always evaluated as NULL.

WITH RECURSE_CTE AS (
    SELECT
        T.TABLE_COLUMN
        ,T.PARENT_TABLE_COLUMN
        ,NULL                   AS NULL_PARENT_TABLE_COLUMN
        ,1                      AS LEVEL
    FROM A_TABLE T
    WHERE T.PARENT_NODE IS NULL

    UNION ALL
    
    SELECT
         T.TABLE_COLUMN
        ,T.PARENT_TABLE_COLUMN
        ,PT.TABLE_COLUMN        AS NULL_PARENT_TABLE_COLUMN
        ,PT.LEVEL + 1           AS LEVEL
    FROM A_TABLE T
    INNER JOIN RECURSE_CTE PT ON PT.TABLE_NAME = TH.PARENT_TABLE_COLUMN    
)
SELECT
    TABLE_COLUMN
    ,PARENT_TABLE_COLUMN
    ,NULL_PARENT_TABLE_COLUMN
    ,LEVEL
    ,IFNULL(NULL_PARENT_TABLE_COLUMN, 'This is always evaluated as NULL') AS ALWAYS_NULL_IN_EXPRESSION
FROM RECURSE_CTE
   

In the example above there is NULL AS NUL_PARENT_TABLE_COLUMN in the anchor expression of the CTE. It is a NULL constant in the the anchor part, but in the recursive part it is defined as PT.TABLE_COLUMN AS NULL_PARENT_TABLE_COLUMN, thus for child records it is NOT NULL. In final select, the NULL_PARENT_TABLE_COLUMN field produces correct results when used as standalone field. But if the field is used in an expression like IFNULL(NULL_PARENT_TABLE_COLUMN, 'This is always evaluated as NULL'), then such expression always treats the NULL_PARENT_TABLE_COLUMN as NULL.

Here is a complete sample with some demo data:

CREATE OR REPLACE TABLE TABLE_HIERARCHY (
    TABLE_NAME varchar
    ,TABLE_LOAD_TYPE varchar
    ,PARENT_TABLE_NAME varchar
);

INSERT INTO TABLE_HIERARCHY (
    TABLE_NAME
    ,TABLE_LOAD_TYPE
    ,PARENT_TABLE_NAME
)
VALUES 
     ('A', 'MERGE', NULL)
    ,('B', 'MERGE', NULL)
    ,('C', 'INSERT', 'A')
    ,('D', 'MERGE', 'B')
    ,('E', 'INSERT', 'D');
    
    
--SELECT * FROM TABLE_HIERARCHY;

WITH RECURSE_CTE AS (
    SELECT
         TH.TABLE_NAME
        ,TH.TABLE_LOAD_TYPE
        ,TH.PARENT_TABLE_NAME
        ,NULL                   AS PARENT_TABLE_LOAD_TYPE               -- NULL CONSTANT in anchor part of recursive CTE causes further
        ,CAST('' as varchar)    AS PARENT_TABLE_LOAD_TYPE_NOT_NULL      -- 
        ,1                      AS LEVEL
    FROM TABLE_HIERARCHY TH
    WHERE PARENT_TABLE_NAME IS NULL

    UNION ALL
    
    SELECT
         TH.TABLE_NAME
        ,TH.TABLE_LOAD_TYPE
        ,TH.PARENT_TABLE_NAME
        ,PT.TABLE_LOAD_TYPE                 AS PARENT_TABLE_LOAD_TYPE
        ,PT.TABLE_LOAD_TYPE                 AS PARENT_TABLE_LOAD_TYPE_NOT_NULL
        ,PT.LEVEL + 1                       AS LEVEL
    FROM TABLE_HIERARCHY TH
    INNER JOIN RECURSE_CTE PT ON PT.TABLE_NAME = TH.PARENT_TABLE_NAME
    
)
SELECT
    C.TABLE_NAME
    ,C.TABLE_LOAD_TYPE
    ,C.PARENT_TABLE_NAME
    ,C.PARENT_TABLE_LOAD_TYPE                                               -- For Anchor members it returns null, for non-anchor it returns correct values
    ,C.PARENT_TABLE_LOAD_TYPE_NOT_NULL                                      
    ,C.LEVEL
    ,IFNULL(C.PARENT_TABLE_LOAD_TYPE, 'ISNULL')     AS WORNG_NULL_HANDLING  -- if there is in the anchor part of the recuse CTE a NULL constant, all rows evaluate to NULL     
    ,COALESCE(C.PARENT_TABLE_LOAD_TYPE, 'ISNULL')   AS WRONG_NULL_HANDLING2 -- if there is in the anchor part of the recuse CTE a NULL constant, all rows evaluate to NULL
    ,LENGTH(C.PARENT_TABLE_LOAD_TYPE)               AS WRONG_NULL_HANDLING3 -- if there is in the anchor part of the recuse CTE a NULL constant, all rows evaluate to NULL
    ,C.PARENT_TABLE_LOAD_TYPE || '_CONCAT'          AS WRONG_NULL_HANDLING4 -- if there is in the anchor part of the recuse CTE a NULL constant, all rows evaluate to NULL
    ,IFNULL(C.PARENT_TABLE_NAME, 'ISNULL')          AS CORRECT_HANDLING     -- Non-null constants work fine
    ,C.PARENT_TABLE_LOAD_TYPE_NOT_NULL || '_CONCAT' AS CORRECT_HANDLING2    -- nulls retrieved from table work fine
FROM RECURSE_CTE C;

Te wrong handling is visible in the below image which shows results of the above demo:

If the field which has NULL constant in anchor part of recursive CTE is used in resultset as it is (out of expression), then its value is evaluated properly and proper result is returned. Problem is only, if such field is used in expression. Then it looks like Snowflake query parser looks on the CTE and if there is NULL constant, it automatically returns NULL for each row (to avoid per-row evaluation). This migh work for normal CTE, but does not work for recursive CTE as the child records might produce non-NULL values.

Also there are no problem, if a physical table field is used in expression and that field contains NULL. Problem is only with those NULL constants.

Unless the problem is addressed by Snowflake, the only workaround is to use a non NULL constant in the anchor part of recursive CTE, but this might be problematic in some cases.

Advertisement