Recently I have discovered a bug in Snowlake Core Cloud Sevices query parser, which causes wrong NULL
handling in recursive CTE queriers.
Snowflake is already aware about that and confirmed that bug, but currently there is no ETA for this bug resolution. Problem should be addressed in some of future releases.
The problem is, that in case you have an NULL
constant in the anchor part of resursive CTE, then such field when used in an expression is always evaluated as NULL
.
WITH RECURSE_CTE AS (
SELECT
T.TABLE_COLUMN
,T.PARENT_TABLE_COLUMN
,NULL AS NULL_PARENT_TABLE_COLUMN
,1 AS LEVEL
FROM A_TABLE T
WHERE T.PARENT_NODE IS NULL
UNION ALL
SELECT
T.TABLE_COLUMN
,T.PARENT_TABLE_COLUMN
,PT.TABLE_COLUMN AS NULL_PARENT_TABLE_COLUMN
,PT.LEVEL + 1 AS LEVEL
FROM A_TABLE T
INNER JOIN RECURSE_CTE PT ON PT.TABLE_NAME = TH.PARENT_TABLE_COLUMN
)
SELECT
TABLE_COLUMN
,PARENT_TABLE_COLUMN
,NULL_PARENT_TABLE_COLUMN
,LEVEL
,IFNULL(NULL_PARENT_TABLE_COLUMN, 'This is always evaluated as NULL') AS ALWAYS_NULL_IN_EXPRESSION
FROM RECURSE_CTE
In the example above there is NULL AS NUL_PARENT_TABLE_COLUMN
in the anchor expression of the CTE. It is a NULL
constant in the the anchor part, but in the recursive part it is defined as PT.TABLE_COLUMN AS NULL_PARENT_TABLE_COLUMN
, thus for child records it is NOT NULL
. In final select, the NULL_PARENT_TABLE_COLUMN
field produces correct results when used as standalone field. But if the field is used in an expression like IFNULL(NULL_PARENT_TABLE_COLUMN, 'This is always evaluated as NULL')
, then such expression always treats the NULL_PARENT_TABLE_COLUMN
as NULL
.
Here is a complete sample with some demo data:
CREATE OR REPLACE TABLE TABLE_HIERARCHY (
TABLE_NAME varchar
,TABLE_LOAD_TYPE varchar
,PARENT_TABLE_NAME varchar
);
INSERT INTO TABLE_HIERARCHY (
TABLE_NAME
,TABLE_LOAD_TYPE
,PARENT_TABLE_NAME
)
VALUES
('A', 'MERGE', NULL)
,('B', 'MERGE', NULL)
,('C', 'INSERT', 'A')
,('D', 'MERGE', 'B')
,('E', 'INSERT', 'D');
--SELECT * FROM TABLE_HIERARCHY;
WITH RECURSE_CTE AS (
SELECT
TH.TABLE_NAME
,TH.TABLE_LOAD_TYPE
,TH.PARENT_TABLE_NAME
,NULL AS PARENT_TABLE_LOAD_TYPE -- NULL CONSTANT in anchor part of recursive CTE causes further
,CAST('' as varchar) AS PARENT_TABLE_LOAD_TYPE_NOT_NULL --
,1 AS LEVEL
FROM TABLE_HIERARCHY TH
WHERE PARENT_TABLE_NAME IS NULL
UNION ALL
SELECT
TH.TABLE_NAME
,TH.TABLE_LOAD_TYPE
,TH.PARENT_TABLE_NAME
,PT.TABLE_LOAD_TYPE AS PARENT_TABLE_LOAD_TYPE
,PT.TABLE_LOAD_TYPE AS PARENT_TABLE_LOAD_TYPE_NOT_NULL
,PT.LEVEL + 1 AS LEVEL
FROM TABLE_HIERARCHY TH
INNER JOIN RECURSE_CTE PT ON PT.TABLE_NAME = TH.PARENT_TABLE_NAME
)
SELECT
C.TABLE_NAME
,C.TABLE_LOAD_TYPE
,C.PARENT_TABLE_NAME
,C.PARENT_TABLE_LOAD_TYPE -- For Anchor members it returns null, for non-anchor it returns correct values
,C.PARENT_TABLE_LOAD_TYPE_NOT_NULL
,C.LEVEL
,IFNULL(C.PARENT_TABLE_LOAD_TYPE, 'ISNULL') AS WORNG_NULL_HANDLING -- if there is in the anchor part of the recuse CTE a NULL constant, all rows evaluate to NULL
,COALESCE(C.PARENT_TABLE_LOAD_TYPE, 'ISNULL') AS WRONG_NULL_HANDLING2 -- if there is in the anchor part of the recuse CTE a NULL constant, all rows evaluate to NULL
,LENGTH(C.PARENT_TABLE_LOAD_TYPE) AS WRONG_NULL_HANDLING3 -- if there is in the anchor part of the recuse CTE a NULL constant, all rows evaluate to NULL
,C.PARENT_TABLE_LOAD_TYPE || '_CONCAT' AS WRONG_NULL_HANDLING4 -- if there is in the anchor part of the recuse CTE a NULL constant, all rows evaluate to NULL
,IFNULL(C.PARENT_TABLE_NAME, 'ISNULL') AS CORRECT_HANDLING -- Non-null constants work fine
,C.PARENT_TABLE_LOAD_TYPE_NOT_NULL || '_CONCAT' AS CORRECT_HANDLING2 -- nulls retrieved from table work fine
FROM RECURSE_CTE C;
Te wrong handling is visible in the below image which shows results of the above demo:

If the field which has NULL
constant in anchor part of recursive CTE is used in resultset as it is (out of expression), then its value is evaluated properly and proper result is returned. Problem is only, if such field is used in expression. Then it looks like Snowflake query parser looks on the CTE and if there is NULL
constant, it automatically returns NULL
for each row (to avoid per-row evaluation). This migh work for normal CTE, but does not work for recursive CTE as the child records might produce non-NULL
values.
Also there are no problem, if a physical table field is used in expression and that field contains NULL
. Problem is only with those NULL
constants.
Unless the problem is addressed by Snowflake, the only workaround is to use a non NULL
constant in the anchor part of recursive CTE, but this might be problematic in some cases.