Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug] dbt adapter tests for utils unreliable #7670

Closed
2 tasks done
sdebruyn opened this issue May 20, 2023 · 6 comments · Fixed by #7672
Closed
2 tasks done

[Bug] dbt adapter tests for utils unreliable #7670

sdebruyn opened this issue May 20, 2023 · 6 comments · Fixed by #7672
Assignees
Labels
bug Something isn't working

Comments

@sdebruyn
Copy link
Contributor

sdebruyn commented May 20, 2023

Is this a new bug in dbt-core?

  • I believe this is a new bug in dbt-core
  • I have searched the existing issues, and I could not find an existing issue for this bug

Current Behavior

The current adapter tests for the utils use an incorrect way to verify the end results of the tests.

{% test assert_equal(model, actual, expected) %}
select * from {{ model }} where {{ actual }} != {{ expected }}

{% endtest %}

The above macro is currently used. However, NULL works in a strange way in SQL.

This is documented for SQL Server or Postgres in the same way as it is the ANSI standard that dictates this behaviour.

The result is that the test above will completely ignore results where either actual or expected is null and the other one is not.

I just noticed this in the dbt-sqlserver tests. Apparently the hashing test should have been failing from as soon it was implemented, but it never did.

Expected Behavior

There are multiple routes we can take here. I'd be glad to submit a PR once we agree on the way forward.

  1. distinct from
    This SQL syntax is not supported in all common databases (e.g. SQL Server just added support in the 2022 version), but it effectively compares the values as was probably intended in the mentioned macro.

  2. coalesce
    If we would coalesce the result values to an invalid value (e.g. coalesce(actual, 'invalid') for strings), then the tests could continue using the != operator to compare actual and expected. I think this would mean that we have to adapt every utils test as some work on strings, others on integers etc.

  3. add null checks to where clause
    IMHO the best way forward. Just add conditions to see if actual is null while expected is not and vice versa.

{% test assert_equal(model, actual, expected) %}
select * from {{ model }}
where {{ actual }} != {{ expected }}
or ({{ actual }} is null and {{ expected }} is not null)
or ({{ expected }} is null and {{ actual }} is not null)
{% endtest %}

Steps To Reproduce

This issue is visible in the latest version of the dbt-sqlserver tests, but it will probably have caused more false positives for other adapters as well.

@sdebruyn sdebruyn added bug Something isn't working triage labels May 20, 2023
@sdebruyn
Copy link
Contributor Author

@dbeatty10
Copy link
Contributor

Thank you for raising this @sdebruyn!

Agreed that the three-valued logic is not intuitive for most of us and then it leads to situations like the this one with the assert_equal macro for these functional tests for adapters.

On a related note, #6997 has a feature request to add a cross-database macro for is [not] distinct from. Implementing it and using it would be a 4th route we could take.

Deciding on a route

I agree with you that Option 3 is the easiest short-term path forward for addressing the assert_equal macro. The downside is that this replacement assert_equal would not have its own testing to make sure it is implemented correctly. Since it's a simple and well-known implementation, I'd be comfortable having assert_equal as un-tested itself.

Would you be interested in raising a PR to implement Option 3, by any chance? If not, @alzaar was looking for a good_first_issue recently and might be interested in picking it up.

Implementation of Option 3

I'd suggest a slightly less verbose variant that should yield equivalent results:

{% test assert_equal(model, actual, expected) %}
select * from {{ model }}
{# -- actual is distinct from expected #}
where not (
  ({{ actual }} = {{ expected }})
  or ({{ actual }} is null and {{ expected }} is null)
)
{% endtest %}

or maybe even:

{% macro equals(actual, expected) %}
{# -- actual is not distinct from expected #}
(({{ actual }} = {{ expected }}) or ({{ actual }} is null and {{ expected }} is null))
{% endmacro %}

{% test assert_equal(model, actual, expected) %}
select * from {{ model }}
where not {{ equals(actual, expected) }}
{% endtest %}

@sdebruyn
Copy link
Contributor Author

Thanks @dbeatty10, opened #7672. I don't have local postgres testing set up, so let's see in CI if some of them start failing now.

@dbeatty10
Copy link
Contributor

Awesome @sdebruyn! Looks like all the CI checks on that PR are passing, and a member of our engineering team will give it a review.

@dbeatty10 dbeatty10 removed the triage label May 22, 2023
@VersusFacit VersusFacit self-assigned this May 24, 2023
@lllong33
Copy link
Contributor

@sdebruyn @dbeatty10 It looks like not null is not true, can't write it like that. FYI

select version();  -- PostgreSQL 12.3 on x86_64-pc-linux-gnu

with t1 as (
	select 1 as actual, 2 as expected
	union all 
	select 1 as actual, 1 as expected
	union all 
	select 1 as actual, cast(null as int) as expected
	union all 
	select cast(null as int) as actual, cast(null as int) as expected 
)

select * from t1 
--where actual = expected or (actual is null and expected is null) -- normal: 1,1; null,null
--where actual != expected 
--	or (actual is null and expected is not null )
--	or (actual is not null and expected is null ) -- normal: 1,2; 1,null
where not (actual = expected or (actual is null and expected is null)) -- issue: actual=1, expected=null is not work 

image

@dbeatty10
Copy link
Contributor

dbeatty10 commented May 26, 2023

@lllong33 Is the key take-away that the implementation of {{ equals(actual, expected) }} should probably include a case statement as described here and shown below?

Minimal test cases

Here's some Postgres-specific SQL that is helpful to demonstrate all the permutations listed here using the case-when logic explained here:

with x as (

    select 1::int as i union all
    select 2::int as i union all
    select null::int as i

),

permutations as (

    select
        x1.i as x1_i,
        x2.i as x2_i 
    from x as x1, x as x2
    order by x1.i, x2.i

)

select
    -- values to compare
    x1_i,
    x2_i,

    -- logic for equals macro
    ((x1_i = x2_i) or (x1_i is null and x2_i is null)) as equals_macro,

    -- compliant syntax for null-safe equality
    x1_i is not distinct from x2_i as is_not_distinct_from,

    -- case/when logic for equals macro
    case when ((x1_i = x2_i) or (x1_i is null and x2_i is null))
        then 0
        else 1
    end = 0 as case_equals_macro,

    -- negation of logic for equals macro
    not ((x1_i = x2_i) or (x1_i is null and x2_i is null)) as not_equals_macro,

    -- compliant syntax for null-safe inequality
    x1_i is distinct from x2_i as is_distinct_from,

    -- negation of case/when logic for equals macro
    not (case when ((x1_i = x2_i) or (x1_i is null and x2_i is null))
     then 0
     else 1
    end = 0) as case_equals_macro

from permutations
;
image

Reflection

This discussion makes me think that what we really need are cross-database null-safe implementation(s) of is [not] distinct from as proposed in #6997. Doing those implementation(s) would necessarily include all the relevant testing to make sure the edge cases are covered.

Summary

At the very least, we need more test cases and then to update the implementation in #7672 to use case/when syntax.

While we are doing more test cases, we might as well just do #6997.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants