Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: Row validations fail for values with trailing newline #1415

Merged

Conversation

nj1973
Copy link
Contributor

@nj1973 nj1973 commented Jan 29, 2025

This PR merged changes to align RStrip behaviour across as many SQL engines as possible. The engines changed here are:

  • Oracle
  • DB2
  • Snowflake

BigQuery, PostgreSQL, Spanner and Teradata already behaved correctly.

MySQL and Hive rstrip() functions do not support characters other than space and therefore are not included in this PR.

SQL Server has mixed support for customizing the characters to remove and I've created a specific issue for this: #1419

I've added integration tests for all systems above.

@nj1973 nj1973 marked this pull request as ready for review January 31, 2025 14:03
@nj1973
Copy link
Contributor Author

nj1973 commented Jan 31, 2025

/gcbrun

Copy link
Collaborator

@nehanene15 nehanene15 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great!

To address one of your questions in the comments... I believe we originally added the rstrip to account for the teradatasql bug that adds extra padding when returning results. For validations that don't include TD as a source or target we probably don't need the rstrip as a sanitization step before hashing but it's good that this PR now fixes the whitespace bug 😄

tests/resources/oracle_test_tables.sql Show resolved Hide resolved
tests/resources/snowflake_test_tables.sql Outdated Show resolved Hide resolved
tests/resources/sqlserver_test_tables.sql Outdated Show resolved Hide resolved
tests/resources/teradata_test_tables.sql Show resolved Hide resolved
tests/system/data_sources/test_sql_server.py Show resolved Hide resolved
Copy link
Collaborator

@sundar-mudupalli-work sundar-mudupalli-work left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nick,

Thank you for the work on this. I understand this is holding up a customer - so will approve this.

I agree with Neha that we are overusing RTRIM/Rstrip and distorting validation results. I have opened a separate issue requesting this be fixed.

Thanks again for the hard work and the test cases!!

Sundar Mudupallli

@nj1973
Copy link
Contributor Author

nj1973 commented Feb 4, 2025

/gcbrun

@nj1973 nj1973 requested a review from helensilva14 February 4, 2025 06:58
Copy link
Collaborator

@helensilva14 helensilva14 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding the IF EXISTS part, I definitely was unsure of it about Teradata and for Oracle I did some search before and I found it available, but I checked again and it's specific to version 23 apparently, so my bad!

I'll look further on proper ways to do this type of checking for TD and Oracle in the draft PR for the DDLs: #1416

@nj1973 nj1973 merged commit a169f63 into develop Feb 4, 2025
5 checks passed
@nj1973 nj1973 deleted the 1411-row-validations-fail-for-values-with-trailing-newline branch February 4, 2025 15:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

concat and hash row validations fail for values with trailing newline
4 participants