Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Print summary information after row validation #1417

Open
wants to merge 13 commits into
base: develop
Choose a base branch
from

Conversation

helensilva14
Copy link
Collaborator

@helensilva14 helensilva14 commented Jan 29, 2025

Description of changes

Write a description of the changes you have made in this PR. Extremely small changes such as fixing typos do not need a description.

  • data_validation/__main__.py: removed TODO comment for an issue already closed (CLI Support for Parallel Validation Execution #31)
  • data_validation/combiner.py: implemented get_summary function to log a summary report only of row validation results, including statistics on rows present in source but not in target, and vice versa; added type checking and type hinting for all functions' parameters in the file; added a called to get_summary on function generate_report before returning its result.
  • data_validation/result_handlers/bigquery.py: updated code comment
  • tests/unit/test_combiner.py: implemented unit test for get_summary function

Issues to be closed

Note: Before submitting a pull request, please open an issue for discussion if you are not associated with Google.

Closes #1277
Closes #1414

Checklist

  • I have followed the CONTRIBUTING Guide.
  • I have commented my code, particularly in hard-to-understand areas
  • I have added unit and integration tests relevant to my change as needed
  • I have updated any relevant documentation to reflect my changes

@helensilva14 helensilva14 requested a review from nj1973 January 29, 2025 19:28
@helensilva14 helensilva14 linked an issue Jan 29, 2025 that may be closed by this pull request
@helensilva14

This comment was marked as outdated.

nj1973
nj1973 previously requested changes Jan 30, 2025
Copy link
Contributor

@nj1973 nj1973 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a few minor comments and one that I think we should not go down this route and discuss other options.

data_validation/combiner.py Outdated Show resolved Hide resolved
data_validation/combiner.py Outdated Show resolved Hide resolved
data_validation/combiner.py Outdated Show resolved Hide resolved
@helensilva14 helensilva14 reopened this Feb 6, 2025
@helensilva14 helensilva14 force-pushed the 1277-print-summary-information-after-row-validation branch from d93958b to d15983f Compare February 6, 2025 00:45
@helensilva14 helensilva14 requested a review from nj1973 February 6, 2025 00:47
@helensilva14 helensilva14 dismissed nj1973’s stale review February 6, 2025 00:48

Approach refactored from initial review

Copy link
Collaborator

@sundar-mudupalli-work sundar-mudupalli-work left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Helen,

This is really close. What the customer is asking for ROW validation

  1. Number of rows in the Source table - this is basically length of source_df
  2. Number of rows with validation success - this you have calculated use that
  3. Number of rows present in source and target, but failed validation - this is basically all failures where source_agg_value is not NULL and target_agg_value is not NULL
  4. Number of rows present in source, but not in target - this is basically all failures where source_agg_value is not NULL and target_agg_value is NULL

Similarly for the target table.

I think these are additional conditions that you can run on the Pandas table. Please try it out and if you run into difficulty, we can work on it together.

Thanks.

Sundar Mudupalli

@helensilva14
Copy link
Collaborator Author

/gcbrun

@helensilva14 helensilva14 marked this pull request as ready for review February 7, 2025 18:25
@pull-request-size pull-request-size bot added size/L and removed size/M labels Feb 7, 2025
@helensilva14
Copy link
Collaborator Author

/gcbrun

Copy link
Collaborator

@sundar-mudupalli-work sundar-mudupalli-work left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Helen,

Thank you for completing this - very helpful - Neil can take it to his customer.
Also thank you for using good coding practices by including typing - I call that the gift that you gave us that keeps on. giving - in the form of clarity when making other changes.

Thanks.

Sundar Mudupalli

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Report total number of validated rows Print summary information after row validation
3 participants