feat: Print summary information after row validation #1417
base: develop
Just a few minor comments, plus one where I think we should not go down this route and should discuss other options instead.
d93958b to d15983f
Approach refactored from initial review
Helen,
This is really close. Here is what the customer is asking for in ROW validation:
- Number of rows in the source table: this is basically the length of source_df.
- Number of rows with validation success: you have already calculated this, so use that.
- Number of rows present in both source and target that failed validation: these are all failures where source_agg_value is not NULL and target_agg_value is not NULL.
- Number of rows present in source but not in target: these are all failures where source_agg_value is not NULL and target_agg_value is NULL.
Similarly for the target table.
I think these are additional conditions that you can run on the Pandas table. Please try it out, and if you run into difficulty we can work on it together.
Thanks.
Sundar Mudupalli
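The conditions listed in the review above could be sketched with pandas boolean masks roughly as follows. This is only an illustration, not the code merged in this PR: the function name summarize_row_validation, the validation_status column, and its "success"/"fail" values are assumptions made for the example. Only source_agg_value and target_agg_value come from the review comment. The sketch also counts source rows from the combined result table rather than from source_df.

```python
import pandas as pd


def summarize_row_validation(result_df: pd.DataFrame) -> dict:
    """Count row validation outcomes (hypothetical helper, not the PR's get_summary)."""
    # A row exists on a side when that side's aggregate value is not NULL.
    in_source = result_df["source_agg_value"].notna()
    in_target = result_df["target_agg_value"].notna()
    # Assumed status column and values; adjust to the real result schema.
    succeeded = result_df["validation_status"] == "success"
    failed = result_df["validation_status"] == "fail"
    return {
        "source_rows": int(in_source.sum()),
        "target_rows": int(in_target.sum()),
        "success": int(succeeded.sum()),
        "in_both_but_failed": int((failed & in_source & in_target).sum()),
        "in_source_not_target": int((failed & in_source & ~in_target).sum()),
        "in_target_not_source": int((failed & ~in_source & in_target).sum()),
    }


# Made-up sample data: one match, one mismatch, one row missing on each side.
example_df = pd.DataFrame(
    {
        "source_agg_value": ["a1", "b1", "c1", None],
        "target_agg_value": ["a1", "b2", None, "d1"],
        "validation_status": ["success", "fail", "fail", "fail"],
    }
)
summary = summarize_row_validation(example_df)
```

Keeping each condition as a named mask makes the target-side counts a mirror image of the source-side ones, which matches the "similarly for the target table" request.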
/gcbrun
/gcbrun
Helen,
Thank you for completing this - very helpful - Neil can take it to his customer.
Also, thank you for using good coding practices by including typing. I call that the gift you gave us that keeps on giving, in the form of clarity when making other changes.
Thanks.
Sundar Mudupalli
Description of changes
Write a description of the changes you have made in this PR. Extremely small changes such as fixing typos do not need a description.
- data_validation/__main__.py: removed a TODO comment for an issue that is already closed (CLI Support for Parallel Validation Execution #31)
- data_validation/combiner.py: implemented the get_summary function to log a summary report for row validation results only, including statistics on rows present in source but not in target, and vice versa; added type checking and type hinting for the parameters of all functions in the file; added a call to get_summary in the generate_report function before it returns its result
- data_validation/result_handlers/bigquery.py: updated a code comment
- tests/unit/test_combiner.py: implemented a unit test for the get_summary function
Issues to be closed
Note: Before submitting a pull request, please open an issue for discussion if you are not associated with Google.
Closes #1277
Closes #1414