We have a few aggregations which run nightly to make querying the previous day's data faster. Some of these are quite expensive on the database and we're looking to move this processing out of the database. We already write the day's "Offers" to a parquet file and these aggregations can be run against that file more efficiently.
This PR starts by running only the Domain aggregation against that file and only if the day's offers are present.
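As a rough illustration of the shape of this step, here is a minimal sketch of a per-domain aggregation that is skipped when the day's file is missing. The field names `domain` and `impressions`, the function names, and the injected `load_offers` callable are all assumptions made for the example, not the code in this PR.

```python
from collections import defaultdict
from pathlib import Path


def aggregate_by_domain(offers):
    """Sum impressions per domain from an iterable of offer records (dicts).

    The "domain" and "impressions" keys are hypothetical column names.
    """
    totals = defaultdict(int)
    for offer in offers:
        totals[offer["domain"]] += offer["impressions"]
    return dict(totals)


def run_domain_aggregation(parquet_path, load_offers):
    """Aggregate only when the day's offers file exists; otherwise return None.

    `load_offers` is a caller-supplied function that reads the file and
    returns an iterable of dicts, so this sketch stays independent of any
    particular parquet library.
    """
    path = Path(parquet_path)
    if not path.exists():
        # The day's offers have not been dumped yet, so skip the aggregation.
        return None
    return aggregate_by_domain(load_offers(path))
```

Passing the loader in keeps the "is the file present?" check separate from how the parquet file is actually read, which also makes the aggregation easy to test with in-memory records.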
While putting this together, I also noticed very small discrepancies (~100 total impressions per day out of ~600k+). These are caused by ad views for an offer from one day occurring on the next day, after the Offers have been dumped. This PR lowers the maximum time an offer can be viewed from 4 hours to 2 hours and makes the offer dump happen 2 hours after UTC midnight, so these discrepancies should disappear.
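The timing argument can be checked with a small sketch. It assumes an offer belongs to the UTC day it was created on and can be viewed for at most the view window after creation. The constant and function names are made up for illustration.

```python
from datetime import date, datetime, time, timedelta, timezone

# Values taken from the description above; the names are hypothetical.
MAX_VIEW_WINDOW = timedelta(hours=2)  # previously 4 hours
DUMP_OFFSET = timedelta(hours=2)      # dump runs this long after UTC midnight


def _next_midnight(day):
    """UTC midnight at the end of `day`."""
    return datetime.combine(day + timedelta(days=1), time.min, tzinfo=timezone.utc)


def latest_possible_view(day, window=MAX_VIEW_WINDOW):
    """Upper bound on view times for offers created on `day`.

    An offer created just before midnight can still be viewed for `window`.
    """
    return _next_midnight(day) + window


def dump_time(day, offset=DUMP_OFFSET):
    """When the offers for `day` are dumped."""
    return _next_midnight(day) + offset


def dump_captures_all_views(day, window=MAX_VIEW_WINDOW, offset=DUMP_OFFSET):
    """True if no view for `day`'s offers can arrive after the dump."""
    return dump_time(day, offset) >= latest_possible_view(day, window)
```

Under these assumptions, a 2-hour view window with a dump 2 hours after midnight leaves no gap, whereas a 4-hour window with the same dump time would still allow late views to be missed.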