Aggregation Optimizations #1016

Open
wants to merge 1 commit into
base: main
from

davidfischer
Collaborator

We have a few aggregations which run nightly to make querying the previous day's data faster. Some of these are quite expensive on the database and we're looking to move this processing out of the database. We already write the day's "Offers" to a parquet file and these aggregations can be run against that file more efficiently.

This PR starts by running only the Domain aggregation against that file and only if the day's offers are present.
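The "only if the day's offers are present" guard can be sketched roughly as below. This is a minimal illustration, not the code in this PR: the field names (`domain`, `viewed`, `clicked`), the function names, and the injected `load_offers` / `aggregate_in_database` callables are all hypothetical. In practice `load_offers` would be whatever reads the parquet file.

```python
from collections import defaultdict
from pathlib import Path


def aggregate_by_domain(offers):
    """Sum offers, views and clicks per domain from an iterable of offer dicts.

    The keys "domain", "viewed" and "clicked" are assumed for illustration.
    """
    totals = defaultdict(lambda: {"offers": 0, "views": 0, "clicks": 0})
    for offer in offers:
        row = totals[offer["domain"]]
        row["offers"] += 1
        row["views"] += int(offer["viewed"])
        row["clicks"] += int(offer["clicked"])
    return dict(totals)


def run_domain_aggregation(parquet_path, load_offers, aggregate_in_database):
    """Aggregate from the day's offers file if it exists, else use the database.

    load_offers(path) and aggregate_in_database() are hypothetical callables
    supplied by the caller.
    """
    path = Path(parquet_path)
    if path.exists():
        # Cheap path: the day's offers were already dumped to a file.
        return aggregate_by_domain(load_offers(path))
    # Fallback: the dump is missing, so run the existing database aggregation.
    return aggregate_in_database()
```

Keeping the database path as a fallback means a missing or late dump does not leave the day without a Domain aggregation.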

While putting this together, I also noticed very small discrepancies (~100 total impressions per day out of ~600k+). These come from ad views for an offer made on one day occurring on the next day, after the Offers have been dumped. This PR lowers the window during which an offer can be viewed from 4 hours to 2 hours and moves the offer dump to 2 hours after UTC midnight, so these discrepancies should disappear.
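The timing argument can be checked with a short sketch. The constant and function names are hypothetical, and the assumption that a view is accepted only strictly within the window after its offer is mine, not something stated in this PR.

```python
from datetime import date, datetime, time, timedelta, timezone

VIEW_WINDOW = timedelta(hours=2)  # how long after an offer a view still counts
DUMP_OFFSET = timedelta(hours=2)  # dump runs this long after UTC midnight


def end_of_day(day):
    """UTC midnight that closes the given day."""
    return datetime.combine(day + timedelta(days=1), time.min, tzinfo=timezone.utc)


def dump_time(day):
    """When the offers for the given day are written out."""
    return end_of_day(day) + DUMP_OFFSET


def latest_view_bound(day, window=VIEW_WINDOW):
    """Upper bound on the view time of any offer made during the given day."""
    return end_of_day(day) + window


day = date(2025, 5, 5)
# With a 2 hour window, every view for the day's offers lands before the dump.
print(latest_view_bound(day) <= dump_time(day))
# With a 4 hour window, views could still arrive after a dump at 02:00 UTC.
print(latest_view_bound(day, timedelta(hours=4)) <= dump_time(day))
```

As long as the view window is no longer than the dump offset, no view for the day's offers can arrive after the file has been written.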

@davidfischer davidfischer requested a review from a team as a code owner May 5, 2025 22:16