Suggestion: Should preserve_finished_jobs really default to true? #560

Open · stefanvermaas opened this issue May 7, 2025 · 5 comments

stefanvermaas commented May 7, 2025

We've just hit a tricky issue that I think many teams might run into, especially when adopting solid_queue in smaller projects without deep background job needs.

By default, SolidQueue preserves all finished jobs (source). That feels like a safe option at first glance, but if you're not aware of this setting, you might only discover it once it's too late.

In our case, we had a small Rails app using SolidQueue out of the box. It was deployed with Kamal and everything worked perfectly for months, until one day the app became unreachable: no traffic, no response, and only a single error message from hours earlier, "could not connect to database."

The database server was up. The app server was up. But after digging in, we discovered that the entire server had run out of disk space.

The root cause? All those finished background jobs were quietly preserved. This app had processed hundreds of thousands of jobs over time, including all error notifications, which also went through the job system, so things spiraled. Once the disk was full, new jobs couldn't run, error reporting included, and we were blind to the failure.

To make matters worse, recovering the machine wasn't easy. With no space left, even SSH access became unreliable. Freeing up disk space in that state is non-trivial, especially if you don't have automated clean-up in place.

Yes, we should have monitored disk usage more closely. But we did clean up our own records; we just didn't realize that finished jobs were being preserved automatically.

Proposal

Should SolidQueue.preserve_finished_jobs really default to true?

Most job libraries don't retain finished jobs unless you explicitly ask for it. For example, Sidekiq and Resque both discard successful jobs by default. It's easy to opt into persistence if you need auditing or observability, but it's safer to start with a clean slate.

For a lot of teams, especially those adopting SolidQueue for the first time, this default might feel invisible until it causes problems. Would it be worth reconsidering?
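For anyone else who hits this: opting out is a single setting (the key below is taken from the README, so double-check it against the version you're running):

    # config/environments/production.rb (or application.rb)
    # Delete jobs as soon as they finish instead of keeping them around.
    config.solid_queue.preserve_finished_jobs = false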

rosa (Member) commented May 9, 2025

Hey @stefanvermaas, sorry that happened and thanks for taking the time to write this up. The current default is the right one. Not deleting jobs right after they're executed is more performant (as you don't have to do multiple deletes) and necessary to ensure recurring tasks aren't enqueued twice. Resque and Sidekiq's behaviours aren't comparable because they use Redis and implement the queue by popping a job from a Redis list (roughly), so the job is always deleted when it's picked up. GoodJob, another database-backed adapter and thus comparable, preserves finished jobs by default as well.

The right way to approach this, as I see it, is to periodically clean finished jobs. This is mentioned here:

> clear_finished_jobs_after: period to keep finished jobs around, in case preserve_finished_jobs is true—defaults to 1 day. Note: Right now, there's no automatic cleanup of finished jobs. You'd need to do this by periodically invoking SolidQueue::Job.clear_finished_in_batches, which can be configured as a recurring task.

So what I could see is having this recurring task added by default to the generated recurring.yml file.
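For reference, a generated entry could look roughly like this (the task name and schedule below are placeholders; note the method has to go under command rather than class, since clear_finished_in_batches is a class method, not a class):

    # config/recurring.yml (sketch; task name and schedule are placeholders)
    production:
      clear_finished_jobs:
        command: "SolidQueue::Job.clear_finished_in_batches"
        schedule: at 4am every day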

stefanvermaas (Author) commented

> The current default is the right one. Not deleting jobs right after they're executed is more performant (as you don't have to do multiple deletes) and necessary to ensure recurring tasks aren't enqueued twice.

I knew there was a good reason for this. This makes sense.

> The right way to approach this, as I see it, is to periodically clean finished jobs.

Truthfully, I was already aware of this, but I forgot and then this happened. I never gave it a second thought because we're still using Redis + Sidekiq for most projects.

> So what I could see is having this recurring task added by default to the generated recurring.yml file.

This makes sense. I'll open a PR for it.


calvinchieng commented May 15, 2025

    Invalid recurring tasks:
    - periodic_job_cleanup: Class name doesn't correspond to an existing class
    Exiting...

I am getting the above error when I try the code from the PR. Did I miss something?

    production:
      periodic_job_cleanup:
        class: SolidQueue::Job.clear_finished_in_batches
        queue: default
        schedule: at 4pm every day

Ruby 3.4.3
Gem Spec:

    solid_queue (1.1.5)
      activejob (>= 7.1)
      activerecord (>= 7.1)
      concurrent-ruby (>= 1.3.1)
      fugit (~> 1.11.0)
      railties (>= 7.1)
      thor (~> 1.3.1)

rosa (Member) commented May 15, 2025

Ahhh yes! SolidQueue::Job.clear_finished_in_batches is not a valid class name. It should be specified as a command instead of a class.
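That is, taking the config above and keeping the same task name, queue and schedule, something like:

    production:
      periodic_job_cleanup:
        command: "SolidQueue::Job.clear_finished_in_batches"
        queue: default
        schedule: at 4pm every day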

calvinchieng commented

Thanks @rosa

It works now.

    SolidQueue-1.1.5 Enqueued recurring task (109.9ms)  task: "periodic_job_cleanup", active_job_id: "ab544601-dbc7-47a7-bd9a-a1c770fd1861", at: "2025-05-15T08:00:00Z"

Config:

    command: "SolidQueue::Job.clear_finished_in_batches"
