development env, high CPU usage #482

Open
alex29ua opened this issue Jan 15, 2025 · 18 comments

Comments

@alex29ua

alex29ua commented Jan 15, 2025

Trying a default install of solid_queue 1.1.2, with a still empty queue DB.
Ruby 3.1.6, Rails 7.1.5.1, Ubuntu 22.04, MySQL 5.7. No Puma.

When I run rails solid_queue:start, CPU usage goes crazy. It looks like this:

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
   2859 user    20   0 1032208 183844  17444 R  93.7   1.1   0:22.42 solid-queue-supervisor(1.1.2): supervising 2875, 2878
   2875 user    20   0  966868 179332   8632 R  79.7   1.1   0:17.48 solid-queue-dispatcher(1.1.2): dispatching every 1 seconds
   2878 user    20   0  900884 179320   8360 R  74.7   1.1   0:16.36 solid-queue-worker(1.1.2): waiting for jobs in *
    295 mysql     20   0 2717988 290808  15516 S  38.9   1.8   0:16.64 /usr/sbin/mysqld --daemonize --pid-file=/var/run/mysqld/mysqld.pid

and it never goes down. I tried changing the polling to 500/50 or even slower, but it doesn't change anything at all in usage.

Is it some misconfiguration issue?

Nothing suspicious in the logs, as far as I can see.

@rosa
Member

rosa commented Jan 15, 2025

I don't see anything like this on my machine 😕 Could you provide more details? For example, Solid Queue version, OS..?

@alex29ua
Author

SQ 1.1.2. Tried with ruby 3.1.6 and 3.0.6 (similar to my production versions for now).

OS is Ubuntu 22.04 (technically, it's WSL in Windows 11). For the Ruby, Rails, and MySQL versions, see my first message.

I tried setting up a new Rails app from scratch, then a simple SQ config according to the installation guide, but I got the same results. It looks like MySQL experiences some load too, so SQ is probably making a lot of requests? I cannot see anything in show processlist in the mysql console, though. And the queue DB is really empty: a few rows about the workers started, that's it.

@alex29ua
Author

tried with this queue.yml config

default: &default
  dispatchers:
    - polling_interval: 30
      batch_size: 500
  workers:
    - queues: "*"
      threads: 1
      processes: 1
      polling_interval: 10

to make it really slooow - but I still get the same huge load.

@rosa
Member

rosa commented Jan 15, 2025

Could you try with Solid Queue <= v1.0.2? Just to rule out some changes around worker/dispatcher polling and the interruptible sleep?

@hms
Contributor

hms commented Jan 16, 2025

@rosa

"wasn't me".... 😏 (at least I really hope so 🤞 )

All kidding aside, I did spin up my dev env to reconfirm that I see almost zero CPU usage when SQ is quiescent. For comparison and to keep the information flowing: M3 Mac, OSX 15.2, Postgres@17 (primary; I test across all 3 databases at the latest release), Ruby 3.4, Rails 7.2.2.

Before pushing up my PRs, I did test across Ruby 3.1 - 3.6 and on Rails 7.1.

@alex29ua
Author

@rosa nicely spotted!
1.0.2 works perfectly, I can barely find it in the processes.
1.1.0 and up - goes full on with CPU again.

I only have MySQL 5.7 to test with, though I set the recommended use_skip_locked = false in all cases.

@alex29ua
Author

A little bit more info on SQ 1.1.2 tests:
Ruby 3.1.6 is full CPU usage
Ruby 3.2.6 is totally ok.

I'm currently upgrading my app to 3.2.6, so it might not be that important for me any longer, but it's still probably worth noting.

@rosa
Member

rosa commented Jan 20, 2025

Huh, interesting! I wonder if it's something specific to

Ubuntu 22.04 (technically, it's WSL in Windows 11).

since @hms couldn't reproduce this in Ruby 3.1.2 🤔

@markivancho

markivancho commented Jan 21, 2025

I can confirm the same problem with high CPU usage and many queries to DB.
The Datadog graph shows a high number of query hits with solid_queue 1.1.2 and far fewer after rolling back to 1.0.2.

Image

queue_dispatcher.yml

default: &default
  dispatchers:
    - polling_interval: 5
      batch_size: 500

development:
  <<: *default

test:
  <<: *default

production:
  <<: *default

queue_default.yml

default: &default
  workers:
    - queues: [default, medium]
      threads: 10
      processes: 1
      polling_interval: 1

development:
  <<: *default

test:
  <<: *default

production:
  <<: *default

queue_heavy.yml

default: &default
  workers:
    - queues: [heavy, super_heavy]
      threads: 5
      processes: 1
      polling_interval: 5

development:
  <<: *default

test:
  <<: *default

production:
  <<: *default

My wild guess is that this is related to this change https://github.com/rails/solid_queue/pull/444/files#diff-bef5b458db8bc17c1a7cdf1b6f2f1508f8ad3c57f8c04139d7ff0ad4ad8d822bR28 in #444, as the number of query hits stays nearly the same during non-business hours, when we don't expect any background processing load.

@hms
Contributor

hms commented Jan 21, 2025

@markivancho

Would it be possible for you to post more info on the SolidQueue queries that are running? From the screenshot you posted, all I can see is that there are two different queries labeled "delete from solid_queue".

@hms

@hms
Contributor

hms commented Jan 21, 2025

@markivancho

Can you also share your deployed Ruby and Rails versions, and which database and version you use?

@hms

@markivancho

@hms sure, here is a more detailed graph
ruby 3.1.2
rails 7.1.5.1
AWS RDS PostgreSQL 14.12

Image

@hms
Contributor

hms commented Jan 23, 2025

@rosa, @markivancho, @alex29ua

I tracked the problem down.

In the PR referenced above, Interruptible.rb was reimplemented using Thread::Queue and Queue.pop(timeout:).

Ruby 3.1.* does not support the timeout keyword argument on Queue.pop, so the pop call raises an exception and effectively returns immediately. This explains the high CPU utilization: Workers and Dispatchers effectively never sleep between poll loops.

The timeout argument is supported in Ruby 3.2, which is why everything worked when @alex29ua upgraded to Ruby 3.2.
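
For anyone who wants to check this on their own Ruby, here is a minimal sketch. `timed_pop` is a hypothetical helper written for this illustration, not Solid Queue code. It waits on an empty Thread::Queue and reports how long the call actually blocked and which error class, if any, was raised. On Ruby 3.2 or newer the call should block for roughly the requested time; on 3.1 it is expected to come back almost immediately with an error.

```ruby
# Sketch only: `timed_pop` is a hypothetical helper, not part of Solid Queue.
# It tries a timed wait on an empty Thread::Queue and reports how long the
# call blocked and whether it raised.
def timed_pop(seconds)
  queue = Thread::Queue.new
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  error = nil
  begin
    queue.pop(timeout: seconds) # the timeout: keyword exists from Ruby 3.2 on
  rescue StandardError => e
    error = e.class
  end
  elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
  { ruby: RUBY_VERSION, elapsed: elapsed, error: error }
end

p timed_pop(0.2)
```

An `elapsed` value close to zero together with a non-nil `error` is the signature of the problem described above.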

What made this issue a little harder to track down is that Queue.pop is called inside of a Concurrent::Promises.future, which swallowed the exception: I "cleverly" failed to add an exception handler, because "such simple code could never fail" (hangs head in shame 😳 ).
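
To illustrate why a swallowed error turns into CPU load, here is a toy poll loop using only the standard library. All names are hypothetical stand-ins, and it deliberately does not use concurrent-ruby or Solid Queue: `swallowed_sleep` mimics a timed wait whose failure is discarded, while `handled_sleep` records the same failure and falls back to a plain `sleep`.

```ruby
# Toy model only: these names are hypothetical stand-ins, not Solid Queue's API.

def monotonic_now
  Process.clock_gettime(Process::CLOCK_MONOTONIC)
end

# Mimics a timed wait that fails on every call, with the failure discarded
# before the caller can see it. It returns immediately.
def swallowed_sleep(_seconds)
  raise ThreadError, "queue empty"
rescue ThreadError
  nil
end

# Same failure, but recorded and followed by a plain sleep as a fallback,
# so the loop still pauses between polls.
def handled_sleep(seconds, errors)
  raise ThreadError, "queue empty"
rescue ThreadError => e
  errors << e.message
  sleep(seconds)
end

# Runs a poll loop for `duration` seconds and counts the iterations.
def count_polls(duration)
  deadline = monotonic_now + duration
  polls = 0
  while monotonic_now < deadline
    polls += 1 # stands in for one polling query against the database
    yield      # stands in for the pause between polls
  end
  polls
end

errors = []
busy  = count_polls(0.3) { swallowed_sleep(0.1) }
paced = count_polls(0.3) { handled_sleep(0.1, errors) }
puts "swallowed: #{busy} polls, handled: #{paced} polls, first error: #{errors.first}"
```

The swallowed variant spins through the loop as fast as the CPU allows, which matches the symptom in the first message; the handled variant polls only a few times and leaves a trace of what went wrong.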

When I initially responded that I tested Ruby 3.1 - 3.6, that was because I transposed 3.1.3 with 3.3.1 (which was the earliest version I did test with). Sorry for the delay that this caused in resolving the issue.

@hms

@rosa
Member

rosa commented Jan 23, 2025

Oooh! Amazing find, @hms! 👏

Is this something you could open a PR to fix? Otherwise, I'll prepare a fix.

@markivancho

@hms @rosa Thanks folks, you are amazing. Looking forward to an update

@hms
Contributor

hms commented Jan 23, 2025

@rosa

I already have a PR to fix another small, zero-impact bug (but a bug nevertheless) that I found while debugging this issue. I'm happy to address the Interruptible issue in that PR.

I do need some guidance on the SQ team's preferred solution. If SQ is going to support Ruby 3.1, then I have to replace the current version of Interruptible.rb with the original version -- there is no Ruby 3.1 workaround with the Thread::Queue based implementation.

The catch is: Ruby 3.1 hits EOL (for security patches -- which I assume is an effective EOL for 99%+ of users) in 2 months. If we're really only considering supporting 3.1 for 2 to X months, then I would propose to craft an initializer that includes the correct version of Interruptible based on the Ruby version.

Please let me know your preferences and I'll supply a PR.

Also, if SQ is going to support multiple versions of Ruby, would you be open to me creating a PR for the GitHub actions to test across supported rubies? When I created the original interruptible PR, I made the assumption that 3.3.1 was the baseline ruby because it was set in .ruby_version and the GitHub actions as the only tested ruby.

@rosa
Member

rosa commented Jan 23, 2025

Thanks so much, @hms!

I made the assumption that 3.3.1 was the baseline ruby because it was set in .ruby_version and the GitHub actions as the only tested ruby.

Yeah, you're totally right about this, that's my bad, sorry! I think that to drop support for Ruby 3.1, which is supported by the minimum Rails version required by Solid Queue (Rails 7.1), we'd need to release a major version, as that'd be a breaking change 🤔 I'm not ready yet to ship a major version (I'd like to get batches and sharding support done before that, and I'm not quite sure when that'll happen), so I think the approach with the two versions of interruptible based on RUBY_VERSION would be perfect in this case. Then, for Solid Queue 2.0, we can drop the 3.1 implementation.
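
A rough sketch of what selecting between two implementations based on RUBY_VERSION could look like. The module and class names (`QueueInterruptible`, `PipeInterruptible`, `Poller`) are hypothetical, and the pre-3.2 variant shown here (a self-pipe with `IO.select`) is an assumption made for illustration; the actual PR may differ in both structure and naming.

```ruby
# Hypothetical names throughout; a sketch, not the code from the PR.

# Needs Ruby >= 3.2 because of Thread::Queue#pop(timeout:).
module QueueInterruptible
  def interrupt
    interrupt_queue << true
  end

  private

  def interruptible_sleep(seconds)
    interrupt_queue.pop(timeout: seconds)
  end

  # Lazy init is not thread-safe; real code would create this up front.
  def interrupt_queue
    @interrupt_queue ||= Thread::Queue.new
  end
end

# Works without the timeout: keyword: waits on a pipe with IO.select.
module PipeInterruptible
  def interrupt
    self_pipe[:writer].write_nonblock(".")
  rescue IO::WaitWritable, Errno::EINTR
    # Pipe is full, so a wake-up is already pending.
  end

  private

  def interruptible_sleep(seconds)
    if IO.select([self_pipe[:reader]], nil, nil, seconds)
      # Drain pending wake-up bytes; read_nonblock raises once the pipe is empty.
      loop { self_pipe[:reader].read_nonblock(16) }
    end
  rescue IO::WaitReadable, Errno::EINTR
    # Pipe drained, nothing more to do.
  end

  # Lazy init is not thread-safe; real code would create this up front.
  def self_pipe
    @self_pipe ||= begin
      reader, writer = IO.pipe
      { reader: reader, writer: writer }
    end
  end
end

# Pick the implementation once, at load time.
Interruptible =
  if Gem::Version.new(RUBY_VERSION) >= Gem::Version.new("3.2")
    QueueInterruptible
  else
    PipeInterruptible
  end

class Poller
  include Interruptible

  # One pause between polls; returns early if #interrupt is called.
  def pause(seconds)
    interruptible_sleep(seconds)
  end
end

puts "Ruby #{RUBY_VERSION} selects #{Interruptible}"
```

Comparing with `Gem::Version` rather than comparing the raw strings avoids surprises such as "3.10" sorting before "3.2".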

would you be open to me creating a PR for the GitHub actions to test across supported rubies?

That'd be super, super helpful!! Thank you so much!

hms added a commit to ikyn-inc/solid_queue that referenced this issue Jan 24, 2025
The current Thread::Queue based Interruptible cannot work with
Ruby versions earlier than 3.2.  Given that SQ derives its minimum
supported version of Ruby from "full support of Rails 7.1",
this in theory mandates support for Ruby 2.7.8.  However, other
dependencies force the minimum version of Ruby to 3.1.6.

This commit adds a boot time check of the Ruby version and selects
either the current or original implementation of Interruptible.
@hms
Contributor

hms commented Jan 24, 2025

@rosa @alex29ua @markivancho

PR submitted as committed. Hopefully it's acceptable.

@rosa -- the one failure in the test run is because of a timing problem based on container load rather than a real issue.

@hms
