cockpit-bridge runs at 100% cpu within 24 hours of installation and eventually causes a kernel dump involving dbus-broker #21468

Open
steeldomejeff opened this issue Dec 23, 2024 · 7 comments

@steeldomejeff

steeldomejeff commented Dec 23, 2024

Explain what happens

  1. Install Rocky 9.5
  2. Install Cockpit v331 from repo
  3. No particular cockpit activity in UI, simply login/logout
  4. Within 24 hours top shows cockpit-bridge at 100% CPU
  5. Reproducible on three systems I've tried so far
  6. The current test system is in this state now, which allows for further debugging
  7. Once dbus-broker crashes (I'll send dmesg output once that happens), the Cockpit UI no longer accepts logins: it either times out or becomes completely unresponsive. Restarting the cockpit service doesn't clear either condition (the bridge CPU usage or the login issue).
  8. Further investigation with ps -T -p 3879458 reveals four threads, one of which is at 100% CPU
  9. pstack output follows; the offending thread is Thread 1 (LWP 3879458):

Thread 4 (Thread 0x7f38977fe640 (LWP 3880754) "cockpit-bridge"):
#0 0x00007f389e286c2a in __futex_abstimed_wait_common () from /lib64/libc.so.6
#1 0x00007f389e2920d8 in __new_sem_wait_slow64.constprop.0 () from /lib64/libc.so.6
#2 0x00007f389e718a2c in PyThread_acquire_lock_timed () from /lib64/libpython3.9.so.1.0
#3 0x00007f389cde664e in _queue_SimpleQueue_get_impl () from /usr/lib64/python3.9/lib-dynload/_queue.cpython-39-x86_64-linux-gnu.so
#4 0x00007f389cde6820 in _queue_SimpleQueue_get () from /usr/lib64/python3.9/lib-dynload/_queue.cpython-39-x86_64-linux-gnu.so
#5 0x00007f389e73820f in cfunction_vectorcall_FASTCALL_KEYWORDS () from /lib64/libpython3.9.so.1.0
#6 0x00007f389e730b30 in _PyEval_EvalFrameDefault () from /lib64/libpython3.9.so.1.0
#7 0x00007f389e74109b in function_code_fastcall () from /lib64/libpython3.9.so.1.0
#8 0x00007f389e733970 in _PyEval_EvalFrameDefault () from /lib64/libpython3.9.so.1.0
#9 0x00007f389e74109b in function_code_fastcall () from /lib64/libpython3.9.so.1.0
#10 0x00007f389e72fdfd in _PyEval_EvalFrameDefault () from /lib64/libpython3.9.so.1.0
#11 0x00007f389e74109b in function_code_fastcall () from /lib64/libpython3.9.so.1.0
#12 0x00007f389e72fdfd in _PyEval_EvalFrameDefault () from /lib64/libpython3.9.so.1.0
#13 0x00007f389e74109b in function_code_fastcall () from /lib64/libpython3.9.so.1.0
#14 0x00007f389e74f4e5 in method_vectorcall () from /lib64/libpython3.9.so.1.0
#15 0x00007f389e82ddfa in t_bootstrap () from /lib64/libpython3.9.so.1.0
#16 0x00007f389e82dc68 in pythread_wrapper () from /lib64/libpython3.9.so.1.0
#17 0x00007f389e28a092 in start_thread () from /lib64/libc.so.6
#18 0x00007f389e30f120 in clone3 () from /lib64/libc.so.6

Thread 3 (Thread 0x7f3897fff640 (LWP 3879487) "cockpit-bridge"):
#0 0x00007f389e286c2a in __futex_abstimed_wait_common () from /lib64/libc.so.6
#1 0x00007f389e2920d8 in __new_sem_wait_slow64.constprop.0 () from /lib64/libc.so.6
#2 0x00007f389e718a2c in PyThread_acquire_lock_timed () from /lib64/libpython3.9.so.1.0
#3 0x00007f389cde664e in _queue_SimpleQueue_get_impl () from /usr/lib64/python3.9/lib-dynload/_queue.cpython-39-x86_64-linux-gnu.so
#4 0x00007f389cde6820 in _queue_SimpleQueue_get () from /usr/lib64/python3.9/lib-dynload/_queue.cpython-39-x86_64-linux-gnu.so
#5 0x00007f389e73820f in cfunction_vectorcall_FASTCALL_KEYWORDS () from /lib64/libpython3.9.so.1.0
#6 0x00007f389e730b30 in _PyEval_EvalFrameDefault () from /lib64/libpython3.9.so.1.0
#7 0x00007f389e74109b in function_code_fastcall () from /lib64/libpython3.9.so.1.0
#8 0x00007f389e733970 in _PyEval_EvalFrameDefault () from /lib64/libpython3.9.so.1.0
#9 0x00007f389e74109b in function_code_fastcall () from /lib64/libpython3.9.so.1.0
#10 0x00007f389e72fdfd in _PyEval_EvalFrameDefault () from /lib64/libpython3.9.so.1.0
#11 0x00007f389e74109b in function_code_fastcall () from /lib64/libpython3.9.so.1.0
#12 0x00007f389e72fdfd in _PyEval_EvalFrameDefault () from /lib64/libpython3.9.so.1.0
#13 0x00007f389e74109b in function_code_fastcall () from /lib64/libpython3.9.so.1.0
#14 0x00007f389e74f4e5 in method_vectorcall () from /lib64/libpython3.9.so.1.0
#15 0x00007f389e82ddfa in t_bootstrap () from /lib64/libpython3.9.so.1.0
#16 0x00007f389e82dc68 in pythread_wrapper () from /lib64/libpython3.9.so.1.0
#17 0x00007f389e28a092 in start_thread () from /lib64/libc.so.6
#18 0x00007f389e30f120 in clone3 () from /lib64/libc.so.6

Thread 2 (Thread 0x7f389ca5e640 (LWP 3879461) "cockpit-bridge"):
#0 0x00007f389e286c2a in __futex_abstimed_wait_common () from /lib64/libc.so.6
#1 0x00007f389e2920d8 in __new_sem_wait_slow64.constprop.0 () from /lib64/libc.so.6
#2 0x00007f389e718a2c in PyThread_acquire_lock_timed () from /lib64/libpython3.9.so.1.0
#3 0x00007f389cde664e in _queue_SimpleQueue_get_impl () from /usr/lib64/python3.9/lib-dynload/_queue.cpython-39-x86_64-linux-gnu.so
#4 0x00007f389cde6820 in _queue_SimpleQueue_get () from /usr/lib64/python3.9/lib-dynload/_queue.cpython-39-x86_64-linux-gnu.so
#5 0x00007f389e73820f in cfunction_vectorcall_FASTCALL_KEYWORDS () from /lib64/libpython3.9.so.1.0
#6 0x00007f389e730b30 in _PyEval_EvalFrameDefault () from /lib64/libpython3.9.so.1.0
#7 0x00007f389e74109b in function_code_fastcall () from /lib64/libpython3.9.so.1.0
#8 0x00007f389e733970 in _PyEval_EvalFrameDefault () from /lib64/libpython3.9.so.1.0
#9 0x00007f389e74109b in function_code_fastcall () from /lib64/libpython3.9.so.1.0
#10 0x00007f389e72fdfd in _PyEval_EvalFrameDefault () from /lib64/libpython3.9.so.1.0
#11 0x00007f389e74109b in function_code_fastcall () from /lib64/libpython3.9.so.1.0
#12 0x00007f389e72fdfd in _PyEval_EvalFrameDefault () from /lib64/libpython3.9.so.1.0
#13 0x00007f389e74109b in function_code_fastcall () from /lib64/libpython3.9.so.1.0
#14 0x00007f389e74f4e5 in method_vectorcall () from /lib64/libpython3.9.so.1.0
#15 0x00007f389e82ddfa in t_bootstrap () from /lib64/libpython3.9.so.1.0
#16 0x00007f389e82dc68 in pythread_wrapper () from /lib64/libpython3.9.so.1.0
#17 0x00007f389e28a092 in start_thread () from /lib64/libc.so.6
#18 0x00007f389e30f120 in clone3 () from /lib64/libc.so.6

Thread 1 (Thread 0x7f389ea17740 (LWP 3879458) "cockpit-bridge"):
#0 0x00007f389d50a8a1 in siphash24_compress () from /lib64/libsystemd.so.0
#1 0x00007f389d50ab7f in trivial_hash_func () from /lib64/libsystemd.so.0
#2 0x00007f389d50ac69 in base_bucket_hash.lto_priv () from /lib64/libsystemd.so.0
#3 0x00007f389d50b9df in _hashmap_get () from /lib64/libsystemd.so.0
#4 0x00007f389d503b46 in sd_event_wait () from /lib64/libsystemd.so.0
#5 0x00007f389d503d74 in sd_event_prepare () from /lib64/libsystemd.so.0
#6 0x00007f389d5e78d6 in ffi_call_unix64 () from /lib64/libffi.so.8
#7 0x00007f389d5e4556 in ffi_call_int.lto_priv () from /lib64/libffi.so.8
#8 0x00007f389d6021e9 in _ctypes_callproc.cold () from /usr/lib64/python3.9/lib-dynload/_ctypes.cpython-39-x86_64-linux-gnu.so
#9 0x00007f389d60c073 in PyCFuncPtr_call () from /usr/lib64/python3.9/lib-dynload/_ctypes.cpython-39-x86_64-linux-gnu.so
#10 0x00007f389e74fcbb in PyObject_Call () from /lib64/libpython3.9.so.1.0
#11 0x00007f389e733970 in _PyEval_EvalFrameDefault () from /lib64/libpython3.9.so.1.0
#12 0x00007f389e72e699 in _PyEval_EvalCode () from /lib64/libpython3.9.so.1.0
#13 0x00007f389e740da5 in _PyFunction_Vectorcall () from /lib64/libpython3.9.so.1.0
#14 0x00007f389e72fdfd in _PyEval_EvalFrameDefault () from /lib64/libpython3.9.so.1.0
#15 0x00007f389e72e699 in _PyEval_EvalCode () from /lib64/libpython3.9.so.1.0
#16 0x00007f389e740da5 in _PyFunction_Vectorcall () from /lib64/libpython3.9.so.1.0
#17 0x00007f389e72fdfd in _PyEval_EvalFrameDefault () from /lib64/libpython3.9.so.1.0
#18 0x00007f389e74109b in function_code_fastcall () from /lib64/libpython3.9.so.1.0
#19 0x00007f389e72fdfd in _PyEval_EvalFrameDefault () from /lib64/libpython3.9.so.1.0
#20 0x00007f389e74109b in function_code_fastcall () from /lib64/libpython3.9.so.1.0
#21 0x00007f389e72fdfd in _PyEval_EvalFrameDefault () from /lib64/libpython3.9.so.1.0
#22 0x00007f389e74109b in function_code_fastcall () from /lib64/libpython3.9.so.1.0
#23 0x00007f389e72fdfd in _PyEval_EvalFrameDefault () from /lib64/libpython3.9.so.1.0
#24 0x00007f389e72e699 in _PyEval_EvalCode () from /lib64/libpython3.9.so.1.0
#25 0x00007f389e740da5 in _PyFunction_Vectorcall () from /lib64/libpython3.9.so.1.0
#26 0x00007f389e730b30 in _PyEval_EvalFrameDefault () from /lib64/libpython3.9.so.1.0
#27 0x00007f389e72e699 in _PyEval_EvalCode () from /lib64/libpython3.9.so.1.0
#28 0x00007f389e740da5 in _PyFunction_Vectorcall () from /lib64/libpython3.9.so.1.0
#29 0x00007f389e730b30 in _PyEval_EvalFrameDefault () from /lib64/libpython3.9.so.1.0
#30 0x00007f389e72e699 in _PyEval_EvalCode () from /lib64/libpython3.9.so.1.0
#31 0x00007f389e740da5 in _PyFunction_Vectorcall () from /lib64/libpython3.9.so.1.0
#32 0x00007f389e72faf7 in _PyEval_EvalFrameDefault () from /lib64/libpython3.9.so.1.0
#33 0x00007f389e72e699 in _PyEval_EvalCode () from /lib64/libpython3.9.so.1.0
#34 0x00007f389e72e305 in _PyEval_EvalCodeWithName () from /lib64/libpython3.9.so.1.0
#35 0x00007f389e7e2627 in PyEval_EvalCode () from /lib64/libpython3.9.so.1.0
#36 0x00007f389e810114 in run_eval_code_obj () from /lib64/libpython3.9.so.1.0
#37 0x00007f389e80c20b in run_mod () from /lib64/libpython3.9.so.1.0
#38 0x00007f389e697810 in pyrun_file[cold] () from /lib64/libpython3.9.so.1.0
#39 0x00007f389e805a59 in PyRun_SimpleFileExFlags () from /lib64/libpython3.9.so.1.0
#40 0x00007f389e802f0f in Py_RunMain () from /lib64/libpython3.9.so.1.0
#41 0x00007f389e7d509d in Py_BytesMain () from /lib64/libpython3.9.so.1.0
#42 0x00007f389e2295d0 in __libc_start_call_main () from /lib64/libc.so.6
#43 0x00007f389e229680 in __libc_start_main_impl () from /lib64/libc.so.6
#44 0x00005579e9efe095 in _start ()
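The per-thread check from step 8 can also be scripted. Below is a minimal sketch, assuming a Linux /proc filesystem; the helper names thread_cpu_ticks and busiest_threads are made up for this illustration and are not part of Cockpit. It samples utime + stime for every thread twice and reports which thread consumed CPU in between, together with its scheduler state.

```python
import os
import time


def thread_cpu_ticks(pid):
    """Return {tid: (state, utime + stime)} from /proc/<pid>/task/*/stat."""
    result = {}
    task_dir = f"/proc/{pid}/task"
    for tid in os.listdir(task_dir):
        try:
            with open(f"{task_dir}/{tid}/stat") as f:
                data = f.read()
        except (FileNotFoundError, ProcessLookupError):
            continue  # thread exited between listdir() and open()
        # The comm field may contain spaces or parentheses, so split only
        # the part after the last ')'. fields[0] is then field 3 (state).
        fields = data[data.rindex(")") + 2:].split()
        state = fields[0]
        utime, stime = int(fields[11]), int(fields[12])  # fields 14 and 15
        result[int(tid)] = (state, utime + stime)
    return result


def busiest_threads(pid, interval=1.0):
    """Return [(cpu_percent, tid, state), ...] sorted by CPU use, highest first."""
    before = thread_cpu_ticks(pid)
    time.sleep(interval)
    after = thread_cpu_ticks(pid)
    hz = os.sysconf("SC_CLK_TCK")  # clock ticks per second
    rows = []
    for tid, (state, ticks) in after.items():
        if tid in before:
            pct = 100.0 * (ticks - before[tid][1]) / hz / interval
            rows.append((pct, tid, state))
    return sorted(rows, reverse=True)
```

Calling busiest_threads(3879458) on the affected machine would be expected to list the main thread first, in state R, with the three worker threads sleeping (state S) near 0%.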

Version of Cockpit

331

Where is the problem in Cockpit?

Unknown or not applicable

Server operating system

other

Server operating system version

Rocky 9.5

What browsers are you using?

Firefox

System log

cockpit-bridge --bridges
[
  {
    "label": null,
    "privileged": true,
    "match": {},
    "environ": [
      "SUDO_ASKPASS=/usr/libexec/cockpit-askpass"
    ],
    "spawn": [
      "sudo",
      "-k",
      "-A",
      "cockpit-bridge",
      "--privileged"
    ],
    "name": "sudo"
  },
  {
    "label": null,
    "privileged": true,
    "match": {},
    "environ": [],
    "spawn": [
      "pkexec",
      "--disable-internal-agent",
      "cockpit-bridge",
      "--privileged"
    ],
    "name": "pkexec"
  },
  {
    "label": null,
    "privileged": false,
    "match": {
      "payload": "metrics1"
    },
    "environ": [],
    "spawn": [
      "/usr/libexec/cockpit-pcp"
    ],
    "name": "/usr/libexec/cockpit-pcp"
  }
]


strace on the process reveals:

strace -fkp 3879458
strace: Process 3879458 attached with 4 threads
[pid 3880754] futex(0x5579ed55b740, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 0, NULL, FUTEX_BITSET_MATCH_ANY <unfinished ...>
[pid 3879487] futex(0x5579ed55b740, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 0, NULL, FUTEX_BITSET_MATCH_ANY <unfinished ...>
[pid 3879461] futex(0x5579ed55b740, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 0, NULL, FUTEX_BITSET_MATCH_ANY

strace shows no further activity on the process despite 100% CPU utilization.
@martinpitt
Member

@steeldomejeff There is no way to confirm this quickly (as it takes a full day to reproduce), and everyone is on holidays. I want to clarify:

  1. No particular cockpit activity in UI, simply login/logout

Are you sure about the logout? Normally if you log out of Cockpit, the bridge program gets killed. If it does stay around, then (1) this is an immediate bug, not one that takes much time to reproduce, but (2) I can't reproduce it. We also test that cleanup in CI on all OSes. If that was just a typo, and you really mean "log in to cockpit and let it sit for a day", then I can try on a cloud instance or so.

Is there anything else happening on the system during that time? Applying package updates, changing config files in /etc and such?

Is the dbus-broker crash related to this? I.e., if you kill the system or user instance (which one is it?) with kill -SEGV or similar, does that trigger the bug? Or is it something unrelated?

@steeldomejeff
Author

steeldomejeff commented Dec 23, 2024 via email

@steeldomejeff
Author

I noticed that dbus-broker is locked at version 28 for Rocky 9.x (later versions are included in el10), so I built version 36 from source and installed it on my test systems. Version 31 included some memory and memory-leak fixes. Since dbus-broker appears to bork first with a memory leak, it may be worth a shot.

@martinpitt
Member

Given the dependence of cockpit on dbus I initially assumed they were related events (that may not necessarily be the case).

They are related: the problem happens in the systemd D-Bus client library:

#4 0x00007f389d503b46 in sd_event_wait () from /lib64/libsystemd.so.0

So it seems to be spinning in a polling loop against D-Bus at 100% CPU. I can't yet say whether it's a bug in the sd-bus client library or in dbus-broker. Thanks for setting up your "new dbus-broker" experiment!
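
As a generic illustration of how an event loop can end up spinning (not a diagnosis of this bug, and not how sd_event is implemented): with level-triggered epoll, a file descriptor whose peer has gone away stays "ready" forever, so a loop that never reads from it or unregisters it returns from every poll immediately. The sketch below assumes Linux; count_wakeups is a hypothetical helper written for this example.

```python
import select
import socket
import time


def count_wakeups(duration=0.2):
    """Count how often epoll returns within `duration` seconds when the
    registered socket's peer has hung up and the event is never handled."""
    a, b = socket.socketpair()
    ep = select.epoll()
    ep.register(a.fileno(), select.EPOLLIN)
    b.close()  # peer goes away: `a` now reports EOF/hangup on every poll
    wakeups = 0
    deadline = time.monotonic() + duration
    try:
        while time.monotonic() < deadline:
            # With nothing ready this would sleep for up to one second.
            events = ep.poll(1.0)
            if events:
                # Deliberately buggy: the event is neither consumed nor
                # unregistered, so the next poll returns at once again.
                wakeups += 1
    finally:
        ep.close()
        a.close()
    return wakeups
```

A healthy idle loop would wake up at most a handful of times per second here; this one wakes up thousands of times in a fraction of a second. Note that the sketch spins through epoll syscalls, so it is not necessarily the same mechanism as in this report.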

@steeldomejeff
Author

steeldomejeff commented Dec 25, 2024 via email

@steeldomejeff
Author

steeldomejeff commented Dec 25, 2024 via email

@zeozeozeo

I'm having the same issue: cockpit-bridge, together with dbus-daemon and podman, seems to be hogging ~30% of every CPU core on my board. It begins after a login has been initiated.
