Segfault in gc while finalizing #135115

Open
dimaqq opened this issue Jun 4, 2025 · 6 comments
Labels: pending (the issue will be closed if no feedback is provided), type-crash (a hard crash of the interpreter, possibly with a core dump)

Comments

@dimaqq
Contributor

dimaqq commented Jun 4, 2025

Crash report

What happened?

I believe I have been able to hit this twice by now, though the incidence is low: about 10 hours of running unit tests back-to-back yielded me one crash where a core dump was captured.

Setup: mostly pure Python (with wrapt, cyaml and pydantic), with threads (the opentelemetry-sdk helper thread) and some form of multi- or sub-processing (pytest-xdist).

Traceback:

#0  gc_get_refs (g=0x99dd7d052835e0c0) at Python/gc.c:71
#1  move_unreachable (young=0x5e558d4e2890 <_PyRuntime+111872>, unreachable=0x7ffc77bd0718) at Python/gc.c:562
#2  deduce_unreachable (base=base@entry=0x5e558d4e2890 <_PyRuntime+111872>, unreachable=unreachable@entry=0x7ffc77bd0718) at Python/gc.c:1125
#3  0x00005e558c21f0ef in gc_collect_main (tstate=tstate@entry=0x5e558d5103b0 <_PyRuntime+299040>, generation=generation@entry=2, reason=reason@entry=_Py_GC_REASON_SHUTDOWN)
    at Python/gc.c:1360
#4  0x00005e558c21ff74 in _PyGC_CollectNoFail (tstate=0x7c514fdce4e0, tstate@entry=0x5e558d5103b0 <_PyRuntime+299040>) at Python/gc.c:1657
#5  0x00005e558c257dee in finalize_modules (tstate=tstate@entry=0x5e558d5103b0 <_PyRuntime+299040>) at Python/pylifecycle.c:1795
#6  0x00005e558c256829 in _Py_Finalize (runtime=0x5e558d4c7390 <_PyRuntime>) at Python/pylifecycle.c:2132
#7  0x00005e558c256687 in Py_FinalizeEx () at Python/pylifecycle.c:2259
#8  0x00005e558c2a7e74 in Py_RunMain () at Modules/main.c:777
#9  0x00005e558c2a86f1 in pymain_main (args=args@entry=0x7ffc77bd0b40) at Modules/main.c:805
#10 0x00005e558c2a874c in Py_BytesMain (argc=<optimized out>, argv=0x5e558c2217b0 <visit_reachable>) at Modules/main.c:829
#11 0x00007c515762a1ca in __libc_start_call_main (main=main@entry=0x5e558bfd3aa0 <main>, argc=argc@entry=4, argv=argv@entry=0x7ffc77bd0c88)
    at ../sysdeps/nptl/libc_start_call_main.h:58
#12 0x00007c515762a28b in __libc_start_main_impl (main=0x5e558bfd3aa0 <main>, argc=4, argv=0x7ffc77bd0c88, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>,
    stack_end=0x7ffc77bd0c78) at ../csu/libc-start.c:360
#13 0x00005e558bfd39d2 in _start ()
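
One detail of frame #0 can be checked mechanically. On x86-64 (the build named later in this thread is x86_64), a valid virtual address must be canonical: its upper bits are all copies of the top implemented address bit. The sketch below tests that for a few addresses from the backtrace. The helper name `is_canonical_x86_64` and the 48-bit default are assumptions made for illustration; a non-canonical `g` is consistent with a corrupted GC list link, but does not by itself say what corrupted it.

```python
def is_canonical_x86_64(addr: int, va_bits: int = 48) -> bool:
    """Return True if addr is a canonical x86-64 virtual address.

    Canonical means bits 63 down to (va_bits - 1) are all identical.
    va_bits is 48 for 4-level paging and 57 for 5-level paging.
    """
    if not 0 <= addr < 2**64:
        raise ValueError("addr must fit in 64 bits")
    top = addr >> (va_bits - 1)                 # bits 63..(va_bits - 1)
    all_ones = (1 << (64 - va_bits + 1)) - 1    # the same bits, all set
    return top == 0 or top == all_ones


# Addresses copied from the backtrace above.
backtrace_addresses = {
    "g (frame #0)": 0x99DD7D052835E0C0,
    "young (frame #1)": 0x5E558D4E2890,
    "unreachable (frame #1)": 0x7FFC77BD0718,
}

for label, addr in backtrace_addresses.items():
    print(f"{label}: {addr:#018x} canonical={is_canonical_x86_64(addr)}")
```

Only `g` fails the check, under both 48-bit and 57-bit addressing, so dereferencing it would fault regardless of what memory is mapped.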

gdb session:
https://gist.github.com/dimaqq/cd87dfdad4e0cc1d5832fa226e516ff0

Admittedly I'm a little rusty with gdb and a bit out of my depth when it comes to GC.

CPython versions tested on:

3.13

Operating systems tested on:

Linux

Output from running 'python -VV' on the command line:

No response

@dimaqq
Contributor Author

dimaqq commented Jun 4, 2025

Python: Python 3.13.3 (main, May 30 2025, 05:35:48) [Clang 20.1.4 ]

Specifically cpython-3.13.3+20250529-x86_64-unknown-linux-gnu-debug-full.tar.zst from https://github.com/astral-sh/python-build-standalone/releases/tag/20250529

The other crash was with a non-debug build of 3.13, slightly older build.

@dimaqq
Contributor Author

dimaqq commented Jun 4, 2025

I've uploaded core, cpython and venv tarballs

@ZeroIntensity
Member

Unfortunately, there's very little we can do without a repro. There are tons of ways to cause a crash during garbage collection, many of which are due to misuse of the C API or ctypes.

ZeroIntensity added the pending label Jun 4, 2025
@dimaqq
Contributor Author

dimaqq commented Jun 5, 2025

ctypes was not used; I've confirmed that by grepping through the venv.

C extensions are present; however, these are relatively well known: yaml, pydantic-core, wrapt.
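
The grep described above could be reproduced with a short script along these lines. This is a minimal sketch, not the command actually used in this report: the function name `audit_venv`, the regular expression, and the suffix list are all assumptions. It only detects plain `import ctypes` / `from ctypes ...` statements in `.py` files, so a hit means a package can import ctypes, not that it did so during the crashing run.

```python
import re
from pathlib import Path

# Matches "import ctypes", "import ctypes.util", "from ctypes import ...",
# and the same forms for the private "_ctypes" module.
CTYPES_IMPORT = re.compile(
    r"^\s*(?:import\s+_?ctypes\b|from\s+_?ctypes\b)", re.MULTILINE
)
# Compiled extension modules: ".so" on Linux, ".pyd" on Windows.
EXTENSION_SUFFIXES = (".so", ".pyd")


def audit_venv(root):
    """Return (files importing ctypes, compiled extension modules) under root."""
    ctypes_users, extensions = [], []
    for path in Path(root).rglob("*"):
        if not path.is_file():
            continue
        if path.suffix == ".py":
            text = path.read_text(encoding="utf-8", errors="replace")
            if CTYPES_IMPORT.search(text):
                ctypes_users.append(path)
        elif path.name.endswith(EXTENSION_SUFFIXES):
            extensions.append(path)
    return sorted(ctypes_users), sorted(extensions)
```

Calling `audit_venv(".venv")` (a hypothetical path) returns two sorted lists; the second one enumerates every compiled module that was available to the test run, which narrows the set of extensions worth auditing.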

I would love to be able to investigate this further, or help someone else investigate this further.
Would it help to cook up a repro container image and push it somewhere?
Is there some additional verbosity or logging that would help?
Maybe someone can teach me to process the core file better, e.g. to walk through all the allocated (and deallocated?) objects? That set could contain clues as to why this happens in the first place.
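
On the question of extra verbosity, the standard library offers two switches that might be worth trying; whether they reveal anything for a crash this late in shutdown is an open question. The sketch below turns on `faulthandler` (which dumps Python tracebacks for all threads on a fatal signal) and `gc.DEBUG_STATS` (which prints collection statistics to stderr). The helper name `enable_crash_diagnostics` and the log path parameter are assumptions for illustration.

```python
import faulthandler
import gc
import sys


def enable_crash_diagnostics(log_path=None):
    """Enable faulthandler and GC statistics; return the stream in use.

    The caller must keep the returned stream open: faulthandler writes to
    its file descriptor directly when a fatal signal arrives.
    """
    stream = open(log_path, "a") if log_path else sys.stderr
    # Dump the Python traceback of every thread on SIGSEGV, SIGABRT, etc.
    faulthandler.enable(file=stream, all_threads=True)
    # Print a line of statistics to stderr for each collection.
    gc.set_debug(gc.DEBUG_STATS)
    return stream
```

The same fault handler can be enabled without code changes by setting `PYTHONFAULTHANDLER=1` or passing `-X faulthandler` to the interpreter. `gc.DEBUG_STATS` is noisy, so it is probably only practical in a dedicated reproduction run.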

@ZeroIntensity
Member

If there are extensions active, then there's a very real possibility this is a bug in one of those.

> Would it help to cook up a repro container image and push it somewhere?

That would be good as a last resort. The test suite takes 10 hours, right?

My favorite debugger for Python core files is PyStack. I suggest trying that on the core file to see which objects are causing the crash and what the program was doing when it happened. That should give us a much better idea of where to look.
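
For reference, a PyStack run against a core file might be scripted roughly as follows. This is a sketch under assumptions: the wrapper names `pystack_core_command` and `run_pystack` are hypothetical, and the paths in the usage note are placeholders. It relies on PyStack's `core` subcommand with the `--native` (merge C frames into the Python stack) and `--locals` (show local variables) options.

```python
import shutil
import subprocess


def pystack_core_command(core_file, executable, native=True, show_locals=True):
    """Build the argv for analysing a core file with PyStack."""
    cmd = ["pystack", "core", str(core_file), str(executable)]
    if native:
        cmd.append("--native")   # include native (C) frames
    if show_locals:
        cmd.append("--locals")   # include local variables per frame
    return cmd


def run_pystack(core_file, executable):
    """Run PyStack on a core file and return its report as text."""
    if shutil.which("pystack") is None:
        raise RuntimeError("pystack not found on PATH (pip install pystack)")
    result = subprocess.run(
        pystack_core_command(core_file, executable),
        capture_output=True,
        text=True,
        check=False,
    )
    return result.stdout + result.stderr
```

A call such as `run_pystack("core.1234", "/path/to/python3.13")` (placeholder paths) should be given the exact interpreter binary that produced the core, otherwise symbol resolution is likely to fail.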

@vstinner
Member

vstinner commented Jun 5, 2025

Did you try to reproduce your issue with a debug build of Python? It's usually the python3-debug package on Linux.
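
Whether a given interpreter is a debug build can be checked from Python itself, which may help confirm what the "debug-full" archive mentioned earlier in this thread actually provides. The sketch below is one way to do that; the function name `is_debug_build` is an assumption. It relies on two documented signals: the `Py_DEBUG` build configuration variable and `sys.gettotalrefcount`, which exists only in debug builds.

```python
import sys
import sysconfig


def is_debug_build():
    """Return True if the running interpreter was built with Py_DEBUG."""
    # Py_DEBUG is recorded in the build configuration on POSIX builds;
    # sys.gettotalrefcount is only defined when Py_DEBUG is enabled.
    return bool(sysconfig.get_config_var("Py_DEBUG")) or hasattr(
        sys, "gettotalrefcount"
    )


print("debug build:", is_debug_build())
```

A debug build enables extra assertions in the interpreter, including in the garbage collector, so memory corruption may be reported closer to where it happens than in a release build.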
