[cuda.parallel]: CI testing should exclude tests making large GPU memory allocations #4722

Closed

Description

@oleksandr-pavlyk (Contributor)

We use pytest-xdist to speed up test execution, invoking pytest -n auto.

Running the suite in N worker processes also scales the GPU allocation footprint by up to a factor of N (in the worst case, when all workers allocate simultaneously). This risks spurious OutOfMemoryError exceptions of our own making.

A simple solution would be to introduce a pytest mark, say pytest.mark.large, to tag tests that make large GPU memory allocations, and to exclude them in CI jobs with the pytest command-line argument -m "not large".
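A minimal sketch of that approach (the marker name large comes from this issue; the test name and registration hook are illustrative):

```python
import pytest


# In conftest.py: register the marker so pytest does not emit
# PytestUnknownMarkWarning for it.
def pytest_configure(config):
    config.addinivalue_line(
        "markers", "large: test makes large GPU memory allocations"
    )


# In a test module: tag memory-hungry tests with the marker.
@pytest.mark.large
def test_huge_scan():
    ...  # would allocate a multi-GiB device buffer here
```

CI jobs would then deselect the tagged tests with:

pytest -n auto -m "not large"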

An alternative solution would be to introduce an exclusive_gpu_use_lock based on FileLock, as in https://pytest-xdist.readthedocs.io/en/latest/how-to.html#making-session-scoped-fixtures-execute-only-once

The lock would be held around blocks that make and use GPU allocations, and each test would need to release its GPU allocations before releasing the lock. This would let JIT-ting, allocation and execution, and host-side validation steps overlap across workers, while ensuring the GPU allocation/execution/validation steps themselves are serialized.

Metadata

Labels

cuda.parallel: For all items related to the cuda.parallel Python module

Status

Done



[cuda.parallel]: CI testing should exclude tests making large GPU memory allocations · Issue #4722 · NVIDIA/cccl