Parallel processing settings #38

@dshean

Seems like the default number of processes (which is later used to set the number of jobs) should be the number of logical CPU cores minus 1.

Curious why the default number of threads is 2. If that improves PDAL pipeline performance, then we should set the default number of processes to the number of physical CPU cores minus 1, assuming a modern CPU with hyperthreading.
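A minimal sketch of the defaults proposed above, not the project's current code. The function names `detect_core_counts` and `default_n_processes` are hypothetical, and `psutil` is an assumed optional dependency used only to count physical cores.

```python
import os


def detect_core_counts() -> tuple[int, int]:
    """Return (logical, physical) core counts.

    Falls back to the logical count when psutil is not installed or
    cannot determine the number of physical cores.
    """
    logical = os.cpu_count() or 1
    try:
        import psutil  # optional third-party dependency (assumption)

        physical = psutil.cpu_count(logical=False) or logical
    except ImportError:
        physical = logical
    return logical, physical


def default_n_processes(logical: int, physical: int, threads_per_process: int = 1) -> int:
    """Hypothetical default: leave one core free for the OS and scheduler.

    With one thread per process, count logical cores. With more than one
    thread per process, count physical cores so that the threads can use
    the hyperthreaded siblings.
    """
    cores = physical if threads_per_process > 1 else logical
    return max(1, cores - 1)


logical, physical = detect_core_counts()
n_processes = default_n_processes(logical, physical, threads_per_process=2)
```

On a machine with 8 physical and 16 logical cores, this gives 15 processes with one thread each, or 7 processes with two threads each.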

I'm also not clear on where the default per-worker memory limits are set; I think this is all done on the dask side. I'm seeing a sequence of these warnings when using 5x5 km tiles for a small AOI requiring 4 tiles total, on a machine with 64 GB RAM:

2025-07-11 15:35:38,884 - distributed.worker.memory - WARNING - Unmanaged memory use is high. This may indicate a memory leak or the memory may not be released to the OS; see https://distributed.dask.org/en/latest/worker-memory.html#memory-not-released-back-to-the-os for more information. -- Unmanaged memory: 11.20 GiB -- Worker memory limit: 16.00 GiB
2025-07-11 15:35:46,663 - distributed.worker.memory - WARNING - Worker is at 81% memory usage. Pausing worker.  Process memory: 12.98 GiB -- Worker memory limit: 16.00 GiB
2025-07-11 15:37:05,132 - distributed.worker.memory - WARNING - Worker is at 21% memory usage. Resuming worker. Process memory: 3.50 GiB -- Worker memory limit: 16.00 GiB

This repeated pausing and resuming of workers slows down the processing.

Some options are presented here: https://distributed.dask.org/en/latest/worker-memory.html#memory-not-released-back-to-the-os. Some of this could be related to the older OS version on this machine; a modern OS may be better about freeing memory quickly, which dask expects.
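One of the options on that page is to ask glibc to release freed memory back to the OS with `malloc_trim`. A rough sketch follows, assuming Linux with glibc; the guard for other platforms is my own addition, and `client` in the usage comment is an assumed existing `dask.distributed.Client`.

```python
import ctypes
from typing import Optional


def trim_memory() -> Optional[int]:
    """Ask glibc to return freed heap memory to the OS.

    Returns the result of malloc_trim(0) (1 if memory was released,
    0 otherwise), or None when glibc's malloc_trim is not available
    (for example on macOS, Windows, or musl-based Linux).
    """
    try:
        libc = ctypes.CDLL("libc.so.6")
        return int(libc.malloc_trim(0))
    except (OSError, AttributeError):
        return None


# Usage on a running cluster (assumes `client` already exists):
#     client.run(trim_memory)
```

The same page also describes setting the `MALLOC_TRIM_THRESHOLD_` environment variable before the workers start, which avoids having to call the trim manually.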

I will reduce the tile size for now, which should help as I kick off processing for a larger AOI, but I think we can do a better job of setting the number of processes based on the user-specified tile size, the expected memory requirements for each tile, and the total available RAM, instead of just the total number of available CPU cores.
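A rough sketch of what a memory-aware default could look like. The function name, the headroom fraction, and the per-tile memory estimate are all hypothetical; in practice the estimate would have to come from the tile size and point density, which this sketch does not model.

```python
def memory_aware_n_processes(
    total_ram_gib: float,
    est_gib_per_tile: float,
    n_cores: int,
    headroom_fraction: float = 0.2,
) -> int:
    """Cap the process count by both CPU cores and available RAM.

    total_ram_gib:     total system memory
    est_gib_per_tile:  assumed peak memory needed to process one tile
    n_cores:           CPU cores available (logical or physical)
    headroom_fraction: share of RAM left for the OS and scheduler (assumption)
    """
    if est_gib_per_tile <= 0:
        raise ValueError("est_gib_per_tile must be positive")
    usable_gib = total_ram_gib * (1.0 - headroom_fraction)
    by_memory = int(usable_gib // est_gib_per_tile)  # tiles that fit in RAM at once
    by_cpu = max(1, n_cores - 1)                     # leave one core free
    return max(1, min(by_memory, by_cpu))


# Illustrative values only: 64 GiB RAM, ~13 GiB peak per 5x5 km tile
# (the peak process memory in the warnings above), 16 cores.
n_processes = memory_aware_n_processes(64, 13, 16)
```

With these illustrative numbers the memory bound (3 processes) is far below the CPU bound (15), which matches the observation that workers pause on memory long before the cores are saturated.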
