Dask threads and subprocess count
Several tasks and subtasks have config options daskThreads
and
subprocessCount
used to control threading within a subtask:
# The number of threads dask is allowed to spawn for each task/subtask.
# Decrease this number if tasks/subtasks are running out of available threads
daskThreads = 2
# The number of subprocesses that each task/subtask gets counted as
# occupying. Increase this number if tasks/subtasks are running out of
# memory, so that fewer tasks will be allowed to run at once
subprocessCount = 1
Dask threads
Dask and xarray support thread-parallel operations on data sets. They also support chunk-wise operation on data sets that can’t fit in memory. These capabilities are very powerful but also difficult to configure for general cases. Dask is also not desigend by default with the idea that multiple tasks, each with multiple dask threads, might operate simultaneously. As a result, it is possible to spawn huge numbers of dask threads in MPAS-Analysis that both slow down analysis and lead to errors when the node runs out of threads completely.
To prevent this, many tasks or subtasks that use dask threading take the number
of execution threads from a config option, typically in the config section for
the parent task. Typically, the number of daskThreads
should be around
the same as the number of cores on a node divided by the number of tasks
that will run simultaneiously. Since the number of running tasks is controlled
by subprocessCount
, see below, this number might differ from
parallelTaskCount
.
Subprocess count
Tasks or subtasks that use dask threading may consume too much memory or use
too many threads to “count” as a single task. That is, it might not be safe to
run with parallelTaskCount
simultaneious instances of the task/subtask and
it would be better if it occupied the slot of multiple tasks in the pool of
tasks. MPAS-Analysis will treat a dask-based task or subtask as occupying
the number of task slots given by the subprocessCount
option. For example,
if parallelTaskCount = 8
and subprocessCount = 2
, up to 4 tasks or
subtasks would be allowed to run simultaneiously.