Reference
=========

.. currentmodule:: trio_parallel

This project's aim is to use the lightest-weight, lowest-overhead, lowest latency
method to achieve parallelism of arbitrary Python code, and make it natively async for Trio.
Given that Python (and CPython in particular) has ongoing difficulties parallelizing
CPU-bound work in threads, this package dispatches synchronous function execution to
*subprocesses*. However, this project is not fundamentally constrained by that,
and will be considering subinterpreters, or any other avenue as they become available.

Running CPU-bound functions in parallel
---------------------------------------

The main interface for ``trio-parallel`` is :func:`run_sync`:

.. autofunction:: run_sync

.. note::

    :func:`trio_parallel.run_sync` does not work with functions defined at the REPL
    or in a Jupyter notebook cell due to the use of the `multiprocessing` ``spawn``
    context... *unless* cloudpickle_ is also installed!

A minimal program that dispatches work with :func:`run_sync` looks like this:

.. literalinclude:: examples/minimal.py

Just like that, you've dispatched a CPU-bound synchronous function to a worker
subprocess and returned the result! However, only doing this much is a bit pointless;
we are just expending the startup time of a whole python process to achieve the same
result that we could have gotten synchronously. To take advantage, some other task
needs to be able to run concurrently:

.. literalinclude:: examples/checkpointing.py

The output of this script indicates that the Trio event loop is running smoothly.
Still, this doesn't demonstrate much advantage over :func:`trio.to_thread.run_sync`.
You can see for yourself by substituting the function calls, since the call
signatures are intentionally identical.

No, ``trio-parallel`` really shines when your function has significant CPU-intensive
work that regularly involves the python interpreter:

.. literalinclude:: examples/parallel_loops.py

This script should output a roughly equal number of loops completed for each process,
as opposed to the lower and unbalanced number you might observe using threads.

As with Trio threads, these processes are cached to minimize latency and resource
usage. Despite this, executing a function in a process can take orders of magnitude
longer than in a thread when dealing with large arguments or a cold cache.

.. literalinclude:: examples/cache_warmup.py

Therefore, we recommend avoiding worker process dispatch
for synchronous functions with an expected duration of less than about 1 ms.

Controlling Concurrency
-----------------------

By default, ``trio-parallel`` will cache as many workers as the system has CPUs
(as reported by :func:`os.cpu_count`), allowing fair, maximal, truly-parallel
dispatch of CPU-bound work in the vast majority of cases. There are two ways to modify
this behavior. The first is the ``limiter`` argument of :func:`run_sync`, which
permits you to limit the concurrency of a specific function dispatch. In some cases,
it may be useful to modify the default limiter, which will affect all :func:`run_sync`
calls.

.. autofunction:: current_default_worker_limiter

Cancellation and Exceptions
---------------------------

Unlike threads, subprocesses are strongly isolated from the parent process, which
allows two important features that cannot be portably implemented in threads:

  - Forceful cancellation: a deadlocked call or infinite loop can be cancelled
    by completely terminating the process.
  - Protection from errors: if a call segfaults or an extension module has an
    unrecoverable error, the worker may die but the main process will raise
    a normal Python exception.

Cancellation
~~~~~~~~~~~~

Cancellation of :func:`trio_parallel.run_sync` is modeled after
:func:`trio.to_thread.run_sync`, with a ``kill_on_cancel`` keyword argument that
defaults to ``False``. Entry is an unconditional checkpoint, i.e. regardless of
the value of ``kill_on_cancel``. The key difference in behavior comes upon cancellation
when ``kill_on_cancel=True``. A Trio thread will be abandoned to run in the background
while this package will kill the worker with ``SIGKILL``/``TerminateProcess``:

.. literalinclude:: examples/cancellation.py

We recommend to avoid using the ``kill_on_cancel`` feature
if loss of intermediate results, writes to the filesystem, or shared memory writes
may leave the larger system in an incoherent state.

Exceptions
~~~~~~~~~~

.. autoexception:: BrokenWorkerError

Signal Handling
~~~~~~~~~~~~~~~

This library configures worker processes to ignore ``SIGINT`` to have correct semantics
when you hit ``CTRL+C``, but all other signal handlers are left in python's default
state. This can have surprising consequences if you handle signals in the main
process, as the workers are in the same process group but do not share the same
signal handlers. For example, if you handle ``SIGTERM`` in the main process to
achieve a graceful shutdown of a service_, a spurious :class:`BrokenWorkerError` will
raise at any running calls to :func:`run_sync`. You will either
need to handle the exeptions, change the method you use to send signals, or configure
the workers to handle signals at initialization using the tools in the next section.

Configuring workers
-------------------

By default, :func:`trio_parallel.run_sync` draws workers from a global cache
that is shared across sequential and between concurrent :func:`trio.run()`
calls, with workers' lifetimes limited to the life of the main process. This
can be configured with `configure_default_context()`:

.. autofunction:: configure_default_context

This covers most use cases, but for the many edge cases, `open_worker_context()`
yields a `WorkerContext` object on which `WorkerContext.run_sync()` pulls workers
from an isolated cache with behavior specified by the class arguments. It is only
advised to use this if specific control over worker type, state, or
lifetime is required in a subset of your application.

.. autofunction:: open_worker_context
   :async-with: ctx

.. autoclass:: WorkerContext()
   :members:

Alternatively, you can implicitly override the default context of :func:`run_sync`
in any subset of the task tree using `cache_scope()`. This async context manager
sets an internal TreeVar_ so that the current task and all nested subtasks operate
using an internal, isolated `WorkerContext`, without having to manually pass a
context object around.

.. autofunction:: cache_scope
   :async-with:

One typical use case for configuring workers is to set a policy for taking a worker
out of service. For this, use the ``retire`` argument. This example shows how to
build (trivial) stateless and stateful worker retirement policies.

.. literalinclude:: examples/single_use_workers.py

A more realistic use-case might examine the worker process's memory usage (e.g. with
psutil_) and retire if usage is too high.

If you are retiring workers frequently, like in the single-use case, a large amount
of process startup overhead will be incurred with the default "spawn" worker type.
If your platform supports it, an alternate `WorkerType` might cut that overhead down.

.. autoclass:: WorkerType()

Internal Esoterica
------------------

You probably won't use these... but create an issue if you do and need help!

.. autofunction:: default_context_statistics

.. _cloudpickle: https://github.com/cloudpipe/cloudpickle
.. _psutil: https://psutil.readthedocs.io/en/latest/
.. _service: https://github.com/richardsheridan/trio-parallel/issues/348
.. _TreeVar: https://tricycle.readthedocs.io/en/latest/reference.html#tricycle.TreeVar