Diagnosing an issue with plugin modules, process pools, and the import system
This is a walkthrough of the rather surprising mechanics underlying a failing test that brings together plugin modules (that register themselves on import), process pools, and Python’s import system.
What went wrong
It starts with a failed test:
pat = r"\[\d, \d+.\d+, \d+.\d+, \d+.\d+, '\d\d:\d\d'\]"
test_stdout(lambda: learn.fit(1), pat, regex=True)
The idea is to test that calling learn.fit(1) writes text to stdout (standard output) that matches the regex pattern pat. For those less familiar, this form of testing is common in projects that use nbdev.
Here’s what actually happens:
- The nbprocess_test command creates a ProcessPoolExecutor with some number of workers and tasks (each notebook being a separate task).
- Worker 1 processes one task, say task 1, which creates an IPython InteractiveShell, then runs import fastai.callback.progress, which adds ProgressCallback to the variable fastcore.basics.defaults.
- Worker 1 processes another task, say task 3, which creates a fresh InteractiveShell, calls learner.fit, and tests stdout. Since ProgressCallback has been registered in this process by task 1, a progress bar is also printed to stdout, breaking the test.
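The leak in the last step can be reproduced without fastai or execnb. In this minimal sketch (all names hypothetical), a module-level list plays the role of defaults.callbacks: a registration made by one task is still visible to a later task that happens to run in the same worker process:

```python
from concurrent.futures import ProcessPoolExecutor

registered = []  # stand-in for fastcore.basics.defaults.callbacks

def task_register():
    # Simulates importing a plugin module that registers itself on import.
    registered.append("ProgressCallback")

def task_check():
    # A later task running in the same worker process sees the leaked state.
    return list(registered)

if __name__ == "__main__":
    with ProcessPoolExecutor(max_workers=1) as pool:
        pool.submit(task_register).result()
        print(pool.submit(task_check).result())  # ['ProgressCallback']
```

With max_workers=1 both tasks are guaranteed to land on the same worker, so the second task observes state it never set up itself.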
Let’s break down the underlying mechanics. There are three behaviours that come together to cause this sequence of events:
- ProcessPoolExecutors reuse processes. It seems obvious in hindsight since that’s how pools usually work, but I had never realised it until now. In the example above, worker 1 executes task 1 and task 3 in the same process.
- fastai callbacks register themselves on import. In this case, fastai.callback.progress adds ProgressCallback to defaults.callbacks.
- Changes to imported modules persist across IPython InteractiveShells. nbprocess_test runs each test in parallel using execnb, which implements a sub-class of InteractiveShell.
Next, we’ll verify these behaviours with tiny experiments. I highly recommend using tiny experiments to understand complex systems.
ProcessPoolExecutors reuse processes
Perhaps we should know this simply from the name, but I didn’t, so we’ll figure it out with a tiny experiment. Start by creating a pool with 2 max_workers:

from concurrent.futures import ProcessPoolExecutor
pool = ProcessPoolExecutor(max_workers=2)
There aren’t any processes in the pool yet:
pool._processes
{}
Submit a task: the function os.getpid, which will return the process ID of the worker that runs it. Since there are no processes in the pool, submit will start a new worker process and have it execute the task. ProcessPoolExecutor.submit returns a Future object, and Future.result returns the task’s return value:
import os
future = pool.submit(os.getpid)
future.result()
45907
No matter how many times you manually rerun the above cell, it will always be executed on the same process. Notice that the process is now also available in the pool:
pool._processes
{45907: <SpawnProcess name='SpawnProcess-1' pid=45907 parent=45899 started>}
If we submit another task:
future = pool.submit(os.getpid)
future.result()
45907
…it’s still executed on the same process.
Let’s try executing two tasks at the same time:
futures = [pool.submit(os.getpid) for _ in range(2)]
[o.result() for o in futures]
[45907, 45907]
Weird. They’re both executed on the same process…
pool._processes
{45907: <SpawnProcess name='SpawnProcess-1' pid=45907 parent=45899 started>,
45908: <SpawnProcess name='SpawnProcess-2' pid=45908 parent=45899 started>}
It looks like another process was started! I haven’t confirmed this, but I suspect that when we submitted two futures, the pool determined that it needed more workers, so it started another. However, the first worker’s task ended before the second worker started up, so the first worker processed both.
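If that theory is right, making each task slow enough that the first worker is still busy when the second task is dequeued should spread the work across both workers. This is a timing assumption, so the result isn't guaranteed, but it usually holds:

```python
import os
import time
from concurrent.futures import ProcessPoolExecutor

def slow_getpid():
    time.sleep(1)  # keep this worker busy so the pool hands task 2 elsewhere
    return os.getpid()

if __name__ == "__main__":
    with ProcessPoolExecutor(max_workers=2) as pool:
        futures = [pool.submit(slow_getpid) for _ in range(2)]
        print({f.result() for f in futures})  # usually two distinct PIDs
```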
Since we instantiated the pool with 2 max_workers, these two processes will execute all tasks, no matter how many we submit:
futures = [pool.submit(os.getpid) for _ in range(10)]
[o.result() for o in futures]
[45907, 45907, 45907, 45907, 45907, 45907, 45907, 45907, 45907, 45907]
Shut down the pool to free up any resources:
pool.shutdown()
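As an aside, ProcessPoolExecutor also works as a context manager, which calls shutdown for you when the block exits; afterwards, new submissions are rejected:

```python
import os
from concurrent.futures import ProcessPoolExecutor

if __name__ == "__main__":
    with ProcessPoolExecutor(max_workers=2) as pool:
        pid = pool.submit(os.getpid).result()
    # The pool has been shut down here; submitting again raises RuntimeError.
    print(isinstance(pid, int))  # True
```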
fastai callbacks register themselves on import
This one is easy to demonstrate. defaults has no callbacks attribute to start with:
from fastcore.basics import defaults
defaults
namespace(cpus=4)
defaults.callbacks is populated after importing ProgressCallback:
from fastai.callback.progress import ProgressCallback
defaults
namespace(cpus=4,
benchmark=True,
use_cuda=None,
activation=torch.nn.modules.activation.ReLU,
callbacks=[fastai.callback.core.TrainEvalCallback,
fastai.learner.Recorder,
fastai.learner.CastToTensor,
fastai.callback.progress.ProgressCallback],
lr=0.001)
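The pattern behind this is ordinary module-level code: anything at the top level of a module runs as a side effect of the first import. A stand-alone sketch of the idea, using stand-ins rather than fastai's actual code:

```python
from types import SimpleNamespace

defaults = SimpleNamespace(cpus=4)  # stand-in for fastcore.basics.defaults

class ProgressCallback:  # stand-in for fastai's callback class
    pass

# In fastai, code much like this sits at the top level of
# fastai.callback.progress, so merely importing the module mutates defaults.
if not hasattr(defaults, "callbacks"):
    defaults.callbacks = []
defaults.callbacks.append(ProgressCallback)

print(defaults.callbacks)  # a one-element list containing ProgressCallback
```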
Changes to imported modules persist across IPython InteractiveShells
Why is any of the above a problem? Didn’t we say that nbprocess_test creates a separate shell for each notebook? Yes it does, but it turns out that changes to imported modules persist across shells.
from execnb.shell import CaptureShell
First make sure that CaptureShell doesn’t have a foo attribute - this will make sense in a second:
assert not hasattr(CaptureShell, 'foo')
Now add the foo attribute:
CaptureShell.foo = 'bar'
We can see foo inside a CaptureShell:
shell = CaptureShell()
shell.run('from execnb.shell import CaptureShell; CaptureShell.foo')
shell.result
'bar'
This happens because when we first imported from the execnb.shell module it was cached in sys.modules:
import sys
sys.modules['execnb.shell']
<module 'execnb.shell' from '/Users/seem/code/execnb/execnb/shell.py'>
In fact, sys.modules['execnb.shell'].CaptureShell is another name for CaptureShell:
sys.modules['execnb.shell'].CaptureShell is CaptureShell
True
Python caches imports to speed up consecutive imports from the same modules. InteractiveShell (and its sub-classes) share the same sys.modules, which causes this behaviour:
shell = CaptureShell()
shell.run("import sys; sys.modules['execnb.shell'].CaptureShell.foo")
shell.result
'bar'
exec with empty globals and locals uses the same sys.modules too, so it doesn’t avoid the issue:
exec("import sys; print('execnb.shell' in sys.modules)", {}, {})
True
exec("from execnb.shell import CaptureShell; print(CaptureShell.foo)", {}, {})
bar
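One way to get a genuinely fresh copy of a module is to evict it from sys.modules before re-importing, which forces the module's code to run again - with the caveat that existing references still point at the old, mutated module object. A small demonstration using a stdlib module:

```python
import sys

import json                # any module works for the demonstration
json.FOO = "bar"           # mutate the cached module object

del sys.modules["json"]    # evict it from the import cache
import json as fresh_json  # re-import: builds a brand-new module object

print(hasattr(fresh_json, "FOO"))  # False: the fresh copy is clean
print(hasattr(json, "FOO"))        # True: old references keep the mutation
```

This also shows why the pollution above is so sticky: unless every consumer re-imports after the eviction, some code keeps seeing the old module.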
In the end, we agreed that the test itself was broken, because it made assumptions about its environment without ensuring that they were true. Tests like this may fail - regardless of the above behaviour - if we import any module that registers a callback. For example, this fails too:
from fastai.callback.progress import *
from nbprocess.test import test_nb
test_nb('nbs/nb_with_failing_test.ipynb')
The fix is to explicitly run learner.fit with the precise list of callbacks required.
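The shape of that fix, sketched with a hypothetical fit signature rather than fastai's actual API: fall back to ambient defaults only when the caller doesn't pin the callback list, and have the test pass an explicit list so it no longer depends on what other imports have registered:

```python
from types import SimpleNamespace

# Stand-in for module state that some other import may have polluted.
defaults = SimpleNamespace(callbacks=["ProgressCallback"])

def fit(n_epoch, callbacks=None):
    # Hypothetical sketch: use the ambient defaults only when no explicit
    # callback list is given.
    cbs = defaults.callbacks if callbacks is None else callbacks
    return cbs

print(fit(1))                 # depends on ambient state: ['ProgressCallback']
print(fit(1, callbacks=[]))   # []: deterministic, safe for stdout tests
```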