FEAT: extend profiling to child processes#431
Conversation
|
Did some more tests on local post-#428-merge, maybe it is just legacy Python and dependency versions causing the issues. Will just rebase, force-push, and see what happens. |
2cd2ed4 to
f9a37af
Compare
- `line_profiler/curated_profiling.py`
New module for setting up profiling in a curated environment
- `ClassifiedPreimportTargets.from_targets()`
Method for creating a `ClassifiedPreimportTargets` instance,
facilitating writing pre-import modules in a replicable and portable
manner
- `ClassifiedPreimportTargets.write_preimport_module()`
Method for writing a pre-import module based on an instance;
also fixed bug where the body of the written module was intercepted
without appearing in the debug output
- `kernprof.py`
- `_gather_preimport_targets()`
Migrated to `line_profiler.curated_profiling`
- `_write_preimports()`
Now using the new `ClassifiedPreimportTargets` class, moving esp.
the logic to the `write_preimport_module()` method
- `kernprof.py::_manage_profiler` `line_profiler/curated_profiling.py::CuratedProfilerContext` New context-manager classes for handling profiler setup and teardown - `kernprof.py::_pre_profile()` Refactored into the above context managers and other private functions (`_prepare_profiler()`, `_prepare_exec_script()`)
line_profiler/_child_process_profiling/cache.py::LineProfilingCache
New class for passing info onto child processes so that profiling
can resume there
line_profiler/pth_hook.py
New submodule for the .pth-file-based solution to propagating
profiling into child processes:
write_pth_hook()
In the main process, write the temporary .pth file to be loaded
in child processes
load_pth_hook()
Called by the .pth in child process, loading the cache and
setting up profiling based thereon
line_profiler/_child_process_profiling/cache.py::LineProfilingCache
Added new `.profile_imports` attribute to correspond to `kernprof`'s
`--prof-imports` flag
line_profiler/_child_process_profiling/meta_path_finder.py
New submodule defining the `RewritingFinder` class, a meta path
finder which rewrites a single module on import
line_profiler/_child_process_profiling/pth_hook.py
write_pth_hook()
Now also handling the `os.fork()` patching/wrapping
_setup_in_child_process()
Now creating a `RewritingFinder` to mirror what
`~.autoprofile.autoprofile.run()` does in the main process
.
line_profiler/_child_process_profiling/cache::LineProfilingCache
Refactored `.load()`
line_profiler/_child_process_profiling/multiprocessing_patches.py
New submodule for applying patches to the `multiprocessing`
package, so that profiling is automatically set up in child
processes created by it
line_profiler/_child_process_profiling/cache.py::LineProfilingCache
<general>
Added debug logging to various methods
gather_stats()
New method for gathering profiling stats from child processes
inject_env_vars()
New method for injecting `.environ` into `os.environ`
line_profiler/line_profiler.py::LineStats
get_empty_instance()
New convenience method for creating an empty instance
from_files()
Added new argument `on_defective` to allow for processing a
group of files that cannot all be correctly read
line_profiler/rc/line_profiler.toml::[tool.line_profiler.kernprof]
Added new key-value pair `prof-child-procs` for the default value
of `kernprof --prof-child-procs`
kernprof.py
- New boolean flags
`[--prof-child-procs[=...] | --no-prof-child-procs]` for
controlling whether to set up profiling in child processes
- Fixed bug in `_manage_profiler.__exit__()` where
`CuratedProfilerContext.uninstall()` can be skipped if the
preceding code raises an error
kernprof.py::_prepare_child_profiling_cache()
- Now respecting ${LINE_PROFILER_KEEP_TEMPDIRS}
- Now setting `LineProfilingCache.debug`
line_profiler/_child_process_profiling/cache.py::LineProfilingCache
- Added new attributes `.debug` and `._debug_log`
- Now diverting debug messages to log files in `.cache_dir`
line_profiler/_child_process_profiling/cache.py::LineProfilingCache
add_cleanup()
Now deferring to a `._add_cleanup()` method which allows for
cleanup-function prioritization
_debug_output()
Fixed type-checking
line_profiler/_child_process_profiling/multiprocessing_patches.py
::apply()
Added debug output before `_setup_in_child_process()` is called to
help with tracing
line_profiler/_child_process_profiling/pth_hook.py
load_pth_hook()
_wrap_os_fork()
Added debug output before `_setup_in_child_process()` is called
to help with tracing
_setup_in_child_process()
- `wrap_os_fork` now defaults to false
- `prof.dump_stats()` now has increased priority over other
callbacks (doesn't seem to help with the malformed prof files
though...)
- Child-process profiling output now written to a less
randomized filename to facilitate debugging
line_profiler/_child_process_profiling/cache.py::LineProfilingCache
profiler
New attribute for the profiler instance
copy(..., inherit_profiler=...)
New argument for inheriting the `.profiler`
load()
Now keeping track of the loaded instance and returning it in
subsequent calls
line_profiler/_child_process_profiling/multiprocessing_patches.py
::apply(..., lp_cache=None)
- If the `LineProfilingCache.load()`-ed instance is consistent with
that loaded from `cache_path`, the former is used
- Added more debugging output
line_profiler/_child_process_profiling/pth_hook.py
load_pth_hook()
Added more debugging output
_wrap_os_fork()
Updated debugging output
_setup_in_child_process()
- Now returning a boolean (whether setup has been newly done)
- Now setting `.profiler` of the cache instance
- Added moew debugging output
kernprof.py::_manage_profiler.__enter__()
Updated so that the created `LineProfilingCache` instance carries a
`.rewrite_module`
line_profiler/_child_process_profiling/cache.py::LineProfilingCache
Added an optional `.rewrite_module` attribute
line_profiler/_child_process_profiling/import_machinery.py
::RewritingFinder.find_spec()
Now looking at `.lp_cache.rewrite_module` (where available) to check
for specs to return
line_profiler/_child_process_profiling/
cache.py::LineProfilingCache
_replace_loaded_instance()
New convenience method for an instance in a fork to replace
the instance to be `.load()`-ed
_consistent_with_loaded_instance
New attribute for checking whether the instance is
consistent with what would have been `.load()`-ed
multiprocessing_patches.py
bootstrap(..., lp_cache=...)
Can now be `None`, which defers the `.load()`-ing of the
cache instance
apply()
- Streamlined logic for retrieving the loaded instance
- Now using the above deferred loading whenever appropriate,
so that cleanup and profiling is preserved in forked
processes
pth_hook.py::_wrap_os_fork()
Now using `._replace_loaded_instance()`, so that future calls to
`.load()` in the forked process retrieves the newly-created
instance
kernprof.py::_prepare_child_profiling_cache()
- Updated call to `[...].multiprocessing_patches.apply()`
- Now always setting up the created instance as the one returned by
further calls to `.load()`
line_profiler/_child_process_profiling/multiprocessing_patches.py
PickleHook
- Refactored to contain no instance variables
- Now always using `LineProfilingCache.load()` to retrieve the
appropriate cache instance
bootstrap()
Removed argument `lp_cache`
get_preparation_data()
Removed arguemnt `cache_path`
apply()
- Removed argument `cache_path`
- Argument `lp_cache` now required
- Simplified implementation
line_profiler/_child_process_profiling/import_machinery.py
Removed
line_profiler/_child_process_profiling/pth_hook.py
::_setup_in_child_process()
No longer set up the `RewritingFinder` because messing with the
import system doesn't help with propagating autoprofiling rewrites
to child processes...
kernprof.py
_dump_filtered_stats()
Fixed bug where if no tempfile remains, the `extra_line_stats`
are not merged into the dumped stats
_prepare_child_profiling_cache()
Now setting the `.profiler` of the returned cache object
line_profiler/_child_process_profiling/multiprocessing_patches.py
::_apply_mp_patches()
- Added debugging output for the patches
- Now patching the copy of `runpy` imported by
`multiprocessing.runpy`
line_profiler/_child_process_profiling/pth_hook.py
_wrap_os_fork()
No longer creating a new `LineProfiler` instance (helps with
handling forked processes)
_setup_in_child_process(..., prof=...)
New argument for avoiding instantiating a new profiler when not
necessary (e.g. in a forked process)
line_profiler/_child_process_profiling/runpy_patches.py
New submodule for the aforementioned patching of `runpy`
tests/test_child_procs.py
test_running_multiproc_literal_code()
New test paralleling `test_running_multiproc_{script,module}` to
test `kernprof -c ...`
test_multiproc_script_sanity_check()
- Refactored parameters for better `pytest` output
- Added testing for running the code with `python -c ...`
<Misc>
- Added CLI argument `--local` to the profiled module to toggle
between a locally-defined summing function and an imported one
- Refactored how the test modules are injected
- Added debugging output to `subprocess.run()` calls
- Added provisional support for examining the profiling data
tests/test_child_procs.py::test_multiproc_script_sanity_check()
Now parametrized to test passing the function defined in the test
module itself to `multiprocessing`
tests/test_child_procs.py
test_running_multiproc_{module,literal_code}()
Integrated into `test_running_multiproc_script()`
test_running_multiproc_script()
Extended parametrization
tests/test_child_procs.py
test_profiling_multiproc_script()
Test parallel to `test_running_multiproc_script()`, checking
whether we are correctly profiling the child processes
<General>
- Added more docs
- Updated dummy parameter names
_ext_module, _test_module
- Refactored how the fixtures are set up
- Module names now randomized and clash-proof via `uuid.uuid4()`
_run_subproc()
- Moved code outputting captured streams from
`_run_test_module()` to here
- Added timing code
tests/test_child_procs.py
TEST_MODULE_BODY, [_]test_module()
Added CLI flag to select `multiprocessing` start methods
_Params
New convenience class for test parametrization
test_multiproc_script_sanity_check()
- Streamlined parametrization (15 subtests -> 10)
- Added subtests for various `multiprocessing` start methods
test_multiproc_script_sanity_check()
- Streamlined parametrization (24 subtests -> 21)
- Added subtests for various `multiprocessing` start methods
tests/test_child_procs.py
test_module(), ext_module()
Updated so that we can toggle for the function sent to
`multiprocessing` to raise an error with the `--force-failure`
CLI flag
_run_test_module()
- Now raising a new `ResultMismatch` error class (instead of
using base assertions) for:
- If `test_module()` writes the wrong number to stdout
- If `nhits` are provided and the profiling results differ
therefrom
- Added argument `fail` for using the aforementioned
`--force-failure` flag
test_multiproc_script_sanity_check()
Now also chceking the cases where the test module is run with
`--force-failure`
test_profiling_multiproc_script()
Now also chceking the cases where the test module is run with
`--force-failure`
(FIXME: profiling bugged when the function errors out, and
doesn't fail with a consistent pattern)
tests/test_child_procs.py
_Params.drop_params(), .split_on_params()
New helper methods for constructing instances based on existing
ones
_check_warnings
New context manager for helping with checking the absence and
presence of warnings at exit time
_run_test_module()
(This function underlies these tests:
- `test_multiproc_script_sanity_check()`
- `test_running_multiproc_script`
- `test_profiling_multiproc_script_{success,failure}()`)
Added protection against stray .pth file created when running
`kernprof`
test_cache_setup_child()
Added check that warnings for empty profiling files are emitted
iff expected
test_apply_mp_patches_{success,failure}()
- Split parametrized test into separate tests, and only allow
retries on the known flaky cases
- Added check that no stray warnings are emitted
test_profiling_multiproc_script_{success,failure}()
test_profiling_bare_python_{success,failure}()
Split parametrized tests into separate tests, and only allow
retries on the known flaky cases
line_profiler/_child_process_profiling
cache.py::LineProfilingCache._handle_signal()
Added more output and handling for the outcome of the
`.cleanup()` call
multiprocessing_patches.py::wrap_bootstrap()
Now calling cleanup with the `new_thread=True` argument to
prevent deadlocks when SIGTERM arrives at an inopportune time --
fellow child threads seem to play well together with the lock,
but not the main thread and a child thread
line_profiler
curated_profiling.py::ClassifiedPreimportTargets.write_preimport_module()
line_profiler.py::LineStats.from_files()
Fixed calls to `warnings.warn()` so that the warnings are
attributed to more informative frames
line_profiler/line_profiler_utils.py::CallbackRepr
Migrated from `line_profiler/cleanup.py::_CallbackRepr`
line_profiler/_child_process_profiling/
cache.py::LineProfilingCache
_dump_stats()
New pseudo-method set by `._setup_in_child_process()` for
writing profiling output
@_method_wrapper
Refactored implementation to use `._format_call()` for
formatting calls in debug-log messages
_format_call()
New convenience method for formatting a call
runpy_patches.py::_run()
Refactored implementation to use:
- `cache._format_call()` for formatting calls in debug-log
messages
- `Cleanup.patch()` for monkeypatching `exec()`
tests/conftest.py
_RetryFailure
New exception class for when a retry condition fails evaluation
_RetryHelper
condition
New attribute for text condition
manage_call()
Added handling for evaluating `.condition` before each retry
using the test function's globals locals
get_helper()
Added handling for the `condition=...` parameter of the
marker:
- If `False`, returns `None`
- If a string, returns an instance with `.condition` set
pytest_configure()
Updated help text for the `@pytest.mark.retry` marker
tests/test_retry_tests.py
_TestModule
run()
New params `additional_std{out,err}_lines` for additional
patterns to match the output streams to
check_verbose_summary(), check_concise_summary()
New params `std{out,err}` for additional patterns to match
the output streams to
conditions_module
New `_TestModule` fixture for a test module using
`@pytest.mark.retry(condition=...)`
test_retry_conditions()
New test using `conditions_module` to check the behavior of
`@pytest.mark.retry(condition=...)`
line_profiler/_child_process_profiling/multiprocessing_patches.py
TaskWrapper
New pickleable wrapper object around a callable to be sent as
tasks to child processes, pking sure that rofiling output is
written before the call result is pushed back to the parent (and
thus leaving the child prone to termination)
Patch
New helper object for declaratively constructing patches for
subcomponents, reworked from `_apply_patches_generic()`; the
patches are:
- `pool`:
Patch `multiprocessing.pool.Pool`, so that each tasks write
profiling output before pushing result back to the parent
process
- `process`:
Patch `multiprocessing.process.BaseProcess`, so that each
child process write profiling output before exiting
- `logging`:
Patch logging functions in `multiprocessing.util`, so that the
internal logs of `multiprocessing` are teed to the session
cache's debug logs
wrap_get_tasks(), wrap_guarded_task_generation()
New wrappers for the eponymous methods in
`multiprocessing.pool.Pool` which wraps the input callables in a
`TaskWrapper`
apply()
- Merged with `_apply_patches_generic()` which it was a thin
wrapper for
- Added param `patches` for explicitly supplying which patches
to apply
- Refactored internals to work with `Patch` objects
line_profiler/rc/line_profiler.toml
[tool.line_profiler.child_processes.multiprocessing]
Removed key `intercept_logs` (superseded by `patches.logging`)
[tool.line_profiler.child_processes.multiprocessing.patches]
New subtable of boolean switches for whether to apply each of
the aforementioned patches
[toll.line_profiler.child_processes.multiprocessing.polling]
Updated obsolete comments
line_profiler/_child_process_profiling/multiprocessing_patches.py
Patch.apply(static=...)
Changed default to true
_PATCHES['pool'].targets['Pool']['_get_tasks']
Updated to work with the new default
line_profiler/cleanup.py::Cleanup.patch()
Added new parameter `static` to toggle whether to use
`inspect.getattr_static()` to get the pre-patch value; now
defaulting to true, while the old behavior is equivalent to false
tests/test_child_procs.py
_preserve_obj_attributes
_preserve_attributes[.fetch_current_values()]
New param `static` for toggling whether to use
`inspect.getattr_static()`
_preserve_attributes.compare_with_current_values()
- New param `static` for toggling whether to use
`inspect.getattr_static()`
- Refactored to give more useful output and to only raise the
`AssertionError` after all comparisons have been made
get_patched_attributes()
New convenience function superseding the `patched_attributes`
fixture
test_cache_setup_main_process()
Refactored to test the various combinations of applying
`multiprocessing` patches
test_apply_mp_patches_{success,failure}()
- Refactored to test the various combinations of applying
`multiprocessing` patches
- Further narrowed the scope of the `@pytest.mark.retry` on the
failing case: retries are only attempted when the
fully-deterministic `pool` patch isn't used
test_profiling_{multiproc_script,bare_python}_failure()
No longer using the `@pytest.mark.retry` decorator because the
'pool' patch is used by default
test_load_pth_hook()
Updated to check (against) the correct patched attributes
line_profiler/_child_process_profiling/cache.py::LineProfilingCache
_additional_data
New free-for-all field allowing for unstructured data to be
`.dump()`-ed and `.load()`-ed
.dump(), .load(), ._from_path()
Updated implementations to handle `._additional_data`
_warn_possible_lack_of_stats()
New method for arbitrary (child or main) process to communicate
that some PIDs are expected to not have written to their
profiling-stats file ("maybe-bad PIDs"); this fixes the stray
empty-file warning sometimes emitted when the 'process' patch is
not applied
_get_pids_possibly_lacking_stats()
New method for retrieving said PIDs
line_profiler/_child_process_profiling/multiprocessing_patches.py
Massively refactored.
_PollerArgs.new()
Refactored from `.from_config()`
MPConfig
New object for holding the submodule-specific config options
extracted from `LineProfilingCache.load().config`
_Patch
New protocol for objects (e/g. `Patch`) capable of
`.apply()`-ing themselves via a `LineProfilingCache` and
providing a `.summary` of the patched entities
Patch.add_method()
New convenience method for adding a single (instance, class, or
static) method to be patched
wrap_repopulate_pool_static()
New wrapper for `Pool._repopulate_pool_static()` which alerts
the session cache to "maybe-bad PIDs"; this is needed because
when wrapping only `Pool` and not `Process`, processes which
exited without receiving a single task would leave their
profiling-stats file empty (see
`LineProfilingCache._warn_possible_lack_of_stats()` above)
wrap_terminate(), wrap_bootstrap()
Simplified implementations using `MPConfig`
RebootForkserverPatch, RunpyPatch
New helper `_Patch` classes consisting of functionalities
previously embedded in `apply()`
ResourceTrackerPatch
New helper `_Patch` class patching
`multiprocessing.resource_tracker.ResourceTracker` to set up
the reporting of "maybe-bad PIDs"; this is needed because the
resource-tracker server, while not directly handling the
profiled parallel workload, is long-lived and may not have
exited (and thus written profiling stats) at stats-gathering
time; supersedes `cache.py::_DEFAULT_GATHER_STATS_EXCLUDES`
apply()
- Renamed parameter `lp_cache` -> `cache`
- Simplified implementation because the components are all
outsourced to `_Patch` objects
tests/test_child_procs.py
test_cache_dump_load()
Added check for the round-tripping of `._additional_data`
_test_apply_mp_patches()
Fixed minor typing issue
_GLOBAL_MINIMAL_PATCHES, DEFAULT_GLOBAL_PATCHES, etc.
Updated definitions so that we check for the patching and
restoration of the correct attributes
tests/conftest.py
RetryMarker
Refactored from `_RetryHelper`:
- `pytest_pyfunc_call()` now issues a warning for malformed
`@pytest.mark.retry(...)` instead of erroring out
- New attribute `require` for controlling whether to require the
test to pass on 'all' or 'any' retries; old and default
behavior equivalent to `'any'`
- New attribute `is_active` for representing
`@pytest.mark.retry(...)` setups where retries shouldn't be
attempted
- Constructor method `from_test_func()` refactored from
`get_helper()`, and now always return an instance instead of
`None`
- New constructor method `from_arguments()` with doctest
RetryMarkerWarning
New warning class for malformed `@pytest.mark.retry(...)`
markers
RetryConditionFailure
Refactored from `_RetryFailure`
tests/test_retry_tests.py
<General>
Simplified implementation of `_TestModule` fixtures
test_bad_markers()
New test that malformed `@pytest.mark.retry(...)` results in the
appropriate warnings instead of test failure
test_requirement()
New test that `@pytest.mark.retry(require=...)` works as
expected
tests/test_child_procs.py::test_apply_mp_patches_failure()
Now retrying with `require='all'` to test AGAINST flakiness
line_profiler/cleanup.py::Cleanup.cleanup()
Now taking an optional `reason` argument which is reflected in
log messages
line_profiler/_child_process_profiling/cache.py
::LineProfilingCache._handle_signal()
- Updated the log message written to (hopefully) make it easier to
catch corner cases where the `.cleanup()` call somehow didn't
happen
- Added a `reason` to the `.cleanup()` call
line_profiler/_child_process_profiling/multiprocessing_patches.py
::wrap_bootstrap()
line_profiler/curated_profiling.py::CuratedProfilerContext.uninstall()
Added `reason`s to the `.cleanup()` calls
line_profiler/_child_process_profiling/multiprocessing_patches.py
TaskWrapper, wrap_get_tasks(), wrap_guarded_task_generation()
Removed
wrap_worker()
New wrapper function for wrapping
`multiprocessing.pool.worker()` and writing profiling output
ASAP as it catches the sentinel value
line_profiler/_child_process_profiling/cache.py::LineProfilingCache
_stats_dumper
New private attribute superseding `._dump_stats`
cleanup()
No longer overriding the base-class method
_setup_in_child_process()
Now setting up the `._stats_dumper` attribute instead of
`._dump_stats`
_handle_signal()
- No longer calling `.cleanup()` to avoid deadlocks
- Added handling for when the signal handler was originally set
from outside of Python
line_profiler/_child_process_profiling/multiprocessing_patches.py
wrap_runner(), wrap_bootstrap()
No longer calling `LineProfilingCache.load().cleanup()` with the
`new_thread` argument
line_profiler/_child_process_profiling/cache.py::LineProfilingCache
_stats_dumper
Now a convenience subtype of `Cleanup` which can be directly
called
_setup_in_child_process(), _handle_signal()
Updated setup and handling for `._stats_dumper`
line_profiler/_child_process_profiling/multiprocessing_patches.py
TaskWrapper, wrap_get_tasks(), wrap_guarded_task_generation()
Partially reverted deletion because we need these on Windows
wrap_worker()
Patch now only applied when on POSIX (Windows uses the above
instead)
tests/test_child_procs.py::test_apply_mp_patches_{success,failure}()
- Updated implementations
- Loosened checks against empty-file warnings
- Updated retrying rules for `test_apply_mp_patches_failure()`:
- Only retrying when on Windows and `patch_pool` is false
- Reverted back to `require='any'`
tests/test_retry_tests.py::test_retry_conditions()
Fix failure when running in a narrow terminal;
see `pytest-mark-retry` commit d53a515
|
As it is now, the code (at least on non-Windows platforms) already reliably and deterministically captures all profiling data related to tasks set to the child processes. If the child process gets to execute any task at all, it must have gone through setup which allows for the capturing of profiling data. What however is more difficult to handle consistently, and is causing most of the recent pipeline failures (other than 25785347402, which was entirely on me), is what happens with child processes which terminates without ever having received a task.1 While currently the Such failure calls into question:
Footnotes
|
line_profiler/_child_process_profiling/multiprocessing_patches.py
wrap_worker_pool()
Renamed from `wrap_worker()` because another function now also
patches `multiprocessing.pool.worker()`
wrap_handle_results(), wrap_worker_pid(), wrap_process()
Parts of the new patch `child_pids` which patchs resp.
- `multiprocessing.pool.Pool._handle_results()`
- `multiprocessing.pool.worker()`
- `multiprocessing.pool.Pool.Process()`
To help the parent process keep track of which child process
having completed which task, and hence be able to warn the
session cache against profiling-stats file expected to be empty
apply()
Added documentation on the `child_pids` patch
line_profiler/rc/line_profiler.toml
::[tool.line_profiler.child_processes.multiprocessing.patches]
Added boolean key `child_pids` for whether to apply the new patch
tests/test_child_procs.py
_test_apply_mp_patches()
- Parameters `patch_pool`, `patch_process`, and `intercept_logs`
now permit the value `None` for using the default-config
values
- New parameter `trace_pids` for toggling the `child_pids` patch
- Updated the `line_profiler` warning check to only happen when
the `child_pids` patch is applied (since that is when we can
reliably weed out the empty-file warnings)
test_apply_mp_patches_success()
- Reworked most subtests to just use the default value for
whether to apply the `logging` patch
- Added subtest for whether to apply the `child_pids` patch
test_apply_mp_patches_failure()
Now using the default for whether to apply the `logging` patch
tests/test_child_procs.py
@_timeout, _TestTimeout
New decorator and error class for timing out function execution
using threads
_test_apply_mp_patches_inner()
Now timing out the tested function
(`test_module_object.sum_in_child_procs()`) which uses
`multiprocessing`
line_profiler/_child_process_profiling/multiprocessing_patches.py
::wrap_process()
- Fixed bug where the if `Process.join()` and `Process.terminate()`
are called more than once, the second call onwards will always
report the process' having completed 0 tasks
- Enclosed the bookkeeping in the above methods in a try-except
block so that the original implementation is always run
|
Added the PID bookkeeping, but there seems to be an unrelated Heisenbug which I'm struggling to pin down. Every ≈ 1000–2000 runs of The newest pipeline failures are semi-related:
|
tests/test_child_procs.py
@_timeout
Now no longer raising an error from the new thread when the main
thread has timed out, causing the file handle to be destroyed
_run_test_module(), run_{module,script,literal_code}()
Fixed bug where if the subprocess timed out the debug logs won't
be written
_test_apply_mp_patches_inner()
No longer using `pytest.raises()` to check for the error when
`fail=True` to allow for other `RuntimeError`s (e.g. timeouts)
to be propagated
line_profiler/_child_process_profiling/cache.py
_DumpStatsHelper.cleanup()
New parameter `force` for forcing the writing of the profiling
stats even if the cleanup stacks are empty
LineProfilingCache._atexit_hook()
New property for a persistent handle on the hook passed to
`atexit.register()`, so that it can be unregistered when
necessary
line_profiler/_child_process_profiling/multiprocessing_patches.py
::wrap_{worker_pool,bootstrap}()
- Now unregistering `LineProfilingCache.load()._atexit_hook()` with
`atexit.unregister()` to hopefully take care of end-of-process
deadlocks in child processes
- Now prioritizing cleanup speed in child processes, using
`cache._stats_dumper.cleanup()` instead of `cache.cleanup()` where
appropriate to avoid complications
line_profiler/_child_process_profiling/cache.py::LineProfilingCache
@_method_wrapper
Now also handling `BaseException`
_format_exception()
New convenience method
line_profiler/_child_process_profiling/multiprocessing_patches.py
::wrap_process()
Added more debugging output in `.join()` and `.terminate()` of the
created `BaseProcess`
|
@Erotemic yikes, unfortunately job 76507649976 seems to have gotten stuck despite multiple precautions (timeouts for dubious sections, hundreds/thousands of rounds of local tests)... worse, its the Mac job that's stuck (which I suppose is the most expensive on CI), despite that being the most thoroughly tested env, since it's also my local. I don't have the perms to cancel the job; please do so ASAP before it continues to rack up compute. Terribly sorry for this. |
|
Yikes! Just saw the message. I don't see an immediate way to cancel it. Maybe it timed out and it just looks like it is churning? It does say: "The job has exceeded the maximum execution time of 6h0m0s", and in this PR I see: Cancelled after 365m. So I think it is not running. |
|
Yep, I think Actions have a default 6 h limit (not sure if it's the entire pipeline or individual jobs) on GitHub. This happened on Monday so said default timeout expired a while before intervention... it may not be ideal, but we can probably put in loose-ish timeouts for For the (probably1) offending test We can probably mitigate this by folding the cases tested by Footnotes
|
|
When trying to sniff around and trigger and fix the bug I ran into an even weirder issue.
This seems to indicate something being wrong when profiling starts on a non-main thread; in view of that, how |
This PR adds support for
kernprofto profile code execution in child Python processes, building on ongoing work (see Credits).Usage
The EXPERIMENTAL new flags
--no-prof-child-procsand--prof-child-procs[=...]are added to kernprof. By setting--prof-child-procsto true, child Python processes created by the profiled process are also profiled:1Note how the
sum_worker()calls are profiled:Highlights
os.system()andsubprocess.run()multiprocessing2multiprocessing"start methods" ('fork','forkserver', and'spawn') tested to be compatible, where available on the platform--preimportsor via test-code rewriting) replicated in child processesExplanation
line_profiler._child_process_profiling.cache.LineProfilingCache) is created by the main process, containing session config information (e.g. values for--prod-modand--preimports) so that profiling can be replicated in child processes..pthfile is created; Python processes inheriting the right environment will thus go through profiling setup, while those without the env var (and just happens to share the Python executable) will be minimally affected.os.fork()(where available) is patched with a wrapper which ensures consistent global states.coverage.multiproc, variousmultiprocessingcomponents are patched (line_profiler._child_process_profiling.multiprocessing_patches.apply()) so that child processes can retrieve the cache and explicitly cleanup before exiting. Patches are inherited by forked child processes and reapplied by spawned ones.multiprocessingchild processes are allowed to fully clean up and write their profiling, even when the parallel workload errors out,3 additional patches are made tomultiprocessing.kernprofthen gather and merge with the profiling result in main process.Code changes
New code (click to expand)
line_profiler/_threading_patches.pyNew submodule patching
threadingfor the consistent gathering of profiling data between tracing modes.apply():When legacy tracing is used (Python < 3.12 or
LINE_PROFILER_CORE=ctrace), patchthreading.Thread.__init__()so that the profiler starts with the same.enable_counton the new thread as with the parent thread; this is necessary for correctness because the profiler may otherwise not be enabled on the new thread.line_profiler/cleanup.pyNew submodule defining the
Cleanupclass, which handles various setup/cleanup tasks like:line_profiler/curated_profiling.pyNew submodule containing mostly relocated code from
kernprof, so that child processes can more easily reestablish profiling:ClassifiedPreimportTargets:Object resolving and classifying the
--prof-mods, and writing a corresponding preimport moduleCuratedProfilerContext:Context manager managing the state of the
LineProfiler, e.g.:line_profiler.profileon startupthreading(see_threading_patchesabove) so that the profiler stays enabled on newly spawned threads.enable_counts on teardownline_profiler/_child_process_profiling/New private subpackage for maintaining the states, setting up the hooks, and performing the patches which makes it possible to profile child processes:
cache.py::LineProfilingCache:"Session state" object. It:
Can be auto-(de-)serialized in the main and child processes based on env-var values, managing setup (module patches, profiler curation, eager pre-imports) and cleanup (tempfile management, dumping and gathering of profiling results) in each process.
Injects the following environment variables, which are inherited by child processes:
${LINE_PROFILER_PROFILE_CHILD_PROCESSES_CACHE_PID}: main-process PID${LINE_PROFILER_PROFILE_CHILD_PROCESSES_CACHE_DIR_<PID>}: location of the cache directoryFrom the combination of both, child processes can retrieve the cache by calling
.load().multiprocessing_patches.py::apply():Apply patches to these
multiprocessingmodule components so that profiling results are properly gathered on child-process exit:Process(read:multiprocessing.process.BaseProcess):._bootstrap():Wrapped call to:
LineProfilingCache.cleanup()on exit.SIGTERMhandler to do the above as a failsafe..terminate():Wrapped call to poll on the child process, and soft-block (with a timeout) until it is deleted.
spawn.runpy:Replaced with a localized, patched clone of
runpy(seerunpy_patches.pybelow). This is necessary for profiling to function in non-eager-preimports mode (--no-preimports).forkserver.ForkServer:Global instance rebooted if
reboot_forkserver=Trueso that child processes fork with the profiling infrastructure set up properly.pth_hooks.py:Facilities for effecting profiling-code execution in child processes by injecting a temporary
.pthfile into the current venv. This module is kept as minimal as possible to minimize the amount of startup code run as the mere result of having said.pthfile.write_pth_hook():Set up a
.pthfile under the directorysysconfig.get_path('purelib')which callsload_pth_hook()(see below). The.pthfile will be cleaned up by the supplied cache object.load_pth_hook():For processes inheriting a matching "parent PID" from the environment (see
LineProfilingCacheabove), load the cache and set up theLineProfilerinstance used, like how the mainkernprofprocess does.runpy_patches.py::create_runpy_wrapper():Make a clone of the
runpymodule which checks if the code executed is the code to be profiled; if so, it goes through the same code-rewriting facilities thatline_profiler.autoprofile.autoprofile.run()uses to set up profiling.threading_patches.py::apply():Patch
threading.Thread.__init__()so that when using legacy tracing the profiler is suitably enabled on newly-spawn thread, depending on whether it is enabled on the current thread.tests/conftest.pyAdd the
@pytest.mark.retrymarker so that we can retry flaky tests._RetryEntry:Object containing info on retried test items so that they can be summarized when the session ends.
_RetryHelper:Object implementing the retrying of failing test functions (
pytest.Function) and the teardown of function-scoped fixtures between retries.pytest_pyfunc_call():Hook implementation for instantiating a helper to handle test rerunning if necessary.
get_helper():Instantiate from a
pytest.Functionmarked with@pytest.mark.retry(...).pytest_addhooks():Hook implementation for registering
_RetryHelperas a plugin.pytest_configure():Hook implementation for defining the
@pytest.mark.retrymarker, so that it doesn't produce apytest.PytestUnknownMarkWarningwhen used.pytest_terminal_summary():Hook implementation for writing a summary section if any test is retried.
tests/test_retry_tests.pyTest that the
@pytest.mark.retrymarker:test_fixture_scoping()), esp. the teardown on non-persisted function-scoped fixtures between retries (test_fixture_teardown())test_exception_restrictions())pytest.PytestUnknownMarkWarningwhen used.tests/test_child_procs.py_ModuleFixture:Helper object which handles:
tests/test_cython.py::propose_name()) to avoid clashes; andsys.pathandos.environ['PYTHONPATH']._Params:Helper object which handles concatenation and Cartesian products of parametrizations.
ext_module:New
_ModuleFixturerepresenting a module defining the sum function used bytest_modulewhen run without the--localflag._run_subproc():New wrapper around
subprocess.run()which provide extra debugging output (standard streams, timing info, etc.)line_profiler._child_processing_profilingcomponents, or as close as is possible thereto:test_runpy_patches():Test the functionality of
~.runpy_patches.create_runpy_wrapper().test_cache_dump_load():Test the functionalities of
~.cache.LineProfilingCache.dump()and.load().test_cache_setup_main_process():Test the functionality of
~.cache.LineProfilingCache._setup_in_main_process().test_cache_setup_child():Test the functionality of
~.cache.LineProfilingCache._setup_in_child_process().test_load_pth_hook():Test the functionality of
~.pth_hook.load_pth_hook().test_apply_mp_patches():Test the functionality of
~.multiprocessing_patches.apply().test_profiling_multiproc_script():"Main" new test for running the
test_module(see Modified Code) withkernprof --prof-child-procs; heavily parametrized to check for profiling-result correctness in different contexts:run_func: execution modes (kernprof <script>,kernprof -m <module>, andkernprof -c <code>)prof_child_procs; whether to use child-process profiling (--[no-]prof-child-procs)preimports: eager vs. on-import profiling (--[no-]preimports)use_local_func: whether the parallel workload is locally defined in the executed code or imported from external modulesfail: whether the parallel workload errors outstart_method:multiprocessing"start methods" ('fork','forkserver', and'spawn')test_profiling_bare_python():New test for profiling child processes where the code run by
kernprof --prof-child-procsdoesn't directly invokemultiprocessing, but spins up another Python process that does (viaos.system()orsubprocess.run()).Modified code (click to expand)
line_profiler/line_profiler.py::LineStats.get_empty_instance():New convenience class method for creating an instance with no profiling data and the platform-appropriate
.unit..from_files():Added new argument
on_defective: Literal['ignore', 'warn', 'error'], allowing for passing over bad files (e.g. empty ones) with optional warnings. The old behavior (on_defective='error') remains the default.line_profiler/rc/line_profiler.toml[tool.line_profiler.kernprof]:New key-value pair
prof-child-procsfor the default of thekernprof --[no-]prof-child-procsflag.[tool.line_profiler.multiprocessing]:New key-value pairs for controlling the application of the
.multiprocessing_patches:intercept_logs(bool):Whether to patch methods like
multiprocessing.util.debug()to also capture themultiprocessinglogs.polling:Behavior for the polling for child-process termination:
cooldown(float):Time (s) to wait between polls.
timeout(float):Timeout (s) for the overall polling.
on_timeout(Literal['error', 'warn', 'ignore']):What to do when the polling timed out.
The
multiprocssingtable and its contents are as of yet considered private implementation details.kernprof.py_add_core_parser_arguments():Now adding the new
--[no-]prof-child-procsflags to the parser._write_preimports():Refactored to use the new/relocated facilities at
line_profiler.curated_profiling._dump_filtered_stats():extra_line_stats: LineStats | Noneallows for handling and combining the profiling stats gathered elsewhere (e.g. child processes)._dump_filtered_line_stats()which it now calls._manage_profiler:Context manager refactored from the old
_pre_profile()for more Pythonic handling of setups and teardowns._prepare_child_profiling_cache()._prepare_profiler(),_prepare_exec_script())._post_profile()on context exit so that we no longer have to explicitlytry: ... finally: ...in_main_profile()._post_profile():extra_line_stats: LineStats | Noneallows for handling and combining the profiling stats gathered elsewhere (e.g. child processes).line_profiler.curated_profiling.tests/test_child_procs.pytest_module:Pathfixture into a_ModuleFixture(see above in New Code).--start-methodselects a specificmultiprocessing"start method".--localtoggles between using a sum function defined locally intest_moduleor the one defined externally inext_module(see New Code).--force-failuretoggles whether the sum function should return normally or raise an error._run_as_{script,module}():_run_as_literal_code()to also testkernprof -c ....test_moduleas a_ModuleFixtureinstead of a path, and handling its installation._run_test_module():run_module = partial(_run_test_module, _run_as_module), etc. now available for more convenient testing ofkernprofexecution modes as test parametrization.profiled_code_is_tempfile: boolhelps with constructing thekernprofcommand line in cases where the code is anonymous (kernprof -c ...).use_local_func: bool,fail: bool, andstart_method: Literal['fork', 'forkserver', 'spawn'] | Noneallows for fuzzing code execution with the aforementionedtest_moduleCLI flags (resp.--local,--force-failure, and--start-method).nhits: dict[str, int] | None, when provided, checks that the line-hit stats are as expected (all calls traced with--prof-child-procs, only those in the main process without).failis true, thekernprofsubprocess should fail..pthfiles created bykernprof --prof-child-procsshould be cleaned up.nhits(where available).test_multiproc_script_sanity_check():use_local_func,fail, andstart_method, to ensure that the test script is fully functional in vanilla Python.as_module: boolwithrun_func: Callable[..., CompletedProcess], allowing for more flexible testing of execution modes (python ...,python -m ..., and the newpython -cadded via the aforementioned_run_as_literal_code()).test_running_multiproc_script():New parametrization
run_funcallows for absorbing the oldtest_running_multiproc_module()into the same test as additional parametrization, as well as testingkernprof -c.Caveats
The temporary
.pthfile created is course benign and as mentioned tries to be as out of the way as possible, but I just figured that the use of.pthfiles should be called out, given their recent spotlight in a CVE vulnerability.Since the
.pthfile is written tosys.get_path('purelib'), it depends on said directory being writable. If we aren't in a venv or a similarly isolated environment (which is increasingly unlikely nowadays), all processes using the system Python will have to import and runline_profiler._child_process_profiling.load_pth_hook(). Though the function itself should quit rather quickly when we're not in a child process, it may still entail loading a significant portion ofline_profilerintosys.modules.Blocking
Process.terminate()3 is a bit dodgy:To avoid uncontrolled hot-looping
.poll()-ing for the activity of the child-process PID, I'm just using a 1/32-s cooldown.After all, there's a reason
Process.terminate()justSIGTERMs the child process with reckless regard – children which errored out are sporadically stuck in an unclean state, and the polling may enter a deadlock. To guard against that I added:SIGTERMis sent anywaykernprof --prof-child-procsor equivalent code:subprocess.run().This seems to be enough to both get rid of the deadlocks in tests and preserve profiling data... but the problem is that for child processes to deadlock AT ALL, their cleanup routines must have (of yet) failed to complete, and thus there is still a risk of profiling data not being written. So there's probably either some race conditions hidden by the delays, or an error in how the PID statuses are detected.
(Note however that on non-Windows platforms we can and do just setup a
SIGTERMhandler,4 which ensures that cleanup finishes before the child succumbs to termination. But yeah on Windows it's just the polling and the delays keeping us afloat.)Apparently
coveragegets by alright by only patchingProcess._bootstrap(), without the above termination issue. Gotta figure out why...TODO
line_profiler._child_process_profilingto become public API?Credits
pytest-autoprofile, which in turn was based on the solution implemented incoverage.multiproc.LineStatsaggregation #380kernprofto always run code insys.modules['__main__']'s namespace #423Notes
Welp. This took way longer than I expected. The main friction points were that:
line_profiler._child_process_profiling.cache.LineProfilingCacheclass tackles this issue.pytest-autoprofile::tests/test_subprocess.py::_test_inner()), but apparently I only made the tests fail there, not the parallel functions themselves. Had to do some rather hacky stuff to circumvent that (see Caveats)...3Footnotes
Note however that the equivalent vanilla Python command (
python -c ...) would error out, because functions sent tomultiprocessingmust be pickle-able and thus reside in a physical file. This is sidestepped bykernprof's always writing code received bykernprof -c ...and... | kernprof -to a tempfile (ENH: auto-profilestdinor literal snippets #338). ↩In the test suite we're only testing process creation with the most common
multiprocessing[.get_context(...)].Pool. However, since none of the patched components are specific tomultiprocessing.pool, it should also work with other model of parallelism built with the components ofmultiprocessing. ↩From the docs for
mulitprocessing.Process.terminate():Note that exit handlers and finally clauses, etc., will not be executed. Normally this doesn't matter, but if the parallelly-executed function errors out,multiprocessinghas a bad habit of just.terminate()-ing child processes without allowing for enough time to run cleanup, leading to incomplete profiling data. Hence the only workaround seems to be interceptingProcess.terminate()calls and blocking them where appropriate. ↩ ↩2 ↩3Handling
SIGTERMon Windows is generally noted to be inconsistent. See e.g. thiscoverage.control.Coverage._init_for_start()comment and this SO discussion that it refers to. ↩