fix(build): remove numpy from dependency blacklist, rename to SIZE_PROHIBITIVE_PACKAGES #263
Conversation
The dependency blacklist was defined by what the GPU base image ships (torch, numpy, triton, etc.), which silently broke CPU endpoints using python-slim. Numpy and similar packages aren't pre-installed in slim images, so excluding them caused runtime ImportErrors. Rename BASE_IMAGE_PACKAGES to SIZE_PROHIBITIVE_PACKAGES and remove numpy. The blacklist now contains only packages that exceed the 500 MB tarball limit (torch ecosystem + triton), which are CUDA-specific and never needed by CPU endpoints.
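The described change can be pictured with a minimal sketch. The set members beyond the torch ecosystem and triton, and the helper name `filter_dependencies`, are assumptions for illustration, not the PR's actual code:

```python
# Hypothetical sketch of the size-based exclusion set described above.
# Exact members and the helper name are illustrative assumptions.
SIZE_PROHIBITIVE_PACKAGES = {
    # CUDA-specific packages whose wheels alone blow the 500 MB tarball limit
    "torch",
    "torchvision",
    "torchaudio",
    "triton",
}

def filter_dependencies(declared: list[str], user_excludes: set[str]) -> list[str]:
    """Keep every declared dependency unless it is size-prohibitive or
    explicitly excluded by the user. numpy is no longer auto-filtered."""
    excluded = SIZE_PROHIBITIVE_PACKAGES | user_excludes
    return [dep for dep in declared if dep.split("==")[0] not in excluded]
```

With this shape, a CPU endpoint declaring `numpy` keeps it in the tarball, while `--exclude numpy` still removes it via `user_excludes`.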
Pull request overview
This PR updates the build-time “auto-exclude” package list to reflect tarball size constraints (rather than GPU base-image contents), and stops stripping numpy from CPU build artifacts.
Changes:
- Renames `BASE_IMAGE_PACKAGES` to `SIZE_PROHIBITIVE_PACKAGES` and updates associated messaging.
- Removes `numpy` from the auto-excluded set while keeping the torch/CUDA ecosystem exclusions.
- Updates unit tests to reflect the new constant name and expected numpy behavior.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| `src/runpod_flash/cli/commands/build.py` | Renames and redefines the auto-exclusion set and updates exclusion messaging/docs in code comments/docstrings. |
| `tests/unit/cli/commands/test_build.py` | Updates tests to use the renamed constant and ensures numpy is no longer auto-filtered. |
ResourceDiscovery only found LB endpoints (`ep = Endpoint(...)` + `@ep.get/post`) and `@Remote` patterns, missing the QB pattern where `@endpoint(...)` decorates a function or class directly. This caused `--auto-provision` to skip all queue-based endpoints.

- Add `_is_endpoint_direct_decorator()` AST check for `@endpoint(...)`
- Record the decorated function/class name (not the variable) for the QB pattern
- Extract resource_config from `__remote_config__` on wrapped functions
- Add 6 tests covering GPU/CPU QB, class, directory scan, mixed
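An AST check of this kind can be sketched as follows. The helper signatures and `discover_qb_endpoints` wrapper are assumptions for illustration; only the `_is_endpoint_direct_decorator` name comes from the commit message:

```python
import ast

def _is_endpoint_direct_decorator(node: ast.expr) -> bool:
    """True for decorators of the form @endpoint(...) applied directly to a
    function or class (the QB pattern). Attribute-style decorators such as
    @ep.get(...) are Attribute nodes, not Name nodes, so they don't match."""
    return (
        isinstance(node, ast.Call)
        and isinstance(node.func, ast.Name)
        and node.func.id == "endpoint"
    )

def discover_qb_endpoints(source: str) -> list[str]:
    """Collect names of functions/classes decorated with @endpoint(...).
    Illustrative wrapper, not the project's actual discovery entry point."""
    names = []
    for stmt in ast.walk(ast.parse(source)):
        if isinstance(stmt, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            if any(_is_endpoint_direct_decorator(d) for d in stmt.decorator_list):
                # Record the decorated definition's name, not a variable name
                names.append(stmt.name)
    return names
```

Because the check keys on the decorated definition itself, the discovered name is the function or class name rather than an assigned variable, matching the fix described above.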
- Update stale assertion from "Auto-excluded base image packages" to "Auto-excluded size-prohibitive packages"
- Fix "Numpy" casing to "NumPy" in test docstring
- Reword `SIZE_PROHIBITIVE_PACKAGES` comment to focus on size constraints, not runtime assumptions
runpod-Henrik
left a comment
QA Results — PR #263 + flash-examples #42
What changed
- CPU workers declaring numpy as a dependency now receive it in the build artifact. Previously numpy was silently stripped, causing import failures at runtime on CPU instances where the base image has no pre-installed packages.
- `@Endpoint(...)` used directly on functions and classes is now correctly discovered by `flash run`, `flash build`, and `flash deploy`.
What works
| Scenario | Result |
|---|---|
| CPU worker with `dependencies=["numpy"]` — deploys, numpy importable, correct output returned | PASS — numpy=2.4.3, mean/std/median all correct |
| `flash build --exclude numpy` — user exclusion flag still works after the rename | PASS — numpy absent from tarball, other deps present |
| `flash run` discovers QB-decorated workers (`@Endpoint` on functions) alongside existing workers | PASS — both mixed_worker endpoints appear in dev server alongside 4 existing workers |
| CPU regression — 02_cpu_worker, 03_mixed_workers, autoscaling, CPU LB | PASS — all 4 pass |
What was not tested
| Gap | Risk |
|---|---|
| GPU worker with `dependencies=["numpy"]` — numpy is no longer excluded from GPU tarballs as a side effect of this fix. The GPU base image has numpy pre-installed, so the bundled copy should be ignored at runtime, but this path has no E2E coverage. | Medium |
| GPU regression | Low — no GPU-specific code paths changed |
Verdict
Pass for the scenarios tested. The fix works for its stated purpose. The one untested path (GPU + numpy) is a behaviour change without coverage — risk is low given GPU base image precedence, but worth a note.
The server.py codegen imported LB config variables by their raw name (e.g. `api`). When multiple files exported the same variable name, later imports overwrote earlier ones, causing GPU LB endpoints to dispatch to the wrong resource (the CPU worker image instead of the GPU one). Config variables are now imported with unique aliases derived from the resource name (`_cfg_{resource_name}`). Also passes endpoint dependencies through `lb_execute` to the stub so the remote worker installs them.
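The aliasing scheme can be sketched as below. The helper names and the sanitization rules are assumptions for illustration; only the `_cfg_{resource_name}` alias shape comes from the commit message:

```python
def config_alias(resource_name: str) -> str:
    """Derive a unique, import-safe alias for a resource's config variable.
    Sketch only; the real codegen may sanitize path characters differently."""
    safe = resource_name.replace("/", "_").replace("-", "_").replace(".", "_")
    return f"_cfg_{safe}"

def render_config_import(module_path: str, var_name: str, resource_name: str) -> str:
    """Emit an aliased import so two files exporting the same variable name
    (e.g. both named 'api') no longer clobber each other in server.py."""
    return f"from {module_path} import {var_name} as {config_alias(resource_name)}"
```

Since each alias is derived from the resource name rather than the exported variable name, two modules both exporting `api` produce distinct bindings in the generated server.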
runpod-Henrik
left a comment
QA Update — commit 6bfac3e
Bug found and fixed: When flash run is invoked at the root of flash-examples with multiple LB workers exporting the same config variable name (api), the generated server overwrote the first import with the second, causing GPU LB endpoints to provision and dispatch through the CPU resource config instead.
Scenario: GET /03_advanced_workers/05_load_balancer/gpu_lb/info → was provisioning live-03_05_load_balancer_cpu-fb instead of live-03_05_load_balancer_gpu-fb.
Fix verified: Generated server.py now imports each config variable with a unique alias (_cfg__03_advanced_workers_05_load_balancer_gpu_lb, _cfg__03_advanced_workers_05_load_balancer_cpu_lb). All GPU LB routes dispatch through the GPU config, all CPU LB routes through the CPU config. Confirmed at root of flash-examples with all workers loaded.
This scenario was missing from the original test plan: we had run flash run only in a single subdirectory where no two files share a variable name, so root-level multi-LB dispatch correctness (not just discovery) was not covered.
QA Update — LB config alias fix

Tested (manual E2E, `flash run` at project root).

Not tested: deployed LB workers end-to-end (GPU endpoint not live during testing; CPU LB end-to-end covered in prior testing on this PR).

Verdict: Pass. No regressions found. The fix correctly scopes the alias to the dev server path; the deploy path is unaffected.
Summary
- Renamed `BASE_IMAGE_PACKAGES` to `SIZE_PROHIBITIVE_PACKAGES` to reflect the actual constraint (the 500 MB tarball limit, not base image contents)
- Removed `numpy` from the blacklist; it was being stripped from CPU endpoint build artifacts, where `python-slim` has no pre-installed packages

Root cause: The blacklist was defined by what the GPU base image ships, not by physical size constraints. This silently broke CPU endpoints that declared `numpy` as a dependency.

Companion PRs:
Test plan
- `make quality-check` passes
- `numpy` is no longer in `SIZE_PROHIBITIVE_PACKAGES`
- Build with `dependencies=["numpy"]` and confirm numpy is in the tarball