feat: add concurrency_modifier support for QB endpoints #301
Conversation
Promptless prepared a documentation update related to this change (triggered by runpod/flash#301). Review at https://app.gopromptless.ai/suggestions/640ef899-880f-42f8-a055-db3680e8a632
Pull request overview
Adds max_concurrency support intended to control concurrent job execution for queue-based (QB) endpoints, and propagates it through manifest generation and handler code generation.
Changes:
- Add `max_concurrency` to `Endpoint` (with validation) and to runtime `ResourceConfig`.
- Propagate `max_concurrency` into the build manifest (and warn/strip for load-balanced endpoints).
- Generate handlers with `concurrency_modifier` when `max_concurrency > 1` (new async templates + sync injection path), plus tests.
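For context, RunPod's queue-based workers accept a `concurrency_modifier` callable in the config passed to `runpod.serverless.start`; it receives the worker's current concurrency and returns the allowed maximum. A minimal sketch of the idea (the constant name and value are illustrative, not this PR's actual template):

```python
MAX_CONCURRENCY = 4  # illustrative; generated code would substitute the endpoint's max_concurrency


def concurrency_modifier(current_concurrency: int) -> int:
    # RunPod polls this callable to decide how many jobs the worker may
    # hold at once; returning a constant pins it at MAX_CONCURRENCY.
    return MAX_CONCURRENCY


# The generated handler would then pass it alongside the handler, e.g.:
# runpod.serverless.start({"handler": handler, "concurrency_modifier": concurrency_modifier})
```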
Reviewed changes
Copilot reviewed 7 out of 8 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| uv.lock | Bumps runpod-flash version metadata. |
| tests/unit/test_endpoint.py | Adds unit tests for Endpoint.max_concurrency validation/defaults. |
| tests/unit/test_concurrency_manifest.py | Adds tests for ResourceConfig.max_concurrency and manifest behavior for QB vs LB. |
| tests/unit/cli/commands/build_utils/test_handler_generator.py | Adds coverage for handler codegen with/without concurrency_modifier and warning behavior. |
| src/runpod_flash/runtime/models.py | Adds max_concurrency field to ResourceConfig and from_dict handling. |
| src/runpod_flash/endpoint.py | Adds max_concurrency parameter + validation and stores it on the endpoint instance. |
| src/runpod_flash/cli/commands/build_utils/manifest.py | Extracts max_concurrency during build; warns/strips for LB endpoints. |
| src/runpod_flash/cli/commands/build_utils/handler_generator.py | Adds async handler templates and injects concurrency_modifier for max_concurrency > 1. |
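The validation described for `endpoint.py` might look roughly like this (a trimmed sketch; the real `Endpoint` class takes many more parameters, and the attribute name is an assumption):

```python
class Endpoint:
    # Sketch of the max_concurrency validation described above.
    def __init__(self, max_concurrency: int = 1):
        if not isinstance(max_concurrency, int) or max_concurrency < 1:
            raise ValueError("max_concurrency must be an integer >= 1")
        self._max_concurrency = max_concurrency
```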
runpod-Henrik
left a comment
PR #301 — feat: add concurrency_modifier support for QB endpoints
Henrik's AI-Powered Bug Finder
1. Question: What happens if is_async is incorrectly detected?
If the scanner marks a sync function as `is_async=True` (or vice versa), the wrong template is selected:

- `is_async=True` on a sync function → the generated handler does `await func(**job_input)` on a non-coroutine → runtime `TypeError` on every job
- `is_async=False` on an async function → the sync template is used, the handler calls `func(**job_input)` without `await` → the function returns a coroutine object, not a result

Is `is_async` detection in the scanner reliable for all QB worker patterns? If the scanner can be fooled (e.g., a function decorated with `@functools.wraps` wrapping an async function), users get a silently broken deployment.
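The first failure mode is easy to reproduce in isolation (a self-contained repro of the described mismatch, not code from this PR):

```python
import asyncio


def sync_func(**job_input):
    # A plain sync worker function returning a dict.
    return {"result": job_input}


async def wrong_handler(job):
    # The async template wrongly applied to a sync function:
    # awaiting a plain dict raises TypeError on every job.
    return await sync_func(**job.get("input", {}))


try:
    asyncio.run(wrong_handler({"input": {"x": 1}}))
except TypeError as exc:
    print(f"TypeError: {exc}")
```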
2. Test coverage gap: no live concurrency test
All tests verify that the string concurrency_modifier appears in generated code. None verify that the RunPod platform actually honours it — i.e., that a deployed worker with max_concurrency=5 accepts 5 simultaneous jobs without queuing them. Noted as a gap, not a blocker.
Nits
- `_METHODS = {methods_dict}` maps `{m: m}` — values are never read, only keys. Same pattern as the existing sync class template, so not new, but worth cleaning up.
- The high-concurrency warning threshold of `> 100` is arbitrary and undocumented. Worth a comment explaining the rationale.
Verdict: PASS WITH NITS
Core implementation is correct — validation, manifest propagation, template selection, and LB stripping all work as described. The is_async detection question is the main open item but depends on the scanner which is outside this PR's scope.
🤖 Reviewed by Henrik's AI-Powered Bug Finder
Pull request overview
Copilot reviewed 7 out of 8 changed files in this pull request and generated 1 comment.
Force-pushed 62f3c86 → 9873dcb
Add max_concurrency int field (default 1) to ResourceConfig dataclass and its from_dict() parser. Enables manifest to carry per-resource concurrency config for QB workers.
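A hedged sketch of the shape this commit describes (field names other than `max_concurrency` and the exact `from_dict` signature are assumptions, and the real dataclass carries many more fields):

```python
from dataclasses import dataclass


@dataclass
class ResourceConfig:
    # Trimmed sketch of the runtime resource config.
    name: str = ""
    max_concurrency: int = 1  # default 1 = no concurrency override

    @classmethod
    def from_dict(cls, data: dict) -> "ResourceConfig":
        # Parse max_concurrency defensively, defaulting to 1.
        return cls(
            name=data.get("name", ""),
            max_concurrency=int(data.get("max_concurrency", 1)),
        )
```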
Read _max_concurrency from Endpoint facade before unwrapping to internal resource config in _extract_deployment_config(). Only set in manifest when value > 1 (default 1 means no override needed). For LB endpoints, max_concurrency is meaningless (uvicorn handles concurrency), so warn and remove the field in build() to keep the manifest schema consistent and avoid silent misconfiguration.
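The warn-and-strip behaviour for LB endpoints could be sketched like this (function name, key names, and the warning text are hypothetical, not the PR's actual code):

```python
import warnings


def strip_lb_max_concurrency(config: dict, endpoint_type: str) -> dict:
    # Hypothetical sketch: uvicorn already handles concurrency for LB
    # endpoints, so the field is warned about and removed there.
    if endpoint_type == "LB" and "max_concurrency" in config:
        warnings.warn(
            "max_concurrency is ignored for load-balanced endpoints; removing it"
        )
        config = {k: v for k, v in config.items() if k != "max_concurrency"}
    return config
```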
Add DEPLOYED_ASYNC_HANDLER_TEMPLATE and DEPLOYED_ASYNC_CLASS_HANDLER_TEMPLATE for max_concurrency > 1 with async handlers. Update _generate_handler to read max_concurrency from resource_data and select template accordingly. Sync handlers with max_concurrency > 1 get concurrency_modifier injected via string replacement. Includes 10 new tests covering all template selection paths and warning logs.
The concurrency feature accidentally tightened the name validation guard and removed auto-derive in `__call__` and the `_route` name check, breaking nameless QB decorator mode and LB stub routing.
- Fix inline `@endpoint()` max_concurrency loss: persist into `__remote_config__` so the manifest builder can read it when the Endpoint facade is not reachable as a module-level variable.
- Fix scanner `is_async` for class-based workers: detect async public methods so the correct handler template is selected.
- Harden `_inject_concurrency_modifier`: raise `ValueError` if the expected start call string is absent instead of silently producing a handler without `concurrency_modifier`.
- Add rationale comment for the `>100` concurrency warning threshold.
- Add tests for all new behaviors.
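The `__remote_config__` persistence fix can be sketched as a decorator (a simplified illustration of the described mechanism; the real `@endpoint` decorator does far more):

```python
def endpoint(max_concurrency: int = 1):
    # Simplified sketch: stash max_concurrency on the decorated callable so
    # the manifest builder can read it even when no module-level Endpoint
    # facade is reachable.
    def wrap(func):
        cfg = dict(getattr(func, "__remote_config__", {}) or {})
        cfg["max_concurrency"] = max_concurrency
        func.__remote_config__ = cfg
        return func

    return wrap


@endpoint(max_concurrency=3)
def worker(x):
    return x * 2
```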
Force-pushed 9873dcb → 1f1e91d
runpod-Henrik
left a comment
1. manifest.py — remote_cfg is always None; inline decorator path is dead code
`remote_cfg` is initialized to `None` and never reassigned in the diff. The `elif isinstance(remote_cfg, dict)` branch that's supposed to pick up `max_concurrency` for inline `@Endpoint()` decorators is therefore unreachable.
```python
resource_config = None
remote_cfg = None  # ← set here
# ... resource_config gets assigned ...
elif (
    isinstance(remote_cfg, dict)  # ← always False; remote_cfg is still None
    and remote_cfg.get("max_concurrency", 1) > 1
):
    config["max_concurrency"] = remote_cfg["max_concurrency"]
```

`endpoint.py` correctly writes `max_concurrency` into `__remote_config__` on the decorated callable, but `manifest.py` never reads it. The fix is to assign `remote_cfg` after `resource_config` is set:

```python
remote_cfg = getattr(resource_config, "__remote_config__", None)
```

The existing test only verifies that `max_concurrency` is persisted into `__remote_config__`; there's no test that the manifest builder actually reads it for inline decorators.
2. Async handler templates don't apply the empty-input validation from PR #302
`DEPLOYED_ASYNC_HANDLER_TEMPLATE` and `DEPLOYED_ASYNC_CLASS_HANDLER_TEMPLATE` still use:

```python
job_input = job.get("input", {{}})
```

PR #302 (in review simultaneously) adds None/empty-dict rejection to the sync templates and `create_deployed_handler`. Once both land, sync QB handlers reject `{}` but async QB handlers (selected when `max_concurrency > 1`) silently invoke with empty kwargs. Users migrating to `max_concurrency` would see inconsistent behaviour depending on whether their handler is sync or async.
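A guard of the kind PR #302 adds to the sync templates might look like this (a sketch of the described behaviour; the function name and error text are assumptions, not PR #302's actual code):

```python
def validate_job_input(job: dict):
    # Reject None or empty input instead of silently invoking the worker
    # with empty kwargs, matching the described sync-template behaviour.
    job_input = job.get("input")
    if not job_input:
        return {"error": "job input is required", "success": False}
    return job_input
```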
3. Mixed sync/async class methods with max_concurrency > 1 will TypeError at runtime
The scanner correctly detects a class as async if any public method is a coroutine. The async class template then dispatches with `await method(**job_input)`. If the dispatched method is the sync one, `await sync_return_value` raises `TypeError: object dict can't be used in 'await' expression`.
The warning about sync handlers only fires at the function level. For a class with e.g. a sync `setup()` and an async `predict()`, the class is classified as async, the async template is selected, and calling `setup` via the deployed endpoint will crash. `test_mixed_class_detected_as_async` confirms the detection behaviour but there's no test for what happens when the sync method is dispatched.
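A defensive dispatch that sidesteps this crash would check the method before awaiting (an illustrative sketch, not this PR's template):

```python
import asyncio
import inspect


class MixedWorker:
    def setup(self):  # sync method
        return {"ready": True}

    async def predict(self, x):  # async method
        return {"y": x * 2}


async def dispatch(instance, method_name, **job_input):
    # Only await genuine coroutine functions; call sync methods directly,
    # avoiding the TypeError described for mixed sync/async classes.
    method = getattr(instance, method_name)
    if inspect.iscoroutinefunction(method):
        return await method(**job_input)
    return method(**job_input)
```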
4. Async class template error dicts missing "success": False
The `DEPLOYED_ASYNC_CLASS_HANDLER_TEMPLATE` error responses for unknown and missing method key don't include `"success": False`:

```python
return {{"error": "class {class_name} has multiple methods..."}}
```

The sync class template (modified in PR #302) has `"success": False` in these same paths. If both PRs merge, the response shape differs based on concurrency configuration.
5. _inject_concurrency_modifier string-replacement is fragile but guarded
The injector matches `'runpod.serverless.start({"handler": handler})'` as a literal string. Any future template change that reformats that call would cause `ValueError` at build time rather than silent corruption. The raise-on-miss behaviour and the test for it are the right calls; worth keeping in mind for future template edits.
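The guarded injector can be sketched as follows (the constant and function names are illustrative, and the PR's actual replacement string may differ):

```python
START_CALL = 'runpod.serverless.start({"handler": handler})'


def inject_concurrency_modifier(source: str, max_concurrency: int) -> str:
    # Raise loudly if a template edit has moved the start call, rather than
    # silently emitting a handler without concurrency_modifier.
    if START_CALL not in source:
        raise ValueError("expected runpod.serverless.start call not found in template")
    replacement = (
        'runpod.serverless.start({"handler": handler, '
        f'"concurrency_modifier": lambda _c: {max_concurrency}}})'
    )
    return source.replace(START_CALL, replacement)
```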
Nit
The max_concurrency > 100 warning fires at flash build time, not flash deploy. A user setting max_concurrency=200 for a high-throughput CPU endpoint (e.g. embedding service) will see a GPU-oriented warning with no indication their workload is unaffected.
Verdict
Three things need resolution: (1) remote_cfg is never assigned — inline @Endpoint(max_concurrency=N) decorator is silently broken; (2) async templates skip the empty-input validation being added in PR #302; (3) mixed sync/async class methods crash at runtime when the sync method is dispatched. Issue 1 is a functional regression; 2 and 3 can be deferred with documentation if needed, but should be tracked.
Summary
- `max_concurrency` parameter to `Endpoint` for controlling concurrent job execution on QB endpoints
- `max_concurrency` through `ResourceConfig` and manifest, with LB endpoint warning/stripping
- `concurrency_modifier` for `max_concurrency > 1` + async handlers
- `concurrency_modifier` into sync handler templates when `max_concurrency > 1` (with warning)

Linear
AE-2415
Changes
- `endpoint.py`: new `max_concurrency` param with validation (`>= 1`)
- `runtime/models.py`: `max_concurrency` field on `ResourceConfig`
- `manifest.py`: propagate from `Endpoint` facade, warn and strip for LB endpoints
- `handler_generator.py`: new `DEPLOYED_ASYNC_HANDLER_TEMPLATE` and `DEPLOYED_ASYNC_CLASS_HANDLER_TEMPLATE` with `concurrency_modifier`, plus `_inject_concurrency_modifier` for sync fallback

Test plan
- `Endpoint.max_concurrency` validation (default, explicit, zero, negative)
- `ResourceConfig.max_concurrency` round-trip through dict