
feat: add concurrency_modifier support for QB endpoints#301

Open
deanq wants to merge 6 commits into main from deanq/ae-2415-support-for-concurrency_modifier

Conversation


@deanq (Member) commented Apr 8, 2026

Summary

  • Add max_concurrency parameter to Endpoint for controlling concurrent job execution on QB endpoints
  • Propagate max_concurrency through ResourceConfig and manifest, with LB endpoint warning/stripping
  • Generate async handler templates with concurrency_modifier when max_concurrency > 1 and the handler is async
  • Inject concurrency_modifier into sync handler templates when max_concurrency > 1 (with warning)

Linear

AE-2415

Changes

  • endpoint.py: new max_concurrency param with validation (>= 1)
  • runtime/models.py: max_concurrency field on ResourceConfig
  • manifest.py: propagate from Endpoint facade, warn and strip for LB endpoints
  • handler_generator.py: new DEPLOYED_ASYNC_HANDLER_TEMPLATE and DEPLOYED_ASYNC_CLASS_HANDLER_TEMPLATE with concurrency_modifier, plus _inject_concurrency_modifier for sync fallback
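To make the template change concrete, here is a minimal sketch of the concurrency_modifier callable the generated handler would pass to runpod.serverless.start (the constant name and value are assumptions illustrating max_concurrency=5; the actual generated code lives in handler_generator.py):

```python
# Sketch: runpod's serverless worker accepts a "concurrency_modifier"
# callable in its start config, e.g.
#   runpod.serverless.start({"handler": handler,
#                            "concurrency_modifier": concurrency_modifier})
MAX_CONCURRENCY = 5  # would come from Endpoint(max_concurrency=5)

def concurrency_modifier(current_concurrency: int) -> int:
    # The worker calls this to decide how many jobs to run simultaneously;
    # returning a constant caps concurrency at MAX_CONCURRENCY.
    return MAX_CONCURRENCY

print(concurrency_modifier(0))
```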

Test plan

  • Unit tests for Endpoint.max_concurrency validation (default, explicit, zero, negative)
  • Unit tests for ResourceConfig.max_concurrency round-trip through dict
  • Unit tests for manifest builder (QB includes, LB warns and strips)
  • Unit tests for handler codegen: async function, async class, sync function, sync class, default (no modifier)
  • Unit tests for warning logs on sync + concurrency and high concurrency values
  • All 2452 tests pass, 86% coverage, quality-check green
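The validation cases in the first bullet can be sketched as follows (function name is hypothetical; the real check lives in endpoint.py):

```python
def validate_max_concurrency(value: int) -> int:
    # Mirrors the ">= 1" validation tested above: default and explicit
    # positive values pass, zero and negative values raise.
    if not isinstance(value, int) or value < 1:
        raise ValueError(f"max_concurrency must be an integer >= 1, got {value!r}")
    return value

print(validate_max_concurrency(5))
```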


promptless bot commented Apr 8, 2026

Promptless prepared a documentation update related to this change.

Triggered by runpod/flash#301

Added documentation for the new max_concurrency parameter including parameter overview, usage examples with async handlers, guidance for different workload types, and clarification that it only applies to queue-based endpoints.

Review at https://app.gopromptless.ai/suggestions/640ef899-880f-42f8-a055-db3680e8a632


Copilot AI left a comment


Pull request overview

Adds max_concurrency support intended to control concurrent job execution for queue-based (QB) endpoints, and propagates it through manifest generation and handler code generation.

Changes:

  • Add max_concurrency to Endpoint (with validation) and to runtime ResourceConfig.
  • Propagate max_concurrency into the build manifest (and warn/strip for load-balanced endpoints).
  • Generate handlers with concurrency_modifier when max_concurrency > 1 (new async templates + sync injection path), plus tests.

Reviewed changes

Copilot reviewed 7 out of 8 changed files in this pull request and generated 3 comments.

Summary per file:
uv.lock Bumps runpod-flash version metadata.
tests/unit/test_endpoint.py Adds unit tests for Endpoint.max_concurrency validation/defaults.
tests/unit/test_concurrency_manifest.py Adds tests for ResourceConfig.max_concurrency and manifest behavior for QB vs LB.
tests/unit/cli/commands/build_utils/test_handler_generator.py Adds coverage for handler codegen with/without concurrency_modifier and warning behavior.
src/runpod_flash/runtime/models.py Adds max_concurrency field to ResourceConfig and from_dict handling.
src/runpod_flash/endpoint.py Adds max_concurrency parameter + validation and stores it on the endpoint instance.
src/runpod_flash/cli/commands/build_utils/manifest.py Extracts max_concurrency during build; warns/strips for LB endpoints.
src/runpod_flash/cli/commands/build_utils/handler_generator.py Adds async handler templates and injects concurrency_modifier for max_concurrency > 1.



@runpod-Henrik left a comment


PR #301: feat: add concurrency_modifier support for QB endpoints

Henrik's AI-Powered Bug Finder


1. Question: What happens if is_async is incorrectly detected?

If the scanner marks a sync function as is_async=True (or vice versa), the wrong template is selected:

  • is_async=True on a sync function → generated handler does await func(**job_input) on a non-coroutine → runtime TypeError on every job
  • is_async=False on an async function → sync template used, handler calls func(**job_input) without await → function returns a coroutine object, not a result

Is is_async detection in the scanner reliable for all QB worker patterns? If the scanner can be fooled (e.g., a function decorated with @functools.wraps wrapping an async function), users get a silently broken deployment.
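Both failure modes are easy to reproduce in plain Python (worker names here are illustrative, not from the PR):

```python
import asyncio
import inspect

def sync_worker(x):
    return {"out": x}

async def async_worker(x):
    return {"out": x}

async def wrong_await():
    # is_async misdetected as True: the async template awaits a
    # non-coroutine return value, which raises TypeError.
    return await sync_worker(1)

try:
    asyncio.run(wrong_await())
except TypeError as e:
    print("TypeError:", e)

# is_async misdetected as False: the sync template calls without await,
# so the "result" is a coroutine object, not the handler's dict.
result = async_worker(1)
print(inspect.iscoroutine(result))
result.close()  # avoid an un-awaited coroutine warning
```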


2. Test coverage gap: no live concurrency test

All tests verify that the string concurrency_modifier appears in generated code. None verify that the RunPod platform actually honours it — i.e., that a deployed worker with max_concurrency=5 accepts 5 simultaneous jobs without queuing them. Noted as a gap, not a blocker.


Nits

  • _METHODS = {methods_dict} maps {m: m} — values are never read, only keys. Same pattern as the existing sync class template, so not new, but worth cleaning up.
  • The high-concurrency warning threshold of > 100 is arbitrary and undocumented. Worth a comment explaining the rationale.
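For the first nit, a set expresses the intent directly, since only membership is ever checked (a sketch; the template variable name and method names are assumptions):

```python
# was: _METHODS = {"predict": "predict", "setup": "setup"} -- values unused
_METHODS = {"predict", "setup"}

def is_known_method(name: str) -> bool:
    # Only membership is checked, exactly as with the dict's keys.
    return name in _METHODS

print(is_known_method("predict"), is_known_method("train"))
```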

Verdict: PASS WITH NITS

Core implementation is correct — validation, manifest propagation, template selection, and LB stripping all work as described. The is_async detection question is the main open item but depends on the scanner which is outside this PR's scope.

🤖 Reviewed by Henrik's AI-Powered Bug Finder


Copilot AI left a comment


Pull request overview

Copilot reviewed 7 out of 8 changed files in this pull request and generated 1 comment.



@deanq force-pushed the deanq/ae-2415-support-for-concurrency_modifier branch from 62f3c86 to 9873dcb on April 9, 2026 at 21:37
deanq added 6 commits on April 9, 2026 at 15:30:

  • Add max_concurrency int field (default 1) to the ResourceConfig dataclass and its from_dict() parser. Enables the manifest to carry per-resource concurrency config for QB workers.
  • Read _max_concurrency from the Endpoint facade before unwrapping to the internal resource config in _extract_deployment_config(). Only set it in the manifest when the value is > 1 (the default of 1 means no override is needed).
  • For LB endpoints, max_concurrency is meaningless (uvicorn handles concurrency), so warn and remove the field in build() to keep the manifest schema consistent and avoid silent misconfiguration.
  • Add DEPLOYED_ASYNC_HANDLER_TEMPLATE and DEPLOYED_ASYNC_CLASS_HANDLER_TEMPLATE for max_concurrency > 1 with async handlers. Update _generate_handler to read max_concurrency from resource_data and select the template accordingly. Sync handlers with max_concurrency > 1 get concurrency_modifier injected via string replacement. Includes 10 new tests covering all template selection paths and warning logs.
  • Fix a regression: the concurrency feature accidentally tightened the name validation guard and removed auto-derive in __call__ and the _route name check, breaking nameless QB decorator mode and LB stub routing.
  • Follow-up fixes:
    - Fix inline @endpoint() max_concurrency loss: persist it into __remote_config__ so the manifest builder can read it when the Endpoint facade is not reachable as a module-level variable.
    - Fix scanner is_async for class-based workers: detect async public methods so the correct handler template is selected.
    - Harden _inject_concurrency_modifier: raise ValueError if the expected start-call string is absent instead of silently producing a handler without concurrency_modifier.
    - Add a rationale comment for the >100 concurrency warning threshold.
    - Add tests for all new behaviors.
@deanq force-pushed the deanq/ae-2415-support-for-concurrency_modifier branch from 9873dcb to 1f1e91d on April 9, 2026 at 22:30
@deanq requested a review from KAJdev on April 10, 2026 at 20:33

@runpod-Henrik left a comment


1. manifest.py: remote_cfg is always None; the inline decorator path is dead code

remote_cfg is initialized to None and never reassigned in the diff. The elif isinstance(remote_cfg, dict) branch that's supposed to pick up max_concurrency for inline @Endpoint() decorators is therefore unreachable.

resource_config = None
remote_cfg = None          # ← set here

# ... resource_config gets assigned ...

elif (
    isinstance(remote_cfg, dict)   # ← always False; remote_cfg is still None
    and remote_cfg.get("max_concurrency", 1) > 1
):
    config["max_concurrency"] = remote_cfg["max_concurrency"]

endpoint.py correctly writes max_concurrency into __remote_config__ on the decorated callable, but manifest.py never reads it. The fix is to assign remote_cfg after resource_config is set:

remote_cfg = getattr(resource_config, "__remote_config__", None)

The existing test only verifies that max_concurrency is persisted into __remote_config__; there's no test that the manifest builder actually reads it for inline decorators.
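A sketch of the missing coverage: after the proposed fix, the extraction logic should read max_concurrency off __remote_config__ on an inline-decorated callable (helper and worker names here are hypothetical, mirroring the review comment):

```python
def extract_max_concurrency(resource_config):
    # Proposed fix: read __remote_config__ only AFTER resource_config is
    # assigned, so the inline-decorator branch is actually reachable.
    remote_cfg = getattr(resource_config, "__remote_config__", None)
    if isinstance(remote_cfg, dict) and remote_cfg.get("max_concurrency", 1) > 1:
        return remote_cfg["max_concurrency"]
    return None

def inline_worker(x):  # stand-in for an inline @endpoint()-decorated callable
    return x

inline_worker.__remote_config__ = {"max_concurrency": 4}
print(extract_max_concurrency(inline_worker))
```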


2. Async handler templates don't apply the empty-input validation from PR #302

DEPLOYED_ASYNC_HANDLER_TEMPLATE and DEPLOYED_ASYNC_CLASS_HANDLER_TEMPLATE still use:

job_input = job.get("input", {{}})

PR #302 (in review simultaneously) adds None/empty-dict rejection to the sync templates and create_deployed_handler. Once both land, sync QB handlers reject {} but async QB handlers (selected when max_concurrency > 1) silently invoke with empty kwargs. Users migrating to max_concurrency would see inconsistent platform parity depending on whether their handler is sync or async.
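A sketch of the shared guard the async templates could adopt so sync and async QB handlers reject empty input identically (the response shape follows the PR #302 behaviour described above; func is a stand-in for the user's worker):

```python
import asyncio

async def func(**kwargs):  # stand-in for the user's async worker
    return {"success": True, "echo": kwargs}

async def handler(job):
    job_input = job.get("input")
    if not job_input:  # rejects a missing key, None, and {} alike
        return {"error": "job input must be a non-empty object", "success": False}
    return await func(**job_input)

print(asyncio.run(handler({"input": {}})))
print(asyncio.run(handler({"input": {"x": 1}})))
```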


3. Mixed sync/async class methods with max_concurrency > 1 will TypeError at runtime

The scanner correctly detects a class as async if any public method is a coroutine. The async class template then dispatches with await method(**job_input). If the dispatched method is the sync one, await sync_return_value raises TypeError: object dict can't be used in 'await' expression.

The warning about sync handlers only fires at the function level. For a class with e.g. a sync setup() and an async predict(), the class is classified as async, the async template is selected, and calling setup via the deployed endpoint will crash. test_mixed_class_detected_as_async confirms the detection behaviour but there's no test for what happens when the sync method is dispatched.
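One way to fix this is a per-method awaitable check at dispatch time rather than a per-class template choice (a sketch with assumed names; the real template would need the same guard):

```python
import asyncio
import inspect

class Worker:
    def setup(self):             # sync method on an "async" class
        return {"ready": True}

    async def predict(self, x):  # the method that makes the class async
        return {"out": x}

async def dispatch(instance, name, **kwargs):
    method = getattr(instance, name)
    result = method(**kwargs)
    # Await only when the call actually returned an awaitable, so
    # dispatching the sync method no longer raises TypeError.
    if inspect.isawaitable(result):
        result = await result
    return result

w = Worker()
print(asyncio.run(dispatch(w, "setup")))
print(asyncio.run(dispatch(w, "predict", x=1)))
```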


4. Async class template error dicts missing "success": False

The DEPLOYED_ASYNC_CLASS_HANDLER_TEMPLATE error responses for unknown and missing method key don't include "success": False:

return {{"error": "class {class_name} has multiple methods..."}}

The sync class template (modified in PR #302) has "success": False in these same paths. If both PRs merge the response shape differs based on concurrency configuration.


5. _inject_concurrency_modifier string-replacement is fragile but guarded

The injector matches 'runpod.serverless.start({"handler": handler})' as a literal string. Any future template change that reformats that call would cause ValueError at build time rather than silent corruption. The raise-on-miss behaviour and the test for it are the right calls; worth keeping in mind for future template edits.


Nit

The max_concurrency > 100 warning fires at flash build time, not flash deploy. A user setting max_concurrency=200 for a high-throughput CPU endpoint (e.g. embedding service) will see a GPU-oriented warning with no indication their workload is unaffected.


Verdict

Three things need resolution: (1) remote_cfg is never assigned — inline @Endpoint(max_concurrency=N) decorator is silently broken; (2) async templates skip the empty-input validation being added in PR #302; (3) mixed sync/async class methods crash at runtime when the sync method is dispatched. Issue 1 is a functional regression; 2 and 3 can be deferred with documentation if needed, but should be tracked.
