
feat: add concurrency_modifier support for QB endpoints#301

Open
deanq wants to merge 6 commits into main from deanq/ae-2415-support-for-concurrency_modifier

Conversation


@deanq (Member) commented Apr 8, 2026

Summary

  • Add max_concurrency parameter to Endpoint for controlling concurrent job execution on QB endpoints
  • Propagate max_concurrency through ResourceConfig and manifest, with LB endpoint warning/stripping
  • Generate async handler templates with concurrency_modifier when max_concurrency > 1 and the handler is async
  • Inject concurrency_modifier into sync handler templates when max_concurrency > 1 (with warning)

Linear

AE-2415

Changes

  • endpoint.py: new max_concurrency param with validation (>= 1)
  • runtime/models.py: max_concurrency field on ResourceConfig
  • manifest.py: propagate from Endpoint facade, warn and strip for LB endpoints
  • handler_generator.py: new DEPLOYED_ASYNC_HANDLER_TEMPLATE and DEPLOYED_ASYNC_CLASS_HANDLER_TEMPLATE with concurrency_modifier, plus _inject_concurrency_modifier for sync fallback
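To make the template change concrete, here is a minimal sketch of the concurrency_modifier callable the generated handler would pass to runpod.serverless.start (the constant name and value are assumptions illustrating max_concurrency=5; the actual generated code lives in handler_generator.py):

```python
# Sketch: runpod's serverless worker accepts a "concurrency_modifier"
# callable in its start config, e.g.
#   runpod.serverless.start({"handler": handler,
#                            "concurrency_modifier": concurrency_modifier})
MAX_CONCURRENCY = 5  # would come from Endpoint(max_concurrency=5)

def concurrency_modifier(current_concurrency: int) -> int:
    # The worker calls this to decide how many jobs to run simultaneously;
    # returning a constant caps concurrency at MAX_CONCURRENCY.
    return MAX_CONCURRENCY

print(concurrency_modifier(0))
```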

Test plan

  • Unit tests for Endpoint.max_concurrency validation (default, explicit, zero, negative)
  • Unit tests for ResourceConfig.max_concurrency round-trip through dict
  • Unit tests for manifest builder (QB includes, LB warns and strips)
  • Unit tests for handler codegen: async function, async class, sync function, sync class, default (no modifier)
  • Unit tests for warning logs on sync + concurrency and high concurrency values
  • All 2452 tests pass, 86% coverage, quality-check green
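The validation cases in the first bullet can be sketched as follows (function name is hypothetical; the real check lives in endpoint.py):

```python
def validate_max_concurrency(value: int) -> int:
    # Mirrors the ">= 1" validation tested above: default and explicit
    # positive values pass, zero and negative values raise.
    if not isinstance(value, int) or value < 1:
        raise ValueError(f"max_concurrency must be an integer >= 1, got {value!r}")
    return value

print(validate_max_concurrency(5))
```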


promptless bot commented Apr 8, 2026

Promptless prepared a documentation update related to this change.

Triggered by runpod/flash#301

Added documentation for the new max_concurrency parameter including parameter overview, usage examples with async handlers, guidance for different workload types, and clarification that it only applies to queue-based endpoints.

Review at https://app.gopromptless.ai/suggestions/640ef899-880f-42f8-a055-db3680e8a632


Copilot AI left a comment


Pull request overview

Adds max_concurrency support intended to control concurrent job execution for queue-based (QB) endpoints, and propagates it through manifest generation and handler code generation.

Changes:

  • Add max_concurrency to Endpoint (with validation) and to runtime ResourceConfig.
  • Propagate max_concurrency into the build manifest (and warn/strip for load-balanced endpoints).
  • Generate handlers with concurrency_modifier when max_concurrency > 1 (new async templates + sync injection path), plus tests.

Reviewed changes

Copilot reviewed 7 out of 8 changed files in this pull request and generated 3 comments.

Summary per file:
uv.lock Bumps runpod-flash version metadata.
tests/unit/test_endpoint.py Adds unit tests for Endpoint.max_concurrency validation/defaults.
tests/unit/test_concurrency_manifest.py Adds tests for ResourceConfig.max_concurrency and manifest behavior for QB vs LB.
tests/unit/cli/commands/build_utils/test_handler_generator.py Adds coverage for handler codegen with/without concurrency_modifier and warning behavior.
src/runpod_flash/runtime/models.py Adds max_concurrency field to ResourceConfig and from_dict handling.
src/runpod_flash/endpoint.py Adds max_concurrency parameter + validation and stores it on the endpoint instance.
src/runpod_flash/cli/commands/build_utils/manifest.py Extracts max_concurrency during build; warns/strips for LB endpoints.
src/runpod_flash/cli/commands/build_utils/handler_generator.py Adds async handler templates and injects concurrency_modifier for max_concurrency > 1.



@runpod-Henrik left a comment


PR #301: feat: add concurrency_modifier support for QB endpoints

Henrik's AI-Powered Bug Finder


1. Question: What happens if is_async is incorrectly detected?

If the scanner marks a sync function as is_async=True (or vice versa), the wrong template is selected:

  • is_async=True on a sync function → generated handler does await func(**job_input) on a non-coroutine → runtime TypeError on every job
  • is_async=False on an async function → sync template used, handler calls func(**job_input) without await → function returns a coroutine object, not a result

Is is_async detection in the scanner reliable for all QB worker patterns? If the scanner can be fooled (e.g., a function decorated with @functools.wraps wrapping an async function), users get a silently broken deployment.
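Both failure modes are easy to reproduce in plain Python (worker names here are illustrative, not from the PR):

```python
import asyncio
import inspect

def sync_worker(x):
    return {"out": x}

async def async_worker(x):
    return {"out": x}

async def wrong_await():
    # is_async misdetected as True: the async template awaits a
    # non-coroutine return value, which raises TypeError.
    return await sync_worker(1)

try:
    asyncio.run(wrong_await())
except TypeError as e:
    print("TypeError:", e)

# is_async misdetected as False: the sync template calls without await,
# so the "result" is a coroutine object, not the handler's dict.
result = async_worker(1)
print(inspect.iscoroutine(result))
result.close()  # avoid an un-awaited coroutine warning
```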


2. Test coverage gap: no live concurrency test

All tests verify that the string concurrency_modifier appears in generated code. None verify that the RunPod platform actually honours it — i.e., that a deployed worker with max_concurrency=5 accepts 5 simultaneous jobs without queuing them. Noted as a gap, not a blocker.


Nits

  • _METHODS = {methods_dict} maps {m: m} — values are never read, only keys. Same pattern as the existing sync class template, so not new, but worth cleaning up.
  • The high-concurrency warning threshold of > 100 is arbitrary and undocumented. Worth a comment explaining the rationale.
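For the first nit, a set expresses the intent directly, since only membership is ever checked (a sketch; the template variable name and method names are assumptions):

```python
# was: _METHODS = {"predict": "predict", "setup": "setup"} -- values unused
_METHODS = {"predict", "setup"}

def is_known_method(name: str) -> bool:
    # Only membership is checked, exactly as with the dict's keys.
    return name in _METHODS

print(is_known_method("predict"), is_known_method("train"))
```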

Verdict: PASS WITH NITS

Core implementation is correct — validation, manifest propagation, template selection, and LB stripping all work as described. The is_async detection question is the main open item but depends on the scanner which is outside this PR's scope.

🤖 Reviewed by Henrik's AI-Powered Bug Finder


Copilot AI left a comment


Pull request overview

Copilot reviewed 7 out of 8 changed files in this pull request and generated 1 comment.



@deanq force-pushed the deanq/ae-2415-support-for-concurrency_modifier branch from 62f3c86 to 9873dcb on April 9, 2026 at 21:37
deanq added 6 commits on April 9, 2026 at 15:30:

  • Add max_concurrency int field (default 1) to the ResourceConfig dataclass and its from_dict() parser. Enables the manifest to carry per-resource concurrency config for QB workers.
  • Read _max_concurrency from the Endpoint facade before unwrapping to the internal resource config in _extract_deployment_config(). Only set it in the manifest when the value is > 1 (the default of 1 means no override is needed).
  • For LB endpoints, max_concurrency is meaningless (uvicorn handles concurrency), so warn and remove the field in build() to keep the manifest schema consistent and avoid silent misconfiguration.
  • Add DEPLOYED_ASYNC_HANDLER_TEMPLATE and DEPLOYED_ASYNC_CLASS_HANDLER_TEMPLATE for max_concurrency > 1 with async handlers. Update _generate_handler to read max_concurrency from resource_data and select the template accordingly. Sync handlers with max_concurrency > 1 get concurrency_modifier injected via string replacement. Includes 10 new tests covering all template selection paths and warning logs.
  • Fix a regression: the concurrency feature accidentally tightened the name validation guard and removed auto-derive in __call__ and the _route name check, breaking nameless QB decorator mode and LB stub routing.
  • Follow-up fixes:
    - Fix inline @endpoint() max_concurrency loss: persist it into __remote_config__ so the manifest builder can read it when the Endpoint facade is not reachable as a module-level variable.
    - Fix scanner is_async for class-based workers: detect async public methods so the correct handler template is selected.
    - Harden _inject_concurrency_modifier: raise ValueError if the expected start-call string is absent instead of silently producing a handler without concurrency_modifier.
    - Add a rationale comment for the >100 concurrency warning threshold.
    - Add tests for all new behaviors.
@deanq force-pushed the deanq/ae-2415-support-for-concurrency_modifier branch from 9873dcb to 1f1e91d on April 9, 2026 at 22:30
@deanq requested a review from KAJdev on April 10, 2026 at 20:33

@runpod-Henrik left a comment


1. manifest.py: remote_cfg is always None; the inline decorator path is dead code

remote_cfg is initialized to None and never reassigned in the diff. The elif isinstance(remote_cfg, dict) branch that's supposed to pick up max_concurrency for inline @Endpoint() decorators is therefore unreachable.

resource_config = None
remote_cfg = None          # ← set here

# ... resource_config gets assigned ...

elif (
    isinstance(remote_cfg, dict)   # ← always False; remote_cfg is still None
    and remote_cfg.get("max_concurrency", 1) > 1
):
    config["max_concurrency"] = remote_cfg["max_concurrency"]

endpoint.py correctly writes max_concurrency into __remote_config__ on the decorated callable, but manifest.py never reads it. The fix is to assign remote_cfg after resource_config is set:

remote_cfg = getattr(resource_config, "__remote_config__", None)

The existing test only verifies that max_concurrency is persisted into __remote_config__; there's no test that the manifest builder actually reads it for inline decorators.
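A sketch of the missing coverage: after the proposed fix, the extraction logic should read max_concurrency off __remote_config__ on an inline-decorated callable (helper and worker names here are hypothetical, mirroring the review comment):

```python
def extract_max_concurrency(resource_config):
    # Proposed fix: read __remote_config__ only AFTER resource_config is
    # assigned, so the inline-decorator branch is actually reachable.
    remote_cfg = getattr(resource_config, "__remote_config__", None)
    if isinstance(remote_cfg, dict) and remote_cfg.get("max_concurrency", 1) > 1:
        return remote_cfg["max_concurrency"]
    return None

def inline_worker(x):  # stand-in for an inline @endpoint()-decorated callable
    return x

inline_worker.__remote_config__ = {"max_concurrency": 4}
print(extract_max_concurrency(inline_worker))
```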


2. Async handler templates don't apply the empty-input validation from PR #302

DEPLOYED_ASYNC_HANDLER_TEMPLATE and DEPLOYED_ASYNC_CLASS_HANDLER_TEMPLATE still use:

job_input = job.get("input", {{}})

PR #302 (in review simultaneously) adds None/empty-dict rejection to the sync templates and create_deployed_handler. Once both land, sync QB handlers reject {} but async QB handlers (selected when max_concurrency > 1) silently invoke with empty kwargs. Users migrating to max_concurrency would see inconsistent platform parity depending on whether their handler is sync or async.
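A sketch of the shared guard the async templates could adopt so sync and async QB handlers reject empty input identically (the response shape follows the PR #302 behaviour described above; func is a stand-in for the user's worker):

```python
import asyncio

async def func(**kwargs):  # stand-in for the user's async worker
    return {"success": True, "echo": kwargs}

async def handler(job):
    job_input = job.get("input")
    if not job_input:  # rejects a missing key, None, and {} alike
        return {"error": "job input must be a non-empty object", "success": False}
    return await func(**job_input)

print(asyncio.run(handler({"input": {}})))
print(asyncio.run(handler({"input": {"x": 1}})))
```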


3. Mixed sync/async class methods with max_concurrency > 1 will TypeError at runtime

The scanner correctly detects a class as async if any public method is a coroutine. The async class template then dispatches with await method(**job_input). If the dispatched method is the sync one, await sync_return_value raises TypeError: object dict can't be used in 'await' expression.

The warning about sync handlers only fires at the function level. For a class with e.g. a sync setup() and an async predict(), the class is classified as async, the async template is selected, and calling setup via the deployed endpoint will crash. test_mixed_class_detected_as_async confirms the detection behaviour but there's no test for what happens when the sync method is dispatched.
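One way to fix this is a per-method awaitable check at dispatch time rather than a per-class template choice (a sketch with assumed names; the real template would need the same guard):

```python
import asyncio
import inspect

class Worker:
    def setup(self):             # sync method on an "async" class
        return {"ready": True}

    async def predict(self, x):  # the method that makes the class async
        return {"out": x}

async def dispatch(instance, name, **kwargs):
    method = getattr(instance, name)
    result = method(**kwargs)
    # Await only when the call actually returned an awaitable, so
    # dispatching the sync method no longer raises TypeError.
    if inspect.isawaitable(result):
        result = await result
    return result

w = Worker()
print(asyncio.run(dispatch(w, "setup")))
print(asyncio.run(dispatch(w, "predict", x=1)))
```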


4. Async class template error dicts missing "success": False

The DEPLOYED_ASYNC_CLASS_HANDLER_TEMPLATE error responses for unknown and missing method key don't include "success": False:

return {{"error": "class {class_name} has multiple methods..."}}

The sync class template (modified in PR #302) has "success": False in these same paths. If both PRs merge the response shape differs based on concurrency configuration.


5. _inject_concurrency_modifier string-replacement is fragile but guarded

The injector matches 'runpod.serverless.start({"handler": handler})' as a literal string. Any future template change that reformats that call would cause ValueError at build time rather than silent corruption. The raise-on-miss behaviour and the test for it are the right calls; worth keeping in mind for future template edits.


Nit

The max_concurrency > 100 warning fires at flash build time, not flash deploy. A user setting max_concurrency=200 for a high-throughput CPU endpoint (e.g. embedding service) will see a GPU-oriented warning with no indication their workload is unaffected.


Verdict

Three things need resolution: (1) remote_cfg is never assigned — inline @Endpoint(max_concurrency=N) decorator is silently broken; (2) async templates skip the empty-input validation being added in PR #302; (3) mixed sync/async class methods crash at runtime when the sync method is dispatched. Issue 1 is a functional regression; 2 and 3 can be deferred with documentation if needed, but should be tracked.
