Skip to content

Improve HTTP response handling and retry behavior#1342

Open
MichaelGHSeg wants to merge 48 commits intomasterfrom
response-status-updates
Open

Improve HTTP response handling and retry behavior#1342
MichaelGHSeg wants to merge 48 commits intomasterfrom
response-status-updates

Conversation

@MichaelGHSeg
Copy link
Copy Markdown
Collaborator

@MichaelGHSeg MichaelGHSeg commented Feb 12, 2026

Summary

This PR improves HTTP response handling and retry behavior across browser and node SDKs to align with analytics-java implementation.

Key Changes

HTTP Headers:

  • Send X-Retry-Count header on retries after initial attempt
  • Add Authorization headers with Basic auth (browser) and OAuth fallback (node)

Retry Behavior:

  • Cap Retry-After header values at 300 seconds maximum
  • Remove 413 (Payload Too Large) from retryable status codes - payload won't shrink on retry
  • Standardize backoff timing: 100ms minimum, 60 second maximum
  • Increase default maxRetries to 10 (was 3 for node, 10 for browser)

Commits

  1. Always send X-Retry-Count and Authorization headers
  2. Add Retry-After cap and fix 413 handling
  3. Fix node publisher to always send X-Retry-Count header
  4. Standardize backoff timing: 100ms min, 60s max
  5. Set default maxRetries to 10

Test Coverage

  • Added tests for Authorization header in all dispatchers
  • Added tests for Retry-After capping behavior
  • Added tests for 413 non-retryable behavior
  • Updated all existing tests to reflect new X-Retry-Count behavior
  • Updated backoff tests for new timing expectations

All tests passing:

  • Browser: 28 tests (batched-dispatcher), 11 tests (fetch-dispatcher)
  • Node: 42 tests (publisher)
  • Core: 3 tests (backoff)

🤖 Generated with Claude Code

MichaelGHSeg and others added 12 commits January 16, 2026 11:49
- Send X-Retry-Count header on all requests (0 for first attempt)
- Add Basic auth using write key for browser dispatchers
- Node publisher prefers OAuth Bearer, falls back to Basic auth

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Cap Retry-After header at 300 seconds (MAX_RETRY_AFTER_SECONDS)
- Remove 413 from retryable status codes (payload won't shrink on retry)
- Fix fetch-dispatcher to base64 encode Authorization header
- Add test coverage for Authorization header in all dispatchers
- Add tests for Retry-After capping behavior
- Update test expectations for X-Retry-Count always being sent

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Remove conditional X-Retry-Count header logic in publisher.ts
- Always send X-Retry-Count starting with '0' on first attempt
- Update all test expectations from toBeUndefined() to toBe('0')
- Update T01 test name to reflect header is now sent

This aligns node behavior with browser dispatchers which already
always send the header.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Update core and browser backoff defaults: minTimeout 100ms (was 500ms), maxTimeout 60s (was Infinity)
- Update node publisher explicit backoff params to match
- Update backoff tests to reflect new timing expectations
- Aligns with analytics-java backoff timing (100ms base, 1min cap)

This provides faster initial retries while preventing excessive delays,
improving both responsiveness and resource efficiency.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Update node default maxRetries from 3 to 1000
- Update browser default maxRetries from 10 to 1000
- Aligns with analytics-java which uses 1000 max flush attempts

With the shorter 100ms base backoff (60s max), higher retry limits
allow better recovery while still respecting the backoff timing caps.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@changeset-bot
Copy link
Copy Markdown

changeset-bot bot commented Feb 12, 2026

🦋 Changeset detected

Latest commit: 25271b3

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 3 packages
Name Type
@segment/analytics-next Minor
@segment/analytics-node Minor
@segment/analytics-core Patch

Not sure what this means? Click here to learn what changesets are.

Click here if you're a maintainer who wants to add another changeset to this PR

@codecov
Copy link
Copy Markdown

codecov bot commented Feb 12, 2026

Codecov Report

❌ Patch coverage is 95.76271% with 15 lines in your changes missing coverage. Please review.
✅ Project coverage is 91.68%. Comparing base (8f10626) to head (25271b3).

Files with missing lines Patch % Lines
packages/node/src/plugins/segmentio/publisher.ts 92.68% 9 Missing ⚠️
...rowser/src/plugins/segmentio/batched-dispatcher.ts 95.87% 4 Missing ⚠️
packages/browser/src/plugins/segmentio/index.ts 92.59% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1342      +/-   ##
==========================================
+ Coverage   91.32%   91.68%   +0.36%     
==========================================
  Files         163      163              
  Lines        4393     4657     +264     
  Branches     1052     1151      +99     
==========================================
+ Hits         4012     4270     +258     
- Misses        381      387       +6     
Flag Coverage Δ
browser 92.52% <97.15%> (+0.29%) ⬆️
core 90.12% <100.00%> (-0.05%) ⬇️
node 90.07% <93.66%> (+1.55%) ⬆️

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

- Updated header expectations in http-integration.test.ts to include Authorization
- Updated header expectations in http-client.integration.test.ts to include X-Retry-Count and Authorization
- Added maxRetries: 3 to OAuth integration tests that expect exactly 3 retry attempts
- Added maxRetries: 0 to timeout test to prevent excessive retries during timeout testing

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR updates the browser and node SDK HTTP dispatch/retry logic to align response handling (headers, retryability, backoff timing) with the analytics-java implementation.

Changes:

  • Standardizes request headers (adds Authorization and X-Retry-Count) and updates tests accordingly.
  • Revises retry policy (Retry-After handling + cap, retryable status allow/deny lists, standardized backoff bounds).
  • Increases default retry budget (notably node default maxRetries to 1000) and adjusts token refresh retry behavior.

Reviewed changes

Copilot reviewed 23 out of 23 changed files in this pull request and generated 13 comments.

Show a summary per file
File Description
packages/node/src/plugins/segmentio/publisher.ts New header handling + revised retry semantics including Retry-After support and new success/retry classification.
packages/node/src/plugins/segmentio/tests/publisher.test.ts Expands coverage for new retry semantics, headers, and status handling.
packages/node/src/lib/token-manager.ts Adjusts OAuth retry pacing, switches rate-limit handling to Retry-After, and fixes token validity logic.
packages/node/src/lib/tests/token-manager.test.ts Adds/updates tests for token validity, Retry-After spacing, and refresh scheduling.
packages/node/src/app/analytics-node.ts Raises default maxRetries to 1000.
packages/node/src/tests/test-helpers/assert-shape/segment-http-api.ts Makes header assertions resilient to new required headers.
packages/node/src/tests/oauth.integration.test.ts Updates OAuth integration expectations for Retry-After and retry limits.
packages/node/src/tests/http-integration.test.ts Updates integration snapshots/expectations for auth + retry headers.
packages/node/src/tests/http-client.integration.test.ts Updates expected outbound headers for node HTTP client integration.
packages/node/src/tests/emitter.integration.test.ts Updates emitter integration setup for new retry behavior expectations.
packages/core/src/priority-queue/backoff.ts Changes default backoff bounds (min 100ms, max 60s).
packages/core/src/priority-queue/tests/backoff.test.ts Updates core backoff tests to match new bounds.
packages/browser/src/plugins/segmentio/ratelimit-error.ts Adds flag to mark Retry-After retries as “doesn’t consume retry budget”.
packages/browser/src/plugins/segmentio/index.ts Passes attempt count through to dispatcher and drops non-retryable failures.
packages/browser/src/plugins/segmentio/fetch-dispatcher.ts Adds Basic auth header, supports Retry-After, and refines retryable status handling.
packages/browser/src/plugins/segmentio/batched-dispatcher.ts Adds Basic auth + X-Retry-Count, introduces Retry-After handling, and changes batching behavior.
packages/browser/src/plugins/segmentio/tests/retries.test.ts Updates retry semantics tests and adds header expectations.
packages/browser/src/plugins/segmentio/tests/fetch-dispatcher.test.ts New unit tests for standard dispatcher status/Retry-After/header behavior.
packages/browser/src/plugins/segmentio/tests/batched-dispatcher.test.ts Updates batching snapshots and adds retry/header semantics coverage.
packages/browser/src/lib/priority-queue/backoff.ts Mirrors core backoff default changes for browser PQ backoff.
packages/browser/src/lib/priority-queue/tests/backoff.test.ts Updates browser backoff tests to match new bounds.
packages/browser/src/core/mocks/analytics-page-tools.ts Adds a mock module file (currently appears not wired up).
packages/browser/jest.config.js Adds moduleNameMapper for @segment/analytics-page-tools.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

MichaelGHSeg and others added 4 commits February 13, 2026 11:57
When buildBatch() splits events due to payload size limits, ensure remaining
events are flushed:
- On success: schedule flush for remaining buffered events
- On retry exhaustion: drop failed batch but continue flushing remaining events

This prevents events from being orphaned indefinitely when batches are split.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Replace shared totalAttempts counter with per-flush requestCount + isRetrying
  flag to fix race condition when pagehide sends concurrent chunks via
  Promise.all (each chunk now correctly gets X-Retry-Count: 0)
- sendBatch takes retryCount parameter instead of using shared mutable state
- Guard Authorization header behind if(writeKey) to avoid unnecessary encoding
- Schedule flush for remaining events on both success and retry exhaustion paths

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add MAX_RETRY_AFTER_RETRIES (20) safety cap to prevent infinite
  retries when server keeps returning Retry-After headers (node
  publisher and browser batched-dispatcher)
- Lower node maxRetries default from 1000 to 10
- Fix convertHeaders TS warning by removing redundant any cast
- Fix retryCount metadata stamped on wrong events (batch.forEach
  instead of buffer.map)
- Update tests: X-Retry-Count always sent (0 on first attempt),
  413 now non-retryable, 300 treated as success, Retry-After cap

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Update priority-queue backoff thresholds for minTimeout 100ms
  (was 500ms): delay assertions lowered from >1000/2000/3000 to
  >200/400/800
- Update integration test to use objectContaining for headers
  since Authorization and X-Retry-Count are now always sent

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 26 out of 26 changed files in this pull request and generated 6 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

bsneed
bsneed previously approved these changes Feb 18, 2026
…ap test

- Clamp parsed Retry-After to >= 0 in batched-dispatcher and fetch-dispatcher
- Clamp clockSkew-adjusted wait time to >= 0 in token-manager
- Add T21 test for MAX_RETRY_AFTER_RETRIES safety cap in publisher
- Rename T01 test to reflect that header is '0', not absent

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
MichaelGHSeg and others added 2 commits February 20, 2026 12:01
The mock was never applied because moduleNameMapper in jest.config.js
points to the real implementation, taking precedence over __mocks__.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…patchers

Extract parseRetryAfter and getStatusBehavior helpers into
shared-dispatcher, replacing duplicated hardcoded status code logic in
both fetch-dispatcher and batched-dispatcher. Status behavior is now
driven by httpConfig (default4xxBehavior, default5xxBehavior,
statusCodeOverrides) from CDN settings, with sensible built-in defaults.

Also wires httpConfig through settings.ts and the segmentio plugin index
so CDN-provided HTTP configuration reaches the dispatchers.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
MichaelGHSeg and others added 13 commits February 27, 2026 12:54
The batched-dispatcher double-scheme bug is fixed in source, so the
patch no longer applies. run-tests.sh handles stale patch references
gracefully via --check guard.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Align retry configuration with cross-library defaults:
- Node: base backoff 100ms -> 500ms, increase test timeouts
- Browser: max backoff 300s -> 60s, max retries 100 -> 10

Retry-After cap remains at 300s across all libraries.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Deep-merge CDN httpConfig over init options so server can override
  retry behavior while client fills in gaps
- Track rate-limit retries separately via WeakMap for correct maxRetryCount

Note: the enabled flag is intentionally not implemented for browser.
It's meant for mobile SDKs to revert to legacy requeue behavior;
browser has no such legacy behavior and honoring it would drop data.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add retry, retry-settings to test_suites. Set HTTP_CONFIG_SETTINGS,
SETTINGS_ERROR_FALLBACK, and skip settings-enabled-flag tests (enabled
flag is for mobile SDKs to revert to legacy requeue behavior).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Centralize CDN/init merge logic in resolveHttpConfig(config, cdnConfig)
  instead of duplicating in each caller
- Only override buffer.maxAttempts when retryQueue is not disabled,
  preventing httpConfig from overriding retryQueue: false
- Add unit test for retryQueue: false + httpConfig interaction

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Backoff base changed from 25ms to 500ms, so 3 retries with exponential
backoff + crypto signing can exceed the previous 10s timeout. Bumped to
30s.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
With the upcoming v2 endpoint change, initial requests no longer need
the header to distinguish new vs old traffic. First attempt now omits
X-Retry-Count entirely; retries send 1, 2, 3, etc.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Reduce flushInterval to 1s (tests don't need 10s)
- Pass explicit 60s timeout to closeAndFlush so retries with
  exponential backoff have time to complete
- Enable retry test suite in e2e-config.json

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The wildcard dependency ("*") resolved from npm on CI, causing e2e tests
to run against the published SDK rather than the branch under test.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Inline single-element RETRY_AFTER_STATUSES array as direct comparison,
  bringing browser UMD bundle back under the 29.7 KB size limit
- Increase timeout on emitter retry test (backoff can take ~5s, matching
  the default Jest timeout)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@mschasz
Copy link
Copy Markdown
Collaborator

mschasz commented Apr 6, 2026

Review Summary

Three specialized reviewers (HTTP/Networking, SDK Architecture, Testing/Quality) analyzed this PR for production readiness.


Consensus Status: NOT YET REACHED

The core implementation is solid, but the reviewers identified issues that should be addressed before merge.


Critical Concerns (Must Address)

1. CDN Config Merge Precedence Issue

File: packages/browser/src/plugins/segmentio/shared-dispatcher.ts

The comment says CDN config "wins on overlapping fields", but this is backwards from typical SDK behavior where user-provided config should override server-pushed config. Need to clarify intent and potentially swap merge order.

2. Missing Unit Tests for Config Merge

No tests verify resolveHttpConfig(initConfig, cdnConfig) with both parameters populated. Add tests like:

it('verifies init vs CDN merge precedence', () => {
  const init = { rateLimitConfig: { maxRetryCount: 5 } }
  const cdn = { rateLimitConfig: { maxRetryCount: 20 } }
  const resolved = resolveHttpConfig(init, cdn)
  // Assert expected winner
})

3. Missing Concurrent Batch Handling Tests

The implementation maintains state (requestCount, isRetrying, _rateLimitedUntil) that could race with concurrent batches. No tests verify behavior when multiple batches are in-flight and receive 429s simultaneously.


Minor Concerns (Non-Blocking)

Area Issue Severity
HTTP Date-format Retry-After not supported Low (degrades gracefully)
HTTP X-Retry-Count shows '2' on first retry after 429+Retry-After Documentation
Testing Real timer usage could be flaky under CI load Low
Testing buildBatch partial send not directly tested Low
Docs Typo "alaways" in shared-dispatcher.ts line 66 Trivial
Docs Magic numbers (300s, 20 retries) undocumented Low

What All Reviewers Agree Is Production-Ready

  • HTTP status code handling per RFC specs
  • Exponential backoff with jitter
  • Retry-After parsing (integer format)
  • X-Retry-Count header semantics (omit on first attempt)
  • Duration caps preventing infinite retries
  • Backward compatibility maintained
  • Comprehensive test coverage for core paths
  • Shared dispatcher abstraction design

Recommendation

Address these 3 items to reach consensus:

  1. Clarify/fix CDN config merge precedence - determine intended behavior and fix or document
  2. Add unit tests for two-parameter resolveHttpConfig
  3. Add tests for concurrent batch rate-limiting

🤖 Review conducted by 3 specialized Claude agents (HTTP/Networking, SDK Architecture, Testing/Quality)

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 37 out of 37 changed files in this pull request and generated 8 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

… batch on maxRateLimitDuration exceeded

- Move totalAttempts++ after rate-limit sleep check so it only counts actual HTTP requests
- Detect when maxRateLimitDuration is exceeded and drop the batch instead of silently retrying
- Update test expectations for X-Retry-Count and maxRateLimitDuration behavior
- Clamp token-manager backoff attempt to non-negative
Copy link
Copy Markdown
Contributor

@silesky silesky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Deep Review: Multi-Persona Analysis

Synthesized from three independent review passes (Correctness & Logic, Architecture & Design, Security & Edge Cases).


Critical

1. Core backoff default changes silently break all browser plugins

Files: packages/core/src/priority-queue/backoff.ts, packages/browser/src/lib/priority-queue/backoff.ts

The backoff() function defaults changed from minTimeout=500, maxTimeout=Infinity to minTimeout=100, maxTimeout=60000. This function is used by PriorityQueue.pushWithBackoff for ALL browser plugin retries, not just Segment.io delivery. Third-party destination plugins that relied on the old timing will now retry 5x faster with a hard 60s cap.

The node publisher already overrides these defaults explicitly (minTimeout: 500, maxTimeout: 60000 at publisher.ts:501), which demonstrates awareness that the core defaults matter. But the browser's PriorityQueue calls backoff() with no explicit parameters.

Recommendation: Revert core defaults and pass explicit parameters at each call site that needs the new values. This keeps the change scoped to Segment.io delivery without surprising other consumers.

2. maxRetryCount is not clamped, enabling unbounded retries from CDN config

File: shared-dispatcher.ts:292,298

Both rateLimitConfig.maxRetryCount and backoffConfig.maxRetryCount use ?? 10 as default but are never passed through clamp(), unlike every other numeric field. A compromised CDN or bad config can set maxRetryCount: Infinity. In the node publisher, this feeds the safety cap formula totalAttempts > maxRetries + MAX_RETRY_AFTER_RETRIES (publisher.ts:495), which would never trigger with maxRetries = Infinity, effectively removing the only hard loop termination for rate-limited retries.

Recommendation: Add clamp() to both maxRetryCount fields, e.g., clamp(rate?.maxRetryCount, 10, 0, 100).


Important

3. Status code retry policy is duplicated between browser and node with no shared code

Files: shared-dispatcher.ts:166-174 (browser) vs publisher.ts:441-462 (node)

The browser encodes the retry policy via DEFAULT_STATUS_CODE_OVERRIDES and getStatusBehavior(). The node publisher hardcodes the same lists inline: [501, 505, 511] as non-retryable 5xx, [408, 410, 429, 460] as retryable 4xx. These two representations of the same policy will inevitably drift.

The node also reimplements getRetryAfterInSeconds and convertHeaders, duplicating browser parseRetryAfter logic.

Recommendation: Extract shared retry policy types and functions (getStatusBehavior, parseRetryAfter, computeBackoff, DEFAULT_STATUS_CODE_OVERRIDES) into @segment/analytics-core. This can be incremental: move the types/functions now, wire up node in a follow-up PR.

4. Node publisher Retry-After: 0 creates a tight request loop

File: publisher.ts:315-508

When 429 is received with Retry-After: 0, the rate-limit delay is 0ms. The loop sets shouldCountTowardsMaxRetries = false, so delayMs = 0 (line 500-507). It loops back to the top where _isRateLimited() returns false immediately. The code proceeds to make another request with zero delay. With the safety cap at 30 total attempts (default maxRetries=10 + MAX_RETRY_AFTER_RETRIES=20), this fires 30 rapid requests against the rate-limiting server.

Test T21 exercises this scenario and confirms termination, but 30 zero-delay requests against a rate-limiting server is aggressive.

Recommendation: Enforce a minimum delay (e.g., 1000ms) for the rate-limit retry path, even when Retry-After: 0.

5. Batched dispatcher has potential concurrent flush state corruption

File: batched-dispatcher.ts:180-264

isRetrying is cleared synchronously at the top of flush() (line 187), before sendBatch() returns. While the promise is pending, new events arriving via dispatch() can trigger a direct flush() call if the buffer overflows (line 310). This creates two concurrent promise chains operating on the same mutable closure state (requestCount, totalBackoffTime, etc.). The second flush() resets requestCount=0, corrupting the count accumulated by the first.

JavaScript's single-threaded execution means interleaving only occurs at .then/.catch boundaries, and the schedule guard reduces likelihood. But under high throughput with large events, this can occur.

Recommendation: Add a guard at the top of flush() to prevent re-entrant execution (e.g., an isFlushing flag that prevents a second concurrent flush()).


Concerns

6. enabled fields on config types are dead code

File: shared-dispatcher.ts:99,114

RateLimitConfig.enabled and BackoffConfig.enabled are resolved and stored but never checked. Setting enabled: false has no effect. This erodes trust in the API surface.

Recommendation: Remove them until they are functional, or wire them up.

7. Authorization header now sent unconditionally (node), increasing credential exposure

File: publisher.ts:370

Previously, Authorization was only sent when OAuth was configured. Now Authorization: Basic <base64(writeKey:)> is always sent as fallback. The write key was already in the body, but the header format is more likely to be captured by proxy logs, CDN logs, and APM tools. The http_request emitter event (line 385-390) also now includes this header, so monitoring listeners receive the credential.

Recommendation: Document this as an intentional behavior change. Consider redacting the Authorization header from the http_request emitted event.

8. CDN can override statusCodeOverrides to silently drop retryable errors

Files: shared-dispatcher.ts:278-280, batched-dispatcher.ts:170

CDN config merges last and can set statusCodeOverrides: { '429': 'drop', '500': 'drop' }. The batched dispatcher's non-retryable path (line 170) silently returns with no error emitted. Events are lost with no signal.

Recommendation: Log or emit a warning when events are dropped due to overrides on normally-retryable status codes.

9. 1xx responses treated as successful delivery (node)

File: publisher.ts:395

response.status >= 100 && response.status < 400 treats 1xx (e.g., 100 Continue) as success. Standard HTTP clients don't surface 1xx as final responses, but a custom HTTPClient implementation could expose this.

Recommendation: Tighten to >= 200 && < 400.


Summary

Severity Count Key Items
Critical 2 Core backoff breaking change; unclamped maxRetryCount
Important 3 Policy duplication; tight 429 loop; concurrent flush state
Concern 4 Dead enabled fields; auth header exposure; CDN override risk; 1xx as success

attempt,
maxTimeout = Infinity,
} = params
const { minTimeout = 100, factor = 2, attempt, maxTimeout = 60000 } = params
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Critical: Silent breaking change for all browser plugins.

This default change (minTimeout: 500→100, maxTimeout: Infinity→60000) affects every consumer of PriorityQueue.pushWithBackoff, including third-party destination plugins. The node publisher already overrides these explicitly (minTimeout: 500, maxTimeout: 60000 in publisher.ts:501).

Recommendation: Revert these defaults and pass explicit parameters at each Segment.io call site that needs the new timing.

},
backoffConfig: {
enabled: backoff?.enabled ?? true,
maxRetryCount: backoff?.maxRetryCount ?? 10,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Critical: maxRetryCount is not clamped.

Both rateLimitConfig.maxRetryCount (line 292) and backoffConfig.maxRetryCount (here) use ?? 10 but skip clamp(), unlike every other numeric field. A compromised CDN config or bad user input can set maxRetryCount: Infinity, which disables the safety cap in the node publisher (totalAttempts > maxRetries + MAX_RETRY_AFTER_RETRIES) and enables unbounded retries.

Recommendation: clamp(backoff?.maxRetryCount, 10, 0, 100).

// 511 is retried only when a token manager is configured.
if (status === 511 && this._tokenManager) {
shouldRetry = true
} else if (![501, 505, 511].includes(status)) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Important: Status code retry policy is duplicated with no shared code.

The browser encodes the policy via DEFAULT_STATUS_CODE_OVERRIDES and getStatusBehavior() in shared-dispatcher.ts. This file hardcodes the same lists inline: [501, 505, 511] as non-retryable 5xx, [408, 410, 429, 460] as retryable 4xx. getRetryAfterInSeconds and convertHeaders also duplicate browser parseRetryAfter logic.

These will inevitably drift. Consider extracting the shared policy into @segment/analytics-core (can be incremental: move types/functions now, wire up node in a follow-up).

maxTimeout: 1000,
})
)
const delayMs = shouldCountTowardsMaxRetries
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Important: Retry-After: 0 creates a tight request loop.

When 429 has Retry-After: 0, delayMs = 0 (line 500-507, because shouldCountTowardsMaxRetries = false). The loop iterates back to the top where _isRateLimited() returns false immediately (deadline already past). Another request fires with zero delay. With the safety cap at 30 total attempts (maxRetries=10 + MAX_RETRY_AFTER_RETRIES=20), this fires 30 rapid requests against the rate-limiting server.

Recommendation: Enforce a minimum delay (e.g., 1000ms) for the rate-limit retry path.

totalBackoffTime = 0
totalRateLimitTime = 0
}
isRetrying = false
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Important: Potential concurrent flush state corruption.

isRetrying is cleared synchronously at line 187, before sendBatch() resolves. While the promise is pending, new events via dispatch() can trigger a direct flush() if buffer overflows (line 310), since isRetrying is false. Two concurrent promise chains then operate on the same mutable closure state (requestCount, totalBackoffTime, etc.). The second flush() resets requestCount=0, corrupting the first's count.

The schedule guard and JS single-threading reduce likelihood, but high-throughput scenarios with large events can hit this.

Recommendation: Add an isFlushing flag to prevent re-entrant flush().

* Browser SDK already had rate-limit handling before this config and currently keeps existing behavior.
* @default true
*/
enabled?: boolean
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Concern: enabled is resolved and stored but never checked anywhere.

Both RateLimitConfig.enabled and BackoffConfig.enabled are carried through resolution but no code path reads them. Setting enabled: false has no effect, which is misleading.

Recommendation: Remove until functional, or wire them up. A no-op config field erodes trust in the API surface.

// Prefer OAuth Bearer token when available; otherwise fall back to Basic auth with write key.
...(authString
? { Authorization: authString }
: { Authorization: `Basic ${this._basicAuth}` }),
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Concern: Authorization header now sent unconditionally, increasing credential exposure.

Previously, Authorization was only sent when OAuth was configured. Now Basic <base64(writeKey:)> is always the fallback. The write key was already in the body, but the header format is more likely to be captured by proxy logs, CDN logs, and APM tools. The http_request emitter event (lines 385-390) also includes this header, so monitoring listeners receive the credential.

Consider redacting Authorization from the emitted event, or documenting this as intentional.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants