feat(releases): Cache calls to compare-commits #112494

Merged
armenzg merged 6 commits into master from armenzg/fix/github-fetch-commits-cache
Apr 9, 2026

Member

@armenzg armenzg commented Apr 8, 2026

A customer is receiving a large number of GitHub API rate-limit emails, and the highest referrer is compare-commits.

This branch temporarily caches compare-commits results to protect against bursts of incoming calls, so we can see whether that solves the problem.

@armenzg armenzg self-assigned this Apr 8, 2026
@github-actions github-actions bot added the Scope: Backend label (automatically applied to PRs that change backend components) Apr 8, 2026
- Added a new org feature flag in `src/sentry/features/temporary.py`:
  - `organizations:integrations-github-fetch-commits-compare-cache`

- Refactored commit fetching in `src/sentry/tasks/commits.py`:
  - Added `fetch_compare_commits(...)` helper to centralize compare-commits behavior.
  - Added cache key generation with `get_github_compare_commits_cache_key(...)`.
  - Added cache-backed reuse of GitHub compare-commits results (TTL: 120s), gated by the new feature flag.
  - Limited caching to GitHub providers (`integrations:github`, `integrations:github_enterprise`) and only when `start_sha` is present.
  - Added lifecycle extras for cache telemetry (`compare_commits_cache_enabled`, `compare_commits_cache_hit`).
  - `fetch_commits(...)` now evaluates the feature flag once and routes compare calls through the helper.

- Added provider-level test coverage in `tests/sentry/integrations/github/test_repository.py`:
  - New test verifies repeated compare calls reuse cached patchset data and reduce API calls.

- Added task-level cache behavior tests in `tests/sentry/tasks/test_commits.py`:
  - Cache disabled: compare called on each fetch.
  - Cache enabled: compare result reused across releases with the same compare range.
  - Cache key variance: different `end_sha` values do not share cache entries.

- Included a follow-up typing fix in `src/sentry/tasks/commits.py`:
  - Narrowed `repo.provider` to `str` before cache-key creation to satisfy mypy (`str | None` -> `str`).
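
The caching flow described in the list above can be sketched roughly as follows. This is a minimal, self-contained illustration under stated assumptions, not the code merged in this PR: `TTLCache`, `cached_compare_commits`, `compare_commits_cache_key`, and the `compare` callable are hypothetical stand-ins for Sentry's cache backend, the new helper, and `provider.compare_commits`, and SHA-256 stands in for the `hash_values` helper. Only the key prefix, the 120s TTL, the provider list, and the `start_sha` requirement come from this PR.

```python
from __future__ import annotations

import hashlib
import time
from collections.abc import Callable
from typing import Any

COMPARE_COMMITS_CACHE_TTL = 120  # seconds, per the PR description
GITHUB_CACHEABLE_REPOSITORY_PROVIDERS = frozenset(
    {"integrations:github", "integrations:github_enterprise"}
)


class TTLCache:
    """Tiny in-memory stand-in for a real cache backend (assumption)."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._data: dict[str, tuple[float, Any]] = {}

    def get(self, key: str) -> Any | None:
        entry = self._data.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            # Entry outlived its TTL: drop it and report a miss.
            del self._data[key]
            return None
        return value

    def set(self, key: str, value: Any, ttl: float) -> None:
        self._data[key] = (self._clock() + ttl, value)


def compare_commits_cache_key(
    organization_id: int, repository_id: int, provider: str, start_sha: str, end_sha: str
) -> str:
    # SHA-256 over the joined parts stands in for Sentry's hash_values helper.
    parts = (organization_id, repository_id, provider, start_sha, end_sha)
    digest = hashlib.sha256("|".join(map(str, parts)).encode()).hexdigest()
    return f"fetch-commits:compare-commits:v1:{digest}"


def cached_compare_commits(
    cache: TTLCache,
    cache_enabled: bool,
    organization_id: int,
    repository_id: int,
    provider: str | None,
    start_sha: str | None,
    end_sha: str,
    compare: Callable[[str | None, str], list[dict[str, Any]]],
) -> tuple[list[dict[str, Any]], bool]:
    """Return (commits, cache_hit). Only GitHub ranges with a start_sha are cached."""
    if (
        cache_enabled
        and isinstance(provider, str)
        and provider in GITHUB_CACHEABLE_REPOSITORY_PROVIDERS
        and start_sha is not None
    ):
        key = compare_commits_cache_key(
            organization_id, repository_id, provider, start_sha, end_sha
        )
        cached = cache.get(key)
        if cached is not None:
            return cached, True
        commits = compare(start_sha, end_sha)
        cache.set(key, commits, COMPARE_COMMITS_CACHE_TTL)
        return commits, False
    # Flag off, non-GitHub provider, or no start_sha: always call through.
    return compare(start_sha, end_sha), False
```

In this sketch the boolean in the return value plays the role of the `compare_commits_cache_hit` lifecycle extra: two fetches for the same range within the TTL result in one upstream call and one hit.
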
@armenzg armenzg force-pushed the armenzg/fix/github-fetch-commits-cache branch from 88b35be to 6b1a01d April 8, 2026 18:12
Contributor

@cursor cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Autofix Details

Bugbot Autofix prepared a fix for the issue found in the latest run.

  • ✅ Fixed: Provider parameter overwritten with string, breaking compare_commits calls
    • Renamed repo.provider assignment to repo_provider_name to preserve the provider object parameter for compare_commits calls.

Or push these changes by commenting:

@cursor push 24519ed48a
Preview (24519ed48a)
diff --git a/src/sentry/tasks/commits.py b/src/sentry/tasks/commits.py
--- a/src/sentry/tasks/commits.py
+++ b/src/sentry/tasks/commits.py
@@ -95,15 +95,15 @@
     lifecycle,
 ):
     cache_key = None
-    provider = repo.provider
+    repo_provider_name = repo.provider
     if (
         cache_enabled
-        and isinstance(provider, str)
-        and provider in GITHUB_CACHEABLE_REPOSITORY_PROVIDERS
+        and isinstance(repo_provider_name, str)
+        and repo_provider_name in GITHUB_CACHEABLE_REPOSITORY_PROVIDERS
         and start_sha is not None
     ):
         cache_key = get_github_compare_commits_cache_key(
-            repo.organization_id, repo.id, provider, start_sha, end_sha
+            repo.organization_id, repo.id, repo_provider_name, start_sha, end_sha
         )
 
     if cache_key is not None:

@github-actions

This comment was marked as outdated.

Contributor

@cursor cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Bugbot Autofix prepared a fix for the issue found in the latest run.

  • ✅ Fixed: Tests pass wrong kwarg, silently breaking cache validation
    • Changed all 13 occurrences of 'previous_release_id=' to 'prev_release_id=' to match the actual function parameter name, enabling proper cache validation testing.

Or push these changes by commenting:

@cursor push c275806d00
Preview (c275806d00)
diff --git a/tests/sentry/tasks/test_commits.py b/tests/sentry/tasks/test_commits.py
--- a/tests/sentry/tasks/test_commits.py
+++ b/tests/sentry/tasks/test_commits.py
@@ -52,7 +52,7 @@
                     release_id=release2.id,
                     user_id=user.id,
                     refs=refs,
-                    previous_release_id=release.id,
+                    prev_release_id=release.id,
                 )
 
         commit_list = list(
@@ -129,13 +129,13 @@
                 release_id=first_release.id,
                 user_id=self.user.id,
                 refs=refs,
-                previous_release_id=previous_release.id,
+                prev_release_id=previous_release.id,
             )
             fetch_commits(
                 release_id=second_release.id,
                 user_id=self.user.id,
                 refs=refs,
-                previous_release_id=previous_release.id,
+                prev_release_id=previous_release.id,
             )
 
         assert mock_compare_commits.call_count == 2
@@ -178,13 +178,13 @@
                     release_id=first_release.id,
                     user_id=self.user.id,
                     refs=refs,
-                    previous_release_id=previous_release.id,
+                    prev_release_id=previous_release.id,
                 )
                 fetch_commits(
                     release_id=second_release.id,
                     user_id=self.user.id,
                     refs=refs,
-                    previous_release_id=previous_release.id,
+                    prev_release_id=previous_release.id,
                 )
 
         assert mock_compare_commits.call_count == 1
@@ -231,13 +231,13 @@
                     release_id=first_release.id,
                     user_id=self.user.id,
                     refs=refs_first,
-                    previous_release_id=previous_release.id,
+                    prev_release_id=previous_release.id,
                 )
                 fetch_commits(
                     release_id=second_release.id,
                     user_id=self.user.id,
                     refs=refs_second,
-                    previous_release_id=previous_release.id,
+                    prev_release_id=previous_release.id,
                 )
 
         assert mock_compare_commits.call_count == 2
@@ -264,7 +264,7 @@
                 release_id=new_release.id,
                 user_id=self.user.id,
                 refs=refs,
-                previous_release_id=old_release.id,
+                prev_release_id=old_release.id,
             )
         count_query = ReleaseHeadCommit.objects.filter(release=new_release)
         # No release commits should be made as the task should return early.
@@ -297,7 +297,7 @@
         mock_compare_commits.side_effect = InvalidIdentity(identity=usa)
 
         fetch_commits(
-            release_id=release2.id, user_id=self.user.id, refs=refs, previous_release_id=release.id
+            release_id=release2.id, user_id=self.user.id, refs=refs, prev_release_id=release.id
         )
 
         mock_handle_invalid_identity.assert_called_once_with(identity=usa, commit_failure=True)
@@ -334,7 +334,7 @@
                 release_id=release2.id,
                 user_id=self.user.id,
                 refs=refs,
-                previous_release_id=release.id,
+                prev_release_id=release.id,
             )
 
         msg = mail.outbox[-1]
@@ -375,7 +375,7 @@
                 release_id=release2.id,
                 user_id=sentry_app.proxy_user_id,
                 refs=refs,
-                previous_release_id=release.id,
+                prev_release_id=release.id,
             )
 
         msg = mail.outbox[-1]
@@ -415,7 +415,7 @@
                 release_id=release2.id,
                 user_id=self.user.id,
                 refs=refs,
-                previous_release_id=release.id,
+                prev_release_id=release.id,
             )
 
         msg = mail.outbox[-1]
@@ -456,7 +456,7 @@
                 release_id=release2.id,
                 user_id=self.user.id,
                 refs=refs,
-                previous_release_id=release.id,
+                prev_release_id=release.id,
             )
 
         msg = mail.outbox[-1]

Reviewed by Cursor Bugbot for commit 6bab863.

Member Author

armenzg commented Apr 8, 2026

@sentry review

Contributor

github-actions bot commented Apr 8, 2026

Backend Test Failures

Failures on 563c738 in this run:

tests/sentry/tasks/test_commits.py::FetchCommitsTest::test_github_compare_commits_cache_flag_enabled
[gw1] linux -- Python 3.13.1 /home/runner/work/sentry/sentry/.venv/bin/python3
tests/sentry/tasks/test_commits.py:190: in test_github_compare_commits_cache_flag_enabled
    assert mock_compare_commits.call_count == 1
E   AssertionError: assert 2 == 1
E    +  where 2 = <MagicMock name='compare_commits' id='140223302636880'>.call_count

Member Author

armenzg commented Apr 8, 2026

@sentry review

@@ -1,12 +1,15 @@
from __future__ import annotations

import logging
from collections.abc import Mapping, Sequence
Member Author

I'm adding stronger typing for this PR to be a bit more certain of the changes.

    user: RpcUser | None,
    lifecycle: Any,
) -> list[dict[str, Any]]:
    if cache_enabled:
Member Author

This if/else block is the new caching mechanism.

    return f"fetch-commits:compare-commits:v1:{digest}"


def fetch_compare_commits(
Member Author

As part of this PR, I'm extracting some of the code from the main for loop into a couple of new functions to make the final code easier to read.

    else:
        lifecycle.add_extra("compare_commits_cache_enabled", False)

    if is_integration_repo_provider:
Member Author

This if/else comes from the main loop:

if is_integration_repo_provider:
    repo_commits = provider.compare_commits(repo, start_sha, end_sha)
else:
    repo_commits = provider.compare_commits(repo, start_sha, end_sha, actor=user)

    ref: Mapping[str, str],
    user_id: int,
) -> tuple[Repository, Any, bool, str] | None:
    repo = (
Member Author

The code here comes from the main loop (no logic is changed):

repo = (
    Repository.objects.filter(
        organization_id=release.organization_id,
        name=ref["repository"],
        status=ObjectStatus.ACTIVE,
    )
    .order_by("-pk")
    .first()
)
if not repo:
    logger.info(
        "repository.missing",
        extra={
            "organization_id": release.organization_id,
            "user_id": user_id,
            "repository": ref["repository"],
        },
    )
    continue
is_integration_repo_provider = is_integration_provider(repo.provider)
binding_key = (
    "integration-repository.provider"
    if is_integration_repo_provider
    else "repository.provider"
)
try:
    provider_cls = bindings.get(binding_key).get(repo.provider)
except KeyError:
    continue

provider = provider_cls(id=repo.provider)
provider_key = (
    provider_cls.repo_provider
    if is_integration_repo_provider
    else provider_cls.auth_provider
)

    continue
repo, provider, is_integration_repo_provider, provider_key = resolved
Member Author

All the removed lines above are replaced with these four lines.

    provider_cls.repo_provider
    if is_integration_repo_provider
    else provider_cls.auth_provider
)
Member Author

Moved within get_repo_and_provider_for_ref.

compare_commits_cache_enabled = (
    github_compare_commits_cache_feature_enabled
    and isinstance(provider_name, str)
    and provider_name in GITHUB_CACHEABLE_REPOSITORY_PROVIDERS
Member Author

I don't want to mess with all the other integrations.

@armenzg armenzg changed the title from "feat(commits): Cache calls to compare-commits" to "feat(releases): Cache calls to compare-commits" Apr 8, 2026
                prev_release_id=previous_release.id,
            )

        assert mock_compare_commits.call_count == 2
Member Author

Without the feature we call it twice.

{"id": end_sha, "repository": repo_name},
]

def _setup_github_compare_commits_cache_context(self):
Member Author

This helper function will simplify a lot of the tests in this module.

@@ -44,7 +87,7 @@ def _test_simple_action(self, user, org):
                     release_id=release2.id,
                     user_id=user.id,
                     refs=refs,
-                    previous_release_id=release.id,
+                    prev_release_id=release.id,
Member Author

This bug appears in many of the tests in this file. The task's parameter is named prev_release_id, not previous_release_id:

def fetch_commits(release_id: int, user_id: int, refs, prev_release_id=None, **kwargs):
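
To illustrate why the tests kept passing with the wrong keyword: because the signature ends in `**kwargs`, a misspelled argument is accepted silently and `prev_release_id` keeps its default. The sketch below uses a simplified stand-in for `fetch_commits`; its body is hypothetical and only the parameter list mirrors the signature quoted above.

```python
from __future__ import annotations

from typing import Any


def fetch_commits(
    release_id: int, user_id: int, refs: Any, prev_release_id: int | None = None, **kwargs: Any
) -> dict[str, Any]:
    # Hypothetical body: report what the task actually received.
    return {"prev_release_id": prev_release_id, "ignored_kwargs": sorted(kwargs)}


wrong = fetch_commits(release_id=2, user_id=1, refs=[], previous_release_id=1)
right = fetch_commits(release_id=2, user_id=1, refs=[], prev_release_id=1)

print(wrong)  # {'prev_release_id': None, 'ignored_kwargs': ['previous_release_id']}
print(right)  # {'prev_release_id': 1, 'ignored_kwargs': []}
```

With `prev_release_id` left at `None`, the task presumably has no previous release from which to derive a start SHA, so a range-keyed cache cannot be exercised by the test.
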

                    prev_release_id=previous_release.id,
                )

        assert mock_compare_commits.call_count == 1
Member Author

When the feature flag is enabled, we only make one call.

        assert mock_compare_commits.call_count == 1

    @patch("sentry.integrations.github.repository.GitHubRepositoryProvider.compare_commits")
    def test_github_compare_commits_cache_key_variance_on_end_sha(
Member Author

This test verifies that cache entries are keyed by commit range, not just repo/org.
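
As a rough illustration of what this test protects against, compare a key built only from org and repo with one built from the full commit range. Both helpers are hypothetical, and SHA-256 stands in for Sentry's `hash_values` (an assumption); only the key prefix is taken from this PR.

```python
import hashlib


def _digest(*parts: object) -> str:
    # SHA-256 over the joined parts; a stand-in for hash_values (assumption).
    return hashlib.sha256("|".join(map(str, parts)).encode()).hexdigest()


def naive_key(organization_id: int, repository_id: int) -> str:
    # Hypothetical key that ignores the commit range entirely.
    return f"fetch-commits:compare-commits:v1:{_digest(organization_id, repository_id)}"


def range_key(
    organization_id: int, repository_id: int, provider: str, start_sha: str, end_sha: str
) -> str:
    # Key that includes provider and both ends of the compared range.
    parts = (organization_id, repository_id, provider, start_sha, end_sha)
    return f"fetch-commits:compare-commits:v1:{_digest(*parts)}"


same_start = ("integrations:github", "aaa111")
first = range_key(1, 10, *same_start, "bbb222")
second = range_key(1, 10, *same_start, "ccc333")

# A repo/org-only key collides, so the second release would reuse the first one's commits.
assert naive_key(1, 10) == naive_key(1, 10)
# Distinct end_sha values produce distinct cache entries.
assert first != second
```
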

                fetch_commits(
                    release_id=first_release.id,
                    user_id=self.user.id,
                    refs=refs_first,
Member Author

Passing refs_first.

                fetch_commits(
                    release_id=second_release.id,
                    user_id=self.user.id,
                    refs=refs_second,
Member Author

Passing refs_second.

                    prev_release_id=previous_release.id,
                )

        assert mock_compare_commits.call_count == 2
Member Author

They each will make a call.

@armenzg armenzg marked this pull request as ready for review April 8, 2026 20:57
@armenzg armenzg requested review from a team as code owners April 8, 2026 20:57
Member

@billyvg billyvg left a comment

lgtm

@@ -80,9 +192,14 @@ def handle_invalid_identity(identity, commit_failure=False):
    silo_mode=SiloMode.CELL,
)
@retry(exclude=(Release.DoesNotExist, User.DoesNotExist))
def fetch_commits(release_id: int, user_id: int, refs, prev_release_id=None, **kwargs):
    # TODO(dcramer): this function could use some cleanup/refactoring as it's a bit unwieldy
Member

how old is this TODO :)

Member Author

lol very!


        lifecycle.add_extra("compare_commits_cache_hit", False)
    else:
        lifecycle.add_extra("compare_commits_cache_enabled", False)
Member

what is lifecycle?

Member Author

It's something the integrations team added to record more context about what happens during the lifecycle of an event.

    end_sha: str,
) -> str:
    digest = hash_values(
        [organization_id, repository_id, provider or "", start_sha or "", end_sha],
Member

is it actually possible that provider is null or is this mostly a typing issue?

Member Author

A typing thing. It should be tighter higher up, but I did not want to widen the scope of this PR. A lot of the integrations code has loose typing.
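
A minimal sketch of the narrowing pattern discussed in this thread, assuming a hypothetical `Repo` dataclass whose `provider` is typed `str | None` (the real `Repository` model is not reproduced here). An `isinstance` check on a local variable lets a type checker treat the value as `str` before it reaches a function that requires one, without needing a `provider or ""` fallback.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Repo:
    """Hypothetical stand-in for the Repository model."""

    id: int
    provider: str | None


def provider_key_part(provider: str) -> str:
    # Requires a real str; passing `str | None` here would be flagged by mypy.
    return f"provider={provider}"


def describe(repo: Repo) -> str | None:
    provider_name = repo.provider  # local copy so the narrowing sticks
    if isinstance(provider_name, str):
        # Inside this branch the checker sees provider_name as `str`.
        return provider_key_part(provider_name)
    return None
```
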

@armenzg armenzg merged commit dbd31cb into master Apr 9, 2026
79 checks passed
@armenzg armenzg deleted the armenzg/fix/github-fetch-commits-cache branch April 9, 2026 12:32
george-sentry pushed a commit that referenced this pull request Apr 9, 2026

Labels

Scope: Backend Automatically applied to PRs that change backend components

2 participants