feat(releases): Cache calls to compare-commits by armenzg · Pull Request #112494 · getsentry/sentry

armenzg · 2026-04-08T18:10:29Z

A customer has a large number of GitHub API rate limit emails coming in and the highest referrer is compare-commits.

This branch aims to temporarily cache and protect from many incoming calls to compare-commits and see if it solves the problem.

- Added a new org feature flag in `src/sentry/features/temporary.py`:
  - `organizations:integrations-github-fetch-commits-compare-cache`
- Refactored commit fetching in `src/sentry/tasks/commits.py`:
  - Added `fetch_compare_commits(...)` helper to centralize compare-commits behavior.
  - Added cache key generation with `get_github_compare_commits_cache_key(...)`.
  - Added cache-backed reuse of GitHub compare-commits results (TTL: 120s), gated by the new feature flag.
  - Limited caching to GitHub providers (`integrations:github`, `integrations:github_enterprise`) and only when `start_sha` is present.
  - Added lifecycle extras for cache telemetry (`compare_commits_cache_enabled`, `compare_commits_cache_hit`).
  - `fetch_commits(...)` now evaluates the feature flag once and routes compare calls through the helper.
- Added provider-level test coverage in `tests/sentry/integrations/github/test_repository.py`:
  - New test verifies repeated compare calls reuse cached patchset data and reduce API calls.
- Added task-level cache behavior tests in `tests/sentry/tasks/test_commits.py`:
  - Cache disabled: compare called on each fetch.
  - Cache enabled: compare result reused across releases with the same compare range.
  - Cache key variance: different `end_sha` values do not share cache entries.
- Included a follow-up typing fix in `src/sentry/tasks/commits.py`:
  - Narrowed `repo.provider` to `str` before cache-key creation to satisfy mypy (`str | None` -> `str`).
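
The gating and reuse described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `TTLCache` and `compare_with_cache` are hypothetical names, the in-memory dictionary stands in for the real cache backend, and the cache key is passed in as an opaque string. Only the provider names and the 120-second TTL come from the description.

```python
from __future__ import annotations

import time
from typing import Any, Callable

# Provider names and TTL are taken from the PR description; the rest is illustrative.
GITHUB_CACHEABLE_REPOSITORY_PROVIDERS = {"integrations:github", "integrations:github_enterprise"}
COMPARE_COMMITS_CACHE_TTL_SECONDS = 120

Commits = list[dict[str, Any]]


class TTLCache:
    """Hypothetical in-memory stand-in for the real cache backend."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._entries: dict[str, tuple[float, Commits]] = {}

    def get(self, key: str) -> Commits | None:
        entry = self._entries.get(key)
        # Treat a missing or expired entry as a miss.
        if entry is None or entry[0] <= self._clock():
            return None
        return entry[1]

    def set(self, key: str, value: Commits, ttl: float) -> None:
        self._entries[key] = (self._clock() + ttl, value)


def compare_with_cache(
    cache: TTLCache,
    cache_enabled: bool,
    provider_name: str | None,
    key: str,
    start_sha: str | None,
    compare: Callable[[], Commits],
) -> tuple[Commits, bool]:
    """Return (commits, cache_hit); only GitHub providers with a start_sha are cached."""
    use_cache = (
        cache_enabled
        and provider_name in GITHUB_CACHEABLE_REPOSITORY_PROVIDERS
        and start_sha is not None
    )
    if use_cache:
        cached = cache.get(key)
        if cached is not None:
            return cached, True
    commits = compare()  # the real call would hit the GitHub API
    if use_cache:
        cache.set(key, commits, COMPARE_COMMITS_CACHE_TTL_SECONDS)
    return commits, False
```

The boolean in the returned tuple plays the role that the `compare_commits_cache_hit` lifecycle extra plays in the description: it tells the caller whether the API call was skipped.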

src/sentry/tasks/commits.py

cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Autofix Details

Bugbot Autofix prepared a fix for the issue found in the latest run.

✅ Fixed: Provider parameter overwritten with string, breaking compare_commits calls
- Renamed repo.provider assignment to repo_provider_name to preserve the provider object parameter for compare_commits calls.
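
A small sketch of why the rename matters, assuming the helper receives a provider object as a parameter, as the fix title suggests. `Repo`, `FakeProvider`, `buggy`, and `fixed` are hypothetical stand-ins, not code from this PR.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Repo:
    provider: str | None  # provider *name*, e.g. "integrations:github"


class FakeProvider:
    """Hypothetical provider object exposing compare_commits."""

    def compare_commits(self, repo: Repo, start_sha: str | None, end_sha: str) -> list[dict[str, str]]:
        return [{"id": end_sha}]


def buggy(repo: Repo, provider: FakeProvider, start_sha: str | None, end_sha: str):
    provider = repo.provider  # rebinds the parameter to a string
    # A str has no compare_commits attribute, so this raises AttributeError.
    return provider.compare_commits(repo, start_sha, end_sha)


def fixed(repo: Repo, provider: FakeProvider, start_sha: str | None, end_sha: str):
    repo_provider_name = repo.provider  # separate name keeps the provider object intact
    assert repo_provider_name is None or isinstance(repo_provider_name, str)
    return provider.compare_commits(repo, start_sha, end_sha)
```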

Or push these changes by commenting:

@cursor push 24519ed48a

Preview (24519ed48a)

diff --git a/src/sentry/tasks/commits.py b/src/sentry/tasks/commits.py
--- a/src/sentry/tasks/commits.py
+++ b/src/sentry/tasks/commits.py
@@ -95,15 +95,15 @@
     lifecycle,
 ):
     cache_key = None
-    provider = repo.provider
+    repo_provider_name = repo.provider
     if (
         cache_enabled
-        and isinstance(provider, str)
-        and provider in GITHUB_CACHEABLE_REPOSITORY_PROVIDERS
+        and isinstance(repo_provider_name, str)
+        and repo_provider_name in GITHUB_CACHEABLE_REPOSITORY_PROVIDERS
         and start_sha is not None
     ):
         cache_key = get_github_compare_commits_cache_key(
-            repo.organization_id, repo.id, provider, start_sha, end_sha
+            repo.organization_id, repo.id, repo_provider_name, start_sha, end_sha
         )
 
     if cache_key is not None:

src/sentry/tasks/commits.py

cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Bugbot Autofix prepared a fix for the issue found in the latest run.

✅ Fixed: Tests pass wrong kwarg, silently breaking cache validation
- Changed all 13 occurrences of 'previous_release_id=' to 'prev_release_id=' to match the actual function parameter name, enabling proper cache validation testing.

Or push these changes by commenting:

@cursor push c275806d00

Preview (c275806d00)

diff --git a/tests/sentry/tasks/test_commits.py b/tests/sentry/tasks/test_commits.py
--- a/tests/sentry/tasks/test_commits.py
+++ b/tests/sentry/tasks/test_commits.py
@@ -52,7 +52,7 @@
                     release_id=release2.id,
                     user_id=user.id,
                     refs=refs,
-                    previous_release_id=release.id,
+                    prev_release_id=release.id,
                 )
 
         commit_list = list(
@@ -129,13 +129,13 @@
                 release_id=first_release.id,
                 user_id=self.user.id,
                 refs=refs,
-                previous_release_id=previous_release.id,
+                prev_release_id=previous_release.id,
             )
             fetch_commits(
                 release_id=second_release.id,
                 user_id=self.user.id,
                 refs=refs,
-                previous_release_id=previous_release.id,
+                prev_release_id=previous_release.id,
             )
 
         assert mock_compare_commits.call_count == 2
@@ -178,13 +178,13 @@
                     release_id=first_release.id,
                     user_id=self.user.id,
                     refs=refs,
-                    previous_release_id=previous_release.id,
+                    prev_release_id=previous_release.id,
                 )
                 fetch_commits(
                     release_id=second_release.id,
                     user_id=self.user.id,
                     refs=refs,
-                    previous_release_id=previous_release.id,
+                    prev_release_id=previous_release.id,
                 )
 
         assert mock_compare_commits.call_count == 1
@@ -231,13 +231,13 @@
                     release_id=first_release.id,
                     user_id=self.user.id,
                     refs=refs_first,
-                    previous_release_id=previous_release.id,
+                    prev_release_id=previous_release.id,
                 )
                 fetch_commits(
                     release_id=second_release.id,
                     user_id=self.user.id,
                     refs=refs_second,
-                    previous_release_id=previous_release.id,
+                    prev_release_id=previous_release.id,
                 )
 
         assert mock_compare_commits.call_count == 2
@@ -264,7 +264,7 @@
                 release_id=new_release.id,
                 user_id=self.user.id,
                 refs=refs,
-                previous_release_id=old_release.id,
+                prev_release_id=old_release.id,
             )
         count_query = ReleaseHeadCommit.objects.filter(release=new_release)
         # No release commits should be made as the task should return early.
@@ -297,7 +297,7 @@
         mock_compare_commits.side_effect = InvalidIdentity(identity=usa)
 
         fetch_commits(
-            release_id=release2.id, user_id=self.user.id, refs=refs, previous_release_id=release.id
+            release_id=release2.id, user_id=self.user.id, refs=refs, prev_release_id=release.id
         )
 
         mock_handle_invalid_identity.assert_called_once_with(identity=usa, commit_failure=True)
@@ -334,7 +334,7 @@
                 release_id=release2.id,
                 user_id=self.user.id,
                 refs=refs,
-                previous_release_id=release.id,
+                prev_release_id=release.id,
             )
 
         msg = mail.outbox[-1]
@@ -375,7 +375,7 @@
                 release_id=release2.id,
                 user_id=sentry_app.proxy_user_id,
                 refs=refs,
-                previous_release_id=release.id,
+                prev_release_id=release.id,
             )
 
         msg = mail.outbox[-1]
@@ -415,7 +415,7 @@
                 release_id=release2.id,
                 user_id=self.user.id,
                 refs=refs,
-                previous_release_id=release.id,
+                prev_release_id=release.id,
             )
 
         msg = mail.outbox[-1]
@@ -456,7 +456,7 @@
                 release_id=release2.id,
                 user_id=self.user.id,
                 refs=refs,
-                previous_release_id=release.id,
+                prev_release_id=release.id,
             )
 
         msg = mail.outbox[-1]

Reviewed by Cursor Bugbot for commit 6bab863.

tests/sentry/tasks/test_commits.py

armenzg · 2026-04-08T20:29:18Z

@sentry review

github-actions · 2026-04-08T20:29:49Z

Backend Test Failures

Failures on 563c738 in this run:

tests/sentry/tasks/test_commits.py::FetchCommitsTest::test_github_compare_commits_cache_flag_enabled — log

[gw1] linux -- Python 3.13.1 /home/runner/work/sentry/sentry/.venv/bin/python3
tests/sentry/tasks/test_commits.py:190: in test_github_compare_commits_cache_flag_enabled
    assert mock_compare_commits.call_count == 1
E   AssertionError: assert 2 == 1
E    +  where 2 = <MagicMock name='compare_commits' id='140223302636880'>.call_count

src/sentry/tasks/commits.py

armenzg · 2026-04-08T20:39:46Z

@sentry review

armenzg · 2026-04-08T20:17:59Z

src/sentry/tasks/commits.py

@@ -1,12 +1,15 @@
 from __future__ import annotations

 import logging
+from collections.abc import Mapping, Sequence


I'm adding stronger typing for this PR to be a bit more certain of the changes.

armenzg · 2026-04-08T20:20:14Z

src/sentry/tasks/commits.py

+    user: RpcUser | None,
+    lifecycle: Any,
+) -> list[dict[str, Any]]:
+    if cache_enabled:


This if/else block is the new caching mechanism.

armenzg · 2026-04-08T20:22:41Z

src/sentry/tasks/commits.py

+    return f"fetch-commits:compare-commits:v1:{digest}"
+
+
+def fetch_compare_commits(


As part of this PR, I'm extracting some of the code in the main for loop into a couple of new functions to make the final code easier to read:

- fetch_compare_commits
- get_repo_and_provider_for_ref

sentry/src/sentry/tasks/commits.py

Line 100 in 82814b2

for ref in refs:

armenzg · 2026-04-08T20:23:12Z

src/sentry/tasks/commits.py

+    else:
+        lifecycle.add_extra("compare_commits_cache_enabled", False)
+
+    if is_integration_repo_provider:


This if/else comes from the main loop:

sentry/src/sentry/tasks/commits.py

Lines 174 to 177 in 82814b2

if is_integration_repo_provider:
    repo_commits = provider.compare_commits(repo, start_sha, end_sha)
else:
    repo_commits = provider.compare_commits(repo, start_sha, end_sha, actor=user)

armenzg · 2026-04-08T20:24:13Z

src/sentry/tasks/commits.py

+    ref: Mapping[str, str],
+    user_id: int,
+) -> tuple[Repository, Any, bool, str] | None:
+    repo = (


The code here comes from the main loop (no logic is changed):

sentry/src/sentry/tasks/commits.py

Lines 101 to 130 in 82814b2

repo = (
    Repository.objects.filter(
        organization_id=release.organization_id,
        name=ref["repository"],
        status=ObjectStatus.ACTIVE,
    )
    .order_by("-pk")
    .first()
)
if not repo:
    logger.info(
        "repository.missing",
        extra={
            "organization_id": release.organization_id,
            "user_id": user_id,
            "repository": ref["repository"],
        },
    )
    continue

is_integration_repo_provider = is_integration_provider(repo.provider)
binding_key = (
    "integration-repository.provider"
    if is_integration_repo_provider
    else "repository.provider"
)
try:
    provider_cls = bindings.get(binding_key).get(repo.provider)
except KeyError:
    continue

sentry/src/sentry/tasks/commits.py

Lines 149 to 155 in 82814b2

provider = provider_cls(id=repo.provider)

provider_key = (
    provider_cls.repo_provider
    if is_integration_repo_provider
    else provider_cls.auth_provider
)

armenzg · 2026-04-08T20:25:22Z

src/sentry/tasks/commits.py

            continue
+        repo, provider, is_integration_repo_provider, provider_key = resolved


All the removed lines above are replaced with these four lines.

armenzg · 2026-04-08T20:25:35Z

src/sentry/tasks/commits.py

-            provider_cls.repo_provider
-            if is_integration_repo_provider
-            else provider_cls.auth_provider
-        )


Moved within get_repo_and_provider_for_ref.

armenzg · 2026-04-08T20:26:04Z

src/sentry/tasks/commits.py

+                compare_commits_cache_enabled = (
+                    github_compare_commits_cache_feature_enabled
+                    and isinstance(provider_name, str)
+                    and provider_name in GITHUB_CACHEABLE_REPOSITORY_PROVIDERS


I don't want to mess with all the other integrations.

armenzg · 2026-04-08T20:42:42Z

tests/sentry/tasks/test_commits.py

+                prev_release_id=previous_release.id,
+            )
+
+        assert mock_compare_commits.call_count == 2


Without the feature we call it twice.

armenzg · 2026-04-08T20:47:13Z

tests/sentry/tasks/test_commits.py

+            {"id": end_sha, "repository": repo_name},
+        ]
+
+    def _setup_github_compare_commits_cache_context(self):


This helper function will simplify a lot of the tests in this module.

armenzg · 2026-04-08T20:48:33Z

tests/sentry/tasks/test_commits.py

@@ -44,7 +87,7 @@ def _test_simple_action(self, user, org):
                    release_id=release2.id,
                    user_id=user.id,
                    refs=refs,
-                    previous_release_id=release.id,
+                    prev_release_id=release.id,


This bug appears in many of the tests in this file. The task has no previous_release_id parameter; the parameter is prev_release_id, so the misspelled keyword was silently absorbed by **kwargs:

sentry/src/sentry/tasks/commits.py

Line 83 in fb92eb8

def fetch_commits(release_id: int, user_id: int, refs, prev_release_id=None, **kwargs):
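
To illustrate why the wrong keyword went unnoticed, here is a simplified stand-in with the same signature shape as the line quoted above. The body is hypothetical and only reports what the function would see; it is not the real task.

```python
def fetch_commits(release_id, user_id, refs, prev_release_id=None, **kwargs):
    # Simplified stand-in: report what the task would actually receive.
    return {"prev_release_id": prev_release_id, "ignored": sorted(kwargs)}


# Misspelled keyword: accepted without error, but lands in **kwargs and is never used.
wrong = fetch_commits(1, 2, [], previous_release_id=10)

# Correct keyword: reaches the parameter the task actually reads.
right = fetch_commits(1, 2, [], prev_release_id=10)

print(wrong)
print(right)
```

Because `**kwargs` accepts any extra keyword, the tests ran with `prev_release_id` left at its default of `None`, so no start of the compare range was ever supplied through that argument.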

armenzg · 2026-04-08T20:49:28Z

tests/sentry/tasks/test_commits.py

+                    prev_release_id=previous_release.id,
+                )
+
+        assert mock_compare_commits.call_count == 1


When the feature flag is enabled, we only make one call.

armenzg · 2026-04-08T20:50:38Z

tests/sentry/tasks/test_commits.py

+        assert mock_compare_commits.call_count == 1
+
+    @patch("sentry.integrations.github.repository.GitHubRepositoryProvider.compare_commits")
+    def test_github_compare_commits_cache_key_variance_on_end_sha(


This test verifies that cache entries are keyed by commit range, not just repo/org.
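
A rough sketch of the property being tested. The PR builds the digest with `hash_values`; `hashlib.sha256` is used here only as a stand-in, and `compare_commits_cache_key` is a hypothetical name. The key prefix is the one shown in the diff.

```python
from __future__ import annotations

import hashlib


def compare_commits_cache_key(
    organization_id: int,
    repository_id: int,
    provider: str | None,
    start_sha: str | None,
    end_sha: str,
) -> str:
    # Every component of the compare range goes into the digest, so changing
    # any one of them (including end_sha) yields a different key.
    parts = [str(organization_id), str(repository_id), provider or "", start_sha or "", end_sha]
    digest = hashlib.sha256("\x00".join(parts).encode("utf-8")).hexdigest()
    return f"fetch-commits:compare-commits:v1:{digest}"


same_a = compare_commits_cache_key(1, 2, "integrations:github", "aaa", "bbb")
same_b = compare_commits_cache_key(1, 2, "integrations:github", "aaa", "bbb")
other_end = compare_commits_cache_key(1, 2, "integrations:github", "aaa", "ccc")
```

Here `same_a` and `same_b` are identical, while `other_end` differs, which is why two releases with different `end_sha` values each trigger their own compare call.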

armenzg · 2026-04-08T20:53:50Z

tests/sentry/tasks/test_commits.py

+                fetch_commits(
+                    release_id=first_release.id,
+                    user_id=self.user.id,
+                    refs=refs_first,


Passing refs_first.

armenzg · 2026-04-08T20:53:57Z

tests/sentry/tasks/test_commits.py

+                fetch_commits(
+                    release_id=second_release.id,
+                    user_id=self.user.id,
+                    refs=refs_second,


Passing refs_second.

armenzg · 2026-04-08T20:54:06Z

tests/sentry/tasks/test_commits.py

+                    prev_release_id=previous_release.id,
+                )
+
+        assert mock_compare_commits.call_count == 2


They each will make a call.

billyvg

lgtm

billyvg · 2026-04-08T21:16:11Z

src/sentry/tasks/commits.py

@@ -80,9 +192,14 @@ def handle_invalid_identity(identity, commit_failure=False):
    silo_mode=SiloMode.CELL,
 )
 @retry(exclude=(Release.DoesNotExist, User.DoesNotExist))
-def fetch_commits(release_id: int, user_id: int, refs, prev_release_id=None, **kwargs):
-    # TODO(dcramer): this function could use some cleanup/refactoring as it's a bit unwieldy


how old is this TODO :)

billyvg · 2026-04-08T21:17:25Z

src/sentry/tasks/commits.py

+
+        lifecycle.add_extra("compare_commits_cache_hit", False)
+    else:
+        lifecycle.add_extra("compare_commits_cache_enabled", False)


what is lifecycle?

It's something the integrations team added to attach more context about what happens during the lifecycle of an event.

billyvg · 2026-04-08T21:19:45Z

src/sentry/tasks/commits.py

+    end_sha: str,
+) -> str:
+    digest = hash_values(
+        [organization_id, repository_id, provider or "", start_sha or "", end_sha],


is it actually possible that provider is null or is this mostly a typing issue?

A typing thing. It should be tighter higher up, but I did not want to widen the scope of this PR. A lot of the integrations code has loose typing.

armenzg self-assigned this Apr 8, 2026

github-actions bot added the Scope: Backend Automatically applied to PRs that change backend components label Apr 8, 2026

armenzg force-pushed the armenzg/fix/github-fetch-commits-cache branch from 88b35be to 6b1a01d Compare April 8, 2026 18:12

sentry-warden bot reviewed Apr 8, 2026

src/sentry/tasks/commits.py (outdated)

cursor bot reviewed Apr 8, 2026

src/sentry/tasks/commits.py (outdated)

vercel bot deployed to Preview April 8, 2026 18:15 View deployment

armenzg added 2 commits April 8, 2026 16:15

More changes

c4b15e3

More changes

6bab863

vercel bot deployed to Preview April 8, 2026 20:19 View deployment

cursor bot reviewed Apr 8, 2026

tests/sentry/tasks/test_commits.py (outdated)

Fix bug

2dda0cd

sentry bot reviewed Apr 8, 2026

src/sentry/tasks/commits.py (outdated)

Address hash problem

1df104b

armenzg commented Apr 8, 2026

armenzg changed the title ~~feat(commits): Cache calls to compare-commits~~ feat(releases): Cache calls to compare-commits Apr 8, 2026

vercel bot deployed to Preview April 8, 2026 20:42 View deployment

Simplify tests

e192014

vercel bot deployed to Preview April 8, 2026 20:50 View deployment

armenzg commented Apr 8, 2026

armenzg marked this pull request as ready for review April 8, 2026 20:57

armenzg requested review from a team as code owners April 8, 2026 20:57

armenzg requested review from GabeVillalobos and trevor-e April 8, 2026 20:57

billyvg approved these changes Apr 8, 2026

armenzg merged commit dbd31cb into master Apr 9, 2026
79 checks passed

armenzg deleted the armenzg/fix/github-fetch-commits-cache branch April 9, 2026 12:32
