Use SHA256SUMS file for standalone python checksum validation #1655

Closed
thmahe wants to merge 2 commits into pypa:main from thmahe:bugfix/checksum-pbs

@thmahe thmahe commented Jul 25, 2025

Bugfix for #1652

  • Support new release content from python-build-standalone project
  • I have added a news fragment under changelog.d/ (if the patch affects the end users)

No entry in changelog.d/ since this patch is a bugfix.

Summary of changes

Test plan

Tested by running

$ pipx install --python 3.11 --fetch-missing-python

thmahe and others added 2 commits July 25, 2025 14:50
@thmahe thmahe changed the title from "Use SHA256SUMS file for standalone python checksum validation (#1652)" to "Use SHA256SUMS file for standalone python checksum validation" Jul 25, 2025
@13steinj
Contributor

👋 I was just bit by the issue that this PR resolves.

Just a comment: I don't know which would be best. Since the GitHub API already provides the digests as of June 3, it might be better to save the digest from the initial API call that populates the local index and check against that, instead of separately downloading the SHA256SUMS file each time.
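The digest-based check suggested above could look roughly like the following. This is only a sketch, not the code on the linked branch: it assumes each release asset in the GitHub API response carries a `digest` field of the form `sha256:<hex>` (possibly missing or null on older assets), and the helper names `asset_digests` and `verify_digest` are hypothetical.

```python
import hashlib
from pathlib import Path


def asset_digests(release: dict) -> dict:
    """Map asset name -> digest string for assets that report a digest.

    Assumes the release JSON has an "assets" list whose entries may carry
    a "digest" value such as "sha256:<hex>" (an assumption, see above).
    """
    return {
        asset["name"]: asset["digest"]
        for asset in release.get("assets", [])
        if asset.get("digest")
    }


def verify_digest(path: Path, digest: str) -> bool:
    """Return True if the file at `path` matches a "sha256:<hex>" digest."""
    algorithm, _, expected = digest.partition(":")
    if algorithm != "sha256" or not expected:
        raise ValueError(f"unsupported digest format: {digest!r}")
    hasher = hashlib.sha256()
    with path.open("rb") as f:
        # Read in 1 MiB chunks so large archives are not loaded at once.
        for chunk in iter(lambda: f.read(1 << 20), b""):
            hasher.update(chunk)
    return hasher.hexdigest() == expected.lower()
```

With this shape, the digests would be stored alongside the download links when the index is populated, so no second request is needed at install time.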

See something like my change on my branch: 13steinj@4bb24ed (I did this quick and dirty trying to figure out what was going on, before I stumbled upon this).

@thmahe
Author

thmahe commented Jul 30, 2025

Hi @13steinj,

I initially started by considering the checksum computed by GitHub, which, as you said, is available in the initial API call that pipx makes.

That requires more changes, and in the end, should we trust the checksum provided by GitHub or the one provided by the project itself?
Personally, I made my choice: a small change set, and checksums from the people releasing python-build-standalone.
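For comparison, the SHA256SUMS approach amounts to something like the sketch below. It is illustrative rather than the code in this PR: it assumes the file uses the usual `sha256sum` layout (`<hex digest>  <file name>` per line), and the function names are hypothetical.

```python
import hashlib


def parse_sha256sums(text: str) -> dict:
    """Map file name -> lowercase hex digest from sha256sum-style lines."""
    checksums = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        digest, _, name = line.partition(" ")
        # sha256sum separates fields with two spaces, or " *" in binary mode.
        checksums[name.lstrip(" *")] = digest.lower()
    return checksums


def matches_checksum(data: bytes, name: str, checksums: dict) -> bool:
    """Return True only if `name` is listed and its digest matches `data`."""
    expected = checksums.get(name)
    return expected is not None and hashlib.sha256(data).hexdigest() == expected
```

The trade-off discussed in this thread is visible here: the checksums come from the project's own release artifacts, at the cost of one extra download per release.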

@13steinj
Contributor

Yeah just bringing up as an option in case maintainers prefer it. I just want something merged and a hotfix released sooner rather than later.

should we trust checksum provided by GitHub or the one provided by the project itself ?

Me personally, I don't know. I mean, if you trust automated checksums from the hosting platform, you can argue you're more secure than with the alternative. It would address issues similar (though not identical) to those behind the relatively recent xz supply chain attack.

@thmahe thmahe requested a review from dukecat0 August 13, 2025 07:52
@dukecat0
Member

@thmahe @13steinj Thanks for your contributions! Personally I would prefer the solution provided by @13steinj as it's more efficient and an extra file is not required.
Also, with 13steinj@4bb24ed, I think it's a small change as well so it shouldn't be an issue. 👍

@13steinj
Contributor

@dukecat0 I don't want to step on any toes here, am I good to clean my branch up a bit (mainly the whole link[0] thing which is odd instead of properly unpacking it to a descriptive var name) and make a PR?

I'd also like to:

  • confirm that older downloads than June 3 contain the digests as well (unclear to me) otherwise keep downloading the files separately as a fallback mechanism if github isn't providing a digest
  • tag the downloaded index/cache file with a simple incrementing schema number; force a re-download if no schema number is found or if an old one is found
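The schema-number idea in the second bullet might be sketched as follows. This is a hypothetical illustration, not pipx code: the `schema` key, the `INDEX_SCHEMA_VERSION` constant, and both function names are assumptions, while the `fetched` and `releases` keys and the 30-day limit mirror the existing index format.

```python
import json
import time
from pathlib import Path

INDEX_SCHEMA_VERSION = 2  # hypothetical; bump whenever the index layout changes
MAX_AGE_SECONDS = 30 * 24 * 60 * 60  # 30 days


def load_index(index_file: Path, now=None):
    """Return the cached index, or None if it must be re-downloaded."""
    now = time.time() if now is None else now
    try:
        index = json.loads(index_file.read_text())
    except (OSError, ValueError):
        # Missing, unreadable, or corrupt cache file.
        return None
    if not isinstance(index, dict):
        return None
    # A missing or outdated schema number forces a refresh.
    if index.get("schema") != INDEX_SCHEMA_VERSION:
        return None
    if now - index.get("fetched", 0) > MAX_AGE_SECONDS:
        return None
    return index


def save_index(index_file: Path, releases, now=None) -> dict:
    """Write a freshly fetched index tagged with the current schema number."""
    now = time.time() if now is None else now
    index = {"schema": INDEX_SCHEMA_VERSION, "fetched": now, "releases": releases}
    index_file.write_text(json.dumps(index))
    return index
```

A caller would re-fetch the releases and call `save_index` whenever `load_index` returns None, so an index cached before digests were stored is discarded immediately instead of waiting for it to age out.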

@dukecat0
Member

dukecat0 commented Aug 22, 2025

am I good to clean my branch up a bit (mainly the whole link[0] thing which is odd instead of properly unpacking it to a descriptive var name) and make a PR?

Yes sure, feel free to work on it! @thmahe Still thanks for your work on this and raising the issue!

confirm that older downloads than June 3 contain the digests as well (unclear to me) otherwise keep downloading the files separately as a fallback mechanism if github isn't providing a digest

Don't worry, it's not an issue. The code guarantees that the cached index is never more than 30 days old:

def get_or_update_index(use_cache: bool = True):
    """Get or update the index of available python builds from
    the python-build-standalone repository."""
    index_file = paths.ctx.standalone_python_cachedir / "index.json"
    if use_cache and index_file.exists():
        index = json.loads(index_file.read_text())
        # update index after 30 days
        fetched = datetime.datetime.fromtimestamp(index["fetched"])
        if datetime.datetime.now() - fetched > datetime.timedelta(days=30):
            index = {}
    else:
        index = {}
    if not index:
        releases = get_latest_python_releases()
        index = {"fetched": datetime.datetime.now().timestamp(), "releases": releases}
        # update index
        index_file.write_text(json.dumps(index))
    return index

@thmahe
Author

thmahe commented Sep 2, 2025

Superseded by #1662

@thmahe thmahe closed this Sep 2, 2025