
Add CRC64NVME checksum support #633

Open
kdn36 wants to merge 6 commits into apache:main from kdn36:feat_checksum_crc64

@kdn36

@kdn36 kdn36 commented Feb 3, 2026

Which issue does this PR close?

Rationale for this change

Improve (AWS) S3 write performance without compromising object integrity.

What changes are included in this PR?

The key change is that CRC64NVME, the default checksum algorithm in AWS, is added as a supported checksum variant.

The crc-fast crate is added for maximum performance.
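For reference, CRC64NVME is the bit-reflected CRC-64 with polynomial 0xAD93D23594C93659 and with both the initial value and the final XOR set to all ones. This is the CRC-64/NVME entry in the CRC catalogue, whose check value for "123456789" is 0xae8b14860a799888. The std-only sketch below is a slow, bitwise reference that shows what is being computed. It is not the crc-fast API: crc-fast produces the same value using hardware-accelerated (SIMD) techniques. The type and function names are hypothetical.

```rust
/// CRC-64/NVME polynomial 0xAD93D23594C93659 in bit-reflected form,
/// because the algorithm processes input bits least-significant first.
const POLY_REFLECTED: u64 = 0x9A6C_9329_AC4B_C9B5;

/// Bitwise CRC-64/NVME state: a slow reference, not the crc-fast API.
struct Crc64Nvme {
    state: u64,
}

impl Crc64Nvme {
    /// Initial register value: all ones.
    fn new() -> Self {
        Self { state: u64::MAX }
    }

    /// Feed more bytes; can be called repeatedly for streamed data.
    fn update(&mut self, data: &[u8]) {
        for &byte in data {
            self.state ^= u64::from(byte);
            for _ in 0..8 {
                // Shift right one bit; if a 1 fell out, XOR in the polynomial.
                let carry = self.state & 1;
                self.state >>= 1;
                if carry != 0 {
                    self.state ^= POLY_REFLECTED;
                }
            }
        }
    }

    /// Final XOR with all ones yields the checksum.
    fn finalize(&self) -> u64 {
        self.state ^ u64::MAX
    }
}

/// One-shot convenience wrapper.
fn crc64_nvme(data: &[u8]) -> u64 {
    let mut crc = Crc64Nvme::new();
    crc.update(data);
    crc.finalize()
}

fn main() {
    println!("{:016x}", crc64_nvme(b"123456789"));
}
```

Because the state can be updated incrementally, data can be checksummed as it is buffered rather than in a separate pass.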

Implementation note: if more checksum algorithms are to be supported, a minor refactor of how checksum variants are handled would be in order.

Are there any user-facing changes?

The following storage_options key-value pair will be honored:

    "aws_checksum_algorithm": "CRC64NVME",

A typical use case would combine this with

    "aws_unsigned_payload": "true",
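The two storage options above can be illustrated with a small, std-only sketch. The Checksum enum mirrors this PR's variant names (SHA256, CRC64NVME) and, as suggested in review, is marked #[non_exhaustive], so downstream crates must include a wildcard match arm and new algorithms can be added without a breaking change. It is a hypothetical stand-in, not object_store's actual type. The checksum_settings helper and the case-insensitive parsing are assumptions made for illustration.

```rust
use std::collections::HashMap;
use std::str::FromStr;

/// Hypothetical stand-in mirroring this PR's variant names; not object_store's type.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[non_exhaustive]
pub enum Checksum {
    /// SHA-256 algorithm.
    SHA256,
    /// CRC64-NVME algorithm.
    CRC64NVME,
}

impl FromStr for Checksum {
    type Err = String;

    /// Case-insensitive parse, so "CRC64NVME" and "crc64nvme" are both accepted (assumption).
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s.to_ascii_lowercase().as_str() {
            "sha256" => Ok(Self::SHA256),
            "crc64nvme" => Ok(Self::CRC64NVME),
            other => Err(format!("unsupported checksum algorithm: {other}")),
        }
    }
}

/// Hypothetical helper: pull the two keys discussed above out of a storage_options map.
fn checksum_settings(opts: &HashMap<&str, &str>) -> Result<(Option<Checksum>, bool), String> {
    let checksum = opts
        .get("aws_checksum_algorithm")
        .map(|v| v.parse::<Checksum>())
        .transpose()?;
    let unsigned_payload = opts
        .get("aws_unsigned_payload")
        .is_some_and(|v| v.eq_ignore_ascii_case("true"));
    Ok((checksum, unsigned_payload))
}

fn main() {
    let opts = HashMap::from([
        ("aws_checksum_algorithm", "CRC64NVME"),
        ("aws_unsigned_payload", "true"),
    ]);
    let (checksum, unsigned_payload) = checksum_settings(&opts).expect("valid options");
    println!("checksum = {checksum:?}, unsigned payload = {unsigned_payload}");
}
```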

Kindly review, thanks!

/// SHA-256 algorithm.
SHA256,
/// CRC64-NVME algorithm.
CRC64NVME,

@orlp orlp Feb 3, 2026


This is technically a breaking change because this enum was (erroneously IMO) not marked as #[non_exhaustive]. It's up to the maintainer - I personally don't believe anyone is matching on Checksum, it would only break code that currently does such a match.

But @kdn36 you should definitely mark this enum as #[non_exhaustive].

Contributor


Yeah, we would have to wait for the next breaking object store release. We could discuss making such a change and quickly release 0.14 (next breaking change) if needed

checksum_sha256,
};
quick_xml::se::to_string(&meta).unwrap()
let content_id = if let Some(checksum) = self.config.checksum {
Contributor


So copy can read both checksums but will only write/propagate one?

Author


I am no expert, but as I understand, we do not send a checksum on copy (write) since we do not have the bytes to calculate one. We do collect the response checksum(s), but only retain them if they match the configured checksum algorithm (we could also raise a warning or error, I think). The use case of specifying a checksum algorithm on copy is to change the checksum metadata as stored in S3.

However, I failed to properly test this locally, as minio does not properly support checksums on multipart copy. Instead, it errors out (minio/minio#17013).

Testing in CI and directly on AWS works as expected.

I added a test case where we modify the checksum algorithm as part of a copy. That works as expected. Unfortunately, I could only verify this manually, because I could not find an easy way to obtain the checksum metadata for programmatic verification (the metadata returned by a simple head() does not include it).

Again, I am new to this, and will happily stand corrected.

Contributor


I still don't understand why the match block below just doesn't copy both checksums that you've extracted from the headers (you don't even have to compute it). Why do we strip one of them?

Author


Updated. I was (falsely) assuming an unnecessary level of strictness for the checksum configuration setting.
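Below is a simplified model of the two copy behaviours discussed in this thread. One keeps only the checksum matching the configured algorithm (the initial approach); the other propagates every checksum S3 returned (the reviewer's suggestion). PartChecksums, keep_configured and keep_all are hypothetical names, and the response is assumed to be parsed already. This is not the PR's actual code.

```rust
/// Hypothetical stand-in for the PR's checksum enum (names mirror the PR).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Checksum {
    SHA256,
    CRC64NVME,
}

/// Checksums reported by S3 for one copied part, assumed already parsed.
#[derive(Debug, Clone, Default, PartialEq)]
struct PartChecksums {
    sha256: Option<String>,
    crc64nvme: Option<String>,
}

/// Initial approach: retain only the checksum matching the configured algorithm.
fn keep_configured(found: &PartChecksums, configured: Checksum) -> PartChecksums {
    match configured {
        Checksum::SHA256 => PartChecksums { sha256: found.sha256.clone(), ..Default::default() },
        Checksum::CRC64NVME => PartChecksums { crc64nvme: found.crc64nvme.clone(), ..Default::default() },
    }
}

/// Reviewer's suggestion: pass through whatever S3 returned; nothing to compute or filter.
fn keep_all(found: &PartChecksums) -> PartChecksums {
    found.clone()
}

fn main() {
    // Placeholder values for illustration only.
    let found = PartChecksums {
        sha256: Some("sha256-placeholder".to_string()),
        crc64nvme: Some("crc64nvme-placeholder".to_string()),
    };
    for algo in [Checksum::SHA256, Checksum::CRC64NVME] {
        println!("{algo:?}: {:?}", keep_configured(&found, algo));
    }
    println!("all: {:?}", keep_all(&found));
}
```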

Contributor

@alamb alamb left a comment


Thank you @kdn36 and @orlp and @crepererum

Do we have any benchmarks to show how much faster this is? I ask because I think that would help us weigh the breaking API change against getting this out. For example, does writing a 4GB object with CRC64-NVME go 1% faster? 10% faster?

For example, I would have thought that the network connection was the bottleneck, but as network speeds have increased significantly relative to CPU advances in recent years, I no longer know if this intuition is true.

So, TL;DR: this change seems like a good one to me; the only question is the timeline of the breaking change.


@@ -24,12 +24,15 @@ use std::str::FromStr;
pub enum Checksum {
Contributor


I double checked and indeed CRC64NVME seems to be the default suggestion:

https://docs.aws.amazon.com/AmazonS3/latest/userguide/checking-object-integrity.html

Seems like it was added to the official SDK about a year ago:

Contributor


If we are going to change the enum, perhaps we can add all the other supported checksums as well (we don't have to actually support them, but we can minimize API churn)

I think that would mean adding the following variants (with comments that they aren't yet supported)

  • CRC32
  • CRC32C
  • SHA1
  • MD5



@alamb If the enum is marked as #[non_exhaustive] as I recommended above then adding new variants isn't a breaking change. I personally don't think it's a good idea to add variants which aren't actually supported.

Contributor


Marking it as #[non_exhaustive] would also be fine

(my rationale to add Variants that are not supported was to 🎣 for help supporting them)

Author


Great point. Changed to #[non_exhaustive]. If this goes through, adding more algorithm variants should be straightforward.

@orlp

orlp commented Feb 5, 2026

@alamb I don't have any publicly available benchmarks, but @kdn36 measured that for one of our workflows in Polars Cloud it saved ~25% end-to-end on the entire query. Rather significant.

@orlp

orlp commented Feb 5, 2026

But to put some back-of-the-napkin numbers on this: due to the way the code is currently architected (this might be fundamental if the hash has to be in the header; I don't know about that), the hash must be known before the object is uploaded. This means hashing time is added to upload time.

SHA256 with hardware acceleration on Intel seems to be around 2 GB/s (this may vary a bit by machine, but let's assume it). So hashing 4 GB takes 2 seconds. crc-fast-rust claims speeds of 100+ GB/s. So let's assume 100 GB/s, in which case hashing 4 GB takes 0.04 seconds.

With a 10 Gbit/s connection (fairly standard for a powerful cloud machine), uploading a 4 GB object takes 3.2 seconds. So the total time taken (checksum + upload) is:

  • SHA256: 5.2 seconds.
  • CRC64NVME: 3.24 seconds.

It becomes even more lopsided with more powerful setups using 50 Gbit/s or 100 Gbit/s networking.
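The napkin model above, in which the checksum is computed fully before the upload with no overlap, reduces to time = size / hash_throughput + size × 8 / link_speed. The sketch below reproduces the numbers using the same assumed throughputs (2 GB/s for SHA256, 100 GB/s for CRC64NVME) and decimal units. total_seconds is a hypothetical helper.

```rust
/// Seconds to checksum and then upload `size_gb`, assuming no overlap between the two.
fn total_seconds(size_gb: f64, hash_gb_per_s: f64, link_gbit_per_s: f64) -> f64 {
    let hash = size_gb / hash_gb_per_s;
    let upload = size_gb * 8.0 / link_gbit_per_s; // GB -> Gbit
    hash + upload
}

fn main() {
    for link in [10.0, 100.0] {
        let sha = total_seconds(4.0, 2.0, link);
        let crc = total_seconds(4.0, 100.0, link);
        println!("{link} Gbit/s: SHA256 {sha:.2} s, CRC64NVME {crc:.2} s");
    }
}
```

At 100 Gbit/s, the same model gives 2.32 s for SHA256 versus 0.36 s for CRC64NVME, so the checksum rather than the network dominates.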

@alamb
Contributor

alamb commented Feb 5, 2026

I did some research on this crc-fast crate: https://crates.io/crates/crc-fast

It does look like it is primarily the work of one individual, which is always a little concerning from a maintainability perspective: https://github.com/awesomized/crc-fast-rust

However, it seems that the official AWS SDK uses this crate (via aws-smithy-checksums), which is a pretty good endorsement in my mind: https://crates.io/crates/crc-fast/reverse_dependencies

Thus I think it is ok to add it as a dependency

@crepererum
Contributor

Regarding the crate of choice: Since we -- I think -- have decided that this is a breaking change, I think we should merge this AFTER #585 and hook up the crc calculation into the crypto provider framework.

@orlp

orlp commented Feb 5, 2026

@crepererum I don't see why, CRC has nothing to do with cryptography.

@crepererum
Contributor

I don't see why, CRC has nothing to do with cryptography.

If you look at the PR, the abstraction for algorithms that is in there could be extended to cover CRC as well, and may give people an easier way to pick whatever CRC library they want. The discussion above makes it clear that there isn't an obvious choice on that front anyway.

@orlp

orlp commented Feb 6, 2026

@crepererum There are legitimate reasons for wanting to choose your crypto provider (you're already using provider X, you only trust provider X, you have security certification for provider X, etc). Especially the last one is important.

For CRC, only the first reason is applicable, and I don't think that's a good enough reason by itself. Plus, again, CRC isn't a cryptographic algorithm and thus has no place in a crypto provider.

@crepererum
Contributor

Well, it was a suggestion, but it's not a hill I'm gonna die on though 😉

@kdn36 kdn36 marked this pull request as draft February 9, 2026 16:00
@kdn36
Author

kdn36 commented Feb 9, 2026

On performance:
(1) On a local dev machine (minio same-host, backed by memory tmpfs, so no IO latency), the time to sink 2.3 GB drops from ~1.7s (SHA256) to ~1.3s (CRC64NVME).
(2) On AWS, using a high-end instance (100 Gbps NIC) on standard S3, we are pushing almost 2 GB/s write for an 11.2 GB file. Results (in seconds, sorted by median):

Checksum Type | Best    | Median | Worst
--------------|---------|--------|-------
unsigned      | 5.22    | 7.14   | 8.41
crc64nvme     | 4.41    | 7.32   | 8.95
sha256        | 5.35    | 7.83   | 9.57
default       | 5.54    | 8.20   | 10.51

(where default = signed + SHA256 checksum, unsigned = not signed and no checksum, sha256 = not signed with SHA256 checksum, and crc64nvme = not signed with CRC64NVME checksum).
Note that the results on AWS have high variance, which is not unexpected given that this is a shared production environment. The differences are less pronounced here because IO dominates, but keep in mind that we keep optimizing the IO stack.
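To relate the AWS medians to the "1%? 10%?" question raised earlier, here is a small sketch that computes the relative time reduction of CRC64NVME against each baseline. pct_faster is a hypothetical helper, and the inputs are only the median column of the table above.

```rust
/// Percentage reduction in time of `candidate_s` relative to `baseline_s`.
fn pct_faster(baseline_s: f64, candidate_s: f64) -> f64 {
    (baseline_s - candidate_s) / baseline_s * 100.0
}

fn main() {
    // Median times (seconds) from the AWS table above.
    let (unsigned, crc64nvme, sha256, signed_default) = (7.14, 7.32, 7.83, 8.20);
    println!("vs sha256:   {:.1}%", pct_faster(sha256, crc64nvme));
    println!("vs default:  {:.1}%", pct_faster(signed_default, crc64nvme));
    println!("vs unsigned: {:.1}%", pct_faster(unsigned, crc64nvme));
}
```

On medians, CRC64NVME is roughly 6.5% faster than unsigned SHA256 and 10.7% faster than the signed default, and about 2.5% slower than no checksum at all. Given the variance noted above, these figures are only indicative.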

@kdn36 kdn36 marked this pull request as ready for review February 9, 2026 19:56
@crepererum
Contributor

OK, I think the implementation is good, the choice of CRC crate is OK (we can always change it; it's not part of the public API), and I do agree that for some users the perf win is worth it. Sadly, though, this is a breaking change, and hence I would like to wait until the next major release. Since we don't have a super strict release schedule: how urgently do you need this?

@kdn36
Author

kdn36 commented Feb 11, 2026

Thank you for your support.

We would like to see it as soon as we can reasonably get it, without taking on excessive risk or effort. For context, IO is critical for our distributed query performance and an area of active development.

@crepererum crepererum added next-major-release the PR has API changes and it waiting on the next major version api-change labels Feb 11, 2026
@alamb alamb requested a review from orlp February 11, 2026 20:10

@orlp orlp left a comment


Some nitpicks, other than that looks good.

@kdn36 kdn36 requested a review from orlp February 25, 2026 14:57
@alamb alamb requested a review from crepererum February 27, 2026 19:50
@alamb
Contributor

alamb commented Feb 27, 2026

@crepererum any chance you have some time to give this a final review?

@kdn36
Author

kdn36 commented Mar 9, 2026

Do we have an eta for the release? Thanks!

@alamb
Contributor

alamb commented Mar 11, 2026

Do we have an eta for the release? Thanks!

Since it adds an API change, it looks to me like this PR will have to wait for the next major release. It seems like @crepererum is tracking with

@kdn36 kdn36 force-pushed the feat_checksum_crc64 branch from f90cb2b to e49a372 April 7, 2026 08:44
@kdn36
Author

kdn36 commented Apr 7, 2026

Is there an option to trigger a CI test run ourselves? If not, would someone mind triggering a test run? Thanks!

@crepererum
Contributor

I'm still confused about the copy behavior, see https://github.com/apache/arrow-rs-object-store/pull/633/changes#r3044086641

@kdn36
Author

kdn36 commented Apr 7, 2026

I'm still confused about the copy behavior, see https://github.com/apache/arrow-rs-object-store/pull/633/changes#r3044086641

Updated after re-review.

Note, FWIW: I observed different behavior between AWS and Localstack. AWS adds a checksum header on the copy path, but Localstack does not.

With

$ cargo test --features aws copy_multipart_file_with_signature -- --no-capture

AWS:

$ cat debug_aws.out
running 2 tests
[src/aws/client.rs:714:9] "start put_part" = "start put_part"
[src/aws/client.rs:769:9] &checksum_sha256 = Some(
    "kZLCW3NPy62+MtrcKAicYNsOOfkMwgzi5XM/VyYazAw=",
)
[src/aws/client.rs:770:9] &checksum_crc64nvme = None
[src/aws/client.rs:714:9] "start put_part" = "start put_part"
[src/aws/client.rs:769:9] &checksum_sha256 = None
[src/aws/client.rs:770:9] &checksum_crc64nvme = Some(
    "m0nypWpUsto=",
)
test aws::tests::copy_multipart_file_with_signature_change_checksum ... ok
[src/aws/client.rs:714:9] "start put_part" = "start put_part"
[src/aws/client.rs:769:9] &checksum_sha256 = None
[src/aws/client.rs:770:9] &checksum_crc64nvme = Some(
    "m0nypWpUsto=",
)
test aws::tests::copy_multipart_file_with_signature ... ok

Localstack:

$ cat debug_localstack.out
running 2 tests
[src/aws/client.rs:714:9] "start put_part" = "start put_part"
[src/aws/client.rs:769:9] &checksum_sha256 = None
[src/aws/client.rs:770:9] &checksum_crc64nvme = None
[src/aws/client.rs:714:9] "start put_part" = "start put_part"
[src/aws/client.rs:769:9] &checksum_sha256 = None
[src/aws/client.rs:770:9] &checksum_crc64nvme = None
test aws::tests::copy_multipart_file_with_signature_change_checksum ... ok
[src/aws/client.rs:714:9] "start put_part" = "start put_part"
[src/aws/client.rs:769:9] &checksum_sha256 = None
[src/aws/client.rs:770:9] &checksum_crc64nvme = None
test aws::tests::copy_multipart_file_with_signature ... ok
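The checksum values in the AWS output above, such as m0nypWpUsto=, are 12 characters long because S3 exchanges checksums base64-encoded: an 8-byte CRC64NVME digest encodes to 11 characters plus one "=" pad. The std-only sketch below assumes the digest is serialized big-endian, as the AWS SDKs do for CRC checksums. base64_encode and crc64nvme_header_value are hypothetical helpers, not part of object_store.

```rust
const ALPHABET: &[u8; 64] =
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/// Standard (padded) base64 encoding, std-only.
fn base64_encode(bytes: &[u8]) -> String {
    let mut out = String::with_capacity((bytes.len() + 2) / 3 * 4);
    for chunk in bytes.chunks(3) {
        // Pack up to three bytes into a 24-bit group, zero-filling a short final chunk.
        let b1 = chunk.get(1).copied().unwrap_or(0);
        let b2 = chunk.get(2).copied().unwrap_or(0);
        let group = (u32::from(chunk[0]) << 16) | (u32::from(b1) << 8) | u32::from(b2);
        // Emit four 6-bit symbols, padding with '=' where input bytes were missing.
        out.push(ALPHABET[(group >> 18) as usize & 63] as char);
        out.push(ALPHABET[(group >> 12) as usize & 63] as char);
        out.push(if chunk.len() > 1 { ALPHABET[(group >> 6) as usize & 63] as char } else { '=' });
        out.push(if chunk.len() > 2 { ALPHABET[group as usize & 63] as char } else { '=' });
    }
    out
}

/// Hypothetical helper: header value for a CRC64NVME digest (big-endian bytes, base64).
fn crc64nvme_header_value(crc: u64) -> String {
    base64_encode(&crc.to_be_bytes())
}

fn main() {
    let value = crc64nvme_header_value(0xae8b_1486_0a79_9888);
    println!("x-amz-checksum-crc64nvme: {value}");
}
```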


Labels

api-change next-major-release the PR has API changes and it waiting on the next major version


Development

Successfully merging this pull request may close these issues.

Support CRC checksum

4 participants