Sort values for cuda_compute_capabilities templates #5144
Sort values for cuda_compute_capabilities templates #5144Flamefire wants to merge 4 commits intoeasybuilders:developfrom
Conversation
There have been instances where the order matters so sort the values. Even if it doesn't solve some issues it will at least be consistent.
813791a to
63e286f
Compare
63e286f to
e72cc62
Compare
|
Although this makes sense in general, strictly speaking this is a backwards-incompatible change. Don't we have easyblock that make assumptions on the order of CUDA compute capabilities that are specified, where first listed is the preferred one to use (there are cases where using multiple ones isn't possible)? |
|
IIRC we only sort and take lowest or highest. |
|
@casparvl Thoughts on this? |
|
I like the idea of this PR. Right now, it's up to the easyblock to do something sensible with this list of compute capabilities. E.g. the LAMMPS easyblock does some sorting https://github.com/easybuilders/easybuild-easyblocks/blob/b01ea7afd303b0456f860588f9e1c6acc44509a6/easybuild/easyblocks/l/lammps.py#L334 but indeed seems to suffer from the issue mentioned by @Flamefire : it calls It is much nicer if the framework takes care of this. If this is documented properly (which is done in this PR), the EasyBlock knows what to expect, and doesn't need to do any sorting itself (reducing the potential for error on this). That being said, @boegel is also right that this is, at least potentially, a breaking change. After merging this PR and easybuilders/easybuild-easyblocks#4092 , Lammps would be build with I haven't made the analysis for all the other easyblocks, i.e. what would be the before vs after situation. We should also realize that this likely only affects software that can only be build for a single CUDA CC, since in that case some choice has to be made from that list. Sites that use a list of CUDA CCs probably build in some common prefix, and use that on node types with diverse GPU archs - but that usage is already broken today for this subset of software, since it simply cant be build for multiple targets. So it's very well possible these sites have already made exceptions for this, and build these particular packages in an architecture-specific prefix. So yes: in theory this could lead to a breaking change. But in practice, it would surprise me if anyone is really "hit" by this in practice. In summary: I would consider this acceptable, provided we document this very clearly in the release notes. |
casparvl
left a comment
There was a problem hiding this comment.
We should probably also change the help description for --cuda-compute-capabilities and state that the order in which CCs are specified does not matter - just to make this behavior explicit.
|
Just to be clear: it's game over for EasyBuild v5.3.0, this will need to wait until the release after that... |
|
@casparvl Done. Also replaced "List" by "Set" to emphasize this. |
There have been instances where the order matters so sort the values.
Even if it doesn't solve some issues it will at least be consistent.
E.g.
--cuda-compute-capabilities=7.5,8.0should be the same as--cuda-compute-capabilities=8.0,7.5This is also important to avoid issues with Blackwell that introduced CUDA CC 10.x which when sorting naively will be incorrectly the "lowest" CC.
See companion easyblock PR:
get_cuda_cc_template_valueinstead of raw values. easybuild-easyblocks#4092