nvproxy: reject opaque GSP legacy and NV2081_BINAPI control forwarding #12921
ibondarenko1 wants to merge 1 commit into google:master
Previously, RM control commands with RM_GSS_LEGACY_MASK (bit 15) set or NV2081_BINAPI class were forwarded to the host NVIDIA driver via rmControlSimple(), which copies up to 1MB of guest-controlled opaque bytes without content validation.
While the typed handler map (controlCmd) validates parameters for all 183 known control commands, these two paths bypassed validation entirely. This is inconsistent with gVisor's defense-in-depth approach, where tpuproxy (TPU/VFIO passthrough) validates ALL ioctl parameters with typed handlers and rejects unknown commands.
This change rejects GSP legacy and NV2081_BINAPI controls with NV_ERR_NOT_SUPPORTED instead of forwarding opaque bytes. Standard CUDA/ML workloads should not be affected, as these are deprecated/undocumented interfaces. If legitimate use cases are identified, specific commands can be allowlisted with typed parameter validation.
Security impact: reduces the attack surface exposed to sandboxed workloads by preventing arbitrary opaque data from reaching the host NVIDIA kernel driver's GSP handler.
Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). View this failed invocation of the CLA check for more information. For the most up to date status, view the checks section at the bottom of the pull request.
Hi @ibondarenko1. Thanks for the patch. In general, I agree with the motivation and direction of this patch; it will definitely reduce the driver surface we expose to the application. We aim to expose as little as possible while still getting most CUDA workloads running.
I hope it is true that standard CUDA/ML workloads are not affected. I have not tested this explicitly, but at least in the 525 driver, NV2081_BINAPI commands were still being used. Most users in GKE are now on 580+, so the user-mode drivers might have been updated by now.
Per the comment, these control commands are undocumented, so we can't do typed parameter validation. @nixprime what do you think?