[ROCm] Route in-memory HSACO LoadKernel through LoadModuleFromHsaco #787
magaonka-amd wants to merge 1 commit into ROCm:main
magaonka-amd
commented
Apr 9, 2026
- LoadKernel previously filled in_memory_modules_ and kernel_to_gpu_binary_ without gpu_binary_to_module_ refcounts, so UnloadGpuBinary skipped unload and in_memory_modules_.erase for those kernels.
- Call LoadModuleFromHsaco (same as LoadModule path) so refcounting and unload match CUDA LoadKernel -> LoadModuleFromCuBin behavior.
f968316 to
bbdd7d4
    }
    TF_ASSIGN_OR_RETURN(ModuleHandle module_handle,
                        LoadModuleFromHsaco(hsaco, cubin.size()));
    hipModule_t module = gpu_binary_to_module_.at(module_handle).first;
nit: .at() will throw std::out_of_range if the key is missing. In practice this is safe because LoadModuleFromHsaco always inserts into gpu_binary_to_module_ before returning (line 783), and the CUDA equivalent (cuda_executor.cc:978) uses the identical pattern. Just noting for reviewers that this acts as a defensive assertion rather than a silent failure path.
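To make the `.at()` point concrete, here is a minimal sketch of the two lookup styles. It assumes a plain `std::map` as a stand-in; the container type actually used for `gpu_binary_to_module_` is not shown in this thread, and `ToyModuleMap`, `AtThrowsForKey`, and `FindModuleOrNull` are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <stdexcept>
#include <utility>

// Hypothetical stand-in for the handle -> (module, refcount) map.
// An int plays the role of hipModule_t here.
using ToyModuleMap = std::map<const void*, std::pair<int, std::uint64_t>>;

// Returns true if map.at(key) throws std::out_of_range, i.e. the key is
// missing and the lookup fails loudly instead of silently.
bool AtThrowsForKey(const ToyModuleMap& map, const void* key) {
  try {
    (void)map.at(key);
    return false;
  } catch (const std::out_of_range&) {
    return true;
  }
}

// Non-throwing alternative: returns a pointer to the module, or nullptr
// when the key is missing, leaving error handling to the caller.
const int* FindModuleOrNull(const ToyModuleMap& map, const void* key) {
  auto it = map.find(key);
  return it == map.end() ? nullptr : &it->second.first;
}
```

With `.at()`, a missing key surfaces immediately at the lookup site, which is the "defensive assertion" behavior described above. The `find`-based helper would only be preferable if a missing entry were an expected, recoverable condition, which is not the case right after a successful load.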
    TF_ASSIGN_OR_RETURN(ModuleHandle module_handle,
                        LoadModuleFromHsaco(hsaco, cubin.size()));
This is the core fix and looks correct. Previously LoadKernel populated in_memory_modules_ and kernel_to_gpu_binary_ but skipped gpu_binary_to_module_, so UnloadGpuBinary (line 627) would find no entry and return false without cleaning up. Routing through LoadModuleFromHsaco now correctly populates both gpu_binary_to_module_ (with refcount) and in_memory_modules_, making load/unload symmetric.
This also achieves full parity with the CUDA LoadKernel -> LoadModuleFromCuBin path (cuda_executor.cc:971-984).
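The asymmetry described above can be reproduced in a toy model. The sketch below is not the ROCm executor: `ToyExecutor`, `LoadKernelUntracked`, `LoadModuleFromBinary`, and `Unload` are hypothetical names, an `int` stands in for `hipModule_t`, and the two maps only mimic the roles of `in_memory_modules_` and `gpu_binary_to_module_` as described in this PR.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <utility>

// Toy model of the bookkeeping discussed in this PR (assumed shapes only).
class ToyExecutor {
 public:
  // Pre-fix behavior as described: records the module in the in-memory map
  // but never registers a refcount entry.
  void LoadKernelUntracked(const void* binary) {
    in_memory_[binary] = next_module_++;
  }

  // Post-fix behavior as described: every load goes through one refcounted
  // path that fills both maps. Returns the handle used for unloading.
  const void* LoadModuleFromBinary(const void* binary) {
    auto it = refcounted_.find(binary);
    if (it == refcounted_.end()) {
      int module = next_module_++;
      in_memory_[binary] = module;
      it = refcounted_.emplace(binary, std::make_pair(module, std::uint64_t{0}))
               .first;
    }
    ++it->second.second;  // one more user of this module
    return binary;
  }

  // Returns false when no refcount entry exists, so nothing is cleaned up.
  bool Unload(const void* handle) {
    auto it = refcounted_.find(handle);
    if (it == refcounted_.end()) return false;
    if (--it->second.second == 0) {
      in_memory_.erase(handle);  // last user gone: drop the module
      refcounted_.erase(it);
    }
    return true;
  }

  std::size_t InMemoryCount() const { return in_memory_.size(); }

 private:
  std::map<const void*, int> in_memory_;
  std::map<const void*, std::pair<int, std::uint64_t>> refcounted_;
  int next_module_ = 1;
};
```

In this model, a module loaded via `LoadKernelUntracked` stays in the in-memory map after `Unload` returns false, while two calls to `LoadModuleFromBinary` followed by two calls to `Unload` leave the map empty.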
Claude Review Summary
Verdict: Looks good ✅
Clean bug fix that routes in-memory HSACO LoadKernel through LoadModuleFromHsaco. No blocking issues found. See inline comments for details.