[ROCm] Route in-memory HSACO LoadKernel through LoadModuleFromHsaco by magaonka-amd · Pull Request #787 · ROCm/xla

magaonka-amd · 2026-04-09T00:05:06Z

LoadKernel previously populated in_memory_modules_ and kernel_to_gpu_binary_ without adding a refcounted entry to gpu_binary_to_module_, so UnloadGpuBinary skipped both the module unload and the in_memory_modules_.erase for those kernels.
Call LoadModuleFromHsaco (the same path LoadModule uses) so that refcounting and unloading match the CUDA LoadKernel -> LoadModuleFromCuBin behavior.


claude · 2026-04-09T00:16:02Z

xla/stream_executor/rocm/rocm_executor.cc

-    }
+    TF_ASSIGN_OR_RETURN(ModuleHandle module_handle,
+                        LoadModuleFromHsaco(hsaco, cubin.size()));
+    hipModule_t module = gpu_binary_to_module_.at(module_handle).first;


nit: .at() will throw std::out_of_range if the key is missing. In practice this is safe because LoadModuleFromHsaco always inserts into gpu_binary_to_module_ before returning (line 783), and the CUDA equivalent (cuda_executor.cc:978) uses the identical pattern. Just noting for reviewers that this acts as a defensive assertion rather than a silent failure path.
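To illustrate the difference the nit describes, here is a minimal sketch using `std::unordered_map` as a stand-in for the executor's map (an assumption; the real container type in `rocm_executor.cc` is not shown in this thread, and its behavior in builds without exceptions may differ). The helper names `AtThrowsOnMissingKey` and `FindLocatesKey` are hypothetical.

```cpp
#include <cassert>
#include <stdexcept>
#include <unordered_map>

// Stand-in map type; the real gpu_binary_to_module_ maps a module handle to a
// (module, refcount) pair. Plain ints are used here only for illustration.
using ToyMap = std::unordered_map<int, int>;

// Returns true if at() reported the missing key by throwing std::out_of_range.
inline bool AtThrowsOnMissingKey(const ToyMap& m, int key) {
  try {
    (void)m.at(key);
    return false;  // Key was present, nothing thrown.
  } catch (const std::out_of_range&) {
    return true;  // Loud failure: the caller cannot miss it.
  }
}

// Returns true if find() located the key. A missing key yields end() and the
// caller must remember to check for it, which is the "silent" path.
inline bool FindLocatesKey(const ToyMap& m, int key) {
  return m.find(key) != m.end();
}
```

With a map holding only key `1`, `AtThrowsOnMissingKey(m, 2)` reports the throw, while `FindLocatesKey(m, 2)` simply returns false and leaves handling to the caller.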

claude · 2026-04-09T00:16:04Z

xla/stream_executor/rocm/rocm_executor.cc

+    TF_ASSIGN_OR_RETURN(ModuleHandle module_handle,
+                        LoadModuleFromHsaco(hsaco, cubin.size()));


This is the core fix and looks correct. Previously LoadKernel populated in_memory_modules_ and kernel_to_gpu_binary_ but skipped gpu_binary_to_module_, so UnloadGpuBinary (line 627) would find no entry and return false without cleaning up. Routing through LoadModuleFromHsaco now correctly populates both gpu_binary_to_module_ (with refcount) and in_memory_modules_, making load/unload symmetric.

This also achieves full parity with the CUDA LoadKernel -> LoadModuleFromCuBin path (cuda_executor.cc:971-984).
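The refcount gap described above can be modeled with a toy executor. This is a sketch under stated assumptions, not the actual XLA code: `ToyExecutor`, `FakeModule`, and both `LoadKernel...` method names are hypothetical, and only the three map names mirror the ones mentioned in this PR.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <utility>

// Hypothetical stand-ins for the real handle and module types.
using ModuleHandle = const void*;
using FakeModule = int;

class ToyExecutor {
 public:
  // Old shape: records the in-memory module but creates no refcount entry.
  void LoadKernelWithoutRefcount(ModuleHandle h) {
    in_memory_modules_[h] = next_module_++;
  }

  // New shape: the kernel load goes through the shared module loader.
  void LoadKernelViaModuleLoader(ModuleHandle h) { LoadModule(h); }

  // Returns false when no refcount entry exists, leaving everything in place.
  bool UnloadGpuBinary(ModuleHandle h) {
    auto it = gpu_binary_to_module_.find(h);
    if (it == gpu_binary_to_module_.end()) return false;
    if (--it->second.second == 0) {
      // Last reference dropped: release both bookkeeping entries.
      in_memory_modules_.erase(h);
      gpu_binary_to_module_.erase(it);
    }
    return true;
  }

  std::size_t in_memory_count() const { return in_memory_modules_.size(); }

 private:
  // Loads once per handle and bumps the refcount on repeated loads.
  void LoadModule(ModuleHandle h) {
    auto it = gpu_binary_to_module_.find(h);
    if (it != gpu_binary_to_module_.end()) {
      ++it->second.second;
      return;
    }
    FakeModule m = next_module_++;
    in_memory_modules_[h] = m;
    gpu_binary_to_module_.emplace(h, std::make_pair(m, std::uint64_t{1}));
  }

  FakeModule next_module_ = 0;
  std::unordered_map<ModuleHandle, FakeModule> in_memory_modules_;
  std::unordered_map<ModuleHandle, std::pair<FakeModule, std::uint64_t>>
      gpu_binary_to_module_;
};
```

In this model, a handle loaded through `LoadKernelWithoutRefcount` can never be unloaded, because `UnloadGpuBinary` finds no entry and returns false. A handle loaded through `LoadKernelViaModuleLoader` is released once the number of unloads matches the number of loads.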

claude · 2026-04-09T00:16:12Z

Claude Review Summary

Verdict: Looks good ✅

Clean bug fix that routes in-memory HSACO LoadKernel through LoadModuleFromHsaco, fixing a refcount gap where gpu_binary_to_module_ was never populated — causing UnloadGpuBinary to silently fail. The fix achieves full parity with the CUDA LoadKernel → LoadModuleFromCuBin path. Lock acquisition and data structure access are correct.

No blocking issues found. See inline comments for details.

magaonka-amd added the claude-review label (Request a Claude AI code review for this PR) Apr 9, 2026

magaonka-amd force-pushed the fix/rocm-loadkernel-hsaco-refcount branch from f968316 to bbdd7d4 April 9, 2026 00:07

claude bot reviewed Apr 9, 2026

github-actions bot removed the claude-review label (Request a Claude AI code review for this PR) Apr 9, 2026

magaonka-amd wants to merge 1 commit into ROCm:main from magaonka-amd:fix/rocm-loadkernel-hsaco-refcount
