Skip to content

[ROCm] Route in-memory HSACO LoadKernel through LoadModuleFromHsaco#787

Draft
magaonka-amd wants to merge 1 commit intoROCm:mainfrom
magaonka-amd:fix/rocm-loadkernel-hsaco-refcount
Draft

[ROCm] Route in-memory HSACO LoadKernel through LoadModuleFromHsaco#787
magaonka-amd wants to merge 1 commit intoROCm:mainfrom
magaonka-amd:fix/rocm-loadkernel-hsaco-refcount

Conversation

@magaonka-amd
Copy link
Copy Markdown

  • LoadKernel previously filled in_memory_modules_ and kernel_to_gpu_binary_ without gpu_binary_to_module_ refcounts, so UnloadGpuBinary skipped unload and in_memory_modules_.erase for those kernels.
  • Call LoadModuleFromHsaco (same as LoadModule path) so refcounting and unload match CUDA LoadKernel -> LoadModuleFromCuBin behavior.

@magaonka-amd magaonka-amd added the claude-review Request a Claude AI code review for this PR label Apr 9, 2026
- LoadKernel previously filled in_memory_modules_ and kernel_to_gpu_binary_
  without gpu_binary_to_module_ refcounts, so UnloadGpuBinary skipped unload and
  in_memory_modules_.erase for those kernels.
- Call LoadModuleFromHsaco (same as LoadModule path) so refcounting and unload
  match CUDA LoadKernel -> LoadModuleFromCuBin behavior.
@magaonka-amd magaonka-amd force-pushed the fix/rocm-loadkernel-hsaco-refcount branch from f968316 to bbdd7d4 Compare April 9, 2026 00:07
}
TF_ASSIGN_OR_RETURN(ModuleHandle module_handle,
LoadModuleFromHsaco(hsaco, cubin.size()));
hipModule_t module = gpu_binary_to_module_.at(module_handle).first;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: .at() will throw std::out_of_range if the key is missing. In practice this is safe because LoadModuleFromHsaco always inserts into gpu_binary_to_module_ before returning (line 783), and the CUDA equivalent (cuda_executor.cc:978) uses the identical pattern. Just noting for reviewers that this acts as a defensive assertion rather than a silent failure path.

Comment on lines +685 to +686
TF_ASSIGN_OR_RETURN(ModuleHandle module_handle,
LoadModuleFromHsaco(hsaco, cubin.size()));
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the core fix and looks correct. Previously LoadKernel populated in_memory_modules_ and kernel_to_gpu_binary_ but skipped gpu_binary_to_module_, so UnloadGpuBinary (line 627) would find no entry and return false without cleaning up. Routing through LoadModuleFromHsaco now correctly populates both gpu_binary_to_module_ (with refcount) and in_memory_modules_, making load/unload symmetric.

This also achieves full parity with the CUDA LoadKernel -> LoadModuleFromCuBin path (cuda_executor.cc:971-984).

@claude
Copy link
Copy Markdown

claude bot commented Apr 9, 2026

Claude Review Summary

Verdict: Looks good

Clean bug fix that routes in-memory HSACO LoadKernel through LoadModuleFromHsaco, fixing a refcount gap where gpu_binary_to_module_ was never populated — causing UnloadGpuBinary to silently fail. The fix achieves full parity with the CUDA LoadKernelLoadModuleFromCuBin path. Lock acquisition and data structure access are correct.

No blocking issues found. See inline comments for details.

@github-actions github-actions bot removed the claude-review Request a Claude AI code review for this PR label Apr 9, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant