[rocm-jaxlib-v0.8.2] Backporting CI Benchmark related changes and fixes#760
mmakevic-amd wants to merge 8 commits into rocm-jaxlib-v0.8.2
@hsharsha
i-chaochen
left a comment
Once this PR is merged, will the benchmark CI run as a presubmit check? I don't see the benchmark CI among the presubmit checks on the 0.9.1 branch, for example on PR #756. Does that PR need a rebase, or should we create a new PR to check on 0.9.1?
No, only as a postsubmit. I can enable it as a presubmit, no problem, but the whole workflow takes ~40 min, as you can see at https://github.com/ROCm/xla/actions/workflows/postsubmit_benchmark.yml, so I'm not sure we want that.
One can trigger it manually before merging if necessary.
I think we can use a label to activate the benchmark CI, just like we do for the Claude code review and TSAN/ASAN. And it's best to let it run on presubmit.
IMO, 40 mins is fine. However, I don't understand two things:
Motivation
Currently, CI benchmarks are failing on v0.8.2. This PR fixes that by backporting changes from #691 and #730.
Note: #622 will be closed in favour of this PR.
Test Plan
I will manually trigger the CI check before merging.
Test Result
The workflow ran as expected, but gemma2 failed due to device time being above the threshold: https://github.com/ROCm/xla/actions/runs/23635202015/job/68844227492
After unsetting the HLO arg mode (from uninitialized), the job runs as expected: https://github.com/ROCm/xla/actions/runs/24037623275/job/70190205343
Submission Checklist