Amd zen4 support by harsdave · Pull Request #925 · flame/blis

harsdave · 2026-03-26T16:39:29Z

This patch introduces the 'zen4' configuration.

Key Changes:

- Added 'zen4' configuration directory and base make_defs.mk.
- Implemented an optimized daddv kernel (bli_daddv_zen4_int) using AVX-512 intrinsics.
- The daddv implementation utilizes:
  - 8x/4x/2x unrolling for unit-stride vectors, keeping several independent vector adds in flight to maximize throughput.
  - AVX-512 masked loads/stores for tail (fringe) cases, eliminating the need for scalar fallback loops when the vector length is not a multiple of the vector width.
- The initial configuration uses 'zen' fallbacks for the remaining Level-1 kernels, which are scheduled for AVX-512 optimization in future updates.
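A minimal sketch of the unit-stride pattern described above (unrolled ZMM loop followed by a masked fringe), assuming GCC or Clang on x86-64. It is not the PR's bli_daddv_zen4_int: it uses a simpler 4x/1x unroll instead of 8x/4x/2x, and the names daddv_sketch, daddv_avx512_unit and daddv_ref are hypothetical. The AVX-512 function is compiled via a target attribute and selected at run time, so the sketch also runs on CPUs without AVX-512.

```c
#include <immintrin.h>
#include <stddef.h>

/* Hypothetical scalar path: y[i] += x[i] for any (non-negative) stride. */
static void daddv_ref(size_t n, const double *x, ptrdiff_t incx,
                      double *y, ptrdiff_t incy)
{
    for (size_t i = 0; i < n; ++i)
        y[(ptrdiff_t)i * incy] += x[(ptrdiff_t)i * incx];
}

/* Unit-stride AVX-512 path, compiled for AVX-512F via the target attribute. */
__attribute__((target("avx512f")))
static void daddv_avx512_unit(size_t n, const double *x, double *y)
{
    size_t i = 0;

    /* 4x unrolled main loop: 4 ZMM registers = 32 doubles per iteration,
       giving the core four independent adds to overlap. */
    for (; i + 32 <= n; i += 32) {
        __m512d y0 = _mm512_add_pd(_mm512_loadu_pd(y + i),      _mm512_loadu_pd(x + i));
        __m512d y1 = _mm512_add_pd(_mm512_loadu_pd(y + i + 8),  _mm512_loadu_pd(x + i + 8));
        __m512d y2 = _mm512_add_pd(_mm512_loadu_pd(y + i + 16), _mm512_loadu_pd(x + i + 16));
        __m512d y3 = _mm512_add_pd(_mm512_loadu_pd(y + i + 24), _mm512_loadu_pd(x + i + 24));
        _mm512_storeu_pd(y + i,      y0);
        _mm512_storeu_pd(y + i + 8,  y1);
        _mm512_storeu_pd(y + i + 16, y2);
        _mm512_storeu_pd(y + i + 24, y3);
    }

    /* One ZMM register (8 doubles) at a time. */
    for (; i + 8 <= n; i += 8)
        _mm512_storeu_pd(y + i, _mm512_add_pd(_mm512_loadu_pd(y + i),
                                              _mm512_loadu_pd(x + i)));

    /* Fringe: 1..7 leftover elements handled with a lane mask instead of a
       scalar loop. Masked-off lanes are neither loaded (no fault) nor stored. */
    if (i < n) {
        const __mmask8 m = (__mmask8)((1u << (n - i)) - 1u);
        __m512d xv = _mm512_maskz_loadu_pd(m, x + i);
        __m512d yv = _mm512_maskz_loadu_pd(m, y + i);
        _mm512_mask_storeu_pd(y + i, m, _mm512_add_pd(yv, xv));
    }
}

/* Hypothetical entry point: dispatch on stride and CPU support. */
void daddv_sketch(size_t n, const double *x, ptrdiff_t incx,
                  double *y, ptrdiff_t incy)
{
    if (incx == 1 && incy == 1 && __builtin_cpu_supports("avx512f"))
        daddv_avx512_unit(n, x, y);
    else
        daddv_ref(n, x, incx, y, incy);
}
```

For n = 45 the 4x loop covers 32 elements, the 1x loop 8 more, and the last 5 go through a single masked add, so no scalar remainder loop is needed.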


This commit introduces high-performance AVX-512 kernels for the SCALV and SETV operations, targeting the AMD Zen 4 architecture across S, D, and Z precisions.

Key Changes:

- Instruction Set: Migrated core loops to use AVX-512 (ZMM) intrinsics to maximize data throughput.
- Throughput & Unrolling:
  - Implemented aggressive unrolling (e.g., 512 elements for ssetv, 48 complex elements for zscalv) to minimize loop overhead and saturate execution ports.
  - Added fallback paths for non-unit stride (incx != 1).
- Remainder Handling: Replaced manual scalar tail loops with AVX-512 masked loads/stores (_mm512_mask_storeu_ps/pd) for cleaner and more efficient fringe-case processing.
- Precision Support: Single (s), Double (d), and Double Complex (z) implementations added for both SCALV and SETV.

Kernels Added:

- bli_sscalv_zen4_int, bli_dscalv_zen4_int, bli_zscalv_zen4_int
- bli_ssetv_zen4_int, bli_dsetv_zen4_int, bli_zsetv_zen4_int
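A minimal sketch of the D-precision SCALV and SETV patterns listed above (unrolled ZMM loop, masked fringe store, scalar fallback when incx != 1), assuming GCC or Clang on x86-64. All names are hypothetical and the unroll depth is far smaller than the PR's. The alpha == 0 branch follows what I understand to be the BLIS reference-kernel convention of delegating scalv to setv; treat that as an assumption, not something the PR states.

```c
#include <immintrin.h>
#include <stddef.h>

/* Mask with the low `rem` bits set (1 <= rem <= 7) for the fringe. */
static inline __mmask8 tail_mask(size_t rem)
{
    return (__mmask8)((1u << rem) - 1u);
}

__attribute__((target("avx512f")))
static void dsetv_avx512_unit(size_t n, double alpha, double *x)
{
    const __m512d av = _mm512_set1_pd(alpha);
    size_t i = 0;
    for (; i + 8 <= n; i += 8)
        _mm512_storeu_pd(x + i, av);
    if (i < n)                              /* masked fringe store */
        _mm512_mask_storeu_pd(x + i, tail_mask(n - i), av);
}

__attribute__((target("avx512f")))
static void dscalv_avx512_unit(size_t n, double alpha, double *x)
{
    const __m512d av = _mm512_set1_pd(alpha);
    size_t i = 0;
    /* 2x unrolled: 16 doubles per iteration. */
    for (; i + 16 <= n; i += 16) {
        __m512d x0 = _mm512_mul_pd(av, _mm512_loadu_pd(x + i));
        __m512d x1 = _mm512_mul_pd(av, _mm512_loadu_pd(x + i + 8));
        _mm512_storeu_pd(x + i,     x0);
        _mm512_storeu_pd(x + i + 8, x1);
    }
    for (; i + 8 <= n; i += 8)
        _mm512_storeu_pd(x + i, _mm512_mul_pd(av, _mm512_loadu_pd(x + i)));
    if (i < n) {                            /* masked load, scale, store */
        const __mmask8 m = tail_mask(n - i);
        _mm512_mask_storeu_pd(x + i, m,
                              _mm512_mul_pd(av, _mm512_maskz_loadu_pd(m, x + i)));
    }
}

/* Hypothetical entry point for SETV: x[i] = alpha. */
void dsetv_sketch(size_t n, double alpha, double *x, ptrdiff_t incx)
{
    if (incx == 1 && __builtin_cpu_supports("avx512f")) {
        dsetv_avx512_unit(n, alpha, x);
        return;
    }
    for (size_t i = 0; i < n; ++i)          /* non-unit stride fallback */
        x[(ptrdiff_t)i * incx] = alpha;
}

/* Hypothetical entry point for SCALV: x[i] *= alpha. */
void dscalv_sketch(size_t n, double alpha, double *x, ptrdiff_t incx)
{
    /* alpha == 0: overwrite instead of multiply so NaN/Inf in x do not survive. */
    if (alpha == 0.0) {
        dsetv_sketch(n, 0.0, x, incx);
        return;
    }
    if (incx == 1 && __builtin_cpu_supports("avx512f")) {
        dscalv_avx512_unit(n, alpha, x);
        return;
    }
    for (size_t i = 0; i < n; ++i)          /* non-unit stride fallback */
        x[(ptrdiff_t)i * incx] *= alpha;
}
```

SETV needs no load at all, which is why its fringe is a single masked store of the broadcast value; SCALV needs a zero-masked load first so the inactive lanes of x are never read.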

harsdave added 5 commits March 26, 2026 21:38

Delete placeholder file "kernels/zen4/3/bli_dgemm_zen4_asm_8x24.c"

4125996

Update bli_cpuid.c

150ec2e

harsdave wants to merge 5 commits into flame:master from harsdave:amd-zen4-support
