qemu/target-arm-Fix-SVE-SDOT-UDOT-USDOT-4-way-indexed.patch
Jiabo Feng 7a16948063 QEMU update to version 8.2.0-24:
- ppc/xive: Fix ESB length overflow on 32-bit hosts
- target/hppa: Fix PSW V-bit packaging in cpu_hppa_get for hppa64
- target/ppc: Fix migration of CPUs with TLB_EMB TLB type
- target/arm: Clear high SVE elements in handle_vec_simd_wshli
- module: Prevent crash by resetting local_err in module_load_qom_all()
- tests/docker: update debian i686 and mipsel images to bookworm
- target/arm: Fix SVE SDOT/UDOT/USDOT (4-way, indexed)
- docs/sphinx/depfile.py: Handle env.doc2path() returning a Path not a str
- block/blkio: use FUA flag on write zeroes only if supported
- virtio-pci: Fix the use of an uninitialized irqfd
- hw/cxl: Ensure there is enough data to read the input header in cmd_get_physical_port_state()
- intel_iommu: Send IQE event when setting reserved bit in IQT_TAIL
- virtio-net: Avoid indirection_table_mask overflow
- Fix calculation of minimum in colo_compare_tcp
- target/riscv/csr.c: Fix an access to VXSAT
- linux-user: Clean up unused header
- raw-format: Fix error message for invalid offset/size
- hw/loongarch/virt: Remove unnecessary 'cpu.h' inclusion
- tests: Wait for migration completion on destination QEMU to avoid failures
- acpi: ged: Add macro for acpi sleep control register
- hw/intc/openpic: Improve errors for out of bounds property values
- hw/pci-bridge: Add a Kconfig switch for the normal PCI bridge
- docs/tools/qemu-img.rst: fix typo (sumarizes)
- audio/pw: Report more accurate error when connecting to PipeWire fails
- audio/pw: Report more accurate error when connecting to PipeWire fails
- dma: Fix function names in documentation Ensure the function names match.
- edu: fix DMA range upper bound check
- platform-bus: fix refcount leak
- hw/net/can/sja1000: fix bug for single acceptance filter and standard frame
- tests/avocado: fix typo in replay_linux
- util/userfaultfd: Remove unused uffd_poll_events
- Consider discard option when writing zeros
- crypto: factor out conversion of QAPI to gcrypt constants
- crypto: drop gnutls debug logging support
- crypto: use consistent error reporting pattern for unsupported cipher modes
- hw/gpio/aspeed_gpio: Avoid shift into sign bit

Signed-off-by: Jiabo Feng <fengjiabo1@huawei.com>
(cherry picked from commit b6e04df301d30895427ab41a1edff0f40149bdd9)
2024-11-30 09:03:46 +08:00

74 lines
4.0 KiB
Diff

From 95f371c36858dd003c0c6a3d4f6ddfbc299dda9f Mon Sep 17 00:00:00 2001
From: qihao_yewu <qihao_yewu@cmss.chinamobile.com>
Date: Thu, 7 Nov 2024 20:56:18 -0500
Subject: [PATCH] target/arm: Fix SVE SDOT/UDOT/USDOT (4-way, indexed)
cheery-pick from e6b2fa1b81ac6b05c4397237c846a295a9857920
Our implementation of the indexed version of SVE SDOT/UDOT/USDOT got
the calculation of the inner loop terminator wrong. Although we
correctly account for the element size when we calculate the
terminator for the first iteration:
intptr_t segend = MIN(16 / sizeof(TYPED), opr_sz_n);
we don't do that when we move it forward after the first inner loop
completes. The intention is that we process the vector in 128-bit
segments, which for a 64-bit element size should mean (1, 2), (3, 4),
(5, 6), etc. This bug meant that we would iterate (1, 2), (3, 4, 5,
6), (7, 8, 9, 10) etc and apply the wrong indexed element to some of
the operations, and also index off the end of the vector.
You don't see this bug if the vector length is small enough that we
don't need to iterate the outer loop, i.e. if it is only 128 bits,
or if it is the 64-bit special case from AA32/AA64 AdvSIMD. If the
vector length is 256 bits then we calculate the right results for the
elements in the vector but do index off the end of the vector. Vector
lengths greater than 256 bits see wrong answers. The instructions
that produce 32-bit results behave correctly.
Fix the recalculation of 'segend' for subsequent iterations, and
restore a version of the comment that was lost in the refactor of
commit 7020ffd656a5 that explains why we only need to clamp segend to
opr_sz_n for the first iteration, not the later ones.
Cc: qemu-stable@nongnu.org
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2595
Fixes: 7020ffd656a5 ("target/arm: Macroize helper_gvec_{s,u}dot_idx_{b,h}")
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20241101185544.2130972-1-peter.maydell@linaro.org
Signed-off-by: qihao_yewu <qihao_yewu@cmss.chinamobile.com>
---
target/arm/tcg/vec_helper.c | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)
diff --git a/target/arm/tcg/vec_helper.c b/target/arm/tcg/vec_helper.c
index 1f93510b85..11e874c05a 100644
--- a/target/arm/tcg/vec_helper.c
+++ b/target/arm/tcg/vec_helper.c
@@ -692,6 +692,13 @@ void HELPER(NAME)(void *vd, void *vn, void *vm, void *va, uint32_t desc) \
{ \
intptr_t i = 0, opr_sz = simd_oprsz(desc); \
intptr_t opr_sz_n = opr_sz / sizeof(TYPED); \
+ /* \
+ * Special case: opr_sz == 8 from AA64/AA32 advsimd means the \
+ * first iteration might not be a full 16 byte segment. But \
+ * for vector lengths beyond that this must be SVE and we know \
+ * opr_sz is a multiple of 16, so we need not clamp segend \
+ * to opr_sz_n when we advance it at the end of the loop. \
+ */ \
intptr_t segend = MIN(16 / sizeof(TYPED), opr_sz_n); \
intptr_t index = simd_data(desc); \
TYPED *d = vd, *a = va; \
@@ -709,7 +716,7 @@ void HELPER(NAME)(void *vd, void *vn, void *vm, void *va, uint32_t desc) \
n[i * 4 + 2] * m2 + \
n[i * 4 + 3] * m3); \
} while (++i < segend); \
- segend = i + 4; \
+ segend = i + (16 / sizeof(TYPED)); \
} while (i < opr_sz_n); \
clear_tail(d, opr_sz, simd_maxsz(desc)); \
}
--
2.41.0.windows.1