- ppc/xive: Fix ESB length overflow on 32-bit hosts
- target/hppa: Fix PSW V-bit packaging in cpu_hppa_get for hppa64
- target/ppc: Fix migration of CPUs with TLB_EMB TLB type
- target/arm: Clear high SVE elements in handle_vec_simd_wshli
- module: Prevent crash by resetting local_err in module_load_qom_all()
- tests/docker: update debian i686 and mipsel images to bookworm
- target/arm: Fix SVE SDOT/UDOT/USDOT (4-way, indexed)
- docs/sphinx/depfile.py: Handle env.doc2path() returning a Path not a str
- block/blkio: use FUA flag on write zeroes only if supported
- virtio-pci: Fix the use of an uninitialized irqfd
- hw/cxl: Ensure there is enough data to read the input header in cmd_get_physical_port_state()
- intel_iommu: Send IQE event when setting reserved bit in IQT_TAIL
- virtio-net: Avoid indirection_table_mask overflow
- Fix calculation of minimum in colo_compare_tcp
- target/riscv/csr.c: Fix an access to VXSAT
- linux-user: Clean up unused header
- raw-format: Fix error message for invalid offset/size
- hw/loongarch/virt: Remove unnecessary 'cpu.h' inclusion
- tests: Wait for migration completion on destination QEMU to avoid failures
- acpi: ged: Add macro for acpi sleep control register
- hw/intc/openpic: Improve errors for out of bounds property values
- hw/pci-bridge: Add a Kconfig switch for the normal PCI bridge
- docs/tools/qemu-img.rst: fix typo (sumarizes)
- audio/pw: Report more accurate error when connecting to PipeWire fails
- audio/pw: Report more accurate error when connecting to PipeWire fails
- dma: Fix function names in documentation
  Ensure the function names match.
- edu: fix DMA range upper bound check
- platform-bus: fix refcount leak
- hw/net/can/sja1000: fix bug for single acceptance filter and standard frame
- tests/avocado: fix typo in replay_linux
- util/userfaultfd: Remove unused uffd_poll_events
- Consider discard option when writing zeros
- crypto: factor out conversion of QAPI to gcrypt constants
- crypto: drop gnutls debug logging support
- crypto: use consistent error reporting pattern for unsupported cipher modes
- hw/gpio/aspeed_gpio: Avoid shift into sign bit

Signed-off-by: Jiabo Feng <fengjiabo1@huawei.com>
(cherry picked from commit b6e04df301d30895427ab41a1edff0f40149bdd9)
From 95f371c36858dd003c0c6a3d4f6ddfbc299dda9f Mon Sep 17 00:00:00 2001
From: qihao_yewu <qihao_yewu@cmss.chinamobile.com>
Date: Thu, 7 Nov 2024 20:56:18 -0500
Subject: [PATCH] target/arm: Fix SVE SDOT/UDOT/USDOT (4-way, indexed)

cherry-pick from e6b2fa1b81ac6b05c4397237c846a295a9857920

Our implementation of the indexed version of SVE SDOT/UDOT/USDOT got
the calculation of the inner loop terminator wrong. Although we
correctly account for the element size when we calculate the
terminator for the first iteration:
    intptr_t segend = MIN(16 / sizeof(TYPED), opr_sz_n);
we don't do that when we move it forward after the first inner loop
completes. The intention is that we process the vector in 128-bit
segments, which for a 64-bit element size should mean (1, 2), (3, 4),
(5, 6), etc. This bug meant that we would iterate (1, 2), (3, 4, 5,
6), (7, 8, 9, 10) etc and apply the wrong indexed element to some of
the operations, and also index off the end of the vector.

You don't see this bug if the vector length is small enough that we
don't need to iterate the outer loop, i.e. if it is only 128 bits,
or if it is the 64-bit special case from AA32/AA64 AdvSIMD. If the
vector length is 256 bits then we calculate the right results for the
elements in the vector but do index off the end of the vector. Vector
lengths greater than 256 bits see wrong answers. The instructions
that produce 32-bit results behave correctly.

Fix the recalculation of 'segend' for subsequent iterations, and
restore a version of the comment that was lost in the refactor of
commit 7020ffd656a5 that explains why we only need to clamp segend to
opr_sz_n for the first iteration, not the later ones.

Cc: qemu-stable@nongnu.org
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2595
Fixes: 7020ffd656a5 ("target/arm: Macroize helper_gvec_{s,u}dot_idx_{b,h}")
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20241101185544.2130972-1-peter.maydell@linaro.org
Signed-off-by: qihao_yewu <qihao_yewu@cmss.chinamobile.com>
---
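Note (not part of the commit): the segment walk described in the commit
message can be reproduced outside QEMU. The sketch below is a hypothetical
model, not code from vec_helper.c. The names walk_segments() and
struct walk_stats are invented for illustration, and the 'step' parameter
stands in for the value added to segend after each inner loop (4 before
this patch, 16 / sizeof(TYPED) after it). It only counts element visits
and does not touch any vector data.

```c
#include <assert.h>
#include <stdint.h>

#define MIN(a, b) ((a) < (b) ? (a) : (b))

/* Hypothetical record of what one walk over the vector did. */
struct walk_stats {
    int processed;      /* elements visited by the inner loop */
    int wrong_segment;  /* in-bounds elements paired with another segment's index */
    int out_of_bounds;  /* elements visited past the end of the vector */
};

/*
 * Model of the nested loop in the helper.
 *   opr_sz    - vector length in bytes (8, or a multiple of 16)
 *   elem_size - size of one result element in bytes (4 or 8)
 *   step      - amount added to segend after each inner loop
 */
static struct walk_stats walk_segments(intptr_t opr_sz, intptr_t elem_size,
                                       intptr_t step)
{
    struct walk_stats s = { 0, 0, 0 };
    intptr_t per_seg = 16 / elem_size;        /* elements per 128-bit segment */
    intptr_t opr_sz_n = opr_sz / elem_size;   /* elements in the whole vector */
    intptr_t segend = MIN(per_seg, opr_sz_n);
    intptr_t i = 0;

    assert(elem_size == 4 || elem_size == 8);

    do {
        /* The indexed element is loaded once per outer iteration. */
        intptr_t seg_used = i / per_seg;
        do {
            s.processed++;
            if (i >= opr_sz_n) {
                s.out_of_bounds++;
            } else if (i / per_seg != seg_used) {
                s.wrong_segment++;
            }
        } while (++i < segend);
        segend = i + step;
    } while (i < opr_sz_n);

    return s;
}
```

Under these assumptions, walk_segments(32, 8, 4) visits two elements past
the end of a 256-bit vector, walk_segments(64, 8, 4) additionally pairs two
in-bounds elements with the wrong segment, and a step of 16 / elem_size
gives a clean walk in both cases. With elem_size == 4 the two step values
coincide, which matches the remark that the 32-bit-result instructions
were unaffected.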
 target/arm/tcg/vec_helper.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/target/arm/tcg/vec_helper.c b/target/arm/tcg/vec_helper.c
index 1f93510b85..11e874c05a 100644
--- a/target/arm/tcg/vec_helper.c
+++ b/target/arm/tcg/vec_helper.c
@@ -692,6 +692,13 @@ void HELPER(NAME)(void *vd, void *vn, void *vm, void *va, uint32_t desc) \
 { \
     intptr_t i = 0, opr_sz = simd_oprsz(desc); \
     intptr_t opr_sz_n = opr_sz / sizeof(TYPED); \
+    /* \
+     * Special case: opr_sz == 8 from AA64/AA32 advsimd means the \
+     * first iteration might not be a full 16 byte segment. But \
+     * for vector lengths beyond that this must be SVE and we know \
+     * opr_sz is a multiple of 16, so we need not clamp segend \
+     * to opr_sz_n when we advance it at the end of the loop. \
+     */ \
     intptr_t segend = MIN(16 / sizeof(TYPED), opr_sz_n); \
     intptr_t index = simd_data(desc); \
     TYPED *d = vd, *a = va; \
@@ -709,7 +716,7 @@ void HELPER(NAME)(void *vd, void *vn, void *vm, void *va, uint32_t desc) \
                     n[i * 4 + 2] * m2 + \
                     n[i * 4 + 3] * m3); \
         } while (++i < segend); \
-        segend = i + 4; \
+        segend = i + (16 / sizeof(TYPED)); \
     } while (i < opr_sz_n); \
     clear_tail(d, opr_sz, simd_maxsz(desc)); \
 }
--
2.41.0.windows.1
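For readers who want to follow the fixed loop without the macro machinery,
here is a standalone sketch of the 64-bit-result signed case with int16_t
inputs. It is an approximation under stated assumptions, not the QEMU
helper: sdot_idx_h_sketch() is a hypothetical name, the desc word is
replaced by explicit opr_sz and index parameters, and any host-endian
adjustment of the index as well as the clear_tail() call are left out.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical macro-free version of the patched loop.
 *   d, a   - result and accumulator vectors of 64-bit elements
 *   n, m   - input vectors of 16-bit elements, four per result element
 *   opr_sz - vector length in bytes (8, or a multiple of 16)
 *   index  - which group of four m elements to use within each
 *            128-bit segment
 */
static void sdot_idx_h_sketch(int64_t *d, const int16_t *n, const int16_t *m,
                              const int64_t *a, intptr_t opr_sz,
                              intptr_t index)
{
    const intptr_t per_seg = 16 / (intptr_t)sizeof(int64_t);
    intptr_t opr_sz_n = opr_sz / (intptr_t)sizeof(int64_t);
    /* Only the first segment can be short (opr_sz == 8), so clamp once. */
    intptr_t segend = opr_sz_n < per_seg ? opr_sz_n : per_seg;
    const int16_t *m_indexed = m + index * 4;
    intptr_t i = 0;

    assert(opr_sz == 8 || opr_sz % 16 == 0);

    do {
        /* Load the indexed group for the segment that starts at i. */
        int64_t m0 = m_indexed[i * 4 + 0];
        int64_t m1 = m_indexed[i * 4 + 1];
        int64_t m2 = m_indexed[i * 4 + 2];
        int64_t m3 = m_indexed[i * 4 + 3];
        do {
            d[i] = a[i] +
                   n[i * 4 + 0] * m0 +
                   n[i * 4 + 1] * m1 +
                   n[i * 4 + 2] * m2 +
                   n[i * 4 + 3] * m3;
        } while (++i < segend);
        /* Advance by one full 128-bit segment, as in the patch. */
        segend = i + per_seg;
    } while (i < opr_sz_n);
}
```

The choice to clamp only the initial segend mirrors the restored comment:
once the outer loop runs a second time, opr_sz is assumed to be a multiple
of 16, so advancing by a full segment cannot step past opr_sz_n.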