- target/i386: csv: Support inject secret for CSV3 guest only if the extension is enabled
- target/i386: csv: Support load kernel hashes for CSV3 guest only if the extension is enabled
- target/i386: csv: Request to set private memory of CSV3 guest if the extension is enabled
- target/i386: kvm: Support to get and enable extensions for Hygon CoCo guest
- qapi/qom,target/i386: csv-guest: Introduce secret-header-file=str and secret-file=str options
- bakcend: VirtCCA:resolve hugepage memory waste issue in vhost-user scenario
- parallels: fix ext_off assertion failure due to overflow
- backends/cryptodev-vhost-user: Fix local_error leaks
- hw/usb/hcd-ehci: Fix debug printf format string
- target/riscv/vector_helper.c: fix 'vmvr_v' memcpy endianess
- target/riscv/vector_helper.c: optimize loops in ldst helpers
- target/riscv/vector_helper.c: set vstart = 0 in GEN_VEXT_VSLIDEUP_VX()
- target/hexagon: don't look for static glib
- virtio-net: Fix network stall at the host side waiting for kick
- Add if condition to avoid assertion failed error in blockdev_init
- target/arm: Use float_status copy in sme_fmopa_s
- target/arm: take HSTR traps of cp15 accesses to EL2, not EL1
- target/arm: Reinstate "vfp" property on AArch32 CPUs
- target/i386/cpu: Fix notes for CPU models
- target/arm: LDAPR should honour SCTLR_ELx.nAA
- target/riscv: Avoid bad shift in riscv_cpu_do_interrupt()
- hvf: remove unused but set variable
- hw/misc/nrf51_rng: Don't use BIT_MASK() when we mean BIT()
- Avoid taking address of out-of-bounds array index
- target/arm: Fix VCMLA Dd, Dn, Dm[idx]
- target/arm: Fix UMOPA/UMOPS of 16-bit values
- target/arm: Fix SVE/SME gross MTE suppression checks
- target/arm: Fix nregs computation in do_{ld,st}_zpa
- crypto: fix error check on gcry_md_open
- Change vmstate_cpuhp_sts vmstateDescription version_id
- hw/pci: Remove unused pci_irq_pulse() method
- hw/intc: Don't clear pending bits on IRQ lowering
- target/arm: Drop user-only special case in sve_stN_r
- migration: Ensure vmstate_save() sets errp
- target/i386: fix hang when using slow path for ptw_setl
- contrib/plugins: add compat for g_memdup2
- hw/audio/hda: fix memory leak on audio setup
- crypto: perform runtime check for hash/hmac support in gcrypt
- target/arm: Fix incorrect aa64_tidcp1 feature check
- target/arm: fix exception syndrome for AArch32 bkpt insn
- target/arm: Don't get MDCR_EL2 in pmu_counter_enabled() before checking ARM_FEATURE_PMU
- linux-user: Print tid not pid with strace
- target/arm: Fix A64 scalar SQSHRN and SQRSHRN
- target/arm: Don't assert for 128-bit tile accesses when SVL is 128
- hw/timer/exynos4210_mct: fix possible int overflow
- target/arm: Avoid shifts by -1 in tszimm_shr() and tszimm_shl()
- hw/audio/virtio-snd: Always use little endian audio format
- target/riscv: Fix vcompress with rvv_ta_all_1s
- usb-hub: Fix handling port power control messages
Signed-off-by: Jiabo Feng <fengjiabo1@huawei.com>
(cherry picked from commit d4a20b24ff377fd07fcbf2b72eecaf07a3ac4cc0)
340 lines
12 KiB
Diff
340 lines
12 KiB
Diff
From a481451a811877640a57ccbef2b33b39567f2802 Mon Sep 17 00:00:00 2001
|
|
From: thomas <east.moutain.yang@gmail.com>
|
|
Date: Fri, 12 Jul 2024 11:10:53 +0800
|
|
Subject: [PATCH] virtio-net: Fix network stall at the host side waiting for
|
|
kick
|
|
|
|
commit f937309fbdbb48c354220a3e7110c202ae4aa7fa upstream.
|
|
|
|
Patch 06b12970174 ("virtio-net: fix network stall under load")
|
|
added double-check to test whether the available buffer size
|
|
can satisfy the request or not, in case the guest has added
|
|
some buffers to the avail ring simultaneously after the first
|
|
check. It will be lucky if the available buffer size becomes
|
|
okay after the double-check, then the host can send the packet
|
|
to the guest. If the buffer size still can't satisfy the request,
|
|
even if the guest has added some buffers, viritio-net would
|
|
stall at the host side forever.
|
|
|
|
The patch enables notification and checks whether the guest has
|
|
added some buffers since last check of available buffers when
|
|
the available buffers are insufficient. If no buffer is added,
|
|
return false, else recheck the available buffers in the loop.
|
|
If the available buffers are sufficient, disable notification
|
|
and return true.
|
|
|
|
Changes:
|
|
1. Change the return type of virtqueue_get_avail_bytes() from void
|
|
to int, it returns an opaque that represents the shadow_avail_idx
|
|
of the virtqueue on success, else -1 on error.
|
|
2. Add a new API: virtio_queue_enable_notification_and_check(),
|
|
it takes an opaque as input arg which is returned from
|
|
virtqueue_get_avail_bytes(). It enables notification firstly,
|
|
then checks whether the guest has added some buffers since
|
|
last check of available buffers or not by virtio_queue_poll(),
|
|
return ture if yes.
|
|
|
|
The patch also reverts patch "06b12970174".
|
|
|
|
The case below can reproduce the stall.
|
|
|
|
Guest 0
|
|
+--------+
|
|
| iperf |
|
|
---------------> | server |
|
|
Host | +--------+
|
|
+--------+ | ...
|
|
| iperf |----
|
|
| client |---- Guest n
|
|
+--------+ | +--------+
|
|
| | iperf |
|
|
---------------> | server |
|
|
+--------+
|
|
|
|
Boot many guests from qemu with virtio network:
|
|
qemu ... -netdev tap,id=net_x \
|
|
-device virtio-net-pci-non-transitional,\
|
|
iommu_platform=on,mac=xx:xx:xx:xx:xx:xx,netdev=net_x
|
|
|
|
Each guest acts as iperf server with commands below:
|
|
iperf3 -s -D -i 10 -p 8001
|
|
iperf3 -s -D -i 10 -p 8002
|
|
|
|
The host as iperf client:
|
|
iperf3 -c guest_IP -p 8001 -i 30 -w 256k -P 20 -t 40000
|
|
iperf3 -c guest_IP -p 8002 -i 30 -w 256k -P 20 -t 40000
|
|
|
|
After some time, the host loses connection to the guest,
|
|
the guest can send packet to the host, but can't receive
|
|
packet from the host.
|
|
|
|
It's more likely to happen if SWIOTLB is enabled in the guest,
|
|
allocating and freeing bounce buffer takes some CPU ticks,
|
|
copying from/to bounce buffer takes more CPU ticks, compared
|
|
with that there is no bounce buffer in the guest.
|
|
Once the rate of producing packets from the host approximates
|
|
the rate of receiveing packets in the guest, the guest would
|
|
loop in NAPI.
|
|
|
|
receive packets ---
|
|
| |
|
|
v |
|
|
free buf virtnet_poll
|
|
| |
|
|
v |
|
|
add buf to avail ring ---
|
|
|
|
|
| need kick the host?
|
|
| NAPI continues
|
|
v
|
|
receive packets ---
|
|
| |
|
|
v |
|
|
free buf virtnet_poll
|
|
| |
|
|
v |
|
|
add buf to avail ring ---
|
|
|
|
|
v
|
|
... ...
|
|
|
|
On the other hand, the host fetches free buf from avail
|
|
ring, if the buf in the avail ring is not enough, the
|
|
host notifies the guest the event by writing the avail
|
|
idx read from avail ring to the event idx of used ring,
|
|
then the host goes to sleep, waiting for the kick signal
|
|
from the guest.
|
|
|
|
Once the guest finds the host is waiting for kick singal
|
|
(in virtqueue_kick_prepare_split()), it kicks the host.
|
|
|
|
The host may stall forever at the sequences below:
|
|
|
|
Host Guest
|
|
------------ -----------
|
|
fetch buf, send packet receive packet ---
|
|
... ... |
|
|
fetch buf, send packet add buf |
|
|
... add buf virtnet_poll
|
|
buf not enough avail idx-> add buf |
|
|
read avail idx add buf |
|
|
add buf ---
|
|
receive packet ---
|
|
write event idx ... |
|
|
wait for kick add buf virtnet_poll
|
|
... |
|
|
---
|
|
no more packet, exit NAPI
|
|
|
|
In the first loop of NAPI above, indicated in the range of
|
|
virtnet_poll above, the host is sending packets while the
|
|
guest is receiving packets and adding buffers.
|
|
step 1: The buf is not enough, for example, a big packet
|
|
needs 5 buf, but the available buf count is 3.
|
|
The host read current avail idx.
|
|
step 2: The guest adds some buf, then checks whether the
|
|
host is waiting for kick signal, not at this time.
|
|
The used ring is not empty, the guest continues
|
|
the second loop of NAPI.
|
|
step 3: The host writes the avail idx read from avail
|
|
ring to used ring as event idx via
|
|
virtio_queue_set_notification(q->rx_vq, 1).
|
|
step 4: At the end of the second loop of NAPI, recheck
|
|
whether kick is needed, as the event idx in the
|
|
used ring written by the host is beyound the
|
|
range of kick condition, the guest will not
|
|
send kick signal to the host.
|
|
|
|
Fixes: 06b12970174 ("virtio-net: fix network stall under load")
|
|
Cc: qemu-stable@nongnu.org
|
|
Signed-off-by: Wencheng Yang <east.moutain.yang@gmail.com>
|
|
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
|
|
Signed-off-by: Jason Wang <jasowang@redhat.com>
|
|
---
|
|
hw/net/virtio-net.c | 28 ++++++++++-------
|
|
hw/virtio/virtio.c | 64 +++++++++++++++++++++++++++++++++++---
|
|
include/hw/virtio/virtio.h | 19 +++++++++--
|
|
3 files changed, 92 insertions(+), 19 deletions(-)
|
|
|
|
diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
|
|
index c9c83fe297..7184c9c526 100644
|
|
--- a/hw/net/virtio-net.c
|
|
+++ b/hw/net/virtio-net.c
|
|
@@ -1662,24 +1662,28 @@ static bool virtio_net_can_receive(NetClientState *nc)
|
|
|
|
static int virtio_net_has_buffers(VirtIONetQueue *q, int bufsize)
|
|
{
|
|
+ int opaque;
|
|
+ unsigned int in_bytes;
|
|
VirtIONet *n = q->n;
|
|
- if (virtio_queue_empty(q->rx_vq) ||
|
|
- (n->mergeable_rx_bufs &&
|
|
- !virtqueue_avail_bytes(q->rx_vq, bufsize, 0))) {
|
|
- virtio_queue_set_notification(q->rx_vq, 1);
|
|
-
|
|
- /* To avoid a race condition where the guest has made some buffers
|
|
- * available after the above check but before notification was
|
|
- * enabled, check for available buffers again.
|
|
- */
|
|
- if (virtio_queue_empty(q->rx_vq) ||
|
|
- (n->mergeable_rx_bufs &&
|
|
- !virtqueue_avail_bytes(q->rx_vq, bufsize, 0))) {
|
|
+
|
|
+ while (virtio_queue_empty(q->rx_vq) || n->mergeable_rx_bufs) {
|
|
+ opaque = virtqueue_get_avail_bytes(q->rx_vq, &in_bytes, NULL,
|
|
+ bufsize, 0);
|
|
+ /* Buffer is enough, disable notifiaction */
|
|
+ if (bufsize <= in_bytes) {
|
|
+ break;
|
|
+ }
|
|
+
|
|
+ if (virtio_queue_enable_notification_and_check(q->rx_vq, opaque)) {
|
|
+ /* Guest has added some buffers, try again */
|
|
+ continue;
|
|
+ } else {
|
|
return 0;
|
|
}
|
|
}
|
|
|
|
virtio_queue_set_notification(q->rx_vq, 0);
|
|
+
|
|
return 1;
|
|
}
|
|
|
|
diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
|
|
index 8c3b6b87aa..4f5b241fd3 100644
|
|
--- a/hw/virtio/virtio.c
|
|
+++ b/hw/virtio/virtio.c
|
|
@@ -743,6 +743,60 @@ int virtio_queue_empty(VirtQueue *vq)
|
|
}
|
|
}
|
|
|
|
+static bool virtio_queue_split_poll(VirtQueue *vq, unsigned shadow_idx)
|
|
+{
|
|
+ if (unlikely(!vq->vring.avail)) {
|
|
+ return false;
|
|
+ }
|
|
+
|
|
+ return (uint16_t)shadow_idx != vring_avail_idx(vq);
|
|
+}
|
|
+
|
|
+static bool virtio_queue_packed_poll(VirtQueue *vq, unsigned shadow_idx)
|
|
+{
|
|
+ VRingPackedDesc desc;
|
|
+ VRingMemoryRegionCaches *caches;
|
|
+
|
|
+ if (unlikely(!vq->vring.desc)) {
|
|
+ return false;
|
|
+ }
|
|
+
|
|
+ caches = vring_get_region_caches(vq);
|
|
+ if (!caches) {
|
|
+ return false;
|
|
+ }
|
|
+
|
|
+ vring_packed_desc_read(vq->vdev, &desc, &caches->desc,
|
|
+ shadow_idx, true);
|
|
+
|
|
+ return is_desc_avail(desc.flags, vq->shadow_avail_wrap_counter);
|
|
+}
|
|
+
|
|
+static bool virtio_queue_poll(VirtQueue *vq, unsigned shadow_idx)
|
|
+{
|
|
+ if (virtio_device_disabled(vq->vdev)) {
|
|
+ return false;
|
|
+ }
|
|
+
|
|
+ if (virtio_vdev_has_feature(vq->vdev, VIRTIO_F_RING_PACKED)) {
|
|
+ return virtio_queue_packed_poll(vq, shadow_idx);
|
|
+ } else {
|
|
+ return virtio_queue_split_poll(vq, shadow_idx);
|
|
+ }
|
|
+}
|
|
+
|
|
+bool virtio_queue_enable_notification_and_check(VirtQueue *vq,
|
|
+ int opaque)
|
|
+{
|
|
+ virtio_queue_set_notification(vq, 1);
|
|
+
|
|
+ if (opaque >= 0) {
|
|
+ return virtio_queue_poll(vq, (unsigned)opaque);
|
|
+ } else {
|
|
+ return false;
|
|
+ }
|
|
+}
|
|
+
|
|
static void virtqueue_unmap_sg(VirtQueue *vq, const VirtQueueElement *elem,
|
|
unsigned int len)
|
|
{
|
|
@@ -1322,9 +1376,9 @@ err:
|
|
goto done;
|
|
}
|
|
|
|
-void virtqueue_get_avail_bytes(VirtQueue *vq, unsigned int *in_bytes,
|
|
- unsigned int *out_bytes,
|
|
- unsigned max_in_bytes, unsigned max_out_bytes)
|
|
+int virtqueue_get_avail_bytes(VirtQueue *vq, unsigned int *in_bytes,
|
|
+ unsigned int *out_bytes, unsigned max_in_bytes,
|
|
+ unsigned max_out_bytes)
|
|
{
|
|
uint16_t desc_size;
|
|
VRingMemoryRegionCaches *caches;
|
|
@@ -1357,7 +1411,7 @@ void virtqueue_get_avail_bytes(VirtQueue *vq, unsigned int *in_bytes,
|
|
caches);
|
|
}
|
|
|
|
- return;
|
|
+ return (int)vq->shadow_avail_idx;
|
|
err:
|
|
if (in_bytes) {
|
|
*in_bytes = 0;
|
|
@@ -1365,6 +1419,8 @@ err:
|
|
if (out_bytes) {
|
|
*out_bytes = 0;
|
|
}
|
|
+
|
|
+ return -1;
|
|
}
|
|
|
|
int virtqueue_avail_bytes(VirtQueue *vq, unsigned int in_bytes,
|
|
diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
|
|
index 60494aed62..78db2bde98 100644
|
|
--- a/include/hw/virtio/virtio.h
|
|
+++ b/include/hw/virtio/virtio.h
|
|
@@ -273,9 +273,13 @@ void qemu_put_virtqueue_element(VirtIODevice *vdev, QEMUFile *f,
|
|
VirtQueueElement *elem);
|
|
int virtqueue_avail_bytes(VirtQueue *vq, unsigned int in_bytes,
|
|
unsigned int out_bytes);
|
|
-void virtqueue_get_avail_bytes(VirtQueue *vq, unsigned int *in_bytes,
|
|
- unsigned int *out_bytes,
|
|
- unsigned max_in_bytes, unsigned max_out_bytes);
|
|
+/**
|
|
+ * Return <0 on error or an opaque >=0 to pass to
|
|
+ * virtio_queue_enable_notification_and_check on success.
|
|
+ */
|
|
+int virtqueue_get_avail_bytes(VirtQueue *vq, unsigned int *in_bytes,
|
|
+ unsigned int *out_bytes, unsigned max_in_bytes,
|
|
+ unsigned max_out_bytes);
|
|
|
|
void virtio_notify_irqfd(VirtIODevice *vdev, VirtQueue *vq);
|
|
void virtio_notify(VirtIODevice *vdev, VirtQueue *vq);
|
|
@@ -309,6 +313,15 @@ int virtio_queue_ready(VirtQueue *vq);
|
|
|
|
int virtio_queue_empty(VirtQueue *vq);
|
|
|
|
+/**
|
|
+ * Enable notification and check whether guest has added some
|
|
+ * buffers since last call to virtqueue_get_avail_bytes.
|
|
+ *
|
|
+ * @opaque: value returned from virtqueue_get_avail_bytes
|
|
+ */
|
|
+bool virtio_queue_enable_notification_and_check(VirtQueue *vq,
|
|
+ int opaque);
|
|
+
|
|
/* Host binding interface. */
|
|
|
|
uint32_t virtio_config_readb(VirtIODevice *vdev, uint32_t addr);
|
|
--
|
|
2.41.0.windows.1
|
|
|