block-nbd was refactored during the 6.2.0 release, but not all of the required patches were included in the 6.2.0 baseline, which leads to VM crashes during migration. The reason is as follows: when an iothread is configured, coroutines should return to the exact iothread they came from. In the 6.2.0 baseline, however, patches were missing, so the NBD-related coroutine did not have its associated aio_context. It ended up in the main aio_context instead, and this wrong context leads to the VM crash.
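The context mix-up described above can be pictured with a small toy model. This is only an illustrative sketch, not QEMU code: `ToyContext`, `ToyTask`, `toy_resume` and `toy_resumed_in_expected` are hypothetical names invented for this example, and the fallback to the main context is an assumption that mirrors the behaviour the description reports.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for an AioContext: a name and a count of tasks resumed in it. */
typedef struct ToyContext {
    const char *name;
    int resumed;
} ToyContext;

/* Toy stand-in for a coroutine that remembers the context it yielded from. */
typedef struct ToyTask {
    ToyContext *home;   /* context the task left when it yielded; may be NULL */
    ToyContext *ran_in; /* context the task was actually resumed in */
} ToyTask;

/*
 * Resume a task. If its home context was recorded, it goes back there.
 * If it was not recorded (assumed here to model the missing patches),
 * the task falls back to the main context.
 */
static ToyContext *toy_resume(ToyTask *task, ToyContext *main_ctx)
{
    ToyContext *target = task->home ? task->home : main_ctx;

    target->resumed++;
    task->ran_in = target;
    return target;
}

/* Returns 1 when the task was resumed in the context it was expected in. */
static int toy_resumed_in_expected(const ToyTask *task,
                                   const ToyContext *expected)
{
    return task->ran_in == expected;
}
```

In this model, a task created with `home` set to the iothread's context is resumed there, while a task whose `home` was never set lands in the main context, which is the situation the description blames for the crash.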
From a4d001a08ce1279b121cb870c378ddeb0825f2bc Mon Sep 17 00:00:00 2001
From: Zhang Bo <oscar.zhangbo@huawei.com>
Date: Mon, 29 Aug 2022 15:34:07 +0800
Subject: [PATCH 2/5] block/nbd: Delete reconnect delay timer when done

We start the reconnect delay timer to cancel the reconnection attempt
after a while. Once nbd_co_do_establish_connection() has returned, this
attempt is over, and we no longer need the timer.

Delete it before returning from nbd_reconnect_attempt(), so that it does
not persist beyond the I/O request that was paused for reconnecting; we
do not want it to fire in a drained section, because all sorts of things
can happen in such a section (e.g. the AioContext might be changed, and
we do not want the timer to fire in the wrong context; or the BDS might
even be deleted, and so the timer CB would access already-freed data).

Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Hanna Reitz <hreitz@redhat.com>
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Zhang Bo <oscar.zhangbo@huawei.com>
---
 block/nbd.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/block/nbd.c b/block/nbd.c
index 63dbfa807d..16cd7fef77 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -381,6 +381,13 @@ static coroutine_fn void nbd_reconnect_attempt(BDRVNBDState *s)
     }
 
     nbd_co_do_establish_connection(s->bs, NULL);
+
+    /*
+     * The reconnect attempt is done (maybe successfully, maybe not), so
+     * we no longer need this timer. Delete it so it will not outlive
+     * this I/O request (so draining removes all timers).
+     */
+    reconnect_delay_timer_del(s);
 }
 
 static coroutine_fn int nbd_receive_replies(BDRVNBDState *s, uint64_t handle)
-- 
2.27.0
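
The timer lifetime rule from the commit message can be sketched as a toy model. This is a hypothetical illustration only, not the code in block/nbd.c: `ToyNbdState` and all `toy_*` functions are made-up names, and the timer is reduced to a single flag.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy state: tracks only whether the delay timer is armed and the outcome. */
typedef struct ToyNbdState {
    bool timer_armed;
    bool connected;
} ToyNbdState;

static void toy_timer_arm(ToyNbdState *s)
{
    s->timer_armed = true;
}

static void toy_timer_del(ToyNbdState *s)
{
    s->timer_armed = false;
}

/* Placeholder for the connection attempt; the caller supplies the outcome. */
static void toy_establish_connection(ToyNbdState *s, bool will_succeed)
{
    s->connected = will_succeed;
}

/*
 * One reconnect attempt: arm the timer, try to connect, then delete the
 * timer whether or not the attempt succeeded, so it cannot outlive the
 * request that triggered the attempt.
 */
static void toy_reconnect_attempt(ToyNbdState *s, bool will_succeed)
{
    toy_timer_arm(s);
    toy_establish_connection(s, will_succeed);
    toy_timer_del(s);
}

/* In this model a drained section is safe only if no timer is pending. */
static bool toy_safe_to_drain(const ToyNbdState *s)
{
    return !s->timer_armed;
}
```

Calling `toy_reconnect_attempt()` with either outcome leaves `toy_safe_to_drain()` true, which is the property the patch aims for: no reconnect delay timer is left pending once the attempt has returned.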