71 lines
2.6 KiB
Diff
71 lines
2.6 KiB
Diff
From 99026ec6a2735c6ff2f094ac247f558f14e3f3b9 Mon Sep 17 00:00:00 2001
|
|
From: Ying Fang <fangying1@huawei.com>
|
|
Date: Thu, 2 Apr 2020 15:53:47 +0800
|
|
Subject: [PATCH] util/async: Add memory barrier to aio_ctx_prepare
|
|
|
|
Qemu main thread is found to hang up in the mainloop when doing
|
|
image format convert on aarch64 platform and it is highly
|
|
reproduceable by executing test using:
|
|
|
|
qemu-img convert -f qcow2 -O qcow2 origin.qcow2 converted.qcow2
|
|
|
|
This mysterious hang can be explained by a race condition between
|
|
the main thread and an io worker thread. There can be a chance that
|
|
the last worker thread has called aio_bh_schedule_oneshot and it is
|
|
checking against notify_me to deliver a notfiy event. At the same
|
|
time, the main thread is calling aio_ctx_prepare however it first
|
|
calls qemu_timeout_ns_to_ms, thus the worker thread did not see
|
|
notify_me as true and did not send a notify event. The time line
|
|
can be shown in the following way:
|
|
|
|
Main Thread
|
|
------------------------------------------------
|
|
aio_ctx_prepare
|
|
atomic_or(&ctx->notify_me, 1);
|
|
/* out of order execution goes here */
|
|
*timeout = qemu_timeout_ns_to_ms(aio_compute_timeout(ctx));
|
|
|
|
Worker Thread
|
|
-----------------------------------------------
|
|
aio_bh_schedule_oneshot -> aio_bh_enqueue
|
|
aio_notify
|
|
smp_mb();
|
|
if (ctx->notify_me) { /* worker thread checks notify_me here */
|
|
event_notifier_set(&ctx->notifier);
|
|
atomic_mb_set(&ctx->notified, true);
|
|
}
|
|
|
|
Normal VM runtime is not affected by this hang since there is always some
|
|
timer timeout or subsequent io worker come and notify the main thead.
|
|
To fix this problem, a memory barrier is added to aio_ctx_prepare and
|
|
it is proved to have the hang fixed in our test.
|
|
|
|
This hang is not observed on the x86 platform however it can be easily
|
|
reproduced on the aarch64 platform, thus it is architecture related.
|
|
Not sure if this is revelant to Commit eabc977973103527bbb8fed69c91cfaa6691f8ab
|
|
|
|
Signed-off-by: Ying Fang <fangying1@huawei.com>
|
|
Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
|
|
Reported-by: Euler Robot <euler.robot@huawei.com>
|
|
---
|
|
util/async.c | 3 ++-
|
|
1 file changed, 2 insertions(+), 1 deletion(-)
|
|
|
|
diff --git a/util/async.c b/util/async.c
|
|
index afc17fb3..50dfa9ce 100644
|
|
--- a/util/async.c
|
|
+++ b/util/async.c
|
|
@@ -222,7 +222,8 @@ aio_ctx_prepare(GSource *source, gint *timeout)
|
|
AioContext *ctx = (AioContext *) source;
|
|
|
|
atomic_or(&ctx->notify_me, 1);
|
|
-
|
|
+ /* Make sure notify_me is set before aio_compute_timeout */
|
|
+ smp_mb();
|
|
/* We assume there is no timeout already supplied */
|
|
*timeout = qemu_timeout_ns_to_ms(aio_compute_timeout(ctx));
|
|
|
|
--
|
|
2.23.0
|
|
|