This series is an attempt to provide device memory hotplug support
on the ARM virt platform. It is based on Eric's recent work here[1]
and carries some of the pc-dimm related patches dropped from his
series.
Kernel support for arm64 memory hot-add was added recently by
Robin, so the guest kernel should be >= 5.0-rc1.
NVDIMM support is not included currently as we still have an unresolved
issue while hot adding NVDIMM[2]. The NVDIMM cold plug patches could
be included, but are left out for now to keep the series simple.
This makes use of the GED device to send hotplug ACPI events to the
guest. The GED code is based on Nemu. Thanks to Samuel and Sebastien
for their efforts in adding hardware-reduced support to Nemu using the
GED device[3]. (Please shout if I got the author/signed-off wrong for
those patches or missed any names).
This is sanity tested on a HiSilicon ARM64 platform; any further
testing would be appreciated.
Note:
Attempted to add a dimm_pxm test case to bios-tables-test for arm/virt,
but noticed the issue described here[5]. This is under investigation
now.
upstream url: https://patchwork.kernel.org/cover/11150345/
Signed-off-by: Keqian Zhu <zhukeqian1@huawei.com>
v3:
- Added a target/arm/kvm_arm.h comment cleanup patch (1/6)
- Minor refactoring of assert_has_feature_enabled/disabled in 4/6,
kept Richard's r-b.
- Rewrote kvm-no-adjvtime documentation in 6/6.
- Reworked approach in 5/6 to properly deal with migration and to
track running vs. !running, rather than running vs. paused states.
v2:
- Reworked it enough that I brought back the RFC tag and retitled the
series. Also had to drop r-b's from a couple of patches, and even
drop patches.
- Changed approach from writing the QEMU virtual time to the guest
vtime counter to saving and restoring the guest vtime counter.
- Changed the kvm-adjvtime property, which was off by default, to a
kvm-no-adjvtime property, which is also off by default, meaning the
effective "adjust vtime" property is now on by default (but only
for 5.0 virt machine types and later)
v1:
- move from RFC status to v1
- put kvm_arm_vm_state_change() in kvm.c to share among kvm32.c and kvm64.c
- add r-b's from Richard
This series is inspired by a series[1] posted by Bijan Mottahedeh over
a year ago and by the patch[2] posted by Heyi Guo almost a year ago.
The problem described in the cover letter of [1] is easily reproducible
and some users would like to have the option to avoid it. However the
solution, which is to adjust the virtual counter each time the VM
transitions to the running state, introduces a different problem, which
is that the virtual and physical counters diverge. As described in the
cover letter of [1] this divergence is easily observed when comparing
the output of `date` and `hwclock` after suspending the guest, waiting
a while, and then resuming it. Because this different problem may actually
be worse for some users, unlike [1], the series posted here makes the
virtual counter adjustment optional. Besides the adjustment being
optional, this series approaches the needed changes differently to apply
them in more appropriate locations.
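To illustrate the trade-off, here is a minimal simulation of saving the virtual counter when the VM stops and restoring it when the VM runs again. It is not QEMU code: the struct, the sim_* names and the adjust_vtime flag are hypothetical, and the counters are plain integers rather than real KVM registers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of one guest; not a QEMU or KVM structure. */
struct sim_guest {
    uint64_t phys_cnt;    /* models the physical counter, always ticking */
    uint64_t virt_offset; /* virtual counter = phys_cnt - virt_offset */
    uint64_t saved_vcnt;  /* virtual counter value captured at stop */
    bool running;
    bool adjust_vtime;    /* models "adjust vtime" being enabled */
};

uint64_t sim_virt_cnt(const struct sim_guest *g)
{
    return g->phys_cnt - g->virt_offset;
}

/* Host time passes whether or not the guest is running. */
void sim_tick(struct sim_guest *g, uint64_t ticks)
{
    g->phys_cnt += ticks;
}

/* Model of a VM state change handler (running vs. !running). */
void sim_set_running(struct sim_guest *g, bool running)
{
    if (g->running == running) {
        return;
    }
    if (!running) {
        /* Save the virtual counter as the guest last saw it. */
        g->saved_vcnt = sim_virt_cnt(g);
    } else if (g->adjust_vtime) {
        /* Restore it, so the guest sees no jump. */
        g->virt_offset = g->phys_cnt - g->saved_vcnt;
    }
    g->running = running;
}

/* Stop at 100 ticks, wait 1000 ticks, resume. */
void sim_demo(void)
{
    struct sim_guest g = { .running = true, .adjust_vtime = true };

    sim_tick(&g, 100);
    sim_set_running(&g, false);
    sim_tick(&g, 1000);
    sim_set_running(&g, true);

    assert(sim_virt_cnt(&g) == 100); /* no jump for the guest */
    assert(g.phys_cnt == 1100);      /* but the counters have diverged */
}
```

With adjust_vtime cleared in the same scenario, both counters read 1100 after the resume: the counters agree, but the guest observes the time jump.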
Additional notes
----------------
Note 1
------
As described above, when running a guest with kvm-no-adjvtime disabled,
it is less likely that the guest OS and guest applications will see
surprising time jumps when they use the virtual counter. However, the
counter will no longer reflect real time; it will lag behind. If this is a problem
then the guest can resynchronize its time from an external source or
even from its physical counter. If the suspend/resume is done with
libvirt's virsh, and the guest is running the guest agent, then it's
also possible to use a sequence like this
$ virsh suspend $GUEST
$ virsh resume $GUEST
$ virsh domtime --sync $GUEST
in order to resynchronize a guest right after the resume. Of course
there will still be time when the clock is not right, possibly creating
confusing timestamps in logs, for example, and the guest must still be
tolerant to the time synchronizations.
Note 2
------
Userspace that wants to set KVM_REG_ARM_TIMER_CNT should beware that
the KVM register ID does not match the register's architectural
encoding. This cannot be fixed because it's UAPI, and as long as the
UAPI headers are used it is not a problem. However, if userspace
attempts to create the ID itself from the register's specification,
then it will get KVM_REG_ARM_TIMER_CVAL instead, as the _CNT and _CVAL
definitions have their register parameters swapped.
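The swap can be seen by building the IDs by hand. The sketch below is self-contained and does not include the kernel headers. The constants, field shifts and op0/op1/CRn/CRm/op2 values are reproduced from memory of the arm64 KVM UAPI header and the architectural encodings of CNTVCT_EL0 and CNTV_CVAL_EL0, so treat them as assumptions and check them against your own headers. All sk_* names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values of KVM_REG_ARM64, KVM_REG_SIZE_U64, KVM_REG_ARM64_SYSREG. */
#define SK_REG_ARM64        0x6000000000000000ULL
#define SK_REG_SIZE_U64     0x0030000000000000ULL
#define SK_REG_ARM64_SYSREG (0x0013ULL << 16)

/* Build a 64-bit system register ID from its encoding fields. */
uint64_t sk_sys_reg(unsigned op0, unsigned op1, unsigned crn,
                    unsigned crm, unsigned op2)
{
    return SK_REG_ARM64 | SK_REG_SIZE_U64 | SK_REG_ARM64_SYSREG |
           ((uint64_t)(op0 & 0x3) << 14) |
           ((uint64_t)(op1 & 0x7) << 11) |
           ((uint64_t)(crn & 0xf) << 7) |
           ((uint64_t)(crm & 0xf) << 3) |
           ((uint64_t)(op2 & 0x7) << 0);
}

/* Fields as the UAPI header is assumed to define the two timer IDs. */
uint64_t sk_uapi_timer_cval(void) { return sk_sys_reg(3, 3, 14, 0, 2); }
uint64_t sk_uapi_timer_cnt(void)  { return sk_sys_reg(3, 3, 14, 3, 2); }

/* Fields of the architectural registers (assumed from the Arm ARM). */
uint64_t sk_arch_cntvct(void)    { return sk_sys_reg(3, 3, 14, 0, 2); }
uint64_t sk_arch_cntv_cval(void) { return sk_sys_reg(3, 3, 14, 3, 2); }

void sk_reg_demo(void)
{
    /* An ID built from the CNTVCT_EL0 encoding equals the UAPI _CVAL ID. */
    assert(sk_arch_cntvct() == sk_uapi_timer_cval());
    /* It does not equal the UAPI _CNT ID, which is what must be used. */
    assert(sk_arch_cntvct() != sk_uapi_timer_cnt());
}
```

In real code the safe choice is to use KVM_REG_ARM_TIMER_CNT from the UAPI headers directly and never derive it.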
Note 3
------
I didn't test this with a 32-bit KVM host, but the changes to kvm32.c
are the same as kvm64.c. So what could go wrong? Test results would be
appreciated.
[1] https://lists.gnu.org/archive/html/qemu-devel/2018-11/msg05713.html
[2] https://lists.gnu.org/archive/html/qemu-devel/2019-03/msg03695.html
upstream url:
https://patchwork.kernel.org/cover/11341629/
The QEMU main thread is found to hang in the main loop when converting
an image format on the aarch64 platform, and the hang is highly
reproducible by running:
qemu-img convert -f qcow2 -O qcow2 origin.qcow2 converted.qcow2
This mysterious hang can be explained by a race condition between
the main thread and an io worker thread. There is a chance that
the last worker thread has called aio_bh_schedule_oneshot and is
checking notify_me to decide whether to deliver a notify event. At the
same time, the main thread is in aio_ctx_prepare, but because of
out-of-order execution it effectively calls qemu_timeout_ns_to_ms before
its update of notify_me is visible, so the worker thread does not see
notify_me as true and does not send a notify event. The timeline
can be shown in the following way:
    Main Thread
    ------------------------------------------------
    aio_ctx_prepare
        atomic_or(&ctx->notify_me, 1);
        /* out of order execution goes here */
        *timeout = qemu_timeout_ns_to_ms(aio_compute_timeout(ctx));
    Worker Thread
    -----------------------------------------------
    aio_bh_schedule_oneshot -> aio_bh_enqueue
        aio_notify
            smp_mb();
            if (ctx->notify_me) { /* worker thread checks notify_me here */
                event_notifier_set(&ctx->notifier);
                atomic_mb_set(&ctx->notified, true);
            }
Normal VM runtime is not affected by this hang since there is always some
timer timeout or a subsequent io worker that comes and notifies the main
thread. To fix this problem, a memory barrier is added to aio_ctx_prepare,
and it is shown to fix the hang in our tests.
This hang is not observed on the x86 platform, but it can be easily
reproduced on the aarch64 platform, so it is architecture related.
Not sure if this is relevant to commit eabc977973103527bbb8fed69c91cfaa6691f8ab.
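The ordering requirement behind the fix can be sketched with C11 atomics. This is not the QEMU patch itself: struct sk_ctx and the sk_* functions are hypothetical stand-ins, and bh_scheduled is an assumed simplification of the bottom-half list that aio_compute_timeout inspects. Each side stores its own flag, issues a full barrier, and only then loads the other side's flag, so at least one of the two threads is guaranteed to see the other's store.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the AioContext fields involved in the race. */
struct sk_ctx {
    atomic_int notify_me;     /* set by the main thread before it may block */
    atomic_bool bh_scheduled; /* set by a worker when it queues work */
    atomic_bool notified;     /* set by a worker that wakes the main thread */
};

void sk_init(struct sk_ctx *ctx)
{
    atomic_init(&ctx->notify_me, 0);
    atomic_init(&ctx->bh_scheduled, false);
    atomic_init(&ctx->notified, false);
}

/* Main thread side: returns true if no work is pending and it may block. */
bool sk_prepare(struct sk_ctx *ctx)
{
    atomic_fetch_or_explicit(&ctx->notify_me, 1, memory_order_relaxed);
    /*
     * Full barrier: the store to notify_me must be ordered before the
     * load of bh_scheduled. Without it the load may be satisfied first.
     */
    atomic_thread_fence(memory_order_seq_cst);
    return !atomic_load_explicit(&ctx->bh_scheduled, memory_order_relaxed);
}

/* Worker side: queue work, then notify only if the main thread asked. */
void sk_schedule(struct sk_ctx *ctx)
{
    atomic_store_explicit(&ctx->bh_scheduled, true, memory_order_relaxed);
    /* Pairs with the fence in sk_prepare(). */
    atomic_thread_fence(memory_order_seq_cst);
    if (atomic_load_explicit(&ctx->notify_me, memory_order_relaxed)) {
        atomic_store_explicit(&ctx->notified, true, memory_order_seq_cst);
    }
}
```

If sk_prepare() runs first, the worker sees notify_me and sets notified. If sk_schedule() runs first, sk_prepare() sees bh_scheduled and does not block. The hang corresponds to the case where neither happens, which the paired barriers rule out.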
Signed-off-by: Ying Fang <fangying1@huawei.com>
Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Reported-by: Euler Robot <euler.robot@huawei.com>
If a device is plugged into the pcie-root-port while the VM kernel is
booting, the kernel may wrongly disable the device.
This bug was introduced by two patches of the Linux kernel:
https://patchwork.kernel.org/patch/10575355/
https://patchwork.kernel.org/patch/10766219/
VM runtimes like kata use this feature to boot microVMs,
so we must fix it. We hack into the pcie native hotplug
patch so that hotplug/unplug will work under this circumstance.
Signed-off-by: Ying Fang <fangying1@huawei.com>
block/iscsi: use MIN() between mx_sb_len and sb_len_wr
monitor: fix memory leak in monitor_fdset_dup_fd_find_remove
Signed-off-by: Chen Qun <kuhn.chenqun@huawei.com>
Patch numbers became mismatched when QEMU was rebased from v4.0.0 to
v4.0.1; this patch fixes them.
Signed-off-by: Ying Fang <fangying1@huawei.com>
Prepare for upgrading the base package from 4.0.0 to 4.0.1.
Remove all patches that are already contained in the 4.0.1 base package.
Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
The -fno-inline option is needed by hot-patch, but hot-patch is not
supported in this version, so remove it.
Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>