qemu/target-arm-kvm-Implement-virtual-time-adjustment.patch
Ying Fang 976278e59b target/arm/kvm: Adjust virtual time
v3:
 - Added a target/arm/kvm_arm.h comment cleanup patch (1/6)
 - Minor refactoring of assert_has_feature_enabled/disabled in 4/6,
   kept Richard's r-b.
 - Rewrote kvm-no-adjvtime documentation in 6/6.
 - Reworked approach in 5/6 to properly deal with migration and to
   track running vs. !running, rather than running vs. paused states.

v2:
 - Reworked it enough that I brought back the RFC tag and retitled the
   series. Also had to drop r-b's from a couple of patches, and even
   drop patches.
 - Changed approach from writing the QEMU virtual time to the guest
   vtime counter to saving and restoring the guest vtime counter.
 - Changed the kvm-adjvtime property, which was off by default, to a
   kvm-no-adjvtime property, which is also off by default, meaning the
   effective "adjust vtime" property is now on by default (but only
   for 5.0 virt machine types and later)

v1:
 - move from RFC status to v1
 - put kvm_arm_vm_state_change() in kvm.c to share among kvm32.c and kvm64.c
 - add r-b's from Richard

This series is inspired by a series[1] posted by Bijan Mottahedeh over
a year ago and by the patch[2] posted by Heyi Guo almost a year ago.
The problem described in the cover letter of [1] is easily reproducible,
and some users would like to have the option to avoid it. However, the
solution, which is to adjust the virtual counter each time the VM
transitions to the running state, introduces a different problem: the
virtual and physical counters diverge. As described in the cover letter
of [1], this divergence is easily observed by comparing the output of
`date` and `hwclock` after suspending the guest, waiting a while, and
then resuming it. Because this second problem may actually be worse for
some users, this series, unlike [1], makes the virtual counter
adjustment optional. It also takes a different approach to the needed
changes so that they are applied in more appropriate locations.

Additional notes
----------------

Note 1
------

As described above, when a guest runs with kvm-no-adjvtime disabled
(that is, with virtual time adjustment enabled), the guest OS and guest
applications are less likely to see surprise time jumps when they use
the virtual counter.  However, the counter will no longer reflect real
time; it will lag behind.  If this is a problem, the guest can
resynchronize its time from an external source or even from its
physical counter.  If the suspend/resume is done with libvirt's virsh,
and the guest is running the guest agent, then it's also possible to use
a sequence like this:

 $ virsh suspend $GUEST
 $ virsh resume $GUEST
 $ virsh domtime --sync $GUEST

in order to resynchronize the guest right after the resume.  Of course,
there will still be a window during which the clock is wrong, possibly
creating confusing timestamps in logs, for example, and the guest must
still tolerate the time synchronizations.

Note 2
------

Userspace that wants to set KVM_REG_ARM_TIMER_CNT should beware that
the KVM register ID is not correct.  This cannot be fixed because it's
UAPI, and as long as the UAPI headers are used it is not a problem.
However, if userspace attempts to construct the ID itself from the
register's specification, it will get KVM_REG_ARM_TIMER_CVAL instead,
as the _CNT and _CVAL definitions have their register parameters
swapped.

Note 3
------

I didn't test this with a 32-bit KVM host, but the changes to kvm32.c
are the same as kvm64.c. So what could go wrong? Test results would be
appreciated.

[1] https://lists.gnu.org/archive/html/qemu-devel/2018-11/msg05713.html
[2] https://lists.gnu.org/archive/html/qemu-devel/2019-03/msg03695.html

upstream url:
https://patchwork.kernel.org/cover/11341629/
2020-06-01 09:13:39 +00:00

From 77ee224418fac859acecd9aca4d18555ced42db6 Mon Sep 17 00:00:00 2001
From: Ying Fang <fangying1@huawei.com>
Date: Tue, 21 Apr 2020 17:32:31 +0800
Subject: [PATCH 3/4] target/arm/kvm: Implement virtual time adjustment

When a VM is stopped (such as when it's paused) guest virtual time
should stop counting. Otherwise, when the VM is resumed it will
experience time jumps and its kernel may report soft lockups. Not
counting virtual time while the VM is stopped has the side effect
of making the guest's time appear to lag when compared with real
time, and even with time derived from the physical counter. For
this reason, this change, which is enabled by default, comes with
a KVM CPU feature allowing it to be disabled, restoring legacy
behavior.

This patch only provides the implementation of the virtual time
adjustment. A subsequent patch will provide the CPU property
allowing the change to be enabled and disabled.

Reported-by: Bijan Mottahedeh <bijan.mottahedeh@oracle.com>
Signed-off-by: Andrew Jones <drjones@redhat.com>
Message-id: 20200120101023.16030-6-drjones@redhat.com
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
target/arm/cpu.h | 7 ++++
target/arm/kvm.c | 92 ++++++++++++++++++++++++++++++++++++++++++++
target/arm/kvm32.c | 2 +
target/arm/kvm64.c | 2 +
target/arm/kvm_arm.h | 37 ++++++++++++++++++
target/arm/machine.c | 7 ++++
6 files changed, 147 insertions(+)
diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index 94c990cd..e19531a7 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -816,6 +816,13 @@ struct ARMCPU {
     /* KVM init features for this CPU */
     uint32_t kvm_init_features[7];

+    /* KVM CPU state */
+
+    /* KVM virtual time adjustment */
+    bool kvm_adjvtime;
+    bool kvm_vtime_dirty;
+    uint64_t kvm_vtime;
+
     /* Uniprocessor system with MP extensions */
     bool mp_is_up;
diff --git a/target/arm/kvm.c b/target/arm/kvm.c
index cc7a46df..21fb7ecd 100644
--- a/target/arm/kvm.c
+++ b/target/arm/kvm.c
@@ -336,6 +336,22 @@ static int compare_u64(const void *a, const void *b)
     return 0;
 }

+/*
+ * cpreg_values are sorted in ascending order by KVM register ID
+ * (see kvm_arm_init_cpreg_list). This allows us to cheaply find
+ * the storage for a KVM register by ID with a binary search.
+ */
+static uint64_t *kvm_arm_get_cpreg_ptr(ARMCPU *cpu, uint64_t regidx)
+{
+    uint64_t *res;
+
+    res = bsearch(&regidx, cpu->cpreg_indexes, cpu->cpreg_array_len,
+                  sizeof(uint64_t), compare_u64);
+    assert(res);
+
+    return &cpu->cpreg_values[res - cpu->cpreg_indexes];
+}
+
 /* Initialize the ARMCPU cpreg list according to the kernel's
  * definition of what CPU registers it knows about (and throw away
  * the previous TCG-created cpreg list).
@@ -489,6 +505,23 @@ bool write_list_to_kvmstate(ARMCPU *cpu, int level)
     return ok;
 }

+void kvm_arm_cpu_pre_save(ARMCPU *cpu)
+{
+    /* KVM virtual time adjustment */
+    if (cpu->kvm_vtime_dirty) {
+        *kvm_arm_get_cpreg_ptr(cpu, KVM_REG_ARM_TIMER_CNT) = cpu->kvm_vtime;
+    }
+}
+
+void kvm_arm_cpu_post_load(ARMCPU *cpu)
+{
+    /* KVM virtual time adjustment */
+    if (cpu->kvm_adjvtime) {
+        cpu->kvm_vtime = *kvm_arm_get_cpreg_ptr(cpu, KVM_REG_ARM_TIMER_CNT);
+        cpu->kvm_vtime_dirty = true;
+    }
+}
+
 void kvm_arm_reset_vcpu(ARMCPU *cpu)
 {
     int ret;
@@ -556,6 +589,50 @@ int kvm_arm_sync_mpstate_to_qemu(ARMCPU *cpu)
     return 0;
 }

+void kvm_arm_get_virtual_time(CPUState *cs)
+{
+    ARMCPU *cpu = ARM_CPU(cs);
+    struct kvm_one_reg reg = {
+        .id = KVM_REG_ARM_TIMER_CNT,
+        .addr = (uintptr_t)&cpu->kvm_vtime,
+    };
+    int ret;
+
+    if (cpu->kvm_vtime_dirty) {
+        return;
+    }
+
+    ret = kvm_vcpu_ioctl(cs, KVM_GET_ONE_REG, &reg);
+    if (ret) {
+        error_report("Failed to get KVM_REG_ARM_TIMER_CNT");
+        abort();
+    }
+
+    cpu->kvm_vtime_dirty = true;
+}
+
+void kvm_arm_put_virtual_time(CPUState *cs)
+{
+    ARMCPU *cpu = ARM_CPU(cs);
+    struct kvm_one_reg reg = {
+        .id = KVM_REG_ARM_TIMER_CNT,
+        .addr = (uintptr_t)&cpu->kvm_vtime,
+    };
+    int ret;
+
+    if (!cpu->kvm_vtime_dirty) {
+        return;
+    }
+
+    ret = kvm_vcpu_ioctl(cs, KVM_SET_ONE_REG, &reg);
+    if (ret) {
+        error_report("Failed to set KVM_REG_ARM_TIMER_CNT");
+        abort();
+    }
+
+    cpu->kvm_vtime_dirty = false;
+}
+
 int kvm_put_vcpu_events(ARMCPU *cpu)
 {
     CPUARMState *env = &cpu->env;
@@ -667,6 +744,21 @@ MemTxAttrs kvm_arch_post_run(CPUState *cs, struct kvm_run *run)
     return MEMTXATTRS_UNSPECIFIED;
 }

+void kvm_arm_vm_state_change(void *opaque, int running, RunState state)
+{
+    CPUState *cs = opaque;
+    ARMCPU *cpu = ARM_CPU(cs);
+
+    if (running) {
+        if (cpu->kvm_adjvtime) {
+            kvm_arm_put_virtual_time(cs);
+        }
+    } else {
+        if (cpu->kvm_adjvtime) {
+            kvm_arm_get_virtual_time(cs);
+        }
+    }
+}

 int kvm_arch_handle_exit(CPUState *cs, struct kvm_run *run)
 {
diff --git a/target/arm/kvm32.c b/target/arm/kvm32.c
index 51f78f72..ee158830 100644
--- a/target/arm/kvm32.c
+++ b/target/arm/kvm32.c
@@ -195,6 +195,8 @@ int kvm_arch_init_vcpu(CPUState *cs)
         return -EINVAL;
     }

+    qemu_add_vm_change_state_handler(kvm_arm_vm_state_change, cs);
+
     /* Determine init features for this CPU */
     memset(cpu->kvm_init_features, 0, sizeof(cpu->kvm_init_features));
     if (cpu->start_powered_off) {
diff --git a/target/arm/kvm64.c b/target/arm/kvm64.c
index f2f0a92e..4f0bf000 100644
--- a/target/arm/kvm64.c
+++ b/target/arm/kvm64.c
@@ -609,6 +609,8 @@ int kvm_arch_init_vcpu(CPUState *cs)
         return -EINVAL;
     }

+    qemu_add_vm_change_state_handler(kvm_arm_vm_state_change, cs);
+
     /* Determine init features for this CPU */
     memset(cpu->kvm_init_features, 0, sizeof(cpu->kvm_init_features));
     if (cpu->start_powered_off) {
diff --git a/target/arm/kvm_arm.h b/target/arm/kvm_arm.h
index 32d97ce5..97560d4e 100644
--- a/target/arm/kvm_arm.h
+++ b/target/arm/kvm_arm.h
@@ -113,6 +113,23 @@ bool write_list_to_kvmstate(ARMCPU *cpu, int level);
  */
 bool write_kvmstate_to_list(ARMCPU *cpu);

+/**
+ * kvm_arm_cpu_pre_save:
+ * @cpu: ARMCPU
+ *
+ * Called after write_kvmstate_to_list() from cpu_pre_save() to update
+ * the cpreg list with KVM CPU state.
+ */
+void kvm_arm_cpu_pre_save(ARMCPU *cpu);
+
+/**
+ * kvm_arm_cpu_post_load:
+ * @cpu: ARMCPU
+ *
+ * Called from cpu_post_load() to update KVM CPU state from the cpreg list.
+ */
+void kvm_arm_cpu_post_load(ARMCPU *cpu);
+
 /**
  * kvm_arm_reset_vcpu:
  * @cpu: ARMCPU
@@ -241,6 +258,24 @@ int kvm_arm_sync_mpstate_to_kvm(ARMCPU *cpu);
  */
 int kvm_arm_sync_mpstate_to_qemu(ARMCPU *cpu);

+/**
+ * kvm_arm_get_virtual_time:
+ * @cs: CPUState
+ *
+ * Gets the VCPU's virtual counter and stores it in the KVM CPU state.
+ */
+void kvm_arm_get_virtual_time(CPUState *cs);
+
+/**
+ * kvm_arm_put_virtual_time:
+ * @cs: CPUState
+ *
+ * Sets the VCPU's virtual counter to the value stored in the KVM CPU state.
+ */
+void kvm_arm_put_virtual_time(CPUState *cs);
+
+void kvm_arm_vm_state_change(void *opaque, int running, RunState state);
+
 int kvm_arm_vgic_probe(void);

 void kvm_arm_pmu_set_irq(CPUState *cs, int irq);
@@ -272,6 +307,8 @@ static inline int kvm_arm_vgic_probe(void)
 static inline void kvm_arm_pmu_set_irq(CPUState *cs, int irq) {}
 static inline void kvm_arm_pmu_init(CPUState *cs) {}
+static inline void kvm_arm_get_virtual_time(CPUState *cs) {}
+static inline void kvm_arm_put_virtual_time(CPUState *cs) {}
 #endif

 static inline const char *gic_class_name(void)
diff --git a/target/arm/machine.c b/target/arm/machine.c
index 3fd319a3..ee3c59a6 100644
--- a/target/arm/machine.c
+++ b/target/arm/machine.c
@@ -644,6 +644,12 @@ static int cpu_pre_save(void *opaque)
             /* This should never fail */
             abort();
         }
+
+        /*
+         * kvm_arm_cpu_pre_save() must be called after
+         * write_kvmstate_to_list()
+         */
+        kvm_arm_cpu_pre_save(cpu);
     } else {
         if (!write_cpustate_to_list(cpu, false)) {
             /* This should never fail. */
@@ -746,6 +752,7 @@ static int cpu_post_load(void *opaque, int version_id)
          * we're using it.
          */
         write_list_to_cpustate(cpu);
+        kvm_arm_cpu_post_load(cpu);
     } else {
         if (!write_list_to_cpustate(cpu)) {
             return -1;
--
2.23.0