v3:
 - Added a target/arm/kvm_arm.h comment cleanup patch (1/6)
 - Minor refactoring of assert_has_feature_enabled/disabled in 4/6,
   kept Richard's r-b.
 - Rewrote kvm-no-adjvtime documentation in 6/6.
 - Reworked approach in 5/6 to properly deal with migration and to
   track running vs. !running, rather than running vs. paused states.

v2:
 - Reworked it enough that I brought back the RFC tag and retitled the
   series. Also had to drop r-b's from a couple of patches, and even
   drop patches.
 - Changed approach from writing the QEMU virtual time to the guest
   vtime counter to saving and restoring the guest vtime counter.
 - Changed the kvm-adjvtime property, which was off by default, to a
   kvm-no-adjvtime property, which is also off by default, meaning the
   effective "adjust vtime" property is now on by default (but only
   for 5.0 virt machine types and later)

v1:
 - move from RFC status to v1
 - put kvm_arm_vm_state_change() in kvm.c to share among kvm32.c and
   kvm64.c
 - add r-b's from Richard

This series is inspired by a series[1] posted by Bijan Mottahedeh over
a year ago and by the patch[2] posted by Heyi Guo almost a year ago.
The problem described in the cover letter of [1] is easily reproducible
and some users would like to have the option to avoid it. However the
solution, which is to adjust the virtual counter each time the VM
transitions to the running state, introduces a different problem, which
is that the virtual and physical counters diverge. As described in the
cover letter of [1] this divergence is easily observed when comparing
the output of `date` and `hwclock` after suspending the guest, waiting
a while, and then resuming it. Because this different problem may
actually be worse for some users, unlike [1], the series posted here
makes the virtual counter adjustment optional. Besides the adjustment
being optional, this series approaches the needed changes differently
to apply them in more appropriate locations.
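
The save-and-restore approach described in the changelog above can be illustrated with a small toy model. This is only a sketch under stated assumptions: `struct toy_vcpu` and the `toy_*` functions are hypothetical names invented for illustration and are not QEMU or KVM APIs, and the counter is modeled as a plain integer that keeps ticking while the VM is stopped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a VCPU; not a QEMU or KVM type. */
struct toy_vcpu {
    uint64_t counter;      /* models the guest's virtual counter */
    uint64_t saved_vtime;  /* value captured when the VM stopped */
    bool vtime_dirty;      /* true while saved_vtime awaits restore */
    bool adjvtime;         /* the optional adjustment is enabled */
};

/* Real time passes: the counter ticks whether or not the VM runs. */
static void toy_tick(struct toy_vcpu *v, uint64_t ticks)
{
    v->counter += ticks;
}

/* VM leaves the running state: capture the counter, but only once. */
static void toy_vm_stopped(struct toy_vcpu *v)
{
    if (!v->adjvtime || v->vtime_dirty) {
        return;
    }
    v->saved_vtime = v->counter;
    v->vtime_dirty = true;
}

/* VM enters the running state: write the captured value back. */
static void toy_vm_running(struct toy_vcpu *v)
{
    if (!v->adjvtime || !v->vtime_dirty) {
        return;
    }
    v->counter = v->saved_vtime;
    v->vtime_dirty = false;
}
```

With the adjustment enabled, stopping at counter value 100, letting 50 ticks pass, and resuming leaves the counter at 100, so the guest sees no jump but now lags real time by 50 ticks. With the adjustment disabled the counter reads 150 after the resume, which is the legacy behavior.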
Additional notes
----------------

Note 1
------

As described above, when running a guest with kvm-no-adjvtime disabled
it will be less likely the guest OS and guest applications get
surprise time jumps when they use the virtual counter. However the
counter will no longer reflect real time. It will lag behind. If this
is a problem then the guest can resynchronize its time from an
external source or even from its physical counter. If the
suspend/resume is done with libvirt's virsh, and the guest is running
the guest agent, then it's also possible to use a sequence like this

 $ virsh suspend $GUEST
 $ virsh resume $GUEST
 $ virsh domtime --sync $GUEST

in order to resynchronize a guest right after the resume. Of course
there will still be a period when the clock is not right, possibly
creating confusing timestamps in logs, for example, and the guest must
still be tolerant of the time synchronizations.

Note 2
------

Userspace that wants to set KVM_REG_ARM_TIMER_CNT should beware that
the KVM register ID is not correct. This cannot be fixed because it's
UAPI and if the UAPI headers are used then it can't be a problem.
However, if userspace attempts to create the ID itself from the
register's specification, then it will get KVM_REG_ARM_TIMER_CVAL
instead, as the _CNT and _CVAL definitions have their register
parameters swapped.

Note 3
------

I didn't test this with a 32-bit KVM host, but the changes to kvm32.c
are the same as kvm64.c. So what could go wrong? Test results would be
appreciated.

[1] https://lists.gnu.org/archive/html/qemu-devel/2018-11/msg05713.html
[2] https://lists.gnu.org/archive/html/qemu-devel/2019-03/msg03695.html

upstream url: https://patchwork.kernel.org/cover/11341629/
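
The register ID pitfall from Note 2 can be made concrete with a short sketch. Everything here is an assumption to be checked against your kernel's arm64 `asm/kvm.h` and the Arm architecture manual: the field shifts, the (op0, op1, CRn, CRm, op2) tuples, and the tuples attributed to the UAPI names are written from memory. `toy_sysreg_fields()` and the `TOY_*` macros are hypothetical helpers that pack only the op fields, not the full 64-bit register ID, and the sketch covers only the arm64 system-register form.

```c
#include <assert.h>
#include <stdint.h>

/* Field shifts as recalled from the arm64 KVM UAPI header (assumption). */
enum {
    TOY_OP0_SHIFT = 14,
    TOY_OP1_SHIFT = 11,
    TOY_CRN_SHIFT = 7,
    TOY_CRM_SHIFT = 3,
    TOY_OP2_SHIFT = 0,
};

/* Hypothetical helper: packs the op fields of a system register ID. */
static uint32_t toy_sysreg_fields(uint32_t op0, uint32_t op1, uint32_t crn,
                                  uint32_t crm, uint32_t op2)
{
    return (op0 << TOY_OP0_SHIFT) | (op1 << TOY_OP1_SHIFT) |
           (crn << TOY_CRN_SHIFT) | (crm << TOY_CRM_SHIFT) |
           (op2 << TOY_OP2_SHIFT);
}

/* Encodings derived from the architecture specification (assumption). */
#define TOY_SPEC_CNTVCT_EL0     toy_sysreg_fields(3, 3, 14, 0, 2)
#define TOY_SPEC_CNTV_CVAL_EL0  toy_sysreg_fields(3, 3, 14, 3, 2)

/* Tuples the UAPI header is understood to use for each name (assumption). */
#define TOY_UAPI_TIMER_CNT      toy_sysreg_fields(3, 3, 14, 3, 2)
#define TOY_UAPI_TIMER_CVAL     toy_sysreg_fields(3, 3, 14, 0, 2)
```

Under these assumptions, an ID built from the specification's encoding of the virtual counter compares equal to `TOY_UAPI_TIMER_CVAL`, not `TOY_UAPI_TIMER_CNT`, which is why userspace should take the constants from the UAPI headers rather than derive them.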
From 77ee224418fac859acecd9aca4d18555ced42db6 Mon Sep 17 00:00:00 2001
From: Ying Fang <fangying1@huawei.com>
Date: Tue, 21 Apr 2020 17:32:31 +0800
Subject: [PATCH 3/4] target/arm/kvm: Implement virtual time adjustment

When a VM is stopped (such as when it's paused) guest virtual time
should stop counting. Otherwise, when the VM is resumed it will
experience time jumps and its kernel may report soft lockups. Not
counting virtual time while the VM is stopped has the side effect
of making the guest's time appear to lag when compared with real
time, and even with time derived from the physical counter. For
this reason, this change, which is enabled by default, comes with
a KVM CPU feature allowing it to be disabled, restoring legacy
behavior.

This patch only provides the implementation of the virtual time
adjustment. A subsequent patch will provide the CPU property
allowing the change to be enabled and disabled.

Reported-by: Bijan Mottahedeh <bijan.mottahedeh@oracle.com>
Signed-off-by: Andrew Jones <drjones@redhat.com>
Message-id: 20200120101023.16030-6-drjones@redhat.com
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/cpu.h     |  7 ++++
 target/arm/kvm.c     | 92 ++++++++++++++++++++++++++++++++++++++++++++
 target/arm/kvm32.c   |  2 +
 target/arm/kvm64.c   |  2 +
 target/arm/kvm_arm.h | 37 ++++++++++++++++++
 target/arm/machine.c |  7 ++++
 6 files changed, 147 insertions(+)

diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index 94c990cd..e19531a7 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -816,6 +816,13 @@ struct ARMCPU {
     /* KVM init features for this CPU */
     uint32_t kvm_init_features[7];
 
+    /* KVM CPU state */
+
+    /* KVM virtual time adjustment */
+    bool kvm_adjvtime;
+    bool kvm_vtime_dirty;
+    uint64_t kvm_vtime;
+
     /* Uniprocessor system with MP extensions */
     bool mp_is_up;
 
diff --git a/target/arm/kvm.c b/target/arm/kvm.c
index cc7a46df..21fb7ecd 100644
--- a/target/arm/kvm.c
+++ b/target/arm/kvm.c
@@ -336,6 +336,22 @@ static int compare_u64(const void *a, const void *b)
     return 0;
 }
 
+/*
+ * cpreg_values are sorted in ascending order by KVM register ID
+ * (see kvm_arm_init_cpreg_list). This allows us to cheaply find
+ * the storage for a KVM register by ID with a binary search.
+ */
+static uint64_t *kvm_arm_get_cpreg_ptr(ARMCPU *cpu, uint64_t regidx)
+{
+    uint64_t *res;
+
+    res = bsearch(&regidx, cpu->cpreg_indexes, cpu->cpreg_array_len,
+                  sizeof(uint64_t), compare_u64);
+    assert(res);
+
+    return &cpu->cpreg_values[res - cpu->cpreg_indexes];
+}
+
 /* Initialize the ARMCPU cpreg list according to the kernel's
  * definition of what CPU registers it knows about (and throw away
  * the previous TCG-created cpreg list).
@@ -489,6 +505,23 @@ bool write_list_to_kvmstate(ARMCPU *cpu, int level)
     return ok;
 }
 
+void kvm_arm_cpu_pre_save(ARMCPU *cpu)
+{
+    /* KVM virtual time adjustment */
+    if (cpu->kvm_vtime_dirty) {
+        *kvm_arm_get_cpreg_ptr(cpu, KVM_REG_ARM_TIMER_CNT) = cpu->kvm_vtime;
+    }
+}
+
+void kvm_arm_cpu_post_load(ARMCPU *cpu)
+{
+    /* KVM virtual time adjustment */
+    if (cpu->kvm_adjvtime) {
+        cpu->kvm_vtime = *kvm_arm_get_cpreg_ptr(cpu, KVM_REG_ARM_TIMER_CNT);
+        cpu->kvm_vtime_dirty = true;
+    }
+}
+
 void kvm_arm_reset_vcpu(ARMCPU *cpu)
 {
     int ret;
@@ -556,6 +589,50 @@ int kvm_arm_sync_mpstate_to_qemu(ARMCPU *cpu)
     return 0;
 }
 
+void kvm_arm_get_virtual_time(CPUState *cs)
+{
+    ARMCPU *cpu = ARM_CPU(cs);
+    struct kvm_one_reg reg = {
+        .id = KVM_REG_ARM_TIMER_CNT,
+        .addr = (uintptr_t)&cpu->kvm_vtime,
+    };
+    int ret;
+
+    if (cpu->kvm_vtime_dirty) {
+        return;
+    }
+
+    ret = kvm_vcpu_ioctl(cs, KVM_GET_ONE_REG, &reg);
+    if (ret) {
+        error_report("Failed to get KVM_REG_ARM_TIMER_CNT");
+        abort();
+    }
+
+    cpu->kvm_vtime_dirty = true;
+}
+
+void kvm_arm_put_virtual_time(CPUState *cs)
+{
+    ARMCPU *cpu = ARM_CPU(cs);
+    struct kvm_one_reg reg = {
+        .id = KVM_REG_ARM_TIMER_CNT,
+        .addr = (uintptr_t)&cpu->kvm_vtime,
+    };
+    int ret;
+
+    if (!cpu->kvm_vtime_dirty) {
+        return;
+    }
+
+    ret = kvm_vcpu_ioctl(cs, KVM_SET_ONE_REG, &reg);
+    if (ret) {
+        error_report("Failed to set KVM_REG_ARM_TIMER_CNT");
+        abort();
+    }
+
+    cpu->kvm_vtime_dirty = false;
+}
+
 int kvm_put_vcpu_events(ARMCPU *cpu)
 {
     CPUARMState *env = &cpu->env;
@@ -667,6 +744,21 @@ MemTxAttrs kvm_arch_post_run(CPUState *cs, struct kvm_run *run)
     return MEMTXATTRS_UNSPECIFIED;
 }
 
+void kvm_arm_vm_state_change(void *opaque, int running, RunState state)
+{
+    CPUState *cs = opaque;
+    ARMCPU *cpu = ARM_CPU(cs);
+
+    if (running) {
+        if (cpu->kvm_adjvtime) {
+            kvm_arm_put_virtual_time(cs);
+        }
+    } else {
+        if (cpu->kvm_adjvtime) {
+            kvm_arm_get_virtual_time(cs);
+        }
+    }
+}
 
 int kvm_arch_handle_exit(CPUState *cs, struct kvm_run *run)
 {
diff --git a/target/arm/kvm32.c b/target/arm/kvm32.c
index 51f78f72..ee158830 100644
--- a/target/arm/kvm32.c
+++ b/target/arm/kvm32.c
@@ -195,6 +195,8 @@ int kvm_arch_init_vcpu(CPUState *cs)
         return -EINVAL;
     }
 
+    qemu_add_vm_change_state_handler(kvm_arm_vm_state_change, cs);
+
     /* Determine init features for this CPU */
     memset(cpu->kvm_init_features, 0, sizeof(cpu->kvm_init_features));
     if (cpu->start_powered_off) {
diff --git a/target/arm/kvm64.c b/target/arm/kvm64.c
index f2f0a92e..4f0bf000 100644
--- a/target/arm/kvm64.c
+++ b/target/arm/kvm64.c
@@ -609,6 +609,8 @@ int kvm_arch_init_vcpu(CPUState *cs)
         return -EINVAL;
     }
 
+    qemu_add_vm_change_state_handler(kvm_arm_vm_state_change, cs);
+
     /* Determine init features for this CPU */
     memset(cpu->kvm_init_features, 0, sizeof(cpu->kvm_init_features));
     if (cpu->start_powered_off) {
diff --git a/target/arm/kvm_arm.h b/target/arm/kvm_arm.h
index 32d97ce5..97560d4e 100644
--- a/target/arm/kvm_arm.h
+++ b/target/arm/kvm_arm.h
@@ -113,6 +113,23 @@ bool write_list_to_kvmstate(ARMCPU *cpu, int level);
  */
 bool write_kvmstate_to_list(ARMCPU *cpu);
 
+/**
+ * kvm_arm_cpu_pre_save:
+ * @cpu: ARMCPU
+ *
+ * Called after write_kvmstate_to_list() from cpu_pre_save() to update
+ * the cpreg list with KVM CPU state.
+ */
+void kvm_arm_cpu_pre_save(ARMCPU *cpu);
+
+/**
+ * kvm_arm_cpu_post_load:
+ * @cpu: ARMCPU
+ *
+ * Called from cpu_post_load() to update KVM CPU state from the cpreg list.
+ */
+void kvm_arm_cpu_post_load(ARMCPU *cpu);
+
 /**
  * kvm_arm_reset_vcpu:
  * @cpu: ARMCPU
@@ -241,6 +258,24 @@ int kvm_arm_sync_mpstate_to_kvm(ARMCPU *cpu);
  */
 int kvm_arm_sync_mpstate_to_qemu(ARMCPU *cpu);
 
+/**
+ * kvm_arm_get_virtual_time:
+ * @cs: CPUState
+ *
+ * Gets the VCPU's virtual counter and stores it in the KVM CPU state.
+ */
+void kvm_arm_get_virtual_time(CPUState *cs);
+
+/**
+ * kvm_arm_put_virtual_time:
+ * @cs: CPUState
+ *
+ * Sets the VCPU's virtual counter to the value stored in the KVM CPU state.
+ */
+void kvm_arm_put_virtual_time(CPUState *cs);
+
+void kvm_arm_vm_state_change(void *opaque, int running, RunState state);
+
 int kvm_arm_vgic_probe(void);
 
 void kvm_arm_pmu_set_irq(CPUState *cs, int irq);
@@ -272,6 +307,8 @@ static inline int kvm_arm_vgic_probe(void)
 static inline void kvm_arm_pmu_set_irq(CPUState *cs, int irq) {}
 static inline void kvm_arm_pmu_init(CPUState *cs) {}
 
+static inline void kvm_arm_get_virtual_time(CPUState *cs) {}
+static inline void kvm_arm_put_virtual_time(CPUState *cs) {}
 #endif
 
 static inline const char *gic_class_name(void)
diff --git a/target/arm/machine.c b/target/arm/machine.c
index 3fd319a3..ee3c59a6 100644
--- a/target/arm/machine.c
+++ b/target/arm/machine.c
@@ -644,6 +644,12 @@ static int cpu_pre_save(void *opaque)
             /* This should never fail */
             abort();
         }
+
+        /*
+         * kvm_arm_cpu_pre_save() must be called after
+         * write_kvmstate_to_list()
+         */
+        kvm_arm_cpu_pre_save(cpu);
     } else {
         if (!write_cpustate_to_list(cpu, false)) {
             /* This should never fail. */
@@ -746,6 +752,7 @@ static int cpu_post_load(void *opaque, int version_id)
          * we're using it.
          */
         write_list_to_cpustate(cpu);
+        kvm_arm_cpu_post_load(cpu);
     } else {
         if (!write_list_to_cpustate(cpu)) {
             return -1;
--
2.23.0
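
The lookup used by kvm_arm_get_cpreg_ptr() in the patch depends on two parallel arrays: a sorted array of register IDs and an array of values at matching positions. The standalone sketch below illustrates that pattern with the standard `bsearch()`. The names `toy_compare_u64` and `toy_lookup` are hypothetical, and unlike the patch, which asserts that the ID is present, this version returns NULL for a missing ID.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Three-way comparison of two uint64_t keys for bsearch(). */
static int toy_compare_u64(const void *a, const void *b)
{
    uint64_t x = *(const uint64_t *)a;
    uint64_t y = *(const uint64_t *)b;

    return (x > y) - (x < y);
}

/*
 * ids[] must be sorted in ascending order; values[i] is the storage
 * that belongs to ids[i]. Returns the slot for id, or NULL if absent.
 */
static uint64_t *toy_lookup(const uint64_t *ids, uint64_t *values,
                            size_t len, uint64_t id)
{
    const uint64_t *hit = bsearch(&id, ids, len, sizeof(uint64_t),
                                  toy_compare_u64);

    /* The offset of the hit in ids[] is also the index into values[]. */
    return hit ? &values[hit - ids] : NULL;
}
```

Because the function returns a pointer into `values[]`, a caller can both read and overwrite the stored value, which is how the pre-save and post-load hooks in the patch use their helper.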