- vfio/migration: Add support for manual clear vfio dirty log
- vfio: Maintain DMA mapping range for the container
- linux-headers: update against 5.10 and manual clear vfio dirty log series
- arm/acpi: Fix bios_tables_test failure when qemu-system-aarch64 is built on an x86_64 host (the __aarch64__ macro made build_pptt compile differently on x86_64 and aarch64 hosts, causing bios_tables_test to fail)
- pl031: support rtc-timer property for pl031
- feature: Add logs for vm start and destroy
- feature: Add log for each module
- log: Add log at boot & cpu init for aarch64
- bugfix: irq: Avoid covering object refcount of qemu_irq
- i386: cache passthrough: Update AMD 8000_001D.EAX[25:14] based on vCPU topo
- freeclock: set rtc_date_diff for X86
- freeclock: set rtc_date_diff for arm
- freeclock: add qmp command to get time offset of vm in seconds
- tests: Disable filemonitor testcase
- shadow_dev: introduce shadow dev for virtio-net device
- pl011: reset read FIFO when UARTTIMSC=0 & UARTICR=0xffff
- tests: virt: Update expected ACPI tables for virt test (update BinDir)
- arm64: Add the cpufreq device to show cpufreq info to guest
- hw/arm64: add vcpu cache info support
- tests: virt: Allow changes to PPTT test table
- cpu: add Cortex-A72 processor kvm target support
- cpu: add Kunpeng-920 cpu support
- net: eepro100: validate various address values (CVE-2021-20255)
- ide: ahci: add check to avoid null dereference (CVE-2019-12067)
- vdpa: set vring enable only if the vring address has already been set
- docs: Add generic vhost-vdpa device documentation
- vdpa: don't suspend/resume device when vdpa device not started
- vdpa: correct param passed in when unregister save
- vdpa: suspend function return 0 when the vdpa device is stopped
- vdpa: support vdpa device suspend/resume
- vdpa: move memory listener to the realize stage
- vdpa: implement vdpa device migration
- vhost: implement migration state notifier for vdpa device
- vhost: implement post resume bh
- vhost: implement savevm_handler for vdpa device
- vhost: implement vhost_vdpa_device_suspend/resume
- vhost: implement vhost-vdpa suspend/resume
- vhost: add vhost_dev_suspend/resume_op
- vhost: introduce bytemap for vhost backend logging
- vhost-vdpa: add migration log ops for VhostOps
- vhost-vdpa: add VHOST_BACKEND_F_BYTEMAPLOG
- hw/usb: reduce the vcpu cost of UHCI when VNC disconnect
- virtio-net: update the default and max of rx/tx_queue_size
- virtio-net: set the max of queue size to 4096
- virtio-net: fix max vring buf size when set ring num
- virtio-net: bugfix: do not delete netdev before virtio net
- monitor: Discard BLOCK_IO_ERROR event when VM rebooted
- vhost-user: add unregister_savevm when vhost-user cleanup
- vhost-user: add vhost_set_mem_table when vm load_setup at destination
- vhost-user: quit infinite loop while used memslots is more than the backend limit
- fix qemu core dump when vhost-user-net is configured in server mode
- vhost-user: Add support to reconnect vhost-user socket
- vhost-user: Set the acked_features to vm's feature
- i6300esb watchdog: bugfix: Add a runstate transition
- hw/net/rocker_of_dpa: fix double free bug of rocker device
- net/dump.c: Suppress spurious compiler warning
- pcie: Add pcie-root-port fast plug/unplug feature
- pcie: Compat with devices which do not support Link Width, such as ioh3420
- qdev/monitors: Fix redundant error_setg of qdev_add_device
- qemu-nbd: set timeout to qemu-nbd socket
- qemu-nbd: make native the default aio mode
- nbd/server.c: fix invalid read after client was already freed
- virtio-scsi: bugfix: fix qemu crash for hotplug scsi disk with dataplane
- virtio: bugfix: check the value of caches before accessing it
- virtio: print the guest virtio_net features that host does not support
- virtio: bugfix: add rcu_read_lock when vring_avail_idx is called
- virtio: check descriptor numbers
- migration: report multiFd related thread pid to libvirt
- migration: report migration related thread pid to libvirt
- cpu/features: fix bug for memory leakage
- doc: Update multi-thread compression doc
- migration: Add compress_level sanity check
- migration: Add zstd support in multi-thread compression
- migration: Add multi-thread compress ops
- migration: Refactoring multi-thread compress migration
- migration: Add multi-thread compress method
- migration: skip cache_drop for bios bootloader and nvram template
- oslib-posix: optimise vm startup time for 1G hugepage
- monitor/qmp: drop inflight rsp if qmp client broken
- ps2: fix oob in ps2 kbd
- kvm: when KVM and QEMU cannot handle a KVM exit, QEMU currently does vm_stop, leaving the VM paused and unrecoverable; send a guest panic event to libvirt instead
- vhost: cancel migration when vhost-user restarted during migration

Signed-off-by: Jiabo Feng <fengjiabo1@huawei.com>

From bd2d81775edf285149346bf793d9b71236d7cf34 Mon Sep 17 00:00:00 2001
From: Zenghui Yu <yuzenghui@huawei.com>
Date: Sat, 8 May 2021 17:31:04 +0800
Subject: [PATCH] vfio: Maintain DMA mapping range for the container

When synchronizing the dirty bitmap from kernel VFIO we do it in a
per-iova-range fashion and allocate the userspace bitmap for each ioctl.
This patch introduces `struct VFIODMARange` to describe a range of the
given DMA mapping with respect to a VFIO_IOMMU_MAP_DMA operation, and
makes the bitmap cache of this range persistent so that we don't need to
g_try_malloc0() every time. Note that the new structure is almost a copy
of `struct vfio_iommu_type1_dma_map`, but is only used internally by QEMU.

More importantly, the cached per-iova-range dirty bitmap will be used
when we add support for CLEAR_BITMAP: it guarantees that we never clear
any unknown dirty bits, which would otherwise be a severe data loss issue
for the migration code.

It's pretty intuitive to maintain a bitmap per container since we perform
log_sync at this granularity. But I don't know how to deal with things
like memory hot-{un}plug, sparse DMA mappings, etc. Suggestions welcome.

* yet something to-do:
- can't work with guest viommu
- no locks
- etc

[ The idea and even the commit message are largely inherited from the kvm
side. See commit 9f4bf4baa8b820c7930e23c9566c9493db7e1d25. ]

Signed-off-by: Zenghui Yu <yuzenghui@huawei.com>
Signed-off-by: Kunkun Jiang <jinagkunkun@huawei.com>
---
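To illustrate the intended CLEAR_BITMAP usage mentioned above, here is a
minimal sketch (the helper below is hypothetical and not part of this patch;
test_bit/set_bit/clear_bit are the helpers from qemu/bitops.h, and
VFIODMARange is the structure introduced by this patch): a clear request is
masked against the per-range cache so that only pages previously reported as
dirty are ever cleared, which is what prevents unknown dirty bits from being
lost.

/*
 * Hypothetical sketch only: restrict a clear request for
 * [first_page, first_page + npages) to bits already known dirty in the
 * per-range cache.
 */
static void vfio_dma_range_mask_clear_request(VFIODMARange *qrange,
                                              unsigned long *clear_bitmap,
                                              uint64_t first_page,
                                              uint64_t npages)
{
    uint64_t i;

    for (i = first_page; i < first_page + npages; i++) {
        if (test_bit(i, qrange->bitmap)) {
            set_bit(i, clear_bitmap);     /* safe to clear: previously reported */
            clear_bit(i, qrange->bitmap); /* drop it from the cache */
        }
    }
}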
 hw/vfio/common.c              |  9 +++++--
 hw/vfio/container.c           | 49 +++++++++++++++++++++++++++++++++++
 include/hw/vfio/vfio-common.h | 12 +++++++++
 3 files changed, 68 insertions(+), 2 deletions(-)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index e70fdf5e0c..564e933135 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -1156,6 +1156,7 @@ int vfio_get_dirty_bitmap(VFIOContainer *container, uint64_t iova,
         vfio_devices_all_device_dirty_tracking(container);
     uint64_t dirty_pages;
     VFIOBitmap vbmap;
+    VFIODMARange *qrange;
     int ret;

     if (!container->dirty_pages_supported && !all_device_dirty_tracking) {
@@ -1165,10 +1166,16 @@ int vfio_get_dirty_bitmap(VFIOContainer *container, uint64_t iova,
         return 0;
     }

+    qrange = vfio_lookup_match_range(container, iova, size);
+    /* the same as vfio_dma_unmap() */
+    assert(qrange);
+
     ret = vfio_bitmap_alloc(&vbmap, size);
     if (ret) {
         return ret;
     }
+    g_free(vbmap.bitmap);
+    vbmap.bitmap = qrange->bitmap;

     if (all_device_dirty_tracking) {
         ret = vfio_devices_query_dirty_bitmap(container, &vbmap, iova, size);
@@ -1186,8 +1193,6 @@ int vfio_get_dirty_bitmap(VFIOContainer *container, uint64_t iova,
     trace_vfio_get_dirty_bitmap(container->fd, iova, size, vbmap.size,
                                 ram_addr, dirty_pages);
 out:
-    g_free(vbmap.bitmap);
-
     return ret;
 }

diff --git a/hw/vfio/container.c b/hw/vfio/container.c
index 242010036a..9a176a0d33 100644
--- a/hw/vfio/container.c
+++ b/hw/vfio/container.c
@@ -112,6 +112,29 @@ unmap_exit:
     return ret;
 }

+VFIODMARange *vfio_lookup_match_range(VFIOContainer *container,
+                                      hwaddr start_addr, hwaddr size)
+{
+    VFIODMARange *qrange;
+
+    QLIST_FOREACH(qrange, &container->dma_list, next) {
+        if (qrange->iova == start_addr && qrange->size == size) {
+            return qrange;
+        }
+    }
+    return NULL;
+}
+
+void vfio_dma_range_init_dirty_bitmap(VFIODMARange *qrange)
+{
+    uint64_t pages, size;
+
+    pages = REAL_HOST_PAGE_ALIGN(qrange->size) / qemu_real_host_page_size();
+    size = ROUND_UP(pages, sizeof(__u64) * BITS_PER_BYTE) / BITS_PER_BYTE;
+
+    qrange->bitmap = g_malloc0(size);
+}
+
 /*
  * DMA - Mapping and unmapping for the "type1" IOMMU interface used on x86
  */
@@ -124,6 +147,7 @@ int vfio_dma_unmap(VFIOContainer *container, hwaddr iova,
         .iova = iova,
         .size = size,
     };
+    VFIODMARange *qrange;
     bool need_dirty_sync = false;
     int ret;

@@ -136,6 +160,22 @@ int vfio_dma_unmap(VFIOContainer *container, hwaddr iova,
         need_dirty_sync = true;
     }

+    /*
+     * unregister the DMA range
+     *
+     * It seems that the memory layer will give us the same section as the one
+     * used in region_add(). Otherwise it'll be complicated to manipulate the
+     * bitmap across region_{add,del}. Is there any guarantee?
+     *
+     * But there is really not such a restriction on the kernel interface
+     * (VFIO_IOMMU_DIRTY_PAGES_FLAG_{UN}MAP_DMA, etc).
+     */
+    qrange = vfio_lookup_match_range(container, iova, size);
+    assert(qrange);
+    g_free(qrange->bitmap);
+    QLIST_REMOVE(qrange, next);
+    g_free(qrange);
+
     while (ioctl(container->fd, VFIO_IOMMU_UNMAP_DMA, &unmap)) {
         /*
          * The type1 backend has an off-by-one bug in the kernel (71a7d3d78e3c
@@ -180,6 +220,14 @@ int vfio_dma_map(VFIOContainer *container, hwaddr iova,
         .iova = iova,
         .size = size,
     };
+    VFIODMARange *qrange;
+
+    qrange = g_malloc0(sizeof(*qrange));
+    qrange->iova = iova;
+    qrange->size = size;
+    QLIST_INSERT_HEAD(&container->dma_list, qrange, next);
+    /* XXX allocate the dirty bitmap on demand */
+    vfio_dma_range_init_dirty_bitmap(qrange);

     if (!readonly) {
         map.flags |= VFIO_DMA_MAP_FLAG_WRITE;
@@ -552,6 +600,7 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as,
     container->iova_ranges = NULL;
     QLIST_INIT(&container->giommu_list);
     QLIST_INIT(&container->vrdl_list);
+    QLIST_INIT(&container->dma_list);

     ret = vfio_init_container(container, group->fd, errp);
     if (ret) {
diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index a4a22accb9..b131d04c9c 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -80,6 +80,14 @@ typedef struct VFIOAddressSpace {

 struct VFIOGroup;

+typedef struct VFIODMARange {
+    QLIST_ENTRY(VFIODMARange) next;
+    hwaddr iova;
+    size_t size;
+    void *vaddr; /* unused */
+    unsigned long *bitmap; /* dirty bitmap cache for this range */
+} VFIODMARange;
+
 typedef struct VFIOContainer {
     VFIOAddressSpace *space;
     int fd; /* /dev/vfio/vfio, empowered by the attached groups */
@@ -97,6 +105,7 @@ typedef struct VFIOContainer {
     QLIST_HEAD(, VFIOHostDMAWindow) hostwin_list;
     QLIST_HEAD(, VFIOGroup) group_list;
     QLIST_HEAD(, VFIORamDiscardListener) vrdl_list;
+    QLIST_HEAD(, VFIODMARange) dma_list;
     QLIST_ENTRY(VFIOContainer) next;
     QLIST_HEAD(, VFIODevice) device_list;
     GList *iova_ranges;
@@ -212,6 +221,9 @@ void vfio_put_address_space(VFIOAddressSpace *space);
 bool vfio_devices_all_running_and_saving(VFIOContainer *container);

 /* container->fd */
+VFIODMARange *vfio_lookup_match_range(VFIOContainer *container,
+                                      hwaddr start_addr, hwaddr size);
+void vfio_dma_range_init_dirty_bitmap(VFIODMARange *qrange);
 int vfio_dma_unmap(VFIOContainer *container, hwaddr iova,
                    ram_addr_t size, IOMMUTLBEntry *iotlb);
 int vfio_dma_map(VFIOContainer *container, hwaddr iova,
--
2.27.0
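For a sense of the cache footprint introduced by this patch: with the sizing
done in vfio_dma_range_init_dirty_bitmap(), a 1 GiB mapping on a host using
4 KiB pages covers 262144 pages, so its persistent bitmap takes
ROUND_UP(262144, 64) / 8 = 32768 bytes (32 KiB), allocated once in
vfio_dma_map() and kept until vfio_dma_unmap() instead of being reallocated
on every log_sync.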