qemu/vfio-iommufd-Introduce-auto-domain-creation.patch

QEMU update to version 8.2.0-30: - Revert "linux-user: Print tid not pid with strace" - gpex-acpi: Remove duplicate DSM #5 - smmuv3: Use default bus for arm-smmuv3-accel - smmuv3: Change arm-smmuv3-nested name to arm-smmuv3-accel - smmu-common: Return sysmem address space only for vfio-pci - smmuv3: realize get_pasid_cap and set ssidsize with pasid - vfio: Synthesize vPASID capability to VM - backend/iommufd: Report PASID capability - pci: Get pasid capability from vIOMMU - smmuv3: Add support for page fault handling - kvm: Translate MSI doorbell address only if it is valid - hw/arm/smmuv3: Enable sva/stall IDR features - iommufd.h: Updated to openeuler olk-6.6 kernel - tests/data/acpi/virt: Update IORT acpi table - hw/arm/virt-acpi-build: Add IORT RMR regions to handle MSI nested binding - tests/qtest: Allow IORT acpi table to change - hw/arm/virt-acpi-build: Build IORT with multiple SMMU nodes - hw/arm/smmuv3: Associate a pci bus with a SMMUv3 Nested device - hw/arm/smmuv3: Add initial support for SMMUv3 Nested device - hw/arm/virt: Add an SMMU_IO_LEN macro - hw/pci-host/gpex: [needs kernel fix] Allow to generate preserve boot config DSM #5 - tests/data/acpi: Update DSDT acpi tables - acpi/gpex: Fix PCI Express Slot Information function 0 returned value - tests/qtest: Allow DSDT acpi tables to change - hw/arm/smmuv3: Forward cache invalidate commands via iommufd - hw/arm/smmu-common: Replace smmu_iommu_mr with smmu_find_sdev - hw/arm/smmuv3: Add missing STE invalidation - hw/arm/smmuv3: Add smmu_dev_install_nested_ste() for CFGI_STE - hw/arm/smmuv3: Check idr registers for STE_S1CDMAX and STE_S1STALLD - hw/arm/smmuv3: Read host SMMU device info - hw/arm/smmuv3: Ignore IOMMU_NOTIFIER_MAP for nested-smmuv3 - hw/arm/smmu-common: Return sysmem if stage-1 is bypassed - hw/arm/smmu-common: Add iommufd helpers - hw/arm/smmu-common: Add set/unset_iommu_device callback - hw/arm/smmu-common: Extract smmu_get_sbus and smmu_get_sdev helpers - hw/arm/smmu-common: Bypass 
emulated IOTLB for a nested SMMU - hw/arm/smmu-common: Add a nested flag to SMMUState - backends/iommufd: Introduce iommufd_viommu_invalidate_cache - backends/iommufd: Introduce iommufd_vdev_alloc - backends/iommufd: Introduce iommufd_backend_alloc_viommu - vfio/iommufd: Implement [at|de]tach_hwpt handlers - vfio/iommufd: Implement HostIOMMUDeviceClass::realize_late() handler - HostIOMMUDevice: Introduce realize_late callback - vfio/iommufd: Add properties and handlers to TYPE_HOST_IOMMU_DEVICE_IOMMUFD - backends/iommufd: Add helpers for invalidating user-managed HWPT - Update iommufd.h header for vSVA - vfio/common: Allow disabling device dirty page tracking - vfio/migration: Don't block migration device dirty tracking is unsupported - vfio/iommufd: Implement VFIOIOMMUClass::query_dirty_bitmap support - vfio/iommufd: Implement VFIOIOMMUClass::set_dirty_tracking support - vfio/iommufd: Probe and request hwpt dirty tracking capability - vfio/{iommufd, container}: Invoke HostIOMMUDevice::realize() during attach_device() - vfio/iommufd: Add hw_caps field to HostIOMMUDeviceCaps - vfio/{iommufd,container}: Remove caps::aw_bits - HostIOMMUDevice: Store the VFIO/VDPA agent - vfio/iommufd: Introduce auto domain creation - vfio/ccw: Don't initialize HOST_IOMMU_DEVICE with mdev - vfio/ap: Don't initialize HOST_IOMMU_DEVICE with mdev - vfio/iommufd: Return errno in iommufd_cdev_attach_ioas_hwpt() - backends/iommufd: Extend iommufd_backend_get_device_info() to fetch HW capabilities - vfio/iommufd: Don't initialize nor set a HOST_IOMMU_DEVICE with mdev - vfio/pci: Extract mdev check into an helper - intel_iommu: Check compatibility with host IOMMU capabilities - intel_iommu: Implement [set|unset]_iommu_device() callbacks - intel_iommu: Extract out vtd_cap_init() to initialize cap/ecap - vfio/pci: Pass HostIOMMUDevice to vIOMMU - hw/pci: Introduce pci_device_[set|unset]_iommu_device() - hw/pci: Introduce helper function pci_device_get_iommu_bus_devfn() - vfio: Create host IOMMU 
device instance - backends/iommufd: Implement HostIOMMUDeviceClass::get_cap() handler - vfio/container: Implement HostIOMMUDeviceClass::get_cap() handler - vfio/iommufd: Implement HostIOMMUDeviceClass::realize() handler - backends/iommufd: Introduce helper function iommufd_backend_get_device_info() - vfio/container: Implement HostIOMMUDeviceClass::realize() handler - range: Introduce range_get_last_bit() - backends/iommufd: Introduce TYPE_HOST_IOMMU_DEVICE_IOMMUFD[_VFIO] devices - vfio/container: Introduce TYPE_HOST_IOMMU_DEVICE_LEGACY_VFIO device - backends/host_iommu_device: Introduce HostIOMMUDeviceCaps - backends: Introduce HostIOMMUDevice abstract - vfio/iommufd: Remove CONFIG_IOMMUFD usage - vfio/spapr: Extend VFIOIOMMUOps with a release handler - vfio/spapr: Only compile sPAPR IOMMU support when needed - vfio/iommufd: Introduce a VFIOIOMMU iommufd QOM interface - vfio/spapr: Introduce a sPAPR VFIOIOMMU QOM interface - vfio/container: Intoduce a new VFIOIOMMUClass::setup handler - vfio/container: Introduce a VFIOIOMMU legacy QOM interface - vfio/container: Introduce a VFIOIOMMU QOM interface - vfio/container: Initialize VFIOIOMMUOps under vfio_init_container() - vfio/container: Introduce vfio_legacy_setup() for further cleanups - docs/devel: Add VFIO iommufd backend documentation - vfio: Introduce a helper function to initialize VFIODevice - vfio/ccw: Move VFIODevice initializations in vfio_ccw_instance_init - vfio/ap: Move VFIODevice initializations in vfio_ap_instance_init - vfio/platform: Move VFIODevice initializations in vfio_platform_instance_init - vfio/pci: Move VFIODevice initializations in vfio_instance_init - hw/i386: Activate IOMMUFD for q35 machines - kconfig: Activate IOMMUFD for s390x machines - hw/arm: Activate IOMMUFD for virt machines - vfio: Make VFIOContainerBase poiner parameter const in VFIOIOMMUOps callbacks - vfio/ccw: Make vfio cdev pre-openable by passing a file handle - vfio/ccw: Allow the selection of a given iommu backend - 
vfio/ap: Make vfio cdev pre-openable by passing a file handle - vfio/ap: Allow the selection of a given iommu backend - vfio/platform: Make vfio cdev pre-openable by passing a file handle - vfio/platform: Allow the selection of a given iommu backend - vfio/pci: Make vfio cdev pre-openable by passing a file handle - vfio/pci: Allow the selection of a given iommu backend - vfio/iommufd: Enable pci hot reset through iommufd cdev interface - vfio/pci: Introduce a vfio pci hot reset interface - vfio/pci: Extract out a helper vfio_pci_get_pci_hot_reset_info - vfio/iommufd: Add support for iova_ranges and pgsizes - vfio/iommufd: Relax assert check for iommufd backend - vfio/iommufd: Implement the iommufd backend - vfio/common: return early if space isn't empty - util/char_dev: Add open_cdev() - backends/iommufd: Introduce the iommufd object - vfio/spapr: Move hostwin_list into spapr container - vfio/spapr: Move prereg_listener into spapr container - vfio/spapr: switch to spapr IOMMU BE add/del_section_window - vfio/spapr: Introduce spapr backend and target interface - vfio/container: Implement attach/detach_device - vfio/container: Move iova_ranges to base container - vfio/container: Move dirty_pgsizes and max_dirty_bitmap_size to base container - vfio/container: Move listener to base container - vfio/container: Move vrdl_list to base container - vfio/container: Move pgsizes and dma_max_mappings to base container - vfio/container: Convert functions to base container - vfio/container: Move per container device list in base container - vfio/container: Switch to IOMMU BE set_dirty_page_tracking/query_dirty_bitmap API - vfio/container: Move space field to base container - vfio/common: Move giommu_list in base container - vfio/common: Introduce vfio_container_init/destroy helper - vfio/container: Switch to dma_map|unmap API - vfio/container: Introduce a empty VFIOIOMMUOps - vfio: Introduce base object for VFIOContainer and targeted interface - cryptodev: Fix error handling in 
cryptodev_lkcf_execute_task() - hw/xen: Fix xen_bus_realize() error handling - hw/misc/aspeed_hace: Fix buffer overflow in has_padding function - target/s390x: Fix a typo in s390_cpu_class_init() - hw/sd/sdhci: free irq on exit - hw/ufs: free irq on exit - hw/pci-host/designware: Fix ATU_UPPER_TARGET register access - target/i386: Make invtsc migratable when user sets tsc-khz explicitly - target/i386: Construct CPUID 2 as stateful iff times > 1 - target/i386: Enable fdp-excptn-only and zero-fcs-fds - target/i386: Don't construct a all-zero entry for CPUID[0xD 0x3f] - i386/cpuid: Remove subleaf constraint on CPUID leaf 1F - target/i386: pass X86CPU to x86_cpu_get_supported_feature_word - target/i386: Raise the highest index value used for any VMCS encoding - target/i386: Add VMX control bits for nested FRED support - target/i386: Delete duplicated macro definition CR4_FRED_MASK - target/i386: Add get/set/migrate support for FRED MSRs - target/i386: enumerate VMX nested-exception support - vmxcap: add support for VMX FRED controls - target/i386: mark CR4.FRED not reserved - target/i386: add support for FRED in CPUID enumeration - target/i386: fix feature dependency for WAITPKG - target/i386: Add more features enumerated by CPUID.7.2.EDX - net: fix build when libbpf is disabled, but libxdp is enabled - hw/nvme: fix invalid endian conversion - hw/nvme: fix invalid check on mcl - backends/cryptodev: Do not ignore throttle/backends Errors - backends/cryptodev: Do not abort for invalid session ID - virtcca: add kvm isolation when get tmi version. 
- qga: Don't daemonize before channel is initialized - qga: Add log to guest-fsfreeze-thaw command - backends: VirtCCA: cvm_gpa_start supports both 1GB and 3GB - BUGFIX: Enforce isolation for virtcca_shared_hugepage - arm: VirtCCA: qemu CoDA support UEFI boot - arm: VirtCCA: Compatibility with older versions of TMM and the kernel - arm: VirtCCA: qemu uefi boot support kae - arm: VirtCCA: CVM support UEFI boot Signed-off-by: Jiabo Feng <fengjiabo1@huawei.com> (cherry picked from commit 85fd7a435d8203dde56fedc4c8f500e41faf132c)
2025-04-22 14:34:58 +08:00
From 630efd6ca2f0c9383223f0ea092abda1c7528f21 Mon Sep 17 00:00:00 2001
From: Joao Martins <joao.m.martins@oracle.com>
Date: Mon, 22 Jul 2024 22:13:18 +0100
Subject: [PATCH] vfio/iommufd: Introduce auto domain creation
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

There are generally two modes of operation for IOMMUFD:

1) The simple user API, which is intended for relatively simple uses of
IOMMUs, e.g. DPDK. The process generally creates an IOAS, attaches it
to VFIO and mainly performs IOAS_MAP and UNMAP.

2) The native IOMMUFD API, where you have fine-grained control of the
IOMMU domain and model it accordingly. This is where most new features
are being steered to.

For dirty tracking, 2) is required, as it needs to ensure that the
stage-2/parent IOMMU domain will only attach devices that support dirty
tracking (so far it is all homogeneous in x86, likely not the case for
smmuv3). Such an invariant on dirty tracking provides a useful guarantee
to VMMs that will refuse incompatible device attachments for IOMMU
domains.
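
To make the invariant concrete, here is a minimal, self-contained C
model. It is not QEMU or kernel code: SketchDevice, SketchDomain and
sketch_attach() are hypothetical names invented for this illustration.
The only behaviour taken from the text above is that a dirty-tracking
domain refuses a device whose IOMMU cannot track dirty pages, which is
reported here as -EINVAL.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are not QEMU or kernel types. */
typedef struct SketchDevice {
    bool iommu_dirty_tracking;  /* IOMMU behind the device tracks dirty pages */
} SketchDevice;

typedef struct SketchDomain {
    bool dirty_tracking;        /* domain was created with dirty tracking */
    unsigned int nr_devices;    /* devices currently attached */
} SketchDomain;

/*
 * Attach a device to a domain. A domain created with dirty tracking only
 * accepts devices that support it; anything else is rejected with -EINVAL
 * and the domain is left unchanged.
 */
int sketch_attach(SketchDomain *domain, const SketchDevice *dev)
{
    assert(domain != NULL && dev != NULL);

    if (domain->dirty_tracking && !dev->iommu_dirty_tracking) {
        return -EINVAL;
    }

    domain->nr_devices++;
    return 0;
}
```

In this model a VMM can rely on every device in a dirty-tracking domain
being able to report dirty pages, because the incompatible attach never
succeeds in the first place.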

The dirty tracking guarantee is enforced via HWPT_ALLOC, which is
responsible for creating an IOMMU domain. This is in contrast to the
'simple API', where the IOMMU domain is created by IOMMUFD automatically
when it attaches to VFIO (usually referred to as autodomains) but it has
the needed handling for mdevs.
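
As a rough illustration of explicit domain creation, the sketch below
fills a request shaped like the struct iommu_hwpt_alloc initializer used
by iommufd_backend_alloc_hwpt() in this patch. SketchHwptAlloc,
sketch_build_hwpt_alloc() and both SKETCH_* constants are hypothetical
stand-ins so that the example compiles without kernel headers; their
values are assumptions, not copies of the uAPI. A real caller would use
the definitions from <linux/iommufd.h> and issue the IOMMU_HWPT_ALLOC
ioctl. Note that this patch itself always passes flags = 0.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in values for illustration only, NOT the kernel's definitions. */
#define SKETCH_HWPT_DATA_NONE      0u
#define SKETCH_HWPT_DIRTY_TRACKING (1u << 1)

/* Local mirror of the fields the patch initializes in its request. */
typedef struct SketchHwptAlloc {
    uint32_t size;
    uint32_t flags;
    uint32_t dev_id;
    uint32_t pt_id;
    uint32_t data_type;
    uint32_t data_len;
    uint64_t data_uptr;
} SketchHwptAlloc;

/*
 * Build a request for a domain on top of an IOAS, optionally asking for
 * dirty tracking. No vendor-specific data is passed.
 */
void sketch_build_hwpt_alloc(SketchHwptAlloc *req, uint32_t dev_id,
                             uint32_t ioas_id, bool want_dirty_tracking)
{
    assert(req != NULL);

    *req = (SketchHwptAlloc) {
        .size = sizeof(*req),
        .flags = want_dirty_tracking ? SKETCH_HWPT_DIRTY_TRACKING : 0,
        .dev_id = dev_id,
        .pt_id = ioas_id,
        .data_type = SKETCH_HWPT_DATA_NONE,
        .data_len = 0,
        .data_uptr = 0,
    };
}
```

The point of the sketch is that the caller, not IOMMUFD, decides which
properties the domain is created with, which is what lets the kernel
enforce them at attach time.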

To support dirty tracking with the advanced IOMMUFD API, QEMU needs
similar logic, where IOMMU domains are created and devices are attached
to compatible domains, essentially mimicking the kernel's
iommufd_device_auto_get_domain(). For mdevs, given there's no IOMMU
domain, it falls back to IOAS attach.
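
The lookup described above can be sketched as a small standalone model:
try every existing domain, treat -EINVAL as "incompatible, keep looking",
and create a new domain only when none fits. All names here
(SketchContainer, sketch_autodomains_get(), the integer "kind" used as a
compatibility class, the fixed-size domain table) are assumptions made
for the example and do not appear in QEMU; the real code uses QLISTs and
ioctls instead.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_MAX_DOMAINS 4

/* Hypothetical stand-ins; not QEMU's VFIOIOASHwpt or VFIODevice. */
typedef struct SketchDomain {
    bool in_use;
    int kind;                   /* compatibility class it can host */
    unsigned int nr_devices;
} SketchDomain;

typedef struct SketchContainer {
    SketchDomain domains[SKETCH_MAX_DOMAINS];
} SketchContainer;

typedef struct SketchDevice {
    int kind;
    SketchDomain *domain;       /* NULL while detached */
} SketchDevice;

/* Models the attach call: an incompatible domain yields -EINVAL. */
int sketch_attach(SketchDevice *dev, SketchDomain *domain)
{
    if (domain->kind != dev->kind) {
        return -EINVAL;
    }
    dev->domain = domain;
    domain->nr_devices++;
    return 0;
}

/* Reuse a compatible domain if one exists, otherwise create one. */
int sketch_autodomains_get(SketchContainer *c, SketchDevice *dev)
{
    SketchDomain *free_slot = NULL;

    assert(dev->domain == NULL);

    for (size_t i = 0; i < SKETCH_MAX_DOMAINS; i++) {
        SketchDomain *d = &c->domains[i];

        if (!d->in_use) {
            if (!free_slot) {
                free_slot = d;
            }
            continue;
        }
        if (sketch_attach(dev, d) == 0) {
            return 0;
        }
        /* -EINVAL is expected here: try the next domain. */
    }

    if (!free_slot) {
        return -ENOSPC;
    }
    free_slot->in_use = true;
    free_slot->kind = dev->kind;
    free_slot->nr_devices = 0;
    return sketch_attach(dev, free_slot);
}

/* Detach the device and release the domain once it has no devices left. */
void sketch_autodomains_put(SketchDevice *dev)
{
    SketchDomain *d = dev->domain;

    assert(d != NULL);
    dev->domain = NULL;
    if (--d->nr_devices == 0) {
        d->in_use = false;
    }
}
```

Two devices of the same kind end up sharing one domain, a device of a
different kind gets its own, and a domain is released when its last
device is put.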

The auto domain logic allows different IOMMU domains to be created:
some where DMA dirty tracking is not desired (and the VF can provide
it), and others where it is. Here it is not used in this way, given how
VFIODevice migration state is initialized after the device attachment.
But such a mixed mode of IOMMU dirty tracking + device dirty tracking is
an improvement that can be added on. Keep the 'all or nothing' type1
approach that we have been using so far between container vs device
dirty tracking.

Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
[ clg: Added ERRP_GUARD() in iommufd_cdev_autodomains_get() ]
Signed-off-by: Cédric Le Goater <clg@redhat.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
[Shameer: Changed ret for iommufd_cdev_autodomains_get() ]
Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
---
 backends/iommufd.c            | 30 +++++++++++++
 backends/trace-events         |  1 +
 hw/vfio/iommufd.c             | 85 +++++++++++++++++++++++++++++++++++
 include/hw/vfio/vfio-common.h |  9 ++++
 include/sysemu/iommufd.h      |  5 +++
 5 files changed, 130 insertions(+)

diff --git a/backends/iommufd.c b/backends/iommufd.c
index 1ce2a24226..0d995d7563 100644
--- a/backends/iommufd.c
+++ b/backends/iommufd.c
@@ -223,6 +223,36 @@ int iommufd_backend_unmap_dma(IOMMUFDBackend *be, uint32_t ioas_id,
     return ret;
 }

+bool iommufd_backend_alloc_hwpt(IOMMUFDBackend *be, uint32_t dev_id,
+                                uint32_t pt_id, uint32_t flags,
+                                uint32_t data_type, uint32_t data_len,
+                                void *data_ptr, uint32_t *out_hwpt,
+                                Error **errp)
+{
+    int ret, fd = be->fd;
+    struct iommu_hwpt_alloc alloc_hwpt = {
+        .size = sizeof(struct iommu_hwpt_alloc),
+        .flags = flags,
+        .dev_id = dev_id,
+        .pt_id = pt_id,
+        .data_type = data_type,
+        .data_len = data_len,
+        .data_uptr = (uintptr_t)data_ptr,
+    };
+
+    ret = ioctl(fd, IOMMU_HWPT_ALLOC, &alloc_hwpt);
+    trace_iommufd_backend_alloc_hwpt(fd, dev_id, pt_id, flags, data_type,
+                                     data_len, (uintptr_t)data_ptr,
+                                     alloc_hwpt.out_hwpt_id, ret);
+    if (ret) {
+        error_setg_errno(errp, errno, "Failed to allocate hwpt");
+        return false;
+    }
+
+    *out_hwpt = alloc_hwpt.out_hwpt_id;
+    return true;
+}
+
 bool iommufd_backend_get_device_info(IOMMUFDBackend *be, uint32_t devid,
                                      uint32_t *type, void *data, uint32_t len,
                                      uint64_t *caps, Error **errp)
diff --git a/backends/trace-events b/backends/trace-events
index d45c6e31a6..e248bf039e 100644
--- a/backends/trace-events
+++ b/backends/trace-events
@@ -14,4 +14,5 @@ iommufd_backend_map_dma(int iommufd, uint32_t ioas, uint64_t iova, uint64_t size
 iommufd_backend_unmap_dma_non_exist(int iommufd, uint32_t ioas, uint64_t iova, uint64_t size, int ret) " Unmap nonexistent mapping: iommufd=%d ioas=%d iova=0x%"PRIx64" size=0x%"PRIx64" (%d)"
 iommufd_backend_unmap_dma(int iommufd, uint32_t ioas, uint64_t iova, uint64_t size, int ret) " iommufd=%d ioas=%d iova=0x%"PRIx64" size=0x%"PRIx64" (%d)"
 iommufd_backend_alloc_ioas(int iommufd, uint32_t ioas, int ret) " iommufd=%d ioas=%d (%d)"
+iommufd_backend_alloc_hwpt(int iommufd, uint32_t dev_id, uint32_t pt_id, uint32_t flags, uint32_t hwpt_type, uint32_t len, uint64_t data_ptr, uint32_t out_hwpt_id, int ret) " iommufd=%d dev_id=%u pt_id=%u flags=0x%x hwpt_type=%u len=%u data_ptr=0x%"PRIx64" out_hwpt=%u (%d)"
 iommufd_backend_free_id(int iommufd, uint32_t id, int ret) " iommufd=%d id=%d (%d)"
diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
index 5e7788ed59..3b75cba26c 100644
--- a/hw/vfio/iommufd.c
+++ b/hw/vfio/iommufd.c
@@ -225,10 +225,89 @@ static int iommufd_cdev_detach_ioas_hwpt(VFIODevice *vbasedev, Error **errp)
     return ret;
 }

+static int iommufd_cdev_autodomains_get(VFIODevice *vbasedev,
+                                        VFIOIOMMUFDContainer *container,
+                                        Error **errp)
+{
+    ERRP_GUARD();
+    IOMMUFDBackend *iommufd = vbasedev->iommufd;
+    uint32_t flags = 0;
+    VFIOIOASHwpt *hwpt;
+    uint32_t hwpt_id;
+    int ret;
+
+    /* Try to find a domain */
+    QLIST_FOREACH(hwpt, &container->hwpt_list, next) {
+        ret = iommufd_cdev_attach_ioas_hwpt(vbasedev, hwpt->hwpt_id, errp);
+        if (ret) {
+            /* -EINVAL means the domain is incompatible with the device. */
+            if (ret == -EINVAL) {
+                /*
+                 * It is an expected failure and it just means we will try
+                 * another domain, or create one if no existing compatible
+                 * domain is found. Hence why the error is discarded below.
+                 */
+                error_free(*errp);
+                *errp = NULL;
+                continue;
+            }
+
+            return ret;
+        } else {
+            vbasedev->hwpt = hwpt;
+            QLIST_INSERT_HEAD(&hwpt->device_list, vbasedev, hwpt_next);
+            return 0;
+        }
+    }
+
+    if (!iommufd_backend_alloc_hwpt(iommufd, vbasedev->devid,
+                                    container->ioas_id, flags,
+                                    IOMMU_HWPT_DATA_NONE, 0, NULL,
+                                    &hwpt_id, errp)) {
+        return -EINVAL;
+    }
+
+    hwpt = g_malloc0(sizeof(*hwpt));
+    hwpt->hwpt_id = hwpt_id;
+    QLIST_INIT(&hwpt->device_list);
+
+    ret = iommufd_cdev_attach_ioas_hwpt(vbasedev, hwpt->hwpt_id, errp);
+    if (ret) {
+        iommufd_backend_free_id(container->be, hwpt->hwpt_id);
+        g_free(hwpt);
+        return ret;
+    }
+
+    vbasedev->hwpt = hwpt;
+    QLIST_INSERT_HEAD(&hwpt->device_list, vbasedev, hwpt_next);
+    QLIST_INSERT_HEAD(&container->hwpt_list, hwpt, next);
+    return 0;
+}
+
+static void iommufd_cdev_autodomains_put(VFIODevice *vbasedev,
+                                         VFIOIOMMUFDContainer *container)
+{
+    VFIOIOASHwpt *hwpt = vbasedev->hwpt;
+
+    QLIST_REMOVE(vbasedev, hwpt_next);
+    vbasedev->hwpt = NULL;
+
+    if (QLIST_EMPTY(&hwpt->device_list)) {
+        QLIST_REMOVE(hwpt, next);
+        iommufd_backend_free_id(container->be, hwpt->hwpt_id);
+        g_free(hwpt);
+    }
+}
+
 static int iommufd_cdev_attach_container(VFIODevice *vbasedev,
                                          VFIOIOMMUFDContainer *container,
                                          Error **errp)
 {
+    /* mdevs aren't physical devices and will fail with auto domains */
+    if (!vbasedev->mdev) {
+        return iommufd_cdev_autodomains_get(vbasedev, container, errp);
+    }
+
     return iommufd_cdev_attach_ioas_hwpt(vbasedev, container->ioas_id, errp);
 }

@@ -240,6 +319,11 @@ static void iommufd_cdev_detach_container(VFIODevice *vbasedev,
     if (iommufd_cdev_detach_ioas_hwpt(vbasedev, &err)) {
         error_report_err(err);
     }
+
+    if (vbasedev->hwpt) {
+        iommufd_cdev_autodomains_put(vbasedev, container);
+    }
+
 }

 static void iommufd_cdev_container_destroy(VFIOIOMMUFDContainer *container)
@@ -375,6 +459,7 @@ static int iommufd_cdev_attach(const char *name, VFIODevice *vbasedev,
     container = g_malloc0(sizeof(*container));
     container->be = vbasedev->iommufd;
     container->ioas_id = ioas_id;
+    QLIST_INIT(&container->hwpt_list);

     bcontainer = &container->bcontainer;
     vfio_container_init(bcontainer, space, iommufd_vioc);
diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index e49e5fabba..2093ed2e91 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -107,10 +107,17 @@ typedef struct VFIOHostDMAWindow {

 typedef struct IOMMUFDBackend IOMMUFDBackend;

+typedef struct VFIOIOASHwpt {
+    uint32_t hwpt_id;
+    QLIST_HEAD(, VFIODevice) device_list;
+    QLIST_ENTRY(VFIOIOASHwpt) next;
+} VFIOIOASHwpt;
+
 typedef struct VFIOIOMMUFDContainer {
     VFIOContainerBase bcontainer;
     IOMMUFDBackend *be;
     uint32_t ioas_id;
+    QLIST_HEAD(, VFIOIOASHwpt) hwpt_list;
 } VFIOIOMMUFDContainer;

 typedef struct VFIODeviceOps VFIODeviceOps;
@@ -144,6 +151,8 @@ typedef struct VFIODevice {
     HostIOMMUDevice *hiod;
     int devid;
     IOMMUFDBackend *iommufd;
+    VFIOIOASHwpt *hwpt;
+    QLIST_ENTRY(VFIODevice) hwpt_next;
 } VFIODevice;

 struct VFIODeviceOps {
diff --git a/include/sysemu/iommufd.h b/include/sysemu/iommufd.h
index a0a0143856..f6f01e4be8 100644
--- a/include/sysemu/iommufd.h
+++ b/include/sysemu/iommufd.h
@@ -52,6 +52,11 @@ int iommufd_backend_unmap_dma(IOMMUFDBackend *be, uint32_t ioas_id,
 bool iommufd_backend_get_device_info(IOMMUFDBackend *be, uint32_t devid,
                                      uint32_t *type, void *data, uint32_t len,
                                      uint64_t *caps, Error **errp);
+bool iommufd_backend_alloc_hwpt(IOMMUFDBackend *be, uint32_t dev_id,
+                                uint32_t pt_id, uint32_t flags,
+                                uint32_t data_type, uint32_t data_len,
+                                void *data_ptr, uint32_t *out_hwpt,
+                                Error **errp);

 #define TYPE_HOST_IOMMU_DEVICE_IOMMUFD TYPE_HOST_IOMMU_DEVICE "-iommufd"
 #endif
--
2.41.0.windows.1