Both vfio_listener_region_add and vfio_listener_region_del have
reference counting operations on ram section->mr. If the 'iova'
and 'llend' of the ram section do not pass the alignment
check, the ram section should not be mapped or unmapped. It means
that the reference counting should not be changed.
However, the address alignment check is missing in
vfio_listener_region_del. This makes memory_region_unref will
be unconditional called and causes unintended problems in some
scenarios.
Signed-off-by: Kunkun Jiang <jiangkunkun@huawei.com>
The 'iova' will be passed to host kernel for mapping with the
HPA. It is related to the host page size. So TARGET_PAGE_ALIGN
should be replaced by REAL_HOST_PAGE_ALIGN. In the case of
large granularity (64K), it may return early when map MMIO RAM
section. And because of the inconsistency with
vfio_dma_unmap_ram_section, it may cause 'assert(qrange)'
in vfio_dma_unmap.
Signed-off-by: Kunkun Jiang <jiangkunkun@huawei.com>
Signed-off-by: Zenghui Yu <yuzenghui@huawei.com>
data might point into the middle of a larger buffer, there is a separate
free_on_destroy pointer passed into bufp_alloc() to handle that. It is
only used in the normal workflow though, not when dropping packets due
to the queue being full. Fix that.
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/491
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <20210722072756.647673-1-kraxel@redhat.com>
Signed-off-by: imxcc <xingchaochao@huawei.com>
vfio: Support host translation granule size
vfio/migrate: Move switch of dirty tracking into vfio_memory_listener
vfio: Fix unregister SaveVMHandler in vfio_migration_finalize
migration/ram: Reduce unnecessary rate limiting
migration/ram: Optimize ram_save_host_page()
qdev/monitors: Fix reundant error_setg of qdev_add_device
linux-headers: update against 5.10 and manual clear vfio dirty log series
vfio: Maintain DMA mapping range for the container
vfio/migration: Add support for manual clear vfio dirty log
hw/arm/smmuv3: Support 16K translation granule
hw/arm/smmuv3: Set the restoration priority of the vSMMUv3 explicitly
hw/vfio/common: trace vfio_connect_container operations
update-linux-headers: Import iommu.h
vfio.h and iommu.h header update against 5.10
memory: Add new fields in IOTLBEntry
hw/arm/smmuv3: Improve stage1 ASID invalidation
hw/arm/smmu-common: Allow domain invalidation for NH_ALL/NSNH_ALL
memory: Add IOMMU_ATTR_VFIO_NESTED IOMMU memory region attribute
memory: Add IOMMU_ATTR_MSI_TRANSLATE IOMMU memory region attribute
memory: Introduce IOMMU Memory Region inject_faults API
iommu: Introduce generic header
pci: introduce PCIPASIDOps to PCIDevice
vfio: Force nested if iommu requires it
vfio: Introduce hostwin_from_range helper
vfio: Introduce helpers to DMA map/unmap a RAM section
vfio: Set up nested stage mappings
vfio: Pass stage 1 MSI bindings to the host
vfio: Helper to get IRQ info including capabilities
vfio/pci: Register handler for iommu fault
vfio/pci: Set up the DMA FAULT region
vfio/pci: Implement the DMA fault handler
hw/arm/smmuv3: Advertise MSI_TRANSLATE attribute
hw/arm/smmuv3: Store the PASID table GPA in the translation config
hw/arm/smmuv3: Fill the IOTLBEntry arch_id on NH_VA invalidation
hw/arm/smmuv3: Fill the IOTLBEntry leaf field on NH_VA invalidation
hw/arm/smmuv3: Pass stage 1 configurations to the host
hw/arm/smmuv3: Implement fault injection
hw/arm/smmuv3: Allow MAP notifiers
pci: Add return_page_response pci ops
vfio/pci: Implement return_page_response page response callback
vfio/common: Avoid unmap ram section at vfio_listener_region_del() in nested mode
vfio: Introduce helpers to mark dirty pages of a RAM section
vfio: Add vfio_prereg_listener_log_sync in nested stage
vfio: Add vfio_prereg_listener_log_clear to re-enable mark dirty pages
vfio: Add vfio_prereg_listener_global_log_start/stop in nested stage
hw/arm/smmuv3: Post-load stage 1 configurations to the host
Signed-off-by: Chen Qun<kuhn.chenqun@huawei.com>
In nested mode, we call the set_pasid_table() callback on each
STE update to pass the guest stage 1 configuration to the host
and apply it at physical level.
In the case of live migration, we need to manually call the
set_pasid_table() to load the guest stage 1 configurations to
the host. If this operation fails, the migration fails.
Signed-off-by: Kunkun Jiang <jiangkunkun@huawei.com>
In nested mode, we set up the stage 2 and stage 1 separately. In my
opinion, vfio_memory_prereg_listener is used for stage 2 and
vfio_memory_listener is used for stage 1. So it feels weird to call
the global_log_start/stop interface in vfio_memory_listener to switch
dirty tracking, although this won't cause any errors. Add
global_log_start/stop interface in vfio_memory_prereg_listener
can separate stage 2 from stage 1.
Signed-off-by: Kunkun Jiang <jiangkunkun@huawei.com>
When tracking dirty pages, we just need to pay attention to stage 2
mappings. Legacy vfio_listener_log_clear cannot be used in nested
stage. This patch adds vfio_prereg_listener_log_clear to re-enable
dirty pages in nested mode.
Signed-off-by: Kunkun Jiang <jiangkunkun@huawei.com>
In nested mode, we set up the stage 2 (gpa->hpa)and stage 1
(giova->gpa) separately by vfio_prereg_listener_region_add()
and vfio_listener_region_add(). So when marking dirty pages
we just need to pay attention to stage 2 mappings.
Legacy vfio_listener_log_sync cannot be used in nested stage.
This patch adds vfio_prereg_listener_log_sync to mark dirty
pages in nested mode.
Signed-off-by: Kunkun Jiang <jiangkunkun@huawei.com>
Extract part of the code from vfio_sync_dirty_bitmap to form a
new helper, which allows to mark dirty pages of a RAM section.
This helper will be called for nested stage.
Signed-off-by: Kunkun Jiang <jiangkunkun@huawei.com>
The ram section will be unmapped at vfio_prereg_listener_region_del()
in nested mode. So let's avoid unmap ram section at
vfio_listener_region_dev().
Signed-off-by: Kunkun Jiang <jiangkunkun@huawei.com>
This patch implements the page response path. The
response is written into the page response ring buffer and then
update header's head index is updated. This path is not used
by this series. It is introduced here as a POC for vSVA/ARM
integration.
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Kunkun Jiang <jiangkunkun@huawei.com>
Add a new PCI operation that allows to return page responses
to registered VFIO devices
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Kunkun Jiang <jiangkunkun@huawei.com>
We now have all bricks to support nested paging. This
uses MAP notifiers to map the MSIs. So let's allow MAP
notifiers to be registered.
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Kunkun Jiang <jiangkunkun@huawei.com>
We convert iommu_fault structs received from the kernel
into the data struct used by the emulation code and record
the evnts into the virtual event queue.
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Kunkun Jiang <jiangkunkun@huawei.com>
In case PASID PciOps are set for the device we call
the set_pasid_table() callback on each STE update.
This allows to pass the guest stage 1 configuration
to the host and apply it at physical level.
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Kunkun Jiang <jiangkunkun@huawei.com>
Let's propagate the leaf attribute throughout the invalidation path.
This hint is used to reduce the scope of the invalidations to the
last level of translation. Not enforcing it induces large performance
penalties in nested mode.
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Kunkun Jiang <jiangkunkun@huawei.com>
When the guest invalidates one S1 entry, it passes the asid.
When propagating this invalidation downto the host, the asid
information also must be passed. So let's fill the arch_id field
introduced for that purpose and accordingly set the flags to
indicate its presence.
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Kunkun Jiang <jiangkunkun@huawei.com>
For VFIO integration we will need to pass the Context Descriptor (CD)
table GPA to the host. The CD table is also referred to as the PASID
table. Its GPA corresponds to the s1ctrptr field of the Stream Table
Entry. So let's decode and store it in the configuration structure.
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Kunkun Jiang <jiangkunkun@huawei.com>
The SMMUv3 has the peculiarity to translate MSI
transactionss. let's advertise the corresponding
attribute.
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Kunkun Jiang <jiangkunkun@huawei.com>
Whenever the eventfd is triggered, we retrieve the DMA fault(s)
from the mmapped fault region and inject them in the iommu
memory region.
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Kunkun Jiang <jiangkunkun@huawei.com>
Set up the fault region which is composed of the actual fault
queue (mmappable) and a header used to handle it. The fault
queue is mmapped.
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Kunkun Jiang <jiangkunkun@huawei.com>
We use the new extended IRQ VFIO_IRQ_TYPE_NESTED type and
VFIO_IRQ_SUBTYPE_DMA_FAULT subtype to set/unset
a notifier for physical DMA faults. The associated eventfd is
triggered, in nested mode, whenever a fault is detected at IOMMU
physical level.
The actual handler will be implemented in subsequent patches.
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Kunkun Jiang <jiangkunkun@huawei.com>
As done for vfio regions, add helpers to retrieve irq info
including their optional capabilities.
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Kunkun Jiang <jiangkunkun@huawei.com>
We register the stage1 MSI bindings when enabling the vectors
and we unregister them on msi disable.
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Kunkun Jiang <jiangkunkun@huawei.com>
In nested mode, legacy vfio_iommu_map_notify cannot be used as
there is no "caching" mode and we do not trap on map.
On Intel, vfio_iommu_map_notify was used to DMA map the RAM
through the host single stage.
With nested mode, we need to setup the stage 2 and the stage 1
separately. This patch introduces a prereg_listener to setup
the stage 2 mapping.
The stage 1 mapping, owned by the guest, is passed to the host
when the guest invalidates the stage 1 configuration, through
a dedicated PCIPASIDOps callback. Guest IOTLB invalidations
are cascaded downto the host through another IOMMU MR UNMAP
notifier.
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Kunkun Jiang <jiangkunkun@huawei.com>
Let's introduce two helpers that allow to DMA map/unmap a RAM
section. Those helpers will be called for nested stage setup in
another call site. Also the vfio_listener_region_add/del()
structure may be clearer.
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Kunkun Jiang <jiangkunkun@huawei.com>
Let's introduce a hostwin_from_range() helper that returns the
hostwin encapsulating an IOVA range or NULL if none is found.
This improves the readibility of callers and removes the usage
of hostwin_found.
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Kunkun Jiang <jiangkunkun@huawei.com>
In case we detect the address space is translated by
a virtual IOMMU which requires HW nested paging to
integrate with VFIO, let's set up the container with
the VFIO_TYPE1_NESTING_IOMMU iommu_type.
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Kunkun Jiang <jiangkunkun@huawei.com>
This patch introduces PCIPASIDOps for IOMMU related operations.
https://lists.gnu.org/archive/html/qemu-devel/2018-03/msg00078.htmlhttps://lists.gnu.org/archive/html/qemu-devel/2018-03/msg00940.html
So far, to setup virt-SVA for assigned SVA capable device, needs to
configure host translation structures for specific pasid. (e.g. bind
guest page table to host and enable nested translation in host).
Besides, vIOMMU emulator needs to forward guest's cache invalidation
to host since host nested translation is enabled. e.g. on VT-d, guest
owns 1st level translation table, thus cache invalidation for 1st
level should be propagated to host.
This patch adds two functions: alloc_pasid and free_pasid to support
guest pasid allocation and free. The implementations of the callbacks
would be device passthru modules. Like vfio.
Cc: Kevin Tian <kevin.tian@intel.com>
Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Eric Auger <eric.auger@redhat.com>
Cc: Yi Sun <yi.y.sun@linux.intel.com>
Cc: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
Signed-off-by: Yi Sun <yi.y.sun@linux.intel.com>
Signed-off-by: Kunkun Jiang <jiangkunkun@huawei.com>
This header is meant to exposes data types used by
several IOMMU devices such as struct for SVA and
nested stage configuration.
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Kunkun Jiang <jiangkunkun@huawei.com>
This new API allows to inject @count iommu_faults into
the IOMMU memory region.
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Kunkun Jiang <jiangkunkun@huawei.com>
We introduce a new IOMMU Memory Region attribute, IOMMU_ATTR_MSI_TRANSLATE
which tells whether the virtual IOMMU translates MSIs. ARM SMMU
will expose this attribute since, as opposed to Intel DMAR, MSIs
are translated as any other DMA requests.
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Kunkun Jiang <jiangkunkun@huawei.com>
We introduce a new IOMMU Memory Region attribute,
IOMMU_ATTR_VFIO_NESTED that tells whether the virtual IOMMU
requires HW nested paging for VFIO integration.
Current Intel virtual IOMMU device supports "Caching
Mode" and does not require 2 stages at physical level to be
integrated with VFIO. However SMMUv3 does not implement such
"caching mode" and requires to use HW nested paging.
As such SMMUv3 is the first IOMMU device to advertise this
attribute.
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Kunkun Jiang <jiangkunkun@huawei.com>
NH_ALL/NSNH_ALL corresponds to a domain granularity invalidation,
ie. all the notifier range gets invalidation, whatever the ASID.
So let's set the granularity to IOMMU_INV_GRAN_DOMAIN to allow
the consumer to benefit from the info if it can.
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Suggested-by: chenxiang (M) <chenxiang66@hisilicon.com>
Signed-off-by: Kunkun Jiang <jiangkunkun@huawei.com>
At the moment ASID invalidation command (CMD_TLBI_NH_ASID) is
propagated as a domain invalidation (the whole notifier range
is invalidated independently on any ASID information).
The new granularity field now allows to be more precise and
restrict the invalidation to a peculiar ASID. Set the corresponding
fields and flag.
We still keep the iova and addr_mask settings for consumers that
do not support the new fields, like VHOST.
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Kunkun Jiang <jiangkunkun@huawei.com>