virtio-scsi: bugfix: fix qemu crash for hotplug scsi disk with dataplane
virtio: net-tap: bugfix: del net client if net_init_tap_one failed
virtio: bugfix: clean up callback when del virtqueue
virtio-net: bugfix: do not delete netdev before virtio net
virtio-net: fix max vring buf size when set ring num
virtio: check descriptor numbers
virtio: bugfix: add rcu_read_lock when vring_avail_idx is called
virtio: print the guest virtio_net features that host does not support
virtio: bugfix: check the value of caches before accessing it
virtio-net: set the max of queue size to 4096
virtio-net: update the default and max of rx/tx_queue_size
vhost-user: add unregister_savevm when vhost-user cleanup
qemu-img: block: don't blk_make_zero if discard_zeroes false
vhost-user: Add support reconnect vhost-user socket
vhost-user: Set the acked_features to vm's feature
vhost-user: add vhost_set_mem_table when vm load_setup at destination
vhost-user: add separate memslot counter for vhost-user
vhost-user: quit infinite loop while used memslots is more than the backend limit
qmp: add command to query used memslots of vhost-net and vhost-user
vhost-user-scsi: add support for SPDK hot upgrade
i6300esb watchdog: bugfix: Add a runstate transition
Signed-off-by: Chen Qun <kuhn.chenqun@huawei.com>
QEMU currently abort()s with the following error:
invalid runstate transition: 'prelaunch' -> 'postmigrate'
Aborted
This happens when:
|<- a watchdog timeout fires and sets reset_requested to
| SHUTDOWN_CAUSE_GUEST_RESET;
|<- the migration thread sets the vm state to RUN_STATE_FINISH_MIGRATE
| before the final stage of migration;
|<- the main thread notices the reset_requested change and triggers the
| reset, then sets the vm state to RUN_STATE_PRELAUNCH;
|<- the migration thread sets the vm state to RUN_STATE_POSTMIGRATE.
Then the 'prelaunch' -> 'postmigrate' runstate transition happens.
This transition is legal, so add it to runstate_transitions_def.
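A minimal sketch of the intended addition to runstate_transitions_def
(neighbouring entries are shown only for context and may differ from the
actual table):

    static const RunStateTransition runstate_transitions_def[] = {
        /* ... existing entries ... */
        { RUN_STATE_PRELAUNCH, RUN_STATE_RUNNING },
        /* new: a watchdog-triggered reset may race with migration completion */
        { RUN_STATE_PRELAUNCH, RUN_STATE_POSTMIGRATE },
        /* ... existing entries ... */
    };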
Signed-off-by: Jinhua Cao <caojinhua1@huawei.com>
When the number of used memslots exceeds the backend limit, attaching
the vhost-user network card fails and QEMU gets stuck in an infinite
loop. Quit the loop instead.
Signed-off-by: Jinhua Cao <caojinhua1@huawei.com>
used_memslots is currently set to dev->mem->nregions, which is correct
for vhost-kernel but not for vhost-user: vhost-user only uses the
memory regions that are backed by a file descriptor, and not all
memory regions have one.
This matters in some scenarios, e.g. when used_memslots is 8 but only
5 memory slots are actually used by vhost-user: hot-plugging a new RAM
region fails because vhost_has_free_slot() returns false, even though
the hot plug would in fact be safe.
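A minimal sketch of the idea, assuming separate per-backend counters
(the counter and helper names here are illustrative, not necessarily
those used in the patch):

    /* Only regions that are backed by a file descriptor count against the
     * vhost-user memslot limit; vhost-kernel counts every region. */
    static unsigned int vhost_user_used_memslots;
    static unsigned int vhost_kernel_used_memslots;

    static bool vhost_backend_has_free_memslots(struct vhost_dev *dev)
    {
        unsigned int used;

        if (dev->vhost_ops->backend_type == VHOST_BACKEND_TYPE_USER) {
            used = vhost_user_used_memslots;
        } else {
            used = vhost_kernel_used_memslots;
        }
        return used < dev->vhost_ops->vhost_backend_memslots_limit(dev);
    }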
Signed-off-by: Jinhua Cao <caojinhua1@huawei.com>
When migrating a huge VM, more than 90 packets are lost.
During load_setup on the destination VM, pass the VM memory structure
to OVS so that the network card can be enabled as soon as migration
finishes its state transition.
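A rough sketch of the idea, assuming a vmstate load_setup hook is
registered for the vhost-user device (the hook name and wiring are
illustrative):

    static int vhost_user_load_setup(QEMUFile *f, void *opaque)
    {
        struct vhost_dev *dev = opaque;

        /* Push the memory table to the backend (OVS) while the destination
         * is still in load_setup, so the datapath can pre-map guest RAM
         * before the switch-over. */
        return dev->vhost_ops->vhost_set_mem_table(dev, dev->mem);
    }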
Signed-off-by: Jinhua Cao <caojinhua1@huawei.com>
Fix the problem that when the VM restarts and OVS restarts, the
network becomes unreachable. The solution is to set acked_features to
the VM's features, just as is done when the guest loads the virtio-net
module.
Signed-off-by: Jinhua Cao <caojinhua1@huawei.com>
Set the maximum tx_queue_size to 4096 even if the backend is not
vhost-user.
Set the default rx/tx_queue_size to 2048 if the backend is vhost-user,
otherwise to 4096.
Signed-off-by: Jinhua Cao <caojinhua1@huawei.com>
The vring caches may be NULL in check_vring_avail_num() if
virtio_reset() is called at the same time, for example while the
virtual machine is starting.
So check them in vring_avail_idx() before accessing them.
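A minimal sketch of the check, based on the upstream shape of
vring_avail_idx() (the fallback value is illustrative):

    static inline uint16_t vring_avail_idx(VirtQueue *vq)
    {
        VRingMemoryRegionCaches *caches = vring_get_region_caches(vq);
        hwaddr pa = offsetof(VRingAvail, idx);

        /* caches may have been cleared by a concurrent virtio_reset() */
        if (!caches) {
            return 0;
        }

        vq->shadow_avail_idx = virtio_lduw_phys_cached(vq->vdev,
                                                       &caches->avail, pa);
        return vq->shadow_avail_idx;
    }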
Signed-off-by: Jinhua Cao <caojinhua1@huawei.com>
vring_avail_idx() should be called within rcu_read_lock(); otherwise
vring_get_region_caches() may return NULL caches and trigger an
assert().
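A minimal sketch of a caller taking the RCU read lock around the read
(the wrapper name is hypothetical):

    static uint16_t virtio_queue_avail_idx_rcu(VirtQueue *vq)
    {
        /* vring_get_region_caches() returns an RCU-protected pointer, so
         * the whole access must sit inside an RCU read-side critical
         * section. */
        RCU_READ_LOCK_GUARD();
        return vring_avail_idx(vq);
    }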
Signed-off-by: Jinhua Cao <caojinhua1@huawei.com>
Check whether the vring num is sane in virtio_save(), and add a LOG
when the VM pushes a wrong vring num down by writing to the I/O port.
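A minimal sketch of the sanity check in virtio_save() (the exact
condition and log text are illustrative):

    for (i = 0; i < VIRTIO_QUEUE_MAX; i++) {
        if (vdev->vq[i].vring.num > VIRTQUEUE_MAX_SIZE) {
            qemu_log_mask(LOG_GUEST_ERROR,
                          "%s: queue %d has invalid vring num %u\n",
                          __func__, i, vdev->vq[i].vring.num);
        }
    }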
Signed-off-by: Jinhua Cao <caojinhua1@huawei.com>
For a vhost-user network card, it is currently allowed to delete its
network backend while the virtio-net device still exists. However,
when the status of the device changes in the guest, QEMU accesses the
network backend and crashes if it is gone.
So do not allow deleting the network backend directly without first
deleting the virtio-net device.
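A minimal sketch of the guard in the netdev hot-unplug path (the error
wording is illustrative):

    /* Refuse to delete a backend whose peer is still a NIC (e.g. a
     * virtio-net device); the device must be unplugged first. */
    if (nc->peer && nc->peer->info->type == NET_CLIENT_DRIVER_NIC) {
        error_setg(errp, "netdev '%s' is still in use by a device, "
                   "delete the device first", nc->name);
        return;
    }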
Signed-off-by: Jinhua Cao <caojinhua1@huawei.com>
We will access a NULL pointer as follows:
1. Start a VM with multiqueue vhost-net.
2. Write VIRTIO_PCI_GUEST_FEATURES in the PCI configuration to disable
multiqueue in this VM, which deletes the virtqueues. In this step, the
tx_bh is deleted but the callback virtio_net_handle_tx_bh still exists.
3. Finally, write VIRTIO_PCI_QUEUE_NOTIFY in the PCI configuration to
notify the deleted virtqueue. virtio_net_handle_tx_bh is then called
and QEMU crashes.
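A minimal sketch of the cleanup in virtio_del_queue() (field names as
in hw/virtio/virtio.c; the cleared handler is the new part):

    void virtio_del_queue(VirtIODevice *vdev, int n)
    {
        if (n < 0 || n >= VIRTIO_QUEUE_MAX) {
            abort();
        }

        vdev->vq[n].vring.num = 0;
        vdev->vq[n].vring.num_default = 0;
        /* new: also drop the handler so notifying the dead queue is a no-op */
        vdev->vq[n].handle_output = NULL;
    }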
Signed-off-by: Jinhua Cao <caojinhua1@huawei.com>
In net_init_tap_one(), if the tap initializes successfully but a later
step fails during vhost-net hot-plug, the tap remains in the net
clients list, causing the next hot-plug to fail as well.
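A rough sketch of the error path in net_init_tap_one() (the failure
condition shown is illustrative):

    TAPState *s = net_tap_fd_init(peer, model, name, fd, vnet_hdr);
    /* ... fd and vhost option setup elided ... */
    if (vhost_net_init(&options) == NULL) {       /* a later step failed */
        error_setg(errp, "vhost-net requested but could not be initialized");
        /* new: remove the half-initialized tap client from the net clients
         * list so the next hot-plug attempt starts from a clean state */
        qemu_del_net_client(&s->nc);
        return -1;
    }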
Signed-off-by: Jinhua Cao <caojinhua1@huawei.com>
The VM triggers a disk sweep operation after a controller whose I/O
type is iothread is plugged in. If a SCSI disk is attached immediately
afterwards, the sg_inquiry request from the VM triggers the assert in
virtio_scsi_ctx_check(), which is called by
virtio_scsi_handle_cmd_req_prepare().
Add a check in virtio_scsi_handle_cmd_req_prepare() and return an I/O
error directly if the device has not been initialized.
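A minimal sketch of the guard added before virtio_scsi_ctx_check() in
virtio_scsi_handle_cmd_req_prepare() (the exact condition is
illustrative):

    SCSIDevice *d = virtio_scsi_device_find(s, req->req.cmd.lun);

    /* The disk was hot-plugged right after the iothread controller and has
     * not been moved to the dataplane AioContext yet: fail the request
     * instead of tripping the assertion in virtio_scsi_ctx_check(). */
    if (d && s->dataplane_started &&
        blk_get_aio_context(d->conf.blk) != s->ctx) {
        req->resp.cmd.response = VIRTIO_SCSI_S_FAILURE;
        virtio_scsi_complete_cmd_req(req);
        return -ENOENT;
    }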
Signed-off-by: Jinhua Cao <caojinhua1@huawei.com>
bugfix: fix some illegal memory access and memory leak
bugfix: fix possible memory leak
bugfix: fix eventfds may double free when vm_id reused in ivshmem
block/mirror: fix file-system went to read-only after block-mirror
bugfix: fix mmio information leak and ehci vm escape 0-day vulnerability
target-i386: Fix the RES memory increase caused by coroutine creation
Signed-off-by: Chen Qun <kuhn.chenqun@huawei.com>
For better performance, change POOL_BATCH_SIZE from 64 to 128.
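The change itself is a one-line constant bump in the coroutine pool
(sketch, location per upstream QEMU):

    /* util/qemu-coroutine.c */
    #define POOL_BATCH_SIZE 128   /* was 64 */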
Signed-off-by: caojinhua <caojinhua1@huawei.com>
Signed-off-by: jiangdongxu <jiangdongxu1@huawei.com>
Configure a VM disk with PRDM and keep the disk writing data
continuously during block-mirror; the file system goes read-only after
block-mirror. Fix it.
Signed-off-by: caojinhua <caojinhua1@huawei.com>
Signed-off-by: jiangdongxu <jiangdongxu1@huawei.com>
As the ivshmem server-client protocol describes, when a client
disconnects from the server, the server sends disconnect notifications
to the other clients, and the other clients free the eventfds of the
disconnected client according to the client ID. If the client ID is
reused, the eventfds may be freed twice.
Solve this by setting eventfds to NULL after freeing them and
allocating memory for them again when they are used.
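A simplified sketch of the fix in hw/misc/ivshmem.c (the ioeventfd
teardown is elided):

    static void close_peer_eventfds(IVShmemState *s, int posn)
    {
        int i, n;

        n = s->peers[posn].nb_eventfds;
        /* ... per-eventfd ioeventfd teardown elided ... */
        for (i = 0; i < n; i++) {
            event_notifier_cleanup(&s->peers[posn].eventfds[i]);
        }

        g_free(s->peers[posn].eventfds);
        /* new: drop the stale pointer so a reused vm_id cannot free it twice;
         * the array is reallocated when the peer ID shows up again */
        s->peers[posn].eventfds = NULL;
        s->peers[posn].nb_eventfds = 0;
    }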
Signed-off-by: Peng Liang <liangpeng10@huawei.com>
Signed-off-by: jiangdongxu <jiangdongxu1@huawei.com>
linux-headers: update against 5.10 and manual clear vfio dirty log series
vfio: Maintain DMA mapping range for the container
vfio/migration: Add support for manual clear vfio dirty log
update-linux-headers: Import iommu.h
vfio.h and iommu.h header update against 5.10
memory: Add new fields in IOTLBEntry
hw/arm/smmuv3: Improve stage1 ASID invalidation
hw/arm/smmu-common: Allow domain invalidation for NH_ALL/NSNH_ALL
memory: Add IOMMU_ATTR_VFIO_NESTED IOMMU memory region attribute
memory: Add IOMMU_ATTR_MSI_TRANSLATE IOMMU memory region attribute
memory: Introduce IOMMU Memory Region inject_faults API
iommu: Introduce generic header
pci: introduce PCIPASIDOps to PCIDevice
vfio: Force nested if iommu requires it
vfio: Introduce hostwin_from_range helper
vfio: Introduce helpers to DMA map/unmap a RAM section
vfio: Set up nested stage mappings
vfio: Pass stage 1 MSI bindings to the host
vfio: Helper to get IRQ info including capabilities
vfio/pci: Register handler for iommu fault
vfio/pci: Set up the DMA FAULT region
vfio/pci: Implement the DMA fault handler
hw/arm/smmuv3: Advertise MSI_TRANSLATE attribute
hw/arm/smmuv3: Store the PASID table GPA in the translation config
hw/arm/smmuv3: Fill the IOTLBEntry arch_id on NH_VA invalidation
hw/arm/smmuv3: Fill the IOTLBEntry leaf field on NH_VA invalidation
hw/arm/smmuv3: Pass stage 1 configurations to the host
hw/arm/smmuv3: Implement fault injection
hw/arm/smmuv3: Allow MAP notifiers
pci: Add return_page_response pci ops
vfio/pci: Implement return_page_response page response callback
vfio/common: Avoid unmap ram section at vfio_listener_region_del() in nested mode
vfio: Introduce helpers to mark dirty pages of a RAM section
vfio: Add vfio_prereg_listener_log_sync in nested stage
vfio: Add vfio_prereg_listener_log_clear to re-enable mark dirty pages
vfio: Add vfio_prereg_listener_global_log_start/stop in nested stage
hw/arm/smmuv3: Post-load stage 1 configurations to the host
vfio/common: Fix incorrect address alignment in vfio_dma_map_ram_section
vfio/common: Add address alignment check in vfio_listener_region_del
Signed-off-by: Chen Qun <kuhn.chenqun@huawei.com>
Signed-off-by: imxcc <xingchaochao@huawei.com>
(cherry picked from commit 45d983f4507f9f6089de83fcd4d3a2136876b566)
Both vfio_listener_region_add and vfio_listener_region_del have
reference counting operations on ram section->mr. If the 'iova'
and 'llend' of the ram section do not pass the alignment
check, the ram section should not be mapped or unmapped. It means
that the reference counting should not be changed.
However, the address alignment check is missing in
vfio_listener_region_del. As a result, memory_region_unref() is called
unconditionally, which causes unintended problems in some scenarios.
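A minimal sketch of the check added at the top of
vfio_listener_region_del(), mirroring the one in
vfio_listener_region_add() (host-page alignment per the previous patch
in this series):

    hwaddr iova;
    Int128 llend;

    iova = REAL_HOST_PAGE_ALIGN(section->offset_within_address_space);
    llend = int128_make64(section->offset_within_address_space);
    llend = int128_add(llend, section->size);
    llend = int128_and(llend, int128_exts64(qemu_real_host_page_mask));

    if (int128_ge(int128_make64(iova), llend)) {
        /* nothing was mapped for this section, so do not unref its mr */
        return;
    }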
Signed-off-by: Kunkun Jiang <jiangkunkun@huawei.com>
Signed-off-by: imxcc <xingchaochao@huawei.com>
(cherry picked from commit c4568a05c1d9f9017c89abc9df4270ce128a9cc3)
The 'iova' is passed to the host kernel to be mapped to the HPA, so it
is related to the host page size, and TARGET_PAGE_ALIGN should be
replaced by REAL_HOST_PAGE_ALIGN. With a large page granularity (64K),
the old alignment may return early when mapping an MMIO RAM section,
and because of the resulting inconsistency with
vfio_dma_unmap_ram_section it may trigger 'assert(qrange)' in
vfio_dma_unmap.
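The essence of the change in vfio_dma_map_ram_section() (sketch):

    /* was: iova = TARGET_PAGE_ALIGN(section->offset_within_address_space); */
    iova = REAL_HOST_PAGE_ALIGN(section->offset_within_address_space);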
Signed-off-by: Kunkun Jiang <jiangkunkun@huawei.com>
Signed-off-by: Zenghui Yu <yuzenghui@huawei.com>
Signed-off-by: imxcc <xingchaochao@huawei.com>
(cherry picked from commit af442e7ad177338fae5a5399de604cf8bef777ee)
In nested mode, we call the set_pasid_table() callback on each
STE update to pass the guest stage 1 configuration to the host
and apply it at physical level.
In the case of live migration, we need to manually call the
set_pasid_table() to load the guest stage 1 configurations to
the host. If this operation fails, the migration fails.
Signed-off-by: Kunkun Jiang <jiangkunkun@huawei.com>
Signed-off-by: imxcc <xingchaochao@huawei.com>
(cherry picked from commit db24e228d7511ab6cb54795db3237bafb275d14e)
In nested mode, we set up the stage 2 and stage 1 separately. In my
opinion, vfio_memory_prereg_listener is used for stage 2 and
vfio_memory_listener is used for stage 1. So it feels weird to call
the global_log_start/stop interface in vfio_memory_listener to switch
dirty tracking, although this won't cause any errors. Adding a
global_log_start/stop interface in vfio_memory_prereg_listener
separates stage 2 from stage 1.
Signed-off-by: Kunkun Jiang <jiangkunkun@huawei.com>
Signed-off-by: imxcc <xingchaochao@huawei.com>
(cherry picked from commit 3608c966eaac58ea54a2f084ddb7d31f4309c8fe)
When tracking dirty pages, we just need to pay attention to stage 2
mappings. Legacy vfio_listener_log_clear cannot be used in nested
stage. This patch adds vfio_prereg_listener_log_clear to re-enable
dirty pages in nested mode.
Signed-off-by: Kunkun Jiang <jiangkunkun@huawei.com>
Signed-off-by: imxcc <xingchaochao@huawei.com>
(cherry picked from commit 9e95ed8404c1d32048d6288100c2a5fcb1ebba75)
In nested mode, we set up the stage 2 (gpa->hpa) and stage 1
(giova->gpa) separately by vfio_prereg_listener_region_add()
and vfio_listener_region_add(). So when marking dirty pages
we just need to pay attention to stage 2 mappings.
Legacy vfio_listener_log_sync cannot be used in nested stage.
This patch adds vfio_prereg_listener_log_sync to mark dirty
pages in nested mode.
Signed-off-by: Kunkun Jiang <jiangkunkun@huawei.com>
Signed-off-by: imxcc <xingchaochao@huawei.com>
(cherry picked from commit 28f87e27c755c532c757ebb590b95817035504f8)
Extract part of the code from vfio_sync_dirty_bitmap to form a
new helper, which allows marking the dirty pages of a RAM section.
This helper will be called for nested stage.
Signed-off-by: Kunkun Jiang <jiangkunkun@huawei.com>
Signed-off-by: imxcc <xingchaochao@huawei.com>
(cherry picked from commit 6c878a81777952114b8c559c51450b19ab9e13d8)
The ram section will be unmapped at vfio_prereg_listener_region_del()
in nested mode. So let's avoid unmapping the RAM section at
vfio_listener_region_del().
Signed-off-by: Kunkun Jiang <jiangkunkun@huawei.com>
Signed-off-by: imxcc <xingchaochao@huawei.com>
(cherry picked from commit 8eba56191ca6910d5501b9d80301898369294aa7)
This patch implements the page response path. The
response is written into the page response ring buffer and then the
header's head index is updated. This path is not used
by this series. It is introduced here as a POC for vSVA/ARM
integration.
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Kunkun Jiang <jiangkunkun@huawei.com>
Signed-off-by: imxcc <xingchaochao@huawei.com>
(cherry picked from commit 043fcb1e27352b4a10af55cd967fa55190ef4b46)
Add a new PCI operation that allows returning page responses to
registered VFIO devices.
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Kunkun Jiang <jiangkunkun@huawei.com>
Signed-off-by: imxcc <xingchaochao@huawei.com>
(cherry picked from commit fe6cd1988a114c63dd1bf6f0a2fcd5770edd6fc6)
We now have all bricks to support nested paging. This
uses MAP notifiers to map the MSIs. So let's allow MAP
notifiers to be registered.
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Kunkun Jiang <jiangkunkun@huawei.com>
Signed-off-by: imxcc <xingchaochao@huawei.com>
(cherry picked from commit 24eaca5a81bd79b9afd8bd2b07a7d493554447c8)
We convert iommu_fault structs received from the kernel
into the data struct used by the emulation code and record
the events into the virtual event queue.
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Kunkun Jiang <jiangkunkun@huawei.com>
Signed-off-by: imxcc <xingchaochao@huawei.com>
(cherry picked from commit df9b0e5fdf7fec20c79e840a36d05254082cc2ec)
In case PASID PciOps are set for the device we call
the set_pasid_table() callback on each STE update.
This allows passing the guest stage 1 configuration
to the host and applying it at the physical level.
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Kunkun Jiang <jiangkunkun@huawei.com>
Signed-off-by: imxcc <xingchaochao@huawei.com>
(cherry picked from commit 6b84c577acac4c464c47edffb669b1b48641cbcc)
Let's propagate the leaf attribute throughout the invalidation path.
This hint is used to reduce the scope of the invalidations to the
last level of translation. Not enforcing it induces large performance
penalties in nested mode.
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Kunkun Jiang <jiangkunkun@huawei.com>
Signed-off-by: imxcc <xingchaochao@huawei.com>
(cherry picked from commit c87109806f14b82c0669d054ba6ada0dafcf95c4)