summaryrefslogtreecommitdiff
path: root/drivers/vhost/vhost.h
AgeCommit message (Collapse)AuthorFilesLines
2023-07-03vhost: Make parameter name match of vhost_get_vq_desc()Xianting Tian1-1/+1
The parameter name in the function declaration and definition should be the same. drivers/vhost/vhost.h, int vhost_get_vq_desc(..., unsigned int iov_count,...); drivers/vhost/vhost.c, int vhost_get_vq_desc(..., unsigned int iov_size,...) Signed-off-by: Xianting Tian <xianting.tian@linux.alibaba.com> Message-Id: <20230621093835.36878-1-xianting.tian@linux.alibaba.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-07-03vhost: Allow worker switching while work is queueingMike Christie1-1/+3
This patch drops the requirement that we can only switch workers if work has not been queued by using RCU for the vq based queueing paths and a mutex for the device wide flush. We can also use this to support SIGKILL properly in the future where we should exit almost immediately after getting that signal. With this patch, when get_signal returns true, we can set the vq->worker to NULL and do a synchronize_rcu to prevent new work from being queued to the vhost_task that has been killed. Signed-off-by: Mike Christie <michael.christie@oracle.com> Message-Id: <20230626232307.97930-18-michael.christie@oracle.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-07-03vhost: allow userspace to create workersMike Christie1-0/+3
For vhost-scsi with 3 vqs or more and a workload that tries to use them in parallel like: fio --filename=/dev/sdb --direct=1 --rw=randrw --bs=4k \ --ioengine=libaio --iodepth=128 --numjobs=3 the single vhost worker thread will become a bottlneck and we are stuck at around 500K IOPs no matter how many jobs, virtqueues, and CPUs are used. To better utilize virtqueues and available CPUs, this patch allows userspace to create workers and bind them to vqs. You can have N workers per dev and also share N workers with M vqs on that dev. This patch adds the interface related code and the next patch will hook vhost-scsi into it. The patches do not try to hook net and vsock into the interface because: 1. multiple workers don't seem to help vsock. The problem is that with only 2 virtqueues we never fully use the existing worker when doing bidirectional tests. This seems to match vhost-scsi where we don't see the worker as a bottleneck until 3 virtqueues are used. 2. net already has a way to use multiple workers. Signed-off-by: Mike Christie <michael.christie@oracle.com> Message-Id: <20230626232307.97930-16-michael.christie@oracle.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-07-03vhost: replace single worker pointer with xarrayMike Christie1-1/+2
The next patch allows userspace to create multiple workers per device, so this patch replaces the vhost_worker pointer with an xarray so we can store mupltiple workers and look them up. Signed-off-by: Mike Christie <michael.christie@oracle.com> Message-Id: <20230626232307.97930-15-michael.christie@oracle.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-07-03vhost: remove vhost_work_queueMike Christie1-3/+2
vhost_work_queue is no longer used. Each driver is using the poll or vq based queueing, so remove vhost_work_queue. Signed-off-by: Mike Christie <michael.christie@oracle.com> Message-Id: <20230626232307.97930-13-michael.christie@oracle.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-07-03vhost: convert poll work to be vq basedMike Christie1-1/+3
This has the drivers pass in their poll to vq mapping and then converts the core poll code to use the vq based helpers. In the next patches we will allow vqs to be handled by different workers, so to allow drivers to execute operations like queue, stop, flush, etc on specific polls/vqs we need to know the mappings. Signed-off-by: Mike Christie <michael.christie@oracle.com> Message-Id: <20230626232307.97930-8-michael.christie@oracle.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-07-03vhost: take worker or vq for flushingMike Christie1-0/+1
This patch has the core work flush function take a worker. When we support multiple workers we can then flush each worker during device removal, stoppage, etc. It also adds a helper to flush specific virtqueues, so vhost-scsi can flush IO vqs from it's ctl vq. Signed-off-by: Mike Christie <michael.christie@oracle.com> Message-Id: <20230626232307.97930-7-michael.christie@oracle.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-07-03vhost: take worker or vq instead of dev for queueingMike Christie1-0/+1
This patch has the core work queueing function take a worker for when we support multiple workers. It also adds a helper that takes a vq during queueing so modules can control which vq/worker to queue work on. This temp leaves vhost_work_queue. It will be removed when the drivers are converted in the next patches. Signed-off-by: Mike Christie <michael.christie@oracle.com> Message-Id: <20230626232307.97930-6-michael.christie@oracle.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-07-03vhost, vhost_net: add helper to check if vq has workMike Christie1-1/+1
In the next patches each vq might have different workers so one could have work but others do not. For net, we only want to check specific vqs, so this adds a helper to check if a vq has work pending and converts vhost-net to use it. Signed-off-by: Mike Christie <michael.christie@oracle.com> Acked-by: Jason Wang <jasowang@redhat.com> Message-Id: <20230626232307.97930-5-michael.christie@oracle.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-07-03vhost: add vhost_worker pointer to vhost_virtqueueMike Christie1-0/+1
This patchset allows userspace to map vqs to different workers. This patch adds a worker pointer to the vq so in later patches in this set we can queue/flush specific vqs and their workers. Signed-off-by: Mike Christie <michael.christie@oracle.com> Message-Id: <20230626232307.97930-4-michael.christie@oracle.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-07-03vhost: dynamically allocate vhost_workerMike Christie1-2/+2
This patchset allows us to allocate multiple workers, so this has us move from the vhost_worker that's embedded in the vhost_dev to dynamically allocating it. Signed-off-by: Mike Christie <michael.christie@oracle.com> Message-Id: <20230626232307.97930-3-michael.christie@oracle.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-06-09vhost: support PACKED when setting-getting vring_baseShannon Nelson1-2/+6
Use the right structs for PACKED or split vqs when setting and getting the vring base. Fixes: 4c8cf31885f6 ("vhost: introduce vDPA-based backend") Signed-off-by: Shannon Nelson <shannon.nelson@amd.com> Message-Id: <20230424225031.18947-3-shannon.nelson@amd.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com>
2023-06-08vhost: Fix crash during early vhost_transport_send_pkt callsMike Christie1-1/+1
If userspace does VHOST_VSOCK_SET_GUEST_CID before VHOST_SET_OWNER we can race where: 1. thread0 calls vhost_transport_send_pkt -> vhost_work_queue 2. thread1 does VHOST_SET_OWNER which calls vhost_worker_create. 3. vhost_worker_create will set the dev->worker pointer before setting the worker->vtsk pointer. 4. thread0's vhost_work_queue will see the dev->worker pointer is set and try to call vhost_task_wake using not yet set worker->vtsk pointer. 5. We then crash since vtsk is NULL. Before commit 6e890c5d5021 ("vhost: use vhost_tasks for worker threads"), we only had the worker pointer so we could just check it to see if VHOST_SET_OWNER has been done. After that commit we have the vhost_worker and vhost_task pointer, so we can now hit the bug above. This patch embeds the vhost_worker in the vhost_dev and moves the work list initialization back to vhost_dev_init, so we can just check the worker.vtsk pointer to check if VHOST_SET_OWNER has been done like before. Fixes: 6e890c5d5021 ("vhost: use vhost_tasks for worker threads") Signed-off-by: Mike Christie <michael.christie@oracle.com> Message-Id: <20230607192338.6041-2-michael.christie@oracle.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reported-by: syzbot+d0d442c22fa8db45ff0e@syzkaller.appspotmail.com Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
2023-03-23vhost: use vhost_tasks for worker threadsMike Christie1-2/+2
For vhost workers we use the kthread API which inherit's its values from and checks against the kthreadd thread. This results in the wrong RLIMITs being checked, so while tools like libvirt try to control the number of threads based on the nproc rlimit setting we can end up creating more threads than the user wanted. This patch has us use the vhost_task helpers which will inherit its values/checks from the thread that owns the device similar to if we did a clone in userspace. The vhost threads will now be counted in the nproc rlimits. And we get features like cgroups and mm sharing automatically, so we can remove those calls. Signed-off-by: Mike Christie <michael.christie@oracle.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-03-23vhost: move worker thread fields to new structMike Christie1-3/+8
This is just a prep patch. It moves the worker related fields to a new vhost_worker struct and moves the code around to create some helpers that will be used in the next patch. Signed-off-by: Mike Christie <michael.christie@oracle.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-02-20vhost: remove unused parameteLiming Wu1-1/+1
"enabled" is defined in vhost_init_device_iotlb, but it is never used. Let's remove it. Signed-off-by: Liming Wu <liming.wu@jaguarmicro.com> Message-Id: <20230110024445.303-1-liming.wu@jaguarmicro.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Reviewed-by: Eugenio Pérez <eperezma@redhat.com>
2023-01-27vhost/net: Clear the pending messages when the backend is removedEric Auger1-0/+1
When the vhost iotlb is used along with a guest virtual iommu and the guest gets rebooted, some MISS messages may have been recorded just before the reboot and spuriously executed by the virtual iommu after the reboot. As vhost does not have any explicit reset user API, VHOST_NET_SET_BACKEND looks a reasonable point where to clear the pending messages, in case the backend is removed. Export vhost_clear_msg() and call it in vhost_net_set_backend() when fd == -1. Signed-off-by: Eric Auger <eric.auger@redhat.com> Suggested-by: Jason Wang <jasowang@redhat.com> Fixes: 6b1e6cc7855b0 ("vhost: new device IOTLB API") Message-Id: <20230117151518.44725-3-eric.auger@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-05-31vhost: rename vhost_work_dev_flushMike Christie1-1/+1
This patch renames vhost_work_dev_flush to just vhost_dev_flush to relfect that it flushes everything on the device and that drivers don't know/care that polls are based on vhost_works. Drivers just flush the entire device and polls, and works for vhost-scsi management TMFs and IO net virtqueues, etc all are flushed. Signed-off-by: Mike Christie <michael.christie@oracle.com> Acked-by: Jason Wang <jasowang@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Message-Id: <20220517180850.198915-9-michael.christie@oracle.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-05-31vhost: get rid of vhost_poll_flush() wrapperAndrey Ryabinin1-1/+0
vhost_poll_flush() is a simple wrapper around vhost_work_dev_flush(). It gives wrong impression that we are doing some work over vhost_poll, while in fact it flushes vhost_poll->dev. It only complicate understanding of the code and leads to mistakes like flushing the same vhost_dev several times in a row. Just remove vhost_poll_flush() and call vhost_work_dev_flush() directly. Signed-off-by: Andrey Ryabinin <arbn@yandex-team.com> [merge vhost_poll_flush removal from Stefano Garzarella] Signed-off-by: Mike Christie <michael.christie@oracle.com> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Acked-by: Jason Wang <jasowang@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Message-Id: <20220517180850.198915-2-michael.christie@oracle.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-05-31vhost: support ASID in IOTLB APIGautam Dawar1-2/+2
This patches allows userspace to send ASID based IOTLB message to vhost. This idea is to use the reserved u32 field in the existing V2 IOTLB message. Vhost device should advertise this capability via VHOST_BACKEND_F_IOTLB_ASID backend feature. Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Gautam Dawar <gdawar@xilinx.com> Message-Id: <20220330180436.24644-10-gdawar@xilinx.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2021-07-03vhost: fix up vhost_work coding styleMike Christie1-3/+3
Switch from a mix of tabs and spaces to just tabs. Signed-off-by: Mike Christie <michael.christie@oracle.com> Link: https://lore.kernel.org/r/20210525174733.6212-6-michael.christie@oracle.com Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2021-07-03vhost: fix poll coding styleMike Christie1-6/+6
We use 3 coding styles in this struct. Switch to just tabs. Signed-off-by: Mike Christie <michael.christie@oracle.com> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Link: https://lore.kernel.org/r/20210525174733.6212-5-michael.christie@oracle.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2021-07-03vhost: remove work arg from vhost_work_flushMike Christie1-1/+1
vhost_work_flush doesn't do anything with the work arg. This patch drops it and then renames vhost_work_flush to vhost_work_dev_flush to reflect that the function flushes all the works in the dev and not just a specific queue or work item. Signed-off-by: Mike Christie <michael.christie@oracle.com> Acked-by: Jason Wang <jasowang@redhat.com> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Link: https://lore.kernel.org/r/20210525174733.6212-2-michael.christie@oracle.com Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2021-07-03vhost: Remove the repeated declarationShaokun Zhang1-1/+0
Function 'vhost_vring_ioctl' is declared twice, remove the repeated declaration. Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Jason Wang <jasowang@redhat.com> Signed-off-by: Shaokun Zhang <zhangshaokun@hisilicon.com> Link: https://lore.kernel.org/r/1621857884-19964-1-git-send-email-zhangshaokun@hisilicon.com Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-11-15vhost: add helper to check if a vq has been setupMike Christie1-0/+1
This adds a helper check if a vq has been setup. The next patches will use this when we move the vhost scsi cmd preallocation from per session to per vq. In the per vq case, we only want to allocate cmds for vqs that have actually been setup and not for all the possible vqs. Signed-off-by: Mike Christie <michael.christie@oracle.com> Link: https://lore.kernel.org/r/1604986403-4931-2-git-send-email-michael.christie@oracle.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
2020-10-21vhost_vdpa: remove unnecessary spin_lock in vhost_vring_callZhu Lingshan1-1/+0
This commit removed unnecessary spin_locks in vhost_vring_call and related operations. Because we manipulate irq offloading contents in vhost_vdpa ioctl code path which is already protected by dev mutex and vq mutex. Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com> Link: https://lore.kernel.org/r/20200909065234.3313-1-lingshan.zhu@intel.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com>
2020-10-21vhost: reduce stack usage in log_usedLi Wang1-0/+1
Fix the warning: [-Werror=-Wframe-larger-than=] drivers/vhost/vhost.c: In function log_used: drivers/vhost/vhost.c:1906:1: warning: the frame size of 1040 bytes is larger than 1024 bytes Signed-off-by: Li Wang <li.wang@windriver.com> Link: https://lore.kernel.org/r/1600106889-25013-1-git-send-email-li.wang@windriver.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com>
2020-08-05vhost: generialize backend features setting/gettingJason Wang1-0/+2
Move the backend features setting/getting from net.c to vhost.c to be reused by vhost-vdpa. Signed-off-by: Jason Wang <jasowang@redhat.com> Link: https://lore.kernel.org/r/20200804162048.22587-3-eli@mellanox.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-08-05vhost: introduce vhost_vring_callZhu Lingshan1-1/+8
This commit introduces struct vhost_vring_call which replaced raw struct eventfd_ctx *call_ctx in struct vhost_virtqueue. Besides eventfd_ctx, it contains a spin lock and an irq_bypass_producer in its structure. Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com> Suggested-by: Jason Wang <jasowang@redhat.com> Link: https://lore.kernel.org/r/20200731065533.4144-2-lingshan.zhu@intel.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-06-04vhost: allow device that does not depend on vhost workerJason Wang1-0/+2
vDPA device currently relays the eventfd via vhost worker. This is inefficient due the latency of wakeup and scheduling, so this patch tries to introduce a use_worker attribute for the vhost device. When use_worker is not set with vhost_dev_init(), vhost won't try to allocate a worker thread and the vhost_poll will be processed directly in the wakeup function. This help for vDPA since it reduces the latency caused by vhost worker. In my testing, it saves 0.2 ms in pings between VMs on a mutual host. Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Link: https://lore.kernel.org/r/20200529080303.15449-2-jasowang@redhat.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-06-02virtio: force spec specified alignment on typesMichael S. Tsirkin1-3/+3
The ring element addresses are passed between components with different alignments assumptions. Thus, if guest/userspace selects a pointer and host then gets and dereferences it, we might need to decrease the compiler-selected alignment to prevent compiler on the host from assuming pointer is aligned. This actually triggers on ARM with -mabi=apcs-gnu - which is a deprecated configuration, but it seems safer to handle this generally. Note that userspace that allocates the memory is actually OK and does not need to be fixed, but userspace that gets it from guest or another process does need to be fixed. The later doesn't generally talk to the kernel so while it might be buggy it's not talking to the kernel in the buggy way - it's just using the header in the buggy way - so fixing header and asking userspace to recompile is the best we can do. I verified that the produced kernel binary on x86 is exactly identical before and after the change. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com>
2020-04-16vhost: Create accessors for virtqueues private_dataEugenio Pérez1-0/+27
Signed-off-by: Eugenio Pérez <eperezma@redhat.com> Link: https://lore.kernel.org/r/20200331192804.6019-2-eperezma@redhat.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-04-01vhost: factor out IOTLBJason Wang1-28/+11
This patch factors out IOTLB into a dedicated module in order to be reused by other modules like vringh. User may choose to enable the automatic retiring by specifying VHOST_IOTLB_FLAG_RETIRE flag to fit for the case of vhost device IOTLB implementation. Signed-off-by: Jason Wang <jasowang@redhat.com> Link: https://lore.kernel.org/r/20200326140125.19794-4-jasowang@redhat.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-04-01vhost: allow per device message handlerJason Wang1-1/+5
This patch allow device to register its own message handler during vhost_dev_init(). vDPA device will use it to implement its own DMA mapping logic. Signed-off-by: Jason Wang <jasowang@redhat.com> Link: https://lore.kernel.org/r/20200326140125.19794-3-jasowang@redhat.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2019-12-04vhost, kcov: collect coverage from vhost_workerAndrey Konovalov1-0/+1
Add kcov_remote_start()/kcov_remote_stop() annotations to the vhost_worker() function, which is responsible for processing vhost works. Since vhost_worker() threads are spawned per vhost device instance the common kcov handle is used for kcov_remote_start()/stop() annotations (see Documentation/dev-tools/kcov.rst for details). As the result kcov can now be used to collect coverage from vhost worker threads. Link: http://lkml.kernel.org/r/e49d5d154e5da6c9ada521d2b7ce10a49ce9f98b.1572366574.git.andreyknvl@google.com Signed-off-by: Andrey Konovalov <andreyknvl@google.com> Cc: Alan Stern <stern@rowland.harvard.edu> Cc: Alexander Potapenko <glider@google.com> Cc: Anders Roxell <anders.roxell@linaro.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: David Windsor <dwindsor@gmail.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Elena Reshetova <elena.reshetova@intel.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Jason Wang <jasowang@redhat.com> Cc: Marco Elver <elver@google.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-09-04Revert "vhost: access vq metadata through kernel virtual address"Michael S. Tsirkin1-41/+0
This reverts commit 7f466032dc ("vhost: access vq metadata through kernel virtual address"). The commit caused a bunch of issues, and while commit 73f628ec9e ("vhost: disable metadata prefetch optimization") disabled the optimization it's not nice to keep lots of dead code around. Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2019-07-26vhost: disable metadata prefetch optimizationMichael S. Tsirkin1-1/+1
This seems to cause guest and host memory corruption. Disable for now until we get a better handle on that. Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2019-06-06vhost: fix clang build warningMichael S. Tsirkin1-2/+5
Clang warns: drivers/vhost/vhost.c:2085:5: warning: macro expansion producing 'defined' has undefined behavior [-Wexpansion-to-defined] #if VHOST_ARCH_CAN_ACCEL_UACCESS ^ drivers/vhost/vhost.h:98:38: note: expanded from macro 'VHOST_ARCH_CAN_ACCEL_UACCESS' #define VHOST_ARCH_CAN_ACCEL_UACCESS defined(CONFIG_MMU_NOTIFIER) && \ ^ It's being pedantic for the sake of portability, but the fix is easy enough. Rework the definition of VHOST_ARCH_CAN_ACCEL_UACCESS to expand to a constant. Fixes: 7f466032dc9e ("vhost: access vq metadata through kernel virtual address") Link: https://github.com/ClangBuiltLinux/linux/issues/508 Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Nathan Chancellor <natechancellor@gmail.com> Tested-by: Nathan Chancellor <natechancellor@gmail.com>
2019-06-05vhost: access vq metadata through kernel virtual addressJason Wang1-0/+38
It was noticed that the copy_to/from_user() friends that was used to access virtqueue metdata tends to be very expensive for dataplane implementation like vhost since it involves lots of software checks, speculation barriers, hardware feature toggling (e.g SMAP). The extra cost will be more obvious when transferring small packets since the time spent on metadata accessing become more significant. This patch tries to eliminate those overheads by accessing them through direct mapping of those pages. Invalidation callbacks is implemented for co-operation with general VM management (swap, KSM, THP or NUMA balancing). We will try to get the direct mapping of vq metadata before each round of packet processing if it doesn't exist. If we fail, we will simplely fallback to copy_to/from_user() friends. This invalidation and direct mapping access are synchronized through spinlock and RCU. All matedata accessing through direct map is protected by RCU, and the setup or invalidation are done under spinlock. This method might does not work for high mem page which requires temporary mapping so we just fallback to normal copy_to/from_user() and may not for arch that has virtual tagged cache since extra cache flushing is needed to eliminate the alias. This will result complex logic and bad performance. For those archs, this patch simply go for copy_to/from_user() friends. This is done by ruling out kernel mapping codes through ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE. Note that this is only done when device IOTLB is not enabled. We could use similar method to optimize IOTLB in the future. Tests shows at most about 23% improvement on TX PPS when using virtio-user + vhost_net + xdp1 + TAP on 2.6GHz Broadwell: SMAP on | SMAP off Before: 5.2Mpps | 7.1Mpps After: 6.4Mpps | 8.2Mpps Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: James Bottomley <James.Bottomley@hansenpartnership.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: David Miller <davem@davemloft.net> Cc: Jerome Glisse <jglisse@redhat.com> Cc: linux-mm@kvack.org Cc: linux-arm-kernel@lists.infradead.org Cc: linux-parisc@vger.kernel.org Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2019-06-05vhost: rename vq_iotlb_prefetch() to vq_meta_prefetch()Jason Wang1-1/+1
Rename the function to be more accurate since it actually tries to prefetch vq metadata address in IOTLB. And this will be used by following patch to prefetch metadata virtual addresses. Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2019-05-27vhost: introduce vhost_exceeds_weight()Jason Wang1-1/+4
We used to have vhost_exceeds_weight() for vhost-net to: - prevent vhost kthread from hogging the cpu - balance the time spent between TX and RX This function could be useful for vsock and scsi as well. So move it to vhost.c. Device must specify a weight which counts the number of requests, or it can also specific a byte_weight which counts the number of bytes that has been processed. Signed-off-by: Jason Wang <jasowang@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2019-01-28vhost: fix OOB in get_rx_bufs()Jason Wang1-1/+3
After batched used ring updating was introduced in commit e2b3b35eb989 ("vhost_net: batch used ring update in rx"). We tend to batch heads in vq->heads for more than one packet. But the quota passed to get_rx_bufs() was not correctly limited, which can result a OOB write in vq->heads. headcount = get_rx_bufs(vq, vq->heads + nvq->done_idx, vhost_len, &in, vq_log, &log, likely(mergeable) ? UIO_MAXIOV : 1); UIO_MAXIOV was still used which is wrong since we could have batched used in vq->heads, this will cause OOB if the next buffer needs more than 960 (1024 (UIO_MAXIOV) - 64 (VHOST_NET_BATCH)) heads after we've batched 64 (VHOST_NET_BATCH) heads: Acked-by: Stefan Hajnoczi <stefanha@redhat.com> ============================================================================= BUG kmalloc-8k (Tainted: G B ): Redzone overwritten ----------------------------------------------------------------------------- INFO: 0x00000000fd93b7a2-0x00000000f0713384. First byte 0xa9 instead of 0xcc INFO: Allocated in alloc_pd+0x22/0x60 age=3933677 cpu=2 pid=2674 kmem_cache_alloc_trace+0xbb/0x140 alloc_pd+0x22/0x60 gen8_ppgtt_create+0x11d/0x5f0 i915_ppgtt_create+0x16/0x80 i915_gem_create_context+0x248/0x390 i915_gem_context_create_ioctl+0x4b/0xe0 drm_ioctl_kernel+0xa5/0xf0 drm_ioctl+0x2ed/0x3a0 do_vfs_ioctl+0x9f/0x620 ksys_ioctl+0x6b/0x80 __x64_sys_ioctl+0x11/0x20 do_syscall_64+0x43/0xf0 entry_SYSCALL_64_after_hwframe+0x44/0xa9 INFO: Slab 0x00000000d13e87af objects=3 used=3 fp=0x (null) flags=0x200000000010201 INFO: Object 0x0000000003278802 @offset=17064 fp=0x00000000e2e6652b Fixing this by allocating UIO_MAXIOV + VHOST_NET_BATCH iovs for vhost-net. This is done through set the limitation through vhost_dev_init(), then set_owner can allocate the number of iov in a per device manner. This fixes CVE-2018-16880. Fixes: e2b3b35eb989 ("vhost_net: batch used ring update in rx") Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-17vhost: log dirty page correctlyJason Wang1-1/+2
Vhost dirty page logging API is designed to sync through GPA. But we try to log GIOVA when device IOTLB is enabled. This is wrong and may lead to missing data after migration. To solve this issue, when logging with device IOTLB enabled, we will: 1) reuse the device IOTLB translation result of GIOVA->HVA mapping to get HVA, for writable descriptor, get HVA through iovec. For used ring update, translate its GIOVA to HVA 2) traverse the GPA->HVA mapping to get the possible GPA and log through GPA. Pay attention this reverse mapping is not guaranteed to be unique, so we should log each possible GPA in this case. This fix the failure of scp to guest during migration. In -next, we will probably support passing GIOVA->GPA instead of GIOVA->HVA. Fixes: 6b1e6cc7855b ("vhost: new device IOTLB API") Reported-by: Jintack Lim <jintack@cs.columbia.edu> Cc: Jintack Lim <jintack@cs.columbia.edu> Signed-off-by: Jason Wang <jasowang@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-06vhost: switch to use new message formatJason Wang1-1/+10
We use to have message like: struct vhost_msg { int type; union { struct vhost_iotlb_msg iotlb; __u8 padding[64]; }; }; Unfortunately, there will be a hole of 32bit in 64bit machine because of the alignment. This leads a different formats between 32bit API and 64bit API. What's more it will break 32bit program running on 64bit machine. So fixing this by introducing a new message type with an explicit 32bit reserved field after type like: struct vhost_msg_v2 { __u32 type; __u32 reserved; union { struct vhost_iotlb_msg iotlb; __u8 padding[64]; }; }; We will have a consistent ABI after switching to use this. To enable this capability, introduce a new ioctl (VHOST_SET_BAKCEND_FEATURE) for userspace to enable this feature (VHOST_BACKEND_F_IOTLB_V2). Fixes: 6b1e6cc7855b ("vhost: new device IOTLB API") Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-11vhost: return bool from *_access_ok() functionsStefan Hajnoczi1-2/+2
Currently vhost *_access_ok() functions return int. This is error-prone because there are two popular conventions: 1. 0 means failure, 1 means success 2. -errno means failure, 0 means success Although vhost mostly uses #1, it does not do so consistently. umem_access_ok() uses #2. This patch changes the return type from int to bool so that false means failure and true means success. This eliminates a potential source of errors. Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-20vhost: fix vhost ioctl signature to build with clangSonny Rao1-2/+2
Clang is particularly anal about signed vs unsigned comparisons and doesn't like the fact that some ioctl numbers set the MSB, so we get this error when trying to build vhost on aarch64: drivers/vhost/vhost.c:1400:7: error: overflow converting case value to switch condition type (3221794578 to 18446744072636378898) [-Werror, -Wswitch] case VHOST_GET_VRING_BASE: 3221794578 is 0xC008AF12 in hex 18446744072636378898 is 0xFFFFFFFFC008AF12 in hex Fix this by using unsigned ints in the function signature for vhost_vring_ioctl(). Signed-off-by: Sonny Rao <sonnyrao@chromium.org> Reviewed-by: Darren Kenny <darren.kenny@oracle.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2018-02-08Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhostLinus Torvalds1-8/+1
Pull virtio/vhost updates from Michael Tsirkin: "virtio, vhost: fixes, cleanups, features This includes the disk/cache memory stats for for the virtio balloon, as well as multiple fixes and cleanups" * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: vhost: don't hold onto file pointer for VHOST_SET_LOG_FD vhost: don't hold onto file pointer for VHOST_SET_VRING_ERR vhost: don't hold onto file pointer for VHOST_SET_VRING_CALL ringtest: ring.c malloc & memset to calloc virtio_vop: don't kfree device on register failure virtio_pci: don't kfree device on register failure virtio: split device_register into device_initialize and device_add vhost: remove unused lock check flag in vhost_dev_cleanup() vhost: Remove the unused variable. virtio_blk: print capacity at probe time virtio: make VIRTIO a menuconfig to ease disabling it all virtio/ringtest: virtio_ring: fix up need_event math virtio/ringtest: fix up need_event math virtio: virtio_mmio: make of_device_ids const. firmware: Use PTR_ERR_OR_ZERO() virtio-mmio: Use PTR_ERR_OR_ZERO() vhost/scsi: Improve a size determination in four functions virtio_balloon: include disk/file caches memory statistics
2018-02-01vhost: don't hold onto file pointer for VHOST_SET_LOG_FDEric Biggers1-1/+0
We already hold a reference to the eventfd_ctx, which is sufficient; there's no need to hold a reference to the struct file as well. So get rid of vhost_dev->log_file. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Jason Wang <jasowang@redhat.com>
2018-02-01vhost: don't hold onto file pointer for VHOST_SET_VRING_ERREric Biggers1-1/+0
We already hold a reference to the eventfd_ctx, which is sufficient; there's no need to hold a reference to the struct file as well. So get rid of vhost_virtqueue->error. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Jason Wang <jasowang@redhat.com>
2018-02-01vhost: don't hold onto file pointer for VHOST_SET_VRING_CALLEric Biggers1-1/+0
We already hold a reference to the eventfd_ctx, which is sufficient; there's no need to hold a reference to the struct file as well. So get rid of vhost_virtqueue->call. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Jason Wang <jasowang@redhat.com>