Message ID | 20230321154804.184577-4-sgarzare@redhat.com
---|---
State | New
Headers |
From: Stefano Garzarella <sgarzare@redhat.com>
To: virtualization@lists.linux-foundation.org
Cc: Jason Wang <jasowang@redhat.com>, kvm@vger.kernel.org, stefanha@redhat.com, linux-kernel@vger.kernel.org, eperezma@redhat.com, "Michael S. Tsirkin" <mst@redhat.com>, Andrey Zhadchenko <andrey.zhadchenko@virtuozzo.com>, netdev@vger.kernel.org
Subject: [PATCH v3 8/8] vdpa_sim: add support for user VA
Date: Tue, 21 Mar 2023 16:48:04 +0100
Message-Id: <20230321154804.184577-4-sgarzare@redhat.com>
In-Reply-To: <20230321154804.184577-1-sgarzare@redhat.com>
References: <20230321154228.182769-1-sgarzare@redhat.com> <20230321154804.184577-1-sgarzare@redhat.com>
Series | vdpa_sim: add support for user VA
Commit Message
Stefano Garzarella
March 21, 2023, 3:48 p.m. UTC
The new "use_va" module parameter (default: true) is used in
vdpa_alloc_device() to inform the vDPA framework that the device
supports VA.
vringh is initialized to use VA only when "use_va" is true and the
user's mm has been bound. So, only when the bus supports user VA
(e.g. vhost-vdpa).
vdpasim_mm_work_fn work is used to serialize the binding to a new
address space when the .bind_mm callback is invoked, and unbinding
when the .unbind_mm callback is invoked.
Call mmget_not_zero()/kthread_use_mm() inside the worker function
to pin the address space only as long as needed, following the
documentation of mmget() in include/linux/sched/mm.h:
* Never use this function to pin this address space for an
* unbounded/indefinite amount of time.
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
---
Notes:
v3:
- called mmget_not_zero() before kthread_use_mm() [Jason]
As the documentation of mmget() in include/linux/sched/mm.h says:
* Never use this function to pin this address space for an
* unbounded/indefinite amount of time.
I moved mmget_not_zero/kthread_use_mm inside the worker function,
so that we pin the address space only as long as needed.
This is similar to what vfio_iommu_type1_dma_rw_chunk() does in
drivers/vfio/vfio_iommu_type1.c
- simplified the mm bind/unbind [Jason]
- renamed vdpasim_worker_change_mm_sync() [Jason]
- fix commit message (s/default: false/default: true)
v2:
- `use_va` set to true by default [Eugenio]
- supported the new unbind_mm callback [Jason]
- removed the unbind_mm call in vdpasim_do_reset() [Jason]
- avoided releasing the lock while calling kthread_flush_work() since we
are now using a mutex to protect the device state
drivers/vdpa/vdpa_sim/vdpa_sim.h | 1 +
drivers/vdpa/vdpa_sim/vdpa_sim.c | 80 +++++++++++++++++++++++++++++++-
2 files changed, 79 insertions(+), 2 deletions(-)
Comments
On Tue, Mar 21, 2023 at 11:48 PM Stefano Garzarella <sgarzare@redhat.com> wrote:
>
> The new "use_va" module parameter (default: true) is used in
> vdpa_alloc_device() to inform the vDPA framework that the device
> supports VA.
>
> vringh is initialized to use VA only when "use_va" is true and the
> user's mm has been bound. So, only when the bus supports user VA
> (e.g. vhost-vdpa).
>
> vdpasim_mm_work_fn work is used to serialize the binding to a new
> address space when the .bind_mm callback is invoked, and unbinding
> when the .unbind_mm callback is invoked.
>
> Call mmget_not_zero()/kthread_use_mm() inside the worker function
> to pin the address space only as long as needed, following the
> documentation of mmget() in include/linux/sched/mm.h:
>
>  * Never use this function to pin this address space for an
>  * unbounded/indefinite amount of time.

I wonder if everything would be simplified if we just allow the parent
to advertise whether or not it requires the address space.

Then when vhost-vDPA probes the device it can simply advertise
use_work as true so vhost core can use get_task_mm() in this case?

Thanks

> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
> ---
>
> [...]
>
> --
> 2.39.2
On Thu, Mar 23, 2023 at 11:42:07AM +0800, Jason Wang wrote:
>On Tue, Mar 21, 2023 at 11:48 PM Stefano Garzarella <sgarzare@redhat.com> wrote:
>>
>> [...]
>
>I wonder if everything would be simplified if we just allow the parent
>to advertise whether or not it requires the address space.
>
>Then when vhost-vDPA probes the device it can simply advertise
>use_work as true so vhost core can use get_task_mm() in this case?

IIUC, setting user_worker to true also creates the kthread in the vhost
core (but we can add another variable to avoid this).

My biggest concern is the comment in include/linux/sched/mm.h.
get_task_mm() uses mmget(), but in the documentation they advise against
pinning the address space indefinitely, so I preferred keeping
mmgrab() in the vhost core and calling mmget_not_zero() in the worker
only when it is running.

In the future maybe mm will be used differently from the parent if
somehow it is supported by the iommu, so I would leave it to the parent
to handle this.

Thanks,
Stefano
On Thu, Mar 23, 2023 at 10:50:06AM +0100, Stefano Garzarella wrote:
> [...]
>
> In the future maybe mm will be used differently from the parent if
> somehow it is supported by the iommu, so I would leave it to the parent
> to handle this.
>
> Thanks,
> Stefano

I think iommufd is supposed to handle all this detail, yes.
On Thu, Mar 23, 2023 at 5:50 PM Stefano Garzarella <sgarzare@redhat.com> wrote:
>
> [...]
>
> My biggest concern is the comment in include/linux/sched/mm.h.
> get_task_mm() uses mmget(), but in the documentation they advise against
> pinning the address space indefinitely, so I preferred keeping
> mmgrab() in the vhost core and calling mmget_not_zero() in the worker
> only when it is running.

Ok.

>
> In the future maybe mm will be used differently from the parent if
> somehow it is supported by the iommu, so I would leave it to the parent
> to handle this.

This should be possible, I was told by Intel that their IOMMU can
access the process page table for shared virtual memory.

Thanks
On 2023/3/21 23:48, Stefano Garzarella wrote:
> The new "use_va" module parameter (default: true) is used in
> vdpa_alloc_device() to inform the vDPA framework that the device
> supports VA.
>
> [...]
>
> diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim.h b/drivers/vdpa/vdpa_sim/vdpa_sim.h
> index 4774292fba8c..3a42887d05d9 100644
> --- a/drivers/vdpa/vdpa_sim/vdpa_sim.h
> +++ b/drivers/vdpa/vdpa_sim/vdpa_sim.h
> @@ -59,6 +59,7 @@ struct vdpasim {
>  	struct vdpasim_virtqueue *vqs;
>  	struct kthread_worker *worker;
>  	struct kthread_work work;
> +	struct mm_struct *mm_bound;
>  	struct vdpasim_dev_attr dev_attr;
>  	/* mutex to synchronize virtqueue state */
>  	struct mutex mutex;
> diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim.c b/drivers/vdpa/vdpa_sim/vdpa_sim.c
> index ab4cfb82c237..23c891cdcd54 100644
> --- a/drivers/vdpa/vdpa_sim/vdpa_sim.c
> +++ b/drivers/vdpa/vdpa_sim/vdpa_sim.c
> @@ -35,10 +35,44 @@ module_param(max_iotlb_entries, int, 0444);
>  MODULE_PARM_DESC(max_iotlb_entries,
>  		 "Maximum number of iotlb entries for each address space. 0 means unlimited. (default: 2048)");
>
> +static bool use_va = true;
> +module_param(use_va, bool, 0444);
> +MODULE_PARM_DESC(use_va, "Enable/disable the device's ability to use VA");
> +
>  #define VDPASIM_QUEUE_ALIGN PAGE_SIZE
>  #define VDPASIM_QUEUE_MAX 256
>  #define VDPASIM_VENDOR_ID 0
>
> +struct vdpasim_mm_work {
> +	struct kthread_work work;
> +	struct vdpasim *vdpasim;
> +	struct mm_struct *mm_to_bind;
> +	int ret;
> +};
> +
> +static void vdpasim_mm_work_fn(struct kthread_work *work)
> +{
> +	struct vdpasim_mm_work *mm_work =
> +		container_of(work, struct vdpasim_mm_work, work);
> +	struct vdpasim *vdpasim = mm_work->vdpasim;
> +
> +	mm_work->ret = 0;
> +
> +	//TODO: should we attach the cgroup of the mm owner?
> +	vdpasim->mm_bound = mm_work->mm_to_bind;
> +}
> +
> +static void vdpasim_worker_change_mm_sync(struct vdpasim *vdpasim,
> +					  struct vdpasim_mm_work *mm_work)
> +{
> +	struct kthread_work *work = &mm_work->work;
> +
> +	kthread_init_work(work, vdpasim_mm_work_fn);
> +	kthread_queue_work(vdpasim->worker, work);
> +
> +	kthread_flush_work(work);
> +}
> +
>  static struct vdpasim *vdpa_to_sim(struct vdpa_device *vdpa)
>  {
>  	return container_of(vdpa, struct vdpasim, vdpa);
> @@ -59,8 +93,10 @@ static void vdpasim_queue_ready(struct vdpasim *vdpasim, unsigned int idx)
>  {
>  	struct vdpasim_virtqueue *vq = &vdpasim->vqs[idx];
>  	uint16_t last_avail_idx = vq->vring.last_avail_idx;
> +	bool va_enabled = use_va && vdpasim->mm_bound;
>
> -	vringh_init_iotlb(&vq->vring, vdpasim->features, vq->num, true, false,
> +	vringh_init_iotlb(&vq->vring, vdpasim->features, vq->num, true,
> +			  va_enabled,
>  			  (struct vring_desc *)(uintptr_t)vq->desc_addr,
>  			  (struct vring_avail *)
>  			  (uintptr_t)vq->driver_addr,
> @@ -130,8 +166,20 @@ static const struct vdpa_config_ops vdpasim_batch_config_ops;
>  static void vdpasim_work_fn(struct kthread_work *work)
>  {
>  	struct vdpasim *vdpasim = container_of(work, struct vdpasim, work);
> +	struct mm_struct *mm = vdpasim->mm_bound;
> +
> +	if (mm) {
> +		if (!mmget_not_zero(mm))
> +			return;

Do we need to check use_va here?

Other than this:

Acked-by: Jason Wang <jasowang@redhat.com>

Thanks

> +		kthread_use_mm(mm);
> +	}
>
>  	vdpasim->dev_attr.work_fn(vdpasim);
> +
> +	if (mm) {
> +		kthread_unuse_mm(mm);
> +		mmput(mm);
> +	}
>  }
>
>  struct vdpasim *vdpasim_create(struct vdpasim_dev_attr *dev_attr,
> @@ -162,7 +210,7 @@ struct vdpasim *vdpasim_create(struct vdpasim_dev_attr *dev_attr,
>  	vdpa = __vdpa_alloc_device(NULL, ops,
>  				   dev_attr->ngroups, dev_attr->nas,
>  				   dev_attr->alloc_size,
> -				   dev_attr->name, false);
> +				   dev_attr->name, use_va);
>  	if (IS_ERR(vdpa)) {
>  		ret = PTR_ERR(vdpa);
>  		goto err_alloc;
> @@ -582,6 +630,30 @@ static int vdpasim_set_map(struct vdpa_device *vdpa, unsigned int asid,
>  	return ret;
>  }
>
> +static int vdpasim_bind_mm(struct vdpa_device *vdpa, struct mm_struct *mm)
> +{
> +	struct vdpasim *vdpasim = vdpa_to_sim(vdpa);
> +	struct vdpasim_mm_work mm_work;
> +
> +	mm_work.vdpasim = vdpasim;
> +	mm_work.mm_to_bind = mm;
> +
> +	vdpasim_worker_change_mm_sync(vdpasim, &mm_work);
> +
> +	return mm_work.ret;
> +}
> +
> +static void vdpasim_unbind_mm(struct vdpa_device *vdpa)
> +{
> +	struct vdpasim *vdpasim = vdpa_to_sim(vdpa);
> +	struct vdpasim_mm_work mm_work;
> +
> +	mm_work.vdpasim = vdpasim;
> +	mm_work.mm_to_bind = NULL;
> +
> +	vdpasim_worker_change_mm_sync(vdpasim, &mm_work);
> +}
> +
>  static int vdpasim_dma_map(struct vdpa_device *vdpa, unsigned int asid,
>  			   u64 iova, u64 size,
>  			   u64 pa, u32 perm, void *opaque)
> @@ -678,6 +750,8 @@ static const struct vdpa_config_ops vdpasim_config_ops = {
>  	.set_group_asid         = vdpasim_set_group_asid,
>  	.dma_map                = vdpasim_dma_map,
>  	.dma_unmap              = vdpasim_dma_unmap,
> +	.bind_mm		= vdpasim_bind_mm,
> +	.unbind_mm		= vdpasim_unbind_mm,
>  	.free                   = vdpasim_free,
>  };
>
> @@ -712,6 +786,8 @@ static const struct vdpa_config_ops vdpasim_batch_config_ops = {
>  	.get_iova_range         = vdpasim_get_iova_range,
>  	.set_group_asid         = vdpasim_set_group_asid,
>  	.set_map                = vdpasim_set_map,
> +	.bind_mm		= vdpasim_bind_mm,
> +	.unbind_mm		= vdpasim_unbind_mm,
>  	.free                   = vdpasim_free,
>  };
On Fri, Mar 24, 2023 at 10:54:39AM +0800, Jason Wang wrote:
>On Thu, Mar 23, 2023 at 5:50 PM Stefano Garzarella <sgarzare@redhat.com> wrote:
>>
>> [...]
>>
>> My biggest concern is the comment in include/linux/sched/mm.h.
>> get_task_mm() uses mmget(), but in the documentation they advise against
>> pinning the address space indefinitely, so I preferred keeping
>> mmgrab() in the vhost core and calling mmget_not_zero() in the worker
>> only when it is running.
>
>Ok.
>
>> In the future maybe mm will be used differently from the parent if
>> somehow it is supported by the iommu, so I would leave it to the parent
>> to handle this.
>
>This should be possible, I was told by Intel that their IOMMU can
>access the process page table for shared virtual memory.

Cool, we should investigate this. Do you have any pointers to their
documentation?

Thanks,
Stefano
On Fri, Mar 24, 2023 at 11:49:32AM +0800, Jason Wang wrote: > >在 2023/3/21 23:48, Stefano Garzarella 写道: >>The new "use_va" module parameter (default: true) is used in >>vdpa_alloc_device() to inform the vDPA framework that the device >>supports VA. >> >>vringh is initialized to use VA only when "use_va" is true and the >>user's mm has been bound. So, only when the bus supports user VA >>(e.g. vhost-vdpa). >> >>vdpasim_mm_work_fn work is used to serialize the binding to a new >>address space when the .bind_mm callback is invoked, and unbinding >>when the .unbind_mm callback is invoked. >> >>Call mmget_not_zero()/kthread_use_mm() inside the worker function >>to pin the address space only as long as needed, following the >>documentation of mmget() in include/linux/sched/mm.h: >> >> * Never use this function to pin this address space for an >> * unbounded/indefinite amount of time. >> >>Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> >>--- >> >>Notes: >> v3: >> - called mmget_not_zero() before kthread_use_mm() [Jason] >> As the documentation of mmget() in include/linux/sched/mm.h says: >> * Never use this function to pin this address space for an >> * unbounded/indefinite amount of time. >> I moved mmget_not_zero/kthread_use_mm inside the worker function, >> this way we pin the address space only as long as needed. 
>>      This is similar to what vfio_iommu_type1_dma_rw_chunk() does in
>>      drivers/vfio/vfio_iommu_type1.c
>>    - simplified the mm bind/unbind [Jason]
>>    - renamed vdpasim_worker_change_mm_sync() [Jason]
>>    - fix commit message (s/default: false/default: true)
>>    v2:
>>    - `use_va` set to true by default [Eugenio]
>>    - supported the new unbind_mm callback [Jason]
>>    - removed the unbind_mm call in vdpasim_do_reset() [Jason]
>>    - avoided to release the lock while call kthread_flush_work() since we
>>      are now using a mutex to protect the device state
>>
>> drivers/vdpa/vdpa_sim/vdpa_sim.h |  1 +
>> drivers/vdpa/vdpa_sim/vdpa_sim.c | 80 +++++++++++++++++++++++++++++++-
>> 2 files changed, 79 insertions(+), 2 deletions(-)
>>
>>diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim.h b/drivers/vdpa/vdpa_sim/vdpa_sim.h
>>index 4774292fba8c..3a42887d05d9 100644
>>--- a/drivers/vdpa/vdpa_sim/vdpa_sim.h
>>+++ b/drivers/vdpa/vdpa_sim/vdpa_sim.h
>>@@ -59,6 +59,7 @@ struct vdpasim {
>> 	struct vdpasim_virtqueue *vqs;
>> 	struct kthread_worker *worker;
>> 	struct kthread_work work;
>>+	struct mm_struct *mm_bound;
>> 	struct vdpasim_dev_attr dev_attr;
>> 	/* mutex to synchronize virtqueue state */
>> 	struct mutex mutex;
>>diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim.c b/drivers/vdpa/vdpa_sim/vdpa_sim.c
>>index ab4cfb82c237..23c891cdcd54 100644
>>--- a/drivers/vdpa/vdpa_sim/vdpa_sim.c
>>+++ b/drivers/vdpa/vdpa_sim/vdpa_sim.c
>>@@ -35,10 +35,44 @@ module_param(max_iotlb_entries, int, 0444);
>> MODULE_PARM_DESC(max_iotlb_entries,
>> 		 "Maximum number of iotlb entries for each address space. 0 means unlimited. (default: 2048)");
>>
>>+static bool use_va = true;
>>+module_param(use_va, bool, 0444);
>>+MODULE_PARM_DESC(use_va, "Enable/disable the device's ability to use VA");
>>+
>> #define VDPASIM_QUEUE_ALIGN PAGE_SIZE
>> #define VDPASIM_QUEUE_MAX 256
>> #define VDPASIM_VENDOR_ID 0
>>
>>+struct vdpasim_mm_work {
>>+	struct kthread_work work;
>>+	struct vdpasim *vdpasim;
>>+	struct mm_struct *mm_to_bind;
>>+	int ret;
>>+};
>>+
>>+static void vdpasim_mm_work_fn(struct kthread_work *work)
>>+{
>>+	struct vdpasim_mm_work *mm_work =
>>+		container_of(work, struct vdpasim_mm_work, work);
>>+	struct vdpasim *vdpasim = mm_work->vdpasim;
>>+
>>+	mm_work->ret = 0;
>>+
>>+	//TODO: should we attach the cgroup of the mm owner?
>>+	vdpasim->mm_bound = mm_work->mm_to_bind;
>>+}
>>+
>>+static void vdpasim_worker_change_mm_sync(struct vdpasim *vdpasim,
>>+					  struct vdpasim_mm_work *mm_work)
>>+{
>>+	struct kthread_work *work = &mm_work->work;
>>+
>>+	kthread_init_work(work, vdpasim_mm_work_fn);
>>+	kthread_queue_work(vdpasim->worker, work);
>>+
>>+	kthread_flush_work(work);
>>+}
>>+
>> static struct vdpasim *vdpa_to_sim(struct vdpa_device *vdpa)
>> {
>> 	return container_of(vdpa, struct vdpasim, vdpa);
>>@@ -59,8 +93,10 @@ static void vdpasim_queue_ready(struct vdpasim *vdpasim, unsigned int idx)
>> {
>> 	struct vdpasim_virtqueue *vq = &vdpasim->vqs[idx];
>> 	uint16_t last_avail_idx = vq->vring.last_avail_idx;
>>+	bool va_enabled = use_va && vdpasim->mm_bound;
>>
>>-	vringh_init_iotlb(&vq->vring, vdpasim->features, vq->num, true, false,
>>+	vringh_init_iotlb(&vq->vring, vdpasim->features, vq->num, true,
>>+			  va_enabled,
>> 			  (struct vring_desc *)(uintptr_t)vq->desc_addr,
>> 			  (struct vring_avail *)
>> 			  (uintptr_t)vq->driver_addr,
>>@@ -130,8 +166,20 @@ static const struct vdpa_config_ops vdpasim_batch_config_ops;
>> static void vdpasim_work_fn(struct kthread_work *work)
>> {
>> 	struct vdpasim *vdpasim = container_of(work, struct vdpasim, work);
>>+	struct mm_struct *mm = vdpasim->mm_bound;
>>+
>>+	if (mm) {
>>+		if (!mmget_not_zero(mm))
>>+			return;
>
>
>Do we need to check use_va here.

Yep, right!

>
>Other than this
>
>Acked-by: Jason Wang <jasowang@redhat.com>

Thanks for the reviews,
Stefano
On Fri, Mar 24, 2023 at 10:43 PM Stefano Garzarella <sgarzare@redhat.com> wrote:
>
> On Fri, Mar 24, 2023 at 10:54:39AM +0800, Jason Wang wrote:
> >On Thu, Mar 23, 2023 at 5:50 PM Stefano Garzarella <sgarzare@redhat.com> wrote:
> >>
> >> On Thu, Mar 23, 2023 at 11:42:07AM +0800, Jason Wang wrote:
> >> >On Tue, Mar 21, 2023 at 11:48 PM Stefano Garzarella <sgarzare@redhat.com> wrote:
> >> >>
> >> >> The new "use_va" module parameter (default: true) is used in
> >> >> vdpa_alloc_device() to inform the vDPA framework that the device
> >> >> supports VA.
> >> >>
> >> >> vringh is initialized to use VA only when "use_va" is true and the
> >> >> user's mm has been bound. So, only when the bus supports user VA
> >> >> (e.g. vhost-vdpa).
> >> >>
> >> >> vdpasim_mm_work_fn work is used to serialize the binding to a new
> >> >> address space when the .bind_mm callback is invoked, and unbinding
> >> >> when the .unbind_mm callback is invoked.
> >> >>
> >> >> Call mmget_not_zero()/kthread_use_mm() inside the worker function
> >> >> to pin the address space only as long as needed, following the
> >> >> documentation of mmget() in include/linux/sched/mm.h:
> >> >>
> >> >>   * Never use this function to pin this address space for an
> >> >>   * unbounded/indefinite amount of time.
> >> >
> >> >I wonder if everything would be simplified if we just allow the parent
> >> >to advertise whether or not it requires the address space.
> >> >
> >> >Then when vhost-vDPA probes the device it can simply advertise
> >> >use_work as true so vhost core can use get_task_mm() in this case?
> >>
> >> IIUC set user_worker to true, it also creates the kthread in the vhost
> >> core (but we can add another variable to avoid this).
> >>
> >> My biggest concern is the comment in include/linux/sched/mm.h.
> >> get_task_mm() uses mmget(), but in the documentation they advise against
> >> pinning the address space indefinitely, so I preferred in keeping
> >> mmgrab() in the vhost core, then call mmget_not_zero() in the worker
> >> only when it is running.
> >
> >Ok.
> >
> >>
> >> In the future maybe mm will be used differently from parent if somehow
> >> it is supported by iommu, so I would leave it to the parent to handle
> >> this.
> >
> >This should be possible, I was told by Intel that their IOMMU can
> >access the process page table for shared virtual memory.
>
> Cool, we should investigate this. Do you have any pointers to their
> documentation?

The VT-d spec, I think.

Thanks

>
> Thanks,
> Stefano
>
diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim.h b/drivers/vdpa/vdpa_sim/vdpa_sim.h
index 4774292fba8c..3a42887d05d9 100644
--- a/drivers/vdpa/vdpa_sim/vdpa_sim.h
+++ b/drivers/vdpa/vdpa_sim/vdpa_sim.h
@@ -59,6 +59,7 @@ struct vdpasim {
 	struct vdpasim_virtqueue *vqs;
 	struct kthread_worker *worker;
 	struct kthread_work work;
+	struct mm_struct *mm_bound;
 	struct vdpasim_dev_attr dev_attr;
 	/* mutex to synchronize virtqueue state */
 	struct mutex mutex;
diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim.c b/drivers/vdpa/vdpa_sim/vdpa_sim.c
index ab4cfb82c237..23c891cdcd54 100644
--- a/drivers/vdpa/vdpa_sim/vdpa_sim.c
+++ b/drivers/vdpa/vdpa_sim/vdpa_sim.c
@@ -35,10 +35,44 @@ module_param(max_iotlb_entries, int, 0444);
 MODULE_PARM_DESC(max_iotlb_entries,
 		 "Maximum number of iotlb entries for each address space. 0 means unlimited. (default: 2048)");
 
+static bool use_va = true;
+module_param(use_va, bool, 0444);
+MODULE_PARM_DESC(use_va, "Enable/disable the device's ability to use VA");
+
 #define VDPASIM_QUEUE_ALIGN PAGE_SIZE
 #define VDPASIM_QUEUE_MAX 256
 #define VDPASIM_VENDOR_ID 0
 
+struct vdpasim_mm_work {
+	struct kthread_work work;
+	struct vdpasim *vdpasim;
+	struct mm_struct *mm_to_bind;
+	int ret;
+};
+
+static void vdpasim_mm_work_fn(struct kthread_work *work)
+{
+	struct vdpasim_mm_work *mm_work =
+		container_of(work, struct vdpasim_mm_work, work);
+	struct vdpasim *vdpasim = mm_work->vdpasim;
+
+	mm_work->ret = 0;
+
+	//TODO: should we attach the cgroup of the mm owner?
+	vdpasim->mm_bound = mm_work->mm_to_bind;
+}
+
+static void vdpasim_worker_change_mm_sync(struct vdpasim *vdpasim,
+					  struct vdpasim_mm_work *mm_work)
+{
+	struct kthread_work *work = &mm_work->work;
+
+	kthread_init_work(work, vdpasim_mm_work_fn);
+	kthread_queue_work(vdpasim->worker, work);
+
+	kthread_flush_work(work);
+}
+
 static struct vdpasim *vdpa_to_sim(struct vdpa_device *vdpa)
 {
 	return container_of(vdpa, struct vdpasim, vdpa);
@@ -59,8 +93,10 @@ static void vdpasim_queue_ready(struct vdpasim *vdpasim, unsigned int idx)
 {
 	struct vdpasim_virtqueue *vq = &vdpasim->vqs[idx];
 	uint16_t last_avail_idx = vq->vring.last_avail_idx;
+	bool va_enabled = use_va && vdpasim->mm_bound;
 
-	vringh_init_iotlb(&vq->vring, vdpasim->features, vq->num, true, false,
+	vringh_init_iotlb(&vq->vring, vdpasim->features, vq->num, true,
+			  va_enabled,
 			  (struct vring_desc *)(uintptr_t)vq->desc_addr,
 			  (struct vring_avail *)
 			  (uintptr_t)vq->driver_addr,
@@ -130,8 +166,20 @@ static const struct vdpa_config_ops vdpasim_batch_config_ops;
 static void vdpasim_work_fn(struct kthread_work *work)
 {
 	struct vdpasim *vdpasim = container_of(work, struct vdpasim, work);
+	struct mm_struct *mm = vdpasim->mm_bound;
+
+	if (mm) {
+		if (!mmget_not_zero(mm))
+			return;
+		kthread_use_mm(mm);
+	}
 
 	vdpasim->dev_attr.work_fn(vdpasim);
+
+	if (mm) {
+		kthread_unuse_mm(mm);
+		mmput(mm);
+	}
 }
 
 struct vdpasim *vdpasim_create(struct vdpasim_dev_attr *dev_attr,
@@ -162,7 +210,7 @@ struct vdpasim *vdpasim_create(struct vdpasim_dev_attr *dev_attr,
 
 	vdpa = __vdpa_alloc_device(NULL, ops,
 				   dev_attr->ngroups, dev_attr->nas,
 				   dev_attr->alloc_size,
-				   dev_attr->name, false);
+				   dev_attr->name, use_va);
 	if (IS_ERR(vdpa)) {
 		ret = PTR_ERR(vdpa);
 		goto err_alloc;
@@ -582,6 +630,30 @@ static int vdpasim_set_map(struct vdpa_device *vdpa, unsigned int asid,
 	return ret;
 }
 
+static int vdpasim_bind_mm(struct vdpa_device *vdpa, struct mm_struct *mm)
+{
+	struct vdpasim *vdpasim = vdpa_to_sim(vdpa);
+	struct vdpasim_mm_work mm_work;
+
+	mm_work.vdpasim = vdpasim;
+	mm_work.mm_to_bind = mm;
+
+	vdpasim_worker_change_mm_sync(vdpasim, &mm_work);
+
+	return mm_work.ret;
+}
+
+static void vdpasim_unbind_mm(struct vdpa_device *vdpa)
+{
+	struct vdpasim *vdpasim = vdpa_to_sim(vdpa);
+	struct vdpasim_mm_work mm_work;
+
+	mm_work.vdpasim = vdpasim;
+	mm_work.mm_to_bind = NULL;
+
+	vdpasim_worker_change_mm_sync(vdpasim, &mm_work);
+}
+
 static int vdpasim_dma_map(struct vdpa_device *vdpa, unsigned int asid,
 			   u64 iova, u64 size,
 			   u64 pa, u32 perm, void *opaque)
@@ -678,6 +750,8 @@ static const struct vdpa_config_ops vdpasim_config_ops = {
 	.set_group_asid         = vdpasim_set_group_asid,
 	.dma_map                = vdpasim_dma_map,
 	.dma_unmap              = vdpasim_dma_unmap,
+	.bind_mm		= vdpasim_bind_mm,
+	.unbind_mm		= vdpasim_unbind_mm,
 	.free                   = vdpasim_free,
 };
 
@@ -712,6 +786,8 @@ static const struct vdpa_config_ops vdpasim_batch_config_ops = {
 	.get_iova_range         = vdpasim_get_iova_range,
 	.set_group_asid         = vdpasim_set_group_asid,
 	.set_map                = vdpasim_set_map,
+	.bind_mm		= vdpasim_bind_mm,
+	.unbind_mm		= vdpasim_unbind_mm,
 	.free                   = vdpasim_free,
 };