Message ID | 1704919215-91319-6-git-send-email-steven.sistare@oracle.com |
---|---|
State | New |
Headers |
Return-Path: <linux-kernel+bounces-22765-ouuuleilei=gmail.com@vger.kernel.org> Delivered-To: ouuuleilei@gmail.com Received: by 2002:a05:7300:2411:b0:101:2151:f287 with SMTP id m17csp1046938dyi; Wed, 10 Jan 2024 12:50:47 -0800 (PST) X-Google-Smtp-Source: AGHT+IHRwmPcPlCIXVMGxXegcuASUHK5/Nt2RZ8GrnBn7NcgYXhgIMoP4d1xsKbgKHA0/bpQmBap X-Received: by 2002:a05:6a20:42a3:b0:19a:2f3f:88b7 with SMTP id o35-20020a056a2042a300b0019a2f3f88b7mr19840pzj.80.1704919847547; Wed, 10 Jan 2024 12:50:47 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1704919847; cv=none; d=google.com; s=arc-20160816; b=s6IMu7qQdq4jiJn1hwK7M935wmqgTXuQFKPVrNREXaArV9L2VWgIjfEVofhbhZqzg+ tBUkOLbp9i3Kpo1vs6UqgIEqSwhEVJvnEkp/Femj0CB236XbFa7Fbwb+KACMfqWO9cQG KOzVJUZlNzxXvgb7R5YI7PbUn6NLAitrO6p1oFnYw2phE9a+qcBRKKIPW8JfNdq1aMji p7HmhXxurLtVmGytq0S6JSaxRFuBXCVWaIb1QSn3vCALt33FEbDvXZ9uGTJknRd1J3lL UqzDXbCYP7dbpuXcupg/fluGOuuyCEBClJEfQkSwlmzVhbIXDDjoiMlcCZDvRlngQEeD cHIg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-unsubscribe:list-subscribe:list-id:precedence:references :in-reply-to:message-id:date:subject:cc:to:from:dkim-signature; bh=dMCCREUha/f4rs7C2GyYgBNQR+v/C2SrOcmk5JSsA44=; fh=NLOUc4UpZ/0A9JrieYT9bxhjFpxay5qeZXON7PgxqoM=; b=C5Kifya4niNJYGaiFkHr790YP4Hbn9vvUmAGSUcJu/8wdXnIaEqRSYVKCnnrN7Dwoy U1EqtPDpXkR1Mr69V4oKOMtBYDvvC1wAJXsCvis1t9zf4z9Z4VkYqkAmlNq0Pga8NajJ LZH8NjYmxNfJWCOXlM2GSUBPuR1R58wxL386FWWq8vzowfrFB6fCmjn/yftuRWNc01ri ChhJMIqmAKdWfBrrGutOjgkI6neE4wxu1jZ7JngX1wVlWjoJTpz/rkFzyWUi84Zz8POA MUr9sMI480lnrZl1m6U0L2a6z5Xvqen+XFLL7pRpqb0FQrRvXCNIx7NPBiDrXcrWvZce Kslg== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@oracle.com header.s=corp-2023-11-20 header.b=DNlXjkqM; spf=pass (google.com: domain of linux-kernel+bounces-22765-ouuuleilei=gmail.com@vger.kernel.org designates 2604:1380:40f1:3f00::1 as permitted sender) smtp.mailfrom="linux-kernel+bounces-22765-ouuuleilei=gmail.com@vger.kernel.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=oracle.com Received: from sy.mirrors.kernel.org (sy.mirrors.kernel.org. [2604:1380:40f1:3f00::1]) by mx.google.com with ESMTPS id h6-20020a63c006000000b005cdc2cc9a1dsi4472972pgg.699.2024.01.10.12.50.47 for <ouuuleilei@gmail.com> (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 10 Jan 2024 12:50:47 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel+bounces-22765-ouuuleilei=gmail.com@vger.kernel.org designates 2604:1380:40f1:3f00::1 as permitted sender) client-ip=2604:1380:40f1:3f00::1; Authentication-Results: mx.google.com; dkim=pass header.i=@oracle.com header.s=corp-2023-11-20 header.b=DNlXjkqM; spf=pass (google.com: domain of linux-kernel+bounces-22765-ouuuleilei=gmail.com@vger.kernel.org designates 2604:1380:40f1:3f00::1 as permitted sender) smtp.mailfrom="linux-kernel+bounces-22765-ouuuleilei=gmail.com@vger.kernel.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=oracle.com Received: from smtp.subspace.kernel.org (wormhole.subspace.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by sy.mirrors.kernel.org (Postfix) with ESMTPS id E0AB6B26B82 for <ouuuleilei@gmail.com>; Wed, 10 Jan 2024 20:45:49 +0000 (UTC) Received: from localhost.localdomain (localhost.localdomain [127.0.0.1]) by smtp.subspace.kernel.org (Postfix) with ESMTP id A62105732D; Wed, 10 Jan 2024 20:40:45 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=oracle.com header.i=@oracle.com header.b="DNlXjkqM" Received: from mx0a-00069f02.pphosted.com (mx0a-00069f02.pphosted.com [205.220.165.32]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id CEE5A5024C for <linux-kernel@vger.kernel.org>; Wed, 10 Jan 2024 20:40:40 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=oracle.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=oracle.com Received: from pps.filterd (m0246617.ppops.net [127.0.0.1]) by mx0b-00069f02.pphosted.com (8.17.1.19/8.17.1.19) with ESMTP id 40AIEKYd001460; Wed, 10 Jan 2024 20:40:23 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=oracle.com; h=from : to : cc : subject : date : message-id : in-reply-to : references; s=corp-2023-11-20; bh=dMCCREUha/f4rs7C2GyYgBNQR+v/C2SrOcmk5JSsA44=; b=DNlXjkqMufFKyp/XRiG5Cp55JWfT76/fidbY873HI1BSxR9/lufWuRag0DQmM+V2JRHi cb+i+vPqrtKD444hrvpvdFbJ6mXPSaV0iSvloT1L1WG+lsdfEe2VYmjiANU3Ram94ZHj X/plyFlaVNoyjFeRBolYwaCPQYZ2I0TI90JEmu95t9EIA2+5ITdVsXy25L5X11j55jkE s8/5V2OXGz/paaYrtwKlFJs4gBoTnxxt/Gtom54weKJnHNZtw0qKHZt60tZBxznl+4aw ppeX+nPvHlFBa2Ob9L7ydSXuQdFiMVJf6mBjWNOq/r5ef0B+KIqYeCChL1AECg799rF8 vQ== Received: from phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com (phxpaimrmta01.appoci.oracle.com [138.1.114.2]) by mx0b-00069f02.pphosted.com (PPS) with ESMTPS id 3vhvgm8x6a-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Wed, 10 Jan 2024 20:40:22 +0000 Received: from pps.filterd (phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com [127.0.0.1]) by phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com (8.17.1.19/8.17.1.19) with ESMTP id 40AJ29e8030110; Wed, 10 Jan 2024 20:40:21 GMT Received: from pps.reinject (localhost [127.0.0.1]) by phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com (PPS) with ESMTPS id 3vfutp5x98-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Wed, 10 Jan 2024 20:40:21 +0000 Received: from phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com (phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com [127.0.0.1]) by pps.reinject (8.17.1.5/8.17.1.5) with ESMTP id 40AKeGrZ005067; Wed, 10 Jan 2024 20:40:21 GMT Received: from ca-dev63.us.oracle.com (ca-dev63.us.oracle.com [10.211.8.221]) by phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com (PPS) with ESMTP id 3vfutp5x5e-6; Wed, 10 Jan 2024 20:40:20 +0000 From: Steve Sistare <steven.sistare@oracle.com> To: virtualization@lists.linux-foundation.org, linux-kernel@vger.kernel.org Cc: "Michael S. Tsirkin" <mst@redhat.com>, Jason Wang <jasowang@redhat.com>, Si-Wei Liu <si-wei.liu@oracle.com>, Eugenio Perez Martin <eperezma@redhat.com>, Xuan Zhuo <xuanzhuo@linux.alibaba.com>, Dragos Tatulea <dtatulea@nvidia.com>, Eli Cohen <elic@nvidia.com>, Xie Yongji <xieyongji@bytedance.com>, Steve Sistare <steven.sistare@oracle.com> Subject: [RFC V1 05/13] vhost-vdpa: VHOST_IOTLB_REMAP Date: Wed, 10 Jan 2024 12:40:07 -0800 Message-Id: <1704919215-91319-6-git-send-email-steven.sistare@oracle.com> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1704919215-91319-1-git-send-email-steven.sistare@oracle.com> References: <1704919215-91319-1-git-send-email-steven.sistare@oracle.com> X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.272,Aquarius:18.0.997,Hydra:6.0.619,FMLib:17.11.176.26 definitions=2024-01-10_10,2024-01-10_01,2023-05-22_02 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 spamscore=0 bulkscore=0 adultscore=0 phishscore=0 malwarescore=0 mlxlogscore=999 suspectscore=0 mlxscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2311290000 definitions=main-2401100163 X-Proofpoint-GUID: F5HIFZ2cDW3w2vviq-JwN_fv0su8Y40M X-Proofpoint-ORIG-GUID: F5HIFZ2cDW3w2vviq-JwN_fv0su8Y40M Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: <linux-kernel.vger.kernel.org> List-Subscribe: <mailto:linux-kernel+subscribe@vger.kernel.org> List-Unsubscribe: <mailto:linux-kernel+unsubscribe@vger.kernel.org> X-getmail-retrieved-from-mailbox: INBOX X-GMAIL-THRID: 1787738034427942716 X-GMAIL-MSGID: 1787738034427942716 |
Series |
vdpa live update
|
|
Commit Message
Steven Sistare
Jan. 10, 2024, 8:40 p.m. UTC
When device ownership is passed to a new process via VHOST_NEW_OWNER,
some devices need to know the new userland addresses of the dma mappings.
Define the new iotlb message type VHOST_IOTLB_REMAP to update the uaddr
of a mapping. The new uaddr must address the same memory object as
originally mapped.
The user must suspend the device before the old address is invalidated,
and cannot resume it until after VHOST_IOTLB_REMAP is called, but this
requirement is not enforced by the API.
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
---
drivers/vhost/vdpa.c | 34 ++++++++++++++++++++++++++++++++
include/uapi/linux/vhost_types.h | 11 ++++++++++-
2 files changed, 44 insertions(+), 1 deletion(-)
Comments
On Thu, Jan 11, 2024 at 4:40 AM Steve Sistare <steven.sistare@oracle.com> wrote: > > When device ownership is passed to a new process via VHOST_NEW_OWNER, > some devices need to know the new userland addresses of the dma mappings. > Define the new iotlb message type VHOST_IOTLB_REMAP to update the uaddr > of a mapping. The new uaddr must address the same memory object as > originally mapped. > > The user must suspend the device before the old address is invalidated, > and cannot resume it until after VHOST_IOTLB_REMAP is called, but this > requirement is not enforced by the API. > > Signed-off-by: Steve Sistare <steven.sistare@oracle.com> > --- > drivers/vhost/vdpa.c | 34 ++++++++++++++++++++++++++++++++ > include/uapi/linux/vhost_types.h | 11 ++++++++++- > 2 files changed, 44 insertions(+), 1 deletion(-) > > diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c > index faed6471934a..ec5ca20bd47d 100644 > --- a/drivers/vhost/vdpa.c > +++ b/drivers/vhost/vdpa.c > @@ -1219,6 +1219,37 @@ static int vhost_vdpa_pa_map(struct vhost_vdpa *v, > > } > > +static int vhost_vdpa_process_iotlb_remap(struct vhost_vdpa *v, > + struct vhost_iotlb *iotlb, > + struct vhost_iotlb_msg *msg) > +{ > + struct vdpa_device *vdpa = v->vdpa; > + const struct vdpa_config_ops *ops = vdpa->config; > + u32 asid = iotlb_to_asid(iotlb); > + u64 start = msg->iova; > + u64 last = start + msg->size - 1; > + struct vhost_iotlb_map *map; > + int r = 0; > + > + if (msg->perm || !msg->size) > + return -EINVAL; > + > + map = vhost_iotlb_itree_first(iotlb, start, last); > + if (!map) > + return -ENOENT; > + > + if (map->start != start || map->last != last) > + return -EINVAL; > + > + /* batch will finish with remap. non-batch must do it now. */ > + if (!v->in_batch) > + r = ops->set_map(vdpa, asid, iotlb); > + if (!r) > + map->addr = msg->uaddr; I may miss something, for example for PA mapping, 1) need to convert uaddr into phys addr 2) need to check whether the uaddr is backed by the same page or not? Thanks > + > + return r; > +} > + > static int vhost_vdpa_process_iotlb_update(struct vhost_vdpa *v, > struct vhost_iotlb *iotlb, > struct vhost_iotlb_msg *msg) > @@ -1298,6 +1329,9 @@ static int vhost_vdpa_process_iotlb_msg(struct vhost_dev *dev, u32 asid, > ops->set_map(vdpa, asid, iotlb); > v->in_batch = false; > break; > + case VHOST_IOTLB_REMAP: > + r = vhost_vdpa_process_iotlb_remap(v, iotlb, msg); > + break; > default: > r = -EINVAL; > break; > diff --git a/include/uapi/linux/vhost_types.h b/include/uapi/linux/vhost_types.h > index 9177843951e9..35908315ff55 100644 > --- a/include/uapi/linux/vhost_types.h > +++ b/include/uapi/linux/vhost_types.h > @@ -79,7 +79,7 @@ struct vhost_iotlb_msg { > /* > * VHOST_IOTLB_BATCH_BEGIN and VHOST_IOTLB_BATCH_END allow modifying > * multiple mappings in one go: beginning with > - * VHOST_IOTLB_BATCH_BEGIN, followed by any number of > + * VHOST_IOTLB_BATCH_BEGIN, followed by any number of VHOST_IOTLB_REMAP or > * VHOST_IOTLB_UPDATE messages, and ending with VHOST_IOTLB_BATCH_END. > * When one of these two values is used as the message type, the rest > * of the fields in the message are ignored. There's no guarantee that > @@ -87,6 +87,15 @@ struct vhost_iotlb_msg { > */ > #define VHOST_IOTLB_BATCH_BEGIN 5 > #define VHOST_IOTLB_BATCH_END 6 > + > +/* > + * VHOST_IOTLB_REMAP registers a new uaddr for the existing mapping at iova. > + * The new uaddr must address the same memory object as originally mapped. > + * Failure to do so will result in user memory corruption and/or device > + * misbehavior. iova and size must match the arguments used to create the > + * an existing mapping. Protection is not changed, and perm must be 0. > + */ > +#define VHOST_IOTLB_REMAP 7 > __u8 type; > }; > > -- > 2.39.3 >
On Wed, Jan 10, 2024 at 9:40 PM Steve Sistare <steven.sistare@oracle.com> wrote: > > When device ownership is passed to a new process via VHOST_NEW_OWNER, > some devices need to know the new userland addresses of the dma mappings. > Define the new iotlb message type VHOST_IOTLB_REMAP to update the uaddr > of a mapping. The new uaddr must address the same memory object as > originally mapped. > I think this new ioctl is very interesting, and can be used to optimize some parts of live migration with shadow virtqueue if it allows to actually replace the uaddr. Would you be interested in that capability? > The user must suspend the device before the old address is invalidated, > and cannot resume it until after VHOST_IOTLB_REMAP is called, but this > requirement is not enforced by the API. > > Signed-off-by: Steve Sistare <steven.sistare@oracle.com> > --- > drivers/vhost/vdpa.c | 34 ++++++++++++++++++++++++++++++++ > include/uapi/linux/vhost_types.h | 11 ++++++++++- > 2 files changed, 44 insertions(+), 1 deletion(-) > > diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c > index faed6471934a..ec5ca20bd47d 100644 > --- a/drivers/vhost/vdpa.c > +++ b/drivers/vhost/vdpa.c > @@ -1219,6 +1219,37 @@ static int vhost_vdpa_pa_map(struct vhost_vdpa *v, > > } > > +static int vhost_vdpa_process_iotlb_remap(struct vhost_vdpa *v, > + struct vhost_iotlb *iotlb, > + struct vhost_iotlb_msg *msg) > +{ > + struct vdpa_device *vdpa = v->vdpa; > + const struct vdpa_config_ops *ops = vdpa->config; > + u32 asid = iotlb_to_asid(iotlb); > + u64 start = msg->iova; > + u64 last = start + msg->size - 1; > + struct vhost_iotlb_map *map; > + int r = 0; > + > + if (msg->perm || !msg->size) > + return -EINVAL; > + > + map = vhost_iotlb_itree_first(iotlb, start, last); > + if (!map) > + return -ENOENT; > + > + if (map->start != start || map->last != last) > + return -EINVAL; > + > + /* batch will finish with remap. non-batch must do it now. */ > + if (!v->in_batch) > + r = ops->set_map(vdpa, asid, iotlb); I'm missing how these cases are handled: * The device does not expose set_map but dma_map / dma_unmap (you can check this case with the simulator) * The device uses platform iommu (!set_map && !dma_unmap vdpa_ops). > + if (!r) > + map->addr = msg->uaddr; > + > + return r; > +} > + > static int vhost_vdpa_process_iotlb_update(struct vhost_vdpa *v, > struct vhost_iotlb *iotlb, > struct vhost_iotlb_msg *msg) > @@ -1298,6 +1329,9 @@ static int vhost_vdpa_process_iotlb_msg(struct vhost_dev *dev, u32 asid, > ops->set_map(vdpa, asid, iotlb); > v->in_batch = false; > break; > + case VHOST_IOTLB_REMAP: > + r = vhost_vdpa_process_iotlb_remap(v, iotlb, msg); > + break; > default: > r = -EINVAL; > break; > diff --git a/include/uapi/linux/vhost_types.h b/include/uapi/linux/vhost_types.h > index 9177843951e9..35908315ff55 100644 > --- a/include/uapi/linux/vhost_types.h > +++ b/include/uapi/linux/vhost_types.h > @@ -79,7 +79,7 @@ struct vhost_iotlb_msg { > /* > * VHOST_IOTLB_BATCH_BEGIN and VHOST_IOTLB_BATCH_END allow modifying > * multiple mappings in one go: beginning with > - * VHOST_IOTLB_BATCH_BEGIN, followed by any number of > + * VHOST_IOTLB_BATCH_BEGIN, followed by any number of VHOST_IOTLB_REMAP or > * VHOST_IOTLB_UPDATE messages, and ending with VHOST_IOTLB_BATCH_END. > * When one of these two values is used as the message type, the rest > * of the fields in the message are ignored. There's no guarantee that > @@ -87,6 +87,15 @@ struct vhost_iotlb_msg { > */ > #define VHOST_IOTLB_BATCH_BEGIN 5 > #define VHOST_IOTLB_BATCH_END 6 > + > +/* > + * VHOST_IOTLB_REMAP registers a new uaddr for the existing mapping at iova. > + * The new uaddr must address the same memory object as originally mapped. > + * Failure to do so will result in user memory corruption and/or device > + * misbehavior. iova and size must match the arguments used to create the > + * an existing mapping. Protection is not changed, and perm must be 0. > + */ > +#define VHOST_IOTLB_REMAP 7 > __u8 type; > }; > > -- > 2.39.3 >
On 1/10/2024 10:08 PM, Jason Wang wrote: > On Thu, Jan 11, 2024 at 4:40 AM Steve Sistare <steven.sistare@oracle.com> wrote: >> >> When device ownership is passed to a new process via VHOST_NEW_OWNER, >> some devices need to know the new userland addresses of the dma mappings. >> Define the new iotlb message type VHOST_IOTLB_REMAP to update the uaddr >> of a mapping. The new uaddr must address the same memory object as >> originally mapped. >> >> The user must suspend the device before the old address is invalidated, >> and cannot resume it until after VHOST_IOTLB_REMAP is called, but this >> requirement is not enforced by the API. >> >> Signed-off-by: Steve Sistare <steven.sistare@oracle.com> >> --- >> drivers/vhost/vdpa.c | 34 ++++++++++++++++++++++++++++++++ >> include/uapi/linux/vhost_types.h | 11 ++++++++++- >> 2 files changed, 44 insertions(+), 1 deletion(-) >> >> diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c >> index faed6471934a..ec5ca20bd47d 100644 >> --- a/drivers/vhost/vdpa.c >> +++ b/drivers/vhost/vdpa.c >> @@ -1219,6 +1219,37 @@ static int vhost_vdpa_pa_map(struct vhost_vdpa *v, >> >> } >> >> +static int vhost_vdpa_process_iotlb_remap(struct vhost_vdpa *v, >> + struct vhost_iotlb *iotlb, >> + struct vhost_iotlb_msg *msg) >> +{ >> + struct vdpa_device *vdpa = v->vdpa; >> + const struct vdpa_config_ops *ops = vdpa->config; >> + u32 asid = iotlb_to_asid(iotlb); >> + u64 start = msg->iova; >> + u64 last = start + msg->size - 1; >> + struct vhost_iotlb_map *map; >> + int r = 0; >> + >> + if (msg->perm || !msg->size) >> + return -EINVAL; >> + >> + map = vhost_iotlb_itree_first(iotlb, start, last); >> + if (!map) >> + return -ENOENT; >> + >> + if (map->start != start || map->last != last) >> + return -EINVAL; >> + >> + /* batch will finish with remap. non-batch must do it now. */ >> + if (!v->in_batch) >> + r = ops->set_map(vdpa, asid, iotlb); >> + if (!r) >> + map->addr = msg->uaddr; > > I may miss something, for example for PA mapping, > > 1) need to convert uaddr into phys addr > 2) need to check whether the uaddr is backed by the same page or not? This code does not verify that the new size@uaddr points to the same physical pages as the old size@uaddr. If the app screws up and they differ, then the app may corrupt its own memory, but no-one else's. It would be expensive for large memories to verify page by page, O(npages), and such verification lies on the critical path for virtual machine downtime during live update. I could compare the properties of the vma(s) for the old size@uaddr vs the vma for the new, but that is more complicated and would be a maintenance headache. When I submitted such code to Alex W when writing the equivalent patches for vfio, he said don't check, correctness is the user's responsibility. - Steve >> + >> + return r; >> +} >> + >> static int vhost_vdpa_process_iotlb_update(struct vhost_vdpa *v, >> struct vhost_iotlb *iotlb, >> struct vhost_iotlb_msg *msg) >> @@ -1298,6 +1329,9 @@ static int vhost_vdpa_process_iotlb_msg(struct vhost_dev *dev, u32 asid, >> ops->set_map(vdpa, asid, iotlb); >> v->in_batch = false; >> break; >> + case VHOST_IOTLB_REMAP: >> + r = vhost_vdpa_process_iotlb_remap(v, iotlb, msg); >> + break; >> default: >> r = -EINVAL; >> break; >> diff --git a/include/uapi/linux/vhost_types.h b/include/uapi/linux/vhost_types.h >> index 9177843951e9..35908315ff55 100644 >> --- a/include/uapi/linux/vhost_types.h >> +++ b/include/uapi/linux/vhost_types.h >> @@ -79,7 +79,7 @@ struct vhost_iotlb_msg { >> /* >> * VHOST_IOTLB_BATCH_BEGIN and VHOST_IOTLB_BATCH_END allow modifying >> * multiple mappings in one go: beginning with >> - * VHOST_IOTLB_BATCH_BEGIN, followed by any number of >> + * VHOST_IOTLB_BATCH_BEGIN, followed by any number of VHOST_IOTLB_REMAP or >> * VHOST_IOTLB_UPDATE messages, and ending with VHOST_IOTLB_BATCH_END. >> * When one of these two values is used as the message type, the rest >> * of the fields in the message are ignored. There's no guarantee that >> @@ -87,6 +87,15 @@ struct vhost_iotlb_msg { >> */ >> #define VHOST_IOTLB_BATCH_BEGIN 5 >> #define VHOST_IOTLB_BATCH_END 6 >> + >> +/* >> + * VHOST_IOTLB_REMAP registers a new uaddr for the existing mapping at iova. >> + * The new uaddr must address the same memory object as originally mapped. >> + * Failure to do so will result in user memory corruption and/or device >> + * misbehavior. iova and size must match the arguments used to create the >> + * an existing mapping. Protection is not changed, and perm must be 0. >> + */ >> +#define VHOST_IOTLB_REMAP 7 >> __u8 type; >> }; >> >> -- >> 2.39.3 >> >
On Thu, Jan 18, 2024 at 4:32 AM Steven Sistare <steven.sistare@oracle.com> wrote: > > On 1/10/2024 10:08 PM, Jason Wang wrote: > > On Thu, Jan 11, 2024 at 4:40 AM Steve Sistare <steven.sistare@oracle.com> wrote: > >> > >> When device ownership is passed to a new process via VHOST_NEW_OWNER, > >> some devices need to know the new userland addresses of the dma mappings. > >> Define the new iotlb message type VHOST_IOTLB_REMAP to update the uaddr > >> of a mapping. The new uaddr must address the same memory object as > >> originally mapped. > >> > >> The user must suspend the device before the old address is invalidated, > >> and cannot resume it until after VHOST_IOTLB_REMAP is called, but this > >> requirement is not enforced by the API. > >> > >> Signed-off-by: Steve Sistare <steven.sistare@oracle.com> > >> --- > >> drivers/vhost/vdpa.c | 34 ++++++++++++++++++++++++++++++++ > >> include/uapi/linux/vhost_types.h | 11 ++++++++++- > >> 2 files changed, 44 insertions(+), 1 deletion(-) > >> > >> diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c > >> index faed6471934a..ec5ca20bd47d 100644 > >> --- a/drivers/vhost/vdpa.c > >> +++ b/drivers/vhost/vdpa.c > >> @@ -1219,6 +1219,37 @@ static int vhost_vdpa_pa_map(struct vhost_vdpa *v, > >> > >> } > >> > >> +static int vhost_vdpa_process_iotlb_remap(struct vhost_vdpa *v, > >> + struct vhost_iotlb *iotlb, > >> + struct vhost_iotlb_msg *msg) > >> +{ > >> + struct vdpa_device *vdpa = v->vdpa; > >> + const struct vdpa_config_ops *ops = vdpa->config; > >> + u32 asid = iotlb_to_asid(iotlb); > >> + u64 start = msg->iova; > >> + u64 last = start + msg->size - 1; > >> + struct vhost_iotlb_map *map; > >> + int r = 0; > >> + > >> + if (msg->perm || !msg->size) > >> + return -EINVAL; > >> + > >> + map = vhost_iotlb_itree_first(iotlb, start, last); > >> + if (!map) > >> + return -ENOENT; > >> + > >> + if (map->start != start || map->last != last) > >> + return -EINVAL; > >> + > >> + /* batch will finish with remap. non-batch must do it now. */ > >> + if (!v->in_batch) > >> + r = ops->set_map(vdpa, asid, iotlb); > >> + if (!r) > >> + map->addr = msg->uaddr; > > > > I may miss something, for example for PA mapping, > > > > 1) need to convert uaddr into phys addr > > 2) need to check whether the uaddr is backed by the same page or not? > > This code does not verify that the new size@uaddr points to the same physical > pages as the old size@uaddr. If the app screws up and they differ, then the app > may corrupt its own memory, but no-one else's. > > It would be expensive for large memories to verify page by page, O(npages), and such > verification lies on the critical path for virtual machine downtime during live update. > I could compare the properties of the vma(s) for the old size@uaddr vs the vma for the > new, but that is more complicated and would be a maintenance headache. When I submitted > such code to Alex W when writing the equivalent patches for vfio, he said don't check, > correctness is the user's responsibility. Ok, let's document this somewhere. Thanks > > - Steve > > >> + > >> + return r; > >> +} > >> + > >> static int vhost_vdpa_process_iotlb_update(struct vhost_vdpa *v, > >> struct vhost_iotlb *iotlb, > >> struct vhost_iotlb_msg *msg) > >> @@ -1298,6 +1329,9 @@ static int vhost_vdpa_process_iotlb_msg(struct vhost_dev *dev, u32 asid, > >> ops->set_map(vdpa, asid, iotlb); > >> v->in_batch = false; > >> break; > >> + case VHOST_IOTLB_REMAP: > >> + r = vhost_vdpa_process_iotlb_remap(v, iotlb, msg); > >> + break; > >> default: > >> r = -EINVAL; > >> break; > >> diff --git a/include/uapi/linux/vhost_types.h b/include/uapi/linux/vhost_types.h > >> index 9177843951e9..35908315ff55 100644 > >> --- a/include/uapi/linux/vhost_types.h > >> +++ b/include/uapi/linux/vhost_types.h > >> @@ -79,7 +79,7 @@ struct vhost_iotlb_msg { > >> /* > >> * VHOST_IOTLB_BATCH_BEGIN and VHOST_IOTLB_BATCH_END allow modifying > >> * multiple mappings in one go: beginning with > >> - * VHOST_IOTLB_BATCH_BEGIN, followed by any number of > >> + * VHOST_IOTLB_BATCH_BEGIN, followed by any number of VHOST_IOTLB_REMAP or > >> * VHOST_IOTLB_UPDATE messages, and ending with VHOST_IOTLB_BATCH_END. > >> * When one of these two values is used as the message type, the rest > >> * of the fields in the message are ignored. There's no guarantee that > >> @@ -87,6 +87,15 @@ struct vhost_iotlb_msg { > >> */ > >> #define VHOST_IOTLB_BATCH_BEGIN 5 > >> #define VHOST_IOTLB_BATCH_END 6 > >> + > >> +/* > >> + * VHOST_IOTLB_REMAP registers a new uaddr for the existing mapping at iova. > >> + * The new uaddr must address the same memory object as originally mapped. > >> + * Failure to do so will result in user memory corruption and/or device > >> + * misbehavior. iova and size must match the arguments used to create the > >> + * an existing mapping. Protection is not changed, and perm must be 0. > >> + */ > >> +#define VHOST_IOTLB_REMAP 7 > >> __u8 type; > >> }; > >> > >> -- > >> 2.39.3 > >> > > >
On 1/16/2024 1:14 PM, Eugenio Perez Martin wrote: > On Wed, Jan 10, 2024 at 9:40 PM Steve Sistare <steven.sistare@oracle.com> wrote: >> >> When device ownership is passed to a new process via VHOST_NEW_OWNER, >> some devices need to know the new userland addresses of the dma mappings. >> Define the new iotlb message type VHOST_IOTLB_REMAP to update the uaddr >> of a mapping. The new uaddr must address the same memory object as >> originally mapped. > > I think this new ioctl is very interesting, and can be used to > optimize some parts of live migration with shadow virtqueue if it > allows to actually replace the uaddr. Would you be interested in that > capability? Sure. Please share your thoughts on how it would be used. I don't follow, because with live migration, you are creating a new vdpa instance with new shadow queues on the destination, versus relocating old shadow queues (which we could do during live update which preserves the same vdpa instance). (Sorry for the late response, I stashed this email and forgot to respond.) >> The user must suspend the device before the old address is invalidated, >> and cannot resume it until after VHOST_IOTLB_REMAP is called, but this >> requirement is not enforced by the API. >> >> Signed-off-by: Steve Sistare <steven.sistare@oracle.com> >> --- >> drivers/vhost/vdpa.c | 34 ++++++++++++++++++++++++++++++++ >> include/uapi/linux/vhost_types.h | 11 ++++++++++- >> 2 files changed, 44 insertions(+), 1 deletion(-) >> >> diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c >> index faed6471934a..ec5ca20bd47d 100644 >> --- a/drivers/vhost/vdpa.c >> +++ b/drivers/vhost/vdpa.c >> @@ -1219,6 +1219,37 @@ static int vhost_vdpa_pa_map(struct vhost_vdpa *v, >> >> } >> >> +static int vhost_vdpa_process_iotlb_remap(struct vhost_vdpa *v, >> + struct vhost_iotlb *iotlb, >> + struct vhost_iotlb_msg *msg) >> +{ >> + struct vdpa_device *vdpa = v->vdpa; >> + const struct vdpa_config_ops *ops = vdpa->config; >> + u32 asid = iotlb_to_asid(iotlb); >> + u64 start = msg->iova; >> + u64 last = start + msg->size - 1; >> + struct vhost_iotlb_map *map; >> + int r = 0; >> + >> + if (msg->perm || !msg->size) >> + return -EINVAL; >> + >> + map = vhost_iotlb_itree_first(iotlb, start, last); >> + if (!map) >> + return -ENOENT; >> + >> + if (map->start != start || map->last != last) >> + return -EINVAL; >> + >> + /* batch will finish with remap. non-batch must do it now. */ >> + if (!v->in_batch) >> + r = ops->set_map(vdpa, asid, iotlb); > > I'm missing how these cases are handled: > > * The device does not expose set_map but dma_map / dma_unmap (you can > check this case with the simulator) I chose not to support dma_map, because set_map looks to be more commonly supported, and batch-mode plus set_map is the most efficient way to support remapping. I could define a dma_remap entry point if folks think that is important. Regardless, I will check that ops->set_map != NULL before calling it. > * The device uses platform iommu (!set_map && !dma_unmap vdpa_ops). I believe this just works, because iommu_map() never saves userland address, so it does not need to be informed when uaddr changes. That is certainly true for the code path: vhost_vdpa_pa_map() vhost_vdpa_map() if !dma_map && !set_map iommu_map() However, this code path confuses me: vhost_vdpa_process_iotlb_update() if (vdpa->use_va) vhost_vdpa_va_map(... uaddr) vhost_vdpa_map(... uaddr) iommu_map(... uaddr) AFAICT uaddr is never translated to physical. Maybe use_va is always false if !dma_map && !set_map ? - Steve >> + if (!r) >> + map->addr = msg->uaddr; >> + >> + return r; >> +} >> + >> static int vhost_vdpa_process_iotlb_update(struct vhost_vdpa *v, >> struct vhost_iotlb *iotlb, >> struct vhost_iotlb_msg *msg) >> @@ -1298,6 +1329,9 @@ static int vhost_vdpa_process_iotlb_msg(struct vhost_dev *dev, u32 asid, >> ops->set_map(vdpa, asid, iotlb); >> v->in_batch = false; >> break; >> + case VHOST_IOTLB_REMAP: >> + r = vhost_vdpa_process_iotlb_remap(v, iotlb, msg); >> + break; >> default: >> r = -EINVAL; >> break; >> diff --git a/include/uapi/linux/vhost_types.h b/include/uapi/linux/vhost_types.h >> index 9177843951e9..35908315ff55 100644 >> --- a/include/uapi/linux/vhost_types.h >> +++ b/include/uapi/linux/vhost_types.h >> @@ -79,7 +79,7 @@ struct vhost_iotlb_msg { >> /* >> * VHOST_IOTLB_BATCH_BEGIN and VHOST_IOTLB_BATCH_END allow modifying >> * multiple mappings in one go: beginning with >> - * VHOST_IOTLB_BATCH_BEGIN, followed by any number of >> + * VHOST_IOTLB_BATCH_BEGIN, followed by any number of VHOST_IOTLB_REMAP or >> * VHOST_IOTLB_UPDATE messages, and ending with VHOST_IOTLB_BATCH_END. >> * When one of these two values is used as the message type, the rest >> * of the fields in the message are ignored. There's no guarantee that >> @@ -87,6 +87,15 @@ struct vhost_iotlb_msg { >> */ >> #define VHOST_IOTLB_BATCH_BEGIN 5 >> #define VHOST_IOTLB_BATCH_END 6 >> + >> +/* >> + * VHOST_IOTLB_REMAP registers a new uaddr for the existing mapping at iova. >> + * The new uaddr must address the same memory object as originally mapped. >> + * Failure to do so will result in user memory corruption and/or device >> + * misbehavior. iova and size must match the arguments used to create the >> + * an existing mapping. Protection is not changed, and perm must be 0. >> + */ >> +#define VHOST_IOTLB_REMAP 7 >> __u8 type; >> }; >> >> -- >> 2.39.3 >> >
diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c index faed6471934a..ec5ca20bd47d 100644 --- a/drivers/vhost/vdpa.c +++ b/drivers/vhost/vdpa.c @@ -1219,6 +1219,37 @@ static int vhost_vdpa_pa_map(struct vhost_vdpa *v, } +static int vhost_vdpa_process_iotlb_remap(struct vhost_vdpa *v, + struct vhost_iotlb *iotlb, + struct vhost_iotlb_msg *msg) +{ + struct vdpa_device *vdpa = v->vdpa; + const struct vdpa_config_ops *ops = vdpa->config; + u32 asid = iotlb_to_asid(iotlb); + u64 start = msg->iova; + u64 last = start + msg->size - 1; + struct vhost_iotlb_map *map; + int r = 0; + + if (msg->perm || !msg->size) + return -EINVAL; + + map = vhost_iotlb_itree_first(iotlb, start, last); + if (!map) + return -ENOENT; + + if (map->start != start || map->last != last) + return -EINVAL; + + /* batch will finish with remap. non-batch must do it now. */ + if (!v->in_batch) + r = ops->set_map(vdpa, asid, iotlb); + if (!r) + map->addr = msg->uaddr; + + return r; +} + static int vhost_vdpa_process_iotlb_update(struct vhost_vdpa *v, struct vhost_iotlb *iotlb, struct vhost_iotlb_msg *msg) @@ -1298,6 +1329,9 @@ static int vhost_vdpa_process_iotlb_msg(struct vhost_dev *dev, u32 asid, ops->set_map(vdpa, asid, iotlb); v->in_batch = false; break; + case VHOST_IOTLB_REMAP: + r = vhost_vdpa_process_iotlb_remap(v, iotlb, msg); + break; default: r = -EINVAL; break; diff --git a/include/uapi/linux/vhost_types.h b/include/uapi/linux/vhost_types.h index 9177843951e9..35908315ff55 100644 --- a/include/uapi/linux/vhost_types.h +++ b/include/uapi/linux/vhost_types.h @@ -79,7 +79,7 @@ struct vhost_iotlb_msg { /* * VHOST_IOTLB_BATCH_BEGIN and VHOST_IOTLB_BATCH_END allow modifying * multiple mappings in one go: beginning with - * VHOST_IOTLB_BATCH_BEGIN, followed by any number of + * VHOST_IOTLB_BATCH_BEGIN, followed by any number of VHOST_IOTLB_REMAP or * VHOST_IOTLB_UPDATE messages, and ending with VHOST_IOTLB_BATCH_END. * When one of these two values is used as the message type, the rest * of the fields in the message are ignored. There's no guarantee that @@ -87,6 +87,15 @@ struct vhost_iotlb_msg { */ #define VHOST_IOTLB_BATCH_BEGIN 5 #define VHOST_IOTLB_BATCH_END 6 + +/* + * VHOST_IOTLB_REMAP registers a new uaddr for the existing mapping at iova. + * The new uaddr must address the same memory object as originally mapped. + * Failure to do so will result in user memory corruption and/or device + * misbehavior. iova and size must match the arguments used to create the + * an existing mapping. Protection is not changed, and perm must be 0. + */ +#define VHOST_IOTLB_REMAP 7 __u8 type; };