From patchwork Sat Oct 21 09:25:13 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Si-Wei Liu X-Patchwork-Id: 156411 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:ce89:0:b0:403:3b70:6f57 with SMTP id p9csp199088vqx; Sat, 21 Oct 2023 02:28:50 -0700 (PDT) X-Google-Smtp-Source: AGHT+IG5Y1xfWyyY39s0cNBIBcXGOr4ZqbRQOhRqU0TuD8vYuqUJqN6jXrc8dR/NToUjBS6zp1FO X-Received: by 2002:a17:902:d042:b0:1bb:b30e:4364 with SMTP id l2-20020a170902d04200b001bbb30e4364mr3354308pll.39.1697880530119; Sat, 21 Oct 2023 02:28:50 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1697880530; cv=none; d=google.com; s=arc-20160816; b=tY/B5lKetvXJbdlCVPkEexxo7KBKLm+9qkQ3YaGJ4DnsDgwY2n/ml6hhSj+Lx8VWPn efyZOXtfk/1uGjnd9D9DYneQ3wYS8liwNvwLXhe8RSDQ2jzsfuFEgIS+ZQ1jKWRUl+oq h3sSScozSa50/Nb1jGu+kRYhJko680L/WISUUkPFEie5fFjo9IJZi+OJlRtYzkVpb5QS EVlw39HvDXniG5JJPa/7FeYmVGsCUSSZGZfRNCcV93QMw1YJPrGCMIwelOgqnb6HrUaQ 4tcthPqdgSkInMOxjwn9d1fLkPVJpu7iiC9UChNgzl66Z0+OjpDmE0cmL9xYcKSIMPsY QA/Q== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=9BXLcHlC+mpIFJm+VhDktZctOK6MnG2fZvm6HmvyjLg=; fh=ozy2CkLebIOzbpPF40kmeuF6jbAmH8vZa/HD/EKOE2c=; b=WDvvnZlSSqwo3TLtpSvcQqfVkHKCdcK6FuAZQdbO+Thg/Rzc6U8A5IVo8/TXcWmX4N T15XxAYK5fWLRJ2PyG+aa+Q1SG0HJFs1BdmOya4NAChdLR7/ZQCIp0tW6WUAeME8KEoP VHu2K5yzwhr1aWmOX3+VHMYINoTyT0LgopV0w0ro5s+jP8VS7uGXwmP52lDTr3SEOXPF LjF6ydjVQeA1PIf4TiqftxMKPX1yU5zTdLp8XRJ75eyyK9GTMk7izSUHsUwb8BNsZxN8 8NJWQmLW0UJ87Dx4ReZNyx1IH0C2TH59ODk3HoP14i09I0lrgOXB616FvsRpwBfRFZ1t r/SA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@oracle.com header.s=corp-2023-03-30 header.b=qRj9hrTt; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::3:1 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=oracle.com Received: from morse.vger.email (morse.vger.email. [2620:137:e000::3:1]) by mx.google.com with ESMTPS id x17-20020a170902ea9100b001b9d800b487si3138575plb.87.2023.10.21.02.28.49 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sat, 21 Oct 2023 02:28:50 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::3:1 as permitted sender) client-ip=2620:137:e000::3:1; Authentication-Results: mx.google.com; dkim=pass header.i=@oracle.com header.s=corp-2023-03-30 header.b=qRj9hrTt; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::3:1 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=oracle.com Received: from out1.vger.email (depot.vger.email [IPv6:2620:137:e000::3:0]) by morse.vger.email (Postfix) with ESMTP id 7273D82ED24B; Sat, 21 Oct 2023 02:28:42 -0700 (PDT) X-Virus-Status: Clean X-Virus-Scanned: clamav-milter 0.103.10 at morse.vger.email Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229980AbjJUJ21 (ORCPT + 26 others); Sat, 21 Oct 2023 05:28:27 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:37932 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229472AbjJUJ20 (ORCPT ); Sat, 21 Oct 2023 05:28:26 -0400 Received: from mx0b-00069f02.pphosted.com (mx0b-00069f02.pphosted.com [205.220.177.32]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id DE85CBE for ; Sat, 21 Oct 2023 02:28:24 -0700 (PDT) Received: from pps.filterd (m0246632.ppops.net [127.0.0.1]) by mx0b-00069f02.pphosted.com (8.17.1.19/8.17.1.19) with ESMTP id 39L4XchN031867; Sat, 21 Oct 2023 09:28:20 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=oracle.com; h=from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-type : content-transfer-encoding; s=corp-2023-03-30; bh=9BXLcHlC+mpIFJm+VhDktZctOK6MnG2fZvm6HmvyjLg=; b=qRj9hrTt55/dWqiaiiKgVl8hvOcNDqCAEnuT7CZJ22YnBTag3TwW8tH79GAJcu2G1DPm R5/EyE1bKHWKFq8jL55Q0R9SUIyefquyI67NP5CfEtRqqSYalct1L3/yF80Jx5JbDw7x IxbYODeYDrhqsjoSvptx1q1HPwENTL0sXHfGQ7xDN7GQ9EpL/UheTZSIAsmVIvTmuXkM XPQSUsJad626U7UMuc9m7l352vKoORxbZZ2fI2zaLmiETGE3Yf+zD0evj41Dtuo5v0u+ e+rBfAtWCFv4dH6E1txMXGrgma6oQElcFreUzXBRW7YiMCjxiZlv/sVT/o/nJGK8P/Gi 4Q== Received: from iadpaimrmta02.imrmtpd1.prodappiadaev1.oraclevcn.com (iadpaimrmta02.appoci.oracle.com [147.154.18.20]) by mx0b-00069f02.pphosted.com (PPS) with ESMTPS id 3tv68t8c2c-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Sat, 21 Oct 2023 09:28:20 +0000 Received: from pps.filterd (iadpaimrmta02.imrmtpd1.prodappiadaev1.oraclevcn.com [127.0.0.1]) by iadpaimrmta02.imrmtpd1.prodappiadaev1.oraclevcn.com (8.17.1.19/8.17.1.19) with ESMTP id 39L6FU6C019120; Sat, 21 Oct 2023 09:28:19 GMT Received: from ban25x6uut24.us.oracle.com (ban25x6uut24.us.oracle.com [10.153.73.24]) by iadpaimrmta02.imrmtpd1.prodappiadaev1.oraclevcn.com (PPS) with ESMTP id 3tv532gf44-2; Sat, 21 Oct 2023 09:28:19 +0000 From: Si-Wei Liu To: jasowang@redhat.com, mst@redhat.com, eperezma@redhat.com, sgarzare@redhat.com, dtatulea@nvidia.com Cc: virtualization@lists.linux-foundation.org, linux-kernel@vger.kernel.org Subject: [PATCH v4 1/7] vdpa: introduce .reset_map operation callback Date: Sat, 21 Oct 2023 02:25:13 -0700 Message-Id: <1697880319-4937-2-git-send-email-si-wei.liu@oracle.com> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1697880319-4937-1-git-send-email-si-wei.liu@oracle.com> References: <1697880319-4937-1-git-send-email-si-wei.liu@oracle.com> MIME-Version: 1.0 X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.272,Aquarius:18.0.980,Hydra:6.0.619,FMLib:17.11.176.26 definitions=2023-10-20_10,2023-10-19_01,2023-05-22_02 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 spamscore=0 malwarescore=0 bulkscore=0 mlxscore=0 suspectscore=0 phishscore=0 adultscore=0 mlxlogscore=999 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2310170001 definitions=main-2310210086 X-Proofpoint-ORIG-GUID: SFaXeIDhHW1n_k7DF-_ZDJHrGK52hygc X-Proofpoint-GUID: SFaXeIDhHW1n_k7DF-_ZDJHrGK52hygc X-Spam-Status: No, score=-0.9 required=5.0 tests=DKIMWL_WL_MED,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on morse.vger.email Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-Greylist: Sender passed SPF test, not delayed by milter-greylist-4.6.4 (morse.vger.email [0.0.0.0]); Sat, 21 Oct 2023 02:28:42 -0700 (PDT) X-getmail-retrieved-from-mailbox: INBOX X-GMAIL-THRID: 1780356774989750855 X-GMAIL-MSGID: 1780356774989750855 Some device specific IOMMU parent drivers have long standing bogus behavior that mistakenly clean up the maps during .reset. By definition, this is violation to the on-chip IOMMU ops (i.e. .set_map, or .dma_map & .dma_unmap) in those offending drivers, as the removal of internal maps is completely agnostic to the upper layer, causing inconsistent view between the userspace and the kernel. Some userspace app like QEMU gets around of this brokenness by proactively removing and adding back all the maps around vdpa device reset, but such workaround actually penalize other well-behaved driver setup, where vdpa reset always comes with the associated mapping cost, especially for kernel vDPA devices (use_va=false) that have high cost on pinning. It's imperative to rectify this behavior and remove the problematic code from all those non-compliant parent drivers. The reason why a separate .reset_map op is introduced is because this allows a simple on-chip IOMMU model without exposing too much device implementation detail to the upper vdpa layer. The .dma_map/unmap or .set_map driver API is meant to be used to manipulate the IOTLB mappings, and has been abstracted in a way similar to how a real IOMMU device maps or unmaps pages for certain memory ranges. However, apart from this there also exists other mapping needs, in which case 1:1 passthrough mapping has to be used by other users (read virtio-vdpa). To ease parent/vendor driver implementation and to avoid abusing DMA ops in an unexpacted way, these on-chip IOMMU devices can start with 1:1 passthrough mapping mode initially at the time of creation. Then the .reset_map op can be used to switch iotlb back to this initial state without having to expose a complex two-dimensional IOMMU device model. The .reset_map is not a MUST for every parent that implements the .dma_map or .set_map API, because device may work with DMA ops directly by implement their own to manipulate system memory mappings, so don't have to use .reset_map to achieve a simple IOMMU device model for 1:1 passthrough mapping. Signed-off-by: Si-Wei Liu Acked-by: Eugenio PĂ©rez Acked-by: Jason Wang --- include/linux/vdpa.h | 10 ++++++++++ 1 file changed, 10 insertions(+) diff --git a/include/linux/vdpa.h b/include/linux/vdpa.h index d376309b99cf..26ae6ae1eac3 100644 --- a/include/linux/vdpa.h +++ b/include/linux/vdpa.h @@ -327,6 +327,15 @@ struct vdpa_map_file { * @iova: iova to be unmapped * @size: size of the area * Returns integer: success (0) or error (< 0) + * @reset_map: Reset device memory mapping to the default + * state (optional) + * Needed for devices that are using device + * specific DMA translation and prefer mapping + * to be decoupled from the virtio life cycle, + * i.e. device .reset op does not reset mapping + * @vdev: vdpa device + * @asid: address space identifier + * Returns integer: success (0) or error (< 0) * @get_vq_dma_dev: Get the dma device for a specific * virtqueue (optional) * @vdev: vdpa device @@ -405,6 +414,7 @@ struct vdpa_config_ops { u64 iova, u64 size, u64 pa, u32 perm, void *opaque); int (*dma_unmap)(struct vdpa_device *vdev, unsigned int asid, u64 iova, u64 size); + int (*reset_map)(struct vdpa_device *vdev, unsigned int asid); int (*set_group_asid)(struct vdpa_device *vdev, unsigned int group, unsigned int asid); struct device *(*get_vq_dma_dev)(struct vdpa_device *vdev, u16 idx);