From patchwork Thu Feb  9 04:31:36 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Yi Liu
X-Patchwork-Id: 5161
From: Yi Liu
To: joro@8bytes.org, alex.williamson@redhat.com, jgg@nvidia.com,
    kevin.tian@intel.com, robin.murphy@arm.com
Cc: cohuck@redhat.com, eric.auger@redhat.com, nicolinc@nvidia.com,
    kvm@vger.kernel.org, mjrosato@linux.ibm.com,
    chao.p.peng@linux.intel.com, yi.l.liu@intel.com,
    yi.y.sun@linux.intel.com, peterx@redhat.com, jasowang@redhat.com,
    shameerali.kolothum.thodi@huawei.com, lulu@redhat.com,
    suravee.suthikulpanit@amd.com, iommu@lists.linux.dev,
    linux-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org,
    baolu.lu@linux.intel.com
Subject: [PATCH 00/17] Add Intel VT-d nested translation
Date: Wed, 8 Feb 2023 20:31:36 -0800
Message-Id: <20230209043153.14964-1-yi.l.liu@intel.com>
X-Mailer: git-send-email 2.34.1
X-Mailing-List: linux-kernel@vger.kernel.org

Nested translation uses two stages of address translation to produce the
final physical address. Taking Intel VT-d as an example, the first stage
translation structure is an I/O page table.
As the diagram below shows, the guest I/O page table pointer, expressed as
a GPA (guest physical address), is passed to the host to perform the first
stage translation. In addition, guest modifications to present mappings in
the first stage page table must be followed by an iotlb invalidation to
sync the host iotlb.

    .-------------.  .---------------------------.
    |   vIOMMU    |  | Guest I/O page table      |
    |             |  '---------------------------'
    .----------------/
    | PASID Entry |--- PASID cache flush --+
    '-------------'                        |
    |             |                        V
    |             |           I/O page table pointer in GPA
    '-------------'
Guest
------| Shadow |--------------------------|--------
      v        v                          v
Host
    .-------------.  .------------------------.
    |   pIOMMU    |  |  FS for GIOVA->GPA     |
    |             |  '------------------------'
    .----------------/  |
    | PASID Entry |     V (Nested xlate)
    '----------------\.----------------------------------.
    |             |   | SS for GPA->HPA, unmanaged domain|
    |             |   '----------------------------------'
    '-------------'
Where:
 - FS = First stage page tables
 - SS = Second stage page tables

Different platform vendors have different first stage translation formats,
so userspace should query the underlying iommu capability before passing
first stage translation structures to the host [1].

In the iommufd subsystem, I/O page tables are tracked by hw_pagetable
objects. The first stage page table is owned by userspace (the guest),
while the second stage page table is owned by the kernel for security. So
first stage page tables are tracked by user-managed hw_pagetables, and
second stage page tables are tracked by kernel-managed hw_pagetables.

This series first introduces a new iommu op for allocating domains for
iommufd and an op for syncing the iotlb after first stage page table
modifications, and then adds the implementation of the new ops in the
intel-iommu driver. After this preparation, it adds kernel-managed and
user-managed hw_pagetable allocation for userspace. Last, it adds
selftests for the new ioctls.
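
To make the two-stage walk and the invalidation requirement concrete, here
is a minimal userspace sketch in C. It is a toy model only: it is neither
the VT-d page table format nor the iommufd uAPI, and every name in it
(toy_table, toy_iotlb, toy_walk, toy_nested_translate,
toy_iotlb_invalidate, nested_demo) is hypothetical. It assumes 4K pages,
single-level tables and a one-entry IOTLB that caches the combined
GIOVA->HPA result.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12
#define PAGE_MASK  ((UINT64_C(1) << PAGE_SHIFT) - 1)
#define NPAGES     4
#define INVALID    UINT64_MAX

/* Toy single-level table: index is the input frame, value the output frame. */
struct toy_table { uint64_t pfn[NPAGES]; };

/* Toy one-entry IOTLB caching a combined GIOVA -> HPA translation. */
struct toy_iotlb { int valid; uint64_t in_pfn, out_pfn; };

/* Walk one stage; returns INVALID if the input page is not mapped. */
static uint64_t toy_walk(const struct toy_table *t, uint64_t addr)
{
	uint64_t idx = addr >> PAGE_SHIFT;

	if (idx >= NPAGES || t->pfn[idx] == INVALID)
		return INVALID;
	return (t->pfn[idx] << PAGE_SHIFT) | (addr & PAGE_MASK);
}

/* Nested translation: FS (GIOVA->GPA) first, then SS (GPA->HPA). */
static uint64_t toy_nested_translate(const struct toy_table *fs,
				     const struct toy_table *ss,
				     struct toy_iotlb *tlb, uint64_t giova)
{
	uint64_t pfn = giova >> PAGE_SHIFT, gpa, hpa;

	/* A cached entry is used without looking at the tables again. */
	if (tlb->valid && tlb->in_pfn == pfn)
		return (tlb->out_pfn << PAGE_SHIFT) | (giova & PAGE_MASK);

	gpa = toy_walk(fs, giova);
	if (gpa == INVALID)
		return INVALID;
	hpa = toy_walk(ss, gpa);
	if (hpa == INVALID)
		return INVALID;

	tlb->valid = 1;
	tlb->in_pfn = pfn;
	tlb->out_pfn = hpa >> PAGE_SHIFT;
	return hpa;
}

static void toy_iotlb_invalidate(struct toy_iotlb *tlb)
{
	tlb->valid = 0;
}

/* Returns 0 once every step has behaved as described in the comments. */
int nested_demo(void)
{
	/* FS is guest owned: GIOVA page 0 -> GPA page 2, page 1 -> page 3. */
	struct toy_table fs = { { 2, 3, INVALID, INVALID } };
	/* SS is host owned: GPA page 2 -> HPA page 7, page 3 -> page 5. */
	struct toy_table ss = { { INVALID, INVALID, 7, 5 } };
	struct toy_iotlb tlb = { 0, 0, 0 };

	assert(toy_nested_translate(&fs, &ss, &tlb, 0x0abc) == 0x7abc);

	/* Guest remaps a present FS entry without invalidating. */
	fs.pfn[0] = 3;
	assert(toy_nested_translate(&fs, &ss, &tlb, 0x0abc) == 0x7abc);

	/* After the invalidation the new mapping takes effect. */
	toy_iotlb_invalidate(&tlb);
	assert(toy_nested_translate(&fs, &ss, &tlb, 0x0abc) == 0x5abc);
	return 0;
}
```

The second lookup in nested_demo() still returns the old HPA because the
cached entry hides the guest's FS update. That stale result is why a
change to a present first stage mapping has to be followed by an iotlb
invalidation.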

This series is based on "[PATCH 0/6] iommufd: Add iommu capability
reporting" [1] and Nicolin's "[PATCH v2 00/10] Add IO page table
replacement support" [2]. Complete code can be found in [3]. Draft QEMU
code can be found in [4].

Basic testing was done with a DSA device on VT-d, where the guest has a
vIOMMU built with nested translation.

[1] https://lore.kernel.org/linux-iommu/20230209041642.9346-1-yi.l.liu@intel.com/
[2] https://lore.kernel.org/linux-iommu/cover.1675802050.git.nicolinc@nvidia.com/
[3] https://github.com/yiliu1765/iommufd/tree/iommufd_nesting_vtd_v1
[4] https://github.com/yiliu1765/qemu/tree/wip/iommufd_rfcv3%2Bnesting

Regards,
	Yi Liu

Lu Baolu (5):
  iommu: Add new iommu op to create domains owned by userspace
  iommu: Add nested domain support
  iommu/vt-d: Extend dmar_domain to support nested domain
  iommu/vt-d: Add helper to setup pasid nested translation
  iommu/vt-d: Add nested domain support

Nicolin Chen (6):
  iommufd: Add/del hwpt to IOAS at alloc/destroy()
  iommufd/device: Move IOAS attaching and detaching operations into
    helpers
  iommufd/selftest: Add IOMMU_TEST_OP_MOCK_DOMAIN_REPLACE test op
  iommufd/selftest: Add coverage for IOMMU_HWPT_ALLOC ioctl
  iommufd/selftest: Add IOMMU_TEST_OP_MD_CHECK_IOTLB test op
  iommufd/selftest: Add coverage for IOMMU_HWPT_INVALIDATE ioctl

Yi Liu (6):
  iommufd/hw_pagetable: Use domain_alloc_user op for domain allocation
  iommufd: Split iommufd_hw_pagetable_alloc()
  iommufd: Add kernel-managed hw_pagetable allocation for userspace
  iommufd: Add infrastructure for user-managed hw_pagetable allocation
  iommufd: Add user-managed hw_pagetable allocation
  iommufd/device: Report supported stage-1 page table types

 drivers/iommu/intel/Makefile                  |   2 +-
 drivers/iommu/intel/iommu.c                   |  38 ++-
 drivers/iommu/intel/iommu.h                   |  50 +++-
 drivers/iommu/intel/nested.c                  | 143 +++++++++
 drivers/iommu/intel/pasid.c                   | 142 +++++++++
 drivers/iommu/intel/pasid.h                   |   2 +
 drivers/iommu/iommufd/device.c                | 117 ++++----
 drivers/iommu/iommufd/hw_pagetable.c          | 280 +++++++++++++++++-
 drivers/iommu/iommufd/iommufd_private.h       |  23 +-
 drivers/iommu/iommufd/iommufd_test.h          |  35 +++
 drivers/iommu/iommufd/main.c                  |  11 +
 drivers/iommu/iommufd/selftest.c              | 149 +++++++++-
 include/linux/iommu.h                         |  11 +
 include/uapi/linux/iommufd.h                  | 196 ++++++++++++
 tools/testing/selftests/iommu/iommufd.c       | 124 +++++++-
 tools/testing/selftests/iommu/iommufd_utils.h | 106 +++++++
 16 files changed, 1329 insertions(+), 100 deletions(-)
 create mode 100644 drivers/iommu/intel/nested.c