Message ID: 20230511145110.27707-1-yi.l.liu@intel.com
From: Yi Liu <yi.l.liu@intel.com>
To: joro@8bytes.org, alex.williamson@redhat.com, jgg@nvidia.com, kevin.tian@intel.com, robin.murphy@arm.com, baolu.lu@linux.intel.com
Cc: cohuck@redhat.com, eric.auger@redhat.com, nicolinc@nvidia.com, kvm@vger.kernel.org, mjrosato@linux.ibm.com, chao.p.peng@linux.intel.com, yi.l.liu@intel.com, yi.y.sun@linux.intel.com, peterx@redhat.com, jasowang@redhat.com, shameerali.kolothum.thodi@huawei.com, lulu@redhat.com, suravee.suthikulpanit@amd.com, iommu@lists.linux.dev, linux-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org, zhenzhong.duan@intel.com
Subject: [PATCH v3 00/10] Add Intel VT-d nested translation
Date: Thu, 11 May 2023 07:51:00 -0700
Series: Add Intel VT-d nested translation
Message
Yi Liu
May 11, 2023, 2:51 p.m. UTC
This is to add Intel VT-d nested translation based on the IOMMUFD nesting infrastructure. As described in the iommufd nesting infrastructure series [1], the iommu core supports new ops to report iommu hardware information, allocate domains with user data, and sync the stage-1 IOTLB. The data required in these three paths is vendor-specific, so:

1) IOMMU_HW_INFO_TYPE_INTEL_VTD and struct iommu_device_info_vtd are defined to report iommu hardware information for Intel VT-d.

2) IOMMU_HWPT_DATA_VTD_S1 is defined for the Intel VT-d stage-1 page table; it is used in the stage-1 domain allocation and IOTLB syncing paths. struct iommu_hwpt_intel_vtd is defined to pass user_data for Intel VT-d stage-1 domain allocation. struct iommu_hwpt_invalidate_intel_vtd is defined to pass the data for Intel VT-d stage-1 IOTLB invalidation.

With the above IOMMUFD extensions, the intel-iommu driver implements the three paths to support nested translation.

The first Intel platform supporting nested translation is Sapphire Rapids which, unfortunately, has a hardware errata [2] requiring special treatment. This errata is hit when a stage-1 page table page (at any level) is located in a stage-2 read-only region. In that case the IOMMU hardware may ignore the stage-2 RO permission and still set the A/D bits in stage-1 page table entries during the page table walk.

A flag IOMMU_HW_INFO_VTD_ERRATA_772415_SPR17 is introduced to report this errata to userspace. With that restriction the user should either disable nested translation to favor RO stage-2 mappings, or ensure there are no RO stage-2 mappings to enable nested translation. The intel-iommu driver is armed with the necessary checks to prevent such a mix in patch 10 of this series.

QEMU currently does add RO mappings, though. The vfio agent in QEMU simply maps all valid regions in the GPA address space, which certainly includes RO regions, e.g. the vBIOS. In reality we don't know of a usage relying on DMA reads from the BIOS region. Hence finding a way to let the user opt out of RO mappings in QEMU might be an acceptable tradeoff, but how to achieve it cleanly needs more discussion in the QEMU community. For now we just hacked QEMU to test.

Complete code can be found in [3]; QEMU code can be found in [4].

base-commit: ce9b593b1f74ccd090edc5d2ad397da84baa9946

[1] https://lore.kernel.org/linux-iommu/20230511143844.22693-1-yi.l.liu@intel.com/
[2] https://www.intel.com/content/www/us/en/content-details/772415/content-details.html
[3] https://github.com/yiliu1765/iommufd/tree/iommufd_nesting
[4] https://github.com/yiliu1765/qemu/tree/wip/iommufd_rfcv4.mig.reset.v4_var3%2Bnesting

Change log:

v3:
 - Further split the patches into an order of adding helpers for nested domain, iotlb flush, nested domain attachment, and nested domain allocation callback, then reporting the hw_info to userspace.
 - Add batch support in cache invalidation from userspace.
 - Disallow nested translation usage if RO mappings exist in the stage-2 domain, due to the errata on read-only mappings on the Sapphire Rapids platform.

v2: https://lore.kernel.org/linux-iommu/20230309082207.612346-1-yi.l.liu@intel.com/
 - The iommufd infrastructure is split out to be a separate series.
v1: https://lore.kernel.org/linux-iommu/20230209043153.14964-1-yi.l.liu@intel.com/

Regards,
Yi Liu

Lu Baolu (5):
  iommu/vt-d: Extend dmar_domain to support nested domain
  iommu/vt-d: Add helper for nested domain allocation
  iommu/vt-d: Add helper to setup pasid nested translation
  iommu/vt-d: Add nested domain allocation
  iommu/vt-d: Disallow nesting on domains with read-only mappings

Yi Liu (5):
  iommufd: Add data structure for Intel VT-d stage-1 domain allocation
  iommu/vt-d: Make domain attach helpers to be extern
  iommu/vt-d: Set the nested domain to a device
  iommu/vt-d: Add iotlb flush for nested domain
  iommu/vt-d: Implement hw_info for iommu capability query

 drivers/iommu/intel/Makefile |   2 +-
 drivers/iommu/intel/iommu.c  |  78 ++++++++++++---
 drivers/iommu/intel/iommu.h  |  55 +++++++++--
 drivers/iommu/intel/nested.c | 181 +++++++++++++++++++++++++++++++++++
 drivers/iommu/intel/pasid.c  | 151 +++++++++++++++++++++++++++++
 drivers/iommu/intel/pasid.h  |   2 +
 drivers/iommu/iommufd/main.c |   6 ++
 include/linux/iommu.h        |   1 +
 include/uapi/linux/iommufd.h | 149 ++++++++++++++++++++++++++++
 9 files changed, 603 insertions(+), 22 deletions(-)
 create mode 100644 drivers/iommu/intel/nested.c
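For orientation, here is a minimal sketch of the shape these uapi additions take. The type and flag names come from the cover letter; the individual fields are illustrative assumptions, not the authoritative definitions from include/uapi/linux/iommufd.h in the series:

    /* Sketch only -- the field layout is an assumption; see the series
     * for the real definitions in include/uapi/linux/iommufd.h. */
    #include <linux/types.h>

    /* Reported in the flags of struct iommu_device_info_vtd */
    #define IOMMU_HW_INFO_VTD_ERRATA_772415_SPR17 (1ULL << 0)

    /* Hardware info reported for IOMMU_HW_INFO_TYPE_INTEL_VTD */
    struct iommu_device_info_vtd {
            __aligned_u64 flags;    /* errata bits, e.g. ..._772415_SPR17 */
            __aligned_u64 cap_reg;  /* raw value of the VT-d CAP register */
            __aligned_u64 ecap_reg; /* raw value of the VT-d ECAP register */
    };

    /* user_data for stage-1 domain allocation (IOMMU_HWPT_DATA_VTD_S1) */
    struct iommu_hwpt_intel_vtd {
            __aligned_u64 flags;
            __aligned_u64 pgtbl_addr; /* GPA of the stage-1 page table base */
            __u32 addr_width;         /* address width of the stage-1 table */
            __u32 __reserved;
    };

A VMM is expected to check the flags word for the errata bit before deciding whether it may keep read-only stage-2 mappings alongside nesting.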
Comments
> From: Liu, Yi L <yi.l.liu@intel.com>
> Sent: Thursday, May 11, 2023 10:51 PM
>
> The first Intel platform supporting nested translation is Sapphire
> Rapids which, unfortunately, has a hardware errata [2] requiring
> special treatment. This errata is hit when a stage-1 page table page
> (at any level) is located in a stage-2 read-only region. In that case
> the IOMMU hardware may ignore the stage-2 RO permission and still set
> the A/D bits in stage-1 page table entries during the page table walk.
>
> A flag IOMMU_HW_INFO_VTD_ERRATA_772415_SPR17 is introduced to report
> this errata to userspace. With that restriction the user should either
> disable nested translation to favor RO stage-2 mappings, or ensure
> there are no RO stage-2 mappings to enable nested translation.
>
> The intel-iommu driver is armed with the necessary checks to prevent
> such a mix in patch 10 of this series.
>
> QEMU currently does add RO mappings, though. The vfio agent in QEMU
> simply maps all valid regions in the GPA address space, which
> certainly includes RO regions, e.g. the vBIOS.
>
> In reality we don't know of a usage relying on DMA reads from the BIOS
> region. Hence finding a way to let the user opt out of RO mappings in
> QEMU might be an acceptable tradeoff, but how to achieve it cleanly
> needs more discussion in the QEMU community. For now we just hacked
> QEMU to test.

Hi, Alex,

Want to touch base on your thoughts about this errata before we actually
go discuss how to handle it in the QEMU community.

Overall it affects all Sapphire Rapids platforms. Fully disabling nested
translation in the kernel just for this rare vulnerability sounds like
overkill.

So we decided to enforce the exclusive check (RO in stage-2 vs. nesting)
in the kernel and expose the restriction to userspace, so the VMM can
choose which one to enable based on its own requirements.

At least this looks like a reasonable tradeoff to some proprietary VMMs
which never add RO mappings in stage-2 today. But we do want QEMU, as
the widely used reference VMM, to support nested translation on those
platforms!

Do you see any major oversight before we pursue such a change in QEMU,
e.g. having a way for the user to opt out of adding RO mappings in
stage-2? 😊

Thanks
Kevin
On Wed, 24 May 2023 08:59:43 +0000
"Tian, Kevin" <kevin.tian@intel.com> wrote:

> > From: Liu, Yi L <yi.l.liu@intel.com>
> > Sent: Thursday, May 11, 2023 10:51 PM
> >
> > [...]
>
> Hi, Alex,
>
> Want to touch base on your thoughts about this errata before we
> actually go discuss how to handle it in the QEMU community.
>
> Overall it affects all Sapphire Rapids platforms. Fully disabling
> nested translation in the kernel just for this rare vulnerability
> sounds like overkill.
>
> So we decided to enforce the exclusive check (RO in stage-2 vs.
> nesting) in the kernel and expose the restriction to userspace, so the
> VMM can choose which one to enable based on its own requirements.
>
> At least this looks like a reasonable tradeoff to some proprietary
> VMMs which never add RO mappings in stage-2 today. But we do want
> QEMU, as the widely used reference VMM, to support nested translation
> on those platforms!
>
> Do you see any major oversight before we pursue such a change in QEMU,
> e.g. having a way for the user to opt out of adding RO mappings in
> stage-2? 😊

I don't feel like I have enough info to know what common scenarios are
going to make use of 2-stage and nested configurations, and how likely a
user is to need such an opt-out. If it's likely that a user is going to
encounter this configuration, an opt-out is at best a workaround. It's a
significant support issue if a user needs to generate a failure in QEMU,
notice and decipher any log messages that failure may have generated,
and take action to introduce specific changes in their VM configuration
to support a usage restriction.

For QEMU I might lean more towards an effort to better filter the
mappings we create to avoid these read-only ranges, which likely don't
require DMA mappings anyway.

How much does this affect arbitrary userspace vfio drivers? For example,
are there scenarios where running in a VM with a vIOMMU introduces
nested support that's unknown to the user, which now prevents this
usage? An example might be running an L2 guest with a version of QEMU
that does create read-only mappings. If necessary, how would lack of
read-only mapping support be conveyed to those nested use cases?

Thanks,
Alex
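As a concrete reading of the filtering idea above, a hedged sketch of how QEMU's vfio memory listener could skip read-only sections when the kernel reports the errata. memory_region_is_rom() and the container_of() pattern are real QEMU code; the no_ro_mappings field on the container is a hypothetical flag set after querying the iommufd hardware info:

    /* Hedged sketch, not actual QEMU code: drop RO sections when the
     * host IOMMU reported IOMMU_HW_INFO_VTD_ERRATA_772415_SPR17.
     * 'no_ro_mappings' is a hypothetical container field. */
    static void vfio_listener_region_add(MemoryListener *listener,
                                         MemoryRegionSection *section)
    {
        VFIOContainer *container = container_of(listener, VFIOContainer,
                                                listener);

        if (container->no_ro_mappings &&
            memory_region_is_rom(section->mr)) {
            /* vBIOS-style RO regions rarely need DMA; skip the mapping
             * rather than poison the nesting-capable stage-2 domain. */
            return;
        }

        /* ... existing vfio_dma_map() path for everything else ... */
    }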
> From: Alex Williamson <alex.williamson@redhat.com>
> Sent: Friday, May 26, 2023 2:07 AM
>
> On Wed, 24 May 2023 08:59:43 +0000
> "Tian, Kevin" <kevin.tian@intel.com> wrote:
>
> > [...]
>
> I don't feel like I have enough info to know what common scenarios are
> going to make use of 2-stage and nested configurations, and how likely
> a user is to need such an opt-out. If it's likely that a user is going
> to encounter this configuration, an opt-out is at best a workaround.
> It's a significant support issue if a user needs to generate a failure
> in QEMU, notice and decipher any log messages that failure may have
> generated, and take action to introduce specific changes in their VM
> configuration to support a usage restriction.

Thanks. This is a good point.

> For QEMU I might lean more towards an effort to better filter the
> mappings we create to avoid these read-only ranges, which likely don't
> require DMA mappings anyway.

We thought about having intel-viommu register a discard memory manager
to filter these out in case the kernel reports this errata.

Our original thought was that even with it we may still want the user to
opt in explicitly, given this configuration doesn't match bare metal.
But with your explanation, doing so would probably cause more trouble
than what it tries to achieve.

> How much does this affect arbitrary userspace vfio drivers? For
> example, are there scenarios where running in a VM with a vIOMMU
> introduces nested support that's unknown to the user, which now
> prevents this usage? An example might be running an L2 guest with a
> version of QEMU that does create read-only mappings. If necessary, how
> would lack of read-only mapping support be conveyed to those nested
> use cases?

To enable nested translation, the guest is expected to use stage-1 while
the host uses stage-2. So the L0 QEMU will expose a vIOMMU with only
stage-1 capability to L1. In that case it's perfectly fine to have RO
mappings in stage-1, no matter whether L1 further creates an L2 guest
inside. Then only the L0 QEMU needs to care about this RO restriction in
stage-2.

In case the L0 QEMU exposes a legacy vIOMMU which supports only stage-2,
nesting cannot be enabled. Instead it will fall back to the old
shadowing path, and then RO mappings from the guest don't matter either.

Exposing a vIOMMU which supports stage-1, stage-2 and nesting all at
once is another story. But I believe we are far from when that becomes
useful, and it's reasonable to just have the L0 QEMU not support this
configuration before this errata is fixed. 😊

Thanks,
Kevin
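Since only the stage-2 owner needs to care, the enforcement lives in the kernel map path. A hedged sketch of the exclusive check the cover letter attributes to patch 10 ("Disallow nesting on domains with read-only mappings"); the flag names on dmar_domain are assumptions, not the exact patch code:

    /* Hedged sketch of the RO-vs-nesting exclusion on affected hardware. */
    static int intel_iommu_map(struct iommu_domain *domain,
                               unsigned long iova, phys_addr_t hpa,
                               size_t size, int prot, gfp_t gfp)
    {
        struct dmar_domain *dmar_domain = to_dmar_domain(domain);

        if (!(prot & IOMMU_WRITE)) {
            /* Errata 772415: a stage-1 walk may set A/D bits in
             * stage-2 read-only pages, so RO mappings and nesting
             * must be mutually exclusive. */
            if (dmar_domain->set_nested)
                return -EINVAL;
            dmar_domain->read_only_mapped = true;
        }

        /* ... regular mapping path continues here ... */
        return 0;
    }

The converse check would mirror this in the nested domain allocation path: refuse a stage-2 parent that has ever carried a read-only mapping.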
On Wed, May 24, 2023 at 08:59:43AM +0000, Tian, Kevin wrote:

> At least this looks like a reasonable tradeoff to some proprietary
> VMMs which never add RO mappings in stage-2 today.

What is the reason for the RO anyhow?

Would it be so bad if it was DMA mapped as RW due to the errata?

Jason
On Mon, 29 May 2023 15:43:02 -0300
Jason Gunthorpe <jgg@nvidia.com> wrote:

> On Wed, May 24, 2023 at 08:59:43AM +0000, Tian, Kevin wrote:
>
> > At least this looks like a reasonable tradeoff to some proprietary
> > VMMs which never add RO mappings in stage-2 today.
>
> What is the reason for the RO anyhow?
>
> Would it be so bad if it was DMA mapped as RW due to the errata?

What if it's the zero page?

Thanks,
Alex
On Mon, May 29, 2023 at 06:16:44PM -0600, Alex Williamson wrote:

> > What is the reason for the RO anyhow?
> >
> > Would it be so bad if it was DMA mapped as RW due to the errata?
>
> What if it's the zero page?

GUP doesn't return the zero page if FOLL_WRITE is specified.

Jason
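Jason's point in code form: pinning with FOLL_WRITE forces a copy-on-write break rather than returning the shared zero page. A hedged sketch in the style of the vfio/iommufd pinning paths; pin_user_pages_fast() and the FOLL_* flags are the real kernel API, while the wrapper itself is illustrative:

    /* Hedged sketch: requesting write access at pin time guarantees GUP
     * breaks CoW, so the shared zero page is never handed to the IOMMU
     * as a writable mapping. */
    static int pin_one_page_for_dma(unsigned long vaddr, bool writable,
                                    struct page **page)
    {
        unsigned int gup_flags = FOLL_LONGTERM;

        if (writable)
            gup_flags |= FOLL_WRITE; /* forces CoW break; no zero page */

        return pin_user_pages_fast(vaddr, 1, gup_flags, page);
    }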
> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Tuesday, May 30, 2023 2:43 AM
>
> On Wed, May 24, 2023 at 08:59:43AM +0000, Tian, Kevin wrote:
>
> > At least this looks like a reasonable tradeoff to some proprietary
> > VMMs which never add RO mappings in stage-2 today.
>
> What is the reason for the RO anyhow?

vfio simply follows the permissions in the CPU address space. vBIOS
regions are marked RO there, hence that is also carried over to the vfio
mappings.

> Would it be so bad if it was DMA mapped as RW due to the errata?

Think of a scenario where the vBIOS memory is shared by multiple QEMU
instances; RW would then allow a malicious VM to modify the shared
content, potentially attacking other VMs. Skipping the mapping is safest
in this regard.
On Wed, Jun 14, 2023 at 08:07:30AM +0000, Tian, Kevin wrote:

> Think of a scenario where the vBIOS memory is shared by multiple QEMU
> instances; RW would then allow a malicious VM to modify the shared
> content, potentially attacking other VMs.

QEMU would have to map the vBIOS as MAP_PRIVATE and writable before the
iommu side could map it writable, so this is not a real worry.

Jason
> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Wednesday, June 14, 2023 7:53 PM
>
> On Wed, Jun 14, 2023 at 08:07:30AM +0000, Tian, Kevin wrote:
>
> > Think of a scenario where the vBIOS memory is shared by multiple
> > QEMU instances; RW would then allow a malicious VM to modify the
> > shared content, potentially attacking other VMs.
>
> QEMU would have to map the vBIOS as MAP_PRIVATE and writable before
> the iommu side could map it writable, so this is not a real worry.

Makes sense. But IMHO it's still safer to reduce the permission (RO->NP)
than to increase it (RO->RW) when faithfully emulating bare-metal
behavior is impossible, especially when there is no real usage counting
on it. 😊