Message ID | 20230508034700.7686-1-yan.y.zhao@intel.com |
---|---|
State | New |
Headers |
Return-Path: <linux-kernel-owner@vger.kernel.org> Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:b0ea:0:b0:3b6:4342:cba0 with SMTP id b10csp1910954vqo; Sun, 7 May 2023 21:43:39 -0700 (PDT) X-Google-Smtp-Source: ACHHUZ4DE5hjrQ1wccFlG8B+rIU7eQHJXNgu6cgrcRKotEmeu9jMESb42tzHqX/deTFEYhjvSJnc X-Received: by 2002:a05:6a20:7490:b0:100:5edf:da81 with SMTP id p16-20020a056a20749000b001005edfda81mr2911721pzd.36.1683521019017; Sun, 07 May 2023 21:43:39 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1683521018; cv=none; d=google.com; s=arc-20160816; b=qgAMnlzJiwIQIX/kgc6lVK22b44unqpB3VC2g3vtmtywpJJuwG8DOn1YtLS2esQnX1 En2a2KxmQSyEj/6XgbxsmTuYePso27vc9uIwjPNl9XN51Zeiv3SMIIg5ENlo/e1a1wfM 6Yi2BuXcpvWR8XGu8G5uu1jgOOhw1jyGuLk+z4DBNqc7R/iAueNXlVp94X69Sk2bYEb+ DEz7Eh4zwkUZLLUo5GkTBoUlSdbZf6VP3ho40EYWGBnEEqfrEpJSb4NyR7v300qXH3Wa V9AUWIM/SRuk6obM3iU5QKV/panYAoxyDBno//0U+2wb0Csv/EOnSCq2lEvlcGo5yzNU SL+g== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:message-id:date:subject:cc:to:from :dkim-signature; bh=miR4ws+hBBgxpqwFBsVYK9wMZ+ziqKgK9OKVVWSzQPQ=; b=V7AG7HAUqBiUI3h8UzRrdQx1YWToomf78qHrn7lAyNjyvpJNdDlkajDq2NBrrzjgCO 9ciWnDx6Zd2pnlsdgq0xhZck9LCpHcLKMNz95p6XlOkMKSkrzrVgEK5pkzr1RYR8Hwzm b1EA2MR8xPw2UpRjclh6VhCXszvikmKzcZJ5lBF4OHNEQkdWiudwMyY/MgmUFagdu4mT t6EXU5Ed1OsIvS558D5N9gg/04572nSmtrR6m5E4ulubur9Pq6Yn244hTO6BXlUugDcp nQmEa96bIOuot1bQen0AAaIqr5QPm+fCHi8Fi9VDRCDEsiNE/eukSvqcJhfamv+DTkPH jwMQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=ClcI07Te; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id s2-20020a63af42000000b0050f55427e3dsi7727022pgo.701.2023.05.07.21.43.24; Sun, 07 May 2023 21:43:38 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=ClcI07Te; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232487AbjEHEMN (ORCPT <rfc822;baris.duru.linux@gmail.com> + 99 others); Mon, 8 May 2023 00:12:13 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:33492 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232217AbjEHEMK (ORCPT <rfc822;linux-kernel@vger.kernel.org>); Mon, 8 May 2023 00:12:10 -0400 Received: from mga07.intel.com (mga07.intel.com [134.134.136.100]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 7EFB94227; Sun, 7 May 2023 21:12:09 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1683519129; x=1715055129; h=from:to:cc:subject:date:message-id; bh=zT05ulhlCsDMX6ubQhxzo5A+qEruyiIbSgQaS9PNcCc=; b=ClcI07Tej/VhD/Cjcn6HM1aBxmRSnkj1+r4NrEzPf8XKel5yoRTEQ3W9 T/NiGUuZL4i5QDAt7O80rfoVSX+zJv0Txa3fYPz60XayIfUoN8sLgZ/hw ukW/mFvU4tqKcOsXUNCvuAbaN1waS4jGvqPNsO7uKX4+eGZup8xGF4s3O UdEbM35Ugu7Y2VJocSsxWdOCLJLbluElXk3mkwnc4zvInGIRst4XYCaz+ WvtwC/dz3ppfSHjUNWplXHuBvvLAH2nvlEq63vDB2dpaFmSHud07z8tHC 7lKboIzkZrmxVGSEPTdKVBtQtMIsXjfZU4CfzZ40A8UP8RQ22+tLWCI7V g==; X-IronPort-AV: E=McAfee;i="6600,9927,10703"; a="415110258" X-IronPort-AV: E=Sophos;i="5.99,258,1677571200"; d="scan'208";a="415110258" Received: from orsmga001.jf.intel.com ([10.7.209.18]) by orsmga105.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 07 May 2023 21:12:09 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10703"; a="731146970" X-IronPort-AV: E=Sophos;i="5.99,258,1677571200"; d="scan'208";a="731146970" Received: from yzhao56-desk.sh.intel.com ([10.239.159.62]) by orsmga001-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 07 May 2023 21:12:04 -0700 From: Yan Zhao <yan.y.zhao@intel.com> To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: pbonzini@redhat.com, seanjc@google.com, Yan Zhao <yan.y.zhao@intel.com> Subject: [PATCH] KVM: x86: Zap all TDP leaf entries according noncoherent DMA count Date: Mon, 8 May 2023 11:47:00 +0800 Message-Id: <20230508034700.7686-1-yan.y.zhao@intel.com> X-Mailer: git-send-email 2.17.1 X-Spam-Status: No, score=-4.4 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_MED, RCVD_IN_MSPIKE_H2,SPF_HELO_NONE,SPF_NONE,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: <linux-kernel.vger.kernel.org> X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1765299736390927179?= X-GMAIL-MSGID: =?utf-8?q?1765299736390927179?= |
Series |
KVM: x86: Zap all TDP leaf entries according noncoherent DMA count
|
|
Commit Message
Yan Zhao
May 8, 2023, 3:47 a.m. UTC
Zap all TDP leaf entries when noncoherent DMA count goes from 0 to !0, or
from !0 to 0.
When there's no noncoherent DMA device, EPT memory type is
((MTRR_TYPE_WRBACK << VMX_EPT_MT_EPTE_SHIFT) | VMX_EPT_IPAT_BIT)
When there're noncoherent DMA devices, EPT memory type needs to honor
guest CR0_CD and MTRR settings.
So, if noncoherent DMA count changes between 0 and !0, EPT leaf entries
need to be zapped to clear stale memory type.
This issue might be hidden when VFIO adding/removing MMIO regions of the
noncoherent DMA devices on device attaching/de-attaching because
usually the MMIO regions will be disabled/enabled for several times during
guest PCI probing. And in KVM, TDP entries are all zapped on memslot
removal.
However, this issue may appear when kvm_mmu_zap_all_fast() is not called
before KVM slot removal, e.g. as for TDX, only leaf entries for the
memslot to be removed is zapped.
static void kvm_mmu_invalidate_zap_pages_in_memslot(struct kvm *kvm,
struct kvm_memory_slot *slot,
struct kvm_page_track_notifier_node *node)
{
if (kvm_gfn_shared_mask(kvm))
/*
* Secure-EPT requires to release PTs from the leaf. The
* optimization to zap root PT first with child PT doesn't
* work.
*/
kvm_mmu_zap_memslot(kvm, slot);
else
kvm_mmu_zap_all_fast(kvm);
}
And even without TDX's case, in some extreme conditions if MMIO regions
are not disabled during device attaching, e.g. if guest does not cause
the MMIO region disabling in QEMU.
Then TDP zap will not be called and wrong EPT memory type might be
retained.
So, do the TDP zapping of all leaf entries when present/non-present state
of noncoherent DMA devices changes to ensure stale entries cleaned away.
And as this is not a frequent operation, the extra zap should be fine.
Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
---
arch/x86/kvm/x86.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
Comments
On Mon, May 08, 2023 at 11:47:00AM +0800, Yan Zhao wrote: >Zap all TDP leaf entries when noncoherent DMA count goes from 0 to !0, or >from !0 to 0. > >When there's no noncoherent DMA device, EPT memory type is >((MTRR_TYPE_WRBACK << VMX_EPT_MT_EPTE_SHIFT) | VMX_EPT_IPAT_BIT) > >When there're noncoherent DMA devices, EPT memory type needs to honor >guest CR0_CD and MTRR settings. > >So, if noncoherent DMA count changes between 0 and !0, EPT leaf entries >need to be zapped to clear stale memory type. > >This issue might be hidden when VFIO adding/removing MMIO regions of the >noncoherent DMA devices on device attaching/de-attaching because >usually the MMIO regions will be disabled/enabled for several times during >guest PCI probing. And in KVM, TDP entries are all zapped on memslot >removal. > >However, this issue may appear when kvm_mmu_zap_all_fast() is not called >before KVM slot removal, e.g. as for TDX, only leaf entries for the >memslot to be removed is zapped. > >static void kvm_mmu_invalidate_zap_pages_in_memslot(struct kvm *kvm, > struct kvm_memory_slot *slot, > struct kvm_page_track_notifier_node *node) >{ > if (kvm_gfn_shared_mask(kvm)) > /* > * Secure-EPT requires to release PTs from the leaf. The > * optimization to zap root PT first with child PT doesn't > * work. > */ > kvm_mmu_zap_memslot(kvm, slot); > else > kvm_mmu_zap_all_fast(kvm); >} TDX code isn't merged. So, I think you'd better not use TDX as an argument. > >And even without TDX's case, in some extreme conditions if MMIO regions >are not disabled during device attaching, e.g. if guest does not cause >the MMIO region disabling in QEMU. >Then TDP zap will not be called and wrong EPT memory type might be >retained. > >So, do the TDP zapping of all leaf entries when present/non-present state >of noncoherent DMA devices changes to ensure stale entries cleaned away. >And as this is not a frequent operation, the extra zap should be fine. > >Signed-off-by: Yan Zhao <yan.y.zhao@intel.com> >--- > arch/x86/kvm/x86.c | 6 ++++-- > 1 file changed, 4 insertions(+), 2 deletions(-) > >diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c >index e7f78fe79b32..99a825722d95 100644 >--- a/arch/x86/kvm/x86.c >+++ b/arch/x86/kvm/x86.c >@@ -13145,13 +13145,15 @@ EXPORT_SYMBOL_GPL(kvm_arch_has_assigned_device); > > void kvm_arch_register_noncoherent_dma(struct kvm *kvm) > { >- atomic_inc(&kvm->arch.noncoherent_dma_count); >+ if (atomic_inc_return(&kvm->arch.noncoherent_dma_count) == 1) >+ kvm_zap_gfn_range(kvm, gpa_to_gfn(0), gpa_to_gfn(~0ULL)); The issue is specific to EPT. shouldn't this be conditional on tdp_enabled, like update_mtrr()? Likewise, shouldn't we omit to call kvm_zap_gfn_range() in kvm_post_set_cr0() if tdp_enabled is false? > } > EXPORT_SYMBOL_GPL(kvm_arch_register_noncoherent_dma); > > void kvm_arch_unregister_noncoherent_dma(struct kvm *kvm) > { >- atomic_dec(&kvm->arch.noncoherent_dma_count); >+ if (!atomic_dec_return(&kvm->arch.noncoherent_dma_count)) >+ kvm_zap_gfn_range(kvm, gpa_to_gfn(0), gpa_to_gfn(~0ULL)); > } > EXPORT_SYMBOL_GPL(kvm_arch_unregister_noncoherent_dma); > >-- >2.17.1 >
On Mon, May 08, 2023 at 03:19:56PM +0800, Chao Gao wrote: > On Mon, May 08, 2023 at 11:47:00AM +0800, Yan Zhao wrote: > >Zap all TDP leaf entries when noncoherent DMA count goes from 0 to !0, or > >from !0 to 0. > > > >When there's no noncoherent DMA device, EPT memory type is > >((MTRR_TYPE_WRBACK << VMX_EPT_MT_EPTE_SHIFT) | VMX_EPT_IPAT_BIT) > > > >When there're noncoherent DMA devices, EPT memory type needs to honor > >guest CR0_CD and MTRR settings. > > > >So, if noncoherent DMA count changes between 0 and !0, EPT leaf entries > >need to be zapped to clear stale memory type. > > > >This issue might be hidden when VFIO adding/removing MMIO regions of the > >noncoherent DMA devices on device attaching/de-attaching because > >usually the MMIO regions will be disabled/enabled for several times during > >guest PCI probing. And in KVM, TDP entries are all zapped on memslot > >removal. > > > >However, this issue may appear when kvm_mmu_zap_all_fast() is not called > >before KVM slot removal, e.g. as for TDX, only leaf entries for the > >memslot to be removed is zapped. > > > >static void kvm_mmu_invalidate_zap_pages_in_memslot(struct kvm *kvm, > > struct kvm_memory_slot *slot, > > struct kvm_page_track_notifier_node *node) > >{ > > if (kvm_gfn_shared_mask(kvm)) > > /* > > * Secure-EPT requires to release PTs from the leaf. The > > * optimization to zap root PT first with child PT doesn't > > * work. > > */ > > kvm_mmu_zap_memslot(kvm, slot); > > else > > kvm_mmu_zap_all_fast(kvm); > >} > > TDX code isn't merged. So, I think you'd better not use TDX as an argument. > Ok. But I just want to explain that kvm_mmu_zap_all_fast() is not desired in some cases during slot DELETE. TDX is a good example here. > > > >And even without TDX's case, in some extreme conditions if MMIO regions > >are not disabled during device attaching, e.g. if guest does not cause > >the MMIO region disabling in QEMU. > >Then TDP zap will not be called and wrong EPT memory type might be > >retained. > > > >So, do the TDP zapping of all leaf entries when present/non-present state > >of noncoherent DMA devices changes to ensure stale entries cleaned away. > >And as this is not a frequent operation, the extra zap should be fine. > > > >Signed-off-by: Yan Zhao <yan.y.zhao@intel.com> > >--- > > arch/x86/kvm/x86.c | 6 ++++-- > > 1 file changed, 4 insertions(+), 2 deletions(-) > > > >diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c > >index e7f78fe79b32..99a825722d95 100644 > >--- a/arch/x86/kvm/x86.c > >+++ b/arch/x86/kvm/x86.c > >@@ -13145,13 +13145,15 @@ EXPORT_SYMBOL_GPL(kvm_arch_has_assigned_device); > > > > void kvm_arch_register_noncoherent_dma(struct kvm *kvm) > > { > >- atomic_inc(&kvm->arch.noncoherent_dma_count); > >+ if (atomic_inc_return(&kvm->arch.noncoherent_dma_count) == 1) > > >+ kvm_zap_gfn_range(kvm, gpa_to_gfn(0), gpa_to_gfn(~0ULL)); > > The issue is specific to EPT. shouldn't this be conditional on tdp_enabled, like > update_mtrr()? > Yes. good point. Maybe also include checking of shadow_memtype_mask. > Likewise, shouldn't we omit to call kvm_zap_gfn_range() in kvm_post_set_cr0() if > tdp_enabled is false? I think so. And also check tdp_enabled and shadow_memtype_mask in the case of update_mtrr(). Will add a helper function in next version. Thanks, Chao! > > > } > > EXPORT_SYMBOL_GPL(kvm_arch_register_noncoherent_dma); > > > > void kvm_arch_unregister_noncoherent_dma(struct kvm *kvm) > > { > >- atomic_dec(&kvm->arch.noncoherent_dma_count); > >+ if (!atomic_dec_return(&kvm->arch.noncoherent_dma_count)) > >+ kvm_zap_gfn_range(kvm, gpa_to_gfn(0), gpa_to_gfn(~0ULL)); > > } > > EXPORT_SYMBOL_GPL(kvm_arch_unregister_noncoherent_dma); > > > >-- > >2.17.1 > >
On Tue, May 09, 2023 at 09:40:37AM +0800, Yan Zhao wrote: >On Mon, May 08, 2023 at 03:19:56PM +0800, Chao Gao wrote: >> >However, this issue may appear when kvm_mmu_zap_all_fast() is not called >> >before KVM slot removal, e.g. as for TDX, only leaf entries for the >> >memslot to be removed is zapped. >> > >> >static void kvm_mmu_invalidate_zap_pages_in_memslot(struct kvm *kvm, >> > struct kvm_memory_slot *slot, >> > struct kvm_page_track_notifier_node *node) >> >{ >> > if (kvm_gfn_shared_mask(kvm)) >> > /* >> > * Secure-EPT requires to release PTs from the leaf. The >> > * optimization to zap root PT first with child PT doesn't >> > * work. >> > */ >> > kvm_mmu_zap_memslot(kvm, slot); >> > else >> > kvm_mmu_zap_all_fast(kvm); >> >} >> >> TDX code isn't merged. So, I think you'd better not use TDX as an argument. >> >Ok. But I just want to explain that kvm_mmu_zap_all_fast() is not >desired in some cases during slot DELETE. TDX is a good example here. Ok. Maybe just remove the above code snippet. The code snippet is confusing because people may think it is what current KVM has, but actually it turns out to be a modified version in pending TDX patches, which are subject to change. > >> > >> >And even without TDX's case, in some extreme conditions if MMIO regions >> >are not disabled during device attaching, e.g. if guest does not cause >> >the MMIO region disabling in QEMU. >> >Then TDP zap will not be called and wrong EPT memory type might be >> >retained. >> > >> >So, do the TDP zapping of all leaf entries when present/non-present state >> >of noncoherent DMA devices changes to ensure stale entries cleaned away. >> >And as this is not a frequent operation, the extra zap should be fine. >> > >> >Signed-off-by: Yan Zhao <yan.y.zhao@intel.com> >> >--- >> > arch/x86/kvm/x86.c | 6 ++++-- >> > 1 file changed, 4 insertions(+), 2 deletions(-) >> > >> >diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c >> >index e7f78fe79b32..99a825722d95 100644 >> >--- a/arch/x86/kvm/x86.c >> >+++ b/arch/x86/kvm/x86.c >> >@@ -13145,13 +13145,15 @@ EXPORT_SYMBOL_GPL(kvm_arch_has_assigned_device); >> > >> > void kvm_arch_register_noncoherent_dma(struct kvm *kvm) >> > { >> >- atomic_inc(&kvm->arch.noncoherent_dma_count); >> >+ if (atomic_inc_return(&kvm->arch.noncoherent_dma_count) == 1) >> >> >+ kvm_zap_gfn_range(kvm, gpa_to_gfn(0), gpa_to_gfn(~0ULL)); >> >> The issue is specific to EPT. shouldn't this be conditional on tdp_enabled, like >> update_mtrr()? >> >Yes. good point. >Maybe also include checking of shadow_memtype_mask. looks good.
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index e7f78fe79b32..99a825722d95 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -13145,13 +13145,15 @@ EXPORT_SYMBOL_GPL(kvm_arch_has_assigned_device); void kvm_arch_register_noncoherent_dma(struct kvm *kvm) { - atomic_inc(&kvm->arch.noncoherent_dma_count); + if (atomic_inc_return(&kvm->arch.noncoherent_dma_count) == 1) + kvm_zap_gfn_range(kvm, gpa_to_gfn(0), gpa_to_gfn(~0ULL)); } EXPORT_SYMBOL_GPL(kvm_arch_register_noncoherent_dma); void kvm_arch_unregister_noncoherent_dma(struct kvm *kvm) { - atomic_dec(&kvm->arch.noncoherent_dma_count); + if (!atomic_dec_return(&kvm->arch.noncoherent_dma_count)) + kvm_zap_gfn_range(kvm, gpa_to_gfn(0), gpa_to_gfn(~0ULL)); } EXPORT_SYMBOL_GPL(kvm_arch_unregister_noncoherent_dma);