[v3,07/11] KVM: VMX: drop IPAT in memtype when CD=1 for KVM_X86_QUIRK_CD_NW_CLEARED
Message ID | 20230616023815.7439-1-yan.y.zhao@intel.com |
---|---|
State | New |
Headers |
Return-Path: <linux-kernel-owner@vger.kernel.org> Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:994d:0:b0:3d9:f83d:47d9 with SMTP id k13csp1060439vqr; Thu, 15 Jun 2023 20:18:50 -0700 (PDT) X-Google-Smtp-Source: ACHHUZ6GWrmseII0G1nGH+VLzFfqBTAS7hjRXCfTpugXhpiDwdaLL0myMXTDEr2FznFXieBp9Sho X-Received: by 2002:a17:902:6a85:b0:1b5:2fdf:5bd8 with SMTP id n5-20020a1709026a8500b001b52fdf5bd8mr445575plk.8.1686885530190; Thu, 15 Jun 2023 20:18:50 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1686885530; cv=none; d=google.com; s=arc-20160816; b=AWhQkhlZcgC11wT6ySpp1Nwz07GCIRVd4r5aEgHOnP5XjNIR76FwTMXiZQ2AH6hmSv LfBcsrnlfNw7I9ALrH8mPpdaFMnGWe/Ua9pHdPyJtjiiT8LV2Y3e17kz1Tg1t1hiJ87s fXyQUfPcBRHadHsYdty64QHR2PgIWD2AcuiYSOQV54wehbgfmj2gopG+K6vpiUZd77sk aB8HaVXXdRwAqHEQjKqGDYLab3CGyP1qBJd1vBzcSqOgRb25xHwHJ8fwtcDrt7boqin3 UYPZADTAWb+xVwP39zOzLo557yUh/bGBG4WCSycxSkr19NdFLPgX3Nj4GurcEBMpWVCI dMLA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:references:in-reply-to:message-id:date:subject :cc:to:from:dkim-signature; bh=wbJsOGmI/nT2KllmvTDf95NYFlsin399M0Ni4eYW7r0=; b=UDfeM/p1YmkLSLEZe22J4XArcUESsB/+qeMw4vcc2p6Y5dZGcUuivsyG/De8gauvBI cO9ZDVyWN+WePD0DTWM3+NR3/1cbBt5W+quX65VTZYHm0VsennACGQiJmhspgL66gdcC RyMs0MK/QsclV5FLjAF5yTE2fCKzU3Hn29b6mO3TcPpLY8oo4pHHVbETRMrqFU7zPQhQ TDYJTJQFmCQTrIxtDAamGMa/W9mvbAudoi3S3RmfEAnLQQ5VbBGCYDBHet4ELGuyL3Wd 6oXSkfUiIEUKpt+2DpaggsH2MbOAGCpeNrc3FrQ9tr6idXFPk5WZkSJMX8nJlXdmeUB8 CjRA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=DCilRLKj; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id t3-20020a170902bc4300b001ae38227983si13826359plz.199.2023.06.15.20.18.37; Thu, 15 Jun 2023 20:18:50 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=DCilRLKj; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S241318AbjFPDEE (ORCPT <rfc822;maxin.john@gmail.com> + 99 others); Thu, 15 Jun 2023 23:04:04 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:32972 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229654AbjFPDD7 (ORCPT <rfc822;linux-kernel@vger.kernel.org>); Thu, 15 Jun 2023 23:03:59 -0400 Received: from mga04.intel.com (mga04.intel.com [192.55.52.120]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id F30AC2D47; Thu, 15 Jun 2023 20:03:58 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1686884638; x=1718420638; h=from:to:cc:subject:date:message-id:in-reply-to: references; bh=tApTo48fmEWmOt000/vMI+RpIzA2pYKklHGoAN2lprc=; b=DCilRLKjlPcbjNYarsrjYWgkn934h+4jjB88EP+z5vlvBftB7ok9v1Mw HydxmHlgzrXTb4w/ksrTd3wuhEQD82n4fYODyBb+Gfzg+dRxDruGSv4Mu AQ2z+cZx38yCvX3J201s/CbEzHU+/iLY/Dy+4msQDelUQHLNy3DkYHzOY HhzjGxgyZr8H6kHLauEC3MMQ3ZZY+zkyJCd1hlkvk0y3Ip9gS0j1F9CKz 9qjCRhwtjgNtwfVkgKfIzH8dGFiORlBjWWSDVy5EBQH63I4JWJh1tdSo8 Fef0qiupUYibaur28516osfkm0nqtAo4xPgYqSSVKJhVf4qFiSm8T9Enl w==; X-IronPort-AV: E=McAfee;i="6600,9927,10742"; a="357976726" X-IronPort-AV: E=Sophos;i="6.00,246,1681196400"; d="scan'208";a="357976726" Received: from fmsmga004.fm.intel.com ([10.253.24.48]) by fmsmga104.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 15 Jun 2023 20:03:58 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10742"; a="782724568" X-IronPort-AV: E=Sophos;i="6.00,246,1681196400"; d="scan'208";a="782724568" Received: from yzhao56-desk.sh.intel.com ([10.239.159.62]) by fmsmga004-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 15 Jun 2023 20:03:34 -0700 From: Yan Zhao <yan.y.zhao@intel.com> To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: pbonzini@redhat.com, seanjc@google.com, chao.gao@intel.com, kai.huang@intel.com, robert.hoo.linux@gmail.com, Yan Zhao <yan.y.zhao@intel.com> Subject: [PATCH v3 07/11] KVM: VMX: drop IPAT in memtype when CD=1 for KVM_X86_QUIRK_CD_NW_CLEARED Date: Fri, 16 Jun 2023 10:38:15 +0800 Message-Id: <20230616023815.7439-1-yan.y.zhao@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20230616023101.7019-1-yan.y.zhao@intel.com> References: <20230616023101.7019-1-yan.y.zhao@intel.com> X-Spam-Status: No, score=-4.4 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_MED, SPF_HELO_NONE,SPF_NONE,T_SCC_BODY_TEXT_LINE,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: <linux-kernel.vger.kernel.org> X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1768827682032173413?= X-GMAIL-MSGID: =?utf-8?q?1768827682032173413?= |
Series |
KVM: x86/mmu: refine memtype related mmu zap
|
|
Commit Message
Yan Zhao
June 16, 2023, 2:38 a.m. UTC
For KVM_X86_QUIRK_CD_NW_CLEARED, remove the ignore PAT bit in EPT memory
types when cache is disabled and non-coherent DMA are present.
With the quirk KVM_X86_QUIRK_CD_NW_CLEARED, WB + IPAT are returned as the
EPT memory type when guest cache is disabled before this patch.
Removing the IPAT bit in this patch will allow effective memory type to
honor PAT values as well, which will make the effective memory type
stronger than WB as WB is the weakest memtype. However, this change is
acceptable as it doesn't make the memory type weaker, consider without
this quirk, the original memtype for cache disabled is UC + IPAT.
Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
---
arch/x86/kvm/vmx/vmx.c | 9 +++------
1 file changed, 3 insertions(+), 6 deletions(-)
Comments
On Tue, Jun 20, 2023 at 10:42:57AM +0800, Chao Gao wrote: > On Fri, Jun 16, 2023 at 10:38:15AM +0800, Yan Zhao wrote: > >For KVM_X86_QUIRK_CD_NW_CLEARED, remove the ignore PAT bit in EPT memory > >types when cache is disabled and non-coherent DMA are present. > > > >With the quirk KVM_X86_QUIRK_CD_NW_CLEARED, WB + IPAT are returned as the > >EPT memory type when guest cache is disabled before this patch. > >Removing the IPAT bit in this patch will allow effective memory type to > >honor PAT values as well, which will make the effective memory type > > Given guest sets CR0.CD, what's the point of honoring (guest) PAT? e.g., > which guests can benefit from this change? This patch is actually a preparation for later patch 10 to implement fine-grained zap. If when CR0.CD=1 the EPT type is WB + IPAT, and when CR0.CD=0 + mtrr enabled, EPT type is WB or UC or ..., which are without IPAT, then we have to always zap all EPT entries. Given removing the IPAT bit when CR0.CD=1 only makes the quirk KVM_X86_QUIRK_CD_NW_CLEARED more strict (meaning it could be WC/UC... if the guest PAT overwrites it), it's still acceptable. On the other hand, from the guest's point of view, it sees the GPA is UC with CR0.CD=1. So, if we want to align host EPT memtype with guest's view, we have to drop the quirk KVM_X86_QUIRK_CD_NW_CLEARED, which, however, will impact the guest bootup performance dramatically and is not adopted in any VM. So, we remove the IPAT bit when CR0.CD=1 with the quirk KVM_X86_QUIRK_CD_NW_CLEARED to make it stricter and to enable later optimization. > > >stronger than WB as WB is the weakest memtype. However, this change is > >acceptable as it doesn't make the memory type weaker, > > >consider without > >this quirk, the original memtype for cache disabled is UC + IPAT. > > This change impacts only the case where the quirk is enabled. Maybe you > mean: > > _with_ the quirk, the original memtype for cached disabled is _WB_ + IPAT. > Uh, I mean originally without the quirk, UC + IPAT should be returned, which is the correct value to return. I can explain the full background more clearly in the next version. Thanks > > > > >Suggested-by: Sean Christopherson <seanjc@google.com> > >Signed-off-by: Yan Zhao <yan.y.zhao@intel.com> > >--- > > arch/x86/kvm/vmx/vmx.c | 9 +++------ > > 1 file changed, 3 insertions(+), 6 deletions(-) > > > >diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c > >index 0ecf4be2c6af..c1e93678cea4 100644 > >--- a/arch/x86/kvm/vmx/vmx.c > >+++ b/arch/x86/kvm/vmx/vmx.c > >@@ -7548,8 +7548,6 @@ static int vmx_vm_init(struct kvm *kvm) > > > > static u8 vmx_get_mt_mask(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio) > > { > >- u8 cache; > >- > > /* We wanted to honor guest CD/MTRR/PAT, but doing so could result in > > * memory aliases with conflicting memory types and sometimes MCEs. > > * We have to be careful as to what are honored and when. > >@@ -7576,11 +7574,10 @@ static u8 vmx_get_mt_mask(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio) > > > > if (kvm_read_cr0_bits(vcpu, X86_CR0_CD)) { > > if (kvm_check_has_quirk(vcpu->kvm, KVM_X86_QUIRK_CD_NW_CLEARED)) > >- cache = MTRR_TYPE_WRBACK; > >+ return MTRR_TYPE_WRBACK << VMX_EPT_MT_EPTE_SHIFT; > > Shouldn't KVM honor guest MTRRs as well? i.e., as if CR0.CD isn't set. > > > else > >- cache = MTRR_TYPE_UNCACHABLE; > >- > >- return (cache << VMX_EPT_MT_EPTE_SHIFT) | VMX_EPT_IPAT_BIT; > >+ return (MTRR_TYPE_UNCACHABLE << VMX_EPT_MT_EPTE_SHIFT) | > >+ VMX_EPT_IPAT_BIT; > > } > > > > return kvm_mtrr_get_guest_memory_type(vcpu, gfn) << VMX_EPT_MT_EPTE_SHIFT; > >-- > >2.17.1 > >
On Fri, Jun 16, 2023 at 10:38:15AM +0800, Yan Zhao wrote: >For KVM_X86_QUIRK_CD_NW_CLEARED, remove the ignore PAT bit in EPT memory >types when cache is disabled and non-coherent DMA are present. > >With the quirk KVM_X86_QUIRK_CD_NW_CLEARED, WB + IPAT are returned as the >EPT memory type when guest cache is disabled before this patch. >Removing the IPAT bit in this patch will allow effective memory type to >honor PAT values as well, which will make the effective memory type Given guest sets CR0.CD, what's the point of honoring (guest) PAT? e.g., which guests can benefit from this change? >stronger than WB as WB is the weakest memtype. However, this change is >acceptable as it doesn't make the memory type weaker, >consider without >this quirk, the original memtype for cache disabled is UC + IPAT. This change impacts only the case where the quirk is enabled. Maybe you mean: _with_ the quirk, the original memtype for cached disabled is _WB_ + IPAT. > >Suggested-by: Sean Christopherson <seanjc@google.com> >Signed-off-by: Yan Zhao <yan.y.zhao@intel.com> >--- > arch/x86/kvm/vmx/vmx.c | 9 +++------ > 1 file changed, 3 insertions(+), 6 deletions(-) > >diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c >index 0ecf4be2c6af..c1e93678cea4 100644 >--- a/arch/x86/kvm/vmx/vmx.c >+++ b/arch/x86/kvm/vmx/vmx.c >@@ -7548,8 +7548,6 @@ static int vmx_vm_init(struct kvm *kvm) > > static u8 vmx_get_mt_mask(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio) > { >- u8 cache; >- > /* We wanted to honor guest CD/MTRR/PAT, but doing so could result in > * memory aliases with conflicting memory types and sometimes MCEs. > * We have to be careful as to what are honored and when. >@@ -7576,11 +7574,10 @@ static u8 vmx_get_mt_mask(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio) > > if (kvm_read_cr0_bits(vcpu, X86_CR0_CD)) { > if (kvm_check_has_quirk(vcpu->kvm, KVM_X86_QUIRK_CD_NW_CLEARED)) >- cache = MTRR_TYPE_WRBACK; >+ return MTRR_TYPE_WRBACK << VMX_EPT_MT_EPTE_SHIFT; Shouldn't KVM honor guest MTRRs as well? i.e., as if CR0.CD isn't set. > else >- cache = MTRR_TYPE_UNCACHABLE; >- >- return (cache << VMX_EPT_MT_EPTE_SHIFT) | VMX_EPT_IPAT_BIT; >+ return (MTRR_TYPE_UNCACHABLE << VMX_EPT_MT_EPTE_SHIFT) | >+ VMX_EPT_IPAT_BIT; > } > > return kvm_mtrr_get_guest_memory_type(vcpu, gfn) << VMX_EPT_MT_EPTE_SHIFT; >-- >2.17.1 >
On Tue, Jun 20, 2023 at 10:42:57AM +0800, Chao Gao wrote: > > > >diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c > >index 0ecf4be2c6af..c1e93678cea4 100644 > >--- a/arch/x86/kvm/vmx/vmx.c > >+++ b/arch/x86/kvm/vmx/vmx.c > >@@ -7548,8 +7548,6 @@ static int vmx_vm_init(struct kvm *kvm) > > > > static u8 vmx_get_mt_mask(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio) > > { > >- u8 cache; > >- > > /* We wanted to honor guest CD/MTRR/PAT, but doing so could result in > > * memory aliases with conflicting memory types and sometimes MCEs. > > * We have to be careful as to what are honored and when. > >@@ -7576,11 +7574,10 @@ static u8 vmx_get_mt_mask(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio) > > > > if (kvm_read_cr0_bits(vcpu, X86_CR0_CD)) { > > if (kvm_check_has_quirk(vcpu->kvm, KVM_X86_QUIRK_CD_NW_CLEARED)) > >- cache = MTRR_TYPE_WRBACK; > >+ return MTRR_TYPE_WRBACK << VMX_EPT_MT_EPTE_SHIFT; > > Shouldn't KVM honor guest MTRRs as well? i.e., as if CR0.CD isn't set. I proposed to return guest mtrr type when CR0.CD=1. https://lore.kernel.org/all/ZHDZpHYQhPtkNnQe@google.com/ Sean rejected it because before guest MTRR is first time enabled, we have to return WB with the quirk, otherwise the VM bootup will be super slow. Or maybe we can return WB when guest MTRR is not enabled and guest MTRR type when guest MTRR is anabled to woraround it. e.g. if (kvm_read_cr0_bits(vcpu, X86_CR0_CD)) { if (kvm_check_has_quirk(vcpu->kvm, KVM_X86_QUIRK_CD_NW_CLEARED)) { cache = !mtrr_is_enabled(mtrr_state) ? MTRR_TYPE_WRBACK : kvm_mtrr_get_guest_memory_type(vcpu, gfn); return cache << VMX_EPT_MT_EPTE_SHIFT; } ... } But Sean does not like piling too much on top of the quirk and I think it is right to keep the quirk simple. "In short, I am steadfastly against any change that piles more arbitrary behavior functional behavior on top of the quirk, especially when that behavior relies on heuristics to try and guess what the guest is doing." > > > else > >- cache = MTRR_TYPE_UNCACHABLE; > >- > >- return (cache << VMX_EPT_MT_EPTE_SHIFT) | VMX_EPT_IPAT_BIT; > >+ return (MTRR_TYPE_UNCACHABLE << VMX_EPT_MT_EPTE_SHIFT) | > >+ VMX_EPT_IPAT_BIT; > > } > > > > return kvm_mtrr_get_guest_memory_type(vcpu, gfn) << VMX_EPT_MT_EPTE_SHIFT; > >-- > >2.17.1 > >
On Tue, Jun 20, 2023 at 11:34:43AM +0800, Chao Gao wrote: > On Tue, Jun 20, 2023 at 10:34:29AM +0800, Yan Zhao wrote: > >On Tue, Jun 20, 2023 at 10:42:57AM +0800, Chao Gao wrote: > >> On Fri, Jun 16, 2023 at 10:38:15AM +0800, Yan Zhao wrote: > >> >For KVM_X86_QUIRK_CD_NW_CLEARED, remove the ignore PAT bit in EPT memory > >> >types when cache is disabled and non-coherent DMA are present. > >> > > >> >With the quirk KVM_X86_QUIRK_CD_NW_CLEARED, WB + IPAT are returned as the > >> >EPT memory type when guest cache is disabled before this patch. > >> >Removing the IPAT bit in this patch will allow effective memory type to > >> >honor PAT values as well, which will make the effective memory type > >> > >> Given guest sets CR0.CD, what's the point of honoring (guest) PAT? e.g., > >> which guests can benefit from this change? > >This patch is actually a preparation for later patch 10 to implement > >fine-grained zap. > >If when CR0.CD=1 the EPT type is WB + IPAT, and > >when CR0.CD=0 + mtrr enabled, EPT type is WB or UC or ..., which are > >without IPAT, then we have to always zap all EPT entries. > > OK. The goal is to reduce the cost of toggling CR0.CD. The key is if KVM sets > the IPAT, then when CR0.CD is cleared by guest, KVM has to zap _all_ EPT entries > at least to clear IPAT. > > Can kvm honor guest MTRRs as well when CR0.CD=1 && with the quirk? then later > clearing CR0.CD needn't zap _any_ EPT entry. But the optimization is exactly the > one removed in patch 6. Maybe I miss something important. I replied it in another mail.
On Tue, Jun 20, 2023 at 10:34:29AM +0800, Yan Zhao wrote: >On Tue, Jun 20, 2023 at 10:42:57AM +0800, Chao Gao wrote: >> On Fri, Jun 16, 2023 at 10:38:15AM +0800, Yan Zhao wrote: >> >For KVM_X86_QUIRK_CD_NW_CLEARED, remove the ignore PAT bit in EPT memory >> >types when cache is disabled and non-coherent DMA are present. >> > >> >With the quirk KVM_X86_QUIRK_CD_NW_CLEARED, WB + IPAT are returned as the >> >EPT memory type when guest cache is disabled before this patch. >> >Removing the IPAT bit in this patch will allow effective memory type to >> >honor PAT values as well, which will make the effective memory type >> >> Given guest sets CR0.CD, what's the point of honoring (guest) PAT? e.g., >> which guests can benefit from this change? >This patch is actually a preparation for later patch 10 to implement >fine-grained zap. >If when CR0.CD=1 the EPT type is WB + IPAT, and >when CR0.CD=0 + mtrr enabled, EPT type is WB or UC or ..., which are >without IPAT, then we have to always zap all EPT entries. OK. The goal is to reduce the cost of toggling CR0.CD. The key is if KVM sets the IPAT, then when CR0.CD is cleared by guest, KVM has to zap _all_ EPT entries at least to clear IPAT. Can kvm honor guest MTRRs as well when CR0.CD=1 && with the quirk? then later clearing CR0.CD needn't zap _any_ EPT entry. But the optimization is exactly the one removed in patch 6. Maybe I miss something important.
On 6/20/2023 10:34 AM, Yan Zhao wrote: > On Tue, Jun 20, 2023 at 10:42:57AM +0800, Chao Gao wrote: >> On Fri, Jun 16, 2023 at 10:38:15AM +0800, Yan Zhao wrote: >>> For KVM_X86_QUIRK_CD_NW_CLEARED, remove the ignore PAT bit in EPT memory >>> types when cache is disabled and non-coherent DMA are present. >>> >>> With the quirk KVM_X86_QUIRK_CD_NW_CLEARED, WB + IPAT are returned as the >>> EPT memory type when guest cache is disabled before this patch. >>> Removing the IPAT bit in this patch will allow effective memory type to >>> honor PAT values as well, which will make the effective memory type >> Given guest sets CR0.CD, what's the point of honoring (guest) PAT? e.g., >> which guests can benefit from this change? > This patch is actually a preparation for later patch 10 to implement > fine-grained zap. > If when CR0.CD=1 the EPT type is WB + IPAT, and > when CR0.CD=0 + mtrr enabled, EPT type is WB or UC or ..., which are > without IPAT, then we have to always zap all EPT entries. > > Given removing the IPAT bit when CR0.CD=1 only makes the quirk > KVM_X86_QUIRK_CD_NW_CLEARED more strict (meaning it could be WC/UC... if > the guest PAT overwrites it), it's still acceptable. Per my understanding, the reason why KVM had KVM_X86_QUIRK_CD_NW_CLEARED is to ensure the memory type is WB to achieve better boot performance for old OVMF. you need to justify the original purpose is not broken by this patch.
On Sun, Jun 25, 2023 at 03:14:37PM +0800, Xiaoyao Li wrote: > On 6/20/2023 10:34 AM, Yan Zhao wrote: > > On Tue, Jun 20, 2023 at 10:42:57AM +0800, Chao Gao wrote: > > > On Fri, Jun 16, 2023 at 10:38:15AM +0800, Yan Zhao wrote: > > > > For KVM_X86_QUIRK_CD_NW_CLEARED, remove the ignore PAT bit in EPT memory > > > > types when cache is disabled and non-coherent DMA are present. > > > > > > > > With the quirk KVM_X86_QUIRK_CD_NW_CLEARED, WB + IPAT are returned as the > > > > EPT memory type when guest cache is disabled before this patch. > > > > Removing the IPAT bit in this patch will allow effective memory type to > > > > honor PAT values as well, which will make the effective memory type > > > Given guest sets CR0.CD, what's the point of honoring (guest) PAT? e.g., > > > which guests can benefit from this change? > > This patch is actually a preparation for later patch 10 to implement > > fine-grained zap. > > If when CR0.CD=1 the EPT type is WB + IPAT, and > > when CR0.CD=0 + mtrr enabled, EPT type is WB or UC or ..., which are > > without IPAT, then we have to always zap all EPT entries. > > > > Given removing the IPAT bit when CR0.CD=1 only makes the quirk > > KVM_X86_QUIRK_CD_NW_CLEARED more strict (meaning it could be WC/UC... if > > the guest PAT overwrites it), it's still acceptable. > > Per my understanding, the reason why KVM had KVM_X86_QUIRK_CD_NW_CLEARED is > to ensure the memory type is WB to achieve better boot performance for old > OVMF. It works well for OVMF c9e5618f84b0cb54a9ac2d7604f7b7e7859b45a7, which is Apr 14 2015. > you need to justify the original purpose is not broken by this patch. Hmm, to dig into the history, the reason for this quirk is explained below: commit fb279950ba02e3210a16b11ecfa8871f3ee0ca49 Author: Xiao Guangrong <guangrong.xiao@intel.com> Date: Thu Jul 16 03:25:56 2015 +0800 KVM: vmx: obey KVM_QUIRK_CD_NW_CLEARED OVMF depends on WB to boot fast, because it only clears caches after it has set up MTRRs---which is too late. Let's do writeback if CR0.CD is set to make it happy, similar to what SVM is already doing. which means WB is only a must for fast boot before OVMF has set up MTRRs. At that period, PAT is default to WB. After OVMF setting up MTRR, according to the definition of no-fill cache mode, "Strict memory ordering is not enforced unless the MTRRs are disabled and/or all memory is referenced as uncached", it's valid to honor PAT in no-fill cache mode. Besides, if the guest explicitly claim UC via PAT, why should KVM return WB? In other words, if it's still slow caused by a UC value in guest PAT, it's desired to be fixed in guest instead of a workaround in KVM.
On Mon, Jun 26, 2023 at 11:40:28AM +0800, Yuan Yao wrote: > On Mon, Jun 26, 2023 at 08:08:20AM +0800, Yan Zhao wrote: > > On Sun, Jun 25, 2023 at 03:14:37PM +0800, Xiaoyao Li wrote: > > > On 6/20/2023 10:34 AM, Yan Zhao wrote: > > > > On Tue, Jun 20, 2023 at 10:42:57AM +0800, Chao Gao wrote: > > > > > On Fri, Jun 16, 2023 at 10:38:15AM +0800, Yan Zhao wrote: > > > > > > For KVM_X86_QUIRK_CD_NW_CLEARED, remove the ignore PAT bit in EPT memory > > > > > > types when cache is disabled and non-coherent DMA are present. > > > > > > > > > > > > With the quirk KVM_X86_QUIRK_CD_NW_CLEARED, WB + IPAT are returned as the > > > > > > EPT memory type when guest cache is disabled before this patch. > > > > > > Removing the IPAT bit in this patch will allow effective memory type to > > > > > > honor PAT values as well, which will make the effective memory type > > > > > Given guest sets CR0.CD, what's the point of honoring (guest) PAT? e.g., > > > > > which guests can benefit from this change? > > > > This patch is actually a preparation for later patch 10 to implement > > > > fine-grained zap. > > > > If when CR0.CD=1 the EPT type is WB + IPAT, and > > > > when CR0.CD=0 + mtrr enabled, EPT type is WB or UC or ..., which are > > > > without IPAT, then we have to always zap all EPT entries. > > > > > > > > Given removing the IPAT bit when CR0.CD=1 only makes the quirk > > > > KVM_X86_QUIRK_CD_NW_CLEARED more strict (meaning it could be WC/UC... if > > > > the guest PAT overwrites it), it's still acceptable. > > > > > > Per my understanding, the reason why KVM had KVM_X86_QUIRK_CD_NW_CLEARED is > > > to ensure the memory type is WB to achieve better boot performance for old > > > OVMF. > > It works well for OVMF c9e5618f84b0cb54a9ac2d7604f7b7e7859b45a7, > > which is Apr 14 2015. > > > > > > > you need to justify the original purpose is not broken by this patch. > > > > Hmm, to dig into the history, the reason for this quirk is explained below: > > > > commit fb279950ba02e3210a16b11ecfa8871f3ee0ca49 > > Author: Xiao Guangrong <guangrong.xiao@intel.com> > > Date: Thu Jul 16 03:25:56 2015 +0800 > > > > KVM: vmx: obey KVM_QUIRK_CD_NW_CLEARED > > > > OVMF depends on WB to boot fast, because it only clears caches after > > it has set up MTRRs---which is too late. > > > > Let's do writeback if CR0.CD is set to make it happy, similar to what > > SVM is already doing. > > > > > > which means WB is only a must for fast boot before OVMF has set up MTRRs. > > At that period, PAT is default to WB. > > > > After OVMF setting up MTRR, according to the definition of no-fill cache > > mode, "Strict memory ordering is not enforced unless the MTRRs are > > disabled and/or all memory is referenced as uncached", it's valid to > > honor PAT in no-fill cache mode. > > Does it also mean that, the honor PAT in such no-fill cache mode should > also happen for non-quirk case ? e.g. the effective memory type can be > WC if EPT is UC + guest PAT is WC for CD=1. No. Only the quirk KVM_X86_QUIRK_CD_NW_CLEARED indicates no-fill cache mode (CD=1 and NW=0). Without the quirk, UC + IPAT is desired. > > > Besides, if the guest explicitly claim UC via PAT, why should KVM return > > WB? > > In other words, if it's still slow caused by a UC value in guest PAT, > > it's desired to be fixed in guest instead of a workaround in KVM. > > the quirk may not work after this patch if the guest PAT is > stronger than WB for CD=1, we don't if any guest "works correctly" based > on this quirk, I hope no. How about highlight this in commit message At least for Seabios and OVMF, the PAT is WB by default. Even after MTRRs enabled, if there are UC ranges, they are small in size and are desired to be UC. So, I think it's ok. > explicitly ? Will try to explain the background and possible influence. Thanks > > Also I agree that such issue should be fixed in guest not in KVM. >
On Mon, Jun 26, 2023 at 08:08:20AM +0800, Yan Zhao wrote: > On Sun, Jun 25, 2023 at 03:14:37PM +0800, Xiaoyao Li wrote: > > On 6/20/2023 10:34 AM, Yan Zhao wrote: > > > On Tue, Jun 20, 2023 at 10:42:57AM +0800, Chao Gao wrote: > > > > On Fri, Jun 16, 2023 at 10:38:15AM +0800, Yan Zhao wrote: > > > > > For KVM_X86_QUIRK_CD_NW_CLEARED, remove the ignore PAT bit in EPT memory > > > > > types when cache is disabled and non-coherent DMA are present. > > > > > > > > > > With the quirk KVM_X86_QUIRK_CD_NW_CLEARED, WB + IPAT are returned as the > > > > > EPT memory type when guest cache is disabled before this patch. > > > > > Removing the IPAT bit in this patch will allow effective memory type to > > > > > honor PAT values as well, which will make the effective memory type > > > > Given guest sets CR0.CD, what's the point of honoring (guest) PAT? e.g., > > > > which guests can benefit from this change? > > > This patch is actually a preparation for later patch 10 to implement > > > fine-grained zap. > > > If when CR0.CD=1 the EPT type is WB + IPAT, and > > > when CR0.CD=0 + mtrr enabled, EPT type is WB or UC or ..., which are > > > without IPAT, then we have to always zap all EPT entries. > > > > > > Given removing the IPAT bit when CR0.CD=1 only makes the quirk > > > KVM_X86_QUIRK_CD_NW_CLEARED more strict (meaning it could be WC/UC... if > > > the guest PAT overwrites it), it's still acceptable. > > > > Per my understanding, the reason why KVM had KVM_X86_QUIRK_CD_NW_CLEARED is > > to ensure the memory type is WB to achieve better boot performance for old > > OVMF. > It works well for OVMF c9e5618f84b0cb54a9ac2d7604f7b7e7859b45a7, > which is Apr 14 2015. > > > > you need to justify the original purpose is not broken by this patch. > > Hmm, to dig into the history, the reason for this quirk is explained below: > > commit fb279950ba02e3210a16b11ecfa8871f3ee0ca49 > Author: Xiao Guangrong <guangrong.xiao@intel.com> > Date: Thu Jul 16 03:25:56 2015 +0800 > > KVM: vmx: obey KVM_QUIRK_CD_NW_CLEARED > > OVMF depends on WB to boot fast, because it only clears caches after > it has set up MTRRs---which is too late. > > Let's do writeback if CR0.CD is set to make it happy, similar to what > SVM is already doing. > > > which means WB is only a must for fast boot before OVMF has set up MTRRs. > At that period, PAT is default to WB. > > After OVMF setting up MTRR, according to the definition of no-fill cache > mode, "Strict memory ordering is not enforced unless the MTRRs are > disabled and/or all memory is referenced as uncached", it's valid to > honor PAT in no-fill cache mode. Does it also mean that, the honor PAT in such no-fill cache mode should also happen for non-quirk case ? e.g. the effective memory type can be WC if EPT is UC + guest PAT is WC for CD=1. > Besides, if the guest explicitly claim UC via PAT, why should KVM return > WB? > In other words, if it's still slow caused by a UC value in guest PAT, > it's desired to be fixed in guest instead of a workaround in KVM. the quirk may not work after this patch if the guest PAT is stronger than WB for CD=1, we don't if any guest "works correctly" based on this quirk, I hope no. How about highlight this in commit message explicitly ? Also I agree that such issue should be fixed in guest not in KVM. > > > >
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index 0ecf4be2c6af..c1e93678cea4 100644 --- a/arch/x86/kvm/vmx/vmx.c +++ b/arch/x86/kvm/vmx/vmx.c @@ -7548,8 +7548,6 @@ static int vmx_vm_init(struct kvm *kvm) static u8 vmx_get_mt_mask(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio) { - u8 cache; - /* We wanted to honor guest CD/MTRR/PAT, but doing so could result in * memory aliases with conflicting memory types and sometimes MCEs. * We have to be careful as to what are honored and when. @@ -7576,11 +7574,10 @@ static u8 vmx_get_mt_mask(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio) if (kvm_read_cr0_bits(vcpu, X86_CR0_CD)) { if (kvm_check_has_quirk(vcpu->kvm, KVM_X86_QUIRK_CD_NW_CLEARED)) - cache = MTRR_TYPE_WRBACK; + return MTRR_TYPE_WRBACK << VMX_EPT_MT_EPTE_SHIFT; else - cache = MTRR_TYPE_UNCACHABLE; - - return (cache << VMX_EPT_MT_EPTE_SHIFT) | VMX_EPT_IPAT_BIT; + return (MTRR_TYPE_UNCACHABLE << VMX_EPT_MT_EPTE_SHIFT) | + VMX_EPT_IPAT_BIT; } return kvm_mtrr_get_guest_memory_type(vcpu, gfn) << VMX_EPT_MT_EPTE_SHIFT;