Message ID | 20230209072220.6836-8-jgross@suse.com |
---|---|
State | New |
Headers |
From: Juergen Gross <jgross@suse.com>
To: linux-kernel@vger.kernel.org, x86@kernel.org
Cc: lists@nerdbynature.de, mikelley@microsoft.com, torvalds@linux-foundation.org, Juergen Gross <jgross@suse.com>, Dave Hansen <dave.hansen@linux.intel.com>, Andy Lutomirski <luto@kernel.org>, Peter Zijlstra <peterz@infradead.org>, Thomas Gleixner <tglx@linutronix.de>, Ingo Molnar <mingo@redhat.com>, Borislav Petkov <bp@alien8.de>, "H. Peter Anvin" <hpa@zytor.com>
Subject: [PATCH v2 7/8] x86/mm: only check uniform after calling mtrr_type_lookup()
Date: Thu, 9 Feb 2023 08:22:19 +0100
Message-Id: <20230209072220.6836-8-jgross@suse.com>
In-Reply-To: <20230209072220.6836-1-jgross@suse.com>
References: <20230209072220.6836-1-jgross@suse.com> |
Series | x86/mtrr: fix handling with PAT but without MTRR |
Commit Message
Juergen Gross
Feb. 9, 2023, 7:22 a.m. UTC
Today pud_set_huge() and pmd_set_huge() test for the MTRR type to be
WB or INVALID after calling mtrr_type_lookup(). Those tests can be
dropped, as the only reason to not use a large mapping would be
uniform being 0. Any MTRR type can be accepted as long as it applies
to the whole memory range covered by the mapping, as the alternative
would only be to map the same region with smaller pages instead using
the same PAT type as for the large mapping.
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Juergen Gross <jgross@suse.com>
---
arch/x86/mm/pgtable.c | 6 ++----
1 file changed, 2 insertions(+), 4 deletions(-)
Comments
On Thu, 2023-02-09 at 08:22 +0100, Juergen Gross wrote:
> diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
> index e4f499eb0f29..7b9c5443d176 100644
> --- a/arch/x86/mm/pgtable.c
> +++ b/arch/x86/mm/pgtable.c
> @@ -721,8 +721,7 @@ int pud_set_huge(pud_t *pud, phys_addr_t addr, pgprot_t prot)
>  	u8 mtrr, uniform;

'mtrr' is now unused. Can it be dropped? Same for the pmd_set_huge().

>
>  	mtrr = mtrr_type_lookup(addr, addr + PUD_SIZE, &uniform);
> -	if ((mtrr != MTRR_TYPE_INVALID) && (!uniform) &&
> -	    (mtrr != MTRR_TYPE_WRBACK))
> +	if (!uniform)
>  		return 0;
From: Juergen Gross <jgross@suse.com> Sent: Wednesday, February 8, 2023 11:22 PM
>
> Today pud_set_huge() and pmd_set_huge() test for the MTRR type to be
> WB or INVALID after calling mtrr_type_lookup(). Those tests can be
> dropped, as the only reason to not use a large mapping would be
> uniform being 0. Any MTRR type can be accepted as long as it applies
> to the whole memory range covered by the mapping, as the alternative
> would only be to map the same region with smaller pages instead using
> the same PAT type as for the large mapping.
>
> Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
> Signed-off-by: Juergen Gross <jgross@suse.com>
> ---
> arch/x86/mm/pgtable.c | 6 ++----
> 1 file changed, 2 insertions(+), 4 deletions(-)
>
> diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
> index e4f499eb0f29..7b9c5443d176 100644
> --- a/arch/x86/mm/pgtable.c
> +++ b/arch/x86/mm/pgtable.c
> @@ -721,8 +721,7 @@ int pud_set_huge(pud_t *pud, phys_addr_t addr, pgprot_t prot)
>  	u8 mtrr, uniform;
>
>  	mtrr = mtrr_type_lookup(addr, addr + PUD_SIZE, &uniform);
> -	if ((mtrr != MTRR_TYPE_INVALID) && (!uniform) &&
> -	    (mtrr != MTRR_TYPE_WRBACK))
> +	if (!uniform)
>  		return 0;
>
>  	/* Bail out if we are we on a populated non-leaf entry: */
> @@ -748,8 +747,7 @@ int pmd_set_huge(pmd_t *pmd, phys_addr_t addr, pgprot_t prot)
>  	u8 mtrr, uniform;
>
>  	mtrr = mtrr_type_lookup(addr, addr + PMD_SIZE, &uniform);
> -	if ((mtrr != MTRR_TYPE_INVALID) && (!uniform) &&
> -	    (mtrr != MTRR_TYPE_WRBACK)) {
> +	if (!uniform) {
>  		pr_warn_once("%s: Cannot satisfy [mem %#010llx-%#010llx] with a huge-page mapping due to MTRR override.\n",
>  			     __func__, addr, addr + PMD_SIZE);

I'm seeing this warning trigger in a normal Hyper-V guest (i.e., *not* an
SEV-SNP Confidential VM). The original filtering here based on
MTRR_TYPE_WRBACK appears to be hiding a bug in mtrr_type_lookup_variable()
where it incorrectly thinks an address range matches two different variable
MTRRs, and hence clears "uniform".

Here are the variable MTRRs in the normal Hyper-V guest with 32 GiBytes
of memory:

[    0.043592] MTRR variable ranges enabled:
[    0.048308]   0 base 000000000000 mask FFFF00000000 write-back
[    0.057450]   1 base 000100000000 mask FFF000000000 write-back
[    0.063972]   2 disabled
[    0.066755]   3 disabled
[    0.070024]   4 disabled
[    0.072856]   5 disabled
[    0.076112]   6 disabled
[    0.078760]   7 disabled

Variable MTRR #0 covers addresses up to 4 GiByte, while #1 covers
4 GiByte to 64 GiByte. But in mtrr_type_lookup_variable(), address
range 0xF8000000 to 0xF81FFFFF is matching both MTRRs, when it
should be matching just #0.

The problem looks to be this code in mtrr_type_lookup_variable():

	if ((start & mask) != (base & mask))
		continue;

If the mask bits of start and base are different, then the
MTRR doesn't match, and the continue statement should be
executed. That's correct. But if the mask bits are the same,
that's not sufficient for the MTRR to match. If the end
address is less than base, the MTRR doesn't match, and
the continue statement should still be executed, which
isn't happening.

But somebody please check my thinking. :-)

Michael
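The double match described above can be reproduced in isolation. The following is a minimal sketch, not the kernel's code: `mtrr_start_matches()` is a hypothetical helper that copies only the quoted comparison, and the base/mask values are the ones from the boot log above.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: the same comparison as the quoted test in
 * mtrr_type_lookup_variable(), with the sense inverted so that
 * "true" means the loop would NOT execute 'continue'.
 */
static bool mtrr_start_matches(uint64_t start, uint64_t base, uint64_t mask)
{
	return (start & mask) == (base & mask);
}

/* Values copied from the logged Hyper-V variable MTRRs. */
static const uint64_t mtrr0_base = 0x000000000000ULL;
static const uint64_t mtrr0_mask = 0xFFFF00000000ULL;
static const uint64_t mtrr1_base = 0x000100000000ULL;
static const uint64_t mtrr1_mask = 0xFFF000000000ULL;
```

With these values, a start address of 0xF8000000 passes the comparison for both MTRR #0 and MTRR #1: bit 32 of #1's base lies outside #1's mask (which only covers bits 36 and up), so `base & mask` is 0 for both entries.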
On 11.02.23 01:06, Edgecombe, Rick P wrote:
> On Thu, 2023-02-09 at 08:22 +0100, Juergen Gross wrote:
>> diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
>> index e4f499eb0f29..7b9c5443d176 100644
>> --- a/arch/x86/mm/pgtable.c
>> +++ b/arch/x86/mm/pgtable.c
>> @@ -721,8 +721,7 @@ int pud_set_huge(pud_t *pud, phys_addr_t addr, pgprot_t prot)
>>  	u8 mtrr, uniform;
>
> 'mtrr' is now unused. Can it be dropped? Same for the pmd_set_huge().

I guess it will be used again, due to the comment you made for the whole
series.

Juergen
On 13.02.23 02:08, Michael Kelley (LINUX) wrote:
> From: Juergen Gross <jgross@suse.com> Sent: Wednesday, February 8, 2023 11:22 PM
>>
>> Today pud_set_huge() and pmd_set_huge() test for the MTRR type to be
>> WB or INVALID after calling mtrr_type_lookup(). Those tests can be
>> dropped, as the only reason to not use a large mapping would be
>> uniform being 0. Any MTRR type can be accepted as long as it applies
>> to the whole memory range covered by the mapping, as the alternative
>> would only be to map the same region with smaller pages instead using
>> the same PAT type as for the large mapping.
>>
>> Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
>> Signed-off-by: Juergen Gross <jgross@suse.com>
>> ---
>> arch/x86/mm/pgtable.c | 6 ++----
>> 1 file changed, 2 insertions(+), 4 deletions(-)
>>
>> diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
>> index e4f499eb0f29..7b9c5443d176 100644
>> --- a/arch/x86/mm/pgtable.c
>> +++ b/arch/x86/mm/pgtable.c
>> @@ -721,8 +721,7 @@ int pud_set_huge(pud_t *pud, phys_addr_t addr, pgprot_t prot)
>>  	u8 mtrr, uniform;
>>
>>  	mtrr = mtrr_type_lookup(addr, addr + PUD_SIZE, &uniform);
>> -	if ((mtrr != MTRR_TYPE_INVALID) && (!uniform) &&
>> -	    (mtrr != MTRR_TYPE_WRBACK))
>> +	if (!uniform)
>>  		return 0;
>>
>>  	/* Bail out if we are we on a populated non-leaf entry: */
>> @@ -748,8 +747,7 @@ int pmd_set_huge(pmd_t *pmd, phys_addr_t addr, pgprot_t prot)
>>  	u8 mtrr, uniform;
>>
>>  	mtrr = mtrr_type_lookup(addr, addr + PMD_SIZE, &uniform);
>> -	if ((mtrr != MTRR_TYPE_INVALID) && (!uniform) &&
>> -	    (mtrr != MTRR_TYPE_WRBACK)) {
>> +	if (!uniform) {
>>  		pr_warn_once("%s: Cannot satisfy [mem %#010llx-%#010llx] with a huge-page mapping due to MTRR override.\n",
>>  			     __func__, addr, addr + PMD_SIZE);
>
> I'm seeing this warning trigger in a normal Hyper-V guest (i.e., *not* an
> SEV-SNP Confidential VM). The original filtering here based on
> MTRR_TYPE_WRBACK appears to be hiding a bug in mtrr_type_lookup_variable()
> where it incorrectly thinks an address range matches two different variable
> MTRRs, and hence clears "uniform".
>
> Here are the variable MTRRs in the normal Hyper-V guest with 32 GiBytes
> of memory:
>
> [    0.043592] MTRR variable ranges enabled:
> [    0.048308]   0 base 000000000000 mask FFFF00000000 write-back
> [    0.057450]   1 base 000100000000 mask FFF000000000 write-back
> [    0.063972]   2 disabled
> [    0.066755]   3 disabled
> [    0.070024]   4 disabled
> [    0.072856]   5 disabled
> [    0.076112]   6 disabled
> [    0.078760]   7 disabled
>
> Variable MTRR #0 covers addresses up to 4 GiByte, while #1 covers
> 4 GiByte to 64 GiByte. But in mtrr_type_lookup_variable(), address
> range 0xF8000000 to 0xF81FFFFF is matching both MTRRs, when it
> should be matching just #0.
>
> The problem looks to be this code in mtrr_type_lookup_variable():
>
>	if ((start & mask) != (base & mask))
>		continue;
>
> If the mask bits of start and base are different, then the
> MTRR doesn't match, and the continue statement should be
> executed. That's correct. But if the mask bits are the same,
> that's not sufficient for the MTRR to match. If the end
> address is less than base, the MTRR doesn't match, and
> the continue statement should still be executed, which
> isn't happening.
>
> But somebody please check my thinking. :-)

I don't see a flaw in your reasoning.

Rick mentioned a problem with this patch in a KVM guest. I'll try to
reproduce his setup for checking whether fixing mtrr_type_lookup_variable()
is enough, or if we need to keep the tests for WB in this patch.

Juergen
On 13.02.23 02:08, Michael Kelley (LINUX) wrote:
> From: Juergen Gross <jgross@suse.com> Sent: Wednesday, February 8, 2023 11:22 PM
>>
>> Today pud_set_huge() and pmd_set_huge() test for the MTRR type to be
>> WB or INVALID after calling mtrr_type_lookup(). Those tests can be
>> dropped, as the only reason to not use a large mapping would be
>> uniform being 0. Any MTRR type can be accepted as long as it applies
>> to the whole memory range covered by the mapping, as the alternative
>> would only be to map the same region with smaller pages instead using
>> the same PAT type as for the large mapping.
>>
>> Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
>> Signed-off-by: Juergen Gross <jgross@suse.com>
>> ---
>> arch/x86/mm/pgtable.c | 6 ++----
>> 1 file changed, 2 insertions(+), 4 deletions(-)
>>
>> diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
>> index e4f499eb0f29..7b9c5443d176 100644
>> --- a/arch/x86/mm/pgtable.c
>> +++ b/arch/x86/mm/pgtable.c
>> @@ -721,8 +721,7 @@ int pud_set_huge(pud_t *pud, phys_addr_t addr, pgprot_t prot)
>>  	u8 mtrr, uniform;
>>
>>  	mtrr = mtrr_type_lookup(addr, addr + PUD_SIZE, &uniform);
>> -	if ((mtrr != MTRR_TYPE_INVALID) && (!uniform) &&
>> -	    (mtrr != MTRR_TYPE_WRBACK))
>> +	if (!uniform)
>>  		return 0;
>>
>>  	/* Bail out if we are we on a populated non-leaf entry: */
>> @@ -748,8 +747,7 @@ int pmd_set_huge(pmd_t *pmd, phys_addr_t addr, pgprot_t prot)
>>  	u8 mtrr, uniform;
>>
>>  	mtrr = mtrr_type_lookup(addr, addr + PMD_SIZE, &uniform);
>> -	if ((mtrr != MTRR_TYPE_INVALID) && (!uniform) &&
>> -	    (mtrr != MTRR_TYPE_WRBACK)) {
>> +	if (!uniform) {
>>  		pr_warn_once("%s: Cannot satisfy [mem %#010llx-%#010llx] with a huge-page mapping due to MTRR override.\n",
>>  			     __func__, addr, addr + PMD_SIZE);
>
> I'm seeing this warning trigger in a normal Hyper-V guest (i.e., *not* an
> SEV-SNP Confidential VM). The original filtering here based on
> MTRR_TYPE_WRBACK appears to be hiding a bug in mtrr_type_lookup_variable()
> where it incorrectly thinks an address range matches two different variable
> MTRRs, and hence clears "uniform".
>
> Here are the variable MTRRs in the normal Hyper-V guest with 32 GiBytes
> of memory:
>
> [    0.043592] MTRR variable ranges enabled:
> [    0.048308]   0 base 000000000000 mask FFFF00000000 write-back
> [    0.057450]   1 base 000100000000 mask FFF000000000 write-back

I've read the SDM chapter for MTRRs again. Doesn't #1 violate the requirements
for MTRR settings? The SDM says:

  For ranges greater than 4 KBytes, each range must be of length 2^n and its
  base address must be aligned on a 2^n boundary, where n is a value equal to
  or greater than 12. The base-address alignment value cannot be less than its
  length. For example, an 8-KByte range cannot be aligned on a 4-KByte boundary.
  It must be aligned on at least an 8-KByte boundary.

This makes the reasoning below wrong.

> [    0.063972]   2 disabled
> [    0.066755]   3 disabled
> [    0.070024]   4 disabled
> [    0.072856]   5 disabled
> [    0.076112]   6 disabled
> [    0.078760]   7 disabled
>
> Variable MTRR #0 covers addresses up to 4 GiByte, while #1 covers
> 4 GiByte to 64 GiByte. But in mtrr_type_lookup_variable(), address
> range 0xF8000000 to 0xF81FFFFF is matching both MTRRs, when it
> should be matching just #0.
>
> The problem looks to be this code in mtrr_type_lookup_variable():
>
>	if ((start & mask) != (base & mask))
>		continue;
>
> If the mask bits of start and base are different, then the
> MTRR doesn't match, and the continue statement should be
> executed. That's correct. But if the mask bits are the same,
> that's not sufficient for the MTRR to match. If the end
> address is less than base, the MTRR doesn't match, and
> the continue statement should still be executed, which
> isn't happening.
>
> But somebody please check my thinking. :-)

I think you need to correct the hypervisor.

Juergen
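The SDM rule quoted above can be checked mechanically against the logged values. This is a minimal sketch under stated assumptions, not kernel code: the helper names are hypothetical, and the 48-bit physical address width is inferred from the 12-digit masks in the boot log rather than stated anywhere in the thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 48-bit physical addresses, inferred from the logged masks. */
#define PHYS_ADDR_BITS 48

/* Length of the range described by a contiguous MTRR mask. */
static uint64_t mtrr_range_size(uint64_t mask)
{
	uint64_t addr_mask = (1ULL << PHYS_ADDR_BITS) - 1;

	return (~mask & addr_mask) + 1;
}

/* SDM rule: the base must be aligned to the (power-of-two) range length. */
static bool mtrr_base_aligned(uint64_t base, uint64_t mask)
{
	uint64_t size = mtrr_range_size(mask);

	/* A contiguous mask always yields a power-of-two length. */
	assert((size & (size - 1)) == 0);

	return (base & (size - 1)) == 0;
}
```

For MTRR #1 the mask FFF000000000 implies a 64 GiByte range, but its base of 4 GiByte is not a multiple of 64 GiByte, so the check fails. MTRR #0 (a 4 GiByte range at base 0) passes.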
From: Juergen Gross <jgross@suse.com> Sent: Wednesday, February 15, 2023 5:40 AM
>
> On 13.02.23 02:08, Michael Kelley (LINUX) wrote:
> > From: Juergen Gross <jgross@suse.com> Sent: Wednesday, February 8, 2023 11:22 PM
> >>
> >> Today pud_set_huge() and pmd_set_huge() test for the MTRR type to be
> >> WB or INVALID after calling mtrr_type_lookup(). Those tests can be
> >> dropped, as the only reason to not use a large mapping would be
> >> uniform being 0. Any MTRR type can be accepted as long as it applies
> >> to the whole memory range covered by the mapping, as the alternative
> >> would only be to map the same region with smaller pages instead using
> >> the same PAT type as for the large mapping.
> >>
> >> Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
> >> Signed-off-by: Juergen Gross <jgross@suse.com>
> >> ---
> >> arch/x86/mm/pgtable.c | 6 ++----
> >> 1 file changed, 2 insertions(+), 4 deletions(-)
> >>
> >> diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
> >> index e4f499eb0f29..7b9c5443d176 100644
> >> --- a/arch/x86/mm/pgtable.c
> >> +++ b/arch/x86/mm/pgtable.c
> >> @@ -721,8 +721,7 @@ int pud_set_huge(pud_t *pud, phys_addr_t addr, pgprot_t prot)
> >>  	u8 mtrr, uniform;
> >>
> >>  	mtrr = mtrr_type_lookup(addr, addr + PUD_SIZE, &uniform);
> >> -	if ((mtrr != MTRR_TYPE_INVALID) && (!uniform) &&
> >> -	    (mtrr != MTRR_TYPE_WRBACK))
> >> +	if (!uniform)
> >>  		return 0;
> >>
> >>  	/* Bail out if we are we on a populated non-leaf entry: */
> >> @@ -748,8 +747,7 @@ int pmd_set_huge(pmd_t *pmd, phys_addr_t addr, pgprot_t prot)
> >>  	u8 mtrr, uniform;
> >>
> >>  	mtrr = mtrr_type_lookup(addr, addr + PMD_SIZE, &uniform);
> >> -	if ((mtrr != MTRR_TYPE_INVALID) && (!uniform) &&
> >> -	    (mtrr != MTRR_TYPE_WRBACK)) {
> >> +	if (!uniform) {
> >>  		pr_warn_once("%s: Cannot satisfy [mem %#010llx-%#010llx] with a huge-page mapping due to MTRR override.\n",
> >>  			     __func__, addr, addr + PMD_SIZE);
> >
> > I'm seeing this warning trigger in a normal Hyper-V guest (i.e., *not* an
> > SEV-SNP Confidential VM). The original filtering here based on
> > MTRR_TYPE_WRBACK appears to be hiding a bug in mtrr_type_lookup_variable()
> > where it incorrectly thinks an address range matches two different variable
> > MTRRs, and hence clears "uniform".
> >
> > Here are the variable MTRRs in the normal Hyper-V guest with 32 GiBytes
> > of memory:
> >
> > [    0.043592] MTRR variable ranges enabled:
> > [    0.048308]   0 base 000000000000 mask FFFF00000000 write-back
> > [    0.057450]   1 base 000100000000 mask FFF000000000 write-back
>
> I've read the SDM chapter for MTRRs again. Doesn't #1 violate the requirements
> for MTRR settings? The SDM says:
>
>   For ranges greater than 4 KBytes, each range must be of length 2^n and its
>   base address must be aligned on a 2^n boundary, where n is a value equal to
>   or greater than 12. The base-address alignment value cannot be less than its
>   length. For example, an 8-KByte range cannot be aligned on a 4-KByte boundary.
>   It must be aligned on at least an 8-KByte boundary.
>
> This makes the reasoning below wrong.

Argh. It sure looks like you are right. I just assumed the MTRRs coming from
Hyper-V were good. Shame on me. :-(

I've ping'ed the Hyper-V team to see what they say. But it's hard to see how
they could argue that these MTRRs are correctly formed. The Intel spec is
unambiguous.

Even if Hyper-V agrees that the MTRRs are wrong, a fix will take time to
propagate. In the meantime, it seems like the Linux mitigations could be
any of the following:

1) Keep the test for WB in pud_set_huge() and pmd_set_huge()

2) Remove the test, but have "uniform" set to 1 when multiple MTRRs are
   matched but all have the same caching type, which you proposed to
   solve Rick Edgecombe's problem. This is likely to paper over the
   problem I saw with the Hyper-V MTRRs because the incorrectly matching
   MTRRs would all be WB.

3) In *all* Hyper-V VMs (not just Confidential VMs), disable X86_FEATURE_MTRR
   and use the new override to set the default type to WB. Hopefully we don't
   have to do this, but I can submit a separate patch if it becomes necessary.

Michael
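Mitigation 2) above can be illustrated with a deliberately simplified lookup. This is a sketch only, not the patch Juergen proposed: `struct var_mtrr`, `lookup_start_type()` and the `MEM_TYPE_*` constants are hypothetical stand-ins, only the start address is matched, and the type precedence rules for conflicting overlaps are ignored.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel's MTRR_TYPE_* constants. */
enum { MEM_TYPE_UC = 0, MEM_TYPE_WB = 6, MEM_TYPE_NONE = 0xFF };

/* Hypothetical, simplified view of one variable MTRR. */
struct var_mtrr {
	uint64_t base;
	uint64_t mask;
	uint8_t type;
	bool enabled;
};

/*
 * Return the type of the first MTRR matching 'start'. '*uniform' stays 1
 * when every matching MTRR has the same type and is cleared only when two
 * matching MTRRs disagree.
 */
static uint8_t lookup_start_type(const struct var_mtrr *m, size_t n,
				 uint64_t start, uint8_t *uniform)
{
	uint8_t type = MEM_TYPE_NONE;
	size_t i;

	*uniform = 1;
	for (i = 0; i < n; i++) {
		if (!m[i].enabled)
			continue;
		if ((start & m[i].mask) != (m[i].base & m[i].mask))
			continue;
		if (type == MEM_TYPE_NONE)
			type = m[i].type;
		else if (type != m[i].type)
			*uniform = 0;	/* conflicting types: not uniform */
	}
	return type;
}
```

With the two write-back MTRRs from the Hyper-V log, a lookup at 0xF8000000 matches both entries but still reports a uniform write-back range, which is why this option would hide the malformed MTRR #1 rather than expose it.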
On 15.02.23 20:38, Michael Kelley (LINUX) wrote:
> From: Juergen Gross <jgross@suse.com> Sent: Wednesday, February 15, 2023 5:40 AM
>>
>> On 13.02.23 02:08, Michael Kelley (LINUX) wrote:
>>> From: Juergen Gross <jgross@suse.com> Sent: Wednesday, February 8, 2023 11:22 PM
>>>>
>>>> Today pud_set_huge() and pmd_set_huge() test for the MTRR type to be
>>>> WB or INVALID after calling mtrr_type_lookup(). Those tests can be
>>>> dropped, as the only reason to not use a large mapping would be
>>>> uniform being 0. Any MTRR type can be accepted as long as it applies
>>>> to the whole memory range covered by the mapping, as the alternative
>>>> would only be to map the same region with smaller pages instead using
>>>> the same PAT type as for the large mapping.
>>>>
>>>> Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
>>>> Signed-off-by: Juergen Gross <jgross@suse.com>
>>>> ---
>>>> arch/x86/mm/pgtable.c | 6 ++----
>>>> 1 file changed, 2 insertions(+), 4 deletions(-)
>>>>
>>>> diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
>>>> index e4f499eb0f29..7b9c5443d176 100644
>>>> --- a/arch/x86/mm/pgtable.c
>>>> +++ b/arch/x86/mm/pgtable.c
>>>> @@ -721,8 +721,7 @@ int pud_set_huge(pud_t *pud, phys_addr_t addr, pgprot_t prot)
>>>>  	u8 mtrr, uniform;
>>>>
>>>>  	mtrr = mtrr_type_lookup(addr, addr + PUD_SIZE, &uniform);
>>>> -	if ((mtrr != MTRR_TYPE_INVALID) && (!uniform) &&
>>>> -	    (mtrr != MTRR_TYPE_WRBACK))
>>>> +	if (!uniform)
>>>>  		return 0;
>>>>
>>>>  	/* Bail out if we are we on a populated non-leaf entry: */
>>>> @@ -748,8 +747,7 @@ int pmd_set_huge(pmd_t *pmd, phys_addr_t addr, pgprot_t prot)
>>>>  	u8 mtrr, uniform;
>>>>
>>>>  	mtrr = mtrr_type_lookup(addr, addr + PMD_SIZE, &uniform);
>>>> -	if ((mtrr != MTRR_TYPE_INVALID) && (!uniform) &&
>>>> -	    (mtrr != MTRR_TYPE_WRBACK)) {
>>>> +	if (!uniform) {
>>>>  		pr_warn_once("%s: Cannot satisfy [mem %#010llx-%#010llx] with a huge-page mapping due to MTRR override.\n",
>>>>  			     __func__, addr, addr + PMD_SIZE);
>>>
>>> I'm seeing this warning trigger in a normal Hyper-V guest (i.e., *not* an
>>> SEV-SNP Confidential VM). The original filtering here based on
>>> MTRR_TYPE_WRBACK appears to be hiding a bug in mtrr_type_lookup_variable()
>>> where it incorrectly thinks an address range matches two different variable
>>> MTRRs, and hence clears "uniform".
>>>
>>> Here are the variable MTRRs in the normal Hyper-V guest with 32 GiBytes
>>> of memory:
>>>
>>> [    0.043592] MTRR variable ranges enabled:
>>> [    0.048308]   0 base 000000000000 mask FFFF00000000 write-back
>>> [    0.057450]   1 base 000100000000 mask FFF000000000 write-back
>>
>> I've read the SDM chapter for MTRRs again. Doesn't #1 violate the requirements
>> for MTRR settings? The SDM says:
>>
>>   For ranges greater than 4 KBytes, each range must be of length 2^n and its
>>   base address must be aligned on a 2^n boundary, where n is a value equal to
>>   or greater than 12. The base-address alignment value cannot be less than its
>>   length. For example, an 8-KByte range cannot be aligned on a 4-KByte boundary.
>>   It must be aligned on at least an 8-KByte boundary.
>>
>> This makes the reasoning below wrong.
>
> Argh. It sure looks like you are right. I just assumed the MTRRs coming from
> Hyper-V were good. Shame on me. :-(

I assumed the same, as I didn't see any flaw in your reasoning before. :-)

> I've ping'ed the Hyper-V team to see what they say. But it's hard to see how
> they could argue that these MTRRs are correctly formed. The Intel spec is
> unambiguous.
>
> Even if Hyper-V agrees that the MTRRs are wrong, a fix will take time to
> propagate. In the meantime, it seems like the Linux mitigations could be
> any of the following:
>
> 1) Keep the test for WB in pud_set_huge() and pmd_set_huge()
>
> 2) Remove the test, but have "uniform" set to 1 when multiple MTRRs are
>    matched but all have the same caching type, which you proposed to
>    solve Rick Edgecombe's problem. This is likely to paper over the
>    problem I saw with the Hyper-V MTRRs because the incorrectly matching
>    MTRRs would all be WB.
>
> 3) In *all* Hyper-V VMs (not just Confidential VMs), disable X86_FEATURE_MTRR
>    and use the new override to set the default type to WB. Hopefully we don't
>    have to do this, but I can submit a separate patch if it becomes necessary.

4) Sanitize MTRRs in mtrr_cleanup(), resulting in MTRR#1 in your example to
   be modified to start at 0 (which would not really help to solve the
   multiple match you are seeing, but I'm about to solve that one, too, as
   the multiple MTRR match is allowed in the specs, but not really handled
   correctly in mtrr_type_lookup()).

Juergen
diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
index e4f499eb0f29..7b9c5443d176 100644
--- a/arch/x86/mm/pgtable.c
+++ b/arch/x86/mm/pgtable.c
@@ -721,8 +721,7 @@ int pud_set_huge(pud_t *pud, phys_addr_t addr, pgprot_t prot)
 	u8 mtrr, uniform;
 
 	mtrr = mtrr_type_lookup(addr, addr + PUD_SIZE, &uniform);
-	if ((mtrr != MTRR_TYPE_INVALID) && (!uniform) &&
-	    (mtrr != MTRR_TYPE_WRBACK))
+	if (!uniform)
 		return 0;
 
 	/* Bail out if we are we on a populated non-leaf entry: */
@@ -748,8 +747,7 @@ int pmd_set_huge(pmd_t *pmd, phys_addr_t addr, pgprot_t prot)
 	u8 mtrr, uniform;
 
 	mtrr = mtrr_type_lookup(addr, addr + PMD_SIZE, &uniform);
-	if ((mtrr != MTRR_TYPE_INVALID) && (!uniform) &&
-	    (mtrr != MTRR_TYPE_WRBACK)) {
+	if (!uniform) {
 		pr_warn_once("%s: Cannot satisfy [mem %#010llx-%#010llx] with a huge-page mapping due to MTRR override.\n",
 			     __func__, addr, addr + PMD_SIZE);
 		return 0;
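
The behavioural change made by the diff can be reduced to two predicates. This is a minimal sketch for comparison only: the function names are hypothetical and the `MEM_TYPE_*` constants are stand-ins for the kernel's MTRR_TYPE_* values, which are assumed here rather than taken from the thread.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for MTRR_TYPE_UNCACHABLE/WRBACK/INVALID. */
enum { MEM_TYPE_UC = 0, MEM_TYPE_WB = 6, MEM_TYPE_INVALID = 0xFF };

/* Condition removed by the patch: refuse only for non-WB, non-uniform. */
static bool old_refuses_huge(uint8_t mtrr, uint8_t uniform)
{
	return mtrr != MEM_TYPE_INVALID && !uniform && mtrr != MEM_TYPE_WB;
}

/* Condition added by the patch: refuse whenever the range is not uniform. */
static bool new_refuses_huge(uint8_t uniform)
{
	return !uniform;
}
```

The two predicates differ only when "uniform" is 0 and the type is WB or INVALID. That is the case Michael hit on Hyper-V: a write-back range reported as non-uniform was accepted by the old test and is rejected, with the warning, by the new one.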