Message ID | be4ca192eb0c1e69a210db3009ca984e6a54ae69.1684495380.git.maciej.szmigiero@oracle.com
State | New
Headers |
From: "Maciej S. Szmigiero" <mail@maciej.szmigiero.name>
To: Paolo Bonzini <pbonzini@redhat.com>, Sean Christopherson <seanjc@google.com>
Cc: Maxim Levitsky <mlevitsk@redhat.com>, Santosh Shukla <santosh.shukla@amd.com>, vkuznets@redhat.com, jmattson@google.com, thomas.lendacky@amd.com, kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: [PATCH] KVM: SVM: vNMI pending bit is V_NMI_PENDING_MASK not V_NMI_BLOCKING_MASK
Date: Fri, 19 May 2023 13:26:18 +0200
Message-Id: <be4ca192eb0c1e69a210db3009ca984e6a54ae69.1684495380.git.maciej.szmigiero@oracle.com>
Series | KVM: SVM: vNMI pending bit is V_NMI_PENDING_MASK not V_NMI_BLOCKING_MASK
Commit Message
Maciej S. Szmigiero
May 19, 2023, 11:26 a.m. UTC
From: "Maciej S. Szmigiero" <maciej.szmigiero@oracle.com> While testing Hyper-V enabled Windows Server 2019 guests on Zen4 hardware I noticed that with vCPU count large enough (> 16) they sometimes froze at boot. With vCPU count of 64 they never booted successfully - suggesting some kind of a race condition. Since adding "vnmi=0" module parameter made these guests boot successfully it was clear that the problem is most likely (v)NMI-related. Running kvm-unit-tests quickly showed failing NMI-related tests cases, like "multiple nmi" and "pending nmi" from apic-split, x2apic and xapic tests and the NMI parts of eventinj test. The issue was that once one NMI was being serviced no other NMI was allowed to be set pending (NMI limit = 0), which was traced to svm_is_vnmi_pending() wrongly testing for the "NMI blocked" flag rather than for the "NMI pending" flag. Fix this by testing for the right flag in svm_is_vnmi_pending(). Once this is done, the NMI-related kvm-unit-tests pass successfully and the Windows guest no longer freezes at boot. Fixes: fa4c027a7956 ("KVM: x86: Add support for SVM's Virtual NMI") Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com> --- It's a bit sad that no-one apparently tested the vNMI patchset with kvm-unit-tests on an actual vNMI-enabled hardware... arch/x86/kvm/svm/svm.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
Comments
On Fri, May 19, 2023, Maciej S. Szmigiero wrote:
> From: "Maciej S. Szmigiero" <maciej.szmigiero@oracle.com>
>
> While testing Hyper-V enabled Windows Server 2019 guests on Zen4 hardware
> I noticed that with vCPU count large enough (> 16) they sometimes froze at
> boot.
> With vCPU count of 64 they never booted successfully - suggesting some kind
> of a race condition.
>
> Since adding the "vnmi=0" module parameter made these guests boot successfully
> it was clear that the problem is most likely (v)NMI-related.
>
> Running kvm-unit-tests quickly showed failing NMI-related test cases, like
> "multiple nmi" and "pending nmi" from the apic-split, x2apic and xapic tests
> and the NMI parts of the eventinj test.
>
> The issue was that once one NMI was being serviced no other NMI was allowed
> to be set pending (NMI limit = 0), which was traced to
> svm_is_vnmi_pending() wrongly testing for the "NMI blocked" flag rather
> than for the "NMI pending" flag.
>
> Fix this by testing for the right flag in svm_is_vnmi_pending().
> Once this is done, the NMI-related kvm-unit-tests pass successfully and
> the Windows guest no longer freezes at boot.
>
> Fixes: fa4c027a7956 ("KVM: x86: Add support for SVM's Virtual NMI")
> Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>

Reviewed-by: Sean Christopherson <seanjc@google.com>

> ---
>
> It's a bit sad that no-one apparently tested the vNMI patchset with
> kvm-unit-tests on actual vNMI-enabled hardware...

That's one way to put it.

Santosh, what happened?  This goof was present in both v3 and v4, i.e. it wasn't
something that we botched when applying/massaging at the last minute.  And the
cover letters for both v3 and v4 state "Series ... tested on AMD EPYC-Genoa".
Hi Sean and Maciej,

On 5/19/2023 9:21 PM, Sean Christopherson wrote:
> On Fri, May 19, 2023, Maciej S. Szmigiero wrote:
>> From: "Maciej S. Szmigiero" <maciej.szmigiero@oracle.com>
>>
>> While testing Hyper-V enabled Windows Server 2019 guests on Zen4 hardware
>> I noticed that with vCPU count large enough (> 16) they sometimes froze at
>> boot.
>> With vCPU count of 64 they never booted successfully - suggesting some kind
>> of a race condition.
>>
>> Since adding the "vnmi=0" module parameter made these guests boot successfully
>> it was clear that the problem is most likely (v)NMI-related.
>>
>> Running kvm-unit-tests quickly showed failing NMI-related test cases, like
>> "multiple nmi" and "pending nmi" from the apic-split, x2apic and xapic tests
>> and the NMI parts of the eventinj test.
>>
>> The issue was that once one NMI was being serviced no other NMI was allowed
>> to be set pending (NMI limit = 0), which was traced to
>> svm_is_vnmi_pending() wrongly testing for the "NMI blocked" flag rather
>> than for the "NMI pending" flag.
>>
>> Fix this by testing for the right flag in svm_is_vnmi_pending().
>> Once this is done, the NMI-related kvm-unit-tests pass successfully and
>> the Windows guest no longer freezes at boot.
>>
>> Fixes: fa4c027a7956 ("KVM: x86: Add support for SVM's Virtual NMI")
>> Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
>
> Reviewed-by: Sean Christopherson <seanjc@google.com>
>
>> ---
>>
>> It's a bit sad that no-one apparently tested the vNMI patchset with
>> kvm-unit-tests on actual vNMI-enabled hardware...
>
> That's one way to put it.
>
> Santosh, what happened?  This goof was present in both v3 and v4, i.e. it wasn't
> something that we botched when applying/massaging at the last minute.  And the
> cover letters for both v3 and v4 state "Series ... tested on AMD EPYC-Genoa".

My bad - in the past I only ran svm_test with vnmi using Sean's KUT branch
remotes/sean-kut/svm/vnmi_test and saw that the vnmi test was passing.

Here are the logs:
---
PASS: vNMI enabled but NMI_INTERCEPT unset!
PASS: vNMI with vector 2 not injected
PASS: VNMI serviced
PASS: vnmi
---

However, when I ran the tests mentioned by Maciej, I do see the failure.
Thanks for pointing this out.

Reviewed-by: Santosh Shukla <Santosh.Shukla@amd.com>

Best Regards,
Santosh
On 19.05.2023 17:51, Sean Christopherson wrote:
> On Fri, May 19, 2023, Maciej S. Szmigiero wrote:
>> From: "Maciej S. Szmigiero" <maciej.szmigiero@oracle.com>
>>
>> While testing Hyper-V enabled Windows Server 2019 guests on Zen4 hardware
>> I noticed that with vCPU count large enough (> 16) they sometimes froze at
>> boot.
>> With vCPU count of 64 they never booted successfully - suggesting some kind
>> of a race condition.
>>
>> Since adding the "vnmi=0" module parameter made these guests boot successfully
>> it was clear that the problem is most likely (v)NMI-related.
>>
>> Running kvm-unit-tests quickly showed failing NMI-related test cases, like
>> "multiple nmi" and "pending nmi" from the apic-split, x2apic and xapic tests
>> and the NMI parts of the eventinj test.
>>
>> The issue was that once one NMI was being serviced no other NMI was allowed
>> to be set pending (NMI limit = 0), which was traced to
>> svm_is_vnmi_pending() wrongly testing for the "NMI blocked" flag rather
>> than for the "NMI pending" flag.
>>
>> Fix this by testing for the right flag in svm_is_vnmi_pending().
>> Once this is done, the NMI-related kvm-unit-tests pass successfully and
>> the Windows guest no longer freezes at boot.
>>
>> Fixes: fa4c027a7956 ("KVM: x86: Add support for SVM's Virtual NMI")
>> Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
>
> Reviewed-by: Sean Christopherson <seanjc@google.com>
>

I can't see this in the kvm/kvm.git trees or the kvm-x86 ones on GitHub -
is this patch planned to be picked up for -rc5 soon?

Technically, just knowing the final commit id would be sufficient for my
purposes.

Thanks,
Maciej
On Thu, Jun 01, 2023, Maciej S. Szmigiero wrote:
> On 19.05.2023 17:51, Sean Christopherson wrote:
> > On Fri, May 19, 2023, Maciej S. Szmigiero wrote:
> > > From: "Maciej S. Szmigiero" <maciej.szmigiero@oracle.com>
> > >
> > > While testing Hyper-V enabled Windows Server 2019 guests on Zen4 hardware
> > > I noticed that with vCPU count large enough (> 16) they sometimes froze at
> > > boot.
> > > With vCPU count of 64 they never booted successfully - suggesting some kind
> > > of a race condition.
> > >
> > > Since adding the "vnmi=0" module parameter made these guests boot successfully
> > > it was clear that the problem is most likely (v)NMI-related.
> > >
> > > Running kvm-unit-tests quickly showed failing NMI-related test cases, like
> > > "multiple nmi" and "pending nmi" from the apic-split, x2apic and xapic tests
> > > and the NMI parts of the eventinj test.
> > >
> > > The issue was that once one NMI was being serviced no other NMI was allowed
> > > to be set pending (NMI limit = 0), which was traced to
> > > svm_is_vnmi_pending() wrongly testing for the "NMI blocked" flag rather
> > > than for the "NMI pending" flag.
> > >
> > > Fix this by testing for the right flag in svm_is_vnmi_pending().
> > > Once this is done, the NMI-related kvm-unit-tests pass successfully and
> > > the Windows guest no longer freezes at boot.
> > >
> > > Fixes: fa4c027a7956 ("KVM: x86: Add support for SVM's Virtual NMI")
> > > Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
> >
> > Reviewed-by: Sean Christopherson <seanjc@google.com>
> >
>
> I can't see this in the kvm/kvm.git trees or the kvm-x86 ones on GitHub -
> is this patch planned to be picked up for -rc5 soon?
>
> Technically, just knowing the final commit id would be sufficient for my
> purposes.

If Paolo doesn't pick it up by tomorrow, I'll apply it and send a fixes pull
request for -rc5.
On 1.06.2023 20:04, Sean Christopherson wrote:
> On Thu, Jun 01, 2023, Maciej S. Szmigiero wrote:
>> On 19.05.2023 17:51, Sean Christopherson wrote:
>>> On Fri, May 19, 2023, Maciej S. Szmigiero wrote:
>>>> From: "Maciej S. Szmigiero" <maciej.szmigiero@oracle.com>
>>>>
>>>> While testing Hyper-V enabled Windows Server 2019 guests on Zen4 hardware
>>>> I noticed that with vCPU count large enough (> 16) they sometimes froze at
>>>> boot.
>>>> With vCPU count of 64 they never booted successfully - suggesting some kind
>>>> of a race condition.
>>>>
>>>> Since adding the "vnmi=0" module parameter made these guests boot successfully
>>>> it was clear that the problem is most likely (v)NMI-related.
>>>>
>>>> Running kvm-unit-tests quickly showed failing NMI-related test cases, like
>>>> "multiple nmi" and "pending nmi" from the apic-split, x2apic and xapic tests
>>>> and the NMI parts of the eventinj test.
>>>>
>>>> The issue was that once one NMI was being serviced no other NMI was allowed
>>>> to be set pending (NMI limit = 0), which was traced to
>>>> svm_is_vnmi_pending() wrongly testing for the "NMI blocked" flag rather
>>>> than for the "NMI pending" flag.
>>>>
>>>> Fix this by testing for the right flag in svm_is_vnmi_pending().
>>>> Once this is done, the NMI-related kvm-unit-tests pass successfully and
>>>> the Windows guest no longer freezes at boot.
>>>>
>>>> Fixes: fa4c027a7956 ("KVM: x86: Add support for SVM's Virtual NMI")
>>>> Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
>>>
>>> Reviewed-by: Sean Christopherson <seanjc@google.com>
>>>
>>
>> I can't see this in the kvm/kvm.git trees or the kvm-x86 ones on GitHub -
>> is this patch planned to be picked up for -rc5 soon?
>>
>> Technically, just knowing the final commit id would be sufficient for my
>> purposes.
>
> If Paolo doesn't pick it up by tomorrow, I'll apply it and send a fixes pull
> request for -rc5.

Thanks Sean.
On Fri, 19 May 2023 13:26:18 +0200, Maciej S. Szmigiero wrote:
> While testing Hyper-V enabled Windows Server 2019 guests on Zen4 hardware
> I noticed that with vCPU count large enough (> 16) they sometimes froze at
> boot.
> With vCPU count of 64 they never booted successfully - suggesting some kind
> of a race condition.
>
> Since adding the "vnmi=0" module parameter made these guests boot successfully
> it was clear that the problem is most likely (v)NMI-related.
>
> [...]

Applied to kvm-x86 fixes, thanks!

[1/1] KVM: SVM: vNMI pending bit is V_NMI_PENDING_MASK not V_NMI_BLOCKING_MASK
      https://github.com/kvm-x86/linux/commit/b2ce89978889

--
https://github.com/kvm-x86/linux/tree/next
https://github.com/kvm-x86/linux/tree/fixes
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index ca32389f3c36..54089f990c8f 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -3510,7 +3510,7 @@ static bool svm_is_vnmi_pending(struct kvm_vcpu *vcpu)
 	if (!is_vnmi_enabled(svm))
 		return false;
 
-	return !!(svm->vmcb->control.int_ctl & V_NMI_BLOCKING_MASK);
+	return !!(svm->vmcb->control.int_ctl & V_NMI_PENDING_MASK);
 }
 
 static bool svm_set_vnmi_pending(struct kvm_vcpu *vcpu)
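[Editor's note: for context on how this predicate feeds into the "NMI limit" mentioned in the commit message, KVM's generic x86 code caps the number of NMIs that may be pending per vCPU and subtracts a slot for a virtual NMI already latched in hardware. The following is a simplified sketch of that accounting, not the exact kvm/x86.c process_nmi() code; the demo_ names are hypothetical stand-ins.]

```c
#include <stdbool.h>

struct demo_vcpu {
	bool nmi_masked;	/* guest is currently executing its NMI handler */
	bool nmi_injected;	/* KVM has already injected an NMI */
	bool vnmi_pending;	/* whatever svm_is_vnmi_pending() reports */
};

/* Simplified model of how many more NMIs KVM will allow to become pending. */
static unsigned int demo_nmi_limit(const struct demo_vcpu *v)
{
	/* One NMI may be in service while at most one more waits behind it. */
	unsigned int limit = (v->nmi_masked || v->nmi_injected) ? 1 : 2;

	/* A virtual NMI already latched in hardware consumes the waiting slot. */
	if (v->vnmi_pending)
		limit--;

	/*
	 * With the bug, vnmi_pending effectively meant "guest is in an NMI
	 * handler".  During NMI servicing both conditions were then true,
	 * the limit collapsed to 0, and no new NMI could be made pending,
	 * which is the failure the apic and eventinj kvm-unit-tests caught.
	 */
	return limit;
}
```

With the fix, a guest that is servicing one NMI still exposes a free slot, so a follow-up NMI can be queued and delivered once the first handler finishes.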