Message ID | 20230721201859.2307736-20-seanjc@google.com |
---|---|
State | New |
Headers |
Return-Path: <linux-kernel-owner@vger.kernel.org> Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:9010:0:b0:3e4:2afc:c1 with SMTP id l16csp476890vqg; Fri, 21 Jul 2023 14:33:32 -0700 (PDT) X-Google-Smtp-Source: APBJJlFUWauTB0VNFvaUwAk39KNMvt6gJuu5plo3lcinVTe+Kbhg2kZ1/7jZ6DRXdM2An1lCHZKI X-Received: by 2002:a05:6358:3422:b0:134:d78f:67bc with SMTP id h34-20020a056358342200b00134d78f67bcmr1221913rwd.14.1689975212199; Fri, 21 Jul 2023 14:33:32 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1689975212; cv=none; d=google.com; s=arc-20160816; b=bKQS3tIBlr1cyfpS+MaaccQxrKjXGVjAHU0X1perOICIz6+0djZFt5lL07J9KLtxrh 90E5Bplx3pR9JD1hxlOO/STi84yXUIRS+h40NyeYj1gqajZwpjF9q3BvyGVs4YJWtGiD gu38fFSCCrfU4Q/YaHqaBbuCwmoQauWbKzvkCsagrHAk2yn7U+s2VnEp2NBPwWo/Ahl5 dOthbAJxwrDR9rPHAI9ezIvm3awQtiEeYT16gm4M6wjcrSAPpliWz9UgoIZ9/OXsxxaL CJm58s3vHnBRc+FgBBpP4GSnJNluG6OEGmoS0j7a/7TuupwTUEgLF+1NoPMpcwRvsldt RuZQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:cc:to:from:subject:message-id:references :mime-version:in-reply-to:date:reply-to:dkim-signature; bh=f1LNi9Sq5NIBXtzGJVGKttDHRo4VG7MGlit5nvyUtF0=; fh=GitwJflddDxrBBVhnTGEzIZYOBTU7RKSqL+OyCDv/0M=; b=QpVeTniUCEydqBei/ktf2HG+BYms/00hmVHw9D+bAbA5bdR9gTj2GezmSVXvaeL2FS kqCTGJNqMtinEsLXS66E/diOAK1HJYzTdtDmDqs6oQ11oa7WIUtcBYCsP/0htw1xzKy4 nD3Tc9j8Y1OLHqQJpwhESW8t44+ohS0QFJ2f5rFzaSKFsUNxCWjuoYydt8hsLr/hvD5b gjr6RWO1PjFdTios5NxGgPbmI5vdqS1jdxzB8cordnCvK11FEahKfUfkvxs77jjLZn15 rfTLqHXCkII7tCqjospSGQFn/Hi/Tlg9Uvo2SWISKLHsBSAP8fNq1XOaVAYmW8jjQcXN ehhw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@google.com header.s=20221208 header.b=lupqznqf; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id v24-20020a63f218000000b005578994f212si3710467pgh.425.2023.07.21.14.33.18; Fri, 21 Jul 2023 14:33:32 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@google.com header.s=20221208 header.b=lupqznqf; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230502AbjGUUVU (ORCPT <rfc822;assdfgzxcv4@gmail.com> + 99 others); Fri, 21 Jul 2023 16:21:20 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:38862 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231311AbjGUUUy (ORCPT <rfc822;linux-kernel@vger.kernel.org>); Fri, 21 Jul 2023 16:20:54 -0400 Received: from mail-yb1-xb49.google.com (mail-yb1-xb49.google.com [IPv6:2607:f8b0:4864:20::b49]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id F04883AA2 for <linux-kernel@vger.kernel.org>; Fri, 21 Jul 2023 13:20:02 -0700 (PDT) Received: by mail-yb1-xb49.google.com with SMTP id 3f1490d57ef6-c8f360a07a2so2266429276.2 for <linux-kernel@vger.kernel.org>; Fri, 21 Jul 2023 13:20:02 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20221208; t=1689970783; x=1690575583; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:reply-to:from:to:cc:subject:date:message-id:reply-to; bh=f1LNi9Sq5NIBXtzGJVGKttDHRo4VG7MGlit5nvyUtF0=; b=lupqznqfPScgKgAlsl68k2HwQOWYVzqFz7+e39gz+UVj6wvvC+ma4BNLkDyLgMbDCr sXcaIXufKIRqTjEo6kaASseH4LuovJPFo1FpnMySbHlymLZC8toxxIjvN2Aiw4ju6EAG VaD/mWdhEO8ur/DDeO9tEnMiMm8VItIJH8OP+EP+LpWzQ/NrR5L8c8s2LMhTcLypPS2U oBg+8IvkdXfhKuyTvHG3Xef0QbwUQ6ZzSvJM7Mo/H4jFrr/vFd5FgUJyl+qeE3dTtsH5 MY6DHfpkgF/82Qh+aTdBRtINfnVpzFGxerBDIxZVXvcHDxqf5FZ0G6Srcd0y+VCH+a4h +9CA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1689970783; x=1690575583; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:reply-to:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=f1LNi9Sq5NIBXtzGJVGKttDHRo4VG7MGlit5nvyUtF0=; b=UE9jJl4FmdRBt8lVAb8J0GqHbC9FtiPMgkofkW2dh9ugrklT4sy1td/iXtfna/RJT6 GjIIlvu9TU/QzVZjrCtQV0WWwWsJoW7OjVbg64Uo7lNcafxofTJSaV0iJOkiWCsKXyhA ADcT2SYT6+W5mMeExzrQaeFzsPU9aB5cBZNbUReCOuQy7hk78nQuLe5Bh1zm/R6nZse5 9uZKmtqkXjtDI4Su8ShA6TvfBm3cUjhllE5bFq4ksGoXWvC+/sucbPauPpYaO6erEq/s ladM1OwD4Q1SJck+Pc8pU8p6WCQPHvnF/rY4oqd15+oCeE/vtHSqruVikiHbLxfUzjLI BR2w== X-Gm-Message-State: ABy/qLYiejwXCdiKng84QDxgOLtaZUQFWEEVF3pGVngPUNnd8ijDShwK f4PlfqIVhNQARc8ZvT4EYh4LpbWwP4k= X-Received: from zagreus.c.googlers.com ([fda3:e722:ac3:cc00:7f:e700:c0a8:5c37]) (user=seanjc job=sendgmr) by 2002:a5b:bc2:0:b0:ce9:64b3:80dc with SMTP id c2-20020a5b0bc2000000b00ce964b380dcmr18891ybr.1.1689970783817; Fri, 21 Jul 2023 13:19:43 -0700 (PDT) Reply-To: Sean Christopherson <seanjc@google.com> Date: Fri, 21 Jul 2023 13:18:59 -0700 In-Reply-To: <20230721201859.2307736-1-seanjc@google.com> Mime-Version: 1.0 References: <20230721201859.2307736-1-seanjc@google.com> X-Mailer: git-send-email 2.41.0.487.g6d72f3e995-goog Message-ID: <20230721201859.2307736-20-seanjc@google.com> Subject: [PATCH v4 19/19] KVM: VMX: Skip VMCLEAR logic during emergency reboots if CR4.VMXE=0 From: Sean Christopherson <seanjc@google.com> To: Thomas Gleixner <tglx@linutronix.de>, Ingo Molnar <mingo@redhat.com>, Borislav Petkov <bp@alien8.de>, Dave Hansen <dave.hansen@linux.intel.com>, x86@kernel.org, Sean Christopherson <seanjc@google.com>, Paolo Bonzini <pbonzini@redhat.com> Cc: linux-kernel@vger.kernel.org, kvm@vger.kernel.org, Andrew Cooper <Andrew.Cooper3@citrix.com>, Kai Huang <kai.huang@intel.com>, Chao Gao <chao.gao@intel.com> Content-Type: text/plain; charset="UTF-8" X-Spam-Status: No, score=-9.6 required=5.0 tests=BAYES_00,DKIMWL_WL_MED, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF, RCVD_IN_DNSWL_BLOCKED,SPF_HELO_NONE,SPF_PASS,T_SCC_BODY_TEXT_LINE, USER_IN_DEF_DKIM_WL autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: <linux-kernel.vger.kernel.org> X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: INBOX X-GMAIL-THRID: 1772067448268133010 X-GMAIL-MSGID: 1772067448268133010 |
Series |
x86/reboot: KVM: Clean up "emergency" virt code
|
|
Commit Message
Sean Christopherson
July 21, 2023, 8:18 p.m. UTC
Bail from vmx_emergency_disable() without processing the list of loaded
VMCSes if CR4.VMXE=0, i.e. if the CPU can't be post-VMXON. It should be
impossible for the list to have entries if VMX is already disabled, and
even if that invariant doesn't hold, VMCLEAR will #UD anyways, i.e.
processing the list is pointless even if it somehow isn't empty.
Assuming no existing KVM bugs, this should be a glorified nop. The
primary motivation for the change is to avoid having code that looks like
it does VMCLEAR, but then skips VMXON, which is nonsensical.
Suggested-by: Kai Huang <kai.huang@intel.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/kvm/vmx/vmx.c | 12 ++++++++++--
1 file changed, 10 insertions(+), 2 deletions(-)
Comments
On Fri, 2023-07-21 at 13:18 -0700, Sean Christopherson wrote: > Bail from vmx_emergency_disable() without processing the list of loaded > VMCSes if CR4.VMXE=0, i.e. if the CPU can't be post-VMXON. It should be > impossible for the list to have entries if VMX is already disabled, and > even if that invariant doesn't hold, VMCLEAR will #UD anyways, i.e. > processing the list is pointless even if it somehow isn't empty. > > Assuming no existing KVM bugs, this should be a glorified nop. The > primary motivation for the change is to avoid having code that looks like > it does VMCLEAR, but then skips VMXON, which is nonsensical. > > Suggested-by: Kai Huang <kai.huang@intel.com> > Signed-off-by: Sean Christopherson <seanjc@google.com> > --- > arch/x86/kvm/vmx/vmx.c | 12 ++++++++++-- > 1 file changed, 10 insertions(+), 2 deletions(-) > > diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c > index 5d21931842a5..0ef5ede9cb7c 100644 > --- a/arch/x86/kvm/vmx/vmx.c > +++ b/arch/x86/kvm/vmx/vmx.c > @@ -773,12 +773,20 @@ static void vmx_emergency_disable(void) > > kvm_rebooting = true; > > + /* > + * Note, CR4.VMXE can be _cleared_ in NMI context, but it can only be > + * set in task context. If this races with VMX is disabled by an NMI, > + * VMCLEAR and VMXOFF may #UD, but KVM will eat those faults due to > + * kvm_rebooting set. > + */ I am not quite following this comment. IIUC this code path is only called from NMI context in case of emergency VMX disable. How can it race with "VMX is disabled by an NMI"? It should be the normal vmx_hardware_disable() may race with NMI, but not this one? > + if (!(__read_cr4() & X86_CR4_VMXE)) > + return; > + > list_for_each_entry(v, &per_cpu(loaded_vmcss_on_cpu, cpu), > loaded_vmcss_on_cpu_link) > vmcs_clear(v->vmcs); > > - if (__read_cr4() & X86_CR4_VMXE) > - kvm_cpu_vmxoff(); > + kvm_cpu_vmxoff(); > } > > static void __loaded_vmcs_clear(void *arg) Anyway, the actual code change LGTM: Reviewed-by: Kai Huang <kai.huang@intel.com>
On Tue, Jul 25, 2023, Kai Huang wrote: > On Fri, 2023-07-21 at 13:18 -0700, Sean Christopherson wrote: > > Bail from vmx_emergency_disable() without processing the list of loaded > > VMCSes if CR4.VMXE=0, i.e. if the CPU can't be post-VMXON. It should be > > impossible for the list to have entries if VMX is already disabled, and > > even if that invariant doesn't hold, VMCLEAR will #UD anyways, i.e. > > processing the list is pointless even if it somehow isn't empty. > > > > Assuming no existing KVM bugs, this should be a glorified nop. The > > primary motivation for the change is to avoid having code that looks like > > it does VMCLEAR, but then skips VMXON, which is nonsensical. > > > > Suggested-by: Kai Huang <kai.huang@intel.com> > > Signed-off-by: Sean Christopherson <seanjc@google.com> > > --- > > arch/x86/kvm/vmx/vmx.c | 12 ++++++++++-- > > 1 file changed, 10 insertions(+), 2 deletions(-) > > > > diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c > > index 5d21931842a5..0ef5ede9cb7c 100644 > > --- a/arch/x86/kvm/vmx/vmx.c > > +++ b/arch/x86/kvm/vmx/vmx.c > > @@ -773,12 +773,20 @@ static void vmx_emergency_disable(void) > > > > kvm_rebooting = true; > > > > + /* > > + * Note, CR4.VMXE can be _cleared_ in NMI context, but it can only be > > + * set in task context. If this races with VMX is disabled by an NMI, > > + * VMCLEAR and VMXOFF may #UD, but KVM will eat those faults due to > > + * kvm_rebooting set. > > + */ > > I am not quite following this comment. IIUC this code path is only called from > NMI context in case of emergency VMX disable. The CPU that initiates the emergency reboot can invoke the callback from process context, only responding CPUs are guaranteed to be handled via NMI shootdown. E.g. `reboot -f` will reach this point synchronously. > How can it race with "VMX is disabled by an NMI"? Somewhat theoretically, a different CPU could panic() and do a shootdown of the CPU that is handling `reboot -f`.
On Tue, 2023-07-25 at 11:15 -0700, Sean Christopherson wrote: > On Tue, Jul 25, 2023, Kai Huang wrote: > > On Fri, 2023-07-21 at 13:18 -0700, Sean Christopherson wrote: > > > Bail from vmx_emergency_disable() without processing the list of loaded > > > VMCSes if CR4.VMXE=0, i.e. if the CPU can't be post-VMXON. It should be > > > impossible for the list to have entries if VMX is already disabled, and > > > even if that invariant doesn't hold, VMCLEAR will #UD anyways, i.e. > > > processing the list is pointless even if it somehow isn't empty. > > > > > > Assuming no existing KVM bugs, this should be a glorified nop. The > > > primary motivation for the change is to avoid having code that looks like > > > it does VMCLEAR, but then skips VMXON, which is nonsensical. > > > > > > Suggested-by: Kai Huang <kai.huang@intel.com> > > > Signed-off-by: Sean Christopherson <seanjc@google.com> > > > --- > > > arch/x86/kvm/vmx/vmx.c | 12 ++++++++++-- > > > 1 file changed, 10 insertions(+), 2 deletions(-) > > > > > > diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c > > > index 5d21931842a5..0ef5ede9cb7c 100644 > > > --- a/arch/x86/kvm/vmx/vmx.c > > > +++ b/arch/x86/kvm/vmx/vmx.c > > > @@ -773,12 +773,20 @@ static void vmx_emergency_disable(void) > > > > > > kvm_rebooting = true; > > > > > > + /* > > > + * Note, CR4.VMXE can be _cleared_ in NMI context, but it can only be > > > + * set in task context. If this races with VMX is disabled by an NMI, > > > + * VMCLEAR and VMXOFF may #UD, but KVM will eat those faults due to > > > + * kvm_rebooting set. > > > + */ > > > > I am not quite following this comment. IIUC this code path is only called from > > NMI context in case of emergency VMX disable. > > The CPU that initiates the emergency reboot can invoke the callback from process > context, only responding CPUs are guaranteed to be handled via NMI shootdown. > E.g. `reboot -f` will reach this point synchronously. > > > How can it race with "VMX is disabled by an NMI"? > > Somewhat theoretically, a different CPU could panic() and do a shootdown of the > CPU that is handling `reboot -f`. Yeah this is the only case I can think of too. Anyway, LGTM. Thanks for explaining.
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index 5d21931842a5..0ef5ede9cb7c 100644 --- a/arch/x86/kvm/vmx/vmx.c +++ b/arch/x86/kvm/vmx/vmx.c @@ -773,12 +773,20 @@ static void vmx_emergency_disable(void) kvm_rebooting = true; + /* + * Note, CR4.VMXE can be _cleared_ in NMI context, but it can only be + * set in task context. If this races with VMX is disabled by an NMI, + * VMCLEAR and VMXOFF may #UD, but KVM will eat those faults due to + * kvm_rebooting set. + */ + if (!(__read_cr4() & X86_CR4_VMXE)) + return; + list_for_each_entry(v, &per_cpu(loaded_vmcss_on_cpu, cpu), loaded_vmcss_on_cpu_link) vmcs_clear(v->vmcs); - if (__read_cr4() & X86_CR4_VMXE) - kvm_cpu_vmxoff(); + kvm_cpu_vmxoff(); } static void __loaded_vmcs_clear(void *arg)