Message ID | 20230310221414.811690-3-seanjc@google.com |
---|---|
State | New |
Headers |
Return-Path: <linux-kernel-owner@vger.kernel.org> Delivered-To: ouuuleilei@gmail.com Received: by 2002:a5d:5915:0:0:0:0:0 with SMTP id v21csp1128480wrd; Fri, 10 Mar 2023 14:24:04 -0800 (PST) X-Google-Smtp-Source: AK7set8EbTMVPOj2iRNE3hCeWkd6vOwUh29r8OyYMFM5NmLyobEG2WXsTY+Cbvyf8e+LEEX34goE X-Received: by 2002:a17:903:22c5:b0:19d:1686:989 with SMTP id y5-20020a17090322c500b0019d16860989mr32775323plg.59.1678487044549; Fri, 10 Mar 2023 14:24:04 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1678487044; cv=none; d=google.com; s=arc-20160816; b=qGbKMimX/csVM/WP4Mjlc2izWmqBCZ6dEaxBnbwSe8gBmQIw9/enUAd2nruQ9dAIe8 I9igzNmKD6VLVftueseNIJT7Y42jxMi1Evh886qCtwXAKEyZQ5Xodnll9FJR5nXxjfcV E/Qs49pUhnPJPmnEwF0W0i9zOk1k/ukNJ7w6sW+kZkWTpxlhbjuPZtj9MPFHVxB8ArFT heWUmUmF0e071l7tvIGe8MmefFU+69OofMmFZ+J0At9zTUiaYNO6LhdEKfqCphNhqXCi NWuoAbf1QsYvbEH/y5r3IeomOrupEb3FsHgYnABtKG1e1agXW1bYCjQQ9KpOtw+PWern xUbQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:cc:to:from:subject:message-id:references :mime-version:in-reply-to:date:reply-to:dkim-signature; bh=2opxKga+fb5i4H+BY6IhBhjz7Vmgmcavrh6WO0nAabw=; b=X8KsCFMne9CkbbNXcIvP2aKxwXHsnnOY05+QZ06Xd9+SYwDN7Ed441LLEAIIwtho0v sYR4xkil7pcRSonFdRqj5XdTnA8fUr+LlQ9sjf5caIkQ3KyWIufhMR0JtksQia2PRONZ Q9OHa8WH2V5TWOgs81Sp7aOU0wuE0iSwFVkF+0BmaAZaPTlT182vD1aUK9BGQNJOAkdg 0WP3qd84PBxtiAzbVlrZjqxGzDmrNRPqnOCBR8VxzywjK1Zv1r12XI3eGW1CoYUh0bs3 VkV9n4F4xt5KvXFqqjSVgucKmBIxP2C+k4gXCw9t9Ju8v8WXRGJwOcWxuBRUfvX+2lpK SmDA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@google.com header.s=20210112 header.b=XANFaeRY; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id ke11-20020a170903340b00b0019ca6e611d3si870429plb.174.2023.03.10.14.23.46; Fri, 10 Mar 2023 14:24:04 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@google.com header.s=20210112 header.b=XANFaeRY; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231567AbjCJWOe (ORCPT <rfc822;carlos.wei.hk@gmail.com> + 99 others); Fri, 10 Mar 2023 17:14:34 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:43424 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231355AbjCJWOY (ORCPT <rfc822;linux-kernel@vger.kernel.org>); Fri, 10 Mar 2023 17:14:24 -0500 Received: from mail-yb1-xb4a.google.com (mail-yb1-xb4a.google.com [IPv6:2607:f8b0:4864:20::b4a]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id C8D4613F56F for <linux-kernel@vger.kernel.org>; Fri, 10 Mar 2023 14:14:21 -0800 (PST) Received: by mail-yb1-xb4a.google.com with SMTP id m6-20020a056902118600b00aeb1e3dbd1bso7061743ybu.9 for <linux-kernel@vger.kernel.org>; Fri, 10 Mar 2023 14:14:21 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; t=1678486460; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:reply-to:from:to:cc:subject:date:message-id:reply-to; bh=2opxKga+fb5i4H+BY6IhBhjz7Vmgmcavrh6WO0nAabw=; b=XANFaeRYOx5Rd/kapDHhNmO90qoMVlvQlylWOJKFiusgRTk50x3TP11Tb99KjhRtRp Ilp6Fnx2ETLBE3ONxbTQK/JHBUg90svkVN5Ksjq6gDLd6XGGLH3cF/yhlaBgocMStinz BgYUmwsqJH2QawF4/6NH9+aU6CFGqKWcG0poEIYHj6EwiQB9E2zPA277B2ph+E8+ViKI skXt6AXZdpliK6f0KboRPpk16NPL23XsUnuJZwzJxP6uQZnZFD4DhdkN8ZAiCA34+SyC eiBTO57MgJwbaZpawRvV5J7htbbDb96jJ2ohiFCXMIPVyvl34nTNPoV17hacH1Od+NjM RPGA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; t=1678486460; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:reply-to:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=2opxKga+fb5i4H+BY6IhBhjz7Vmgmcavrh6WO0nAabw=; b=wO/v/zsyPiowWpu2VLAC5uY8eJX6HRguS3xn2zljEJytpguTi9S8ZLSv4tskrPHtwQ bXpgsyjl/nOMQokd9vSyBZ3EfQed8xH6HRYXYu6NR3um4Dw4vpkfeyE7tdWJG+cGpJPM kh9UpdhHD0WS5uLsoO/hEOOM+9eELT3Ek0Gv75snmZOvLnHMjxpsV2BRrjBReK+iGiCb BjVKGTA32qbNeT8B0gfpGDz1eC8m1IyRD9M9VIwHqEoMbvNZemrXNEWxQns8CPnrvhDL CbY19uDfu5CxjHbiomkdnxe6r0rvpF1BjNoqR1aIECOYnq+u7D0n/ffqMzBRmgNzY3o7 cOjg== X-Gm-Message-State: AO0yUKV0Ow6TbZrtufWUpyGqKjKLGKNM528uBq0Zl3NfiQjISB4pauco ehLKj4zSotvBze4tIm7hTeLBJSGlGS4= X-Received: from zagreus.c.googlers.com ([fda3:e722:ac3:cc00:7f:e700:c0a8:5c37]) (user=seanjc job=sendgmr) by 2002:a05:6902:1149:b0:8da:3163:224 with SMTP id p9-20020a056902114900b008da31630224mr2418826ybu.0.1678486460678; Fri, 10 Mar 2023 14:14:20 -0800 (PST) Reply-To: Sean Christopherson <seanjc@google.com> Date: Fri, 10 Mar 2023 14:14:14 -0800 In-Reply-To: <20230310221414.811690-1-seanjc@google.com> Mime-Version: 1.0 References: <20230310221414.811690-1-seanjc@google.com> X-Mailer: git-send-email 2.40.0.rc1.284.g88254d51c5-goog Message-ID: <20230310221414.811690-3-seanjc@google.com> Subject: [PATCH 2/2] KVM: Don't enable hardware after a restart/shutdown is initiated From: Sean Christopherson <seanjc@google.com> To: Paolo Bonzini <pbonzini@redhat.com> Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, Marc Zyngier <maz@kernel.org>, Oliver Upton <oliver.upton@linux.dev>, James Morse <james.morse@arm.com>, Suzuki K Poulose <suzuki.poulose@arm.com>, Zenghui Yu <yuzenghui@huawei.com>, kvmarm@lists.linux.dev, Huacai Chen <chenhuacai@kernel.org>, Aleksandar Markovic <aleksandar.qemu.devel@gmail.com>, Anup Patel <anup@brainfault.org>, Atish Patra <atishp@atishpatra.org>, kvm-riscv@lists.infradead.org, Sean Christopherson <seanjc@google.com> Content-Type: text/plain; charset="UTF-8" X-Spam-Status: No, score=-9.6 required=5.0 tests=BAYES_00,DKIMWL_WL_MED, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, SPF_HELO_NONE,SPF_PASS,USER_IN_DEF_DKIM_WL autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: <linux-kernel.vger.kernel.org> X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1760021231043667034?= X-GMAIL-MSGID: =?utf-8?q?1760021231043667034?= |
Series |
KVM: Fix race between reboot and hardware enabling
|
|
Commit Message
Sean Christopherson
March 10, 2023, 10:14 p.m. UTC
Reject hardware enabling, i.e. VM creation, if a restart/shutdown has
been initiated to avoid re-enabling hardware between kvm_reboot() and
machine_{halt,power_off,restart}(). The restart case is especially
problematic (for x86) as enabling VMX (or clearing GIF in KVM_RUN on
SVM) blocks INIT, which results in the restart/reboot hanging as BIOS
is unable to wake and rendezvous with APs.
Note, this bug, and the original issue that motivated the addition of
kvm_reboot(), is effectively limited to a forced reboot, e.g. `reboot -f`.
In a "normal" reboot, userspace will gracefully teardown userspace before
triggering the kernel reboot (modulo bugs, errors, etc), i.e. any process
that might do ioctl(KVM_CREATE_VM) is long gone.
Fixes: 8e1c18157d87 ("KVM: VMX: Disable VMX when system shutdown")
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
virt/kvm/kvm_main.c | 17 ++++++++++++++++-
1 file changed, 16 insertions(+), 1 deletion(-)
Comments
On Fri, 10 Mar 2023 22:14:14 +0000, Sean Christopherson <seanjc@google.com> wrote: > > Reject hardware enabling, i.e. VM creation, if a restart/shutdown has > been initiated to avoid re-enabling hardware between kvm_reboot() and > machine_{halt,power_off,restart}(). The restart case is especially > problematic (for x86) as enabling VMX (or clearing GIF in KVM_RUN on > SVM) blocks INIT, which results in the restart/reboot hanging as BIOS > is unable to wake and rendezvous with APs. > > Note, this bug, and the original issue that motivated the addition of > kvm_reboot(), is effectively limited to a forced reboot, e.g. `reboot -f`. > In a "normal" reboot, userspace will gracefully teardown userspace before > triggering the kernel reboot (modulo bugs, errors, etc), i.e. any process > that might do ioctl(KVM_CREATE_VM) is long gone. > > Fixes: 8e1c18157d87 ("KVM: VMX: Disable VMX when system shutdown") > Signed-off-by: Sean Christopherson <seanjc@google.com> > --- > virt/kvm/kvm_main.c | 17 ++++++++++++++++- > 1 file changed, 16 insertions(+), 1 deletion(-) > > diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c > index 6cdfbb2c641b..b2bf4c105181 100644 > --- a/virt/kvm/kvm_main.c > +++ b/virt/kvm/kvm_main.c > @@ -5182,7 +5182,20 @@ static void hardware_disable_all(void) > static int hardware_enable_all(void) > { > atomic_t failed = ATOMIC_INIT(0); > - int r = 0; > + int r; > + > + /* > + * Do not enable hardware virtualization if the system is going down. > + * If userspace initiated a forced reboot, e.g. reboot -f, then it's > + * possible for an in-flight KVM_CREATE_VM to trigger hardware enabling > + * after kvm_reboot() is called. Note, this relies on system_state > + * being set _before_ kvm_reboot(), which is why KVM uses a syscore ops > + * hook instead of registering a dedicated reboot notifier (the latter > + * runs before system_state is updated). > + */ > + if (system_state == SYSTEM_HALT || system_state == SYSTEM_POWER_OFF || > + system_state == SYSTEM_RESTART) > + return -EBUSY; Since we now seem to be relying on system_state for most things, is there any use for 'kvm_rebooting' other than the ease of evaluation in __svm_vcpu_run? M.
On Sun, Mar 12, 2023, Marc Zyngier wrote: > On Fri, 10 Mar 2023 22:14:14 +0000, > Sean Christopherson <seanjc@google.com> wrote: > > > > Reject hardware enabling, i.e. VM creation, if a restart/shutdown has > > been initiated to avoid re-enabling hardware between kvm_reboot() and > > machine_{halt,power_off,restart}(). The restart case is especially > > problematic (for x86) as enabling VMX (or clearing GIF in KVM_RUN on > > SVM) blocks INIT, which results in the restart/reboot hanging as BIOS > > is unable to wake and rendezvous with APs. > > > > Note, this bug, and the original issue that motivated the addition of > > kvm_reboot(), is effectively limited to a forced reboot, e.g. `reboot -f`. > > In a "normal" reboot, userspace will gracefully teardown userspace before > > triggering the kernel reboot (modulo bugs, errors, etc), i.e. any process > > that might do ioctl(KVM_CREATE_VM) is long gone. > > > > Fixes: 8e1c18157d87 ("KVM: VMX: Disable VMX when system shutdown") > > Signed-off-by: Sean Christopherson <seanjc@google.com> > > --- > > virt/kvm/kvm_main.c | 17 ++++++++++++++++- > > 1 file changed, 16 insertions(+), 1 deletion(-) > > > > diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c > > index 6cdfbb2c641b..b2bf4c105181 100644 > > --- a/virt/kvm/kvm_main.c > > +++ b/virt/kvm/kvm_main.c > > @@ -5182,7 +5182,20 @@ static void hardware_disable_all(void) > > static int hardware_enable_all(void) > > { > > atomic_t failed = ATOMIC_INIT(0); > > - int r = 0; > > + int r; > > + > > + /* > > + * Do not enable hardware virtualization if the system is going down. > > + * If userspace initiated a forced reboot, e.g. reboot -f, then it's > > + * possible for an in-flight KVM_CREATE_VM to trigger hardware enabling > > + * after kvm_reboot() is called. Note, this relies on system_state > > + * being set _before_ kvm_reboot(), which is why KVM uses a syscore ops > > + * hook instead of registering a dedicated reboot notifier (the latter > > + * runs before system_state is updated). > > + */ > > + if (system_state == SYSTEM_HALT || system_state == SYSTEM_POWER_OFF || > > + system_state == SYSTEM_RESTART) > > + return -EBUSY; > > Since we now seem to be relying on system_state for most things, is > there any use for 'kvm_rebooting' other than the ease of evaluation in > __svm_vcpu_run? Sadly, yes. The x86 implementations of emergency_restart(), __crash_kexec() and other emergency reboot flows disable virtualization and set 'kvm_rebooting' without touching system_state. VMX and SVM rely on 'kvm_rebooting' being set to avoid triggering (another) BUG() during the emergency. On my todo list is to better understand whether or not the other architectures that utilize the generic hardware enabling (ARM, RISC-V, MIPS) truly need to disable virtualization during a reboot, versus KVM simply being polite. E.g. on x86, if VMX is left enabled, reboot may hang depending on how the reboot is performed. If other architectures really truly need to disable virtualization, then they likely need something similar to x86's emergency reboot shenanigans.
On Mon, 13 Mar 2023 15:02:27 +0000, Sean Christopherson <seanjc@google.com> wrote: > > On my todo list is to better understand whether or not the other architectures > that utilize the generic hardware enabling (ARM, RISC-V, MIPS) truly need to disable > virtualization during a reboot, versus KVM simply being polite. E.g. on x86, if VMX > is left enabled, reboot may hang depending on how the reboot is performed. If > other architectures really truly need to disable virtualization, then they likely > need something similar to x86's emergency reboot shenanigans. At least pre-CCA, there isn't much to do, because there is no such thing as "disabling virtualisation". For kexec, the only things we need to do are to go back to EL2 in the nVHE case, and in any case to put all other CPUs back into the firmware (PSCI CPU_OFF). CCA may well add other things into the picture, because it is a parallel exception level that KVM doesn't really control. That's one of the many open questions I have about this "lovely" piece of architecture. Of course, if we were to completely ignore CCA and instead use the underlying HW (aka RME), things would be a lot simpler and we'd be back to my original statement... M.
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index 6cdfbb2c641b..b2bf4c105181 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -5182,7 +5182,20 @@ static void hardware_disable_all(void) static int hardware_enable_all(void) { atomic_t failed = ATOMIC_INIT(0); - int r = 0; + int r; + + /* + * Do not enable hardware virtualization if the system is going down. + * If userspace initiated a forced reboot, e.g. reboot -f, then it's + * possible for an in-flight KVM_CREATE_VM to trigger hardware enabling + * after kvm_reboot() is called. Note, this relies on system_state + * being set _before_ kvm_reboot(), which is why KVM uses a syscore ops + * hook instead of registering a dedicated reboot notifier (the latter + * runs before system_state is updated). + */ + if (system_state == SYSTEM_HALT || system_state == SYSTEM_POWER_OFF || + system_state == SYSTEM_RESTART) + return -EBUSY; /* * When onlining a CPU, cpu_online_mask is set before kvm_online_cpu() @@ -5195,6 +5208,8 @@ static int hardware_enable_all(void) cpus_read_lock(); mutex_lock(&kvm_lock); + r = 0; + kvm_usage_count++; if (kvm_usage_count == 1) { on_each_cpu(hardware_enable_nolock, &failed, 1);