Message ID | 20221022154819.1823133-5-eesposit@redhat.com |
---|---|
State | New |
Headers |
From: Emanuele Giuseppe Esposito <eesposit@redhat.com>
To: kvm@vger.kernel.org
Cc: Paolo Bonzini <pbonzini@redhat.com>, Jonathan Corbet <corbet@lwn.net>, Maxim Levitsky <mlevitsk@redhat.com>, Sean Christopherson <seanjc@google.com>, Thomas Gleixner <tglx@linutronix.de>, Ingo Molnar <mingo@redhat.com>, Borislav Petkov <bp@alien8.de>, Dave Hansen <dave.hansen@linux.intel.com>, David Hildenbrand <david@redhat.com>, x86@kernel.org, "H. Peter Anvin" <hpa@zytor.com>, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: [PATCH 4/4] KVM: use signals to abort enter_guest/blocking and retry
Date: Sat, 22 Oct 2022 11:48:19 -0400
Message-Id: <20221022154819.1823133-5-eesposit@redhat.com>
In-Reply-To: <20221022154819.1823133-1-eesposit@redhat.com>
References: <20221022154819.1823133-1-eesposit@redhat.com> |
Series | KVM: API to block and resume all running vcpus in a vm |
Commit Message
Emanuele Giuseppe Esposito
Oct. 22, 2022, 3:48 p.m. UTC
Once a vcpu executes KVM_RUN, it can enter one of two states:
guest mode, or blocked/halted.
Use a signal to allow a vcpu to exit guest mode or unblock,
so that it can leave KVM_RUN and release the read semaphore,
allowing a pending KVM_KICK_ALL_RUNNING_VCPUS to continue.
Note that the signal is not deleted; it is used to propagate the
exit reason up to vcpu_run(). It will be cleared only by
KVM_RESUME_ALL_KICKED_VCPUS. This allows the vcpu to keep
retrying KVM_RUN, performing again all the checks done in
kvm_arch_vcpu_ioctl_run() before entering guest mode,
where it will return again if the request is still set.
However, the userspace hypervisor should also avoid
continuously calling KVM_RUN after invoking KVM_KICK_ALL_RUNNING_VCPUS,
because each such call will just translate into a back-to-back down_read()
and up_read() (thanks to the signal).
Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
---
arch/x86/include/asm/kvm_host.h | 2 ++
arch/x86/kvm/x86.c | 8 ++++++++
virt/kvm/kvm_main.c | 21 +++++++++++++++++++++
3 files changed, 31 insertions(+)
Comments
On 10/22/22 17:48, Emanuele Giuseppe Esposito wrote:
> However, the userspace hypervisor should also try to avoid
> continuously calling KVM_RUN after invoking KVM_KICK_ALL_RUNNING_VCPUS,
> because such call will just translate in a back-to-back down_read()
> and up_read() (thanks to the signal).
[...]

Since the userspace should anyway avoid going into this effectively-busy wait, what about clearing the request after the first exit? The cancellation ioctl can be kept for vCPUs that are never entered after KVM_KICK_ALL_RUNNING_VCPUS. Alternatively, kvm_clear_all_cpus_request could be done right before up_write().
Paolo

> Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
> [...]
Am 23/10/2022 um 19:48 schrieb Paolo Bonzini:
> On 10/22/22 17:48, Emanuele Giuseppe Esposito wrote:
>> [...]
>
> Since the userspace should anyway avoid going into this effectively-busy
> wait, what about clearing the request after the first exit? The
> cancellation ioctl can be kept for vCPUs that are never entered after
> KVM_KICK_ALL_RUNNING_VCPUS. Alternatively, kvm_clear_all_cpus_request
> could be done right before up_write().

Clearing makes sense, but should we "trust" the userspace not to go into busy wait?

What's the typical "contract" between KVM and the userspace? Meaning, should we cover the basic usage mistakes like forgetting to busy wait on KVM_RUN?

If we don't, I can add a comment when clearing and of course also mention it in the API documentation (that I forgot to update, sorry :D)

Emanuele
Am 24/10/2022 um 09:43 schrieb Emanuele Giuseppe Esposito:
> Am 23/10/2022 um 19:48 schrieb Paolo Bonzini:
>> Since the userspace should anyway avoid going into this effectively-busy
>> wait, what about clearing the request after the first exit? [...]
>
> Clearing makes sense, but should we "trust" the userspace not to go into
> busy wait?

Actually since this change is just a s/test/check, I would rather keep test. If userspace does things wrong, this mechanism would still work properly.

> What's the typical "contract" between KVM and the userspace? Meaning,
> should we cover the basic usage mistakes like forgetting to busy wait on
> KVM_RUN?
> [...]
On 10/24/22 09:43, Emanuele Giuseppe Esposito wrote:
>> Since the userspace should anyway avoid going into this effectively-busy
>> wait, what about clearing the request after the first exit? The
>> cancellation ioctl can be kept for vCPUs that are never entered after
>> KVM_KICK_ALL_RUNNING_VCPUS. Alternatively, kvm_clear_all_cpus_request
>> could be done right before up_write().
>
> Clearing makes sense, but should we "trust" the userspace not to go into
> busy wait?

I think so, there are many other ways for userspace to screw up.

> What's the typical "contract" between KVM and the userspace? Meaning,
> should we cover the basic usage mistakes like forgetting to busy wait on
> KVM_RUN?

Being able to remove the second ioctl if you do (sort-of pseudocode based on this v1)

    kvm_make_all_cpus_request(kvm, KVM_REQ_USERSPACE_KICK);
    down_write(&kvm->memory_transaction);
    up_write(&kvm->memory_transaction);
    kvm_clear_all_cpus_request(kvm, KVM_REQ_USERSPACE_KICK);

would be worth it, I think.

Paolo
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index aa381ab69a19..d5c37f344d65 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -108,6 +108,8 @@
 		KVM_ARCH_REQ_FLAGS(30, KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
 #define KVM_REQ_MMU_FREE_OBSOLETE_ROOTS \
 		KVM_ARCH_REQ_FLAGS(31, KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
+#define KVM_REQ_USERSPACE_KICK \
+		KVM_ARCH_REQ_FLAGS(32, KVM_REQUEST_WAIT)
 
 #define CR0_RESERVED_BITS \
 	(~(unsigned long)(X86_CR0_PE | X86_CR0_MP | X86_CR0_EM | X86_CR0_TS \
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index b0c47b41c264..2af5f427b4e9 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -10270,6 +10270,10 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
 	}
 
 	if (kvm_request_pending(vcpu)) {
+		if (kvm_test_request(KVM_REQ_USERSPACE_KICK, vcpu)) {
+			r = -EINTR;
+			goto out;
+		}
 		if (kvm_check_request(KVM_REQ_VM_DEAD, vcpu)) {
 			r = -EIO;
 			goto out;
@@ -10701,6 +10705,10 @@ static int vcpu_run(struct kvm_vcpu *vcpu)
 			r = vcpu_block(vcpu);
 		}
 
+		/* vcpu exited guest/unblocked because of this request */
+		if (kvm_test_request(KVM_REQ_USERSPACE_KICK, vcpu))
+			return -EINTR;
+
 		if (r <= 0)
 			break;
 
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index ae0240928a4a..13fa7229b85d 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -3431,6 +3431,8 @@ static int kvm_vcpu_check_block(struct kvm_vcpu *vcpu)
 		goto out;
 	if (kvm_check_request(KVM_REQ_UNBLOCK, vcpu))
 		goto out;
+	if (kvm_test_request(KVM_REQ_USERSPACE_KICK, vcpu))
+		goto out;
 
 	ret = 0;
 out:
@@ -4668,6 +4670,25 @@ static long kvm_vm_ioctl(struct file *filp,
 		r = kvm_vm_ioctl_enable_cap_generic(kvm, &cap);
 		break;
 	}
+	case KVM_KICK_ALL_RUNNING_VCPUS: {
+		/*
+		 * Notify all running vcpus that they have to stop.
+		 * Caught in kvm_arch_vcpu_ioctl_run()
+		 */
+		kvm_make_all_cpus_request(kvm, KVM_REQ_USERSPACE_KICK);
+
+		/*
+		 * Use wr semaphore to wait for all vcpus to exit from KVM_RUN.
+		 */
+		down_write(&memory_transaction);
+		up_write(&memory_transaction);
+		break;
+	}
+	case KVM_RESUME_ALL_KICKED_VCPUS: {
+		/* Remove all requests sent with KVM_KICK_ALL_RUNNING_VCPUS */
+		kvm_clear_all_cpus_request(kvm, KVM_REQ_USERSPACE_KICK);
+		break;
+	}
 	case KVM_SET_USER_MEMORY_REGION: {
 		struct kvm_userspace_memory_region kvm_userspace_mem;