Message ID | 20221103190601.399343-1-vkuznets@redhat.com |
---|---|
State | New |
Headers |
From: Vitaly Kuznetsov <vkuznets@redhat.com>
To: linux-hyperv@vger.kernel.org
Cc: "K. Y. Srinivasan" <kys@microsoft.com>, Haiyang Zhang <haiyangz@microsoft.com>, Wei Liu <wei.liu@kernel.org>, Dexuan Cui <decui@microsoft.com>, Michael Kelley <mikelley@microsoft.com>, linux-kernel@vger.kernel.org, x86@kernel.org
Subject: [PATCH] x86/hyperv: Restore VP assist page after cpu offlining/onlining
Date: Thu, 3 Nov 2022 20:06:01 +0100
Message-Id: <20221103190601.399343-1-vkuznets@redhat.com> |
Series | x86/hyperv: Restore VP assist page after cpu offlining/onlining |
Commit Message
Vitaly Kuznetsov
Nov. 3, 2022, 7:06 p.m. UTC
Commit e5d9b714fe40 ("x86/hyperv: fix root partition faults when writing
to VP assist page MSR") moved 'wrmsrl(HV_X64_MSR_VP_ASSIST_PAGE)' under
the 'if (*hvp)' condition. This works for the root partition, as
hv_cpu_die() does memunmap() and sets 'hv_vp_assist_page[cpu]' to NULL,
but breaks non-root partitions, as hv_cpu_die() doesn't free
'hv_vp_assist_page[cpu]' for them. This causes the VP assist page to
remain unset after a CPU offline/online cycle:
$ rdmsr -p 24 0x40000073
10212f001
$ echo 0 > /sys/devices/system/cpu/cpu24/online
$ echo 1 > /sys/devices/system/cpu/cpu24/online
$ rdmsr -p 24 0x40000073
0
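
(For reference, the rdmsr output decodes as an enable bit plus a page frame
number: 0x10212f001 is PFN 0x10212f with the enable bit set, while 0 means the
page is no longer enabled. A minimal user-space sketch, assuming the documented
MSR layout of bit 0 = enable and bits 63:12 = PFN, not part of the patch:)

/* Illustrative helper: decode a VP assist page MSR value, assuming
 * bit 0 = enable and bits 63:12 = page frame number. */
#include <stdint.h>
#include <stdio.h>

static void decode_vp_assist_msr(uint64_t val)
{
	unsigned int enable = val & 0x1;
	uint64_t pfn = val >> 12;

	printf("enable=%u pfn=0x%llx\n", enable, (unsigned long long)pfn);
}

int main(void)
{
	decode_vp_assist_msr(0x10212f001ULL);	/* before offlining: enable=1, pfn=0x10212f */
	decode_vp_assist_msr(0x0);		/* after the online cycle: page not set */
	return 0;
}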
Fix the issue by always writing to HV_X64_MSR_VP_ASSIST_PAGE in
hv_cpu_init(). Note: checking 'if (!*hvp)' for the root partition is
pointless, as hv_cpu_die() always sets 'hv_vp_assist_page[cpu]' to
NULL (and it is also NULL initially).
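
(Condensed from the diff below, the resulting hv_cpu_init() flow is roughly the
following; comments abbreviated:)

	if (hv_root_partition) {
		/* reuse the hypervisor-provided page; its PFN comes from the MSR */
		rdmsrl(HV_X64_MSR_VP_ASSIST_PAGE, msr.as_uint64);
		*hvp = memremap(msr.pfn << HV_X64_MSR_VP_ASSIST_PAGE_ADDRESS_SHIFT,
				PAGE_SIZE, MEMREMAP_WB);
	} else {
		if (!*hvp)	/* allocate only once; the page is kept across offline/online */
			*hvp = __vmalloc(PAGE_SIZE, GFP_KERNEL | __GFP_ZERO);
		if (*hvp)
			msr.pfn = vmalloc_to_pfn(*hvp);
	}
	if (!WARN_ON(!(*hvp))) {
		msr.enable = 1;
		/* the MSR is now written on every onlining, not only on first allocation */
		wrmsrl(HV_X64_MSR_VP_ASSIST_PAGE, msr.as_uint64);
	}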
Note: the fact that 'hv_vp_assist_page[cpu]' is reset to NULL may
present a (potential) issue for KVM. While Hyper-V uses the
CPUHP_AP_ONLINE_DYN stage in CPU hotplug, KVM uses CPUHP_AP_KVM_STARTING,
which sits earlier in the hotplug state order and is therefore torn down
later, so it is theoretically possible that an Enlightened VMCS is still
in use when the page pointer is cleared. It is unclear whether the issue
is real and whether using KVM with a Hyper-V root partition is even
possible.
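
(For context, and not part of this patch: the Hyper-V CPU hotplug callbacks are
registered via the dynamic online state, roughly as sketched below; the state
name string is approximate.)

	/* Approximate sketch of the registration in hyperv_init():
	 * hv_cpu_init()/hv_cpu_die() are attached to a dynamically allocated
	 * CPUHP_AP_ONLINE_DYN state, which is torn down early when a CPU goes
	 * offline. */
	ret = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "x86/hyperv_init:online",
				hv_cpu_init, hv_cpu_die);
	if (ret < 0)
		goto out;	/* error handling elided; label is illustrative */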
While at it, drop the unneeded smp_processor_id() call from hv_cpu_init().
Fixes: e5d9b714fe40 ("x86/hyperv: fix root partition faults when writing to VP assist page MSR")
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
---
arch/x86/hyperv/hv_init.c | 54 +++++++++++++++++++--------------------
1 file changed, 26 insertions(+), 28 deletions(-)
Comments
From: Vitaly Kuznetsov <vkuznets@redhat.com> Sent: Thursday, November 3, 2022 12:06 PM
>
> Commit e5d9b714fe40 ("x86/hyperv: fix root partition faults when writing
> to VP assist page MSR") moved 'wrmsrl(HV_X64_MSR_VP_ASSIST_PAGE)' under
> 'if (*hvp)' condition. This works for root partition as hv_cpu_die()
> does memunmap() and sets 'hv_vp_assist_page[cpu]' to NULL but breaks
> non-root partitions as hv_cpu_die() doesn't free 'hv_vp_assist_page[cpu]'
> for them.

Did you consider having hv_cpu_die() free the VP assist page and set
hv_vp_assist_page[cpu] to NULL in the non-root case? That would make the
root and non-root cases more consistent, and it would make hv_cpu_init()
and hv_cpu_die() more symmetrical. The hv_cpu_die() path frees up pretty
much all the other per-CPU resources. I don't know why it keeps the VP
assist page for re-use if the CPU comes back online later.

You added the original code for allocating the vp_assist_page in commit
a46d15cc1a, so maybe you remember if there are any gotchas. :-)

Michael
"Michael Kelley (LINUX)" <mikelley@microsoft.com> writes: > From: Vitaly Kuznetsov <vkuznets@redhat.com> Sent: Thursday, November 3, 2022 12:06 PM >> >> Commit e5d9b714fe40 ("x86/hyperv: fix root partition faults when writing >> to VP assist page MSR") moved 'wrmsrl(HV_X64_MSR_VP_ASSIST_PAGE)' under >> 'if (*hvp)' condition. This works for root partition as hv_cpu_die() >> does memunmap() and sets 'hv_vp_assist_page[cpu]' to NULL but breaks >> non-root partitions as hv_cpu_die() doesn't free 'hv_vp_assist_page[cpu]' >> for them. > > Did you consider having hv_cpu_die() free the VP assist page and > set hv_vp_assist_page[cpu] to NULL in the non-root case? Oh yes, I did, and I even wrote a patch (attached) but it failed in testing :-( My testing was to run CPU onlining/offlining loop and and try using KVM on the same CPU at the same time. I was able to see issues when KVM was not able to reach to Enlightened VMCS because VP assist page was NULL. > That would > make the root and non-root cases more consistent, and it would make > hv_cpu_init() and hv_cpu_die() more symmetrical. The hv_cpu_die() > path frees up pretty much all the other per-CPU resources. I don't > know why it keeps the VP assist page for re-use if the CPU comes back > online later. > > You added the original code for allocating the vp_assist_page in > commit a46d15cc1a, so maybe you remember if there are any > gotchas. :-) The root cause of the problem I observed seems to be the order of CPU hotplug. Hyper-V uses the last CPUHP_AP_ONLINE_DYN stage while KVM has his own dedicated CPUHP_AP_KVM_STARTING one so we end up freeing VP assist page before turning KVM off on the CPU. It may be sufficient to introduce a new stage for Hyper-V and put it before KVM's. It's in my backlog to explore this path but it may take me some time to get back to it :-( > > Michael > >> This causes VP assist page to remain unset after CPU >> offline/online cycle: >> >> $ rdmsr -p 24 0x40000073 >> 10212f001 >> $ echo 0 > /sys/devices/system/cpu/cpu24/online >> $ echo 1 > /sys/devices/system/cpu/cpu24/online >> $ rdmsr -p 24 0x40000073 >> 0 >> >> Fix the issue by always writing to HV_X64_MSR_VP_ASSIST_PAGE in >> hv_cpu_init(). Note, checking 'if (!*hvp)', for root partition is >> pointless as hv_cpu_die() always sets 'hv_vp_assist_page[cpu]' to >> NULL (and it's also NULL initially). >> >> Note: the fact that 'hv_vp_assist_page[cpu]' is reset to NULL may >> present a (potential) issue for KVM. While Hyper-V uses >> CPUHP_AP_ONLINE_DYN stage in CPU hotplug, KVM uses >> CPUHP_AP_KVM_STARTING >> which comes earlier in CPU teardown sequence. It is theoretically >> possible that Enlightened VMCS is still in use. It is unclear if the >> issue is real and if using KVM with Hyper-V root partition is even >> possible. >> >> While on it, drop the unneeded smp_processor_id() call from hv_cpu_init(). 
>> >> Fixes: e5d9b714fe40 ("x86/hyperv: fix root partition faults when writing to VP assist >> page MSR") >> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> >> --- >> arch/x86/hyperv/hv_init.c | 54 +++++++++++++++++++-------------------- >> 1 file changed, 26 insertions(+), 28 deletions(-) >> >> diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c >> index f49bc3ec76e6..a269049a43ce 100644 >> --- a/arch/x86/hyperv/hv_init.c >> +++ b/arch/x86/hyperv/hv_init.c >> @@ -77,7 +77,7 @@ static int hyperv_init_ghcb(void) >> static int hv_cpu_init(unsigned int cpu) >> { >> union hv_vp_assist_msr_contents msr = { 0 }; >> - struct hv_vp_assist_page **hvp = &hv_vp_assist_page[smp_processor_id()]; >> + struct hv_vp_assist_page **hvp = &hv_vp_assist_page[cpu]; >> int ret; >> >> ret = hv_common_cpu_init(cpu); >> @@ -87,34 +87,32 @@ static int hv_cpu_init(unsigned int cpu) >> if (!hv_vp_assist_page) >> return 0; >> >> - if (!*hvp) { >> - if (hv_root_partition) { >> - /* >> - * For root partition we get the hypervisor provided VP assist >> - * page, instead of allocating a new page. >> - */ >> - rdmsrl(HV_X64_MSR_VP_ASSIST_PAGE, msr.as_uint64); >> - *hvp = memremap(msr.pfn << >> - >> HV_X64_MSR_VP_ASSIST_PAGE_ADDRESS_SHIFT, >> - PAGE_SIZE, MEMREMAP_WB); >> - } else { >> - /* >> - * The VP assist page is an "overlay" page (see Hyper-V TLFS's >> - * Section 5.2.1 "GPA Overlay Pages"). Here it must be zeroed >> - * out to make sure we always write the EOI MSR in >> - * hv_apic_eoi_write() *after* the EOI optimization is disabled >> - * in hv_cpu_die(), otherwise a CPU may not be stopped in the >> - * case of CPU offlining and the VM will hang. >> - */ >> + if (hv_root_partition) { >> + /* >> + * For root partition we get the hypervisor provided VP assist >> + * page, instead of allocating a new page. >> + */ >> + rdmsrl(HV_X64_MSR_VP_ASSIST_PAGE, msr.as_uint64); >> + *hvp = memremap(msr.pfn << >> HV_X64_MSR_VP_ASSIST_PAGE_ADDRESS_SHIFT, >> + PAGE_SIZE, MEMREMAP_WB); >> + } else { >> + /* >> + * The VP assist page is an "overlay" page (see Hyper-V TLFS's >> + * Section 5.2.1 "GPA Overlay Pages"). Here it must be zeroed >> + * out to make sure we always write the EOI MSR in >> + * hv_apic_eoi_write() *after* the EOI optimization is disabled >> + * in hv_cpu_die(), otherwise a CPU may not be stopped in the >> + * case of CPU offlining and the VM will hang. >> + */ >> + if (!*hvp) >> *hvp = __vmalloc(PAGE_SIZE, GFP_KERNEL | __GFP_ZERO); >> - if (*hvp) >> - msr.pfn = vmalloc_to_pfn(*hvp); >> - } >> - WARN_ON(!(*hvp)); >> - if (*hvp) { >> - msr.enable = 1; >> - wrmsrl(HV_X64_MSR_VP_ASSIST_PAGE, msr.as_uint64); >> - } >> + if (*hvp) >> + msr.pfn = vmalloc_to_pfn(*hvp); >> + >> + } >> + if (!WARN_ON(!(*hvp))) { >> + msr.enable = 1; >> + wrmsrl(HV_X64_MSR_VP_ASSIST_PAGE, msr.as_uint64); >> } >> >> return hyperv_init_ghcb(); >> -- >> 2.38.1 >
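
(Purely to illustrate the idea mentioned above, with hypothetical names not
taken from this thread: a dedicated Hyper-V hotplug state would be an entry in
enum cpuhp_state ordered before KVM's, so that its teardown runs after KVM's
when a CPU goes offline, and hyperv_init() would register against it instead
of CPUHP_AP_ONLINE_DYN.)

	/* Hypothetical placement in include/linux/cpuhotplug.h; whether such a
	 * state can sit next to the STARTING-section states (whose callbacks
	 * run with interrupts disabled) is exactly the detail that would need
	 * exploring. */
	enum cpuhp_state {
		/* ... */
		CPUHP_AP_HYPERV_ONLINE,		/* hypothetical new state */
		CPUHP_AP_KVM_STARTING,
		/* ... */
	};

	/* ...and in hyperv_init(): */
	ret = cpuhp_setup_state(CPUHP_AP_HYPERV_ONLINE, "x86/hyperv_init:online",
				hv_cpu_init, hv_cpu_die);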
From: Vitaly Kuznetsov <vkuznets@redhat.com> Sent: Thursday, November 3, 2022 12:06 PM

Reviewed-by: Michael Kelley <mikelley@microsoft.com>
On Thu, Nov 03, 2022 at 08:06:01PM +0100, Vitaly Kuznetsov wrote:
> Commit e5d9b714fe40 ("x86/hyperv: fix root partition faults when writing
> to VP assist page MSR") moved 'wrmsrl(HV_X64_MSR_VP_ASSIST_PAGE)' under
> 'if (*hvp)' condition. This works for root partition as hv_cpu_die()
> does memunmap() and sets 'hv_vp_assist_page[cpu]' to NULL but breaks
> non-root partitions as hv_cpu_die() doesn't free 'hv_vp_assist_page[cpu]'
> for them.

Applied to hyperv-fixes. Thanks.
diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
index f49bc3ec76e6..a269049a43ce 100644
--- a/arch/x86/hyperv/hv_init.c
+++ b/arch/x86/hyperv/hv_init.c
@@ -77,7 +77,7 @@ static int hyperv_init_ghcb(void)
 static int hv_cpu_init(unsigned int cpu)
 {
 	union hv_vp_assist_msr_contents msr = { 0 };
-	struct hv_vp_assist_page **hvp = &hv_vp_assist_page[smp_processor_id()];
+	struct hv_vp_assist_page **hvp = &hv_vp_assist_page[cpu];
 	int ret;
 
 	ret = hv_common_cpu_init(cpu);
@@ -87,34 +87,32 @@ static int hv_cpu_init(unsigned int cpu)
 	if (!hv_vp_assist_page)
 		return 0;
 
-	if (!*hvp) {
-		if (hv_root_partition) {
-			/*
-			 * For root partition we get the hypervisor provided VP assist
-			 * page, instead of allocating a new page.
-			 */
-			rdmsrl(HV_X64_MSR_VP_ASSIST_PAGE, msr.as_uint64);
-			*hvp = memremap(msr.pfn <<
-					HV_X64_MSR_VP_ASSIST_PAGE_ADDRESS_SHIFT,
-					PAGE_SIZE, MEMREMAP_WB);
-		} else {
-			/*
-			 * The VP assist page is an "overlay" page (see Hyper-V TLFS's
-			 * Section 5.2.1 "GPA Overlay Pages"). Here it must be zeroed
-			 * out to make sure we always write the EOI MSR in
-			 * hv_apic_eoi_write() *after* the EOI optimization is disabled
-			 * in hv_cpu_die(), otherwise a CPU may not be stopped in the
-			 * case of CPU offlining and the VM will hang.
-			 */
+	if (hv_root_partition) {
+		/*
+		 * For root partition we get the hypervisor provided VP assist
+		 * page, instead of allocating a new page.
+		 */
+		rdmsrl(HV_X64_MSR_VP_ASSIST_PAGE, msr.as_uint64);
+		*hvp = memremap(msr.pfn << HV_X64_MSR_VP_ASSIST_PAGE_ADDRESS_SHIFT,
+				PAGE_SIZE, MEMREMAP_WB);
+	} else {
+		/*
+		 * The VP assist page is an "overlay" page (see Hyper-V TLFS's
+		 * Section 5.2.1 "GPA Overlay Pages"). Here it must be zeroed
+		 * out to make sure we always write the EOI MSR in
+		 * hv_apic_eoi_write() *after* the EOI optimization is disabled
+		 * in hv_cpu_die(), otherwise a CPU may not be stopped in the
+		 * case of CPU offlining and the VM will hang.
+		 */
+		if (!*hvp)
 			*hvp = __vmalloc(PAGE_SIZE, GFP_KERNEL | __GFP_ZERO);
-			if (*hvp)
-				msr.pfn = vmalloc_to_pfn(*hvp);
-		}
-		WARN_ON(!(*hvp));
-		if (*hvp) {
-			msr.enable = 1;
-			wrmsrl(HV_X64_MSR_VP_ASSIST_PAGE, msr.as_uint64);
-		}
+		if (*hvp)
+			msr.pfn = vmalloc_to_pfn(*hvp);
+
+	}
+	if (!WARN_ON(!(*hvp))) {
+		msr.enable = 1;
+		wrmsrl(HV_X64_MSR_VP_ASSIST_PAGE, msr.as_uint64);
 	}
 
 	return hyperv_init_ghcb();