Message ID | 20231020151242.1814-5-kirill.shutemov@linux.intel.com |
---|---|
State | New |
Headers |
Return-Path: <linux-kernel-owner@vger.kernel.org> Delivered-To: ouuuleilei@gmail.com Received: by 2002:a05:612c:2010:b0:403:3b70:6f57 with SMTP id fe16csp1125434vqb; Fri, 20 Oct 2023 08:13:23 -0700 (PDT) X-Google-Smtp-Source: AGHT+IESFQwdtxBKhSY70RGR1uVedW3/sfNY9AaclMx79lpUPwLogv96ZPJer22olDNNSHsJjCr8 X-Received: by 2002:a05:6a21:7783:b0:17a:7c2:d4a6 with SMTP id bd3-20020a056a21778300b0017a07c2d4a6mr2114042pzc.55.1697814802765; Fri, 20 Oct 2023 08:13:22 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1697814802; cv=none; d=google.com; s=arc-20160816; b=XZHrP33HvVGxRB39Y1xm7uA8Z8QDLxW0m0mtd8PvGR7rBKsn8hD18sY6kLmAW3aiRE w6GA7z2yWQZbluPlS3i8TOIo7IERsNAymA/XJFro8wmN6hTET49n6Bumc2zdHUU+KIPe vfok5AwoVzA1xS7gspjJ/6pJw+g8rO5Pi6FYkGGRp/974kjmOBQTv4HV5udPIIH0xVeT bZME/v1RTaBHHZwjvJRDA+AredVvNFujhk6+DHPlXTxFJFIZlibpO+3QCWKdXWf231Mj dUMAOiOcGr1KoLJf5P5N8ZpULf3EQYxBnH7BgGxdGF/uNR8kUr6Smr1paGlBLWqvEmlw gb/Q== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=iw6JvSvu16kdVfUPtZTMZ6dHhC4KaBhFUkCnK0C19pc=; fh=zgww269bCnQ8VQepSkUb3i3YdaPqj+fISfghODbGw6M=; b=Ya5Jy2c+4X68maz1e2wOtTFZ+O/7EtRDPGfjSzK5hkel1Ol4g3UfLM+uX0ZmtD7Wfm GWUtpw2qstVXIG7LrscPnDHDq3mRHsCqYoXTz+wRaktmgq2Czgaylicw9xCt19MhaCYz dfY7bmrKLOD2I4kcb/DQw1jT3hyvfI5ulNxvBYZJ5RKhj8EEgpJAurvBF8kKCAarIwlW p5bD5kdG8hym6x32nDHjdTFH7HNtpUa1g6OvkUW9QFEzOU4mCANq49ZCEjSHZ6ANXQy0 9+vbP1XR4lA77e4vAxK5mfvDDJj9PTgcC04H7qy2Mv4FrJbsuoMwpnUbS1+22mW335LX rGrQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=M3wr1ALK; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.31 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: from morse.vger.email (morse.vger.email. [23.128.96.31]) by mx.google.com with ESMTPS id ay8-20020a1709028b8800b001c0eefc0dfesi1910705plb.130.2023.10.20.08.13.22 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 20 Oct 2023 08:13:22 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.31 as permitted sender) client-ip=23.128.96.31; Authentication-Results: mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=M3wr1ALK; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.31 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: from out1.vger.email (depot.vger.email [IPv6:2620:137:e000::3:0]) by morse.vger.email (Postfix) with ESMTP id 2DDFD83FA177; Fri, 20 Oct 2023 08:13:19 -0700 (PDT) X-Virus-Status: Clean X-Virus-Scanned: clamav-milter 0.103.10 at morse.vger.email Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1377675AbjJTPNM (ORCPT <rfc822;a1648639935@gmail.com> + 26 others); Fri, 20 Oct 2023 11:13:12 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:57610 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1377653AbjJTPNC (ORCPT <rfc822;linux-kernel@vger.kernel.org>); Fri, 20 Oct 2023 11:13:02 -0400 Received: from mgamail.intel.com (mgamail.intel.com [134.134.136.20]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 75FE9D6B for <linux-kernel@vger.kernel.org>; Fri, 20 Oct 2023 08:12:59 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1697814780; x=1729350780; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=MPIjCtbNsICrtwqXzYBNDwDeX3sLS3gFAi6SoZOpAO8=; b=M3wr1ALKLBSje8Ikwl/9PKg8FPHyHjLUkUIp+ZmoLiszw/O/SRRBChfL h15XxMovWnD04y2yg/laqcGtYe9Zb1jLuIR2LAl6aIYNsw6MdyrdA50G6 R2dqBQIHmAd5Z5Rj4etimVSYHVwLcwR9/Jo4oK3A1/n7uKtZjBlqBQKwT b8JAJwDUZ3GVa0VUi3L3nZbHvIWPog/fvzcz906FvyCQ8K5hUPNsulk7n EyyrWxVr3mHZxx7wNDr6mZY6gYJpJ/n6AaQjLfbovPHsBaC2BF4e4/hxB CuZqNDoVdFCpypQVLh3qdiHJkWoHjdMHWCYzALwNuPkjsFLvKhzlcyij4 A==; X-IronPort-AV: E=McAfee;i="6600,9927,10869"; a="376893621" X-IronPort-AV: E=Sophos;i="6.03,239,1694761200"; d="scan'208";a="376893621" Received: from fmsmga007.fm.intel.com ([10.253.24.52]) by orsmga101.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 20 Oct 2023 08:12:55 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10869"; a="761080250" X-IronPort-AV: E=Sophos;i="6.03,239,1694761200"; d="scan'208";a="761080250" Received: from dgutows1-mobl.ger.corp.intel.com (HELO box.shutemov.name) ([10.249.39.237]) by fmsmga007-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 20 Oct 2023 08:12:48 -0700 Received: by box.shutemov.name (Postfix, from userid 1000) id 10D5D10A296; Fri, 20 Oct 2023 18:12:45 +0300 (+03) From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> To: Thomas Gleixner <tglx@linutronix.de>, Ingo Molnar <mingo@redhat.com>, Borislav Petkov <bp@alien8.de>, Dave Hansen <dave.hansen@linux.intel.com>, x86@kernel.org Cc: "Rafael J. Wysocki" <rafael@kernel.org>, Peter Zijlstra <peterz@infradead.org>, Adrian Hunter <adrian.hunter@intel.com>, Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>, Elena Reshetova <elena.reshetova@intel.com>, Jun Nakajima <jun.nakajima@intel.com>, Rick Edgecombe <rick.p.edgecombe@intel.com>, Tom Lendacky <thomas.lendacky@amd.com>, "Kalra, Ashish" <ashish.kalra@amd.com>, Sean Christopherson <seanjc@google.com>, "Huang, Kai" <kai.huang@intel.com>, Baoquan He <bhe@redhat.com>, kexec@lists.infradead.org, linux-coco@lists.linux.dev, linux-kernel@vger.kernel.org, "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>, Paolo Bonzini <pbonzini@redhat.com>, Wanpeng Li <wanpengli@tencent.com>, Vitaly Kuznetsov <vkuznets@redhat.com> Subject: [PATCHv2 04/13] x86/kvm: Do not try to disable kvmclock if it was not enabled Date: Fri, 20 Oct 2023 18:12:33 +0300 Message-ID: <20231020151242.1814-5-kirill.shutemov@linux.intel.com> X-Mailer: git-send-email 2.41.0 In-Reply-To: <20231020151242.1814-1-kirill.shutemov@linux.intel.com> References: <20231020151242.1814-1-kirill.shutemov@linux.intel.com> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Spam-Status: No, score=-0.8 required=5.0 tests=DKIMWL_WL_HIGH,DKIM_SIGNED, DKIM_VALID,HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI, SPF_HELO_NONE,SPF_PASS autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on morse.vger.email Precedence: bulk List-ID: <linux-kernel.vger.kernel.org> X-Mailing-List: linux-kernel@vger.kernel.org X-Greylist: Sender passed SPF test, not delayed by milter-greylist-4.6.4 (morse.vger.email [0.0.0.0]); Fri, 20 Oct 2023 08:13:19 -0700 (PDT) X-getmail-retrieved-from-mailbox: INBOX X-GMAIL-THRID: 1780287854596128478 X-GMAIL-MSGID: 1780287854596128478 |
Series |
x86/tdx: Add kexec support
|
|
Commit Message
Kirill A. Shutemov
Oct. 20, 2023, 3:12 p.m. UTC
kvm_guest_cpu_offline() tries to disable kvmclock regardless if it is
present in the VM. It leads to write to a MSR that doesn't exist on some
configurations, namely in TDX guest:
unchecked MSR access error: WRMSR to 0x12 (tried to write 0x0000000000000000)
at rIP: 0xffffffff8110687c (kvmclock_disable+0x1c/0x30)
kvmclock enabling is gated by CLOCKSOURCE and CLOCKSOURCE2 KVM paravirt
features.
Do not disable kvmclock if it was not enabled.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Fixes: c02027b5742b ("x86/kvm: Disable kvmclock on all CPUs on shutdown")
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Wanpeng Li <wanpengli@tencent.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Sean Christopherson <seanjc@google.com>
---
arch/x86/kernel/kvmclock.c | 12 ++++++++----
1 file changed, 8 insertions(+), 4 deletions(-)
Comments
On Fri, Oct 20, 2023, Kirill A. Shutemov wrote: > kvm_guest_cpu_offline() tries to disable kvmclock regardless if it is > present in the VM. It leads to write to a MSR that doesn't exist on some > configurations, namely in TDX guest: > > unchecked MSR access error: WRMSR to 0x12 (tried to write 0x0000000000000000) > at rIP: 0xffffffff8110687c (kvmclock_disable+0x1c/0x30) > > kvmclock enabling is gated by CLOCKSOURCE and CLOCKSOURCE2 KVM paravirt > features. > > Do not disable kvmclock if it was not enabled. > > Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> > Fixes: c02027b5742b ("x86/kvm: Disable kvmclock on all CPUs on shutdown") > Cc: Paolo Bonzini <pbonzini@redhat.com> > Cc: Wanpeng Li <wanpengli@tencent.com> > Cc: Vitaly Kuznetsov <vkuznets@redhat.com> > Cc: Sean Christopherson <seanjc@google.com> > --- Reviewed-by: Sean Christopherson <seanjc@google.com>
"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> writes: > kvm_guest_cpu_offline() tries to disable kvmclock regardless if it is > present in the VM. It leads to write to a MSR that doesn't exist on some > configurations, namely in TDX guest: > > unchecked MSR access error: WRMSR to 0x12 (tried to write 0x0000000000000000) > at rIP: 0xffffffff8110687c (kvmclock_disable+0x1c/0x30) > > kvmclock enabling is gated by CLOCKSOURCE and CLOCKSOURCE2 KVM paravirt > features. > > Do not disable kvmclock if it was not enabled. > > Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> > Fixes: c02027b5742b ("x86/kvm: Disable kvmclock on all CPUs on shutdown") > Cc: Paolo Bonzini <pbonzini@redhat.com> > Cc: Wanpeng Li <wanpengli@tencent.com> > Cc: Vitaly Kuznetsov <vkuznets@redhat.com> > Cc: Sean Christopherson <seanjc@google.com> > --- > arch/x86/kernel/kvmclock.c | 12 ++++++++---- > 1 file changed, 8 insertions(+), 4 deletions(-) > > diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c > index fb8f52149be9..f2fff625576d 100644 > --- a/arch/x86/kernel/kvmclock.c > +++ b/arch/x86/kernel/kvmclock.c > @@ -24,8 +24,8 @@ > > static int kvmclock __initdata = 1; > static int kvmclock_vsyscall __initdata = 1; > -static int msr_kvm_system_time __ro_after_init = MSR_KVM_SYSTEM_TIME; > -static int msr_kvm_wall_clock __ro_after_init = MSR_KVM_WALL_CLOCK; > +static int msr_kvm_system_time __ro_after_init; > +static int msr_kvm_wall_clock __ro_after_init; > static u64 kvm_sched_clock_offset __ro_after_init; > > static int __init parse_no_kvmclock(char *arg) > @@ -195,7 +195,8 @@ static void kvm_setup_secondary_clock(void) > > void kvmclock_disable(void) > { > - native_write_msr(msr_kvm_system_time, 0, 0); > + if (msr_kvm_system_time) > + native_write_msr(msr_kvm_system_time, 0, 0); > } > > static void __init kvmclock_init_mem(void) > @@ -294,7 +295,10 @@ void __init kvmclock_init(void) > if (kvm_para_has_feature(KVM_FEATURE_CLOCKSOURCE2)) { > msr_kvm_system_time = MSR_KVM_SYSTEM_TIME_NEW; > msr_kvm_wall_clock = MSR_KVM_WALL_CLOCK_NEW; > - } else if (!kvm_para_has_feature(KVM_FEATURE_CLOCKSOURCE)) { > + } else if (kvm_para_has_feature(KVM_FEATURE_CLOCKSOURCE)) { > + msr_kvm_system_time = MSR_KVM_SYSTEM_TIME; > + msr_kvm_wall_clock = MSR_KVM_WALL_CLOCK; > + } else { > return; > } This should work, so Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> but my personal preference would be to change kvm_guest_cpu_offline() to check KVM features explicitly instead of checking MSRs against '0' at least becase it already does so for other features. Completely untested: diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c index b8ab9ee5896c..1ee49c98e70a 100644 --- a/arch/x86/kernel/kvm.c +++ b/arch/x86/kernel/kvm.c @@ -454,7 +454,9 @@ static void kvm_guest_cpu_offline(bool shutdown) kvm_pv_disable_apf(); if (!shutdown) apf_task_wake_all(); - kvmclock_disable(); + if (kvm_para_has_feature(KVM_FEATURE_CLOCKSOURCE2) || + kvm_para_has_feature(KVM_FEATURE_CLOCKSOURCE)) + kvmclock_disable(); }
On Fri, Oct 20, 2023, Vitaly Kuznetsov wrote: > > --- > > arch/x86/kernel/kvmclock.c | 12 ++++++++---- > > 1 file changed, 8 insertions(+), 4 deletions(-) > > > > diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c > > index fb8f52149be9..f2fff625576d 100644 > > --- a/arch/x86/kernel/kvmclock.c > > +++ b/arch/x86/kernel/kvmclock.c > > @@ -24,8 +24,8 @@ > > > > static int kvmclock __initdata = 1; > > static int kvmclock_vsyscall __initdata = 1; > > -static int msr_kvm_system_time __ro_after_init = MSR_KVM_SYSTEM_TIME; > > -static int msr_kvm_wall_clock __ro_after_init = MSR_KVM_WALL_CLOCK; > > +static int msr_kvm_system_time __ro_after_init; > > +static int msr_kvm_wall_clock __ro_after_init; > > static u64 kvm_sched_clock_offset __ro_after_init; > > > > static int __init parse_no_kvmclock(char *arg) > > @@ -195,7 +195,8 @@ static void kvm_setup_secondary_clock(void) > > > > void kvmclock_disable(void) > > { > > - native_write_msr(msr_kvm_system_time, 0, 0); > > + if (msr_kvm_system_time) > > + native_write_msr(msr_kvm_system_time, 0, 0); > > } > > > > static void __init kvmclock_init_mem(void) > > @@ -294,7 +295,10 @@ void __init kvmclock_init(void) > > if (kvm_para_has_feature(KVM_FEATURE_CLOCKSOURCE2)) { > > msr_kvm_system_time = MSR_KVM_SYSTEM_TIME_NEW; > > msr_kvm_wall_clock = MSR_KVM_WALL_CLOCK_NEW; > > - } else if (!kvm_para_has_feature(KVM_FEATURE_CLOCKSOURCE)) { > > + } else if (kvm_para_has_feature(KVM_FEATURE_CLOCKSOURCE)) { > > + msr_kvm_system_time = MSR_KVM_SYSTEM_TIME; > > + msr_kvm_wall_clock = MSR_KVM_WALL_CLOCK; > > + } else { > > return; > > } > > This should work, so > > Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> > > but my personal preference would be to change kvm_guest_cpu_offline() > to check KVM features explicitly instead of checking MSRs against '0' > at least becase it already does so for other features. Completely > untested: > > diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c > index b8ab9ee5896c..1ee49c98e70a 100644 > --- a/arch/x86/kernel/kvm.c > +++ b/arch/x86/kernel/kvm.c > @@ -454,7 +454,9 @@ static void kvm_guest_cpu_offline(bool shutdown) > kvm_pv_disable_apf(); > if (!shutdown) > apf_task_wake_all(); > - kvmclock_disable(); > + if (kvm_para_has_feature(KVM_FEATURE_CLOCKSOURCE2) || > + kvm_para_has_feature(KVM_FEATURE_CLOCKSOURCE)) > + kvmclock_disable(); > } That would result in an unnecessray WRMSR in the case where kvmclock is disabled on the command line. It _should_ be benign given how the code is written, but it's not impossible to imagine a scenario where someone disabled kvmclock in the guest because of a hypervisor bug. And the WRMSR would become a bogus write to MSR 0x0 if someone made a "cleanup" to set msr_kvm_system_time if and only if kvmclock is actually used, e.g. if someone made Kirill's change sans the check in kvmclock_disable().
Sean Christopherson <seanjc@google.com> writes: > On Fri, Oct 20, 2023, Vitaly Kuznetsov wrote: >> > --- >> > arch/x86/kernel/kvmclock.c | 12 ++++++++---- >> > 1 file changed, 8 insertions(+), 4 deletions(-) >> > >> > diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c >> > index fb8f52149be9..f2fff625576d 100644 >> > --- a/arch/x86/kernel/kvmclock.c >> > +++ b/arch/x86/kernel/kvmclock.c >> > @@ -24,8 +24,8 @@ >> > >> > static int kvmclock __initdata = 1; >> > static int kvmclock_vsyscall __initdata = 1; >> > -static int msr_kvm_system_time __ro_after_init = MSR_KVM_SYSTEM_TIME; >> > -static int msr_kvm_wall_clock __ro_after_init = MSR_KVM_WALL_CLOCK; >> > +static int msr_kvm_system_time __ro_after_init; >> > +static int msr_kvm_wall_clock __ro_after_init; >> > static u64 kvm_sched_clock_offset __ro_after_init; >> > >> > static int __init parse_no_kvmclock(char *arg) >> > @@ -195,7 +195,8 @@ static void kvm_setup_secondary_clock(void) >> > >> > void kvmclock_disable(void) >> > { >> > - native_write_msr(msr_kvm_system_time, 0, 0); >> > + if (msr_kvm_system_time) >> > + native_write_msr(msr_kvm_system_time, 0, 0); >> > } >> > >> > static void __init kvmclock_init_mem(void) >> > @@ -294,7 +295,10 @@ void __init kvmclock_init(void) >> > if (kvm_para_has_feature(KVM_FEATURE_CLOCKSOURCE2)) { >> > msr_kvm_system_time = MSR_KVM_SYSTEM_TIME_NEW; >> > msr_kvm_wall_clock = MSR_KVM_WALL_CLOCK_NEW; >> > - } else if (!kvm_para_has_feature(KVM_FEATURE_CLOCKSOURCE)) { >> > + } else if (kvm_para_has_feature(KVM_FEATURE_CLOCKSOURCE)) { >> > + msr_kvm_system_time = MSR_KVM_SYSTEM_TIME; >> > + msr_kvm_wall_clock = MSR_KVM_WALL_CLOCK; >> > + } else { >> > return; >> > } >> >> This should work, so >> >> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> >> >> but my personal preference would be to change kvm_guest_cpu_offline() >> to check KVM features explicitly instead of checking MSRs against '0' >> at least becase it already does so for other features. Completely >> untested: >> >> diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c >> index b8ab9ee5896c..1ee49c98e70a 100644 >> --- a/arch/x86/kernel/kvm.c >> +++ b/arch/x86/kernel/kvm.c >> @@ -454,7 +454,9 @@ static void kvm_guest_cpu_offline(bool shutdown) >> kvm_pv_disable_apf(); >> if (!shutdown) >> apf_task_wake_all(); >> - kvmclock_disable(); >> + if (kvm_para_has_feature(KVM_FEATURE_CLOCKSOURCE2) || >> + kvm_para_has_feature(KVM_FEATURE_CLOCKSOURCE)) >> + kvmclock_disable(); >> } > > That would result in an unnecessray WRMSR in the case where kvmclock is disabled > on the command line. It _should_ be benign given how the code is written, but > it's not impossible to imagine a scenario where someone disabled kvmclock in the > guest because of a hypervisor bug. And the WRMSR would become a bogus write to > MSR 0x0 if someone made a "cleanup" to set msr_kvm_system_time if and only if > kvmclock is actually used, e.g. if someone made Kirill's change sans the check in > kvmclock_disable(). True but we don't have such module params to disable other PV features so e.g. KVM_FEATURE_PV_EOI/KVM_FEATURE_MIGRATION_CONTROL are written to unconditionally. Wouldn't it be better to handle parameters like 'no-kvmclock' by clearing the feature bit in kvm_arch_para_features()'s return value so all kvm_para_has_feature() calls for it just return 'false'? We can even do an umbreall "no-kvm-features=<mask>" to cover all possible debug cases.
On Mon, Oct 23, 2023, Vitaly Kuznetsov wrote: > Sean Christopherson <seanjc@google.com> writes: > > > On Fri, Oct 20, 2023, Vitaly Kuznetsov wrote: > >> diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c > >> index b8ab9ee5896c..1ee49c98e70a 100644 > >> --- a/arch/x86/kernel/kvm.c > >> +++ b/arch/x86/kernel/kvm.c > >> @@ -454,7 +454,9 @@ static void kvm_guest_cpu_offline(bool shutdown) > >> kvm_pv_disable_apf(); > >> if (!shutdown) > >> apf_task_wake_all(); > >> - kvmclock_disable(); > >> + if (kvm_para_has_feature(KVM_FEATURE_CLOCKSOURCE2) || > >> + kvm_para_has_feature(KVM_FEATURE_CLOCKSOURCE)) > >> + kvmclock_disable(); > >> } > > > > That would result in an unnecessray WRMSR in the case where kvmclock is disabled > > on the command line. It _should_ be benign given how the code is written, but > > it's not impossible to imagine a scenario where someone disabled kvmclock in the > > guest because of a hypervisor bug. And the WRMSR would become a bogus write to > > MSR 0x0 if someone made a "cleanup" to set msr_kvm_system_time if and only if > > kvmclock is actually used, e.g. if someone made Kirill's change sans the check in > > kvmclock_disable(). > > True but we don't have such module params to disable other PV features so > e.g. KVM_FEATURE_PV_EOI/KVM_FEATURE_MIGRATION_CONTROL are written to > unconditionally. Wouldn't it be better to handle parameters like > 'no-kvmclock' by clearing the feature bit in kvm_arch_para_features()'s > return value so all kvm_para_has_feature() calls for it just return > 'false'? We can even do an umbreall "no-kvm-features=<mask>" to cover > all possible debug cases. I don't know that it's worth the effort, or that it'd even be a net positive. Today, kvm_para_has_feature() goes through to CPUID every time, e.g. we'd have to add a small bit of infrastructure to snapshot and clear bits, or rework things to let kvm_para_has_feature peek at kvmclock. And things like KVM_FEATURE_PV_TLB_FLUSH would be quite weird, e.g. we either end up leaving the feature bit set while returning "false" for pv_tlb_flush_supported(), or we'd clear the feature bit for a rather large number of conditions that don't really have anything to do with KVM_FEATURE_PV_TLB_FLUSH being available. static bool pv_tlb_flush_supported(void) { return (kvm_para_has_feature(KVM_FEATURE_PV_TLB_FLUSH) && !kvm_para_has_hint(KVM_HINTS_REALTIME) && kvm_para_has_feature(KVM_FEATURE_STEAL_TIME) && !boot_cpu_has(X86_FEATURE_MWAIT) && (num_possible_cpus() != 1)); }
diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c index fb8f52149be9..f2fff625576d 100644 --- a/arch/x86/kernel/kvmclock.c +++ b/arch/x86/kernel/kvmclock.c @@ -24,8 +24,8 @@ static int kvmclock __initdata = 1; static int kvmclock_vsyscall __initdata = 1; -static int msr_kvm_system_time __ro_after_init = MSR_KVM_SYSTEM_TIME; -static int msr_kvm_wall_clock __ro_after_init = MSR_KVM_WALL_CLOCK; +static int msr_kvm_system_time __ro_after_init; +static int msr_kvm_wall_clock __ro_after_init; static u64 kvm_sched_clock_offset __ro_after_init; static int __init parse_no_kvmclock(char *arg) @@ -195,7 +195,8 @@ static void kvm_setup_secondary_clock(void) void kvmclock_disable(void) { - native_write_msr(msr_kvm_system_time, 0, 0); + if (msr_kvm_system_time) + native_write_msr(msr_kvm_system_time, 0, 0); } static void __init kvmclock_init_mem(void) @@ -294,7 +295,10 @@ void __init kvmclock_init(void) if (kvm_para_has_feature(KVM_FEATURE_CLOCKSOURCE2)) { msr_kvm_system_time = MSR_KVM_SYSTEM_TIME_NEW; msr_kvm_wall_clock = MSR_KVM_WALL_CLOCK_NEW; - } else if (!kvm_para_has_feature(KVM_FEATURE_CLOCKSOURCE)) { + } else if (kvm_para_has_feature(KVM_FEATURE_CLOCKSOURCE)) { + msr_kvm_system_time = MSR_KVM_SYSTEM_TIME; + msr_kvm_wall_clock = MSR_KVM_WALL_CLOCK; + } else { return; }