[RFC] x86: override prefer_mwait_c1_over_halt to avoid loading cpuidle-haltpoll driver

Message ID 1669952252-32710-1-git-send-email-lirongqing@baidu.com
State New
Series [RFC] x86: override prefer_mwait_c1_over_halt to avoid loading cpuidle-haltpoll driver

Commit Message

Li,Rongqing Dec. 2, 2022, 3:37 a.m. UTC
  From: Li RongQing <lirongqing@baidu.com>

x86 KVM guests with MWAIT can load the cpuidle-haltpoll driver, which
causes performance degradation, so override prefer_mwait_c1_over_halt
to a new value to avoid loading the cpuidle-haltpoll driver.

Signed-off-by: Li RongQing <lirongqing@baidu.com>
---
 arch/x86/include/asm/processor.h | 2 +-
 arch/x86/kernel/process.c        | 1 +
 2 files changed, 2 insertions(+), 1 deletion(-)
  

Comments

Thomas Gleixner Dec. 2, 2022, 6:48 p.m. UTC | #1
Li!

On Fri, Dec 02 2022 at 11:37, lirongqing@baidu.com wrote:
> From: Li RongQing <lirongqing@baidu.com>
>
> x86 KVM guests with MWAIT can load the cpuidle-haltpoll driver, which
> causes performance degradation, so override prefer_mwait_c1_over_halt
> to a new value to avoid loading the cpuidle-haltpoll driver.

Neither the subject line nor the above makes any sense to me.

prefer_mwait_c1_over_halt() is a function which is invoked and when it
returns true then the execution ends up in the code path you are
patching.

> @@ -889,6 +889,7 @@ void select_idle_routine(const struct cpuinfo_x86 *c)
>  	} else if (prefer_mwait_c1_over_halt(c)) {
>  		pr_info("using mwait in idle threads\n");
>  		x86_idle = mwait_idle;
> +		boot_option_idle_override = IDLE_PREF_MWAIT;

What you do is setting boot_option_idle_override to a new value, but
that has nothing to do with prefer_mwait_c1_over_halt() at all.

So how are you overriding that function to a new value?

But that's just a word smithing problem.

The real and way worse problem is that you pick a variable, which has
the purpose to capture the idle override on the kernel command line, and
modify it as you see fit, just to prevent that driver from loading.

select_idle_routine() reads it to check whether there was a command line
override or not. But it is not supposed to write it. Why?

Have you checked what else evaluates that variable? Obviously not,
because a simple grep would have told you:

drivers/cpuidle/cpuidle-haltpoll.c:     if (boot_option_idle_override != IDLE_NO_OVERRIDE)
drivers/idle/intel_idle.c:      if (boot_option_idle_override != IDLE_NO_OVERRIDE)

Congratulations!

Your patch breaks the default setup of every recent Intel system on the
planet because it not only prevents the cpuidle-haltpoll, but also the
intel_idle driver from loading.

Seriously. It's not too much asked to do at least a simple grep and look
at all _nine_ places which evaluate boot_option_idle_override.

Thanks,

      tglx
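
The failure mode described in the review above can be modelled in user space. The following is a minimal sketch, not kernel code: the `model_*` functions are hypothetical stand-ins, and the only parts taken from the thread are the enum values and the `boot_option_idle_override != IDLE_NO_OVERRIDE` check that the grep output shows in both drivers.

```c
#include <errno.h>
#include <stdbool.h>

/* Mirrors the enum from the patch, including the proposed new value. */
enum idle_boot_override { IDLE_NO_OVERRIDE = 0, IDLE_HALT, IDLE_NOMWAIT,
			  IDLE_POLL, IDLE_PREF_MWAIT };

/* User-space copy of the variable; in the kernel it records the command-line override. */
unsigned long boot_option_idle_override = IDLE_NO_OVERRIDE;

/* Hypothetical stand-in for the check in cpuidle-haltpoll.c. */
int model_haltpoll_init(void)
{
	if (boot_option_idle_override != IDLE_NO_OVERRIDE)
		return -ENODEV;
	return 0;
}

/* Hypothetical stand-in for the identical check in intel_idle.c. */
int model_intel_idle_init(void)
{
	if (boot_option_idle_override != IDLE_NO_OVERRIDE)
		return -ENODEV;
	return 0;
}

/* Models only the patched branch of select_idle_routine(). */
void model_select_idle_routine(bool prefer_mwait)
{
	if (prefer_mwait)
		boot_option_idle_override = IDLE_PREF_MWAIT;
}
```

Calling `model_select_idle_routine(true)` and then both init functions makes each return `-ENODEV`, which is the collateral effect on intel_idle that the review points out.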
  
Li,Rongqing Dec. 4, 2022, 11:31 a.m. UTC | #2
> -----Original Message-----
> From: Thomas Gleixner <tglx@linutronix.de>
> Sent: Saturday, December 3, 2022 2:48 AM
> To: Li,Rongqing <lirongqing@baidu.com>; mingo@redhat.com; bp@alien8.de;
> dave.hansen@linux.intel.com; x86@kernel.org; peterz@infradead.org;
> tony.luck@intel.com; wyes.karny@amd.com; linux-kernel@vger.kernel.org;
> rafael.j.wysocki@intel.com
> Subject: Re: [PATCH][RFC] x86: override prefer_mwait_c1_over_halt to avoid
> loading cpuidle-haltpoll driver
> 
> Li!
> 
> On Fri, Dec 02 2022 at 11:37, lirongqing@baidu.com wrote:
> > From: Li RongQing <lirongqing@baidu.com>
> >
> > x86 KVM guests with MWAIT can load the cpuidle-haltpoll driver, which
> > causes performance degradation, so override prefer_mwait_c1_over_halt
> > to a new value to avoid loading the cpuidle-haltpoll driver.
> 
> Neither the subject line nor the above makes any sense to me.
> 
> prefer_mwait_c1_over_halt() is a function which is invoked and when it returns
> true then the execution ends up in the code path you are patching.
> 
> > @@ -889,6 +889,7 @@ void select_idle_routine(const struct cpuinfo_x86 *c)
> >  	} else if (prefer_mwait_c1_over_halt(c)) {
> >  		pr_info("using mwait in idle threads\n");
> >  		x86_idle = mwait_idle;
> > +		boot_option_idle_override = IDLE_PREF_MWAIT;
> 
> What you do is setting boot_option_idle_override to a new value, but that has
> nothing to do with prefer_mwait_c1_over_halt() at all.
> 
> So how are you overriding that function to a new value?
> 
> But that's just a word smithing problem.
> 
> The real and way worse problem is that you pick a variable, which has the
> purpose to capture the idle override on the kernel command line, and modify it
> as you see fit, just to prevent that driver from loading.
> 
> select_idle_routine() reads it to check whether there was a command line
> override or not. But it is not supposed to write it. Why?
> 
> Have you checked what else evaluates that variable? Obviously not, because a
> simple grep would have told you:
> 
> drivers/cpuidle/cpuidle-haltpoll.c:     if (boot_option_idle_override != IDLE_NO_OVERRIDE)
> drivers/idle/intel_idle.c:      if (boot_option_idle_override != IDLE_NO_OVERRIDE)
> 
> Congratulations!
> 
> Your patch breaks the default setup of every recent Intel system on the planet
> because it not only prevents the cpuidle-haltpoll, but also the intel_idle driver
> from loading.
> 
> Seriously. It's not too much asked to do at least a simple grep and look at all
> _nine_ places which evaluate boot_option_idle_override.
> 

Sorry for the carelessness.

Thanks for the review. I will send a new version that exports a function telling the haltpoll driver whether mwait is used, like below:


diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 67c9d73..159ef33 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -862,4 +862,6 @@ bool arch_is_platform_page(u64 paddr);
 #define arch_is_platform_page arch_is_platform_page
 #endif

+bool is_mwait_idle(void);
+
 #endif /* _ASM_X86_PROCESSOR_H */
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index c21b734..330972c 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -896,6 +896,12 @@ void select_idle_routine(const struct cpuinfo_x86 *c)
                x86_idle = default_idle;
 }

+bool is_mwait_idle(void)
+{
+       return x86_idle == mwait_idle;
+}
+EXPORT_SYMBOL_GPL(is_mwait_idle);
+
 void amd_e400_c1e_apic_setup(void)
 {
        if (boot_cpu_has_bug(X86_BUG_AMD_APIC_C1E)) {
diff --git a/drivers/cpuidle/cpuidle-haltpoll.c b/drivers/cpuidle/cpuidle-haltpoll.c
index 3a39a7f..8cf1ddf 100644
--- a/drivers/cpuidle/cpuidle-haltpoll.c
+++ b/drivers/cpuidle/cpuidle-haltpoll.c
@@ -17,6 +17,7 @@
 #include <linux/sched/idle.h>
 #include <linux/kvm_para.h>
 #include <linux/cpuidle_haltpoll.h>
+#include <linux/processor.h>

 static bool force __read_mostly;
 module_param(force, bool, 0444);
@@ -111,6 +112,9 @@ static int __init haltpoll_init(void)
        if (!kvm_para_available() || !haltpoll_want())
                return -ENODEV;

+       if (is_mwait_idle())
+               return -ENODEV;
+
        cpuidle_poll_state_init(drv);

        ret = cpuidle_register_driver(drv);



Thanks

-Li
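
The approach in the draft above can be sketched in user space as follows. This is an illustrative model under stated assumptions, not the kernel implementation: every `model_*` name is hypothetical, and the only logic taken from the draft is the comparison of the idle function pointer against the mwait routine and the early `-ENODEV` return in the haltpoll init path.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins for default_idle() and mwait_idle(). */
static void model_default_idle(void) { }
static void model_mwait_idle(void) { }

/* Stand-in for the x86_idle function pointer set at boot. */
static void (*model_x86_idle)(void) = model_default_idle;

/* Models the idle-routine selection: picks mwait when preferred. */
void model_select_idle(bool prefer_mwait)
{
	model_x86_idle = prefer_mwait ? model_mwait_idle : model_default_idle;
}

/* Same test as the proposed is_mwait_idle(): compare the pointer. */
bool model_is_mwait_idle(void)
{
	return model_x86_idle == model_mwait_idle;
}

/* Models the added check in haltpoll_init(): refuse to load under mwait. */
int model_haltpoll_init_v2(void)
{
	if (model_is_mwait_idle())
		return -ENODEV;
	return 0;
}
```

Unlike writing to `boot_option_idle_override`, this query has no side effect on other code that reads the override variable, so only the haltpoll path changes behaviour.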
  

Patch

diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 67c9d73..6bc94fd 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -658,7 +658,7 @@  extern void amd_e400_c1e_apic_setup(void);
 extern unsigned long		boot_option_idle_override;
 
 enum idle_boot_override {IDLE_NO_OVERRIDE=0, IDLE_HALT, IDLE_NOMWAIT,
-			 IDLE_POLL};
+			 IDLE_POLL, IDLE_PREF_MWAIT};
 
 extern void enable_sep_cpu(void);
 extern int sysenter_setup(void);
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index c21b734..a16985c 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -889,6 +889,7 @@  void select_idle_routine(const struct cpuinfo_x86 *c)
 	} else if (prefer_mwait_c1_over_halt(c)) {
 		pr_info("using mwait in idle threads\n");
 		x86_idle = mwait_idle;
+		boot_option_idle_override = IDLE_PREF_MWAIT;
 	} else if (cpu_feature_enabled(X86_FEATURE_TDX_GUEST)) {
 		pr_info("using TDX aware idle routine\n");
 		x86_idle = tdx_safe_halt;