[V2] x86/split_lock: Add sysctl to control the misery mode

Message ID 20221014180506.211592-1-gpiccoli@igalia.com
State New
Headers
Series [V2] x86/split_lock: Add sysctl to control the misery mode |

Commit Message

Guilherme G. Piccoli Oct. 14, 2022, 6:05 p.m. UTC
  Commit b041b525dab9 ("x86/split_lock: Make life miserable for split lockers")
changed the way the split lock detector works when in "warn" mode;
basically, not only it shows the warn message, but also intentionally
introduces a slowdown (through sleeping plus serialization mechanism)
on such task. Based on discussions in [0], seems the warning alone
wasn't enough motivation for userspace developers to fix their
applications.

Happens that originally the proposal in [0] was to add a new mode
which would warns + slowdown the "split locking" task, keeping the
old warn mode untouched. In the end, that idea was discarded and
the regular/default "warn" mode now slowdowns the applications. This
is quite aggressive with regards proprietary/legacy programs that
basically are unable to properly run in kernel with this change.
While is understandable that a malicious application could try a DoS
by split locking, it seems unacceptable to regress old/proprietary
userspace programs through a default configuration that previously
worked. An example of such breakage was reported in [1].

So let's add a sysctl to allow controlling the "misery mode" behavior,
as per Thomas suggestion on [2]. This way, users running legacy and/or
proprietary software are allowed to still execute them with a decent
performance while still observe the warning messages on kernel log.

[0] https://lore.kernel.org/lkml/20220217012721.9694-1-tony.luck@intel.com/

[1] https://github.com/doitsujin/dxvk/issues/2938

[2] https://lore.kernel.org/lkml/87pmf4bter.ffs@tglx/

Fixes: b041b525dab9 ("x86/split_lock: Make life miserable for split lockers")
Cc: Andre Almeida <andrealmeid@igalia.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Joshua Ashton <joshua@froggi.es>
Cc: Melissa Wen <mwen@igalia.com>
Cc: Paul Gofman <pgofman@codeweavers.com>
Cc: Pavel Machek <pavel@denx.de>
Cc: Pierre-Loup Griffais <pgriffais@valvesoftware.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Zebediah Figura <zfigura@codeweavers.com>
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Guilherme G. Piccoli <gpiccoli@igalia.com>
---


V2:
- Switched to sysctl approach following Thomas' suggestion (thanks!).

Andre tested the patch and will comment in this thread - seems everything is
working as expected and we can enable/disable that, affecting the misery mode
as one expects.

I've tried to keep the semaphore's up()/down() calls in-sync/paired, hence
my approach of two delayed tasks, with and without misery.

Reviews / comments are greatly appreciated.
Thanks,


Guilherme


 Documentation/admin-guide/sysctl/kernel.rst | 18 ++++++
 arch/x86/kernel/cpu/intel.c                 | 61 +++++++++++++++++----
 2 files changed, 69 insertions(+), 10 deletions(-)
  

Comments

André Almeida Oct. 14, 2022, 6:17 p.m. UTC | #1
Hi Guilherme,

On 10/14/22 15:05, Guilherme G. Piccoli wrote:
> Commit b041b525dab9 ("x86/split_lock: Make life miserable for split lockers")
> changed the way the split lock detector works when in "warn" mode;
> basically, not only it shows the warn message, but also intentionally
> introduces a slowdown (through sleeping plus serialization mechanism)
> on such task. Based on discussions in [0], seems the warning alone
> wasn't enough motivation for userspace developers to fix their
> applications.
> 
> Happens that originally the proposal in [0] was to add a new mode
> which would warns + slowdown the "split locking" task, keeping the
> old warn mode untouched. In the end, that idea was discarded and
> the regular/default "warn" mode now slowdowns the applications. This
> is quite aggressive with regards proprietary/legacy programs that
> basically are unable to properly run in kernel with this change.
> While is understandable that a malicious application could try a DoS
> by split locking, it seems unacceptable to regress old/proprietary
> userspace programs through a default configuration that previously
> worked. An example of such breakage was reported in [1].
> 
> So let's add a sysctl to allow controlling the "misery mode" behavior,
> as per Thomas suggestion on [2]. This way, users running legacy and/or
> proprietary software are allowed to still execute them with a decent
> performance while still observe the warning messages on kernel log.
> 
> [0] https://lore.kernel.org/lkml/20220217012721.9694-1-tony.luck@intel.com/
> 
> [1] https://github.com/doitsujin/dxvk/issues/2938
> 
> [2] https://lore.kernel.org/lkml/87pmf4bter.ffs@tglx/
> 
> Fixes: b041b525dab9 ("x86/split_lock: Make life miserable for split lockers")
> Cc: Andre Almeida <andrealmeid@igalia.com>
> Cc: Fenghua Yu <fenghua.yu@intel.com>
> Cc: Joshua Ashton <joshua@froggi.es>
> Cc: Melissa Wen <mwen@igalia.com>
> Cc: Paul Gofman <pgofman@codeweavers.com>
> Cc: Pavel Machek <pavel@denx.de>
> Cc: Pierre-Loup Griffais <pgriffais@valvesoftware.com>
> Cc: Tony Luck <tony.luck@intel.com>
> Cc: Zebediah Figura <zfigura@codeweavers.com>
> Suggested-by: Thomas Gleixner <tglx@linutronix.de>
> Signed-off-by: Guilherme G. Piccoli <gpiccoli@igalia.com>

Tested-by: André Almeida <andrealmeid@igalia.com>

Tested on an Intel i7-1165G7 using a small benchmarking script, the 
behavior is effectively reverted when using the sysctl option.

Also, you might want to document this option somewhere? Maybe at 
Documentation/admin-guide/sysctl/kernel.rst
  
Guilherme G. Piccoli Oct. 14, 2022, 6:20 p.m. UTC | #2
On 14/10/2022 15:17, André Almeida wrote:
> [...]
> 
> Tested-by: André Almeida <andrealmeid@igalia.com>
> 
> Tested on an Intel i7-1165G7 using a small benchmarking script, the 
> behavior is effectively reverted when using the sysctl option.
> 
> Also, you might want to document this option somewhere? Maybe at 
> Documentation/admin-guide/sysctl/kernel.rst

Thanks André!

About documenting, it seems it's already done in the patch, right?
Cheers,


Guilherme
  
André Almeida Oct. 14, 2022, 6:22 p.m. UTC | #3
On 10/14/22 15:20, Guilherme G. Piccoli wrote:
> On 14/10/2022 15:17, André Almeida wrote:
>> [...]
>>
>> Tested-by: André Almeida <andrealmeid@igalia.com>
>>
>> Tested on an Intel i7-1165G7 using a small benchmarking script, the
>> behavior is effectively reverted when using the sysctl option.
>>
>> Also, you might want to document this option somewhere? Maybe at
>> Documentation/admin-guide/sysctl/kernel.rst
> 
> Thanks André!
> 
> About documenting, it seems it's already done in the patch, right?
> Cheers,


oops, my bad :)
  
Luck, Tony Oct. 14, 2022, 6:26 p.m. UTC | #4
Looks reasonable.

Are these games multi-threaded with split locks happening on multiple CPUs in parallel?
If they are, then skipping both the 10ms delay and the serialization is needed.

But if split locks are only from one CPU at a time, then possibly it would have
been enough to just have this mitigation skip the:

+		if (msleep_interruptible(10) > 0)
+			return;

Maybe best not to second guess. You have left the default as "mitigation on",
so I'm happy.

-Tony
  
Guilherme G. Piccoli Oct. 15, 2022, 12:19 a.m. UTC | #5
On 14/10/2022 15:26, Luck, Tony wrote:
> Looks reasonable.
> 
> Are these games multi-threaded with split locks happening on multiple CPUs in parallel?
> If they are, then skipping both the 10ms delay and the serialization is needed.
> 
> But if split locks are only from one CPU at a time, then possibly it would have
> been enough to just have this mitigation skip the:
> 
> +		if (msleep_interruptible(10) > 0)
> +			return;
> 
> Maybe best not to second guess. You have left the default as "mitigation on",
> so I'm happy.
> 
> -Tony

Hi Tony, thanks for your review!

Some games are indeed multi-threaded, so as you said, I think it's
better if we skip the whole thing when the sysctl is off - a kind of
fallback to the old "warn" mode before the misery was added heh

Cheers,


Guilherme
  
Bagas Sanjaya Oct. 16, 2022, 3 a.m. UTC | #6
On Fri, Oct 14, 2022 at 03:05:06PM -0300, Guilherme G. Piccoli wrote:
> diff --git a/Documentation/admin-guide/sysctl/kernel.rst b/Documentation/admin-guide/sysctl/kernel.rst
> index ee6572b1edad..508952e42914 100644
> --- a/Documentation/admin-guide/sysctl/kernel.rst
> +++ b/Documentation/admin-guide/sysctl/kernel.rst
> @@ -1298,6 +1298,24 @@ watchdog work to be queued by the watchdog timer function, otherwise the NMI
>  watchdog — if enabled — can detect a hard lockup condition.
>  
>  
> +split_lock_mitigate (x86 only)
> +=============
> +

The heading underline above is too short (doesn't cover the whole text
length), so I have applied the fixup:

---- >8 ----

diff --git a/Documentation/admin-guide/sysctl/kernel.rst b/Documentation/admin-guide/sysctl/kernel.rst
index c733d424d4e830..4824cfed71ab31 100644
--- a/Documentation/admin-guide/sysctl/kernel.rst
+++ b/Documentation/admin-guide/sysctl/kernel.rst
@@ -1315,7 +1315,7 @@ watchdog — if enabled — can detect a hard lockup condition.
 
 
 split_lock_mitigate (x86 only)
-=============
+==============================
 
 For x86 CPUs supporting the split lock detection mechanism, this parameter
 allows the users to turn off what is called "the misery mode", which

> +For x86 CPUs supporting the split lock detection mechanism, this parameter
> +allows the users to turn off what is called "the misery mode", which
> +introduces intentional delay in userspace applications that split locks.
> +The goal of the misery mode is to prevent using such unaligned access to
> +DoS the system dropping the performance overall, but some of these split
> +locking programs are legacy and/or proprietary software that cannot be fixed,
> +so using this sysctl is a way to allow them to run with a decent performance.
> +
> += ===================================================================
> +0 Disables the misery mode - just warns the split lock on kernel log.
> +1 Enables the misery mode (this is the default) - penalizes the split
> +  lockers with intentional performance degradation.
> += ===================================================================
> +
> +
>  stack_erasing
>  =============
>  

The wording can be improved:

---- >8 ----

diff --git a/Documentation/admin-guide/sysctl/kernel.rst b/Documentation/admin-guide/sysctl/kernel.rst
index 4824cfed71ab31..961c19f4beae51 100644
--- a/Documentation/admin-guide/sysctl/kernel.rst
+++ b/Documentation/admin-guide/sysctl/kernel.rst
@@ -1320,15 +1320,15 @@ split_lock_mitigate (x86 only)
 For x86 CPUs supporting the split lock detection mechanism, this parameter
 allows the users to turn off what is called "the misery mode", which
 introduces intentional delay in userspace applications that split locks.
-The goal of the misery mode is to prevent using such unaligned access to
-DoS the system dropping the performance overall, but some of these split
-locking programs are legacy and/or proprietary software that cannot be fixed,
-so using this sysctl is a way to allow them to run with a decent performance.
+The goal of this mode is to prevent using such unaligned access to
+drop the overall performance (and DoS the system). However, some of programs
+which uses split locking are legacy and/or proprietary software that cannot
+be fixed, so disabling this sysctl can allow them to run with a decent
+performance.
 
 = ===================================================================
 0 Disables the misery mode - just warns the split lock on kernel log.
-1 Enables the misery mode (this is the default) - penalizes the split
-  lockers with intentional performance degradation.
+1 Enables the misery mode (default)
 = ===================================================================
 
 
Thanks.
  
Bagas Sanjaya Oct. 16, 2022, 12:26 p.m. UTC | #7
On 10/16/22 10:00, Bagas Sanjaya wrote:
>  = ===================================================================
>  0 Disables the misery mode - just warns the split lock on kernel log.
> -1 Enables the misery mode (this is the default) - penalizes the split
> -  lockers with intentional performance degradation.
> +1 Enables the misery mode (default)
>  = ===================================================================
>  

Oops, on the table above, s/Disables/Disable/ and s/Enables/Enable/.
  
Guilherme G. Piccoli Oct. 17, 2022, 1:57 p.m. UTC | #8
Thank you Bagas! I see you fixed the documentation in more than one
place, appreciate that.

What's the next step then, re-submit with your fixes, or wait more
feedback perhaps?

If a maintainer plans to pick this one, maybe they can just apply your
fix-up on top of it.
Cheers,


Guilherme
  
Guilherme G. Piccoli Oct. 21, 2022, 4:56 p.m. UTC | #9
On 14/10/2022 15:05, Guilherme G. Piccoli wrote:
> Commit b041b525dab9 ("x86/split_lock: Make life miserable for split lockers")
> changed the way the split lock detector works when in "warn" mode;
> basically, not only it shows the warn message, but also intentionally
> introduces a slowdown (through sleeping plus serialization mechanism)
> on such task. Based on discussions in [0], seems the warning alone
> wasn't enough motivation for userspace developers to fix their
> applications.
> 
> Happens that originally the proposal in [0] was to add a new mode
> which would warns + slowdown the "split locking" task, keeping the
> old warn mode untouched. In the end, that idea was discarded and
> the regular/default "warn" mode now slowdowns the applications. This
> is quite aggressive with regards proprietary/legacy programs that
> basically are unable to properly run in kernel with this change.
> While is understandable that a malicious application could try a DoS
> by split locking, it seems unacceptable to regress old/proprietary
> userspace programs through a default configuration that previously
> worked. An example of such breakage was reported in [1].
> 
> So let's add a sysctl to allow controlling the "misery mode" behavior,
> as per Thomas suggestion on [2]. This way, users running legacy and/or
> proprietary software are allowed to still execute them with a decent
> performance while still observe the warning messages on kernel log.
> 
> [0] https://lore.kernel.org/lkml/20220217012721.9694-1-tony.luck@intel.com/
> 
> [1] https://github.com/doitsujin/dxvk/issues/2938
> 
> [2] https://lore.kernel.org/lkml/87pmf4bter.ffs@tglx/
> 
> Fixes: b041b525dab9 ("x86/split_lock: Make life miserable for split lockers")
> Cc: Andre Almeida <andrealmeid@igalia.com>
> Cc: Fenghua Yu <fenghua.yu@intel.com>
> Cc: Joshua Ashton <joshua@froggi.es>
> Cc: Melissa Wen <mwen@igalia.com>
> Cc: Paul Gofman <pgofman@codeweavers.com>
> Cc: Pavel Machek <pavel@denx.de>
> Cc: Pierre-Loup Griffais <pgriffais@valvesoftware.com>
> Cc: Tony Luck <tony.luck@intel.com>
> Cc: Zebediah Figura <zfigura@codeweavers.com>
> Suggested-by: Thomas Gleixner <tglx@linutronix.de>
> Signed-off-by: Guilherme G. Piccoli <gpiccoli@igalia.com>
> ---

Hi Dave / Thomas, do you think this version is good enough?

If so, would be possible to pick it still in the v6.1-rc cycle, since it
is a fix?

What about the documentation improvements from Bagas, should I re-send
(V3) with that, or would you pick them when merging?

Thanks in advance,


Guilherme
  
Dave Hansen Oct. 21, 2022, 5:27 p.m. UTC | #10
On 10/14/22 11:05, Guilherme G. Piccoli wrote:
> Commit b041b525dab9 ("x86/split_lock: Make life miserable for split lockers")
> changed the way the split lock detector works when in "warn" mode;
> basically, not only it shows the warn message, but also intentionally
> introduces a slowdown (through sleeping plus serialization mechanism)
> on such task. Based on discussions in [0], seems the warning alone
> wasn't enough motivation for userspace developers to fix their
> applications.
> 
> Happens that originally the proposal in [0] was to add a new mode
> which would warns + slowdown the "split locking" task, keeping the
> old warn mode untouched. In the end, that idea was discarded and
> the regular/default "warn" mode now slowdowns the applications. This
> is quite aggressive with regards proprietary/legacy programs that
> basically are unable to properly run in kernel with this change.
> While is understandable that a malicious application could try a DoS
> by split locking, it seems unacceptable to regress old/proprietary
> userspace programs through a default configuration that previously
> worked. An example of such breakage was reported in [1].
> 
> So let's add a sysctl to allow controlling the "misery mode" behavior,
> as per Thomas suggestion on [2]. This way, users running legacy and/or
> proprietary software are allowed to still execute them with a decent
> performance while still observe the warning messages on kernel log.
> 
> [0] https://lore.kernel.org/lkml/20220217012721.9694-1-tony.luck@intel.com/
> 
> [1] https://github.com/doitsujin/dxvk/issues/2938
> 
> [2] https://lore.kernel.org/lkml/87pmf4bter.ffs@tglx/
> 
> Fixes: b041b525dab9 ("x86/split_lock: Make life miserable for split lockers")
> Cc: Andre Almeida <andrealmeid@igalia.com>
> Cc: Fenghua Yu <fenghua.yu@intel.com>
> Cc: Joshua Ashton <joshua@froggi.es>
> Cc: Melissa Wen <mwen@igalia.com>
> Cc: Paul Gofman <pgofman@codeweavers.com>
> Cc: Pavel Machek <pavel@denx.de>
> Cc: Pierre-Loup Griffais <pgriffais@valvesoftware.com>
> Cc: Tony Luck <tony.luck@intel.com>
> Cc: Zebediah Figura <zfigura@codeweavers.com>
> Suggested-by: Thomas Gleixner <tglx@linutronix.de>
> Signed-off-by: Guilherme G. Piccoli <gpiccoli@igalia.com>
> ---
> 
> 
> V2:
> - Switched to sysctl approach following Thomas' suggestion (thanks!).
> 
> Andre tested the patch and will comment in this thread - seems everything is
> working as expected and we can enable/disable that, affecting the misery mode
> as one expects.
> 
> I've tried to keep the semaphore's up()/down() calls in-sync/paired, hence
> my approach of two delayed tasks, with and without misery.
> 
> Reviews / comments are greatly appreciated.
> Thanks,
> 
> 
> Guilherme
> 
> 
>  Documentation/admin-guide/sysctl/kernel.rst | 18 ++++++
>  arch/x86/kernel/cpu/intel.c                 | 61 +++++++++++++++++----
>  2 files changed, 69 insertions(+), 10 deletions(-)
> 
> diff --git a/Documentation/admin-guide/sysctl/kernel.rst b/Documentation/admin-guide/sysctl/kernel.rst
> index ee6572b1edad..508952e42914 100644
> --- a/Documentation/admin-guide/sysctl/kernel.rst
> +++ b/Documentation/admin-guide/sysctl/kernel.rst
> @@ -1298,6 +1298,24 @@ watchdog work to be queued by the watchdog timer function, otherwise the NMI
>  watchdog — if enabled — can detect a hard lockup condition.
>  
>  
> +split_lock_mitigate (x86 only)
> +=============
> +
> +For x86 CPUs supporting the split lock detection mechanism, this parameter
> +allows the users to turn off what is called "the misery mode", which
> +introduces intentional delay in userspace applications that split locks.
> +The goal of the misery mode is to prevent using such unaligned access to
> +DoS the system dropping the performance overall, but some of these split
> +locking programs are legacy and/or proprietary software that cannot be fixed,
> +so using this sysctl is a way to allow them to run with a decent performance.

I think this is missing a lot of context.  End users looking here won't
even know what a split lock *is*.  Please either refer over to the real
documentation on this issue or write a brief description about what's
going on.

How about this?

	On x86, each "split lock" imposes a system-wide performance
	penalty.  On larger systems, large numbers of split locks from
	unprivileged users can result in denials of service to well-
	behaved and potentially more important users.

	The kernel mitigates these bad users by detecting split locks
	and imposing penalties: forcing them to wait and only allowing
	one core to execute split locks at a time.

	These mitigations can make those bad applications unbearably
	slow.  Setting split_lock_mitigate=0 may restore some
	application performance, but will also increase system exposure
	to denial of service attacks from split lock users.

> += ===================================================================
> +0 Disables the misery mode - just warns the split lock on kernel log.

... and exposes the system to Denial-of-Service attacks.  That's an
awfully big side-effect to not mention.

> +1 Enables the misery mode (this is the default) - penalizes the split
> +  lockers with intentional performance degradation.
> += ===================================================================

As much as I love the misery terminology, let's try to use one term.
Let's either call it "misery" *or* "mitigations", not both.

> diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
> index 2d7ea5480ec3..2aacf9d6c723 100644
> --- a/arch/x86/kernel/cpu/intel.c
> +++ b/arch/x86/kernel/cpu/intel.c
> @@ -1034,8 +1034,32 @@ static const struct {
>  
>  static struct ratelimit_state bld_ratelimit;
>  
> +static unsigned int sysctl_sld_mitigate = 1;
>  static DEFINE_SEMAPHORE(buslock_sem);
>  
> +#ifdef CONFIG_PROC_SYSCTL
> +static struct ctl_table sld_sysctls[] = {
> +	{
> +		.procname       = "split_lock_mitigate",
> +		.data           = &sysctl_sld_mitigate,
> +		.maxlen         = sizeof(unsigned int),
> +		.mode           = 0644,
> +		.proc_handler	= proc_douintvec_minmax,
> +		.extra1         = SYSCTL_ZERO,
> +		.extra2         = SYSCTL_ONE,
> +	},
> +	{}
> +};
> +
> +static int __init sld_mitigate_sysctl_init(void)
> +{
> +	register_sysctl_init("kernel", sld_sysctls);
> +	return 0;
> +}
> +
> +late_initcall(sld_mitigate_sysctl_init);
> +#endif
> +
>  static inline bool match_option(const char *arg, int arglen, const char *opt)
>  {
>  	int len = strlen(opt), ratelimit;
> @@ -1146,11 +1170,18 @@ static void split_lock_init(void)
>  		split_lock_verify_msr(sld_state != sld_off);
>  }
>  
> -static void __split_lock_reenable(struct work_struct *work)
> +static void __split_lock_reenable_sem(struct work_struct *work)
>  {

"sem" is a pretty crummy name.  Wouldn't

	__split_lock_reenable_unlock()

be much more clear?

>  	sld_update_msr(true);
>  	up(&buslock_sem);
>  }
> +static DECLARE_DELAYED_WORK(split_lock_reenable_sem, __split_lock_reenable_sem);
> +
> +static void __split_lock_reenable(struct work_struct *work)
> +{
> +	sld_update_msr(true);
> +}
> +static DECLARE_DELAYED_WORK(split_lock_reenable, __split_lock_reenable);

Better yet, do you *really* need two functions and two
DECLARE_DELAYED_WORK()'s?

You could have a single delayed_work, and then just do:

static void split_lock_warn(unsigned long ip)
{
	bool need_release_sem = false;
	...

	if (down_interruptible(&buslock_sem) == -EINTR)
		return;
	need_release_sem = true;


Then, farther down, you do:

	split_lock_reenable->data = need_release_sem;
	schedule_delayed_work_on(cpu, &split_lock_reenable);

Then, in the work func:
	
	bool need_release_sem = work->data;

	if (need_release_sem)
		up(...);

That's nice and compact.  It's also logically easy to follow because you
can see how the need_release_sem gets set only after the
down_interruptible().  It's also nice to have both sites share the
'need_release_sem' naming for grepping.

>  /*
>   * If a CPU goes offline with pending delayed work to re-enable split lock
> @@ -1169,10 +1200,9 @@ static int splitlock_cpu_offline(unsigned int cpu)
>  	return 0;
>  }
>  
> -static DECLARE_DELAYED_WORK(split_lock_reenable, __split_lock_reenable);
> -
>  static void split_lock_warn(unsigned long ip)
>  {
> +	struct delayed_work *wk;

I think we can spare two bytes to make this "work".

>  	int cpu;
>  
>  	if (!current->reported_split_lock)
> @@ -1180,14 +1210,25 @@ static void split_lock_warn(unsigned long ip)
>  				    current->comm, current->pid, ip);
>  	current->reported_split_lock = 1;
>  
> -	/* misery factor #1, sleep 10ms before trying to execute split lock */
> -	if (msleep_interruptible(10) > 0)
> -		return;
> -	/* Misery factor #2, only allow one buslocked disabled core at a time */
> -	if (down_interruptible(&buslock_sem) == -EINTR)
> -		return;
> +	if (sysctl_sld_mitigate) {
> +		/*
> +		 * misery factor #1:
> +		 * sleep 10ms before trying to execute split lock.
> +		 */
> +		if (msleep_interruptible(10) > 0)
> +			return;
> +		/*
> +		 * Misery factor #2:
> +		 * only allow one buslocked disabled core at a time.
> +		 */
> +		wk = &split_lock_reenable_sem;
> +		if (down_interruptible(&buslock_sem) == -EINTR)
> +			return;

It's a little confusing to set:

	wk = &split_lock_reenable_sem;

and then not use it.

I'd probably set it below the lock check and return.

> +	} else
> +		wk = &split_lock_reenable;

Brackets, please:

	} else {
		wk = &split_lock_reenable;
	}

(if you keep this hunk).

>  	cpu = get_cpu();
> -	schedule_delayed_work_on(cpu, &split_lock_reenable, 2);
> +	schedule_delayed_work_on(cpu, wk, 2);
>  
>  	/* Disable split lock detection on this CPU to make progress */
>  	sld_update_msr(false);
  
Guilherme G. Piccoli Oct. 21, 2022, 7:03 p.m. UTC | #11
Hi Dave, thanks for the thorough review!
Comments inline below:

On 21/10/2022 14:27, Dave Hansen wrote:
> [...]
>> +For x86 CPUs supporting the split lock detection mechanism, this parameter
>> +allows the users to turn off what is called "the misery mode", which
>> +introduces intentional delay in userspace applications that split locks.
>> +The goal of the misery mode is to prevent using such unaligned access to
>> +DoS the system dropping the performance overall, but some of these split
>> +locking programs are legacy and/or proprietary software that cannot be fixed,
>> +so using this sysctl is a way to allow them to run with a decent performance.
> 
> I think this is missing a lot of context.  End users looking here won't
> even know what a split lock *is*.  Please either refer over to the real
> documentation on this issue or write a brief description about what's
> going on.
> 
> How about this?
> 
> 	On x86, each "split lock" imposes a system-wide performance
> 	penalty.  On larger systems, large numbers of split locks from
> 	unprivileged users can result in denials of service to well-
> 	behaved and potentially more important users.
> 
> 	The kernel mitigates these bad users by detecting split locks
> 	and imposing penalties: forcing them to wait and only allowing
> 	one core to execute split locks at a time.
> 
> 	These mitigations can make those bad applications unbearably
> 	slow.  Setting split_lock_mitigate=0 may restore some
> 	application performance, but will also increase system exposure
> 	to denial of service attacks from split lock users.
> 
>> += ===================================================================
>> +0 Disables the misery mode - just warns the split lock on kernel log.
> 
> ... and exposes the system to Denial-of-Service attacks.  That's an
> awfully big side-effect to not mention.
> 
>> +1 Enables the misery mode (this is the default) - penalizes the split
>> +  lockers with intentional performance degradation.
>> += ===================================================================
> 
> As much as I love the misery terminology, let's try to use one term.
> Let's either call it "misery" *or* "mitigations", not both.
> 

OK, regarding the documentation, I'll follow your suggestion in the V3,
good stuff.


>> [...]
>> -static void __split_lock_reenable(struct work_struct *work)
>> +static void __split_lock_reenable_sem(struct work_struct *work)
>>  {
> 
> "sem" is a pretty crummy name.  Wouldn't
> 
> 	__split_lock_reenable_unlock()
> 
> be much more clear?
> 

Agreed...


>> [...]
> Better yet, do you *really* need two functions and two
> DECLARE_DELAYED_WORK()'s?
> 
> You could have a single delayed_work, and then just do:
> 
> static void split_lock_warn(unsigned long ip)
> {
> 	bool need_release_sem = false;
> 	...
> 
> 	if (down_interruptible(&buslock_sem) == -EINTR)
> 		return;
> 	need_release_sem = true;
> 
> 
> Then, farther down, you do:
> 
> 	split_lock_reenable->data = need_release_sem;
> 	schedule_delayed_work_on(cpu, &split_lock_reenable);
> 
> Then, in the work func:
> 	
> 	bool need_release_sem = work->data;
> 
> 	if (need_release_sem)
> 		up(...);
> 
> That's nice and compact.  It's also logically easy to follow because you
> can see how the need_release_sem gets set only after the
> down_interruptible().  It's also nice to have both sites share the
> 'need_release_sem' naming for grepping.
> 

...but, this is a very good suggestion, and will eliminate the need for
two delayed_works, right?


>> [...]

>> +	struct delayed_work *wk;
> 
> I think we can spare two bytes to make this "work".
> 
>> [...]
> 
> It's a little confusing to set:
> 
> 	wk = &split_lock_reenable_sem;
> 
> and then not use it.
> 
> I'd probably set it below the lock check and return.
> 
>> +	} else
>> +		wk = &split_lock_reenable;
> 
> Brackets, please:
> 
> 	} else {
> 		wk = &split_lock_reenable;
> 	}
> 
> (if you keep this hunk).
> 

But then we're back to discussing the approach of multiple delayed works.

I guess I prefer your idea of passing the state and have a single one,
will do this in V3 OK? If you or anybody else disagrees and prefer the
approach of 2 workers, let me know.

Cheers,


Guilherme
  
Dave Hansen Oct. 21, 2022, 7:07 p.m. UTC | #12
On 10/21/22 12:03, Guilherme G. Piccoli wrote:
...
>> 	if (need_release_sem)
>> 		up(...);
>>
>> That's nice and compact.  It's also logically easy to follow because you
>> can see how the need_release_sem gets set only after the
>> down_interruptible().  It's also nice to have both sites share the
>> 'need_release_sem' naming for grepping.
>>
> 
> ...but, this is a very good suggestion, and will eliminate the need for
> two delayed_works, right?

Yes, that too.
  

Patch

diff --git a/Documentation/admin-guide/sysctl/kernel.rst b/Documentation/admin-guide/sysctl/kernel.rst
index ee6572b1edad..508952e42914 100644
--- a/Documentation/admin-guide/sysctl/kernel.rst
+++ b/Documentation/admin-guide/sysctl/kernel.rst
@@ -1298,6 +1298,24 @@  watchdog work to be queued by the watchdog timer function, otherwise the NMI
 watchdog — if enabled — can detect a hard lockup condition.
 
 
+split_lock_mitigate (x86 only)
+=============
+
+For x86 CPUs supporting the split lock detection mechanism, this parameter
+allows the users to turn off what is called "the misery mode", which
+introduces intentional delay in userspace applications that split locks.
+The goal of the misery mode is to prevent using such unaligned access to
+DoS the system dropping the performance overall, but some of these split
+locking programs are legacy and/or proprietary software that cannot be fixed,
+so using this sysctl is a way to allow them to run with a decent performance.
+
+= ===================================================================
+0 Disables the misery mode - just warns the split lock on kernel log.
+1 Enables the misery mode (this is the default) - penalizes the split
+  lockers with intentional performance degradation.
+= ===================================================================
+
+
 stack_erasing
 =============
 
diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
index 2d7ea5480ec3..2aacf9d6c723 100644
--- a/arch/x86/kernel/cpu/intel.c
+++ b/arch/x86/kernel/cpu/intel.c
@@ -1034,8 +1034,32 @@  static const struct {
 
 static struct ratelimit_state bld_ratelimit;
 
+static unsigned int sysctl_sld_mitigate = 1;
 static DEFINE_SEMAPHORE(buslock_sem);
 
+#ifdef CONFIG_PROC_SYSCTL
+static struct ctl_table sld_sysctls[] = {
+	{
+		.procname       = "split_lock_mitigate",
+		.data           = &sysctl_sld_mitigate,
+		.maxlen         = sizeof(unsigned int),
+		.mode           = 0644,
+		.proc_handler	= proc_douintvec_minmax,
+		.extra1         = SYSCTL_ZERO,
+		.extra2         = SYSCTL_ONE,
+	},
+	{}
+};
+
+static int __init sld_mitigate_sysctl_init(void)
+{
+	register_sysctl_init("kernel", sld_sysctls);
+	return 0;
+}
+
+late_initcall(sld_mitigate_sysctl_init);
+#endif
+
 static inline bool match_option(const char *arg, int arglen, const char *opt)
 {
 	int len = strlen(opt), ratelimit;
@@ -1146,11 +1170,18 @@  static void split_lock_init(void)
 		split_lock_verify_msr(sld_state != sld_off);
 }
 
-static void __split_lock_reenable(struct work_struct *work)
+static void __split_lock_reenable_sem(struct work_struct *work)
 {
 	sld_update_msr(true);
 	up(&buslock_sem);
 }
+static DECLARE_DELAYED_WORK(split_lock_reenable_sem, __split_lock_reenable_sem);
+
+static void __split_lock_reenable(struct work_struct *work)
+{
+	sld_update_msr(true);
+}
+static DECLARE_DELAYED_WORK(split_lock_reenable, __split_lock_reenable);
 
 /*
  * If a CPU goes offline with pending delayed work to re-enable split lock
@@ -1169,10 +1200,9 @@  static int splitlock_cpu_offline(unsigned int cpu)
 	return 0;
 }
 
-static DECLARE_DELAYED_WORK(split_lock_reenable, __split_lock_reenable);
-
 static void split_lock_warn(unsigned long ip)
 {
+	struct delayed_work *wk;
 	int cpu;
 
 	if (!current->reported_split_lock)
@@ -1180,14 +1210,25 @@  static void split_lock_warn(unsigned long ip)
 				    current->comm, current->pid, ip);
 	current->reported_split_lock = 1;
 
-	/* misery factor #1, sleep 10ms before trying to execute split lock */
-	if (msleep_interruptible(10) > 0)
-		return;
-	/* Misery factor #2, only allow one buslocked disabled core at a time */
-	if (down_interruptible(&buslock_sem) == -EINTR)
-		return;
+	if (sysctl_sld_mitigate) {
+		/*
+		 * misery factor #1:
+		 * sleep 10ms before trying to execute split lock.
+		 */
+		if (msleep_interruptible(10) > 0)
+			return;
+		/*
+		 * Misery factor #2:
+		 * only allow one buslocked disabled core at a time.
+		 */
+		wk = &split_lock_reenable_sem;
+		if (down_interruptible(&buslock_sem) == -EINTR)
+			return;
+	} else
+		wk = &split_lock_reenable;
+
 	cpu = get_cpu();
-	schedule_delayed_work_on(cpu, &split_lock_reenable, 2);
+	schedule_delayed_work_on(cpu, wk, 2);
 
 	/* Disable split lock detection on this CPU to make progress */
 	sld_update_msr(false);