Message ID | 20221014180506.211592-1-gpiccoli@igalia.com |
---|---|
State | New |
Headers |
Return-Path: <linux-kernel-owner@vger.kernel.org> Delivered-To: ouuuleilei@gmail.com Received: by 2002:a5d:4ac7:0:0:0:0:0 with SMTP id y7csp314347wrs; Fri, 14 Oct 2022 11:08:47 -0700 (PDT) X-Google-Smtp-Source: AMsMyM74ZTlnIxwrnvxiLEkfpT+tG//p3naAoSETU+B3x68NLZkNXa6cxH11ZslBtT1f19qjfL8/ X-Received: by 2002:a17:907:a46:b0:782:1c1c:8141 with SMTP id be6-20020a1709070a4600b007821c1c8141mr4420222ejc.549.1665770926989; Fri, 14 Oct 2022 11:08:46 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1665770926; cv=none; d=google.com; s=arc-20160816; b=YpemZK2CZ5xQ1uGtYMDfPCmnzNviposYMJUvaDROyKRPGny8h5WTfIojCt7RGo3xKQ KYVDrmPiVG3nWAaUfZ6wFDfmYa6OJDYBCzr1Z2QbvokWKuvrw0uS3VY71c9j59d0/qvu G+v3Bz5OXmFY7PIyOMqtjZYD/DcxkhHkxxZgEWJvEaJ6FtoBvSk6PSWvA57qU3xqInaQ St8xXFT7nW7GcxgI6v/Md9P2np+XTOrfUAn4GEOayodEoOpYtG7S+8sPV4gssiex9SY+ f65b9jYeAFDm3rgLkOQDZO7bobnF72tbY+r1g8SE5P0eaxAx7MBU5zBtKTNttwybO9qi VE6A== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :message-id:date:subject:cc:to:from:dkim-signature; bh=OolHboUoYatLDt1nCSapsrjbktDg/Duc9mI4hH+Y1qM=; b=VTxysUoPVcCpVxvtBTzsUx/G+3c1+H1NdMxPHrujz5gmcnjk084lFOEEBO977CEQ9k KAcKfItW7GPwSEqveLiB2xNWLAL1EplUJeCGd4fCVWBNH2JUWbbSnqQhvf8a6Cq+IRpD UNV1zaOzKpM1n+023sktV/Iz1CcwQyu8bK0UABBtvAzpVey3JTcgR1gmAoByybWhJbRn Z1DAg++NTswytt7bvcM4RbA3til2XNbodd2ypMFDb1w8Ipmx3WMRJxqnUtRpPMpTAiEV K7s9GH3iY5VqFkTVnKcX8wQ4PswnxVG5ZInhsheSvtHVIEjHlWRUF8ZS/HXRVOIoOzxB JXjA== ARC-Authentication-Results: i=1; mx.google.com; dkim=fail header.i=@igalia.com header.s=20170329 header.b=LTYl6ujY; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id w24-20020a17090633d800b0078d20294fe1si2674665eja.168.2022.10.14.11.08.21; Fri, 14 Oct 2022 11:08:46 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=fail header.i=@igalia.com header.s=20170329 header.b=LTYl6ujY; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230392AbiJNSFx (ORCPT <rfc822;ouuuleilei@gmail.com> + 99 others); Fri, 14 Oct 2022 14:05:53 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:34720 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229751AbiJNSFu (ORCPT <rfc822;linux-kernel@vger.kernel.org>); Fri, 14 Oct 2022 14:05:50 -0400 Received: from fanzine2.igalia.com (fanzine.igalia.com [178.60.130.6]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id D71F8BDF; Fri, 14 Oct 2022 11:05:40 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=igalia.com; s=20170329; h=Content-Transfer-Encoding:Content-Type:MIME-Version:Message-Id: Date:Subject:Cc:To:From:Sender:Reply-To:Content-ID:Content-Description: Resent-Date:Resent-From:Resent-Sender:Resent-To:Resent-Cc:Resent-Message-ID: In-Reply-To:References:List-Id:List-Help:List-Unsubscribe:List-Subscribe: List-Post:List-Owner:List-Archive; bh=OolHboUoYatLDt1nCSapsrjbktDg/Duc9mI4hH+Y1qM=; b=LTYl6ujYQ5QBsyw2g/9srC4JSo 3BeUtbdYMcIxehrx4hsvF4jmvoE0sHZiifRB1E1Tv6EH/X7qaawhSThn1Tgut59O9a3fF11uupSCz F61A4udFX+1Qdmkg36ptTLMnnNjk4PA0KIyJFg7nC65sVIOOF2g2e+cZOTRx79RrMkB8LbG7vboYv G8kEPVuI3cKtSUeFZXrOTe3Nd3hj1nHQzBstf/sWpBPvRhs6E+lA++gUdWbfsDlWOAwqArhZbX4Xb ryLqb4RG/1LYO5/tj9uEYfvws8aKBFVxOz9pOe6UCk01A7+GAS06Qvlwv5nZUsZBvvl0ptZ5koiiy 8JNwRLdA==; Received: from [179.113.159.85] (helo=localhost) by fanzine2.igalia.com with esmtpsa (Cipher TLS1.3:ECDHE_SECP256R1__RSA_PSS_RSAE_SHA256__AES_256_GCM:256) (Exim) id 1ojP3q-001YP6-TM; Fri, 14 Oct 2022 20:05:31 +0200 From: "Guilherme G. Piccoli" <gpiccoli@igalia.com> To: x86@kernel.org, linux-kernel@vger.kernel.org Cc: tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, dave.hansen@linux.intel.com, hpa@zytor.com, luto@kernel.org, corbet@lwn.net, linux-doc@vger.kernel.org, kernel-dev@igalia.com, kernel@gpiccoli.net, "Guilherme G. Piccoli" <gpiccoli@igalia.com>, Andre Almeida <andrealmeid@igalia.com>, Fenghua Yu <fenghua.yu@intel.com>, Joshua Ashton <joshua@froggi.es>, Melissa Wen <mwen@igalia.com>, Paul Gofman <pgofman@codeweavers.com>, Pavel Machek <pavel@denx.de>, Pierre-Loup Griffais <pgriffais@valvesoftware.com>, Tony Luck <tony.luck@intel.com>, Zebediah Figura <zfigura@codeweavers.com> Subject: [PATCH V2] x86/split_lock: Add sysctl to control the misery mode Date: Fri, 14 Oct 2022 15:05:06 -0300 Message-Id: <20221014180506.211592-1-gpiccoli@igalia.com> X-Mailer: git-send-email 2.38.0 MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit X-Spam-Status: No, score=-2.1 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: <linux-kernel.vger.kernel.org> X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1746687415494093869?= X-GMAIL-MSGID: =?utf-8?q?1746687415494093869?= |
Series |
[V2] x86/split_lock: Add sysctl to control the misery mode
|
|
Commit Message
Guilherme G. Piccoli
Oct. 14, 2022, 6:05 p.m. UTC
Commit b041b525dab9 ("x86/split_lock: Make life miserable for split lockers")
changed the way the split lock detector works when in "warn" mode;
basically, not only it shows the warn message, but also intentionally
introduces a slowdown (through sleeping plus serialization mechanism)
on such task. Based on discussions in [0], seems the warning alone
wasn't enough motivation for userspace developers to fix their
applications.
Happens that originally the proposal in [0] was to add a new mode
which would warns + slowdown the "split locking" task, keeping the
old warn mode untouched. In the end, that idea was discarded and
the regular/default "warn" mode now slowdowns the applications. This
is quite aggressive with regards proprietary/legacy programs that
basically are unable to properly run in kernel with this change.
While is understandable that a malicious application could try a DoS
by split locking, it seems unacceptable to regress old/proprietary
userspace programs through a default configuration that previously
worked. An example of such breakage was reported in [1].
So let's add a sysctl to allow controlling the "misery mode" behavior,
as per Thomas suggestion on [2]. This way, users running legacy and/or
proprietary software are allowed to still execute them with a decent
performance while still observe the warning messages on kernel log.
[0] https://lore.kernel.org/lkml/20220217012721.9694-1-tony.luck@intel.com/
[1] https://github.com/doitsujin/dxvk/issues/2938
[2] https://lore.kernel.org/lkml/87pmf4bter.ffs@tglx/
Fixes: b041b525dab9 ("x86/split_lock: Make life miserable for split lockers")
Cc: Andre Almeida <andrealmeid@igalia.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Joshua Ashton <joshua@froggi.es>
Cc: Melissa Wen <mwen@igalia.com>
Cc: Paul Gofman <pgofman@codeweavers.com>
Cc: Pavel Machek <pavel@denx.de>
Cc: Pierre-Loup Griffais <pgriffais@valvesoftware.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Zebediah Figura <zfigura@codeweavers.com>
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Guilherme G. Piccoli <gpiccoli@igalia.com>
---
V2:
- Switched to sysctl approach following Thomas' suggestion (thanks!).
Andre tested the patch and will comment in this thread - seems everything is
working as expected and we can enable/disable that, affecting the misery mode
as one expects.
I've tried to keep the semaphore's up()/down() calls in-sync/paired, hence
my approach of two delayed tasks, with and without misery.
Reviews / comments are greatly appreciated.
Thanks,
Guilherme
Documentation/admin-guide/sysctl/kernel.rst | 18 ++++++
arch/x86/kernel/cpu/intel.c | 61 +++++++++++++++++----
2 files changed, 69 insertions(+), 10 deletions(-)
Comments
Hi Guilherme, On 10/14/22 15:05, Guilherme G. Piccoli wrote: > Commit b041b525dab9 ("x86/split_lock: Make life miserable for split lockers") > changed the way the split lock detector works when in "warn" mode; > basically, not only it shows the warn message, but also intentionally > introduces a slowdown (through sleeping plus serialization mechanism) > on such task. Based on discussions in [0], seems the warning alone > wasn't enough motivation for userspace developers to fix their > applications. > > Happens that originally the proposal in [0] was to add a new mode > which would warns + slowdown the "split locking" task, keeping the > old warn mode untouched. In the end, that idea was discarded and > the regular/default "warn" mode now slowdowns the applications. This > is quite aggressive with regards proprietary/legacy programs that > basically are unable to properly run in kernel with this change. > While is understandable that a malicious application could try a DoS > by split locking, it seems unacceptable to regress old/proprietary > userspace programs through a default configuration that previously > worked. An example of such breakage was reported in [1]. > > So let's add a sysctl to allow controlling the "misery mode" behavior, > as per Thomas suggestion on [2]. This way, users running legacy and/or > proprietary software are allowed to still execute them with a decent > performance while still observe the warning messages on kernel log. > > [0] https://lore.kernel.org/lkml/20220217012721.9694-1-tony.luck@intel.com/ > > [1] https://github.com/doitsujin/dxvk/issues/2938 > > [2] https://lore.kernel.org/lkml/87pmf4bter.ffs@tglx/ > > Fixes: b041b525dab9 ("x86/split_lock: Make life miserable for split lockers") > Cc: Andre Almeida <andrealmeid@igalia.com> > Cc: Fenghua Yu <fenghua.yu@intel.com> > Cc: Joshua Ashton <joshua@froggi.es> > Cc: Melissa Wen <mwen@igalia.com> > Cc: Paul Gofman <pgofman@codeweavers.com> > Cc: Pavel Machek <pavel@denx.de> > Cc: Pierre-Loup Griffais <pgriffais@valvesoftware.com> > Cc: Tony Luck <tony.luck@intel.com> > Cc: Zebediah Figura <zfigura@codeweavers.com> > Suggested-by: Thomas Gleixner <tglx@linutronix.de> > Signed-off-by: Guilherme G. Piccoli <gpiccoli@igalia.com> Tested-by: André Almeida <andrealmeid@igalia.com> Tested on an Intel i7-1165G7 using a small benchmarking script, the behavior is effectively reverted when using the sysctl option. Also, you might want to document this option somewhere? Maybe at Documentation/admin-guide/sysctl/kernel.rst
On 14/10/2022 15:17, André Almeida wrote: > [...] > > Tested-by: André Almeida <andrealmeid@igalia.com> > > Tested on an Intel i7-1165G7 using a small benchmarking script, the > behavior is effectively reverted when using the sysctl option. > > Also, you might want to document this option somewhere? Maybe at > Documentation/admin-guide/sysctl/kernel.rst Thanks André! About documenting, it seems it's already done in the patch, right? Cheers, Guilherme
On 10/14/22 15:20, Guilherme G. Piccoli wrote: > On 14/10/2022 15:17, André Almeida wrote: >> [...] >> >> Tested-by: André Almeida <andrealmeid@igalia.com> >> >> Tested on an Intel i7-1165G7 using a small benchmarking script, the >> behavior is effectively reverted when using the sysctl option. >> >> Also, you might want to document this option somewhere? Maybe at >> Documentation/admin-guide/sysctl/kernel.rst > > Thanks André! > > About documenting, it seems it's already done in the patch, right? > Cheers, oops, my bad :)
Looks reasonable. Are these games multi-threaded with split locks happening on multiple CPUs in parallel? If they are, then skipping both the 10ms delay and the serialization is needed. But if split locks are only from one CPU at a time, then possibly it would have been enough to just have this mitigation skip the: + if (msleep_interruptible(10) > 0) + return; Maybe best not to second guess. You have left the default as "mitigation on", so I'm happy. -Tony
On 14/10/2022 15:26, Luck, Tony wrote: > Looks reasonable. > > Are these games multi-threaded with split locks happening on multiple CPUs in parallel? > If they are, then skipping both the 10ms delay and the serialization is needed. > > But if split locks are only from one CPU at a time, then possibly it would have > been enough to just have this mitigation skip the: > > + if (msleep_interruptible(10) > 0) > + return; > > Maybe best not to second guess. You have left the default as "mitigation on", > so I'm happy. > > -Tony Hi Tony, thanks for your review! Some games are indeed multi-threaded, so as you said, I think it's better if we skip the whole thing when the sysctl is off - a kind of fallback to the old "warn" mode before the misery was added heh Cheers, Guilherme
On Fri, Oct 14, 2022 at 03:05:06PM -0300, Guilherme G. Piccoli wrote: > diff --git a/Documentation/admin-guide/sysctl/kernel.rst b/Documentation/admin-guide/sysctl/kernel.rst > index ee6572b1edad..508952e42914 100644 > --- a/Documentation/admin-guide/sysctl/kernel.rst > +++ b/Documentation/admin-guide/sysctl/kernel.rst > @@ -1298,6 +1298,24 @@ watchdog work to be queued by the watchdog timer function, otherwise the NMI > watchdog — if enabled — can detect a hard lockup condition. > > > +split_lock_mitigate (x86 only) > +============= > + The heading underline above is too short (doesn't cover the whole text length), so I have applied the fixup: ---- >8 ---- diff --git a/Documentation/admin-guide/sysctl/kernel.rst b/Documentation/admin-guide/sysctl/kernel.rst index c733d424d4e830..4824cfed71ab31 100644 --- a/Documentation/admin-guide/sysctl/kernel.rst +++ b/Documentation/admin-guide/sysctl/kernel.rst @@ -1315,7 +1315,7 @@ watchdog — if enabled — can detect a hard lockup condition. split_lock_mitigate (x86 only) -============= +============================== For x86 CPUs supporting the split lock detection mechanism, this parameter allows the users to turn off what is called "the misery mode", which > +For x86 CPUs supporting the split lock detection mechanism, this parameter > +allows the users to turn off what is called "the misery mode", which > +introduces intentional delay in userspace applications that split locks. > +The goal of the misery mode is to prevent using such unaligned access to > +DoS the system dropping the performance overall, but some of these split > +locking programs are legacy and/or proprietary software that cannot be fixed, > +so using this sysctl is a way to allow them to run with a decent performance. > + > += =================================================================== > +0 Disables the misery mode - just warns the split lock on kernel log. > +1 Enables the misery mode (this is the default) - penalizes the split > + lockers with intentional performance degradation. > += =================================================================== > + > + > stack_erasing > ============= > The wording can be improved: ---- >8 ---- diff --git a/Documentation/admin-guide/sysctl/kernel.rst b/Documentation/admin-guide/sysctl/kernel.rst index 4824cfed71ab31..961c19f4beae51 100644 --- a/Documentation/admin-guide/sysctl/kernel.rst +++ b/Documentation/admin-guide/sysctl/kernel.rst @@ -1320,15 +1320,15 @@ split_lock_mitigate (x86 only) For x86 CPUs supporting the split lock detection mechanism, this parameter allows the users to turn off what is called "the misery mode", which introduces intentional delay in userspace applications that split locks. -The goal of the misery mode is to prevent using such unaligned access to -DoS the system dropping the performance overall, but some of these split -locking programs are legacy and/or proprietary software that cannot be fixed, -so using this sysctl is a way to allow them to run with a decent performance. +The goal of this mode is to prevent using such unaligned access to +drop the overall performance (and DoS the system). However, some of programs +which uses split locking are legacy and/or proprietary software that cannot +be fixed, so disabling this sysctl can allow them to run with a decent +performance. = =================================================================== 0 Disables the misery mode - just warns the split lock on kernel log. -1 Enables the misery mode (this is the default) - penalizes the split - lockers with intentional performance degradation. +1 Enables the misery mode (default) = =================================================================== Thanks.
On 10/16/22 10:00, Bagas Sanjaya wrote: > = =================================================================== > 0 Disables the misery mode - just warns the split lock on kernel log. > -1 Enables the misery mode (this is the default) - penalizes the split > - lockers with intentional performance degradation. > +1 Enables the misery mode (default) > = =================================================================== > Oops, on the table above, s/Disables/Disable/ and s/Enables/Enable/.
Thank you Bagas! I see you fixed the documentation in more than one place, appreciate that. What's the next step then, re-submit with your fixes, or wait more feedback perhaps? If a maintainer plans to pick this one, maybe they can just apply your fix-up on top of it. Cheers, Guilherme
On 14/10/2022 15:05, Guilherme G. Piccoli wrote: > Commit b041b525dab9 ("x86/split_lock: Make life miserable for split lockers") > changed the way the split lock detector works when in "warn" mode; > basically, not only it shows the warn message, but also intentionally > introduces a slowdown (through sleeping plus serialization mechanism) > on such task. Based on discussions in [0], seems the warning alone > wasn't enough motivation for userspace developers to fix their > applications. > > Happens that originally the proposal in [0] was to add a new mode > which would warns + slowdown the "split locking" task, keeping the > old warn mode untouched. In the end, that idea was discarded and > the regular/default "warn" mode now slowdowns the applications. This > is quite aggressive with regards proprietary/legacy programs that > basically are unable to properly run in kernel with this change. > While is understandable that a malicious application could try a DoS > by split locking, it seems unacceptable to regress old/proprietary > userspace programs through a default configuration that previously > worked. An example of such breakage was reported in [1]. > > So let's add a sysctl to allow controlling the "misery mode" behavior, > as per Thomas suggestion on [2]. This way, users running legacy and/or > proprietary software are allowed to still execute them with a decent > performance while still observe the warning messages on kernel log. > > [0] https://lore.kernel.org/lkml/20220217012721.9694-1-tony.luck@intel.com/ > > [1] https://github.com/doitsujin/dxvk/issues/2938 > > [2] https://lore.kernel.org/lkml/87pmf4bter.ffs@tglx/ > > Fixes: b041b525dab9 ("x86/split_lock: Make life miserable for split lockers") > Cc: Andre Almeida <andrealmeid@igalia.com> > Cc: Fenghua Yu <fenghua.yu@intel.com> > Cc: Joshua Ashton <joshua@froggi.es> > Cc: Melissa Wen <mwen@igalia.com> > Cc: Paul Gofman <pgofman@codeweavers.com> > Cc: Pavel Machek <pavel@denx.de> > Cc: Pierre-Loup Griffais <pgriffais@valvesoftware.com> > Cc: Tony Luck <tony.luck@intel.com> > Cc: Zebediah Figura <zfigura@codeweavers.com> > Suggested-by: Thomas Gleixner <tglx@linutronix.de> > Signed-off-by: Guilherme G. Piccoli <gpiccoli@igalia.com> > --- Hi Dave / Thomas, do you think this version is good enough? If so, would be possible to pick it still in the v6.1-rc cycle, since it is a fix? What about the documentation improvements from Bagas, should I re-send (V3) with that, or would you pick them when merging? Thanks in advance, Guilherme
On 10/14/22 11:05, Guilherme G. Piccoli wrote: > Commit b041b525dab9 ("x86/split_lock: Make life miserable for split lockers") > changed the way the split lock detector works when in "warn" mode; > basically, not only it shows the warn message, but also intentionally > introduces a slowdown (through sleeping plus serialization mechanism) > on such task. Based on discussions in [0], seems the warning alone > wasn't enough motivation for userspace developers to fix their > applications. > > Happens that originally the proposal in [0] was to add a new mode > which would warns + slowdown the "split locking" task, keeping the > old warn mode untouched. In the end, that idea was discarded and > the regular/default "warn" mode now slowdowns the applications. This > is quite aggressive with regards proprietary/legacy programs that > basically are unable to properly run in kernel with this change. > While is understandable that a malicious application could try a DoS > by split locking, it seems unacceptable to regress old/proprietary > userspace programs through a default configuration that previously > worked. An example of such breakage was reported in [1]. > > So let's add a sysctl to allow controlling the "misery mode" behavior, > as per Thomas suggestion on [2]. This way, users running legacy and/or > proprietary software are allowed to still execute them with a decent > performance while still observe the warning messages on kernel log. > > [0] https://lore.kernel.org/lkml/20220217012721.9694-1-tony.luck@intel.com/ > > [1] https://github.com/doitsujin/dxvk/issues/2938 > > [2] https://lore.kernel.org/lkml/87pmf4bter.ffs@tglx/ > > Fixes: b041b525dab9 ("x86/split_lock: Make life miserable for split lockers") > Cc: Andre Almeida <andrealmeid@igalia.com> > Cc: Fenghua Yu <fenghua.yu@intel.com> > Cc: Joshua Ashton <joshua@froggi.es> > Cc: Melissa Wen <mwen@igalia.com> > Cc: Paul Gofman <pgofman@codeweavers.com> > Cc: Pavel Machek <pavel@denx.de> > Cc: Pierre-Loup Griffais <pgriffais@valvesoftware.com> > Cc: Tony Luck <tony.luck@intel.com> > Cc: Zebediah Figura <zfigura@codeweavers.com> > Suggested-by: Thomas Gleixner <tglx@linutronix.de> > Signed-off-by: Guilherme G. Piccoli <gpiccoli@igalia.com> > --- > > > V2: > - Switched to sysctl approach following Thomas' suggestion (thanks!). > > Andre tested the patch and will comment in this thread - seems everything is > working as expected and we can enable/disable that, affecting the misery mode > as one expects. > > I've tried to keep the semaphore's up()/down() calls in-sync/paired, hence > my approach of two delayed tasks, with and without misery. > > Reviews / comments are greatly appreciated. > Thanks, > > > Guilherme > > > Documentation/admin-guide/sysctl/kernel.rst | 18 ++++++ > arch/x86/kernel/cpu/intel.c | 61 +++++++++++++++++---- > 2 files changed, 69 insertions(+), 10 deletions(-) > > diff --git a/Documentation/admin-guide/sysctl/kernel.rst b/Documentation/admin-guide/sysctl/kernel.rst > index ee6572b1edad..508952e42914 100644 > --- a/Documentation/admin-guide/sysctl/kernel.rst > +++ b/Documentation/admin-guide/sysctl/kernel.rst > @@ -1298,6 +1298,24 @@ watchdog work to be queued by the watchdog timer function, otherwise the NMI > watchdog — if enabled — can detect a hard lockup condition. > > > +split_lock_mitigate (x86 only) > +============= > + > +For x86 CPUs supporting the split lock detection mechanism, this parameter > +allows the users to turn off what is called "the misery mode", which > +introduces intentional delay in userspace applications that split locks. > +The goal of the misery mode is to prevent using such unaligned access to > +DoS the system dropping the performance overall, but some of these split > +locking programs are legacy and/or proprietary software that cannot be fixed, > +so using this sysctl is a way to allow them to run with a decent performance. I think this is missing a lot of context. End users looking here won't even know what a split lock *is*. Please either refer over to the real documentation on this issue or write a brief description about what's going on. How about this? On x86, each "split lock" imposes a system-wide performance penalty. On larger systems, large numbers of split locks from unprivileged users can result in denials of service to well- behaved and potentially more important users. The kernel mitigates these bad users by detecting split locks and imposing penalties: forcing them to wait and only allowing one core to execute split locks at a time. These mitigations can make those bad applications unbearably slow. Setting split_lock_mitigate=0 may restore some application performance, but will also increase system exposure to denial of service attacks from split lock users. > += =================================================================== > +0 Disables the misery mode - just warns the split lock on kernel log. ... and exposes the system to Denial-of-Service attacks. That's an awfully big side-effect to not mention. > +1 Enables the misery mode (this is the default) - penalizes the split > + lockers with intentional performance degradation. > += =================================================================== As much as I love the misery terminology, let's try to use one term. Let's either call it "misery" *or* "mitigations", not both. > diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c > index 2d7ea5480ec3..2aacf9d6c723 100644 > --- a/arch/x86/kernel/cpu/intel.c > +++ b/arch/x86/kernel/cpu/intel.c > @@ -1034,8 +1034,32 @@ static const struct { > > static struct ratelimit_state bld_ratelimit; > > +static unsigned int sysctl_sld_mitigate = 1; > static DEFINE_SEMAPHORE(buslock_sem); > > +#ifdef CONFIG_PROC_SYSCTL > +static struct ctl_table sld_sysctls[] = { > + { > + .procname = "split_lock_mitigate", > + .data = &sysctl_sld_mitigate, > + .maxlen = sizeof(unsigned int), > + .mode = 0644, > + .proc_handler = proc_douintvec_minmax, > + .extra1 = SYSCTL_ZERO, > + .extra2 = SYSCTL_ONE, > + }, > + {} > +}; > + > +static int __init sld_mitigate_sysctl_init(void) > +{ > + register_sysctl_init("kernel", sld_sysctls); > + return 0; > +} > + > +late_initcall(sld_mitigate_sysctl_init); > +#endif > + > static inline bool match_option(const char *arg, int arglen, const char *opt) > { > int len = strlen(opt), ratelimit; > @@ -1146,11 +1170,18 @@ static void split_lock_init(void) > split_lock_verify_msr(sld_state != sld_off); > } > > -static void __split_lock_reenable(struct work_struct *work) > +static void __split_lock_reenable_sem(struct work_struct *work) > { "sem" is a pretty crummy name. Wouldn't __split_lock_reenable_unlock() be much more clear? > sld_update_msr(true); > up(&buslock_sem); > } > +static DECLARE_DELAYED_WORK(split_lock_reenable_sem, __split_lock_reenable_sem); > + > +static void __split_lock_reenable(struct work_struct *work) > +{ > + sld_update_msr(true); > +} > +static DECLARE_DELAYED_WORK(split_lock_reenable, __split_lock_reenable); Better yet, do you *really* need two functions and two DECLARE_DELAYED_WORK()'s? You could have a single delayed_work, and then just do: static void split_lock_warn(unsigned long ip) { bool need_release_sem = false; ... if (down_interruptible(&buslock_sem) == -EINTR) return; need_release_sem = true; Then, farther down, you do: split_lock_reenable->data = need_release_sem; schedule_delayed_work_on(cpu, &split_lock_reenable); Then, in the work func: bool need_release_sem = work->data; if (need_release_sem) up(...); That's nice and compact. It's also logically easy to follow because you can see how the need_release_sem gets set only after the down_interruptible(). It's also nice to have both sites share the 'need_release_sem' naming for grepping. > /* > * If a CPU goes offline with pending delayed work to re-enable split lock > @@ -1169,10 +1200,9 @@ static int splitlock_cpu_offline(unsigned int cpu) > return 0; > } > > -static DECLARE_DELAYED_WORK(split_lock_reenable, __split_lock_reenable); > - > static void split_lock_warn(unsigned long ip) > { > + struct delayed_work *wk; I think we can spare two bytes to make this "work". > int cpu; > > if (!current->reported_split_lock) > @@ -1180,14 +1210,25 @@ static void split_lock_warn(unsigned long ip) > current->comm, current->pid, ip); > current->reported_split_lock = 1; > > - /* misery factor #1, sleep 10ms before trying to execute split lock */ > - if (msleep_interruptible(10) > 0) > - return; > - /* Misery factor #2, only allow one buslocked disabled core at a time */ > - if (down_interruptible(&buslock_sem) == -EINTR) > - return; > + if (sysctl_sld_mitigate) { > + /* > + * misery factor #1: > + * sleep 10ms before trying to execute split lock. > + */ > + if (msleep_interruptible(10) > 0) > + return; > + /* > + * Misery factor #2: > + * only allow one buslocked disabled core at a time. > + */ > + wk = &split_lock_reenable_sem; > + if (down_interruptible(&buslock_sem) == -EINTR) > + return; It's a little confusing to set: wk = &split_lock_reenable_sem; and then not use it. I'd probably set it below the lock check and return. > + } else > + wk = &split_lock_reenable; Brackets, please: } else { wk = &split_lock_reenable; } (if you keep this hunk). > cpu = get_cpu(); > - schedule_delayed_work_on(cpu, &split_lock_reenable, 2); > + schedule_delayed_work_on(cpu, wk, 2); > > /* Disable split lock detection on this CPU to make progress */ > sld_update_msr(false);
Hi Dave, thanks for the thorough review! Comments inline below: On 21/10/2022 14:27, Dave Hansen wrote: > [...] >> +For x86 CPUs supporting the split lock detection mechanism, this parameter >> +allows the users to turn off what is called "the misery mode", which >> +introduces intentional delay in userspace applications that split locks. >> +The goal of the misery mode is to prevent using such unaligned access to >> +DoS the system dropping the performance overall, but some of these split >> +locking programs are legacy and/or proprietary software that cannot be fixed, >> +so using this sysctl is a way to allow them to run with a decent performance. > > I think this is missing a lot of context. End users looking here won't > even know what a split lock *is*. Please either refer over to the real > documentation on this issue or write a brief description about what's > going on. > > How about this? > > On x86, each "split lock" imposes a system-wide performance > penalty. On larger systems, large numbers of split locks from > unprivileged users can result in denials of service to well- > behaved and potentially more important users. > > The kernel mitigates these bad users by detecting split locks > and imposing penalties: forcing them to wait and only allowing > one core to execute split locks at a time. > > These mitigations can make those bad applications unbearably > slow. Setting split_lock_mitigate=0 may restore some > application performance, but will also increase system exposure > to denial of service attacks from split lock users. > >> += =================================================================== >> +0 Disables the misery mode - just warns the split lock on kernel log. > > ... and exposes the system to Denial-of-Service attacks. That's an > awfully big side-effect to not mention. > >> +1 Enables the misery mode (this is the default) - penalizes the split >> + lockers with intentional performance degradation. >> += =================================================================== > > As much as I love the misery terminology, let's try to use one term. > Let's either call it "misery" *or* "mitigations", not both. > OK, regarding the documentation, I'll follow your suggestion in the V3, good stuff. >> [...] >> -static void __split_lock_reenable(struct work_struct *work) >> +static void __split_lock_reenable_sem(struct work_struct *work) >> { > > "sem" is a pretty crummy name. Wouldn't > > __split_lock_reenable_unlock() > > be much more clear? > Agreed... >> [...] > Better yet, do you *really* need two functions and two > DECLARE_DELAYED_WORK()'s? > > You could have a single delayed_work, and then just do: > > static void split_lock_warn(unsigned long ip) > { > bool need_release_sem = false; > ... > > if (down_interruptible(&buslock_sem) == -EINTR) > return; > need_release_sem = true; > > > Then, farther down, you do: > > split_lock_reenable->data = need_release_sem; > schedule_delayed_work_on(cpu, &split_lock_reenable); > > Then, in the work func: > > bool need_release_sem = work->data; > > if (need_release_sem) > up(...); > > That's nice and compact. It's also logically easy to follow because you > can see how the need_release_sem gets set only after the > down_interruptible(). It's also nice to have both sites share the > 'need_release_sem' naming for grepping. > ...but, this is a very good suggestion, and will eliminate the need for two delayed_works, right? >> [...] >> + struct delayed_work *wk; > > I think we can spare two bytes to make this "work". > >> [...] > > It's a little confusing to set: > > wk = &split_lock_reenable_sem; > > and then not use it. > > I'd probably set it below the lock check and return. > >> + } else >> + wk = &split_lock_reenable; > > Brackets, please: > > } else { > wk = &split_lock_reenable; > } > > (if you keep this hunk). > But then we're back to discussing the approach of multiple delayed works. I guess I prefer your idea of passing the state and have a single one, will do this in V3 OK? If you or anybody else disagrees and prefer the approach of 2 workers, let me know. Cheers, Guilherme
On 10/21/22 12:03, Guilherme G. Piccoli wrote: ... >> if (need_release_sem) >> up(...); >> >> That's nice and compact. It's also logically easy to follow because you >> can see how the need_release_sem gets set only after the >> down_interruptible(). It's also nice to have both sites share the >> 'need_release_sem' naming for grepping. >> > > ...but, this is a very good suggestion, and will eliminate the need for > two delayed_works, right? Yes, that too.
diff --git a/Documentation/admin-guide/sysctl/kernel.rst b/Documentation/admin-guide/sysctl/kernel.rst index ee6572b1edad..508952e42914 100644 --- a/Documentation/admin-guide/sysctl/kernel.rst +++ b/Documentation/admin-guide/sysctl/kernel.rst @@ -1298,6 +1298,24 @@ watchdog work to be queued by the watchdog timer function, otherwise the NMI watchdog — if enabled — can detect a hard lockup condition. +split_lock_mitigate (x86 only) +============= + +For x86 CPUs supporting the split lock detection mechanism, this parameter +allows the users to turn off what is called "the misery mode", which +introduces intentional delay in userspace applications that split locks. +The goal of the misery mode is to prevent using such unaligned access to +DoS the system dropping the performance overall, but some of these split +locking programs are legacy and/or proprietary software that cannot be fixed, +so using this sysctl is a way to allow them to run with a decent performance. + += =================================================================== +0 Disables the misery mode - just warns the split lock on kernel log. +1 Enables the misery mode (this is the default) - penalizes the split + lockers with intentional performance degradation. += =================================================================== + + stack_erasing ============= diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c index 2d7ea5480ec3..2aacf9d6c723 100644 --- a/arch/x86/kernel/cpu/intel.c +++ b/arch/x86/kernel/cpu/intel.c @@ -1034,8 +1034,32 @@ static const struct { static struct ratelimit_state bld_ratelimit; +static unsigned int sysctl_sld_mitigate = 1; static DEFINE_SEMAPHORE(buslock_sem); +#ifdef CONFIG_PROC_SYSCTL +static struct ctl_table sld_sysctls[] = { + { + .procname = "split_lock_mitigate", + .data = &sysctl_sld_mitigate, + .maxlen = sizeof(unsigned int), + .mode = 0644, + .proc_handler = proc_douintvec_minmax, + .extra1 = SYSCTL_ZERO, + .extra2 = SYSCTL_ONE, + }, + {} +}; + +static int __init sld_mitigate_sysctl_init(void) +{ + register_sysctl_init("kernel", sld_sysctls); + return 0; +} + +late_initcall(sld_mitigate_sysctl_init); +#endif + static inline bool match_option(const char *arg, int arglen, const char *opt) { int len = strlen(opt), ratelimit; @@ -1146,11 +1170,18 @@ static void split_lock_init(void) split_lock_verify_msr(sld_state != sld_off); } -static void __split_lock_reenable(struct work_struct *work) +static void __split_lock_reenable_sem(struct work_struct *work) { sld_update_msr(true); up(&buslock_sem); } +static DECLARE_DELAYED_WORK(split_lock_reenable_sem, __split_lock_reenable_sem); + +static void __split_lock_reenable(struct work_struct *work) +{ + sld_update_msr(true); +} +static DECLARE_DELAYED_WORK(split_lock_reenable, __split_lock_reenable); /* * If a CPU goes offline with pending delayed work to re-enable split lock @@ -1169,10 +1200,9 @@ static int splitlock_cpu_offline(unsigned int cpu) return 0; } -static DECLARE_DELAYED_WORK(split_lock_reenable, __split_lock_reenable); - static void split_lock_warn(unsigned long ip) { + struct delayed_work *wk; int cpu; if (!current->reported_split_lock) @@ -1180,14 +1210,25 @@ static void split_lock_warn(unsigned long ip) current->comm, current->pid, ip); current->reported_split_lock = 1; - /* misery factor #1, sleep 10ms before trying to execute split lock */ - if (msleep_interruptible(10) > 0) - return; - /* Misery factor #2, only allow one buslocked disabled core at a time */ - if (down_interruptible(&buslock_sem) == -EINTR) - return; + if (sysctl_sld_mitigate) { + /* + * misery factor #1: + * sleep 10ms before trying to execute split lock. + */ + if (msleep_interruptible(10) > 0) + return; + /* + * Misery factor #2: + * only allow one buslocked disabled core at a time. + */ + wk = &split_lock_reenable_sem; + if (down_interruptible(&buslock_sem) == -EINTR) + return; + } else + wk = &split_lock_reenable; + cpu = get_cpu(); - schedule_delayed_work_on(cpu, &split_lock_reenable, 2); + schedule_delayed_work_on(cpu, wk, 2); /* Disable split lock detection on this CPU to make progress */ sld_update_msr(false);