Message ID | 20240203-arm64-sve-ptrace-regset-size-v1-1-2c3ba1386b9e@kernel.org |
---|---|
State | New |
Headers |
Return-Path: <linux-kernel+bounces-51041-ouuuleilei=gmail.com@vger.kernel.org> Delivered-To: ouuuleilei@gmail.com Received: by 2002:a05:7301:9bc1:b0:106:209c:c626 with SMTP id op1csp987561dyc; Sat, 3 Feb 2024 04:19:49 -0800 (PST) X-Google-Smtp-Source: AGHT+IEsCYDEofAn6nUuxPG9f3hPENohzAFeNYrPmepC8uwX8grN7KUTMsrfEe4AGOzSGC1xcCZF X-Received: by 2002:a05:620a:1910:b0:785:4786:e6db with SMTP id bj16-20020a05620a191000b007854786e6dbmr7838448qkb.2.1706962788874; Sat, 03 Feb 2024 04:19:48 -0800 (PST) ARC-Seal: i=2; a=rsa-sha256; t=1706962788; cv=pass; d=google.com; s=arc-20160816; b=z6PLWGCUi+IW+YMJhgaVmVD+zv8a4q93Oz/lufNyzoWYm4H30Z7HXK12wIUcxQjc80 i/oe2ur+vrnICHypth2Fouy3dXLMZnT+pVqLrDTveayoUjgGyn8rqm2EVbWpTFmq3SHz J4oPP63PeLH8vU1AAv8WQOczTplN74DrrMeGywVs5MKV41gFdiFt6WG+pyfrx+1j/OEd Q119+kviqDxpPThOUnWHXXApQNz7gB1J48JTvODwAc74NHhIliWW9xTa7TdNyGNu6TOK q80D47Y2rQnkh/bijXoLRuTAcze6GxTVxjxuAgxl+CAKKGyUjFAueDfrV1KcwU0OqNPH L66w== ARC-Message-Signature: i=2; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=cc:to:message-id:content-transfer-encoding:mime-version :list-unsubscribe:list-subscribe:list-id:precedence:subject:date :from:dkim-signature; bh=o7DyLTk5Rb9T1c66HHRXr40jIY1n7OKpySmwCEA64GM=; fh=EWyfgM2QjzAcpTV2/UOcQgXxsAo6mFCsMF+bhWYQucM=; b=kjH3lEz9xaWyTjCxTQpGVEJ7y1QLAjWc4t5S+XMzAz0eNKSpb/TaFf3wPlVx/o+Lsg d2Vq28zbt3yNOH5LYulL295Aej3xA9e1/dScdPY0hkqIL5VbdLGyp+Ocl/VbgoAHnb+x gUA/cWzE67mKZJPlDLVUI1GbnFTLq+1Wpm1pUaCA7mgSVcn6kBepZdR6cmDGVPpmOJjo 5OUHXbZ638W6bdbbAcj0uo8lShsOtnyVqiY6i9GUlyFS6t4HKKM/fqEXUljdSfJTGmIZ 0HrgUBUt1Bay9ikuAOp5Y6+GgnaONl5SVvkwyMGq2fDwSQ0nK+PhDr9o8KlUt5q3yQva mmug==; dara=google.com ARC-Authentication-Results: i=2; mx.google.com; dkim=pass header.i=@kernel.org header.s=k20201202 header.b=b3jUk3jE; arc=pass (i=1 dkim=pass dkdomain=kernel.org); spf=pass (google.com: domain of linux-kernel+bounces-51041-ouuuleilei=gmail.com@vger.kernel.org designates 2604:1380:45d1:ec00::1 as permitted sender) smtp.mailfrom="linux-kernel+bounces-51041-ouuuleilei=gmail.com@vger.kernel.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org X-Forwarded-Encrypted: i=1; AJvYcCW28dymG+DTv6x0UIiNCSoB/MKDX4jXYmHv1Ja6FFNTiwEeLcrl6qpZmWykqHe4M+ns5BjSq1JeO0iBMWPBXNaaGZLIew== Received: from ny.mirrors.kernel.org (ny.mirrors.kernel.org. [2604:1380:45d1:ec00::1]) by mx.google.com with ESMTPS id h26-20020a05620a21da00b007848531e3bfsi4127607qka.563.2024.02.03.04.19.48 for <ouuuleilei@gmail.com> (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sat, 03 Feb 2024 04:19:48 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel+bounces-51041-ouuuleilei=gmail.com@vger.kernel.org designates 2604:1380:45d1:ec00::1 as permitted sender) client-ip=2604:1380:45d1:ec00::1; Authentication-Results: mx.google.com; dkim=pass header.i=@kernel.org header.s=k20201202 header.b=b3jUk3jE; arc=pass (i=1 dkim=pass dkdomain=kernel.org); spf=pass (google.com: domain of linux-kernel+bounces-51041-ouuuleilei=gmail.com@vger.kernel.org designates 2604:1380:45d1:ec00::1 as permitted sender) smtp.mailfrom="linux-kernel+bounces-51041-ouuuleilei=gmail.com@vger.kernel.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Received: from smtp.subspace.kernel.org (wormhole.subspace.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ny.mirrors.kernel.org (Postfix) with ESMTPS id 96EED1C215D9 for <ouuuleilei@gmail.com>; Sat, 3 Feb 2024 12:19:48 +0000 (UTC) Received: from localhost.localdomain (localhost.localdomain [127.0.0.1]) by smtp.subspace.kernel.org (Postfix) with ESMTP id 7D5825DF0E; Sat, 3 Feb 2024 12:17:59 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="b3jUk3jE" Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id A0CEE4C7E for <linux-kernel@vger.kernel.org>; Sat, 3 Feb 2024 12:17:56 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1706962676; cv=none; b=jgwfkNj8F0QW0smuFs8sc0ETJMccUesPi6XAuTg6lr4VMNTyyg9m2ctrpD+5ybdnKeBlkpaNyE78SINizG6A99jRbvFEr/apAZ49uu2mdtXpHaJXMpxKDezQ72R+pKfrq417xrnWp+DHUHhvdB9AMWBs6HjCO7+uayewEF6i6k4= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1706962676; c=relaxed/simple; bh=vy+gCofdcx6ll8+v76H/qPs2ENfC5pAt7yrvH4goBis=; h=From:Date:Subject:MIME-Version:Content-Type:Message-Id:To:Cc; b=TEzpRB6+0o2+v10iowuhxq8ZpphhA5Yoi/wFQUfUeCyTopwYcOQXwsujTCHiTbNsBbeege0gRrwFbMcXjeldtIgnu6mn0W5kGuXgBSCQrFlciy+J90mqZj4HHxBO6AOESQbVqfLphHThUOa3CHfYJt14b2BcLsQsH7jCaJ3/LNM= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=b3jUk3jE; arc=none smtp.client-ip=10.30.226.201 Received: by smtp.kernel.org (Postfix) with ESMTPSA id 5FE8FC433F1; Sat, 3 Feb 2024 12:17:46 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1706962676; bh=vy+gCofdcx6ll8+v76H/qPs2ENfC5pAt7yrvH4goBis=; h=From:Date:Subject:To:Cc:From; b=b3jUk3jED00JB4d5ZmPLNOyP7PzXe0rsAAE5eTu1E1dcIA8DWQEKfaAzIrpcaYIIE Tzcg2CQv+SsSHq4iUWMvkbFe/boWL9GKTvOc0oWzq66kpntoQRo+XPHxUptLuWJgXN b/JLuDbLJIDSddt+pUY0qgm74Kz7c/A0H3a4F0FQtQM22d17UtNBLPd1EWk0c/3hU3 EUqABCSUNtJTw9E1dniPW1FH1/bAx6ywPv1xn4wcgtqJAenCY46kjcwvoFS5Sa4JjQ PBDNdMkYKYz7F2ByG7ppCfBNXvTMnJwGavjKZLOb9UhnjXjInBr7TCsvHDaQwIJgwX Aeq8kS21P7EyA== From: Mark Brown <broonie@kernel.org> Date: Sat, 03 Feb 2024 12:16:49 +0000 Subject: [PATCH] arm64/sve: Lower the maximum allocation for the SVE ptrace regset Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: <linux-kernel.vger.kernel.org> List-Subscribe: <mailto:linux-kernel+subscribe@vger.kernel.org> List-Unsubscribe: <mailto:linux-kernel+unsubscribe@vger.kernel.org> MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: 7bit Message-Id: <20240203-arm64-sve-ptrace-regset-size-v1-1-2c3ba1386b9e@kernel.org> X-B4-Tracking: v=1; b=H4sIALAuvmUC/x3MQQqEMAxG4atI1hNoi4j1KsMsqv5qFjqSiIji3 S1uHnybd5FBBUZNcZFiF5P/kuE/BXVTWkaw9NkUXChdDiedq5JtB6+bpg6sGA0bm5zg4FsXQx2 rCE95sSoGOd7993ffDz8dWJRuAAAA To: Will Deacon <will@kernel.org>, Catalin Marinas <catalin.marinas@arm.com> Cc: Dave Martin <Dave.Martin@arm.com>, Oleg Nesterov <oleg@redhat.com>, Al Viro <viro@zeniv.linux.org.uk>, linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org, Doug Anderson <dianders@chromium.org>, Mark Brown <broonie@kernel.org> X-Mailer: b4 0.13-dev-a684c X-Developer-Signature: v=1; a=openpgp-sha256; l=3817; i=broonie@kernel.org; h=from:subject:message-id; bh=vy+gCofdcx6ll8+v76H/qPs2ENfC5pAt7yrvH4goBis=; b=owEBbQGS/pANAwAKASTWi3JdVIfQAcsmYgBlvi7oIzODfqoTowIcCe0ki5UVSh/liF1VN7Wy4MNj T8lOzlGJATMEAAEKAB0WIQSt5miqZ1cYtZ/in+ok1otyXVSH0AUCZb4u6AAKCRAk1otyXVSH0BnmB/ oCvkdlZlBANeWVDPRv1h2mmoJStN4LjG3Hm7mUn095rc5PRLvvFqaOpOvHst2WtJ3CaFypw2a1hE2G 0k2PKlQJJqdzA/0wJPun5esVVS11Sy5lIy36KWMiUWVNh1uAJdAuSX30seGw2J3jAzlDCBJ2WmMYkV Q8bzdocPTcVjWvhZXk5Nad4lxh47ERNJbaZKW/T2RlYZxaaxXGNqezOlHo+7EkjGvPB9wI86Z4EUX8 Q/VNbxDkOWYu9YeE0ffzoCnYxL8xivKvwNDoGA4IxFFXVi3/maO35o6DeI1Qu1IS01viFoI4f4eCca lQ5ZLbHUqWG5+jdHLabKFzLMMkun9X X-Developer-Key: i=broonie@kernel.org; a=openpgp; fpr=3F2568AAC26998F9E813A1C5C3F436CA30F5D8EB X-getmail-retrieved-from-mailbox: INBOX X-GMAIL-THRID: 1789880212889358497 X-GMAIL-MSGID: 1789880212889358497 |
Series |
arm64/sve: Lower the maximum allocation for the SVE ptrace regset
|
|
Commit Message
Mark Brown
Feb. 3, 2024, 12:16 p.m. UTC
Doug Anderson observed that ChromeOS crashes are being reported which
include failing allocations of order 7 during core dumps due to ptrace
allocating storage for regsets:
chrome: page allocation failure: order:7,
mode:0x40dc0(GFP_KERNEL|__GFP_COMP|__GFP_ZERO),
nodemask=(null),cpuset=urgent,mems_allowed=0
...
regset_get_alloc+0x1c/0x28
elf_core_dump+0x3d8/0xd8c
do_coredump+0xeb8/0x1378
with further investigation showing that this is:
[ 66.957385] DOUG: Allocating 279584 bytes
which is the maximum size of the SVE regset. As Doug observes it is not
entirely surprising that such a large allocation of contiguous memory might
fail on a long running system.
The SVE regset is currently sized to hold SVE registers with a VQ of
SVE_VQ_MAX which is 512, substantially more than the architectural maximum
of 16 which we might see even in a system emulating the limits of the
architecture. Since we don't expose the size we tell the regset core
externally let's define ARCH_SVE_VQ_MAX with the actual architectural
maximum and use that for the regset, we'll still overallocate most of the
time but much less so which will be helpful even if the core is fixed to
not require contiguous allocations.
We could also teach the ptrace core about runtime discoverable regset sizes
but that would be a more invasive change and this is being observed in
practical systems.
Reported-by: Doug Anderson <dianders@chromium.org>
Signed-off-by: Mark Brown <broonie@kernel.org>
---
We should probably also use the actual architectural limit for the
bitmasks we use in the VL enumeration code, though that's both a little
bit more involved and less immediately a problem.
---
arch/arm64/include/asm/fpsimd.h | 10 +++++-----
arch/arm64/kernel/ptrace.c | 3 ++-
2 files changed, 7 insertions(+), 6 deletions(-)
---
base-commit: 41bccc98fb7931d63d03f326a746ac4d429c1dd3
change-id: 20240202-arm64-sve-ptrace-regset-size-21b0928969e1
Best regards,
Comments
Hi, On Sat, Feb 3, 2024 at 4:18 AM Mark Brown <broonie@kernel.org> wrote: > > Doug Anderson observed that ChromeOS crashes are being reported which > include failing allocations of order 7 during core dumps due to ptrace > allocating storage for regsets: > > chrome: page allocation failure: order:7, > mode:0x40dc0(GFP_KERNEL|__GFP_COMP|__GFP_ZERO), > nodemask=(null),cpuset=urgent,mems_allowed=0 > ... > regset_get_alloc+0x1c/0x28 > elf_core_dump+0x3d8/0xd8c > do_coredump+0xeb8/0x1378 > > with further investigation showing that this is: > > [ 66.957385] DOUG: Allocating 279584 bytes > > which is the maximum size of the SVE regset. As Doug observes it is not > entirely surprising that such a large allocation of contiguous memory might > fail on a long running system. > > The SVE regset is currently sized to hold SVE registers with a VQ of > SVE_VQ_MAX which is 512, substantially more than the architectural maximum > of 16 which we might see even in a system emulating the limits of the > architecture. Since we don't expose the size we tell the regset core > externally let's define ARCH_SVE_VQ_MAX with the actual architectural > maximum and use that for the regset, we'll still overallocate most of the > time but much less so which will be helpful even if the core is fixed to > not require contiguous allocations. > > We could also teach the ptrace core about runtime discoverable regset sizes > but that would be a more invasive change and this is being observed in > practical systems. > > Reported-by: Doug Anderson <dianders@chromium.org> > Signed-off-by: Mark Brown <broonie@kernel.org> Confirmed that when I send a "quit" signal to Chrome now that the allocation I see for "core_note_type" NT_ARM_SVE goes down from 279,584 bytes (n=17474) to just 8,768 bytes (n=548). I'm not intimately familiar with this code so I'll skip the Reviewed-by unless someone thinks it would be valuable for me to analyze more. I think there are already plenty of people who know this well, so for now, just: Tested-by: Douglas Anderson <dianders@chromium.org>
On Sat, Feb 03, 2024 at 12:16:49PM +0000, Mark Brown wrote: > Doug Anderson observed that ChromeOS crashes are being reported which > include failing allocations of order 7 during core dumps due to ptrace > allocating storage for regsets: > > chrome: page allocation failure: order:7, > mode:0x40dc0(GFP_KERNEL|__GFP_COMP|__GFP_ZERO), > nodemask=(null),cpuset=urgent,mems_allowed=0 > ... > regset_get_alloc+0x1c/0x28 > elf_core_dump+0x3d8/0xd8c > do_coredump+0xeb8/0x1378 > > with further investigation showing that this is: > > [ 66.957385] DOUG: Allocating 279584 bytes > > which is the maximum size of the SVE regset. As Doug observes it is not > entirely surprising that such a large allocation of contiguous memory might > fail on a long running system. > > The SVE regset is currently sized to hold SVE registers with a VQ of > SVE_VQ_MAX which is 512, substantially more than the architectural maximum > of 16 which we might see even in a system emulating the limits of the > architecture. Since we don't expose the size we tell the regset core > externally let's define ARCH_SVE_VQ_MAX with the actual architectural > maximum and use that for the regset, we'll still overallocate most of the > time but much less so which will be helpful even if the core is fixed to > not require contiguous allocations. > > We could also teach the ptrace core about runtime discoverable regset sizes > but that would be a more invasive change and this is being observed in > practical systems. This is not hard at all: see 27e64b4be4b8 ("regset: Add support for dynamically sized regsets") But since this is precisely what was ripped out, I guess adding it back may be controversial (?) > > Reported-by: Doug Anderson <dianders@chromium.org> > Signed-off-by: Mark Brown <broonie@kernel.org> > --- > We should probably also use the actual architectural limit for the > bitmasks we use in the VL enumeration code, though that's both a little > bit more involved and less immediately a problem. Since these masks are 64 bytes each and rarely accessed, it seemed pointless complexity to make them resizeable... > --- > arch/arm64/include/asm/fpsimd.h | 10 +++++----- > arch/arm64/kernel/ptrace.c | 3 ++- > 2 files changed, 7 insertions(+), 6 deletions(-) > > diff --git a/arch/arm64/include/asm/fpsimd.h b/arch/arm64/include/asm/fpsimd.h > index 50e5f25d3024..cf5f31181bc8 100644 > --- a/arch/arm64/include/asm/fpsimd.h > +++ b/arch/arm64/include/asm/fpsimd.h > @@ -62,12 +62,12 @@ static inline void cpacr_restore(unsigned long cpacr) > * When we defined the maximum SVE vector length we defined the ABI so > * that the maximum vector length included all the reserved for future > * expansion bits in ZCR rather than those just currently defined by > - * the architecture. While SME follows a similar pattern the fact that > - * it includes a square matrix means that any allocations that attempt > - * to cover the maximum potential vector length (such as happen with > - * the regset used for ptrace) end up being extremely large. Define > - * the much lower actual limit for use in such situations. > + * the architecture. Using this length to allocate worst size buffers > + * results in excessively large allocations, and this effect is even > + * more pronounced for SME due to ZA. Define more suitable VLs for > + * these situations. > */ > +#define ARCH_SVE_VQ_MAX 16 > #define SME_VQ_MAX 16 Ack, though part of the reason for not doing this was to discourage people from allocating statically sized buffers in general. If the kernel is now juggling two #defines for the maximum vector size, this feels like it may seed bitrot... > > struct task_struct; > diff --git a/arch/arm64/kernel/ptrace.c b/arch/arm64/kernel/ptrace.c > index dc6cf0e37194..e3bef38fc2e2 100644 > --- a/arch/arm64/kernel/ptrace.c > +++ b/arch/arm64/kernel/ptrace.c > @@ -1500,7 +1500,8 @@ static const struct user_regset aarch64_regsets[] = { > #ifdef CONFIG_ARM64_SVE > [REGSET_SVE] = { /* Scalable Vector Extension */ > .core_note_type = NT_ARM_SVE, > - .n = DIV_ROUND_UP(SVE_PT_SIZE(SVE_VQ_MAX, SVE_PT_REGS_SVE), > + .n = DIV_ROUND_UP(SVE_PT_SIZE(ARCH_SVE_VQ_MAX, > + SVE_PT_REGS_SVE), > SVE_VQ_BYTES), > .size = SVE_VQ_BYTES, > .align = SVE_VQ_BYTES, > > --- > base-commit: 41bccc98fb7931d63d03f326a746ac4d429c1dd3 > change-id: 20240202-arm64-sve-ptrace-regset-size-21b0928969e1 > > Best regards, > -- > Mark Brown <broonie@kernel.org> This looks reasonable, but patching the .n fields doesn't look too bad either; I'll post a patch separately for comparison. [...] Cheers ---Dave
On Mon, Feb 05, 2024 at 05:11:59PM +0000, Dave Martin wrote: > On Sat, Feb 03, 2024 at 12:16:49PM +0000, Mark Brown wrote: > > We could also teach the ptrace core about runtime discoverable regset sizes > > but that would be a more invasive change and this is being observed in > > practical systems. > This is not hard at all: see > 27e64b4be4b8 ("regset: Add support for dynamically sized regsets") > But since this is precisely what was ripped out, I guess adding it back > may be controversial (?) Also just that people might want to backport and while it's not super *hard* I tend to prefer to do something as minimal as possible as a fix, the less we do the less the chances that we mess up. > > We should probably also use the actual architectural limit for the > > bitmasks we use in the VL enumeration code, though that's both a little > > bit more involved and less immediately a problem. > Since these masks are 64 bytes each and rarely accessed, it seemed > pointless complexity to make them resizeable... I was suggesting making them use the architectural maximum rather than making them dynamic. > > +#define ARCH_SVE_VQ_MAX 16 > > #define SME_VQ_MAX 16 > Ack, though part of the reason for not doing this was to discourage > people from allocating statically sized buffers in general. I was going to do a patch adding a comment to the header noting that this is not actually the architectural maximum since at present it's a bit of a landmine, people who have some idea of the architecture likely have a rough idea what sort of allocation size is needed for the maximum SVE state and are likely to not double check the value provided (I think that's what happened with the refactoring to remove the dynamic sizing). A comment in the header is still very missable but it'd be something. > If the kernel is now juggling two #defines for the maximum vector size, > this feels like it may seed bitrot... Ideally we'd just not have the existing define externally but it's there and it's been used.
On Mon, Feb 05, 2024 at 05:41:47PM +0000, Mark Brown wrote: > On Mon, Feb 05, 2024 at 05:11:59PM +0000, Dave Martin wrote: > > On Sat, Feb 03, 2024 at 12:16:49PM +0000, Mark Brown wrote: > > > > We could also teach the ptrace core about runtime discoverable regset sizes > > > but that would be a more invasive change and this is being observed in > > > practical systems. > > > This is not hard at all: see > > 27e64b4be4b8 ("regset: Add support for dynamically sized regsets") > > > But since this is precisely what was ripped out, I guess adding it back > > may be controversial (?) > > Also just that people might want to backport and while it's not super > *hard* I tend to prefer to do something as minimal as possible as a fix, > the less we do the less the chances that we mess up. Totally agree with that: the core code has now gone in another direction, so we should consider that a done deal. I'm just using the old patch as an illustration of the (low) level of complexity. > > > We should probably also use the actual architectural limit for the > > > bitmasks we use in the VL enumeration code, though that's both a little > > > bit more involved and less immediately a problem. > > > Since these masks are 64 bytes each and rarely accessed, it seemed > > pointless complexity to make them resizeable... > > I was suggesting making them use the architectural maximum rather than > making them dynamic. > > > > +#define ARCH_SVE_VQ_MAX 16 > > > #define SME_VQ_MAX 16 > > > Ack, though part of the reason for not doing this was to discourage > > people from allocating statically sized buffers in general. > > I was going to do a patch adding a comment to the header noting that > this is not actually the architectural maximum since at present it's > a bit of a landmine, people who have some idea of the architecture > likely have a rough idea what sort of allocation size is needed for the > maximum SVE state and are likely to not double check the value provided > (I think that's what happened with the refactoring to remove the dynamic > sizing). A comment in the header is still very missable but it'd be > something. > > > If the kernel is now juggling two #defines for the maximum vector size, > > this feels like it may seed bitrot... > > Ideally we'd just not have the existing define externally but it's there > and it's been used. To clarify, is this intended as a temporary band-aid against silly behaviour while a cleaner solution is found, or a permanent limitation? We'd need to change various things if the architectural max VL actually grew, so no forward-portability is lost immediately if the kernel adopts 16 internally, but I'm still a little concerned that people may poke about in the kernel code as a reference and this will muddy the waters regarding how to do the right thing in userspace (I know people shouldn't, but...) [...] Cheers ---Dave
On Wed, Feb 07, 2024 at 12:23:56PM +0000, Dave Martin wrote: > On Mon, Feb 05, 2024 at 05:41:47PM +0000, Mark Brown wrote: > > On Mon, Feb 05, 2024 at 05:11:59PM +0000, Dave Martin wrote: > > > If the kernel is now juggling two #defines for the maximum vector size, > > > this feels like it may seed bitrot... > > Ideally we'd just not have the existing define externally but it's there > > and it's been used. > To clarify, is this intended as a temporary band-aid against silly > behaviour while a cleaner solution is found, or a permanent limitation? Ideally we'd just make everything dynamic, other than the regset issue and the bitmasks used for VL enumeration we're there already. Making the bitmasks dynamically sized is more painful but are also doing enumeration that userspace doesn't need to do. > We'd need to change various things if the architectural max VL actually > grew, so no forward-portability is lost immediately if the kernel > adopts 16 internally, but I'm still a little concerned that people may > poke about in the kernel code as a reference and this will muddy the > waters regarding how to do the right thing in userspace (I know people > shouldn't, but...) I think if we fix the ptrace regset issue we're doing a good enough job of just using fully dynamic sizing with no limits other than what's been enumerated there. We could possibly deal with the enumberation code by changing it to use ZCR/SMCR_ELx_LEN_ based defines so that it's obviously coming from what we can possibly write to the register but it's a bit less clear how to do that neatly.
On Wed, Feb 07, 2024 at 01:09:51PM +0000, Mark Brown wrote: > On Wed, Feb 07, 2024 at 12:23:56PM +0000, Dave Martin wrote: > > On Mon, Feb 05, 2024 at 05:41:47PM +0000, Mark Brown wrote: > > > On Mon, Feb 05, 2024 at 05:11:59PM +0000, Dave Martin wrote: > > > > > If the kernel is now juggling two #defines for the maximum vector size, > > > > this feels like it may seed bitrot... > > > > Ideally we'd just not have the existing define externally but it's there > > > and it's been used. > > > To clarify, is this intended as a temporary band-aid against silly > > behaviour while a cleaner solution is found, or a permanent limitation? > > Ideally we'd just make everything dynamic, other than the regset issue > and the bitmasks used for VL enumeration we're there already. Making > the bitmasks dynamically sized is more painful but are also doing > enumeration that userspace doesn't need to do. For the bitmasks, we'd be saving (512 - 16) / 8 = 62 bytes for each of SVE and SME (I think). The tradeoff really didn't seem worth it... > > > We'd need to change various things if the architectural max VL actually > > grew, so no forward-portability is lost immediately if the kernel > > adopts 16 internally, but I'm still a little concerned that people may > > poke about in the kernel code as a reference and this will muddy the > > waters regarding how to do the right thing in userspace (I know people > > shouldn't, but...) > > I think if we fix the ptrace regset issue we're doing a good enough job > of just using fully dynamic sizing with no limits other than what's been > enumerated there. We could possibly deal with the enumberation code by > changing it to use ZCR/SMCR_ELx_LEN_ based defines so that it's > obviously coming from what we can possibly write to the register but > it's a bit less clear how to do that neatly. OK, but we still seem to have two competing approaches: clamp SVE_VQ_MAX for kernel internal purposes, or restore the dynamic sizing of NT_ARM_SVE based on the new regset core behaviour. Are you saying we should or both, or otherwise which one? Cheers ---Dave
On Wed, Feb 07, 2024 at 01:51:42PM +0000, Dave Martin wrote: > On Wed, Feb 07, 2024 at 01:09:51PM +0000, Mark Brown wrote: > > I think if we fix the ptrace regset issue we're doing a good enough job > > of just using fully dynamic sizing with no limits other than what's been > > enumerated there. We could possibly deal with the enumberation code by > > changing it to use ZCR/SMCR_ELx_LEN_ based defines so that it's > > obviously coming from what we can possibly write to the register but > > it's a bit less clear how to do that neatly. > OK, but we still seem to have two competing approaches: clamp SVE_VQ_MAX > for kernel internal purposes, or restore the dynamic sizing of > NT_ARM_SVE based on the new regset core behaviour. > Are you saying we should or both, or otherwise which one? I'd like to remove uses of SVE_VQ_MAX, to the extent possible by making things dynamic but if not by not using it I would like to not use SVE_VQ_MAX partly due to the considerations you mentioned about people picking up from looking at the kernel source that it's a good idea.
Hi Doug, On Mon, Feb 05, 2024 at 09:02:08AM -0800, Doug Anderson wrote: > Hi, > > On Sat, Feb 3, 2024 at 4:18 AM Mark Brown <broonie@kernel.org> wrote: > > > > Doug Anderson observed that ChromeOS crashes are being reported which > > include failing allocations of order 7 during core dumps due to ptrace > > allocating storage for regsets: > > > > chrome: page allocation failure: order:7, > > mode:0x40dc0(GFP_KERNEL|__GFP_COMP|__GFP_ZERO), > > nodemask=(null),cpuset=urgent,mems_allowed=0 > > ... > > regset_get_alloc+0x1c/0x28 > > elf_core_dump+0x3d8/0xd8c > > do_coredump+0xeb8/0x1378 > > > > with further investigation showing that this is: > > > > [ 66.957385] DOUG: Allocating 279584 bytes > > > > which is the maximum size of the SVE regset. As Doug observes it is not > > entirely surprising that such a large allocation of contiguous memory might > > fail on a long running system. > > > > The SVE regset is currently sized to hold SVE registers with a VQ of > > SVE_VQ_MAX which is 512, substantially more than the architectural maximum > > of 16 which we might see even in a system emulating the limits of the > > architecture. Since we don't expose the size we tell the regset core > > externally let's define ARCH_SVE_VQ_MAX with the actual architectural > > maximum and use that for the regset, we'll still overallocate most of the > > time but much less so which will be helpful even if the core is fixed to > > not require contiguous allocations. > > > > We could also teach the ptrace core about runtime discoverable regset sizes > > but that would be a more invasive change and this is being observed in > > practical systems. > > > > Reported-by: Doug Anderson <dianders@chromium.org> > > Signed-off-by: Mark Brown <broonie@kernel.org> > > Confirmed that when I send a "quit" signal to Chrome now that the > allocation I see for "core_note_type" NT_ARM_SVE goes down from > 279,584 bytes (n=17474) to just 8,768 bytes (n=548). I'm not > intimately familiar with this code so I'll skip the Reviewed-by unless > someone thinks it would be valuable for me to analyze more. I think > there are already plenty of people who know this well, so for now, > just: > > Tested-by: Douglas Anderson <dianders@chromium.org> I can pick this up as a short-term hack if it solves the problem for you, but I also saw that you posted: https://lore.kernel.org/r/20240205092626.v2.1.Id9ad163b60d21c9e56c2d686b0cc9083a8ba7924@changeid to fallback onto vmalloc() for large allocations. What's your preference for a fix? Cheers, Will
On Fri, Feb 09, 2024 at 05:11:55PM +0000, Will Deacon wrote: > I can pick this up as a short-term hack if it solves the problem for you, > but I also saw that you posted: > https://lore.kernel.org/r/20240205092626.v2.1.Id9ad163b60d21c9e56c2d686b0cc9083a8ba7924@changeid > to fallback onto vmalloc() for large allocations. > What's your preference for a fix? We need the change kvzalloc() regardless since the ZA regset is just over 64K which is on the big side, I do think that reducing the reported regset size is useful as a fix independently since even with kvzalloc() it's still pointless and rude to ask for such a big allocation and we might reasonably be generating core files or dumping crashing processes under memory pressure. Dave was looking at sizing the regsets dynamically but there's enough going on there that it's not great if people want to pick it up for stable.
On Sat, Feb 03, 2024 at 12:16:49PM +0000, Mark Brown wrote: > Doug Anderson observed that ChromeOS crashes are being reported which > include failing allocations of order 7 during core dumps due to ptrace > allocating storage for regsets: > > chrome: page allocation failure: order:7, > mode:0x40dc0(GFP_KERNEL|__GFP_COMP|__GFP_ZERO), > nodemask=(null),cpuset=urgent,mems_allowed=0 > ... > regset_get_alloc+0x1c/0x28 > elf_core_dump+0x3d8/0xd8c > do_coredump+0xeb8/0x1378 > > with further investigation showing that this is: > > [ 66.957385] DOUG: Allocating 279584 bytes > > which is the maximum size of the SVE regset. As Doug observes it is not > entirely surprising that such a large allocation of contiguous memory might > fail on a long running system. > > The SVE regset is currently sized to hold SVE registers with a VQ of > SVE_VQ_MAX which is 512, substantially more than the architectural maximum > of 16 which we might see even in a system emulating the limits of the > architecture. Since we don't expose the size we tell the regset core > externally let's define ARCH_SVE_VQ_MAX with the actual architectural > maximum and use that for the regset, we'll still overallocate most of the > time but much less so which will be helpful even if the core is fixed to > not require contiguous allocations. > > We could also teach the ptrace core about runtime discoverable regset sizes > but that would be a more invasive change and this is being observed in > practical systems. > > Reported-by: Doug Anderson <dianders@chromium.org> > Signed-off-by: Mark Brown <broonie@kernel.org> > --- > We should probably also use the actual architectural limit for the > bitmasks we use in the VL enumeration code, though that's both a little > bit more involved and less immediately a problem. > --- > arch/arm64/include/asm/fpsimd.h | 10 +++++----- > arch/arm64/kernel/ptrace.c | 3 ++- > 2 files changed, 7 insertions(+), 6 deletions(-) > > diff --git a/arch/arm64/include/asm/fpsimd.h b/arch/arm64/include/asm/fpsimd.h > index 50e5f25d3024..cf5f31181bc8 100644 > --- a/arch/arm64/include/asm/fpsimd.h > +++ b/arch/arm64/include/asm/fpsimd.h > @@ -62,12 +62,12 @@ static inline void cpacr_restore(unsigned long cpacr) > * When we defined the maximum SVE vector length we defined the ABI so > * that the maximum vector length included all the reserved for future > * expansion bits in ZCR rather than those just currently defined by > - * the architecture. While SME follows a similar pattern the fact that > - * it includes a square matrix means that any allocations that attempt > - * to cover the maximum potential vector length (such as happen with > - * the regset used for ptrace) end up being extremely large. Define > - * the much lower actual limit for use in such situations. > + * the architecture. Using this length to allocate worst size buffers > + * results in excessively large allocations, and this effect is even > + * more pronounced for SME due to ZA. Define more suitable VLs for > + * these situations. > */ > +#define ARCH_SVE_VQ_MAX 16 > #define SME_VQ_MAX 16 > > struct task_struct; > diff --git a/arch/arm64/kernel/ptrace.c b/arch/arm64/kernel/ptrace.c > index dc6cf0e37194..e3bef38fc2e2 100644 > --- a/arch/arm64/kernel/ptrace.c > +++ b/arch/arm64/kernel/ptrace.c > @@ -1500,7 +1500,8 @@ static const struct user_regset aarch64_regsets[] = { > #ifdef CONFIG_ARM64_SVE > [REGSET_SVE] = { /* Scalable Vector Extension */ > .core_note_type = NT_ARM_SVE, > - .n = DIV_ROUND_UP(SVE_PT_SIZE(SVE_VQ_MAX, SVE_PT_REGS_SVE), > + .n = DIV_ROUND_UP(SVE_PT_SIZE(ARCH_SVE_VQ_MAX, > + SVE_PT_REGS_SVE), > SVE_VQ_BYTES), Do we need an actual check somewhere that we don't bust this limit? Since ZCR_ELx_LEN_MASK was changed from 0x1ff to 0xf, it looks like the kernel itself will not generate an overlarge VL, although it feels a bit like this guarantee arrives by accident. Could ARCH_SVE_VQ_MAX be based on ZCR_ELx_LEN_MASK instead? Userspace could specify vl > sve_vl_from_vq(ARCH_SVE_VQ_MAX) in PTRACE_SETREGSET; I'm not sure exactly what happens there. (The original 0x1ff value of ZCR_ELx_LEN_MASK was based on more than just hope, but it does seem appropriate to restrict it now so that it matches the formal architecture, as per commit f171f9e4097d ("arm64/fp: Make SVE and SME length register definition match architecture") ) [...] Cheers ---Dave
On Mon, Feb 12, 2024 at 04:50:49PM +0000, Dave Martin wrote: > On Sat, Feb 03, 2024 at 12:16:49PM +0000, Mark Brown wrote: > > - .n = DIV_ROUND_UP(SVE_PT_SIZE(SVE_VQ_MAX, SVE_PT_REGS_SVE), > > + .n = DIV_ROUND_UP(SVE_PT_SIZE(ARCH_SVE_VQ_MAX, > > + SVE_PT_REGS_SVE), > > SVE_VQ_BYTES), > Do we need an actual check somewhere that we don't bust this limit? .. > Userspace could specify vl > sve_vl_from_vq(ARCH_SVE_VQ_MAX) in > PTRACE_SETREGSET; I'm not sure exactly what happens there. We already have validation against the actual enumerated limits for the system by virtue of setting the vector length to whatever is specified so we'll limit any overly large vector length down to something we can actually support and then reject an attempt to supply register data if we changed the VL from what the user specified. > Since ZCR_ELx_LEN_MASK was changed from 0x1ff to 0xf, it looks like the > kernel itself will not generate an overlarge VL, although it feels a bit > like this guarantee arrives by accident. > Could ARCH_SVE_VQ_MAX be based on ZCR_ELx_LEN_MASK instead? I guess.
diff --git a/arch/arm64/include/asm/fpsimd.h b/arch/arm64/include/asm/fpsimd.h index 50e5f25d3024..cf5f31181bc8 100644 --- a/arch/arm64/include/asm/fpsimd.h +++ b/arch/arm64/include/asm/fpsimd.h @@ -62,12 +62,12 @@ static inline void cpacr_restore(unsigned long cpacr) * When we defined the maximum SVE vector length we defined the ABI so * that the maximum vector length included all the reserved for future * expansion bits in ZCR rather than those just currently defined by - * the architecture. While SME follows a similar pattern the fact that - * it includes a square matrix means that any allocations that attempt - * to cover the maximum potential vector length (such as happen with - * the regset used for ptrace) end up being extremely large. Define - * the much lower actual limit for use in such situations. + * the architecture. Using this length to allocate worst size buffers + * results in excessively large allocations, and this effect is even + * more pronounced for SME due to ZA. Define more suitable VLs for + * these situations. */ +#define ARCH_SVE_VQ_MAX 16 #define SME_VQ_MAX 16 struct task_struct; diff --git a/arch/arm64/kernel/ptrace.c b/arch/arm64/kernel/ptrace.c index dc6cf0e37194..e3bef38fc2e2 100644 --- a/arch/arm64/kernel/ptrace.c +++ b/arch/arm64/kernel/ptrace.c @@ -1500,7 +1500,8 @@ static const struct user_regset aarch64_regsets[] = { #ifdef CONFIG_ARM64_SVE [REGSET_SVE] = { /* Scalable Vector Extension */ .core_note_type = NT_ARM_SVE, - .n = DIV_ROUND_UP(SVE_PT_SIZE(SVE_VQ_MAX, SVE_PT_REGS_SVE), + .n = DIV_ROUND_UP(SVE_PT_SIZE(ARCH_SVE_VQ_MAX, + SVE_PT_REGS_SVE), SVE_VQ_BYTES), .size = SVE_VQ_BYTES, .align = SVE_VQ_BYTES,